lots more cleanups. people should check these over.

svn:r3593
Roger Dingledine 2005-02-09 04:34:50 +00:00
parent c5c46d6fb6
commit e4989f33c9

@@ -82,7 +82,7 @@ design and goals. Here we describe some policy, social, and technical
issues that we face as we continue deployment.
Rather than providing complete solutions to every problem, we
instead lay out the challenges and constraints that we have observed while
deploying Tor in the wild. In doing so, we aim to provide a research agenda
deploying Tor. In doing so, we aim to provide a research agenda
of general interest to projects attempting to build
and deploy practical, usable anonymity networks in the wild.
@@ -179,10 +179,9 @@ for use in securing government
communications, and by the Electronic Frontier Foundation for use
in maintaining civil liberties for ordinary citizens online. The Tor
protocol is one of the leading choices
for anonymizing layer in the European Union's PRIME directive to
for the anonymizing layer in the European Union's PRIME directive to
help maintain privacy in Europe.
% XXXX We should credit the specific group, not the whole university.
The University of Dresden in Germany
The AN.ON project in Germany
has integrated an independent implementation of the Tor protocol into
their popular Java Anon Proxy anonymizing client.
% This wide variety of
@@ -220,14 +219,16 @@ of intra-network~\cite{back01,attack-tor-oak05,flow-correlation04} and
end-to-end~\cite{danezis-pet2004,SS03} anonymity-breaking attacks.
Tor does not attempt to defend against a global observer. In general, an
attacker who can observe both ends of a connection through the Tor network
attacker who can measure both ends of a connection through the Tor network
% I say 'measure' rather than 'observe', to encompass murdoch-danezis
% style attacks. -RD
can correlate the timing and volume of data on that connection as it enters
and leaves the network, and so link communication partners.
Known solutions to this attack would seem to require introducing a
prohibitive degree of traffic padding between the user and the network, or
introducing an unacceptable degree of latency (but see Section
\ref{subsec:mid-latency}). Also, it is not clear that these methods would
work at all against even a minimally active adversary who could introduce timing
work at all against a minimally active adversary who could introduce timing
patterns or additional traffic. Thus, Tor only attempts to defend against
external observers who cannot observe both sides of a user's connections.
@@ -267,7 +268,7 @@ responders.
%However, it is still essentially confirming
%suspected communicants where the responder suspects are ``stored'' rather
%than observed at the same time as the client.
Similarly latencies of going through various routes can be
Similarly, latencies of going through various routes can be
cataloged~\cite{back01} to connect endpoints.
% XXX hintz-pet02 just looked at data volumes of the sites. this
% doesn't require much variability or storage. I think it works
@@ -286,18 +287,17 @@ rather than halt the attacks in the cases where they succeed.
%routes through the network to each site will be random even if they
%have relatively unique latency characteristics. So this does not seem
%an immediate practical threat.
Along similar lines, the same
paper suggested a ``clogging attack''. In \cite{attack-tor-oak05}, a
version of this was demonstrated to be practical against portions of
the fifty node Tor network as deployed in mid 2004. There it was shown
that an outside attacker can trace a stream through the Tor network
while a stream is still active by observing the latency of his
own traffic sent through various Tor nodes. These attacks do not show
client and server addresses, only the first and last nodes within the Tor
network, so it is still necessary to observe those nodes to complete the
attacks. This may make
helper nodes all the more worthy of exploration (see
Section~\ref{subsec:helper-nodes}).
Along similar lines, the same paper suggests a ``clogging
attack''. Murdoch and Danezis~\cite{attack-tor-oak05} show a practical
clogging attack against portions of
the fifty node Tor network as deployed in mid 2004.
An outside attacker can actively trace a circuit through the Tor network
by observing changes in the latency of his
own traffic sent through various Tor nodes. These attacks only reveal
the Tor nodes in the circuit, not initiator and responder addresses,
so it is still necessary to discover the endpoints to complete the
attacks. Increasing the size and diversity of the Tor network may
help counter these attacks.
%discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning
%the last hop is not $c/n$ since that doesn't take the destination (website)
@@ -389,18 +389,18 @@ handles only web browsing rather than arbitrary TCP\@.
Zero-Knowledge Systems' commercial Freedom
network~\cite{freedom21-security} was even more flexible than Tor in
transporting arbitrary IP packets, and also supported
pseudonymous in addition to anonymity; but it has
pseudonymity in addition to anonymity; but it has
a different approach to sustainability (collecting money from users
and paying ISPs to run Tor nodes), and was eventually shut down due to financial
load. Finally, potentially
load. Finally,
more scalable peer-to-peer designs like Tarzan~\cite{tarzan:ccs02} and
MorphMix~\cite{morphmix:fc04} have been proposed in the literature, but
have not yet been fielded. These systems differ somewhat
have not been fielded. These systems differ somewhat
in threat model and presumably practical resistance to threats.
MorphMix is close to Tor in circuit setup, and, by separating
node discovery from route selection from circuit setup, Tor is
flexible enough to potentially contain a MorphMix experiment within
it. We direct the interested reader
Note that MorphMix and Tor differ only in
node discovery and circuit setup; so Tor's architecture is flexible
enough to contain a MorphMix experiment.
We direct the interested reader
to~\cite{tor-design} for a more in-depth review of related work.
Tor also differs from other deployed systems for traffic analysis resistance
@@ -440,8 +440,8 @@ Tor's interaction with other services on the Internet.
\subsection{Communicating security}
Usability for anonymity systems
contributes directly to their security, because usability
effects the possible anonymity set~\cite{econymics,back01}.
contributes to their security, because usability
affects the possible anonymity set~\cite{econymics,back01}.
Conversely, an unusable system attracts few users and thus can't provide
much anonymity.
@@ -483,10 +483,10 @@ the initiator to her destination.% This is why Tor's threat model is
Like Tor, the current JAP implementation does not pad connections
apart from using small fixed-size cells for transport. In fact,
JAP's cascade-based network topology may be more vulnerable to these
attacks, because the network has fewer edges. JAP was born out of
attacks, because its network has fewer edges. JAP was born out of
the ISDN mix design~\cite{isdn-mixes}, where padding made sense because
every user had a fixed bandwidth allocation and altering the timing
pattern of packets could be immediately detected, but in its current context
pattern of packets could be immediately detected. But in its current context
as a general Internet web anonymizer, adding sufficient padding to JAP
would probably be prohibitively expensive and ineffective against a
minimally active attacker.\footnote{Even if JAP could
@@ -498,10 +498,6 @@ model the number of concurrent users does not seem to have much impact
on the anonymity provided, we suggest that JAP's anonymity meter is not
accurately communicating security levels to its users.
% because more users don't help anonymity much, we need to rely more
% on other incentive schemes, both policy-based (see sec x) and
% technically enforced (see sec y)
On the other hand, while the number of active concurrent users may not
matter as much as we'd like, it still helps to have some other users
on the network. We investigate this issue next.
@@ -666,8 +662,8 @@ So when letters arrive, operators are likely to face
pressure to block file-sharing applications entirely, in order to avoid the
hassle.
But blocking file-sharing would not necessarily be easy; many popular
protocols have evolved to run on a non-standard ports in order to
But blocking file-sharing is not easy: many popular
protocols have evolved to run on non-standard ports to
get around other port-based bans. Thus, exit node operators who want to
block file-sharing would have to find some way to integrate Tor with a
protocol-aware exit filter. This could be a technically expensive
@@ -706,29 +702,27 @@ file-sharing protocols that have separate control and data channels.
It was long expected that, alongside legitimate users, Tor would also
attract troublemakers who exploited Tor in order to abuse services on the
Internet with vandalism, rude mail, and so on.
%[XXX we're not talking bandwidth abuse here, we're talking vandalism,
%hate mails via hotmail, attacks, etc.]
Our initial answer to this situation was to use ``exit policies''
to allow individual Tor nodes to block access to specific IP/port ranges.
This approach aims to make operators more willing to run Tor by allowing
them to prevent their nodes from being used for abusing particular
services. For example, all Tor nodes currently block SMTP (port 25), in
order to avoid being used for spam.
services. For example, all Tor nodes currently block SMTP (port 25),
to avoid being used for spam.
This approach is useful, but is insufficient for two reasons. First, since
Exit policies are useful, but are insufficient for two reasons. First, since
it is not possible to force all nodes to block access to any given service,
many of those services try to block Tor instead. More broadly, while being
blockable is important to being good netizens, we would like to encourage
services to allow anonymous access; services should not need to decide
services to allow anonymous access. Services should not need to decide
between blocking legitimate anonymous use and allowing unlimited abuse.
This is potentially a bigger problem than it may appear.
On the one hand, people should be allowed to refuse connections to
their services. But, it's not just
for himself that a node administrator is deciding when he decides
whether he prefers to be able to post to Wikipedia from his Tor node address,
or to allow
people to read Wikipedia anonymously through his Tor node. (Wikipedia
On the one hand, services should be allowed to refuse connections from
sources of possible abuse.
But when a Tor node administrator decides whether he prefers to be able
to post to Wikipedia from his IP address, or to allow people to read
Wikipedia anonymously through his Tor node, he is making the decision
for others as well. (Wikipedia
has blocked all posting from all Tor nodes based on IP addresses.) If
the Tor node shares an address with a campus or corporate NAT,
then the decision can prevent the entire population from posting.
@@ -736,10 +730,9 @@ This is a loss for both Tor
and Wikipedia: we don't want to compete for (or divvy up) the
NAT-protected entities of the world.
Worse, many IP blacklists are not terribly fine-grained.
No current IP blacklist, for example, allows a service provider to blacklist
only those Tor nodes that allow access to a specific IP or port, even
though this information is readily available. One IP blacklist even bans
Worse, many IP blacklists are coarse-grained. Some
ignore Tor's exit policies, preferring to punish
all Tor nodes. One IP blacklist even bans
every class C network that contains a Tor node, and recommends banning SMTP
from these networks even though Tor does not allow SMTP at all. This
coarse-grained approach is typically a strategic decision to discourage the
@@ -751,6 +744,7 @@ to shut it down in order to get unblocked themselves.
%[XXX Mention: it's not dumb, it's strategic!]
%[XXX Mention: for some servops, any blacklist is a blacklist too many,
% because it is risky. (Guy lives in apt _building_ with one IP.)]
%XXX roger should add more
Problems of abuse occur mainly with services such as IRC networks and
Wikipedia, which rely on IP blocking to ban abusive users. While at first
@@ -771,7 +765,7 @@ this is why services use IP blocking. In order to deter abuse, pseudonymous
identities need to require a significant switching cost in resources or human
time. Some popular webmail applications
impose cost with Reverse Turing Tests, but these may not be costly enough to
deter abusers. Freedom solved this using blind signatures to limit
deter abusers. Freedom used blind signatures to limit
the number of pseudonyms for each paying account, but Tor has neither the
ability nor the desire to collect payment.
@@ -779,7 +773,7 @@ ability nor the desire to collect payment.
%non-anonymous costly identification mechanism to allow access to a
%blind-signature pseudonym protocol. This would effectively create costly
%pseudonyms, which services could require in order to allow anonymous access.
%This approach has difficulties in practise, however:
%This approach has difficulties in practice, however:
%\begin{tightlist}
%\item Unlike Freedom, Tor is not a commercial service. Therefore, it would
% be a shame to require payment in order to make Tor useful, or to make
@@ -828,21 +822,21 @@ at the IP layer. Before this could be done, many issues need to be resolved:
IP-level packet normalization, to stop things like TCP fingerprinting
attacks. %There likely exist libraries that can help with this.
This is unlikely to be a trivial task, given the diversity and complexity of
various TCP stacks.
TCP stacks.
\item \emph{Application-level streams still need scrubbing.} We still need
Tor to be easy to integrate with user-level application-specific proxies
such as Privoxy. So it's not just a matter of capturing packets and
anonymizing them at the IP layer.
\item \emph{Certain protocols will still leak information.} For example, we
must rewrite DNS requests so they are delivered to an unlinkable DNS server
rather than a DNS server at a user's ISP;thus, we must understand the
rather than the DNS server at a user's ISP; thus, we must understand the
protocols we are transporting.
\item \emph{The crypto is unspecified.} First we need a block-level encryption
approach that can provide security despite
packet loss and out-of-order delivery. Freedom allegedly had one, but it was
never publicly specified.
Also, TLS over UDP is not yet implemented or
specified, though some early work has begun on that~\cite{dtls}.
specified, though some early work has begun~\cite{dtls}.
\item \emph{We'll still need to tune network parameters.} Since the above
encryption system will likely need sequence numbers (and maybe more) to do
replay detection, handle duplicate frames, and so on, we will be reimplementing
@@ -863,8 +857,8 @@ which nodes will allow which packets to exit.
support hidden service {\tt{.onion}} addresses (and other special addresses,
like {\tt{.exit}} which lets the user request a particular exit node),
by intercepting the addresses when they are passed to the Tor client.
Doing so at the IP level would require more complex interface between
Tor and local DNS resolver.
Doing so at the IP level would require a more complex interface between
Tor and the local DNS resolver.
\end{enumerate}
This list is discouragingly long, but being able to transport more
@@ -930,14 +924,13 @@ quality of those choices.
\subsection{Enclaves and helper nodes}
\label{subsec:helper-nodes}
It has long been thought that users can improve their
anonymity by running their
own node~\cite{tor-design,or-ih96,or-pet00}, and using it in an
\emph{enclave} configuration, where all their circuits begin at the node
under their control. By running Tor clients only on Tor nodes
at the enclave perimeter, enclave configuration can also permit anonymity
protection even when policy or other requirements prevent individual machines
within the enclave from running Tor clients~\cite{or-jsac98,or-discex00}.
It has long been thought that users can improve their anonymity by
running their own node~\cite{tor-design,or-ih96,or-pet00}, and using
it in an \emph{enclave} configuration, where all their circuits begin
at the node under their control. Running Tor clients or servers at
the enclave perimeter is useful when policy or other requirements
prevent individual machines within the enclave from running Tor
clients~\cite{or-jsac98,or-discex00}.
Of course, Tor's default path length of
three is insufficient for these enclaves, since the entry and/or exit
@@ -1041,8 +1034,8 @@ News sites like Bloggers Without Borders (www.b19s.org) are advertising
a hidden-service address on their front page. Doing this can provide
increased robustness if they use the dual-IP approach we describe
in~\cite{tor-design},
but in practice they do it first to increase visibility
of the Tor project and their support for privacy, and second to offer
but in practice they do it to increase visibility
of the Tor project and their support for privacy, and to offer
a way for their users, using unmodified software, to get end-to-end
encryption and authentication to their website.
@@ -1077,8 +1070,11 @@ adversary, nodes should be in ASes that have the most links to other ASes:
Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction
is safest when it starts or ends in a Tier-1 ISP\@. Therefore, assuming
initiator and responder are both in the U.S., it actually \emph{hurts}
our location diversity to enter or exit from far-flung nodes in
our location diversity to use far-flung nodes in
continents like Asia or South America.
% it's not just entering or exiting from them. using them as the middle
% hop reduces your effective path length, which you presumably don't
% want because you chose that path length for a reason.
Many open questions remain. First, it will be an immense engineering
challenge to get an entire BGP routing table to each Tor client, or to
@@ -1089,9 +1085,11 @@ and MorphMix~\cite{morphmix:fc04} suggest that we compare IP prefixes to
determine location diversity; but the above paper showed that in practice
many of the Mixmaster nodes that share a single AS have entirely different
IP prefixes. When the network has scaled to thousands of nodes, does IP
prefix comparison become a more useful approximation? Alternatively, can
relevant parts of the routing tables be summarized centrally and delivered to
clients in a less verbose format?
prefix comparison become a more useful approximation? % Alternatively, can
%relevant parts of the routing tables be summarized centrally and delivered to
%clients in a less verbose format?
%% i already said "or to summarize is sufficiently" above. is that not
%% enough? -RD
%
Second, we can take advantage of caching certain content at the
exit nodes, to limit the number of requests that need to leave the
@@ -1106,7 +1104,7 @@ anonymity against larger real-world adversaries who can take advantage
of knowing our algorithm?
%
Fourth, can we use this knowledge to figure out which gaps in our network
most effect our robustness to this class of attack, and go recruit
most affect our robustness to this class of attack, and go recruit
new nodes with those ASes in mind?
%Tor's security relies in large part on the dispersal properties of its
@@ -1141,7 +1139,7 @@ to the users inside the country, and give them software to use them,
without letting the censors also enumerate this list and block each
relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
addresses (or having them donated), abandoning old addresses as they are
`used up', and telling a few users about the new ones. Distributed
`used up,' and telling a few users about the new ones. Distributed
anonymizing networks again have an advantage here, in that we already
have tens of thousands of separate IP addresses whose users might
volunteer to provide this service since they've already installed and use
@@ -1152,7 +1150,7 @@ to generate node descriptors and send them to a special directory
server that gives them out to dissidents who need to get around blocks.
Of course, this still doesn't prevent the adversary
from enumerating and preemtively blocking the volunteer relays.
from enumerating and preemptively blocking the volunteer relays.
Perhaps a tiered-trust system could be built where a few individuals are
given relays' locations, and they recommend other individuals by telling them
those addresses, thus providing a built-in incentive to avoid letting the
@@ -1169,15 +1167,17 @@ help address censorship; we wish them success.
Tor is running today with hundreds of nodes and tens of thousands of
users, but it will certainly not scale to millions.
Scaling Tor involves three main challenges. First is safe node discovery,
both while bootstrapping (how does Tor client robustly find an initial node
list?) and later (how does Tor client can learn about a fair sample of honest
nodes and not let the adversary control his circuits?) Second is detecting
and handling the speed and reliability of the variety of nodes as the network
becomes increasingly heterogeneous: since the speed and reliability of a
circuit is limited by its worst link, we must learn to track and predict
performance. Third, in order to get a large set of nodes in the first
place, we must address incentives for users to carry traffic for others.
Scaling Tor involves four main challenges. First, in order to get a
large set of nodes in the first place, we must address incentives for
users to carry traffic for others. Next is safe node discovery, both
while bootstrapping (how does a Tor client robustly find an initial
node list?) and later (how does a Tor client learn about a fair sample
of honest nodes and not let the adversary control his circuits?).
We must also detect and handle node speed and reliability as the network
becomes increasingly heterogeneous: since the speed and reliability
of a circuit is limited by its worst link, we must learn to track and
predict performance. Finally, we must stop assuming that all points on
the network can connect to all other points.
\subsection{Incentives by Design}
@@ -1246,17 +1246,6 @@ large set of nodes that meet some minimum service threshold
without opening Alice up as much to attacks. All of this requires
further study.
%XXX rewrite the above so it sounds less like a grant proposal and
%more like a "if somebody were to try to solve this, maybe this is a
%good first step".
%We should implement the above incentive scheme in the
%deployed Tor network, in conjunction with our plans to add the necessary
%associated scalability mechanisms. We will do experiments (simulated
%and/or real) to determine how much the incentive system improves
%efficiency over baseline, and also to determine how far we are from
%optimal efficiency (what we could get if we ignored the anonymity goals).
\subsection{Trust and discovery}
\label{subsec:trust-and-discovery}