More tweaks, grammar, etc. I say it's ready to submit.

svn:r3605
Paul Syverson 2005-02-09 17:42:21 +00:00
parent 1cc0933d93
commit 0d5dedd732


@ -245,8 +245,8 @@ complicating factors:
(3)~Users do not in fact choose nodes with uniform probability; they
favor nodes with high bandwidth or uptime, and exit nodes that
permit connections to their favorite services.
(See Section~\ref{subsec:routing-zones} for discussion of larger
adversaries and our dispersal goals.)
% I'm trying to make this paragraph work without reference to the
% analysis/confirmation distinction, which we haven't actually introduced
@ -270,15 +270,25 @@ in the presence of the variability and volume quantization introduced by the
Tor network, but it seems likely that these factors will at best delay
rather than halt the attacks in the cases where they succeed.
Along similar lines, the same paper suggests a ``clogging
attack'' in which the throughput on a circuit is observed to slow
down when an adversary clogs the right nodes with his own traffic.
To determine the nodes in a circuit, this attack requires the ability
to continuously monitor the traffic exiting the network on a circuit
that is up long enough to probe all network nodes in binary fashion.
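As a sketch of the probing step only (the oracle below simulates the
adversary's real capability of clogging a chosen set of nodes and watching
the target circuit's throughput; the node names and network size are made up):
\begin{verbatim}
# Illustrative group-testing sketch: recursively halve the candidate set,
# keeping any half whose "clogging" slows the target circuit.
def probe_circuit_nodes(candidates, circuit_slows_when_clogging):
    """Return the candidates that appear to lie on the target circuit."""
    if not circuit_slows_when_clogging(candidates):
        return []                   # no circuit node in this subset
    if len(candidates) == 1:
        return list(candidates)     # narrowed down to a single node
    mid = len(candidates) // 2
    return (probe_circuit_nodes(candidates[:mid], circuit_slows_when_clogging) +
            probe_circuit_nodes(candidates[mid:], circuit_slows_when_clogging))

nodes = ["node%d" % i for i in range(50)]     # a 50-node network
circuit = {"node3", "node17", "node42"}       # unknown to the attacker
oracle = lambda subset: bool(circuit & set(subset))
print(probe_circuit_nodes(nodes, oracle))     # recovers the three circuit nodes
\end{verbatim}
The circuit must stay up long enough for all of these probes, which is the
monitoring requirement noted above.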
% Though somewhat related, clogging and interference are really different
% attacks with different assumptions about adversary distribution and
% capabilities as well as different techniques. -pfs
Murdoch and Danezis~\cite{attack-tor-oak05} show a practical
interference attack against portions of
the fifty node Tor network as deployed in mid 2004.
An outside attacker can actively trace a circuit through the Tor network
by observing changes in the latency of his
own traffic sent through various Tor nodes. This can be done
simultaneously at multiple nodes; however, like clogging,
this attack only reveals
the Tor nodes in the circuit, not initiator and responder addresses,
so it is still necessary to discover the endpoints to complete an
effective attack. Increasing the size and diversity of the Tor network may
help counter these attacks.
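As a rough illustration of why size and diversity help (assuming, contrary
to the caveat above, that entry and exit nodes were chosen uniformly at
random), an adversary controlling $c$ of $n$ nodes would occupy both ends of
a given circuit with probability only about
\begin{displaymath}
\frac{c}{n}\cdot\frac{c}{n} \;=\; \frac{c^2}{n^2}.
\end{displaymath}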
%discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning
@ -374,10 +384,11 @@ Anon Proxy~\cite{web-mix} provides similar functionality to Tor but
handles only web browsing rather than all TCP\@.
%Some peer-to-peer file-sharing overlay networks such as
%Freenet~\cite{freenet} and Mute~\cite{mute}
The Freedom
network from Zero-Knowledge Systems~\cite{freedom21-security}
was even more flexible than Tor in
transporting arbitrary IP packets, and also supported
pseudonymity in addition to anonymity; but it had
a different approach to sustainability (collecting money from users
and paying ISPs to run its nodes), and was eventually shut down due to financial
load. Finally, %potentially more scalable
@ -537,7 +548,7 @@ to dissuade them.
\subsection{Sustainability and incentives}
One of the unsolved problems in low-latency anonymity designs is
how to keep the nodes running. ZKS's Freedom network
depended on paying third parties to run its servers; the JAP project
depends on grants to pay for its bandwidth and
administrative expenses. In Tor, bandwidth and administrative costs are
@ -552,8 +563,8 @@ We have not formally surveyed Tor node operators to learn why they are
running nodes, but
from the information they have provided, it seems that many of them run Tor
nodes out of personal interest in privacy issues. It is possible
that others are running Tor nodes to protect their own
anonymity, but of course they are
hardly likely to tell us specifics if they are.
%Significantly, Tor's threat model changes the anonymity incentives for running
%a node. In a high-latency mix network, users can receive additional
@ -914,7 +925,8 @@ node. A hostile web server can send constant interference traffic to
all nodes in the network, and learn which nodes are involved in the
circuit (though at least in the current attack, he can't learn their
order). Using randomized path lengths may help some, since the attacker
will never be certain he has identified all nodes in the path unless
he probes the entire network, but as
long as the network remains small this attack will still be feasible.
Helper nodes also aim to help Tor clients, because choosing entry and exit
@ -924,8 +936,8 @@ even a few nodes to eventually link some of their destinations. The goal
is to take the risk once and for all about choosing a bad entry node,
rather than taking a new risk for each new circuit. (Choosing fixed
exit nodes is less useful, since even an honest exit node still doesn't
protect against a hostile website.) But obstacles remain before
we can implement helper nodes.
For one, the literature does not describe how to choose helpers from a list
of nodes that changes over time. If Alice is forced to choose a new entry
helper every $d$ days and $c$ of the $n$ nodes are bad, she can expect
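A minimal sketch of the arithmetic, assuming each helper is drawn uniformly
and independently from the current list: the chance that at least one of her
first $k$ helpers is bad is
\begin{displaymath}
1-\left(1-\frac{c}{n}\right)^{k} \;\approx\; \frac{kc}{n}
\qquad \mbox{when } \frac{kc}{n} \ll 1,
\end{displaymath}
so on average her first bad helper arrives after about $n/c$ choices, that
is, on the order of $nd/c$ days.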
@ -1040,6 +1052,13 @@ continents like Asia or South America.
% it's not just entering or exiting from them. using them as the middle
% hop reduces your effective path length, which you presumably don't
% want because you chose that path length for a reason.
%
% Not sure I buy that argument. Two end nodes in the right ASs to
% discourage linking are still not known to each other. If some
% adversary in a single AS can bridge the middle node, it shouldn't
% therefore be able to identify initiator or responder; although it could
% contribute to further attacks given more assumptions.
% Nonetheless, no change to the actual text for now.
Many open questions remain. First, it will be an immense engineering
challenge to get an entire BGP routing table to each Tor client, or to
@ -1095,12 +1114,12 @@ Anti-censorship networks hoping to bridge country-level blocks face
a variety of challenges. One of these is that they need to find enough
exit nodes---servers on the `free' side that are willing to relay
traffic from users to their final destinations. Anonymizing
networks incorporating Tor are well-suited to this task since we have
already gathered a set of exit nodes that are willing to tolerate some
political heat.
The other main challenge is to distribute a list of reachable relays
to the users inside the country, and give them software to use those relays,
without letting the censors also enumerate this list and block each
relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
addresses (or having them donated), abandoning old addresses as they are
@ -1117,7 +1136,8 @@ server that gives them out to dissidents who need to get around blocks.
Of course, this still doesn't prevent the adversary
from enumerating and preemptively blocking the volunteer relays.
Perhaps a tiered-trust system could be built where a few individuals are
given relays' locations. They could then recommend other individuals
by telling them
those addresses, thus providing a built-in incentive to avoid letting the
adversary intercept them. Max-flow trust algorithms~\cite{advogato}
might help to bound the number of IP addresses leaked to the adversary. Groups
@ -1131,13 +1151,12 @@ help address censorship; we wish them success.
Tor is running today with hundreds of nodes and tens of thousands of
users, but it will certainly not scale to millions.
Scaling Tor involves four main challenges. First, to get a
large initial set of nodes, we must address incentives for
users to carry traffic for others. Next is safe node discovery, both
while bootstrapping (Tor clients must robustly find an initial
node list) and later (Tor clients must learn about a fair sample
of honest nodes and not let the adversary control their circuits).
We must also detect and handle node speed and reliability as the network
becomes increasingly heterogeneous: since the speed and reliability
of a circuit is limited by its worst link, we must learn to track and
@ -1159,7 +1178,7 @@ public rankings of the throughput and reliability of nodes, much like
seti@home. We further explain to users that they can get
deniability for any traffic emerging from the same address as a Tor
exit node, and they can use their own Tor node
as an entry or exit point with confidence that it's not run by an adversary.
Further, users may run a node simply because they need such a network
to be persistently available and usable, and the value of supporting this
exceeds any countervailing costs.
@ -1215,8 +1234,9 @@ further study.
\subsection{Trust and discovery}
\label{subsec:trust-and-discovery}
The published Tor design is deliberately simplistic in how
new nodes are authorized and how clients are informed about Tor
nodes and their status.
All nodes periodically upload a signed description
of their locations, keys, and capabilities to each of several well-known {\it
directory servers}. These directory servers construct a signed summary
@ -1251,9 +1271,9 @@ Second, as more nodes join the network, %the more unreasonable it
directories
become infeasibly large, and downloading the list of nodes becomes
burdensome.
Third, the validation scheme may do as much harm as it does good. It
does not prevent clever attackers from mounting Sybil attacks,
and it may deter node operators from joining the network---if
they expect the validation process to be difficult, or they do not share
a common language with the directory server operators.
@ -1277,6 +1297,10 @@ to learn which nodes are on the network. But omitting this information
from the Tor distribution would only delegate the trust problem to each
individual user. %, most of whom are presumably less informed about how to make
%trust decisions than the Tor developers.
One possible solution is a well-publicized, widely available list of
initial directory servers and their keys, endorsed and signed by
authoritative and independent parties. But setting that up properly is
itself a large bootstrapping task.
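As an illustration only (the pinned key, file names, and list format below
are hypothetical placeholders, not Tor's actual directory protocol), a client
could simply refuse to use any such list whose signature does not verify
against a key shipped with the software:
\begin{verbatim}
# Sketch: accept a downloaded directory-server list only if its detached
# signature verifies against a public key pinned in the client.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature

PINNED_PUBKEY_HEX = "00" * 32   # placeholder; a real key would ship with Tor

def load_trusted_dirservers(list_path="dirservers.txt",
                            sig_path="dirservers.sig"):
    data = open(list_path, "rb").read()
    signature = open(sig_path, "rb").read()
    pubkey = Ed25519PublicKey.from_public_bytes(bytes.fromhex(PINNED_PUBKEY_HEX))
    try:
        pubkey.verify(signature, data)   # raises InvalidSignature on failure
    except InvalidSignature:
        raise RuntimeError("directory-server list failed its signature check")
    return [line for line in data.decode().splitlines() if line.strip()]
\end{verbatim}
The hard part, of course, is not the check itself but establishing and
maintaining the pinned key in the first place, which is the bootstrapping
problem above.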
%Network discovery, sybil, node admission, scaling. It seems that the code
%will ship with something and that's our trust root. We could try to get
@ -1368,7 +1392,7 @@ routing problems.
We can address these points by reducing the network's connectivity.
Danezis~\cite{danezis-pets03} considers
the anonymity implications of restricting routes on mix networks and
recommends an approach based on expander graphs (where any subgraph is likely
to have many neighbors). It is not immediately clear that this approach will
extend to Tor, which has a weaker threat model but higher performance
@ -1377,8 +1401,8 @@ probability of an attacker's viewing whole paths, we will need to examine the
attacker's likelihood of compromising the endpoints.
%
Tor may not need an expander graph per se: it
may be enough to have a single central subnet that is highly connected, like
an Internet backbone. % As an
%example, assume fifty nodes of relatively high traffic capacity. This
%\emph{center} forms a clique. Assume each center node can
%handle 200 connections to other nodes (including the other ones in the
@ -1387,10 +1411,10 @@ an internet backbone. % As an
%network easily scales to c. 2500 nodes with commensurate increase in
%bandwidth.
There are many open questions: how to distribute connectivity information
(presumably nodes will learn about the central nodes
when they download Tor), whether central nodes
will need to function as a `backbone', and so on. As above,
this could reduce the amount of anonymity available from a mix-net,
but for a low-latency network where anonymity derives largely from
the edges, it may be feasible.
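As a rough numerical illustration of the expander intuition above (a sketch
with made-up parameters, not a proposed topology), one can measure how
well-connected a sparse random graph is by its spectral gap; a large gap
means every subset of nodes has many outside neighbors:
\begin{verbatim}
# Sketch: spectral gap of a random d-regular graph as a proxy for expansion.
import numpy as np
import networkx as nx

n, d = 1000, 6                      # hypothetical network size and degree
G = nx.random_regular_graph(d, n, seed=1)
eigs = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(G)))
gap = d - eigs[-2]                  # top eigenvalue of a d-regular graph is d
print("second eigenvalue %.2f, spectral gap %.2f" % (eigs[-2], gap))
\end{verbatim}
A random $d$-regular graph typically has second eigenvalue near
$2\sqrt{d-1}$, so even modest degree already gives substantial expansion.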
@ -1434,7 +1458,7 @@ architecture does not scale even to handle current user demand. We must
find designs and incentives to let some clients relay traffic too, without
sacrificing too much anonymity.
These are difficult and open questions. Yet choosing not to solve them
means leaving most users to a less secure network or no anonymizing
network at all.