early iteration of an 'experiences' section

tweaks throughout


svn:r994
Roger Dingledine 2004-01-14 11:10:29 +00:00
parent 9a53afb45b
commit 619fdd1ea7
2 changed files with 112 additions and 54 deletions

View file

@ -843,7 +843,7 @@
volume = 1,
number = 1,
pages = {66--92},
month = {November},
month = {June},
}
%note = {\url{http://citeseer.nj.nec.com/284739.html}}

View file

@ -75,7 +75,7 @@ close with a list of open problems in anonymous communication.
\label{sec:intro}
Onion Routing is a distributed overlay network designed to anonymize
low-latency TCP-based applications such as web browsing, secure shell,
TCP-based applications such as web browsing, secure shell,
and instant messaging. Clients choose a path through the network and
build a \emph{circuit}, in which each node (or ``onion router'' or ``OR'')
in the path knows its predecessor and successor, but no other nodes in
@ -164,7 +164,7 @@ that can be unreliable and complex. % open to partitioning attacks.
Tor takes a simplified view toward distributing such
information. Certain more trusted nodes act as \emph{directory
servers}: they provide signed directories that describe known
routers and their availability. Users periodically download the
routers and their current state. Users periodically download the
directories via HTTP.
\textbf{Variable exit policies:} Tor provides a consistent mechanism
@ -180,15 +180,17 @@ circuit could change the contents of data cells as they passed by---for
example, to alter a connection request so it would connect
to a different webserver, or to `tag' encrypted traffic and look for
corresponding corrupted traffic at the network edges \cite{minion-design}.
Tor hampers these attacks by checking data integrity before it leaves
Tor hampers these attacks by verifying data integrity before it leaves
the network.
\textbf{Improved robustness to failed nodes:} A failed node
in the old design meant that circuit building failed, but thanks to
Tor's step-by-step circuit building, users notice failed nodes
while building circuits and route around them. Additionally, liveness
information from directories allows users to avoid unreliable nodes in
the first place.
%\textbf{Improved robustness to failed nodes:} A failed node
%in the old design meant that circuit building failed, but thanks to
%Tor's step-by-step circuit building, users notice failed nodes
%while building circuits and route around them. Additionally, liveness
%information from directories allows users to avoid unreliable nodes in
%the first place.
%% Can't really claim this, now that we've found so many variants of
%% attack on partial-circuit-building. -RD
\textbf{Rendezvous points and hidden services:}
Tor provides an integrated mechanism for responder anonymity via
@ -205,9 +207,13 @@ TCP streams. Not requiring patches (or built-in support) in an
operating system's network stack has been valuable to Tor's
portability and deployability.
We have implemented most of the above features. Our source code is
available under a free license, and, as far as we know, is unencumbered by
patents. We have recently begun deploying a wide-area alpha network
We have implemented all of the above features except rendezvous
points. Our source code is
available under a free license, and Tor
%, as far as we know, is unencumbered by patents.
is not covered by the patent that affected distribution and use of
earlier versions of Onion Routing.
We have recently begun deploying a wide-area alpha network
to test the design in practice, to get more experience with usability
and users, and to provide a research platform for experimentation.
@ -216,7 +222,8 @@ our goals and assumptions in Section~\ref{sec:assumptions},
and then address the above list of improvements in
Sections~\ref{sec:design}-\ref{sec:rendezvous}. We summarize
in Section~\ref{sec:attacks} how our design stands up to
known attacks, and conclude with a list of open problems in
known attacks, and talk about our early deployment experiences in
Section~\ref{sec:in-the-wild}. We conclude with a list of open problems in
Section~\ref{sec:maintaining-anonymity} and future work for the Onion
Routing project in Section~\ref{sec:conclusion}.
@ -234,31 +241,26 @@ decrypts, delays, and re-orders messages, before relaying them toward
their destinations.
Subsequent relay-based anonymity designs have diverged in two
main directions. Some have tried to maximize anonymity at
the cost of introducing comparatively large and variable latencies,
including {\bf Babel} \cite{babel}, {\bf Mixmaster}
\cite{mixmaster-spec}, and
{\bf Mixminion} \cite{minion-design}. Because of this
decision, these \emph{high-latency} networks resist strong global
adversaries,
main directions. Systems like {\bf Babel} \cite{babel}, {\bf Mixmaster}
\cite{mixmaster-spec}, and {\bf Mixminion} \cite{minion-design} have tried
to maximize anonymity at the cost of introducing comparatively large and
variable latencies. Because of this decision, these \emph{high-latency}
networks resist strong global adversaries,
but introduce too much lag for interactive tasks like web browsing,
Internet chat, or SSH connections.
Tor belongs to the second category: \emph{low-latency} designs that
try to anonymize interactive network traffic. These systems handle
a variety of bidirectional protocols.
They also provide more convenient
a variety of bidirectional protocols. They also provide more convenient
mail delivery than the high-latency anonymous email
networks, because the remote mail server provides explicit and timely
delivery confirmation.
But because these designs typically
delivery confirmation. But because these designs typically
involve many packets that must be delivered quickly, it is
difficult for them to prevent an attacker who can eavesdrop both ends of the
communication from correlating the timing and volume
of traffic entering the anonymity network with traffic leaving it. These
protocols are also vulnerable against active attacks in which an
adversary introduces timing patterns into traffic entering the network and
looks
protocols are similarly vulnerable to an active adversary who introduces
timing patterns into traffic entering the network and looks
for correlated patterns among exiting traffic.
Although some work has been done to frustrate
these attacks, %\footnote{
@ -269,13 +271,13 @@ these attacks, %\footnote{
%}
% Point in the footnote is covered above, yes? -PS
most designs protect primarily against traffic analysis rather than traffic
confirmation (cf.\ Section~\ref{subsec:threat-model}).
confirmation (see Section~\ref{subsec:threat-model}).
The simplest low-latency designs are single-hop proxies such as the
{\bf Anonymizer} \cite{anonymizer}, wherein a single trusted server strips the
{\bf Anonymizer} \cite{anonymizer}: a single trusted server strips the
data's origin before relaying it. These designs are easy to
analyze, but users must trust the anonymizing proxy.
Concentrating the traffic to a single point increases the anonymity set
Concentrating the traffic to this single point increases the anonymity set
(the people a given user is hiding among), but it is vulnerable if the
adversary can observe all traffic going into and out of the proxy.
@ -305,6 +307,7 @@ stronger anonymity at the cost of allowing a single user to shut
down the network simply by not sending. Systems like {\bf ISDN mixes}
\cite{isdn-mixes} were designed for other environments with
different assumptions.
%XXX please can we fix this sentence to something less demeaning
In P2P designs like {\bf Tarzan} \cite{tarzan:ccs02} and {\bf MorphMix}
\cite{morphmix:fc04}, all participants both generate traffic and relay
@ -313,7 +316,7 @@ whether a given peer originated a request
or just relayed it from another peer. While Tarzan and MorphMix use
layered encryption as above, {\bf Crowds} \cite{crowds-tissec} simply assumes
an adversary who cannot observe the initiator: it uses no public-key
encryption, so any node on a circuit can read that circuit's traffic.
encryption, so any node on a circuit can read the circuit's traffic.
{\bf Hordes} \cite{hordes-jcs} is based on Crowds but also uses multicast
responses to hide the initiator. {\bf Herbivore} \cite{herbivore} and
@ -346,7 +349,7 @@ and anonymity. For example, a system that understands HTTP, such as Crowds,
can strip
identifying information from those requests, can take advantage of caching
to limit the number of requests that leave the network, and can batch
or encode those requests in order to minimize the number of connections.
or encode those requests to minimize the number of connections.
On the other hand, an IP-level anonymizer can handle nearly any protocol,
even ones unforeseen by its designers (though these systems require
kernel-level modifications to some operating systems, and so are more
@ -407,7 +410,7 @@ therefore not
require modifying applications; should not introduce prohibitive delays;
and should require users to make as few configuration decisions
as possible. Finally, Tor should be easily implemented on all common
platforms; we cannot require users to change their operating system in order
platforms; we cannot require users to change their operating system
to be anonymous. (The current Tor implementation runs on Windows and
assorted Unix clones including Linux, FreeBSD, and MacOS X.)
@ -452,9 +455,9 @@ as Privoxy to hide differences between clients, and expunge protocol
features that leak identity.
Note that by this separation Tor can also provide services that
are anonymous to the network yet authenticated to the responder, like
SSH. Similarly, Tor does not currently integrate
tunneling for non-stream-based protocols like UDP; this too must be
provided by an external service.
SSH. Similarly, Tor does not integrate
tunneling for non-stream-based protocols like UDP; this must be
provided by an external service if appropriate.
\textbf{Not steganographic:} Tor does not try to conceal who is connected
to the network.
@ -497,10 +500,9 @@ network stops; or by introducing patterns into traffic that can later be
detected. The adversary might subvert the directory servers to give users
differing views of network state. Additionally, he can try to decrease
the network's reliability by attacking nodes or by performing antisocial
activities from reliable servers and trying to get them taken down;
making the network unreliable flushes users to other less anonymous
systems, where they may be easier to attack.
We summarize
activities from reliable nodes and trying to get them taken down---making
the network unreliable flushes users to other less anonymous
systems, where they may be easier to attack. We summarize
in Section~\ref{sec:attacks} how well the Tor design defends against
each of these attacks.
@ -863,8 +865,8 @@ Volunteers are generally more willing to run services that can limit
their own bandwidth usage. To accommodate them, Tor servers use a
token bucket approach \cite{tannenbaum96} to
enforce a long-term average rate of incoming bytes, while still
permitting short-term bursts above the allowed bandwidth. Current bucket
sizes are set to ten seconds' worth of traffic.
permitting short-term bursts above the allowed bandwidth.
% Current bucket sizes are set to ten seconds' worth of traffic.
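For illustration, a minimal token-bucket sketch in Python (the names and structure here are ours, not those of the deployed C implementation): tokens denote bytes, the bucket refills at the configured long-term rate, and the bucket size caps how large a short-term burst can be.
\begin{verbatim}
import time

class TokenBucket:
    """Illustrative sketch: long-term average rate, bounded bursts."""

    def __init__(self, rate_bytes_per_sec, bucket_size_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = bucket_size_bytes
        self.tokens = bucket_size_bytes       # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + self.rate * (now - self.last_refill))
        self.last_refill = now

    def consume(self, nbytes):
        """Return True if nbytes may be read now, False to defer the read."""
        self._refill()
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
\end{verbatim}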
%Further, we want to avoid starving any Tor streams. Entire circuits
%could starve if we read greedily from connections and one connection
@ -878,13 +880,13 @@ Because the Tor protocol generates roughly the same number of outgoing
bytes as incoming bytes, it is sufficient in practice to limit only
incoming bytes.
With TCP streams, however, the correspondence is not one-to-one:
relaying a single incoming byte can require an entire 256-byte cell.
relaying a single incoming byte can require an entire 512-byte cell.
(We can't just wait for more bytes, because the local application may
be waiting for a reply.) Therefore, we treat this case as if the entire
cell size had been read, regardless of the fullness of the cell.
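A sketch of this accounting rule, assuming a 512-byte cell carrying roughly 500 bytes of payload (the real relay-cell layout differs) and a rate limiter with a \texttt{consume()} method such as the token bucket above:
\begin{verbatim}
CELL_SIZE = 512      # bytes on the wire
PAYLOAD_LEN = 500    # approximate payload per relay cell

def package_stream_data(data, bucket):
    """Split application bytes into fixed-size relay cells, padding the
    last one, and charge the rate limiter for whole cells no matter how
    little payload each actually carries."""
    cells = []
    for i in range(0, len(data), PAYLOAD_LEN):
        chunk = data[i:i + PAYLOAD_LEN]
        cells.append(chunk.ljust(PAYLOAD_LEN, b'\x00'))  # pad to full size
        bucket.consume(CELL_SIZE)  # a 1-byte chunk still costs a whole cell
        # (a fuller sketch would defer packaging when consume() returns False)
    return cells
\end{verbatim}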
Further, inspired by Rennhard et al.'s design in \cite{anonnet}, a
circuit's edges heuristically distinguish interactive streams from bulk
circuit's edges can heuristically distinguish interactive streams from bulk
streams by comparing the frequency with which they supply cells. We can
provide good latency for interactive streams by giving them preferential
service, while still giving good overall throughput to the bulk
@ -950,10 +952,8 @@ Currently, non-data relay cells do not affect the windows. Thus we
avoid potential deadlock issues, for example, arising because a stream
can't send a \emph{relay sendme} cell when its packaging window is empty.
These arbitrarily chosen parameters
%are probably not optimal; more
%research remains to find which parameters
seem to give tolerable throughput and delay; more research remains.
These arbitrarily chosen parameters seem to give tolerable throughput
and delay; see Section~\ref{sec:in-the-wild}.
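A stream-level sketch of this windowing scheme, using the 500-cell figure quoted in Section~\ref{sec:in-the-wild}; the sendme increment of 50 below is illustrative, not taken from the specification.
\begin{verbatim}
STREAM_WINDOW_START = 500   # initial packaging window, in data cells
SENDME_INCREMENT = 50       # illustrative acknowledgement granularity

class StreamWindow:
    def __init__(self):
        self.package_window = STREAM_WINDOW_START  # data cells we may still send
        self.delivered = 0                         # data cells since last sendme

    # Sending edge: only data cells affect the window.
    def can_package(self):
        return self.package_window > 0

    def note_data_cell_sent(self):
        self.package_window -= 1

    def note_sendme_received(self):
        self.package_window += SENDME_INCREMENT

    # Receiving edge: acknowledge every SENDME_INCREMENT delivered data cells.
    def note_data_cell_delivered(self, send_sendme):
        self.delivered += 1
        if self.delivered >= SENDME_INCREMENT:
            self.delivered = 0
            send_sendme()
\end{verbatim}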
\Section{Other design decisions}
@ -984,8 +984,8 @@ TLS handshakes or accepting \emph{create} cells. So long as these
tokens are easy to verify and computationally expensive to produce, this
approach limits the attack multiplier. Additionally, ORs may limit
the rate at which they accept create cells and TLS connections, so that
the computational work of processing them does not drown out the (comparatively
inexpensive) work of symmetric cryptography needed to keep cells
the computational work of processing them does not drown out the
symmetric cryptography operations that keep cells
flowing. This rate limiting could, however, allow an attacker
to slow down other users when they build new circuits.
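As an illustration of the ``easy to verify, computationally expensive to produce'' property, here is a generic hashcash-style puzzle in Python; this construction is not part of the Tor protocol.
\begin{verbatim}
import hashlib
import os

DIFFICULTY_BITS = 20    # illustrative work factor

def leading_zero_bits(digest):
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve_puzzle(challenge):
    """Client side (expensive): find a nonce whose hash with the
    server-supplied challenge has DIFFICULTY_BITS leading zero bits."""
    while True:
        nonce = os.urandom(8)
        digest = hashlib.sha256(challenge + nonce).digest()
        if leading_zero_bits(digest) >= DIFFICULTY_BITS:
            return nonce

def verify_puzzle(challenge, nonce):
    """Server side (cheap): a single hash suffices to check the token."""
    digest = hashlib.sha256(challenge + nonce).digest()
    return leading_zero_bits(digest) >= DIFFICULTY_BITS
\end{verbatim}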
@ -1189,8 +1189,8 @@ Using directory servers is simpler and more flexible than flooding.
Flooding is expensive, and complicates the analysis when we
start experimenting with non-clique network topologies. Signed
directories can be cached by other
onion routers.
Thus directory servers are not a performance
onion routers,
so directory servers are not a performance
bottleneck when we have many users, and do not aid traffic analysis by
forcing clients to periodically announce their existence to any
central point.
@ -1414,7 +1414,7 @@ been shown to be effective against SafeWeb \cite{hintz-pet02}.
It may be less effective against Tor, since
streams are multiplexed within the same circuit, and
fingerprinting will be limited to
the granularity of cells (currently 256 bytes). Additional
the granularity of cells (currently 512 bytes). Additional
defenses could include
larger cell sizes, padding schemes to group websites
into large sets, and link
@ -1610,6 +1610,64 @@ point is no more sensitive than any other OR on
a circuit, since all data passing through the rendezvous is encrypted
with a session key shared by Alice and Bob.
\Section{Early experiences: Tor in the Wild}
\label{sec:in-the-wild}
The current Tor network, as of mid January 2004, consists of 16 nodes
(14 in the US, 2 in Europe), and we're adding more each week as the code
gets more robust.\footnote{For comparison, the current remailer network
has about 30 nodes.} Each node has at least a 768k/768k connection, and
most have 10Mb. The number of users varies (and of course, it's hard to
tell for sure), but we sometimes have several hundred users---admins at
several companies have started putting their entire department's web
traffic through Tor, to prevent admins in other divisions of
their company from snooping on that traffic. Tor users have reported using
the network for web browsing, ftp, IRC, AIM, Kazaa, and ssh.
As of mid January, each Tor node was processing roughly 800,000 relay
cells (a bit under half a gigabyte) per week. On average, about 80\%
of each 500-byte payload is full for cells going back to the client,
whereas about 40\% is full for cells coming from the client. (They
differ because most of our traffic is web browsing.) Interactive
traffic like ssh brings down the average a lot---once we have more
experience, and assuming we can resolve the anonymity issues, we will
consider partitioning traffic into two relay cell sizes: one to handle
bulk traffic and one for interactive traffic.
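As a consistency check on these figures, using the 512-byte cell size:
\[
8\times 10^{5}\ \mathrm{cells/week} \times 512\ \mathrm{B/cell}
  \approx 4.1\times 10^{8}\ \mathrm{B} \approx 410\ \mathrm{MB/week},
\]
which is indeed a bit under half a gigabyte per week.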
We haven't asked to use PlanetLab \cite{planetlab} to provide more nodes,
because their AUP excludes projects like Tor (see also \cite{darkside}). On
the other hand, we have had no abuse issues since the network was deployed
in October 2003. Our default exit policy rejects smtp requests, to block
spamming even before it becomes an issue. For now we're happy with our
slow growth rate, while we add features, resolve bugs, and get a feel for
what users actually want from an anonymity system. We are not eager to
attract the Kazaa or warez communities, even though they would greatly
bolster the anonymity sets---we must build a reputation for privacy,
human rights, research, and other entirely legitimate activities.
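For illustration, a simplified version of the exit-policy check mentioned above, in Python; the rule syntax and matching order here are ours, not Tor's configuration format.
\begin{verbatim}
# Default policy: refuse outgoing mail (port 25), accept everything else.
DEFAULT_EXIT_POLICY = [
    ("reject", "*", 25),
    ("accept", "*", "*"),
]

def exit_allowed(policy, address, port):
    """Return True if this node's exit policy permits connecting to
    (address, port). The first matching rule wins."""
    for action, addr_pattern, port_pattern in policy:
        addr_ok = addr_pattern == "*" or addr_pattern == address
        port_ok = port_pattern == "*" or port_pattern == port
        if addr_ok and port_ok:
            return action == "accept"
    return False   # no rule matched: refuse by default
\end{verbatim}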
As for performance, profiling shows that almost all the CPU time for the
Tor program itself is spent in AES (which is fast). Thus latency comes
from two sources. First, network latency is significant: we are
intentionally bouncing traffic around the world several times. Second,
our end-to-end congestion control algorithm focuses on protecting our
volunteer servers from accidental DoS rather than providing maximum
performance. Right now the first $500 \times 500\,\mathrm{B} = 250\,\mathrm{KB}$ of the stream arrives
quickly, and after that throughput depends on the rate that \emph{relay
sendme} acknowledgements arrive. We can tweak the congestion control
parameters to provide faster throughput at the expense of requiring
larger buffers at each node; adding the heuristics mentioned in
Section~\ref{subsec:rate-limit} to give better speed to low-volume
streams will change the equation too. More research remains to find the
right balance.
%performs badly on lossy networks. may need airhook or something else as
%transport alternative?
With the current network's topology and load, users can typically
get 1--2 megabits sustained transfer rate. Overall, this performance is
sufficient. The Tor design focuses on security; usability and performance
need only be good enough that users are not driven away to less
anonymous alternatives.
\Section{Open Questions in Low-latency Anonymity}
\label{sec:maintaining-anonymity}