\documentclass{llncs}

\usepackage{url}
\usepackage{amsmath}
\usepackage{epsfig}

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
  \setlength{\itemsep}{0mm}
  \setlength{\parsep}{0mm}
  % \setlength{\labelsep}{0mm}
  % \setlength{\labelwidth}{0mm}
  % \setlength{\topsep}{0mm}
  }}{\end{list}}

\begin{document}

\title{Challenges in practical low-latency stream anonymity (DRAFT)}

\author{Roger Dingledine and Nick Mathewson}
\institute{The Free Haven Project\\
\email{\{arma,nickm\}@freehaven.net}}

\maketitle
\pagestyle{empty}

\begin{abstract}
foo
\end{abstract}

\section{Introduction}

Tor is a low-latency anonymous communication overlay network designed
|
|
to be practical and usable for protecting TCP streams over the
|
|
Internet~\cite{tor-design}. We have been operating a publicly deployed
|
|
Tor network since October 2003 that has grown to over a hundred volunteer
|
|
nodes and carries on average over 70 megabits of traffic per second.
|
|
|
|
Tor has a weaker threat model than many anonymity designs in the
|
|
literature, because our foremost goal is to deploy a
|
|
practical and useful network for interactive (low-latency) communications.
|
|
Subject to this restriction, we try to
|
|
provide as much anonymity as we can. In particular, because we
|
|
support interactive communications without impractically expensive padding,
|
|
we fall prey to a variety
|
|
of intra-network~\cite{attack-tor-oak05,flow-correlation04,bar} and
|
|
end-to-end~\cite{danezis-pet2004,SS03} anonymity-breaking attacks.
|
|
|
|
Tor is secure so long as adversaries are unable to
|
|
observe connections as they both enter and leave the Tor network.
|
|
Therefore, Tor's defense lies in having a diverse enough set of servers
|
|
that most real-world
|
|
adversaries are unlikely to be in the right places to attack users.
|
|
Specifically,
|
|
Tor aims to resist observers and insiders by distributing each transaction
|
|
over several nodes in the network. This ``distributed trust'' approach
|
|
means the Tor network can be safely operated and used by a wide variety
|
|
of mutually distrustful users, providing more sustainability and security
|
|
than some previous attempts at anonymizing networks.
|
|
The Tor network has a broad range of users, including ordinary citizens
|
|
concerned about their privacy, corporations
|
|
who don't want to reveal information to their competitors, and law
|
|
enforcement and government intelligence agencies who need
|
|
to do operations on the Internet without being noticed.
|
|
|
|
Tor research and development has been funded by the U.S. Navy, for use
|
|
in securing government
|
|
communications, and also by the Electronic Frontier Foundation, for use
|
|
in maintaining civil liberties for ordinary citizens online. The Tor
|
|
protocol is one of the leading choices
|
|
to be the anonymizing layer in the European Union's PRIME directive to
|
|
help maintain privacy in Europe. The University of Dresden in Germany
|
|
has integrated an independent implementation of the Tor protocol into
|
|
their popular Java Anon Proxy anonymizing client. This wide variety of
|
|
interests helps maintain both the stability and the security of the
|
|
network.
|
|
|
|
%awk
|
|
Tor's principal research strategy, in attempting to deploy a network that is
|
|
practical, useful, and anonymous, has been to insist, when trade-offs arise
|
|
between these properties, on remaining useful enough to attract many users,
|
|
and practical enough to support them. Subject to these
|
|
constraints, we aim to maximize anonymity. This is not the only possible
|
|
direction in anonymity research: designs exist that provide more anonymity
|
|
than Tor at the expense of significantly increased resource requirements, or
|
|
decreased flexibility in application support (typically because of increased
|
|
latency). Such research does not typically abandon aspirations towards
|
|
deployability or utility, but instead tries to maximize deployability and
|
|
utility subject to a certain degree of inherent anonymity (inherent because
|
|
usability and practicality affect usage which affects the actual anonymity
|
|
provided by the network \cite{back01,econymics}). We believe that these
|
|
approaches can be promising and useful, but that by focusing on deploying a
|
|
usable system in the wild, Tor helps us experiment with the actual parameters
|
|
of what makes a system ``practical'' for volunteer operators and ``useful''
|
|
for home users, and helps illuminate undernoticed issues which any deployed
|
|
volunteer anonymity network will need to address.
|
|
|
|
While~\cite{tor-design} gives an overall view of the Tor design and goals,
|
|
this paper describes the policy and technical issues that Tor faces as
|
|
we continue deployment. Rather than trying to provide complete solutions
|
|
to every problem here, we lay out the assumptions and constraints
|
|
that we have observed through deploying Tor in the wild. In doing so, we
|
|
aim to create a research agenda for others to
|
|
help in addressing these issues. Section~\ref{sec:what-is-tor} gives an
|
|
overview of the Tor
|
|
design and our goals. Sections~\ref{sec:crossroads-policy}
and~\ref{sec:crossroads-technical} go on to describe the practical challenges,
policy and technical respectively, that stand in the way of moving
from a practical useful network to a practical useful anonymous network.
|
|
|
|
\section{What Is Tor}
|
|
\label{sec:what-is-tor}
|
|
|
|
Here we give a basic overview of the Tor design and its properties. For
|
|
details on the design, assumptions, and security arguments, we refer
|
|
the reader to~\cite{tor-design}.
|
|
|
|
\subsection{Distributed trust: safety in numbers}
|
|
|
|
Tor provides \emph{forward privacy}, so that users can connect to
|
|
Internet sites without revealing their logical or physical locations
|
|
to those sites or to observers. It also provides \emph{location-hidden
|
|
services}, so that critical servers can support authorized users without
|
|
giving adversaries an effective vector for physical or online attacks.
|
|
The design provides this protection even when a portion of its own
|
|
infrastructure is controlled by an adversary.
|
|
|
|
To create a private network pathway with Tor, the user's software (client)
|
|
incrementally builds a \emph{circuit} of encrypted connections through
|
|
servers on the network. The circuit is extended one hop at a time, and
|
|
each server along the way knows only which server gave it data and which
|
|
server it is giving data to. No individual server ever knows the complete
|
|
path that a data packet has taken. The client negotiates a separate set
|
|
of encryption keys for each hop along the circuit to ensure that each
|
|
hop can't trace these connections as they pass through.
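
To make the layering concrete, here is a minimal sketch in our own
notation (the symbols $k_i$, $E$, and $D$ and the choice of three hops
are illustrative assumptions, not a specification of the cell format).
Suppose the client shares a symmetric key $k_i$ with hop $i$ of a
three-hop circuit, and let $E_k$ and $D_k$ denote encryption and
decryption under key $k$. To send a payload $m$, the client wraps it
once per hop, with the innermost layer for the last hop:
\[
  c_1 = E_{k_1}\bigl(E_{k_2}\bigl(E_{k_3}(m)\bigr)\bigr).
\]
Each hop removes exactly one layer before forwarding:
\begin{align*}
  c_2 &= D_{k_1}(c_1) = E_{k_2}\bigl(E_{k_3}(m)\bigr), \\
  c_3 &= D_{k_2}(c_2) = E_{k_3}(m), \\
  m   &= D_{k_3}(c_3).
\end{align*}
Hop $i$ holds only $k_i$, so it learns its predecessor and successor
but cannot read the layers intended for other hops; only the last hop
recovers $m$.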
|
|
|
|
Once a circuit has been established, many kinds of data can be exchanged
|
|
and several different sorts of software applications can be deployed over
|
|
the Tor network. Because each server sees no more than one hop in the
|
|
circuit, neither an eavesdropper nor a compromised server can use traffic
|
|
analysis to link the connection's source and destination. Tor only works
|
|
for TCP streams and can be used by any application with SOCKS support.
|
|
|
|
For efficiency, the Tor software uses the same circuit for connections
|
|
that happen within the same minute or so. Later requests are given a new
|
|
circuit, to prevent long-term linkability between different actions by
|
|
a single user.
|
|
|
|
Tor also makes it possible for users to hide their locations while
|
|
offering various kinds of services, such as web publishing or an instant
|
|
messaging server. Using Tor ``rendezvous points'', other Tor users can
|
|
connect to these hidden services, each without knowing the other's network
|
|
identity.
|
|
%This hidden service functionality could allow Tor users to
|
|
%set up a website where people publish material without worrying about
|
|
%censorship. Nobody would be able to determine who was offering the site,
|
|
%and nobody who offered the site would know who was posting to it.
|
|
|
|
Tor attempts to anonymize the transport layer, not the application layer, so
|
|
application protocols that include personally identifying information need
|
|
additional application-level scrubbing proxies, such as
|
|
Privoxy~\cite{privoxy} for HTTP. Furthermore, Tor does not permit arbitrary
|
|
IP packets; it only anonymizes TCP and DNS, and only supports connections
via SOCKS (see Section~\ref{subsec:stream-vs-packet}).
|
|
|
|
Tor differs from other deployed systems for traffic analysis resistance
|
|
in its security and flexibility. Mix networks such as
|
|
Mixmaster~\cite{mixmaster} or its successor Mixminion~\cite{minion-design}
|
|
gain the highest degrees of anonymity at the expense of introducing highly
|
|
variable delays, thus making them unsuitable for applications such as web
|
|
browsing that require quick response times. Commercial single-hop
|
|
proxies~\cite{anonymizer} present a single point of failure, where
|
|
a single compromise can expose all users' traffic, and a single-point
|
|
eavesdropper can perform traffic analysis on the entire network.
|
|
Also, their proprietary implementations place any infrastructure that
|
|
depends on these single-hop solutions at the mercy of their providers'
|
|
financial health as well as network security.
|
|
|
|
No organization can achieve this security on its own. If a single
|
|
corporation or government agency were to build a private network to
|
|
protect its operations, any connections entering or leaving that network
|
|
would be obviously linkable to the controlling organization. The members
|
|
and operations of that agency would be easier, not harder, to distinguish.
|
|
|
|
Instead, to protect our networks from traffic analysis, we must
|
|
collaboratively blend the traffic from many organizations and private
|
|
citizens, so that an eavesdropper can't tell which users are which,
|
|
and who is looking for what information. By bringing more users onto
|
|
the network, all users become more secure \cite{econymics}.
|
|
|
|
Naturally, organizations will not want to depend on others for their
|
|
security. If most participating providers are reliable, Tor tolerates
|
|
some hostile infiltration of the network. For maximum protection,
|
|
the Tor design includes an enclave approach that lets data be encrypted
|
|
(and authenticated) end-to-end, so high-sensitivity users can be sure it
|
|
hasn't been read or modified. This even works for Internet services that
|
|
don't have built-in encryption and authentication, such as unencrypted
|
|
HTTP or chat, and it requires no modification of those services to do so.
|
|
|
|
weasel's graph of \# nodes and of bandwidth, ideally from week 0.
|
|
|
|
Tor doesn't try to provide steg (but see Sec \ref{china}), or
|
|
the other non-goals listed in tor-design.
|
|
|
|
[arma will do this part]
|
|
|
|
Tor is not the only anonymity system that aims to be practical and useful.
|
|
Commercial single-hop proxies~\cite{anonymizer}, as well as unsecured
|
|
open proxies around the Internet~\cite{open-proxies}, can provide good
|
|
performance and some security against a weaker attacker. Dresden's Java
|
|
Anon Proxy~\cite{jap} provides similar functionality to Tor but only
|
|
handles web browsing rather than arbitrary TCP. Also, JAP's network
|
|
topology uses cascades (fixed routes through the network); since without
|
|
end-to-end padding it is just as vulnerable as Tor to end-to-end timing
|
|
attacks, its dispersal properties are therefore worse than Tor's.
|
|
%Some peer-to-peer file-sharing overlay networks such as
|
|
%Freenet~\cite{freenet} and Mute~\cite{mute}
|
|
Zero-Knowledge Systems' commercial Freedom
|
|
network~\cite{freedom21-security} was even more flexible than Tor in
|
|
that it could transport arbitrary IP packets, and it also supported
|
|
pseudonymous access rather than just anonymous access; but it had
|
|
a different approach to sustainability (collecting money from users
|
|
and paying ISPs to run servers), and has shut down due to financial
|
|
load. Finally, more scalable designs like Tarzan~\cite{tarzan} and
|
|
MorphMix~\cite{morphmix} have been proposed in the literature, but
|
|
have not yet been fielded. We direct the interested reader to Section
|
|
2 of~\cite{tor-design} for a more in-depth review of related work.
|
|
|
|
%six-four. crowds. i2p.
|
|
|
|
|
|
have a serious discussion of morphmix's assumptions, since they would
|
|
seem to be the direct competition. in fact tor is a flexible architecture
|
|
that would encompass morphmix, and they're nearly identical except for
|
|
path selection and node discovery. and the trust system morphmix has
|
|
seems overkill (and/or insecure) based on the threat model we've picked.
|
|
% this para should probably move to the scalability / directory system. -RD
|
|
|
|
\section{Threat model}
|
|
|
|
Tor does not attempt to defend against a global observer. Any adversary who
|
|
can see a user's connection to the Tor network, and who can see the
|
|
corresponding connection as it exits the Tor network, can use the timing
|
|
correlation between the two connections to confirm the user's chosen
|
|
communication partners. Defeating this attack would seem to require
|
|
introducing a prohibitive degree of traffic padding between the user and the
|
|
network, or introducing an unacceptable degree of latency (but see
|
|
\ref{subsec:mid-latency} below). Thus, Tor only
attempts to defend against external observers who cannot observe both sides of a
user's connection.
|
|
|
|
Against internal attackers, who sign up Tor servers, the situation is more
|
|
complicated. In the simplest case, if an adversary has compromised $c$ of
|
|
$n$ servers on the Tor network, then the adversary will be able to compromise
|
|
a random circuit with probability $\frac{c^2}{n^2}$ (since the circuit
|
|
initiator chooses hops randomly). But there are
|
|
complicating factors:
|
|
\begin{tightlist}
|
|
\item If the user continues to build random circuits over time, an adversary
|
|
is pretty certain to see a statistical sample of the user's traffic, and
|
|
thereby can build an increasingly accurate profile of her behavior. (See
|
|
\ref{subsec:helper-nodes} for possible solutions.)
|
|
\item If an adversary controls a popular service outside of the Tor network,
|
|
he can be certain of observing all connections to that service; he
|
|
therefore will trace connections to that service with probability
|
|
$\frac{c}{n}$.
|
|
\item Users do not in fact choose servers with uniform probability; they
|
|
favor servers with high bandwidth, and exit servers that permit connections
|
|
to their favorite services.
|
|
\end{tightlist}
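
As a rough worked example of the first two points, under the simplifying
assumption of uniform and independent hop selection (which the last item
above notes does not hold in practice), take the illustrative values
$n = 100$ and $c = 10$; these numbers are ours, chosen only for
arithmetic convenience. Then
\[
  \Pr[\text{a given circuit is compromised}] = \frac{c^2}{n^2}
    = \frac{10^2}{100^2} = 0.01,
  \qquad
  \Pr[\text{traced to an adversary-run service}] = \frac{c}{n} = 0.1 .
\]
If the user builds $k$ independent circuits, the probability that at
least one of them is compromised is
\[
  1 - \Bigl(1 - \frac{c^2}{n^2}\Bigr)^{k},
\]
which for these values is about $0.63$ at $k = 100$ and exceeds $0.99$
at $k = 500$. So even an adversary with a small per-circuit success
rate should expect to sample a persistent user's traffic.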
|
|
|
|
%discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning
|
|
%the last hop is not $c/n$ since that doesn't take the destination (website)
|
|
%into account. so in cases where the adversary does not also control the
|
|
%final destination we're in good shape, but if he *does* then we'd be better
|
|
%off with a system that lets each hop choose a path.
|
|
%
|
|
%Isn't it more accurate to say ``If the adversary _always_ controls the final
|
|
% dest, we would be just as well off with such as system.'' ? If not, why
|
|
% not? -nm
|
|
|
|
in practice tor's threat model is based entirely on the goal of dispersal
|
|
and diversity. george and steven describe an attack \cite{draft} that
|
|
lets them determine the nodes used in a circuit; yet they can't identify
|
|
alice or bob through this attack. so it's really just the endpoints that
|
|
remain secure. and the enclave model seems particularly threatened by
|
|
this, since this attack lets us identify endpoints when they're servers.
|
|
see \ref{subsec:helper-nodes} for discussion of some ways to address this
|
|
issue.
|
|
|
|
see \ref{subsec:routing-zones} for discussion of larger
|
|
adversaries and our dispersal goals.
|
|
|
|
[this section will get written once the rest of the paper is farther along]
|
|
|
|
\section{Crossroads: Policy issues}
|
|
\label{sec:crossroads-policy}
|
|
|
|
Many of the issues the Tor project needs to address are not just a
|
|
matter of system design or technology development. In particular, the
|
|
Tor project's \emph{image} with respect to its users and the rest of
|
|
the Internet impacts the security it can provide.
|
|
|
|
As an example to motivate this section, some U.S.~Department of Energy
|
|
penetration testing engineers are tasked with compromising DoE computers
|
|
from the outside. They only have a limited number of ISPs from which to
|
|
launch their attacks, and they found that the defenders were recognizing
|
|
attacks because they came from the same IP space. These engineers wanted
|
|
to use Tor to hide their tracks. First, from a technical standpoint,
|
|
Tor does not support the variety of IP packets one would like to use in
|
|
such attacks (see Section~\ref{subsec:stream-vs-packet}). But aside from this,
|
|
we also decided that it would probably be poor precedent to encourage
|
|
such use---even legal use that improves national security---and managed
|
|
to dissuade them.
|
|
|
|
With this image issue in mind, here we discuss the Tor user base and
|
|
Tor's interaction with other services on the Internet.
|
|
\subsection{Image and reputability}
|
|
|
|
Image: substantial non-infringing uses. Image is a security parameter,
|
|
since it impacts user base and perceived sustainability.
|
|
|
|
grab reputability paragraphs from usability.tex [arma will do this]
|
|
|
|
A Tor gui, how jap's gui is nice but does not reflect the security
|
|
they provide.
|
|
Public perception, and thus advertising, is a security parameter.
|
|
|
|
good uses are kept private, bad uses are publicized. not good.
|
|
|
|
users do not correlate to anonymity. arma will do this.
|
|
|
|
\subsection{Usability and bandwidth and sustainability and incentives}
|
|
|
|
low-pain-threshold users go away until all users are willing to use it
|
|
|
|
Sustainability. Previous attempts have been commercial which we think
|
|
adds a lot of unnecessary complexity and accountability. Freedom didn't
|
|
collect enough money to pay its servers; JAP bandwidth is supported by
|
|
continued money, and they periodically ask what they will do when it
|
|
dries up.
|
|
|
|
"outside of academia, jap has just lost, permanently"
|
|
|
|
Usability: fc03 paper was great, except the lower latency you are the
|
|
less useful it seems it is.
|
|
|
|
[nick will write this section]
|
|
|
|
\subsection{Tor and file-sharing}
|
|
|
|
[nick will write this section]
|
|
|
|
Bittorrent and dmca. Should we add an IDS to autodetect protocols and
|
|
snipe them?
|
|
|
|
because only at the exit is it evident what port or protocol a given
|
|
tor stream is, you can't choose not to carry file-sharing traffic.
|
|
|
|
hibernation vs rate-limiting: do we want diversity or throughput? i
|
|
think we're shifting back to wanting diversity.
|
|
|
|
\subsection{Tor and blacklists}
|
|
|
|
Takedowns and efnet abuse and wikipedia complaints and irc
|
|
networks.
|
|
|
|
It was long expected that, alongside Tor's legitimate users, it would also
|
|
attract troublemakers who exploited Tor in order to abuse services on the
|
|
Internet. Our initial answer to this situation was to use ``exit policies''
|
|
to allow individual Tor servers to block access to specific IP/port ranges.
|
|
This approach was meant to make operators more willing to run Tor by allowing
|
|
them to prevent their servers from being used for abusing particular
|
|
services. For example, all Tor servers currently block SMTP (port 25), in
|
|
order to avoid being used to send spam.
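
As an illustration, a server operator who wants to refuse outgoing SMTP
but carry everything else might use a policy along the following lines.
This is a sketch of the configuration-file syntax as we understand it;
the particular rules are an example, not a recommended or default policy.
\begin{verbatim}
ExitPolicy reject *:25
ExitPolicy accept *:*
\end{verbatim}
Rules are considered in order and the first match applies, so the
{\tt reject} line must come before the catch-all {\tt accept}.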
|
|
|
|
This approach is useful, but is insufficient for two reasons. First, since
|
|
it is not possible to force all ORs to block access to any given service,
|
|
many of those services try to block Tor instead. More broadly, while being
|
|
blockable is important to being good netizens, we would like to encourage
|
|
services to allow anonymous access; services should not need to decide
|
|
between blocking legitimate anonymous use and allowing unlimited abuse.
|
|
|
|
This is potentially a bigger problem than it may appear.
On the one hand, if people want to refuse connections from you on
their servers, it would seem that they should be allowed to. But a
possible major problem with the blocking of Tor is that the trade-off
does not fall only on the individual server administrator, who is deciding
whether to post to Wikipedia from his Tor node's address or to let
people read Wikipedia anonymously through his Tor node. If, for example,
he or she comes through a campus or corporate NAT, then the choice
is between the entire population behind that NAT hosting a Tor exit
node and that population keeping write access to Wikipedia. This is a
loss for both of us (Tor and Wikipedia). We don't want to compete for
(or divvy up) the NAT-protected entities of the world.
|
|
|
|
(A related problem is that many IP blacklists are not terribly fine-grained.
|
|
No current IP blacklist, for example, allows a service provider to blacklist
|
|
only those Tor servers that allow access to a specific IP or port, even
|
|
though this information is readily available. One IP blacklist even bans
|
|
every class C network that contains a Tor server, and recommends banning SMTP
|
|
from these networks even though Tor does not allow SMTP at all.)
|
|
|
|
Problems of abuse occur mainly with services such as IRC networks and
|
|
Wikipedia, which rely on IP-blocking to ban abusive users. While at first
|
|
blush this practice might seem to depend on the anachronistic assumption that
|
|
each IP is an identifier for a single user, it is actually more reasonable in
|
|
practice: it assumes that non-proxy IPs are a costly resource, and that an
|
|
abuser cannot change IPs at will. By blocking IPs which are used by Tor
|
|
servers, open proxies, and service abusers, these systems hope to make
|
|
ongoing abuse difficult. Although the system is imperfect, it works
|
|
tolerably well for them in practice.
|
|
|
|
But of course, we would prefer that legitimate anonymous users be able to
|
|
access abuse-prone services. One conceivable approach would be to require
|
|
would-be IRC users, for instance, to register accounts if they wanted to
|
|
access the IRC network from Tor. But in practice, this would not
|
|
significantly impede abuse if creating new accounts were easily automatable;
|
|
this is why services use IP blocking. In order to deter abuse, pseudonymous
|
|
identities need to impose a significant switching cost in resources or human
|
|
time.
|
|
|
|
One approach, similar to that taken by Freedom, would be to bootstrap some
|
|
non-anonymous costly identification mechanism to allow access to a
|
|
blind-signature pseudonym protocol. This would effectively create costly
|
|
pseudonyms, which services could require in order to allow anonymous access.
|
|
This approach has difficulties in practice, however:
|
|
\begin{tightlist}
|
|
\item Unlike Freedom, Tor is not a commercial service. Therefore, it would
|
|
be a shame to require payment in order to make Tor useful, or to make
|
|
non-paying users second-class citizens.
|
|
\item It is hard to think of an underlying resource that would actually work.
|
|
We could use IP addresses, but that's the problem, isn't it?
|
|
\item Managing single sign-on services is not considered a well-solved
|
|
problem in practice. If Microsoft can't get universal acceptance for
|
|
Passport, why do we think that a Tor-specific solution would do any good?
|
|
\item Even if we came up with a perfect authentication system for our needs,
|
|
there's no guarantee that any service would actually start using it. It
|
|
would require a nonzero effort for them to support it, and it might just
|
|
be less hassle for them to block Tor anyway.
|
|
\end{tightlist}
|
|
|
|
Squishy IP-based ``authentication'' and ``authorization'' is a reality
|
|
we must contend with. We should say something more about the analogy
|
|
with SSNs.
|
|
|
|
|
|
|
|
\subsection{Other}
|
|
|
|
[Once you build a generic overlay network, everybody wants to use it.]
|
|
|
|
Tor's scope: How much should Tor aim to do? Applications that leak
|
|
data: we can say they're not our problem, but they're somebody's problem.
|
|
Also, the more widely deployed Tor becomes, the more people who need a
deployed overlay network tell us they'd like to use us if only we added
a few more features. For example, Blossom~\cite{blossom} and
|
|
random community wireless projects both want source-routable overlay
|
|
networks for their own purposes. Fortunately, our modular design separates
|
|
routing from node discovery; so we could implement Morphmix in Tor just
|
|
by implementing the Morphmix-specific node discovery and path selection
|
|
pieces. On the other hand, we could easily get distracted building a
|
|
general-purpose overlay library, and we're only a few developers.
|
|
|
|
[arma will work on this]
|
|
|
|
%Should we allow revocation of anonymity if a threshold of
|
|
%servers want to?
|
|
|
|
Logging. Making logs not revealing. A happy coincidence that verbose
|
|
logging is our \#2 performance bottleneck. Is there a way to detect
|
|
modified servers, or to have them volunteer the information that they're
|
|
logging verbosely? Would that actually solve any attacks?
|
|
|
|
\section{Crossroads: Scaling and Design choices}
|
|
\label{sec:crossroads-design}
|
|
|
|
\subsection{Transporting the stream vs transporting the packets}
|
|
\label{subsec:stream-vs-packet}
|
|
|
|
We periodically run into ex-ZKS employees who tell us that the process of
|
|
anonymizing IPs should ``obviously'' be done at the IP layer. Here are
|
|
the issues that need to be resolved before we'll be ready to switch Tor
|
|
over to arbitrary IP traffic.
|
|
|
|
\begin{enumerate}
|
|
\setlength{\itemsep}{0mm}
|
|
\setlength{\parsep}{0mm}
|
|
\item \emph{IP packets reveal OS characteristics.} We still need to do
|
|
IP-level packet normalization, to stop things like IP fingerprinting
|
|
\cite{ip-fingerprinting}. There exist libraries \cite{ip-normalizing}
|
|
that can help with this.
|
|
\item \emph{Application-level streams still need scrubbing.} We still need
|
|
Tor to be easy to integrate with user-level application-specific proxies
|
|
such as Privoxy. So it's not just a matter of capturing packets and
|
|
anonymizing them at the IP layer.
|
|
\item \emph{Certain protocols will still leak information.} For example,
|
|
DNS requests destined for my local DNS servers need to be rewritten
|
|
to be delivered to some other unlinkable DNS server. This requires
|
|
understanding the protocols we are transporting.
|
|
\item \emph{The crypto is unspecified.} First we need a block-level encryption
|
|
approach that can provide security despite
|
|
packet loss and out-of-order delivery. Freedom allegedly had one, but it was
|
|
never publicly specified, and we believe it's likely vulnerable to tagging
|
|
attacks \cite{tor-design}. Also, TLS over UDP is not implemented or even
|
|
specified, though some early work has begun on that \cite{ben-tls-udp}.
|
|
\item \emph{We'll still need to tune network parameters}. Since the above
|
|
encryption system will likely need sequence numbers and maybe more to do
|
|
replay detection, handle duplicate frames, etc., we will be reimplementing
|
|
some subset of TCP anyway to manage throughput, congestion control, etc.
|
|
\item \emph{Exit policies for arbitrary IP packets mean building a secure
|
|
IDS.} Our server operators tell us that exit policies are one of
|
|
the main reasons they're willing to run Tor over previous attempts
|
|
at anonymizing networks. Adding an IDS to handle exit policies would
|
|
increase the security complexity of Tor, and would likely not work anyway,
|
|
as evidenced by the entire field of IDS and counter-IDS papers. Many
|
|
potential abuse issues are resolved by the fact that Tor only transports
|
|
valid TCP streams (as opposed to arbitrary IP including malformed packets
|
|
and IP floods), so exit policies become even \emph{more} important as
|
|
we become able to transport IP packets. We also need a way to compactly
|
|
characterize the exit policies and let clients parse them to decide
|
|
which nodes will allow which packets to exit.
|
|
\item \emph{The Tor-internal name spaces would need to be redesigned.} We
|
|
support hidden service {\tt{.onion}} addresses, and other special addresses
|
|
like {\tt{.exit}} (see Section \ref{subsec:}), by intercepting the addresses
|
|
when they are passed to the Tor client.
|
|
\end{enumerate}
|
|
|
|
This list is discouragingly long right now, but we recognize that it
|
|
would be good to investigate each of these items in further depth and to
|
|
understand which are actual roadblocks and which are easier to resolve
|
|
than we think. We certainly wouldn't mind if Tor one day is able to
|
|
transport a greater variety of protocols.
|
|
|
|
\subsection{Mid-latency}
|
|
\label{subsec:mid-latency}
|
|
|
|
Though Tor has always been designed to be practical and usable first
|
|
with as much anonymity as can be built in subject to those goals, we
|
|
have contemplated that users might need resistance to at least simple
|
|
traffic confirmation attacks. Raising the latency of communication
|
|
slightly might make this feasible. If the latency could be kept to two
|
|
or three times its current overhead, this might be acceptable to the
|
|
majority of Tor users. However, it might also destroy much of the user
|
|
base, and it is difficult to know in advance. Note also that in
|
|
practice, as the network is growing and we accept cable modem, DSL
|
|
nodes, and more nodes in various continents, we're \emph{already}
|
|
looking at many-second delays for some transactions. The engineering
|
|
required to get this lower is going to be extremely hard. It's worth
|
|
considering how hard it would be to accept the fixed (higher) latency
|
|
and improve the protection we get from it. Thus, it may be most
|
|
practical to run a mid-latency option over the Tor network for those
|
|
users either willing to experiment or in need of more a priori
|
|
anonymity in the network. This will allow us to experiment with both
|
|
the anonymity provided and the interest on the part of users.
|
|
|
|
Adding a mid-latency option should not require significant fundamental
|
|
change to the Tor client or server design; circuits can be labeled as
|
|
low or mid latency on servers as they are set up. Low-latency traffic
|
|
would be processed as now. Packets on circuits that are mid-latency
|
|
would be sent in uniform size chunks at synchronized intervals. To
|
|
some extent the chunking is already done because traffic moves through
|
|
the network in uniform size cells, but this would occur at a coarser
|
|
granularity. If servers forward these chunks in roughly synchronous
|
|
fashion, it will increase the similarity of data stream timing
|
|
signatures. By experimenting with the granularity of data chunks and
|
|
of synchronization we can attempt once again to optimize for both
|
|
usability and anonymity. Unlike in \cite{sync-batch}, it may be
|
|
impractical to synchronize on network batches by dropping chunks from
|
|
a batch that arrive late at a given node---unless Tor moves away from
|
|
stream processing to a more loss-tolerant processing of traffic (cf.\
|
|
section~\ref{subsec:stream-vs-packet}). In other words, there would
|
|
probably be no direct attempt to synchronize on batches of data
|
|
entering the Tor network at the same time. Rather, it is the link-level
batching that will add noise to the traffic patterns exiting the
|
|
network. Similarly, if end-to-end traffic confirmation is the
|
|
concern, there is little point in mixing. It might also be feasible to
|
|
pad chunks to uniform size as is done now for cells; if this is link
|
|
padding rather than end-to-end, then it will take less overhead,
|
|
especially in bursty environments. This is another way in which it
|
|
would be fairly practical to set up a mid-latency option within the
|
|
existing Tor network. Other padding regimens might supplement the
|
|
mid-latency option; however, we should continue the caution with which
|
|
we have always approached padding lest the overhead cost us either
|
|
performance or volunteers.
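
As a rough illustration of the latency cost of such synchronized
forwarding, consider the following back-of-the-envelope sketch. The
forwarding interval $T$, and the assumption that a chunk's arrival time
at each hop is roughly uniform within that interval, are ours and are
introduced only for illustration. With $h$ hops, each chunk waits on
average $T/2$ at every server, so
\[
  E[D_{\mathrm{added}}] \approx h \cdot \frac{T}{2}, \qquad
  D_{\mathrm{added}} \le h \cdot T .
\]
For example, for a three-hop circuit and a hypothetical $T = 1$ second,
this gives about $1.5$ seconds of expected added delay and at most $3$
seconds, which suggests why the synchronization granularity is the main
parameter for trading usability against anonymity.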
|
|
|
|
The distinction between traffic confirmation and traffic analysis is
|
|
not as practically cut and dried as we might wish. In \cite{} it was
|
|
shown that if latencies to and/or data volumes of various popular
|
|
responder destinations are catalogued, it may not be necessary to
|
|
observe both ends of a stream to confirm a source-destination link.
|
|
Such catalogues are likely to entail high variability and massive storage, since
routes through the network to each site will be random even if the sites
have relatively unique latency or volume characteristics. So these attacks do
not seem an immediate practical threat. Further along similar lines, in
|
|
\cite{attack-tor-oak05}, it was shown that an outside attacker can
|
|
trace a stream through the Tor network while a stream is still active
|
|
simply by observing the latency of his own traffic sent through
|
|
various Tor nodes. These attacks are especially significant since they
|
|
counter previous results that running one's own onion router protects
|
|
better than using the network from the outside. The attacks do not
|
|
show the client address, only the first server within the Tor network,
|
|
making helper nodes all the more worthy of exploration for enclave
|
|
protection. Setting up a mid-latency subnet as described above would
|
|
be another significant step to evaluating resistance to such attacks.
|
|
|
|
The attacks in \cite{attack-tor-oak05} are also dependent on
|
|
cooperation of the responding application or the ability to modify or
|
|
monitor the responder stream, in order of decreasing attack
|
|
effectiveness. So, another way to counter these attacks in some cases
|
|
would be to employ caching of responses. This is infeasible for
|
|
application data that is not relatively static and from frequently
|
|
visited sites; however, it might be useful for DNS lookups. This is
|
|
also likely to be trading one practical threat for another. To be
|
|
useful, such caches would need to be distributed to any likely exit
nodes for recurring requests for the same data. Aside from the logistical
|
|
difficulties and overhead of distribution, they constitute a collected
|
|
record of destinations and/or data visited by Tor users. While
|
|
limited to network insiders, given the need for wide distribution
|
|
they could serve as useful data to an attacker deciding which locations
|
|
to target for confirmation.
|
|
|
|
[nick will work on this]
|
|
|
|
\subsection{Application support: socks doesn't solve all our problems}
|
|
|
|
socks4a isn't everywhere. the dns problem. etc.
|
|
|
|
nick will work on this.
|
|
|
|
\subsection{Measuring performance and capacity}
|
|
|
|
How to measure performance without letting people selectively deny service
|
|
by distinguishing pings. Heck, just how to measure performance at all. In
|
|
practice people have funny firewalls that don't match up to their exit
|
|
policies and Tor doesn't deal.
|
|
|
|
Network investigation: Is all this bandwidth publishing thing a good idea?
|
|
How can we collect stats better? Note weasel's smokeping, at
|
|
\url{http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor}
|
|
which probably gives george and steven enough info to break tor?
|
|
|
|
[nick will work on this section, unless arma gets there first]
|
|
|
|
\subsection{Anonymity benefits for running a server}
|
|
|
|
Does running a server help you or harm you? George's Oakland attack.
|
|
|
|
Plausible deniability -- without even running your traffic through Tor!
|
|
But nobody knows about Tor, and the legal situation is fuzzy, so this
|
|
isn't very true really.
|
|
|
|
We have to pick the path length so adversary can't distinguish client from
|
|
server (how many hops is good?).
|
|
|
|
In practice, plausible deniability is hypothetical and doesn't seem very
convincing. If ISPs find the activity antisocial, they don't care \emph{why}
your computer is behaving that way.
|
|
|
|
[arma will write this section]
|
|
|
|
\subsection{Helper nodes}
|
|
|
|
When does fixing your entry or exit node help you?
|
|
Helper nodes in the literature don't deal with churn, and
|
|
especially active attacks to induce churn.
|
|
|
|
Do general DoS attacks have anonymity implications? See e.g. Adam
|
|
Back's IH paper, but I think there's more to be pointed out here.
|
|
|
|
Game theory for helper nodes: if Alice offers a hidden service on a
|
|
server (enclave model), and nobody ever uses helper nodes, then against
|
|
George+Steven's attack she's totally nailed. If only Alice uses a helper
|
|
node, then she's still identified as the source of the data. If everybody
|
|
uses a helper node (including Alice), then the attack identifies the
|
|
helper node and also Alice, and knows which one is which. If everybody
|
|
uses a helper node (but not Alice), then the attacker figures the real
|
|
source was a client that is using Alice as a helper node. [How's my
|
|
logic here?]
|
|
|
|
point to routing-zones section re: helper nodes to defend against
|
|
big stuff.
|
|
|
|
[nick will write this section]
|
|
|
|
\subsection{Location-hidden services}
|
|
|
|
[arma will write this section]
|
|
|
|
Survivable services are new in practice, yes? Hidden services seem
|
|
less hidden than we'd like, since they stay in one place and get used
|
|
a lot. They're the epitome of the need for helper nodes. This means
|
|
that using Tor as a building block for Free Haven is going to be really
|
|
hard. Also, they're brittle in terms of intersection and observation
|
|
attacks. Would be nice to have hot-swap services, but hard to design.
|
|
|
|
People are using hidden services as a poor man's VPN and firewall-buster.
Rather than playing with dyndns and trying to pierce holes in their
firewall (say, so they can ssh in from the outside), they run a hidden
service on the inside and then rendezvous with that hidden service
externally.
|
|
|
|
In practice, sites like Bloggers Without Borders (\url{www.b19s.org}) are
running Tor servers but, more importantly, are advertising a hidden-service
address on their front page. Doing this could provide increased robustness
if they used the dual-IP approach we describe in~\cite{tor-design}, but in
practice they do it (a) to increase visibility of the Tor project and their
support for privacy, and (b) to offer a way for their users, using vanilla
software, to get end-to-end encryption and end-to-end authentication to
their website.
|
|
|
|
|
|
\subsection{Trust and discovery}
|
|
|
|
[arma will edit this and expand/retract it]
|
|
|
|
The published Tor design adopted a deliberately simplistic design for
|
|
authorizing new nodes and informing clients about servers and their status.
|
|
In the early Tor designs, all ORs periodically uploaded a signed description
|
|
of their locations, keys, and capabilities to each of several well-known {\it
|
|
directory servers}. These directory servers constructed a signed summary
|
|
of all known ORs (a ``directory''), and a signed statement of which ORs they
|
|
believed to be operational at any given time (a ``network status''). Clients
|
|
periodically downloaded a directory in order to learn the latest ORs and
|
|
keys, and more frequently downloaded a network status to learn which ORs are
|
|
likely to be running. ORs also operate as directory caches, in order to
|
|
lighten the bandwidth load on the authoritative directory servers.
|
|
|
|
In order to prevent Sybil attacks (wherein an adversary signs up many
|
|
purportedly independent servers in order to increase her chances of observing
|
|
a stream as it enters and leaves the network), the early Tor directory design
|
|
required the operators of the authoritative directory servers to manually
|
|
approve new ORs. Unapproved ORs were included in the directory, but clients
|
|
did not use them at the start or end of their circuits. In practice,
|
|
directory administrators performed little actual verification, and tended to
|
|
approve any OR whose operator could compose a coherent email. This procedure
|
|
may have prevented trivial automated Sybil attacks, but would do little
|
|
against a clever attacker.
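
To make the stakes concrete, here is a simplified estimate; the uniform,
independent choice of entry and exit nodes is our simplifying assumption
rather than a description of the deployed path selection. If the
adversary controls $c$ of the $N$ servers eligible for the ends of a
circuit, the chance that she sees both ends of a given circuit is roughly
\[
  P_{\mathrm{both}} \approx \left(\frac{c}{N}\right)^{2} .
\]
With $c = 10$ and $N = 100$ this is $1\%$; if she could add $90$ Sybil
servers unchecked, so that $c = 100$ and $N = 190$, it would rise to
$(100/190)^2 \approx 28\%$.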
|
|
|
|
There are a number of flaws in this system that need to be addressed as we
|
|
move forward. They include:
|
|
\begin{tightlist}
|
|
\item Each directory server represents an independent point of failure; if
|
|
any one were compromised, it could immediately compromise all of its users
|
|
by recommending only compromised ORs.
|
|
\item The more servers join the network, the more unreasonable it
becomes to expect clients to know about them all. Directories
become infeasibly large, and downloading the list of servers becomes
burdensome.
|
|
\item The validation scheme may do as much harm as it does good. It is not
|
|
only incapable of preventing clever attackers from mounting Sybil attacks,
|
|
but may deter server operators from joining the network. (For instance, if
|
|
they expect the validation process to be difficult, or if they do not share
|
|
any languages in common with the directory server operators.)
|
|
\end{tightlist}
|
|
|
|
We could try to move the system in several directions, depending on our
|
|
choice of threat model and requirements. If we did not need to increase
|
|
network capacity in order to support more users, there would be no reason not
|
|
to adopt even stricter validation requirements, and reduce the number of
|
|
servers in the network to a trusted minimum. But since we want Tor to work
|
|
for as many users as it can, we need XXXXX
|
|
|
|
In order to address the first two issues, it seems wise to move to a system
|
|
including a number of semi-trusted directory servers, no one of which can
|
|
compromise a user on its own. Ultimately, of course, we cannot escape the
|
|
problem of a first introducer: since most users will run Tor in whatever
|
|
configuration the software ships with, the Tor distribution itself will
|
|
remain a potential single point of failure so long as it includes the seed
|
|
keys for directory servers, a list of directory servers, or any other means
|
|
to learn which servers are on the network. But omitting this information
|
|
from the Tor distribution would only delegate the trust problem to the
|
|
individual users, most of whom are presumably less informed about how to make
|
|
trust decisions than the Tor developers.
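
One way to make the requirement that no single directory server can
compromise a user more precise, offered here only as an illustrative
assumption and not as a settled design, is a majority rule: suppose
clients consult $n$ directory servers and use an OR only if more than
half of them list it. An adversary must then control at least
\[
  k_{\min} = \left\lfloor \frac{n}{2} \right\rfloor + 1
\]
directory servers before she can make clients use ORs that the honest
directories do not list. For a hypothetical $n = 5$, $k_{\min} = 3$.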
|
|
|
|
%Network discovery, sybil, node admission, scaling. It seems that the code
|
|
%will ship with something and that's our trust root. We could try to get
|
|
%people to build a web of trust, but no. Where we go from here depends
|
|
%on what threats we have in mind. Really decentralized if your threat is
|
|
%RIAA; less so if threat is to application data or individuals or...
|
|
|
|
\section{Crossroads: Scaling}
|
|
%\label{sec:crossroads-scaling}
|
|
%P2P + anonymity issues:
|
|
|
|
Tor is running today with hundreds of servers and tens of thousands of
|
|
users, but it will certainly not scale to millions.
|
|
|
|
Scaling Tor involves three main challenges. First is safe server
|
|
discovery, both bootstrapping -- how a Tor client can robustly find an
|
|
initial server list -- and ongoing -- how a Tor client can learn about
|
|
a fair sample of honest servers and not let the adversary control his
|
|
circuits (see Section x). Second is detecting and handling the speed
|
|
and reliability of the variety of servers we must use if we want to
|
|
accept many servers (see Section y).
|
|
Since the speed and reliability of a circuit is limited by its worst link,
|
|
we must learn to track and predict performance. Finally, in order to get
|
|
a large set of servers in the first place, we must address incentives
|
|
for users to carry traffic for others (see Section incentives).
|
|
|
|
\subsection{Incentives by Design}
|
|
|
|
[nick will try to make this section shorter and more to the point.]
|
|
|
|
[most of the technical incentive schemes in the literature introduce
|
|
anonymity issues which we don't understand yet, and we seem to be doing
|
|
ok without them]
|
|
|
|
There are three behaviors we need to encourage for each server: relaying
|
|
traffic; providing good throughput and reliability while doing it;
|
|
and allowing traffic to exit the network from that server.
|
|
|
|
We encourage these behaviors through \emph{indirect} incentives, that
|
|
is, designing the system and educating users in such a way that users
|
|
with certain goals will choose to relay traffic. In practice, the
|
|
main incentive for running a Tor server is social benefit: volunteers
|
|
altruistically donate their bandwidth and time. We also keep public
|
|
rankings of the throughput and reliability of servers, much like
|
|
SETI@home. We further explain to users that they can get \emph{better
|
|
security} by operating a server, because they get plausible deniability
|
|
(indeed, they may not need to route their own traffic through Tor at all
|
|
-- blending directly with other traffic exiting Tor may be sufficient
|
|
protection for them), and because they can use their own Tor server
|
|
as entry or exit point and be confident it's not run by the adversary.
|
|
Finally, we can improve the usability and feature set of the software:
|
|
rate limiting support and easy packaging decrease the hassle of
|
|
maintaining a server, and our configurable exit policies allow each
|
|
operator to advertise a policy describing the hosts and ports to which
|
|
he feels comfortable connecting.
|
|
|
|
Beyond these, however, there is also a need for \emph{direct} incentives:
|
|
providing payment or other resources in return for high-quality service.
|
|
Paying actual money is problematic: decentralized e-cash systems are
|
|
not yet practical, and a centralized collection system not only reduces
|
|
robustness, but also has failed in the past (the history of commercial
|
|
anonymizing networks is littered with failed attempts). A more promising
|
|
option is to use a tit-for-tat incentive scheme: provide better service
|
|
to nodes that have provided good service to you.
|
|
|
|
Unfortunately, such an approach introduces new anonymity problems.
|
|
Does the incentive system enable the adversary to attract more traffic by
|
|
performing well? Typically a user who chooses evenly from all options is
|
|
most resistant to an adversary targeting him, but that approach prevents
|
|
us from handling heterogeneous servers \cite{casc-rep}.
|
|
When a server (call him Steve) performs well for Alice, does Steve gain
|
|
reputation with the entire system, or just with Alice? If the entire
|
|
system, how does Alice tell everybody about her experience in a way that
|
|
prevents her from lying about it yet still protects her identity? If
|
|
Steve's behavior only affects Alice's behavior, does this allow Steve to
|
|
selectively perform only for Alice, and then break her anonymity later
|
|
when somebody (presumably Alice) routes through his node?
|
|
|
|
These are difficult and open questions, yet choosing not to scale means
|
|
leaving most users to a less secure network or no anonymizing network
|
|
at all. We will start with a simplified approach to the tit-for-tat
|
|
incentive scheme based on two rules: (1) each node should measure the
|
|
service it receives from adjacent nodes, and provide service relative to
|
|
the received service, but (2) when a node is making decisions that affect
|
|
its own security (e.g. when building a circuit for its own application
|
|
connections), it should choose evenly from a sufficiently large set of
|
|
nodes that meet some minimum service threshold. This approach allows us
|
|
to discourage bad service without opening Alice up as much to attacks.
|
|
|
|
%XXX rewrite the above so it sounds less like a grant proposal and
|
|
%more like a "if somebody were to try to solve this, maybe this is a
|
|
%good first step".
|
|
|
|
%We should implement the above incentive scheme in the
|
|
%deployed Tor network, in conjunction with our plans to add the necessary
|
|
%associated scalability mechanisms. We will do experiments (simulated
|
|
%and/or real) to determine how much the incentive system improves
|
|
%efficiency over baseline, and also to determine how far we are from
|
|
%optimal efficiency (what we could get if we ignored the anonymity goals).
|
|
|
|
\subsection{Peer-to-peer / practical issues}
|
|
|
|
[leave this section for now, and make sure things here are covered
|
|
elsewhere.]
|
|
|
|
Making use of servers with little bandwidth. How to handle hammering by
|
|
certain applications.
|
|
|
|
Handling servers that are far away from the rest of the network, e.g. on
|
|
the continents that aren't North America and Europe. High latency,
|
|
often high packet loss.
|
|
|
|
Running Tor servers behind NATs, behind great-firewalls-of-China, etc.
|
|
Restricted routes. How to propagate the topology to everybody? BGP
style doesn't work because we don't want just \emph{one} path. Point to
Geoff's stuff.
|
|
|
|
\subsection{ISP-class adversaries}
|
|
|
|
[arma will write this]
|
|
|
|
Routing-zones. It seems that our threat model comes down to diversity and
|
|
dispersal. But hard for Alice to know how to act. Many questions remain.
|
|
|
|
\subsection{The China problem}
|
|
|
|
Citizens in a variety of countries, such as most recently China and
|
|
Iran, are periodically blocked from accessing various sites outside
|
|
their country. These users try to find any tools available to allow
|
|
them to get around these firewalls. Some anonymity networks, such as
|
|
Six-Four~\cite{six-four}, are designed specifically with this goal in
|
|
mind; others like the Anonymizer~\cite{anonymizer} are paid by sponsors
|
|
such as Voice of America to set up a network to encourage `Internet
|
|
freedom'~\cite{voice-of-america-anonymizer}. Even though Tor wasn't
|
|
designed with ubiquitous access to the network in mind, thousands of
|
|
users across the world are trying to use it for exactly this purpose.
|
|
% Academic and NGO organizations, peacefire, \cite{berkman}, etc
|
|
|
|
Anti-censorship networks hoping to bridge country-level blocks face
|
|
a variety of challenges. One of these is that they need to find enough
|
|
exit nodes---servers on the `free' side that are willing to relay
|
|
arbitrary traffic from users to their final destinations. Anonymizing
|
|
networks including Tor are well-suited to this task, since we have
|
|
already gathered a set of exit nodes that are willing to tolerate some
|
|
political heat.
|
|
|
|
The other main challenge is to distribute a list of reachable relays
|
|
to the users inside the country, and give them software to use them,
|
|
without letting the authorities also enumerate this list and block each
|
|
relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
|
|
addresses (or having them donated), abandoning old addresses as they are
|
|
`used up', and telling a few users about the new ones. Distributed
|
|
anonymizing networks again have an advantage here, in that we already
|
|
have tens of thousands of separate IP addresses whose users might
|
|
volunteer to provide this service since they've already installed and use
|
|
the software for their own privacy~\cite{koepsell-wpes2004}. Because
|
|
the Tor protocol separates routing from network discovery (see Section
|
|
\ref{do-we-discuss-this?}), volunteers could configure their Tor clients
|
|
to generate server descriptors and send them to a special directory
|
|
server that gives them out to dissidents who need to get around blocks.
|
|
|
|
Of course, this still doesn't prevent the adversary
|
|
from enumerating all the volunteer relays and blocking them preemptively.
|
|
Perhaps a tiered-trust system could be built where a few individuals are
|
|
given relays' locations, and they recommend other individuals by telling them
|
|
those addresses, thus providing a built-in incentive to avoid letting the
|
|
adversary intercept them. Max-flow trust algorithms~\cite{advogato}
|
|
might help to bound the number of IP addresses leaked to the adversary. Groups
|
|
like the W3C are looking into using Tor as a component in an overall system to
|
|
help address censorship; we wish them luck.
|
|
|
|
%\cite{infranet}
|
|
|
|
\subsection{Non-clique topologies}
|
|
|
|
[nick will try to shrink this section]
|
|
|
|
Because its threat model is substantially weaker than that of high-latency
mixnets, Tor is actually in a potentially better position to
scale, at least initially. From the perspective of a mix network, one
|
|
of the worst things that can happen is partitioning. The more
|
|
potential senders of messages entering the network the better the
|
|
anonymity. Roughly, if a network is, e.g., split in half, then your
|
|
anonymity is cut in half. Attacks become half as hard (if they're
|
|
linear in network size), etc. In some sense this is still true for
|
|
Tor: if you want to know who Alice is talking to, you can watch her
|
|
for one end of a circuit. For a half size network, you then only have
|
|
to brute force examine half as many nodes to find the other end. But
|
|
Tor is not meant to cope with someone directly attacking many dozens
|
|
of nodes in a few minutes. It was meant to cope with traffic
|
|
confirmation attacks. And, these are independent of the size of the
|
|
network. So, a simple possibility when the scale of a Tor network
|
|
exceeds some size is to simply split it. Care could be taken in
|
|
allocating which nodes go to which network along the lines of
|
|
\cite{casc-rep} to ensure that collaborating hostile nodes are not
|
|
able to gain any advantage in network splitting that they do not
|
|
already have in joining a network.
|
|
|
|
The attacks in \cite{attack-tor-oak05} show that certain types of
|
|
brute force attacks are in fact feasible; however they make the
|
|
above point stronger not weaker. The attacks do not appear to be
|
|
significantly more difficult to mount against a network that is
|
|
twice the size. Also, they only identify the Tor nodes used in a
|
|
circuit, not the client. Finally note that even if the network is split,
|
|
a client does not need to use just one of the two resulting networks.
|
|
Alice could use either of them, and it would not be difficult to make
|
|
the Tor client able to access several such networks on a per-circuit
basis. More analysis is needed; we simply note here that splitting
|
|
a Tor network is an easy way to achieve moderate scalability and that
|
|
it does not necessarily have the same implications as splitting a mixnet.
|
|
|
|
Alternatively, we can try to scale a single network. Some issues for
|
|
scaling include how many neighbors can nodes support and how many
|
|
users (and how much application traffic capacity) can the network
|
|
handle for each new node that comes into the network. This depends on
|
|
many things, most notably the traffic capacity of the new nodes. We
|
|
can observe, however, that adding a Tor node of any feasible bandwidth
|
|
will increase the traffic capacity of the network. This means that, as
|
|
a first step to scaling, we can focus on the interconnectivity of the
|
|
nodes, followed by directories, discovery, etc.
|
|
|
|
By reducing the connectivity of the network we increase the total
|
|
number of nodes that the network can contain. Anonymity implications
|
|
of restricted routes for mix networks have already been explored by
|
|
Danezis~\cite{danezis-pets03}. That paper explicitly considered only
|
|
traffic analysis resistance provided by a mix network and sidestepped
|
|
questions of traffic confirmation resistance. But, Tor is designed
|
|
only to resist traffic confirmation. For this and other reasons, we
|
|
cannot simply apply his mixnet results to onion routing networks. If
an attacker gains only a minimal increase in the likelihood of compromising
|
|
the endpoints of a Tor circuit through a sparse network (vs.\ a clique
|
|
on the same node set), then the restriction will have had minimal
|
|
impact on the anonymity provided by that network.
|
|
|
|
The approach Danezis describes is based on expander graphs, i.e.,
|
|
graphs in which any subgraph of nodes is likely to have lots of nodes
|
|
as neighbors. For Tor, we may not need to have an expander per se; it
may be enough to have a single subnet that is highly connected. As an
example, assume fifty nodes of relatively high traffic capacity. This
\emph{center} forms a clique. Assume each center node can
handle 200 connections to other nodes (including the other ones in the
center). Assume every noncenter node connects to three nodes in the
center and to any nodes outside the center that it wants to. Then the
network easily scales to c.\ 2500 nodes with a commensurate increase in
|
|
bandwidth. There are many open questions: how directory information
|
|
is distributed (presumably information about the center nodes could
|
|
be given to any new nodes with their codebase), whether center nodes
|
|
will need to function as a `backbone', etc. As above the point is
|
|
that this would create problems for the expected anonymity for a mixnet,
|
|
but for an onion routing network where anonymity derives largely from
|
|
the edges, it may be feasible.
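
The figure of roughly 2500 nodes follows from simple counting under the
assumptions stated in the example. Each of the $50$ center nodes spends
$49$ of its $200$ connections on the clique, leaving $151$ for noncenter
nodes, and each noncenter node consumes three center connections:
\[
  N_{\mathrm{noncenter}} \le
  \left\lfloor \frac{50 \times (200 - 49)}{3} \right\rfloor
  = \left\lfloor \frac{7550}{3} \right\rfloor = 2516 ,
\]
so the network holds about $2516 + 50 = 2566$ nodes in total. This
count ignores connections among noncenter nodes, which do not consume
center capacity.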
|
|
|
|
Another point is that we already have a non-clique topology.
|
|
Individuals can set up and run Tor nodes without informing the
|
|
directory servers. This will allow, e.g., dissident groups to run a
|
|
local Tor network of such nodes that connects to the public Tor
|
|
network. This network is hidden behind the Tor network, and its
only visible connection to Tor is at those points where it connects.
As far as the public network, or anyone observing it, is concerned,
they are running clients.
|
|
|
|
|
|
|
|
|
|
\section{The Future}
|
|
\label{sec:conclusion}
|
|
|
|
we should put random thoughts here until there are enough for a
|
|
conclusion.
|
|
|
|
will our sustainability approach work? we'll see.
|
|
|
|
"These are difficult and open questions, yet choosing not to solve them
|
|
means leaving most users to a less secure network or no anonymizing
|
|
network at all."
|
|
|
|
\bibliographystyle{plain} \bibliography{tor-design}
|
|
|
|
\appendix
|
|
|
|
\begin{figure}[t]
|
|
%\unitlength=1in
|
|
\centering
|
|
%\begin{picture}(6.0,2.0)
|
|
%\put(3,1){\makebox(0,0)[c]{\epsfig{figure=graphnodes,width=6in}}}
|
|
%\end{picture}
|
|
\mbox{\epsfig{figure=graphnodes,width=5in}}
|
|
\caption{Number of servers over time. Lowest line is number of exit
|
|
nodes that allow connections to port 80. Middle line is total number of
|
|
verified (registered) servers. The line above that represents servers
|
|
that are not yet registered.}
|
|
\label{fig:graphnodes}
|
|
\end{figure}
|
|
|
|
\begin{figure}[t]
|
|
\centering
|
|
\mbox{\epsfig{figure=graphtraffic,width=5in}}
|
|
\caption{The sum of traffic reported by each server over time. The bottom
pair show average throughput, and the top pair represent the largest 15-minute
burst in each 4-hour period.}
|
|
\label{fig:graphtraffic}
|
|
\end{figure}
|
|
|
|
\end{document}