\documentclass{llncs}

\usepackage{url}
\usepackage{amsmath}
\usepackage{epsfig}

\setlength{\textwidth}{5.9in}
\setlength{\textheight}{8.4in}
\setlength{\topmargin}{.5cm}
\setlength{\oddsidemargin}{1cm}
\setlength{\evensidemargin}{1cm}

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
  \setlength{\itemsep}{0mm}
  \setlength{\parsep}{0mm}
  % \setlength{\labelsep}{0mm}
  % \setlength{\labelwidth}{0mm}
  % \setlength{\topsep}{0mm}
  }}{\end{list}}

\begin{document}

\title{Design of a blocking-resistant anonymity system\\DRAFT}

%\author{Roger Dingledine\inst{1} \and Nick Mathewson\inst{1}}
\author{Roger Dingledine \and Nick Mathewson}
\institute{The Free Haven Project\\
\email{\{arma,nickm\}@freehaven.net}}

\maketitle
\pagestyle{plain}

\begin{abstract}

Websites around the world are increasingly being blocked by
government-level firewalls. Many people use anonymizing networks like
Tor to contact sites without letting an attacker trace their activities,
and as an added benefit they are no longer affected by local censorship.
But if the attacker simply denies access to the Tor network itself,
blocked users can no longer benefit from the security Tor offers.

Here we describe a design that builds upon the current Tor network
to provide an anonymizing network that resists blocking
by government-level attackers.

\end{abstract}
|
|
|
|
\section{Introduction and Goals}
|
|
|
|
Anonymizing networks such as Tor~\cite{tor-design} bounce traffic around
a network of relays. They aim to hide not only what is being said, but
also who is communicating with whom, which users are using which websites,
and so on. These systems have a broad range of users, including ordinary
citizens who want to avoid being profiled for targeted advertisements,
corporations that don't want to reveal information to their competitors,
and law enforcement and government intelligence agencies that need to
conduct operations on the Internet without being noticed.
|
|
|
|
Historically, research on anonymizing systems has focused on a passive
|
|
attacker who monitors the user (call her Alice) and tries to discover her
|
|
activities, yet lets her reach any piece of the network. In more modern
|
|
threat models such as Tor's, the adversary is allowed to perform active
|
|
attacks such as modifying communications to trick Alice
|
|
into revealing her destination, or intercepting some connections
|
|
to run a man-in-the-middle attack. But these systems still assume that
|
|
Alice can eventually reach the anonymizing network.
|
|
|
|
An increasing number of users are using the Tor software
|
|
less for its anonymity properties than for its censorship
|
|
resistance properties---if they use Tor to access Internet sites like
|
|
Wikipedia
|
|
and Blogspot, they are no longer affected by local censorship
|
|
and firewall rules. In fact, an informal user study (described in
|
|
Appendix~\ref{app:geoip}) showed China as the third largest user base
|
|
for Tor clients, with perhaps ten thousand people accessing the Tor
|
|
network from China each day.
|
|
|
|
The current Tor design is easy to block if the attacker controls Alice's
|
|
connection to the Tor network---by blocking the directory authorities,
|
|
by blocking all the server IP addresses in the directory, or by filtering
|
|
based on the signature of the Tor TLS handshake. Here we describe a
|
|
design that builds upon the current Tor network to provide an anonymizing
|
|
network that also resists this blocking. Specifically,
|
|
Section~\ref{sec:adversary} discusses our threat model---that is,
|
|
the assumptions we make about our adversary; Section~\ref{sec:current-tor}
|
|
describes the components of the current Tor design and how they can be
|
|
leveraged for a new blocking-resistant design; Section~\ref{sec:related}
|
|
explains the features and drawbacks of the currently deployed solutions;
|
|
and ...
|
|
|
|
% The other motivation is for places where we're concerned they will
|
|
% try to enumerate a list of Tor users. So even if they're not blocking
|
|
% the Tor network, it may be smart to not be visible as connecting to it.
|
|
|
|
%And adding more different classes of users and goals to the Tor network
|
|
%improves the anonymity for all Tor users~\cite{econymics,usability:weis2006}.
|
|
|
|
% Adding use classes for countering blocking as well as anonymity has
|
|
% benefits too. Should add something about how providing undetected
|
|
% access to Tor would facilitate people talking to, e.g., govt. authorities
|
|
% about threats to public safety etc. in an environment where Tor use
|
|
% is not otherwise widespread and would make one stand out.
|
|
|
|
\section{Adversary assumptions}
|
|
\label{sec:adversary}
|
|
|
|
The history of blocking-resistance designs is littered with conflicting
|
|
assumptions about what adversaries to expect and what problems are
|
|
in the critical path to a solution. Here we try to enumerate our best
|
|
understanding of the current situation around the world.
|
|
|
|
In the traditional security style, we aim to describe a strong
|
|
attacker---if we can defend against this attacker, we inherit protection
|
|
against weaker attackers as well. After all, we want a general design
|
|
that will work for citizens of China, Iran, Thailand, and other censored
countries; for whistleblowers in firewalled corporate networks; and for
people in unanticipated oppressive situations. In fact, by designing with
|
|
a variety of adversaries in mind, we can take advantage of the fact that
|
|
adversaries will be in different stages of the arms race at each location,
|
|
so a server blocked in one locale can still be useful in others.
|
|
|
|
We assume there are three main network attacks in use by censors
|
|
currently~\cite{clayton:pet2006}:
|
|
|
|
\begin{tightlist}
\item Block a destination or type of traffic by automatically searching for
certain strings or patterns in TCP packets.
\item Block a destination by manually listing its IP address at the
firewall.
\item Intercept DNS requests and give bogus responses for certain
destination hostnames.
\end{tightlist}
|
|
|
|
We assume the network firewall has limited CPU and memory per
|
|
connection~\cite{clayton:pet2006}. Against an adversary who carefully
|
|
examines the contents of every packet, we would need
|
|
some stronger mechanism such as steganography, which introduces its
|
|
own problems~\cite{active-wardens,tcpstego,bar}.
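
To illustrate why the per-connection resource limit matters for the first
attack in the list above, the following minimal Python sketch contrasts a
filter that inspects each packet independently with one that buffers the
whole stream before matching. It is only an illustration under our own
assumptions: the function names, the keyword, and the sample packets are
hypothetical, and real firewalls are considerably more elaborate.

\begin{verbatim}
def payload_matches(payload, keywords):
    """Return True if any (lowercase) keyword occurs in this payload."""
    lowered = payload.lower()
    return any(k in lowered for k in keywords)

def stateless_filter(packets, keywords):
    """Inspect each packet on its own; keeps no per-connection state."""
    return any(payload_matches(p, keywords) for p in packets)

def reassembling_filter(packets, keywords):
    """Buffer the whole stream first; costs memory for every connection."""
    return payload_matches(b"".join(packets), keywords)

keywords = [b"forbidden-topic"]          # hypothetical blocked string
whole = [b"GET /forbidden-topic HTTP/1.0\r\n"]
split = [b"GET /forbid", b"den-topic HTTP/1.0\r\n"]

caught_whole = stateless_filter(whole, keywords)      # keyword in one packet
caught_split = stateless_filter(split, keywords)      # keyword straddles two
caught_buffered = reassembling_filter(split, keywords)
\end{verbatim}

In this sketch the stateless filter catches the request only when the keyword
arrives inside a single packet; catching the split request requires buffering,
which is exactly the per-connection cost we assume the firewall cannot afford
at scale.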
|
|
|
|
More broadly, we assume that the authorities are more likely to
|
|
block a given system as its popularity grows. That is, a system
|
|
used by only a few users will probably never be blocked, whereas a
|
|
well-publicized system with many users will receive much more scrutiny.
|
|
|
|
We assume that readers of blocked content are not in as much danger
|
|
as publishers. So far in places like China, the authorities mainly go
|
|
after people who publish materials and coordinate organized
|
|
movements~\cite{mackinnon}.
|
|
If they find that a user happens
|
|
to be reading a site that should be blocked, the typical response is
|
|
simply to block the site. Of course, even with an encrypted connection,
|
|
the adversary may be able to distinguish readers from publishers by
|
|
observing whether Alice is mostly downloading bytes or mostly uploading
|
|
them---we discuss this issue more in Section~\ref{subsec:upload-padding}.
|
|
|
|
We assume that while different regimes can coordinate and share
notes, there will be a time lag between one attacker learning
how to overcome a facet of our design and other attackers picking it up.
Similarly, we assume that in the early stages of deployment the insider
threat is less of a risk, because no attackers have put serious
effort into breaking the system yet.
|
|
|
|
We do not assume that government-level attackers are always uniform across
|
|
the country. For example, there is no single centralized place in China
|
|
that coordinates its specific censorship decisions and steps.
|
|
|
|
We assume that our users have control over their hardware and
|
|
software---they don't have any spyware installed, there are no
|
|
cameras watching their screens, etc. Unfortunately, in many situations
|
|
these threats are real~\cite{zuckerman-threatmodels}; yet
|
|
software-based security systems like ours are poorly equipped to handle
|
|
a user who is entirely observed and controlled by the adversary. See
|
|
Section~\ref{subsec:cafes-and-livecds} for more discussion of what little
|
|
we can do about this issue.
|
|
|
|
We assume that widespread access to the Internet is economically,
|
|
politically, and/or
|
|
socially valuable to the policymakers of each deployment country. After
|
|
all, if censorship
|
|
is more important than Internet access, the firewall administrators have
|
|
an easy job: they should simply block everything. The corollary to this
|
|
assumption is that we should design so that increased blocking of our
|
|
system results in increased economic damage or public outcry.
|
|
|
|
We assume that the user will be able to fetch a genuine
|
|
version of Tor, rather than one supplied by the adversary; see
|
|
Section~\ref{subsec:trust-chain} for discussion on helping the user
|
|
confirm that he has a genuine version and that he can connect to the
|
|
real Tor network.
|
|
|
|
\section{Components of the current Tor design}
|
|
\label{sec:current-tor}
|
|
|
|
Tor is widely used: it is the largest anonymity network of its kind, and
it has attracted more than 800 volunteer-operated routers from around the
world. Tor protects users by routing their traffic through a multiply
encrypted ``circuit'' built of a few randomly selected servers, each of which
can remove only a single layer of encryption. Each server sees only the step
before it and the step after it in the circuit, and so no single server can
learn the connection between a user and her chosen communication partners.
In this section, we examine some of the reasons why Tor has become popular,
with particular emphasis on how we can take advantage of these properties
for a blocking-resistance design.
|
|
|
|
Tor aims to provide three security properties:
|
|
\begin{tightlist}
\item 1. A local network attacker can't learn, or influence, your
destination.
\item 2. No single router in the Tor network can link you to your
destination.
\item 3. The destination, or somebody watching the destination,
can't learn your location.
\end{tightlist}
|
|
|
|
For blocking-resistance, we care most about the first
property. But as the arms race progresses, the second property
|
|
will become important---for example, to discourage an adversary
|
|
from volunteering a relay in order to learn that Alice is reading
|
|
or posting to certain websites. The third property helps keep users safe from
|
|
collaborating websites: consider websites and other Internet services
|
|
that have been pressured
|
|
recently into revealing the identity of bloggers~\cite{arrested-bloggers}
|
|
or treating clients differently depending on their network
|
|
location~\cite{google-geolocation}.
|
|
% and cite{goodell-syverson06} once it's finalized.
|
|
|
|
The Tor design provides other features as well that are not typically
|
|
present in manual or ad hoc circumvention techniques.
|
|
|
|
First, the Tor directory authorities automatically aggregate, test,
|
|
and publish signed summaries of the available Tor routers. Tor clients
|
|
can fetch these summaries to learn which routers are available and
|
|
which routers are suitable for their needs. Directory information is cached
|
|
throughout the Tor network, so once clients have bootstrapped they never
|
|
need to interact with the authorities directly. (To tolerate a minority
|
|
of compromised directory authorities, we use a threshold trust scheme---
|
|
see Section~\ref{subsec:trust-chain} for details.)
|
|
|
|
Second, Tor clients can be configured to use any directory authorities
|
|
they want. They use the default authorities if no others are specified,
|
|
but it's easy to start a separate (or even overlapping) Tor network just
|
|
by running a different set of authorities and convincing users to prefer
|
|
a modified client. For example, we could launch a distinct Tor network
|
|
inside China; some users could even use an aggregate network made up of
|
|
both the main network and the China network. (But we should not be too
|
|
quick to create other Tor networks---part of Tor's anonymity comes from
|
|
users behaving like other users, and there are many unsolved anonymity
|
|
questions if different users know about different pieces of the network.)
|
|
|
|
Third, in addition to automatically learning from the chosen directories
|
|
which Tor routers are available and working, Tor takes care of building
|
|
paths through the network and rebuilding them as needed. So the user
|
|
never has to know how paths are chosen, never has to manually pick
|
|
working proxies, and so on. More generally, at its core the Tor protocol
|
|
is simply a tool that can build paths given a set of routers. Tor is
|
|
quite flexible about how it learns about the routers and how it chooses
|
|
the paths. Harvard's Blossom project~\cite{blossom-thesis} makes this
|
|
flexibility more concrete: Blossom makes use of Tor not for its security
|
|
properties but for its reachability properties. It runs a separate set
|
|
of directory authorities, its own set of Tor routers (called the Blossom
|
|
network), and uses Tor's flexible path-building to let users view Internet
|
|
resources from any point in the Blossom network.
|
|
|
|
Fourth, Tor separates the role of \emph{internal relay} from the
|
|
role of \emph{exit relay}. That is, some volunteers choose just to relay
|
|
traffic between Tor users and Tor routers, and others choose to also allow
|
|
connections to external Internet resources. Because we don't force all
|
|
volunteers to play both roles, we end up with more relays. This increased
|
|
diversity in turn is what gives Tor its security: the more options the
|
|
user has for her first hop, and the more options she has for her last hop,
|
|
the less likely it is that a given attacker will be watching both ends
|
|
of her circuit~\cite{tor-design}. As a bonus, because our design attracts
|
|
more internal relays that want to help out but don't want to deal with
|
|
being an exit relay, we end up with more options for the first hop---the
|
|
one most critical to being able to reach the Tor network.
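
As a rough illustration of this argument (a simplification of ours, not an
analysis taken from the cited work), suppose Alice picks her first hop
uniformly at random from $N_e$ candidate relays and, independently, her last
hop uniformly from $N_x$ exit relays, and suppose the attacker can observe
$c_e$ of the former and $c_x$ of the latter. Then
\[
  \Pr[\text{both ends observed}] \;=\; \frac{c_e}{N_e} \cdot \frac{c_x}{N_x}.
\]
With the hypothetical values $c_e = c_x = 10$, $N_e = 800$, and $N_x = 200$,
this gives $(10/800)(10/200) = 6.25 \times 10^{-4}$; doubling $N_e$ to $1600$
while the attacker's resources stay fixed halves it to
$3.125 \times 10^{-4}$. Real path selection is not uniform, so these numbers
only indicate the direction of the effect.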
|
|
|
|
Fifth, Tor is sustainable. Zero-Knowledge Systems offered the commercial
|
|
but now defunct Freedom Network~\cite{freedom21-security}, a design with
|
|
security comparable to Tor's, but its funding model relied on collecting
|
|
money from users to pay relay operators. Modern commercial proxy systems
|
|
similarly
|
|
need to keep collecting money to support their infrastructure. On the
|
|
other hand, Tor has built a self-sustaining community of volunteers who
|
|
donate their time and resources. This community trust is rooted in Tor's
|
|
open design: we tell the world exactly how Tor works, and we provide all
|
|
the source code. Users can decide for themselves, or pay any security
|
|
expert to decide, whether it is safe to use. Further, Tor's modularity
as described above, along with its open license, means that its impact
will continue to grow.
|
|
|
|
Sixth, Tor has an established user base of hundreds of
|
|
thousands of people from around the world. This diversity of
|
|
users contributes to sustainability as above: Tor is used by
|
|
ordinary citizens, activists, corporations, law enforcement, and
|
|
even government and military users~\cite{tor-use-cases}, and they can
|
|
only achieve their security goals by blending together in the same
|
|
network~\cite{econymics,usability:weis2006}. This user base also provides
|
|
something else: hundreds of thousands of different and often-changing
|
|
addresses that we can leverage for our blocking-resistance design.
|
|
|
|
We discuss and adapt these components further in
|
|
Section~\ref{sec:bridges}. But first we examine the strengths and
|
|
weaknesses of other blocking-resistance approaches, so we can expand
|
|
our repertoire of building blocks and ideas.
|
|
|
|
\section{Current proxy solutions}
|
|
\label{sec:related}
|
|
|
|
Relay-based blocking-resistance schemes generally have two main
|
|
components: a relay component and a discovery component. The relay part
|
|
encompasses the process of establishing a connection, sending traffic
|
|
back and forth, and so on---everything that's done once the user knows
|
|
where she's going to connect. Discovery is the step before that: the
|
|
process of finding one or more usable relays.
|
|
|
|
For example, we can divide the pieces of Tor in the previous section
|
|
into the process of building paths and sending
|
|
traffic over them (relay) and the process of learning from the directory
|
|
servers about what routers are available (discovery). With this distinction
|
|
in mind, we now examine several categories of relay-based schemes.
|
|
|
|
\subsection{Centrally-controlled shared proxies}
|
|
|
|
Existing commercial anonymity solutions (like Anonymizer.com) are based
|
|
on a set of single-hop proxies. In these systems, each user connects to
|
|
a single proxy, which then relays traffic between the user and her
|
|
destination. These public proxy
|
|
systems are typically characterized by two features: they control and
|
|
operate the proxies centrally, and many different users get assigned
|
|
to each proxy.
|
|
|
|
In terms of the relay component, single proxies provide weak security
|
|
compared to systems that distribute trust over multiple relays, since a
|
|
compromised proxy can trivially observe all of its users' actions, and
|
|
an eavesdropper only needs to watch a single proxy to perform timing
|
|
correlation attacks against all its users' traffic and thus learn where
|
|
everyone is connecting. Worse, all users
|
|
need to trust the proxy company to have good security itself as well as
|
|
to not reveal user activities.
|
|
|
|
On the other hand, single-hop proxies are easier to deploy, and they
|
|
can provide better performance than distributed-trust designs like Tor,
|
|
since traffic only goes through one relay. They're also more convenient
|
|
from the user's perspective---since users entirely trust the proxy,
|
|
they can just use their web browser directly.
|
|
|
|
Whether public proxy schemes are more or less scalable than Tor is
|
|
still up for debate: commercial anonymity systems can use some of their
|
|
revenue to provision more bandwidth as they grow, whereas volunteer-based
|
|
anonymity systems can attract thousands of fast relays to spread the load.
|
|
|
|
The discovery piece can take several forms. Most commercial anonymous
|
|
proxies have one or a handful of commonly known websites, and their users
|
|
log in to those websites and relay their traffic through them. When
|
|
these websites get blocked (generally soon after the company becomes
|
|
popular), if the company cares about users in the blocked areas, they
|
|
start renting lots of disparate IP addresses and rotating through them
|
|
as they get blocked. They notify their users of new addresses (by email,
|
|
for example). It's an arms race, since attackers can sign up to receive the
|
|
email too, but operators have one nice trick available to them: because they
|
|
have a list of paying subscribers, they can notify certain subscribers
|
|
about updates earlier than others.
|
|
|
|
Access control systems on the proxy let them provide service only to
|
|
users with certain characteristics, such as paying customers or people
|
|
from certain IP address ranges.
|
|
|
|
Discovery in the face of a government-level firewall is a complex and
|
|
unsolved
|
|
topic, and we're stuck in this same arms race ourselves; we explore it
|
|
in more detail in Section~\ref{sec:discovery}. But first we examine the
|
|
other end of the spectrum---getting volunteers to run the proxies,
|
|
and telling only a few people about each proxy.
|
|
|
|
\subsection{Independent personal proxies}
|
|
|
|
Personal proxies such as Circumventor~\cite{circumventor} and
|
|
CGIProxy~\cite{cgiproxy} use the same technology as the public ones as
|
|
far as the relay component goes, but they use a different strategy for
|
|
discovery. Rather than managing a few centralized proxies and constantly
|
|
getting new addresses for them as the old addresses are blocked, they
|
|
aim to have a large number of entirely independent proxies, each managing
|
|
its own (much smaller) set of users.
|
|
|
|
As the Circumventor site explains, ``You don't
|
|
actually install the Circumventor \emph{on} the computer that is blocked
|
|
from accessing Web sites. You, or a friend of yours, has to install the
|
|
Circumventor on some \emph{other} machine which is not censored.''
|
|
|
|
This tactic has great advantages in terms of blocking-resistance---recall
|
|
our assumption in Section~\ref{sec:adversary} that the attention
a system attracts from the attacker grows with its number of
users and level of publicity. If each proxy has only a few users, and
|
|
there is no central list of proxies, most of them will never get noticed by
|
|
the censors.
|
|
|
|
On the other hand, there's a huge scalability question that so far has
|
|
prevented these schemes from being widely useful: how does the fellow
|
|
in China find a person in Ohio who will run a Circumventor for him? In
|
|
some cases he may know and trust some people on the outside, but in many
|
|
cases he's just out of luck. Just as hard, how does a new volunteer in
|
|
Ohio find a person in China who needs it?
|
|
|
|
% another key feature of a proxy run by your uncle is that you
|
|
% self-censor, so you're unlikely to bring abuse complaints onto
|
|
% your uncle. self-censoring clearly has a downside too, though.
|
|
|
|
This challenge leads to a hybrid design---centrally-distributed
|
|
personal proxies---which we will investigate in more detail in
|
|
Section~\ref{sec:discovery}.
|
|
|
|
\subsection{Open proxies}
|
|
|
|
Yet another currently used approach to bypassing firewalls is to locate
|
|
open and misconfigured proxies on the Internet. A quick Google search
|
|
for ``open proxy list'' yields a wide variety of freely available lists
|
|
of HTTP, HTTPS, and SOCKS proxies. Many small companies have sprung up
|
|
providing more refined lists to paying customers.
|
|
|
|
There are some downsides to using these open proxies though. First,
|
|
the proxies are of widely varying quality in terms of bandwidth and
|
|
stability, and many of them are entirely unreachable. Second, unlike
|
|
networks of volunteers like Tor, the legality of routing traffic through
|
|
these proxies is questionable: it's widely believed that most of them
|
|
don't realize what they're offering, and probably wouldn't allow it if
|
|
they realized. Third, in many cases the connection to the proxy is
|
|
unencrypted, so firewalls that filter based on keywords in IP packets
|
|
will not be hindered. And last, many users are suspicious that some
|
|
open proxies are a little \emph{too} convenient: are they run by the
|
|
adversary, in which case they get to monitor all the user's requests
|
|
just as single-hop proxies can?
|
|
|
|
A distributed-trust design like Tor resolves each of these issues for
|
|
the relay component, but a constantly changing set of thousands of open
|
|
relays is clearly a useful idea for a discovery component. For example,
|
|
users might be able to make use of these proxies to bootstrap their
|
|
first introduction into the Tor network.
|
|
|
|
\subsection{JAP}
|
|
|
|
Stefan K\"opsell's WPES paper~\cite{koepsell:wpes2004} is probably the
closest related work, and is the starting point for the design in this paper.
|
|
|
|
\subsection{Steganography}

Infranet

\subsection{Break your sensitive strings into multiple TCP packets;
ignore RSTs}

\subsection{Internal caching networks}

Freenet is deployed inside China and caches outside content.

\subsection{Skype}

Port-hopping. Encryption. Voice communications are not so susceptible to
keystroke loggers (even graphical ones).
|
|
|
|
|
|
\subsection{Tor itself}
|
|
|
|
And last, we include Tor itself in the list of current solutions
|
|
to firewalls. Tens of thousands of people use Tor from countries that
|
|
routinely filter their Internet. Tor's website has been blocked in most
|
|
of them. But why hasn't the Tor network been blocked yet?
|
|
|
|
We have several theories. The first is the most straightforward: tens of
|
|
thousands of people are simply too few to matter. It may help that Tor is
|
|
perceived to be for experts only, and thus not worth attention yet. The
|
|
more subtle variant on this theory is that we've positioned Tor in the
|
|
public eye as a tool for retaining civil liberties in more free countries,
|
|
so perhaps blocking authorities don't view it as a threat. (We revisit
|
|
this idea when we consider whether and how to publicize a Tor variant
|
|
that improves blocking-resistance---see Section~\ref{subsec:publicity}
|
|
for more discussion.)
|
|
|
|
The broader explanation is that the maintenance of most government-level
filters is aimed at stopping widespread information flow and appearing to be
in control, not at the impossible goal of blocking all possible ways to bypass
censorship. Censors realize that there will always
be ways for a few people to get around the firewall, and as long as Tor
has not publicly threatened their control, they see no urgent need to
block it yet.
|
|
|
|
We should recognize that we're \emph{already} in the arms race. These
|
|
constraints can give us insight into the priorities and capabilities of
|
|
our various attackers.
|
|
|
|
\section{The relay component of our blocking-resistant design}
|
|
\label{sec:bridges}
|
|
|
|
Section~\ref{sec:current-tor} describes many reasons why Tor is
|
|
well-suited as a building block in our context, but several changes will
|
|
allow the design to resist blocking better. The most critical changes are
|
|
to get more relay addresses, and to distribute them to users differently.
|
|
|
|
%We need to address three problems:
|
|
%- adapting the relay component of Tor so it resists blocking better.
|
|
%- Discovery.
|
|
%- Tor's network signature.
|
|
|
|
%Here we describe the new pieces we need to add to the current Tor design.
|
|
|
|
\subsection{Bridge relays}
|
|
|
|
Today, Tor servers operate on fewer than a thousand distinct IP addresses;
an adversary
could enumerate and block them all with little trouble. To provide a
|
|
means of ingress to the network, we need a larger set of entry points, most
|
|
of which an adversary won't be able to enumerate easily. Fortunately, we
|
|
have such a set: the Tor users.
|
|
|
|
Hundreds of thousands of people around the world use Tor. We can leverage
|
|
our already self-selected user base to produce a list of thousands of
|
|
often-changing IP addresses. Specifically, we can give them a little
|
|
button in the GUI that says ``Tor for Freedom'', and users who click
|
|
the button will turn into \emph{bridge relays} (or just \emph{bridges}
|
|
for short). They can rate limit relayed connections to 10 KB/s (almost
|
|
nothing for a broadband user in a free country, but plenty for a user
|
|
who otherwise has no access at all), and since they are just relaying
|
|
bytes back and forth between blocked users and the main Tor network, they
|
|
won't need to make any external connections to Internet sites. Because
|
|
of this separation of roles, and because we're making use of software
|
|
that the volunteers have already installed for their own use, we expect
|
|
our scheme to attract and maintain more volunteers than previous schemes.
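
One common way to enforce such a cap is a token bucket. The Python sketch
below is a minimal illustration under our own assumptions, not a description
of Tor's code: the class and method names are hypothetical, 1 KB is taken to
be 1024 bytes, and the caller supplies the current time so that the behaviour
is deterministic.

\begin{verbatim}
class TokenBucket:
    """Tokens are bytes; the bucket refills at `rate` bytes per second."""

    def __init__(self, rate, burst):
        self.rate = rate        # refill rate in bytes per second
        self.burst = burst      # maximum number of bytes that can accumulate
        self.tokens = burst     # start with a full bucket
        self.last = 0.0         # time of the last refill, in seconds

    def refill(self, now):
        """Add tokens for the time elapsed since the last refill."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def try_send(self, nbytes, now):
        """Spend tokens and return True if nbytes may be relayed now."""
        self.refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

RATE = 10 * 1024                              # 10 KB/s
bucket = TokenBucket(rate=RATE, burst=RATE)
first = bucket.try_send(8 * 1024, now=0.0)    # fits in the initial burst
second = bucket.try_send(8 * 1024, now=0.1)   # too soon: about 3 KB available
third = bucket.try_send(8 * 1024, now=1.0)    # bucket has refilled
\end{verbatim}

A bridge using a scheme like this would delay or queue traffic when
\texttt{try\_send} returns false, so relayed connections never exceed the
configured average rate by more than one burst.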
|
|
|
|
As usual, there are new anonymity and security implications from running a
|
|
bridge relay, particularly from letting people relay traffic through your
|
|
Tor client; but we leave this discussion for Section~\ref{sec:security}.
|
|
|
|
%...need to outline instructions for a Tor config that will publish
|
|
%to an alternate directory authority, and for controller commands
|
|
%that will do this cleanly.
|
|
|
|
\subsection{The bridge directory authority}
|
|
|
|
How do the bridge relays advertise their existence to the world? We
|
|
introduce a second new component of the design: a specialized directory
|
|
authority that aggregates and tracks bridges. Bridge relays periodically
|
|
publish server descriptors (summaries of their keys, locations, etc.,
|
|
signed by their long-term identity key), just like the relays in the
|
|
``main'' Tor network, but in this case they publish them only to the
|
|
bridge directory authorities.
|
|
|
|
The main difference between bridge authorities and the directory
|
|
authorities for the main Tor network is that the main authorities provide
|
|
a list of every known relay, but the bridge authorities only give
|
|
out a server descriptor if you already know its identity key. That is,
|
|
you can keep up-to-date on a bridge's location and other information
|
|
once you know about it, but you can't just grab a list of all the bridges.
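
The lookup-only interface described above can be sketched as follows. This
Python fragment is a hypothetical illustration: the class, the use of a SHA-1
digest as a stand-in for the identity key fingerprint, and the descriptor
fields are our assumptions, and it omits the signature checks a real
authority would perform on published descriptors.

\begin{verbatim}
import hashlib

def fingerprint(identity_key):
    """Hex digest standing in for a bridge's identity key fingerprint."""
    return hashlib.sha1(identity_key).hexdigest()

class BridgeAuthority:
    def __init__(self):
        self._descriptors = {}   # fingerprint -> most recent descriptor

    def publish(self, identity_key, descriptor):
        """Store (or replace) the descriptor for this bridge."""
        self._descriptors[fingerprint(identity_key)] = descriptor

    def lookup(self, fp):
        """Return the descriptor for a known fingerprint, else None."""
        return self._descriptors.get(fp)

    # Deliberately no method that lists or iterates over all bridges.

authority = BridgeAuthority()
key = b"hypothetical-bridge-identity-key"
authority.publish(key, {"address": "192.0.2.10", "or_port": 443})

known = authority.lookup(fingerprint(key))                  # descriptor
unknown = authority.lookup(fingerprint(b"some-other-key"))  # None
\end{verbatim}

A client that already holds a bridge's fingerprint can refresh that bridge's
address, while a client (or censor) without it learns nothing from the
authority's public interface.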
|
|
|
|
The identity key, IP address, and directory port for each bridge
|
|
authority ship by default with the Tor software, so the bridge relays
|
|
can be confident they're publishing to the right location, and the
|
|
blocked users can establish an encrypted authenticated channel. See
|
|
Section~\ref{subsec:trust-chain} for more discussion of the public key
|
|
infrastructure and trust chain.
|
|
|
|
Bridges use Tor to publish their descriptors privately and securely,
|
|
so even an attacker monitoring the bridge directory authority's network
|
|
can't make a list of all the addresses contacting the authority.
|
|
Bridges may publish to only a subset of the
|
|
authorities, to limit the potential impact of an authority compromise.
|
|
|
|
|
|
%\subsection{A simple matter of engineering}
|
|
%
|
|
%Although we've described bridges and bridge authorities in simple terms
|
|
%above, some design modifications and features are needed in the Tor
|
|
%codebase to add them. We describe the four main changes here.
|
|
%
|
|
%Firstly, we need to get smarter about rate limiting:
|
|
%Bandwidth classes
|
|
%
|
|
%Secondly, while users can in fact configure which directory authorities
|
|
%they use, we need to add a new type of directory authority and teach
|
|
%bridges to fetch directory information from the main authorities while
|
|
%publishing server descriptors to the bridge authorities. We're most of
|
|
%the way there, since we can already specify attributes for directory
|
|
%authorities:
|
|
%add a separate flag named ``blocking''.
|
|
%
|
|
%Thirdly, need to build paths using bridges as the first
|
|
%hop. One more hole in the non-clique assumption.
|
|
%
|
|
%Lastly, since bridge authorities don't answer full network statuses,
|
|
%we need to add a new way for users to learn the current status for a
|
|
%single relay or a small set of relays---to answer such questions as
|
|
%``is it running?'' or ``is it behaving correctly?'' We describe in
|
|
%Section~\ref{subsec:enclave-dirs} a way for the bridge authority to
|
|
%publish this information without resorting to signing each answer
|
|
%individually.
|
|
|
|
\subsection{Putting them together}
|
|
\label{subsec:relay-together}
|
|
|
|
If a blocked user knows the identity keys of a set of bridge relays, and
|
|
he has correct address information for at least one of them, he can use
|
|
that one to make a secure connection to the bridge authority and update
|
|
his knowledge about the other bridge relays. He can also use it to make
|
|
secure connections to the main Tor network and directory servers, so he
|
|
can build circuits and connect to the rest of the Internet. All of these
|
|
updates happen in the background: from the blocked user's perspective,
|
|
he just accesses the Internet via his Tor client like always.
|
|
|
|
So now we've reduced the problem from how to circumvent the firewall
|
|
for all transactions (and how to know that the pages you get have not
|
|
been modified by the local attacker) to how to learn about a working
|
|
bridge relay.
|
|
|
|
There's another catch, though. We need to make sure that the network
|
|
traffic we generate by simply connecting to a bridge relay doesn't stand
|
|
out too much.
|
|
|
|
%The following section describes ways to bootstrap knowledge of your first
|
|
%bridge relay, and ways to maintain connectivity once you know a few
|
|
%bridge relays.
|
|
|
|
% (See Section~\ref{subsec:first-bridge} for a discussion
|
|
%of exactly what information is sufficient to characterize a bridge relay.)
|
|
|
|
|
|
|
|
\section{Hiding Tor's network signatures}
|
|
\label{sec:network-signature}
|
|
\label{subsec:enclave-dirs}
|
|
|
|
Currently, Tor uses two protocols for its network communications. The
|
|
main protocol uses TLS for encrypted and authenticated communication
|
|
between Tor instances. The second protocol is standard HTTP, used for
|
|
fetching directory information. All Tor servers listen on their ``ORPort''
|
|
for TLS connections, and some of them opt to listen on their ``DirPort''
|
|
as well, to serve directory information. Tor servers choose whatever port
|
|
numbers they like; the server descriptor they publish to the directory
|
|
tells users where to connect.
|
|
|
|
One format for communicating address information about a bridge relay is
|
|
its IP address and DirPort. From there, the user can ask the bridge's
|
|
directory cache for an up-to-date copy of its server descriptor, and
|
|
learn its current circuit keys, its ORPort, and so on.
|
|
|
|
However, connecting directly to the directory cache involves a plaintext
|
|
HTTP request. A censor could create a network signature for the request
|
|
and/or its response, thus preventing these connections. To resolve this
|
|
vulnerability, we've modified the Tor protocol so that users can connect
|
|
to the directory cache via the main Tor port---they establish a TLS
|
|
connection with the bridge as normal, and then send a special ``begindir''
|
|
relay command to establish an internal connection to its directory cache.
|
|
|
|
Therefore a better way to summarize a bridge's address is by its IP
|
|
address and ORPort, so all communications between the client and the
|
|
bridge will use ordinary TLS. But there are other details that need
|
|
more investigation.
|
|
|
|
What port should bridges pick for their ORPort? We currently recommend
|
|
that they listen on port 443 (the default HTTPS port) if they want to
|
|
be most useful, because clients behind standard firewalls will have
|
|
the best chance to reach them. Is this the best choice in all cases,
|
|
or should we encourage some fraction of them to pick random ports, or other
|
|
ports commonly permitted through firewalls like 53 (DNS) or 110
|
|
(POP)? Or perhaps we should use other ports where TLS traffic is
|
|
expected, like 993 (IMAPS) or 995 (POP3S). We need more research on our
|
|
potential users, and their current and anticipated firewall restrictions.
|
|
|
|
Furthermore, we need to look at the specifics of Tor's TLS handshake.
|
|
Right now Tor uses some predictable strings in its TLS handshakes. For
|
|
example, it sets the X.509 organizationName field to ``Tor'', and it puts
|
|
the Tor server's nickname in the certificate's commonName field. We
|
|
should tweak the handshake protocol so it doesn't rely on any unusual details
|
|
in the certificate, yet it remains secure; the certificate itself
|
|
should be made to resemble an ordinary HTTPS certificate. We should also try
|
|
to make our advertised cipher-suites closer to what an ordinary web server
|
|
would support.
|
|
|
|
Tor's TLS handshake uses two-certificate chains: one certificate
|
|
contains the self-signed identity key for
|
|
the router, and the second contains a current TLS key, signed by the
|
|
identity key. We use these to authenticate that we're talking to the right
|
|
router, and to limit the impact of TLS-key exposure. Most (though far from
|
|
all) consumer-oriented HTTPS services provide only a single certificate.
|
|
These extra certificates may help identify Tor's TLS handshake; instead,
|
|
bridges should consider using only a single TLS key certificate signed by
|
|
their identity key, and providing the full value of the identity key in an
|
|
early handshake cell. More significantly, Tor currently has all clients
|
|
present certificates, so that clients are harder to distinguish from servers.
|
|
But in a blocking-resistance environment, clients should not present
|
|
certificates at all.
|
|
|
|
Last, what if the adversary starts observing the network traffic even
|
|
more closely? Even if our TLS handshake looks innocent, our traffic timing
|
|
and volume still look different from those of a user making a secure web connection
|
|
to his bank. The same techniques used in the growing trend to build tools
|
|
to recognize encrypted BitTorrent traffic
|
|
%~\cite{bt-traffic-shaping}
|
|
could be used to identify Tor communication and recognize bridge
|
|
relays. Rather than trying to look like encrypted web traffic, we may be
|
|
better off trying to blend with some other encrypted network protocol. The
|
|
first step is to compare typical network behavior for a Tor client to
|
|
typical network behavior for various other protocols. This statistical
|
|
cat-and-mouse game is made more complex by the fact that Tor transports a
|
|
variety of protocols, and we'll want to automatically handle web browsing
|
|
differently from, say, instant messaging.
|
|
|
|
% Tor cells are 512 bytes each. So TLS records will be roughly
|
|
% multiples of this size? How bad is this? -RD
|
|
% Look at ``Inferring the Source of Encrypted HTTP Connections''
|
|
% by Marc Liberatore and Brian Neil Levine (CCS 2006)
|
|
% They substantially flesh out the numbers for the web fingerprinting
|
|
% attack. -PS
|
|
% Yes, but I meant detecting the signature of Tor traffic itself, not
|
|
% learning what websites we're going to. I wouldn't be surprised to
|
|
% learn that these are related problems, but it's not obvious to me. -RD
|
|
|
|
\subsection{Identity keys as part of addressing information}
|
|
|
|
We have described a way for the blocked user to bootstrap into the
|
|
network once he knows the IP address and ORPort of a bridge. What about
|
|
local spoofing attacks? That is, since we never learned an identity
|
|
key fingerprint for the bridge, a local attacker could intercept our
|
|
connection and pretend to be the bridge we had in mind. It turns out
|
|
that giving false information isn't that bad---since the Tor client
|
|
ships with trusted keys for the bridge directory authority and the Tor
|
|
network directory authorities, the user can learn whether he's being
|
|
given a real connection to the bridge authorities or not. (After all,
|
|
if the adversary intercepts every connection the user makes and gives
|
|
him a bad connection each time, there's nothing we can do.)
|
|
|
|
What about anonymity-breaking attacks from observing traffic, if the
|
|
blocked user doesn't start out knowing the identity key of his intended
|
|
bridge? The vulnerabilities aren't so bad in this case either---the
|
|
adversary could do similar attacks just by monitoring the network
|
|
traffic.
|
|
% cue paper by steven and george
|
|
|
|
Once the Tor client has fetched the bridge's server descriptor, it should
|
|
remember the identity key fingerprint for that bridge relay. Thus if
|
|
the bridge relay moves to a new IP address, the client can query the
|
|
bridge directory authority to look up a fresh server descriptor using
|
|
this fingerprint.
|
|
|
|
So we've shown that it's \emph{possible} to bootstrap into the network
|
|
just by learning the IP address and ORPort of a bridge, but are there
|
|
situations where it's more convenient or more secure to learn the bridge's
|
|
identity fingerprint as well, or instead, while bootstrapping? We keep
|
|
that question in mind as we next investigate bootstrapping and discovery.
|
|
|
|
\section{Discovering working bridge relays}
|
|
\label{sec:discovery}
|
|
|
|
Tor's modular design means that we can develop a better relay component
|
|
independently of developing the discovery component. This modularity's
|
|
great promise is that we can pick any discovery approach we like; but the
|
|
unfortunate fact is that we have no magic bullet for discovery. We're
|
|
in the same arms race as all the other designs we described in
|
|
Section~\ref{sec:related}.
|
|
|
|
In this section we describe a variety of approaches to adding discovery
|
|
components for our design.
|
|
|
|
\subsection{Bootstrapping: finding your first bridge}
|
|
\label{subsec:first-bridge}
|
|
|
|
In Section~\ref{subsec:relay-together}, we showed that a user who knows
|
|
a working bridge address can use it to reach the bridge authority and
|
|
to stay connected to the Tor network. But how do new users reach the
|
|
bridge authority in the first place? After all, the bridge authority
|
|
will be one of the first addresses that a censor blocks.
|
|
|
|
First, we should recognize that most government firewalls are not
|
|
perfect. That is, they may allow connections to Google cache or some
|
|
open proxy servers, or they let file-sharing traffic, Skype, instant
|
|
messaging, or World-of-Warcraft connections through. Different users will
|
|
have different mechanisms for bypassing the firewall initially. Second,
|
|
we should remember that most people don't operate in a vacuum; users will
|
|
hopefully know other people who are in other situations or have other
|
|
resources available. In the rest of this section we develop a toolkit
|
|
of different options and mechanisms, so that we can enable users in a
|
|
diverse set of contexts to bootstrap into the system.
|
|
|
|
(For users who can't use any of these techniques, hopefully they know
|
|
a friend who can---for example, perhaps the friend already knows some
|
|
bridge relay addresses. If they can't get around the firewall at all, then we
|
|
can't help them---they should go meet more people or learn more about
|
|
the technology running the firewall in their area.)
|
|
|
|
By deploying all the schemes in the toolkit at once, we let bridges and
|
|
blocked users employ the discovery approach that is most appropriate
|
|
for their situation.
|
|
|
|
\subsection{Independent bridges, no central discovery}
|
|
|
|
The first design is simply to have no centralized discovery component at
|
|
all. Volunteers run bridges, and we assume they have some blocked users
|
|
in mind and communicate their address information to them out-of-band
|
|
(for example, through Gmail). This design allows for small personal
|
|
bridges that have only one or a handful of users in mind, but it can
|
|
also support an entire community of users. For example, Citizen Lab's
|
|
upcoming Psiphon single-hop proxy tool~\cite{psiphon} plans to use this
|
|
\emph{social network} approach as its discovery component.
|
|
|
|
There are several ways to do bootstrapping in this design. In the simple
|
|
case, the operator of the bridge informs each chosen user about his
|
|
bridge's address information and/or keys. A different approach involves
|
|
blocked users introducing new blocked users to the bridges they know.
|
|
That is, somebody in the blocked area can pass along a bridge's address to
|
|
somebody else they trust. This scheme brings in appealing but complex
game-theoretic properties: the blocked user making the decision has an incentive
to delegate only to trustworthy people, since an adversary who learns
|
|
the bridge's address and filters it makes it unavailable for both of them.
|
|
Also, delegating known bridges to members of your social network can be
|
|
dangerous: an adversary who can learn who knows which bridges may
|
|
be able to reconstruct the social network.
|
|
|
|
Note that a central set of bridge directory authorities can still be
|
|
compatible with a decentralized discovery process. That is, how users
|
|
first learn about bridges is entirely up to the bridges, but the process
|
|
of fetching up-to-date descriptors for them can still proceed as described
|
|
in Section~\ref{sec:bridges}. Of course, creating a central place that
|
|
knows about all the bridges may not be smart, especially if every other
|
|
piece of the system is decentralized. Further, if a user only knows
|
|
about one bridge and he loses track of it, it may be quite a hassle to
|
|
reach the bridge authority. We address these concerns next.
|
|
|
|
\subsection{Families of bridges, no central discovery}
|
|
|
|
Because the blocked users are running our software too, we have many
|
|
opportunities to improve usability or robustness. Our second design builds
|
|
on the first by encouraging volunteers to run several bridges at once
|
|
(or coordinate with other bridge volunteers), such that some
|
|
of the bridges are likely to be available at any given time.
|
|
|
|
The blocked user's Tor client would periodically fetch an updated set of
|
|
recommended bridges from any of the working bridges. Now the client can
|
|
learn new additions to the bridge pool, and can expire abandoned bridges
|
|
or bridges that the adversary has blocked, without the user ever needing
|
|
to care. To simplify maintenance of the community's bridge pool, each
|
|
community could run its own bridge directory authority---reachable via
|
|
the available bridges, and also mirrored at each bridge.
|
|
|
|
\subsection{Public bridges with central discovery}
|
|
|
|
What about people who want to volunteer as bridges but don't know any
|
|
suitable blocked users? What about people who are blocked but don't
|
|
know anybody on the outside? Here we describe how to make use of these
|
|
\emph{public bridges} in a way that still makes it hard for the attacker
|
|
to learn all of them.
|
|
|
|
The basic idea is to divide public bridges into a set of pools based on
|
|
identity key. Each pool corresponds to a \emph{distribution strategy}:
|
|
an approach to distributing its bridge addresses to users. Each strategy
|
|
is designed to exercise a different scarce resource or property of
|
|
the user.
|
|
|
|
How do we divide bridges between these strategy pools such that they're
|
|
evenly distributed and the allocation is hard to influence or predict,
|
|
but also in a way that's amenable to creating more strategies later
|
|
on without reshuffling all the pools? We assign a given bridge
|
|
to a strategy pool by hashing the bridge's identity key along with a
|
|
secret that only the bridge authority knows: the first $n$ bits of this
|
|
hash dictate the strategy pool number, where $n$ is a parameter that
|
|
describes how many strategy pools we want at this point. We choose $n=3$
|
|
to start, so we divide bridges between 8 pools; but as we later invent
|
|
new distribution strategies, we can increment $n$ to split the 8 into
|
|
16. Since a bridge can't predict the next bit in its hash, it can't
|
|
anticipate which identity key will correspond to a certain new pool
|
|
when the pools are split. Further, since the bridge authority doesn't
|
|
provide any feedback to the bridge about which strategy pool it's in,
|
|
an adversary who signs up bridges with the goal of filling a certain
|
|
pool~\cite{casc-rep} will be hindered.
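
As a sketch of the allocation rule above, under notation that is ours
rather than part of the design: write $K$ for a bridge's identity key,
$s$ for the bridge authority's secret, $H$ for some cryptographic hash
function (the design does not fix one, and the input order $K \,\|\, s$
is an assumption), and $b_1 b_2 \ldots$ for the bits of $H(K \,\|\, s)$,
most significant first. Reading the first $n$ bits as an integer gives
the pool number
\[
  p_n(K) \;=\; \sum_{i=1}^{n} b_i \, 2^{\,n-i}, \qquad 0 \le p_n(K) < 2^n ,
\]
and incrementing $n$ refines each pool rather than reshuffling it:
\[
  p_{n+1}(K) \;=\; 2\,p_n(K) + b_{n+1} .
\]
For example, a bridge whose hash begins $1011\ldots$ sits in pool
$p_3 = 5$ of 8; after the split it sits in pool $p_4 = 11$ of 16, and
every other bridge from old pool 5 lands in new pool 10 or 11.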
|
|
|
|
% This algorithm is not ideal. When we split pools, each existing
|
|
% pool is cut in half, where half the bridges remain with the
|
|
% old distribution policy, and half will be under what the new one
|
|
% is. So the new distribution policy inherits a bunch of blocked
|
|
% bridges if the old policy was too loose, or a bunch of unblocked
|
|
% bridges if its policy was still secure. -RD
|
|
%
|
|
% I think it should be more chordlike.
|
|
% Bridges are allocated to wherever on the ring which is divided
|
|
% into arcs (buckets).
|
|
% If a bucket gets too full, you can just split it.
|
|
% More on this below. -PFS
|
|
|
|
The first distribution strategy (used for the first pool) publishes bridge
|
|
addresses in a time-release fashion. The bridge authority divides the
|
|
available bridges into partitions, and each partition is deterministically
|
|
available only in certain time windows. That is, over the course of a
|
|
given time slot (say, an hour), each requestor is given a random bridge
|
|
from within that partition. When the next time slot arrives, a new set
|
|
of bridges from the pool is available for discovery. Thus some bridge
|
|
address is always available when a new
|
|
user arrives, but to learn about all bridges the attacker needs to fetch
|
|
all new addresses at every new time slot. By varying the length of the
|
|
time slots, we can make it harder for the attacker to guess when to check
|
|
back. We expect these bridges will be the first to be blocked, but they'll
|
|
help the system bootstrap until they \emph{do} get blocked. Further,
|
|
remember that we're dealing with different blocking regimes around the
|
|
world that will progress at different rates---so this pool will still
|
|
be useful to some users even as the arms races progress.
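
One way to make the time-release schedule concrete, assuming a simple
round-robin rotation with fixed-length slots (a simplification that the
text above does not mandate): let the pool be split into $m$ partitions
$P_0,\ldots,P_{m-1}$, let $T$ be the slot length, and let $t$ be the
current time measured from an arbitrary epoch. The partition on offer
at time $t$ is then
\[
  P_{\sigma(t)}, \qquad \sigma(t) \;=\; \left\lfloor t/T \right\rfloor \bmod m .
\]
With $m = 24$ and $T$ equal to one hour, each partition is discoverable
for one hour a day. Within a slot, an attacker who wants every bridge
in a partition of $k$ bridges, and who receives a uniformly random one
per request, needs
\[
  k\,H_k \;=\; k \sum_{j=1}^{k} \frac{1}{j}
\]
requests in expectation (the coupon-collector bound); for $k = 20$ that
is about 72 requests. Varying the slot lengths would replace
$\lfloor t/T \rfloor$ with a slot counter that the attacker cannot
predict.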
|
|
|
|
The second distribution strategy publishes bridge addresses based on the IP
|
|
address of the requesting user. Specifically, the bridge authority will
|
|
divide the available bridges in the pool into a bunch of partitions
|
|
(as in the first distribution scheme), hash the requestor's IP address
|
|
with a secret of its own (as in the above allocation scheme for creating
|
|
pools), and give the requestor a random bridge from the appropriate
|
|
partition. To raise the bar, we should discard the last octet of the
|
|
IP address before inputting it to the hash function, so an attacker
|
|
who only controls a single ``/24'' network only counts as one user. A
|
|
large attacker like China will still be able to control many addresses,
|
|
but the hassle of establishing connections from each network (or spoofing
|
|
TCP connections) may still slow them down. Similarly, as a special case,
|
|
we should treat IP addresses that are Tor exit nodes as all being on
|
|
the same network.
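
The mapping from requestor to partition can be sketched as follows,
with symbols of our own choosing: $a$ is the requestor's IPv4 address,
$\mathrm{net}_{24}(a)$ is $a$ with its last octet discarded, $s'$ is a
secret held by the bridge authority, $H$ is a cryptographic hash, and
$m$ is the number of partitions. Reducing the hash modulo $m$ is an
assumption; any fixed map from hash values to partitions would do.
\[
  \pi(a) \;=\; H\bigl(\mathrm{net}_{24}(a) \,\|\, s'\bigr) \bmod m .
\]
Since $\mathrm{net}_{24}(192.0.2.17) = \mathrm{net}_{24}(192.0.2.200)$,
both requestors are served from the same partition $P_{\pi(a)}$. The
Tor exit special case amounts to replacing $\mathrm{net}_{24}(a)$ by one
fixed label whenever $a$ is a known exit address.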
|
|
|
|
The third strategy combines the time-based and location-based
|
|
strategies to further constrain and rate-limit the available bridge
|
|
addresses. Specifically, the bridge address provided in a given time
|
|
slot to a given network location is deterministic within the partition,
|
|
rather than chosen randomly each time from the partition. Thus, repeated
|
|
requests during that time slot from a given network are given the same
|
|
bridge address as the first request.
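
Under assumed notation, the third strategy can be read as making the
choice inside the partition a function of slot and network instead of a
fresh random draw. Let $\tau$ be the number of the current time slot,
$n_a$ the requestor's network (its address with the last octet
discarded), $s''$ a secret of the bridge authority, $H$ a cryptographic
hash, and $B_0,\ldots,B_{k-1}$ the bridges of the partition on offer.
The bridge returned is
\[
  B_{j}, \qquad j \;=\; H\bigl(n_a \,\|\, \tau \,\|\, s''\bigr) \bmod k .
\]
Two requests with the same pair $(n_a,\tau)$ get the same $j$, so
repeating a query within a slot reveals nothing new; learning a new
bridge requires either a new slot or a new network.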
|
|
|
|
The fourth strategy is based on Circumventor's discovery strategy.
|
|
The Circumventor project, realizing that its adoption will remain limited
|
|
if it has no central coordination mechanism, has started a mailing list to
|
|
distribute new proxy addresses every few days. From experimentation it
|
|
seems they have concluded that sending updates every three or four days
|
|
is sufficient to stay ahead of the current attackers.
|
|
|
|
The fifth strategy provides an alternative approach to a mailing list:
|
|
users provide an email address and receive an automated response
|
|
listing an available bridge address. We could limit responses to one per
|
|
email address. To further rate limit queries, we could require a CAPTCHA
|
|
solution
|
|
%~\cite{captcha}
|
|
in each case too. In fact, we wouldn't need to
|
|
implement the CAPTCHA on our side: if we only deliver bridge addresses
|
|
to Yahoo or Gmail addresses, we can leverage the rate-limiting schemes
|
|
that other parties already impose for account creation.
|
|
|
|
The sixth strategy ties in the social network design with public
|
|
bridges and a reputation system. We pick some seeds---trusted people in
|
|
blocked areas---and give them each a few dozen bridge addresses and a few
|
|
\emph{delegation tokens}. We run a website next to the bridge authority,
|
|
where users can log in (they connect via Tor, and they don't need to
|
|
provide actual identities, just persistent pseudonyms). Users can delegate
|
|
trust to other people they know by giving them a token, which can be
|
|
exchanged for a new account on the website. Accounts in ``good standing''
|
|
then accrue new bridge addresses and new tokens. As usual, reputation
|
|
schemes bring in a host of new complexities~\cite{rep-anon}: how do we
|
|
decide that an account is in good standing? We could tie reputation
|
|
to whether the bridges they're told about have been blocked---see
|
|
Section~\ref{subsec:geoip} below for initial thoughts on how to discover
|
|
whether bridges have been blocked. We could track reputation between
|
|
accounts (if you delegate to somebody who screws up, it impacts you too),
|
|
or we could use blinded delegation tokens~\cite{chaum-blind} to prevent
|
|
the website from mapping the seeds' social network. We put off deeper
|
|
discussion of the social network reputation strategy for future work.
|
|
|
|
Pools seven and eight are held in reserve, in case our currently deployed
|
|
tricks all fail at once and the adversary blocks all those bridges---so
|
|
we can adapt and move to new approaches quickly, and have some bridges
|
|
immediately available for the new schemes. New strategies might be based
|
|
on some other scarce resource, such as relaying traffic for others or
|
|
other proof of energy spent. (We might also worry about the incentives
|
|
for bridges that sign up and get allocated to the reserve pools: will they
|
|
be unhappy that they're not being used? But this is a transient problem:
|
|
if Tor users are bridges by default, nobody will mind not being used yet.
|
|
See also Section~\ref{subsec:incentives}.)
|
|
|
|
%Is it useful to load balance which bridges are handed out? The above
|
|
%pool concept makes some bridges wildly popular and others less so.
|
|
%But I guess that's the point.
|
|
|
|
\subsection{Public bridges with coordinated discovery}
|
|
|
|
We presented the above discovery strategies in the context of a single
|
|
bridge directory authority, but in practice we will want to distribute the
|
|
operations over several bridge authorities---a single point of failure
|
|
or attack is a bad move. The first answer is to run several independent
|
|
bridge directory authorities, and bridges gravitate to one based on
|
|
their identity key. The better answer would be some federation of bridge
|
|
authorities that work together to provide redundancy but don't introduce
|
|
new security issues. We could even imagine designs where the bridge
|
|
authorities have encrypted versions of the bridges' server descriptors,
|
|
and the users learn a decryption key that they keep private when they
|
|
first hear about the bridge---this way the bridge authorities would not
|
|
be able to learn the IP addresses of the bridges.
|
|
|
|
We leave this design question for future work.
|
|
|
|
\subsection{Assessing whether bridges are useful}
|
|
|
|
Learning whether a bridge is useful is important in the bridge authority's
|
|
decision to include it in responses to blocked users. For example, if
|
|
we end up with a list of thousands of bridges and only a few dozen of
|
|
them are reachable right now, most blocked users will not end up knowing
|
|
about working bridges.
|
|
|
|
There are three components for assessing how useful a bridge is. First,
|
|
is it reachable from the public Internet? Second, what proportion of
|
|
the time is it available? Third, is it blocked in certain jurisdictions?
|
|
|
|
The first component can be tested just as we test reachability of
|
|
ordinary Tor servers. Specifically, the bridges do a self-test---connect
|
|
to themselves via the Tor network---before they are willing to
|
|
publish their descriptor, to make sure they're not obviously broken or
|
|
misconfigured. Once the bridges publish, the bridge authority also tests
|
|
reachability to make sure they're not confused or outright lying.
|
|
|
|
The second component can be measured and tracked by the bridge authority.
|
|
By doing periodic reachability tests, we can get a sense of how often the
|
|
bridge is available. More complex tests will involve bandwidth-intensive
|
|
checks to force the bridge to commit resources in order to be counted as
|
|
available. We need to evaluate how a bridge's uptime percentage
|
|
should weigh into our choice of which bridges to advertise. We leave
|
|
this to future work.
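
A minimal way to quantify the second component, offered as an
illustration and not as the measure the design will adopt: if the
bridge authority runs $N$ reachability tests against a bridge and
$x_i \in \{0,1\}$ records whether test $i$ succeeded, the observed
availability is
\[
  \hat{u} \;=\; \frac{1}{N}\sum_{i=1}^{N} x_i .
\]
A bridge that answers 18 of 24 hourly tests has $\hat{u} = 0.75$. How
$\hat{u}$ should be weighed against other factors remains open.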
|
|
|
|
The third component is perhaps the trickiest: with many different
|
|
adversaries out there, how do we keep track of which adversaries have
|
|
blocked which bridges, and how do we learn about new blocks as they
|
|
occur? We examine this problem next.
|
|
|
|
\subsection{How do we know if a bridge relay has been blocked?}
|
|
\label{subsec:geoip}
|
|
|
|
There are two main mechanisms for testing whether bridges are reachable
|
|
from inside each blocked area: active testing via users, and passive
|
|
testing via bridges.
|
|
|
|
In the case of active testing, certain users inside each area
|
|
sign up as testing relays. The bridge authorities can then use a
|
|
Blossom-like~\cite{blossom-thesis} system to build circuits through them
|
|
to each bridge and see if they can establish the connection. But how do
|
|
we pick the users? If we ask random users to do the testing (or if we
|
|
solicit volunteers from the users), the adversary can simply sign up so he
|
|
can enumerate the bridges we test. Indeed, even if we hand-select our
|
|
testers, the adversary might still discover their location and monitor
|
|
their network activity to learn bridge addresses.
|
|
|
|
Another answer is not to measure directly, but rather let the bridges
|
|
report whether they're being used.
|
|
%If they periodically report to their
|
|
%bridge directory authority how much use they're seeing, perhaps the
|
|
%authority can make smart decisions from there.
|
|
Specifically, bridges should install a GeoIP database such as the public
|
|
IP-To-Country list~\cite{ip-to-country}, and then periodically report to the
|
|
bridge authorities which countries they're seeing use from. This data
|
|
would help us track which countries are making use of the bridge design,
|
|
and can also let us learn about new steps the adversary has taken in
|
|
the arms race. (The compressed GeoIP database is only several hundred
|
|
kilobytes, and we could even automate the update process by serving it
|
|
from the bridge authorities.)
|
|
More analysis of this passive reachability
|
|
testing design is needed to resolve its many edge cases: for example,
|
|
if a bridge stops seeing use from a certain area, does that mean the
|
|
bridge is blocked or does that mean those users are asleep?
|
|
|
|
There are many more problems with the general concept of detecting whether
|
|
bridges are blocked. First, different zones of the Internet are blocked
|
|
in different ways, and the actual firewall jurisdictions do not match
|
|
country borders. Our bridge scheme could help us map out the topology
|
|
of the censored Internet, but this is a huge task. More generally,
|
|
if a bridge relay isn't reachable, is that because of a network block
|
|
somewhere, because of a problem at the bridge relay, or just a temporary
|
|
outage somewhere in between? And last, an attacker could poison our
|
|
bridge database by signing up already-blocked bridges. In this case,
|
|
if we're stingy giving out bridge addresses, users in that country won't
|
|
learn working bridges.
|
|
|
|
All of these issues are made more complex when we try to integrate this
|
|
testing into our social network reputation system above.
|
|
Since in that case we punish or reward users based on whether bridges
|
|
get blocked, the adversary has new attacks to trick or bog down the
|
|
reputation tracking. Indeed, the bridge authority doesn't even know
|
|
what zone the blocked user is in, so do we blame him for any possible
|
|
censored zone, or what?
|
|
|
|
Clearly more analysis is required. The eventual solution will probably
|
|
involve a combination of passive measurement via GeoIP and active
|
|
measurement from trusted testers. More generally, we can use the passive
|
|
feedback mechanism to track usage of the bridge network as a whole---which
|
|
would let us respond to attacks and adapt the design, and it would also
|
|
let the general public track the progress of the project.
|
|
|
|
%Worry: the adversary could choose not to block bridges but just record
|
|
%connections to them. So be it, I guess.
|
|
|
|
\subsection{Advantages of deploying all solutions at once}
|
|
|
|
For once, we're not in the position of the defender: we don't have to
|
|
defend against every possible filtering scheme; we just have to defend
|
|
against at least one. On the flip side, the attacker is forced to guess
|
|
how to allocate his resources to defend against each of these discovery
|
|
strategies. So by deploying all of our strategies at once, we not only
|
|
increase our chances of finding one that the adversary has difficulty
|
|
blocking, but we actually make \emph{all} of the strategies more robust
|
|
in the face of an adversary with limited resources.
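
A back-of-the-envelope calculation illustrates the point, under the
simplifying assumption (ours) that the adversary defeats strategy $i$
with probability $q_i$ independently of the others. With $k$ strategies
deployed, at least one remains usable with probability
\[
  1 - \prod_{i=1}^{k} q_i .
\]
For instance, if each of three strategies is blocked with probability
$0.9$, some strategy survives with probability $1 - 0.9^3 = 0.271$,
compared with $0.1$ for any single strategy on its own. A real
adversary's successes are unlikely to be independent, so this is only a
rough guide.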
|
|
|
|
%\subsection{Remaining unsorted notes}
|
|
|
|
%In the first subsection we describe how to find a first bridge.
|
|
|
|
%Going to be an arms race. Need a bag of tricks. Hard to say
|
|
%which ones will work. Don't spend them all at once.
|
|
|
|
%Some techniques are sufficient to get us an IP address and a port,
|
|
%and others can get us IP:port:key. Lay out some plausible options
|
|
%for how users can bootstrap into learning their first bridge.
|
|
|
|
%\section{The account / reputation system}
|
|
%\section{Social networks with directory-side support}
|
|
%\label{sec:accounts}
|
|
|
|
%One answer is to measure based on whether the bridge addresses
|
|
%we give it end up blocked. But how do we decide if they get blocked?
|
|
|
|
%Perhaps each bridge should be known by a single bridge directory
|
|
%authority. This makes it easier to trace which users have learned about
|
|
%it, so easier to blame or reward. It also makes things more brittle,
|
|
%since loss of that authority means its bridges aren't advertised until
|
|
%they switch, and means its bridge users are sad too.
|
|
%(Need a slick hash algorithm that will map our identity key to a
|
|
%bridge authority, in a way that's sticky even when we add bridge
|
|
%directory authorities, but isn't sticky when our authority goes
|
|
%away. Does this exist?)
|
|
|
|
%\subsection{Discovery based on social networks}
|
|
|
|
%A token that can be exchanged at the bridge authority (assuming you
|
|
%can reach it) for a new bridge address.
|
|
|
|
%The account server runs as a Tor controller for the bridge authority.
|
|
|
|
%Users can establish reputations, perhaps based on social network
|
|
%connectivity, perhaps based on not getting their bridge relays blocked,
|
|
|
|
%Probably the most critical lesson learned in past work on reputation
|
|
%systems in privacy-oriented environments~\cite{rep-anon} is the need for
|
|
%verifiable transactions. That is, the entity computing and advertising
|
|
%reputations for participants needs to actually learn in a convincing
|
|
%way that a given transaction was successful or unsuccessful.
|
|
|
|
%(Lesson from designing reputation systems~\cite{rep-anon}: easy to
|
|
%reward good behavior, hard to punish bad behavior.
|
|
|
|
\section{Security considerations}
|
|
\label{sec:security}
|
|
|
|
\subsection{Possession of Tor in oppressed areas}
|
|
|
|
Many people speculate that installing and using a Tor client in areas with
|
|
particularly extreme firewalls is a high risk---and the risk increases
|
|
as the firewall gets more restrictive. This notion certainly has merit, but
there's a counterpressure as well: as the firewall gets more restrictive, more
|
|
ordinary people behind it end up using Tor for more mainstream activities,
|
|
such as learning
|
|
about Wall Street prices or looking at pictures of women's ankles. So
|
|
as the restrictive firewall pushes up the number of Tor users, the
|
|
``typical'' Tor user becomes more mainstream, and therefore mere
|
|
use or possession of the Tor software is not so surprising.
|
|
|
|
It's hard to say which of these pressures will ultimately win out,
|
|
but we should keep both sides of the issue in mind.
|
|
|
|
%Nick, want to rewrite/elaborate on this section?
|
|
|
|
\subsection{Observers can tell who is publishing and who is reading}
|
|
\label{subsec:upload-padding}
|
|
|
|
Tor encrypts traffic on the local network, and it obscures the eventual
|
|
destination of the communication, but it doesn't do much to obscure the
|
|
traffic volume. In particular, a user publishing a home video will have a
|
|
different network signature than a user reading an online news article.
|
|
Based on our assumption in Section~\ref{sec:assumptions} that users who
|
|
publish material are in more danger, should we work to improve Tor's
|
|
security in this situation?
|
|
|
|
In the general case this is an extremely challenging task:
|
|
effective \emph{end-to-end traffic confirmation attacks}
|
|
are known where the adversary observes the origin and the
|
|
destination of traffic and confirms that they are part of the
|
|
same communication~\cite{danezis:pet2004,e2e-traffic}. Related are
|
|
\emph{website fingerprinting attacks}, where the adversary downloads
|
|
a few hundred popular websites, makes a set of ``signatures'' for each
|
|
site, and then observes the target Tor client's traffic to look for
|
|
a match~\cite{pet05-bissias,defensive-dropping}. But can we do better
|
|
against a limited adversary who just does coarse-grained sweeps looking
|
|
for unusually prolific publishers?
|
|
|
|
One answer is for bridge users to automatically send bursts of padding
|
|
traffic periodically. (This traffic can be implemented in terms of
|
|
long-range drop cells, which are already part of the Tor specification.)
|
|
Of course, convincingly simulating an actual human publishing interesting
|
|
content is a difficult arms race, but it may be worthwhile to at least
|
|
start the race. More research remains.
|
|
|
|
\subsection{Anonymity effects from acting as a bridge relay}
|
|
|
|
Against some attacks, relaying traffic for others can improve
|
|
anonymity. The simplest example is an attacker who owns a small number
|
|
of Tor servers. He will see a connection from the bridge, but he won't
|
|
be able to know whether the connection originated there or was relayed
|
|
from somebody else. More generally, the mere uncertainty of whether the
|
|
traffic originated from that user may be helpful.
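
As a rough illustration (a simplified model of our own, not a measured
result), suppose that during some interval the bridge operator originates
$a$ circuits of her own and relays $b$ circuits for blocked users, and
that the attacker's servers cannot tell the two kinds apart. When the
attacker sees a circuit arrive from the bridge, the chance that it is the
operator's own is
\[
  \Pr[\mbox{originated at bridge}] = \frac{a}{a+b} .
\]
A bridge that relays three circuits for each of its own ($b = 3a$) thus
leaves only a $1/4$ chance that any given circuit belongs to the operator.
The assumption that the two kinds of circuits look alike is the crucial
one, and it does not hold against stronger attackers.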
|
|
|
|
There are some cases where it doesn't seem to help: if an attacker can
|
|
watch all of the bridge's incoming and outgoing traffic, then it's easy
|
|
to learn which connections were relayed and which started there. (Even
then he doesn't know the final destinations unless he is watching
them too, but the bridge operator is no better off than an
ordinary client would be.)
|
|
|
|
There are also some potential downsides to running a bridge. First, while
|
|
we try to make it hard to enumerate all bridges, it's still possible to
|
|
learn about some of them, and for some people just the fact that they're
|
|
running one might signal to an attacker that they place a higher value
|
|
on their anonymity. Second, there are some more esoteric attacks on Tor
|
|
relays that are not as well-understood or well-tested---for example, an
|
|
attacker may be able to ``observe'' whether the bridge is sending traffic
|
|
even if he can't actually watch its network, by relaying traffic through
|
|
it and noticing changes in traffic timing~\cite{attack-tor-oak05}. On
|
|
the other hand, it may be that limiting the bandwidth the bridge is
|
|
willing to relay will allow this sort of attacker to determine if it's
|
|
being used as a bridge but not easily learn whether it is adding traffic
|
|
of its own.
|
|
|
|
We also need to examine how entry guards fit in. Entry guards
|
|
(a small set of nodes that are always used for the first
|
|
step in a circuit) help protect against certain attacks
|
|
where the attacker runs a few Tor servers and waits for
|
|
the user to choose these servers as the beginning and end of her
|
|
circuit\footnote{http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ\#EntryGuards}.
|
|
If the blocked user doesn't use the bridge's entry guards, then the bridge
|
|
doesn't gain as much cover benefit. On the other hand, what design changes
|
|
are needed for the blocked user to use the bridge's entry guards without
|
|
learning what they are (this seems hard), and even if we solve that,
|
|
do they then need to use the guards' guards and so on down the line?
|
|
|
|
It is an open research question whether the benefits of running a bridge
|
|
outweigh the risks. A lot of the decision rests on which attacks the
|
|
users are most worried about. For most users, we don't think running a
|
|
bridge relay will be that damaging, and it could help quite a bit.
|
|
|
|
\subsection{Trusting local hardware: Internet cafes and LiveCDs}
|
|
\label{subsec:cafes-and-livecds}
|
|
|
|
Assuming that users have their own trusted hardware is not
|
|
always reasonable.
|
|
|
|
For Internet cafe Windows computers that let you attach your own USB key,
|
|
a USB-based Tor image would be smart. There's Torpark, and hopefully
|
|
there will be more thoroughly analyzed options down the road. Worries
|
|
remain about hardware or
|
|
software keyloggers and other spyware---and physical surveillance.
|
|
|
|
If the system lets you boot from a CD or from a USB key, you can gain
|
|
a bit more security by bringing a privacy LiveCD with you. (This
|
|
approach isn't foolproof of course, since hardware
|
|
keyloggers and physical surveillance are still a worry).
|
|
|
|
In fact, LiveCDs are also useful if it's your own hardware, since it's
|
|
easier to avoid leaving private data and logs scattered around the
|
|
system.
|
|
|
|
%\subsection{Forward compatibility and retiring bridge authorities}
|
|
%
|
|
%Eventually we'll want to change the identity key and/or location
|
|
%of a bridge authority. How do we do this mostly cleanly?
|
|
|
|
\subsection{The trust chain}
|
|
\label{subsec:trust-chain}
|
|
|
|
Tor's ``public key infrastructure'' provides a chain of trust to
|
|
let users verify that they're actually talking to the right servers.
|
|
There are four pieces to this trust chain.
|
|
|
|
First, when Tor clients are establishing circuits, at each step
|
|
they demand that the next Tor server in the path prove knowledge of
|
|
its private key~\cite{tor-design}. This step prevents the first node
|
|
in the path from just spoofing the rest of the path. Second, the
|
|
Tor directory authorities provide a signed list of servers along with
|
|
their public keys---so unless the adversary can control a threshold
|
|
of directory authorities, he can't trick the Tor client into using other
|
|
Tor servers. Third, the locations and keys of the directory authorities,
in turn, are hard-coded in the Tor source code, so as long as the user
|
|
got a genuine version of Tor, he can know that he is using the genuine
|
|
Tor network. And last, the source code and other packages are signed
|
|
with the GPG keys of the Tor developers, so users can confirm that they
|
|
did in fact download a genuine version of Tor.
|
|
|
|
In the case of blocked users contacting bridges and bridge directory
|
|
authorities, the same logic applies in parallel: the blocked users fetch
|
|
information from both the bridge authorities and the directory authorities
|
|
for the `main' Tor network, and they combine this information locally.
|
|
|
|
How can a user in an oppressed country know that he has the correct
|
|
key fingerprints for the developers? As with other security systems, it
|
|
ultimately comes down to human interaction. The keys are signed by dozens
|
|
of people around the world, and we have to hope that our users have met
|
|
enough people in the PGP web of trust
|
|
%~\cite{pgp-wot}
|
|
that they can learn
|
|
the correct keys. For users that aren't connected to the global security
|
|
community, though, this question remains a critical weakness.
|
|
|
|
%\subsection{Security through obscurity: publishing our design}
|
|
|
|
%Many other schemes like dynaweb use the typical arms race strategy of
|
|
%not publishing their plans. Our goal here is to produce a design---a
|
|
%framework---that can be public and still secure. Where's the tradeoff?
|
|
|
|
%\section{Performance improvements}
|
|
%\label{sec:performance}
|
|
%
|
|
%\subsection{Fetch server descriptors just-in-time}
|
|
%
|
|
%I guess we should encourage most places to do this, so blocked
|
|
%users don't stand out.
|
|
%
|
|
%
|
|
%network-status and directory optimizations. caching better. partitioning
|
|
%issues?
|
|
|
|
\section{Maintaining reachability}
|
|
|
|
\subsection{How many bridge relays should you know about?}
|
|
|
|
The strategies described in Section~\ref{sec:discovery} talked about
|
|
learning one bridge address at a time. But if most bridges are ordinary
|
|
Tor users on cable modem or DSL connections, many of them will disappear
|
|
and/or move periodically. How many bridge relays should a blocked user
|
|
know about so that she is likely to have at least one reachable at any
|
|
given point? This is already a challenging problem if we only consider
|
|
natural churn: the best approach is to see what bridges we attract in
|
|
reality and measure their churn. We may also need to factor in a parameter
|
|
for how quickly bridges get discovered and blocked by the attacker;
|
|
we leave this for future work after we have more deployment experience.
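
To make the question concrete, consider a deliberately simplified model
(the independence assumption and the symbols $p$, $k$, and $\epsilon$ are
ours, for illustration only). If each bridge a user knows is reachable at
a given moment independently with probability $p$, then a user who knows
$k$ bridges has at least one reachable with probability
\[
  \Pr[\mbox{at least one reachable}] = 1 - (1-p)^k .
\]
To keep the failure probability below $\epsilon$ we need
$(1-p)^k \le \epsilon$, that is,
\[
  k \ge \frac{\ln \epsilon}{\ln (1-p)} .
\]
For example, with $p = 0.5$ and $\epsilon = 0.01$ this gives
$k \ge 6.64$, so seven bridges suffice ($0.5^7 \approx 0.008$). In
practice bridge failures are likely to be correlated, especially when the
attacker blocks many bridges at once, so measured churn should replace
$p$ before anyone relies on such numbers.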
|
|
|
|
A related question is: if the bridge relays change IP addresses
|
|
periodically, how often does the blocked user need to fetch updates in
|
|
order to keep from being cut out of the loop?
|
|
|
|
Once we have more experience and intuition, we should explore technical
|
|
solutions to this problem too. For example, if the discovery strategies
|
|
give out $k$ bridge addresses rather than a single bridge address, perhaps
|
|
we can improve robustness from the user perspective without significantly
|
|
aiding the adversary. Rather than giving out a new random subset of $k$
|
|
addresses at each point, we could bind them together into \emph{bridge
|
|
families}, so all users that learn about one member of the bridge family
|
|
are told about the rest as well.
|
|
|
|
This scheme may also help defend against attacks to map the set of
|
|
bridges. That is, if all blocked users learn a random subset of bridges,
|
|
the attacker can learn about a few bridges, monitor the country-level
|
|
firewall for connections to them, then watch those users to see what
|
|
other bridges they use, and repeat. By segmenting the bridge address
|
|
space, we can limit the exposure of other users.
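
A back-of-the-envelope comparison shows the effect (the symbols $N$, $k$,
and $m$ are ours, and this is a sketch rather than an analysis). Suppose
there are $N$ bridges partitioned into disjoint families of size $k$,
that each blocked user learns about exactly one family, and that the
attacker compromises or impersonates $m$ blocked users. The attacker then
learns at most $mk$ bridges, a fraction of at most
\[
  \frac{mk}{N}
\]
of the total, and under these assumptions watching the users of those
bridges reveals no further families. With $N = 1000$, $k = 4$, and
$m = 10$, at most $40$ bridges ($4\%$) are exposed. Under random subsets
there is no such bound, because each newly observed user may know bridges
the attacker has not yet seen.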
|
|
|
|
\subsection{Cablemodem users don't usually provide important websites}
|
|
\label{subsec:block-cable}
|
|
|
|
Another attack we might be concerned about is that the attacker could
|
|
just block all DSL and cablemodem network addresses, on the theory that
|
|
they don't run any important services anyway. If most of our bridges
|
|
are on these networks, this attack could really hurt.
|
|
|
|
The first answer is to aim to get volunteers both from traditionally
|
|
``consumer'' networks and also from traditionally ``producer'' networks.
|
|
Since bridges don't need to be Tor exit nodes, as we improve our usability
|
|
it seems quite feasible to get a lot of websites helping out.
|
|
|
|
The second answer (not as practical) would be to encourage more use of
|
|
consumer networks for popular and useful Internet services.
|
|
%(But P2P exists;
|
|
%minor websites exist; gaming exists; IM exists; ...)
|
|
|
|
A related attack we might worry about is based on large countries putting
|
|
economic pressure on companies that want to expand their business. For
|
|
example, what happens if Verizon wants to sell services in China, and
|
|
China pressures Verizon to discourage its users in the free world from
|
|
running bridges?
|
|
|
|
\subsection{Scanning resistance: making bridges more subtle}
|
|
|
|
If it's trivial to verify that a given address is operating as a bridge,
|
|
and most bridges run on a predictable port, then it's conceivable our
|
|
attacker could scan the whole Internet looking for bridges. (In fact, he
|
|
can just concentrate on scanning likely networks like cablemodem and DSL
|
|
services; see Section~\ref{subsec:block-cable} above for related attacks.) It
|
|
would be nice to slow down this attack. It would be even nicer to make
|
|
it hard to learn whether we're a bridge without first knowing some
|
|
secret. We call this general property \emph{scanning resistance}.
|
|
|
|
One option is to password-protect the bridges. We could provide a
password to the bridge user, who then presents a nonced hash of it (or
something similar) when he connects. We'd need to give him an identity
key for the bridge too, and he should wait to present the password until
the TLS handshake has completed; otherwise the adversary can pretend to be
the bridge and mount a man-in-the-middle attack to learn the password.
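
One way to make the ``nonced hash'' concrete is a challenge-response
exchange run inside the established TLS connection. This is only a sketch
under our own assumptions: the notation, and the choice of HMAC as the
keyed hash, are not part of the design described here. After the user has
checked the bridge's identity key $K_B$ against the TLS handshake, the
bridge sends a fresh random nonce $n$ and the user replies with a
response $r$ computed from the password $\mathit{pw}$:
\begin{align*}
  \mbox{bridge} \to \mbox{user}: &\quad n \\
  \mbox{user} \to \mbox{bridge}: &\quad r = \mathrm{HMAC}_{\mathit{pw}}(n)
\end{align*}
The bridge recomputes $\mathrm{HMAC}_{\mathit{pw}}(n)$ and speaks the Tor
protocol only if the two values match. Because $n$ is fresh for each
connection, a recorded $r$ cannot be replayed, and because the user
verifies $K_B$ first, an impersonator never receives a response at all.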
|
|
|
|
We could use some kind of ID-based knocking protocol, or we could act like an
|
|
unconfigured HTTPS server if treated like one.
|
|
|
|
We can assume that the attacker can easily recognize HTTPS connections
to unknown servers. It can then attempt to connect to them and block
connections to servers that seem suspicious. It may be that
password-protected web sites will not be suspicious in general, in which case
|
|
that may be the easiest way to give controlled access to the bridge.
|
|
If such sites that have no other overt features are automatically
|
|
blocked when detected, then we may need to be more subtle.
|
|
Possibilities include serving an innocuous web page if a TLS-encrypted
request is received without the authorization needed to access the Tor
network, and only responding to a request for access to the Tor network
if proper authentication is given. If an unauthenticated request to
|
|
access the Tor network is sent, the bridge should respond as if
|
|
it has received a message it does not understand (as would be the
|
|
case were it not a bridge).
|
|
|
|
|
|
\subsection{How to motivate people to run bridge relays}
|
|
\label{subsec:incentives}
|
|
|
|
One of the traditional ways to get people to run software that benefits
|
|
others is to give them motivation to install it themselves. An often
|
|
suggested approach is to install it as a stunning screensaver so everybody
|
|
will be pleased to run it. We take a similar approach here, by leveraging
|
|
the fact that these users are already interested in protecting their
|
|
own Internet traffic, so they will install and run the software.
|
|
|
|
We could also make all Tor users become bridges if they're reachable. This
needs more work on usability first, but we're making progress.
|
|
|
|
Also, we can make a snazzy network graph with Vidalia that emphasizes
|
|
the connections the bridge user is currently relaying. (Minor anonymity
|
|
implications, but hey.) (In many cases there won't be much activity,
|
|
so this may backfire. Or it may be better suited to full-fledged Tor
|
|
servers.)
|
|
|
|
% Also consider everybody-a-server. Many of the scalability questions
|
|
% are easier when you're talking about making everybody a bridge.
|
|
|
|
%\subsection{What if the clients can't install software?}
|
|
|
|
%[this section should probably move to the related work section,
|
|
%or just disappear entirely.]
|
|
|
|
%Bridge users without Tor software
|
|
|
|
%Bridge relays could always open their socks proxy. This is bad though,
|
|
%first
|
|
%because bridges learn the bridge users' destinations, and second because
|
|
%we've learned that open socks proxies tend to attract abusive users who
|
|
%have no idea they're using Tor.
|
|
|
|
%Bridges could require passwords in the socks handshake (not supported
|
|
%by most software including Firefox). Or they could run web proxies
|
|
%that require authentication and then pass the requests into Tor. This
|
|
%approach is probably a good way to help bootstrap the Psiphon network,
|
|
%if one of its barriers to deployment is a lack of volunteers willing
|
|
%to exit directly to websites. But it clearly drops some of the nice
|
|
%anonymity and security features Tor provides.
|
|
|
|
%A hybrid approach where the user gets his anonymity from Tor but his
|
|
%software-less use from a web proxy running on a trusted machine on the
|
|
%free side.
|
|
|
|
\subsection{Publicity attracts attention}
|
|
\label{subsec:publicity}
|
|
|
|
Many people working in this field want to publicize the existence
|
|
and extent of censorship concurrently with the deployment of their
|
|
circumvention software. The easy reason for this two-pronged push is
|
|
to attract volunteers for running proxies in their systems; but in many
|
|
cases their main goal is not to build the software, but rather to educate
|
|
the world about the censorship. The media also tries to do its part by
|
|
broadcasting the existence of each new circumvention system.
|
|
|
|
But at the same time, this publicity attracts the attention of the
|
|
censors. We can slow down the arms race by not attracting as much
|
|
attention, and just spreading by word of mouth. If our goal is to
|
|
establish a solid social network of bridges and bridge users before
|
|
the adversary gets involved, does this attention tradeoff work to our
|
|
advantage?
|
|
|
|
\subsection{The Tor website: how to get the software}
|
|
|
|
One of the first censoring attacks against a system like ours is to
|
|
block the website and make the software itself hard to find. Our system
|
|
should work well once the user is running an authentic
|
|
copy of Tor and has found a working bridge, but to get to that point
|
|
we rely on their individual skills and ingenuity.
|
|
|
|
Right now, most countries that block access to Tor block only the main
|
|
website and leave mirrors and the network itself untouched.
|
|
Falling back on word-of-mouth is always a good last resort, but we should
|
|
also take steps to make sure it's relatively easy for users to get a copy,
|
|
such as publicizing the mirrors more and making copies available through
|
|
other media.
|
|
See Section~\ref{subsec:first-bridge} for more discussion.
|
|
|
|
\section{Future designs}
|
|
|
|
\subsection{Bridges inside the blocked network too}
|
|
|
|
Assuming actually crossing the firewall is the risky part of the
|
|
operation, can we have some bridge relays inside the blocked area too,
so that more established users can use them as relays and don't need to
communicate over the firewall directly at all? A simple example here is
|
|
to make new blocked users into internal bridges also---so they sign up
|
|
on the bridge authority as part of doing their query, and we give out
|
|
their addresses
|
|
rather than (or along with) the external bridge addresses. This design
|
|
is a lot trickier because it brings in the complexity of whether the
|
|
internal bridges will remain available, can maintain reachability with
|
|
the outside world, etc.
|
|
|
|
Other options to explore include using hidden services as bridges, and
using hidden services as bridge directory authorities.
|
|
|
|
\section{Conclusion}
|
|
|
|
A technical solution won't solve the whole problem. After all, China's
firewall is \emph{socially} very successful, even if technologies exist to
get around it.

But having a strong technical solution is still useful as a piece of the
puzzle, and Tor provides a great set of building blocks to start from.
|
|
|
|
\bibliographystyle{plain} \bibliography{tor-design}
|
|
|
|
%\appendix
|
|
|
|
%\section{Counting Tor users by country}
|
|
%\label{app:geoip}
|
|
|
|
\end{document}
|
|
|
|
ship geoip db to bridges. they look up users who tls to them in the db,
|
|
and upload a signed list of countries and number-of-users each day. the
|
|
bridge authority aggregates them and publishes stats.
|
|
|
|
bridge relays have buddies
|
|
they ask a user to test the reachability of their buddy.
|
|
leaks O(1) bridges, but not O(n).
|
|
|
|
we should not be blockable by ordinary cisco censorship features.
|
|
that is, if they want to block our new design, they will need to
|
|
add a feature to block exactly this.
|
|
strategically speaking, this may come in handy.
|
|
|
|
Bridges come in clumps of 4 or 8 or whatever. If you know one bridge
|
|
in a clump, the authority will tell you the rest. Now bridges can
|
|
ask users to test reachability of their buddies.
|
|
|
|
Giving out clumps helps with dynamic IP addresses too. Whether it
|
|
should be 4 or 8 depends on our churn.
|
|
|
|
the account server. let's call it a database, it doesn't have to
|
|
be a thing that a human interacts with.
|
|
|
|
so how do we reward people for being good?
|
|
|
|
\subsubsection{Public Bridges with Coordinated Discovery}
|
|
|
|
****Pretty much this whole subsubsection will probably need to be
|
|
deferred until ``later'' and moved to after end document, but I'm leaving
|
|
it here for now in case useful.******
|
|
|
|
Rather than be entirely centralized, we can have a coordinated
|
|
collection of bridge authorities, analogous to how Tor network
|
|
directory authorities now work.
|
|
|
|
Key components
|
|
``Authorities'' will distribute caches of what they know to overlapping
|
|
collections of nodes so that no one node is owned by one authority.
|
|
Also so that it is impossible to DoS info maintained by one authority
|
|
simply by making requests to it.
|
|
|
|
Where a bridge gets assigned is not predictable by the bridge?
|
|
|
|
If authorities don't know the IP addresses of the bridges they
|
|
are responsible for, they can't abuse that info (or be attacked for
|
|
having it). But, they also can't, e.g., control being sent massive
|
|
lists of nodes that were never good. This raises another question.
|
|
We generally decry use of IP address for location, etc. but we
|
|
need to do that to limit the introduction of functional but useless
|
|
IP addresses because, e.g., they are in China and the adversary
|
|
owns massive chunks of the IP space there.
|
|
|
|
We don't want an arbitrary someone to be able to contact the
|
|
authorities and say an IP address is bad because it would be easy
|
|
for an adversary to take down all the suspicious bridges
|
|
even if they provide good cover websites, etc. Only the bridge
|
|
itself and/or the directory authority can declare a bridge blocked
|
|
from somewhere.
|
|
|
|
|
|
9. Bridge directories must not simply be a handful of nodes that
|
|
provide the list of bridges. They must flood or otherwise distribute
|
|
information out to other Tor nodes as mirrors. That way it becomes
|
|
difficult for censors to flood the bridge directory servers with
|
|
requests, effectively denying access for others. But, there's lots of
|
|
churn and a much larger size than Tor directories. We are forced to
|
|
handle the directory scaling problem here much sooner than for the
|
|
network in general. Authorities can pass their bridge directories
|
|
(and policy info) to some moderate number of unidentified Tor nodes.
|
|
Anyone contacting one of those nodes can get bridge info. The nodes
|
|
must remain somewhat synched to prevent the adversary from abusing,
|
|
e.g., a timed release policy or the distribution to those nodes must
|
|
be resilient even if they are not coordinating.
|
|
|
|
I think some kind of DHT like scheme would work here. A Tor node is
|
|
assigned a chunk of the directory. Lookups in the directory should be
|
|
via hashes of keys (fingerprints) and that should determine the Tor
|
|
nodes responsible. Ordinary directories can publish lists of Tor nodes
|
|
responsible for fingerprint ranges. Clients looking to update info on
|
|
some bridge will make a Tor connection to one of the nodes responsible
|
|
for that address. Instead of shutting down a circuit after getting
|
|
info on one address, extend it to another that is responsible for that
|
|
address (the node from which you are extending knows you are doing so
|
|
anyway). Keep going. This way you can amortize the Tor connection.
|
|
|
|
10. We need some way to give new identity keys out to those who need
|
|
them without letting those get immediately blocked by authorities. One
|
|
way is to give a fingerprint that gets you more fingerprints, as
|
|
already described. These are meted out/updated periodically but allow
|
|
us to keep track of which sources are compromised: if a distribution
|
|
fingerprint repeatedly leads to quickly blocked bridges, it should be
|
|
suspect, dropped, etc. Since we're using hashes, there shouldn't be a
|
|
correlation with bridge directory mirrors, bridges, portions of the
|
|
network observed, etc. It should just be that the authorities know
|
|
about that key that leads to new addresses.
|
|
|
|
This last point is very much like the issues in the valet nodes paper,
|
|
which is essentially about blocking resistance wrt exiting the Tor network,
|
|
while this paper is concerned with blocking the entering to the Tor network.
|
|
In fact the tickets used to connect to the IPo (Introduction Point),
|
|
could serve as an example, except that instead of authorizing
|
|
a connection to the Hidden Service, it's authorizing the downloading
|
|
of more fingerprints.
|
|
|
|
Also, the fingerprints can follow the hash(q + '1' + cookie) scheme of
|
|
that paper (where q = hash(PK + salt) gave the q.onion address). This
|
|
allows us to control and track which fingerprint was causing problems.
|
|
|
|
Note that, unlike many settings, the reputation problem should not be
|
|
hard here. If a bridge says it is blocked, then it might as well be.
|
|
If an adversary can say that the bridge is blocked wrt
|
|
$\mathit{censor}_i$, then it might as well be, since
|
|
$\mathit{censor}_i$ can presumably then block that bridge if it so
|
|
chooses.
|
|
|
|
11. How much damage can the adversary do by running nodes in the Tor
|
|
network and watching for bridge nodes connecting to it? (This is
|
|
analogous to an Introduction Point watching for Valet Nodes connecting
|
|
to it.) What percentage of the network do you need to own to do how
|
|
much damage? Here the entry-guard design comes in handy. So we
|
|
need to have bridges use entry-guards, but (cf. 3 above) not use
|
|
bridges as entry-guards. Here's a serious tradeoff (again akin to the
|
|
ratio of valets to IPos) the more bridges/client the worse the
|
|
anonymity of that client. The fewer bridges/client the worse the
|
|
blocking resistance of that client.
|
|
|
|
|
|
|