Merge branch 'isolated-streams'

Conflicts: doc/spec/proposals/000-index.txt
2024-11-20 10:12:15 +01:00 · 2010-12-07 11:02:10 -05:00 · 2010-12-07 11:02:10 -05:00 · ed0eeed835
commit ed0eeed835
parent 3fc43debfb a1e46c5393
4 changed files with 354 additions and 61 deletions
--- a/doc/spec/proposals/000-index.txt
+++ b/doc/spec/proposals/000-index.txt
@ -91,6 +91,7 @@ Proposals by number:
 168  Reduce default circuit window [OPEN]
 169  Eliminate TLS renegotiation for the Tor connection handshake [DRAFT]
 170  Configuration options regarding circuit building [DRAFT]
+171  Separate streams across circuits by connection metadata [OPEN]
 172  GETINFO controller option for circuit information [ACCEPTED]
 173  GETINFO Option Expansion [ACCEPTED]
 174  Optimistic Data for Tor: Server Side [OPEN]
@ -107,7 +108,7 @@ Proposals by status:
   144  Increase the diversity of circuits by detecting nodes belonging the same provider
   169  Eliminate TLS renegotiation for the Tor connection handshake [for 0.2.2]
   170  Configuration options regarding circuit building
-   175  Automatically promoting Tor clients to nodes [for ]
+   175  Automatically promoting Tor clients to nodes
 NEEDS-REVISION:
   131  Help users to verify they are using Tor
 OPEN:
@ -123,6 +124,7 @@ Proposals by status:
   164  Reporting the status of server votes [for 0.2.2]
   165  Easy migration for voting authority sets
   168  Reduce default circuit window [for 0.2.2]
+   171  Separate streams across circuits by connection metadata
   174  Optimistic Data for Tor: Server Side
 ACCEPTED:
   110  Avoiding infinite length circuits [for 0.2.1.x] [in 0.2.1.3-alpha]
--- a/doc/spec/proposals/001-process.txt
+++ b/doc/spec/proposals/001-process.txt
@ -75,7 +75,7 @@ How new proposals get added:

  To get your proposal in, send it to or-dev.

-  The current proposal editor is Nick Mathewson.
+  The current proposal editors are Nick Mathewson and Jacob Appelbaum.

 What should go in a proposal:

--- a/doc/spec/proposals/171-separate-streams.txt
+++ b/doc/spec/proposals/171-separate-streams.txt
@ -0,0 +1,350 @@
+Filename: 171-separate-streams.txt
+Title: Separate streams across circuits by connection metadata
+Author: Robert Hogan, Jacob Appelbaum, Damon McCoy, Nick Mathewson
+Created: 21-Oct-2008
+Modified: 7-Dec-2010
+Status: Open
+
+Summary:
+
+  We propose a new set of options to isolate unrelated streams from one
+  another, putting them on separate circuits so that semantically
+  unrelated traffic is not inadvertently made linkable.
+
+Motivation:
+
+  Currently, Tor attaches regular streams (that is, ones not carrying
+  rendezvous or directory traffic) to circuits based only on whether Tor
+  circuit's current exit node supports the destination, and whether the
+  circuit has been dirty (that is, in use) for too long.
+
+  This means that traffic that would otherwise be unrelated sometimes
+  gets sent over the same circuit, allowing the exit node to link such
+  streams with certainty, and allowing other parties to link such
+  streams probabilistically.
+
+  Older versions of onion routing tried to address this problem by
+  sending every stream over a separate circuit; performance issues made
+  this unfeasible. Moreover, in the presence of a localized adversary,
+  separating streams by circuits increases the odds that, for any given
+  linked set of streams, at least one will go over a compromised
+  circuit.
+
+  Therefore we ought to look for ways to allow streams that ought to be
+  linked to travel over a single circuit, while keeping streams that
+  ought not be linked isolated to separate circuits.
+
+Discussion:
+
+  Let's call a series of inherently-linked streams (like a set of
+  streams downloading objects from the same webpage, or a browsing
+  session where the user requests several related webpages) a "Session".
+
+  "Sessions" are a necessarily a fuzzy concept.  While users typically
+  consider some activities as wholly unrelated to each other ("My IM
+  session has nothing to do with my web browsing!"), the boundaries
+  between activities are sometimes hard to determine.  If I'm reading
+  lolcats in one browser tab and reading about treatments for an
+  embarrassing disease in another, those are probably separate sessions.
+  If I search for a forum, log in, read it for a while, and post a few
+  messages on unrelated topics, that's probably all the same session.
+
+  So with the proviso that no automated process can identify sessions
+  100% accurately, let's see which options we have available.
+
+  Generally, all the streams on a session come from a single
+  application.  Unfortunately, isolating streams by application
+  automatically isn't feasible, given the lack of any nice
+  cross-platform way to tell which local process originated a given
+  connection.  (Yes, lsof works.  But a quick review of the lsof code
+  should be sufficient to scare you away from thinking there is a
+  portable option, much less a portable O(1) option.)  So instead, we'll
+  have to use some other aspect of a Tor request as a proxy for the
+  application.
+
+  Generally, traffic from separate applications is not in the same
+  session.
+
+  With some applications (IRC, for example), each stream is a session.
+
+  Some applications (most notably web browsing) can't be meaningfully
+  split into sessions without inspecting the traffic itself and
+  maintaining a lot of state.
+
+  How well do ports correspond to sessions?  Early versions of this
+  proposal focused on using destination ports as a proxy for
+  application, since a connection to port 22 for SSH is probably not in
+  the same session as one to port 80. This only works with some
+  applications better than others, though: while SSH users typically
+  know when they're on port 22 and when they aren't, a web browser can
+  be coaxed (though img urls or any number of releated tricks) into
+  connecting to any port at all.  Moreover, when Tor gets a DNS lookup
+  request, it doesn't know in advance which port the resulting address
+  will be used to connect to.
+
+  So in summary, each kind of traffic wants to follow different rules,
+  and assuming the existence of a web browser and a hostile web page or
+  exit node, we can't tell one kind of traffic from another by simply
+  looking at the destination:port of the traffic.
+
+  Fortunately, we're not doomed.
+
+Design:
+
+  When a stream arrives at Tor, we have the following data to examine:
+    1) The destination address
+    2) The destination port (unless this a DNS lookup)
+    3) The protocol used by the application to send the stream to Tor:
+       SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy"
+       mechanism the kernel gives us.
+    4) The port used by the application to send the stream to Tor --
+       that is, the SOCKSListenAddress or TransListenAddress that the
+       application used, if we have more than one.
+    5) The SOCKS username and password, if any.
+    6) The source address and port for the application.
+
+  We propose to use 3, 4, and 5 as a backchannel for applications to
+  tell Tor about different sessions.  Rather than running only one
+  SOCKSPort, a Tor user who would prefer better session isolation should
+  run multiple SOCKSPorts/TransPorts, and configure different
+  applications to use separate ports. Applications that support SOCKS
+  authentication can further be separated on a single port by their
+  choice of username/password.  Streams sent to separate ports or using
+  different authentication information should never be sent over the
+  same circuit.  We allow each port to have its own settings for
+  isolation based on destination port, destination address, or both.
+
+  Handling DNS can be a challenge.  We can get hostnames by one of three
+  means:
+
+    A) A SOCKS4a request, or a SOCKS5 request with a hostname.  This
+       case is handled trivially using the rules above.
+    B) A RESOLVE request on a SOCKSPort.  This case is handled using the
+       rules above, except that port isolation can't work to isolate
+       RESOLVE requests into a proper session, since we don't know which
+       port will eventually be used when we connect to the returned
+       address.
+    C) A request on a DNSPort.  We have no way of knowing which
+       address/port will be used to connect to the requested address.
+
+  When B or C is required but problematic, we could favor the use of
+  AutomapHostsOnResolve.
+
+Interface:
+
+  We propose that {SOCKS,Natd,Trans,DNS}ListenAddr be deprecated in
+  favor of an expanded {SOCKS,Natd,Trans,DNS}Port syntax:
+
+  ClientPortLine = OptionName SP (Addr ":")? Port (SP Options?)
+  OptionName = "SOCKSPort" / "NatdPort" / "TransPort" / "DNSPort"
+  Addr = An IPv4 address / an IPv6 address surrounded by brackets.
+         If optional, we default to 127.0.0.1
+  Port = An integer from 1 through 65535 inclusive
+  Options = Option
+  Options = Options SP Option
+  Option = IsolateOption / GroupOption
+  GroupOption = "SessionGroup=" UINT
+  IsolateOption =  OptNo ("IsolateDestPort" / "IsolateDestAddr" /
+         "IsolateSOCKSUser"/ "IsolateClientProtocol" /
+         "IsolateClientAddr") OptPlural
+  OptNo = "No" ?
+  OptPlural = "s" ?
+  SP = " "
+  UINT = An unsigned integer
+
+  All options are case-insensitive.
+
+  The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by
+  default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively
+  turn them off.  The IsolateDestPort and IsolateDestAddr and
+  IsolateClientProtocol options are off by default.  NoIsolateDestPort and
+  NoIsolateDestAddr and NoIsolateClientProtocol have no effect.
+
+  Given a set of ClientPortLines, streams must NOT be placed on the same
+  circuit if ANY of the following hold:
+
+    * They were sent to two different client ports, unless the two
+      client ports both specify a "SessionGroup" option with the same
+      integer value.
+    * At least one was sent to a client port with the IsolateDestPort
+      active, and they have different destination ports.
+    * At least one was sent to a client port with IsolateDestAddr
+      active, and they have different destination addresses.
+    * At least one was sent to a client port with IsolateClientProtocol
+      active, and they use different protocols (where SOCKS4, SOCKS4a,
+      SOCKS5, TransPort, NatdPort, and DNS are the protocols in question)
+    * At least one was sent to a client port with IsolateSOCKSUser
+      active, and they have different SOCKS username/password values
+      configurations.  (For the purposes of this option, the
+      username/password pair of ""/"" is distinct from SOCKS without
+      authentication, and both are distinct from any non-SOCKS client's
+      non-authentication.)
+    * At least one was sent to a client port with IsolateClientAddr
+      active, and they came from different client addresses.  (For the
+      purpose of this option, any local interface counts as the same
+      address.  So if the host is configured with addresses 10.0.0.1,
+      192.0.32.10, and 127.0.0.1, then traffic from those addresses can
+      leave on the same circuit, but traffic to from 10.0.0.2 (for
+      example) could not share a circuit with any of them.)
+
+  These rules apply regardless of whether the streams are active at the
+  same time.  In other words, if the rules say that streams A and B must
+  not be on the same circuit, and stream A is attached to circuit X,
+  then stream B must never be attached to stream X, even if stream A is
+  closed first.
+
+Alternative Interface:
+
+  We're cramming a lot onto one line in the design above.  Perhaps
+  instead it would be a better idea to have grouped lines of the form:
+
+    StreamGroup 1
+    SOCKSPort 9050
+    TransPort 9051
+    IsolateDestPort 1
+    IsolateClientProtocol 0
+    EndStreamGroup
+
+    StreamGroup 2
+    SOCKSPort 9052
+    DNSPort 9053
+    IsolateDestAddr 1
+    EndStreamGroup
+
+  This would be equivalent to:
+   SOCKSPort 9050 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
+   TransPort 9051 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
+   SOCKSPort 9052 SessionGroup=2 IsolateDestAddr
+   DNSPort   9053 SessionGroup=2 IsolateDestAddr
+
+  But it would let us extend range of allowed options later without
+  having client port lines group without bound.  For example, we might
+  give different circuit building parameters to different session
+  groups.
+
+Example of use:
+
+  Suppose that we want to use a web browser, an IRC client, and a SSH
+  client all at the same time.  Let's assume that we want web traffic to
+  be isolated from all other traffic, even if the browser makes
+  connections to ports usually used for IRC or SSH.  Let's also assume
+  that IRC and SSH are both used for relatively long-lived connections,
+  and we want to keep all IRC/SSH sessions separate from one another.
+
+  In this case, we could say:
+
+    SOCKSPort 9050
+    SOCKSPort 9051 IsolateDestAddr IsolateDestPort
+
+  We would then configure our browser to use 9050 and our IRC/SSH
+  clients to use 9051.
+
+Advanced example of use, #2:
+
+  Suppose that we have a bunch of applications, and we launch them all
+  using torsocks, and we want to keep each applications isolated from
+  one another.  We just create a shell script, "torlaunch":
+    #!/bin/bash
+    export TORSOCKS_USERNAME="$1"
+    exec torsocks $@
+  And we configure our SOCKSPort with IsolateSOCKSUser.
+
+  Or if we're on Linux and we want to isolate by application invocation,
+  we would change the TORSOCKS_USERNAME line to:
+
+    export TORSOCKS_USERNAME="`cat /proc/sys/kernel/random/uuid`"
+
+Advanced example of use, #2:
+
+  Now suppose that we want to achieve the benefits of the first example
+  of use, but we are stuck using transparent proxies.  Let's suppose
+  this is Linux.
+
+    TransPort 9090
+    TransPort 9091 IsolateDestAddr IsolateDestPort
+    DNSPort 5353
+    AutomapHostsOnResolve 1
+
+  Here we use the iptables --cmd-owner filter to distinguish which
+  command is originating the packets, directing traffic from our irc
+  client and our SSH client to port 9091, and directing other traffic to
+  9090.  Using AutomapHostsOnResolve will confuse ssh in its default
+  configuration; we'll need to find a way around that.
+
+Security Risks:
+
+  Disabling IsolateClientAddr is a pretty bad idea.
+
+  Setting up a set of applications to use this system effectively is a
+  big problem.  It's likely that lots of people who try to do this will
+  mess it up.  We should try to see which setups are sensible, and see
+  if we can provide good feedback to explain which streams are isolated
+  how.
+
+Performance Risks:
+
+  This proposal will result in clients building many more circuits than
+  they do today.  To avoid accidentally hammering the network, we should
+  have in-process limits on the maximum circuit creation rate and the
+  total maximum client circuits.
+
+Specification:
+
+  The Tor client circuit selection process is not entirely specified.
+  Any client circuit specification must take these changes into account.
+
+Implementation notes:
+
+  The more obvious ways to implement the "find a good circuit to attach
+  to" part of this proposal involve doing an O(n_circuits) operation
+  every time we have a stream to attach.  We already do such an
+  operation, so it's not as if we need to hunt for fancy ways to make it
+  O(1).  What will be harder is implementing the "launch circuits as
+  needed" part of the proposal.  Still, it should come down to "a simple
+  matter of programming."
+
+  The SOCKS4 spec has the client provide authentication info when it
+  connects; accepting such info is no problem.  But the SOCKS5 spec has
+  the client send a list of known auth methods, then has the server send
+  back the authentication method it chooses.  We'll need to update the
+  SOCKS5 implementation so it can accept user/password authentication if
+  it's offered.
+
+  If we use the second syntax for describing these options, we'll want
+  to add a new "section-based" entry type for the configuration parser.
+  Not a huge deal; we already have kludged up something similar for
+  hidden service configurations.
+
+  Opening circuits for predicted ports has the potential to get a little
+  more complicated; we can probably get away with the existing
+  algorithm, though, to see where its weak points are and look for
+  better ones.
+
+  Perhaps we can get our next-gen HTTP proxy to communicate browser tab
+  or session into to tor via authentication, or have torbutton do it
+  directly.  More design is needed here, though.
+
+Alternative designs:
+
+  The implementation of this option may want to consider cases where the
+  same exit node is shared by two or more circuits and
+  IsolateStreamsByPort is in force.  Since one possible use of the option
+  is to reduce the opportunity of Exit Nodes to attack traffic from the
+  same source on multiple ports, the implementation may need to ensure
+  that circuits reserved for the exclusive use of given ports do not
+  share the same exit node.  On the other hand, if our goal is only that
+  streams should be unlinkable, deliberately shunting them to different
+  exit nodes is unnecessary and slightly counterproductive.
+
+  Earlier versions of this design included a mechanism to isolate
+  _particular_ destination ports and addresses, so that traffic sent to,
+  say, port 22 would never share a port with any traffic *not* sent to
+  port 22.  You can achieve this here by having all applications that
+  send traffic to one of these ports use a separate SOCKSPort, and
+  then setting IsolateDestPorts on that SOCKSPort.
+
+Lingering questions:
+
+  I suspect there are issues remaining with DNS and TransPort users, and
+  that my "just use AutomapHostsOnResolve" suggestion may be
+  insufficient.
--- a/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt
+++ b/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt
@ -1,59 +0,0 @@
-Filename: xxx-separate-streams-by-port.txt
-Title: Separate streams across circuits by destination port
-Author: Robert Hogan
-Created: 21-Oct-2008
-Status: Draft
-
-Here's a patch Robert Hogan wrote to use only one destination port per
-circuit. It's based on a wishlist item Roger wrote, to never send AIM
-usernames over the same circuit that we're hoping to browse anonymously
-through. The remaining open question is: how many extra circuits does this
-cause an ordinary user to create? My guess is not very many, but I'm wary
-of putting this in until we have some better estimate. On the other hand,
-not putting it in means that we have a known security flaw. Hm.
-
-Index: src/or/or.h
-===================================================================
--- src/or/or.h (revision 17143)
-+++ src/or/or.h (working copy)
-@@ -1874,6 +1874,7 @@
-
-   uint8_t state; /**< Current status of this circuit. */
-   uint8_t purpose; /**< Why are we creating this circuit? */
-+  uint16_t service; /**< Port conn must have to use this circuit. */
-
-   /** How many relay data cells can we package (read from edge streams)
-    * on this circuit before we receive a circuit-level sendme cell asking
-Index: src/or/circuituse.c
-===================================================================
--- src/or/circuituse.c (revision 17143)
-+++ src/or/circuituse.c (working copy)
-@@ -62,10 +62,16 @@
-       return 0;
-   }
-
-  if (purpose == CIRCUIT_PURPOSE_C_GENERAL)
-+  if (purpose == CIRCUIT_PURPOSE_C_GENERAL) {
-     if (circ->timestamp_dirty &&
-        circ->timestamp_dirty+get_options()->MaxCircuitDirtiness <= now)
-       return 0;
-+    /* If the circuit is dirty and used for services on another port,
-+      then it is not suitable. */
-+    if (circ->service && conn->socks_request->port &&
-+       (circ->service != conn->socks_request->port))
-+      return 0;
-+  }
-
-   /* decide if this circ is suitable for this conn */
-
-@@ -1351,7 +1357,9 @@
-     if (connection_ap_handshake_send_resolve(conn) < 0)
-       return -1;
-   }
-
-+  if (conn->socks_request->port
-+     && (TO_CIRCUIT(circ)->purpose == CIRCUIT_PURPOSE_C_GENERAL))
-+    TO_CIRCUIT(circ)->service = conn->socks_request->port;
-   return 1;
- }
-