Add a proposal-ideas document for crypto migration.

2024-11-20 10:12:15 +01:00 · 2010-12-13 23:39:54 -05:00 · 2010-12-13 23:39:54 -05:00 · 462185d180
commit 462185d180
parent 2118028c50
1 changed files with 382 additions and 0 deletions
--- a/doc/spec/proposals/ideas/xxx-crypto-migration.txt
+++ b/doc/spec/proposals/ideas/xxx-crypto-migration.txt
@ -0,0 +1,382 @@
+
+Title: Initial thoughts on migrating Tor to new cryptography
+Author: Nick Mathewson
+Created: 12 December 2010
+
+1. Introduction
+
+  Tor currently uses AES-128, RSA-1024, and SHA1.  Even though these
+  ciphers were a decent choice back in 2003, and even though attacking
+  these algorithms is by no means the best way for a well-funded
+  adversary to attack users (correlation attacks are still cheaper, even
+  with pessimistic assumptions about the security of each cipher), we
+  will want to move to better algorithms in the future.  Indeed, if
+  migrating to a new ciphersuite were simple, we would probably have
+  already moved to RSA-1024/AES-128/SHA256 or something like that.
+
+  So it's a good idea to start figuring out how we can move to better
+  ciphers.  Unfortunately, this is a bit nontrivial, so before we start
+  doing the design work here, we should start by examining the issues
+  involved.  Robert Ransom and I both decided to spend this weekend
+  writing up documents of this type so that we can see how much two
+  people working independently agree on.  I know more Tor than Robert;
+  Robert knows far more cryptography than I do.  With luck we'll
+  complement each other's work nicely.
+
+  A note on scope: This document WILL NOT attempt to pick a new cipher
+  or set of ciphers.  Instead, it's about how to migrate to new ciphers
+  in general.  Any algorithms mentioned other than those we use today
+  are just for illustration.
+
+  Also, I don't much consider the importance of updating each particular
+  usage; only the methods that you'd use to do it.
+
+  Also, this isn't a complete proposal.
+
+2. General principles and tricks
+
+  Before I get started, let's talk about some general design issues.
+
+2.1. Many algorithms or few?
+
+  Protocols like TLS and OpenPGP allow a wide choice of cryptographic
+  algorithms; so long as the sender and receiver (or the responder and
+  initiator) have at least one mutually acceptable algorithm, they can
+  converge upon it and send each other messages.
+
+  This isn't the best choice for anonymity designs.  If two clients
+  support a different set of algorithms, then an attacker can tell them
+  apart.  A protocol with N ciphersuites would in principle split
+  clients into 2**N-1 sets.  (In practice, nearly all users will use the
+  default, and most users who choose _not_ to use the default will do so
+  without considering the loss of anonymity.  See "Anonymity Loves
+  Company: Usability and the Network Effect".)
+
+  On the other hand, building only one ciphersuite into Tor has a flaw
+  of its own: it has proven difficult to migrate to another one.  So
+  perhaps instead of specifying only a single new ciphersuite, we should
+  specify more than one, with plans to switch over (based on a flag in
+  the consensus or some other secure signal) once the first choice of
+  algorithms start looking iffy.  This switch-based approach would seem
+  especially easy for parameterizable stuff like key sizes.
+
+2.2. Waiting for old clients and servers to upgrade
+
+  The easiest way to implement a shift in algorithms would be to declare
+  a "flag day": once we have the new versions of the protocols
+  implemented, pick a day by which everybody must upgrade to the new
+  software.  Before this day, the software would have the old behavior;
+  after this way, it would use the improved behavior.
+
+  Tor tries to avoid flag days whenever possible; they have well-known
+  issues.  First, since a number of our users don't automatically
+  update, it can take a while for people to upgrade to new versions of
+  our software.  Second and more worryingly, it's hard to get adequate
+  testing for new behavior that is off-by-default.  Flag days in other
+  systems have been known to leave whole networks more or less
+  inoperable for months; we should not trust in our skill to avoid
+  similar problems.
+
+  So if we're avoiding flag days, what can we do?
+
+  * We can add _support_ for new behavior early, and have clients use it
+    where it's available.  (Clients know the advertised versions of the
+    Tor servers they use-- but see 2.3 below for a danger here, and 2.4
+    for a bigger danger.)
+
+  * We can remove misfeatures that _prevent_ deployment of new
+    behavior.  For instance, if a certain key length has an arbitrary
+    1024-bit limit, we can remove that arbitrary limitation.
+
+  * Once an optional new behavior is ubiquitous enough, the authorities
+    can stop accepting descriptors from servers that do not have it
+    until they upgrade.
+
+  It is far easier to remove arbitrary limitations than to make other
+  changes; such changes are generally safe to back-port to older stable
+  release series.  But in general, it's much better to avoid any plans
+  that require waiting for any version of Tor to no longer be in common
+  use: a stable release can take on the order of 2.5 years to start
+  dropping off the radar.  Thandy might fix that, but even if a perfect
+  Thandy release comes out tomorrow, we'll still have lots of older
+  clients and relays not using it.
+
+  We'll have to approach the migration problem on a case-by-case basis
+  as we consider the algorithms used by Tor and how to change them.
+
+2.3. Early adopters and other partitioning dangers
+
+  It's pretty much unavoidable that clients running software that speak
+  the new version of any protocol will be distinguishable from those
+  that cannot speak the new version.  This is inevitable, though we
+  could try to minimize the number of such partitioning sets by having
+  features turned on in the same release rather than one-at-a-time.
+
+  Another option here is to have new protocols controlled by a
+  configuration tri-state with values "on", "off", and "auto".  The
+  "auto" value means to look at the consensus to decide wither to use
+  the feature; the other two values are self-explanatory.  We'd ship
+  clients with the feature set to "auto" by default, with people only
+  using "on" for testing.
+
+  If we're worried about early client-side implementations of a protocol
+  turning out to be broken, we can have the consensus value say _which_
+  versions should turn on the protocol.
+
+2.4. Avoid whole-circuit switches
+
+  One risky kind of protocol migration is a feature that gets used only
+  when all the routers in a circuit support it.  If such a feature is
+  implemented by few relays, then each relay learns a lot about the rest
+  of the path by seeing it used.  On the other hand, if the feature is
+  implemented by most relays, then a relay learns a lot about the rest of
+  the path when the feature is *not* used.
+
+  It's okay to have a feature that can be only used if two consecutive
+  routers in the patch support it: each router knows the ones adjacent
+  to it, after all, so knowing what version of Tor they're running is no
+  big deal.
+
+2.5. The Second System Effect rears its ugly head
+
+  Any attempt at improving Tor's crypto is likely to involve changes
+  throughout the Tor protocol.  We should be aware of the risks of
+  falling into what Fred Brooks called the "Second System Effect": when
+  redesigning a fielded system, it's always tempting to try to shovel in
+  every possible change that one ever wanted to make to it.
+
+  This is a fine time to make parts of our protocol that weren't
+  previously versionable into ones that are easier to upgrade in the
+  future.  This probably _isn't_ time to redesign every aspect of the
+  Tor protocol that anybody finds problematic.
+
+2.6. Low-hanging fruit and well-lit areas
+
+  Not all parts of Tor are tightly covered.  If it's possible to upgrade
+  different parts of the system at different rates from one another, we
+  should consider doing the stuff we can do easier, earlier.
+
+  But remember the story of the policeman who finds a drunk under a
+  streetlamp, staring at the ground?  The cop asks, "What are you
+  doing?"  The drunk says, "I'm looking for my keys!"  "Oh, did you drop
+  them around here?" says the policeman.  "No," says the drunk, "But the
+  light is so much better here!"
+
+  Or less proverbially: Simply because a change is easiest, does not
+  mean it is the best use of our time.  We should avoid getting bogged
+  down solving the _easy_ aspects of our system unless they happen also
+  to be _important_.
+
+2.7. Nice safe boring codes
+
+  Let's avoid, to the extent that we can:
+    - being the primary user of any cryptographic construction or
+      protocol.
+    - anything that hasn't gotten much attention in the literature.
+    - anything we would have to implement from scratch
+    - anything without a nice BSD-licensed C implementation
+
+  Some times we'll have the choice of a more efficient algorithm or a
+  more boring & well-analyzed one.  We should not even consider trading
+  conservative design for efficiency unless we are firmly in the
+  critical path.
+
+2.8. Key restrictions
+
+  Our spec says that RSA exponents should be 65537, but our code never
+  checks for that.  If we want to bolster resistance against collision
+  attacks, we could check this requirement.  To the best of my
+  knowledge, nothing violates it except for tools like "shallot" that
+  generate cute memorable .onion names.  If we want to be nice to
+  shallot users, we could check the requirement for everything *except*
+  hidden service identity keys.
+
+3. Aspects of Tor's cryptography, and thoughts on how to upgrade them all
+
+3.1. Link cryptography
+
+  Tor uses TLS for its link cryptography; it is easy to add more
+  ciphersuites to the acceptable list, or increase the length of
+  link-crypto public keys, or increase the length of the DH parameter,
+  or sign the X509 certificates with any digest algorithm that OpenSSL
+  clients will support.  Current Tor versions do not check any of these
+  against expected values.
+
+  The identity key used to sign the second certificate in the current
+  handshake protocol, however, is harder to change, since it needs to
+  match up with what we see in the router descriptor for the router
+  we're connecting to.  See notes on router identity below.  So long as
+  the certificate chain is ultimately authenticated by a RSA-1024 key,
+  it's not clear whether making the link RSA key longer on its own
+  really improves matters or not.
+
+  Recall also that for anti-fingerprinting reasons, we're thinking of
+  revising the protocol handshake some time in the 0.2.3.x timeframe.
+  If we do that, that might be a good time to make sure that we aren't
+  limited by the old identity key size.
+
+3.2. Circuit-extend crypto
+
+  Currently, our code requires RSA onion keys to be 1024 bits long.
+  Additionally, current nodes will not deliver an EXTEND cell unless it
+  is the right length.
+
+  For this, we might add a second, longer onion-key to router
+  descriptors, and a second CREATE2 cell to open new circuits
+  using this key type.  It should contain not only the onionskin, but
+  also information on onionskin version and ciphersuite.  Onionskins
+  generated for CREATE2 cells should use a larger DH group as well, and
+  keys should be derived from DH results using a better digest algorithm.
+
+  We should remove the length limit on EXTEND cells, backported to all
+  supported stable versions; call these "EXTEND2" cells.  Call these
+  "lightly patched".  Clients could use the new EXTEND2/CREATE2 format
+  whenever using a lightly patched or new server to extend to a new
+  server, and the old EXTEND/CREATE format otherwise.
+
+  The new onion skin format should try to avoid the design oddities of
+  our old one.  Instead of its current iffy hybrid encryption scheme, it
+  should probably do something more like a BEAR/LIONESS operation with a
+  fixed key on the g^x value, followed by a public key encryption on the
+  start of the encrypted data.  (Robert reminded me about this
+  construction.)
+
+  The current EXTEND cell format ends with a router identity
+  fingerprint, which is used by the extended-from router to authenticate
+  the extended-to router when it connects.  Changes to this will
+  interact with changes to how long an identity key can be and to the
+  link protocol; see notes on the link protocol above and about router
+  identity below.
+
+3.2.1. Circuit-extend crypto: fast case
+
+  When we do unauthenticated circuit extends with CREATE/CREATED_FAST,
+  the two input values are combined with SHA1.  I believe that's okay;
+  using any entropy here at all is overkill.
+
+3.3. Relay crypto
+
+  Upon receiving relay cells, a router transform the payload portion of
+  the cell with the appropriate key appropriate key, sees if it
+  recognized the cell (the recognized field is zero, the digest field is
+  correct, the cell is outbound), and pass them on if not.  It is
+  possible for each hop in the circuit to handle the relay crypto
+  differently; nobody but the client and the hop in question need to
+  coordinate their operations.
+
+  It's not clear, though, whether updating the relay crypto algorithms
+  would help anything, unless we changed the whole relay cell processing
+  format too.  The stream cipher is good enough, and the use of 4 bytes
+  of digest does not have enough bits to provide cryptographic strength,
+  no matter what cipher we use.
+
+  This is the likeliest area for the second-system effect to strike;
+  there are lots of opportunities to try to be more clever than we are
+  now.
+
+3.4. Router identity
+
+  This is one of the hardest things to change.  Right now, routers are
+  identified by a "fingerprint" equal to the SHA1 hash of their 1024-bit
+  identity key as given in their router descriptor.  No existing Tor
+  will accept any other size of identity key, or any other hash
+  algorithm.  The identity key itself is used:
+    - To sign the router descriptors
+    - To sign link-key certificates
+
+  The fingerprint is used:
+    - To identify a router identity key in EXTEND cells
+    - To identify a router identity key in bridge lines
+    - Throughout the controller interface
+    - To fetch bridge descriptors for a bridge
+    - To identify a particular router throughout the codebase
+    - In the .exit notation.
+    - By the controller to identify nodes
+    - To identify servers in the logs
+    - Probably other places too
+
+  To begin to allow other key types, key-lengths, and hash functions, we
+  would either need to wait till all current Tors are obsolete, or allow
+  routers to have more than one identity for a while.
+
+  To allow routers to have more than one identity, we need to
+  cross-certify identity keys.  We can do this trivially, in theory, by
+  listing both keys in the router descriptor and having both identities
+  sign the descriptor.  In practice, we will need to analyze this pretty
+  carefully to avoid attacks where one key is completely fake aimed to
+  trick old clients somehow.
+
+  Upgrading the hash algorithm once would be easy: just say that all
+  new-type keys get hashed using the new hash algorithm.  Remaining
+  future-proof could be tricky.
+
+  This is one of the hardest areas to update; "SHA1 of identity key" is
+  assumed in so many places throughout Tor that we'll probably need a
+  lot of design work to work with something else.
+
+3.5. Directory objects
+
+  Fortunately, the problem is not so bad for consensuses themselves,
+  because:
+    - Authority identity keys are allowed to be RSA keys of any length;
+      in practice I think they are all 3072 bits.
+    - Authority signing keys are also allowed to be of any length.
+      AFAIK the code works with longer signing keys just fine.
+    - Currently, votes are hashed with both sha1 and sha256; adding
+      more hash algorithms isn't so hard.
+    - Microdescriptor consensuses are all signed using sha256.  While
+      regular consensuses are signed using sha1, exploitable collisions
+      are hard to come up with, since once you had a collision, you
+      would need to get a majority of other authorities to agree to
+      generate it.
+
+  Router descriptors are currently identified by SHA1 digests of their
+  identity keys and descriptor digests in regular consensuses, and by
+  SHA1 digests of identity keys and SHA256 digests of microdescriptors
+  in microdesc consensuses.  The consensus-flavors design allows us to
+  generate new flavors of consensus that identity routers by new hashes
+  of their identity keys.  Alternatively, existing consensuses could be
+  expanded to contain more hashes, though that would have some space
+  concerns.
+
+  Router descriptors themselves are signed using RSA-1024 identity keys
+  and SHA1.  For information on updating identity keys, see above.
+
+  Router descriptors and extra-info documents cross-certify one another
+  using SHA1.
+
+  Mirodescriptors are currently specified to contain exactly one
+  onion key, of length 1024 bits.
+
+3.6. The directory protocol
+
+  Most objects are indexed by SHA1 hash of an identity key or a
+  descriptor object.  Adding more hash types wouldn't be a huge problem
+  at the directory cache level.
+
+3.7. The hidden service protocol
+
+  Hidden services self-identify by a 1024-bit RSA key.  Other keys
+  lengths are not supported.  This key is turned into an 80 bit half
+  SHA-1 hash for hidden service names.
+
+  The most simple change here would be to set an interface for putting
+  the whole ugly SHA1 hash in the hidden service name.  Remember that
+  this needs to coexist with the authentication system which also uses
+  .onion hostnames; that hostnames top out around 255 characters and and
+  their components top out at 63.
+
+  Currently, ESTABLISH_INTRO cells take a key length parameter, so in
+  theory they allow longer keys.  The rest of the protocol assumes that
+  this will be hashed into a 20-byte SHA1 identifier.  Changing that
+  would require changes at the introduction point as well as the hidden
+  service.
+
+  The parsing code for hidden service descriptors currently enforce a
+  1024-byte identity key, though this does not seem to be described in
+  the specification.  Changing that would be at least as hard as doing
+  it for regular identity keys.
+
+  Fortunately, hidden services are nearly completely orthogonal to
+  everything else.
+