Add guard node failure plans to proposal.

svn:r15706
Mike Perry 2008-07-06 23:36:33 +00:00
parent 0f8761f9fa
commit 272165e659


@@ -9,9 +9,9 @@ Status: Draft
 Overview
 The performance of paths selected can be improved by adjusting the
-CircuitBuildTimeout and the number of guards. This proposal describes
-a method of tracking buildtime statistics, and using those statistics
-to adjust the CircuitBuildTimeout and the number of guards.
+CircuitBuildTimeout and avoiding failing guard nodes. This proposal
+describes a method of tracking buildtime statistics, and using those
+statistics to adjust the CircuitBuildTimeout and the number of guards.
 Motivation
@@ -26,14 +26,17 @@ Implementation
 Based on studies of build times, we found that the distribution of
 circuit buildtimes appears to be a Pareto distribution. The number
-of circuits to observe (ncircuits_to_observe) before changing the
-CircuitBuildTimeout will be tunable. From our preliminary
-measurements, it is likely that ncircuits_to_observe will be
-somewhere on the order of 1000. The values can be represented
-compactly in Tor in milliseconds as a circular array of 16 bit
-integers. More compact long-term storage representations can be
-implemented by simply storing a histogram with 50 millisecond
-buckets when writing out the statistics to disk.
+of circuits to observe (ncircuits_to_cutoff) before changing the
+CircuitBuildTimeout will be tunable. From our measurements,
+ncircuits_to_cutoff appears to be on the order of 100.
+
+In addition, the total number of circuits gathered
+(ncircuits_to_observe) will also be tunable. It is likely that
+ncircuits_to_observe will be somewhere on the order of 1000. The values
+can be represented compactly in Tor in milliseconds as a circular array
+of 16 bit integers. More compact long-term storage representations can
+be implemented by simply storing a histogram with 50 millisecond buckets
+when writing out the statistics to disk.
 Calculating the preferred CircuitBuildTimeout
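The storage scheme in the hunk above (a circular array of 16 bit millisecond values, reduced to a histogram with 50 millisecond buckets for long-term storage) could be sketched roughly as follows. This is an illustrative Python sketch and not Tor's implementation: the class name BuildTimeLog, its methods, and the choice to clamp out-of-range values are all assumptions made for the example.

```python
from array import array
from collections import Counter

BUCKET_MS = 50    # histogram bucket width named in the proposal
MAX_MS = 0xFFFF   # largest value an unsigned 16 bit integer can hold


class BuildTimeLog:
    """Hypothetical fixed-size circular log of circuit build times (ms)."""

    def __init__(self, ncircuits_to_observe=1000):
        # "H" is an unsigned short, which is 16 bits on common platforms.
        self.times = array("H", [0] * ncircuits_to_observe)
        self.next_slot = 0   # index that the next sample will overwrite
        self.count = 0       # number of valid samples, capped at capacity

    def add(self, build_time_ms):
        # Assumption: clamp into [0, 65535] so every sample fits in 16 bits.
        value = max(0, min(int(build_time_ms), MAX_MS))
        self.times[self.next_slot] = value
        self.next_slot = (self.next_slot + 1) % len(self.times)
        self.count = min(self.count + 1, len(self.times))

    def histogram(self):
        """Map each bucket's lower bound (ms) to its number of samples."""
        # Before the buffer wraps, only the first `count` slots are valid;
        # after it wraps, `count` equals the capacity and all slots are used.
        recorded = self.times[:self.count]
        return Counter((t // BUCKET_MS) * BUCKET_MS for t in recorded)
```

With a capacity of 3, adding 120, 149, 4000 and then 70000 overwrites the oldest sample (120) with the clamped value 65535, so the histogram holds one sample each in the 100, 4000 and 65500 millisecond buckets.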
@@ -47,13 +50,43 @@ Implementation
 of expected CDF of timeouts. Also, in the event of network failure,
 the observation mechanism should stop collecting timeout data.
-Other notes
+Dropping Failed Guards
+In addition, we have noticed that some entry guards are much more
+failure prone than others. In particular, the circuit failure rates for
+the fastest entry guards were approximately 20-25%, whereas slower
+guards exhibit failure rates as high as 45-50%. In [1], it was
+demonstrated that failing guard nodes can deliberately bias path
+selection to improve their success at capturing traffic. For both these
+reasons, failing guards should be avoided.
+We propose increasing the number of entry guards to five, and gathering
+circuit failure statistics on each entry guard. Any guards that exceed
+the average failure rate of all guards by 10% after we have
+gathered ncircuits_to_observe circuits will be replaced.
+Issues
+Impact on anonymity
 Since this follows a Pareto distribution, large reductions on the
 timeout can be achieved without cutting off a great number of the
 total paths. However, hard statistics on which cutoff percentage
 gives optimal performance have not yet been gathered.
-Issues
+Guard Turnover
+We contend that the risk from failing guards biasing path selection
+outweighs the risk of exposure to larger portions of the network
+for the first hop. Furthermore, from our observations, it appears
+that circuit failure is strongly correlated with node load. Allowing
+clients to migrate away from failing guards should naturally
+rebalance the network, and eventually clients should converge on
+a stable set of reliable guards. It is also likely that once clients
+begin to migrate away from failing guards, their load should go
+down, causing their failure rates to drop as well.
+[1] http://www.crhc.uiuc.edu/~nikita/papers/relmix-ccs07.pdf
Impact on anonymity
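The guard replacement rule added by this commit (replace any guard whose failure rate exceeds the average of all guards by 10% once ncircuits_to_observe circuits have been gathered) might look roughly like the Python sketch below. The proposal text leaves several details open, so the sketch makes assumptions: "by 10%" is read as 10 percentage points, the average is the unweighted mean of the per-guard rates, and ncircuits_to_observe counts circuits across all guards. The function name guards_to_replace and the shape of its input are hypothetical.

```python
def guards_to_replace(stats, ncircuits_to_observe=1000, margin=0.10):
    """Return the guards whose failure rate exceeds the average by `margin`.

    `stats` maps a guard name to (failed_circuits, total_circuits).
    Assumptions (not fixed by the proposal): `margin` is in absolute
    percentage points, and the observation threshold applies to the
    total number of circuits summed over all guards.
    """
    total = sum(n for _, n in stats.values())
    if total < ncircuits_to_observe:
        return []  # not enough observations to judge any guard yet

    # Per-guard failure rate, skipping guards with no circuits at all.
    rates = {g: failed / n for g, (failed, n) in stats.items() if n > 0}
    if not rates:
        return []

    average = sum(rates.values()) / len(rates)
    return sorted(g for g, r in rates.items() if r > average + margin)
```

For five guards with 200 circuits each and 40, 50, 44, 46 and 100 failures, the rates are 20%, 25%, 22%, 23% and 50%, the average is 28%, and only the last guard exceeds the 38% threshold.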