mirror of
https://gitlab.torproject.org/tpo/core/tor.git
synced 2024-11-20 10:12:15 +01:00
Add proposed methodology for tracking national usage trends.
svn:r14578
parent
2238d8008d
commit
32065813ac
88
doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt
Normal file
@@ -0,0 +1,88 @@
Abstract

This document explains how to estimate roughly how many Tor users
there are, and how many of them are in each country. Statistics are
involved.
Motivation

There are a few reasons we need to keep track of which countries
Tor users (in aggregate) are coming from:

- Resource allocation. Knowing which underserved countries have
  lots of users tells us where to direct translation and outreach
  efforts.

- Anticensorship. A sudden drop in usage within one country can
  indicate the arrival of a censorious firewall.

- Sponsor outreach and self-evaluation. Many people and
  organizations who are interested in funding The Tor Project's
  work want to know that we're successfully serving the parts of
  the world they're interested in, and that efforts to expand our
  userbase are actually succeeding. So, when you come right down
  to it, do we.
Goals

We want to know approximately how many Tor users there are, and
which countries they're in, even in the presence of a hypothetical
"directory guard" feature. Some uncertainty is okay, but we'd like
to be able to put a bound on it.

We need to make sure this information isn't exposed in a way that
helps an adversary.
Methods

Every client downloads network status documents. There are
currently three methods (one hypothetical) for clients to get them.

- 0.1.2.x clients (and earlier) fetch a v2 networkstatus
  document about every NETWORKSTATUS_CLIENT_DL_INTERVAL [30
  minutes].

- 0.2.0.x clients fetch a v3 networkstatus consensus document
  at a random time between when their current document is no
  longer the freshest and when it is about to expire.

  [In both of the above cases, clients choose a directory cache at
  random with odds roughly proportional to its bandwidth.]

- In some future version, clients will choose directory caches
  to serve as their "directory guards" to avoid profiling
  attacks, similarly to how clients currently start all their
  circuits at guard nodes.
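
To make the "odds roughly proportional to its bandwidth" rule
concrete, here is a minimal sketch of bandwidth-weighted selection.
It is an illustration only, not Tor's actual selection code: the
cache nicknames, the bandwidth figures, and both function names are
hypothetical.

```python
import random


def choose_directory_cache(caches, rng=random):
    """Pick one cache with probability proportional to its bandwidth.

    `caches` maps a (hypothetical) cache nickname to its listed
    bandwidth; the units do not matter as long as they are consistent.
    """
    names = list(caches)
    weights = [caches[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def selection_probability(caches, name):
    """Return P, the chance that `name` is chosen for one request."""
    return caches[name] / sum(caches.values())


# Illustrative figures only.
caches = {"cacheA": 500, "cacheB": 300, "cacheC": 200}
print(selection_probability(caches, "cacheA"))
print(choose_directory_cache(caches))
```

The value returned by selection_probability() plays the role of the
per-request probability that a cache needs when it scales up its own
client counts.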
We assume that a directory cache can tell which of these three
categories a client is in by the format of its status request.
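
As a rough sketch of what "telling by the format of the request"
could look like, the function below sorts request paths into the
first two categories. The path prefixes are our assumption about how
v2 and v3 status requests differ, and the function name is
hypothetical. The directory-guard case has no defined request format
yet, so the sketch does not try to recognize it.

```python
def classify_status_request(path):
    """Guess which client category issued a networkstatus request.

    The prefixes below are assumptions, not something this proposal
    specifies.
    """
    if path.startswith("/tor/status-vote/current/consensus"):
        return "v3"  # 0.2.0.x-style consensus fetch
    if path.startswith("/tor/status/"):
        return "v2"  # 0.1.2.x-style (and earlier) networkstatus fetch
    return "other"   # anything else, including future formats


print(classify_status_request("/tor/status-vote/current/consensus.z"))
print(classify_status_request("/tor/status/all.z"))
```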
A directory cache can be made to count the distinct client IP
addresses that make a certain request of it in a given timeframe.
For the first two cases, a cache can get a picture of the overall
number and countries of users in the network by dividing its IP
count by the probability that a given client would have chosen it.
Suppose the cache's listed bandwidth is such that it expects to be
chosen with probability P for any given request, and it has been
counting IPs for long enough that the average client is expected
to have made N requests. Then each client will have visited the
cache at least once with probability P' = 1-(1-P)^N, so the cache
divides the IP counts it has seen by P' to get its estimate.
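
The arithmetic above can be sketched as follows. This is a minimal
illustration under stated assumptions: the function names are ours,
the IP-to-country table is a plain dict standing in for a real GeoIP
lookup, and the addresses and country codes are made up.

```python
from collections import defaultdict


def visit_probability(p, n):
    """P' = 1-(1-P)^N: chance a client used this cache at least once."""
    return 1.0 - (1.0 - p) ** n


def distinct_ips_by_country(request_ips, country_of):
    """Count distinct client IPs per country.

    `country_of` is a hypothetical IP-to-country table; unknown
    addresses are filed under "??".
    """
    seen = defaultdict(set)
    for ip in request_ips:
        seen[country_of.get(ip, "??")].add(ip)
    return {country: len(ips) for country, ips in seen.items()}


def estimate_users(counts, p, n):
    """Scale the observed per-country counts up by 1/P'."""
    p_visit = visit_probability(p, n)
    if p_visit <= 0:
        raise ValueError("cache is never chosen; nothing to scale")
    return {country: count / p_visit for country, count in counts.items()}


# Illustrative data only (documentation address ranges).
country_of = {"192.0.2.1": "de", "192.0.2.2": "de", "192.0.2.3": "de"}
request_ips = ["192.0.2.1", "192.0.2.1", "192.0.2.2",
               "192.0.2.3", "198.51.100.7"]
counts = distinct_ips_by_country(request_ips, country_of)
print(estimate_users(counts, p=0.5, n=2))
```

With P = 0.5 and N = 2, P' is 0.75, so three distinct addresses
seen from one country scale up to an estimate of four users there.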
If directory guards are in use, a directory guard gets a picture
of all those users who chose it as a guard while it was listed as
a good choice for a guard, and who are also on the network now.
The cleanest data here will come from three kinds of node: nodes
that were listed as good new-guard choices for a while, and have
not been so for a while longer (to study decay rates); nodes that
have been listed as good new-guard choices consistently for a long
time (to get a sample of the network); and nodes that have been
listed as good new-guard choices only recently (to get a sample of
new users and users whose guards have died out).
Note that these measurements *shouldn't* be taken at directory
authorities: their picture of the network is too skewed by the
special cases in which clients fetch from them directly.