   1 Filename: 175-automatic-node-promotion.txt
   2 Title: Automatically promoting Tor clients to nodes
   3 Author: Steven Murdoch
   4 Created: 12-Mar-2010
   5 Status: Draft
   6
   7 1. Overview
   8
   9    This proposal describes how Tor clients could determine when they
  10    have sufficient bandwidth capacity and are sufficiently reliable to
  11    become either bridges or Tor relays. When they meet these
  12    criteria, they will automatically promote themselves, based on user
  13    preferences. The proposal also defines the new controller messages
  14    and options which will control this process.
  15
  16    Note that for the moment, only transitions between client and
  17    bridge are being considered. Transitions to public relay will
  18    be considered at a future date, but will use the same
  19    infrastructure for measuring capacity and reliability.
  20
  21 2. Motivation and history
  22
  23    Tor has a growing user-base and one of the major impediments to the
  24    quality of service offered is the lack of network capacity. This is
  25    particularly the case for bridges, because these are gradually
  26    being blocked, and thus no longer of use to people within some
  27    countries. By automatically promoting Tor clients to bridges, and
  28    perhaps also to full public relays, this proposal aims to solve
  29    these problems.
  30
  31    Only Tor clients which are sufficiently useful should be promoted,
  32    and the process of determining usefulness should be performed
  33    without reporting the existence of the client to the central
  34    authorities. The criteria used for determining usefulness will be
  35    expressed in terms of bandwidth capacity and uptime, with the
  36    parameters specified in the directory consensus. State stored at the client
  37    should be in no more detail than necessary, to prevent sensitive
  38    information being recorded.
  39
  40 3. Design
  41
  42 3.x Opt-in state model
  43
  44    Tor can be in one of five node-promotion states:
  45
  46    - off (O): Currently a client, and will stay as such
  47    - auto (A): Currently a client, but will consider promotion
  48    - bridge (B): Currently a bridge, and will stay as such
  49    - auto-bridge (AB): Currently a bridge, but will consider promotion
  50    - relay (R): Currently a public relay, and will stay as such
  51
  52    The state can be fully controlled from the configuration file or
  53    controller, but the normal state transitions are as follows:
  54
  55    Any state -> off: User has opted out of node promotion
  56    Off -> any state: Only permitted with user consent
  57
  58    Auto -> auto-bridge: Tor has detected that it is sufficiently
  59     reliable to be a *bridge*
  60    Auto -> bridge: Tor has detected that it is sufficiently reliable
  61     to be a *relay*, but the user has chosen to remain a *bridge*
  62    Auto -> relay: Tor has detected that it is sufficiently reliable
  63     to be a *relay*, and will skip being a *bridge*
  64    Auto-bridge -> relay: Tor has detected that it is sufficiently
  65     reliable to be a *relay*
  66
  67    Note that this model does not support automatic demotion. If this
  68    is desirable, there should be some memory as to whether the
  69    previous state was relay, bridge, or auto-bridge. Otherwise the
  70    user may be prompted to become a relay, despite having opted to
  71    be only a bridge.
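
   The state model above can be sketched as a small table of permitted
   transitions. This is only an illustration: the names PromotionState,
   AUTOMATIC and transition_allowed are hypothetical and not part of
   Tor, and the sketch assumes that any change requested through the
   configuration file or controller is allowed, while Tor on its own
   may take only the four promotions listed.

```python
from enum import Enum


class PromotionState(Enum):
    OFF = "O"           # client, will stay a client
    AUTO = "A"          # client, will consider promotion
    BRIDGE = "B"        # bridge, will stay a bridge
    AUTO_BRIDGE = "AB"  # bridge, will consider promotion
    RELAY = "R"         # public relay, will stay a relay


# Promotions Tor may perform by itself, as listed in the proposal.
AUTOMATIC = {
    (PromotionState.AUTO, PromotionState.AUTO_BRIDGE),
    (PromotionState.AUTO, PromotionState.BRIDGE),
    (PromotionState.AUTO, PromotionState.RELAY),
    (PromotionState.AUTO_BRIDGE, PromotionState.RELAY),
}


def transition_allowed(src, dst, user_requested=False):
    """Return True if moving from src to dst is permitted.

    Assumption: the user (config file or controller) has full control,
    which covers both "any state -> off" and "off -> any state".
    """
    if user_requested:
        return True
    return (src, dst) in AUTOMATIC
```

   Because no pair in AUTOMATIC leads to a lower state, the sketch
   reflects the lack of automatic demotion noted above.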
  72
  73 3.x User interaction policy
  74
  75    There are several options for involving the user in the
  76    decision as to whether and when to perform node promotion. The
  77    choice may also differ between Tor running under Vidalia (where
  78    it can readily prompt the user for information) and standalone
  79    (where Tor can only log messages, which may or may not be read).
  80
  81    The option requiring minimal user interaction is to automatically
  82    promote nodes according to reliability, and allow the user to opt
  83    out, by changing settings in the configuration file or Vidalia user
  84    interface.
  85
  86    Alternatively, if a user interface is available, Tor could prompt
  87    the user when it detects that a transition is available, and allow
  88    the user to choose which of the available options to select. If
  89    Vidalia is not available, it still may be possible to solicit an
  90    email address on install, and contact the operator to ask whether
  91    a transition to bridge or relay is permitted.
  92
  93    Finally, Tor could by default not make any transition, and the user
  94    would need to opt in by stating the maximum level (bridge or
  95    relay) to which the node may automatically promote itself.
  96
  97 3.x Performance monitoring model
  98
  99    To prevent a large number of clients activating as relays, but
 100    being too unreliable to be useful, clients should measure their
 101    performance. If this performance meets the parameterized acceptance
 102    criteria, a client should consider promotion. To measure
 103    reliability, this proposal adopts a simple user model:
 104
 105     - A user decides to use Tor at times which follow a Poisson
 106       process
 107     - At each time, the user will be happy if the bridge chosen has
 108       adequate bandwidth and is reachable
 109     - If the chosen bridge is down or slow too many times, the user
 110       will consider Tor to be bad
 111
 112    If we additionally assume that the recent history of relay
 113    performance matches the current performance, we can measure
 114    reliability by simulating this simple user.
 115
 116    The following parameters are distributed to clients in the
 117    directory consensus:
 118
 119      - min_bandwidth: Minimum self-measured bandwidth for a node to be
 120        considered useful, in bytes per second
 121      - check_period: How long, in seconds, to wait between checking
 122        reachability and bandwidth (on average)
 123      - num_samples: Number of recent samples to keep
 124      - num_useful: Minimum number of recent samples where the node was
 125        reachable and had at least min_bandwidth capacity, for a client
 126        to consider promoting to a bridge
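
   As a rough illustration, the four parameters could be held in a
   small record with sanity checks. The class name PromotionParams and
   the checks themselves are assumptions made for this sketch; the
   proposal does not specify how invalid consensus values are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionParams:
    min_bandwidth: int  # bytes per second
    check_period: int   # seconds, average gap between checks
    num_samples: int    # number of recent samples kept
    num_useful: int     # successes needed to consider promotion

    def __post_init__(self):
        # Hypothetical sanity checks, not taken from the proposal.
        if self.min_bandwidth < 0:
            raise ValueError("min_bandwidth must not be negative")
        if self.check_period < 60:
            # The monitoring loop runs once per 60 seconds, so a shorter
            # period cannot be honoured.
            raise ValueError("check_period must be at least 60 seconds")
        if self.num_samples < 1:
            raise ValueError("num_samples must be at least 1")
        if not 0 <= self.num_useful <= self.num_samples:
            # More required successes than samples would make
            # promotion impossible.
            raise ValueError("num_useful must be in [0, num_samples]")
```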
 127
 128    A different set of parameters may be used for considering when to
 129    promote a bridge to a full relay, but this will be the subject of a
 130    future revision of the proposal.
 131
 132 3.x Performance monitoring algorithm
 133
 134    The simulation described above can be implemented as follows:
 135
 136    Every 60 seconds:
 137      1. Tor generates a random floating point number x in
 138         the interval [0, 1).
 139      2. If x > (1 / (check_period / 60)) GOTO end; otherwise:
 140      3. Tor sets the value last_check to the current_time (in seconds)
 141      4. Tor measures reachability
 142      5. If the client is reachable, Tor measures its bandwidth
 143      6. If the client is reachable and the bandwidth is >=
 144         min_bandwidth, the test has succeeded, otherwise it has failed.
 145      7. Tor adds the test result to the end of a ring-buffer containing
 146         the last num_samples results: measurements_results
 147      8. Tor saves last_check and measurements_results to disk
 148      9. If the length of measurements_results == num_samples and
 149         the number of successes >= num_useful, Tor should consider
 150         promotion to a bridge
 151    end.
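
   The periodic steps above might look roughly like the following
   sketch. The function minute_tick and the callables is_reachable and
   measure_bandwidth are hypothetical stand-ins for Tor's real tests;
   updating and saving last_check (steps 3 and 8) is left out.

```python
import random
from collections import deque


def minute_tick(results, is_reachable, measure_bandwidth, *,
                min_bandwidth, check_period, num_samples, num_useful,
                rng=random.random):
    """Run one 60-second tick; return True if promotion should be considered.

    results is the ring buffer: a deque(maxlen=num_samples) of booleans.
    is_reachable and measure_bandwidth are hypothetical callables.
    """
    x = rng()                                   # step 1
    if x > 60.0 / check_period:                 # step 2: skip this tick
        return False
    # Steps 4-6: bandwidth is only measured when the node is reachable,
    # because "and" short-circuits.
    ok = is_reachable() and measure_bandwidth() >= min_bandwidth
    results.append(bool(ok))                    # step 7
    # Step 9: the buffer must be full and contain enough successes.
    return len(results) == num_samples and sum(results) >= num_useful
```

   A deque with maxlen set discards its oldest entry on append, which
   gives the ring-buffer behaviour without extra bookkeeping.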
 152
 153    When Tor starts, it must fill in the samples for which it was not
 154    running. This can only happen once the consensus has been downloaded,
 155    because the value of check_period is needed.
 156
 157       1. Tor generates a random number y from the Poisson distribution [1]
 158          with lambda = (current_time - last_check) * (1 / check_period)
 159       2. Tor sets the value last_check to the current_time (in seconds)
 160       3. Add y test failures to the ring buffer measurements_results
 161       4. Tor saves last_check and measurements_results to disk
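
   A minimal sketch of this start-up back-fill follows. It uses the
   multiplication method attributed to Knuth, one of the algorithms
   covered by [1], and splits large values of lambda into chunks so
   that exp(-lambda) does not underflow. The function names and the
   chunk size of 500 are choices made for this sketch only.

```python
import math
import random
from collections import deque


def _knuth(lam, rng):
    """Draw from Poisson(lam) by multiplying uniforms; fine for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng()
        if p <= threshold:
            return k
        k += 1


def poisson_sample(lam, rng=random.random):
    """Draw from Poisson(lam) as a sum of independent smaller draws."""
    total = 0
    while lam > 0:
        step = min(lam, 500.0)  # keeps exp(-step) well above underflow
        total += _knuth(step, rng)
        lam -= step
    return total


def backfill(results, last_check, now, check_period, rng=random.random):
    """Append failures for the time Tor was not running (steps 1-3).

    Returns the new value of last_check; saving to disk is omitted.
    """
    lam = (now - last_check) / check_period
    y = poisson_sample(lam, rng)
    results.extend([False] * y)  # a bounded deque drops its oldest entries
    return now
```

   A clock that has gone backwards gives a non-positive lambda, in
   which case this sketch adds no samples.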
 162
 163    In this way, a Tor client will measure its bandwidth and
 164    reachability every check_period seconds, on average. Provided
 165    check_period is sufficiently greater than a minute (say, at least an
 166    hour), the check times will approximate a Poisson process. [2]
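
   The reasoning behind this claim can be written out as follows,
   where p is the per-minute check probability from step 2, N is the
   number of 60-second ticks up to and including the next check, and
   K_n is the number of checks in n ticks. The symbols p, N and K_n
   are introduced here for illustration only.

```latex
\begin{align*}
p &= \frac{60}{\text{check\_period}}, \\
\Pr[N = k] &= (1-p)^{k-1}\,p, \qquad \mathbb{E}[N] = \frac{1}{p}, \\
\mathbb{E}[\text{interval}] &= 60 \cdot \mathbb{E}[N] = \text{check\_period}, \\
K_n &\sim \mathrm{Binomial}(n, p) \approx \mathrm{Poisson}(np)
      \quad \text{for small } p.
\end{align*}
```

   With check_period = 3600 this gives p = 1/60, or about 0.017, which
   is within the p <= .05 condition quoted in [2]. That condition
   itself corresponds to a check_period of at least 1200 seconds.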
 167
 168    While this requires Tor to record the state of a client
 169    over time, it does not leak much information. Only a binary
 170    success/failure result is stored, and the timing of samples becomes
 171    increasingly fuzzy as the data becomes less recent.
 172
 173    On IP address changes, Tor should clear the ring-buffer, because
 174    from the perspective of users with the old IP address, this node
 175    might as well be a new one with no history. This policy may change
 176    once we start allowing the bridge authority to hand out new IP
 177    addresses given the fingerprint.
 178    [Perhaps another consensus param? Also, this means we save previous
 179     IP address in our state file, yes? -RD]
 180
 181 3.x Bandwidth measurement
 182
 183    Tor needs to measure its bandwidth to test its usefulness as a
 184    bridge. A non-intrusive way to do this would be to passively measure
 185    the peak data transfer rate since the last reachability test. Once
 186    this exceeds min_bandwidth, Tor can set a flag that this node
 187    currently has sufficient bandwidth to pass the bandwidth component
 188    of the upcoming performance measurement.
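
   One way to sketch the passive measurement is a tracker that keeps
   the highest number of bytes seen in any one-second window since the
   last reachability test. The class PeakRateTracker and its methods
   are hypothetical, and the one-second window is an assumption; the
   proposal does not say how the peak rate is computed.

```python
class PeakRateTracker:
    """Tracks the peak per-second transfer rate since the last reset."""

    def __init__(self, min_bandwidth):
        self.min_bandwidth = min_bandwidth  # bytes per second
        self.peak = 0
        self._second = None  # whole second currently being accumulated
        self._bytes = 0      # bytes seen so far in that second

    def record(self, now, nbytes):
        """Account for nbytes transferred at time now (in seconds)."""
        second = int(now)
        if second != self._second:
            self._second, self._bytes = second, 0
        self._bytes += nbytes
        self.peak = max(self.peak, self._bytes)

    def sufficient(self):
        """The flag described above: has the peak reached min_bandwidth?"""
        return self.peak >= self.min_bandwidth

    def reset(self):
        """Call after each reachability test to start a new interval."""
        self.peak, self._second, self._bytes = 0, None, 0
```

   The comparison uses >= to match step 6 of the monitoring algorithm.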
 189
 190    For the first version we may simply skip the bandwidth test,
 191    because the existing reachability test sends 500 kB over several
 192    circuits, and checks whether the node can transfer at least 50
 193    kB/s.  This is probably good enough for a bridge, so this test
 194    might be sufficient to record a success in the ring buffer.
 195
 196 3.x New options
 197
 198 3.x New controller message
 199
 200 4. Migration plan
 201
 202    We should start by setting a high bandwidth and uptime requirement
 203    in the consensus, so as to avoid overloading the bridge authority
 204    with too many bridges. Once we are confident our systems can scale,
 205    the criteria can be gradually shifted down to gain more bridges.
 206
 207 5. Related proposals
 208
 209 6. Open questions:
 210
 211    - What user interaction policy should we take?
 212
 213    - When (if ever) should we turn a relay into an exit relay?
 214
 215    - What should the rate limits be for auto-promoted bridges/relays?
 216      Should we prompt the user for this?
 217
 218    - Perhaps the bridge authority should tell potential bridges
 219      whether to enable themselves, by taking into account whether
 220      their IP address is blocked
 221
 222    - How do we explain the possible risks of running a bridge/relay?
 223      * Use of bandwidth/congestion
 224      * Publication of IP address
 225      * Blocking from IRC (even for non-exit relays)
 226
 227    - What feedback should we give to bridge relays to encourage them,
 228      e.g. number of recent users (what about reserve bridges)?
 229
 230    - Can clients back off from doing these tests? (Yes, we should do
 231      this.)
 232
 233 [1] For algorithms to generate random numbers from the Poisson
 234     distribution, see: http://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables
 235 [2] "The sample size n should be equal to or larger than 20 and the
 236      probability of a single success, p, should be smaller than or equal to
 237      .05. If n >= 100, the approximation is excellent if np is also <= 10."
 238     http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm (e-Handbook of Statistical Methods)
 239
 240 % vim: spell ai et: