doc/spec/proposals/161-computing-bandwidth-adjustments.txt

   1 Title: Computing Bandwidth Adjustments
   2 Filename: 161-computing-bandwidth-adjustments.txt
   3 Author: Mike Perry
   4 Created: 12-May-2009
   5 Target: 0.2.2.x
   6 Status: Finished
   7
   8
   9 1. Motivation
  10
  11   There is high variance in the performance of the Tor network. Despite
  12   our efforts to balance load evenly across the Tor nodes, some nodes are
  13   significantly slower and more overloaded than others.
  14
  15   Proposal 160 describes how we can augment the directory authorities to
  16   vote on measured bandwidths for routers. This proposal describes what
  17   goes into the measuring process.
  18
  19
  20 2. Measurement Selection
  21
  22   The general idea is to determine a load factor representing the ratio
  23   of the capacity of measured nodes to the rest of the network. This load
  24   factor could be computed from three potentially relevant statistics:
  25   circuit failure rates, circuit extend times, or stream capacity.
  26
  27   Circuit failure rates and circuit extend times appear to be
  28   non-linearly proportional to node load. We've observed that the same
  29   nodes when scanned at US nighttime hours (when load is presumably
  30   lower) exhibit almost no circuit failure, and significantly faster
  31   extend times than when scanned during the day.
  32
  33   Stream capacity, however, is much more uniform, even during US
  34   nighttime hours. Moreover, it is a more intuitive representation of
  35   node capacity, and also less dependent upon distance and latency
  36   if amortized over large stream fetches.
  37
  38
  39 3. Average Stream Bandwidth Calculation
  40
  41   The average stream bandwidths are obtained by dividing the network into
  42   slices of 50 nodes each, grouped according to advertised node bandwidth.
  43
  44   Two hop circuits are built using nodes from the same slice, and a large
  45   file is downloaded via these circuits. The file sizes are set based
  46   on node percentile rank as follows:
  47
  48      0-10: 2M
  49      10-20: 1M
  50      20-30: 512k
  51      30-50: 256k
  52      50-100: 128k
  53
  54   These sizes are based on measurements performed during test scans.
  55
  56   This process is repeated until each node has been chosen to participate
  57   in at least 5 circuits.
  58
  59
  60 4. Ratio Calculation
  61
  62   The ratios are calculated by dividing each measured value by the
  63   network-wide average.
  64
  65
  66 5. Ratio Filtering
  67
  68   After the base ratios are calculated, a second pass is performed
  69   to remove any streams with nodes of ratios less than X=0.5 from
  70   the results of other nodes. In addition, all outlying streams
  71   with capacity of one standard deviation below a node's average
  72   are also removed.
  73
  74   The final ratio result will be greater of the unfiltered ratio
  75   and the filtered ratio.
  76
  77
  78 6. Pseudocode for Ratio Calculation Algorithm
  79
  80   Here is the complete pseudocode for the ratio algorithm:
  81
  82     Slices = {S | S is 50 nodes of similar consensus capacity}
  83     for S in Slices:
  84       while exists node N in S with circ_chosen(N) < 7:
  85         fetch_slice_file(build_2hop_circuit(N, (exit in S)))
  86       for N in S:
  87         BW_measured(N) = MEAN(b | b is bandwidth of a stream through N)
  88         Bw_stddev(N) = STDDEV(b | b is bandwidth of a stream through N)
  89       Bw_avg(S) = MEAN(b | b = BW_measured(N) for all N in S)
  90       for N in S:
  91         Normal_Streams(N) = {stream via N | bandwidth >= BW_measured(N)}
  92         BW_Norm_measured(N) =  MEAN(b | b is a bandwidth of Normal_Streams(N))
  93
  94     Bw_net_avg(Slices) = MEAN(BW_measured(N) for all N in Slices)
  95     Bw_Norm_net_avg(Slices) = MEAN(BW_Norm_measured(N) for all N in Slices)
  96
  97     for N in all Slices:
  98       Bw_net_ratio(N) = Bw_measured(N)/Bw_net_avg(Slices)
  99       Bw_Norm_net_ratio(N) = BW_Norm_measured(N)/Bw_Norm_net_avg(Slices)
 100
 101       ResultRatio(N) = MAX(Bw_net_ratio(N), Bw_Norm_net_ratio(N))
 102
 103
 104 7. Security implications
 105
 106   The ratio filtering will deal with cases of sabotage by dropping
 107   both very slow outliers in stream average calculations, as well
 108   as dropping streams that used very slow nodes from the calculation
 109   of other nodes.
 110
 111   This scheme will not address nodes that try to game the system by
 112   providing better service to scanners. The scanners can be detected
 113   at the entry by IP address, and at the exit by the destination fetch
 114   IP.
 115
 116   Measures can be taken to obfuscate and separate the scanners' source
 117   IP address from the directory authority IP address. For instance,
 118   scans can happen offsite and the results can be rsynced into the
 119   authorities. The destination server IP can also change.
 120
 121   Neither of these methods are foolproof, but such nodes can already
 122   lie about their bandwidth to attract more traffic, so this solution
 123   does not set us back any in that regard.
 124
 125
 126 8. Parallelization
 127
 128   Because each slice takes as long as 6 hours to complete, we will want
 129   to parallelize as much as possible. This will be done by concurrently
 130   running multiple scanners from each authority to deal with different
 131   segments of the network. Each scanner piece will continually loop
 132   over a portion of the network, outputting files of the form:
 133
 134    node_id=<idhex> SP strm_bw=<BW_measured(N)> SP
 135          filt_bw=<BW_Norm_measured(N)> ns_bw=<CurrentConsensusBw(N)> NL
 136
 137   The most recent file from each scanner will be periodically gathered
 138   by another script that uses them to produce network-wide averages
 139   and calculate ratios as per the algorithm in section 6. Because nodes
 140   may shift in capacity, they may appear in more than one slice and/or
 141   appear more than once in the file set. The most recently measured
 142   line will be chosen in this case.
 143
 144
 145 9. Integration with Proposal 160
 146
 147   The final results will be produced for the voting mechanism
 148   described in Proposal 160 by multiplying the derived ratio by
 149   the average published consensus bandwidth during the course of the
 150   scan, and taking the weighted average with the previous consensus
 151   bandwidth:
 152
 153      Bw_new = Round((Bw_current * Alpha + Bw_scan_avg*Bw_ratio)/(Alpha + 1))
 154
 155   The Alpha parameter is a smoothing parameter intended to prevent
 156   rapid oscillation between loaded and unloaded conditions. It is
 157   currently fixed at 0.333.
 158
 159   The Round() step consists of rounding to the 3 most significant figures
 160   in base10, and then rounding that result to the nearest 1000, with
 161   a minimum value of 1000.
 162
 163   This will produce a new bandwidth value that will be output into a
 164   file consisting of lines of the form:
 165
 166      node_id=<idhex> SP bw=<Bw_new> NL
 167
 168   The first line of the file will contain a timestamp in UNIX time()
 169   seconds. This will be used by the authority to decide if the
 170   measured values are too old to use.
 171
 172   This file can be either copied or rsynced into a directory readable
 173   by the directory authority.
 174