doc/spec/path-spec.txt


                           Tor Path Specification

                              Roger Dingledine
                               Nick Mathewson

Note: This is an attempt to specify Tor as currently implemented.  Future
versions of Tor will implement improved algorithms.

This document tries to cover how Tor chooses to build circuits and assign
streams to circuits.  Other implementations MAY take other approaches, but
implementors should be aware of the anonymity and load-balancing implications
of their choices.

                    THIS SPEC ISN'T DONE YET.

1. General operation

   Tor begins building circuits as soon as it has enough directory
   information to do so (see section 5 of dir-spec.txt).  Some circuits are
   built preemptively because we expect to need them later (for user
   traffic), and some are built because of immediate need (for user traffic
   that no current circuit can handle, for testing the network or our
   reachability, and so on).

   When a client application creates a new stream (by opening a SOCKS
   connection or launching a resolve request), we attach it to an appropriate
   open circuit if one exists, or wait if an appropriate circuit is
   in progress.  We launch a new circuit only if no current circuit can
   handle the request.  We rotate circuits over time to avoid some profiling
   attacks.

   To build a circuit, we choose all the nodes we want to use, and then
   construct the circuit.  Sometimes, when we want a circuit that ends at a
   given hop, and we have an appropriate unused circuit, we "cannibalize" the
   existing circuit and extend it to the new terminus.

   These processes are described in more detail below.

   This document describes Tor's automatic path selection logic only; path
   selection can be overridden by a controller (with the EXTENDCIRCUIT and
   ATTACHSTREAM commands).  Paths constructed through these means may
   violate some constraints given below.

1.1. Terminology

   A "path" is an ordered sequence of nodes, not yet built as a circuit.

   A "clean" circuit is one that has not yet been used for any traffic.

   A "fast" or "stable" or "valid" node is one that has the 'Fast' or
   'Stable' or 'Valid' flag set respectively, based on our current
   directory information.  A "fast" or "stable" circuit is one consisting
   only of "fast" or "stable" nodes.

   In an "exit" circuit, the final node is chosen based on waiting stream
   requests if any, and in any case it avoids nodes with exit policy of
   "reject *:*". An "internal" circuit, on the other hand, is one where
   the final node is chosen just like a middle node (ignoring its exit
   policy).

   A "request" is a client-side stream or DNS resolve that needs to be
   served by a circuit.

   A "pending" circuit is one that we have started to build, but which has
   not yet completed.

   A circuit or path "supports" a request if it is okay to use the
   circuit/path to fulfill the request, according to the rules given below.
   A circuit or path "might support" a request if some aspect of the request
   is unknown (usually its target IP), but we believe the path probably
   supports the request according to the rules given below.

1.2. A server's bandwidth

   Old versions of Tor did not report bandwidths in network status
   documents, so clients had to learn them from the routers' advertised
   server descriptors.

   For versions of Tor prior to 0.2.1.17-rc, everywhere below where we
   refer to a server's "bandwidth", we mean its clipped advertised
   bandwidth, computed by taking the smaller of the 'rate' and
   'observed' arguments to the "bandwidth" element in the server's
   descriptor.  If a router's advertised bandwidth is greater than
   MAX_BELIEVABLE_BANDWIDTH (currently 10 MB/s), we clip it to that
   value.

   For more recent versions of Tor, we take the bandwidth value declared
   in the consensus, and fall back to the clipped advertised bandwidth
   only if the consensus does not have bandwidths listed.
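
   As a minimal sketch of the rule above (not Tor's actual code), the
   bandwidth lookup could be written as follows.  The function names are
   hypothetical, the cap is assumed to mean 10,000,000 bytes per second,
   and any unit conversion between consensus and descriptor values is
   ignored.

```python
from typing import Optional

# Assumption for this sketch: "10 MB/s" expressed in bytes per second.
MAX_BELIEVABLE_BANDWIDTH = 10 * 1000 * 1000


def clipped_advertised_bandwidth(rate: int, observed: int) -> int:
    """Smaller of 'rate' and 'observed', capped at MAX_BELIEVABLE_BANDWIDTH."""
    return min(rate, observed, MAX_BELIEVABLE_BANDWIDTH)


def effective_bandwidth(consensus_bw: Optional[int],
                        rate: int, observed: int) -> int:
    """Prefer the consensus value; otherwise use the clipped advertised one."""
    if consensus_bw is not None:
        return consensus_bw
    return clipped_advertised_bandwidth(rate, observed)
```

   Here consensus_bw is None when the consensus lists no bandwidth for the
   router, which triggers the fallback.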

2. Building circuits

2.1. When we build

2.1.1. Clients build circuits preemptively

   When running as a client, Tor tries to maintain at least a certain
   number of clean circuits, so that new streams can be handled
   quickly.  To increase the likelihood of success, Tor tries to
   predict what circuits will be useful by choosing from among nodes
   that support the ports we have used in the recent past (by default
   one hour). Specifically, on startup Tor tries to maintain one clean
   fast exit circuit that allows connections to port 80, and at least
   two fast clean stable internal circuits in case we get a resolve
   request or hidden service request (at least three if we _run_ a
   hidden service).

   After that, Tor will adapt the circuits that it preemptively builds
   based on the requests it sees from the user: it tries to have two fast
   clean exit circuits available for every port seen within the past hour
   (each circuit can be adequate for many predicted ports -- it doesn't
   need two separate circuits for each port), and it tries to have the
   above internal circuits available if we've seen resolves or hidden
   service activity within the past hour. If there are 12 or more clean
   circuits open, it doesn't open more even if it has more predictions.

   Only stable circuits can "cover" a port that is listed in the
   LongLivedPorts config option. Similarly, hidden service requests
   to ports listed in LongLivedPorts make us create stable internal
   circuits.

   Note that if there are no requests from the user for an hour, Tor
   will predict no use and build no preemptive circuits.

   The Tor client SHOULD NOT store its list of predicted requests to a
   persistent medium.

2.1.2. Clients build circuits on demand

   Additionally, when a client request exists that no circuit (built or
   pending) might support, we create a new circuit to support the request.
   For exit connections, we pick an exit node that will handle the
   most pending requests (choosing arbitrarily among ties), launch a
   circuit to end there, and repeat until every unattached request
   might be supported by a pending or built circuit. For internal
   circuits, we pick an arbitrary acceptable path, repeating as needed.

   In some cases we can reuse an already established circuit if it's
   clean; see Section 2.3 (cannibalizing circuits) for details.
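
   The exit choice for on-demand circuits described in this section is a
   greedy covering loop.  The sketch below illustrates it under stated
   assumptions: requests and exits are plain strings, might_support is a
   caller-supplied predicate, and plan_exit_circuits is a hypothetical name
   that does not appear in Tor.

```python
from typing import Callable, Dict, List, Sequence


def plan_exit_circuits(
    requests: Sequence[str],
    exits: Sequence[str],
    might_support: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Map each chosen exit to the unattached requests it would cover."""
    unattached = list(requests)
    plan: Dict[str, List[str]] = {}
    while unattached:
        def coverage(exit_node: str) -> int:
            return sum(1 for r in unattached if might_support(exit_node, r))

        # max() keeps the first of several tied exits, which is one
        # arbitrary way to break ties.
        best = max(exits, key=coverage, default=None)
        if best is None or coverage(best) == 0:
            break  # no exit helps; the remaining requests stay unattached
        plan[best] = [r for r in unattached if might_support(best, r)]
        unattached = [r for r in unattached if not might_support(best, r)]
    return plan
```

   Each key of the returned mapping corresponds to one circuit to launch;
   requests that no exit might support are simply left out of the plan.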

2.1.3. Servers build circuits for testing reachability and bandwidth

   Tor servers test reachability of their ORPort once they have
   successfully built a circuit (on start and whenever their IP address
   changes). They build an ordinary fast internal circuit with themselves
   as the last hop. As soon as any testing circuit succeeds, the Tor
   server decides it's reachable and is willing to publish a descriptor.

   We launch multiple testing circuits (one at a time), until we
   have NUM_PARALLEL_TESTING_CIRC (4) such circuits open. Then we
   do a "bandwidth test" by sending a certain number of relay drop
   cells down each circuit: BandwidthRate * 10 / CELL_NETWORK_SIZE
   total cells divided across the four circuits, but never more than
   CIRCWINDOW_START (1000) cells total. This exercises both outgoing and
   incoming bandwidth, and helps to jumpstart the observed bandwidth
   (see dir-spec.txt).
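
   The cell count for the bandwidth test can be worked through with a few
   lines of arithmetic.  This sketch assumes that BandwidthRate is given in
   bytes per second, that CELL_NETWORK_SIZE is 512 bytes, and that divisions
   round down; none of these details is stated above.

```python
from typing import Tuple

CELL_NETWORK_SIZE = 512        # bytes per cell (assumed for this sketch)
CIRCWINDOW_START = 1000        # cap on total cells, per the text above
NUM_PARALLEL_TESTING_CIRC = 4  # number of testing circuits, per the text


def bandwidth_test_cells(bandwidth_rate: int) -> Tuple[int, int]:
    """Return (total cells, cells per circuit) for the bandwidth test."""
    total = min(bandwidth_rate * 10 // CELL_NETWORK_SIZE, CIRCWINDOW_START)
    return total, total // NUM_PARALLEL_TESTING_CIRC
```

   For example, a BandwidthRate of 20480 bytes per second gives 400 cells in
   total (100 per circuit), while any rate of 51200 or more hits the cap of
   1000 cells (250 per circuit).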

   Tor servers also test reachability of their DirPort once they have
   established a circuit, but they use an ordinary exit circuit for
   this purpose.

2.1.4. Hidden-service circuits

   See section 4 below.

2.1.5. Rate limiting of failed circuits

   If we fail to build a circuit N times in an X-second period (see Section
   2.4 for how failures are handled), we stop building circuits until the X
   seconds have elapsed.
   XXXX

2.1.6. When to tear down circuits

   XXXX

2.2. Path selection and constraints

   We choose the path for each new circuit before we build it.  We choose the
   exit node first, followed by the other nodes in the circuit.  All paths
   we generate obey the following constraints:
     - We do not choose the same router twice for the same path.
     - We do not choose any router in the same family as another in the same
       path.
     - We do not choose more than one router in a given /16 subnet
       (unless EnforceDistinctSubnets is 0).
     - We don't choose any non-running or non-valid router unless we have
       been configured to do so. By default, we are configured to allow
       non-valid routers in "middle" and "rendezvous" positions.
     - If we're using Guard nodes, the first node must be a Guard (see
       section 5 below).
     - XXXX Choosing the length
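
   The first three constraints in the list above can be checked pairwise
   against the hops already chosen.  The following is a rough sketch, not
   Tor's implementation: the Router type and the conflicts function are
   hypothetical, only IPv4 addresses are handled, and each router's family
   is assumed to be already resolved into a set of nicknames.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Router:
    nickname: str
    address: str                           # dotted-quad IPv4 address
    family: FrozenSet[str] = frozenset()   # nicknames of family members


def same_slash16(a: str, b: str) -> bool:
    """True if two IPv4 addresses fall in the same /16 subnet."""
    network = ipaddress.ip_network(f"{a}/16", strict=False)
    return ipaddress.ip_address(b) in network


def conflicts(candidate: Router, path: List[Router],
              enforce_distinct_subnets: bool = True) -> bool:
    """True if adding candidate to path would break a constraint."""
    for hop in path:
        if candidate.nickname == hop.nickname:
            return True  # same router twice
        if candidate.nickname in hop.family or hop.nickname in candidate.family:
            return True  # same family
        if enforce_distinct_subnets and same_slash16(candidate.address,
                                                     hop.address):
            return True  # same /16 subnet
    return False
```

   Setting enforce_distinct_subnets to False mirrors EnforceDistinctSubnets
   being 0.  The Running, Valid, and Guard checks are left out because they
   depend on directory state that this sketch does not model.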

   For circuits that do not need to be "fast", when choosing among
   multiple candidates for a path element, we choose randomly.

   For "fast" circuits, we pick a given router as an exit with probability
   proportional to its bandwidth.

   For non-exit positions on "fast" circuits, we pick routers as above, but
   we weight the bandwidth of Exit-flagged nodes depending on the fraction
   of bandwidth available from non-Exit nodes.  Call the total bandwidth
   for Exit nodes under consideration E, and the total bandwidth for all
   nodes under consideration T.  If E<T/3, we do not consider Exit-flagged
   nodes.  Otherwise, we weight their bandwidth with the factor (E-T/3)/E.
   This ensures that bandwidth is evenly distributed over nodes in 3-hop
   paths.

   Similarly, guard nodes are weighted by the factor (G-T/3)/G, where G is
   the total bandwidth for Guard nodes under consideration, and are not
   considered for non-guard positions if this value is less than 0.
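
   The weighting factor has the same shape for Exit and Guard nodes, so a
   single helper can illustrate both.  This is a sketch only; the name
   scarce_weight is hypothetical, and returning 0 stands for "do not
   consider these nodes at all".

```python
def scarce_weight(scarce_bw: float, total_bw: float) -> float:
    """Return (S - T/3) / S, or 0.0 when S < T/3.

    S is the total bandwidth of the flagged class (E for Exit nodes, G for
    Guard nodes) and T is the total bandwidth of all candidate nodes.
    """
    if scarce_bw <= 0 or scarce_bw < total_bw / 3:
        return 0.0
    return (scarce_bw - total_bw / 3) / scarce_bw
```

   With T = 100, a class holding E = 50 is weighted by about 1/3, a class
   holding E = 30 is excluded (weight 0), and if every node carries the flag
   (E = 100) the weight is 2/3.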

   Additionally, we may be building circuits with one or more requests in
   mind.  Each kind of request puts certain constraints on paths:

     - All service-side introduction circuits and all rendezvous paths
       should be Stable.
     - All connection requests for connections that we think will need to
       stay open a long time require Stable circuits.  Currently, Tor decides
       this by examining the request's target port, and comparing it to a
       list of "long-lived" ports. (Default: 21, 22, 706, 1863, 5050,
       5190, 5222, 5223, 6667, 6697, 8300.)
     - DNS resolves require an exit node whose exit policy is not equivalent
       to "reject *:*".
     - Reverse DNS resolves require a version of Tor with advertised eventdns
       support (available in Tor 0.1.2.1-alpha-dev and later).
     - All connection requests require an exit node whose exit policy
       supports their target address and port (if known), or which "might
       support it" (if the address isn't known).  See 2.2.1.
     - Rules for Fast? XXXXX

2.2.1. Choosing an exit

   If we know what IP address we want to connect to or resolve, we can
   trivially tell whether a given router will support it by simulating
   its declared exit policy.

   Because we often connect to addresses of the form hostname:port, we do not
   always know the target IP address when we select an exit node.  In these
   cases, we need to pick an exit node that "might support" connections to a
   given port with an unknown address.  An exit node "might support" such a
   connection if any clause that accepts any connections to that port
   precedes all clauses (if any) that reject all connections to that port.
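
   The "might support" rule amounts to a single pass over the policy
   clauses in order.  The sketch below assumes a simplified, hypothetical
   clause representation (an accept/reject flag, an address pattern where
   "*" means every address, and an inclusive port range); it is not Tor's
   exit policy parser.

```python
from typing import List, NamedTuple


class Clause(NamedTuple):
    accept: bool   # True for "accept", False for "reject"
    address: str   # "*" for all addresses, else some address pattern
    port_lo: int   # inclusive port range
    port_hi: int


def might_support(policy: List[Clause], port: int) -> bool:
    """Apply the ordering rule for a port whose target address is unknown."""
    for clause in policy:
        if not clause.port_lo <= port <= clause.port_hi:
            continue  # clause says nothing about this port
        if clause.accept:
            return True   # an accepting clause came first
        if clause.address == "*":
            return False  # all connections to this port rejected first
        # A reject for only some addresses does not settle the question.
    return False  # no clause accepts any connection to this port
```

   For the policy "reject 10.0.0.0/8:*, accept *:80, reject *:*", port 80
   might be supported (the first reject covers only some addresses), while
   port 25 is not.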

   Unless requested to do so by the user, we never choose an exit server
   flagged as "BadExit" by more than half of the authorities who advertise
   themselves as listing bad exits.

2.2.2. User configuration

   Users can alter the default behavior for path selection with configuration
   options.

   - If "ExitNodes" is provided, then every request requires an exit node on
     the ExitNodes list.  (If a request is supported by no nodes on that list,
     and StrictExitNodes is false, then Tor treats that request as if
     ExitNodes were not provided.)

   - "EntryNodes" and "StrictEntryNodes" behave analogously.

   - If a user tries to connect to or resolve a hostname of the form
     <target>.<servername>.exit, the request is rewritten to a request for
     <target>, and the request is only supported by the exit whose nickname
     or fingerprint is <servername>.
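
   The ".exit" rewriting in the last item can be illustrated with a small
   string-splitting helper.  This is a sketch with a hypothetical function
   name; hostnames that do not have both a <target> and a <servername> part
   are returned unchanged, which is an assumption rather than specified
   behavior.

```python
from typing import Optional, Tuple


def split_exit_hostname(hostname: str) -> Tuple[str, Optional[str]]:
    """Return (target, servername); servername is None without ".exit"."""
    suffix = ".exit"
    if not hostname.lower().endswith(suffix):
        return hostname, None
    stem = hostname[: -len(suffix)]
    # The server name is the last label before ".exit"; everything before
    # it is the target to connect to or resolve.
    target, sep, servername = stem.rpartition(".")
    if not sep or not target or not servername:
        return hostname, None  # malformed for this sketch; leave as-is
    return target, servername
```

   The returned server name would then restrict the request to the exit
   with that nickname or fingerprint.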

2.3. Cannibalizing circuits

   If we need a circuit and have a clean one already established, in
   some cases we can adapt the clean circuit for our new
   purpose. Specifically:

   For hidden service interactions, we can "cannibalize" a clean internal
   circuit if one is available, so we don't need to build those circuits
   from scratch on demand.

   We can also cannibalize clean circuits when the client asks to exit
   at a given node -- either via the ".exit" notation or because the
   destination is running at the same location as an exit node.

2.4. Handling failure

   If an attempt to extend a circuit fails (either because the first create
   failed or a subsequent extend failed) then the circuit is torn down and is
   no longer pending.  (XXXX really?)  Requests that might have been
   supported by the pending circuit thus become unsupported, and a new
   circuit needs to be constructed.

   If a stream "begin" attempt fails with an EXITPOLICY error, we
   decide that the exit node's exit policy is not correctly advertised,
   so we treat the exit node as if it were a non-exit until we retrieve
   a fresh descriptor for it.

   XXXX

3. Attaching streams to circuits

   When a circuit that might support a request is built, Tor tries to attach
   the request's stream to the circuit and sends a BEGIN, BEGIN_DIR, or
   RESOLVE relay cell as appropriate.  If the request completes
   unsuccessfully, Tor considers the reason given in the END relay cell.
   [XXX yes, and?]

   After a request has remained unattached for SocksTimeout (2 minutes
   by default), Tor abandons the attempt and signals an error to the
   client as appropriate (e.g., by closing the SOCKS connection).

   XXX Timeouts and when Tor auto-retries.
    * What stream-end-reasons are appropriate for retrying.

   If there is no reply to a BEGIN or RESOLVE cell, the stream times out
   and fails.

4. Hidden-service related circuits

  XXX Tracking expected hidden service use (client-side and hidserv-side)

5. Guard nodes

  We use Guard nodes (also called "helper nodes" in the literature) to
  prevent certain profiling attacks.  Here's the risk: if we choose entry and
  exit nodes at random, and an attacker controls C out of N servers
  (ignoring bandwidth), then the attacker will control the entry and exit
  node of any given circuit with probability (C/N)^2.  But as we make many
  different circuits over time, the probability that the attacker will see a
  sample of about (C/N)^2 of our traffic goes to 1.  Since statistical
  sampling works, the attacker can be sure of learning a profile of our
  behavior.

  If, on the other hand, we picked an entry node and held it fixed, we would
  have probability C/N of choosing a bad entry and being profiled, and
  probability (N-C)/N of choosing a good entry and not being profiled.
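
  The two strategies can be compared numerically.  The sketch below uses
  the probabilities given above and adds one standard assumption that is
  not in the text: circuits are chosen independently, so the chance that at
  least one of k circuits has both ends compromised is 1 - (1 - (C/N)^2)^k.
  The function names are hypothetical.

```python
def p_circuit_compromised(c: int, n: int) -> float:
    """Chance that one random circuit has a bad entry and a bad exit."""
    return (c / n) ** 2


def p_any_circuit_compromised(c: int, n: int, circuits: int) -> float:
    """Chance that at least one of several independent circuits is bad."""
    return 1.0 - (1.0 - p_circuit_compromised(c, n)) ** circuits


def p_profiled_with_fixed_entry(c: int, n: int) -> float:
    """Chance of being profiled when a single entry node is held fixed."""
    return c / n
```

  With C = 10 and N = 100, a single circuit is compromised with probability
  0.01, but over 1000 circuits the chance of at least one compromise
  exceeds 0.9999.  A fixed entry node caps the chance of being profiled at
  0.1, no matter how many circuits are built.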

  When guard nodes are enabled, Tor maintains an ordered list of entry nodes
  as our chosen guards, and stores this list persistently to disk.  If a Guard
  node becomes unusable, rather than replacing it, Tor adds new guards to the
  end of the list.  When choosing the first hop of a circuit, Tor chooses at
  random from among the first NumEntryGuards (default 3) usable guards on the
  list.  If there are not at least 2 usable guards on the list, Tor adds
  routers until there are, or until there are no more usable routers to add.
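
  The list handling in the paragraph above can be sketched as two small
  functions.  Everything here is illustrative: guards are plain strings,
  is_usable is a caller-supplied predicate, and the function names are
  hypothetical.

```python
import random
from typing import Callable, List, Optional

NUM_ENTRY_GUARDS = 3   # default NumEntryGuards, per the text above
MIN_USABLE_GUARDS = 2  # below this, new guards are appended


def top_up_guards(guards: List[str], candidates: List[str],
                  is_usable: Callable[[str], bool]) -> None:
    """Append usable candidates until enough usable guards are listed."""
    for router in candidates:
        if sum(1 for g in guards if is_usable(g)) >= MIN_USABLE_GUARDS:
            break
        if router not in guards and is_usable(router):
            guards.append(router)  # new guards always go at the end


def choose_first_hop(guards: List[str], is_usable: Callable[[str], bool],
                     rng: random.Random) -> Optional[str]:
    """Pick at random among the first NUM_ENTRY_GUARDS usable guards."""
    usable = [g for g in guards if is_usable(g)][:NUM_ENTRY_GUARDS]
    return rng.choice(usable) if usable else None
```

  Unusable guards stay in the list and keep their position, so a guard that
  becomes usable again regains its original priority.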

  A guard is unusable if any of the following hold:
    - it is not marked as a Guard by the networkstatuses,
    - it is not marked Valid (and the user hasn't set AllowInvalidNodes
      entry),
    - it is not marked Running, or
    - Tor couldn't reach it the last time it tried to connect.

  A guard is unusable for a particular circuit if any of the rules for path
  selection in 2.2 are not met.  In particular, if the circuit is "fast"
  and the guard is not Fast, or if the circuit is "stable" and the guard is
  not Stable, or if the guard has already been chosen as the exit node in
  that circuit, Tor can't use it as a guard node for that circuit.

  If the guard is excluded because of its status in the networkstatuses for
  over 30 days, Tor removes it from the list entirely, preserving order.

  If Tor fails to connect to an otherwise usable guard, it retries
  periodically: every hour for six hours, every 4 hours for 3 days, every
  18 hours for a week, and every 36 hours thereafter.  Additionally, Tor
  retries unreachable guards the first time it adds a new guard to the list,
  since it is possible that the old guards were only marked as unreachable
  because the network was unreachable or down.
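
  The retry schedule above can be expressed as a lookup on how long the
  guard has been unreachable.  This sketch assumes that each period (six
  hours, 3 days, a week) is measured from the moment the guard first became
  unreachable; the text does not say whether the periods are cumulative, and
  the function name is hypothetical.

```python
HOUR = 3600  # seconds
DAY = 24 * HOUR


def retry_interval(seconds_unreachable: int) -> int:
    """Return the delay in seconds before the next connection attempt."""
    if seconds_unreachable < 6 * HOUR:
        return HOUR        # every hour for six hours
    if seconds_unreachable < 3 * DAY:
        return 4 * HOUR    # every 4 hours for 3 days
    if seconds_unreachable < 7 * DAY:
        return 18 * HOUR   # every 18 hours for a week
    return 36 * HOUR       # every 36 hours thereafter
```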

  Tor does not add a guard persistently to the list until the first time we
  have connected to it successfully.

6. Router descriptor purposes

  There are currently three "purposes" supported for router descriptors:
  general, controller, and bridge. Most descriptors are of type general
  -- these are the ones listed in the consensus, and the ones fetched
  and used in normal cases.

  Controller-purpose descriptors are those delivered by the controller
  and labelled as such: they will be kept around (and expire like
  normal descriptors), and they can be used by the controller in its
  EXTENDCIRCUIT commands. Otherwise they are ignored by Tor when it
  chooses paths.

  Bridge-purpose descriptors are for routers that are used as bridges. See
  doc/design-paper/blocking.pdf for more design explanation, or proposal
  125 for specific details. Currently bridge descriptors are used in place
  of normal entry guards, for Tor clients that have UseBridges enabled.

X. Old notes

X.1. Do we actually do this?

How to deal with network down.
  - While all helpers are down/unreachable and there are no established
    or on-the-way testing circuits, launch a testing circuit. (Do this
    periodically in the same way we try to establish normal circuits
    when things are working normally.)
    (Testing circuits are a special type of circuit, that streams won't
    attach to by accident.)
  - When a testing circuit succeeds, mark all helpers up and hold
    the testing circuit open.
  - If a connection to a helper succeeds, close all testing circuits.
    Else mark that helper down and try another.
  - If the last helper is marked down and we already have a testing
    circuit established, then add the first hop of that testing circuit
    to the end of our helper node list, close that testing circuit,
    and go back to square one. (Actually, rather than closing the
    testing circuit, can we get away with converting it to a normal
    circuit and beginning to use it immediately?)

  [Do we actually do any of the above?  If so, let's spec it.  If not, let's
  remove it. -NM]

X.2. A thing we could do to deal with reachability.

And as a bonus, it leads to an answer to Nick's attack ("If I pick
my helper nodes all on 18.0.0.0:*, then I move, you'll know where I
bootstrapped") -- the answer is to pick your original three helper nodes
without regard for reachability. Then the above algorithm will add some
more that are reachable for you, and if you move somewhere, it's more
likely (though not certain) that some of the originals will become useful.
Is that smart or just complex?

X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.

  It is unlikely for two users to have the same set of entry guards.
  Observing a user is sufficient to learn its entry guards.  So, as we move
  around, entry guards make us linkable.  If we want to change guards when
  our location (IP? subnet?) changes, we have two bad options.  We could
    - Drop the old guards.  But if we go back to our old location,
      we'll not use our old guards.  For a laptop that sometimes gets used
      from work and sometimes from home, this is pretty fatal.
    - Remember the old guards as associated with the old location, and use
      them again if we ever go back to the old location.  This would be
      nasty, since it would force us to record where we've been.

  [Do we do any of this now? If not, this should move into 099-misc or
  098-todo. -NM]