doc/tor-spec.txt

   1 $Id$
   2
   3                          Tor Protocol Specification
   4
   5                               Roger Dingledine
   6                                Nick Mathewson
   7
   8 Note: This is an attempt to specify Tor as currently implemented.  Future
   9 versions of Tor will implement improved protocols, and compatibility is not
  10 guaranteed.
  11
  12 This is not a design document; most design criteria are not examined.  For
  13 more information on why Tor acts as it does, see tor-design.pdf.
  14
  15 TODO: (very soon)
  16       - REASON_CONNECTFAILED should include an IP.
  17       - Copy prose from tor-design to make everything more readable.
  18       - When do we rotate which keys (TLS, link, etc.)?
  19
  20 0. Notation:
  21
  22    PK -- a public key.
  23    SK -- a private key.
  24    K  -- a key for a symmetric cipher.
  25
  26    a|b -- concatenation of 'a' and 'b'.
  27
  28    [A0 B1 C2] -- a three-byte sequence, containing the bytes with
  29    hexadecimal values A0, B1, and C2, in that order.
  30
  31    All numeric values are encoded in network (big-endian) order.
  32
  33    Unless otherwise specified, all symmetric ciphers are AES in counter
  34    mode, with an IV of all 0 bytes.  Asymmetric ciphers are either RSA
  35    with 1024-bit keys and exponents of 65537, or DH where the generator (g)
  36    is 2 and the modulus (p) is the 1024-bit safe prime from rfc2409,
  37    section 6.2, whose hex representation is:
  38
  39      "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E08"
  40      "8A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B"
  41      "302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9"
  42      "A637ED6B0BFF5CB6F406B7EDEE386BFB5A899FA5AE9F24117C4B1FE6"
  43      "49286651ECE65381FFFFFFFFFFFFFFFF"
  44
  45    As an optimization, implementations SHOULD choose DH private keys (x) of
  46    320 bits.  Implementations that do this MUST NOT use any DH key more
  47    than once.
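
        As an illustration only, the DH parameters above can be sketched in
        Python.  The function names are hypothetical and not part of this
        specification; the sketch assumes only that the private key x is a
        fresh 320-bit random value that is thrown away after one handshake.

```python
import secrets

# 1024-bit safe prime from RFC 2409, section 6.2 (copied from the text above).
DH_P = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E08"
    "8A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B"
    "302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9"
    "A637ED6B0BFF5CB6F406B7EDEE386BFB5A899FA5AE9F24117C4B1FE6"
    "49286651ECE65381FFFFFFFFFFFFFFFF",
    16,
)
DH_G = 2
DH_PRIVATE_BITS = 320


def dh_generate_keypair():
    """Return a fresh (x, g^x mod p) pair; x must never be reused."""
    x = 0
    while x < 2:  # skip the trivial exponents 0 and 1
        x = secrets.randbits(DH_PRIVATE_BITS)
    return x, pow(DH_G, x, DH_P)


def dh_shared_secret(private_x, peer_public):
    """Compute g^xy mod p from our private x and the peer's public value."""
    return pow(peer_public, private_x, DH_P)
```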
  48
  49    All "hashes" are 20-byte SHA1 cryptographic digests.
  50
  51    When we refer to "the hash of a public key", we mean the SHA1 hash of the
  52    DER encoding of an ASN.1 RSA public key (as specified in PKCS#1).
  53
  54 1. System overview
  55
  56    Onion Routing is a distributed overlay network designed to anonymize
  57    low-latency TCP-based applications such as web browsing, secure shell,
  58    and instant messaging. Clients choose a path through the network and
  59    build a ``circuit'', in which each node (or ``onion router'' or ``OR'')
  60    in the path knows its predecessor and successor, but no other nodes in
  61    the circuit.  Traffic flowing down the circuit is sent in fixed-size
  62    ``cells'', which are unwrapped by a symmetric key at each node (like
  63    the layers of an onion) and relayed downstream.
  64
  65 2. Connections
  66
  67    There are two ways to connect to an onion router (OR). The first is
  68    as an onion proxy (OP), which allows the OP to authenticate the OR
  69    without authenticating itself.  The second is as another OR, which
  70    allows mutual authentication.
  71
  72    Tor uses TLS for link encryption.  All implementations MUST support
  73    the TLS ciphersuite "TLS_EDH_RSA_WITH_DES_192_CBC3_SHA", and SHOULD
  74    support "TLS_DHE_RSA_WITH_AES_128_CBC_SHA" if it is available.
  75    Implementations MAY support other ciphersuites, but MUST NOT
  76    support any suite that lacks ephemeral keys, symmetric keys of at
  77    least 128 bits, or digests of at least 160 bits.
  78
  79    An OP or OR always sends a two-certificate chain, consisting of a
  80    certificate using a short-term connection key and a second, self-
  81    signed certificate containing the OR's identity key. The commonName of the
  82    first certificate is the OR's nickname, and the commonName of the second
  83    certificate is the OR's nickname, followed by a space and the string
  84    "<identity>".
  85
  86    All parties receiving certificates must confirm that the identity key is
  87    as expected.  (When initiating a connection, the expected identity key is
  88    the one given in the directory; when creating a connection because of an
  89    EXTEND cell, the expected identity key is the one given in the cell.)  If
  90    the key is not as expected, the party must close the connection.
  91
  92    All parties SHOULD reject connections to or from ORs that have malformed
  93    or missing certificates.  ORs MAY accept or reject connections from OPs
  94    with malformed or missing certificates.
  95
  96    Once a TLS connection is established, the two sides send cells
  97    (specified below) to one another.  Cells are sent serially.  All
  98    cells are 512 bytes long.  Cells may be sent embedded in TLS
  99    records of any size or divided across TLS records, but the framing
 100    of TLS records MUST NOT leak information about the type or contents
 101    of the cells.
 102
 103    TLS connections are not permanent. An OP or an OR may close a
 104    connection to an OR if there are no circuits running over the
 105    connection, and an amount of time (KeepalivePeriod, defaults to 5
 106    minutes) has passed.
 107
 108    (As an exception, directory servers may try to stay connected to all of
 109    the ORs.)
 110
 111 3. Cell Packet format
 112
 113    The basic unit of communication for onion routers and onion
 114    proxies is a fixed-width "cell".  Each cell contains the following
 115    fields:
 116
 117         CircID                                [2 bytes]
 118         Command                               [1 byte]
 119         Payload (padded with 0 bytes)         [509 bytes]
 120                                          [Total size: 512 bytes]
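
        A minimal sketch of this layout in Python, using the standard
        struct module.  The helper names are illustrative assumptions, not
        part of the protocol.

```python
import struct

CELL_LEN = 512
PAYLOAD_LEN = 509
# Big-endian: 2-byte CircID, 1-byte Command, 509-byte Payload.
CELL_FORMAT = ">HB509s"


def pack_cell(circ_id, command, payload=b""):
    """Build one 512-byte cell; short payloads are padded with 0 bytes."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload does not fit in one cell")
    # The 's' format code pads the payload with NUL bytes up to 509.
    return struct.pack(CELL_FORMAT, circ_id, command, payload)


def unpack_cell(cell):
    """Split a 512-byte cell into (circ_id, command, payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cells are exactly 512 bytes long")
    return struct.unpack(CELL_FORMAT, cell)
```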
 121
 122    The CircID field determines which circuit, if any, the cell is
 123    associated with.
 124
 125    The 'Command' field holds one of the following values:
 126          0 -- PADDING     (Padding)                 (See Sec 6.2)
 127          1 -- CREATE      (Create a circuit)        (See Sec 4)
 128          2 -- CREATED     (Acknowledge create)      (See Sec 4)
 129          3 -- RELAY       (End-to-end data)         (See Sec 5)
 130          4 -- DESTROY     (Stop using a circuit)    (See Sec 4)
 131          5 -- CREATE_FAST (Create a circuit, no PK) (See Sec 4)
 132          6 -- CREATED_FAST (Circuit created, no PK) (See Sec 4)
 133
 134    The interpretation of 'Payload' depends on the type of the cell.
 135       PADDING: Payload is unused.
 136       CREATE:  Payload contains the handshake challenge.
 137       CREATED: Payload contains the handshake response.
 138       RELAY:   Payload contains the relay header and relay body.
 139       DESTROY: Payload is unused.
 140    Upon receiving any other value for the command field, an OR must
 141    drop the cell.
 142
 143    The payload is padded with 0 bytes.
 144
 145    PADDING cells are currently used to implement connection keepalive.
 146    If there is no other traffic, ORs and OPs send one another a PADDING
 147    cell every few minutes.
 148
 149    CREATE, CREATED, and DESTROY cells are used to manage circuits;
 150    see section 4 below.
 151
 152    RELAY cells are used to send commands and data along a circuit; see
 153    section 5 below.
 154
 155 4. Circuit management
 156
 157 4.1. CREATE and CREATED cells
 158
 159    Users set up circuits incrementally, one hop at a time. To create a
 160    new circuit, OPs send a CREATE cell to the first node, with the
 161    first half of the DH handshake; that node responds with a CREATED
 162    cell with the second half of the DH handshake plus the first 20 bytes
 163    of derivative key data (see section 4.2). To extend a circuit past
 164    the first hop, the OP sends an EXTEND relay cell (see section 5)
 165    which instructs the last node in the circuit to send a CREATE cell
 166    to extend the circuit.
 167
 168    The payload for a CREATE cell is an 'onion skin', which consists
 169    of the first step of the DH handshake data (also known as g^x).
 170
 171    The data is encrypted to Bob's PK as follows: Suppose Bob's PK
 172    modulus is L octets long. If the data to be encrypted is shorter
 173    than L-42, then it is encrypted directly (with OAEP padding: see
 174    ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-1/pkcs-1v2-1.pdf). If the
 175    data is at least as long as L-42, then a randomly generated 16-byte
 176    symmetric key is prepended to the data, after which the first L-16-42
 177    bytes of the data are encrypted with Bob's PK; and the rest of the
 178    data is encrypted with the symmetric key.
 179
 180    So in this case, the onion skin on the wire looks like:
 181        RSA-encrypted:
 182          OAEP padding                  [42 bytes]
 183          Symmetric key                 [16 bytes]
 184          First part of g^x             [70 bytes]
 185        Symmetrically encrypted:
 186          Second part of g^x            [58 bytes]
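
        The sizes above follow from the hybrid rule.  The sketch below
        only does the length arithmetic (no actual RSA or AES); the
        function names are hypothetical.

```python
OAEP_OVERHEAD = 42   # bytes consumed by OAEP padding
SYM_KEY_LEN = 16     # randomly generated symmetric key


def hybrid_layout(modulus_len, data_len):
    """Return (rsa_plain_len, sym_len).

    rsa_plain_len is the number of plaintext bytes placed inside the
    RSA-OAEP block (including the prepended symmetric key, if any);
    sym_len is the number of bytes encrypted with the symmetric key.
    """
    if data_len < modulus_len - OAEP_OVERHEAD:
        return data_len, 0  # short enough to encrypt directly
    head = modulus_len - SYM_KEY_LEN - OAEP_OVERHEAD
    return SYM_KEY_LEN + head, data_len - head


def onion_skin_len(modulus_len, data_len):
    """Total bytes on the wire: one RSA block plus the symmetric tail."""
    _, sym_len = hybrid_layout(modulus_len, data_len)
    return modulus_len + sym_len
```

        With a 1024-bit key (L = 128) and a 128-byte g^x, this gives 70
        bytes of g^x inside the RSA block, 58 bytes outside it, and 186
        bytes in total.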
 187
 188    The relay payload for an EXTEND relay cell consists of:
 189          Address                       [4 bytes]
 190          Port                          [2 bytes]
 191          Onion skin                    [186 bytes]
 192          Identity fingerprint          [20 bytes]
 193
 194    The port and address fields denote the IPv4 address and port of the next
 195    onion router in the circuit; the identity fingerprint is the SHA1 hash of
 196    the PKCS#1 ASN.1 encoding of the next onion router's identity (signing) key.
 197 [XXX please describe why we have this hash. my first guess is that this
 198 way we can notice that we're already connected to this guy even if he's
 199 connected at a different place. anything else? -RD]
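
        A rough Python sketch of the EXTEND relay payload described above
        (4 + 2 + 186 + 20 = 212 bytes).  Helper names are assumptions made
        for illustration.

```python
import socket
import struct

ONION_SKIN_LEN = 186
FINGERPRINT_LEN = 20


def pack_extend_payload(ipv4, port, onion_skin, fingerprint):
    """Concatenate Address | Port | Onion skin | Identity fingerprint."""
    if len(onion_skin) != ONION_SKIN_LEN:
        raise ValueError("onion skin must be 186 bytes")
    if len(fingerprint) != FINGERPRINT_LEN:
        raise ValueError("fingerprint must be a 20-byte SHA1 hash")
    return (socket.inet_aton(ipv4) + struct.pack(">H", port)
            + onion_skin + fingerprint)


def unpack_extend_payload(payload):
    """Return (ipv4, port, onion_skin, fingerprint) from the first 212 bytes."""
    total = 6 + ONION_SKIN_LEN + FINGERPRINT_LEN
    if len(payload) < total:
        raise ValueError("truncated EXTEND payload")
    ipv4 = socket.inet_ntoa(payload[:4])
    (port,) = struct.unpack(">H", payload[4:6])
    onion_skin = payload[6:6 + ONION_SKIN_LEN]
    fingerprint = payload[6 + ONION_SKIN_LEN:total]
    return ipv4, port, onion_skin, fingerprint
```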
 200
 201    The payload for a CREATED cell, or the relay payload for an
 202    EXTENDED cell, contains:
 203          DH data (g^y)                 [128 bytes]
 204          Derivative key data (KH)      [20 bytes]   <see 4.2 below>
 205
 206    The CircID for a CREATE cell is an arbitrarily chosen 2-byte integer,
 207    selected by the node (OP or OR) that sends the CREATE cell.  To prevent
 208    CircID collisions, when one OR sends a CREATE cell to another, it chooses
 209    from only one half of the possible values based on the ORs' public
 210    identity keys: if the sending OR has a lower key, it chooses a CircID with
 211    an MSB of 0; otherwise, it chooses a CircID with an MSB of 1.
 212
 213    Public keys are compared numerically by modulus.
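
        One possible reading of this rule, sketched in Python.  The
        function and its arguments are hypothetical; the moduli are taken
        to be plain integers, and in_use is the set of CircIDs already
        active on the connection.

```python
import secrets


def choose_circ_id(my_modulus, peer_modulus, in_use):
    """Pick an unused 2-byte CircID from this OR's half of the space."""
    # The OR with the numerically lower key uses MSB 0, the other MSB 1.
    msb = 0x0000 if my_modulus < peer_modulus else 0x8000
    taken = sum(1 for c in in_use if (c & 0x8000) == msb)
    if taken >= 0x8000:
        raise RuntimeError("no free CircIDs in our half of the space")
    while True:
        circ_id = msb | secrets.randbelow(0x8000)
        if circ_id not in in_use:
            return circ_id
```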
 214
 215    (Older versions of Tor compared OR nicknames, and did it in a broken and
 216    unreliable way.  To support versions of Tor earlier than 0.0.9pre6,
 217    implementations should notice when the other side of a connection is
 218    sending CREATE cells with the "wrong" MSB, and switch accordingly.)
 219
 220 4.1.1. CREATE_FAST/CREATED_FAST cells
 221
 222    When initializing the first hop of a circuit, the OP has already
 223    established the OR's identity and negotiated a secret key using TLS.
 224    Because of this, it is not always necessary for the OP to perform the
 225    public key operations to create a circuit.  In this case, the
 226    OP SHOULD send a CREATE_FAST cell instead of a CREATE cell for the first
 227    hop only.  The OR responds with a CREATED_FAST cell, and the circuit is
 228    created.
 229
 230    A CREATE_FAST cell contains:
 231
 232        Key material (X)    [20 bytes]
 233
 234    A CREATED_FAST cell contains:
 235
 236        Key material (Y)    [20 bytes]
 237        Derivative key data [20 bytes]
 238
 239    [Versions of Tor before 0.1.0.6-rc did not support these cell types;
 240     clients should not send CREATE_FAST cells to older Tor servers.]
 241
 242 4.2. Setting circuit keys
 243
 244    Once the handshake between the OP and an OR is completed, both parties can
 245    now calculate g^xy with ordinary DH.  Before computing g^xy, both client
 246    and server MUST verify that the received g^x or g^y value is not degenerate;
 247    that is, it must be strictly greater than 1 and strictly less than p-1
 248    where p is the DH modulus.  Implementations MUST NOT complete a handshake
 249    with degenerate keys.  Implementations MAY discard other "weak" g^x values.
 250
 251    (Discarding degenerate keys is critical for security; if bad keys are not
 252    discarded, an attacker can substitute the server's CREATED cell's g^y with
 253    0 or 1, thus creating a known g^xy and impersonating the server.)
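
        The degeneracy test itself is a single range check.  A minimal
        sketch (the function name is an assumption):

```python
def dh_public_is_valid(value, p):
    """True if a received g^x or g^y lies strictly between 1 and p-1."""
    return 1 < value < p - 1
```

        A handshake would be abandoned whenever this returns False for
        the peer's value.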
 254
 255    (The mainline Tor implementation, in the 0.1.1.x-alpha series, also
 256    discarded all g^x values that are less than 2^24, that are greater than
 257    p-2^24, or that have more than 1024-16 identical bits.  This serves no
 258    useful purpose, and will probably stop soon.)
 259
 260    From the base key material g^xy, they compute derivative key material as
 261    follows.  First, the server represents g^xy as a big-endian unsigned
 262    integer.  Next, the server computes 100 bytes of key data as K = SHA1(g^xy
 263    | [00]) | SHA1(g^xy | [01]) | ... SHA1(g^xy | [04]) where [00] is a single
 264    octet whose value is zero, [01] is a single octet whose value is one, etc.
 265    The first 20 bytes of K form KH, bytes 21-40 form the forward digest Df,
 266    41-60 form the backward digest Db, 61-76 form Kf, and 77-92 form Kb.
 267
 268    KH is used in the handshake response to demonstrate knowledge of the
 269    computed shared key. Df is used to seed the integrity-checking hash
 270    for the stream of data going from the OP to the OR, and Db seeds the
 271    integrity-checking hash for the data stream from the OR to the OP. Kf
 272    is used to encrypt the stream of data going from the OP to the OR, and
 273    Kb is used to encrypt the stream of data going from the OR to the OP.
 274
 275    The fast-setup case uses the same formula, except that X|Y is used
 276    in place of g^xy in determining K.  That is,
 277       K = SHA1(X|Y | [00]) | SHA1(X|Y | [01]) | ... SHA1(X|Y | [04])
 278    The values KH, Kf, Kb, Df, and Db are established and used as before.
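
        The derivation can be sketched as follows.  This is not a
        reference implementation: the names are hypothetical, and encoding
        g^xy as exactly 128 bytes is an assumption made here, since the
        text only says "big-endian unsigned integer".

```python
import hashlib


def dh_secret_to_bytes(gxy, length=128):
    """Encode g^xy big-endian; the fixed 128-byte width is an assumption."""
    return gxy.to_bytes(length, "big")


def derive_keys(base):
    """Expand base key material (g^xy bytes, or X|Y for fast setup)."""
    # K = SHA1(base | [00]) | SHA1(base | [01]) | ... | SHA1(base | [04])
    k = b"".join(hashlib.sha1(base + bytes([i])).digest() for i in range(5))
    return {
        "KH": k[0:20],   # handshake proof
        "Df": k[20:40],  # forward digest seed
        "Db": k[40:60],  # backward digest seed
        "Kf": k[60:76],  # forward encryption key
        "Kb": k[76:92],  # backward encryption key
    }
```

        The last 8 of the 100 bytes are left unused under this layout.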
 279
 280 4.3. Creating circuits
 281
 282    When creating a circuit through the network, the circuit creator
 283    (OP) performs the following steps:
 284
 285       1. Choose an onion router as an exit node (R_N), such that the onion
 286          router's exit policy does not exclude all pending streams
 287          that need a circuit.
 288
 289       2. Choose a chain of (N-1) onion routers
 290          (R_1...R_N-1) to constitute the path, such that no router
 291          appears in the path twice.
 292
 293       3. If not already connected to the first router in the chain,
 294          open a new connection to that router.
 295
 296       4. Choose a circID not already in use on the connection with the
 297          first router in the chain; send a CREATE cell along the
 298          connection, to be received by the first onion router.
 299
 300       5. Wait until a CREATED cell is received; finish the handshake
 301          and extract the forward key Kf_1 and the backward key Kb_1.
 302
 303       6. For each subsequent onion router R (R_2 through R_N), extend
 304          the circuit to R.
 305
 306    To extend the circuit by a single onion router R_M, the OP performs
 307    these steps:
 308
 309       1. Create an onion skin, encrypted to R_M's public key.
 310
 311       2. Send the onion skin in a relay EXTEND cell along
 312          the circuit (see section 5).
 313
 314       3. When a relay EXTENDED cell is received, verify KH, and
 315          calculate the shared keys.  The circuit is now extended.
 316
 317    When an onion router receives an EXTEND relay cell, it sends a CREATE
 318    cell to the next onion router, with the enclosed onion skin as its
 319    payload.  The initiating onion router chooses some circID not yet
 320    used on the connection between the two onion routers.  (But see
 321    section 4.1 above, concerning choosing circIDs based on
 322    numeric comparison of the ORs' identity keys.)
 323
 324    When an onion router receives a CREATE cell, if it already has a
 325    circuit on the given connection with the given circID, it drops the
 326    cell.  Otherwise, after receiving the CREATE cell, it completes the
 327    DH handshake, and replies with a CREATED cell.  Upon receiving a
 328    CREATED cell, an onion router packs its payload into an EXTENDED relay
 329    cell (see section 5), and sends that cell up the circuit.  Upon
 330    receiving the EXTENDED relay cell, the OP can retrieve g^y.
 331
 332    (As an optimization, OR implementations may delay processing onions
 333    until a break in traffic allows time to do so without harming
 334    network latency too greatly.)
 335
 336 4.4. Tearing down circuits
 337
 338    Circuits are torn down when an unrecoverable error occurs along
 339    the circuit, or when all streams on a circuit are closed and the
 340    circuit's intended lifetime is over.  Circuits may be torn down
 341    either completely or hop-by-hop.
 342
 343    To tear down a circuit completely, an OR or OP sends a DESTROY
 344    cell to the adjacent nodes on that circuit, using the appropriate
 345    direction's circID.
 346
 347    Upon receiving an outgoing DESTROY cell, an OR frees resources
 348    associated with the corresponding circuit. If it's not the end of
 349    the circuit, it sends a DESTROY cell for that circuit to the next OR
 350    in the circuit. If the node is the end of the circuit, then it tears
 351    down any associated edge connections (see section 5.1).
 352
 353    After a DESTROY cell has been processed, an OR ignores all data or
 354    destroy cells for the corresponding circuit.
 355
 356    (The rest of this section is not currently used; on errors, circuits
 357    are destroyed, not truncated.)
 358
 359    To tear down part of a circuit, the OP may send a RELAY_TRUNCATE cell
 360    signaling a given OR (Stream ID zero).  That OR sends a DESTROY
 361    cell to the next node in the circuit, and replies to the OP with a
 362    RELAY_TRUNCATED cell.
 363
 364    When an unrecoverable error occurs along one connection in a
 365    circuit, the nodes on either side of the connection should, if they
 366    are able, act as follows:  the node closer to the OP should send a
 367    RELAY_TRUNCATED cell towards the OP; the node farther from the OP
 368    should send a DESTROY cell down the circuit.
 369
 370 4.5. Routing relay cells
 371
 372    When an OR receives a RELAY cell, it checks the cell's circID and
 373    determines whether it has a corresponding circuit along that
 374    connection.  If not, the OR drops the RELAY cell.
 375
 376    Otherwise, if the OR is not at the OP edge of the circuit (that is,
 377    either an 'exit node' or a non-edge node), it de/encrypts the payload
 378    with AES/CTR, as follows:
 379         'Forward' relay cell (same direction as CREATE):
 380             Use Kf as key; decrypt.
 381         'Back' relay cell (opposite direction from CREATE):
 382             Use Kb as key; encrypt.
 383    Note that in counter mode, decrypt and encrypt are the same operation.
 384
 385    The OR then decides whether it recognizes the relay cell, by
 386    inspecting the payload as described in section 5.1 below.  If the OR
 387    recognizes the cell, it processes the contents of the relay cell.
 388    Otherwise, it passes the decrypted relay cell along the circuit if
 389    the circuit continues.  If the OR at the end of the circuit
 390    encounters an unrecognized relay cell, an error has occurred: the OR
 391    sends a DESTROY cell to tear down the circuit.
 392
 393    When a relay cell arrives at an OP, the OP decrypts the payload
 394    with AES/CTR as follows:
 395          OP receives data cell:
 396             For I=N...1,
 397                 Decrypt with Kb_I.  If the payload is recognized (see
 398                 section 5.1), then stop and process the payload.
 399
 400    For more information, see section 5 below.
 401
 402 5. Application connections and stream management
 403
 404 5.1. Relay cells
 405
 406    Within a circuit, the OP and the exit node use the contents of
 407    RELAY cells to tunnel end-to-end commands and TCP connections
 408    ("Streams") across circuits.  End-to-end commands can be initiated
 409    by either edge; streams are initiated by the OP.
 410
 411    The payload of each unencrypted RELAY cell consists of:
 412          Relay command           [1 byte]
 413          'Recognized'            [2 bytes]
 414          StreamID                [2 bytes]
 415          Digest                  [4 bytes]
 416          Length                  [2 bytes]
 417          Data                    [498 bytes]
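
        For illustration, the 509-byte relay payload can be packed and
        unpacked with the struct module.  The helper names are assumptions;
        the digest field is left at zero here and filled in separately.

```python
import struct

RELAY_DATA_LEN = 498
# Command, 'Recognized', StreamID, Digest, Length, Data (big-endian).
RELAY_FORMAT = ">BHH4sH498s"


def pack_relay_payload(command, stream_id, data=b"", digest=bytes(4)):
    """Build an unencrypted relay payload; 'recognized' is always zero."""
    if len(data) > RELAY_DATA_LEN:
        raise ValueError("relay data does not fit in one cell")
    return struct.pack(RELAY_FORMAT, command, 0, stream_id,
                       digest, len(data), data)


def unpack_relay_payload(payload):
    """Return (command, recognized, stream_id, digest, data)."""
    command, recognized, stream_id, digest, length, data = struct.unpack(
        RELAY_FORMAT, payload)
    if length > RELAY_DATA_LEN:
        raise ValueError("length field exceeds the data area")
    return command, recognized, stream_id, digest, data[:length]
```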
 418
 419    The relay commands are:
 420          1 -- RELAY_BEGIN     [forward]
 421          2 -- RELAY_DATA      [forward or backward]
 422          3 -- RELAY_END       [forward or backward]
 423          4 -- RELAY_CONNECTED [backward]
 424          5 -- RELAY_SENDME    [forward or backward]
 425          6 -- RELAY_EXTEND    [forward]
 426          7 -- RELAY_EXTENDED  [backward]
 427          8 -- RELAY_TRUNCATE  [forward]
 428          9 -- RELAY_TRUNCATED [backward]
 429         10 -- RELAY_DROP      [forward or backward]
 430         11 -- RELAY_RESOLVE   [forward]
 431         12 -- RELAY_RESOLVED  [backward]
 432
 433    Commands labelled as "forward" must only be sent by the originator
 434    of the circuit. Commands labelled as "backward" must only be sent by
 435    other nodes in the circuit back to the originator. Commands marked
 436    as either can be sent either by the originator or other nodes.
 437
 438    The 'recognized' field in any unencrypted relay payload is always set
 439    to zero; the 'digest' field is computed as the first four bytes of
 440    the running SHA-1 digest of all the bytes that have been destined for
 441    this hop of the circuit or originated from this hop of the circuit,
 442    seeded from Df or Db respectively (obtained in section 4.2 above),
 443    and including this RELAY cell's entire payload (taken with the digest
 444    field set to zero).
 445
 446    When the 'recognized' field of a RELAY cell is zero, and the digest
 447    is correct, the cell is considered "recognized" for the purposes of
 448    decryption (see section 4.5 above).
 449
 450    (The digest does not include any bytes from relay cells that do
 451    not start or end at this hop of the circuit. That is, it does not
 452    include forwarded data. Therefore if 'recognized' is zero but the
 453    digest does not match, the running digest at that node should
 454    not be updated, and the cell should be forwarded on.)
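
        A sketch of the running digest, under the assumption that
        "seeded" means the SHA-1 state is initialized by hashing Df (or Db)
        first.  The class and method names are hypothetical.  The receiver
        works on a copy of the state so that a cell that is not recognized
        leaves the running digest untouched.

```python
import hashlib

DIGEST_START = 5  # after command (1) + recognized (2) + stream ID (2)
DIGEST_END = 9    # the digest field is 4 bytes long


class RunningDigest:
    def __init__(self, seed):
        # Assumption: the seed (Df or Db) is simply hashed in first.
        self._sha = hashlib.sha1(seed)

    @staticmethod
    def _zeroed(payload):
        return payload[:DIGEST_START] + bytes(4) + payload[DIGEST_END:]

    def seal(self, payload):
        """Sender side: return payload with its digest field filled in."""
        self._sha.update(self._zeroed(payload))
        tag = self._sha.digest()[:4]
        return payload[:DIGEST_START] + tag + payload[DIGEST_END:]

    def check(self, payload):
        """Receiver side: True if recognized; state advances only then."""
        if payload[1:3] != b"\x00\x00":
            return False
        trial = self._sha.copy()
        trial.update(self._zeroed(payload))
        if trial.digest()[:4] != payload[DIGEST_START:DIGEST_END]:
            return False  # not for this hop: leave the digest alone
        self._sha = trial
        return True
```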
 455
 456    All RELAY cells pertaining to the same tunneled stream have the
 457    same stream ID.  StreamIDs are chosen arbitrarily by the OP.  RELAY
 458    cells that affect the entire circuit rather than a particular
 459    stream use a StreamID of zero.
 460
 461    The 'Length' field of a relay cell contains the number of bytes in
 462    the relay payload which contain real payload data. The remainder of
 463    the payload is padded with NUL bytes.
 464
 465    If the RELAY cell is recognized but the relay command is not
 466    understood, the cell must be dropped and ignored. Its contents
 467    still count with respect to the digests, though. [Up until
 468    0.1.1.10, Tor closed circuits when it received an unknown relay
 469    command. Perhaps this will be more forward-compatible. -RD]
 470
 471 5.2. Opening streams and transferring data
 472
 473    To open a new anonymized TCP connection, the OP chooses an open
 474    circuit to an exit that may be able to connect to the destination
 475    address, selects an arbitrary StreamID not yet used on that circuit,
 476    and constructs a RELAY_BEGIN cell with a payload encoding the address
 477    and port of the destination host.  The payload format is:
 478
 479          ADDRESS | ':' | PORT | [00]
 480
 481    where  ADDRESS can be a DNS hostname, or an IPv4 address in
 482    dotted-quad format, or an IPv6 address surrounded by square brackets;
 483    and where PORT is encoded in decimal.
 484
 485    [What is the [00] for? -NM]
 486    [It's so the payload is easy to parse out with string funcs -RD]
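
        As a sketch, the RELAY_BEGIN payload can be built and parsed like
        this.  The function names are illustrative assumptions, as is the
        choice to add brackets automatically around a bare IPv6 literal.

```python
def begin_payload(address, port):
    """Encode ADDRESS | ':' | PORT | [00]."""
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    if ":" in address and not address.startswith("["):
        address = "[" + address + "]"  # bare IPv6 literal
    return f"{address}:{port}".encode("ascii") + b"\x00"


def parse_begin_payload(payload):
    """Return (address, port); the address keeps any IPv6 brackets."""
    text, nul, _ = payload.partition(b"\x00")
    if not nul:
        raise ValueError("missing NUL terminator")
    address, colon, port = text.decode("ascii").rpartition(":")
    if not colon or not address or not port.isdigit():
        raise ValueError("malformed ADDRESS:PORT")
    return address, int(port)
```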
 487
 488    Upon receiving this cell, the exit node resolves the address as
 489    necessary, and opens a new TCP connection to the target port.  If the
 490    address cannot be resolved, or a connection can't be established, the
 491    exit node replies with a RELAY_END cell.  (See 5.3 below.)
 492    Otherwise, the exit node replies with a RELAY_CONNECTED cell, whose
 493    payload is in one of the following formats:
 494        The IPv4 address to which the connection was made [4 octets]
 495        A number of seconds (TTL) for which the address may be cached [4 octets]
 496     or
 497        Four zero-valued octets [4 octets]
 498        An address type (6)     [1 octet]
 499        The IPv6 address to which the connection was made [16 octets]
 500        A number of seconds (TTL) for which the address may be cached [4 octets]
 501    [XXXX Versions of Tor before 0.1.1.6 ignore and do not generate the TTL
 502    field.  No version of Tor currently generates the IPv6 format.]
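
        A hedged sketch of parsing the two RELAY_CONNECTED formats.  The
        function name is hypothetical, and the input is assumed to be the
        relay data already trimmed to its Length field.  A 4-byte payload
        without TTL is accepted to cover the older versions noted above.

```python
import ipaddress
import struct


def parse_connected(data):
    """Return (address, ttl); ttl is None when the field is absent."""
    # IPv6 form: four zero octets, type 6, 16-byte address, 4-byte TTL.
    if len(data) >= 25 and data[:4] == bytes(4) and data[4] == 6:
        address = ipaddress.IPv6Address(data[5:21])
        (ttl,) = struct.unpack(">I", data[21:25])
        return address, ttl
    # IPv4 form: 4-byte address, 4-byte TTL.
    if len(data) >= 8:
        (ttl,) = struct.unpack(">I", data[4:8])
        return ipaddress.IPv4Address(data[:4]), ttl
    if len(data) >= 4:
        return ipaddress.IPv4Address(data[:4]), None
    raise ValueError("RELAY_CONNECTED payload too short")
```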
 503
 504    The OP waits for a RELAY_CONNECTED cell before sending any data.
 505    Once a connection has been established, the OP and exit node
 506    package stream data in RELAY_DATA cells, and upon receiving such
 507    cells, echo their contents to the corresponding TCP stream.
 508    RELAY_DATA cells sent to unrecognized streams are dropped.
 509
 510    RELAY_DROP cells are long-range dummies; upon receiving such
 511    a cell, the OR or OP must drop it.
 512
 513 5.3. Closing streams
 514
 515    When an anonymized TCP connection is closed, or an edge node
 516    encounters an error on any stream, it sends a 'RELAY_END' cell along the
 517    circuit (if possible) and closes the TCP connection immediately.  If
 518    an edge node receives a 'RELAY_END' cell for any stream, it closes
 519    the TCP connection completely, and sends nothing more along the
 520    circuit for that stream.
 521
 522    The payload of a RELAY_END cell begins with a single 'reason' byte to
 523    describe why the stream is closing, plus optional data (depending on
 524    the reason).  The values are:
 525
 526        1 -- REASON_MISC           (catch-all for unlisted reasons)
 527        2 -- REASON_RESOLVEFAILED  (couldn't look up hostname)
 528        3 -- REASON_CONNECTREFUSED (remote host refused connection) [*]
 529        4 -- REASON_EXITPOLICY     (OR refuses to connect to host or port)
 530        5 -- REASON_DESTROY        (Circuit is being destroyed)
 531        6 -- REASON_DONE           (Anonymized TCP connection was closed)
 532        7 -- REASON_TIMEOUT        (Connection timed out, or OR timed out
 533                                    while connecting)
 534        8 -- (unallocated) [**]
 535        9 -- REASON_HIBERNATING    (OR is temporarily hibernating)
 536       10 -- REASON_INTERNAL       (Internal error at the OR)
 537       11 -- REASON_RESOURCELIMIT  (OR has no resources to fulfill request)
 538       12 -- REASON_CONNRESET      (Connection was unexpectedly reset)
 539       13 -- REASON_TORPROTOCOL    (Sent when closing connection because of
 540                                    Tor protocol violations.)
 541
 542    (With REASON_EXITPOLICY, the 4-byte IPv4 address or 16-byte IPv6 address
 543    forms the optional data; no other reason currently has extra data.
 544    As of 0.1.1.6, the body also contains a 4-byte TTL.)
 545
 546    OPs and ORs MUST accept reasons not on the above list, since future
 547    versions of Tor may provide more fine-grained reasons.
 548
 549    [*] Older versions of Tor also send this reason when connections are
 550        reset.
 551    [**] Due to a bug in versions of Tor through 0095, error reason 8 must
 552         remain allocated until that version is obsolete.
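
        One way a receiver might decode a RELAY_END payload while still
        accepting unknown reasons.  The function name is an assumption, and
        telling the four REASON_EXITPOLICY layouts apart by length alone
        (4, 8, 16, or 20 bytes of extra data) is a guess at a workable
        rule, not something this document states.

```python
import ipaddress

REASON_EXITPOLICY = 4


def parse_end(data):
    """Return (reason, address or None, ttl or None)."""
    if not data:
        raise ValueError("RELAY_END payload has no reason byte")
    reason, extra = data[0], data[1:]
    address = ttl = None
    if reason == REASON_EXITPOLICY:
        # Address (4 or 16 bytes), optionally followed by a 4-byte TTL.
        addr_len = {4: 4, 8: 4, 16: 16, 20: 16}.get(len(extra))
        if addr_len is not None:
            address = ipaddress.ip_address(extra[:addr_len])
            if len(extra) > addr_len:
                ttl = int.from_bytes(extra[addr_len:], "big")
    # Unknown reason values are returned as-is rather than rejected.
    return reason, address, ttl
```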
 553
 554    --- [The rest of this section describes unimplemented functionality.]
 555
 556    Because TCP connections can be half-open, we follow an equivalent
 557    to TCP's FIN/FIN-ACK/ACK protocol to close streams.
 558
 559    An exit connection can have a TCP stream in one of three states:
 560    'OPEN', 'DONE_PACKAGING', and 'DONE_DELIVERING'.  For the purposes
 561    of modeling transitions, we treat 'CLOSED' as a fourth state,
 562    although connections in this state are not, in fact, tracked by the
 563    onion router.
 564
 565    A stream begins in the 'OPEN' state.  Upon receiving a 'FIN' from
 566    the corresponding TCP connection, the edge node sends a 'RELAY_FIN'
 567    cell along the circuit and changes its state to 'DONE_PACKAGING'.
 568    Upon receiving a 'RELAY_FIN' cell, an edge node sends a 'FIN' to
 569    the corresponding TCP connection (e.g., by calling
 570    shutdown(SHUT_WR)) and changes its state to 'DONE_DELIVERING'.
 571
 572    When a stream already in 'DONE_DELIVERING' receives a 'FIN', it
 573    also sends a 'RELAY_FIN' along the circuit, and changes its state
 574    to 'CLOSED'.  When a stream already in 'DONE_PACKAGING' receives a
 575    'RELAY_FIN' cell, it sends a 'FIN' and changes its state to
 576    'CLOSED'.
 577
 578    If an edge node encounters an error on any stream, it sends a
 579    'RELAY_END' cell (if possible) and closes the stream immediately.
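
        The (unimplemented) half-close transitions above can be written
        as a small table.  The event names "tcp_fin" and "relay_fin" and
        the action strings are labels invented for this sketch.

```python
# (current state, event) -> (next state, action to take)
TRANSITIONS = {
    ("OPEN", "tcp_fin"): ("DONE_PACKAGING", "send RELAY_FIN"),
    ("OPEN", "relay_fin"): ("DONE_DELIVERING", "send FIN"),
    ("DONE_DELIVERING", "tcp_fin"): ("CLOSED", "send RELAY_FIN"),
    ("DONE_PACKAGING", "relay_fin"): ("CLOSED", "send FIN"),
}


def step(state, event):
    """Return (next_state, action) for one event on a stream."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None
```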
 580
 581 5.4. Remote hostname lookup
 582
 583    To find the address associated with a hostname, the OP sends a
 584    RELAY_RESOLVE cell containing the hostname to be resolved.  (For a reverse
 585    lookup, the OP sends a RELAY_RESOLVE cell containing an in-addr.arpa
 586    address.)  The OR replies with a RELAY_RESOLVED cell containing a status
 587    byte, and any number of answers.  Each answer is of the form:
 588        Type   (1 octet)
 589        Length (1 octet)
 590        Value  (variable-width)
 591        TTL    (4 octets)
 592    "Length" is the length of the Value field.
 593    "Type" is one of:
 594       0x00 -- Hostname
 595       0x04 -- IPv4 address
 596       0x06 -- IPv6 address
 597       0xF0 -- Error, transient
 598       0xF1 -- Error, nontransient
 599
 600    If any answer has a type of 'Error', then no other answer may be given.
 601
 602    The RELAY_RESOLVE cell must use a nonzero, distinct streamID; the
 603    corresponding RELAY_RESOLVED cell must use the same streamID.  No stream
 604    is actually created by the OR when resolving the name.
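
        A sketch of decoding the answer list.  The function name is
        hypothetical; it assumes the caller passes only the bytes that
        make up the answers, already trimmed to the relay Length field.

```python
ERROR_TYPES = (0xF0, 0xF1)


def parse_resolved_answers(body):
    """Return a list of (type, value, ttl) tuples."""
    answers = []
    pos = 0
    while pos < len(body):
        if pos + 2 > len(body):
            raise ValueError("truncated answer header")
        answer_type, length = body[pos], body[pos + 1]
        end = pos + 2 + length + 4  # header + Value + 4-byte TTL
        if end > len(body):
            raise ValueError("truncated answer")
        value = body[pos + 2:pos + 2 + length]
        ttl = int.from_bytes(body[end - 4:end], "big")
        answers.append((answer_type, value, ttl))
        pos = end
    if len(answers) > 1 and any(a[0] in ERROR_TYPES for a in answers):
        raise ValueError("an error answer must be the only answer")
    return answers
```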
 605
 606 6. Flow control
 607
 608 6.1. Link throttling
 609
 610    Each node should do appropriate bandwidth throttling to keep its
 611    user happy.
 612
 613    Communicants rely on TCP's default flow control to push back when they
 614    stop reading.
 615
 616 6.2. Link padding
 617
 618    Currently nodes are not required to do any sort of link padding or
 619    dummy traffic. Because strong attacks exist even with link padding,
 620    and because link padding greatly increases the bandwidth requirements
 621    for running a node, we plan to leave out link padding until this
 622    tradeoff is better understood.
 623
 624 6.3. Circuit-level flow control
 625
 626    To control a circuit's bandwidth usage, each OR keeps track of
 627    two 'windows', consisting of how many RELAY_DATA cells it is
 628    allowed to package for transmission, and how many RELAY_DATA cells
 629    it is willing to deliver to streams outside the network.
 630    Each 'window' value is initially set to 1000 data cells
 631    in each direction (cells that are not data cells do not affect
 632    the window).  When an OR is willing to deliver more cells, it sends a
 633    RELAY_SENDME cell towards the OP, with Stream ID zero.  When an OR
 634    receives a RELAY_SENDME cell with stream ID zero, it increments its
 635    packaging window.
 636
 637    Each of these cells increments the corresponding window by 100.
 638
 639    The OP behaves identically, except that it must track a packaging
 640    window and a delivery window for every OR in the circuit.
 641
 642    An OR or OP sends cells to increment its delivery window when the
 643    corresponding window value falls under some threshold (900).
 644
 645    If a packaging window reaches 0, the OR or OP stops reading from
 646    TCP connections for all streams on the corresponding circuit, and
 647    sends no more RELAY_DATA cells until receiving a RELAY_SENDME cell.
 648 [this stuff is badly worded; copy in the tor-design section -RD]
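
        The window bookkeeping described in this section can be sketched
        as follows.  This is a simplified model for one direction of one
        circuit, not the actual implementation; the class and method names
        are hypothetical, and reading "falls under" as a strict comparison
        is an assumption.

```python
CIRCUIT_WINDOW_START = 1000     # initial value of each window, in data cells
CIRCUIT_WINDOW_INCREMENT = 100  # effect of one circuit-level RELAY_SENDME
SENDME_THRESHOLD = 900          # send a SENDME when delivery window is below this


class CircuitWindows:
    """Packaging and delivery windows for one circuit (hypothetical model)."""

    def __init__(self):
        self.package_window = CIRCUIT_WINDOW_START
        self.deliver_window = CIRCUIT_WINDOW_START

    def can_package(self):
        # At zero, the node stops reading from the circuit's streams.
        return self.package_window > 0

    def on_data_packaged(self):
        """Call when a RELAY_DATA cell is packaged for transmission."""
        if not self.can_package():
            raise RuntimeError("packaging window is exhausted")
        self.package_window -= 1

    def on_sendme_received(self):
        """Call when a RELAY_SENDME with stream ID zero arrives."""
        self.package_window += CIRCUIT_WINDOW_INCREMENT

    def on_data_delivered(self):
        """Call when a RELAY_DATA cell is delivered to a stream.

        Returns True if a circuit-level RELAY_SENDME should be sent now.
        Assumption: "falls under" is treated as strictly less than.
        """
        self.deliver_window -= 1
        if self.deliver_window < SENDME_THRESHOLD:
            self.deliver_window += CIRCUIT_WINDOW_INCREMENT
            return True
        return False
```

        Under these assumptions the first RELAY_SENDME goes out when the
        101st data cell is delivered.  Cells other than RELAY_DATA never
        touch either counter.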
 649
 650 6.4. Stream-level flow control
 651
 652    Edge nodes use RELAY_SENDME cells to implement end-to-end flow
 653    control for individual connections across circuits. Similarly to
 654    circuit-level flow control, edge nodes begin with a window of cells
 655    (500) per stream, and increment the window by a fixed value (50)
 656    upon receiving a RELAY_SENDME cell. Edge nodes initiate RELAY_SENDME
 657    cells when both a) the window is <= 450, and b) there are fewer than
 658    ten cell payloads remaining to be flushed at that edge.
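
        The two-part condition above can be written as a single predicate.
        The function and parameter names below are hypothetical; only the
        numbers (500, 50, and ten) come from the text.

```python
STREAM_WINDOW_START = 500      # initial per-stream window, in cells
STREAM_WINDOW_INCREMENT = 50   # effect of one stream-level RELAY_SENDME


def should_send_stream_sendme(window, pending_payloads):
    """Return True if an edge node should send a stream-level RELAY_SENDME.

    `window` is the stream's current window; `pending_payloads` is the
    number of cell payloads still waiting to be flushed at this edge.
    """
    low_window = window <= STREAM_WINDOW_START - STREAM_WINDOW_INCREMENT
    nearly_flushed = pending_payloads < 10
    return low_window and nearly_flushed
```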
 659
 660 7. Directories and routers
 661
 662 7.1. Extensible information format
 663
 664 Router descriptors and directories both obey the following lightweight
 665 extensible information format.
 666
 667 The highest-level object is a Document, which consists of one or more Items.
 668 Every Item begins with a KeywordLine, followed by zero or more Objects. A
 669 KeywordLine begins with a Keyword, optionally followed by a space and more
 670 non-newline characters, and ends with a newline.  A Keyword is a sequence of
 671 one or more characters in the set [A-Za-z0-9-].  An Object is a block of
 672 encoded data in pseudo-Open-PGP-style armor. (cf. RFC 2440)
 673
 674 More formally:
 675
 676     Document ::= (Item | NL)+
 677     Item ::= KeywordLine Object*
 678     KeywordLine ::= Keyword NL | Keyword SP ArgumentChar+ NL
 679     Keyword ::= KeywordChar+
 680     KeywordChar ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-'
 681     ArgumentChar ::= any printing ASCII character except NL.
 682     Object ::= BeginLine Base-64-encoded-data EndLine
 683     BeginLine ::= "-----BEGIN " Keyword "-----" NL
 684     EndLine ::= "-----END " Keyword "-----" NL
 685
 686     The BeginLine and EndLine of an Object must use the same keyword.
 687
 688 When interpreting a Document, software MUST reject any document containing a
 689 KeywordLine that starts with a keyword it doesn't recognize.
 690
 691 The "opt" keyword is reserved for non-critical future extensions.  All
 692 implementations MUST ignore any item of the form "opt keyword ....." when
 693 they would not recognize "keyword ....."; and MUST treat "opt keyword ....."
 694 as synonymous with "keyword ....." when keyword is recognized.
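
     The grammar and the "opt" rule can be illustrated with a small Python
     sketch.  It is a simplified reading of the format, not the parser used
     by Tor: the function names are hypothetical, object labels are only
     checked for a matching BEGIN/END pair, and the base-64 body is kept as
     text rather than decoded.

```python
import re

KEYWORD_RE = re.compile(r"[A-Za-z0-9-]+")
BEGIN_RE = re.compile(r"-----BEGIN (.+)-----")
END_RE = re.compile(r"-----END (.+)-----")


def parse_document(text):
    """Return a list of (keyword, arguments, objects) items.

    Each object is a (label, base64_text) pair attached to the item
    whose KeywordLine precedes it.
    """
    items = []
    lines = text.split("\n")
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line == "":
            continue  # a bare NL is allowed between items
        begin = BEGIN_RE.fullmatch(line)
        if begin:
            if not items:
                raise ValueError("object appears before any keyword line")
            label, body = begin.group(1), []
            while True:
                if i >= len(lines):
                    raise ValueError("unterminated object")
                end = END_RE.fullmatch(lines[i])
                i += 1
                if end:
                    # BeginLine and EndLine must use the same keyword.
                    if end.group(1) != label:
                        raise ValueError("BEGIN/END labels differ")
                    break
                body.append(lines[i - 1])
            items[-1][2].append((label, "".join(body)))
            continue
        keyword, _, arguments = line.partition(" ")
        if not KEYWORD_RE.fullmatch(keyword):
            raise ValueError("bad keyword: %r" % keyword)
        items.append((keyword, arguments, []))
    return items


def apply_opt_rule(items, known):
    """Apply the "opt" rule given the set of recognized keywords.

    "opt keyword ..." is dropped when keyword is unknown and treated as
    "keyword ..." when it is known; any other unknown keyword is fatal.
    """
    result = []
    for keyword, arguments, objects in items:
        if keyword == "opt":
            inner, _, rest = arguments.partition(" ")
            if inner not in known:
                continue
            keyword, arguments = inner, rest
        elif keyword not in known:
            raise ValueError("unrecognized keyword: %r" % keyword)
        result.append((keyword, arguments, objects))
    return result
```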
 695
 696 7.2. Router descriptor format.
 697
 698 Every router descriptor MUST start with a "router" Item; MUST end with a
 699 "router-signature" Item and an extra NL; and MUST contain exactly one
 700 instance of each of the following Items: "published" "onion-key" "link-key"
 701 "signing-key" "bandwidth".  Additionally, a router descriptor MAY contain any
 702 number of "accept", "reject", "fingerprint", "uptime", and "opt" Items.
 703 Other than "router" and "router-signature", the items may appear in any
 704 order.
 705
 706 The items' formats are as follows:
 707    "router" nickname address (ORPort SocksPort DirPort)?
 708
 709       Indicates the beginning of a router descriptor.  "address" must be an
 710       IPv4 address in dotted-quad format.  The Port values will soon be
 711       deprecated; using them here is equivalent to using them in a "ports"
 712       item.
 713
 714    "ports" ORPort SocksPort DirPort
 715
 716       Indicates the TCP ports at which this OR exposes functionality.
 717       ORPort is a port at which this OR accepts TLS connections for the main
 718       OR protocol;  SocksPort is the port at which this OR accepts SOCKS
 719       connections; and DirPort is the port at which this OR accepts
 720       directory-related HTTP connections.  If any port is not supported, the
 721       value 0 is given instead of a port number.
 722
 723    "bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed
 724
 725       Estimated bandwidth for this router, in bytes per second.  The
 726       "average" bandwidth is the volume per second that the OR is willing
 727       to sustain over long periods; the "burst" bandwidth is the volume
 728       that the OR is willing to sustain in very short intervals.  The
 729       "observed" value is an estimate of the capacity this server can
 730       handle.  The server remembers the maximum sustained output bandwidth
 731       over any ten-second period in the past day, and likewise the maximum
 732       sustained input.  The "observed" value is the lesser of these two numbers.
 733
 734    "platform" string
 735
 736       A human-readable string describing the system on which this OR is
 737       running.  This MAY include the operating system, and SHOULD include
 738       the name and version of the software implementing the Tor protocol.
 739
 740    "published" YYYY-MM-DD HH:MM:SS
 741
 742       The time, in GMT, when this descriptor was generated.
 743
 744    "fingerprint"
 745
 746       A fingerprint (20-byte SHA1 hash of the ASN.1-encoded public key,
 747       encoded in hex, with spaces after every 4 characters) for this router's
 748       identity key.
 749
 750       [We didn't start parsing this line until Tor 0.1.0.6-rc; it should
 751        be marked with "opt" until earlier versions of Tor are obsolete.]
 752
 753    "hibernating" 0|1
 754
 755       If the value is 1, then the Tor server was hibernating when the
 756       descriptor was published, and shouldn't be used to build circuits.
 757
 758       [We didn't start parsing this line until Tor 0.1.0.6-rc; it should
 759        be marked with "opt" until earlier versions of Tor are obsolete.]
 760
 761    "uptime"
 762
 763       The number of seconds that this OR process has been running.
 764
 765    "onion-key" NL a public key in PEM format
 766
 767       This key is used to encrypt EXTEND cells for this OR.  The key MUST
 768       be accepted for at least XXXX hours after any new key is published in
 769       a subsequent descriptor.
 770
 771    "signing-key" NL a public key in PEM format
 772
 773       The OR's long-term identity key.
 774
 775    "accept" exitpattern
 776    "reject" exitpattern
 777
 778        These lines, in order, describe the rules that an OR follows when
 779        deciding whether to allow a new stream to a given address.  The
 780        'exitpattern' syntax is described below.
 781
 782    "router-signature" NL Signature NL
 783
 784        The "SIGNATURE" object contains a signature of the PKCS1-padded SHA1
 785        hash of the entire router descriptor, taken from the beginning of the
 786        "router" line, through the newline after the "router-signature" line.
 787        The router descriptor is invalid unless the signature is performed
 788        with the router's identity key.
 789
 790    "contact" info NL
 791
 792        Describes a way to contact the server's administrator, preferably
 793        including an email address and a PGP key fingerprint.
 794
 795    "family" names NL
 796
 797        'Names' is a space-separated list of server nicknames. If two ORs
 798        list one another in their "family" entries, then OPs should treat
 799        them as a single OR for the purpose of path selection.
 800
 801        For example, if node A's descriptor contains "family B", and node B's
 802        descriptor contains "family A", then node A and node B should never
 803        be used on the same circuit.
 804
 805    "read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
 806    "write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
 807
 808        Declare how much bandwidth the OR has used recently. Usage is divided
 809        into intervals of NSEC seconds.  The YYYY-MM-DD HH:MM:SS field defines
 810        the end of the most recent interval.  The numbers are the number of
 811        bytes used in the most recent intervals, ordered from oldest to newest.
 812
 813        [We didn't start parsing these lines until Tor 0.1.0.6-rc; they should
 814         be marked with "opt" until earlier versions of Tor are obsolete.]
 815
 816 nickname ::= between 1 and 19 alphanumeric characters, case-insensitive.
 817
 818 exitpattern ::= addrspec ":" portspec
 819 portspec ::= "*" | port | port "-" port
 820 port ::= an integer between 1 and 65535, inclusive.
 821 addrspec ::= "*" | ip4spec | ip6spec
 822 ip4spec ::= ip4 | ip4 "/" num_ip4_bits | ip4 "/" ip4mask
 823 ip4 ::= an IPv4 address in dotted-quad format
 824 ip4mask ::= an IPv4 mask in dotted-quad format
 825 num_ip4_bits ::= an integer between 0 and 32
 826 ip6spec ::= ip6 | ip6 "/" num_ip6_bits
 827 ip6 ::= an IPv6 address, surrounded by square brackets.
 828 num_ip6_bits ::= an integer between 0 and 128
 829
 830 Ports are required; if they are not included in the router
 831 line, they must appear in the "ports" lines.
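
     As an illustration of the exitpattern grammar, the following Python
     sketch parses a pattern and tests an address and port against an
     ordered list of "accept" and "reject" lines.  The function names are
     hypothetical.  Treating the first matching line as decisive is an
     assumption drawn from the words "in order" above, and the sketch
     returns None rather than guess a default when no line matches.

```python
import ipaddress


def parse_exitpattern(pattern):
    """Parse addrspec ":" portspec into (network_or_None, low, high)."""
    # Split on the last ':' so that bracketed IPv6 addresses survive.
    addrspec, sep, portspec = pattern.rpartition(":")
    if not sep or not addrspec or not portspec:
        raise ValueError("expected addrspec:portspec")
    if addrspec == "*":
        network = None  # matches every address
    elif addrspec.startswith("["):
        # ip6spec: "[" ip6 "]", optionally followed by "/" num_ip6_bits
        addr, bracket, rest = addrspec[1:].partition("]")
        if not bracket or (rest and not rest.startswith("/")):
            raise ValueError("malformed ip6spec")
        bits = rest[1:] if rest else "128"
        network = ipaddress.IPv6Network(addr + "/" + bits, strict=False)
    else:
        # ip4spec: ip4, ip4 "/" num_ip4_bits, or ip4 "/" ip4mask
        network = ipaddress.IPv4Network(addrspec, strict=False)
    if portspec == "*":
        low, high = 1, 65535
    else:
        first, dash, last = portspec.partition("-")
        low = int(first)
        high = int(last) if dash else low
    if not 1 <= low <= high <= 65535:
        raise ValueError("port out of range")
    return network, low, high


def pattern_matches(parsed, address, port):
    network, low, high = parsed
    if not low <= port <= high:
        return False
    if network is None:
        return True
    ip = ipaddress.ip_address(address)
    return ip.version == network.version and ip in network


def exit_decision(policy, address, port):
    """`policy` is a list of (action, exitpattern) pairs in descriptor
    order.  Returns the action of the first matching line, or None.
    """
    for action, pattern in policy:
        if pattern_matches(parse_exitpattern(pattern), address, port):
            return action
    return None
```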
 832
 833 7.3. Directory format
 834
 835 A Directory begins with a "signed-directory" item, followed by one each of
 836 the following, in any order: "recommended-software", "published",
 837 "router-status", "dir-signing-key".  It may include any number of "opt"
 838 items.  After these items, a directory includes any number of router
 839 descriptors, and a single "directory-signature" item.
 840
 841     "signed-directory"
 842
 843         Indicates the start of a directory.
 844
 845     "published" YYYY-MM-DD HH:MM:SS
 846
 847         The time at which this directory was generated and signed, in GMT.
 848
 849     "dir-signing-key"
 850
 851         The key used to sign this directory; see "signing-key" for format.
 852
 853     "recommended-software"  comma-separated-version-list
 854
 855         A list of which versions of which implementations are currently
 856         believed to be secure and compatible with the network.
 857
 858     "running-routers" space-separated-list
 859
 860         A description of which routers are currently believed to be up or
 861         down.  Every entry consists of an optional "!", followed by either an
 862         OR's nickname, or "$" followed by a hexadecimal encoding of the hash
 863         of an OR's identity key.  If the "!" is included, the router is
 864         believed not to be running; otherwise, it is believed to be running.
 865         If a router's nickname is given, exactly one router of that nickname
 866         will appear in the directory, and that router is "approved" by the
 867         directory server.  If a hashed identity key is given, that OR is not
 868         "approved".  [XXXX The 'running-routers' line is only provided for
 869         backward compatibility.  New code should parse 'router-status'
 870         instead.]
 871
 872     "router-status" space-separated-list
 873
 874         A description of which routers are currently believed to be up or
 875         down, and which are verified or unverified.  Contains one entry for
 876         every router that the directory server knows.  Each entry is of the
 877         format:
 878
 879               !name=$digest  [Verified router, currently not live.]
 880               name=$digest   [Verified router, currently live.]
 881               !$digest       [Unverified router, currently not live.]
 882           or  $digest        [Unverified router, currently live.]
 883
 884         (where 'name' is the router's nickname and 'digest' is a hexadecimal
 885         encoding of the hash of the router's identity key).
 886
 887         When parsing this line, clients should only mark a router as
 888         'verified' if its nickname AND digest match the one provided.
 889
 890     "directory-signature" nickname-of-dirserver NL Signature
 891
 892 The signature is computed by computing the SHA-1 hash of the
 893 directory, from the characters "signed-directory", through the newline
 894 after "directory-signature".  This digest is then padded with PKCS.1,
 895 and signed with the directory server's signing key.
 896
 897 If software encounters an unrecognized keyword in a single router descriptor,
 898 it MUST reject only that router descriptor, and continue using the
 899 others.  Because this mechanism is used to add 'critical' extensions to
 900 future versions of the router descriptor format, implementations should treat
 901 it as a normal occurrence and not, for example, report it to the user as an
 902 error.  [Versions of Tor prior to 0.1.1 did this.]
 903
 904 If software encounters an unrecognized keyword in the directory header,
 905 it SHOULD reject the entire directory.
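
     The four entry forms of the "router-status" line can be told apart
     with a short routine such as the Python sketch below.  The function
     name and the dictionary layout are hypothetical; the sketch does not
     validate the nickname or the digest beyond the leading "$".

```python
def parse_router_status(arguments):
    """Parse the space-separated entries of a "router-status" line.

    Returns one dict per entry with the keys 'name' (None for an
    unverified router), 'digest', 'verified', and 'live'.
    """
    entries = []
    for token in arguments.split():
        live = not token.startswith("!")
        if not live:
            token = token[1:]
        name, sep, digest = token.partition("=")
        if sep:
            verified = True       # name=$digest form
        else:
            name, digest, verified = None, token, False  # $digest form
        if not digest.startswith("$"):
            raise ValueError("digest must start with '$': %r" % token)
        entries.append({
            "name": name,
            "digest": digest[1:],
            "verified": verified,
            "live": live,
        })
    return entries
```

     A client following the rule above would then mark a router as
     'verified' only when both the 'name' and the 'digest' of a verified
     entry match the router it already knows.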
 906
 907 7.4. Network-status descriptor
 908
 909 A "network-status" (a.k.a. "running-routers") document is a truncated
 910 directory that contains only the current status of a list of nodes, not
 911 their actual descriptors.  It contains exactly one of each of the following
 912 entries.
 913
 914      "network-status"
 915
 916         Must appear first.
 917
 918      "published" YYYY-MM-DD HH:MM:SS
 919
 920         (see 7.3 above)
 921
 922      "router-status" list
 923
 924         (see 7.3 above)
 925
 926      "directory-signature" NL signature
 927
 928         (see 7.3 above)
 929
 930 7.5. Behavior of a directory server
 931
 932 A directory server lists the nodes that are currently connected.  It
 933 speaks HTTP on a socket and sends out a directory on request.
 934
 935 Directory servers listen on a certain port (the DirPort), and speak a
 936 limited version of HTTP 1.0. Clients send either GET or POST commands.
 937 The basic interactions are:
 938   "%s %s HTTP/1.0\r\nContent-Length: %lu\r\nHost: %s\r\n\r\n",
 939     command, url, content-length, host.
 940   Get "/tor/" to fetch a full directory.
 941   Get "/tor/dir.z" to fetch a compressed full directory.
 942   Get "/tor/running-routers" to fetch a network-status descriptor.
 943   Post "/tor/" to post a server descriptor, with the body of the
 944     request containing the descriptor.
 945
 946   "host" is used to specify the address:port of the dirserver, so
 947   the request can survive going through HTTP proxies.
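
     The request template above can be filled in as in the following
     Python sketch.  The function name is hypothetical, and the address
     and port in the usage note are placeholders, not a real directory
     server.

```python
def build_dir_request(command, url, host, body=b""):
    """Build the bytes of a directory request from the template above.

    `command` is "GET" or "POST", `host` is the dirserver's
    address:port, and `body` carries the descriptor for a POST.
    """
    header = "%s %s HTTP/1.0\r\nContent-Length: %d\r\nHost: %s\r\n\r\n" % (
        command, url, len(body), host)
    return header.encode("ascii") + body
```

     For example, build_dir_request("GET", "/tor/", "192.0.2.1:9030")
     yields a request for a full directory with a Content-Length of 0,
     and build_dir_request("POST", "/tor/", "192.0.2.1:9030", descriptor)
     uploads a server descriptor held in the bytes object 'descriptor'.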
 948
 949 A.1. Differences between spec and implementation
 950
 951 - The current specification requires all ORs to have IPv4 addresses, but
 952   allows servers to exit and resolve to IPv6 addresses, and to declare IPv6
 953   addresses in their exit policies.  The current codebase has no IPv6
 954   support at all.
 955
 956 B. Things that should change in a later version of the Tor protocol
 957
 958
 959 B.1. ... but which will require backward-incompatible change
 960
 961   - Circuit IDs should be longer.
 962   - IPv6 everywhere.
 963   - Maybe, keys should be longer.
 964   - Drop backward compatibility.
 965   - We should use a 128-bit subgroup of our DH prime.
 966   - Handshake should use HMAC.
 967   - Multiple cell lengths
 968   - Ability to split circuits across paths (If this is useful.)
 969   - SENDME windows should be dynamic.
 970
 971   - Directory
 972      - Stop ever mentioning socks ports
 973
 974 B.2. ... and that will require no changes
 975
 976    - Mention multiple addr/port combos
 977    - Advertised outbound IP?
 978    - Migrate streams across circuits.
 979
 980 B.3. ... and that we have no idea how to do.
 981
 982    - UDP (as transport)
 983    - UDP (as content)
 984    - Use a better AES mode that has built-in integrity checking,
 985      doesn't grow with the number of hops, is not patented, and
 986      is implemented and maintained by smart people.
 987