[This proposed Tor extension has not been implemented yet. It is currently
in request-for-comments state. -RD]

Tor Unreliable Datagram Extension Proposal

Tor is a distributed overlay network designed to anonymize low-latency
TCP-based applications. The current Tor specification supports only
TCP-based traffic. This limitation prevents the use of Tor to anonymize
other important applications, notably voice over IP software. This document
is a proposal to extend the Tor specification to support UDP traffic.

The basic design philosophy of this extension is to add support for
tunneling unreliable datagrams through Tor with as few modifications to the
protocol as possible. As currently specified, Tor cannot directly support
such tunneling, as connections between nodes are built using transport layer
security (TLS) atop TCP. The latency incurred by TCP is likely unacceptable
to the operation of most UDP-based application-level protocols.

Thus, we propose the addition of links between nodes using datagram
transport layer security (DTLS). These links allow packets to traverse a
route through Tor quickly, but their unreliable nature requires minor
changes to the Tor protocol. This proposal outlines the necessary
additions and changes to the Tor specification to support UDP traffic.

We note that a separate set of DTLS links between nodes creates a second
overlay, distinct from the one composed of TLS links. This separation and
the resulting decrease in each anonymity set's size will make certain
attacks easier. However, it is our belief that VoIP support in Tor will
dramatically increase its appeal, and correspondingly, the size of its user
base, number of deployed nodes, and total traffic relayed. These increases
should help offset the loss of anonymity that two distinct networks imply.

1. Overview of Tor-UDP and its complications

As described above, this proposal extends the Tor specification to support
UDP with as few changes as possible. Tor's overlay network is managed
through TLS-based connections; we will re-use this control plane to set up
and tear down circuits that relay UDP traffic. These circuits will be built
atop DTLS, in a fashion analogous to how Tor currently sends TCP traffic
over TLS.

The unreliability of DTLS circuits creates problems for Tor at two levels:

    1. Tor's encryption of the relay layer does not allow independent
       decryption of individual records. If record N is not received, then
       record N+1 will not decrypt correctly, as the counter for AES/CTR is
       maintained implicitly.

    2. Tor's end-to-end integrity checking works under the assumption that
       all RELAY cells are delivered. This assumption is invalid when cells
       are sent over DTLS.

The fix for the first problem is straightforward: add an explicit sequence
number to each cell. To fix the second problem, we add a system of nonces
and hashes to RELAY packets.

In the following sections, we mirror the layout of the Tor Protocol
Specification, presenting the necessary modifications to the Tor protocol as

Tor-UDP uses DTLS for encryption of some links. All DTLS links must have
corresponding TLS links, as all control messages are sent over TLS. All
implementations MUST support the DTLS ciphersuite "[TODO]".

DTLS connections are formed using the same protocol as TLS connections.
This occurs upon request, following a CREATE_UDP or CREATE_FAST_UDP cell,
as detailed in section 4.6.

Once a paired TLS/DTLS connection is established, the two sides send cells
to one another. All but two types of cells are sent over TLS links. RELAY
cells containing the commands RELAY_DATA_UDP and RELAY_DROP_UDP, specified
below, are sent over DTLS links. [Should all cells still be 512 bytes long?
Perhaps upon completion of a preliminary implementation, we should do a
performance evaluation for some class of UDP traffic, such as VoIP. - ML]
Cells may be sent embedded in TLS or DTLS records of any size or divided
across such records. The framing of these records MUST NOT leak any more
information than the above differentiation on the basis of cell type. [I am
uncomfortable with this leakage, but don't see any simple, elegant way

As with TLS connections, DTLS connections are not permanent.

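The split between TLS and DTLS links described above can be sketched as a small dispatch function. This is only an illustration from the point of view of the edge that originates a cell (an intermediate OR cannot read the encrypted relay command); the function name `link_for_cell` is hypothetical, and the numeric values are taken from the command tables in this proposal.

```python
# Hypothetical helper: pick the link type for a cell at the edge that creates it.
CELL_RELAY = 3        # cell command value from this proposal's cell command table
RELAY_DATA_UDP = 14   # relay command values from this proposal's relay command table
RELAY_DROP_UDP = 17

DTLS_RELAY_COMMANDS = frozenset({RELAY_DATA_UDP, RELAY_DROP_UDP})


def link_for_cell(cell_command, relay_command=None):
    """Return "dtls" for the two UDP relay commands and "tls" for everything else."""
    if cell_command == CELL_RELAY and relay_command in DTLS_RELAY_COMMANDS:
        return "dtls"
    return "tls"
```

Every control cell, and every RELAY cell other than the two UDP commands, falls through to the TLS link, which matches the rule that all control messages are sent over TLS.
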
Each cell contains the following fields:

        Sequence Number                       [2 bytes]
        Payload (padded with 0 bytes)         [507 bytes]
                                              [Total size: 512 bytes]

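A minimal sketch of packing and unpacking a cell with this layout follows. The 2-byte sequence number and 507-byte payload come from the field list; the remaining 3 header bytes are assumed here to be a 2-byte CircID followed by a 1-byte Command, in network byte order, which is an assumption made so that the sizes sum to 512. The function names are hypothetical.

```python
import struct

CELL_LEN = 512
PAYLOAD_LEN = 507

# Assumed header layout: CircID (2 bytes), Command (1 byte), Sequence Number (2 bytes).
# ">" selects big-endian with no alignment padding, so the header is exactly 5 bytes.
_HEADER = struct.Struct(">HBH")


def pack_cell(circ_id, command, seq, payload=b""):
    """Build a fixed-size cell, padding the payload with zero bytes."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload exceeds 507 bytes")
    return _HEADER.pack(circ_id, command, seq) + payload.ljust(PAYLOAD_LEN, b"\x00")


def unpack_cell(cell):
    """Split a cell into (circ_id, command, seq, padded_payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cell must be exactly 512 bytes")
    circ_id, command, seq = _HEADER.unpack_from(cell)
    return circ_id, command, seq, cell[_HEADER.size:]
```

The payload is returned with its padding intact; stripping it is left to the relay layer, which carries its own length field.
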
The 'Command' field holds one of the following values:

         0 -- PADDING          (Padding)                     (See Sec 6.2)
         1 -- CREATE           (Create a circuit)            (See Sec 4)
         2 -- CREATED          (Acknowledge create)          (See Sec 4)
         3 -- RELAY            (End-to-end data)             (See Sec 5)
         4 -- DESTROY          (Stop using a circuit)        (See Sec 4)
         5 -- CREATE_FAST      (Create a circuit, no PK)     (See Sec 4)
         6 -- CREATED_FAST     (Circuit created, no PK)      (See Sec 4)
         7 -- CREATE_UDP       (Create a UDP circuit)        (See Sec 4)
         8 -- CREATED_UDP      (Acknowledge UDP create)      (See Sec 4)
         9 -- CREATE_FAST_UDP  (Create a UDP circuit, no PK) (See Sec 4)
        10 -- CREATED_FAST_UDP (UDP circuit created, no PK)  (See Sec 4)

The sequence number allows for AES/CTR decryption of RELAY cells
independently of one another; this functionality is required to support
cells sent over DTLS. The sequence number is described in more detail in

[Should the sequence number only appear in RELAY packets? The overhead is
small, and I'm hesitant to force more code paths on the implementor. -ML]
[There's already a separate relay header that has other material in it,
so it wouldn't be the end of the world to move it there if it's

[Having separate commands for UDP circuits seems necessary, unless we can
assume a flag day event for a large number of tor nodes. -ML]

4. Circuit management

4.2. Setting circuit keys

Keys are set up for UDP circuits in the same fashion as for TCP circuits.
Each UDP circuit shares keys with its corresponding TCP circuit.

[If the keys are used for both TCP and UDP connections, how does it
work to mix sequence-number-less cells with sequence-numbered cells --
how do you know you have the encryption order right? -RD]

4.3. Creating circuits

UDP circuits are created as TCP circuits, using the *_UDP cells as

4.4. Tearing down circuits

UDP circuits are torn down as TCP circuits, using the *_UDP cells as

153 When an OR receives a RELAY cell, it checks the cell's circID and
154 determines whether it has a corresponding circuit along that
155 connection. If not, the OR drops the RELAY cell.
157 Otherwise, if the OR is not at the OP edge of the circuit (that is,
158 either an 'exit node' or a non-edge node), it de/encrypts the payload
159 with AES/CTR, as follows:
160 'Forward' relay cell (same direction as CREATE):
161 Use Kf as key; decrypt, using sequence number to synchronize
162 ciphertext and keystream.
163 'Back' relay cell (opposite direction from CREATE):
164 Use Kb as key; encrypt, using sequence number to synchronize
165 ciphertext and keystream.
166 Note that in counter mode, decrypt and encrypt are the same operation.
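
One way the sequence number could synchronize ciphertext and keystream is sketched below. The proposal does not spell out the mapping, so this assumes that every RELAY cell consumes exactly 507 bytes of keystream, that the keystream starts at offset zero for the initial sequence number, and that the cell count is taken modulo 2^16. Wraparound handling is left open by the proposal, and a circuit longer than 65536 cells would make this mapping ambiguous. The function name is hypothetical.

```python
AES_BLOCK = 16        # AES block size in bytes
PAYLOAD_LEN = 507     # encrypted bytes per cell (assumed: the whole cell payload)
SEQ_MODULUS = 1 << 16 # the sequence number field is 2 bytes wide


def keystream_position(initial_seq, seq):
    """Map a cell's sequence number to (counter_block_index, bytes_to_skip).

    The receiver would set the AES/CTR counter to counter_block_index and
    discard bytes_to_skip bytes of keystream before XORing with the payload.
    """
    cells_elapsed = (seq - initial_seq) % SEQ_MODULUS
    byte_offset = cells_elapsed * PAYLOAD_LEN
    return divmod(byte_offset, AES_BLOCK)
```

Because 507 is not a multiple of 16, most cells begin in the middle of a counter block, which is why the sketch returns a skip count as well as a block index.
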
[Since the sequence number is only 2 bytes, what do you do when it

Each stream encrypted by a Kf or Kb has a corresponding unique state,
captured by a sequence number; the originator of each such stream chooses
the initial sequence number randomly, and increments it only with RELAY
cells. [This counts cells; unlike, say, TCP, tor uses fixed-size cells, so
there's no need for counting bytes directly. Right? - ML]
[I believe this is true. You'll find out for sure when you try to

The OR then decides whether it recognizes the relay cell, by
inspecting the payload as described in section 5.1 below. If the OR
recognizes the cell, it processes the contents of the relay cell.
Otherwise, it passes the decrypted relay cell along the circuit if
the circuit continues. If the OR at the end of the circuit
encounters an unrecognized relay cell, an error has occurred: the OR
sends a DESTROY cell to tear down the circuit.

When a relay cell arrives at an OP, the OP decrypts the payload
with AES/CTR as follows:

    OP receives data cell:
        Decrypt with Kb_I, using the sequence number as above. If the
        payload is recognized (see section 5.1), then stop and process

For more information, see section 5 below.

4.6. CREATE_UDP and CREATED_UDP cells

Users set up UDP circuits incrementally. The procedure is similar to that
for TCP circuits, as described in section 4.1. In addition to the TLS
connection to the first node, the OP also attempts to open a DTLS
connection. If this succeeds, the OP sends a CREATE_UDP cell, with a
payload in the same format as a CREATE cell. To extend a UDP circuit past
the first hop, the OP sends an EXTEND_UDP relay cell (see section 5) which
instructs the last node in the circuit to send a CREATE_UDP cell to extend

The relay payload for an EXTEND_UDP relay cell consists of:

        Onion skin                    [186 bytes]
        Identity fingerprint          [20 bytes]

The address field and ports denote the IPv4 address and ports of the next OR

The payload for a CREATED_UDP cell or the relay payload for a
RELAY_EXTENDED_UDP cell is identical to that of the corresponding CREATED or
RELAY_EXTENDED cell. Both circuits are established using the same key.

Note that the existence of a UDP circuit implies the
existence of a corresponding TCP circuit, sharing keys, sequence numbers,
and any other relevant state.

4.6.1 CREATE_FAST_UDP/CREATED_FAST_UDP cells

As above, the OP must successfully connect using DTLS before attempting to
send a CREATE_FAST_UDP cell. Otherwise, the procedure is the same as in

5. Application connections and stream management

Within a circuit, the OP and the exit node use the contents of RELAY cells
to tunnel end-to-end commands, TCP connections ("Streams"), and UDP packets
across circuits. End-to-end commands and UDP packets can be initiated by
either edge; streams are initiated by the OP.

The payload of each unencrypted RELAY cell consists of:

        Relay command           [1 byte]
        'Recognized'            [2 bytes]

The relay commands are:

         1 -- RELAY_BEGIN          [forward]
         2 -- RELAY_DATA           [forward or backward]
         3 -- RELAY_END            [forward or backward]
         4 -- RELAY_CONNECTED      [backward]
         5 -- RELAY_SENDME         [forward or backward]
         6 -- RELAY_EXTEND         [forward]
         7 -- RELAY_EXTENDED       [backward]
         8 -- RELAY_TRUNCATE       [forward]
         9 -- RELAY_TRUNCATED      [backward]
        10 -- RELAY_DROP           [forward or backward]
        11 -- RELAY_RESOLVE        [forward]
        12 -- RELAY_RESOLVED       [backward]
        13 -- RELAY_BEGIN_UDP      [forward]
        14 -- RELAY_DATA_UDP       [forward or backward]
        15 -- RELAY_EXTEND_UDP     [forward]
        16 -- RELAY_EXTENDED_UDP   [backward]
        17 -- RELAY_DROP_UDP       [forward or backward]

Commands labelled as "forward" must only be sent by the originator
of the circuit. Commands labelled as "backward" must only be sent by
other nodes in the circuit back to the originator. Commands marked
as either can be sent either by the originator or other nodes.

The 'recognized' field in any unencrypted relay payload is always set to
zero.

The 'digest' field can have two meanings. For all cells sent over TLS
connections (that is, all commands and all non-UDP RELAY data), it is
computed as the first four bytes of the running SHA-1 digest of all the
bytes that have been sent reliably and have been destined for this hop of
the circuit or originated from this hop of the circuit, seeded from Df or Db
respectively (obtained in section 4.2 above), and including this RELAY
cell's entire payload (taken with the digest field set to zero). Cells sent
over DTLS connections do not affect this running digest. Each cell sent
over DTLS (that is, RELAY_DATA_UDP and RELAY_DROP_UDP) has the digest field
set to the SHA-1 digest of the current RELAY cell's entire payload, with the
digest field set to zero. Coupled with a randomly-chosen streamID, this
provides per-cell integrity checking on UDP cells.
[If you drop malformed UDP relay cells but don't close the circuit,
then this 8 bytes of digest is not as strong as what we get in the
TCP-circuit side. Is this a problem? -RD]
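
The per-cell digest used on DTLS can be sketched as follows. The position and width of the digest field are assumptions: this sketch places a 4-byte digest at offset 5 of the relay payload (after a 1-byte relay command, 2-byte 'recognized' field, and 2-byte streamID), mirroring the four-byte truncation used on the TLS side. The reviewer note above mentions 8 bytes, so the real width may differ. All function names are hypothetical.

```python
import hashlib
import hmac

DIGEST_OFFSET = 5  # assumed: command (1) + recognized (2) + streamID (2)
DIGEST_LEN = 4     # assumed width of the digest field


def _with_digest(relay_payload, digest_bytes):
    """Return a copy of the payload with the digest field replaced."""
    end = DIGEST_OFFSET + DIGEST_LEN
    return relay_payload[:DIGEST_OFFSET] + digest_bytes + relay_payload[end:]


def udp_cell_digest(relay_payload):
    """SHA-1 over the whole payload with the digest field zeroed, truncated to fit."""
    zeroed = _with_digest(relay_payload, bytes(DIGEST_LEN))
    return hashlib.sha1(zeroed).digest()[:DIGEST_LEN]


def seal(relay_payload):
    """Fill in the digest field before the payload is encrypted."""
    return _with_digest(relay_payload, udp_cell_digest(relay_payload))


def is_recognized(relay_payload):
    """A cell is recognized when 'recognized' is zero and the digest matches."""
    recognized = relay_payload[1:3]
    received = relay_payload[DIGEST_OFFSET:DIGEST_OFFSET + DIGEST_LEN]
    return recognized == b"\x00\x00" and hmac.compare_digest(
        received, udp_cell_digest(relay_payload)
    )
```

Unlike the running digest on the TLS side, this check needs no state from earlier cells, so a lost or reordered datagram does not affect verification of the next one.
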

When the 'recognized' field of a RELAY cell is zero, and the digest
is correct, the cell is considered "recognized" for the purposes of
decryption (see section 4.5 above).

(The digest does not include any bytes from relay cells that do
not start or end at this hop of the circuit. That is, it does not
include forwarded data. Therefore if 'recognized' is zero but the
digest does not match, the running digest at that node should
not be updated, and the cell should be forwarded on.)

All RELAY cells pertaining to the same tunneled TCP stream have the
same streamID. Such streamIDs are chosen arbitrarily by the OP. RELAY
cells that affect the entire circuit rather than a particular
stream use a StreamID of zero.

All RELAY cells pertaining to the same UDP tunnel have the same streamID.
This streamID is chosen randomly by the OP, but cannot be zero.

The 'Length' field of a relay cell contains the number of bytes in
the relay payload which contain real payload data. The remainder of
the payload is padded with NUL bytes.

If the RELAY cell is recognized but the relay command is not
understood, the cell must be dropped and ignored. Its contents
still count with respect to the digests, though. [Before
0.1.1.10, Tor closed circuits when it received an unknown relay
command. Perhaps this will be more forward-compatible. -RD]

5.2.1. Opening UDP tunnels and transferring data

To open a new anonymized UDP connection, the OP chooses an open
circuit to an exit that may be able to connect to the destination
address, selects a random streamID not yet used on that circuit,
and constructs a RELAY_BEGIN_UDP cell with a payload encoding the address
and port of the destination host. The payload format is:

        ADDRESS | ':' | PORT | [00]

where ADDRESS can be a DNS hostname, or an IPv4 address in
dotted-quad format, or an IPv6 address surrounded by square brackets;
and where PORT is encoded in decimal.

[What is the [00] for? -NM]
[It's so the payload is easy to parse out with string funcs -RD]
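
The RELAY_BEGIN_UDP payload format above can be illustrated with a short encoder and parser. This is a sketch only: the function names are hypothetical, the payload is assumed to be ASCII, and no validation of hostnames or address literals is attempted.

```python
def encode_begin_udp(address, port):
    """Build ADDRESS ':' PORT NUL; IPv6 literals must already carry brackets."""
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return f"{address}:{port}".encode("ascii") + b"\x00"


def parse_begin_udp(payload):
    """Recover (address, port) from a payload that may carry trailing padding."""
    text, nul, _padding = payload.partition(b"\x00")
    if not nul:
        raise ValueError("missing NUL terminator")
    # Split on the last ':' so that the colons inside "[...]" IPv6 literals survive.
    address, colon, port = text.decode("ascii").rpartition(":")
    if not colon or not address or not port.isdigit():
        raise ValueError("malformed ADDRESS:PORT")
    return address, int(port)
```

The trailing NUL lets the receiver cut the string at a known point regardless of how much zero padding follows in the cell.
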

Upon receiving this cell, the exit node resolves the address as necessary.
If the address cannot be resolved, the exit node replies with a RELAY_END
cell. (See 5.4 below.) Otherwise, the exit node replies with a
RELAY_CONNECTED cell, whose payload is in one of the following formats:

    The IPv4 address to which the connection was made [4 octets]
    A number of seconds (TTL) for which the address may be cached [4 octets]

    Four zero-valued octets [4 octets]
    An address type (6)     [1 octet]
    The IPv6 address to which the connection was made [16 octets]
    A number of seconds (TTL) for which the address may be cached [4 octets]

[XXXX Versions of Tor before 0.1.1.6 ignore and do not generate the TTL
field. No version of Tor currently generates the IPv6 format.]
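
A minimal sketch of parsing the two RELAY_CONNECTED payload formats follows. It assumes the caller has already trimmed the relay payload to the value of its 'Length' field, and that multi-byte integers are in network byte order. The function name is hypothetical.

```python
import ipaddress
import struct

IPV4_FORMAT_LEN = 4 + 4           # address + TTL
IPV6_FORMAT_LEN = 4 + 1 + 16 + 4  # zero octets + type + address + TTL


def parse_connected(data):
    """Return (address, ttl_seconds) from a length-trimmed RELAY_CONNECTED payload."""
    if len(data) == IPV4_FORMAT_LEN:
        (ttl,) = struct.unpack_from(">I", data, 4)
        return ipaddress.IPv4Address(data[:4]), ttl
    if len(data) == IPV6_FORMAT_LEN and data[:4] == bytes(4) and data[4] == 6:
        (ttl,) = struct.unpack_from(">I", data, 21)
        return ipaddress.IPv6Address(data[5:21]), ttl
    raise ValueError("unrecognized RELAY_CONNECTED payload")
```

Dispatching on the exact length avoids confusing the four zero octets of the IPv6 format with the IPv4 address 0.0.0.0.
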

The OP waits for a RELAY_CONNECTED cell before sending any data.
Once a connection has been established, the OP and exit node
package UDP data in RELAY_DATA_UDP cells, and upon receiving such
cells, echo their contents to the corresponding socket.
RELAY_DATA_UDP cells sent to unrecognized streams are dropped.

RELAY_DROP_UDP cells are long-range dummies; upon receiving such
a cell, the OR or OP must drop it.

UDP tunnels are closed in a fashion corresponding to TCP connections.

UDP streams are not subject to flow control.

7.2. Router descriptor format.

The items' formats are as follows:

    "router" nickname address ORPort SocksPort DirPort UDPPort

        Indicates the beginning of a router descriptor. "address" must be
        an IPv4 address in dotted-quad format. The last four numbers
        indicate the ports at which this OR exposes functionality; the
        first three are TCP ports and the last is a UDP port. ORPort is a
        port at which this OR accepts TLS connections for the main OR
        protocol; SocksPort is deprecated and should always be 0; DirPort
        is the port at which this OR accepts directory-related HTTP
        connections; and UDPPort is a port at which this OR accepts DTLS
        connections for UDP data. If any port is not supported, the value
        0 is given instead of a port number.
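
As an illustration of the extended "router" item, the sketch below parses one such line into named fields. The class and function names are hypothetical, and the sample values in the usage note are made up for the example.

```python
import ipaddress
from typing import NamedTuple


class RouterLine(NamedTuple):
    nickname: str
    address: str
    or_port: int
    socks_port: int
    dir_port: int
    udp_port: int  # 0 means the OR does not accept DTLS connections


def parse_router_line(line):
    """Parse '"router" nickname address ORPort SocksPort DirPort UDPPort'."""
    parts = line.split()
    if len(parts) != 7 or parts[0] != "router":
        raise ValueError("not a router line with a UDPPort field")
    nickname, address = parts[1], parts[2]
    ipaddress.IPv4Address(address)  # raises ValueError unless dotted-quad IPv4
    ports = [int(p) for p in parts[3:]]
    if any(not 0 <= p <= 65535 for p in ports):
        raise ValueError("port out of range")
    return RouterLine(nickname, address, *ports)
```

For example, `parse_router_line("router example 192.0.2.7 9001 0 9030 9002")` yields a record whose `udp_port` is 9002; a client could skip any router whose `udp_port` is 0 when building a UDP circuit.
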

What changes need to happen to each node's exit policy to support this? -RD