3 (As of 8 October 2003, this was all accurate. If you're reading this in
4 the distant future, stuff may have changed.)
6 0. Intro and required reading
8 Onion Routing is still very much in development stages. This document
9 aims to get you started in the right direction if you want to understand
10 the code, add features, fix bugs, etc.
12 Read the README file first, so you can get familiar with the basics of
13 installing and running an onion router.
15 Then, skim some of the introductory materials in tor-spec.txt,
16 tor-design.tex, and the Tor FAQ to learn more about how the Tor protocol
17 is supposed to work. This document will assume you know about Cells,
18 Circuits, Streams, Connections, Onion Routers, and Onion Proxies.
24 The code is divided into two directories: ./src/common and ./src/or.
25 The "common" directory contains general purpose utility functions not
26 specific to onion routing. The "or" directory implements all
27 onion-routing and onion-proxy specific functionality.
29 Files in ./src/common:
31 aes.[ch] -- Implements the AES cipher (with 128-bit keys and blocks),
32 and a counter-mode stream cipher on top of AES. This code is
33 taken from the main Rijndael distribution. (We include this
34 because many people are running older versions of OpenSSL without
37 crypto.[ch] -- Wrapper functions to present a consistent interface to
38 public-key and symmetric cryptography operations from OpenSSL.
40 fakepoll.[ch] -- Used on systems that don't have a poll() system call;
41 reimplements() poll using the select() system call.
43 log.[ch] -- Tor's logging subsystem.
45 test.h -- Macros used by unit tests.
47 torint.h -- Provides missing [u]int*_t types for environments that
50 tortls.[ch] -- Wrapper functions to present a consistent interface to
51 TLS, SSL, and X.509 functions from OpenSSL.
53 util.[ch] -- Miscellaneous portability and convenience functions.
57 [General-purpose modules]
59 or.h -- Common header file: include everything, define everything.
61 buffers.c -- Implements a generic buffer interface. Buffers are
62 fairly opaque string holders that can read to or flush from:
63 memory, file descriptors, or TLS connections.
65 Also implements parsing functions to read HTTP and SOCKS commands
68 tree.h -- A splay tree implementation by Niels Provos. Used only by
71 config.c -- Code to parse and validate the configuration file.
73 [Background processing modules]
75 cpuworker.c -- Implements a separate 'CPU worker' process to perform
76 CPU-intensive tasks in the background, so as not interrupt the
77 onion router. (OR only)
79 dns.c -- Implements a farm of 'DNS worker' processes to perform DNS
80 lookups for onion routers and cache the results. [This needs to
81 be done in the background because of the lack of a good,
82 ubiquitous asynchronous DNS implementation.] (OR only)
84 [Directory-related functionality.]
86 directory.c -- Code to send and fetch directories and router
87 descriptors via HTTP. Directories use dirserv.c to generate the
88 results; clients use routers.c to parse them.
90 dirserv.c -- Code to manage directory contents and generate
91 directories. [Directory server only]
93 routers.c -- Code to parse directories and router descriptors; and to
94 generate a router descriptor corresponding to this OR's
95 capabilities. Also presents some high-level interfaces for
96 managing an OR or OP's view of the directory.
98 [Circuit-related modules.]
100 circuit.c -- Code to create circuits, manage circuits, and route
101 relay cells along circuits.
103 onion.c -- Code to generate and respond to "onion skins".
105 [Core protocol implementation.]
107 connection.c -- Code used in common by all connection types. See
108 1.2. below for more general information about connections.
110 connection_edge.c -- Code used only by edge connections.
112 command.c -- Code to handle specific cell types.
114 connection_or.c -- Code to implement cell-speaking connections.
118 main.c -- Toplevel module. Initializes keys, handles signals,
119 multiplexes between connections, implements main loop, and drives
122 tor_main.c -- Stub module containing a main() function. Allows unit
123 test binary to link against main.c
127 test.c -- Contains unit tests for many pieces of the lower level Tor
130 1.2. All about connections
132 All sockets in Tor are handled as different types of nonblocking
133 'connections'. (What the Tor spec calls a "Connection", the code refers
134 to as a "Cell-speaking" or "OR" connection.)
136 Connections are implemented by the connection_t struct, defined in or.h.
137 Not every kind of connection uses all the fields in connection_t; see
138 the comments in or.h and the assertions in assert_connection_ok() for
141 Every connection has a type and a state. Connections never change their
142 type, but can go through many state changes in their lifetime.
144 The connection types break down as follows:
146 [Cell-speaking connections]
147 CONN_TYPE_OR -- A bidirectional TLS connection transmitting a
148 sequence of cells. May be from an OR to an OR, or from an OP to
152 CONN_TYPE_EXIT -- A TCP connection from an onion router to a
153 Stream's destination. [OR only]
154 CONN_TYPE_AP -- A SOCKS proxy connection from the end user
155 application to the onion proxy. [OP only]
158 CONN_TYPE_OR_LISTENER [OR only]
159 CONN_TYPE_AP_LISTENER [OP only]
160 CONN_TYPE_DIR_LISTENER [Directory server only]
161 -- Bound network sockets, waiting for incoming connections.
164 CONN_TYPE_DNSWORKER -- Connection from the main process to a DNS
165 worker process. [OR only]
167 CONN_TYPE_CPUWORKER -- Connection from the main process to a CPU
168 worker process. [OR only]
170 Connection states are documented in or.h.
172 Every connection has two associated input and output buffers.
173 Listeners don't use them. For non-listener connections, incoming
174 data is appended to conn->inbuf, and outgoing data is taken from the
175 front of conn->outbuf. Connections differ primarily in the functions
176 called to fill and drain these buffers.
178 1.3. All about circuits.
180 A circuit_t structure fills two roles. First, a circuit_t links two
181 connections together: either an edge connection and an OR connection,
182 or two OR connections. (When joined to an OR connection, a circuit_t
183 affects only cells sent to a particular ACI on that connection. When
184 joined to an edge connection, a circuit_t affects all data.)
186 Second, a circuit_t holds the cipher keys and state for sending data
187 along a given circuit. At the OP, it has a sequence of ciphers, each
188 of which is shared with a single OR along the circuit. Separate
189 ciphers are used for data going "forward" (away from the OP) and
190 "backward" (towards the OP). At the OR, a circuit has only two stream
191 ciphers: one for data going forward, and one for data going backward.
193 1.4. Asynchronous IO and the main loop.
195 Tor uses the poll(2) system call (or it wraps select(2) to act like
196 poll, if poll is not available) to handle nonblocking (asynchronous)
197 IO. If you're not familiar with nonblocking IO, check out the links
198 at the end of this document.
200 All asynchronous logic is handled in main.c. The functions
201 'connection_add', 'connection_set_poll_socket', and 'connection_remove'
202 manage an array of connection_t*, and keep in synch with the array of
203 struct pollfd required by poll(2). (This array of connection_t* is
204 accessible via get_connection_array, but users should generally call
205 one of the 'connection_get_by_*' functions in connection.c to look up
206 individual connections.)
208 To trap read and write events, connections call the functions
209 'connection_{is|stop|start}_{reading|writing}'. If you want
210 to completely reset the events you're watching for, use
211 'connection_watch_events'.
213 Every time poll() finishes, main.c calls conn_read and conn_write on
214 every connection. These functions dispatch events that have something
215 to read to connection_handle_read, and events that have something to
216 write to connection_handle_write, respectively.
218 When connections need to be closed, they can respond in two ways. Most
219 simply, they can make connection_handle_* return an error (-1),
220 which will make conn_{read|write} close them. But if it's not
221 convenient to return -1 (for example, processing one connection causes
222 you to realize that a second one should close), then you can also
223 mark a connection to close by setting conn->marked_for_close. Marked
224 connections will be closed at the end of the current iteration of
227 The main loop handles several other operations: First, it checks
228 whether any signals have been received that require a response (HUP,
229 KILL, USR1, CHLD). Second, it calls prepare_for_poll to handle recurring
230 tasks and compute the necessary poll timeout. These recurring tasks
231 include periodically fetching the directory, timing out unused
232 circuits, incrementing flow control windows and re-enabling connections
233 that were blocking for more bandwidth, and maintaining statistics.
235 A word about TLS: Using TLS on OR connections complicates matters in
237 First, a TLS stream has its own read buffer independent of the
238 connection's read buffer. (TLS needs to read an entire frame from
239 the network before it can decrypt any data. Thus, trying to read 1
240 byte from TLS can require that several KB be read from the network
241 and decrypted. The extra data is stored in TLS's decrypt buffer.)
242 Because the data hasn't been read by tor (it's still inside the TLS),
243 this means that sometimes a connection "has stuff to read" even when
244 poll() didn't return POLLIN. The tor_tls_get_pending_bytes function is
245 used in main.c to detect TLS objects with non-empty internal buffers.
246 Second, the TLS stream's events do not correspond directly to network
247 events: sometimes, before a TLS stream can read, the network must be
248 ready to write -- or vice versa.
250 1.5. How data flows (An illustration.)
252 Suppose an OR receives 256 bytes along an OR connection. These 256
253 bytes turn out to be a data relay cell, which gets decrypted and
254 delivered to an edge connection. Here we give a possible call sequence
255 for the delivery of this data.
257 (This may be outdated quickly.)
259 do_main_loop -- Calls poll(2), receives a POLLIN event on a struct
261 conn_read -- Looks up the corresponding connection_t, and calls:
262 connection_handle_read -- Calls:
263 connection_read_to_buf -- Notices that it has an OR connection so:
264 read_to_buf_tls -- Pulls data from the TLS stream onto conn->inbuf.
265 connection_process_inbuf -- Notices that it has an OR connection so:
266 connection_or_process_inbuf -- Checks whether conn is open, and calls:
267 connection_process_cell_from_inbuf -- Notices it has enough data for
269 connection_fetch_from_buf -- Pulls the cell from the buffer.
270 cell_unpack -- Decodes the raw cell into a cell_t
271 command_process_cell -- Notices it is a relay cell, so calls:
272 command_process_relay_cell -- Looks up the circuit for the cell,
273 makes sure the circuit is live, then passes the cell to:
274 circuit_deliver_relay_cell -- Passes the cell to each of:
275 relay_crypt -- Strips a layer of encryption from the cell and
276 notices that the cell is for local delivery.
277 connection_edge_process_relay_cell -- extracts the cell's
278 relay command, and makes sure the edge connection is
279 open. Since it has a DATA cell and an open connection,
281 circuit_consider_sending_sendme -- check if the total number
282 of cells received by all streams on this circuit is
283 enough that we should send back an acknowledgement
284 (requesting that more cells be sent to any stream).
285 connection_write_to_buf -- To place the data on the outgoing
286 buffer of the correct edge connection, by calling:
287 connection_start_writing -- To tell the main poll loop about
289 write_to_buf -- To actually place the outgoing data on the
291 connection_consider_sending_sendme -- if the outbuf waiting
292 to flush to the exit connection is not too full, check
293 if the total number of cells received on this stream
294 is enough that we should send back an acknowledgement
295 (requesting that more cells be sent to this stream).
297 In a subsequent iteration, main notices that the edge connection is
300 do_main_loop -- Calls poll(2), receives a POLLOUT event on a struct
302 conn_write -- Looks up the corresponding connection_t, and calls:
303 connection_handle_write -- This isn't a TLS connection, so calls:
304 flush_buf -- Delivers data from the edge connection's outbuf to the
306 connection_wants_to_flush -- Reports that all data has been flushed.
307 connection_finished_flushing -- Notices the connection is an exit,
309 connection_edge_finished_flushing -- The connection is open, so it
311 connection_stop_writing -- Tells the main poll loop that this
312 connection has no more data to write.
313 connection_consider_sending_sendme -- now that the outbuf
314 is empty, check again if the total number of cells
315 received on this stream is enough that we should send
316 back an acknowledgement (requesting that more cells be
317 sent to this stream).
320 1.6. Routers, descriptors, and directories
322 All Tor processes need to keep track of a list of onion routers, for
324 - OPs need to establish connections and circuits to ORs.
325 - ORs need to establish connections to other ORs.
326 - OPs and ORs need to fetch directories from a directory server.
327 - ORs need to upload their descriptors to directory servers.
328 - Directory servers need to know which ORs are allowed onto the
329 network, what the descriptors are for those ORs, and which of
330 those ORs are currently live.
332 Thus, every Tor process keeps track of a list of all the ORs it knows
333 in a static variable 'directory' in the routers.c module. This
334 variable contains a routerinfo_t object for each known OR. On startup,
335 the directory is initialized to a list of known directory servers (via
336 router_get_list_from_file()). Later, the directory is updated via
337 router_get_dir_from_string(). (OPs and ORs retrieve fresh directories
338 from directory servers; directory servers generate their own.)
340 Every OR must periodically regenerate a router descriptor for itself.
341 The descriptor and the corresponding routerinfo_t are stored in the
342 'desc_routerinfo' and 'descriptor' static variables in routers.c.
344 Additionally, a directory server keeps track of a list of the
345 router descriptors it knows in a separate list in dirserv.c. It
346 uses this list, checking which OR connections are open, to build
357 2. Coding conventions
361 Use tor_malloc, tor_strdup, and tor_gettimeofday instead of their
362 generic equivalents. (They always succeed or exit.)
364 Use INLINE instead of 'inline', so that we work properly on windows.
366 2.2. Calling and naming conventions
368 Whenever possible, functions should return -1 on error and and 0 on
371 For multi-word identifiers, use lowercase words combined with
372 underscores. (e.g., "multi_word_identifier"). Use ALL_CAPS for macros and
375 Typenames should end with "_t".
377 Function names should be prefixed with a module name or object name. (In
378 general, code to manipulate an object should be a module with the same
379 name as the object, so it's hard to tell which convention is used.)
381 Functions that do things should have imperative-verb names
382 (e.g. buffer_clear, buffer_resize); functions that return booleans should
383 have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).
385 2.3. What To Optimize
387 Don't optimize anything if it's not in the critical path. Right now,
388 the critical path seems to be AES, logging, and the network itself.
389 Feel free to do your own profiling to determine otherwise.
393 Log convention: use only these four log severities.
395 ERR is if something fatal just happened.
396 WARN if something bad happened, but we're still running. The
397 bad thing is either a bug in the code, an attack or buggy
398 protocol/implementation of the remote peer, etc. The operator should
399 examine the bad thing and try to correct it.
400 (No error or warning messages should be expected during normal OR or OP
401 operation. I expect most people to run on -l warn eventually. If a
402 library function is currently called such that failure always means
403 ERR, then the library function should log WARN and let the caller
405 INFO means something happened (maybe bad, maybe ok), but there's nothing
406 you need to (or can) do about it.
407 DEBUG is for everything louder than INFO.
409 [XXX Proposed convention: every messages of severity INFO or higher should
410 either (A) be intelligible to end-users who don't know the Tor source; or
411 (B) somehow inform the end-users that they aren't expected to understand
412 the message (perhaps with a string like "internal error"). Option (A) is
413 to be preferred to option (B). -NM]
419 See http://freehaven.net/tor/
420 http://freehaven.net/tor/cvs/doc/tor-spec.txt
421 http://freehaven.net/tor/cvs/doc/tor-design.tex
422 http://freehaven.net/tor/cvs/doc/FAQ
426 See http://freehaven.net/anonbib/
430 [XXX insert references]
433 # ======================================================================
434 # Old HACKING document; merge into the above, move into tor-design.tex,
436 # ======================================================================
439 Routers. Onion routers, as far as the 'tor' program is concerned,
440 are a bunch of data items that are loaded into the router_array when
441 the program starts. Periodically it downloads a new set of routers
442 from a directory server, and updates the router_array. When a new OR
443 connection is started (see below), the relevant information is copied
444 from the router struct to the connection struct.
446 Connections. A connection is a long-standing tcp socket between
447 nodes. A connection is named based on what it's connected to -- an "OR
448 connection" has an onion router on the other end, an "OP connection" has
449 an onion proxy on the other end, an "exit connection" has a website or
450 other server on the other end, and an "AP connection" has an application
451 proxy (and thus a user) on the other end.
453 Circuits. A circuit is a path over the onion routing
454 network. Applications can connect to one end of the circuit, and can
455 create exit connections at the other end of the circuit. AP and exit
456 connections have only one circuit associated with them (and thus these
457 connection types are closed when the circuit is closed), whereas OP and
458 OR connections multiplex many circuits at once, and stay standing even
459 when there are no circuits running over them.
461 Streams. Streams are specific conversations between an AP and an exit.
462 Streams are multiplexed over circuits.
464 Cells. Some connections, specifically OR and OP connections, speak
465 "cells". This means that data over that connection is bundled into 256
466 byte packets (8 bytes of header and 248 bytes of payload). Each cell has
467 a type, or "command", which indicates what it's for.
471 [XXX no longer up to date]
472 Bandwidth throttling. Each cell-speaking connection has a maximum
473 bandwidth it can use, as specified in the routers.or file. Bandwidth
474 throttling can occur on both the sender side and the receiving side. If
475 the LinkPadding option is on, the sending side sends cells at regularly
476 spaced intervals (e.g., a connection with a bandwidth of 25600B/s would
477 queue a cell every 10ms). The receiving side protects against misbehaving
478 servers that send cells more frequently, by using a simple token bucket:
480 Each connection has a token bucket with a specified capacity. Tokens are
481 added to the bucket each second (when the bucket is full, new tokens
482 are discarded.) Each token represents permission to receive one byte
483 from the network --- to receive a byte, the connection must remove a
484 token from the bucket. Thus if the bucket is empty, that connection must
485 wait until more tokens arrive. The number of tokens we add enforces a
486 longterm average rate of incoming bytes, yet we still permit short-term
487 bursts above the allowed bandwidth. Currently bucket sizes are set to
488 ten seconds worth of traffic.
490 The bandwidth throttling uses TCP to push back when we stop reading.
491 We extend it with token buckets to allow more flexibility for traffic
494 Data congestion control. Even with the above bandwidth throttling,
495 we still need to worry about congestion, either accidental or intentional.
496 If a lot of people make circuits into same node, and they all come out
497 through the same connection, then that connection may become saturated
498 (be unable to send out data cells as quickly as it wants to). An adversary
499 can make a 'put' request through the onion routing network to a webserver
500 he owns, and then refuse to read any of the bytes at the webserver end
501 of the circuit. These bottlenecks can propagate back through the entire
502 network, mucking up everything.
504 (See the tor-spec.txt document for details of how congestion control
507 In practice, all the nodes in the circuit maintain a receive window
508 close to maximum except the exit node, which stays around 0, periodically
509 receiving a sendme and reading more data cells from the webserver.
510 In this way we can use pretty much all of the available bandwidth for
511 data, but gracefully back off when faced with multiple circuits (a new
512 sendme arrives only after some cells have traversed the entire network),
513 stalled network connections, or attacks.
515 We don't need to reimplement full tcp windows, with sequence numbers,
516 the ability to drop cells when we're full etc, because the tcp streams
517 already guarantee in-order delivery of each cell. Rather than trying
518 to build some sort of tcp-on-tcp scheme, we implement this minimal data
519 congestion control; so far it's enough.
521 Router twins. In many cases when we ask for a router with a given
522 address and port, we really mean a router who knows a given key. Router
523 twins are two or more routers that share the same private key. We thus
524 give routers extra flexibility in choosing the next hop in the circuit: if
525 some of the twins are down or slow, it can choose the more available ones.
527 Currently the code tries for the primary router first, and if it's down,
528 chooses the first available twin.