@mainpage Tor source reference

@section welcome Welcome to Tor
This documentation describes the general structure of the Tor codebase, how
it fits together, what functionality is available for extending Tor, and
gives some notes on how Tor got that way. It also includes a reference for
nearly every function, type, file, and module in the Tor source code. The
high-level documentation is a work in progress.

Tor itself remains a work in progress too: we've been working on it for
nearly two decades, and we've learned a lot about good coding since we first
started. This means, however, that some of the older pieces of Tor have
some "code smell" in them and could stand a brisk refactoring. So when we
describe a piece of code, we'll sometimes give a note on how it got that way,
and whether we still think that's a good idea.

This document is not an overview of the Tor protocol. For that, see the
design paper and the specifications at https://spec.torproject.org/ .

For more information about Tor's coding standards and some helpful
development tools, see
[doc/HACKING](https://gitweb.torproject.org/tor.git/tree/doc/HACKING) in the
Tor repository.
@section topics Topic-related documentation

@subpage initialization

@subpage time_periodic

@subpage configuration

@subpage publish_subscribe
@page intro A high-level overview

@section highlevel The very high level

Ultimately, Tor runs as an event-driven network daemon: it responds to
network events, signals, and timers by sending and receiving things over
the network. Clients, relays, and directory authorities all use the
same codebase: the Tor process will run as a client, relay, or authority
depending on its configuration.
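As a rough illustration of "one codebase, several roles", here is a minimal
sketch of choosing a role from configuration. The type `sketch_config_t`, its
fields, and the function name are hypothetical and are not Tor's real option
names or APIs; the sketch also treats the roles as mutually exclusive, which
is a simplification.

```c
#include <stdbool.h>

/* Hypothetical, simplified configuration; not Tor's real options struct. */
typedef struct {
  int or_port;             /* nonzero: accept incoming relay connections */
  bool authoritative_dir;  /* true: act as a directory authority */
} sketch_config_t;

typedef enum {
  SKETCH_ROLE_CLIENT,
  SKETCH_ROLE_RELAY,
  SKETCH_ROLE_AUTHORITY
} sketch_role_t;

/* Decide which role this process plays, based only on its configuration. */
sketch_role_t
sketch_role_from_config(const sketch_config_t *cfg)
{
  if (cfg->authoritative_dir)
    return SKETCH_ROLE_AUTHORITY;
  if (cfg->or_port != 0)
    return SKETCH_ROLE_RELAY;
  return SKETCH_ROLE_CLIENT;
}
```

The point is only that the same binary inspects its configuration at startup
and behaves accordingly; there is no separate "relay build" or "client build".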
Tor has a few major dependencies, including Libevent (used to tell which
sockets are readable and writable), OpenSSL or NSS (used for many encryption
functions, and to implement the TLS protocol), and zlib (used to
compress and uncompress directory information).

Most of Tor's work today is done in a single event-driven main thread.
Tor also spawns one or more worker threads to handle CPU-intensive
tasks. (Right now, this only includes circuit encryption and the more
expensive compression algorithms.)

On startup, Tor initializes its libraries, reads and responds to its
configuration files, and launches a main event loop. At first, the only
events that Tor listens for are a few signals (like TERM and HUP), and
one or more listener sockets (for different kinds of incoming
connections). Tor also configures several timers to handle periodic
events. As Tor runs over time, other events will open, and new events
will be scheduled.
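The event-driven structure described above can be sketched without any real
sockets or signals. The following toy dispatcher is not Libevent and is not
Tor's main loop; every name in it (`sketch_event_t`, `sketch_loop_t`, and so
on) is hypothetical. It only shows the shape of the idea: events of several
kinds are registered with callbacks, and each pass of the loop runs the
callbacks of whichever events are pending.

```c
#include <stddef.h>

/* The three kinds of event mentioned in the text above. */
typedef enum {
  SKETCH_EV_SIGNAL,
  SKETCH_EV_LISTENER,
  SKETCH_EV_TIMER
} sketch_ev_kind_t;

typedef struct sketch_event {
  sketch_ev_kind_t kind;
  int pending;  /* nonzero once the event has fired and awaits dispatch */
  void (*cb)(struct sketch_event *ev, void *arg);
  void *arg;
} sketch_event_t;

#define SKETCH_MAX_EVENTS 16

typedef struct {
  sketch_event_t *events[SKETCH_MAX_EVENTS];
  size_t n_events;
} sketch_loop_t;

/* Register an event with the loop. Returns 0 on success, -1 if full. */
int
sketch_loop_add(sketch_loop_t *loop, sketch_event_t *ev)
{
  if (loop->n_events >= SKETCH_MAX_EVENTS)
    return -1;
  loop->events[loop->n_events++] = ev;
  return 0;
}

/* Run one pass: invoke the callback of every pending event, clearing its
 * pending flag first. Returns the number of callbacks invoked. */
int
sketch_loop_run_once(sketch_loop_t *loop)
{
  int n_dispatched = 0;
  for (size_t i = 0; i < loop->n_events; ++i) {
    sketch_event_t *ev = loop->events[i];
    if (ev->pending) {
      ev->pending = 0;
      ev->cb(ev, ev->arg);
      ++n_dispatched;
    }
  }
  return n_dispatched;
}

/* Example callback: count how many times any event has been dispatched. */
void
sketch_count_cb(sketch_event_t *ev, void *arg)
{
  (void)ev;
  ++*(int *)arg;
}
```

In a real daemon the `pending` flags would be set by the operating system's
readiness notifications rather than by hand, and callbacks would typically
register further events (for example, a listener callback adding an event for
each newly accepted connection).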
The codebase is divided into a few top-level subdirectories, each of
which contains several sub-modules.

- `ext` -- Code maintained elsewhere that we include in the Tor
  source distribution.

- \refdir{lib} -- Lower-level utility code, not necessarily
  Tor-specific.

- `trunnel` -- Automatically generated code (from the Trunnel
  tool): used to parse and encode binary formats.

- \refdir{core} -- Networking code that implements the central
  parts of the Tor protocol and main loop.

- \refdir{feature} -- Aspects of Tor (like directory management,
  running a relay, running a directory authority, managing a list of
  nodes, running and using onion services) that are built on top of the
  mainloop code.

- \refdir{app} -- Highest-level functionality; responsible for setting
  up and configuring the Tor daemon, making sure all the lower-level
  modules start up when required, and so on.

- \refdir{tools} -- Binaries other than Tor that we produce.
  Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper
  module.

- `test` -- Unit tests, regression tests, and a few integration
  tests.

In theory, the above parts of the codebase are sorted roughly from
lowest-level to highest-level, where higher-level code is only allowed to
invoke lower-level code, and lower-level code never includes or depends on
code of a higher level. In practice, this refactoring is incomplete: the
modules in \refdir{lib} are well-factored, but there are many layer
violations ("upward dependencies") in \refdir{core} and \refdir{feature}.
We aim to eliminate those over time.
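To make the layering rule concrete, here is a small sketch that classifies an
include as an "upward dependency". It is not one of Tor's own checking tools.
It assumes that include paths begin with the directory name (`lib/`, `core/`,
`feature/`, `app/`), and all function names and file names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Layer prefixes, ordered from lowest level to highest level. */
static const char *const sketch_layer_prefixes[] = {
  "lib/", "core/", "feature/", "app/"
};
#define SKETCH_N_LAYERS \
  (sizeof(sketch_layer_prefixes) / sizeof(sketch_layer_prefixes[0]))

/* Return the layer index of a path, or -1 if it is in none of the layers. */
int
sketch_layer_of_path(const char *path)
{
  for (size_t i = 0; i < SKETCH_N_LAYERS; ++i) {
    size_t len = strlen(sketch_layer_prefixes[i]);
    if (strncmp(path, sketch_layer_prefixes[i], len) == 0)
      return (int)i;
  }
  return -1;
}

/* Return true if `includer` depends on a file from a strictly higher
 * layer. Paths outside the known layers are not judged at all. */
bool
sketch_is_upward_include(const char *includer, const char *included)
{
  int from = sketch_layer_of_path(includer);
  int to = sketch_layer_of_path(included);
  if (from < 0 || to < 0)
    return false;
  return to > from;
}
```

Under these assumptions, a file under `core/` including a header under
`feature/` is flagged, while a file under `feature/` including a header under
`lib/` is fine.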
@section keyabstractions Some key high-level abstractions

The most important abstractions at Tor's highest level are Connections,
Channels, Circuits, and Nodes.

A 'Connection' (connection_t) represents a stream-based information flow.
Most connections are TCP connections to remote Tor servers and clients. (But
as a shortcut, a relay will sometimes make a connection to itself without
actually using a TCP connection. More details later on.) Connections exist
in different varieties, depending on what functionality they provide. The
principal types of connection are edge_connection_t (e.g., a SOCKS connection
or a connection from an exit relay to a destination), or_connection_t (a TLS
stream connecting to a relay), dir_connection_t (an HTTP connection to learn
about the network), and control_connection_t (a connection from a
controller).
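One common way to express "one base type, several varieties" in C is to embed
the base struct as the first member of each specialized struct and to check a
type tag before downcasting. The sketch below illustrates that general
technique only. The `sk_` types and their fields are hypothetical,
much-simplified stand-ins, and should not be read as the real definitions of
connection_t and its subtypes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  SK_CONN_EDGE,
  SK_CONN_OR,
  SK_CONN_DIR,
  SK_CONN_CONTROL
} sk_conn_type_t;

/* Fields shared by every variety of connection. */
typedef struct {
  sk_conn_type_t type;
  int fd;
} sk_connection_t;

/* A specialized connection embeds the base as its FIRST member, so that a
 * pointer to the base is also a valid pointer to the whole struct. */
typedef struct {
  sk_connection_t base;
  int stream_id;
} sk_edge_connection_t;

typedef struct {
  sk_connection_t base;
  int tls_established;
} sk_or_connection_t;

/* Checked downcast: asserts on the type tag before converting. */
sk_edge_connection_t *
sk_to_edge_conn(sk_connection_t *conn)
{
  assert(conn->type == SK_CONN_EDGE);
  return (sk_edge_connection_t *)conn;
}

/* Code that only needs the shared fields can work on the base type. */
const char *
sk_conn_type_name(const sk_connection_t *conn)
{
  switch (conn->type) {
    case SK_CONN_EDGE:    return "edge";
    case SK_CONN_OR:      return "or";
    case SK_CONN_DIR:     return "dir";
    case SK_CONN_CONTROL: return "control";
    default:              return "unknown";
  }
}
```

Generic code (for example, code that reads from or closes any connection)
takes the base pointer, while type-specific code downcasts once and then
works with the specialized fields.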
A 'Circuit' (circuit_t) is a persistent tunnel through the Tor network,
established with public-key cryptography, and used to send cells one or more
hops. Clients keep track of multi-hop circuits (origin_circuit_t), and the
cryptography associated with each hop. Relays, on the other hand, keep track
only of their hop of each circuit (or_circuit_t).

A 'Channel' (channel_t) is an abstract view of sending cells to and from a
Tor relay. Currently, all channels are implemented using OR connections
(channel_tls_t). If we switch to other strategies in the future, we'll have
more connection types.
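The idea of an "abstract view of sending cells" can be sketched with a struct
of function pointers: callers send cells through the abstract interface, and
each implementation supplies its own transport. Everything below is
hypothetical (the `sk_` names, the toy cell size, and the in-memory buffer
that stands in for a transport) and is not the real channel_t interface.

```c
#include <stddef.h>
#include <string.h>

#define SK_CELL_LEN 8  /* toy cell size for this sketch only */

/* Abstract channel: callers only see the operations, not the transport. */
typedef struct sk_channel sk_channel_t;
struct sk_channel {
  int (*write_cell)(sk_channel_t *chan, const unsigned char *cell);
};

/* One concrete implementation: queue cells in a fixed in-memory buffer. */
typedef struct {
  sk_channel_t base;  /* must be first, so the base pointer can be cast */
  unsigned char outbuf[4 * SK_CELL_LEN];
  size_t outlen;
} sk_channel_buf_t;

static int
sk_channel_buf_write(sk_channel_t *chan, const unsigned char *cell)
{
  sk_channel_buf_t *bufchan = (sk_channel_buf_t *)chan;
  if (bufchan->outlen + SK_CELL_LEN > sizeof(bufchan->outbuf))
    return -1;  /* no room for another cell */
  memcpy(bufchan->outbuf + bufchan->outlen, cell, SK_CELL_LEN);
  bufchan->outlen += SK_CELL_LEN;
  return 0;
}

void
sk_channel_buf_init(sk_channel_buf_t *bufchan)
{
  bufchan->base.write_cell = sk_channel_buf_write;
  bufchan->outlen = 0;
}

/* Generic entry point: works for any implementation of the interface. */
int
sk_channel_send(sk_channel_t *chan, const unsigned char *cell)
{
  return chan->write_cell(chan, cell);
}
```

With this shape, adding a new transport means adding a new struct and a new
set of callbacks, while code that calls `sk_channel_send()` stays unchanged.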
A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions
about a Tor relay or bridge.