doc/HACKING

   1                         Guide to Hacking Tor
   2
   3 (As of 8 October 2003, this was all accurate.  If you're reading this in
   4 the distant future, stuff may have changed.)
   5
   6 0. Intro and required reading
   7
   8   Onion Routing is still very much in development stages. This document
   9   aims to get you started in the right direction if you want to understand
  10   the code, add features, fix bugs, etc.
  11
  12   Read the README file first, so you can get familiar with the basics of
  13   installing and running an onion router.
  14
  15   Then, skim some of the introductory materials in tor-design.pdf,
  16   tor-spec.txt, and the Tor FAQ to learn more about how the Tor protocol
  17   is supposed to work.  This document will assume you know about Cells,
  18   Circuits, Streams, Connections, Onion Routers, and Onion Proxies.
  19
  20 1. Code organization
  21
  22 1.1. The modules
  23
  24   The code is divided into two directories: ./src/common and ./src/or.
  25   The "common" directory contains general purpose utility functions not
  26   specific to onion routing.  The "or" directory implements all
  27   onion-routing and onion-proxy specific functionality.
  28
  29   Files in ./src/common:
  30
  31      aes.[ch] -- Implements the AES cipher (with 128-bit keys and blocks),
  32         and a counter-mode stream cipher on top of AES.  This code is
  33         taken from the main Rijndael distribution.  (We include this
  34         because many people are running older versions of OpenSSL without
  35         AES support.)
  36
  37      compat.[ch] -- Wrappers to make calls more portable.  This code defines
  38         functions such as tor_malloc, tor_snprintf, get/set various data types,
  39         renaming, setting socket options, switching user IDs.  It is basically
  40         where the non-portable items are conditionally included depending on
  41         the platform.
  42
  43      container.[ch] -- Implements a smart list which is a resizable array along
  44         with helper functions to use on these lists.  Also includes a
  45         splay-tree implementation of the string-to-void* map.
  46
  47      crypto.[ch] -- Wrapper functions to present a consistent interface to
  48         public-key and symmetric cryptography operations from OpenSSL.
  49
  50      log.[ch] -- Tor's logging subsystem.
  51
  52      strlcat.c -- Safer, size-bounded string concatenation.  Use this instead
  53         of strncat because it has a safer API.  Included for platforms that
  54         that don't already ship this code.
  55
  56      strlcpy.c -- Safer, size-bounded string copying.  Use this instead of
  57         strncpy because it is a safer API which guarantees to NUL terminate.
  58         Included for platforms that don't already ship this code.
  59
  60      test.h -- Macros used by unit tests.
  61
  62      torgzip.[ch] -- A simple in-memory gzip implementation.
  63
  64      torint.h -- Provides missing [u]int*_t types for environments that
  65         don't have stdint.h.
  66
  67      tortls.[ch] -- Wrapper functions to present a consistent interface to
  68         TLS, SSL, and X.509 functions from OpenSSL.
  69
  70      util.[ch] -- Miscellaneous portability and convenience functions.
  71
  72   Files in ./src/or:
  73
  74    [General-purpose modules]
  75
  76      or.h -- Common header file: include everything, define everything.
  77
  78      buffers.c -- Implements a generic buffer interface.  Buffers are
  79         fairly opaque string holders that can read to or flush from:
  80         memory, file descriptors, or TLS connections.
  81
  82         Also implements parsing functions to read HTTP and SOCKS commands
  83         from buffers.
  84
  85      tree.h -- A splay tree implementation by Niels Provos.  Used by
  86         dns.c for dns caching at exits, and by connection_edge.c for dns
  87         caching at clients.
  88
  89      config.c -- Code to parse and validate the configuration file.
  90
  91    [Background processing modules]
  92
  93      cpuworker.c -- Implements a farm of 'CPU worker' processes to perform
  94         CPU-intensive tasks in the background, so as not interrupt the
  95         onion router.  (OR only)
  96
  97      dns.c -- Implements a farm of 'DNS worker' processes to perform DNS
  98         lookups for onion routers and cache the results.  [This needs to
  99         be done in the background because of the lack of a good,
 100         ubiquitous asynchronous DNS implementation.] (OR only)
 101
 102    [Directory-related functionality.]
 103
 104      directory.c -- Code to send and fetch directories and router
 105         descriptors via HTTP.  Directories use dirserv.c to generate the
 106         results; clients use routers.c to parse them.
 107
 108      dirserv.c -- Code to manage directory contents and generate
 109         directories. [Directory server only]
 110
 111      router.c -- Code to parse directories and router descriptors; and to
 112         generate a router descriptor corresponding to this OR's
 113         capabilities.  Also presents some high-level interfaces for
 114         managing an OR or OP's view of the directory.
 115
 116    [Circuit-related modules.]
 117
 118      circuitbuild.c -- Creates circuits.
 119
 120      circuitlist.c -- Manage the global circuit list.
 121
 122      circuituse.c -- Launch the right type of circuits and attach streams
 123         to them.
 124
 125      onion.c -- Code to generate and respond to "onion skins".
 126
 127      relay.c -- Handle relay cell encryption/decryption along with packaging
 128         and receiving from circuits.
 129
 130    [Core protocol implementation.]
 131
 132      command.c -- Code to handle specific cell types.
 133
 134      connection.c -- Code used in common by all connection types.  See
 135         1.2. below for more general information about connections.
 136
 137      connection_edge.c -- Code used only by edge connections.
 138
 139      connection_or.c -- Code to implement cell-speaking connections.
 140
 141    [Hidden services]
 142
 143      rendclient.c -- Client code to access location-hidden services.  This
 144         allows clients and servers to run services and have people connect
 145         without either end knowing who they are connecting to.
 146
 147      rendcommon.c -- Rendevzous implementation: Shared code between
 148         introducers, services, clients, and rendezvous points.
 149
 150      rendmid.c -- Implement introduction and rendezvous points.
 151
 152      rendservice.c -- Hidden-service side of rendezvous functionality.
 153
 154    [Reputation]
 155
 156      rephist.c -- Basic history functionality for reputation module.
 157
 158    [Router lists]
 159
 160      routerlist.c -- Code to maintain and access global list of routerinfos for
 161         known servers.
 162
 163      routerparse.c -- Code to parse and validate router descriptors and
 164         directories.
 165
 166    [Bandwidth and GUI]
 167
 168      control.c -- Implementation of Tor's control socket interface.  Useful
 169         for designing GUIs to interact with Tor.
 170
 171      hibernate.c -- Functions to close listeners, stop allowing new circuits,
 172         and so on in preparation of closing down or going dormant.  Also used
 173         to track bandwidth and time intervals to know when to hibernate.
 174
 175    [Toplevel modules.]
 176
 177      main.c -- Toplevel module.  Initializes keys, handles signals,
 178         multiplexes between connections, implements main loop, and drives
 179         scheduled events.
 180
 181      tor_main.c -- Stub module containing a main() function.  Allows unit
 182         test binary to link against main.c
 183
 184    [Unit tests]
 185
 186      test.c -- Contains unit tests for many pieces of the lower level Tor
 187         modules.
 188
 189
 190 1.2. All about connections
 191
 192   All sockets in Tor are handled as different types of nonblocking
 193   'connections'.  (What the Tor spec calls a "Connection", the code refers
 194   to as a "Cell-speaking" or "OR" connection.)
 195
 196   Connections are implemented by the connection_t struct, defined in or.h.
 197   Not every kind of connection uses all the fields in connection_t; see
 198   the comments in or.h and the assertions in assert_connection_ok() for
 199   more information.
 200
 201   Every connection has a type and a state.  Connections never change their
 202   type, but can go through many state changes in their lifetime.
 203
 204   The connection types break down as follows:
 205
 206      [Cell-speaking connections]
 207        CONN_TYPE_OR -- A bidirectional TLS connection transmitting a
 208           sequence of cells.  May be from an OR to an OR, or from an OP to
 209           an OR.
 210
 211      [Edge connections]
 212        CONN_TYPE_EXIT -- A TCP connection from an onion router to a
 213           Stream's destination. [OR only]
 214        CONN_TYPE_AP -- A SOCKS proxy connection from the end user
 215           application to the onion proxy.  [OP only]
 216
 217      [Listeners]
 218        CONN_TYPE_OR_LISTENER [OR only]
 219        CONN_TYPE_AP_LISTENER [OP only]
 220        CONN_TYPE_DIR_LISTENER [Directory server only]
 221           -- Bound network sockets, waiting for incoming connections.
 222
 223      [Internal]
 224        CONN_TYPE_DNSWORKER -- Connection from the main process to a DNS
 225           worker process. [OR only]
 226
 227        CONN_TYPE_CPUWORKER -- Connection from the main process to a CPU
 228           worker process. [OR only]
 229
 230    Connection states are documented in or.h.
 231
 232    Every connection has two associated input and output buffers.
 233    Listeners don't use them.  For non-listener connections, incoming
 234    data is appended to conn->inbuf, and outgoing data is taken from the
 235    front of conn->outbuf.  Connections differ primarily in the functions
 236    called to fill and drain these buffers.
 237
 238 1.3. All about circuits.
 239
 240    A circuit_t structure fills two roles.  First, a circuit_t links two
 241    connections together: either an edge connection and an OR connection,
 242    or two OR connections.  (When joined to an OR connection, a circuit_t
 243    affects only cells sent to a particular circID on that connection.  When
 244    joined to an edge connection, a circuit_t affects all data.)
 245
 246    Second, a circuit_t holds the cipher keys and state for sending data
 247    along a given circuit.  At the OP, it has a sequence of ciphers, each
 248    of which is shared with a single OR along the circuit.  Separate
 249    ciphers are used for data going "forward" (away from the OP) and
 250    "backward" (towards the OP).  At the OR, a circuit has only two stream
 251    ciphers: one for data going forward, and one for data going backward.
 252
 253 1.4. Asynchronous IO and the main loop.
 254
 255    Tor uses the poll(2) system call (or it wraps select(2) to act like
 256    poll, if poll is not available) to handle nonblocking (asynchronous)
 257    IO.  If you're not familiar with nonblocking IO, check out the links
 258    at the end of this document.
 259
 260    All asynchronous logic is handled in main.c.  The functions
 261    'connection_add', 'connection_set_poll_socket', and 'connection_remove'
 262    manage an array of connection_t*, and keep in synch with the array of
 263    struct pollfd required by poll(2).  (This array of connection_t* is
 264    accessible via get_connection_array, but users should generally call
 265    one of the 'connection_get_by_*' functions in connection.c to look up
 266    individual connections.)
 267
 268    To trap read and write events, connections call the functions
 269    'connection_{is|stop|start}_{reading|writing}'. If you want
 270    to completely reset the events you're watching for, use
 271    'connection_watch_events'.
 272
 273    Every time poll() finishes, main.c calls conn_read and conn_write on
 274    every connection. These functions dispatch events that have something
 275    to read to connection_handle_read, and events that have something to
 276    write to connection_handle_write, respectively.
 277
 278    When connections need to be closed, they can respond in two ways.  Most
 279    simply, they can make connection_handle_* return an error (-1),
 280    which will make conn_{read|write} close them.  But if it's not
 281    convenient to return -1 (for example, processing one connection causes
 282    you to realize that a second one should close), then you can also
 283    mark a connection to close by setting conn->marked_for_close. Marked
 284    connections will be closed at the end of the current iteration of
 285    the main loop.
 286
 287    The main loop handles several other operations: First, it checks
 288    whether any signals have been received that require a response (HUP,
 289    KILL, USR1, CHLD).  Second, it calls prepare_for_poll to handle recurring
 290    tasks and compute the necessary poll timeout.  These recurring tasks
 291    include periodically fetching the directory, timing out unused
 292    circuits, incrementing flow control windows and re-enabling connections
 293    that were blocking for more bandwidth, and maintaining statistics.
 294
 295    A word about TLS: Using TLS on OR connections complicates matters in
 296    two ways.
 297    First, a TLS stream has its own read buffer independent of the
 298    connection's read buffer.  (TLS needs to read an entire frame from
 299    the network before it can decrypt any data.  Thus, trying to read 1
 300    byte from TLS can require that several KB be read from the network
 301    and decrypted.  The extra data is stored in TLS's decrypt buffer.)
 302    Because the data hasn't been read by tor (it's still inside the TLS),
 303    this means that sometimes a connection "has stuff to read" even when
 304    poll() didn't return POLLIN. The tor_tls_get_pending_bytes function is
 305    used in main.c to detect TLS objects with non-empty internal buffers.
 306    Second, the TLS stream's events do not correspond directly to network
 307    events: sometimes, before a TLS stream can read, the network must be
 308    ready to write -- or vice versa.
 309
 310 1.5. How data flows (An illustration.)
 311
 312    Suppose an OR receives 256 bytes along an OR connection.  These 256
 313    bytes turn out to be a data relay cell, which gets decrypted and
 314    delivered to an edge connection.  Here we give a possible call sequence
 315    for the delivery of this data.
 316
 317    (This may be outdated quickly.)
 318
 319    do_main_loop -- Calls poll(2), receives a POLLIN event on a struct
 320                  pollfd, then calls:
 321     conn_read -- Looks up the corresponding connection_t, and calls:
 322      connection_handle_read -- Calls:
 323       connection_read_to_buf -- Notices that it has an OR connection so:
 324        read_to_buf_tls -- Pulls data from the TLS stream onto conn->inbuf.
 325       connection_process_inbuf -- Notices that it has an OR connection so:
 326        connection_or_process_inbuf -- Checks whether conn is open, and calls:
 327         connection_process_cell_from_inbuf -- Notices it has enough data for
 328                  a cell, then calls:
 329          connection_fetch_from_buf -- Pulls the cell from the buffer.
 330          cell_unpack -- Decodes the raw cell into a cell_t
 331          command_process_cell -- Notices it is a relay cell, so calls:
 332           command_process_relay_cell -- Looks up the circuit for the cell,
 333                  makes sure the circuit is live, then passes the cell to:
 334            circuit_deliver_relay_cell -- Passes the cell to each of:
 335             relay_crypt -- Strips a layer of encryption from the cell and
 336                  notices that the cell is for local delivery.
 337             connection_edge_process_relay_cell -- extracts the cell's
 338                  relay command, and makes sure the edge connection is
 339                  open.  Since it has a DATA cell and an open connection,
 340                  calls:
 341              circuit_consider_sending_sendme -- check if the total number
 342                  of cells received by all streams on this circuit is
 343                  enough that we should send back an acknowledgement
 344                  (requesting that more cells be sent to any stream).
 345              connection_write_to_buf -- To place the data on the outgoing
 346                  buffer of the correct edge connection, by calling:
 347               connection_start_writing -- To tell the main poll loop about
 348                  the pending data.
 349               write_to_buf -- To actually place the outgoing data on the
 350                  edge connection.
 351              connection_consider_sending_sendme -- if the outbuf waiting
 352                  to flush to the exit connection is not too full, check
 353                  if the total number of cells received on this stream
 354                  is enough that we should send back an acknowledgement
 355                  (requesting that more cells be sent to this stream).
 356
 357    In a subsequent iteration, main notices that the edge connection is
 358    ready for writing:
 359
 360    do_main_loop -- Calls poll(2), receives a POLLOUT event on a struct
 361                  pollfd, then calls:
 362     conn_write -- Looks up the corresponding connection_t, and calls:
 363      connection_handle_write -- This isn't a TLS connection, so calls:
 364       flush_buf -- Delivers data from the edge connection's outbuf to the
 365                  network.
 366       connection_wants_to_flush -- Reports that all data has been flushed.
 367       connection_finished_flushing -- Notices the connection is an exit,
 368                  and calls:
 369        connection_edge_finished_flushing -- The connection is open, so it
 370                  calls:
 371         connection_stop_writing -- Tells the main poll loop that this
 372                  connection has no more data to write.
 373         connection_consider_sending_sendme -- now that the outbuf
 374                  is empty, check again if the total number of cells
 375                  received on this stream is enough that we should send
 376                  back an acknowledgement (requesting that more cells be
 377                  sent to this stream).
 378
 379 1.6. Routers, descriptors, and directories
 380
 381    All Tor processes need to keep track of a list of onion routers, for
 382    several reasons:
 383        - OPs need to establish connections and circuits to ORs.
 384        - ORs need to establish connections to other ORs.
 385        - OPs and ORs need to fetch directories from a directory server.
 386        - ORs need to upload their descriptors to directory servers.
 387        - Directory servers need to know which ORs are allowed onto the
 388          network, what the descriptors are for those ORs, and which of
 389          those ORs are currently live.
 390
 391    Thus, every Tor process keeps track of a list of all the ORs it knows
 392    in a static variable 'directory' in the routers.c module.  This
 393    variable contains a routerinfo_t object for each known OR. On startup,
 394    the directory is initialized to a list of known directory servers (via
 395    router_get_list_from_file()).  Later, the directory is updated via
 396    router_get_dir_from_string().  (OPs and ORs retrieve fresh directories
 397    from directory servers; directory servers generate their own.)
 398
 399    Every OR must periodically regenerate a router descriptor for itself.
 400    The descriptor and the corresponding routerinfo_t are stored in the
 401    'desc_routerinfo' and 'descriptor' static variables in routers.c.
 402
 403    Additionally, a directory server keeps track of a list of the
 404    router descriptors it knows in a separate list in dirserv.c.  It
 405    uses this list, checking which OR connections are open, to build
 406    directories.
 407
 408 1.7. Data model
 409
 410   [XXX]
 411
 412 1.8. Flow control
 413
 414   [XXX]
 415
 416 2. Coding conventions
 417
 418 2.1. Details
 419
 420   Use tor_malloc, tor_free, tor_snprintf, tor_strdup, and tor_gettimeofday
 421   instead of their generic equivalents.  (They always succeed or exit.)
 422
 423   Use INLINE instead of 'inline', so that we work properly on windows.
 424
 425 2.2. Calling and naming conventions
 426
 427   Whenever possible, functions should return -1 on error and and 0 on
 428   success.
 429
 430   For multi-word identifiers, use lowercase words combined with
 431   underscores. (e.g., "multi_word_identifier").  Use ALL_CAPS for macros and
 432   constants.
 433
 434   Typenames should end with "_t".
 435
 436   Function names should be prefixed with a module name or object name.  (In
 437   general, code to manipulate an object should be a module with the same
 438   name as the object, so it's hard to tell which convention is used.)
 439
 440   Functions that do things should have imperative-verb names
 441   (e.g. buffer_clear, buffer_resize); functions that return booleans should
 442   have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).
 443
 444 2.3. What To Optimize
 445
 446   Don't optimize anything if it's not in the critical path.  Right now,
 447   the critical path seems to be AES, logging, and the network itself.
 448   Feel free to do your own profiling to determine otherwise.
 449
 450 2.4. Log conventions
 451
 452   Log convention: use only these four log severities.
 453
 454     ERR is if something fatal just happened.
 455     WARN if something bad happened, but we're still running. The
 456       bad thing is either a bug in the code, an attack or buggy
 457       protocol/implementation of the remote peer, etc. The operator should
 458       examine the bad thing and try to correct it.
 459     NOTICE if it's something the operator will want to know about.
 460     (No error or warning messages should be expected during normal OR or OP
 461       operation. I expect most people to run on -l notice eventually. If a
 462       library function is currently called such that failure always means
 463       ERR, then the library function should log WARN and let the caller
 464       log ERR.)
 465     INFO means something happened (maybe bad, maybe ok), but there's nothing
 466       you need to (or can) do about it.
 467     DEBUG is for everything louder than INFO.
 468
 469   [XXX Proposed convention: every messages of severity INFO or higher should
 470   either (A) be intelligible to end-users who don't know the Tor source; or
 471   (B) somehow inform the end-users that they aren't expected to understand
 472   the message (perhaps with a string like "internal error").  Option (A) is
 473   to be preferred to option (B). -NM]
 474
 475 2.5. Doxygen
 476
 477   We use the 'doxygen' utility to generate documentation from our source code.
 478   Here's how to use it:
 479
 480   1. Begin every file that should be documented with
 481          /**
 482           * \file filename.c
 483           * \brief Short desccription of the file
 484           */
 485
 486      (Doxygen will recognize any comment beginning with /** as special.)
 487
 488   2. Before any function, structure, #define, or variable you want to
 489      document, add a comment of the form:
 490
 491         /** Describe the function's actions in imperative sentences.
 492          *
 493          * Use blank lines for paragraph breaks
 494          *   - and
 495          *   - hyphens
 496          *   - for
 497          *   - lists.
 498          *
 499          * Write <b>argument_names</b> in boldface.
 500          *
 501          * \code
 502          *     place_example_code();
 503          *     between_code_and_endcode_commands();
 504          * \endcode
 505          */
 506
 507   3. Make sure to escape the characters "<", ">", "\", "%" and "#" as "\<",
 508      "\>", "\\", "\%", and "\#".
 509
 510   4. To document structure members, you can use two forms:
 511
 512        struct foo {
 513          /** You can put the comment before an element; */
 514          int a;
 515          int b; /**< Or use the less-than symbol to put the comment after the element. */
 516        };
 517
 518   5. To generate documentation from the Tor source code, type:
 519
 520      $ doxygen -g
 521
 522      To generate a file called 'Doxyfile'.  Edit that file and run 'doxygen' to
 523      generate the aPI documentation.
 524
 525   6. See the Doxygen manual for more information; this summary just scratches
 526      the surface.
 527
 528 3. References
 529
 530   About Tor
 531
 532      See http://tor.eff.org/
 533          http://tor.eff.org/cvs/doc/tor-spec.txt
 534          http://tor.eff.org/cvs/doc/tor-design.tex
 535          http://tor.eff.org/cvs/doc/FAQ
 536
 537   About anonymity
 538
 539      See http://freehaven.net/anonbib/
 540
 541   About nonblocking IO
 542
 543      [XXX insert references]
 544
 545 # ======================================================================
 546 # Old HACKING document; merge into the above, move into tor-design.tex,
 547 # or delete.
 548 # ======================================================================
 549 The pieces.
 550
 551   Routers. Onion routers, as far as the 'tor' program is concerned,
 552   are a bunch of data items that are loaded into the router_array when
 553   the program starts. Periodically it downloads a new set of routers
 554   from a directory server, and updates the router_array. When a new OR
 555   connection is started (see below), the relevant information is copied
 556   from the router struct to the connection struct.
 557
 558   Connections. A connection is a long-standing tcp socket between
 559   nodes. A connection is named based on what it's connected to -- an "OR
 560   connection" has an onion router on the other end, an "OP connection" has
 561   an onion proxy on the other end, an "exit connection" has a website or
 562   other server on the other end, and an "AP connection" has an application
 563   proxy (and thus a user) on the other end.
 564
 565   Circuits. A circuit is a path over the onion routing
 566   network. Applications can connect to one end of the circuit, and can
 567   create exit connections at the other end of the circuit. AP and exit
 568   connections have only one circuit associated with them (and thus these
 569   connection types are closed when the circuit is closed), whereas OP and
 570   OR connections multiplex many circuits at once, and stay standing even
 571   when there are no circuits running over them.
 572
 573   Streams. Streams are specific conversations between an AP and an exit.
 574   Streams are multiplexed over circuits.
 575
 576   Cells. Some connections, specifically OR and OP connections, speak
 577   "cells". This means that data over that connection is bundled into 512
 578   byte packets (14 bytes of header and 498 bytes of payload). Each cell has
 579   a type, or "command", which indicates what it's for.
 580