Hacking Tor: An Incomplete Guide
================================

For full information on how Tor is supposed to work, look at the files in
https://gitweb.torproject.org/torspec.git/tree

For an explanation of how to change Tor's design to work differently, look at
https://gitweb.torproject.org/torspec.git/blob_plain/HEAD:/proposals/001-process.txt

For the latest version of the code, get a copy of git, and

   git clone https://git.torproject.org/git/tor

We talk about Tor on the tor-talk mailing list. Design proposals and
discussion belong on the tor-dev mailing list. We hang around on
irc.oftc.net, with general discussion happening on #tor and development
happening on #tor-dev.

How we use Git branches
-----------------------

Each main development series (like 0.2.1, 0.2.2, etc.) has its main work
applied to a single branch. At most one series can be the development series
at a time; all other series are maintenance series that get bug-fixes only.
The development series is built in a git branch called "master"; the
maintenance series are built in branches called "maint-0.2.0", "maint-0.2.1",
and so on. We regularly merge the active maint branches forward.

For all series except the development series, we also have a "release" branch
(as in "release-0.2.1"). The release series is based on the corresponding
maintenance series, except that it deliberately lags the maint series for
most of its patches, so that bugfix patches are not typically included in a
maintenance release until they've been tested for a while in a development
release. Occasionally, we'll merge an urgent bugfix into the release branch
before it gets merged into maint, but that's rare.

If you're working on a bugfix for a bug that occurs in a particular version,
base your bugfix branch on the "maint" branch for the first supported series
that has that bug. (As of June 2013, we're supporting 0.2.3 and later.) If
you're working on a new feature, base it on the master branch.

When you do a commit that needs a ChangeLog entry, add a new file to
the "changes" toplevel subdirectory. It should have the format of a
one-entry changelog section from the current ChangeLog file, as in

    - Fix a potential buffer overflow. Fixes bug 99999; bugfix on

To write a changes file, first categorize the change. Some common categories
are: Minor bugfixes, Major bugfixes, Minor features, Major features, Code
simplifications and refactoring. Then say what the change does. If
it's a bugfix, mention what bug it fixes and when the bug was
introduced. To find out which Git tag the change was introduced in,
you can use "git describe --contains <sha1 of commit>".

If at all possible, try to create this file in the same commit where
you are making the change. Please give it a distinctive name that no
other branch will use for the lifetime of your change.

When we go to make a release, we will concatenate all the entries
in changes to make a draft changelog, and clear the directory. We'll
then edit the draft changelog into a nice readable format.

What needs a changes file?::
   A non-exhaustive list: Anything that might change user-visible
   behavior. Anything that changes internals, documentation, or the build
   system enough that somebody could notice. Big or interesting code
   rewrites. Anything about which somebody might plausibly wonder "when
   did that happen, and/or why did we do that" six months down the line.

Why use changes files instead of Git commit messages?::
   Git commit messages are written for developers, not users, and they
   are nigh-impossible to revise after the fact.

Why use changes files instead of entries in the ChangeLog?::
   Having every single commit touch the ChangeLog file tended to create
   zillions of merge conflicts.

These aren't strictly necessary for hacking on Tor, but they can help track
down bugs.

http://jenkins.torproject.org

The dmalloc library will keep track of memory allocation, so you can find out
if we're leaking memory, doing any double-frees, and so on.

  dmalloc -l ~/dmalloc.log
  (run the commands it tells you)
  ./configure --with-dmalloc

  valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor

(Note that if you get a zillion openssl warnings, you will also need to
pass --undef-value-errors=no to valgrind, or rebuild your openssl

Running gcov for unit test coverage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  make CFLAGS='-g -fprofile-arcs -ftest-coverage'
  gcov -o src/common src/common/*.[ch]
  gcov -o src/or src/or/*.[ch]
  cd ../or; gcov *.[ch]

Then, look at the .gcov files. '-' before a line means that the
compiler generated no code for that line. '######' means that the
line was never reached. Lines with numbers were executed that number
of times.

If that doesn't work:
   * Try configuring Tor with --disable-gcc-hardening
   * On recent OSX versions, you might need to add CC=clang to your
     build line, as in:
     make CFLAGS='-g -fprofile-arcs -ftest-coverage' CC=clang
     Their llvm-gcc doesn't work very well for me.

Profiling Tor with oprofile
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The oprofile tool runs (on Linux only!) to tell you what functions Tor is
spending its CPU time in, so we can identify performance bottlenecks.

Here are some basic instructions:

 - Build tor with debugging symbols (you probably already have, unless
   you messed with CFLAGS during the build process).
 - Build all the libraries you care about with debugging symbols
   (probably you only care about libssl, maybe zlib and Libevent).
 - Copy this tor to a new directory.
 - Copy all the libraries it uses to that dir too (ldd ./tor will
   list them).
 - Set LD_LIBRARY_PATH to include that dir. ldd ./tor should now
   show you it's using the libs in that dir.
 - Reset oprofile's counters and start it.
   * "opcontrol --reset; opcontrol --start", if Nick remembers right.
 - After a while, have it dump the stats on tor and all the libs
   in that dir you created.
   * "opcontrol --dump;"
   * "opreport -l that_dir/*"

If possible, send your patch as one of these (in descending order of
preference):

   - A git branch we can pull from
   - Patches generated by git format-patch

   - To build your code while configured with --enable-gcc-warnings?
   - To run "make check-spaces" on your code?
   - To run "make check-docs" to see whether all new options are on
   - To write unit tests, as possible?
   - To base your code on the appropriate branch?
   - To include a file in the "changes" directory as appropriate?

Whitespace and C conformance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Invoke "make check-spaces" from time to time, so it can tell you about
deviations from our C whitespace style. Generally, we use:

   - Unix-style line endings
   - K&R-style indentation
   - No space before newlines
   - A blank line at the end of each file
   - Never more than one blank line in a row
   - Always spaces, never tabs
   - No more than 79 columns per line.
   - Two spaces per indent.
   - A space between control keywords and their corresponding paren
     "if (x)", "while (x)", and "switch (x)", never "if(x)", "while(x)", or
     "switch(x)".
   - A space between anything and an open brace.
   - No space between a function name and an opening paren. "puts(x)", not
     "puts (x)".
   - Function declarations at the start of the line.

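As a small illustration of the list above, here is a made-up helper laid
out in that style. The function name and its behavior are hypothetical and
not taken from the Tor source; we are also assuming that "function
declarations at the start of the line" means the function name begins its
own line, with the return type on the line before.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Tor: return the number of nonzero
 * bytes in the len-byte array data. */
size_t
count_nonzero_bytes(const unsigned char *data, size_t len)
{
  size_t i, n = 0;
  assert(data || len == 0);
  /* Two-space indents, a space after "for" and "if", a space before each
   * open brace, and no space between a function name and its paren. */
  for (i = 0; i < len; ++i) {
    if (data[i])
      ++n;
  }
  return n;
}
```

Running "make check-spaces" remains the authority on what the style
checker actually accepts.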
We try hard to build without warnings everywhere. In particular, if you're
using gcc, you should invoke the configure script with the option
"--enable-gcc-warnings". This will give a bunch of extra warning flags to
the compiler, and help us find divergences from our preferred C style.

Getting emacs to edit Tor source properly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Nick likes to put the following snippet in his .emacs file:

    (add-hook 'c-mode-hook
              (lambda ()
                (set-variable 'show-trailing-whitespace t)
                (let ((fname (expand-file-name (buffer-file-name))))
                  (cond
                   ((string-match "^/home/nickm/src/libevent" fname)
                    (set-variable 'indent-tabs-mode t)
                    (set-variable 'c-basic-offset 4)
                    (set-variable 'tab-width 4))
                   ((string-match "^/home/nickm/src/tor" fname)
                    (set-variable 'indent-tabs-mode nil)
                    (set-variable 'c-basic-offset 2))
                   ((string-match "^/home/nickm/src/openssl" fname)
                    (set-variable 'indent-tabs-mode t)
                    (set-variable 'c-basic-offset 8)
                    (set-variable 'tab-width 8))))))

You'll note that it defaults to showing all trailing whitespace. The "cond"
test detects whether the file is one of a few C free software projects that I
often edit, and sets up the indentation level and tab preferences to match
each project's style.

If you want to try this out, you'll need to change the filename regex
patterns to match where you keep your Tor files.

If you use emacs for editing Tor and nothing else, you could always just say:

    (add-hook 'c-mode-hook
              (lambda ()
                (set-variable 'show-trailing-whitespace t)
                (set-variable 'indent-tabs-mode nil)
                (set-variable 'c-basic-offset 2)))

There is probably a better way to do this. No, we are probably not going
to clutter the files with emacs stuff.

We have some wrapper functions like tor_malloc, tor_free, tor_strdup, and
tor_gettimeofday; use them instead of their generic equivalents. (They
always succeed or exit.)

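To make "always succeed or exit" concrete, here is a minimal sketch of
what such wrappers can look like. The names sketch_malloc and sketch_strdup
are hypothetical stand-ins invented for this example; Tor's real wrappers
are the ones named above, and their actual implementations may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a wrapper like tor_malloc(): allocate size
 * bytes and return them, or exit the process if allocation fails.
 * Callers never need to check the result for NULL. */
void *
sketch_malloc(size_t size)
{
  void *result;
  if (size == 0)
    size = 1; /* malloc(0) may return NULL even on success; avoid that. */
  result = malloc(size);
  if (!result) {
    fprintf(stderr, "Out of memory on malloc(). Dying.\n");
    exit(1);
  }
  return result;
}

/* Hypothetical stand-in for a wrapper like tor_strdup(): return a newly
 * allocated copy of the NUL-terminated string s, or exit on failure. */
char *
sketch_strdup(const char *s)
{
  size_t len;
  char *dup;
  assert(s);
  len = strlen(s) + 1;
  dup = sketch_malloc(len);
  memcpy(dup, s, len);
  return dup;
}
```

The point of the pattern is that error handling for allocation failure
lives in one place, so ordinary code can use the result directly.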
You can get a full list of the compatibility functions that Tor provides by
looking through src/common/util.h and src/common/compat.h. You can see the
available containers in src/common/containers.h. You should probably
familiarize yourself with these modules before you write too much code, or
else you'll wind up reinventing the wheel.

Use 'INLINE' instead of 'inline', so that we work properly on Windows.

Calling and naming conventions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Whenever possible, functions should return -1 on error and 0 on success.

For multi-word identifiers, use lowercase words combined with
underscores. (e.g., "multi_word_identifier"). Use ALL_CAPS for macros and
constants.

Typenames should end with "_t".

Function names should be prefixed with a module name or object name. (In
general, code to manipulate an object should be a module with the same name
as the object, so it's hard to tell which convention is used.)

Functions that do things should have imperative-verb names
(e.g. buffer_clear, buffer_resize); functions that return booleans should
have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).

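Here is a small sketch that applies these conventions together. The
buffer_t type below is a toy invented for this example (a fixed-size
array, with a made-up BUFFER_CAPACITY constant and a made-up
buffer_append function); it is not Tor's real buffer implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ALL_CAPS for a macro; the value is arbitrary for this sketch. */
#define BUFFER_CAPACITY 64

/* Typename ends with "_t". */
typedef struct buffer_t {
  char data[BUFFER_CAPACITY];
  size_t len;
} buffer_t;

/* Imperative-verb name, prefixed with the object name: remove all data
 * from buf. */
void
buffer_clear(buffer_t *buf)
{
  assert(buf);
  buf->len = 0;
}

/* Predicate name: return true iff buf holds no data. */
int
buffer_is_empty(const buffer_t *buf)
{
  assert(buf);
  return buf->len == 0;
}

/* Append the n bytes at bytes to buf. Return 0 on success, or -1 if they
 * would not fit (in which case buf is left unchanged). */
int
buffer_append(buffer_t *buf, const char *bytes, size_t n)
{
  assert(buf);
  assert(bytes || n == 0);
  if (n > BUFFER_CAPACITY - buf->len)
    return -1;
  memcpy(buf->data + buf->len, bytes, n);
  buf->len += n;
  return 0;
}
```

A caller can then write "if (buffer_append(&b, s, n) < 0)" for the error
path and "if (buffer_is_empty(&b))" for the predicate, and both read
naturally.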
If you find that you have four or more possible return code values, it's
probably time to create an enum. If you find that you are passing three or
more flags to a function, it's probably time to create a flags argument that
takes a bitfield.

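The rule of thumb above might look like this in practice. Everything in
this sketch is hypothetical (the parse_status_t enum, the PARSE_FLAG_*
bits, the PARSE_MAX limit, and parse_small_int itself); it only shows four
return codes gathered into an enum and three boolean options folded into
one flags argument.

```c
#include <assert.h>

/* Four possible outcomes, so use an enum rather than bare integers. */
typedef enum parse_status_t {
  PARSE_OK = 0,
  PARSE_ERR_EMPTY,
  PARSE_ERR_SYNTAX,
  PARSE_ERR_RANGE
} parse_status_t;

/* Three options, so pack them into a single flags argument. */
#define PARSE_FLAG_SKIP_SPACE (1u<<0) /* Skip leading spaces. */
#define PARSE_FLAG_ALLOW_SIGN (1u<<1) /* Accept a leading '-'. */
#define PARSE_FLAG_CLAMP      (1u<<2) /* Clamp instead of failing. */

#define PARSE_MAX 999

/* Parse a decimal integer of magnitude at most PARSE_MAX from s, as
 * modified by flags. On success, store it in *out and return PARSE_OK. */
parse_status_t
parse_small_int(const char *s, unsigned flags, int *out)
{
  int negative = 0;
  long value = 0;
  assert(s);
  assert(out);
  if (flags & PARSE_FLAG_SKIP_SPACE) {
    while (*s == ' ')
      ++s;
  }
  if (!*s)
    return PARSE_ERR_EMPTY;
  if ((flags & PARSE_FLAG_ALLOW_SIGN) && *s == '-') {
    negative = 1;
    ++s;
  }
  if (*s < '0' || *s > '9')
    return PARSE_ERR_SYNTAX;
  for (; *s >= '0' && *s <= '9'; ++s) {
    value = value * 10 + (*s - '0');
    if (value > PARSE_MAX) {
      if (!(flags & PARSE_FLAG_CLAMP))
        return PARSE_ERR_RANGE;
      value = PARSE_MAX;
    }
  }
  if (*s)
    return PARSE_ERR_SYNTAX;
  *out = (int)(negative ? -value : value);
  return PARSE_OK;
}
```

A call such as parse_small_int(str, PARSE_FLAG_SKIP_SPACE|PARSE_FLAG_CLAMP,
&v) stays readable at the call site, which a row of three bare 0/1
arguments would not.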
Don't optimize anything if it's not in the critical path. Right now, the
critical path seems to be AES, logging, and the network itself. Feel free to
do your own profiling to determine otherwise.

https://trac.torproject.org/projects/tor/wiki/doc/TorFAQ#loglevel

No error or warning messages should be expected during normal OR or OP
operation.

If a library function is currently called such that failure always means ERR,
then the library function should log WARN and let the caller log ERR.

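A sketch of that division of labor follows. The logger, the severity enum,
and both functions are hypothetical names made up for this example; they
stand in for Tor's real logging API, which is not reproduced here. The
counters exist only so the behavior is easy to observe.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical severities and logger, standing in for the real ones. */
typedef enum log_severity_t {
  SEVERITY_WARN,
  SEVERITY_ERR
} log_severity_t;

int n_warn_logged = 0;
int n_err_logged = 0;

void
sketch_log(log_severity_t severity, const char *msg)
{
  assert(msg);
  if (severity == SEVERITY_ERR)
    ++n_err_logged;
  else
    ++n_warn_logged;
  fprintf(stderr, "[%s] %s\n",
          severity == SEVERITY_ERR ? "err" : "warn", msg);
}

/* Library-level helper: check that fname can be opened for reading.
 * Return 0 on success and -1 on failure. It cannot know whether failure
 * is fatal for its caller, so it logs at WARN only. */
int
keyfile_check_readable(const char *fname)
{
  FILE *f;
  assert(fname);
  f = fopen(fname, "r");
  if (!f) {
    sketch_log(SEVERITY_WARN, "Couldn't open key file.");
    return -1;
  }
  fclose(f);
  return 0;
}

/* Caller: here a missing key file really is fatal, so this is the place
 * that escalates to ERR. Return 0 on success and -1 on failure. */
int
startup_load_keys(const char *fname)
{
  if (keyfile_check_readable(fname) < 0) {
    sketch_log(SEVERITY_ERR, "Can't start without a key file; giving up.");
    return -1;
  }
  return 0;
}
```

Keeping the library function at WARN means it can later be reused by a
caller for whom the same failure is recoverable.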
Every message of severity INFO or higher should either (A) be intelligible
to end-users who don't know the Tor source; or (B) somehow inform the
end-users that they aren't expected to understand the message (perhaps
with a string like "internal error"). Option (A) is to be preferred to
option (B).

We use the 'doxygen' utility to generate documentation from our
source code. Here's how to use it:

  1. Begin every file that should be documented with

         /**
          * \brief Short description of the file.
          */

     (Doxygen will recognize any comment beginning with /** as special.)

  2. Before any function, structure, #define, or variable you want to
     document, add a comment of the form:

         /** Describe the function's actions in imperative sentences.
          *
          * Use blank lines for paragraph breaks
          *
          * Write <b>argument_names</b> in boldface.
          *
          * \code
          *     place_example_code();
          *     between_code_and_endcode_commands();
          * \endcode
          */

  3. Make sure to escape the characters "<", ">", "\", "%" and "#" as "\<",
     "\>", "\\", "\%", and "\#".

  4. To document structure members, you can use two forms:

         /** You can put the comment before an element; */
         int b; /**< Or use the less-than symbol to put the comment
                 * after the element. */

  5. To generate documentation from the Tor source code, type:

     To generate a file called 'Doxyfile'. Edit that file and run
     'doxygen' to generate the API documentation.

  6. See the Doxygen manual for more information; this summary just
     scratches the surface.

Doxygen comment conventions
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Say what functions do as a series of one or more imperative sentences, as
though you were telling somebody how to be the function. In other words, DO
NOT say:

     /** The strtol function parses a number.
      *
      * nptr -- the string to parse. It can include whitespace.
      * endptr -- a string pointer to hold the first thing that is not part
      *    of the number, if present.
      * base -- the numeric base.
      * returns: the resulting number.
      */
     long strtol(const char *nptr, char **endptr, int base);

Instead, please DO say:

     /** Parse a number in radix <b>base</b> from the string <b>nptr</b>,
      * and return the result. Skip all leading whitespace. If
      * <b>endptr</b> is not NULL, set *<b>endptr</b> to the first character
      * after the number parsed.
      */
     long strtol(const char *nptr, char **endptr, int base);

Doxygen comments are the contract in our abstraction-by-contract world: if
the functions that call your function rely on it doing something, then your
function should mention that it does that something in the documentation. If
you rely on a function doing something beyond what is in its documentation,
then you should watch out, or it might do something else later.

Putting out a new release
-------------------------

Here are the steps Roger takes when putting out a new Tor release:

1) Use it for a while, as a client, as a relay, as a hidden service,
and as a directory authority. See if it has any obvious bugs, and
resolve those.

1.5) As applicable, merge the maint-X branch into the release-X branch.

2) Gather the changes/* files into a changelog entry, rewriting many
of them and reordering to focus on what users and funders would find
interesting and understandable.

   2.1) Make sure that everything that wants a bug number has one.
   2.2) Concatenate them.
   2.3) Sort them by section. Within each section, try to make the
        first entry or two and the last entry most interesting: they're
        the ones that skimmers tend to read.

      "Fixes bug 9999; bugfix on 0.3.3.3-alpha."

      One space after a period.

      Make stuff very terse.

      Make sure each section name ends with a colon.

      Describe the user-visible problem right away.

      Mention relevant config options by name. If they're rare or unusual,
      remind people what they're for.

      Avoid starting lines with open-paren.

      Present and imperative tense: not past.

      Try not to let any given section be longer than about a page. Break up
      long sections into subsections by some sort of common subtopic. This
      guideline is especially important when organizing Release Notes for
      new stable releases.

      If a given changes stanza showed up in a different release (e.g.
      maint-0.2.1), be sure to make the stanzas identical (so people can
      tell whether these are the same change).

   2.6) Clean everything one last time.

   2.7) Run it through fmt to make it pretty.

3) Compose a short release blurb to highlight the user-facing
changes. Insert said release blurb into the ChangeLog stanza. If it's
a stable release, add it to the ReleaseNotes file too. If we're adding
to a release-0.2.x branch, manually commit the changelogs to the later
git branches.

4) Bump the version number in configure.ac and rebuild.

5) Make dist, put the tarball up somewhere, and tell #tor about it. Wait
a while to see if anybody has problems building it. Try to get Sebastian
or somebody to try building it on Windows.

6) Get at least two of weasel/arma/sebastian to put the new version number
in their approved versions list.

7) Sign the tarball, then sign and push the git tag:

  gpg -ba <the_tarball>
  git tag -u <keyid> tor-0.2.x.y-status
  git push origin tag tor-0.2.x.y-status

8) scp the tarball and its sig to the website in the dist/ directory
(i.e. /srv/www-master.torproject.org/htdocs/dist/ on vescum). Edit
include/versions.wmi to note the new version. From your website checkout,
run ./publish to build and publish the website.

Try not to delay too much between scp'ing the tarball and running
./publish -- the website has multiple A records and your scp only sent
it to one of them.

9) Email Erinn and weasel (cc'ing tor-assistants) that a new tarball
is up. This step should probably change to mailing more packagers.

10) Add the version number to Trac. To do this, go to Trac, log in,
select "Admin" near the top of the screen, then select "Versions" from
the menu on the left. At the right, there will be an "Add version"
box. By convention, we enter the version in the form "Tor:
0.2.2.23-alpha" (or whatever the version is), and we select the date as
the date in the ChangeLog.

11) Forward-port the ChangeLog.

12) Update the topic in #tor to reflect the new version.

13) Wait up to a day or two (for a development release), or until most
packages are up (for a stable release), and mail the release blurb and
changelog to tor-talk or tor-announce.

(We might be moving to faster announcements, but don't announce until
the website is at least updated.)