# Writing tests for Tor: an incomplete guide

Tor uses a variety of testing frameworks and methodologies to try to
keep from introducing bugs.  The major ones are:

   1. Unit tests written in C and shipped with the Tor distribution.

   2. Integration tests written in Python 2 (>= 2.7) or Python 3
      (>= 3.1) and shipped with the Tor distribution.

   3. Integration tests written in Python and shipped with the Stem
      library.  Some of these use the Tor controller protocol.

   4. System tests written in Python and shell, and shipped with the
      Chutney package.  These work by running many instances of Tor
      locally, and sending traffic through them.

   5. The Shadow network simulator.

## How to run these tests

### The easy version

To run all the tests that come bundled with Tor, run `make check`.

To run the Stem tests as well, fetch stem from the git repository,
set `STEM_SOURCE_DIR` to the checkout, and run `make test-stem`.

To run the Chutney tests as well, fetch chutney from the git repository,
set `CHUTNEY_PATH` to the checkout, and run `make test-network`.

To run all of the above, run `make test-full`.

To run all of the above, plus tests that require a working connection to the
internet, run `make test-full-online`.

### Running particular subtests

The Tor unit tests are divided into separate programs and a couple of
bundled unit test programs.

Separate programs are easy.  For example, to run the memwipe tests in
isolation, you just run `./src/test/test-memwipe`.

To run tests within the unit test programs, you can specify the name
of the test.  The string ".." can be used as a wildcard at the end of the
test name.  For example, to run all the cell format tests, enter
`./src/test/test cellfmt/..`.

Many tests that need to mess with global state run in forked subprocesses in
order to keep from contaminating one another.  But when debugging a failing test,
you might want to run it without forking a subprocess.  To do so, use the
`--no-fork` option with a single test.  (If you specify it along with
multiple tests, they might interfere.)

You can turn on logging in the unit tests by passing one of `--debug`,
`--info`, `--notice`, or `--warn`.  By default only errors are displayed.

Unit tests are divided into `./src/test/test` and `./src/test/test-slow`.
The former are those that should finish in a few seconds; the latter tend to
take more time, and may include CPU-intensive operations, deliberate delays,
and stuff like that.
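
The forked-subprocess isolation described in this section can be illustrated
with a small standalone sketch.  This is not how tinytest implements forking:
`run_forked`, `test_that_mutates_global`, `check_fork_isolation`, and
`global_counter` are hypothetical names made up for this example, and the code
assumes a POSIX system.  It only shows why a forked test can scribble on
global state without affecting the process that runs the other tests.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical piece of global state that a test wants to modify. */
static int global_counter = 0;

/* Hypothetical test body: returns 0 on success, nonzero on failure. */
static int
test_that_mutates_global(void)
{
    global_counter = 42;
    return global_counter == 42 ? 0 : 1;
}

/* Run test_fn in a child process and report its result.  Whatever the
 * child does to its globals happens in the child's copy of memory. */
static int
run_forked(int (*test_fn)(void))
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0)
        _exit(test_fn() == 0 ? 0 : 1);  /* child: run the test and exit */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}

/* Driver: the test passes in the child, and the parent's copy of the
 * global is untouched afterwards. */
static int
check_fork_isolation(void)
{
    int r;
    global_counter = 0;
    r = run_forked(test_that_mutates_global);
    assert(global_counter == 0);
    return r;
}
```

Calling `test_that_mutates_global()` directly, which is roughly the situation
you are in with `--no-fork`, would leave `global_counter` modified for
whatever runs next.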

## Finding test coverage

Test coverage is a measurement of which lines your tests actually visit.

When you configure Tor with the `--enable-coverage` option, it should
build with support for coverage in the unit tests, and in a special
`tor-cov` binary.

Then, run the tests you'd like to see coverage from.  If you have old
coverage output, you may need to run `make reset-gcov` first.

Now you've got a bunch of files scattered around your build directories
called `*.gcda`.  In order to extract the coverage output from them, make a
temporary directory for them and run `./scripts/test/coverage ${TMPDIR}`,
where `${TMPDIR}` is the temporary directory you made.  This will create a
`.gcov` file for each source file under test, containing that file's source
annotated with the number of times the tests hit each line.  (You'll need to
have gcov installed.)

You can get a summary of the test coverage for each file by running
`./scripts/test/cov-display ${TMPDIR}/*`.  Each line lists the file's name,
its line counts (including the number of uncovered lines), and the
coverage percentage.

For a summary of the test coverage for each _function_, run
`./scripts/test/cov-display -f ${TMPDIR}/*`.

For more details on using gcov, including the helper scripts in
`scripts/test`, see HelpfulTools.md.

### Comparing test coverage

Sometimes it's useful to compare test coverage for a branch you're writing to
coverage from another branch (such as git master, for example).  But you
can't run `diff` on the two coverage outputs directly, since the actual
number of times each line is executed isn't so important, and isn't wholly
deterministic.

Instead, follow the instructions above for each branch, creating a separate
temporary directory for each.  Then, run
`./scripts/test/cov-diff ${D1} ${D2}`, where D1 and D2 are the directories
you want to compare.  This will produce a diff of the two directories, with
all lines normalized to be either covered or uncovered.

To count new or modified uncovered lines in D2, you can run:

```console
$ ./scripts/test/cov-diff ${D1} ${D2} | grep '^+ *\#' | wc -l
```

## Marking lines as unreachable by tests

You can mark a specific line as unreachable by using the special
string `LCOV_EXCL_LINE`.  You can mark a range of lines as unreachable
with `LCOV_EXCL_START` ... `LCOV_EXCL_STOP`.  Note that older versions of
lcov don't understand these lines.

You can post-process `.gcov` files to make these lines 'unreached' by
running `./scripts/test/cov-exclude` on them.  It marks excluded
unreached lines with 'x', and excluded reached lines with '!!!'.

Note: you should never do this unless the line is meant to be 100%
unreachable by actual code.

## What kinds of test should I write?

Integration testing and unit testing are complementary: it's probably a
good idea to make sure that your code is hit by both if you can.

If your code is very low-level, and its behavior is easily described in
terms of a relation between inputs and outputs, or a set of state
transitions, then it's a natural fit for unit tests.  (If not, please
consider refactoring it until most of it _is_ a good fit for unit
tests!)

If your code adds new externally visible functionality to Tor, it would
be great to have a test for that functionality.  That's where
integration tests more usually come in.

## Unit and regression tests: Does this function do what it's supposed to?

Most of Tor's unit tests are made using the "tinytest" testing framework.
You can see a guide to using it in the tinytest manual at

    https://github.com/nmathewson/tinytest/blob/master/tinytest-manual.md

To add a new test of this kind, either edit an existing C file in `src/test/`,
or create a new C file there.  Each test is a single function that must
be indexed in the table at the end of the file.  We use the label "done:" as
a cleanup point for all test functions.

If you have created a new test file, you will need to:

1. Add the new test file to `include.am`
2. In `test.h`, include the new test cases (`testcase_t`)
3. In `test.c`, add the new test cases to `testgroup_t testgroups`

(Make sure you read `tinytest-manual.md` before proceeding.)

I use the terms "unit test" and "regression test" very sloppily here.

## A simple example

Here's an example of a test function for a simple function in util.c:

```c
static void
test_util_writepid(void *arg)
{
    (void) arg;

    char *contents = NULL;
    const char *fname = get_fname("tmp_pid");
    unsigned long pid;
    char c;

    write_pidfile(fname);

    contents = read_file_to_str(fname, 0, NULL);
    tt_assert(contents);

    int n = sscanf(contents, "%lu\n%c", &pid, &c);
    tt_int_op(n, OP_EQ, 1);
    tt_int_op(pid, OP_EQ, getpid());

done:
    tor_free(contents);
}
```

This should look pretty familiar to you if you've read the tinytest
manual.  One thing to note here is that we use the testing-specific
function `get_fname` to generate a file with respect to a temporary
directory that the tests use.  You don't need to delete the file;
it will get removed when the tests are done.

Also note our use of `OP_EQ` instead of `==` in the `tt_int_op()` calls.
We define `OP_*` macros to use instead of the binary comparison
operators so that analysis tools can more easily parse our code.
(Coccinelle really hates to see `==` used as a macro argument.)

Finally, remember that by convention, all `*_free()` functions that
Tor defines are defined to accept NULL harmlessly.  Thus, you don't
need to say `if (contents)` in the cleanup block.
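
The `done:` cleanup point and NULL-tolerant free functions work together.
Here is a standalone sketch of that pattern using only the C standard
library.  `sketch_free` is a hypothetical stand-in written for this example
(it is not Tor's `tor_free`), and `check_cleanup_pattern` is likewise made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a NULL-tolerant free.  free(NULL) is a no-op
 * in standard C; we also reset the pointer so it can't be reused. */
#define sketch_free(p) do { free(p); (p) = NULL; } while (0)

/* Returns 0 on success and -1 on failure.  Every exit path goes through
 * the "done:" label, so the cleanup code is written exactly once. */
static int
check_cleanup_pattern(int fail_early)
{
    int result = -1;
    char *first = NULL;
    char *second = NULL;

    first = malloc(16);
    if (!first)
        goto done;
    strcpy(first, "hello");

    if (fail_early)
        goto done; /* "second" is still NULL here, and that's fine. */

    second = malloc(16);
    if (!second)
        goto done;
    strcpy(second, "world");

    result = 0;

done:
    /* No "if (first)" or "if (second)" checks are needed. */
    sketch_free(first);
    sketch_free(second);
    assert(first == NULL && second == NULL);
    return result;
}
```

Because every pointer starts out as NULL, jumping to `done:` at any point is
safe, no matter how far the function got.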

## Exposing static functions for testing

Sometimes you need to test a function, but you don't want to expose
it outside its usual module.

To support this, Tor's build system compiles a testing version of
each module, with extra identifiers exposed.  If you want to
declare a function as static but available for testing, use the
macro `STATIC` instead of `static`.  Then, make sure there's a
macro-protected declaration of the function in the module's header.

For example, `crypto_curve25519.h` contains:

```c
#ifdef CRYPTO_CURVE25519_PRIVATE
STATIC int curve25519_impl(uint8_t *output, const uint8_t *secret,
        const uint8_t *basepoint);
#endif
```

The `crypto_curve25519.c` file and the `test_crypto.c` file both define
`CRYPTO_CURVE25519_PRIVATE`, so they can see this declaration.
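
To make the idea concrete, here is a simplified, standalone sketch of how a
`STATIC`-style macro can work.  The names `SKETCH_UNIT_TESTS`,
`SKETCH_STATIC`, `SKETCH_MODULE_PRIVATE`, `clamp_percent`, and
`check_clamp_percent` are hypothetical; Tor's real definitions live in its own
headers and may differ in detail.

```c
#include <assert.h>

/* A test build would normally define this on the compiler command line.
 * We define it here so that the sketch compiles as a "test build". */
#define SKETCH_UNIT_TESTS

/* Both the module's .c file and the test file define this before
 * including the header, so both can see the private declaration. */
#define SKETCH_MODULE_PRIVATE

#ifdef SKETCH_UNIT_TESTS
#define SKETCH_STATIC          /* test build: visible to the tests */
#else
#define SKETCH_STATIC static   /* regular build: private to the module */
#endif

/* --- What would go in the module's header --- */
#ifdef SKETCH_MODULE_PRIVATE
SKETCH_STATIC int clamp_percent(int value);
#endif

/* --- What would go in the module's .c file --- */
SKETCH_STATIC int
clamp_percent(int value)
{
    if (value < 0)
        return 0;
    if (value > 100)
        return 100;
    return value;
}

/* --- What a test file could then do --- */
static int
check_clamp_percent(void)
{
    assert(clamp_percent(-5) == 0);
    assert(clamp_percent(42) == 42);
    assert(clamp_percent(250) == 100);
    return 0;
}
```

In a regular build of this sketch, `SKETCH_STATIC` expands to `static`, and
the function stays private to its module as usual.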

## STOP!  Does this test really test?

When writing tests, it's not enough to just generate coverage on all the
lines of the code that you're testing:  It's important to make sure that
the test _really tests_ the code.

For example, here is a _bad_ test for the unlink() function (which is
supposed to remove a file).

```c
static void
test_unlink_badly(void *arg)
{
    (void) arg;
    int r;

    const char *fname = get_fname("tmpfile");

    /* If the file isn't there, unlink returns -1 and sets ENOENT */
    r = unlink(fname);
    tt_int_op(r, OP_EQ, -1);
    tt_int_op(errno, OP_EQ, ENOENT);

    /* If the file DOES exist, unlink returns 0. */
    write_str_to_file(fname, "hello world", 0);
    r = unlink(fname);
    tt_int_op(r, OP_EQ, 0);

done:
    ; /* Nothing to free in this test. */
}
```

This test might get very high coverage on unlink().  So why is it a
bad test? Because it doesn't check that unlink() *actually removes the
named file*!

Remember, the purpose of a test is to succeed if the code does what
it's supposed to do, and fail otherwise.  Try to design your tests so
that they check for the code's intended and documented functionality
as much as possible.
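
For contrast, a version that checks the documented effect might look like the
following standalone sketch.  It uses plain `assert()` and POSIX calls rather
than tinytest so that it compiles on its own; `check_unlink_removes_file` is a
name made up for this example, and the caller supplies a scratch file name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if unlink() behaves as documented, -1 if the scratch file
 * could not be created at all. */
static int
check_unlink_removes_file(const char *fname)
{
    struct stat st;
    FILE *f;
    int r;

    /* Start from a known state: the file must not exist yet. */
    (void) unlink(fname);

    /* If the file isn't there, unlink returns -1 and sets ENOENT. */
    errno = 0;
    r = unlink(fname);
    assert(r == -1);
    assert(errno == ENOENT);

    /* Create the file, and confirm that it really exists. */
    f = fopen(fname, "w");
    if (!f)
        return -1;
    fputs("hello world", f);
    fclose(f);
    r = stat(fname, &st);
    assert(r == 0);

    /* If the file DOES exist, unlink returns 0... */
    r = unlink(fname);
    assert(r == 0);

    /* ...and, crucially, the file is gone afterwards. */
    errno = 0;
    r = stat(fname, &st);
    assert(r == -1);
    assert(errno == ENOENT);

    return 0;
}
```

The last three assertions are the ones the bad test was missing: they would
fail against an unlink() that returned 0 without removing anything.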

## Mock functions for testing in isolation

Often we want to test that a function works right, but the function to
be tested depends on other functions whose behavior is hard to observe,
or which require a working Tor network, or something like that.

To write tests for this case, you can replace the underlying functions
with testing stubs while your unit test is running.  You need to declare
the underlying function as 'mockable', as follows:

```c
MOCK_DECL(returntype, functionname, (argument list));
```

and then later implement it as:

```c
MOCK_IMPL(returntype, functionname, (argument list))
{
    /* implementation here */
}
```

For example, if you had a 'connect to remote server' function, you could
declare it as:

```c
MOCK_DECL(int, connect_to_remote, (const char *name, status_t *status));
```

When you declare a function this way, it will be declared as normal in
regular builds, but when the module is built for testing, it is declared
as a function pointer initialized to the actual implementation.

In your tests, if you want to override the function with a temporary
replacement, you say:

```c
MOCK(functionname, replacement_function_name);
```

And later, you can restore the original function with:

```c
UNMOCK(functionname);
```

For more information, see the definitions of this mocking logic in
`testsupport.h`.
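
The function-pointer mechanism can be sketched in a few lines of standalone
C.  The `SKETCH_*` macros below are simplified stand-ins written for this
guide, not copies of the real definitions in `testsupport.h`; the type
`remote_status_t` and the functions `is_remote_reachable`, `mock_connect_ok`,
and `check_mocking` are hypothetical too.

```c
#include <assert.h>

/* Simplified stand-ins for the mocking macros, as a test build might
 * define them.  (Assumption: the real macros may differ in detail.) */
#define SKETCH_MOCK_DECL(rv, fn, args) \
    rv fn##_real args;                 \
    extern rv (*fn) args
#define SKETCH_MOCK_IMPL(rv, fn, args) \
    rv (*fn) args = fn##_real;         \
    rv fn##_real args
#define SKETCH_MOCK(fn, replacement) \
    do { (fn) = (replacement); } while (0)
#define SKETCH_UNMOCK(fn) \
    do { (fn) = fn##_real; } while (0)

/* Hypothetical status type for the 'connect to remote server' example. */
typedef struct {
    int connected;
} remote_status_t;

SKETCH_MOCK_DECL(int, connect_to_remote,
                 (const char *name, remote_status_t *status));

/* The "real" implementation.  Here it always fails, standing in for
 * code that would need a working network. */
SKETCH_MOCK_IMPL(int, connect_to_remote,
                 (const char *name, remote_status_t *status))
{
    (void) name;
    status->connected = 0;
    return -1;
}

/* Code under test: it calls through the function pointer without
 * knowing whether a mock is installed. */
static int
is_remote_reachable(const char *name)
{
    remote_status_t status = { 0 };
    return connect_to_remote(name, &status) == 0 && status.connected;
}

/* Replacement used only by the test. */
static int
mock_connect_ok(const char *name, remote_status_t *status)
{
    (void) name;
    status->connected = 1;
    return 0;
}

static int
check_mocking(void)
{
    assert(!is_remote_reachable("example"));  /* real function */

    SKETCH_MOCK(connect_to_remote, mock_connect_ok);
    assert(is_remote_reachable("example"));   /* mock installed */

    SKETCH_UNMOCK(connect_to_remote);
    assert(!is_remote_reachable("example"));  /* real function again */
    return 0;
}
```

Note that the test restores the original function when it is done, so that
later tests in the same process don't inherit the mock by accident.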

## Okay but what should my tests actually do?

We talk above about "test coverage" -- making sure that your tests visit
every line of code, or every branch of code.  But visiting the code isn't
enough: we want to verify that it's correct.

So when writing tests, try to make tests that should pass with any correct
implementation of the code, and that should fail if the code doesn't do what
it's supposed to do.

You can write "black-box" tests or "glass-box" tests.  A black-box test is
one that you write without looking at the structure of the function.  A
glass-box one is one you implement while looking at how the function is
implemented.

In either case, make sure to consider common cases *and* edge cases; success
cases and failure cases.

For example, consider testing this function:

```c
/** Remove all elements E from sl such that E==element.  Preserve
 * the order of any elements before E, but elements after E can be
 * rearranged.
 */
void smartlist_remove(smartlist_t *sl, const void *element);
```

In order to test it well, you should write tests for at least all of the
following cases.  (These would be black-box tests, since we're only looking
at the declared behavior for the function):

   * Remove an element that is in the smartlist.
   * Remove an element that is not in the smartlist.
   * Remove an element that appears in the smartlist more than once.

And your tests should verify that it behaves correctly.  At minimum, you
should test:

   * That other elements before E are in the same order after you call the
     function.
   * That the target element is really removed.
   * That _only_ the target element is removed.

When you consider edge cases, you might try:

   * Remove an element from an empty list.
   * Remove an element from a singleton list containing that element.
   * Remove an element from a list containing several instances of that
     element, and nothing else.

Now let's look at the implementation:

```c
void
smartlist_remove(smartlist_t *sl, const void *element)
{
    int i;
    if (element == NULL)
        return;
    for (i=0; i < sl->num_used; i++)
        if (sl->list[i] == element) {
            sl->list[i] = sl->list[--sl->num_used]; /* swap with the end */
            i--; /* so we process the new i'th element */
            sl->list[sl->num_used] = NULL;
        }
}
```

Based on the implementation, we now see three more edge cases to test:

   * Removing NULL from the list.
   * Removing an element from the end of the list.
   * Removing an element from a position other than the end of the list.
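
Several of the cases listed in this section can be exercised in a standalone
sketch.  `mini_smartlist_t` is a made-up, fixed-size stand-in that has only
the two fields the implementation above touches; it is not Tor's real
`smartlist_t`, and the helpers `mini_fill`, `mini_contains`, and
`check_remove_cases` are hypothetical as well.  Plain `assert()` stands in for
the tinytest macros.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in with just the fields the implementation uses. */
typedef struct {
    void *list[8];
    int num_used;
} mini_smartlist_t;

/* Same algorithm as the implementation quoted above. */
static void
mini_smartlist_remove(mini_smartlist_t *sl, const void *element)
{
    int i;
    if (element == NULL)
        return;
    for (i = 0; i < sl->num_used; i++) {
        if (sl->list[i] == element) {
            sl->list[i] = sl->list[--sl->num_used];
            i--;
            sl->list[sl->num_used] = NULL;
        }
    }
}

/* Reset sl to hold exactly the n pointers in items. */
static void
mini_fill(mini_smartlist_t *sl, void **items, int n)
{
    int i;
    sl->num_used = n;
    for (i = 0; i < n; i++)
        sl->list[i] = items[i];
}

static int
mini_contains(const mini_smartlist_t *sl, const void *element)
{
    int i;
    for (i = 0; i < sl->num_used; i++)
        if (sl->list[i] == element)
            return 1;
    return 0;
}

static int
check_remove_cases(void)
{
    int a, b, c, d;
    mini_smartlist_t sl;
    void *two[] = { &a, &b };
    void *three[] = { &a, &b, &c };
    void *four[] = { &a, &b, &c, &d };
    void *dups[] = { &a, &b, &a, &c, &a };
    void *only[] = { &a, &a, &a };

    /* Empty list: nothing to remove, and nothing breaks. */
    mini_fill(&sl, NULL, 0);
    mini_smartlist_remove(&sl, &a);
    assert(sl.num_used == 0);

    /* Target not in the list, and NULL: the list is unchanged. */
    mini_fill(&sl, two, 2);
    mini_smartlist_remove(&sl, &c);
    mini_smartlist_remove(&sl, NULL);
    assert(sl.num_used == 2 && sl.list[0] == &a && sl.list[1] == &b);

    /* Target at the end of the list. */
    mini_fill(&sl, three, 3);
    mini_smartlist_remove(&sl, &c);
    assert(sl.num_used == 2 && sl.list[0] == &a && sl.list[1] == &b);

    /* Target in the middle: earlier elements keep their order, and
     * only the target is removed. */
    mini_fill(&sl, four, 4);
    mini_smartlist_remove(&sl, &c);
    assert(sl.num_used == 3 && sl.list[0] == &a && sl.list[1] == &b);
    assert(mini_contains(&sl, &d) && !mini_contains(&sl, &c));

    /* Target appears several times, including first and last. */
    mini_fill(&sl, dups, 5);
    mini_smartlist_remove(&sl, &a);
    assert(sl.num_used == 2 && !mini_contains(&sl, &a));
    assert(mini_contains(&sl, &b) && mini_contains(&sl, &c));

    /* The list holds nothing but the target. */
    mini_fill(&sl, only, 3);
    mini_smartlist_remove(&sl, &a);
    assert(sl.num_used == 0);

    return 0;
}
```

The duplicate-element check deliberately asserts only on membership and
length, not on the final order, because the documented behavior allows
elements after the removed one to be rearranged.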

## What should my tests NOT do?

Tests shouldn't require a network connection.

Whenever possible, tests shouldn't take more than a second.  If a test
genuinely needs more time than that, put it into `test-slow`.

Tests should not alter global state unless they run with `TT_FORK`.  Tests
should not require other tests to be run before or after them.

Tests should not leak memory or other resources.  To find out if your tests
are leaking memory, run them under valgrind (see HelpfulTools.md for more
information on how to do that).

When possible, tests should not be over-fit to the implementation.  That is,
the test should verify that the documented behavior is implemented, but
should not break if other permissible behavior is later implemented.

## Advanced techniques: Namespaces

Sometimes, when you're doing a lot of mocking at once, it's convenient to
isolate your identifiers within a single namespace.  If this were C++, we'd
already have namespaces, but for C, we do the best we can with macros and
token-pasting.

We have some macros defined for this purpose in `src/test/test.h`.  To use
them, you define `NS_MODULE` to a prefix to be used for your identifiers, and
then use other macros in place of identifier names.  See `src/test/test.h` for
more documentation.

## Integration tests: Calling Tor from the outside

Some tests need to invoke Tor from the outside, and shouldn't run from the
same process as the Tor test program.  Reasons for doing this might include:

   * Testing the actual behavior of Tor when run from the command line.
   * Testing that a crash-handler correctly logs a stack trace.
   * Verifying that violating a sandbox or capability requirement will
     actually crash the program.
   * Needing to run as root in order to test capability inheritance or
     user switching.

To add one of these, you generally want a new C program in `src/test`.  Add it
to `TESTS` and `noinst_PROGRAMS` if it can run on its own and return success or
failure.  If it needs to be invoked multiple times, or it needs to be
wrapped, add a new shell script to `TESTS`, and the new program to
`noinst_PROGRAMS`.  If you need access to any environment variable from the
makefile (e.g. `${PYTHON}` for a Python interpreter), then make sure that the
makefile exports it.

## Writing integration tests with Stem

The 'stem' library includes extensive tests for the Tor controller protocol.
You can run stem tests from tor with `make test-stem`, or see
`https://stem.torproject.org/faq.html#how-do-i-run-the-tests`.

To see what tests are available, have a look around the `test/*` directory in
stem. The first thing you'll notice is that there are both `unit` and `integ`
tests. The former are for tests of the facilities provided by stem itself that
can be tested on their own, without the need to hook up a tor process. These
are less relevant, unless you want to develop a new stem feature. The latter,
however, are a very useful tool to write tests for controller features. They
provide a default environment with a connected tor instance that can be
modified and queried. Adding more integration tests is a great way to increase
the test coverage inside Tor, especially for controller features.

Let's assume you actually want to write a test for a previously untested
controller feature. I'm picking the `exit-policy/*` GETINFO queries. Since
these are a controller feature that we want to write an integration test for,
the right file to modify is
`https://gitweb.torproject.org/stem.git/tree/test/integ/control/controller.py`.

First off we notice that there is an integration test called
`test_get_exit_policy()` that's already written. This exercises the interaction
of stem's `Controller.get_exit_policy()` method, and is not relevant for our
test since there are no stem methods to make use of all `exit-policy/*`
queries. (If there were, they would likely be tested already. Maybe you want
to write a stem feature, but I chose to just add tests.)

Our test requires a tor controller connection, so we'll use the
`@require_controller` annotation for our `test_exit_policy()` method. We need a
controller instance, which we get from
`test.runner.get_runner().get_tor_controller()`. The attached Tor instance is
configured as a client, but the exit-policy GETINFO queries need a relay to
work, so we have to change the config (using `controller.set_options()`). This
is OK for us to do; we just have to remember to set DisableNetwork so we don't
actually start an exit relay, and also to undo the changes we made (by calling
`controller.reset_conf()` at the end of our test). Additionally, we have to
configure a static Address for Tor to use, because it refuses to build a
descriptor when it can't guess a suitable IP address. Unfortunately, these
kinds of tripwires are everywhere. Don't forget to file appropriate tickets if
you notice any strange behaviour that seems totally unreasonable.

Check out the `test_exit_policy()` function in the above-mentioned file to see
the final implementation for this test.

## System testing with Chutney

The 'chutney' program configures and launches a set of Tor relays,
authorities, and clients on your local host.  It has a `test network`
functionality to send traffic through them and verify that the traffic
arrives correctly.

You can write new test networks by adding them to `networks`. To add
them to Tor's tests, add them to the `test-network` or `test-network-all`
targets in `Makefile.am`.

(Adding new kinds of program to chutney will still require hacking the
code.)

## Other integration tests

It's fine to write tests that use a POSIX shell to invoke Tor or test other
aspects of the system.  When you do this, have a look at our existing tests
of this kind in `src/test/` to make sure that you haven't forgotten anything
important.  For example: it can be tricky to make sure you're invoking Tor at
the right path in various build scenarios.

We use a POSIX shell whenever possible here, and we use the shellcheck tool
to make sure that our scripts are portable.  We should only require bash for
scripts that are developer-only.