t/README-WRITING-TESTS

Writing Tests
=============

The point of the testing library (available by sourcing the
test-lib.sh file) is to assist with rapidly writing robust
tests that produce TAP-compliant output.  (For a quick primer
on TAP see the README file in the section "TAP - A Quick Overview".)

For a reference guide to the testing library itself see the
README-TESTLIB file.  For a "how-to" on writing tests using the testing
library, keep reading.


-----------------
Test Script Names
-----------------

Test scripts should be executable POSIX-shell scripts named with an
initial `t` followed by four (4) digits, a hyphen, more descriptive text
and a `.sh` suffix.  In other words something like:

    tNNNN-my-test-script.sh

where the "NNNN" part is a four-digit decimal number.
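
The naming rule can be checked with a plain shell pattern.  This is only a
minimal sketch; the `is_test_script_name` helper is hypothetical and is not
something the testing library provides:

```sh
# Hypothetical helper (not part of test-lib.sh): succeed only if "$1" looks
# like tNNNN-descriptive-text.sh
is_test_script_name() {
    case "$1" in
        t[0-9][0-9][0-9][0-9]-?*.sh) return 0 ;;
        *) return 1 ;;
    esac
}

for name in t0000-test-true.sh t123-too-short.sh test-lib.sh; do
    if is_test_script_name "$name"; then
        echo "$name: looks like a test script"
    else
        echo "$name: does not match"
    fi
done
```

Only the first of the three names should be reported as a match.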


--------------------
Test Script Contents
--------------------

Each executable test script file should contain (in this order) these
elements:

  1. A "shebang" line `#!/bin/sh` as the first line
  2. A non-empty assignment to the `test_description` variable
  3. A line sourcing the test library with `. ./test-lib.sh`
  4. A (barely optional) call to the `test_plan` function if you're nice
  5. One or more calls to `test_expect_...`/`test_tolerate_...` functions
  6. A call to the `test_done` function

Additional shell variable assignments, function definitions and other shell
code may be interspersed between the "shebang" line and the `test_done` line
(since `test_done` causes `exit` to be called, nothing after it will be run).

Here's an example `t0000-test-true.sh` script:

```sh
#!/bin/sh

test_description='test the true utility'

. ./test-lib.sh

test_plan 2

test_expect_success 'test true utility' '
        true
'

test_expect_failure 'test ! true utility' '
        ! true
'

test_done
```


test_plan
~~~~~~~~~

The `test_plan 2` line causes a `1..2` line to be output to standard output
telling the TAP processor to expect two test result lines.

The TAP protocol allows this line to be output either _before all_ or
_after all_ of the test result lines.  Calling `test_plan` causes it to be
output before; omitting the `test_plan` call causes it to be output after,
along with a warning, when the `test_done` function is called.

If you are nice and can count, you include a `test_plan` call so that the TAP
harness can output a decent progress display for test scripts with a lot of
subtests in them.  If you are not so nice (or just plain lazy) you don't.
(If the number of subtests truly varies there's an option for that as well.)


test_expect_success
~~~~~~~~~~~~~~~~~~~

The example `test_expect_success` call shown above essentially becomes this:

```sh
if eval "true"; then
        echo "ok 1 - test true utility"
else
        echo "not ok 1 - test true utility"
fi
```


test_expect_failure
~~~~~~~~~~~~~~~~~~~

The example `test_expect_failure` call shown above essentially becomes this:

```sh
if eval "! true"; then
        echo "ok 2 - test ! true utility # TODO known breakage vanished"
else
        echo "not ok 2 - test ! true utility # TODO known breakage"
fi
```
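
Putting the plan line and the two expansions together gives a rough idea of
the TAP stream for the example script.  The stand-alone sketch below does not
source the testing library; the `emit_example_tap` function is hypothetical and
the real library may well output additional diagnostic lines:

```sh
# Stand-alone sketch: no test library is sourced here.
emit_example_tap() {
    echo "1..2"    # what `test_plan 2` outputs
    if eval "true"; then
        echo "ok 1 - test true utility"
    else
        echo "not ok 1 - test true utility"
    fi
    if eval "! true"; then
        echo "ok 2 - test ! true utility # TODO known breakage vanished"
    else
        echo "not ok 2 - test ! true utility # TODO known breakage"
    fi
}

emit_example_tap
```

Since `! true` fails, the second result line is the "not ok ... # TODO known
breakage" one, which a TAP harness does not count as a failure.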


---------------------
Non-Zero Result Codes
---------------------

Sometimes a test "passes" when the command being run returns a non-zero result
code.

For example, this must produce a non-zero result code to pass:

    git -c my.bad=nada config --bool my.bad

So you could simply write this into the test script:

    ! git -c my.bad=nada config --bool my.bad

The problem with that is that _any_ non-zero result code will cause the test
to succeed even if the command dies because of a signal or because the command
wasn't found or wasn't executable.
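
The effect is easy to reproduce without `git` or the testing library.  In this
small sketch the command name is deliberately bogus, so the shell itself
reports the "not found" status (127 in a POSIX shell) and `!` happily turns
that into success:

```sh
# Deliberately bogus command name: the shell reports "command not found"
missing_status=0
no-such-command-for-this-demo 2>/dev/null || missing_status=$?

# Negating it with '!' turns that failure into "success"
if ! no-such-command-for-this-demo 2>/dev/null; then
    negated_result=passed
else
    negated_result=failed
fi

echo "status=$missing_status negated=$negated_result"
```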

The testing library provides three different functions to help with this:

  * `test_must_fail`
    Any non-signal exit failure is allowed (but it can be extended with an
    optional first argument to also permit success and/or `SIGPIPE`).
  * `test_might_fail`
    This is just a shortcut for calling `test_must_fail` with the optional
    first argument to also allow success.  The end result being that any
    non-signal error _or_ success is allowed.
  * `test_expect_code`
    The required first argument is the explicit (and only) allowed exit code.
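
To get a feel for the distinction these functions draw, here is a rough
stand-alone sketch.  The `must_fail_sketch` and `expect_code_sketch` helpers are
hypothetical stand-ins, not the library's implementation, and the status ranges
they use (above 128 for signals, 126 and 127 for "not executable" and "not
found") are just the usual shell conventions:

```sh
# Hypothetical stand-in for a "must fail" check; NOT the library's code.
must_fail_sketch() {
    sketch_status=0
    "$@" >/dev/null 2>&1 || sketch_status=$?
    if [ "$sketch_status" -eq 0 ]; then
        return 1    # the command succeeded, which is not a failure
    elif [ "$sketch_status" -gt 128 ]; then
        return 1    # by shell convention the command died from a signal
    elif [ "$sketch_status" -eq 126 ] || [ "$sketch_status" -eq 127 ]; then
        return 1    # not executable (126) or not found (127)
    fi
    return 0        # an ordinary non-zero exit
}

# Hypothetical stand-in for an "exact exit code" check.
expect_code_sketch() {
    sketch_want=$1
    shift
    sketch_status=0
    "$@" >/dev/null 2>&1 || sketch_status=$?
    [ "$sketch_status" -eq "$sketch_want" ]
}

must_fail_sketch false && echo "false: accepted"
must_fail_sketch true || echo "true: rejected"
must_fail_sketch no-such-command-for-this-demo || echo "missing: rejected"
expect_code_sketch 3 sh -c 'exit 3' && echo "exit 3: accepted"
```

The real functions are more thorough (the optional first argument of
`test_must_fail`, for one, is not modelled here at all).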

So given those utility functions and knowing that `git` exits with a 128 status
for the bad boolean, either of these would work:

    test_must_fail git -c my.bad=nada config --bool my.bad
    test_expect_code 128 git -c my.bad=nada config --bool my.bad

If you want to be picky and require an exact non-zero exit code use the
`test_expect_code` function.  Otherwise, to just require a non-signal and
non-zero exit code use the `test_must_fail` function.

An example of when to use the `test_might_fail` function would be when using
the `git config --unset` command -- it fails if the value being unset is not
already set.  If you're using it you probably do not care whether the value was
present, just that if it is present it's successfully removed.  As long as the
command does not exit because of a signal like a segmentation violation it's
probably fine.

That can be done like so:

    test_might_fail git config --unset might-not.be-set


-------------------------------
Scripts, Functions and Failures
-------------------------------

This is a perfectly valid test script fragment:

    run_test_one() {
        # do some testing
        test_must_fail blah blah blah
        # do some more testing
        blah blah blah
    }

    test_expect_success 'sample' 'run_test_one'

However, should the test fail, when the failing "script" is output to the log
the only thing shown will be the single line `run_test_one` which is unlikely
to be of much help diagnosing the problem.

Instead the above is typically written like so:

    test_expect_success 'sample' '
        # do some testing
        test_must_fail blah blah blah
        # do some more testing
        blah blah blah
    '

It's just as readable, just as efficient and, should it fail, every line in the
test "script" will appear in the log.

A problem sometimes arises with the quoting.  If the test script itself involves
some complicated quoting, munging that so that it can be a single-quoted
argument can be horribly confounding at times.

There are two solutions to the problem.  Either move the noxious quoting issue
into a separate function and call that from the single-quoted test "script" or
use the special "-" script.

Anything moved into an external function will not appear in the log of any
failures (sometimes this is a good thing to keep the log more succinct).  It
may make sense for "uninteresting" parts of the test "script" to be placed into
external functions anyway for this reason.

However, when there is a confounding quotation issue but the lines in question
really do belong in the log of any failures the special "-" script can be used
to read the script from standard input as a "HERE" document like so:

    test_expect_success 'sample' - <<'SCRIPT'
        # do some testing
        test_must_fail blah blah blah
        # do some more testing
        blah blah blah
        # Inside a 'quoted "here doc there are no quoting issues
    SCRIPT

The single drawback to this approach is that it's less efficient than either of
the others (a `cat` process must be spawned to read the script) so it should be
reserved for only those unique cases of confounding quotation quandaries.
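
Why the quoted "HERE" document sidesteps the quoting trouble can be seen in a
stand-alone sketch.  The `run_script_sketch` helper below is hypothetical (it
is not how the library is necessarily written); it simply reads the script text
from standard input when given "-" and then "eval"s it:

```sh
# Hypothetical helper (not the library's code): evaluate "$1" as a script, or
# read the script from standard input when "$1" is "-".
run_script_sketch() {
    if [ "$1" = "-" ]; then
        sketch_script=$(cat)    # the extra `cat` process mentioned above
    else
        sketch_script=$1
    fi
    eval "$sketch_script"
}

run_script_sketch - <<'SCRIPT'
heredoc_result="it's a \"quoted\" \$word"
SCRIPT

echo "$heredoc_result"
```

Because the `SCRIPT` delimiter is quoted, the body reaches `eval` untouched,
apostrophe and all, with no need to escape anything for an outer pair of
single quotes.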


--------------------
Test Chaining and &&
--------------------

Consider this test script fragment:

    test_expect_success 'four lines' '
        one
        two
        three
        four
    '

What happens if "two" fails but none of the others do?

The answer is "it depends" ;)  In the Git version of the testing framework the
answer is that the failure of "two" would always be overlooked.

However, both the Git version and this version contain a "feature" called
test chain linting that tries to determine whether or not all of the statements
in the test were chained together with '&&' to avoid this.

This "feature" is enabled by default and controlled by the
`TESTLIB_TEST_CHAIN_LINT` variable which may be altered on a per-subtest basis
or the default changed for an entire test script using the `--chain-lint` or
`--no-chain-lint` option.

When enabled (the default) it will complain about the above test (with a nasty
message of "broken &&-chain") and "Bail out!"

Rewriting the script thusly:

    test_expect_success 'four lines' '
        one &&
        two &&
        three &&
        four
    '

satisfies the test chain monster and solves the problem where the result of a
failing "two" could be ignored.

However, the chain linting monster is not terribly smart and this version
escapes its grasp:

    test_expect_success 'four lines' '{
        one
        two
        three
        four
    }'

So while it is indeed helpful in finding these things, it's not foolproof.
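
For the curious, here is a rough sketch of one way such a check can be done:
prefix the script with a command that fails with an unusual exit status and see
whether that status survives to the end.  Git's test library has used a trick
along these lines; assuming this version works the same way is just that, an
assumption.  The `chain_lint_sketch` name and the value 117 are illustrative:

```sh
# Hypothetical sketch of an &&-chain check; 117 is an arbitrary marker value.
# Caution: in an unchained script every line after the first really runs.
chain_lint_sketch() {
    lint_status=0
    eval "(exit 117) && $1" || lint_status=$?
    [ "$lint_status" -eq 117 ]
}

chained='true &&
true &&
true'

unchained='true
true
true'

braced='{
    true
    true
    true
}'

chain_lint_sketch "$chained" && echo "chained: passes the check"
chain_lint_sketch "$unchained" || echo "unchained: broken &&-chain"
chain_lint_sketch "$braced" && echo "braced: passes the check (but is not chained)"
```

The `{`...`}` group is skipped as a single unit, so the marker status survives
and a check of this kind cannot see the missing '&&' inside it.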

Here's where the difference compared with the Git version comes in.  This
version of the testing library normally "eval"s the "script" in a subshell
which Git's version does not.  (This can be controlled with the
`TESTLIB_TEST_NO_SUBSHELL` variable if necessary.)

As a bonus, when the subshell functionality is _not_ disabled (the default)
the "script" is run in a `set -e` (aka `set -o errexit`) subshell environment.

That's not always foolproof either but it is an improvement and as a result
this version of the testing library will, indeed, catch a failure of just the
"two" command in the final example above that uses the `{`...`}` version.
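
The difference `set -e` makes can be sketched without the library.  Note that
a child `sh -c` stands in for the library's subshell here (an assumption made
purely so the snippet behaves the same no matter what context it is pasted
into), and `false` plays the part of the failing "two":

```sh
braced='{
    true
    false
    true
}'

# Plain "eval": only the exit status of the last command counts.
if eval "$braced"; then plain_result=ok; else plain_result="not ok"; fi

# With errexit: a child "sh -c" stands in for the library's subshell here so
# the outcome does not depend on the context this snippet is run from.
if sh -c 'set -e; eval "$1"' sh "$braced"; then
    errexit_result=ok
else
    errexit_result="not ok"
fi

echo "plain eval: $plain_result"
echo "with set -e: $errexit_result"
```

The plain "eval" overlooks the failure in the middle of the group while the
`set -e` run stops at it and reports a non-zero status.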


-----------------
More On Subshells
-----------------

This will not work as expected:

    test_expect_success 'first' '
        : it works &&
        itworked=1
    '

    test_expect_success 'check' '
        test "$itworked" = "1"
    '

While it _will_ succeed in Git's version of the testing library, it will fail
in this one because each test is "eval"'d in a subshell by default.
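
The underlying shell behavior is easy to demonstrate on its own.  This sketch
does not use the testing library at all; it just "eval"s the two "scripts"
first in subshells and then directly:

```sh
first_script='itworked=1'
check_script='test "$itworked" = "1"'

# Each "script" in its own subshell: the assignment is lost.
itworked=
( eval "$first_script" )
if ( eval "$check_script" ); then subshell_check=ok; else subshell_check="not ok"; fi

# Same scripts evaluated directly in the current shell: the assignment sticks.
itworked=
eval "$first_script"
if eval "$check_script"; then direct_check=ok; else direct_check="not ok"; fi

echo "in subshells: $subshell_check"
echo "directly:     $direct_check"
```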

Ordinarily this would also mean the `test_when_finished` function would not
work either.  However, the `test_when_finished` function takes great pains to
save the re-quoted arguments to a temporary script and execute that AFTER the
subshell has exited.  There's still a "gotcha" with this though because,
obviously, the temporary script cannot refer to any variables set within the
subshell as the subshell will have already exited.  This usually does not
present that much of a problem in practice and it _does_ work from within
nested subshells (to any depth) which the Git version does not.

For example, this alteration will make the above work:

    test_expect_success 'first' '
        : it works &&
        test_when_finished itworked=1
    '

    test_expect_success 'check' '
        test "$itworked" = "1"
    '
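
A much simplified sketch of that mechanism, and of the "gotcha", follows.  The
`when_finished_sketch` function and the temporary file name are hypothetical,
and the re-quoting the real function performs is skipped entirely:

```sh
# Hypothetical, simplified sketch: commands are appended verbatim to a file
# (the real function re-quotes its arguments, which is skipped here).
finish_file="${TMPDIR:-/tmp}/when-finished-sketch.$$"
: >"$finish_file"

when_finished_sketch() {
    printf '%s\n' "$*" >>"$finish_file"
}

itworked=
(
    # the subtest "script" runs in a subshell...
    resultval=1
    when_finished_sketch 'itworked=1'
    when_finished_sketch 'fromsubshell=$resultval'
)
# ...and the saved commands run here, after the subshell has exited
. "$finish_file"
rm -f "$finish_file"

echo "itworked=$itworked fromsubshell=${fromsubshell:-<empty>}"
```

The literal `itworked=1` makes it out, but `$resultval` is only expanded once
the subshell (and its `resultval`) is long gone, so `fromsubshell` ends up
empty.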

To make it so that the first test can affect the environment of the test script
directly, the `TESTLIB_TEST_NO_SUBSHELL` variable can be set like so:

    TESTLIB_TEST_NO_SUBSHELL=1
    test_expect_success 'first' '
        : it works &&
        itworked=1
    '
    unset TESTLIB_TEST_NO_SUBSHELL

    test_expect_success 'check' '
        test "$itworked" = "1"
    '

Strictly speaking it does not need to be unset before the second ("check")
subtest but it doesn't hurt to do so since it is only the first subtest that
needs to modify the environment of the test script (the "check" subtest just
reads it but does not modify it).

Setting `TESTLIB_TEST_NO_SUBSHELL` also allows the `test_when_finished`
function to access variables from within the subshell (assuming it's used in
a context where it would otherwise have worked).

For example, this use of `test_when_finished` requires "no subshell" to work:

    TESTLIB_TEST_NO_SUBSHELL=1
    test_expect_success 'first' '
        test_when_finished "itworked=\$resultval" &&
        : it works &&
        resultval=1
    '
    unset TESTLIB_TEST_NO_SUBSHELL

    test_expect_success 'check' '
        test "$itworked" = "1"
    '

Without the "no subshell" setting, when the temporary `test_when_finished`
script gets executed the value of `resultval` would already have been discarded
thereby causing the following subtest to fail.

Avoid setting the `TESTLIB_TEST_NO_SUBSHELL` variable if at all possible
because allowing subtests to affect the environment of the test script itself
can inadvertently cause subsequent subtests to pass when they shouldn't or
vice versa in sometimes very subtle and hard to detect ways.

For example, consider a test script with a hundred subtests in it where one of
the early subtests leaves behind a turd in a variable that one of the later
subtests assumes is unset.  In a test script with many subtests this
cross-subtest variable contamination is not really all that uncommon and the
default of running each subtest in a subshell prevents it from happening in the
first place.

The test script runs in its very own trash directory.  If you really, really,
really (but not just "really, really" ;) need to communicate information from
inside one of the subtest "eval" scripts back out, have it write the
information into a temporary file in the current directory.  For small tidbits
of information just use the `test_when_finished` function instead.

Sometimes just a "flag" is enough so simply creating a file and then testing
for its existence or using `test_when_finished` to set a variable will do.
The same cross-subtest contamination problem is possible with this mechanism as
well.  It's best to treat each subtest as a black box into which information
flows but only a single "ok" or "not ok" comes back out.

If all that's needed is to check whether or not the previous subtest succeeded
then the `LASTOK` prerequisite may be used as described in the next section.


-----------------------------
Chaining Subtests with LASTOK
-----------------------------

Sometimes while two consecutive subtests are logically separate tests that do
not belong in a single test, it does not make sense to run the second (or
several subsequent subtests) if the first one fails.

The special `LASTOK` prerequisite can be used to skip a subtest if the last
non-skipped subtest in the test script did not succeed.  For the purpose
of the `LASTOK` prerequisite check, "succeed" means any test result line (even
if it was ultimately suppressed due to `$test_external_has_tap` not being `0`)
that begins with "ok".  Any "# SKIP" result lines are totally ignored and do
not change the state of the `LASTOK` prerequisite.

When a test script starts, the `LASTOK` prerequisite is implicitly true so that
it will succeed if used on the first subtest in a script file.

Here's an example script to consider:

```sh
#!/bin/sh

test_description="LASTOK example"

. ./test-lib.sh

test_plan 4 # 'cause we can count

test_expect_success 'works' ':'
test_expect_success LASTOK 'followup' ':'
test_expect_success LASTOK 'confirm' 'echo last was ok'
test_expect_success !LASTOK 'alt' 'echo last was not ok'

test_done
```

When run, subtests 1-3 are "ok" and subtest 4 is "ok # SKIP".

If the "script" for the first subtest is changed to "! :" instead then
subtest 1 is "not ok", subtests 2-3 are "ok # SKIP" and subtest 4 is "ok".

Notice how the skipped subtest 2 does not change the value of the `LASTOK`
prerequisite check in this case so that subtest 3 is also skipped which also
does not affect the value of `LASTOK` allowing subtest 4 to _not_ be skipped.

If the first subtest is changed to `test_expect_failure` still using the
altered "! :" script then subtest 1 is "not ok # TODO", subtests 2-3 are
"ok # SKIP" and subtest 4 is "ok".

The difference between using `test_expect_success` and `test_expect_failure`
with the altered script "! :" on the first subtest is that using
`test_expect_success` means the outcome is "1 of 4 failed" versus the
`test_expect_failure` result of "all 4 passed".

The `LASTOK` prerequisite literally checks for the "ok", not whether it's a
"# TODO" or not.
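
The bookkeeping described above can be modelled in a few lines.  This is only
a sketch of the rules as stated in this section; the `lastok_update_sketch`
function, the `lastok` variable and the exact wording of the result lines fed
to it are all made up for illustration and are not the library's own:

```sh
# Hypothetical model of the LASTOK state; not the library's implementation.
lastok=1    # implicitly true when a test script starts

lastok_update_sketch() {
    case "$1" in
        *"# SKIP"*) ;;              # skipped results leave the state alone
        "ok"*)      lastok=1 ;;     # any other line beginning with "ok"
        *)          lastok=0 ;;     # e.g. "not ok", with or without "# TODO"
    esac
}

# Result lines from the altered example (first subtest changed to "! :")
lastok_update_sketch "not ok 1 - works"
lastok_update_sketch "ok 2 # SKIP followup"
lastok_update_sketch "ok 3 # SKIP confirm"
echo "LASTOK before subtest 4: $lastok"
```

The state is still false when subtest 4 comes around, which is why its
`!LASTOK` prerequisite is satisfied and it gets to run.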