t/README-WRITING-TESTS

Writing Tests
=============

The point of the testing library (available by sourcing the
test-lib.sh file) is to assist with rapidly writing robust
tests that produce TAP-compliant output.  (For a quick primer
on TAP see the README file in the section "TAP - A Quick Overview".)

For a reference guide to the testing library itself see the
README-TESTLIB file.  For a "how-to" on writing tests using the testing
library, keep reading.


-----------------
Test Script Names
-----------------

Test scripts should be executable POSIX-shell scripts named with an
initial `t` followed by four (4) digits, a hyphen, more descriptive text
and a `.sh` suffix.  In other words something like:

    tNNNN-my-test-script.sh

where the "NNNN" part is a four-digit decimal number.
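
The naming rule can be checked mechanically.  Here is a minimal sketch using a
shell `case` pattern; the `is_test_script_name` helper is hypothetical (it is
not part of the testing library) and only illustrates the shape of the name:

```sh
# Hypothetical helper (not part of test-lib.sh): succeed only when the
# given file name has the tNNNN-description.sh shape described above.
is_test_script_name() {
    case "$1" in
    t[0-9][0-9][0-9][0-9]-?*.sh) return 0 ;;
    *) return 1 ;;
    esac
}

for name in t0000-test-true.sh t12-short.sh test-lib.sh; do
    if is_test_script_name "$name"; then
        echo "$name: matches"
    else
        echo "$name: does not match"
    fi
done
```

Only the first of the three names matches; the other two lack the four digits
after the initial `t`.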


--------------------
Test Script Contents
--------------------

Each executable test script file should contain (in this order) these
elements:

  1. A "shebang" line `#!/bin/sh` as the first line
  2. A non-empty assignment to the `test_description` variable
  3. A line sourcing the test library with `. ./test-lib.sh`
  4. An optional call to the `test_plan` function if you're nice
  5. One or more calls to `test_expect_...`/`test_tolerate_...` functions
  6. A call to the `test_done` function

Additional shell variable assignments, function definitions and other shell
code may be interspersed between the "shebang" line and the `test_done` line
(since `test_done` causes `exit` to be called, nothing after it will be run).

Here's an example `t0000-test-true.sh` script:

```sh
#!/bin/sh

test_description='test the true utility'

. ./test-lib.sh

test_plan 2

test_expect_success 'test true utility' '
        true
'

test_expect_failure 'test ! true utility' '
        ! true
'

test_done
```


test_plan
~~~~~~~~~

The `test_plan 2` line causes a `1..2` line to be output to standard output
telling the TAP processor to expect two test result lines.

The TAP protocol allows this line to be output either _before all_ or
_after all_ of the test result lines.  Calling `test_plan` causes it to be
output before; omitting the `test_plan` line causes it to be output after (when
the `test_done` function is called).

If you are nice and can count, you include a `test_plan` call so that the TAP
harness can output a decent progress display for test scripts with a lot of
subtests in them.  If you are not so nice (or just plain lazy) you don't.
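
To make the two placements concrete, here is a sketch that simply prints the
two TAP streams for the example script above.  The `plan_first` and
`plan_last` functions are hypothetical stand-ins, not library functions, and
the real library may emit additional diagnostic lines not shown here:

```sh
# Plan line first, as when `test_plan 2` is called before the subtests.
plan_first() {
    echo "1..2"
    echo "ok 1 - test true utility"
    echo "not ok 2 - test ! true utility # TODO known breakage"
}

# Plan line last, as when `test_plan` is omitted and `test_done` emits it.
plan_last() {
    echo "ok 1 - test true utility"
    echo "not ok 2 - test ! true utility # TODO known breakage"
    echo "1..2"
}

echo "--- with test_plan ---"
plan_first
echo "--- without test_plan ---"
plan_last
```

With the plan first, a harness knows the total of two before any result
arrives; with the plan last, it only learns the total at the very end.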


test_expect_success
~~~~~~~~~~~~~~~~~~~

The example `test_expect_success` call shown above essentially becomes this:

```sh
if eval "true"; then
        echo "ok 1 - test true utility"
else
        echo "not ok 1 - test true utility"
fi
```


test_expect_failure
~~~~~~~~~~~~~~~~~~~

The example `test_expect_failure` call shown above essentially becomes this:

```sh
if eval "! true"; then
        echo "ok 2 - test ! true utility # TODO known breakage vanished"
else
        echo "not ok 2 - test ! true utility # TODO known breakage"
fi
```


---------------------
Non-Zero Result Codes
---------------------

Sometimes a test "passes" when the command being run returns a non-zero result
code.

For example, this must produce a non-zero result code to pass:

    git -c my.bad=nada config --bool my.bad

So you could simply write this into the test script:

    ! git -c my.bad=nada config --bool my.bad

The problem with that is that _any_ non-zero result code will cause it to
succeed even if it dies because of a signal or because the command wasn't found
or wasn't executable.
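
The following sketch shows why a bare `!` is too forgiving.  It relies only on
the usual shell conventions (126 or 127 for a command that cannot be run, a
status above 128 for death by signal); the `describe_status` helper is
hypothetical and is not how the library's own functions are implemented:

```sh
# Hypothetical helper (not part of test-lib.sh): put an exit status into
# one of the categories that a bare "!" cannot tell apart.
describe_status() {
    if [ "$1" -eq 0 ]; then
        echo "success"
    elif [ "$1" -eq 126 ] || [ "$1" -eq 127 ]; then
        echo "not runnable"
    elif [ "$1" -gt 128 ]; then
        echo "signal"
    else
        echo "ordinary failure"
    fi
}

# A command that does not exist yields status 127.
status=0
no-such-command-for-this-demo 2>/dev/null || status=$?
echo "missing command: $(describe_status "$status")"

# A child shell that kills itself yields a status above 128.
status=0
sh -c 'kill -s KILL "$$"' 2>/dev/null || status=$?
echo "killed child: $(describe_status "$status")"

# The negated form reports success for the missing command all the same.
if ! no-such-command-for-this-demo 2>/dev/null; then
    echo "negation treated the missing command as a pass"
fi
```

Note that a status of exactly 128 lands in the "ordinary failure" category
here.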

The testing library provides three different functions to help with this:

  * `test_must_fail`
    Any non-signal exit failure is allowed (but it can be extended with an
    optional first argument to also permit success and/or `SIGPIPE`).
  * `test_might_fail`
    This is just a shortcut for calling `test_must_fail` with the optional
    first argument to also allow success.  The end result being that any
    non-signal error _or_ success is allowed.
  * `test_expect_code`
    The required first argument is the explicit (and only) allowed exit code.

So given those utility functions and knowing that `git` exits with a 128 status
for the bad boolean, either of these would work:

    test_must_fail git -c my.bad=nada config --bool my.bad
    test_expect_code 128 git -c my.bad=nada config --bool my.bad

If you want to be picky and require an exact non-zero exit code use the
`test_expect_code` function.  Otherwise, to just require a non-signal and
non-zero exit code use the `test_must_fail` function.

An example of when to use the `test_might_fail` function would be when using
the `git config --unset` command -- it fails if the value being unset is not
already set.  If you're using it you probably do not care that the value was
not present, just that if it is present it's successfully removed, and as long
as the command does not exit because of a signal like a segmentation violation
it's probably fine.

That can be done like so:

    test_might_fail git config --unset might-not.be-set


-------------------------------
Scripts, Functions and Failures
-------------------------------

This is a perfectly valid test script fragment:

    run_test_one() {
        # do some testing
        test_must_fail blah blah blah
        # do some more testing
        blah blah blah
    }

    test_expect_success 'sample' 'run_test_one'

However, should the test fail, when the failing "script" is output to the log
the only thing shown will be the single line `run_test_one` which is unlikely
to be of much help diagnosing the problem.

Instead the above is typically written like so:

    test_expect_success 'sample' '
        # do some testing
        test_must_fail blah blah blah
        # do some more testing
        blah blah blah
    '

It's just as readable, just as efficient and should it fail every line in the
test "script" will appear in the log.

A problem sometimes arises with the quoting.  If the test script itself involves
some complicated quoting, munging that so that it can be a single-quoted
argument can be horribly confounding at times.

There are two solutions to the problem.  Either move the noxious quoting issue
into a separate function and call that from the single-quoted test "script" or
use the special "-" script.

Anything moved into an external function will not appear in the log of any
failures.  Sometimes this is a good thing because it keeps the log more
succinct, so it may make sense for "uninteresting" parts of the test "script"
to be placed into external functions anyway.

However, when there is a confounding quotation issue but the lines in question
really do belong in the log of any failures the special "-" script can be used
to read the script from standard input as a "HERE" document like so:

    test_expect_success 'sample' - <<'SCRIPT'
        # do some testing
        test_must_fail blah blah blah
        # do some more testing
        blah blah blah
        # Inside a 'quoted "here doc there are no quoting issues
    SCRIPT

The single drawback to this approach is that it's less efficient than either of
the others (a `cat` process must be spawned to read the script) so it should be
reserved for only those unique cases of confounding quotation quandaries.
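
The mechanism behind the "-" script can be tried in plain shell.  This is a
sketch only: `run_script_from_stdin` is a hypothetical stand-in for what the
library does with the "-" argument, and the script text is made up for the
demonstration:

```sh
# Hypothetical stand-in for the "-" script argument: read the whole
# script from standard input (this is where the `cat` comes in), then
# eval it.
run_script_from_stdin() {
    script=$(cat) &&
    eval "$script"
}

# Because the here-document delimiter is quoted, nothing between the
# delimiters is expanded or unquoted before it reaches eval.
run_script_from_stdin <<'SCRIPT'
    name="O'Brien" &&
    echo "name is '$name' and costs \$5"
SCRIPT
```

This prints `name is 'O'Brien' and costs $5`.  Writing the same two lines
inside a single-quoted argument would require escaping every embedded single
quote.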


--------------------
Test Chaining and &&
--------------------

Consider this test script fragment:

    test_expect_success 'four lines' '
        one
        two
        three
        four
    '

What happens if "two" fails but none of the others do?

The answer is "it depends" ;)  In the Git version of the testing framework the
answer is that the failure of "two" would always be overlooked.

However, both the Git version and this version contain a "feature" called
test chain linting that tries to determine whether or not all of the statements
in the test were chained together with '&&' to avoid this.

This "feature" is enabled by default and controlled by the
`TESTLIB_TEST_CHAIN_LINT` variable which may be altered on a per-subtest basis
or the default changed for an entire test script using the `--chain-lint` or
`--no-chain-lint` option.

When enabled (the default) it will complain about the above test (with a nasty
message of "broken &&-chain") and "Bail out!".

Rewriting the script thusly:

    test_expect_success 'four lines' '
        one &&
        two &&
        three &&
        four
    '

satisfies the test chain monster and solves the problem where the result of a
failing "two" could be ignored.

However, the chain linting monster is not terribly smart and this version
escapes its grasp:

    test_expect_success 'four lines' '{
        one
        two
        three
        four
    }'

So while it is indeed helpful in finding these things, it's not foolproof.

Here's where the difference compared with the Git version comes in.  This
version of the testing library normally "eval"s the "script" in a subshell,
which Git's version does not do.  (This can be controlled with the
`TESTLIB_TEST_NO_SUBSHELL` variable if necessary.)

As a bonus, when the subshell functionality is _not_ disabled (the default)
the "script" is run in a `set -e` (aka `set -o errexit`) subshell environment.

That's not always foolproof either but it is an improvement and as a result
this version of the testing library will, indeed, catch a failure of just the
"two" command in the final example above that uses the `{`...`}` version.
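
The effect of `set -e` on an unchained script can be seen without the library.
In this sketch `true` and `false` stand in for "one" through "four", and a
child `sh -c` stands in for the library's subshell; a child process is used
here because `set -e` is ignored inside anything evaluated as the condition of
an `if`, which is one of the ways it is "not always foolproof":

```sh
# Stand-in for the unchained { one; two; three; four; } script where
# only "two" fails.
script='{
    true
    false
    true
    true
}'

# Plain eval: only the exit status of the last command counts.
if eval "$script"; then
    plain_result="ok"
else
    plain_result="not ok"
fi
echo "plain eval: $plain_result"

# With errexit the shell stops at the first failing command.
if sh -c 'set -e; eval "$1"' sh "$script"; then
    errexit_result="ok"
else
    errexit_result="not ok"
fi
echo "set -e shell: $errexit_result"
```

The plain `eval` reports "ok" because the last command succeeded, while the
`set -e` run reports "not ok" because it stops at the failing second command.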


-----------------
More On Subshells
-----------------

This will not work as expected:

    test_expect_success 'first' '
        : it works &&
        itworked=1
    '

    test_expect_success 'check' '
        test "$itworked" = "1"
    '

While it _will_ succeed in Git's version of the testing library, it will fail
by default in this one because each test is "eval"'d in a subshell by default.
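
The underlying shell behavior is easy to reproduce.  A minimal sketch, using a
plain `( ... )` subshell as a stand-in for how the library runs each subtest
(the `after_subshell` variable exists only for this demonstration):

```sh
unset itworked

# Evaluated in a subshell: the assignment is lost when the subshell exits.
( eval 'itworked=1' )
after_subshell=${itworked:-unset}
echo "after subshell eval: $after_subshell"

# Evaluated in the current shell: the assignment persists.
eval 'itworked=1'
echo "after plain eval: ${itworked:-unset}"
```

The first line printed ends in "unset" and the second in "1".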

This also means the `test_when_finished` function normally does not work either
(although that may eventually be corrected).

To make it so that the first test can affect the environment of the test script
the `TESTLIB_TEST_NO_SUBSHELL` variable can be set like so:

    TESTLIB_TEST_NO_SUBSHELL=1
    test_expect_success 'first' '
        : it works &&
        itworked=1
    '
    unset TESTLIB_TEST_NO_SUBSHELL

    test_expect_success 'check' '
        test "$itworked" = "1"
    '

Strictly speaking it does not need to be unset before the second ("check")
subtest but it doesn't hurt to do so since it is only the first subtest that
needs to modify the environment of the test script (the "check" subtest just
reads it but does not modify it).

Setting `TESTLIB_TEST_NO_SUBSHELL` also allows the `test_when_finished`
function to work (assuming it's used in a context where it would otherwise have
worked).

Avoid setting the `TESTLIB_TEST_NO_SUBSHELL` variable if at all possible
because allowing subtests to affect the environment of the test script itself
can inadvertently cause subsequent subtests to pass when they shouldn't or
vice versa in sometimes very subtle and hard to detect ways.

For example, consider a test script with a hundred subtests in it where one of
the early subtests leaves a turd behind in a variable that one of the later
subtests assumes is unset.  In a test script with many subtests this
cross-subtest variable contamination is not really all that uncommon and the
default of running each subtest in a subshell prevents it from happening in the
first place.

The test script runs in its very own trash directory.  If you really, really,
really (but not just "really, really" ;) need to communicate information from
inside one of the subtest "eval" scripts back out, have it write the
information into a temporary file in the current directory.

Sometimes just a "flag" is enough so simply creating a file and then testing
for its existence will do.  The same cross-subtest contamination problem is
possible with this mechanism as well.  It's best to treat each subtest as a
black box into which information flows but only a single "ok" or "not ok" comes
back out.
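
Here is a sketch of the temporary-file approach in plain shell.  The directory
made with `mktemp -d` (widely available, though not required by POSIX) is only
a stand-in for the trash directory, and the `itworked.txt` file name and
`flag_seen` variable are made up for the demonstration:

```sh
# Stand-in for the trash directory the test script would normally run in.
trash=$(mktemp -d) || exit 1
cd "$trash" || exit 1

# First "subtest": runs in a subshell and records its result in a file.
( echo "1" >itworked.txt )

# Second "subtest": another subshell reads the file back.
if ( test "$(cat itworked.txt)" = "1" ); then
    flag_seen=yes
else
    flag_seen=no
fi
echo "flag seen: $flag_seen"

# Clean up the stand-in directory.
cd / && rm -rf "$trash"
```

The file survives the first subshell where a variable would not, which is
exactly why the same contamination warning applies to it.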

If all that's needed is to check whether or not the previous subtest succeeded
then the `LASTOK` prerequisite may be used as described in the next section.


-----------------------------
Chaining Subtests with LASTOK
-----------------------------

Sometimes two consecutive subtests are logically separate tests that do not
belong in a single test, yet it does not make sense to run the second (or
several subsequent subtests) if the first one fails.

The special `LASTOK` prerequisite can be used to skip a subtest if the last
non-skipped subtest in the test script did not succeed.  For the purpose
of the `LASTOK` prerequisite check, "succeed" means any test result line (even
if it was ultimately suppressed due to `$test_external_has_tap` not being `0`)
that begins with "ok".  Any "# SKIP" result lines are totally ignored and do
not change the state of the `LASTOK` prerequisite.

When a test script starts, the `LASTOK` prerequisite is implicitly true so that
it will succeed if used on the first subtest in a script file.

Here's an example script to consider:

```sh
#!/bin/sh

test_description="LASTOK example"

. ./test-lib.sh

test_plan 4 # 'cause we can count

test_expect_success 'works' ':'
test_expect_success LASTOK 'followup' ':'
test_expect_success LASTOK 'confirm' 'echo last was ok'
test_expect_success !LASTOK 'alt' 'echo last was not ok'

test_done
```

When run, subtests 1-3 are "ok" and subtest 4 is "ok # SKIP".

If the "script" for the first subtest is changed to "! :" instead then
subtest 1 is "not ok", subtests 2-3 are "ok # SKIP" and subtest 4 is "ok".

Notice how the skipped subtest 2 does not change the value of the `LASTOK`
prerequisite check in this case, so subtest 3 is also skipped, which in turn
does not affect the value of `LASTOK`, allowing subtest 4 to _not_ be skipped.

If the first subtest is changed to `test_expect_failure` still using the
altered "! :" script then subtest 1 is "not ok # TODO", subtests 2-3 are
"ok # SKIP" and subtest 4 is "ok".

The difference between using `test_expect_success` and `test_expect_failure`
with the altered script "! :" on the first subtest is that using
`test_expect_success` means the outcome is "1 of 4 failed" versus the
`test_expect_failure` result of "all 4 passed".

The `LASTOK` prerequisite literally checks for the "ok", not whether it's a
"# TODO" or not.
 423
 424 The `LASTOK` literally checks for the "ok" not whether it's a "# TODO" or not.