manuals/development.md

   1 # Development
   2
   3 Updated: 22 Dec 2023
   4
   5 This document is meant for the day when I (Gavin D. Howard) get [hit by a
   6 bus][1]. In other words, it's meant to make the [bus factor][1] a non-issue.
   7
   8 This document is supposed to contain all of the knowledge necessary to develop
   9 `bc` and `dc`.
  10
  11 In addition, this document is meant to add to the [oral tradition of software
  12 engineering][118], as described by Bryan Cantrill.
  13
  14 This document will reference other parts of the repository. That is so a lot of
  15 the documentation can be closest to the part of the repo where it is actually
  16 necessary.
  17
  18 ## What Is It?
  19
  20 This repository contains an implementation of both [POSIX `bc`][2] and [Unix
  21 `dc`][3].
  22
  23 POSIX `bc` is a standard utility required for POSIX systems. `dc` is a
  24 historical utility that was included in early Unix and even predates both Unix
  25 and C. They both are arbitrary-precision command-line calculators with their own
  26 programming languages. `bc`'s language looks similar to C, with infix notation
  27 and including functions, while `dc` uses [Reverse Polish Notation][4] and allows
  28 the user to execute strings as though they were functions.
  29
  30 In addition, it is also possible to build the arbitrary-precision math as a
  31 library, named [`bcl`][156].
  32
  33 **Note**: for ease, I will refer to both programs as `bc` in this document.
  34 However, if I say "just `bc`," I am referring to just `bc`, and if I say `dc`, I
  35 am referring to just `dc`.
  36
  37 ### History
  38
  39 This project started in January 2018 when a certain individual on IRC, hearing
  40 that I knew how to write parsers, asked me to write a `bc` parser for his math
  41 library. I did so. I thought about writing my own math library, but he
  42 disparaged my programming skills and made me think that I couldn't do it.
  43
  44 However, he took so long to do it that I eventually decided to give it a try and
  45 had a working math portion in two weeks. It taught me that I should not listen
  46 to such people.
  47
  48 From that point, I decided to make it an extreme learning experience about how
  49 to write quality software.
  50
  51 That individual's main goal had been to get his `bc` into [toybox][16], and I
  52 managed to get my own `bc` in. I also got it in [busybox][17].
  53
  54 Eventually, in late 2018, I also decided to try my hand at implementing
  55 [Karatsuba multiplication][18], an algorithm that that unnamed individual
  56 claimed I could never implement. It took me a bit, but I did it.
  57
  58 This project became a passion project for me, and I continued. In mid-2019,
  59 Stefan Eßer suggested I improve performance by putting more than 1 digit in each
  60 section of the numbers. After I showed immaturity because of some burnout, I
  61 implemented his suggestion, and the results were incredible.
  62
  63 Since that time, I have gradually been improving the `bc` as I have learned more
  64 about things like fuzzing, [`scan-build`][19], [valgrind][20],
  65 [AddressSanitizer][21] (and the other sanitizers), and many other things.
  66
  67 One of my happiest moments was when my `bc` was made the default in FreeBSD.
  68 Another happiest moment was when I found out that my `bc` had shipped with macOS
  69 Ventura, without my knowledge.
  70
  71 But since I believe in [finishing the software I write][22], I have done less
  72 work on `bc` over time, though there are still times when I put a lot of effort
  73 in, such as now (17 June 2021), when I am attempting to convince OpenBSD to use
  74 my `bc`.
  75
  76 And that is why I am writing this document: someday, someone else is going to
  77 want to change my code, and this document is my attempt to make it as simple as
  78 possible.
  79
  80 ### Values
  81
  82 [According to Bryan Cantrill][10], all software has values. I think he's
  83 correct, though I [added one value for programming languages in particular][11].
  84
  85 However, for `bc`, his original list will do:
  86
  87 * Approachability
  88 * Availability
  89 * Compatibility
  90 * Composability
  91 * Debuggability
  92 * Expressiveness
  93 * Extensibility
  94 * Interoperability
  95 * Integrity
  96 * Maintainability
  97 * Measurability
  98 * Operability
  99 * Performance
 100 * Portability
 101 * Resiliency
 102 * Rigor
 103 * Robustness
 104 * Safety
 105 * Security
 106 * Simplicity
 107 * Stability
 108 * Thoroughness
 109 * Transparency
 110 * Velocity
 111
 112 There are several values that don't apply. The reason they don't apply is
 113 because `bc` and `dc` are existing utilities; this is just another
 114 reimplementation. The designs of `bc` and `dc` are set in stone; there is
 115 nothing we can do to change them, so let's get rid of those values that would
 116 apply to their design:
 117
 118 * Compatibility
 119 * Integrity
 120 * Maintainability
 121 * Measurability
 122 * Performance
 123 * Portability
 124 * Resiliency
 125 * Rigor
 126 * Robustness
 127 * Safety
 128 * Security
 129 * Simplicity
 130 * Stability
 131 * Thoroughness
 132 * Transparency
 133
 134 Furthermore, some of the remaining ones don't matter to me, so let me get rid of
 135 those and order the rest according to my *actual* values for this project:
 136
 137 * Robustness
 138 * Stability
 139 * Portability
 140 * Compatibility
 141 * Performance
 142 * Security
 143 * Simplicity
 144
 145 First is **robustness**. This `bc` and `dc` should be robust, accepting any
 146 input, never crashing, and instead, returning an error.
 147
 148 Closely related to that is **stability**. The execution of `bc` and `dc` should
 149 be deterministic and never change for the same inputs, including the
 150 pseudo-random number generator (for the same seed).
 151
 152 Third is **portability**. These programs should run everywhere that POSIX
 153 exists, as well as Windows. This means that just about every person on the
 154 planet will have access to these programs.
 155
 156 Next is **compatibility**. These programs should, as much as possible, be
 157 compatible with other existing implementations and standards.
 158
 159 Then we come to **performance**. A calculator is only usable if it's fast, so
 160 these programs should run as fast as possible.
 161
 162 After that is **security**. These programs should *never* be the reason a user's
 163 computer is compromised.
 164
 165 And finally, **simplicity**. Where possible, the code should be simple, while
 166 deferring to the above values.
 167
 168 Keep these values in mind for the rest of this document, and for exploring any
 169 other part of this repo.
 170
 171 #### Portability
 172
 173 But before I go on, I want to talk about portability in particular.
 174
 175 Most of these principles just require good attention and care, but portability
 176 is different. Sometimes, it requires pulling in code from other places and
 177 adapting it. In other words, sometimes I need to duplicate and adapt code.
 178
 179 This happened in a few cases:
 180
 181 * Option parsing (see [`include/opt.h`][35]).
 182 * History (see [`include/history.h`][36]).
 183 * Pseudo-Random Number Generator (see [`include/rand.h`][37]).
 184
 185 This was done because I decided to ensure that `bc`'s dependencies were
 186 basically zero. In particular, either users have a normal install of Windows or
 187 they have a POSIX system.
 188
 189 A POSIX system limited me to C99, `sh`, and zero external dependencies. That
 190 last item is why I pull code into `bc`: if I pull it in, it's not an external
 191 dependency.
 192
 193 That's why `bc` has duplicated code. Remove it, and you risk `bc` not being
 194 portable to some platforms.
 195
 196 ## Suggested Course
 197
 198 I do have a suggested course for programmers to follow when trying to understand
 199 this codebase. The order is this:
 200
 201 1.      `bc` Spec.
 202 2.      Manpages.
 203 3.      Test suite.
 204 4.      Understand the build.
 205 5.      Algorithms manual.
 206 6.      Code concepts.
 207 7.      Repo structure.
 208 8.      Headers.
 209 9.      Source code.
 210
 211 This order roughly follows this order:
 212
 213 1. High-level requirements
 214 2. Low-level requirements
 215 3. High-level implementation
 216 4. Low-level implementation
 217
 218 In other words, first understand what the code is *supposed* to do, then
 219 understand the code itself.
 220
 221 ## Useful External Tools
 222
 223 I have a few tools external to `bc` that are useful:
 224
 225 * A [Vim plugin with syntax files made specifically for my `bc` and `dc`][132].
 226 * A [repo of `bc` and `dc` scripts][133].
 227 * A set of `bash` aliases (see below).
 228 * A `.bcrc` file with items useful for my `bash` setup (see below).
 229
 230 My `bash` aliases are these:
 231
 232 ```sh
 233 alias makej='make -j16'
 234 alias mcmake='make clean && make'
 235 alias mcmakej='make clean && make -j16'
 236 alias bcdebug='CPPFLAGS="-DBC_DEBUG_CODE=1" CFLAGS="-Weverything -Wno-padded \
 237     -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
 238     -Wno-unreachable-code-return -Wno-missing-noreturn \
 239     -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
 240     -pedantic -std=c99" ./configure.sh'
 241 alias bcconfig='CFLAGS="-Weverything -Wno-padded -Wno-switch-enum \
 242     -Wno-format-nonliteral -Wno-cast-align -Wno-unreachable-code-return \
 243     -Wno-missing-noreturn -Wno-disabled-macro-expansion -Wno-unreachable-code \
 244     -Wall -Wextra -pedantic -std=c99" ./configure.sh'
 245 alias bcnoassert='CPPFLAGS="-DNDEBUG" CFLAGS="-Weverything -Wno-padded \
 246     -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
 247     -Wno-unreachable-code-return -Wno-missing-noreturn \
 248     -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
 249     -pedantic -std=c99" ./configure.sh'
 250 alias bcdebugnoassert='CPPFLAGS="-DNDEBUG -DBC_DEBUG_CODE=1" \
 251     CFLAGS="-Weverything -Wno-padded -Wno-switch-enum -Wno-format-nonliteral \
 252     -Wno-cast-align -Wno-unreachable-code-return -Wno-missing-noreturn \
 253     -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
 254     -pedantic -std=c99" ./configure.sh'
 255 alias bcunset='unset BC_LINE_LENGTH && unset BC_ENV_ARGS'
 256 ```
 257
 258 `makej` runs `make` with all of my cores.
 259
 260 `mcmake` runs `make clean` before running `make`. It will take a target on the
 261 command-line.
 262
 263 `mcmakej` is a combination of `makej` and `mcmake`.
 264
 265 `bcdebug` configures `bc` for a full debug build, including `BC_DEBUG_CODE` (see
 266 [Debugging][134] below).
 267
 268 `bcconfig` configures `bc` with Clang (Clang is my personal default compiler)
 269 using full warnings, with a few really loud and useless warnings turned off.
 270
 271 `bcnoassert` configures `bc` to not have asserts built in.
 272
 273 `bcdebugnoassert` is like `bcnoassert`, except it also configures `bc` for debug
 274 mode.
 275
 276 `bcunset` unsets my personal `bc` environment variables, which are set to:
 277
 278 ```sh
 279 export BC_ENV_ARGS="-l $HOME/.bcrc"
 280 export BC_LINE_LENGTH="74"
 281 ```
 282
 283 Unsetting these environment variables are necessary for running
 284 [`scripts/release.sh`][83] because otherwise, it will error when attempting to
 285 run `bc -s` on my `$HOME/.bcrc`.
 286
 287 Speaking of which, the contents of that file are:
 288
 289 ```bc
 290 define void print_time_unit(t){
 291         if(t<10)print "0"
 292         if(t<1&&t)print "0"
 293         print t,":"
 294 }
 295 define void sec2time(t){
 296         auto s,m,h,d,r
 297         r=scale
 298         scale=0
 299         t=abs(t)
 300         s=t%60
 301         t-=s
 302         m=t/60%60
 303         t-=m
 304         h=t/3600%24
 305         t-=h
 306         d=t/86400
 307         if(d)print_time_unit(d)
 308         if(h)print_time_unit(h)
 309         print_time_unit(m)
 310         if(s<10)print "0"
 311         if(s<1&&s)print "0"
 312         s
 313         scale=r
 314 }
 315 define minutes(secs){
 316         return secs/60;
 317 }
 318 define hours(secs){
 319         return secs/3600;
 320 }
 321 define days(secs){
 322         return secs/3600/24;
 323 }
 324 define years(secs){
 325         return secs/3600/24/365.25;
 326 }
 327 define fbrand(b,p){
 328         auto l,s,t
 329         b=abs(b)$
 330         if(b<2)b=2
 331         s=scale
 332         t=b^abs(p)$
 333         l=ceil(l2(t),0)
 334         if(l>scale)scale=l
 335         t=irand(t)/t
 336         scale=s
 337         return t
 338 }
 339 define ifbrand(i,b,p){return irand(abs(i)$)+fbrand(b,p)}
 340 ```
 341
 342 This allows me to use `bc` as part of my `bash` prompt.
 343
 344 ## Code Style
 345
 346 The code style for `bc` is...weird, and that comes from historical accident.
 347
 348 In [History][23], I mentioned how I got my `bc` in [toybox][16]. Well, in order
 349 to do that, my `bc` originally had toybox style. Eventually, I changed to using
 350 tabs, and assuming they were 4 spaces wide, but other than that, I basically
 351 kept the same style, with some exceptions that are more or less dependent on my
 352 taste.
 353
 354 However, I later managed to get [ClangFormat][24] to work, so I changed the
 355 style to that.
 356
 357 ### ClangFormat
 358
 359 The style is now defined as whatever [ClangFormat][24] outputs using the
 360 existing `.clang-format` file. More precisely, the style is whatever is output
 361 when the following command is run in the root directory:
 362
 363 ```
 364 ./scripts/format.sh
 365 ```
 366
 367 ### Historical Style
 368
 369 The code style used to be:
 370
 371 * Tabs are 4 spaces.
 372 * Tabs are used at the beginning of lines for indent.
 373 * Spaces are used for alignment.
 374 * Lines are limited to 80 characters, period.
 375 * Pointer asterisk (`*`) goes with the variable (on the right), not the type,
 376   unless it is for a pointer type returned from a function.
 377 * The opening brace is put on the same line as the header for the function,
 378   loop, or `if` statement.
 379 * Unless the header is more than one line, in which case the opening brace is
 380   put on its own line.
 381 * If the opening brace is put on its own line, there is no blank line after it.
 382 * If the opening brace is *not* put on its own line, there *is* a blank line
 383   after it, *unless* the block is only one or two lines long.
 384 * Code lines are grouped into what I call "paragraphs." Basically, lines that
 385   seem like they should go together are grouped together. This one comes down
 386   to judgment.
 387 * Bodies of `if` statements, `else` statements, and loops that are one line
 388   long are put on the same line as the statement, unless the header is more than
 389   one line long, and/or, the header and body cannot fit into 80 characters with
 390   a space inbetween them.
 391 * If single-line bodies are on a separate line from their headers, and the
 392   headers are only a single line, then no braces are used.
 393 * However, braces are *always* used if they contain another `if` statement or
 394   loop.
 395 * Loops with empty bodies are ended with a semicolon.
 396 * Expressions that return a boolean value are surrounded by paretheses.
 397 * Macro backslashes are aligned as far to the left as possible.
 398 * Binary operators have spaces on both sides.
 399 * If a line with binary operators overflows 80 characters, a newline is inserted
 400   *after* binary operators.
 401 * Function modifiers and return types are on the same line as the function name.
 402 * With one exception, `goto`'s are only used to jump to the end of a function
 403   for cleanup.
 404 * All structs, enums, and unions are `typedef`'ed.
 405 * All constant data is in one file: [`src/data.c`][131], but the corresponding
 406   `extern` declarations are in the appropriate header file.
 407 * All local variables are declared at the beginning of the scope where they
 408   appear. They may be initialized at that point, if it does not invoke UB or
 409   otherwise cause bugs.
 410 * All precondition `assert()`'s (see [Asserts][135]) come *after* local variable
 411   declarations.
 412 * Besides short `if` statements and loops, there should *never* be more than one
 413   statement per line.
 414
 415 ## Repo Structure
 416
 417 Functions are documented with Doxygen-style doc comments. Functions that appear
 418 in headers are documented in the headers, while static functions are documented
 419 where they are defined.
 420
 421 ### `configure`
 422
 423 A symlink to [`configure.sh`][69].
 424
 425 ### `configure.sh`
 426
 427 This is the script to configure `bc` and [`bcl`][156] for building.
 428
 429 This `bc` has a custom build system. The reason for this is because of
 430 [*portability*][136].
 431
 432 If `bc` used an outside build system, that build system would be an external
 433 dependency. Thus, I had to write a build system for `bc` that used nothing but
 434 C99 and POSIX utilities.
 435
 436 One of those utilities is POSIX `sh`, which technically implements a
 437 Turing-complete programming language. It's a terrible one, but it works.
 438
 439 A user that wants to build `bc` on a POSIX system (not Windows) first runs
 440 `configure.sh` with the options he wants. `configure.sh` uses those options and
 441 the `Makefile` template ([`Makefile.in`][70]) to generate an actual valid
 442 `Makefile`. Then `make` can do the rest.
 443
 444 For more information about the build process, see the [Build System][142]
 445 section and the [build manual][14].
 446
 447 For more information about shell scripts, see [POSIX Shell Scripts][76].
 448
 449 `configure.sh` does the following:
 450
 451 1.      It processes command-line arguments and figure out what the user wants to
 452         build.
 453 2.      It reads in [`Makefile.in`][70].
 454 3.      One-by-one, it replaces placeholders (in [`Makefile.in`][70]) of the form
 455         `%%<placeholder_name>%%` based on the [build type][81].
 456 4.      It appends a list of file targets based on the [build type][81].
 457 5.      It appends the correct test targets.
 458 6.      It copies the correct manpage and markdown manual for `bc` and `dc` into a
 459         location from which they can be copied for install.
 460 7.      It does a `make clean` to reset the build state.
 461
 462 ### `.gitattributes`
 463
 464 A `.gitattributes` file. This is needed to preserve the `crlf` line endings in
 465 the Visual Studio files.
 466
 467 ### `.gitignore`
 468
 469 The `.gitignore`
 470
 471 ### `LICENSE.md`
 472
 473 This is the `LICENSE` file, including the licenses of various software that I
 474 have borrowed.
 475
 476 ### `Makefile.in`
 477
 478 This is the `Makefile` template for [`configure.sh`][69] to use for generating a
 479 `Makefile`.
 480
 481 For more information, see [`configure.sh`][69], the [Build System][142] section,
 482 and the [build manual][14].
 483
 484 Because of [portability][136], the generated `Makefile.in` should be a pure
 485 [POSIX `make`][74]-compatible `Makefile` (minus the placeholders). Here are a
 486 few snares for the unwary programmer in this file:
 487
 488 1.      No extensions allowed, including and especially GNU extensions.
 489 2.      If new headers are added, they must also be added to `Makefile.in`.
 490 3.      Don't delete the `.POSIX:` empty target at the top; that's what tells `make`
 491         implementations that pure [POSIX `make`][74] is needed.
 492
 493 In particular, there is no way to set up variables other than the `=` operator.
 494 There are no conditionals, so all of the conditional stuff must be in
 495 [`configure.sh`][69]. This is, in fact, why [`configure.sh`][69] exists in the
 496 first place: [POSIX `make`][74] is barebones and only does a build with no
 497 configuration.
 498
 499 ### `NEWS.md`
 500
 501 A running changelog with an entry for each version. This should be updated at
 502 the same time that [`include/version.h`][75] is.
 503
 504 ### `NOTICE.md`
 505
 506 The `NOTICE` file with proper attributions.
 507
 508 ### `README.md`
 509
 510 The `README`. Read it.
 511
 512 ### `benchmarks/`
 513
 514 The folder containing files to generate benchmarks.
 515
 516 Each of these files was made, at one time or another, to benchmark some
 517 experimental feature, so if it seems there is no rhyme or reason to these
 518 benchmarks, it is because there is none, besides historical accident.
 519
 520 #### `bc/`
 521
 522 The folder containing `bc` scripts to generate `bc` benchmarks.
 523
 524 ##### `add.bc`
 525
 526 The file to generate the benchmark to benchmark addition in `bc`.
 527
 528 ##### `arrays_and_constants.bc`
 529
 530 The file to generate the benchmark to benchmark `bc` using lots of array names
 531 and constants.
 532
 533 ##### `arrays.bc`
 534
 535 The file to generate the benchmark to benchmark `bc` using lots of array names.
 536
 537 ##### `constants.bc`
 538
 539 The file to generate the benchmark to benchmark `bc` using lots of constants.
 540
 541 ##### `divide.bc`
 542
 543 The file to generate the benchmark to benchmark division in `bc`.
 544
 545 ##### `functions.bc`
 546
 547 The file to generate the benchmark to benchmark `bc` using lots of functions.
 548
 549 ##### `irand_long.bc`
 550
 551 The file to generate the benchmark to benchmark `bc` using lots of calls to
 552 `irand()` with large bounds.
 553
 554 ##### `irand_short.bc`
 555
 556 The file to generate the benchmark to benchmark `bc` using lots of calls to
 557 `irand()` with small bounds.
 558
 559 ##### `lib.bc`
 560
 561 The file to generate the benchmark to benchmark `bc` using lots of calls to
 562 heavy functions in `lib.bc`.
 563
 564 ##### `multiply.bc`
 565
 566 The file to generate the benchmark to benchmark multiplication in `bc`.
 567
 568 ##### `newton_raphson_div_large.bc`
 569
 570 The file to generate the benchmark to benchmark the Newton-Raphson division in
 571 [GitHub PR #72][229] with large numbers.
 572
 573 ##### `newton_raphson_div_small.bc`
 574
 575 The file to generate the benchmark to benchmark the Newton-Raphson division in
 576 [GitHub PR #72][229] with small numbers.
 577
 578 ##### `newton_raphson_sqrt_large.bc`
 579
 580 The file to generate the benchmark to benchmark the Newton-Raphson square root
 581 in [GitHub PR #72][229] with large numbers.
 582
 583 ##### `newton_raphson_sqrt_small.bc`
 584
 585 The file to generate the benchmark to benchmark the Newton-Raphson square root
 586 in [GitHub PR #72][229] with small numbers.
 587
 588 ##### `postfix_incdec.bc`
 589
 590 The file to generate the benchmark to benchmark `bc` using postfix increment and
 591 decrement operators.
 592
 593 ##### `power.bc`
 594
 595 The file to generate the benchmark to benchmark power (exponentiation) in `bc`.
 596
 597 ##### `subtract.bc`
 598
 599 The file to generate the benchmark to benchmark subtraction in `bc`.
 600
 601 ##### `strings.bc`
 602
 603 The file to generate the benchmark to benchmark `bc` using lots of strings.
 604
 605 #### `dc/`
 606
 607 The folder containing `dc` scripts to generate `dc` benchmarks.
 608
 609 ##### `modexp.dc`
 610
 611 The file to generate the benchmark to benchmark modular exponentiation in `dc`.
 612
 613 ### `gen/`
 614
 615 A folder containing the files necessary to generate C strings that will be
 616 embedded in the executable.
 617
 618 All of the files in this folder have license headers, but the program and script
 619 that can generate strings from them include code to strip the license header out
 620 before strings are generated.
 621
 622 #### `bc_help.txt`
 623
 624 A text file containing the text displayed for `bc -h` or `bc --help`.
 625
 626 This text just contains the command-line options and a short summary of the
 627 differences from GNU and BSD `bc`'s. It also directs users to the manpage.
 628
 629 The reason for this is because otherwise, the help would be far too long to be
 630 useful.
 631
 632 **Warning**: The text has some `printf()` format specifiers. You need to make
 633 sure the format specifiers match the arguments given to `bc_file_printf()`.
 634
 635 #### `dc_help.txt`
 636
 637 A text file containing the text displayed for `dc -h` or `dc --help`.
 638
 639 This text just contains the command-line options and a short summary of the
 640 differences from GNU and BSD `dc`'s. It also directs users to the manpage.
 641
 642 The reason for this is because otherwise, the help would be far too long to be
 643 useful.
 644
 645 **Warning**: The text has some `printf()` format specifiers. You need to make
 646 sure the format specifiers match the arguments given to `bc_file_printf()`.
 647
 648 #### `lib.bc`
 649
 650 A `bc` script containing the [standard math library][5] required by POSIX. See
 651 the [POSIX standard][2] for what is required.
 652
 653 This file does not have any extraneous whitespace, except for tabs at the
 654 beginning of lines. That is because this data goes directly into the binary,
 655 and whitespace is extra bytes in the binary. Thus, not having any extra
 656 whitespace shrinks the resulting binary.
 657
 658 However, tabs at the beginning of lines are kept for two reasons:
 659
 660 1.      Readability. (This file is still code.)
 661 2.      The program and script that generate strings from this file can remove
 662         tabs at the beginning of lines.
 663
 664 For more details about the algorithms used, see the [algorithms manual][25].
 665
 666 However, there are a few snares for unwary programmers.
 667
 668 First, all constants must be one digit. This is because otherwise, multi-digit
 669 constants could be interpreted wrongly if the user uses a different `ibase`.
 670 This does not happen with single-digit numbers because they are guaranteed to be
 671 interpreted what number they would be if the `ibase` was as high as possible.
 672
 673 This is why `A` is used in the library instead of `10`, and things like `2*9*A`
 674 for `180` in [`lib2.bc`][26].
 675
 676 As an alternative, you can set `ibase` in the function, but if you do, make sure
 677 to set it with a single-digit number and beware the snare below...
 678
 679 Second, `scale`, `ibase`, and `obase` must be safely restored before returning
 680 from any function in the library. This is because without the `-g` option,
 681 functions are allowed to change any of the globals.
 682
 683 Third, all local variables in a function must be declared in an `auto` statement
 684 before doing anything else. This includes arrays. However, function parameters
 685 are considered predeclared.
 686
 687 Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code
 688 must not use *any* extensions. It has to work when users use the `-s` or `-w`
 689 flags.
 690
 691 #### `lib2.bc`
 692
 693 A `bc` script containing the [extended math library][7].
 694
 695 Like [`lib.bc`][8], and for the same reasons, this file should have no
 696 extraneous whitespace, except for tabs at the beginning of lines.
 697
 698 For more details about the algorithms used, see the [algorithms manual][25].
 699
 700 Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary
 701 programmers when writing code for `lib2.bc`.
 702
 703 #### `strgen.c`
 704
 705 Code for the program to generate C strings from text files. This is the original
 706 program, although [`strgen.sh`][9] was added later.
 707
 708 The reason I used C here is because even though I knew `sh` would be available
 709 (it must be available to run `configure.sh`), I didn't know how to do what I
 710 needed to do with POSIX utilities and `sh`.
 711
 712 Later, [`strgen.sh`][9] was contributed by Stefan Eßer of FreeBSD, showing that
 713 it *could* be done with `sh` and POSIX utilities.
 714
 715 However, `strgen.c` exists *still* exists because the versions generated by
 716 [`strgen.sh`][9] may technically hit an environmental limit. (See the [draft C99
 717 standard][12], page 21.) This is because [`strgen.sh`][9] generates string
 718 literals, and in C99, string literals can be limited to 4095 characters, and
 719 `gen/lib2.bc` is above that.
 720
 721 Fortunately, the limit for "objects," which include `char` arrays, is much
 722 bigger: 65535 bytes, so that's what `strgen.c` generates.
 723
 724 However, the existence of `strgen.c` does come with a cost: the build needs C99
 725 compiler that targets the host machine. For more information, see the ["Cross
 726 Compiling" section][13] of the [build manual][14].
 727
 728 Read the comments in `strgen.c` for more detail about it, the arguments it
 729 takes, and how it works.
 730
 731 #### `strgen.sh`
 732
 733 An `sh` script that will generate C strings that uses only POSIX utilities. This
 734 exists for those situations where a host C99 compiler is not available, and the
 735 environment limits mentioned above in [`strgen.c`][15] don't matter.
 736
 737 `strgen.sh` takes the same arguments as [`strgen.c`][15], and the arguments mean
 738 the exact same things, so see the comments in [`strgen.c`][15] for more detail
 739 about that, and see the comments in `strgen.sh` for more details about it and
 740 how it works.
 741
 742 For more information about shell scripts, see [POSIX Shell Scripts][76].
 743
 744 ### `include/`
 745
 746 A folder containing the headers.
 747
 748 The headers are not included among the source code because I like it better that
 749 way. Also there were folders within `src/` at one point, and I did not want to
 750 see `#include "../some_header.h"` or things like that.
 751
 752 So all headers are here, even though only one ([`bcl.h`][30]) is meant for end
 753 users (to be installed in `INCLUDEDIR`).
 754
 755 #### `args.h`
 756
 757 This file is the API for processing command-line arguments.
 758
 759 #### `bc.h`
 760
 761 This header is the API for `bc`-only items. This includes the `bc_main()`
 762 function and the `bc`-specific lexing and parsing items.
 763
 764 The `bc` parser is perhaps the most sensitive part of the entire codebase. See
 765 the documentation in `bc.h` for more information.
 766
 767 The code associated with this header is in [`src/bc.c`][40],
 768 [`src/bc_lex.c`][41], and [`src/bc_parse.c`][42].
 769
 770 #### `bcl.h`
 771
 772 This header is the API for the [`bcl`][156] library.
 773
 774 This header is meant for distribution to end users and contains the API that end
 775 users of [`bcl`][156] can use in their own software.
 776
 777 This header, because it's the public header, is also the root header. That means
 778 that it has platform-specific fixes for Windows. (If the fixes were not in this
 779 header, the build would fail on Windows.)
 780
 781 The code associated with this header is in [`src/library.c`][43].
 782
 783 #### `dc.h`
 784
 785 This header is the API for `dc`-only items. This includes the `dc_main()`
 786 function and the `dc`-specific lexing and parsing items.
 787
 788 The code associated with this header is in [`src/dc.c`][44],
 789 [`src/dc_lex.c`][45], and [`src/dc_parse.c`][46].
 790
 791 #### `file.h`
 792
 793 This header is for `bc`'s internal buffered I/O API.
 794
 795 For more information about `bc`'s error handling and custom buffered I/O, see
 796 [Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the
 797 notes about version [`3.0.0`][32] in the [`NEWS`][32].
 798
 799 The code associated with this header is in [`src/file.c`][47].
 800
 801 #### `history.h`
 802
 803 This header is for `bc`'s implementation of command-line editing/history, which
 804 is based on a [UTF-8-aware fork][28] of [`linenoise`][29].
 805
 806 For more information, see the [Command-Line History][189] section.
 807
 808 The code associated with this header is in [`src/history.c`][48].
 809
 810 #### `lang.h`
 811
 812 This header defines the data structures and bytecode used for actual execution
 813 of `bc` and `dc` code.
 814
 815 Yes, it's misnamed; that's an accident of history where the first things I put
 816 into it all seemed related to the `bc` language.
 817
 818 The code associated with this header is in [`src/lang.c`][49].
 819
 820 #### `lex.h`
 821
 822 This header defines the common items that both programs need for lexing.
 823
 824 The code associated with this header is in [`src/lex.c`][50],
 825 [`src/bc_lex.c`][41], and [`src/dc_lex.c`][45].
 826
 827 #### `library.h`
 828
 829 This header defines the things needed for [`bcl`][156] that users should *not*
 830 have access to. In other words, [`bcl.h`][30] is the *public* header for the
 831 library, and this header is the *private* header for the library.
 832
 833 The code associated with this header is in [`src/library.c`][43].
 834
 835 #### `num.h`
 836
 837 This header is the API for numbers and math.
 838
 839 The code associated with this header is in [`src/num.c`][39].
 840
 841 #### `opt.h`
 842
 843 This header is the API for parsing command-line arguments.
 844
 845 It's different from [`args.h`][31] in that [`args.h`][31] is for the main code
 846 to process the command-line arguments into global data *after* they have already
 847 been parsed by `opt.h` into proper tokens. In other words, `opt.h` actually
 848 parses the command-line arguments, and [`args.h`][31] turns that parsed data
 849 into flags (bits), strings, and expressions that will be used later.
 850
 851 Why are they separate? Because originally, `bc` used `getopt_long()` for
 852 parsing, so [`args.h`][31] was the only one that existed. After it was
 853 discovered that `getopt_long()` has different behavior on different platforms, I
 854 adapted a [public-domain option parsing library][34] to do the job instead. And
 855 in doing so, I gave it its own header.
 856
 857 They could probably be combined, but I don't really care enough at this point.
 858
 859 The code associated with this header is in [`src/opt.c`][51].
 860
 861 #### `parse.h`
 862
 863 This header defines the common items that both programs need for parsing.
 864
 865 Note that the parsers don't produce abstract syntax trees (AST's) or any
 866 intermediate representations. They produce bytecode directly. In other words,
 867 they don't have special data structures except what they need to do their job.
 868
 869 The code associated with this header is in [`src/parse.c`][50],
 870 [`src/bc_lex.c`][42], and [`src/dc_lex.c`][46].
 871
 872 #### `program.h`
 873
 874 This header defines the items needed to manage the data structures in
 875 [`lang.h`][38] as well as any helper functions needed to generate bytecode or
 876 execute it.
 877
 878 The code associated with this header is in [`src/program.c`][53].
 879
 880 #### `rand.h`
 881
 882 This header defines the API for the [pseudo-random number generator
 883 (PRNG)][179].
 884
 885 The PRNG only generates fixed-size integers. The magic of generating random
 886 numbers of arbitrary size is actually given to the code that does math
 887 ([`src/num.c`][39]).
 888
 889 The code associated with this header is in [`src/rand.c`][54].
 890
 891 #### `read.h`
 892
 893 This header defines the API for reading from files and `stdin`.
 894
 895 Thus, [`file.h`][55] is really for buffered *output*, while this file is for
 896 *input*. There is no buffering needed for `bc`'s inputs.
 897
 898 The code associated with this header is in [`src/read.c`][56].
 899
 900 #### `status.h`
 901
 902 This header has several things:
 903
 904 * A list of possible errors that internal `bc` code can use.
 905 * Compiler-specific fixes.
 906 * Platform-specific fixes.
 907 * Macros for `bc`'s [error handling][97].
 908
 909 There is no code associated with this header.
 910
 911 #### `vector.h`
 912
 913 This header defines the API for the vectors (resizable arrays) that are used for
 914 data structures.
 915
 916 Vectors are what do the heavy lifting in almost all of `bc`'s data structures.
 917 Even the maps of identifiers and arrays use vectors.
 918
 919 The code associated with this header is in [`src/vector.c`][228].
 920
 921 #### `version.h`
 922
 923 This header defines the version of `bc`.
 924
 925 There is no code associated with this header.
 926
 927 #### `vm.h`
 928
 929 This header defines the API for setting up and running `bc` and `dc`.
 930
 931 It is so named because I think of it as the "virtual machine" of `bc`, though
 932 that is probably not true as [`program.h`][57] is probably the "virtual machine"
 933 API. Thus, the name is more historical accident.
 934
 935 The code associated with this header is in [`src/vm.c`][58].
 936
 937 ### `locales/`
 938
 939 This folder contains a bunch of `.msg` files and soft links to the real `.msg`
 940 files. This is how locale support is implemented in `bc`.
 941
 942 The files are in the format required by the [`gencat`][59] POSIX utility. They
 943 all have the same messages, in the same order, with the same numbering, under
 944 the same groups. This is because the locale system expects those messages in
 945 that order.
 946
 947 The softlinks exist because for many locales, they would contain the exact same
 948 information. To prevent duplication, they are simply linked to a master copy.
 949
 950 The naming format for all files is:
 951
 952 ```
 953 <language_code>_<country_code>.<encoding>.msg
 954 ```
 955
 956 This naming format must be followed for all locale files.
 957
 958 ### `manuals/`
 959
 960 This folder contains the documentation for `bc`, `dc`, and [`bcl`][156], along
 961 with a few other manuals.
 962
 963 #### `algorithms.md`
 964
 965 This file explains the mathematical algorithms that are used.
 966
 967 The hope is that this file will guide people in understanding how the math code
 968 works.
 969
 970 #### `bc.1.md.in`
 971
 972 This file is a template for the markdown version of the `bc` manual and
 973 manpages.
 974
 975 For more information about how the manpages and markdown manuals are generated,
 976 and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
 977
 978 #### `bcl.3`
 979
 980 This is the manpage for the [`bcl`][156] library. It is generated from
 981 [`bcl.3.md`][61] using [`scripts/manpage.sh`][60].
 982
 983 For the reason why I check generated data into the repo, see
 984 [`scripts/manpage.sh`][60] and [Manuals][86].
 985
 986 #### `bcl.3.md`
 987
 988 This is the markdown manual for the [`bcl`][156] library. It is the source for the
 989 generated [`bcl.3`][62] file.
 990
 991 #### `benchmarks.md`
 992
 993 This is a document that compares this `bc` to GNU `bc` in various benchmarks. It
 994 was last updated when version [`3.0.0`][32] was released.
 995
 996 It has very little documentation value, other than showing what compiler options
 997 are useful for performance.
 998
 999 #### `build.md`
1000
1001 This is the [build manual][14].
1002
1003 This `bc` has a custom build system. The reason for this is because of
1004 [*portability*][136].
1005
1006 If `bc` used an outside build system, that build system would be an external
1007 dependency. Thus, I had to write a build system for `bc` that used nothing but
1008 C99 and POSIX utilities, including barebones [POSIX `make`][74].
1009
1010 for more information about the build system, see the [build system][142]
1011 section, the [build manual][14], [`configure.sh`][69], and [`Makefile.in`][70].
1012
1013 #### `dc.1.md.in`
1014
1015 This file is a template for the markdown version of the `dc` manual and
1016 manpages.
1017
1018 For more information about how the manpages and markdown manuals are generated,
1019 and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
1020
1021 #### `development.md`
1022
1023 The file you are reading right now.
1024
1025 #### `header_bcl.txt`
1026
1027 Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper
1028 header.
1029
1030 For more information about generating manuals, see [`scripts/manpage.sh`][60]
1031 and [Manuals][86].
1032
1033 #### `header_bc.txt`
1034
1035 Used by [`scripts/manpage.sh`][60] to give the [generated `bc` manpages][79] a
1036 proper header.
1037
1038 For more information about generating manuals, see [`scripts/manpage.sh`][60]
1039 and [Manuals][86].
1040
1041 #### `header_dc.txt`
1042
1043 Used by [`scripts/manpage.sh`][60] to give the [generated `dc` manpages][80] a
1044 proper header.
1045
1046 For more information about generating manuals, see [`scripts/manpage.sh`][60]
1047 and [Manuals][86].
1048
1049 #### `header.txt`
1050
1051 Used by [`scripts/manpage.sh`][60] to give all generated manpages a license
1052 header.
1053
1054 For more information about generating manuals, see [`scripts/manpage.sh`][60]
1055 and [Manuals][86].
1056
1057 #### `release.md`
1058
1059 A checklist that I try to somewhat follow when making a release.
1060
1061 #### `bc/`
1062
1063 A folder containing the `bc` manuals.
1064
1065 Each `bc` manual corresponds to a [build type][81]. See that link for more
1066 details.
1067
1068 For each manual, there are two copies: the markdown version generated from the
1069 template, and the manpage generated from the markdown version.
1070
1071 #### `dc/`
1072
1073 A folder containing the `dc` manuals.
1074
1075 Each `dc` manual corresponds to a [build type][81]. See that link for more
1076 details.
1077
1078 For each manual, there are two copies: the markdown version generated from the
1079 template, and the manpage generated from the markdown version.
1080
1081 ### `scripts/`
1082
1083 This folder contains helper scripts. Most of them are written in pure [POSIX
1084 `sh`][72], but three ([`afl.py`][94], [`karatsuba.py`][78], and
1085 [`randmath.py`][95]) are written in Python 3, and one ([`ministat.c`][223]) is
1086 written in C. [`ministat.c`][223] in particular is copied from elsewhere.
1087
1088 For more information about the shell scripts, see [POSIX Shell Scripts][76].
1089
1090 #### `afl.py`
1091
1092 This script is meant to be used as part of the fuzzing workflow.
1093
1094 It does one of two things: checks for valid crashes, or runs `bc` and or `dc`
1095 under all of the paths found by [AFL++][125].
1096
1097 See [Fuzzing][82] for more information about fuzzing, including this script.
1098
1099 #### `alloc.sh`
1100
1101 This script is a quick and dirty script to test whether or not the garbage
1102 collection mechanism of the [`BcNum` caching][96] works. It has been little-used
1103 because it tests something that is not important to correctness.
1104
1105 #### `benchmark.sh`
1106
1107 A script making it easy to run benchmarks and to run the executable produced by
1108 [`ministat.c`][223] on them.
1109
1110 For more information, see the [Benchmarks][144] section.
1111
1112 #### `bitfuncgen.c`
1113
1114 A source file for an executable to generate tests for `bc`'s bitwise functions
1115 in [`gen/lib2.bc`][26]. The executable is `scripts/bitfuncgen`, and it is built
1116 with `make bitfuncgen`. It produces the test on `stdout` and the expected
1117 results on `stderr`. This means that to generat tests, use the following
1118 invokation:
1119
1120 ```
1121 scripts/bitfuncgen > tests/bc/bitfuncs.txt 2> tests/bc/bitfuncs_results.txt
1122 ```
1123
1124 It calls `abort()` if it runs into an error.
1125
1126 #### `exec-install.sh`
1127
1128 This script is the magic behind making sure `dc` is installed properly if it's
1129 a symlink to `bc`. It checks to see if it is a link, and if so, it just creates
1130 a new symlink in the install directory. Of course, it also installs `bc` itself,
1131 or `dc` when it's alone.
1132
1133 #### `functions.sh`
1134
1135 This file is a bunch of common functions for most of the POSIX shell scripts. It
1136 is not supposed to be run; instead, it is *sourced* by other POSIX shell
1137 scripts, like so:
1138
1139 ```
1140 . "$scriptdir/functions.sh"
1141 ```
1142
1143 or the equivalent, depending on where the sourcing script is.
1144
1145 For more information about the shell scripts, see [POSIX Shell Scripts][76].
1146
1147 #### `fuzz_prep.sh`
1148
1149 Fuzzing is a regular activity when I am preparing for a release.
1150
1151 This script handles all the options and such for building a fuzzable binary.
1152 Instead of having to remember a bunch of options, I just put them in this script
1153 and run the script when I want to fuzz.
1154
1155 For more information about fuzzing, see [Fuzzing][82].
1156
1157 #### `karatsuba.py`
1158
1159 This script has at least one of two major differences from most of the other
1160 scripts:
1161
1162 * It's written in Python 3.
1163 * It's meant for software packagers.
1164
1165 For example, [`scripts/afl.py`][94] and [`scripts/randmath.py`][95] are both in
1166 Python 3, but they are not meant for the end user or software packagers and are
1167 not included in source distributions. But this script is.
1168
1169 This script breaks my rule of only POSIX utilities necessary for package
1170 maintainers, but there's a very good reason for that: it's only meant to be run
1171 *once* when the package is created for the first time, and maybe not even then.
1172
1173 You see, this script does two things: it tests the Karatsuba implementation at
1174 various settings for `KARATSUBA_LEN`, and it figures out what the optimal
1175 `KARATSUBA_LEN` is for the machine that it is running on.
1176
1177 Package maintainers can use this script, when creating a package for this `bc`,
1178 to figure out what is optimal for their users. Then they don't have to run it
1179 ever again. So this script only has to run on the packagers machine.
1180
1181 I tried to write the script in `sh`, by the way, and I finally accepted the
1182 tradeoff of using Python 3 when it became too hard.
1183
1184 However, I also mentioned that it's for testing Karatsuba with various settings
1185 of `KARATSUBA_LEN`. Package maintainers will want to run the [test suite][124],
1186 right?
1187
1188 Yes, but this script is not part of the [test suite][124]; it's used for testing
1189 in the [`scripts/release.sh`][83] script, which is maintainer use only.
1190
1191 However, there is one snare with `karatsuba.py`: I didn't want the user to have
1192 to install any Python libraries to run it. Keep that in mind if you change it.
1193
1194 #### `link.sh`
1195
1196 This script is the magic behind making `dc` a symlink of `bc` when both
1197 calculators are built.
1198
1199 #### `locale_install.sh`
1200
1201 This script does what its name says: it installs locales.
1202
1203 It turns out that this is complicated.
1204
1205 There is a magic environment variable, `$NLSPATH`, that tells you how and where
1206 you are supposed to install locales.
1207
1208 Yes, *how*. And where.
1209
1210 But now is not the place to rant about `$NLSPATH`. For more information on
1211 locales and `$NLSPATH`, see [Locales][85].
1212
1213 #### `locale_uninstall.sh`
1214
1215 This script does what its name says: it uninstalls locales.
1216
1217 This is far less complicated than installing locales. I basically generate a
1218 wildcard path and then list all paths that fit that wildcard. Then I delete each
1219 one of those paths. Easy.
1220
1221 For more information on locales, see [Locales][85].
1222
1223 #### `manpage.sh`
1224
1225 This script is the one that generates markdown manuals from a template and a
1226 manpage from a markdown manual.
1227
1228 For more information about generating manuals, see [Manuals][86].
1229
1230 #### `ministat.c`
1231
1232 This is a file copied [from FreeBSD][221] that calculates the standard
1233 statistical numbers, such as mean, average, and median, based on numbers
1234 obtained from a file.
1235
1236 For more information, see the [FreeBSD ministat(1) manpage][222].
1237
1238 This file allows `bc` to build the `scripts/ministat` executable using the
1239 command `make ministat`, and this executable helps programmers evaluate the
1240 results of [benchmarks][144] more accurately.
1241
1242 #### `package.sh`
1243
1244 This script is what helps `bc` maintainers cut a release. It does the following:
1245
1246 1.      Creates the appropriate `git` tag.
1247 2.      Pushes the `git` tag.
1248 3.      Copies the repo to a temp directory.
1249 4.      Removes files that should not be included in source distributions.
1250 5.      Creates the tarballs.
1251 6.      Signs the tarballs.
1252 7.      Zips and signs the Windows executables if they exist.
1253 8.      Calculates and outputs SHA512 and SHA256 sums for all of the files,
1254         including the signatures.
1255
1256 This script is for `bc` maintainers to use when cutting a release. It is not
1257 meant for outside use. This means that some non-POSIX utilities can be used,
1258 such as `git` and `gpg`.
1259
1260 In addition, before using this script, it expects that the folders that Windows
1261 generated when building `bc`, `dc`, and [`bcl`][156], are in the parent
1262 directory of the repo, exactly as Windows generated them. If they are not there,
1263 then it will not zip and sign, nor calculate sums of, the Windows executables.
1264
1265 Because this script creates a tag and pushes it, it should *only* be run *ONCE*
1266 per release.
1267
1268 #### `radamsa.sh`
1269
1270 A script to test `bc`'s command-line expression parsing code, which, while
1271 simple, strives to handle as much as possible.
1272
1273 What this script does is it uses the test cases in [`radamsa.txt`][98] an input
1274 to the [Radamsa fuzzer][99].
1275
1276 For more information, see the [Radamsa][128] section.
1277
1278 #### `radamsa.txt`
1279
1280 Initial test cases for the [`radamsa.sh`][100] script.
1281
1282 #### `randmath.py`
1283
1284 This script generates random math problems and checks that `bc`'s and `dc`'s
1285 output matches the GNU `bc` and `dc`. (For this reason, it is necessary to have
1286 GNU `bc` and `dc` installed before using this script.)
1287
1288 One snare: be sure that this script is using the GNU `bc` and `dc`, not a
1289 previously-installed version of this `bc` and `dc`.
1290
1291 If you want to check for memory issues or failing asserts, you can build the
1292 `bc` using `./scripts/fuzz_prep.sh -a`, and then run it under this script. Any
1293 errors or crashes should be caught by the script and given to the user as part
1294 of the "checklist" (see below).
1295
1296 The basic idea behind this script is that it generates as many math problems as
1297 it can, biasing towards situations that may be likely to have bugs, and testing
1298 each math problem against GNU `bc` or `dc`.
1299
1300 If GNU `bc` or `dc` fails, it just continues. If this `bc` or `dc` fails, it
1301 stores that problem. If the output mismatches, it also stores the problem.
1302
1303 Then, when the user sends a `SIGINT`, the script stops testing and goes into
1304 report mode. One-by-one, it will go through the "checklist," the list of failed
1305 problems, and present each problem to the user, as well as whether this `bc` or
1306 `dc` crashed, and its output versus GNU. Then the user can decide to add them as
1307 test cases, which it does automatically to the appropriate test file.
1308
1309 #### `release_settings.txt`
1310
1311 A text file of settings combinations that [`release.sh`][83] uses to ensure that
1312 `bc` and `dc` build and work with various default settings. [`release.sh`][83]
1313 simply reads it line by line and uses each line for one build.
1314
1315 #### `release.sh`
1316
1317 This script is for `bc` maintainers only. It runs `bc`, `dc`, and [`bcl`][156]
1318 through a gauntlet that is mostly meant to be used in preparation for a release.
1319
1320 It does the following:
1321
1322 1.      Builds every [build type][81], with every setting combo in
1323         [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1324         alone.
1325 2.      Builds every [build type][81], with every setting combo in
1326         [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1327         alone for 32-bit.
1328 3.      Does #1 and #2 for Debug, Release, Release with Debug Info, and Min Size
1329         Release builds.
1330 4.      Runs the [test suite][124] on every build, if desired.
1331 5.      Runs the [test suite][124] under [ASan, UBSan, and MSan][21] for every build
1332         type/setting combo.
1333 6.      Runs [`scripts/karatsuba.py`][78] in test mode.
1334 7.      Runs the [test suite][124] for both calculators, `bc` alone, and `dc` alone
1335         under [valgrind][20] and errors if there are any memory bugs or memory
1336         leaks.
1337
1338 #### `safe-install.sh`
1339
1340 A script copied from [musl][101] to atomically install files.
1341
1342 #### `test_settings.sh`
1343
1344 A quick and dirty script to help automate rebuilding while manually testing the
1345 various default settings.
1346
1347 This script uses [`test_settings.txt`][103] to generate the various settings
1348 combos.
1349
1350 For more information about settings, see [Settings][102] in the [build
1351 manual][14].
1352
1353 #### `test_settings.txt`
1354
1355 A list of the various settings combos to be used by [`test_settings.sh`][104].
1356
1357 ### `src/`
1358
1359 This folder is, obviously, where the actual heart and soul of `bc`, the source
1360 code, is.
1361
1362 All of the source files are in one folder; this simplifies the build system
1363 immensely.
1364
1365 There are separate files for `bc` and `dc` specific code ([`bc.c`][40],
1366 [`bc_lex.c`][41], [`bc_parse.c`][42], [`dc.c`][44], [`dc_lex.c`][45], and
1367 [`dc_parse.c`][46]) where possible because it is cleaner to exclude an entire
1368 source file from a build than to have `#if`/`#endif` preprocessor guards.
1369
1370 That said, it was easier in many cases to use preprocessor macros where both
1371 calculators used much of the same code and data structures, so there is a
1372 liberal sprinkling of them through the code.
1373
1374 #### `args.c`
1375
1376 Code for processing command-line arguments.
1377
1378 The header for this file is [`include/args.h`][31].
1379
1380 #### `bc.c`
1381
1382 The code for the `bc` main function `bc_main()`.
1383
1384 The header for this file is [`include/bc.h`][106].
1385
1386 #### `bc_lex.c`
1387
1388 The code for lexing that only `bc` needs.
1389
1390 The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106].
1391
1392 #### `bc_parse.c`
1393
1394 The code for parsing that only `bc` needs. This code is the most complex and
1395 subtle in the entire codebase.
1396
1397 The headers for this file are [`include/parse.h`][181] and
1398 [`include/bc.h`][106].
1399
1400 #### `data.c`
1401
1402 Due to [historical accident][23] because of a desire to get my `bc` into
1403 [toybox][16], all of the constant data that `bc` needs is all in one file. This
1404 is that file.
1405
1406 There is no code in this file, but a lot of the const data has a heavy influence
1407 on code, including the order of data in arrays because that order has to
1408 correspond to the order of other things elsewhere in the codebase. If you change
1409 the order of something in this file, run `make test`, and get errors, you
1410 changed something that depends on the order that you messed up.
1411
1412 Almost all headers have `extern` references to items in this file.
1413
1414 #### `dc.c`
1415
1416 The code for the `dc` main function `dc_main()`.
1417
1418 The header for this file is [`include/dc.h`][182].
1419
1420 #### `dc_lex.c`
1421
1422 The code for lexing that only `dc` needs.
1423
1424 The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182].
1425
1426 #### `dc_parse.c`
1427
1428 The code for parsing that only `dc` needs.
1429
1430 The headers for this file are [`include/parse.h`][181] and
1431 [`include/bc.h`][182].
1432
1433 #### `file.c`
1434
1435 The code for `bc`'s implementation of buffered I/O. For more information about
1436 why I implemented my own buffered I/O, see [`include/file.h`][55], [Error
1437 Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes
1438 about version [`3.0.0`][32] in the [`NEWS`][32].
1439
1440 The header for this file is [`include/file.h`][55].
1441
1442 #### `history.c`
1443
1444 The code for `bc`'s implementation of command-line editing/history, which is
1445 based on a [UTF-8-aware fork][28] of [`linenoise`][29].
1446
1447 For more information, see the [Command-Line History][189] section.
1448
1449 The header for this file is [`include/history.h`][36].
1450
1451 #### `lang.c`
1452
1453 The data structures used for actual execution of `bc` and `dc` code.
1454
1455 While execution is done in [`src/program.c`][53], this file defines functions
1456 for initializing, copying, and freeing the data structures, which is somewhat
1457 orthogonal to actual execution.
1458
1459 Yes, it's misnamed; that's an accident of history where the first things I put
1460 into it all seemed related to the `bc` language.
1461
1462 The header for this file is [`include/lang.h`][38].
1463
1464 #### `lex.c`
1465
1466 The code for the common things that both programs need for lexing.
1467
1468 The header for this file is [`include/lex.h`][180].
1469
1470 #### `library.c`
1471
1472 The code to implement the public API of the `bcl` library.
1473
1474 The code in this file does a lot to ensure that clients do not have to worry
1475 about internal `bc` details, especially error handling with `setjmp()` and
1476 `longjmp()`. That and encapsulating the handling of numbers are the bulk of what
1477 the code in this file actually does because most of the library is still
1478 implemented in [`src/num.c`][39].
1479
1480 The headers for this file are [`include/bcl.h`][30] and
1481 [`include/library.h`][183].
1482
1483 #### `main.c`
1484
1485 The entry point for both programs; this is the `main()` function.
1486
1487 This file has no headers associated with it.
1488
1489 #### `num.c`
1490
1491 The code for all of the arbitrary-precision [numbers][177] and [math][178] in
1492 `bc`.
1493
1494 The header for this file is [`include/num.h`][184].
1495
1496 #### `opt.c`
1497
1498 The code for parsing command-line options.
1499
1500 The header for this file is [`include/opt.h`][35].
1501
1502 #### `parse.c`
1503
1504 The code for the common items that both programs need for parsing.
1505
1506 The header for this file is [`include/parse.h`][181].
1507
1508 #### `program.c`
1509
1510 The code for the actual execution engine for `bc` and `dc` code.
1511
1512 The header for this file is [`include/program.h`][57].
1513
1514 #### `rand.c`
1515
1516 The code for the [pseudo-random number generator (PRNG)][179] and the special
1517 stack handling it needs.
1518
1519 The PRNG only generates fixed-size integers. The magic of generating random
1520 numbers of arbitrary size is actually given to the code that does math
1521 ([`src/num.c`][39]).
1522
1523 The header for this file is [`include/rand.h`][37].
1524
1525 #### `read.c`
1526
1527 The code for reading from files and `stdin`.
1528
1529 The header for this file is [`include/read.h`][185].
1530
1531 #### `vector.c`
1532
1533 The code for [vectors][111], [maps][186], and [slab vectors][187], along with
1534 slabs.
1535
1536 The header for this file is [`include/vector.h`][174].
1537
1538 #### `vm.c`
1539
1540 The code for setting up and running `bc` and `dc`.
1541
1542 It is so named because I think of it as the "virtual machine" of `bc`, though
1543 that is probably not true as [`program.h`][57] is probably the "virtual machine"
1544 code. Thus, the name is more historical accident.
1545
1546 The header for this file is [`include/vm.h`][27].
1547
1548 ### `tests/`
1549
1550 This directory contains the entire [test suite][124] and its infrastructure.
1551
1552 #### `all.sh`
1553
1554 A convenience script for the `make run_all_tests` target (see the [Group
1555 Tests][141] section for more information).
1556
1557 #### `all.txt`
1558
1559 The file with the names of the calculators. This is to make it easier for the
1560 test scripts to know where the standard and other test directories are.
1561
1562 #### `bcl.c`
1563
1564 The test for the [`bcl`][156] API. For more information, see the [`bcl`
1565 Test][157] section.
1566
1567 #### `error.sh`
1568
1569 The script to run the file-based error tests in `tests/<calculator>/errors/` for
1570 each calculator. For more information, see the [Error Tests][151] section.
1571
1572 This is a separate script so that each error file can be run separately and in
1573 parallel.
1574
1575 #### `errors.sh`
1576
1577 The script to run the line-based error tests in `tests/<calculator>/errors.txt`
1578 for each calculator. For more information, see the [Error Tests][151] section.
1579
1580 #### `extra_required.txt`
1581
1582 The file with the list of tests which both calculators have that need the [Extra
1583 Math build option][188]. This exists to make it easy for test scripts to skip
1584 those tests when the [Extra Math build option][188] is disabled.
1585
1586 #### `history.py`
1587
1588 The file with all of the history tests. For more information, see the [History
1589 Tests][155] section.
1590
1591 #### `history.sh`
1592
1593 The script to integrate [`history.py`][139] into the build system in a portable
1594 way, and to skip it if necessary.
1595
1596 This script also re-runs the test three times if it fails. This is because
1597 `pexpect` can be flaky at times.
1598
1599 #### `other.sh`
1600
1601 The script to run the "other" (miscellaneous) tests for each calculator. For
1602 more information, see the [Other Tests][154] section.
1603
1604 #### `read.sh`
1605
1606 The script to run the read tests for each calculator. For more information, see
1607 the [`read()` Tests][153] section.
1608
1609 #### `script.sed`
1610
1611 The `sed` script to edit the output of GNU `bc` when generating script tests.
1612 For more information, see the [Script Tests][150] section.
1613
1614 #### `script.sh`
1615
1616 The script for running one script test. For more information, see the [Script
1617 Tests][150] section.
1618
1619 #### `scripts.sh`
1620
1621 The script to help the `make run_all_tests` (see the [Group Tests][141] section)
1622 run all of the script tests.
1623
1624 #### `stdin.sh`
1625
1626 The script to run the `stdin` tests for each calculator. For more information,
1627 see the [`stdin` Tests][152] section.
1628
1629 #### `test.sh`
1630
1631 The script to run one standard test. For more information, see the [Standard
1632 Tests][149] section.
1633
1634 #### `bc/`
1635
1636 The standard tests directory for `bc`. For more information, see the [Standard
1637 Tests][149] section.
1638
1639 ##### `all.txt`
1640
1641 The file to tell the build system and `make run_all_tests` (see the [Group
1642 Tests][141] section) what standard tests to run for `bc`, as well as in what
1643 order.
1644
1645 This file just lists the test names, one per line.
1646
1647 ##### `errors.txt`
1648
1649 The initial error test file for `bc`. This file has one test per line. See the
1650 [Error Tests][151] section for more information.
1651
1652 ##### `posix_errors.txt`
1653
1654 The file of tests for POSIX compatibility for `bc`. This file has one test per
1655 line. For more information, see the [Error Tests][151] section.
1656
1657 ##### `timeconst.sh`
1658
1659 The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6].
1660 For more information, see the [Linux `timeconst.bc` Script][191]section.
1661
1662 ##### `errors/`
1663
1664 The directory with error tests for `bc`, most discovered by AFL++ (see the
1665 [Fuzzing][82] section). There is one test per file. For more information, see
1666 the [Error Tests][151] section.
1667
1668 ##### `scripts/`
1669
1670 The script tests directory for `bc`. For more information, see the [Script
1671 Tests][150] section.
1672
1673 ###### `all.txt`
1674
1675 A file to tell the build system and `make run_all_tests` (see the [Group
1676 Tests][141] section) what script tests to run for `bc`, as well as in what
1677 order.
1678
1679 This file just lists the test names, one per line.
1680
1681 #### `dc/`
1682
1683 The standard tests directory for `dc`. For more information, see the [Standard
1684 Tests][149] section.
1685
1686 ##### `all.txt`
1687
1688 The file to tell the build system and `make run_all_tests` (see the [Group
1689 Tests][141] section) what standard tests to run for `dc`, as well as in what
1690 order.
1691
1692 This file just lists the test names, one per line.
1693
1694 ##### `errors.txt`
1695
1696 The initial error test file for `dc`. This file has one test per line. See the
1697 [Error Tests][151] section for more information.
1698
1699 ##### `read_errors.txt`
1700
1701 The file of tests errors with the `?` command (`read()` in `bc`). This file has
1702 one test per line. See the [Error Tests][151] section for more information.
1703
1704 ##### `errors/`
1705
1706 The directory with error tests for `dc`, most discovered by AFL++ (see the
1707 [Fuzzing][82] section). There is one test per file. For more information, see
1708 the [Error Tests][151] section.
1709
1710 ##### `scripts/`
1711
1712 The script tests directory for `dc`. For more information, see the [Script
1713 Tests][150] section.
1714
1715 ###### `all.txt`
1716
1717 The file to tell the build system and `make run_all_tests` (see the [Group
1718 Tests][141] section) what script tests to run for `dc`, as well as in what
1719 order.
1720
1721 This file just lists the test names, one per line.
1722
1723 #### `fuzzing/`
1724
1725 The directory containing the fuzzing infrastructure. For more information, see
1726 the [Fuzzing][82] section.
1727
1728 ##### `bc_afl_continue.yaml`
1729
1730 The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a
1731 fuzz run. For more information, see the [Convenience][130] subsection of the
1732 [Fuzzing][82] section.
1733
1734 ##### `bc_afl.yaml`
1735
1736 The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a
1737 fuzz run. For more information, see the [Convenience][130] subsection of the
1738 [Fuzzing][82] section.
1739
1740 Be aware that this will delete all previous unsaved fuzzing tests in the output
1741 directories.
1742
1743 ##### `bc_inputs1/`
1744
1745 The fuzzing input directory for the first third of inputs for `bc`. For more
1746 information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1747
1748 ##### `bc_inputs2/`
1749
1750 The fuzzing input directory for the second third of inputs for `bc`. For more
1751 information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1752
1753 ##### `bc_inputs3/`
1754
1755 The fuzzing input directory for the third third of inputs for `bc`. For more
1756 information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1757
1758 ##### `dc_inputs/`
1759
1760 The fuzzing input directory for the inputs for `dc`. For more information, see
1761 the [Corpuses][192] subsection of the [Fuzzing][82] section.
1762
1763 ### `vs/`
1764
1765 The directory containing all of the materials needed to build `bc`, `dc`, and
1766 `bcl` on Windows.
1767
1768 #### `bcl.sln`
1769
1770 A Visual Studio solution file for [`bcl`][156]. This, along with
1771 [`bcl.vcxproj`][63] and [`bcl.vcxproj.filters`][64] is what makes it possible to
1772 build [`bcl`][156] on Windows.
1773
1774 #### `bcl.vcxproj`
1775
1776 A Visual Studio project file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1777 and [`bcl.vcxproj.filters`][64] is what makes it possible to build [`bcl`][156]
1778 on Windows.
1779
1780 #### `bcl.vcxproj.filters`
1781
1782 A Visual Studio filters file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1783 and [`bcl.vcxproj`][63] is what makes it possible to build [`bcl`][156] on
1784 Windows.
1785
1786 #### `bc.sln`
1787
1788 A Visual Studio solution file for `bc`. This, along with [`bc.vcxproj`][66]
1789 and [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on
1790 Windows.
1791
1792 #### `bc.vcxproj`
1793
1794 A Visual Studio project file for `bc`. This, along with [`bc.sln`][68] and
1795 [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on Windows.
1796
1797 #### `bc.vcxproj.filters`
1798
1799 A Visual Studio filters file for `bc`. This, along with [`bc.sln`][68] and
1800 [`bc.vcxproj`][66] is what makes it possible to build `bc` on Windows.
1801
1802 #### `tests/`
1803
1804 A directory of files to run tests on Windows.
1805
1806 ##### `tests_bc.bat`
1807
1808 A file to run basic `bc` tests on Windows. It expects that it will be run from
1809 the directory containing it, and it also expects a `bc.exe` in the same
1810 directory.
1811
1812 ##### `tests_dc.bat`
1813
1814 A file to run basic `dc` tests on Windows. It expects that it will be run from
1815 the directory containing it, and it also expects a `bc.exe` in the same
1816 directory.
1817
1818 ## Build System
1819
1820 The build system is described in detail in the [build manual][14], so
1821 maintainers should start there. This section, however, describes some parts of
1822 the build system that only maintainers will care about.
1823
1824 ### Clean Targets
1825
1826 `bc` has a default `make clean` target that cleans up the build files. However,
1827 because `bc`'s build system can generate many different types of files, there
1828 are other clean targets that may be useful:
1829
1830 * `make clean_gen` cleans the `gen/strgen` executable generated from
1831   [`gen/strgen.c`][15]. It has no prerequisites.
1832 * `make clean` cleans object files, `*.cat` files (see the [Locales][85]
1833   section), executables, and files generated from text files in [`gen/`][145],
1834   including `gen/strgen` if it was built. So this has a prerequisite on
1835   `make clean_gen` in normal use.
1836 * `make clean_benchmarks` cleans [benchmarks][144], including the `ministat`
1837   executable. It has no prerequisites.
1838 * `make clean_config` cleans the generated `Makefile` and the manuals that
1839   [`configure.sh`][69] copied in preparation for install. It also depends on
1840   `make clean` and `make clean_benchmarks`, so it cleans those items too. This
1841   is the target that [`configure.sh`][69] uses before it does its work.
1842 * `make clean_coverage` cleans the generated coverage files for the [test
1843   suite][124]'s [code coverage][146] capabilities. It has no prerequisites. This
1844   is useful if the code coverage tools are giving errors.
1845 * `make clean_tests` cleans *everything*. It has prerequisites on all previous
1846   clean targets, but it also cleans all of the [generated tests][143].
1847
1848 When adding more generated files, you may need to add them to one of these
1849 targets and/or add a target for them especially.
1850
1851 ### Preprocessor Macros
1852
1853 `bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type:
1854
1855 * builds,
1856 * works under the [test suite][124], and
1857 * excludes as much code as possible from all builds.
1858
1859 This section will explain the preprocessor style of `bc` and `dc`, as well as
1860 provide an explanation of the macros used.
1861
1862 #### Style
1863
1864 The style of macro use in `bc` is pretty straightforward: I avoid depending on
1865 macro definitions and instead, I set defaults if the macro is not defined and
1866 then test the value if the macro with a plain `#if`.
1867
1868 (Some examples of setting defaults are in [`include/status.h`][176], just above
1869 the definition of the `BcStatus` enum.)
1870
1871 In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible.
1872
1873 There are a couple of cases where I went with standard stuff instead.
1874
1875 #### Standard Macros
1876
1877 `BC_ENABLED`
1878
1879 :   This macro expands to `1` if `bc` is enabled, `0` if disabled.
1880
1881 `DC_ENABLED`
1882
1883 :   This macro expands to `1` if `dc` is enabled, `0` if disabled.
1884
1885 `BUILD_TYPE`
1886
1887 :   The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`,
1888     `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct
1889     the user to the correct markdown manual in the `git.gavinhoward.com`
1890     website.
1891
1892 `EXECPREFIX`
1893
1894 :   This macro expands to the prefix on the executable name. This is used to
1895     allow `bc` and `dc` to skip the prefix when finding out which calculator is
1896     executing.
1897
1898 `BC_NUM_KARATSUBA_LEN`
1899
1900 :   This macro expands to an integer, which is the length of numbers below which
1901     the Karatsuba multiplication algorithm switches to brute-force
1902     multiplication.
1903
1904 `BC_ENABLE_EXTRA_MATH`
1905
1906 :   This macro expands to `1` if the [Extra Math build option][188] is enabled,
1907     `0` if disabled.
1908
1909 `BC_ENABLE_HISTORY`
1910
1911 :   This macro expands to `1` if the [History build option][193] is enabled, `0`
1912     if disabled.
1913
1914 `BC_ENABLE_NLS`
1915
1916 :   This macro expands to `1` if the [NLS build option][193] (for locales) is
1917     enabled, `0` if disabled.
1918
1919 `BC_ENABLE_LIBRARY`
1920
1921 :   This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if
1922     disabled. If this is enabled, building the calculators themselves is
1923     disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero.
1924
1925 `BC_ENABLE_MEMCHECK`
1926
1927 :   This macro expands to `1` if `bc` has been built for use with Valgrind's
1928     [Memcheck][194], `0` otherwise. This ensures that fatal errors still free
1929     all of their memory when exiting. `bc` does not do that normally because
1930     what's the point?
1931
1932 `BC_ENABLE_AFL`
1933
1934 :   This macro expands to `1` if `bc` has been built for fuzzing with
1935     [AFL++][125], `0` otherwise. See the [Fuzzing][82] section for more
1936     information.
1937
1938 `BC_DEFAULT_BANNER`
1939
1940 :   This macro expands to the default value for displaying the `bc` banner.
1941
1942 `BC_DEFAULT_SIGINT_RESET`
1943
1944 :   The macro expands to the default value for whether or not `bc` should reset
1945     on `SIGINT` or quit.
1946
1947 `BC_DEFAULT_TTY_MODE`
1948
1949 :   The macro expands to the default value for whether or not `bc` should use
1950     TTY mode when it available.
1951
1952 `BC_DEFAULT_PROMPT`
1953
1954 :   This macro expands to the default value for whether or not `bc` should use a
1955     prompt when TTY mode is available.
1956
1957 `DC_DEFAULT_SIGINT_RESET`
1958
1959 :   The macro expands to the default value for whether or not `dc` should reset
1960     on `SIGINT` or quit.
1961
1962 `DC_DEFAULT_TTY_MODE`
1963
1964 :   The macro expands to the default value for whether or not `dc` should use
1965     TTY mode when it available.
1966
1967 `DC_DEFAULT_PROMPT`
1968
1969 :   This macro expands to the default value for whether or not `dc` should use a
1970     prompt when TTY mode is available.
1971
1972 `BC_DEBUG_CODE`
1973
1974 :   If this macro expands to a non-zero integer, then `bc` is built with *a lot*
1975     of extra debugging code. This is never set by the build system and must be
1976     set by the programmer manually. This should never be set in builds given to
1977     end users. For more information, see the [Debugging][134] section.
1978
1979 ## Test Suite
1980
1981 While the source code may be the heart and soul of `bc`, the test suite is the
1982 arms and legs: it gives `bc` the power to do anything it needs to do.
1983
1984 The test suite is what allowed `bc` to climb to such high heights of quality.
1985 This even goes for fuzzing because fuzzing depends on the test suite for its
1986 input corpuses. (See the [Fuzzing][82] section.)
1987
1988 Understanding how the test suite works should be, I think, the first thing that
1989 maintainers learn after learning what `bc` and `dc` should do. This is because
1990 the test suite, properly used, gives confidence that changes have not caused
1991 bugs or regressions.
1992
1993 That is why I spent the time to make the test suite as easy to use and as fast
1994 as possible.
1995
1996 To use the test suite (assuming `bc` and/or `dc` are already built), run the
1997 following command:
1998
1999 ```
2000 make test
2001 ```
2002
2003 That's it. That's all.
2004
2005 It will return an error code if the test suite failed. It will also print out
2006 information about the failure.
2007
2008 If you want the test suite to go fast, then run the following command:
2009
2010 ```
2011 make -j<cores> test
2012 ```
2013
2014 Where `<cores>` is the number of cores that your computer has. Of course, this
2015 requires a `make` implementation that supports that option, but most do. (And I
2016 will use this convention throughout the rest of this section.)
2017
2018 I have even tried as much as possible, to put longer-running tests near the
2019 beginning of the run so that the entire suite runs as fast as possible.
2020
2021 However, if you want to be sure which test is failing, then running a bare
2022 `make test` is a great way to do that.
2023
2024 But enough about how you have no excuses to use the test suite as much as
2025 possible; let's talk about how it works and what you *can* do with it.
2026
2027 ### Standard Tests
2028
2029 The heavy lifting of testing the math in `bc`, as well as basic scripting, is
2030 done by the "standard tests" for each calculator.
2031
2032 These tests use the files in the [`tests/bc/`][161] and [`tests/dc/`][162]
2033 directories (except for [`tests/bc/all.txt`][163], [`tests/bc/errors.txt`][164],
2034 [`tests/bc/posix_errors.txt`][165], [`tests/bc/timeconst.sh`][166],
2035 [`tests/dc/all.txt`][167], [`tests/dc/errors.txt`][168], and
2036 [`tests/dc/read_errors.txt`][175]), which are called the "standard test
2037 directories."
2038
2039 For every test, there is the test file and the results file. The test files have
2040 names of the form `<test>.txt`, where `<test>` is the name of the test, and the
2041 results files have names of the form `<test>_results.txt`.
2042
2043 If the test file exists but the results file does not, the results for that test
2044 are generated by a GNU-compatible `bc` or `dc`. See the [Generated Tests][143]
2045 section.
2046
2047 The `all.txt` file in each standard tests directory is what tells the test suite
2048 and [build system][142] what tests there are, and the tests are either run in
2049 that order, or in the case of parallel `make`, that is the order that the
2050 targets are listed as prerequisites of `make test`.
2051
2052 If the test exists in the `all.txt` file but does not *actually* exist, the test
2053 and its results are generated by a GNU-compatible `bc` or `dc`. See the
2054 [Generated Tests][143] section.
2055
2056 To add a non-generated standard test, do the following:
2057
2058 * Add the test file (`<test>.txt` in the standard tests directory).
2059 * Add the results file (`<test>_results.txt` in the standard tests directory).
2060   You can skip this step if just the results file needs to be generated. See the
2061   [Generated Tests][147] section for more information.
2062 * Add the name of the test to the `all.txt` file in the standard tests
2063   directory, putting it in the order it should be in. If possible, I would put
2064   longer tests near the beginning because they will start running earlier with
2065   parallel `make`. I always keep `decimal` first, though, as a smoke test.
2066
2067 If you need to add a generated standard test, see the [Generated Tests][147]
2068 section for how to do that.
2069
2070 Some standard tests need to be skipped in certain cases. That is handled by the
2071 [build system][142]. See the [Integration with the Build System][147] section
2072 for more details.
2073
2074 In addition to all of the above, the standard test directory is not only the
2075 directory for the standard tests of each calculator, it is also the parent
2076 directory of all other test directories for each calculator.
2077
2078 #### `bc` Standard Tests
2079
2080 The list of current (27 February 2023) standard tests for `bc` is below:
2081
2082 decimal
2083
2084 :   Tests decimal parsing and printing.
2085
2086 print
2087
2088 :   Tests printing in every base from decimal. This is near the top for
2089     performance of parallel testing.
2090
2091 parse
2092
2093 :   Tests parsing in any base and outputting in decimal. This is near the top
2094     for performance of parallel testing.
2095
2096 lib2
2097
2098 :   Tests the extended math library. This is near the top for performance of
2099     parallel testing.
2100
2101 print2
2102
2103 :   Tests printing at the extreme values of `obase`.
2104
2105 length
2106
2107 :   Tests the `length()` builtin function.
2108
2109 scale
2110
2111 :   Tests the `scale()` builtin function.
2112
2113 shift
2114
2115 :   Tests the left (`<<`) and right (`>>`) shift operators.
2116
2117 add
2118
2119 :   Tests addition.
2120
2121 subtract
2122
2123 :   Tests subtraction.
2124
2125 multiply
2126
2127 :   Tests multiplication.
2128
2129 divide
2130
2131 :   Tests division.
2132
2133 modulus
2134
2135 :   Tests modulus.
2136
2137 power
2138
2139 :   Tests power (exponentiation).
2140
2141 sqrt
2142
2143 :   Tests the `sqrt()` (square root) builtin function.
2144
2145 trunc
2146
2147 :   Tests the truncation (`$`) operator.
2148
2149 places
2150
2151 :   Tests the places (`@`) operator.
2152
2153 vars
2154
2155 :   Tests some usage of variables. This one came from [AFL++][125] I think.
2156
2157 boolean
2158
2159 :   Tests boolean operators.
2160
2161 comp
2162
2163 :   Tests comparison operators.
2164
2165 abs
2166
2167 :   Tests the `abs()` builtin function.
2168
2169 assignments
2170
2171 :   Tests assignment operators, including increment/decrement operators.
2172
2173 functions
2174
2175 :   Tests functions, specifically function parameters being replaced before they
2176     themselves are used. See the comment in `bc_program_call()` about the last
2177     condition.
2178
2179 scientific
2180
2181 :   Tests scientific notation.
2182
2183 engineering
2184
2185 :   Tests engineering notation.
2186
2187 globals
2188
2189 :   Tests that assigning to globals affects callers.
2190
2191 strings
2192
2193 :   Tests strings.
2194
2195 strings2
2196
2197 :   Tests string allocation in slabs, to ensure slabs work.
2198
2199 letters
2200
2201 :   Tests single and double letter numbers to ensure they behave differently.
2202     Single-letter numbers always be set to the same value, regardless of
2203     `ibase`.
2204
2205 exponent
2206
2207 :   Tests the `e()` function in the math library.
2208
2209 log
2210
2211 :   Tests the `l()` function in the math library.
2212
2213 pi
2214
2215 :   Tests that `bc` produces the right value of pi for numbers with varying
2216     `scale` values.
2217
2218 arctangent
2219
2220 :   Tests the `a()` function in the math library.
2221
2222 sine
2223
2224 :   Tests the `s()` function in the math library.
2225
2226 cosine
2227
2228 :   Tests the `c()` function in the math library.
2229
2230 bessel
2231
2232 :   Tests the `j()` function in the math library.
2233
2234 fib
2235
2236 :   Tests the `fib()` Fibonacci function in the extended math library.
2237
2238 arrays
2239
2240 :   Test arrays.
2241
2242 misc
2243
2244 :   Miscellaneous tests. I named it this because at the time, I struggled to
2245     classify them, but it's really testing multi-line numbers.
2246
2247 misc1
2248
2249 :   A miscellaneous test found by [AFL++][125].
2250
2251 misc2
2252
2253 :   A miscellaneous test found by [AFL++][125].
2254
2255 misc3
2256
2257 :   A miscellaneous test found by [AFL++][125].
2258
2259 misc4
2260
2261 :   A miscellaneous test found by [AFL++][125].
2262
2263 misc5
2264
2265 :   A miscellaneous test found by [AFL++][125].
2266
2267 misc6
2268
2269 :   A miscellaneous test found by [AFL++][125].
2270
2271 misc7
2272
2273 :   A miscellaneous test found by [AFL++][125].
2274
2275 void
2276
2277 :   Tests void functions.
2278
2279 rand
2280
2281 :   Tests the pseudo-random number generator and its special stack handling.
2282
2283 rand_limits
2284
2285 :   Tests the limits of the pseudo-random number generator `irand()`.
2286
2287 recursive_arrays
2288
2289 :   Tested the slab vector undo ability in used in `bc_parse_name()` when it
2290     existed. Now used as a stress test.
2291
2292 divmod
2293
2294 :   Tests divmod.
2295
2296 modexp
2297
2298 :   Tests modular exponentiation.
2299
2300 bitfuncs
2301
2302 :   Tests the bitwise functions, `band()`, `bor()`, `bxor()`, `blshift()` and
2303     `brshift()` in [`gen/lib2.bc`][26].
2304
2305 leadingzero
2306
2307 :   Tests the leading zero functionality and the `plz*()` and `pnlz*()`
2308     functions in [`gen/lib2.bc`][26].
2309
2310 is_number
2311
2312 :   Tests the `is_number()` built-in function.
2313
2314 is_string
2315
2316 :   Tests the `is_number()` built-in function.
2317
2318 asciify_array
2319
2320 :   Tests the ability of `asciify()` to convert an array into a full string.
2321
2322 line_by_line1
2323
2324 :   Tests the line-by-line behavior of `bc` with regards to `quit` in a function
2325     definition.
2326
2327 line_by_line2
2328
2329 :   Tests the line-by-line behavior of `bc` with regards to `quit`.
2330
2331 line_loop_quit1
2332
2333 :   Tests the behavior of `bc` with a `quit` after a single-line loop.
2334
2335 line_loop_quit2
2336
2337 :   Tests the behavior of `bc` with a `quit` after a single-line loop and a
2338     newline escape.
2339
2340 #### `dc` Standard Tests
2341
2342 The list of current (27 February 2023) standard tests for `dc` is below:
2343
2344 decimal
2345
2346 :   Tests decimal parsing and printing.
2347
2348 length
2349
2350 :   Tests the `length()` builtin function, including for strings and arrays.
2351
2352 stack_len
2353
2354 :   Tests taking the length of the results stack.
2355
2356 exec_stack_len
2357
2358 :   Tests taking the length of the execution stack.
2359
2360 add
2361
2362 :   Tests addition.
2363
2364 subtract
2365
2366 :   Tests subtraction.
2367
2368 multiply
2369
2370 :   Tests multiplication.
2371
2372 divide
2373
2374 :   Tests division.
2375
2376 modulus
2377
2378 :   Tests modulus.
2379
2380 divmod
2381
2382 :   Tests divmod.
2383
2384 power
2385
2386 :   Tests power (exponentiation).
2387
2388 sqrt
2389
2390 :   Tests the `sqrt()` (square root) builtin function.
2391
2392 modexp
2393
2394 :   Tests modular exponentiation.
2395
2396 boolean
2397
2398 :   Tests boolean operators.
2399
2400 negate
2401
2402 :   Tests negation as a command and as part of numbers.
2403
2404 trunc
2405
2406 :   Tests the truncation (`$`) operator.
2407
2408 places
2409
2410 :   Tests the places (`@`) operator.
2411
2412 shift
2413
2414 :   Tests the left (`<<`) and right (`>>`) shift operators.
2415
2416 abs
2417
2418 :   Tests the `abs()` builtin function (the `b` command).
2419
2420 scientific
2421
2422 :   Tests scientific notation.
2423
2424 engineering
2425
2426 :   Tests engineering notation.
2427
2428 vars
2429
2430 :   Tests some usage of variables. This one came from [AFL++][125] I think.
2431
2432 misc
2433
2434 :   Miscellaneous tests. I named it this because at the time, I struggled to
2435     classify them.
2436
2437 misc1
2438
2439 :   More miscellaneous tests. This used to be an error file
2440     (`tests/dc/errors/15.txt`) due to the presence of a invalid `u` character.
2441     However, starting with version `6.1.0`, `u` became a valid command
2442     (`is_number()`), so the error file became valid. It was checked manually and
2443     moved, and `tests/dc/errors/34.txt` became the new `tests/dc/errors/15.txt`.
2444
2445 strings
2446
2447 :   Tests strings.
2448
2449 rand
2450
2451 :   Tests the pseudo-random number generator and its special stack handling.
2452
2453 is_number
2454
2455 :   Tests the `is_number()` built-in function (the `u` command).
2456
2457 is_string
2458
2459 :   Tests the `is_number()` built-in function (the `t` command).
2460
2461 ### Script Tests
2462
2463 The heavy lifting of testing the scripting of `bc` is done by the "script tests"
2464 for each calculator.
2465
2466 These tests use the files in the [`tests/bc/scripts/`][169] and
2467 [`tests/dc/scripts/`][170] directories (except for
2468 [`tests/bc/scripts/all.txt`][171] and [`tests/dc/scripts/all.txt`][172]), which
2469 are called the "script test directories."
2470
2471 To add a script test, do the following:
2472
2473 * Add the test file (`<test>.bc` or `<test>.dc` in the script tests directory).
2474 * Add the results file (`<test>.txt` in the script tests directory). You can
2475   skip this step if just the results file needs to be generated. See the
2476   [Generated Tests][147] section for more information.
2477 * Add the name of the test to the `all.txt` file in the script tests directory,
2478   putting it in the order it should be in. If possible, I would put longer tests
2479   near the beginning because they will start running earlier with parallel
2480   `make`.
2481
2482 Some script tests need to be skipped in certain cases. That is handled by the
2483 [build system][142]. See the [Integration with the Build System][147] section
2484 for more details.
2485
2486 Another unique thing about the script tests, at least for `bc`: they test the
2487 `-g` and `--global-stacks` flags. This means that all of the script tests for
2488 `bc` are written assuming the `-g` flag was given on the command-line
2489
2490 There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed`
2491 script is used to remove an incompatibility with GNU `bc`.
2492
2493 If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU
2494 `bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is
2495 correct according to my reading of the `bc` spec, so my `bc` doesn't as well.
2496
2497 The `sed` script edits numbers that end with just one digit on a line by itself
2498 to put it on the same line as others.
2499
2500 #### `bc` Script Tests
2501
2502 The list of current (27 February 2023) script tests for `bc` is below:
2503
2504 print.bc
2505
2506 :   Tests printing even harder than the print standard test.
2507
2508 print2
2509
2510 :   Tests printing at the extreme edge of line length in various bases.
2511
2512 multiply.bc
2513
2514 :   Tests multiplication even harder than the multiply standard test.
2515
2516 divide.bc
2517
2518 :   Tests division even harder than the divide standard test.
2519
2520 subtract.bc
2521
2522 :   Tests subtraction even harder than the subtract standard test.
2523
2524 add.bc
2525
2526 :   Tests addition even harder than the add standard test.
2527
2528 parse.bc
2529
2530 :   Tests parsing even harder than the parse standard test.
2531
2532 root.bc
2533
2534 :   Tests that `root()` and `cbrt()` do not go into an infinite loop on a
2535     pathological case found by a user.
2536
2537 array.bc
2538
2539 :   Tests arrays even harder than the arrays standard test.
2540
2541 array2.bc
2542
2543 :   Implements a test where an array element is passed as a parameter to a
2544     function, and then another array is passed to a later parameter that is
2545     named the same as the first array. This was added because of a bug found
2546     while writing a script in bc.
2547
2548 atan.bc
2549
2550 :   Tests arctangent even harder than the arctangent standard test.
2551
2552 bessel.bc
2553
2554 :   Tests bessel even harder than the bessel standard test.
2555
2556 functions.bc
2557
2558 :   Tests functions even harder than the functions standard test.
2559
2560 globals.bc
2561
2562 :   Tests global stacks directly.
2563
2564 len.bc
2565
2566 :   Tests the `length()` builtin on arrays.
2567
2568 rand.bc
2569
2570 :   Tests the random number generator in the presence of global stacks.
2571
2572 references.bc
2573
2574 :   Tests functions with array reference parameters.
2575
2576 screen.bc
2577
2578 :   A random script provided by an early user that he used to calculate the size
2579     of computer screens
2580
2581 strings2.bc
2582
2583 :   Tests escaping in strings.
2584
2585 ifs.bc
2586
2587 :   Tests proper ending of `if` statements without `else` statements.
2588
2589 ifs2.bc
2590
2591 :   More tests proper ending of `if` statements without `else` statements.
2592
2593 #### `dc` Script Tests
2594
2595 The list of current (27 February 2023) script tests for `dc` is below:
2596
2597 prime.dc
2598
2599 :   Tests scripting by generating the first 100,000 primes.
2600
2601 asciify.dc
2602
2603 :   Tests the asciify command.
2604
2605 stream.dc
2606
2607 :   Tests the stream command.
2608
2609 array.dc
2610
2611 :   Tests arrays.
2612
2613 else.dc
2614
2615 :   Tests else clauses on conditional execution commands.
2616
2617 factorial.dc
2618
2619 :   Tests scripting with factorial.
2620
2621 loop.dc
2622
2623 :   Tests scripting by implementing loops.
2624
2625 quit.dc
2626
2627 :   Tests the quit command in the presence of tail calls.
2628
2629 weird.dc
2630
2631 :   A miscellaneous test.
2632
2633 no_clamp.dc
2634
2635 :   A test to ensure `dc` has the same behavior as the BSD `dc` with digi
2636     clamping off when parsing numbers.
2637
2638 ### Error Tests
2639
2640 One of the most useful parts of the `bc` test suite, in my opinion, is the heavy
2641 testing of error conditions.
2642
2643 Just about every error condition I can think of is tested, along with many
2644 machine-generated (by [AFL++][125]) ones.
2645
2646 However, because the error tests will often return error codes, they require
2647 different infrastructure from the rest of the test suite, which assumes that
2648 the calculator under test will return successfully. A lot of that infrastructure
2649 is in the [`scripts/functions.sh`][105] script, but it basically allows the
2650 calculator to exit with an error code and then tests that there *was* an error
2651 code.
2652
2653 Besides returning error codes, error tests also ensure that there is output from
2654 `stderr`. This is to make sure that an error message is always printed.
2655
2656 The error tests for each calculator are spread through two directories, due to
2657 historical accident. These two directories are the standard test directory (see
2658 the [Standard Tests][149] section) and the `errors/` directory directly
2659 underneath the standard tests directory.
2660
2661 This split is convenient, however, because the tests in each directory are
2662 treated differently.
2663
2664 The error tests in the standard test directory, which include `errors.txt` for
2665 both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for `dc`,
2666 are run by [`tests/errors.sh`][226]. It reads them line-by-line and shoves the
2667 data through `stdin`. Each line is considered a separate test. For this reason,
2668 there can't be any blank lines in the error files in the standard tests
2669 directory because a blank line causes a successful exit.
2670
2671 On the other hand, the tests in the `errors/` directory below the standard tests
2672 directory are run by [`tests/error.sh`][227] and are considered to be one test
2673 per file. As such, they are used differently. They are shoved into the
2674 calculator through `stdin`, but they are also executed by passing them on the
2675 command-line.
2676
2677 To add an error test, first figure out which kind you want.
2678
2679 Is it a simple one-liner, and you don't care if it's tested through a file?
2680
2681 Then put it in one of the error files in the standard test directory. I would
2682 only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()`
2683 errors in the `read_errors.txt` file for `dc`; all others I would put in the
2684 respective `errors.txt` file.
2685
2686 On the other hand, if you care if the error is run as a file on the
2687 command-line, or the error requires multiple lines to reproduce, then put the
2688 test in the respective `errors/` directory and run the [`configure.sh`][69]
2689 script again.
2690
2691 After that, you are done; the test suite will automatically pick up the new
2692 test, and you don't have to tell the test suite the expected results.
2693
2694 ### `stdin` Tests
2695
2696 The `stdin` tests specifically test the lexing and parsing of multi-line
2697 comments and strings. This is important because when reading from `stdin`, the
2698 calculators can only read one line at a time, so partial parses are possible.
2699
2700 To add `stdin` tests, just add the tests to the `stdin.txt` file in the
2701 respective standard tests directory, and add the expected results in the
2702 `stdin_results.txt` in the respective standard tests directory.
2703
2704 ### `read()` Tests
2705
2706 The `read()` tests are meant to test the `read()` builtin function, to ensure
2707 that the parsing and execution is correct.
2708
2709 Each line is one test, as that is the nature of using the `read()` function, so
2710 to add a test, just add it as another line in the `read.txt` file in the
2711 respective standard tests directory, and add its result to the
2712 `read_results.txt` file in the respective standard tests directory.
2713
2714 ### Other Tests
2715
2716 The "other" tests are just random tests that I could not easily classify under
2717 other types of tests. They usually include things like command-line parsing and
2718 environment variable testing.
2719
2720 To add an other test, it requires adding the programming for it to
2721 [`tests/other.sh`][195] because all of the tests are written specifically in
2722 that script. It would be best to use the infrastructure in
2723 [`scripts/functions.sh`][105].
2724
2725 ### Linux `timeconst.bc` Script
2726
2727 One special script that `bc`'s test suite will use is the [Linux `timeconst.bc`
2728 script][6].
2729
2730 I made the test suite able to use this script because the reason the
2731 [toybox][16] maintainer wanted my `bc` is because of this script, and I wanted
2732 to be sure that it would run correctly on the script.
2733
2734 However, it is not part of the distribution, nor is it part of the repository.
2735 The reason for this is because [`timeconst.bc`][6] is under the GPL, while this
2736 repo is under a BSD license.
2737
2738 If you want `bc` to run tests on [`timeconst.bc`][6], download it and place it
2739 at `tests/bc/scripts/timeconst.bc`. If it is there, the test suite will
2740 automatically run its tests; otherwise, it will skip it.
2741
2742 ### History Tests
2743
2744 There are automatic tests for history; however, they have dependencies: Python 3
2745 and [`pexpect`][137].
2746
2747 As a result, because I need the [test suite to be portable][138], like the rest
2748 of `bc`, the history tests are carefully guarded with things to ensure that they
2749 are skipped, rather than failing if Python and [`pexpect`][137] are not
2750 installed. For this reason, there is a `sh` script, [`tests/history.sh`][140]
2751 that runs the actual script, [`tests/history.py`][139].
2752
2753 I have added as many tests as I could to cover as many lines and branches as
2754 possible. I guess I could have done more, but doing so would have required a lot
2755 of time.
2756
2757 I have tried to make it as easy as possible to run the history tests. They will
2758 run automatically if you use the `make test_history` command, and they will also
2759 use parallel execution with `make -j<cores> test_history`.
2760
2761 However, the history tests are meant only to be run by maintainers of `bc`; they
2762 are *not* meant to be run by users and packagers. The reason for this is that
2763 they only seem to work reliably on Linux; `pexpect` seems to have issues on
2764 other platforms, especially timeout issues.
2765
2766 Thus, they are excluded from running with `make test` and [`tests/all.sh`][225].
2767 However, they can be run from the [`scripts/release.sh`][83] script.
2768
2769 All of the tests are contained in [`tests/history.py`][139]. The reason for this
2770 is because they are in Python, and I don't have an easy way of including Python
2771 (or at the very least, I am not familiar enough with Python to do that). So they
2772 are all in the same file to make it easier on me.
2773
2774 Each test is one function in the script. They all take the same number and type
2775 of arguments:
2776
2777 1.      `exe`: the executable to run.
2778 2.      `args`: the arguments to pass to the executable.
2779 3.      `env`: the environment.
2780
2781 Each function creates a child process with `pexpect.spawn` and then tests with
2782 that child. Then the function returns the child to the caller, who closes it
2783 and checks its error code against its expected error code.
2784
2785 Yes, the error code is not a success all the time. This is because of the UTF-8
2786 tests; `bc` gives a fatal error on any non-ASCII data because ASCII is all `bc`
2787 is required to handle, per the [standard][2].
2788
2789 So in [`tests/history.py`][139], there are four main arrays:
2790
2791 * `bc` test functions,
2792 * `bc` expected error codes.
2793 * `dc` test functions.
2794 * `dc` expected error codes.
2795
2796 [`tests/history.py`][139] takes an index as an argument; that index is what test
2797 it should run. That index is used to index into the proper test and error code
2798 array.
2799
2800 If you need to add more history tests, you need to do the following:
2801
2802 1.      Add the function for that test to [`tests/history.py`][139].
2803 2.      Add the function to the proper array of tests.
2804 3.      Add the expected error code to the proper array of error codes.
2805 4.      Add a target for the test to [`Makefile.in`][70].
2806 5.      Add that target as a prerequisite to either `test_bc_history` or
2807         `test_dc_history`.
2808
2809 You do not need to do anything to add the test to `history_all_tests` (see
2810 [Group Tests][141] below) because the scripts will automatically run all of the
2811 tests properly.
2812
2813 ### Generated Tests
2814
2815 Some tests are *large*, and as such, it is impractical to check them into `git`.
2816 Instead, the tests depend on the existence of a GNU-compatible `bc` in the
2817 `PATH`, which is then used to generate the tests.
2818
2819 If [`configure.sh`][69] was run with the `-G` argument, which disables generated
2820 tests, then `make test` and friends will automatically skip generated tests.
2821 This is useful to do on platforms that are not guaranteed to have a
2822 GNU-compatible `bc` installed.
2823
2824 However, adding a generated test is a complicated because you have to figure out
2825 *where* you want to put the file to generate the test.
2826
2827 For example, `bc`'s test suite will automatically use a GNU-compatible `bc` to
2828 generate a `<test>_results.txt` file in the [standard tests][149] directory
2829 (either `tests/bc/` or `tests/dc/`) if none exists for the `<test>` test. If no
2830 `<test>.txt` file exists in the [standard tests][149] directory, then `bc`'s
2831 test suite will look for a `<test>.bc` or `<test>.dc` file in the [script
2832 tests][150] directory (either `tests/bc/scripts` or `tests/dc/scripts`), and if
2833 that exists, it will use that script to generate the `<test>.txt` file in the
2834 [standard tests][149] directory after which it will generate the
2835 `<test>_results.txt` file in the [standard tests][149] directory.
2836
2837 So you can choose to either:
2838
2839 * Have a test in the [standard tests][149] directory without a corresponding
2840   `*_results.txt` file, or
2841 * Have a script in the [script tests][150] directory to generate the
2842   corresponding file in the standard test directory before generating the
2843   corresponding `*_results.txt` file.
2844
2845 Adding a script has a double benefit: the script itself can be used as a test.
2846 However, script test results can also be generated.
2847
2848 If `bc` is asked to run a script test, then if the script does not exist, `bc`'s
2849 test suite returns an error. If it *does* exist, but no corresponding
2850 `<test>.txt` file exists in the [script tests][150] directory, then a
2851 GNU-compatible `bc` is used to generate the `<test>.txt` results file.
2852
2853 If generated tests are disabled through [`configure.sh`][69], then these tests
2854 are not generated if they do not exist. However, if they *do* exist, then they
2855 are run. This can happen if a `make clean_tests` was not run between a build
2856 that generated tests and a build that will not.
2857
2858 ### Group Tests
2859
2860 While the test suite has a lot of targets in order to get parallel execution,
2861 there are five targets that allow you to run each section, or all, of the test
2862 suite as one unit:
2863
2864 * `bc_all_tests` (`bc` tests)
2865 * `timeconst_all_tests` ([Linux `timeconst.bc` script][6] tests)
2866 * `dc_all_tests` (`dc` tests)
2867 * `history_all_tests` (history tests)
2868 * `run_all_tests` (combination of the previous four)
2869
2870 In addition, there are more fine-grained targets available:
2871
2872 * `test_bc` runs all `bc` tests (except history tests).
2873 * `test_dc` runs all `dc` tests (except history tests).
2874 * `test_bc_tests` runs all `bc` [standard tests][149].
2875 * `test_dc_tests` runs all `dc` [standard tests][149].
2876 * `test_bc_scripts` runs all `bc` [script tests][150].
2877 * `test_dc_scripts` runs all `dc` [script tests][150].
2878 * `test_bc_stdin` runs the `bc` [`stdin` tests][152].
2879 * `test_dc_stdin` runs the `dc` [`stdin` tests][152].
2880 * `test_bc_read` runs the `bc` [`read()` tests][153].
2881 * `test_dc_read` runs the `dc` [`read()` tests][153].
2882 * `test_bc_errors` runs the `bc` [error tests][151].
2883 * `test_dc_errors` runs the `dc` [error tests][151].
2884 * `test_bc_other` runs the `bc` [other tests][151].
2885 * `test_dc_other` runs the `dc` [other tests][151].
2886 * `timeconst` runs the tests for the [Linux `timeconst.bc` script][6].
2887 * `test_history` runs all history tests.
2888 * `test_bc_history` runs all `bc` history tests.
2889 * `test_dc_history` runs all `dc` history tests.
2890
2891 All of the above tests are parallelizable.
2892
2893 ### Individual Tests
2894
2895 In addition to all of the above, individual test targets are available. These
2896 are mostly useful for attempting to fix a singular test failure.
2897
2898 These tests are:
2899
2900 * `test_bc_<test>`, where `<test>` is the name of a `bc` [standard test][149].
2901   The name is the name of the test file without the `.txt` extension. It is the
2902   name printed by the test suite when running the test.
2903 * `test_dc_<test>`, where `<test>` is the name of a `dc` [standard test][149].
2904   The name is the name of the test file without the `.txt` extension. It is the
2905   name printed by the test suite when running the test.
2906 * `test_bc_script_<test>`, where `<test>` is the name of a `bc` [script
2907   test][150]. The name of the test is the name of the script without the `.bc`
2908   extension.
2909 * `test_dc_script_<test>`, where `<test>` is the name of a `dc` [script
2910   test][150]. The name of the test is the name of the script without the `.dc`
2911   extension.
2912 * `test_bc_history<idx>` runs the `bc` history test with index `<idx>`.
2913 * `test_dc_history<idx>` runs the `dc` history test with index `<idx>`.
2914
2915 ### [`bcl`][156] Test
2916
2917 When [`bcl`][156] is built, the [build system][142] automatically ensures that
2918 `make test` runs the [`bcl`][156] test instead of the `bc` and `dc` tests.
2919
2920 There is only one test, and it is built from [`tests/bcl.c`][158].
2921
2922 The reason the test is in C is because [`bcl`][156] is a C library; I did not
2923 want to have to write C code *and* POSIX `sh` scripts to run it.
2924
2925 The reason there is only one test is because most of the code for the library is
2926 tested by virtue of testing `bc` and `dc`; the test needs to only ensure that
2927 the library bindings and plumbing do not interfere with the underlying code.
2928
2929 However, just because there is only one test does not mean that it doesn't test
2930 more than one thing. The code actually handles a series of tests, along with
2931 error checking to ensure that nothing went wrong.
2932
2933 To add a [`bcl`][156] test, just figure out what test you want, figure out where
2934 in the [`tests/bcl.c`][158] would be best to put it, and put it there. Do as
2935 much error checking as possible, and use the `err(BclError)` function. Ensure
2936 that all memory is freed because that test is run through [Valgrind][159] and
2937 [AddressSanitizer][160].
2938
2939 ### Integration with the Build System
2940
2941 If it was not obvious by now, the test suite is heavily integrated into the
2942 [build system][142], but the integration goes further than just making the test
2943 suite easy to run from `make` and generating individual and group tests.
2944
2945 The big problem the test suite has is that some `bc` code, stuff that is
2946 important to test, is only in *some* builds. This includes all of the extra math
2947 extensions, for example.
2948
2949 So the test suite needs to have some way of turning off the tests that depend on
2950 certain [build types][81] when those [build types][81] are not used.
2951
2952 This is the reason the is tightly integrated with the [build system][142]: the
2953 [build system][142] knows what [build type][81] was used and can tell the test
2954 suite to turn off the tests that do not apply.
2955
2956 It does this with arguments to the test scripts that are either a `1` or a `0`,
2957 depending on whether tests of that type should be enabled or not. These
2958 arguments are why I suggest, in the [Test Scripts][148] section, to always use a
2959 `make` target to run the test suite or any individual test. I have added a lot
2960 of targets to make this easy and as fast as possible.
2961
2962 In addition to all of that, the build system is responsible for selecting the
2963 `bc`/`dc` tests or the [`bcl` test][157].
2964
2965 ### Output Directories
2966
2967 During any run of the test suite, the test suite outputs the results of running
2968 various tests to files. These files are usually output to `tests/bc_outputs/`
2969 and `tests/dc_outputs/`.
2970
2971 However, in some cases, it may be necessary to output test results to a
2972 different directory. If that is the case, set the environment variable
2973 `BC_TEST_OUTPUT_DIR` to the name of the directory.
2974
2975 If that is done, then test results will be written to
2976 `$BC_TEST_OUTPUT_DIR/bc_outputs/` and `$BC_TEST_OUTPUT_DIR/dc_outputs/`.
2977
2978 ### Test Suite Portability
2979
2980 The test suite is meant to be run by users and packagers as part of their
2981 install process.
2982
2983 This puts some constraints on the test suite, but the biggest is that the test
2984 suite must be as [portable as `bc` itself][136].
2985
2986 This means that the test suite must be implemented in pure POSIX `make`, `sh`,
2987 and C99.
2988
2989 #### Test Scripts
2990
2991 To accomplish the portability, the test suite is run by a bunch of `sh` scripts
2992 that have the constraints laid out in [POSIX Shell Scripts][76].
2993
2994 However, that means they have some quirks, made worse by the fact that there are
2995 [generated tests][143] and [tests that need to be skipped, but only
2996 sometimes][147].
2997
2998 This means that a lot of the scripts take an awkward number and type of
2999 arguments. Some arguments are strings, but most are integers, like
3000 [`scripts/release.sh`][83].
3001
3002 It is for this reason that I do not suggest running the test scripts directly.
3003 Instead, always use an appropriate `make` target, which already knows the
3004 correct arguments for the test because of the [integration with the build
3005 system][147].
3006
3007 ### Test Coverage
3008
3009 In order to get test coverage information, you need `gcc`, `gcov`, and `gcovr`.
3010
3011 If you have them, run the following commands:
3012
3013 ```
3014 CC=gcc ./configure -gO3 -c
3015 make -j<cores>
3016 make coverage
3017 ```
3018
3019 Note that `make coverage` does not have a `-j<cores>` part; it cannot be run in
3020 parallel. If you try, you will get errors. And note that `CC=gcc` is used.
3021
3022 After running those commands, you can open your web browser and open the
3023 `index.html` file in the root directory of the repo. From there, you can explore
3024 all of the coverage results.
3025
3026 If you see lines or branches that you think you could hit with a manual
3027 execution, do such manual execution, and then run the following command:
3028
3029 ```
3030 make coverage_output
3031 ```
3032
3033 and the coverage output will be updated.
3034
3035 If you want to rerun `make coverage`, you must do a `make clean` and build
3036 first, like this:
3037
3038 ```
3039 make clean
3040 make -j<cores>
3041 make coverage
3042 ```
3043
3044 Otherwise, you will get errors.
3045
3046 If you want to run tests in parallel, you can do this:
3047
3048 ```
3049 make -j<cores>
3050 make -j<cores> test
3051 make coverage_output
3052 ```
3053
3054 and that will generate coverage output correctly.
3055
3056 ### [AddressSanitizer][21] and Friends
3057
3058 To run the test suite under [AddressSanitizer][21] or any of its friends, use
3059 the following commands:
3060
3061 ```
3062 CFLAGS="-fsanitize=<sanitizer> ./configure -gO3 -m
3063 make -j<cores>
3064 make -j<cores> test
3065 ```
3066
3067 where `<sanitizer>` is the correct name of the desired sanitizer. There is one
3068 exception to the above: `UndefinedBehaviorSanitizer` should be run on a build
3069 that has zero optimization, so for `UBSan`, use the following commands:
3070
3071 ```
3072 CFLAGS="-fsanitize=undefined" ./configure -gO0 -m
3073 make -j<cores>
3074 make -j<cores> test
3075 ```
3076
3077 ### [Valgrind][20]
3078
3079 To run the test suite under [Valgrind][20], run the following commands:
3080
3081 ```
3082 ./configure -gO3 -v
3083 make -j<cores>
3084 make -j<cores> test
3085 ```
3086
3087 It really is that easy. I have directly added infrastructure to the build system
3088 and the test suite to ensure that if [Valgrind][20] detects any memory errors or
3089 any memory leaks at all, it will tell the test suite infrastructure to report an
3090 error and exit accordingly.
3091
3092 ## POSIX Shell Scripts
3093
3094 There is a lot of shell scripts in this repository, and every single one of them
3095 is written in pure [POSIX `sh`][72].
3096
3097 The reason that they are written in [POSIX `sh`][72] is for *portability*: POSIX
3098 systems are only guaranteed to have a barebones implementation of `sh`
3099 available.
3100
3101 There are *many* snares for unwary programmers attempting to modify
3102 [`configure.sh`][69], any of the scripts in this directory, [`strgen.sh`][9], or
3103 any of the scripts in [`tests/`][77]. Here are some of them:
3104
3105 1.      No `bash`-isms.
3106 2.      Only POSIX standard utilities are allowed.
3107 3.      Only command-line options defined in the POSIX standard for POSIX utilities
3108         are allowed.
3109 4.      Only the standardized behavior of POSIX utilities is allowed.
3110 5.      Functions return data by *printing* it. Using `return` sets their exit code.
3111
3112 In other words, the script must only use what is standardized in the [`sh`][72]
3113 and [Shell Command Language][73] standards in POSIX. This is *hard*. It precludes
3114 things like `local` and the `[[ ]]` notation.
3115
3116 These are *enormous* restrictions and must be tested properly. I put out at
3117 least one release with a change to `configure.sh` that wasn't portable. That was
3118 an embarrassing mistake.
3119
3120 The lack of `local`, by the way, is why variables in functions are named with
3121 the form:
3122
3123 ```
3124 _<function_name>_<var_name>
3125 ```
3126
3127 This is done to prevent any clashes of variable names with already existing
3128 names. And this applies to *all* shell scripts. However, there are a few times
3129 when that naming convention is *not* used; all of them are because those
3130 functions are required to change variables in the global scope.
3131
3132 ### Maintainer-Only Scripts
3133
3134 If a script is meant to be used for maintainers (of `bc`, not package
3135 maintainers), then rules 2, 3, and 4 don't need to be followed as much because
3136 it is assumed that maintainers will be able to install whatever tools are
3137 necessary to do the job.
3138
3139 ## Manuals
3140
3141 The manuals for `bc` and `dc` are all generated, and the manpages for `bc`,
3142 `dc`, and `bcl` are also generated.
3143
3144 Why?
3145
3146 I don't like the format of manpages, and I am not confident in my ability to
3147 write them. Also, they are not easy to read on the web.
3148
3149 So that explains why `bcl`'s manpage is generated from its markdown version. But
3150 why are the markdown versions of the `bc` and `dc` generated?
3151
3152 Because the content of the manuals needs to change based on the [build
3153 type][81]. For example, if `bc` was built with no history support, it should not
3154 have the **COMMAND LINE HISTORY** section in its manual. If it did, that would
3155 just confuse users.
3156
3157 So the markdown manuals for `bc` and `dc` are generated from templates
3158 ([`manuals/bc.1.md.in`][89] and [`manuals/dc.1.md.in`][90]). And from there,
3159 the manpages are generated from the generated manuals.
3160
3161 The generated manpage for `bcl` ([`manuals/bcl.3`][62]) is checked into version
3162 control, and the generated markdown manuals and manpages for `bc`
3163 ([`manuals/bc`][79]) and `dc` ([`manuals/dc`][80]) are as well.
3164
3165 This is because generating the manuals and manpages requires a heavy dependency
3166 that only maintainers should care about: [Pandoc][92]. Because users [should not
3167 have to install *any* dependencies][136], the files are generated, checked into
3168 version control, and included in distribution tarballs.
3169
3170 If you run [`configure.sh`][69], you have an easy way of generating the markdown
3171 manuals and manpages: just run `make manpages`. This target calls
3172 [`scripts/manpage.sh`][60] appropriately for `bc`, `dc`, and `bcl`.
3173
3174 For more on how generating manuals and manpages works, see
3175 [`scripts/manpage.sh`][60].
3176
3177 ## Locales
3178
3179 The locale system of `bc` is enormously complex, but that's because
3180 POSIX-compatible locales are terrible.
3181
3182 How are they terrible?
3183
3184 First, `gencat` does not work for generating cross-compilation. In other words,
3185 it does not generate machine-portable files. There's nothing I can do about
3186 this except for warn users.
3187
3188 Second, the format of `.msg` files is...interesting. Thank goodness it is text
3189 because otherwise, it would be impossible to get them right.
3190
3191 Third, `.msg` files are not used. In other words, `gencat` exists. Why?
3192
3193 Fourth, `$NLSPATH` is an awful way to set where and *how* to install locales.
3194
3195 Yes, where and *how*.
3196
3197 Obviously, from it's name, it's a path, and that's the where. The *how* is more
3198 complicated.
3199
3200 It's actually *not* a path, but a path template. It's a format string, and it
3201 can have a few format specifiers. For more information on that, see [this
3202 link][84]. But in essence, those format specifiers configure how each locale is
3203 supposed to be installed.
3204
3205 With all those problems, why use POSIX locales? Portability, as always. I can't
3206 assume that `gettext` will be available, but I *can* pretty well assume that
3207 POSIX locales will be available.
3208
3209 The locale system of `bc` includes all files under [`locales/`][85],
3210 [`scripts/locale_install.sh`][87], [`scripts/locale_uninstall.sh`][88],
3211 [`scripts/functions.sh`][105], the `bc_err_*` constants in [`src/data.c`][131],
3212 and the parts of the build system needed to activate it. There is also code in
3213 [`src/vm.c`][58] (in `bc_vm_gettext()`) for loading the current locale.
3214
3215 If the order of error messages and/or categories are changed, the order of
3216 errors must be changed in the enum, the default error messages and categories in
3217 [`src/data.c`][131], and all of the messages and categories in the `.msg` files
3218 under [`locales/`][85].
3219
3220 ## Static Analysis
3221
3222 I do *some* static analysis on `bc`.
3223
3224 I used to use [Coverity][196], but I stopped using it when it started giving me
3225 too many false positives and also because it had a vulnerability.
3226
3227 However, I still use the [Clang Static Analyzer][197] through
3228 [`scan-build`][19]. I only use it in debug mode because I have to add some
3229 special code to make it not complain about things that are definitely not a
3230 problem.
3231
3232 The most frequent example of false positives is where a local is passed to a
3233 function to be initialized. [`scan-build`][19] misses that fact, so I
3234 pre-initialize such locals to prevent the warnings.
3235
3236 To run `scan-build`, do the following:
3237
3238 ```
3239 make clean
3240 scan-build make
3241 ```
3242
3243 `scan-build` will print its warnings to `stdout`.
3244
3245 ## Fuzzing
3246
3247 The quality of this `bc` is directly related to the amount of fuzzing I did. As
3248 such, I spent a lot of work making the fuzzing convenient and fast, though I do
3249 admit that it took me a long time to admit that it did need to be faster.
3250
3251 First, there were several things which make fuzzing fast:
3252
3253 * Using [AFL++][125]'s deferred initialization.
3254 * Splitting `bc`'s corpuses.
3255 * Parallel fuzzing.
3256
3257 Second, there are several things which make fuzzing convenient:
3258
3259 * Preprepared input corpuses.
3260 * [`scripts/fuzz_prep.sh`][119].
3261 * `tmux` and `tmuxp` configs.
3262 * [`scripts/afl.py`][94].
3263
3264 ### Fuzzing Performance
3265
3266 Fuzzing with [AFL++][125] can be ***SLOW***. Spending the time to make it as
3267 fast as possible is well worth the time.
3268
3269 However, there is a caveat to the above: it is easy to make [AFL++][125] crash,
3270 be unstable, or be unable to find "paths" (see [AFL++ Quickstart][129]) if the
3271 performance enhancements are done poorly.
3272
3273 To stop [AFL++][125] from crashing on test cases, and to be stable, these are
3274 the requirements:
3275
3276 * The state at startup must be *exactly* the same.
3277 * The virtual memory setup at startup must be *exactly* the same.
3278
3279 The first isn't too hard; it's the second that is difficult.
3280
3281 `bc` allocates a lot of memory at start. ("A lot" is relative; it's far less
3282 than most programs.) After going through an execution run, however, some of that
3283 memory, while it could be cleared and reset, is in different places because of
3284 vectors. Since vectors reallocate, their allocations are not guaranteed to be in
3285 the same place.
3286
3287 So to make all three work, I had to set up the deferred initialization and
3288 persistent mode *before* any memory was allocated (except for `vm.jmp_bufs`,
3289 which is probably what caused the stability to drop below 100%). However, using
3290 deferred alone let me put the [AFL++][125] initialization further back. This
3291 works because [AFL++][125] sets up a `fork()` server that `fork()`'s `bc` right
3292 at that call. Thus, every run has the exact same virtual memory setup, and each
3293 run can skip all of the setup code.
3294
3295 I tested `bc` using [AFL++][125]'s deferred initialization, plus persistent
3296 mode, plus shared memory fuzzing. In order to do it safely, with stability above
3297 99%, all of that was actually *slower* than using just deferred initialization
3298 with the initialization *right before* `stdin` was read. And as a bonus, the
3299 stability in that situation is 100%.
3300
3301 As a result, my [AFL++][125] setup only uses deferred initialization. That's the
3302 `__AFL_INIT()` call.
3303
3304 (Note: there is one more big item that must be done in order to have 100%
3305 stability: the pseudo-random number generator *must* start with *exactly* the
3306 same seed for every run. This is set up with the `tmux` and `tmuxp` configs that
3307 I talk about below in [Convenience][130]. This seed is set before the
3308 `__AFL_INIT()` call, so setting it has no runtime cost for each run, but without
3309 it, stability would be abysmal.)
3310
3311 On top of that, while `dc` is plenty fast under fuzzing (because of a faster
3312 parser and less test cases), `bc` can be slow. So I have split the `bc` input
3313 corpus into three parts, and I set fuzzers to run on each individually. This
3314 means that they will duplicate work, but they will also find more stuff.
3315
3316 On top of all of that, each input corpus (the three `bc` corpuses and the one
3317 `dc` corpus) is set to run with 4 fuzzers. That works out perfectly for two
3318 reasons: first, my machine has 16 cores, and second, the [AFL++][125] docs
3319 recommend 4 parallel fuzzers, at least, to run different "power schedules."
3320
3321 ### Convenience
3322
3323 The preprepared input corpuses are contained in the
3324 `tests/fuzzing/bc_inputs{1,2,3}/`, and `tests/fuzzing/dc_inputs` directories.
3325 There are three `bc` directories and only one `dc` directory because `bc`'s
3326 input corpuses are about three times as large, and `bc` is a larger program;
3327 it's going to need much more fuzzing.
3328
3329 (They do share code though, so fuzzing all of them still tests a lot of the same
3330 math code.)
3331
3332 The next feature of convenience is the [`scripts/fuzz_prep.sh`][119] script. It
3333 assumes the existence of `afl-clang-lto` in the `$PATH`, but if that exists, it
3334 automatically configures and builds `bc` with a fuzz-ideal build.
3335
3336 A fuzz-ideal build has several things:
3337
3338 * `afl-clang-lto` as the compiler. (See [AFL++ Quickstart][129].)
3339 * Debug mode, to crash as easily as possible.
3340 * Full optimization (including [Link-Time Optimization][126]), for performance.
3341 * [AFL++][125]'s deferred initialization (see [Fuzzing Performance][127] above).
3342 * And `AFL_HARDEN=1` during the build to harden the build. See the [AFL++][125]
3343   documentation for more information.
3344
3345 There is one big thing that a fuzz-ideal build does *not* have: it does not use
3346 [AFL++][125]'s `libdislocator.so`. This is because `libdislocator.so` crashes if
3347 it fails to allocate memory. I do not want to consider those as crashes because
3348 my `bc` does, in fact, handle them gracefully by exiting with a set error code.
3349 So `libdislocator.so` is not an option.
3350
3351 However, to add to [`scripts/fuzz_prep.sh`][119] making a fuzz-ideal build, in
3352 `tests/fuzzing/`, there are two `yaml` files: [`tests/fuzzing/bc_afl.yaml`][120]
3353 and [`tests/fuzzing/bc_afl_continue.yaml`][121]. These files are meant to be
3354 used with [`tmux`][122] and [`tmuxp`][123]. While other programmers will have to
3355 adjust the `start_directory` item, once it is adjusted, then using this command:
3356
3357 ```
3358 tmuxp load tests/fuzzing/bc_afl.yaml
3359 ```
3360
3361 will start fuzzing.
3362
3363 In other words, to start fuzzing, the sequence is:
3364
3365 ```
3366 ./scripts/fuzz_prep.sh
3367 tmuxp load tests/fuzzing/bc_afl.yaml
3368 ```
3369
3370 Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on
3371 `bc` and 4 on `dc`. The outputs will be put into the
3372 `tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories.
3373
3374 (Note that loading that config will also delete all unsaved [AFL++][125] output
3375 from the output directories.)
3376
3377 Sometimes, [AFL++][125] will report crashes when there are none. When crashes
3378 are reported, I always run the following command:
3379
3380 ```
3381 ./scripts/afl.py <dir>
3382 ```
3383
3384 where `dir` is one of `bc1`, `bc2`, `bc3`, or `dc`, depending on which of the
3385 16 instances reported the crash. If it was one of the first four (`bc11` through
3386 `bc14`), I use `bc1`. If it was one of the second four (`bc21` through `bc24`, I
3387 use `bc2`. If it was one of the third four (`bc31` through `bc34`, I use `bc3`.
3388 And if it was `dc`, I use `dc`.
3389
3390 The [`scripts/afl.py`][94] script will report whether [AFL++][125] correctly
3391 reported a crash or not. If so, it will copy the crashing test case to
3392 `.test.txt` and tell you whether it was from running it as a file or through
3393 `stdin`.
3394
3395 From there, I personally always investigate the crash and fix it. Then, when the
3396 crash is fixed, I either move `.test.txt` to `tests/{bc,dc}/errors/<idx>.txt` as
3397 an error test (if it produces an error) or I create a new
3398 `tests/{bc,dc}/misc<idx>.txt` test for it and a corresponding results file. (See
3399 [Test Suite][124] for more information about the test suite.) In either case,
3400 `<idx>` is the next number for a file in that particular place. For example, if
3401 the last file in `tests/{bc,dc}/errors/` is `tests/{bc,dc}/errors/18.txt`, I
3402 move `.test.txt` to `tests/bc/error/19.txt`.
3403
3404 Then I immediately run [`scripts/afl.py`][94] again to find the next crash
3405 because often, [AFL++][125] found multiple test cases that trigger the same
3406 crash. If it finds another, I repeat the process until it is happy.
3407
3408 Once it *is* happy, I do the same `fuzz_prep.sh`, `tmuxp load` sequence and
3409 restart fuzzing. Why do I restart instead of continuing? Because with the
3410 changes, the test outputs could be stale and invalid.
3411
3412 However, there *is* a case where I continue: if [`scripts/afl.py`][94] finds
3413 that every crash reported by [AFL++][125] is invalid. If that's the case, I can
3414 just continue with the command:
3415
3416 ```
3417 tmuxp load tests/fuzzing/bc_afl_continue.yaml
3418 ```
3419
3420 (Note: I admit that I usually run [`scripts/afl.py`][94] while the fuzzer is
3421 still running, so often, I don't find a need to continue since there was no
3422 stop. However, the capability is there, if needed.)
3423
3424 In addition, my fuzzing setup, including the `tmux` and `tmuxp` configs,
3425 automatically set up [AFL++][125] power schedules (see [Fuzzing
3426 Performance][127] above). They also set up the parallel fuzzing such that there
3427 is one fuzzer in each group of 4 that does deterministic fuzzing. It's always
3428 the first one in each group.
3429
3430 For more information about deterministic fuzzing, see the [AFL++][125]
3431 documentation.
3432
3433 ### Corpuses
3434
3435 I occasionally add to the input corpuses. These files come from new files in the
3436 [Test Suite][124]. In fact, I use soft links when the files are the same.
3437
3438 However, when I add new files to an input corpus, I sometimes reduce the size of
3439 the file by removing some redundancies.
3440
3441 And then, when adding to the `bc` corpuses, I try to add them evenly so that
3442 each corpus will take about the same amount of time to get to a finished state.
3443
3444 ### [AFL++][125] Quickstart
3445
3446 The way [AFL++][125] works is complicated.
3447
3448 First, it is the one to invoke the compiler. It leverages the compiler to add
3449 code to the binary to help it know when certain branches are taken.
3450
3451 Then, when fuzzing, it uses that branch information to generate information
3452 about the "path" that was taken through the binary.
3453
3454 I don't know what AFL++ counts as a new path, but each new path is added to an
3455 output corpus, and it is later used as a springboard to find new paths.
3456
3457 This is what makes AFL++ so effective: it's not just blindly thrashing a binary;
3458 it adapts to the binary by leveraging information about paths.
3459
3460 ### Fuzzing Runs
3461
3462 For doing a fuzzing run, I expect about a week or two where my computer is
3463 basically unusable, except for text editing and light web browsing.
3464
3465 Yes, it can take two weeks for me to do a full fuzzing run, and that does *not*
3466 include the time needed to find and fix crashes; it only counts the time on the
3467 *last* run, the one that does not find any crashes. This means that the entire
3468 process can take a month or more.
3469
3470 What I use as an indicator that the fuzzing run is good enough is when the
3471 number of "Pending" paths (see [AFL++ Quickstart][129] above) for all fuzzer
3472 instances, except maybe the deterministic instances, is below 50. And even then,
3473 I try to let deterministic instances get that far as well.
3474
3475 You can see how many pending paths are left in the "path geometry" section of
3476 the [AFL++][125] dashboard.
3477
3478 Also, to make [AFL++][125] quit, you need to send it a `SIGINT`, either with
3479 `Ctrl+c` or some other method. It will not quit until you tell it to.
3480
3481 ### Radamsa
3482
3483 I rarely use [Radamsa][99] instead of [AFL++][125]. In fact, it's only happened
3484 once.
3485
3486 The reason I use [Radamsa][99] instead of [AFL++][125] is because it is easier
3487 to use with varying command-line arguments, which was needed for testing `bc`'s
3488 command-line expression parsing code, and [AFL++][125] is best when testing
3489 input from `stdin`.
3490
3491 [`scripts/radamsa.sh`][100] does also do fuzzing on the [AFL++][125] inputs, but
3492 it's not as effective at that, so I don't really use it for that either.
3493
3494 [`scripts/radamsa.sh`][100] and [Radamsa][99] were only really used once; I have
3495 not had to touch the command-line expression parsing code since.
3496
3497 ### [AddressSanitizer][21] with Fuzzing
3498
3499 One advantage of using [AFL++][125] is that it saves every test case that
3500 generated a new path (see [AFL++ Quickstart][129] above), and it doesn't delete
3501 them when the user makes it quit.
3502
3503 Keeping them around is not a good idea, for several reasons:
3504
3505 * They are frequently large.
3506 * There are a lot of them.
3507 * They go stale; after `bc` is changed, the generated paths may not be valid
3508   anymore.
3509
3510 However, before they are deleted, they can definitely be leveraged for even
3511 *more* bug squashing by running *all* of the paths through a build of `bc` with
3512 [AddressSanitizer][21].
3513
3514 This can easily be done with these four commands:
3515
3516 ```
3517 ./scripts/fuzz_prep.sh -a
3518 ./scripts/afl.py --asan bc1
3519 ./scripts/afl.py --asan bc2
3520 ./scripts/afl.py --asan bc3
3521 ./scripts/afl.py --asan dc
3522 ```
3523
3524 (By the way, the last four commands could be run in separate terminals to do the
3525 processing in parallel.)
3526
3527 These commands build an [ASan][21]-enabled build of `bc` and `dc` and then they
3528 run `bc` and `dc` on all of the found crashes and path output corpuses. This is
3529 to check that no path or crash has found any memory errors, including memory
3530 leaks.
3531
3532 Because the output corpuses can contain test cases that generate infinite loops
3533 in `bc` or `dc`, [`scripts/afl.py`][94] has a timeout of 8 seconds, which is far
3534 greater than the timeout that [AFL++][125] uses and should be enough to catch
3535 any crash.
3536
3537 If [AFL++][125] fails to find crashes *and* [ASan][21] fails to find memory
3538 errors on the outputs of [AFL++][125], that is an excellent indicator of very
3539 few bugs in `bc`, and a release can be made with confidence.
3540
3541 ## Code Concepts
3542
3543 This section is about concepts that, if understood, will make it easier to
3544 understand the code as it is written.
3545
3546 The concepts in this section are not found in a single source file, but they are
3547 littered throughout the code. That's why I am writing them all down in a single
3548 place.
3549
3550 ### POSIX Mode
3551
3552 POSIX mode is `bc`-only.
3553
3554 In fact, POSIX mode is two different modes: Standard Mode and Warning Mode.
3555 These modes are designed to help users write POSIX-compatible `bc` scripts.
3556
3557 #### Standard Mode
3558
3559 Standard Mode is activated with the `-s` or `--standard` flags.
3560
3561 In this mode, `bc` will error if any constructs are used that are not strictly
3562 compatible with the [POSIX `bc` specification][2].
3563
3564 #### Warning Mode
3565
3566 Warning Mode is activated with the `-w` or `--warn` flags.
3567
3568 In this mode, `bc` will issue warnings, but continue, if any constructs are used
3569 that are not strictly compatible with the [POSIX `bc` specification][2].
3570
3571 ### Memory Management
3572
3573 The memory management in `bc` is simple: everything is owned by one thing.
3574
3575 If something is in a vector, it is owned by that vector.
3576
3577 If something is contained in a struct, it is owned by that struct with one
3578 exception: structs can be given pointers to other things, but only if those
3579 other things will outlast the struct itself.
3580
3581 As an example, the `BcParse` struct has a pointer to the one `BcProgram` in
3582 `bc`. This is okay because the program is initialized first and deallocated
3583 last.
3584
3585 In other words, it's simple: if a field in a struct is a pointer, then unless
3586 that pointer is directly allocated by the struct (like the vector array or the
3587 number limb array), that struct does not own the item at that pointer.
3588 Otherwise, the struct *does* own the item.
3589
3590 ### [Async-Signal-Safe][115] Signal Handling
3591
3592 `bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but
3593 `bc` is, by and large, CPU-bound. This has several consequences, but the biggest
3594 is that there is no easy way to allow signals to interrupt it.
3595
3596 This consequence is not obvious, but it comes from the fact that a lot of I/O
3597 operations can be interrupted and return [`EINTR`][198]. This makes such I/O
3598 calls natural places for allowing signals to interrupt execution, even when the
3599 signal comes during execution, and not interrupting I/O calls. The way this is
3600 done is setting a flag in the signal handler, which is checked around the time
3601 of the I/O call, when it is convenient.
3602
3603 Alternatively, I/O bound programs can use the [self-pipe trick][199].
3604
3605 Neither of these are possible in `bc` because the execution of math code can
3606 take a long time. If a signal arrives during this long execution time, setting a
3607 flag like an I/O bound application and waiting until the next I/O call could
3608 take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a
3609 week to calculate a million digits of pi, and it's not slow as far as `bc`
3610 implementations go.)
3611
3612 Thus, using just the technique of setting the flag just will not work for an
3613 interactive calculator.
3614
3615 Well, it can, but it requires a lot of code and massive inefficiencies. I know
3616 this because that was the original implementation.
3617
3618 The original implementation set a flag and just exit the signal handler. Then,
3619 on just about every loop header, I have a check for the signal flag. These
3620 checks happened on every iteration of every loop. It was a massive waste because
3621 it was polling, and [polling is evil][200].
3622
3623 So for version [3.0.0][32], I expended a lot of effort to change the
3624 implementation.
3625
3626 In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`)
3627 to tell the signal handler whether it can use `longjmp()` to stop the current
3628 execution. If so, it does. If not, it sets a flag, which then is used by the
3629 code outside the signal handler that set the `vm.sig_lock` flag. When that code
3630 unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that
3631 code executes the `longjmp()` and stops the current execution.
3632
3633 Other than that, the rest of the interrupt-based implementation is best
3634 described in the [Error Handling][97].
3635
3636 However, there are rules for signal handlers that I must lay out.
3637
3638 First, signal handlers can only call [async-signal-safe][115] functions.
3639
3640 Second, any field set or read by both the signal handler and normal code must be
3641 a `volatile sig_atomic_t`.
3642
3643 Third, when setting such fields, they must be set to constants and no math can
3644 be done on them. This restriction and the above restriction exist in order to
3645 ensure that the setting of the fields is always atomic with respect to signals.
3646
3647 These rules exist for *any* code using Unix signal handlers, not just `bc`.
3648
3649 #### Vectors and Numbers
3650
3651 Vectors and numbers needed special consideration with the interrupt-based signal
3652 handling.
3653
3654 When vectors and numbers are about to allocate, or *reallocate* their arrays,
3655 they need to lock signals to ensure that they do not call `malloc()` and friends
3656 and get interrupted by a signal because, as you will see in the [Error
3657 Handling][97] section, `longjmp()` cannot be used in a signal handler if it may
3658 be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and
3659 friends.
3660
3661 ### Asserts
3662
3663 If you asked me what procedure is used the most in `bc`, I would reply without
3664 hesitation, "`assert()`."
3665
3666 I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with
3667 [AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes,
3668 and a failing `assert()` counts as one.
3669
3670 So while a lot of bad bugs might have corrupted data and *not* caused crashes,
3671 because I put in so many `assert()`'s, they were *turned into* crashing bugs,
3672 and [AFL++][125] found them.
3673
3674 By far, the most bugs it found this way was in the `bc` parser. (See the [`bc`
3675 Parsing][110] for more information.) And even though I was careful to put
3676 `assert()`'s everywhere, most parser bugs manifested during execution of
3677 bytecode because the virtual machine assumes the bytecode is valid.
3678
3679 Sidenote: one of those bugs caused an infinite recursion when running the sine
3680 (`s()`) function in the math library, so yes, parser bugs can be *very* weird.
3681
3682 Anyway, the way I did `assert()`'s was like this: whenever I realized that I
3683 had put assumptions into the code, I would put an `assert()` there to test it
3684 **and** to *document* it.
3685
3686 Yes, documentation. In fact, by far the best documentation of the code in `bc`
3687 is actually the `assert()`'s. The only time I would not put an `assert()` to
3688 test an assumption is if that assumption was already tested by an `assert()`
3689 earlier.
3690
3691 As an example, if a function calls another function and passes a pointer that
3692 the caller previously `assert()`'ed was *not* `NULL`, then the callee does not
3693 have to `assert()` it too, unless *also* called by another function that does
3694 not `assert()` that.
3695
3696 At first glance, it may seem like putting asserts for pointers being non-`NULL`
3697 everywhere would actually be good, but unfortunately, not for fuzzing. Each
3698 `assert()` is a branch, and [AFL++][125] rates its own effectiveness based on
3699 how many branches it covers. If there are too many `assert()`'s, it may think
3700 that it is not being effective and that more fuzzing is needed.
3701
3702 This means that `assert()`'s show up most often in two places: function
3703 preconditions and function postconditions.
3704
3705 Function preconditions are `assert()`'s that test conditions relating to the
3706 arguments a function was given. They appear at the top of the function, usually
3707 before anything else (except maybe initializing a local variable).
3708
3709 Function postconditions are `assert()`'s that test the return values or other
3710 conditions when a function exits. These are at the bottom of a function or just
3711 before a `return` statement.
3712
3713 The other `assert()`'s cover various miscellaneous assumptions.
3714
3715 If you change the code, I ***HIGHLY*** suggest that you use `assert()`'s to
3716 document your assumptions. And don't remove them when [AFL++][125] gleefully
3717 crashes `bc` and `dc` over and over again.
3718
3719 ### Vectors
3720
3721 In `bc`, vectors mean resizable arrays, and they are the most fundamental piece
3722 of code in the entire codebase.
3723
3724 I had previously written a [vector implementation][112], which I used to guide
3725 my decisions, but I wrote a new one so that `bc` would not have a dependency. I
3726 also didn't make it as sophisticated; the one in `bc` is very simple.
3727
3728 Vectors store some information about the type that they hold:
3729
3730 * The size (as returned by `sizeof`).
3731 * An enum designating the destructor.
3732
3733 If the destructor is `BC_DTOR_NONE`, it is counted as the type not having a
3734 destructor.
3735
3736 But by storing the size, the vector can then allocate `size * cap` bytes, where
3737 `cap` is the capacity. Then, when growing the vector, the `cap` is doubled again
3738 and again until it is bigger than the requested size.
3739
3740 But to store items, or to push items, or even to return items, the vector has to
3741 figure out where they are, since to it, the array just looks like an array of
3742 bytes.
3743
3744 It does this by calculating a pointer to the underlying type with
3745 `v + (i * size)`, where `v` is the array of bytes, `i` is the index of the
3746 desired element, and `size` is the size of the underlying type.
3747
3748 Doing that, vectors can avoid undefined behavior (because `char` pointers can
3749 be cast to any other pointer type), while calculating the exact position of
3750 every element.
3751
3752 Because it can do that, it can figure out where to push new elements by
3753 calculating `v + (len * size)`, where `len` is the number of items actually in
3754 the vector.
3755
3756 By the way, `len` is different from `cap`. While `cap` is the amount of storage
3757 *available*, `len` is the number of actual elements in the vector at the present
3758 point in time.
3759
3760 Growing the vector happens when `len` is equal to `cap` *before* pushing new
3761 items, not after.
3762
3763 To add a destructor, you need to add an enum item to `BcDtorType` in
3764 [`include/vector.h`][174] and add the actual destructor in the same place as the
3765 enum item in the `bc_vec_dtors[]` array in [`src/data.c`][131].
3766
3767 #### Pointer Invalidation
3768
3769 There is one big danger with the vectors as currently implemented: pointer
3770 invalidation.
3771
3772 If a piece of code receives a pointer from a vector, then adds an item to the
3773 vector before they finish using the pointer, that code must then update the
3774 pointer from the vector again.
3775
3776 This is because any pointer inside the vector is calculated based off of the
3777 array in the vector, and when the vector grows, it can `realloc()` the array,
3778 which may move it in memory. If that is done, any pointer returned by
3779 `bc_vec_item()`, `bc_vec_top()` and `bc_vec_item_rev()` will be invalid.
3780
3781 This fact was the single most common cause of crashes in the early days of this
3782 `bc`; wherever I have put a comment about pointers becoming invalidated and
3783 updating them with another call to `bc_vec_item()` and friends, *do **NOT**
3784 remove that code!*
3785
3786 #### Maps
3787
3788 Maps in `bc` are...not.
3789
3790 They are really a combination of two vectors. Those combinations are easily
3791 recognized in the source because one vector is named `<name>s` (plural), and the
3792 other is named `<name>_map`.
3793
3794 There are currently three, all in `BcProgram`:
3795
3796 * `fns` and `fn_map` (`bc` functions).
3797 * `vars` and `var_map` (variables).
3798 * `arrs` and `arr_map` (arrays).
3799
3800 They work like this: the `<name>_map` vector holds `BcId`'s, which just holds a
3801 string and an index. The string is the name of the item, and the index is the
3802 index of that item in the `<name>s` vector.
3803
3804 Obviously, I could have just done a linear search for items in the `<name>s`
3805 vector, but that would be slow with a lot of functions/variables/arrays.
3806 Instead, I ensure that whenever an item is inserted into the `<name>_map`
3807 vector, the item is inserted in sorted order. This means that the `<name>_map`
3808 is always sorted (by the names of the items).
3809
3810 So when looking up an item in the "map", what is really done is this:
3811
3812 1.      A binary search is carried out on the names in the `<name>_map` vector.
3813 2.      When one is found, it returns the index in the `<name>_map` vector where the
3814         item was found.
3815 3.      This index is then used to retrieve the `BcId`.
3816 4.      The index from the `BcId` is then used to index into the `<name>s` vector,
3817         which returns the *actual* desired item.
3818
3819 Why were the `<name>s` and `<name>_map` vectors not combined for ease? The
3820 answer is that sometime, when attempting to insert into the "map", code might
3821 find that something is already there. For example, a function with that name may
3822 already exist, or the variable might already exist.
3823
3824 If the insert fails, then the name already exists, and the inserting code can
3825 forego creating a new item to put into the vector. However, if there is no item,
3826 the inserting code must create a new item and insert it.
3827
3828 If the two vectors were combined together, it would not be possible to separate
3829 the steps such that creating a new item could be avoided if it already exists.
3830
3831 #### Slabs and Slab Vectors
3832
3833 `bc` allocates *a lot* of small strings, and small allocations are the toughest
3834 for general-purpose allocators to handle efficiently.
3835
3836 Because of that reason, I decided to create a system for allocating small
3837 strings using something that I call a "slab vector" after [slab
3838 allocators][201].
3839
3840 These vectors allocate what I call "slabs," which are just an allocation of a
3841 single page with a length to tell the slab how much of the slab is used.
3842
3843 The vector itself holds slabs, and when the slab vector is asked to allocate a
3844 string, it attempts to in the last slab. If that slab cannot do so, it allocates
3845 a new slab and allocates from that.
3846
3847 There is one exception: if a string is going to be bigger than 128 bytes, then
3848 the string is directly allocated, and a slab is created with that pointer and a
3849 length of `SIZE_MAX`, which tells the slab vector that it is a direct
3850 allocation. Then, the last slab is pushed into the next spot and the new special
3851 slab is put into the vacated spot. This ensures that a non-special slab is
3852 always last.
3853
3854 ### Command-Line History
3855
3856 When I first wrote `bc`, I immediately started using it in order to eat my own
3857 dog food.
3858
3859 It sucked, and the biggest reason why was because of the lack of command-line
3860 history.
3861
3862 At first, I just dealt with it, not knowing how command-line history might be
3863 implemented.
3864
3865 Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had
3866 known about for some time.
3867
3868 It turned out to be easier than I thought; the hardest part was the tedious
3869 renaming of everything to fit the `bc` naming scheme.
3870
3871 Understanding command-line history in `bc` is really about understanding VT-100
3872 escape codes, so I would start there.
3873
3874 Now, the history implementation of `bc` has been adapted far beyond that initial
3875 adaptation to make the command-line history implementation perfect for `bc`
3876 alone, including integrating it into `bc`'s [Custom I/O][114] and making sure
3877 that it does not disturb output that did not end with a newline.
3878
3879 On top of that, at one point, I attempted to get history to work on Windows. It
3880 barely worked after a lot of work and a lot of portability code, but even with
3881 all of that, it does not have at least one feature: multi-line pasting from the
3882 clipboard.
3883
3884 #### Editline and Readline
3885
3886 I also implemented an integration with both editline and readline, though not at
3887 the same time. This allows distributions that want to use either one in `bc` to
3888 do so. This enables users to use their `.editrc` and `.inputrc` settings.
3889
3890 The integration is custom to each library, and it does *not* involve any of
3891 `bc`'s custom history code. It also required changes to `bc`'s [Custom
3892 I/O][114].
3893
3894 ### Error Handling
3895
3896 The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it
3897 became one of the things that taught me the most about C in particular and
3898 programming in general.
3899
3900 Before then, error handling was manual. Almost all functions returned a
3901 `BcStatus` indicating if an error had occurred. This led to a proliferation of
3902 lines like:
3903
3904 ```
3905 if (BC_ERR(s)) return s;
3906 ```
3907
3908 In fact, a quick and dirty count of such lines in version `2.7.2` (the last
3909 version before [`3.0.0`][32]) turned up 252 occurrences of that sort of line.
3910
3911 And that didn't even guarantee that return values were checked *everywhere*.
3912
3913 But before I can continue, let me back up a bit.
3914
3915 From the beginning, I decided that I would not do what GNU `bc` does on errors;
3916 it tries to find a point at which it can recover. Instead, I decided that I
3917 would have `bc` reset to a clean slate, which I believed, would reduce the
3918 number of bugs where an unclean state caused errors with continuing execution.
3919
3920 So from the beginning, errors would essentially unwind the stack until they got
3921 to a safe place from which to clean the slate, reset, and ask for more input.
3922
3923 Well, if that weren't enough, `bc` also has to handle [POSIX signals][113]. As
3924 such, it had a signal handler that set a flag. But it could not safely interrupt
3925 execution, so that's all it could do.
3926
3927 In order to actually respond to the signal, I had to litter checks for the flag
3928 *everywhere* in the code. And I mean *everywhere*. They had to be checked on
3929 every iteration of *every* loop. They had to be checked going into and out of
3930 certain functions.
3931
3932 It was a mess.
3933
3934 But fortunately for me, signals did the same thing that errors did: they unwound
3935 the stack to the *same* place.
3936
3937 Do you see where I am going with this?
3938
3939 It turns out that what I needed was a [async-signal-safe][115] form of what
3940 programmers call "exceptions" in other languages.
3941
3942 I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement
3943 exceptions, so I thought I would learn how to use them. How hard could it be?
3944
3945 Quite hard, it turns out, especially in the presence of signals. And that's
3946 because there are a few big snares:
3947
3948 1.      The value of any local variables are not guaranteed to be preserved after a
3949         `longjmp()` back into a function.
3950 2.      While `longjmp()` is required to be [async-signal-safe][115], if it is
3951         invoked by a signal handler that interrupted a non-[async-signal-safe][115]
3952         function, then the behavior is undefined.
3953 3.      Any mutation that is not guaranteed to be atomic with respect to signals may
3954         be incomplete when a signal arrives.
3955
3956 Oh boy.
3957
3958 For number 1, the answer to this is to hide data that must stay changed behind
3959 pointers. Only the *pointers* are considered local, so as long as I didn't do
3960 any modifying pointer arithmetic, pointers and their data would be safe. For
3961 cases where I have local data that must change and stay changed, I needed to
3962 *undo* the `setjmp()`, do the change, and the *redo* the `setjmp()`.
3963
3964 For number 2 and number 3, `bc` needs some way to tell the signal handler that
3965 it cannot do a `longjmp()`. This is done by "locking" signals with a `volatile
3966 sig_atomic_t`. (For more information, see the [Async-Signal-Safe Signal
3967 Handling][173] section.) For every function that calls a function that is not
3968 async-signal-safe, they first need to use `BC_SIG_LOCK` to lock signals, and
3969 afterward, use `BC_SIG_UNLOCK` to unlock signals.
3970
3971 Code also need to do this for all global, non-atomic mutation, which means that
3972 modifying any part of the `BcVm` global struct.
3973
3974 `BC_SIG_UNLOCK` has another requirement: it must check for signals or errors and
3975 jump if necessary.
3976
3977 On top of all of that, *all* functions with cleanup needed to be able to run
3978 their cleanup. This meant that `longjmp()` could not just jump to the finish; it
3979 had to start what I call a "jump series," using a stack of `jmp_buf`'s
3980 (`jmp_bufs` in `BcVm`). Each `longjmp()` uses the top of the `jmp_bufs` stack to
3981 execute its jump. Then, if the cleanup code was executed because of a jump, the
3982 cleanup code was responsible for continuing the jump series by popping the
3983 previous item off the stack and using the new top of the stack for a jump.
3984
3985 In this way, C++-style exceptions were implemented in pure C. Not fun, but it
3986 works. However, the order of operations matters, especially in the macros that
3987 help implement the error handling.
3988
3989 For example, in `BC_UNSETJMP`, signals are unlocked before checking for signals.
3990 If a signal comes between, that's fine; it will still cause a jump to the right
3991 place. However, disabling the lock after could mean that a signal could come
3992 *after* checking for signals, but before signals were unlocked, causing the
3993 handling of the signal to be delayed.
3994
3995 #### Custom I/O
3996
3997 Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and
3998 `longjmp()` for error handling (see the [Error Handling][97] section), and the
3999 buffered I/O in `libc` does not interact well with the use of those procedures;
4000 all of the buffered I/O API is basically non-[async-signal-safe][115].
4001
4002 Implementing custom buffered I/O had other benefits. First, it allowed me to
4003 tightly integrate history with the I/O code. Second, it allowed me to make
4004 changes to history in order to make it adapt to user prompts.
4005
4006 ### Lexing
4007
4008 To simplify parsing, both calculators use lexers to turn the text into a more
4009 easily-parsable form.
4010
4011 While some tokens are only one character long, others require many tokens, and
4012 some of those need to store all of the text corresponding to the token for use
4013 by the parsers. Tokens that need to store their corresponding text include, but
4014 are not limited to:
4015
4016 * Strings.
4017 * Numbers.
4018 * Identifiers.
4019
4020 For this purpose, the lexer has a [vector][111] named `str` to store the data
4021 for tokens. This data is overwritten if another token is lexed that needs to
4022 store data, so the parsers need to copy the data before calling the lexer again.
4023
4024 Both lexers do some of the same things:
4025
4026 * Lex identifiers into tokens, storing the identifier in `str`.
4027 * Lex number strings into tokens, storing the string in `str`.
4028 * Lex whitespace.
4029 * Lex comments.
4030
4031 Other than that, and some common plumbing, the lexers have separate code.
4032
4033 #### `dc` Lexing
4034
4035 The `dc` lexer is remarkably simple; in fact, besides [`src/main.c`][205],
4036 [`src/bc.c`][40], and [`src/dc.c`][44], which just contain one function each,
4037 the only file smaller than [`src/dc_lex.c`][45] is [`src/args.c`][206], which
4038 just processes command-line arguments after they are parsed by
4039 [`src/opt.c`][51].
4040
4041 For most characters, the `dc` lexer is able to convert directly from the
4042 character to its corresponding token. This happens using `dc_lex_tokens[]` in
4043 [`src/data.c`][131].
4044
4045 `dc`'s lexer also has to lex the register name after lexing tokens for commands
4046 that need registers.
4047
4048 And finally, `dc`'s lexer needs to parse `dc` strings, which is the only part of
4049 the `dc` lexer that is more complex than the `bc` lexer. This is because `dc`
4050 strings need to have a balanced number of brackets.
4051
4052 #### `bc` Lexing
4053
4054 The `bc` lexer is fairly simple. It does the following things:
4055
4056 * Lexes `bc` strings.
4057 * Lexes `bc` identifiers. This is necessary because this is how `bc` keywords
4058   are lexed. After ensuring that an identifier is not a keyword, the `bc` lexer
4059   allows the common identifier function to take over.
4060 * Turns characters and groups of characters into `bc` operator tokens.
4061
4062 ### Parsing
4063
4064 The difference between parsing `bc` and `dc` code is...vast. The `dc` parser is
4065 simple, while the `bc` parser is the most complex piece of code in the entire
4066 codebase.
4067
4068 However, they both do some of the same things.
4069
4070 First, the parsers do *not* use [abstract syntax trees][207]; instead, they
4071 directly generate the bytecode that will be executed by the `BcProgram` code.
4072 Even in the case of `bc`, this heavily simplifies the parsing because the
4073 [Shunting-Yard Algorithm][109] is designed to generate [Reverse Polish
4074 Notation][108], which is basically directly executable.
4075
4076 Second, any extra data that the `BcProgram` needs for execution is stored into
4077 functions (see the [Functions][208] section). These include constants and
4078 strings.
4079
4080 #### `dc` Parsing
4081
4082 The parser for `dc`, like its lexer, is remarkably simple. In fact, the easiness
4083 of lexing and parsing [Reverse Polish notation][108] is probably why it was used
4084 for `dc` when it was first created at Bell Labs.
4085
4086 For most tokens, the `dc` parser is able to convert directly from the token
4087 to its corresponding instruction. This happens using `dc_parse_insts[]` in
4088 [`src/data.c`][131].
4089
4090 `dc`'s parser also has to parse the register name for commands that need
4091 registers. This is the most complex part of the `dc` parser; each different
4092 register command needs to be parsed differently because most of them require two
4093 or more instructions to execute properly.
4094
4095 For example, storing in a register requires a swap instruction and an assignment
4096 instruction.
4097
4098 Another example are conditional execution instructions; they need to produce the
4099 instruction for the condition, and then they must parse a possible "else" part,
4100 which might not exist.
4101
4102 ##### Existing Commands
4103
4104 `dc` is based on commands, which are usually one letter. The following table is
4105 a table of which ASCII characters are already used:
4106
4107 | Characters | Used? | For...                                     |
4108 |------------|-------|--------------------------------------------|
4109 | Space      | x     | Separator                                  |
4110 | `!`        | x     | Conditional Execution of Registers         |
4111 | `"`        | x     | Bounded Rand Operator                      |
4112 | `#`        | x     | Comments                                   |
4113 | `$`        | x     | Truncation                                 |
4114 | `%`        | x     | Modulus                                    |
4115 | `&`        |       |                                            |
4116 | `'`        | x     | Rand Operator                              |
4117 | `(`        | x     | Greater Than Operator                      |
4118 | `)`        | x     | Less Than Operator                         |
4119 | `*`        | x     | Multiplication                             |
4120 | `+`        | x     | Addition                                   |
4121 | `,`        | x     | Depth of Execution Stack                   |
4122 | `-`        | x     | Subtraction                                |
4123 | `.`        | x     | Numbers                                    |
4124 | `/`        | x     | Division                                   |
4125 | `0-9`      | x     | Numbers                                    |
4126 | `:`        | x     | Store into Array                           |
4127 | `;`        | x     | Load from Array                            |
4128 | `<`        | x     | Conditional Execution of Registers         |
4129 | `=`        | x     | Conditional Execution of Registers         |
4130 | `>`        | x     | Conditional Execution of Registers         |
4131 | `?`        | x     | Ask for User Input                         |
4132 | `@`        | x     | Places Operator                            |
4133 | `A-F`      | x     | Numbers                                    |
4134 | `G`        | x     | Equal Operator                             |
4135 | `H`        | x     | Shift Left                                 |
4136 | `I`        | x     | Push `ibase` onto Stack                    |
4137 | `J`        | x     | Push `seed` onto Stack                     |
4138 | `K`        | x     | Push `scale` onto Stack                    |
4139 | `L`        | x     | Pop off of Register                        |
4140 | `M`        | x     | Boolean And Operator                       |
4141 | `N`        | x     | Boolean Not Operator                       |
4142 | `O`        | x     | Push `obase` onto Stack                    |
4143 | `P`        | x     | Byte Stream Printing                       |
4144 | `Q`        | x     | Quit Some Number of Macros                 |
4145 | `R`        | x     | Pop Top of Stack                           |
4146 | `S`        | x     | Push onto Register                         |
4147 | `T`        | x     | Push Max `ibase` onto Stack                |
4148 | `U`        | x     | Push Max `obase` onto Stack                |
4149 | `V`        | x     | Push Max `scale` onto Stack                |
4150 | `W`        | x     | Push Max of `'` Operator                   |
4151 | `X`        | x     | Scale of a Number                          |
4152 | `Y`        | x     | Length of Array                            |
4153 | `Z`        | x     | Number of Significant Digits               |
4154 | `[`        | x     | Strings                                    |
4155 | `\\`       | x     | Escaping Brackets in Strings               |
4156 | `]`        | x     | Strings                                    |
4157 | `^`        | x     | Power                                      |
4158 | `_`        | x     | Negative Numbers and Negation              |
4159 | Backtick   |       |                                            |
4160 | `a`        | x     | Asciify                                    |
4161 | `b`        | x     | Absolute Value                             |
4162 | `c`        | x     | Clear Stack                                |
4163 | `d`        | x     | Duplication of Top of Stack                |
4164 | `e`        | x     | Else in Conditional Execution of Registers |
4165 | `f`        | x     | Printing the Stack                         |
4166 | `g`        | x     | Global Settings                            |
4167 | `h`        | x     | Shift Right                                |
4168 | `i`        | x     | Set `ibase`                                |
4169 | `j`        | x     | Set `seed`                                 |
4170 | `k`        | x     | Set `scale`                                |
4171 | `l`        | x     | Load from Register                         |
4172 | `m`        | x     | Boolean Or Operator                        |
4173 | `n`        | x     | Print and Pop                              |
4174 | `o`        | x     | Set `obase`                                |
4175 | `p`        | x     | Print with Newline                         |
4176 | `q`        | x     | Quit Two Macros                            |
4177 | `r`        | x     | Swap Top Two Items                         |
4178 | `s`        | x     | Store into Register                        |
4179 | `t`        | x     | Equivalent of `bc`'s `is_number()`         |
4180 | `u`        | x     | Equivalent of `bc`'s `is_string()`         |
4181 | `v`        | x     | Square Root                                |
4182 | `w`        |       |                                            |
4183 | `x`        | x     | Execute String                             |
4184 | `y`        | x     | Current Depth of a Register                |
4185 | `z`        | x     | Current Depth of Stack                     |
4186 | `{`        | x     | Greater Than or Equal Operator             |
4187 | `\|`       | x     | Moduler Exponentiation                     |
4188 | `}`        | x     | Less Than or Equal Operator                |
4189 | `~`        | x     | Division and Modulus Combined              |
4190
4191 #### `bc` Parsing
4192
4193 `bc`'s parser is, by far, the most sensitive piece of code in this software, and
4194 there is a very big reason for that: `bc`'s standard is awful and defined a very
4195 poor language.
4196
4197 The standard says that either semicolons or newlines can end statements. Trying
4198 to parse the end of a statement when it can either be a newline or a semicolon
4199 is subtle. Doing it in the presence of control flow constructs that do not have
4200 to use braces is even harder.
4201
4202 And then comes the biggest complication of all: `bc` has to assume that it is
4203 *always* at a REPL (Read-Eval-Print Loop). `bc` is, first and foremost, an
4204 *interactive* utility.
4205
4206 ##### Flags
4207
4208 All of this means that `bc` has to be able to partially parse something, store
4209 enough data to recreate that state later, and return, making sure to not
4210 execute anything in the meantime.
4211
4212 *That* is what the flags in [`include/bc.h`][106] are: they are the state that
4213 `bc` is saving for itself.
4214
4215 It saves them in a stack, by the way, because it's possible to nest
4216 structures, just like any other programming language. Thus, not only does it
4217 have to store state, it needs to do it arbitrarily, and still be able to
4218 come back to it.
4219
4220 So `bc` stores its parser state with flags in a stack. Careful setting of these
4221 flags, along with properly using them and maintaining the flag stack, are what
4222 make `bc` parsing work, but it's complicated. In fact, as I mentioned, the `bc`
4223 parser is the single most subtle, fickle, and sensitive piece of code in the
4224 entire codebase. Only one thing came close once: square root, and that was only
4225 sensitive because I wrote it wrong. This parser is pretty good, and it is
4226 *still* sensitive. And flags are the reason why.
4227
4228 For more information about what individual flags there are, see the comments in
4229 [`include/bc.h`][106].
4230
4231 ##### Labels
4232
4233 `bc`'s language is Turing-complete. That means that code needs the ability to
4234 jump around, specifically to implement control flow like `if` statements and
4235 loops.
4236
4237 `bc` handles this while parsing with what I called "labels."
4238
4239 Labels are markers in the bytecode. They are stored in functions alongside the
4240 bytecode, and they are just indices into the bytecode.
4241
4242 When the `bc` parser creates a label, it pushes an index onto the labels array,
4243 and the index of the label in that array is the index that will be inserted into
4244 the bytecode.
4245
4246 Then, when a jump happens, the index pulled out of the bytecode is used to index
4247 the labels array, and the label (index) at the index is then used to set the
4248 instruction pointer.
4249
4250 ##### Cond Labels
4251
4252 "Cond" labels are so-called because they are used by conditionals.
4253
4254 The key to them is that they come *before* the code that uses them. In other
4255 words, when jumping to a condition, code is jumping *backwards*.
4256
4257 This means that when a cond label is created, the value that should go there is
4258 well-known. Cond labels are easy.
4259
4260 However, they are still stored on a stack so that the parser knows what cond
4261 label to use.
4262
4263 ##### Exit Labels
4264
4265 Exit labels are not so easy.
4266
4267 "Exit" labels are so-called because they are used by code "exiting" out of `if`
4268 statements or loops.
4269
4270 The key to them is that they come *after* the code that uses them. In other
4271 words, when jumping to an exit, code is jumping *forwards*.
4272
4273 But this means that when an exit label is created, the value that should go
4274 there is *not* known. The code that needs it must be parsed and generated first.
4275
4276 That means that exit labels are created with the index of `SIZE_MAX`, which is
4277 then specifically checked for with an assert in `bc_program_exec()` before using
4278 those indices.
4279
4280 There should ***NEVER*** be a case when an exit label is not filled in properly
4281 if the parser has no bugs. This is because every `if` statement, every loop,
4282 must have an exit, so the exit must be set. If not, there is a bug.
4283
4284 Exit labels are also stored on a stack so that the parser knows what exit label
4285 to use.
4286
4287 ##### Expression Parsing
4288
4289 `bc` has expressions like you might expect in a typical programming language.
4290 This means [infix notation][107].
4291
4292 One thing about infix notation is that you can't just generate code straight
4293 from it like you can with [Reverse Polish notation][108]. It requires more work
4294 to shape it into a form that works for execution on a stack machine.
4295
4296 That extra work is called the [Shunting-Yard algorithm][109], and the form it
4297 translates infix notation into is...[Reverse Polish notation][108].
4298
4299 In order to understand the rest of this section, you must understand the
4300 [Shunting-Yard algorithm][109]. Go do that before you read on.
4301
4302 ###### Operator Stack
4303
4304 In `bc`, the [Shunting-Yard algorithm][109] is implemented with bytecode as the
4305 output and an explicit operator stack (the `ops` field in `BcParse`) as the
4306 operator stack. It stores tokens from `BcLex`.
4307
4308 However, there is one **HUGE** hangup: multiple expressions can stack. This
4309 means that multiple expressions can be parsed at one time (think an array element
4310 expression in the middle of a larger expression). Because of that, we need to
4311 keep track of where the previous expression ended. That's what `start` parameter
4312 to `bc_parse_operator()` is.
4313
4314 Parsing multiple expressions on one operator stack only works because
4315 expressions can only *stack*; this means that, if an expression begins before
4316 another ends, it must *also* end before that other expression ends. This
4317 property ensures that operators will never interfere with each other on the
4318 operator stack.
4319
4320 Also, the operator precedence is reversed: a lower precedence *value* means a
4321 higher precedence.
4322
4323 ###### Recursion
4324
4325 Because expressions can stack, parsing expressions actually requires recursion.
4326 Well, it doesn't *require* it, but the code is much more readable that way.
4327
4328 This recursion is indirect; the functions that `bc_parse_expr_err()` (the actual
4329 expression parsing function) calls can, in turn, call it.
4330
4331 ###### Expression Flags
4332
4333 There is one more big thing: not all expressions in `bc` are equal.
4334
4335 Some expressions have requirements that others don't have. For example, only
4336 array arguments can be arrays (which are technically not expressions, but are
4337 treated as such for parsing), and some operators (in POSIX) are not allowed in
4338 certain places.
4339
4340 For this reason, functions that are part of the expression parsing
4341 infrastructure in `bc`'s parser usually take a `flags` argument. This is meant
4342 to be passed to children, and somewhere, they will be checked to ensure that the
4343 resulting expression meets its requirements.
4344
4345 There are also places where the flags are changed. This is because the
4346 requirements change.
4347
4348 Maintaining the integrity of the requirements flag set is an important part of
4349 the `bc` parser. However, they do not have to be stored on a stack because their
4350 stack is implicit from the recursion that expression parsing uses.
4351
4352 ### Functions
4353
4354 Functions, in `bc`, are data structures that contain the bytecode and data
4355 produced by the parsers. Functions are what the `BcProgram` program executes.
4356
4357 #### Main and Read Functions
4358
4359 There are two functions that always exist, which I call the "main" and "read"
4360 functions.
4361
4362 The "main" function is the function in which any code and data outside other
4363 functions is put. Basically, it is the function where the scripting code ends
4364 up.
4365
4366 The "read" function is the function that is reset and parsed every time a call
4367 to the `read()` builtin function happens.
4368
4369 #### `dc` Strings
4370
4371 In `dc`, strings can be executed, and since there are no actual "functions" in
4372 `dc`, strings are handled as functions. In fact, they are effectively translated
4373 into functions by parsing.
4374
4375 ##### Tail Calls
4376
4377 Since strings in `dc` are functions, and the fact that `dc` has no native loops,
4378 such loops are implemented in `dc` code using strings with conditional execution
4379 commands at the end of strings.
4380
4381 When such conditional execution, or even unconditional execution, commands are
4382 the very last commands in a string, then `dc` can perform a [tail call][202].
4383
4384 This is done by recording the fact that a tail call happened, done by
4385 incrementing an integer on a stack. When a string is executed *without* a tail
4386 call, a new entry is pushed onto the stack with the integer `1`.
4387
4388 When a string finally quits that followed tail calls, its stack entry is popped,
4389 eliminating all of those tail calls.
4390
4391 Why perform tail calls? Because otherwise, `dc` would be subject to the same
4392 thing that plagues [functional programming languages][203]: stack overflow. In
4393 `dc`'s case, that would manifest itself as a growing [heap][204], because the
4394 execution stack is stored on the heap, until a fatal allocation failure would
4395 occur.
4396
4397 #### Execution
4398
4399 Execution is handled by an interpreter implemented using `BcProgram` and code
4400 in [`src/program.c`][53].
4401
4402 The interpreter is a mix between a [stack machine][210] and a [register
4403 machine][211]. It is a stack machine in that operations happen on a stack I call
4404 the "results stack," but it is a register machine in that items on the stack can
4405 be stored to and loaded from "registers" (`dc` terminology), variables (`bc`
4406 terminology), and arrays.
4407
4408 ##### Stacks
4409
4410 There are two stacks in the interpreter:
4411
4412 * The "results" stack (as mentioned above).
4413 * The "execution" stack.
4414
4415 The results stack (the `results` field of the `BcProgram` struct) is the stack
4416 where the results of computations are stored. It is what makes the interpreter
4417 part [stack machine][210]. It is filled with `BcResult`'s.
4418
4419 The execution stack (the `stack` field of the `BcProgram` struct) is the stack
4420 that tracks the current execution state of the interpreter. It is the presence
4421 of this separate stack that allows the interpreter to implement the machine as a
4422 loop, rather than recursively. It is filled with `BcInstPtr`'s, which are the
4423 "instruction pointers."
4424
4425 These instruction pointers have three fields, all integers:
4426
4427 * `func`, the index of the function that is currently executing.
4428 * `idx`, the index of the next bytecode instruction to execute in the function's
4429   bytecode array.
4430 * `len`, which is the length of the results stack when the function started
4431   executing. This is not used by `dc`, but it used by `bc` because functions
4432   in `bc` should never affect the results stack of their callers.
4433
4434 With these three fields, and always executing using the instruction pointer at
4435 the top of the execution stack, the interpreter can always keep track of its
4436 execution.
4437
4438 When a function or a string starts executing, a new `BcInstPtr` is pushed onto
4439 the execution stack for it. This includes if a function was called recursively.
4440 And then, when the function or string returns, its `BcInstPtr` is popped off of
4441 the execution stack.
4442
4443 ##### Bytecode
4444
4445 Execution of functions are done through bytecode produced directly by the
4446 parsers (see the [Parsing][209]). This bytecode is stored in the `code`
4447 [vector][111] of the `BcFunc` struct.
4448
4449 This is a vector for two reasons:
4450
4451 * It makes it easier to add bytecode to the vector in the parsers.
4452 * `bc` allows users to redefine functions.
4453
4454 The reason I can use bytecode is because there are less than 256 instructions,
4455 so an `unsigned char` can store all the bytecodes.
4456
4457 ###### Bytecode Indices
4458
4459 There is one other factor to bytecode: there are instructions that need to
4460 reference strings, constants, variables, or arrays. Bytecode need some way to
4461 reference those things.
4462
4463 Fortunately, all of those things can be referenced in the same way: with indices
4464 because all of the items are in vectors.
4465
4466 So `bc` has a way of encoding an index into bytecode. It does this by, after
4467 pushing the instruction that references anything, pushing a byte set to the
4468 length of the index in bytes, then the bytes of the index are pushed in
4469 little-endian order.
4470
4471 Then, when the interpreter encounters an instruction that needs one or more
4472 items, it decodes the index or indices there and updates the `idx` field of the
4473 current `BcInstPtr` to point to the byte after the index or indices.
4474
4475 One more thing: the encoder of the indices only pushes as many bytes as
4476 necessary to encode the index. It stops pushing when the index has no more bytes
4477 with any 1 bits.
4478
4479 ##### Variables
4480
4481 In `bc`, the vector of variables, `vars` in `BcProgram`, is not a vector of
4482 numbers; it is a vector of vector of numbers. The first vector is the vector of
4483 variables, the second is the variable stack, and the last level is the actual
4484 number.
4485
4486 This is because both `bc` and `dc` need variables to be stacks.
4487
4488 For `dc`, registers are *defined* to be stacks.
4489
4490 For `bc`, variables as stacks is how function arguments/parameters and function
4491 `auto` variables are implemented.
4492
4493 When a function is called, and a value needs to be used as a function argument,
4494 a copy of the value is pushed onto the stack corresponding to the variable with
4495 the same name as the function's parameter. For `auto` variables, a new number
4496 set to zero is pushed onto each stack corresponding to the `auto` variables.
4497 (Zero is used because the [`bc` spec][2] requires that `auto` variables are set
4498 to zero.)
4499
4500 It is in this way that the old value of the variable, which may not even be
4501 related to the function parameter or `auto` variable, is preserved while the
4502 variable is used as a function parameter or `auto` variable.
4503
4504 When the function returns, of course, the stacks of the variables for the
4505 parameters and `auto`'s will have their top item popped, restoring the old value
4506 as it was before the function call.
4507
4508 ##### Arrays
4509
4510 Like variables, arrays are also implemented as stacks. However, because they are
4511 arrays, there is yet another level; the `arrs` field in `BcProgram` is a vector
4512 of vectors of vectors of numbers. The first of the two levels is the vector of
4513 arrays, the second the stack of for each array, the third the actual array, and
4514 last the numbers in the array.
4515
4516 `dc` has no need of this extra stack, but `bc` does because arrays can be
4517 function parameters themselves.
4518
4519 When arrays are used for function arguments, they are copied with a deep copy;
4520 each item of the source vector is copied. This is because in `bc`, according to
4521 the [`bc` spec][2], all function arguments are passed by value.
4522
4523 However, array references are possible (see below).
4524
4525 When arrays are used as `auto`'s, a new vector is pushed with one element; if
4526 more elements are needed, the array is grown automatically, and new elements are
4527 given the value of zero.
4528
4529 In fact, if *any* array is accessed and does not have an element at that index,
4530 the array is automatically grown to that size, and all new elements are given
4531 the value zero. This behavior is guaranteed by the [`bc` spec][2].
4532
4533 ###### Array References
4534
4535 Array references had to be implemented as vectors themselves because they must
4536 be pushed on the vectors stacks, which, as seen above, expect vectors
4537 themselves.
4538
4539 So thus, references are implemented as vectors on the vector stacks. These
4540 vectors are not vectors of vectors themselves; they are vectors of bytes; in
4541 fact, the fact that they are byte vectors and not vector vectors is how a
4542 reference vector is detected.
4543
4544 These reference vectors always have the same two things pushed: a byte encoding
4545 (the same way bytecode indices are) of the referenced vector's index in the
4546 `arrs` vector, and a byte encoding of the referenced vectors index in the vector
4547 stack.
4548
4549 If an item in a referenced vector is needed, then the reference is dereferenced,
4550 and the item is returned.
4551
4552 If a reference vector is passed to a function that does *not* expect a
4553 reference, the vector is dereferenced and a deep copy is done, in the same way
4554 as vectors are copied for normal array function parameters.
4555
4556 ### Callbacks
4557
4558 There are many places in `bc` and `dc` where function pointers are used:
4559
4560 * To implement destructors in vectors. (See the [Vectors][111] section.)
4561 * To select the correct lex and parse functions for `bc` and `dc`.
4562 * To select the correct function to execute unary operators.
4563 * To select the correct function to execute binary operators.
4564 * To calculate the correct number size for binary operators.
4565 * To print a "digit" of a number.
4566 * To seed the pseudo-random number generator.
4567
4568 And there might be more.
4569
4570 In every case, they are used for reducing the amount of code. Instead of
4571 `if`/`else` chains, such as:
4572
4573 ```
4574 if (BC_IS_BC) {
4575         bc_parse_parse(vm.parse);
4576 }
4577 else {
4578         dc_parse_parse(vm.parse);
4579 }
4580 ```
4581
4582 The best example of this is `bc_num_binary()`. It is called by every binary
4583 operator. It figures out if it needs to allocate space for a new `BcNum`. If so,
4584 it allocates the space and then calls the function pointer to the *true*
4585 operation.
4586
4587 Doing it like that shrunk the code *immensely*. First, instead of every single
4588 binary operator duplicating the allocation code, it only exists in one place.
4589 Second, `bc_num_binary()` itself does not have a massive `if`/`else` chain or a
4590 `switch` statement.
4591
4592 But perhaps the most important use was for destructors in vectors.
4593
4594 Most of the data structures in `bc` are stored in vectors. If I hadn't made
4595 destructors available for vectors, then ensuring that `bc` had no memory leaks
4596 would have been nigh impossible. As it is, I check `bc` for memory leaks every
4597 release when I change the code, and I have not released `bc` after version
4598 `1.0.0` with any memory leaks, as far as I can remember anyway.
4599
4600 ### Numbers
4601
4602 In order to do arbitrary-precision math, as `bc` must do, there must be some way
4603 of representing arbitrary-precision numbers. `BcNum` in [`include/num.h`][184]
4604 is `bc`'s way of doing that.
4605
4606 (Note: the word ["limb"][214] is used below; it has a specific meaning when
4607 applied to arbitrary-precision numbers. It means one piece of the number. It can
4608 have a single digit, which is what GNU `bc` does, or it can have multiple, which
4609 is what this `bc` does.)
4610
4611 This struct needs to store several things:
4612
4613 * The array of limbs of the number. This is the `num` field.
4614 * The location of the decimal point. This is the `rdx` (short for [radix][215])
4615   field.
4616 * The number of limbs the number has. This is the `len` field.
4617 * Whether the number is negative or not. This is the least significant bit of
4618   the `rdx` field. More on that later.
4619
4620 In addition, `bc`'s number stores the capacity of the limb array; this is the
4621 `cap` field.
4622
4623 If the number needs to grow, and the capacity of the number is big enough, the
4624 number is not reallocated; the number of limbs is just added to.
4625
4626 There is one additional wrinkle: to make the usual operations (binary operators)
4627 fast, the decimal point is *not* allowed to be in the middle of a limb; it must
4628 always be between limbs, after all limbs (integer), or before all limbs (real
4629 between -1 and 1).
4630
4631 The reason for this is because addition, subtraction, multiplication, and
4632 division expect digits to be lined up on the decimal point. By requiring that it
4633 be between limbs, no extra alignment is needed, and those operations can proceed
4634 without extra overhead.
4635
4636 This does make some operations, most notably extending, truncating, and
4637 shifting, more expensive, but the overhead is constant, and these operations are
4638 usually cheap compared to the binary operators anyway.
4639
4640 This also requires something else: `bc` numbers need to know *exactly* how many
4641 decimal places they have after the decimal point. If the decimal point must be
4642 inbetween limbs, the last decimal place could be in the middle of a limb. The
4643 amount of decimal places in a number is carefully tracked and stored in the
4644 `scale` field, and this number must always coincide with the `rdx` field by the
4645 following formula:
4646
4647 ```
4648 scale + (BC_BASE_DIGS - 1) / BC_BASE_DIGS == rdx >> 1
4649 ```
4650
4651 (`BC_BASE_DIGS` is the number of decimal digits stored in one limb. It is 9 on
4652 64-bit systems and 4 on other systems.)
4653
4654 Yes, `rdx` is shifted; that is because the negative bit is stored in the least
4655 significant bit of the `rdx` field, and the actual radix (amount of limbs after
4656 the decimal/radix point) is stored in the rest of the bits. This is safe because
4657 `BC_BASE_DIGS` is always at least 4, which means `rdx` will always need at least
4658 2 bits less than `scale`.
4659
4660 In addition to `rdx` always matching `scale`, another invariant is that `rdx`
4661 must always be less than or equal to `len`. (Because `scale` may be greater than
4662 `rdx`, `scale` does not have to be less than or equal to `len`.)
4663
4664 Another invariant is that `len` must always be less than or equal to `cap`, for
4665 obvious reasons.
4666
4667 The last thing programmers need to know is that the limb array is stored in
4668 little-endian order. This means that the last decimal places are in the limb
4669 stored at index 0, and the most significant digits are stored at index `len-1`.
4670
4671 This is done to make the most important operations fast. Addition and
4672 subtraction are done from least significant to most significant limbs, which
4673 means they can speed through memory in the way most computers are best at.
4674 Multiplication does the same, sort of, and with division, it matters less.
4675 Comparison does need to go backwards, but that's after exhausting all other
4676 alternatives, including for example, checking the length of the integer portion
4677 of each limb array.
4678
4679 Finally, here are some possible special situations with numbers and what they
4680 mean:
4681
4682 * `len == 0`: the number equals 0.
4683 * `len == 0 && scale != 0`: the number equals 0, but it has a `scale` value.
4684   This is the only case where `scale` does not have to coincide with `rdx`
4685   This can happen with division, for example, that sets a specific `scale` for
4686   the result value but may produce 0.
4687 * `(rdx >> 1) == len`: the number is greater than or equal to 1, or less than or
4688   equal to -1.
4689 * `(rdx >> 1) < len`: the number is greater than -1 and less than 1, not
4690   including 0, although this will be true for 0 as well. However, 0 is always
4691   assumed to be represented by `len == 0`.
4692 * `(rdx >> 1) == 0`: the number is an integer. In this case, `scale` must also
4693   equal 0.
4694
4695 #### Math Style
4696
4697 When I wrote the math for `bc`, I adopted a certain style that, if known, will
4698 make it easier to understand the code. The style follows these rules:
4699
4700 * `BcNum` arguments always come before arguments of other types.
4701 * Among the `BcNum` arguments, the operands always come first, and the `BcNum`
4702   where the result(s) will be stored come last.
4703 * Error checking is placed first in the function.
4704 * Easy cases are placed next.
4705 * Preparation, such as allocating temporaries, comes next.
4706 * The actual math.
4707 * Cleanup and ensuring invariants.
4708
4709 While these rules are not hard and fast, using them as a guide will probably
4710 help.
4711
4712 #### Digit Clamping
4713
4714 There is a question of what to do when parsing numbers and coming across digits
4715 that are greater than or equal to the current `ibase`. `bc` and `dc` do one of
4716 two things:
4717
4718 * Clamp the digit to the maximum correct digit, or
4719 * Use the digit as-is, multiplying it by the `ibase` to the power of the digit's
4720   position (starting from 0 at the least significant digit).
4721
4722 The former behavior is what GNU `bc` does, and the latter is what GNU `dc`, the
4723 OpenBSD `bc`, and the OpenBSD `dc` do.
4724
4725 It is possible to switch between the two with the `-c` and `-C` command-line
4726 options. However, it is important for developers to understand that both
4727 behaviors exist and should exist.
4728
4729 The library can also do both, and it is set in a global for each thread.
4730
4731 ### Strings as Numbers
4732
4733 Strings can be assigned to variables. This is a problem because the vectors for
4734 variable stacks expect `BcNum` structs only.
4735
4736 While I could have made a union, I decided that the complexity of adding an
4737 entirely new type, with destructor and everything, was not worth it. Instead, I
4738 took advantage of the fact that `free()`, when passed a `NULL` pointer, will do
4739 nothing.
4740
4741 Using that, I made it so `BcNum`'s could store strings instead. This is marked
4742 by the `BcNum` having a `NULL` limb array (`num`) and a `cap` of 0 (which should
4743 *never* happen with a real number, though the other fields could be 0).
4744
4745 The `BcNum` stores the function that stores the string in the `rdx` field, and
4746 it stores the index of the string in the `scale` field. This is used to actually
4747 load the string if necessary.
4748
4749 Note that historically, string information was stored in the `loc` field of
4750 the `d` union in a `BcResult`. This was changed recently to standardize; now,
4751 all string information are stored in the `n` field of the `d` union regardless.
4752 This means that all string information is stored in `BcNum`'s. This removes
4753 extra cases.
4754
4755 Also, if a temp is made with a string, then the result type should still be
4756 `BC_RESULT_STR`, not `BC_RESULT_TEMP`. This is to make it easier to do type
4757 checks.
4758
4759 ### Pseudo-Random Number Generator
4760
4761 In order to understand this section, I suggest you read the information in the
4762 manpages about the pseudo-random number generator (PRNG) first; that will help
4763 you understand the guarantees it has, which is important because this section
4764 delves into implementation details.
4765
4766 First, the PRNG I use is seeded; this is because most OS's have an excellent
4767 cryptographically secure PRNG available via command-line, usually
4768 `/dev/urandom`, but the only *seeded* PRNG available is usually `bash`'s
4769 `$RANDOM`, which is essentially a wrapper around C's `rand()`.
4770
4771 `rand()` is...bad. It is only guaranteed to return 15 bits of random data.
4772 Obviously, getting good random data out of that would be hard with that alone,
4773 but implementations also seem to be poor.
4774
4775 On top of that, `bc` is an arbitrary-precision calculator; if I made it able to
4776 generate random numbers, I could make it generate random numbers of any size,
4777 and since it would be seeded, results would be reproducible, when wanted.
4778
4779 So to get that, I needed a seeded PRNG with good characteristics. After scouring
4780 the Internet, I decided on the [PCG PRNG][215], mostly because of [this blog
4781 post][216]. Part of the reason was the behavior of the xoroshiro128+ author, who
4782 hates on PCG and its author, but also because PCG seemed to do better when
4783 tested by independent parties.
4784
4785 After that decision, I faced a challenge: PCG requires 255 bits of seed: 128 for
4786 the actual seed, and 127 for the "increment." (Melissa O'Neill, the PCG author,
4787 likens the increment to selecting a codebook.)
4788
4789 I could, of course, put the entire 255 bits into one massive arbitrary-precision
4790 number; `bc` is good at that, after all. But that didn't sit right with me
4791 because it would mean any seed selected by users would have the real portion
4792 ignored, which is stupid in a program like `bc`.
4793
4794 Instead, I decided to make the integer portion the increment (clamped down to
4795 size), and the real portion the seed.
4796
4797 In most cases, this would be a bad idea because you cannot, in general, know how
4798 many decimal places you need to represent any number with `n` real digits in
4799 base `b` in another base. However, there is an easy to how many decimal digits
4800 after the decimal point it takes to represent reals of base 2 in base 10: the
4801 power of two.
4802
4803 It turns out that, for base 2 represented in base 10, the power of 2 is
4804 *exactly* how many digits are necessary to represent *any* number `n/2^p`, where
4805 `p` is the power of 2. This is because at every halving, the number of decimal
4806 places increases by 1:
4807
4808 ```
4809 0.5
4810 0.25
4811 0.125
4812 0.0625
4813 0.03125
4814 0.015625
4815 ...
4816 ```
4817
4818 This happens because because every decimal representation of `1/2^p` ends with a
4819 5 because 5 is half of 10. Because 5 is odd, half of it always ends with the
4820 digits 25, where 2 is where the previous 5 was, and the new 5 is one decimal
4821 place further right.
4822
4823 So the algorithm to convert all 255 bits of the seed is as follows:
4824
4825 1.      Convert the increment to a `BcNum`.
4826 2.      Convert the seed to a `BcNum`.
4827 3.      Divide the seed by `2^128` with a `scale` of 128. (For 32-bit systems,
4828         substitute 64 bits for 128.)
4829 4.      Add the two numbers together.
4830
4831 Likewise, the algorithm to convert from a user-supplied number to a seed is:
4832
4833 1.      Truncate a copy of the number.
4834 2.      Subtract the result from #1 from the original number. This gives the real
4835         portion of the number.
4836 3.      Clamp the result of #1 to 127 (or 63) bits. This is the increment.
4837 4.      Multiply the result of #2 by `2^128`.
4838 5.      Truncate the result of #4. This is the seed.
4839
4840 #### Generating Arbitrary-Precision Numbers
4841
4842 I wrote a function (`bc_rand_bounded()`) that will return unbiased results with
4843 any bound below the max that PCG can generate.
4844
4845 To generate an integer of arbitrary size using a bound, `bc` simply uses
4846 `bc_rand_bounded()` to generate numbers with a bound `10^BC_BASE_DIGS` for as
4847 many limbs as needed to satisfy the bigger bound.
4848
4849 To generate numbers with arbitrary precision after the decimal point, `bc`
4850 merely generates an arbitrary precision integer with the bound `10^p`, where `p`
4851 is the desired number of decimal places, then divides in by `10^p` with a
4852 `scale` of `p`.
4853
4854 ## Debug Code
4855
4856 Besides building `bc` in debug mode with the `-g` flag to [`configure.sh`][69],
4857 programmers can also add `-DBC_DEBUG_CODE=1` to the `CFLAGS`. This will enable
4858 the inclusion of *a lot* of extra code to assist with debugging.
4859
4860 For more information, see all of the code guarded by `#if BC_DEBUG_CODE` in the
4861 [`include/`][212] directory and in the [`src/`][213] directory.
4862
4863 Yes, all of the code is guarded by `#if` preprocessor statements; this is
4864 because the code should *never* be in a release build, and by making programmers
4865 add this manually (not even an option to [`configure.sh`][69]), it is easier to
4866 ensure that never happens.
4867
4868 However, that said, the extra debug code is useful; that was why I kept it in.
4869
4870 ## Performance
4871
4872 While I have put in a lot of effort to make `bc` as fast as possible, there
4873 might be some things users can do to speed it up without changing the code.
4874
4875 First, users can probably use [profile-guided optimization][217] to optimize
4876 even better, using the test suite to profile.
4877
4878 Second, I included macros that might help branch placement and prediction:
4879
4880 * `BC_ERR(e)`
4881 * `BC_UNLIKELY(e)`
4882 * `BC_NO_ERR(e)`
4883 * `BC_LIKELY(e)`
4884
4885 `BC_ERR` is the same as `BC_UNLIKELY`, and `BC_NO_ERR` is the same as
4886 `BC_LIKELY`; I just added them to also document branches that lead to error
4887 conditions or *away* from error conditions.
4888
4889 Anyway, if `BC_LIKELY` and `BC_UNLIKELY` are not defined during compilation,
4890 they expand to nothing but the argument they were given.
4891
4892 They can, however, be defined to `__builtin_expect((e), 1)` and
4893 `__builtin_expect((e), 0)`, respectively, on GCC and Clang for better branch
4894 prediction and placement. (For more information about `__builtin_expect()` see
4895 the [GCC documentation][218].)
4896
4897 There might be other compilers that can take advantage of that, but I don't know
4898 anything about that.
4899
4900 Also, as stated in the [build manual][219], link-time optimization is excellent
4901 at optimizing this `bc`. Use it.
4902
4903 ### Benchmarks
4904
4905 To help programmers improve performance, I have built and assembled
4906 infrastructure to make benchmarking easy.
4907
4908 First, in order to easily run benchmarks, I created
4909 [`scripts/benchmark.sh`][220].
4910
4911 Second, I copied and adapted [`ministat.c`][223] [from FreeBSD][221], to make it
4912 easier to judge whether the results are significant or not.
4913
4914 Third, I made the `make` clean target `make clean_benchmarks`, to clean
4915 `scripts/ministat` and the generated benchmark files.
4916
4917 Fourth, I made it so [`scripts/benchmark.sh`][220] outputs the timing and memory
4918 data in a format that is easy for `scripts/ministat` to digest.
4919
4920 To add a benchmark, add a script in the right directory to generate the
4921 benchmark. Yes, generate.
4922
4923 All of the benchmarks are generated first, from `.bc` and `.dc` files in the
4924 [`benchmarks/bc/`][91] and [`benchmarks/dc/`][224]. This is so that massive
4925 amounts of data can be generated and then pushed through the calculators.
4926
4927 If you need to benchmark `bc` or `dc` with simple loops, have the generator
4928 files simply print the loop code.
4929
4930 ### Caching of Numbers
4931
4932 In order to provide some performance boost, `bc` tries to reuse old `BcNum`'s
4933 that have the default capacity (`BC_NUM_DEF_SIZE`).
4934
4935 It does this by allowing `bc_num_free()` to put the limb array onto a
4936 statically-allocated stack (it's just a global array with a set size). Then,
4937 when a `BcNum` with the default capacity is needed, `bc_num_init()` asks if any
4938 are available. If the answer is yes, the one on top of the stack is returned.
4939 Otherwise, `NULL` is returned, and `bc_num_free()` knows it needs to `malloc()`
4940 a new limb array.
4941
4942 When the stack is filled, any numbers that `bc` attempts to put on it are just
4943 freed.
4944
4945 This setup saved a few percent in my testing for version [3.0.0][32], which is
4946 when I added it.
4947
4948 ## `bcl`
4949
4950 At the request of one of my biggest users, I spent the time to make a build mode
4951 where the number and math code of `bc` could be wrapped into a library, which I
4952 called `bcl`.
4953
4954 This mode is exclusive; `bc` and `dc` themselves are *not* built when building
4955 `bcl`.
4956
4957 The only things in the `bc` math code that is not included is:
4958
4959 * Printing newlines (clients do not care about `bc`'s line lenth restriction).
4960 * `dc`'s stream print.
4961
4962 Even the [pseudo-random number generator][179] is included, with extra support
4963 for generating real numbers with it. (In `bc`, such support is in
4964 [`lib2.bc`][26].)
4965
4966 ### Signal Handling
4967
4968 `bcl` does not have the capability for signal handling (as of version `6.0.0`).
4969
4970 ### Threads
4971
4972 It is possible to use `bcl` across multiple threads. However, `bcl` must be
4973 initialized on each thread where it will be used.
4974
4975 This is because `bcl` uses thread-specific data to store the "globals" for each
4976 thread. This is to ensure that threads do not stomp on each other's numbers or
4977 other data structures.
4978
4979 ### Contexts
4980
4981 Contexts were an idea by the same user that requested `bcl`. They are meant to
4982 make it so multiple clients in one program can keep their data separate from
4983 each other.
4984
4985 ### Numbers
4986
4987 Numbers in `bcl` are literally indices into an encapsulated array of numbers,
4988 hidden in the context. These indices are then passed to clients to refer to
4989 numbers later.
4990
4991 ### Operand Consumption
4992
4993 Most math functions in `bcl` "consume" their operand arguments; the arguments
4994 are freed, whether or not an error is returned.
4995
4996 This is to make it easy to implement math code, like this:
4997
4998 ```
4999 n = bcl_add(bcl_mul(a, b), bcl_div(c, d));
5000 ```
5001
5002 If numbers need to be preserved, they can be with `bcl_dup()`:
5003
5004 ```
5005 n = bcl_add(bcl_mul(bcl_dup(a), bc_dup(b)), bcl_div(bcl_dup(c), bcl_dup(d)));
5006 ```
5007
5008 ### Errors
5009
5010 Errors can be encoded in the indices representing numbers, and where necessary,
5011 clients are responsible for checking those errors.
5012
5013 The encoding of errors is this: if an error happens, the value `0-error` is
5014 returned. To decode, do the exact same thing. Thus, any index above
5015 `0-num_errors` is an error.
5016
5017 If an index that represents an error is passed to a math function, that function
5018 propagates the error to its result and does not perform the math operation.
5019
5020 All of this is to, once again, make it easy to implement the math code as above.
5021
5022 However, where possible, errors are returned directly.
5023
5024 [1]: https://en.wikipedia.org/wiki/Bus_factor
5025 [2]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#top
5026 [3]: https://en.wikipedia.org/wiki/Dc_(Unix)
5027 [4]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5028 [5]: ./bc/A.1.md#standard-library
5029 [6]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc
5030 [7]: ./bc/A.1.md#extended-library
5031 [8]: #libbc-2
5032 [9]: #strgensh
5033 [10]: https://vimeo.com/230142234
5034 [11]: https://gavinhoward.com/2019/12/values-for-yao/
5035 [12]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
5036 [13]: ./build.md#cross-compiling
5037 [14]: ./build.md
5038 [15]: #strgenc
5039 [16]: http://landley.net/toybox/about.html
5040 [17]: https://www.busybox.net/
5041 [18]: https://en.wikipedia.org/wiki/Karatsuba_algorithm
5042 [19]: https://clang-analyzer.llvm.org/scan-build.html
5043 [20]: https://www.valgrind.org/
5044 [21]: https://clang.llvm.org/docs/AddressSanitizer.html
5045 [22]: https://gavinhoward.com/2019/11/finishing-software/
5046 [23]: #history
5047 [24]: https://clang.llvm.org/docs/ClangFormat.html
5048 [25]: ./algorithms.md
5049 [26]: #lib2bc
5050 [27]: #vmh
5051 [28]: https://github.com/rain-1/linenoise-mob
5052 [29]: https://github.com/antirez/linenoise
5053 [30]: #bclh
5054 [31]: #argsh
5055 [32]: ../NEWS.md#3-0-0
5056 [33]: ../NEWS.md
5057 [34]: https://github.com/skeeto/optparse
5058 [35]: #opth
5059 [36]: #historyh
5060 [37]: #randh
5061 [38]: #langh
5062 [39]: #numc
5063 [40]: #bcc
5064 [41]: #bc_lexc
5065 [42]: #bc_parsec
5066 [43]: #libraryc
5067 [44]: #dcc
5068 [45]: #dc_lexc
5069 [46]: #dc_parsec
5070 [47]: #filec
5071 [48]: #historyc
5072 [49]: #langc
5073 [50]: #lexc
5074 [51]: #optc
5075 [52]: #parsec
5076 [53]: #programc
5077 [54]: #randc
5078 [55]: #fileh
5079 [56]: #readc
5080 [57]: #programh
5081 [58]: #vmc
5082 [59]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/gencat.html#top
5083 [60]: #manpagesh
5084 [61]: #bcl3md
5085 [62]: #bcl3
5086 [63]: #bclvcxproj
5087 [64]: #bclvcxprojfilters
5088 [65]: #bclsln
5089 [66]: #bcvcxproj
5090 [67]: #bcvcxprojfilters
5091 [68]: #bcsln
5092 [69]: #configuresh
5093 [70]: #makefilein
5094 [71]: #functionsh
5095 [72]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#top
5096 [73]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18
5097 [74]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html#top
5098 [75]: #versionh
5099 [76]: ##posix-shell-scripts
5100 [77]: #tests
5101 [78]: #karatsubapy
5102 [79]: #bc-1
5103 [80]: #dc-1
5104 [81]: ./build.md#build-type
5105 [82]: #fuzzing-1
5106 [83]: #releasesh
5107 [84]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
5108 [85]: #locales-1
5109 [86]: #manuals-1
5110 [87]: #locale_installsh
5111 [88]: #locale_uninstallsh
5112 [89]: #bc1mdin
5113 [90]: #dc1mdin
5114 [91]: #bc
5115 [92]: https://pandoc.org/
5116 [93]: #release_settingstxt
5117 [94]: #aflpy
5118 [95]: #randmathpy
5119 [96]: #caching-of-numbers
5120 [97]: #error-handling
5121 [98]: #radamsatxt
5122 [99]: https://gitlab.com/akihe/radamsa
5123 [100]: #radamsash
5124 [101]: https://musl.libc.org/
5125 [102]: ./build.md#settings
5126 [103]: #test_settingstxt
5127 [104]: #test_settingssh
5128 [105]: #functionssh
5129 [106]: #bch
5130 [107]: https://en.wikipedia.org/wiki/Infix_notation
5131 [108]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5132 [109]: https://en.wikipedia.org/wiki/Shunting-yard_algorithm
5133 [110]: #bc-parsing
5134 [111]: #vectors
5135 [112]: https://git.yzena.com/Yzena/Yc/src/branch/master/include/yc/vector.h
5136 [113]: https://en.wikipedia.org/wiki/Signal_(IPC)
5137 [114]: #custom-io
5138 [115]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03
5139 [116]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html
5140 [117]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html
5141 [118]: https://www.youtube.com/watch?v=4PaWFYm0kEw
5142 [119]: #fuzz_prepsh
5143 [120]: #bc_aflyaml
5144 [121]: #bc_afl_continueyaml
5145 [122]: https://github.com/tmux/tmux
5146 [123]: https://tmuxp.git-pull.com/
5147 [124]: #test-suite
5148 [125]: https://aflplus.plus/
5149 [126]: #link-time-optimization
5150 [127]: #fuzzing-performance
5151 [128]: #radamsa
5152 [129]: #afl-quickstart
5153 [130]: #convenience
5154 [131]: #datac
5155 [132]: https://git.gavinhoward.com/gavin/vim-bc
5156 [133]: https://git.gavinhoward.com/gavin/bc_libs
5157 [134]: #debugging
5158 [135]: #asserts
5159 [136]: #portability
5160 [137]: https://pexpect.readthedocs.io/en/stable/
5161 [138]: #test-suite-portability
5162 [139]: #historypy
5163 [140]: #historysh
5164 [141]: #group-tests
5165 [142]: #build-system
5166 [143]: #generated-tests
5167 [144]: #benchmarks-1
5168 [145]: #gen
5169 [146]: #test-coverage
5170 [147]: #integration-with-the-build-system
5171 [148]: #test-scripts
5172 [149]: #standard-tests
5173 [150]: #script-tests
5174 [151]: #error-tests
5175 [152]: #stdin-tests
5176 [153]: #read-tests
5177 [154]: #other-tests
5178 [155]: #history-tests
5179 [156]: #bcl
5180 [157]: #bcl-test
5181 [158]: #bclc
5182 [159]: #valgrind
5183 [160]: #addresssanitizer-and-friends
5184 [161]: #bc-2
5185 [162]: #dc-2
5186 [163]: #alltxt-1
5187 [164]: #errorstxt
5188 [165]: #posix_errorstxt
5189 [166]: #timeconstsh
5190 [167]: #alltxt-3
5191 [168]: #errorstxt-1
5192 [169]: #scripts-1
5193 [170]: #scripts-2
5194 [171]: #alltxt-2
5195 [172]: #alltxt-4
5196 [173]: #async-signal-safe-signal-handling
5197 [174]: #vectorh
5198 [175]: #read_errorstxt
5199 [176]: #statush
5200 [177]: #numbers
5201 [178]: #math-style
5202 [179]: #pseudo-random-number-generator
5203 [180]: #lexh
5204 [181]: #parseh
5205 [182]: #dch
5206 [183]: #libraryh
5207 [184]: #numh
5208 [185]: #readh
5209 [186]: #maps
5210 [187]: #slabs-and-slab-vectors
5211 [188]: ./build.md#extra-math
5212 [189]: #command-line-history
5213 [190]: #scriptsed
5214 [191]: #linux-timeconstbc-script
5215 [192]: #corpuses
5216 [193]: ./build.md#history
5217 [194]: https://www.valgrind.org/docs/manual/mc-manual.html
5218 [195]: #othersh
5219 [196]: https://scan.coverity.com/
5220 [197]: https://clang-analyzer.llvm.org/
5221 [198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it
5222 [199]: https://cr.yp.to/docs/selfpipe.html
5223 [200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph
5224 [201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator
5225 [202]: https://en.wikipedia.org/wiki/Tail_call
5226 [203]: https://en.wikipedia.org/wiki/Functional_programming_language
5227 [204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
5228 [205]: #mainc
5229 [206]: #argc
5230 [207]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
5231 [208]: #functions
5232 [209]: #parsing
5233 [210]: https://en.wikipedia.org/wiki/Stack_machine
5234 [211]: https://en.wikipedia.org/wiki/Register_machine
5235 [212]: #include
5236 [213]: #src
5237 [214]: https://gmplib.org/manual/Nomenclature-and-Types
5238 [215]: https://en.wikipedia.org/wiki/Radix_point
5239 [216]: #main-and-read-functions
5240 [215]: https://www.pcg-random.org/
5241 [216]: https://lemire.me/blog/2017/08/22/testing-non-cryptographic-random-number-generators-my-results/
5242 [217]: https://en.wikipedia.org/wiki/Profile-guided_optimization
5243 [218]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
5244 [219]: ./build.md#optimization
5245 [220]: #benchmarksh
5246 [221]: https://cgit.freebsd.org/src/tree/usr.bin/ministat/ministat.c
5247 [222]: https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+13.0-RELEASE+and+Ports&arch=default&format=html
5248 [223]: #ministatc
5249 [224]: #dc
5250 [225]: #allsh
5251 [226]: #errorssh
5252 [227]: #errorsh
5253 [228]: #vectorc
5254 [229]: https://github.com/gavinhoward/bc/pull/72