source/texk/web2c/triptrap/trapman.tex

   1 % The TRAP manual: How to validate MF --- last updated by D E Knuth on 4 Dec 89
   2 \font\eighttt= cmtt8
   3 \font\eightrm= cmr8
   4 \font\titlefont=cmssdc10 at 40pt
   5 \let\mc=\eightrm
   6 \font\logo=manfnt % font used for the METAFONT logo
   7 \def\MF{{\logo META}\-{\logo FONT}}
   8 \rm
   9 \let\mainfont=\tenrm
  10
  11 \def\.#1{\hbox{\tt#1}}
  12 \def\\#1{\hbox{\it#1\/\hskip.05em}} % italic type for identifiers
  13
  14 \parskip 2pt plus 1pt
  15 \baselineskip 12pt plus .25pt
  16
  17 \def\verbatim#1{\begingroup \frenchspacing
  18   \def\do##1{\catcode`##1=12 } \dospecials
  19   \parskip 0pt \parindent 0pt
  20   \catcode`\ =\active \catcode`\^^M=\active
  21   \tt \def\par{\ \endgraf} \obeylines \obeyspaces
  22   \input #1 \endgroup}
  23 % a blank line will be typeset at the end of the file;
  24 % if you're unlucky it will appear on a page by itself!
  25 {\obeyspaces\global\let =\ }
  26
  27 \output{\shipout\box255\global\advance\pageno by 1} % for the title page only
  28 \null
  29 \vfill
  30 \centerline{\titlefont A Torture Test}
  31 \vskip8pt
  32 \centerline{\titlefont for \logo ()*+,-.*}
  33 \vskip 24pt
  34 \centerline{by Donald E. Knuth}
  35 \centerline{Stanford University}
  36 \vskip 6pt
  37 \centerline{({\sl Version 2, January 1990\/})}
  38 \vfill
  39 \centerline{\vbox{\hsize 4in
  40 \noindent Programs that claim to be implementations of \MF84 are
  41 supposed to be able to process the test routine contained in this
  42 report, producing the outputs contained in this report.}}
  43 \vskip 24pt
  44 {\baselineskip 9pt
  45 \eightrm\noindent
  46 The preparation of this report was supported in part by the National Science
  47 Foundation under grants IST-8201926 and MCS-8300984,
  48 and by the System Development Foundation.
  49 {\logo opqrstuq} is a trademark of Addison-Wesley Publishing Company.
  50
  51
  52 }\pageno=0\eject
  53
  54 \output{\shipout\vbox{ % for subsequent pages
  55     \baselineskip0pt\lineskip0pt
  56     \hbox to\hsize{\strut
  57       \ifodd\pageno \hfil\eightrm\firstmark\hfil
  58         \mainfont\the\pageno
  59       \else\mainfont\the\pageno\hfil
  60         \eightrm\firstmark\hfil\fi}
  61     \vskip 10pt
  62     \box255}
  63   \global\advance\pageno by 1}
  64 \let\runninghead=\mark
  65 \outer\def\section#1.{\noindent{\bf#1.}\quad
  66   \runninghead{\uppercase{#1} }\ignorespaces}
  67
  68 \section Introduction.
  69 People often think that their programs are ``debugged'' when large applications
  70 have been run successfully. But system programmers know that a typical large
  71 application tends to use at most about 50 per cent of the instructions
  72 in a typical compiler. Although the other half of the code---which tends
  73 to be the ``harder half''---might be riddled with errors, the system seems
  74 to be working quite impressively until an unusual case shows up on the
  75 next day. And on the following day another error manifests itself, and so on;
  76 months or years go by before certain parts of the compiler are even
  77 activated, much less tested in combination with other portions of the system,
  78 if user applications provide the only tests.
  79
  80 How then shall we go about testing a compiler? Ideally we would like to
  81 have a formal proof of correctness, certified by a computer.
  82 This would give us a lot of confidence,
  83 although of course the formal verification program might itself be incorrect.
  84 A more serious drawback of automatic verification is that the formal
  85 specifications of the compiler are likely to be wrong, since they aren't
  86 much easier to write than the compiler itself. Alternatively, we can
  87 substitute an informal proof of correctness: The programmer writes his or
  88 her code in a structured manner and checks that appropriate relations
  89 remain invariant, etc. This helps greatly to reduce errors, but it cannot
  90 be expected to remove them completely; the task of checking a large
  91 system is sufficiently formidable that human beings cannot do it without
  92 making at least a few slips here and there.
  93
  94 Thus, we have seen that test programs are unsatisfactory if they are simply
  95 large user applications; yet some sort of test program is needed because
  96 proofs of correctness aren't adequate either. People have proposed schemes
  97 for constructing test data automatically from a program text, but such
  98 approaches run the risk of circularity, since they cannot assume that a
  99 given program has the right structure.
 100
 101 I have been having good luck with a somewhat different approach,
 102 first used in 1960 to debug an {\mc ALGOL} compiler. The idea is to
 103 construct a test file that is about as different from a typical user
 104 application as could be imagined. Instead of testing things that people
 105 normally want to do, the file tests complicated things that people would
 106 never dare to think of, and it embeds these complexities in still
 107 more arcane constructions. Instead of trying to make the compiler do the
 108 right thing, the goal is to make it fail (until the bugs have all been found).
 109
 110 To write such a fiendish test routine, one simply gets into a nasty frame
 111 of mind and tries to do everything in the unexpected way. Parameters
 112 that are normally positive are set negative or zero; borderline cases
 113 are pushed to the limit; deliberate errors are made in hopes that the
 114 compiler will not be able to recover properly from them.
 115
 116 A user's application tends to exercise 50\%\ of a compiler's logic,
 117 but my first fiendish tests tend to improve this to about 90\%. As the
 118 next step I generally make use of frequency-counting software to identify
 119 the instructions that have still not been called upon. Then I add ever more
 120 fiendishness to the test routine, until more than 99\%\ of the code
 121 has been used at least once. (The remaining bits are things that
 122 can occur only if the source program is really huge, or if certain
 123 fatal errors are detected; or they are cases so similar to other well-tested
 124 things that there can be little doubt of their validity.)
 125
 126 Of course, this is not guaranteed to work. But my experience in 1960 was
 127 that only two bugs were ever found in that {\mc ALGOL} compiler after it
 128 correctly translated that original fiendish test. And one of those bugs
 129 was actually present in the results of the test; I simply had failed to
 130 notice that the output was incorrect. Similar experiences occurred later
 131 during the 60s and 70s, with respect to a few assemblers, compilers,
 132 and simulators that I wrote.
 133
 134 This method of debugging, combined with the methodology of structured
 135 programming and informal proofs (otherwise known as careful desk checking),
 136 leads to greater reliability of production software than any other
 137 method I know. Therefore I have used it in developing \MF84, and the
 138 main bulk of this report is simply a presentation of the test program
 139 that was used to get the bugs out of \MF.
 140
 141 Such a test file is useful also after a program has been debugged, since
 142 it can be used to give some assurance that subsequent modifications don't
 143 mess things up.
 144
 145 The test file is called \.{TRAP.MF}, because of my warped sense of humor:
 146 \MF's companion system, \TeX, has a similar test file called \.{TRIP}, and I
 147 couldn't help thinking about Billy Goat Gruff and the story of ``trip,
 148 trap, trip, trap.''
 149
 150 The contents of this test file are so remote from what people actually
 151 do with \MF, I feel apologetic if I have to explain the correct
 152 translation of \.{TRAP.MF}; nobody really cares about most of the
 153 nitty-gritty rules that are involved. Yet I believe \.{TRAP} exemplifies
 154 the sort of test program that has outstanding diagnostic ability, as
 155 explained above.
 156
 157 If somebody claims to have a correct implementation of \MF, I will not
 158 believe it until I see that \.{TRAP.MF} is translated properly.
 159 I propose, in fact, that a program must meet two criteria before it
 160 can justifiably be called \MF: (1)~The person who wrote it must be
 161 happy with the way it works at his or her installation; and (2)~the
 162 program must produce the correct results from \.{TRAP.MF}.
 163
 164 \MF\ is in the public domain, and its algorithms are published;
 165 I've done this since I do not want to discourage its use by placing
 166 proprietary restrictions on the software. However, I don't want
 167 faulty imitations to masquerade as \MF\ processors, since users
 168 want \MF\ to produce identical results on different machines.
 169 Hence I am planning to do whatever I can to suppress any systems that
 170 call themselves \MF\ without meeting conditions (1) and~(2).
 171 I have copyrighted the programs so that I have some chance to forbid
 172 unauthorized copies; I explicitly authorize copying of correct
 173 \MF\ implementations, and not of incorrect ones!
 174
 175 The remainder of this report consists of appendices, whose contents ought
 176 to be described briefly here:
 177
 178 Appendix A explains in detail how to carry out a test of \MF, given
 179 a tape that contains copies of the other appendices.
 180
 181 Appendix B is \.{TRAP.MF}, the fiendish test file that has already
 182 been mentioned. People who think that they understand \MF\ are challenged
 183 to see if they know what \MF\ is supposed to do with this file.
 184 People who know only a little about \MF\ might still find it
 185 interesting to study Appendix~B, just to get some insights into the
 186 methodology advocated here.
 187
 188 Appendix C is \.{TRAPIN.LOG}, a correct transcript file \.{TRAP.LOG}
 189 that results if \.{INIMF} is applied to \.{TRAP.MF}. (\.{INIMF} is
 190 the name of a version of \MF\ that does certain initializations;
 191 this run of \.{INIMF} also creates a binary base file called \.{TRAP.BASE}.)
 192
 193 Appendix D is a correct transcript file \.{TRAP.LOG} that results if
 194 \.{INIMF} or any other version of \MF\ is applied to \.{TRAP.MF}
 195 with base file \.{TRAP.BASE}.
 196
 197 Appendix E is \.{TRAP.TYP}, the symbolic version of a correct output
 198 file \.{TRAP.72270GF} that was produced at the same time as the \.{TRAP.LOG}
 199 file of Appendix~D.
 200
 201 Appendix F is \.{TRAP.PL}, the symbolic version of a correct output
 202 file \.{TRAP.TFM} that was produced at the same time as the \.{TRAP.LOG}
 203 file of Appendix~D.
 204
 205 Appendix G is \.{TRAP.FOT}, an abbreviated version of Appendix D that
 206 appears on the user's terminal during the run that produces \.{TRAP.LOG},
 207 \.{TRAP.72270GF}, and \.{TRAP.TFM}.
 208
 209 The debugging of \MF\ and the testing of the adequacy of \.{TRAP.MF}
 210 could not have been done nearly as well as reported here except for
 211 the magnificent software support provided by my colleague David R. Fuchs.
 212 In particular, he extended our local Pascal compiler so that
 213 frequency counting and a number of other important features were added
 214 to its online debugging abilities.
 215
 216 The method of testing advocated here has one chief difficulty that deserves
 217 comment: I had to verify by hand that \MF\ did the right things
 218 to \.{TRAP.MF}. This took many hours, and perhaps I have missed
 219 something (as I did in 1960); I must confess that I have not checked
 220 every single number in Appendices D, E, and~F. However, I'm willing to pay
 221 $\$$81.92 to the first finder of any remaining bug in \MF, and I will
 222 be surprised if that bug doesn't show up also in one of these appendices.
 223
 224 \vfill\eject
 225
 226 \section Appendix A: How to test \MF.
 227
 228 \item{0.} Let's assume that you have a tape containing \.{TRAP.MF},
 229 \.{TRAPIN.LOG}, \.{TRAP.LOG}, \.{TRAP.TYP}, \.{TRAP.PL}, and \.{TRAP.FOT},
 230 as in Appendices B, C, D, E, F, and~G. Furthermore, let's suppose that you
 231 have a working \.{WEB} system, and that you have working programs
 232 \.{TFtoPL} and \.{GFtype}, as described in the \TeX ware and \MF ware reports.
 233
 234 \item{1.} Prepare a version of \.{INIMF}. (This means that your \.{WEB}
 235 change file should have {\bf init} and {\bf tini} defined to be null.)
 236 The {\bf debug} and {\bf gubed} macros should be null, in order to
 237 activate special printouts that occur when $\\{tracingedges}>1.0$.
 238 The {\bf stat} and {\bf tats} macros should also be null, so that
 239 statistics are kept. Set \\{mem\_top} and \\{mem\_max} to 3000
 240 (or to \\{mem\_min} plus 3000, if \\{mem\_min} isn't zero),
 241 for purposes of this test version.
 242 Also set $\\{error\_line}=64$, $\\{half\_error\_line}=32$,
 243 $\\{max\_print\_line}=72$, $\\{screen\_width}=100$, and
 244 $\\{screen\_depth}=200$; these parameters affect many of the lines of
 245 the test output, so your job will be much easier if you use the same
 246 settings that were used to produce Appendix~E. Also (if possible) set
 247 $\\{gf\_buf\_size}=8$, since this tests more parts of the program.
 248 You probably should also use the ``normal'' settings of other parameters
 249 found in \.{MF.WEB} (e.g., $\\{max\_internal}=100$, $\\{buf\_size}=500$,
 250 etc.), since these show up in a few lines of the test output. Finally,
 251 change \MF's screen-display routines by putting the following simple lines
 252 in the change file:
 253 $$\vbox{\halign{\tt#\hfil\cr
 254 \char`\@x Screen routines:\cr
 255 begin init\char`\_screen:=false;\cr
 256 \char`\@y\cr
 257 begin init\char`\_screen:=true;
 258  \char`\{screen instructions will be logged\char`\}\cr
 259 \char`\@z\cr}}$$
 260 None of the other screen routines (\\{update\_screen}, \\{blank\_rectangle},
 261 \\{paint\_row}) should be changed in any way; the effect will be to have
 262 \MF's actions recorded in the transcript files instead of on the screen,
 263 in a machine-independent way.
 264
 265 \item{2.} Run the \.{INIMF} prepared in step 1. In response to the first
 266 `\.{**}' prompt, type carriage return (thus getting another `\.{**}').
 267 Then type `\.{\char`\\input trap}'. You should get an output that matches
 268 the file \.{TRAPIN.LOG} (Appendix~C). Don't be alarmed by the error
 269 messages that you see, unless they are different from those in Appendix~C.
 270
 271 \def\sp{{\char'40}}
 272 \item{3.} Run \.{INIMF} again. This time type `\.{\sp\&trap\sp\sp trap\sp}'.
 273 (The spaces in this input help to check certain parts of \MF\ that
 274 aren't otherwise used.) You should get outputs \.{TRAP.LOG}, \.{TRAP.72270GF},
 275 and \.{TRAP.TFM}.
 276 Furthermore, your terminal should receive output that matches \.{TRAP.FOT}
 277 (Appendix~G). During the middle part of this test, however, the terminal
 278 will not be getting output, because \.{batchmode} is being
 279 tested; don't worry if nothing seems to be happening for a while---nothing
 280 is supposed to.
 281
 282 \item{4.} Compare the \.{TRAP.LOG} file from step 3 with the ``master''
 283 \.{TRAP.LOG} file of step~0. (Let's hope you put that master file in a
 284 safe place so that it wouldn't be clobbered.) There should be perfect
 285 agreement between these files except in the following respects:
 286
 287 \itemitem{a)} The dates and possibly the file names will
 288 naturally be different.
 289
 290 \itemitem{b)} If you had different values for \\{stack\_size}, \\{buf\_size},
 291 etc., the corresponding capacity values will be different when they
 292 are printed out at the end.
 293
 294 \itemitem{c)} Help messages may be different; indeed, the author encourages
 295 non-English help messages in versions of \MF\ for people who don't
 296 understand English as well as some other language.
 297
 298 \itemitem{d)} The total number and length of strings at the end and/or
 299 ``still untouched'' may well be different.
 300
 301 \itemitem{e)} If your \MF\ uses a different memory allocation or
 302 packing scheme, the memory usage statistics may change.
 303
 304 \itemitem{f)} If you use a different storage allocation scheme, the
 305 capsule numbers will probably be different, but the order of variables
 306 should be unchanged when dependent variables are shown. \MF\ should also
 307 choose the same variables to be dependent.
 308
 309 \itemitem{g)} If your computer handles integer division of negative operands
 310 in a nonstandard way, you may get results that are rounded differently.
 311 Although \TeX\ is careful to be machine-independent in this regard,
 312 \MF\ is not, because integer divisions are present in so many places.
 313
 314 \item{5.} Use \.{GFtype} to convert your file \.{TRAP.72270GF} to a file
 315 \.{TRAP.TYP}. (Both of \.{GFtype}'s options, i.e., mnemonic output and image
 316 output, should be enabled so that you get the maximum amount of output.)
 317 The resulting file should agree with the master \.{TRAP.TYP} file of step~0,
 318 assuming that your \.{GFtype} has the ``normal'' values of compile-time
 319 constants ($\\{top\_pixel}=69$, etc.).
 320
 321 \item{6.} Use \.{TFtoPL} to convert your file \.{TRAP.TFM} to a file
 322 \.{TRAP.PL}. The resulting file should agree with the master \.{TRAP.PL}
 323 file of step~0.
 324
 325 \item{7.} You might also wish to test \.{TRAP} with other versions of
 326 \MF\ (i.e., \.{VIRMF} or a production version with another base file
 327 preloaded). It should work unless \MF's primitives have been redefined in
 328 the base file. However, this step isn't essential, since all the code of
 329 \.{VIRMF} appears in \.{INIMF}; you probably won't catch any more errors
 330 this way, unless they would already become obvious from normal use of
 331 the~system.
 332
 333 \vfill\eject
 334
 335 \section Appendix B: The \.{TRAP.MF} file.
 336 The contents of the test routine are prefixed here with line numbers, for
 337 ease in comparing this file with the error messages printed later; the
 338 line numbers aren't actually present.
 339 \runninghead{APPENDIX B: \.{TRAP.MF} (CONTINUED)}
 340
 341 \vskip 8pt
 342 \begingroup\count255=0
 343 \everypar{\global\advance\count255 by 1
 344   \hbox to 20pt{\sevenrm\hfil\the\count255\ \ }}
 345 \verbatim{trap.mf}
 346 \endgroup
 347 \vfill\eject
 348
 349 \section Appendix C: The \.{TRAPIN.LOG} file.
 350 When \.{INIMF} makes the \.{TRAP.BASE} file, it also creates a file called
 351 \.{TRAP.LOG} that looks like this.
 352 \runninghead{APPENDIX C: \.{TRAPIN.LOG} (CONTINUED)}
 353
 354 \vskip8pt
 355 \verbatim{trapin.log}
 356 \vfill\eject
 357
 358 \section Appendix D: The \.{TRAP.LOG} file.
 359 Here is the major output of the \.{TRAP} test; it is generated by running
 360 \.{INIMF} and loading \.{TRAP.BASE}, then reading \.{TRAP.MF}.
 361 \runninghead{APPENDIX D: \.{TRAP.LOG} (CONTINUED)}
 362
 363 {\let\tt=\eighttt\leftskip 1in\baselineskip 9pt plus .1pt minus .1pt
 364 \vskip8pt
 365 \verbatim{trap.log}
 366 }
 367 \vfill\eject
 368
 369 \section Appendix E: The \.{TRAP.TYP} file.
 370 Here is another major component of the test. It shows the output of \.{GFtype}
 371 applied to the file \.{TRAP.72270GF} that is created at the same time
 372 Appendix D was produced.
 373 \runninghead{APPENDIX E: \.{TRAP.TYP} (CONTINUED)}
 374
 375 {\let\tt=\eighttt\leftskip 1in\baselineskip 9pt plus .1pt minus .1pt
 376 \vskip8pt
 377 \verbatim{trap.typ}
 378 }
 379 \vfill\eject
 380
 381 \section Appendix F: The \.{TRAP.PL} file.
 382 In this case we have the output of \.{TFtoPL}
 383 applied to the file \.{TRAP.TFM} that is created at the same time
 384 Appendix D was produced.
 385 \runninghead{APPENDIX F: \.{TRAP.PL} (CONTINUED)}
 386
 387 {\let\tt=\eighttt\leftskip 1in\baselineskip 9pt plus .1pt minus .1pt
 388 \vskip8pt
 389 \verbatim{trap.pl}
 390 }
 391 \vfill\eject
 392
 393 \section Appendix G: The \.{TRAP.FOT} file.
 394 This shows what appeared on the terminal while Appendix D was being produced.
 395 \runninghead{APPENDIX G: \.{TRAP.FOT} (CONTINUED)}
 396
 397 \vskip8pt
 398 \verbatim{trap.fot}
 399
 400 \vfill\end