manual/luatex-modifications.tex

   1 \environment luatex-style
   2 \environment luatex-logos
   3
   4 \startcomponent luatex-modifications
   5
   6 \startchapter[reference=modifications,title={Modifications}]
   7
   8 \startsection[title=The merged engines]
   9
  10 \startsubsection[title=The need for change]
  11
  12 The first version of \LUATEX\ only had a few extra primitives and it was largely
  13 the same as \PDFTEX. Then we merged substantial parts of \ALEPH\ into the code
  14 and got more primitives. When things became more stable the decision was made to
  15 clean up the rather hybrid nature of the program. This means that some primitives
  16 have been promoted to core primitives, often with a different name, and that
  17 others were removed. This made it possible to start cleaning up the code base. We
  18 will describe most of these changes in the following paragraphs.
  19
  20 Besides the expected changes caused by new functionality, there are a number of
  21 not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
  22 (conflicting) feature, or, more often than not, a change necessary to clean up
  23 the internal interfaces. These will also be mentioned.
  24
  25 \stopsubsection
  26
  27 \startsubsection[title=Changes from \TEX\ 3.1415926]
  28
  29 Of course it all starts with traditional \TEX. Even if we started with \PDFTEX,
  30 most still comes from the original. But we deviate a bit.
  31
  32 \startitemize
  33
  34 \startitem
  35     The current code base is written in \CCODE, not \PASCAL. We use \CWEB\
  36     when possible.
  37 \stopitem
  38
  39 \startitem
  40     See \in {chapter} [languages] for many small changes related to paragraph
  41     building, language handling and hyphenation. The most important change is
  42     that adding a brace group in the middle of a word (like in \type {of{}fice})
  43     does not prevent ligature creation.
  44 \stopitem
  45
  46 \startitem
  47     There is no pool file, all strings are embedded during compilation.
  48 \stopitem
  49
  50 \startitem
  51     The specifier \type {plus 1 fillll} does not generate an error. The extra
  52     \quote{l} is simply typeset.
  53 \stopitem
  54
  55 \startitem
  56     The upper limit to \type {\endlinechar} and \type {\newlinechar} is 127.
  57 \stopitem
  58
  59 \startitem
  60     The hz optimization code has been partially redone so that we no longer need
  61     to create extra font instances. The front- and backend have been decoupled and
  62     more efficient (\PDF) code is generated.
  63 \stopitem
  64
  65 \stopitemize
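
     Two of the changes in this list are easy to observe. The following is a minimal
     sketch, assuming a plain format and a traditional font that has the usual
     f|-|ligatures; the \type {\hbox} wrappers and the 5cm width are only there for
     illustration.

     \starttyping
     % the empty group no longer blocks the ligature, so both boxes look the same
     \hbox{office}
     \hbox{of{}fice}

     % no error is raised: the fourth "l" is left in the input and typeset as text
     \hbox to 5cm{left\hskip 0pt plus 1 fillll right}

     \bye
     \stoptyping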
  66
  67 \stopsubsection
  68
  69 \startsubsection[title=Changes from \ETEX\ 2.2]
  70
  71 As \ETEX\ is the de facto standard extension, we of course provide its
  72 functionality, but with a few small adaptations.
  73
  74 \startitemize
  75
  76 \startitem
  77     The \ETEX\ functionality is always present and enabled so the prepended
  78     asterisk or \type {-etex} switch for \INITEX\ is not needed.
  79 \stopitem
  80
  81 \startitem
  82     The \TEXXET\ extension is not present, so the primitives \type
  83     {\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type
  84     {\endL} are missing.
  85 \stopitem
  86
  87 \startitem
  88     Some of the tracing information that is output by \ETEX's \type
  89     {\tracingassigns} and \type {\tracingrestores} is not there.
  90 \stopitem
  91
  92 \startitem
  93     Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum register
  94     number is 65535 and the implementation uses a flat array instead of the mixed
  95     flat|\&|sparse model from \ETEX.
  96 \stopitem
  97
  98 \startitem
  99     The \type {\savinghyphcodes} command is a no|-|op. \in {Chapter} [languages]
 100     explains why.
 101 \stopitem
 102
 103 \startitem
 104     When kpathsea is used to find files, \LUATEX\ uses the \type {ofm} file
 105     format to search for font metrics. In turn, this means that \LUATEX\ looks at
 106     the \type {OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead
 107     of \type {TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts
 108     (\LUATEX\ uses the variable \type {OVFFONTS} instead of \type {VFFONTS}).
 109 \stopitem
 110
 111 \stopitemize
 112
 113 \stopsubsection
 114
 115 \startsubsection[title=Changes from \PDFTEX\ 1.40]
 116
 117 Because we want to produce \PDF\ the most natural starting point was the popular
 118 \PDFTEX\ program. We inherited the stable features, dropped most of the
 119 experimental code and promoted some functionality to core \LUATEX\ functionality,
 120 which in turn triggered the renaming of primitives.
 121
 122 \startitemize
 123
 124 \startitem
 125     The (experimental) support for snap nodes has been removed, because it is
 126     much more natural to build this functionality on top of node processing and
 127     attributes. The associated primitives that are now gone are: \type
 128     {\pdfsnaprefpoint}, \type {\pdfsnapy}, and \type {\pdfsnapycomp}.
 129 \stopitem
 130
 131 \startitem
 132     The (experimental) support for specialized spacing around nodes has also been
 133     removed. The associated primitives that are now gone are: \type
 134     {\pdfadjustinterwordglue}, \type {\pdfprependkern}, and \type {\pdfappendkern}, as
 135     well as the five supporting primitives \type {\knbscode}, \type {\stbscode}, \type
 136     {\shbscode}, \type {\knbccode}, and \type {\knaccode}.
 137 \stopitem
 138
 139 \startitem
 140     A number of \quote {pdftex primitives} have been removed as they can be
 141     implemented using \LUA:
 142
 143     \start \raggedright
 144     \type {\pdfelapsedtime}, \type {\pdfescapehex}, \type {\pdfescapename}, \type
 145     {\pdfescapestring}, \type {\pdffiledump}, \type {\pdffilemoddate}, \type
 146     {\pdffilesize}, \type {\pdfforcepagebox}, \type {\pdflastmatch}, \type
 147     {\pdfmatch}, \type {\pdfmdfivesum}, \type {\pdfmovechars}, \type
 148     {\pdfoptionalwaysusepdfpagebox}, \type {\pdfoptionpdfinclusionerrorlevel},
 149     \type {\pdfresettimer}, \type {\pdfshellescape}, \type {\pdfstrcmp} and \type
 150     {\pdfunescapehex}
 151     \par \stop
 152 \stopitem
 153
 154 \startitem
 155     The version related primitives \type {\pdftexbanner}, \type {\pdftexversion}
 156     and \type {\pdftexrevision} are no longer present as there is no longer a
 157     strict relationship with \PDFTEX\ development.
 158 \stopitem
 159
 160 \startitem
 161     The experimental snapper mechanism has been removed and therefore also the
 162     primitives:
 163
 164     \start \raggedright
 165     \type {\pdfignoreddimen}, \type {\pdffirstlineheight}, \type
 166     {\pdfeachlineheight}, \type {\pdfeachlinedepth} and \type
 167     {\pdflastlinedepth}
 168     \par \stop
 169 \stopitem
 170
 171 \startitem
 172     The experimental primitives \type {\primitive}, \type {\ifprimitive}, \type
 173     {\ifabsnum} and \type {\ifabsdim} are promoted to core primitives. The \type
 174     {\pdf*} prefixed originals are not available.
 175 \stopitem
 176
 177 \startitem
 178     The \PNG\ transparency fix from 1.40.6 is not applied as high|-|level
 179     support is pending.
 180 \stopitem
 181
 182 \startitem
 183     Two extra token lists are provided, \type {\pdfxformresources} and \type
 184     {\pdfxformattr}, as an alternative to \type {\pdfxform} keywords.
 185 \stopitem
 186
 187 \startitem
 188     The current version of \LUATEX\ no longer replaces and|/|or merges fonts in
 189     embedded \PDF\ files with fonts of the enveloping \PDF\ document. This
 190     regression may be temporary, depending on what the rewritten font backend will
 191     look like.
 192 \stopitem
 193
 194 \startitem
 195     The primitives \type {\pdfpagewidth} and \type {\pdfpageheight} have been removed
 196     because \type {\pagewidth} and \type {\pageheight} have that purpose.
 197 \stopitem
 198
 199 \startitem
 200     The primitives \type {\pdfnormaldeviate}, \type {\pdfuniformdeviate}, \type
 201     {\pdfsetrandomseed} and \type {\pdfrandomseed} have been promoted to core
 202     primitives without \type {pdf} prefix so the original commands are no longer
 203     recognized.
 204 \stopitem
 205
 206 \startitem
 207     The primitives \type {\ifincsname}, \type {\expanded} and \type {\quitvmode} are now
 208     core primitives.
 209 \stopitem
 210
 211 \startitem
 212     As the hz and protrusion mechanisms are part of the core, the related
 213     primitives \type {\lpcode}, \type {\rpcode}, \type {\efcode}, \type
 214     {\leftmarginkern} and \type {\rightmarginkern} are promoted to core primitives.
 215     The two commands \type {\protrudechars} and \type {\adjustspacing} replace their
 216     \type {pdf} prefixed originals.
 217 \stopitem
 218
 219 \startitem
 220     The \type {\tagcode} primitive is promoted to core primitive.
 221 \stopitem
 222
 223 \startitem
 224     The \type {\letterspacefont} feature is now part of the core but will not be
 225     changed (improved). We just provide it for legacy use.
 226 \stopitem
 227
 228 \startitem
 229     The \type {\pdfnoligatures} primitive is now \type {\ignoreligaturesinfont}.
 230 \stopitem
 231
 232 \startitem
 233     The \type {\pdffontexpand} primitive is now \type {\expandglyphsinfont}.
 234 \stopitem
 235
 236 \startitem
 237     Because position tracking is also available in \DVI\ mode the
 238     \type {\savepos}, \type {\lastxpos} and \type {\lastypos} commands now
 239     replace their \type {pdf} prefixed originals.
 240 \stopitem
 241
 242 \startitem
 243     Candidates for removal are \type {\pdfcolorstackinit} and \type
 244     {\pdfcolorstack}.
 245 \stopitem
 246
 247 \startitem
 248     Candidates for replacement are \type {\pdfoutput} (\type {\outputmode}) and
 249     \type {\pdfmatrix} (something with a normal syntax).
 250 \stopitem
 251
 252 \stopitemize
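
     Macro packages that still rely on the old names can bridge most of the renames
     and removals above themselves. The following is a sketch, not something the
     engine or any format is known to ship: it assumes a format in which the \ETEX\
     and new core primitives are enabled, and the stand|-|in for \type {\pdffilesize}
     is only one possible \LUA\ implementation on top of the \type {lfs} library.

     \starttyping
     % renamed primitives: alias the old name to the new one when it is missing
     \ifdefined\pdfsavepos \else
         \let\pdfsavepos   \savepos
         \let\pdflastxpos  \lastxpos
         \let\pdflastypos  \lastypos
         \let\pdfpagewidth \pagewidth
         \let\pdfpageheight\pageheight
     \fi

     % removed primitive: an expandable stand-in that asks Lua for the file size
     % (no Lua comments in here, as the whole \directlua body is read as one line)
     \ifdefined\pdffilesize \else
         \def\pdffilesize#1{\directlua{
             local size = lfs.attributes("\luaescapestring{#1}", "size")
             if size then tex.write(tostring(size)) end
         }}
     \fi
     \stoptyping

     Like the original, this stand|-|in expands to nothing when the file cannot be
     found.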
 253
 254 \stopsubsection
 255
 256 \startsubsection[title=Changes from \ALEPH\ RC4]
 257
 258 Because we wanted proper directional typesetting the \ALEPH\ mechanisms looked
 259 most attractive. These are rather close to the ones provided by \OMEGA, so what
 260 we say next applies to both these programs.
 261
 262 \startitemize
 263
 264 \startitem
 265     The extended 16-bit math primitives (\type {\omathcode} etc.) have been
 266     removed.
 267 \stopitem
 268
 269 \startitem
 270     The \OCP\ processing is no longer supported at all. As a consequence, the
 271     following primitives have been removed:
 272
 273     \start \raggedright
 274     \type {\ocp}, \type {\externalocp}, \type {\ocplist}, \type {\pushocplist},
 275     \type {\popocplist}, \type {\clearocplists}, \type {\addbeforeocplist}, \type
 276     {\addafterocplist}, \type {\removebeforeocplist}, \type {\removeafterocplist}
 277     and \type {\ocptracelevel}
 278     \par \stop
 279 \stopitem
 280
 281 \startitem
 282     \LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH: \type
 283     {TLT} (latin), \type {TRT} (arabic), \type {RTT} (cjk), \type {LTL}
 284     (mongolian). All other direction specifiers generate an error.
 285 \stopitem
 286
 287 \startitem
 288     The input translations from \ALEPH\ are not implemented, the related
 289     primitives are not available:
 290
 291     \start \raggedright
 292     \type {\DefaultInputMode}, \type {\noDefaultInputMode}, \type {\noInputMode},
 293     \type {\InputMode}, \type {\DefaultOutputMode}, \type {\noDefaultOutputMode},
 294     \type {\noOutputMode}, \type {\OutputMode}, \type {\DefaultInputTranslation},
 295     \type {\noDefaultInputTranslation}, \type {\noInputTranslation}, \type
 296     {\InputTranslation}, \type {\DefaultOutputTranslation}, \type
 297     {\noDefaultOutputTranslation}, \type {\noOutputTranslation} and \type
 298     {\OutputTranslation}
 299     \par \stop
 300 \stopitem
 301
 302 \startitem
 303     Several bugs have been fixed. The \type {\hoffset} bug when \type {\pagedir TRT}
 304     is gone, removing the need for an explicit fix to \type {\hoffset}. Also, the bug
 305     causing \type {\fam} to fail for family numbers above 15 is fixed. A fair number
 306     of other minor bugs are fixed as well, most of these related to \type
 307     {\tracingcommands} output.
 308 \stopitem
 309
 310 \startitem
 311     The scanner for direction specifications now allows an optional space after
 312     the direction is completely parsed.
 313 \stopitem
 314
 315 \startitem
 316     The \type {^^} notation can come in five and six item repetitions also, to
 317     insert characters that do not fit in the BMP.
 318 \stopitem
 319
 320 \startitem
 321     Glues {\it immediately after} direction change commands are not legal
 322     breakpoints.
 323 \stopitem
 324
 325 \startitem
 326     Several mechanisms that need to be right|-|to|-|left aware have been
 327     improved. For instance placement of formula numbers.
 328 \stopitem
 329
 330 \startitem
 331     The page dimension related primitives \type {\pagewidth} and \type {\pageheight} have
 332     been promoted to core primitives.
 333 \stopitem
 334
 335 \startitem
 336     The primitives \type {\charwd}, \type {\charht}, \type {\chardp} and \type {\charit}
 337     have been removed as we have the \ETEX\ variants \type {\fontchar*}.
 338 \stopitem
 339
 340 \startitem
 341     The two dimension registers \type {\pagerightoffset} and \type
 342     {\pagebottomoffset} are now core primitives.
 343 \stopitem
 344
 345 \startitem
 346     The direction related primitives \type {\pagedir}, \type {\bodydir}, \type
 347     {\pardir}, \type {\textdir}, \type {\mathdir} and \type {\boxdir} are now
 348     core primitives.
 349 \stopitem
 350
 351 \startitem
 352     The promotion of primitives to core primitives as well as the removal of all
 353     others means that the initialization namespace \type {aleph} is gone.
 354 \stopitem
 355
 356 \stopitemize
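
     As an illustration of the direction primitives that remain, here is a minimal
     sketch that sets a right|-|to|-|left paragraph using two of the four supported
     specifiers. The text is a placeholder and a suitable font is assumed.

     \starttyping
     \pagedir TLT \bodydir TLT   % page and body stay left-to-right
     \pardir TRT \textdir TRT    % the next paragraph runs right-to-left
     some right-to-left text
     \par
     \stoptyping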
 357
 358 \stopsubsection
 359
 360 \startsubsection[title=Changes from standard \WEBC]
 361
 362 The compilation framework is \WEBC\ and we keep using that but without the
 363 \PASCAL\ to \CCODE\ step. This framework also provides some common features that
 364 deal with reading bytes from files and locating files in \TDS. This is what we do
 365 differently:
 366
 367 \startitemize
 368
 369 \startitem
 370     There is no mltex support.
 371 \stopitem
 372
 373 \startitem
 374     There is no enctex support.
 375 \stopitem
 376
 377 \startitem
 378     The following commandline switches are silently ignored, even in non|-|\LUA\
 379     mode: \type {-8bit}, \type {-translate-file}, \type {-mltex}, \type {-enc}
 380     and \type {-etex}.
 381 \stopitem
 382
 383 \startitem
 384     The \type {\openout} whatsits are not written to the log file.
 385 \stopitem
 386
 387 \startitem
 388     Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\
 389     mode because \type {texmf.cnf} is not read: \type {shell-escape} is off (but
 390     that is not a problem because of \LUA's \type {os.execute}), and the paranoia
 391     checks on \type {openin} and \type {openout} do not happen (however, it is
 392     easy for a \LUA\ script to do this itself by overloading \type {io.open}).
 393 \stopitem
 394
 395 \startitem
 396     The \quote{E} option does not do anything useful.
 397 \stopitem
 398
 399 \stopitemize
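
     The remark about \type {io.open} can be made concrete. The following is a sketch
     of such an overload, not the check that \KPSE\ mode performs: the policy (refuse
     to open absolute paths, or paths containing \type {..}, for writing) and the name
     \type {original_open} are assumptions made for this example.

     \starttyping
     % no Lua comments inside, as the whole \directlua body is read as one line
     \directlua {
         local original_open = io.open
         io.open = function(name, mode)
             local writing = string.find(mode or "r", "[wa+]")
             local unsafe = string.find(name, "^/")
                 or string.find(name, "..", 1, true)
             if writing and unsafe then
                 return nil, "refused to open '" .. name .. "' for writing"
             end
             return original_open(name, mode)
         end
     }
     \stoptyping

     Code that needs the unrestricted function has to keep its own reference to it
     before the overload is installed.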
 400
 401 \stopsubsection
 402
 403 \stopsection
 404
 405 \startsection[title=Implementation notes]
 406
 407 \startsubsection[title=Memory allocation]
 408
 409 The single internal memory heap that traditional \TEX\ used for tokens and nodes
 410 is split into two separate arrays. Each of these will grow dynamically when
 411 needed.
 412
 413 The \type {texmf.cnf} settings related to main memory are no longer used (these
 414 are: \type {main_memory}, \type {mem_bot}, \type {extra_mem_top} and \type
 415 {extra_mem_bot}). \quote {Out of main memory} errors can still occur, but the
 416 limiting factor is now the amount of RAM in your system, not a predefined limit.
 417
 418 Also, the memory (de)allocation routines for nodes are completely rewritten. The
 419 relevant code now lives in the C file \type {texnode.c}, and basically uses a
 420 dozen or so \quote {avail} lists instead of a doubly|-|linked model. An extra
 421 function layer is added so that the code can ask for nodes by type instead of
 422 directly requisitioning a certain amount of memory words.
 423
 424 Because of the split into two arrays and the resulting differences in the data
 425 structures, some of the macros have been duplicated. For instance, there are now
 426 \type {vlink} and \type {vinfo} as well as \type {token_link} and \type
 427 {token_info}. All access to the variable memory array is now hidden behind a
 428 macro called \type {vmem}.
 429
 430 The implementation of the growth of the two arrays (via reallocation) introduces a
 431 potential pitfall: the memory arrays should never be used as the left|-|hand side
 432 of a statement that can modify (and thereby move) the array in question.
 433
 434 The input line buffer and string pool are now also reallocated when needed, and the
 435 \type {texmf.cnf} settings \type {buf_size} and \type {pool_size} are silently
 436 ignored.
 437
 438 \stopsubsection
 439
 440 \startsubsection[title=Sparse arrays]
 441
 442 The \type {\mathcode}, \type {\delcode}, \type {\catcode}, \type {\sfcode}, \type {\lccode}
 443 and \type {\uccode} tables are now sparse arrays that are implemented in~\CCODE.
 444 They are no longer part of the \TEX\ \quote {equivalence table} and because each
 445 had 1.1 million entries with a few memory words each, this makes a major
 446 difference in memory usage.
 447
 448 The \type {\catcode}, \type {\sfcode}, \type {\lccode} and \type {\uccode} assignments do
 449 not yet show up when using the etex tracing routines \type {\tracingassigns} and
 450 \type {\tracingrestores} (code simply not written yet).
 451
 452 A side|-|effect of the current implementation is that \type {\global} assignments
 453 are now more expensive in terms of processing than non|-|global ones.
 454
 455 See \type {mathcodes.c} and \type {textcodes.c} if you are interested in the
 456 details.
 457
 458 Also, the glyph ids within a font are now managed by means of a sparse array and
 459 glyph ids can go up to index $2^{21}-1$.
 460
 461 \stopsubsection
 462
 463 \startsubsection[title=Simple single-character csnames]
 464
 465 Single|-|character commands are no longer treated specially in the internals,
 466 they are stored in the hash just like the multiletter csnames.
 467
 468 The code that displays control sequences explicitly checks if the length is one
 469 when it has to decide whether or not to add a trailing space.
 470
 471 Active characters are internally implemented as a special type of multi|-|letter
 472 control sequences that uses a prefix that is otherwise impossible to obtain.
 473
 474 \stopsubsection
 475
 476 \startsubsection[title=Compressed format]
 477
 478 The format is passed through zlib, allowing it to shrink to roughly half of the
 479 size it would have had in uncompressed form. This takes a few more \CPU\ cycles
 480 but much less disk \IO, so it should still be faster.
 481
 482 \stopsubsection
 483
 484 \startsubsection[title=Binary file reading]
 485
 486 All of the internal code is changed in such a way that if one of the \type
 487 {read_xxx_file} callbacks is not set, then the file is read by a C function using
 488 basically the same convention as the callback: a single read into a buffer big
 489 enough to hold the entire file contents. While this uses more memory than the
 490 previous code (that mostly used \type {getc} calls), it can be quite a bit faster
 491 (depending on your I/O subsystem).
 492
 493 \stopsubsection
 494
 495 \stopsection
 496
 497 \stopchapter
 498
 499 \stopcomponent