\environment luatex-style
\environment luatex-logos

\startcomponent luatex-modifications

\startchapter[reference=modifications,title={Modifications}]

\startsection[title=The merged engines]

\startsubsection[title=The need for change]

The first version of \LUATEX\ only had a few extra primitives and it was largely
the same as \PDFTEX. Then we merged substantial parts of \ALEPH\ into the code
and got more primitives. When the program became more stable, the decision was
made to clean up its rather hybrid nature. This means that some primitives have
been promoted to core primitives, often with a different name, and that others
were removed. This made it possible to start cleaning up the code base. We will
describe most of these changes in the following paragraphs.

Besides the expected changes caused by new functionality, there are a number of
not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
(conflicting) feature, or, more often than not, a change necessary to clean up
the internal interfaces. These will also be mentioned.

\stopsubsection

\startsubsection[title=Changes from \TEX\ 3.1415926]

Of course it all starts with traditional \TEX. Even if we started with \PDFTEX,
most still comes from the original. But we deviate a bit.

The current code base is written in \CCODE, not \PASCAL. We use \CWEB\

See \in {chapter} [languages] for many small changes related to paragraph
building, language handling and hyphenation. The most important change is
that adding a brace group in the middle of a word (like in \type {of{}fice})
does not prevent ligature creation.
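
As a minimal sketch of the brace group behaviour just described (assuming the
current font has an \type {ffi} ligature), both words below should come out with
the same ligature in \LUATEX, whereas in traditional \TEX\ the empty group
interrupts ligature building:

\starttyping
office \quad of{}fice
\stoptyping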

There is no pool file; all strings are embedded during compilation.

The specifier \type {plus 1 fillll} does not generate an error. The extra
\quote{l} is simply typeset.

The upper limit to \type {\endlinechar} and \type {\newlinechar} is 127.

The hz optimization code has been partially redone so that we no longer need
to create extra font instances. The front- and backend have been decoupled and
more efficient (\PDF) code is generated.

\stopsubsection

\startsubsection[title=Changes from \ETEX\ 2.2]

Because \ETEX\ is the de facto standard extension, we of course provide its
functionality, but with a few small adaptations.

The \ETEX\ functionality is always present and enabled so the prepended
asterisk or \type {-etex} switch for \INITEX\ is not needed.

The \TEXXET\ extension is not present, so the primitives \type
{\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type

Some of the tracing information that is output by \ETEX's \type
{\tracingassigns} and \type {\tracingrestores} is not there.

Register management in \LUATEX\ uses the \ALEPH\ model, so the highest register
number is 65535 and the implementation uses a flat array instead of the mixed
flat|\&|sparse model from \ETEX.
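
A small sketch of what the flat register array allows. The register number is
chosen only to show the upper bound mentioned above, and the values are
arbitrary:

\starttyping
\count65535=42
\dimen65535=10pt
\message{\the\count65535, \the\dimen65535}
\stoptyping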

The \type {\savinghyphcodes} command is a no|-|op. \in {Chapter} [languages]

When kpathsea is used to find files, \LUATEX\ uses the \type {ofm} file
format to search for font metrics. In turn, this means that \LUATEX\ looks at
the \type {OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead
of \type {TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts
(\LUATEX\ uses the variable \type {OVFFONTS} instead of \type {VFFONTS}).
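
To check which search path is in effect, the \type {kpse} library can be queried
from \LUA. The following is only a sketch: it assumes that \LUATEX\ runs with
kpathsea initialized, and the font name \type {cmr10} is just an example.

\starttyping
\directlua{
  texio.write_nl("OFMFONTS = " .. tostring(kpse.var_value("OFMFONTS")))
  texio.write_nl("metrics  = " .. tostring(kpse.find_file("cmr10", "ofm")))
}
\stoptyping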

\stopsubsection

\startsubsection[title=Changes from \PDFTEX\ 1.40]

Because we want to produce \PDF\ the most natural starting point was the popular
\PDFTEX\ program. We inherited the stable features, dropped most of the
experimental code and promoted some functionality to core \LUATEX\ functionality,
which in turn triggered the renaming of primitives.

The (experimental) support for snap nodes has been removed, because it is
much more natural to build this functionality on top of node processing and
attributes. The associated primitives that are now gone are:
\type {\pdfsnaprefpoint}, \type {\pdfsnapy}, and \type {\pdfsnapycomp}.

The (experimental) support for specialized spacing around nodes has also been
removed. The associated primitives that are now gone are:
\type {\pdfadjustinterwordglue}, \type {\pdfprependkern}, and
\type {\pdfappendkern}, as well as the five supporting primitives
\type {\knbscode}, \type {\stbscode}, \type {\shbscode}, \type {\knbccode}, and
\type {\knaccode}.

A number of \quote {pdftex primitives} have been removed as they can be
implemented using \LUA:

\type {\pdfelapsedtime}, \type {\pdfescapehex}, \type {\pdfescapename},
\type {\pdfescapestring}, \type {\pdffiledump}, \type {\pdffilemoddate},
\type {\pdffilesize}, \type {\pdfforcepagebox}, \type {\pdflastmatch},
\type {\pdfmatch}, \type {\pdfmdfivesum}, \type {\pdfmovechars},
\type {\pdfoptionalwaysusepdfpagebox}, \type {\pdfoptionpdfinclusionerrorlevel},
\type {\pdfresettimer}, \type {\pdfshellescape}, \type {\pdfstrcmp} and \type

The version related primitives \type {\pdftexbanner}, \type {\pdftexversion}
and \type {\pdftexrevision} are no longer present as there is no longer a
strict relationship with \PDFTEX\ development.
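
Several of the removed helpers listed earlier, such as \type {\pdfstrcmp} and
\type {\pdffilesize}, can be approximated with a few lines of \LUA. The following
is only a sketch: the macro names \type {\strcmp} and \type {\filesize} are
hypothetical names chosen for this example, and the comparison follows \LUA\
string ordering, which need not match \PDFTEX\ in every corner case.

\starttyping
% \strcmp{a}{b} expands to -1, 0 or 1, roughly like \pdfstrcmp
\def\strcmp#1#2{\directlua{
  local a = "\luaescapestring{#1}"
  local b = "\luaescapestring{#2}"
  tex.sprint(a == b and "0" or (a < b and "-1" or "1"))
}}

% \filesize{name} expands to the size in bytes, or to nothing
% when the file cannot be found
\def\filesize#1{\directlua{
  local size = lfs.attributes("\luaescapestring{#1}", "size")
  tex.sprint(size and tostring(size) or "")
}}
\stoptyping

Both macros are expandable, so they can be used inside \type {\ifnum} tests or
\type {\edef}, which is how the original primitives were typically used.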

The experimental snapper mechanism has been removed and therefore also the

\type {\pdfignoreddimen}, \type {\pdffirstlineheight},
\type {\pdfeachlineheight}, \type {\pdfeachlinedepth} and \type

The experimental primitives \type {\primitive}, \type {\ifprimitive},
\type {\ifabsnum} and \type {\ifabsdim} are promoted to core primitives. The
\type {\pdf*} prefixed originals are not available.
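
The two promoted conditionals compare absolute values, so the sign of either
operand is ignored. A minimal sketch with arbitrary numbers:

\starttyping
% |-5| = 5 is larger than 3, so the true branch is taken
\ifabsnum -5 > 3 larger\else not larger\fi

% |-2pt| = 2pt is not smaller than 1pt, so the false branch is taken
\ifabsdim -2pt < 1pt smaller\else not smaller\fi
\stoptyping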

The \PNG\ transparency fix from 1.40.6 is not applied as high|-|level

Two extra token lists are provided, \type {\pdfxformresources} and
\type {\pdfxformattr}, as an alternative to \type {\pdfxform} keywords.

The current version of \LUATEX\ no longer replaces and|/|or merges fonts in
embedded \PDF\ files with fonts of the enveloping \PDF\ document. This
regression may be temporary, depending on how the rewritten font backend will

The primitives \type {\pdfpagewidth} and \type {\pdfpageheight} have been removed
because \type {\pagewidth} and \type {\pageheight} have that purpose.
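
In practice this is a plain rename. A sketch with A4 dimensions as example
values:

\starttyping
% pdfTeX:  \pdfpagewidth=210mm  \pdfpageheight=297mm
\pagewidth=210mm
\pageheight=297mm
\stoptyping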

The primitives \type {\pdfnormaldeviate}, \type {\pdfuniformdeviate},
\type {\pdfsetrandomseed} and \type {\pdfrandomseed} have been promoted to core
primitives without \type {pdf} prefix so the original commands are no longer

The primitives \type {\ifincsname}, \type {\expanded} and \type {\quitvmode} are now

As the hz and protrusion mechanisms are part of the core, the related
primitives \type {\lpcode}, \type {\rpcode}, \type {\efcode},
\type {\leftmarginkern} and \type {\rightmarginkern} are promoted to core
primitives. The two commands \type {\protrudechars} and \type {\adjustspacing}
replace their \type {\pdf} prefixed originals.
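
A sketch of how a \PDFTEX\ setup for protrusion and expansion translates. The
values are arbitrary and only show which names change and which stay the same:

\starttyping
% pdfTeX:  \pdfprotrudechars=2  \pdfadjustspacing=2
\protrudechars=2
\adjustspacing=2

% the per-character codes keep their names
\rpcode\font`\,=700
\stoptyping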

The \type {\tagcode} primitive is promoted to core primitive.

The \type {\letterspacefont} feature is now part of the core but will not be
changed (improved). We just provide it for legacy use.

The \type {\pdfnoligatures} primitive is now \type {\ignoreligaturesinfont}.

The \type {\pdffontexpand} primitive is now \type {\expandglyphsinfont}.
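
The two renames can be sketched as follows. This assumes that the argument
syntax carried over unchanged from \PDFTEX; the font names and the expansion
values are only examples:

\starttyping
\font\mono=cmtt10
\ignoreligaturesinfont\mono        % was: \pdfnoligatures\mono

\font\body=cmr10
\expandglyphsinfont\body 20 20 5   % was: \pdffontexpand\body 20 20 5
\stoptyping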

Because position tracking is also available in \DVI\ mode the
\type {\savepos}, \type {\lastxpos} and \type {\lastypos} commands now
replace their \type {pdf} prefixed originals.
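
A minimal sketch of the usual idiom with the new names. The position is only
known when the page is shipped out, which is why the values are read inside a
(delayed) \type {\write}:

\starttyping
% pdfTeX:  \pdfsavepos ... \pdflastxpos ... \pdflastypos
Some text\savepos
\write16{position: \the\lastxpos sp, \the\lastypos sp}
\stoptyping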

Candidates for removal are \type {\pdfcolorstackinit} and \type

Candidates for replacement are \type {\pdfoutput} (\type {\outputmode}) and
\type {\pdfmatrix} (something with a normal syntax).

\stopsubsection

\startsubsection[title=Changes from \ALEPH\ RC4]

Because we wanted proper directional typesetting the \ALEPH\ mechanisms looked
most attractive. These are rather close to the ones provided by \OMEGA, so what
we say next applies to both these programs.

The extended 16-bit math primitives (\type {\omathcode} etc.) have been

The \OCP\ processing is no longer supported at all. As a consequence, the
following primitives have been removed:

\type {\ocp}, \type {\externalocp}, \type {\ocplist}, \type {\pushocplist},
\type {\popocplist}, \type {\clearocplists}, \type {\addbeforeocplist},
\type {\addafterocplist}, \type {\removebeforeocplist},
\type {\removeafterocplist} and \type {\ocptracelevel}.

\LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH:
\type {TLT} (latin), \type {TRT} (arabic), \type {RTT} (cjk), \type {LTL}
(mongolian). All other direction specifiers generate an error.
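
A short sketch that uses two of the four supported specifiers. The text itself
is only a placeholder:

\starttyping
{\pardir TRT \textdir TRT This paragraph is set right to left.\par}
{\pardir TLT \textdir TLT This paragraph is set left to right.\par}
\stoptyping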

The input translations from \ALEPH\ are not implemented, the related
primitives are not available:

\type {\DefaultInputMode}, \type {\noDefaultInputMode}, \type {\noInputMode},
\type {\InputMode}, \type {\DefaultOutputMode}, \type {\noDefaultOutputMode},
\type {\noOutputMode}, \type {\OutputMode}, \type {\DefaultInputTranslation},
\type {\noDefaultInputTranslation}, \type {\noInputTranslation},
\type {\InputTranslation}, \type {\DefaultOutputTranslation},
\type {\noDefaultOutputTranslation}, \type {\noOutputTranslation} and \type

Several bugs have been fixed. The \type {\hoffset} bug when \type {\pagedir TRT}
is gone, removing the need for an explicit fix to \type {\hoffset}. Also, the bug
causing \type {\fam} to fail for family numbers above 15 is fixed. A fair number
of other minor bugs are fixed as well, most of these related to
\type {\tracingcommands} output.

The scanner for direction specifications now allows an optional space after
the direction is completely parsed.

The \type {^^} notation can come in five and six item repetitions also, to
insert characters that do not fit in the BMP.

Glues {\it immediately after} direction change commands are not legal

Several mechanisms that need to be right|-|to|-|left aware have been
improved. For instance placement of formula numbers.

The page dimension related primitives \type {\pagewidth} and
\type {\pageheight} have been promoted to core primitives.

The primitives \type {\charwd}, \type {\charht}, \type {\chardp} and
\type {\charit} have been removed as we have the \ETEX\ variants
\type {\fontchar*}.
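
The \ETEX\ variants take a font and a character code. A minimal sketch that
reports the four dimensions of the letter A in the current font:

\starttyping
% Aleph:  \charwd, \charht, \chardp, \charit
\message{A: \the\fontcharwd\font`A, \the\fontcharht\font`A,
            \the\fontchardp\font`A, \the\fontcharic\font`A}
\stoptyping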

The two dimension registers \type {\pagerightoffset} and
\type {\pagebottomoffset} are now core primitives.

The direction related primitives \type {\pagedir}, \type {\bodydir},
\type {\pardir}, \type {\textdir}, \type {\mathdir} and \type {\boxdir} are now

The promotion of primitives to core primitives as well as the removal of all
others means that the initialization namespace \type {aleph} is gone.

\stopsubsection

\startsubsection[title=Changes from standard \WEBC]

The compilation framework is \WEBC\ and we keep using that but without the
\PASCAL\ to \CCODE\ step. This framework also provides some common features that
deal with reading bytes from files and locating files in \TDS. This is what we do

There is no mltex support.

There is no enctex support.

The following commandline switches are silently ignored, even in non|-|\LUA\
mode: \type {-8bit}, \type {-translate-file}, \type {-mltex}, \type {-enc}

The \type {\openout} whatsits are not written to the log file.

Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\
mode because \type {texmf.cnf} is not read: \type {shell-escape} is off (but
that is not a problem because of \LUA's \type {os.execute}), and the paranoia
checks on \type {openin} and \type {openout} do not happen (however, it is
easy for a \LUA\ script to do this itself by overloading \type {io.open}).
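
Overloading \type {io.open} as suggested above could look like the following
sketch. The policy shown here (refusing write access to absolute paths and to
paths containing \type {..}) is a hypothetical example and not what kpathsea
itself implements; it also only considers Unix style paths.

\starttyping
\directlua{
  local original_open = io.open
  io.open = function(name, mode)
    % hypothetical policy: only allow writing below the current directory
    local writing = mode and mode:find("[wa+]")
    local outside = name:sub(1, 1) == "/" or name:find("..", 1, true)
    if writing and outside then
      return nil, name .. ": write access denied"
    end
    return original_open(name, mode)
  end
}
\stoptyping

The wrapper returns \type {nil} plus a message on refusal, which is the same
convention \type {io.open} uses for failures, so existing callers that check the
result keep working.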

The \quote{E} option does not do anything useful.

\stopsubsection

\stopsection

\startsection[title=Implementation notes]

\startsubsection[title=Memory allocation]

The single internal memory heap that traditional \TEX\ used for tokens and nodes
is split into two separate arrays. Each of these will grow dynamically when

The \type {texmf.cnf} settings related to main memory are no longer used (these
are: \type {main_memory}, \type {mem_bot}, \type {extra_mem_top} and
\type {extra_mem_bot}). \quote {Out of main memory} errors can still occur, but the
limiting factor is now the amount of RAM in your system, not a predefined limit.

Also, the memory (de)allocation routines for nodes are completely rewritten. The
relevant code now lives in the C file \type {texnode.c}, and basically uses a
dozen or so \quote {avail} lists instead of a doubly|-|linked model. An extra
function layer is added so that the code can ask for nodes by type instead of
directly requisitioning a certain amount of memory words.

Because of the split into two arrays and the resulting differences in the data
structures, some of the macros have been duplicated. For instance, there are now
\type {vlink} and \type {vinfo} as well as \type {token_link} and
\type {token_info}. All access to the variable memory array is now hidden behind a
macro called \type {vmem}.

The implementation of the growth of two arrays (via reallocation) introduces a
potential pitfall: the memory arrays should never be used as the left hand side
of a statement that can modify the array in question.

The input line buffer and pool size are now also reallocated when needed, and the
\type {texmf.cnf} settings \type {buf_size} and \type {pool_size} are silently

\stopsubsection

\startsubsection[title=Sparse arrays]

The \type {\mathcode}, \type {\delcode}, \type {\catcode}, \type {\sfcode},
\type {\lccode} and \type {\uccode} tables are now sparse arrays that are
implemented in~\CCODE. They are no longer part of the \TEX\
\quote {equivalence table} and because each had 1.1 million entries with a few
memory words each, this makes a major difference in memory usage.

The \type {\catcode}, \type {\sfcode}, \type {\lccode} and \type {\uccode}
assignments do not yet show up when using the etex tracing routines
\type {\tracingassigns} and \type {\tracingrestores} (code simply not written yet).

A side|-|effect of the current implementation is that \type {\global} is now more
expensive in terms of processing than non|-|global assignments.

See \type {mathcodes.c} and \type {textcodes.c} if you are interested in the

Also, the glyph ids within a font are now managed by means of a sparse array and
glyph ids can go up to index $2^{21}-1$.

\stopsubsection

\startsubsection[title=Simple single-character csnames]

Single|-|character commands are no longer treated specially in the internals,
they are stored in the hash just like the multiletter csnames.

The code that displays control sequences explicitly checks if the length is one
when it has to decide whether or not to add a trailing space.

Active characters are internally implemented as a special type of multi|-|letter
control sequences that uses a prefix that is otherwise impossible to obtain.

\stopsubsection

\startsubsection[title=Compressed format]

The format is passed through zlib, allowing it to shrink to roughly half of the
size it would have had in uncompressed form. This takes a few more \CPU\ cycles
but much less disk \IO, so it should still be faster.

\stopsubsection

\startsubsection[title=Binary file reading]

All of the internal code is changed in such a way that if one of the
\type {read_xxx_file} callbacks is not set, then the file is read by a C function
using basically the same convention as the callback: a single read into a buffer
big enough to hold the entire file contents. While this uses more memory than the
previous code (that mostly used \type {getc} calls), it can be quite a bit faster
(depending on your I/O subsystem).
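
For comparison, a callback that follows the same single read convention could
look like the sketch below. It assumes the callback is expected to return a
success flag, the file contents as one string, and the size of that string;
\type {read_truetype_file} is used as one instance of the \type {read_xxx_file}
family.

\starttyping
\directlua{
  callback.register("read_truetype_file", function(name)
    local f = io.open(name, "rb")
    if not f then
      return false
    end
    % one read that pulls in the entire file
    local data = f:read("*a")
    f:close()
    return true, data, string.len(data)
  end)
}
\stoptyping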