\environment luatex-style
\environment luatex-logos

\startcomponent luatex-modifications

\startchapter[reference=modifications,title={Modifications}]

\startsection[title=The merged engines]

\startsubsection[title=The need for change]

The first version of \LUATEX\ only had a few extra primitives and it was largely
the same as \PDFTEX. Then we merged substantial parts of \ALEPH\ into the code
and got more primitives. When things became more stable the decision was made to
clean up the rather hybrid nature of the program. This means that some primitives
have been promoted to core primitives, often with a different name, and that
others were removed. This made it possible to start cleaning up the code base. We
will describe most of these changes in the following paragraphs.

Besides the expected changes caused by new functionality, there are a number of
not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
(conflicting) feature, or, more often than not, a change necessary to clean up
the internal interfaces. These will also be mentioned.

\stopsubsection

\startsubsection[title=Changes from \TEX\ 3.1415926]

Of course it all starts with traditional \TEX. Even if we started with \PDFTEX,
most still comes from the original. But we diverge a bit.

\startitemize

\startitem
The current code base is written in \CCODE, not \PASCAL. We use \CWEB\
when possible.
\stopitem

\startitem
See \in {chapter} [languages] for many small changes related to paragraph
building, language handling and hyphenation. The most important change is
that adding a brace group in the middle of a word (like in \type {of{}fice})
does not prevent ligature creation.
\stopitem

\startitem
There is no pool file; all strings are embedded during compilation.
\stopitem

\startitem
The specifier \type {plus 1 fillll} does not generate an error. The extra
\quote{l} is simply typeset.
\stopitem

\startitem
The upper limit to \type {\endlinechar} and \type {\newlinechar} is 127.
\stopitem

\startitem
The hz optimization code has been partially redone so that we no longer need
to create extra font instances. The front- and backend have been decoupled and
more efficient (\PDF) code is generated.
\stopitem

\stopitemize
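
Some of the points above can be tried out with a small test file. The following
is a minimal sketch, assuming a plain \TEX\ like setup; the sample words and the
box width are arbitrary choices made for illustration.

\starttyping
% the empty group no longer blocks the ligature
office of{}fice

% the fourth "l" is typeset instead of triggering an error
\hbox to 4cm{a\hskip 0pt plus 1 fillll b}

% both parameters accept values up to 127
\newlinechar=10
\message{first line^^Jsecond line}
\stoptyping
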
\stopsubsection

\startsubsection[title=Changes from \ETEX\ 2.2]

As \ETEX\ is the de facto standard extension, we of course provide its
functionality, but with a few small adaptations.

\startitemize

\startitem
The \ETEX\ functionality is always present and enabled, so the prepended
asterisk or \type {-etex} switch for \INITEX\ is not needed.
\stopitem

\startitem
The \TEXXET\ extension is not present, so the primitives \type
{\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type
{\endL} are missing.
\stopitem

\startitem
Some of the tracing information that is output by \ETEX's \type
{\tracingassigns} and \type {\tracingrestores} is not there.
\stopitem

\startitem
Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum value
is 65535 and the implementation uses a flat array instead of the mixed
flat|\&|sparse model from \ETEX.
\stopitem

\startitem
The \type {\savinghyphcodes} command is a no|-|op. \in {Chapter} [languages]
explains why.
\stopitem

\startitem
When kpathsea is used to find files, \LUATEX\ uses the \type {ofm} file
format to search for font metrics. In turn, this means that \LUATEX\ looks at
the \type {OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead
of \type {TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts
(\LUATEX\ uses the variable \type {OVFFONTS} instead of \type {VFFONTS}).
\stopitem

\stopitemize
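
As an illustration of the flat register model, the highest register of each
class can be addressed directly with the classic primitives. This is only a
sketch: the values are arbitrary, and a macro package may already have allocated
these registers for its own use.

\starttyping
\count65535=42
\dimen65535=10pt
\toks65535={last slot}
\message{\the\count65535, \the\dimen65535, \the\toks65535}
\stoptyping
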
\stopsubsection

\startsubsection[title=Changes from \PDFTEX\ 1.40]

Because we want to produce \PDF, the most natural starting point was the popular
\PDFTEX\ program. We inherited the stable features, dropped most of the
experimental code and promoted some functionality to core \LUATEX\ functionality,
which in turn triggered the renaming of primitives.

\startitemize

\startitem
The (experimental) support for snap nodes has been removed, because it is
much more natural to build this functionality on top of node processing and
attributes. The associated primitives that are now gone are: \type
{\pdfsnaprefpoint}, \type {\pdfsnapy}, and \type {\pdfsnapycomp}.
\stopitem

\startitem
The (experimental) support for specialized spacing around nodes has also been
removed. The associated primitives that are now gone are: \type
{\pdfadjustinterwordglue}, \type {\pdfprependkern}, and \type {\pdfappendkern}, as
well as the five supporting primitives \type {\knbscode}, \type {\stbscode}, \type
{\shbscode}, \type {\knbccode}, and \type {\knaccode}.
\stopitem

\startitem
A number of \quote {pdftex primitives} have been removed as they can be
implemented using \LUA:

\start \raggedright
\type {\pdfelapsedtime}, \type {\pdfescapehex}, \type {\pdfescapename}, \type
{\pdfescapestring}, \type {\pdffiledump}, \type {\pdffilemoddate}, \type
{\pdffilesize}, \type {\pdfforcepagebox}, \type {\pdflastmatch}, \type
{\pdfmatch}, \type {\pdfmdfivesum}, \type {\pdfmovechars}, \type
{\pdfoptionalwaysusepdfpagebox}, \type {\pdfoptionpdfinclusionerrorlevel},
\type {\pdfresettimer}, \type {\pdfshellescape}, \type {\pdfstrcmp} and \type
{\pdfunescapehex}
\par \stop
\stopitem

\startitem
The version related primitives \type {\pdftexbanner}, \type {\pdftexversion}
and \type {\pdftexrevision} are no longer present as there is no longer a
strict relationship with \PDFTEX\ development.
\stopitem

\startitem
The experimental snapper mechanism has been removed and therefore also the
primitives:

\start \raggedright
\type {\pdfignoreddimen}, \type {\pdffirstlineheight}, \type
{\pdfeachlineheight}, \type {\pdfeachlinedepth} and \type
{\pdflastlinedepth}
\par \stop
\stopitem

\startitem
The experimental primitives \type {\primitive}, \type {\ifprimitive}, \type
{\ifabsnum} and \type {\ifabsdim} are promoted to core primitives. The \type
{\pdf*} prefixed originals are not available.
\stopitem

\startitem
The \PNG\ transparency fix from 1.40.6 is not applied as high|-|level
support is pending.
\stopitem

\startitem
Two extra token lists are provided, \type {\pdfxformresources} and \type
{\pdfxformattr}, as an alternative to \type {\pdfxform} keywords.
\stopitem

\startitem
The current version of \LUATEX\ no longer replaces and|/|or merges fonts in
embedded \PDF\ files with fonts of the enveloping \PDF\ document. This
regression may be temporary, depending on what the rewritten font backend will
look like.
\stopitem

\startitem
The primitives \type {\pdfpagewidth} and \type {\pdfpageheight} have been removed
because \type {\pagewidth} and \type {\pageheight} serve that purpose.
\stopitem

\startitem
The primitives \type {\pdfnormaldeviate}, \type {\pdfuniformdeviate}, \type
{\pdfsetrandomseed} and \type {\pdfrandomseed} have been promoted to core
primitives without \type {pdf} prefix so the original commands are no longer
recognized.
\stopitem

\startitem
The primitives \type {\ifincsname}, \type {\expanded} and \type {\quitvmode} are now
core primitives.
\stopitem

\startitem
As the hz and protrusion mechanisms are part of the core the related
primitives \type {\lpcode}, \type {\rpcode}, \type {\efcode}, \type
{\leftmarginkern}, \type {\rightmarginkern} are promoted to core primitives. The
two commands \type {\protrudechars} and \type {\adjustspacing} replace their
\type {pdf} prefixed originals.
\stopitem

\startitem
The \type {\tagcode} primitive is promoted to core primitive.
\stopitem

\startitem
The \type {\letterspacefont} feature is now part of the core but will not be
changed (improved). We just provide it for legacy use.
\stopitem

\startitem
The \type {\pdfnoligatures} primitive is now \type {\ignoreligaturesinfont}.
\stopitem

\startitem
The \type {\pdffontexpand} primitive is now \type {\expandglyphsinfont}.
\stopitem

\startitem
Because position tracking is also available in \DVI\ mode the
\type {\savepos}, \type {\lastxpos} and \type {\lastypos} commands now
replace their \type {pdf} prefixed originals.
\stopitem

\startitem
Candidates for removal are \type {\pdfcolorstackinit} and \type
{\pdfcolorstack}.
\stopitem

\startitem
Candidates for replacement are \type {\pdfoutput} (\type {\outputmode}) and
\type {\pdfmatrix} (something with a normal syntax).
\stopitem

\stopitemize
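
Documents and macro packages that still use the old names can map the renamed
primitives back with a handful of aliases. The following is a sketch only: the
list is limited to the renames mentioned in this section, and whether such
aliases are wanted is up to the macro package. The last definition hints at how
a removed primitive could be rebuilt in \LUA; the macro name \type {\filesize}
is made up for this example and, unlike the original, it assumes that the file
is given with a usable path instead of being located by a file search.

\starttyping
% renamed primitives: old name on the left, new name on the right
\let\pdfpagewidth      \pagewidth
\let\pdfpageheight     \pageheight
\let\pdfnormaldeviate  \normaldeviate
\let\pdfuniformdeviate \uniformdeviate
\let\pdfsetrandomseed  \setrandomseed
\let\pdfrandomseed     \randomseed
\let\pdfprotrudechars  \protrudechars
\let\pdfadjustspacing  \adjustspacing
\let\pdfnoligatures    \ignoreligaturesinfont
\let\pdffontexpand     \expandglyphsinfont
\let\pdfsavepos        \savepos
\let\pdflastxpos       \lastxpos
\let\pdflastypos       \lastypos

% hypothetical stand-in for \pdffilesize: expands to the size in
% bytes, or to nothing when the file cannot be found
\def\filesize#1{\directlua{
    local size = lfs.attributes("\luaescapestring{#1}", "size")
    if size then tex.write(tostring(size)) end
}}
\stoptyping
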
\stopsubsection

\startsubsection[title=Changes from \ALEPH\ RC4]

Because we wanted proper directional typesetting the \ALEPH\ mechanisms looked
most attractive. These are rather close to the ones provided by \OMEGA, so what
we say next applies to both these programs.

\startitemize

\startitem
The extended 16-bit math primitives (\type {\omathcode} etc.) have been
removed.
\stopitem

\startitem
The \OCP\ processing is no longer supported at all. As a consequence, the
following primitives have been removed:

\start \raggedright
\type {\ocp}, \type {\externalocp}, \type {\ocplist}, \type {\pushocplist},
\type {\popocplist}, \type {\clearocplists}, \type {\addbeforeocplist}, \type
{\addafterocplist}, \type {\removebeforeocplist}, \type {\removeafterocplist}
and \type {\ocptracelevel}
\par \stop
\stopitem

\startitem
\LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH: \type
{TLT} (latin), \type {TRT} (arabic), \type {RTT} (cjk), \type {LTL}
(mongolian). All other direction specifiers generate an error.
\stopitem

\startitem
The input translations from \ALEPH\ are not implemented, so the related
primitives are not available:

\start \raggedright
\type {\DefaultInputMode}, \type {\noDefaultInputMode}, \type {\noInputMode},
\type {\InputMode}, \type {\DefaultOutputMode}, \type {\noDefaultOutputMode},
\type {\noOutputMode}, \type {\OutputMode}, \type {\DefaultInputTranslation},
\type {\noDefaultInputTranslation}, \type {\noInputTranslation}, \type
{\InputTranslation}, \type {\DefaultOutputTranslation}, \type
{\noDefaultOutputTranslation}, \type {\noOutputTranslation} and \type
{\OutputTranslation}
\par \stop
\stopitem

\startitem
Several bugs have been fixed. The \type {\hoffset} bug when \type {\pagedir TRT}
is gone, removing the need for an explicit fix to \type {\hoffset}. Also, the bug
causing \type {\fam} to fail for family numbers above 15 is fixed. A fair number
of other minor bugs are fixed as well, most of these related to \type
{\tracingcommands} output.
\stopitem

\startitem
The scanner for direction specifications now allows an optional space after
the direction is completely parsed.
\stopitem

\startitem
The \type {^^} notation can come in five and six item repetitions also, to
insert characters that do not fit in the BMP.
\stopitem

\startitem
Glues {\it immediately after} direction change commands are not legal
breakpoints.
\stopitem

\startitem
Several mechanisms that need to be right|-|to|-|left aware have been
improved, for instance the placement of formula numbers.
\stopitem

\startitem
The page dimension related primitives \type {\pagewidth} and \type {\pageheight} have
been promoted to core primitives.
\stopitem

\startitem
The primitives \type {\charwd}, \type {\charht}, \type {\chardp} and \type {\charit}
have been removed as we have the \ETEX\ variants \type {\fontchar*}.
\stopitem

\startitem
The two dimension registers \type {\pagerightoffset} and \type
{\pagebottomoffset} are now core primitives.
\stopitem

\startitem
The direction related primitives \type {\pagedir}, \type {\bodydir}, \type
{\pardir}, \type {\textdir}, \type {\mathdir} and \type {\boxdir} are now
core primitives.
\stopitem

\startitem
The promotion of primitives to core primitives as well as the removal of all
others means that the initialization namespace \type {aleph} is gone.
\stopitem

\stopitemize
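
The surviving direction primitives and the \ETEX\ replacements for the removed
character dimension primitives can be sketched as follows. This is an
illustration only; the chosen direction and the sample character are arbitrary.

\starttyping
% TRT is one of the four supported specifiers
\pagedir TRT \bodydir TRT
\pardir  TRT \textdir TRT

% \charwd and friends are gone, the \fontchar* variants remain
\message{width of A: \the\fontcharwd\font`A}
\message{height of A: \the\fontcharht\font`A}
\stoptyping
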
\stopsubsection

\startsubsection[title=Changes from standard \WEBC]

The compilation framework is \WEBC\ and we keep using that but without the
\PASCAL\ to \CCODE\ step. This framework also provides some common features that
deal with reading bytes from files and locating files in \TDS. This is what we do
differently:

\startitemize

\startitem
There is no mltex support.
\stopitem

\startitem
There is no enctex support.
\stopitem

\startitem
The following commandline switches are silently ignored, even in non|-|\LUA\
mode: \type {-8bit}, \type {-translate-file}, \type {-mltex}, \type {-enc}
and \type {-etex}.
\stopitem

\startitem
The \type {\openout} whatsits are not written to the log file.
\stopitem

\startitem
Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\
mode because \type {texmf.cnf} is not read: \type {shell-escape} is off (but
that is not a problem because of \LUA's \type {os.execute}), and the paranoia
checks on \type {openin} and \type {openout} do not happen (however, it is
easy for a \LUA\ script to do this itself by overloading \type {io.open}).
\stopitem

\startitem
The \quote{E} option does not do anything useful.
\stopitem

\stopitemize
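
The remark about overloading \type {io.open} can be illustrated with a small
wrapper. This is a hedged sketch and not a complete security policy: the rule
used here (refuse to open absolute paths and paths that contain \type {..} for
writing) is an assumption chosen for illustration, and Windows style paths are
not considered.

\starttyping
\directlua{
    local original_open = io.open
    io.open = function(name, mode)
        local writing = (mode or "r"):find("[wa+]")
        local outside = name:sub(1, 1) == "/" or name:find("..", 1, true)
        if writing and outside then
            return nil, "write access refused: " .. name
        end
        return original_open(name, mode)
    end
}
\stoptyping

Returning \type {nil} plus a message mirrors what \type {io.open} itself does on
failure, so existing callers that check the result keep working.
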
\stopsubsection

\stopsection

\startsection[title=Implementation notes]

\startsubsection[title=Memory allocation]

The single internal memory heap that traditional \TEX\ used for tokens and nodes
is split into two separate arrays. Each of these will grow dynamically when
needed.

The \type {texmf.cnf} settings related to main memory are no longer used (these
are: \type {main_memory}, \type {mem_bot}, \type {extra_mem_top} and \type
{extra_mem_bot}). \quote {Out of main memory} errors can still occur, but the
limiting factor is now the amount of RAM in your system, not a predefined limit.

Also, the memory (de)allocation routines for nodes have been completely
rewritten. The relevant code now lives in the C file \type {texnode.c}, and
basically uses a dozen or so \quote {avail} lists instead of a doubly|-|linked
model. An extra function layer is added so that the code can ask for nodes by
type instead of directly requisitioning a certain amount of memory words.

Because of the split into two arrays and the resulting differences in the data
structures, some of the macros have been duplicated. For instance, there are now
\type {vlink} and \type {vinfo} as well as \type {token_link} and \type
{token_info}. All access to the variable memory array is now hidden behind a
macro called \type {vmem}.

The implementation of the growth of two arrays (via reallocation) introduces a
potential pitfall: the memory arrays should never be used as the left hand side
of a statement that can modify the array in question. Reallocation can move the
array, so an address that was computed for the left hand side before the right
hand side triggered the growth may end up pointing into the old block.

The input line buffer and pool size are now also reallocated when needed, and the
\type {texmf.cnf} settings \type {buf_size} and \type {pool_size} are silently
ignored.

\stopsubsection

\startsubsection[title=Sparse arrays]

The \type {\mathcode}, \type {\delcode}, \type {\catcode}, \type {\sfcode}, \type {\lccode}
and \type {\uccode} tables are now sparse arrays that are implemented in~\CCODE.
They are no longer part of the \TEX\ \quote {equivalence table} and because each
had 1.1 million entries with a few memory words each, this makes a major
difference in memory usage.

The \type {\catcode}, \type {\sfcode}, \type {\lccode} and \type {\uccode} assignments do
not yet show up when using the \ETEX\ tracing routines \type {\tracingassigns} and
\type {\tracingrestores} (code simply not written yet).

A side|-|effect of the current implementation is that a \type {\global}
assignment is now more expensive in terms of processing than a non|-|global one.

See \type {mathcodes.c} and \type {textcodes.c} if you are interested in the
details.

Also, the glyph ids within a font are now managed by means of a sparse array and
glyph ids can go up to index $2^{21}-1$.
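

From the user's side the sparse tables behave like the traditional ones, only
with a much larger index range. A small sketch, with arbitrarily chosen code
points and values:

\starttyping
% codes can be set for characters far beyond the first 256
\catcode"1D49C=11
\lccode "1D49C="1D49C

% a global assignment works as always, it is only somewhat more costly
\begingroup
    \global\sfcode`\.=1000
\endgroup
\stoptyping
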
\stopsubsection

\startsubsection[title=Simple single-character csnames]

Single|-|character commands are no longer treated specially in the internals;
they are stored in the hash just like the multiletter csnames.

The code that displays control sequences explicitly checks if the length is one
when it has to decide whether or not to add a trailing space.

Active characters are internally implemented as a special type of multi|-|letter
control sequence that uses a prefix that is otherwise impossible to obtain.

\stopsubsection

\startsubsection[title=Compressed format]

The format is passed through zlib, allowing it to shrink to roughly half of the
size it would have had in uncompressed form. This takes a few more \CPU\ cycles
but much less disk \IO, so it should still be faster.

\stopsubsection

\startsubsection[title=Binary file reading]

All of the internal code is changed in such a way that if one of the \type
{read_xxx_file} callbacks is not set, then the file is read by a C function using
basically the same convention as the callback: a single read into a buffer big
enough to hold the entire file contents. While this uses more memory than the
previous code (that mostly used \type {getc} calls), it can be quite a bit faster
(depending on your I/O subsystem).

\stopsubsection

\stopsection

\stopchapter

\stopcomponent