.TH PATGEN 1 "2 December 2014" "Web2C @VERSION@"
.\"=====================================================================
.if t .ds TX \fRT\\h'-0.1667m'\\v'0.20v'E\\v'-0.20v'\\h'-0.125m'X\fP
.if n .ds TX TeX
.ie t .ds OX \fIT\v'+0.25m'E\v'-0.25m'X\fP\" for troff
.el .ds OX TeX\" for nroff
.\" the same but obliqued
.\" BX definition must follow TX so BX can use TX
.if t .ds BX \fRB\s-2IB\s0\fP\*(TX
.if n .ds BX BibTeX
.\" LX definition must follow TX so LX can use TX
.if t .ds LX \fRL\\h'-0.36m'\\v'-0.15v'\s-2A\s0\\h'-0.15m'\\v'0.15v'\fP\*(TX
.if n .ds LX LaTeX
.\"=====================================================================
.SH NAME
patgen \- generate patterns for TeX hyphenation
.SH SYNOPSIS
.B patgen
.I dictionary_file pattern_file patout_file translate_file
.\"=====================================================================
.SH DESCRIPTION
This manual page is not meant to be exhaustive.
See also the Info file or manual
.I "Web2C: A TeX implementation"
available as part of the TeX Live distribution or at
.IR http://tug.org/web2c .
.PP
The
.B patgen
program reads the
.I dictionary_file
containing a list of hyphenated words and the
.I pattern_file
containing previously-generated patterns (if any) for a particular
language (not a complete TeX source file; see below), and produces the
.I patout_file
with (previously- plus newly-generated) hyphenation patterns for that
language.
The
.I translate_file
defines language-specific values for the parameters
.IR left_hyphen_min " and " right_hyphen_min
used by \*(TX's hyphenation algorithm and the external representation
of the lower and upper case version(s) of all \`letters' of that
language. Further details of the pattern generation process such as
hyphenation levels and pattern lengths are requested interactively from
the user's terminal. Optionally
.B patgen
creates a new dictionary file
.IR pattmp. n
showing the good and bad hyphens found by the generated patterns, where
.I n
is the highest hyphenation level.
.PP
The patterns generated by
.B patgen
can be read by
.B initex
for use in hyphenating words. For a real-life example of
.B patgen
patterns, see
.IR $TEXMFMAIN/tex/generic/hyphen/hyphen.tex ,
which contains the patterns \*(TX uses for English by default.
At some sites, patterns for (many) other languages may be available,
and the local
.B tex
programs may have them preloaded.
.PP
All filenames must be complete; no default extensions are added
and no path searching is done.
.\"=====================================================================
.SH "FILE FORMATS"
.SS Letters
When
.B initex
digests hyphenation patterns, \*(TX first expands macros and the result
must entirely consist of digits (hyphenation levels), dots (\`.', edge
of a word), and letters. In pattern files for non-English languages
letters are often represented by macros or other expandable constructs.
For the purposes of
.BR patgen ,
these are just character sequences, subject to the condition that no
such sequence is a prefix of another one.
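.PP
The prefix condition can be checked mechanically before a set of
letter representations is used.
The following Python sketch is not part of
.BR patgen ;
the function name
.I find_prefix_clash
is hypothetical and the sample representations are made up for
illustration.
.PP
.nf
```python
def find_prefix_clash(representations):
    """Return a (shorter, longer) pair that violates the rule, or None."""
    ordered = sorted(set(representations))
    # In sorted order, a string that is a prefix of some other string is
    # always directly followed by a string that starts with it, so it is
    # enough to compare neighbours.
    for shorter, longer in zip(ordered, ordered[1:]):
        if longer.startswith(shorter):
            return (shorter, longer)
    return None


# Hypothetical sample data, not taken from any real translate file.
print(find_prefix_clash(["a", "b", "ss"]))
print(find_prefix_clash(["s", "ss", "a"]))
```
.fi
.PP
The first call reports no clash; the second reports that
.I s
is a prefix of
.IR ss .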
.SS "Dictionary file"
A dictionary file contains a weighted list of hyphenated words, one word
per line starting in column 1. A digit in column 1 indicates a global
word weight (initially 1) applicable to all following words up to the
next global word weight. A digit at some intercharacter position
indicates a weight for that position only.
.PP
The hyphens in a word are indicated by \`-', \`*', or \`.' (or their
replacements as defined in the translate file) for hyphens yet to be
found, \`good' hyphens (correctly found by the patterns), and \`bad'
hyphens (erroneously found by the patterns) respectively; when reading a
dictionary file \`*' is treated like \`-' and \`.' is ignored.
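.PP
The input rule for hyphen characters can be illustrated with a small
Python sketch.
It is an assumption-laden illustration, not the code
.B patgen
uses: the name
.I read_word
and its
.I hyphen_chars
parameter are hypothetical, and weight digits are simply skipped
rather than interpreted.
.PP
.nf
```python
def read_word(line, hyphen_chars="-*."):
    """Split one dictionary line into (letters, hyphen_positions).

    Each entry of hyphen_positions is the number of letters that
    precede a hyphen.  Weight digits are skipped in this sketch.
    """
    to_find, good, bad = hyphen_chars
    letters = []
    positions = []
    for ch in line.rstrip():
        if ch == to_find or ch == good:
            positions.append(len(letters))  # a good hyphen counts as a hyphen
        elif ch == bad:
            continue                        # a bad hyphen is ignored on input
        elif ch.isdigit():
            continue                        # assumption: weights not modelled
        else:
            letters.append(ch)
    return "".join(letters), positions


print(read_word("hy-phen*a.tion"))
```
.fi
.PP
For the sample line the sketch yields the letters
.I hyphenation
with hyphens after the second and sixth letter; the \`.' after
.I a
leaves no trace.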
.SS "Pattern file"
A pattern file contains only patterns in the format above, e.g., from a
previous run of
.BR patgen .
It may \fInot\fP contain any \*(TX comments or
control sequences. For instance, this is not a valid pattern file:
.PP
.nf
.RS
% this is a pattern file read by TeX.
\epatterns{%
\&.\|.\|.
}
.RE
.fi
.PP
It can only contain the actual patterns, i.e., the \`.\|.\|.'.
.SS "Translate file"
A translate file starts with a line containing the values of
.I left_hyphen_min
in columns 1-2 and
.I right_hyphen_min
in columns 3-4, and either a blank or the replacement for one of the
"hyphen" characters \`-', \`*', and \`.' in columns 5, 6, and 7. (Input
lines are padded with blanks as for many \*(TX-related programs.)
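.PP
Because the first line is column-oriented, it can be decoded by
slicing.
The Python sketch below is only an illustration under stated
assumptions: the function name
.I parse_translate_header
is hypothetical, the first value is assumed to occupy columns 1-2, and
both numeric fields are assumed to be present.
.PP
.nf
```python
def parse_translate_header(line):
    """Decode the first line of a translate file by column position."""
    padded = line.rstrip().ljust(7)   # short lines are padded with blanks
    left = int(padded[0:2])           # columns 1-2 (assumed)
    right = int(padded[2:4])          # columns 3-4
    defaults = "-*."
    # Columns 5, 6 and 7: a blank keeps the default hyphen character.
    hyphens = [new if new != " " else old
               for new, old in zip(padded[4:7], defaults)]
    return left, right, "".join(hyphens)


print(parse_translate_header(" 2 3"))
print(parse_translate_header(" 1 1="))
```
.fi
.PP
The second sample line replaces only the first hyphen character and
keeps the other two defaults.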
.PP
Each following line defines one \`letter': an arbitrary delimiter
character in column 1, followed by one or more external representations
of that letter (first the \`lower' case one used for output), each
one terminated by the delimiter and the whole sequence terminated by
another delimiter.
.PP
If the translate file is empty, the values
.IR left_hyphen_min "=2, " right_hyphen_min "=3,"
and the 26 lower case letters
.IR a .\|.\|. z
with their upper case representations
.IR A .\|.\|. Z
are assumed.
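.\"=====================================================================
.SH "TERMINAL INPUT"
After reading the
.I translate_file
and any previously-generated patterns from
.IR pattern_file ,
.B patgen
requests input from the user's terminal.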
.PP
First it asks for the integer values of
.IR hyph_start " and " hyph_finish ,
the lowest and highest hyphenation level for which patterns are to be
generated. The value of
.I hyph_start
should be larger than any hyphenation level already present in
.IR pattern_file .
.PP
Then, for each hyphenation level, it asks for the integer values of
.IR pat_start " and " pat_finish ,
the smallest and largest pattern length to be analyzed, as well as
.IR "good weight" ", " "bad weight" ", and " threshold ,
the weights for good and bad hyphens and a weight threshold for useful
patterns.
.PP
Finally it asks whether or not to produce a hyphenated word list
(\`y' or \`Y' for yes, anything else for no).
.\"=====================================================================
.SH FILES
.TP
.I $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
The original hyphenation patterns for English, by Donald Knuth and Frank
Liang.
.TP
.I http://www.ctan.org/pkg/ushyph
Additional hyphenation patterns for English, extended by Gerard Kuiken.
.TP
.I http://www.ctan.org/pkg/hyph-utf8
Collected hyphenation patterns for many languages in many formats.
.TP
.I http://www.ctan.org/tex-archive/language/
General CTAN directory for patterns and support for many other languages.
.\"=====================================================================
.SH "SEE ALSO"
Frank Liang and Peter Breitenlohner,
.IR patgen.web .
.PP
Frank Liang,
.IR "Word hy-phen-a-tion by com-puter" ,
Stanford University Ph.D. thesis, 1983,
http://tug.org/docs/liang.
.PP
Donald E. Knuth,
.IR "The \*(OXbook" ,
Addison-Wesley, 1986, ISBN 0-201-13447-0, Appendix H.
.\"=====================================================================
.SH AUTHORS
Frank Liang wrote the first version of this program. Peter
Breitenlohner made a
substantial revision in 1991 for \*(TX 3.
The first version was published as the appendix to the
technical report. Howard Trickey originally ported it to Unix.