<html>

<head>
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<title>Unicode 3.0 NamesList File Structure</title>
</head>

<body>

<h3>Unicode NamesList File Format</h3>

<p>Last updated: 1999-07-06</p>

<h3>1.0 Introduction</h3>

<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used
to drive the layout of the character code charts in the Unicode Standard. The information
in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files,
together with additional annotations for many characters. This document describes the
syntax rules for the file format and also gives brief information on how each construct
is rendered when laid out for the book. Some of the syntax elements were used in
preparing the drafts of the book and may not be present in the final, released form
of the NamesList.txt file.</p>

<p>The same input file can be used to prepare drafts of ISO/IEC 10646 (referred
to below as ISO-style). This requires some information in the name list file
that is not needed for the Unicode book (and is in fact removed during parsing).</p>

<p>With access to the layout program (unibook.exe), it is a simple matter to create
name lists for formatting working drafts that contain proposed characters.</p>

<h3>1.1 NamesList File Overview</h3>

<p>The *.lst files are plain text files that, in their simplest form, look like this:</p>

<p>@@&lt;tab&gt;0020&lt;tab&gt;BASIC LATIN&lt;tab&gt;007F<br>
; this is a file comment (ignored)<br>
0020&lt;tab&gt;SPACE<br>
0021&lt;tab&gt;EXCLAMATION MARK<br>
0022&lt;tab&gt;QUOTATION MARK<br>
. . . <br>
007F&lt;tab&gt;DELETE</p>

<p>The semicolon (as first character), @ and &lt;tab&gt; characters are used by the file
syntax and must be provided as shown. Hexadecimal digits must be in UPPER CASE. A double
@@ introduces a block header, with the start code, title, and end code of the block
provided as shown.</p>
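
<p>As an illustration only (this document defines no reference parser), the following Python sketch reads the simplest form shown above: block headers, name lines, and file comments. The function name parse_minimal and the returned dictionary layout are hypothetical, and the sketch assumes, as in the example, that a semicolon in the first column marks a file comment.</p>

<pre>
```python
import re


def parse_minimal(text):
    """Parse block headers, name lines, and ';' file comments.

    Returns a list of dicts with keys start, title, end, and names
    (names maps code point -> character name).
    """
    blocks = []
    for raw in text.splitlines():
        if not raw or raw.startswith(";"):
            continue  # empty lines and file comments are ignored
        fields = re.split(r"\t+", raw)  # a separator may be several tabs
        if fields[0] == "@@":
            # "@@", start code, block title, end code
            _, start, title, end = fields
            blocks.append({"start": int(start, 16), "title": title,
                           "end": int(end, 16), "names": {}})
            continue
        code, name = fields
        if code != code.upper():
            raise ValueError("hex digits must be upper case: " + code)
        if not blocks:
            raise ValueError("name line before the first block header")
        block = blocks[-1]
        cp = int(code, 16)
        if cp not in range(block["start"], block["end"] + 1):
            raise ValueError(code + " lies outside block " + block["title"])
        block["names"][cp] = name
    return blocks


SAMPLE = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
    "0022\tQUOTATION MARK\n"
    "007F\tDELETE\n"
)

blocks = parse_minimal(SAMPLE)
print(blocks[0]["title"], len(blocks[0]["names"]))  # BASIC LATIN 4
```
</pre>

<p>The range check is an addition of the sketch: it simply uses the start and end codes of the block header to reject name lines that fall outside the current block.</p>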

<p>For a minimal, ISO-style name list, only the NAME_LINE and BLOCKHEADER elements and their
constituent syntax elements are needed.</p>

<p>The full syntax with all the options is provided in the following sections.</p>

<h3>1.2 NamesList File Structure</h3>

<p>This section defines the overall file structure.</p>

<pre><strong>NAMELIST:     TITLE_PAGE* BLOCK*
</strong>
<strong>TITLE_PAGE:   TITLE
                | TITLE_PAGE SUBTITLE
                | TITLE_PAGE SUBHEADER
                | TITLE_PAGE IGNORED_LINE
                | TITLE_PAGE EMPTY_LINE
                | TITLE_PAGE COMMENT_LINE
                | TITLE_PAGE NOTICE
                | TITLE_PAGE PAGEBREAK
</strong>
<strong>BLOCK:        BLOCKHEADER
                | BLOCK CHAR_ENTRY
                | BLOCK SUBHEADER
                | BLOCK NOTICE
                | BLOCK EMPTY_LINE
                | BLOCK IGNORED_LINE
                | BLOCK PAGEBREAK

CHAR_ENTRY:   NAME_LINE | RESERVED_LINE
                | CHAR_ENTRY ALIAS_LINE
                | CHAR_ENTRY COMMENT_LINE
                | CHAR_ENTRY CROSS_REF
                | CHAR_ENTRY DECOMPOSITION
                | CHAR_ENTRY COMPAT_MAPPING
                | CHAR_ENTRY IGNORED_LINE
                | CHAR_ENTRY EMPTY_LINE
                | CHAR_ENTRY NOTICE
</strong></pre>

<p>In other words:<br>
<br>
Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.</p>

<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE, EMPTY_LINE, and
IGNORED_LINE may occur before the first BLOCKHEADER.</p>

<p>Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of
the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE,
COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE, and IGNORED_LINE.</p>

<p>Except for EMPTY_LINE, NOTICE, and IGNORED_LINE, none of these lines may occur in any other
place; COMMENT_LINE may additionally occur before the first BLOCKHEADER.</p>

<p>Note: A NOTICE displays differently depending on whether it follows a header or title
or is part of a CHAR_ENTRY.</p>
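
<p>The ordering rules above can be checked with a small state machine. The following Python sketch assumes that the file has already been split into element kinds (TITLE, BLOCKHEADER, NAME_LINE, and so on) by some separate classifier; the function check_structure is hypothetical and enforces only the rules stated in this section, not every detail of the grammar. It also records whether each NOTICE belongs to a CHAR_ENTRY (indented) or to the page, block, or column (italicized).</p>

<pre>
```python
# Element kinds allowed before the first BLOCKHEADER (the title page).
TITLE_PAGE_KINDS = {"TITLE", "SUBTITLE", "SUBHEADER", "IGNORED_LINE",
                    "EMPTY_LINE", "COMMENT_LINE", "NOTICE", "PAGEBREAK"}
# Lines that, once a block has started, may only extend a CHAR_ENTRY.
ENTRY_ONLY = {"ALIAS_LINE", "COMMENT_LINE", "CROSS_REF",
              "DECOMPOSITION", "COMPAT_MAPPING"}
# Lines that may sit inside a CHAR_ENTRY without ending it.
ENTRY_NEUTRAL = {"NOTICE", "EMPTY_LINE", "IGNORED_LINE"}


def check_structure(kinds):
    """Check a list of element kinds against the ordering rules.

    Returns (errors, notice_styles): a list of error messages and, for
    each NOTICE in order, "entry" (part of a CHAR_ENTRY) or "page".
    """
    errors, notice_styles = [], []
    seen_block = False
    in_entry = False
    for i, kind in enumerate(kinds):
        if not seen_block:
            if kind == "BLOCKHEADER":
                seen_block = True
            elif kind not in TITLE_PAGE_KINDS:
                errors.append(f"item {i}: {kind} before first BLOCKHEADER")
            elif kind == "NOTICE":
                notice_styles.append("page")
            continue
        if kind in ("TITLE", "SUBTITLE"):
            errors.append(f"item {i}: {kind} after first BLOCKHEADER")
        elif kind in ("NAME_LINE", "RESERVED_LINE"):
            in_entry = True
        elif kind in ENTRY_ONLY:
            if not in_entry:
                errors.append(f"item {i}: {kind} outside a CHAR_ENTRY")
        elif kind == "NOTICE":
            notice_styles.append("entry" if in_entry else "page")
        elif kind not in ENTRY_NEUTRAL:
            in_entry = False  # e.g. BLOCKHEADER, SUBHEADER, PAGEBREAK


    return errors, notice_styles


kinds = ["TITLE", "NOTICE", "BLOCKHEADER", "NAME_LINE", "ALIAS_LINE",
         "NOTICE", "SUBHEADER", "CROSS_REF", "NOTICE"]
errors, styles = check_structure(kinds)
print(errors)  # ['item 7: CROSS_REF outside a CHAR_ENTRY']
print(styles)  # ['page', 'entry', 'page']
```
</pre>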

<h3>1.3 NamesList File Elements</h3>

<p>This section provides the details of the syntax for the individual elements.</p>

<pre><small><strong>ELEMENT             SYNTAX</strong> // How rendered</small></pre>

<pre><small><strong>NAME_LINE:      CHAR &lt;tab&gt; LINE
</strong>                        // the CHAR and the corresponding image are echoed,
                        // followed by the name as given in LINE

<strong>                CHAR &lt;tab&gt; NAME COMMENT LF
</strong>                        // Names may have a comment, which is stripped off
                        // unless the file is parsed for an ISO-style list

<strong>RESERVED_LINE:  CHAR &lt;tab&gt; &lt;reserved&gt;
</strong>                        // the CHAR is echoed, followed by an icon for the
                        // reserved character and a fixed string, e.g. &lt;reserved&gt;

<strong>COMMENT_LINE:   &lt;tab&gt; &quot;*&quot; SP EXPAND_LINE
</strong>                        // * is replaced by BULLET, output line as comment
                <strong>&lt;tab&gt; EXPAND_LINE</strong>
                        // output line as comment

<strong>ALIAS_LINE:     &lt;tab&gt; &quot;=&quot; SP LINE
</strong>                        // keep the = unchanged, output line as alias

<strong>CROSS_REF:      &lt;tab&gt; &quot;X&quot; SP EXPAND_LINE
</strong>                        // X is replaced by a right arrow
<strong>                &lt;tab&gt; &quot;X&quot; SP &quot;(&quot; STRING SP &quot;-&quot; SP CHAR &quot;)&quot;
</strong>                        // X is replaced by a right arrow,
                        // the &quot;(&quot;, &quot;-&quot;, &quot;)&quot; are removed, and the
                        // order of CHAR and STRING is reversed;
                        // i.e. both inputs result in the same output

<strong>IGNORED_LINE:   &lt;tab&gt; &quot;;&quot; EXPAND_LINE
EMPTY_LINE:     LF
</strong>                        // empty lines and file comments are ignored

<strong>DECOMPOSITION:  &lt;tab&gt; &quot;:&quot; EXPAND_LINE
</strong>                        // replace ':' by EQUIV, expand line into
                        // decomposition

<strong>COMPAT_MAPPING: &lt;tab&gt; &quot;#&quot; SP EXPAND_LINE
</strong>                        // replace '#' by APPROX, output line as mapping

<strong>NOTICE:         &quot;@+&quot; &lt;tab&gt; LINE
</strong>                        // skip '@+', output text as notice
<strong>                &quot;@+&quot; &lt;tab&gt; &quot;*&quot; SP LINE
</strong>                        // skip '@+', output text as notice;
                        // &quot;*&quot; expands to a bullet character
                        // Notices following a character code apply to the
                        // character and are indented. Notices not following
                        // a character code apply to the page/block/column
                        // and are italicized, but not indented

<strong>SUBTITLE:       &quot;@@@+&quot; &lt;tab&gt; LINE
</strong>                        // skip &quot;@@@+&quot;, output text as subtitle

<strong>SUBHEADER:      &quot;@&quot; &lt;tab&gt; LINE
</strong>                        // skip '@', output line as a column header

<strong>BLOCKHEADER:    &quot;@@&quot; &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME &lt;tab&gt; BLOCKEND
</strong>                        // skip &quot;@@&quot;, cause a page break and optional
                        // blank page, then output one or more charts
                        // followed by the list of character names.
                        // Use BLOCKSTART and BLOCKEND to define
                        // which characters belong to a block;
                        // use BLOCKNAME in page and table headers
<strong>                &quot;@@&quot; &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME COMMENT &lt;tab&gt; BLOCKEND
</strong>                        // if a comment is present, it replaces the BLOCKNAME
                        // when an ISO-style namelist is laid out

<strong>BLOCKSTART:     CHAR</strong>   // first character position in block
<strong>BLOCKEND:       CHAR</strong>   // last character position in block
<strong>PAGEBREAK:      &quot;@@&quot;</strong> // insert a (column) break

<strong>TITLE:          &quot;@@@&quot; &lt;tab&gt; LINE</strong>
                        // skip &quot;@@@&quot;, output line as text
                        // Title is used in page headers

<strong>EXPAND_LINE:    {CHAR | STRING}+ LF     </strong>
                        // all instances of CHAR *) are replaced by
                        // CHAR NBSP x NBSP where x is the single Unicode
                        // character corresponding to CHAR.
                        // If the character is combining, it is replaced with
                        // CHAR NBSP &lt;circ&gt; x NBSP where &lt;circ&gt; is the
                        // dotted circle</small>
</pre>
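
<p>A line classifier for the elements above might look like the following Python sketch. It is hypothetical and not taken from unibook.exe: it assumes that a CHAR field is four to six upper-case hex digits, that "@@" alone on a line is a PAGEBREAK, and (beyond the grammar) that a cross reference may also start with a lower-case x. It normalizes both CROSS_REF forms to the same "CHAR STRING" payload, as described for CROSS_REF, and replaces a leading "* " in a comment line by a bullet.</p>

<pre>
```python
import re

# Assumption: a CHAR field is 4 to 6 upper-case hex digits.
CHAR = r"[0-9A-F]{4,6}"
# Second CROSS_REF form: "(" STRING " - " CHAR ")"
XREF_PAREN = re.compile(r"\((.+) - (" + CHAR + r")\)")

# Markers that precede the first tab on header-type lines.
MARKERS = {"@@@": "TITLE", "@@@+": "SUBTITLE", "@+": "NOTICE", "@": "SUBHEADER"}

# Prefixes of tab-led lines, tried in order; other tab-led lines are comments.
ANNOTATIONS = [
    ("* ", "COMMENT_LINE"),
    ("= ", "ALIAS_LINE"),
    ("X ", "CROSS_REF"),
    ("x ", "CROSS_REF"),  # assumption: accept a lower-case x as well
    (":", "DECOMPOSITION"),
    ("# ", "COMPAT_MAPPING"),
    (";", "IGNORED_LINE"),
]


def classify(line):
    """Return (kind, payload) for one line whose LF has been removed."""
    if line == "":
        return "EMPTY_LINE", ""
    if line.startswith(";"):
        return "FILE_COMMENT", line  # ignored, as in the overview example
    head, _, rest = line.partition("\t")
    rest = rest.lstrip("\t")  # a tab separator may be several tab characters
    if head in MARKERS:
        return MARKERS[head], rest
    if head == "@@":
        if not rest:
            return "PAGEBREAK", ""
        start, name, end = re.split(r"\t+", rest)
        return "BLOCKHEADER", (start, name, end)
    if re.fullmatch(CHAR, head):
        if rest.strip("<>") == "reserved":
            return "RESERVED_LINE", head
        return "NAME_LINE", (head, rest)
    if head == "":
        for prefix, kind in ANNOTATIONS:
            if rest.startswith(prefix):
                body = rest[len(prefix):]
                if prefix == "* ":
                    body = "\u2022 " + body  # "*" becomes a BULLET
                elif kind == "CROSS_REF":
                    m = XREF_PAREN.fullmatch(body)
                    if m:  # "(STRING - CHAR)" is reordered to "CHAR STRING"
                        body = m.group(2) + " " + m.group(1)
                return kind, body
        return "COMMENT_LINE", rest
    raise ValueError("unrecognized line: " + repr(line))


print(classify("\tX (inverted exclamation mark - 00A1)"))
# ('CROSS_REF', '00A1 inverted exclamation mark')
```
</pre>

<p>Matching on the exact marker before the first tab, rather than on prefixes, avoids having to order the tests so that "@@@+" is tried before "@@@", "@@", and "@".</p>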

<h3><strong>1.4 NamesList File Primitives</strong></h3>

<p>The following are the primitives and terminals for the NamesList syntax.</p>

<pre><small><strong>LINE:           STRING LF
COMMENT:        &quot;(&quot; NAME &quot;)&quot;
              | &quot;(&quot; NAME &quot;)&quot; &quot;*&quot;
</strong>
<strong>NAME</strong>:           &lt;sequence of ASCII characters, except &quot;(&quot; or &quot;)&quot;&gt;
<strong>STRING</strong>:         &lt;sequence of Latin-1 characters&gt;
<strong>CHAR</strong>:           <strong>X X X X</strong>
              <strong>| X X X X X X X X X</strong></small>
<small><strong>X:              &quot;0&quot;|&quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot;|&quot;A&quot;|&quot;B&quot;|&quot;C&quot;|&quot;D&quot;|&quot;E&quot;|&quot;F&quot;
&lt;tab&gt;:</strong>          &lt;sequence of one or more ASCII tab characters 0x09&gt;
<strong>SP</strong>:             &lt;ASCII 0x20&gt;
<strong>LF</strong>:             &lt;any sequence of ASCII 0x0A and 0x0D&gt;
</small></pre>
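
<p>The COMMENT primitive changes how names are laid out. The following Python sketch, with hypothetical function names, splits a trailing parenthesized COMMENT off a name and applies the two rules given in section 1.3: a NAME_LINE comment is stripped unless an ISO-style list is produced, and a BLOCKHEADER comment replaces the block name in ISO-style output. This document does not explain the optional trailing "*", so the sketch only records whether it is present.</p>

<pre>
```python
import re

# COMMENT: "(" NAME ")" with an optional trailing "*";
# NAME may not contain parentheses.
COMMENT_RE = re.compile(r"(.*?)\s*\(([^()]*)\)(\*?)")


def split_comment(field):
    """Split 'NAME COMMENT' into (name, comment, starred).

    Returns (field, None, False) when there is no trailing COMMENT.
    """
    m = COMMENT_RE.fullmatch(field)
    if m is None:
        return field, None, False
    return m.group(1), m.group(2), m.group(3) == "*"


def char_name(name_field, iso_style):
    """NAME_LINE: the comment is stripped unless laying out an ISO-style list."""
    return name_field if iso_style else split_comment(name_field)[0]


def block_title(blockname_field, iso_style):
    """BLOCKHEADER: in ISO-style output a comment replaces the block name."""
    name, comment, _ = split_comment(blockname_field)
    if iso_style and comment is not None:
        return comment
    return name


# Hypothetical input: a block name carrying an ISO-style title as its comment.
print(block_title("BASIC LATIN (Hypothetical ISO Title)", iso_style=False))
# BASIC LATIN
print(block_title("BASIC LATIN (Hypothetical ISO Title)", iso_style=True))
# Hypothetical ISO Title
```
</pre>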

<p><strong>Notes:</strong></p>

<ul>
  <li>Special lookahead logic prevents a mention of a four-digit standard, such as ISO 9999,
    from having its number misinterpreted as a CHAR.</li>
  <li>Latin-1 characters are supported by unibook.exe, but not portably unless the file is
    encoded as UTF-16LE.</li>
  <li>The final LF in the file must be present.</li>
  <li>A CHAR inside ' or &quot; is expanded, but only its glyph image is printed; the
    code value is not echoed.</li>
  <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
    Apostrophes are supported, but nested quotes are not.</li>
</ul>
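
<p>The EXPAND_LINE rules and the notes above can be approximated by the following Python sketch. It is not the unibook.exe algorithm: it recognizes only four-digit CHARs, stands in for the lookahead with a hypothetical rule that leaves a number preceded by "ISO " unchanged, treats a character as combining when its canonical combining class (unicodedata.combining) is non-zero, which is only an approximation, and does not convert straight quotes to curly quotes.</p>

<pre>
```python
import re
import unicodedata

NBSP = "\u00a0"
DOTTED_CIRCLE = "\u25cc"

# One left-to-right pass over two alternatives:
#   1. a four-digit CHAR inside matching ' or " quotes
#   2. a bare four-digit CHAR, optionally preceded by "ISO "
#      (a hypothetical stand-in for unibook's lookahead logic)
TOKEN = re.compile(r"""(["'])([0-9A-F]{4})\1|(ISO )?\b([0-9A-F]{4})\b""")


def glyph(code):
    """Character for a CHAR; combining marks are shown on a dotted circle.

    Approximation: 'combining' means a non-zero canonical combining class.
    """
    ch = chr(int(code, 16))
    return DOTTED_CIRCLE + ch if unicodedata.combining(ch) else ch


def expand_line(text):
    """Expand the CHARs in an EXPAND_LINE as described in section 1.3."""
    def replace(m):
        if m.group(2):  # quoted CHAR: print only the glyph, not the code
            return m.group(1) + glyph(m.group(2)) + m.group(1)
        if m.group(3):  # e.g. "ISO 9999" names a standard, not a CHAR
            return m.group(0)
        code = m.group(4)
        return code + NBSP + glyph(code) + NBSP
    return TOKEN.sub(replace, text)


print(expand_line("see 0041 and '0301', cf. ISO 9999"))
```
</pre>

<p>Because the bare-CHAR pattern requires word boundaries on both sides, a five-digit number such as the 10646 in "ISO 10646" is never taken as a CHAR; like any scanner of this kind, however, the sketch would expand an upper-case word made only of hex letters, such as DEAD.</p>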
</body>
</html>