resources/encodings/apple/README.TXT

   1 #=======================================================================
   2 #   FTP file name:  README.TXT
   3 #
   4 #   Contents:       Background information on Unicode mapping tables
   5 #                   for Mac OS text encodings
   6 #
   7 #   Copyright:      (c) 1995-1998 by Apple Computer, Inc., all rights
   8 #                   reserved.
   9 #
  10 #   Contacts:       Peter Edberg <pedberg@apple.com>
  11 #                   Julio Gonzalez <juliog@apple.com>
  12 #                   John Jenkins <jenkins@apple.com>
  13 #
  14 #   Changes:
  15 #
  16 #       n07  1998-Feb-05    Rewrite to provide additional information
  17 #                           relevant to using the accompanying mapping
  18 #                           tables, and to delete some extraneous
  19 #                           information. Delete Bulgarian (no special
  20 #                           encoding, uses standard Cyrillic), add
  21 #                           Farsi, Devanagari, Gurmukhi, Gujarati,
  22 #                           Celtic, Gaelic, Inuit, Tibetan.
  23 #       n04  1995-Nov-15    Update info for Hebrew and Thai
  24 #       n03  1995-Apr-15    First version (after fixing some typos).
  25 #
  26 ##################
  27
  28 0. Preliminaries
  29 ----------------
  30
  31 For maximum interchangeability, this file and the accompanying Mac OS
  32 mapping tables use only ASCII characters. They are intended to be
  33 displayed in a monospaced font.
  34
  35 Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple
  36 Computer, Inc., registered in the United States and other countries.
  37 QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is
  38 a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems
  39 Inc., which may be registered in certain jurisdictions. IBM is a
  40 registered trademark of International Business Machines Corporation. ITC
  41 Zapf Dingbats is a registered trademark of the International Typeface
  42 Corporation. For the sake of brevity, throughout this document and the
  43 accompanying tables, "Macintosh" can be used to refer to Macintosh
  44 computers and "Unicode" can be used to refer to the Unicode standard.
  45
  46 Apple Computer, Inc. ("Apple") makes no warranty or representation,
  47 either express or implied, with respect to this document and the
  48 accompanying tables, their quality, accuracy, or fitness for a
  49 particular purpose. In no event will Apple be liable for direct,
  50 indirect, special, incidental, or consequential damages resulting from
  51 any defect or inaccuracy in this document or the accompanying tables.
  52
  53 1. Introduction
  54 ---------------
  55
  56 This document summarizes some Unicode mapping considerations that are
  57 relevant for the accompanying mapping tables. It also provides an
  58 overview of Mac OS encodings.
  59
  60 These mapping tables and character lists are subject to change.
  61 The latest tables should be available from the following:
  62
  63 <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
  64 <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
  65
  66 2. Round-trip fidelity and overview of mapping techniques
  67 ---------------------------------------------------------
  68
  69 For a particular set of national and international standards, Unicode
  70 provides round-trip fidelity: Text in one of those encodings can be
  71 mapped to Unicode and back again, yielding the original characters.
  72 Characters which are distinct in one of these source standards have
  73 a distinct counterpart in Unicode. Note that this counterpart might not
  74 be a single Unicode character; as is pointed out in "The Unicode
  75 Standard, Version 2.0" (page 2-10), "sometimes a single code value in
  76 another standard corresponds to a sequence of code values in the Unicode
  77 Standard, or vice versa."
  78
  79 However, Unicode does not attempt to provide round-trip fidelity for
  80 most vendor standards. Nevertheless, Apple and other platform vendors
  81 may need to provide such round-trip fidelity for their current encodings
  82 (this can be important in file systems, for example). In order to do
  83 this, Apple makes use of some Unicode characters in the corporate-use
  84 zone (the upper end of the private use area).
  85
  86 Corporate-zone characters must be used with care. Indiscriminate use of
  87 such characters can result in text which is not easily interchanged with
  88 other systems, since these characters have no standard meaning outside a
  89 particular platform. The mappings provided here are intended to minimize
  90 the use of private use characters, or to use them in such a way that
   91 basic text content will not be lost if the corporate-zone characters are
  92 dropped when text is transferred to another system.
  93
  94 The tables provided here have three goals, in the following order of
  95 importance:
  96 1. Provide 100% round-trip mapping from a Mac OS encoding to Unicode
  97 and back (even if the mappings here are converted to maximal
  98 decompositions, see below).
  99 2. Map characters in a Mac OS encoding into the Unicode characters
 100 that best represent the interpretation and usage of the Mac OS
 101 characters.
 102 3. When mapping text in a Mac OS encoding to Unicode using the tables,
 103 the resulting Unicode text should be as interchangeable as possible.
 104
 105 To satisfy these goals, the mappings use a variety of techniques. First
 106 we attempt to achieve round-trip mappings using any standard Unicode
 107 feature at our disposal, without resorting to corporate-zone characters.
  108 This can include the following techniques:
 109 - Use of all Unicode characters defined in Unicode 2.0, including
 110   compatibility characters.
 111 - Mapping a single character in a Mac OS encoding to a sequence of
 112   standard Unicode characters, or vice versa. This requires grouping
 113   characters into appropriate chunks for lookup before mapping them
 114   (this mainly applies to sequences of Unicode characters).
 115 - Using Unicode direction overrides to force direction attributes when
 116   mapping to Unicode. This requires resolution of Unicode character
 117   direction, and use of this information, when mapping from Unicode back
 118   to certain Mac OS encodings.
 119 The requirements imposed on Unicode handling are necessary for other,
 120 non-transcoding operations in a full Unicode implementation anyway, so
 121 requiring them for transcoding should not impose much of a burden.
 122
 123 Next, if round-trip fidelity cannot be achieved using the above
 124 techniques, we attempt to use corporate-zone characters only as
 125 "transcoding hints" (more on this below). These are combined with one or
 126 more standard Unicode characters to mark them as special for
 127 transcoding, but have no other function and can be deleted with no loss
 128 of basic text content (only of round-trip fidelity).
 129
  130 Finally, if a character in a Mac OS encoding is unrelated to any Unicode
  131 character or Unicode sequence, we may map it to a single corporate-zone
  132 Unicode code point.
 133
 134 These techniques are described in more detail in the following sections.
 135
 136 Some clients of these tables may have a different set of goals. For
 137 example, some clients may prefer to avoid compatibility characters,
 138 perhaps sacrificing round-trip fidelity if necessary. In most cases it
 139 is fairly easy to construct other types of mappings from the mappings
 140 given here. In particular, the mappings here have been designed so that
 141 if they are converted to maximal decomposition mappings (by recursive
 142 application of the canonical decompositions in the Unicode database),
  143 the resulting mappings will still provide 100% round-trip fidelity.
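
As an illustration of the maximal-decomposition property, the following
Python sketch decomposes a small sample table and checks that the targets
stay distinct. It is not part of the accompanying tables. It assumes that
Unicode normalization form NFD (via the standard unicodedata module) is an
acceptable stand-in for "recursive application of the canonical
decompositions"; the names SAMPLE_TABLE, to_maximal_decomposition, and
has_distinct_targets are hypothetical.

```python
import unicodedata

# Sample strict table: Mac OS Roman code -> Unicode string.
# Only four entries are shown; a real table covers the whole encoding.
SAMPLE_TABLE = {
    0x41: "\u0041",  # LATIN CAPITAL LETTER A
    0x61: "\u0061",  # LATIN SMALL LETTER A
    0x80: "\u00C4",  # LATIN CAPITAL LETTER A WITH DIAERESIS
    0x8A: "\u00E4",  # LATIN SMALL LETTER A WITH DIAERESIS
}


def to_maximal_decomposition(table):
    """Return a copy of the table with every target canonically decomposed.

    Assumption: NFD is used here as the maximal canonical decomposition.
    """
    return {code: unicodedata.normalize("NFD", text)
            for code, text in table.items()}


def has_distinct_targets(table):
    """True if no two source codes share the same Unicode target."""
    return len(set(table.values())) == len(table)


decomposed = to_maximal_decomposition(SAMPLE_TABLE)
```

Distinct targets are necessary for round-trip fidelity but not sufficient
on their own: the reverse mapping must also group a base character with its
combining marks before lookup, so that "a" followed by COMBINING DIAERESIS
is not mapped back as a bare "a".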
 144
 145 There is one more round-trip issue that should be mentioned. If a
 146 Unicode character or sequence can be mapped at all into a particular
 147 Mac encoding, then the reverse mapping back to Unicode should yield
 148 the original Unicode character or sequence (except for possible
 149 differences in direction overrides or other Unicode characters in the
 150 "Other, Format" category). The tables here also provide this. For a
 151 related issue, see the next section.
 152
 153 3. Mapping tolerance: Strict and loose
 154 --------------------------------------
 155
 156 In many character sets, a single character may have multiple semantics,
 157 either by explicit definition, ambiguous definition, or established
 158 usage. For example, the JIS character 0x2142, or 0x8161 in Shift-JIS,
 159 is specified in the JIS X0208 standard to have two meanings: "double
 160 vertical line" and "parallel". Each of these meanings corresponds to a
 161 different Unicode character: 0x2016 DOUBLE VERTICAL LINE and 0x2225
 162 PARALLEL TO. When mapping from Unicode to Shift-JIS, it is normally
 163 desirable to map both of these Unicode characters to the single
 164 Shift-JIS character. However, when mapping the Shift-JIS character to
 165 Unicode, we can choose only one of the possible Unicode characters.
 166
 167 For two encodings X and Y, we can define a set of "strict" mappings
 168 from one to the other as follows: If text in X can be mapped to Y using
 169 the strict mappings from X to Y, then the resulting text can be mapped
 170 back using the strict mappings from Y to X to end up with the original
 171 text from X. Similarly, if text in Y can be mapped to X using the strict
 172 mappings from Y to X, then the resulting text can be mapped back using
 173 the strict mappings from X to Y to end up with the original text from Y.
 174
 175 There may be several characters in one encoding that all map to a
 176 single character in another encoding, but only one of these mappings
 177 can be strict; the others are "loose".
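
The strict/loose distinction can be sketched in Python using the Shift-JIS
example above. This is an illustration only: the choice of 0x2016 as the
strict partner of 0x8161 is an assumption made for the example (the
accompanying Japanese table is the authority), and the table and function
names are hypothetical.

```python
# Strict mappings pair up one-to-one, so they can be inverted safely.
# Assumption: 0x2016 is treated as the strict target of Shift-JIS 0x8161.
STRICT_SJIS_TO_UNICODE = {0x8161: 0x2016}  # DOUBLE VERTICAL LINE
STRICT_UNICODE_TO_SJIS = {u: s for s, u in STRICT_SJIS_TO_UNICODE.items()}

# Loose mappings are extra one-way entries; they do not round-trip.
LOOSE_UNICODE_TO_SJIS = {0x2225: 0x8161}   # PARALLEL TO


def unicode_to_sjis(code_point, allow_loose=False):
    """Map one Unicode code point to Shift-JIS, or return None if unmapped."""
    if code_point in STRICT_UNICODE_TO_SJIS:
        return STRICT_UNICODE_TO_SJIS[code_point]
    if allow_loose and code_point in LOOSE_UNICODE_TO_SJIS:
        return LOOSE_UNICODE_TO_SJIS[code_point]
    return None
```

With allow_loose=False, 0x2225 is reported as unmappable; with
allow_loose=True it maps to 0x8161, but mapping that result back to
Unicode yields 0x2016 rather than the original 0x2225.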
 178
 179 The mappings given in the accompanying tables are strict mappings.
 180 However, the Mac OS Text Encoding Converter also supports loose
 181 mappings and fallback mappings. Some of the accompanying tables provide
 182 suggestions about possible loose mappings.
 183
 184 4. Mapping a Mac encoding character to a Unicode sequence or vice versa
 185 -----------------------------------------------------------------------
 186
 187 In some cases, a character in a Mac OS encoding maps to a sequence of
 188 Unicode characters. For example, the Mac OS Japanese encoding includes
 189 a character for the circled CJK ideograph "big". Although Unicode
 190 encodes other circled ideographs as single characters, it does not
 191 encode this one. However, this character can be unambiguously
 192 represented in Unicode as the Unicode sequence 0x5927+0x20DD, the CJK
 193 ideograph for "big" followed by COMBINING ENCLOSING CIRCLE.
 194
 195 To handle the reverse mapping, a transcoding process must group the
  196 Unicode sequence 0x5927+0x20DD as a single element for lookup (the
 197 Mac OS Text Encoding Converter does this).
 198
 199 In a few cases, a sequence of characters in a Mac OS encoding must
 200 be grouped for mapping to a single Unicode character or a sequence
 201 of Unicode characters. For example, in Mac OS Devanagari (based on
 202 ISCII-91), DEVANAGARI LETTER VOCALIC L is represented as 0xA6+0xE9;
 203 but this is represented in Unicode by the single character 0x090C.
 204 Furthermore, explicit halant is represented in Mac OS Devanagari as
 205 0xE8+0xE8 (double halant) and in Unicode as 0x094D+0x200C (VIRAMA
 206 plus ZERO WIDTH NON-JOINER). The latter can also be considered as
 207 a context-dependent mapping of 0xE8, halant.
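
The grouping requirement can be sketched as a longest-match lookup. The
Python example below is a minimal illustration, not the Text Encoding
Converter's algorithm. The two-byte entries come from the text above; the
single-byte entries (0xA6 as DEVANAGARI LETTER I, 0xE9 as nukta) are
assumptions based on ISCII-91 and should be checked against the
accompanying Devanagari table. All names are hypothetical.

```python
# Mac OS Devanagari byte sequences -> Unicode strings.
MAC_DEVANAGARI_TO_UNICODE = {
    (0xA6, 0xE9): "\u090C",        # DEVANAGARI LETTER VOCALIC L (from the text)
    (0xE8, 0xE8): "\u094D\u200C",  # explicit halant: VIRAMA + ZWNJ (from the text)
    (0xE8,): "\u094D",             # halant alone -> VIRAMA
    (0xA6,): "\u0907",             # assumed: DEVANAGARI LETTER I
    (0xE9,): "\u093C",             # assumed: DEVANAGARI SIGN NUKTA
}
MAX_KEY_LEN = max(len(key) for key in MAC_DEVANAGARI_TO_UNICODE)


def decode(data: bytes) -> str:
    """Decode by trying the longest byte sequence first at each position."""
    out = []
    i = 0
    while i < len(data):
        for n in range(min(MAX_KEY_LEN, len(data) - i), 0, -1):
            key = tuple(data[i:i + n])
            if key in MAC_DEVANAGARI_TO_UNICODE:
                out.append(MAC_DEVANAGARI_TO_UNICODE[key])
                i += n
                break
        else:
            raise ValueError(f"no mapping for byte 0x{data[i]:02X}")
    return "".join(out)
```

Trying the two-byte keys first is what makes 0xE8+0xE8 decode as explicit
halant rather than as two separate viramas.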
 208
  209 Loose mappings from Unicode to a Mac OS encoding often map a single
  210 Unicode character to a sequence of characters in the Mac OS encoding.
  211 For example, the Unicode character 0x00BD VULGAR FRACTION ONE HALF cannot
  212 be mapped into the Mac OS Roman character set as a single character, but
  213 it has a loose mapping to the sequence 0x31+0xDA+0x32, "digit one" +
  214 "fraction slash" + "digit two".
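
A loose fallback of this kind might look like the following Python sketch.
It relies on Python's built-in "mac_roman" codec for the strict mappings,
which is assumed here to agree with the accompanying Roman table for the
characters involved; the fallback table and the function name are
hypothetical.

```python
# Loose one-to-many fallbacks, consulted only when strict encoding fails.
LOOSE_UNICODE_TO_MAC_ROMAN = {
    "\u00BD": b"\x31\xDA\x32",  # 1/2 -> digit one + fraction slash + digit two
}


def encode_mac_roman(text: str, allow_loose: bool = True) -> bytes:
    """Encode to Mac OS Roman, optionally falling back to loose mappings."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("mac_roman")          # strict mapping
        except UnicodeEncodeError:
            if allow_loose and ch in LOOSE_UNICODE_TO_MAC_ROMAN:
                out += LOOSE_UNICODE_TO_MAC_ROMAN[ch]
            else:
                raise
    return bytes(out)
```

Decoding the result yields "1", FRACTION SLASH, "2" rather than the
original 0x00BD, which is why this mapping is loose and not strict.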
 215
 216 In some cases a Unicode character such as a direction override may
 217 simply be discarded when mapping to a Mac OS encoding, since the
 218 information carried by the override may be represented in a different
 219 way by the Mac OS encoding. See the next section for an example.
 220
 221 5. Mappings that depend on directionality (or other attributes)
 222 ---------------------------------------------------------------
 223
 224 Strict mappings from Unicode to Mac OS encodings may depend on resolved
 225 character direction. Loose mappings may depend on additional attributes
 226 such as the state of symmetric swapping and whether the text should use
 227 vertical form codes if available (i.e. whether the text is intended for
 228 vertical display on a system that cannot automatically substitute
 229 vertical forms).
 230
 231 a) Resolved character direction
 232
 233 The Mac OS Arabic and Hebrew character sets were developed in 1986-1987.
 234 At that time the bidirectional line layout algorithm used in the Mac OS
 235 was fairly simple; it used only a few direction classes (instead of the
 236 13 or so now used in the Unicode bidirectional algorithm). In order to
 237 permit users to handle some tricky layout problems, certain punctuation
 238 and symbol characters have duplicate code points, one with a left-right
 239 direction attribute and the other with a right-left direction attribute.
 240
 241 For example, plus sign is encoded at 0x2B with a left-right attribute,
 242 and at 0xAB with a right-left attribute. However, there is only one PLUS
 243 SIGN character in Unicode. This leads to some interesting problems when
 244 mapping between Mac OS Arabic or Hebrew and Unicode.
 245
 246 We need a way to map both of these plus signs to Unicode and back. Using
 247 a single corporate character for one of these plus signs is not a good
 248 solution, since both of the plus sign characters are likely to be used
 249 in text that is interchanged, and thus content would be lost.
 250
 251 The problem is solved with the use of direction override characters and
 252 direction-dependent mappings. When mapping from Mac OS Arabic or Hebrew
 253 to Unicode, we use direction overrides as necessary to force the
 254 direction of the resulting Unicode characters. When mapping back from
 255 Unicode, the Unicode bidirectional algorithm should be used to determine
 256 resolved direction of the Unicode characters. The mapping from Unicode
 257 to Mac OS Arabic or Hebrew can then be disambiguated as necessary by
 258 using the resolved direction.
 259
 260 For example, when mapping from Mac OS Arabic or Hebrew, we can use
 261 LEFT-RIGHT OVERRIDE (LRO), RIGHT-LEFT OVERRIDE (RLO), and POP DIRECTION
 262 FORMATTING (PDF) as follows:
 263
 264   0x2B ->  0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
 265   0xAB ->  0x202E (RLO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
 266
 267 When mapping back, we resolve the direction of the Unicode character
 268 0x002B, and use this information to determine which of the Mac OS
 269 encoding characters to use:
 270
 271   0x002B -> 0x2B (if LR) or 0xAB (if RL)
 272
 273 After direction overrides have been used in this way to force a
 274 particular resolved direction, they may be discarded when mapping from
 275 Unicode to Mac OS Arabic and Hebrew (since the information they carried
 276 in Unicode is represented in the Mac OS encoding by the code point of
 277 the plus sign).
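
The plus sign round trip can be sketched in Python as follows. This is a
deliberately simplified illustration: it resolves direction from explicit
overrides only, whereas a real implementation should run the full Unicode
bidirectional algorithm. Treating a plus sign outside any override as
left-right is an assumption of the sketch, and the function names are
hypothetical.

```python
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"
PLUS = "\u002B"


def plus_to_unicode(mac_byte: int) -> str:
    """Map a Mac OS Arabic/Hebrew plus sign to Unicode with an override."""
    if mac_byte == 0x2B:
        return LRO + PLUS + PDF   # left-right plus sign
    if mac_byte == 0xAB:
        return RLO + PLUS + PDF   # right-left plus sign
    raise ValueError("not a plus sign code point")


def plus_from_unicode(text: str) -> bytes:
    """Map plus signs back, discarding the overrides once they are used."""
    out = bytearray()
    overrides = []                # stack of active LRO/RLO characters
    for ch in text:
        if ch in (LRO, RLO):
            overrides.append(ch)
        elif ch == PDF:
            if overrides:
                overrides.pop()
        elif ch == PLUS:
            # Assumption: no active override means left-right.
            right_left = bool(overrides) and overrides[-1] == RLO
            out.append(0xAB if right_left else 0x2B)
        else:
            raise ValueError("sketch handles plus signs and overrides only")
    return bytes(out)
```

Note that the override characters produce no output bytes of their own;
their information survives in the choice between 0x2B and 0xAB.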
 278
 279 Even when not required for round-trip fidelity, direction overrides
 280 may be used when mapping from a Mac OS encoding to Unicode in order to
 281 preserve proper text layout. For example, the single Mac OS Arabic
 282 ellipsis character has direction class right-left, while the Unicode
 283 HORIZONTAL ELLIPSIS character has direction class neutral. When
 284 mapping the Mac OS ellipsis to Unicode, it is surrounded with a
 285 direction override to help preserve proper text layout. However,
 286 resolved direction is not needed or used when mapping the Unicode
 287 HORIZONTAL ELLIPSIS back to Mac OS Arabic.
 288
 289 b) Symmetric swapping
 290
 291 In loose mappings from Unicode to the Mac OS Arabic character set, the
 292 state of symmetric swapping (which may be changed by the Unicode
 293 characters 0x206A, 0x206B) affects the mapping of paired characters such
 294 as punctuation and brackets. This does not affect the strict mappings
 295 given in the accompanying tables.
 296
 297 c) Horizontal or vertical display
 298
 299 The Mac OS Japanese encoding includes separately-encoded vertical forms
 300 for some punctuation and kana. When Unicode characters in the CJK
 301 punctuation and kana ranges are mapped to Mac OS Japanese characters and
 302 (1) those characters are intended for vertical display, (2) they will be
 303 displayed in an environment that does not provide automatic vertical
 304 form substitution, and (3) loose mappings are desired, the Unicode
 305 characters can be mapped to the corresponding vertical form codes in the
 306 Mac OS Japanese encoding.
 307
 308 This does not affect mapping of the Unicode vertical presentation forms
 309 (which always map to the Mac OS Japanese vertical form codes).
 310
 311 6. Use of corporate characters
 312 ------------------------------
 313
 314 Apple has defined a block of 32 corporate characters as "transcoding
 315 hints." These are used in combination with standard Unicode characters
 316 to force them to be treated in a special way for mapping to other
 317 encodings; they have no other effect. Sixteen of these transcoding
 318 hints are "grouping hints" - they indicate that the next 2-4 Unicode
 319 characters should be treated as a single entity for transcoding. The
 320 other sixteen transcoding hints are "variant tags" - they are like
  321 combining characters, and can follow a standard Unicode character (or a
  322 sequence consisting of a base character and other combining characters)
  323 to cause it to be treated in a special way for transcoding. These always
  324 terminate a combining-character sequence.
 325
 326 Whenever possible, mappings that require corporate-zone characters
 327 use standard Unicode characters in combination with a single
 328 transcoding hint (no mapping uses more than one transcoding hint).
 329 For these mappings, even if the corporate-zone characters are lost in
 330 interchange, the basic text content will be preserved.
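
The claim that transcoding hints can be dropped without losing basic text
content can be illustrated with a short Python sketch. The code point
ranges below are an assumption (grouping hints at 0xF860-0xF86F and variant
tags at 0xF870-0xF87F); this document does not list them, and CORPCHAR.TXT
is the authority. The function name and the sample string are hypothetical.

```python
# Assumed ranges for Apple's 32 transcoding hints; verify in CORPCHAR.TXT.
GROUPING_HINTS = range(0xF860, 0xF870)  # 16 grouping hints
VARIANT_TAGS = range(0xF870, 0xF880)    # 16 variant tags


def strip_transcoding_hints(text: str) -> str:
    """Remove transcoding hints, keeping the standard Unicode characters.

    Basic text content is kept; round-trip fidelity to the Mac OS
    encoding is lost.
    """
    return "".join(
        ch for ch in text
        if ord(ch) not in GROUPING_HINTS and ord(ch) not in VARIANT_TAGS
    )


# Hypothetical sample: a grouping hint followed by two grouped characters.
sample = "\uF860" + "AB"
plain = strip_transcoding_hints(sample)
```

Characters that map to a single corporate-zone code point, such as the
Apple logo, are outside these ranges and are left untouched by this
function.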
 331
 332 However, some characters in a Mac OS encoding - such as the Apple
 333 logo character - bear no relation to any standard Unicode character.
  334 In these cases, the Mac OS character is mapped to a single
  335 corporate-zone character defined by Apple. Fewer than 15 corporate
  336 characters are used in this way.
 337
 338 All of the corporate characters defined by Apple are listed in the
 339 accompanying file "CORPCHAR.TXT", including old Apple corporate
 340 character assignments which are now deprecated (but which are still
 341 supported as loose mappings by the Mac OS Text Encoding Converter).
 342
 343 7. Font variants
 344 ----------------
 345
 346 For some Mac OS encodings, certain fonts used with that encoding may
 347 actually implement a slight variant of the standard encoding specified
 348 in the accompanying mapping tables. The header comments in the mapping
 349 table files for each encoding describe any font variants associated with
 350 that encoding.
 351
 352 8. Mac OS encodings
 353 -------------------
 354
 355 The Mac OS can support multiple encodings. In the current Mac OS
 356 architecture these encodings are distinguished primarily by script code:
 357 font family IDs are grouped into ranges, and each range is associated
 358 with a script code.
 359
 360 In some cases, there are several encodings that share a single script
 361 code. Usually these are closely related. To distinguish among these,
 362 additional information is required, such as font name or system
 363 region code (locale code).
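
The selection logic can be sketched as a table lookup in Python. The
script codes, region codes, and font names are those given in the list of
encodings later in this section; the function and table names are
hypothetical, and checking the font name before the region code is an
assumption of the sketch, not documented Mac OS behavior.

```python
# Default encoding for each script code.
DEFAULT_BY_SCRIPT = {
    0: "Roman", 1: "Japanese", 2: "Chinese Traditional", 3: "Korean",
    4: "Arabic", 5: "Hebrew", 7: "Cyrillic", 9: "Devanagari",
    10: "Gurmukhi", 11: "Gujarati", 21: "Thai",
    25: "Chinese Simplified", 26: "Tibetan", 29: "Central European",
}

# Special cases keyed by (script code, system region code).
BY_SCRIPT_AND_REGION = {
    (0, 24): "Turkish",
    (0, 68): "Croatian", (0, 66): "Croatian", (0, 25): "Croatian",
    (0, 21): "Icelandic", (0, 47): "Icelandic",
    (0, 39): "Romanian",
    (0, 50): "Celtic", (0, 75): "Celtic", (0, 76): "Celtic",
    (0, 77): "Celtic", (0, 79): "Celtic",
    (0, 81): "Gaelic",
    (0, 20): "Greek",
    (4, 48): "Farsi",
    (7, 62): "Ukrainian",
    (28, 78): "Inuit",
}

# Special cases for script code 0, keyed by font name.
BY_FONT_NAME = {"Symbol": "Symbol", "Zapf Dingbats": "Dingbats"}


def select_encoding(script_code, region_code=None, font_name=None):
    """Return the encoding name, or None if this document lists none."""
    if script_code == 0 and font_name in BY_FONT_NAME:
        return BY_FONT_NAME[font_name]
    special = BY_SCRIPT_AND_REGION.get((script_code, region_code))
    if special is not None:
        return special
    return DEFAULT_BY_SCRIPT.get(script_code)
```

For example, select_encoding(0, region_code=24) returns "Turkish", while
select_encoding(6) returns None because no encoding is currently listed
for smGreek.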
 364
 365 The encodings described here (and in the accompanying tables) are the
 366 encodings used in Mac OS versions 7.1 and later. In some cases, certain
 367 earlier system versions have used different encodings.
 368
 369 In all Mac OS encodings, character codes 0x00-0x7F are identical to
 370 ASCII, except that
 371   - in Mac OS Japanese, reverse solidus is replaced by yen sign
 372   - in Mac OS Arabic, Farsi, and Hebrew, some of the punctuation in this
 373     range is treated as having strong left-right directionality,
  374     although the corresponding Unicode characters have neutral
  375     directionality
 376 Fonts used as "system" fonts (for menus, dialogs, etc.) have four glyphs
 377 at code points 0x11-0x14 for transient use by the Menu Manager. These
 378 glyphs are not intended as characters for use in normal text, and the
 379 associated code points are not generally interpreted as associated with
 380 these glyphs. (However, a "system font variant" mapping table could
 381 provide mappings for these).
 382
 383 Note that in general, character sets cannot be determined from font
 384 layouts (they are not the same thing!). This is very noticeable with
 385 Arabic, Hebrew, and Devanagari, for example.
 386
 387 The following is a list of current Mac OS encodings. The accompanying
 388 tables provide mappings from these encodings to Unicode 2.0.
 389
 390 a) Mac OS encodings for script code 0, smRoman.
 391
 392 * Roman - this is the default for script code 0 (when the special
 393   cases listed below do not apply). It covers several western European
 394   languages, and includes math operators and various symbols.
 395
 396 * Symbol - this is the encoding for the font named "Symbol". It includes
 397   Greek letters, math operators, and miscellaneous symbols. The layout
 398   of the Symbol character set is identical to the layout of the Adobe
 399   Symbol encoding vector, with the addition of the Apple logo at 0xF0.
 400
 401 * Dingbats - this is the encoding for the font named "Zapf Dingbats".
 402   The layout of the Dingbats character set is identical to or a superset
 403   of the layout of the Adobe Zapf Dingbats encoding vector.
 404
 405 * Turkish - this is the encoding if the script code is 0 and the system
 406   region code is 24, verTurkey. It has 7 code point differences from
 407   Mac OS Roman.
 408
 409 * Croatian - this is the encoding if the script code is 0 and the system
 410   region code is any of the following:
 411     68, verCroatia
  412     66, verSlovenian
  413     25, verYugoCroatian (only used in older systems)
 414   It has 20 code point differences from standard Roman, but only 10
 415   differences in repertoire.
 416
 417 * Icelandic - this is the encoding if the script code is 0 and the
 418   system region code is either of the following:
 419     21, verIceland
  420     47, verFaroeIsl
 421   It has 6 code point differences from standard Roman. It also has one
 422   font variant.
 423
 424 * Romanian - this is the encoding if the script code is 0 and the system
  425   region code is 39, verRomania. It has 6 code point differences from
 426   standard Roman.
 427
 428 * Celtic - this is the encoding if the script code is 0 and the system
 429   region code is any of the following:
 430     50, verIreland
  431     75, verScottishGaelic
  432     76, verManxGaelic
  433     77, verBreton
  434     79, verWelsh
 435   It is a variant of Mac OS Roman with a few extra accented characters
 436   for Welsh.
 437
 438 * Gaelic - this is the encoding if the script code is 0 and the system
 439   region code is 81, verIrishGaelicScript. It is a variant of Mac OS
 440   Roman, and supports the older Irish orthography using dot above.
 441
 442 * Greek (monotonic) - this is the encoding if the script code is 0 and
 443   the system region code is 20, verGreece. Although a script code is
 444   defined for Greek, the Greek localized system does not use it (the
 445   font family IDs are in the smRoman range). This encoding is based on
 446   the ISO/IEC 8859-7 repertoire with additional Roman characters for
 447   French and German, as well as additional symbols. Greek system 4.1
 448   used a different encoding that matched 8859-7 code points for Greek
 449   letters. Greek system 6.0.7 also used a variant of the standard
 450   encoding, but it was quickly replaced by Greek system 6.0.7.1 which
 451   used the standard encoding.
 452
 453   See also the Central European encoding under script code 29 below.
 454
 455 b) Mac OS encodings for script code 1, smJapanese.
 456
 457 * Japanese - this is the default for script code 1. It is based on a
 458   Shift-JIS implementation of JIS X0208-1990 ("fullwidth") and
 459   JIS X0201-1976 ("halfwidth"), with 5 additional one-byte characters
 460   and one modified character, a set of Apple extension characters which
 461   include many industry standard extensions, and separate codes for
 462   vertical forms of some punctuation and kana. There are several font
 463   variants.
 464
 465 c) Mac OS encodings for script code 2, smTradChinese.
 466
 467 * Chinese Traditional - this is an extension of Big-5.
 468
 469 d) Mac OS encodings for script code 3, smKorean.
 470
 471 * Korean - this is an extension of EUC-KR.
 472
 473 e) Mac OS encodings for script code 4, smArabic.
 474
 475 * Arabic - This is the default for script code 4 (when the special
 476   case listed below does not apply). It is based on the ISO/IEC 8859-6
 477   repertoire, with additional Arabic letters for Persian and Urdu and
 478   with accented Roman letters for European languages. It has the
 479   interesting feature mentioned above that certain ASCII punctuation
 480   and symbol characters are encoded twice, once for each direction. It
 481   has several font variants.
 482
 483 * Farsi - This is the encoding if the script code is 4 and the system
 484   region code is 48, verIran. It is similar to Mac OS Arabic, but has
 485   the "extended" or Persian digits instead of the standard Arabic
 486   digits. It has one font variant.
 487
 488 f) Mac OS encodings for script code 5, smHebrew.
 489
 490 * Hebrew - This is based on the ISO/IEC 8859-8 Hebrew letter repertoire,
 491   but adds Hebrew points, some Hebrew ligatures, some accented Roman
 492   letters for European languages, and some non-ASCII punctuation. As
 493   with Mac OS Arabic, certain ASCII punctuation and symbol characters
 494   are encoded twice, once for each direction. This is also true for the
 495   European digits. This has one font variant.
 496
 497 g) Mac OS encodings for script code 6, smGreek.
 498
 499   None currently - see smRoman.
 500
 501 h) Mac OS encodings for script code 7, smCyrillic.
 502
 503 * Cyrillic - this is the default for script code 7 (when the special
 504   cases listed below do not apply). It is based on the ISO/IEC 8859-5
 505   Cyrillic character repertoire.
 506
 507 * Ukrainian - this is the encoding if the script code is 7 and the
 508   system region code is 62, verUkraine; it is also the encoding used for
 509   the Cyrillic Language Kit. It has 2 code point differences from
 510   standard Cyrillic (it adds a case pair for GHE WITH UPTURN).
 511
 512 i) Mac OS encodings for script code 9, smDevanagari.
 513
 514 * Devanagari - This is based on IS 13194:1991 (ISCII-91), and adds some
 515   punctuation and symbols.
 516
 517 j) Mac OS encodings for script code 10, smGurmukhi.
 518
 519 * Gurmukhi - This is based on IS 13194:1991 (ISCII-91), and adds some
 520   punctuation and symbols.
 521
 522 k) Mac OS encodings for script code 11, smGujarati.
 523
 524 * Gujarati - This is based on IS 13194:1991 (ISCII-91), and adds some
 525   punctuation and symbols.
 526
 527 l) Mac OS encodings for script code 21, smThai.
 528
 529 * Thai - This is based on TIS 620-2533, except that three of the
 530   TIS 620-2533 characters are replaced with other characters. Some
 531   undefined code points in TIS 620-2533 are used for additional
 532   punctuation characters.
 533
 534 m) Mac OS encodings for script code 25, smSimpChinese.
 535
 536 * Chinese Simplified - this is an extension of EUC-CN.
 537
 538 n) Mac OS encodings for script code 26, smTibetan.
 539
 540 * Tibetan
 541
 542 o) Mac OS encodings for script code 28, smEthiopic.
 543
 544 * Inuit - this is the encoding if the script code is 28 and the
 545   system region code is 78, verNunavut (for Inuktitut language).
 546   There is no script code for Inuit, so it shares the script code
 547   with Ethiopic.
 548
 549 p) Mac OS encodings for script code 29, smCentralEuroRoman.
 550
 551 * Central European - This is similar to standard Roman, but with a
 552   different (and larger) set of European characters and with fewer
 553   symbols. It is used for Polish, Czech, Slovak, Hungarian, Estonian,
 554   Latvian, and Lithuanian.