# $NetBSD: UCS%Big5@1984.src,v 1.1 2006/06/19 17:28:25 tnozaki Exp $
# $DragonFly: src/share/i18n/csmapper/BIG5/UCS%Big5@1984.src,v 1.1 2008/04/10 10:21:02 hasso Exp $
SRC_ZONE 0x00A2 - 0xFFE5
# This mapping data is made from the mapping data provided by Unicode, Inc.
# Name: BIG5 to Unicode table (complete)
# Unicode version: 1.1
# Table version: 0.0d3
# Table format: Format A
# Date: 11 February 1994
# Copyright (c) 1991-1994 Unicode, Inc. All Rights reserved.
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium).
# No claims are made as to fitness for any particular purpose. No
# warranties of any kind are expressed or implied. The recipient
# agrees to determine applicability of information provided. If this
# file has been provided on magnetic media by Unicode, Inc., the sole
# remedy for any claim will be exchange of defective media within 90
# Recipient is granted the right to make copies in any form for
# internal distribution and to freely use the information supplied
# in the creation of products supporting Unicode. Unicode, Inc.
# specifically excludes the right to re-distribute this file directly
# to third parties or other organizations whether for profit or not.
# This table contains one set of mappings from BIG5 into Unicode.
# Note that these data are *possible* mappings only and may not be the
# same as those used by actual products, nor may they be the best suited
# for all uses. For more information on the mappings between various code
# pages incorporating the repertoire of BIG5 and Unicode, consult the
# VENDORS mapping data. Normative information on the mapping between
# BIG5 and Unicode may be found in the Unihan.txt file in the
# latest Unicode Character Database.
# If you have carefully considered the fact that the mappings in
# this table are only one possible set of mappings between BIG5 and
# Unicode and have no normative status, but still feel that you
# have located an error in the table that requires fixing, you may
# report any such error to errata@unicode.org.
# WARNING! It is currently impossible to provide round-trip compatibility
# between BIG5 and Unicode.
# A number of characters are not currently mapped because
# of conflicts with other mappings. They are as follows:
# BIG5    Description                     Comments
# 0xA15A  SPACING UNDERSCORE              duplicates A1C4
# 0xA1C3  SPACING HEAVY OVERSCORE         not in Unicode
# 0xA1C5  SPACING HEAVY UNDERSCORE        not in Unicode
# 0xA1FE  LT DIAG UP RIGHT TO LOW LEFT    duplicates A2AC
# 0xA240  LT DIAG UP LEFT TO LOW RIGHT    duplicates A2AD
# 0xA2CC  HANGZHOU NUMERAL TEN            conflicts with A451 mapping
# 0xA2CE  HANGZHOU NUMERAL THIRTY         conflicts with A4CA mapping
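A decoder built from this table might keep the unmapped codes in a set and return U+FFFD REPLACEMENT CHARACTER for them, optionally redirecting the three codes marked "duplicates" to the code they duplicate. This is a minimal sketch, assuming a plain `dict` from Big5 code to Unicode code point; the names `UNMAPPED_BIG5`, `DUPLICATE_OF` and `decode_big5` are hypothetical and not part of csmapper.

```python
from typing import Dict

# Big5 codes listed as unmapped in the table above.
UNMAPPED_BIG5 = {0xA15A, 0xA1C3, 0xA1C5, 0xA1FE, 0xA240, 0xA2CC, 0xA2CE}

# Codes whose comment reads "duplicates XXXX": unmapped code -> its twin.
DUPLICATE_OF = {0xA15A: 0xA1C4, 0xA1FE: 0xA2AC, 0xA240: 0xA2AD}

REPLACEMENT_CHARACTER = 0xFFFD


def decode_big5(code: int, table: Dict[int, int], use_duplicates: bool = False) -> int:
    """Map one Big5 code to a Unicode code point.

    Unmapped codes yield U+FFFD; with use_duplicates=True, a code that
    duplicates another is first redirected to that code.
    """
    if use_duplicates and code in DUPLICATE_OF:
        code = DUPLICATE_OF[code]
    elif code in UNMAPPED_BIG5:
        return REPLACEMENT_CHARACTER
    # Codes missing from the table also fall back to U+FFFD.
    return table.get(code, REPLACEMENT_CHARACTER)
```

Keeping the redirection behind a flag mirrors the two options the comments describe: U+FFFD by default, or mapping each duplicate onto its twin.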
# We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
# It is also possible to map these characters to their duplicates, or to
# 1. In addition to the above, there is some uncertainty about the
# mappings in the ranges C6A1 - C8FE and F9DD - F9FE. The ETEN
# version of BIG5 organizes the former range differently and adds
# characters in the latter range. The correct mappings for these
# ranges still need to be determined.
# 2. The mapping of the Big Five character 0xA3BC is uncertain. This
# character occurs within the Big Five block of tone marks for
# bopomofo and is intended to be the tone mark for the first tone in
# Mandarin Chinese. We have selected the mapping U+02C9 MODIFIER LETTER
# MACRON (Mandarin Chinese first tone) to reflect this meaning.
# However, because bopomofo uses the absence of a tone mark to indicate
# the first Mandarin tone, most implementations of Big Five represent
# this character with a blank space, so a mapping such as U+2003 EM
# SPACE might be preferred.
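The two notes can be turned into a check that flags Big5 codes whose mapping is uncertain, plus an explicit choice for 0xA3BC. This is a sketch only: it assumes each range in note 1 can be treated as a plain numeric interval, and the names `UNCERTAIN_RANGES`, `is_uncertain` and `map_a3bc` are hypothetical.

```python
# Ranges from note 1, treated here as simple numeric intervals (an assumption).
UNCERTAIN_RANGES = [(0xC6A1, 0xC8FE), (0xF9DD, 0xF9FE)]

# Note 2: candidate Unicode mappings for Big5 0xA3BC.
A3BC_CHOICES = {
    "macron": 0x02C9,    # U+02C9 MODIFIER LETTER MACRON, chosen by this table
    "em_space": 0x2003,  # U+2003 EM SPACE, matching the blank many implementations show
}


def is_uncertain(big5: int) -> bool:
    """Return True if the notes flag this Big5 code's mapping as uncertain."""
    if big5 == 0xA3BC:
        return True
    return any(lo <= big5 <= hi for lo, hi in UNCERTAIN_RANGES)


def map_a3bc(style: str = "macron") -> int:
    """Return the code point chosen for 0xA3BC; raises KeyError for unknown styles."""
    return A3BC_CHOICES[style]
```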
# Format: Three tab-separated columns
# Column #1 is the BIG5 code (in hex as 0xXXXX)
# Column #2 is the Unicode code point (in hex as 0xXXXX)
# Column #3 is the Unicode name (follows a comment sign, '#')
# The official names for Unicode characters U+4E00
# to U+9FA5, inclusive, are "CJK UNIFIED IDEOGRAPH-XXXX",
# where XXXX is the code point. Including all these
# names in this file would increase its size substantially
# and needlessly. The token "<CJK>" is used for the
# name of these characters. If necessary, it can be
# expanded algorithmically by a parser or editor.
# The entries are in BIG5 order
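Expanding the `<CJK>` token is a direct computation once the code point is known. A minimal sketch, assuming Format A lines with exactly three tab-separated fields and a name field that starts with `#`; `parse_table_line` and `expand_name` are hypothetical helpers.

```python
from typing import Tuple

CJK_FIRST, CJK_LAST = 0x4E00, 0x9FA5  # range abbreviated as "<CJK>"


def expand_name(code_point: int, name: str) -> str:
    """Replace the "<CJK>" token with "CJK UNIFIED IDEOGRAPH-XXXX"."""
    if name != "<CJK>":
        return name
    if not CJK_FIRST <= code_point <= CJK_LAST:
        raise ValueError(f"<CJK> used outside U+4E00..U+9FA5: U+{code_point:04X}")
    return f"CJK UNIFIED IDEOGRAPH-{code_point:04X}"


def parse_table_line(line: str) -> Tuple[int, int, str]:
    """Parse 'BIG5<TAB>Unicode<TAB>#name' into (big5, unicode, full name)."""
    big5_hex, ucs_hex, comment = line.rstrip("\n").split("\t", 2)
    # int(..., 16) accepts the "0x" prefix used in both hex columns.
    big5, ucs = int(big5_hex, 16), int(ucs_hex, 16)
    name = comment.lstrip("#").strip()
    return big5, ucs, expand_name(ucs, name)
```

Validating the range before expanding catches a `<CJK>` token attached to a code point outside U+4E00..U+9FA5 instead of silently producing a wrong name.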
0x00A2 = 0xA246 # fallback -> 0xFFE0
0x00A3 = 0xA247 # fallback -> 0xFFE1
0x00A5 = 0xA244 # fallback -> 0xFFE5
0x2022 = 0xA145 # fallback -> 0x2027
0x223C = 0xA1E3 # fallback -> 0xFF5E
0x2609 = 0xA1F3 # fallback -> 0x2299
0x2641 = 0xA1F2 # fallback -> 0x2295
0x5344 = 0xA2CD # fallback -> 0x3039
0x5F5D = 0xC255 # fallback -> 0x5F5E
0xFF64 = 0xA14E # fallback -> 0xFE51
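The mapping lines have the shape `0xUUUU = 0xBBBB # comment`, with the UCS code on the left (matching `SRC_ZONE 0x00A2 - 0xFFE5`) and the Big5 code on the right. Assuming that shape, a parser can collect them into a dictionary and reject UCS codes outside the declared zone. The code point after `fallback ->` is recorded as-is; this sketch does not model what csmapper does with it, and `parse_mapping` is a hypothetical name, not a csmapper API.

```python
import re
from typing import Dict, Optional, Tuple

SRC_ZONE = (0x00A2, 0xFFE5)  # copied from the SRC_ZONE directive above

# "0xUCS = 0xBIG5" with an optional trailing "# comment".
LINE_RE = re.compile(r"^\s*(0x[0-9A-Fa-f]+)\s*=\s*(0x[0-9A-Fa-f]+)\s*(?:#\s*(.*))?$")
FALLBACK_RE = re.compile(r"fallback\s*->\s*(0x[0-9A-Fa-f]+)")


def parse_mapping(text: str) -> Dict[int, Tuple[int, Optional[int]]]:
    """Return {ucs: (big5, fallback-or-None)} for every mapping line in text."""
    table: Dict[int, Tuple[int, Optional[int]]] = {}
    lo, hi = SRC_ZONE
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # comments, directives such as SRC_ZONE, blank lines
        ucs, big5 = int(m.group(1), 16), int(m.group(2), 16)
        if not lo <= ucs <= hi:
            raise ValueError(f"U+{ucs:04X} lies outside SRC_ZONE")
        fallback = None
        comment = m.group(3)
        if comment:
            f = FALLBACK_RE.search(comment)
            if f:
                fallback = int(f.group(1), 16)  # recorded, not interpreted
        table[ucs] = (big5, fallback)
    return table
```

Skipping non-matching lines keeps the sketch tolerant of the header comments; a stricter parser might instead reject unknown directives.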