# Default 7bit replacements. If the MIME name is set to us-ascii,
# this will be identified with the "7 bit approximations" Display
# Character Set.
#
# This table is very important and should not be excluded from the distribution,
# since it is the default fallback for any 8-bit user's "display character set",
# which has (nearly) 256 chars and cannot map the rich Unicode repertoire.
#
# M.P.: Unicode to ASCII table. I took this file from lynx:
# lynx/src/chrtrans/def7_uni.tbl
# The MIME name of this charset.
# Shall this become the "default" translation table? YES!
# There has to be exactly one table marked as "default".
# US-ASCII characters should not normally pass through here,
# since they are always processed directly, but let us declare them here anyway:
# should not happen (processed in the code):
# should not happen (processed in the code):
# XXX: The following is missing in original lynx, not sure if we should do
# anything about it...? --pasky
# Ä, not the best choice for some languages.
# Ö, not the best choice for some languages.
# Ü, not the best choice for some languages.
# ä, not the best choice for some languages.
# ö, not the best choice for some languages.
# ü, not the best choice for some languages.
# end of latin-1 repertoire
0x41 U+0100 U+0102 U+0104 # A
0x61 U+0101 U+0103 U+0105 # a
0x43 U+0106 U+010a U+010c # C
# "Fundamento de Esperanto"
# The following line is an example for mapping several accented versions
# of small letter 'c' to 'c':
0x63 U+0107 U+010b U+010d # c
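# Reading the line above (a hedged gloss, not part of the original table):
# 0x63 is ASCII 'c', and each code point listed after it is replaced by it:
#   U+0107  LATIN SMALL LETTER C WITH ACUTE      -> c
#   U+010b  LATIN SMALL LETTER C WITH DOT ABOVE  -> c
#   U+010d  LATIN SMALL LETTER C WITH CARON      -> c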
# "Fundamento de Esperanto"
0x45 U+0112 U+0114 U+0116 U+0118 U+011a # E
0x65 U+0113 U+0115 U+0117 U+0119 U+011b # e
0x47 U+011e U+0120 U+0122 # G
# "Fundamento de Esperanto"
0x67 U+011f U+0121 U+0123 # g
0x48 U+0127 # LATIN SMALL LETTER H BAR -> H
0x49 U+0128 U+012a U+012c U+012e U+0130 # I
0x69 U+0129 U+012b U+012d U+012f U+0131 # i
# "Fundamento de Esperanto"
0x4c U+0139 U+013b U+013d # L
0x6c U+013a U+013c U+013e # l
0x4e U+0143 U+0145 U+0147 # N
0x6e U+0144 U+0146 U+0148 # n
0x4e U+014B # LATIN SMALL LETTER ENG -> N
0x4f U+014c U+014e # O
0x6f U+014d U+014f # o
0x52 U+0154 U+0156 U+0158 # R
0x72 U+0155 U+0157 U+0159 # r
# "Fundamento de Esperanto"
0x53 U+015a U+015e U+0160 # S
0x73 U+015b U+015f U+0161 # s
0x54 U+0162 U+0164 # T
0x74 U+0163 U+0165 # t
0x55 U+0168 U+016a U+016e U+0172 # U
0x75 U+0169 U+016b U+016f U+0173 # u
0x5a U+0179 U+017b U+017d U+021d
0x7a U+017a U+017c U+017e
# Linkname: FAQ: Representing IPA Phonetics in ASCII
# URL: http://www.hpl.hp.com/personal/Evan_Kirshenbaum/IPA/faq.html
# (corrected in Russian Cyrillic area).
# (corrected in Greek area).
0x41 U+0251 # LATIN SMALL LETTER SCRIPT A -> A
0x4f U+0254 # LATIN SMALL LETTER OPEN O -> O
0x40 U+0259 # LATIN SMALL LETTER SCHWA -> @
0x52 U+025A # LATIN SMALL LETTER SCHWA HOOK -> R
0x45 U+025B # LATIN SMALL LETTER EPSILON -> E
0x4a U+025F # LATIN SMALL LETTER DOTLESS J BAR -> J
0x67 U+0261 # LATIN SMALL LETTER SCRIPT G
0x47 U+0262 # LATIN LETTER SMALL CAPITAL G
0x51 U+0263 # LATIN SMALL LETTER GAMMA -> Q
0x49 U+026A U+0269 # LATIN LETTER SMALL CAPITAL I, LATIN SMALL LETTER IOTA
0x4c U+026B # LATIN SMALL LETTER L WITH MIDDLE TILDE
0x4c U+026C # LATIN SMALL LETTER L BELT
0x4d U+0271 # LATIN SMALL LETTER M HOOK
0x55 U+0277 # LATIN SMALL LETTER CLOSED OMEGA -> U
0x72 U+0279 # LATIN SMALL LETTER TURNED R -> r
0x2a U+027E # LATIN SMALL LETTER FISHHOOK R -> *
0x52 U+0280 # LATIN LETTER SMALL CAPITAL R -> R
0x53 U+0283 # LATIN SMALL LETTER ESH -> S
0x55 U+028A # LATIN SMALL LETTER UPSILON -> U
0x56 U+028C # LATIN SMALL LETTER TURNED V -> V
0x3f U+0294 # LATIN SMALL LETTER GLOTTAL STOP -> ?
0x6a U+029d # LATIN SMALL LETTER CROSSED-TAIL J
0x4c U+029F # LATIN LETTER SMALL CAPITAL L
# Cyrillic capital letters
# Russian Cyrillic letters, transliterated
# end of Russian Cyrillic letters.
# Cyrillic small letters (and some archaic)
# These may make Yiddish slightly more readable, until we have
# Replacement strings for Ethiopic characters
# ETHIOPIC SPACE U+1360 mapped to ASCII space
# General punctuation:
0x20 U+2000 U+2002 U+2004-U+2009 # spaces
0x2d U+2010 U+2011 U+2013 U+2015 # hyphen-like
0x60 U+2018 # left single quotation mark <`>
0x27 U+2019-U+201b # various single quotation marks <'>
0x22 U+201c-U+201f # various double quotation marks <">
# Don't wanna see these:
# POP DIRECTIONAL FORMATTING 202C
# LEFT-TO-RIGHT OVERRIDE 202D
0x2d U+2043 # HYPHEN BULLET ?
# end of General punctuation.
# Old euro currency sign glyph:
# New euro currency sign glyph:
0x4b U+212A # Kelvin sign - K
U+22B2: NORMAL SUBGROUP OF
U+22B3: CONTAINS AS NORMAL SUBGROUP
U+22B4: NORMAL SUBGROUP OF OR EQUAL TO
U+22B5: CONTAINS AS NORMAL SUBGROUP OR EQUAL TO
0x2b U+250c-U+256c # box drawings, use +
0x58 U+2713 U+2717 # check marks -> x
0x20 U+3000 # ideographic space
# There are four special ranges of characters that are represented only by
# their start and end characters <...>
# The CJK Ideographs Area (U+4E00 - U+9FFF)
# The Hangul Syllables Area (U+AC00 - U+D7A3)
# The Surrogates Area (U+D800 - U+DFFF)
# The Private Use Area (U+E000 - U+F8FF)
# the reverse byte-order-mark: zero-width non-break space
0x21-0x7e U+ff01-U+ff5e
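# Worked example for the range line above, assuming the two ranges are
# paired position by position (both span 94 code points):
#   U+ff01 - 0x21 = 0xfee0, so each fullwidth form maps to (code point - 0xfee0)
#   U+ff21 FULLWIDTH LATIN CAPITAL LETTER A -> 0x41 'A'
#   U+ff5e FULLWIDTH TILDE                  -> 0x7e '~'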
# Symbols for C0 and C1 control characters, in case they get through...
# Most of these characters (80-9F) may be inflicted on us
# by MS FrontPage, which uses numeric notation such as &#153;,
# but there are no assigned letters in the Unicode 128-159 range.
# It is assumed in the code that those codepoints are from windows-1252.
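# Illustration of the windows-1252 assumption (values taken from the
# windows-1252 code page; the actual lookup lives in the code, not here):
#   128 (0x80) -> U+20AC EURO SIGN
#   150 (0x96) -> U+2013 EN DASH
#   153 (0x99) -> U+2122 TRADE MARK SIGN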
# Let's try to show a question mark for characters that cannot
# be shown. U+fffd is used for invalid characters.
# It works, but let's stick with UHHH representation. - FM