1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2 <html><head>
5 <meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
6 <meta name="Generator" content="Jim's Markup Program - V0.99"><title>KANJIDIC Documentation</title></head><body bgcolor="white">
7 <!-- DO NOT EDIT!!
8 This HTML document was generated by the "markup" program.
9 Edit the original file instead. -->
10 <h1 align="center"> KANJIDIC</h1>
11 <h2 align="center"> A Database of Information on the 6,355 Kanji in the JIS X 0208 Standard</h2>
12 <i>Copyright (C) 2005 The Electronic Dictionary Research and Development</i>
13 <i>Group, Monash University.</i>
14 <p>
15 <i>(NB: This document has been converted quickly from plain text to HTML. </i>
16 <i>As a result, some of the formatting has been left as it was in the </i>
17 <i>original document. A more elegant version may be developed later.)</i>
18 </p>
19 <p>
20 <b>CONTENTS:</b>
21 </p>
22 <ul>
23 <li><a href="#IREF01">INTRODUCTION</a></li>
24 <li><a href="#IREF02">CONTENTS &amp; FORMAT</a></li>
25 <li><a href="#IREF03">INFORMATION FIELDS</a></li>
26 <li><a href="#IREF04">CURRENT USAGE</a></li>
27 <li><a href="#IREF05">SUPPORT</a></li>
28 <li><a href="#IREF06">TOO MUCH INFORMATION?</a></li>
29 <li><a href="#IREF07">HISTORY</a></li>
30 <li><a href="#IREF08">LICENCE STATEMENT AND COPYRIGHT NOTICE</a></li>
31 <li><a href="#IREF09">APPENDIX A - JIS CODES</a></li>
32 <li><a href="#IREF10">APPENDIX B - UNICODE</a></li>
33 <li><a href="#IREF11">APPENDIX C - SKIP CODES</a></li>
34 <li><a href="#IREF12">APPENDIX D - AN OVERVIEW OF THE FOUR CORNER CODING SYSTEM</a></li>
35 <li><a href="#IREF13">APPENDIX E - RADICAL AND STROKE COUNTING RULES</a></li>
36 <li><a href="#IREF14">APPENDIX F - CONDITIONS FOR USING SKIP DATA</a></li>
37 <li><a href="#IREF15">APPENDIX G - DE ROO CODES</a></li>
38 </ul>
39 <p>
40 <b><a name="IREF01">INTRODUCTION</a></b>
41 </p>
42 <p>
43 The KANJIDIC file contains comprehensive information about Japanese kanji. It
44 is a text file currently 6,355 lines long, with one line for each kanji in
45 the two levels of the characters specified in the JIS X 0208-1990 set. (For
46 basic information about this set, see Appendix A.)
47 </p>
48 <p>
49 The file contains a mixture of ASCII characters and kana/kanji encoded using
50 the EUC (Extended Unix Code) coding.
51 </p>
52 <p>
53 Attention is drawn to the KANJIDIC LICENCE STATEMENT AND COPYRIGHT NOTICE
54 included below in this document.
55 </p>
56 <p>
57 A similar file, KANJD212, is available for the 5,801 supplementary kanji in
58 the JIS X 0212-1990 set.
59 </p>
60 <p>
61 From June 2003, the KANJIDIC file has been generated from a database
62 developed from KANJIDIC to support the KANJIDIC2 XML-format version.
63 The legacy KANJIDIC format file will continue to be distributed.
64 </p>
65 <p>
66 <b><a name="IREF02">CONTENTS &amp; FORMAT</a></b>
67 </p>
68 <p>
69 The first part of each line is of a fixed format, indicating which character
70 the line is for, while the rest is more free-format.
71 </p>
72 <p>
73 The first two bytes are the kanji itself. There is then a space, the 4-byte
74 ASCII representation of the hexadecimal coding of the two-byte JIS encoding,
75 and another space.
76 </p>
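<p>
As a rough illustration of the fixed-format prefix described above, the
following Python sketch splits a line into the kanji, the JIS code and the
remainder, and converts the hexadecimal JIS code back into a character. It
assumes the line has already been decoded from EUC into a text string. The
function names and the sample line are invented for this example and are not
part of the KANJIDIC distribution.
</p>

```python
def split_header(line):
    """Split a decoded line into (kanji, JIS hex code, remainder)."""
    kanji, jis_hex, rest = line.split(" ", 2)
    return kanji, jis_hex, rest


def jis_hex_to_kanji(jis_hex):
    """Turn a 4-digit JIS X 0208 hex code into the character it denotes."""
    code = int(jis_hex, 16)
    high, low = code >> 8, code & 0xFF
    # EUC stores a JIS X 0208 character as the two JIS bytes with the
    # top bit of each byte set.
    return bytes([high | 0x80, low | 0x80]).decode("euc_jp")


# Hypothetical, abbreviated sample line (not copied from the file).
sample = "亜 3021 U4e9c B1 S7 {Asia}"
kanji, jis_hex, rest = split_header(sample)
print(kanji, jis_hex, jis_hex_to_kanji(jis_hex) == kanji)
```

<p>
Comparing the decoded JIS code with the leading kanji is one simple way to
check that a line has been read with the right encoding.
</p>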
77 <p>
78 The rest of the line is composed of a combination of three kinds of fields
79 (which may be in any order and interspersed):
80 </p>
81 <ol type="a">
82 <li>information fields, beginning with an identifying letter and ending with
83 a space. See below for more information about these fields.
84 <p>
85 </p>
86 </li>
87 <li>readings (with '-' to indicate prefixes/suffixes, and '.' to indicate the
88 portion of the reading that is okurigana). ON-yomi are in katakana and
89 KUN-yomi are in
90 hiragana. There may be several classes of reading fields, with ordinary
91 readings first, followed by members of the other classes, if any. The current
92 other classes, and their tagging, are:
93 <ol type="i">
94 <li>where the kanji has special "nanori" (i.e. name) readings,
these are preceded by the marker "T1";
96 <p>
97 </p>
98 </li>
99 <li>where the kanji is a radical, and the radical name is not already
a reading, the radical name is preceded by the marker "T2".
<p>
102 (Other Tn classes may be created at a later date.)
103 </p>
104 </li>
105 </ol>
106 </li>
107 <li>English meanings. Each such field begins with an open brace '{' and ends
108 at the next close brace '}'.
109 </li>
110 </ol>
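<p>
The three kinds of fields can be told apart mechanically. The Python sketch
below is one possible way of doing so, under these assumptions: the line is
already decoded to text, information fields start with an ASCII letter,
readings start with kana or with '-', and "Tn" markers switch the reading
class. The function name, the dictionary layout and the sample line are
illustrative only.
</p>

```python
import re


def parse_entry(line):
    """Classify the fields of one decoded KANJIDIC line."""
    kanji, jis_hex, rest = line.split(" ", 2)
    # Meanings are brace-delimited and may contain spaces, so take them first.
    meanings = re.findall(r"\{([^}]*)\}", rest)
    rest = re.sub(r"\{[^}]*\}", " ", rest)

    info = []
    readings = {"ordinary": []}
    reading_class = "ordinary"
    for token in rest.split():
        if re.fullmatch(r"T\d+", token):
            # "T1", "T2", ... mark the start of another class of readings.
            reading_class = token
            readings.setdefault(token, [])
        elif token[0].isascii() and token[0].isalpha():
            info.append(token)
        else:
            readings[reading_class].append(token)
    return {"kanji": kanji, "jis": jis_hex, "info": info,
            "readings": readings, "meanings": meanings}


# Hypothetical, abbreviated sample line (not copied from the file).
sample = "亜 3021 U4e9c B1 S7 P4-7-1 ア つ.ぐ T1 や つぎ {Asia} {rank next}"
print(parse_entry(sample))
```

<p>
Because the fields may appear in any order, the sketch classifies each token
on its own rather than relying on position.
</p>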
111 <b><a name="IREF03">INFORMATION FIELDS</a></b>
113 There are currently a variety of predefined fields (programs using KANJIDIC
114 should not make any assumptions about the presence or absence of any of these
115 fields, as KANJIDIC is certain to be extended in the future):
116 </p>
117 <ul>
118 <li> B&lt;num&gt; -- the radical (Bushu) number. There is one per entry. As
far as possible, this is the radical number used in the Nelson
"Modern Reader's Japanese-English Character Dictionary" (i.e. the Classic,
not the New Nelson). Where the classical or
122 historical radical number differs from this, it is present as a
123 separate C&lt;num&gt; entry.
125 </p>
126 </li>
127 <li> C&lt;num&gt; -- the historical or classical radical number, as recorded
128 in the KangXi Zidian (where this differs from the B&lt;num&gt; entry.) There
129 will be at most one of these.
131 </p>
132 </li>
133 <li> F&lt;num&gt; -- the frequency-of-use ranking. At most one per line. The
134 2,501 most-used characters have a ranking; those characters that lack
135 this field are not ranked. The frequency is a number from 1 to 2,501
136 that expresses the relative frequency of occurrence of a character in
137 modern Japanese. The data is based on an analysis of word
138 frequencies in the Mainichi Shimbun over 4 years by Alexandre Girardi.
139 From this the relative frequencies have been derived. Note:
140 <ol type="a">
141 <li>these frequencies are biassed towards words and kanji used in newspaper
142 articles,
143 </li>
<li>the relative frequencies for the last few hundred
kanji so graded are quite imprecise.
146 </li>
147 </ol>
148 (Earlier editions of the KANJIDIC file used a frequency-of-use ranking
149 from the
150 National Language Research Institute (Tokyo), interpreted and adapted
151 by Jack Halpern.)
153 </p>
154 </li>
155 <li> G&lt;num&gt; -- the Jouyou grade level. At most one per line. G1 through
156 G6 indicate Jouyou grades 1-6. G8 indicates general-use characters.
157 G9 indicates Jinmeiyou ("for use in names") characters. If not
158 present, it is a kanji outside these categories.
160 </p>
161 </li>
162 <li> H&lt;num&gt; -- the index number in the New Japanese-English Character
163 Dictionary, edited by Jack Halpern. At most one allowed per line.
If not present, the character is not in Halpern.
<p>
166 </p>
167 </li>
168 <li> N&lt;num&gt; -- the index number in the "Modern Reader's Japanese-English
169 Character Dictionary", edited by Andrew Nelson. At most one allowed
170 per line. If not present, the character is not in Nelson, or is
171 considered to be a non-standard version, in which case it may have a
172 cross-reference code in the form: XNnnnn. (Note that many kanji
173 currently used are what Nelson described as "non-standard" forms or
174 glyphs.)
176 </p>
177 </li>
178 <li> V&lt;num&gt; -- the index number in The New Nelson Japanese-English
179 Character Dictionary, edited by John Haig.
181 </p>
182 </li>
183 <li>D&lt;code&gt; -- the "D" codes will be progressively used for dictionary
184 based codes.
185 <ol type="a">
186 <li>DBnnn - the index numbers used in "Japanese For Busy People" vols I-III,
187 published by the AJLT. The codes are the volume.chapter.
189 </p>
190 </li>
191 <li>DCnnnn - the index numbers used in "The Kanji Way to Japanese Language
192 Power" by Dale Crowley.
194 </p>
195 </li>
196 <li>DGnnn - the index numbers used in the "Kodansha Compact Kanji Guide".
198 </p>
199 </li>
<li>DHnnnn - the index numbers used in the 3rd edition of
"A Guide To Reading and Writing Japanese" edited by Kenneth Henshall et al.
<p>
203 </p>
204 </li>
<li>DJnnnn - the index numbers used in "Kanji in Context" by
206 Nishiguchi and Kono.
208 </p>
209 </li>
210 <li>DKnnnn - the index numbers used by Jack Halpern in his Kanji
211 Learners Dictionary, published by Kodansha in 1999. The numbers have
212 been provided by Mr Halpern.
214 </p>
215 </li>
<li>DOnnnn - the index numbers used in P.G. O'Neill's "Essential Kanji".
The numbers have been provided by Glenn Rosenthal.
<p>
219 </p>
220 </li>
221 <li>DRnnnn - these are the codes developed by Father Joseph De Roo,
222 and published in his book "2001 Kanji" (Bonjinsha). Fr De Roo has
223 given his permission for these codes to be included.
225 </p>
226 </li>
227 <li>DSnnnn - the index numbers used in the early editions of
228 "A Guide To Reading and Writing Japanese" edited by Florence Sakade.
230 </p>
231 </li>
232 <li>DTnnn - the index numbers used in the Tuttle Kanji Cards, compiled
233 by Alexander Kask.
235 </p>
236 </li>
237 </ol>
239 </p>
240 </li>
241 <li> P&lt;code&gt; -- the SKIP pattern code. The &lt;code&gt; is of the form
242 "P&lt;num&gt;-&lt;num&gt;-&lt;num&gt;". The System of Kanji Indexing by Patterns
243 (SKIP) is a scheme for the classification and rapid retrieval of
244 Chinese characters on the basis of geometrical patterns. Developed
245 by Jack Halpern, it first appeared in the New Japanese-English
246 Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and
247 is being used in a series of dictionaries and learning tools called
248 KIT (Kanji Integrated Tools). SKIP is protected by copyright,
249 copyleft and patent laws. The commercial utilization of SKIP in any
250 form is strictly forbidden without the written permission of Jack
251 Halpern, the copyright holder (jhalpern@cc.win.or.jp). (A brief
summary of the method is in Appendix C. See Appendix E for some of
the rules applied when counting strokes in some of the radicals.)
<p>
255 </p>
256 </li>
257 <li> S&lt;num&gt; -- the stroke count. At least one per line. If more than
258 one, the first is considered the accepted count, while subsequent
ones are common miscounts. (See Appendix E for some of the rules
applied when counting strokes in some of the radicals.)
<p>
262 </p>
263 </li>
264 <li> U&lt;hexnum&gt; -- the Unicode encoding of the kanji. See Appendix B for
265 further information on this code. There is exactly one per line.
267 </p>
268 </li>
269 <li> I&lt;code&gt; -- the index codes in the reference books by Spahn &amp;
270 Hadamitzky. These codes take two forms:
271 <ol type="i">
272 <li>for The Kanji Dictionary (Tuttle 1996), they are in the form
273 nxnn.n, e.g. 3k11.2, where the kanji has 3
274 strokes in the identifying radical, it is radical "k" in the S&amp;H
275 classification system, there are 11 other strokes, and it is the 2nd
276 kanji in the 3k11 sequence. I am very grateful to Mark Spahn for
277 providing the (almost) full list of these descriptor codes for the
278 kanji in this file. At the time of writing some 800 kanji in the
279 file lack the SH descriptor. This is because the book used a
280 different glyph as the primary kanji. The gaps are gradually being
281 filled in. Where the JIS X 0208 glyph is the second kanji for a
282 particular descriptor code, it has a "-2" appended to the code.
284 </p>
285 </li>
286 <li>for the Kanji &amp; Kana book (Tuttle), they are in the form
287 INnnnn, where nnnn is the number of the kanji referenced in
288 that book (2nd edition.)
289 </li>
290 </ol>
291 </li>
<li> Qnnnn.n -- the "Four Corner" code for that kanji. This code was
invented by Wang Chen in 1928 and has since been widely used for
dictionaries in China and Japan. In some cases there are two of these
codes, as the system can be a little ambiguous, and Morohashi has some kanji
296 coded differently from their traditional Chinese codes. See Appendix
297 D for an overview of the Four Corner System. Christian Wittern,
298 who passed on these codes, comments that they are in need of
299 proof-reading and thus users are advised to be cautious using the
300 codes for serious scholarship.
302 </p>
303 </li>
304 <li> MNnnnnnnn and MPnn.nnnn -- the index number and volume.page
305 respectively of the kanji in the 13-volume Morohashi Daikanwajiten.
306 In the MNnnn field, a terminal `P`, e.g. MN4879P, indicates that it
307 is 4879' in the original. In some 500 cases, the number is terminated
308 with an `X`, to indicate that the kanji in Morohashi has a close, but
309 not identical, glyph to the form in the JIS X 0208 standard.
311 </p>
312 </li>
313 <li> Ennnn -- the index number used in "A Guide To Remembering Japanese
314 Characters" by Kenneth G. Henshall. There are 1945 kanji with these
315 numbers (i.e. the Jouyou subset.)
317 </p>
318 </li>
319 <li> Knnnn -- the index number in the Gakken Kanji Dictionary ("A New
320 Dictionary of Kanji Usage"). Some of the numbers relate to the list
321 at the back of the book, jouyou kanji not contained in the
322 dictionary, and various historical tables at the end.
324 </p>
325 </li>
326 <li> Lnnnn -- the index number used in "Remembering The Kanji" by James
327 Heisig.
329 </p>
330 </li>
331 <li>Onnnn -- the index number in "Japanese Names", by P.G. O'Neill.
332 (Weatherhill, 1972) (A warning: some of the numbers end with 'A'. This
333 is how they appear in the book; it is not a problem with the file.)
335 </p>
336 </li>
337 <li> Wxxxx -- the romanized form of the Korean reading(s) of the kanji.
Most of these kanji have one Korean reading; a few have two or more.
339 The readings are in the (Republic of Korea) Ministry of Education
340 style of romanization.
342 </p>
343 </li>
344 <li> Yxxxxx -- the "Pinyin" of each kanji, i.e. the (Mandarin or Beijing)
Chinese romanization. About 6,000 of the kanji have these. Obviously
most of the native Japanese kokuji do not have Pinyin; however, at least
one does, as it was taken into Chinese at a later date.
<p>
349 </p>
350 </li>
351 <li> Xxxxxxx -- a cross-reference code. An entry of, say, XN1234 will mean
352 that the user is referred to the kanji with the (unique) Nelson index
353 of 1234. XJ0xxxx and XJ1xxxx are cross-references to the kanji with
354 the JIS hexadecimal code of xxxx. The `0' means the reference is to a
355 JIS X 0208 kanji, and the `1' references a JIS X 0212 kanji.
357 </p>
358 </li>
359 <li> Zxxxxxx -- a mis-classification code. It means that this kanji is
360 sometimes mis-classified as having the xxxxxx coding. In the case of
361 the SKIP classifications, an extra letter code is used to indicate
362 the type of mis-classification. ZPPn-n-n, ZSPn-n-n and ZBPn-n-n
363 indicate mis-classification according to position, stroke-count and
364 both position and stroke-count. (ZRPn-n-n codes are where Jim
365 Breen &amp;
366 Jack Halpern are having a [hopefully temporary] disagreement over the
367 number of strokes.)
368 </li>
369 </ul>
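<p>
Several of the information fields above carry structure beyond a plain index
number. The following Python sketch shows how a program might interpret four
of them: S (the first count is the accepted one, later ones are miscounts),
U (hexadecimal Unicode value), P (the three-part SKIP code) and XJ (JIS
cross-references). The function name and the result layout are assumptions
made for this example; fields it does not recognise are simply ignored, in
keeping with the advice not to assume which fields are present.
</p>

```python
def interpret_fields(info_fields):
    """Interpret a few structured information fields from one entry."""
    result = {"stroke_count": None, "miscounts": [],
              "unicode_char": None, "skip": None, "jis_xrefs": []}
    for field in info_fields:
        if field.startswith("XJ"):
            # XJ0xxxx refers to JIS X 0208, XJ1xxxx to JIS X 0212.
            standard = "JIS X 0208" if field[2] == "0" else "JIS X 0212"
            result["jis_xrefs"].append((standard, field[3:]))
        elif field[0] == "S":
            count = int(field[1:])
            if result["stroke_count"] is None:
                result["stroke_count"] = count      # accepted count
            else:
                result["miscounts"].append(count)   # common miscount
        elif field[0] == "U":
            result["unicode_char"] = chr(int(field[1:], 16))
        elif field[0] == "P":
            # SKIP code: pattern number, then two stroke-count parts.
            result["skip"] = tuple(int(part) for part in field[1:].split("-"))
    return result


# Hypothetical field values, for illustration only.
print(interpret_fields(["U4e9c", "S7", "S8", "P4-7-1", "XJ05033"]))
```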
370 If the final field of a line is not an English field, there is a final space.
371 Each reading and information field is therefore bracketed by space characters
372 (which makes it convenient for searches using programs like "grep".)
374 As far as possible all entries will have their yomikata and readings
375 attached, even if they are a recognized variant of another kanji. This is to
376 facilitate electronic searches using these fields as keys, and should not be
377 taken as a recommendation to use such obscure kanji.
378 </p>
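<p>
Since every reading and information field is bracketed by spaces, a search
for an exact field can be done with a plain substring test, much as one would
with "grep". The Python sketch below illustrates the idea; the function name
and the sample lines are made up for this example.
</p>

```python
def lines_with_field(lines, field):
    """Return the lines that contain exactly this field."""
    # Padding with spaces stops "N43" from matching "N431".
    needle = " " + field + " "
    return [line for line in lines if needle in line]


# Hypothetical sample lines; the second ends with a reading, so it has
# the final space described above.
lines = [
    "亜 3021 S7 N43 ア {Asia}",
    "唖 3022 S10 N431 ア ",
]
print(lines_with_field(lines, "N43"))
print(lines_with_field(lines, "ア"))
```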
380 <b><a name="IREF04">CURRENT USAGE</a></b>
381 </p>
383 KANJIDIC is used now to build the "kinfo.dat" file which is used by JDIC and
384 JREADER, and by Stephen Chung's JWP. "kinfo.dat" contains the identical
385 information, but in a compressed form and in a structure suitable for fast
386 indexed access.
387 </p>
389 KANJIDIC is also used in the XJDIC and MacJDic dictionary programs, and a
390 growing number of other programs such as KDRILL and KDIC.
391 </p>
393 <b><a name="IREF05">SUPPORT</a></b>
394 </p>
396 KANJIDIC was originally compiled, and is maintained by:
397 </p>
398 <dl><dd>
399 Jim Breen
400 <br>
401 (jwb@csse.monash.edu.au)
402 <br>
403 School of Computer Science &amp; Software Engineering
404 <br>
405 Monash University, Victoria, Australia
406 </dd></dl>
407 If you have suggested changes, send diffs [not complete files] with
408 corrections to him.
410 <b><a name="IREF06">TOO MUCH INFORMATION?</a></b>
411 </p>
413 KANJIDIC is now rather large, and has information in it which is not much use
414 for people who are not studying and researching Japanese orthography. It is
415 still appropriate to maintain it as a useful freely-available compendium of
416 such information.
417 </p>
419 For people who only wish to use a subset of the information in KANJIDIC,
420 there is a program "kdfilt.c", also available as kdfilt.exe for MS-DOS, which
421 will strip out unwanted fields. Dan Crevier has also released a program
422 (kanjidicSplit) which does the same for MacJDic users. (For users of the JDIC
423 program, the KANJDFIX.EXE utility also strips out unwanted fields prior to
424 building the KINFO.DAT file.)
425 </p>
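<p>
The general idea of stripping unwanted fields can be sketched in a few lines
of Python. This is not the code or the option set of kdfilt or KANJDFIX; it is
only an illustration, with an invented function name, of removing information
fields by their leading letters while keeping the kanji, JIS code, readings
and meanings.
</p>

```python
import re

# A token is either a brace-delimited meaning or a run of non-space characters.
TOKEN = re.compile(r"\{[^}]*\}|\S+")


def strip_fields(line, unwanted_prefixes):
    """Drop information fields that start with any of the given prefixes."""
    tokens = TOKEN.findall(line)
    kept = tokens[:2]  # the kanji and its JIS code are always kept
    for token in tokens[2:]:
        is_info = (token[0].isascii() and token[0].isalpha()
                   and not re.fullmatch(r"T\d+", token))
        if is_info and token.startswith(tuple(unwanted_prefixes)):
            continue
        kept.append(token)
    out = " ".join(kept)
    # Keep the convention of a final space when the last field is not a meaning.
    return out if out.endswith("}") else out + " "


# Hypothetical sample line: remove the Morohashi (M) and Four Corner (Q) fields.
print(strip_fields("亜 3021 U4e9c S7 MN272 MP1.0525 Q1010.6 ア {Asia}", ["M", "Q"]))
```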
427 <b><a name="IREF07">HISTORY</a></b>
428 </p>
430 (some comments by Jim Breen)
431 </p>
433 KANJIDIC began as two files: jis1detl.lst and jis2detl.lst, which were later
434 merged into a single file.
435 </p>
437 The first file was compiled initially from the file "kinfo.dat" supplied by
438 Stephen Chung, who in turn compiled his file from a file prepared by Mike
439 Erickson. I originally added about 1900 "meanings" by James Heisig keyed in
440 by Kevin Moore from the book "Remembering The Kanji". I later added the
441 meanings from Rik Smoody's files, compiled when he was working for Sony in
442 Japan. These appear to have been based on Nelson.
443 </p>
445 The second file was compiled from a complete JIS2 list with Bushu and stroke
446 counts kindly supplied to me by Jon Crossley, to which I added Nelson
447 numbers, yomikata and meanings extracted from Rik Smoody's file.
448 </p>
<p>
Theresa Martin provided early assistance with this file, particularly with
451 tracking down and correcting many mistranscribed yomikata (the old zu/dzu,
452 oo/ou, ji/dji, etc. problems).
453 </p>
<p>
Jeffrey Friedl did a major overhaul in September-October 1992, in which he
added the original frequency rankings, Halpern codes and SKIP patterns;
updated the grading ("G" fields) to reflect the modern Jouyou lists; and
corrected radical numbers, stroke counts and readings to fall in line with
modern usage.
460 </p>
462 Magnus Halldorsson corrected some erroneous Halpern numbers, and provided
463 them for a lot of the radicals. He provided the list of Heisig indices,
464 which he originally compiled himself, then verified and expanded using lists
465 from Richard Walters and Antti Karttunen. He also passed on to me the list of
466 Gakken indices compiled by Antti Karttunen.
467 </p>
<p>
Lee Collins provided the Unicode mappings (see Appendix B).
470 </p>
472 Iain Sinclair has provided the yomikata, meanings and S&amp;H indices of many of
473 the obscure JIS2 kanji.
474 </p>
476 Christian Wittern, a Sinologist working at Kyoto University, sent me a
477 monster file prepared by Dr Urs App from Hanazono College. From this I have
478 extracted the Four Corner and Morohashi information. Christian also provided
479 the original Pinyin details, which were later replaced. I am very grateful
480 for these significant contributions.
481 </p>
483 In March 1994 the Morohashi indices were proof-read and corrected by
484 Christian.
485 </p>
487 Alfredo Pinochet supplied all the Henshall numbers.
488 </p>
490 Ingar Holst has provided considerable assistance in regularizing the Bnnn and
491 Cnnn radical classifications to remove some errors that were in the original
492 JIS2 file, and to make it all conform to Nelson's classification.
493 </p>
495 In mid-1993 I withdrew the SKIP codes from the distributed file as it
496 appeared that their presence violated Jack Halpern's copyright on these
497 codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission
498 from his publisher for the codes to be included subject to the copyright and
499 usage restrictions stated in this document. In March 1994 the Halpern indices
500 and SKIP codes were checked against an extract from Jack's files, and the "Z"
501 mis-classification codes added, again from his files. Jack has also made a
502 lot of useful comments and suggestions about the content and format of the
503 file. I am most grateful to Jack for his permission and assistance, and also
504 to Jeffrey for making the contact.
505 </p>
507 In May 1995, a number of updates took place. Jeffrey Friedl established
508 contact with James Heisig, and obtained a further set of his indices. I
509 contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided
510 most of the missing S&amp;H descriptors, and Jack Halpern released to me the SKIP
511 codes of the kanji not in the New Japanese-English Character Dictionary. For
512 all this material I am most grateful.
513 </p>
515 In August 1995, I added the O'Neill index numbers. These were compiled by
516 Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny &amp; David for their
517 assistance.
518 </p>
520 In January and February 1996 the Morohashi numbers were checked thoroughly
521 against two important sources: a file of Unicode-Morohashi data (Uni2Dict)
522 which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221
523 standard, and the review draft of the proposed revision of the JIS X 0208
524 standard, which was prepared by the INSTAC Committee, and made available in
525 a text file, thus enabling comparisons. All the mismatches between the three
526 files were examined against the Morohashi text, and extensive corrections
527 made to all three files. I am grateful to Koichi Yasuoka and Masayuki
528 Toyoshima for their considerable assistance in this task.
529 </p>
531 In March 1996 the Korean readings were added. They were provided by Dr
532 Charles Muller of Toyo Gakuen University (acmuller@gol.com), to whom I
533 am most grateful. Chuck's compilation of Korean readings is extremely
534 thorough and scholarly, and I am pleased to be able to incorporate
535 them.
536 </p>
538 In April 1996 the readings of all the kanji were compared with those in the
539 JIS X 0208 draft, and a number of corrections and additions made.
540 </p>
542 In May 1996 I carried out a "unification" of the readings of the KANJIDIC
543 and KANJD212 files, wherein all the readings of the "itaiji" were brought
544 into line. The identification of these itaiji was drawn from a file posted
545 to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp),
546 which was compiled at the ETL from the itaiji identification in the
547 JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added
548 some extra sets which were indicated in the JIS X 0208-1996 draft.
549 </p>
551 In July 1996 the Pinyin details were completely replaced by a new set. The
original Pinyin were from an earlier compilation by Christian Wittern, and
contained many errors. Two more reliable sources had become available:
554 the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on
555 the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5
556 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC
557 file is a combination of the two, following the order in the Uni2Pinyin
558 file.
559 </p>
561 In August 1996 I corrected a few more missing and erroneous Nelson numbers,
562 using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged
563 the kokuji, so I added these to the readings fields as "{(kokuji)}".
564 </p>
566 Also in August 1996 I deleted the handful of former "XJxxxx" cross-references,
567 and replaced them with a much more comprehensive set, so that they now
568 represent all the recognized "itaiji". The file I used for this was the
569 corrected itaiji file mentioned above.
570 </p>
572 In April 1997 I corrected a large number of bushu codes. Many of these had
573 been identified as errors by Jean-Luc Leger (reiga@iria.mines.u-nancy.fr) who
574 analyzed and examined all the Nelson bushu. I also identified and added a large
575 number of missing Cnnn codes.
576 </p>
578 Also in April 1997 I added the S&amp;H "Kanji &amp; Kana" indices. These had been
579 keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must
be an outbreak of kanji interest in Nancy.)
581 </p>
583 In February 1998, the long-awaited inclusion of the "New Nelson" numbers took
584 place. I had been waiting for the editor of the New Nelson, John Haig, to
585 supply a list (as he had agreed some years before), but in the meantime,
586 Jean-Luc Leger keyed a list, so they are now available.
587 </p>
589 Also between December 1997 and February 1998 a large number of Level 2
590 kanji had their stroke counts corrected to bring them into line with the
591 counting principles used in the Level 1 kanji. This usually aligned the
592 counts with those used in the New Nelson and in S&amp;H. Appendix E of this
593 document was amended to reflect this. The leg-work in tracking this material
594 down was done by Wolfgang Cronrath.
595 </p>
597 During December 1998 &amp; Jan 1999 I updated the stroke counts of many of the
598 Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath.
599 I also added the De Roo codes, which had been keyed by Jasmin Blanchette,
600 who also typed the explanatory material. I contacted Fr De Roo in Tokyo who
readily agreed to the inclusion of the codes.
602 </p>
<p>
The extension of the S&amp;H Kanji &amp; Kana numbers to the 2nd edition was
605 done by Enrique Sanchez Rosa.
606 </p>
608 The Hangul versions of the Korean readings (which only appear in the
609 XML version) were provided by Francis Bond and Kyonghee Paik.
610 </p>
612 I did the Tuttle card numbers myself.
613 </p>
615 James Rose provided the numbers from Crowley's "The Kanji Way to Japanese
616 Language Power", Sakade's "A Guide To Reading and Writing Japanese", and
617 also for that book's 3rd Edition edited by Henshall, Seeley &amp; De Groot.
618 </p>
<p>
The "Kodansha Compact Kanji Guide" codes were provided by Richard Fremmerlid.
621 </p>
623 The "Kanji in Context" codes were provided by Randy Foreman.
624 </p>
626 The Spanish kanji meanings (which appear in the XML format, and may also
627 appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez
628 and provided by Gabriel Sanroman.
629 </p>
631 <b><a name="IREF08">KANJIDIC LICENCE STATEMENT AND COPYRIGHT NOTICE</a></b>
632 </p>
634 In March 2000, James William Breen assigned ownership of the copyright
of the dictionary files assembled, coordinated and edited by him to the
Electronic Dictionary Research and Development Group at Monash
637 University.
638 </p>
640 Information about the formal usage arrangement for KANJIDIC can be found on
641 <a href="http://www.csse.monash.edu.au/%7Ejwb/edrdg/">the Group's WWW page. </a>
642 </p>
644 In summary, KANJIDIC can be freely used provided satisfactory
645 acknowledgement is made, and a number of other conditions are met.
646 </p>
648 The following people have granted permission for material for which they hold
649 copyright to be included in the files, and distributed under the above
650 conditions, while retaining their copyright over that material:
651 </p>
653 Jack HALPERN: The SKIP codes in the KANJIDIC file.
654 </p>
656 With regard to the SKIP codes, Mr Halpern draws your attention to the
657 statement he has prepared on the matter, which is included at Appendix F.
658 </p>
660 Christian WITTERN and Koichi YASUOKA: The Pinyin information in the KANJIDIC
661 file.
662 </p>
664 Urs APP: the Four Corner codes and the Morohashi information in the KANJIDIC
665 file.
666 </p>
668 Mark SPAHN and Wolfgang HADAMITZKY: the kanji descriptors from their
669 dictionary.
670 </p>
672 Charles MULLER: the romanized Korean readings.
673 </p>
<p>
Joseph DE ROO: the De Roo codes.
676 </p>
678 <b><a name="IREF09">APPENDIX A - JIS CODES</a></b>
679 </p>
681 For full information about JIS codes, please see Ken Lunde's "japan.inf"
682 file, or his book "Understanding Japanese Information Processing", O'Reilly
683 1993. The following is a brief extract from the "japan.inf" file.
684 </p>
686 "The Japanese character set as described in the document JIS X 0208-1990
687 specifies 6,879 standard characters; 6,355 kanji in 2 levels (Level 1: 2,965
688 kanji arranged by pronunciation; Level 2: 3,390 kanji arranged by radical),
689 86 katakana, 83 hiragana, 10 numerals, 52 Roman characters, 147 symbols, 66
690 Russian characters, 48 Greek characters, and 32 line elements (for making
691 charts).
692 </p>
694 This standard was first established in 1978, modified for the first time in
695 1983 (character position swapping, glyph changes, and four kanji appended to
696 JIS Level 2), and modified again in 1990 (two kanji were appended to JIS
697 Level 2). This character set is widely implemented on a variety of platforms.
698 Encoding methods for JIS X 0208-1990 include Shift-JIS, EUC, and JIS."
699 </p>
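<p>
To make the relationship between the JIS code and the three encoding methods
concrete, the Python sketch below encodes a single kanji each way. It relies
on Python's standard codecs and assumes that "iso2022_jp" corresponds to the
encoding that the extract calls "JIS"; the function name is invented for this
example.
</p>

```python
def encodings_of(char):
    """Show one JIS X 0208 character under the common encoding methods."""
    euc = char.encode("euc_jp")
    return {
        # Clearing the top bit of each EUC byte gives the raw JIS code.
        "jis_hex": "%02X%02X" % (euc[0] & 0x7F, euc[1] & 0x7F),
        "euc_jp": euc,
        "shift_jis": char.encode("shift_jis"),
        # 7-bit encoding that switches character sets with escape sequences.
        "iso2022_jp": char.encode("iso2022_jp"),
    }


for name, value in encodings_of("亜").items():
    print(name, value)
```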
701 <b><a name="IREF10">APPENDIX B - UNICODE</a></b>
702 </p>
704 The following information about Unicode was provided in 1992 by Lee
705 Collins at Taligent.
706 </p>
708 (The Unicode sequences are) "the final, official mapping to JIS of the
709 CJK-JRG's (Chinese, Japanese, Korean- Joint Research Group) "Unified
710 Repertoire and Ordering Version 2.0" which is the unified Han character set
711 of ISO 10646 and Unicode. All of the Unicode companies (Apple, IBM,
712 Microsoft, NeXT, Taligent, etc) are now using this mapping. There has been
713 some confusion because of difference in nomenclature. Unicode people call it
714 UniHan, the Chinese sometimes call it HCS (Han Character Set) and ISO calls
715 it "Ideographic CJK Character Unified Repertoire and Ordering". ISO can't use
716 the term "Han" character because Japan was very sensitive to this (even
717 though it is a direct translation of "Kanzi") and it can't be called a
718 character set because only ISO WG2 is empowered with the authority to encode
719 characters. Problems of naming aside, they are all the same thing.
720 </p>
722 The CJK-JRG was formed under the aegis of ISO in 1990 to investigate and
723 propose a unified Han character set for inclusion in ISO 10646. It brought
724 together various experts on Han characters from China, Hong Kong, Japan,
725 Korea, Taiwan and the United States selected by the national bodies
726 participating in ISO WG2.
727 </p>
729 Including the initial work in the US on Unicode and in China on GB 13000,
730 which were merged and became the basis for the URO, the task spanned about 4
731 years. The work was completed in April of this year. It contains 21,000 Han
732 characters from all of the major standards used in East Asia, including JIS X
733 0208-1990 and JIS X 0212-1990. The Unicode consortium provides a
734 cross-reference file for all of the source sets. To get a copy contact
735 </p>
737 Steve Greenfield
738 <br>
739 unicode-inc@HQ.M4.Metaphor.COM
740 </p>
742 For further details about the URO/UniHan, you might want to pick up a copy of
"The Unicode Standard Version 1.0 Vol II". It's published by Addison
744 Wesley, ISBN 0-201-60845-6. It's been available in the USA for over a month
745 now. For a slightly different presentation of the characters, a copy of 10646
746 or of the "Ideographic CJK Character Unified Repertoire and Ordering Version
2.0" might be available through the Australian national body to ISO WG2."
748 </p>
751 <b><a name="IREF11">APPENDIX C - SKIP CODES</a></b>
752 </p>
754 S K I P - SYSTEM OF KANJI INDEXING BY PATTERNS
755 </p>
757 [This document contains the text and examples from the covers of the "New
758 Japanese-English Character Dictionary" edited by Jack Halpern and published
759 by Kenkyusha and NTC. It is reproduced with Mr Halpern's kind permission.
760 </p>
761 <pre>The text on which this is based used four patterns which are not able to be
762 reproduced in this document. They are referred to below as #1 through #4,
763 and relate to the following shapes in the NJECD:
.       ■■￣￣｜    ■■■■    ■■■■    ■■■■
.       ■■　　｜    ■■■■    ■■■■    ■■■■
.       ■■　　｜    ■■■■    ■　　■    ■■■■
.       ■■　　｜    ｜　　｜    ■　　■    ■■■■
.       ■■　　｜    ｜　　｜    ■■■■    ■■■■
.       ■■＿＿」    ｜＿＿」    ■■■■    ■■■■

.         #1          #2          #3          #4
.        LEFT-       TOP-      ENCLOSURE     SOLID
.        RIGHT      BOTTOM]

777 . HOW TO LOCATE AN ENTRY
A. Determine the SKIP number of your character.

STEP 1   IDENTIFY PATTERN

Determine to which of the four PATTERNS your character belongs to get the
first part of the SKIP number (the PATTERN NUMBER).

If your character belongs to pattern #1, #2 or #3 (相→#1), carry out the
steps in the left column; if it belongs to pattern #4 (下→#4), carry out the
steps in the right column. (REF: R4. How to Identify the Pattern)

.       #1  #2  #3                              #4

STEP 2
DIVIDE CHARACTER                         OMIT
Divide the character into two parts at   (Since solid characters
the first division point. [相=木+目]     cannot be divided, go to
REF: R5. How to Divide the Character     STEP 3.) REF: R6. How to
                                         Subclassify the Solid Pattern

STEP 3
COUNT STROKES OF SHADED PART             DETERMINE TOTAL STROKE-COUNT
Count the strokes of the SHADED PART     Determine the total stroke-count of
to get the second part of the SKIP       your character to get the second part
number. [相 #1 1-4-]                     of the SKIP number. [下 #4 4-3-]
REF: Appendix 2. How to Count Strokes    REF: Appendix 2. How to Count Strokes

STEP 4
COUNT STROKES OF BLANK PART              IDENTIFY SOLID SUBPATTERN
Count the strokes of the BLANK PART      Determine to which of the four
to get the third part of the SKIP        SOLID SUBPATTERNS your character
number. [相 #1 1-4-5]                    belongs to get the third part of the
REF: Appendix 2. How to Count Strokes    SKIP number. Select from: `￣' 1,
                                         `＿' 2, `|' 3, or `■' 4. [下 #4 4-3-1]
                                         REF: R6. How to Subclassify the
                                         Solid Pattern

816 After determining the SKIP number of your character, locate your character
817 entry in one of two ways:
819 1. Determine the entry number in the Pattern Index beginning on p. 1952 then
820 locate your character entry in the main part of the dictionary. See R3.1.2
821 Index Method for details.
823 2. Locate your character entry directly (without referring to the Pattern
824 Index) from its SKIP number. See R3.1.3 Direct Method for details.
826 NOTE: All references preceded by a section mark (R) refer to SYSTEM OF KANJI
827 INDEXING BY PATTERNS beginning on p. 106a
830 HOW TO IDENTIFY THE PATTERN
832 DETERMINE TO WHICH OF THE FOUR PATTERNS YOUR CHARACTER BELONGS
834 #1 Characters that can be divided into left and right parts
835 RIGHT: Áê 4-5 Ȭ 1-1 ½ç 1-11 °· 3-3
836 WRONG: ÊÒ 1-3 ÍÑ 1-4 ²Ä 3-2 ¿ 3-3
838 #2 Characters that can be divided into top and bottom parts
839 RIGHT: Æó 1-1 »û 3-3 ¸Å 2-3 ½Õ 5-4
840 WRONG: Ëü 1-2 ¹Í 4-2 ´Ö 8-4 ºÁ 4-3
842 #3 Characters that can be divided by an enclosure element
843 RIGHT: ¿Ê 3-8 ¹­ 3-2 Ìä 8-3 ¹ñ 3-5
844 WRONG: Æþ 1-1 ¸â 4-3 ̾ 3-3 °Ù 5-4
846 #4 Characters that cannot be classified under patterns #1, #2, or #3
847 RIGHT: ±« 8-1 ʼ 5-2 Ãæ 4-3 Í¿ 3-4
848 WRONG: Åá 2-1 Æü 4-1 ¿å 4-3
850 IF A CHARACTER CAN BE CLASSIFIED UNDER MORE THAN ONE PATTERN, SELECT THE ONE
851 THAT FOLLOWS THE NATURAL CONSTRUCTION OF THE CHARACTER
852 RIGHT: »ù 2-5-2 È¢ 2-6-9
853 WRONG: »ù 1-2-5 È¢ 1-7-8
856 HOW TO DIVIDE THE CHARACTER
858 DIVIDE THE CHARACTER INTO TWO PARTS AT THE FIRST DIVISION POINT
860 #1 Going from left to right, divide at the first space
861 RIGHT: ÌÀ 4-4 ¾® 1-2 °· 3-3
862 WRONG: ¾® 2-1 ³¹ 9-3
864 #2 Going from top to bottom, divide at the first space, horizontal line, or
865 frame element, whichever comes first
866 RIGHT: »° 1-2 ¶¼ 2-8 ÀÖ 3-4 ¸Å 2-3
867 WRONG: »° 2-1 ¶¼ 6-4 ÀÖ 2-5 ²¼ 1-2
869 #3 Going from the outside toward the inside, divide after the first enclosure
870 element
871 RIGHT: ÅÙ 3-6 ¿Ê 3-8 ÊÄ 8-3 ÌÜ 3-2
872 WRONG: ÅÙ 7-2 Ëá 11-5
874 DO NOT VIOLATE THE PRINCIPLE OF ELEMENT INTEGRITY
875 . 1. Never break through strokes
876 . RIGHT: ¶§ 3-2-2 WRONG: ¶§ 1-1-4
877 . 2. Never break through indivisible units
878 . RIGHT: ¾ð 1-3-8 WRONG: ¾ð 1-1-10
879 . 3. Never make unnatural divisions
880 . RIGHT: µ¤ 3-4-2 WRONG: µ¤ 2-2-4
882 HOW TO SUBCLASSIFY THE SOLID PATTERN
884 A. DETERMINE TO WHICH OF THE FOUR SOLID SUBPATTERNS YOUR CHARACTER BELONGS
886 `T' 1. Characters that contain a top line
887 RIGHT: ±« 8-1 ²¼ 3-1 ¼ª 6-1 ²Ì 8-1
888 WRONG: Åá 2-1 Àé 3-2 ¿â 8-1 ʼ 5-1
890 2. Characters that contain a bottom line
891 RIGHT: ¾å 3-2 ʼ 5-2 ¿â 8-2
892 WRONG: »³ 3-2 Êñ 5-2 ¼Ô 8-2
894 3. Characters that contain a through line
895 RIGHT: Ãæ 4-3 Åì 8-3 ÌÓ 4-3
896 WRONG: ¿å 4-3 À£ 3-3 ¸á 4-3 Äï 7-3
898 4. Characters that do not contain a top line, bottom line, or through line
899 RIGHT: Í¿ 3-4 Âç 3-4 ¼÷ 7-4
900 WRONG: »å 6-4 µ× 3-4 ͧ 4-4 Îô 6-4
902 B. IF A CHARACTER CAN BE CLASSIFIED UNDER MORE THAN ONE SUBPATTERN, THE
903 SUBPATTERN WITH THE SMALLEST NUMBER TAKES PRECEDENCE
904 RIGHT: ²¦ 4-1 ¸Ê 3-1 ÆÓ 7-1 ²Ì 8-1 ½Ð 5-2 À¸ 5-2 ¹Ã 5-1
905 WRONG: ²¦ 4-2 ¸Ê 3-2 ÆÓ 7-2 ²Ì 8-3 ½Ð 5-3 À¸ 5-3 ¹Ã 5-3
906 </pre>
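<p>
As a rough illustration of the steps above, the following Python sketch splits a SKIP number such as 1-4-5 or 4-3-1 into its three parts and describes them. The names <code>SkipCode</code> and <code>parse_skip</code> are hypothetical and are not part of KANJIDIC or the NJECD; only the pattern and subpattern numbering is taken from the text above.
</p>

```python
from dataclasses import dataclass

# Pattern and solid-subpattern numbering as given in the appendix text.
PATTERNS = {1: "left-right", 2: "top-bottom", 3: "enclosure", 4: "solid"}
SOLID_SUBPATTERNS = {1: "top line", 2: "bottom line", 3: "through line", 4: "no such line"}


@dataclass(frozen=True)
class SkipCode:
    """Hypothetical container for a three-part SKIP number."""
    pattern: int  # 1 to 4
    second: int   # strokes in the shaded part, or total strokes for pattern 4
    third: int    # strokes in the blank part, or solid subpattern for pattern 4

    def describe(self) -> str:
        name = PATTERNS[self.pattern]
        if self.pattern == 4:
            sub = SOLID_SUBPATTERNS[self.third]
            return f"{name}: {self.second} strokes in total, subpattern {self.third} ({sub})"
        return f"{name}: {self.second} strokes in shaded part, {self.third} in blank part"


def parse_skip(text: str) -> SkipCode:
    """Parse a string like '1-4-5'; raise ValueError if it is not a valid SKIP number."""
    parts = text.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three hyphen-separated numbers, got {text!r}")
    pattern, second, third = (int(p) for p in parts)  # int() raises ValueError on junk
    if pattern not in PATTERNS:
        raise ValueError(f"pattern must be 1-4, got {pattern}")
    if second < 1 or third < 1:
        raise ValueError("stroke counts and subpatterns start at 1")
    if pattern == 4 and third not in SOLID_SUBPATTERNS:
        raise ValueError(f"solid subpattern must be 1-4, got {third}")
    return SkipCode(pattern, second, third)


if __name__ == "__main__":
    for example in ("1-4-5", "4-3-1"):  # the two worked examples from the text
        print(example, "->", parse_skip(example).describe())
```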
<b><a name="IREF12">APPENDIX D - AN OVERVIEW OF THE FOUR CORNER CODING SYSTEM</a></b>
909 </p>
The Four Corner System has been used for many years in China and Japan for
classifying kanji. In China it is losing popularity in favour of Pinyin
ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten,
have a Four Corner Index.
915 </p>
917 The following overview of the system has been condensed from the article "The
918 Four Corner System: an introduction with exercises" by Dr Urs App, which
919 appeared in the Electronic Bodhidharma No 2, February 1992, published by the
920 International Research Institute for Zen Buddhism, Hanazono College. (More
921 examples will be added from that article in due course.)
922 </p>
<pre>1. Stroke shapes are divided into ten classes:

.     0   LID               亠
.     1   HORIZONTAL LINE   一
.     2   VERTICAL LINE     ｜
.     3   DOT               丶
.     4   CROSS             十
.     5   SKEWER            キ
.     6   BOX               口
.     7   ANGLE             厂
.     8   HACHI             八
.     9   CHIISAI           小

2. The Four Digits are derived from the Four Corners in a Z-shaped order.

.     A B                    7 1     7 7
.            for example:     原       月
.     C D                    2 9     2 2

942 Some examples: »Å 2421 ¹Ô 2122 Îò 7121 µû 2733 »ì 0762 Ʊ 7722 ¶¶ 4292
944 3. A shape is only used once. If it fills several corners, it is counted as
945 zero in subsequent corners.
947 Some examples: ¸ý 6000 ¼ó 8060 ʬ 8022 Âç 2003 Ï 2690 ÉÊ 6066 µþ 0096
949 4. When the upper or lower half of a character consists of only one (single
950 or composite) shape, it is, regardless of its position, counted as a left
951 corner. The right corner is counted as zero.
953 Some examples: Ω 0010 ͳ 5060 Àã 1017 Êý 0022 Äí 0024 »å 2090 ¼ê 2050
955 5. When there is no additional element to the four sides of the characters
956 .¸ý, Ìç, ò¨ (and sometimes ¹Ô), whatever is inside these characters is taken
957 for the lower two corners.
959 Some examples: Ìä 7760 ¼ü 6080 Ô¢ 6015 ÌÜ 6010 ³« 7744 ÌÌ 1060 îò 2110
961 6. The analysis is based on the block-style handwritten kaisho (Ü´½ñ) shape
962 of characters.
964 (This needs attention, as ¸Í is 3027, not 1027. The top stroke is treated as
965 a Ц.)
967 7. Some points to note when analyzing shapes:
969 o Shape 0:
971 When the horizontal line below a DOT shape (number 3) is connected to another
972 stroke at its right-hand end (as in Õß ¸Í, etc.) it is not counted as a LID
973 (number 0) but as a DOT.
975 Examples: °Â 3040 ¿À 3520 µ§ 3222
977 o Shape 6:
979 Characters such as »® and Õù where one of the strokes of the square extends
980 beyond it, are not considered to be square (number 6) shapes, but corners
981 (number 7).
983 Examples: ³î 7710 ½ê 3222 »® 7710 ´Û 8377 µ¹ 3010
985 o Shape 7:
987 Only the cornered end of corner shapes (number 7) is counted as 7.
989 Examples: ¶è 7171 ¶Ô 7222 ¶ç 2762 È¿ 7124
991 o Shape 8:
993 Strokes that cross other strokes are not counted as shape number 8 (Ȭ).
995 Examples: Èþ 8043 ´Ø 7743 Âç 4003 ¼º 8043 ¹Õ 2143 Àí 9043
997 o Shape 9:
999 Shapes resembling shape 9, but featuring two strokes in the middle (as in the
1000 top part of ¶È or ÁÑ) or two strokes on one side (as in ¿å or the bottom part
1001 of Êé) are not considered as 9 shapes.
1003 Examples: Êé 4433 ¶È 3290 ÁÑ 3214
1005 8. Some points to note when choosing corners.
- when a corner is occupied by more than one independent or parallel stroke,
the one that extends furthest to the left or right is taken as the corner,
regardless of how high or low it is.
1011 examples: Èó 1111 Ðë 2124 ¼À 0013 Äë 0022 ¼Ò 3421 ÌÔ 4721
1013 - if there is another shape above (or, at the bottom of the character, below)
1014 the leftmost or rightmost stroke of a character, that shape is given
1015 preference and is taken as the corner.
1017 examples: »¡ 3090 ¹¬ 4040 ᶠ6020 ½÷ 4040 ã¹ 3521 ¶ 4480
1019 - when two composite stroke shapes are interwoven and each could be regarded
1020 as a corner, the shape that is higher is taken as the upper corner, and the
1021 lower stroke as lower corner.
1023 - when a stroke that slopes downwards to the left or right is supported by
1024 another stroke, the latter is taken as the corner.
1026 examples: ±° 2740 ΢ 0073 ¾Ë 1962 é° 4464 ·Ô 4410 Èï 3424
1028 - a left slanting stroke on the upper left is taken for the left corner only;
1029 for the right corner one takes a stroke more to the right.
1031 examples: ¿È 2740 ̶ 2350 ³û 6752 Ū 2762 ½Ü 2762 Åç 2772
1033 9. Shape variations: (Dr App includes several pages of examples)
1035 10. The fifth corner:
1037 In order to differentiate between the several characters with the same code,
1038 an optional "fifth corner" is sometimes used. This is, loosely, a shape above
1039 the fourth corner which has not been used in any other shape.
1040 </pre>
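<p>
To make the Z-shaped reading order in rule 2 concrete, here is a small Python sketch that maps a four-digit code to the shape class named for each corner. The function name <code>decode_four_corner</code> is hypothetical, and the sketch only looks digits up in the table from rule 1; it does not attempt to derive a code from a glyph. Note that a 0 can also mean that the corner has no new shape of its own (rules 3 and 4).
</p>

```python
# Shape classes 0-9 from rule 1 of the overview above.
SHAPES = (
    "LID", "HORIZONTAL LINE", "VERTICAL LINE", "DOT", "CROSS",
    "SKEWER", "BOX", "ANGLE", "HACHI", "CHIISAI",
)

# Rule 2: corners are read in a Z-shaped order (A, B, C, D).
CORNERS = ("upper left", "upper right", "lower left", "lower right")


def decode_four_corner(code: str) -> dict:
    """Return a corner -> shape-name mapping for a four-digit code string."""
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected exactly four digits, got {code!r}")
    # A digit 0 is reported as LID, but per rules 3 and 4 it may also
    # stand for a corner whose shape was already counted elsewhere.
    return {corner: SHAPES[int(digit)] for corner, digit in zip(CORNERS, code)}


if __name__ == "__main__":
    # 7129 is the first worked example shown under rule 2.
    for corner, shape in decode_four_corner("7129").items():
        print(f"{corner:12s} {shape}")
```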
<b><a name="IREF13">APPENDIX E - RADICAL AND STROKE COUNTING RULES</a></b>
1043 </p>
1045 These rules apply:
1046 </p>
1047 <ol type="a">
1048 <li>to the stroke-counts themselves;
1050 </p>
1051 </li>
<li>to the stroke counts in the SKIP codes. Where this results in a SKIP
that differs from that in the NJECD, or in the non-NJECD SKIPs
provided by Jack Halpern, the Jack Halpern version is included, prefixed
with "ZR".
1056 </li>
1057 </ol>
1058 RADICALS
The radicals listed below are ones where there are differing approaches to
the counting of radicals in the various references. The stroke counting in
this file does not strictly follow any reference, but tends to be more
closely aligned with Halpern.
1064 </p>
1065 <ol>
1066 <li>B54 ENNYOU - ׮. Traditionally counted as 3 strokes, but more recently
often counted as 2. S&amp;H count this as 2; Nelson, Halpern, Koujien, etc,
count it as 3. I treat it as 3.
1070 </p>
1071 </li>
1072 <li> B140 KUSA-KANMURI e.g. ²× always counted as 3 strokes (Halpern counts
1073 this 4 strokes for the (mostly level 2) kanji where the older form is
1074 often printed.) Note that this has been carried through to kanji where
1075 this element is not the indexing radical, such as ۯ.
1077 </p>
1078 </li>
1079 <li> B162 SHIN-NYUU e.g. ô£ or °© counted as 3 or 4 strokes. (Nelson and
1080 S&amp;H
1081 count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]
1083 </p>
1084 </li>
1085 <li> B163 OOZATOZUKIRI &amp; B170 KOZATO-HEN Ë® and ïô always counted as 3 strokes
1086 (Nelson and S&amp;H count it as 2, Halpern as 3.) This also applies where it
1087 appears mid-kanji, such as in Üó.
1089 </p>
1090 </li>
1091 <li> B199 MUGI Çþ always counted as 7 strokes, except for óÎ &amp; óÏ where it
1092 is counted as 11. (Nelson and Halpern do the same, and S&amp;H avoid treating
1093 it as a radical, but count it as 12 in the remainder.)
1095 </p>
1096 </li>
1097 <li> B113 SHIMESU e.g. Îé, is counted as 4 strokes in that form, and 5 strokes
1098 in its older form, ã«. 18 kanji are in the 4-stroke form and 20 are in
1099 the 5-stroke form. (Nelson and S&amp;H count it as 4; Halpern counts it as 4
1100 or 5. [See Note 1.])
1102 </p>
1103 </li>
<li> B184 SHOKU HEN ¿©, µ², etc. is counted as 8 strokes in the µ² form, and as
1105 9 strokes in the Ò¬ and »Á forms. (Nelson and S&amp;H count it as 8 strokes,
1106 and Halpern as 8 or 9.) [See Note 1. below.]
1108 </p>
1109 </li>
1110 <li> B131 SHIN/KERAI ¿Ã. Counted as 7 (Nelson counts it as 6, Halpern as 7
1111 (in the book), and S&amp;H as both for different kanji.)
1113 </p>
1114 </li>
1115 <li> B136 MAI ASHI Á¤. Counted as 7 (traditionally counted as 6, in
1116 accordance with the older writing of `¥ð'. Nelson counts as 6, S&amp;H as
1117 7, and Halpern as 7 for ¾ïÍÑ and ¿Í̾ÍÑ´Á»ú and 6 for the rest.) Note
1118 this is also applied to counting å¬ and for kanji with the ðê pattern.
1120 </p>
1121 </li>
1122 <li> B131 SHIN or KERAI ¿Ã. Counted as 7 (traditionally counted as 6). Nelson
1123 counts as 6, Halpern as 7, and S&amp;H as 6 or 7 in different cases.
1125 </p>
1126 </li>
1127 <li> The ROO or OI radical (Ϸ) has a variant consisting of the top 4 strokes.
1128 For example, it is in ¼Ô. Traditionally, this variant had an extra dot,
1129 and was counted as 5 strokes. I'm counting it as 4 throughout.
1130 </li>
1131 </ol>
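<p>
The radical rules above can be summarised as a small lookup table. The Python sketch below is only an illustration: the names <code>RADICAL_STROKES</code>, <code>allowed_stroke_counts</code> and <code>is_consistent</code> are hypothetical, the keys are the radical numbers (B54, B140, ...) used in the list, and the values are the stroke counts stated there. Where two values are given, the choice depends on the printed form or on the particular kanji, which this sketch does not try to decide.
</p>

```python
# Stroke counts per radical as stated in the list above, keyed by radical
# (bushu) number. Two values mean the count depends on the form or kanji.
RADICAL_STROKES = {
    54: (3,),      # ENNYOU
    113: (4, 5),   # SHIMESU: 4 in the newer form, 5 in the older form
    131: (7,),     # SHIN/KERAI
    136: (7,),     # MAI ASHI
    140: (3,),     # KUSA-KANMURI
    162: (3, 4),   # SHIN-NYUU
    163: (3,),     # OOZATOZUKIRI
    170: (3,),     # KOZATO-HEN
    184: (8, 9),   # SHOKU HEN: 8 or 9 depending on the form
    199: (7, 11),  # MUGI: 11 only for the two kanji named in the list
}


def allowed_stroke_counts(bushu):
    """Return the tuple of counts for a listed radical, or None if not listed."""
    return RADICAL_STROKES.get(bushu)


def is_consistent(bushu, strokes):
    """True if `strokes` matches the rule, or if the radical has no special rule here."""
    allowed = allowed_stroke_counts(bushu)
    return allowed is None or strokes in allowed


if __name__ == "__main__":
    print(allowed_stroke_counts(162))  # the two counts listed for SHIN-NYUU
    print(is_consistent(54, 2))        # a 2-stroke count for ENNYOU is not what this file uses
```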
1132 OTHER STROKE PATTERNS
1133 <ol>
1134 <li> While the pattern ±± is a 6-stroke radical, the top half of Ò× is made up
1135 of three distinct parts totalling 8 strokes. Note that this also is the
1136 case with Õ¿, Þì, çÛ and Áé despite the simplification in the JIS glyphs.
1138 </p>
1139 </li>
1140 <li> ²ç (KIBA HEN) is a problem. It is classically counted as 4 strokes, but
1141 these days has a flick that makes it effectively 5. Halpern, Nelson and
1142 S&amp;H usually have it as 5 strokes, so I'm standardizing on that.
1144 </p>
1145 </li>
1146 <li> Another little horror is ÚÜ (MU or NASHI), which is classically counted
1147 as 4 strokes. The most common variant has 5 strokes, but looks like 6.
1148 Halpern, S&amp;H and the Classical Nelson count this as 4 strokes, and the New
1149 Nelson as 5. I'm making it 5 too.
1151 </p>
1152 </li>
1153 <li> The JUU or ASHIATO radical is at the bottom of ¶Ù and ã¼. It is
1154 traditionally counted as 5 strokes, although sometimes it looks like 4.
1155 I'm using 5 throughout.
1157 </p>
1158 </li>
1159 <li>A related shape is ¥à, as in ±», ¸É, ¸Ì, etc. This is sometimes counted
1160 as two strokes (both Nelsons) and sometimes as three strokes (Halpern, S&amp;H).
1161 Classically it is regarded as two strokes. I am using 6 strokes for ±».
1163 </p>
1164 </li>
1165 <li> The pattern to the left of ÚÉ, which appears in several kanji, e.g.
1166 ʾ and ÊÍ, has 8 strokes. (There are 3 strokes at the top as in ¾°.)
1168 </p>
1169 </li>
1170 <li> The "east" pattern (Åì) has 8 strokes. There is an older form in which
1171 there are two strokes in the box (ÛË). It is counted as 8
1172 strokes here in the Åì form (e.g. ´Ò) and 9 in the ÛË form, as in ëÝ.
1174 </p>
1175 </li>
1176 <li> The pattern at the bottom of ð´ is counted as 4 strokes in modern
1177 dictionaries, although traditionally it was 5.
1179 </p>
1180 </li>
1181 <li> The pattern ´¬, which appears in several kanji, is counted as 9 strokes.
1182 Several dictionaries count it as either 8 or 9.
1184 </p>
1185 </li>
1186 <li> The pattern on the left of ¼ý is variously handled as 2 strokes or 3
1187 strokes. As more recent dictionaries make it 4, I will do so too.
1188 </li>
1189 </ol>
Note 1: The JIS X 0208-1990 standard does not formally specify the precise
glyphs used for kanji; however, the glyphs it uses in the published
version have become de facto standards for many font compilations. In
1193 the published standard, for several kanji, e.g. é/íé, Îé/ã«, µ²/Ò¬, the
1194 JIS level one kanji use the simpler form, and the Level 2 kanji use the
1195 older more complex form. Just to make matters worse, many fonts for
1196 JIS X 0208 kanji are based on the bit-maps specified in JIS X 9051-1984
1197 standard, which defines the 16x16 patterns for JIS X 0208-1983 characters.
1198 According to Ken Lunde: "This standard was not very good, and JSA is no
1199 longer supporting it."
1200 Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both
1201 Levels 1 and 2, as well as having simplifications of kanji like ßÉ. Thus,
1202 as the font foundries have freedom to choose whichever glyphs they like,
1203 what you see on your screen may well not agree with these rules. All
1204 the rules in this appendix relate to the glyphs as published in the
1205 JIS X 0208-1990 standard, and as appearing in font compilations based
1206 on them.
<b><a name="IREF14">APPENDIX F - CONDITIONS FOR USING SKIP DATA by Jack Halpern (jack@kanji.org)</a></b>
1209 </p>
Ever since my New Japanese-English Character Dictionary (NJECD) came
out (Kenkyusha 1990, NTC 1993), I have been getting inquiries asking for
permission to use SKIP (System of Kanji Indexing by Patterns) data in
software products and electronic dictionaries. Below I explain the
policy of the Kanji Dictionary Publishing Society (KDPS) on
copyright issues when distributing SKIP data or using it in a software
product or electronic dictionary.
1218 </p>
1220 WHAT IS SKIP?
1221 </p>
1223 Briefly, SKIP is an indexing system that enables the user to locate
1224 kanji quickly and accurately. The system is extremely convenient because
1225 it can be learned in a very short time, is easy to use, and requires very
1226 little prior knowledge of kanji.
1227 </p>
1229 The central idea of SKIP is the classification of characters into four
1230 major categories on the basis of easy-to-identify geometrical
1231 &lt;patterns&gt;:
1232 </p>
1233 <pre>1. Left-right
1234 2. Up-down
1235 3. Enclosure
1236 4. Solid
1237 </pre>
1238 Characters belonging to the first three categories are arranged in
1239 ascending order of hyphenated numerals that represent the number of
1240 strokes in the &lt;shaded part,&gt; and the number of strokes in the &lt;blank
1241 part.&gt; See http://www.kanji.org and NJECD front matter for details.
1243 To distribute SKIP data within a group or use it in a commercial
1244 or non-commercial product, please confirm that you agree to the following
1245 conditions:
1246 </p>
1247 <ol>
1248 <li>COPYRIGHT AND DISTRIBUTION
1250 SKIP data is protected by copyright, copyleft and patent laws. The
1251 copyright holder is Jack Halpern, chief editor of KDPS (the Kanji
1252 Dictionary Publishing Society). The SKIP data must be protected
from illegal copying and distribution, using such measures as encryption.
1254 The data must be encrypted if it is to be used in any kind of product,
1255 including commercial products, software and freeware. The data, or extracts
1256 from it, must not be distributed to a third party, must not be sold as
1257 part of any commercial software package, and must not be incorporated
1258 in any published dictionary or other printed document without the
1259 specific permission of the copyright holder.
1260 </p>
1263 </p>
1264 </li>
1265 <li> ACKNOWLEDGMENT OF SOURCE
1267 The source of SKIP data shall be acknowledged in the information
1268 screens of the product, and the following disclaimer should appear
1269 in the documentation and/or help screens:
1270 </p>
1272 "SKIP (System of Kanji Indexing by Patterns) numbers are derived from
1273 the New Japanese-English Character Dictionary (Kenkyusha 1990, NTC
1274 1993) and The Kodansha Kanji Learner's Dictionary (Kodansha
1275 International, 1999). SKIP is protected by copyright, copyleft and
1276 patent laws. The commercial or non-commercial utilization of SKIP in
1277 any form is strictly forbidden without the written permission of
1278 Jack Halpern, the copyright holder. Such permission is normally
1279 granted. Please contact jack@kanji.org and/or see http://www.kanji.org."
1280 </p>
1282 </p>
1283 </li>
1284 <li>ROYALTIES
1286 SKIP is a product of seven years of computer-assisted research and
1287 experimentation on how kanji elements are intuitively perceived in
1288 terms of their parts. Development work was financed by private funds
1289 and research grants. To enable us to continue to develop useful data
and products, we ask for your cooperation by paying KDPS (the Kanji
Dictionary Publishing Society) a royalty of 0.5% (negotiable) if you are
1292 using the data for a commercial product. Depending on the circumstances,
1293 it is also possible to use SKIP data free of charge or at a lower
1294 royalty.
1295 </p>
Finally, please send a copy of your product to Jack Halpern.
1298 </p>
1299 </li>
1300 </ol>
1301 <b><a name="IREF15">APPENDIX G - DE ROO CODES</a></b>
1303 AN OVERVIEW OF THE DE ROO SYSTEM
1304 </p>
1306 [This document contains the text found in the second edition of "2001 Kanji"
1307 edited by Joseph R. De Roo and published by Bonjinsha.]
1308 </p>
1310 The system used in "2001 Kanji" is intended for the beginner who encounters
1311 a kanji and wants to look it up, knowing neither its radical, pronunciation,
1312 nor its exact number of strokes. The method consists of looking at the top
1313 of the kanji, and then at its bottom, disregarding its other parts.
1314 </p>
1316 "2001 Kanji" provides drawings for all graphic elements. This information
1317 cannot be reproduced here. However, an attempt was made to describe each
1318 element as much as possible given the constraints of a computer text file,
1319 and examples of characters possessing the element are always given.
1320 </p>
1321 <pre>Two-step visual method for locating a kanji:
1323 1. Observe its EXTREME TOP or LEFT TOP.
There are only four possibilities: DOT (丶), VERTICAL LINE (｜), DIAGONAL
LINE (ノ), HORIZONTAL LINE (一). Each of these four strokes can occur either
in isolation or in connection with one or more strokes. The four
groups of graphic elements correspond to the four basic strokes in their
immediate environment. Each element has a number which will become the first
half of the kanji number.
1332 DOT (Ц):
1333 3 DOT (Ц) ÇÈ Îä ±Ê ¿´ ɬ ³Ú ¿Þ
1334 4 ROOF (е) µþ
1335 5 DOTTED CLIFF (Öø) Ä£ ¼À
1336 6 ALTAR ¡¡ ¡¡ Îé Èï Ç·
1337 7 KANA U (Õß) °Â
1338 8 LID ¡¡ ¡¡ Çò ÎÉ ¿ã ½® ¼« ¿È µ´ Åç ¸þ ½°
1339 9 HORNS ¡¡ ¡¡ °Ù Äï Á° ³Ø ¸· µó Áã
1341 VERTICAL LINE (¡Ã):
1342 10 SMALL ON BOX ¡¡ ¡¡ ¶È ·ô ÊÆ È¾ ¾° ÊÀ ¸÷ Åö ¾Ó
1343 11 SMALL (¾®) ¾® ¿å ɹ À­ ϧ
1344 12 VERTICAL LINE (¡Ã) »Ý ÅÀ »ß ¸© ¿Í µ¢ ¸â Ò¸ ¼ý ÊÒ Ãû
1345 ¡¡ Èó ÅÍ Àî ½£ »³
1346 13 HAND TO THE LEFT ¡¡ ¡¡ »ý
1347 14 CROSS (½½) ¸ ¼Ô ¼° Âç É× ÁÕ À£ µá ±¦ Ë® ¼·
1348 ¡¡ ÅÚ ÆÇ Íè ºÊ ÆÖ
1349 15 CROSS ON BOX (¸Å) ¿¿ Æî ¼Ö ·Ã Åì Ä« « »É ¸Å »ö
1350 16 KANA KA (¥«) ´Ý ½ñ °ã Æâ Éý Èé Ãæ ¿½ ±û Äá
1351 17 WOMAN (½÷) °ù
1352 18 TREE (ÌÚ) ËÜ
1353 19 LETTER H (×°) Çü ³× °æ ´Å ÂÓ À¤ ʦ ¶ ¶Ê Áâ
1355 DIAGONAL LINE (¥Î):
1356 20 KANA NO (¥Î) ¼õ º© ¹Ô ˳ ë Ȭ Éã
1357 21 MAN TO THE LEFT (¥¤) »ø
1358 22 THOUSAND (Àé) ×Û ·Ï ¼á Íø ¾£ ²æ ¼ê Àé ÌÓ ¿â ¾è
1359 ¡¡ ½Å
1360 23 MAN TO THE TOP ¡¡ ¡¡ ̵ ´¿ ¸á Ìð ÃÝ ²· Ëè ¸ð
1361 24 COW (µí) ¾Ç µí ¼º ¼ë ¹ð À¸ Àè À©
1362 25 KANA KU (¥¯) ³° Á³ Ôé µ× ³Ñ Ò±
1363 26 HILL TOP ¡¡ ¡¡ »á Äþ ·Þ α Íñ ÀÍ ¹¡ ½â °õ ÃÊ
1364 27 LEFT ARROW (¡ã) Âæ Öß »å ÍÄ ¶¿
1365 28 ROOF (¢Ê) ¶â ¿© ÁÒ ²ñ ²ð
1366 29 X (¡ß) ÈÈ ´¢ ´õ »¦
1368 HORIZONTAL LINE (°ì):
1369 30 HORIZONTAL LINE (°ì) ¸À Éû Ʀ ¸Í Æõ Îï ¼¨ ¸µ ±¾
1370 31 FOURTH (Ãú) Îó »ê ±« Ãà ÉÔ Ëü Å· ¹¹ ²Ä ²¼ ¸ß
1371 ¡¡ ¸Þ Ê¿ ¹© ²¦
1372 32 BALD (Ѻ) ²ç »à À¾ Í× ÆÓ ·Á ¼ª °¡ ¼¥
1373 33 CLIFF (ÒÌ) ÀРä Îå ¸¶ È¿
1374 34 TOP-LEFT CORNER ¡¡ ¿Ã ÇÏ Ä¹ °å
1375 35 TOP-RIGHT CORNER ¡¡ Æþ ȯ ͽ Ëô Åá λ ×® ²µ Èô µÝ ¿Ò
1376 ¡¡ ·¯ ÁÂ
1377 36 UPSIDE-DOWN CAN (ÑÄ) Ʊ ÑÜ »Í »® ÅÄ ¹ü ð Êì ÆÌ ±ú
1378 37 MOUTH (¸ý) Õù ¼Ü À× Â­ ̱
1379 38 SUN (Æü) ¨ º± Ìç
1380 39 EYE TOP ¡¡ ·î ÌÜ ³î ³­ ¸«
1382 2. Observe its EXTREME BOTTOM or RIGHT BOTTOM.
There are nine possibilities: DOT (丶), LEFT HOOK (亅), VERTICAL LINE (｜),
RIGHT HOOK, DIAGONAL LINE (ノ), BACK DIAGONAL LINE (ヽ), BOTTOM OF HEAD (疋),
BOTTOM OF WATAKUSHI (厶), HORIZONTAL LINE (一). They are listed in association
with one or more strokes. The number of the bottom element will become the
second half of the kanji number.
1390 DOT (Ц):
1391 40 FOUR DOTS ¡¡ ̵ ×Û ±÷ Åß ½Â ´¨ ¿Ô
1392 41 SMALL (¾®) µþ ¾® ¸¶ ¼¨ ; ÀÖ »å ¸©
1393 42 WATER (¿å) µá ±Ê ɹ ¿å
1395 LEFT HOOK (Э):
1396 43 KANA RI (¥ê) Íø
1397 44 SEAL (ÒÇ) ʦ Ĥ Äï ÒÇ »Ô Éô
1398 45 SWORD BOTTOM (Åá) ±ß ³Ñ ǵ ÎÏ Ëü Åá ¿Ï
1399 46 MOON (·î) ÌÀ Í­
1400 47 DOTLESS INCH ¡¡ ºÆ ºý Õú в Í· Í¿ Êì Ëè ð »Ò ¾µ
1401 ¡¡ ¼ê ¿È ºÍ ²ç
1402 48 INCH (À£) ½®
1403 49 MOUTH LEFT HOOK ¡¡ ¡¡ ¼þ ²Ä »Ê ¶É
1404 50 BIRD BOTTOM ¡¡ ¡¡ Ä»
1405 51 ANIMAL (ÌÞ) ìµ Êª
1406 52 BOW BOTTOM ¡¡ ¡¡ µà µÝ Åç Ò±
1407 53 LEFT HOOK (Э) ±© Ìç Ãú λ ÍÑ ºý ÑÄ ±©
1409 VERTICAL LINE (¡Ã):
1410 54 VERTICAL LINE (¡Ã) ÉÔ ËÎ Æã Èó ÊÒ ¶Ô Àî ʹ
1411 55 CROSS (½½) ¶« Á¤ ÅÍ ´³ ÍÓ ËÜ ¿½ Áá ¼Ö ÀÍ Ãæ
1412 ¡¡ ³× ¼ª ×° °æ
1414 RIGHT HOOK:
1415 56 RIGHT HOOK ¡¡ ¸Ê Ìé Ýã ²µ ´¤ ÑÜ »á ×µ ε Ò¸
1416 57 LEGS (ѹ) Õ÷ µ´ Ѻ ȯ Ãû ´Ý ¹Ó
1417 58 HEART (¿´) Ç°
1418 59 TASSELED SPEAR BOTTOM ¡¡ Øù ɬ
1420 DIAGONAL LINE (¥Î):
1421 60 KANA NO (¥Î) ×Ä ¾¯ º£ ͼ Õù À¼ Õú µÕ
1423 BACK DIAGONAL LINE (¡³):
1424 61 SMALL PODIUM ¡¡ Âþ ³­ ¸² ¶¦ Ï»
1425 62 BACK KANA NO (¡³) °ç °Ê ¼Ü µ× Æþ ²Ð ±» ¿Í Öß
1426 63 BIG (Âç) Ìð ÂÀ Å· ¾Ð É× ¼Â ·ð Íè ·è ±û
1427 64 TREE (ÌÚ) Ûù « Åì ¼ë Íè ¾è Ãã ̤ ²Ó ·ó ½Ò
1428 ¡¡ ÊÆ
1429 65 SMALL SPOON ¡¡ °á ´Ä º± ι Ĺ ä ÇÉ ½°
1430 66 GOVERN (Щ) ¿Þ Éã ¾æ ʸ Ú¾ Íù ¹¹
1431 67 AGAIN (Ëô) Ìë Ôé µÚ Ôé ×®
1432 68 WINDY AGAIN (ÝÕ) Ìò
1433 69 WOMAN (½÷) °Â
1435 HEAD BOTTOM:
1436 70 HEAD BOTTOM ¡¡ É¥ Äê ­ Áö Ç·
1438 WATAKUSHI BOTTOM:
1439 71 WATAKUSHI BOTTOM ¡¡ Ãî Öö ÒÓ
1441 HORIZONTAL LINE (°ì):
1442 72 HORIZONTAL LINE (°ì) ¹© ÅÚ ²¦ ¾å À¸ Τ ¿â ¶Ì ¶â ¶Ë ¸ß
1443 73 STANDING BOTTOM ¡¡ °¡ »ß ³î ¸Þ µÖ Ω Ʀ
1444 74 DISH BOTTOM ¡¡ ÊÂ »®
1445 75 BOTTOM CORNER ¡¡ ÆÌ ±ú µÔ Åö ľ Ñá Ò¹ ð² À¾ ÆÓ Ë´
1446 ¡¡ À¤
1447 76 MOUNTAIN (»³) Àç ½Ð ÍÉ ´Ì ÅÄ Í³ ²Á Í© ¶Ê ÌÌ
1448 77 MOUTH (¸ý) Àê ¸Å Àå Ϥ ´±
1449 78 SUN (Æü) Çò É´ ´Å
1450 79 EYE (ÌÜ) ¼ó ½â ¼«
1452 The number of the kanji you are looking for consists of the top number
1453 coming first and the bottom number coming second, the two numbers being
1454 placed side by side. E.g., ´Á 363 (3 63), »ú 747 (7 47).
1456 There are two rules always to keep in mind:
1458 a. Ignore the complete enclosure Óø and the "road" radical (as in Æ»). Look
1459 at the top and bottom (in some cases only the bottom) of what is inside the
1460 complete enclosure, and of what is to the upper right of "road". E.g.,
1461 ¼ü 1262, ¸Ä 2177, Æ» 979, Ë¥ 2755.
1463 b. When a part is enclosed by the "gate" radical, take the bottom or right
1464 bottom of that part. E.g., Æ® 3848, Íó 1864.
1465 </pre>
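<p>
The way the top and bottom numbers combine, as in 363 (3 63) and 747 (7 47), can be sketched in a few lines of Python. The function names <code>split_de_roo</code> and <code>join_de_roo</code> are hypothetical. The sketch assumes, following the tables above, that top elements are numbered 3 to 39 and bottom elements 40 to 79, so the bottom number is always the last two digits.
</p>

```python
from typing import Tuple

# Ranges taken from the two tables above: top elements 3-39, bottom elements 40-79.
TOP_RANGE = range(3, 40)
BOTTOM_RANGE = range(40, 80)


def split_de_roo(code: int) -> Tuple[int, int]:
    """Split a De Roo number into (top, bottom), e.g. 363 -> (3, 63)."""
    top, bottom = divmod(code, 100)  # the bottom number always has two digits
    if top not in TOP_RANGE or bottom not in BOTTOM_RANGE:
        raise ValueError(f"{code} is not a valid De Roo number")
    return top, bottom


def join_de_roo(top: int, bottom: int) -> int:
    """Place the top and bottom numbers side by side, e.g. (7, 47) -> 747."""
    if top not in TOP_RANGE or bottom not in BOTTOM_RANGE:
        raise ValueError(f"invalid element numbers: top={top}, bottom={bottom}")
    return top * 100 + bottom


if __name__ == "__main__":
    for code in (363, 747, 1262, 979):  # codes that appear in the examples above
        print(code, "->", split_de_roo(code))
```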
1467 </body></html>