2 <!DOCTYPE rfc SYSTEM "rfc2629.dtd">
6 <rfc category="info" ipr="full2026"
7 docName="draft-josefsson-idn-test-vectors">
11 <title>Nameprep and IDNA Test Vectors</title>
13 <author initials="S." surname="Josefsson" fullname="Simon Josefsson">
14 <organization></organization>
17 <street>Drottningholmsv. 70</street>
18 <city>Stockholm</city> <code>112 42</code>
19 <country>Sweden</country>
21 <email>simon@josefsson.org</email>
25 <date month="February" year="2003"/>
29 <t>This document contains test vectors for Nameprep and IDNA.</t>
37 <section title="Introduction">
<t>The Nameprep and IDNA specifications lack thorough examples that
would have aided in implementing them. This document acts as a
complement to those specifications by providing such examples.</t>
<t>This document is not normative. Any errors in it should not be
treated as defining Nameprep or IDNA. Where conforming to the
specifications conflicts with producing the values given in this
document, implementations should conform to the specifications.</t>
49 <t><vspace blankLines="10000" /></t>
53 <section title="Format of Nameprep Test Vectors">
55 <t>The tests follow a certain syntax, described here by showing one
56 complete example with comments intermixed. The comments are prefixed
57 with the '#' character.</t>
61 # First the (UTF-8) string is printed as a C octet string, with
62 # characters [A-Za-z .0-9] shown inline and other characters shown
63 # escaped with \xAB where AB is the hex sequence of that octet. The
# number of octets is also shown.
69 # The input is also printed as Unicode codepoints.
# After printing the input, the Nameprep steps start. When the
75 # string is modified, the specific operation that caused it is printed
76 # along with the new string of Unicode code points.
78 # 1) Map -- For each character in the input, check if it has a mapping
79 # and, if so, replace it with its mapping. This is described in
82 Table B.2 maps U+1fb7 to U+03b1 U+0342 U+03b9.
85 # 2) Normalize -- Possibly normalize the result of step 1 using Unicode
86 # normalization. This is described in section 4.
88 Unicode normalization with form KC maps string into:
91 # 3) Prohibit -- Check for any characters that are not allowed in the
92 # output. If any are found, return an error. This is described in
95 # 4) Check bidi -- Possibly check for right-to-left characters, and if
96 # any are found, make sure that the whole string satisfies the
97 # requirements for bidirectional strings. If the string does not
98 # satisfy the requirements for bidirectional strings, return an
99 # error. This is described in section 6.
101 # 1) The characters in section 5.8 MUST be prohibited.
103 # 2) If a string contains any RandALCat character, the string MUST NOT
104 # contain any LCat character.
106 # 3) If a string contains any RandALCat character, a RandALCat
107 # character MUST be the first character of the string, and a
108 # RandALCat character MUST be the last character of the string.
110 # The output is printed as Unicode codepoints.
115 # And finally the output is printed as UTF-8
117 out (length 5 bytes):
124 <section title="Format of IDNA Test Vectors">
126 <t>The tests follow a certain syntax, described here by showing one
127 complete example with comments intermixed. The comments are prefixed
128 with the '#' character.</t>
132 # First the (UTF-8) string is printed as a C octet string, with
133 # characters [A-Za-z .0-9] shown inline and other characters shown
134 # escaped with \xAB where AB is the hex sequence of that octet. The
# number of octets is also shown.
137 in (length 39 bytes):
138 'Hello\x2DAnother\x2DWa'
139 'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
'\xAE\xE5\xA0\xB4\xE6\x89\x80'
142 # The input is also printed as Unicode codepoints.
145 U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e
146 U+006f U+0074 U+0068 U+0065 U+0072 U+002d U+0057 U+0061
147 U+0079 U+002d U+305d U+308c U+305e U+308c U+306e U+5834
150 # After printing the input, the IDNA ToASCII step starts. The output
151 # is printed as an ASCII string.
153 out: xn--hello-another-way--fc4qua05auwb3674vfr0b
159 <t><vspace blankLines="10000" /></t>
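<t>The notation used for the input lines above can be reproduced with
a few lines of C. The following is a minimal sketch and not part of
the tooling that produced the test vectors: the function names
shown_inline, escape_octets and utf8_decode are hypothetical, and the
decoder deliberately omits checks for overlong forms and
surrogates.</t>

<figure><artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero for the characters that the test
 * vector format shows inline, i.e. [A-Za-z .0-9]. */
static int shown_inline(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == ' ' || c == '.';
}

/* Write the octet-string form of s into out (outsz bytes).
 * Returns 0 on success, -1 if out is too small. */
static int escape_octets(const char *s, char *out, size_t outsz)
{
    size_t used = 0;

    assert(s != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        size_t need = shown_inline(c) ? 1 : 4;  /* "c" or "\xAB" */

        if (used + need + 1 > outsz)
            return -1;
        if (need == 1)
            out[used] = (char)c;
        else
            sprintf(out + used, "\\x%02X", (unsigned)c);
        used += need;
    }
    out[used] = '\0';
    return 0;
}

/* Decode one UTF-8 sequence starting at s into *cp.
 * Returns the number of octets consumed, or 0 if the sequence is
 * malformed. Sketch only: overlong forms and surrogates are not
 * rejected. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;

    assert(s != NULL && cp != NULL);
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        *cp = s[0] & 0x1F;
        len = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        *cp = s[0] & 0x0F;
        len = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        *cp = s[0] & 0x07;
        len = 4;
    } else {
        return 0;
    }
    for (i = 1; i < len; i++) {
        /* A continuation octet must look like 10xxxxxx; this test
         * also stops at the terminating NUL. */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}
```
]]></artwork></figure>

<t>With these helpers, the UTF-8 string "Hello-Wa" is rendered as
Hello\x2DWa, and the octets E1 BE B7 decode to the single code point
U+1FB7.</t>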
163 <section title="Nameprep Test Vectors">
165 <?rfc include="foo"?>
169 <section title="IDNA Test Vectors">
171 <?rfc include="bar"?>
175 <section title="Security Considerations">
177 <t>The security considerations from Nameprep and IDNA are
180 <t>These test vectors are not believed to introduce new security
181 considerations nor disrupt the operation of the Internet, but may
182 expose security weaknesses in existing implementations. Any such
incident should not be regarded as a problem with this document,
but rather as evidence that this document served its
193 <note title="Acknowledgments">
194 <t>Some IDNA test vectors were borrowed from Punycode <xref
195 target="RFC3492" />.</t>
198 <section title="Nameprep test vectors in C syntax">
<t>To spare implementors from typing in the test vectors above, the
data is also provided as a C structure.</t>
<t>The comment field holds the section title used in this document. The
in field contains a UTF-8 encoded string. The out field contains the
expected output, or NULL if the expected result is an error. The
profile field can be ignored. The only significant setting for the
flags field is STRINGPREP_NO_UNASSIGNED, which tells the Nameprep
implementation to reject unassigned code points, that is, to behave as
if the "AllowUnassigned" flag were not set. The rc field contains the
expected error code, where 0 indicates success and the other values
should be self
228 "foo\xC2\xAD\xCD\x8F\xE1\xA0\x86\xE1\xA0\x8B"
229 "bar""\xE2\x80\x8B\xE2\x81\xA0""baz\xEF\xB8\x80\xEF\xB8\x88"
230 "\xEF\xB8\x8F\xEF\xBB\xBF", "foobarbaz"
233 "Case folding ASCII U+0043 U+0041 U+0046 U+0045",
237 "Case folding 8bit U+00DF (german sharp s)",
241 "Case folding U+0130 (turkish capital I with dot)",
242 "\xC4\xB0", "i\xcc\x87"
245 "Case folding multibyte U+0143 U+037A",
246 "\xC5\x83\xCD\xBA", "\xC5\x84 \xCE\xB9"
249 "Case folding U+2121 U+33C6 U+1D7BB",
250 "\xE2\x84\xA1\xE3\x8F\x86\xF0\x9D\x9E\xBB",
251 "telc\xE2\x88\x95""kg\xCF\x83"
254 "Normalization of U+006a U+030c U+00A0 U+00AA",
255 "\x6A\xCC\x8C\xC2\xA0\xC2\xAA", "\xC7\xB0 a"
258 "Case folding U+1FB7 and normalization",
259 "\xE1\xBE\xB7", "\xE1\xBE\xB6\xCE\xB9"
262 "Self-reverting case folding U+01F0 and normalization",
"\xC7\xB0", "\xC7\xB0"
266 "Self-reverting case folding U+0390 and normalization",
267 "\xCE\x90", "\xCE\x90"
270 "Self-reverting case folding U+03B0 and normalization",
271 "\xCE\xB0", "\xCE\xB0"
274 "Self-reverting case folding U+1E96 and normalization",
275 "\xE1\xBA\x96", "\xE1\xBA\x96"
278 "Self-reverting case folding U+1F56 and normalization",
279 "\xE1\xBD\x96", "\xE1\xBD\x96"
282 "ASCII space character U+0020",
286 "Non-ASCII 8bit space character U+00A0",
290 "Non-ASCII multibyte space character U+1680",
291 "\xE1\x9A\x80", NULL, "Nameprep", 0,
292 STRINGPREP_CONTAINS_PROHIBITED
295 "Non-ASCII multibyte space character U+2000",
296 "\xE2\x80\x80", "\x20"
299 "Zero Width Space U+200b",
303 "Non-ASCII multibyte space character U+3000",
304 "\xE3\x80\x80", "\x20"
307 "ASCII control characters U+0010 U+007F",
308 "\x10\x7F", "\x10\x7F"
311 "Non-ASCII 8bit control character U+0085",
312 "\xC2\x85", NULL, "Nameprep", 0,
313 STRINGPREP_CONTAINS_PROHIBITED
316 "Non-ASCII multibyte control character U+180E",
317 "\xE1\xA0\x8E", NULL, "Nameprep", 0,
318 STRINGPREP_CONTAINS_PROHIBITED
321 "Zero Width No-Break Space U+FEFF",
325 "Non-ASCII control character U+1D175",
326 "\xF0\x9D\x85\xB5", NULL, "Nameprep", 0,
327 STRINGPREP_CONTAINS_PROHIBITED
330 "Plane 0 private use character U+F123",
331 "\xEF\x84\xA3", NULL, "Nameprep", 0,
332 STRINGPREP_CONTAINS_PROHIBITED
335 "Plane 15 private use character U+F1234",
336 "\xF3\xB1\x88\xB4", NULL, "Nameprep", 0,
337 STRINGPREP_CONTAINS_PROHIBITED
340 "Plane 16 private use character U+10F234",
341 "\xF4\x8F\x88\xB4", NULL, "Nameprep", 0,
342 STRINGPREP_CONTAINS_PROHIBITED
345 "Non-character code point U+8FFFE",
346 "\xF2\x8F\xBF\xBE", NULL, "Nameprep", 0,
347 STRINGPREP_CONTAINS_PROHIBITED
350 "Non-character code point U+10FFFF",
351 "\xF4\x8F\xBF\xBF", NULL, "Nameprep", 0,
352 STRINGPREP_CONTAINS_PROHIBITED
355 "Surrogate code U+DF42",
356 "\xED\xBD\x82", NULL, "Nameprep", 0,
357 STRINGPREP_CONTAINS_PROHIBITED
360 "Non-plain text character U+FFFD",
361 "\xEF\xBF\xBD", NULL, "Nameprep", 0,
362 STRINGPREP_CONTAINS_PROHIBITED
365 "Ideographic description character U+2FF5",
366 "\xE2\xBF\xB5", NULL, "Nameprep", 0,
367 STRINGPREP_CONTAINS_PROHIBITED
370 "Display property character U+0341",
371 "\xCD\x81", "\xCC\x81"
374 "Left-to-right mark U+200E",
"\xE2\x80\x8E", NULL, "Nameprep", 0,
376 STRINGPREP_CONTAINS_PROHIBITED
"\xE2\x80\xAA", NULL, "Nameprep", 0,
381 STRINGPREP_CONTAINS_PROHIBITED
384 "Language tagging character U+E0001",
"\xF3\xA0\x80\x81", NULL, "Nameprep", 0,
386 STRINGPREP_CONTAINS_PROHIBITED
389 "Language tagging character U+E0042",
390 "\xF3\xA0\x81\x82", NULL, "Nameprep", 0,
391 STRINGPREP_CONTAINS_PROHIBITED
394 "Bidi: RandALCat character U+05BE and LCat characters",
395 "foo\xD6\xBE""bar", NULL, "Nameprep", 0,
396 STRINGPREP_BIDI_BOTH_L_AND_RAL
399 "Bidi: RandALCat character U+FD50 and LCat characters",
400 "foo\xEF\xB5\x90""bar", NULL, "Nameprep", 0,
401 STRINGPREP_BIDI_BOTH_L_AND_RAL
"Bidi: RandALCat character U+FE76 and LCat characters",
405 "foo\xEF\xB9\xB6""bar", "foo \xd9\x8e""bar"
407 { "Bidi: RandALCat without trailing RandALCat U+0627 U+0031",
408 "\xD8\xA7\x31", NULL, "Nameprep", 0,
409 STRINGPREP_BIDI_LEADTRAIL_NOT_RAL}
412 "Bidi: RandALCat character U+0627 U+0031 U+0628",
413 "\xD8\xA7\x31\xD8\xA8", "\xD8\xA7\x31\xD8\xA8"
416 "Unassigned code point U+E0002",
417 "\xF3\xA0\x80\x82", NULL, "Nameprep", STRINGPREP_NO_UNASSIGNED,
418 STRINGPREP_CONTAINS_UNASSIGNED
421 "Larger test (shrinking)",
"X\xC2\xAD\xC3\x9F\xC4\xB0\xE2\x84\xA1\x6a\xcc\x8c\xc2\xa0\xc2"
423 "\xaa\xce\xb0\xe2\x80\x80", "xssi\xcc\x87""tel\xc7\xb0 a\xce\xb0 ",
427 "Larger test (expanding)",
"X\xC3\x9F\xe3\x8c\x96\xC4\xB0\xE2\x84\xA1\xE2\x92\x9F\xE3\x8c\x80",
429 "xss\xe3\x82\xad\xe3\x83\xad\xe3\x83\xa1\xe3\x83\xbc\xe3\x83\x88"
430 "\xe3\x83\xab""i\xcc\x87""tel\x28""d\x29\xe3\x82\xa2\xe3\x83\x91"
431 "\xe3\x83\xbc\xe3\x83\x88"
439 <section title="IDNA test vectors in C syntax">
<t>To spare implementors from typing in the IDNA test vectors above,
the data is also provided as a C structure.</t>
<t>The name field holds the section title used in this document. The in
field contains Unicode code points and the inlen field gives their
number. The out field contains the expected ToASCII output. The
allowunassigned and usestd3asciirules fields can be ignored. The
toasciirc and tounicoderc fields contain the expected error codes,
where 0 indicates success and the other values should be
self-explanatory.</t>
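<t>One way a test harness might consume such a structure is sketched
here. This is an illustration under stated assumptions, not the
author's harness: struct vec_sketch, toascii_fn, ascii_only_toascii
and run_vectors are hypothetical names, the struct mirrors only some
of the fields described above, and the stand-in conversion handles
nothing but labels that are already ASCII.</t>

<figure><artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of some of the fields described in the text;
 * this is not the structure defined by the draft. */
struct vec_sketch {
    const char *name;       /* section title */
    size_t inlen;           /* number of code points in in[] */
    unsigned long in[100];  /* Unicode code points */
    const char *out;        /* expected ToASCII output */
    int toasciirc;          /* expected result, 0 means success */
};

/* Hypothetical callback type: convert inlen code points into an
 * ASCII label stored in out (outsz bytes); return 0 on success. */
typedef int (*toascii_fn)(const unsigned long *in, size_t inlen,
                          char *out, size_t outsz);

/* Stand-in for a real ToASCII implementation. It copies labels that
 * are already ASCII and reports an error for everything else. */
static int ascii_only_toascii(const unsigned long *in, size_t inlen,
                              char *out, size_t outsz)
{
    size_t i;

    if (inlen + 1 > outsz)
        return -1;
    for (i = 0; i < inlen; i++) {
        if (in[i] >= 0x80)
            return -1;
        out[i] = (char)in[i];
    }
    out[inlen] = '\0';
    return 0;
}

/* Run fn over n vectors and return how many disagree with the
 * expected result code or the expected output string. */
static size_t run_vectors(const struct vec_sketch *v, size_t n,
                          toascii_fn fn)
{
    size_t i, failures = 0;
    char buf[256];

    assert(v != NULL && fn != NULL);
    for (i = 0; i < n; i++) {
        int rc = fn(v[i].in, v[i].inlen, buf, sizeof buf);
        int success_expected = (v[i].toasciirc == 0);

        if ((rc == 0) != success_expected)
            failures++;             /* wrong success/error outcome */
        else if (rc == 0 && strcmp(buf, v[i].out) != 0)
            failures++;             /* right outcome, wrong output */
    }
    return failures;
}
```
]]></artwork></figure>

<t>Replacing the stand-in with a real ToASCII implementation lets the
same loop report which vectors that implementation fails.</t>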
457 unsigned long in[100];
460 int usestd3asciirules;
466 "Arabic (Egyptian)", 17,
468 0x0644, 0x064A, 0x0647, 0x0645, 0x0627, 0x0628, 0x062A, 0x0643,
469 0x0644, 0x0645, 0x0648, 0x0634, 0x0639, 0x0631, 0x0628, 0x064A,
471 IDNA_ACE_PREFIX "egbpdaj6bu4bxfgehfvwxn", 0, 0, IDNA_SUCCESS,
474 "Chinese (simplified)", 9,
476 0x4ED6, 0x4EEC, 0x4E3A, 0x4EC0, 0x4E48, 0x4E0D, 0x8BF4, 0x4E2D, 0x6587},
477 IDNA_ACE_PREFIX "ihqwcrb4cv8a8dqg056pqjye", 0, 0, IDNA_SUCCESS,
480 "Chinese (traditional)", 9,
482 0x4ED6, 0x5011, 0x7232, 0x4EC0, 0x9EBD, 0x4E0D, 0x8AAA, 0x4E2D, 0x6587},
483 IDNA_ACE_PREFIX "ihqwctvzc91f659drss3x8bo0yb", 0, 0, IDNA_SUCCESS,
488 0x0050, 0x0072, 0x006F, 0x010D, 0x0070, 0x0072, 0x006F, 0x0073,
489 0x0074, 0x011B, 0x006E, 0x0065, 0x006D, 0x006C, 0x0075, 0x0076,
490 0x00ED, 0x010D, 0x0065, 0x0073, 0x006B, 0x0079},
491 IDNA_ACE_PREFIX "Proprostnemluvesky-uyb24dma41a", 0, 0, IDNA_SUCCESS,
496 0x05DC, 0x05DE, 0x05D4, 0x05D4, 0x05DD, 0x05E4, 0x05E9, 0x05D5,
497 0x05D8, 0x05DC, 0x05D0, 0x05DE, 0x05D3, 0x05D1, 0x05E8, 0x05D9,
498 0x05DD, 0x05E2, 0x05D1, 0x05E8, 0x05D9, 0x05EA},
499 IDNA_ACE_PREFIX "4dbcagdahymbxekheh6e0a7fei0b", 0, 0, IDNA_SUCCESS,
502 "Hindi (Devanagari)", 30,
504 0x092F, 0x0939, 0x0932, 0x094B, 0x0917, 0x0939, 0x093F, 0x0928,
505 0x094D, 0x0926, 0x0940, 0x0915, 0x094D, 0x092F, 0x094B, 0x0902,
506 0x0928, 0x0939, 0x0940, 0x0902, 0x092C, 0x094B, 0x0932, 0x0938,
507 0x0915, 0x0924, 0x0947, 0x0939, 0x0948, 0x0902},
508 IDNA_ACE_PREFIX "i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd", 0, 0,
511 "Japanese (kanji and hiragana)", 18,
513 0x306A, 0x305C, 0x307F, 0x3093, 0x306A, 0x65E5, 0x672C, 0x8A9E,
514 0x3092, 0x8A71, 0x3057, 0x3066, 0x304F, 0x308C, 0x306A, 0x3044,
516 IDNA_ACE_PREFIX "n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa", 0, 0,
519 "Russian (Cyrillic)", 28,
521 0x043F, 0x043E, 0x0447, 0x0435, 0x043C, 0x0443, 0x0436, 0x0435,
522 0x043E, 0x043D, 0x0438, 0x043D, 0x0435, 0x0433, 0x043E, 0x0432,
523 0x043E, 0x0440, 0x044F, 0x0442, 0x043F, 0x043E, 0x0440, 0x0443,
524 0x0441, 0x0441, 0x043A, 0x0438},
525 IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
526 IDNA_SUCCESS, IDNA_SUCCESS},
530 0x0050, 0x006F, 0x0072, 0x0071, 0x0075, 0x00E9, 0x006E, 0x006F,
531 0x0070, 0x0075, 0x0065, 0x0064, 0x0065, 0x006E, 0x0073, 0x0069,
532 0x006D, 0x0070, 0x006C, 0x0065, 0x006D, 0x0065, 0x006E, 0x0074,
533 0x0065, 0x0068, 0x0061, 0x0062, 0x006C, 0x0061, 0x0072, 0x0065,
534 0x006E, 0x0045, 0x0073, 0x0070, 0x0061, 0x00F1, 0x006F, 0x006C},
535 IDNA_ACE_PREFIX "PorqunopuedensimplementehablarenEspaol-fmd56a", 0, 0,
540 0x0054, 0x1EA1, 0x0069, 0x0073, 0x0061, 0x006F, 0x0068, 0x1ECD,
541 0x006B, 0x0068, 0x00F4, 0x006E, 0x0067, 0x0074, 0x0068, 0x1EC3,
542 0x0063, 0x0068, 0x1EC9, 0x006E, 0x00F3, 0x0069, 0x0074, 0x0069,
543 0x1EBF, 0x006E, 0x0067, 0x0056, 0x0069, 0x1EC7, 0x0074},
544 IDNA_ACE_PREFIX "TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g", 0, 0,
549 0x0033, 0x5E74, 0x0042, 0x7D44, 0x91D1, 0x516B, 0x5148, 0x751F},
550 IDNA_ACE_PREFIX "3B-ww4c5e180e575a65lsy2b", 0, 0, IDNA_SUCCESS,
555 0x5B89, 0x5BA4, 0x5948, 0x7F8E, 0x6075, 0x002D, 0x0077, 0x0069,
556 0x0074, 0x0068, 0x002D, 0x0053, 0x0055, 0x0050, 0x0045, 0x0052,
557 0x002D, 0x004D, 0x004F, 0x004E, 0x004B, 0x0045, 0x0059, 0x0053},
558 IDNA_ACE_PREFIX "-with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n", 0, 0,
563 0x0048, 0x0065, 0x006C, 0x006C, 0x006F, 0x002D, 0x0041, 0x006E,
564 0x006F, 0x0074, 0x0068, 0x0065, 0x0072, 0x002D, 0x0057, 0x0061,
565 0x0079, 0x002D, 0x305D, 0x308C, 0x305E, 0x308C, 0x306E, 0x5834,
567 IDNA_ACE_PREFIX "Hello-Another-Way--fc4qua05auwb3674vfr0b", 0, 0,
572 0x3072, 0x3068, 0x3064, 0x5C4B, 0x6839, 0x306E, 0x4E0B, 0x0032},
573 IDNA_ACE_PREFIX "2-u9tlzr9756bt3uc0v", 0, 0, IDNA_SUCCESS,
578 0x004D, 0x0061, 0x006A, 0x0069, 0x3067, 0x004B, 0x006F, 0x0069,
579 0x3059, 0x308B, 0x0035, 0x79D2, 0x524D},
580 IDNA_ACE_PREFIX "MajiKoi5-783gue6qz075azm5e", 0, 0, IDNA_SUCCESS,
585 0x30D1, 0x30D5, 0x30A3, 0x30FC, 0x0064, 0x0065, 0x30EB, 0x30F3, 0x30D0},
586 IDNA_ACE_PREFIX "de-jg4avhby1noc0d", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
590 0x305D, 0x306E, 0x30B9, 0x30D4, 0x30FC, 0x30C9, 0x3067},
591 IDNA_ACE_PREFIX "d9juau41awczczp", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
594 {0x03b5, 0x03bb, 0x03bb, 0x03b7, 0x03bd, 0x03b9, 0x03ba, 0x03ac},
595 IDNA_ACE_PREFIX "hxargifdar", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
597 "Maltese (Malti)", 10,
598 {0x0062, 0x006f, 0x006e, 0x0121, 0x0075, 0x0073, 0x0061, 0x0127,
600 IDNA_ACE_PREFIX "bonusaa-5bb1da", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
602 "Russian (Cyrillic)", 28,
603 {0x043f, 0x043e, 0x0447, 0x0435, 0x043c, 0x0443, 0x0436, 0x0435,
604 0x043e, 0x043d, 0x0438, 0x043d, 0x0435, 0x0433, 0x043e, 0x0432,
605 0x043e, 0x0440, 0x044f, 0x0442, 0x043f, 0x043e, 0x0440, 0x0443,
606 0x0441, 0x0441, 0x043a, 0x0438},
607 IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
608 IDNA_SUCCESS, IDNA_SUCCESS},
615 <references title="Normative References">
616 <?rfc include="reference.RFC.3491.xml"?>
617 <?rfc include="reference.RFC.3490.xml"?>
620 <references title="Informative References">
621 <?rfc include="reference.RFC.3492.xml"?>