Network Working Group                                       S. Josefsson
Internet-Draft                                             February 2003
Expires: August 2, 2003

                     Nameprep and IDNA Test Vectors
                    draft-josefsson-idn-test-vectors
13 This document is an Internet-Draft and is in full conformance with
14 all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
21 Internet-Drafts are draft documents valid for a maximum of six months
22 and may be updated, replaced, or obsoleted by other documents at any
23 time. It is inappropriate to use Internet-Drafts as reference
24 material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
29 The list of Internet-Draft Shadow Directories can be accessed at
30 http://www.ietf.org/shadow.html.
32 This Internet-Draft will expire on August 2, 2003.
Abstract

This document contains test vectors for Nameprep and IDNA.
54 Josefsson Expires August 2, 2003 [Page 1]
56 Internet-Draft Nameprep and IDNA Test Vectors February 2003
Table of Contents

1. Introduction
2. Format of Nameprep Test Vectors
3. Format of IDNA Test Vectors
4. Nameprep Test Vectors
4.1 Map to nothing
4.2 Case folding ASCII U+0043 U+0041 U+0046 U+0045
4.3 Case folding 8bit U+00DF (german sharp s)
4.4 Case folding U+0130 (turkish capital I with dot)
4.5 Case folding multibyte U+0143 U+037A
4.6 Case folding U+2121 U+33C6 U+1D7BB
4.7 Normalization of U+006a U+030c U+00A0 U+00AA
4.8 Case folding U+1FB7 and normalization
4.9 Self-reverting case folding U+01F0 and normalization
4.10 Self-reverting case folding U+0390 and normalization
4.11 Self-reverting case folding U+03B0 and normalization
4.12 Self-reverting case folding U+1E96 and normalization
4.13 Self-reverting case folding U+1F56 and normalization
4.14 ASCII space character U+0020
4.15 Non-ASCII 8bit space character U+00A0
4.16 Non-ASCII multibyte space character U+1680
4.17 Non-ASCII multibyte space character U+2000
4.18 Zero Width Space U+200b
4.19 Non-ASCII multibyte space character U+3000
4.20 ASCII control characters U+0010 U+007F
4.21 Non-ASCII 8bit control character U+0085
4.22 Non-ASCII multibyte control character U+180E
4.23 Zero Width No-Break Space U+FEFF
4.24 Non-ASCII control character U+1D175
4.25 Plane 0 private use character U+F123
4.26 Plane 15 private use character U+F1234
4.27 Plane 16 private use character U+10F234
4.28 Non-character code point U+8FFFE
4.29 Non-character code point U+10FFFF
4.30 Surrogate code U+DF42
4.31 Non-plain text character U+FFFD
4.32 Ideographic description character U+2FF5
4.33 Display property character U+0341
4.34 Left-to-right mark U+200E
4.35 Deprecated U+202A
4.36 Language tagging character U+E0001
4.37 Language tagging character U+E0042
4.38 Bidi: RandALCat character U+05BE and LCat characters
4.39 Bidi: RandALCat character U+FD50 and LCat characters
4.40 Bidi: RandALCat character U+FB38 and LCat characters
4.41 Bidi: RandALCat without trailing RandALCat U+0627 U+0031
4.42 Bidi: RandALCat character U+0627 U+0031 U+0628
4.43 Unassigned code point U+E0002
4.44 Larger test (shrinking)
4.45 Larger test (expanding)
5. IDNA Test Vectors
5.1 Arabic (Egyptian)
5.2 Chinese (simplified)
5.3 Chinese (traditional)
5.4 Czech
5.5 Hebrew
5.6 Hindi (Devanagari)
5.7 Japanese (kanji and hiragana)
5.8 Russian (Cyrillic)
5.9 Spanish
5.10 Vietnamese
5.11 Japanese
5.12 Japanese
5.13 Japanese
5.14 Japanese
5.15 Japanese
5.16 Japanese
5.17 Japanese
5.18 Greek
5.19 Maltese (Malti)
5.20 Russian (Cyrillic)
6. Security Considerations
Author's Address
Normative References
Informative References
A. Nameprep test vectors in C syntax
B. IDNA test vectors in C syntax
Intellectual Property and Copyright Statements
1. Introduction

The Nameprep and IDNA specifications lack thorough examples that
would have aided in implementing them. This document acts as a
complement to those specifications by providing such examples.
This document is not normative. Errors in it should not be treated
as defining Nameprep or IDNA. Where conforming to the specifications
conflicts with generating the output shown in this document,
implementations should conform to the specifications.
227 2. Format of Nameprep Test Vectors
229 The tests follow a certain syntax, described here by showing one
230 complete example with comments intermixed. The comments are prefixed
231 with the '#' character.
# First the (UTF-8) string is printed as a C octet string, with
# characters [A-Za-z .0-9] shown inline and other characters shown
# escaped as \xAB, where AB is the hexadecimal value of that octet.
# The number of octets is also shown.
241 # The input is also printed as Unicode codepoints.
# After printing the input, the Nameprep steps start. When the
# string is modified, the specific operation that caused the change
# is printed along with the new string of Unicode code points.
# 1) Map -- For each character in the input, check if it has a mapping
# and, if so, replace it with its mapping. This is described in
# section 3.
254 Table B.2 maps U+1fb7 to U+03b1 U+0342 U+03b9.
257 # 2) Normalize -- Possibly normalize the result of step 1 using Unicode
258 # normalization. This is described in section 4.
260 Unicode normalization with form KC maps string into:
# 3) Prohibit -- Check for any characters that are not allowed in the
# output. If any are found, return an error. This is described in
# section 5.
267 # 4) Check bidi -- Possibly check for right-to-left characters, and if
268 # any are found, make sure that the whole string satisfies the
269 # requirements for bidirectional strings. If the string does not
270 # satisfy the requirements for bidirectional strings, return an
271 # error. This is described in section 6.
273 # 1) The characters in section 5.8 MUST be prohibited.
283 # 2) If a string contains any RandALCat character, the string MUST NOT
284 # contain any LCat character.
286 # 3) If a string contains any RandALCat character, a RandALCat
287 # character MUST be the first character of the string, and a
288 # RandALCat character MUST be the last character of the string.
290 # The output is printed as Unicode codepoints.
295 # And finally the output is printed as UTF-8
297 out (length 5 bytes):
301 3. Format of IDNA Test Vectors
303 The tests follow a certain syntax, described here by showing one
304 complete example with comments intermixed. The comments are prefixed
305 with the '#' character.
# First the (UTF-8) string is printed as a C octet string, with
# characters [A-Za-z .0-9] shown inline and other characters shown
# escaped as \xAB, where AB is the hexadecimal value of that octet.
# The number of octets is also shown.
312 in (length 39 bytes):
313 'Hello\x2DAnother\x2DWa'
314 'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
315 '\xAE\xE5\xA0\xB4\xE6\x89\x80
317 # The input is also printed as Unicode codepoints.
320 U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e
321 U+006f U+0074 U+0068 U+0065 U+0072 U+002d U+0057 U+0061
322 U+0079 U+002d U+305d U+308c U+305e U+308c U+306e U+5834
325 # After printing the input, the IDNA ToASCII step starts. The output
326 # is printed as an ASCII string.
328 out: xn--hello-another-way--fc4qua05auwb3674vfr0b
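
The octet-string notation used for the "in" lines can be produced
with a few lines of C. The following is a minimal sketch, not code
from this document: the function name format_octets and its
buffer-based interface are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of this document): write the octets
   of `in` to `out` in the notation of the test vectors.  Octets in
   [A-Za-z .0-9] are copied unchanged; every other octet becomes \xAB
   with two uppercase hexadecimal digits.  Returns 0 on success, or
   -1 if `out` (of size `outsize`) is too small. */
int
format_octets (const char *in, char *out, size_t outsize)
{
  size_t used = 0;

  if (outsize == 0)
    return -1;

  for (; *in != '\0'; in++)
    {
      unsigned char c = (unsigned char) *in;
      int plain = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == ' ' || c == '.';
      size_t need = plain ? 1 : 4;

      /* Always leave room for the terminating NUL. */
      if (used + need + 1 > outsize)
        return -1;

      if (plain)
        out[used] = (char) c;
      else
        snprintf (out + used, 5, "\\x%02X", (unsigned int) c);
      used += need;
    }

  out[used] = '\0';
  return 0;
}
```

Note that the hyphen is not in the inline set, which is why the
example above shows it as \x2D.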
4. Nameprep Test Vectors

4.1 Map to nothing
343 in (length 37 bytes):
344 foo\xC2\xAD\xCD\x8F\xE1\xA0\x86\xE1\xA0\x8Bbar'
345 '\xE2\x80\x8B\xE2\x81\xA0baz\xEF\xB8\x80\xEF\xB8\x88\xEF'
346 '\xB8\x8F\xEF\xBB\xBF
348 U+0066 U+006f U+006f U+00ad U+034f U+1806 U+180b U+0062
349 U+0061 U+0072 U+200b U+2060 U+0062 U+0061 U+007a U+fe00
352 Table B.1 maps U+00ad to nothing.
353 Table B.1 maps U+034f to nothing.
354 Table B.1 maps U+1806 to nothing.
355 Table B.1 maps U+180b to nothing.
356 Table B.1 maps U+200b to nothing.
357 Table B.1 maps U+2060 to nothing.
358 Table B.1 maps U+fe00 to nothing.
359 Table B.1 maps U+fe08 to nothing.
360 Table B.1 maps U+fe0f to nothing.
361 Table B.1 maps U+feff to nothing.
362 U+0066 U+006f U+006f U+0062 U+0061 U+0072 U+0062 U+0061
366 U+0066 U+006f U+006f U+0062 U+0061 U+0072 U+0062 U+0061
368 out (length 9 bytes):
395 4.2 Case folding ASCII U+0043 U+0041 U+0046 U+0045
400 U+0043 U+0041 U+0046 U+0045
402 Table B.2 maps U+0043 to U+0063.
403 Table B.2 maps U+0041 to U+0061.
404 Table B.2 maps U+0046 to U+0066.
405 Table B.2 maps U+0045 to U+0065.
406 U+0063 U+0061 U+0066 U+0065
409 U+0063 U+0061 U+0066 U+0065
410 out (length 4 bytes):
413 4.3 Case folding 8bit U+00DF (german sharp s)
420 Table B.2 maps U+00df to U+0073 U+0073.
425 out (length 2 bytes):
451 4.4 Case folding U+0130 (turkish capital I with dot)
458 Table B.2 maps U+0130 to U+0069 U+0307.
463 out (length 3 bytes):
466 4.5 Case folding multibyte U+0143 U+037A
473 Table B.2 maps U+0143 to U+0144.
474 Table B.2 maps U+037a to U+0020 U+03b9.
479 out (length 5 bytes):
507 4.6 Case folding U+2121 U+33C6 U+1D7BB
509 in (length 10 bytes):
510 \xE2\x84\xA1\xE3\x8F\x86\xF0\x9D\x9E\xBB
512 U+2121 U+33c6 U+1d7bb
514 Table B.2 maps U+2121 to U+0074 U+0065 U+006c.
515 Table B.2 maps U+33c6 to U+0063 U+2215 U+006b U+0067.
516 Table B.2 maps U+1d7bb to U+03c3.
517 U+0074 U+0065 U+006c U+0063 U+2215 U+006b U+0067 U+03c3
521 U+0074 U+0065 U+006c U+0063 U+2215 U+006b U+0067 U+03c3
523 out (length 11 bytes):
524 telc\xE2\x88\x95kg\xCF\x83
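
The octet strings in these tests are the UTF-8 encodings of the
listed code points; for example, U+1D7BB corresponds to
\xF0\x9D\x9E\xBB. A minimal C sketch of that encoding follows. The
name utf8_encode is hypothetical, and the function deliberately does
not reject surrogate code points, since a later test in this section
feeds one to the implementation.

```c
#include <stddef.h>

/* Hypothetical helper (not part of this document): encode one code
   point as UTF-8 into buf, which must hold at least 4 octets.
   Returns the number of octets written, or 0 if cp is above
   U+10FFFF. */
size_t
utf8_encode (unsigned long cp, unsigned char *buf)
{
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

Summing the return values over a list of code points gives the
"length N bytes" figure printed with each vector.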
526 4.7 Normalization of U+006a U+030c U+00A0 U+00AA
529 j\xCC\x8C\xC2\xA0\xC2\xAA
531 U+006a U+030c U+00a0 U+00aa
533 Unicode normalization with form KC maps string into:
538 out (length 4 bytes):
563 4.8 Case folding U+1FB7 and normalization
570 Table B.2 maps U+1fb7 to U+03b1 U+0342 U+03b9.
572 Unicode normalization with form KC maps string into:
577 out (length 5 bytes):
580 4.9 Self-reverting case folding U+01F0 and normalization
587 Table B.2 maps U+01f0 to U+006a U+030c.
589 Unicode normalization with form KC maps string into:
594 out (length 2 bytes):
619 4.10 Self-reverting case folding U+0390 and normalization
626 Table B.2 maps U+0390 to U+03b9 U+0308 U+0301.
628 Unicode normalization with form KC maps string into:
633 out (length 2 bytes):
636 4.11 Self-reverting case folding U+03B0 and normalization
643 Table B.2 maps U+03b0 to U+03c5 U+0308 U+0301.
645 Unicode normalization with form KC maps string into:
650 out (length 2 bytes):
675 4.12 Self-reverting case folding U+1E96 and normalization
682 Table B.2 maps U+1e96 to U+0068 U+0331.
684 Unicode normalization with form KC maps string into:
689 out (length 3 bytes):
692 4.13 Self-reverting case folding U+1F56 and normalization
699 Table B.2 maps U+1f56 to U+03c5 U+0313 U+0342.
701 Unicode normalization with form KC maps string into:
706 out (length 3 bytes):
709 4.14 ASCII space character U+0020
719 out (length 1 bytes):
731 4.15 Non-ASCII 8bit space character U+00A0
738 Unicode normalization with form KC maps string into:
743 out (length 1 bytes):
746 4.16 Non-ASCII multibyte space character U+1680
753 Table C.1.2 prohibits string (character U+1680).
756 4.17 Non-ASCII multibyte space character U+2000
763 Unicode normalization with form KC maps string into:
768 out (length 1 bytes):
787 4.18 Zero Width Space U+200b
794 Table B.1 maps U+200b to nothing.
799 out (length 0 bytes):
802 4.19 Non-ASCII multibyte space character U+3000
809 Unicode normalization with form KC maps string into:
814 out (length 1 bytes):
817 4.20 ASCII control characters U+0010 U+007F
827 out (length 2 bytes):
843 4.21 Non-ASCII 8bit control character U+0085
850 Table C.2.2 prohibits string (character U+0085).
853 4.22 Non-ASCII multibyte control character U+180E
860 Table C.2.2 prohibits string (character U+180e).
863 4.23 Zero Width No-Break Space U+FEFF
870 Table B.1 maps U+feff to nothing.
875 out (length 0 bytes):
878 4.24 Non-ASCII control character U+1D175
885 Table C.2.2 prohibits string (character U+1d175).
899 4.25 Plane 0 private use character U+F123
906 Table C.3 prohibits string (character U+f123).
909 4.26 Plane 15 private use character U+F1234
916 Table C.3 prohibits string (character U+f1234).
919 4.27 Plane 16 private use character U+10F234
926 Table C.3 prohibits string (character U+10f234).
929 4.28 Non-character code point U+8FFFE
936 Table C.4 prohibits string (character U+8fffe).
955 4.29 Non-character code point U+10FFFF
962 Table C.4 prohibits string (character U+10ffff).
965 4.30 Surrogate code U+DF42
972 Table C.5 prohibits string (character U+df42).
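
Stringprep tables C.3 (private use), C.4 (non-character code points)
and C.5 (surrogate codes) consist of plain ranges, so the checks
exercised by tests 4.25 through 4.30 can be sketched compactly. The
function name prohibited_table is hypothetical, and the sketch covers
only these three tables; the other C tables are not handled.

```c
#include <stddef.h>

/* Hypothetical helper (not part of this document): return the name
   of the stringprep table (C.3, C.4 or C.5) that prohibits cp, or
   NULL if none of these three tables lists it. */
const char *
prohibited_table (unsigned long cp)
{
  /* C.5: surrogate codes. */
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return "C.5";

  /* C.3: private use, plane 0, plane 15 and plane 16. */
  if ((cp >= 0xE000 && cp <= 0xF8FF)
      || (cp >= 0xF0000 && cp <= 0xFFFFD)
      || (cp >= 0x100000 && cp <= 0x10FFFD))
    return "C.3";

  /* C.4: U+FDD0..U+FDEF and the last two code points of each of the
     17 planes (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF). */
  if ((cp >= 0xFDD0 && cp <= 0xFDEF)
      || (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE))
    return "C.4";

  return NULL;
}
```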
975 4.31 Non-plain text character U+FFFD
982 Table C.6 prohibits string (character U+fffd).
985 4.32 Ideographic description character U+2FF5
992 Table C.7 prohibits string (character U+2ff5).
1011 4.33 Display property character U+0341
1013 in (length 2 bytes):
1018 Unicode normalization with form KC maps string into:
1023 out (length 2 bytes):
1026 4.34 Left-to-right mark U+200E
1028 in (length 3 bytes):
1033 Table C.8 prohibits string (character U+200e).
1036 4.35 Deprecated U+202A
1038 in (length 3 bytes):
1043 Table C.8 prohibits string (character U+202a).
1046 4.36 Language tagging character U+E0001
1048 in (length 4 bytes):
1053 Table C.9 prohibits string (character U+e0001).
1067 4.37 Language tagging character U+E0042
1069 in (length 4 bytes):
1074 Table C.9 prohibits string (character U+e0042).
1077 4.38 Bidi: RandALCat character U+05BE and LCat characters
1079 in (length 8 bytes):
1082 U+0066 U+006f U+006f U+05be U+0062 U+0061 U+0072
1084 String contains both L and RAL characters.
1087 4.39 Bidi: RandALCat character U+FD50 and LCat characters
1089 in (length 9 bytes):
1092 U+0066 U+006f U+006f U+fd50 U+0062 U+0061 U+0072
1094 Unicode normalization with form KC maps string into:
1095 U+0066 U+006f U+006f U+062a U+062c U+0645 U+0062 U+0061
1097 String contains both L and RAL characters.
1123 4.40 Bidi: RandALCat character U+FB38 and LCat characters
1125 in (length 9 bytes):
1128 U+0066 U+006f U+006f U+fe76 U+0062 U+0061 U+0072
1130 Unicode normalization with form KC maps string into:
1131 U+0066 U+006f U+006f U+0020 U+064e U+0062 U+0061 U+0072
1135 U+0066 U+006f U+006f U+0020 U+064e U+0062 U+0061 U+0072
1137 out (length 9 bytes):
1140 4.41 Bidi: RandALCat without trailing RandALCat U+0627 U+0031
1142 in (length 3 bytes):
1147 Bidi string does not start/end with RAL characters.
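
The two bidi errors seen in tests 4.38 through 4.41 correspond to
bidi requirements 2 and 3. The following C sketch shows the order of
the checks. It is an illustration under loud assumptions: the
classifiers is_randalcat and is_lcat cover only a small subset of
stringprep tables D.1 and D.2 (enough for the code points used in
this section), the enum names are invented here, and requirement 1
(prohibited characters) is not checked.

```c
#include <stddef.h>

/* ASSUMPTION: tiny subset of table D.1 (RandALCat), for illustration. */
static int
is_randalcat (unsigned long cp)
{
  return cp == 0x05BE || (cp >= 0x05D0 && cp <= 0x05EA)
    || (cp >= 0x0621 && cp <= 0x063A) || (cp >= 0x0640 && cp <= 0x064A);
}

/* ASSUMPTION: tiny subset of table D.2 (LCat), ASCII letters only. */
static int
is_lcat (unsigned long cp)
{
  return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
}

/* Hypothetical result codes, not taken from any real library. */
enum bidi_result { BIDI_OK = 0, BIDI_BOTH_L_AND_RAL, BIDI_BAD_ENDS };

enum bidi_result
check_bidi (const unsigned long *s, size_t len)
{
  int have_ral = 0, have_l = 0;
  size_t i;

  for (i = 0; i < len; i++)
    {
      if (is_randalcat (s[i]))
        have_ral = 1;
      if (is_lcat (s[i]))
        have_l = 1;
    }

  /* Without any RandALCat character, requirements 2 and 3 do not apply. */
  if (!have_ral)
    return BIDI_OK;
  /* Requirement 2: no LCat character next to RandALCat characters. */
  if (have_l)
    return BIDI_BOTH_L_AND_RAL;
  /* Requirement 3: first and last character must be RandALCat. */
  if (!is_randalcat (s[0]) || !is_randalcat (s[len - 1]))
    return BIDI_BAD_ENDS;
  return BIDI_OK;
}
```

The digit U+0031 belongs to neither category, which is why it may
appear between two RandALCat characters but not at the end of the
string.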
1150 4.42 Bidi: RandALCat character U+0627 U+0031 U+0628
1152 in (length 5 bytes):
1155 U+0627 U+0031 U+0628
1159 U+0627 U+0031 U+0628
1160 out (length 5 bytes):
1179 4.43 Unassigned code point U+E0002
1181 in (length 4 bytes):
1186 Table A.1 prohibits string (unassigned character U+e0002).
1189 4.44 Larger test (shrinking)
1191 in (length 22 bytes):
'X\xC2\xAD\xC3\x9F\xC4\xB0\xE2\x84\xA1j\xCC\x8C\xC2\xA0\xC2'
'\xAA\xCE\xB0\xE2\x80\x80'
1195 U+0058 U+00ad U+00df U+0130 U+2121 U+006a U+030c U+00a0
1196 U+00aa U+03b0 U+2000
1198 Table B.1 maps U+00ad to nothing.
1199 U+0058 U+00df U+0130 U+2121 U+006a U+030c U+00a0 U+00aa
1201 Table B.2 maps U+0058 to U+0078.
1202 Table B.2 maps U+00df to U+0073 U+0073.
1203 Table B.2 maps U+0130 to U+0069 U+0307.
1204 Table B.2 maps U+2121 to U+0074 U+0065 U+006c.
1205 Table B.2 maps U+03b0 to U+03c5 U+0308 U+0301.
1206 U+0078 U+0073 U+0073 U+0069 U+0307 U+0074 U+0065 U+006c
1207 U+006a U+030c U+00a0 U+00aa U+03c5 U+0308 U+0301 U+2000
1209 Unicode normalization with form KC maps string into:
1210 U+0078 U+0073 U+0073 U+0069 U+0307 U+0074 U+0065 U+006c
1211 U+01f0 U+0020 U+0061 U+03b0 U+0020
1214 U+0078 U+0073 U+0073 U+0069 U+0307 U+0074 U+0065 U+006c
1215 U+01f0 U+0020 U+0061 U+03b0 U+0020
1216 out (length 16 bytes):
1217 xssi\xCC\x87tel\xC7\xB0 a\xCE\xB0
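
The first step of the test above removes U+00AD by means of Table
B.1 ("commonly mapped to nothing"). That table is small enough to
write out in full. The following C sketch filters an array of code
points in place; the function names in_table_b1 and map_to_nothing
are assumptions made for illustration.

```c
#include <stddef.h>

/* Returns 1 if cp is listed in stringprep Table B.1, else 0. */
int
in_table_b1 (unsigned long cp)
{
  switch (cp)
    {
    case 0x00AD: case 0x034F: case 0x1806:
    case 0x180B: case 0x180C: case 0x180D:
    case 0x200B: case 0x200C: case 0x200D:
    case 0x2060: case 0xFEFF:
      return 1;
    default:
      /* Variation selectors U+FE00..U+FE0F. */
      return cp >= 0xFE00 && cp <= 0xFE0F;
    }
}

/* Hypothetical helper: drop all Table B.1 code points from s (in
   place) and return the new length. */
size_t
map_to_nothing (unsigned long *s, size_t len)
{
  size_t i, kept = 0;

  for (i = 0; i < len; i++)
    if (!in_table_b1 (s[i]))
      s[kept++] = s[i];
  return kept;
}
```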
1235 4.45 Larger test (expanding)
1237 in (length 17 bytes):
'X\xC3\x9F\xE3\x8C\x96\xC4\xB0\xE2\x84\xA1\xE2\x92\x9F\xE3\x8C'
1241 U+0058 U+00df U+3316 U+0130 U+2121 U+249f U+3300
1243 Table B.2 maps U+0058 to U+0078.
1244 Table B.2 maps U+00df to U+0073 U+0073.
1245 Table B.2 maps U+0130 to U+0069 U+0307.
1246 Table B.2 maps U+2121 to U+0074 U+0065 U+006c.
1247 U+0078 U+0073 U+0073 U+3316 U+0069 U+0307 U+0074 U+0065
1248 U+006c U+249f U+3300
1249 Unicode normalization with form KC maps string into:
1250 U+0078 U+0073 U+0073 U+30ad U+30ed U+30e1 U+30fc U+30c8
1251 U+30eb U+0069 U+0307 U+0074 U+0065 U+006c U+0028 U+0064
1252 U+0029 U+30a2 U+30d1 U+30fc U+30c8
1255 U+0078 U+0073 U+0073 U+30ad U+30ed U+30e1 U+30fc U+30c8
1256 U+30eb U+0069 U+0307 U+0074 U+0065 U+006c U+0028 U+0064
1257 U+0029 U+30a2 U+30d1 U+30fc U+30c8
1258 out (length 42 bytes):
1259 xss\xE3\x82\xAD\xE3\x83\xAD\xE3\x83\xA1\xE3\x83\xBC\xE3'
1260 '\x83\x88\xE3\x83\xABi\xCC\x87tel\x28d\x29\xE3\x82'
1261 '\xA2\xE3\x83\x91\xE3\x83\xBC\xE3\x83\x88
1263 5. IDNA Test Vectors
1265 5.1 Arabic (Egyptian)
1267 in (length 34 bytes):
1268 '\xD9\x84\xD9\x8A\xD9\x87\xD9\x85\xD8\xA7\xD8\xA8\xD8\xAA\xD9\x83'
1269 '\xD9\x84\xD9\x85\xD9\x88\xD8\xB4\xD8\xB9\xD8\xB1\xD8\xA8\xD9\x8A'
1272 U+0644 U+064a U+0647 U+0645 U+0627 U+0628 U+062a U+0643
1273 U+0644 U+0645 U+0648 U+0634 U+0639 U+0631 U+0628 U+064a
1276 out: xn--egbpdaj6bu4bxfgehfvwxn
1291 5.2 Chinese (simplified)
1293 in (length 27 bytes):
1294 '\xE4\xBB\x96\xE4\xBB\xAC\xE4\xB8\xBA\xE4\xBB\x80\xE4\xB9\x88\xE4'
1295 '\xB8\x8D\xE8\xAF\xB4\xE4\xB8\xAD\xE6\x96\x87
1297 U+4ed6 U+4eec U+4e3a U+4ec0 U+4e48 U+4e0d U+8bf4 U+4e2d
1300 out: xn--ihqwcrb4cv8a8dqg056pqjye
1303 5.3 Chinese (traditional)
1305 in (length 27 bytes):
1306 '\xE4\xBB\x96\xE5\x80\x91\xE7\x88\xB2\xE4\xBB\x80\xE9\xBA\xBD\xE4'
1307 '\xB8\x8D\xE8\xAA\xAA\xE4\xB8\xAD\xE6\x96\x87
1309 U+4ed6 U+5011 U+7232 U+4ec0 U+9ebd U+4e0d U+8aaa U+4e2d
1312 out: xn--ihqwctvzc91f659drss3x8bo0yb
5.4 Czech

in (length 26 bytes):
1318 'Pro\xC4\x8Dprost\xC4\x9Bneml'
1319 'uv\xC3\xAD\xC4\x8Desky
1321 U+0050 U+0072 U+006f U+010d U+0070 U+0072 U+006f U+0073
1322 U+0074 U+011b U+006e U+0065 U+006d U+006c U+0075 U+0076
1323 U+00ed U+010d U+0065 U+0073 U+006b U+0079
1325 out: xn--proprostnemluvesky-uyb24dma41a
5.5 Hebrew

in (length 44 bytes):
1350 '\xD7\x9C\xD7\x9E\xD7\x94\xD7\x94\xD7\x9D\xD7\xA4\xD7\xA9\xD7\x95'
1351 '\xD7\x98\xD7\x9C\xD7\x90\xD7\x9E\xD7\x93\xD7\x91\xD7\xA8\xD7\x99'
1352 '\xD7\x9D\xD7\xA2\xD7\x91\xD7\xA8\xD7\x99\xD7\xAA
1354 U+05dc U+05de U+05d4 U+05d4 U+05dd U+05e4 U+05e9 U+05d5
1355 U+05d8 U+05dc U+05d0 U+05de U+05d3 U+05d1 U+05e8 U+05d9
1356 U+05dd U+05e2 U+05d1 U+05e8 U+05d9 U+05ea
1358 out: xn--4dbcagdahymbxekheh6e0a7fei0b
1361 5.6 Hindi (Devanagari)
1363 in (length 90 bytes):
1364 '\xE0\xA4\xAF\xE0\xA4\xB9\xE0\xA4\xB2\xE0\xA5\x8B\xE0\xA4\x97\xE0'
1365 '\xA4\xB9\xE0\xA4\xBF\xE0\xA4\xA8\xE0\xA5\x8D\xE0\xA4\xA6\xE0\xA5'
1366 '\x80\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA4\xAF\xE0\xA5\x8B\xE0\xA4\x82'
1367 '\xE0\xA4\xA8\xE0\xA4\xB9\xE0\xA5\x80\xE0\xA4\x82\xE0\xA4\xAC\xE0'
1368 '\xA5\x8B\xE0\xA4\xB2\xE0\xA4\xB8\xE0\xA4\x95\xE0\xA4\xA4\xE0\xA5'
1369 '\x87\xE0\xA4\xB9\xE0\xA5\x88\xE0\xA4\x82
1371 U+092f U+0939 U+0932 U+094b U+0917 U+0939 U+093f U+0928
1372 U+094d U+0926 U+0940 U+0915 U+094d U+092f U+094b U+0902
1373 U+0928 U+0939 U+0940 U+0902 U+092c U+094b U+0932 U+0938
1374 U+0915 U+0924 U+0947 U+0939 U+0948 U+0902
1376 out: xn--i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd
1379 5.7 Japanese (kanji and hiragana)
1381 in (length 54 bytes):
1382 '\xE3\x81\xAA\xE3\x81\x9C\xE3\x81\xBF\xE3\x82\x93\xE3\x81\xAA\xE6'
1383 '\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E\xE3\x82\x92\xE8\xA9\xB1\xE3\x81'
1384 '\x97\xE3\x81\xA6\xE3\x81\x8F\xE3\x82\x8C\xE3\x81\xAA\xE3\x81\x84'
1385 '\xE3\x81\xAE\xE3\x81\x8B
1387 U+306a U+305c U+307f U+3093 U+306a U+65e5 U+672c U+8a9e
1388 U+3092 U+8a71 U+3057 U+3066 U+304f U+308c U+306a U+3044
1391 out: xn--n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa
1403 5.8 Russian (Cyrillic)
1405 in (length 56 bytes):
1406 '\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5'
1407 '\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2'
1408 '\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83'
1409 '\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8
1411 U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435
1412 U+043e U+043d U+0438 U+043d U+0435 U+0433 U+043e U+0432
1413 U+043e U+0440 U+044f U+0442 U+043f U+043e U+0440 U+0443
1414 U+0441 U+0441 U+043a U+0438
1416 out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l
5.9 Spanish

in (length 42 bytes):
1422 'Porqu\xC3\xA9nopuedens'
1426 U+0050 U+006f U+0072 U+0071 U+0075 U+00e9 U+006e U+006f
1427 U+0070 U+0075 U+0065 U+0064 U+0065 U+006e U+0073 U+0069
1428 U+006d U+0070 U+006c U+0065 U+006d U+0065 U+006e U+0074
1429 U+0065 U+0068 U+0061 U+0062 U+006c U+0061 U+0072 U+0065
1430 U+006e U+0045 U+0073 U+0070 U+0061 U+00f1 U+006f U+006c
1433 out: xn--porqunopuedensimplementehablarenespaol-fmd56a
5.10 Vietnamese

in (length 45 bytes):
1462 'T\xE1\xBA\xA1isaoh\xE1\xBB\x8Dkh\xC3\xB4'
1463 'ngth\xE1\xBB\x83ch\xE1\xBB\x89n\xC3\xB3i'
1464 'ti\xE1\xBA\xBFngVi\xE1\xBB\x87t
1466 U+0054 U+1ea1 U+0069 U+0073 U+0061 U+006f U+0068 U+1ecd
1467 U+006b U+0068 U+00f4 U+006e U+0067 U+0074 U+0068 U+1ec3
1468 U+0063 U+0068 U+1ec9 U+006e U+00f3 U+0069 U+0074 U+0069
1469 U+1ebf U+006e U+0067 U+0056 U+0069 U+1ec7 U+0074
1471 out: xn--tisaohkhngthchnitingvit-kjcr8268qyxafd2f1b9g
5.11 Japanese

in (length 20 bytes):
1477 '3\xE5\xB9\xB4B\xE7\xB5\x84\xE9\x87\x91\xE5\x85\xAB\xE5\x85'
1480 U+0033 U+5e74 U+0042 U+7d44 U+91d1 U+516b U+5148 U+751f
1483 out: xn--3b-ww4c5e180e575a65lsy2b
5.12 Japanese

in (length 34 bytes):
1489 '\xE5\xAE\x89\xE5\xAE\xA4\xE5\xA5\x88\xE7\xBE\x8E\xE6\x81\xB5\x2D'
1490 'with\x2DSUPER\x2DMONKE'
1493 U+5b89 U+5ba4 U+5948 U+7f8e U+6075 U+002d U+0077 U+0069
1494 U+0074 U+0068 U+002d U+0053 U+0055 U+0050 U+0045 U+0052
1495 U+002d U+004d U+004f U+004e U+004b U+0045 U+0059 U+0053
1498 out: xn---with-super-monkeys-pc58ag80a8qai00g7n9n
5.13 Japanese

in (length 39 bytes):
1518 'Hello\x2DAnother\x2DWa'
1519 'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
1520 '\xAE\xE5\xA0\xB4\xE6\x89\x80
1522 U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e
1523 U+006f U+0074 U+0068 U+0065 U+0072 U+002d U+0057 U+0061
1524 U+0079 U+002d U+305d U+308c U+305e U+308c U+306e U+5834
1527 out: xn--hello-another-way--fc4qua05auwb3674vfr0b
5.14 Japanese

in (length 22 bytes):
1533 '\xE3\x81\xB2\xE3\x81\xA8\xE3\x81\xA4\xE5\xB1\x8B\xE6\xA0\xB9\xE3'
1534 '\x81\xAE\xE4\xB8\x8B2
1536 U+3072 U+3068 U+3064 U+5c4b U+6839 U+306e U+4e0b U+0032
1539 out: xn--2-u9tlzr9756bt3uc0v
5.15 Japanese

in (length 23 bytes):
1545 'Maji\xE3\x81\xA7Koi\xE3\x81\x99\xE3\x82\x8B'
1546 '5\xE7\xA7\x92\xE5\x89\x8D
1548 U+004d U+0061 U+006a U+0069 U+3067 U+004b U+006f U+0069
1549 U+3059 U+308b U+0035 U+79d2 U+524d
1551 out: xn--majikoi5-783gue6qz075azm5e
5.16 Japanese

in (length 23 bytes):
1574 '\xE3\x83\x91\xE3\x83\x95\xE3\x82\xA3\xE3\x83\xBCde\xE3\x83'
1575 '\xAB\xE3\x83\xB3\xE3\x83\x90
1577 U+30d1 U+30d5 U+30a3 U+30fc U+0064 U+0065 U+30eb U+30f3
1580 out: xn--de-jg4avhby1noc0d
5.17 Japanese

in (length 21 bytes):
1586 '\xE3\x81\x9D\xE3\x81\xAE\xE3\x82\xB9\xE3\x83\x94\xE3\x83\xBC\xE3'
1587 '\x83\x89\xE3\x81\xA7
1589 U+305d U+306e U+30b9 U+30d4 U+30fc U+30c9 U+3067
1591 out: xn--d9juau41awczczp
5.18 Greek

in (length 16 bytes):
1597 '\xCE\xB5\xCE\xBB\xCE\xBB\xCE\xB7\xCE\xBD\xCE\xB9\xCE\xBA\xCE\xAC
1599 U+03b5 U+03bb U+03bb U+03b7 U+03bd U+03b9 U+03ba U+03ac
1605 5.19 Maltese (Malti)
1607 in (length 13 bytes):
1608 'bon\xC4\xA1usa\xC4\xA7\xC4\xA7a
1610 U+0062 U+006f U+006e U+0121 U+0075 U+0073 U+0061 U+0127
1613 out: xn--bonusaa-5bb1da
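
For a label such as the Maltese one above, which Nameprep leaves
unchanged, the "out" value is the ACE prefix "xn--" followed by the
Punycode encoding of the code points. The following C sketch
illustrates that last stage only. It is not a ToASCII implementation:
it assumes the input has already been through Nameprep, that it
contains at least one non-ASCII code point, and it omits the overflow
and label-length checks a real implementation needs. The function
name to_ace is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Punycode parameters. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

static char
encode_digit (unsigned long d)
{
  /* 0..25 map to 'a'..'z', 26..35 map to '0'..'9'. */
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

static unsigned long
adapt (unsigned long delta, unsigned long numpoints, int first)
{
  unsigned long k = 0;

  delta = first ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  while (delta > ((BASE - TMIN) * TMAX) / 2)
    {
      delta /= BASE - TMIN;
      k += BASE;
    }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Returns 0 on success, -1 if out (of size outsize) is too small. */
int
to_ace (const unsigned long *in, size_t len, char *out, size_t outsize)
{
  unsigned long n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, m, q, k, t;
  size_t h, b, i, o = 4;

  if (outsize < 5)
    return -1;
  memcpy (out, "xn--", 4);

  /* Copy the basic (ASCII) code points first. */
  for (i = 0; i < len; i++)
    if (in[i] < 0x80)
      {
        if (o + 1 >= outsize)
          return -1;
        out[o++] = (char) in[i];
      }
  h = b = o - 4;
  if (b > 0)
    {
      if (o + 1 >= outsize)
        return -1;
      out[o++] = '-';
    }

  while (h < len)
    {
      /* Find the smallest code point not yet handled. */
      m = (unsigned long) -1;
      for (i = 0; i < len; i++)
        if (in[i] >= n && in[i] < m)
          m = in[i];
      delta += (m - n) * (h + 1);
      n = m;

      for (i = 0; i < len; i++)
        {
          if (in[i] < n)
            delta++;
          if (in[i] == n)
            {
              /* Emit delta as a generalized variable-length integer. */
              for (q = delta, k = BASE;; k += BASE)
                {
                  t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                  if (q < t)
                    break;
                  if (o + 1 >= outsize)
                    return -1;
                  out[o++] = encode_digit (t + (q - t) % (BASE - t));
                  q = (q - t) / (BASE - t);
                }
              if (o + 1 >= outsize)
                return -1;
              out[o++] = encode_digit (q);
              bias = adapt (delta, h + 1, h == b);
              delta = 0;
              h++;
            }
        }
      delta++;
      n++;
    }
  out[o] = '\0';
  return 0;
}
```

Vectors containing uppercase letters, such as the
"Hello-Another-Way" example, will not match unless the Nameprep case
folding is applied before this stage.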
1627 5.20 Russian (Cyrillic)
1629 in (length 56 bytes):
1630 '\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5'
1631 '\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2'
1632 '\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83'
1633 '\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8
1635 U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435
1636 U+043e U+043d U+0438 U+043d U+0435 U+0433 U+043e U+0432
1637 U+043e U+0440 U+044f U+0442 U+043f U+043e U+0440 U+0443
1638 U+0441 U+0441 U+043a U+0438
1640 out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l
1643 6. Security Considerations
1645 The security considerations from Nameprep and IDNA are inherited.
These test vectors are not believed to introduce new security
considerations or to disrupt the operation of the Internet, but they
may expose security weaknesses in existing implementations. Any such
incident should not be regarded as a problem with this document, but
rather taken as evidence that this document served its purpose.
1654 Normative References
1656 [1] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
1657 Internationalized Domain Names (IDN)", RFC 3491, March 2003.
1659 [2] Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing
1660 Domain Names in Applications (IDNA)", RFC 3490, March 2003.
1662 Informative References
1664 [3] Costello, A., "Punycode: A Bootstring encoding of Unicode for
1665 Internationalized Domain Names in Applications (IDNA)", RFC
1690 EMail: simon@josefsson.org
1694 Some IDNA test vectors were borrowed from Punycode [3].
1696 Appendix A. Nameprep test vectors in C syntax
In order to avoid having implementors type in the test vectors above,
a C structure with the data is provided.

The comment field contains the section titles used in this document.
The in field contains UTF-8 encoded strings. The out field contains
the expected output, or NULL if the expected result is an error. The
profile field can be ignored. The only significant setting for the
flags field is STRINGPREP_NO_UNASSIGNED, which signals to the Nameprep
implementation that it should perform unassigned code point checking,
aka the "AllowUnassigned" flag. The rc field contains the expected
error code, where 0 indicates success and the other values should be
self explanatory.
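
A test driver for such a table can be quite small. The following C
sketch is an assumption-laden illustration: the struct layout is
inferred from the field descriptions above rather than copied from
the author's code, and the names nameprep_vector, nameprep_fn,
run_vectors and ascii_only_prep are all hypothetical. The stand-in
ascii_only_prep only lowercases ASCII and exists so that the driver
can be exercised; it is not a Nameprep implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: layout inferred from the field descriptions. */
struct nameprep_vector
{
  const char *comment;   /* section title */
  const char *in;        /* UTF-8 input */
  const char *out;       /* expected output, NULL if an error is expected */
  const char *profile;   /* can be ignored */
  int flags;             /* e.g. STRINGPREP_NO_UNASSIGNED */
  int rc;                /* expected return code, 0 for success */
};

/* Hypothetical interface of the implementation under test. */
typedef int (*nameprep_fn) (const char *in, char *out, size_t outsize,
                            int flags);

/* Run all vectors through prep and return the number of failures. */
size_t
run_vectors (const struct nameprep_vector *v, size_t n, nameprep_fn prep)
{
  char buf[256];
  size_t i, failures = 0;

  for (i = 0; i < n; i++)
    {
      int rc = prep (v[i].in, buf, sizeof buf, v[i].flags);
      int ok = rc == v[i].rc
        && (v[i].out == NULL || (rc == 0 && strcmp (buf, v[i].out) == 0));

      if (!ok)
        failures++;
    }
  return failures;
}

/* Stand-in for demonstration only: lowercases ASCII and returns 1
   (an arbitrary error code) for any non-ASCII octet or short buffer. */
int
ascii_only_prep (const char *in, char *out, size_t outsize, int flags)
{
  size_t i;

  (void) flags;
  for (i = 0; in[i] != '\0'; i++)
    {
      unsigned char c = (unsigned char) in[i];

      if (c >= 0x80 || i + 1 >= outsize)
        return 1;
      out[i] = (char) tolower (c);
    }
  if (outsize == 0)
    return 1;
  out[i] = '\0';
  return 0;
}
```

A real driver would pass the Nameprep function of the library under
test in place of ascii_only_prep and print the comment field of each
failing vector.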
1724 "foo\xC2\xAD\xCD\x8F\xE1\xA0\x86\xE1\xA0\x8B"
1725 "bar""\xE2\x80\x8B\xE2\x81\xA0""baz\xEF\xB8\x80\xEF\xB8\x88"
1726 "\xEF\xB8\x8F\xEF\xBB\xBF", "foobarbaz"
1729 "Case folding ASCII U+0043 U+0041 U+0046 U+0045",
1741 "Case folding 8bit U+00DF (german sharp s)",
1745 "Case folding U+0130 (turkish capital I with dot)",
1746 "\xC4\xB0", "i\xcc\x87"
1749 "Case folding multibyte U+0143 U+037A",
1750 "\xC5\x83\xCD\xBA", "\xC5\x84 \xCE\xB9"
1753 "Case folding U+2121 U+33C6 U+1D7BB",
1754 "\xE2\x84\xA1\xE3\x8F\x86\xF0\x9D\x9E\xBB",
1755 "telc\xE2\x88\x95""kg\xCF\x83"
1758 "Normalization of U+006a U+030c U+00A0 U+00AA",
1759 "\x6A\xCC\x8C\xC2\xA0\xC2\xAA", "\xC7\xB0 a"
1762 "Case folding U+1FB7 and normalization",
1763 "\xE1\xBE\xB7", "\xE1\xBE\xB6\xCE\xB9"
1766 "Self-reverting case folding U+01F0 and normalization",
"\xC7\xB0", "\xC7\xB0"
1770 "Self-reverting case folding U+0390 and normalization",
1771 "\xCE\x90", "\xCE\x90"
1774 "Self-reverting case folding U+03B0 and normalization",
1775 "\xCE\xB0", "\xCE\xB0"
1778 "Self-reverting case folding U+1E96 and normalization",
1779 "\xE1\xBA\x96", "\xE1\xBA\x96"
1782 "Self-reverting case folding U+1F56 and normalization",
1783 "\xE1\xBD\x96", "\xE1\xBD\x96"
1786 "ASCII space character U+0020",
1798 "Non-ASCII 8bit space character U+00A0",
1802 "Non-ASCII multibyte space character U+1680",
1803 "\xE1\x9A\x80", NULL, "Nameprep", 0,
1804 STRINGPREP_CONTAINS_PROHIBITED
1807 "Non-ASCII multibyte space character U+2000",
1808 "\xE2\x80\x80", "\x20"
1811 "Zero Width Space U+200b",
1815 "Non-ASCII multibyte space character U+3000",
1816 "\xE3\x80\x80", "\x20"
1819 "ASCII control characters U+0010 U+007F",
1820 "\x10\x7F", "\x10\x7F"
1823 "Non-ASCII 8bit control character U+0085",
1824 "\xC2\x85", NULL, "Nameprep", 0,
1825 STRINGPREP_CONTAINS_PROHIBITED
1828 "Non-ASCII multibyte control character U+180E",
1829 "\xE1\xA0\x8E", NULL, "Nameprep", 0,
1830 STRINGPREP_CONTAINS_PROHIBITED
1833 "Zero Width No-Break Space U+FEFF",
1837 "Non-ASCII control character U+1D175",
1838 "\xF0\x9D\x85\xB5", NULL, "Nameprep", 0,
1839 STRINGPREP_CONTAINS_PROHIBITED
1842 "Plane 0 private use character U+F123",
1851 "\xEF\x84\xA3", NULL, "Nameprep", 0,
1852 STRINGPREP_CONTAINS_PROHIBITED
1855 "Plane 15 private use character U+F1234",
1856 "\xF3\xB1\x88\xB4", NULL, "Nameprep", 0,
1857 STRINGPREP_CONTAINS_PROHIBITED
1860 "Plane 16 private use character U+10F234",
1861 "\xF4\x8F\x88\xB4", NULL, "Nameprep", 0,
1862 STRINGPREP_CONTAINS_PROHIBITED
1865 "Non-character code point U+8FFFE",
1866 "\xF2\x8F\xBF\xBE", NULL, "Nameprep", 0,
1867 STRINGPREP_CONTAINS_PROHIBITED
1870 "Non-character code point U+10FFFF",
1871 "\xF4\x8F\xBF\xBF", NULL, "Nameprep", 0,
1872 STRINGPREP_CONTAINS_PROHIBITED
1875 "Surrogate code U+DF42",
1876 "\xED\xBD\x82", NULL, "Nameprep", 0,
1877 STRINGPREP_CONTAINS_PROHIBITED
1880 "Non-plain text character U+FFFD",
1881 "\xEF\xBF\xBD", NULL, "Nameprep", 0,
1882 STRINGPREP_CONTAINS_PROHIBITED
1885 "Ideographic description character U+2FF5",
1886 "\xE2\xBF\xB5", NULL, "Nameprep", 0,
1887 STRINGPREP_CONTAINS_PROHIBITED
1890 "Display property character U+0341",
1891 "\xCD\x81", "\xCC\x81"
1894 "Left-to-right mark U+200E",
"\xE2\x80\x8E", NULL, "Nameprep", 0,
1896 STRINGPREP_CONTAINS_PROHIBITED
1907 "Deprecated U+202A",
"\xE2\x80\xAA", NULL, "Nameprep", 0,
1909 STRINGPREP_CONTAINS_PROHIBITED
1912 "Language tagging character U+E0001",
"\xF3\xA0\x80\x81", NULL, "Nameprep", 0,
1914 STRINGPREP_CONTAINS_PROHIBITED
1917 "Language tagging character U+E0042",
1918 "\xF3\xA0\x81\x82", NULL, "Nameprep", 0,
1919 STRINGPREP_CONTAINS_PROHIBITED
1922 "Bidi: RandALCat character U+05BE and LCat characters",
1923 "foo\xD6\xBE""bar", NULL, "Nameprep", 0,
1924 STRINGPREP_BIDI_BOTH_L_AND_RAL
1927 "Bidi: RandALCat character U+FD50 and LCat characters",
1928 "foo\xEF\xB5\x90""bar", NULL, "Nameprep", 0,
1929 STRINGPREP_BIDI_BOTH_L_AND_RAL
1932 "Bidi: RandALCat character U+FB38 and LCat characters",
1933 "foo\xEF\xB9\xB6""bar", "foo \xd9\x8e""bar"
1935 { "Bidi: RandALCat without trailing RandALCat U+0627 U+0031",
1936 "\xD8\xA7\x31", NULL, "Nameprep", 0,
1937 STRINGPREP_BIDI_LEADTRAIL_NOT_RAL}
1940 "Bidi: RandALCat character U+0627 U+0031 U+0628",
1941 "\xD8\xA7\x31\xD8\xA8", "\xD8\xA7\x31\xD8\xA8"
1944 "Unassigned code point U+E0002",
1945 "\xF3\xA0\x80\x82", NULL, "Nameprep", STRINGPREP_NO_UNASSIGNED,
1946 STRINGPREP_CONTAINS_UNASSIGNED
1949 "Larger test (shrinking)",
"X\xC2\xAD\xC3\x9F\xC4\xB0\xE2\x84\xA1\x6a\xcc\x8c\xc2\xa0\xc2"
1951 "\xaa\xce\xb0\xe2\x80\x80", "xssi\xcc\x87""tel\xc7\xb0 a\xce\xb0 ",
1963 "Larger test (expanding)",
"X\xC3\x9F\xe3\x8c\x96\xC4\xB0\xE2\x84\xA1\xE2\x92\x9F\xE3\x8c\x80",
1965 "xss\xe3\x82\xad\xe3\x83\xad\xe3\x83\xa1\xe3\x83\xbc\xe3\x83\x88"
1966 "\xe3\x83\xab""i\xcc\x87""tel\x28""d\x29\xe3\x82\xa2\xe3\x83\x91"
1967 "\xe3\x83\xbc\xe3\x83\x88"
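The in and out fields hold raw UTF-8 bytes, while the section titles name code points. A reader who wants to check one against the other could decode the bytes with something like the following sketch. The function names are illustrative and not taken from any library. Surrogate code points are deliberately not rejected, because the "Surrogate code U+DF42" vector encodes one on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes one UTF-8 sequence at s into *cp.  Returns the number of bytes
   consumed, or 0 if the sequence is malformed or overlong. */
static size_t utf8_decode(const char *s, unsigned long *cp)
{
    const unsigned char *p = (const unsigned char *) s;
    unsigned long c, min;
    size_t i, len;

    assert(s != NULL && cp != NULL);
    if (p[0] < 0x80) {
        c = p[0]; len = 1; min = 0;
    } else if ((p[0] & 0xE0) == 0xC0) {
        c = p[0] & 0x1F; len = 2; min = 0x80;
    } else if ((p[0] & 0xF0) == 0xE0) {
        c = p[0] & 0x0F; len = 3; min = 0x800;
    } else if ((p[0] & 0xF8) == 0xF0) {
        c = p[0] & 0x07; len = 4; min = 0x10000;
    } else {
        return 0;  /* stray continuation byte or invalid lead byte */
    }

    for (i = 1; i < len; i++) {
        /* A terminating NUL is not a continuation byte, so a truncated
           sequence stops here without reading past the string. */
        if ((p[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (p[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF)
        return 0;  /* overlong form or beyond the Unicode range */
    *cp = c;
    return len;
}

/* Decodes a NUL-terminated UTF-8 string into out[0..max-1].  Returns the
   number of code points, or -1 on malformed input or lack of space. */
static long utf8_to_code_points(const char *s, unsigned long *out, size_t max)
{
    size_t n = 0;

    while (*s != '\0') {
        size_t used;

        if (n == max)
            return -1;
        used = utf8_decode(s, &out[n]);
        if (used == 0)
            return -1;
        s += used;
        n++;
    }
    return (long) n;
}
```

Decoding the input of the "Case folding U+0130" vector, for example, should yield the single code point 0x0130 named in its title.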
1972 Appendix B. IDNA test vectors in C syntax
1974 In order to avoid having implementors type in the IDNA test vectors
1975 above, a C structure with the data is provided.
The name field holds the section title used in this document. The
inlen field gives the number of Unicode code points stored in the in
field. The out field contains the expected ToASCII output. The
allowunassigned and usestd3asciirules fields can be ignored. The
toasciirc and tounicoderc fields contain the expected error codes,
where 0 indicates success and the other codes should be
self-explanatory.
1988 unsigned long in[100];
1990 int allowunassigned;
1991 int usestd3asciirules;
1997 "Arabic (Egyptian)", 17,
1999 0x0644, 0x064A, 0x0647, 0x0645, 0x0627, 0x0628, 0x062A, 0x0643,
2000 0x0644, 0x0645, 0x0648, 0x0634, 0x0639, 0x0631, 0x0628, 0x064A,
2002 IDNA_ACE_PREFIX "egbpdaj6bu4bxfgehfvwxn", 0, 0, IDNA_SUCCESS,
2005 "Chinese (simplified)", 9,
2007 0x4ED6, 0x4EEC, 0x4E3A, 0x4EC0, 0x4E48, 0x4E0D, 0x8BF4, 0x4E2D, 0x6587},
2008 IDNA_ACE_PREFIX "ihqwcrb4cv8a8dqg056pqjye", 0, 0, IDNA_SUCCESS,
2019 "Chinese (traditional)", 9,
2021 0x4ED6, 0x5011, 0x7232, 0x4EC0, 0x9EBD, 0x4E0D, 0x8AAA, 0x4E2D, 0x6587},
2022 IDNA_ACE_PREFIX "ihqwctvzc91f659drss3x8bo0yb", 0, 0, IDNA_SUCCESS,
2027 0x0050, 0x0072, 0x006F, 0x010D, 0x0070, 0x0072, 0x006F, 0x0073,
2028 0x0074, 0x011B, 0x006E, 0x0065, 0x006D, 0x006C, 0x0075, 0x0076,
2029 0x00ED, 0x010D, 0x0065, 0x0073, 0x006B, 0x0079},
2030 IDNA_ACE_PREFIX "Proprostnemluvesky-uyb24dma41a", 0, 0, IDNA_SUCCESS,
2035 0x05DC, 0x05DE, 0x05D4, 0x05D4, 0x05DD, 0x05E4, 0x05E9, 0x05D5,
2036 0x05D8, 0x05DC, 0x05D0, 0x05DE, 0x05D3, 0x05D1, 0x05E8, 0x05D9,
2037 0x05DD, 0x05E2, 0x05D1, 0x05E8, 0x05D9, 0x05EA},
2038 IDNA_ACE_PREFIX "4dbcagdahymbxekheh6e0a7fei0b", 0, 0, IDNA_SUCCESS,
2041 "Hindi (Devanagari)", 30,
2043 0x092F, 0x0939, 0x0932, 0x094B, 0x0917, 0x0939, 0x093F, 0x0928,
2044 0x094D, 0x0926, 0x0940, 0x0915, 0x094D, 0x092F, 0x094B, 0x0902,
2045 0x0928, 0x0939, 0x0940, 0x0902, 0x092C, 0x094B, 0x0932, 0x0938,
2046 0x0915, 0x0924, 0x0947, 0x0939, 0x0948, 0x0902},
2047 IDNA_ACE_PREFIX "i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd", 0, 0,
2050 "Japanese (kanji and hiragana)", 18,
2052 0x306A, 0x305C, 0x307F, 0x3093, 0x306A, 0x65E5, 0x672C, 0x8A9E,
2053 0x3092, 0x8A71, 0x3057, 0x3066, 0x304F, 0x308C, 0x306A, 0x3044,
2055 IDNA_ACE_PREFIX "n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa", 0, 0,
2058 "Russian (Cyrillic)", 28,
2060 0x043F, 0x043E, 0x0447, 0x0435, 0x043C, 0x0443, 0x0436, 0x0435,
2061 0x043E, 0x043D, 0x0438, 0x043D, 0x0435, 0x0433, 0x043E, 0x0432,
2062 0x043E, 0x0440, 0x044F, 0x0442, 0x043F, 0x043E, 0x0440, 0x0443,
2063 0x0441, 0x0441, 0x043A, 0x0438},
2064 IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
2065 IDNA_SUCCESS, IDNA_SUCCESS},
2077 0x0050, 0x006F, 0x0072, 0x0071, 0x0075, 0x00E9, 0x006E, 0x006F,
2078 0x0070, 0x0075, 0x0065, 0x0064, 0x0065, 0x006E, 0x0073, 0x0069,
2079 0x006D, 0x0070, 0x006C, 0x0065, 0x006D, 0x0065, 0x006E, 0x0074,
2080 0x0065, 0x0068, 0x0061, 0x0062, 0x006C, 0x0061, 0x0072, 0x0065,
2081 0x006E, 0x0045, 0x0073, 0x0070, 0x0061, 0x00F1, 0x006F, 0x006C},
2082 IDNA_ACE_PREFIX "PorqunopuedensimplementehablarenEspaol-fmd56a", 0, 0,
2087 0x0054, 0x1EA1, 0x0069, 0x0073, 0x0061, 0x006F, 0x0068, 0x1ECD,
2088 0x006B, 0x0068, 0x00F4, 0x006E, 0x0067, 0x0074, 0x0068, 0x1EC3,
2089 0x0063, 0x0068, 0x1EC9, 0x006E, 0x00F3, 0x0069, 0x0074, 0x0069,
2090 0x1EBF, 0x006E, 0x0067, 0x0056, 0x0069, 0x1EC7, 0x0074},
2091 IDNA_ACE_PREFIX "TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g", 0, 0,
2096 0x0033, 0x5E74, 0x0042, 0x7D44, 0x91D1, 0x516B, 0x5148, 0x751F},
2097 IDNA_ACE_PREFIX "3B-ww4c5e180e575a65lsy2b", 0, 0, IDNA_SUCCESS,
2102 0x5B89, 0x5BA4, 0x5948, 0x7F8E, 0x6075, 0x002D, 0x0077, 0x0069,
2103 0x0074, 0x0068, 0x002D, 0x0053, 0x0055, 0x0050, 0x0045, 0x0052,
2104 0x002D, 0x004D, 0x004F, 0x004E, 0x004B, 0x0045, 0x0059, 0x0053},
2105 IDNA_ACE_PREFIX "-with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n", 0, 0,
2110 0x0048, 0x0065, 0x006C, 0x006C, 0x006F, 0x002D, 0x0041, 0x006E,
2111 0x006F, 0x0074, 0x0068, 0x0065, 0x0072, 0x002D, 0x0057, 0x0061,
2112 0x0079, 0x002D, 0x305D, 0x308C, 0x305E, 0x308C, 0x306E, 0x5834,
2114 IDNA_ACE_PREFIX "Hello-Another-Way--fc4qua05auwb3674vfr0b", 0, 0,
2119 0x3072, 0x3068, 0x3064, 0x5C4B, 0x6839, 0x306E, 0x4E0B, 0x0032},
2120 IDNA_ACE_PREFIX "2-u9tlzr9756bt3uc0v", 0, 0, IDNA_SUCCESS,
2133 0x004D, 0x0061, 0x006A, 0x0069, 0x3067, 0x004B, 0x006F, 0x0069,
2134 0x3059, 0x308B, 0x0035, 0x79D2, 0x524D},
2135 IDNA_ACE_PREFIX "MajiKoi5-783gue6qz075azm5e", 0, 0, IDNA_SUCCESS,
2140 0x30D1, 0x30D5, 0x30A3, 0x30FC, 0x0064, 0x0065, 0x30EB, 0x30F3, 0x30D0},
2141 IDNA_ACE_PREFIX "de-jg4avhby1noc0d", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
2145 0x305D, 0x306E, 0x30B9, 0x30D4, 0x30FC, 0x30C9, 0x3067},
2146 IDNA_ACE_PREFIX "d9juau41awczczp", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
2149 {0x03b5, 0x03bb, 0x03bb, 0x03b7, 0x03bd, 0x03b9, 0x03ba, 0x03ac},
2150 IDNA_ACE_PREFIX "hxargifdar", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
2152 "Maltese (Malti)", 10,
2153 {0x0062, 0x006f, 0x006e, 0x0121, 0x0075, 0x0073, 0x0061, 0x0127,
2155 IDNA_ACE_PREFIX "bonusaa-5bb1da", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
2157 "Russian (Cyrillic)", 28,
2158 {0x043f, 0x043e, 0x0447, 0x0435, 0x043c, 0x0443, 0x0436, 0x0435,
2159 0x043e, 0x043d, 0x0438, 0x043d, 0x0435, 0x0433, 0x043e, 0x0432,
2160 0x043e, 0x0440, 0x044f, 0x0442, 0x043f, 0x043e, 0x0440, 0x0443,
2161 0x0441, 0x0441, 0x043a, 0x0438},
2162 IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
2163 IDNA_SUCCESS, IDNA_SUCCESS},
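Several expected outputs in this appendix retain upper-case letters, whereas an implementation that applies Nameprep case folding before encoding may well return an all-lowercase label. The sketch below therefore compares the ToASCII result with the out field ignoring ASCII case. This is an assumption about how the vectors are meant to be checked, not something this document states. The structure layout, the value 0 for IDNA_SUCCESS, and the function names are likewise assumptions made for the example; "xn--" is the ACE prefix defined for IDNA [2].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IDNA_ACE_PREFIX "xn--"  /* ACE prefix defined for IDNA */
#define IDNA_SUCCESS 0          /* assumed: 0 indicates success */

/* Assumed layout, following the field description in this appendix. */
struct idna_vector {
    const char *name;
    size_t inlen;            /* number of code points used in in[] */
    unsigned long in[100];
    const char *out;         /* expected ToASCII output */
    int allowunassigned;
    int usestd3asciirules;
    int toasciirc;
    int tounicoderc;
};

/* Returns 1 if a and b are equal when ASCII letters are compared
   without regard to case, 0 otherwise. */
static int ascii_caseless_equal(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        unsigned char x = (unsigned char) *a;
        unsigned char y = (unsigned char) *b;

        if (x >= 'A' && x <= 'Z')
            x = (unsigned char) (x + ('a' - 'A'));
        if (y >= 'A' && y <= 'Z')
            y = (unsigned char) (y + ('a' - 'A'));
        if (x != y)
            return 0;
    }
    return *a == *b;  /* both strings must end at the same position */
}

/* Returns 1 if the ToASCII result (rc, ascii) produced by the
   implementation under test matches the expectation in v. */
static int idna_vector_matches(const struct idna_vector *v, int rc,
                               const char *ascii)
{
    assert(v != NULL);
    if (rc != v->toasciirc)
        return 0;
    if (rc != IDNA_SUCCESS)
        return 1;  /* expected failure, no output to compare */
    return ascii != NULL && ascii_caseless_equal(ascii, v->out);
}
```

A test driver would call the implementation's ToASCII operation on the first inlen elements of in and hand the return code and output to idna_vector_matches. An implementation that is required to preserve case exactly could replace the caseless comparison with strcmp.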
2187 Intellectual Property Statement
2189 The IETF takes no position regarding the validity or scope of any
2190 intellectual property or other rights that might be claimed to
2191 pertain to the implementation or use of the technology described in
2192 this document or the extent to which any license under such rights
2193 might or might not be available; neither does it represent that it
2194 has made any effort to identify any such rights. Information on the
2195 IETF's procedures with respect to rights in standards-track and
2196 standards-related documentation can be found in BCP-11. Copies of
2197 claims of rights made available for publication and any assurances of
2198 licenses to be made available, or the result of an attempt made to
2199 obtain a general license or permission for the use of such
2200 proprietary rights by implementors or users of this specification can
2201 be obtained from the IETF Secretariat.
2203 The IETF invites any interested party to bring to its attention any
2204 copyrights, patents or patent applications, or other proprietary
2205 rights which may cover technology that may be required to practice
2206 this standard. Please address the information to the IETF Executive
2210 Full Copyright Statement
2212 Copyright (C) Simon Josefsson (2003). All Rights Reserved.
2214 Copyright (C) The Internet Society (2003). All Rights Reserved.
2216 This document and translations of it may be copied and furnished to
2217 others, and derivative works that comment on or otherwise explain it
2218 or assist in its implementation may be prepared, copied, published
2219 and distributed, in whole or in part, without restriction of any
2220 kind, provided that the above copyright notice and this paragraph are
2221 included on all such copies and derivative works. However, this
2222 document itself may not be modified in any way, such as by removing
2223 the copyright notice or references to the Internet Society or other
2224 Internet organizations, except as needed for the purpose of
2225 developing Internet standards in which case the procedures for
2226 copyrights defined in the Internet Standards process must be
2227 followed, or as required to translate it into languages other than
2230 The limited permissions granted above are perpetual and will not be
2231 revoked by the Internet Society or its successors or assignees.
2233 This document and the information contained herein is provided on an
2234 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
2243 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
2244 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
2245 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
2246 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
2251 Funding for the RFC Editor function is currently provided by the