7 Internet-Draft Kurt D. Zeilenga
8 Intended Category: Standard Track OpenLDAP Foundation
9 Expires in six months 26 May 2003
13 LDAP: Internationalized String Preparation
14 <draft-ietf-ldapbis-strprep-00.txt>
19 This document is an Internet-Draft and is in full conformance with all
20 provisions of Section 10 of RFC2026.
22 Distribution of this memo is unlimited. Technical discussion of this
23 document will take place on the IETF LDAP Revision Working Group
24 mailing list <ietf-ldapbis@openldap.org>. Please send editorial
25 comments directly to the author <Kurt@OpenLDAP.org>.
27 Internet-Drafts are working documents of the Internet Engineering Task
28 Force (IETF), its areas, and its working groups. Note that other
29 groups may also distribute working documents as Internet-Drafts.
30 Internet-Drafts are draft documents valid for a maximum of six months
31 and may be updated, replaced, or obsoleted by other documents at any
32 time. It is inappropriate to use Internet-Drafts as reference
33 material or to cite them other than as ``work in progress.''
35 The list of current Internet-Drafts can be accessed at
36 <http://www.ietf.org/ietf/1id-abstracts.txt>. The list of
37 Internet-Draft Shadow Directories can be accessed at
38 <http://www.ietf.org/shadow.html>.
40 Copyright (C) The Internet Society (2003). All Rights Reserved.
Please see the Full Copyright section near the end of this document
for more information.
48 The previous Lightweight Directory Access Protocol (LDAP) technical
49 specifications did not precisely define how string matching is to be
performed. This led to a number of usability and interoperability
51 problems. This document defines string preparation algorithms for
52 matching rules defined for use in LDAP.
58 Zeilenga LDAPprep [Page 1]
60 Internet-Draft draft-ietf-ldapbis-strprep-00 26 May 2003
65 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
66 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
67 document are to be interpreted as described in BCP 14 [RFC2119].
69 Character names in this document use the notation for code points and
70 names from the Unicode Standard [Unicode]. For example, the letter
71 "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
72 In the lists of mappings and the prohibited characters, the "U+" is
73 left off to make the lists easier to read. The comments for character
74 ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
75 and do not come from the standard.
77 Note: a glossary of terms used in Unicode can be found in [Glossary].
Information on the Unicode character encoding model can be found in
[CharModel].
86 A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
87 [Syntaxes] defines an algorithm for determining whether a presented
88 value matches an attribute value in accordance with the criteria
defined for the rule. The proposition may be evaluated to True,
False, or Undefined.
92 True - the attribute contains a matching value,
94 False - the attribute contains no matching value,
96 Undefined - it cannot be determined whether the attribute contains
97 a matching value or not.
99 For instance, the caseIgnoreMatch matching rule may be used to compare
100 whether the commonName attribute contains a particular value without
101 regard for case and insignificant spaces.
104 1.2. X.500 String Matching Rules
106 "X.520: Selected attribute types" [X.520] provides (amongst other
107 things) value syntaxes and matching rules for comparing values
108 commonly used in the Directory. These specifications are inadequate
109 for strings composed of characters from the Universal Character Set
110 (UCS) [ISO10646], a superset of Unicode [Unicode].
119 The caseIgnoreMatch matching rule [X.520], for example, is simply
120 defined as being a case insensitive comparison where insignificant
121 spaces are ignored. For printableString, there is only one space
122 character and case mapping is bijective, hence this definition is
123 sufficient. However, for UCS-based string types such as
124 universalString, this is not sufficient. For example, a case
125 insensitive matching implementation which folded lower case characters
to upper case would yield different results than an
127 implementation which used upper case to lower case folding. Or one
128 implementation may view space as referring to only SPACE (U+0020), a
129 second implementation may view any character with the space separator
130 (Zs) property as a space, and another implementation may view any
131 character with the whitespace (WS) category as a space.
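The ambiguity described above can be illustrated with a short Python sketch. It is not part of this specification; the function names are hypothetical, and Python's built-in case conversion and its current Unicode database stand in for whatever a given implementation might use.

```python
import unicodedata


def match_by_upper(a: str, b: str) -> bool:
    # Case-insensitive comparison that folds both values to upper case.
    return a.upper() == b.upper()


def match_by_lower(a: str, b: str) -> bool:
    # Case-insensitive comparison that folds both values to lower case.
    return a.lower() == b.lower()


def space_views(ch: str) -> tuple:
    # Three plausible readings of "space" for a single character:
    #   1. only SPACE (U+0020),
    #   2. any character with general category Zs,
    #   3. any character Python's str.isspace() treats as whitespace.
    return (ch == "\u0020", unicodedata.category(ch) == "Zs", ch.isspace())


# LATIN SMALL LETTER SHARP S upper-cases to "SS" but has no lower-case change,
# so the two folding directions disagree on this pair.
print(match_by_upper("stra\u00dfe", "strasse"))
print(match_by_lower("stra\u00dfe", "strasse"))
print(space_views("\u00a0"))
print(space_views("\t"))
```

Two implementations that each follow the informal X.520 wording can therefore disagree on whether the same pair of values matches.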
133 The lack of precise specification for string matching has led to
134 significant interoperability problems. When used in certificate chain
135 validation, security vulnerabilities can arise. To address these
136 problems, this document defines precise algorithms for preparing
137 strings for matching.
140 1.3. Relationship to "stringprep"
142 The string preparation algorithms described in this document are based
143 upon the "stringprep" approach [RFC3454]. In "stringprep", presented
and stored values are first prepared for comparison so that a
145 character-by-character comparison yields the "correct" result.
147 The approach used here is a refinement of the "stringprep" [RFC3454]
148 approach. Each algorithm involves two additional preparation steps.
150 a) prior to applying the Unicode string preparation steps outlined in
151 "stringprep", the string is transcoded to Unicode;
153 b) after applying the Unicode string preparation steps outlined in
"stringprep", characters insignificant to the matching rules are
removed.
Hence, preparation of strings for X.500 matching involves the
following steps:
1) Transcode
2) Map
3) Normalize
4) Prohibit
5) Check Bidi (Bidirectional)
6) Insignificant Character Removal
175 These steps are described in Section 2.
178 1.4. Relationship to the LDAP Technical Specification
This document is an integral part of the LDAP technical specification
181 [Roadmap] which obsoletes the previously defined LDAP technical
182 specification [RFC3377] in its entirety.
184 This document details new LDAP internationalized string preparation
algorithms used by [Syntaxes] and possibly other technical
186 specifications defining LDAP syntaxes and/or matching rules.
189 1.5. Relationship to X.500
191 LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
192 As such, there is a strong desire for alignment between LDAP and X.500
193 syntax and semantics. The string preparation algorithms described in
194 this document are based upon "Internationalized String Matching Rules
195 for X.500" [XMATCH] proposal to ITU/ISO Joint Study Group 2.
198 2. String Preparation
200 The following six-step process SHALL be applied to each presented and
201 attribute value in preparation for string match rule evaluation.
1) Transcode
2) Map
3) Normalize
4) Prohibit
5) Check Bidi (Bidirectional)
6) Insignificant Character Removal
Failure in any step causes the assertion to be evaluated to Undefined.
212 The character repertoire of this process is Unicode 3.2 [Unicode].
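As an illustration only, the overall process can be sketched in Python as a chain of step functions in which any failure yields Undefined. The names `prepare` and `match`, the use of `None` to represent Undefined, and the two placeholder steps in the usage lines are assumptions of this sketch, not definitions from this document.

```python
from typing import Callable, Optional, Sequence

# A step takes a string and returns the output string, or None on failure.
Step = Callable[[str], Optional[str]]


def prepare(value: str, steps: Sequence[Step]) -> Optional[str]:
    # Apply each step in order; stop as soon as one step fails.
    current: Optional[str] = value
    for step in steps:
        current = step(current)
        if current is None:
            return None
    return current


def match(presented: str, stored: str, steps: Sequence[Step]) -> Optional[bool]:
    # True or False when both values can be prepared; None (Undefined) otherwise.
    p = prepare(presented, steps)
    s = prepare(stored, steps)
    if p is None or s is None:
        return None
    return p == s


# Placeholder steps used only to exercise the driver; real steps would
# implement the six steps listed above.
demo_steps = [str.lower, lambda s: s if s else None]
print(match("ABC", "abc", demo_steps))
print(match("", "abc", demo_steps))
```

After preparation, the comparison itself is a plain character-by-character equality test.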
2.1. Transcode
Each non-Unicode string value is transcoded to Unicode.
219 TeletexString [X.680][T.61] values are transcoded to Unicode as
220 described in Appendix A.
PrintableString [X.680] values are transcoded directly to Unicode.
231 UniversalString, UTF8String, and bmpString [X.680] values need not be
232 transcoded as they are Unicode-based strings (in the case of
233 bmpString, a subset of Unicode).
235 If the implementation is unable or unwilling to perform the
236 transcoding as described above, or the transcoding fails, this step
237 fails and the assertion is evaluated to Undefined.
239 The transcoded string is the output string.
2.2. Map
SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
245 points are mapped to nothing. COMBINING GRAPHEME JOINER (U+034F) and
VARIATION SELECTORs (U+180B-180D,FE00-FE0F) code points are also
mapped to nothing. The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
mapped to nothing.
250 CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
251 TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
252 (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).
254 All other control code points (e.g., Cc) or code points with a control
255 function (e.g., Cf) are mapped to nothing.
257 ZERO WIDTH SPACE (U+200B) is mapped to nothing. All other code points
with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or
259 Zp) are mapped to SPACE (U+0020).
261 For case ignore, numeric, and stored prefix string matching rules,
262 characters are case folded per B.2 of [RFC3454].
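The mapping rules of this step can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name `map_step` and its `case_fold` flag are hypothetical, the variation selector range is taken to be U+FE00-FE0F, general categories are read from the Unicode 3.2 database exposed by the standard library as `unicodedata.ucd_3_2_0`, and case folding is delegated to `stringprep.map_table_b2`, the standard library's rendering of table B.2 of [RFC3454].

```python
import stringprep
import unicodedata

ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database

# Code points explicitly mapped to nothing.
MAP_TO_NOTHING = (
    {0x00AD, 0x1806, 0x034F, 0xFFFC, 0x200B}
    | set(range(0x180B, 0x180E))   # U+180B-180D
    | set(range(0xFE00, 0xFE10))   # U+FE00-FE0F
)
# Control characters explicitly mapped to SPACE.
MAP_TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}


def map_step(s: str, case_fold: bool = False) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in MAP_TO_NOTHING:
            continue
        if cp in MAP_TO_SPACE:
            out.append(" ")
            continue
        category = ucd.category(ch)
        if category in ("Cc", "Cf"):
            continue  # other control or format characters map to nothing
        if category in ("Zs", "Zl", "Zp"):
            out.append(" ")  # other separators map to SPACE
            continue
        out.append(ch)
    mapped = "".join(out)
    if case_fold:
        mapped = "".join(stringprep.map_table_b2(ch) for ch in mapped)
    return mapped


print(repr(map_step("Jo\u00adhn\tSmith\u00a0Jr", case_fold=True)))
```

The explicit sets are checked before the general category tests so that the named exceptions take precedence over the category-wide rules.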
2.3. Normalize
The input string is to be normalized to Unicode Form KC (compatibility
268 composed) as described in [UAX15].
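A minimal Python sketch of this step is shown below. The function name `normalize_step` is hypothetical; `unicodedata.ucd_3_2_0` is the standard library's Unicode 3.2 database, used here so that the result does not depend on the Unicode version of the running interpreter.

```python
import unicodedata


def normalize_step(s: str) -> str:
    # Form KC: compatibility decomposition followed by canonical composition.
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)


# "e" + COMBINING ACUTE ACCENT composes to LATIN SMALL LETTER E WITH ACUTE;
# the LATIN SMALL LIGATURE FI decomposes to the two letters "fi".
print(normalize_step("e\u0301") == "\u00e9")
print(normalize_step("\ufb01"))
```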
2.4. Prohibit
All Unassigned, Private Use, and non-character code points are
prohibited. Surrogate codes (U+D800-DFFF) are prohibited.
276 The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.
The first code point of a string is prohibited from being a combining
character.
289 Empty strings are prohibited.
291 The step fails and the assertion is evaluated to Undefined if the
input string contains any prohibited code point. The output string is
the input string.
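The checks of this step might be sketched in Python as follows. The function name `prohibit_step` and the use of `None` to signal failure are assumptions of this sketch. "Combining character" is read here as any code point whose general category begins with "M", and categories are taken from the Unicode 3.2 database (`unicodedata.ucd_3_2_0`).

```python
import unicodedata
from typing import Optional

ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database


def prohibit_step(s: str) -> Optional[str]:
    # Returns the input unchanged, or None when the step fails.
    if not s:
        return None  # empty strings are prohibited
    if ucd.category(s[0]).startswith("M"):
        return None  # first code point is a combining character
    for ch in s:
        if ch == "\ufffd":
            return None  # REPLACEMENT CHARACTER
        # Cn: unassigned (includes non-characters), Co: private use,
        # Cs: surrogate.
        if ucd.category(ch) in ("Cn", "Co", "Cs"):
            return None
    return s


print(prohibit_step("abc"))
print(prohibit_step("a\ufffd"))
```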
2.5. Check Bidi
There are no bidirectional restrictions. The output string is the
input string.
2.6. Insignificant Character Removal
304 In this step, characters insignificant to the matching rule are to be
removed. The characters to be removed differ from matching rule to
matching rule.
308 Section 2.6.1 applies to case ignore and exact string matching.
309 Section 2.6.2 applies to numericString matching.
Section 2.6.3 applies to telephoneNumber matching.
313 2.6.1. Insignificant Space Removal
315 For the purposes of this section, a space is defined to be the SPACE
316 (U+0020) code point followed by no combining marks.
318 NOTE - The previous steps ensure that the string cannot contain any
319 code points in the separator class, other than SPACE (U+0020).
The following spaces are regarded as not significant and are to be
removed:
- leading spaces (i.e. those preceding the first character that is
not a space);
- trailing spaces (i.e. those following the last character that is
not a space);
327 - multiple consecutive spaces (these are taken as equivalent to a
328 single space character).
330 A string consisting entirely of spaces is equivalent to a string
331 containing exactly one space.
333 For example, removal of spaces from the Form KC string:
334 "<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
would result in the output string:
"foo<SPACE>bar"
and the Form KC string:
"<SPACE><SPACE><SPACE>"
would result in the output string:
"<SPACE>".
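The rules of this section can be sketched in Python as shown below. The function name is hypothetical. A SPACE that is immediately followed by a combining mark (general category M*) is not treated as a space, following the definition at the start of this section, and a string consisting entirely of spaces is reduced to a single space.

```python
import unicodedata


def remove_insignificant_spaces(s: str) -> str:
    out = []
    pending_space = False
    for i, ch in enumerate(s):
        followed_by_mark = (
            i + 1 < len(s) and unicodedata.category(s[i + 1]).startswith("M")
        )
        if ch == " " and not followed_by_mark:
            pending_space = True  # remember the run, emit at most one space
            continue
        if pending_space and out:
            out.append(" ")  # interior run collapses to a single space
        pending_space = False
        out.append(ch)
    if not out:
        # Only spaces were seen (or the input was empty).
        return " " if s else ""
    return "".join(out)


print(repr(remove_insignificant_spaces("  foo  bar  ")))
print(repr(remove_insignificant_spaces("   ")))
```

Leading runs are dropped because nothing has been emitted yet when they end, and trailing runs are dropped because no further character arrives to flush them.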
351 2.6.2. numericString Insignificant Character Removal
353 For the purposes of this section, a space is defined to be the SPACE
354 (U+0020) code point followed by no combining marks.
356 All spaces are regarded as not significant and are to be removed.
358 For example, removal of spaces from the Form KC string:
"<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>" would result in
the output string:
"123456"
362 and the Form KC string:
363 "<SPACE><SPACE><SPACE>"
364 would result in an empty output string.
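A short Python sketch of this rule follows; the function name is hypothetical. As in the definition above, a SPACE immediately followed by a combining mark is not treated as a space and is retained.

```python
import unicodedata


def remove_numeric_spaces(s: str) -> str:
    def is_space(i: int) -> bool:
        followed_by_mark = (
            i + 1 < len(s) and unicodedata.category(s[i + 1]).startswith("M")
        )
        return s[i] == " " and not followed_by_mark

    # Keep every character that is not a space in the sense defined above.
    return "".join(ch for i, ch in enumerate(s) if not is_space(i))


print(repr(remove_numeric_spaces("  123  456  ")))
print(repr(remove_numeric_spaces("   ")))
```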
367 2.6.3. telephoneNumber Insignificant Character Removal
369 For the purposes of this section, a hyphen is defined to be
370 HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
371 NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
372 (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
373 combining marks and a space is defined to be the SPACE (U+0020) code
374 point followed by no combining marks.
All hyphens and spaces are regarded as not significant and are to be
removed.
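The telephoneNumber rule can be sketched in Python in the same way. The function name and the sample number are hypothetical; the hyphen set is the list of code points given in this section.

```python
import unicodedata

# HYPHEN-MINUS, ARMENIAN HYPHEN, HYPHEN, NON-BREAKING HYPHEN, MINUS SIGN,
# SMALL HYPHEN-MINUS, FULLWIDTH HYPHEN-MINUS
HYPHENS = frozenset("\u002d\u058a\u2010\u2011\u2212\ufe63\uff0d")


def remove_telephone_insignificant(s: str) -> str:
    def insignificant(i: int) -> bool:
        followed_by_mark = (
            i + 1 < len(s) and unicodedata.category(s[i + 1]).startswith("M")
        )
        return (s[i] == " " or s[i] in HYPHENS) and not followed_by_mark

    # Drop every hyphen or space that is not followed by a combining mark.
    return "".join(ch for i, ch in enumerate(s) if not insignificant(i))


print(remove_telephone_insignificant("+1 408-555-1234"))
```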
380 3. Security Considerations
382 "Preparation for International Strings ('stringprep')" [RFC3454]
security considerations generally apply to the algorithms described
in this document.
389 Appendix A and B of this document were authored by Howard Chu
390 <hyc@symas.com> of Symas Corporation (based upon information provided
404 The approach used in this document is based upon design principles and
405 algorithms described in "Preparation of Internationalized Strings
406 ('stringprep')" [RFC3454] by Paul Hoffman and Marc Blanchet. Some
407 additional guidance was drawn from Unicode Technical Standards,
408 Technical Reports, and Notes.
414 E-mail: <kurt@openldap.org>
419 7.1. Normative References
421 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
422 Requirement Levels", BCP 14 (also RFC 2119), March 1997.
424 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
425 Internationalized Strings ('stringprep')", RFC 3454,
428 [Roadmap] Zeilenga, K. (editor), "LDAP: Technical Specification
Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
progress.
432 [Syntaxes] Legg, S. (editor), "LDAP: Syntaxes and Matching Rules",
433 draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.
435 [ISO10646] International Organization for Standardization,
436 "Universal Multiple-Octet Coded Character Set (UCS) -
437 Architecture and Basic Multilingual Plane", ISO/IEC
440 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
441 3.2.0" is defined by "The Unicode Standard, Version 3.0"
442 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
443 as amended by the "Unicode Standard Annex #27: Unicode
444 3.1" (http://www.unicode.org/reports/tr27/) and by the
445 "Unicode Standard Annex #28: Unicode 3.2"
446 (http://www.unicode.org/reports/tr28/).
455 [UAX15] Davis, M. and M. Duerst, "Unicode Standard Annex #15:
456 Unicode Normalization Forms, Version 3.2.0".
457 <http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,
460 [X.680] International Telecommunication Union -
461 Telecommunication Standardization Sector, "Abstract
462 Syntax Notation One (ASN.1) - Specification of Basic
463 Notation", X.680(1997) (also ISO/IEC 8824-1:1998).
465 [T.61] CCITT (now ITU), "Character Repertoire and Coded
466 Character Sets for the International Teletex Service",
469 7.2. Informative References
471 [X.500] International Telecommunication Union -
472 Telecommunication Standardization Sector, "The Directory
473 -- Overview of concepts, models and services,"
474 X.500(1993) (also ISO/IEC 9594-1:1994).
476 [X.501] International Telecommunication Union -
477 Telecommunication Standardization Sector, "The Directory
478 -- Models," X.501(1993) (also ISO/IEC 9594-2:1994).
480 [X.520] International Telecommunication Union -
481 Telecommunication Standardization Sector, "The
482 Directory: Selected Attribute Types", X.520(1993) (also
483 ISO/IEC 9594-6:1994).
485 [Glossary] The Unicode Consortium, "Unicode Glossary",
486 <http://www.unicode.org/glossary/>.
488 [CharModel] Whistler, K. and M. Davis, "Unicode Technical Report
489 #17, Character Encoding Model", UTR17,
490 <http://www.unicode.org/unicode/reports/tr17/>, August
493 [XMATCH] Zeilenga, K., "Internationalized String Matching Rules
for X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a
work in progress.
497 [RFC1345] Simonsen, K., "Character Mnemonics & Character Sets",
501 Appendix A. Teletex (T.61) to Unicode
511 This appendix defines an algorithm for transcoding [T.61] characters
512 to [Unicode] characters for use in string preparation for LDAP
matching rules. This appendix is normative.
515 The transcoding algorithm is derived from the T.61-8bit definition
516 provided in [RFC1345]. With a few exceptions, the T.61 character
517 codes from x00 to x7f are equivalent to the corresponding [Unicode]
518 code points, and their values are left unchanged by this algorithm.
519 E.g. the T.61 code x20 is identical to (U+0020). The exceptions are
520 for these T.61 codes that are undefined: x23, x24, x5c, x5e, x60, x7b,
523 The codes from x80 to x9f are also equivalent to the corresponding
524 Unicode code points. This is specified for completeness only, as
525 these codes are control characters, and will be mapped to nothing in
526 the LDAP String Preparation Mapping step.
528 The remaining T.61 codes are mapped below in Table A.1. Table
529 positions marked "??" are undefined.
531 Input strings containing undefined T.61 codes SHALL produce an
532 Undefined matching result. For diagnostic purposes, this algorithm
533 does not fail for undefined input codes. Instead, undefined codes in
534 the input are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD).
As the LDAP String Preparation Prohibit step disallows the
536 REPLACEMENT CHARACTER from appearing in its output, this transcoding
537 yields the desired effect.
539 Note: RFC 1345 listed the non-spacing accent codepoints as residing in
540 the range starting at (U+E000). In the current Unicode
541 standard, the (U+E000) range is reserved for Private Use, and
542 the non-spacing accents are in the range starting at (U+0300).
543 The tables here use the (U+0300) range for these accents.
545 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
546 --+------+------+------+------+------+------+------+------+
547 a0| 00a0 | 00a1 | 00a2 | 00a3 | 0024 | 00a5 | 0023 | 00a7 |
548 a8| 00a8 | ?? | ?? | 00ab | ?? | ?? | ?? | ?? |
549 b0| 00b0 | 00b1 | 00b2 | 00b3 | 00d7 | 00b5 | 00b6 | 00b7 |
550 b8| 00f7 | ?? | ?? | 00bb | 00bc | 00bd | 00be | 00bf |
551 c0| ?? | 0300 | 0301 | 0302 | 0303 | 0304 | 0306 | 0307 |
552 c8| 0308 | ?? | 030a | 0327 | 0332 | 030b | 0328 | 030c |
553 d0| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
554 d8| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
555 e0| 2126 | 00c6 | 00d0 | 00aa | ?? | 0126 | 0132 | 013f |
556 e8| 0141 | 00d8 | 0152 | 00ba | 00de | 0166 | 014a | 0149 |
557 f0| 0138 | 00e6 | 0111 | 00f0 | 0127 | 0131 | 0133 | 0140 |
558 f8| 0142 | 00f8 | 0153 | 00df | 00fe | 0167 | 014b | ?? |
567 --+------+------+------+------+------+------+------+------+
568 Table A.1: Mapping of 8-bit T.61 codes to Unicode
570 T.61 also defines a number of accented characters that are formed by
571 combining an accent prefix followed by a base character. These
572 prefixes are in the code range xc1 to xcf. If a prefix character
573 appears at the end of a string, the result is undefined. Otherwise
574 these sequences are mapped to Unicode by substituting the
575 corresponding non-spacing accent code (as listed in Table A.1) for the
accent prefix, and exchanging the order so that the base character
comes first.
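The transcoding described in this appendix can be sketched in Python as follows. This is an illustration, not a normative implementation. The names are hypothetical, the table text is copied from Table A.1, and the set of undefined codes below x80 contains only the codes listed above, so it may be incomplete. Undefined codes, and an accent prefix at the end of the input, are mapped to REPLACEMENT CHARACTER (U+FFFD).

```python
REPLACEMENT = "\ufffd"

# Rows of Table A.1: first field is the row base, "??" marks undefined codes.
TABLE_A1 = """
a0 00a0 00a1 00a2 00a3 0024 00a5 0023 00a7
a8 00a8 ?? ?? 00ab ?? ?? ?? ??
b0 00b0 00b1 00b2 00b3 00d7 00b5 00b6 00b7
b8 00f7 ?? ?? 00bb 00bc 00bd 00be 00bf
c0 ?? 0300 0301 0302 0303 0304 0306 0307
c8 0308 ?? 030a 0327 0332 030b 0328 030c
e0 2126 00c6 00d0 00aa ?? 0126 0132 013f
e8 0141 00d8 0152 00ba 00de 0166 014a 0149
f0 0138 00e6 0111 00f0 0127 0131 0133 0140
f8 0142 00f8 0153 00df 00fe 0167 014b ??
"""

HIGH = {}
for row in TABLE_A1.split("\n"):
    fields = row.split()
    if not fields:
        continue
    base = int(fields[0], 16)
    for offset, cell in enumerate(fields[1:]):
        if cell != "??":
            HIGH[base + offset] = chr(int(cell, 16))

# Undefined codes below x80, as far as they are listed in this appendix.
UNDEFINED_LOW = {0x23, 0x24, 0x5C, 0x5E, 0x60, 0x7B}


def _single(code: int) -> str:
    # Transcode one T.61 code that is not acting as an accent prefix.
    if code < 0xA0:
        return REPLACEMENT if code in UNDEFINED_LOW else chr(code)
    return HIGH.get(code, REPLACEMENT)


def t61_to_unicode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        if 0xC1 <= code <= 0xCF and code in HIGH:
            if i + 1 >= len(data):
                out.append(REPLACEMENT)  # prefix at end of string
                i += 1
            else:
                # Base character first, then the non-spacing accent.
                out.append(_single(data[i + 1]) + HIGH[code])
                i += 2
        else:
            out.append(_single(code))
            i += 1
    return "".join(out)


print(t61_to_unicode(b"\xc2e") == "e\u0301")
```

Because the output keeps the base character and the non-spacing accent as separate code points, composition into a single code point is left to the later normalization step.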
580 Appendix B. Additional Teletex (T.61) to Unicode Tables
582 All of the accented characters in T.61 have a corresponding code point
583 in Unicode. For the sake of completeness, the combined character
584 codes are presented in the following tables. This is informational
585 only; for matching purposes it is sufficient to map the non-spacing
accent and exchange the order of the character pair as specified in
Appendix A.
590 B.1. Combinations with SPACE
592 Accents may be combined with a <SPACE> to generate the accent by
itself. For each accent code, the result of combining with <SPACE> is
listed in Table B.1.
596 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
597 --+------+------+------+------+------+------+------+------+
598 c0| ?? | 0060 | 00b4 | 005e | 007e | 00af | 02d8 | 02d9 |
599 c8| 00a8 | ?? | 02da | 00b8 | ?? | 02dd | 02db | 02c7 |
600 --+------+------+------+------+------+------+------+------+
601 Table B.1: Mapping of T.61 Accents with <SPACE> to Unicode
604 B.2. Combinations for xc1: (Grave accent)
606 T.61 has predefined characters for combinations with A, E, I, O, and
607 U. Unicode also defines combinations for N, W, and Y. All of these
608 combinations are present in Table B.2.
610 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
611 --+------+------+------+------+------+------+------+------+
612 40| ?? | 00c0 | ?? | ?? | ?? | 00c8 | ?? | ?? |
613 48| ?? | 00cc | ?? | ?? | ?? | ?? | 01f8 | 00d2 |
614 50| ?? | ?? | ?? | ?? | ?? | 00d9 | ?? | 1e80 |
623 58| ?? | 1ef2 | ?? | ?? | ?? | ?? | ?? | ?? |
624 60| ?? | 00e0 | ?? | ?? | ?? | 00e8 | ?? | ?? |
625 68| ?? | 00ec | ?? | ?? | ?? | ?? | 01f9 | 00f2 |
626 70| ?? | ?? | ?? | ?? | ?? | 00f9 | ?? | 1e81 |
627 78| ?? | 1ef3 | ?? | ?? | ?? | ?? | ?? | ?? |
628 --+------+------+------+------+------+------+------+------+
629 Table B.2: Mapping of T.61 Grave Accent Combinations
632 B.3. Combinations for xc2: (Acute accent)
634 T.61 has predefined characters for combinations with A, E, I, O, U, Y,
635 C, L, N, R, S, and Z. Unicode also defines G, K, M, P, and W. All of
636 these combinations are present in Table B.3.
638 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
639 --+------+------+------+------+------+------+------+------+
640 40| ?? | 00c1 | ?? | 0106 | ?? | 00c9 | ?? | 01f4 |
641 48| ?? | 00cd | ?? | 1e30 | 0139 | 1e3e | 0143 | 00d3 |
642 50| 1e54 | ?? | 0154 | 015a | ?? | 00da | ?? | 1e82 |
643 58| ?? | 00dd | 0179 | ?? | ?? | ?? | ?? | ?? |
644 60| ?? | 00e1 | ?? | 0107 | ?? | 00e9 | ?? | 01f5 |
645 68| ?? | 00ed | ?? | 1e31 | 013a | 1e3f | 0144 | 00f3 |
646 70| 1e55 | ?? | 0155 | 015b | ?? | 00fa | ?? | 1e83 |
647 78| ?? | 00fd | 017a | ?? | ?? | ?? | ?? | ?? |
648 --+------+------+------+------+------+------+------+------+
649 Table B.3: Mapping of T.61 Acute Accent Combinations
652 B.4. Combinations for xc3: (Circumflex)
654 T.61 has predefined characters for combinations with A, E, I, O, U, Y,
655 C, G, H, J, S, and W. Unicode also defines the combination for Z.
656 All of these combinations are present in Table B.4.
658 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
659 --+------+------+------+------+------+------+------+------+
660 40| ?? | 00c2 | ?? | 0108 | ?? | 00ca | ?? | 011c |
661 48| 0124 | 00ce | 0134 | ?? | ?? | ?? | ?? | 00d4 |
662 50| ?? | ?? | ?? | 015c | ?? | 00db | ?? | 0174 |
663 58| ?? | 0176 | 1e90 | ?? | ?? | ?? | ?? | ?? |
664 60| ?? | 00e2 | ?? | 0109 | ?? | 00ea | ?? | 011d |
665 68| 0125 | 00ee | 0135 | ?? | ?? | ?? | ?? | 00f4 |
666 70| ?? | ?? | ?? | 015d | ?? | 00fb | ?? | 0175 |
667 78| ?? | 0177 | 1e91 | ?? | ?? | ?? | ?? | ?? |
668 --+------+------+------+------+------+------+------+------+
669 Table B.4: Mapping of T.61 Circumflex Accent Combinations
679 B.5. Combinations for xc4: (Tilde)
681 T.61 has predefined characters for combinations with A, I, O, U, and
682 N. Unicode also defines E, V, and Y. All of these combinations are
683 present in Table B.5.
685 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
686 --+------+------+------+------+------+------+------+------+
687 40| ?? | 00c3 | ?? | ?? | ?? | 1ebc | ?? | ?? |
688 48| ?? | 0128 | ?? | ?? | ?? | ?? | 00d1 | 00d5 |
689 50| ?? | ?? | ?? | ?? | ?? | 0168 | 1e7c | ?? |
690 58| ?? | 1ef8 | ?? | ?? | ?? | ?? | ?? | ?? |
691 60| ?? | 00e3 | ?? | ?? | ?? | 1ebd | ?? | ?? |
692 68| ?? | 0129 | ?? | ?? | ?? | ?? | 00f1 | 00f5 |
693 70| ?? | ?? | ?? | ?? | ?? | 0169 | 1e7d | ?? |
694 78| ?? | 1ef9 | ?? | ?? | ?? | ?? | ?? | ?? |
695 --+------+------+------+------+------+------+------+------+
696 Table B.5: Mapping of T.61 Tilde Accent Combinations
699 B.6. Combinations for xc5: (Macron)
701 T.61 has predefined characters for combinations with A, E, I, O, and
702 U. Unicode also defines Y, G, and AE. All of these combinations are
703 present in Table B.6.
705 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
706 --+------+------+------+------+------+------+------+------+
707 40| ?? | 0100 | ?? | ?? | ?? | 0112 | ?? | 1e20 |
708 48| ?? | 012a | ?? | ?? | ?? | ?? | ?? | 014c |
709 50| ?? | ?? | ?? | ?? | ?? | 016a | ?? | ?? |
710 58| ?? | 0232 | ?? | ?? | ?? | ?? | ?? | ?? |
711 60| ?? | 0101 | ?? | ?? | ?? | 0113 | ?? | 1e21 |
712 68| ?? | 012b | ?? | ?? | ?? | ?? | ?? | 014d |
713 70| ?? | ?? | ?? | ?? | ?? | 016b | ?? | ?? |
714 78| ?? | 0233 | ?? | ?? | ?? | ?? | ?? | ?? |
715 e0| ?? | 01e2 | ?? | ?? | ?? | ?? | ?? | ?? |
716 f0| ?? | 01e3 | ?? | ?? | ?? | ?? | ?? | ?? |
717 --+------+------+------+------+------+------+------+------+
718 Table B.6: Mapping of T.61 Macron Accent Combinations
721 B.7. Combinations for xc6: (Breve)
723 T.61 has predefined characters for combinations with A, U, and G.
724 Unicode also defines E, I, and O. All of these combinations are
725 present in Table B.7.
735 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
736 --+------+------+------+------+------+------+------+------+
737 40| ?? | 0102 | ?? | ?? | ?? | 0114 | ?? | 011e |
738 48| ?? | 012c | ?? | ?? | ?? | ?? | ?? | 014e |
739 50| ?? | ?? | ?? | ?? | ?? | 016c | ?? | ?? |
740 58| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
741 60| ?? | 0103 | ?? | ?? | ?? | 0115 | ?? | 011f |
68| ?? | 012d | ?? | ?? | ?? | ?? | ?? | 014f |
743 70| ?? | ?? | ?? | ?? | ?? | 016d | ?? | ?? |
744 78| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
745 --+------+------+------+------+------+------+------+------+
746 Table B.7: Mapping of T.61 Breve Accent Combinations
749 B.8. Combinations for xc7: (Dot Above)
751 T.61 has predefined characters for C, E, G, I, and Z. Unicode also
752 defines A, O, B, D, F, H, M, N, P, R, S, T, W, X, and Y. All of these
753 combinations are present in Table B.8.
755 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
756 --+------+------+------+------+------+------+------+------+
757 40| ?? | 0226 | 1e02 | 010a | 1e0a | 0116 | 1e1e | 0120 |
758 48| 1e22 | 0130 | ?? | ?? | ?? | 1e40 | 1e44 | 022e |
759 50| 1e56 | ?? | 1e58 | 1e60 | 1e6a | ?? | ?? | 1e86 |
760 58| 1e8a | 1e8e | 017b | ?? | ?? | ?? | ?? | ?? |
761 60| ?? | 0227 | 1e03 | 010b | 1e0b | 0117 | 1e1f | 0121 |
762 68| 1e23 | ?? | ?? | ?? | ?? | 1e41 | 1e45 | 022f |
763 70| 1e57 | ?? | 1e59 | 1e61 | 1e6b | ?? | ?? | 1e87 |
764 78| 1e8b | 1e8f | 017c | ?? | ?? | ?? | ?? | ?? |
765 --+------+------+------+------+------+------+------+------+
766 Table B.8: Mapping of T.61 Dot Above Accent Combinations
769 B.9. Combinations for xc8: (Diaeresis)
771 T.61 has predefined characters for A, E, I, O, U, and Y. Unicode also
defines H, W, X, and t. All of these combinations are present in
Table B.9.
775 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
776 --+------+------+------+------+------+------+------+------+
777 40| ?? | 00c4 | ?? | ?? | ?? | 00cb | ?? | ?? |
778 48| 1e26 | 00cf | ?? | ?? | ?? | ?? | ?? | 00d6 |
779 50| ?? | ?? | ?? | ?? | ?? | 00dc | ?? | 1e84 |
780 58| 1e8c | 0178 | ?? | ?? | ?? | ?? | ?? | ?? |
781 60| ?? | 00e4 | ?? | ?? | ?? | 00eb | ?? | ?? |
782 68| 1e27 | 00ef | ?? | ?? | ?? | ?? | ?? | 00f6 |
791 70| ?? | ?? | ?? | ?? | 1e97 | 00fc | ?? | 1e85 |
792 78| 1e8d | 00ff | ?? | ?? | ?? | ?? | ?? | ?? |
793 --+------+------+------+------+------+------+------+------+
Table B.9: Mapping of T.61 Diaeresis Accent Combinations
797 B.10. Combinations for xca: (Ring Above)
T.61 has predefined characters for A and U. Unicode also defines w
800 and y. All of these combinations are present in Table B.10.
802 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
803 --+------+------+------+------+------+------+------+------+
804 40| ?? | 00c5 | ?? | ?? | ?? | ?? | ?? | ?? |
805 48| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
806 50| ?? | ?? | ?? | ?? | ?? | 016e | ?? | ?? |
807 58| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
808 60| ?? | 00e5 | ?? | ?? | ?? | ?? | ?? | ?? |
809 68| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
810 70| ?? | ?? | ?? | ?? | ?? | 016f | ?? | 1e98 |
811 78| ?? | 1e99 | ?? | ?? | ?? | ?? | ?? | ?? |
812 --+------+------+------+------+------+------+------+------+
813 Table B.10: Mapping of T.61 Ring Above Accent Combinations
816 B.11. Combinations for xcb: (Cedilla)
818 T.61 has predefined characters for C, G, K, L, N, R, S, and T.
819 Unicode also defines E, D, and H. All of these combinations are
820 present in Table B.11.
822 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
823 --+------+------+------+------+------+------+------+------+
824 40| ?? | ?? | ?? | 00c7 | 1e10 | 0228 | ?? | 0122 |
825 48| 1e28 | ?? | ?? | 0136 | 013b | ?? | 0145 | ?? |
826 50| ?? | ?? | 0156 | 015e | 0162 | ?? | ?? | ?? |
827 58| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
828 60| ?? | ?? | ?? | 00e7 | 1e11 | 0229 | ?? | 0123 |
829 68| 1e29 | ?? | ?? | 0137 | 013c | ?? | 0146 | ?? |
830 70| ?? | ?? | 0157 | 015f | 0163 | ?? | ?? | ?? |
831 78| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
832 --+------+------+------+------+------+------+------+------+
833 Table B.11: Mapping of T.61 Cedilla Accent Combinations
836 B.12. Combinations for xcd: (Double Acute Accent)
T.61 has predefined characters for O and U. These combinations are
847 present in Table B.12.
849 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
850 --+------+------+------+------+------+------+------+------+
851 48| ?? | ?? | ?? | ?? | ?? | ?? | ?? | 0150 |
852 50| ?? | ?? | ?? | ?? | ?? | 0170 | ?? | ?? |
853 68| ?? | ?? | ?? | ?? | ?? | ?? | ?? | 0151 |
854 70| ?? | ?? | ?? | ?? | ?? | 0171 | ?? | ?? |
855 --+------+------+------+------+------+------+------+------+
856 Table B.12: Mapping of T.61 Double Acute Accent Combinations
859 B.13. Combinations for xce: (Ogonek)
861 T.61 has predefined characters for A, E, I, and U. Unicode also
defines the combination for O. All of these combinations are present
in Table B.13.
865 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
866 --+------+------+------+------+------+------+------+------+
867 40| ?? | 0104 | ?? | ?? | ?? | 0118 | ?? | ?? |
868 48| ?? | 012e | ?? | ?? | ?? | ?? | ?? | 01ea |
869 50| ?? | ?? | ?? | ?? | ?? | 0172 | ?? | ?? |
870 58| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
871 60| ?? | 0105 | ?? | ?? | ?? | 0119 | ?? | ?? |
872 68| ?? | 012f | ?? | ?? | ?? | ?? | ?? | 01eb |
873 70| ?? | ?? | ?? | ?? | ?? | 0173 | ?? | ?? |
874 78| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
875 --+------+------+------+------+------+------+------+------+
876 Table B.13: Mapping of T.61 Ogonek Accent Combinations
879 B.14. Combinations for xcf: (Caron)
881 T.61 has predefined characters for C, D, E, L, N, R, S, T, and Z.
Unicode also defines A, I, O, U, G, H, j, and K. All of these
883 combinations are present in Table B.14.
885 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
886 --+------+------+------+------+------+------+------+------+
887 40| ?? | 01cd | ?? | 010c | 010e | 011a | ?? | 01e6 |
888 48| 021e | 01cf | ?? | 01e8 | 013d | ?? | 0147 | 01d1 |
889 50| ?? | ?? | 0158 | 0160 | 0164 | 01d3 | ?? | ?? |
890 58| ?? | ?? | 017d | ?? | ?? | ?? | ?? | ?? |
891 60| ?? | 01ce | ?? | 010d | 010f | 011b | ?? | 01e7 |
892 68| 021f | 01d0 | 01f0 | 01e9 | 013e | ?? | 0148 | 01d2 |
893 70| ?? | ?? | 0159 | 0161 | 0165 | 01d4 | ?? | ?? |
894 78| ?? | ?? | 017e | ?? | ?? | ?? | ?? | ?? |
903 --+------+------+------+------+------+------+------+------+
904 Table B.14: Mapping of T.61 Caron Accent Combinations
909 Intellectual Property Rights
911 The IETF takes no position regarding the validity or scope of any
912 intellectual property or other rights that might be claimed to pertain
913 to the implementation or use of the technology described in this
914 document or the extent to which any license under such rights might or
915 might not be available; neither does it represent that it has made any
916 effort to identify any such rights. Information on the IETF's
917 procedures with respect to rights in standards-track and
918 standards-related documentation can be found in BCP-11. Copies of
919 claims of rights made available for publication and any assurances of
920 licenses to be made available, or the result of an attempt made to
921 obtain a general license or permission for the use of such proprietary
922 rights by implementors or users of this specification can be obtained
923 from the IETF Secretariat.
925 The IETF invites any interested party to bring to its attention any
926 copyrights, patents or patent applications, or other proprietary
927 rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
935 Copyright (C) The Internet Society (2003). All Rights Reserved.
937 This document and translations of it may be copied and furnished to
938 others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
940 distributed, in whole or in part, without restriction of any kind,
941 provided that the above copyright notice and this paragraph are
942 included on all such copies and derivative works. However, this
943 document itself may not be modified in any way, such as by removing
944 the copyright notice or references to the Internet Society or other
945 Internet organizations, except as needed for the purpose of
946 developing Internet standards in which case the procedures for
947 copyrights defined in the Internet Standards process must be followed,
948 or as required to translate it into languages other than English.