   1
   2
   3
   4
   5 Network Working Group                                         J. Klensin
   6 Internet-Draft
   7 Expires: October 14, 2006                                   P. Faltstrom
   8                                                                      IAB
   9                                                           April 12, 2006
  10
  11
  12   Review and Recommendations for Internationalized Domain Names (IDN)
  13                     draft-iab-idn-nextsteps-05.txt
  14
  15 Status of this Memo
  16
  17    By submitting this Internet-Draft, each author represents that any
  18    applicable patent or other IPR claims of which he or she is aware
  19    have been or will be disclosed, and any of which he or she becomes
  20    aware will be disclosed, in accordance with Section 6 of BCP 79.
  21
  22    Internet-Drafts are working documents of the Internet Engineering
  23    Task Force (IETF), its areas, and its working groups.  Note that
  24    other groups may also distribute working documents as Internet-
  25    Drafts.
  26
  27    Internet-Drafts are draft documents valid for a maximum of six months
  28    and may be updated, replaced, or obsoleted by other documents at any
  29    time.  It is inappropriate to use Internet-Drafts as reference
  30    material or to cite them other than as "work in progress."
  31
  32    The list of current Internet-Drafts can be accessed at
  33    http://www.ietf.org/ietf/1id-abstracts.txt.
  34
  35    The list of Internet-Draft Shadow Directories can be accessed at
  36    http://www.ietf.org/shadow.html.
  37
  38    This Internet-Draft will expire on October 14, 2006.
  39
  40 Copyright Notice
  41
  42    Copyright (C) The Internet Society (2006).
  43
  44 Abstract
  45
   46    This note describes issues raised by the deployment and use of
   47    Internationalized Domain Names (IDNs).  It describes problems that
   48    arise both when names are registered and when those names are used
   49    in the DNS.  It recommends that the IETF update the IDN-related RFCs,
   50    proposes a framework to be followed in doing so, and identifies and
   51    summarizes some work that is required outside the IETF.  In
   52    particular, it proposes that some changes be investigated for the
  53
  54
  55
  56 Klensin & Faltstrom     Expires October 14, 2006                [Page 1]
  57 \f
  58 Internet-Draft            IAB -- IDN Next Steps               April 2006
  59
  60
  61    IDNA standard and its supporting tables, based on experience gained
  62    since those standards were completed.
  63
  64
  65 Table of Contents
  66
  67    1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
  68      1.1.  Status of this Document and its Recommendations  . . . . .  4
  69      1.2.  The IDNA Standard  . . . . . . . . . . . . . . . . . . . .  4
  70      1.3.  Unicode Documents  . . . . . . . . . . . . . . . . . . . .  5
  71      1.4.  Definitions  . . . . . . . . . . . . . . . . . . . . . . .  5
  72        1.4.1.  Language . . . . . . . . . . . . . . . . . . . . . . .  6
  73        1.4.2.  Script . . . . . . . . . . . . . . . . . . . . . . . .  6
  74        1.4.3.  Multilingual . . . . . . . . . . . . . . . . . . . . .  6
  75        1.4.4.  Localization . . . . . . . . . . . . . . . . . . . . .  6
  76        1.4.5.  Internationalization . . . . . . . . . . . . . . . . .  7
  77      1.5.  Statements and Guidelines  . . . . . . . . . . . . . . . .  7
  78        1.5.1.  IESG Statement . . . . . . . . . . . . . . . . . . . .  7
  79        1.5.2.  ICANN statements . . . . . . . . . . . . . . . . . . .  8
  80    2.  Problems and Issues  . . . . . . . . . . . . . . . . . . . . . 10
  81      2.1.  User conceptions, local character sets, and input
  82            issues . . . . . . . . . . . . . . . . . . . . . . . . . . 11
  83      2.2.  Examples of issues . . . . . . . . . . . . . . . . . . . . 12
  84        2.2.1.  Language specific character matching . . . . . . . . . 12
  85        2.2.2.  Multiple scripts . . . . . . . . . . . . . . . . . . . 12
  86        2.2.3.  Normalization and Character Mappings . . . . . . . . . 13
  87        2.2.4.  URLs in Printed Form . . . . . . . . . . . . . . . . . 15
  88        2.2.5.  Bidirectional text . . . . . . . . . . . . . . . . . . 16
  89        2.2.6.  Confusable Character Issues  . . . . . . . . . . . . . 16
  90        2.2.7.  The IESG Statement and IDNA issues . . . . . . . . . . 17
  91        2.2.8.  Versions of Unicode  . . . . . . . . . . . . . . . . . 18
  92    3.  Framework for next steps in IDN development  . . . . . . . . . 19
  93      3.1.  Issues within the scope of the IETF  . . . . . . . . . . . 19
  94        3.1.1.  Review of IDNA . . . . . . . . . . . . . . . . . . . . 19
  95        3.1.2.  Non-DNS and Above-DNS Internationalization
  96                Approaches . . . . . . . . . . . . . . . . . . . . . . 20
  97        3.1.3.  Security issues, certificates, etc.  . . . . . . . . . 21
  98        3.1.4.  Non US-ASCII in local part of email addresses  . . . . 22
  99        3.1.5.  Use of the Unicode Character Set in the IETF . . . . . 22
 100      3.2.  Issues that fall within the purview of ICANN . . . . . . . 23
 101        3.2.1.  Dispute resolution . . . . . . . . . . . . . . . . . . 23
 102        3.2.2.  Policy at registries . . . . . . . . . . . . . . . . . 23
 103        3.2.3.  IDN TLDs . . . . . . . . . . . . . . . . . . . . . . . 24
 104    4.  Specific Recommendations for Next Steps  . . . . . . . . . . . 24
 105      4.1.  Reduction of permitted character list  . . . . . . . . . . 24
 106        4.1.1.  Elimination of all non-language characters . . . . . . 25
 107        4.1.2.  Elimination of word-separation punctuation . . . . . . 25
 108      4.2.  Updating to new versions of Unicode  . . . . . . . . . . . 25
 117      4.3.  Combining Characters and Character Components  . . . . . . 26
 118      4.4.  Role and Uses of the DNS . . . . . . . . . . . . . . . . . 26
 119      4.5.  Databases of Registered Names  . . . . . . . . . . . . . . 27
 120    5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 27
 121    6.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 27
 122    7.  Change History . . . . . . . . . . . . . . . . . . . . . . . . 28
 123      7.1.  Changes for version -01  . . . . . . . . . . . . . . . . . 28
 124      7.2.  Changes for version -02  . . . . . . . . . . . . . . . . . 28
 125      7.3.  Changes for Version -03  . . . . . . . . . . . . . . . . . 29
 126      7.4.  Changes for version -04  . . . . . . . . . . . . . . . . . 29
 127      7.5.  Changes for version -05  . . . . . . . . . . . . . . . . . 29
 128    8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 29
 129      8.1.  Normative References . . . . . . . . . . . . . . . . . . . 29
 130      8.2.  Informative References . . . . . . . . . . . . . . . . . . 30
 131    Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 34
 132    Intellectual Property and Copyright Statements . . . . . . . . . . 35
 133
 173 1.  Introduction
 174
 175 1.1.  Status of this Document and its Recommendations
 176
 177    This document reviews the IDN landscape from an IETF perspective and
 178    presents the recommendations and conclusions of the IAB, based
 179    partially on input from an ad hoc committee charged with reviewing
 180    IDN issues and the path forward (See Section 6).  Its recommendations
 181    are recommendations to the IETF, or in a few cases to other bodies,
 182    for topics to be examined and actions to be taken if those bodies,
 183    after their examinations, consider those actions appropriate.
 184
 185    IMPORTANT: The IAB has not yet reached consensus that this document
 186    is ready for final publication.  While considerable input from the
 187    members of the ad hoc committee went into the document, no claim is
 188    made that it represents the consensus of that group.  However, the
 189    IAB concluded that it was appropriate to expose these versions, as
 190    working drafts, for community comment and feedback.  Such comments
 191    should be sent to iab@iab.org.
 192
 193 1.2.  The IDNA Standard
 194
  195    During 2002, the IETF completed the following RFCs that, together,
  196    define IDNs:
 197
 198    RFC 3454 Preparation of Internationalized Strings ("Stringprep")
 199       [RFC3454].
 200       Stringprep is a generic mechanism for taking a Unicode string and
 201       converting it into a canonical format.  Stringprep itself is just
 202       a collection of rules, tables, and operations.  Any protocol or
 203       algorithm that uses it must define a "Stringprep profile", which
 204       specifies which of those rules are applied, how, and with which
 205       characteristics.
 206
 207    RFC 3490 Internationalizing Domain Names in Applications (IDNA)
 208       [RFC3490].
 209       IDNA is the base specification in this group.  It specifies that
 210       Nameprep is used as the Stringprep profile for domain names, and
 211       that Punycode is the relevant encoding mechanism for use in
 212       generating an ASCII-compatible ("ACE") form of the name.  It also
 213       applies some additional conversions and character filtering that
 214       are not part of Nameprep.
 215
 216    RFC 3491 Nameprep: A Stringprep Profile for Internationalized Domain
 217       Names (IDN) [RFC3491].
 218       Nameprep is one such profile.  It is designed to meet the specific
 219       needs of IDNs and, in particular, to support case-folding for
 220       scripts that support what are traditionally known as upper and
 229       lower case forms of the same letters.  The result of the Nameprep
 230       algorithm is a string containing a subset of the Unicode Character
 231       set, normalized and case folded so that case insensitive
 232       comparison can be made.
 233
 234    RFC 3492 Punycode: A Bootstring encoding of Unicode for
 235       Internationalized Domain Names in Applications (IDNA) [RFC3492].
 236       Punycode is a mechanism for encoding a Unicode string in ASCII
  237       characters.  The characters used are the same subset of
 238       characters that are allowed in the hostname definition of DNS,
 239       i.e., the "letter, digit, and hyphen" characters, sometimes known
 240       as "LDH".
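
The relationship among these four specifications can be illustrated
with a short sketch.  It relies on the Python standard library, whose
"encodings.idna" module and "punycode" codec implement the 2003
versions of Nameprep and Punycode.  The helper name to_ace is an
assumption made for this illustration; it is not taken from the RFCs
and it omits the length and ACE-prefix checks that the full ToASCII
operation of RFC 3490 performs.

```python
import encodings.idna


def to_ace(label: str) -> str:
    """Convert one label to its ASCII-compatible (ACE) form (simplified)."""
    # RFC 3491 (Nameprep): map and case-fold, normalize, reject prohibited input.
    prepared = encodings.idna.nameprep(label)
    if prepared.isascii():
        # Nothing left to encode; the prepared label is already LDH-compatible.
        return prepared
    # RFC 3492 (Punycode): encode the Unicode string using ASCII characters only.
    encoded = prepared.encode("punycode").decode("ascii")
    # RFC 3490 (IDNA): mark the result with the ACE prefix.
    return "xn--" + encoded


label = "B\u00dcCHER"  # "BUCHER" with U+00DC in the second position
print(encodings.idna.nameprep(label))        # case-folded, normalized form
print(to_ace(label))                         # xn--bcher-kva
print(label.encode("idna").decode("ascii"))  # same label via the built-in codec
```

For labels containing non-ASCII characters, the built-in "idna" codec
performs the same steps in one call; the sketch separates them only to
show which RFC contributes which step.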
 241
 242 1.3.  Unicode Documents
 243
 244    Unicode is used as the base, and defining, character set for IDN.
 245    Unicode is standardized by the Unicode Consortium, and synchronized
 246    with ISO to create ISO/IEC 10646 [ISO10646].  At the time the RFCs
 247    mentioned earlier were created, Unicode was at version 3.2.  For
 248    reasons explained later, it was necessary to pick a particular, then-
 249    current, version of Unicode when IDNA was adopted.  Consequently, the
 250    RFCs are explicitly dependent on Unicode version 3.2 [Unicode32].
 251    There is, at present, no established mechanism for modifying the IDNA
 252    RFCs to use newer Unicode versions (see Section 2.2.8).
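
The practical effect of this version dependency can be seen in a small
sketch.  It assumes a Python interpreter whose "unicodedata" module is
based on Unicode 5.1 or later, and it uses U+1E9E (LATIN CAPITAL
LETTER SHARP S), a character added to Unicode after version 3.2, as
the example.  The choice of character is an assumption made for
illustration only.

```python
import stringprep
import unicodedata

CAPITAL_SHARP_S = "\u1e9e"  # assigned in Unicode 5.1, i.e. after version 3.2

# General category according to the interpreter's current Unicode database.
current = unicodedata.category(CAPITAL_SHARP_S)

# General category according to the Unicode 3.2 database that the
# standard library keeps for Stringprep-based protocols.
pinned = unicodedata.ucd_3_2_0.category(CAPITAL_SHARP_S)

# Stringprep table A.1 lists the code points unassigned in Unicode 3.2.
unassigned = stringprep.in_table_a1(CAPITAL_SHARP_S)

print(current, pinned, unassigned)
```

A character that current systems treat as an ordinary letter is, from
the point of view of the 2003 RFCs, an unassigned code point.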
 253
 254    Unicode is a very large and complex character set.  (The term
 255    "character set" or "charset" is used in a way that is peculiar to the
 256    IETF and may not be the same as the usage in other bodies and
 257    contexts.)  The Unicode Standard and related documents are created
 258    and maintained by the Unicode Technical Committee (UTC), one of the
 259    committees of the Unicode Consortium.
 260
 261    The Consortium first published The Unicode Standard [Unicode10] in
 262    1991, and continues to develop standards based on that original work.
 263    Unicode is developed in conjunction with the International
 264    Organization for Standardization, and it shares its character
 265    repertoire with ISO/IEC 10646.  Unicode and ISO/IEC 10646 function
 266    equivalently as character encodings, but The Unicode Standard
 267    contains much more information for implementers, covering -- in depth
 268    -- topics such as bitwise encoding, collation, and rendering.  The
 269    Unicode Standard enumerates a multitude of character properties,
 270    including those needed for supporting bidirectional text.  The
 271    Unicode Consortium and ISO standards do use slightly different
 272    terminology.
 273
 274 1.4.  Definitions
 275
 276    The following terms and their meanings are critical to understanding
 285    the rest of this document and to discussions of IDNs more generally.
 286    These terms are derived from [RFC3536], which contains additional
 287    discussion of some of them.
 288
 289 1.4.1.  Language
 290
 291    A language is a way that humans interact.  The use of language occurs
 292    in many forms, including speech, writing, and signing.
 293
 294    Some languages have a close relationship between the written and
 295    spoken forms, while others have a looser relationship.  RFC 3066
 296    [RFC3066] discusses languages in more detail and provides identifiers
 297    for languages for use in Internet protocols.  Computer languages are
 298    explicitly excluded from this definition.  The most recent IETF work
 299    in this area, and on script identification (see below), is documented
 300    in [ltru-registry] and [ltru-initial].
 301
 302 1.4.2.  Script
 303
 304    A script is a set of graphic characters used for the written form of
 305    one or more languages.  This definition is the one used in
 306    [ISO10646].
 307
 308    Examples of scripts are Arabic, Cyrillic, Greek, Han (the so-called
 309    ideographs used in writing Chinese, Japanese, and Korean), and Latin
  310    (more properly "Roman", see below).  Arabic, Greek, and Latin are, of
 311    course, also names of languages.  Some issues with script
 312    identification and relationships with other standards are discussed
 313    in [ltru-registry].
 314
 315 1.4.3.  Multilingual
 316
 317    The term "multilingual" has many widely-varying definitions and thus
 318    is not recommended for use in standards.  Some of the definitions
 319    relate to the ability to handle international characters; other
 320    definitions relate to the ability to handle multiple charsets; and
 321    still others relate to the ability to handle multiple languages.
 322
 323    While this term has been deprecated for IETF-related uses and does
 324    not otherwise appear in this document, a discussion here seemed
 325    appropriate since the term is still widely used in some discussions
 326    of IDNs.
 327
 328 1.4.4.  Localization
 329
 330    Localization is the process of adapting an internationalized
 331    application platform or application to a specific cultural
 332    environment.  In localization, the same semantics are preserved while
 341    the syntax or presentation forms may be changed.
 342
 343    Localization is the act of tailoring an application for a different
 344    language or script or culture.  Some internationalized applications
 345    can handle a wide variety of languages.  Typical users only
 346    understand a small number of languages, so the program must be
 347    tailored to interact with users in just the languages they know.
 348
 349    Somewhat different definitions for localization and
 350    internationalization (see below) are used by groups other than the
 351    IETF.  See [W3C-Localization] for one example.
 352
 353 1.4.5.  Internationalization
 354
 355    In the IETF, the term "internationalization" is used to describe
 356    adding or improving the handling of non-ASCII text in a protocol.
 357    Other bodies use the term in other ways, often ones that are subtly
 358    different from each other.  The term "internationalization" is often
 359    abbreviated "i18n".
 360
 361    Many protocols that handle text only handle the characters associated
 362    with one script (often, a subset of the characters used in writing
 363    English text), or leave the question of what character set is used up
 364    to local guesswork (which leads, of course, to interoperability
 365    problems).  Adding non-ASCII text to such a protocol allows the
 366    protocol to handle more scripts, with the intention of being able to
 367    include all of the scripts that are useful in the world.  It should
 368    be noted that many English words cannot be written in ASCII, various
 369    mythologies notwithstanding.
 370
 371 1.5.  Statements and Guidelines
 372
 373    When the IDN RFCs were published, IESG and ICANN made statements that
 374    were intended to guide deployment and future work.  In recent months,
 375    ICANN has updated its statement and others have also made
 376    contributions.  It is worth noting that the quality of understanding
 377    of internationalization issues as applied to the DNS has evolved
 378    considerably over the last few years.  Organizations that took
 379    specific positions a year or more ago might not make exactly the same
 380    statements today.
 381
 382 1.5.1.  IESG Statement
 383
 384    The IESG made a statement on IDNA [IESG-IDN]:
 385
 397        IDNA, through its requirement of Nameprep [RFC3491], uses
 398        equivalence tables that are based only on the characters
 399        themselves; no attention is paid to the intended language (if any)
 400        for the domain name. However, for many domain names, the intended
 401        language of one or more parts of the domain name actually does
 402        matter to the users.
 403
 404        Similarly, many names cannot be presented and used without
 405        ambiguity unless the scripts to which their characters belong are
 406        known. In both cases, this additional information should be of
 407        concern to the registry.
 408
  409    The statement is longer than this, but these paragraphs are the
  410    important ones.  The rest of the statement consists of explanations
  411    and examples.
 412
 413 1.5.2.  ICANN statements
 414
 415 1.5.2.1.  Initial ICANN Guidelines
 416
 417    Soon after the IDNA standard was adopted, ICANN produced an initial
 418    version of its "IDN Guidelines" [ICANNv1].  This document was
 419    intended to serve two purposes.  The first was to provide a basis for
 420    releasing the gTLD registries that had been established by ICANN from
 421    a contractual restriction on the registration of labels containing
 422    hyphens in the third and fourth positions.  The second was to provide
 423    a general framework for the development of registry policies for the
 424    implementation of IDN.
 425
 426    One of the key components of this framework was prescribing strict
 427    compliance with RFCs 3490, 3491, and 3492.  These specifications
 428    established the ACE (ASCII-Compatible Encoding) scheme for IDN use,
 429    known as "Punycode", and the various rules for its use.  The
 430    specifications designated Punycode, supported by those rules, as the
 431    sole such encoding to be used with the DNS.
 432
 433    Limitations on the characters available for inclusion in IDNs were
 434    mandated by two devices.  The first was by requiring an "inclusion-
 435    based approach (meaning that code points that are not explicitly
 436    permitted by the registry are prohibited) for identifying permissible
 437    code points from among the full Unicode repertoire."  The second
 438    device required the association of every IDN with a specific
 439    language, with additional policies also being language based:
 440
 441    "In implementing the IDN standards, top-level domain registries will
 442    (a) associate each registered internationalized domain name with one
 443    language or set of languages,
 444    (b) employ language-specific registration and administration rules
 453    that are documented and publicly available, such as the reservation
 454    of all domain names with equivalent character variants in the
 455    languages associated with the registered domain name, and,
 456    (c) where the registry finds that the registration and administration
 457    rules for a given language would benefit from a character variants
 458    table, allow registrations in that language only when an appropriate
 459    table is available. ...  In implementing the IDN standards, top-level
 460    domain registries should, at least initially, limit any given domain
 461    label (such as a second-level domain name) to the characters
 462    associated with one language or set of languages only."
 463
 464    It was left to each TLD registry to define the character repertoire
 465    it would associate with any given language.  This led to significant
 466    variation from registry to registry, with further heterogeneity in
 467    the underlying language-based IDN policies.  If the guidelines had
 468    made provision for IDN policies also being based on script, a
 469    substantial amount of the resulting ambiguity could have been
 470    avoided.  However, they did not, and the sequence of events leading
 471    to the present review of IDNA was thus triggered.
 472
 473 1.5.2.2.  ICANN Version 2 Guidelines
 474
  475    One of the responses of the TLD registries to what was widely
  476    perceived as a crisis situation was to invoke the mechanism described
  477    in the initial guidelines: "As the deployment of IDNs proceeds, ICANN
  478    and the IDN registries will review these Guidelines at regular
  479    intervals, and revise them as necessary based on experience."
 480
 481    The pivotal requirement was the modification of the guidelines to
 482    permit script-based IDN policies.  Further concern was expressed
 483    about the need for realistically implementable mechanisms for the
 484    propagation of TLD registry policies into the lower levels of their
 485    name trees.  In addition to the anticipated increase of constraint on
 486    the protocol level, one obvious additional approach would be to
 487    replace the guidelines by an instrument which itself had clear status
 488    in the IETF's normative framework.  A BCP was therefore seen as the
 489    appropriate focus for longer-term effort.  The most pressing issues
 490    would be dealt with in the interim by incremental modification to the
 491    guidelines, but no need was seen for the detailed further development
  492    of those guidelines once that incremental modification was complete.
 493
 494    The outcome of this action was a version 2.0 of the guidelines
 495    [ICANNv2] which was endorsed by the ICANN Board on November 8, 2005
 496    for a period of nine months.  The Board stated further that it "tasks
 497    the IDN working group to continue its important work and return to
 498    the board with specific IDN improvement recommendations before the
 499    ICANN Meeting in Morocco" and "supports the working group's continued
 500    action to reframe the guidelines completely in a manner appropriate
 509    for further development as a Best Current Practices (BCP) document,
 510    to ensure that the Guideline directions will be used deeper into the
 511    DNS hierarchy and within TLD's where ICANN has a lesser policy
 512    relationship."
 513
 514    Retaining the inclusion-based approach established in version 1.0,
 515    the crucial addition to the policy framework is that:
 516
 517    "All code points in a single label will be taken from the same script
 518    as determined by the Unicode Standard Annex #24: Script Names at
 519    http://www.unicode.org/reports/tr24.  Exception to this is
 520    permissible for languages with established orthographies and
 521    conventions that require the commingled use of multiple scripts.  In
 522    such cases, visually confusable characters from different scripts
 523    will not be allowed to co-exist in a single set of permissible
 524    codepoints unless a corresponding policy and character table is
 525    clearly defined."
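
A registry-side check for the single-script rule quoted above might be
sketched as follows.  The Python standard library does not expose the
Script property defined in Unicode Standard Annex #24, so this sketch
approximates a character's script by the first word of its Unicode
character name.  That approximation, and the function names
rough_script, scripts_in_label, and is_single_script, are assumptions
made for illustration; a real implementation would consult the
Scripts.txt data file.  Characters without a Unicode name are not
handled.

```python
import unicodedata

# ASCII digits and the hyphen are treated as common to every script here.
COMMON = set("0123456789-")


def rough_script(ch: str) -> str:
    """Approximate a script name from the first word of the character name."""
    return unicodedata.name(ch).split()[0]


def scripts_in_label(label: str) -> set:
    """Return the approximate scripts used by the characters of a label."""
    return {rough_script(ch) for ch in label if ch not in COMMON}


def is_single_script(label: str) -> bool:
    """True if all script-specific characters of the label share one script."""
    return len(scripts_in_label(label)) <= 1


print(is_single_script("paypal"))        # all Latin
print(scripts_in_label("p\u0430ypal"))   # second letter is CYRILLIC SMALL LETTER A
print(is_single_script("p\u0430ypal"))
```

A label that mixes Kanji and Kana would also be reported as using more
than one script by this check, which corresponds to the exception the
guideline makes for orthographies that require several scripts.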
 526
 527    Additionally:
 528
 529    "Permissible code points will not include: (a) line symbol-drawing
 530    characters (as those in the Unicode Box Drawing block), (b) symbols
 531    and icons that are neither alphanumeric nor ideographic language
 532    characters, such as typographic and pictographic dingbats, (c)
 533    characters with well-established functions as protocol elements, (d)
 534    punctuation marks used solely to indicate the structure of
 535    sentences."
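
An inclusion-based filter of this kind can be approximated with
Unicode general categories.  The set of permitted categories below is
an assumption made for this sketch and is not taken from the ICANN
guidelines; the label is assumed to have been processed by Nameprep
already, so uppercase letters are not expected.

```python
import unicodedata

# Categories treated as permissible in this sketch: lowercase, other, and
# modifier letters; combining marks; decimal digits.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def is_permissible(ch: str) -> bool:
    """Inclusion-based test: anything not explicitly allowed is rejected."""
    return ch == "-" or unicodedata.category(ch) in ALLOWED_CATEGORIES


def rejected_code_points(label: str) -> list:
    """List the code points of a label that the filter would reject."""
    return [f"U+{ord(ch):04X}" for ch in label if not is_permissible(ch)]


print(rejected_code_points("caf\u00e9"))     # letters only
print(rejected_code_points("a\u2500b!"))     # box-drawing character and punctuation
```

Box-drawing characters and dingbats fall into the symbol categories,
and sentence punctuation into the punctuation categories, so none of
them pass a filter that admits only letters, marks, and digits.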
 536
 537    Attention has been called to several points that are not adequately
 538    dealt with (if at all) in the version 2.0 guidelines but which ought
 539    to be included in the policy framework without waiting for the
 540    production and release of a document based on a "best practices"
 541    model.  The term "BCP" above does not necessarily refer to an IETF
 542    consensus document.  The recommendations to be put to the ICANN Board
 543    prior to its meeting in Morocco (in late June 2006) will therefore be
 544    collated incrementally and appear in interim version 2.n releases of
 545    the guidelines.
 546
 547
 548 2.  Problems and Issues
 549
 550    This section intentionally mixes problems and issues of several
 551    types.  Each subsection outlines something that is perceived to be a
 552    problem or issue "with IDNs", therefore needing correction.  Some of
 553    these issues can be at least partially resolved by making changes to
 554    elements of the IDNA protocol or tables.  Others will exist as long
 555    as people have expectations of IDNs that are inconsistent with the
 556    basic DNS architecture.  It is important to identify this entire
 565    range of problems because users, registrants, and policy makers often
 566    do not understand the protocol and other technical issues but only
 567    the difference between what they believe happens or should happen and
 568    what actually happens.  As long as those differences exist, there
 569    will be demands for functionality or policy changes for IDN.  Of
 570    course, some of these demands will be less realistic than others but
 571    even the realistic ones should be understood in the same context as
 572    the others.
 573
 574 2.1.  User conceptions, local character sets, and input issues
 575
 576    People use "words" when they think of things and wish others to think
  577    of them too, for example, "orange", "tree", "restaurant", or "Acme
 578    Inc".  Words are normally in a specific language, such as English or
 579    Swedish.  The DNS, however, supports character-string labels, not
 580    "words".  While it is useful, especially for mnemonic value or to
 581    identify objects, for actual words to be used as DNS labels, other
 582    constraints on the DNS make it impossible to guarantee that it will
 583    be possible to represent every word in every language as a DNS label,
 584    internationalized or not.
 585
 586    When writing or typing the label (or word), a script must be selected
 587    and a charset must be picked for use with that script.  That choice
 588    of charset is typically not under the control of the user on a per
 589    word or per document basis, but may depend on local input devices,
 590    keyboard or terminal drivers, or other decisions made by operating
 591    system or even hardware designers and implementers.
 592
 593    If that charset, or the local charset being used by the relevant
 594    operating system or application software, is not Unicode, a further
 595    conversion must be performed to produce Unicode.  How often this is
 596    an issue depends on estimates of how widely Unicode is deployed as
 597    the native character set for hardware, operating systems, and
 598    applications.  Those estimates differ widely, with some Unicode
 599    advocates claiming that it is used in the vast majority of systems
 600    and applications today.  Others are more skeptical, pointing out
 601    that:
 602
 603    o  ISO 8859 versions [ISO.8859.2003] and even national variations of
 604       ISO 646 [ISO.646.1991] are still widely used in parts of Europe;
 605    o  code-table switching methods, typically based on the techniques of
 606       ISO 2022 [ISO.2022.1986] are still in general use in many parts of
 607       the world, especially in Japan with Shift-JIS and its variations;
  608    o  computing, systems, and communications in China tend to use
 609       one or more of the national "GB" standards rather than native
 610       Unicode;
 621    o  and so on.
 622
 623    Not all charsets define their characters in the same way and not all
 624    pre-existing coding systems were incorporated into Unicode without
 625    changes.  Sometimes local distinctions were made that Unicode does
 626    not make or vice versa.  Consequently, conversion from other systems
 627    to Unicode may potentially lose information.
 628
 629    The Unicode string that results from this processing --processing
 630    that is trivial in a Unicode-native system but that may be
 631    significant in others-- is then used as input to IDNA.
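
The dependence of the IDNA input on the local charset can be shown
with a small sketch.  The byte string below is a hypothetical example;
the same bytes are decoded under two different ISO 8859 parts, and the
two resulting Unicode strings are then passed to the IDNA codec of the
Python standard library.

```python
# One byte string as it might arrive from a non-Unicode local system.
raw = b"torbj\xf8rn"

# The meaning of byte 0xF8 depends on which charset is assumed.
as_latin1 = raw.decode("iso-8859-1")  # 0xF8 is U+00F8, o with stroke
as_latin2 = raw.decode("iso-8859-2")  # 0xF8 is U+0159, r with caron

# Different Unicode strings lead to different ACE labels.
ace_latin1 = as_latin1.encode("idna")
ace_latin2 = as_latin2.encode("idna")

print(as_latin1, ace_latin1)
print(as_latin2, ace_latin2)
```

If the charset is left to local guesswork, the label that reaches the
DNS may therefore not be the one the user intended.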
 632
 633 2.2.  Examples of issues
 634
 635    While much of the discussion below is stated in terms of Unicode
 636    codings and associated rules, the IAB believes that some of the
 637    issues are actually not about the Unicode Character set per se, but
 638    about how distributed matching systems operate in reality, and about
 639    what implications the distributed delayed search for stored data that
  640    characterizes the DNS has on the mapping algorithms.
 641
 642 2.2.1.  Language specific character matching
 643
 644    There are similar words that can be expressed in multiple languages.
 645    One example is the name Torbjorn in Norwegian and Swedish.  In
 646    Norwegian it is spelled with the character U+00F8 (LATIN SMALL LETTER
 647    O WITH STROKE) in the second syllable, while in Swedish it is spelled
 648    with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS).  Those characters
 649    are not treated as equivalent by the Unicode Consortium, while most
 650    people speaking Swedish, Danish and Norwegian probably think they are
 651    equivalent.
 652
 653    It is neither possible nor desirable to make these characters
 654    equivalent on a global basis.  To do so would, for this example,
 655    rationalize the situation in Sweden while causing considerable
 656    confusion in Germany, where the U+00F8 character is never used in the
 657    German language.  But the "variant" model introduced in [RFC3743] and
 658    [RFC4290] can be used by a registry to prevent the worst consequence
 659    of the possible confusion, either by ensuring that both names are
 660    registered to the same party in a given domain or that one of them
 661    is completely prohibited.
 662
 663 2.2.2.  Multiple scripts
 664
 665    There are languages in the world that can be expressed using multiple
 666    scripts.  For example, some Eastern European and Central Asian
 667    languages can be expressed in either Cyrillic or Roman characters;
 668    some African and Southeast Asian languages can be expressed in either
 677    Arabic or Roman characters.  A few languages can even be written in
 678    three different scripts.  In other cases, the language is typically
 679    written in a combination of scripts (e.g., Kanji and Kana for
 680    Japanese, Hangul and Hanja for Korean).  Because of this, the same
 681    word, in the same language, can be expressed in different ways.  For
 682    some languages, only a single script is normally used to write a
 683    single word; for others, mixed scripts are required; and, for still
 684    others, special circumstances may dictate mixing scripts in labels
 685    although that is not normally done for "words".  For IDN purposes,
 686    these variations make the definition of "script" extremely sensitive,
 687    especially since ICANN is now recommending that it be used as the
 688    primary basis for registry policies.  However essential it may be to
 689    prohibit mixed-script labels, additional policy nuance is required
 690    for "languages with established orthographies and conventions that
 691    require the commingled use of multiple scripts".
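
        The policy tension described above can be sketched in code.  The
        example is a rough heuristic under stated assumptions: the Python
        standard library does not expose the Unicode Script property, so
        the hypothetical helper rough_scripts() uses the first word of
        each character name as a stand-in for its script.

```python
import unicodedata


def rough_scripts(label):
    """Return script-like name prefixes for the letters in a label.

    Heuristic only: the first word of the Unicode character name is
    used as an approximation of the character's script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


# Latin letters with one Cyrillic letter mixed in (second character).
mixed_latin_cyrillic = rough_scripts("p\u0430ypal")

# Two Kanji followed by one Hiragana letter, a normal Japanese mixture.
japanese = rough_scripts("\u65e5\u672c\u306e")

print(mixed_latin_cyrillic, japanese)
```

        Both labels mix "scripts" under this test, which is why a flat
        prohibition on mixed-script labels would also reject ordinary
        Japanese text.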
 692
 693 2.2.3.  Normalization and Character Mappings
 694
 695    Unicode contains several different models for representing
 696    characters.  The Chinese (Han)-derived characters of the "CJK"
 697    languages are "unified", i.e., characters with common derivation and
 698    similar appearances are assigned to the same code point.  European
 699    characters derived from a Greek-Roman base are separated into
 700    separate code blocks for "Latin", Greek and Cyrillic even when
 701    individual characters are identical in both form and semantics.
 702    Separate code points based on font differences alone are generally
 703    prohibited, but a large number of characters for "mathematical" use
 704    have been assigned separate code points even though they differ from
 705    base ASCII characters only by font attributes such as "script",
 706    "bold", or "italic".  Some characters that often appear together are
 707    treated as typographical digraphs with specific code points assigned
 708    to the combination; others require that the two-character sequences
 709    be used, and still others are available in both forms.  Some Roman-
 710    based letters that were developed as decorated variations on the
 711    basic Latin letter collection (e.g., by addition of diacritical
 712    marks) are assigned code points as individual characters; others must
 713    be built up as two (or more) character sequences using "composing
 714    characters".
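
        A few of these representation models can be observed with
        Python's unicodedata module.  The sketch assumes normalization
        form NFKC and uses three sample code points chosen for
        illustration.

```python
import unicodedata

math_bold_a = "\U0001d41a"   # MATHEMATICAL BOLD SMALL A (font variant)
ligature_fi = "\ufb01"       # LATIN SMALL LIGATURE FI (digraph form)
cyrillic_a = "\u0430"        # CYRILLIC SMALL LETTER A (separate script)

# Compatibility normalization folds the font variant and the ligature
# into plain ASCII letters.
folded_math = unicodedata.normalize("NFKC", math_bold_a)
folded_fi = unicodedata.normalize("NFKC", ligature_fi)

# The Cyrillic letter is left alone, although it looks like Latin "a".
folded_cyrillic = unicodedata.normalize("NFKC", cyrillic_a)

print(folded_math, folded_fi, folded_cyrillic == "a")
```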
 715
 716    Many of these differences result from the desire to maintain backward
 717    compatibility while the standard evolved historically, and are hence
 718    understandable.  However, the DNS requires precise knowledge of which
 719    codes and code sequences represent the same character and which ones
 720    do not.  Limiting the potential difficulties with confusable
 721    characters (see Section 2.2.6) requires even more knowledge of which
 722    characters might look alike in some fonts but not in others.  These
 723    variations make it difficult or impossible to apply a single set of
 724    rules to all of Unicode and, in doing so, satisfy everyone and their
 733    perceived needs.  Instead, more or less complex mapping tables,
 734    defined on a character by character basis, are required to
 735    "normalize" different representations of the same character to a
 736    single form so that matching is possible.
 737
 738    Unless normalization rules, such as those that underlie Nameprep, are
 739    applied, characters that are essentially identical will not match in
 740    the DNS, creating many opportunities for problems.  The most common
 741    one is that, because of the conversion steps described above, a
 742    single word can end up being expressed as more than one distinct
 743    Unicode string.
 744
 745    IDNA attempts to compensate for some of these problems by using a
 746    normalization algorithm defined by the Unicode Consortium.  This
 747    algorithm can change a sequence of one or more Unicode characters to
 748    another sequence of characters.  One example is that the base
 749    character U+0061 (LATIN SMALL LETTER A) followed by U+0308 (COMBINING
 750    DIAERESIS) is changed to the single Unicode character U+00E4 (LATIN
 751    SMALL LETTER A WITH DIAERESIS).
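
        The composition described above can be reproduced with Python's
        unicodedata module.  This is a minimal sketch that assumes form
        NFKC; the variable names are illustrative.

```python
import unicodedata

# U+0061 LATIN SMALL LETTER A followed by U+0308 COMBINING DIAERESIS.
decomposed = "a\u0308"

# Normalization composes the pair into one precomposed character.
composed = unicodedata.normalize("NFKC", decomposed)

print(len(decomposed), len(composed), f"U+{ord(composed):04X}")
```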
 752
 753    This Unicode normalization process accounts only for simple character
 754    equivalences, not equivalences that are language or script dependent.
 755    For example, as mentioned above, the characters U+00F8 (LATIN SMALL
 756    LETTER O WITH STROKE) and U+00F6 (LATIN SMALL LETTER O WITH
 757    DIAERESIS) are considered to match in Swedish (and some other
 758    languages), but not for all languages that use either of the
 759    characters.  Having these characters be treated as equivalent in some
 760    contexts and not in others requires decisions and mechanisms that, in
 761    turn, depend much more on context than either IDNA or the Unicode
 762    character-based normalization tables can provide.
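
        The limit of character-based normalization can be shown with the
        two spellings of the name used earlier.  In this sketch the
        helper same_after_nfkc() is hypothetical and stands in for the
        comparison a lookup would effectively perform.

```python
import unicodedata


def same_after_nfkc(a, b):
    """Compare two strings after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


norwegian = "torbj\u00f8rn"            # o with stroke
swedish = "torbj\u00f6rn"              # o with diaeresis, precomposed
swedish_decomposed = "torbjo\u0308rn"  # o followed by combining diaeresis

print(same_after_nfkc(swedish, swedish_decomposed))
print(same_after_nfkc(norwegian, swedish))
```

        The two encodings of the Swedish spelling match, but the
        Norwegian and Swedish spellings do not, whatever a reader of
        either language might expect.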
 763
 764    If we leave Roman-based scripts and examine those based on Chinese
 765    characters, we see there is also an absence of specific, lexigraphic,
 766    rules for transformations between Traditional and Simplified Chinese.
 767    Even if there were such rules, unification of Japanese and Korean
 768    characters with Chinese ones would make it impossible to normalize
 769    Traditional Chinese characters into Simplified ones without causing
 770    problems in Japanese and Korean use of the same characters.
 771
 772    More generally, while some mappings, such as those between
 773    precomposed Roman-based characters and the equivalent multiple code
 774    point composed character sequences, depend only on the characters
 775    themselves, in many or most cases, such as the case with Swedish
 776    above, the mapping is language or culturally dependent.  There have
 777    been discussions as to whether different canonicalization rules (in
 778    addition to or instead of Unicode normalization) should be, or could
 779    be, applied differently to different languages or scripts.  The fact
 780    that most scripts included in Unicode have been initially
 789    incorporated by copying an existing standard more or less intact has
 790    impact on the optimization of these algorithms and on forward
 791    compatibility.  Even if the language is known and language-specific
 792    rules can be defined, dependencies on the language do not disappear.
 793    Canonicalization operations are not possible unless they either
 794    depend only on short sequences of text or have significant context
 795    available that is not obvious from the text itself.  DNS lookups and
 796    many other operations do not have a way to capture and utilize the
 797    language or other information that would be needed to provide that
 798    context.
 799
 800    These variations in languages and in user perceptions of characters
 801    make it difficult or impossible to provide uniform algorithms for
 802    matching Unicode strings in a way that no end users are ever
 803    surprised by the result.  For closely-related scripts or characters,
 804    surprises may even be frequent.  However, because uniform algorithms
 805    are required for mappings that are applied when names are looked up
 806    in the DNS, the rules that are chosen will always represent an
 807    approximation that will be more or less successful in minimizing
 808    those user surprises.  The current Nameprep and Stringprep algorithms
 809    use mapping tables to "normalize" different representations of the
 810    same text to a single form so that matching is possible.
 811
 812    More details on the creation of the normalization algorithms can be
 813    found in the Unicode Specification and the associated Technical
 814    Reports [UTR] and Annexes.  Technical Report #36 [UTR36] and [UTR39]
 815    are specifically related to the IDN discussion.
 816
 817 2.2.4.  URLs in Printed Form
 818
 819    URLs and other identifiers appear not only in electronic forms from
 820    which they can (at least in principle) be accurately copied and
 821    "pasted", but in printed forms from which the user must transcribe
 822    them into the computer system.  This is often known as the "side of
 823    the bus problem" because a particularly problematic version of it
 824    requires that the user be able to observe and accurately remember a
 825    URL that is quickly glimpsed in a transient form -- a billboard seen
 826    while driving, a sign on the side of a passing vehicle, a television
 827    advertisement that is not frequently repeated or on-screen for a long
 828    time, and so on.
 829
 830    The difficulty, in short, is that two Unicode strings that are
 831    actually different might look exactly the same, especially when there
 832    is no time to study them.  This is because, for example, some glyphs
 833    in Cyrillic, Greek and Latin do look the same, but have been assigned
 834    different codepoints in Unicode.  Worse, one needs to be reasonably
 835    familiar with a script and how it is used to understand how much
 836    characters can reasonably vary as the result of artistic fonts and
 845    typography.  For example, there are a few fonts for Latin characters
 846    that are sufficiently highly ornamented that an observer might easily
 847    confuse some of the characters with characters in Thai script.
 848
 849 2.2.5.  Bidirectional text
 850
 851    Some scripts (and because of that some words in some languages) are
 852    written not left to right, but right to left.  And, to complicate
 853    things, one might have something written in Arabic characters right
 854    to left that includes some Latin-script characters, such as
 855    European-style digits.  The Latin character part is written left to
 856    right, which implies some texts might have a mixed left to right AND
 857    right to left order (even though in most implementations all texts
 858    have a major direction, with the other as an exception).  IDNA
 859    prohibits these mixed-directional (or bidirectional) strings in IDN
 860    labels, but the prohibition causes other problems such as the
 861    rejection of some otherwise linguistically and culturally sensible
 862    strings.  As Unicode and conventions for handling so-called
 863    bidirectional ("BIDI") strings evolve, the prohibition in IDNA should
 864    be reviewed and reevaluated.
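
        The effect of the prohibition can be sketched with the tables in
        Python's stringprep module.  The function below is a simplified
        reading of the bidirectional rule in Stringprep (RFC 3454,
        Section 6), not a complete IDNA implementation, and its name is
        hypothetical.

```python
import stringprep


def passes_stringprep_bidi(label):
    """Apply a simplified form of the Stringprep bidirectional check."""
    # Table D.1 holds right-to-left (RandALCat) characters.
    if not any(stringprep.in_table_d1(ch) for ch in label):
        return True
    # Table D.2 holds left-to-right (LCat) characters, which must not
    # appear alongside right-to-left characters.
    has_ltr = any(stringprep.in_table_d2(ch) for ch in label)
    # The first and last characters must both be right-to-left.
    ends_rtl = (stringprep.in_table_d1(label[0])
                and stringprep.in_table_d1(label[-1]))
    return not has_ltr and ends_rtl


arabic = "\u0627\u0628\u062c"
arabic_then_digit = "\u0627\u06281"
digit_inside_arabic = "\u06271\u0628"

print(passes_stringprep_bidi(arabic),
      passes_stringprep_bidi(arabic_then_digit),
      passes_stringprep_bidi(digit_inside_arabic))
```

        Under this reading, an Arabic label that ends in a European
        digit is rejected even though the same digit is accepted inside
        the label, which is one way an otherwise sensible string can
        fail.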
 865
 866 2.2.6.  Confusable Character Issues
 867
 868    Similar-looking characters in identifiers can cause actual problems
 869    on the Internet since they can result, deliberately or accidentally,
 870    in people being directed to the wrong host or mailbox by believing
 871    that they are typing, or clicking on, intended characters which are
 872    different from those that actually appear in the domain name or
 873    reference.  See Section 3.1.3 for further discussion of this issue.
 874
 875    IDNs complicate these issues, not only by providing many additional
 876    characters that look sufficiently alike to be potentially confused,
 877    but by raising new policy questions.  For example, if a language can
 878    be written in two different scripts, is a label constructed from a
 879    word written in one script equivalent to a label constructed from the
 880    same word written in the other script?  Is the answer the same for
 881    words in two different languages that translate into each other?
 882
 883    It is now generally understood that, in addition to the collision
 884    problems of possibly equivalent words and hence labels, it is
 885    possible to utilize characters that look alike -- "confusable"
 886    characters -- to spoof names in order to mislead or defraud users.
 887    That issue, driven by particular attacks such as those known as
 888    "phishing", has introduced stronger requirements for registry efforts
 889    to prevent problems than were previously generally recognized as
 890    important.
 891
 892    One commonly-proposed approach is to have a registry establish
 901    restrictions on the characters, and combinations of characters, it
 902    will permit to be included in a string to be registered as a label.
 903    Taking the Swedish top-level domain, .SE, as an example, a rule might
 904    be adopted that the registry "only accepts registrations in Swedish,
 905    using Roman script, and because of this, Unicode characters Latin-a,
 906    -b, -c,...".  But, because there is not a 1:1 mapping between country
 907    and language, even a ccTLD like .SE might have to accept
 908    registrations in other languages.  For example, there may be a
 909    requirement for Finnish (the second most-used language in Sweden).
 910    What rules and codepoints are then defined for Finnish?  Does it have
 911    special mappings that collide with those that are defined for
 912    Swedish?  And what does one do in countries that use more than one
 913    script?  (Finnish and Swedish use the same script.)  In all cases,
 914    the dispute will ultimately be about whether two strings are the same
 915    (or confusingly similar) or not.  That, in turn, will generate a
 916    discussion of how one defines "what is the same" and "what is similar
 917    enough to be a problem".
 918
 919    These difficulties can never be completely eliminated by algorithmic
 920    means.  Some of the problem can be addressed by appropriate tuning of
 921    the protocols and their tables, other parts by registry actions to
 922    reduce confusion and conflicts, and still other parts can be
 923    addressed by careful design of user interfaces in application
 924    programs.  But, ultimately, some responsibility to avoid being
 925    tricked or harmfully confused will rest with the user.
 926
 927    Another registry technique that has been extensively explored
 928    involves looking at confusable characters and confusion between
 929    complete labels, restricting the labels that can be registered based
 930    on relationships to what is registered already.  Registries that
 931    adopt this approach might establish special mapping rules such as:
 932
 933    1.  If you register something with codepoint A, domain names with B
 934        instead of A will be blocked from registration by others.
 935    2.  If you register something with codepoint A, you also get domain
 936        name with B instead of A.
 937
 938    These approaches are discussed in more detail for "CJK" characters in
 939    RFC 3743 [RFC3743] and more generally in RFC 4290 [RFC4290].
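
        The blocking rule (rule 1 above) can be sketched as a toy
        registry.  Everything here is hypothetical: the variant table,
        the Registry class, and the registrant names are illustrative
        and are not taken from RFC 3743 or RFC 4290.

```python
from itertools import product

# Hypothetical variant table: each character maps to the characters a
# registry has decided to treat as variants of it.
VARIANTS = {"\u00f6": {"\u00f8"}, "\u00f8": {"\u00f6"}}


def variant_labels(label):
    """Return every other label reachable by swapping in variants."""
    choices = [sorted({ch} | VARIANTS.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


class Registry:
    """Toy registry that blocks variants of registered labels."""

    def __init__(self):
        self.owner = {}    # registered label -> registrant
        self.blocked = {}  # blocked variant -> registrant holding it

    def register(self, label, registrant):
        if label in self.owner:
            return False
        # A blocked label may only be claimed by the party that holds it.
        if self.blocked.get(label, registrant) != registrant:
            return False
        self.blocked.pop(label, None)
        self.owner[label] = registrant
        for variant in variant_labels(label):
            if variant not in self.owner:
                self.blocked.setdefault(variant, registrant)
        return True


registry = Registry()
first = registry.register("torbj\u00f6rn", "alice")
second = registry.register("torbj\u00f8rn", "bob")
third = registry.register("torbj\u00f8rn", "alice")
print(first, second, third)
```

        Rule 2 would differ only in that the variants are delegated to
        the registrant at once instead of being held back.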
 940
 941 2.2.7.  The IESG Statement and IDNA issues
 942
 943    The issues above, at least as they were understood at the time,
 944    provided the background for the IESG statement included in
 945    Section 1.5.1 (which, in turn, was part of the basis for the initial
 946    ICANN Guidelines) that a registry should have a policy about the
 947    scripts, languages, codepoints and text directions for which
 948    registrations will be accepted.  While "accept all" might be an
 957    acceptable policy, it implies there is also a dispute resolution
 958    process that takes the problems listed above into account.  The
 959    dispute resolution process must be designed so that all types of
 960    potential disputes can be resolved: for example, issues
 961    might arise between registrant and registry over a decision by the
 962    registry on collisions with already registered domain names and
 963    between registrant and trademark holder (that a domain name
 964    infringes on a trademark).  In both cases the parties disagreeing
 965    have different views on whether two strings are "equivalent" or not.
 966    They may believe that a string that is not allowed to be registered
 967    is actually different from one that is already registered.  Or they
 968    might believe that two strings are the same, even though the rules
 969    adopted by the registry to prevent confusion define them as two
 970    different domain names.
 971
 972 2.2.8.  Versions of Unicode
 973
 974    While opinions differ about how important the issues are in practice,
 975    the use of Unicode and its supporting tables to support IDNs appears
 976    to be far more sensitive to subtle changes than typical Unicode
 977    applications.  This may be, at least in part, because many other
 978    applications are internally sensitive only to the appearance of
 979    characters and not to their representation.  Or those applications
 980    may be able to take effective advantage of script, language, or
 981    character class identification.  The working group that developed
 982    IDNA concluded that attempting to encode any ancillary character
 983    information into the DNS label would be impractical and unwise, and
 984    the IAB, based in part on the comments in the ad hoc committee, saw
 985    no reason to review that decision.
 986
 987    This sensitivity to changes has made it quite difficult to migrate
 988    IDNA from one version of Unicode to the next if any changes are made
 989    that are not strictly additive.  A change in a code point assignment
 990    or definition may be extremely disruptive if DNS labels have been
 991    defined using the earlier form.  Unicode normalization tables, tables
 992    of scripts or languages and characters that belong to them, and even
 993    tables of confusable characters as an adjunct to security
 994    recommendations may be very helpful in designing registry
 995    restrictions on registrations and applications provisions for
 996    avoiding or identifying suspicious names.  Ironically, they also
 997    extend the sensitivity of IDNA and its implementations to all forms
 998    of change between one version of Unicode and the next.  Consequently,
 999    they make Unicode version migration more difficult.
1000
1001    An example of the type of change that appears to be just a small
1002    correction from one perspective but may be problematic from another
1003    was the correction to the normalization definition in 2004 [Unicode-
1004    PR29].  There was community input that the change would cause
1013    problems for Stringprep, but UTC decided, on balance, that the change
1014    was worthwhile.  Because of difficulties with consistency, some
1015    deployed implementations have decided to adopt the change and others
1016    have not, leading to subtle incompatibilities.
1017
1018    This situation leads to a dilemma.  On the one hand, it is completely
1019    unacceptable to freeze Unicode at a version level that excludes more
1020    recently-defined characters and scripts which are important to those
1021    who use them.  On the other hand, it is equally unacceptable to
1022    migrate from one version of Unicode to the next if such migration
1023    might invalidate an existing registered DNS name or some of its
1024    registered properties or might make the string or representation of
1025    that name ambiguous.  If IDNA is to be modified to accommodate new
1026    versions of Unicode, the IETF will need to work with the Unicode
1027    Consortium and other relevant bodies to find an appropriate balance
1028    in this area, but progress will be possible only if all relevant
1029    parties are able to fairly consider and discuss possible decisions
1030    that may be very difficult and unpalatable.
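
        One side of this dilemma can be seen in Python, whose stringprep
        module is built on the Unicode 3.2 tables while its unicodedata
        module tracks a later version.  The sample character, U+1E9E
        (LATIN CAPITAL LETTER SHARP S), is an assumption chosen for
        illustration; it was added to Unicode after version 3.2.

```python
import stringprep
import unicodedata

capital_sharp_s = "\u1e9e"

# Stringprep table A.1 lists code points unassigned in Unicode 3.2.
unassigned_in_3_2 = stringprep.in_table_a1(capital_sharp_s)

# The current database assigns the same code point a letter category.
category_now = unicodedata.category(capital_sharp_s)

print(unassigned_in_3_2, category_now)
```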
1031
1032
1033 3.  Framework for next steps in IDN development
1034
1035 3.1.  Issues within the scope of the IETF
1036
1037 3.1.1.  Review of IDNA
1038
1039    The IETF should consider reviewing RFCs 3454, 3490, 3491 and/or 3492,
1040    and update, replace or supplement them to meet the criteria of this
1041    paragraph (one or more of them may prove impractical after further
1042    study).  Any new versions or additional specifications should be
1043    adapted to the version of Unicode that is current when they are
1044    created.  Ideally, they should specify a path for adapting to future
1045    versions of Unicode (some suggestions below may facilitate this).
1046    The IETF should also consider whether there are significant
1047    advantages to mapping some groups of characters, such as code points
1048    assigned to font variations, into others or whether clarity and
1049    comprehensibility for the user would be better served by simply
1050    prohibiting those characters.  More generally, it appears that it
1051    would be worthwhile for the IETF to review whether the Unicode
1052    normalization rules now invoked by the Stringprep profile in Nameprep
1053    are optimal for the DNS or whether more restrictive rules, or an even
1054    more restrictive set of permitted character combinations, would
1055    provide better support for DNS internationalization.
1056
1057    The IAB has concluded that there is a consensus within the broader
1058    community that lists of codepoints should be specified by the use of
1059    an inclusion based mechanism (i.e., identifying the characters that
1060    are permitted), rather than by excluding a small number of characters
1069    from the total Unicode set as Stringprep and Nameprep do today.  That
1070    conclusion should be reviewed by the IETF community and action taken
1071    as appropriate.
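
        An inclusion-based mechanism can be sketched as a simple
        membership test.  The permitted set below is hypothetical (ASCII
        letters, digits, hyphen, and three additional Swedish letters)
        and is not any registry's actual policy.

```python
import string

# Hypothetical inclusion list: only these code points may be registered.
PERMITTED = frozenset(
    string.ascii_lowercase + string.digits + "-" + "\u00e5\u00e4\u00f6"
)


def is_permitted(label):
    """Accept a label only if every character is on the inclusion list."""
    return bool(label) and all(ch in PERMITTED for ch in label)


print(is_permitted("sm\u00f6rg\u00e5s"),   # Swedish letters only
      is_permitted("torbj\u00f8rn"),       # contains o with stroke
      is_permitted("p\u0430ypal"))         # contains a Cyrillic letter
```

        With this design, a character that is not listed is rejected by
        default, including any character added in a later version of
        Unicode.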
1072
1073    We suggest that the individuals doing the review of the codepoints
1074    should work as a specialized design team.  To the extent possible,
1075    that work should be done jointly by people with experience from the
1076    IETF and deep knowledge of the constraints of the DNS and application
1077    design, participants from the Unicode Consortium, and other people
1078    necessary to be able to reach a generally-accepted result.  Because
1079    any work along these lines would be modifications and updates to
1080    standards-track documents, final review and approval of any proposals
1081    would necessarily follow normal IETF processes.
1082
1083    It is worth noting that sufficiently extreme changes to IDNA would
1084    require a new Punycode prefix, probably with long-term support for
1085    both the old prefix and the new one in both registration arrangements
1086    and applications.  An alternative, which is almost certainly
1087    impractical, would be some sort of "flag day", i.e., a date on which
1088    the old rules are simultaneously abandoned by everyone and the new
1089    ones adopted.  However, preliminary analysis indicates that few, if
1090    any, of the changes recommended for consideration elsewhere in this
1091    document would require this type of version change.  For example,
1092    additional restrictions on what can be registered may require policy
1093    decisions about actions to be taken with regard to labels that
1094    conformed to earlier rules but not to new ones, but not changes in
1095    the protocol or prefix.
1096
1097 3.1.2.  Non-DNS and Above-DNS Internationalization Approaches
1098
1099    The IETF should once again examine the extent to which it is
1100    appropriate to try to solve internationalization problems via the DNS
1101    and what place the many varieties of so-called "keyword systems" or
1102    other Internet navigational techniques might have.  Those techniques
1103    can be designed to impose fewer constraints, or at least different
1104    constraints, than IDNA and the DNS.  As discussed elsewhere in this
1105    document, IDNA cannot support information about scripts, languages,
1106    or Unicode versions on lookup.  As a consequence of the nature of DNS
1107    lookups, characters and labels either match or do not match; a near-
1108    match is simply not a possible concept in the DNS.  By contrast,
1109    observation of near-matching is common in human communication and in
1110    matching operations performed by people, especially when they have a
1111    particular script or language context in mind.  The DNS is further
1112    constrained by a fairly rigid internal aliasing system (via CNAME and
1113    DNAME resource records), while some applications of international
1114    naming may require more flexibility.  Finally, the rigid hierarchy of
1115    the DNS --and the tendency in practice for it to become flat at
1116    levels nearest the root-- and the need for names to be unique are
1125    more suitable for some purposes than others and may not be a good
1126    match for some purposes for which people wish to use IDNs.  Each of
1127    these constraints can be relaxed or changed by one or more systems
1128    that would provide alternatives to direct use of the DNS by users.
1129    Some of the issues involved are discussed further in Section 4.4 and
1130    various ideas have been discussed in detail in the IETF or IRTF.
1131    Many of those ideas have even been described in Internet Drafts or
1132    other documents.  As experience with IDNs and with expectations for
1133    them accumulates, it will probably become appropriate for the IETF or
1134    IRTF to revisit the underlying questions and possibilities.
1135
1136 3.1.3.  Security issues, certificates, etc.
1137
1138    Some characters look like others, often as the result of common
1139    origins.  The problem with these "confusable" characters, often
1140    incorrectly called homographs, has always existed when characters are
1141    presented to humans who interpret what is displayed and then make
1142    decisions based on what they see.  This is not a problem that
1143    exists only when working with internationalized domain names, but
1144    IDNs make it worse.  The result of a survey that would explain
1145    what the problems are might be interesting.  Many of these issues are
1146    mentioned in Unicode Technical Report #36 [UTR36].
1147
1148    In this and other issues associated with IDNs, precise use of
1149    terminology is important lest even more confusion result.  The
1150    definition of the term 'homograph' that normally appears in
1151    dictionaries and linguistic texts states that homographs are
1152    different words which are spelled identically (for example, the
1153    adjective 'brief' meaning short, the noun 'brief' meaning a document,
1154    and the verb 'brief' meaning to inform).  By definition, letters in
1155    two different alphabets are not the same, regardless of similarities
1156    in appearance.  This means that sequences of letters from two
1157    different scripts that appear to be identical on a computer display
1158    cannot be homographs in the accepted sense, even if they are both
1159    words in the dictionary of some language.  Assuming that there is a
1160    language written with Cyrillic script in which "cap" is a word,
1161    regardless of what it might mean, it is not a homograph of the Latin-
1162    script English word "cap".
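
        The "cap" case can be made concrete.  The sketch below builds
        the Cyrillic string from three code points chosen because their
        usual glyphs resemble Latin "c", "a", and "p"; whether it is a
        word in any language is not assumed.

```python
import unicodedata

latin_cap = "cap"
cyrillic_cap = "\u0441\u0430\u0440"   # Cyrillic letters es, a, er

# The strings usually render alike but share no code points.
names = [unicodedata.name(ch) for ch in cyrillic_cap]

print(latin_cap == cyrillic_cap)
for name in names:
    print(name)
```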
1163
1164    When the security implications of visually confusable characters were
1165    brought to the forefront in 2005, the term homograph was used to
1166    designate any instance of graphic similarity, even when comparing
1167    individual characters.  This usage is not only incorrect, but risks
1168    introducing even more confusion and hence should be avoided.  The
1169    current preferred terminology is to describe these similar-looking
1170    characters as "confusable characters" or even "confusables".
1171
1172    Many people have suggested that confusable characters are a problem
1181    that must be addressed, at least in part, as part of the user
1182    interfaces of application software.  While it should almost certainly
1183    be part of a complete solution, that approach creates its own set of
1184    difficulties.  For example, a user switching between systems, or even
1185    between applications on the same system, may be surprised by
1186    different types of behavior and different levels of protection.  In
1187    addition, it is unclear how a secure setup for the end user should be
1188    designed.  Today, in the web browser, a padlock is a traditional way
1189    of describing some level of security for the end user.  Is this
1190    binary signaling enough?  Should there be any connection between a
1191    risk for a displayed string including confusable characters and the
1192    padlock or similar signaling to the user?
1193
1194    Many web browsers have adopted the convention, based on a "whitelist"
1195    or similar techniques, that IDNs within top-level domains that are
1196    deemed to follow safe practices for registration of confusable
1197    labels are displayed as native characters, while IDNs from other
1198    domains are displayed as Punycode.  These techniques clearly are not
1199    sensitive to different policies between top-level domains and their
1200    subdomains and, while clearly helpful, may not be adequate.  Are
1201    other methods of dealing with confusable characters possible?  Would
1202    other methods of identifying and listing policies about avoiding
1203    confusing registrations be feasible and helpful?
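
   The whitelist convention described above might be sketched as follows.
   This is an illustration under stated assumptions, not the logic of any
   particular browser: the TRUSTED_TLDS set and the display_form function
   are hypothetical, and the conversion relies on the IDNA codec shipped
   with Python, which implements the original IDNA specification.

```python
# Hypothetical whitelist of TLDs whose registration policies are trusted;
# the entries are placeholders, not a statement about any real registry.
TRUSTED_TLDS = {"example"}

def display_form(domain, trusted_tlds=TRUSTED_TLDS):
    """Return the string a browser-like client might show for domain."""
    ace = domain.encode("idna").decode("ascii")    # ToASCII on each label
    tld = ace.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        return ace.encode("ascii").decode("idna")  # native characters
    return ace                                     # Punycode (ACE) form

print(display_form("b\u00fccher.example"))
print(display_form("b\u00fccher.invalid"))
```

   Because the decision looks only at the top-level label, the sketch
   also shows the limitation noted above: it cannot distinguish between
   the policies of a TLD and those of the zones beneath it.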
1204
1205    It would be interesting to see a more coordinated effort to develop
1206    user interface guidelines in this area.
1207
1208 3.1.4.  Non US-ASCII in local part of email addresses
1209
1210    Work is going on in the IETF related to the local part of email
1211    addresses.  It should be noted that the local part of email addresses
1212    has a syntax and constraints quite different from those of a domain
1213    name label, so IDNA cannot be applied directly to the local part.
1214
1215 3.1.5.  Use of the Unicode Character Set in the IETF
1216
1217    Unicode, and the closely-related ISO 10646, are the only coded
1218    character sets that aspire to include all of the world's characters.
1219    As such, they permit use of international characters without having
1220    to identify particular character coding standards or tables.  The
1221    requirement for a single character set is particularly important for
1222    use with the DNS since there is no place to put character set
1223    identification.  The decision to use Unicode as the base for IETF
1224    protocols going forward is discussed in [RFC2277].  The IAB does not
1225    see any reason to revisit the decision to use Unicode in IETF
1226    protocols.
1227
1237 3.2.  Issues that fall within the purview of ICANN
1238
1239 3.2.1.  Dispute resolution
1240
1241    IDN creates new types of collisions between trademarks and domain
1242    names as well as collisions between domain names.  These have impact
1243    on dispute resolution processes used by registries and otherwise.  It
1244    is important that deployment of IDN evolve in parallel with review
1245    and updating of ICANN or registry-specific dispute resolution
1246    processes.
1247
1248 3.2.2.  Policy at registries
1249
1250    The IAB recommends that registries use an inclusion-based model when
1251    choosing what characters to allow at the time of registration.  This
1252    list of characters is in turn to be a subset of what is allowed
1253    according to the updated IDNA standard.  The IAB further recommends
1254    that registries develop their inclusion-based models in parallel with
1255    the dispute resolution process at the registry itself.
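
   A minimal sketch of an inclusion-based check follows.  The PERMITTED
   repertoire and the is_registrable function are hypothetical and chosen
   only for illustration; an actual registry would publish its own table,
   which in turn would be a subset of what IDNA allows.

```python
import string

# Hypothetical repertoire for an illustrative registry: ASCII letters,
# digits and hyphen plus three additional letters.
PERMITTED = set(string.ascii_lowercase + string.digits + "-"
                + "\u00e5\u00e4\u00f6")

def is_registrable(label, permitted=PERMITTED):
    """Accept a label only if every character is explicitly listed."""
    return len(label) > 0 and all(ch in permitted for ch in label.lower())

print(is_registrable("sm\u00f6rg\u00e5s"))      # every letter is listed
print(is_registrable("\u0441\u0430\u0440"))     # Cyrillic letters are not
```

   The point of the inclusion model is visible in the default: anything
   not listed is refused, so newly added Unicode characters do not become
   registrable by accident.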
1256
1257    Most established policies for dealing with claimed or apparent
1258    confusion or conflicts of names are based on "dispute resolution".
1259    Decisions about legitimate use or registration of one or more names
1260    are resolved at or after the time of registration on a case-by-case
1261    basis and using policies that are specific to the particular DNS zone
1262    or jurisdiction involved.  These policies have generally not been
1263    extended below the level of the DNS that is directly controlled by
1264    the top-level registry.
1265
1266    Because of the much larger number of conflicts that can be generated
1267    by the larger number of available and confusable characters in
1268    Unicode, we recommend that registration-restriction and dispute
1269    resolution policies be developed to constrain IDN registrations by
1270    registries and zone administrators at all levels of the DNS tree.  Of
1271    course, many of these policies will be less formal than others and
1272    there is no requirement for complete global consistency, but the
1273    arguments for reduction of confusable characters and other issues in
1274    TLDs should apply to all zones below that specific TLD.
1275
1276    Consistency across all zones can obviously only be accomplished by
1277    changes to the protocols.  Such changes should be considered by the
1278    IETF if particular restrictions are identified that are important and
1279    consistent enough to be applied globally.
1280
1281    Policy changes that would not permit existing, registered, names to
1282    be registered under the newer rules should be considered carefully,
1283    balancing their importance against possible disruption and the issues
1284    of invalidating older names against the importance of consistency as
1293    seen by the user.
1294
1295 3.2.3.  IDN TLDs
1296
1297    The IAB has concluded that there is not one IDN TLD issue but at
1298    least three very separate ones:
1299
1300    o  Assuming there are to be IDN entries in the root zone at all, a
1301       decision must be made as to what TLDs are to be created and how
1302       they are to be named.  This decision falls within the traditional
1303       IANA scope and is an ICANN issue today.
1304    o  There has been discussion of permitting some or all existing TLDs
1305       to be referenced by multiple labels, with those labels presumably
1306       representing some understanding of the "name" of the TLD in
1307       different languages.  If actual aliases of this type are desired
1308       for existing domains, the IETF may need to consider whether the
1309       use of DNAME records in the root is appropriate to meet that need,
1310       what constraints, if any, are needed, whether alternate
1311       approaches, such as those of [RFC4185], are appropriate or whether
1312       further alternatives should be investigated.  But, to the extent
1313       to which aliases are considered desirable and feasible, decisions
1314       presumably must be made as to which, if any, root IDN labels
1315       should be associated with DNAME records and which ones should be
1316       handled by normal delegation records or other mechanisms.  That
1317       decision is one of DNS root-level namespace policy and hence falls
1318       to ICANN although we would expect ICANN to pay careful attention
1319       to any technical, operational, or security recommendations that
1320       may be produced by other bodies.
1321    o  Finally, if IDN labels are to be placed in the root zone, there
1322       are issues associated with how they are to be encoded and
1323       deployed.  This area may have implications for work that has been
1324       done, or should be done, in the IETF.
1325
1326
1327 4.  Specific Recommendations for Next Steps
1328
1329    Consistent with the framework described above, the IAB offers these
1330    recommendations as steps for further consideration in the identified
1331    groups.
1332
1333 4.1.  Reduction of permitted character list
1334
1335    Generalize from the original "hostname" rules to non-ASCII
1336    characters, permitting as few characters as possible to do that job.
1337    This would represent a restriction of the model of characters
1338    permitted in IDN labels, and it contrasts with the approach used to
1339    develop the original IDNA/Nameprep tables: that approach was to
1340    include all Unicode characters that there was not a clear reason to
1349    exclude.
1350
1351    The specific recommendation here is to specify such internationalized
1352    hostnames.  Such an activity would fall to the IETF, although the
1353    task of developing the appropriate list of permitted characters will
1354    require effort both in the IETF and elsewhere.  The effort should be
1355    as linguistically and culturally sensitive as possible, but smooth
1356    and effective operation of the DNS, including minimizing of
1357    complexity, should be primary goals.  The following should be
1358    considered as possible mechanisms for achieving an appropriate
1359    minimum number of characters.
1360
1361 4.1.1.  Elimination of all non-language characters
1362
1363    Unicode characters that are not needed to write words or numbers in
1364    any of the world's languages should be eliminated from the list of
1365    characters that are appropriate in DNS labels.  In addition to such
1366    characters as those used for box-drawing and sentence punctuation,
1367    this should exclude punctuation for word structure and other
1368    delimiters: while DNS labels may conveniently be used to express
1369    words in many circumstances, the goal is not to express words (or
1370    sentences or phrases), but to permit the creation of unambiguous
1371    labels with good mnemonic value.
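
   One rough way to approximate this filter is sketched below.  It
   assumes, purely for illustration, that the Unicode General_Category
   property is an adequate proxy: letters (L*), combining marks (M*), and
   decimal digits (Nd) are kept, and everything else is treated as a
   non-language character.  The function names are hypothetical, and a
   real permitted-character list would need more careful review.

```python
import unicodedata

def is_language_character(ch):
    """True for letters (L*), combining marks (M*) and decimal digits (Nd)."""
    category = unicodedata.category(ch)
    return category[0] in ("L", "M") or category == "Nd"

def non_language_characters(label):
    """Return the characters in label that the filter would exclude."""
    return [ch for ch in label if not is_language_character(ch)]

print(non_language_characters("caf\u00e92006"))   # letters and digits only
print(non_language_characters("a\u2500b!"))       # box drawing, punctuation
```

   Note that digits are retained, and that this simple filter would also
   exclude the hyphen, which is discussed separately below.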
1372
1373 4.1.2.  Elimination of word-separation punctuation
1374
1375    The inclusion of the hyphen in the original hostname rules is a
1376    historical artifact from an older, flat, name space.  The community
1377    should consider whether it is appropriate to treat it as a simple
1378    legacy property of ASCII names and not attempt to generalize it to
1379    other scripts.  We might, for example, not permit claimed equivalents
1380    to the hyphen from other scripts to be used in IDNs.  We might even
1381    consider banning use of the hyphen itself in non-ASCII strings or,
1382    less restrictively, strings that contained non-Roman characters.
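
   The stricter hyphen rules floated above could be checked along the
   following lines.  This is a sketch under an assumption: it treats any
   character in the Unicode Dash_Punctuation category (Pd) other than the
   ASCII hyphen as a "claimed equivalent", which may be broader or
   narrower than what the community would settle on.  The function name
   is hypothetical.

```python
import unicodedata

def hyphen_policy_violations(label):
    """List reasons a label would fail the stricter hyphen rules."""
    problems = []
    is_ascii = all(ord(ch) < 128 for ch in label)
    for ch in label:
        if ch == "-":
            if not is_ascii:
                problems.append("ASCII hyphen in a non-ASCII label")
        elif unicodedata.category(ch) == "Pd":
            problems.append("hyphen-like character U+%04X" % ord(ch))
    return problems

print(hyphen_policy_violations("my-host"))
print(hyphen_policy_violations("b\u00fc-cher"))
print(hyphen_policy_violations("a\u2010b"))
```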
1383
1384 4.2.  Updating to new versions of Unicode
1385
1386    As new scripts, to support new languages, continue to be added to
1387    Unicode, it is important that IDNA track updates.  If it does not do
1388    so, but remains "stuck" at 3.2 or some single later version, it will
1389    not be possible to include labels in the DNS that are derived from
1390    words in languages that require characters that are available only in
1391    later versions.  Making those upgrades is difficult, and will
1392    continue to be difficult, as long as new versions require, not just
1393    addition of characters, but changes to canonicalization conventions,
1394    normalization tables, or matching procedures (see Section 2.2.8).
1395    Anything that can be done to lower complexity and simplify forward
1396    transitions should be seriously considered.
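
   The effect of remaining at Unicode 3.2 can be illustrated with the
   Python standard library, which carries a frozen copy of the Unicode
   3.2 character database alongside its current one.  The helper name
   below is hypothetical, and the N'Ko letter is used only as an example
   of a character from a script added to Unicode after version 3.2.

```python
import unicodedata

def assigned_in_unicode_3_2(ch):
    """True if ch has an assigned code point in the Unicode 3.2 database."""
    return unicodedata.ucd_3_2_0.category(ch) != "Cn"

nko_letter_a = "\u07ca"   # NKO LETTER A, added to Unicode after version 3.2

print(assigned_in_unicode_3_2("a"))
print(assigned_in_unicode_3_2(nko_letter_a))
print(unicodedata.category(nko_letter_a))   # category in the current database
```

   A label containing such a character cannot be validated against
   tables derived from Unicode 3.2, even though current applications
   treat it as an ordinary letter.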
1397
1405 4.3.  Combining Characters and Character Components
1406
1407    One thing that increases IDNA complexity and the need for
1408    normalization is that combining characters are permitted.  Without
1409    them, complexity might be reduced enough to permit easier
1410    transitions to new versions.  The community should consider whether
1411    combining characters should be prohibited entirely from IDNs.  A
1412    consequence of this, of course, is that each new language or script,
1413    and several existing ones, would require that all of its characters
1414    have Unicode assignments to specific, precomposed, code points.
1415
1416    Note that this is not currently permitted within Unicode for Roman-
1417    based scripts.  For non-Roman scripts, some such code points have
1418    been defined.  The decisions that govern the assignment of such code
1419    points are managed entirely within the Unicode Consortium.  Were the
1420    IETF to choose to reduce IDNA complexity by excluding combining
1421    characters, no doubt there would be additional input to the Unicode
1422    Consortium from users and proponents of scripts requiring composing
1423    characters.  The IAB and the IETF should examine whether it is
1424    appropriate to press the Unicode Consortium to revise these policies
1425    or otherwise to recommend actions that would reduce the need for
1426    normalization and the related complexities.  However, we have been
1427    told that the Technical Committee does not believe it is reasonable
1428    or feasible to add all possible precomposed characters to Unicode.
1429    If Unicode cannot be modified to contain the precomposed characters
1430    necessary to support existing languages and scripts, much less new
1431    ones, this option for IDN restrictions will not be feasible.
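
   The difference between sequences that have precomposed forms and
   those that do not can be sketched as follows.  The function names are
   hypothetical, and the test for a combining character (General_Category
   M*) is an assumption made for illustration.

```python
import unicodedata

def contains_combining(label):
    """True if any code point in label is a combining mark (category M*)."""
    return any(unicodedata.category(ch).startswith("M") for ch in label)

def precomposable(label):
    """True if NFC normalization leaves no combining marks in label."""
    return not contains_combining(unicodedata.normalize("NFC", label))

e_acute = "e\u0301"   # "e" + COMBINING ACUTE ACCENT; NFC yields U+00E9
q_acute = "q\u0301"   # no precomposed form exists for this sequence

print(precomposable(e_acute))
print(precomposable(q_acute))
```

   A rule prohibiting combining characters would accept the first
   sequence after normalization but would have to reject the second
   unless a new precomposed code point were assigned.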
1432
1433    Retaining combining characters without further global restrictions
1434    may leave us "stuck" at Unicode 3.2, leading either to
1435    incompatibilities with applications that otherwise use a
1436    modern version of Unicode (while IDN remains at Unicode 3.2) or to
1437    painful transitions to new versions.
1438
1439 4.4.  Role and Uses of the DNS
1440
1441    We wish to remind the community that there are boundaries to the
1442    appropriate uses of the DNS.  It was designed and implemented to
1443    serve some specific purposes.  There are additional things that it
1444    does well, other things that it does badly, and still other things it
1445    cannot do at all.  No amount of protocol work on IDNs will solve
1446    problems with alternate spellings, near-matches, searching for
1447    appropriate names, and so on.  Registration restrictions and
1448    carefully-designed user interfaces can be used to reduce the risk and
1449    pain of attempts to do some of these things gone wrong, as well as
1450    reducing the risks of various sorts of deliberate bad behavior, but,
1451    beyond a certain point, use of the DNS simply because it is available
1452    becomes a bad tradeoff.  The tradeoff may be particularly unfortunate
1461    when the use of IDNs does not actually solve the proposed problem.
1462    For example, internationalization of DNS names does not eliminate the
1463    ASCII protocol identifiers and structure of URIs [RFC3986] and even
1464    IRIs [RFC3987].  Hence, DNS internationalization itself, at any or
1465    all levels of the DNS tree, is not a sufficient response to the
1466    desire of populations to use the Internet entirely in their own
1467    languages and the characters associated with those languages.
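
   As a small illustration of this point, the fragment below splits an
   IRI-style reference whose host part is internationalized.  The
   reference itself and the is_ascii helper are made up for the example.

```python
from urllib.parse import urlsplit

def is_ascii(s):
    """True if every character of s is in the ASCII range."""
    return all(ord(ch) < 128 for ch in s)

parts = urlsplit("http://b\u00fccher.example/katalog")

# The host label is internationalized, but the scheme and the "://" and
# "/" delimiters that structure the reference are still ASCII.
print(parts.scheme, is_ascii(parts.scheme))
print(parts.hostname, is_ascii(parts.hostname))
```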
1468
1469    These issues are discussed at greater length, and alternatives
1470    presented, in [RFC2825], [RFC3467], [INDNS], and [DNS-Choices].
1471
1472 4.5.  Databases of Registered Names
1473
1474    In addition to their presence in the DNS, IDNs introduce issues in
1475    other contexts in which domain names are used.  In particular, the
1476    design and content of databases that bind registered names to
1477    information about the registrant (commonly described as "whois"
1478    databases) will require review and updating.  For example, the whois
1479    protocol itself [RFC3912] is ASCII-only: with a conforming
1480    implementation of the Whois protocol, one cannot search for, or
1481    report, either a DNS name or contact information that is not in ASCII
1482    characters.  This may provide some additional impetus for a switch
1483    to IRIS [RFC3981] [RFC3982] but also raises a number of other
1484    questions about what information, and in what languages and scripts,
1485    should be included or permitted in such databases.
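
   To make the constraint concrete, the sketch below builds a query line
   for an IDN by substituting its ASCII-compatible (Punycode) form.  This
   is only an illustration of the limitation, not a recommendation of
   this document: the whois_query_line helper is hypothetical, and the
   substitution does nothing for contact information in other scripts.

```python
def whois_query_line(domain):
    """Build an ASCII-only query line for a domain name."""
    ace = domain.encode("idna")      # e.g. b"xn--bcher-kva.example"
    return ace + b"\r\n"             # RFC 3912 requests end with CR LF

print(whois_query_line("b\u00fccher.example"))
```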
1486
1487
1488 5.  Security Considerations
1489
1490    This document is simply a discussion of IDNs and IDN issues; it
1491    raises no new security concerns.  However, if some of its
1492    recommendations to reduce IDNA complexity, the number of available
1493    characters, and various approaches to constraining the use of
1494    confusable characters, are followed and prove successful, the risks
1495    of name spoofing and other problems may be reduced.
1496
1497
1498 6.  Acknowledgments
1499
1500    The contributions to this report from members of the IAB-IDN ad hoc
1501    committee are gratefully acknowledged.  Of course, not all of the
1502    members of that group endorse every comment and suggestion of this
1503    report.  In particular, this report does not claim to reflect the
1504    views of the Unicode Consortium as a whole or those of particular
1505    participants in the work of that Consortium.  The members of the ad
1506    hoc committee were:
1507
1508    Rob Austein, Leslie Daigle, Tina Dam, Mark Davis, Patrik Faltstrom,
1517    Scott Hollenbeck, Cary Karp, John Klensin, Gervase Markham, David
1518    Meyer, Thomas Narten, Michael Suignard, Sam Weiler, Bert Wijnen, Kurt
1519    Zeilenga and Lixia Zhang.
1520
1521    Special thanks are due to Cary Karp and Tina Dam for contributions of
1522    considerable specific text, to Marcos Sanz and Paul Hoffman for
1523    careful late-stage reading and extensive comments, and to Pete
1524    Resnick for many contributions and comments, both in conjunction with
1525    his former IAB service and subsequently.
1526
1527    Members of the IAB at the time of approval of this document were:
1528    [[anchor39: To be supplied]]
1529
1530
1531 7.  Change History
1532
1533    [[anchor41: RFC Editor: this section is to be removed before
1534    publication]]
1535
1536 7.1.  Changes for version -01
1537
1538    1.  Added discussion and reference to Unicode PR-29
1539    2.  Replaced the discussion of the ICANN Guidelines (with thanks to
1540        Tina Dam and Cary Karp).
1541    3.  Revised the Bidi text to make the potential recommendation more
1542        clear.
1543    4.  Removed any claims (actual or implied) of endorsement by the
1544        members of the ad hoc committee.
1545    5.  Several small editorial changes, etc.
1546
1547 7.2.  Changes for version -02
1548
1549    1.  Added some additional references, e.g., to W3C
1550        internationalization work and to UTR39.
1551    2.  Adjusted some terminology to correct errors and avoid unnecessary
1552        controversy.
1553    3.  Extended the discussion of related characters in Swedish and
1554        Norwegian to clarify at least one of the possibilities
1555    4.  Introduced new Section 4.5 to discuss IDN issues in other than
1556        the DNS itself and point to IRIS.
1557    5.  Rewrote the introduction to the "problem" section and its first
1558        subsection.
1559    6.  Small changes made to the "definitions" section including
1560        explaining why "multilingual" is there and rewriting the "script"
1561        definition to clarify slightly and put the example script names
1562        into alphabetical order.
1573    7.  Section 3.2.3 has been fairly extensively rewritten for clarity,
1574        and a large number of less extensive clarifications have been
1575        made, although no substantive changes have (intentionally) been
1576        made.
1577
1578 7.3.  Changes for Version -03
1579
1580    1.  Made a number of further tuning changes to better reflect the
1581        role of the document and corrected several references.
1582    2.  Removed the reference to Vietnamese.
1583    3.  Added a discussion of IDNA versioning and new prefixes.
1584
1585 7.4.  Changes for version -04
1586
1587    1.  Corrected many small typographical and editorial errors.
1588    2.  Clarified that elimination of non-language characters was not
1589        intended to eliminate digits.
1590
1591 7.5.  Changes for version -05
1592
1593    1.  Revised section 4.3 to further clarify the suggestion.
1594    2.  Revised the Acknowledgments section
1595
1596
1597 8.  References
1598
1599 8.1.  Normative References
1600
1601    [ISO10646]
1602               International Organization for Standardization,
1603               "Information Technology - Universal Multiple-Octet Coded
1604               Character Set (UCS) - Part 1: Architecture and Basic
1605               Multilingual Plane", ISO/IEC 10646-1:2000, October 2000.
1606
1607    [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
1608               Internationalized Strings ("stringprep")", RFC 3454,
1609               December 2002.
1610
1611    [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
1612               "Internationalizing Domain Names in Applications (IDNA)",
1613               RFC 3490, March 2003.
1614
1615    [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
1616               Profile for Internationalized Domain Names (IDN)",
1617               RFC 3491, March 2003.
1618
1619    [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
1620               for Internationalized Domain Names in Applications
1629               (IDNA)", RFC 3492, March 2003.
1630
1631    [Unicode32]
1632               The Unicode Consortium, "The Unicode Standard, Version
1633               3.0", 2000.
1634
1635               (Reading, MA, Addison-Wesley, 2000.  ISBN 0-201-61633-5).
1636               Version 3.2 consists of the definition in that book as
1637               amended by the Unicode Standard Annex #27: Unicode 3.1
1638               (http://www.unicode.org/reports/tr27/) and by the Unicode
1639               Standard Annex #28: Unicode 3.2
1640               (http://www.unicode.org/reports/tr28/).
1641
1642 8.2.  Informative References
1643
1644    [DNS-Choices]
1645               Faltstrom, P., "Design Choices When Expanding DNS",
1646               draft-iab-dns-choices-02 (work in progress), June 2005.
1647
1648    [ICANNv1]  ICANN, "Guidelines for the Implementation of
1649               Internationalized Domain Names, Version 1.0", March 2003,
1650               <http://www.icann.org/general/idn-guidelines-20jun03.htm>.
1651
1652    [ICANNv2]  ICANN, "Guidelines for the Implementation of
1653               Internationalized Domain Names, Version 2.0",
1654               November 2005,
1655               <http://www.icann.org/general/idn-guidelines-20sep05.htm>.
1656
1657    [IESG-IDN]
1658               Internet Engineering Steering Group (IESG), "IESG
1659               Statement on IDN", IESG Statements IDN Statement,
1660               February 2003,
1661               <http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt>.
1662
1663    [INDNS]    National Research Council, "Signposts in Cyberspace: The
1664               Domain Name System and Internet Navigation", National
1665               Academy Press ISBN 0309-09640-5 (Book) 0309-54979-5 (PDF),
1666               2005,
1667               <http://www7.nationalacademies.org/cstb/pub_dns.html>.
1668
1669    [ISO.2022.1986]
1670               International Organization for Standardization,
1671               "Information Processing: ISO 7-bit and 8-bit coded
1672               character sets: Code extension techniques", ISO Standard
1673               2022, 1986.
1674
1675    [ISO.646.1991]
1676               International Organization for Standardization,
1685               "Information technology - ISO 7-bit coded character set
1686               for information interchange", ISO Standard 646, 1991.
1687
1688    [ISO.8859.2003]
1689               International Organization for Standardization,
1690               "Information processing - 8-bit single-byte coded graphic
1691               character sets - Part 1: Latin alphabet No. 1 (1998) -
1692               Part 2: Latin alphabet No. 2 (1999) - Part 3: Latin
1693               alphabet No. 3 (1999) - Part 4: Latin alphabet No. 4
1694               (1998) - Part 5: Latin/Cyrillic alphabet (1999) - Part 6:
1695               Latin/Arabic alphabet (1999) - Part 7: Latin/Greek
1696               alphabet (2003) - Part 8: Latin/Hebrew alphabet (1999) -
1697               Part 9: Latin alphabet No. 5 (1999) - Part 10: Latin
1698               alphabet No. 6 (1998) - Part 11: Latin/Thai alphabet
1699               (2001) - Part 13: Latin alphabet No. 7 (1998) - Part 14:
1700               Latin alphabet No. 8 (Celtic) (1998) - Part 15: Latin
1701               alphabet No. 9 (1999) - Part 16: Latin alphabet
1702               No. 10 (2001)", ISO Standard 8859, 2003.
1703
1704    [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
1705               Languages", BCP 18, RFC 2277, January 1998.
1706
1707    [RFC2825]  IAB and L. Daigle, "A Tangled Web: Issues of I18N, Domain
1708               Names, and the Other Internet protocols", RFC 2825,
1709               May 2000.
1710
1711    [RFC3066]  Alvestrand, H., "Tags for the Identification of
1712               Languages", BCP 47, RFC 3066, January 2001.
1713
1714    [RFC3467]  Klensin, J., "Role of the Domain Name System (DNS)",
1715               RFC 3467, February 2003.
1716
1717    [RFC3536]  Hoffman, P., "Terminology Used in Internationalization in
1718               the IETF", RFC 3536, May 2003.
1719
1720    [RFC3743]  Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
1721               Engineering Team (JET) Guidelines for Internationalized
1722               Domain Names (IDN) Registration and Administration for
1723               Chinese, Japanese, and Korean", RFC 3743, April 2004.
1724
1725    [RFC3912]  Daigle, L., "WHOIS Protocol Specification", RFC 3912,
1726               September 2004.
1727
1728    [RFC3981]  Newton, A. and M. Sanz, "IRIS: The Internet Registry
1729               Information Service (IRIS) Core Protocol", RFC 3981,
1730               January 2005.
1731
1732    [RFC3982]  Newton, A. and M. Sanz, "IRIS: A Domain Registry (dreg)
1741               Type for the Internet Registry Information Service
1742               (IRIS)", RFC 3982, January 2005.
1743
1744    [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1745               Resource Identifier (URI): Generic Syntax", STD 66,
1746               RFC 3986, January 2005.
1747
1748    [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
1749               Identifiers (IRIs)", RFC 3987, January 2005.
1750
1751    [RFC4185]  Klensin, J., "National and Local Characters for DNS Top
1752               Level Domain (TLD) Names", RFC 4185, October 2005.
1753
1754    [RFC4290]  Klensin, J., "Suggested Practices for Registration of
1755               Internationalized Domain Names (IDN)", RFC 4290,
1756               December 2005.
1757
1758    [UTR]      Unicode Consortium, "Unicode Technical Reports",
1759               <http://www.unicode.org/reports/>.
1760
1761    [UTR36]    Davis, M. and M. Suignard, "Unicode Technical Report #36:
1762               Unicode Security Considerations", November 2005,
1763               <http://www.unicode.org/draft/reports/tr36/tr36.html>.
1764
1765               Working Draft for Proposed Update
1766
1767    [UTR39]    Davis, M. and M. Suignard, "Unicode Technical Standard #39
1768               (proposed): Unicode Security Considerations", July 2005,
1769               <http://www.unicode.org/draft/reports/tr39/tr39.html>.
1770
1771               Working Draft for Proposed Draft
1772
1773    [Unicode-PR29]
1774               The Unicode Consortium, "Public Review Issue #29:
1775               Normalization Issue", Unicode PR 29, February 2004.
1776
1777    [Unicode10]
1778               The Unicode Consortium, "The Unicode Standard, Version
1779               1.0", 1991.
1780
1781    [W3C-Localization]
1782               Ishida, R. and S. Miller, "Localization vs.
1783               Internationalization", W3C International/questions/
1784               qa-i18n.txt, December 2005.
1785
1786    [ltru-initial]
1787               Ewell, D., Ed., "Initial Language Subtag Registry",
1788               draft-ietf-ltru-initial-06 (work in progress),
1797               February 2004.
1798
1799               This document is awaiting publication as an Informational
1800               RFC.
1801
1802    [ltru-registry]
1803               Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying
1804               Languages", draft-ietf-ltru-registry-14 (work in
1805               progress), October 2004.
1806
1807               This document has been approved as a Proposed Standard and
1808               is awaiting publication as an RFC.
1809
1810
1853 Authors' Addresses
1854
1855    John C Klensin
1856    1770 Massachusetts Ave, #322
1857    Cambridge, MA  02140
1858    USA
1859
1860    Phone: +1 617 491 5735
1861    Email: john-ietf@jck.com
1862
1863
1864    Patrik Faltstrom
1865    IAB
1866
1867    Email: paf@cisco.com
1868
1869
1909 Intellectual Property Statement
1910
1911    The IETF takes no position regarding the validity or scope of any
1912    Intellectual Property Rights or other rights that might be claimed to
1913    pertain to the implementation or use of the technology described in
1914    this document or the extent to which any license under such rights
1915    might or might not be available; nor does it represent that it has
1916    made any independent effort to identify any such rights.  Information
1917    on the procedures with respect to rights in RFC documents can be
1918    found in BCP 78 and BCP 79.
1919
1920    Copies of IPR disclosures made to the IETF Secretariat and any
1921    assurances of licenses to be made available, or the result of an
1922    attempt made to obtain a general license or permission for the use of
1923    such proprietary rights by implementers or users of this
1924    specification can be obtained from the IETF on-line IPR repository at
1925    http://www.ietf.org/ipr.
1926
1927    The IETF invites any interested party to bring to its attention any
1928    copyrights, patents or patent applications, or other proprietary
1929    rights that may cover technology that may be required to implement
1930    this standard.  Please address the information to the IETF at
1931    ietf-ipr@ietf.org.
1932
1933
1934 Disclaimer of Validity
1935
1936    This document and the information contained herein are provided on an
1937    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1938    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1939    ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1940    INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1941    INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1942    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
1943
1944
1945 Copyright Statement
1946
1947    Copyright (C) The Internet Society (2006).  This document is subject
1948    to the rights, licenses and restrictions contained in BCP 78, and
1949    except as set forth therein, the authors retain all their rights.
1950
1951
1952 Acknowledgment
1953
1954    Funding for the RFC Editor function is currently provided by the
1955    Internet Society.