Libidn getaddrinfo-idn.txt -- Proposal for IDN support in POSIX getaddrinfo.
Copyright (C) 2003, 2004 Simon Josefsson
See the end for copying conditions.

Libidn is a package for internationalized string handling based on the
Stringprep, Punycode and Internationalized Domain Names in
Applications (IDNA) specifications. It can be used by applications
directly by linking to it, as is done by, e.g., Gnus, KDE, and Mutt.

Having each and every application link with and perform its own IDN
handling is not a good idea. It bloats the code and makes things
unnecessarily complex. Only a few applications, such as web browsers
and mail clients, will need to do this in the future, to provide good
user interfaces for internationalization.

See http://josefsson.org/libidn/ for more information.

Alternative Approaches
----------------------

There are implementations that modify gethostbyname() to accept UTF-8
strings and perform the IDNA ToASCII operation within gethostbyname().

There are even implementations that assume gethostbyname() (on the
client host) performs no validation of the string and will send UTF-8
strings out to the DNS server, and that perform the IDN conversion on
the server side.

Some doubts can be raised whether this is an approach that is likely
to be standardized. It also lacks functionality: it only provides
black-box ToASCII functionality. The application cannot extract the
output from the ToASCII operation. More importantly, there is no way
to perform a ToUnicode operation that applications may want to use for
display purposes. Furthermore, while the first approach can support
locale-specific character sets (e.g., ISO-8859-1), the second approach
is bound to either guess the character set or always use UTF-8.

See also the thread rooted in <iluel7n6bmu.fsf@latte.josefsson.org>
posted to libc-alpha@sources.redhat.com on 08 Jan 2003.

The getaddrinfo() API should have two new flags, AI_IDN and
AI_CANONIDN. Roughly, they correspond to IDNA ToASCII and IDNA
ToUnicode, but there are several details. Note that strings are still
'char*', i.e., they do not use the "wide" character type, and that the
encoding of non-ASCII strings is the current locale's character set
(i.e., nl_langinfo(CODESET)).

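
To make the encoding assumption concrete, here is a minimal sketch of
how an application could find out which character set its 'char*'
host names are taken to be in. The helper name current_codeset() is
made up for this illustration. A C program starts in the "C" locale,
so the application has to call setlocale() before the user's character
set becomes visible through nl_langinfo(CODESET).

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): adopt the locale
   settings from the environment and report the character set that
   non-ASCII 'char*' strings are then assumed to be encoded in.  */
const char *
current_codeset (void)
{
  /* Without this call the process stays in the "C" locale.  If the
     environment names an unknown locale, setlocale() returns NULL
     and the previous locale remains in effect.  */
  setlocale (LC_ALL, "");

  const char *codeset = nl_langinfo (CODESET);
  assert (codeset != NULL);
  return codeset;
}
```

The returned string is owned by the C library and may be overwritten
by later calls to nl_langinfo() or setlocale(), so a caller that needs
to keep it should copy it.
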
An application that uses AI_IDN signals to the getaddrinfo()
implementation that the input host name may be non-ASCII, that the
appropriate IDNA ToASCII steps should be carried out on the input, and
that the output from the ToASCII operation (if any) should be used in
the lookup using the current resolver processing.

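
From the application's side, using AI_IDN could look roughly like the
sketch below. The function name lookup_idn() is invented for this
example. Where <netdb.h> does not define AI_IDN, the sketch defines it
as 0, which turns the call into an ordinary lookup; that fallback is
an assumption of this example, not part of the proposal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1           /* GNU Libc declares AI_IDN only with this.  */
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumption of this sketch: without the proposed flag, degrade to a
   plain lookup instead of failing to compile.  */
#ifndef AI_IDN
#define AI_IDN 0
#endif

/* Hypothetical wrapper: resolve HOST, which may contain non-ASCII
   characters in the current locale's character set.  Returns the
   getaddrinfo() status; on success *RES must be released with
   freeaddrinfo().  */
int
lookup_idn (const char *host, const char *service, struct addrinfo **res)
{
  struct addrinfo hints;

  assert (host != NULL && res != NULL);

  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_IDN;          /* request ToASCII on the input */

  return getaddrinfo (host, service, &hints, res);
}
```

A caller would normally run setlocale (LC_ALL, "") first, pass the
host name as typed by the user, and report failures with
gai_strerror().
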
An application that uses AI_CANONIDN signals to the getaddrinfo()
implementation that the input host name should be put through the IDNA
ToUnicode steps, and the output of that placed in the 'ai_canonname'
field of the resulting structure. Normal resolver processing applies
to the input string, of course.

Consequently, an application that uses AI_IDN|AI_CANONIDN signals to
the getaddrinfo() implementation that the input host name may be
non-ASCII and should be put through the IDNA ToASCII steps before
being run through the resolver, and that the input string should also
be run through the IDNA ToUnicode steps and the output of that placed
in the 'ai_canonname' field of the resulting structure.

The semantics of AI_CANONNAME|AI_CANONIDN are that, instead of running
the ToUnicode IDNA steps on the input string, the canonical host name
as returned by the resolver for the input string should be used in the
ToUnicode steps.

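
A minimal sketch of how an application might combine these flags to
obtain a name suitable for display follows. The function name
display_name_for() is hypothetical. As an assumption of this example,
AI_IDN and AI_CANONIDN are defined as 0 where <netdb.h> lacks them, in
which case the result is simply the resolver's canonical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1           /* GNU Libc declares the IDN flags only with this.  */
#endif
#include <assert.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumption of this sketch: fall back to a plain lookup where the
   proposed flags are not available.  */
#ifndef AI_IDN
#define AI_IDN 0
#endif
#ifndef AI_CANONIDN
#define AI_CANONIDN 0
#endif

/* Hypothetical helper: look up HOST and return a newly allocated copy
   of the 'ai_canonname' field of the first result, or NULL on
   failure.  The caller releases the copy with free().  */
char *
display_name_for (const char *host)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;
  char *copy = NULL;

  assert (host != NULL);

  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  /* ToASCII on the input, canonical name from the resolver, and
     ToUnicode on that canonical name.  */
  hints.ai_flags = AI_IDN | AI_CANONNAME | AI_CANONIDN;

  if (getaddrinfo (host, NULL, &hints, &res) != 0)
    return NULL;

  if (res->ai_canonname != NULL)
    copy = strdup (res->ai_canonname);

  freeaddrinfo (res);
  return copy;
}
```

The copy is made because 'ai_canonname' belongs to the result list and
becomes invalid once freeaddrinfo() has been called.
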
Four new flags have been proposed: AI_IDN_ALLOW_UNASSIGNED and
AI_IDN_USE_STD3_ASCII_RULES for getaddrinfo(), and
NI_IDN_ALLOW_UNASSIGNED and NI_IDN_USE_STD3_ASCII_RULES for
getnameinfo(). The implementation is simple: if specified, each flag
sets the corresponding flag (AllowUnassigned or UseSTD3ASCIIRules) in
the call to the IDNA functions. See the IDNA RFC (RFC 3490) for the
meaning of those flags.

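
The pass-through described above can be sketched as a small mapping
function. Everything named SKETCH_* or sketch_* below is invented for
this illustration, including the bit values; a real implementation
would use the AI_IDN_* constants from <netdb.h> and, with Libidn, the
IDNA_ALLOW_UNASSIGNED and IDNA_USE_STD3_ASCII_RULES flags from
<idna.h>.

```c
#include <assert.h>

/* Stand-in bit values, for illustration only.  */
enum
{
  SKETCH_AI_IDN_ALLOW_UNASSIGNED = 0x0100,
  SKETCH_AI_IDN_USE_STD3_ASCII_RULES = 0x0200
};

enum
{
  SKETCH_IDNA_ALLOW_UNASSIGNED = 0x0001,
  SKETCH_IDNA_USE_STD3_ASCII_RULES = 0x0002
};

/* Translate the getaddrinfo()-level flags into the flags argument
   that would be handed to the IDNA conversion functions.  Bits that
   are unrelated to IDN are ignored.  */
int
sketch_idna_flags_from_ai (int ai_flags)
{
  int idna_flags = 0;

  if (ai_flags & SKETCH_AI_IDN_ALLOW_UNASSIGNED)
    idna_flags |= SKETCH_IDNA_ALLOW_UNASSIGNED;
  if (ai_flags & SKETCH_AI_IDN_USE_STD3_ASCII_RULES)
    idna_flags |= SKETCH_IDNA_USE_STD3_ASCII_RULES;

  /* Only the two IDNA bits can ever be set here.  */
  assert ((idna_flags & ~(SKETCH_IDNA_ALLOW_UNASSIGNED
                          | SKETCH_IDNA_USE_STD3_ASCII_RULES)) == 0);
  return idna_flags;
}
```

The same mapping would apply to the NI_IDN_* flags on the
getnameinfo() side.
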
The AI_IDN flag has been implemented and shipped as a proof-of-concept
patch for GNU Libc with GNU Libidn since January 2003. Binary libc
packages with the patch exist for (at least) two GNU/Linux
distributions. The AI_CANONIDN flag is not yet implemented.

As of March 2004, Libidn has been integrated as an add-on in the GNU
Libc CVS repository. The AI_CANONIDN flag has been implemented. The
AllowUnassigned and UseSTD3ASCIIRules flags were added.

Allow non-ASCII in gethostname (and similar functions), if the
administrator has supplied, e.g., 'option idn' in /etc/resolv.conf?

This document is a work-in-progress and the details may change.
Contact me at simon@josefsson.org to discuss changes.

----------------------------------------------------------------------
Permission is granted to anyone to make or distribute verbatim copies
of this document, in any medium, provided that the copyright notice
and permission notice are preserved, and that the distributor grants
the recipient permission for further redistribution as permitted by
this notice. Modified versions may not be made.