1 \input texinfo @c -*-texinfo-*-
2 @c This file is part of the GNU Libidn Manual.
3 @c Copyright (C) 2002, 2003 Simon Josefsson
4 @c See below for copying conditions.
6 @setfilename libidn.info
8 @settitle GNU Libidn @value{VERSION}
13 This manual is for GNU Libidn
14 (version @value{VERSION}, @value{UPDATED}),
15 which is a library for internationalized string processing.
17 Copyright @copyright{} 2002, 2003 Simon Josefsson.
20 Permission is granted to copy, distribute and/or modify this document
21 under the terms of the GNU Free Documentation License, Version 1.1 or
22 any later version published by the Free Software Foundation; with no
23 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
24 and with the Back-Cover Texts as in (a) below. A copy of the
25 license is included in the section entitled ``GNU Free Documentation
28 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
29 this GNU Manual, like GNU software. Copies published by the Free
30 Software Foundation raise funds for GNU development.''
34 @dircategory GNU Libraries
36 * libidn: (libidn). Internationalized string processing library.
39 @dircategory GNU utilities
41 * idn: (libidn)Invoking idn. Command line interface to GNU Libidn.
46 * IDN Library: (libidn)Emacs API. Emacs API for IDN functions.
51 @subtitle for version @value{VERSION}, @value{UPDATED}
52 @author Simon Josefsson (@email{bug-libidn@@gnu.org})
54 @vskip 0pt plus 1filll
68 * Introduction:: How to use this manual.
69 * Preparation:: What you should do before using the library.
70 * Stringprep Functions:: Stringprep functions.
71 * Punycode Functions:: Punycode functions.
72 * IDNA Functions:: IDNA functions.
73 * Examples:: Demonstrate how to use the library.
74 * Acknowledgements:: Whom to blame.
76 * Invoking idn:: Command line interface to the library.
78 * Emacs API:: Emacs Lisp API for Libidn.
82 * Library Copying:: How you can copy and share GNU Libidn.
83 * Copying This Manual:: How you can copy and share this manual.
88 * Function and Variable Index::
96 GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
97 specifications defined by the IETF Internationalized Domain Names
98 (IDN) working group, used for internationalized domain names. The
99 package is available under the GNU Lesser General Public License.
The library contains a generic Stringprep implementation that does
Unicode 3.2 NFKC normalization, mapping and prohibition of
characters, and bidirectional character handling.  Profiles for iSCSI,
Kerberos 5, Nameprep, SASL and XMPP are included.  Punycode and ASCII
Compatible Encoding (ACE) via IDNA are supported.
The Stringprep API consists of two main functions, one for converting
data from the system's native representation into UTF-8, and one
function to perform the Stringprep processing.  Adding a new
Stringprep profile for your application within the API is
straightforward.  The Punycode API consists of one encoding function
and one decoding function.  The IDNA API consists of the ToASCII and
ToUnicode functions, as well as a high-level interface for converting
entire domain names to and from the ACE encoded form.
116 The library is used by, e.g., GNU SASL and Shishi to process user
117 names and passwords. Libidn can be built into GNU Libc to enable a
118 new system-wide getaddrinfo() flag for IDN processing.
120 GNU Libidn is developed for the GNU/Linux system, but runs on over 20
121 Unix platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
126 * Supported Platforms::
130 @node Getting Started
131 @section Getting Started
133 This manual documents the library programming interface. All
134 functions and data types provided by the library are explained.
136 The reader is assumed to possess basic familiarity with
137 internationalization concepts and network programming in C or C++.
139 This manual can be used in several ways. If read from the beginning
140 to the end, it gives a good introduction into the library and how it
141 can be used in an application. Forward references are included where
142 necessary. Later on, the manual can be used as a reference manual to
143 get just the information needed about any particular interface of the
144 library. Experienced programmers might want to start looking at the
145 examples at the end of the manual (@pxref{Examples}), and then only
146 read up those parts of the interface which are unclear.
151 This library might have a couple of advantages over other libraries
155 @item It's Free Software
156 Anybody can use, modify, and redistribute it under the terms of the
157 GNU Lesser General Public License.
159 @item It's thread-safe
160 No global state is kept in the library.
It should work on all Unix-like operating systems, and on Windows.
167 @node Supported Platforms
168 @section Supported Platforms
170 Libidn has at some point in time been tested on the following
175 @item Debian GNU/Linux 3.0 (Woody)
178 GCC 2.95.4 and GNU Make. This is the main development platform.
179 @code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
180 @code{hppa64-unknown-linux-gnu}, @code{i686-pc-linux-gnu},
181 @code{ia64-unknown-linux-gnu}.
186 Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
187 @code{alphaev68-dec-osf5.1}.
192 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
193 @code{alphaev67-unknown-linux-gnu}.
195 @item SuSE Linux 7.2a
198 GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
200 @item RedHat Linux 7.2
203 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
204 @code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
206 @item RedHat Linux 8.0
209 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
211 @item RedHat Advanced Server 2.1
212 @cindex RedHat Advanced Server
214 GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
216 @item Slackware Linux 8.0.01
219 GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
221 @item Mandrake Linux 9.0
224 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
229 MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
234 IBM C for AIX compiler, AIX Make. @code{rs6000-ibm-aix4.3.2.0}.
236 @item Microsoft Windows 2000 (Cygwin)
239 GCC 3.2, GNU make. @code{i686-pc-cygwin}.
244 HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
245 @code{hppa2.0w-hp-hpux11.11}.
247 @item SUN Solaris 2.8
250 Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
255 GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
256 @code{i386-unknown-netbsdelf1.6}.
258 @item OpenBSD 3.1 and 3.2
261 GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
262 @code{i386-unknown-openbsd3.1}.
267 GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
268 @code{i386-unknown-freebsd4.7}.
272 If you use Libidn on, or port Libidn to, a new platform please report
277 @cindex Reporting Bugs
279 If you think you have found a bug in Libidn, please investigate it and
284 @item Please make sure that the bug is really in Libidn, and
285 preferably also check that it hasn't already been fixed in the latest
288 @item You have to send us a test case that makes it possible for us to
@item You also have to explain what is wrong: whether you get a crash,
or whether the results printed are incorrect and, if so, in what way.
Make sure that the bug report includes all information you would need
to fix this kind of bug for someone else.
298 Please make an effort to produce a self-contained report, with
299 something definite that can be tested or debugged. Vague queries or
300 piecemeal messages are difficult to act on and don't help the
303 If your bug report is good, we will do our best to help you to get a
304 corrected version of the software; if the bug report is poor, we won't
305 do anything about it (apart from asking you to send better bug
308 If you think something in this manual is unclear, or downright
309 incorrect, or if the language needs to be improved, please also send a
312 Send your bug report to:
314 @center @samp{bug-libidn@@gnu.org}
317 @c **********************************************************
318 @c ******************* Preparation ************************
319 @c **********************************************************
To use `Libidn', you have to make some changes to your sources and
the build system.  The necessary changes are small and explained in
the following sections.  The end of this chapter describes how the
library is initialized and how the requirements of the library are
verified.
329 A faster way to find out how to adapt your application for use with
330 `Libidn' may be to look at the examples at the end of this manual
337 * Building the source::
The library contains a few independent parts, and each part exports its
interfaces (data types and functions) in a header file.  You must
include the appropriate header files in all programs using the
library, either directly or through some other header file, like this:
#include <stringprep.h>
352 The header files and the functions they define are categorized as
358 The low-level stringprep API entry point. For IDN applications, this
359 is usually invoked via IDNA. Some applications, specifically non-IDN
360 ones, may want to prepare strings directly though, and should include
363 The name space of the stringprep part of Libidn is @code{stringprep*}
364 for function names, @code{Stringprep*} for data types and
365 @code{STRINGPREP_*} for other symbols. In addition the same name
366 prefixes with one prepended underscore are reserved for internal use
367 and should never be used by an application.
The entry point to Punycode encoding and decoding functions.  Normally
Punycode is used via the idna.h interface, but some applications may
want to perform raw Punycode operations.
375 The name space of the punycode part of Libidn is @code{punycode_*} for
376 function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
377 for other symbols. In addition the same name prefixes with one
378 prepended underscore are reserved for internal use and should never be
379 used by an application.
383 The entry point to the IDNA functions. This is the normal entry point
384 for applications that need IDN functionality.
386 The name space of the IDNA part of Libidn is @code{idna_*} for
387 function names, @code{Idna*} for data types and @code{IDNA_*} for
388 other symbols. In addition the same name prefixes with one prepended
389 underscore are reserved for internal use and should never be used by
395 @section Initialization
397 Libidn is stateless and does not need any initialization.
400 @section Version Check
It is often desirable to check that the version of `Libidn' used is
indeed one which fits all requirements.  Even with binary
compatibility, new features may have been introduced, but due to a
problem with the dynamic linker an old version may actually be used.
So you may want to check that the version is okay right after program
startup.
408 @include libidn-api-version.texi
410 The normal way to use the function is to put something similar to the
411 following first in your @code{main()}:
  if (!stringprep_check_version (STRINGPREP_VERSION))
    @{
      printf ("stringprep_check_version() failed:\n"
              "Header file incompatible with shared library.\n");
      exit (1);
    @}
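The idea behind such a check can be sketched in plain C.  The following
is a minimal illustration only, not the implementation used by the
library: the names @code{sketch_parse_version} and
@code{sketch_check_version} are hypothetical, and the convention of
returning the version string on success and @code{NULL} on failure is an
assumption modelled on how the check is used above.  It compares dotted
version numbers component by component, which matters because a plain
string comparison would rank "0.1.12" below "0.1.4".

```c
#include <stdlib.h>

/* Parse up to three dotted numeric components, e.g. "0.1.12".
   Missing components are treated as 0. */
static void sketch_parse_version(const char *s, long parts[3])
{
  for (int i = 0; i < 3; i++) {
    char *end;
    parts[i] = strtol(s, &end, 10);
    s = (*end == '.') ? end + 1 : end;
  }
}

/* Hypothetical helper: return HAVE if it is at least NEED (or if NEED
   is NULL), otherwise return NULL. */
const char *sketch_check_version(const char *have, const char *need)
{
  long h[3], n[3];

  if (need == NULL)
    return have;
  sketch_parse_version(have, h);
  sketch_parse_version(need, n);
  for (int i = 0; i < 3; i++) {
    if (h[i] > n[i])
      return have;   /* strictly newer at this component */
    if (h[i] < n[i])
      return NULL;   /* strictly older at this component */
  }
  return have;       /* all components equal */
}
```

An application would pass the version it was compiled against as
@code{need} and the version reported at run time as @code{have}, and
refuse to continue when the result is @code{NULL}.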
422 @node Building the source
423 @section Building the source
424 @cindex Compiling your application
If you want to compile a source file including e.g. the `idna.h' header
file, you must make sure that the compiler can find it in the
directory hierarchy.  This is accomplished by adding the path to the
directory in which the header file is located to the compiler's include
file search path (via the @option{-I} option).
432 However, the path to the include file is determined at the time the
433 source is configured. To solve this problem, `Libidn' uses the
434 external package @command{pkg-config} that knows the path to the
435 include file and other configuration options. The options that need
436 to be added to the compiler invocation at compile time are output by
437 the @option{--cflags} option to @command{pkg-config libidn}. The
438 following example shows how it can be used at the command line:
gcc -c foo.c `pkg-config libidn --cflags`
Adding the output of @samp{pkg-config libidn --cflags} to the
compiler's command line will ensure that the compiler can find the
header files.
A similar problem occurs when linking the program with the library.
Again, the compiler has to find the library files.  For this to work,
the path to the library files has to be added to the library search
path (via the @option{-L} option).  For this, the option
@option{--libs} to @command{pkg-config libidn} can be used.  For
convenience, this option also outputs all other options that are
required to link the program with the `libidn' library.  The example
shows how to link @file{foo.o} with the `libidn' library into a program:
gcc -o foo foo.o `pkg-config libidn --libs`
Of course you can also combine both examples into a single command by
specifying both options to @command{pkg-config}:
gcc -o foo foo.c `pkg-config libidn --cflags --libs`
469 @c **********************************************************
470 @c ****************** Stringprep Functions *****************
471 @c **********************************************************
472 @node Stringprep Functions
473 @chapter Stringprep Functions
474 @cindex Stringprep Functions
476 Stringprep describes a framework for preparing Unicode text strings in
477 order to increase the likelihood that string input and string
478 comparison work in ways that make sense for typical users throughout
479 the world. The stringprep protocol is useful for protocol identifier
480 values, company and personal names, internationalized domain names,
481 and other text strings.
483 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_NFKC
STRINGPREP_NO_NFKC disables the NFKC normalization and selects the
non-NFKC case folding tables.  Usually the profile specifies BIDI and
NFKC settings.
489 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_BIDI
490 STRINGPREP_NO_BIDI disables the BIDI step. Usually the profile
491 specifies BIDI and NFKC settings.
494 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_UNASSIGNED
STRINGPREP_NO_UNASSIGNED causes stringprep() to abort with an error if
the string contains unassigned characters according to the profile.
499 @include libidn-api-stringprep.texi
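The three flags described above behave like independent bits that can
be combined.  The following sketch shows that pattern in isolation.  It
is an illustration only: the @code{sketch_*} names and the numeric
values are hypothetical stand-ins, not the definitions from
@file{stringprep.h}, and the function merely reports which processing
steps a given combination would enable.

```c
/* Hypothetical stand-ins for the flag values; the real definitions
   live in stringprep.h and may differ. */
enum sketch_profile_flags {
  SKETCH_NO_NFKC = 1,
  SKETCH_NO_BIDI = 2,
  SKETCH_NO_UNASSIGNED = 4
};

/* Which steps a flag combination selects. */
struct sketch_plan {
  int run_nfkc;           /* 1 if NFKC normalization would run */
  int run_bidi;           /* 1 if the BIDI step would run */
  int reject_unassigned;  /* 1 if unassigned characters are an error */
};

/* Decode a bitwise OR of flags into a plan of steps. */
struct sketch_plan sketch_plan_steps(int flags)
{
  struct sketch_plan p;

  p.run_nfkc = !(flags & SKETCH_NO_NFKC);
  p.run_bidi = !(flags & SKETCH_NO_BIDI);
  p.reject_unassigned = (flags & SKETCH_NO_UNASSIGNED) != 0;
  return p;
}
```

For example, @code{sketch_plan_steps (SKETCH_NO_BIDI | SKETCH_NO_UNASSIGNED)}
keeps normalization enabled, skips the BIDI step, and treats unassigned
characters as an error.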
501 @c **********************************************************
502 @c ******************* Punycode Functions ******************
503 @c **********************************************************
504 @node Punycode Functions
505 @chapter Punycode Functions
506 @cindex Punycode Functions
Punycode is a simple and efficient transfer encoding syntax designed
for use with Internationalized Domain Names in Applications.  It
uniquely and reversibly transforms a Unicode string into an ASCII
string.  ASCII characters in the Unicode string are represented
literally, and non-ASCII characters are represented by ASCII
characters that are allowed in host name labels (letters, digits, and
hyphens).  The Punycode specification defines a general algorithm
called Bootstring that allows a string of basic code points to uniquely
represent any string of code points drawn from a larger set.  Punycode
is an instance of Bootstring that uses particular parameter values
specified by that document, appropriate for IDNA.
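The encoding direction of Bootstring with the Punycode parameters can be
sketched as follows.  This is a minimal illustration written from the
algorithm published in RFC 3492, not the code shipped with Libidn.  The
name @code{sketch_punycode_encode}, its argument order and its return
convention are hypothetical.  The sketch produces lowercase output and
does not validate that the input code points are valid Unicode.

```c
#include <stddef.h>
#include <stdint.h>

/* Bootstring parameters for Punycode (RFC 3492, section 5). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z', '0'..'9'. */
static char digit(uint32_t d)
{
  return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function (RFC 3492, section 6.1). */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int first)
{
  uint32_t k = 0;

  delta = first ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  while (delta > ((BASE - TMIN) * TMAX) / 2) {
    delta /= BASE - TMIN;
    k += BASE;
  }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Encode INLEN code points into the NUL-terminated string OUT.
   Returns 0 on success, -1 on overflow or if OUT is too small. */
int sketch_punycode_encode(const uint32_t *in, size_t inlen,
                           char *out, size_t outsize)
{
  uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  size_t h, b, o = 0, i;

  if (outsize == 0)
    return -1;
  /* Copy the basic (ASCII) code points literally. */
  for (i = 0; i < inlen; i++)
    if (in[i] < 0x80) {
      if (o + 2 > outsize) return -1;
      out[o++] = (char)in[i];
    }
  h = b = o;
  if (b > 0) {                  /* delimiter after the basic part */
    if (o + 2 > outsize) return -1;
    out[o++] = '-';
  }
  while (h < inlen) {
    uint32_t m = UINT32_MAX;    /* smallest unhandled code point >= n */

    for (i = 0; i < inlen; i++)
      if (in[i] >= n && in[i] < m) m = in[i];
    if (m - n > (UINT32_MAX - delta) / (h + 1)) return -1;
    delta += (m - n) * (uint32_t)(h + 1);
    n = m;
    for (i = 0; i < inlen; i++) {
      if (in[i] < n && ++delta == 0) return -1;
      if (in[i] == n) {
        /* Emit delta as a generalized variable-length integer. */
        uint32_t q = delta, k;

        for (k = BASE; ; k += BASE) {
          uint32_t t = k <= bias ? TMIN
                     : k >= bias + TMAX ? TMAX : k - bias;
          if (q < t) break;
          if (o + 2 > outsize) return -1;
          out[o++] = digit(t + (q - t) % (BASE - t));
          q = (q - t) / (BASE - t);
        }
        if (o + 2 > outsize) return -1;
        out[o++] = digit(q);
        bias = adapt(delta, (uint32_t)(h + 1), h == b);
        delta = 0;
        h++;
      }
    }
    delta++;
    n++;
  }
  out[o] = '\0';
  return 0;
}
```

The ASCII characters are copied first, followed by a hyphen, and each
non-ASCII code point is then expressed as a run of letters and digits
that records both its value and its position in the original string.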
520 @include libidn-api-punycode.texi
522 @c **********************************************************
523 @c ********************* IDNA Functions *********************
524 @c **********************************************************
526 @chapter IDNA Functions
527 @cindex IDNA Functions
529 Until now, there has been no standard method for domain names to use
530 characters outside the ASCII repertoire. The IDNA document defines
531 internationalized domain names (IDNs) and a mechanism called IDNA for
532 handling them in a standard fashion. IDNs use characters drawn from a
533 large repertoire (Unicode), but IDNA allows the non-ASCII characters
534 to be represented using only the ASCII characters already allowed in
535 so-called host names today. This backward-compatible representation is
536 required in existing protocols like DNS, so that IDNs can be
537 introduced with no changes to the existing infrastructure. IDNA is
538 only meant for processing domain names, not free text.
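A label in its ASCII compatible form can be recognized by the ACE prefix
@samp{xn--} that RFC 3490 assigns to IDNA, compared without regard to
case.  The following sketch counts such labels in a domain name.  It is
an illustration only and not part of the Libidn API; the names
@code{sketch_is_ace_label} and @code{sketch_count_ace_labels} are
hypothetical, and the code checks the prefix without verifying that the
remainder of the label is valid Punycode.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the LEN-byte label starts with the ACE prefix "xn--",
   compared case-insensitively; otherwise return 0. */
int sketch_is_ace_label(const char *label, size_t len)
{
  static const char prefix[] = "xn--";

  if (len < 4)
    return 0;
  for (size_t i = 0; i < 4; i++)
    if (tolower((unsigned char)label[i]) != prefix[i])
      return 0;
  return 1;
}

/* Count the dot-separated labels of DOMAIN that carry the ACE prefix. */
int sketch_count_ace_labels(const char *domain)
{
  int count = 0;

  while (*domain != '\0') {
    size_t len = strcspn(domain, ".");  /* length of current label */

    count += sketch_is_ace_label(domain, len);
    domain += len;
    if (*domain == '.')
      domain++;                         /* skip the separator */
  }
  return count;
}
```

Because only individual labels are encoded, a name can mix plain ASCII
labels with ACE labels, which is what lets such names pass through
existing infrastructure unchanged.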
540 @include libidn-api-idna.texi
542 @c **********************************************************
543 @c *********************** Examples ***********************
544 @c **********************************************************
This chapter contains example code which illustrates how `Libidn' can
be used when writing your own application.
553 * Example 1:: Example using stringprep.
554 * Example 2:: Example using punycode.
555 * Example 3:: Example using IDNA ToASCII.
556 * Example 4:: Example using IDNA ToUnicode.
562 This example demonstrates how the stringprep functions are used.
565 @include example.c.texi
572 This example demonstrates how the punycode functions are used.
575 @include example2.c.texi
582 This example demonstrates how the library is used to convert
583 internationalized domain names into ASCII compatible names.
586 @include example3.c.texi
593 This example demonstrates how the library is used to convert ASCII
594 compatible names to internationalized domain names.
597 @include example4.c.texi
600 @c **********************************************************
601 @c ********************* Invoking idn *********************
602 @c **********************************************************
604 @chapter Invoking idn
607 @cindex invoking @command{idn}
612 GNU Libidn (idn) -- Internationalized Domain Names command line tool
614 @majorheading Description
@code{idn} is a utility that is part of GNU Libidn.  It allows
preparation of strings, encoding and decoding of Punycode data, and IDNA
ToASCII/ToUnicode operations to be performed on the command line,
without the need to write a program that uses libidn.
Data is read, line by line, from the standard input, the operation
indicated by the command parameters is performed, and the output is
printed to standard output.  If any errors are encountered, the
execution of the application is aborted.
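The read, convert, print cycle described above can be sketched as a
small filter loop.  This is an illustration of the pattern only and is
not taken from the @code{idn} source: the names @code{sketch_op},
@code{sketch_filter} and @code{sketch_copy_op} are hypothetical, and the
stand-in operation simply copies its input where a real tool would call
a Stringprep, Punycode or IDNA function.

```c
#include <stdio.h>
#include <string.h>

/* An operation converts one input line into OUT; 0 means success. */
typedef int (*sketch_op)(const char *in, char *out, size_t outsize);

/* Read IN line by line, apply OP, and write each result to OUT.
   Stops and returns -1 at the first error, otherwise returns 0.
   Lines longer than the buffer are processed in pieces. */
int sketch_filter(FILE *in, FILE *out, sketch_op op)
{
  char line[512], result[1024];

  while (fgets(line, sizeof line, in) != NULL) {
    line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
    if (op(line, result, sizeof result) != 0)
      return -1;                        /* abort on the first error */
    fprintf(out, "%s\n", result);
  }
  return 0;
}

/* Stand-in operation: copy the line, rejecting empty input. */
int sketch_copy_op(const char *in, char *out, size_t outsize)
{
  if (*in == '\0' || strlen(in) >= outsize)
    return -1;
  strcpy(out, in);
  return 0;
}
```

A command line tool built this way would call
@code{sketch_filter (stdin, stdout, op)} with the operation selected by
its options and exit with a failure status when the result is non-zero.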
625 @majorheading Options
626 @code{idn} recognizes these commands:
633 Print version and exit
636 Prepare string according to nameprep profile
639 Encode UTF-8 to Punycode
642 Decode Punycode to UTF-8
645 Convert UTF-8 to ACE according to IDNA
648 Convert ACE to UTF-8 according to IDNA
651 Toggle IDNA AllowUnassigned flag (default=off)
654 Toggle IDNA UseSTD3ASCIIRules flag (default=off)
656 -pSTRING --profile=STRING
657 Use specified stringprep profile instead
659 Valid stringprep profiles are 'generic', 'Nameprep',
660 'KRBprep', 'Nodeprep', 'Resourceprep', 'plain',
661 'SASLprep', and 'ISCSIprep'.
664 Print debugging information (default=off)
667 Don't print the welcome greeting (default=off)
670 @majorheading Environment Variables
The @var{CHARSET} environment variable can be used to override which
character set is used to decode incoming data on the standard input
and to encode data written to the standard output.  If your system is
set up correctly, the application will guess automatically which
character set is used.  Example usage:
$ CHARSET=ISO-8859-1 idn --punycode-encode
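The override logic can be sketched in a few lines of C.  This is an
illustration under assumptions rather than the code used by
@code{idn}: the function name @code{sketch_pick_charset} is
hypothetical, and the fallback value stands in for whatever the
application would otherwise detect from the system.

```c
#include <stddef.h>

/* Prefer a non-empty override (for example the value of the CHARSET
   environment variable); otherwise use the detected FALLBACK. */
const char *sketch_pick_charset(const char *env_value, const char *fallback)
{
  return (env_value != NULL && *env_value != '\0') ? env_value : fallback;
}
```

A caller would typically write
@code{sketch_pick_charset (getenv ("CHARSET"), detected)}, where
@code{detected} is the character set guessed from the locale.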
Included in Libidn are @file{punycode.el} and @file{idna.el}, which
provide an Emacs Lisp API to (a limited set of) the Libidn API.  This
section describes the API.
690 @defvar punycode-program
691 Name of the GNU Libidn @file{idn} application. The default is
692 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
695 @defvar punycode-encode-parameters
696 Parameters passed to @var{punycode-program} to invoke punycode
697 encoding mode. The default is @samp{--quiet --punycode-encode}. This
698 variable can be customized.
701 @defvar punycode-decode-parameters
702 Parameters passed to @var{punycode-program} to invoke punycode
703 decoding mode. The default is @samp{--quiet --punycode-decode}. This
704 variable can be customized.
707 @defun punycode-encode string
708 Returns a Punycode encoding of the @var{string}, after converting the
712 @defun punycode-decode string
Returns a possibly multibyte string that is the decoding of the
Punycode-encoded @var{string}.
718 Name of the GNU Libidn @file{idn} application. The default is
719 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
722 @defvar idna-to-ascii-parameters
Parameters passed to @var{idna-program} to invoke IDNA ToASCII mode.
The default is @samp{--quiet --idna-to-ascii}.  This variable can be
customized.
728 @defvar idna-to-unicode-parameters
Parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
The default is @samp{--quiet --idna-to-unicode}.  This variable can be
customized.
734 @defun idna-to-ascii string
735 Returns an ASCII Compatible Encoding (ACE) of the string computed by
736 the IDNA ToASCII operation on the input @var{string}, after converting
740 @defun idna-to-unicode string
741 Returns a possibly multibyte string which is the output of the IDNA
742 ToUnicode operation computed on the input @var{string}.
745 @c **********************************************************
746 @c ******************* Acknowledgements *******************
747 @c **********************************************************
748 @node Acknowledgements
749 @chapter Acknowledgements
751 The punycode code was taken from the IETF IDN Punycode specification,
Some functions (see nfkc.c and toutf8.c) have been borrowed from GLib,
downloaded from www.gtk.org.
757 Several people reported bugs, sent patches or suggested improvements,
762 @node Copying This Manual
763 @appendix Copying This Manual
766 * GNU Free Documentation License:: License for copying this manual.
772 @unnumbered Concept Index
776 @node Function and Variable Index
777 @unnumbered Function and Variable Index