1 \input texinfo @c -*-texinfo-*-
2 @c This file is part of the GNU Libidn Manual.
3 @c Copyright (C) 2002, 2003 Simon Josefsson
4 @c See below for copying conditions.
6 @setfilename libidn.info
8 @settitle GNU Libidn @value{VERSION}
13 This manual is for GNU Libidn
14 (version @value{VERSION}, @value{UPDATED}),
15 which is a library for internationalized string processing.
17 Copyright @copyright{} 2002, 2003 Simon Josefsson.
20 Permission is granted to copy, distribute and/or modify this document
21 under the terms of the GNU Free Documentation License, Version 1.1 or
22 any later version published by the Free Software Foundation; with no
23 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
24 and with the Back-Cover Texts as in (a) below. A copy of the
25 license is included in the section entitled ``GNU Free Documentation
28 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
29 this GNU Manual, like GNU software. Copies published by the Free
30 Software Foundation raise funds for GNU development.''
34 @dircategory GNU Libraries
36 * libidn: (libidn). Internationalized string processing library.
39 @dircategory GNU utilities
41 * idn: (libidn)Invoking idn. Command line interface to GNU Libidn.
46 * IDN Library: (libidn)Emacs API. Emacs API for IDN functions.
51 @subtitle for version @value{VERSION}, @value{UPDATED}
52 @author Simon Josefsson (@email{bug-libidn@@gnu.org})
54 @vskip 0pt plus 1filll
68 * Introduction:: How to use this manual.
69 * Preparation:: What you should do before using the library.
70 * Stringprep Functions:: Stringprep functions.
71 * Punycode Functions:: Punycode functions.
72 * IDNA Functions:: IDNA functions.
73 * Examples:: Demonstrate how to use the library.
74 * Acknowledgements:: Whom to blame.
76 * Invoking idn:: Command line interface to the library.
78 * Emacs API:: Emacs Lisp API for Libidn.
82 * Library Copying:: How you can copy and share GNU Libidn.
83 * Copying This Manual:: How you can copy and share this manual.
88 * Function and Variable Index::
96 GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
97 specifications defined by the IETF Internationalized Domain Names
98 (IDN) working group, used for internationalized domain names. It is
99 available under the GNU Lesser General Public License. Currently the
100 iSCSI, Kerberos 5, Nameprep, SASL and XMPP Stringprep profiles are
103 The library contains a generic Stringprep implementation (including
104 Unicode 3.2 NFKC normalization, table mapping of characters, and
105 Bidirectional Character handling), a few Stringprep profiles, and an
106 implementation of the functionality defined by Punycode and IDNA.
108 The Stringprep API consists of two main functions, one for converting
109 data from the system's native representation into UTF-8, and one
110 function to perform the Stringprep processing. Each stringprep
111 profile has a corresponding CPP macro. Adding a new Stringprep
112 profile for your application within the API is straightforward. The
113 Punycode API consists of one encoding function and one decoding
114 function. The IDNA API consists of the ToASCII and ToUnicode
115 functions, as well as a high-level interface for converting entire
116 domain names to and from the ACE encoded form.
118 The library is used by forthcoming network applications to process
119 user names and passwords before they are input to cryptographic
120 operations. Libidn can be built into GNU Libc to enable a new
121 getaddrinfo() flag for system-wide IDN processing.
123 GNU Libidn is developed for the GNU/Linux system, but runs on over 20
124 Unix platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
129 * Supported Platforms::
133 @node Getting Started
134 @section Getting Started
136 This manual documents the library programming interface. All
137 functions and data types provided by the library are explained.
139 The reader is assumed to possess basic familiarity with
140 internationalization concepts and network programming in C or C++.
142 This manual can be used in several ways. If read from the beginning
143 to the end, it gives a good introduction to the library and how it
144 can be used in an application. Forward references are included where
145 necessary. Later on, the manual can be used as a reference manual to
146 get just the information needed about any particular interface of the
147 library. Experienced programmers might want to start looking at the
148 examples at the end of the manual (@pxref{Examples}), and then only
149 read up on those parts of the interface which are unclear.
154 This library might have a couple of advantages over other libraries
158 @item It's Free Software
159 Anybody can use, modify, and redistribute it under the terms of the
160 GNU Lesser General Public License.
162 @item It's thread-safe
163 No global state is kept in the library.
166 It should work on all Unix-like operating systems, and on Windows.
170 @node Supported Platforms
171 @section Supported Platforms
173 Libidn has at some point in time been tested on the following
178 @item Debian GNU/Linux 3.0 (Woody)
181 GCC 2.95.4 and GNU Make. This is the main development platform.
182 @code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
183 @code{hppa64-unknown-linux-gnu}, @code{i686-pc-linux-gnu},
184 @code{ia64-unknown-linux-gnu}.
189 Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
190 @code{alphaev68-dec-osf5.1}.
195 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
196 @code{alphaev67-unknown-linux-gnu}.
198 @item SuSE Linux 7.2a
201 GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
203 @item RedHat Linux 7.2
206 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
207 @code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
209 @item RedHat Linux 8.0
212 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
214 @item RedHat Advanced Server 2.1
215 @cindex RedHat Advanced Server
217 GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
219 @item Slackware Linux 8.0.01
222 GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
224 @item Mandrake Linux 9.0
227 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
232 MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
237 IBM C for AIX compiler, AIX Make. @code{rs6000-ibm-aix4.3.2.0}.
239 @item Microsoft Windows 2000 (Cygwin)
242 GCC 3.2, GNU make. @code{i686-pc-cygwin}.
247 HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
248 @code{hppa2.0w-hp-hpux11.11}.
250 @item SUN Solaris 2.8
253 Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
258 GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
259 @code{i386-unknown-netbsdelf1.6}.
261 @item OpenBSD 3.1 and 3.2
264 GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
265 @code{i386-unknown-openbsd3.1}.
270 GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
271 @code{i386-unknown-freebsd4.7}.
275 If you use Libidn on, or port Libidn to, a new platform please report
280 @cindex Reporting Bugs
282 If you think you have found a bug in Libidn, please investigate it and
287 @item Please make sure that the bug is really in Libidn, and
288 preferably also check that it hasn't already been fixed in the latest
291 @item You have to send us a test case that makes it possible for us to
294 @item You also have to explain what is wrong: whether you get a crash or,
295 if the printed results are wrong, in what way they are wrong.
296 Make sure that the bug report includes all information you would need
297 to fix this kind of bug for someone else.
301 Please make an effort to produce a self-contained report, with
302 something definite that can be tested or debugged. Vague queries or
303 piecemeal messages are difficult to act on and don't help the
306 If your bug report is good, we will do our best to help you to get a
307 corrected version of the software; if the bug report is poor, we won't
308 do anything about it (apart from asking you to send better bug
311 If you think something in this manual is unclear, or downright
312 incorrect, or if the language needs to be improved, please also send a
315 Send your bug report to:
317 @center @samp{bug-libidn@@gnu.org}
320 @c **********************************************************
321 @c ******************* Preparation ************************
322 @c **********************************************************
326 To use `Libidn', you have to perform some changes to your sources and
327 the build system. The necessary changes are small and explained in
328 the following sections. The end of this chapter describes
329 how the library is initialized, and how the requirements of the
330 library are verified.
332 A faster way to find out how to adapt your application for use with
333 `Libidn' may be to look at the examples at the end of this manual
340 * Building the source::
346 The library contains a few independent parts, and each part exports the
347 interfaces (data types and functions) in a header file. You must
348 include the appropriate header files in all programs using the
349 library, either directly or through some other header file, like this:
352 #include <stringprep.h>
355 The header files and the functions they define are categorized as
361 The low-level stringprep API entry point. For IDN applications, this
362 is usually invoked via IDNA. Some applications, specifically non-IDN
363 ones, may want to prepare strings directly though, and should include
366 The name space of the stringprep part of Libidn is @code{stringprep*}
367 for function names, @code{Stringprep*} for data types and
368 @code{STRINGPREP_*} for other symbols. In addition the same name
369 prefixes with one prepended underscore are reserved for internal use
370 and should never be used by an application.
374 The entry point to Punycode encoding and decoding functions. Normally
375 punycode is used via the idna.h interface, but some applications may
376 want to perform raw punycode operations.
378 The name space of the punycode part of Libidn is @code{punycode_*} for
379 function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
380 for other symbols. In addition the same name prefixes with one
381 prepended underscore are reserved for internal use and should never be
382 used by an application.
386 The entry point to the IDNA functions. This is the normal entry point
387 for applications that need IDN functionality.
389 The name space of the IDNA part of Libidn is @code{idna_*} for
390 function names, @code{Idna*} for data types and @code{IDNA_*} for
391 other symbols. In addition the same name prefixes with one prepended
392 underscore are reserved for internal use and should never be used by
398 @section Initialization
400 Libidn is stateless and does not need any initialization.
403 @section Version Check
405 It is often desirable to check that the version of `Libidn' used is
406 indeed one which fits all requirements. Even with binary
407 compatibility, new features may have been introduced, but due to problems
408 with the dynamic linker an old version is actually used. So you may
409 want to check that the version is okay right after program startup.
411 @include libidn-api-version.texi
413 The normal way to use the function is to put something similar to the
414 following first in your @code{main()}:
417 if (!stringprep_check_version (STRINGPREP_VERSION))
419 printf ("stringprep_check_version() failed:\n"
420 "Header file incompatible with shared library.\n");
425 @node Building the source
426 @section Building the source
427 @cindex Compiling your application
429 If you want to compile a source file including e.g. the `idna.h' header
430 file, you must make sure that the compiler can find it in the
431 directory hierarchy. This is accomplished by adding the path to the
432 directory in which the header file is located to the compiler's include
433 file search path (via the @option{-I} option).
435 However, the path to the include file is determined at the time the
436 source is configured. To solve this problem, `Libidn' uses the
437 external package @command{pkg-config} that knows the path to the
438 include file and other configuration options. The options that need
439 to be added to the compiler invocation at compile time are output by
440 the @option{--cflags} option to @command{pkg-config libidn}. The
441 following example shows how it can be used at the command line:
444 gcc -c foo.c `pkg-config libidn --cflags`
447 Adding the output of @samp{pkg-config libidn --cflags} to the
448 compiler's command line will ensure that the compiler can find e.g. the
451 A similar problem occurs when linking the program with the library.
452 Again, the compiler has to find the library files. For this to work,
453 the path to the library files has to be added to the library search
454 path (via the @option{-L} option). For this, the option
455 @option{--libs} to @command{pkg-config libidn} can be used. For
456 convenience, this option also outputs all other options that are
457 required to link the program with the `libidn' library. The example
458 shows how to link @file{foo.o} with the `libidn' library to a program
462 gcc -o foo foo.o `pkg-config libidn --libs`
465 Of course you can also combine both examples into a single command by
466 specifying both options to @command{pkg-config}:
469 gcc -o foo foo.c `pkg-config libidn --cflags --libs`
472 @c **********************************************************
473 @c ****************** Stringprep Functions *****************
474 @c **********************************************************
475 @node Stringprep Functions
476 @chapter Stringprep Functions
477 @cindex Stringprep Functions
479 Stringprep describes a framework for preparing Unicode text strings in
480 order to increase the likelihood that string input and string
481 comparison work in ways that make sense for typical users throughout
482 the world. The stringprep protocol is useful for protocol identifier
483 values, company and personal names, internationalized domain names,
484 and other text strings.
486 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_NFKC
487 STRINGPREP_NO_NFKC disables the NFKC normalization, as well as
488 selecting the non-NFKC case folding tables. Usually the profile
489 specifies BIDI and NFKC settings.
492 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_BIDI
493 STRINGPREP_NO_BIDI disables the BIDI step. Usually the profile
494 specifies BIDI and NFKC settings.
497 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_UNASSIGNED
498 STRINGPREP_NO_UNASSIGNED causes stringprep() to abort with an error if the
499 string contains characters that are unassigned according to the profile.
502 @include libidn-api-stringprep.texi
504 @c **********************************************************
505 @c ******************* Punycode Functions ******************
506 @c **********************************************************
507 @node Punycode Functions
508 @chapter Punycode Functions
509 @cindex Punycode Functions
511 Punycode is a simple and efficient transfer encoding syntax designed
512 for use with Internationalized Domain Names in Applications. It
513 uniquely and reversibly transforms a Unicode string into an ASCII
514 string. ASCII characters in the Unicode string are represented
515 literally, and non-ASCII characters are represented by ASCII
516 characters that are allowed in host name labels (letters, digits, and
517 hyphens). The Punycode specification defines a general algorithm called Bootstring
518 that allows a string of basic code points to uniquely represent any
519 string of code points drawn from a larger set. Punycode is an instance
520 of Bootstring that uses particular parameter values specified by the same
521 document, appropriate for IDNA.
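
As an illustration of the encoding direction described above, the following
is a minimal, self-contained sketch of a Bootstring encoder using the
Punycode parameter values from RFC 3492. It is not the Libidn
implementation and does not use the Libidn API; the function name
@code{sketch_punycode_encode} and its signature are assumptions made for
this example only. Input is an array of Unicode code points, and case
annotation (mixed-case output) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Punycode parameter values for Bootstring (RFC 3492, section 5). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z' followed by '0'..'9'. */
static char encode_digit(uint32_t d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function (RFC 3492, section 6.1). */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;
    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper: encode n_in code points into the NUL-terminated
   buffer out.  Returns 0 on success, -1 on overflow or if out is too
   small. */
int sketch_punycode_encode(const uint32_t *in, size_t n_in,
                           char *out, size_t out_size)
{
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t h, b, j, o = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    /* Copy the basic (ASCII) code points literally. */
    for (j = 0; j < n_in; j++) {
        if (in[j] < 0x80) {
            if (o + 2 > out_size) return -1;
            out[o++] = (char)in[j];
        }
    }
    h = b = o;                  /* h: code points handled so far */
    if (b > 0) {
        if (o + 2 > out_size) return -1;
        out[o++] = '-';         /* delimiter after the basic code points */
    }

    while (h < n_in) {
        /* Find the smallest code point not yet handled. */
        uint32_t m = UINT32_MAX;
        for (j = 0; j < n_in; j++)
            if (in[j] >= n && in[j] < m) m = in[j];

        if (m - n > (UINT32_MAX - delta) / (h + 1)) return -1;
        delta += (m - n) * (uint32_t)(h + 1);
        n = m;

        for (j = 0; j < n_in; j++) {
            if (in[j] < n && ++delta == 0) return -1;
            if (in[j] == n) {
                /* Emit delta as a generalized variable-length integer. */
                uint32_t q = delta, k;
                for (k = BASE; ; k += BASE) {
                    uint32_t t = k <= bias ? TMIN
                               : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (o + 2 > out_size) return -1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o + 2 > out_size) return -1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[o] = '\0';
    return 0;
}
```

Calling the sketch with the six code points of the German word for
``books'' (U+0062, U+00FC, U+0063, U+0068, U+0065, U+0072) stores
@samp{bcher-kva} in the output buffer. Real applications should use the
Punycode functions documented in this chapter instead.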
523 @include libidn-api-punycode.texi
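
Because the transformation is reversible, the decoding direction can be
sketched in the same way. The code below is again a stand-alone
illustration based on RFC 3492, not the Libidn implementation; the name
@code{sketch_punycode_decode} and its calling convention (capacity passed
in and length returned through @code{out_len}) are assumptions made for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameter values for Bootstring (RFC 3492, section 5). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map 'a'..'z' / 'A'..'Z' to 0..25 and '0'..'9' to 26..35;
   return BASE for any other character. */
static uint32_t decode_digit(char c)
{
    if (c >= '0' && c <= '9') return (uint32_t)(c - '0') + 26;
    if (c >= 'a' && c <= 'z') return (uint32_t)(c - 'a');
    if (c >= 'A' && c <= 'Z') return (uint32_t)(c - 'A');
    return BASE;
}

/* Bias adaptation function (RFC 3492, section 6.1). */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;
    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper: decode the NUL-terminated Punycode string in
   into code points.  On entry *out_len is the capacity of out; on
   success it is set to the number of code points written.
   Returns 0 on success, -1 on bad input, overflow or lack of space. */
int sketch_punycode_decode(const char *in, uint32_t *out, size_t *out_len)
{
    size_t in_len, max_out, n_out = 0, b = 0, j, pos;
    uint32_t n = INITIAL_N, i = 0, bias = INITIAL_BIAS;

    assert(in != NULL && out != NULL && out_len != NULL);
    in_len = strlen(in);
    max_out = *out_len;

    /* Everything before the last delimiter is copied literally. */
    for (j = 0; j < in_len; j++)
        if (in[j] == '-') b = j;
    if (b > max_out) return -1;
    for (j = 0; j < b; j++) {
        if ((unsigned char)in[j] >= 0x80) return -1;
        out[n_out++] = (unsigned char)in[j];
    }

    /* Decode one generalized variable-length integer per code point. */
    pos = b > 0 ? b + 1 : 0;
    while (pos < in_len) {
        uint32_t oldi = i, w = 1, k;
        for (k = BASE; ; k += BASE) {
            uint32_t digit, t;
            if (pos >= in_len) return -1;
            digit = decode_digit(in[pos++]);
            if (digit >= BASE) return -1;
            if (digit > (UINT32_MAX - i) / w) return -1;
            i += digit * w;
            t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
            if (digit < t) break;
            if (w > UINT32_MAX / (BASE - t)) return -1;
            w *= BASE - t;
        }
        bias = adapt(i - oldi, (uint32_t)(n_out + 1), oldi == 0);

        /* i wraps around the output length, advancing n each time. */
        if (i / (n_out + 1) > UINT32_MAX - n) return -1;
        n += i / (uint32_t)(n_out + 1);
        i %= (uint32_t)(n_out + 1);

        /* Insert code point n at position i. */
        if (n_out >= max_out) return -1;
        memmove(out + i + 1, out + i, (n_out - i) * sizeof *out);
        out[i++] = n;
        n_out++;
    }
    *out_len = n_out;
    return 0;
}
```

Decoding @samp{bcher-kva} with this sketch yields six code points, with
U+00FC inserted at the second position.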
525 @c **********************************************************
526 @c ********************* IDNA Functions *********************
527 @c **********************************************************
529 @chapter IDNA Functions
530 @cindex IDNA Functions
532 Until now, there has been no standard method for domain names to use
533 characters outside the ASCII repertoire. The IDNA document defines
534 internationalized domain names (IDNs) and a mechanism called IDNA for
535 handling them in a standard fashion. IDNs use characters drawn from a
536 large repertoire (Unicode), but IDNA allows the non-ASCII characters
537 to be represented using only the ASCII characters already allowed in
538 so-called host names today. This backward-compatible representation is
539 required in existing protocols like DNS, so that IDNs can be
540 introduced with no changes to the existing infrastructure. IDNA is
541 only meant for processing domain names, not free text.
543 @include libidn-api-idna.texi
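
To make the backward-compatible representation more concrete: in the IDNA
specification (RFC 3490), a label that has been converted to its ASCII form
is marked with the ACE prefix @samp{xn--}, compared case-insensitively. The
following stand-alone sketch counts such labels in a dot-separated domain
name. It does not call Libidn and performs no validation or conversion; the
function name @code{sketch_count_ace_labels} is an assumption made for this
example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the label of length len starts with the ACE prefix
   "xn--" (case-insensitive), 0 otherwise. */
static int has_ace_prefix(const char *label, size_t len)
{
    static const char prefix[] = "xn--";
    size_t i;

    if (len < 4)
        return 0;
    for (i = 0; i < 4; i++)
        if (tolower((unsigned char)label[i]) != prefix[i])
            return 0;
    return 1;
}

/* Hypothetical helper: count the labels of a dot-separated domain
   name that carry the ACE prefix. */
size_t sketch_count_ace_labels(const char *domain)
{
    size_t count = 0;
    const char *start = domain;

    assert(domain != NULL);
    for (;;) {
        const char *dot = strchr(start, '.');
        size_t len = dot ? (size_t)(dot - start) : strlen(start);

        if (has_ace_prefix(start, len))
            count++;
        if (dot == NULL)
            break;
        start = dot + 1;        /* continue with the next label */
    }
    return count;
}
```

A name without any ACE label, such as an ordinary ASCII host name, needs no
ToUnicode processing; applications should still use the IDNA functions
documented in this chapter for the actual conversion.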
545 @c **********************************************************
546 @c *********************** Examples ***********************
547 @c **********************************************************
552 This chapter contains example code which illustrates how `Libidn' can
553 be used when writing your own application.
556 * Example 1:: Example using stringprep.
557 * Example 2:: Example using punycode.
558 * Example 3:: Example using IDNA ToASCII.
559 * Example 4:: Example using IDNA ToUnicode.
565 This example demonstrates how the stringprep functions are used.
568 @include example.c.texi
575 This example demonstrates how the punycode functions are used.
578 @include example2.c.texi
585 This example demonstrates how the library is used to convert
586 internationalized domain names into ASCII compatible names.
589 @include example3.c.texi
596 This example demonstrates how the library is used to convert ASCII
597 compatible names to internationalized domain names.
600 @include example4.c.texi
603 @c **********************************************************
604 @c ********************* Invoking idn *********************
605 @c **********************************************************
607 @chapter Invoking idn
610 @cindex invoking @command{idn}
615 GNU Libidn (idn) -- Internationalized Domain Names command line tool
617 @majorheading Description
618 @code{idn} is a utility part of GNU Libidn. It allows preparation of
619 strings, encoding and decoding of punycode data, and IDNA
620 ToASCII/ToUnicode operations to be performed on the command line,
621 without the need to write a program that uses libidn.
623 Data is read, line by line, from the standard input, and one of the
624 operations indicated by command parameters is performed and the
625 output is printed to standard output. If any errors are encountered,
626 the execution of the application is aborted.
628 @majorheading Options
629 @code{idn} recognizes these commands:
636 Print version and exit
639 Prepare string according to nameprep profile
642 Encode UTF-8 to Punycode
645 Decode Punycode to UTF-8
648 Convert UTF-8 to ACE according to IDNA
651 Convert ACE to UTF-8 according to IDNA
654 Toggle IDNA AllowUnassigned flag (default=off)
657 Toggle IDNA UseSTD3ASCIIRules flag (default=off)
659 -pSTRING --profile=STRING
660 Use specified stringprep profile instead
662 Valid stringprep profiles are 'generic', 'Nameprep',
663 'KRBprep', 'Nodeprep', 'Resourceprep', 'plain',
664 'SASLprep', and 'ISCSIprep'.
667 Print debugging information (default=off)
670 Don't print the welcome greeting (default=off)
673 @majorheading Environment Variables
675 The @var{CHARSET} environment variable can be used to override which
676 character set is used for decoding incoming data on the standard
677 input, and to encode data to the standard output. If your system is
678 set up correctly, the application will guess which character set is
679 used automatically. Example usage:
682 $ CHARSET=ISO-8859-1 idn --punycode-encode
689 Included in Libidn are @file{punycode.el} and @file{idna.el}, which
690 provide an Emacs Lisp API to (a limited set of) the Libidn API. This
691 section describes the API.
693 @defvar punycode-program
694 Name of the GNU Libidn @file{idn} application. The default is
695 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
698 @defvar punycode-encode-parameters
699 Parameters passed to @var{punycode-program} to invoke punycode
700 encoding mode. The default is @samp{--quiet --punycode-encode}. This
701 variable can be customized.
704 @defvar punycode-decode-parameters
705 Parameters passed to @var{punycode-program} to invoke punycode
706 decoding mode. The default is @samp{--quiet --punycode-decode}. This
707 variable can be customized.
710 @defun punycode-encode string
711 Returns a Punycode encoding of the @var{string}, after converting the
715 @defun punycode-decode string
716 Returns a possibly multibyte string which is the decoding of the
717 @var{string} which is a punycode encoded string.
721 Name of the GNU Libidn @file{idn} application. The default is
722 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
725 @defvar idna-to-ascii-parameters
726 Parameters passed to @var{idna-program} to invoke IDNA ToASCII mode.
727 The default is @samp{--quiet --idna-to-ascii}. This variable can be
731 @defvar idna-to-unicode-parameters
732 Parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
733 The default is @samp{--quiet --idna-to-unicode}. This variable can be
737 @defun idna-to-ascii string
738 Returns an ASCII Compatible Encoding (ACE) of the string computed by
739 the IDNA ToASCII operation on the input @var{string}, after converting
743 @defun idna-to-unicode string
744 Returns a possibly multibyte string which is the output of the IDNA
745 ToUnicode operation computed on the input @var{string}.
748 @c **********************************************************
749 @c ******************* Acknowledgements *******************
750 @c **********************************************************
751 @node Acknowledgements
752 @chapter Acknowledgements
754 The punycode code was taken from the IETF IDN Punycode specification,
757 Some functions (see nfkc.c and toutf8.c) have been borrowed from GLib
758 downloaded from www.gtk.org.
760 Several people reported bugs, sent patches or suggested improvements,
765 @node Copying This Manual
766 @appendix Copying This Manual
769 * GNU Free Documentation License:: License for copying this manual.
775 @unnumbered Concept Index
779 @node Function and Variable Index
780 @unnumbered Function and Variable Index