1 \input texinfo @c -*-texinfo-*-
2 @c This file is part of the GNU Libidn Manual.
3 @c Copyright (C) 2002, 2003 Simon Josefsson
4 @c See below for copying conditions.
6 @setfilename libidn.info
8 @settitle GNU Libidn @value{VERSION}
13 This manual is for GNU Libidn version @value{VERSION},
16 Copyright @copyright{} 2002, 2003 Simon Josefsson.
19 Permission is granted to copy, distribute and/or modify this document
20 under the terms of the GNU Free Documentation License, Version 1.1 or
21 any later version published by the Free Software Foundation; with no
22 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
23 and with the Back-Cover Texts as in (a) below. A copy of the
24 license is included in the section entitled ``GNU Free Documentation
27 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
28 this GNU Manual, like GNU software. Copies published by the Free
29 Software Foundation raise funds for GNU development.''
33 @dircategory GNU Libraries
35 * libidn: (libidn). Internationalized string processing library.
38 @dircategory GNU utilities
40 * idn: (libidn)Invoking idn. Command line interface to GNU Libidn.
45 * IDN Library: (libidn)Emacs API. Emacs API for IDN functions.
50 @subtitle for version @value{VERSION}, @value{UPDATED}
51 @author Simon Josefsson (@email{bug-libidn@@gnu.org})
53 @vskip 0pt plus 1filll
67 * Introduction:: How to use this manual.
68 * Preparation:: What you should do before using the library.
69 * Stringprep Functions:: Stringprep functions.
70 * Punycode Functions:: Punycode functions.
71 * IDNA Functions:: IDNA functions.
72 * Examples:: Demonstrate how to use the library.
73 * Acknowledgements:: Whom to blame.
75 * Invoking idn:: Command line interface to the library.
77 * Emacs API:: Emacs Lisp API for Libidn.
82 * Function and Variable Index::
86 * Library Copying:: How you can copy and share GNU Libidn.
87 * Copying This Manual:: How you can copy and share this manual.
95 GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
96 specifications defined by the IETF Internationalized Domain Names
97 (IDN) working group, used for internationalized domain names. The
98 package is available under the GNU Lesser General Public License.
100 The library contains a generic Stringprep implementation that does
101 Unicode 3.2 NFKC normalization, mapping and prohibition of
102 characters, and bidirectional character handling. Profiles for iSCSI,
103 Kerberos 5, Nameprep, SASL and XMPP are included. Punycode and ASCII
104 Compatible Encoding (ACE) via IDNA are supported.
106 The Stringprep API consists of two main functions, one for converting
107 data from the system's native representation into UTF-8, and one
108 function to perform the Stringprep processing. Adding a new
109 Stringprep profile for your application within the API is
110 straightforward. The Punycode API consists of one encoding function
111 and one decoding function. The IDNA API consists of the ToASCII and
112 ToUnicode functions, as well as a high-level interface for converting
113 entire domain names to and from the ACE encoded form.
115 The library is used by, e.g., GNU SASL and Shishi to process user
116 names and passwords. Libidn can be built into GNU Libc to enable a
117 new system-wide getaddrinfo() flag for IDN processing.
119 Libidn is developed for the GNU/Linux system, but runs on over 20 Unix
120 platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
121 Libidn is written in C and (parts of) the API is accessible from C,
122 C++, Emacs Lisp, Python and Java.
127 * Supported Platforms::
131 @node Getting Started
132 @section Getting Started
134 This manual documents the library programming interface. All
135 functions and data types provided by the library are explained.
137 The reader is assumed to possess basic familiarity with
138 internationalization concepts and network programming in C or C++.
140 This manual can be used in several ways. If read from the beginning
141 to the end, it gives a good introduction into the library and how it
142 can be used in an application. Forward references are included where
143 necessary. Later on, the manual can be used as a reference manual to
144 get just the information needed about any particular interface of the
145 library. Experienced programmers might want to start looking at the
146 examples at the end of the manual (@pxref{Examples}), and then only
147 read up those parts of the interface which are unclear.
152 This library might have a couple of advantages over other libraries
156 @item It's Free Software
157 Anybody can use, modify, and redistribute it under the terms of the
158 GNU Lesser General Public License.
160 @item It's thread-safe
161 No global state is kept in the library.
164 It should work on all Unix-like operating systems, and also on Windows.
168 @node Supported Platforms
169 @section Supported Platforms
171 Libidn has at some point in time been tested on the following
176 @item Debian GNU/Linux 3.0 (Woody)
179 GCC 2.95.4 and GNU Make. This is the main development platform.
180 @code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
181 @code{arm-unknown-linux-gnu}, @code{hppa-unknown-linux-gnu},
182 @code{hppa64-unknown-linux-gnu}, @code{i686-pc-linux-gnu},
183 @code{ia64-unknown-linux-gnu}, @code{m68k-unknown-linux-gnu},
184 @code{mips-unknown-linux-gnu}, @code{mipsel-unknown-linux-gnu},
185 @code{powerpc-unknown-linux-gnu}, @code{s390-ibm-linux-gnu},
186 @code{sparc-unknown-linux-gnu}.
188 @item Debian GNU/Linux 2.1
191 GCC 2.95.1 and GNU Make. @code{armv4l-unknown-linux-gnu}.
196 Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
197 @code{alphaev68-dec-osf5.1}.
202 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
203 @code{alphaev67-unknown-linux-gnu}.
205 @item SuSE Linux 7.2a
208 GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
210 @item RedHat Linux 7.2
213 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
214 @code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
216 @item RedHat Linux 8.0
219 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
221 @item RedHat Advanced Server 2.1
222 @cindex RedHat Advanced Server
224 GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
226 @item Slackware Linux 8.0.01
229 GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
231 @item Mandrake Linux 9.0
234 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
239 MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
244 IBM C for AIX compiler, AIX Make. @code{rs6000-ibm-aix4.3.2.0}.
246 @item Microsoft Windows 2000 (Cygwin)
249 GCC 3.2, GNU make. @code{i686-pc-cygwin}.
254 HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
255 @code{hppa2.0w-hp-hpux11.11}.
257 @item SUN Solaris 2.8
260 Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
265 GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
266 @code{i386-unknown-netbsdelf1.6}.
268 @item OpenBSD 3.1 and 3.2
271 GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
272 @code{i386-unknown-openbsd3.1}.
277 GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
278 @code{i386-unknown-freebsd4.7}.
282 If you use Libidn on, or port Libidn to, a new platform please report
287 @cindex Reporting Bugs
289 If you think you have found a bug in Libidn, please investigate it and
294 @item Please make sure that the bug is really in Libidn, and
295 preferably also check that it hasn't already been fixed in the latest
298 @item You have to send us a test case that makes it possible for us to
301 @item You also have to explain what is wrong: whether you get a crash, or
302 whether the printed results are incorrect and, if so, in what way.
303 Make sure that the bug report includes all information you would need
304 to fix this kind of bug for someone else.
308 Please make an effort to produce a self-contained report, with
309 something definite that can be tested or debugged. Vague queries or
310 piecemeal messages are difficult to act on and don't help the
313 If your bug report is good, we will do our best to help you to get a
314 corrected version of the software; if the bug report is poor, we won't
315 do anything about it (apart from asking you to send better bug
318 If you think something in this manual is unclear, or downright
319 incorrect, or if the language needs to be improved, please also send a
322 Send your bug report to:
324 @center @samp{bug-libidn@@gnu.org}
327 @c **********************************************************
328 @c ******************* Preparation ************************
329 @c **********************************************************
333 To use `Libidn', you have to make some changes to your sources and
334 the build system. The necessary changes are small and explained in
335 the following sections. The end of this chapter describes
336 how the library is initialized, and how the requirements of the
337 library are verified.
339 A faster way to find out how to adapt your application for use with
340 `Libidn' may be to look at the examples at the end of this manual
347 * Building the source::
353 The library contains a few independent parts, and each part exports its
354 interfaces (data types and functions) in a header file. You must
355 include the appropriate header files in all programs using the
356 library, either directly or through some other header file, like this:
359 #include <stringprep.h>
362 The header files and the functions they define are categorized as
368 The low-level stringprep API entry point. For IDN applications, this
369 is usually invoked via IDNA. Some applications, specifically non-IDN
370 ones, may want to prepare strings directly though, and should include
373 The name space of the stringprep part of Libidn is @code{stringprep*}
374 for function names, @code{Stringprep*} for data types and
375 @code{STRINGPREP_*} for other symbols. In addition the same name
376 prefixes with one prepended underscore are reserved for internal use
377 and should never be used by an application.
381 The entry point to Punycode encoding and decoding functions. Normally
382 punycode is used via the idna.h interface, but some applications may
383 want to perform raw punycode operations.
385 The name space of the punycode part of Libidn is @code{punycode_*} for
386 function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
387 for other symbols. In addition the same name prefixes with one
388 prepended underscore are reserved for internal use and should never be
389 used by an application.
393 The entry point to the IDNA functions. This is the normal entry point
394 for applications that need IDN functionality.
396 The name space of the IDNA part of Libidn is @code{idna_*} for
397 function names, @code{Idna*} for data types and @code{IDNA_*} for
398 other symbols. In addition the same name prefixes with one prepended
399 underscore are reserved for internal use and should never be used by
405 @section Initialization
407 Libidn is stateless and does not need any initialization.
410 @section Version Check
412 It is often desirable to check that the version of `Libidn' used is
413 indeed one which fits all requirements. Even with binary
414 compatibility, new features may have been introduced, but due to problems
415 with the dynamic linker an old version may actually be used. So you may
416 want to check that the version is okay right after program startup.
418 @include libidn-api-version.texi
420 The normal way to use the function is to put something similar to the
421 following first in your @code{main()}:
424 if (!stringprep_check_version (STRINGPREP_VERSION))
426 printf ("stringprep_check_version() failed:\n"
427 "Header file incompatible with shared library.\n");
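To make the idea of a minimum-version requirement more concrete, the
following self-contained sketch compares two dotted version strings
numerically, component by component.  The helper name
@code{version_at_least} is hypothetical and is not part of Libidn; it
only illustrates the kind of comparison involved, and makes no claim
about how the library implements its own check.  In a real program,
call @code{stringprep_check_version} as shown above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Libidn: returns 1 if the dotted
   version string HAVE is at least NEED, and 0 otherwise.  Components
   are compared as numbers, so "0.10" is newer than "0.9".  */
int
version_at_least (const char *have, const char *need)
{
  assert (have != NULL && need != NULL);

  for (;;)
    {
      char *have_end, *need_end;
      /* strtoul stops at the first non-digit, e.g. the next '.'.
         An exhausted string parses as 0.  */
      unsigned long h = strtoul (have, &have_end, 10);
      unsigned long n = strtoul (need, &need_end, 10);

      if (h != n)
        return h > n;

      /* Neither string has another component: the versions are equal.  */
      if (*have_end != '.' && *need_end != '.')
        return 1;

      /* Step over the separator where there is one.  */
      have = (*have_end == '.') ? have_end + 1 : have_end;
      need = (*need_end == '.') ? need_end + 1 : need_end;
    }
}
```

A missing trailing component counts as zero, so @code{"1.0"} satisfies a
requirement of @code{"1"}.  Suffixes such as @code{-rc1} are ignored by
this sketch.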
432 @node Building the source
433 @section Building the source
434 @cindex Compiling your application
436 If you want to compile a source file including e.g. the `idna.h' header
437 file, you must make sure that the compiler can find it in the
438 directory hierarchy. This is accomplished by adding the path to the
439 directory in which the header file is located to the compiler's include
440 file search path (via the @option{-I} option).
442 However, the path to the include file is determined at the time the
443 source is configured. To solve this problem, `Libidn' uses the
444 external package @command{pkg-config} that knows the path to the
445 include file and other configuration options. The options that need
446 to be added to the compiler invocation at compile time are output by
447 the @option{--cflags} option to @command{pkg-config libidn}. The
448 following example shows how it can be used at the command line:
451 gcc -c foo.c `pkg-config libidn --cflags`
454 Adding the output of @samp{pkg-config libidn --cflags} to the
455 compiler's command line will ensure that the compiler can find e.g. the
458 A similar problem occurs when linking the program with the library.
459 Again, the compiler has to find the library files. For this to work,
460 the path to the library files has to be added to the library search
461 path (via the @option{-L} option). For this, the option
462 @option{--libs} to @command{pkg-config libidn} can be used. For
463 convenience, this option also outputs all other options that are
464 required to link the program with the `libidn' library. The example
465 shows how to link @file{foo.o} with the `libidn' library to a program
469 gcc -o foo foo.o `pkg-config libidn --libs`
472 Of course you can also combine both examples to a single command by
473 specifying both options to @command{pkg-config}:
476 gcc -o foo foo.c `pkg-config libidn --cflags --libs`
479 @c **********************************************************
480 @c ****************** Stringprep Functions *****************
481 @c **********************************************************
482 @node Stringprep Functions
483 @chapter Stringprep Functions
484 @cindex Stringprep Functions
486 Stringprep describes a framework for preparing Unicode text strings in
487 order to increase the likelihood that string input and string
488 comparison work in ways that make sense for typical users throughout
489 the world. The stringprep protocol is useful for protocol identifier
490 values, company and personal names, internationalized domain names,
491 and other text strings.
493 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_NFKC
494 STRINGPREP_NO_NFKC disables the NFKC normalization, as well as
495 selecting the non-NFKC case folding tables. Usually the profile
496 specifies BIDI and NFKC settings.
499 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_BIDI
500 STRINGPREP_NO_BIDI disables the BIDI step. Usually the profile
501 specifies BIDI and NFKC settings.
504 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_UNASSIGNED
505 STRINGPREP_NO_UNASSIGNED causes stringprep() to abort with an error if
506 the string contains unassigned characters according to the profile.
509 @include libidn-api-stringprep.texi
511 @c **********************************************************
512 @c ******************* Punycode Functions ******************
513 @c **********************************************************
514 @node Punycode Functions
515 @chapter Punycode Functions
516 @cindex Punycode Functions
518 Punycode is a simple and efficient transfer encoding syntax designed
519 for use with Internationalized Domain Names in Applications. It
520 uniquely and reversibly transforms a Unicode string into an ASCII
521 string. ASCII characters in the Unicode string are represented
522 literally, and non-ASCII characters are represented by ASCII
523 characters that are allowed in host name labels (letters, digits, and
524 hyphens). The Punycode specification (RFC 3492) defines a general
525 algorithm called Bootstring that allows a string of basic code points
526 to uniquely represent any string of code points drawn from a larger
527 set. Punycode is an instance of Bootstring that uses particular
528 parameter values specified there, appropriate for IDNA.
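As an illustration of the encoding described above, here is a compact,
self-contained sketch of the Bootstring encoder using the Punycode
parameter values from RFC 3492.  It is not the Libidn implementation:
the function name @code{punycode_encode_sketch} and its signature are
assumptions made for this example, and arithmetic overflow checks are
omitted for brevity.  Applications should use the library functions
documented in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Punycode parameter values from RFC 3492, section 5.  */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z', '0'..'9'.  */
static char
encode_digit (uint32_t d)
{
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation, RFC 3492 section 6.1.  */
static uint32_t
adapt (uint32_t delta, uint32_t numpoints, int firsttime)
{
  uint32_t k = 0;
  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  for (; delta > ((BASE - TMIN) * TMAX) / 2; k += BASE)
    delta /= BASE - TMIN;
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical sketch: encode LEN code points from IN into OUT, which
   has room for OUTSIZE bytes including the terminating NUL.  Returns 0
   on success and -1 if OUT is too small.  No overflow checks.  */
int
punycode_encode_sketch (const uint32_t *in, size_t len,
                        char *out, size_t outsize)
{
  uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  size_t o = 0, h, b, i;

  assert (out != NULL && (in != NULL || len == 0));
  if (outsize == 0)
    return -1;

  /* Basic (ASCII) code points are copied literally.  */
  for (i = 0; i < len; i++)
    if (in[i] < 0x80)
      {
        if (o + 1 >= outsize)
          return -1;
        out[o++] = (char) in[i];
      }
  h = b = o;
  if (b > 0)
    {
      if (o + 1 >= outsize)
        return -1;
      out[o++] = '-';           /* Delimiter after the basic part.  */
    }

  while (h < len)
    {
      /* Find the smallest code point not yet handled.  */
      uint32_t m = UINT32_MAX;
      for (i = 0; i < len; i++)
        if (in[i] >= n && in[i] < m)
          m = in[i];
      delta += (m - n) * (uint32_t) (h + 1);
      n = m;

      for (i = 0; i < len; i++)
        {
          if (in[i] < n)
            delta++;
          if (in[i] == n)
            {
              /* Emit delta as a generalized variable-length integer.  */
              uint32_t q = delta, k;
              for (k = BASE;; k += BASE)
                {
                  uint32_t t = k <= bias ? TMIN
                    : k >= bias + TMAX ? TMAX : k - bias;
                  if (q < t)
                    break;
                  if (o + 1 >= outsize)
                    return -1;
                  out[o++] = encode_digit (t + (q - t) % (BASE - t));
                  q = (q - t) / (BASE - t);
                }
              if (o + 1 >= outsize)
                return -1;
              out[o++] = encode_digit (q);
              bias = adapt (delta, (uint32_t) (h + 1), h == b);
              delta = 0;
              h++;
            }
        }
      delta++;
      n++;
    }
  out[o] = '\0';
  return 0;
}
```

The sketch works on Unicode code points rather than UTF-8 bytes, so the
caller is expected to decode the input first.  The IDNA ACE prefix is
not part of Punycode itself and is therefore not added here.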
530 @include libidn-api-punycode.texi
532 @c **********************************************************
533 @c ********************* IDNA Functions *********************
534 @c **********************************************************
536 @chapter IDNA Functions
537 @cindex IDNA Functions
539 Until now, there has been no standard method for domain names to use
540 characters outside the ASCII repertoire. The IDNA document defines
541 internationalized domain names (IDNs) and a mechanism called IDNA for
542 handling them in a standard fashion. IDNs use characters drawn from a
543 large repertoire (Unicode), but IDNA allows the non-ASCII characters
544 to be represented using only the ASCII characters already allowed in
545 so-called host names today. This backward-compatible representation is
546 required in existing protocols like DNS, so that IDNs can be
547 introduced with no changes to the existing infrastructure. IDNA is
548 only meant for processing domain names, not free text.
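The backward-compatible representation is applied label by label, and
an encoded label is recognizable by the ACE prefix @code{xn--} defined
by IDNA.  The sketch below walks over the labels of a domain name and
counts those that carry the prefix.  The helper names
@code{label_has_ace_prefix} and @code{count_ace_labels} are hypothetical
and not part of Libidn; the sketch only inspects the prefix and does not
validate or decode the label.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the label of length LEN starting
   at LABEL begins with the IDNA ACE prefix "xn--", compared without
   regard to case, and 0 otherwise.  */
int
label_has_ace_prefix (const char *label, size_t len)
{
  static const char prefix[] = "xn--";
  size_t i;

  assert (label != NULL);
  if (len < sizeof prefix - 1)
    return 0;
  for (i = 0; i < sizeof prefix - 1; i++)
    if (tolower ((unsigned char) label[i]) != prefix[i])
      return 0;
  return 1;
}

/* Hypothetical helper: counts the labels in DOMAIN, separated by '.',
   that carry the ACE prefix.  */
size_t
count_ace_labels (const char *domain)
{
  size_t count = 0;

  assert (domain != NULL);
  while (*domain != '\0')
    {
      /* Length of the current label, up to the next dot or the end.  */
      size_t len = strcspn (domain, ".");
      count += (size_t) label_has_ace_prefix (domain, len);
      domain += len;
      if (*domain == '.')
        domain++;
    }
  return count;
}
```

A label that carries the prefix is only a candidate for the ToUnicode
operation; whether it decodes to a valid internationalized label is for
the IDNA functions to decide.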
550 @include libidn-api-idna.texi
552 @c **********************************************************
553 @c *********************** Examples ***********************
554 @c **********************************************************
559 This chapter contains example code which illustrates how `Libidn' can
560 be used when writing your own application.
563 * Example 1:: Example using stringprep.
564 * Example 2:: Example using punycode.
565 * Example 3:: Example using IDNA ToASCII.
566 * Example 4:: Example using IDNA ToUnicode.
572 This example demonstrates how the stringprep functions are used.
575 @include example.c.texi
582 This example demonstrates how the punycode functions are used.
585 @include example2.c.texi
592 This example demonstrates how the library is used to convert
593 internationalized domain names into ASCII compatible names.
596 @include example3.c.texi
603 This example demonstrates how the library is used to convert ASCII
604 compatible names to internationalized domain names.
607 @include example4.c.texi
610 @c **********************************************************
611 @c ********************* Invoking idn *********************
612 @c **********************************************************
614 @chapter Invoking idn
617 @cindex invoking @command{idn}
622 GNU Libidn (idn) -- Internationalized Domain Names command line tool
624 @majorheading Description
625 @code{idn} is a utility part of GNU Libidn. It allows preparation of
626 strings, encoding and decoding of punycode data, and IDNA
627 ToASCII/ToUnicode operations to be performed on the command line,
628 without the need to write a program that uses libidn.
630 Data is read, line by line, from the standard input, and the
631 operation indicated by the command parameters is performed; the
632 output is printed to standard output. If any errors are encountered,
633 the execution of the application is aborted.
635 @majorheading Options
636 @code{idn} recognizes these commands:
643 Print version and exit
646 Prepare string according to nameprep profile
649 Encode UTF-8 to Punycode
652 Decode Punycode to UTF-8
655 Convert UTF-8 to ACE according to IDNA
658 Convert ACE to UTF-8 according to IDNA
661 Toggle IDNA AllowUnassigned flag (default=off)
664 Toggle IDNA UseSTD3ASCIIRules flag (default=off)
666 -pSTRING --profile=STRING
667 Use specified stringprep profile instead
669 Valid stringprep profiles are 'generic', 'Nameprep',
670 'KRBprep', 'Nodeprep', 'Resourceprep', 'plain',
671 'SASLprep', and 'ISCSIprep'.
674 Print debugging information (default=off)
677 Don't print the welcome greeting (default=off)
680 @majorheading Environment Variables
682 The @var{CHARSET} environment variable can be used to override the
683 character set used for decoding incoming data on the standard
684 input, and for encoding data to the standard output. If your system is
685 set up correctly, the application will guess which character set is
686 used automatically. Example usage:
689 $ CHARSET=ISO-8859-1 idn --punycode-encode
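The lookup order described above, an explicit @var{CHARSET} override
first and the system's own setting otherwise, can be sketched in a few
lines of C.  The helper name @code{guess_charset} and its fallback
argument are assumptions made for this illustration and are not
Libidn's API; the use of the POSIX @code{nl_langinfo} call for the
automatic guess is likewise an assumption about how such a guess can be
made.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Libidn: returns the name of the
   character set to use.  A non-empty CHARSET environment variable
   wins; otherwise the codeset of the current locale is used; FALLBACK
   is returned if neither is available.  Call setlocale (LC_ALL, "")
   beforehand so that nl_langinfo reflects the user's locale.  */
const char *
guess_charset (const char *fallback)
{
  const char *charset;

  assert (fallback != NULL);

  charset = getenv ("CHARSET");
  if (charset != NULL && *charset != '\0')
    return charset;

  charset = nl_langinfo (CODESET);
  if (charset != NULL && *charset != '\0')
    return charset;

  return fallback;
}
```

The returned pointer refers to storage owned by the environment or the
C library, so the caller should copy the string if it needs to keep it
across later calls that may change the environment or the locale.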
696 Included in Libidn are @file{punycode.el} and @file{idna.el}, which
697 provide an Emacs Lisp API to (a limited set of) the Libidn API. This
698 section describes the API.
700 @defvar punycode-program
701 Name of the GNU Libidn @file{idn} application. The default is
702 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
705 @defvar punycode-encode-parameters
706 Parameters passed to @var{punycode-program} to invoke punycode
707 encoding mode. The default is @samp{--quiet --punycode-encode}. This
708 variable can be customized.
711 @defvar punycode-decode-parameters
712 Parameters passed to @var{punycode-program} to invoke punycode
713 decoding mode. The default is @samp{--quiet --punycode-decode}. This
714 variable can be customized.
717 @defun punycode-encode string
718 Returns a Punycode encoding of the @var{string}, after converting the
722 @defun punycode-decode string
723 Returns a possibly multibyte string which is the decoding of
724 @var{string}, which must be a Punycode encoded string.
728 Name of the GNU Libidn @file{idn} application. The default is
729 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
732 @defvar idna-to-ascii-parameters
733 Parameters passed to @var{idna-program} to invoke IDNA ToASCII mode.
734 The default is @samp{--quiet --idna-to-ascii}. This variable can be
738 @defvar idna-to-unicode-parameters
739 Parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
740 The default is @samp{--quiet --idna-to-unicode}. This variable can be
744 @defun idna-to-ascii string
745 Returns an ASCII Compatible Encoding (ACE) of the string computed by
746 the IDNA ToASCII operation on the input @var{string}, after converting
750 @defun idna-to-unicode string
751 Returns a possibly multibyte string which is the output of the IDNA
752 ToUnicode operation computed on the input @var{string}.
755 @c **********************************************************
756 @c ******************* Acknowledgements *******************
757 @c **********************************************************
758 @node Acknowledgements
759 @chapter Acknowledgements
761 The punycode code was taken from the IETF IDN Punycode specification,
764 Some functions (see nfkc.c and toutf8.c) have been borrowed from GLib
765 downloaded from www.gtk.org.
767 Several people reported bugs, sent patches or suggested improvements,
771 @unnumbered Concept Index
775 @node Function and Variable Index
776 @unnumbered Function and Variable Index
782 @node Copying This Manual
783 @appendix Copying This Manual
786 * GNU Free Documentation License:: License for copying this manual.