1 \input texinfo @c -*-texinfo-*-
2 @c This file is part of the GNU Libidn Manual.
3 @c Copyright (C) 2002, 2003 Simon Josefsson
4 @c See below for copying conditions.
6 @setfilename libidn.info
8 @settitle GNU Libidn @value{VERSION}
13 This manual is for GNU Libidn version @value{VERSION},
16 Copyright @copyright{} 2002, 2003 Simon Josefsson.
19 Permission is granted to copy, distribute and/or modify this document
20 under the terms of the GNU Free Documentation License, Version 1.1 or
21 any later version published by the Free Software Foundation; with no
22 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
23 and with the Back-Cover Texts as in (a) below. A copy of the
24 license is included in the section entitled ``GNU Free Documentation
27 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
28 this GNU Manual, like GNU software. Copies published by the Free
29 Software Foundation raise funds for GNU development.''
33 @dircategory GNU Libraries
35 * libidn: (libidn). Internationalized string processing library.
38 @dircategory GNU utilities
40 * idn: (libidn)Invoking idn. Command line interface to GNU Libidn.
45 * IDN Library: (libidn)Emacs API. Emacs API for IDN functions.
50 @subtitle for version @value{VERSION}, @value{UPDATED}
51 @author Simon Josefsson (@email{bug-libidn@@gnu.org})
53 @vskip 0pt plus 1filll
67 * Introduction:: How to use this manual.
68 * Preparation:: What you should do before using the library.
69 * Stringprep Functions:: Stringprep functions.
70 * Punycode Functions:: Punycode functions.
71 * IDNA Functions:: IDNA functions.
72 * Examples:: Demonstrate how to use the library.
73 * Acknowledgements:: Whom to blame.
75 * Invoking idn:: Command line interface to the library.
77 * Emacs API:: Emacs Lisp API for Libidn.
82 * Function and Variable Index::
86 * Library Copying:: How you can copy and share GNU Libidn.
87 * Copying This Manual:: How you can copy and share this manual.
95 GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
96 specifications defined by the IETF Internationalized Domain Names
97 (IDN) working group, used for internationalized domain names. The
98 package is available under the GNU Lesser General Public License.
100 The library contains a generic Stringprep implementation that does
101 Unicode 3.2 NFKC normalization, mapping and prohibition of
102 characters, and bidirectional character handling. Profiles for iSCSI,
103 Kerberos 5, Nameprep, SASL and XMPP are included. Punycode and ASCII
104 Compatible Encoding (ACE) via IDNA are supported.
106 The Stringprep API consists of two main functions, one for converting
107 data from the system's native representation into UTF-8, and one
108 function to perform the Stringprep processing. Adding a new
109 Stringprep profile for your application within the API is
110 straightforward. The Punycode API consists of one encoding function
111 and one decoding function. The IDNA API consists of the ToASCII and
112 ToUnicode functions, as well as a high-level interface for converting
113 entire domain names to and from the ACE encoded form.
115 The library is used by, e.g., GNU SASL and Shishi to process user
116 names and passwords. Libidn can be built into GNU Libc to enable a
117 new system-wide getaddrinfo() flag for IDN processing.
119 GNU Libidn is developed for the GNU/Linux system, but runs on over 20
120 Unix platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
125 * Supported Platforms::
129 @node Getting Started
130 @section Getting Started
132 This manual documents the library programming interface. All
133 functions and data types provided by the library are explained.
135 The reader is assumed to possess basic familiarity with
136 internationalization concepts and network programming in C or C++.
138 This manual can be used in several ways. If read from the beginning
139 to the end, it gives a good introduction to the library and how it
140 can be used in an application. Forward references are included where
141 necessary. Later on, the manual can be used as a reference manual to
142 get just the information needed about any particular interface of the
143 library. Experienced programmers might want to start looking at the
144 examples at the end of the manual (@pxref{Examples}), and then only
145 read up on those parts of the interface which are unclear.
150 This library might have a couple of advantages over other libraries
154 @item It's Free Software
155 Anybody can use, modify, and redistribute it under the terms of the
156 GNU Lesser General Public License.
158 @item It's thread-safe
159 No global state is kept in the library.
162 It should work on all Unix-like operating systems, and on Windows.
166 @node Supported Platforms
167 @section Supported Platforms
169 Libidn has at some point in time been tested on the following
174 @item Debian GNU/Linux 3.0 (Woody)
177 GCC 2.95.4 and GNU Make. This is the main development platform.
178 @code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
179 @code{arm-unknown-linux-gnu}, @code{hppa-unknown-linux-gnu},
180 @code{hppa64-unknown-linux-gnu}, @code{i686-pc-linux-gnu},
181 @code{ia64-unknown-linux-gnu}, @code{m68k-unknown-linux-gnu},
182 @code{mips-unknown-linux-gnu}, @code{mipsel-unknown-linux-gnu},
183 @code{powerpc-unknown-linux-gnu}, @code{s390-ibm-linux-gnu},
184 @code{sparc-unknown-linux-gnu}.
186 @item Debian GNU/Linux 2.1
189 GCC 2.95.1 and GNU Make. @code{armv4l-unknown-linux-gnu}.
194 Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
195 @code{alphaev68-dec-osf5.1}.
200 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
201 @code{alphaev67-unknown-linux-gnu}.
203 @item SuSE Linux 7.2a
206 GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
208 @item RedHat Linux 7.2
211 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
212 @code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
214 @item RedHat Linux 8.0
217 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
219 @item RedHat Advanced Server 2.1
220 @cindex RedHat Advanced Server
222 GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
224 @item Slackware Linux 8.0.01
227 GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
229 @item Mandrake Linux 9.0
232 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
237 MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
242 IBM C for AIX compiler, AIX Make. @code{rs6000-ibm-aix4.3.2.0}.
244 @item Microsoft Windows 2000 (Cygwin)
247 GCC 3.2, GNU make. @code{i686-pc-cygwin}.
252 HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
253 @code{hppa2.0w-hp-hpux11.11}.
255 @item SUN Solaris 2.8
258 Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
263 GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
264 @code{i386-unknown-netbsdelf1.6}.
266 @item OpenBSD 3.1 and 3.2
269 GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
270 @code{i386-unknown-openbsd3.1}.
275 GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
276 @code{i386-unknown-freebsd4.7}.
280 If you use Libidn on, or port Libidn to, a new platform please report
285 @cindex Reporting Bugs
287 If you think you have found a bug in Libidn, please investigate it and
292 @item Please make sure that the bug is really in Libidn, and
293 preferably also check that it hasn't already been fixed in the latest
296 @item You have to send us a test case that makes it possible for us to
299 @item You also have to explain what is wrong: whether you get a crash, or
300 whether the printed results are incorrect and, if so, in what way.
301 Make sure that the bug report includes all information you would need
302 to fix this kind of bug for someone else.
306 Please make an effort to produce a self-contained report, with
307 something definite that can be tested or debugged. Vague queries or
308 piecemeal messages are difficult to act on and don't help the
311 If your bug report is good, we will do our best to help you to get a
312 corrected version of the software; if the bug report is poor, we won't
313 do anything about it (apart from asking you to send better bug
316 If you think something in this manual is unclear, or downright
317 incorrect, or if the language needs to be improved, please also send a
320 Send your bug report to:
322 @center @samp{bug-libidn@@gnu.org}
325 @c **********************************************************
326 @c ******************* Preparation ************************
327 @c **********************************************************
331 To use `Libidn', you have to make some changes to your sources and
332 the build system. The necessary changes are small and explained in
333 the following sections. The end of this chapter describes how the
334 library is initialized, and how the requirements of the library are
335 verified.
337 A faster way to find out how to adapt your application for use with
338 `Libidn' may be to look at the examples at the end of this manual
345 * Building the source::
351 The library contains a few independent parts, and each part exports its
352 interfaces (data types and functions) in a header file. You must
353 include the appropriate header files in all programs using the
354 library, either directly or through some other header file, like this:
357 #include <stringprep.h>
360 The header files and the functions they define are categorized as
366 The low-level stringprep API entry point. For IDN applications, this
367 is usually invoked via IDNA. Some applications, specifically non-IDN
368 ones, may want to prepare strings directly though, and should include
371 The name space of the stringprep part of Libidn is @code{stringprep*}
372 for function names, @code{Stringprep*} for data types and
373 @code{STRINGPREP_*} for other symbols. In addition the same name
374 prefixes with one prepended underscore are reserved for internal use
375 and should never be used by an application.
379 The entry point to Punycode encoding and decoding functions. Normally
380 punycode is used via the idna.h interface, but some applications may
381 want to perform raw punycode operations.
383 The name space of the punycode part of Libidn is @code{punycode_*} for
384 function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
385 for other symbols. In addition the same name prefixes with one
386 prepended underscore are reserved for internal use and should never be
387 used by an application.
391 The entry point to the IDNA functions. This is the normal entry point
392 for applications that need IDN functionality.
394 The name space of the IDNA part of Libidn is @code{idna_*} for
395 function names, @code{Idna*} for data types and @code{IDNA_*} for
396 other symbols. In addition the same name prefixes with one prepended
397 underscore are reserved for internal use and should never be used by
403 @section Initialization
405 Libidn is stateless and does not need any initialization.
408 @section Version Check
410 It is often desirable to check that the version of `Libidn' used is
411 indeed one which fits all requirements. Even with binary
412 compatibility, new features may have been introduced, but due to problems
413 with the dynamic linker an old version may actually be used. So you may
414 want to check that the version is okay right after program startup.
416 @include libidn-api-version.texi
418 The normal way to use the function is to put something similar to the
419 following first in your @code{main()}:
422 if (!stringprep_check_version (STRINGPREP_VERSION))
424 printf ("stringprep_check_version() failed:\n"
425 "Header file incompatible with shared library.\n");
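What such a check amounts to can be sketched without the library: compare, component by component, the version a program requires with the version found at run time. The helper @code{sketch_version_at_least} below is hypothetical and not part of Libidn, and this manual does not specify how @code{stringprep_check_version} compares versions internally, so treat the code only as an illustration of the idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse the next numeric component of a dotted version string and
   advance *s past it (and past one following '.').  If no digits are
   found, the rest of the string is skipped and 0 is returned. */
static long next_component(const char **s)
{
    char *end;
    long v = strtol(*s, &end, 10);

    if (end == *s) {              /* no digits: treat the rest as empty */
        *s += strlen(*s);
        return 0;
    }
    *s = (*end == '.') ? end + 1 : end;
    return v;
}

/* Hypothetical helper, not part of Libidn: return 1 if the dotted
   version `have` is at least `need`, else 0.  Missing components
   count as 0, so "0.3" is treated like "0.3.0". */
int sketch_version_at_least(const char *have, const char *need)
{
    assert(have != NULL && need != NULL);

    while (*have != '\0' || *need != '\0') {
        long h = next_component(&have);
        long n = next_component(&need);

        if (h != n)
            return h > n;
    }
    return 1;                     /* all components equal */
}
```

With this helper, a program built against version 0.3.1 would accept a run-time version of 0.3.1 or 1.10 and reject 0.3. In real code, call @code{stringprep_check_version} instead.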
430 @node Building the source
431 @section Building the source
432 @cindex Compiling your application
434 If you want to compile a source file including e.g. the `idna.h' header
435 file, you must make sure that the compiler can find it in the
436 directory hierarchy. This is accomplished by adding the path to the
437 directory in which the header file is located to the compiler's include
438 file search path (via the @option{-I} option).
440 However, the path to the include file is determined at the time the
441 source is configured. To solve this problem, `Libidn' uses the
442 external package @command{pkg-config} that knows the path to the
443 include file and other configuration options. The options that need
444 to be added to the compiler invocation at compile time are output by
445 the @option{--cflags} option to @command{pkg-config libidn}. The
446 following example shows how it can be used at the command line:
449 gcc -c foo.c `pkg-config libidn --cflags`
452 Adding the output of @samp{pkg-config libidn --cflags} to the
453 compiler's command line will ensure that the compiler can find e.g. the
456 A similar problem occurs when linking the program with the library.
457 Again, the compiler has to find the library files. For this to work,
458 the path to the library files has to be added to the library search
459 path (via the @option{-L} option). For this, the option
460 @option{--libs} to @command{pkg-config libidn} can be used. For
461 convenience, this option also outputs all other options that are
462 required to link the program with the `libidn' library. The example
463 shows how to link @file{foo.o} with the `libidn' library to a program
467 gcc -o foo foo.o `pkg-config libidn --libs`
470 Of course you can also combine both examples into a single command by
471 specifying both options to @command{pkg-config}:
474 gcc -o foo foo.c `pkg-config libidn --cflags --libs`
477 @c **********************************************************
478 @c ****************** Stringprep Functions *****************
479 @c **********************************************************
480 @node Stringprep Functions
481 @chapter Stringprep Functions
482 @cindex Stringprep Functions
484 Stringprep describes a framework for preparing Unicode text strings in
485 order to increase the likelihood that string input and string
486 comparison work in ways that make sense for typical users throughout
487 the world. The stringprep protocol is useful for protocol identifier
488 values, company and personal names, internationalized domain names,
489 and other text strings.
491 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_NFKC
492 STRINGPREP_NO_NFKC disables the NFKC normalization, as well as
493 selecting the non-NFKC case folding tables. Usually the profile
494 specifies BIDI and NFKC settings.
497 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_BIDI
498 STRINGPREP_NO_BIDI disables the BIDI step. Usually the profile
499 specifies BIDI and NFKC settings.
502 @defcv {Enumerated type} Stringprep_profile_flags STRINGPREP_NO_UNASSIGNED
503 STRINGPREP_NO_UNASSIGNED causes stringprep() to abort with an error if
504 the string contains unassigned characters according to the profile.
507 @include libidn-api-stringprep.texi
509 @c **********************************************************
510 @c ******************* Punycode Functions ******************
511 @c **********************************************************
512 @node Punycode Functions
513 @chapter Punycode Functions
514 @cindex Punycode Functions
516 Punycode is a simple and efficient transfer encoding syntax designed
517 for use with Internationalized Domain Names in Applications. It
518 uniquely and reversibly transforms a Unicode string into an ASCII
519 string. ASCII characters in the Unicode string are represented
520 literally, and non-ASCII characters are represented by ASCII
521 characters that are allowed in host name labels (letters, digits, and
522 hyphens). The Punycode specification defines a general algorithm called
523 Bootstring that allows a string of basic code points to uniquely represent
524 any string of code points drawn from a larger set. Punycode is an instance
525 of Bootstring that uses particular parameter values specified by that
526 specification, appropriate for IDNA.
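As a rough illustration of how Bootstring works with the Punycode parameters, the following sketch encodes an array of Unicode code points into an ASCII string. It is not the Libidn implementation: the function name @code{sketch_punycode_encode} and its signature are invented for this example, the parameter values are taken from RFC 3492 rather than from this manual, and the optional case annotation is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bootstring parameters for Punycode, as given in RFC 3492. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z' followed by '0'..'9'. */
static char encode_digit(uint32_t d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function of Bootstring. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;

    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper, not the Libidn API: encode n code points from
   in[] as a NUL-terminated ASCII string in out[].  Returns 0 on
   success, -1 if out[] is too small or on arithmetic overflow. */
int sketch_punycode_encode(const uint32_t *in, size_t n,
                           char *out, size_t outsize)
{
    uint32_t cp = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t o = 0, h, b, j;

    assert(in != NULL && out != NULL);

    /* Basic (ASCII) code points are copied literally, in order. */
    for (j = 0; j < n; j++) {
        if (in[j] < 0x80) {
            if (o + 1 >= outsize) return -1;
            out[o++] = (char)in[j];
        }
    }
    h = b = o;                     /* h: code points handled so far */
    if (b > 0) {
        if (o + 1 >= outsize) return -1;
        out[o++] = '-';            /* delimiter after the basic part */
    }

    while (h < n) {
        /* Find the smallest code point not yet handled. */
        uint32_t m = UINT32_MAX;
        for (j = 0; j < n; j++)
            if (in[j] >= cp && in[j] < m) m = in[j];

        if (m - cp > (UINT32_MAX - delta) / (h + 1)) return -1;
        delta += (m - cp) * (uint32_t)(h + 1);
        cp = m;

        for (j = 0; j < n; j++) {
            if (in[j] < cp && ++delta == 0) return -1;
            if (in[j] == cp) {
                /* Emit delta as a variable-length base-36 integer. */
                uint32_t q = delta, k;
                for (k = BASE; ; k += BASE) {
                    uint32_t t = k <= bias ? TMIN
                               : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (o + 1 >= outsize) return -1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o + 1 >= outsize) return -1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        cp++;
    }
    if (o >= outsize) return -1;
    out[o] = '\0';
    return 0;
}
```

For example, the six code points b, U+00FC, c, h, e, r encode to @samp{bcher-kva}: the ASCII letters appear literally before the delimiter, and the digits after it record where U+00FC has to be inserted.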
528 @include libidn-api-punycode.texi
530 @c **********************************************************
531 @c ********************* IDNA Functions *********************
532 @c **********************************************************
534 @chapter IDNA Functions
535 @cindex IDNA Functions
537 Before IDNA, there was no standard method for domain names to use
538 characters outside the ASCII repertoire. The IDNA specification defines
539 internationalized domain names (IDNs) and a mechanism called IDNA for
540 handling them in a standard fashion. IDNs use characters drawn from a
541 large repertoire (Unicode), but IDNA allows the non-ASCII characters
542 to be represented using only the ASCII characters already allowed in
543 so-called host names today. This backward-compatible representation is
544 required in existing protocols like DNS, so that IDNs can be
545 introduced with no changes to the existing infrastructure. IDNA is
546 only meant for processing domain names, not free text.
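To make the label-by-label nature of IDNA concrete, the sketch below walks the dot-separated labels of a UTF-8 domain name and reports which ones contain non-ASCII bytes and would therefore need the ToASCII operation. It also tests whether a label already carries the ACE prefix. All function names are hypothetical and not part of the Libidn API. The prefix @samp{xn--} is taken from RFC 3490, and only the ASCII full stop is treated as a label separator, which is a simplification.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ACE prefix from RFC 3490 (an assumption; the text above does not
   name it). */
static const char ACE_PREFIX[] = "xn--";

/* Return 1 if the label of length len starts with the ACE prefix,
   compared case-insensitively, else 0. */
int sketch_label_is_ace(const char *label, size_t len)
{
    size_t i, plen = strlen(ACE_PREFIX);

    if (len < plen)
        return 0;
    for (i = 0; i < plen; i++)
        if (tolower((unsigned char)label[i]) != ACE_PREFIX[i])
            return 0;
    return 1;
}

/* Return 1 if the label contains a non-ASCII byte, i.e. it cannot be
   used as a host name label without conversion. */
int sketch_label_needs_toascii(const char *label, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if ((unsigned char)label[i] >= 0x80)
            return 1;
    return 0;
}

/* Count the labels of a UTF-8 domain name that would need ToASCII.
   Only '.' is recognized as a separator in this sketch. */
size_t sketch_count_labels_needing_toascii(const char *name)
{
    size_t count = 0;

    assert(name != NULL);
    while (*name != '\0') {
        size_t len = strcspn(name, ".");

        count += (size_t)sketch_label_needs_toascii(name, len);
        name += len;
        if (*name == '.')
            name++;               /* skip the separator */
    }
    return count;
}
```

A name made up only of ASCII labels yields a count of zero and can be passed to existing protocols unchanged, which is the backward compatibility described above.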
548 @include libidn-api-idna.texi
550 @c **********************************************************
551 @c *********************** Examples ***********************
552 @c **********************************************************
557 This chapter contains example code which illustrates how `Libidn' can
558 be used when writing your own application.
561 * Example 1:: Example using stringprep.
562 * Example 2:: Example using punycode.
563 * Example 3:: Example using IDNA ToASCII.
564 * Example 4:: Example using IDNA ToUnicode.
570 This example demonstrates how the stringprep functions are used.
573 @include example.c.texi
580 This example demonstrates how the punycode functions are used.
583 @include example2.c.texi
590 This example demonstrates how the library is used to convert
591 internationalized domain names into ASCII compatible names.
594 @include example3.c.texi
601 This example demonstrates how the library is used to convert ASCII
602 compatible names to internationalized domain names.
605 @include example4.c.texi
608 @c **********************************************************
609 @c ********************* Invoking idn *********************
610 @c **********************************************************
612 @chapter Invoking idn
615 @cindex invoking @command{idn}
620 GNU Libidn (idn) -- Internationalized Domain Names command line tool
622 @majorheading Description
623 @code{idn} is a utility part of GNU Libidn. It allows preparation of
624 strings, encoding and decoding of punycode data, and IDNA
625 ToASCII/ToUnicode operations to be performed on the command line,
626 without the need to write a program that uses libidn.
628 Data is read, line by line, from the standard input, and one of the
629 operations indicated by command parameters is performed, and the
630 output is printed to standard output. If any errors are encountered,
631 the execution of the application is aborted.
633 @majorheading Options
634 @code{idn} recognizes these commands:
641 Print version and exit
644 Prepare string according to nameprep profile
647 Encode UTF-8 to Punycode
650 Decode Punycode to UTF-8
653 Convert UTF-8 to ACE according to IDNA
656 Convert ACE to UTF-8 according to IDNA
659 Toggle IDNA AllowUnassigned flag (default=off)
662 Toggle IDNA UseSTD3ASCIIRules flag (default=off)
664 -pSTRING --profile=STRING
665 Use specified stringprep profile instead
667 Valid stringprep profiles are 'generic', 'Nameprep',
668 'KRBprep', 'Nodeprep', 'Resourceprep', 'plain',
669 'SASLprep', and 'ISCSIprep'.
672 Print debugging information (default=off)
675 Don't print the welcome greeting (default=off)
678 @majorheading Environment Variables
680 The @var{CHARSET} environment variable can be used to override what
681 character set to be used for decoding incoming data on the standard
682 input, and to encode data to the standard output. If your system is
683 set up correctly, the application will guess which character set is
684 used automatically. Example usage:
687 $ CHARSET=ISO-8859-1 idn --punycode-encode
694 Included in Libidn are @file{punycode.el} and @file{idna.el}, which
695 provide an Emacs Lisp API to (a limited set of) the Libidn API. This
696 section describes the API.
698 @defvar punycode-program
699 Name of the GNU Libidn @file{idn} application. The default is
700 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
703 @defvar punycode-encode-parameters
704 Parameters passed to @var{punycode-program} to invoke punycode
705 encoding mode. The default is @samp{--quiet --punycode-encode}. This
706 variable can be customized.
709 @defvar punycode-decode-parameters
710 Parameters passed to @var{punycode-program} to invoke punycode
711 decoding mode. The default is @samp{--quiet --punycode-decode}. This
712 variable can be customized.
715 @defun punycode-encode string
716 Returns a Punycode encoding of the @var{string}, after converting the
720 @defun punycode-decode string
721 Returns a possibly multibyte string which is the decoding of the
722 @var{string} which is a punycode encoded string.
726 Name of the GNU Libidn @file{idn} application. The default is
727 @samp{env CHARSET=UTF-8 idn}. This variable can be customized.
730 @defvar idna-to-ascii-parameters
731 Parameters passed to @var{idna-program} to invoke IDNA ToASCII mode.
732 The default is @samp{--quiet --idna-to-ascii}. This variable can be
736 @defvar idna-to-unicode-parameters
737 Parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
738 The default is @samp{--quiet --idna-to-unicode}. This variable can be
742 @defun idna-to-ascii string
743 Returns an ASCII Compatible Encoding (ACE) of the string computed by
744 the IDNA ToASCII operation on the input @var{string}, after converting
748 @defun idna-to-unicode string
749 Returns a possibly multibyte string which is the output of the IDNA
750 ToUnicode operation computed on the input @var{string}.
753 @c **********************************************************
754 @c ******************* Acknowledgements *******************
755 @c **********************************************************
756 @node Acknowledgements
757 @chapter Acknowledgements
759 The punycode code was taken from the IETF IDN Punycode specification,
762 Some functions (see nfkc.c and toutf8.c) have been borrowed from GLib
763 downloaded from www.gtk.org.
765 Several people reported bugs, sent patches or suggested improvements,
769 @unnumbered Concept Index
773 @node Function and Variable Index
774 @unnumbered Function and Variable Index
780 @node Copying This Manual
781 @appendix Copying This Manual
784 * GNU Free Documentation License:: License for copying this manual.