doc/libidn.texi

   1 \input texinfo   @c -*-texinfo-*-
   2 @c This file is part of the GNU Libidn Manual.
   3 @c Copyright (C) 2002, 2003 Simon Josefsson
   4 @c See below for copying conditions.
   5
   6 @setfilename libidn.info
   7 @include version.texi
   8 @settitle GNU Libidn @value{VERSION}
   9
  10 @syncodeindex pg cp
  11
  12 @copying
  13 This manual is for GNU Libidn version @value{VERSION},
  14 @value{UPDATED}.
  15
  16 Copyright @copyright{} 2002, 2003 Simon Josefsson.
  17
  18 @quotation
  19 Permission is granted to copy, distribute and/or modify this document
  20 under the terms of the GNU Free Documentation License, Version 1.1 or
  21 any later version published by the Free Software Foundation; with no
  22 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
  23 and with the Back-Cover Texts as in (a) below.  A copy of the
  24 license is included in the section entitled ``GNU Free Documentation
  25 License.''
  26
  27 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
  28 this GNU Manual, like GNU software.  Copies published by the Free
  29 Software Foundation raise funds for GNU development.''
  30 @end quotation
  31 @end copying
  32
  33 @dircategory GNU Libraries
  34 @direntry
  35 * libidn: (libidn).     Internationalized string processing library.
  36 @end direntry
  37
  38 @dircategory GNU utilities
  39 @direntry
  40 * idn: (libidn)Invoking idn.            Command line interface to GNU Libidn.
  41 @end direntry
  42
  43 @dircategory Emacs
  44 @direntry
  45 * IDN Library: (libidn)Emacs API.       Emacs API for IDN functions.
  46 @end direntry
  47
  48 @titlepage
  49 @title GNU Libidn
  50 @subtitle for version @value{VERSION}, @value{UPDATED}
  51 @author Simon Josefsson (@email{bug-libidn@@gnu.org})
  52 @page
  53 @vskip 0pt plus 1filll
  54 @insertcopying
  55 @end titlepage
  56
  57 @contents
  58
  59 @ifnottex
  60 @node Top
  61 @top GNU Libidn
  62
  63 @insertcopying
  64 @end ifnottex
  65
  66 @menu
  67 * Introduction::                How to use this manual.
  68 * Preparation::                 What you should do before using the library.
  69 * Stringprep Functions::        Stringprep functions.
  70 * Punycode Functions::          Punycode functions.
  71 * IDNA Functions::              IDNA functions.
  72 * Examples::                    Demonstrate how to use the library.
  73 * Invoking idn::                Command line interface to the library.
  74 * Emacs API::                   Emacs Lisp API for Libidn.
  75 * Acknowledgements::            Whom to blame.
  76
  77 Indices
  78
  79 * Concept Index::
  80 * Function and Variable Index::
  81
  82 Appendices
  83
  84 * Library Copying::             How you can copy and share GNU Libidn.
  85 * Copying This Manual::         How you can copy and share this manual.
  86
  87 @end menu
  88
  89
  90 @node Introduction
  91 @chapter Introduction
  92
  93 GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
  94 specifications defined by the IETF Internationalized Domain Names
  95 (IDN) working group, used for internationalized domain names.  The
  96 package is available under the GNU Lesser General Public License.
  97
  98 The library contains a generic Stringprep implementation that does
  99 Unicode 3.2 NFKC normalization, mapping and prohibition of
 100 characters, and bidirectional character handling.  Profiles for iSCSI,
 101 Kerberos 5, Nameprep, SASL and XMPP are included.  Punycode and ASCII
 102 Compatible Encoding (ACE) via IDNA are supported.
 103
 104 The Stringprep API consists of two main functions, one for converting
 105 data from the system's native representation into UTF-8, and one
 106 function to perform the Stringprep processing.  Adding a new
 107 Stringprep profile for your application within the API is
 108 straightforward.  The Punycode API consists of one encoding function
 109 and one decoding function.  The IDNA API consists of the ToASCII and
 110 ToUnicode functions, as well as a high-level interface for converting
 111 entire domain names to and from the ACE encoded form.
 112
 113 The library is used by, e.g., GNU SASL and Shishi to process user
 114 names and passwords.  Libidn can be built into GNU Libc to enable a
 115 new system-wide getaddrinfo flag for IDN processing.
 116
 117 Libidn is developed for the GNU/Linux system, but runs on over 20 Unix
 118 platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
 119 Libidn is written in C and (parts of) the API is accessible from C,
 120 C++, Emacs Lisp, Python and Java.
 121
 122 @menu
 123 * Getting Started::
 124 * Features::
 125 * Supported Platforms::
 126 * Bug Reports::
 127 @end menu
 128
 129 @node Getting Started
 130 @section Getting Started
 131
 132 This manual documents the library programming interface.  All
 133 functions and data types provided by the library are explained.
 134
 135 The reader is assumed to possess basic familiarity with
 136 internationalization concepts and network programming in C or C++.
 137
 138 This manual can be used in several ways.  If read from the beginning
 139 to the end, it gives a good introduction to the library and how it
 140 can be used in an application.  Forward references are included where
 141 necessary.  Later on, the manual can be used as a reference manual to
 142 get just the information needed about any particular interface of the
 143 library.  Experienced programmers might want to start looking at the
 144 examples at the end of the manual (@pxref{Examples}), and then only
 145 read up on those parts of the interface which are unclear.
 146
 147 @node Features
 148 @section Features
 149
 150 This library might have a couple of advantages over other libraries
 151 doing a similar job.
 152
 153 @table @asis
 154 @item It's Free Software
 155 Anybody can use, modify, and redistribute it under the terms of the
 156 GNU Lesser General Public License.
 157
 158 @item It's thread-safe
 159 No global state is kept in the library.
 160
 161 @item It's portable
 162 It should work on all Unix-like operating systems, as well as on Windows.
 163
 164 @end table
 165
 166 @node Supported Platforms
 167 @section Supported Platforms
 168
 169 Libidn has at some point in time been tested on the following
 170 platforms.
 171
 172 @enumerate
 173
 174 @item Debian GNU/Linux 3.0 (Woody)
 175 @cindex Debian
 176
 177 GCC 2.95.4 and GNU Make. This is the main development platform.
 178 @code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
 179 @code{arm-unknown-linux-gnu}, @code{armv4l-unknown-linux-gnu},
 180 @code{hppa-unknown-linux-gnu}, @code{hppa64-unknown-linux-gnu},
 181 @code{i686-pc-linux-gnu}, @code{ia64-unknown-linux-gnu},
 182 @code{m68k-unknown-linux-gnu}, @code{mips-unknown-linux-gnu},
 183 @code{mipsel-unknown-linux-gnu}, @code{powerpc-unknown-linux-gnu},
 184 @code{s390-ibm-linux-gnu}, @code{sparc-unknown-linux-gnu},
 185 @code{sparc64-unknown-linux-gnu}.
 186
 187 @item Debian GNU/Linux 2.1
 188 @cindex Debian
 189
 190 GCC 2.95.1 and GNU Make. @code{armv4l-unknown-linux-gnu}.
 191
 192 @item Tru64 UNIX
 193 @cindex Tru64
 194
 195 Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
 196 @code{alphaev68-dec-osf5.1}.
 197
 198 @item SuSE Linux 7.1
 199 @cindex SuSE
 200
 201 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
 202 @code{alphaev67-unknown-linux-gnu}.
 203
 204 @item SuSE Linux 7.2a
 205 @cindex SuSE Linux
 206
 207 GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
 208
 209 @item SuSE Linux
 210 @cindex SuSE Linux
 211
 212 GCC 3.2.2 and GNU Make.  @code{x86_64-unknown-linux-gnu} (AMD64
 213 Opteron ``Melody'').
 214
 215 @item RedHat Linux 7.2
 216 @cindex RedHat
 217
 218 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
 219 @code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
 220
 221 @item RedHat Linux 8.0
 222 @cindex RedHat
 223
 224 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
 225
 226 @item RedHat Advanced Server 2.1
 227 @cindex RedHat Advanced Server
 228
 229 GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
 230
 231 @item Slackware Linux 8.0.01
 232 @cindex Slackware
 233
 234 GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
 235
 236 @item Mandrake Linux 9.0
 237 @cindex Mandrake
 238
 239 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
 240
 241 @item IRIX 6.5
 242 @cindex IRIX
 243
 244 MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
 245
 246 @item AIX 4.3.2
 247 @cindex AIX
 248
 249 IBM C for AIX compiler, AIX Make.  @code{rs6000-ibm-aix4.3.2.0}.
 250
 251 @item Microsoft Windows 2000 (Cygwin)
 252 @cindex Windows
 253
 254 GCC 3.2 and GNU Make. @code{i686-pc-cygwin}.
 255
 256 @item HP-UX 11
 257 @cindex HP-UX
 258
 259 HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
 260 @code{hppa2.0w-hp-hpux11.11}.
 261
 262 @item SUN Solaris 2.8
 263 @cindex Solaris
 264
 265 Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
 266
 267 @item NetBSD 1.6
 268 @cindex NetBSD
 269
 270 GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
 271 @code{i386-unknown-netbsdelf1.6}.
 272
 273 @item OpenBSD 3.1 and 3.2
 274 @cindex OpenBSD
 275
 276 GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
 277 @code{i386-unknown-openbsd3.1}.
 278
 279 @item FreeBSD 4.7 and 4.8
 280 @cindex FreeBSD
 281
 282 GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
 283 @code{alpha-unknown-freebsd4.8}, @code{i386-unknown-freebsd4.7},
 284 @code{i386-unknown-freebsd4.8}.
 285
 286 @item MacOS X 10.2 Server Edition
 287 @cindex MacOS X
 288
 289 GCC 3.1 and GNU Make. @code{powerpc-apple-darwin6.5}.
 290
 291 @end enumerate
 292
 293 If you use Libidn on, or port Libidn to, a new platform, please report
 294 it to the author.
 295
 296 @node Bug Reports
 297 @section Bug Reports
 298 @cindex Reporting Bugs
 299
 300 If you think you have found a bug in Libidn, please investigate it and
 301 report it.
 302
 303 @itemize @bullet
 304
 305 @item Please make sure that the bug is really in Libidn, and
 306 preferably also check that it hasn't already been fixed in the latest
 307 version.
 308
 309 @item You have to send us a test case that makes it possible for us to
 310 reproduce the bug.
 311
 312 @item You also have to explain what is wrong: whether you get a crash,
 313 or whether the results printed are incorrect and, if so, in what way.
 314 Make sure that the bug report includes all information you would need
 315 to fix this kind of bug for someone else.
 316
 317 @end itemize
 318
 319 Please make an effort to produce a self-contained report, with
 320 something definite that can be tested or debugged.  Vague queries or
 321 piecemeal messages are difficult to act on and don't help the
 322 development effort.
 323
 324 If your bug report is good, we will do our best to help you to get a
 325 corrected version of the software; if the bug report is poor, we won't
 326 do anything about it (apart from asking you to send better bug
 327 reports).
 328
 329 If you think something in this manual is unclear, or downright
 330 incorrect, or if the language needs to be improved, please also send a
 331 note.
 332
 333 Send your bug report to:
 334
 335 @center @samp{bug-libidn@@gnu.org}
 336
 337
 338 @c **********************************************************
 339 @c *******************  Preparation  ************************
 340 @c **********************************************************
 341 @node Preparation
 342 @chapter Preparation
 343
 344 To use `Libidn', you have to perform some changes to your sources and
 345 the build system.  The necessary changes are small and explained in
 346 the following sections.  At the end of this chapter, it is described
 347 how the library is initialized, and how the requirements of the
 348 library are verified.
 349
 350 A faster way to find out how to adapt your application for use with
 351 `Libidn' may be to look at the examples at the end of this manual
 352 (@pxref{Examples}).
 353
 354 @menu
 355 * Header::
 356 * Initialization::
 357 * Version Check::
 358 * Building the source::
 359 @end menu
 360
 361 @node Header
 362 @section Header
 363
 364 The library contains a few independent parts, and each part exports its
 365 interfaces (data types and functions) in a header file.  You must
 366 include the appropriate header files in all programs using the
 367 library, either directly or through some other header file, like this:
 368
 369 @example
 370 #include <stringprep.h>
 371 @end example
 372
 373 The header files and the functions they define are categorized as
 374 follows:
 375
 376 @table @asis
 377 @item stringprep.h
 378
 379 The low-level stringprep API entry point.  For IDN applications, this
 380 is usually invoked via IDNA. Some applications, specifically non-IDN
 381 ones, may want to prepare strings directly though, and should include
 382 this header file.
 383
 384 The name space of the stringprep part of Libidn is @code{stringprep*}
 385 for function names, @code{Stringprep*} for data types and
 386 @code{STRINGPREP_*} for other symbols.  In addition the same name
 387 prefixes with one prepended underscore are reserved for internal use
 388 and should never be used by an application.
 389
 390 @item punycode.h
 391
 392 The entry point to Punycode encoding and decoding functions.  Normally
 393 punycode is used via the idna.h interface, but some applications may
 394 want to perform raw punycode operations.
 395
 396 The name space of the punycode part of Libidn is @code{punycode_*} for
 397 function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
 398 for other symbols.  In addition the same name prefixes with one
 399 prepended underscore are reserved for internal use and should never be
 400 used by an application.
 401
 402 @item idna.h
 403
 404 The entry point to the IDNA functions.  This is the normal entry point
 405 for applications that need IDN functionality.
 406
 407 The name space of the IDNA part of Libidn is @code{idna_*} for
 408 function names, @code{Idna*} for data types and @code{IDNA_*} for
 409 other symbols.  In addition the same name prefixes with one prepended
 410 underscore are reserved for internal use and should never be used by
 411 an application.
 412
 413 @end table
 414
 415 @node Initialization
 416 @section Initialization
 417
 418 Libidn is stateless and does not need any initialization.
 419
 420 @node Version Check
 421 @section Version Check
 422
 423 It is often desirable to check that the version of `Libidn' used is
 424 indeed one which fits all requirements.  Even with binary
 425 compatibility, new features may have been introduced, but due to a
 426 problem with the dynamic linker an old version may actually be used.  So you may
 427 want to check that the version is okay right after program startup.
 428
 429 @deftypefun {const char *} stringprep_check_version (const char * @var{req_version})
 430
 431 @var{req_version}:  Required version number, or NULL.
 432
 433 Check that the version of the library is at minimum the requested one
 434 and return the version string; return NULL if the condition is not
 435 satisfied.  If a NULL is passed to this function, no check is done,
 436 but the version string is simply returned.
 437
 438 See @var{STRINGPREP_VERSION} for a suitable @code{req_version} string.
 439
 440 Returns the version string of the run-time library, or NULL if the
 441 run-time library does not meet the required version number.
 442
 443 @end deftypefun
 444
 445 The normal way to use the function is to put something similar to the
 446 following first in your @code{main}:
 447
 448 @example
 449   if (!stringprep_check_version (STRINGPREP_VERSION))
 450     @{
 451       printf ("stringprep_check_version() failed:\n"
 452               "Header file incompatible with shared library.\n");
 453       exit(1);
 454     @}
 455 @end example
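
What ``at minimum the requested one'' can mean for dotted version
strings is sketched below.  This is a minimal illustration only: the
helper @code{sketch_version_at_least} is hypothetical, it is not part
of Libidn, and the comparison that @code{stringprep_check_version}
actually performs may differ.  It assumes purely numeric components
separated by dots, and treats a NULL requirement as ``no check'', as
the function description above does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, NOT part of Libidn: return 1 if the dotted
   version string HAVE is at least WANT, 0 otherwise.  A NULL WANT
   means that no check is requested.  Missing components count as 0,
   so "1.0" is older than "1.0.1". */
static int
sketch_version_at_least (const char *have, const char *want)
{
  assert (have != NULL);

  if (want == NULL)
    return 1;

  for (;;)
    {
      char *have_end, *want_end;
      unsigned long h = strtoul (have, &have_end, 10);
      unsigned long w = strtoul (want, &want_end, 10);

      /* The first differing component decides the result. */
      if (h != w)
        return h > w;

      /* Both strings have run out of components: they are equal. */
      if (*have_end != '.' && *want_end != '.')
        return 1;

      /* Step over the dot where there is one. */
      have = (*have_end == '.') ? have_end + 1 : have_end;
      want = (*want_end == '.') ? want_end + 1 : want_end;
    }
}
```

Comparing component by component, rather than comparing the strings
byte by byte, is what makes @samp{0.10.0} count as newer than
@samp{0.9.1}.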
 456
 457 @node Building the source
 458 @section Building the source
 459 @cindex Compiling your application
 460
 461 If you want to compile a source file including e.g. the `idna.h' header
 462 file, you must make sure that the compiler can find it in the
 463 directory hierarchy.  This is accomplished by adding the path to the
 464 directory in which the header file is located to the compiler's include
 465 file search path (via the @option{-I} option).
 466
 467 However, the path to the include file is determined at the time the
 468 source is configured.  To solve this problem, `Libidn' uses the
 469 external package @command{pkg-config} that knows the path to the
 470 include file and other configuration options.  The options that need
 471 to be added to the compiler invocation at compile time are output by
 472 the @option{--cflags} option to @command{pkg-config libidn}.  The
 473 following example shows how it can be used at the command line:
 474
 475 @example
 476 gcc -c foo.c `pkg-config libidn --cflags`
 477 @end example
 478
 479 Adding the output of @samp{pkg-config libidn --cflags} to the
 480 compiler's command line will ensure that the compiler can find e.g. the
 481 idna.h header file.
 482
 483 A similar problem occurs when linking the program with the library.
 484 Again, the compiler has to find the library files.  For this to work,
 485 the path to the library files has to be added to the library search
 486 path (via the @option{-L} option).  For this, the option
 487 @option{--libs} to @command{pkg-config libidn} can be used.  For
 488 convenience, this option also outputs all other options that are
 489 required to link the program with the `libidn' library.  The example
 490 shows how to link @file{foo.o} with the `libidn' library to a program
 491 @command{foo}.
 492
 493 @example
 494 gcc -o foo foo.o `pkg-config libidn --libs`
 495 @end example
 496
 497 Of course you can also combine both examples to a single command by
 498 specifying both options to @command{pkg-config}:
 499
 500 @example
 501 gcc -o foo foo.c `pkg-config libidn --cflags --libs`
 502 @end example
 503
 504 @c **********************************************************
 505 @c ******************  Stringprep Functions *****************
 506 @c **********************************************************
 507 @node Stringprep Functions
 508 @chapter Stringprep Functions
 509 @cindex Stringprep Functions
 510
 511 Stringprep describes a framework for preparing Unicode text strings in
 512 order to increase the likelihood that string input and string
 513 comparison work in ways that make sense for typical users throughout
 514 the world. The stringprep protocol is useful for protocol identifier
 515 values, company and personal names, internationalized domain names,
 516 and other text strings.
 517
 518 @section Return Codes
 519
 520 All functions return a code of the @code{enum Stringprep_rc}
 521 enumerated type:
 522
 523 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_OK = 0}
 524 Successful operation.  This value is guaranteed to always be zero, the
 525 remaining ones are only guaranteed to hold non-zero values, for
 526 logical comparison purposes.
 527 @end deftypevr
 528
 529 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_CONTAINS_UNASSIGNED}
 530 String contains unassigned Unicode code points, which is forbidden by
 531 the profile.
 532 @end deftypevr
 533
 534 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_CONTAINS_PROHIBITED}
 535 String contains code points prohibited by the profile.
 536 @end deftypevr
 537
 538 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_BIDI_BOTH_L_AND_RAL}
 539 String contains code points with conflicting bidirectional category.
 540 @end deftypevr
 541
 542 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_BIDI_LEADTRAIL_NOT_RAL}
 543 Leading and trailing character in string not of proper bidirectional
 544 category.
 545 @end deftypevr
 546
 547 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_BIDI_CONTAINS_PROHIBITED}
 548 Contains prohibited code points detected by bidirectional code.
 549 @end deftypevr
 550
 551 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_TOO_SMALL_BUFFER}
 552 Buffer handed to function was too small.  This usually indicates a
 553 problem in the calling application.
 554 @end deftypevr
 555
 556 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_PROFILE_ERROR}
 557 The stringprep profile was inconsistent.  This usually indicates an
 558 internal error in the library.
 559 @end deftypevr
 560
 561 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_FLAG_ERROR}
 562 The supplied flag conflicted with profile.  This usually indicates a
 563 problem in the calling application.
 564 @end deftypevr
 565
 566 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_UNKNOWN_PROFILE}
 567 The supplied profile name was not known to the library.
 568 @end deftypevr
 569
 570 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_NFKC_FAILED}
 571 The Unicode NFKC operation failed.  This usually indicates an internal
 572 error in the library.
 573 @end deftypevr
 574
 575 @deftypevr {Return code} {enum Stringprep_rc} {STRINGPREP_MALLOC_ERROR}
 576 Memory allocation with @code{malloc} failed.  This is usually a fatal error.
 577 @end deftypevr
 578
 579 @section Control Flags
 580
 581 @deftypevr {Stringprep flags} {enum Stringprep_profile_flags} {STRINGPREP_NO_NFKC}
 582 Disable the NFKC normalization, as well as selecting the non-NFKC case
 583 folding tables.  Usually the profile specifies BIDI and NFKC settings,
 584 and applications should not override it unless in special situations.
 585 @end deftypevr
 586
 587 @deftypevr {Stringprep flags} {enum Stringprep_profile_flags} {STRINGPREP_NO_BIDI}
 588 Disable the BIDI step.  Usually the profile specifies BIDI and NFKC
 589 settings, and applications should not override it unless in special
 590 situations.
 591 @end deftypevr
 592
 593 @deftypevr {Stringprep flags} {enum Stringprep_profile_flags} {STRINGPREP_NO_UNASSIGNED}
 594 Make the library return with an error if string contains unassigned
 595 characters according to profile.
 596 @end deftypevr
 597
 598 @section Defining A Stringprep Profile
 599
 600 Further types and structures are defined for applications that want to
 601 specify their own stringprep profile.  As these are fairly obscure,
 602 and by necessity tied to the implementation, we do not document them
 603 here.  Look into the @file{stringprep.h} header file, and the
 604 @file{profiles.c} source code for the details.
 605
 606 @section Core Functions
 607
 608 @deftypefun {enum Stringprep_rc} stringprep (char * @var{in}, size_t @var{maxlen}, enum Stringprep_profile_flags @var{flags}, Stringprep_profile * @var{profile})
 609
 610 @var{in}: input/output array with string to prepare.
 611
 612 @var{maxlen}: maximum length of input/output array.
 613
 614 @var{flags}: optional stringprep profile flags.
 615
 616 @var{profile}: pointer to stringprep profile to use.
 617
 618 Prepare the input UTF-8 string according to the stringprep profile.
 619 Normally application programmers use stringprep profile macros such as
 620 @code{stringprep_nameprep}, @code{stringprep_kerberos5} etc instead of
 621 calling this function directly.
 622
 623 Since the stringprep operation can expand the string, @code{maxlen}
 624 indicates how large the buffer holding the string is.  The @code{flags}
 625 are one of Stringprep_profile_flags, or 0.  The profile indicates
 626 processing details specific to that profile.  Your application can
 627 define new profiles, possibly re-using the generic stringprep tables
 628 that always will be part of the library.
 629
 630 Note that you must convert strings entered in the system's locale into
 631 UTF-8 before using this function.
 632
 633 Returns 0 iff successful, or an error code.
 634 @end deftypefun
 635
 636 @deftypefun {enum Stringprep_rc} stringprep_profile (char * @var{in}, char ** @var{out}, char * @var{profile}, enum Stringprep_profile_flags @var{flags})
 637
 638 @var{in}: input/output array with string to prepare.
 639
 640 @var{out}: output variable with newly allocated string.
 641
 642 @var{profile}: name of stringprep profile to use.
 643
 644 @var{flags}: optional stringprep profile flags.
 645
 646 Prepare the input UTF-8 string according to the stringprep profile.
 647 Normally application programmers use stringprep profile macros such as
 648 @code{stringprep_nameprep}, @code{stringprep_kerberos5} etc
 649 instead of calling this function directly.
 650
 651 Note that you must convert strings entered in the system's locale into
 652 UTF-8 before using this function.
 653
 654 The output @code{out} variable must be deallocated by the caller.
 655
 656 Returns 0 iff successful, or an error code.
 657 @end deftypefun
 658
 659 @section Unicode Character Codings
 660
 661 @deftypefun {uint32_t} stringprep_utf8_to_unichar (const char * @var{p})
 662
 663 @var{p}: a pointer to Unicode character encoded as UTF-8
 664
 665 Converts a sequence of bytes encoded as UTF-8 to a Unicode character.
 666 If @code{p} does not point to a valid UTF-8 encoded character, results
 667 are undefined.
 668
 669 Returns the resulting character.
 670 @end deftypefun
 671
 672 @deftypefun {int} stringprep_unichar_to_utf8 (uint32_t @var{c}, char * @var{outbuf})
 673
 674 @var{c}: an ISO 10646 character code
 675
 676 @var{outbuf}: output buffer, must have at least 6 bytes of space.  If
 677 @var{NULL}, the length will be computed and returned and nothing will
 678 be written to @code{outbuf}.
 679
 680 Converts a single character to UTF-8.
 681
 682 Returns the number of bytes written.
 683 @end deftypefun
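
Why the output buffer needs six bytes can be seen from a sketch of the
encoding itself: the original UTF-8 definition covers 31-bit ISO 10646
values and uses up to six bytes, even though Unicode code points never
need more than four.  The helper @code{sketch_unichar_to_utf8} below
is hypothetical and is not Libidn's implementation; it only mirrors
the contract described above, including the length-only mode when the
buffer is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of Libidn: encode the character code C
   as UTF-8.  If OUTBUF is not NULL it must have room for 6 bytes.
   Returns the number of bytes the encoding needs (1 to 6). */
static int
sketch_unichar_to_utf8 (uint32_t c, char *outbuf)
{
  unsigned char first;
  int len, i;

  /* The six-byte form carries at most 31 bits. */
  assert (c <= 0x7FFFFFFF);

  /* Pick the sequence length and the marker bits of the lead byte. */
  if (c < 0x80)
    { first = 0x00; len = 1; }
  else if (c < 0x800)
    { first = 0xC0; len = 2; }
  else if (c < 0x10000)
    { first = 0xE0; len = 3; }
  else if (c < 0x200000)
    { first = 0xF0; len = 4; }
  else if (c < 0x4000000)
    { first = 0xF8; len = 5; }
  else
    { first = 0xFC; len = 6; }

  if (outbuf != NULL)
    {
      /* Fill the continuation bytes from the end, six bits at a time. */
      for (i = len - 1; i > 0; i--)
        {
          outbuf[i] = (char) ((c & 0x3F) | 0x80);
          c >>= 6;
        }
      /* The remaining high bits go into the lead byte. */
      outbuf[0] = (char) (c | first);
    }

  return len;
}
```

Calling the helper with a NULL buffer first is one way to size an
allocation before encoding into it.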
 684
 685 @deftypefun {uint32_t *} stringprep_utf8_to_ucs4 (const char * @var{str}, ssize_t @var{len}, size_t * @var{items_written})
 686
 687 @var{str}: a UTF-8 encoded string
 688
 689 @var{len}: the maximum length of @code{str} to use. If @code{len} < 0,
 690 then the string is nul-terminated.
 691
 692 @var{items_written}: location to store the number of characters in the
 693 result, or @var{NULL}.
 694
 695 Convert a string from UTF-8 to a 32-bit fixed width representation as
 696 UCS-4, assuming valid UTF-8 input.  This function does no error
 697 checking on the input.
 698
 699 Returns a pointer to a newly allocated UCS-4 string.  This value must
 700 be freed with @code{free}.
 701 @end deftypefun
 702
 703 @deftypefun {char *} stringprep_ucs4_to_utf8 (const uint32_t * @var{str}, ssize_t @var{len}, size_t * @var{items_read}, size_t * @var{items_written})
 704
 705 @var{str}: a UCS-4 encoded string
 706
 707 @var{len}: the maximum length of @code{str} to use. If @code{len} < 0,
 708 then the string is terminated with a 0 character.
 709
 710 @var{items_read}: location to store number of characters read, or
 711 @var{NULL}.
 712
 713 @var{items_written}: location to store number of bytes written or
 714 @var{NULL}.  The value here stored does not include the trailing 0
 715 byte.
 716
 717 Convert a string from a 32-bit fixed width representation as UCS-4
 718 to UTF-8. The result will be terminated with a 0 byte.
 719
 720 Returns a pointer to a newly allocated UTF-8 string.  This value must
 721 be freed with @code{free}. If an error occurs, @var{NULL} will be
 722 returned.
 723 @end deftypefun
 724
 725 @section Unicode Normalization
 726
 727 @deftypefun {char *} stringprep_utf8_nfkc_normalize (const char * @var{str}, ssize_t @var{len})
 728
 729 @var{str}: a UTF-8 encoded string.
 730
 731 @var{len}: length of @code{str}, in bytes, or -1 if @code{str} is
 732 nul-terminated.
 733
 734 Converts a string into canonical form, standardizing such issues as
 735 whether a character with an accent is represented as a base character
 736 and combining accent or as a single precomposed character.
 737
 738 The normalization mode is NFKC (ALL COMPOSE).  It standardizes
 739 differences that do not affect the text content, such as the
 740 above-mentioned accent representation. It standardizes the
 741 "compatibility" characters in Unicode, such as SUPERSCRIPT THREE to
 742 the standard forms (in this case DIGIT THREE). Formatting information
 743 may be lost but for most text operations such characters should be
 744 considered the same. It returns a result with composed forms rather
 745 than a maximally decomposed form.
 746
 747 Returns a newly allocated string, that is the NFKC normalized form of
 748 @code{str}.
 749 @end deftypefun
 750
 751 @deftypefun {uint32_t *} stringprep_ucs4_nfkc_normalize (uint32_t * @var{str}, ssize_t @var{len})
 752
 753 @var{str}: a Unicode string.
 754
 755 @var{len}: length of @code{str} array, or -1 if @code{str} is
 756 nul-terminated.
 757
 758 Converts a UCS-4 string into UTF-8 and runs
 759 @code{stringprep_utf8_nfkc_normalize}.
 760
 761 Returns a newly allocated Unicode string, that is the NFKC normalized
 762 form of @code{str}.
 763 @end deftypefun
 764
 765 @section Character Set Conversion
 766
 767 @deftypefun {const char *} stringprep_locale_charset (void)
 768 Return the character set used by the system locale.  It will never
 769 return NULL, but use "ASCII" as a fallback.
 770 @end deftypefun
 771
 772 @deftypefun {char *} stringprep_convert (const char * @var{str}, const char * @var{to_codeset}, const char * @var{from_codeset})
 773
 774 @var{str}: input zero-terminated string.
 775
 776 @var{to_codeset}: name of destination character set.
 777
 778 @var{from_codeset}: name of origin character set, as used by
 779 @code{str}.
 780
 781 Convert the string from one character set to another using the
 782 system's @code{iconv} function.
 783
 784 Returns newly allocated zero-terminated string which is @code{str}
 785 transcoded into @code{to_codeset}.
 786 @end deftypefun
 787
 788 @deftypefun {char *} stringprep_locale_to_utf8 (const char * @var{str})
 789
 790 @var{str}: input zero-terminated string.
 791
 792 Convert string encoded in the locale's character set into UTF-8 by
 793 using @code{stringprep_convert}.
 794
 795 Returns newly allocated zero-terminated string which is @code{str}
 796 transcoded into UTF-8.
 797 @end deftypefun
 798
 799 @deftypefun {char *} stringprep_utf8_to_locale (const char * @var{str})
 800
 801 @var{str}: input zero-terminated string.
 802
 803 Convert string encoded in UTF-8 into the locale's character set by
 804 using @code{stringprep_convert}.
 805
 806 Returns newly allocated zero-terminated string which is @code{str}
 807 transcoded into the locale's character set.
 808 @end deftypefun
 809
 810 @section Stringprep Profile Macros
 811
 812 @deftypefun {enum Stringprep_rc} stringprep_nameprep_no_unassigned (char * @var{in}, int @var{maxlen})
 813
 814 @var{in}: input/output array with string to prepare.
 815
 816 @var{maxlen}: maximum length of input/output array.
 817
 818 Prepare the input UTF-8 string according to the nameprep profile.  The
 819 AllowUnassigned flag is false; use @code{stringprep_nameprep} for
 820 true AllowUnassigned.  Returns 0 iff successful, or an error code.
 821 @end deftypefun
 822
 823 @deftypefun {enum Stringprep_rc} stringprep_iscsi (char * @var{in}, int @var{maxlen})
 824
 825 @var{in}: input/output array with string to prepare.
 826
 827 @var{maxlen}: maximum length of input/output array.
 828
 829 Prepare the input UTF-8 string according to the draft iSCSI stringprep
 830 profile.  Returns 0 iff successful, or an error code.
 831 @end deftypefun
 832
 833 @deftypefun {enum Stringprep_rc} stringprep_kerberos5 (char * @var{in}, int @var{maxlen})
 834
 835 @var{in}: input/output array with string to prepare.
 836
 837 @var{maxlen}: maximum length of input/output array.
 838
 839 Prepare the input UTF-8 string according to the draft Kerberos5
 840 stringprep profile.  Returns 0 iff successful, or an error code.
 841 @end deftypefun
 842
 843 @deftypefun {enum Stringprep_rc} stringprep_plain (char * @var{in}, int @var{maxlen})
 844
@var{in}: input/output array with string to prepare.
 846
 847 @var{maxlen}: maximum length of input/output array.
 848
 849 Prepare the input UTF-8 string according to the draft SASL ANONYMOUS
 850 profile.  Returns 0 iff successful, or an error code.
 851 @end deftypefun
 852
 853 @deftypefun {enum Stringprep_rc} stringprep_xmpp_nodeprep (char * @var{in}, int @var{maxlen})
 854
@var{in}: input/output array with string to prepare.
 856
 857 @var{maxlen}: maximum length of input/output array.
 858
 859 Prepare the input UTF-8 string according to the draft XMPP node
 860 identifier profile.  Returns 0 iff successful, or an error code.
 861 @end deftypefun
 862
 863 @deftypefun {enum Stringprep_rc} stringprep_xmpp_resourceprep (char * @var{in}, int @var{maxlen})
 864
@var{in}: input/output array with string to prepare.
 866
 867 @var{maxlen}: maximum length of input/output array.
 868
 869 Prepare the input UTF-8 string according to the draft XMPP resource
 870 identifier profile.  Returns 0 iff successful, or an error code.
 871 @end deftypefun
 872
 873 @deftypefun {enum Stringprep_rc} stringprep_generic (char * @var{in}, int @var{maxlen})
 874
@var{in}: input/output array with string to prepare.
 876
 877 @var{maxlen}: maximum length of input/output array.
 878
Prepare the input UTF-8 string according to a hypothetical ``generic''
stringprep profile.  This is mostly used for debugging or when
constructing new stringprep profiles.  Returns 0 iff successful, or an
error code.
 883 @end deftypefun
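
The profile macros above all share the same in-place calling
convention.  The following is a minimal sketch of that convention,
assuming the macros are declared in @file{stringprep.h}.  The helper
name @code{prepare_label} is hypothetical and not part of Libidn.

@verbatim
#include <string.h>
#include <stringprep.h>

/* Hypothetical helper: copy LABEL into BUF (BUFLEN bytes) and run the
   nameprep profile on it in place.  Returns 0 on success, as the
   profile macros do, or -1 if LABEL does not fit into BUF.  */
static int
prepare_label (const char *label, char *buf, size_t buflen)
{
  if (strlen (label) >= buflen)
    return -1;
  strcpy (buf, label);
  /* The prepared string may be longer than the input, so BUFLEN
     should leave some headroom.  */
  return stringprep_nameprep_no_unassigned (buf, buflen);
}
@end verbatim

Since nameprep includes case folding, preparing the string
@samp{EXAMPLE} this way is expected to leave @samp{example} in the
buffer.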
 884
 885 @c **********************************************************
 886 @c *******************  Punycode Functions ******************
 887 @c **********************************************************
 888 @node Punycode Functions
 889 @chapter Punycode Functions
 890 @cindex Punycode Functions
 891
 892 Punycode is a simple and efficient transfer encoding syntax designed
 893 for use with Internationalized Domain Names in Applications. It
 894 uniquely and reversibly transforms a Unicode string into an ASCII
 895 string. ASCII characters in the Unicode string are represented
 896 literally, and non-ASCII characters are represented by ASCII
 897 characters that are allowed in host name labels (letters, digits, and
 898 hyphens). A general algorithm called Bootstring allows a string of
 899 basic code points to uniquely represent any string of code points
 900 drawn from a larger set. Punycode is an instance of Bootstring that
 901 uses particular parameter values, appropriate for IDNA.
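
To make the transformation concrete, the following is a minimal,
self-contained sketch of the Bootstring encoding step with the
Punycode parameter values from RFC 3492.  It is an illustration only
and is not the code used by Libidn: the names @code{sketch_encode},
@code{adapt} and @code{digit_char} are hypothetical, and the overflow
checks required of a robust encoder are omitted.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Punycode parameter values for Bootstring (RFC 3492).  */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z', '0'..'9'.  */
static char
digit_char (uint32_t d)
{
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation after each encoded delta.  */
static uint32_t
adapt (uint32_t delta, uint32_t numpoints, int first)
{
  uint32_t k = 0;

  delta = first ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  while (delta > ((BASE - TMIN) * TMAX) / 2)
    {
      delta /= BASE - TMIN;
      k += BASE;
    }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Encode N_IN code points.  Returns the number of characters written
   to OUT (not zero terminated), or -1 if OUT_MAX is too small.  */
static int
sketch_encode (const uint32_t *in, size_t n_in, char *out, size_t out_max)
{
  uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  size_t h, b, o = 0, j;

  /* Copy the basic (ASCII) code points literally.  */
  for (j = 0; j < n_in; j++)
    if (in[j] < 0x80)
      {
        if (o >= out_max)
          return -1;
        out[o++] = (char) in[j];
      }
  h = b = o;
  if (b > 0)
    {
      if (o >= out_max)
        return -1;
      out[o++] = '-';           /* delimiter */
    }

  /* Insert the remaining code points in increasing order.  */
  while (h < n_in)
    {
      uint32_t m = UINT32_MAX;

      for (j = 0; j < n_in; j++)
        if (in[j] >= n && in[j] < m)
          m = in[j];
      delta += (m - n) * (uint32_t) (h + 1);
      n = m;
      for (j = 0; j < n_in; j++)
        {
          if (in[j] < n)
            delta++;
          if (in[j] == n)
            {
              uint32_t q = delta, k;

              /* Emit delta as a variable-length integer.  */
              for (k = BASE;; k += BASE)
                {
                  uint32_t t = k <= bias ? TMIN
                    : k >= bias + TMAX ? TMAX : k - bias;
                  if (q < t)
                    break;
                  if (o >= out_max)
                    return -1;
                  out[o++] = digit_char (t + (q - t) % (BASE - t));
                  q = (q - t) / (BASE - t);
                }
              if (o >= out_max)
                return -1;
              out[o++] = digit_char (q);
              bias = adapt (delta, (uint32_t) (h + 1), h == b);
              delta = 0;
              h++;
            }
        }
      delta++;
      n++;
    }
  return (int) o;
}
@end verbatim

For the six code points of ``b@"ucher'' (U+0062 U+00FC U+0063 U+0068
U+0065 U+0072) this sketch writes the nine characters
@samp{bcher-kva}.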
 902
 903 @section Return Codes
 904
 905 All functions return a code of the @code{enum punycode_status}
 906 enumerated type:
 907
 908 @deftypevr {Return code} {enum punycode_status} {PUNYCODE_SUCCESS = 0}
 909 Successful operation.  This value is guaranteed to always be zero, the
 910 remaining ones are only guaranteed to hold non-zero values, for
 911 logical comparison purposes.
 912 @end deftypevr
 913
 914 @deftypevr {Return code} {enum punycode_status} {PUNYCODE_BAD_INPUT}
 915 Input is invalid.
 916 @end deftypevr
 917
 918 @deftypevr {Return code} {enum punycode_status} {PUNYCODE_BIG_OUTPUT}
 919 Output would exceed the space provided.
 920 @end deftypevr
 921
 922 @deftypevr {Return code} {enum punycode_status} {PUNYCODE_BIG_OVERFLOW}
 923 Input needs wider integers to process.
 924 @end deftypevr
 925
 926 @section Unicode Code Point Type
 927
The punycode functions use a special type to denote Unicode code
points.  It is guaranteed to always be a 32-bit unsigned integer.
 930
 931 @deftypevr {Punycode Unicode code point} uint32_t punycode_uint
An unsigned integer that holds a Unicode code point.
 933 @end deftypevr
 934
 935 @section Core Functions
 936
Note that the current implementation returns @code{PUNYCODE_BAD_INPUT}
if @code{input_length} exceeds 4294967295 characters.  This
restriction may be removed in the future.  Meanwhile, applications are
encouraged not to depend on this behavior, and to use @code{sizeof} to
initialize @code{input_length} and @code{output_length}.
 942
 943 The functions provided are the following two entry points:
 944
 945 @deftypefun {enum punycode_status} punycode_encode (size_t @var{input_length}, const punycode_uint @var{input}[], const unsigned char @var{case_flags}[], size_t * @var{output_length}, char @var{output}[])
 946
 947 @var{input_length}: The input_length is the number of code points in
 948 the input.
 949
 950 @var{input}: The input is represented as an array of Unicode code
 951 points (not code units; surrogate pairs are not allowed).  It must
 952 contain at least @code{input_length} code points.
 953
@var{case_flags}: An array holding input_length boolean values, where
 955 nonzero suggests that the corresponding Unicode character be forced to
 956 uppercase after being decoded (if possible), and zero suggests that it
 957 be forced to lowercase (if possible).  ASCII code points are encoded
 958 literally, except that ASCII letters are forced to uppercase or
 959 lowercase according to the corresponding uppercase flags.  If
 960 case_flags is a null pointer then ASCII letters are left as they are,
 961 and other code points are treated as if their uppercase flags were
 962 zero.
 963
 964 @var{output_length}: The output_length is an in/out argument: the
 965 caller passes in the maximum number of code points that it can
 966 receive, and on successful return it will contain the number of code
 967 points actually output.
 968
@var{output}: The output must have room for at least
 970 @code{output_length} code points.  The output will be represented as
 971 an array of ASCII code points.  The output string is not
 972 null-terminated; it will contain zeros if and only if the input
 973 contains zeros.  (Of course the caller can leave room for a terminator
 974 and add one if needed.)
 975
 976 Converts Unicode to Punycode.
 977
The return value can be any of the punycode_status values defined
above except @code{PUNYCODE_BAD_INPUT}; if not
@code{PUNYCODE_SUCCESS}, then output_length and output might contain
garbage.
 982
 983 @end deftypefun
 984
 985 @deftypefun {enum punycode_status} punycode_decode (size_t @var{input_length}, const char @var{input}[], size_t * @var{output_length}, punycode_uint @var{output}[], unsigned char @var{case_flags}[])
 986
 987 @var{input_length}: The input_length is the number of code points in
 988 the input.
 989
 990 @var{input}: The input is represented as an array of ASCII code
 991 points.  It must contain at least @code{input_length} code points.
 992
@var{case_flags}: The case_flags array needs room for at least
 994 @code{output_length} values, or it can be a @code{NULL} pointer if the
 995 case information is not needed.  A nonzero flag suggests that the
 996 corresponding Unicode character be forced to uppercase by the caller
 997 (if possible), while zero suggests that it be forced to lowercase (if
 998 possible).  ASCII code points are output already in the proper case,
 999 but their flags will be set appropriately so that applying the flags
1000 would be harmless.
1001
1002 @var{output_length}: The output_length is an in/out argument: the
1003 caller passes in the maximum number of code points that it can
1004 receive, and on successful return it will contain the number of code
1005 points actually output.
1006
@var{output}: The output must have room for at least
@code{output_length} code points.  The output will be represented as
an array of Unicode code points.  The output string is not
1010 null-terminated; it will contain zeros if and only if the input
1011 contains zeros.  (Of course the caller can leave room for a terminator
1012 and add one if needed.)
1013
1014 Converts Punycode to Unicode.
1015
1016 The return value can be any of the punycode_status values defined
1017 above; if not @code{PUNYCODE_SUCCESS}, then output_length, output, and
1018 case_flags might contain garbage.  On success, the decoder will never
1019 need to write an output_length greater than input_length, because of
1020 how the encoding is defined.
1021
1022 @end deftypefun
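
As a hedged illustration of the in/out length convention described
above, the sketch below encodes an array of code points and decodes
the result again.  It assumes the declarations come from
@file{punycode.h}.  The helper @code{punycode_roundtrip} is a
hypothetical name and not part of the library.

@verbatim
#include <stddef.h>
#include <punycode.h>

/* Hypothetical helper: encode N code points from IN into OUT, then
   decode the result into BACK.  OUT and BACK each have room for CAP
   elements.  Returns the status of the first call that fails, or
   PUNYCODE_SUCCESS.  */
static int
punycode_roundtrip (const punycode_uint *in, size_t n,
                    char *out, size_t *outlen,
                    punycode_uint *back, size_t *backlen, size_t cap)
{
  int rc;

  *outlen = cap;                /* in: capacity, out: length used */
  rc = punycode_encode (n, in, NULL, outlen, out);
  if (rc != PUNYCODE_SUCCESS)
    return rc;

  /* OUT is not zero terminated, so pass its length explicitly.  */
  *backlen = cap;
  return punycode_decode (*outlen, out, backlen, back, NULL);
}
@end verbatim

Passing @code{NULL} for the case flags leaves ASCII letters as they
are when encoding and discards the case information when decoding.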
1023
1024
1025 @c **********************************************************
1026 @c ********************* IDNA Functions *********************
1027 @c **********************************************************
1028 @node IDNA Functions
1029 @chapter IDNA Functions
1030 @cindex IDNA Functions
1031
1032 Until now, there has been no standard method for domain names to use
1033 characters outside the ASCII repertoire. The IDNA document defines
1034 internationalized domain names (IDNs) and a mechanism called IDNA for
1035 handling them in a standard fashion. IDNs use characters drawn from a
1036 large repertoire (Unicode), but IDNA allows the non-ASCII characters
1037 to be represented using only the ASCII characters already allowed in
1038 so-called host names today. This backward-compatible representation is
1039 required in existing protocols like DNS, so that IDNs can be
1040 introduced with no changes to the existing infrastructure. IDNA is
1041 only meant for processing domain names, not free text.
1042
1043 @section Return Codes
1044
All functions return an exit code:
1046
1047 @deftypevr {Return code} {enum Idna_rc} {IDNA_SUCCESS = 0}
1048 Successful operation.  This value is guaranteed to always be zero, the
1049 remaining ones are only guaranteed to hold non-zero values, for
1050 logical comparison purposes.
1051 @end deftypevr
1052
1053 @deftypevr {Return code} {enum Idna_rc} IDNA_STRINGPREP_ERROR
1054 Error during string preparation.
1055 @end deftypevr
1056
1057 @deftypevr {Return code} {enum Idna_rc} IDNA_PUNYCODE_ERROR
1058 Error during punycode operation.
1059 @end deftypevr
1060
1061 @deftypevr {Return code} {enum Idna_rc} IDNA_CONTAINS_LDH
For IDNA_USE_STD3_ASCII_RULES, indicate that the string contains
non-LDH ASCII characters.
1064 @end deftypevr
1065
1066 @deftypevr {Return code} {enum Idna_rc} IDNA_CONTAINS_MINUS
1067 For IDNA_USE_STD3_ASCII_RULES, indicate that the string contains a
1068 leading or trailing hyphen-minus (U+002D).
1069 @end deftypevr
1070
1071 @deftypevr {Return code} {enum Idna_rc} IDNA_INVALID_LENGTH
1072 The final output string is not within the (inclusive) range 1 to 63
1073 characters.
1074 @end deftypevr
1075
1076 @deftypevr {Return code} {enum Idna_rc} IDNA_NO_ACE_PREFIX
1077 The string does not contain the ACE prefix (for ToUnicode).
1078 @end deftypevr
1079
1080 @deftypevr {Return code} {enum Idna_rc} IDNA_ROUNDTRIP_VERIFY_ERROR
The ToASCII operation on the output string does not equal the input.
1082 @end deftypevr
1083
1084 @deftypevr {Return code} {enum Idna_rc} IDNA_CONTAINS_ACE_PREFIX
1085 The input contains the ACE prefix (for ToASCII).
1086 @end deftypevr
1087
1088 @deftypevr {Return code} {enum Idna_rc} IDNA_ICONV_ERROR
1089 Could not convert string in locale encoding.
1090 @end deftypevr
1091
1092 @deftypevr {Return code} {enum Idna_rc} IDNA_MALLOC_ERROR
1093 Could not allocate buffer (this is typically a fatal error).
1094 @end deftypevr
1095
1096 @section Control Flags
1097
1098 The IDNA @code{flags} parameter can take on the following values, or a
1099 bit-wise inclusive or of any subset of the parameters:
1100
@deftypevr {Control flag} {enum Idna_flags} IDNA_ALLOW_UNASSIGNED
1102 Allow unassigned Unicode code points.
1103 @end deftypevr
1104
@deftypevr {Control flag} {enum Idna_flags} IDNA_USE_STD3_ASCII_RULES
1106 Check output to make sure it is a STD3 conforming host name.
1107 @end deftypevr
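
The effect of a flag can be observed by converting the same name with
and without it.  The following is a small sketch under the assumption
that the declarations come from @file{idna.h}.  The helper
@code{converts_ok} is hypothetical and not part of Libidn.

@verbatim
#include <stdlib.h>
#include <idna.h>

/* Hypothetical helper: return nonzero if the UTF-8 domain NAME
   converts to ASCII successfully under FLAGS.  The converted string
   itself is discarded.  */
static int
converts_ok (const char *name, int flags)
{
  char *ace = NULL;
  int rc = idna_to_ascii_8z (name, &ace, flags);

  if (rc == IDNA_SUCCESS)
    free (ace);                 /* output is owned by the caller */
  return rc == IDNA_SUCCESS;
}
@end verbatim

With this helper, a name such as @samp{a_b.example} is expected to
convert when @var{flags} is 0, but not when
@code{IDNA_USE_STD3_ASCII_RULES} is set, because the underscore is not
a letter, digit or hyphen.  Several flags are combined with the
@code{|} operator, for instance
@code{IDNA_ALLOW_UNASSIGNED | IDNA_USE_STD3_ASCII_RULES}.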
1108
1109 @section Prefix String
1110
1111 @deftypevr {Macro} {#define} IDNA_ACE_PREFIX
1112 String with the official IDNA prefix, ``xn--''.
1113 @end deftypevr
1114
1115 @section Core Functions
1116
The idea behind the IDNA function names is as follows: the
@code{idna_to_ascii_4i} and @code{idna_to_unicode_44i} functions are
the core IDNA primitives.  The @code{4} indicates that the function
takes UCS-4 strings (i.e., Unicode code points encoded in a 32-bit
unsigned integer type) of the specified length.  The @code{i} indicates
1122 that the data is written ``inline'' into the buffer.  This means the
1123 caller is responsible for allocating (and deallocating) the string,
1124 and providing the library with the allocated length of the string.
1125 The output length is written in the output length variable.  The
1126 remaining functions all contain the @code{z} indicator, which means
1127 the strings are zero terminated.  All output strings are allocated by
1128 the library, and must be deallocated by the caller.  The @code{4}
1129 indicator again means that the string is UCS-4, the @code{8} means the
1130 strings are UTF-8 and the @code{l} indicator means the strings are
1131 encoded in the encoding used by the current locale.
1132
1133 The functions provided are the following entry points:
1134
1135 @deftypefun {enum Idna_rc} idna_to_ascii_4i (const uint32_t * @var{in}, size_t @var{inlen}, char * @var{out}, {enum Idna_flags} @var{flags})
1136
1137 @var{in}: input array with unicode code points.
1138
1139 @var{inlen}: length of input array with unicode code points.
1140
1141 @var{out}: output zero terminated string that must have room for at
1142 least 63 characters plus the terminating zero.
1143
1144 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1145
1146 The ToASCII operation takes a sequence of Unicode code points that
1147 make up one label and transforms it into a sequence of code points in
1148 the ASCII range (0..7F). If ToASCII succeeds, the original sequence
1149 and the resulting sequence are equivalent labels.
1150
1151 It is important to note that the ToASCII operation can fail. ToASCII
1152 fails if any step of it fails. If any step of the ToASCII operation
1153 fails on any label in a domain name, that domain name MUST NOT be used
as an internationalized domain name. The method for dealing with this
1155 failure is application-specific.
1156
1157 The inputs to ToASCII are a sequence of code points, the
1158 AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of
1159 ToASCII is either a sequence of ASCII code points or a failure
1160 condition.
1161
1162 ToASCII never alters a sequence of code points that are all in the
1163 ASCII range to begin with (although it could fail). Applying the
1164 ToASCII operation multiple times has exactly the same effect as
1165 applying it just once.
1166
1167 Returns 0 on success, or an error code.
1168 @end deftypefun
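
A minimal sketch of calling this primitive on a single label follows.
It assumes the declarations come from @file{idna.h}.  The helper name
@code{label_to_ace} is hypothetical, and clearing the buffer on
failure is a choice made for this example, not library behavior.

@verbatim
#include <stddef.h>
#include <stdint.h>
#include <idna.h>

/* Hypothetical helper: convert one label of N code points to its
   ASCII form.  OUT must have room for 64 bytes: 63 characters plus
   the terminating zero.  */
static int
label_to_ace (const uint32_t *label, size_t n, char out[64])
{
  int rc = idna_to_ascii_4i (label, n, out, 0);

  if (rc != IDNA_SUCCESS)
    out[0] = '\0';              /* a failed label must not be used */
  return rc;
}
@end verbatim

For the code points of ``b@"ucher'' the buffer is expected to hold
@samp{xn--bcher-kva} afterwards, that is, the ACE prefix followed by
the Punycode form of the label.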
1169
1170 @deftypefun {enum Idna_rc} idna_to_unicode_44i (const uint32_t * @var{in}, size_t @var{inlen}, uint32_t * @var{out}, size_t * @var{outlen}, {enum Idna_flags} @var{flags})
1171
1172 @var{in}: input array with unicode code points.
1173
1174 @var{inlen}: length of input array with unicode code points.
1175
1176 @var{out}: output array with unicode code points.
1177
1178 @var{outlen}: on input, maximum size of output array with unicode code
1179 points, on exit, actual size of output array with unicode code points.
1180
1181 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1182
1183 The ToUnicode operation takes a sequence of Unicode code points that
1184 make up one label and returns a sequence of Unicode code points. If
1185 the input sequence is a label in ACE form, then the result is an
1186 equivalent internationalized label that is not in ACE form, otherwise
1187 the original sequence is returned unaltered.
1188
1189 ToUnicode never fails. If any step fails, then the original input
1190 sequence is returned immediately in that step.
1191
1192 The ToUnicode output never contains more code points than its input.
1193 Note that the number of octets needed to represent a sequence of code
1194 points depends on the particular character encoding used.
1195
1196 The inputs to ToUnicode are a sequence of code points, the
1197 AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of
1198 ToUnicode is always a sequence of Unicode code points.
1199
Returns an error condition, but it must only be used for debugging
1201 purposes.  The output buffer is always guaranteed to contain the
1202 correct data according to the specification (sans malloc induced
1203 errors).  NB!  This means that you normally ignore the return code
1204 from this function, as checking it means breaking the standard.
1205 @end deftypefun
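
The calling convention can be sketched as follows, assuming the
declarations come from @file{idna.h}.  The helper name
@code{ace_to_label} is hypothetical.  In line with the note above, the
return code is deliberately ignored.

@verbatim
#include <stddef.h>
#include <stdint.h>
#include <idna.h>

/* Hypothetical helper: run ToUnicode on one label of INLEN code
   points.  OUT has room for OUTCAP code points.  Returns the number
   of code points stored in OUT.  */
static size_t
ace_to_label (const uint32_t *in, size_t inlen,
              uint32_t *out, size_t outcap)
{
  size_t outlen = outcap;       /* in: capacity, out: length used */

  /* ToUnicode never fails, so the return code is not checked.  */
  (void) idna_to_unicode_44i (in, inlen, out, &outlen, 0);
  return outlen;
}
@end verbatim

Because the ToUnicode output never contains more code points than its
input, an output array with room for @var{inlen} code points is
sufficient.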
1206
1207 @section Simplified ToASCII Interface
1208
1209 @deftypefun {enum Idna_rc} idna_to_ascii_4z (const uint32_t * @var{input}, char ** @var{output}, {enum Idna_flags} @var{flags})
1210
1211 @var{input}: zero terminated input Unicode string.
1212
1213 @var{output}: pointer to newly allocated output string.
1214
1215 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1216
1217 Convert UCS-4 domain name to ASCII string.  The domain name may
1218 contain several labels, separated by dots.  The output buffer must be
1219 deallocated by the caller.
1220
1221 Returns IDNA_SUCCESS on success, or error code.
1222 @end deftypefun
1223
1224 @deftypefun {enum Idna_rc} idna_to_ascii_8z (const char * @var{input}, char ** @var{output}, {enum Idna_flags} @var{flags})
1225
1226 @var{input}: zero terminated input UTF-8 string.
1227
1228 @var{output}: pointer to newly allocated output string.
1229
1230 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1231
1232 Convert UTF-8 domain name to ASCII string.  The domain name may
1233 contain several labels, separated by dots.  The output buffer must
1234 be deallocated by the caller.
1235
1236 Returns IDNA_SUCCESS on success, or error code.
1237 @end deftypefun
1238
1239 @deftypefun {enum Idna_rc} idna_to_ascii_lz (const char * @var{input}, char ** @var{output}, {enum Idna_flags} @var{flags})
1240
@var{input}: zero terminated input string encoded in the current
locale's character set.
1242
1243 @var{output}: pointer to newly allocated output string.
1244
1245 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1246
1247 Convert domain name in the locale's encoding to ASCII string.  The
1248 domain name may contain several labels, separated by dots.  The output
1249 buffer must be deallocated by the caller.
1250
1251 Returns IDNA_SUCCESS on success, or error code.
1252 @end deftypefun
1253
1254 @section Simplified ToUnicode Interface
1255
1256 @deftypefun {enum Idna_rc} idna_to_unicode_4z4z (const uint32_t * @var{input}, uint32_t ** @var{output}, {enum Idna_flags} @var{flags})
1257
1258 @var{input}: zero-terminated Unicode string.
1259
1260 @var{output}: pointer to newly allocated output Unicode string.
1261
1262 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1263
1264 Convert possibly ACE encoded domain name in UCS-4 format into a UCS-4
1265 string.  The domain name may contain several labels, separated by
1266 dots.  The output buffer must be deallocated by the caller.
1267
1268 Returns IDNA_SUCCESS on success, or error code.
1269 @end deftypefun
1270
1271 @deftypefun {enum Idna_rc} idna_to_unicode_8z4z (const char * @var{input}, uint32_t ** @var{output}, {enum Idna_flags} @var{flags})
1272
1273 @var{input}: zero-terminated UTF-8 string.
1274
1275 @var{output}: pointer to newly allocated output Unicode string.
1276
1277 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1278
1279 Convert possibly ACE encoded domain name in UTF-8 format into a UCS-4
1280 string.  The domain name may contain several labels, separated by
1281 dots.  The output buffer must be deallocated by the caller.
1282
1283 Returns IDNA_SUCCESS on success, or error code.
1284 @end deftypefun
1285
1286 @deftypefun {enum Idna_rc} idna_to_unicode_8z8z (const char * @var{input}, char ** @var{output}, {enum Idna_flags} @var{flags})
1287
1288 @var{input}: zero-terminated UTF-8 string.
1289
1290 @var{output}: pointer to newly allocated output UTF-8 string.
1291
1292 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1293
1294 Convert possibly ACE encoded domain name in UTF-8 format into a UTF-8
1295 string.  The domain name may contain several labels, separated by
1296 dots.  The output buffer must be deallocated by the caller.
1297
1298 Returns IDNA_SUCCESS on success, or error code.
1299 @end deftypefun
1300
1301 @deftypefun {enum Idna_rc} idna_to_unicode_8zlz (const char * @var{input}, char ** @var{output}, {enum Idna_flags} @var{flags})
1302
1303 @var{input}: zero-terminated UTF-8 string.
1304
1305 @var{output}: pointer to newly allocated output string encoded in the
1306 current locale's character set.
1307
1308 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1309
1310 Convert possibly ACE encoded domain name in UTF-8 format into a string
1311 encoded in the current locale's character set.  The domain name may
1312 contain several labels, separated by dots.  The output buffer must be
1313 deallocated by the caller.
1314
1315 Returns IDNA_SUCCESS on success, or error code.
1316 @end deftypefun
1317
1318 @deftypefun {enum Idna_rc} idna_to_unicode_lzlz (const char * @var{input}, char ** @var{output}, {enum Idna_flags} @var{flags})
1319
1320 @var{input}: zero-terminated string encoded in the current locale's
1321 character set.
1322
1323 @var{output}: pointer to newly allocated output string encoded in the
1324 current locale's character set.
1325
1326 @var{flags}: IDNA flags, e.g. IDNA_ALLOW_UNASSIGNED.
1327
1328 Convert possibly ACE encoded domain name in the locale's character set
1329 into a string encoded in the current locale's character set.  The
1330 domain name may contain several labels, separated by dots.  The output
1331 buffer must be deallocated by the caller.
1332
1333 Returns IDNA_SUCCESS on success, or error code.
1334 @end deftypefun
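
The simplified interfaces can be combined as in the following sketch,
which converts a UTF-8 domain name to ASCII and back.  It assumes the
declarations come from @file{idna.h}.  The helper name
@code{domain_roundtrip} is hypothetical and not part of Libidn.

@verbatim
#include <stdlib.h>
#include <idna.h>

/* Hypothetical helper: convert the UTF-8 domain NAME to ASCII and
   back.  On success *ACE and *UTF8 are newly allocated and must be
   freed by the caller; on failure both are set to NULL.  */
static int
domain_roundtrip (const char *name, char **ace, char **utf8)
{
  int rc;

  *ace = *utf8 = NULL;
  rc = idna_to_ascii_8z (name, ace, 0);
  if (rc != IDNA_SUCCESS)
    {
      *ace = NULL;
      return rc;
    }

  rc = idna_to_unicode_8z8z (*ace, utf8, 0);
  if (rc != IDNA_SUCCESS)
    {
      free (*ace);
      *ace = *utf8 = NULL;
    }
  return rc;
}
@end verbatim

For the input ``b@"ucher.example'' the ASCII form is expected to be
@samp{xn--bcher-kva.example}.  Labels that are already plain ASCII,
such as @samp{example}, pass through unchanged.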
1335
1336
1337 @c **********************************************************
1338 @c ***********************  Examples  ***********************
1339 @c **********************************************************
1340 @node Examples
1341 @chapter Examples
1342 @cindex Examples
1343
This chapter contains example code which illustrates how Libidn can
be used when writing your own application.
1346
1347 @menu
1348 * Example 1::           Example using stringprep.
1349 * Example 2::           Example using punycode.
1350 * Example 3::           Example using IDNA ToASCII.
1351 * Example 4::           Example using IDNA ToUnicode.
1352 @end menu
1353
1354 @node Example 1
1355 @section Example 1
1356
1357 This example demonstrates how the stringprep functions are used.
1358
1359 @example
1360 @include example.c.texi
1361 @end example
1362
1363
1364 @node Example 2
1365 @section Example 2
1366
1367 This example demonstrates how the punycode functions are used.
1368
1369 @example
1370 @include example2.c.texi
1371 @end example
1372
1373
1374 @node Example 3
1375 @section Example 3
1376
1377 This example demonstrates how the library is used to convert
1378 internationalized domain names into ASCII compatible names.
1379
1380 @example
1381 @include example3.c.texi
1382 @end example
1383
1384
1385 @node Example 4
1386 @section Example 4
1387
1388 This example demonstrates how the library is used to convert ASCII
1389 compatible names to internationalized domain names.
1390
1391 @example
1392 @include example4.c.texi
1393 @end example
1394
1395 @c **********************************************************
1396 @c *********************  Invoking idn  *********************
1397 @c **********************************************************
1398 @node Invoking idn
1399 @chapter Invoking idn
1400
1401 @pindex idn
1402 @cindex invoking @command{idn}
1403 @cindex command line
1404
1405 @majorheading Name
1406
1407 GNU Libidn (idn) -- Internationalized Domain Names command line tool
1408
1409 @majorheading Description
@code{idn} is a utility that is part of GNU Libidn.  It allows preparation of
1411 strings, encoding and decoding of punycode data, and IDNA
1412 ToASCII/ToUnicode operations to be performed on the command line,
1413 without the need to write a program that uses libidn.
1414
Data is read, line by line, from the standard input, and one of the
operations indicated by command parameters is performed and the
output is printed to standard output.  If any errors are encountered,
the execution of the application is aborted.
1419
1420 @majorheading Options
@code{idn} recognizes these options:
1422
1423 @verbatim
1424        -h  --help
1425               Print help and exit
1426
1427        -V  --version
1428               Print version and exit
1429
       -s  --stringprep
1431               Prepare string according to nameprep profile
1432
1433        -e  --punycode-encode
1434               Encode UTF-8 to Punycode
1435
1436        -d  --punycode-decode
1437               Decode Punycode to UTF-8
1438
1439        -a  --idna-to-ascii
1440               Convert UTF-8 to ACE according to IDNA
1441
1442        -u  --idna-to-unicode
1443               Convert ACE to UTF-8 according to IDNA
1444
1445        --allow-unassigned
1446               Toggle IDNA AllowUnassigned flag (default=off)
1447
1448        --usestd3asciirules
1449               Toggle IDNA UseSTD3ASCIIRules flag (default=off)
1450
1451        -pSTRING   --profile=STRING
1452               Use specified stringprep profile instead
1453
1454               Valid stringprep profiles are 'generic', 'Nameprep',
1455               'KRBprep', 'Nodeprep', 'Resourceprep', 'plain',
1456               'SASLprep', and 'ISCSIprep'.
1457
1458        --debug
1459               Print debugging information (default=off)
1460
1461        --quiet
1462               Don't print the welcome greeting (default=off)
1463 @end verbatim
1464
1465 @majorheading Environment Variables
1466
The @var{CHARSET} environment variable can be used to override which
character set is used for decoding incoming data on the standard
input, and for encoding data to the standard output.  If your system is
set up correctly, the application will guess which character set is
used automatically.  Example usage:
1472
1473 @verbatim
1474 $ CHARSET=ISO-8859-1 idn --punycode-encode
1475 ...
1476 @end verbatim
1477
1478 @node Emacs API
1479 @chapter Emacs API
1480
Included in Libidn are @file{punycode.el} and @file{idna.el}, which
provide an Emacs Lisp API to (a limited set of) the Libidn API.  This
chapter describes the API.
1484
1485 @defvar punycode-program
1486 Name of the GNU Libidn @file{idn} application.  The default is
1487 @samp{idn}.  This variable can be customized.
1488 @end defvar
1489
1490 @defvar punycode-environment
1491 List of environment variable definitions prepended to
1492 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1493 This variable can be customized.
1494 @end defvar
1495
1496 @defvar punycode-encode-parameters
1497 List of parameters passed to @var{punycode-program} to invoke punycode
1498 encoding mode.  The default is @samp{("--quiet" "--punycode-encode")}.
1499 This variable can be customized.
1500 @end defvar
1501
1502 @defvar punycode-decode-parameters
1503 Parameters passed to @var{punycode-program} to invoke punycode
1504 decoding mode.  The default is @samp{("--quiet" "--punycode-decode")}.
1505 This variable can be customized.
1506 @end defvar
1507
1508 @defun punycode-encode string
1509 Returns a Punycode encoding of the @var{string}, after converting the
1510 input into UTF-8.
1511 @end defun
1512
1513 @defun punycode-decode string
Returns a possibly multibyte string which is the decoding of
@var{string}, a Punycode encoded string.
1516 @end defun
1517
1518 @defvar idna-program
1519 Name of the GNU Libidn @file{idn} application.  The default is
1520 @samp{idn}.  This variable can be customized.
1521 @end defvar
1522
1523 @defvar idna-environment
1524 List of environment variable definitions prepended to
1525 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1526 This variable can be customized.
1527 @end defvar
1528
1529 @defvar idna-to-ascii-parameters
1530 List of parameters passed to @var{idna-program} to invoke IDNA ToASCII
1531 mode.  The default is @samp{("--quiet" "--idna-to-ascii")}.  This
1532 variable can be customized.
1533 @end defvar
1534
1535 @defvar idna-to-unicode-parameters
Parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
1537 The default is @samp{("--quiet" "--idna-to-unicode")}.  This variable
1538 can be customized.
1539 @end defvar
1540
1541 @defun idna-to-ascii string
1542 Returns an ASCII Compatible Encoding (ACE) of the string computed by
1543 the IDNA ToASCII operation on the input @var{string}, after converting
1544 the input to UTF-8.
1545 @end defun
1546
1547 @defun idna-to-unicode string
1548 Returns a possibly multibyte string which is the output of the IDNA
1549 ToUnicode operation computed on the input @var{string}.
1550 @end defun
1551
1552 @c **********************************************************
1553 @c *******************  Acknowledgements  *******************
1554 @c **********************************************************
1555 @node Acknowledgements
1556 @chapter Acknowledgements
1557
1558 The punycode code was taken from the IETF IDN Punycode specification,
1559 by Adam M. Costello.
1560
Some functions (see nfkc.c and toutf8.c) have been borrowed from GLib,
downloaded from www.gtk.org.
1563
1564 Several people reported bugs, sent patches or suggested improvements,
1565 see the file THANKS.
1566
1567 @node Concept Index
1568 @unnumbered Concept Index
1569
1570 @printindex cp
1571
1572 @node Function and Variable Index
1573 @unnumbered Function and Variable Index
1574
1575 @printindex fn
1576
1577 @include lgpl.texi
1578
1579 @node Copying This Manual
1580 @appendix Copying This Manual
1581
1582 @menu
1583 * GNU Free Documentation License::  License for copying this manual.
1584 @end menu
1585
1586 @include fdl.texi
1587
1588 @bye