doc/libidn.texi

   1 \input texinfo   @c -*- mode: texinfo; coding: us-ascii; -*-
   2 @c This file is part of GNU Libidn.
   3 @c See below for copyright and license.
   4
   5 @setfilename libidn.info
   6 @include version.texi
   7 @settitle GNU Libidn
   8 @finalout
   9
  10 @syncodeindex pg cp
  11
  12 @copying
  13 This manual is last updated @value{UPDATED} for version
  14 @value{VERSION} of GNU Libidn.
  15
  16 Copyright @copyright{} 2002, 2003, 2004, 2005, 2006 Simon Josefsson.
  17
  18 @quotation
  19 Permission is granted to copy, distribute and/or modify this document
  20 under the terms of the GNU Free Documentation License, Version 1.2 or
  21 any later version published by the Free Software Foundation; with the
  22 Invariant Sections being ``Commercial Support'', no Front-Cover Texts,
  23 and no Back-Cover Texts.  A copy of the license is included in the
  24 section entitled ``GNU Free Documentation License''.
  25 @end quotation
  26 @end copying
  27
  28 @dircategory GNU Libraries
  29 @direntry
  30 * libidn: (libidn).     Internationalized string processing library.
  31 @end direntry
  32
  33 @dircategory GNU utilities
  34 @direntry
  35 * idn: (libidn)Invoking idn.            Command line interface to GNU Libidn.
  36 @end direntry
  37
  38 @dircategory Emacs
  39 @direntry
  40 * IDN Library: (libidn)Emacs API.       Emacs API for IDN functions.
  41 @end direntry
  42
  43 @titlepage
  44 @title GNU Libidn
  45 @subtitle Internationalized string processing for the GNU system
  46 @subtitle for version @value{VERSION}, @value{UPDATED}
  47 @author Simon Josefsson
  48 @page
  49 @vskip 0pt plus 1filll
  50 @insertcopying
  51 @end titlepage
  52
  53 @contents
  54
  55 @ifnottex
  56 @node Top
  57 @top GNU Libidn
  58
  59 @insertcopying
  60 @end ifnottex
  61
  62 @menu
  63 * Introduction::                How to use this manual.
  64 * Preparation::                 What you should do before using the library.
  65 * Utility Functions::           Unicode transformation utility functions.
  66 * Stringprep Functions::        Stringprep functions.
  67 * Punycode Functions::          Punycode functions.
  68 * IDNA Functions::              IDNA functions.
  69 * TLD Functions::               TLD functions.
  70 * PR29 Functions::              Detect strings non-idempotent under NFKC.
  71 * Examples::                    Demonstrate how to use the library.
  72 * Invoking idn::                Command line interface to the library.
  73 * Emacs API::                   Emacs Lisp API for Libidn.
  74 * Java API::                    Notes on the Java port of Libidn.
  75 * C# API::                      Notes on the C# port of Libidn.
  76 * Acknowledgements::            Whom to blame.
  77 * History::                     Rough outline of development history.
  78
  79 Appendices
  80
  81 * PR29 discussion::             Implementation aspects of the PR29 flaw.
  82 * Copying Information::         License text covering the Libidn library.
  83
  84 Indices
  85
  86 * Function and Variable Index::
  87 * Concept Index::
  88
  89 @end menu
  90
  91
  92 @node Introduction
  93 @chapter Introduction
  94
  95 GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
  96 specifications defined by the IETF Internationalized Domain Names
  97 (IDN) working group, used for internationalized domain names.  The C
  98 library is available under the GNU Lesser General Public License
  99 (@pxref{GNU LGPL}).
 100
 101 The library contains a generic Stringprep implementation that does
 102 Unicode 3.2 NFKC normalization, mapping and prohibition of
 103 characters, and bidirectional character handling.  Profiles for
 104 Nameprep, iSCSI, SASL and XMPP are included.  Punycode and ASCII
 105 Compatible Encoding (ACE) via IDNA are supported.  A mechanism to
 106 define Top-Level Domain (TLD) specific validation tables, and to
 107 compare strings against those tables, is included.  Default tables for
 108 some TLDs are also included.
 109
 110 The Stringprep API consists of two main functions, one for converting
 111 data from the system's native representation into UTF-8, and one
 112 function to perform the Stringprep processing.  Adding a new
 113 Stringprep profile for your application within the API is
 114 straightforward.  The Punycode API consists of one encoding function
 115 and one decoding function.  The IDNA API consists of the ToASCII and
 116 ToUnicode functions, as well as a high-level interface for converting
 117 entire domain names to and from the ACE encoded form.  The TLD API
 118 consists of one set of functions to extract the TLD name from a domain
 119 string, one set of functions to locate the proper TLD table to use
 120 based on the TLD name, and core functions to validate a string against
 121 a TLD table, and some utility wrappers to perform all the steps in one
 122 call.
 123
 124 The library is used by, e.g., GNU SASL and Shishi to process user
 125 names and passwords.  Libidn can be built into GNU Libc to enable a
 126 new system-wide getaddrinfo flag for IDN processing.
 127
 128 Libidn is developed for the GNU/Linux system, but runs on over 20 Unix
 129 platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
 130 Libidn is written in C and (parts of) the API is accessible from C,
 131 C++, Emacs Lisp, Python and Java.  Native Java and C# ports are also
 132 provided, licensed under the GNU General Public License (@pxref{GNU
 133 GPL}).
 134
 135 @menu
 136 * Getting Started::
 137 * Features::
 138 * Library Overview::
 139 * Supported Platforms::
 140 * Getting help::
 141 * Commercial Support::
 142 * Downloading and Installing::
 143 * Bug Reports::
 144 * Contributing::
 145 @end menu
 146
 147 @node Getting Started
 148 @section Getting Started
 149
 150 This manual documents the library programming interface.  All
 151 functions and data types provided by the library are explained.
 152 Included are also examples, and documentation for the command line
 153 tool @file{idn} that provides a quick interface to the library.  The
 154 Emacs Lisp bindings for the library are also discussed.
 155
 156 The reader is assumed to possess basic familiarity with
 157 internationalization concepts and network programming in C or C++.
 158
 159 This manual can be used in several ways.  If read from the beginning
 160 to the end, it gives a good introduction to the library and how it
 161 can be used in an application.  Forward references are included where
 162 necessary.  Later on, the manual can be used as a reference manual to
 163 get just the information needed about any particular interface of the
 164 library.  Experienced programmers might want to start looking at the
 165 examples at the end of the manual (@pxref{Examples}), and then only
 166 read up on those parts of the interface that are unclear.
 167
 168 @node Features
 169 @section Features
 170
 171 This library might have a couple of advantages over other libraries
 172 doing a similar job.
 173
 174 @table @asis
 175 @item It's Free Software
 176 Anybody can use, modify, and redistribute it under the terms of the
 177 GNU Lesser General Public License (@pxref{GNU LGPL}).
 178
 179 @item It's thread-safe
 180 No global state is kept in the library.  All functions are reentrant.
 181
 182 @item It's portable
 183 The code is intended to be written in pure ANSI C89.  It has been
 184 tested on many Unix-like operating systems, and on Windows.
 185
 186 @item It's modularized
 187 The library is composed of several modules, and the only interaction
 188 between modules is through each module's public API.  If you only need
 189 one piece of functionality, it is possible to take the files you need
 190 and incorporate them into your own project.
 191
 192 @item It's not bloated
 193 The design of the library is based on the smallest API necessary to
 194 implement the basic functionality.  It has been carefully extended
 195 with a small number of high-level wrappers to make it comfortable to
 196 use the library.  However, it does not implement additional
 197 functionality just for the sake of completeness.
 198
 199 @item It's documented
 200 Sadly, not all software comes with documentation these days.  This one
 201 does.
 202
 203 @end table
 204
 205 @node Library Overview
 206 @section Library Overview
 207
 208 The following illustration shows the components that make up Libidn,
 209 and how your application relates to the library.  In the illustration,
 210 various components are shown as boxes.  You see the generic StringPrep
 211 component, the various StringPrep profiles including Nameprep, the
 212 Punycode component, the IDNA component, and the TLD component.  The
 213 arrows indicate aggregation, e.g., IDNA uses Punycode and Nameprep,
 214 and in turn Nameprep uses the generic StringPrep interface.  The
 215 interfaces to all components are available for applications; no
 216 component within the library is hidden from the application.
 217
 218 @image{components}
 219
 220 @node Supported Platforms
 221 @section Supported Platforms
 222
 223 Libidn has at some point in time been tested on the following
 224 platforms.
 225
 226 @enumerate
 227
 228 @item Debian GNU/Linux 3.0 (Woody)
 229 @cindex Debian
 230
 231 GCC 2.95.4 and GNU Make. This is the main development platform.
 232 @code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
 233 @code{arm-unknown-linux-gnu}, @code{armv4l-unknown-linux-gnu},
 234 @code{hppa-unknown-linux-gnu}, @code{hppa64-unknown-linux-gnu},
 235 @code{i686-pc-linux-gnu}, @code{ia64-unknown-linux-gnu},
 236 @code{m68k-unknown-linux-gnu}, @code{mips-unknown-linux-gnu},
 237 @code{mipsel-unknown-linux-gnu}, @code{powerpc-unknown-linux-gnu},
 238 @code{s390-ibm-linux-gnu}, @code{sparc-unknown-linux-gnu},
 239 @code{sparc64-unknown-linux-gnu}.
 240
 241 @item Debian GNU/Linux 2.1
 242 @cindex Debian
 243
 244 GCC 2.95.1 and GNU Make. @code{armv4l-unknown-linux-gnu}.
 245
 246 @item Tru64 UNIX
 247 @cindex Tru64
 248
 249 Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
 250 @code{alphaev68-dec-osf5.1}.
 251
 252 @item SuSE Linux 7.1
 253 @cindex SuSE
 254
 255 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
 256 @code{alphaev67-unknown-linux-gnu}.
 257
 258 @item SuSE Linux 7.2a
 259 @cindex SuSE Linux
 260
 261 GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
 262
 263 @item SuSE Linux
 264 @cindex SuSE Linux
 265
 266 GCC 3.2.2 and GNU Make.  @code{x86_64-unknown-linux-gnu} (AMD64
 267 Opteron ``Melody'').
 268
 269 @item SuSE Enterprise Server 9 on IBM OpenPower 720
 270 @cindex SuSE Linux
 271 @cindex OpenPower 720
 272
 273 GCC 3.3.3 and GNU Make.  @code{powerpc64-unknown-linux-gnu}.
 274
 275 @item RedHat Linux 7.2
 276 @cindex RedHat
 277
 278 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
 279 @code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
 280
 281 @item RedHat Linux 8.0
 282 @cindex RedHat
 283
 284 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
 285
 286 @item RedHat Advanced Server 2.1
 287 @cindex RedHat Advanced Server
 288
 289 GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
 290
 291 @item Slackware Linux 8.0.01
 292 @cindex Slackware
 293
 294 GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
 295
 296 @item Mandrake Linux 9.0
 297 @cindex Mandrake
 298
 299 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
 300
 301 @item IRIX 6.5
 302 @cindex IRIX
 303
 304 MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
 305
 306 @item AIX 4.3.2
 307 @cindex AIX
 308
 309 IBM C for AIX compiler, AIX Make.  @code{rs6000-ibm-aix4.3.2.0}.
 310
 311 @item Microsoft Windows 2000 (Cygwin)
 312 @cindex Windows
 313
 314 GCC 3.2 and GNU Make. @code{i686-pc-cygwin}.
 315
 316 @item HP-UX 11
 317 @cindex HP-UX
 318
 319 HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
 320 @code{hppa2.0w-hp-hpux11.11}.
 321
 322 @item SUN Solaris 2.7
 323 @cindex Solaris
 324
 325 GCC 3.0.4 and GNU Make. @code{sparc-sun-solaris2.7}.
 326
 327 @item SUN Solaris 2.8
 328 @cindex Solaris
 329
 330 Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
 331
 332 @item SUN Solaris 2.9
 333 @cindex Solaris
 334
 335 Sun Forte Developer 7 C compiler and GNU
 336 Make. @code{sparc-sun-solaris2.9}.
 337
 338 @item NetBSD 1.6
 339 @cindex NetBSD
 340
 341 GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
 342 @code{i386-unknown-netbsdelf1.6}.
 343
 344 @item OpenBSD 3.1 and 3.2
 345 @cindex OpenBSD
 346
 347 GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
 348 @code{i386-unknown-openbsd3.1}.
 349
 350 @item FreeBSD 4.7 and 4.8
 351 @cindex FreeBSD
 352
 353 GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
 354 @code{alpha-unknown-freebsd4.8}, @code{i386-unknown-freebsd4.7},
 355 @code{i386-unknown-freebsd4.8}.
 356
 357 @item MacOS X 10.2 Server Edition
 358 @cindex MacOS X
 359
 360 GCC 3.1 and GNU Make. @code{powerpc-apple-darwin6.5}.
 361
 362 @item MacOS X 10.4 ``Tiger'' with Xcode 2.0
 363 @cindex MacOS X
 364
 365 GCC 4.0 and GNU Make. @code{powerpc-apple-darwin8.0}.
 366
 367 @item Cross compiled to uClinux/uClibc on Motorola Coldfire
 368 @cindex Motorola Coldfire
 369 @cindex uClinux
 370 @cindex uClibc
 371
 372 GCC 3.4 and GNU Make. @code{m68k-uclinux-elf}.
 373
 374 @item Cross compiled to ARM using Glibc
 375 @cindex ARM
 376
 377 GCC 2.95 and GNU Make. @code{arm-linux}.
 378
 379 @item Cross compiled to Mingw32
 380 @cindex Windows
 381 @cindex Microsoft
 382 @cindex mingw32
 383
 384 GCC 3.4.4 and GNU Make. @code{i586-mingw32msvc}.
 385
 386 @end enumerate
 387
 388 If you use Libidn on, or port Libidn to, a new platform, please report
 389 it to the author.
 390
 391 @node Getting help
 392 @section Getting help
 393
 394 A mailing list where users of Libidn may help each other exists, and
 395 you can reach it by sending e-mail to @email{help-libidn@@gnu.org}.
 396 Archives of the mailing list discussions, and an interface to manage
 397 subscriptions, are available through the World Wide Web at
 398 @url{http://lists.gnu.org/mailman/listinfo/help-libidn}.
 399
 400 @node Commercial Support
 401 @section Commercial Support
 402
 403 Commercial support is available for users of GNU Libidn.  The kind of
 404 support that can be purchased may include:
 405
 406 @itemize
 407
 408 @item Implement new features.
 409 Such as country code specific profiling to support a restricted subset
 410 of Unicode.
 411
 412 @item Port Libidn to new platforms.
 413 This could include porting Libidn to embedded platforms that may
 414 need memory or size optimization.
 415
 416 @item Integrating IDN support in your existing project.
 417
 418 @item System design of components related to IDN.
 419
 420 @end itemize
 421
 422 If you are interested, please write to:
 423
 424 @verbatim
 425 Simon Josefsson Datakonsult
 426 Hagagatan 24
 427 113 47 Stockholm
 428 Sweden
 429
 430 E-mail: simon@josefsson.org
 431 @end verbatim
 432
 433 If your company provides support related to GNU Libidn and would like
 434 to be mentioned here, contact the author (@pxref{Bug Reports}).
 435
 436 @node Downloading and Installing
 437 @section Downloading and Installing
 438 @cindex Installation
 439 @cindex Download
 440
 441 The package can be downloaded from several places, including:
 442
 443 @url{http://josefsson.org/libidn/releases/}
 444
 445 The latest version is stored in a file, e.g.,
 446 @samp{libidn-@value{VERSION}.tar.gz} where the @samp{@value{VERSION}}
 447 value is the highest version number in the directory.
 448
 449 The package is then extracted, configured and built like many other
 450 packages that use Autoconf.  For detailed information on configuring
 451 and building it, refer to the @file{INSTALL} file that is part of the
 452 distribution archive.
 453
 454 Here is an example terminal session that downloads, configures, builds
 455 and installs the package.  You will need a few basic tools, such as
 456 @samp{sh}, @samp{make} and @samp{cc}.
 457
 458 @example
 459 $ wget -q http://josefsson.org/libidn/releases/libidn-@value{VERSION}.tar.gz
 460 $ tar xfz libidn-@value{VERSION}.tar.gz
 461 $ cd libidn-@value{VERSION}/
 462 $ ./configure
 463 ...
 464 $ make
 465 ...
 466 $ make install
 467 ...
 468 @end example
 469
 470 After that Libidn should be properly installed and ready for use.
 471
 472 A few @code{configure} options may be relevant, summarized in the
 473 table.
 474
 475 @table @code
 476
 477 @item --enable-java
 478 Build the Java port into a *.JAR file.  @xref{Java API}, for more
 479 information.
 480
 481 @item --disable-tld
 482 Disable the TLD module.  This would typically only be useful if you
 483 are building on a memory-restricted platform.  @xref{TLD Functions},
 484 for more information.
 485
 486 @item --enable-csharp[=IMPL]
 487 Build the C# port into a *.DLL file.  @xref{C# API}, for more
 488 information.  Here, @code{IMPL} is @code{pnet} or @code{mono},
 489 indicating whether the PNET @command{cscc} compiler or the Mono
 490 @command{mcs} compiler should be used, respectively.
 491
 492 @end table
 493
 494 For the complete list, refer to the output from @code{configure
 495 --help}.
 496
 497 @node Bug Reports
 498 @section Bug Reports
 499 @cindex Reporting Bugs
 500
 501 If you think you have found a bug in Libidn, please investigate it and
 502 report it.
 503
 504 @itemize @bullet
 505
 506 @item Please make sure that the bug is really in Libidn, and
 507 preferably also check that it hasn't already been fixed in the latest
 508 version.
 509
 510 @item You have to send us a test case that makes it possible for us to
 511 reproduce the bug.
 512
 513 @item You also have to explain what is wrong; if you get a crash, or
 514 if the results printed are not good and in that case, in what way.
 515 Make sure that the bug report includes all information you would need
 516 to fix this kind of bug for someone else.
 517
 518 @end itemize
 519
 520 Please make an effort to produce a self-contained report, with
 521 something definite that can be tested or debugged.  Vague queries or
 522 piecemeal messages are difficult to act on and don't help the
 523 development effort.
 524
 525 If your bug report is good, we will do our best to help you to get a
 526 corrected version of the software; if the bug report is poor, we won't
 527 do anything about it (apart from asking you to send better bug
 528 reports).
 529
 530 If you think something in this manual is unclear, or downright
 531 incorrect, or if the language needs to be improved, please also send a
 532 note.
 533
 534 Send your bug report to:
 535
 536 @center @samp{bug-libidn@@gnu.org}
 537
 538
 539 @node Contributing
 540 @section Contributing
 541 @cindex Contributing
 542 @cindex Hacking
 543
 544 If you want to submit a patch for inclusion -- from fixing a typo you
 545 discovered, up to adding support for a new feature -- you should
 546 submit it as a bug report (@pxref{Bug Reports}).  There are some
 547 things that you can do to increase the chances for it to be included
 548 in the official package.
 549
 550 Unless your patch is very small (say, under 10 lines) we require that
 551 you assign the copyright of your work to the Free Software Foundation.
 552 This is to protect the freedom of the project.  If you have not
 553 already signed papers, we will send you the necessary information when
 554 you submit your contribution.
 555
 556 For contributions that don't consist of actual programming code, the
 557 only guidelines are common sense.  Use it.
 558
 559 For code contributions, a number of style guides will help you:
 560
 561 @itemize @bullet
 562
 563 @item Coding Style.
 564 Follow the GNU Standards document (@pxref{top, GNU Coding Standards,,
 565 standards}).
 566
 567 If you normally code using another coding standard, there is no
 568 problem, but you should use @samp{indent} to reformat the code
 569 (@pxref{top, GNU Indent,, indent}) before submitting your work.
 570
 571 @item Use the unified diff format @samp{diff -u}.
 572
 573 @item Return errors.
 574 No reason whatsoever should abort the execution of the library.  Even
 575 memory allocation errors, e.g., when @code{malloc} returns @code{NULL},
 576 should be handled and reported through an error code.
 577
 578 @item Design with thread safety in mind.
 579 Don't use global variables and the like.
 580
 581 @item Avoid using the C math library.
 582 It causes problems for embedded implementations, and in most
 583 situations it is very easy to avoid using it.
 584
 585 @item Document your functions.
 586 Use comments before each function header, that, if properly
 587 formatted, are extracted into GTK-DOC web pages.  Don't forget to
 588 update the Texinfo manual as well.
 589
 590 @item Supply a ChangeLog and NEWS entries, where appropriate.
 591
 592 @end itemize
 593
 594 @c **********************************************************
 595 @c *******************  Preparation  ************************
 596 @c **********************************************************
 597 @node Preparation
 598 @chapter Preparation
 599
 600 To use `Libidn', you have to perform some changes to your sources and
 601 the build system.  The necessary changes are small and explained in
 602 the following sections.  At the end of this chapter, it is described
 603 how the library is initialized, and how the requirements of the
 604 library are verified.
 605
 606 A faster way to find out how to adapt your application for use with
 607 `Libidn' may be to look at the examples at the end of this manual
 608 (@pxref{Examples}).
 609
 610 @menu
 611 * Header::
 612 * Initialization::
 613 * Version Check::
 614 * Building the source::
 615 * Autoconf tests::
 616 @end menu
 617
 618 @node Header
 619 @section Header
 620
 621 The library contains a few independent parts, and each part exports the
 622 interfaces (data types and functions) in a header file.  You must
 623 include the appropriate header files in all programs using the
 624 library, either directly or through some other header file, like this:
 625
 626 @example
 627 #include <stringprep.h>
 628 @end example
 629
 630 The header files and the functions they define are categorized as
 631 follows:
 632
 633 @table @asis
 634 @item stringprep.h
 635
 636 The low-level stringprep API entry point.  For IDN applications, this
 637 is usually invoked via IDNA. Some applications, specifically non-IDN
 638 ones, may want to prepare strings directly though, and should include
 639 this header file.
 640
 641 The name space of the stringprep part of Libidn is @code{stringprep*}
 642 for function names, @code{Stringprep*} for data types and
 643 @code{STRINGPREP_*} for other symbols.  In addition,
 644 @code{_stringprep*} is reserved for internal use and should never be
 645 used by applications.
 646
 647 @item punycode.h
 648
 649 The entry point to Punycode encoding and decoding functions.  Normally
 650 punycode is used via the idna.h interface, but some applications may
 651 want to perform raw punycode operations.
 652
 653 The name space of the punycode part of Libidn is @code{punycode_*} for
 654 function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
 655 for other symbols.  In addition, @code{_punycode*} is reserved for
 656 internal use and should never be used by applications.
 657 @item idna.h
 658
 659 The entry point to the IDNA functions.  This is the normal entry point
 660 for applications that need IDN functionality.
 661
 662 The name space of the IDNA part of Libidn is @code{idna_*} for
 663 function names, @code{Idna*} for data types and @code{IDNA_*} for
 664 other symbols.  In addition, @code{_idna*} is reserved for internal
 665 use and should never be used by applications.
 666
 667 @item tld.h
 668
 669 The entry point to the TLD functions.  Normal applications are not
 670 expected to need this functionality, but it is present for
 671 applications that are used by TLDs to validate customer input.
 672
 673 The name space of the TLD part of Libidn is @code{tld_*} for function
 674 names, @code{Tld_*} for data types and @code{TLD_*} for other symbols.
 675 In addition, @code{_tld*} is reserved for internal use and should
 676 never be used by applications.
 677
 678 @item pr29.h
 679
 680 The entry point to the PR29 functions.  These functions are used to
 681 detect ``problem sequences'' (@pxref{PR29 Functions}), mostly for use
 682 in security critical applications.
 683
 684 The name space of the PR29 part of Libidn is @code{pr29_*} for
 685 function names, @code{Pr29_*} for data types and @code{PR29_*} for
 686 other symbols.  In addition, @code{_pr29*} is reserved for internal
 687 use and should never be used by applications.
 688
 689 @end table
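
As a minimal sketch of how these headers are used in practice, the
following fragment includes @file{idna.h} and converts a UTF-8 encoded
domain name to its ACE form with @code{idna_to_ascii_8z}.  The helper
name @code{domain_to_ace} is made up for this illustration and is not
part of Libidn, and error handling is reduced to printing the message
returned by @code{idna_strerror}.

@example
#include <stdio.h>
#include <stdlib.h>
#include <idna.h>

/* Hypothetical helper, not part of Libidn: convert a zero-terminated
   UTF-8 domain name to its ACE form.  Returns a newly allocated
   string, or NULL on failure.  */
char *
domain_to_ace (const char *utf8_domain)
@{
  char *ace = NULL;
  int rc = idna_to_ascii_8z (utf8_domain, &ace, 0);

  if (rc != IDNA_SUCCESS)
    @{
      fprintf (stderr, "idna_to_ascii_8z failed: %s\n",
               idna_strerror (rc));
      return NULL;
    @}

  return ace;
@}
@end example

The caller owns the returned string and releases it with @code{free}.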
 690
 691 @node Initialization
 692 @section Initialization
 693
 694 Libidn is stateless and does not need any initialization.
 695
 696 @node Version Check
 697 @section Version Check
 698
 699 It is often desirable to check that the version of `Libidn' used is
 700 indeed one which fits all requirements.  Even with binary
 701 compatibility, new features may have been introduced, but due to problems
 702 with the dynamic linker an old version may actually be used.  So you may
 703 want to check that the version is okay right after program startup.
 704
 705 @include texi/stringprep_check_version.texi
 706
 707 The normal way to use the function is to put something similar to the
 708 following first in your @code{main}:
 709
 710 @example
 711   if (!stringprep_check_version (STRINGPREP_VERSION))
 712     @{
 713       printf ("stringprep_check_version() failed:\n"
 714               "Header file incompatible with shared library.\n");
 715       exit(1);
 716     @}
 717 @end example
 718
 719 @node Building the source
 720 @section Building the source
 721 @cindex Compiling your application
 722
 723 If you want to compile a source file including e.g. the `idna.h' header
 724 file, you must make sure that the compiler can find it in the
 725 directory hierarchy.  This is accomplished by adding the path to the
 726 directory in which the header file is located to the compiler's include
 727 file search path (via the @option{-I} option).
 728
 729 However, the path to the include file is determined at the time the
 730 source is configured.  To solve this problem, `Libidn' uses the
 731 external package @command{pkg-config} that knows the path to the
 732 include file and other configuration options.  The options that need
 733 to be added to the compiler invocation at compile time are output by
 734 the @option{--cflags} option to @command{pkg-config libidn}.  The
 735 following example shows how it can be used at the command line:
 736
 737 @example
 738 gcc -c foo.c `pkg-config libidn --cflags`
 739 @end example
 740
 741 Adding the output of @samp{pkg-config libidn --cflags} to the
 742 compiler's command line will ensure that the compiler can find e.g. the
 743 idna.h header file.
 744
 745 A similar problem occurs when linking the program with the library.
 746 Again, the compiler has to find the library files.  For this to work,
 747 the path to the library files has to be added to the library search
 748 path (via the @option{-L} option).  For this, the option
 749 @option{--libs} to @command{pkg-config libidn} can be used.  For
 750 convenience, this option also outputs all other options that are
 751 required to link the program with the `libidn' library.  The example
 752 shows how to link @file{foo.o} with the `libidn' library to a program
 753 @command{foo}.
 754
 755 @example
 756 gcc -o foo foo.o `pkg-config libidn --libs`
 757 @end example
 758
 759 Of course you can also combine both examples into a single command by
 760 specifying both options to @command{pkg-config}:
 761
 762 @example
 763 gcc -o foo foo.c `pkg-config libidn --cflags --libs`
 764 @end example
 765
 766 @node Autoconf tests
 767 @section Autoconf tests
 768 @cindex Autoconf tests
 769 @cindex Configure tests
 770
 771 If your project uses Autoconf (@pxref{top, GNU Autoconf,, autoconf})
 772 to check for installed libraries, you might find the following snippet
 773 illustrative.  It adds a new @file{configure} parameter
 774 @code{--with-libidn}, checks for @file{idna.h} and @samp{-lidn}
 775 (possibly below the directory specified as the optional argument to
 776 @code{--with-libidn}), and defines the @acronym{CPP} symbol
 777 @code{LIBIDN} if the library is found.  The default behaviour is to
 778 search for the library and enable the functionality (that is, define
 779 the symbol) when the library is found, but if you wish to make the
 780 default behaviour of your package be that Libidn is not used (even if
 781 it is installed on the system), change @samp{libidn=yes} to
 782 @samp{libidn=no} on the third line.
 783
 784 @example
 785 AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
 786                                 [Support IDN (needs GNU Libidn)]),
 787   libidn=$withval, libidn=yes)
 788 if test "$libidn" != "no"; then
 789   if test "$libidn" != "yes"; then
 790     LDFLAGS="$@{LDFLAGS@} -L$libidn/lib"
 791     CPPFLAGS="$@{CPPFLAGS@} -I$libidn/include"
 792   fi
 793   AC_CHECK_HEADER(idna.h,
 794     AC_CHECK_LIB(idn, stringprep_check_version,
 795       [libidn=yes LIBS="$@{LIBS@} -lidn"], libidn=no),
 796     libidn=no)
 797 fi
 798 if test "$libidn" != "no" ; then
 799   AC_DEFINE(LIBIDN, 1, [Define to 1 if you want IDN support.])
 800 else
 801   AC_MSG_WARN([Libidn not found])
 802 fi
 803 AC_MSG_CHECKING([if Libidn should be used])
 804 AC_MSG_RESULT($libidn)
 805 @end example
 806
 807 If you require that your users have installed @code{pkg-config} (which
 808 I cannot recommend generally), the above can be done more easily as
 809 follows.
 810
 811 @example
 812 AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
 813                                 [Support IDN (needs GNU Libidn)]),
 814   libidn=$withval, libidn=yes)
 815 if test "$libidn" != "no" ; then
 816   PKG_CHECK_MODULES(LIBIDN, libidn >= 0.0.0, [libidn=yes], [libidn=no])
 817   if test "$libidn" != "yes" ; then
 818     libidn=no
 819     AC_MSG_WARN([Libidn not found])
 820   else
 821     libidn=yes
 822     AC_DEFINE(LIBIDN, 1, [Define to 1 if you want Libidn.])
 823   fi
 824 fi
 825 AC_MSG_CHECKING([if Libidn should be used])
 826 AC_MSG_RESULT($libidn)
 827 @end example
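
Both snippets define the @code{LIBIDN} preprocessor symbol when the
library is found.  The following sketch shows one way application code
might use that symbol to make IDN support optional.  The function name
@code{prepare_hostname} is hypothetical, and the sketch assumes the
symbol reaches the compiler through your @file{config.h} or a
@option{-D} option.  Note that @code{idna_to_ascii_lz} interprets its
input in the locale's character set, so the application is assumed to
have called @code{setlocale} at startup.

@example
#include <stdlib.h>
#include <string.h>
#ifdef LIBIDN
# include <idna.h>
#endif

/* Hypothetical helper: return a newly allocated host name suitable
   for DNS lookup, or NULL on failure.  */
char *
prepare_hostname (const char *name)
@{
#ifdef LIBIDN
  /* Built with Libidn: convert from the locale's encoding to ACE.  */
  char *ace = NULL;

  if (idna_to_ascii_lz (name, &ace, 0) != IDNA_SUCCESS)
    return NULL;
  return ace;
#else
  /* Built without Libidn: pass the name through unchanged.  */
  char *copy = malloc (strlen (name) + 1);

  if (copy != NULL)
    strcpy (copy, name);
  return copy;
#endif
@}
@end example

Returning an allocated string in both branches means callers can
release the result with @code{free} regardless of how the package was
configured.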
 828
 829 @c **********************************************************
 830 @c ********************  Utility Functions ******************
 831 @c **********************************************************
 832 @node Utility Functions
 833 @chapter Utility Functions
 834 @cindex Utility Functions
 835
 836 The rest of this library makes extensive use of Unicode characters.
 837 In order to interface this library with the outside world, your
 838 application may need to make various Unicode transformations.
 839
 840 @section Header file @code{stringprep.h}
 841
 842 To use the functions explained in this chapter, you need to include
 843 the file @file{stringprep.h} using:
 844
 845 @example
 846 #include <stringprep.h>
 847 @end example
 848
 849 @section Unicode Encoding Transformation
 850
 851 @include texi/stringprep_unichar_to_utf8.texi
 852 @include texi/stringprep_utf8_to_unichar.texi
 853 @include texi/stringprep_ucs4_to_utf8.texi
 854 @include texi/stringprep_utf8_to_ucs4.texi
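
As an illustration of how the two string conversion functions fit
together, the sketch below decodes a UTF-8 string to UCS-4 and encodes
it back again.  The helper name @code{utf8_roundtrip_ok} is invented for
this example; passing @code{-1} as the length tells both functions that
the input is zero terminated.

@example
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stringprep.h>

/* Hypothetical helper: return 1 if a zero-terminated UTF-8 string
   survives a round trip through UCS-4, and 0 otherwise.  */
int
utf8_roundtrip_ok (const char *utf8)
@{
  size_t n_chars = 0;
  uint32_t *ucs4 = stringprep_utf8_to_ucs4 (utf8, -1, &n_chars);
  char *back;
  int ok;

  if (ucs4 == NULL)
    return 0;

  /* n_chars counts code points, not bytes.  */
  back = stringprep_ucs4_to_utf8 (ucs4, (ssize_t) n_chars, NULL, NULL);
  free (ucs4);
  if (back == NULL)
    return 0;

  ok = strcmp (utf8, back) == 0;
  free (back);
  return ok;
@}
@end example

Both conversion functions return newly allocated buffers, so each
result is released with @code{free} once it is no longer needed.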
 855
 856 @section Unicode Normalization
 857
 858 @include texi/stringprep_ucs4_nfkc_normalize.texi
 859 @include texi/stringprep_utf8_nfkc_normalize.texi
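
A typical reason to normalize is to compare strings that look alike
but are encoded differently.  The following sketch, with the made-up
helper name @code{nfkc_equal}, normalizes two UTF-8 strings with
@code{stringprep_utf8_nfkc_normalize} and compares the results.  For
instance, the ligature U+FB01 is expected to compare equal to the two
letters @samp{fi} after NFKC normalization.

@example
#include <stdlib.h>
#include <string.h>
#include <stringprep.h>

/* Hypothetical helper: return 1 if two zero-terminated UTF-8 strings
   are equal after NFKC normalization, 0 if they differ, and -1 if
   normalization fails.  */
int
nfkc_equal (const char *a, const char *b)
@{
  char *na = stringprep_utf8_nfkc_normalize (a, -1);
  char *nb = stringprep_utf8_nfkc_normalize (b, -1);
  int result = -1;

  if (na != NULL && nb != NULL)
    result = strcmp (na, nb) == 0;

  /* free (NULL) is harmless, so no further checks are needed.  */
  free (na);
  free (nb);
  return result;
@}
@end example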
 860
 861 @section Character Set Conversion
 862
 863 @include texi/stringprep_locale_charset.texi
 864 @include texi/stringprep_convert.texi
 865 @include texi/stringprep_locale_to_utf8.texi
 866 @include texi/stringprep_utf8_to_locale.texi
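
Applications usually receive text in the locale's character set and
need UTF-8 before calling the rest of the library.  The sketch below
wraps @code{stringprep_locale_to_utf8} and reports the character set
name from @code{stringprep_locale_charset} when conversion fails.  The
helper name @code{input_to_utf8} is hypothetical, and the program is
assumed to have called @code{setlocale (LC_ALL, "")} at startup so that
the locale's character set is known.

@example
#include <stdio.h>
#include <stdlib.h>
#include <stringprep.h>

/* Hypothetical helper: convert a string in the locale's character
   set to a newly allocated UTF-8 string, or return NULL on failure.  */
char *
input_to_utf8 (const char *input)
@{
  char *utf8 = stringprep_locale_to_utf8 (input);

  if (utf8 == NULL)
    fprintf (stderr, "could not convert from %s to UTF-8\n",
             stringprep_locale_charset ());

  return utf8;
@}
@end example

The result is released with @code{free}.  To print a UTF-8 string back
to the user, @code{stringprep_utf8_to_locale} performs the reverse
conversion.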
 867
 868
 869 @c **********************************************************
 870 @c ******************  Stringprep Functions *****************
 871 @c **********************************************************
 872 @node Stringprep Functions
 873 @chapter Stringprep Functions
 874 @cindex Stringprep Functions
 875
 876 Stringprep describes a framework for preparing Unicode text strings in
 877 order to increase the likelihood that string input and string
 878 comparison work in ways that make sense for typical users throughout
 879 the world. The stringprep protocol is useful for protocol identifier
 880 values, company and personal names, internationalized domain names,
 881 and other text strings.
 882
 883 @section Header file @code{stringprep.h}
 884
 885 To use the functions explained in this chapter, you need to include
 886 the file @file{stringprep.h} using:
 887
 888 @example
 889 #include <stringprep.h>
 890 @end example
 891
 892 @section Defining A Stringprep Profile
 893
 894 Further types and structures are defined for applications that want to
 895 specify their own stringprep profile.  As these are fairly obscure,
 896 and by necessity tied to the implementation, we do not document them
 897 here.  Look into the @file{stringprep.h} header file, and the
 898 @file{profiles.c} source code for the details.
 899
 900 @section Control Flags
 901
 902 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_NFKC}
 903 Disable the NFKC normalization, as well as selecting the non-NFKC case
 904 folding tables.  Usually the profile specifies BIDI and NFKC settings,
 905 and applications should not override it unless in special situations.
 906 @end deftypevr
 907
 908 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_BIDI}
 909 Disable the BIDI step.  Usually the profile specifies BIDI and NFKC
 910 settings, and applications should not override it unless in special
 911 situations.
 912 @end deftypevr
 913
 914 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_UNASSIGNED}
Make the library return with an error if the string contains code
points that are unassigned according to the profile.
 917 @end deftypevr
 918
 919 @section Core Functions
 920
 921 @include texi/stringprep_4i.texi
 922 @include texi/stringprep_4zi.texi
 923 @include texi/stringprep.texi
 924 @include texi/stringprep_profile.texi
 925
 926 @section Error Handling
 927
 928 @include texi/stringprep_strerror.texi
 929
 930 @section Stringprep Profile Macros
 931
 932 @deftypefun {int} stringprep_nameprep_no_unassigned (char * @var{in}, int @var{maxlen})
 933
@var{in}: input/output array with string to prepare.

@var{maxlen}: maximum length of input/output array.

Prepare the input UTF-8 string according to the nameprep profile.  The
AllowUnassigned flag is false; use @code{stringprep_nameprep} for
true AllowUnassigned.  Returns 0 iff successful, or an error code.
 941 @end deftypefun
 942
 943 @deftypefun {int} stringprep_iscsi (char * @var{in}, int @var{maxlen})
 944
@var{in}: input/output array with string to prepare.
 946
 947 @var{maxlen}: maximum length of input/output array.
 948
 949 Prepare the input UTF-8 string according to the draft iSCSI stringprep
 950 profile.  Returns 0 iff successful, or an error code.
 951 @end deftypefun
 952
 953 @deftypefun {int} stringprep_plain (char * @var{in}, int @var{maxlen})
 954
@var{in}: input/output array with string to prepare.
 956
 957 @var{maxlen}: maximum length of input/output array.
 958
 959 Prepare the input UTF-8 string according to the draft SASL ANONYMOUS
 960 profile.  Returns 0 iff successful, or an error code.
 961 @end deftypefun
 962
 963 @deftypefun {int} stringprep_xmpp_nodeprep (char * @var{in}, int @var{maxlen})
 964
@var{in}: input/output array with string to prepare.
 966
 967 @var{maxlen}: maximum length of input/output array.
 968
 969 Prepare the input UTF-8 string according to the draft XMPP node
 970 identifier profile.  Returns 0 iff successful, or an error code.
 971 @end deftypefun
 972
 973 @deftypefun {int} stringprep_xmpp_resourceprep (char * @var{in}, int @var{maxlen})
 974
@var{in}: input/output array with string to prepare.
 976
 977 @var{maxlen}: maximum length of input/output array.
 978
 979 Prepare the input UTF-8 string according to the draft XMPP resource
 980 identifier profile.  Returns 0 iff successful, or an error code.
 981 @end deftypefun
 982
 983 @c **********************************************************
 984 @c *******************  Punycode Functions ******************
 985 @c **********************************************************
 986 @node Punycode Functions
 987 @chapter Punycode Functions
 988 @cindex Punycode Functions
 989
 990 Punycode is a simple and efficient transfer encoding syntax designed
 991 for use with Internationalized Domain Names in Applications. It
 992 uniquely and reversibly transforms a Unicode string into an ASCII
 993 string. ASCII characters in the Unicode string are represented
 994 literally, and non-ASCII characters are represented by ASCII
 995 characters that are allowed in host name labels (letters, digits, and
 996 hyphens). A general algorithm called Bootstring allows a string of
 997 basic code points to uniquely represent any string of code points
 998 drawn from a larger set. Punycode is an instance of Bootstring that
 999 uses particular parameter values, appropriate for IDNA.
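
As a rough illustration of how the Bootstring steps fit together, the
following self-contained sketch encodes a UCS-4 string using the
Punycode parameter values from RFC 3492.  It is not the Libidn
implementation: the name @code{sketch_punycode_encode} is made up for
this example, and overflow checks and case flags are left out.
Applications should call @code{punycode_encode} instead.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Bootstring parameters that RFC 3492 fixes for Punycode. */
#define BASE 36u
#define TMIN 1u
#define TMAX 26u
#define SKEW 38u
#define DAMP 700u
#define INITIAL_BIAS 72u
#define INITIAL_N 0x80u

/* Map a digit value 0..35 to a basic code point: a-z, then 0-9. */
static char encode_digit(uint32_t d)
{
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function from RFC 3492. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
  uint32_t k = 0;

  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  while (delta > ((BASE - TMIN) * TMAX) / 2)
    {
      delta /= BASE - TMIN;
      k += BASE;
    }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper: encode IN[0..INLEN-1] into OUT as a zero
   terminated string.  Returns 0 on success, -1 if OUT is too small. */
int sketch_punycode_encode(const uint32_t *in, size_t inlen,
                           char *out, size_t outlen)
{
  uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  size_t h, b, o = 0, j;

  if (outlen == 0) return -1;
  /* Copy the basic (ASCII) code points literally. */
  for (j = 0; j < inlen; j++)
    if (in[j] < 0x80)
      {
        if (o + 1 >= outlen) return -1;
        out[o++] = (char) in[j];
      }
  h = b = o;
  if (b > 0)
    {
      if (o + 1 >= outlen) return -1;
      out[o++] = '-';           /* Delimiter after the basic part. */
    }

  while (h < inlen)
    {
      /* Find the smallest code point >= n that is not yet handled. */
      uint32_t m = UINT32_MAX;
      for (j = 0; j < inlen; j++)
        if (in[j] >= n && in[j] < m)
          m = in[j];
      delta += (m - n) * (uint32_t) (h + 1);
      n = m;

      for (j = 0; j < inlen; j++)
        {
          if (in[j] < n)
            delta++;
          if (in[j] == n)
            {
              /* Emit delta as a generalized variable-length integer. */
              uint32_t q = delta, k;
              for (k = BASE;; k += BASE)
                {
                  uint32_t t = k <= bias ? TMIN
                    : k >= bias + TMAX ? TMAX : k - bias;
                  if (q < t)
                    break;
                  if (o + 1 >= outlen) return -1;
                  out[o++] = encode_digit(t + (q - t) % (BASE - t));
                  q = (q - t) / (BASE - t);
                }
              if (o + 1 >= outlen) return -1;
              out[o++] = encode_digit(q);
              bias = adapt(delta, (uint32_t) (h + 1), h == b);
              delta = 0;
              h++;
            }
        }
      delta++;
      n++;
    }
  out[o] = '\0';
  return 0;
}
@end verbatim

For the ten code points of @samp{r@"aksm@"org@aa{}s}, the sketch
produces @samp{rksmrgs-5wao1o}, which is the part that follows the
ACE prefix in the @command{idn} examples (@pxref{Invoking idn}).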
1000
1001 @section Header file @code{punycode.h}
1002
1003 To use the functions explained in this chapter, you need to include
1004 the file @file{punycode.h} using:
1005
1006 @example
1007 #include <punycode.h>
1008 @end example
1009
1010 @section Unicode Code Point Data Type
1011
The punycode functions use a special type to denote Unicode code
points.  It is guaranteed to always be a 32 bit unsigned integer.

@deftypevr {Punycode Unicode code point} uint32_t punycode_uint
An unsigned integer that holds Unicode code points.
1017 @end deftypevr
1018
1019 @section Core Functions
1020
Note that the current implementation will fail if the
@code{input_length} exceeds 4294967295 (the maximum value of
@code{punycode_uint}).  This restriction may be removed in the future.
Meanwhile, applications are encouraged not to depend on this
limitation, and to use @code{sizeof} to initialize @code{input_length}
and @code{output_length}.
1027
1028 The functions provided are the following two entry points:
1029
1030 @include texi/punycode_encode.texi
1031 @include texi/punycode_decode.texi
1032
1033 @section Error Handling
1034
1035 @include texi/punycode_strerror.texi
1036
1037 @c **********************************************************
1038 @c ********************* IDNA Functions *********************
1039 @c **********************************************************
1040 @node IDNA Functions
1041 @chapter IDNA Functions
1042 @cindex IDNA Functions
1043
1044 Until now, there has been no standard method for domain names to use
1045 characters outside the ASCII repertoire. The IDNA document defines
1046 internationalized domain names (IDNs) and a mechanism called IDNA for
1047 handling them in a standard fashion. IDNs use characters drawn from a
1048 large repertoire (Unicode), but IDNA allows the non-ASCII characters
1049 to be represented using only the ASCII characters already allowed in
1050 so-called host names today. This backward-compatible representation is
1051 required in existing protocols like DNS, so that IDNs can be
1052 introduced with no changes to the existing infrastructure. IDNA is
1053 only meant for processing domain names, not free text.
1054
1055 @section Header file @code{idna.h}
1056
1057 To use the functions explained in this chapter, you need to include
1058 the file @file{idna.h} using:
1059
1060 @example
1061 #include <idna.h>
1062 @end example
1063
1064 @section Control Flags
1065
1066 The IDNA @code{flags} parameter can take on the following values, or a
1067 bit-wise inclusive or of any subset of the parameters:
1068
1069 @deftypevr {Return code} {Idna_flags} IDNA_ALLOW_UNASSIGNED
1070 Allow unassigned Unicode code points.
1071 @end deftypevr
1072
1073 @deftypevr {Return code} {Idna_flags} IDNA_USE_STD3_ASCII_RULES
1074 Check output to make sure it is a STD3 conforming host name.
1075 @end deftypevr
1076
1077 @section Prefix String
1078
1079 @deftypevr {Macro} {#define} IDNA_ACE_PREFIX
1080 String with the official IDNA prefix, @code{xn--}.
1081 @end deftypevr
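
A label is treated as an ACE label only if it begins with this prefix,
compared without regard to ASCII case.  The following minimal sketch
shows such a test.  The names @code{SKETCH_ACE_PREFIX} and
@code{has_ace_prefix} are hypothetical and exist only in this example;
real code would use @code{IDNA_ACE_PREFIX} from @file{idna.h}.

@verbatim
#include <ctype.h>
#include <stddef.h>

#define SKETCH_ACE_PREFIX "xn--"   /* Stand-in for IDNA_ACE_PREFIX. */

/* Return 1 if LABEL starts with the ACE prefix, ignoring ASCII case,
   and 0 otherwise.  LABEL must be a zero terminated string. */
int has_ace_prefix(const char *label)
{
  const char *p = SKETCH_ACE_PREFIX;
  size_t i;

  for (i = 0; p[i] != '\0'; i++)
    if (label[i] == '\0'
        || tolower((unsigned char) label[i]) != p[i])
      return 0;
  return 1;
}
@end verbatim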
1082
1083 @section Core Functions
1084
The idea behind the IDNA function names is as follows: the
@code{idna_to_ascii_4i} and @code{idna_to_unicode_44i} functions are
the core IDNA primitives.  The @code{4} indicates that the function
takes UCS-4 strings (i.e., Unicode code points encoded in a 32-bit
unsigned integer type) of the specified length.  The @code{i} indicates
1090 that the data is written ``inline'' into the buffer.  This means the
1091 caller is responsible for allocating (and deallocating) the string,
1092 and providing the library with the allocated length of the string.
1093 The output length is written in the output length variable.  The
1094 remaining functions all contain the @code{z} indicator, which means
1095 the strings are zero terminated.  All output strings are allocated by
1096 the library, and must be deallocated by the caller.  The @code{4}
1097 indicator again means that the string is UCS-4, the @code{8} means the
1098 strings are UTF-8 and the @code{l} indicator means the strings are
1099 encoded in the encoding used by the current locale.
1100
1101 The functions provided are the following entry points:
1102
1103 @include texi/idna_to_ascii_4i.texi
1104 @include texi/idna_to_unicode_44i.texi
1105
1106 @section Simplified ToASCII Interface
1107
1108 @include texi/idna_to_ascii_4z.texi
1109 @include texi/idna_to_ascii_8z.texi
1110 @include texi/idna_to_ascii_lz.texi
1111
1112 @section Simplified ToUnicode Interface
1113
1114 @include texi/idna_to_unicode_4z4z.texi
1115 @include texi/idna_to_unicode_8z4z.texi
1116 @include texi/idna_to_unicode_8z8z.texi
1117 @include texi/idna_to_unicode_8zlz.texi
1118 @include texi/idna_to_unicode_lzlz.texi
1119
1120 @section Error Handling
1121
1122 @include texi/idna_strerror.texi
1123
1124 @c **********************************************************
1125 @c ********************** TLD Functions *********************
1126 @c **********************************************************
1127 @node TLD Functions
1128 @chapter TLD Functions
1129 @cindex TLD Functions
1130
Organizations that manage some Top Level Domains (@acronym{TLD}s) have
published tables with characters they accept within the domain.  The
reason may be to reduce the complexity that comes from using the full
Unicode range, and to protect themselves from future (backwards
incompatible) changes in the IDN or Unicode specifications.  Libidn
implements an infrastructure for defining and checking strings against
such tables.  Libidn also ships some tables from @acronym{TLD}s that
have given us permission to use them.  Because these tables
are even less static than Unicode or StringPrep tables, it is likely
that they will be updated from time to time (even in backwards
incompatible ways).  The Libidn interface provides a ``version'' field
for each @acronym{TLD} table, which can be compared for equality to
guarantee the same operation over time.
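
To give an idea of what checking a string against such a table
involves, here is a minimal sketch that stores the accepted code
points as sorted inclusive intervals and looks each input code point
up with a binary search.  All type and function names, and the table
contents, are invented for this example; the real definitions are in
@file{tld.h} and may differ.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for this sketch; see tld.h for the real ones. */
struct sketch_interval
{
  uint32_t start;               /* First code point in range. */
  uint32_t end;                 /* Last code point, inclusive. */
};

struct sketch_table
{
  const char *name;             /* TLD name. */
  const char *version;          /* Only compared for equality. */
  size_t nvalid;                /* Number of entries in valid. */
  const struct sketch_interval *valid;  /* Sorted, non-overlapping. */
};

/* Made-up table contents, not a real registry table. */
static const struct sketch_interval sample_valid[] = {
  { 0x2D, 0x2D },               /* hyphen */
  { 0x30, 0x39 },               /* 0-9 */
  { 0x61, 0x7A },               /* a-z */
  { 0xE5, 0xE6 },               /* a with ring, ae */
  { 0xF8, 0xF8 }                /* o with stroke */
};
static const struct sketch_table sample_table = {
  "example", "sketch-1", 5, sample_valid
};

/* Return 1 if CH lies inside one of the intervals of table T. */
static int table_allows(const struct sketch_table *t, uint32_t ch)
{
  size_t lo = 0, hi = t->nvalid;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (ch < t->valid[mid].start)
        hi = mid;
      else if (ch > t->valid[mid].end)
        lo = mid + 1;
      else
        return 1;
    }
  return 0;
}

/* Check IN[0..INLEN-1] against T.  Returns 0 if every code point is
   accepted; otherwise returns -1 and stores the index of the first
   rejected code point in *ERRPOS. */
int sketch_tld_check(const struct sketch_table *t,
                     const uint32_t *in, size_t inlen, size_t *errpos)
{
  size_t i;

  for (i = 0; i < inlen; i++)
    if (!table_allows(t, in[i]))
      {
        *errpos = i;
        return -1;
      }
  return 0;
}
@end verbatim

With the made-up table above, the code points of @samp{bl@aa{}b@ae{}r}
are accepted, while a string containing U+00FC is rejected and the
position of that code point is reported through @code{errpos}.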
1144
From a design point of view, you can regard the @acronym{TLD} tables
for IDN as the ``localization'' step that comes after the
``internationalization'' step provided by the IETF standards.
1148
The TLD functionality relies on up-to-date tables.  The latest version
of Libidn aims to provide these, but tables with unclear copying
1151 conditions, or generally experimental tables, are not included.  Some
1152 such tables can be found at @url{http://tldchk.berlios.de}.
1153
1154 @section Header file @code{tld.h}
1155
1156 To use the functions explained in this chapter, you need to include
1157 the file @file{tld.h} using:
1158
1159 @example
1160 #include <tld.h>
1161 @end example
1162
1163 @c @section Data Types
1164 @c
1165 @c @deftp {Data type} {Tld_table_element} @var{start} @var{end}
1166 @c @example
1167 @c /* Interval of valid code points in the TLD. */
1168 @c struct Tld_table_element
1169 @c @{
1170 @c   uint32_t start;            /* Start of range. */
1171 @c   uint32_t end;              /* End of range, end == start if single. */
1172 @c @};
1173 @c typedef struct Tld_table_element Tld_table_element;
1174 @c @end example
1175 @c This @code{struct} contain the @var{start} and @var{end} positions
1176 @c (inclusive) of a range.  If the range is a single (i.e., starts and
1177 @c ends in the same character), then set @var{end} to the same as
1178 @c @var{start}.  This structure is normally used as an array.
1179 @c @end deftp
1180 @c
1181 @c @deftp {Data type} {Tld_table} @var{name} @var{version} @var{nvalid} @var{valid}
1182 @c @example
1183 @c /* List valid code points in a TLD. */
1184 @c struct Tld_table
1185 @c @{
1186 @c   char *name;                        /* TLD name, e.g., "no". */
1187 @c   char *version;             /* Version string from TLD file. */
1188 @c   size_t nvalid;             /* Number of entries in data. */
1189 @c   Tld_table_element *valid[];        /* Sorted array of valid code points. */
1190 @c @};
1191 @c typedef struct Tld_table Tld_table;
1192 @c @end example
1193 @c In this @code{struct}, the @var{name} field is a string (@samp{char*})
1194 @c indicating the TLD name (e.g., ``no'').  The @var{version} field is a
1195 @c string (@samp{char*}) containing a free form humanly readable string
1196 @c that can be used for equality comparison to compare different versions
1197 @c of the table.  The @var{nvalid} field indicate how many entries there
1198 @c are in @var{valid}, which brings us finally to @var{valid} that
1199 @c contain the actual code points that are valid for this TLD (see
1200 @c @code{Tld_table_element} above).
1201 @c @end deftp
1202
1203 @section Core Functions
1204
1205 @include texi/tld_check_4t.texi
1206 @include texi/tld_check_4tz.texi
1207
1208 @section Utility Functions
1209
1210 @include texi/tld_get_4.texi
1211 @include texi/tld_get_4z.texi
1212 @include texi/tld_get_z.texi
1213 @include texi/tld_get_table.texi
1214 @include texi/tld_default_table.texi
1215
1216 @section High-Level Wrapper Functions
1217
1218 @include texi/tld_check_4.texi
1219 @include texi/tld_check_4z.texi
1220 @include texi/tld_check_8z.texi
1221 @include texi/tld_check_lz.texi
1222
1223 @section Error Handling
1224
1225 @include texi/tld_strerror.texi
1226
1227 @c **********************************************************
1228 @c ********************** PR29 Functions ********************
1229 @c **********************************************************
1230 @node PR29 Functions
1231 @chapter PR29 Functions
1232 @cindex PR29 Functions
1233
1234 A deficiency in the specification of Unicode Normalization Forms has
1235 been found.  The consequence is that some strings can be normalized
1236 into different strings by different implementations.  In other words,
1237 two different implementations may return different output for the same
1238 input (because the interpretation of the specification is
ambiguous). Further, an implementation invoked again on one of the
output strings may return a different string (because one of the
interpretations of the ambiguous specification makes normalization
non-idempotent).  Fortunately, only a select few character sequences
exhibit this problem, and none of them are expected to occur in
1244 natural languages (due to different linguistic uses of the involved
1245 characters).
1246
1247 A full discussion of the problem may be found at:
1248
1249 @url{http://www.unicode.org/review/pr-29.html}
1250
The PR29 functions below allow you to detect the problem sequences.  So
when would you want to use these functions?  For most applications,
such as those using Nameprep for IDN, this is likely only to be an
interoperability problem.  Thus, you may not want to care about it, as
the character sequences will rarely occur naturally.  However, if you
are using a profile, such as SASLPrep, to process authentication
tokens, authorization tokens, or passwords, there is a real danger
that attackers may try to use the peculiarities in these strings to
attack parts of your system.  As only a small number of strings, and
no naturally occurring strings, exhibit this problem, the conservative
approach of rejecting the strings is recommended.  If this approach is
not used, you should instead verify that all parts of your system
that process the tokens and passwords use an NFKC implementation that
produces the same output for the same input.
1265
1266 Technically inclined readers may be interested in knowing more about
1267 the implementation aspects of the PR29 flaw. @xref{PR29 discussion}.
1268
1269 @section Header file @code{pr29.h}
1270
1271 To use the functions explained in this chapter, you need to include
1272 the file @file{pr29.h} using:
1273
1274 @example
1275 #include <pr29.h>
1276 @end example
1277
1278 @section Core Functions
1279
1280 @include texi/pr29_4.texi
1281
1282 @section Utility Functions
1283
1284 @include texi/pr29_4z.texi
1285 @include texi/pr29_8z.texi
1286
1287 @section Error Handling
1288
1289 @include texi/pr29_strerror.texi
1290
1291 @c **********************************************************
1292 @c ***********************  Examples  ***********************
1293 @c **********************************************************
1294 @node Examples
1295 @chapter Examples
1296 @cindex Examples
1297
This chapter contains example code which illustrates how Libidn can
be used when writing your own application.
1300
1301 @menu
1302 * Example 1::           Example using stringprep.
1303 * Example 2::           Example using punycode.
1304 * Example 3::           Example using IDNA ToASCII.
1305 * Example 4::           Example using IDNA ToUnicode.
1306 * Example 5::           Example using TLD checking.
1307 @end menu
1308
1309 @node Example 1
1310 @section Example 1
1311
1312 This example demonstrates how the stringprep functions are used.
1313
1314 @verbatiminclude example.c
1315
1316 @node Example 2
1317 @section Example 2
1318
1319 This example demonstrates how the punycode functions are used.
1320
1321 @verbatiminclude example2.c
1322
1323 @node Example 3
1324 @section Example 3
1325
1326 This example demonstrates how the library is used to convert
1327 internationalized domain names into ASCII compatible names.
1328
1329 @verbatiminclude example3.c
1330
1331 @node Example 4
1332 @section Example 4
1333
1334 This example demonstrates how the library is used to convert ASCII
1335 compatible names to internationalized domain names.
1336
1337 @verbatiminclude example4.c
1338
1339 @node Example 5
1340 @section Example 5
1341
1342 This example demonstrates how the library is used to check a string
1343 for invalid characters within a specific TLD.
1344
1345 @verbatiminclude example5.c
1346
1347 @c **********************************************************
1348 @c *********************  Invoking idn  *********************
1349 @c **********************************************************
1350 @node Invoking idn
1351 @chapter Invoking idn
1352
1353 @pindex idn
1354 @cindex invoking @command{idn}
1355 @cindex command line
1356
1357 @section Name
1358
1359 GNU Libidn (idn) -- Internationalized Domain Names command line tool
1360
1361 @section Description
1362 @code{idn} allows internationalized string preparation
1363 (@samp{stringprep}), encoding and decoding of punycode data, and IDNA
1364 ToASCII/ToUnicode operations to be performed on the command line.
1365
If strings are specified on the command line, they are used as input
and the computed output is printed to standard output @code{stdout}.
If no strings are specified on the command line, the program reads
data, line by line, from the standard input @code{stdin}, and prints
the computed output to standard output.  What processing is performed
(e.g., ToASCII, or Punycode encode) is indicated by options.  If any
errors are encountered, the execution of the application is aborted.
1373
1374 All strings are expected to be encoded in the preferred charset used
1375 by your locale.  Use @code{--debug} to find out what this charset is.
1376 You can override the charset used by setting environment variable
1377 @code{CHARSET}.
1378
1379 To process a string that starts with @code{-}, for example
1380 @code{-foo}, use @code{--} to signal the end of parameters, as in
1381 @code{idn --quiet -a -- -foo}.
1382
1383 @section Options
@code{idn} recognizes these options:
1385
1386 @verbatim
1387   -h, --help               Print help and exit
1388
1389   -V, --version            Print version and exit
1390
1391   -s, --stringprep         Prepare string according to nameprep profile
1392
1393   -d, --punycode-decode    Decode Punycode
1394
1395   -e, --punycode-encode    Encode Punycode
1396
1397   -a, --idna-to-ascii      Convert to ACE according to IDNA (default)
1398
1399   -u, --idna-to-unicode    Convert from ACE according to IDNA
1400
1401       --allow-unassigned   Toggle IDNA AllowUnassigned flag  (default=off)
1402
1403       --usestd3asciirules  Toggle IDNA UseSTD3ASCIIRules flag  (default=off)
1404
1405   -t, --tld                Check string for TLD specific rules
1406                              Only for --idna-to-ascii and --idna-to-unicode
1407                              (default=on)
1408
1409   -p, --profile=STRING     Use specified stringprep profile instead
1410                              Valid stringprep profiles are `Nameprep',
1411                              `iSCSI', `Nodeprep', `Resourceprep', `trace', and
1412                              `SASLprep'.
1413
1414       --debug              Print debugging information  (default=off)
1415
1416       --quiet              Silent operation  (default=off)
1417 @end verbatim
1418
1419 @section Environment Variables
1420
The @var{CHARSET} environment variable can be used to override which
character set is used for decoding incoming data (i.e., on the
command line or on the standard input stream), and for encoding data
written to the standard output.  If your system is set up correctly,
however, the application will automatically guess which character set
is used.  Example usage:
1427
1428 @example
1429 $ CHARSET=ISO-8859-1 idn --punycode-encode
1430 ...
1431 @end example
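
The lookup order described above can be sketched as follows.  This is
an illustration under stated assumptions, not the code used by
@command{idn}: the function name @code{sketch_locale_charset} is
hypothetical, and treating an empty @var{CHARSET} value as unset and
falling back to @samp{ASCII} are choices made for this example.

@verbatim
#include <langinfo.h>
#include <stdlib.h>

/* Hypothetical helper: return the character set name to use.
   The CHARSET environment variable wins if it is set and non-empty;
   otherwise ask the C library about the current locale. */
const char *sketch_locale_charset(void)
{
  const char *charset = getenv("CHARSET");

  if (charset != NULL && *charset != '\0')
    return charset;

  /* Reflects the user's locale only if the caller has already run
     setlocale (LC_ALL, ""). */
  charset = nl_langinfo(CODESET);
  if (charset != NULL && *charset != '\0')
    return charset;

  return "ASCII";               /* Assumed last-resort fallback. */
}
@end verbatim

A program that has not called @code{setlocale} runs in the @samp{C}
locale, so the value returned by @code{nl_langinfo} does not reflect
the user's locale settings.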
1432
1433 @section Examples
1434
1435 Standard usage, reading input from standard input:
1436
1437 @example
1438 jas@@latte:~$ idn
1439 libidn 0.3.5
1440 Copyright 2002, 2003 Simon Josefsson.
1441 GNU Libidn comes with NO WARRANTY, to the extent permitted by law.
1442 You may redistribute copies of GNU Libidn under the terms of
1443 the GNU Lesser General Public License.  For more information
1444 about these matters, see the file named COPYING.LIB.
1445 Type each input string on a line by itself, terminated by a newline character.
1446 r@"aksm@"org@aa{}s.se
1447 xn--rksmrgs-5wao1o.se
1448 jas@@latte:~$
1449 @end example
1450
1451 Reading input from command line, and disabling copyright and license
1452 information:
1453
1454 @example
1455 jas@@latte:~$ idn --quiet r@"aksm@"org@aa{}s.se bl@aa{}b@ae{}rgr@o{}d.no
1456 xn--rksmrgs-5wao1o.se
1457 xn--blbrgrd-fxak7p.no
1458 jas@@latte:~$
1459 @end example
1460
1461 Accessing a specific StringPrep profile directly:
1462
1463 @example
1464 jas@@latte:~$ idn --quiet --profile=SASLprep --stringprep te@ss{}t@ordf{}
1465 te@ss{}ta
1466 jas@@latte:~$
1467 @end example
1468
1469 @section Troubleshooting
1470
Getting character data encoded right, and making sure Libidn uses the
same encoding, can be difficult.  The reason for this is that most
systems encode character data in more than one character encoding,
e.g., using @code{UTF-8} together with @code{ISO-8859-1} or
@code{ISO-2022-JP}.  This problem is likely to continue to exist until
only one character encoding comes out as the evolutionary winner, or
(more likely, at least to some extent) forever.
1478
The first step to troubleshooting character encoding problems with
Libidn is to use the @samp{--debug} parameter to find out which
character set encoding @samp{idn} believes your locale uses.
1482
1483 @example
1484 jas@@latte:~$ idn --debug --quiet ""
1485 system locale uses charset `UTF-8'.
1486
1487 jas@@latte:~$
1488 @end example
1489
If it prints @code{ANSI_X3.4-1968} (i.e., @code{US-ASCII}), this
indicates you have not configured your locale properly.  To configure
1492 the locale, you can, for example, use @samp{LANG=sv_SE.UTF-8; export
1493 LANG} at a @code{/bin/sh} prompt, to set up your locale for a Swedish
1494 environment using @code{UTF-8} as the encoding.
1495
Sometimes @samp{idn} appears to be unable to translate from your system
1497 locale into @code{UTF-8} (which is used internally), and you get an
1498 error like the following:
1499
1500 @example
1501 jas@@latte:~$ idn --quiet foo
1502 idn: could not convert from ISO-8859-1 to UTF-8.
1503 jas@@latte:~$
1504 @end example
1505
The simplest explanation is that you haven't installed the
@samp{iconv} conversion tools.  You can find iconv as a standalone
1508 library in @acronym{GNU} Libiconv
1509 (@uref{http://www.gnu.org/software/libiconv/}).  On many
1510 @acronym{GNU}/Linux systems, this library is part of the system, but
1511 you may have to install additional packages (e.g., @samp{glibc-locale}
1512 for Debian) to be able to use it.
1513
Another explanation is that the error is correct and you are feeding
@samp{idn} invalid data.  This can happen inadvertently if you are not
careful with the character set encodings you use.  For example, if
your shell runs in an @code{ISO-8859-1} environment, and you invoke
@samp{idn} with the @samp{CHARSET} environment variable as follows,
you will feed it @code{ISO-8859-1} characters but force it to believe
they are @code{UTF-8}.  Naturally this will lead to an error, unless
the byte sequences happen to be parsable as @code{UTF-8}.  Note that
even if you don't get an error, the output may be incorrect in this
situation, because @code{ISO-8859-1} and @code{UTF-8} do not in
general encode the same characters as the same byte sequences.
1525
1526 @example
1527 jas@@latte:~$ idn --quiet --debug ""
1528 system locale uses charset `ISO-8859-1'.
1529
1530 jas@@latte:~$ CHARSET=UTF-8 idn --quiet --debug r@"aksm@"org@aa{}s
1531 system locale uses charset `UTF-8'.
1532 input[0] = U+0072
1533 input[1] = U+4af3
1534 input[2] = U+006d
1535 input[3] = U+1b29e5
1536 input[4] = U+0073
1537 output[0] = U+0078
1538 output[1] = U+006e
1539 output[2] = U+002d
1540 output[3] = U+002d
1541 output[4] = U+0072
1542 output[5] = U+006d
1543 output[6] = U+0073
1544 output[7] = U+002d
1545 output[8] = U+0068
1546 output[9] = U+0069
1547 output[10] = U+0036
1548 output[11] = U+0064
1549 output[12] = U+0035
1550 output[13] = U+0039
1551 output[14] = U+0037
1552 output[15] = U+0035
1553 output[16] = U+0035
1554 output[17] = U+0032
1555 output[18] = U+0061
1556 xn--rms-hi6d597552a
1557 jas@@latte:~$
1558 @end example
1559
The moral here is to forget about @samp{CHARSET} (configure your
1561 locales properly instead) unless you know what you are doing, and if
1562 you want to use it, do it carefully, after verifying with
1563 @samp{--debug} that you get the desired results.
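
The odd code points in the debug output above are consistent with
@code{ISO-8859-1} bytes being read as @code{UTF-8} without checking
that the trailing bytes are valid continuation bytes.  The following
sketch is an illustration only, not the decoder used by @samp{idn},
and its function names are hypothetical.  Given the @code{ISO-8859-1}
bytes of @samp{r@"aksm@"org@aa{}s}, it yields the same five code
points, including @code{U+4af3} and @code{U+1b29e5}.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at P (at most AVAIL bytes) without
   checking that the trailing bytes really are continuation bytes.
   Stores the code point in *CP and returns the number of bytes
   consumed, or 0 if the sequence would run past the buffer end. */
static size_t lenient_decode(const unsigned char *p, size_t avail,
                             uint32_t *cp)
{
  size_t len, i;

  if (avail == 0)
    return 0;
  if (p[0] < 0x80)      { *cp = p[0];        len = 1; }
  else if (p[0] < 0xE0) { *cp = p[0] & 0x1F; len = 2; }
  else if (p[0] < 0xF0) { *cp = p[0] & 0x0F; len = 3; }
  else                  { *cp = p[0] & 0x07; len = 4; }
  if (len > avail)
    return 0;
  for (i = 1; i < len; i++)
    *cp = (*cp << 6) | (p[i] & 0x3F);
  return len;
}

/* Decode IN[0..INLEN-1] into OUT, which has room for OUTMAX code
   points.  Returns the number of code points written. */
size_t lenient_decode_all(const unsigned char *in, size_t inlen,
                          uint32_t *out, size_t outmax)
{
  size_t pos = 0, n = 0;

  while (pos < inlen && n < outmax)
    {
      size_t used = lenient_decode(in + pos, inlen - pos, &out[n]);
      if (used == 0)
        break;
      pos += used;
      n++;
    }
  return n;
}
@end verbatim

The byte @code{0xE4} (@samp{@"a} in @code{ISO-8859-1}) looks like the
start of a three byte sequence, so it swallows the following @samp{k}
and @samp{s}; @code{0xF6} looks like the start of a four byte
sequence and swallows @samp{r}, @samp{g} and @code{0xE5}.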
1564
1565 @node Emacs API
1566 @chapter Emacs API
1567
Included in Libidn are @file{punycode.el} and @file{idna.el}, which
provide an Emacs Lisp API to (a limited set of) the Libidn API.  This
chapter describes the API.  Currently the IDNA API always sets the
@code{UseSTD3ASCIIRules} flag and clears the @code{AllowUnassigned}
flag; in the future there may be functionality to specify these flags
via the API.
1574
1575 @section Punycode Emacs API
1576
1577 @defvar punycode-program
1578 Name of the GNU Libidn @file{idn} application.  The default is
1579 @samp{idn}.  This variable can be customized.
1580 @end defvar
1581
1582 @defvar punycode-environment
1583 List of environment variable definitions prepended to
1584 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1585 This variable can be customized.
1586 @end defvar
1587
1588 @defvar punycode-encode-parameters
1589 List of parameters passed to @var{punycode-program} to invoke punycode
1590 encoding mode.  The default is @samp{("--quiet" "--punycode-encode")}.
1591 This variable can be customized.
1592 @end defvar
1593
1594 @defvar punycode-decode-parameters
1595 Parameters passed to @var{punycode-program} to invoke punycode
1596 decoding mode.  The default is @samp{("--quiet" "--punycode-decode")}.
1597 This variable can be customized.
1598 @end defvar
1599
1600 @defun punycode-encode string
1601 Returns a Punycode encoding of the @var{string}, after converting the
1602 input into UTF-8.
1603 @end defun
1604
1605 @defun punycode-decode string
1606 Returns a possibly multibyte string which is the decoding of the
1607 @var{string} which is a punycode encoded string.
1608 @end defun
1609
1610 @section IDNA Emacs API
1611
1612 @defvar idna-program
1613 Name of the GNU Libidn @file{idn} application.  The default is
1614 @samp{idn}.  This variable can be customized.
1615 @end defvar
1616
1617 @defvar idna-environment
1618 List of environment variable definitions prepended to
1619 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1620 This variable can be customized.
1621 @end defvar
1622
1623 @defvar idna-to-ascii-parameters
1624 List of parameters passed to @var{idna-program} to invoke IDNA ToASCII
1625 mode.  The default is @samp{("--quiet" "--idna-to-ascii"
1626 "--usestd3asciirules")}.  This variable can be customized.
1627 @end defvar
1628
1629 @defvar idna-to-unicode-parameters
Parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
1631 The default is @samp{("--quiet" "--idna-to-unicode"
1632 "--usestd3asciirules")}.  This variable can be customized.
1633 @end defvar
1634
1635 @defun idna-to-ascii string
1636 Returns an ASCII Compatible Encoding (ACE) of the string computed by
1637 the IDNA ToASCII operation on the input @var{string}, after converting
1638 the input to UTF-8.
1639 @end defun
1640
1641 @defun idna-to-unicode string
1642 Returns a possibly multibyte string which is the output of the IDNA
1643 ToUnicode operation computed on the input @var{string}.
1644 @end defun
1645
1646 @node Java API
1647 @chapter Java API
1648
Libidn has been ported to the Java programming language, and as a
consequence most of the API is available to native Java applications.
This chapter contains notes on this support; complete documentation is
pending.
1653
1654 The Java library, if Libidn has been built with Java support
1655 (@pxref{Downloading and Installing}), will be placed in
1656 @file{java/libidn.jar}.  The source code is located in
1657 @file{java/gnu/inet/encoding/}.
1658
1659 @section Overview
1660
1661 This package provides a Java implementation of the Internationalized
1662 Domain Names in Applications (IDNA) standard. It is written entirely
1663 in Java and does not require any additional libraries to be set up.
1664
1665 The gnu.inet.encoding.IDNA class offers two public functions, toASCII
1666 and toUnicode which can be used as follows:
1667
1668 @example
1669 gnu.inet.encoding.IDNA.toASCII("bl@"ods.z@"ug");
1670 gnu.inet.encoding.IDNA.toUnicode("xn--blds-6qa.xn--zg-xka");
1671 @end example
1672
1673 @section Miscellaneous Programs
1674
1675 The @file{misc/} directory contains several programs that are related
1676 to the Java part of GNU Libidn, but that don't need to be included in
1677 the main source tree.
1678
1679 @subsection GenerateRFC3454
1680
1681 This program parses RFC3454 and creates the RFC3454.java program that
1682 is required during the StringPrep phase.
1683
1684 The RFC can be found at various locations, for example at
1685 @url{http://www.ietf.org/rfc/rfc3454.txt}.
1686
1687 Invoke the program as follows:
1688
1689 @example
1690 $ java GenerateRFC3454
1691 Creating RFC3454.java... Ok.
1692 @end example
1693
1694 @subsection GenerateNFKC
1695
The GenerateNFKC program parses the Unicode character database files
and generates all the tables required for NFKC.  This program requires
the two files @file{UnicodeData.txt} and
@file{CompositionExclusions.txt} from version 3.2 of the Unicode
character database.  Note that RFC 3454 (Stringprep) specifies that
Unicode version 3.2 is to be used, not the latest version.
1701
1702 The Unicode data files can be found at
1703 @url{http://www.unicode.org/Public/}.
1704
1705 Invoke the program as follows:
1706
1707 @example
1708 $ java GenerateNFKC
1709 Creating CombiningClass.java... Ok.
1710 Creating DecompositionKeys.java... Ok.
1711 Creating DecompositionMappings.java... Ok.
1712 Creating Composition.java... Ok.
1713 @end example
1714
1715 @subsection TestIDNA
1716
The TestIDNA program lets you test the IDNA implementation manually
or against Simon Josefsson's test vectors.
1719
1720 The test vectors can be found at the Libidn homepage,
1721 @url{http://www.gnu.org/software/libidn/}.
1722
To test the transformation manually, use:
1724
1725 @example
1726 $ java -cp .:../libidn.jar TestIDNA -a <string to test>
1727 Input: <string to test>
1728 Output: <toASCII(string to test)>
1729 $ java -cp .:../libidn.jar TestIDNA -u <string to test>
1730 Input: <string to test>
1731 Output: <toUnicode(string to test)>
1732 @end example
1733
1734 To test against draft-josefsson-idn-test-vectors.html, use:
1735
1736 @example
1737 $ java -cp .:../libidn.jar TestIDNA -t
1738 No errors detected!
1739 @end example
1740
1741 @subsection TestNFKC
1742
The TestNFKC program lets you test the NFKC implementation manually
or against the @file{NormalizationTest.txt} file from the Unicode data
files.
1745
1746 To test the normalization manually, use:
1747
1748 @example
1749 $ java -cp .:../libidn.jar TestNFKC <string to test>
1750 Input: <string to test>
1751 Output: <nfkc version of the string to test>
1752 @end example
1753
1754 To test against NormalizationTest.txt:
1755
1756 @example
1757 $ java -cp .:../libidn.jar TestNFKC
1758 No errors detected!
1759 @end example
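
To illustrate what kind of output to expect from an NFKC
transformation, here is a minimal sketch that uses the JDK's own
@code{java.text.Normalizer} (available since Java 6) rather than the
Libidn classes.  The JDK normalizer follows the Unicode version shipped
with the JDK, not Unicode 3.2, so it is not a substitute for TestNFKC;
the three inputs below were chosen on the assumption that their
mappings are the same in Unicode 3.2.  The class name @code{NfkcDemo}
is hypothetical.

```java
import java.text.Normalizer;

public class NfkcDemo {
    // Assumption: the JDK normalizer stands in for Libidn's NFKC class.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB01 LATIN SMALL LIGATURE FI has a compatibility mapping to "fi".
        System.out.println(nfkc("\uFB01").equals("fi"));
        // U+212B ANGSTROM SIGN is normalized to U+00C5.
        System.out.println(nfkc("\u212B").equals("\u00C5"));
        // "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
        System.out.println(nfkc("e\u0301").equals("\u00E9"));
    }
}
```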
1760
1761 @section Possible Problems
1762
Beware of bugs: This Java API needs a lot more testing, especially
with ``exotic'' character sets.  While it works for me, it may not work
for you.

Encoding of your Java sources: If you are using non-ASCII characters
in your Java source code, make sure @command{javac} compiles your
programs with the correct encoding.  If necessary, specify the encoding
using the @option{-encoding} parameter.

Java Unicode handling: Java 1.4 only handles 16-bit Unicode code
points (i.e., characters in the Basic Multilingual Plane).  This
implementation therefore ignores all references to so-called
Supplementary Characters (U+10000 to U+10FFFF).  Starting with Java
1.5, these characters will also be supported by Java, but this will
require changes to this library.  See also the next section.
1778
1779 @section A Note on Java and Unicode
1780
This library uses Java's built-in @code{char} datatype.  Up to Java
1.4, this datatype only supports 16-bit Unicode code points, that is,
characters in the Basic Multilingual Plane.  For this reason, this
library doesn't work for Supplementary Characters (i.e., characters
from U+10000 to U+10FFFF).  All references to such characters are
silently ignored.

Starting with Java 1.5, Supplementary Characters will also be
supported.  However, this will require changes in the present version
of the library.  Java 1.5 is currently in beta status.

For more information, refer to the documentation of
@code{java.lang.Character} in the JDK API.
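
The difference between 16-bit @code{char} units and code points can be
seen in the following minimal sketch.  It assumes a Java 1.5 or later
runtime, because the code point methods it calls on @code{String} and
@code{Character} were introduced in that release.  The class name
@code{SupplementaryDemo} is hypothetical, and U+10400 is simply one
example of a Supplementary Character.

```java
public class SupplementaryDemo {
    // U+10400 DESERET CAPITAL LETTER LONG I, written as the surrogate
    // pair D801 DC00 because it does not fit into a single 16-bit char.
    static final String DESERET = "\uD801\uDC00";

    public static void main(String[] args) {
        // Number of 16-bit char units: 2
        System.out.println(DESERET.length());
        // Number of Unicode code points: 1
        System.out.println(DESERET.codePointCount(0, DESERET.length()));
        // The code point itself, in hexadecimal: 10400
        System.out.println(Integer.toHexString(DESERET.codePointAt(0)));
        // Lies above the Basic Multilingual Plane: true
        System.out.println(Character.isSupplementaryCodePoint(DESERET.codePointAt(0)));
    }
}
```

Code that walks a string one @code{char} at a time, as this library
does, sees the two surrogate halves separately instead of the single
code point.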
1793
1794 @node C# API
1795 @chapter C# API
1796
The Libidn library has been ported to the C# language.  The port
resides in the top-level @file{csharp/} directory.  Currently, no
1799 further documentation about the implementation or the API is
1800 available.  However, the C# port was based on the Java port, and the
1801 API is exactly the same as in the Java version.  The help files for
1802 the Java API may thus be useful.
1803
1804 @c **********************************************************
1805 @c *******************  Acknowledgements  *******************
1806 @c **********************************************************
1807 @node Acknowledgements
1808 @chapter Acknowledgements
1809
1810 The punycode implementation was taken from the IETF IDN Punycode
1811 specification, by Adam M. Costello.  The TLD code was contributed by
1812 Thomas Jacob.  The Java implementation was contributed by Oliver Hitz.
1813 The C# implementation was contributed by Alexander Gnauck.  The
1814 Unicode tables were provided by Unicode, Inc.  Some functions for
1815 dealing with Unicode (see nfkc.c and toutf8.c) were borrowed from
1816 GLib, downloaded from @url{http://www.gtk.org/}.  The manual borrowed
1817 text from Libgcrypt by Werner Koch.
1818
Inspiration for many things that, consciously or not, have gone into
this package is due to a number of free software packages that the
author has been exposed to.  The author wishes to acknowledge the free
software community in general, for setting an example of the role
software development can play in modern society.

Several people reported bugs, sent patches or suggested improvements;
see the file @file{THANKS} in the top-level directory of the source
code.
1827
1828 @c **********************************************************
1829 @c ************************  History  ***********************
1830 @c **********************************************************
1831 @node History
1832 @chapter History
1833
1834 The complete history of user visible changes is stored in the file
1835 @file{NEWS} in the top-level directory of the source code tree.  The
1836 complete history of modifications to each file is stored in the file
@file{ChangeLog} in the same directory.  This chapter contains a
1838 condensed version of that information, in the form of ``milestones''
1839 for the project.
1840
1841 @table @asis
1842 @item Stringprep implementation.
1843 Version 0.0.0 released on 2002-11-05.
1844
1845 @item IDNA and Punycode implementations, part of the GNU project.
1846 Version 0.1.0 released on 2003-01-05.
1847
1848 @item Uses official IDNA ACE prefix 'xn--'.
1849 Version 0.1.7 released on 2003-02-12.
1850
1851 @item Command line interface.
1852 Version 0.1.11 released on 2003-02-26.
1853
1854 @item GNU Libc add-on proposed.
1855 Version 0.1.12 released on 2003-03-06.
1856
1857 @item Interoperability testing during IDNConnect.
1858 Version 0.3.1 released on 2003-10-02.
1859
1860 @item TLD restriction testing.
1861 Version 0.4.0 released on 2004-02-28.
1862
1863 @item GNU Libc add-on integrated.
1864 Version 0.4.1 released on 2004-03-08.
1865
1866 @item Native Java implementation.
Versions 0.4.2 through 0.4.9, released between 2004-03-20 and 2004-06-11.
1868
1869 @item PR-29 functions for ``problem sequences''.
1870 Version 0.5.0 released on 2004-06-26.
1871
1872 @item Many small portability fixes and wider use.
Versions 0.5.1 through 0.5.20, released between 2004-07-09 and
2005-10-23.
1875
1876 @item Native C# implementation.
1877 Version 0.6.0 released on 2005-12-03.
1878
1879 @item Windows support through cross-compilation.
1880 Version 0.6.1 released on 2006-01-20.
1881
1882 @end table
1883
1884 @node PR29 discussion
1885 @appendix PR29 discussion
1886
1887 If you wish to experiment with a modified Unicode NFKC implementation
1888 according to the PR29 proposal, you may find the following bug report
1889 useful.  However, I have not verified that the suggested modifications
1890 are correct.  For reference, I'm including my response to the report
1891 as well.
1892
1893 @verbatim
1894 From: Rick McGowan <rick@unicode.org>
1895 Subject: Possible bug and status of PR 29 change(s)
1896 To: bug-libidn@gnu.org
1897 Date: Wed, 27 Oct 2004 14:49:17 -0700
1898
1899 Hello. On behalf of the Unicode Consortium editorial committee, I would
1900 like to find out more information about the PR 29 fixes, if any, and
1901 functions in Libidn. Your implementation was listed in the text of PR29 as
1902 needing investigation, so I am following up on several implementations.
1903
1904 The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
1905 draft of UAX #15 has been issued.
1906
1907 I have looked at Libidn 0.5.8 (today), and there may still be a possible
1908 bug in NFKC.java and nfkc.c.
1909
1910 ------------------------------------------------------
1911
1912 1. In NFKC.java, this line in canonicalOrdering():
1913
1914       if (i > 0 && (last_cc == 0 || last_cc != cc)) {
1915
1916 should perhaps be changed to:
1917
1918       if (i > 0 && (last_cc == 0 || last_cc < cc)) {
1919
1920 but I'm not sure of the sense of this comparison.
1921
1922 ------------------------------------------------------
1923
1924 2. In nfkc.c, function _g_utf8_normalize_wc() has this code:
1925
1926           if (i > 0 &&
1927               (last_cc == 0 || last_cc != cc) &&
1928               combine (wc_buffer[last_start], wc_buffer[i],
1929                        &wc_buffer[last_start]))
1930             {
1931
1932 This appears to have the same bug as the current Python implementation (in
1933 Python 2.3.4). The code should be checking, as per new rule D2 UAX #15
1934 update, that the next combining character is the same or HIGHER than the
1935 current one. It now checks to see if it's non-zero and not equal.
1936
1937 The above line(s) should perhaps be changed to:
1938
1939           if (i > 0 &&
1940               (last_cc == 0 || last_cc < cc) &&
1941               combine (wc_buffer[last_start], wc_buffer[i],
1942                        &wc_buffer[last_start]))
1943             {
1944
1945 but I'm not sure of the sense of the comparison (< or > or <=?) here.
1946
1947 In the text of PR29, I will be marking Libidn as "needs change" and adding
1948 the version number that I checked. If any further change is made, please
1949 let me know the release version, and I'll update again.
1950
1951 Regards,
1952         Rick McGowan
1953 @end verbatim
1954
1955 @verbatim
1956 From: Simon Josefsson <jas@extundo.com>
1957 Subject: Re: Possible bug and status of PR 29 change(s)
1958 To: Rick McGowan <rick@unicode.org>
1959 Cc: bug-libidn@gnu.org
1960 Date: Thu, 28 Oct 2004 09:47:47 +0200
1961
1962 Rick McGowan <rick@unicode.org> writes:
1963
1964 > Hello. On behalf of the Unicode Consortium editorial committee, I would
1965 > like to find out more information about the PR 29 fixes, if any, and
1966 > functions in Libidn. Your implementation was listed in the text of PR29 as
1967 > needing investigation, so I am following up on several implementations.
1968 >
1969 > The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
1970 > draft of UAX #15 has been issued.
1971 >
1972 > I have looked at Libidn 0.5.8 (today), and there may still be a possible
1973 > bug in NFKC.java and nfkc.c.
1974
1975 Hello Rick.
1976
1977 I believe the current behavior is intentional.  Libidn do not aim to
1978 implement latest-and-greatest NFKC, it aim to implement the NFKC
1979 functionality required for StringPrep and IDN.  As you may know,
1980 StringPrep/IDN reference Unicode 3.2.0, and explicitly says any later
1981 changes (which I consider PR29 as) do not apply.
1982
1983 In fact, I believe that would I incorporate the changes suggested in
1984 PR29, I would in fact be violating the IDN specifications.
1985
1986 Thanks for looking into the code and finding the place where the
1987 change could be made.  I'll see if I can mention this in the manual
1988 somewhere, for technically interested readers.
1989
1990 Regards,
1991 Simon
1992 @end verbatim
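
To see what the suggested change amounts to, the following minimal
sketch isolates the two comparisons quoted in the report and evaluates
them for a few pairs of combining classes.  It deliberately leaves out
the @code{i > 0} test and the call to @code{combine}, and it makes no
claim about which condition is correct.  The class name
@code{Pr29Condition}, the method names, and the sample values are
hypothetical.

```java
public class Pr29Condition {
    // Comparison as quoted from Libidn 0.5.8 in the report above.
    static boolean current(int lastCc, int cc) {
        return lastCc == 0 || lastCc != cc;
    }

    // Comparison suggested in the report.
    static boolean suggested(int lastCc, int cc) {
        return lastCc == 0 || lastCc < cc;
    }

    public static void main(String[] args) {
        // Sample (last_cc, cc) pairs; the values are arbitrary illustrations.
        int[][] pairs = { {0, 230}, {220, 230}, {230, 230}, {230, 220} };
        for (int[] p : pairs) {
            System.out.println("last_cc=" + p[0] + " cc=" + p[1]
                + " current=" + current(p[0], p[1])
                + " suggested=" + suggested(p[0], p[1]));
        }
    }
}
```

The two comparisons disagree only when @code{last_cc} is non-zero and
greater than @code{cc}, which is the last pair in the sketch.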
1993
1994 @node Copying Information
1995 @appendix Copying Information
1996
1997 @menu
1998 * GNU Free Documentation License::   License for copying this manual.
1999 * GNU LGPL::                         License for copying the library.
2000 * GNU GPL::                          License for copying the programs.
2001 @end menu
2002
2003 @include fdl.texi
2004 @include lgpl.texi
2005 @include gpl.texi
2006
2007 @node Function and Variable Index
2008 @unnumbered Function and Variable Index
2009
2010 @printindex fn
2011
2012 @node Concept Index
2013 @unnumbered Concept Index
2014
2015 @printindex cp
2016
2017 @bye