\input texinfo   @c -*- mode: texinfo; coding: us-ascii; -*-
@c This file is part of GNU Libidn.
@c See below for copyright and license.

@setfilename libidn.info
@include version.texi
@settitle GNU Libidn
@finalout

@syncodeindex pg cp

@copying
This manual was last updated @value{UPDATED} for version
@value{VERSION} of GNU Libidn.

Copyright @copyright{} 2002, 2003, 2004, 2005 Simon Josefsson.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Commercial Support'', no Front-Cover Texts,
and no Back-Cover Texts.  A copy of the license is included in the
section entitled ``GNU Free Documentation License''.
@end quotation
@end copying

@dircategory GNU Libraries
@direntry
* libidn: (libidn).     Internationalized string processing library.
@end direntry

@dircategory GNU utilities
@direntry
* idn: (libidn)Invoking idn.            Command line interface to GNU Libidn.
@end direntry

@dircategory Emacs
@direntry
* IDN Library: (libidn)Emacs API.       Emacs API for IDN functions.
@end direntry

@titlepage
@title GNU Libidn
@subtitle Internationalized string processing for the GNU system
@subtitle for version @value{VERSION}, @value{UPDATED}
@author Simon Josefsson
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top
@top GNU Libidn

@insertcopying
@end ifnottex

@menu
* Introduction::                How to use this manual.
* Preparation::                 What you should do before using the library.
* Utility Functions::           Unicode transformation utility functions.
* Stringprep Functions::        Stringprep functions.
* Punycode Functions::          Punycode functions.
* IDNA Functions::              IDNA functions.
* TLD Functions::               TLD functions.
* PR29 Functions::              Detect strings non-idempotent under NFKC.
* Examples::                    Demonstrate how to use the library.
* Invoking idn::                Command line interface to the library.
* Emacs API::                   Emacs Lisp API for Libidn.
* Java API::                    Notes on the Java port of Libidn.
* C# API::                      Notes on the C# port of Libidn.
* Acknowledgements::            Whom to blame.
* Milestones::                  Rough outline of development history.

Indices

* Concept Index::
* Function and Variable Index::

Appendices

* PR29 discussion::             Implementation aspects of the PR29 flaw.
* Library Copying::             How you can copy and share GNU Libidn.
* Copying This Manual::         How you can copy and share this manual.

@end menu


@node Introduction
@chapter Introduction

GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
specifications defined by the IETF Internationalized Domain Names
(IDN) working group, used for internationalized domain names.  The
package is available under the GNU Lesser General Public License.

The library contains a generic Stringprep implementation that does
Unicode 3.2 NFKC normalization, mapping and prohibition of
characters, and bidirectional character handling.  Profiles for
Nameprep, iSCSI, SASL and XMPP are included.  Punycode and ASCII
Compatible Encoding (ACE) via IDNA are supported.  A mechanism to
define Top-Level Domain (TLD) specific validation tables, and to
compare strings against those tables, is included.  Default tables for
some TLDs are also included.

The Stringprep API consists of two main functions, one for converting
data from the system's native representation into UTF-8, and one for
performing the Stringprep processing.  Adding a new Stringprep profile
for your application within the API is straightforward.  The Punycode
API consists of one encoding function and one decoding function.  The
IDNA API consists of the ToASCII and ToUnicode functions, as well as a
high-level interface for converting entire domain names to and from
the ACE encoded form.  The TLD API consists of functions to extract
the TLD name from a domain string, functions to locate the proper TLD
table to use based on the TLD name, core functions to validate a
string against a TLD table, and utility wrappers that perform all of
these steps in one call.

The library is used by, e.g., GNU SASL and Shishi to process user
names and passwords.  Libidn can be built into GNU Libc to enable a
new system-wide @code{getaddrinfo} flag for IDN processing.

Libidn is developed for the GNU/Linux system, but runs on over 20 Unix
platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
Libidn is written in C and (parts of) the API are accessible from C,
C++, Emacs Lisp, Python and Java.  Native Java and C# ports are also
provided.

@menu
* Getting Started::
* Features::
* Library Overview::
* Supported Platforms::
* Getting help::
* Commercial Support::
* Downloading and Installing::
* Bug Reports::
* Contributing::
@end menu

@node Getting Started
@section Getting Started

This manual documents the library programming interface.  All
functions and data types provided by the library are explained.
Also included are examples, and documentation for the command line
tool @file{idn} that provides a quick interface to the library.  The
Emacs Lisp bindings for the library are also discussed.

The reader is assumed to possess basic familiarity with
internationalization concepts and network programming in C or C++.

This manual can be used in several ways.  If read from the beginning
to the end, it gives a good introduction to the library and how it
can be used in an application.  Forward references are included where
necessary.  Later on, the manual can be used as a reference manual to
get just the information needed about any particular interface of the
library.  Experienced programmers might want to start looking at the
examples at the end of the manual (@pxref{Examples}), and then only
read up on those parts of the interface which are unclear.

@node Features
@section Features

This library might have a couple of advantages over other libraries
doing a similar job.

@table @asis
@item It's Free Software
Anybody can use, modify, and redistribute it under the terms of the
GNU Lesser General Public License.

@item It's thread-safe
No global state is kept in the library.  All functions are reentrant.

@item It's portable
The code is intended to be written in pure ANSI C89.  It has been
tested on many Unix-like operating systems, and Windows.

@item It's modularized
The library is composed of several modules, and the only interaction
between modules is through each module's public API.  If you only need
one piece of functionality, it is possible to take the files you need
and incorporate them into your own project.

@item It's not bloated
The design of the library is based on the smallest API necessary to
implement the basic functionality.  It has been carefully extended
with a small number of high-level wrappers to make it comfortable to
use the library.  However, it does not implement additional
functionality just for the sake of completeness.

@item It's documented
Sadly, not all software comes with documentation these days.  This one
does.

@end table

@node Library Overview
@section Library Overview

The following illustration shows the components that make up Libidn,
and how your application relates to the library.  In the illustration,
various components are shown as boxes.  You see the generic StringPrep
component, the various StringPrep profiles including Nameprep, the
Punycode component, the IDNA component, and the TLD component.  The
arrows indicate aggregation, e.g., IDNA uses Punycode and Nameprep,
and in turn Nameprep uses the generic StringPrep interface.  The
interfaces to all components are available for applications; no
component within the library is hidden from the application.

@image{components}
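
As a rough sketch of this layering, an application that only needs
the ACE form of a domain name can call the IDNA component and let it
apply Nameprep and Punycode to each label internally.  The helper name
@code{print_ace} is hypothetical and not part of Libidn; the input is
assumed to be a zero-terminated UTF-8 string.

@verbatim
#include <stdio.h>
#include <stdlib.h>
#include <idna.h>

/* Hypothetical helper, not part of Libidn: print the ACE form of a
   zero-terminated UTF-8 domain name.  idna_to_ascii_8z applies
   Nameprep and Punycode to each label internally.  Returns
   IDNA_SUCCESS (0) or the IDNA error code.  */
int
print_ace (const char *utf8_domain)
{
  char *ace = NULL;
  int rc = idna_to_ascii_8z (utf8_domain, &ace, 0);

  if (rc != IDNA_SUCCESS)
    {
      fprintf (stderr, "idna_to_ascii_8z failed with code %d\n", rc);
      return rc;
    }

  printf ("%s\n", ace);
  free (ace);    /* the result was allocated by the library */
  return IDNA_SUCCESS;
}
@end verbatim

The application never calls the Nameprep or Punycode interfaces
itself here, although it is free to do so, since no component is
hidden.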

@node Supported Platforms
@section Supported Platforms

Libidn has at some point in time been tested on the following
platforms.

@enumerate

@item Debian GNU/Linux 3.0 (Woody)
@cindex Debian

GCC 2.95.4 and GNU Make. This is the main development platform.
@code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
@code{arm-unknown-linux-gnu}, @code{armv4l-unknown-linux-gnu},
@code{hppa-unknown-linux-gnu}, @code{hppa64-unknown-linux-gnu},
@code{i686-pc-linux-gnu}, @code{ia64-unknown-linux-gnu},
@code{m68k-unknown-linux-gnu}, @code{mips-unknown-linux-gnu},
@code{mipsel-unknown-linux-gnu}, @code{powerpc-unknown-linux-gnu},
@code{s390-ibm-linux-gnu}, @code{sparc-unknown-linux-gnu},
@code{sparc64-unknown-linux-gnu}.

@item Debian GNU/Linux 2.1
@cindex Debian

GCC 2.95.1 and GNU Make. @code{armv4l-unknown-linux-gnu}.

@item Tru64 UNIX
@cindex Tru64

Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
@code{alphaev68-dec-osf5.1}.

@item SuSE Linux 7.1
@cindex SuSE

GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
@code{alphaev67-unknown-linux-gnu}.

@item SuSE Linux 7.2a
@cindex SuSE Linux

GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.

@item SuSE Linux
@cindex SuSE Linux

GCC 3.2.2 and GNU Make.  @code{x86_64-unknown-linux-gnu} (AMD64
Opteron ``Melody'').

@item SuSE Enterprise Server 9 on IBM OpenPower 720
@cindex SuSE Linux
@cindex OpenPower 720

GCC 3.3.3 and GNU Make.  @code{powerpc64-unknown-linux-gnu}.

@item RedHat Linux 7.2
@cindex RedHat

GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
@code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.

@item RedHat Linux 8.0
@cindex RedHat

GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.

@item RedHat Advanced Server 2.1
@cindex RedHat Advanced Server

GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.

@item Slackware Linux 8.0.01
@cindex Slackware

GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.

@item Mandrake Linux 9.0
@cindex Mandrake

GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.

@item IRIX 6.5
@cindex IRIX

MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.

@item AIX 4.3.2
@cindex AIX

IBM C for AIX compiler, AIX Make.  @code{rs6000-ibm-aix4.3.2.0}.

@item Microsoft Windows 2000 (Cygwin)
@cindex Windows

GCC 3.2, GNU Make. @code{i686-pc-cygwin}.

@item HP-UX 11
@cindex HP-UX

HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
@code{hppa2.0w-hp-hpux11.11}.

@item SUN Solaris 2.7
@cindex Solaris

GCC 3.0.4 and GNU Make. @code{sparc-sun-solaris2.7}.

@item SUN Solaris 2.8
@cindex Solaris

Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.

@item SUN Solaris 2.9
@cindex Solaris

Sun Forte Developer 7 C compiler and GNU
Make. @code{sparc-sun-solaris2.9}.

@item NetBSD 1.6
@cindex NetBSD

GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
@code{i386-unknown-netbsdelf1.6}.

@item OpenBSD 3.1 and 3.2
@cindex OpenBSD

GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
@code{i386-unknown-openbsd3.1}.

@item FreeBSD 4.7 and 4.8
@cindex FreeBSD

GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
@code{alpha-unknown-freebsd4.8}, @code{i386-unknown-freebsd4.7},
@code{i386-unknown-freebsd4.8}.

@item MacOS X 10.2 Server Edition
@cindex MacOS X

GCC 3.1 and GNU Make. @code{powerpc-apple-darwin6.5}.

@item MacOS X 10.4 ``Tiger'' with Xcode 2.0
@cindex MacOS X

GCC 4.0 and GNU Make. @code{powerpc-apple-darwin8.0}.

@item Cross compiled to uClinux/uClibc on Motorola Coldfire
@cindex Motorola Coldfire
@cindex uClinux
@cindex uClibc

GCC 3.4 and GNU Make. @code{m68k-uclinux-elf}.

@item Cross compiled to ARM using Glibc
@cindex ARM

GCC 2.95 and GNU Make. @code{arm-linux}.

@end enumerate

If you use Libidn on, or port Libidn to, a new platform, please report
it to the author.

@node Getting help
@section Getting help

A mailing list where users of Libidn may help each other exists, and
you can reach it by sending e-mail to @email{help-libidn@@gnu.org}.
Archives of the mailing list discussions, and an interface to manage
subscriptions, are available through the World Wide Web at
@url{http://lists.gnu.org/mailman/listinfo/help-libidn}.

@node Commercial Support
@section Commercial Support

Commercial support is available for users of GNU Libidn.  The kind of
support that can be purchased may include:

@itemize

@item Implement new features.
Such as country code specific profiling to support a restricted subset
of Unicode.

@item Port Libidn to new platforms.
This could include porting Libidn to an embedded platforms that may
need memory or size optimization.

@item Integrating IDN support in your existing project.

@item System design of components related to IDN.

@end itemize

If you are interested, please write to:

@verbatim
Simon Josefsson Datakonsult
Hagagatan 24
113 47 Stockholm
Sweden

E-mail: simon@josefsson.org
@end verbatim

If your company provide support related to GNU Libidn and would like
to be mentioned here, contact the author (@pxref{Bug Reports}).

@node Downloading and Installing
@section Downloading and Installing
@cindex Installation
@cindex Download

The package can be downloaded from several places, including:

@url{http://josefsson.org/libidn/releases/}

The latest version is stored in a file, e.g.,
@samp{libidn-@value{VERSION}.tar.gz} where the @samp{@value{VERSION}}
value is the highest version number in the directory.

The package is then extracted, configured and built like many other
packages that use Autoconf.  For detailed information on configuring
and building it, refer to the @file{INSTALL} file that is part of the
distribution archive.

Here is an example terminal session that downloads, configures, builds
and installs the package.  You will need a few basic tools, such as
@samp{sh}, @samp{make} and @samp{cc}.

@example
$ wget -q http://josefsson.org/libidn/releases/libidn-@value{VERSION}.tar.gz
$ tar xfz libidn-@value{VERSION}.tar.gz
$ cd libidn-@value{VERSION}/
$ ./configure
...
$ make
...
$ make install
...
@end example

After that Libidn should be properly installed and ready for use.

A few @code{configure} options may be relevant; they are summarized in
the following table.

@table @code

@item --enable-java
Build the Java port into a JAR file.  @xref{Java API}, for more
information.

@item --disable-tld
Disable the TLD module.  This would typically only be useful if you
are building on a memory-restricted platform.  @xref{TLD Functions},
for more information.

@item --enable-csharp[=IMPL]
Build the C# port into a DLL file.  @xref{C# API}, for more
information.  Here, @code{IMPL} is @code{pnet} or @code{mono},
indicating whether the PNET @command{cscc} compiler or the Mono
@command{mcs} compiler should be used, respectively.

@end table

For the complete list, refer to the output from @code{configure
--help}.

@node Bug Reports
@section Bug Reports
@cindex Reporting Bugs

If you think you have found a bug in Libidn, please investigate it and
report it.

@itemize @bullet

@item Please make sure that the bug is really in Libidn, and
preferably also check that it hasn't already been fixed in the latest
version.

@item You have to send us a test case that makes it possible for us to
reproduce the bug.

@item You also have to explain what is wrong: whether you get a crash,
or whether the printed results are incorrect and, if so, in what way.
Make sure that the bug report includes all the information you would
need to fix this kind of bug for someone else.

@end itemize

Please make an effort to produce a self-contained report, with
something definite that can be tested or debugged.  Vague queries or
piecemeal messages are difficult to act on and don't help the
development effort.

If your bug report is good, we will do our best to help you to get a
corrected version of the software; if the bug report is poor, we won't
do anything about it (apart from asking you to send better bug
reports).

If you think something in this manual is unclear, or downright
incorrect, or if the language needs to be improved, please also send a
note.

Send your bug report to:

@center @samp{bug-libidn@@gnu.org}


@node Contributing
@section Contributing
@cindex Contributing
@cindex Hacking

If you want to submit a patch for inclusion (anything from fixing a
typo you discovered to adding support for a new feature), you should
submit it as a bug report (@pxref{Bug Reports}).  There are some
things that you can do to increase the chances for it to be included
in the official package.

Unless your patch is very small (say, under 10 lines) we require that
you assign the copyright of your work to the Free Software Foundation.
This is to protect the freedom of the project.  If you have not
already signed papers, we will send you the necessary information when
you submit your contribution.

For contributions that don't consist of actual programming code, the
only guideline is common sense.  Use it.

For code contributions, a number of style guides will help you:

@itemize @bullet

@item Coding Style.
Follow the GNU Standards document (@pxref{top, GNU Coding Standards,,
standards}).

If you normally code using another coding standard, there is no
problem, but you should use @samp{indent} to reformat the code
(@pxref{top, GNU Indent,, indent}) before submitting your work.

@item Use the unified diff format @samp{diff -u}.

@item Return errors.
No reason whatsoever should abort the execution of the library.  Even
memory allocation errors, e.g., when @code{malloc} returns @code{NULL},
should be handled gracefully and reported as an error code.

@item Design with thread safety in mind.
Don't use global variables and the like.

@item Avoid using the C math library.
It causes problems for embedded implementations, and in most
situations it is very easy to avoid using it.

@item Document your functions.
Put a comment before each function header; if properly formatted,
these comments are extracted into GTK-DOC web pages.  Don't forget to
update the Texinfo manual as well.

@item Supply ChangeLog and NEWS entries, where appropriate.

@end itemize
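
The ``Return errors'' and thread-safety guidelines can be illustrated
with a small sketch.  The function @code{example_strdup} is
hypothetical and not part of Libidn; reusing the
@code{STRINGPREP_MALLOC_ERROR} and @code{STRINGPREP_OK} return codes
from @file{stringprep.h} is only an illustrative choice.

@verbatim
#include <stdlib.h>
#include <string.h>
#include <stringprep.h>

/* Hypothetical example, not part of Libidn: copy IN into newly
   allocated memory stored in *OUT.  An allocation failure is reported
   through the return code instead of aborting, and no global state is
   touched, so the function is reentrant.  */
int
example_strdup (const char *in, char **out)
{
  size_t len = strlen (in);
  char *p = malloc (len + 1);

  if (p == NULL)
    return STRINGPREP_MALLOC_ERROR;

  memcpy (p, in, len + 1);   /* copies the terminating zero too */
  *out = p;
  return STRINGPREP_OK;
}
@end verbatim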

@c **********************************************************
@c *******************  Preparation  ************************
@c **********************************************************
@node Preparation
@chapter Preparation

To use `Libidn', you have to make some changes to your sources and
the build system.  The necessary changes are small and explained in
the following sections.  This chapter also describes how the library
is initialized, and how to verify that the library in use meets the
requirements of your application.

A faster way to find out how to adapt your application for use with
`Libidn' may be to look at the examples at the end of this manual
(@pxref{Examples}).

@menu
* Header::
* Initialization::
* Version Check::
* Building the source::
* Autoconf tests::
@end menu

@node Header
@section Header

The library contains a few independent parts, and each part exports
its interfaces (data types and functions) in a header file.  You must
include the appropriate header files in all programs using the
library, either directly or through some other header file, like this:

@example
#include <stringprep.h>
@end example

The header files and the functions they define are categorized as
follows:

@table @asis
@item stringprep.h

The low-level stringprep API entry point.  For IDN applications, this
is usually invoked via IDNA.  Some applications, specifically non-IDN
ones, may want to prepare strings directly though, and should include
this header file.

The name space of the stringprep part of Libidn is @code{stringprep*}
for function names, @code{Stringprep*} for data types and
@code{STRINGPREP_*} for other symbols.  In addition,
@code{_stringprep*} is reserved for internal use and should never be
used by applications.

@item punycode.h

The entry point to Punycode encoding and decoding functions.  Normally
punycode is used via the @file{idna.h} interface, but some
applications may want to perform raw punycode operations.

The name space of the punycode part of Libidn is @code{punycode_*} for
function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
for other symbols.  In addition, @code{_punycode*} is reserved for
internal use and should never be used by applications.

@item idna.h

The entry point to the IDNA functions.  This is the normal entry point
for applications that need IDN functionality.

The name space of the IDNA part of Libidn is @code{idna_*} for
function names, @code{Idna*} for data types and @code{IDNA_*} for
other symbols.  In addition, @code{_idna*} is reserved for internal
use and should never be used by applications.

@item tld.h

The entry point to the TLD functions.  Normal applications are not
expected to need this functionality, but it is present for
applications that are used by TLDs to validate customer input.

The name space of the TLD part of Libidn is @code{tld_*} for function
names, @code{Tld_*} for data types and @code{TLD_*} for other symbols.
In addition, @code{_tld*} is reserved for internal use and should
never be used by applications.

@item pr29.h

The entry point to the PR29 functions.  These functions are used to
detect ``problem sequences'' (@pxref{PR29 Functions}), mostly for use
in security-critical applications.

The name space of the PR29 part of Libidn is @code{pr29_*} for
function names, @code{Pr29_*} for data types and @code{PR29_*} for
other symbols.  In addition, @code{_pr29*} is reserved for internal
use and should never be used by applications.

@end table

@node Initialization
@section Initialization

Libidn is stateless and does not need any initialization.

@node Version Check
@section Version Check

It is often desirable to check that the version of `Libidn' used is
indeed one which fits all requirements.  Even with binary
compatibility, new features may have been introduced, but due to
problems with the dynamic linker an old version may actually be used.
So you may want to check that the version is okay right after program
startup.

@include texi/stringprep_check_version.texi

The normal way to use the function is to put something similar to the
following first in your @code{main}:

@example
  if (!stringprep_check_version (STRINGPREP_VERSION))
    @{
      printf ("stringprep_check_version() failed:\n"
              "Header file incompatible with shared library.\n");
      exit(1);
    @}
@end example
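
A slightly fuller sketch also reports which versions are involved.
Passing @code{NULL} to @code{stringprep_check_version} returns the
version string of the run-time library without performing a
comparison.  The function name @code{libidn_version_ok} is
hypothetical.

@verbatim
#include <stdio.h>
#include <stringprep.h>

/* Hypothetical helper: return 1 if the run-time library is at least
   as new as the stringprep.h header used at compile time, else 0.  */
int
libidn_version_ok (void)
{
  /* With a NULL argument no check is made; the run-time version
     string is simply returned.  */
  const char *runtime = stringprep_check_version (NULL);

  if (stringprep_check_version (STRINGPREP_VERSION) == NULL)
    {
      fprintf (stderr, "Libidn %s is older than the header (%s)\n",
               runtime, STRINGPREP_VERSION);
      return 0;
    }

  printf ("Compiled with Libidn %s, running with %s\n",
          STRINGPREP_VERSION, runtime);
  return 1;
}
@end verbatim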

@node Building the source
@section Building the source
@cindex Compiling your application

If you want to compile a source file including e.g. the @file{idna.h}
header file, you must make sure that the compiler can find it in the
directory hierarchy.  This is accomplished by adding the path to the
directory in which the header file is located to the compiler's
include file search path (via the @option{-I} option).

However, the path to the include file is determined at the time the
source is configured.  To solve this problem, `Libidn' uses the
external package @command{pkg-config} that knows the path to the
include file and other configuration options.  The options that need
to be added to the compiler invocation at compile time are output by
the @option{--cflags} option to @command{pkg-config libidn}.  The
following example shows how it can be used at the command line:

@example
gcc -c foo.c `pkg-config libidn --cflags`
@end example

Adding the output of @samp{pkg-config libidn --cflags} to the
compiler's command line will ensure that the compiler can find e.g.
the @file{idna.h} header file.

A similar problem occurs when linking the program with the library.
Again, the compiler has to find the library files.  For this to work,
the path to the library files has to be added to the library search
path (via the @option{-L} option).  For this, the option
@option{--libs} to @command{pkg-config libidn} can be used.  For
convenience, this option also outputs all other options that are
required to link the program with the `libidn' library.  The example
shows how to link @file{foo.o} with the `libidn' library to a program
@command{foo}.

@example
gcc -o foo foo.o `pkg-config libidn --libs`
@end example

Of course you can also combine both examples into a single command by
specifying both options to @command{pkg-config}:

@example
gcc -o foo foo.c `pkg-config libidn --cflags --libs`
@end example

@node Autoconf tests
@section Autoconf tests
@cindex Autoconf tests
@cindex Configure tests

If your project uses Autoconf (@pxref{top, GNU Autoconf,, autoconf})
to check for installed libraries, you might find the following snippet
illustrative.  It adds a new @file{configure} parameter
@code{--with-libidn}, checks for @file{idna.h} and @samp{-lidn}
(possibly below the directory specified as the optional argument to
@code{--with-libidn}), and defines the @acronym{CPP} symbol
@code{LIBIDN} if the library is found.  The default behaviour is to
search for the library and enable the functionality (that is, define
the symbol) when the library is found, but if you wish to make the
default behaviour of your package be that Libidn is not used (even if
it is installed on the system), change @samp{libidn=yes} to
@samp{libidn=no} on the third line.

@example
AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
                                [Support IDN (needs GNU Libidn)]),
  libidn=$withval, libidn=yes)
if test "$libidn" != "no"; then
  if test "$libidn" != "yes"; then
    LDFLAGS="$@{LDFLAGS@} -L$libidn/lib"
    CPPFLAGS="$@{CPPFLAGS@} -I$libidn/include"
  fi
  AC_CHECK_HEADER(idna.h,
    AC_CHECK_LIB(idn, stringprep_check_version,
      [libidn=yes LIBS="$@{LIBS@} -lidn"], libidn=no),
    libidn=no)
fi
if test "$libidn" != "no" ; then
  AC_DEFINE(LIBIDN, 1, [Define to 1 if you want IDN support.])
else
  AC_MSG_WARN([Libidn not found])
fi
AC_MSG_CHECKING([if Libidn should be used])
AC_MSG_RESULT($libidn)
@end example

If you require that your users have installed @code{pkg-config} (which
I cannot generally recommend), the above can be done more easily as
follows.

@example
AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
                                [Support IDN (needs GNU Libidn)]),
  libidn=$withval, libidn=yes)
if test "$libidn" != "no" ; then
  PKG_CHECK_MODULES(LIBIDN, libidn >= 0.0.0, [libidn=yes], [libidn=no])
  if test "$libidn" != "yes" ; then
    libidn=no
    AC_MSG_WARN([Libidn not found])
  else
    libidn=yes
    AC_DEFINE(LIBIDN, 1, [Define to 1 if you want Libidn.])
  fi
fi
AC_MSG_CHECKING([if Libidn should be used])
AC_MSG_RESULT($libidn)
@end example
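
On the C side, the @code{LIBIDN} symbol defined by either snippet can
guard the Libidn-specific code.  The sketch below assumes that the
project uses @code{AC_CONFIG_HEADERS}, so that the definition ends up
in @file{config.h} and @code{HAVE_CONFIG_H} is passed to the compiler;
the helper @code{host_to_ascii} is hypothetical.

@verbatim
#ifdef HAVE_CONFIG_H
# include <config.h>
#endif

#include <stdlib.h>
#include <string.h>

#ifdef LIBIDN
# include <idna.h>
#endif

/* Hypothetical helper: store in *OUT a newly allocated ASCII form of
   HOST.  With Libidn (LIBIDN defined by configure) the name is
   converted with IDNA; without it the name is copied unchanged.
   Returns 0 on success, -1 on failure; the caller frees *OUT.  */
int
host_to_ascii (const char *host, char **out)
{
#ifdef LIBIDN
  return idna_to_ascii_8z (host, out, 0) == IDNA_SUCCESS ? 0 : -1;
#else
  *out = malloc (strlen (host) + 1);
  if (*out == NULL)
    return -1;
  strcpy (*out, host);
  return 0;
#endif
}
@end verbatim

Keeping the fallback path compiled when Libidn is absent lets the
package build either way, matching the optional nature of the
@code{--with-libidn} parameter.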

@c **********************************************************
@c ********************  Utility Functions ******************
@c **********************************************************
@node Utility Functions
@chapter Utility Functions
@cindex Utility Functions

The rest of this library makes extensive use of Unicode characters.
In order to interface this library with the outside world, your
application may need to make various Unicode transformations.

@section Header file @code{stringprep.h}

To use the functions explained in this chapter, you need to include
the file @file{stringprep.h} using:

@example
#include <stringprep.h>
@end example

@section Unicode Encoding Transformation

@include texi/stringprep_unichar_to_utf8.texi
@include texi/stringprep_utf8_to_unichar.texi
@include texi/stringprep_ucs4_to_utf8.texi
@include texi/stringprep_utf8_to_ucs4.texi
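
As an illustration, the following sketch decodes a UTF-8 string into
UCS-4 code points with @code{stringprep_utf8_to_ucs4}, prints them,
and encodes them back with @code{stringprep_ucs4_to_utf8}.  A length
of @code{-1} means the input is zero-terminated.  The helper
@code{utf8_round_trip} is hypothetical, and both returned buffers are
assumed to be released with @code{free}.

@verbatim
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stringprep.h>

/* Hypothetical helper: decode a UTF-8 string into UCS-4 code points,
   print them, and re-encode them to UTF-8.  Returns 1 if the round
   trip reproduces the input, 0 otherwise (including on errors).  */
int
utf8_round_trip (const char *utf8)
{
  size_t n = 0, i;
  uint32_t *ucs4 = stringprep_utf8_to_ucs4 (utf8, -1, &n);
  char *back;
  int ok;

  if (ucs4 == NULL)
    return 0;

  for (i = 0; i < n; i++)
    printf ("U+%04lX\n", (unsigned long) ucs4[i]);

  /* The items_read and items_written outputs are not needed here.  */
  back = stringprep_ucs4_to_utf8 (ucs4, (ssize_t) n, NULL, NULL);
  ok = back != NULL && strcmp (back, utf8) == 0;

  free (back);
  free (ucs4);
  return ok;
}
@end verbatim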

@section Unicode Normalization

@include texi/stringprep_ucs4_nfkc_normalize.texi
@include texi/stringprep_utf8_nfkc_normalize.texi
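
A common use of normalization is comparing strings that differ only
in compatibility forms.  With the hypothetical helper below, U+FB01
LATIN SMALL LIGATURE FI and the two-letter string @samp{fi} compare
equal after @code{stringprep_utf8_nfkc_normalize}.  The helper name
and its @code{-1} error convention are assumptions made for the
example.

@verbatim
#include <stdlib.h>
#include <string.h>
#include <stringprep.h>

/* Hypothetical helper: return 1 if two zero-terminated UTF-8 strings
   are equal after NFKC normalization, 0 if they differ, and -1 if
   normalization fails.  */
int
nfkc_equal (const char *a, const char *b)
{
  char *na = stringprep_utf8_nfkc_normalize (a, -1);
  char *nb = stringprep_utf8_nfkc_normalize (b, -1);
  int result = -1;

  if (na != NULL && nb != NULL)
    result = strcmp (na, nb) == 0;

  free (na);
  free (nb);
  return result;
}
@end verbatim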

@section Character Set Conversion

@include texi/stringprep_locale_charset.texi
@include texi/stringprep_convert.texi
@include texi/stringprep_locale_to_utf8.texi
@include texi/stringprep_utf8_to_locale.texi
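
A typical pattern is to convert user input, such as a command-line
argument, from the locale's character set into UTF-8 before handing
it to the rest of the library.  The helper @code{input_to_utf8} below
is hypothetical, and the sketch assumes the program has already called
@code{setlocale (LC_ALL, "")}; otherwise the C library typically
reports the codeset of the default @samp{C} locale.

@verbatim
#include <stdio.h>
#include <stringprep.h>

/* Hypothetical helper: convert INPUT from the locale's character set
   to a newly allocated UTF-8 string, or return NULL (after printing a
   diagnostic) if the conversion fails.  The caller frees the result.  */
char *
input_to_utf8 (const char *input)
{
  char *utf8 = stringprep_locale_to_utf8 (input);

  if (utf8 == NULL)
    fprintf (stderr, "cannot convert input from %s to UTF-8\n",
             stringprep_locale_charset ());
  return utf8;
}
@end verbatim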
 859
 860
 861 @c **********************************************************
 862 @c ******************  Stringprep Functions *****************
 863 @c **********************************************************
 864 @node Stringprep Functions
 865 @chapter Stringprep Functions
 866 @cindex Stringprep Functions
 867
 868 Stringprep describes a framework for preparing Unicode text strings in
 869 order to increase the likelihood that string input and string
 870 comparison work in ways that make sense for typical users throughout
 871 the world. The stringprep protocol is useful for protocol identifier
 872 values, company and personal names, internationalized domain names,
 873 and other text strings.
 874
 875 @section Header file @code{stringprep.h}
 876
 877 To use the functions explained in this chapter, you need to include
 878 the file @file{stringprep.h} using:
 879
 880 @example
 881 #include <stringprep.h>
 882 @end example
 883
 884 @section Defining A Stringprep Profile
 885
 886 Further types and structures are defined for applications that want to
 887 specify their own stringprep profile.  As these are fairly obscure,
 888 and by necessity tied to the implementation, we do not document them
 889 here.  Look into the @file{stringprep.h} header file, and the
 890 @file{profiles.c} source code for the details.
 891
 892 @section Control Flags
 893
 894 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_NFKC}
Disable the NFKC normalization, and select the non-NFKC case folding
tables.  Usually the profile specifies BIDI and NFKC settings, and
applications should not override them except in special situations.
 898 @end deftypevr
 899
 900 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_BIDI}
Disable the BIDI step.  Usually the profile specifies BIDI and NFKC
settings, and applications should not override them except in special
situations.
 904 @end deftypevr
 905
 906 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_UNASSIGNED}
Make the library return an error if the string contains characters
that are unassigned according to the profile.
 909 @end deftypevr
 910
 911 @section Core Functions
 912
 913 @include texi/stringprep_4i.texi
 914 @include texi/stringprep_4zi.texi
 915 @include texi/stringprep.texi
 916 @include texi/stringprep_profile.texi
 917
 918 @section Error Handling
 919
 920 @include texi/stringprep_strerror.texi
 921
 922 @section Stringprep Profile Macros
 923
 924 @deftypefun {int} stringprep_nameprep_no_unassigned (char * @var{in}, int @var{maxlen})
 925
@var{in}: input/output array with string to prepare.

@var{maxlen}: maximum length of input/output array.

Prepare the input UTF-8 string according to the nameprep profile.  The
AllowUnassigned flag is false; use @code{stringprep_nameprep} for
true AllowUnassigned.  Returns 0 iff successful, or an error code.
 933 @end deftypefun
 934
 935 @deftypefun {int} stringprep_iscsi (char * @var{in}, int @var{maxlen})
 936
@var{in}: input/output array with string to prepare.
 938
 939 @var{maxlen}: maximum length of input/output array.
 940
 941 Prepare the input UTF-8 string according to the draft iSCSI stringprep
 942 profile.  Returns 0 iff successful, or an error code.
 943 @end deftypefun
 944
 945 @deftypefun {int} stringprep_plain (char * @var{in}, int @var{maxlen})
 946
@var{in}: input/output array with string to prepare.
 948
 949 @var{maxlen}: maximum length of input/output array.
 950
 951 Prepare the input UTF-8 string according to the draft SASL ANONYMOUS
 952 profile.  Returns 0 iff successful, or an error code.
 953 @end deftypefun
 954
 955 @deftypefun {int} stringprep_xmpp_nodeprep (char * @var{in}, int @var{maxlen})
 956
@var{in}: input/output array with string to prepare.
 958
 959 @var{maxlen}: maximum length of input/output array.
 960
 961 Prepare the input UTF-8 string according to the draft XMPP node
 962 identifier profile.  Returns 0 iff successful, or an error code.
 963 @end deftypefun
 964
 965 @deftypefun {int} stringprep_xmpp_resourceprep (char * @var{in}, int @var{maxlen})
 966
@var{in}: input/output array with string to prepare.
 968
 969 @var{maxlen}: maximum length of input/output array.
 970
 971 Prepare the input UTF-8 string according to the draft XMPP resource
 972 identifier profile.  Returns 0 iff successful, or an error code.
 973 @end deftypefun
 974
 975 @c **********************************************************
 976 @c *******************  Punycode Functions ******************
 977 @c **********************************************************
 978 @node Punycode Functions
 979 @chapter Punycode Functions
 980 @cindex Punycode Functions
 981
 982 Punycode is a simple and efficient transfer encoding syntax designed
 983 for use with Internationalized Domain Names in Applications. It
 984 uniquely and reversibly transforms a Unicode string into an ASCII
 985 string. ASCII characters in the Unicode string are represented
 986 literally, and non-ASCII characters are represented by ASCII
 987 characters that are allowed in host name labels (letters, digits, and
 988 hyphens). A general algorithm called Bootstring allows a string of
 989 basic code points to uniquely represent any string of code points
 990 drawn from a larger set. Punycode is an instance of Bootstring that
 991 uses particular parameter values, appropriate for IDNA.
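
The Bootstring procedure can be made concrete with a compact encoder
that follows the pseudocode in RFC 3492, section 6.3.  This sketch is
for explanation only and is not Libidn's @code{punycode_encode}: the
name @code{punycode_sketch_encode} is hypothetical, the overflow checks
required by the RFC are omitted, and the optional case flags are not
supported.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Punycode parameter values from RFC 3492, section 5.  */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0x80 };

/* Map a digit value 0..35 to its basic code point: a-z, then 0-9.  */
static char
encode_digit (uint32_t d)
{
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function, RFC 3492 section 6.1.  */
static uint32_t
adapt (uint32_t delta, uint32_t numpoints, int firsttime)
{
  uint32_t k = 0;
  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  while (delta > ((BASE - TMIN) * TMAX) / 2)
    {
      delta /= BASE - TMIN;
      k += BASE;
    }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Append C to OUT, keeping room for the final NUL; fail if full.  */
#define PUT(c) do { if (o + 1 >= outmax) return -1; out[o++] = (c); } while (0)

/* Hypothetical helper: Punycode-encode the LEN code points in IN into
   OUT, whose capacity OUTMAX includes the terminating NUL.  Returns 0
   on success or -1 if OUT is too small.  No overflow checking.  */
int
punycode_sketch_encode (const uint32_t *in, size_t len,
                        char *out, size_t outmax)
{
  uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  size_t o = 0, h, b;

  /* Copy the basic (ASCII) code points literally, in order.  */
  for (size_t j = 0; j < len; j++)
    if (in[j] < 0x80)
      PUT ((char) in[j]);
  h = b = o;
  if (b > 0)                    /* delimiter only if there were any */
    PUT ('-');

  while (h < len)
    {
      /* Next code point to insert: the smallest one >= n.  */
      uint32_t m = UINT32_MAX;
      for (size_t j = 0; j < len; j++)
        if (in[j] >= n && in[j] < m)
          m = in[j];
      delta += (m - n) * (uint32_t) (h + 1);
      n = m;

      for (size_t j = 0; j < len; j++)
        {
          if (in[j] < n)
            delta++;
          else if (in[j] == n)
            {
              /* Emit DELTA as a generalized variable-length integer.  */
              uint32_t q = delta;
              for (uint32_t k = BASE;; k += BASE)
                {
                  uint32_t t = k <= bias ? TMIN
                    : k >= bias + TMAX ? TMAX : k - bias;
                  if (q < t)
                    break;
                  PUT (encode_digit (t + (q - t) % (BASE - t)));
                  q = (q - t) / (BASE - t);
                }
              PUT (encode_digit (q));
              bias = adapt (delta, (uint32_t) (h + 1), h == b);
              delta = 0;
              h++;
            }
        }
      delta++;
      n++;
    }
  out[o] = '\0';
  return 0;
}
@end verbatim

For example, encoding the code points of @samp{b@"ucher} with this
sketch yields @samp{bcher-kva}: the basic letters are copied first,
followed by the delimiter and a variable-length integer that encodes
both the value U+00FC and its position.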
 992
 993 @section Header file @code{punycode.h}
 994
 995 To use the functions explained in this chapter, you need to include
 996 the file @file{punycode.h} using:
 997
 998 @example
 999 #include <punycode.h>
1000 @end example
1001
1002 @section Unicode Code Point Data Type
1003
The Punycode functions use a special type to denote Unicode code
points.  It is guaranteed to always be a 32-bit unsigned integer.
1006
1007 @deftypevr {Punycode Unicode code point} uint32_t punycode_uint
An unsigned integer that holds a Unicode code point.
1009 @end deftypevr
1010
1011 @section Core Functions
1012
Note that the current implementation will fail if
@code{input_length} exceeds 4294967295 (the largest value a
@code{punycode_uint} can hold).  This restriction may be removed in
the future.  Meanwhile, applications are encouraged not to depend on
this limitation, and to use @code{sizeof} to initialize
@code{input_length} and @code{output_length}.
1019
1020 The functions provided are the following two entry points:
1021
1022 @include texi/punycode_encode.texi
1023 @include texi/punycode_decode.texi
1024
1025 @section Error Handling
1026
1027 @include texi/punycode_strerror.texi
1028
1029 @c **********************************************************
1030 @c ********************* IDNA Functions *********************
1031 @c **********************************************************
1032 @node IDNA Functions
1033 @chapter IDNA Functions
1034 @cindex IDNA Functions
1035
1036 Until now, there has been no standard method for domain names to use
1037 characters outside the ASCII repertoire. The IDNA document defines
1038 internationalized domain names (IDNs) and a mechanism called IDNA for
1039 handling them in a standard fashion. IDNs use characters drawn from a
1040 large repertoire (Unicode), but IDNA allows the non-ASCII characters
1041 to be represented using only the ASCII characters already allowed in
1042 so-called host names today. This backward-compatible representation is
1043 required in existing protocols like DNS, so that IDNs can be
1044 introduced with no changes to the existing infrastructure. IDNA is
1045 only meant for processing domain names, not free text.
1046
1047 @section Header file @code{idna.h}
1048
1049 To use the functions explained in this chapter, you need to include
1050 the file @file{idna.h} using:
1051
1052 @example
1053 #include <idna.h>
1054 @end example
1055
1056 @section Control Flags
1057
The IDNA @code{flags} parameter can take on the following values, or a
bitwise inclusive OR of any subset of them:
1060
@deftypevr {IDNA flags} {Idna_flags} IDNA_ALLOW_UNASSIGNED
1062 Allow unassigned Unicode code points.
1063 @end deftypevr
1064
@deftypevr {IDNA flags} {Idna_flags} IDNA_USE_STD3_ASCII_RULES
1066 Check output to make sure it is a STD3 conforming host name.
1067 @end deftypevr
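
The following sketch shows the kind of check that
@code{IDNA_USE_STD3_ASCII_RULES} enables, following the rules RFC 3490
applies in ToASCII when UseSTD3ASCIIRules is set: a label may contain
only ASCII letters, digits and hyphen-minus, and may not begin or end
with a hyphen-minus.  The helper @code{is_std3_label} is hypothetical
and not part of Libidn; it inspects a single ASCII label and does not
enforce the separate limit of 1 to 63 characters per label.

@verbatim
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of Libidn: return true if the ASCII
   label LABEL obeys the STD3 host name rules checked when
   UseSTD3ASCIIRules is set.  Any byte outside the letter, digit and
   hyphen-minus set, including non-ASCII bytes, is rejected.  */
bool
is_std3_label (const char *label)
{
  size_t len = strlen (label);

  /* No leading or trailing hyphen-minus (U+002D).  */
  if (len > 0 && (label[0] == '-' || label[len - 1] == '-'))
    return false;

  /* Only LDH characters: letters, digits, hyphen-minus.  */
  for (size_t i = 0; i < len; i++)
    {
      char c = label[i];
      bool ldh = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                 || (c >= '0' && c <= '9') || c == '-';
      if (!ldh)
        return false;
    }
  return true;
}
@end verbatim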
1068
1069 @section Prefix String
1070
1071 @deftypevr {Macro} {#define} IDNA_ACE_PREFIX
1072 String with the official IDNA prefix, @code{xn--}.
1073 @end deftypevr
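
Because every ACE label starts with this prefix, an application can
spot labels that may need the ToUnicode operation by testing for it.
The sketch below does so.  The macro @code{ACE_PREFIX} is a hypothetical
stand-in with the same value as @code{IDNA_ACE_PREFIX}, so that the
example compiles without Libidn, and the comparison ignores ASCII case
because DNS treats upper- and lower-case ASCII letters as equivalent.

@verbatim
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in with the same value as IDNA_ACE_PREFIX.  */
#define ACE_PREFIX "xn--"

/* Return true if LABEL begins with the ACE prefix, ignoring ASCII
   case.  The loop stops at the first mismatch (including LABEL's
   terminating NUL), so LABEL may be shorter than the prefix.  */
bool
has_ace_prefix (const char *label)
{
  for (size_t i = 0; ACE_PREFIX[i] != '\0'; i++)
    if (tolower ((unsigned char) label[i]) != ACE_PREFIX[i])
      return false;
  return true;
}
@end verbatim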
1074
1075 @section Core Functions
1076
The idea behind the IDNA function names is as follows: the
@code{idna_to_ascii_4i} and @code{idna_to_unicode_44i} functions are
the core IDNA primitives.  The @code{4} indicates that the function
takes UCS-4 strings (i.e., Unicode code points encoded in a 32-bit
unsigned integer type) of the specified length.  The @code{i} indicates
that the data is written ``inline'' into the buffer.  This means the
caller is responsible for allocating (and deallocating) the string,
and for providing the library with the allocated length of the string.
The output length is written to the output length variable.  The
remaining functions all contain the @code{z} indicator, which means
the strings are zero terminated.  All output strings are allocated by
the library, and must be deallocated by the caller.  The @code{4}
indicator again means that the string is UCS-4, the @code{8} means the
strings are UTF-8, and the @code{l} indicator means the strings are
encoded in the encoding used by the current locale.
1092
1093 The functions provided are the following entry points:
1094
1095 @include texi/idna_to_ascii_4i.texi
1096 @include texi/idna_to_unicode_44i.texi
1097
1098 @section Simplified ToASCII Interface
1099
1100 @include texi/idna_to_ascii_4z.texi
1101 @include texi/idna_to_ascii_8z.texi
1102 @include texi/idna_to_ascii_lz.texi
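
Unlike the single-label @code{idna_to_ascii_4i} primitive, these
zero-terminated functions accept a complete domain name made of several
labels.  RFC 3490 requires that U+002E (full stop), U+3002 (ideographic
full stop), U+FF0E (fullwidth full stop) and U+FF61 (halfwidth
ideographic full stop) all be recognized as label separators.  The
following sketch splits a zero-terminated UCS-4 name at those
separators.  It is an illustration only: the helper names are
hypothetical, and it does not reproduce how Libidn itself processes
each label.

@verbatim
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for the four characters RFC 3490 requires to be treated as
   label separators ("dots").  */
bool
is_idna_dot (uint32_t c)
{
  return c == 0x002E || c == 0x3002 || c == 0xFF0E || c == 0xFF61;
}

/* Hypothetical helper: split the zero-terminated UCS-4 domain name
   NAME at IDNA dots and store the length, in code points, of each
   label in LENS.  At most MAXLABELS lengths are stored, but the return
   value is always the total number of labels, so a result larger than
   MAXLABELS means LENS was too small.  Empty labels, such as the one
   after a trailing dot, are reported with length 0.  */
size_t
split_labels (const uint32_t *name, size_t *lens, size_t maxlabels)
{
  size_t count = 0, len = 0;

  for (const uint32_t *p = name;; p++)
    {
      if (*p == 0 || is_idna_dot (*p))
        {
          if (count < maxlabels)
            lens[count] = len;
          count++;
          len = 0;
          if (*p == 0)
            break;
        }
      else
        len++;
    }
  return count;
}
@end verbatim

Each label found this way would then be processed on its own by the
per-label operation.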
1103
1104 @section Simplified ToUnicode Interface
1105
1106 @include texi/idna_to_unicode_4z4z.texi
1107 @include texi/idna_to_unicode_8z4z.texi
1108 @include texi/idna_to_unicode_8z8z.texi
1109 @include texi/idna_to_unicode_8zlz.texi
1110 @include texi/idna_to_unicode_lzlz.texi
1111
1112 @section Error Handling
1113
1114 @include texi/idna_strerror.texi
1115
1116 @c **********************************************************
1117 @c ********************** TLD Functions *********************
1118 @c **********************************************************
1119 @node TLD Functions
1120 @chapter TLD Functions
1121 @cindex TLD Functions
1122
Organizations that manage some Top Level Domains (@acronym{TLD}s) have
published tables with characters they accept within the domain.  The
reason may be to reduce the complexity that comes from using the full
Unicode range, and to protect themselves from future (backwards
incompatible) changes in the IDN or Unicode specifications.  Libidn
implements an infrastructure for defining and checking strings against
such tables.  Libidn also ships some tables from @acronym{TLD}s whose
maintainers have given us permission to use them.  Because these tables
are even less static than Unicode or StringPrep tables, it is likely
that they will be updated from time to time (even in backwards
incompatible ways).  The Libidn interface provides a ``version'' field
for each @acronym{TLD} table, which can be compared for equality to
guarantee the same operation over time.
1136
From a design point of view, you can regard the @acronym{TLD} tables
for IDN as the ``localization'' step that comes after the
``internationalization'' step provided by the IETF standards.
1140
The TLD functionality relies on up-to-date tables.  The latest version
of Libidn aims to provide these, but tables with unclear copying
conditions, or generally experimental tables, are not included.  Some
such tables can be found at @url{http://tldchk.berlios.de}.
1145
1146 @section Header file @code{tld.h}
1147
1148 To use the functions explained in this chapter, you need to include
1149 the file @file{tld.h} using:
1150
1151 @example
1152 #include <tld.h>
1153 @end example
1154
1155 @c @section Data Types
1156 @c
1157 @c @deftp {Data type} {Tld_table_element} @var{start} @var{end}
1158 @c @example
1159 @c /* Interval of valid code points in the TLD. */
1160 @c struct Tld_table_element
1161 @c @{
1162 @c   uint32_t start;            /* Start of range. */
1163 @c   uint32_t end;              /* End of range, end == start if single. */
1164 @c @};
1165 @c typedef struct Tld_table_element Tld_table_element;
1166 @c @end example
1167 @c This @code{struct} contain the @var{start} and @var{end} positions
1168 @c (inclusive) of a range.  If the range is a single (i.e., starts and
1169 @c ends in the same character), then set @var{end} to the same as
1170 @c @var{start}.  This structure is normally used as an array.
1171 @c @end deftp
1172 @c
1173 @c @deftp {Data type} {Tld_table} @var{name} @var{version} @var{nvalid} @var{valid}
1174 @c @example
1175 @c /* List valid code points in a TLD. */
1176 @c struct Tld_table
1177 @c @{
1178 @c   char *name;                        /* TLD name, e.g., "no". */
1179 @c   char *version;             /* Version string from TLD file. */
1180 @c   size_t nvalid;             /* Number of entries in data. */
1181 @c   Tld_table_element *valid[];        /* Sorted array of valid code points. */
1182 @c @};
1183 @c typedef struct Tld_table Tld_table;
1184 @c @end example
1185 @c In this @code{struct}, the @var{name} field is a string (@samp{char*})
1186 @c indicating the TLD name (e.g., ``no'').  The @var{version} field is a
1187 @c string (@samp{char*}) containing a free form humanly readable string
1188 @c that can be used for equality comparison to compare different versions
1189 @c of the table.  The @var{nvalid} field indicate how many entries there
1190 @c are in @var{valid}, which brings us finally to @var{valid} that
1191 @c contain the actual code points that are valid for this TLD (see
1192 @c @code{Tld_table_element} above).
1193 @c @end deftp
1194
1195 @section Core Functions
1196
1197 @include texi/tld_check_4t.texi
1198 @include texi/tld_check_4tz.texi
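
Conceptually, checking a string against a @acronym{TLD} table means
testing every code point against a sorted list of valid ranges and
reporting the position of the first one that is not covered, much like
the error position these functions report.  The following sketch shows
that idea with a binary search.  It is not Libidn's implementation: the
@code{struct cp_range} type and both helper functions are hypothetical,
and Libidn's real table types are declared in @file{tld.h}.

@verbatim
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: an inclusive range of valid code points
   (END == START for a single code point).  Arrays of ranges must be
   sorted by START and must not overlap.  */
struct cp_range
{
  uint32_t start;
  uint32_t end;
};

/* Return true if CP lies inside one of the N sorted RANGES.  */
bool
cp_in_ranges (uint32_t cp, const struct cp_range *ranges, size_t n)
{
  size_t lo = 0, hi = n;        /* search the half-open range [lo, hi) */

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (cp < ranges[mid].start)
        hi = mid;
      else if (cp > ranges[mid].end)
        lo = mid + 1;
      else
        return true;
    }
  return false;
}

/* Return the index of the first code point in IN[0..LEN) that is not
   covered by RANGES, or LEN if every code point is valid.  */
size_t
first_invalid (const uint32_t *in, size_t len,
               const struct cp_range *ranges, size_t n)
{
  for (size_t i = 0; i < len; i++)
    if (!cp_in_ranges (in[i], ranges, n))
      return i;
  return len;
}
@end verbatim

For example, with a hypothetical table that admits hyphen-minus, the
digits, a to z, U+00E5, U+00E6 and U+00F8, the string
@samp{bl@aa{}b@ae{}rgr@o{}d} passes, while @samp{r@"aksm@"org@aa{}s} is
rejected at position 1 because U+00E4 is not in the table.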
1199
1200 @section Utility Functions
1201
1202 @include texi/tld_get_4.texi
1203 @include texi/tld_get_4z.texi
1204 @include texi/tld_get_z.texi
1205 @include texi/tld_get_table.texi
1206 @include texi/tld_default_table.texi
1207
1208 @section High-Level Wrapper Functions
1209
1210 @include texi/tld_check_4.texi
1211 @include texi/tld_check_4z.texi
1212 @include texi/tld_check_8z.texi
1213 @include texi/tld_check_lz.texi
1214
1215 @section Error Handling
1216
1217 @include texi/tld_strerror.texi
1218
1219 @c **********************************************************
1220 @c ********************** PR29 Functions ********************
1221 @c **********************************************************
1222 @node PR29 Functions
1223 @chapter PR29 Functions
1224 @cindex PR29 Functions
1225
1226 A deficiency in the specification of Unicode Normalization Forms has
1227 been found.  The consequence is that some strings can be normalized
1228 into different strings by different implementations.  In other words,
1229 two different implementations may return different output for the same
1230 input (because the interpretation of the specification is
ambiguous).  Further, an implementation invoked again on one of the
output strings may return a different string (because one of the
interpretations of the ambiguous specification makes normalization
non-idempotent).  Fortunately, only a select few character sequences
exhibit this problem, and none of them are expected to occur in
1236 natural languages (due to different linguistic uses of the involved
1237 characters).
1238
1239 A full discussion of the problem may be found at:
1240
1241 @url{http://www.unicode.org/review/pr-29.html}
1242
1243 The PR29 functions below allow you to detect the problem sequence.  So
1244 when would you want to use these functions?  For most applications,
1245 such as those using Nameprep for IDN, this is likely only to be an
1246 interoperability problem.  Thus, you may not want to care about it, as
1247 the character sequences will rarely occur naturally.  However, if you
1248 are using a profile, such as SASLPrep, to process authentication
1249 tokens; authorization tokens; or passwords, there is a real danger
1250 that attackers may try to use the peculiarities in these strings to
1251 attack parts of your system.  As only a small number of strings, and
1252 no naturally occurring strings, exhibit this problem, the conservative
1253 approach of rejecting the strings is recommended.  If this approach is
not used, you should instead verify that all parts of your system
that process the tokens and passwords use an NFKC implementation that
produces the same output for the same input.
1257
1258 Technically inclined readers may be interested in knowing more about
1259 the implementation aspects of the PR29 flaw. @xref{PR29 discussion}.
1260
1261 @section Header file @code{pr29.h}
1262
1263 To use the functions explained in this chapter, you need to include
1264 the file @file{pr29.h} using:
1265
1266 @example
1267 #include <pr29.h>
1268 @end example
1269
1270 @section Core Functions
1271
1272 @include texi/pr29_4.texi
1273
1274 @section Utility Functions
1275
1276 @include texi/pr29_4z.texi
1277 @include texi/pr29_8z.texi
1278
1279 @section Error Handling
1280
1281 @include texi/pr29_strerror.texi
1282
1283 @c **********************************************************
1284 @c ***********************  Examples  ***********************
1285 @c **********************************************************
1286 @node Examples
1287 @chapter Examples
1288 @cindex Examples
1289
This chapter contains example code that illustrates how Libidn can
be used when writing your own application.
1292
1293 @menu
1294 * Example 1::           Example using stringprep.
1295 * Example 2::           Example using punycode.
1296 * Example 3::           Example using IDNA ToASCII.
1297 * Example 4::           Example using IDNA ToUnicode.
1298 * Example 5::           Example using TLD checking.
1299 @end menu
1300
1301 @node Example 1
1302 @section Example 1
1303
1304 This example demonstrates how the stringprep functions are used.
1305
1306 @verbatiminclude ../examples/example.c
1307
1308 @node Example 2
1309 @section Example 2
1310
1311 This example demonstrates how the punycode functions are used.
1312
1313 @verbatiminclude ../examples/example2.c
1314
1315 @node Example 3
1316 @section Example 3
1317
1318 This example demonstrates how the library is used to convert
1319 internationalized domain names into ASCII compatible names.
1320
1321 @verbatiminclude ../examples/example3.c
1322
1323 @node Example 4
1324 @section Example 4
1325
1326 This example demonstrates how the library is used to convert ASCII
1327 compatible names to internationalized domain names.
1328
1329 @verbatiminclude ../examples/example4.c
1330
1331 @node Example 5
1332 @section Example 5
1333
1334 This example demonstrates how the library is used to check a string
1335 for invalid characters within a specific TLD.
1336
1337 @verbatiminclude ../examples/example5.c
1338
1339 @c **********************************************************
1340 @c *********************  Invoking idn  *********************
1341 @c **********************************************************
1342 @node Invoking idn
1343 @chapter Invoking idn
1344
1345 @pindex idn
1346 @cindex invoking @command{idn}
1347 @cindex command line
1348
1349 @section Name
1350
1351 GNU Libidn (idn) -- Internationalized Domain Names command line tool
1352
1353 @section Description
1354 @code{idn} allows internationalized string preparation
1355 (@samp{stringprep}), encoding and decoding of punycode data, and IDNA
1356 ToASCII/ToUnicode operations to be performed on the command line.
1357
1358 If strings are specified on the command line, they are used as input
1359 and the computed output is printed to standard output @code{stdout}.
If no strings are specified on the command line, the program reads
data, line by line, from the standard input @code{stdin}, and prints
the computed output to standard output.  What processing is performed
(e.g., ToASCII, or Punycode encode) is indicated by options.  If any
errors are encountered, the application aborts.
1365
1366 All strings are expected to be encoded in the preferred charset used
1367 by your locale.  Use @code{--debug} to find out what this charset is.
1368 You can override the charset used by setting environment variable
1369 @code{CHARSET}.
1370
1371 To process a string that starts with @code{-}, for example
1372 @code{-foo}, use @code{--} to signal the end of parameters, as in
1373 @code{idn --quiet -a -- -foo}.
1374
1375 @section Options
@code{idn} recognizes these options:
1377
1378 @verbatim
1379   -h, --help               Print help and exit
1380
1381   -V, --version            Print version and exit
1382
1383   -s, --stringprep         Prepare string according to nameprep profile
1384
1385   -d, --punycode-decode    Decode Punycode
1386
1387   -e, --punycode-encode    Encode Punycode
1388
1389   -a, --idna-to-ascii      Convert to ACE according to IDNA (default)
1390
1391   -u, --idna-to-unicode    Convert from ACE according to IDNA
1392
1393       --allow-unassigned   Toggle IDNA AllowUnassigned flag  (default=off)
1394
1395       --usestd3asciirules  Toggle IDNA UseSTD3ASCIIRules flag  (default=off)
1396
1397   -t, --tld                Check string for TLD specific rules
1398                              Only for --idna-to-ascii and --idna-to-unicode
1399                              (default=on)
1400
1401   -p, --profile=STRING     Use specified stringprep profile instead
1402                              Valid stringprep profiles are `Nameprep',
1403                              `iSCSI', `Nodeprep', `Resourceprep', `trace', and
1404                              `SASLprep'.
1405
1406       --debug              Print debugging information  (default=off)
1407
1408       --quiet              Silent operation  (default=off)
1409 @end verbatim
1410
1411 @section Environment Variables
1412
The @code{CHARSET} environment variable can be used to override which
character set is used for decoding incoming data (i.e., on the
command line or on the standard input stream), and for encoding data to
the standard output.  If your system is set up correctly, however, the
application will guess which character set is used automatically.
1418 Example usage:
1419
1420 @example
1421 $ CHARSET=ISO-8859-1 idn --punycode-encode
1422 ...
1423 @end example
1424
1425 @section Examples
1426
1427 Standard usage, reading input from standard input:
1428
1429 @example
1430 jas@@latte:~$ idn
1431 libidn 0.3.5
1432 Copyright 2002, 2003 Simon Josefsson.
1433 GNU Libidn comes with NO WARRANTY, to the extent permitted by law.
1434 You may redistribute copies of GNU Libidn under the terms of
1435 the GNU Lesser General Public License.  For more information
1436 about these matters, see the file named COPYING.LIB.
1437 Type each input string on a line by itself, terminated by a newline character.
1438 r@"aksm@"org@aa{}s.se
1439 xn--rksmrgs-5wao1o.se
1440 jas@@latte:~$
1441 @end example
1442
1443 Reading input from command line, and disabling copyright and license
1444 information:
1445
1446 @example
1447 jas@@latte:~$ idn --quiet r@"aksm@"org@aa{}s.se bl@aa{}b@ae{}rgr@o{}d.no
1448 xn--rksmrgs-5wao1o.se
1449 xn--blbrgrd-fxak7p.no
1450 jas@@latte:~$
1451 @end example
1452
1453 Accessing a specific StringPrep profile directly:
1454
1455 @example
1456 jas@@latte:~$ idn --quiet --profile=SASLprep --stringprep te@ss{}t@ordf{}
1457 te@ss{}ta
1458 jas@@latte:~$
1459 @end example
1460
1461 @section Troubleshooting
1462
Getting character data encoded right, and making sure Libidn uses the
same encoding, can be difficult.  The reason for this is that most
systems encode character data in more than one character encoding,
e.g., using @code{UTF-8} together with @code{ISO-8859-1} or
@code{ISO-2022-JP}.  This problem is likely to continue to exist until
only one character encoding comes out as the evolutionary winner, or
(more likely, at least to some extent) forever.
1470
The first step in troubleshooting character encoding problems with
Libidn is to use the @samp{--debug} parameter to find out which
character set encoding @samp{idn} believes your locale uses.
1474
1475 @example
1476 jas@@latte:~$ idn --debug --quiet ""
1477 system locale uses charset `UTF-8'.
1478
1479 jas@@latte:~$
1480 @end example
1481
If it prints @code{ANSI_X3.4-1968} (i.e., @code{US-ASCII}), this
indicates that you have not configured your locale properly.  To configure
1484 the locale, you can, for example, use @samp{LANG=sv_SE.UTF-8; export
1485 LANG} at a @code{/bin/sh} prompt, to set up your locale for a Swedish
1486 environment using @code{UTF-8} as the encoding.
1487
Sometimes @samp{idn} appears to be unable to translate from your system
1489 locale into @code{UTF-8} (which is used internally), and you get an
1490 error like the following:
1491
1492 @example
1493 jas@@latte:~$ idn --quiet foo
1494 idn: could not convert from ISO-8859-1 to UTF-8.
1495 jas@@latte:~$
1496 @end example
1497
The simplest explanation is that you haven't installed the
@samp{iconv} conversion tools.  You can find them as a standalone
library in @acronym{GNU} Libiconv
1501 (@uref{http://www.gnu.org/software/libiconv/}).  On many
1502 @acronym{GNU}/Linux systems, this library is part of the system, but
1503 you may have to install additional packages (e.g., @samp{glibc-locale}
1504 for Debian) to be able to use it.
1505
1506 Another explanation is that the error is correct and you are feeding
1507 @samp{idn} invalid data.  This can happen inadvertently if you are not
1508 careful with the character set encodings you use.  For example, if
your shell runs in an @code{ISO-8859-1} environment, and you invoke
@samp{idn} with the @samp{CHARSET} environment variable as follows,
you will feed it @code{ISO-8859-1} characters but force it to believe
they are @code{UTF-8}.  Naturally this will lead to an error, unless
the byte sequences happen to be parsable as @code{UTF-8}.  Note that
even if you don't get an error, the output may be incorrect in this
situation, because @code{ISO-8859-1} and @code{UTF-8} do not, in
general, encode the same characters as the same byte sequences.
1517
1518 @example
1519 jas@@latte:~$ idn --quiet --debug ""
1520 system locale uses charset `ISO-8859-1'.
1521
1522 jas@@latte:~$ CHARSET=UTF-8 idn --quiet --debug r@"aksm@"org@aa{}s
1523 system locale uses charset `UTF-8'.
1524 input[0] = U+0072
1525 input[1] = U+4af3
1526 input[2] = U+006d
1527 input[3] = U+1b29e5
1528 input[4] = U+0073
1529 output[0] = U+0078
1530 output[1] = U+006e
1531 output[2] = U+002d
1532 output[3] = U+002d
1533 output[4] = U+0072
1534 output[5] = U+006d
1535 output[6] = U+0073
1536 output[7] = U+002d
1537 output[8] = U+0068
1538 output[9] = U+0069
1539 output[10] = U+0036
1540 output[11] = U+0064
1541 output[12] = U+0035
1542 output[13] = U+0039
1543 output[14] = U+0037
1544 output[15] = U+0035
1545 output[16] = U+0035
1546 output[17] = U+0032
1547 output[18] = U+0061
1548 xn--rms-hi6d597552a
1549 jas@@latte:~$
1550 @end example
1551
The moral here is to forget about @samp{CHARSET} (configure your
1553 locales properly instead) unless you know what you are doing, and if
1554 you want to use it, do it carefully, after verifying with
1555 @samp{--debug} that you get the desired results.
1556
1557 @node Emacs API
1558 @chapter Emacs API
1559
Included in Libidn are @file{punycode.el} and @file{idna.el}, which
provide an Emacs Lisp API to (a limited set of) the Libidn API.  This
chapter describes the API.  Currently the IDNA API always sets the
@code{UseSTD3ASCIIRules} flag and clears the @code{AllowUnassigned}
flag; in the future there may be functionality to specify these flags
via the API.
1566
1567 @section Punycode Emacs API
1568
1569 @defvar punycode-program
1570 Name of the GNU Libidn @file{idn} application.  The default is
1571 @samp{idn}.  This variable can be customized.
1572 @end defvar
1573
1574 @defvar punycode-environment
1575 List of environment variable definitions prepended to
1576 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1577 This variable can be customized.
1578 @end defvar
1579
1580 @defvar punycode-encode-parameters
1581 List of parameters passed to @var{punycode-program} to invoke punycode
1582 encoding mode.  The default is @samp{("--quiet" "--punycode-encode")}.
1583 This variable can be customized.
1584 @end defvar
1585
1586 @defvar punycode-decode-parameters
List of parameters passed to @var{punycode-program} to invoke punycode
1588 decoding mode.  The default is @samp{("--quiet" "--punycode-decode")}.
1589 This variable can be customized.
1590 @end defvar
1591
1592 @defun punycode-encode string
1593 Returns a Punycode encoding of the @var{string}, after converting the
1594 input into UTF-8.
1595 @end defun
1596
1597 @defun punycode-decode string
Returns a possibly multibyte string which is the decoding of
@var{string}, a Punycode-encoded string.
1600 @end defun
1601
1602 @section IDNA Emacs API
1603
1604 @defvar idna-program
1605 Name of the GNU Libidn @file{idn} application.  The default is
1606 @samp{idn}.  This variable can be customized.
1607 @end defvar
1608
1609 @defvar idna-environment
1610 List of environment variable definitions prepended to
1611 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1612 This variable can be customized.
1613 @end defvar
1614
1615 @defvar idna-to-ascii-parameters
1616 List of parameters passed to @var{idna-program} to invoke IDNA ToASCII
1617 mode.  The default is @samp{("--quiet" "--idna-to-ascii"
1618 "--usestd3asciirules")}.  This variable can be customized.
1619 @end defvar
1620
1621 @defvar idna-to-unicode-parameters
List of parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
1623 The default is @samp{("--quiet" "--idna-to-unicode"
1624 "--usestd3asciirules")}.  This variable can be customized.
1625 @end defvar
1626
1627 @defun idna-to-ascii string
1628 Returns an ASCII Compatible Encoding (ACE) of the string computed by
1629 the IDNA ToASCII operation on the input @var{string}, after converting
1630 the input to UTF-8.
1631 @end defun
1632
1633 @defun idna-to-unicode string
1634 Returns a possibly multibyte string which is the output of the IDNA
1635 ToUnicode operation computed on the input @var{string}.
1636 @end defun
1637
1638 @node Java API
1639 @chapter Java API
1640
1641 Libidn has been ported to the Java programming language, and as a
1642 consequence most of the API is available to native Java applications.
This chapter contains notes on this support; complete documentation is
pending.
1645
1646 The Java library, if Libidn has been built with Java support
1647 (@pxref{Downloading and Installing}), will be placed in
1648 @file{java/libidn.jar}.  The source code is located in
1649 @file{java/gnu/inet/encoding/}.
1650
1651 @section Overview
1652
1653 This package provides a Java implementation of the Internationalized
1654 Domain Names in Applications (IDNA) standard. It is written entirely
1655 in Java and does not require any additional libraries to be set up.
1656
The @code{gnu.inet.encoding.IDNA} class offers two public methods,
@code{toASCII} and @code{toUnicode}, which can be used as follows:
1659
1660 @example
1661 gnu.inet.encoding.IDNA.toASCII("bl@"ods.z@"ug");
1662 gnu.inet.encoding.IDNA.toUnicode("xn--blds-6qa.xn--zg-xka");
1663 @end example
1664
1665 @section Miscellaneous Programs
1666
1667 The @file{misc/} directory contains several programs that are related
1668 to the Java part of GNU Libidn, but that don't need to be included in
1669 the main source tree.
1670
1671 @subsection GenerateRFC3454
1672
This program parses RFC 3454 and creates the @file{RFC3454.java}
source file that is required during the StringPrep phase.
1675
1676 The RFC can be found at various locations, for example at
1677 @url{http://www.ietf.org/rfc/rfc3454.txt}.
1678
1679 Invoke the program as follows:
1680
1681 @example
1682 $ java GenerateRFC3454
1683 Creating RFC3454.java... Ok.
1684 @end example
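The tables in RFC 3454 list either a single code point or a range of
code points, such as @samp{0221} or @samp{0234-024F}, and the mapping
tables B.1 to B.3 add a semicolon-separated target sequence, such as
@samp{00DF; 0073 0073; Case map}.  As a minimal sketch of the kind of
parsing a generator like GenerateRFC3454 has to do, the hypothetical
class @code{Rfc3454Line} below decodes both forms from one line.  It
is not taken from GenerateRFC3454, which may work differently, and it
uses Java 8 streams for brevity.

@verbatim
import java.util.Arrays;

/** Hypothetical helper (not part of GenerateRFC3454) for RFC 3454 table lines. */
class Rfc3454Line {
    /** Returns {first, last} for entries such as "0221" or "0234-024F". */
    static int[] parseRange(String line) {
        String field = line.split(";", -1)[0].trim();
        int dash = field.indexOf('-');
        if (dash < 0) {
            int cp = Integer.parseInt(field, 16);
            return new int[] {cp, cp};
        }
        return new int[] {
            Integer.parseInt(field.substring(0, dash), 16),
            Integer.parseInt(field.substring(dash + 1), 16)
        };
    }

    /** Returns the target code points of a mapping entry such as
        "00DF; 0073 0073; Case map"; an empty second field
        ("00AD; ; Map to nothing") yields an empty array. */
    static int[] parseMapping(String line) {
        String field = line.split(";", -1)[1].trim();
        if (field.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(field.split("\\s+"))
                     .mapToInt(hex -> Integer.parseInt(hex, 16))
                     .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseRange("   0234-024F")));
        System.out.println(Arrays.toString(parseMapping("   00DF; 0073 0073; Case map")));
    }
}
@end verbatim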
1685
1686 @subsection GenerateNFKC
1687
The GenerateNFKC program parses the Unicode Character Database and
generates all the tables required for NFKC.  This program requires the
files @file{UnicodeData.txt} and @file{CompositionExclusions.txt} from
version 3.2 of the Unicode Character Database.  Note that RFC 3454
(Stringprep) specifies that Unicode version 3.2 is to be used, not the
latest version.
1693
1694 The Unicode data files can be found at
1695 @url{http://www.unicode.org/Public/}.
1696
1697 Invoke the program as follows:
1698
1699 @example
1700 $ java GenerateNFKC
1701 Creating CombiningClass.java... Ok.
1702 Creating DecompositionKeys.java... Ok.
1703 Creating DecompositionMappings.java... Ok.
1704 Creating Composition.java... Ok.
1705 @end example
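Each line of @file{UnicodeData.txt} consists of semicolon-separated
fields.  For NFKC, the relevant ones are the code point (field 0), the
canonical combining class (field 3) and the decomposition mapping
(field 5), in which a compatibility decomposition is marked by a
leading tag in angle brackets such as @samp{<compat>}.  The sketch
below, built around the hypothetical class @code{UnicodeDataEntry},
extracts these three fields from one line.  It illustrates the file
format only, is not the GenerateNFKC code, and needs Java 8 or later.

@verbatim
import java.util.Arrays;

/** Hypothetical parsed form of one UnicodeData.txt line (not GenerateNFKC code). */
class UnicodeDataEntry {
    final int codePoint;          // field 0
    final int combiningClass;     // field 3, 0 for starters
    final boolean compatibility;  // true if field 5 starts with a <tag>
    final int[] decomposition;    // field 5 without the tag, empty if none

    UnicodeDataEntry(String line) {
        String[] f = line.split(";", -1);
        codePoint = Integer.parseInt(f[0], 16);
        combiningClass = Integer.parseInt(f[3]);
        String d = f[5].trim();
        compatibility = d.startsWith("<");
        if (compatibility) {
            // Drop the tag, e.g. "<compat>" or "<font>", keep the code points.
            d = d.substring(d.indexOf('>') + 1).trim();
        }
        decomposition = d.isEmpty()
            ? new int[0]
            : Arrays.stream(d.split(" +")).mapToInt(h -> Integer.parseInt(h, 16)).toArray();
    }

    public static void main(String[] args) {
        UnicodeDataEntry e = new UnicodeDataEntry(
            "00A8;DIAERESIS;Sk;0;ON;<compat> 0020 0308;;;;N;SPACING DIAERESIS;;;;");
        System.out.printf("U+%04X ccc=%d compat=%b %s%n", e.codePoint,
            e.combiningClass, e.compatibility, Arrays.toString(e.decomposition));
    }
}
@end verbatim

A complete generator would additionally read
@file{CompositionExclusions.txt} to decide which canonical
decompositions must not be recomposed.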
1706
1707 @subsection TestIDNA
1708
The TestIDNA program tests the IDNA implementation, either manually or
against Simon Josefsson's test vectors.
1711
1712 The test vectors can be found at the Libidn homepage,
1713 @url{http://www.gnu.org/software/libidn/}.
1714
To test the transformation manually, use:
1716
1717 @example
1718 $ java -cp .:../libidn.jar TestIDNA -a <string to test>
1719 Input: <string to test>
1720 Output: <toASCII(string to test)>
1721 $ java -cp .:../libidn.jar TestIDNA -u <string to test>
1722 Input: <string to test>
1723 Output: <toUnicode(string to test)>
1724 @end example
1725
1726 To test against draft-josefsson-idn-test-vectors.html, use:
1727
1728 @example
1729 $ java -cp .:../libidn.jar TestIDNA -t
1730 No errors detected!
1731 @end example
1732
1733 @subsection TestNFKC
1734
The TestNFKC program tests the NFKC implementation, either manually or
against the @file{NormalizationTest.txt} file from the Unicode data
files.
1737
1738 To test the normalization manually, use:
1739
1740 @example
1741 $ java -cp .:../libidn.jar TestNFKC <string to test>
1742 Input: <string to test>
1743 Output: <nfkc version of the string to test>
1744 @end example
1745
To test against @file{NormalizationTest.txt}, use:
1747
1748 @example
1749 $ java -cp .:../libidn.jar TestNFKC
1750 No errors detected!
1751 @end example
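Each data line of @file{NormalizationTest.txt} contains five columns
of code point sequences: the source string and its NFC, NFD, NFKC and
NFKD forms.  An NFKC implementation conforms if applying it to any of
the five columns yields the fourth column.  The following sketch of
that check for a single line is hypothetical and not the TestNFKC
source; the normalizer is passed in as a function, and the
demonstration plugs in @code{java.text.Normalizer} (Java 6 and later;
the lambda needs Java 8).  The JDK normalizer implements a newer
Unicode version than 3.2, so it is not a substitute for the Unicode 3.2
NFKC that Stringprep requires.

@verbatim
import java.text.Normalizer;
import java.util.function.UnaryOperator;

/** Hypothetical conformance check for one NormalizationTest.txt line. */
class NormalizationTestLine {
    /** Turns a field such as "0044 0307" into the corresponding string. */
    static String decode(String field) {
        StringBuilder sb = new StringBuilder();
        for (String hex : field.trim().split(" +")) {
            sb.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    /** True if c4 == NFKC(c1) == ... == NFKC(c5). The caller is expected
        to skip comment lines ("#...") and part headers ("@Part..."). */
    static boolean conformsNfkc(String line, UnaryOperator<String> nfkc) {
        int hash = line.indexOf('#');
        String[] cols = (hash >= 0 ? line.substring(0, hash) : line).split(";");
        String expected = decode(cols[3]);   // the NFKC column
        for (int i = 0; i < 5; i++) {
            if (!expected.equals(nfkc.apply(decode(cols[i])))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String line = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE";
        UnaryOperator<String> jdkNfkc = s -> Normalizer.normalize(s, Normalizer.Form.NFKC);
        System.out.println(conformsNfkc(line, jdkNfkc));
    }
}
@end verbatim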
1752
1753 @section Possible Problems
1754
Beware of bugs: This Java API needs a lot more testing, especially
with ``exotic'' character sets.  While it works for me, it may not work
for you.
1758
Encoding of your Java sources: If you are using non-ASCII characters
in your Java source code, make sure @command{javac} compiles your
programs with the correct encoding.  If necessary, specify the encoding
using the @option{-encoding} option.
1763
Java Unicode handling: Java 1.4 only handles 16-bit Unicode code
points (i.e.@:, characters in the Basic Multilingual Plane); this
implementation therefore ignores all references to so-called
Supplementary Characters (U+10000 to U+10FFFF).  Starting from Java
1.5, these characters are also supported by Java, but supporting them
will require changes to this library.  See also the next section.
1770
1771 @section A Note on Java and Unicode
1772
This library uses Java's built-in @code{char} data type.  Up to Java
1.4, this data type only supports 16-bit Unicode code points, i.e.@:,
the Basic Multilingual Plane.  For this reason, this library does not
work for Supplementary Characters (i.e.@:, characters from U+10000 to
U+10FFFF).  All references to such characters are silently ignored.
1778
Starting from Java 1.5, Supplementary Characters are also supported,
represented as pairs of @code{char} values (surrogate pairs).  However,
supporting them would require changes in the present version of the
library.

For more information, refer to the documentation of
@code{java.lang.Character} in the JDK API.
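The following sketch, which uses only the JDK (Java 5 or later) and is
not part of Libidn, shows what this means in practice.  The
Supplementary Character U+1D400 (MATHEMATICAL BOLD CAPITAL A) is
stored as two @code{char} values, a surrogate pair, so
@code{String.length} counts it twice, while walking the string with
@code{String.codePointAt} and @code{Character.charCount} visits it
once.  The helper names @code{countCodePoints} and @code{isBmpOnly}
are hypothetical.

@verbatim
/** Plain JDK illustration of surrogate pairs; not part of GNU Libidn. */
class SupplementaryDemo {
    /** Counts code points by stepping over whole characters, not chars. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            count++;
        }
        return count;
    }

    /** True if the string contains no surrogate chars, i.e. only BMP characters. */
    static boolean isBmpOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "a" followed by U+1D400, which Java stores as the pair D835 DC00
        String s = "a" + new String(Character.toChars(0x1D400));
        System.out.println(s.length());          // 3
        System.out.println(countCodePoints(s));  // 2
        System.out.println(isBmpOnly(s));        // false
    }
}
@end verbatim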
1785
1786 @node C# API
1787 @chapter C# API
1788
The Libidn library has been ported to the C# language.  The port
resides in the top-level @file{csharp/} directory.  Currently, no
further documentation about the implementation or the API is
available.
1793
1794 @c **********************************************************
1795 @c *******************  Acknowledgements  *******************
1796 @c **********************************************************
1797 @node Acknowledgements
1798 @chapter Acknowledgements
1799
1800 The punycode implementation was taken from the IETF IDN Punycode
1801 specification, by Adam M. Costello.  The TLD code was contributed by
1802 Thomas Jacob.  The Java implementation was contributed by Oliver Hitz.
1803 The C# implementation was contributed by Alexander Gnauck.  The
1804 Unicode tables were provided by Unicode, Inc.  Some functions for
1805 dealing with Unicode (see nfkc.c and toutf8.c) were borrowed from
1806 GLib, downloaded from @url{http://www.gtk.org/}.  The manual borrowed
1807 text from Libgcrypt by Werner Koch.
1808
Inspiration for many things that, consciously or not, have gone into
this package is due to a number of free software packages that the
author has been exposed to.  The author wishes to acknowledge the free
software community in general, for giving an example of what role
software development can play in modern society.
1814
1815 Several people reported bugs, sent patches or suggested improvements,
1816 see the file THANKS in the top-level directory of the source code.
1817
1818 @c **********************************************************
1819 @c **********************  Milestones  **********************
1820 @c **********************************************************
1821 @node Milestones
1822 @chapter Milestones
1823
The complete history of user-visible changes is stored in the file
@file{NEWS} in the top-level directory of the source code tree.  The
complete history of modifications to each file is stored in the file
@file{ChangeLog} in the same directory.  This chapter contains a
condensed version of that information, in the form of ``milestones''
for the project.
1830
1831 @table @asis
1832 @item Stringprep implementation.
1833 Version 0.0.0 released on 2002-11-05.
1834
1835 @item IDNA and Punycode implementations, part of the GNU project.
1836 Version 0.1.0 released on 2003-01-05.
1837
1838 @item Uses official IDNA ACE prefix 'xn--'.
1839 Version 0.1.7 released on 2003-02-12.
1840
1841 @item Command line interface.
1842 Version 0.1.11 released on 2003-02-26.
1843
1844 @item GNU Libc add-on proposed.
1845 Version 0.1.12 released on 2003-03-06.
1846
1847 @item Interoperability testing during IDNConnect.
1848 Version 0.3.1 released on 2003-10-02.
1849
1850 @item TLD restriction testing.
1851 Version 0.4.0 released on 2004-02-28.
1852
1853 @item GNU Libc add-on integrated.
1854 Version 0.4.1 released on 2004-03-08.
1855
1856 @item Native Java implementation.
Versions 0.4.2 through 0.4.9 released between 2004-03-20 and 2004-06-11.
1858
1859 @item PR-29 functions for ``problem sequences''.
1860 Version 0.5.0 released on 2004-06-26.
1861
1862 @item Many small portability fixes and wider use.
Versions 0.5.1 through 0.5.20 released between 2004-07-09 and
2005-10-23.
1865
1866 @item Native C# implementation.
1867 Version 0.6.0 released on 2005-12-03.
1868
1869 @end table
1870
1871 @node Concept Index
1872 @unnumbered Concept Index
1873
1874 @printindex cp
1875
1876 @node Function and Variable Index
1877 @unnumbered Function and Variable Index
1878
1879 @printindex fn
1880
1881 @node PR29 discussion
1882 @appendix PR29 discussion
1883
1884 If you wish to experiment with a modified Unicode NFKC implementation
1885 according to the PR29 proposal, you may find the following bug report
1886 useful.  However, I have not verified that the suggested modifications
1887 are correct.  For reference, I'm including my response to the report
1888 as well.
1889
1890 @verbatim
1891 From: Rick McGowan <rick@unicode.org>
1892 Subject: Possible bug and status of PR 29 change(s)
1893 To: bug-libidn@gnu.org
1894 Date: Wed, 27 Oct 2004 14:49:17 -0700
1895
1896 Hello. On behalf of the Unicode Consortium editorial committee, I would
1897 like to find out more information about the PR 29 fixes, if any, and
1898 functions in Libidn. Your implementation was listed in the text of PR29 as
1899 needing investigation, so I am following up on several implementations.
1900
1901 The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
1902 draft of UAX #15 has been issued.
1903
1904 I have looked at Libidn 0.5.8 (today), and there may still be a possible
1905 bug in NFKC.java and nfkc.c.
1906
1907 ------------------------------------------------------
1908
1909 1. In NFKC.java, this line in canonicalOrdering():
1910
1911       if (i > 0 && (last_cc == 0 || last_cc != cc)) {
1912
1913 should perhaps be changed to:
1914
1915       if (i > 0 && (last_cc == 0 || last_cc < cc)) {
1916
1917 but I'm not sure of the sense of this comparison.
1918
1919 ------------------------------------------------------
1920
1921 2. In nfkc.c, function _g_utf8_normalize_wc() has this code:
1922
1923           if (i > 0 &&
1924               (last_cc == 0 || last_cc != cc) &&
1925               combine (wc_buffer[last_start], wc_buffer[i],
1926                        &wc_buffer[last_start]))
1927             {
1928
1929 This appears to have the same bug as the current Python implementation (in
1930 Python 2.3.4). The code should be checking, as per new rule D2 UAX #15
1931 update, that the next combining character is the same or HIGHER than the
1932 current one. It now checks to see if it's non-zero and not equal.
1933
1934 The above line(s) should perhaps be changed to:
1935
1936           if (i > 0 &&
1937               (last_cc == 0 || last_cc < cc) &&
1938               combine (wc_buffer[last_start], wc_buffer[i],
1939                        &wc_buffer[last_start]))
1940             {
1941
1942 but I'm not sure of the sense of the comparison (< or > or <=?) here.
1943
1944 In the text of PR29, I will be marking Libidn as "needs change" and adding
1945 the version number that I checked. If any further change is made, please
1946 let me know the release version, and I'll update again.
1947
1948 Regards,
1949         Rick McGowan
1950 @end verbatim
1951
1952 @verbatim
1953 From: Simon Josefsson <jas@extundo.com>
1954 Subject: Re: Possible bug and status of PR 29 change(s)
1955 To: Rick McGowan <rick@unicode.org>
1956 Cc: bug-libidn@gnu.org
1957 Date: Thu, 28 Oct 2004 09:47:47 +0200
1958
1959 Rick McGowan <rick@unicode.org> writes:
1960
1961 > Hello. On behalf of the Unicode Consortium editorial committee, I would
1962 > like to find out more information about the PR 29 fixes, if any, and
1963 > functions in Libidn. Your implementation was listed in the text of PR29 as
1964 > needing investigation, so I am following up on several implementations.
1965 >
1966 > The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
1967 > draft of UAX #15 has been issued.
1968 >
1969 > I have looked at Libidn 0.5.8 (today), and there may still be a possible
1970 > bug in NFKC.java and nfkc.c.
1971
1972 Hello Rick.
1973
1974 I believe the current behavior is intentional.  Libidn do not aim to
1975 implement latest-and-greatest NFKC, it aim to implement the NFKC
1976 functionality required for StringPrep and IDN.  As you may know,
1977 StringPrep/IDN reference Unicode 3.2.0, and explicitly says any later
1978 changes (which I consider PR29 as) do not apply.
1979
1980 In fact, I believe that would I incorporate the changes suggested in
1981 PR29, I would in fact be violating the IDN specifications.
1982
1983 Thanks for looking into the code and finding the place where the
1984 change could be made.  I'll see if I can mention this in the manual
1985 somewhere, for technically interested readers.
1986
1987 Regards,
1988 Simon
1989 @end verbatim
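The code discussed above checks definition D2 of UAX #15, which
decides when a character C is ``blocked'' from the last starter S and
therefore cannot compose with it.  In the Unicode 3.2 wording, C is
blocked if some character B between S and C is a starter or has the
same combining class as C; PR29 changes ``the same'' to ``the same or
higher''.  The hypothetical class below writes both readings as
predicates over the combining classes of the intervening characters.
It follows the textual definitions rather than the loop in
@file{nfkc.c}, and is only meant to show where the two rules diverge:
when C itself has combining class 0 and a non-starter sits between S
and C, the Unicode 3.2 rule lets C compose with S while the PR29 rule
blocks it.

@verbatim
/** Hypothetical illustration of UAX #15 definition D2; not Libidn code. */
class BlockedFromStarter {
    /** Unicode 3.2 reading: C is blocked if some B between S and C is a
        starter (class 0) or has the same combining class as C. */
    static boolean blockedUnicode32(int[] classesBetween, int classOfC) {
        for (int b : classesBetween) {
            if (b == 0 || b == classOfC) {
                return true;
            }
        }
        return false;
    }

    /** PR29 reading: "the same combining class" becomes "the same or higher". */
    static boolean blockedPr29(int[] classesBetween, int classOfC) {
        for (int b : classesBetween) {
            if (b == 0 || b >= classOfC) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] between = {230};   // one non-starter of combining class 230
        int classOfC = 0;        // C is itself a starter that could compose with S
        System.out.println(blockedUnicode32(between, classOfC)); // false: may compose
        System.out.println(blockedPr29(between, classOfC));      // true: blocked
    }
}
@end verbatim

Since Stringprep is defined in terms of Unicode 3.2, Libidn keeps the
first behavior, as explained in the reply above.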
1990
1991 @include lgpl.texi
1992
1993 @node Copying This Manual
1994 @appendix Copying This Manual
1995
1996 @menu
1997 * GNU Free Documentation License::  License for copying this manual.
1998 @end menu
1999
2000 @include fdl.texi
2001
2002 @bye