doc/libidn.texi

   1 \input texinfo   @c -*- mode: texinfo; coding: us-ascii; -*-
   2 @c This file is part of GNU Libidn.
   3 @c See below for copyright and license.
   4
   5 @setfilename libidn.info
   6 @include version.texi
   7 @settitle GNU Libidn
   8 @finalout
   9
  10 @syncodeindex pg cp
  11
  12 @copying
  13 This manual was last updated @value{UPDATED} for version
  14 @value{VERSION} of GNU Libidn.
  15
  16 Copyright @copyright{} 2002, 2003, 2004, 2005, 2006 Simon Josefsson.
  17
  18 @quotation
  19 Permission is granted to copy, distribute and/or modify this document
  20 under the terms of the GNU Free Documentation License, Version 1.2 or
  21 any later version published by the Free Software Foundation; with the
  22 Invariant Sections being ``Commercial Support'', no Front-Cover Texts,
  23 and no Back-Cover Texts.  A copy of the license is included in the
  24 section entitled ``GNU Free Documentation License''.
  25 @end quotation
  26 @end copying
  27
  28 @dircategory GNU Libraries
  29 @direntry
  30 * libidn: (libidn).     Internationalized string processing library.
  31 @end direntry
  32
  33 @dircategory GNU utilities
  34 @direntry
  35 * idn: (libidn)Invoking idn.            Command line interface to GNU Libidn.
  36 @end direntry
  37
  38 @dircategory Emacs
  39 @direntry
  40 * IDN Library: (libidn)Emacs API.       Emacs API for IDN functions.
  41 @end direntry
  42
  43 @titlepage
  44 @title GNU Libidn
  45 @subtitle Internationalized string processing for the GNU system
  46 @subtitle for version @value{VERSION}, @value{UPDATED}
  47 @author Simon Josefsson
  48 @page
  49 @vskip 0pt plus 1filll
  50 @insertcopying
  51 @end titlepage
  52
  53 @contents
  54
  55 @ifnottex
  56 @node Top
  57 @top GNU Libidn
  58
  59 @insertcopying
  60 @end ifnottex
  61
  62 @menu
  63 * Introduction::                How to use this manual.
  64 * Preparation::                 What you should do before using the library.
  65 * Utility Functions::           Unicode transformation utility functions.
  66 * Stringprep Functions::        Stringprep functions.
  67 * Punycode Functions::          Punycode functions.
  68 * IDNA Functions::              IDNA functions.
  69 * TLD Functions::               TLD functions.
  70 * PR29 Functions::              Detect strings non-idempotent under NFKC.
  71 * Examples::                    Demonstrate how to use the library.
  72 * Invoking idn::                Command line interface to the library.
  73 * Emacs API::                   Emacs Lisp API for Libidn.
  74 * Java API::                    Notes on the Java port of Libidn.
  75 * C# API::                      Notes on the C# port of Libidn.
  76 * Acknowledgements::            Whom to blame.
  77 * Milestones::                  Rough outline of development history.
  78
  79 Indices
  80
  81 * Concept Index::
  82 * Function and Variable Index::
  83
  84 Appendices
  85
  86 * PR29 discussion::             Implementation aspects of the PR29 flaw.
  87 * Library Copying::             How you can copy and share GNU Libidn.
  88 * Copying This Manual::         How you can copy and share this manual.
  89
  90 @end menu
  91
  92
  93 @node Introduction
  94 @chapter Introduction
  95
  96 GNU Libidn is an implementation of the Stringprep, Punycode and IDNA
  97 specifications defined by the IETF Internationalized Domain Names
  98 (IDN) working group, used for internationalized domain names.  The
  99 package is available under the GNU Lesser General Public License.
 100
 101 The library contains a generic Stringprep implementation that does
  102 Unicode 3.2 NFKC normalization, mapping and prohibition of
 103 characters, and bidirectional character handling.  Profiles for
 104 Nameprep, iSCSI, SASL and XMPP are included.  Punycode and ASCII
 105 Compatible Encoding (ACE) via IDNA are supported.  A mechanism to
 106 define Top-Level Domain (TLD) specific validation tables, and to
 107 compare strings against those tables, is included.  Default tables for
 108 some TLDs are also included.
 109
 110 The Stringprep API consists of two main functions, one for converting
 111 data from the system's native representation into UTF-8, and one
 112 function to perform the Stringprep processing.  Adding a new
 113 Stringprep profile for your application within the API is
 114 straightforward.  The Punycode API consists of one encoding function
 115 and one decoding function.  The IDNA API consists of the ToASCII and
  116 ToUnicode functions, as well as a high-level interface for converting
 117 entire domain names to and from the ACE encoded form.  The TLD API
 118 consists of one set of functions to extract the TLD name from a domain
 119 string, one set of functions to locate the proper TLD table to use
 120 based on the TLD name, and core functions to validate a string against
 121 a TLD table, and some utility wrappers to perform all the steps in one
 122 call.
 123
 124 The library is used by, e.g., GNU SASL and Shishi to process user
 125 names and passwords.  Libidn can be built into GNU Libc to enable a
 126 new system-wide getaddrinfo flag for IDN processing.
 127
 128 Libidn is developed for the GNU/Linux system, but runs on over 20 Unix
 129 platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
 130 Libidn is written in C and (parts of) the API is accessible from C,
  131 C++, Emacs Lisp, Python and Java.  Native Java and C# ports are also
 132 provided.
 133
 134 @menu
 135 * Getting Started::
 136 * Features::
 137 * Library Overview::
 138 * Supported Platforms::
 139 * Getting help::
 140 * Commercial Support::
 141 * Downloading and Installing::
 142 * Bug Reports::
 143 * Contributing::
 144 @end menu
 145
 146 @node Getting Started
 147 @section Getting Started
 148
 149 This manual documents the library programming interface.  All
 150 functions and data types provided by the library are explained.
 151 Included are also examples, and documentation for the command line
  152 tool @file{idn} that provides a quick interface to the library.  The
  153 Emacs Lisp bindings for the library are also discussed.
 154
 155 The reader is assumed to possess basic familiarity with
 156 internationalization concepts and network programming in C or C++.
 157
 158 This manual can be used in several ways.  If read from the beginning
  159 to the end, it gives a good introduction to the library and how it
 160 can be used in an application.  Forward references are included where
 161 necessary.  Later on, the manual can be used as a reference manual to
 162 get just the information needed about any particular interface of the
 163 library.  Experienced programmers might want to start looking at the
 164 examples at the end of the manual (@pxref{Examples}), and then only
  165 read up on those parts of the interface which are unclear.
 166
 167 @node Features
 168 @section Features
 169
 170 This library might have a couple of advantages over other libraries
 171 doing a similar job.
 172
 173 @table @asis
 174 @item It's Free Software
 175 Anybody can use, modify, and redistribute it under the terms of the
 176 GNU Lesser General Public License.
 177
 178 @item It's thread-safe
 179 No global state is kept in the library.  All functions are reentrant.
 180
 181 @item It's portable
 182 The code is intended to be written in pure ANSI C89.  It has been
  183 tested on many Unix-like operating systems, and Windows.
 184
 185 @item It's modularized
 186 The library is composed of several modules, and the only interaction
  187 between modules is through each module's public API.  If you only need
 188 one piece of functionality, it is possible to take the files you need
 189 and incorporate them into your own project.
 190
 191 @item It's not bloated
 192 The design of the library is based on the smallest API necessary to
 193 implement the basic functionality.  It has been carefully extended
 194 with a small number of high-level wrappers to make it comfortable to
 195 use the library.  However, it does not implement additional
 196 functionality just for the sake of completeness.
 197
 198 @item It's documented
 199 Sadly, not all software comes with documentation these days.  This one
 200 does.
 201
 202 @end table
 203
 204 @node Library Overview
 205 @section Library Overview
 206
  207 The following illustration shows the components that make up Libidn,
 208 and how your application relates to the library.  In the illustration,
 209 various components are shown as boxes.  You see the generic StringPrep
 210 component, the various StringPrep profiles including Nameprep, the
 211 Punycode component, the IDNA component, and the TLD component.  The
 212 arrows indicate aggregation, e.g., IDNA uses Punycode and Nameprep,
 213 and in turn Nameprep uses the generic StringPrep interface.  The
  214 interfaces to all components are available for applications; no
 215 component within the library is hidden from the application.
 216
 217 @image{components}
 218
 219 @node Supported Platforms
 220 @section Supported Platforms
 221
 222 Libidn has at some point in time been tested on the following
 223 platforms.
 224
 225 @enumerate
 226
 227 @item Debian GNU/Linux 3.0 (Woody)
 228 @cindex Debian
 229
 230 GCC 2.95.4 and GNU Make. This is the main development platform.
 231 @code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
 232 @code{arm-unknown-linux-gnu}, @code{armv4l-unknown-linux-gnu},
 233 @code{hppa-unknown-linux-gnu}, @code{hppa64-unknown-linux-gnu},
 234 @code{i686-pc-linux-gnu}, @code{ia64-unknown-linux-gnu},
 235 @code{m68k-unknown-linux-gnu}, @code{mips-unknown-linux-gnu},
 236 @code{mipsel-unknown-linux-gnu}, @code{powerpc-unknown-linux-gnu},
 237 @code{s390-ibm-linux-gnu}, @code{sparc-unknown-linux-gnu},
 238 @code{sparc64-unknown-linux-gnu}.
 239
 240 @item Debian GNU/Linux 2.1
 241 @cindex Debian
 242
 243 GCC 2.95.1 and GNU Make. @code{armv4l-unknown-linux-gnu}.
 244
 245 @item Tru64 UNIX
 246 @cindex Tru64
 247
 248 Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
 249 @code{alphaev68-dec-osf5.1}.
 250
 251 @item SuSE Linux 7.1
 252 @cindex SuSE
 253
 254 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
 255 @code{alphaev67-unknown-linux-gnu}.
 256
 257 @item SuSE Linux 7.2a
 258 @cindex SuSE Linux
 259
 260 GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
 261
 262 @item SuSE Linux
 263 @cindex SuSE Linux
 264
 265 GCC 3.2.2 and GNU Make.  @code{x86_64-unknown-linux-gnu} (AMD64
 266 Opteron ``Melody'').
 267
 268 @item SuSE Enterprise Server 9 on IBM OpenPower 720
 269 @cindex SuSE Linux
 270 @cindex OpenPower 720
 271
 272 GCC 3.3.3 and GNU Make.  @code{powerpc64-unknown-linux-gnu}.
 273
 274 @item RedHat Linux 7.2
 275 @cindex RedHat
 276
 277 GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
 278 @code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
 279
 280 @item RedHat Linux 8.0
 281 @cindex RedHat
 282
 283 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
 284
 285 @item RedHat Advanced Server 2.1
 286 @cindex RedHat Advanced Server
 287
 288 GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
 289
 290 @item Slackware Linux 8.0.01
  291 @cindex Slackware
 292
 293 GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
 294
 295 @item Mandrake Linux 9.0
 296 @cindex Mandrake
 297
 298 GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
 299
 300 @item IRIX 6.5
 301 @cindex IRIX
 302
 303 MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
 304
 305 @item AIX 4.3.2
 306 @cindex AIX
 307
 308 IBM C for AIX compiler, AIX Make.  @code{rs6000-ibm-aix4.3.2.0}.
 309
 310 @item Microsoft Windows 2000 (Cygwin)
 311 @cindex Windows
 312
 313 GCC 3.2, GNU make. @code{i686-pc-cygwin}.
 314
 315 @item HP-UX 11
 316 @cindex HP-UX
 317
 318 HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
 319 @code{hppa2.0w-hp-hpux11.11}.
 320
 321 @item SUN Solaris 2.7
 322 @cindex Solaris
 323
 324 GCC 3.0.4 and GNU Make. @code{sparc-sun-solaris2.7}.
 325
 326 @item SUN Solaris 2.8
 327 @cindex Solaris
 328
 329 Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
 330
 331 @item SUN Solaris 2.9
 332 @cindex Solaris
 333
 334 Sun Forte Developer 7 C compiler and GNU
 335 Make. @code{sparc-sun-solaris2.9}.
 336
 337 @item NetBSD 1.6
 338 @cindex NetBSD
 339
 340 GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
 341 @code{i386-unknown-netbsdelf1.6}.
 342
 343 @item OpenBSD 3.1 and 3.2
 344 @cindex OpenBSD
 345
 346 GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
 347 @code{i386-unknown-openbsd3.1}.
 348
 349 @item FreeBSD 4.7 and 4.8
 350 @cindex FreeBSD
 351
 352 GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
 353 @code{alpha-unknown-freebsd4.8}, @code{i386-unknown-freebsd4.7},
 354 @code{i386-unknown-freebsd4.8}.
 355
 356 @item MacOS X 10.2 Server Edition
 357 @cindex MacOS X
 358
 359 GCC 3.1 and GNU Make. @code{powerpc-apple-darwin6.5}.
 360
 361 @item MacOS X 10.4 ``Tiger'' with Xcode 2.0
 362 @cindex MacOS X
 363
 364 GCC 4.0 and GNU Make. @code{powerpc-apple-darwin8.0}.
 365
 366 @item Cross compiled to uClinux/uClibc on Motorola Coldfire
 367 @cindex Motorola Coldfire
 368 @cindex uClinux
 369 @cindex uClibc
 370
  371 GCC 3.4 and GNU Make. @code{m68k-uclinux-elf}.
 372
 373 @item Cross compiled to ARM using Glibc
 374 @cindex ARM
 375
  376 GCC 2.95 and GNU Make. @code{arm-linux}.
 377
  378 @item Cross compiled to Mingw32
 379 @cindex Windows
 380 @cindex Microsoft
 381 @cindex mingw32
 382
  383 GCC 3.4.4 and GNU Make. @code{i586-mingw32msvc}.
 384
 385 @end enumerate
 386
 387 If you use Libidn on, or port Libidn to, a new platform please report
 388 it to the author.
 389
 390 @node Getting help
 391 @section Getting help
 392
 393 A mailing list where users of Libidn may help each other exists, and
 394 you can reach it by sending e-mail to @email{help-libidn@@gnu.org}.
 395 Archives of the mailing list discussions, and an interface to manage
  396 subscriptions, are available through the World Wide Web at
 397 @url{http://lists.gnu.org/mailman/listinfo/help-libidn}.
 398
 399 @node Commercial Support
 400 @section Commercial Support
 401
 402 Commercial support is available for users of GNU Libidn.  The kind of
 403 support that can be purchased may include:
 404
 405 @itemize
 406
 407 @item Implement new features.
 408 Such as country code specific profiling to support a restricted subset
 409 of Unicode.
 410
 411 @item Port Libidn to new platforms.
  412 This could include porting Libidn to embedded platforms that may
 413 need memory or size optimization.
 414
 415 @item Integrating IDN support in your existing project.
 416
 417 @item System design of components related to IDN.
 418
 419 @end itemize
 420
 421 If you are interested, please write to:
 422
 423 @verbatim
 424 Simon Josefsson Datakonsult
 425 Hagagatan 24
 426 113 47 Stockholm
 427 Sweden
 428
 429 E-mail: simon@josefsson.org
 430 @end verbatim
 431
  432 If your company provides support related to GNU Libidn and would like
 433 to be mentioned here, contact the author (@pxref{Bug Reports}).
 434
 435 @node Downloading and Installing
 436 @section Downloading and Installing
 437 @cindex Installation
 438 @cindex Download
 439
 440 The package can be downloaded from several places, including:
 441
 442 @url{http://josefsson.org/libidn/releases/}
 443
 444 The latest version is stored in a file, e.g.,
  445 @samp{libidn-@value{VERSION}.tar.gz} where the @samp{@value{VERSION}}
 446 value is the highest version number in the directory.
 447
 448 The package is then extracted, configured and built like many other
 449 packages that use Autoconf.  For detailed information on configuring
 450 and building it, refer to the @file{INSTALL} file that is part of the
 451 distribution archive.
 452
  453 Here is an example terminal session that downloads, configures, builds
  454 and installs the package.  You will need a few basic tools, such as
 455 @samp{sh}, @samp{make} and @samp{cc}.
 456
 457 @example
 458 $ wget -q http://josefsson.org/libidn/releases/libidn-@value{VERSION}.tar.gz
 459 $ tar xfz libidn-@value{VERSION}.tar.gz
 460 $ cd libidn-@value{VERSION}/
 461 $ ./configure
 462 ...
 463 $ make
 464 ...
 465 $ make install
 466 ...
 467 @end example
 468
 469 After that Libidn should be properly installed and ready for use.
 470
 471 A few @code{configure} options may be relevant, summarized in the
 472 table.
 473
 474 @table @code
 475
 476 @item --enable-java
 477 Build the Java port into a *.JAR file.  @xref{Java API}, for more
 478 information.
 479
 480 @item --disable-tld
 481 Disable the TLD module.  This would typically only be useful if you
  482 are building on a memory-restricted platform.  @xref{TLD Functions},
 483 for more information.
 484
 485 @item --enable-csharp[=IMPL]
  486 Build the C# port into a *.DLL file.  @xref{C# API}, for more
 487 information.  Here, @code{IMPL} is @code{pnet} or @code{mono},
 488 indicating whether the PNET @command{cscc} compiler or the Mono
 489 @command{mcs} compiler should be used, respectively.
 490
 491 @end table
 492
 493 For the complete list, refer to the output from @code{configure
 494 --help}.
 495
 496 @node Bug Reports
 497 @section Bug Reports
 498 @cindex Reporting Bugs
 499
 500 If you think you have found a bug in Libidn, please investigate it and
 501 report it.
 502
 503 @itemize @bullet
 504
 505 @item Please make sure that the bug is really in Libidn, and
 506 preferably also check that it hasn't already been fixed in the latest
 507 version.
 508
 509 @item You have to send us a test case that makes it possible for us to
 510 reproduce the bug.
 511
 512 @item You also have to explain what is wrong; if you get a crash, or
 513 if the results printed are not good and in that case, in what way.
 514 Make sure that the bug report includes all information you would need
 515 to fix this kind of bug for someone else.
 516
 517 @end itemize
 518
 519 Please make an effort to produce a self-contained report, with
 520 something definite that can be tested or debugged.  Vague queries or
 521 piecemeal messages are difficult to act on and don't help the
 522 development effort.
 523
 524 If your bug report is good, we will do our best to help you to get a
 525 corrected version of the software; if the bug report is poor, we won't
 526 do anything about it (apart from asking you to send better bug
 527 reports).
 528
 529 If you think something in this manual is unclear, or downright
 530 incorrect, or if the language needs to be improved, please also send a
 531 note.
 532
 533 Send your bug report to:
 534
 535 @center @samp{bug-libidn@@gnu.org}
 536
 537
 538 @node Contributing
 539 @section Contributing
 540 @cindex Contributing
 541 @cindex Hacking
 542
  543 If you want to submit a patch for inclusion -- from fixing a typo you
 544 discovered, up to adding support for a new feature -- you should
 545 submit it as a bug report (@pxref{Bug Reports}).  There are some
 546 things that you can do to increase the chances for it to be included
 547 in the official package.
 548
 549 Unless your patch is very small (say, under 10 lines) we require that
 550 you assign the copyright of your work to the Free Software Foundation.
 551 This is to protect the freedom of the project.  If you have not
 552 already signed papers, we will send you the necessary information when
 553 you submit your contribution.
 554
  555 For contributions that don't consist of actual programming code, the
 556 only guidelines are common sense.  Use it.
 557
 558 For code contributions, a number of style guides will help you:
 559
 560 @itemize @bullet
 561
 562 @item Coding Style.
 563 Follow the GNU Standards document (@pxref{top, GNU Coding Standards,,
 564 standards}).
 565
 566 If you normally code using another coding standard, there is no
 567 problem, but you should use @samp{indent} to reformat the code
 568 (@pxref{top, GNU Indent,, indent}) before submitting your work.
 569
 570 @item Use the unified diff format @samp{diff -u}.
 571
 572 @item Return errors.
  573 No reason whatsoever should abort the execution of the library.  Even
  574 memory allocation errors, e.g., when @code{malloc} returns NULL, should
  575 be handled gracefully and reported through an error code.
 576
 577 @item Design with thread safety in mind.
 578 Don't use global variables and the like.
 579
 580 @item Avoid using the C math library.
 581 It causes problems for embedded implementations, and in most
 582 situations it is very easy to avoid using it.
 583
 584 @item Document your functions.
  585 Use comments before each function header that, if properly
 586 formatted, are extracted into GTK-DOC web pages.  Don't forget to
 587 update the Texinfo manual as well.
 588
  589 @item Supply ChangeLog and NEWS entries, where appropriate.
 590
 591 @end itemize
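
The ``Return errors'' and thread-safety guidelines above can be sketched in a few lines of C.  The names @code{dup_label}, @code{EX_OK}, @code{EX_INVALID_ARG} and @code{EX_MALLOC_ERROR} are hypothetical and are not part of Libidn; the sketch only shows the pattern of reporting every failure, including a failed @code{malloc}, through the return value, without touching any global state.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical return codes, for illustration only; they are not
   part of the Libidn API. */
enum { EX_OK = 0, EX_INVALID_ARG = 1, EX_MALLOC_ERROR = 2 };

/* Copy LABEL into newly allocated memory and store the copy in *OUT.
   Every failure is reported to the caller through the return value;
   the function never aborts and uses no global variables. */
int
dup_label (const char *label, char **out)
{
  size_t len;
  char *copy;

  if (label == NULL || out == NULL)
    return EX_INVALID_ARG;

  len = strlen (label);
  copy = malloc (len + 1);
  if (copy == NULL)
    return EX_MALLOC_ERROR;     /* report the failure, do not abort */

  memcpy (copy, label, len + 1);        /* includes the trailing NUL */
  *out = copy;
  return EX_OK;
}
```

A caller checks the return value before using the output, and releases the copy with @code{free} when done.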
 592
 593 @c **********************************************************
 594 @c *******************  Preparation  ************************
 595 @c **********************************************************
 596 @node Preparation
 597 @chapter Preparation
 598
 599 To use `Libidn', you have to perform some changes to your sources and
 600 the build system.  The necessary changes are small and explained in
 601 the following sections.  At the end of this chapter, it is described
 602 how the library is initialized, and how the requirements of the
 603 library are verified.
 604
 605 A faster way to find out how to adapt your application for use with
 606 `Libidn' may be to look at the examples at the end of this manual
 607 (@pxref{Examples}).
 608
 609 @menu
 610 * Header::
 611 * Initialization::
 612 * Version Check::
 613 * Building the source::
 614 * Autoconf tests::
 615 @end menu
 616
 617 @node Header
 618 @section Header
 619
  620 The library contains a few independent parts, and each part exports its
 621 interfaces (data types and functions) in a header file.  You must
 622 include the appropriate header files in all programs using the
 623 library, either directly or through some other header file, like this:
 624
 625 @example
 626 #include <stringprep.h>
 627 @end example
 628
 629 The header files and the functions they define are categorized as
 630 follows:
 631
 632 @table @asis
 633 @item stringprep.h
 634
 635 The low-level stringprep API entry point.  For IDN applications, this
 636 is usually invoked via IDNA. Some applications, specifically non-IDN
 637 ones, may want to prepare strings directly though, and should include
 638 this header file.
 639
 640 The name space of the stringprep part of Libidn is @code{stringprep*}
 641 for function names, @code{Stringprep*} for data types and
 642 @code{STRINGPREP_*} for other symbols.  In addition,
 643 @code{_stringprep*} is reserved for internal use and should never be
 644 used by applications.
 645
 646 @item punycode.h
 647
 648 The entry point to Punycode encoding and decoding functions.  Normally
  649 punycode is used via the idna.h interface, but some applications may
 650 want to perform raw punycode operations.
 651
 652 The name space of the punycode part of Libidn is @code{punycode_*} for
 653 function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
 654 for other symbols.  In addition, @code{_punycode*} is reserved for
 655 internal use and should never be used by applications.
 656 @item idna.h
 657
 658 The entry point to the IDNA functions.  This is the normal entry point
 659 for applications that need IDN functionality.
 660
 661 The name space of the IDNA part of Libidn is @code{idna_*} for
 662 function names, @code{Idna*} for data types and @code{IDNA_*} for
 663 other symbols.  In addition, @code{_idna*} is reserved for internal
 664 use and should never be used by applications.
 665
 666 @item tld.h
 667
 668 The entry point to the TLD functions.  Normal applications are not
 669 expected to need this functionality, but it is present for
 670 applications that are used by TLDs to validate customer input.
 671
 672 The name space of the TLD part of Libidn is @code{tld_*} for function
 673 names, @code{Tld_*} for data types and @code{TLD_*} for other symbols.
 674 In addition, @code{_tld*} is reserved for internal use and should
 675 never be used by applications.
 676
 677 @item pr29.h
 678
 679 The entry point to the PR29 functions.  These functions are used to
 680 detect ``problem sequences'' (@pxref{PR29 Functions}), mostly for use
 681 in security critical applications.
 682
 683 The name space of the PR29 part of Libidn is @code{pr29_*} for
 684 function names, @code{Pr29_*} for data types and @code{PR29_*} for
 685 other symbols.  In addition, @code{_pr29*} is reserved for internal
 686 use and should never be used by applications.
 687
 688 @end table
 689
 690 @node Initialization
 691 @section Initialization
 692
 693 Libidn is stateless and does not need any initialization.
 694
 695 @node Version Check
 696 @section Version Check
 697
 698 It is often desirable to check that the version of `Libidn' used is
 699 indeed one which fits all requirements.  Even with binary
  700 compatibility, new features may have been introduced, but due to problems
  701 with the dynamic linker an old version may actually be used.  So you may
 702 want to check that the version is okay right after program startup.
 703
 704 @include texi/stringprep_check_version.texi
 705
 706 The normal way to use the function is to put something similar to the
 707 following first in your @code{main}:
 708
 709 @example
 710   if (!stringprep_check_version (STRINGPREP_VERSION))
 711     @{
 712       printf ("stringprep_check_version() failed:\n"
 713               "Header file incompatible with shared library.\n");
 714       exit(1);
 715     @}
 716 @end example
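
To make concrete what ``fits all requirements'' means, the following sketch compares two dotted version strings component by component.  The helper @code{version_at_least} is hypothetical and is not how Libidn implements @code{stringprep_check_version}; it only illustrates the kind of comparison such a check performs, assuming purely numeric components separated by dots.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of Libidn: return 1 if the dotted
   version string HAVE is at least NEED (e.g. "0.6.5" >= "0.6.0"),
   and 0 otherwise.  Missing components count as zero. */
int
version_at_least (const char *have, const char *need)
{
  for (;;)
    {
      char *have_end, *need_end;
      unsigned long h = strtoul (have, &have_end, 10);
      unsigned long n = strtoul (need, &need_end, 10);

      if (h != n)
        return h > n;

      /* Equal so far; stop when neither string has another component. */
      if (*have_end != '.' && *need_end != '.')
        return 1;

      /* Step over the separating dot where there is one. */
      have = (*have_end == '.') ? have_end + 1 : have_end;
      need = (*need_end == '.') ? need_end + 1 : need_end;
    }
}
```

With this helper, @code{version_at_least ("0.6.5", "0.6.0")} is true, while @code{version_at_least ("0.5.20", "0.6.0")} is false.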
 717
 718 @node Building the source
 719 @section Building the source
 720 @cindex Compiling your application
 721
 722 If you want to compile a source file including e.g. the `idna.h' header
 723 file, you must make sure that the compiler can find it in the
 724 directory hierarchy.  This is accomplished by adding the path to the
  725 directory in which the header file is located to the compiler's include
 726 file search path (via the @option{-I} option).
 727
 728 However, the path to the include file is determined at the time the
 729 source is configured.  To solve this problem, `Libidn' uses the
 730 external package @command{pkg-config} that knows the path to the
 731 include file and other configuration options.  The options that need
 732 to be added to the compiler invocation at compile time are output by
 733 the @option{--cflags} option to @command{pkg-config libidn}.  The
 734 following example shows how it can be used at the command line:
 735
 736 @example
 737 gcc -c foo.c `pkg-config libidn --cflags`
 738 @end example
 739
 740 Adding the output of @samp{pkg-config libidn --cflags} to the
  741 compiler's command line will ensure that the compiler can find e.g. the
 742 idna.h header file.
 743
 744 A similar problem occurs when linking the program with the library.
 745 Again, the compiler has to find the library files.  For this to work,
 746 the path to the library files has to be added to the library search
 747 path (via the @option{-L} option).  For this, the option
 748 @option{--libs} to @command{pkg-config libidn} can be used.  For
 749 convenience, this option also outputs all other options that are
  750 required to link the program with the `libidn' library.  The example
 751 shows how to link @file{foo.o} with the `libidn' library to a program
 752 @command{foo}.
 753
 754 @example
 755 gcc -o foo foo.o `pkg-config libidn --libs`
 756 @end example
 757
 758 Of course you can also combine both examples to a single command by
 759 specifying both options to @command{pkg-config}:
 760
 761 @example
 762 gcc -o foo foo.c `pkg-config libidn --cflags --libs`
 763 @end example
 764
 765 @node Autoconf tests
 766 @section Autoconf tests
 767 @cindex Autoconf tests
 768 @cindex Configure tests
 769
 770 If your project uses Autoconf (@pxref{top, GNU Autoconf,, autoconf})
 771 to check for installed libraries, you might find the following snippet
  772 illustrative.  It adds a new @file{configure} parameter
  773 @code{--with-libidn}, checks for @file{idna.h} and @samp{-lidn}
  774 (possibly below the directory specified as the optional argument to
  775 @code{--with-libidn}), and defines the @acronym{CPP} symbol
 776 @code{LIBIDN} if the library is found.  The default behaviour is to
 777 search for the library and enable the functionality (that is, define
 778 the symbol) when the library is found, but if you wish to make the
 779 default behaviour of your package be that Libidn is not used (even if
 780 it is installed on the system), change @samp{libidn=yes} to
 781 @samp{libidn=no} on the third line.
 782
 783 @example
 784 AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
 785                                 [Support IDN (needs GNU Libidn)]),
 786   libidn=$withval, libidn=yes)
 787 if test "$libidn" != "no"; then
 788   if test "$libidn" != "yes"; then
 789     LDFLAGS="$@{LDFLAGS@} -L$libidn/lib"
 790     CPPFLAGS="$@{CPPFLAGS@} -I$libidn/include"
 791   fi
 792   AC_CHECK_HEADER(idna.h,
 793     AC_CHECK_LIB(idn, stringprep_check_version,
 794       [libidn=yes LIBS="$@{LIBS@} -lidn"], libidn=no),
 795     libidn=no)
 796 fi
 797 if test "$libidn" != "no" ; then
 798   AC_DEFINE(LIBIDN, 1, [Define to 1 if you want IDN support.])
 799 else
 800   AC_MSG_WARN([Libidn not found])
 801 fi
 802 AC_MSG_CHECKING([if Libidn should be used])
 803 AC_MSG_RESULT($libidn)
 804 @end example
 805
 806 If you require that your users have installed @code{pkg-config} (which
 807 I cannot recommend generally), the above can be done more easily as
 808 follows.
 809
 810 @example
 811 AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
 812                                 [Support IDN (needs GNU Libidn)]),
 813   libidn=$withval, libidn=yes)
 814 if test "$libidn" != "no" ; then
 815   PKG_CHECK_MODULES(LIBIDN, libidn >= 0.0.0, [libidn=yes], [libidn=no])
 816   if test "$libidn" != "yes" ; then
 817     libidn=no
 818     AC_MSG_WARN([Libidn not found])
 819   else
 820     libidn=yes
 821     AC_DEFINE(LIBIDN, 1, [Define to 1 if you want Libidn.])
 822   fi
 823 fi
 824 AC_MSG_CHECKING([if Libidn should be used])
 825 AC_MSG_RESULT($libidn)
 826 @end example
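
Once @file{configure} defines the @code{LIBIDN} symbol, application code can test it to decide whether to call into the library.  The following is a minimal sketch under that assumption; the helper name @code{app_domain_to_ascii} is hypothetical, and the fallback branch simply copies the name unchanged.  The @code{idna_to_ascii_8z} call expects UTF-8 input (@pxref{IDNA Functions}).

```c
#include <stdlib.h>
#include <string.h>

#ifdef LIBIDN
# include <idna.h>
#endif

/* Hypothetical application helper: store an ASCII form of the UTF-8
   domain NAME in newly allocated memory at *OUT.  Returns 0 on
   success and -1 on failure.  The caller frees *OUT. */
int
app_domain_to_ascii (const char *name, char **out)
{
#ifdef LIBIDN
  /* IDN support was found by configure: let Libidn convert. */
  return idna_to_ascii_8z (name, out, 0) == IDNA_SUCCESS ? 0 : -1;
#else
  /* Built without Libidn: pass the name through unchanged. */
  size_t len = strlen (name) + 1;

  *out = malloc (len);
  if (*out == NULL)
    return -1;
  memcpy (*out, name, len);
  return 0;
#endif
}
```

Keeping the conditional inside one helper means the rest of the application does not need to know whether IDN support was compiled in.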
 827
 828 @c **********************************************************
 829 @c ********************  Utility Functions ******************
 830 @c **********************************************************
 831 @node Utility Functions
 832 @chapter Utility Functions
 833 @cindex Utility Functions
 834
 835 The rest of this library makes extensive use of Unicode characters.
 836 In order to interface this library with the outside world, your
 837 application may need to make various Unicode transformations.
 838
 839 @section Header file @code{stringprep.h}
 840
 841 To use the functions explained in this chapter, you need to include
 842 the file @file{stringprep.h} using:
 843
 844 @example
 845 #include <stringprep.h>
 846 @end example
 847
 848 @section Unicode Encoding Transformation
 849
 850 @include texi/stringprep_unichar_to_utf8.texi
 851 @include texi/stringprep_utf8_to_unichar.texi
 852 @include texi/stringprep_ucs4_to_utf8.texi
 853 @include texi/stringprep_utf8_to_ucs4.texi
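
For readers unfamiliar with the encoding itself, the following sketch shows how a single Unicode code point maps to a UTF-8 byte sequence.  The helper @code{encode_utf8} is hypothetical and independent of the library functions documented above, whose exact behaviour may differ; it assumes the four-byte form of UTF-8, limited to U+10FFFF and excluding surrogates.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Libidn: encode code point CP as
   UTF-8 into BUF, which must have room for 4 bytes.  Returns the
   number of bytes written, or 0 if CP is a surrogate or lies above
   U+10FFFF. */
size_t
encode_utf8 (unsigned long cp, unsigned char *buf)
{
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;                      /* 0xxxxxxx */
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));      /* 110xxxxx */
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));    /* 10xxxxxx */
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not characters */
  if (cp < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));     /* 1110xxxx */
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));     /* 11110xxx */
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;                     /* outside the Unicode range */
}
```

For example, U+00E5 encodes as the two bytes C3 A5, and U+20AC as the three bytes E2 82 AC.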
 854
 855 @section Unicode Normalization
 856
 857 @include texi/stringprep_ucs4_nfkc_normalize.texi
 858 @include texi/stringprep_utf8_nfkc_normalize.texi
 859
 860 @section Character Set Conversion
 861
 862 @include texi/stringprep_locale_charset.texi
 863 @include texi/stringprep_convert.texi
 864 @include texi/stringprep_locale_to_utf8.texi
 865 @include texi/stringprep_utf8_to_locale.texi
 866
 867
 868 @c **********************************************************
 869 @c ******************  Stringprep Functions *****************
 870 @c **********************************************************
 871 @node Stringprep Functions
 872 @chapter Stringprep Functions
 873 @cindex Stringprep Functions
 874
 875 Stringprep describes a framework for preparing Unicode text strings in
 876 order to increase the likelihood that string input and string
 877 comparison work in ways that make sense for typical users throughout
 878 the world. The stringprep protocol is useful for protocol identifier
 879 values, company and personal names, internationalized domain names,
 880 and other text strings.
 881
 882 @section Header file @code{stringprep.h}
 883
 884 To use the functions explained in this chapter, you need to include
 885 the file @file{stringprep.h} using:
 886
 887 @example
 888 #include <stringprep.h>
 889 @end example
 890
 891 @section Defining A Stringprep Profile
 892
 893 Further types and structures are defined for applications that want to
 894 specify their own stringprep profile.  As these are fairly obscure,
 895 and by necessity tied to the implementation, we do not document them
 896 here.  Look into the @file{stringprep.h} header file, and the
 897 @file{profiles.c} source code for the details.
 898
 899 @section Control Flags
 900
 901 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_NFKC}
 902 Disable the NFKC normalization, as well as selecting the non-NFKC case
 903 folding tables.  Usually the profile specifies BIDI and NFKC settings,
 904 and applications should not override it unless in special situations.
 905 @end deftypevr
 906
 907 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_BIDI}
 908 Disable the BIDI step.  Usually the profile specifies BIDI and NFKC
 909 settings, and applications should not override it unless in special
 910 situations.
 911 @end deftypevr
 912
 913 @deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_UNASSIGNED}
Make the library return an error if the string contains code points
that are unassigned according to the profile.
 916 @end deftypevr
 917
 918 @section Core Functions
 919
 920 @include texi/stringprep_4i.texi
 921 @include texi/stringprep_4zi.texi
 922 @include texi/stringprep.texi
 923 @include texi/stringprep_profile.texi
 924
 925 @section Error Handling
 926
 927 @include texi/stringprep_strerror.texi
 928
 929 @section Stringprep Profile Macros
 930
 931 @deftypefun {int} stringprep_nameprep_no_unassigned (char * @var{in}, int @var{maxlen})
 932
@var{in}: input/output array with string to prepare.
 934
 935 @var{maxlen}: maximum length of input/output array.
 936
 937 Prepare the input UTF-8 string according to the nameprep profile.  The
 938 AllowUnassigned flag is false, use @code{stringprep_nameprep} for
 939 true AllowUnassigned.  Returns 0 iff successful, or an error code.
 940 @end deftypefun
 941
 942 @deftypefun {int} stringprep_iscsi (char * @var{in}, int @var{maxlen})
 943
@var{in}: input/output array with string to prepare.
 945
 946 @var{maxlen}: maximum length of input/output array.
 947
 948 Prepare the input UTF-8 string according to the draft iSCSI stringprep
 949 profile.  Returns 0 iff successful, or an error code.
 950 @end deftypefun
 951
 952 @deftypefun {int} stringprep_plain (char * @var{in}, int @var{maxlen})
 953
@var{in}: input/output array with string to prepare.
 955
 956 @var{maxlen}: maximum length of input/output array.
 957
 958 Prepare the input UTF-8 string according to the draft SASL ANONYMOUS
 959 profile.  Returns 0 iff successful, or an error code.
 960 @end deftypefun
 961
 962 @deftypefun {int} stringprep_xmpp_nodeprep (char * @var{in}, int @var{maxlen})
 963
@var{in}: input/output array with string to prepare.
 965
 966 @var{maxlen}: maximum length of input/output array.
 967
 968 Prepare the input UTF-8 string according to the draft XMPP node
 969 identifier profile.  Returns 0 iff successful, or an error code.
 970 @end deftypefun
 971
 972 @deftypefun {int} stringprep_xmpp_resourceprep (char * @var{in}, int @var{maxlen})
 973
@var{in}: input/output array with string to prepare.
 975
 976 @var{maxlen}: maximum length of input/output array.
 977
 978 Prepare the input UTF-8 string according to the draft XMPP resource
 979 identifier profile.  Returns 0 iff successful, or an error code.
 980 @end deftypefun
 981
 982 @c **********************************************************
 983 @c *******************  Punycode Functions ******************
 984 @c **********************************************************
 985 @node Punycode Functions
 986 @chapter Punycode Functions
 987 @cindex Punycode Functions
 988
 989 Punycode is a simple and efficient transfer encoding syntax designed
 990 for use with Internationalized Domain Names in Applications. It
 991 uniquely and reversibly transforms a Unicode string into an ASCII
 992 string. ASCII characters in the Unicode string are represented
 993 literally, and non-ASCII characters are represented by ASCII
 994 characters that are allowed in host name labels (letters, digits, and
 995 hyphens). A general algorithm called Bootstring allows a string of
 996 basic code points to uniquely represent any string of code points
 997 drawn from a larger set. Punycode is an instance of Bootstring that
 998 uses particular parameter values, appropriate for IDNA.
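
To make the description above concrete, here is a minimal sketch of a
Punycode encoder using the Bootstring parameter values that RFC 3492
fixes for Punycode.  It is an independent illustration, not the
library's implementation: the names @code{sketch_punycode_encode},
@code{adapt} and @code{digit_char} are hypothetical, and case flags are
not handled.  Applications should use the functions documented in this
chapter.

@verbatim
```c
#include <stddef.h>
#include <stdint.h>

/* Bootstring parameter values that RFC 3492 fixes for Punycode.  */
static const uint32_t base = 36, tmin = 1, tmax = 26, skew = 38,
  damp = 700, initial_bias = 72, initial_n = 128;

/* Map a digit value 0..35 to its ASCII form: a-z, then 0-9.  */
static char
digit_char (uint32_t d)
{
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function from RFC 3492, section 6.1.  */
static uint32_t
adapt (uint32_t delta, uint32_t numpoints, int firsttime)
{
  uint32_t k = 0;
  delta = firsttime ? delta / damp : delta / 2;
  delta += delta / numpoints;
  while (delta > ((base - tmin) * tmax) / 2)
    {
      delta /= base - tmin;
      k += base;
    }
  return k + (base - tmin + 1) * delta / (delta + skew);
}

/* Hypothetical helper: encode the N code points in IN as a
   NUL-terminated ASCII string in OUT, which holds CAP bytes.
   Returns 0 on success, -1 if OUT is too small or on overflow.  */
int
sketch_punycode_encode (const uint32_t *in, size_t n, char *out, size_t cap)
{
  uint32_t cur = initial_n, delta = 0, bias = initial_bias;
  size_t i, o = 0, basic, handled;

  if (cap == 0)
    return -1;
  /* Basic (ASCII) code points are copied literally, in order.  */
  for (i = 0; i < n; i++)
    if (in[i] < 0x80)
      {
        if (o + 1 >= cap)
          return -1;
        out[o++] = (char) in[i];
      }
  basic = handled = o;
  if (basic > 0)
    {
      if (o + 1 >= cap)
        return -1;
      out[o++] = '-';           /* delimiter after the basic part */
    }
  while (handled < n)
    {
      /* Next value to insert: the smallest code point not handled yet.  */
      uint32_t m = UINT32_MAX;
      for (i = 0; i < n; i++)
        if (in[i] >= cur && in[i] < m)
          m = in[i];
      if (m - cur > (UINT32_MAX - delta) / (handled + 1))
        return -1;
      delta += (m - cur) * (uint32_t) (handled + 1);
      cur = m;
      for (i = 0; i < n; i++)
        {
          if (in[i] < cur && ++delta == 0)
            return -1;
          if (in[i] == cur)
            {
              /* Emit delta as a variable-length base-36 integer.  */
              uint32_t q = delta, k, t;
              for (k = base;; k += base)
                {
                  t = k <= bias ? tmin : k >= bias + tmax ? tmax : k - bias;
                  if (q < t)
                    break;
                  if (o + 1 >= cap)
                    return -1;
                  out[o++] = digit_char (t + (q - t) % (base - t));
                  q = (q - t) / (base - t);
                }
              if (o + 1 >= cap)
                return -1;
              out[o++] = digit_char (q);
              bias = adapt (delta, (uint32_t) (handled + 1),
                            handled == basic);
              delta = 0;
              handled++;
            }
        }
      delta++;
      cur++;
    }
  out[o] = '\0';
  return 0;
}
```
@end verbatim

Given the code points of @samp{b@"ucher}, this sketch produces
@samp{bcher-kva}: the ASCII letters appear literally before the
delimiter, and the position and value of the non-ASCII character are
carried by the letters and digits after it.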
 999
1000 @section Header file @code{punycode.h}
1001
1002 To use the functions explained in this chapter, you need to include
1003 the file @file{punycode.h} using:
1004
1005 @example
1006 #include <punycode.h>
1007 @end example
1008
1009 @section Unicode Code Point Data Type
1010
The punycode functions use a special type to denote Unicode code
points.  It is guaranteed to always be a 32 bit unsigned integer.
1013
1014 @deftypevr {Punycode Unicode code point} uint32_t punycode_uint
An unsigned integer that holds a Unicode code point.
1016 @end deftypevr
1017
1018 @section Core Functions
1019
Note that the current implementation will fail if
@code{input_length} exceeds 4294967295 (the largest value a
@code{punycode_uint} can hold).  This restriction may be removed in the
future.  Meanwhile, applications are encouraged not to depend on this
limitation, and to use @code{sizeof} to initialize @code{input_length}
and @code{output_length}.
1026
1027 The functions provided are the following two entry points:
1028
1029 @include texi/punycode_encode.texi
1030 @include texi/punycode_decode.texi
1031
1032 @section Error Handling
1033
1034 @include texi/punycode_strerror.texi
1035
1036 @c **********************************************************
1037 @c ********************* IDNA Functions *********************
1038 @c **********************************************************
1039 @node IDNA Functions
1040 @chapter IDNA Functions
1041 @cindex IDNA Functions
1042
1043 Until now, there has been no standard method for domain names to use
1044 characters outside the ASCII repertoire. The IDNA document defines
1045 internationalized domain names (IDNs) and a mechanism called IDNA for
1046 handling them in a standard fashion. IDNs use characters drawn from a
1047 large repertoire (Unicode), but IDNA allows the non-ASCII characters
1048 to be represented using only the ASCII characters already allowed in
1049 so-called host names today. This backward-compatible representation is
1050 required in existing protocols like DNS, so that IDNs can be
1051 introduced with no changes to the existing infrastructure. IDNA is
1052 only meant for processing domain names, not free text.
1053
1054 @section Header file @code{idna.h}
1055
1056 To use the functions explained in this chapter, you need to include
1057 the file @file{idna.h} using:
1058
1059 @example
1060 #include <idna.h>
1061 @end example
1062
1063 @section Control Flags
1064
1065 The IDNA @code{flags} parameter can take on the following values, or a
1066 bit-wise inclusive or of any subset of the parameters:
1067
1068 @deftypevr {Return code} {Idna_flags} IDNA_ALLOW_UNASSIGNED
1069 Allow unassigned Unicode code points.
1070 @end deftypevr
1071
1072 @deftypevr {Return code} {Idna_flags} IDNA_USE_STD3_ASCII_RULES
1073 Check output to make sure it is a STD3 conforming host name.
1074 @end deftypevr
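
As an illustration of what the STD3 check involves, the sketch below
tests a single ASCII label against the host name rules that IDNA
applies when the UseSTD3ASCIIRules flag is set: only letters, digits
and hyphens are allowed, and the label must not start or end with a
hyphen.  The function name @code{sketch_is_std3_label} is hypothetical
and not part of Libidn, and rejecting empty labels is a convenience
added here.

@verbatim
```c
#include <stddef.h>

/* Hypothetical helper, not part of Libidn: return 1 if the ASCII
   label of length LEN contains only letters, digits and hyphens and
   neither starts nor ends with a hyphen, otherwise return 0.  */
int
sketch_is_std3_label (const char *label, size_t len)
{
  size_t i;

  if (len == 0 || label[0] == '-' || label[len - 1] == '-')
    return 0;
  for (i = 0; i < len; i++)
    {
      char c = label[i];
      int ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '-';
      if (!ok)
        return 0;
    }
  return 1;
}
```
@end verbatim

With this sketch, @samp{xn--rksmrgs-5wao1o} is accepted, while
@samp{-foo} and @samp{foo_bar} are rejected.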
1075
1076 @section Prefix String
1077
1078 @deftypevr {Macro} {#define} IDNA_ACE_PREFIX
1079 String with the official IDNA prefix, @code{xn--}.
1080 @end deftypevr
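
A typical use of the prefix is to decide whether a label is already in
ASCII Compatible Encoding.  The following sketch shows one way to do
that.  It defines its own stand-in macro @code{SKETCH_ACE_PREFIX} so
that it compiles without @file{idna.h}; in an application you would use
@code{IDNA_ACE_PREFIX} instead.  The function name
@code{sketch_has_ace_prefix} is hypothetical.  The comparison ignores
case, because the IDNA specification treats any capitalization of the
prefix as equivalent.

@verbatim
```c
/* Stand-in for IDNA_ACE_PREFIX, so the sketch needs no Libidn header.  */
#define SKETCH_ACE_PREFIX "xn--"

/* Hypothetical helper, not part of Libidn: return 1 if the
   NUL-terminated LABEL starts with the ACE prefix, ignoring ASCII
   case, otherwise return 0.  */
int
sketch_has_ace_prefix (const char *label)
{
  const char *p;

  for (p = SKETCH_ACE_PREFIX; *p != '\0'; p++, label++)
    {
      char c = *label;
      if (c >= 'A' && c <= 'Z')
        c = (char) (c - 'A' + 'a');     /* fold to lower case */
      if (c != *p)
        return 0;       /* also reached when LABEL is too short */
    }
  return 1;
}
```
@end verbatim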
1081
1082 @section Core Functions
1083
The idea behind the IDNA function names is as follows: the
@code{idna_to_ascii_4i} and @code{idna_to_unicode_44i} functions are
the core IDNA primitives.  The @code{4} indicates that the function
takes UCS-4 strings (i.e., Unicode code points encoded in a 32-bit
unsigned integer type) of the specified length.  The @code{i} indicates
1089 that the data is written ``inline'' into the buffer.  This means the
1090 caller is responsible for allocating (and deallocating) the string,
1091 and providing the library with the allocated length of the string.
1092 The output length is written in the output length variable.  The
1093 remaining functions all contain the @code{z} indicator, which means
1094 the strings are zero terminated.  All output strings are allocated by
1095 the library, and must be deallocated by the caller.  The @code{4}
1096 indicator again means that the string is UCS-4, the @code{8} means the
1097 strings are UTF-8 and the @code{l} indicator means the strings are
1098 encoded in the encoding used by the current locale.
1099
1100 The functions provided are the following entry points:
1101
1102 @include texi/idna_to_ascii_4i.texi
1103 @include texi/idna_to_unicode_44i.texi
1104
1105 @section Simplified ToASCII Interface
1106
1107 @include texi/idna_to_ascii_4z.texi
1108 @include texi/idna_to_ascii_8z.texi
1109 @include texi/idna_to_ascii_lz.texi
1110
1111 @section Simplified ToUnicode Interface
1112
1113 @include texi/idna_to_unicode_4z4z.texi
1114 @include texi/idna_to_unicode_8z4z.texi
1115 @include texi/idna_to_unicode_8z8z.texi
1116 @include texi/idna_to_unicode_8zlz.texi
1117 @include texi/idna_to_unicode_lzlz.texi
1118
1119 @section Error Handling
1120
1121 @include texi/idna_strerror.texi
1122
1123 @c **********************************************************
1124 @c ********************** TLD Functions *********************
1125 @c **********************************************************
1126 @node TLD Functions
1127 @chapter TLD Functions
1128 @cindex TLD Functions
1129
Organizations that manage some Top Level Domains (@acronym{TLD}s) have
published tables with characters they accept within the domain.  The
reason may be to reduce the complexity that comes from using the full
Unicode range, and to protect themselves from future (backwards
incompatible) changes in the IDN or Unicode specifications.  Libidn
implements an infrastructure for defining and checking strings against
such tables.  Libidn also ships some tables from @acronym{TLD}s that
have given us permission to use them.  Because these tables
are even less static than Unicode or StringPrep tables, it is likely
that they will be updated from time to time (even in backwards
incompatible ways).  The Libidn interface provides a ``version'' field
for each @acronym{TLD} table, which can be compared for equality to
guarantee the same operation over time.
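
Conceptually, checking a string against such a table means testing
every code point for membership in a set of accepted ranges.  The
following sketch shows that idea with a binary search over a sorted
array of inclusive ranges.  The type @code{struct sketch_range} and
both function names are hypothetical and chosen for this example; the
real table layout is defined in @file{tld.h} and may differ.

@verbatim
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one inclusive range of accepted code points.  */
struct sketch_range
{
  uint32_t start;               /* first code point in the range */
  uint32_t end;                 /* last one; equals start if single */
};

/* Return 1 if CP lies in one of the N ranges in TAB, which must be
   sorted in ascending order and must not overlap.  */
int
sketch_range_contains (const struct sketch_range *tab, size_t n,
                       uint32_t cp)
{
  size_t lo = 0, hi = n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (cp < tab[mid].start)
        hi = mid;
      else if (cp > tab[mid].end)
        lo = mid + 1;
      else
        return 1;
    }
  return 0;
}

/* Return the index of the first code point in STR that is not
   accepted by the table, or LEN if every code point is accepted.  */
size_t
sketch_first_rejected (const struct sketch_range *tab, size_t n,
                       const uint32_t *str, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (!sketch_range_contains (tab, n, str[i]))
      return i;
  return len;
}
```
@end verbatim

Returning the position of the first rejected code point, rather than a
plain yes or no, lets the caller point at the offending character in an
error message.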
1143
1144 From a design point of view, you can regard the @acronym{TLD} tables
for IDN as the ``localization'' step that comes after the
1146 ``internationalization'' step provided by the IETF standards.
1147
The TLD functionality relies on up-to-date tables.  The latest version
of Libidn aims to provide these, but tables with unclear copying
1150 conditions, or generally experimental tables, are not included.  Some
1151 such tables can be found at @url{http://tldchk.berlios.de}.
1152
1153 @section Header file @code{tld.h}
1154
1155 To use the functions explained in this chapter, you need to include
1156 the file @file{tld.h} using:
1157
1158 @example
1159 #include <tld.h>
1160 @end example
1161
1162 @c @section Data Types
1163 @c
1164 @c @deftp {Data type} {Tld_table_element} @var{start} @var{end}
1165 @c @example
1166 @c /* Interval of valid code points in the TLD. */
1167 @c struct Tld_table_element
1168 @c @{
1169 @c   uint32_t start;            /* Start of range. */
1170 @c   uint32_t end;              /* End of range, end == start if single. */
1171 @c @};
1172 @c typedef struct Tld_table_element Tld_table_element;
1173 @c @end example
1174 @c This @code{struct} contain the @var{start} and @var{end} positions
1175 @c (inclusive) of a range.  If the range is a single (i.e., starts and
1176 @c ends in the same character), then set @var{end} to the same as
1177 @c @var{start}.  This structure is normally used as an array.
1178 @c @end deftp
1179 @c
1180 @c @deftp {Data type} {Tld_table} @var{name} @var{version} @var{nvalid} @var{valid}
1181 @c @example
1182 @c /* List valid code points in a TLD. */
1183 @c struct Tld_table
1184 @c @{
1185 @c   char *name;                        /* TLD name, e.g., "no". */
1186 @c   char *version;             /* Version string from TLD file. */
1187 @c   size_t nvalid;             /* Number of entries in data. */
1188 @c   Tld_table_element *valid[];        /* Sorted array of valid code points. */
1189 @c @};
1190 @c typedef struct Tld_table Tld_table;
1191 @c @end example
1192 @c In this @code{struct}, the @var{name} field is a string (@samp{char*})
1193 @c indicating the TLD name (e.g., ``no'').  The @var{version} field is a
1194 @c string (@samp{char*}) containing a free form humanly readable string
1195 @c that can be used for equality comparison to compare different versions
1196 @c of the table.  The @var{nvalid} field indicate how many entries there
1197 @c are in @var{valid}, which brings us finally to @var{valid} that
1198 @c contain the actual code points that are valid for this TLD (see
1199 @c @code{Tld_table_element} above).
1200 @c @end deftp
1201
1202 @section Core Functions
1203
1204 @include texi/tld_check_4t.texi
1205 @include texi/tld_check_4tz.texi
1206
1207 @section Utility Functions
1208
1209 @include texi/tld_get_4.texi
1210 @include texi/tld_get_4z.texi
1211 @include texi/tld_get_z.texi
1212 @include texi/tld_get_table.texi
1213 @include texi/tld_default_table.texi
1214
1215 @section High-Level Wrapper Functions
1216
1217 @include texi/tld_check_4.texi
1218 @include texi/tld_check_4z.texi
1219 @include texi/tld_check_8z.texi
1220 @include texi/tld_check_lz.texi
1221
1222 @section Error Handling
1223
1224 @include texi/tld_strerror.texi
1225
1226 @c **********************************************************
1227 @c ********************** PR29 Functions ********************
1228 @c **********************************************************
1229 @node PR29 Functions
1230 @chapter PR29 Functions
1231 @cindex PR29 Functions
1232
1233 A deficiency in the specification of Unicode Normalization Forms has
1234 been found.  The consequence is that some strings can be normalized
1235 into different strings by different implementations.  In other words,
1236 two different implementations may return different output for the same
1237 input (because the interpretation of the specification is
ambiguous). Further, an implementation invoked again on one of the
output strings may return a different string (because one of the
interpretations of the ambiguous specification makes normalization
non-idempotent).  Fortunately, only a select few character sequences
1242 exhibit this problem, and none of them are expected to occur in
1243 natural languages (due to different linguistic uses of the involved
1244 characters).
1245
1246 A full discussion of the problem may be found at:
1247
1248 @url{http://www.unicode.org/review/pr-29.html}
1249
The PR29 functions below allow you to detect the problem sequences.  So
1251 when would you want to use these functions?  For most applications,
1252 such as those using Nameprep for IDN, this is likely only to be an
1253 interoperability problem.  Thus, you may not want to care about it, as
1254 the character sequences will rarely occur naturally.  However, if you
1255 are using a profile, such as SASLPrep, to process authentication
tokens, authorization tokens, or passwords, there is a real danger
1257 that attackers may try to use the peculiarities in these strings to
1258 attack parts of your system.  As only a small number of strings, and
1259 no naturally occurring strings, exhibit this problem, the conservative
1260 approach of rejecting the strings is recommended.  If this approach is
not used, you should instead verify that all parts of your system
that process the tokens and passwords use an NFKC implementation that
produces the same output for the same input.
1264
1265 Technically inclined readers may be interested in knowing more about
1266 the implementation aspects of the PR29 flaw. @xref{PR29 discussion}.
1267
1268 @section Header file @code{pr29.h}
1269
1270 To use the functions explained in this chapter, you need to include
1271 the file @file{pr29.h} using:
1272
1273 @example
1274 #include <pr29.h>
1275 @end example
1276
1277 @section Core Functions
1278
1279 @include texi/pr29_4.texi
1280
1281 @section Utility Functions
1282
1283 @include texi/pr29_4z.texi
1284 @include texi/pr29_8z.texi
1285
1286 @section Error Handling
1287
1288 @include texi/pr29_strerror.texi
1289
1290 @c **********************************************************
1291 @c ***********************  Examples  ***********************
1292 @c **********************************************************
1293 @node Examples
1294 @chapter Examples
1295 @cindex Examples
1296
This chapter contains example code which illustrates how Libidn can
be used when writing your own application.
1299
1300 @menu
1301 * Example 1::           Example using stringprep.
1302 * Example 2::           Example using punycode.
1303 * Example 3::           Example using IDNA ToASCII.
1304 * Example 4::           Example using IDNA ToUnicode.
1305 * Example 5::           Example using TLD checking.
1306 @end menu
1307
1308 @node Example 1
1309 @section Example 1
1310
1311 This example demonstrates how the stringprep functions are used.
1312
1313 @verbatiminclude ../examples/example.c
1314
1315 @node Example 2
1316 @section Example 2
1317
1318 This example demonstrates how the punycode functions are used.
1319
1320 @verbatiminclude ../examples/example2.c
1321
1322 @node Example 3
1323 @section Example 3
1324
1325 This example demonstrates how the library is used to convert
1326 internationalized domain names into ASCII compatible names.
1327
1328 @verbatiminclude ../examples/example3.c
1329
1330 @node Example 4
1331 @section Example 4
1332
1333 This example demonstrates how the library is used to convert ASCII
1334 compatible names to internationalized domain names.
1335
1336 @verbatiminclude ../examples/example4.c
1337
1338 @node Example 5
1339 @section Example 5
1340
1341 This example demonstrates how the library is used to check a string
1342 for invalid characters within a specific TLD.
1343
1344 @verbatiminclude ../examples/example5.c
1345
1346 @c **********************************************************
1347 @c *********************  Invoking idn  *********************
1348 @c **********************************************************
1349 @node Invoking idn
1350 @chapter Invoking idn
1351
1352 @pindex idn
1353 @cindex invoking @command{idn}
1354 @cindex command line
1355
1356 @section Name
1357
1358 GNU Libidn (idn) -- Internationalized Domain Names command line tool
1359
1360 @section Description
1361 @code{idn} allows internationalized string preparation
1362 (@samp{stringprep}), encoding and decoding of punycode data, and IDNA
1363 ToASCII/ToUnicode operations to be performed on the command line.
1364
1365 If strings are specified on the command line, they are used as input
1366 and the computed output is printed to standard output @code{stdout}.
If no strings are specified on the command line, the program reads
data, line by line, from the standard input @code{stdin}, and prints
the computed output to standard output.  What processing is performed
(e.g., ToASCII, or Punycode encode) is indicated by options.  If any
errors are encountered, the execution of the application is aborted.
1372
1373 All strings are expected to be encoded in the preferred charset used
1374 by your locale.  Use @code{--debug} to find out what this charset is.
1375 You can override the charset used by setting environment variable
1376 @code{CHARSET}.
1377
1378 To process a string that starts with @code{-}, for example
1379 @code{-foo}, use @code{--} to signal the end of parameters, as in
1380 @code{idn --quiet -a -- -foo}.
1381
1382 @section Options
@code{idn} recognizes these options:
1384
1385 @verbatim
1386   -h, --help               Print help and exit
1387
1388   -V, --version            Print version and exit
1389
1390   -s, --stringprep         Prepare string according to nameprep profile
1391
1392   -d, --punycode-decode    Decode Punycode
1393
1394   -e, --punycode-encode    Encode Punycode
1395
1396   -a, --idna-to-ascii      Convert to ACE according to IDNA (default)
1397
1398   -u, --idna-to-unicode    Convert from ACE according to IDNA
1399
1400       --allow-unassigned   Toggle IDNA AllowUnassigned flag  (default=off)
1401
1402       --usestd3asciirules  Toggle IDNA UseSTD3ASCIIRules flag  (default=off)
1403
1404   -t, --tld                Check string for TLD specific rules
1405                              Only for --idna-to-ascii and --idna-to-unicode
1406                              (default=on)
1407
1408   -p, --profile=STRING     Use specified stringprep profile instead
1409                              Valid stringprep profiles are `Nameprep',
1410                              `iSCSI', `Nodeprep', `Resourceprep', `trace', and
1411                              `SASLprep'.
1412
1413       --debug              Print debugging information  (default=off)
1414
1415       --quiet              Silent operation  (default=off)
1416 @end verbatim
1417
1418 @section Environment Variables
1419
The @var{CHARSET} environment variable can be used to override which
character set is used to decode incoming data (i.e., on the
command line or on the standard input stream), and to encode data to
1423 the standard output.  If your system is set up correctly, however, the
1424 application will guess which character set is used automatically.
1425 Example usage:
1426
1427 @example
1428 $ CHARSET=ISO-8859-1 idn --punycode-encode
1429 ...
1430 @end example
1431
1432 @section Examples
1433
1434 Standard usage, reading input from standard input:
1435
1436 @example
1437 jas@@latte:~$ idn
1438 libidn 0.3.5
1439 Copyright 2002, 2003 Simon Josefsson.
1440 GNU Libidn comes with NO WARRANTY, to the extent permitted by law.
1441 You may redistribute copies of GNU Libidn under the terms of
1442 the GNU Lesser General Public License.  For more information
1443 about these matters, see the file named COPYING.LIB.
1444 Type each input string on a line by itself, terminated by a newline character.
1445 r@"aksm@"org@aa{}s.se
1446 xn--rksmrgs-5wao1o.se
1447 jas@@latte:~$
1448 @end example
1449
1450 Reading input from command line, and disabling copyright and license
1451 information:
1452
1453 @example
1454 jas@@latte:~$ idn --quiet r@"aksm@"org@aa{}s.se bl@aa{}b@ae{}rgr@o{}d.no
1455 xn--rksmrgs-5wao1o.se
1456 xn--blbrgrd-fxak7p.no
1457 jas@@latte:~$
1458 @end example
1459
1460 Accessing a specific StringPrep profile directly:
1461
1462 @example
1463 jas@@latte:~$ idn --quiet --profile=SASLprep --stringprep te@ss{}t@ordf{}
1464 te@ss{}ta
1465 jas@@latte:~$
1466 @end example
1467
1468 @section Troubleshooting
1469
Getting character data encoded right, and making sure Libidn uses the
same encoding, can be difficult.  The reason for this is that most
systems encode character data in more than one character encoding,
e.g., using @code{UTF-8} together with @code{ISO-8859-1} or
@code{ISO-2022-JP}.  This problem is likely to continue to exist until
only one character encoding comes out as the evolutionary winner, or
(more likely, at least to some extent) forever.
1477
1478 The first step to troubleshooting character encoding problems with
1479 Libidn is to use the @samp{--debug} parameter to find out which
character set encoding @samp{idn} believes your locale uses.
1481
1482 @example
1483 jas@@latte:~$ idn --debug --quiet ""
1484 system locale uses charset `UTF-8'.
1485
1486 jas@@latte:~$
1487 @end example
1488
If it prints @code{ANSI_X3.4-1968} (i.e., @code{US-ASCII}), this
indicates that you have not configured your locale properly.  To configure
1491 the locale, you can, for example, use @samp{LANG=sv_SE.UTF-8; export
1492 LANG} at a @code{/bin/sh} prompt, to set up your locale for a Swedish
1493 environment using @code{UTF-8} as the encoding.
1494
Sometimes @samp{idn} appears to be unable to translate from your system
1496 locale into @code{UTF-8} (which is used internally), and you get an
1497 error like the following:
1498
1499 @example
1500 jas@@latte:~$ idn --quiet foo
1501 idn: could not convert from ISO-8859-1 to UTF-8.
1502 jas@@latte:~$
1503 @end example
1504
1505 The simplest explanation is that you haven't installed the
1506 @samp{iconv} conversion tools.  You can find it as a standalone
1507 library in @acronym{GNU} Libiconv
1508 (@uref{http://www.gnu.org/software/libiconv/}).  On many
1509 @acronym{GNU}/Linux systems, this library is part of the system, but
1510 you may have to install additional packages (e.g., @samp{glibc-locale}
1511 for Debian) to be able to use it.
1512
1513 Another explanation is that the error is correct and you are feeding
1514 @samp{idn} invalid data.  This can happen inadvertently if you are not
1515 careful with the character set encodings you use.  For example, if
your shell runs in an @code{ISO-8859-1} environment, and you invoke
1517 @samp{idn} with the @samp{CHARSET} environment variable as follows,
1518 you will feed it @code{ISO-8859-1} characters but force it to believe
1519 they are @code{UTF-8}.  Naturally this will lead to an error, unless
1520 the byte sequences happen to be parsable as @code{UTF-8}.  Note that
1521 even if you don't get an error, the output may be incorrect in this
situation, because @code{ISO-8859-1} and @code{UTF-8} do not in
1523 general encode the same characters as the same byte sequences.
1524
1525 @example
1526 jas@@latte:~$ idn --quiet --debug ""
1527 system locale uses charset `ISO-8859-1'.
1528
1529 jas@@latte:~$ CHARSET=UTF-8 idn --quiet --debug r@"aksm@"org@aa{}s
1530 system locale uses charset `UTF-8'.
1531 input[0] = U+0072
1532 input[1] = U+4af3
1533 input[2] = U+006d
1534 input[3] = U+1b29e5
1535 input[4] = U+0073
1536 output[0] = U+0078
1537 output[1] = U+006e
1538 output[2] = U+002d
1539 output[3] = U+002d
1540 output[4] = U+0072
1541 output[5] = U+006d
1542 output[6] = U+0073
1543 output[7] = U+002d
1544 output[8] = U+0068
1545 output[9] = U+0069
1546 output[10] = U+0036
1547 output[11] = U+0064
1548 output[12] = U+0035
1549 output[13] = U+0039
1550 output[14] = U+0037
1551 output[15] = U+0035
1552 output[16] = U+0035
1553 output[17] = U+0032
1554 output[18] = U+0061
1555 xn--rms-hi6d597552a
1556 jas@@latte:~$
1557 @end example
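
The odd code points in the debug output above can be reproduced by
hand.  The sketch below decodes bytes as UTF-8 without checking that
the bytes following a lead byte really are continuation bytes.  This
is an assumption about how the values arise, made for illustration;
@code{sketch_lenient_utf8_decode} is a hypothetical name and the code
is not taken from Libidn.

@verbatim
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lenient decoder, not Libidn's: the sequence length is
   taken from the lead byte, and the low six bits of each following
   byte are used without validating it.  Stores at most CAP code
   points in OUT and returns how many were stored.  */
size_t
sketch_lenient_utf8_decode (const unsigned char *in, size_t len,
                            uint32_t *out, size_t cap)
{
  size_t i = 0, n = 0;

  while (i < len && n < cap)
    {
      unsigned char b = in[i];
      size_t extra;
      uint32_t cp;

      if (b < 0x80)
        { cp = b; extra = 0; }
      else if ((b & 0xE0) == 0xC0)
        { cp = b & 0x1F; extra = 1; }
      else if ((b & 0xF0) == 0xE0)
        { cp = b & 0x0F; extra = 2; }
      else if ((b & 0xF8) == 0xF0)
        { cp = b & 0x07; extra = 3; }
      else
        break;                  /* not a usable lead byte */
      if (extra > len - i - 1)
        break;                  /* sequence runs past the input */
      i++;
      while (extra-- > 0)
        cp = (cp << 6) | (in[i++] & 0x3F);
      out[n++] = cp;
    }
  return n;
}
```
@end verbatim

Fed the ten @code{ISO-8859-1} bytes @code{72 E4 6B 73 6D F6 72 67 E5 73},
the sketch yields five code points: U+0072, U+4AF3, U+006D, U+1B29E5
and U+0073.  The byte @code{E4} is read as the start of a three-byte
sequence and swallows the next two letters, and @code{F6} is read as
the start of a four-byte sequence.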
1558
The moral here is to forget about @samp{CHARSET} (configure your
1560 locales properly instead) unless you know what you are doing, and if
1561 you want to use it, do it carefully, after verifying with
1562 @samp{--debug} that you get the desired results.
1563
1564 @node Emacs API
1565 @chapter Emacs API
1566
Included in Libidn are @file{punycode.el} and @file{idna.el}, which
provide an Emacs Lisp API to (a limited set of) the Libidn API.  This
section describes the API.  Currently the IDNA API always sets the
@code{UseSTD3ASCIIRules} flag and clears the @code{AllowUnassigned}
flag; in the future there may be functionality to specify these flags
via the API.
1573
1574 @section Punycode Emacs API
1575
1576 @defvar punycode-program
1577 Name of the GNU Libidn @file{idn} application.  The default is
1578 @samp{idn}.  This variable can be customized.
1579 @end defvar
1580
1581 @defvar punycode-environment
1582 List of environment variable definitions prepended to
1583 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1584 This variable can be customized.
1585 @end defvar
1586
1587 @defvar punycode-encode-parameters
1588 List of parameters passed to @var{punycode-program} to invoke punycode
1589 encoding mode.  The default is @samp{("--quiet" "--punycode-encode")}.
1590 This variable can be customized.
1591 @end defvar
1592
1593 @defvar punycode-decode-parameters
1594 Parameters passed to @var{punycode-program} to invoke punycode
1595 decoding mode.  The default is @samp{("--quiet" "--punycode-decode")}.
1596 This variable can be customized.
1597 @end defvar
1598
1599 @defun punycode-encode string
1600 Returns a Punycode encoding of the @var{string}, after converting the
1601 input into UTF-8.
1602 @end defun
1603
1604 @defun punycode-decode string
1605 Returns a possibly multibyte string which is the decoding of the
1606 @var{string} which is a punycode encoded string.
1607 @end defun
1608
1609 @section IDNA Emacs API
1610
1611 @defvar idna-program
1612 Name of the GNU Libidn @file{idn} application.  The default is
1613 @samp{idn}.  This variable can be customized.
1614 @end defvar
1615
1616 @defvar idna-environment
1617 List of environment variable definitions prepended to
1618 @samp{process-environment}.  The default is @samp{("CHARSET=UTF-8")}.
1619 This variable can be customized.
1620 @end defvar
1621
1622 @defvar idna-to-ascii-parameters
1623 List of parameters passed to @var{idna-program} to invoke IDNA ToASCII
1624 mode.  The default is @samp{("--quiet" "--idna-to-ascii"
1625 "--usestd3asciirules")}.  This variable can be customized.
1626 @end defvar
1627
1628 @defvar idna-to-unicode-parameters
Parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
1630 The default is @samp{("--quiet" "--idna-to-unicode"
1631 "--usestd3asciirules")}.  This variable can be customized.
1632 @end defvar
1633
1634 @defun idna-to-ascii string
1635 Returns an ASCII Compatible Encoding (ACE) of the string computed by
1636 the IDNA ToASCII operation on the input @var{string}, after converting
1637 the input to UTF-8.
1638 @end defun
1639
1640 @defun idna-to-unicode string
1641 Returns a possibly multibyte string which is the output of the IDNA
1642 ToUnicode operation computed on the input @var{string}.
1643 @end defun
1644
1645 @node Java API
1646 @chapter Java API
1647
1648 Libidn has been ported to the Java programming language, and as a
1649 consequence most of the API is available to native Java applications.
This section contains notes on this support; complete documentation is
pending.
1652
1653 The Java library, if Libidn has been built with Java support
1654 (@pxref{Downloading and Installing}), will be placed in
1655 @file{java/libidn.jar}.  The source code is located in
1656 @file{java/gnu/inet/encoding/}.
1657
1658 @section Overview
1659
1660 This package provides a Java implementation of the Internationalized
1661 Domain Names in Applications (IDNA) standard. It is written entirely
1662 in Java and does not require any additional libraries to be set up.
1663
1664 The gnu.inet.encoding.IDNA class offers two public functions, toASCII
1665 and toUnicode which can be used as follows:
1666
1667 @example
1668 gnu.inet.encoding.IDNA.toASCII("bl@"ods.z@"ug");
1669 gnu.inet.encoding.IDNA.toUnicode("xn--blds-6qa.xn--zg-xka");
1670 @end example
1671
1672 @section Miscellaneous Programs
1673
1674 The @file{misc/} directory contains several programs that are related
1675 to the Java part of GNU Libidn, but that don't need to be included in
1676 the main source tree.
1677
1678 @subsection GenerateRFC3454
1679
1680 This program parses RFC3454 and creates the RFC3454.java program that
1681 is required during the StringPrep phase.
1682
1683 The RFC can be found at various locations, for example at
1684 @url{http://www.ietf.org/rfc/rfc3454.txt}.
1685
1686 Invoke the program as follows:
1687
1688 @example
1689 $ java GenerateRFC3454
1690 Creating RFC3454.java... Ok.
1691 @end example
1692
1693 @subsection GenerateNFKC
1694
1695 The GenerateNFKC program parses the Unicode character database files
1696 and generates all the tables required for NFKC.  It requires the two
1697 files @file{UnicodeData.txt} and @file{CompositionExclusions.txt} from
1698 version 3.2 of the Unicode data files.  Note that RFC3454 (Stringprep)
1699 specifies that Unicode version 3.2 is to be used, not the latest version.
1700
1701 The Unicode data files can be found at
1702 @url{http://www.unicode.org/Public/}.
1703
1704 Invoke the program as follows:
1705
1706 @example
1707 $ java GenerateNFKC
1708 Creating CombiningClass.java... Ok.
1709 Creating DecompositionKeys.java... Ok.
1710 Creating DecompositionMappings.java... Ok.
1711 Creating Composition.java... Ok.
1712 @end example
1713
1714 @subsection TestIDNA
1715
1716 The TestIDNA program lets you test the IDNA implementation manually
1717 or against Simon Josefsson's test vectors.
1718
1719 The test vectors can be found at the Libidn homepage,
1720 @url{http://www.gnu.org/software/libidn/}.
1721
1722 To test the transformation manually, use:
1723
1724 @example
1725 $ java -cp .:../libidn.jar TestIDNA -a <string to test>
1726 Input: <string to test>
1727 Output: <toASCII(string to test)>
1728 $ java -cp .:../libidn.jar TestIDNA -u <string to test>
1729 Input: <string to test>
1730 Output: <toUnicode(string to test)>
1731 @end example
1732
1733 To test against draft-josefsson-idn-test-vectors.html, use:
1734
1735 @example
1736 $ java -cp .:../libidn.jar TestIDNA -t
1737 No errors detected!
1738 @end example
1739
1740 @subsection TestNFKC
1741
1742 The TestNFKC program lets you test the NFKC implementation manually
1743 or against the NormalizationTest.txt file from the Unicode data files.
1744
1745 To test the normalization manually, use:
1746
1747 @example
1748 $ java -cp .:../libidn.jar TestNFKC <string to test>
1749 Input: <string to test>
1750 Output: <nfkc version of the string to test>
1751 @end example
1752
1753 To test against NormalizationTest.txt:
1754
1755 @example
1756 $ java -cp .:../libidn.jar TestNFKC
1757 No errors detected!
1758 @end example
1759
1760 @section Possible Problems
1761
1762 Beware of Bugs: This Java API needs a lot more testing, especially
1763 with ``exotic'' character sets.  While it works for me, it may not work
1764 for you.
1765
1766 Encoding of your Java sources: If you are using non-ASCII characters
1767 in your Java source code, make sure javac compiles your programs with
1768 the correct encoding.  If necessary, specify the encoding using the
1769 @option{-encoding} option.
1770
1771 Java Unicode handling: Java 1.4 only handles 16-bit Unicode code
1772 points (i.e., characters in the Basic Multilingual Plane).  This
1773 implementation therefore ignores all references to so-called
1774 Supplementary Characters (U+10000 to U+10FFFF).  Starting from Java
1775 1.5, these characters will also be supported by Java, but this will
1776 require changes to this library.  See also the next section.
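
On the source encoding point above, one way to sidestep the problem is
to keep source files pure ASCII and write non-ASCII characters as
Unicode escapes, which javac reads the same way under any
@option{-encoding} setting.  The following is a minimal sketch; the
class name and the helper method are hypothetical.

@verbatim
public class AsciiOnlySource {
    // U+00F6 (o with diaeresis) and U+00FC (u with diaeresis) are
    // written as escapes, so the file itself contains only ASCII.
    static final String NAME = "bl\u00F6ds.z\u00FCg";

    // True if every char of s lies in the ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(NAME.length());        // number of chars
        System.out.println(isAscii(NAME));        // string is not ASCII
        System.out.println((int) NAME.charAt(2)); // numeric value of U+00F6
    }
}
@end verbatim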
1777
1778 @section A Note on Java and Unicode
1779
1780 This library uses Java's builtin @code{char} datatype.  Up to Java 1.4,
1781 this datatype only supports 16-bit Unicode code points, that is, the
1782 Basic Multilingual Plane.  For this reason, this library doesn't work
1783 for Supplementary Characters (i.e., characters from U+10000 to
1784 U+10FFFF).  All references to such characters are silently ignored.
1785
1786 Starting from Java 1.5, Supplementary Characters will also be
1787 supported.  However, this will require changes in the present version
1788 of the library.  Java 1.5 is currently in beta status.
1789
1790 For more information, refer to the documentation of
1791 @code{java.lang.Character} in the JDK API.
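
The 16-bit limitation described above can be observed with the code
point methods that Java 1.5 added to @code{String} and
@code{Character}.  The sketch below is illustrative only and is not
part of the library; the class name and helper method are hypothetical.
U+10400 is a Supplementary Character, so it is stored as two
@code{char} values (a surrogate pair).

@verbatim
public class SupplementaryDemo {
    // Number of 16-bit char units needed to store one code point.
    static int charUnits(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        int cp = 0x10400;
        String s = new String(Character.toChars(cp));

        System.out.println(s.length());                       // char units
        System.out.println(s.codePointCount(0, s.length()));  // code points
        System.out.println(Character.isSupplementaryCodePoint(cp));
        System.out.println(Integer.toHexString(s.charAt(0))); // high surrogate
        System.out.println(Integer.toHexString(s.charAt(1))); // low surrogate
    }
}
@end verbatim

Code that walks a string one @code{char} at a time, as this library
does, sees the two surrogates rather than the single code point.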
1792
1793 @node C# API
1794 @chapter C# API
1795
1796 The Libidn library has been ported to the C# language.  The port
1797 resides in the top-level @file{csharp/} directory.  Currently, no
1798 further documentation about the implementation or the API is
1799 available.  However, the C# port was based on the Java port, and the
1800 API is exactly the same as in the Java version.  The help files for
1801 the Java API may thus be useful.
1802
1803 @c **********************************************************
1804 @c *******************  Acknowledgements  *******************
1805 @c **********************************************************
1806 @node Acknowledgements
1807 @chapter Acknowledgements
1808
1809 The punycode implementation was taken from the IETF IDN Punycode
1810 specification, by Adam M. Costello.  The TLD code was contributed by
1811 Thomas Jacob.  The Java implementation was contributed by Oliver Hitz.
1812 The C# implementation was contributed by Alexander Gnauck.  The
1813 Unicode tables were provided by Unicode, Inc.  Some functions for
1814 dealing with Unicode (see nfkc.c and toutf8.c) were borrowed from
1815 GLib, downloaded from @url{http://www.gtk.org/}.  The manual borrowed
1816 text from Libgcrypt by Werner Koch.
1817
1818 Inspiration for many things that, consciously or not, have gone into
1819 this package is due to a number of free software packages that the
1820 author has been exposed to.  The author wishes to acknowledge the free
1821 software community in general, for setting an example of the role
1822 software development can play in modern society.
1823
1824 Several people reported bugs, sent patches or suggested improvements,
1825 see the file THANKS in the top-level directory of the source code.
1826
1827 @c **********************************************************
1828 @c **********************  Milestones  **********************
1829 @c **********************************************************
1830 @node Milestones
1831 @chapter Milestones
1832
1833 The complete history of user visible changes is stored in the file
1834 @file{NEWS} in the top-level directory of the source code tree.  The
1835 complete history of modifications to each file is stored in the file
1836 @file{ChangeLog} in the same directory.  This section contains a
1837 condensed version of that information, in the form of ``milestones''
1838 for the project.
1839
1840 @table @asis
1841 @item Stringprep implementation.
1842 Version 0.0.0 released on 2002-11-05.
1843
1844 @item IDNA and Punycode implementations, part of the GNU project.
1845 Version 0.1.0 released on 2003-01-05.
1846
1847 @item Uses official IDNA ACE prefix 'xn--'.
1848 Version 0.1.7 released on 2003-02-12.
1849
1850 @item Command line interface.
1851 Version 0.1.11 released on 2003-02-26.
1852
1853 @item GNU Libc add-on proposed.
1854 Version 0.1.12 released on 2003-03-06.
1855
1856 @item Interoperability testing during IDNConnect.
1857 Version 0.3.1 released on 2003-10-02.
1858
1859 @item TLD restriction testing.
1860 Version 0.4.0 released on 2004-02-28.
1861
1862 @item GNU Libc add-on integrated.
1863 Version 0.4.1 released on 2004-03-08.
1864
1865 @item Native Java implementation.
1866 Versions 0.4.2 through 0.4.9, released between 2004-03-20 and 2004-06-11.
1867
1868 @item PR-29 functions for ``problem sequences''.
1869 Version 0.5.0 released on 2004-06-26.
1870
1871 @item Many small portability fixes and wider use.
1872 Versions 0.5.1 through 0.5.20, released between 2004-07-09 and
1873 2005-10-23.
1874
1875 @item Native C# implementation.
1876 Version 0.6.0 released on 2005-12-03.
1877
1878 @end table
1879
1880 @node Concept Index
1881 @unnumbered Concept Index
1882
1883 @printindex cp
1884
1885 @node Function and Variable Index
1886 @unnumbered Function and Variable Index
1887
1888 @printindex fn
1889
1890 @node PR29 discussion
1891 @appendix PR29 discussion
1892
1893 If you wish to experiment with a modified Unicode NFKC implementation
1894 according to the PR29 proposal, you may find the following bug report
1895 useful.  However, I have not verified that the suggested modifications
1896 are correct.  For reference, I'm including my response to the report
1897 as well.
1898
1899 @verbatim
1900 From: Rick McGowan <rick@unicode.org>
1901 Subject: Possible bug and status of PR 29 change(s)
1902 To: bug-libidn@gnu.org
1903 Date: Wed, 27 Oct 2004 14:49:17 -0700
1904
1905 Hello. On behalf of the Unicode Consortium editorial committee, I would
1906 like to find out more information about the PR 29 fixes, if any, and
1907 functions in Libidn. Your implementation was listed in the text of PR29 as
1908 needing investigation, so I am following up on several implementations.
1909
1910 The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
1911 draft of UAX #15 has been issued.
1912
1913 I have looked at Libidn 0.5.8 (today), and there may still be a possible
1914 bug in NFKC.java and nfkc.c.
1915
1916 ------------------------------------------------------
1917
1918 1. In NFKC.java, this line in canonicalOrdering():
1919
1920       if (i > 0 && (last_cc == 0 || last_cc != cc)) {
1921
1922 should perhaps be changed to:
1923
1924       if (i > 0 && (last_cc == 0 || last_cc < cc)) {
1925
1926 but I'm not sure of the sense of this comparison.
1927
1928 ------------------------------------------------------
1929
1930 2. In nfkc.c, function _g_utf8_normalize_wc() has this code:
1931
1932           if (i > 0 &&
1933               (last_cc == 0 || last_cc != cc) &&
1934               combine (wc_buffer[last_start], wc_buffer[i],
1935                        &wc_buffer[last_start]))
1936             {
1937
1938 This appears to have the same bug as the current Python implementation (in
1939 Python 2.3.4). The code should be checking, as per new rule D2 UAX #15
1940 update, that the next combining character is the same or HIGHER than the
1941 current one. It now checks to see if it's non-zero and not equal.
1942
1943 The above line(s) should perhaps be changed to:
1944
1945           if (i > 0 &&
1946               (last_cc == 0 || last_cc < cc) &&
1947               combine (wc_buffer[last_start], wc_buffer[i],
1948                        &wc_buffer[last_start]))
1949             {
1950
1951 but I'm not sure of the sense of the comparison (< or > or <=?) here.
1952
1953 In the text of PR29, I will be marking Libidn as "needs change" and adding
1954 the version number that I checked. If any further change is made, please
1955 let me know the release version, and I'll update again.
1956
1957 Regards,
1958         Rick McGowan
1959 @end verbatim
1960
1961 @verbatim
1962 From: Simon Josefsson <jas@extundo.com>
1963 Subject: Re: Possible bug and status of PR 29 change(s)
1964 To: Rick McGowan <rick@unicode.org>
1965 Cc: bug-libidn@gnu.org
1966 Date: Thu, 28 Oct 2004 09:47:47 +0200
1967
1968 Rick McGowan <rick@unicode.org> writes:
1969
1970 > Hello. On behalf of the Unicode Consortium editorial committee, I would
1971 > like to find out more information about the PR 29 fixes, if any, and
1972 > functions in Libidn. Your implementation was listed in the text of PR29 as
1973 > needing investigation, so I am following up on several implementations.
1974 >
1975 > The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
1976 > draft of UAX #15 has been issued.
1977 >
1978 > I have looked at Libidn 0.5.8 (today), and there may still be a possible
1979 > bug in NFKC.java and nfkc.c.
1980
1981 Hello Rick.
1982
1983 I believe the current behavior is intentional.  Libidn do not aim to
1984 implement latest-and-greatest NFKC, it aim to implement the NFKC
1985 functionality required for StringPrep and IDN.  As you may know,
1986 StringPrep/IDN reference Unicode 3.2.0, and explicitly says any later
1987 changes (which I consider PR29 as) do not apply.
1988
1989 In fact, I believe that would I incorporate the changes suggested in
1990 PR29, I would in fact be violating the IDN specifications.
1991
1992 Thanks for looking into the code and finding the place where the
1993 change could be made.  I'll see if I can mention this in the manual
1994 somewhere, for technically interested readers.
1995
1996 Regards,
1997 Simon
1998 @end verbatim
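
The two conditions discussed in this exchange can be compared in
isolation.  The sketch below is not Libidn code: the class and method
names are hypothetical, and it only evaluates the current test
(@code{last_cc == 0 || last_cc != cc}) and the suggested one
(@code{last_cc == 0 || last_cc < cc}) for a given pair of combining
classes.  The sample values assume the sequence U+0B47 U+0300 U+0B3E,
taken here to be a PR29 problem sequence, in which U+0300 has combining
class 230 and U+0B3E has combining class 0.

@verbatim
public class Pr29Condition {
    // Condition as quoted from the current code in the report.
    static boolean currentTest(int lastCc, int cc) {
        return lastCc == 0 || lastCc != cc;
    }

    // Condition as suggested in the report.
    static boolean suggestedTest(int lastCc, int cc) {
        return lastCc == 0 || lastCc < cc;
    }

    public static void main(String[] args) {
        int lastCc = 230; // assumed combining class of U+0300
        int cc = 0;       // assumed combining class of U+0B3E

        // The current test lets the pair reach the composition step.
        System.out.println(currentTest(lastCc, cc));
        // The suggested test does not.
        System.out.println(suggestedTest(lastCc, cc));
    }
}
@end verbatim

For pairs such as (0, 230), (220, 230) or (230, 230) both tests give the
same answer; they differ only when the earlier class is non-zero and
greater than the later one.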
1999
2000 @include lgpl.texi
2001
2002 @node Copying This Manual
2003 @appendix Copying This Manual
2004
2005 @menu
2006 * GNU Free Documentation License::  License for copying this manual.
2007 @end menu
2008
2009 @include fdl.texi
2010
2011 @bye