Libidn README -- Important introductory notes.
See the end for copying conditions.

GNU Libidn is an implementation of the IETF Stringprep, Nameprep,
Punycode and IDNA specifications, licensed under the LGPL.  See
ANNOUNCE for an overview.

Currently the API includes the following definitions and functions:

Header file: #include <stringprep.h>

     Main Stringprep API header file.

Header file: #include <stringprep_nameprep.h>
Header file: #include <stringprep_kerberos5.h>

     Convenience header files for the stringprep_nameprep* and
     stringprep_kerberos5 macros (see below).

Function: int stringprep (char *in, int maxlen, int flags,
                          Stringprep_profile * profile);

     Perform stringprep on a zero-terminated UTF-8 string.  Since the
     stringprep operation can expand the string, maxlen indicates how
     large the buffer holding the string is.  See below for valid
     flags options.  The profile specifies the processing details; see
     the profile header files stringprep_generic.h and
     stringprep_nameprep.h for two examples.  Your application can
     define new profiles, possibly re-using the generic stringprep
     tables that will always be part of the library.  Note that you
     must convert strings entered in the system's locale into UTF-8
     before using this function.

Macro: int stringprep_nameprep(char *in, int maxlen)
Macro: int stringprep_nameprep_no_unassigned(char *in, int maxlen)
Macro: int stringprep_kerberos5(char *in, int maxlen)

     Shorthand macros for applying Nameprep with
     AllowUnassigned=TRUE, Nameprep with AllowUnassigned=FALSE and
     Kerberos 5 stringprep profiles to strings, respectively.

Macro: STRINGPREP_VERSION

     C preprocessor definition: a string containing the version of
     the stringprep header file.

Function: extern char *stringprep_check_version (char *req_version);

     Check that the version of the library is at least the one given
     as a string in REQ_VERSION and return the actual version string
     of the library; return NULL if the condition is not met.  If
     `NULL' is passed to this function, no check is done and only the
     version string is returned.
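
     As an illustration only, a minimum-version check of this kind
     can be sketched as a comparison of dot-separated numeric
     components.  The name check_version_sketch and its two-argument
     form are hypothetical and not part of Libidn; the real function
     compares REQ_VERSION against the library's own version string,
     and its exact comparison rules are not described here.

```c
#include <stdlib.h>
#include <string.h>

/* Parse the next dot-separated numeric component of a version string
   and advance *s past it.  A missing or non-numeric component counts
   as 0 and ends parsing of that string. */
static long next_component(const char **s)
{
    char *end;
    long value = strtol(*s, &end, 10);

    if (end == *s)              /* no digits here: skip to the end */
        end += strlen(end);
    else if (*end == '.')       /* step over the separator */
        end++;
    *s = end;
    return value;
}

/* Hypothetical stand-in: return `actual` if it is at least
   `req_version` (or if req_version is NULL), otherwise NULL. */
const char *check_version_sketch(const char *actual,
                                 const char *req_version)
{
    const char *a = actual, *r = req_version;

    if (req_version == NULL)
        return actual;

    while (*a != '\0' || *r != '\0') {
        long av = next_component(&a);
        long rv = next_component(&r);

        if (av != rv)
            return av > rv ? actual : NULL;
    }
    return actual;              /* all components equal */
}
```

     An application would typically make such a check once at startup
     and refuse to continue if the result is NULL.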

Type: STRINGPREP_NO_NFKC
Type: STRINGPREP_NO_BIDI
Type: STRINGPREP_NO_UNASSIGNED

     Valid options to the FLAGS parameter of stringprep().
     STRINGPREP_NO_NFKC disables NFKC normalization and selects the
     non-NFKC case folding tables.  STRINGPREP_NO_BIDI disables the
     BIDI step.  STRINGPREP_NO_UNASSIGNED causes stringprep() to
     abort with an error if the string contains characters that are
     unassigned according to the profile.  Usually the profile
     specifies BIDI and NFKC settings.

Header file: #include <punycode.h>

     Main Punycode API header file.

Function: int punycode_encode (size_t input_length,
                               const unsigned long input[],
                               const unsigned char case_flags[],
                               size_t * output_length, char output[]);

     punycode_encode() converts Unicode to Punycode.  The input is
     represented as an array of Unicode code points (not code units;
     surrogate pairs are not allowed), and the output will be
     represented as an array of ASCII code points.  The output string
     is *not* null-terminated; it will contain zeros if and only if
     the input contains zeros.  (Of course the caller can leave room
     for a terminator and add one if needed.)  The input_length is the
     number of code points in the input.  The output_length is an
     in/out argument: the caller passes in the maximum number of code
     points that it can receive, and on successful return it will
     contain the number of code points actually output.  The
     case_flags array holds input_length boolean values, where nonzero
     suggests that the corresponding Unicode character be forced to
     uppercase after being decoded (if possible), and zero suggests
     that it be forced to lowercase (if possible).  ASCII code points
     are encoded literally, except that ASCII letters are forced to
     uppercase or lowercase according to the corresponding uppercase
     flags.  If case_flags is a null pointer then ASCII letters are
     left as they are, and other code points are treated as if their
     uppercase flags were zero.  The return value can be any of the
     punycode_status values defined in punycode.h except
     punycode_bad_input; if not punycode_success, then output_length
     and output might contain garbage.
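
     To make the calling convention concrete, here is a minimal
     sketch of a Punycode encoder following the procedure in RFC
     3492.  It is not the library's implementation: the name
     puny_encode_sketch and its 0/1 return codes are hypothetical,
     case_flags handling is left out, and the overflow checks a
     production encoder needs are omitted.  It does mirror the in/out
     use of the output length described above.

```c
#include <stddef.h>

/* Punycode parameters from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z', '0'..'9'. */
static char encode_digit(unsigned long d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation, RFC 3492 section 6.1. */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           int first)
{
    unsigned long k = 0;

    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical sketch: returns 0 on success, 1 if out[] is too small.
   *out_len is the capacity on entry and the length used on success. */
int puny_encode_sketch(size_t in_len, const unsigned long in[],
                       size_t *out_len, char out[])
{
    unsigned long n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t h, b, j, o = 0, max_out = *out_len;

    /* Copy the basic (ASCII) code points literally. */
    for (j = 0; j < in_len; j++)
        if (in[j] < 0x80) {
            if (o >= max_out) return 1;
            out[o++] = (char)in[j];
        }
    h = b = o;                  /* h: code points handled so far */
    if (b > 0) {
        if (o >= max_out) return 1;
        out[o++] = '-';         /* delimiter after the basic part */
    }

    while (h < in_len) {
        /* Find the smallest code point >= n not yet handled. */
        unsigned long m = (unsigned long)-1;
        for (j = 0; j < in_len; j++)
            if (in[j] >= n && in[j] < m) m = in[j];
        delta += (m - n) * (h + 1);
        n = m;
        for (j = 0; j < in_len; j++) {
            if (in[j] < n) delta++;
            if (in[j] == n) {
                /* Emit delta as a variable-length integer. */
                unsigned long q = delta, k, t;
                for (k = BASE; ; k += BASE) {
                    t = k <= bias ? TMIN
                      : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (o >= max_out) return 1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o >= max_out) return 1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    *out_len = o;
    return 0;
}
```

     Encoding the six code points of "bücher" (0x62 0xFC 0x63 0x68
     0x65 0x72) with this sketch yields the nine ASCII characters
     bcher-kva, without a terminating zero.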

Function: int punycode_decode (size_t input_length,
                               const char input[],
                               size_t * output_length,
                               unsigned long output[],
                               unsigned char case_flags[]);

     punycode_decode() converts Punycode to Unicode.  The input is
     represented as an array of ASCII code points, and the output will
     be represented as an array of Unicode code points.  The
     input_length is the number of code points in the input.  The
     output_length is an in/out argument: the caller passes in the
     maximum number of code points that it can receive, and on
     successful return it will contain the actual number of code
     points output.  The case_flags array needs room for at least
     output_length values, or it can be a null pointer if the case
     information is not needed.  A nonzero flag suggests that the
     corresponding Unicode character be forced to uppercase by the
     caller (if possible), while zero suggests that it be forced to
     lowercase (if possible).  ASCII code points are output already in
     the proper case, but their flags will be set appropriately so
     that applying the flags would be harmless.  The return value can
     be any of the punycode_status values defined in punycode.h; if
     not punycode_success, then output_length, output, and case_flags
     might contain garbage.  On success, the decoder will never need
     to write an output_length greater than input_length, because of
     how the encoding is defined.
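
     The decoding direction can be sketched in the same spirit,
     again following RFC 3492.  The name puny_decode_sketch and its
     return codes (0 success, 1 bad input, 2 output too small) are
     hypothetical, case_flags are not produced, and overflow checks
     are omitted, so treat this as a reading aid rather than a
     replacement for punycode_decode().

```c
#include <stddef.h>

/* Punycode parameters from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map 'a'..'z' / 'A'..'Z' to 0..25 and '0'..'9' to 26..35;
   return BASE for anything else. */
static unsigned long decode_digit(int c)
{
    if (c >= '0' && c <= '9') return (unsigned long)(c - '0' + 26);
    if (c >= 'a' && c <= 'z') return (unsigned long)(c - 'a');
    if (c >= 'A' && c <= 'Z') return (unsigned long)(c - 'A');
    return BASE;
}

/* Bias adaptation, RFC 3492 section 6.1. */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           int first)
{
    unsigned long k = 0;

    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical sketch.  *out_len is the capacity on entry and the
   number of code points written on success. */
int puny_decode_sketch(size_t in_len, const char in[],
                       size_t *out_len, unsigned long out[])
{
    unsigned long n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
    size_t b = 0, j, pos, o = 0, max_out = *out_len;

    /* Everything before the last '-' is literal ASCII. */
    for (j = 0; j < in_len; j++)
        if (in[j] == '-') b = j;
    if (b > max_out) return 2;
    for (j = 0; j < b; j++) {
        if ((unsigned char)in[j] >= 0x80) return 1;
        out[o++] = (unsigned char)in[j];
    }

    /* Each variable-length integer describes one insertion. */
    for (pos = b > 0 ? b + 1 : 0; pos < in_len; o++) {
        unsigned long oldi = i, w = 1, k, t, digit;

        for (k = BASE; ; k += BASE) {
            if (pos >= in_len) return 1;
            digit = decode_digit(in[pos++]);
            if (digit >= BASE) return 1;
            i += digit * w;
            t = k <= bias ? TMIN
              : k >= bias + TMAX ? TMAX : k - bias;
            if (digit < t) break;
            w *= BASE - t;
        }
        bias = adapt(i - oldi, o + 1, oldi == 0);
        n += i / (o + 1);
        i %= o + 1;
        if (o >= max_out) return 2;
        /* Insert code point n at position i, shifting the tail. */
        for (j = o; j > i; j--) out[j] = out[j - 1];
        out[i++] = n;
    }
    *out_len = o;
    return 0;
}
```

     Decoding the nine characters bcher-kva with this sketch gives
     back the six code points 0x62 0xFC 0x63 0x68 0x65 0x72, which is
     consistent with the remark that the output never needs more
     entries than the input has characters.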

Header file: #include <idna.h>

     Main IDNA API header file.

Function: int idna_to_ascii (const unsigned long *in, size_t inlen,
                             char *out,
                             int allowunassigned, int usestd3asciirules);

Function: int idna_to_unicode (const unsigned long *in, size_t inlen,
                               unsigned long *out, size_t *outlen,
                               int allowunassigned, int usestd3asciirules);

The library also contains the following utility functions:

Function: int stringprep_unichar_to_utf8 (long c, char *outbuf);
Function: long stringprep_utf8_to_unichar (const char *p);

     Convert between Unicode (UCS4) and UTF-8, one character only.
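
     For readers unfamiliar with the encoding, converting a single
     code point to and from UTF-8 amounts to the bit shuffling
     sketched below.  The names utf8_encode_sketch and
     utf8_decode_sketch are hypothetical, the return conventions are
     an assumption (they need not match the Libidn functions), and
     the decoder trusts its input to be well-formed UTF-8.

```c
/* Encode one code point as UTF-8 into outbuf (room for 4 bytes).
   Returns the number of bytes written, or 0 for a surrogate or a
   value above U+10FFFF. */
int utf8_encode_sketch(unsigned long c, char *outbuf)
{
    unsigned char *p = (unsigned char *)outbuf;

    if (c < 0x80) {
        p[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        p[0] = (unsigned char)(0xC0 | (c >> 6));
        p[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;               /* surrogates are not characters */
    if (c < 0x10000) {
        p[0] = (unsigned char)(0xE0 | (c >> 12));
        p[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c < 0x110000) {
        p[0] = (unsigned char)(0xF0 | (c >> 18));
        p[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        p[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        p[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode the first code point of a well-formed UTF-8 sequence.
   No validation is performed. */
unsigned long utf8_decode_sketch(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (p[0] < 0x80)
        return p[0];
    if ((p[0] & 0xE0) == 0xC0)
        return ((p[0] & 0x1FUL) << 6) | (p[1] & 0x3FUL);
    if ((p[0] & 0xF0) == 0xE0)
        return ((p[0] & 0x0FUL) << 12) | ((p[1] & 0x3FUL) << 6)
             | (p[2] & 0x3FUL);
    return ((p[0] & 0x07UL) << 18) | ((p[1] & 0x3FUL) << 12)
         | ((p[2] & 0x3FUL) << 6) | (p[3] & 0x3FUL);
}
```

     For example, U+00FC encodes to the two bytes 0xC3 0xBC, and
     decoding those bytes gives 0xFC again.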

Function: long *stringprep_utf8_to_ucs4 (const char *str, int len,
                                         int *items_written);
Function: char *stringprep_ucs4_to_utf8 (const long * str, int len,
                                         int * items_read, int * items_written);

     Convert between Unicode (UCS4) and UTF-8, zero-terminated strings.

Function: char *stringprep_utf8_nfkc_normalize (const char *str, int len);
Function: long *stringprep_ucs4_nfkc_normalize (long *str, int len);

     Perform NFKC normalization on strings.

Function: const char *stringprep_locale_charset ();
Function: char *stringprep_convert (const char *str,
                                    const char *to_codeset,
                                    const char *from_codeset);
Function: char *stringprep_locale_to_utf8 (const char *str);

     Convert strings between character sets.
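
     Character set conversion of this kind is commonly built on the
     POSIX iconv interface, which the platform notes in this file
     suggest is what Libidn relies on as well; that is an assumption,
     not something stated here.  The sketch below shows the general
     pattern with a hypothetical function convert_sketch.  It uses a
     fixed-size output guess and does not flush shift state, so it is
     only suitable for simple stateless codesets.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: convert a zero-terminated string from
   from_codeset to to_codeset.  Returns a malloc'ed string that the
   caller must free, or NULL on any failure. */
char *convert_sketch(const char *str, const char *to_codeset,
                     const char *from_codeset)
{
    iconv_t cd = iconv_open(to_codeset, from_codeset);
    size_t in_left = strlen(str);
    /* Guess: at most 4 output bytes per input byte.  A fuller
       version would grow the buffer when iconv reports E2BIG. */
    size_t out_size = in_left * 4 + 1;
    size_t out_left = out_size - 1;
    char *in_ptr = (char *)str;  /* iconv does not modify the input */
    char *out, *out_ptr;

    if (cd == (iconv_t)-1)
        return NULL;            /* conversion not supported */

    out = out_ptr = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left)
        == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

     With this sketch, converting the single ISO-8859-1 byte 0xFC to
     UTF-8 produces the two bytes 0xC3 0xBC, provided the system's
     iconv supports both codeset names.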

Libidn has at some point passed its self tests on the following
systems, but there are no guarantees:

  - alphaev67-dec-osf5.0 (Tru64 UNIX C, Tru64 make -- iconv failed!)
  - i686-pc-linux-gnu (Debian Sid, iconv ok)
  - i686-pc-linux-gnu (RedHat 7.2, iconv ok)
  - mips-sgi-irix6.5 (MIPS C compiler, IRIX make, iconv ok)
  - rs6000-ibm-aix4.3.2.0 (GCC 2.9-aix43-000718, GNU make, iconv failed!)
  - rs6000-ibm-aix4.3.2.0 (IBM C for AIX compiler, AIX make, iconv failed!)
  - sparc-sun-solaris2.6 (Sun WorkShop Compiler C 5.0, non-GNU make)
  - sparc-sun-solaris2.8 (Sun WorkShop Compiler C 6.0U2, SUN make, iconv ok)
  - sparc-sun-solaris2.8 (GCC 3.1, GNU make, iconv ok)
  - ... and over 10 other Unix systems, including Cygwin.

Things left to do are listed below.  If you would like to start
working on any of them, please let me know so that duplicated work
can be avoided.

  - Optimize stringprep, the table searching is slow (but does it matter?).
  - Port applications to use libidn.
  - Include more stringprep profiles.
  - Add texi documentation.
  - Implement IDNA tools?  Is there more?

Before it becomes a FAQ: This library does not link with GLIB for the
UTF-8 functions for two reasons.  First, GLIB does not provide
versioning of the Unicode tables (and the developers said it will not
be added either), and this package must know the Unicode version used.
Second, GLIB requires some things (e.g., threads) that would make
this package less portable.

For more information see <URL:http://josefsson.org/libidn/>.

Send all bug reports by electronic mail to bug-libidn@josefsson.org.

----------------------------------------------------------------------
Copyright (C) 2002 Simon Josefsson

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.