Libidn README -- Important introductory notes.
See the end for copying conditions.

GNU Libidn is an implementation of the IETF Stringprep, Nameprep,
Punycode and IDNA specifications, licensed under the LGPL.  See
ANNOUNCE for an overview.
Currently the API includes the following definitions and functions:

Header file: #include <stringprep.h>

  Main Stringprep API header file.

Header file: #include <stringprep_nameprep.h>
Header file: #include <stringprep_kerberos5.h>

  Convenience header files for the stringprep_nameprep* and
  stringprep_kerberos5 macros (see below).
Function: int stringprep (char *in, int maxlen, int flags,
                          Stringprep_profile * profile);

  Perform stringprep on a zero-terminated UTF-8 string.  Because the
  stringprep operation can expand the string, maxlen indicates how
  large the buffer holding the string is.  See below for valid
  flags options.  The profile indicates processing details; see the
  profile header files, such as stringprep_generic.h and
  stringprep_nameprep.h, for two examples.  Your application can
  define new profiles, possibly reusing the generic stringprep
  tables that will always be part of the library.  Note that you
  must convert strings entered in the system's locale into UTF-8
  before using this function.
Macro: int stringprep_nameprep(char *in, int maxlen)
Macro: int stringprep_nameprep_no_unassigned(char *in, int maxlen)
Macro: int stringprep_kerberos5(char *in, int maxlen)

  Short-hand macros for applying Nameprep with
  AllowUnassigned=TRUE, Nameprep with AllowUnassigned=FALSE and
  Kerberos 5 stringprep profiles to strings, respectively.
Macro: STRINGPREP_VERSION

  Preprocessor definition: a string holding the version of the
  stringprep header file.

Function: extern char *stringprep_check_version (char *req_version);

  Check that the version of the library is at minimum the one
  given as a string in REQ_VERSION and return the actual version
  string of the library; return NULL if the condition is not met.
  If `NULL' is passed to this function, no check is done and only
  the version string is returned.
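
  The "at minimum" contract above can be illustrated with a small
  stand-alone sketch.  This is not Libidn's implementation; the helper
  name sketch_version_at_least and the assumption that versions are
  dotted decimal numbers (such as "0.1.4") are made up for
  illustration only.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Libidn.  Compares two dotted
   decimal version strings component by component.  Returns `actual'
   if it is at least `required' (or if `required' is NULL), and NULL
   otherwise, mirroring the contract described for
   stringprep_check_version(). */
static const char *
sketch_version_at_least (const char *actual, const char *required)
{
  const char *a = actual;
  const char *r = required;

  if (required == NULL)
    return actual;              /* no check requested */

  while (*a != '\0' || *r != '\0')
    {
      unsigned long av = 0, rv = 0;

      /* Parse one numeric component from each string; a missing
         component counts as zero, so "1.0" equals "1.0.0". */
      while (*a >= '0' && *a <= '9')
        av = av * 10 + (unsigned long) (*a++ - '0');
      while (*r >= '0' && *r <= '9')
        rv = rv * 10 + (unsigned long) (*r++ - '0');

      if (av != rv)
        return av > rv ? actual : NULL;

      /* Skip the separating dot; anything else is malformed. */
      if (*a == '.')
        a++;
      else if (*a != '\0')
        return NULL;
      if (*r == '.')
        r++;
      else if (*r != '\0')
        return NULL;
    }

  return actual;                /* all components equal */
}
```

  An application would typically call stringprep_check_version
  (STRINGPREP_VERSION) once at startup and refuse to continue if it
  returns NULL, since that means the installed library is older than
  the header the application was compiled against.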
Type: STRINGPREP_NO_NFKC
Type: STRINGPREP_NO_BIDI
Type: STRINGPREP_NO_UNASSIGNED

  Valid options to the FLAGS parameter of stringprep().
  STRINGPREP_NO_NFKC disables NFKC normalization and selects the
  non-NFKC case folding tables.  STRINGPREP_NO_BIDI disables the
  BIDI step.  STRINGPREP_NO_UNASSIGNED causes stringprep() to abort
  with an error if the string contains unassigned characters
  according to the profile.  Usually the profile specifies the BIDI
  and NFKC settings.
Header file: #include <punycode.h>

  Main Punycode API header file.

Function: int punycode_encode (size_t input_length,
                               const unsigned long input[],
                               const unsigned char case_flags[],
                               size_t * output_length, char output[]);

  punycode_encode() converts Unicode to Punycode.  The input is
  represented as an array of Unicode code points (not code units;
  surrogate pairs are not allowed), and the output will be
  represented as an array of ASCII code points.  The output string
  is *not* null-terminated; it will contain zeros if and only if
  the input contains zeros.  (Of course the caller can leave room
  for a terminator and add one if needed.)  The input_length is the
  number of code points in the input.  The output_length is an
  in/out argument: the caller passes in the maximum number of code
  points that it can receive, and on successful return it will
  contain the number of code points actually output.  The
  case_flags array holds input_length boolean values, where nonzero
  suggests that the corresponding Unicode character be forced to
  uppercase after being decoded (if possible), and zero suggests
  that it be forced to lowercase (if possible).  ASCII code points
  are encoded literally, except that ASCII letters are forced to
  uppercase or lowercase according to the corresponding uppercase
  flags.  If case_flags is a null pointer then ASCII letters are
  left as they are, and other code points are treated as if their
  uppercase flags were zero.  The return value can be any of the
  punycode_status values defined in punycode.h except
  punycode_bad_input; if not punycode_success, then output_length
  and output might contain garbage.
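
  To make the in/out output_length convention and the missing
  terminator concrete, here is a stand-alone sketch of the encoding
  procedure from RFC 3492.  It is not Libidn's source: the names
  sketch_punycode_encode and SKETCH_* are hypothetical, and case_flags
  handling is left out (equivalent to passing a null pointer).

```c
#include <limits.h>
#include <stddef.h>

/* Bootstring parameters for Punycode, as given in RFC 3492. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0x80 };

/* Hypothetical status codes, ordered like the RFC's punycode_status. */
enum { SKETCH_SUCCESS, SKETCH_BAD_INPUT, SKETCH_BIG_OUTPUT, SKETCH_OVERFLOW };

/* Map a digit value 0..35 to 'a'..'z', '0'..'9'. */
static char
encode_digit (unsigned long d)
{
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function from RFC 3492, section 6.1. */
static unsigned long
adapt (unsigned long delta, unsigned long numpoints, int firsttime)
{
  unsigned long k;

  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  for (k = 0; delta > ((BASE - TMIN) * TMAX) / 2; k += BASE)
    delta /= BASE - TMIN;
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

static int
sketch_punycode_encode (size_t input_length, const unsigned long input[],
                        size_t *output_length, char output[])
{
  unsigned long n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, m, q, k, t;
  size_t h, b, j, out = 0, max_out = *output_length;

  /* Copy the basic (ASCII) code points literally. */
  for (j = 0; j < input_length; j++)
    if (input[j] < 0x80)
      {
        if (out >= max_out)
          return SKETCH_BIG_OUTPUT;
        output[out++] = (char) input[j];
      }
  h = b = out;                  /* h: handled so far, b: basic count */
  if (b > 0)
    {
      if (out >= max_out)
        return SKETCH_BIG_OUTPUT;
      output[out++] = '-';      /* delimiter after the basic part */
    }

  while (h < input_length)
    {
      /* Find the smallest code point >= n that is still unhandled. */
      for (m = ULONG_MAX, j = 0; j < input_length; j++)
        if (input[j] >= n && input[j] < m)
          m = input[j];
      if (m - n > (ULONG_MAX - delta) / (h + 1))
        return SKETCH_OVERFLOW;
      delta += (m - n) * (h + 1);
      n = m;

      for (j = 0; j < input_length; j++)
        {
          if (input[j] < n && ++delta == 0)
            return SKETCH_OVERFLOW;
          if (input[j] == n)
            {
              /* Emit delta as a generalized variable-length integer. */
              for (q = delta, k = BASE;; k += BASE)
                {
                  if (out >= max_out)
                    return SKETCH_BIG_OUTPUT;
                  t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                  if (q < t)
                    break;
                  output[out++] = encode_digit (t + (q - t) % (BASE - t));
                  q = (q - t) / (BASE - t);
                }
              output[out++] = encode_digit (q);
              bias = adapt (delta, h + 1, h == b);
              delta = 0;
              h++;
            }
        }
      delta++;
      n++;
    }

  *output_length = out;         /* number of ASCII code points written */
  return SKETCH_SUCCESS;
}
```

  A caller that wants a C string would pass one less than the buffer
  size as *output_length and then store '\0' at output[*output_length]
  after a successful return.  For the six code points of "bücher"
  (b, U+00FC, c, h, e, r) the sketch writes the nine characters
  bcher-kva.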
Function: int punycode_decode (size_t input_length,
                               const char input[],
                               size_t * output_length,
                               unsigned long output[],
                               unsigned char case_flags[]);

  punycode_decode() converts Punycode to Unicode.  The input is
  represented as an array of ASCII code points, and the output will
  be represented as an array of Unicode code points.  The
  input_length is the number of code points in the input.  The
  output_length is an in/out argument: the caller passes in the
  maximum number of code points that it can receive, and on
  successful return it will contain the actual number of code
  points output.  The case_flags array needs room for at least
  output_length values, or it can be a null pointer if the case
  information is not needed.  A nonzero flag suggests that the
  corresponding Unicode character be forced to uppercase by the
  caller (if possible), while zero suggests that it be forced to
  lowercase (if possible).  ASCII code points are output already in
  the proper case, but their flags will be set appropriately so
  that applying the flags would be harmless.  The return value can
  be any of the punycode_status values defined in punycode.h; if
  not punycode_success, then output_length, output, and case_flags
  might contain garbage.  On success, the decoder will never need
  to write an output_length greater than input_length, because of
  how the encoding is defined.
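
  The decoding direction can be sketched the same way.  The code below
  follows the decoding procedure of RFC 3492 but is not Libidn's
  source: the names sketch_punycode_decode and SKETCH_* are
  hypothetical, and the case_flags output is omitted.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Bootstring parameters for Punycode, as given in RFC 3492. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0x80 };

/* Hypothetical status codes, ordered like the RFC's punycode_status. */
enum { SKETCH_SUCCESS, SKETCH_BAD_INPUT, SKETCH_BIG_OUTPUT, SKETCH_OVERFLOW };

/* Map 'a'..'z' / 'A'..'Z' to 0..25 and '0'..'9' to 26..35;
   return BASE for anything that is not a valid digit. */
static unsigned long
decode_digit (char c)
{
  if (c >= '0' && c <= '9')
    return (unsigned long) (c - '0') + 26;
  if (c >= 'a' && c <= 'z')
    return (unsigned long) (c - 'a');
  if (c >= 'A' && c <= 'Z')
    return (unsigned long) (c - 'A');
  return BASE;
}

/* Bias adaptation function from RFC 3492, section 6.1. */
static unsigned long
adapt (unsigned long delta, unsigned long numpoints, int firsttime)
{
  unsigned long k;

  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  for (k = 0; delta > ((BASE - TMIN) * TMAX) / 2; k += BASE)
    delta /= BASE - TMIN;
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

static int
sketch_punycode_decode (size_t input_length, const char input[],
                        size_t *output_length, unsigned long output[])
{
  unsigned long n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  unsigned long oldi, w, k, digit, t;
  size_t out = 0, max_out = *output_length, in, b, j;

  /* Everything before the last '-' is a literal basic code point. */
  for (b = j = 0; j < input_length; j++)
    if (input[j] == '-')
      b = j;
  if (b > max_out)
    return SKETCH_BIG_OUTPUT;
  for (j = 0; j < b; j++)
    {
      if ((unsigned char) input[j] >= 0x80)
        return SKETCH_BAD_INPUT;
      output[out++] = (unsigned char) input[j];
    }

  /* Each pass decodes one integer and inserts one code point. */
  for (in = b > 0 ? b + 1 : 0; in < input_length; out++)
    {
      for (oldi = i, w = 1, k = BASE;; k += BASE)
        {
          if (in >= input_length)
            return SKETCH_BAD_INPUT;
          digit = decode_digit (input[in++]);
          if (digit >= BASE)
            return SKETCH_BAD_INPUT;
          if (digit > (ULONG_MAX - i) / w)
            return SKETCH_OVERFLOW;
          i += digit * w;
          t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
          if (digit < t)
            break;
          if (w > ULONG_MAX / (BASE - t))
            return SKETCH_OVERFLOW;
          w *= BASE - t;
        }
      bias = adapt (i - oldi, out + 1, oldi == 0);

      /* i wraps around the output length; each wrap increments n. */
      if (i / (out + 1) > ULONG_MAX - n)
        return SKETCH_OVERFLOW;
      n += i / (out + 1);
      i %= out + 1;

      if (out >= max_out)
        return SKETCH_BIG_OUTPUT;
      memmove (output + i + 1, output + i, (out - i) * sizeof *output);
      output[i++] = n;
    }

  *output_length = out;         /* number of code points written */
  return SKETCH_SUCCESS;
}
```

  Decoding the nine characters bcher-kva with this sketch yields six
  code points (b, U+00FC, c, h, e, r), so an output array with
  input_length elements is always large enough on success.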
Header file: #include <idna.h>

  Main IDNA API header file.

Function: int idna_to_ascii (const unsigned long *in, size_t inlen,
                             int allowunassigned, int usestd3asciirules);

Function: int idna_to_unicode (const unsigned long *in, size_t inlen,
                               unsigned long *out, size_t *outlen,
                               int allowunassigned, int usestd3asciirules);
The library also contains the following utility functions:

Function: int stringprep_unichar_to_utf8 (long c, char *outbuf);
Function: long stringprep_utf8_to_unichar (const char *p);

  Convert between Unicode (UCS4) and UTF-8, one character only.
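
  As an illustration of what a one-character UCS4 to UTF-8 conversion
  involves, here is a stand-alone sketch.  It is not the library's
  code: the name sketch_unichar_to_utf8 is hypothetical, and the
  choice to reject surrogates and values above U+10FFFF and to return
  the byte count is an assumption of this sketch, not documented
  behaviour of stringprep_unichar_to_utf8().

```c
#include <stddef.h>

/* Hypothetical helper, not part of Libidn.  Encodes one Unicode
   scalar value as UTF-8 into outbuf (which needs room for 4 bytes).
   Returns the number of bytes written, or 0 if c cannot be encoded. */
static int
sketch_unichar_to_utf8 (unsigned long c, char *outbuf)
{
  if (c < 0x80)
    {
      /* 0xxxxxxx */
      outbuf[0] = (char) c;
      return 1;
    }
  if (c < 0x800)
    {
      /* 110xxxxx 10xxxxxx */
      outbuf[0] = (char) (0xC0 | (c >> 6));
      outbuf[1] = (char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;                   /* surrogates are not characters */
  if (c < 0x10000)
    {
      /* 1110xxxx 10xxxxxx 10xxxxxx */
      outbuf[0] = (char) (0xE0 | (c >> 12));
      outbuf[1] = (char) (0x80 | ((c >> 6) & 0x3F));
      outbuf[2] = (char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c <= 0x10FFFF)
    {
      /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
      outbuf[0] = (char) (0xF0 | (c >> 18));
      outbuf[1] = (char) (0x80 | ((c >> 12) & 0x3F));
      outbuf[2] = (char) (0x80 | ((c >> 6) & 0x3F));
      outbuf[3] = (char) (0x80 | (c & 0x3F));
      return 4;
    }
  return 0;                     /* beyond the Unicode code space */
}
```

  For example, U+00FC becomes the two bytes C3 BC and U+20AC becomes
  the three bytes E2 82 AC.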
Function: long *stringprep_utf8_to_ucs4 (const char *str, int len,
Function: char *stringprep_ucs4_to_utf8 (const long * str, int len,
                                         int * items_read, int * items_written);

  Convert between Unicode (UCS4) and UTF-8, zero-terminated strings.

Function: char *stringprep_utf8_nfkc_normalize (const char *str, int len);
Function: long *stringprep_ucs4_nfkc_normalize (long *str, int len);

  Perform NFKC normalization on strings.
Function: const char *stringprep_locale_charset ();
Function: char *stringprep_convert (const char *str,
                                    const char *to_codeset,
                                    const char *from_codeset);
Function: char *stringprep_locale_to_utf8 (const char *str);

  Convert strings between character sets.
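
  Character set conversion of this kind is commonly built on the
  POSIX iconv interface.  The sketch below shows one way such a
  function might be written; it is not Libidn's implementation.  The
  name sketch_convert, the fixed four-bytes-per-input-byte output
  estimate, and the rule that the caller frees the result are all
  assumptions of this sketch.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Libidn.  Converts the
   zero-terminated string str from from_codeset to to_codeset using
   POSIX iconv.  Returns a newly allocated zero-terminated string that
   the caller must free, or NULL on any failure. */
static char *
sketch_convert (const char *str, const char *to_codeset,
                const char *from_codeset)
{
  iconv_t cd;
  size_t inleft, outsize, outleft;
  char *in, *out, *buf;

  cd = iconv_open (to_codeset, from_codeset);
  if (cd == (iconv_t) -1)
    return NULL;                /* conversion not supported */

  inleft = strlen (str);
  /* Assumption: four output bytes per input byte is enough.  If it
     is not, iconv fails with E2BIG and NULL is returned below. */
  outsize = inleft * 4 + 1;
  buf = malloc (outsize);
  if (buf == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  in = (char *) str;            /* iconv does not modify the input */
  out = buf;
  outleft = outsize - 1;        /* keep room for the terminator */

  /* Convert the text, then flush any pending shift sequence. */
  if (iconv (cd, &in, &inleft, &out, &outleft) == (size_t) -1
      || iconv (cd, NULL, NULL, &out, &outleft) == (size_t) -1)
    {
      free (buf);
      iconv_close (cd);
      return NULL;
    }

  *out = '\0';
  iconv_close (cd);
  return buf;
}
```

  With such a helper, converting locale input to UTF-8 before calling
  stringprep() would amount to sketch_convert (str, "UTF-8", charset),
  where charset is the name returned by stringprep_locale_charset().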
Libidn has at some point passed its self tests on the following
systems, but no guarantees are made.

  - alphaev67-dec-osf5.0 (Tru64 UNIX C, Tru64 make -- iconv failed!)
  - i686-pc-linux-gnu (Debian Sid, iconv ok)
  - i686-pc-linux-gnu (RedHat 7.2, iconv ok)
  - mips-sgi-irix6.5 (MIPS C compiler, IRIX make, iconv ok)
  - rs6000-ibm-aix4.3.2.0 (GCC 2.9-aix43-000718, GNU make, iconv failed!)
  - rs6000-ibm-aix4.3.2.0 (IBM C for AIX compiler, AIX make, iconv failed!)
  - sparc-sun-solaris2.6 (Sun WorkShop Compiler C 5.0, non-GNU make)
  - sparc-sun-solaris2.8 (Sun WorkShop Compiler C 6.0U2, SUN make, iconv ok)
  - sparc-sun-solaris2.8 (GCC 3.1, GNU make, iconv ok)
  - ... and over 10 other Unix systems, including Cygwin.
Things left to do are listed below.  If you would like to start
working on anything, please let me know so that duplicated work can
be avoided.

  - Optimize stringprep, the table searching is slow (but does it matter?).
  - Port applications to use libidn.
  - Include more stringprep profiles.
  - Add texi documentation.
  - Implement IDNA tools?  Is there more?
Before it becomes a FAQ: This library does not link with GLIB for
the UTF-8 functions, for two reasons.  First, GLIB does not provide
versioning of the Unicode tables (and the developers said it will
not be added either), and this package must know the Unicode version
used.  Second, GLIB requires some things (e.g., threads) that would
make this package less portable.
For more information see <URL:http://josefsson.org/libidn/>.

Send all bug reports by electronic mail to bug-libidn@josefsson.org.

----------------------------------------------------------------------
Copyright (C) 2002 Simon Josefsson

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.