mingw/html/lib/Unicode/Collate.html

   1 <?xml version="1.0" ?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   3 <html xmlns="http://www.w3.org/1999/xhtml">
   4 <head>
   5 <title>Unicode::Collate - Unicode Collation Algorithm</title>
   6 <meta http-equiv="content-type" content="text/html; charset=utf-8" />
   7 <link rev="made" href="mailto:" />
   8 </head>
   9
  10 <body style="background-color: white">
  11 <table border="0" width="100%" cellspacing="0" cellpadding="3">
  12 <tr><td class="block" style="background-color: #cccccc" valign="middle">
  13 <big><strong><span class="block">&nbsp;Unicode::Collate - Unicode Collation Algorithm</span></strong></big>
  14 </td></tr>
  15 </table>
  16
  17 <p><a name="__index__"></a></p>
  18 <!-- INDEX BEGIN -->
  19
  20 <ul>
  21
  22         <li><a href="#name">NAME</a></li>
  23         <li><a href="#synopsis">SYNOPSIS</a></li>
  24         <li><a href="#description">DESCRIPTION</a></li>
  25         <ul>
  26
  27                 <li><a href="#constructor_and_tailoring">Constructor and Tailoring</a></li>
  28                 <li><a href="#methods_for_collation">Methods for Collation</a></li>
  29                 <li><a href="#methods_for_searching">Methods for Searching</a></li>
  30                 <li><a href="#other_methods">Other Methods</a></li>
  31         </ul>
  32
  33         <li><a href="#export">EXPORT</a></li>
  34         <li><a href="#install">INSTALL</a></li>
  35         <li><a href="#caveats">CAVEATS</a></li>
  36         <li><a href="#author__copyright_and_license">AUTHOR, COPYRIGHT AND LICENSE</a></li>
  37         <li><a href="#see_also">SEE ALSO</a></li>
  38 </ul>
  39 <!-- INDEX END -->
  40
  41 <hr />
  42 <p>
  43 </p>
  44 <h1><a name="name">NAME</a></h1>
  45 <p>Unicode::Collate - Unicode Collation Algorithm</p>
  46 <p>
  47 </p>
  48 <hr />
  49 <h1><a name="synopsis">SYNOPSIS</a></h1>
  50 <pre>
  51   use Unicode::Collate;</pre>
  52 <pre>
  53   #construct
  54   $Collator = Unicode::Collate-&gt;new(%tailoring);</pre>
  55 <pre>
  56   #sort
  57   @sorted = $Collator-&gt;sort(@not_sorted);</pre>
  58 <pre>
  59   #compare
  60   $result = $Collator-&gt;cmp($a, $b); # returns 1, 0, or -1.</pre>
  61 <pre>
  62   # If %tailoring is false (i.e. empty),
  63   # $Collator should do the default collation.</pre>
  64 <p>
  65 </p>
  66 <hr />
  67 <h1><a name="description">DESCRIPTION</a></h1>
  68 <p>This module is an implementation of Unicode Technical Standard #10
  69 (a.k.a. UTS #10) - Unicode Collation Algorithm (a.k.a. UCA).</p>
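<p>The following is a minimal sketch of the difference. It contrasts Perl's built-in <code>sort</code>, which compares code points, with a collator that uses the default settings. The word list is made up for the example.</p>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical sample words; "\x{C9}" is LATIN CAPITAL LETTER E WITH ACUTE.
my @words = ("zebra", "\x{C9}clair", "apple", "Banana");

# Perl's built-in sort compares code points: uppercase ASCII first,
# then lowercase ASCII, then characters beyond ASCII.
my @by_codepoint = sort @words;

# A collator with the default settings orders words alphabetically,
# treating case and accents as lower-level distinctions.
my $collator = Unicode::Collate->new();
my @by_uca   = $collator->sort(@words);
```
</pre>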
  70 <p>
  71 </p>
  72 <h2><a name="constructor_and_tailoring">Constructor and Tailoring</a></h2>
  73 <p>The <code>new</code> method returns a collator object.</p>
  74 <pre>
  75    $Collator = Unicode::Collate-&gt;new(
  76       UCA_Version =&gt; $UCA_Version,
  77       alternate =&gt; $alternate, # deprecated: use of 'variable' is recommended.
  78       backwards =&gt; $levelNumber, # or \@levelNumbers
  79       entry =&gt; $element,
  80       hangul_terminator =&gt; $term_primary_weight,
  81       ignoreName =&gt; qr/$ignoreName/,
  82       ignoreChar =&gt; qr/$ignoreChar/,
  83       katakana_before_hiragana =&gt; $bool,
  84       level =&gt; $collationLevel,
  85       normalization  =&gt; $normalization_form,
  86       overrideCJK =&gt; \&amp;overrideCJK,
  87       overrideHangul =&gt; \&amp;overrideHangul,
  88       preprocess =&gt; \&amp;preprocess,
  89       rearrange =&gt; \@charList,
  90       table =&gt; $filename,
  91       undefName =&gt; qr/$undefName/,
  92       undefChar =&gt; qr/$undefChar/,
  93       upper_before_lower =&gt; $bool,
  94       variable =&gt; $variable,
  95    );</pre>
  96 <dl>
  97 <dt><strong><a name="item_uca_version">UCA_Version</a></strong>
  98
  99 <dd>
 100 <p>If the tracking version number of UCA is given,
 101 the behavior of that tracking version is emulated when collating.
 102 If omitted, the return value of <a href="#item_uca_version"><code>UCA_Version()</code></a> is used;
 103 <a href="#item_uca_version"><code>UCA_Version()</code></a> returns the latest tracking version supported.</p>
 104 </dd>
 105 <dd>
 106 <p>The supported tracking versions are 8, 9, 11, and 14.</p>
 107 </dd>
 108 <dd>
 109 <pre>
 110      UCA       Unicode Standard         DUCET (@version)
 111      ---------------------------------------------------
 112       8              3.1                3.0.1 (3.0.1d9)
 113       9     3.1 with Corrigendum 3      3.1.1 (3.1.1)
 114      11              4.0                4.0.0 (4.0.0)
 115      14             4.1.0               4.1.0 (4.1.0)</pre>
 116 </dd>
 117 <dd>
 118 <p>Note: Recent versions of UTS #10 rename ``Tracking Version'' to ``Revision.''</p>
 119 </dd>
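<dd>
<p>A minimal sketch of pinning the tracking version, so that collation results do not change silently when the module is upgraded. It assumes that the installed module still accepts <code>14</code>. Newer releases of the module may support more versions than the list above.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

# The latest tracking version supported by the installed module.
my $latest = Unicode::Collate::UCA_Version();

# Emulate the behavior of tracking version 14 regardless of $latest.
my $pinned = Unicode::Collate->new(UCA_Version => 14);

print "latest supported tracking version: $latest\n";
```
</pre>
</dd>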
 120 </li>
 121 <dt><strong><a name="item_alternate">alternate</a></strong>
 122
 123 <dd>
 124 <p>-- see 3.2.2 Alternate Weighting, version 8 of UTS #10</p>
 125 </dd>
 126 <dd>
 127 <p>For backward compatibility, <a href="#item_alternate"><code>alternate</code></a> (old name) can be used
 128 as an alias for <a href="#item_variable"><code>variable</code></a>.</p>
 129 </dd>
 130 </li>
 131 <dt><strong><a name="item_backwards">backwards</a></strong>
 132
 133 <dd>
 134 <p>-- see 3.1.2 French Accents, UTS #10.</p>
 135 </dd>
 136 <dd>
 137 <pre>
 138      backwards =&gt; $levelNumber or \@levelNumbers</pre>
 139 </dd>
 140 <dd>
 141 <p>Weights at the given level(s) are compared in reverse order, e.g. level 2 (diacritic ordering) in French.
 142 If omitted, weights are compared forwards at all levels.</p>
 143 </dd>
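<dd>
<p>A sketch of the French accent case, using four words that differ only in their accents. With <code>backwards =&gt; 2</code>, the level 2 (accent) weights are compared starting from the end of the word. The strings use <code>\x{...}</code> escapes so that the snippet does not depend on the source encoding.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

# "cote", "cot\x{E9}" (e-acute), "c\x{F4}te" (o-circumflex), "c\x{F4}t\x{E9}".
my @words = ("c\x{F4}t\x{E9}", "cot\x{E9}", "c\x{F4}te", "cote");

my $forward  = Unicode::Collate->new();                 # default: forwards
my $backward = Unicode::Collate->new(backwards => 2);   # level 2 reversed

my @fwd = $forward->sort(@words);    # cote, cot\x{E9}, c\x{F4}te, c\x{F4}t\x{E9}
my @bwd = $backward->sort(@words);   # cote, c\x{F4}te, cot\x{E9}, c\x{F4}t\x{E9}
```
</pre>
</dd>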
 144 </li>
 145 <dt><strong><a name="item_entry">entry</a></strong>
 146
 147 <dd>
 148 <p>-- see 3.1 Linguistic Features; 3.2.1 File Format, UTS #10.</p>
 149 </dd>
 150 <dd>
 151 <p>If the same character (or a sequence of characters) exists
 152 in the collation element table through <a href="#item_table"><code>table</code></a>,
 153 its mapping to collation elements is overridden.
 154 If it does not exist, the mapping is added.</p>
 155 </dd>
 156 <dd>
 157 <pre>
 158     entry =&gt; &lt;&lt;'ENTRY', # for DUCET v4.0.0 (allkeys-4.0.0.txt)
 159 0063 0068 ; [.0E6A.0020.0002.0063] # ch
 160 0043 0068 ; [.0E6A.0020.0007.0043] # Ch
 161 0043 0048 ; [.0E6A.0020.0008.0043] # CH
 162 006C 006C ; [.0F4C.0020.0002.006C] # ll
 163 004C 006C ; [.0F4C.0020.0007.004C] # Ll
 164 004C 004C ; [.0F4C.0020.0008.004C] # LL
 165 00F1      ; [.0F7B.0020.0002.00F1] # n-tilde
 166 006E 0303 ; [.0F7B.0020.0002.00F1] # n-tilde
 167 00D1      ; [.0F7B.0020.0008.00D1] # N-tilde
 168 004E 0303 ; [.0F7B.0020.0008.00D1] # N-tilde
 169 ENTRY</pre>
 170 </dd>
 171 <dd>
 172 <pre>
 173     entry =&gt; &lt;&lt;'ENTRY', # for DUCET v4.0.0 (allkeys-4.0.0.txt)
 174 00E6 ; [.0E33.0020.0002.00E6][.0E8B.0020.0002.00E6] # ae ligature as &lt;a&gt;&lt;e&gt;
 175 00C6 ; [.0E33.0020.0008.00C6][.0E8B.0020.0008.00C6] # AE ligature as &lt;A&gt;&lt;E&gt;
 176 ENTRY</pre>
 177 </dd>
 178 <dd>
 179 <p><strong>NOTE:</strong> The code point in the UCA file format (before <code>';'</code>)
 180 <strong>must</strong> be a Unicode code point (written in hexadecimal),
 181 not a native code point.
 182 So <code>0063</code> always denotes <code>U+0063</code>,
 183 not the native character <code>&quot;\x63&quot;</code> (the two differ on EBCDIC platforms).</p>
 184 </dd>
 185 <dd>
 186 <p>Weights may vary between collation element tables,
 187 so make sure the weights defined in <a href="#item_entry"><code>entry</code></a> are consistent with
 188 those in the collation element table loaded via <a href="#item_table"><code>table</code></a>.</p>
 189 </dd>
 190 <dd>
 191 <p>In DUCET v4.0.0, the primary weight of <code>C</code> is <code>0E60</code>
 192 and that of <code>D</code> is <code>0E6D</code>. So setting the primary weight of <code>CH</code> to <code>0E6A</code>
 193 (a value between <code>0E60</code> and <code>0E6D</code>)
 194 yields the ordering <code>C &lt; CH &lt; D</code>.
 195 Strictly speaking, DUCET already has some characters between <code>C</code> and <code>D</code>:
 196 <code>small capital C</code> (<code>U+1D04</code>) with primary weight <code>0E64</code>,
 197 <code>c-hook/C-hook</code> (<code>U+0188/U+0187</code>) with <code>0E65</code>,
 198 and <code>c-curl</code> (<code>U+0255</code>) with <code>0E69</code>.
 199 So the primary weight <code>0E6A</code> for <code>CH</code> places <code>CH</code>
 200 between <code>c-curl</code> and <code>D</code>.</p>
 201 </dd>
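<dd>
<p>The following is a self-contained sketch of a contraction. With <code>table =&gt; undef</code>, only the entries given here exist. The weights below are hypothetical values chosen for the example and are unrelated to DUCET. <code>ch</code> is given a primary weight between those of <code>c</code> and <code>d</code>, so it sorts as a separate letter.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical mini-table (illustrative weights, not from DUCET).
# "0063 0068" defines the two-character sequence "ch" as one element.
my $entries = q{
0061      ; [.0101.0020.0002.0061] # a
0063      ; [.0102.0020.0002.0063] # c
0063 0068 ; [.0103.0020.0002.0063] # ch
0064      ; [.0104.0020.0002.0064] # d
0068      ; [.0105.0020.0002.0068] # h
};

my $collator = Unicode::Collate->new(
    table         => undef,     # read no table file
    normalization => undef,     # so Unicode::Normalize is not needed
    entry         => $entries,
);

my @sorted = $collator->sort(qw(d ch ca h cd));
print "@sorted\n";    # ca cd ch d h
```
</pre>
</dd>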
 202 </li>
 203 <dt><strong><a name="item_hangul_terminator">hangul_terminator</a></strong>
 204
 205 <dd>
 206 <p>-- see 7.1.4 Trailing Weights, UTS #10.</p>
 207 </dd>
 208 <dd>
 209 <p>If a true value is given (non-zero, and it should be positive),
 210 it is added as a terminator primary weight to the end of
 211 every standard Hangul syllable. Secondary and any higher weights
 212 of the terminator are set to zero.
 213 If the value is false or the <a href="#item_hangul_terminator"><code>hangul_terminator</code></a> key does not exist,
 214 no terminator weights are inserted.</p>
 215 </dd>
 216 <dd>
 217 <p>Boundaries of Hangul syllables are determined
 218 according to conjoining Jamo behavior in <em>the Unicode Standard</em>
 219 and <em>HangulSyllableType.txt</em>.</p>
 220 </dd>
 221 <dd>
 222 <p><strong>Implementation Note:</strong>
 223 (1) For an expansion mapping (a Unicode character mapped
 224 to a sequence of collation elements), a terminator is not added
 225 between collation elements, even if a Hangul syllable boundary exists there.
 226 A terminator can be added only at the position following
 227 the last collation element.</p>
 228 </dd>
 229 <dd>
 230 <p>(2) Non-conjoining Hangul letters
 231 (Compatibility Jamo, halfwidth Jamo, and enclosed letters) are not
 232 automatically terminated with a terminator primary weight.
 233 These characters may need a terminator included in the collation element
 234 table beforehand.</p>
 235 </dd>
 236 </li>
 237 <dt><strong><a name="item_ignorechar">ignoreChar</a></strong>
 238
 239 <dt><strong><a name="item_ignorename">ignoreName</a></strong>
 240
 241 <dd>
 242 <p>-- see 3.2.2 Variable Weighting, UTS #10.</p>
 243 </dd>
 244 <dd>
 245 <p>Makes the entry in the table completely ignorable;
 246 i.e. as if the weights were zero at all levels.</p>
 247 </dd>
 248 <dd>
 249 <p>Through <a href="#item_ignorechar"><code>ignoreChar</code></a>, any character matching <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_qr_"><code>qr/$ignoreChar/</code></a>
 250 will be ignored. Through <a href="#item_ignorename"><code>ignoreName</code></a>, any character whose name
 251 (given in the <a href="#item_table"><code>table</code></a> file as a comment) matches <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_qr_"><code>qr/$ignoreName/</code></a>
 252 will be ignored.</p>
 253 </dd>
 254 <dd>
 255 <p>E.g. when 'a' and 'e' are ignorable,
 256 'element' is equal to 'lament' (or 'lmnt').</p>
 257 </dd>
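<dd>
<p>The following minimal sketch reproduces the example above. It makes only the single characters <code>a</code> and <code>e</code> ignorable. The regex is anchored so that it matches only those single-character entries and not longer sequences that happen to contain them.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

# Make the single characters "a" and "e" completely ignorable.
my $collator = Unicode::Collate->new(ignoreChar => qr/^[ae]$/);

# With "a" and "e" ignored, all three strings reduce to "lmnt".
my $same1 = $collator->eq("element", "lament");
my $same2 = $collator->eq("element", "lmnt");
```
</pre>
</dd>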
 258 </li>
 259 <dt><strong><a name="item_katakana_before_hiragana">katakana_before_hiragana</a></strong>
 260
 261 <dd>
 262 <p>-- see 7.3.1 Tertiary Weight Table, UTS #10.</p>
 263 </dd>
 264 <dd>
 265 <p>By default, hiragana is before katakana.
 266 If the parameter is made true, this is reversed.</p>
 267 </dd>
 268 <dd>
 269 <p><strong>NOTE</strong>: This parameter simplemindedly assumes that any hiragana/katakana
 270 distinctions occur at level 3, and that their weights at level 3 are the
 271 same as those mentioned in 7.3.1, UTS #10.
 272 If you define collation elements that violate this requirement,
 273 this parameter does not work correctly.</p>
 274 </dd>
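<dd>
<p>A minimal sketch comparing HIRAGANA LETTER A (U+3042) with KATAKANA LETTER A (U+30A2). In the default table, these two letters differ only at level 3.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my $hiragana_a = "\x{3042}";   # HIRAGANA LETTER A
my $katakana_a = "\x{30A2}";   # KATAKANA LETTER A

my $default  = Unicode::Collate->new();
my $reversed = Unicode::Collate->new(katakana_before_hiragana => 1);

my $hira_first = $default->lt($hiragana_a, $katakana_a);    # default order
my $kata_first = $reversed->lt($katakana_a, $hiragana_a);   # reversed order
```
</pre>
</dd>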
 275 </li>
 276 <dt><strong><a name="item_level">level</a></strong>
 277
 278 <dd>
 279 <p>-- see 4.3 Form Sort Key, UTS #10.</p>
 280 </dd>
 281 <dd>
 282 <p>Sets the maximum level.
 283 Levels higher than the specified one are ignored.</p>
 284 </dd>
 285 <dd>
 286 <pre>
 287   Level 1: alphabetic ordering
 288   Level 2: diacritic ordering
 289   Level 3: case ordering
 290   Level 4: tie-breaking (e.g. in the case when variable is 'shifted')</pre>
 291 </dd>
 292 <dd>
 293 <pre>
 294   level =&gt; 2,   # e.g. compare letters and diacritics, ignore case</pre>
 295 </dd>
 296 <dd>
 297 <p>If omitted, the maximum is the 4th.</p>
 298 </dd>
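<dd>
<p>The following sketch shows how the maximum level decides which differences count. The sample strings are arbitrary.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my $word   = "Perl";
my $accent = "p\x{E9}rl";   # lowercase, with e-acute

my $l1 = Unicode::Collate->new(level => 1);   # base letters only
my $l2 = Unicode::Collate->new(level => 2);   # plus diacritics
my $l3 = Unicode::Collate->new(level => 3);   # plus case

my $eq1 = $l1->eq($word, $accent);   # true: same base letters
my $eq2 = $l2->eq($word, $accent);   # false: accents differ at level 2
my $eq3 = $l2->eq($word, "perl");    # true: case is a level 3 difference
my $eq4 = $l3->eq($word, "perl");    # false: level 3 sees the case
```
</pre>
</dd>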
 299 </li>
 300 <dt><strong><a name="item_normalization">normalization</a></strong>
 301
 302 <dd>
 303 <p>-- see 4.1 Normalize, UTS #10.</p>
 304 </dd>
 305 <dd>
 306 <p>If specified, strings are normalized before preparation of sort keys
 307 (the normalization is executed after preprocess).</p>
 308 </dd>
 309 <dd>
 310 <p>Any form name accepted by <code>Unicode::Normalize::normalize()</code> can be given
 311 as <code>$normalization_form</code>.
 312 Acceptable names include <code>'NFD'</code>, <code>'NFC'</code>, <code>'NFKD'</code>, and <code>'NFKC'</code>.
 313 See <code>Unicode::Normalize::normalize()</code> for details.
 314 If omitted, <code>'NFD'</code> is used.</p>
 315 </dd>
 316 <dd>
 317 <p><a href="#item_normalization"><code>normalization</code></a> is performed after <a href="#item_preprocess"><code>preprocess</code></a> (if defined).</p>
 318 </dd>
 319 <dd>
 320 <p>Furthermore, the special values <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_undef"><code>undef</code></a> and <code>&quot;prenormalized&quot;</code> can be used,
 321 although they are not form names accepted by <code>Unicode::Normalize::normalize()</code>.</p>
 322 </dd>
 323 <dd>
 324 <p>If <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_undef"><code>undef</code></a> (not the string <code>&quot;undef&quot;</code>) is passed explicitly
 325 as the value for this key,
 326 no normalization is carried out (this may make tailoring easier
 327 if no normalization is desired). Under <code>(normalization =&gt; undef)</code>,
 328 only contiguous contractions are resolved;
 329 e.g. even if <code>A-ring</code> (and <code>A-ring-cedilla</code>) is ordered after <code>Z</code>,
 330 <code>A-cedilla-ring</code> would be primary equal to <code>A</code>.
 331 In this respect,
 332 <code>(normalization =&gt; undef, preprocess =&gt; sub { NFD(shift) })</code>
 333 <strong>is not</strong> equivalent to <code>(normalization =&gt; 'NFD')</code>.</p>
 334 </dd>
 335 <dd>
 336 <p>In the case of <code>(normalization =&gt; &quot;prenormalized&quot;)</code>,
 337 no normalization is performed, but
 338 non-contiguous contractions with combining characters are resolved.
 339 Therefore
 340 <code>(normalization =&gt; 'prenormalized', preprocess =&gt; sub { NFD(shift) })</code>
 341 <strong>is</strong> equivalent to <code>(normalization =&gt; 'NFD')</code>.
 342 If source strings are already properly normalized,
 343 <code>(normalization =&gt; 'prenormalized')</code> may save the time spent on normalization.</p>
 344 </dd>
 345 <dd>
 346 <p>Except under <code>(normalization =&gt; undef)</code>,
 347 <strong>Unicode::Normalize</strong> is required (see also <strong>CAVEATS</strong>).</p>
 348 </dd>
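<dd>
<p>The following sketch checks the equivalence stated above by comparing the sort keys that the two configurations produce for a few sample strings. The strings are hypothetical test data. The sketch assumes that Unicode::Normalize is installed, as it is in a standard Perl distribution.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Normalize qw(NFD);

my $nfd = Unicode::Collate->new(normalization => 'NFD');

# Normalize in preprocess instead, and declare the input prenormalized.
my $pre = Unicode::Collate->new(
    normalization => 'prenormalized',
    preprocess    => sub { NFD(shift) },
);

# Precomposed text, decomposed text, and A + cedilla + ring.
my @samples = ("\x{E9}t\x{E9}", "e\x{301}te\x{301}", "A\x{327}\x{30A}");

my $all_equal = 1;
for my $s (@samples) {
    $all_equal = 0 if $nfd->getSortKey($s) ne $pre->getSortKey($s);
}
```
</pre>
</dd>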
 349 </li>
 350 <dt><strong><a name="item_overridecjk">overrideCJK</a></strong>
 351
 352 <dd>
 353 <p>-- see 7.1 Derived Collation Elements, UTS #10.</p>
 354 </dd>
 355 <dd>
 356 <p>By default, CJK Unified Ideographs are ordered in Unicode code point order,
 357 but <code>CJK Unified Ideographs</code> (if <a href="#item_uca_version"><code>UCA_Version</code></a> is 8 to 11, its range is
 358 <code>U+4E00..U+9FA5</code>; if <a href="#item_uca_version"><code>UCA_Version</code></a> is 14, its range is <code>U+4E00..U+9FBB</code>)
 359 are ordered before <code>CJK Unified Ideographs Extension</code> (its range is
 360 <code>U+3400..U+4DB5</code> and <code>U+20000..U+2A6D6</code>).</p>
 361 </dd>
 362 <dd>
 363 <p>Through <a href="#item_overridecjk"><code>overrideCJK</code></a>, the ordering of CJK Unified Ideographs can be overridden.</p>
 364 </dd>
 365 <dd>
 366 <p>ex. CJK Unified Ideographs in the JIS code point order.</p>
 367 </dd>
 368 <dd>
 369 <pre>
 370   overrideCJK =&gt; sub {
 371       my $u = shift;                 # the code point of the ideograph
 372       my $utf16 = pack('n', $u);     # to UTF-16BE
 373       my $s = your_unicode_to_sjis_converter($utf16); # hypothetical: supply your own
 374       my $n = unpack('n', $s);       # two-byte Shift_JIS code as an integer
 375       [ $n, 0x20, 0x2, $u ];         # return the collation element
 376   },</pre>
 377 </dd>
 378 <dd>
 379 <p>ex. ignores all CJK Unified Ideographs.</p>
 380 </dd>
 381 <dd>
 382 <pre>
 383   overrideCJK =&gt; sub {()}, # CODEREF returning empty list</pre>
 384 </dd>
 385 <dd>
 386 <pre>
 387    # so that -&gt;eq(&quot;Pe\x{4E00}rl&quot;, &quot;Perl&quot;) is true,
 388    # as U+4E00 is a CJK Unified Ideograph and is thus ignorable.</pre>
 389 </dd>
 390 <dd>
 391 <p>If <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_undef"><code>undef</code></a> is passed explicitly as the value for this key,
 392 weights for CJK Unified Ideographs are treated as undefined.
 393 But assignment of weight for CJK Unified Ideographs
 394 in table or <a href="#item_entry"><code>entry</code></a> is still valid.</p>
 395 </dd>
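<dd>
<p>The following is a runnable sketch of the example above, which ignores every CJK Unified Ideograph by having the callback return an empty list.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

# The callback returns no collation elements, so every
# CJK Unified Ideograph becomes ignorable.
my $collator = Unicode::Collate->new(overrideCJK => sub { () });

# U+4E00 is a CJK Unified Ideograph.
my $ignored = $collator->eq("Pe\x{4E00}rl", "Perl");
```
</pre>
</dd>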
 396 </li>
 397 <dt><strong><a name="item_overridehangul">overrideHangul</a></strong>
 398
 399 <dd>
 400 <p>-- see 7.1 Derived Collation Elements, UTS #10.</p>
 401 </dd>
 402 <dd>
 403 <p>By default, Hangul Syllables are decomposed into Hangul Jamo,
 404 even if <code>(normalization =&gt; undef)</code>.
 405 But the mapping of Hangul Syllables may be overridden.</p>
 406 </dd>
 407 <dd>
 408 <p>This parameter works like <a href="#item_overridecjk"><code>overrideCJK</code></a>, so see there for examples.</p>
 409 </dd>
 410 <dd>
 411 <p>If you want to override the mapping of Hangul Syllables,
 412 NFD, NFKD, and FCD are not appropriate,
 413 since they will decompose Hangul Syllables before overriding.</p>
 414 </dd>
 415 <dd>
 416 <p>If <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_undef"><code>undef</code></a> is passed explicitly as the value for this key,
 417 weight for Hangul Syllables is treated as undefined
 418 without decomposition into Hangul Jamo.
 419 But definition of weight for Hangul Syllables
 420 in table or <a href="#item_entry"><code>entry</code></a> is still valid.</p>
 421 </dd>
 422 </li>
 423 <dt><strong><a name="item_preprocess">preprocess</a></strong>
 424
 425 <dd>
 426 <p>-- see 5.1 Preprocessing, UTS #10.</p>
 427 </dd>
 428 <dd>
 429 <p>If specified, the coderef is used to preprocess each string
 430 before its sort key is formed.</p>
 431 </dd>
 432 <dd>
 433 <p>e.g. dropping English articles, such as ``a'' or ``the'',
 434 so that ``the pen'' sorts before ``a pencil''.</p>
 435 </dd>
 436 <dd>
 437 <pre>
 438      preprocess =&gt; sub {
 439            my $str = shift;
 440            $str =~ s/\b(?:an?|the)\s+//gi;
 441            return $str;
 442         },</pre>
 443 </dd>
 444 <dd>
 445 <p><a href="#item_preprocess"><code>preprocess</code></a> is performed before <a href="#item_normalization"><code>normalization</code></a> (if defined).</p>
 446 </dd>
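<dd>
<p>The following sketch compares collators with and without the article-dropping hook. Without the hook, spaces are variable elements. Under the default <code>'shifted'</code> setting these are ignored at levels 1 to 3, so ``the pen'' is effectively compared as ``thepen''.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my $plain = Unicode::Collate->new();
my $no_articles = Unicode::Collate->new(
    preprocess => sub {
        my $str = shift;
        $str =~ s/\b(?:an?|the)\s+//gi;   # drop English articles
        return $str;
    },
);

my @titles = ("the pen", "a pencil");
my @sorted_plain = $plain->sort(@titles);        # "a pencil", "the pen"
my @sorted_pre   = $no_articles->sort(@titles);  # "the pen", "a pencil"
```
</pre>
</dd>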
 447 </li>
 448 <dt><strong><a name="item_rearrange">rearrange</a></strong>
 449
 450 <dd>
 451 <p>-- see 3.1.3 Rearrangement, UTS #10.</p>
 452 </dd>
 453 <dd>
 454 <p>Characters that are not encoded in logical order and are to be rearranged.
 455 If <a href="#item_uca_version"><code>UCA_Version</code></a> is 11 or less, the default is:</p>
 456 </dd>
 457 <dd>
 458 <pre>
 459     rearrange =&gt; [ 0x0E40..0x0E44, 0x0EC0..0x0EC4 ],</pre>
 460 </dd>
 461 <dd>
 462 <p>If you want to disallow any rearrangement, pass <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_undef"><code>undef</code></a> or <code>[]</code>
 463 (a reference to empty list) as the value for this key.</p>
 464 </dd>
 465 <dd>
 466 <p>If <a href="#item_uca_version"><code>UCA_Version</code></a> is 14, the default is <code>[]</code> (i.e. no rearrangement).</p>
 467 </dd>
 468 <dd>
 469 <p><strong>According to version 9 of UCA, this parameter shall not be used;
 470 however, no warning is issued at present.</strong></p>
 471 </dd>
 472 </li>
 473 <dt><strong><a name="item_table">table</a></strong>
 474
 475 <dd>
 476 <p>-- see 3.2 Default Unicode Collation Element Table, UTS #10.</p>
 477 </dd>
 478 <dd>
 479 <p>You can use another collation element table if desired.</p>
 480 </dd>
 481 <dd>
 482 <p>The table file should be located in the <em>Unicode/Collate</em> directory
 483 under a directory in <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__inc"><code>@INC</code></a>. For example, if the filename is <em>Foo.txt</em>,
 484 the table file is searched for as <em>Unicode/Collate/Foo.txt</em> in <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__inc"><code>@INC</code></a>.</p>
 485 </dd>
 486 <dd>
 487 <p>By default, <em>allkeys.txt</em> (the filename of DUCET) is used.
 488 If you prepare your own table file, a name other than <em>allkeys.txt</em>
 489 is advisable to avoid name conflicts.</p>
 490 </dd>
 491 <dd>
 492 <p>If <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_undef"><code>undef</code></a> is passed explicitly as the value for this key,
 493 no file is read (but you can define collation elements via <a href="#item_entry"><code>entry</code></a>).</p>
 494 </dd>
 495 <dd>
 496 <p>A typical way to define a collation element table
 497 without any table file:</p>
 498 </dd>
 499 <dd>
 500 <pre>
 501    $onlyABC = Unicode::Collate-&gt;new(
 502        table =&gt; undef,
 503        entry =&gt; &lt;&lt; 'ENTRIES',
 504 0061 ; [.0101.0020.0002.0061] # LATIN SMALL LETTER A
 505 0041 ; [.0101.0020.0008.0041] # LATIN CAPITAL LETTER A
 506 0062 ; [.0102.0020.0002.0062] # LATIN SMALL LETTER B
 507 0042 ; [.0102.0020.0008.0042] # LATIN CAPITAL LETTER B
 508 0063 ; [.0103.0020.0002.0063] # LATIN SMALL LETTER C
 509 0043 ; [.0103.0020.0008.0043] # LATIN CAPITAL LETTER C
 510 ENTRIES
 511     );</pre>
 512 </dd>
 513 <dd>
 514 <p>If <a href="#item_ignorename"><code>ignoreName</code></a> or <a href="#item_undefname"><code>undefName</code></a> is used, character names should be
 515 specified as a comment (following <code>#</code>) on each line.</p>
 516 </dd>
 517 </li>
 518 <dt><strong><a name="item_undefchar">undefChar</a></strong>
 519
 520 <dt><strong><a name="item_undefname">undefName</a></strong>
 521
 522 <dd>
 523 <p>-- see 6.3.4 Reducing the Repertoire, UTS #10.</p>
 524 </dd>
 525 <dd>
 526 <p>Undefines the collation element as if it were unassigned in the table.
 527 This reduces the size of the table.
 528 If an unassigned character appears in the string to be collated,
 529 the sort key is made from its code point
 530 as a single-character collation element,
 531 which is greater than any assigned collation element
 532 (unassigned characters are ordered among themselves by code point).
 533 However, it may be better to ignore characters
 534 that are unfamiliar to you and probably never used.</p>
 535 </dd>
 536 <dd>
 537 <p>Through <a href="#item_undefchar"><code>undefChar</code></a>, any character matching <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_qr_"><code>qr/$undefChar/</code></a>
 538 will be undefined. Through <a href="#item_undefname"><code>undefName</code></a>, any character whose name
 539 (given in the <a href="#item_table"><code>table</code></a> file as a comment) matches <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_qr_"><code>qr/$undefName/</code></a>
 540 will be undefined.</p>
 541 </dd>
 542 <dd>
 543 <p>e.g. collation weights for characters beyond the BMP are not stored in the object:</p>
 544 </dd>
 545 <dd>
 546 <pre>
 547     undefChar =&gt; qr/[^\0-\x{fffd}]/,</pre>
 548 </dd>
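<dd>
<p>In the following minimal sketch, undefining the single character <code>a</code> makes it sort after <code>z</code>, as described above for unassigned characters.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my $default = Unicode::Collate->new();

# Treat "a" as if it were unassigned in the table.
my $no_a = Unicode::Collate->new(undefChar => qr/^a$/);

my $before = $default->lt("a", "z");   # true: normal alphabetical order
my $after  = $no_a->gt("a", "z");      # true: unassigned sorts after assigned
```
</pre>
</dd>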
 549 </li>
 550 <dt><strong><a name="item_upper_before_lower">upper_before_lower</a></strong>
 551
 552 <dd>
 553 <p>-- see 6.6 Case Comparisons, UTS #10.</p>
 554 </dd>
 555 <dd>
 556 <p>By default, lowercase is before uppercase.
 557 If the parameter is made true, this is reversed.</p>
 558 </dd>
 559 <dd>
 560 <p><strong>NOTE</strong>: This parameter simplemindedly assumes that any lowercase/uppercase
 561 distinctions occur at level 3, and that their weights at level 3 are the
 562 same as those mentioned in 7.3.1, UTS #10.
 563 If you define collation elements that differ from this requirement,
 564 this parameter does not work correctly.</p>
 565 </dd>
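<dd>
<p>The following sketch shows the default case order and the reversed one on a small list of letters.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my @letters = qw(b A a B);

my $default = Unicode::Collate->new();
my $upper   = Unicode::Collate->new(upper_before_lower => 1);

my @lower_first = $default->sort(@letters);   # a A b B
my @upper_first = $upper->sort(@letters);     # A a B b
```
</pre>
</dd>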
 566 </li>
 567 <dt><strong><a name="item_variable">variable</a></strong>
 568
 569 <dd>
 570 <p>-- see 3.2.2 Variable Weighting, UTS #10.</p>
 571 </dd>
 572 <dd>
 573 <p>This key enables variable weighting for variable collation elements,
 574 which are marked with an ASTERISK in the table
 575 (NOTE: many punctuation marks and symbols are variable in <em>allkeys.txt</em>).</p>
 576 </dd>
 577 <dd>
 578 <pre>
 579    variable =&gt; 'blanked', 'non-ignorable', 'shifted', or 'shift-trimmed'.</pre>
 580 </dd>
 581 <dd>
 582 <p>These names are case-insensitive.
 583 By default (if omitted), 'shifted' is used.</p>
 584 </dd>
 585 <dd>
 586 <pre>
 587    'Blanked'        Variable elements are made ignorable at levels 1 through 3;
 588                     considered at the 4th level.</pre>
 589 </dd>
 590 <dd>
 591 <pre>
 592    'Non-Ignorable'  Variable elements are not reset to ignorable.</pre>
 593 </dd>
 594 <dd>
 595 <pre>
 596    'Shifted'        Variable elements are made ignorable at levels 1 through 3;
 597                     their level 4 weight is replaced by the old level 1 weight.
 598                     Level 4 weight for Non-Variable elements is 0xFFFF.</pre>
 599 </dd>
 600 <dd>
 601 <pre>
 602    'Shift-Trimmed'  Same as 'shifted', but all FFFF's at the 4th level
 603                     are trimmed.</pre>
 604 </dd>
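<dd>
<p>The following sketch uses HYPHEN-MINUS, which is a variable collation element in <em>allkeys.txt</em>, to show how the setting changes equality. The first two collators stop at level 3, so that the level 4 weights cannot break the tie.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my $shifted = Unicode::Collate->new(level => 3, variable => 'shifted');
my $strict  = Unicode::Collate->new(level => 3, variable => 'non-ignorable');
my $full    = Unicode::Collate->new(variable => 'shifted');   # all 4 levels

my $eq_shifted = $shifted->eq("di-lemma", "dilemma");   # true: hyphen ignored
my $eq_strict  = $strict->eq("di-lemma", "dilemma");    # false: hyphen weighed
my $eq_full    = $full->eq("di-lemma", "dilemma");      # false: level 4 differs
```
</pre>
</dd>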
 605 </li>
 606 </dl>
 607 <p>
 608 </p>
 609 <h2><a name="methods_for_collation">Methods for Collation</a></h2>
 610 <dl>
 611 <dt><strong><a name="item_sort"><code>@sorted = $Collator-&gt;sort(@not_sorted)</code></a></strong>
 612
 613 <dd>
 614 <p>Sorts a list of strings.</p>
 615 </dd>
 616 </li>
 617 <dt><strong><a name="item_cmp"><code>$result = $Collator-&gt;cmp($a, $b)</code></a></strong>
 618
 619 <dd>
 620 <p>Returns 1 (when <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__a"><code>$a</code></a> is greater than <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__b"><code>$b</code></a>)
 621 or 0 (when <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__a"><code>$a</code></a> is equal to <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__b"><code>$b</code></a>)
 622 or -1 (when <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__a"><code>$a</code></a> is less than <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item__b"><code>$b</code></a>).</p>
 623 </dd>
 624 </li>
 625 <dt><strong><a name="item_eq"><code>$result = $Collator-&gt;eq($a, $b)</code></a></strong>
 626
 627 <dt><strong><a name="item_ne"><code>$result = $Collator-&gt;ne($a, $b)</code></a></strong>
 628
 629 <dt><strong><a name="item_lt"><code>$result = $Collator-&gt;lt($a, $b)</code></a></strong>
 630
 631 <dt><strong><a name="item_le"><code>$result = $Collator-&gt;le($a, $b)</code></a></strong>
 632
 633 <dt><strong><a name="item_gt"><code>$result = $Collator-&gt;gt($a, $b)</code></a></strong>
 634
 635 <dt><strong><a name="item_ge"><code>$result = $Collator-&gt;ge($a, $b)</code></a></strong>
 636
 637 <dd>
 638 <p>They work like the Perl operators of the same name.</p>
 639 </dd>
 640 <dd>
 641 <pre>
 642    eq : whether $a is equal to $b.
 643    ne : whether $a is not equal to $b.
 644    lt : whether $a is less than $b.
 645    le : whether $a is less than or equal to $b.
 646    gt : whether $a is greater than $b.
 647    ge : whether $a is greater than or equal to $b.</pre>
 648 </dd>
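<dd>
<p>The following sketch uses these methods with a level 1 collator, including <code>cmp</code> inside an ordinary <code>sort</code> block. The names are made up for the example.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new(level => 1);   # ignore accents and case

# cmp() can be used directly inside a sort block.
my @names  = ("\x{C9}mile", "eve", "Adam");
my @sorted = sort { $collator->cmp($a, $b) } @names;   # Adam, \x{C9}mile, eve

my $same    = $collator->eq("PERL", "perl");           # true at level 1
my $smaller = $collator->lt("\x{E9}clair", "zebra");   # true
```
</pre>
</dd>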
 649 </li>
 650 <dt><strong><a name="item_getsortkey"><code>$sortKey = $Collator-&gt;getSortKey($string)</code></a></strong>
 651
 652 <dd>
 653 <p>-- see 4.3 Form Sort Key, UTS #10.</p>
 654 </dd>
 655 <dd>
 656 <p>Returns a sort key.</p>
 657 </dd>
 658 <dd>
 659 <p>Comparing sort keys with a binary comparison (such as Perl's <code>cmp</code>)
 660 gives the same result as comparing the strings themselves with UCA.</p>
 661 </dd>
 662 <dd>
 663 <pre>
 664    $Collator-&gt;getSortKey($a) cmp $Collator-&gt;getSortKey($b)</pre>
 665 </dd>
 666 <dd>
 667 <pre>
 668       is equivalent to</pre>
 669 </dd>
 670 <dd>
 671 <pre>
 672    $Collator-&gt;cmp($a, $b)</pre>
 673 </dd>
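<dd>
<p>When the same strings are compared many times, for example when sorting records by a field, each sort key can be computed once and then compared with plain <code>cmp</code> (a Schwartzian transform). The following sketch uses hypothetical records with a <code>name</code> field.</p>
</dd>
<dd>
<pre>
```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new();

# Hypothetical records to be ordered by their "name" field.
my @people = (
    { name => "zoe" },
    { name => "\x{C9}mile" },
    { name => "eve" },
    { name => "Adam" },
);

# Compute each sort key once, sort on the keys, then drop them.
my @by_name = map  { $_->[1] }
              sort { $a->[0] cmp $b->[0] }
              map  { [ $collator->getSortKey($_->{name}), $_ ] } @people;
```
</pre>
</dd>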
 674 </li>
 675 <dt><strong><a name="item_viewsortkey"><code>$sortKeyForm = $Collator-&gt;viewSortKey($string)</code></a></strong>
 676
 677 <dd>
 678 <p>Returns a human-readable representation of the sort key of <code>$string</code>.
 679 If <a href="#item_uca_version"><code>UCA_Version</code></a> is 8, the output is slightly different.</p>
 680 </dd>
 681 <dd>
 682 <pre>
 683    use Unicode::Collate;
 684    my $c = Unicode::Collate-&gt;new();
 685    print $c-&gt;viewSortKey(&quot;Perl&quot;),&quot;\n&quot;;</pre>
 686 </dd>
 687 <dd>
 688 <pre>
 689    # output:
 690    # [0B67 0A65 0B7F 0B03 | 0020 0020 0020 0020 | 0008 0002 0002 0002 | FFFF FFFF FFFF FFFF]
 691    #  Level 1               Level 2               Level 3               Level 4</pre>
 692 </dd>
 693 </li>
 694 </dl>
 695 <p>
 696 </p>
 697 <h2><a name="methods_for_searching">Methods for Searching</a></h2>
 698 <p><strong>DISCLAIMER:</strong> If the <a href="#item_preprocess"><code>preprocess</code></a> or <a href="#item_normalization"><code>normalization</code></a> parameter is true
 699 for <code>$Collator</code>, calling these methods (<a href="#item_index"><code>index</code></a>, <a href="#item_match"><code>match</code></a>, <a href="#item_gmatch"><code>gmatch</code></a>,
 700 <a href="#item_subst"><code>subst</code></a>, <a href="#item_gsubst"><code>gsubst</code></a>) croaks,
 701 as the position and the length might differ
 702 from those in the specified string.
 703 (The <a href="#item_rearrange"><code>rearrange</code></a> and <a href="#item_hangul_terminator"><code>hangul_terminator</code></a> parameters are ignored.)</p>
 704 <p>The <a href="#item_match"><code>match</code></a>, <a href="#item_gmatch"><code>gmatch</code></a>, <a href="#item_subst"><code>subst</code></a>, <a href="#item_gsubst"><code>gsubst</code></a> methods work
 705 like <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_m_"><code>m//</code></a>, <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_m_"><code>m//g</code></a>, <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_s_"><code>s///</code></a>, <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_s_"><code>s///g</code></a>, respectively,
 706 but they do not support patterns; they match only a literal substring.</p>
 707 <dl>
 708 <dt><strong><a name="item_index"><code>$position = $Collator-&gt;index($string, $substring[, $position])</code></a></strong>
 709
 710 <dt><strong><code>($position, $length) = $Collator-&gt;index($string, $substring[, $position])</code></strong>
 711
 712 <dd>
 713 <p>If <code>$substring</code> matches a part of <code>$string</code>, returns
 714 the position of the first occurrence of the matching part in scalar context;
 715 in list context, returns a two-element list of
 716 the position and the length of the matching part.</p>
 717 </dd>
 718 <dd>
 719 <p>If <code>$substring</code> does not match any part of <code>$string</code>,
 720 returns <code>-1</code> in scalar context and
 721 an empty list in list context.</p>
 722 </dd>
 723 <dd>
 724 <p>e.g. you say</p>
 725 </dd>
 726 <dd>
 727 <pre>
       use utf8;    # the string literals below contain non-ASCII characters
 728   my $Collator = Unicode::Collate-&gt;new( normalization =&gt; undef, level =&gt; 1 );
 729                                      # (normalization =&gt; undef) is REQUIRED.
 730   my $str = &quot;Ich muß studieren Perl.&quot;;
 731   my $sub = &quot;MÜSS&quot;;
 732   my $match;
 733   if (my($pos,$len) = $Collator-&gt;index($str, $sub)) {
 734       $match = substr($str, $pos, $len);
 735   }</pre>
 736 </dd>
 737 <dd>
 738 <p>and get <code>&quot;muß&quot;</code> in <code>$match</code> since <code>&quot;muß&quot;</code>
 739 is primary equal to <code>&quot;MÜSS&quot;</code>.</p>
 740 </dd>
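<dd>
<p>Because <code>index</code> accepts an optional starting <code>$position</code>, every occurrence
can be found by restarting the search just past the previous match. The following is a
minimal sketch of that loop; <code>all_matches</code> is a hypothetical helper, not part of
this module, and the collator is created with <code>normalization =&gt; undef</code> as in the example above.</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;
    use Unicode::Collate;

    # normalization => undef is required for index(), as noted above.
    my $Collator = Unicode::Collate->new(normalization => undef, level => 1);

    # Hypothetical helper (not part of Unicode::Collate): returns a list of
    # [position, length] pairs, one per non-overlapping match.
    sub all_matches {
        my ($collator, $string, $substring) = @_;
        my @found;
        my $start = 0;
        while (my ($pos, $len) = $collator->index($string, $substring, $start)) {
            push @found, [$pos, $len];
            $start = $pos + ($len || 1);   # always move forward, even after an empty match
        }
        return @found;
    }

    my $str = "Perl perl PERL";
    for my $m (all_matches($Collator, $str, "perl")) {
        print substr($str, $m->[0], $m->[1]), " at $m->[0]\n";
    }</pre>
</dd>
<dd>
<p>Calling <code>index</code> in list context is what ends the loop: the empty list
returned when nothing more matches makes the list assignment false.</p>
</dd>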
</li>
<dt><strong><a name="item_match"><code>$match_ref = $Collator-&gt;match($string, $substring)</code></a></strong>

<dt><strong><code>($match)   = $Collator-&gt;match($string, $substring)</code></strong>

<dd>
<p>If <code>$substring</code> matches a part of <code>$string</code>, returns,
in scalar context, <strong>a reference to</strong> the first occurrence of the matching part
(<code>$match_ref</code> is always true when there is a match,
since every reference is <strong>true</strong>);
in list context, returns the first occurrence of the matching part itself.</p>
</dd>
<dd>
<p>If <code>$substring</code> does not match any part of <code>$string</code>,
returns <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_undef"><code>undef</code></a> in scalar context and
an empty list in list context.</p>
</dd>
<dd>
<p>For example:</p>
</dd>
<dd>
<pre>
    if ($match_ref = $Collator-&gt;match($str, $sub)) { # scalar context
        print &quot;matches [$$match_ref].\n&quot;;
    } else {
        print &quot;doesn't match.\n&quot;;
    }</pre>
</dd>
<dd>
<p>or</p>
</dd>
<dd>
<pre>
    if (($match) = $Collator-&gt;match($str, $sub)) { # list context
        print &quot;matches [$match].\n&quot;;
    } else {
        print &quot;doesn't match.\n&quot;;
    }</pre>
</dd>
</li>
<dt><strong><a name="item_gmatch"><code>@match = $Collator-&gt;gmatch($string, $substring)</code></a></strong>

<dd>
<p>If <code>$substring</code> matches a part of <code>$string</code>, returns
all the matching parts in list context, or the number of matches in scalar context.</p>
</dd>
<dd>
<p>If <code>$substring</code> does not match any part of <code>$string</code>,
returns an empty list.</p>
</dd>
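<dd>
<p>A short sketch of both calling contexts, assuming a level-1 collator created with
<code>normalization =&gt; undef</code>, as in the examples for
<a href="#item_index"><code>index</code></a> and <a href="#item_gsubst"><code>gsubst</code></a>;
the sample string is illustrative only.</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;
    use Unicode::Collate;

    my $Collator = Unicode::Collate->new(normalization => undef, level => 1);
    my $str = "Camel donkey CAMEL horse camel";

    my @parts = $Collator->gmatch($str, "camel");   # list context: the matching parts
    my $count = $Collator->gmatch($str, "camel");   # scalar context: the number of matches
    print "$count match(es): @parts\n";</pre>
</dd>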
</li>
<dt><strong><a name="item_subst"><code>$count = $Collator-&gt;subst($string, $substring, $replacement)</code></a></strong>

<dd>
<p>If <code>$substring</code> matches a part of <code>$string</code>,
the first occurrence of the matching part is replaced by <code>$replacement</code>
(<code>$string</code> is modified in place), and <code>$count</code> is returned (always <code>1</code>).</p>
</dd>
<dd>
<p><code>$replacement</code> can also be a <code>CODEREF</code>,
which receives the matching part as its argument
and returns the string that replaces it
(somewhat like <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_s_"><code>s/(..)/$coderef-&gt;($1)/e</code></a>).</p>
</dd>
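<dd>
<p>A minimal sketch of <code>subst</code> with a <code>CODEREF</code> replacement, again assuming a
level-1 collator with <code>normalization =&gt; undef</code>. Only the first match is replaced,
and because <code>$string</code> is modified in place, it must be a variable rather than a literal.</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;
    use Unicode::Collate;

    my $Collator = Unicode::Collate->new(normalization => undef, level => 1);

    my $str = "Camel donkey CAMEL";
    # The code ref receives the matched text and returns its replacement.
    my $count = $Collator->subst($str, "camel", sub { "[$_[0]]" });
    print "$count: $str\n";</pre>
</dd>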
</li>
<dt><strong><a name="item_gsubst"><code>$count = $Collator-&gt;gsubst($string, $substring, $replacement)</code></a></strong>

<dd>
<p>If <code>$substring</code> matches a part of <code>$string</code>,
all occurrences of the matching part are replaced by <code>$replacement</code>
(<code>$string</code> is modified in place), and the number of replacements, <code>$count</code>, is returned.</p>
</dd>
<dd>
<p><code>$replacement</code> can also be a <code>CODEREF</code>,
which receives the matching part as its argument
and returns the string that replaces it
(somewhat like <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_s_"><code>s/(..)/$coderef-&gt;($1)/eg</code></a>).</p>
</dd>
<dd>
<p>For example:</p>
</dd>
<dd>
<pre>
  my $Collator = Unicode::Collate-&gt;new( normalization =&gt; undef, level =&gt; 1 );
                                     # (normalization =&gt; undef) is REQUIRED.
  my $str = &quot;Camel donkey zebra came\x{301}l CAMEL horse cAm\0E\0L...&quot;;
  $Collator-&gt;gsubst($str, &quot;camel&quot;, sub { &quot;&lt;b&gt;$_[0]&lt;/b&gt;&quot; });

  # now $str is &quot;&lt;b&gt;Camel&lt;/b&gt; donkey zebra &lt;b&gt;came\x{301}l&lt;/b&gt; &lt;b&gt;CAMEL&lt;/b&gt; horse &lt;b&gt;cAm\0E\0L&lt;/b&gt;...&quot;;
  # i.e., all the camels are made bold-faced.</pre>
</dd>
</li>
</dl>
<p>
</p>
<h2><a name="other_methods">Other Methods</a></h2>
<dl>
<dt><strong><a name="item_change"><code>%old_tailoring = $Collator-&gt;change(%new_tailoring)</code></a></strong>

<dd>
<p>Changes the values of the specified keys and returns their previous values.</p>
</dd>
<dd>
<pre>
    $Collator = Unicode::Collate-&gt;new(level =&gt; 4);
    $Collator-&gt;eq(&quot;perl&quot;, &quot;PERL&quot;); # false
    %old = $Collator-&gt;change(level =&gt; 2); # returns (level =&gt; 4).
    $Collator-&gt;eq(&quot;perl&quot;, &quot;PERL&quot;); # true
    $Collator-&gt;change(%old); # returns (level =&gt; 2).
    $Collator-&gt;eq(&quot;perl&quot;, &quot;PERL&quot;); # false</pre>
</dd>
<dd>
<p>Not every key can be changed.
See also <code>@Unicode::Collate::ChangeOK</code> and <code>@Unicode::Collate::ChangeNG</code>.</p>
</dd>
<dd>
<p>In scalar context, returns the modified collator itself
(it is <strong>not</strong> a clone of the original).</p>
</dd>
<dd>
<pre>
    $Collator-&gt;change(level =&gt; 2)-&gt;eq(&quot;perl&quot;, &quot;PERL&quot;); # true
    $Collator-&gt;eq(&quot;perl&quot;, &quot;PERL&quot;); # true; the maximum level is now 2.
    $Collator-&gt;change(level =&gt; 4)-&gt;eq(&quot;perl&quot;, &quot;PERL&quot;); # false</pre>
</dd>
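<dd>
<p>Because the list returned by <code>change</code> holds the previous values, a temporary
tailoring can be undone by passing that list back, as in the example above. The sketch below
wraps this pattern; <code>with_tailoring</code> is a hypothetical helper (not part of this module)
that restores the old values even if the callback dies.</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;
    use Unicode::Collate;

    # Hypothetical helper: apply %tailoring, run $code, then restore the old values.
    sub with_tailoring {
        my ($collator, $code, %tailoring) = @_;
        my %old    = $collator->change(%tailoring);   # list context: previous values
        my @result = eval { $code->() };
        my $error  = $@;
        $collator->change(%old);                      # restore, even after an exception
        die $error if $error;
        return wantarray ? @result : $result[0];
    }

    my $Collator = Unicode::Collate->new(level => 4);
    my $same = with_tailoring($Collator, sub { $Collator->eq("perl", "PERL") }, level => 2);
    print "perl and PERL are ", ($same ? "equal" : "different"), " at level 2\n";</pre>
</dd>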
</li>
<dt><strong><a name="item_version"><code>$version = $Collator-&gt;version()</code></a></strong>

<dd>
<p>Returns the version number (a string) of the Unicode Standard
on which the <a href="#item_table"><code>table</code></a> file used by the collator object is based.
If the table does not include a version line (a line starting with <code>@version</code>),
returns <code>&quot;unknown&quot;</code>.</p>
</dd>
</li>
<dt><strong><code>UCA_Version()</code></strong>

<dd>
<p>Returns the revision number of UTS #10 that this module consults.</p>
</dd>
</li>
<dt><strong><a name="item_base_unicode_version"><code>Base_Unicode_Version()</code></a></strong>

<dd>
<p>Returns the version number of UTS #10 that this module consults
(UTS #10 version numbers follow those of the Unicode Standard).</p>
</dd>
</li>
</dl>
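<p>The three version numbers above can be printed side by side. A minimal sketch,
assuming the default table can be loaded; <code>UCA_Version</code> and
<code>Base_Unicode_Version</code> are called here as plain package functions, and the
printed values depend on the installed module and table.</p>
<pre>
    use strict;
    use warnings;
    use Unicode::Collate;

    my $Collator = Unicode::Collate->new();
    print "table (Unicode) version: ", $Collator->version(), "\n";
    print "UTS #10 revision:        ", Unicode::Collate::UCA_Version(), "\n";
    print "UTS #10 version:         ", Unicode::Collate::Base_Unicode_Version(), "\n";</pre>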
<p>
</p>
<hr />
<h1><a name="export">EXPORT</a></h1>
<p>No method is exported.</p>
<p>
</p>
<hr />
<h1><a name="install">INSTALL</a></h1>
<p>Although this module can be used without any <a href="#item_table"><code>table</code></a> file,
it is easier to use if you install a table file
in the UCA format by copying it into the directory
&lt;a place in @INC&gt;/Unicode/Collate.</p>
<p>The preferred table is &quot;The Default Unicode Collation Element Table&quot;
(DUCET), available from the Unicode Consortium's website:</p>
<pre>
   <a href="http://www.unicode.org/Public/UCA/">http://www.unicode.org/Public/UCA/</a></pre>
<pre>
   <a href="http://www.unicode.org/Public/UCA/latest/allkeys.txt">http://www.unicode.org/Public/UCA/latest/allkeys.txt</a> (latest version)</pre>
<p>If DUCET is not installed, it is recommended to copy the file
from <a href="http://www.unicode.org/Public/UCA/latest/allkeys.txt">http://www.unicode.org/Public/UCA/latest/allkeys.txt</a>
to &lt;a place in @INC&gt;/Unicode/Collate/allkeys.txt
manually.</p>
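<p>The &lt;a place in @INC&gt;/Unicode/Collate directory can be located from Perl itself.
A minimal sketch, assuming the module was loaded from an ordinary file so that
<code>%INC</code> records its path; it only reports where <code>allkeys.txt</code>
would go and whether a copy is already there, and does not install anything.</p>
<pre>
    use strict;
    use warnings;
    use File::Spec;
    use Unicode::Collate;

    # %INC maps "Unicode/Collate.pm" to the file Perl actually loaded;
    # dropping ".pm" gives the Unicode/Collate directory next to it.
    (my $dir = $INC{'Unicode/Collate.pm'}) =~ s/\.pm\z//;
    my $table = File::Spec->catfile($dir, 'allkeys.txt');

    my $status = -e $table ? "found" : "missing";
    print "table directory: $dir\n";
    print "allkeys.txt:     $status ($table)\n";</pre>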
<p>
</p>
<hr />
<h1><a name="caveats">CAVEATS</a></h1>
<dl>
<dt><strong><a name="item_normalization">Normalization</a></strong>

<dd>
<p>Use of the <a href="#item_normalization"><code>normalization</code></a> parameter requires the <strong>Unicode::Normalize</strong>
module (see <a href="file://C|\msysgit\mingw\html/lib/Unicode/Normalize.html">the Unicode::Normalize manpage</a>).</p>
</dd>
<dd>
<p>If you do not need it (for example, when you do not have to
handle any combining characters),
set <code>normalization =&gt; undef</code> explicitly.</p>
</dd>
<dd>
<p>See also section 6.5, &quot;Avoiding Normalization&quot;, of UTS #10.</p>
</dd>
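<dd>
<p>When a program should run whether or not Unicode::Normalize is installed, the choice can be
made at run time. A minimal sketch, assuming that collating without normalization is acceptable
in the fallback case (results for strings containing combining characters may then be
incorrect; see the UTS #10 section cited above):</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;
    use Unicode::Collate;

    # Use the default normalization only if Unicode::Normalize can be loaded;
    # otherwise turn it off explicitly, as recommended above.
    my $have_normalize = eval { require Unicode::Normalize; 1 };
    my $Collator = $have_normalize
        ? Unicode::Collate->new()
        : Unicode::Collate->new(normalization => undef);</pre>
</dd>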
</li>
<dt><strong><a name="item_conformance_test">Conformance Test</a></strong>

<dd>
<p>The Conformance Test for the UCA is available
under <a href="http://www.unicode.org/Public/UCA/">http://www.unicode.org/Public/UCA/</a>.</p>
</dd>
<dd>
<p>For <em>CollationTest_SHIFTED.txt</em>,
use a collator created by <code>Unicode::Collate-&gt;new( )</code>;
for <em>CollationTest_NON_IGNORABLE.txt</em>, use one created by
<code>Unicode::Collate-&gt;new(variable =&gt; &quot;non-ignorable&quot;, level =&gt; 3)</code>.</p>
</dd>
<dd>
<p><strong>Unicode::Normalize is required to run the Conformance Test.</strong></p>
</dd>
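<dd>
<p>Each data line of the CollationTest files lists a test string as space-separated hexadecimal
code points, and the lines appear in sorted order. The sketch below is a rough order check under
that assumption, not the official harness: <code>count_order_errors</code> is a hypothetical helper,
and it treats everything before the first <code>;</code> on a line as the code points. To check a real
file, pass its lines (for example, read with <code>readline</code>) instead of the sample lines.</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;
    use Unicode::Collate;

    # Hypothetical helper: counts adjacent test lines that the collator
    # puts in descending order (equal strings are allowed).
    sub count_order_errors {
        my ($collator, @lines) = @_;
        my ($prev, $errors) = (undef, 0);
        for my $line (@lines) {
            next if $line =~ /^\s*#/ or $line !~ /\S/;   # skip comments and blank lines
            my ($hex) = split /;/, $line;                 # code points precede the first ";"
            my $str = join '', map { chr(hex($_)) } split ' ', $hex;
            $errors++ if defined $prev and $collator->gt($prev, $str);
            $prev = $str;
        }
        return $errors;
    }

    # For CollationTest_SHIFTED.txt; see above for the NON_IGNORABLE settings.
    my $Collator = Unicode::Collate->new();
    print count_order_errors($Collator, "0061 ;", "0041 ;", "0062 ;"), " ordering error(s)\n";</pre>
</dd>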
</li>
</dl>
<p>
</p>
<hr />
<h1><a name="author__copyright_and_license">AUTHOR, COPYRIGHT AND LICENSE</a></h1>
<p>The Unicode::Collate module for perl was written by SADAHIRO Tomoyuki,
&lt;<a href="mailto:SADAHIRO@cpan.org">SADAHIRO@cpan.org</a>&gt;. This module is Copyright (C) 2001-2005,
SADAHIRO Tomoyuki. Japan. All rights reserved.</p>
<p>This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.</p>
<p>The file Unicode/Collate/allkeys.txt was copied directly
from <a href="http://www.unicode.org/Public/UCA/4.1.0/allkeys.txt">http://www.unicode.org/Public/UCA/4.1.0/allkeys.txt</a>.
This file is Copyright (c) 1991-2005 Unicode, Inc. All rights reserved.
Distributed under the Terms of Use in <a href="http://www.unicode.org/copyright.html">http://www.unicode.org/copyright.html</a>.</p>
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<dl>
<dt><strong><a name="item_unicode_collation_algorithm__2d_uts__2310">Unicode Collation Algorithm - UTS #10</a></strong>

<dd>
<p><a href="http://www.unicode.org/reports/tr10/">http://www.unicode.org/reports/tr10/</a></p>
</dd>
</li>
<dt><strong><a name="item_table">The Default Unicode Collation Element Table (DUCET)</a></strong>

<dd>
<p><a href="http://www.unicode.org/Public/UCA/latest/allkeys.txt">http://www.unicode.org/Public/UCA/latest/allkeys.txt</a></p>
</dd>
</li>
<dt><strong><a name="item_the_conformance_test_for_the_uca">The conformance test for the UCA</a></strong>

<dd>
<p><a href="http://www.unicode.org/Public/UCA/latest/CollationTest.html">http://www.unicode.org/Public/UCA/latest/CollationTest.html</a></p>
</dd>
<dd>
<p><a href="http://www.unicode.org/Public/UCA/latest/CollationTest.zip">http://www.unicode.org/Public/UCA/latest/CollationTest.zip</a></p>
</dd>
</li>
<dt><strong><a name="item_hangul_syllable_type">Hangul Syllable Type</a></strong>

<dd>
<p><a href="http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt">http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt</a></p>
</dd>
</li>
<dt><strong><a name="item_unicode_normalization_forms__2d_uax__2315">Unicode Normalization Forms - UAX #15</a></strong>

<dd>
<p><a href="http://www.unicode.org/reports/tr15/">http://www.unicode.org/reports/tr15/</a></p>
</dd>
</li>
</dl>

</body>

</html>