mingw/html/lib/encoding.html

   1 <?xml version="1.0" ?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   3 <html xmlns="http://www.w3.org/1999/xhtml">
   4 <head>
   5 <title>encoding - allows you to write your script in non-ascii or non-utf8</title>
   6 <meta http-equiv="content-type" content="text/html; charset=utf-8" />
   7 <link rev="made" href="mailto:" />
   8 </head>
   9
  10 <body style="background-color: white">
  11 <table border="0" width="100%" cellspacing="0" cellpadding="3">
  12 <tr><td class="block" style="background-color: #cccccc" valign="middle">
  13 <big><strong><span class="block">&nbsp;encoding - allows you to write your script in non-ascii or non-utf8</span></strong></big>
  14 </td></tr>
  15 </table>
  16
  17 <p><a name="__index__"></a></p>
  18 <!-- INDEX BEGIN -->
  19
  20 <ul>
  21
  22         <li><a href="#name">NAME</a></li>
  23         <li><a href="#synopsis">SYNOPSIS</a></li>
  24         <li><a href="#abstract">ABSTRACT</a></li>
  25         <ul>
  26
  27                 <li><a href="#literal_conversions">Literal Conversions</a></li>
  28                 <li><a href="#perlio_layers_for_std_in_out_">PerlIO layers for <code>STD(IN|OUT)</code></a></li>
  29                 <li><a href="#implicit_upgrading_for_byte_strings">Implicit upgrading for byte strings</a></li>
  30         </ul>
  31
  32         <li><a href="#features_that_require_5_8_1">FEATURES THAT REQUIRE 5.8.1</a></li>
  33         <li><a href="#usage">USAGE</a></li>
  34         <li><a href="#the_filter_option">The Filter Option</a></li>
  35         <ul>
  36
  37                 <li><a href="#filterrelated_changes_at_encode_version_1_87">Filter-related changes at Encode version 1.87</a></li>
  38         </ul>
  39
  40         <li><a href="#caveats">CAVEATS</a></li>
  41         <ul>
  42
  43                 <li><a href="#not_scoped">NOT SCOPED</a></li>
  44                 <li><a href="#do_not_mix_multiple_encodings">DO NOT MIX MULTIPLE ENCODINGS</a></li>
  45                 <li><a href="#tr____with_ranges">tr/// with ranges</a></li>
  46                 <ul>
  47
  48                         <li><a href="#workaround_to_tr____">workaround to tr///;</a></li>
  49                 </ul>
  50
  51         </ul>
  52
  53         <li><a href="#example__greekperl">EXAMPLE - Greekperl</a></li>
  54         <li><a href="#known_problems">KNOWN PROBLEMS</a></li>
  55         <ul>
  56
  57                 <li><a href="#the_logic_of__locale">The Logic of :locale</a></li>
  58         </ul>
  59
  60         <li><a href="#history">HISTORY</a></li>
  61         <li><a href="#see_also">SEE ALSO</a></li>
  62 </ul>
  63 <!-- INDEX END -->
  64
  65 <hr />
  66 <p>
  67 </p>
  68 <hr />
  69 <h1><a name="name">NAME</a></h1>
  70 <p>encoding - allows you to write your script in non-ascii or non-utf8</p>
  71 <p>
  72 </p>
  73 <hr />
  74 <h1><a name="synopsis">SYNOPSIS</a></h1>
  75 <pre>
  76   use encoding &quot;greek&quot;;  # Perl like Greek to you?
  77   use encoding &quot;euc-jp&quot;; # Jperl!</pre>
  78 <pre>
  79   # or you can even do this if your shell supports your native encoding</pre>
  80 <pre>
  81   perl -Mencoding=latin2 -e '...' # Feeling centrally European?
  82   perl -Mencoding=euc-kr -e '...' # Or Korean?</pre>
  83 <pre>
  84   # more control</pre>
  85 <pre>
  86   # A simple euc-cn =&gt; utf-8 converter
  87   use encoding &quot;euc-cn&quot;, STDOUT =&gt; &quot;utf8&quot;;  while(&lt;&gt;){print};</pre>
  88 <pre>
  89   # &quot;no encoding;&quot; supported (but not scoped!)
  90   no encoding;</pre>
  91 <pre>
  92   # an alternate way, Filter
  93   use encoding &quot;euc-jp&quot;, Filter=&gt;1;
  94   # now you can use kanji identifiers -- in euc-jp!</pre>
  95 <pre>
  96   # switch on locale -
  97   # note that unless you have complete control over every environment
  98   # in which the application will ever run, you should probably
  99   # NOT use the feature of the encoding pragma that lets you write your script
 100   # in any recognized encoding, because changing locale settings will wreck
 101   # the script; you can of course still use the other features of the pragma.
 102   use encoding ':locale';</pre>
 103 <p>
 104 </p>
 105 <hr />
 106 <h1><a name="abstract">ABSTRACT</a></h1>
 107 <p>Let's start with a bit of history: Perl 5.6.0 introduced Unicode
 108 support.  You could apply <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item_substr"><code>substr()</code></a> and regexes even to complex CJK
 109 characters -- so long as the script was written in UTF-8.  But back
 110 then, text editors that supported UTF-8 were still rare and many users
 111 instead chose to write scripts in legacy encodings, giving up a whole
 112 new feature of Perl 5.6.</p>
 113 <p>Rewind to the future: starting from perl 5.8.0 with the <strong>encoding</strong>
 114 pragma, you can write your script in any encoding you like (so long
 115 as the <code>Encode</code> module supports it) and still enjoy Unicode support.
 116 This pragma achieves that by doing the following:</p>
 117 <ul>
 118 <li>
 119 <p>Internally converts all literals (<a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_q_"><code>q//, qq//, qr//, qw//, qx//</code></a>) from
 120 the encoding specified to utf8.  In Perl 5.8.1 and later, literals in
 121 <a href="#item_tr_"><code>tr///</code></a> and the <code>DATA</code> pseudo-filehandle are also converted.</p>
 122 </li>
 123 <li>
 124 <p>Changes the PerlIO layers of <code>STDIN</code> and <code>STDOUT</code> to the encoding
 125  specified.</p>
 126 </li>
 127 </ul>
 128 <p>
 129 </p>
 130 <h2><a name="literal_conversions">Literal Conversions</a></h2>
 131 <p>You can write code in EUC-JP as follows:</p>
 132 <pre>
 133   my $Rakuda = &quot;\xF1\xD1\xF1\xCC&quot;; # Camel in Kanji
 134                #&lt;-char-&gt;&lt;-char-&gt;   # 4 octets
 135   s/\bCamel\b/$Rakuda/;</pre>
 136 <p>And with <code>use encoding &quot;euc-jp&quot;</code> in effect, it is the same thing as
 137 the code in UTF-8:</p>
 138 <pre>
 139   my $Rakuda = &quot;\x{99F1}\x{99DD}&quot;; # two Unicode Characters
 140   s/\bCamel\b/$Rakuda/;</pre>
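<p>The correspondence above can also be checked without the pragma.  The
following is a minimal sketch that assumes only the <code>decode()</code> function
of the <code>Encode</code> module; the variable names are illustrative.</p>
<pre>
  use strict;
  use warnings;
  use Encode qw(decode);

  # The four EUC-JP octets from the example above, built without string escapes.
  my $octets = pack('C*', 0xF1, 0xD1, 0xF1, 0xCC);

  # Decode the octets into a Perl character string.
  my $chars = decode('euc-jp', $octets);

  # List the code point of each decoded character.
  my @codepoints = map { sprintf('U+%04X', ord($_)) } split(//, $chars);
  printf(&quot;%d octets, %d characters: %s\n&quot;,
         length($octets), length($chars), join(' ', @codepoints));</pre>
<p>Given the mapping described above, this should report 4 octets and
2 characters, U+99F1 and U+99DD.</p>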
 141 <p>
 142 </p>
 143 <h2><a name="perlio_layers_for_std_in_out_">PerlIO layers for <code>STD(IN|OUT)</code></a></h2>
 144 <p>The <strong>encoding</strong> pragma also modifies the filehandle layers of
 145 STDIN and STDOUT to the specified encoding.  Therefore,</p>
 146 <pre>
 147   use encoding &quot;euc-jp&quot;;
 148   my $message = &quot;Camel is the symbol of perl.\n&quot;;
 149   my $Rakuda = &quot;\xF1\xD1\xF1\xCC&quot;; # Camel in Kanji
 150   $message =~ s/\bCamel\b/$Rakuda/;
 151   print $message;</pre>
 152 <p>Will print ``\xF1\xD1\xF1\xCC is the symbol of perl.\n'',
 153 not ``\x{99F1}\x{99DD} is the symbol of perl.\n''.</p>
 154 <p>You can override this by giving extra arguments; see below.</p>
 155 <p>
 156 </p>
 157 <h2><a name="implicit_upgrading_for_byte_strings">Implicit upgrading for byte strings</a></h2>
 158 <p>By default, if strings operating under byte semantics and strings
 159 with Unicode character data are concatenated, the new string will
 160 be created by decoding the byte strings as <em>ISO 8859-1 (Latin-1)</em>.</p>
 161 <p>The <strong>encoding</strong> pragma changes this to use the specified encoding
 162 instead.  For example:</p>
 163 <pre>
 164     use encoding 'utf8';
 165     my $string = chr(20000); # a Unicode string
 166     utf8::encode($string);   # now it's a UTF-8 encoded byte string
 167     # concatenate with another Unicode string
 168     print length($string . chr(20000));</pre>
 169 <p>Will print <code>2</code>, because <code>$string</code> is upgraded as UTF-8.  Without
 170 <code>use encoding 'utf8';</code>, it will print <code>4</code> instead, since <code>$string</code>
 171 is three octets when interpreted as Latin-1.</p>
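<p>For comparison, the sketch below shows the default Latin-1 upgrade next to
an explicit decode that gives the same length as the pragma without relying on
it.  It assumes the <code>Encode</code> module; the variable names are
illustrative.</p>
<pre>
  use strict;
  use warnings;
  use Encode qw(decode);

  my $bytes = chr(20000);   # one Unicode character (U+4E20)
  utf8::encode($bytes);     # now three UTF-8 octets

  # Default behaviour: each octet is upgraded as one Latin-1 character.
  my $implicit = $bytes . chr(20000);

  # Explicit decoding turns the three octets back into one character first.
  my $explicit = decode('UTF-8', $bytes) . chr(20000);

  print length($implicit), ' ', length($explicit), &quot;\n&quot;;   # prints: 4 2</pre>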
 172 <p>
 173 </p>
 174 <hr />
 175 <h1><a name="features_that_require_5_8_1">FEATURES THAT REQUIRE 5.8.1</a></h1>
 176 <p>Some of the features offered by this pragma require perl 5.8.1.  Most
 177 of these were implemented by Inaba Hiroto.  All other features and changes
 178 work with 5.8.0.</p>
 179 <dl>
 180 <dt><strong><a name="item__22non_2deuc_22_doublebyte_encodings">``NON-EUC'' doublebyte encodings</a></strong></dt>
 181
 182 <dd>
 183 <p>Because perl needs to parse the script before applying this pragma,
 184 encodings such as Shift_JIS and Big-5, which may contain '\' (BACKSLASH;
 185 \x5c) in the second byte, fail because the second byte may
 186 accidentally escape the quoting character that follows.  Perl 5.8.1
 187 or later fixes this problem.</p>
 188 </dd>
 189
 190 <dt><strong><a name="item_tr_">tr//</a></strong></dt>
 191
 192 <dd>
 193 <p><a href="#item_tr_"><code>tr//</code></a> was overlooked by the Perl 5 porters when they released perl 5.8.0.
 194 See the section below for details.</p>
 195 </dd>
 196
 197 <dt><strong><a name="item_data_pseudo_2dfilehandle">DATA pseudo-filehandle</a></strong></dt>
 198
 199 <dd>
 200 <p>Another feature that was overlooked was <code>DATA</code>.</p>
 201 </dd>
 202
 203 </dl>
 204 <p>
 205 </p>
 206 <hr />
 207 <h1><a name="usage">USAGE</a></h1>
 208 <dl>
 209 <dt><strong><a name="item_use_encoding__5bencname_5d__3b">use encoding [<em>ENCNAME</em>] ;</a></strong></dt>
 210
 211 <dd>
 212 <p>Sets the script encoding to <em>ENCNAME</em>.  Unless ${^UNICODE}
 213 exists and is non-zero, the PerlIO layers of STDIN and STDOUT are set to
 214 ``:encoding(<em>ENCNAME</em>)''.</p>
 215 </dd>
 216 <dd>
 217 <p>Note that STDERR WILL NOT be changed.</p>
 218 </dd>
 219 <dd>
 220 <p>Also note that non-STD file handles remain unaffected.  Use <code>use
 221 open</code> or <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_binmode"><code>binmode</code></a> to change layers of those.</p>
 222 </dd>
 223 <dd>
 224 <p>If no encoding is specified, the environment variable <em>PERL_ENCODING</em>
 225 is consulted.  If no encoding can be found, the error <code>Unknown encoding
 226 '<em>ENCNAME</em>'</code> will be thrown.</p>
 227 </dd>
 228
 229 <dt><strong><a name="item_use_encoding_encname__5b_stdin__3d_3e_encname_in__">use encoding <em>ENCNAME</em> [ STDIN =&gt; <em>ENCNAME_IN</em> ...] ;</a></strong></dt>
 230
 231 <dd>
 232 <p>You can also individually set encodings of STDIN and STDOUT via the
 233 <code>STDIN =&gt; ENCNAME</code> form.  In this case, you cannot omit the
 234 first <em>ENCNAME</em>.  <code>STDIN =&gt; undef</code> turns the IO transcoding
 235 completely off.</p>
 236 </dd>
 237 <dd>
 238 <p>When ${^UNICODE} exists and is non-zero, these options will be completely
 239 ignored.  ${^UNICODE} is a variable introduced in perl 5.8.1.  See
 240 <a href="file://C|\msysgit\mingw\html/pod/perlrun.html">the perlrun manpage</a>, <a href="file://C|\msysgit\mingw\html/pod/perlvar.html#item____unicode_">${^UNICODE} in the perlvar manpage</a>, and <a href="file://C|\msysgit\mingw\html/pod/perlrun.html#c">-C in the perlrun manpage</a> for
 241 details (perl 5.8.1 and later).</p>
 242 </dd>
 243
 244 <dt><strong><a name="item_use_encoding_encname_filter_3d_3e1_3b">use encoding <em>ENCNAME</em> Filter=&gt;1;</a></strong></dt>
 245
 246 <dd>
 247 <p>This turns the encoding pragma into a source filter.  While the
 248 default approach just decodes interpolated literals (in <code>qq()</code> and
 249 <code>qr()</code>), this will apply a source filter to the entire source code.  See
 250 <a href="#the_filter_option">The Filter Option</a> below for details.</p>
 251 </dd>
 252
 253 <dt><strong><a name="item_no_encoding_3b">no encoding;</a></strong></dt>
 254
 255 <dd>
 256 <p>Unsets the script encoding.  The layers of STDIN and STDOUT are
 257 reset to ``:raw'' (the default unprocessed raw stream of bytes).</p>
 258 </dd>
 259
 260 </dl>
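<p>As noted above, filehandles other than STDIN and STDOUT are not touched by
the pragma.  The following is a minimal sketch of setting a layer by hand with
<code>binmode</code>; the in-memory filehandle and the variable names are
assumptions made for illustration.</p>
<pre>
  use strict;
  use warnings;

  # Write to an in-memory file so that the resulting octets can be inspected.
  my $buffer = '';
  open(my $out, '&gt;', \$buffer) or die &quot;cannot open in-memory file: $!&quot;;

  # Ask PerlIO to encode everything printed to this handle as EUC-JP.
  binmode($out, ':encoding(euc-jp)') or die &quot;cannot set layer: $!&quot;;

  print {$out} chr(0x99F1), chr(0x99DD);   # two Unicode characters
  close($out) or die &quot;cannot close: $!&quot;;

  # Show the encoded octets in hexadecimal.
  my $hex = unpack('H*', $buffer);
  print &quot;$hex\n&quot;;</pre>
<p>With the EUC-JP mapping shown under Literal Conversions, the output should
be <code>f1d1f1cc</code>.</p>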
 261 <p>
 262 </p>
 263 <hr />
 264 <h1><a name="the_filter_option">The Filter Option</a></h1>
 265 <p>The magic of <code>use encoding</code> is not applied to the names of
 266 identifiers.  In order to make <code>${&quot;\x{4eba}&quot;}++</code> ($human++, where human
 267 is a single Han ideograph) work, you still need to write your script
 268 in UTF-8 -- or use a source filter.  That's what 'Filter=&gt;1' does.</p>
 269 <p>What does this mean?  Your source code behaves as if it is written in
 270 UTF-8 with 'use utf8' in effect.  So even if your editor only supports
 271 Shift_JIS, for example, you can still try examples in Chapter 15 of
 272 <code>Programming Perl, 3rd Ed.</code>  For instance, you can use UTF-8
 273 identifiers.</p>
 274 <p>This option is significantly slower and (as of this writing) non-ASCII
 275 identifiers are not very stable WITHOUT this option and with the
 276 source code written in UTF-8.</p>
 277 <p>
 278 </p>
 279 <h2><a name="filterrelated_changes_at_encode_version_1_87">Filter-related changes at Encode version 1.87</a></h2>
 280 <ul>
 281 <li>
 282 <p>The Filter option now sets STDIN and STDOUT like the non-filter options,
 283 and <code>STDIN=&gt;ENCODING</code> and <code>STDOUT=&gt;ENCODING</code> work as in the
 284 non-filter version.</p>
 285 </li>
 286 <li>
 287 <p><code>use utf8</code> is implicitly declared so you no longer have to <code>use
 288 utf8</code> to <code>${&quot;\x{4eba}&quot;}++</code>.</p>
 289 </li>
 290 </ul>
 291 <p>
 292 </p>
 293 <hr />
 294 <h1><a name="caveats">CAVEATS</a></h1>
 295 <p>
 296 </p>
 297 <h2><a name="not_scoped">NOT SCOPED</a></h2>
 298 <p>The pragma applies per script, not per lexical block.  Only the last
 299 <code>use encoding</code> or <code>no encoding</code> matters, and it affects
 300 <strong>the whole script</strong>.  The <code>no encoding</code> pragma is supported, and
 301 <strong>use encoding</strong> can appear as many times as you want in a given script,
 302 but using this pragma more than once is discouraged.</p>
 303 <p>For the same reason, the use of this pragma inside modules is also
 304 discouraged (though not as strongly as the case above;
 305 see below).</p>
 306 <p>If you still have to write a module with this pragma, be very careful
 307 about the load order.  See the code below:</p>
 308 <pre>
 309   # called module
 310   package Module_IN_BAR;
 311   use encoding &quot;bar&quot;;
 312   # stuff in &quot;bar&quot; encoding here
 313   1;</pre>
 314 <pre>
 315   # caller script
 316   use encoding &quot;foo&quot;;
 317   use Module_IN_BAR;
 318   # surprise! use encoding &quot;bar&quot; is in effect.</pre>
 319 <p>The best way to avoid this oddity is to use this pragma RIGHT AFTER
 320 other modules are loaded, that is:</p>
 321 <pre>
 322   use Module_IN_BAR;
 323   use encoding &quot;foo&quot;;</pre>
 324 <p>
 325 </p>
 326 <h2><a name="do_not_mix_multiple_encodings">DO NOT MIX MULTIPLE ENCODINGS</a></h2>
 327 <p>Notice that only literals (string or regular expression) having only
 328 legacy code points are affected: if you mix data like this</p>
 329 <pre>
 330         \xDF\x{100}</pre>
 331 <p>the data is assumed to be in (Latin 1 and) Unicode, not in your native
 332 encoding.  In other words, this will match in ``greek'':</p>
 333 <pre>
 334         &quot;\xDF&quot; =~ /\x{3af}/</pre>
 335 <p>but this will not</p>
 336 <pre>
 337         &quot;\xDF\x{100}&quot; =~ /\x{3af}\x{100}/</pre>
 338 <p>since the <code>\xDF</code> (ISO 8859-7 GREEK SMALL LETTER IOTA WITH TONOS) on
 339 the left will <strong>not</strong> be upgraded to <code>\x{3af}</code> (Unicode GREEK SMALL
 340 LETTER IOTA WITH TONOS) because of the <code>\x{100}</code> on the left.  You
 341 should not be mixing your legacy data and Unicode in the same string.</p>
 342 <p>This pragma also affects encoding of the 0x80..0xFF code point range:
 343 normally characters in that range are left as eight-bit bytes (unless
 344 they are combined with characters with code points 0x100 or larger,
 345 in which case all characters need to become UTF-8 encoded), but if
 346 the <code>encoding</code> pragma is present, even the 0x80..0xFF range always
 347 gets UTF-8 encoded.</p>
 348 <p>After all, the best thing about this pragma is that you don't have to
 349 resort to \x{....} just to spell your name in a native encoding.
 350 So feel free to put your strings in your encoding in quotes and
 351 regexes.</p>
 352 <p>
 353 </p>
 354 <h2><a name="tr____with_ranges">tr/// with ranges</a></h2>
 355 <p>The <strong>encoding</strong> pragma works by decoding string literals in
 356 <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_q_"><code>q//, qq//, qr//, qw//, qx//</code></a> and so forth.  In perl 5.8.0, this
 357 does not apply to <a href="#item_tr_"><code>tr///</code></a>.  Therefore,</p>
 358 <pre>
 359   use encoding 'euc-jp';
 360   #....
 361   $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;
 362   #           -------- -------- -------- --------</pre>
 363 <p>Does not work as</p>
 364 <pre>
 365   $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;</pre>
 366 <dl>
 367 <dt><strong><a name="item_legend_of_characters_above">Legend of characters above</a></strong></dt>
 368
 369 <dd>
 370 <pre>
 371   utf8     euc-jp   charnames::viacode()
 372   -----------------------------------------
 373   \x{3041} \xA4\xA1 HIRAGANA LETTER SMALL A
 374   \x{3093} \xA4\xF3 HIRAGANA LETTER N
 375   \x{30a1} \xA5\xA1 KATAKANA LETTER SMALL A
 376   \x{30f3} \xA5\xF3 KATAKANA LETTER N</pre>
 377 </dd>
 378 </dl>
 379 <p>This counterintuitive behavior has been fixed in perl 5.8.1.</p>
 380 <p>
 381 </p>
 382 <h3><a name="workaround_to_tr____">workaround to tr///;</a></h3>
 383 <p>In perl 5.8.0, you can work around this as follows:</p>
 384 <pre>
 385   use encoding 'euc-jp';
 386   #  ....
 387   eval qq{ \$kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/ };</pre>
 388 <p>Note that the <a href="#item_tr_"><code>tr//</code></a> expression is surrounded by <code>qq{}</code>.  The idea behind
 389 it is the same as the classic idiom that makes <a href="#item_tr_"><code>tr///</code></a> 'interpolate':</p>
 390 <pre>
 391    tr/$from/$to/;            # wrong!
 392    eval qq{ tr/$from/$to/ }; # workaround.</pre>
 393 <p>Nevertheless, under the <strong>encoding</strong> pragma even <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_q_"><code>q//</code></a> is affected, so
 394 <a href="#item_tr_"><code>tr///</code></a> not being decoded was clearly against the intent of the Perl 5
 395 Porters; this has been fixed in Perl 5.8.1 and later.</p>
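<p>The interpolation idiom itself does not depend on the pragma.  The sketch
below applies it to plain ASCII ranges; the variable names and ranges are
illustrative.</p>
<pre>
  use strict;
  use warnings;

  my ($from, $to) = ('a-c', 'A-C');   # illustrative ranges
  my $text = 'abcdef';

  # qq{} interpolates $from and $to first; eval then compiles the resulting
  # tr/// expression.  The backslash keeps $text from being interpolated.
  my $count = eval qq{ \$text =~ tr/$from/$to/ };
  die $@ if $@;

  print &quot;$text ($count characters changed)\n&quot;;   # prints: ABCdef (3 characters changed)</pre>
<p>Because the string is compiled at run time, any syntax error in the ranges
only shows up in <code>$@</code>, which is why it is checked here.</p>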
 396 <p>
 397 </p>
 398 <hr />
 399 <h1><a name="example__greekperl">EXAMPLE - Greekperl</a></h1>
 400 <pre>
 401     use encoding &quot;iso 8859-7&quot;;</pre>
 402 <pre>
 403     # \xDF in ISO 8859-7 (Greek) is \x{3af} in Unicode.</pre>
 404 <pre>
 405     $a = &quot;\xDF&quot;;
 406     $b = &quot;\x{100}&quot;;</pre>
 407 <pre>
 408     printf &quot;%#x\n&quot;, ord($a); # will print 0x3af, not 0xdf</pre>
 409 <pre>
 410     $c = $a . $b;</pre>
 411 <pre>
 412     # $c will be &quot;\x{3af}\x{100}&quot;, not &quot;\x{df}\x{100}&quot;.</pre>
 413 <pre>
 414     # chr() is affected, and ...</pre>
 415 <pre>
 416     print &quot;mega\n&quot;  if ord(chr(0xdf)) == 0x3af;</pre>
 417 <pre>
 418     # ... ord() is affected by the encoding pragma ...</pre>
 419 <pre>
 420     print &quot;tera\n&quot; if ord(pack(&quot;C&quot;, 0xdf)) == 0x3af;</pre>
 421 <pre>
 422     # ... as are eq and cmp ...</pre>
 423 <pre>
 424     print &quot;peta\n&quot; if &quot;\x{3af}&quot; eq  pack(&quot;C&quot;, 0xdf);
 425     print &quot;exa\n&quot;  if (&quot;\x{3af}&quot; cmp pack(&quot;C&quot;, 0xdf)) == 0;
 426 <pre>
 427     # ... but pack/unpack C are not affected, in case you still
 428     # want to go back to your native encoding</pre>
 429 <pre>
 430     print &quot;zetta\n&quot; if unpack(&quot;C&quot;, (pack(&quot;C&quot;, 0xdf))) == 0xdf;</pre>
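<p>The same octet-to-character mapping can be reproduced explicitly, without
the pragma.  This is a minimal sketch assuming the <code>Encode</code> module;
the variable names are illustrative.</p>
<pre>
  use strict;
  use warnings;
  use Encode qw(decode encode);

  # One ISO 8859-7 octet and the character it stands for.
  my $octet = pack('C', 0xDF);
  my $char  = decode('iso-8859-7', $octet);
  printf(&quot;%#x\n&quot;, ord($char));           # prints 0x3af

  # Encoding the character again returns the original octet.
  my $back = encode('iso-8859-7', $char);
  printf(&quot;%#x\n&quot;, unpack('C', $back));   # prints 0xdf</pre>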
 431 <p>
 432 </p>
 433 <hr />
 434 <h1><a name="known_problems">KNOWN PROBLEMS</a></h1>
 435 <dl>
 436 <dt><strong><a name="item_literals_in_regex_that_are_longer_than_127_bytes">literals in regex that are longer than 127 bytes</a></strong></dt>
 437
 438 <dd>
 439 <p>For native multibyte encodings (either fixed or variable length),
 440 the current implementation of the regular expressions may introduce
 441 recoding errors for regular expression literals longer than 127 bytes.</p>
 442 </dd>
 443
 444 <dt><strong><a name="item_ebcdic">EBCDIC</a></strong></dt>
 445
 446 <dd>
 447 <p>The encoding pragma is not supported on EBCDIC platforms.
 448 (Porters who are willing and able to remove this limitation are
 449 welcome.)</p>
 450 </dd>
 451
 452 <dt><strong><a name="item_format">format</a></strong></dt>
 453
 454 <dd>
 455 <p>This pragma doesn't work well with format because PerlIO does not
 456 get along very well with it.  When a format contains non-ASCII
 457 characters, it prints garbled output or triggers ``wide character'' warnings.
 458 To see the problem, try the code below.</p>
 459 </dd>
 460 <dd>
 461 <pre>
 462   # Save this one in utf8
 463   # replace *non-ascii* with a non-ascii string
 464   my $camel;
 465   format STDOUT =
 466   *non-ascii*@&gt;&gt;&gt;&gt;&gt;&gt;&gt;
 467   $camel
 468   .
 469   $camel = &quot;*non-ascii*&quot;;
 470   binmode(STDOUT=&gt;':encoding(utf8)'); # bang!
 471   write;              # funny
 472   print $camel, &quot;\n&quot;; # fine</pre>
 473 </dd>
 474 <dd>
 475 <p>Without binmode, write() happens to work, but then <a href="file://C|\msysgit\mingw\html/pod/perlfunc.html#item_print"><code>print()</code></a>
 476 fails instead of write().</p>
 477 </dd>
 478 <dd>
 479 <p>At any rate, the very use of format is questionable when it comes to
 480 Unicode characters since you have to consider such things as character
 481 width (e.g. double-width for ideographs) and direction (e.g. BIDI for
 482 Arabic and Hebrew).</p>
 483 </dd>
 484
 485 </dl>
 486 <p>
 487 </p>
 488 <h2><a name="the_logic_of__locale">The Logic of :locale</a></h2>
 489 <p>The logic of <code>:locale</code> is as follows:</p>
 490 <ol>
 491 <li>
 492 <p>If the platform supports the <code>langinfo(CODESET)</code> interface, the codeset
 493 returned is used as the default encoding for the open pragma.</p>
 494 </li>
 495 <li>
 496 <p>If 1. didn't work but we are under the locale pragma, the environment
 497 variables LC_ALL and LANG (in that order) are matched for encodings
 498 (the part after <code>.</code>, if any), and if one is found, it is used
 499 as the default encoding for the open pragma.</p>
 500 </li>
 501 <li>
 502 <p>If 1. and 2. didn't work, the environment variables LC_ALL and LANG
 503 (in that order) are matched for anything looking like UTF-8, and if
 504 a match is found, <code>:utf8</code> is used as the default encoding for the open
 505 pragma.</p>
 506 </li>
 507 </ol>
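<p>Steps 2 and 3 can be sketched as follows.  This is an illustrative
approximation, not the pragma's actual implementation: the function name
<code>guess_locale_encoding</code> and its regular expressions are assumptions,
and both step 1 (<code>langinfo(CODESET)</code>) and the check for the locale
pragma are left out.</p>
<pre>
  use strict;
  use warnings;

  # Hypothetical helper: takes environment settings as name/value pairs and
  # returns an encoding name, or undef if nothing usable is found.
  sub guess_locale_encoding {
      my %env = @_;
      my @values = grep { defined } map { $env{$_} } ('LC_ALL', 'LANG');

      # Step 2: use the part after the dot, e.g. 'eucJP' in 'ja_JP.eucJP'.
      for my $value (@values) {
          return $1 if $value =~ /\.([^.\@]+)/;
      }

      # Step 3: otherwise accept anything that looks like UTF-8.
      for my $value (@values) {
          return 'utf8' if $value =~ /utf-?8/i;
      }

      return undef;
  }

  my $guess = guess_locale_encoding(LC_ALL =&gt; 'ja_JP.eucJP', LANG =&gt; 'C');
  print &quot;$guess\n&quot;;   # prints: eucJP</pre>
<p>Passing the settings as arguments rather than reading <code>%ENV</code>
directly keeps the sketch easy to try with different values.</p>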
 508 <p>If your locale environment variables (LC_ALL, LC_CTYPE, LANG)
 509 contain the strings 'UTF-8' or 'UTF8' (case-insensitive matching),
 510 the default encoding of your STDIN, STDOUT, and STDERR, and of
 511 <strong>any subsequent file open</strong>, is UTF-8.</p>
 512 <p>
 513 </p>
 514 <hr />
 515 <h1><a name="history">HISTORY</a></h1>
 516 <p>This pragma first appeared in Perl 5.8.0.  For features that require
 517 5.8.1 or later, see above.</p>
 518 <p>The <code>:locale</code> subpragma was implemented in 2.01, or Perl 5.8.6.</p>
 519 <p>
 520 </p>
 521 <hr />
 522 <h1><a name="see_also">SEE ALSO</a></h1>
 523 <p><a href="file://C|\msysgit\mingw\html/pod/perlunicode.html">the perlunicode manpage</a>, <a href="file://C|\msysgit\mingw\html/lib/Encode.html">the Encode manpage</a>, <a href="file://C|\msysgit\mingw\html/lib/open.html">the open manpage</a>, <a href="file://C|\msysgit\mingw\html/lib/Filter/Util/Call.html">the Filter::Util::Call manpage</a>.</p>
 524 <p>Ch. 15 of <code>Programming Perl (3rd Edition)</code>
 525 by Larry Wall, Tom Christiansen, Jon Orwant;
 526 O'Reilly &amp; Associates; ISBN 0-596-00027-8</p>
 527 <table border="0" width="100%" cellspacing="0" cellpadding="3">
 528 <tr><td class="block" style="background-color: #cccccc" valign="middle">
 529 <big><strong><span class="block">&nbsp;encoding - allows you to write your script in non-ascii or non-utf8</span></strong></big>
 530 </td></tr>
 531 </table>
 532
 533 </body>
 534
 535 </html>