PostgreSQL 7.0.1 multi-byte (MB) support README          May 20 2000

        http://www.sra.co.jp/people/t-ishii/PostgreSQL/

[Note] 1. Thanks to Mr. Tatsuo Ishii!
       2. The bracketed notes are not part of the original text. If you
          find errors in the translation, please contact cch@cc.kmu.edu.tw

MB support lets PostgreSQL handle multi-byte characters, for example
EUC (Extended Unix Code), Unicode, and Mule internal code. With MB
support enabled, you can use multi-byte characters in regular
expressions (regexp), LIKE, and some other functions. The default
encoding system is determined by the initdb(1) command when you install
PostgreSQL; it can also be set by the createdb(1) command or by the SQL
command that creates a database. So you can have multiple databases
with different encoding systems.

MB support also fixes some problems with 8-bit single-byte character
sets (including ISO-8859-1). (I am not saying that all such problems
are fixed; I have only confirmed that the regression tests run
successfully and that some French characters can be used with the MB
patch. If you find a problem while using 8-bit characters, please
let me know.)
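
To illustrate why byte counts and character counts differ in a
multi-byte encoding, here is a minimal C sketch that counts the
characters in an EUC_JP string. It is not PostgreSQL's own code: the
names euc_jp_mblen and euc_jp_char_length are hypothetical, and the byte
ranges assume the usual EUC_JP layout (0x8e and 0x8f as single-shift
prefixes, other bytes with the high bit set as two-byte lead bytes).

```c
#include <stddef.h>

/* Byte length of the EUC_JP character starting at s (hypothetical helper). */
static size_t euc_jp_mblen(const unsigned char *s)
{
    if (*s == 0x8e)     /* SS2 prefix: half-width katakana, 2 bytes */
        return 2;
    if (*s == 0x8f)     /* SS3 prefix: JIS X 0212, 3 bytes */
        return 3;
    if (*s & 0x80)      /* JIS X 0208 lead byte, 2 bytes */
        return 2;
    return 1;           /* plain ASCII */
}

/* Count characters (not bytes) in a NUL-terminated EUC_JP string. */
size_t euc_jp_char_length(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t n = 0;

    while (*s != '\0') {
        size_t len = euc_jp_mblen(s);

        /* A truncated final character counts as one character;
         * stop there instead of reading past the terminating NUL. */
        for (size_t i = 1; i < len; i++)
            if (s[i] == '\0')
                return n + 1;
        s += len;
        n++;
    }
    return n;
}
```

For the six-byte EUC_JP string "\xc6\xfc\xcb\xdc\xb8\xec" this function
returns 3, while strlen() returns 6. That gap is what a multi-byte aware
character_length() has to account for.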

Before compiling PostgreSQL, run configure with the multibyte option:

        % ./configure --enable-multibyte[=encoding_system]

where encoding_system is one of:

        UNICODE         Unicode (UTF-8)
        MULE_INTERNAL   Mule internal
        LATIN1          ISO 8859-1 Western European languages
        LATIN2          ISO 8859-2 Central European languages
        LATIN3          ISO 8859-3 South European languages
        LATIN4          ISO 8859-4 North European languages
        LATIN5          ISO 8859-5 Latin/Cyrillic

Example:

        % ./configure --enable-multibyte=EUC_JP

If the encoding system is omitted, the default is SQL_ASCII.

The initdb command defines the default encoding for a PostgreSQL
installation. For example:

        % initdb -E EUC_JP

sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
If you prefer the long option name, you can use "--encoding" instead of
"-E". If neither -E nor --encoding is given, the encoding chosen at
compile time becomes the default.

You can create a database that uses a different encoding:

        % createdb -E EUC_KR korean

This command creates a database named "korean" that uses the EUC_KR
encoding. Another way to achieve the same result is the SQL command:

        CREATE DATABASE korean WITH ENCODING = 'EUC_KR';

The pg_database system catalog has an "encoding" column that records
the encoding of each database. You can see which encoding a database
uses with psql -l, or with the \l command inside psql:

           Database    |  Owner  |   Encoding
        ---------------+---------+---------------
         euc_cn        | t-ishii | EUC_CN
         euc_jp        | t-ishii | EUC_JP
         euc_kr        | t-ishii | EUC_KR
         euc_tw        | t-ishii | EUC_TW
         mule_internal | t-ishii | MULE_INTERNAL
         regression    | t-ishii | SQL_ASCII
         template1     | t-ishii | EUC_JP
         test          | t-ishii | EUC_JP
         unicode       | t-ishii | UNICODE
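
3. Automatic encoding translation between frontend and backend

[Note: "frontend" refers broadly to client programs: the psql command
interpreter, a C program that uses libpq, a Perl program, or a Windows
application that connects through ODBC. "Backend" refers to the
PostgreSQL database server.]

PostgreSQL supports automatic translation between frontend and backend
for certain encodings. [Note: automatic translation here means that
when the encodings you declared for the frontend and the backend
differ, PostgreSQL converts between them for you on each access,
provided it supports translation between those two encodings.]
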
        encoding of backend     available encoding of frontend
        --------------------------------------------------------------------
        LATIN2                  LATIN2, WIN1250
        LATIN5                  LATIN5, WIN, ALT
        MULE_INTERNAL           EUC_JP, SJIS, EUC_KR, EUC_CN,
                                EUC_TW, BIG5, LATIN1 to LATIN5,
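
A frontend program might want to check a backend/frontend pair before
asking for translation. The following is a minimal C sketch of such a
lookup. It encodes only the rows that appear in the table above, so it
is incomplete by construction; the struct conv_row type and the
function frontend_encoding_allowed are hypothetical names, not part of
PostgreSQL or libpq. Treating identical encodings as always allowed is
an assumption of this sketch.

```c
#include <string.h>

/* One row of the translation table: a backend encoding and the
 * frontend encodings it can be translated to (NULL-terminated). */
struct conv_row {
    const char *backend;
    const char *frontends[12];
};

/* Only the rows listed in the README table; not a complete list. */
static const struct conv_row conv_table[] = {
    { "LATIN2",        { "LATIN2", "WIN1250", NULL } },
    { "LATIN5",        { "LATIN5", "WIN", "ALT", NULL } },
    { "MULE_INTERNAL", { "EUC_JP", "SJIS", "EUC_KR", "EUC_CN",
                         "EUC_TW", "BIG5", "LATIN1", "LATIN2",
                         "LATIN3", "LATIN4", "LATIN5", NULL } },
};

/* Return 1 if the pair is listed (or identical), 0 otherwise. */
int frontend_encoding_allowed(const char *backend, const char *frontend)
{
    size_t rows = sizeof conv_table / sizeof conv_table[0];

    /* Assumption: the same encoding on both sides needs no translation. */
    if (strcmp(backend, frontend) == 0)
        return 1;

    for (size_t i = 0; i < rows; i++) {
        if (strcmp(conv_table[i].backend, backend) != 0)
            continue;
        for (size_t j = 0; conv_table[i].frontends[j] != NULL; j++)
            if (strcmp(conv_table[i].frontends[j], frontend) == 0)
                return 1;
    }
    return 0;
}
```

With this table, frontend_encoding_allowed("LATIN2", "WIN1250") returns
1 and frontend_encoding_allowed("LATIN2", "SJIS") returns 0.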
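
Before automatic encoding translation can take place, you must tell
PostgreSQL which encoding you use on the frontend. There are several
ways to do this:

o Use the \encoding command in the psql command interpreter

  The \encoding command lets you switch the frontend encoding on the
  fly. For example, to switch the frontend encoding to SJIS, type:

        \encoding SJIS
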
o Use the libpq functions [Note: libpq is the C API library for
  PostgreSQL]

  The psql \encoding command simply calls PQsetClientEncoding() to do
  its work.

        int PQsetClientEncoding(PGconn *conn, const char *encoding)

  Here conn is a connection to the backend and encoding is the encoding
  you want to use. The function returns 0 if it sets the encoding
  successfully and -1 if it fails. To get the encoding of the current
  connection, use:

        int PQclientEncoding(const PGconn *conn)

  Note that this function returns the encoding id (an integer), not the
  encoding name string (such as "EUC_JP"). To get the encoding name
  from the encoding id, use:

        char *pg_encoding_to_char(int encoding_id)

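o Use the PGCLIENTENCODING environment variable

  If the PGCLIENTENCODING environment variable is set in the frontend's
  environment, the backend performs automatic encoding translation.

  [Note] PostgreSQL 7.0.0 through 7.0.3 have a bug: they do not
  recognize this environment variable.
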
o Use the SET CLIENT_ENCODING TO SQL command

  To set the frontend encoding, use this SQL command:

        SET CLIENT_ENCODING TO 'encoding';

  You can also use the SQL92 syntax "SET NAMES" for the same purpose:

        SET NAMES 'encoding';

  To query the current frontend encoding:

        SHOW CLIENT_ENCODING;

  To return to the original default encoding:

        RESET CLIENT_ENCODING;

  [Note] When you use the psql command interpreter, this method is not
  recommended; use \encoding instead.
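
Since some releases do not honor PGCLIENTENCODING, a frontend could
read the variable itself and issue SET CLIENT_ENCODING. The following C
sketch shows one possible way to build that statement. It is an
illustration, not something this README prescribes: the function name
build_set_client_encoding is hypothetical, and restricting encoding
names to letters, digits, and underscores is an assumption made here to
keep quotes out of the SQL text. The 0 / -1 return convention simply
mirrors PQsetClientEncoding().

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Write "SET CLIENT_ENCODING TO '<encoding>';" into buf.
 * If encoding is NULL, fall back to the PGCLIENTENCODING environment
 * variable. Returns 0 on success, -1 on failure. */
int build_set_client_encoding(char *buf, size_t bufsize, const char *encoding)
{
    if (encoding == NULL)
        encoding = getenv("PGCLIENTENCODING");
    if (encoding == NULL || *encoding == '\0')
        return -1;

    /* Assumption of this sketch: names such as EUC_JP or WIN1250
     * contain only letters, digits, and underscores. */
    for (const char *p = encoding; *p != '\0'; p++)
        if (!isalnum((unsigned char) *p) && *p != '_')
            return -1;

    int n = snprintf(buf, bufsize, "SET CLIENT_ENCODING TO '%s';", encoding);

    /* Fail if snprintf reported an error or the text was truncated. */
    return (n < 0 || (size_t) n >= bufsize) ? -1 : 0;
}
```

A caller would pass the resulting string to whatever it uses to run SQL
on the connection. When libpq is available, calling
PQsetClientEncoding() directly is the simpler route.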
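
4. About Unicode

Translation between Unicode and other encodings will probably not be
implemented until version 7.1 or later.

5. What happens if the translation is not possible?

Suppose you choose EUC_JP for the backend and LATIN1 for the frontend
(some Japanese characters cannot be translated into LATIN1). In this
case, a character that cannot be represented in the LATIN1 character
set is translated into the following form:
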
These are good sources for learning about the various kinds of encoding
systems.

ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
        Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW
        appear in section 3.2.

Unicode: http://www.unicode.org/
        The homepage of UNICODE.

        UTF-8 is defined here.

        * SJIS UDC (NEC selection IBM kanji) support contributed
        * Changes above will appear in 7.0.1

        * Add new libpq functions PQsetClientEncoding, PQclientEncoding
        * ./configure --with-mb=EUC_JP
          now should be
          ./configure --enable-multibyte=EUC_JP
        * Add SQL_ASCII regression test case
        * Add SJIS User Defined Character (UDC) support
        * All of the above will appear in 7.0

        * Add support for WIN1250 (Windows Czech) as a client encoding
          (contributed by Pavel Behal)
        * Fix some compiler warnings (contributed by Tomoaki Nishiyama)

        * Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866)
          (thanks to Oleg Broytmann for testing)
        * Fix problem with MB and locale

        * Add support for Big5 as a frontend encoding
          (you need to create a database with EUC_TW to use Big5)
        * Add regression test case for EUC_TW
          (contributed by Jonah Kuo <jonahkuo@mail.ttn.com.tw>)

        * Bugs related to SQL_ASCII support fixed

        * 6.4 release. In this version, pg_database has an "encoding"
          column that represents the database encoding

        * determine encoding at initdb/createdb rather than compile time
        * support for PGCLIENTENCODING when issuing COPY command
        * support for SQL92 syntax "SET NAMES"
        * support for LATIN2-5
        * add UNICODE regression test case
        * new test suite for MB
        * clean up source files

        * add support for the encoding translation between the backend
          and the frontend
        * new command SET CLIENT_ENCODING etc. added
        * add support for LATIN1 character set
        * enhance 8-bit cleanness

April 21, 1998 some enhancements/fixes
        * character_length(), position(), substring() are now aware of
          multi-byte characters
        * add --with-mb option to configure
        * new regression tests for EUC_KR
          (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
        * add some test cases to the EUC_JP regression test
        * fix problem in regress/regress.sh in case of System V
        * fix toupper(), tolower() to handle 8bit chars

Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1

Mar 10, 1998 PL2 released
        * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
        * add an English document (this file)
        * fix problems concerning 8-bit single byte characters

Mar 1, 1998 PL1 released

[Here is good documentation from Pavel Behal explaining how to use
WIN1250 on Windows/ODBC. Please note that Installation step 1) is not
necessary in 6.5.1 -- Tatsuo]

Version: 0.91 for PgSQL 6.5
Revised by: Tatsuo Ishii
Email: behal@opf.slu.cz
Licence: The Same as PostgreSQL

Sorry for my English and C code, I'm not native :-)

!!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

1) Change three affected files in source directories
   (I don't have time to create proper patch diffs, I don't know how)
2) Compile with locale enabled and multibyte set to LATIN2
3) Set up your installation properly; do not forget to create locale
   variables in your profile (environment). Ex. (may not be exactly true):
        LC_ALL=cs_CZ.ISO8859-2
        LC_COLLATE=cs_CZ.ISO8859-2
        LC_CTYPE=cs_CZ.ISO8859-2
        LC_MONETARY=cs_CZ.ISO8859-2
        LC_NUMERIC=cs_CZ.ISO8859-2
        LC_TIME=cs_CZ.ISO8859-2
4) You have to start the postmaster with locales set!
5) Try it with the Czech language; it has to sort
6) Install the ODBC driver for PgSQL on your M$ Windows
7) Set up your data source properly. Include this line in the "Connect
   Settings:" field of your ODBC configuration dialog:
        SET CLIENT_ENCODING = 'WIN1250';
8) Now try it again, but in Windows with ODBC.

- Depends on proper system locales; tested with RH6.0 and Slackware 3.6,
  with the cs_CZ.iso8859-2 locale
- Never try to set the server multibyte database encoding to WIN1250;
  always use LATIN2 instead. There is no WIN1250 locale in Unix
- WIN1250 encoding is usable only for M$W ODBC clients. The characters are
  recoded on the fly, to be displayed and stored back properly
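
To give an idea of what the on-the-fly recoding between WIN1250 and
LATIN2 involves, here is a minimal C sketch for the client-to-server
direction. It is not the backend's conversion code: the function names
are hypothetical, and the table is deliberately partial. It covers only
the six Czech letters whose byte values differ between WIN1250 and
ISO 8859-2; other bytes pass through unchanged, which is correct for
the remaining Czech letters but not for every WIN1250 code point.

```c
#include <stddef.h>

/* Map one WIN1250 byte to LATIN2 (ISO 8859-2).
 * Partial table: only the Czech letters that differ are handled. */
unsigned char win1250_to_latin2_byte(unsigned char c)
{
    switch (c) {
    case 0x8a: return 0xa9;   /* capital S with caron */
    case 0x8d: return 0xab;   /* capital T with caron */
    case 0x8e: return 0xae;   /* capital Z with caron */
    case 0x9a: return 0xb9;   /* small s with caron */
    case 0x9d: return 0xbb;   /* small t with caron */
    case 0x9e: return 0xbe;   /* small z with caron */
    default:   return c;      /* passed through (assumption, see above) */
    }
}

/* Recode a buffer of len bytes in place from WIN1250 to LATIN2. */
void win1250_to_latin2(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = win1250_to_latin2_byte(buf[i]);
}
```

Both encodings are single-byte, so the recoding never changes the
length of the data; a full implementation would use a complete table
for each direction.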

- it reorders your sort order depending on your LC_... setting, so don't be
  confused by the regression tests; they don't use locale
- "ch" is correctly sorted only in some newer locales (Ex. RH6.0)
- you have to insert money as '162,50' (with comma, in apostrophes!)
- not tested properly