From 6fe6cc890178033df801668dc735ce6403cf4545 Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang" META
tag?
-Fortunantely for us, the characters we need to write the
+Fortunately for us, the characters we need to write the
META are in ASCII, which is pretty much universal
over every character encoding that is in common use today. So,
all the web-browser has to do is parse all the way down until
@@ -526,7 +526,7 @@ you don't have to use those user-unfriendly entities.
-Websites encoded in Latin-1 (ISO-8859-1) which ocassionally need
+Websites encoded in Latin-1 (ISO-8859-1) which occasionally need
a special character outside of their scope often will use a character
entity reference to achieve the desired effect. For instance, θ can be
written &theta;, regardless of the character encoding's
@@ -584,7 +584,7 @@ disappeared off the web, so I am linking to the Web Archive copy.)
application/x-www-form-urlencoded
This is the Content-Type that GET requests must use, and POST requests
-use by default. It involves the ubiquituous percent encoding format that
+use by default. It involves the ubiquitous percent encoding format that
looks something like: %C3%86. There is no official way of
determining the character encoding of such a request, since the percent
encoding operates on a byte level, so it is usually assumed that it
@@ -674,7 +674,7 @@ it up to the module iconv to do the dirty work.
This approach, however, is not perfect. iconv is blithely unaware of HTML character entities. HTML Purifier, in order to protect against sophisticated escaping schemes, normalizes all character
-and numeric entitie references before processing the text. This leads to
+and numeric entity references before processing the text. This leads to
one important ramification:
Any character that is not supported by the target character
@@ -770,7 +770,7 @@ the text when you try to convert it to UTF-8.
You'll have to convert it to a binary field, convert it to a Shift-JIS field (the real encoding), and then finally to UTF-8. Many a website had pages irreversibly mangled because they didn't realize that they'd been deluding themselves about
-the character encoding all along, don't become the next victim.
+the character encoding all along; don't become the next victim.
For PostgreSQL, there appears to be no direct way to change the encoding of a database (as of 8.2). You will have to dump the data, and then reimport
@@ -790,7 +790,7 @@ usually supported).
-Due to the abovementioned compatibility issues, a more interoperable
+Due to the aforementioned compatibility issues, a more interoperable
way of storing UTF-8 text is to stuff it in a binary datatype.
CHAR becomes BINARY, VARCHAR becomes VARBINARY and TEXT becomes BLOB.
@@ -917,8 +917,8 @@ anyway. So we'll deal with the other two edge cases.
-Fortunantely, the folks over at Wikipedia have already done all the
+Fortunately, the folks over at Wikipedia have already done all the
heavy lifting for you. Get the CSS from the horses mouth here: Common.css, and search for ".IPA" There are also a smattering of
@@ -972,7 +972,7 @@ users.
When people claim that PHP6 will solve all our Unicode problems, they're
-misinformed. It will not fix any of the abovementioned troubles. It will,
+misinformed. It will not fix any of the aforementioned troubles. It will,
however, fix the problem we are about to discuss: processing UTF-8 text in PHP.
@@ -1035,7 +1035,7 @@ directory.
Well, that's it. Hopefully this document has served as a very practical springboard into knowledge of how UTF-8 works. You may have decided that you don't want to migrate yet: that's fine, just know
-what will happen to your output and what bug reports you may recieve.
+what will happen to your output and what bug reports you may receive.
Many other developers have already discussed the subject of Unicode, UTF-8 and internationalization, and I would like to defer to them for
-- 
2.11.4.GIT