Like anything that claims to afford security, HTML_Purifier can be circumvented
through the negligence of the people using it. This class will do its job, no
more, no less, and it's up to you to provide it with the proper information
and context for it to be effective. Things to remember:

1. UTF-8. Currently, the parser assumes that it is dealing with UTF-8. Not
ISO-8859-1 or Windows-1252: UTF-8. And definitely not "no character encoding
explicitly stated" or UTF-7. If you're not using UTF-8 as your character
encoding, you should switch. Now. Make sure any input is properly converted
to UTF-8, or the parser will mangle it badly (though this won't be a security
risk as long as you're outputting it as UTF-8). We will be adding
out-of-the-box support for the other major character encodings.

2. XHTML 1.0 Transitional. This is what the parser outputs. For the most
part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
that all web developers should be doing anyway. Regardless, NO DOCTYPE is a
NO. Quirks mode has waaaay too many quirks for a little parser to handle. We
did not select Strict, to avoid being too draconian with users, but this may
be configurable in the future.

3. IDs. They need to be unique, but without some knowledge of the rest of the
document, it's difficult to know which IDs are already taken.
%Attr.IDBlacklist needs to be set to the IDs used elsewhere in the document;
we may want to consider disallowing IDs by default to save lazy programmers.

4. [PROJECTED] Links. We're not going to try for spam protection (although
some hooks for such a module might be nice), but we may offer the ability to
accept only relative URLs. Pick whichever policy is right for you.

5. CSS. While we can prevent the most flagrant cases from affecting your
layout (such as absolutely positioned elements), no amount of code is going
to protect your pages from being attacked by garish colors and plain old
bad taste. A neat feature would be the ability to define acceptable colors
in a document, but that's not likely to be implemented for a while. In the
meantime, make sure that floated elements (permitted, since they can be
quite useful) can't mess up your layout.
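
Items 1 and 3 both leave preparation work to the caller, so here is a rough
sketch of what that work might look like. It is written in Python purely as
a language-neutral illustration and does not touch HTML_Purifier's own API.
The helper names (to_utf8, is_valid_utf8, collect_ids), the Windows-1252
sample input, and the sample page markup are all assumptions made up for
this example. The first helper re-encodes input from a known source encoding
to UTF-8 before it would be handed to the purifier. The second gathers the
IDs already used by the surrounding page, which is the kind of list
%Attr.IDBlacklist is meant to receive.

```python
from html.parser import HTMLParser
from typing import List


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode `raw` from a *known* source encoding to UTF-8.

    Decoding is strict, so bytes that are invalid in `source_encoding`
    raise UnicodeDecodeError instead of being silently mangled.
    """
    return raw.decode(source_encoding).encode("utf-8")


class _IdCollector(HTMLParser):
    """Hypothetical helper: records every non-empty id attribute it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def collect_ids(page_html: str) -> List[str]:
    """Return the sorted, de-duplicated IDs used in the surrounding page."""
    collector = _IdCollector()
    collector.feed(page_html)
    collector.close()
    return sorted(set(collector.ids))


# Sample data (assumed): "cafe" with an acute accent, encoded as Windows-1252.
legacy_input = b"caf\xe9"
clean_input = to_utf8(legacy_input, "windows-1252")

# Sample page template (assumed) whose IDs user content must not reuse.
page = '<div id="header"><ul id="nav"></ul></div><div id="footer"></div>'
taken_ids = collect_ids(page)
```

The conversion step only works if you actually know the source encoding.
Guessing it from the bytes is unreliable, which is one more reason to
standardise on UTF-8 throughout.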