<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Comparison - HTML Purifier</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter,
filtering, HTML_Safe, PEAR, comparison, kses, striptags,
SafeHTMLChecker" />
<meta name="author" content="Edward Z. Yang" />
<link rel="icon" href="./favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="./favicon.ico" type="image/x-icon" />
<link rel="stylesheet" href="./style.css" type="text/css" />
<!--[if lt IE 7.]><script defer="defer" type="text/javascript" src="./pngfix.js"></script><![endif]-->
</head>
<body>

<img src="./logo.png" id="logo" alt="HTML Purifier" />

<h1 id="title">Comparison</h1>
<div id="header"><a href="./"><span class="html">HTML</span> <span class="purifier">Purifier</span></a></div>

<div id="content">

<p class="lead">With the advent of
<a href="http://en.wikipedia.org/wiki/Web_2.0">Web 2.0</a>, the end user has
gone from passive consumer to active producer of content on the World Wide
Web. <a href="http://en.wikipedia.org/wiki/Wiki">Wikis</a>,
<a href="http://en.wikipedia.org/wiki/Social_software">Social Software</a> and
<a href="http://en.wikipedia.org/wiki/Blog">Blogs</a> all
put the user in control.</p>

<p>Give the user too much control, however, and you set yourself up
for <a href="http://en.wikipedia.org/wiki/Cross-site_scripting"><acronym
title="Cross Site Scripting">XSS</acronym></a> attacks. For this reason,
<acronym title="HyperText Markup Language">HTML</acronym>'s flexibility
has proven to be both a blessing and a curse, and the software that processes
it must strike a fine balance between security and usability. How do
we prevent users from injecting JavaScript or inserting malformed
<acronym title="HyperText Markup Language">HTML</acronym> while allowing
a rich syntax of tags, attributes and <acronym
title="Cascading Style Sheets">CSS</acronym>? How do we put
<acronym title="HyperText Markup Language">HTML</acronym> inside an
<acronym title="Really Simple Syndication">RSS</acronym> feed without worrying
about sloppy coding messing up <acronym
title="eXtensible Markup Language">XML</acronym> parsing?
Almost every <acronym title="PHP: Hypertext Preprocessor">PHP</acronym>
developer has come across this problem before, and many have tried
(albeit unsuccessfully) to solve it. We will analyze existing
libraries to demonstrate how they are ineffective and, of course,
how HTML Purifier solves all our problems and achieves world peace.</p>

<p>I will take no quarter and pull no punches: as of the time of writing,
no other library comes even <em>close</em> to solving the problem effectively
for richly formatted documents. However, it's important to note HTML
Purifier's mission: to filter <em>richly formatted</em> documents.
There are cases in which HTML Purifier is overkill; in such situations,
use the tool best suited for the job.</p>

<h2>Look Ma, No HTML!</h2>

<blockquote class="quote">
<div>A clever person solves a problem.</div>
<div>A wise person avoids it.</div>
<div class="attribution">Einstein</div>
</blockquote>

<p class="lead">While libraries of this type really shouldn't be
considered <acronym title="HyperText Markup Language">HTML</acronym> filters,
they are the number one method of taking user input and processing it into
something more than plain old text. These libraries forgo
<acronym title="HyperText Markup Language">HTML</acronym> and define their
own markup syntax. <a href="http://en.wikipedia.org/wiki/BBCode">BBCode</a>,
<a href="http://en.wikipedia.org/wiki/Wikitext">Wikitext</a>,
<a href="http://daringfireball.net/projects/markdown/">Markdown</a> and
<a href="http://textism.com/tools/textile/">Textile</a> are all examples of
such markup languages (although it should be noted that
Wikitext and Markdown can allow
<acronym title="HyperText Markup Language">HTML</acronym> within them).
The benefits (to those who use them, anyway) are clear: simplicity and
security.
</p>

<table>
<thead>
<tr>
<th>Markup language</th>
<th>Sample</th>
</tr>
</thead>
<tbody>
<tr>
<th>BBCode</th>
<td><tt>[b]B[/b] [i]i[/i] [url=http://www.example.com/]link[/url]</tt></td>
</tr>
<tr>
<th>Wikitext<sup>1</sup></th>
<td><tt>'''B''' ''i'' [http://www.example.com/ link]</tt></td>
</tr>
<tr>
<th>Markdown<sup>2</sup></th>
<td><tt>**B** *i* [link](http://www.example.com/)</tt></td>
</tr>
<tr>
<th>Textile</th>
<td><tt>*B* _i_ &quot;link&quot;:http://www.example.com/</tt></td>
</tr>
<tr>
<th>HTML</th>
<td><tt>&lt;b&gt;B&lt;/b&gt; &lt;i&gt;i&lt;/i&gt; &lt;a href=&quot;http://www.example.com/&quot;&gt;link&lt;/a&gt;</tt></td>
</tr>
<tr>
<th>WYSIWYG</th>
<td><b>B</b> <i>i</i> <a href="http://www.example.com/">link</a></td>
</tr>
</tbody>
</table>

<ol class="notes">
<li>The Wikitext shown is modeled after <a
href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> style.
There are many variants of Wikitext currently extant.</li>
<li>Strictly speaking, the Markdown syntax is not equivalent: bold text
is expressed as <code>&lt;strong&gt;</code> and italicized text is
expressed as <code>&lt;em&gt;</code>. Most browser default stylesheets,
however, map those two semantic tags to the associated styling, so
many users assume that they really are bold and italics (and use them
improperly for, say, book titles).</li>
</ol>
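
<p>Part of the security these languages offer comes from how they are usually
translated: the input is escaped wholesale, and only a fixed set of constructs
is turned back into markup. The following is a minimal sketch of that idea for
the BBCode row of the table. The function <tt>bbcode_to_html()</tt> is
hypothetical, it handles only <tt>[b]</tt>, <tt>[i]</tt> and <tt>[url=...]</tt>,
and it accepts only <tt>http</tt> and <tt>https</tt> link targets.</p>

```php
<?php
// Hypothetical sketch: translate a tiny BBCode subset into HTML.
// Everything is HTML-escaped first, so no raw markup from the user survives;
// only the three constructs below are turned back into tags.
function bbcode_to_html(string $input): string
{
    // Escape <, >, & and both quote characters.
    $html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // [b]...[/b] and [i]...[/i]: non-greedy, case-insensitive, across newlines.
    $html = preg_replace('#\[b\](.*?)\[/b\]#is', '<b>$1</b>', $html);
    $html = preg_replace('#\[i\](.*?)\[/i\]#is', '<i>$1</i>', $html);

    // [url=...]...[/url], but only for http and https targets.
    $html = preg_replace(
        '#\[url=(https?://[^\s\[\]]+)\](.*?)\[/url\]#is',
        '<a href="$1">$2</a>',
        $html
    );
    return $html;
}

echo bbcode_to_html('[b]B[/b] [i]i[/i] [url=http://www.example.com/]link[/url]'), "\n";
// <b>B</b> <i>i</i> <a href="http://www.example.com/">link</a>
echo bbcode_to_html('[url=javascript:alert(1)]x[/url] <script>'), "\n";
// [url=javascript:alert(1)]x[/url] &lt;script&gt;
```

<p>Note that the sketch makes no attempt to repair mismatched input such as
<tt>[b][i]x[/b][/i]</tt>, which it turns into improperly nested HTML; a real
translator needs the same well-formedness checks as an HTML filter.</p>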

<h3>Simplicity</h3>

<p class="lead"><acronym title="HyperText Markup Language">HTML</acronym>
source code is often criticized for being difficult to read. For example,
compare:</p>

<pre>
* Item 1
* Item 2
</pre>

<p>...versus:</p>

<pre>
&lt;ul&gt;
  &lt;li&gt;Item 1&lt;/li&gt;
  &lt;li&gt;Item 2&lt;/li&gt;
&lt;/ul&gt;
</pre>

<p>The difference seems obvious. However, there is something called a
<acronym title="What You See Is What You Get">WYSIWYG</acronym> editor,
which blows both cases out of the water in terms
of usability. And, when push comes to shove, it is far easier to
implement this sort of editor on top of <acronym
title="HyperText Markup Language">HTML</acronym> than on top of some obscure
markup language. And in the cases when it is done, you usually end up with
a live preview, not a true rich text editor.</p>

<blockquote class="digression">
<p>&quot;Now just wait a second,&quot; you may be saying,
&quot;<acronym title="What You See Is What You Get">WYSIWYG</acronym>
editors aren't all that great.&quot; There are many good arguments
against these editors, and intelligent people have written essays devoted to
criticizing <acronym title="What You See Is What You Get">WYSIWYG</acronym>.
In a web context, no JavaScript means no
editor, and, just as when the power is out, you must type in the code
manually. I will, however, dispel one common objection: that
these editors
<em>encourage excessive presentational markup</em>. As it turns out,
this is the case with any markup language that allows the smallest
iota of presentational tags, be it <tt>&lt;font&gt;</tt> or
<tt>[color=red]</tt>.
A good way to mitigate this trouble is to simply eliminate the
dialogue boxes that allow users to change colors or fonts (which
usually have no legitimate use) and adopt a
<acronym title="What You See Is What You Mean">WYSIWYM</acronym> scheme,
allowing users to select contextually correct formatting styles
for segments of text.</p>
</blockquote>

<p>Simplicity is also a double-edged sword. The moment any remotely
complex markup is needed, these lightweight markup languages fall
short. Sure, you can make '''this text bold''' with Wikitext, but that
infobox all &quot;rendered nicely in aqua blue&quot; will require a gaggle of
&lt;div&gt;s and <acronym title="Cascading Style Sheets">CSS</acronym>.
These languages face the same troubles as regular <acronym
title="HyperText Markup Language">HTML</acronym> filters in that their
whitelist is too restrictive (besides the fact that their table markup
is extraordinarily complex).</p>

<h3>Security</h3>

<p class="lead">BBCode can be boiled down to a &quot;wanna-be&quot; version of
<acronym title="HyperText Markup Language">HTML</acronym>. I mean, replacing
the angle brackets with square brackets and omitting the occasional parameter
name? How much more unoriginal can you get? Somehow, I don't think BBCode
was meant to be readable. <a
href="http://en.wikipedia.org/wiki/BBCode">Wikipedia</a> agrees:</p>

<blockquote>
BBCode was devised and put to use in order to provide a safer, easier
and more limited way of allowing users to format their messages.
Previously, many message boards allowed the users to include HTML,
which could be used to break/imitate parts of the layout, or run
JavaScript. Some implementations of BBCode have suffered problems related
to the way they translate the BBCode into HTML, which could negate the
security that was intended to be given by BBCode.
</blockquote>

<p>Or, put more simply:</p>

<blockquote>
BBCode came to life when developers were too lazy to parse HTML correctly
and decided to invent their own markup language. As with all products of
laziness, the result is completely inconsistent, unstandardized, and
widely adopted.
</blockquote>

<p>Well, developers, the whole point of HTML Purifier is that I do the
work so you can just execute the ridiculously simple
<tt>$purifier->purify($html)</tt> call and go on to do, well, whatever
you developers do. <tt>:-P</tt></p>

<h3>Conclusion</h3>

<p>These alternative markup languages have their shiny points, and HTML
Purifier is not meant to replace them. However, a major reason for
their existence has been called into question. Why are <em>you</em>
using these languages?</p>

<h2>HTML Tidy</h2>

<p class="lead">Dave Raggett's
<a href="http://www.w3.org/People/Raggett/tidy/">HTML Tidy</a> is a neat program,
neat enough, at least, to make it into PHP as a
<a href="http://us2.php.net/manual/en/ref.tidy.php">PECL extension</a>.
The premise is simple, the execution effective. Tidy is, in short, a great
<em>tool</em>.</p>

<p>It is not, however, a filter. I am often surprised when people ask
me, &quot;What about Tidy?&quot; I have nothing against Tidy: it tackles
a different problem set. Let's see what <tt>man tidy</tt> has to say:</p>

<blockquote cite="http://tidy.sourceforge.net/docs/tidy_man.html">
Tidy reads HTML, XHTML and XML files and writes cleaned up markup. For
HTML variants, it detects and corrects many common coding errors and
strives to produce visually equivalent markup that is both W3C compliant
and works on most browsers. A common use of Tidy is to convert plain HTML
to XHTML.
</blockquote>

<p>Hmm... why do I not see the words &quot;filter&quot; or &quot;<acronym
title="Cross Site Scripting">XSS</acronym>&quot; in here? Perhaps it's
because Tidy accepts <em>any</em> valid
<acronym title="HyperText Markup Language">HTML</acronym>, including
<tt>script</tt> tags. Which leads us to our second point: Tidy parses
<em>documents</em>, not document <em>fragments</em>.</p>

<p>This is not to say that I haven't seen Tidy used in this sort of
fashion. MediaWiki, for instance, uses Tidy to clean up the final HTML
output before shuttling it off to the browser. The developers, nevertheless,
agree that this is only a band-aid solution, and that the real way
to fix it is to fix the parser. Tidy's great, but in terms of security,
it's unsuited for untrusted sources.</p>

<h2>Preface</h2>

<p>I've ordered my analyses according to how bad a library is. The worst
is first, and then we move up the spectrum. I will point out the most
flagrant problems with the libraries, but note that I will omit more
advanced vulnerabilities: if you can't catch an <tt>onmouseover</tt>
attribute, I really shouldn't reprimand you for letting non-SGML code
points through. The ideal solution, however, must do all these things.</p>
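
<p>As an example of one of those more advanced checks, here is a minimal sketch
that removes the code points the SGML declaration for HTML 4 marks as unused:
the C0 control characters other than tab, line feed and carriage return, plus
DEL and the C1 controls. It assumes the input is meant to be UTF-8, and
<tt>strip_non_sgml()</tt> is a hypothetical name. With the <tt>u</tt> modifier,
<tt>preg_replace()</tt> returns <tt>null</tt> for invalid UTF-8, and the sketch
passes that on so the caller can reject the input outright.</p>

```php
<?php
// Hypothetical helper: strip code points that HTML 4's SGML declaration
// marks as UNUSED. Assumes the input is meant to be UTF-8.
function strip_non_sgml(string $s): ?string
{
    // C0 controls except \t (0x9), \n (0xA) and \r (0xD); DEL; C1 controls.
    // The /u modifier makes preg_replace() return null on invalid UTF-8.
    return preg_replace('/[\x{0}-\x{8}\x{B}\x{C}\x{E}-\x{1F}\x{7F}-\x{9F}]/u', '', $s);
}

echo strip_non_sgml("a\x00b\x1Fc\u{85}d"), "\n"; // abcd
var_dump(strip_non_sgml("\xFF"));                // NULL
```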

<h2>strip_tags()</h2>

<table class="summary">
<tr><th>Whitelist</th> <td>Yes, user-specified</td></tr>
<tr><th>Removes foreign tags</th> <td>Buggy</td></tr>
<tr><th>Makes well-formed</th> <td>No</td></tr>
<tr><th>Fixes nesting</th> <td>No</td></tr>
<tr><th>Validates attributes</th> <td>No</td></tr>
</table>

<p class="lead">The PHP function
<a href="http://php.net/manual/en/function.strip-tags.php">strip_tags()</a> is
the classic solution for attempting to clean up
<acronym title="HyperText Markup Language">HTML</acronym>. It
is also the <em>worst</em> solution, and should be avoided like the plague.
The fact that it doesn't validate attributes at all means that anyone can
insert an <tt>onmouseover='xss();'</tt> and exploit your application. While
this can be band-aided with a series of regular expressions that strip out
<tt>on[event]</tt> attributes (you're still vulnerable to <acronym
title="Cross Site Scripting">XSS</acronym> and at the mercy of
quirky browser behavior), strip_tags() is fundamentally flawed and should not be
used.
</p>
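
<p>Both problems are easy to reproduce with nothing but core PHP. Below,
<tt>strip_tags()</tt> is told to allow only <tt>&lt;b&gt;</tt>, yet the event
handler on it survives. The hypothetical <tt>strip_event_handlers()</tt> then
plays the part of the regular-expression band-aid: it removes quoted,
whitespace-separated <tt>on*</tt> attributes, and misses both an unquoted value
and a slash used as the separator, which the HTML5 parsing algorithm still
treats as the start of an attribute.</p>

```php
<?php
$input = '<b onmouseover="xss();">bold</b><script>x</script>';

// Only <b> is allowed, but its attributes are left untouched.
echo strip_tags($input, '<b>'), "\n"; // <b onmouseover="xss();">bold</b>x

// Hypothetical band-aid: delete on*="..." and on*='...' attributes.
function strip_event_handlers(string $html): string
{
    return preg_replace('/\s+on\w+\s*=\s*(["\']).*?\1/i', '', $html);
}

echo strip_event_handlers(strip_tags($input, '<b>')), "\n"; // <b>bold</b>x

// Both of these come back unchanged: no quotes, or no whitespace.
echo strip_event_handlers('<b onmouseover=xss()>a</b>'), "\n";
echo strip_event_handlers('<b/onmouseover="xss()">a</b>'), "\n";
```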

<h2>PHP Input Filter</h2>

<p class="lead">Though its title may not imply it,
<a href="http://cyberai.com/inputfilter/">PHP Input Filter</a>
is a souped-up version of strip_tags() with the ability to inspect
attributes. (Don't mind the hastily tacked-on query escaping function.)</p>

<table class="summary">
<tr><th>Version</th> <td>1.2.2</td></tr>
<tr><th>Last update</th> <td>2005-10-05</td></tr>
<tr><th>License</th> <td>GPL</td></tr>
<tr><th>Whitelist</th> <td>Yes, user defined</td></tr>
<tr><th>Removes foreign tags</th> <td>Yes</td></tr>
<tr><th>Makes well-formed</th> <td>No</td></tr>
<tr><th>Fixes nesting</th> <td>No</td></tr>
<tr><th>Validates attributes</th> <td>Partial</td></tr>
<tr><th>XSS safe</th> <td>Probably</td></tr>
<tr><th>Standards safe</th> <td>No</td></tr>
</table>

<p>PHP Input Filter implements an
<acronym title="HyperText Markup Language">HTML</acronym> parser, and
performs very basic checks on whether or not tags and attributes have
been defined in the whitelist as well as some smarter <acronym
title="Cross Site Scripting">XSS</acronym> checks. It is left up to
the user to define what they'll permit.</p>

<p>With absolutely no checking of well-formedness, it is trivially easy
to trick the filter into leaving unclosed tags lying around. While
standards-compliance can be viewed by some as a &quot;nice feature&quot;,
basic sanity checks like this must be implemented.</p>
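
<p>The basic sanity check in question is not hard to sketch. The hypothetical
<tt>balance_tags()</tt> below walks the tags in order with a stack, drops end
tags that close nothing, closes elements left open at the end, and ignores void
elements such as <tt>br</tt>. It assumes tags have already been reduced to a
simple, attribute-free whitelist form and that text escaping happens elsewhere,
and it does not try to fix nesting (for example, a block element inside an
inline one), which is a separate problem.</p>

```php
<?php
// Hypothetical sketch: make a fragment of simple tags well-formed.
// Assumes tags look like <name> or </name> (attributes already handled).
function balance_tags(string $html): string
{
    $void  = ['br', 'hr', 'img'];   // elements with no end tag
    $stack = [];                    // currently open elements
    $out   = '';

    // Split into tags and text, keeping the tags as separate pieces.
    $parts = preg_split('#(</?[a-z][a-z0-9]*>)#i', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    foreach ($parts as $part) {
        if (!preg_match('#^<(/?)([a-z][a-z0-9]*)>$#i', $part, $m)) {
            $out .= $part;                          // plain text, passed through
            continue;
        }
        $name = strtolower($m[2]);
        if ($m[1] === '') {                         // start tag
            $out .= "<$name>";
            if (!in_array($name, $void, true)) {
                $stack[] = $name;
            }
        } elseif (in_array($name, $stack, true)) {  // end tag with a match
            // Close anything opened after it, then the element itself.
            do {
                $open = array_pop($stack);
                $out .= "</$open>";
            } while ($open !== $name);
        }
        // End tags that close nothing fall through and are dropped.
    }
    // Close whatever the user left open.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo balance_tags('<b>bold <i>both</b> ?</i></p><br>'), "\n";
// <b>bold <i>both</i></b> ?<br>
```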

<p>More trouble: woe to
any user who allows the <tt>style</tt> attribute. You can't simply
let <acronym
title="Cascading Style Sheets">CSS</acronym> through and expect your
layout not to be badly mutilated. To top things off,
the filter doesn't even preserve data properly: attributes have all
spaces stripped out of them. Stay away, stay away!</p>
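
<p>Letting <tt>style</tt> through safely means validating each declaration
rather than the attribute as a whole. A minimal sketch of that approach follows.
The property list and value patterns in the hypothetical
<tt>filter_style()</tt> are illustrative assumptions, far smaller than what a
real filter needs, and the naive split on <tt>;</tt> and <tt>:</tt> ignores CSS
escapes, comments and quoted strings, so any declaration that does not match a
pattern is simply dropped.</p>

```php
<?php
// Hypothetical sketch: keep only whitelisted CSS properties whose values
// match a conservative pattern; drop every other declaration.
function filter_style(string $style): string
{
    $allowed = [
        'color'           => '/^#[0-9a-f]{3}([0-9a-f]{3})?$|^[a-z]+$/i',
        'font-weight'     => '/^(normal|bold|[1-9]00)$/',
        'text-align'      => '/^(left|right|center|justify)$/',
        'text-decoration' => '/^(none|underline|line-through)$/',
    ];
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $pair = explode(':', $declaration, 2);
        if (count($pair) !== 2) {
            continue;                       // not "property: value"
        }
        $property = strtolower(trim($pair[0]));
        $value    = trim($pair[1]);
        if (isset($allowed[$property]) && preg_match($allowed[$property], $value)) {
            $kept[] = "$property:$value";
        }
    }
    return implode(';', $kept);
}

echo filter_style('color: red; position: absolute; top: 0; font-weight: bold'), "\n";
// color:red;font-weight:bold
var_dump(filter_style('color: expression(alert(1))')); // string(0) ""
```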

<h2>HTML_Safe/SafeHTML</h2>

<p class="lead"><a href="http://pear.php.net/package/HTML_Safe">HTML_Safe</a> is
<acronym title="PHP Extension and Application Repository">PEAR</acronym>'s
<acronym title="HyperText Markup Language">HTML</acronym>
filtering library.
It should be noted that this is the same library as
<a href="http://pixel-apes.com/safehtml/">SafeHTML</a>, though with different
branding (and a different version number).</p>

<table class="summary">
<tr><th>Version</th> <td>0.9.9beta</td></tr>
<tr><th>Last update</th> <td>2005-12-21</td></tr>
<tr><th>License</th> <td>BSD (3 clause)</td></tr>
<tr><th>Whitelist</th> <td>Mostly No</td></tr>
<tr><th>Removes foreign tags</th> <td>Yes</td></tr>
<tr><th>Makes well-formed</th> <td>Yes</td></tr>
<tr><th>Fixes nesting</th> <td>No</td></tr>
<tr><th>Validates attributes</th> <td>Partial</td></tr>
<tr><th>XSS safe</th> <td>Probably</td></tr>
<tr><th>Standards safe</th> <td>No</td></tr>
</table>

<p>HTML_Safe's mechanism of action involves parsing
<acronym title="HyperText Markup Language">HTML</acronym> with a
<acronym title="Simple API for XML">SAX</acronym> parser and performing
validation and filtering as the handlers are called. HTML_Safe does a lot
of things right, which is why I say it <em>probably</em> isn't vulnerable
to <acronym title="Cross Site Scripting">XSS</acronym>, but its approach
is fundamentally flawed: blacklists.</p>

<p>This library maintains arrays of dangerous tags, attributes and
<acronym title="Cascading Style Sheets">CSS</acronym> properties. (It also
has a blacklist of dangerous <acronym
title="Uniform Resource Identifier">URI</acronym> protocols, but this is
intelligently disabled by default in favor of a protocol whitelist.)
What this means is that HTML_Safe has no qualms about accepting input
like <tt>&lt;foobar&gt; Bang &lt;/foobar&gt;</tt>. Anything goes except
the tags in those arrays. Scratch standards-compliance (and that was
without even considering proper nesting).</p>
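
<p>The difference between the two approaches comes down to what happens to an
element nobody anticipated. In the sketch below, the two tag lists are
illustrative assumptions, not HTML_Safe's actual arrays, and both functions are
hypothetical: the blacklist check passes <tt>&lt;foobar&gt;</tt> because it is
not on the list of known-bad tags, while the whitelist check rejects it because
it is not on the list of known-good ones.</p>

```php
<?php
// Illustrative lists only (not taken from any particular library).
const BLACKLIST = ['script', 'iframe', 'object', 'embed', 'style'];
const WHITELIST = ['a', 'b', 'i', 'em', 'strong', 'p', 'ul', 'ol', 'li'];

// Blacklist: anything not known to be dangerous is allowed.
function allowed_by_blacklist(string $tag): bool
{
    return !in_array(strtolower($tag), BLACKLIST, true);
}

// Whitelist: only elements known to be safe are allowed.
function allowed_by_whitelist(string $tag): bool
{
    return in_array(strtolower($tag), WHITELIST, true);
}

foreach (['b', 'script', 'foobar'] as $tag) {
    printf("%-7s blacklist: %-4s whitelist: %s\n", $tag,
        allowed_by_blacklist($tag) ? 'yes' : 'no',
        allowed_by_whitelist($tag) ? 'yes' : 'no');
}
```

<p>With these lists, <tt>b</tt> and <tt>script</tt> are treated the same way by
both checks; only the unanticipated <tt>foobar</tt> separates them.</p>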

<p>For now, HTML_Safe might be safe from
<acronym title="Cross Site Scripting">XSS</acronym>.
In the future, however, one of the infinitely many tags that HTML_Safe lets
through might just possibly be given special functionality by browser vendors.
And it might just turn out that this can be exploited. <em>Any</em> blacklist
solution puts you in a perpetual arms race against crackers who are constantly
discovering new and inventive ways to abuse tags and attributes that you
didn't blacklist.</p>

<h2>kses</h2>

<p class="lead"><a href="http://sourceforge.net/projects/kses/">kses</a> appears to
be the de facto solution for cleaning
<acronym title="HyperText Markup Language">HTML</acronym>, having found
its way into applications such as <a href="http://wordpress.org/">WordPress</a>
and being the number one search result for &quot;php html filter&quot;.</p>

<table class="summary">
<tr><th>Version</th> <td>0.2.2</td></tr>
<tr><th>Last update</th> <td>2005-02-06</td></tr>
<tr><th>License</th> <td>GPL</td></tr>
<tr><th>Whitelist</th> <td>Yes, user defined</td></tr>
<tr><th>Removes foreign tags</th> <td>Yes</td></tr>
<tr><th>Makes well-formed</th> <td>No</td></tr>
<tr><th>Fixes nesting</th> <td>No</td></tr>
<tr><th>Validates attributes</th> <td>Partial</td></tr>
<tr><th>XSS safe</th> <td>Probably</td></tr>
<tr><th>Standards safe</th> <td>No</td></tr>
</table>

<p>To be truthful, I didn't do as comprehensive a code survey for kses
as I did for some of the other libraries. Out of
all the classes I've reviewed so far, kses was definitely the hardest to
understand.</p>

<p>kses's modus operandi is splitting up
<acronym title="HyperText Markup Language">HTML</acronym> with a monster regexp
and then validating each section with <tt>kses_split2()</tt>. It
suffers from the same problems as Input Filter: no well-formedness
checks, leading to rampant runaway tags (and no standards-compliance).</p>

<p>Its whitelist syntax, however, is the most complex of all these libraries,
so I'm going to take some time to argue why this particular implementation
is bad. The author of this library was thoughtful enough to provide some
basic constraint checks on attributes like <tt>maxlen</tt> and <tt>maxval</tt>.
Now, barring the fact that there simply aren't enough checks, and the fact
that they are all lumped together in one function, we now must wonder whether
or not the user will go through the trouble of specifying the maximum length
of a title attribute.</p>
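
<p>To make the discussion concrete, here is a minimal sketch of per-attribute
constraints in the spirit of the <tt>maxlen</tt> and <tt>maxval</tt> checks
described above. The array layout and the <tt>check_attribute()</tt> function
are hypothetical and are not kses's actual code. The point is that every limit
has to be written into the whitelist by hand, and an attribute declared with an
empty array gets no checking at all.</p>

```php
<?php
// Hypothetical whitelist: element => attribute => constraints.
$whitelist = [
    'hr' => [
        'size'  => ['maxval' => 10],
        'width' => ['maxlen' => 4],
        'align' => [],              // no constraints: any value is accepted
    ],
];

function check_attribute(array $whitelist, string $element, string $attr, string $value): bool
{
    if (!isset($whitelist[$element][$attr])) {
        return false;               // attribute not whitelisted at all
    }
    foreach ($whitelist[$element][$attr] as $check => $limit) {
        switch ($check) {
            case 'maxlen':          // value may be at most $limit bytes long
                if (strlen($value) > $limit) {
                    return false;
                }
                break;
            case 'maxval':          // value must be an integer <= $limit
                if (!preg_match('/^[0-9]+$/', $value) || (int) $value > $limit) {
                    return false;
                }
                break;
        }
    }
    return true;
}

var_dump(check_attribute($whitelist, 'hr', 'size', '5'));         // bool(true)
var_dump(check_attribute($whitelist, 'hr', 'size', '9999'));      // bool(false)
var_dump(check_attribute($whitelist, 'hr', 'align', 'anything')); // bool(true)
```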

<p>I have my opinions about inherent human laziness, but perhaps WordPress's
default filterset is the most telling example:</p>

<pre>
$allowedposttags = array (
    /* formatted and trimmed */
    'hr' => array (
        'align' => array (),
        'noshade' => array (),
        'size' => array (),
        'width' => array ()
    )
);
</pre>

<p>Hmm... do I see a blatant lack of attribute constraints? Conclusion:
if the user can get away with not doing work, they will! The biggest
problem in all these whitelist filters is that they forgot to <em>supply</em>
the whitelist. The whitelist is just as important as the code that uses
the whitelist to filter
<acronym title="HyperText Markup Language">HTML</acronym>.</p>

<h2>Safe HTML Checker</h2>

<p class="lead">
<a href="http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker">Safe
HTML Checker</a> is (to my knowledge) the first attempt to make a filter
that also outputs standards-compliant XHTML. It wasn't even released or
licensed officially, but we'll let that slide: a 4<sup>th</sup> place
search result must have done something right.</p>

<table class="summary">
<tr><th>Version</th> <td>in-house</td></tr>
<tr><th>Last update</th> <td>2003-09-15</td></tr>
<tr><th>License</th> <td>undefined</td></tr>
<tr><th>Whitelist</th> <td>Yes (bare-bones)</td></tr>
<tr><th>Removes foreign tags</th> <td>Yes</td></tr>
<tr><th>Makes well-formed</th> <td>Yes</td></tr>
<tr><th>Fixes nesting</th> <td>Almost</td></tr>
<tr><th>Validates attributes</th> <td>Partial</td></tr>
<tr><th>XSS safe</th> <td>Yes</td></tr>
<tr><th>Standards safe</th> <td>Almost</td></tr>
</table>

<p>Indeed, it is quite a well-written piece of code. It demonstrates
knowledge of inline versus block elements, thus getting nesting almost
correct (the only exception is an unimplemented SGML
exclusion for <tt>&lt;a&gt;</tt> tags, and that's easy to fix).</p>
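
<p>The kind of knowledge involved can be sketched in a few lines. The element
list below is a small, illustrative subset of HTML 4.01's inline elements, and
<tt>can_contain()</tt> is a hypothetical helper: inline elements and
<tt>p</tt> may contain only inline content, and <tt>&lt;a&gt;</tt> may never
contain another <tt>&lt;a&gt;</tt> at any depth, which is the SGML exclusion
Safe HTML Checker leaves unimplemented. Other parents are not modeled.</p>

```php
<?php
// Illustrative subset of HTML 4.01 inline elements.
const INLINE = ['a', 'b', 'i', 'em', 'strong', 'span', 'code'];

// Hypothetical helper: may $child appear given the open elements in
// $ancestors (outermost first)? Only the direct parent's content model is
// checked, plus the <a> exclusion, which applies at any depth.
function can_contain(array $ancestors, string $child): bool
{
    if ($child === 'a' && in_array('a', $ancestors, true)) {
        return false;               // SGML exclusion: no <a> inside <a>
    }
    $parent = end($ancestors);
    if ($parent === false) {
        return true;                // top level of the fragment
    }
    if (in_array($parent, INLINE, true) || $parent === 'p') {
        return in_array($child, INLINE, true);  // inline content only
    }
    return true;                    // other parents: not modeled here
}

var_dump(can_contain(['p'], 'b'));      // bool(true)
var_dump(can_contain(['b'], 'div'));    // bool(false)
var_dump(can_contain(['a', 'b'], 'a')); // bool(false)
```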

<p>Unfortunately, part of the reason why it works so well is that it's
extremely restrictive. No styling, no tables, very few attributes.
Perfectly appropriate for blog comments, but then again, there's always
BBCode. This probably means that Safe HTML Checker has a different
goal than HTML Purifier.</p>

<p>The <acronym title="eXtensible Markup Language">XML</acronym> parser
is also quite strict. Forgot to escape a &lt; sign? The parser will
complain with the cryptic message:
&quot;<acronym title="eXtensible HyperText Markup Language">XHTML</acronym>
is not well-formed&quot;.
The solution is not as simple as just switching to a more permissive
parser: Safe HTML Checker relies on the parser having already
matched up the tags for it.</p>
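
<p>That strictness is easy to reproduce with PHP's own <tt>xml</tt> extension,
assuming it is enabled (it is in most builds). A single unescaped
<tt>&lt;</tt>, or tags closed in the wrong order, is enough to make the whole
fragment fail. The exact error text depends on the PHP build, so the sketch
only returns whatever <tt>xml_error_string()</tt> reports; the wrapper
<tt>xml_error_for()</tt> is hypothetical.</p>

```php
<?php
// Hypothetical wrapper around PHP's xml extension: returns null if the
// fragment parses, otherwise the parser's error message.
function xml_error_for(string $fragment): ?string
{
    $parser = xml_parser_create('UTF-8');
    // Wrap in a root element so fragments with several elements are allowed.
    $ok = xml_parse($parser, "<root>$fragment</root>", true);
    return $ok ? null : xml_error_string(xml_get_error_code($parser));
}

var_dump(xml_error_for('<p>1 &lt; 2</p>')); // NULL
var_dump(xml_error_for('<p>1 < 2</p>'));    // an error message string
```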

<h2>HTML Purifier</h2>

<table class="summary">
<tr><th>Version</th> <td>1.1.2</td></tr>
<tr><th>Last update</th> <td>2006-09-30</td></tr>
<tr><th>License</th> <td>LGPL</td></tr>
<tr><th>Whitelist</th> <td>Yes</td></tr>
<tr><th>Removes foreign tags</th> <td>Yes</td></tr>
<tr><th>Makes well-formed</th> <td>Yes</td></tr>
<tr><th>Fixes nesting</th> <td>Yes</td></tr>
<tr><th>Validates attributes</th> <td>Yes</td></tr>
<tr><th>XSS safe</th> <td>Yes</td></tr>
<tr><th>Standards safe</th> <td>Yes</td></tr>
</table>

<p class="lead">That table should say it all, but I'll add a few more features:</p>

<table class="summary">
<tr><th>UTF-8 aware</th><td>Yes</td></tr>
<tr><th>Object-Oriented</th><td>Yes</td></tr>
<tr><th>Validates CSS</th><td>Yes</td></tr>
<tr><th>Tables</th><td>Yes</td></tr>
<tr><th>PHP 5 aware</th><td>Yes</td></tr>
</table>

<p>This is not to say that HTML Purifier doesn't have problems of its own.
It's a fairly nascent library (so there are bound to be bugs), it's big
(while the others usually fit in one file, this one requires a huge
include list), and it's <a href="http://hp.jpsband.org/live/TODO">missing
features</a>. A big thing I would like to see added to this library is
multiple levels of filtering: from a super-permissive lint mode
to a super-restrictive blog comment mode. But even in its current state,
HTML Purifier is far better than the other libraries.</p>

<p>So... <a href="./#Download">what are you waiting for?</a></p>

</div>
</body>
</html>