<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>HTML Purifier - Filter your HTML the standards-compliant way!</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="HTML filter that guards against
XSS and ensures standards-compliant output." />
<meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter,
filtering, standards, compliant, w3c, XSS, PHP, security, library,
open source, LGPL, whitelist" />
<meta name="author" content="Edward Z. Yang" />
<link rel="icon" href="./favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="./favicon.ico" type="image/x-icon" />
<link rel="stylesheet" href="./style.css" type="text/css" />
<!--[if lt IE 7.]><script defer="defer" type="text/javascript" src="./pngfix.js"></script><![endif]-->
</head>
<body>
<img src="./logo.png" id="logo" alt="HTML Purifier" />

<h1 id="header"><span class="html">HTML</span> <span class="purifier">Purifier</span></h1>

<div id="navigation">
<h2>Navigation</h2>
<ol>
<li><a href="#Background">Background</a></li>
<li><a href="#Examples">Examples</a></li>
<li><a href="#News">News</a></li>
<li><a href="#Download">Download</a></li>
<li><a href="#Demo">Demo</a></li>
<li><a href="#Resources">Resources</a></li>
<li><a href="./mantis/">Bugtracker</a></li>
<li><a href="./vanilla/">Forum</a></li>
<li><a href="#Contact">Contact</a></li>
<li><a href="#License">License</a></li>
</ol>
</div>
<div id="content">

<p class="lead"><strong>HTML Purifier</strong> is the premier
<acronym title="PHP: Hypertext Preprocessor">PHP</acronym>
solution for all your <acronym title="HyperText Markup Language">HTML</acronym>
filtering needs. HTML Purifier will not only remove
all malicious code (better known as
<acronym title="Cross Site Scripting">XSS</acronym>) with a secure yet
permissive <strong>whitelist</strong>, it will also make sure
your documents are <strong>standards compliant</strong>, something
only achievable with a
comprehensive knowledge of
<acronym title="World Wide Web Consortium">W3C</acronym>'s specifications.
Tired of forcing users to use BBCode or some other
obscure custom markup language due to the
current landscape of deficient or hole-ridden <acronym
title="HyperText Markup Language">HTML</acronym> filters? Have a
<strong><acronym title="What You See Is What You Get">WYSIWYG</acronym></strong>
editor lying around but never had the chance to use it?
HTML Purifier
is for you!</p>
<h2 id="Background">Background</h2>

<p class="lead">There are a number of ad hoc <acronym
title="HyperText Markup Language">HTML</acronym> filtering solutions out
there on the web
(some examples including <acronym
title="PHP Extension and Application Repository">PEAR</acronym>'s
<a href="http://pear.php.net/package/HTML_Safe">HTML_Safe</a>,
<a href="http://sourceforge.net/projects/kses">kses</a>
and
<a href="http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker">
SafeHtmlChecker.class.php</a>)
that claim to filter <acronym
title="HyperText Markup Language">HTML</acronym> properly, preventing
malicious JavaScript and layout-breaking
<acronym
title="HyperText Markup Language">HTML</acronym> from getting through
the parser. None of them, however,
demonstrates a thorough knowledge of the <acronym
title="Document Type Definition">DTD</acronym> that defines <acronym
title="HyperText Markup Language">HTML</acronym>
or the caveats of <acronym
title="HyperText Markup Language">HTML</acronym> that cannot be expressed
by a <acronym
title="Document Type Definition">DTD</acronym>. Configurable
filters (such as kses or <acronym
title="PHP: Hypertext Preprocessor">PHP</acronym>'s built-in
<a href="http://us2.php.net/manual/en/function.strip-tags.php">strip_tags()</a>
function) have trouble
validating the contents of attributes and can be subject to security attacks
due to poor configuration. Other filters take the naive approach of
blacklisting known threats and tags, failing to account for the introduction
of new technologies, new tags, new attributes or quirky browser behavior.</p>
<p>However, HTML Purifier takes a different approach, one that doesn't use
specification-ignorant regexes or narrow blacklists. HTML Purifier will
decompose the whole document into tokens, and rigorously process the tokens by
removing non-whitelisted elements, transforming bad-practice tags like font
into span, properly checking the nesting of tags and their children, and
validating all attributes according to their <acronym
title="Request for Comments">RFC</acronym>s.</p>
<p>To my knowledge, there is nothing like this on the web yet. Not even
<a href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a>,
which allows an amazingly diverse mix of <acronym
title="HyperText Markup Language">HTML</acronym> and wikitext in its documents,
gets all the nesting quirks right. Existing solutions hope that no JavaScript
will slip through, but either do not attempt to ensure that the resulting
output is valid <acronym
title="eXtensible HyperText Markup Language">XHTML</acronym>
or send the <acronym
title="HyperText Markup Language">HTML</acronym> through a draconian <acronym
title="eXtensible Markup Language">XML</acronym> parser (and yet
still get the nesting wrong: SafeHtmlChecker.class.php does not prevent
<tt>a</tt> tags from being nested within each other).</p>

<p>For further reading, see the <a href="./comparison.html">Comparison</a>
essay.</p>
<h2 id="Examples">Examples</h2>

<p>Here are some examples of HTML Purifier at work, filtering and
beautifying code, while allowing through as much as possible:</p>

<h3>Malicious code removed</h3>
<pre>&lt;a href=&quot;javascript:evil();&quot; onclick=&quot;evil();&quot;&gt;Evil!&lt;/a&gt;</pre>
<p>becomes</p>
<pre>&lt;a&gt;Evil!&lt;/a&gt;</pre>
<p>Yes, this is valid code. The W3C says: <em>Authors may also create an A element
that specifies no anchors, i.e., that doesn't specify href, name, or id.</em></p>

<h3>Missing end tags corrected</h3>
<pre>&lt;b&gt;Bold</pre>
<p>becomes</p>
<pre>&lt;b&gt;Bold&lt;/b&gt;</pre>

<h3>Illegal nesting fixed</h3>
<pre>&lt;b&gt;Inline &lt;del&gt;context &lt;div&gt;No block allowed&lt;/div&gt;&lt;/del&gt;&lt;/b&gt;</pre>
<p>becomes</p>
<pre>&lt;b&gt;Inline &lt;del&gt;context No block allowed&lt;/del&gt;&lt;/b&gt;</pre>
<p>According to the <acronym title="Document Type Definition">DTD</acronym>,
flow-level content, including divs, is allowed in del. But the specification
says that it is not allowed when the del is in an inline context. Tell
that to the validator...</p>

<h3>Rich formatting preserved</h3>

<pre>&lt;table&gt;
&lt;caption&gt;
Cool table
&lt;/caption&gt;
&lt;tfoot&gt;
&lt;tr&gt;
&lt;th&gt;I can do so much!&lt;/th&gt;
&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tr&gt;
&lt;td style=&quot;font-size:16pt;
color:#F00;font-family:sans-serif;
text-align:center;&quot;&gt;Wow&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</pre>

<p>No elements or attributes were stripped from this example. Perhaps
you didn't even know <tt>tfoot</tt> existed, but HTML Purifier does!</p>
<h2 id="News">News</h2>

<h3>2006-06-14</h3>

<p class="lead">Version 1.1.0 was released today, marking major feature enhancements.
Notable additions are documentation generation, better table definition,
and the ability to turn off XHTML output. See
<a href="http://hp.jpsband.org/svnroot/htmlpurifier/tags/1.1.0/NEWS">News</a>
for a complete changelog.</p>

<h3>2006-09-14</h3>

<p class="lead">The table parsing algorithm has now been made a lot smarter.
The specification requires that the elements of a table come in a specific
order (caption, col/colgroup, thead, tfoot, tbody/tr), so try jumbling up a
few of the items and watch HTML Purifier arrange
them into the correct sequence. This feature will be released with 1.1.</p>

<pre>
&lt;table&gt;
&lt;tfoot&gt;&lt;tr&gt;&lt;th&gt;What's the footer&lt;/th&gt;&lt;th&gt;doing up here?&lt;/th&gt;&lt;/tr&gt;&lt;/tfoot&gt;
&lt;tr&gt;&lt;td&gt;Cell 1&lt;/td&gt;&lt;td&gt;Cell 2&lt;/td&gt;&lt;/tr&gt;
&lt;caption&gt;Here's a sample table to try. This caption should be on
top of the table&lt;/caption&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Same with this&lt;/th&gt;&lt;th&gt;table header&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tfoot&gt;&lt;tr&gt;&lt;th&gt;Oops!&lt;/th&gt;&lt;th&gt;Double-footer! (a no-no)&lt;/th&gt;&lt;/tr&gt;&lt;/tfoot&gt;
&lt;/table&gt;
</pre>

<h3>2006-09-10</h3>

<p class="lead">A proof-of-concept documentation generator (<tt>configdoc</tt>)
was checked into the repository yesterday. While the code won't be packaged
in until 1.1, all of the configuration values exist in 1.0, so those of
you interested in more advanced configuration may want to look at the
output. Note that the document is still a little flaky and needs some
styling.</p>

<ul>
<li><a
href="http://hp.jpsband.org/live/configdoc/plain.html">ConfigDoc
generated documentation</a></li>
<li><a
href="http://hp.jpsband.org/live/configdoc/configdoc.xml">Unstyled
<acronym title="eXtensible Markup Language">XML</acronym> file</a></li>
<li><a
href="http://hp.jpsband.org/live/configdoc/styles/plain.xsl"><acronym
title="eXtensible Stylesheet Language">XSL</acronym> transformation
stylesheet used to generate the documentation</a></li>
</ul>
<h2 id="Download">Download</h2>

<p class="lead">The current version is
<strong>1.1.0</strong>. Download the library
in <a class="download" href="./releases/htmlpurifier-1.1.0.zip">a zip</a> or
<a class="download" href="./releases/htmlpurifier-1.1.0.tar.gz">a
tarball</a>.</p>

<p>You can also grab the latest development code from our Subversion
repository. Simply execute this command:</p>

<pre class="command">svn co http://hp.jpsband.org/svnroot/htmlpurifier/trunk ./</pre>

<p>...or <a href="http://hp.jpsband.org/svnroot/htmlpurifier/trunk">browse
anonymously</a> at that address. Previous releases can be obtained by browsing
the <a href="./releases">release directory</a>,
or checking code out of the
<a href="http://hp.jpsband.org/svnroot/htmlpurifier/tags">tags/
directory</a>.</p>

<p>SHA-1 checksums:</p>

<pre>
f45d34aaf59ced0e3de6857420c60add93ffcda0 htmlpurifier-1.1.0.tar.gz
5c2e042e69b36c6d7ea496230cd3fb0e9a6902cd htmlpurifier-1.1.0.zip
</pre>
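
<p>To check a download against these checksums, compute its SHA-1 hash and
compare it with the matching line above. A minimal sketch, assuming the GNU
coreutils <tt>sha1sum</tt> tool is installed (other systems may ship a
different utility) and that both files are in the current directory:</p>

<pre class="command">sha1sum htmlpurifier-1.1.0.tar.gz htmlpurifier-1.1.0.zip</pre>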
<p>There are also .sig files which you can use to cryptographically verify
that the release is from me, Edward Z. Yang. You can find
my <a href="http://www.thewritingpot.com/gpgpubkey.asc">public key
here</a>. Signatures are available for
<a href="./releases/htmlpurifier-1.1.0.zip.sig">the zip</a> and
<a href="./releases/htmlpurifier-1.1.0.tar.gz.sig">the
tarball</a>.</p>

<p>Verify with these commands:</p>

<pre class="command">gpg --verify htmlpurifier-1.1.0.zip.sig</pre>
<pre class="command">gpg --verify htmlpurifier-1.1.0.tar.gz.sig</pre>

<p>You can be notified of new releases through a low-traffic announcement list. Subscribe
here:</p>

<form method="post" action="http://scripts.dreamhost.com/add_list.cgi">
<input type="hidden" name="list" value="htmlpurifier@jpsband.org" />
<input type="hidden" name="domain" value="jpsband.org" />
<input type="hidden" name="emailit" value="1" />
<div>Name: <input name="name" /> E-mail: <input name="email" /></div>
<div><input type="submit" name="submit" value="Subscribe to Announcement List" />
<input type="submit" name="unsub" value="Unsubscribe" /></div>
</form>
<h2 id="Demo">Demo</h2>
<p class="lead">Enter your HTML and see how it will be filtered!</p>
<form name="filter" action="http://hp.jpsband.org/live/docs/examples/demo.php" method="post">
<textarea name="html" cols="50" rows="10"></textarea>
<div>
<input type="submit" value="Submit" name="submit" class="button" />
</div>
</form>
<h2 id="Resources">Resources</h2>
<ul>
<li><a href="./mantis/">Mantis Bugtracker</a> &mdash; Found a bug? Report
it here!</li>
<li><a href="./vanilla/">Vanilla Forum</a> &mdash; Talk about all things
HTML Purifier.</li>
<li><a href="http://hp.jpsband.org/live/smoketests/xssAttacks.php"><acronym
title="Cross Site Scripting">XSS</acronym>
Attacks Smoketest</a> &mdash; Tests how well HTML Purifier fares
against RSnake's famous cheatsheet of <acronym
title="Cross Site Scripting">XSS</acronym> attacks.</li>
<li><a href="http://hp.jpsband.org/live/TODO">Future planned features</a>
&mdash; All the way up to version 3!</li>
<li><a href="http://hp.jpsband.org/live/art/">Artwork</a>
&mdash; Extra media goodies.</li>
<li><a href="http://hp.jpsband.org/live/configdoc/plain.html">Configuration
documentation</a> &mdash; See the INSTALL document on how to
configure your HTML Purifier installation.</li>
<li><a href="http://hp.jpsband.org/doxygen/html">Doxygen-generated
Documentation</a> &mdash; No class left undocumented! Cross-referenced
code! A must-read for any prospective HTML Purifier hacker.
(See also the <a href="http://hp.jpsband.org/phpdoc">PHPDoc-generated
Documentation</a>.)</li>
</ul>
<h2 id="Contact">Contact</h2>

<p class="lead">You can also send me an email at
<a href="mailto:htmlpurifier@jpsband.org">htmlpurifier@jpsband.org</a>.</p>

<h2 id="License">License</h2>

<p class="lead">This library is licensed under the
<a href="http://www.gnu.org/licenses/lgpl.html">LGPL v2.1+</a>. License
statements are only on <tt>HTMLPurifier.php</tt>. If someone insists
I add them to all the files, I will, but it seems like such a waste of
perfectly good space.</p>

</div>

</body>
</html>