How to install HTML Purifier

Since HTML Purifier is a library, there's no fancy GUI that will take you
step-by-step through configuring database credentials and other mumbo-jumbo.
It is designed to run "out of the box." Regardless, there are still a couple
of things you should be mindful of.

HTML Purifier works in both PHP 4 and PHP 5. I have run the test suite on a
number of versions and can confidently say that HTML Purifier should work in
all versions in between and afterwards. HTML Purifier definitely does not
support PHP 4.2. PHP 4.3 branch support may go further back than the versions
I tried, but I haven't tested that.

I have been unable to get PHP 5.0.5 working on my computer, so if someone
wants to test that, be my guest. All tests were done on Windows XP Home,
but the operating system should not be a major factor in the library.

1. Including the proper files

The library/ directory must be added to your include path: HTML Purifier will
not be able to find the necessary includes otherwise. This is as simple as:

    set_include_path('/path/to/htmlpurifier/library' . PATH_SEPARATOR .
        get_include_path());

...replacing /path/to/htmlpurifier with the actual location of the folder. Don't
worry, HTML Purifier is namespaced, so unless you have another file named
HTMLPurifier.php, the files won't collide with any of your includes.

Then, it's a simple matter of including the base file:

    require_once 'HTMLPurifier.php';

...and you're good to go.
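
If the require_once fails, the usual cause is that library/ never made it onto
the include path. Below is a minimal sketch of a check you could run first.
find_on_include_path() is a hypothetical helper written for this example, not
part of HTML Purifier, and /path/to/htmlpurifier is the same placeholder used
above.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): returns the full path of
// $file in the first include_path directory that contains it, or false.
function find_on_include_path($file)
{
    foreach (explode(PATH_SEPARATOR, get_include_path()) as $dir) {
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $file;
        if (is_file($candidate)) {
            return $candidate;
        }
    }
    return false;
}

// Prepend the library directory, keeping whatever was on the path before.
$library = '/path/to/htmlpurifier/library'; // placeholder, replace with yours
set_include_path($library . PATH_SEPARATOR . get_include_path());

if (find_on_include_path('HTMLPurifier.php') === false) {
    echo "HTMLPurifier.php is not on the include path; check \$library.\n";
} else {
    require_once 'HTMLPurifier.php';
}
```

Prepending rather than replacing the include path means your application's
other includes keep resolving as before.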

2. Preparing the proper environment

While no configuration is necessary, you should first take precautions regarding
the rest of the HTML output that the filtered content will sit alongside. Here
is a (short) checklist:

* Have I specified XHTML 1.0 Transitional as the doctype?
* Have I specified UTF-8 as the character encoding?

To find out what these are, browse to your website and view its source code.
You can figure out the doctype from a declaration that looks like
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
(or there may be no doctype at all). You can figure out the character encoding
by looking for
    <meta http-equiv="Content-type" content="text/html;charset=ENCODING">
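
If you would rather not eyeball the source, the same two checks can be
scripted. The sketch below is only an illustration: inspect_page() is a
hypothetical helper, not part of HTML Purifier, and it looks only at the two
markers described above (the doctype's public identifier and the charset in
the meta tag).

```php
<?php
// Hypothetical helper (not part of HTML Purifier): pulls the doctype's public
// identifier and the declared charset out of a page's source.
function inspect_page($html)
{
    $result = array('doctype' => null, 'charset' => null);

    // Public identifier, e.g. "-//W3C//DTD XHTML 1.0 Transitional//EN"
    if (preg_match('/<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"/i', $html, $m)) {
        $result['doctype'] = $m[1];
    }

    // charset=... inside a meta tag, normalized to upper case
    if (preg_match('/<meta[^>]+charset=["\']?([A-Za-z0-9_:.-]+)/i', $html, $m)) {
        $result['charset'] = strtoupper($m[1]);
    }

    return $result;
}

$page = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"' . "\n"
      . ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' . "\n"
      . '<html><head><meta http-equiv="Content-type"'
      . ' content="text/html;charset=utf-8"></head><body></body></html>';

$info = inspect_page($page);
$ok = $info['doctype'] === '-//W3C//DTD XHTML 1.0 Transitional//EN'
   && $info['charset'] === 'UTF-8';
echo $ok ? "Page matches the checklist\n" : "Page needs attention\n";
```

Keep in mind that a Content-Type header sent by the web server takes
precedence over the meta tag, so a page can pass this check and still be
served in another encoding.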

I cannot stress the importance of these two bullets enough. Omitting either
of them could have dire consequences not only for security but for plain
old usability. You can find a more in-depth discussion of why this is needed
in docs/security.txt. In the meantime, try to change your output so this is
the case. If you can't, well, we might be able to accommodate you (read on).

3. Configuring HTML Purifier

HTML Purifier is designed to run out-of-the-box, but occasionally it needs
to be told what to do. In the snippets below, $config is the configuration
object returned by HTMLPurifier_Config::createDefault().

If, for some reason, you are unable to switch to UTF-8 immediately, you can
switch HTML Purifier's encoding. Note that the availability of encodings is
dependent on iconv, and you'll be missing characters if the charset you
choose doesn't have them.

    $config->set('Core', 'Encoding', /* put your encoding here */);

An example usage for Latin-1 websites:

    $config->set('Core', 'Encoding', 'ISO-8859-1');
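
Because the encoding support rides on iconv, it can be worth confirming that
iconv is loaded and knows your charset before you set it. This is a rough
sketch using only PHP's own iconv() function; encoding_is_usable() is a
hypothetical helper, not something HTML Purifier provides.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): true if iconv is loaded
// and can convert between $encoding and UTF-8 in both directions.
function encoding_is_usable($encoding)
{
    if (!function_exists('iconv')) {
        return false;
    }
    // @ silences the notice/warning iconv raises for an unknown charset.
    return @iconv($encoding, 'UTF-8', 'test') !== false
        && @iconv('UTF-8', $encoding, 'test') !== false;
}

// "e" with an acute accent is the single byte 0xE9 in ISO-8859-1 and the
// two bytes 0xC3 0xA9 in UTF-8.
if (encoding_is_usable('ISO-8859-1')) {
    $utf8 = iconv('ISO-8859-1', 'UTF-8', "\xE9");
    echo bin2hex($utf8), "\n"; // prints c3a9
}
```

A charset passing this check only means iconv recognizes it. Characters that
have no equivalent in that charset still cannot be represented in it.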

For those of you stuck using HTML 4.01 Transitional, you can disable
XHTML output like this:

    $config->set('Core', 'XHTML', false);

However, I strongly recommend that you use XHTML. Currently, we can only
guarantee transitional-compliant output; future versions will also allow
strict-compliant output.

The interface is mind-numbingly simple:

    $purifier = new HTMLPurifier();
    $clean_html = $purifier->purify($dirty_html);

Or, if you're using the configuration object:

    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($dirty_html);

That's it. For more examples, check out docs/examples/. Also, SLOW gives
advice on what to do if HTML Purifier is slowing down your application.

If your website is in UTF-8 and XHTML Transitional, use this code:

    set_include_path('/path/to/htmlpurifier/library'
        . PATH_SEPARATOR . get_include_path() );
    require_once 'HTMLPurifier.php';
    $purifier = new HTMLPurifier();

    $clean_html = $purifier->purify($dirty_html);

If your website is in a different encoding or doctype, use this code:

    set_include_path('/path/to/htmlpurifier/library'
        . PATH_SEPARATOR . get_include_path() );
    require_once 'HTMLPurifier.php';

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
    $config->set('Core', 'XHTML', true); // replace with false if HTML 4.01
    $purifier = new HTMLPurifier($config);

    $clean_html = $purifier->purify($dirty_html);