<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Specification for HTML Purifier's advanced API for defining custom filtering behavior." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Advanced API - HTML Purifier</title>
</head><body>
<h1>Advanced API</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
  <strong>Warning:</strong> This document may be out-of-date. When in doubt,
  consult the source code documentation.
</p>
<p>HTML Purifier currently supports natively only a subset of HTML's
allowed elements, attributes, and behavior; specifically, this subset
is the set of elements that are safe for untrusted users to use.
However, HTML Purifier is often used to ensure standards-compliance
of input that is trusted (making it a sort of Tidy substitute),
and such users often need to define new elements or attributes. The
advanced API is oriented specifically toward these use-cases.</p>
<p>Our goals are to let the user:</p>
<dl>
<dt>Select</dt>
<dd><ul>
<li>Doctype</li>
<!-- <li>Filterset</li> -->
<li>Elements / Attributes / Modules</li>
<li>Tidy</li>
</ul></dd>
<dt>Customize</dt>
<dd><ul>
<li>Attributes</li>
<li>Elements</li>
<!--<li>Doctypes</li>-->
</ul></dd>
</dl>
<h2>Select</h2>
<p>For basic use, the user will have to specify some basic parameters. This
is not strictly necessary, as HTML Purifier's default settings always
output safe code, but it is required for standards-compliant output.</p>
<h3>Selecting a Doctype</h3>
<p>The first thing to select is the <strong>doctype</strong>. This
is essential for standards-compliant output.</p>
<p class="technical">This identifier is based
on the name the W3C has given to the document type and <em>not</em>
the DTD identifier.</p>
<p>This parameter is set via the configuration object:</p>
<pre>$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');</pre>
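<p>To make the difference between a doctype name and a DTD identifier
concrete, the sketch below pairs a few names, written in the same style as
the value passed to %HTML.Doctype above, with the formal public identifiers
that appear in their DOCTYPE declarations. The function
<code>doctype_public_id()</code> is hypothetical and is not part of
HTML Purifier.</p>

<pre>// Hypothetical lookup table, not part of HTML Purifier: maps doctype
// names (the style %HTML.Doctype uses) to the formal public identifiers
// found in the corresponding DOCTYPE declarations.
function doctype_public_id($name)
{
    $fpis = array(
        'HTML 4.01 Strict'       => '-//W3C//DTD HTML 4.01//EN',
        'HTML 4.01 Transitional' => '-//W3C//DTD HTML 4.01 Transitional//EN',
        'XHTML 1.0 Strict'       => '-//W3C//DTD XHTML 1.0 Strict//EN',
        'XHTML 1.0 Transitional' => '-//W3C//DTD XHTML 1.0 Transitional//EN',
        'XHTML 1.1'              => '-//W3C//DTD XHTML 1.1//EN',
    );
    // Unknown names yield null instead of a guessed identifier.
    return isset($fpis[$name]) ? $fpis[$name] : null;
}

echo doctype_public_id('XHTML 1.0 Transitional'), "\n";
// prints -//W3C//DTD XHTML 1.0 Transitional//EN</pre>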
<p>For historical reasons, the default doctype is XHTML 1.0
Transitional; however, we really shouldn't be guessing what the user's
doctype is. Fortunately, people who can't be bothered to set this won't
be bothered when their pages stop validating.</p>
<h3>Selecting Elements / Attributes / Modules</h3>
<p>HTML Purifier will, by default, allow as many elements and attributes
as possible. However, a user may decide to roll their own filterset by
selecting modules, elements and attributes to allow for their own
specific use-case. This can be done using %HTML.Allowed:</p>
<pre>$config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote');</pre>
<p class="technical">The directive %HTML.Allowed is a convenience feature
that may be fully expressed with the legacy interface.</p>
<p>We also continue to support the interface from older versions:</p>
<pre>$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
$config->set('HTML', 'AllowedAttributes', 'a.href,a.title');</pre>
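<p>Because %HTML.Allowed can be fully expressed with the legacy interface,
the translation between the two is mechanical. The following is a minimal
sketch of it, assuming well-formed input with no whitespace inside the
brackets; <code>allowed_to_legacy()</code> is a hypothetical helper, not the
parser HTML Purifier actually uses.</p>

<pre>// Hypothetical helper (not HTML Purifier's own parser): rewrites an
// %HTML.Allowed string such as 'a[href|title],em' into the values that
// %HTML.AllowedElements and %HTML.AllowedAttributes would take.
// Assumes well-formed input: no whitespace inside brackets.
function allowed_to_legacy($allowed)
{
    $elements = array();
    $attributes = array();
    foreach (explode(',', $allowed) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue;
        }
        $bracket = strpos($chunk, '[');
        if ($bracket === false) {
            // A bare element name with no attribute list.
            $elements[] = $chunk;
            continue;
        }
        $element = substr($chunk, 0, $bracket);
        $elements[] = $element;
        // Everything between '[' and the trailing ']' is a '|'-separated list.
        $list = rtrim(substr($chunk, $bracket + 1), ']');
        foreach (explode('|', $list) as $attr) {
            $attributes[] = $element . '.' . $attr;
        }
    }
    return array(implode(',', $elements), implode(',', $attributes));
}

list($elements, $attributes) = allowed_to_legacy('a[href|title],em,p,blockquote');
echo $elements, "\n";   // a,em,p,blockquote
echo $attributes, "\n"; // a.href,a.title</pre>

<p>Applied to the %HTML.Allowed example above, the helper produces exactly
the two strings passed to the legacy directives.</p>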
<p>A user may also choose to allow modules using a specialized
directive:</p>
<pre>$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists');</pre>
<p>However, we do not expect this feature to be widely used.</p>
<p class="technical">Module selection will work slightly differently
from the AllowedElements and AllowedAttributes directives: it
directly modifies the doctype you are operating in, in the spirit of
XHTML 1.1's modularization. We stop users from shooting themselves in the
foot by mandating that the modules in %HTML.CoreModules always be used.</p>
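<p>The mandate on %HTML.CoreModules amounts to taking the union of the
user's selection and the core list. The sketch below uses a hypothetical
helper, <code>effective_modules()</code>, and a placeholder core list; neither
reflects HTML Purifier's actual implementation or default value.</p>

<pre>// Hypothetical helper: the user's module list is unioned with the core
// modules, so a core module can never be deselected. Core modules keep
// their position at the front; repeated names are dropped.
function effective_modules($allowed_modules, array $core_modules)
{
    $selected = array_filter(array_map('trim', explode(',', $allowed_modules)), 'strlen');
    return array_values(array_unique(array_merge($core_modules, $selected)));
}

// Placeholder core list, for illustration only.
print_r(effective_modules('Hypertext,Text,Lists', array('Structure', 'Text')));</pre>

<p>With that placeholder core list, the result is Structure, Text,
Hypertext, Lists: the core modules survive even though the user did not
list Structure, and the duplicate Text collapses to one entry.</p>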
<p class="technical">Modules are distinguished from regular elements by the
case of their first letter. While XML is case-sensitive and allows both
lower- and uppercase letters in element names, XHTML uses only lowercase
element names for the sake of consistency, so a capitalized name can never
collide with an element.</p>
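<p>Under this convention, telling modules from elements needs only the
first character of each name. A minimal sketch, using a hypothetical helper
<code>classify_tokens()</code> that is not part of HTML Purifier:</p>

<pre>// Hypothetical helper: split a comma-separated list into module names
// (first letter uppercase) and element names (everything else).
function classify_tokens($list)
{
    $result = array('modules' => array(), 'elements' => array());
    foreach (explode(',', $list) as $token) {
        $token = trim($token);
        if ($token === '') {
            continue;
        }
        // preg_match() returns 1 on a match and 0 otherwise.
        $key = preg_match('/^[A-Z]/', $token) ? 'modules' : 'elements';
        $result[$key][] = $token;
    }
    return $result;
}

print_r(classify_tokens('Hypertext,a,em,Lists'));</pre>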
<h3>Selecting Tidy</h3>
<p>The name of this segment of functionality is inspired by Dave
Raggett's program HTML Tidy, which helps clean up HTML. In
HTML Purifier, Tidy functionality involves turning unsupported and
deprecated elements into standards-compliant ones, maintaining
backwards compatibility, and enforcing best practices.</p>
<p>This is a complicated feature, and is explained in more depth on
<a href="enduser-tidy.html">the Tidy documentation page</a>.</p>
<!--
<h3>Unified selector</h3>
<p>Because selecting each and every one of these configuration options
is a chore, we may wish to offer a specialized configuration method
for selecting a filterset. Possibility:</p>
<pre>function selectFilter($doctype, $filterset, $tidy)</pre>
<p>...which is simply a light wrapper over the individual configuration
calls. A custom config file format or text format could also be adopted.</p>
-->
<h2>Customize</h2>
<p>By reviewing topic posts in the support forum, we determined that
there were two customization features people primarily wanted:
adding an attribute to an existing element, and adding a new element.
Thus, we'll want to create convenience functions for these common
use-cases.</p>
<p>Note that the functions described here are only available if
a raw copy of <code>HTMLPurifier_HTMLDefinition</code> was retrieved.
Furthermore, caching may prevent your changes from being seen
immediately: consult <a href="enduser-customize.html">enduser-customize.html</a> for
how to work around this.</p>
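<p>Why caching can hide changes is easiest to see in a toy model. The class
below is hypothetical and far simpler than HTML Purifier's real definition
cache: it builds a definition once per key and afterwards returns the stored
copy, so a changed build step goes unnoticed until the key changes (here, a
version number appended to the key).</p>

<pre>// Toy model only, not HTML Purifier's implementation: a definition is
// built once per key, and later requests get the stored copy.
class ToyDefinitionCache
{
    private $store = array();

    public function get($key, $build)
    {
        if (!isset($this->store[$key])) {
            $this->store[$key] = call_user_func($build);
        }
        return $this->store[$key];
    }
}

$cache = new ToyDefinitionCache();
$first = $cache->get('my-def:1', function () {
    return array('a' => array('href'));
});
// The build step changed but the key did not: the stale copy comes back.
$second = $cache->get('my-def:1', function () {
    return array('a' => array('href', 'rel'));
});
// Changing the key (here, bumping the version suffix) forces a rebuild.
$third = $cache->get('my-def:2', function () {
    return array('a' => array('href', 'rel'));
});</pre>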
<h3>Attributes</h3>
<p>An attribute is bound to an element by a name and has a specific
<code>AttrDef</code> that validates it. The interface is therefore:</p>
<pre>function addAttribute($element, $attribute, $attribute_def);</pre>
<p>Example of the functionality in action:</p>
<pre>$def->addAttribute('a', 'rel', 'Enum#nofollow');</pre>
<p>The <code>$attribute_def</code> value is flexible,
to make things simpler. It can be a literal object or:</p>
<ul>
<!--<li>Class name: We'll instantiate it for you</li>
<li>Function name: We'll create an <code>HTMLPurifier_AttrDef_Anonymous</code>
class with that function registered as a callback.</li>-->
<li>String attribute type: We'll use <code>HTMLPurifier_AttrTypes</code>
to resolve it for you. Any data that follows a hash mark (#) will
be used to customize the attribute type: in the example above,
we specify which values for Enum to allow.</li>
</ul>
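<p>The string form can be read as a type name plus an optional parameter
after the first hash mark. The sketch below shows one way to resolve
<code>'Enum#nofollow'</code>; <code>make_attr_validator()</code> is
hypothetical, handles only Enum, and compares values case-sensitively, so it
should not be read as how <code>HTMLPurifier_AttrTypes</code> is
implemented.</p>

<pre>// Hypothetical sketch, not HTMLPurifier_AttrTypes: split a type string at
// its first '#' into a type name and its customization parameter, then
// build a validator. Only 'Enum' is modelled here.
function make_attr_validator($spec)
{
    $parts = explode('#', $spec, 2);
    $type = $parts[0];
    $param = isset($parts[1]) ? $parts[1] : null;
    if ($type !== 'Enum') {
        throw new InvalidArgumentException("Unsupported type: $type");
    }
    // For Enum, the parameter is a comma-separated list of allowed values.
    $allowed = ($param === null) ? array() : explode(',', $param);
    // The validator returns the cleaned value, or false if it is rejected.
    return function ($value) use ($allowed) {
        $value = trim($value);
        return in_array($value, $allowed, true) ? $value : false;
    };
}

$validate = make_attr_validator('Enum#nofollow');
var_dump($validate('nofollow')); // string(8) "nofollow"
var_dump($validate('external')); // bool(false)</pre>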
<h3>Elements</h3>
<p>An element requires certain information as specified by
<code>HTMLPurifier_ElementDef</code>. Not all of it is necessary, however;
the things usually required are:</p>
<ul>
<li>Attributes</li>
<li>Content model/type</li>
<li>Registration in a content set</li>
</ul>
<p>This suggests an API like this:</p>
<pre>function addElement($element, $type, $contents,
    $attr_collections = array(), $attributes = array());</pre>
<p>Each parameter explained in depth:</p>
<dl>
<dt><code>$element</code></dt>
<dd>Element name, e.g. 'label'</dd>
<dt><code>$type</code></dt>
<dd>Content set to register in, e.g. 'Inline' or 'Flow'</dd>
<dt><code>$contents</code></dt>
<dd>Description of allowed children. This is a merged form of
<code>HTMLPurifier_ElementDef</code>'s member variables
<code>$content_model</code> and <code>$content_model_type</code>,
where the form is <q>Type: Model</q>, e.g. 'Optional: Inline'.
There are also a number of predefined templates one may use.</dd>
<dt><code>$attr_collections</code></dt>
<dd>Array (or string if only one) of attribute collection(s) to
merge into the attributes array.</dd>
<dt><code>$attributes</code></dt>
<dd>Array of attribute names to attribute definitions, much like
the above-described attribute customization.</dd>
</dl>
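<p>The merged <q>Type: Model</q> form of <code>$contents</code> splits at
its first colon. A minimal sketch with a hypothetical helper,
<code>split_contents()</code>; treating a string without a colon as the
name of a predefined template is an assumption made here for
illustration.</p>

<pre>// Hypothetical helper: split the merged 'Type: Model' form into
// array(content_model_type, content_model). A string without a colon is
// assumed to name a predefined template and is returned with a null type.
function split_contents($contents)
{
    $pos = strpos($contents, ':');
    if ($pos === false) {
        return array(null, trim($contents));
    }
    return array(
        trim(substr($contents, 0, $pos)),  // e.g. 'Optional'
        trim(substr($contents, $pos + 1)), // e.g. 'Inline'
    );
}

print_r(split_contents('Optional: Inline'));</pre>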
<p>A possible usage:</p>
<pre>$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
    array('color' => 'Color'));</pre>
<p>See <code>HTMLPurifier/HTMLModule.php</code> for details.</p>
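<p>To show what each parameter contributes, here is a hypothetical
<code>sketch_add_element()</code> that works on a plain array instead of a
real <code>HTMLPurifier_HTMLDefinition</code>. The array layout and the
contents of the 'Common' collection are invented for illustration.</p>

<pre>// Hypothetical sketch: records what addElement() needs on a plain array.
// The array layout and the 'Common' collection contents are invented.
function sketch_add_element(array $definition, $element, $type, $contents,
                            $attr_collections = array(), $attributes = array())
{
    // Register the element in its content set, so that any parent whose
    // content model accepts e.g. 'Inline' will also accept this element.
    $definition['content_sets'][$type][] = $element;

    // A single collection name may be given as a string; cast to an array.
    // Collections are merged first so explicit attributes win on clashes.
    $attrs = array();
    foreach ((array) $attr_collections as $name) {
        $attrs = array_merge($attrs, $definition['collections'][$name]);
    }
    $attrs = array_merge($attrs, $attributes);

    $definition['elements'][$element] = array(
        'contents' => $contents,
        'attr'     => $attrs,
    );
    return $definition;
}

$definition = array(
    'content_sets' => array('Inline' => array('em', 'strong')),
    // Illustrative subset only, not HTML Purifier's real 'Common' collection.
    'collections'  => array('Common' => array('class' => 'Class', 'title' => 'Text')),
    'elements'     => array(),
);
$definition = sketch_add_element($definition, 'font', 'Inline', 'Optional: Inline',
                                 'Common', array('color' => 'Color'));
print_r($definition['elements']['font']);</pre>

<p>Merging the collections before the explicit attributes is a design choice
of this sketch: it lets an element override a collection entry by redefining
the same attribute name.</p>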
</body></html>
<!-- vim: et sw=4 sts=4
-->