<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for customizing HTML Purifier's tag and attribute sets." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Customize - HTML Purifier</title>
</head><body>
<h1 class="subtitled">Customize!</h1>
<div class="subtitle">HTML Purifier is a Swiss-Army Knife</div>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
HTML Purifier has this quirk where if you try to allow certain elements or
attributes, HTML Purifier will tell you that they're not supported, and that
you should go to the forums to find out how to implement them. Well, this
document explains how to implement elements and attributes which HTML Purifier
doesn't support out of the box.
</p>
<h2>Is it necessary?</h2>
<p>
Before we write any code, it is paramount to consider whether the code
we're writing is necessary at all. HTML Purifier, by default,
contains a large set of elements and attributes: large enough that
<em>any</em> element or attribute in XHTML 1.0 or 1.1 (and its HTML variants)
that can be safely used by the general public is implemented.
</p>
<p>
So what still needs to be implemented? (Feel free to skip this section if
you know what you want.)
</p>
<h3>XHTML 1.0</h3>
<p>
All of the modules listed below are based off of the
<a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#sec_5.2.">modularization of
XHTML</a>, which, while technically for XHTML 1.1, is quite a useful
resource.
</p>
<ul>
<li>Structure</li>
<li>Frames</li>
<li>Applets (deprecated)</li>
<li>Forms</li>
<li>Image maps</li>
<li>Objects</li>
<li>Events</li>
<li>Meta-information</li>
<li>Style sheets</li>
<li>Link (not hypertext)</li>
<li>Base</li>
<li>Name</li>
</ul>
<p>
If you don't recognize it, you probably don't need it. But the curious
can look all of these modules up in the above-mentioned document. Note
that inline scripting comes packaged with HTML Purifier (more on this
later).
</p>
<h3>XHTML 1.1</h3>
<p>
As of HTML Purifier 2.1.0, we have implemented the
<a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>,
which defines a set of tags
for publishing short annotations for text, used mostly in Japanese
and Chinese school texts, but applicable to positioning any text (not
limited to translations) above or below other corresponding text.
</p>
<h3>XHTML 2.0</h3>
<p>
<a href="http://www.w3.org/TR/xhtml2/">XHTML 2.0</a> is still a
working draft, so any elements introduced in the
specification have not been implemented and will not be implemented
until we get a recommendation or proposal. Because XHTML 2.0 is
an entirely new markup language, implementing rules for it will be
no easy task.
</p>
<h3>HTML 5</h3>
<p>
<a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML 5</a>
is a fork of HTML 4.01 by WHATWG, who believed that XHTML 2.0 was headed
in the wrong direction. It too is a working draft, and may change
drastically before publication, but it should be noted that the
<code>canvas</code> tag has been implemented by many browser vendors.
</p>
<h3>Proprietary</h3>
<p>
There are a number of proprietary tags still in the wild. Many of them
have been documented in <a href="ref-proprietary-tags.txt">ref-proprietary-tags.txt</a>,
but there is currently no implementation for any of them.
</p>
<h3>Extensions</h3>
<p>
There are also a number of other XML languages out there that can
be embedded in HTML documents: two of the most popular are MathML and
SVG, and I frequently get requests to implement these. But they are
expansive, comprehensive specifications, and it would take far too long
to implement them <em>correctly</em> (most systems I've seen go as far
as whitelisting tags and no further; come on, what about nesting!).
</p>
<p>
Word of warning: HTML Purifier is currently <em>not</em> namespace
aware.
</p>
<h2>Giving back</h2>
<p>
As you may imagine from the details above (don't be abashed if you didn't
read it all: a glance over would have done), there's quite a bit that
HTML Purifier doesn't implement. Recent architectural changes have
allowed HTML Purifier to implement elements and attributes that are not
safe! Don't worry, they won't be activated unless you set %HTML.Trusted
to true, but they certainly help out users who need to put, say, forms
on their page and don't want to go through the trouble of reading this
and implementing it themselves.
</p>
<p>
So any of the above that you implement for your own application could
help out some other poor sap on the other side of the globe. Help us
out, and send back code so that it can be hammered into a module and
released with the core. Any code would be greatly appreciated!
</p>
<h2>And now...</h2>
<p>
Enough philosophical talk, time for some code:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$def = $config-&gt;getHTMLDefinition(true);</pre>
<p>
Assuming that HTML Purifier has already been properly loaded (hint:
include <code>HTMLPurifier.auto.php</code>), this code will set up
the environment that you need to start customizing the HTML definition.
What's going on?
</p>
<ul>
<li>
The first three lines are regular configuration code:
<ul>
<li>
%HTML.DefinitionID is set to a unique identifier for your
custom HTML definition. This prevents it from clobbering
other custom definitions on the same installation.
</li>
<li>
%HTML.DefinitionRev is a revision integer of your HTML
definition. Because HTML definitions are cached, you'll need
to increment this whenever you make a change in order to flush
the cache.
</li>
</ul>
</li>
<li>
The fourth line retrieves a raw <code>HTMLPurifier_HTMLDefinition</code>
object that we will be tweaking. If the parameter were removed, we
would be retrieving a fully formed definition object, which is somewhat
useless for customization purposes.
</li>
</ul>
<h3>Broken backwards-compatibility</h3>
<p>
Those of you who have already been twiddling around with the raw
HTML definition object will notice that you now get an error
when you attempt to retrieve the raw definition object without specifying
a DefinitionID. It is vital to caching (see below) that you give your
customized definition a unique name, so make up something right now and
things will operate again.
</p>
<h2>Turn off caching</h2>
<p>
To make development easier, we're going to temporarily turn off
definition caching:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
<strong>$config-&gt;set('Cache.DefinitionImpl', null); // TODO: remove this later!</strong>
$def = $config-&gt;getHTMLDefinition(true);</pre>
<p>
A few things should be mentioned about the caching mechanism before
we move on. For performance reasons, HTML Purifier caches generated
<code>HTMLPurifier_Definition</code> objects in serialized files
stored (by default) in <code>library/HTMLPurifier/DefinitionCache/Serializer</code>.
A lot of processing is done in order to create these objects, so it
makes little sense to repeat the same processing over and over again
whenever HTML Purifier is called.
</p>
<p>
In order to identify a cache entry, HTML Purifier uses three variables:
the library's version number, the value of %HTML.DefinitionRev and
a serial of the relevant configuration. Whenever any of these changes,
a new HTML definition is generated. Notice that there is no way
for the definition object to track changes to customizations: here, it
is up to you to supply appropriate values for %HTML.DefinitionID and
%HTML.DefinitionRev.
</p>
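<p>
To make the dependency on those three inputs concrete, here is a minimal
sketch of how such a cache key could be put together. The function name
<code>definition_cache_key()</code>, the key layout and the version string
are made up for this illustration; HTML Purifier's serializer cache uses
its own scheme.
</p>

```php
<?php
// Hypothetical helper, not part of HTML Purifier: builds a cache key from
// the library version, the definition revision and the relevant settings.
function definition_cache_key(string $version, int $revision, array $relevantConfig): string
{
    // Sort so that the order in which directives were set does not matter.
    ksort($relevantConfig);
    $serial = md5(serialize($relevantConfig));
    return $version . ',' . $revision . ',' . $serial;
}

$settings = array('HTML.DefinitionID' => 'enduser-customize.html tutorial');
$before = definition_cache_key('2.1.0', 1, $settings);
$after  = definition_cache_key('2.1.0', 2, $settings); // revision bumped

var_dump($before === $after); // bool(false): a new revision means a new entry
```

<p>
Nothing about the customizations themselves enters the key in this sketch,
which is why a forgotten revision bump would keep serving the stale entry.
</p>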
<h2 id="addAttribute">Add an attribute</h2>
<p>
For this example, we're going to implement the <code>target</code> attribute found
on <code>a</code> elements. To implement an attribute, we have to
ask a few questions:
</p>
<ol>
<li>What element is it found on?</li>
<li>What is its name?</li>
<li>Is it required or optional?</li>
<li>What are valid values for it?</li>
</ol>
<p>
The first three are easy: the element is <code>a</code>, the attribute
is <code>target</code>, and it is not a required attribute. (If it
were required, we'd need to append an asterisk to the attribute name;
you'll see an example of this in the addElement() example.)
</p>
<p>
The last question is a little trickier.
Let's allow the special values: _blank, _self, _parent and _top.
The form of this is called an <strong>enumeration</strong>, a list of
valid values, although only one can be used at a time. To translate
this into code form, we write:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def-&gt;addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');</strong></pre>
<p>
The <code>Enum#_blank,_self,_parent,_top</code> does all the magic.
The string is split into two parts, separated by a hash mark (#):
</p>
<ol>
<li>The first part is the name of what we call an <code>AttrDef</code></li>
<li>The second part is the parameter of the above-mentioned <code>AttrDef</code></li>
</ol>
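<p>
The split itself is nothing fancy. As a rough sketch (the helper
<code>split_attr_spec()</code> is a made-up name for illustration, not a
function HTML Purifier exposes), it amounts to:
</p>

```php
<?php
// Hypothetical helper for illustration only.
function split_attr_spec(string $spec): array
{
    // A limit of 2 keeps any further '#' characters inside the parameter.
    $parts = explode('#', $spec, 2);
    $type = $parts[0];
    $parameter = isset($parts[1]) ? $parts[1] : null;
    return array($type, $parameter);
}

list($type, $parameter) = split_attr_spec('Enum#_blank,_self,_parent,_top');
// $type is 'Enum', $parameter is '_blank,_self,_parent,_top'

list($type2, $parameter2) = split_attr_spec('URI');
// $type2 is 'URI', $parameter2 is null: this type takes no parameter
```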
<p>
If that sounds vague and generic, it's because it is! HTML Purifier defines
an assortment of different attribute types one can use, and each of these
has its own specialized parameter format. Here are some of the more useful
ones:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Format</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Enum</th>
<td><em>[s:]</em>value1,value2,...</td>
<td>
Attribute with a number of valid values, one of which may be used. When
s: is present, the enumeration is case sensitive.
</td>
</tr>
<tr>
<th>Bool</th>
<td>attribute_name</td>
<td>
Boolean attribute, with only one valid value: the name
of the attribute.
</td>
</tr>
<tr>
<th>CDATA</th>
<td></td>
<td>
Attribute of arbitrary text. Can also be referred to as <strong>Text</strong>
(the specification makes a semantic distinction between the two).
</td>
</tr>
<tr>
<th>ID</th>
<td></td>
<td>
Attribute that specifies a unique ID
</td>
</tr>
<tr>
<th>Pixels</th>
<td></td>
<td>
Attribute that specifies an integer pixel length
</td>
</tr>
<tr>
<th>Length</th>
<td></td>
<td>
Attribute that specifies a pixel or percentage length
</td>
</tr>
<tr>
<th>NMTOKENS</th>
<td></td>
<td>
Attribute that specifies a number of name tokens, example: the
<code>class</code> attribute
</td>
</tr>
<tr>
<th>URI</th>
<td></td>
<td>
Attribute that specifies a URI, example: the <code>href</code>
attribute
</td>
</tr>
<tr>
<th>Number</th>
<td></td>
<td>
Attribute that specifies a positive integer number
</td>
</tr>
</tbody>
</table>
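<p>
To illustrate the Enum format from the table, including the optional
<code>s:</code> prefix, here is a stand-alone sketch. The function
<code>enum_accepts()</code> is invented for this example and only mimics
the rules described above; it is not HTML Purifier's own class.
</p>

```php
<?php
// Illustrative only: checks a value against an Enum parameter string.
function enum_accepts(string $parameter, string $value): bool
{
    $caseSensitive = false;
    if (strncmp($parameter, 's:', 2) === 0) {
        $caseSensitive = true;
        $parameter = substr($parameter, 2); // drop the "s:" marker
    }
    $allowed = explode(',', $parameter);
    if (!$caseSensitive) {
        // Compare everything in lower case when case does not matter.
        $allowed = array_map('strtolower', $allowed);
        $value = strtolower($value);
    }
    return in_array($value, $allowed, true);
}

var_dump(enum_accepts('_blank,_self,_parent,_top', '_BLANK'));   // bool(true)
var_dump(enum_accepts('s:_blank,_self,_parent,_top', '_BLANK')); // bool(false)
var_dump(enum_accepts('_blank,_self,_parent,_top', '_new'));     // bool(false)
```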
<p>
For a complete list, consult
<a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>;
more information on attributes that accept parameters can be found on their
respective includes in
<a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/AttrDef"><code>library/HTMLPurifier/AttrDef</code></a>.
</p>
<p>
Sometimes, the restrictive list in AttrTypes just doesn't cut it. Don't
sweat: you can also use a fully instantiated object as the value. The
equivalent, verbose form of the above example is:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
  array('_blank','_self','_parent','_top')
));</strong></pre>
<p>
Trust me, you'll learn to love the shorthand.
</p>
<h2>Add an element</h2>
<p>
Adding attributes is really small-fry stuff, though, and it was possible
to add them (albeit more verbosely) prior to 2.0. The real gem of
the Advanced API is adding elements. There are five questions to
ask when adding a new element:
</p>
<ol>
<li>What is the element's name?</li>
<li>What content set does this element belong to?</li>
<li>What are the allowed children of this element?</li>
<li>What attributes does the element allow that are general?</li>
<li>What attributes does the element allow that are specific to this element?</li>
</ol>
<p>
It's a mouthful, and you'll be slightly lost if you're not familiar with
the HTML specification, so let's explain them step by step.
</p>
<h3>Content set</h3>
<p>
The HTML specification defines two major content sets: Inline
and Block. Each of these
content sets contains a list of elements: Inline contains things like
<code>span</code> and <code>b</code> while Block contains things like
<code>div</code> and <code>blockquote</code>.
</p>
<p>
These content sets amount to a macro mechanism for HTML definition. Most
elements in HTML are organized into one of these two sets, and most
elements in HTML allow elements from one of these sets. If we had
to write each element verbatim into each other element's allowed
children, we would have ridiculously large lists; instead we use
content sets to keep the declarations compact.
</p>
<p>
Practically speaking, there are several useful values you can use here:
</p>
<table class="table">
<thead>
<tr>
<th>Content set</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Inline</th>
<td>Character level elements, text</td>
</tr>
<tr>
<th>Block</th>
<td>Block-like elements, like paragraphs and lists</td>
</tr>
<tr>
<th><em>false</em></th>
<td>
Any element that doesn't fit into the mold, for example <code>li</code>
or <code>tr</code>
</td>
</tr>
</tbody>
</table>
<p>
By specifying a valid value here, all other elements that use that
content set will also allow your element, without you having to do
anything. If you specify <em>false</em>, you'll have to register
your element manually.
</p>
<h3>Allowed children</h3>
<p>
Allowed children defines the elements that this element can contain.
The allowed values may range from none to a complex regexp depending on
your element.
</p>
<p>
If you've ever taken a look at the HTML DTDs before, you may have
noticed declarations like this:
</p>
<pre>&lt;!ELEMENT LI - O (%flow;)* -- list item --&gt;</pre>
<p>
The <code>(%flow;)*</code> indicates the allowed children of the
<code>li</code> tag: <code>li</code> allows any number of flow
elements as its children. (The <code>- O</code> allows the closing tag to be
omitted, though in XML this is not allowed.) In HTML Purifier,
we'd write it like <code>Flow</code> (here's where the content sets
we were discussing earlier come into play). There are three shorthand
content models you can specify:
</p>
<table class="table">
<thead>
<tr>
<th>Content model</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Empty</th>
<td>No children allowed, like <code>br</code> or <code>hr</code></td>
</tr>
<tr>
<th>Inline</th>
<td>Any number of inline elements and text, like <code>span</code></td>
</tr>
<tr>
<th>Flow</th>
<td>Any number of inline elements, block elements and text, like <code>div</code></td>
</tr>
</tbody>
</table>
<p>
This covers 90% of all the cases out there, but what about elements that
break the mold like <code>ul</code>? This guy requires at least one
child, and the only valid children for it are <code>li</code>. The
content model is: <code>Required: li</code>. There are two parts: the
first part, the type, determines which <code>ChildDef</code> will be used
to validate the content model. The most common values are:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Required</th>
<td>Children must be one or more of the valid elements</td>
</tr>
<tr>
<th>Optional</th>
<td>Children can be any number of the valid elements</td>
</tr>
<tr>
<th>Custom</th>
<td>Children must follow the DTD-style regex</td>
</tr>
</tbody>
</table>
<p>
You can also implement your own <code>ChildDef</code>: this was done
for a few special cases in HTML Purifier such as <code>Chameleon</code>
(for <code>ins</code> and <code>del</code>), <code>StrictBlockquote</code>
and <code>Table</code>.
</p>
<p>
The second part specifies either valid elements or a regular expression.
Valid elements are separated with horizontal bars (|), e.g.
"<code>a | b | c</code>". Use #PCDATA to represent plain text.
Regular expressions follow DTD syntax:
</p>
<ul>
<li>Parentheses () are used for grouping</li>
<li>Commas (,) separate elements that should come one after another</li>
<li>Horizontal bars (|) indicate one or the other elements should be used</li>
<li>Plus signs (+) are used for a one or more match</li>
<li>Asterisks (*) are used for a zero or more match</li>
<li>Question marks (?) are used for a zero or one match</li>
</ul>
<p>
For example, "<code>a, b?, (c | d), e+, f*</code>" means "In this order,
one <code>a</code> element, at most one <code>b</code> element,
one <code>c</code> or <code>d</code> element (but not both), one or more
<code>e</code> elements, and any number of <code>f</code> elements."
Regex veterans should be able to jump right in, and those not so savvy
can always copy-paste W3C's content model definitions into HTML Purifier
and hope for the best.
</p>
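<p>
If you want to get a feel for how such a DTD-style expression behaves,
the sketch below translates one into a PCRE pattern and tests a list of
child element names against it. Both function names are hypothetical and
the translation is deliberately simplified; it is not the code HTML
Purifier uses for its Custom content models.
</p>

```php
<?php
// Hypothetical helper: turn a DTD-style content model into a PCRE pattern
// that matches a comma-terminated list of child element names.
function dtd_model_to_pcre(string $model): string
{
    $model = preg_replace('/\s+/', '', $model);
    $body = preg_replace_callback(
        '/[#A-Za-z0-9_.-]+|,/',
        function (array $m): string {
            if ($m[0] === ',') {
                return ''; // "one after another" is plain concatenation in PCRE
            }
            // Each element name must be followed by its terminating comma.
            return '(?:' . preg_quote($m[0], '/') . ',)';
        },
        $model
    );
    // Grouping and the operators | + * ? pass through unchanged.
    return '/^(?:' . $body . ')$/';
}

// Hypothetical helper: do the children, in this order, satisfy the model?
function children_match(string $model, array $children): bool
{
    $subject = $children ? implode(',', $children) . ',' : '';
    return preg_match(dtd_model_to_pcre($model), $subject) === 1;
}

$model = 'a, b?, (c | d), e+, f*';
var_dump(children_match($model, array('a', 'c', 'e')));           // bool(true)
var_dump(children_match($model, array('a', 'b', 'd', 'e', 'f'))); // bool(true)
var_dump(children_match($model, array('a', 'c', 'd', 'e')));      // bool(false)
```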
<p>
A word of warning: while the regex format is extremely flexible on
the developer's side, it is
quite unforgiving on the user's side. If the user input does not <em>exactly</em>
match the specification, the entire contents of the element will
be nuked. This is why there are specific content model types like
Optional and Required: while they could be implemented as <code>Custom:
(valid | elements)*</code>, the custom classes contain special recovery
measures that make sure as much of the user's original content as possible
gets through. HTML Purifier's core, as a rule, does not use Custom.
</p>
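<p>
The difference in forgiveness can be pictured with two toy validators.
Both functions below are made up for this illustration and work on plain
lists of tag names; the recovery measures in HTML Purifier's own classes
are more involved.
</p>

```php
<?php
// "Optional"-style sketch: drop only the offending children, keep the rest.
function lenient_children(array $children, array $allowed): array
{
    return array_values(array_filter(
        $children,
        function ($name) use ($allowed) {
            return in_array($name, $allowed, true);
        }
    ));
}

// "Custom"-style sketch: a single mismatch discards every child.
function strict_children(array $children, array $allowed): array
{
    foreach ($children as $name) {
        if (!in_array($name, $allowed, true)) {
            return array();
        }
    }
    return $children;
}

$input = array('li', 'li', 'div', 'li');
print_r(lenient_children($input, array('li'))); // the three li entries survive
print_r(strict_children($input, array('li')));  // empty array: all content lost
```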
<p>
One final note: you can also use Content Sets inside your valid elements
lists or regular expressions. In fact, the shorthand content models
mentioned above are just that, abbreviations:
</p>
<table class="table">
<thead>
<tr>
<th>Content model</th>
<th>Implementation</th>
</tr>
</thead>
<tbody>
<tr>
<th>Inline</th>
<td>Optional: Inline | #PCDATA</td>
</tr>
<tr>
<th>Flow</th>
<td>Optional: Flow | #PCDATA</td>
</tr>
</tbody>
</table>
<p>
When the definition is compiled, Inline will be replaced with a
horizontal-bar separated list of inline elements. Also, notice that
the content set does not contain text: you have to specify #PCDATA yourself.
</p>
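<p>
As a rough picture of that compilation step, the sketch below splits a
two-part content model and substitutes content set names with their
members. The helper name and the tiny content set table are assumptions
for this example; the real sets are far longer, and the one-word
shorthands such as <code>Empty</code> are not handled here.
</p>

```php
<?php
// Hypothetical, heavily shortened content sets.
$contentSets = array(
    'Inline' => array('span', 'b', 'em'),
    'Block'  => array('div', 'blockquote', 'p'),
);

// Illustrative helper: split "Type: elements" and expand content set names.
function expand_content_model(string $model, array $sets): array
{
    list($type, $elements) = array_map('trim', explode(':', $model, 2));
    $expanded = array();
    foreach (array_map('trim', explode('|', $elements)) as $name) {
        if (isset($sets[$name])) {
            $expanded = array_merge($expanded, $sets[$name]);
        } else {
            $expanded[] = $name; // ordinary element name or #PCDATA
        }
    }
    return array($type, implode(' | ', $expanded));
}

list($type, $elements) = expand_content_model('Optional: Inline | #PCDATA', $contentSets);
echo $type, "\n";     // Optional
echo $elements, "\n"; // span | b | em | #PCDATA
```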
<h3>Common attributes</h3>
<p>
Congratulations: you have just gotten over the proverbial hump (Allowed
children). Common attributes are much simpler, and boil down to
one question: does your element have the <code>id</code>, <code>style</code>,
<code>class</code>, <code>title</code> and <code>lang</code> attributes?
If so, you'll want to specify the <code>Common</code> attribute collection,
which contains these five attributes that are found on almost every
HTML element in the specification.
</p>
<p>
There are a few more collections, but they're really edge cases:
</p>
<table class="table">
<thead>
<tr>
<th>Collection</th>
<th>Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<th>I18N</th>
<td><code>lang</code>, possibly <code>xml:lang</code></td>
</tr>
<tr>
<th>Core</th>
<td><code>style</code>, <code>class</code>, <code>id</code> and <code>title</code></td>
</tr>
</tbody>
</table>
<p>
Common is a combination of the above-mentioned collections.
</p>
<p class="aside">
Readers familiar with the modularization may have noticed that the Core
attribute collection differs from that specified by the <a
href="http://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_commonatts">abstract
modules of the XHTML Modularization 1.1</a>. We believe this section
to be in error, as <code>br</code> permits the use of the <code>style</code>
attribute even though it uses the <code>Core</code> collection, and
the DTD and XML Schemas supplied by W3C support our interpretation.
</p>
<h3>Attributes</h3>
<p>
If you didn't read the <a href="#addAttribute">earlier section on
adding attributes</a>, read it now. The last parameter is simply
an array of attribute names to attribute implementations, in the exact
same format as <code>addAttribute()</code>.
</p>
<h3>Putting it all together</h3>
<p>
We're going to implement <code>form</code>. Before we embark, let's
grab a reference implementation from over at the
<a href="http://www.w3.org/TR/html4/sgml/loosedtd.html">transitional DTD</a>:
</p>
<pre>&lt;!ELEMENT FORM - - (%flow;)* -(FORM)   -- interactive form --&gt;
&lt;!ATTLIST FORM
  %attrs;                              -- %coreattrs, %i18n, %events --
  action      %URI;          #REQUIRED -- server-side form handler --
  method      (GET|POST)     GET       -- HTTP method used to submit the form--
  enctype     %ContentType;  &quot;application/x-www-form-urlencoded&quot;
  accept      %ContentTypes; #IMPLIED  -- list of MIME types for file upload --
  name        CDATA          #IMPLIED  -- name of form for scripting --
  onsubmit    %Script;       #IMPLIED  -- the form was submitted --
  onreset     %Script;       #IMPLIED  -- the form was reset --
  target      %FrameTarget;  #IMPLIED  -- render in this frame --
  accept-charset %Charsets;  #IMPLIED  -- list of supported charsets --
  &gt;</pre>
<p>
Juicy! With just this, we can answer four of our five questions:
</p>
<ol>
<li>What is the element's name? <strong>form</strong></li>
<li>What content set does this element belong to? <strong>Block</strong>
(this needs a little sleuthing; I find the easiest way is to search
the DTD for <code>FORM</code> and determine which set it is in.)</li>
<li>What are the allowed children of this element? <strong>Any
number of flow elements, but no nested <code>form</code>s</strong></li>
<li>What attributes does the element allow that are general? <strong>Common</strong></li>
<li>What attributes does the element allow that are specific to this element? <strong>A whole bunch, see ATTLIST;
we're going to do the vital ones: <code>action</code>, <code>method</code> and <code>name</code></strong></li>
</ol>
<p>
Time for some code:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
  array('_blank','_self','_parent','_top')
));
<strong>$form = $def-&gt;addElement(
  'form',   // name
  'Block',  // content set
  'Flow',   // allowed children
  'Common', // attribute collection
  array(    // attributes
    'action*' => 'URI',
    'method' => 'Enum#get,post',
    'name' => 'ID'
  )
);
$form-&gt;excludes = array('form' => true);</strong></pre>
<p>
Each of the parameters corresponds to one of the questions we asked.
Notice that we added an asterisk to the end of the <code>action</code>
attribute to indicate that it is required. If someone specifies a
<code>form</code> without that attribute, the tag will be axed.
Also, the extra line at the end is a special extra declaration that
prevents forms from being nested within each other.
</p>
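<p>
The asterisk convention can be pictured as follows. This is only a sketch
with made-up helper names (<code>split_required()</code> and
<code>has_required()</code>); it shows the idea of separating required
attribute names from their definitions, not HTML Purifier's internals.
</p>

```php
<?php
// Illustrative sketch: split an attribute array into plain definitions
// and a list of the names that were marked with a trailing asterisk.
function split_required(array $attributes): array
{
    $definitions = array();
    $required = array();
    foreach ($attributes as $name => $spec) {
        if (substr($name, -1) === '*') {
            $name = substr($name, 0, -1); // strip the marker
            $required[] = $name;
        }
        $definitions[$name] = $spec;
    }
    return array($definitions, $required);
}

// Returns true if the tag may stay, false if it would have to be dropped.
function has_required(array $tagAttributes, array $required): bool
{
    foreach ($required as $name) {
        if (!array_key_exists($name, $tagAttributes)) {
            return false;
        }
    }
    return true;
}

list($definitions, $required) = split_required(array(
    'action*' => 'URI',
    'method'  => 'Enum#get,post',
    'name'    => 'ID',
));
var_dump(has_required(array('action' => '/submit.php'), $required)); // bool(true)
var_dump(has_required(array('method' => 'post'), $required));        // bool(false)
```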
<p>
And that's all there is to it! Implementing the rest of the form
module is left as an exercise for the reader; to see more examples,
check the <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/HTMLModule"><code>library/HTMLPurifier/HTMLModule/</code></a> directory
in your local HTML Purifier installation.
</p>
<h2>And beyond...</h2>
<p>
Perceptive users may have realized that, to a certain extent, we
have simply re-implemented the facilities of XML Schema or the
Document Type Definition. What you are seeing here, however, is
not just an XML Schema or Document Type Definition: it is a fully
expressive method of specifying the definition of HTML that is
a portable superset of the capabilities of the two above-mentioned schema
languages. What makes HTMLDefinition so powerful is the fact that
if we don't have an implementation for a content model or an attribute
definition, you can supply it yourself by writing a PHP class.
</p>
<p>
There are many facets of HTMLDefinition beyond the Advanced API I have
walked you through today. To find out more about these, you can
check out these source files:
</p>
<ul>
<li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li>
<li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li>
</ul>
</body></html>
<!-- vim: et sw=4 sts=4 -->