Release 2.0.1, merged in 1181 to HEAD.
<?php

require_once 'HTMLPurifier/Definition.php';
require_once 'HTMLPurifier/HTMLModuleManager.php';

// this definition and its modules MUST NOT define configuration directives
// outside of the HTML or Attr namespaces

HTMLPurifier_ConfigSchema::define(
    'HTML', 'DefinitionID', null, 'string/null', '
<p>
    Unique identifier for a custom-built HTML definition. If you edit
    the raw version of the HTMLDefinition, introducing changes that the
    configuration object does not reflect, you must specify this directive.
    If you change your custom edits, you should change this directive, or
    clear your cache. Example:
</p>
<pre>
$config = HTMLPurifier_Config::createDefault();
$config->set(\'HTML\', \'DefinitionID\', \'1\');
$def = $config->getHTMLDefinition();
$def->addAttribute(\'a\', \'tabindex\', \'Number\');
</pre>
<p>
    In the above example, the configuration is still at the defaults, but
    using the advanced API, an extra attribute has been added. The
    configuration object normally has no way of knowing that this change
    has taken place, so it needs an extra directive: %HTML.DefinitionID.
    If someone else attempts to use the default configuration, these two
    pieces of code will not clobber each other in the cache, since one has
    an extra directive attached to it.
</p>
<p>
    This directive has been available since 2.0.0, and in that version or
    later you <em>must</em> specify a value for this directive to use the
    advanced API features.
</p>
');

HTMLPurifier_ConfigSchema::define(
    'HTML', 'DefinitionRev', 1, 'int', '
<p>
    Revision identifier for your custom definition specified in
    %HTML.DefinitionID. This serves the same purpose: uniquely identifying
    your custom definition, but this one does so in a chronological
    context: revision 3 is more up-to-date than revision 2. Thus, when
    this gets incremented, the cache handling is smart enough to clean
    up any older revisions of your definition as well as flush the
    cache. This directive has been available since 2.0.0.
</p>
');

HTMLPurifier_ConfigSchema::define(
    'HTML', 'BlockWrapper', 'p', 'string', '
<p>
    String name of element to wrap inline elements that are inside a block
    context. This only occurs in the children of blockquote in strict mode.
</p>
<p>
    Example: with the default value,
    <code>&lt;blockquote&gt;Foo&lt;/blockquote&gt;</code> would become
    <code>&lt;blockquote&gt;&lt;p&gt;Foo&lt;/p&gt;&lt;/blockquote&gt;</code>.
    The <code>&lt;p&gt;</code> tags can be replaced with whatever you desire,
    as long as it is a block level element. This directive has been available
    since 1.3.0.
</p>
');

HTMLPurifier_ConfigSchema::define(
    'HTML', 'Parent', 'div', 'string', '
<p>
    String name of the element that the HTML fragment passed to the library
    will be inserted into. An interesting variation would be using span as
    the parent element, meaning that only inline tags would be allowed.
    This directive has been available since 1.3.0.
</p>
');

HTMLPurifier_ConfigSchema::define(
    'HTML', 'AllowedElements', null, 'lookup/null', '
<p>
    If HTML Purifier\'s tag set is unsatisfactory for your needs, you
    can overload it with your own list of tags to allow. Note that this
    method is subtractive: it does its job by taking away from HTML Purifier\'s
    usual feature set, so you cannot add a tag that HTML Purifier never
    supported in the first place (like embed, form or head). If you
    change this, you probably also want to change %HTML.AllowedAttributes.
</p>
<p>
    <strong>Warning:</strong> If another directive conflicts with the
    elements here, <em>that</em> directive will win and override.
    This directive has been available since 1.3.0.
</p>
');

HTMLPurifier_ConfigSchema::define(
    'HTML', 'AllowedAttributes', null, 'lookup/null', '
<p>
    If HTML Purifier\'s attribute set is unsatisfactory, overload it!
    The syntax is "tag.attr" or "*.attr" for the global attributes
    (style, id, class, dir, lang, xml:lang).
</p>
<p>
    <strong>Warning:</strong> If another directive conflicts with the
    attributes here, <em>that</em> directive will win and override. For
    example, %HTML.EnableAttrID will take precedence over *.id in this
    directive. You must set that directive to true before you can use
    IDs at all. This directive has been available since 1.3.0.
</p>
');

HTMLPurifier_ConfigSchema::define(
    'HTML', 'Allowed', null, 'string/null', '
<p>
    This is a convenience directive that rolls the functionality of
    %HTML.AllowedElements and %HTML.AllowedAttributes into one directive.
    Specify elements and attributes that are allowed using:
    <code>element1[attr1|attr2],element2...</code>.
</p>
<p>
    <strong>Warning</strong>:
    All of the constraints on the component directives are still enforced.
    The syntax is a <em>subset</em> of TinyMCE\'s <code>valid_elements</code>
    whitelist: directly copy-pasting it here will probably result in
    broken whitelists. If either %HTML.AllowedElements or
    %HTML.AllowedAttributes is set, this directive has no effect.
    This directive has been available since 2.0.0.
</p>
');

/**
 * Definition of the purified HTML that describes allowed children,
 * attributes, and many other things.
 *
 * Conventions:
 *
 * All member variables that are prefixed with info
 * (including the main $info array) are used by HTML Purifier internals
 * and should not be directly edited when customizing the HTMLDefinition.
 * They can usually be set via configuration directives or custom
 * modules.
 *
 * On the other hand, member variables without the info prefix are used
 * internally by the HTMLDefinition and MUST NOT be used by other HTML
 * Purifier internals. Many of them, however, are public, and may be
 * edited by userspace code to tweak the behavior of HTMLDefinition.
 *
 * @note This class is inspected by Printer_HTMLDefinition; please
 *       update that class if things here change.
 */
class HTMLPurifier_HTMLDefinition extends HTMLPurifier_Definition
{

    // FULLY-PUBLIC VARIABLES ---------------------------------------------

    /**
     * Associative array of element names to HTMLPurifier_ElementDef
     * @public
     */
    var $info = array();

    /**
     * Associative array of global attribute name to attribute definition.
     * @public
     */
    var $info_global_attr = array();

    /**
     * String name of parent element HTML will be going into.
     * @public
     */
    var $info_parent = 'div';

    /**
     * Definition for parent element, allows parent element to be a
     * tag that's not allowed inside the HTML fragment.
     * @public
     */
    var $info_parent_def;

    /**
     * String name of element used to wrap inline elements in block context
     * @note This is rarely used except for BLOCKQUOTEs in strict mode
     * @public
     */
    var $info_block_wrapper = 'p';

    /**
     * Associative array of deprecated tag name to HTMLPurifier_TagTransform
     * @public
     */
    var $info_tag_transform = array();

    /**
     * Indexed list of HTMLPurifier_AttrTransform to be performed before validation.
     * @public
     */
    var $info_attr_transform_pre = array();

    /**
     * Indexed list of HTMLPurifier_AttrTransform to be performed after validation.
     * @public
     */
    var $info_attr_transform_post = array();

    /**
     * Nested lookup array of content set name (Block, Inline) to
     * element name to whether or not it belongs in that content set.
     * @public
     */
    var $info_content_sets = array();

    /**
     * Doctype object
     */
    var $doctype;

    // RAW CUSTOMIZATION STUFF --------------------------------------------

    /**
     * Adds a custom attribute to a pre-existing element
     * @param $element_name String element name to add attribute to
     * @param $attr_name String name of attribute
     * @param $def Attribute definition, can be string or object, see
     *             HTMLPurifier_AttrTypes for details
     */
    function addAttribute($element_name, $attr_name, $def) {
        $module =& $this->getAnonymousModule();
        $element =& $module->addBlankElement($element_name);
        $element->attr[$attr_name] = $def;
    }

    /**
     * Adds a custom element to your HTML definition
     * @note See HTMLPurifier_HTMLModule::addElement for detailed
     *       parameter descriptions.
     */
    function addElement($element_name, $type, $contents, $attr_collections, $attributes) {
        $module =& $this->getAnonymousModule();
        // assume that if the user is calling this, the element
        // is safe. This may not be a good idea
        $module->addElement($element_name, true, $type, $contents, $attr_collections, $attributes);
    }

    /**
     * Retrieves a reference to the anonymous module, so you can
     * bust out advanced features without having to make your own
     * module.
     */
    function &getAnonymousModule() {
        if (!$this->_anonModule) {
            $this->_anonModule = new HTMLPurifier_HTMLModule();
            $this->_anonModule->name = 'Anonymous';
        }
        return $this->_anonModule;
    }

    var $_anonModule;

    // PUBLIC BUT INTERNAL VARIABLES --------------------------------------

    var $type = 'HTML';
    var $manager; /**< Instance of HTMLPurifier_HTMLModuleManager */

    /**
     * Performs low-cost, preliminary initialization.
     */
    function HTMLPurifier_HTMLDefinition() {
        $this->manager = new HTMLPurifier_HTMLModuleManager();
    }

    function doSetup($config) {
        $this->processModules($config);
        $this->setupConfigStuff($config);
        unset($this->manager);

        // cleanup some of the element definitions
        foreach ($this->info as $k => $v) {
            unset($this->info[$k]->content_model);
            unset($this->info[$k]->content_model_type);
        }
    }

    /**
     * Extract out the information from the manager
     */
    function processModules($config) {

        if ($this->_anonModule) {
            // for user specific changes
            // this is late-loaded so we don't have to deal with PHP4
            // reference wonky-ness
            $this->manager->addModule($this->_anonModule);
            unset($this->_anonModule);
        }

        $this->manager->setup($config);
        $this->doctype = $this->manager->doctype;

        foreach ($this->manager->modules as $module) {
            foreach ($module->info_tag_transform as $k => $v) {
                if ($v === false) unset($this->info_tag_transform[$k]);
                else $this->info_tag_transform[$k] = $v;
            }
            foreach ($module->info_attr_transform_pre as $k => $v) {
                if ($v === false) unset($this->info_attr_transform_pre[$k]);
                else $this->info_attr_transform_pre[$k] = $v;
            }
            foreach ($module->info_attr_transform_post as $k => $v) {
                if ($v === false) unset($this->info_attr_transform_post[$k]);
                else $this->info_attr_transform_post[$k] = $v;
            }
        }

        $this->info = $this->manager->getElements();
        $this->info_content_sets = $this->manager->contentSets->lookup;

    }

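/*
Illustration only, not part of the library: a minimal, self-contained sketch
of the merge rule that processModules() applies to each transform array. A
later module overrides an entry registered under the same key, and a value of
false removes the entry. The helper name merge_transforms() and the sample
data are hypothetical; plain strings stand in for the transform objects.

```php
<?php
// Hypothetical helper mirroring the merge rule: later modules win,
// and a value of false removes the key.
function merge_transforms(array $merged, array $module_transforms) {
    foreach ($module_transforms as $key => $value) {
        if ($value === false) {
            unset($merged[$key]);
        } else {
            $merged[$key] = $value;
        }
    }
    return $merged;
}

// Sample data (assumed): two modules processed in order.
$merged = array();
$merged = merge_transforms($merged, array('font' => 'FontTransform', 'center' => 'CenterTransform'));
$merged = merge_transforms($merged, array('center' => false, 'menu' => 'MenuTransform'));
// $merged now holds the 'font' and 'menu' entries; 'center' was removed.
```

Because modules are processed in order, a module that wants to cancel a
transform has to be loaded after the module that registered it.
*/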
    /**
     * Sets up stuff based on config. We need a better way of doing this.
     */
    function setupConfigStuff($config) {

        $block_wrapper = $config->get('HTML', 'BlockWrapper');
        if (isset($this->info_content_sets['Block'][$block_wrapper])) {
            $this->info_block_wrapper = $block_wrapper;
        } else {
            trigger_error('Cannot use non-block element as block wrapper.',
                E_USER_ERROR);
        }

        $parent = $config->get('HTML', 'Parent');
        $def = $this->manager->getElement($parent, true);
        if ($def) {
            $this->info_parent = $parent;
            $this->info_parent_def = $def;
        } else {
            trigger_error('Cannot use unrecognized element as parent.',
                E_USER_ERROR);
            $this->info_parent_def = $this->manager->getElement($this->info_parent, true);
        }

        // support template text
        $support = "(for information on implementing this, see the ".
                   "support forums) ";

        // setup allowed elements

        $allowed_elements = $config->get('HTML', 'AllowedElements');
        $allowed_attributes = $config->get('HTML', 'AllowedAttributes');

        // %HTML.Allowed only applies when neither component directive is set
        if (!is_array($allowed_elements) && !is_array($allowed_attributes)) {
            $allowed = $config->get('HTML', 'Allowed');
            if (is_string($allowed)) {
                list($allowed_elements, $allowed_attributes) = $this->parseTinyMCEAllowedList($allowed);
            }
        }

        if (is_array($allowed_elements)) {
            foreach ($this->info as $name => $d) {
                if (!isset($allowed_elements[$name])) unset($this->info[$name]);
                unset($allowed_elements[$name]);
            }
            // emit errors: anything still listed was never in the definition
            foreach ($allowed_elements as $element => $d) {
                $element = htmlspecialchars($element);
                trigger_error("Element '$element' is not supported $support", E_USER_WARNING);
            }
        }

        $allowed_attributes_mutable = $allowed_attributes; // by copy!
        if (is_array($allowed_attributes)) {
            foreach ($this->info_global_attr as $attr_key => $info) {
                if (!isset($allowed_attributes["*.$attr_key"])) {
                    unset($this->info_global_attr[$attr_key]);
                } elseif (isset($allowed_attributes_mutable["*.$attr_key"])) {
                    unset($allowed_attributes_mutable["*.$attr_key"]);
                }
            }
            foreach ($this->info as $tag => $info) {
                foreach ($info->attr as $attr => $attr_info) {
                    if (!isset($allowed_attributes["$tag.$attr"]) &&
                        !isset($allowed_attributes["*.$attr"])) {
                        unset($this->info[$tag]->attr[$attr]);
                    } else {
                        if (isset($allowed_attributes_mutable["$tag.$attr"])) {
                            unset($allowed_attributes_mutable["$tag.$attr"]);
                        } elseif (isset($allowed_attributes_mutable["*.$attr"])) {
                            unset($allowed_attributes_mutable["*.$attr"]);
                        }
                    }
                }
            }
            // emit errors: entries left in the copy matched nothing
            foreach ($allowed_attributes_mutable as $elattr => $d) {
                list($element, $attribute) = explode('.', $elattr);
                $element = htmlspecialchars($element);
                $attribute = htmlspecialchars($attribute);
                if ($element == '*') {
                    trigger_error("Global attribute '$attribute' is not ".
                        "supported in any elements $support",
                        E_USER_WARNING);
                } else {
                    trigger_error("Attribute '$attribute' in element '$element' not supported $support",
                        E_USER_WARNING);
                }
            }
        }

    }

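/*
Illustration only, not part of the library: a minimal sketch of the
subtractive whitelist filtering performed in setupConfigStuff(), using plain
arrays in place of the element definition objects. The function name
filter_definition() and the sample data are hypothetical, and the handling of
$info_global_attr is left out to keep the sketch short.

```php
<?php
// Hypothetical stand-in for the definition: element name => attribute lookup.
$info = array(
    'a'   => array('href' => true, 'title' => true),
    'b'   => array('title' => true),
    'img' => array('src' => true, 'alt' => true),
);

// Whitelists in lookup form, in the shape %HTML.AllowedElements and
// %HTML.AllowedAttributes are read as above.
$allowed_elements   = array('a' => true, 'b' => true, 'marquee' => true);
$allowed_attributes = array('a.href' => true, '*.title' => true, 'a.bogus' => true);

function filter_definition(array $info, array $allowed_elements, array $allowed_attributes) {
    // Elements: drop whatever is not whitelisted; whitelist entries that
    // never match an existing element stay behind as "unsupported".
    foreach ($info as $name => $attrs) {
        if (!isset($allowed_elements[$name])) unset($info[$name]);
        unset($allowed_elements[$name]);
    }
    // Attributes: keep "tag.attr" or "*.attr" matches; track unused entries
    // in a separate copy so the lookups above stay intact.
    $unused = $allowed_attributes;
    foreach ($info as $tag => $attrs) {
        foreach ($attrs as $attr => $d) {
            if (!isset($allowed_attributes["$tag.$attr"]) &&
                !isset($allowed_attributes["*.$attr"])) {
                unset($info[$tag][$attr]);
            } elseif (isset($unused["$tag.$attr"])) {
                unset($unused["$tag.$attr"]);
            } elseif (isset($unused["*.$attr"])) {
                unset($unused["*.$attr"]);
            }
        }
    }
    return array($info, array_keys($allowed_elements), array_keys($unused));
}

list($filtered, $bad_elements, $bad_attributes) =
    filter_definition($info, $allowed_elements, $allowed_attributes);
// $bad_elements and $bad_attributes are the entries that would trigger warnings.
```

The sketch shows why the whitelist cannot add anything: an entry that matches
nothing in the existing definition is only reported, never created.
*/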
    /**
     * Parses a TinyMCE-flavored Allowed Elements and Attributes list into
     * separate lists for processing. Format is element[attr1|attr2],element2...
     * @warning Although it's largely drawn from TinyMCE's implementation,
     *      it is different, and you'll probably have to modify your lists
     * @param $list String list to parse
     * @return array($allowed_elements, $allowed_attributes)
     */
    function parseTinyMCEAllowedList($list) {

        $elements = array();
        $attributes = array();

        $chunks = explode(',', $list);
        foreach ($chunks as $chunk) {
            // remove TinyMCE element control characters
            if (!strpos($chunk, '[')) {
                $element = $chunk;
                $attr = false;
            } else {
                list($element, $attr) = explode('[', $chunk);
            }
            if ($element !== '*') $elements[$element] = true;
            if (!$attr) continue;
            $attr = substr($attr, 0, strlen($attr) - 1); // remove trailing ]
            $attr = explode('|', $attr);
            foreach ($attr as $key) {
                $attributes["$element.$key"] = true;
            }
        }

        return array($elements, $attributes);

    }

}
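/*
Illustration only, not part of the library: a standalone sketch of the
parsing done by parseTinyMCEAllowedList(), so the shape of its return value
is easier to see. The free function parse_allowed_list() is a hypothetical
copy of the same logic and the input string is sample data.

```php
<?php
// Hypothetical free-function copy of the parsing logic, for illustration.
function parse_allowed_list($list) {
    $elements = array();
    $attributes = array();
    foreach (explode(',', $list) as $chunk) {
        if (!strpos($chunk, '[')) {
            // no attribute list on this chunk
            $element = $chunk;
            $attr = false;
        } else {
            list($element, $attr) = explode('[', $chunk);
        }
        // "*" only carries global attributes; it is not an element itself
        if ($element !== '*') $elements[$element] = true;
        if (!$attr) continue;
        $attr = substr($attr, 0, strlen($attr) - 1); // remove trailing ]
        foreach (explode('|', $attr) as $key) {
            $attributes["$element.$key"] = true;
        }
    }
    return array($elements, $attributes);
}

list($elements, $attributes) = parse_allowed_list('a[href|title],b,*[lang]');
// $elements is a lookup of element names; $attributes is a lookup keyed by
// "tag.attr" or "*.attr", the same syntax %HTML.AllowedAttributes uses.
```

Nothing here trims whitespace, so a list written as "a, b" would produce an
element named " b" with a leading space; lists should be written without
spaces around the commas.
*/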