From 77b60a4206db5e0d854b47ad4985773381d8ebb1 Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang"
- Warning: This document may be out-of-date. When in doubt,
- consult the source code documentation.
- HTML Purifier currently natively supports only a subset of HTML's
-allowed elements, attributes, and behavior; specifically, this subset
-is the set of elements that are safe for untrusted users to use.
-However, HTML Purifier is often utilized to ensure standards-compliance
-from input that is trusted (making it a sort of Tidy substitute),
-and often users need to define new elements or attributes. The
-advanced API is oriented specifically for these use-cases.
-Our goals are to let the user:
-For basic use, the user will have to specify some basic parameters. This
-is not strictly necessary, as HTML Purifier's default setting will always
-output safe code, but is required for standards-compliant output.
-The first thing to select is the doctype. This
-is essential for standards-compliant output. This identifier is based
-on the name the W3C has given to the document type and not
-the DTD identifier. This parameter is set via the configuration object:
-For historical reasons, the default doctype is XHTML 1.0
-Transitional; however, we really shouldn't be guessing what the user's
-doctype is. Fortunately, people who can't be bothered to set this won't
-be bothered when their pages stop validating.
-HTML Purifier will, by default, allow as many elements and attributes
-as possible. However, a user may decide to roll their own filterset by
-selecting modules, elements and attributes to allow for their own
-specific use-case. This can be done using %HTML.Allowed:
-The directive %HTML.Allowed is a convenience feature
-that may be fully expressed with the legacy interface.
-We currently support another interface from older versions:
-A user may also choose to allow modules using a specialized
-directive:
-But it is not expected that this feature will be widely used.
-Module selection will work slightly differently
-from the other AllowedElements and AllowedAttributes directives by
-directly modifying the doctype you are operating in, in the spirit of
-XHTML 1.1's modularization. We stop users from shooting themselves in the
-foot by mandating that the modules in %HTML.CoreModules be used.
-Modules are distinguished from regular elements by the
-case of their first letter. While XML distinguishes between and allows
-lower and uppercase letters in element names, XHTML uses only lower-case
-element names for the sake of consistency.
-The name of this segment of functionality is inspired by Dave
-Raggett's program HTML Tidy, which purported to help clean up HTML. In
-HTML Purifier, Tidy functionality involves turning unsupported and
-deprecated elements into standards-compliant ones, maintaining
-backwards compatibility, and enforcing best practices.
-This is a complicated feature, and is explained in more depth on
-the Tidy documentation page.
-By reviewing topic posts in the support forum, we determined that
-there were two customization features people primarily wanted:
-to add an attribute to an existing element, and to add an element.
-Thus, we'll want to create convenience functions for these common
-use-cases. Note that the functions described here are only available if
-a raw copy of
-An attribute is bound to an element by a name and has a specific
- Example of the functionality in action:
- The
- An element requires certain information as specified by
- This suggests an API like this:
- Each parameter explained in depth:
- A possible usage:
- See
+ Please see Customize!
+Advanced API
-
-
-
-
-
-
-
Select
-
-Selecting a Doctype
-
-$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
-
-Selecting Elements / Attributes / Modules
-
-$config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote');
-
-$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
-$config->set('HTML', 'AllowedAttributes', 'a.href,a.title');
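The claim that %HTML.Allowed can be fully expressed with the legacy interface can be made concrete with a small sketch. The function expand_allowed() below is hypothetical (it is not part of HTML Purifier) and only handles the simple element[attr|attr] form used in the example above; it does not reproduce HTML Purifier's real parsing rules.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: splits an %HTML.Allowed
// style string into the legacy AllowedElements / AllowedAttributes pair.
// Only the plain "element[attr|attr]" form is handled.
function expand_allowed($allowed)
{
    $elements = array();
    $attributes = array();
    foreach (explode(',', $allowed) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue;
        }
        // "a[href|title]" -> element "a", attributes "href" and "title".
        if (preg_match('/^([^\[]+)\[([^\]]*)\]$/', $chunk, $m)) {
            $element = $m[1];
            foreach (explode('|', $m[2]) as $attr) {
                if ($attr !== '') {
                    $attributes[] = $element . '.' . $attr;
                }
            }
        } else {
            $element = $chunk;
        }
        $elements[] = $element;
    }
    return array(
        'AllowedElements'   => implode(',', $elements),
        'AllowedAttributes' => implode(',', $attributes),
    );
}

$legacy = expand_allowed('a[href|title],em,p,blockquote');
// $legacy['AllowedElements']   is 'a,em,p,blockquote'
// $legacy['AllowedAttributes'] is 'a.href,a.title'
```

The two resulting strings are exactly the values passed to AllowedElements and AllowedAttributes in the legacy example.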
-
-$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists');
-
-Selecting Tidy
-
-Customize
-
-HTMLPurifier_HTMLDefinition
was retrieved.
-Furthermore, caching may prevent your changes from immediately
-being seen: consult enduser-customize.html on how
-to work around this.
-Attributes
-
-AttrDef
that validates it. The interface is therefore:
-function addAttribute($element, $attribute, $attribute_def);
-
-$def->addAttribute('a', 'rel', 'Enum#nofollow');
-
-$attribute_def
value is flexible,
-to make things simpler. It can be a literal object or:
-
-
-
-HTMLPurifier_AttrTypes
- to resolve it for you. Any data that follows a hash mark (#) will
- be used to customize the attribute type: in the example above,
- we specify which values for Enum to allow.
-Elements
-
-HTMLPurifier_ElementDef
. However, not all of it is necessary;
-the usual things required are:
-
-
-function addElement($element, $type, $contents,
- $attr_collections = array(), $attributes = array());
-
-
-
-
-$element
-$type
-$contents
-HTMLPurifier_ElementDef's member variables $content_model and $content_model_type,
- where the form is Type: Model, ex. 'Optional: Inline'.
- There are also a number of predefined templates one may use.
-$attr_collections
-$attributes
$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
- array('color' => 'Color'));
-
-HTMLPurifier/HTMLModule.php
for details.
Advanced API
+
+
 $config = HTMLPurifier_Config::createDefault();
-$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
-$config->set('HTML', 'DefinitionRev', 1);
-$def = $config->getHTMLDefinition(true);
+$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config->set('HTML.DefinitionRev', 1);
+$def = $config->getHTMLDefinition(true);
 Assuming that HTML Purifier has already been properly loaded (hint:
@@ -210,10 +210,10 @@ $def = $config->getHTMLDefinition(true);
 $config = HTMLPurifier_Config::createDefault();
-$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
-$config->set('HTML', 'DefinitionRev', 1);
-$config->set('Cache', 'DefinitionImpl', null); // remove this later!
-$def = $config->getHTMLDefinition(true);
+$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config->set('HTML.DefinitionRev', 1);
+$config->set('Cache.DefinitionImpl', null); // TODO: remove this later!
+$def = $config->getHTMLDefinition(true);
 A few things should be mentioned about the caching mechanism before
@@ -266,10 +266,10 @@ $def = $config->getHTMLDefinition(true);
 $config = HTMLPurifier_Config::createDefault();
-$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
-$config->set('HTML', 'DefinitionRev', 1);
-$config->set('Cache', 'DefinitionImpl', null); // remove this later!
-$def = $config->getHTMLDefinition(true);
+$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config->set('HTML.DefinitionRev', 1);
+$config->set('Cache.DefinitionImpl', null); // remove this later!
+$def = $config->getHTMLDefinition(true);
 $def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
@@ -384,11 +384,11 @@ $def = $config->getHTMLDefinition(true);
 $config = HTMLPurifier_Config::createDefault();
-$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
-$config->set('HTML', 'DefinitionRev', 1);
-$config->set('Cache', 'DefinitionImpl', null); // remove this later!
-$def = $config->getHTMLDefinition(true);
-$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
+$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config->set('HTML.DefinitionRev', 1);
+$config->set('Cache.DefinitionImpl', null); // remove this later!
+$def = $config->getHTMLDefinition(true);
+$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
   array('_blank','_self','_target','_top')
 ));
@@ -731,14 +731,14 @@ $def = $config->getHTMLDefinition(true);
 $config = HTMLPurifier_Config::createDefault();
-$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
-$config->set('HTML', 'DefinitionRev', 1);
-$config->set('Cache', 'DefinitionImpl', null); // remove this later!
-$def = $config->getHTMLDefinition(true);
-$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
+$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
+$config->set('HTML.DefinitionRev', 1);
+$config->set('Cache.DefinitionImpl', null); // remove this later!
+$def = $config->getHTMLDefinition(true);
+$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
   array('_blank','_self','_target','_top')
 ));
-$form = $def->addElement(
+$form = $def->addElement(
   'form',   // name
   'Block',  // content set
   'Flow',   // allowed children
@@ -749,7 +749,7 @@ $def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
     'name' => 'ID'
   )
 );
-$form->excludes = array('form' => true);
+$form->excludes = array('form' => true);
 Each of the parameters corresponds to one of the questions we asked.
diff --git a/docs/enduser-id.html b/docs/enduser-id.html
index 808e2129..53d2da24 100644
--- a/docs/enduser-id.html
+++ b/docs/enduser-id.html
@@ -31,7 +31,7 @@ by default.
 IDs, however, are quite useful functionality to have, so if users start
 complaining about broken anchors you'll probably want to turn them back on
-with %HTML.EnableAttrID. But before you go mucking around with the config
+with %Attr.EnableID. But before you go mucking around with the config
 object, it's probably worth taking some precautions to keep your page
 validating. Why?
@@ -56,8 +56,8 @@ validating. Why?
 deal with the most obvious solution: preventing users from using any IDs that
 appear elsewhere on the document. The method is simple:
-$config->set('HTML', 'EnableAttrID', true);
-$config->set('Attr', 'IDBlacklist' array(
+$config->set('Attr.EnableID', true);
+$config->set('Attr.IDBlacklist', array(
     'list', 'of', 'attribute', 'values', 'that', 'are', 'forbidden'
 ));
@@ -88,8 +88,8 @@ all, they might have simply specified a duplicate ID by accident.
 This method, too, is quite simple: add a prefix to all user IDs. With this code:
-$config->set('HTML', 'EnableAttrID', true);
-$config->set('Attr', 'IDPrefix', 'user_');
+$config->set('Attr.EnableID', true);
+$config->set('Attr.IDPrefix', 'user_');
 ...this:
@@ -109,7 +109,7 @@ user_ to the beginning."
 nothing about multiple HTML Purifier outputs on one page. Thus, we have a
 second configuration value to piggy-back off of: %Attr.IDPrefixLocal:
-$config->set('Attr', 'IDPrefixLocal', 'comment' . $id . '_');
+$config->set('Attr.IDPrefixLocal', 'comment' . $id . '_');
 This new directive does nothing but append onto the regular IDPrefix, but it
 is special in that it is volatile: its value is determined at run time and
@@ -137,7 +137,7 @@ anchors is beyond me.
 To revert to pre-1.2.0 behavior, simply:
-$config->set('HTML', 'EnableAttrID', true);
+$config->set('Attr.EnableID', true);
 Don't come crying to me when your page mysteriously stops validating, though.
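To make the ordering of the two prefixes concrete for the comment example, here is a minimal sketch. effective_id() is hypothetical and only models the concatenation described in the text (IDPrefixLocal appended onto IDPrefix, the result prepended to the user's id); it is not a call into HTML Purifier, and the rule that the local prefix is ignored without a global one is an assumption.

```php
<?php
// Hypothetical illustration, not HTML Purifier's own code: how the final
// id is assembled from %Attr.IDPrefix, %Attr.IDPrefixLocal and the id the
// user typed.
function effective_id($idPrefix, $idPrefixLocal, $userId)
{
    // Assumption: the local prefix only ever extends the global one, so it
    // is ignored when %Attr.IDPrefix is empty.
    if ($idPrefix === '') {
        return $userId;
    }
    return $idPrefix . $idPrefixLocal . $userId;
}

$id = 42; // e.g. the database id of the comment being rendered
$final = effective_id('user_', 'comment' . $id . '_', 'top');
// $final is 'user_comment42_top'
```

Because the local part changes per comment, two comments that both use id="top" end up with distinct ids on the same page.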
diff --git a/docs/enduser-tidy.html b/docs/enduser-tidy.html
index 1721c717..a243f7fc 100644
--- a/docs/enduser-tidy.html
+++ b/docs/enduser-tidy.html
@@ -76,7 +76,7 @@ associated with it, although it may change depending on your doctype.
 change the level of cleaning by setting the %HTML.TidyLevel configuration
 directive:
-$config->set('HTML', 'TidyLevel', 'heavy'); // burn baby burn!
+$config->set('HTML.TidyLevel', 'heavy'); // burn baby burn!
 Is the light level really light?
@@ -165,17 +165,17 @@ smoketest.
 so happy about the br@clear implementation. That's perfectly fine!
 HTML Purifier will make accommodations:
-$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
-$config->set('HTML', 'TidyLevel', 'heavy'); // all changes, minus...
-$config->set('HTML', 'TidyRemove', 'br@clear');
+$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
+$config->set('HTML.TidyLevel', 'heavy'); // all changes, minus...
+$config->set('HTML.TidyRemove', 'br@clear');
 That third line does the magic, removing the br@clear fix from the module,
 ensuring that
-<br clear="both" />
 will pass through unharmed. The reverse is possible too:
-$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
-$config->set('HTML', 'TidyLevel', 'none'); // no changes, plus...
-$config->set('HTML', 'TidyAdd', 'p@align');
+$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
+$config->set('HTML.TidyLevel', 'none'); // no changes, plus...
+$config->set('HTML.TidyAdd', 'p@align');
 In this case, all transformations are shut off, except for the p@align one,
 which you found handy.
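The two TidyRemove/TidyAdd examples amount to simple set arithmetic on the fixes enabled by the chosen level. The sketch below models that idea only; active_fixes() is hypothetical, the per-level fix list is a placeholder (the real lists live in HTML Purifier's Tidy modules), and applying TidyRemove last, so that a fix named in both lists ends up disabled, is an assumption about precedence.

```php
<?php
// Hypothetical model, not HTML Purifier's implementation: active fixes are
// the fixes enabled by %HTML.TidyLevel, plus %HTML.TidyAdd, minus
// %HTML.TidyRemove (removal applied last).
function active_fixes(array $levelFixes, array $add, array $remove)
{
    $active = array_unique(array_merge($levelFixes, $add));
    // array_values() reindexes so the result is a plain list.
    return array_values(array_diff($active, $remove));
}

// First example: heavy level, minus br@clear.
$heavy = array('br@clear', 'p@align'); // placeholder subset, not the real list
$a = active_fixes($heavy, array(), array('br@clear'));
// $a is array('p@align')

// Second example: no fixes at all, plus p@align.
$b = active_fixes(array(), array('p@align'), array());
// $b is array('p@align')
```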
diff --git a/docs/enduser-youtube.html b/docs/enduser-youtube.html
index aaf8bdbc..87a36b9a 100644
--- a/docs/enduser-youtube.html
+++ b/docs/enduser-youtube.html
@@ -75,7 +75,7 @@ passes through HTML Purifier unharmed.
 And the corresponding usage:
 <?php
-    $config->set('Filter', 'YouTube', true);
+    $config->set('Filter.YouTube', true);
 ?>
 There is a bit going on in the two code snippets, so let's explain.
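Every hunk in this patch applies the same mechanical change: a three-argument call set('Namespace', 'Directive', $value) becomes the two-argument set('Namespace.Directive', $value). A rough sketch of automating that rewrite for existing user code follows. migrate_config_calls() is a hypothetical helper, not something shipped with HTML Purifier; it only recognises single-quoted names and will also rewrite any unrelated ->set() call of the same shape, so its output should be reviewed by hand.

```php
<?php
// Hypothetical migration helper (not part of HTML Purifier): rewrites
//     $config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
// into
//     $config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
// Calls that already use a dotted name are left alone, because the first
// capture group cannot contain a dot.
function migrate_config_calls($source)
{
    return preg_replace(
        '/->set\(\s*\'([A-Za-z0-9]+)\'\s*,\s*\'([A-Za-z0-9]+)\'\s*,/',
        '->set(\'$1.$2\',',
        $source
    );
}

$before = "\$config->set('HTML', 'TidyLevel', 'heavy');";
$after = migrate_config_calls($before);
// $after is "$config->set('HTML.TidyLevel', 'heavy');"
```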
-- 2.11.4.GIT