From f922dbcdd393d98c17501f4ba163a7d8d4e91e60 Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang"
Date: Sun, 25 Mar 2007 01:38:20 +0000
Subject: [PATCH] Add automatic leading paragraph marking, remove manual class="lead" declarations.

git-svn-id: http://htmlpurifier.org/svnroot@890 48356398-32a2-884e-a903-53898d9a118a
---
 comparison.xhtml | 26 ++++++++--------
 index.xhtml | 36 +++++++++++-----------
 .../XHTMLCompiler/DOMFilter/MarkLeadParagraphs.php | 27 ++++++++++++++++
 xhtml-compiler/config.filters.php | 1 +
 .../smoketests/DOMFilter/MarkLeadParagraphs.xhtml | 23 ++++++++++++++
 5 files changed, 82 insertions(+), 31 deletions(-)
 create mode 100644 xhtml-compiler/XHTMLCompiler/DOMFilter/MarkLeadParagraphs.php
 create mode 100644 xhtml-compiler/smoketests/DOMFilter/MarkLeadParagraphs.xhtml

diff --git a/comparison.xhtml b/comparison.xhtml
index 3d32e42..fae010f 100644
--- a/comparison.xhtml
+++ b/comparison.xhtml
@@ -18,7 +18,7 @@
-

With the advent of +

With the advent of Web 2.0, the end user has gone from passive consumer to active producer of content on the World Wide Web. Wikis,
@@ -62,7 +62,7 @@ disclaimer:

Summary

-

A table summarizing the differences for the impatient.

+

A table summarizing the differences for the impatient.

@@ -174,7 +174,7 @@ disclaimer:

-

HTML Tidy is omitted from this list because it is not an HTML +

HTML Tidy is omitted from this list because it is not an HTML filter.

Look Ma, No HTML!

@@ -187,7 +187,7 @@ filter.

— Albert Einstein
-

Before we jump into the weird and not-so-wonderful world +

Before we jump into the weird and not-so-wonderful world of HTML filters, we must first consider another domain: non-HTML markup libraries. While libraries of this type really shouldn't be considered HTML filters,
@@ -254,7 +254,7 @@ security.

Simplicity

-

HTML +

HTML source code is often criticized for being difficult to read. For example, compare:

@@ -328,7 +328,7 @@ their table markup is extraordinarily complex).

Security

-

BBCode can be boiled down to a wanna-be version of +

BBCode can be boiled down to a wanna-be version of HTML. I mean, replacing the angled brackets with square brackets and omitting the occasional parameter name? How much more un-original can you get? Somehow, I don't think BBCode
@@ -369,7 +369,7 @@ using these languages?

HTML Tidy

-

Dave Raggett's +

Dave Raggett's HTML Tidy is a program; neat enough, at least, to make it into PHP as a PECL extension.
@@ -427,7 +427,7 @@ in the standards-compliance department though.

Validates attributes No -

The PHP function +

The PHP function striptags() is the classic solution for attempting to clean up HTML. It
@@ -444,7 +444,7 @@ used.

PHP Input Filter

-

Though its title may not imply it, +

Though its title may not imply it, PHP Input Filter is a souped up version of striptags() with the ability to inspect attributes. (Don't mind the hastily tacked on query escaping function).

@@ -484,7 +484,7 @@ spaces stripped out of them. Stay away, stay away!

HTML_Safe/SafeHTML

-

HTML_Safe is +

HTML_Safe is PEAR's HTML filtering library. It should be noted that this is the same library as SafeHTML, though with different
@@ -531,7 +531,7 @@ didn't blacklist.

kses

-

kses appears to +

kses appears to be the de-facto solution for cleaning HTML, having found its way into applications such as WordPress and being the number one search result for php html filter.

@@ -594,7 +594,7 @@ the whitelist to filter HTML.

Safe HTML Checker

-

+

Safe HTML Checker is (to my knowledge) the first attempt to make a filter that also outputs standards-compliant XHTML. It wasn't even released or
@@ -649,7 +649,7 @@ matched up the tags for them.

Standards safe Yes -

That table should say it all, but I'll add a few more features:

+

That table should say it all, but I'll add a few more features:

diff --git a/index.xhtml b/index.xhtml
index edddfdb..cb0d474 100644
--- a/index.xhtml
+++ b/index.xhtml
@@ -42,7 +42,7 @@ Download HTML Purifier
-

HTML Purifier is a standards-compliant +

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited,
@@ -67,7 +67,7 @@ application you're building? HTML Purifier is for you!

Background

-

There are a number of open-source HTML filtering solutions out +

There are a number of open-source HTML filtering solutions out there on the web already (i.e. PEAR's HTML_Safe,
@@ -108,7 +108,7 @@ HTML Purifier's comprehensive algorithms are complemented by a breadth of knowledge, ensuring that richly formatted documents pass through unstripped.

-Compare HTML Purifier with other filters +

Compare HTML Purifier with other filters

To my knowledge, there is nothing else in the wild that offers protection from XSS, standards-compliance, and the
@@ -142,7 +142,7 @@ out there.

Sat, 24 March 2007 20:27:42 EDT
-

At the prompting of Lars Olesen, HTML Purifier now +

At the prompting of Lars Olesen, HTML Purifier now has its very own PEAR channel. This means that installing HTML Purifier is as simple as:

pear channel-discover hp.jpsband.org
@@ -155,7 +155,7 @@ pear install hp/HTMLPurifier
Fri, 23 March 2007 22:42:12 EDT
-

The 1.5.0 major bugfix +

The 1.5.0 major bugfix release is available today. There have been some major internal refactoring efforts, but these changes are invisible to you.

@@ -193,7 +193,7 @@ pear install hp/HTMLPurifier
XML extension was loaded now fixed.
  • Youtube filter regexp now multiline.
  • -

    ...as well as an assortment of some code refactoring (all +

    ...as well as an assortment of some code refactoring (all bugfixes are covered above). See News for a complete changelog.

    @@ -206,7 +206,7 @@ pear install hp/HTMLPurifier
    Sat, 17 March 2007 5:42:12 EDT
    -

    We have a shiny new RSS feed +

    We have a shiny new RSS feed at news.rss, which is hooked up to this news feed. Subscribe for release notifications as well as random news about HTML Purifier.

    @@ -218,7 +218,7 @@ pear install hp/HTMLPurifier
    Wed, 14 March 2007 5:31:46 EDT
    -

    Quick update on the status of version 1.5. The flagship +

    Quick update on the status of version 1.5. The flagship new feature of this release is to be an advanced system for selecting and creating elements and attributes. You can view the projected
    @@ -237,7 +237,7 @@ pear install hp/HTMLPurifier

    -

    Here's a tutorial on Here's a tutorial on HTML Purifier and UTF-8 character encoding issues. It discusses how to figure out your character encoding, why you should
    @@ -250,7 +250,7 @@ pear install hp/HTMLPurifier

    Plugins

    -

    HTML Purifier is a great library to integrate with existing +

    HTML Purifier is a great library to integrate with existing CMSes and other applications or WYSIWYG editors. Currently, we have plugins for:

    @@ -270,12 +270,12 @@ for:

    -

    Plugins for other major applications gladly accepted!

    +

    Plugins for other major applications gladly accepted!

    Demo

    -

    Enter your HTML and see how it will be filtered!

    +

    Enter your HTML and see how it will be filtered!

    HTML Purifier Input
    @@ -287,7 +287,7 @@ for:

    -

    ...or try these sample inputs:

    +

    ...or try these sample inputs:

    • Malicious code removed
    • @@ -300,7 +300,7 @@ for:

      Download

      -

      The current version is +

      The current version is 1.5.0. Pick your distribution:

      -

      The PHP5-strict version is exactly the same +

      The PHP5-strict version is exactly the same as the regular version with a few tweaks to prevent it from complaining with E_STRICT
      @@ -354,7 +354,7 @@ here (0x869C48DA). My key's fingerprint is:

      gpg --verify $filename.sig
      -

      You can be notified of new releases by a low-traffic announce list. Subscribe +

      You can be notified of new releases by a low-traffic announce list. Subscribe here:

      @@ -399,7 +399,7 @@ here:

      Spread the Word!

      -

      Help spread awareness about HTML Purifier by:

      +

      Help spread awareness about HTML Purifier by:

      • Contact -

        You can send me an email at +

        You can send me an email at htmlpurifier@jpsband.org. However, I prefer that you use the forums for asking general support questions (response time will be the same, I promise!)
diff --git a/xhtml-compiler/XHTMLCompiler/DOMFilter/MarkLeadParagraphs.php b/xhtml-compiler/XHTMLCompiler/DOMFilter/MarkLeadParagraphs.php
new file mode 100644
index 0000000..96c9e62
--- /dev/null
+++ b/xhtml-compiler/XHTMLCompiler/DOMFilter/MarkLeadParagraphs.php
@@ -0,0 +1,27 @@
+className = $class;
+    }
+
+    public function process(DOMDocument $dom, $page) {
+        $nodes = $this->query("//html:p[local-name(preceding-sibling::*[1])!='p']");
+        foreach ($nodes as $node) {
+            $node->setAttribute('class', $this->className);
+        }
+
+    }
+
+}
+
+?>
\ No newline at end of file
diff --git a/xhtml-compiler/config.filters.php b/xhtml-compiler/config.filters.php
index 8fdac3f..30d21b7 100644
--- a/xhtml-compiler/config.filters.php
+++ b/xhtml-compiler/config.filters.php
@@ -8,5 +8,6 @@ $filters->addDOMFilter('Quoter');
 $filters->addDOMFilter('RSSGenerator');
 $filters->addDOMFilter('AbsolutePath');
 $filters->addDOMFilter('IEConditionalComments');
+$filters->addDOMFilter('MarkLeadParagraphs');
 
 ?>
\ No newline at end of file
diff --git a/xhtml-compiler/smoketests/DOMFilter/MarkLeadParagraphs.xhtml b/xhtml-compiler/smoketests/DOMFilter/MarkLeadParagraphs.xhtml
new file mode 100644
index 0000000..758d3ce
--- /dev/null
+++ b/xhtml-compiler/smoketests/DOMFilter/MarkLeadParagraphs.xhtml
@@ -0,0 +1,23 @@
+
+
+
+
+    DOMFilter/MarkLeadParagraphs.html
+
+
+

        DOMFilter/MarkLeadParagraphs.html

        +

        This is a lead paragraph.

        +

        This is not.

        +
          +
        • Cool a list!
        • +
        +

        This is a lead paragraph.

        +
        +

        Yup, still a lead paragraph.

        +
        +

        Expect: All lead-paragraphs are in bold (including this one).

+
+
\ No newline at end of file
-- 
2.11.4.GIT
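The new filter's XPath expression selects every `p` whose nearest preceding element sibling is not itself a `p`, so the first paragraph after a heading, a list, or the start of a container receives the lead class. A minimal standalone sketch of that rule follows, assuming plain PHP `DOMDocument` and `DOMXPath` in place of the XHTMLCompiler `query()` helper (whose implementation is not part of this patch). `markLeadParagraphs()` is a hypothetical function, and the sample markup only approximates the smoke test, whose tags were lost above.

```php
<?php
// Sketch only: reproduces the MarkLeadParagraphs XPath rule with plain
// DOMDocument/DOMXPath. markLeadParagraphs() is a hypothetical helper,
// not part of XHTMLCompiler.

function markLeadParagraphs(DOMDocument $dom, string $className = 'lead'): int
{
    $xpath = new DOMXPath($dom);
    // XHTML elements live in a namespace, so bind it to the "html" prefix
    // used by the filter's query.
    $xpath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');

    // A <p> is a lead paragraph when its nearest preceding element sibling
    // is not a <p>. With no preceding sibling, local-name() of the empty
    // node-set is '', so a paragraph that is the first child also matches.
    $nodes = $xpath->query("//html:p[local-name(preceding-sibling::*[1])!='p']");

    foreach ($nodes as $node) {
        // setAttribute() replaces any class value already on the element.
        $node->setAttribute('class', $className);
    }
    return $nodes->length;
}

// Usage: a small document shaped roughly like the smoke test.
$xml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Title</h1>
<p>This is a lead paragraph.</p>
<p>This is not.</p>
<ul><li>Cool a list!</li></ul>
<p>This is a lead paragraph.</p>
<div><p>Yup, still a lead paragraph.</p></div>
</body></html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$count = markLeadParagraphs($dom);
echo $count, "\n"; // 3
echo $dom->saveXML($dom->documentElement), "\n";
```

Because `setAttribute()` overwrites the whole attribute, a paragraph that already carries another class loses it under this rule; appending to the existing class list would be a separate design choice.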