Dramatically rewrite null host URI handling.
commit e76f4b45d0c906097a5f8019651386c79c7a402a
author Edward Z. Yang <ezyang@mit.edu>
Tue, 25 Jan 2011 18:56:46 +0000
committer Edward Z. Yang <ezyang@mit.edu>
Tue, 25 Jan 2011 18:56:46 +0000
tree 817ebb962385f32efaab6a3eb3c611ea88198963
parent a32d5b52e1483e47ecc17bcfaa691a39756c82d7

Basically, browsers don't parse what should be valid URIs correctly, so
we have to bend over backwards to accommodate them.  Specifically,
for browsable URIs, the following URIs have unintended behavior:

    - ///example.com
    - http:/example.com
    - http:///example.com
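
To illustrate what "should be valid" means here, the sketch below runs
the three URIs through Python's urllib.parse.urlsplit, which follows the
generic URI grammar.  It is not HTML Purifier code; the helper name
split_host_path is hypothetical.  In each case the host is empty and
"example.com" belongs to the path, which is not how browsers are
reported to treat them.

```python
from urllib.parse import urlsplit

def split_host_path(uri):
    """Return (scheme, host, path) as read by the generic URI grammar."""
    parts = urlsplit(uri)
    return (parts.scheme, parts.netloc, parts.path)

# Each URI from the list above: the host is empty, the path is /example.com.
for uri in ["///example.com", "http:/example.com", "http:///example.com"]:
    print(uri, "->", split_host_path(uri))
```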

Furthermore, if the path begins with //, these URIs must be modified
with care: if you remove the host component, the parse tree changes,
because the leading // of the path is then read as the start of a host.
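
A minimal sketch of that hazard, again using Python's urllib.parse
rather than HTML Purifier itself.  The helper drop_host is hypothetical
and deliberately naive: it reassembles the URI from scheme and path only
(query and fragment are ignored for brevity).

```python
from urllib.parse import urlsplit

def drop_host(uri):
    """Naively rebuild a URI without its host: scheme + ':' + path."""
    parts = urlsplit(uri)
    prefix = parts.scheme + ":" if parts.scheme else ""
    return prefix + parts.path

original = "http://example.com//foo"
stripped = drop_host(original)

# Before: host is example.com and the path is //foo.
print(urlsplit(original).netloc, urlsplit(original).path)
# After: the old path prefix is now parsed as the host.
print(stripped, urlsplit(stripped).netloc, repr(urlsplit(stripped).path))
```

The segment "foo" started out as part of the path and ends up as the
host, so any rewrite that drops the host has to guard against a path
that begins with //.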

I've modified the engine to follow correct URI semantics as much
as possible while outputting browser-compatible code, and to invalidate
the URI in cases we can't handle.  URIScheme has been refactored
so that this important check is always performed, introducing a new
member variable allow_empty_host, which is true for the data, file,
mailto and news schemes.
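
The idea behind the check can be sketched as follows.  This is an
illustration in Python, not the actual URIScheme code: the table
ALLOW_EMPTY_HOST and the function host_check are hypothetical, and the
False entries for http, ftp and nntp are an assumption based only on
those schemes not being listed above.

```python
from urllib.parse import urlsplit

ALLOW_EMPTY_HOST = {
    # Named in the commit message as allowing an empty host.
    "data": True, "file": True, "mailto": True, "news": True,
    # Assumption: the other schemes touched by this commit require a host.
    "http": False, "ftp": False, "nntp": False,
}

def host_check(uri):
    """Return False when the URI should be invalidated."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOW_EMPTY_HOST:
        return False  # unknown scheme: reject in this sketch
    if parts.netloc == "" and not ALLOW_EMPTY_HOST[parts.scheme]:
        return False  # empty host on a scheme that needs one
    return True

print(host_check("http:///example.com"))      # empty host on http
print(host_check("mailto:user@example.com"))  # mailto has no host by design
```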

This also fixes bypass bugs on URI.Munge.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
15 files changed:
NEWS
library/HTMLPurifier/AttrDef/URI/Host.php
library/HTMLPurifier/URI.php
library/HTMLPurifier/URIScheme.php
library/HTMLPurifier/URIScheme/data.php
library/HTMLPurifier/URIScheme/file.php
library/HTMLPurifier/URIScheme/ftp.php
library/HTMLPurifier/URIScheme/http.php
library/HTMLPurifier/URIScheme/mailto.php
library/HTMLPurifier/URIScheme/news.php
library/HTMLPurifier/URIScheme/nntp.php
tests/HTMLPurifier/AttrDef/URITest.php
tests/HTMLPurifier/HTMLT/munge.htmlt [new file with mode: 0644]
tests/HTMLPurifier/URISchemeTest.php
tests/HTMLPurifier/URITest.php