TODO List

= KEY ====================
    # Flagship
    - Regular
    ? Maybe I'll Do It
==========================

1.7 release [Advanced API]
 # Complete advanced API, and fully document it
 # Implement all edge-case attribute transforms
 # Implement all deprecated tags and attributes
 - Parse TinyMCE-style whitelist into our %HTML.Allow* whitelists (possibly
   do this earlier)
 ? HTML interface for tweaking configuration to see changes

1.8 release [Refactor, refactor!]
 # Tighten URI validation routines (see docs/dev-code-quality.html) (COMPLEX)
 # Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
 - Configuration profiles: predefined sets of directives enabled with one
   function call
 - Implement IDREF support (harder than it seems, since you cannot have
   IDREFs to non-existent IDs)
 - Allow non-ASCII characters in font names

1.9 release [Error'ed]
 # Error logging for filtering/cleanup procedures
    - Requires I18N facilities to be created first (COMPLEX)
 - XSS-attempt detection
 - More fine-grained control over escaping behavior
    - Silently drop content in between SCRIPT tags (can be generalized to allow
      specification of elements that, when detected as foreign, trigger removal
      of children, although unbalanced tags could wreak havoc (or at least
      delete the rest of the document)).

1.10 release [Do What I Mean, Not What I Say]
 # Additional support for poorly written HTML
    - Microsoft Word HTML cleaning (e.g. MsoNormal, but research essential!)
    - Friendly strict handling of <address> (block -> <br>)
 - Remove redundant tags, e.g. <u><u>Underlined</u></u>. Implementation notes:
    1. Analyze which tags are duplicates and can be removed
    2. Ensure attributes are merged into the parent tag
    3. Extend the tag exclusion system to specify whether the contents
       should be dropped (currently, there's code that could do something
       like this if it didn't drop the inner text too.)
 - Remove <span> tags that don't do anything (no attributes)
 - Remove empty inline tags, e.g. <i></i>
 - Append something to duplicate IDs so they're still usable (impl. note: the
   dupe detector would also need to detect the suffix)
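
The duplicate-ID item above could be approached roughly as follows. This is
only an illustrative sketch in Python, not HTML Purifier code: the function
name dedupe_ids, the "-" separator, and the numeric suffix format are all
assumptions made for the example. It shows why the detector has to know about
the suffix: a generated ID such as "a-2" must not collide with an ID the
author wrote literally.

```python
def dedupe_ids(ids, sep="-"):
    """Rename later duplicates by appending sep plus a counter.

    Hypothetical helper for illustration; the suffix format is an assumption.
    """
    seen = set()
    result = []
    for original in ids:
        candidate = original
        counter = 1
        # Bump the suffix until the candidate is unused. Checking against
        # `seen` covers both earlier generated IDs and author-supplied IDs
        # that already look like "name-2".
        while candidate in seen:
            counter += 1
            candidate = f"{original}{sep}{counter}"
        seen.add(candidate)
        result.append(candidate)
    return result
```

Every emitted ID, generated or not, goes into the same set, so a literal
"a-2" arriving after a generated "a-2" is itself renamed instead of clashing.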

2.0 release [Beyond HTML]
 # Legit token-based CSS parsing (will require revamping almost every
   AttrDef class)
 # Formatters for plaintext (COMPLEX)
    - Auto-paragraphing (be sure to leverage the fact that we know when things
      shouldn't be paragraphed, such as lists and tables).
    - Linkify URLs
    - Smileys
    - Linkification for HTML Purifier docs: notably configuration and classes
 - Allow tags to be "armored", an internal flag that protects them
   from validation and passes them out unharmed
 - Fixes for Firefox's inability to handle COL alignment props (Bug 915)
 - Automatically add non-breaking spaces to empty table cells when
   empty-cells:show is applied, for compatibility with Internet Explorer
 - Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
   Also, allow directionality to be disabled
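
For the auto-paragraphing item in the 2.0 list, one possible shape of the
idea is sketched below in Python. It is a rough illustration under stated
assumptions, not the planned implementation: the name auto_paragraph and the
small set of "do not paragraph" elements are made up for the example. Chunks
separated by blank lines are wrapped in <p>, except chunks that already start
with a block element such as a list or table.

```python
import re

# Assumed, deliberately short list of elements that must not be wrapped in <p>.
BLOCK_START = re.compile(r"<(ul|ol|table|pre|blockquote)\b", re.IGNORECASE)


def auto_paragraph(text):
    """Wrap blank-line-separated chunks in <p>, leaving block elements alone."""
    chunks = re.split(r"\n\s*\n", text.strip())
    out = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue  # nothing to emit for empty input
        if BLOCK_START.match(chunk):
            out.append(chunk)  # lists, tables, etc. pass through unchanged
        else:
            out.append(f"<p>{chunk}</p>")
    return "\n".join(out)
```

A real formatter would work on the token stream instead of regular
expressions; the sketch only shows the "know when not to paragraph" rule.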

3.0 release [To XML and Beyond]
 - Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
    - Hooks for adding custom processors to custom namespaced tags and
      attributes, offer default implementation
    - Lots of documentation and samples
 - XHTML 1.1 support

Ongoing
 - Lots of profiling, make it faster!
 - Plugins for major CMSes (COMPLEX)
    - WordPress (mostly written, needs beta-testing)
    - eFiction
    - more! (look for ones that use WYSIWYGs)

Unknown release (on a scratch-an-itch basis)
 ? Semi-lossy dumb alternate character encoding transformations, achieved by
   encoding all characters that have string entity equivalents
 ? Have 'lang' attribute be checked against official lists
 - Explain how to use HTML Purifier in non-PHP languages

Requested
 ? Native content compression, whitespace stripping (don't rely on Tidy, make
   sure we don't remove from <pre> or related tags)

Wontfix
 - Non-lossy smart alternate character encoding transformations (unless
   patch provided)
 - Pretty-printing HTML; users can run Tidy on the output for the entire page