docs/ref-content-models.txt


Handling Content Model Changes


1. Context

The distinction between Transitional and Strict document types is
somewhat of an anomaly in the lineage of XHTML document types (after
1.0, doctypes no longer have flavors: instead, modularization lets
document authors vary their elements).  Moving from Transitional to
Strict is usually straightforward, as W3C generally deprecates
attributes or elements, which are easily handled using tag and
attribute transforms.

However, for three elements, <blockquote>, <body> and <address>, W3C
elected to also change the content model.  <blockquote> and <body>
originally accepted both inline and block elements, but in the strict
doctype they only allow block elements.  With <address>, the situation
is inverted: it originally accepted <p> alongside inline content, but
<p> tags are now forbidden from appearing within this tag.


2. Current situation

Currently, HTML Purifier treats <blockquote> specially during Tidy mode
using a custom ChildDef class, StrictBlockquote.  StrictBlockquote
operates similarly to Required, except that when it encounters an inline
element, it wraps it in a block tag (as specified by %HTML.BlockWrapper;
the default is <p>).  The name suggests it can only be used for
<blockquote>s, although it may be possible to generalize it to other
cases of this nature (this would be of little practical use, as no
other element in XHTML 1.1 or earlier has a block-only content model).
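
The wrapping behavior described above can be sketched roughly as
follows.  This is an illustration in Python, not HTML Purifier's actual
code: the Element class, the BLOCK_ELEMENTS set and the
wrap_inline_children function are hypothetical names, and grouping
adjacent inline nodes into a single wrapper is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical, abbreviated set of block-level element names.
BLOCK_ELEMENTS = {"p", "div", "blockquote", "ul", "ol", "pre", "table"}


@dataclass
class Element:
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Element, str]  # a str stands for a text (PCDATA) node


def is_block(node: Node) -> bool:
    return isinstance(node, Element) and node.name in BLOCK_ELEMENTS


def wrap_inline_children(children: List[Node],
                         block_wrapper: str = "p") -> List[Node]:
    """Return children with each run of inline content wrapped in a block."""
    result: List[Node] = []
    run: List[Node] = []  # inline nodes collected since the last block

    def flush() -> None:
        # Assumption: whitespace-only runs are passed through unwrapped.
        if any(not (isinstance(n, str) and n.strip() == "") for n in run):
            result.append(Element(block_wrapper, list(run)))
        else:
            result.extend(run)
        run.clear()

    for child in children:
        if is_block(child):
            flush()               # close any pending inline run first
            result.append(child)  # block children are kept as they are
        else:
            run.append(child)
    flush()
    return result
```

Here block_wrapper plays the role of %HTML.BlockWrapper; passing "div"
instead of the default would wrap inline runs in <div>.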

Tidy currently contains no custom, lenient implementation for <address>.
If one were written, it would likely operate on the principle that, when
a <p> tag is encountered, it is replaced with a leading and trailing
<br /> tag (the contents of <p>, being inline, are not an issue).  There
is no prior work with this sort of operation.
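
A minimal sketch of that principle, in Python and under the same caveat
that nothing like it exists in HTML Purifier today: Element and
unwrap_paragraphs are hypothetical names, and only direct <p> children
of <address> are considered.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Element, str]  # a str stands for a text (PCDATA) node


def unwrap_paragraphs(children: List[Node]) -> List[Node]:
    """Replace each <p> child with <br />, its contents, then <br />."""
    result: List[Node] = []
    for child in children:
        if isinstance(child, Element) and child.name == "p":
            result.append(Element("br"))   # leading break
            result.extend(child.children)  # inline contents are kept
            result.append(Element("br"))   # trailing break
        else:
            result.append(child)
    return result
```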


3. Outside applicability

There are a number of other elements with restrictive content models,
such as <ul> or <span> (the latter is restrictive in that it does not
allow block elements).  In the former case, an errant node is eliminated
completely; in the latter case, the text of the node is preserved (as
the parent node does allow PCDATA).  Custom content model
implementations are probably not the best way of handling these cases;
node bubbling should be implemented instead.
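
The two existing behaviors (dropping an errant node versus keeping its
text) might be modeled as below.  This Python sketch only illustrates
the distinction and does not implement node bubbling; Element, text_of
and filter_children are hypothetical names, and the allowed-name sets
are supplied by the caller rather than taken from any real doctype.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Element:
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Element, str]  # a str stands for a text (PCDATA) node


def text_of(node: Node) -> str:
    """Concatenate all text found inside a node."""
    if isinstance(node, str):
        return node
    return "".join(text_of(c) for c in node.children)


def filter_children(children: List[Node], allowed: Set[str],
                    allows_pcdata: bool) -> List[Node]:
    result: List[Node] = []
    for child in children:
        if isinstance(child, str):
            if allows_pcdata:
                result.append(child)       # text is legal here
            # otherwise stray text is dropped (e.g. inside <ul>)
        elif child.name in allowed:
            result.append(child)           # legal child, kept as is
        elif allows_pcdata:
            result.append(text_of(child))  # errant node: keep its text
        # otherwise the errant node is eliminated completely
    return result
```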

    vim: et sw=4 sts=4