<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xi="http://www.w3.org/2001/XInclude"
xml:lang="en">
<head>
<title>Shift_JIS Full Disclosure - Security - HTML Purifier</title>
<xi:include href="common-meta.xml" xpointer="xpointer(/*/node())" />
<meta name="description" content="Full disclosure security page detailing the Shift_JIS CSS backslash attack." />
<meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter, filtering, standards, compliant, 3.1.1, attack, full disclosure, xss, security, shift_jis, backslash, css" />
</head>
<body>
<xi:include href="common-header.xml" xpointer="xpointer(/*/node())" />
<div id="main">
<h1 id="title">Shift_JIS Full Disclosure</h1>
<div id="content">
<p>
A difference between the behavior of iconv (the utility HTML Purifier
uses to transform character encodings) and that of browsers allowed an
attacker to use the Yen character (<code>5C</code> in Shift_JIS) to trick
HTML Purifier into outputting a byte sequence most browsers would
interpret as a backslash. This could then be used to execute arbitrary
JavaScript from <abbr>CSS</abbr>.
</p>
<p>
This vulnerability was reported privately to the vendor by
<a href="http://d.hatena.ne.jp/teracc/">Takeshi Terada</a>.
No active exploits are currently known.
</p>
<h2 id="Fix">Fix</h2>
<p>
This vulnerability was fixed in HTML Purifier 3.1.1 and 2.1.5.
</p>
<h2 id="Details">Details</h2>
<p>
The large majority of character sets in the world are equivalent
to US-ASCII in the 7-bit domain. Shift_JIS and Johab are
notable exceptions, redefining the two bytes <code>5C</code>
and <code>7E</code> to be different characters. In Shift_JIS:
</p>
<table>
<thead>
<tr>
<th>Byte</th>
<th>ASCII</th>
<th>Shift_JIS</th>
</tr>
</thead>
<tbody>
<tr>
<th>5C</th>
<td>\</td>
<td>¥</td>
</tr>
<tr>
<th>7E</th>
<td>~</td>
<td>‾</td>
</tr>
</tbody>
</table>
<p>
This is quite exceptional, and puts users of Shift_JIS in a hard
place because they have no way of expressing the backslash or
tilde legitimately. Consequently, browsers treat the byte
<code>5C</code> as equivalent to a backslash, even if it renders as a Yen.
</p>
<p>
Iconv, on the other hand, transforms the <code>5C</code> byte
to Unicode U+00A5 (in UTF-8, this is <code>C2 A5</code>), the
proper character for Yen. For a filter this is the wrong behavior, and
it leads to the security vulnerability: HTML Purifier thinks that the
backslash is actually a Yen, and does not take the appropriate security
measures. Then, when the Yen is converted back to <code>5C</code>,
it regains backslash behavior in the browser and can be used to break
out of a quoted CSS string. Furthermore, buggy behavior
will be observed if a genuine backslash is somehow introduced into the
HTML during processing, as iconv does not know how to convert
a backslash in UTF-8 back to a backslash in Shift_JIS (hint: it's
impossible without changing the font).
</p>
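
<p>
The round trip can be illustrated with a small Python model. This is only a
sketch: <code>iconv_like_decode</code>, <code>iconv_like_encode</code> and
<code>naive_filter_accepts</code> are hypothetical stand-ins written for this
page, not iconv or HTML Purifier code, and the sample CSS value is invented.
</p>

```python
# Hypothetical model of the round trip; not iconv or HTML Purifier code.
YEN = "\u00a5"        # U+00A5 YEN SIGN
BACKSLASH = "\\"      # U+005C REVERSE SOLIDUS


def iconv_like_decode(data: bytes) -> str:
    """Model the Shift_JIS to UTF-8 step for single-byte input: 5C becomes U+00A5."""
    return "".join(YEN if b == 0x5C else chr(b) for b in data)


def iconv_like_encode(text: str) -> bytes:
    """Model the reverse step: U+00A5 is written back as the byte 5C."""
    return bytes(0x5C if ch == YEN else ord(ch) for ch in text)


def naive_filter_accepts(text: str) -> bool:
    """Stand-in security check that only rejects a literal backslash."""
    return BACKSLASH not in text


# Invented input: the byte 5C sits right before the closing quote.
attack = b"font-family: 'a\\'"

decoded = iconv_like_decode(attack)        # the filter sees a Yen, not a backslash
accepted = naive_filter_accepts(decoded)   # so the check passes
output = iconv_like_encode(decoded)        # and 5C reappears in the output bytes
```

<p>
In this model the check passes because the decoded text contains no backslash,
yet the re-encoded output is byte-for-byte the attacker's input, including the
<code>5C</code> that a browser reads as an escape.
</p>
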
<p>
The fix involves undoing the unnecessary transformation that iconv
performs. HTML Purifier generalizes the fix to all character
encodings with
<code>HTMLPurifier_Encoder-&gt;testEncodingSupportsASCII()</code>,
which iterates through all printable 7-bit bytes and checks
whether conversion to UTF-8 changes them, in which case appropriate
measures are taken. We do not know of any widely used character
encodings besides Shift_JIS, however, that would be affected by this
behavior.
</p>
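
<p>
The scan over printable 7-bit bytes might look roughly like the following
Python sketch. It is not the actual
<code>testEncodingSupportsASCII()</code> implementation: the converter
<code>iconv_like_decode</code> is a hypothetical stand-in that assumes the
<code>5C</code> and <code>7E</code> remappings shown in the table above, and
<code>find_ascii_mismatches</code> and <code>undo_remap</code> are names
invented for illustration.
</p>

```python
from typing import Callable, Dict


def iconv_like_decode(data: bytes) -> str:
    """Hypothetical stand-in for a Shift_JIS to UTF-8 converter (single bytes only)."""
    remapped = {0x5C: "\u00a5", 0x7E: "\u203e"}  # assumed remappings: Yen, overline
    return "".join(remapped.get(b, chr(b)) for b in data)


def find_ascii_mismatches(decode: Callable[[bytes], str]) -> Dict[str, str]:
    """Return {ascii_char: converted_char} for printable 7-bit bytes that change."""
    mismatches = {}
    for byte in range(0x20, 0x7F):  # printable ASCII, 0x20 through 0x7E
        ascii_char = chr(byte)
        converted = decode(bytes([byte]))
        if converted != ascii_char:
            mismatches[ascii_char] = converted
    return mismatches


def undo_remap(text: str, mismatches: Dict[str, str]) -> str:
    """Restore the ASCII characters that the converter replaced."""
    for ascii_char, converted in mismatches.items():
        text = text.replace(converted, ascii_char)
    return text


mismatches = find_ascii_mismatches(iconv_like_decode)
restored = undo_remap(iconv_like_decode(b"a\\b~c"), mismatches)
```

<p>
With the assumed converter, the scan reports exactly the two remapped bytes,
and undoing the remap lets the filter see the backslash as a backslash. A
converter that leaves the 7-bit range untouched yields an empty mapping, so
nothing is changed for ASCII-compatible encodings.
</p>
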
<h2 id="History">History</h2>
<p>
The vulnerability was reported on May 24, 2008 via email, as a follow-up
to another <a href="css-backslash.html">unrelated vulnerability</a> in CSS handling.
A patch was committed to the public repository on <a href="http://repo.or.cz/w/htmlpurifier.git?a=commit;h=bb16d8eae571dd4e30e3a62cce03d436d46cefaf">May 25, 2008</a>,
with the summary: <q>Fix Shift_JIS encoding wonkiness with yen symbols and whatnot.</q>
HTML Purifier 3.1.1 was released on June 19, 2008.
</p>
</div>
</div>
</body>
</html>