1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4 <html
5 xmlns="http://www.w3.org/1999/xhtml"
6 xmlns:xi="http://www.w3.org/2001/XInclude"
7 xml:lang="en">
8 <head>
9 <title>Contribute - HTML Purifier</title>
10 <xi:include href="common-meta.xml" xpointer="xpointer(/*/node())" />
11 <meta name="description" content="How to help HTML Purifier grow through code and attention." />
12 <meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter, filtering, standards, compliant, contribute, contribution, open source, community, help, code, needed" />
13 </head>
14 <body>
16 <xi:include href="common-header.xml" xpointer="xpointer(/*/node())" />
17 <h1 id="title">Contribute</h1>
19 <div id="content">
21 <p>
22 The very first question to ask yourself before reading this page is this:
23 </p>
25 <blockquote><p><em>Why contribute?</em></p></blockquote>
27 <p>
28 Because HTML Purifier is open-source software, you are not legally obligated to give anything
29 back to the community. In such a sense, HTML Purifier is our gift to
30 you, and you very well can run away and never be heard from again.
31 </p>
33 <p>
34 We hope, however, that this lack of a legal obligation doesn't prevent
35 you from contributing back to our project. Many hours were poured into
36 this project by its developers, and doubtless, this project has saved
37 many hours on your behalf. If HTML Purifier saved you 200 hours of work
38 (the actual figure might be more, might be less) and you contribute
39 ten hours back to the project, you still come out ahead by 190 hours.
40 </p>
42 <p>
43 Additionally, your use of this library required a substantial investment
44 on your part. You had to learn the APIs, read the
45 documentation, tweak things so that they worked with your application,
46 et cetera. Contributing back means making good use of this investment:
47 it means not only will your expertise and knowledge be fed back into
48 HTML Purifier, but you might learn a thing or two from the internals that
49 you didn't know before.
50 </p>
52 <p>
53 If I've convinced you, read on! It's quite easy to get started...
54 </p>
56 <div id="toc" />
58 <h2>What can you do?</h2>
60 <p>
61 Contributions can come in many forms. Documentation, code, even
62 evangelism, can all help a project. One of the things we've noticed,
63 however, is that many contributions come from people helping
64 themselves. They have an itch, a special requirement, and they help
65 the project out in that area.
66 </p>
68 <p>
69 What might that itch be? Over the years, we've accumulated many feature
70 requests in our <a href="dev/TODO">TODO</a> file. There are also
71 tasty tidbits in the <a href="docs">proposal section of our
72 documentation.</a> You might have an
73 idea for a new AutoFormatter, or maybe would like to implement an HTMLModule
74 for a set of elements that HTML Purifier doesn't support yet. Maybe you
75 want a demo page built-in with the library so that you can easily test
76 things out without using HTML Purifier's demo page. Code something that
77 interests you.
78 </p>
80 <h2>Coding standards</h2>
82 <p>
83 As a general rule of thumb, make sure your code looks like the code around
84 it. Probably the biggest thing is to remember four spaces, no tabs (if you
85 perpetually forget, get your text-editor to make whitespace visible). There
86 are a number of other formatting subtleties, but suffice it to say
87 <em>consistency</em> is the order of the day in this project. You're not
88 going to read <acronym title="Yet Another Coding Standard">YACS</acronym> anyway.
89 </p>
91 <p>
92 The code you write must be PHP 5.0.5 compatible, so avoid later features
93 like magic methods. The code you write also must have unit tests, which
94 reside in the <em>tests/</em> directory. The workflow for your feature
95 should be along the lines of:
96 </p>
98 <ol>
99 <li>Write unit tests</li>
100 <li>Hack hack hack</li>
101 <li>Run <em>php tests/index.php</em></li>
102 <li>If failures, go back to 1 or 2</li>
103 <li>Commit and submit patch</li>
104 </ol>
106 <p>
107 HTML Purifier prides itself on having an evergreen test suite, so if your
108 change breaks other tests, it probably won't be accepted.
109 </p>
111 <h2>Getting set up</h2>
113 <p>
114 You already know how to <em>use</em> HTML Purifier. But do you know how
115 to develop it?
116 </p>
118 <h3>Git</h3>
120 <p>
121 HTML Purifier's repository is hosted via Git. If you've used Git before,
122 you can skip this section: you already know what the workflow is for
123 working with Git, so just clone from <em>git://repo.or.cz/htmlpurifier.git</em> and
124 get going. Otherwise, read on.
125 </p>
127 <p>
128 In order to hack on HTML Purifier's source tree, you will first need to
129 make sure Git is installed on your system. Type the following command
130 in your prompt:
131 </p>
133 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/">git</a> --version</pre>
135 <p>
136 You should get something along the lines of <q>git version 1.5.6</q>.
137 Otherwise:
138 </p>
140 <dl>
141 <dt>You use Linux:</dt>
142 <dd>
143 Grab Git from your friendly neighborhood package manager, or compile
144 from source with the package provided at <a href="http://git.or.cz/">git.or.cz</a>.
145 Either should be relatively simple.
146 </dd>
147 <dt>You use Windows:</dt>
148 <dd>
149 Download and install <a href="http://code.google.com/p/msysgit/">msysgit</a>.
150 Then, for all of the following commands
151 we discuss, enter them in the console provided by Git Bash. If you have
152 Cygwin, you can also use setup.exe to install Git.
153 </dd>
154 <dt>You use a Mac:</dt>
155 <dd>
156 There are binaries available from <a href="http://metastatic.org/text/Concern/2007/09/15/new-git-package-for-os-x/">various</a>
157 <a href="http://code.google.com/p/git-osx-installer/">sources</a>; I haven't
158 tried them so your mileage may vary. Since Mac is a BSD-like system, you
159 can also <a href="http://www.dekorte.com/blog/blog.cgi?do=item&amp;id=2539">compile
160 from source.</a>
161 </dd>
162 </dl>
164 <p>
165 Run the earlier command again to make sure the installation went
166 smoothly. Now run this command:
167 </p>
169 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-clone.html">git clone</a> git://repo.or.cz/htmlpurifier.git</kbd></pre>
171 <p>
172 This will copy the HTML Purifier codebase into the htmlpurifier folder.
173 </p>
175 <p>
176 You will want to configure the Git installation with your name and
177 email address. You can do this with these two commands:
178 </p>
180 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-config.html">git config</a> --global user.name "Bob Doe"
181 git config --global user.email bob@example.com</kbd></pre>
183 <p>
184 Let us fast-forward for a moment and imagine that we have already made our changes
185 and would now like to send them to HTML Purifier for review. You
186 will need to execute these commands:
187 </p>
189 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-status.html">git status</a></kbd></pre>
191 <p>
192 This command will give you a quick rundown about all the files Git knows
193 about. If you have any <q>Untracked files</q>, you will need to add
194 them with:
195 </p>
197 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-add.html">git add</a> <em>$filename</em></kbd></pre>
199 <blockquote class="aside"><p>
200 (You can also add <q>Changed but not updated</q> files, but because we will
201 be using the <kbd>-a</kbd> option this is strictly unnecessary.)
202 </p></blockquote>
204 <p>
205 Now, you will want to commit your changes. Users of centralized version
206 control systems, beware: this does not push it to a remote repository,
207 or anything like that. It simply records the change in your local repository.
208 Doing so is as simple as:
209 </p>
211 <pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-commit.html">git commit</a> -as</kbd></pre>
213 <blockquote class="aside"><p>
214 The <q>a</q> flag tells Git to commit all modified files, even if you didn't
215 git add them. The <q>s</q> flag tells Git to sign off your commit message
216 with your name and email.
217 </p></blockquote>
219 <p>
220 You will then have a screen brought up to enter a commit message. If this
221 screen is vim (you can tell if your command line window transmuted into
222 something you've never seen before), type <kbd>i</kbd> (<samp>--INSERT--</samp>
223 mode), write your commit message, type <kbd>ESC</kbd>, and
224 then type <kbd>:wq ENTER</kbd> (write and quit).
225 </p>
227 <p>
228 A quick note about commit messages: there is a very specific format for them.
229 They should look something like this:
230 </p>
232 <pre><samp>Concise one-line statement describing change
234 Full explanation for the change. If you fixed a bug, make
235 sure you describe what was wrong, how you fixed it, and
236 what the behavior is now. If it was a feature, describe
237 why the feature is useful, how you use it, and any tricky
238 implementation details.
240 In short, the body of the commit message (which can span multiple
241 paragraphs) should, along with the code diff, be self
242 explanatory and not require any email introduction. At the
243 same time, your commit message will be immortalized and
244 should be in third-person and formal.
246 Signed-off-by: Edward Z. Yang &lt;edwardzyang@thewritingpot.com&gt;</samp></pre>
248 <p>
249 Finally, after the commit has been recorded, you will want to make a
250 patch to distribute to other people to review and test. Doing so is
251 as simple as:
252 </p>
254 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html">git format-patch</a> -1</pre>
256 <blockquote class="aside"><p>
257 You can substitute -# for -1, where # is the number of commits you would
258 like to write patches for. You can also specify a commit hash ID.
259 </p></blockquote>
261 <p>
262 A file named roughly <em>0001-Short-description.patch</em> will be
263 created, with the complete contents of your change.
264 </p>
266 <p>In summary:</p>
268 <pre class="command"><kbd>git clone git://repo.or.cz/htmlpurifier.git
269 git config --global user.name "Bob Doe"
270 git config --global user.email bob@example.com
271 cd htmlpurifier</kbd>
272 # hack hack hack
273 <kbd>git status
274 git add newfile1.txt subdir/newfile2.txt
275 git commit -as
276 git format-patch -1
277 # send patch off</kbd></pre>
279 <p>
280 Two quick notes before we go on to some HTML Purifier specific instructions:
281 </p>
283 <ol>
284 <li>
285 <p>
286 If you are posting the patch on the forum, be sure to copy-paste it
287 in between <code>&lt;pre&gt;&lt;![CDATA[</code> and <code>]]&gt;&lt;/pre&gt;</code>.
288 If you are emailing the patch, we prefer that you send it inline in a text
289 email (be sure to configure your mail client not to wrap lines; check out the
290 <a href="http://repo.or.cz/w/git.git?a=blob;f=Documentation/SubmittingPatches;hb=HEAD">SubmittingPatches guidelines from the Git project</a> for more details).
291 </p></li>
292 <li>
293 <p>
294 In all probability, there have been changes to the HTML Purifier codebase
295 since you made your patch. As part of your duties as a patch-maker, you
296 should ensure that your patch remains based on the HEAD of our master branch.
297 You can do so with the command:
298 </p>
299 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-pull.html">git pull</a> --rebase</pre>
300 <p>
301 You may also find it useful to perform your development in a topic branch.
302 You can do this using:
303 </p>
304 <pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-checkout.html">git checkout</a> -b <em>branchname</em></pre>
305 <p>
306 The benefit of a setup like this is that you can now do a regular
307 <kbd>git pull</kbd> on the master branch, and then use
308 <kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html">git rebase</a> master</kbd> on your own branch to keep it up to
309 date. This can be useful if your patch produces a conflict.
310 (One quick note: you switch between branches using <kbd>git
311 checkout <em>branchname</em></kbd>. The -b flag creates a new branch.)
312 </p>
313 <blockquote class="aside"><p>
314 The default behavior of <kbd>git pull</kbd> in such a case is to merge
315 your branch. If you were a release maintainer, this is what you would
316 want to do, since your history would be public and rewriting history
317 could be disruptive. With private, local changes, however, performing
318 the merge makes the history needlessly complicated.
319 </p></blockquote>
320 </li>
321 </ol>
323 <h3>SimpleTest</h3>
325 <p>
326 As mentioned before, one of the keys to successfully developing a new
327 feature on HTML Purifier is a comprehensive set of unit tests. However,
328 unit tests do you no good if you can't run them.
329 </p>
331 <p>
332 The first step in getting unit tests running on HTML Purifier is downloading
333 <a href="http://simpletest.org">SimpleTest</a>, our testing framework. However,
334 the public 1.0.1 release won't work with HTML Purifier, as it is still
335 <abbr>PHP</abbr>4 compatible and will give off spurious errors. You need to
336 use the trunk version of SimpleTest. This version can be checked out
337 using <a href="http://subversion.tigris.org/">Subversion</a> with this command:
338 </p>
340 <pre class="command"><kbd>svn co https://simpletest.svn.sourceforge.net/svnroot/simpletest/simpletest/trunk simpletest</kbd></pre>
342 <p>
343 The next step is to tell HTML Purifier about the SimpleTest installation.
344 You can do this by copying the <em>test-settings.sample.php</em> file
345 to <em>test-settings.php</em> and configuring it according to the
346 instructions inside. The only variable you must edit is
347 <var>$simpletest_location</var>.
348 </p>
350 <blockquote><p>
351 At the moment, it is somewhat difficult to get the optional parameters set up
352 properly. If you feel adventurous, try the instructions; they should work,
353 but might be a little complicated or sparser than usual.
354 </p></blockquote>
356 <p>
357 Now, check if everything is running by typing <kbd>php tests/index.php --flush</kbd>
358 from the root of your HTML Purifier working copy. You should get a full
359 complement of passing tests. Congratulations!
360 </p>
362 <h2>Workflow</h2>
364 <p>
365 After identifying what changes you would like to make to HTML Purifier,
366 you will need to code appropriate unit tests for them. (If you are of the
367 code first, test later mentality, that is fine too; just make sure the tests
368 are 1. written and 2. comprehensive.) If you modify the file
369 <em>library/HTMLPurifier/ConfigSchema.php</em>, chances are the corresponding
370 tests are in <em>tests/HTMLPurifier/ConfigSchemaTest.php</em> (i.e., substitute
371 <em>tests</em> for <em>library</em> and append <em>Test</em> to the filename).
372 </p>
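<p>
The naming convention above can be sketched as a small shell helper. This is
only an illustration: the function name <code>test_path_for</code> is
hypothetical and not part of HTML Purifier, and it assumes the test file
follows the convention exactly.
</p>

```shell
# Hypothetical helper (not part of HTML Purifier): guess the test file
# for a library file by swapping "library/" for "tests/" and appending
# "Test" to the base filename.
test_path_for() {
    rest=${1#library/}                          # strip the leading "library/"
    printf 'tests/%s\n' "${rest%.php}Test.php"  # ConfigSchema.php becomes ConfigSchemaTest.php
}

test_path_for library/HTMLPurifier/ConfigSchema.php
```

<p>
For the example path, this prints <em>tests/HTMLPurifier/ConfigSchemaTest.php</em>.
</p>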
374 <p>
375 We prefer, first and foremost, <em>unit</em> tests; that is, the test should
376 not have any dependencies on any other objects, and if it does, those
377 dependencies should be filled in using SimpleTest's excellent
378 <a href="http://www.lastcraft.com/mock_objects_documentation.php">mock object support</a>.
379 We also believe strongly in integration tests,
380 which take the form of htmlt files and test HTML Purifier as a whole
381 with your modifications. An htmlt file looks like this:
382 </p>
384 <pre><samp><![CDATA[--INI--
385 %HTML.Allowed = "b,i,u,p"
386 --HTML--
387 <b>Foo<a id="asdf">bar</a></b>
388 --EXPECT--
389 <b>Foobar</b>
390 ]]></samp></pre>
392 <p>
393 The <samp>--INI--</samp> section indicates the configuration directives
394 that should be used with this test (if you added a new feature, you will
395 most probably be using this section to activate it). The <samp>--HTML--</samp>
396 section indicates the input, and the <samp>--EXPECT--</samp> indicates
397 the expected output. Be sure to include a trailing newline. You can place
398 these files in the <em>tests/HTMLPurifier/HTMLT</em> directory; give them
399 a descriptive filename.
400 </p>
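<p>
Before dropping a new htmlt file into the directory, a quick sanity check
along these lines may help. This is a rough sketch, not something shipped
with HTML Purifier: the function name <code>check_htmlt</code> is
hypothetical, and it assumes that the <samp>--HTML--</samp> and
<samp>--EXPECT--</samp> markers are required, appear in that order, and
each sit on a line of their own.
</p>

```shell
# Hypothetical helper (not part of HTML Purifier): sanity-check an htmlt file.
# Assumption: --HTML-- and --EXPECT-- are both required, in that order.
check_htmlt() {
    file=$1
    # Line numbers of the section markers; empty when a marker is absent.
    html_line=$(grep -n -x -e '--HTML--' "$file" | head -n 1 | cut -d: -f1)
    expect_line=$(grep -n -x -e '--EXPECT--' "$file" | head -n 1 | cut -d: -f1)
    if [ -z "$html_line" ] || [ -z "$expect_line" ]; then
        echo "missing section marker"
        return 1
    fi
    if [ "$html_line" -ge "$expect_line" ]; then
        echo "sections out of order"
        return 1
    fi
    # Command substitution strips a trailing newline, so a non-empty
    # result means the last byte of the file is something else.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo "no trailing newline"
        return 1
    fi
    echo "ok"
}

# Usage (the filename is illustrative):
#   check_htmlt tests/HTMLPurifier/HTMLT/my-new-feature.htmlt
```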
402 <p>
403 It is my hope that you find the HTML Purifier core code a joy (or at least,
404 not painful) to work with; every class and method has a docblock that doesn't
405 just reiterate what you can find inside its body, but also explains how the component
406 fits into HTML Purifier as a whole. If you find any section of code that
407 is missing documentation or is poorly documented, please notify us and we will
408 correct it immediately. (Remember, <kbd>git pull --rebase</kbd> to
409 update your branch!)
410 </p>
412 <p>
413 There are, however, some architectural features that are not immediately
414 evident from mere source-code browsing. In this case, you are encouraged
415 to check out the documentation in the <em>docs/</em> folder (web
416 accessible at <a href="docs/">the same location</a>).
417 <a href="docs/dev-flush.html"><q>Flushing the Purifier</q></a>
418 and <a href="docs/dev-config-schema.html"><q>Config Schema</q></a> in the Development center are particularly
419 notable: in all likelihood you will need this knowledge in order to
420 get HTML Purifier working the way you want it to.
421 </p>
423 <h2>Debugging</h2>
425 <p>
426 Your debugging skills are as good as
427 mine, but there are a few things that are helpful to keep in mind:
428 </p>
430 <ul>
431 <li>
432 You can modify the granularity of tests to run down to a single
433 test-case method. The first method is to specify the <em>f</em>
434 parameter with a value like <samp>HTMLPurifier/ConfigSchemaTest.php</samp>
435 which will cause HTML Purifier to run only that test. (In web URL
436 speak, this means <em>tests/index.php?f=HTMLPurifier/ConfigSchemaTest.php</em>;
437 in command line speak, this means <kbd>php tests/index.php -f HTMLPurifier/ConfigSchemaTest.php</kbd>.)
438 To run only a single test <em>method</em>, prefix that method with
439 <code>__only</code>. Be sure to revert this change when you're done
440 hammering away, and don't forget to test <em>everything</em> before committing.
441 </li>
442 <li>
443 HTML Purifier does not have a debugging/verbose mode, so any internal
444 data-checks need to be <code>var_dump</code>'ed by the user.
445 <a href="http://www.xdebug.org/">XDebug</a> makes var_dump'ing a pleasure
446 by colorizing and escaping output. (The stack traces are also quite
447 handy!) There is also a function called <code>printTokens($tokens, $index)</code> specifically
448 for outputting arrays of tokens. The <var>$index</var> variable
449 indicates a token to make bold, and can be omitted.
450 </li>
451 <li>
452 There's a Debugger class. Don't use it. It kinda sucks.
453 </li>
454 <li>
455 If it seems like a change you made had no effect on your tests, try
456 flushing with the <em>flush</em> parameter (<kbd>--flush</kbd> on the command line).
457 </li>
458 <li>
459 SimpleTest's error message when an <code>assertIdentical</code> assertion fails on
460 strings is incomprehensible, so keep your test strings small or be ready to
461 <code>var_dump</code> if necessary.
462 </li>
463 <li>
464 Beware whitespace. Tests should work whether or not they're Unix (LF), Windows (CRLF)
465 or Mac (CR) encoded. This usually means <em>not</em> using <code>PHP_EOL</code>
466 but rather a literal newline in the source code.
467 </li>
468 </ul>
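<p>
For the whitespace caveat in the list above, it can help to know which
newline convention a given test file actually uses. The helper below is a
rough heuristic and not part of HTML Purifier: the name
<code>line_endings</code> is hypothetical, and it only counts CR and LF
bytes, so it does not verify that they are properly paired.
</p>

```shell
# Hypothetical helper (not part of HTML Purifier): report the newline
# convention of a file by counting CR and LF bytes. Equal counts are
# taken to mean CRLF, which is a heuristic rather than a strict check.
line_endings() {
    cr=$(( $(cat "$1" | tr -cd '\r' | wc -c) ))
    lf=$(( $(cat "$1" | tr -cd '\n' | wc -c) ))
    if [ "$cr" -eq 0 ]; then
        echo LF
    elif [ "$lf" -eq 0 ]; then
        echo CR
    elif [ "$cr" -eq "$lf" ]; then
        echo CRLF
    else
        echo mixed
    fi
}

# Usage:
#   line_endings tests/HTMLPurifier/ConfigSchemaTest.php
```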
470 </div>
471 </body>
472 </html>