<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xi="http://www.w3.org/2001/XInclude"
xml:lang="en">
<head>
<title>Contribute - HTML Purifier</title>
<xi:include href="common-meta.xml" xpointer="xpointer(/*/node())" />
<meta name="description" content="How to help HTML Purifier grow through code and attention." />
<meta name="keywords" content="HTMLPurifier, HTML Purifier, HTML, filter, filtering, standards, compliant, contribute, contribution, open source, community, help, code, needed" />
</head>
<body>

<xi:include href="common-header.xml" xpointer="xpointer(/*/node())" />
<h1 id="title">Contribute</h1>

<div id="content">

<p>
The very first question to ask yourself before reading this page is this:
</p>

<blockquote><p><em>Why contribute?</em></p></blockquote>

<p>
Because HTML Purifier is open-source software, you are not legally obligated
to give anything back to the community. In that sense, HTML Purifier is our
gift to you, and you can very well walk away and never be heard from again.
</p>

<p>
We hope, however, that this lack of a legal obligation doesn't prevent
you from contributing back to our project. Many hours were poured into
this project by its developers, and doubtless, this project has saved
you many hours. If HTML Purifier saved you 200 hours of work
(the actual figure might be more, might be less), then even if you contribute
ten hours back to the project, you still come out 190 hours ahead.
</p>

<p>
Additionally, using this library required a substantial investment
on your part. You had to learn the APIs, read the
documentation, tweak things so that they worked with your application,
et cetera. Contributing back means making good use of this investment:
not only will your expertise and knowledge be fed back into
HTML Purifier, but you might also learn a thing or two from the internals
that you didn't know before.
</p>

<p>
If I've convinced you, read on! It's quite easy to get started...
</p>

<div id="toc" />

<h2>What can you do?</h2>

<p>
New contributors in other projects often get shoehorned into mundane stuff
like updating documentation or writing tutorials. Unless that's your
calling, we don't want you to do that. We want you to scratch an itch:
to think of something that would be helpful to you, and write code for it.
</p>

<p>
What might that itch be? Over the years, we've accumulated many feature
requests in our <a href="dev/TODO">TODO</a> file. There are also
tasty tidbits in the <a href="docs">proposal section of our
documentation</a>. You might have an
idea for a new AutoFormatter, or maybe you would like to implement an HTMLModule
for a set of elements that HTML Purifier doesn't support yet. Maybe you
want a demo page built into the library so that you can easily test
things out without using HTML Purifier's demo page. Code something that
interests you.
</p>

<h2>Coding standards</h2>

<p>
As a general rule of thumb, make sure your code looks like the code around
it. Probably the biggest thing to remember is four spaces, no tabs (if you
perpetually forget, get your text editor to make whitespace visible). There
are a number of other formatting subtleties, but suffice it to say that
<em>consistency</em> is the order of the day in this project. You're not
going to read <acronym title="Yet Another Coding Standard">YACS</acronym> anyway.
</p>

<p>
The code you write must be PHP 5.0.5 compatible, so avoid later features
like magic methods. The code you write also must have unit tests, which
reside in the <em>tests/</em> directory. The workflow for your feature
should be along the lines of:
</p>

<ol>
<li>Write unit tests</li>
<li>Hack hack hack</li>
<li>Run <em>php tests/index.php</em></li>
<li>If there are failures, go back to step 1 or 2</li>
<li>Commit and submit patch</li>
</ol>

<p>
HTML Purifier prides itself on having an evergreen test suite, so if your
change breaks other tests, it probably won't be accepted.
</p>

<h2>Getting set up</h2>

<p>
You already know how to <em>use</em> HTML Purifier. But do you know how
to develop it?
</p>

<h3>Git</h3>

<p>
HTML Purifier's repository is hosted via Git. If you've used Git before,
you can skip this section: you already know the workflow for
working with Git, so just clone from <em>git://repo.or.cz/htmlpurifier.git</em> and
get going. Otherwise, read on.
</p>

<p>
In order to hack on HTML Purifier's source tree, you will first need to
make sure Git is installed on your system. Type the following command
at your prompt:
</p>

<pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/">git</a> --version</pre>

<p>
You should get something along the lines of <q>git version 1.5.6</q>.
Otherwise:
</p>

<dl>
<dt>You use Linux:</dt>
<dd>
Grab Git from your friendly neighborhood package manager. Or compile
it from the source package provided at <a href="http://git.or.cz/">git.or.cz</a>.
Either should be relatively simple.
</dd>
<dt>You use Windows:</dt>
<dd>
Download and install <a href="http://code.google.com/p/msysgit/">msysgit</a>.
Then enter all of the commands we discuss below
in the console provided by Git Bash. If you have
Cygwin, you can also use setup.exe to install Git.
</dd>
<dt>You use a Mac:</dt>
<dd>
There are binaries available from <a href="http://metastatic.org/text/Concern/2007/09/15/new-git-package-for-os-x/">various</a>
<a href="http://code.google.com/p/git-osx-installer/">sources</a>; I haven't
tried them, so your mileage may vary. Since Mac OS X is a BSD-like system, you
can also <a href="http://www.dekorte.com/blog/blog.cgi?do=item&amp;id=2539">compile
from source</a>.
</dd>
</dl>

<p>
Run the earlier command again to make sure the installation went
smoothly. Now run this command:
</p>

<pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-clone.html">git clone</a> git://repo.or.cz/htmlpurifier.git</kbd></pre>

<p>
This will copy the HTML Purifier codebase into the htmlpurifier folder.
</p>

<p>
You will want to configure your Git installation with your name and
email address. You can do this with these two commands:
</p>

<pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-config.html">git config</a> --global user.name "Bob Doe"
git config --global user.email bob@example.com</kbd></pre>

<p>
Let us fast forward for a moment and imagine that we have already made our changes
and would now like to send them to HTML Purifier for review. You
will need to execute these commands:
</p>

<pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-status.html">git status</a></kbd></pre>

<p>
This command will give you a quick rundown of all the files Git knows
about. If you have any <q>Untracked files</q>, you will need to add
them with:
</p>

<pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-add.html">git add</a> <em>$filename</em></kbd></pre>

<blockquote class="aside"><p>
(You can also add <q>Changed but not updated</q> files, but because we will
be using the <kbd>-a</kbd> option, this is strictly unnecessary.)
</p></blockquote>

<p>
Now, you will want to commit your changes. Users of centralized version
control systems, beware: this does not push anything to a remote repository.
It simply records the change in your local repository.
Doing so is as simple as:
</p>

<pre class="command"><kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-commit.html">git commit</a> -as</kbd></pre>

<blockquote class="aside"><p>
The <q>a</q> flag tells Git to commit all modified files, even if you didn't
<kbd>git add</kbd> them. The <q>s</q> flag tells Git to sign off your commit message
with your name and email.
</p></blockquote>

<p>
An editor will then open so you can enter a commit message. If this
editor is vim (you can tell if your command line window transmuted into
something you've never seen before), type <kbd>i</kbd> (<samp>--INSERT--</samp>
mode), write your commit message, press <kbd>ESC</kbd>, and
then type <kbd>:wq ENTER</kbd> (write and quit).
</p>

<p>
A quick note about commit messages: there is a very specific format for them.
They should look something like this:
</p>

<pre><samp>Concise one-line statement describing change

Full explanation for the change. If you fixed a bug, make
sure you describe what was wrong, how you fixed it, and
what the behavior is now. If it was a feature, describe
why the feature is useful, how you use it, and any tricky
implementation details.

In short, the body of the commit message (which can span multiple
paragraphs) should, along with the code diff, be self
explanatory and not require any email introduction. At the
same time, your commit message will be immortalized and
should be in third-person and formal.

Signed-off-by: Edward Z. Yang &lt;edwardzyang@thewritingpot.com&gt;</samp></pre>
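
<p>
The format above can be checked mechanically before you send anything off.
The following is a minimal sketch, not something that ships with HTML Purifier:
<code>check_commit_message</code> is a hypothetical helper, and the
72-character limit is only an assumption standing in for <q>concise</q>.
</p>

```shell
# check_commit_message FILE
# Prints "ok" and returns 0 if FILE looks like the format described above;
# otherwise prints the first problem found and returns 1.
# Hypothetical helper; the 72-character limit is an assumption, not a project rule.
check_commit_message() {
    msg_file=$1
    subject=$(sed -n '1p' "$msg_file")
    second=$(sed -n '2p' "$msg_file")

    # The first line must be a non-empty, concise summary.
    if [ -z "$subject" ]; then
        echo "missing subject line"; return 1
    fi
    if [ "${#subject}" -gt 72 ]; then
        echo "subject line longer than 72 characters"; return 1
    fi
    # A blank line separates the summary from the body.
    if [ -n "$second" ]; then
        echo "second line must be blank"; return 1
    fi
    # git commit -s appends the Signed-off-by line.
    if ! grep -q '^Signed-off-by: ' "$msg_file"; then
        echo "missing Signed-off-by line"; return 1
    fi
    echo "ok"
}
```

<p>
Git passes the path of the message file to its <em>commit-msg</em> hook, so
a function like this could be called from there if you want the check to run
on every commit.
</p>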

<p>
Finally, after the commit has been recorded, you will want to make a
patch to distribute to other people to review and test. Doing so is
as simple as:
</p>

<pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html">git format-patch</a> -1</pre>

<blockquote class="aside"><p>
You can replace -1 with -#, where # is the number of commits you would
like to write patches for. You can also specify a commit hash ID.
</p></blockquote>

<p>
A file named roughly <em>0001-Short-description.patch</em> will be
created, with the complete contents of your change.
</p>

<p>In summary:</p>

<pre class="command"><kbd>git clone git://repo.or.cz/htmlpurifier.git
git config --global user.name "Bob Doe"
git config --global user.email bob@example.com
cd htmlpurifier</kbd>
# hack hack hack
<kbd>git status
git add newfile1.txt subdir/newfile2.txt
git commit -as
git format-patch -1
# send patch off</kbd></pre>

<p>
Two quick notes before we go on to some HTML Purifier specific instructions:
</p>

<ol>
<li>
<p>
If you are posting the patch on the forum, be sure to copy-paste it
in between <code>&lt;pre&gt;&lt;![CDATA[</code> and <code>]]&gt;&lt;/pre&gt;</code>.
If you are emailing the patch, we prefer that you send it inline in a text
email. Be sure to configure your mail client not to wrap lines; see the
<a href="http://repo.or.cz/w/git.git?a=blob;f=Documentation/SubmittingPatches;hb=HEAD">SubmittingPatches guidelines from the Git project</a> for more details.
</p></li>
<li>
<p>
In all probability, there have been changes to the HTML Purifier codebase
since you made your patch. As part of your duties as a patch-maker, you
should ensure that your patch remains based on the HEAD of our master branch.
You can do so with the command:
</p>
<pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-pull.html">git pull</a> --rebase</pre>
<p>
You may also find it useful to perform your development in a topic branch.
You can do this using:
</p>
<pre class="command"><a href="http://www.kernel.org/pub/software/scm/git/docs/git-checkout.html">git checkout</a> -b <em>branchname</em></pre>
<p>
The benefit of a setup like this is that you can now do a regular
<kbd>git pull</kbd> on the master branch, and then use
<kbd><a href="http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html">git rebase</a> master</kbd> on your own branch to keep it up to
date. This can be useful if your patch produces a conflict.
(One quick note: you switch between branches using <kbd>git
checkout <em>branchname</em></kbd>. The -b flag creates a new branch.)
</p>
<blockquote class="aside"><p>
The default behavior of <kbd>git pull</kbd> in such a case is to merge
your branch. If you were a release maintainer, this is what you would
want to do, since your history would be public and rewriting it
could be disruptive. With private, local changes, however, performing
the merge makes the history needlessly complicated.
</p></blockquote>
</li>
</ol>
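
<p>
The topic-branch routine from the second note can be wrapped in a small shell
function. This is only a sketch under a few assumptions: the function name
<code>update_topic_branch</code> and the branch name <em>my-feature</em> are
made up for illustration, and your local master branch is assumed to pull
from the HTML Purifier repository.
</p>

```shell
# update_topic_branch BRANCH
# Brings the local master up to date, then replays BRANCH on top of it.
# Hypothetical helper; it stops at the first Git command that fails.
update_topic_branch() {
    branch=$1
    if [ -z "$branch" ]; then
        echo "usage: update_topic_branch BRANCH"; return 1
    fi
    git checkout master || return 1      # switch to the master branch
    git pull || return 1                 # a regular pull on master
    git checkout "$branch" || return 1   # back to the topic branch
    git rebase master                    # replay your commits on top of master
}

# Example (run inside your working copy):
#   update_topic_branch my-feature
```

<p>
If the rebase stops on a conflict, Git leaves you on the topic branch with
the conflict to resolve; the function simply returns Git's exit status.
</p>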

<h3>SimpleTest</h3>

<p>
As mentioned before, one of the keys to successfully developing a new
feature on HTML Purifier is a comprehensive set of unit tests. However,
unit tests do you no good if you can't run them.
</p>

<p>
The first step in getting unit tests running on HTML Purifier is downloading
<a href="http://simpletest.org">SimpleTest</a>, our test framework. However,
the public 1.0.1 release won't work with HTML Purifier, as it is still
<abbr>PHP</abbr> 4 compatible and will give off spurious errors. You need to
use the trunk version of SimpleTest. This version can be checked out
using <a href="http://subversion.tigris.org/">Subversion</a> with this command:
</p>

<pre class="command"><kbd>svn co https://simpletest.svn.sourceforge.net/svnroot/simpletest/simpletest/trunk simpletest</kbd></pre>

<p>
The next step is to tell HTML Purifier about the SimpleTest installation.
You can do this by copying the <em>test-settings.sample.php</em> file
to <em>test-settings.php</em> and configuring it according to the
instructions inside. The only variable you must edit is
<var>$simpletest_location</var>.
</p>

<blockquote><p>
At the moment, it is somewhat difficult to get the optional parameters set up
properly. If you feel adventurous, try the instructions; they should work,
but might be a little complicated or sparser than usual.
</p></blockquote>

<p>
Now, check that everything is running by typing <kbd>php tests/index.php --flush</kbd>
from the root of your HTML Purifier working copy. You should get a full
complement of passing tests. Congratulations!
</p>

<h2>Workflow</h2>

<p>
After identifying what changes you would like to make to HTML Purifier,
you will need to code appropriate unit tests for them. (If you are of the
code first, test later mentality, that is fine too; just make sure the tests
are 1. written and 2. comprehensive.) If you modify the file
<em>library/HTMLPurifier/ConfigSchema.php</em>, chances are the corresponding
tests are in <em>tests/HTMLPurifier/ConfigSchemaTest.php</em> (i.e. substitute
library with tests and append Test to the filename).
</p>
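
<p>
The naming rule above is simple enough to express as a shell function. This
is an illustrative sketch only: <code>test_path_for</code> is a hypothetical
name, and the text itself says <q>chances are</q>, so some files may not
follow the rule.
</p>

```shell
# test_path_for LIBRARY_FILE
# Prints the conventional location of the matching test file:
# "library" becomes "tests" and "Test" is appended to the file name.
test_path_for() {
    case $1 in
        library/*.php) ;;
        *) echo "not a library PHP file: $1"; return 1 ;;
    esac
    rest=${1#library/}    # drop the leading "library/"
    rest=${rest%.php}     # drop the ".php" extension
    echo "tests/${rest}Test.php"
}

# Example:
#   test_path_for library/HTMLPurifier/ConfigSchema.php
#   prints tests/HTMLPurifier/ConfigSchemaTest.php
```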

<p>
We prefer, first and foremost, <em>unit</em> tests; that is, the test should
not have any dependencies on any other objects, and if it does, those
dependencies should be filled in using SimpleTest's excellent
<a href="http://www.lastcraft.com/mock_objects_documentation.php">mock object support</a>.
We also believe strongly in integration tests,
which take the form of htmlt files and test HTML Purifier as a whole
with your modifications. An htmlt file looks like this:
</p>

<pre><samp><![CDATA[--INI--
%HTML.Allowed = "b,i,u,p"
--HTML--
<b>Foo<a id="asdf">bar</a></b>
--EXPECT--
<b>Foobar</b>
]]></samp></pre>

<p>
The <samp>--INI--</samp> section indicates the configuration directives
that should be used with this test (if you added a new feature, you will
most probably be using this section to activate it). The <samp>--HTML--</samp>
section indicates the input, and the <samp>--EXPECT--</samp> section indicates
the expected output. Be sure to include a trailing newline. You can place
these files in the <em>tests/HTMLPurifier/HTMLT</em> directory; give them
a descriptive filename.
</p>
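
<p>
A quick sanity check can catch a malformed htmlt file before the test suite
does. The sketch below is an assumption-laden illustration, not part of the
test harness: <code>check_htmlt</code> is a hypothetical name, and it only
checks what the description above states (an input section, an expected
section after it, and a trailing newline). Whether the <samp>--INI--</samp>
section is optional is not stated here, so the sketch does not require it.
</p>

```shell
# check_htmlt FILE
# Prints "ok" if FILE has an --HTML-- section followed by an --EXPECT--
# section and ends with a newline; otherwise prints the problem found.
check_htmlt() {
    f=$1
    # Line numbers of the first exact-match marker lines, if any.
    html_line=$(grep -n -x -e '--HTML--' "$f" | head -n 1 | cut -d: -f1)
    expect_line=$(grep -n -x -e '--EXPECT--' "$f" | head -n 1 | cut -d: -f1)

    if [ -z "$html_line" ] || [ -z "$expect_line" ]; then
        echo "missing --HTML-- or --EXPECT-- section"; return 1
    fi
    if [ "$html_line" -ge "$expect_line" ]; then
        echo "--HTML-- must come before --EXPECT--"; return 1
    fi
    # tail -c 1 yields the last byte; command substitution strips a final
    # newline, so a non-empty result means the file does not end with one.
    if [ -n "$(tail -c 1 "$f")" ]; then
        echo "missing trailing newline"; return 1
    fi
    echo "ok"
}
```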

<p>
It is my hope that you find the HTML Purifier core code a joy (or at least,
not painful) to work with; every class and method has a docblock that doesn't
merely reiterate what you can find inside its body, but also explains how the
component fits into HTML Purifier as a whole. If you find any section of code that
is missing documentation or is poorly documented, please notify us and we will
correct it immediately. (Remember, <kbd>git pull --rebase</kbd> to
update your branch!)
</p>

<p>
There are, however, some architectural features that are not immediately
evident from mere source-code browsing. In this case, you are encouraged
to check out the documentation in the <em>docs/</em> folder (web
accessible at <a href="docs/">the same location</a>).
<a href="docs/dev-flush.html"><q>Flushing the Purifier</q></a>
and <a href="docs/dev-config-schema.html"><q>Config Schema</q></a> in the Development center are
particularly notable: in all likelihood you will need this knowledge in order to
get HTML Purifier working the way you want it to.
</p>

<h2>Debugging</h2>

<p>
Your debugging skills are as good as
mine, but there are a few things that are helpful to keep in mind:
</p>

<ul>
<li>
You can narrow the set of tests that run, down to a single
test-case method. To run a single test file, specify the <em>f</em>
parameter with a value like <samp>HTMLPurifier/ConfigSchemaTest.php</samp>,
which will cause HTML Purifier to run only that test. (In web URL
speak, this means <em>tests/index.php?f=HTMLPurifier/ConfigSchemaTest.php</em>;
in command line speak, this means <kbd>php tests/index.php -f HTMLPurifier/ConfigSchemaTest.php</kbd>.)
To run only a single test <em>method</em>, prefix that method's name with
<code>__only</code>. Be sure to revert this change when you're done
hammering away, and don't forget to test <em>everything</em> before committing.
</li>
<li>
HTML Purifier does not have a debugging/verbose mode, so any internal
data-checks need to be <code>var_dump</code>'ed by the user.
<a href="http://www.xdebug.org/">XDebug</a> makes var_dump'ing a pleasure
by colorizing and escaping output. (The stack traces are also quite
handy!) There is also a function called <code>printTokens($tokens, $index)</code> specifically
for outputting arrays of tokens. The <var>$index</var> variable
indicates a token to make bold, and can be omitted.
</li>
<li>
There's a Debugger class. Don't use it. It kinda sucks.
</li>
<li>
If it seems like a change you made had no effect on your tests, try
flushing with the <em>flush</em> parameter (<kbd>php tests/index.php --flush</kbd>
on the command line).
</li>
<li>
SimpleTest's error message when an <code>assertIdentical</code> assertion fails with
strings is incomprehensible, so keep your test strings small or be ready to
<code>var_dump</code> if necessary.
</li>
<li>
Beware whitespace. Tests should work whether or not they're Unix (LF), Windows (CRLF)
or Mac (CR) encoded. This usually means <em>not</em> using <code>PHP_EOL</code>
but rather a literal newline in the source code.
</li>
</ul>
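
<p>
If you suspect a whitespace problem, it helps to know which line endings a
file in your working copy actually uses. The following sketch is one way to
find out; <code>line_endings</code> is a hypothetical helper, and it reports
a file with mixed endings as CRLF.
</p>

```shell
# line_endings FILE
# Prints "LF", "CRLF" or "CR" depending on which line-ending bytes FILE contains.
line_endings() {
    # Count carriage returns and newlines separately.
    cr_count=$(( $(cat "$1" | tr -cd '\r' | wc -c) ))
    lf_count=$(( $(cat "$1" | tr -cd '\n' | wc -c) ))

    if [ "$cr_count" -eq 0 ]; then
        echo "LF"      # no carriage returns at all: Unix
    elif [ "$lf_count" -eq 0 ]; then
        echo "CR"      # carriage returns only: old Mac
    else
        echo "CRLF"    # both present: Windows (or mixed)
    fi
}
```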

</div>
</body>
</html>