DMVCCM.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   2                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   3 <html xmlns="http://www.w3.org/1999/xhtml"
   4 lang="en" xml:lang="en">
   5 <head>
   6 <title>DMV/CCM &ndash; todo-list / progress</title>
   7 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
   8 <meta name="generator" content="Org-mode"/>
   9 <meta name="generated" content="2008/06/10 12:33:43"/>
  10 <meta name="author" content="Kevin Brubeck Unhammer"/>
  11 <link rel="stylesheet" type="text/css" href="http://www.student.uib.no/~kun041/org.css">
  12 </head><body>
  13 <h1 class="title">DMV/CCM &ndash; todo-list / progress</h1>
  14
  15
  16 <div id="table-of-contents">
  17 <h2>Table of Contents</h2>
  18 <div id="text-table-of-contents">
  19 <ul>
  20 <li><a href="#sec-1">1 dmvccm report and project</a></li>
  21 <li><a href="#sec-2">2 [#A] P_STOP and P_CHOOSE for IO/EM</a></li>
  22 <li><a href="#sec-3">3 Find Most Probable Parse of given test sentence</a></li>
  23 <li><a href="#sec-4">4 Combine CCM with DMV</a></li>
  24 <li><a href="#sec-5">5 Get a dependency parsed version of WSJ to test with</a></li>
  25 <li><a href="#sec-6">6 Initialization   </a></li>
  26 <li><a href="#sec-7">7 [#C] Deferred</a></li>
  27 <li><a href="#sec-8">8 Adjacency and combining it with inner()</a></li>
  28 <li><a href="#sec-9">9 Expectation Maximation in IO/DMV-terms</a></li>
  29 <li><a href="#sec-10">10 Python-stuff</a></li>
  30 <li><a href="#sec-11">11 Git</a></li>
  31 </ul>
  32 </div>
  33 </div>
  34
  35 <div id="outline-container-1" class="outline-2">
  36 <h2 id="sec-1">1 dmvccm report and project</h2>
  37 <div id="text-1">
  38
  39 <p><span class="timestamp-kwd">DEADLINE: </span> <span class="timestamp">2008-06-30 Mon</span><br/>
  40 </p><ul>
  41 <li>
  42 <a href="src/dmv.py">dmv.py</a>
  43 </li>
  44 <li>
  45 <a href="src/io.py">io.py</a>
  46 </li>
  47 <li>
  48 <a href="src/harmonic.py">harmonic.py</a>
  49
  50 </li>
  51 </ul>
  52
  53 <p>(But absolute, extended, really-quite-dead-now deadline: August 31&hellip;)
  54 </p></div>
  55
  56 </div>
  57
  58 <div id="outline-container-2" class="outline-2">
  59 <h2 id="sec-2">2 <span class="todo">TODO</span> [#A] P_STOP and P_CHOOSE for IO/EM</h2>
  60 <div id="text-2">
  61
  62 <p><a href="src/dmv.py">dmv-P_STOP</a>
  63 Remember: The P<sub>STOP</sub> formula is upside-down (left-to-right also).
  64 (In the article..not the thesis)
  65 </p>
  66 <p>
  67 Remember: Initialization makes some "short-cut" rules, these will also
  68 have to be updated along with the other P<sub>STOP</sub> updates:
  69 </p><ul>
  70 <li>
  71 b[(NOBAR, n<sub>h</sub>), 'h'] = 1.0       # always
  72 </li>
  73 <li>
  74 b[(RBAR, n<sub>h</sub>), 'h'] = h_.probA  # h_ is RBAR stop rule
  75 </li>
  76 <li>
  77 b[(LRBAR, n<sub>h</sub>), 'h'] = h_.probA * _ h_.probA
  78
  79 </li>
  80 <li id="sec-2.1">P_STOP formulas for various dir and adj:<br/>
  81 Assuming this:
  82 <ul>
  83 <li>
  84 P<sub>STOP</sub>(STOP|h,L,non_adj) = &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t</sub>
  85 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t</sub>
  86 inner(s,t,h_, &hellip;)
  87 </li>
  88 <li>
  89 P<sub>STOP</sub>(STOP|h,L,adj) = &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>t</sub>
  90 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>t</sub>
  91 inner(s,t,h_, &hellip;)
  92 </li>
  93 <li>
  94 P<sub>STOP</sub>(STOP|h,R,non_adj) = &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t&gt;loc(h)</sub>
  95 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t&gt;loc(h)</sub>
  96 inner(s,t,h_, &hellip;)
  97 </li>
  98 <li>
  99 P<sub>STOP</sub>(STOP|h,R,adj) = &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t=loc(h)</sub>
 100 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t=loc(h)</sub>
 101 inner(s,t,h_, &hellip;)
 102
 103 </li>
 104 </ul>
 105
 106 <p>(And P<sub>STOP</sub>(-STOP|&hellip;) = 1 - P<sub>STOP</sub>(STOP|&hellip;) )
 107 </p></li>
 108 <li id="sec-2.2">P_CHOOSE formula:<br/>
 109 Assuming this:
 110 <ul>
 111 <li>
 112 P<sub>CHOOSE</sub>(a|h,L) = &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t&gt;=loc(h)</sub> &sum;<sub>r&lt;t</sub>
 113 inner(s,r,_a_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t&gt;=loc(h)</sub>
 114 inner(s,t,h_,&hellip;)
 115 <ul>
 116 <li>
 117 t &gt;= loc(h) since there are many possibilites for
 118 right-attachments below, and each of them alone gives a lower
 119 probability (through multiplication) to the upper tree (so sum
 120 them all)
 121 </li>
 122 </ul>
 123 </li>
 124 <li>
 125 P<sub>CHOOSE</sub>(a|h,R) = &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>r&gt;s</sub>
 126 inner(s,r,_a_,&hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>t</sub>
 127 inner(s,t,h, &hellip;)
 128
 129
 130 </li>
 131 </ul>
 132 </li>
 133 </ul>
 134 </div>
 135
 136 </div>
 137
 138 <div id="outline-container-3" class="outline-2">
 139 <h2 id="sec-3">3 <span class="todo">TOGROK</span> Find Most Probable Parse of given test sentence</h2>
 140 <div id="text-3">
 141
 142 <p>We could probably do it pretty easily from the chart:
 143 </p><ul>
 144 <li>
 145 for all loc_h, select the (0,len(sent),ROOT,loc_h) that has the highest p
 146 </li>
 147 <li>
 148 for stops, just keep going
 149 </li>
 150 <li>
 151 for rewrites, for loc_dep select the one that has the highest p
 152 </li>
 153 </ul>
 154 </div>
 155
 156 </div>
 157
 158 <div id="outline-container-4" class="outline-2">
 159 <h2 id="sec-4">4 <span class="todo">TOGROK</span> Combine CCM with DMV</h2>
 160 <div id="text-4">
 161
 162 </div>
 163
 164 </div>
 165
 166 <div id="outline-container-5" class="outline-2">
 167 <h2 id="sec-5">5 <span class="todo">TODO</span> Get a dependency parsed version of WSJ to test with</h2>
 168 <div id="text-5">
 169
 170 </div>
 171
 172 </div>
 173
 174 <div id="outline-container-6" class="outline-2">
 175 <h2 id="sec-6">6 Initialization   </h2>
 176 <div id="text-6">
 177
 178 <p><a href="/Users/kiwibird/Documents/Skole/V08/Probability/dmvccm/src/dmv.py">dmv-inits</a>
 179 </p>
 180 <p>
 181 We do have to go through the corpus, since the probabilities are based
 182 on how far away in the sentence arguments are from their heads.
 183 </p><ul>
 184 <li id="sec-6.1"><span class="todo">TOGROK</span> CCM Initialization    <br/>
 185 P<sub>SPLIT</sub> used here&hellip; how, again?
 186 </li>
 187 <li id="sec-6.2"><span class="done">DONE</span> Separate initialization to another file?                      &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span><br/>
 188 <span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-06-08 Sun 12:51</span><br/>
 189 <a href="src/harmonic.py">harmonic.py</a>
 190 </li>
 191 <li id="sec-6.3"><span class="done">DONE</span> DMV Initialization probabilities<br/>
 192 (from initialization frequency)
 193 </li>
 194 <li id="sec-6.4"><span class="done">DONE</span> DMV Initialization frequencies    <br/>
 195 <span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-05-27 Tue 20:04</span><br/>
 196 <ul>
 197 <li id="sec-6.4.1">P_STOP    <br/>
 198 P<sub>STOP</sub> is not well defined by K&amp;M. One possible interpretation given
 199 the sentence [det nn vb nn] is
 200 <pre>
 201  f_{STOP}( STOP|det, L, adj) +1
 202  f_{STOP}(-STOP|det, L, adj) +0
 203  f_{STOP}( STOP|det, L, non_adj) +1
 204  f_{STOP}(-STOP|det, L, non_adj) +0
 205  f_{STOP}( STOP|det, R, adj) +0
 206  f_{STOP}(-STOP|det, R, adj) +1
 207
 208  f_{STOP}( STOP|nn, L, adj) +0
 209  f_{STOP}(-STOP|nn, L, adj) +1
 210  f_{STOP}( STOP|nn, L, non_adj) +1  # since there's at least one to the left
 211  f_{STOP}(-STOP|nn, L, non_adj) +0
 212 </pre>
 213 <ul>
 214 <li id="sec-6.4.1.1"><span class="todo">TODO</span> tweak<br/>
 215 <pre>
 216             f[head,  'STOP', 'LN'] += (i_h &lt;= 1)     # first two words
 217             f[head, '-STOP', 'LN'] += (not i_h &lt;= 1)
 218             f[head,  'STOP', 'LA'] += (i_h == 0)     # very first word
 219             f[head, '-STOP', 'LA'] += (not i_h == 0)
 220             f[head,  'STOP', 'RN'] += (i_h &gt;= n - 2) # last two words
 221             f[head, '-STOP', 'RN'] += (not i_h &gt;= n - 2)
 222             f[head,  'STOP', 'RA'] += (i_h == n - 1) # very last word
 223             f[head, '-STOP', 'RA'] += (not i_h == n - 1)
 224 </pre>
 225 vs
 226 <pre>
 227             # this one requires some additional rewriting since it
 228             # introduces divisions by zero
 229             f[head,  'STOP', 'LN'] += (i_h == 1)     # second word
 230             f[head, '-STOP', 'LN'] += (not i_h &lt;= 1) # not first two
 231             f[head,  'STOP', 'LA'] += (i_h == 0)     # first word
 232             f[head, '-STOP', 'LA'] += (not i_h == 0) # not first
 233             f[head,  'STOP', 'RN'] += (i_h == n - 2)     # second-to-last
 234             f[head, '-STOP', 'RN'] += (not i_h &gt;= n - 2) # not last two
 235             f[head,  'STOP', 'RA'] += (i_h == n - 1)     # last word
 236             f[head, '-STOP', 'RA'] += (not i_h == n - 1) # not last
 237 </pre>
 238 vs
 239 <pre>
 240             f[head,  'STOP', 'LN'] += (i_h == 1)     # second word
 241             f[head, '-STOP', 'LN'] += (not i_h == 1) # not second
 242             f[head,  'STOP', 'LA'] += (i_h == 0)     # first word
 243             f[head, '-STOP', 'LA'] += (not i_h == 0) # not first
 244             f[head,  'STOP', 'RN'] += (i_h == n - 2)     # second-to-last
 245             f[head, '-STOP', 'RN'] += (not i_h == n - 2) # not second-to-last
 246             f[head,  'STOP', 'RA'] += (i_h == n - 1)     # last word
 247             f[head, '-STOP', 'RA'] += (not i_h == n - 1) # not last
 248 </pre>
 249 vs
 250 "all words take the same number of arguments" interpreted as
 251 <pre>
 252 for all heads:
 253     p_STOP(head, 'STOP', 'LN') = 0.3
 254     p_STOP(head, 'STOP', 'LA') = 0.5
 255     p_STOP(head, 'STOP', 'RN') = 0.4
 256     p_STOP(head, 'STOP', 'RA') = 0.7
 257 </pre>
 258 (which we easily may tweak in init_zeros())
 259 </li>
 260 </ul>
 261 </li>
 262 <li id="sec-6.4.2">P_CHOOSE<br/>
 263 Go through the corpus, counting distances between heads and
 264 arguments. In [det nn vb nn], we give
 265 <ul>
 266 <li>
 267 f<sub>CHOOSE</sub>(nn|det, R) +1/1 + C
 268 </li>
 269 <li>
 270 f<sub>CHOOSE</sub>(vb|det, R) +1/2 + C
 271 </li>
 272 <li>
 273 f<sub>CHOOSE</sub>(nn|det, R) +1/3 + C
 274 <ul>
 275 <li>
 276 If this were the full corpus, P<sub>CHOOSE</sub>(nn|det, R) would have
 277 (1+1/3+2C) / sum_a f<sub>CHOOSE</sub>(a|det, R)
 278
 279 </li>
 280 </ul>
 281 </li>
 282 </ul>
 283
 284 <p>The ROOT gets "each argument with equal probability", so in a sentence
 285 of three words, 1/3 for each (in [nn vb nn], 'nn' gets 2/3). Basically
 286 just a frequency count of the corpus&hellip;
 287 </p></li>
 288 </ul>
 289 </li>
 290 </ul>
 291 </div>
 292
 293 </div>
 294
 295 <div id="outline-container-7" class="outline-2">
 296 <h2 id="sec-7">7 [#C] Deferred</h2>
 297 <div id="text-7">
 298
 299 <p><a href="http://wiki.python.org/moin/PythonSpeed/PerformanceTips">http://wiki.python.org/moin/PythonSpeed/PerformanceTips</a> Eg., use
 300 map/reduce/filter/[i for i in [i's]]/(i for i in [i's]) instead of
 301 for-loops; use local variables for globals (global variables or or
 302 functions), etc.
 303 </p>
 304 <ul>
 305 <li id="sec-7.1"><span class="todo">TODO</span> if loc_h == t, no need to try right-attachment rules          &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 306 ** TODO if loc_h == s, no need to try left-attachment rules           :OPTIMIZE:
 307 </li>
 308 <li id="sec-7.3"><span class="todo">TODO</span> when reestimating P_STOP etc, remove rules with p &lt; epsilon   &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 309 </li>
 310 <li id="sec-7.4"><span class="todo">TODO</span> inner_dmv, short ranges and impossible attachment             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 311 If s-t &lt;= 2, there can be only one attachment below, so don't recurse
 312 with both Lattach=True and Rattach=True.
 313
 314 <p>
 315 If s-t &lt;= 1, there can be no attachment below, so only recurse with
 316 Lattach=False, Rattach=False.
 317 </p>
 318 <p>
 319 Put this in the loop under rewrite rules (could also do it in the STOP
 320 section, but that would only have an effect on very short sentences).
 321 </p></li>
 322 <li id="sec-7.5"><span class="todo">TODO</span> clean up the module files                                     &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span><br/>
 323 Is there better way to divide dmv and harmonic? There's a two-way
 324 dependency between the modules. Guess there could be a third file that
 325 imports both the initialization and the actual EM stuff, while a file
 326 containing constants and classes could be imported by all others:
 327 <pre>
 328  dmv.py imports dmv_EM.py imports dmv_classes.py
 329  dmv.py imports dmv_inits.py imports dmv_classes.py
 330 </pre>
 331
 332 </li>
 333 <li id="sec-7.6"><span class="todo">TOGROK</span> Some (tagged) sentences are bound to come twice             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 334 Eg, first sort and count, so that the corpus
 335 [['nn','vbd','det','nn'],
 336 ['vbd','nn','det','nn'],
 337 ['nn','vbd','det','nn']]
 338 becomes
 339 [(['nn','vbd','det','nn'],2),
 340 (['vbd','nn','det','nn'],1)]
 341 and then in each loop through sentences, make sure we handle the
 342 frequency correctly.
 343
 344 <p>
 345 Is there much to gain here?
 346 </p>
 347 </li>
 348 <li id="sec-7.7"><span class="todo">TOGROK</span> tags as numbers or tags as strings?                         &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 349 Need to clean up the representation.
 350
 351 <p>
 352 Stick with tag-strings in initialization then switch to numbers for
 353 IO-algorithm perhaps? Can probably afford more string-matching in
 354 initialization..
 355 </p></li>
 356 <li id="sec-7.8"><span class="done">DONE</span> io.debug parameters should not call functions                 &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 357 <span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-06-10 Tue 12:26</span><br/>
 358 Exchanged all io.debug(str,'level') calls with statements of the form:
 359 <pre>
 360 if 'level' in io.DEBUG:
 361     print str
 362 </pre>
 363
 364 <p>
 365 and got an almost threefold speed increase on inner().
 366 </p></li>
 367 <li id="sec-7.9"><span class="done">DONE</span> inner_dmv() should disregard rules with heads not in sent     &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 368 <span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-06-08 Sun 10:18</span><br/>
 369 If the sentence is "nn vbd det nn", we should not even look at rules
 370 where
 371 <pre>
 372  rule.head() not in "nn vbd det nn".split()
 373 </pre>
 374 This is ruled out by getting rules from g.rules(LHS, sent).
 375
 376 <p>
 377 Also, we optimize this further by saying we don't even recurse into
 378 attachment rules where
 379 <pre>
 380  rule.head() not in sent[ s :r+1]
 381  rule.head() not in sent[r+1:t+1]
 382 </pre>
 383 meaning, if we're looking at the span "vbd det", we only use
 384 attachment rules where both daughters are members of ['vbd','det']
 385 (although we don't (yet) care about removing rules that rewrite to the
 386 same tag if there are no duplicate tags in the span, etc., that would
 387 be a lot of trouble for little potential gain).
 388 </p></li>
 389 </ul>
 390 </div>
 391
 392 </div>
 393
 394 <div id="outline-container-8" class="outline-2">
 395 <h2 id="sec-8">8 Adjacency and combining it with inner()</h2>
 396 <div id="text-8">
 397
 398 <p>Each DMV_Rule now has both a probN and a probA, for
 399 adjacencies. inner() needs the correct one in each case.
 400 </p>
 401 <p>
 402 Adjacency gives a problem with duplicate words/tags, eg. in the
 403 sentence "a a b". If this has the dependency structure b-&gt;a<sub>0</sub>-&gt;a<sub>1</sub>,
 404 then b is non-adjacent to a<sub>0</sub> and should use probN (for the LRStop and
 405 the attachment of a<sub>0</sub>), while the other rules should all use
 406 probA. But within the e(0,2,b) we can't just say "oh, a has index 0
 407 so it's not adjacent to 2", since there's also an a at index 1, and
 408 there's also a dependency structure b-&gt;a<sub>1</sub>-&gt;a<sub>0</sub> for that. We want
 409 both. And in possibly much more complex versions.
 410 </p>
 411 <p>
 412 So now we call inner() for each location of a head, and on each
 413 terminal, loc_h must equal s (and t). In the recursive attachment
 414 calls, we use the locations (sentence indices) of words to the left or
 415 right of the head in calls to inner(). loc_h lets us check whether we
 416 need probN or probA.
 417 </p>
 418 <p>
 419 In each inner() call, loc_h is the location of the root of this
 420 dependency structure.
 421 </p><ul>
 422 <li id="sec-8.1">Possible alternate type of adjacency<br/>
 423 K&amp;M's adjacency is just whether or not an argument has been generated
 424 in the current direction yet. One could also make a stronger type of
 425 adjacency, where h and a are not adjacent if b is in between, eg. with
 426 the sentence "a b h" and the structure ((h-&gt;a), (a-&gt;b)), h is
 427 K&amp;M-adjacent to a, but not next to a, since b is in between. It's easy
 428 to check this type of adjacency in inner(), but it needs new rules for
 429 P_STOP reestimation.
 430 </li>
 431 </ul>
 432 </div>
 433
 434 </div>
 435
 436 <div id="outline-container-9" class="outline-2">
 437 <h2 id="sec-9">9 Expectation Maximation in IO/DMV-terms</h2>
 438 <div id="text-9">
 439
 440 <p>inner(s,t,LHS) calculates the expected number of trees headed by LHS
 441 from s to t (sentence positions). This uses the P_STOP and P_CHOOSE
 442 values, which have been conveniently distributed into CNF rules as
 443 probN and probA (non-adjacent and adjacent probabilites).
 444 </p>
 445 <p>
 446 When re-estimating, we use the expected values from inner() to get new
 447 values for P_STOP and P_CHOOSE. When we've re-estimated for the entire
 448 corpus, we distribute P_STOP and P_CHOOSE into the CNF rules again, so
 449 that in the next round we use new probN and probA to find
 450 inner-probabilites.
 451 </p>
 452 <p>
 453 The distribution of P_STOP and P_CHOOSE into CNF rules also happens in
 454 init_normalize() (here along with the creation of P_STOP and
 455 P_CHOOSE); P_STOP is used to create CNF rules where one branch of the
 456 rule is STOP, P_CHOOSE is used to create rules of the form
 457 <pre>
 458  h  -&gt; h  _a_
 459  h_ -&gt; h_ _a_
 460 </pre>
 461 </p>
 462 <p>
 463 Since "adjacency" is not captured in regular CNF rules, we need two
 464 probabilites for each rule, and inner() has to know when to use which.
 465 </p>
 466 <ul>
 467 <li id="sec-9.1"><span class="todo">TODO</span> Corpus access<br/>
 468 </li>
 469 <li id="sec-9.2"><span class="todo">TOGROK</span> sentences or rules as the "outer loop"?                     &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 470 In regard to the E/M-step, finding P<sub>STOP</sub>, P<sub>CHOOSE</sub>.
 471
 472
 473 </li>
 474 </ul>
 475 </div>
 476
 477 </div>
 478
 479 <div id="outline-container-10" class="outline-2">
 480 <h2 id="sec-10">10 Python-stuff</h2>
 481 <div id="text-10">
 482
 483 <p>Make those debug statements steal a bit less attention in emacs:
 484 <pre>
 485 (font-lock-add-keywords
 486  'python-mode                   ; not really regexp, a bit slow
 487  '(("^\\( *\\)\\(\\if +'.+' +in +io.DEBUG. *\\(
 488 \\1    .+$\\)+\\)" 2 font-lock-preprocessor-face t)))
 489 (font-lock-add-keywords
 490  'python-mode
 491  '(("\\&lt;\\(\\(io\\.\\)?debug(.+)\\)" 1 font-lock-preprocessor-face t)))
 492 </pre>
 493 </p>
 494 <ul>
 495 <li>
 496 <a href="src/pseudo.py">pseudo.py</a>
 497 </li>
 498 <li>
 499 <a href="http://nltk.org/doc/en/structured-programming.html">http://nltk.org/doc/en/structured-programming.html</a> recursive dynamic
 500 </li>
 501 <li>
 502 <a href="http://nltk.org/doc/en/advanced-parsing.html">http://nltk.org/doc/en/advanced-parsing.html</a>
 503 </li>
 504 <li>
 505 <a href="http://jaynes.colorado.edu/PythonIdioms.html">http://jaynes.colorado.edu/PythonIdioms.html</a>
 506
 507
 508
 509 </li>
 510 </ul>
 511 </div>
 512
 513 </div>
 514
 515 <div id="outline-container-11" class="outline-2">
 516 <h2 id="sec-11">11 Git</h2>
 517 <div id="text-11">
 518
 519 <p>Repository web page: <a href="http://repo.or.cz/w/dmvccm.git">http://repo.or.cz/w/dmvccm.git</a>
 520 </p>
 521 <p>
 522 Setting up a new project:
 523 <pre>
 524  git init
 525  git add .
 526  git commit -m "first release"
 527 </pre>
 528 </p>
 529 <p>
 530 Later on: (<code>-a</code> does <code>git rm</code> and <code>git add</code> automatically)
 531 <pre>
 532  git init
 533  git commit -a -m "some subsequent release"
 534 </pre>
 535 </p>
 536 <p>
 537 Then push stuff up to the remote server:
 538 <pre>
 539  git push git+ssh://username@repo.or.cz/srv/git/dmvccm.git master
 540 </pre>
 541 </p>
 542 <p>
 543 (<code>eval `ssh-agent`</code> and <code>ssh-add</code> to avoid having to type in keyphrase all
 544 the time)
 545 </p>
 546 <p>
 547 Make a copy of the (remote) master branch:
 548 <pre>
 549  git clone git://repo.or.cz/dmvccm.git
 550 </pre>
 551 </p>
 552 <p>
 553 Make and name a new branch in this folder
 554 <pre>
 555  git checkout -b mybranch
 556 </pre>
 557 </p>
 558 <p>
 559 To save changes in <code>mybranch</code>:
 560 <pre>
 561  git commit -a
 562 </pre>
 563 </p>
 564 <p>
 565 Go back to the master branch (uncommitted changes from <code>mybranch</code> are
 566 carried over):
 567 <pre>
 568  git checkout master
 569 </pre>
 570 </p>
 571 <p>
 572 Try out:
 573 <pre>
 574  git add --interactive
 575 </pre>
 576 </p>
 577 <p>
 578 Good tutorial:
 579 <a href="http://www-cs-students.stanford.edu/~blynn//gitmagic/">http://www-cs-students.stanford.edu/~blynn//gitmagic/</a>
 580 </p></div>
 581 </div>
 582 <div id="postamble"><p class="author"> Author: Kevin Brubeck Unhammer
 583 <a href="mailto:K.BrubeckUnhammer at student uva nl ">&lt;K.BrubeckUnhammer at student uva nl &gt;</a>
 584 </p>
 585 <p class="date"> Date: 2008/06/10 12:33:43</p>
 586 </div><p class="postamble">Skrive vha. emacs + <a href='http://orgmode.org/'>org-mode</a></p></body>
 587 </html>