<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en" xml:lang="en">
<head>
<title>DMV/CCM &ndash; todo-list / progress</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<meta name="generator" content="Org-mode"/>
<meta name="generated" content="2008/07/25 14:16:20"/>
<meta name="author" content="Kevin Brubeck Unhammer"/>
<link rel="stylesheet" type="text/css" href="http://www.student.uib.no/~kun041/org.css"/>
<!-- override with local style.css: -->
<link rel="stylesheet" type="text/css" href="./style.css"/>
</head><body>
<h1 class="title">DMV/CCM &ndash; todo-list / progress</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1 DMV/CCM report and project</a></li>
<li><a href="#sec-2">2 Notation</a></li>
<li><a href="#sec-3">3 Testing the dependency parsed WSJ</a>
<ul>
<li><a href="#sec-3.1">3.1 [#A] Add <code>(('ROOT',-1), ('sentence_head', loc_h))</code> to <code>dep_parse</code></a></li>
<li><a href="#sec-3.2">3.2 [#A] <code>def evaluate</code></a></li>
</ul>
</li>
<li><a href="#sec-4">4 Combine CCM with DMV</a></li>
<li><a href="#sec-5">5 Reestimate P_ORDER ?</a></li>
<li><a href="#sec-6">6 Most Probable Parse</a>
<ul>
<li><a href="#sec-6.1">6.1 Find MPP with CCM</a></li>
<li><a href="#sec-6.2">6.2 Find Most Probable Parse of given test sentence, in DMV</a></li>
</ul>
</li>
<li><a href="#sec-7">7 [#C] Alternative CNF for DMV</a>
<ul>
<li><a href="#sec-7.1">7.1 move as much as possible into common_dmv.py</a></li>
<li><a href="#sec-7.2">7.2 dmv2cnf re-estimation formulas</a></li>
<li><a href="#sec-7.3">7.3 dmv2cnf IO formulas</a></li>
<li><a href="#sec-7.4">7.4 complete programming the dmv2cnf versions of dmv.py and harmonic.py</a></li>
</ul>
</li>
<li><a href="#sec-8">8 Initialization</a>
<ul>
<li><a href="#sec-8.1">8.1 CCM Initialization</a></li>
</ul>
</li>
<li><a href="#sec-9">9 [#C] Deferred</a>
<ul>
<li><a href="#sec-9.1">9.1 Clean up reestimation code</a></li>
<li><a href="#sec-9.2">9.2 [#A] compare speed of w_left/right(&hellip;) and w(LEFT/RIGHT, &hellip;)</a></li>
<li><a href="#sec-9.3">9.3 when reestimating P_STOP etc, remove rules with p &lt; epsilon</a></li>
<li><a href="#sec-9.4">9.4 inner_dmv, short ranges and impossible attachment</a></li>
<li><a href="#sec-9.5">9.5 clean up the module files</a></li>
<li><a href="#sec-9.6">9.6 Some (tagged) sentences are bound to come twice</a></li>
<li><a href="#sec-9.7">9.7 tags as numbers or tags as strings?</a></li>
</ul>
</li>
<li><a href="#sec-10">10 Adjacency and combining it with the inside-outside algorithm</a>
<ul>
<li><a href="#sec-10.1">10.1 Possible alternate type of adjacency</a></li>
</ul>
</li>
<li><a href="#sec-11">11 Python-stuff</a></li>
<li><a href="#sec-12">12 Git</a></li>
</ul>
</div>
</div>

<div id="outline-container-1" class="outline-2">
<h2 id="sec-1">1 DMV/CCM report and project</h2>
<div id="text-1">

<ul>
<li>
DMV-<a href="tex/formulas.pdf">formulas.pdf</a>  &ndash; <i>clear</i> information =D

</li>
<li>
<a href="src/io.py">io.py</a>
</li>
<li>
<a href="src/loc_h_harmonic.py">loc_h_harmonic.py</a>
</li>
<li>
<a href="src/loc_h_dmv.py">loc_h_dmv.py</a>
</li>
<li>
<a href="src/common_dmv.py">common_dmv.py</a>
</li>
<li>
<a href="src/wsjdep.py">wsjdep.py</a>
</li>
<li>
<a href="src/main.py">main.py</a>


</li>
</ul>

<p><a href="http://www.student.uib.no/~kun041/dmvccm/DMVCCM_archive.html">Archived entries</a> from this file.
</p></div>

</div>

<div id="outline-container-2" class="outline-2">
<h2 id="sec-2">2 Notation</h2>
<div id="text-2">

<p><pre>
 old notes:   new notes:   in tex/code (constants):    in Klein thesis:
--------------------------------------------------------------------------------------
 _h_            _h_            SEAL                    bar over h
  h_             h&gt;&lt;           RGOL                    right-under-left-arrow over h
  h              h&gt;            GOR                     right-arrow over h

               &gt;&lt;h             LGOR                    left-under-right-arrow over h
                &lt;h             GOL                     left-arrow over h
</pre>
These are represented in the code as pairs <code>(s_h,h)</code>, where <code>h</code> is an
integer (POS-tag) and <code>s_h</code> &isin; <code>{SEAL,RGOL,GOR,LGOR,GOL}</code>.
</p>
<p>
<code>P_ATTACH</code> and <code>P_CHOOSE</code> are synonymous; I try to use the
former. Also,
<pre>
 P_GO_AT(a|h,dir,adj) := P_ATTACH(a|h,dir)*(1-P_STOP(STOP|h,dir,adj))
</pre>
</p>
<p>
(precalculated after each reestimation with <code>g.p_GO_AT = make_GO_AT(g.p_STOP,g.p_ATTACH)</code>)
</p>
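<p>
A minimal sketch of what <code>make_GO_AT</code> computes, written straight from the formula above. It assumes (hypothetically; the real grammar class may store things differently) that <code>p_STOP</code> is a dict keyed by <code>(h, dir, adj)</code> and <code>p_ATTACH</code> a dict keyed by <code>(a, h, dir)</code>; the direction and adjacency constants are placeholders too:
</p>
<pre class="src src-python">
LEFT, RIGHT = 0, 1        # placeholder direction constants
NON, ADJ = False, True    # placeholder adjacency flags


def make_GO_AT(p_STOP, p_ATTACH):
    """P_GO_AT(a|h,dir,adj) = P_ATTACH(a|h,dir) * (1 - P_STOP(STOP|h,dir,adj)).

    Assumes p_STOP[h, dir, adj] and p_ATTACH[a, h, dir] dicts. A head with
    no STOP entry is treated as never stopping (P_STOP = 0).
    """
    p_GO_AT = {}
    for (a, h, direction), p_attach in p_ATTACH.items():
        for adj in (NON, ADJ):
            p_stop = p_STOP.get((h, direction, adj), 0.0)
            p_GO_AT[a, h, direction, adj] = p_attach * (1 - p_stop)
    return p_GO_AT
</pre>
<p>
Since inner() needs this product for every attachment, computing it once per reestimation step keeps the multiplication out of the inner loop.
</p>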
</div>

</div>

<div id="outline-container-3" class="outline-2">
<h2 id="sec-3">3 Testing the dependency parsed WSJ</h2>
<div id="text-3">

<p><a href="src/wsjdep.py">wsjdep.py</a> uses NLTK (sort of) to read a dependency-parsed version of
WSJ10 into the format used by <code>mpp()</code> in <code>loc_h_dmv.py</code>.
</p>
<p>
By default, <code>WSJDepCorpusReader</code> looks for the file <code>wsj.combined.10.dep</code> in
<code>../corpus/wsjdep</code>.
</p>
<p>
Only <code>sents()</code>, <code>tagged_sents()</code> and <code>parsed_sents()</code> (plus a new function
<code>tagonly_sents()</code>) are implemented; the other NLTK corpus functions are,
um, undefined&hellip;
</p>
</div>

<div id="outline-container-3.1" class="outline-3">
<h3 id="sec-3.1">3.1 <span class="todo">TODO</span> [#A] Add <code>(('ROOT',-1), ('sentence_head', loc_h))</code> to <code>dep_parse</code></h3>
<div id="text-3.1">

<p><a href="src/wsjdep.py">wsjdep.py</a>
</p></div>
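<p>
A sketch of the ROOT addition, assuming (hypothetically) that <code>dep_parse</code> is a set of <code>(head, argument)</code> pairs in which both elements are <code>(tag, loc)</code> tuples. The sentence head is then the only word that heads something without being anyone's argument. A one-word sentence has no pairs at all, so its head would have to be passed in separately; the sketch just raises an error there:
</p>
<pre class="src src-python">
def add_root(dep_parse):
    """Return a copy of dep_parse with (('ROOT', -1), sentence_head) added.

    dep_parse: set of ((head_tag, loc_h), (arg_tag, loc_a)) pairs.
    """
    heads = set(head for head, arg in dep_parse)
    args = set(arg for head, arg in dep_parse)
    roots = heads - args        # words that head something but are never attached
    if len(roots) != 1:
        raise ValueError('expected exactly one unattached head, got %r' % (roots,))
    (sentence_head,) = roots
    return dep_parse | set([(('ROOT', -1), sentence_head)])
</pre>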

</div>

<div id="outline-container-3.2" class="outline-3">
<h3 id="sec-3.2">3.2 <span class="todo">TODO</span> [#A] <code>def evaluate</code></h3>
<div id="text-3.2">

<p><a href="src/main.py">main.py</a>
</p>
<p>
(just has to count how many of the test pairs are also in the gold parse, and compute precision and recall)
</p></div>
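<p>
A possible shape for <code>evaluate</code>, assuming gold and test parses come as parallel lists with one set of dependency pairs (or <code>(head, argument, dir)</code> triples) per sentence; the signature is hypothetical. Counts are summed over the whole corpus before dividing, so longer sentences weigh more:
</p>
<pre class="src src-python">
def evaluate(gold_parses, test_parses):
    """Corpus-level precision and recall over dependency pairs.

    gold_parses and test_parses are parallel lists holding one set of
    pairs per sentence.
    """
    correct = guessed = in_gold = 0
    for gold, test in zip(gold_parses, test_parses):
        correct += len(gold.intersection(test))   # pairs found in both parses
        guessed += len(test)
        in_gold += len(gold)
    precision = correct / guessed if guessed else 0.0
    recall = correct / in_gold if in_gold else 0.0
    return precision, recall
</pre>
<p>
If every word gets exactly one head in both the gold and the test parse (counting the ROOT attachment), the two sets per sentence have the same size, so precision and recall coincide and both equal directed attachment accuracy.
</p>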
</div>

</div>

<div id="outline-container-4" class="outline-2">
<h2 id="sec-4">4 <span class="todo">TOGROK</span> Combine CCM with DMV</h2>
<div id="text-4">


<p>
<a name="comboquestions">&nbsp;</a>
</p>
<p>
Questions about the <code>P_COMBO</code> info in <a href="http://www.eecs.berkeley.edu/~klein/papers/klein_thesis.pdf">Klein's thesis</a>:
</p><ul>
<li>
Page 109 (pdf: 125): We have to premultiply "all our probabilities"
by the CCM base product <i>&Pi;<sub>&lt;i,j&gt;</sub>   P<sub>SPAN</sub>(&alpha;(i,j,s)|false)P<sub>CONTEXT</sub>(&beta;(i,j,s)|false)</i>; which
probabilities are included under "all"? I'm assuming this includes
<code>P_ATTACH</code>, since each time <code>P_ATTACH</code> is used, <i>&phi;</i> is multiplied in
(pp. 110-111, ibid.); but <i>&phi;</i> is not used for STOPs, so should we not
have our CCM product multiplied in there? What about <code>P_ROOT</code>?
(I'm guessing <code>P_ORDER</code> is way out of the question&hellip;)
</li>
<li>
For the outside probabilities, is it correct to assume we multiply
in <i>&phi;(j,k)</i> or <i>&phi;(k,i)</i> when calculating <code>inner(i,j...)</code>? (I.e., only
for the outside part, not for the whole range.) I don't understand
the notation in <code>O()</code> on p. 103.
</li>
</ul>
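<p>
To make the first question concrete, here is a sketch of the base product and of <i>&phi;</i>. It reads <i>&phi;(i,j)</i> as the ratio P<sub>SPAN</sub>(&alpha;|true)P<sub>CONTEXT</sub>(&beta;|true) / P<sub>SPAN</sub>(&alpha;|false)P<sub>CONTEXT</sub>(&beta;|false), i.e. the factor by which declaring <i>&lt;i,j&gt;</i> a constituent changes the all-distituent product. The dict layout, the <code>'START'</code>/<code>'END'</code> boundary symbols and the restriction to non-empty spans 0 &le; i &lt; j &le; n are assumptions, not taken from the thesis or from the code:
</p>
<pre class="src src-python">
def span_and_context(sent, i, j):
    """alpha(i,j,s) is the span's tag sequence, beta(i,j,s) its context."""
    alpha = tuple(sent[i:j])
    before = sent[i - 1] if i else 'START'          # hypothetical boundary symbol
    after = sent[j] if j != len(sent) else 'END'    # hypothetical boundary symbol
    return alpha, (before, after)


def ccm_base_product(sent, p_SPAN, p_CONTEXT):
    """Product over all non-empty spans of P_SPAN(alpha|false)P_CONTEXT(beta|false)."""
    product = 1.0
    n = len(sent)
    for i in range(n):
        for j in range(i + 1, n + 1):
            alpha, beta = span_and_context(sent, i, j)
            product *= p_SPAN[alpha, False] * p_CONTEXT[beta, False]
    return product


def phi(sent, i, j, p_SPAN, p_CONTEXT):
    """Factor by which making (i,j) a constituent changes the base product."""
    alpha, beta = span_and_context(sent, i, j)
    true_part = p_SPAN[alpha, True] * p_CONTEXT[beta, True]
    false_part = p_SPAN[alpha, False] * p_CONTEXT[beta, False]
    return true_part / false_part
</pre>
<p>
Note that the base product depends only on the sentence, not on which spans end up as constituents.
</p>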
</div>

</div>

<div id="outline-container-5" class="outline-2">
<h2 id="sec-5">5 <span class="todo">TOGROK</span> Reestimate P_ORDER ?</h2>
<div id="text-5">

</div>

</div>

<div id="outline-container-6" class="outline-2">
<h2 id="sec-6">6 Most Probable Parse</h2>
<div id="text-6">


</div>

<div id="outline-container-6.1" class="outline-3">
<h3 id="sec-6.1">6.1 <span class="todo">TOGROK</span> Find MPP with CCM</h3>
<div id="text-6.1">

</div>

</div>

<div id="outline-container-6.2" class="outline-3">
<h3 id="sec-6.2">6.2 <span class="done">DONE</span> Find Most Probable Parse of given test sentence, in DMV</h3>
<div id="text-6.2">

<p><span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-07-23 Wed 10:56</span><br/>
inner() can optionally keep track, in <code>mpptree</code>, of the highest-probability
children of any node. Say we're computing <code>inner(i,j,(s_h,h),loc_h)</code> for
some sentence and find a possible pair of left and right children; we then
add to <code>mpptree[i,j,(s_h,h),loc_h]</code> the triple <code>(p, L, R)</code>, where <code>L</code> and
<code>R</code> have the same form as the key (<code>i,j,(s_h,h),loc_h</code>) and <code>p</code> is the
probability of this node rewriting to <code>L</code> and <code>R</code>,
e.g. <code>inner(L)*inner(R)*p_GO_AT</code>, or <code>p_STOP</code>, or whatever applies. We only add this
entry to <code>mpptree</code> if there wasn't already a higher-probability entry
there.
</p>
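<p>
The bookkeeping above can be isolated in a small helper. A sketch (the helper name is hypothetical, and the real <code>inner()</code> may well do this inline) which keeps the existing entry on ties:
</p>
<pre class="src src-python">
def add_mpp_entry(mpptree, node, p, L, R):
    """Store (p, L, R) under node unless a more probable entry is already there.

    node, L and R are assumed to look like (i, j, (s_h, h), loc_h).
    """
    entry = (p, L, R)
    if node in mpptree:
        # max() returns the first of equally probable entries, i.e. the old one
        entry = max(mpptree[node], entry, key=lambda e: e[0])
    mpptree[node] = entry
</pre>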
<p>
Then, after <code>inner_sent</code> makes an <code>mpptree</code>, we find the <i>relevant</i>
head-argument pairs by searching through the tree using a queue,
adding the <code>L</code> and <code>R</code> keys of any entry to the queue as we find them
(skipping <code>STOP</code> keys), and adding any attachment entries to a set of
triples <code>(head,argument,dir)</code>. Thus we have our most probable parse,
e.g.:
<pre>
 set([( ROOT, (vbd,2),RIGHT),
      ((vbd,2),(nn,1),LEFT),
      ((vbd,2),(nn,3),RIGHT),
      ((nn,1),(det,0),LEFT)])
</pre>
</p></div>
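<p>
A sketch of the queue-based search. It assumes (hypothetically) that <code>STOP</code> children are stored as the string <code>'STOP'</code>, that terminals have no entry in <code>mpptree</code>, and that a child is an argument exactly when its <code>loc_h</code> differs from its parent's. The <code>ROOT</code> triple for the top node uses <code>RIGHT</code>, as in the example set above:
</p>
<pre class="src src-python">
from collections import deque

LEFT, RIGHT = 0, 1      # placeholder direction constants
STOP = 'STOP'           # hypothetical marker for STOP children
ROOT = 'ROOT'


def head_of(node):
    """(tag, location) of the head of a key (i, j, (s_h, h), loc_h)."""
    i, j, (s_h, h), loc_h = node
    return (h, loc_h)


def find_pairs(mpptree, top_node):
    """Collect (head, argument, dir) triples from an mpptree, breadth-first."""
    pairs = set([(ROOT, head_of(top_node), RIGHT)])
    queue = deque([top_node])
    while queue:
        node = queue.popleft()
        if node not in mpptree:             # terminal: nothing below it
            continue
        p, L, R = mpptree[node]
        for child, direction in ((L, LEFT), (R, RIGHT)):
            if child == STOP:
                continue
            queue.append(child)
            if child[3] != node[3]:         # head location changes: an attachment
                pairs.add((head_of(node), head_of(child), direction))
    return pairs
</pre>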
</div>

</div>

<div id="outline-container-7" class="outline-2">
<h2 id="sec-7">7 <span class="todo">TOGROK</span> [#C] Alternative CNF for DMV</h2>
<div id="text-7">

<p>Alternatively, use rules of this form:
<a name="dmv2cnf">&nbsp;</a>
<pre>
  h      Terminal
  h[RA]  Non-Terminal, attaching for the first time to the right
  h[RN]  Non-Terminal, attaching non-adjacently to the right
  h_[RA] Non-Terminal, stopping to the right adjacently
  h_[RN] Non-Terminal, stopping to the right non-adjacently
  h_[LA] Non-Terminal, attaching for the first time to the left
  h_[LN] Non-Terminal, attaching non-adjacently to the left
 _h_[LA] Non-Terminal, stopping to the left adjacently
 _h_[LN] Non-Terminal, stopping to the left non-adjacently
</pre>
</p>
<p>
<pre>
   h[RA] -&gt; h       _a_[LA]  # adjacent right attachment must go to "terminal"
   h[RA] -&gt; h       _a_[LN]  # adjacent right attachment must go to "terminal"

   h[RN] -&gt; h[RA]   _a_[LA]  # already attached to right
   h[RN] -&gt; h[RN]   _a_[LN]

  h_[RA] -&gt; h       STOP     # adjacent right stop must go to "terminal"
  h_[RN] -&gt; h[RN]   STOP     # o/w non-adjacent
  h_[RN] -&gt; h[RA]   STOP

  h_[LA] -&gt; _a_[LA] h_[RA]   # adjacent left attachment must
  h_[LA] -&gt; _a_[LN] h_[RN]   # go to mothers of stop rules

  h_[LN] -&gt; _a_[LA] h_[LN]   # already attached to left
  h_[LN] -&gt; _a_[LN] h_[LA]

 _h_[LA] -&gt; STOP    h_[RA]   # adjacent left stop goes
 _h_[LA] -&gt; STOP    h_[RN]   # straight to a right stop

 _h_[LN] -&gt; STOP    h_[LA]   # non-adjacent left stop
 _h_[LN] -&gt; STOP    h_[LN]   # goes to a left attachment rule
</pre>
</p>
<p>
The reestimation function still has to sum over the various
possibilities of N's and A's, but overall it seems simpler than the
loc_h method.
</p>
<p>
One might reduce the number of rules a tiny bit by having, e.g., unary rules
<pre>
 _a_ -&gt; _a_[LA]
 _a_ -&gt; _a_[LN]
</pre>
etc. (although that might just make it all more confusing)
</p>
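<p>
To see what the schematic rules expand to, they can be transcribed literally from the table above and instantiated for every head tag <code>h</code> and argument tag <code>a</code>. A sketch using plain strings for the nonterminals and <code>'STOP'</code> for the stop symbol; it is not the dmv2cnf code itself:
</p>
<pre class="src src-python">
def dmv2cnf_rules(tags):
    """(mother, left, right) triples for every rule in the dmv2cnf scheme."""
    rules = []
    for h in tags:
        # stop rules: one set per head tag
        rules += [
            (f'{h}_[RA]', h, 'STOP'),
            (f'{h}_[RN]', f'{h}[RN]', 'STOP'),
            (f'{h}_[RN]', f'{h}[RA]', 'STOP'),
            (f'_{h}_[LA]', 'STOP', f'{h}_[RA]'),
            (f'_{h}_[LA]', 'STOP', f'{h}_[RN]'),
            (f'_{h}_[LN]', 'STOP', f'{h}_[LA]'),
            (f'_{h}_[LN]', 'STOP', f'{h}_[LN]'),
        ]
        for a in tags:
            # attachment rules: one set per head/argument pair
            rules += [
                (f'{h}[RA]', h, f'_{a}_[LA]'),
                (f'{h}[RA]', h, f'_{a}_[LN]'),
                (f'{h}[RN]', f'{h}[RA]', f'_{a}_[LA]'),
                (f'{h}[RN]', f'{h}[RN]', f'_{a}_[LN]'),
                (f'{h}_[LA]', f'_{a}_[LA]', f'{h}_[RA]'),
                (f'{h}_[LA]', f'_{a}_[LN]', f'{h}_[RN]'),
                (f'{h}_[LN]', f'_{a}_[LA]', f'{h}_[LN]'),
                (f'{h}_[LN]', f'_{a}_[LN]', f'{h}_[LA]'),
            ]
    return rules
</pre>
<p>
For T tags this gives 7T stop rules and 8T<sup>2</sup> attachment rules.
</p>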

</div>

<div id="outline-container-7.1" class="outline-3">
<h3 id="sec-7.1">7.1 <span class="todo">TODO</span> move as much as possible into common_dmv.py</h3>
<div id="text-7.1">

<p><a href="src/common_dmv.py">common_dmv.py</a>
</p>
<p>
&hellip;and improve the cnf vs. loc_h classes (at <i>least</i> give them different names)
</p></div>

</div>

<div id="outline-container-7.2" class="outline-3">
<h3 id="sec-7.2">7.2 <span class="todo">TODO</span> dmv2cnf re-estimation formulas</h3>
<div id="text-7.2">

<p><a href="tex/formulas.tex">tex</a>
</p></div>

</div>

<div id="outline-container-7.3" class="outline-3">
<h3 id="sec-7.3">7.3 <span class="todo">TODO</span> dmv2cnf IO formulas</h3>
<div id="text-7.3">

<p><a href="tex/formulas.tex">tex</a>
Draw some trees first? Or: they should be the same as L&amp;Y, apart
from also having the STOP rules.
</p></div>

</div>

<div id="outline-container-7.4" class="outline-3">
<h3 id="sec-7.4">7.4 <span class="todo">TODO</span> complete programming the dmv2cnf versions of dmv.py and harmonic.py</h3>
<div id="text-7.4">

</div>
</div>

</div>

<div id="outline-container-8" class="outline-2">
<h2 id="sec-8">8 Initialization</h2>
<div id="text-8">

<p><a href="src/dmv.py">dmv-inits</a>
</p>
<p>
We go through the corpus, since the initial probabilities are based on how
far arguments are from their heads in each sentence.
</p>
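<p>
A minimal sketch of such a distance-based initialization for <code>P_ATTACH</code>, in the spirit of Klein &amp; Manning's harmonic initializer, where arguments are chosen in inverse proportion to (a constant plus) their distance from the head. Here each possible head-argument pair in the corpus simply contributes 1/distance, and the sums are normalized per head and direction. The constant, <code>P_STOP</code> and <code>P_ROOT</code> are left out and the function name is made up, so this is not what <code>dmv.py</code> does line for line:
</p>
<pre class="src src-python">
from collections import defaultdict

LEFT, RIGHT = 0, 1      # placeholder direction constants


def harmonic_attach(corpus):
    """P_ATTACH(a|h,dir) proportional to the summed 1/distance over the corpus."""
    counts = defaultdict(float)     # (a, h, dir) to summed weight
    totals = defaultdict(float)     # (h, dir) to normalizer
    for sent in corpus:
        n = len(sent)
        for loc_h, h in enumerate(sent):
            # every other word in the sentence is a possible argument of h
            args = [(loc_a, LEFT) for loc_a in range(loc_h)]
            args += [(loc_a, RIGHT) for loc_a in range(loc_h + 1, n)]
            for loc_a, direction in args:
                weight = 1.0 / abs(loc_h - loc_a)
                counts[sent[loc_a], h, direction] += weight
                totals[h, direction] += weight
    return {(a, h, d): c / totals[h, d] for (a, h, d), c in counts.items()}
</pre>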
</div>

<div id="outline-container-8.1" class="outline-3">
<h3 id="sec-8.1">8.1 <span class="todo">TOGROK</span> CCM Initialization</h3>
<div id="text-8.1">

<p>P<sub>SPLIT</sub> used here&hellip; how, again?
</p></div>
</div>

</div>

<div id="outline-container-9" class="outline-2">
<h2 id="sec-9">9 [#C] Deferred</h2>
<div id="text-9">

<p><a href="http://wiki.python.org/moin/PythonSpeed/PerformanceTips">http://wiki.python.org/moin/PythonSpeed/PerformanceTips</a> E.g., use
map/reduce/filter/[i for i in [i's]]/(i for i in [i's]) instead of
for-loops; bind globals (global variables or functions) to local
variables, etc.
</p>
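<p>
A small illustration of the local-variable tip. Both functions are made up for the example; how much it helps depends on the Python version, so it is worth measuring with <code>timeit</code> rather than assuming:
</p>
<pre class="src src-python">
import math
import timeit


def global_lookup(xs):
    total = 0.0
    for x in xs:
        total += math.sqrt(x)       # global + attribute lookup on every pass
    return total


def local_lookup(xs):
    sqrt = math.sqrt                # look the function up once, keep it local
    return sum(sqrt(x) for x in xs) # generator expression instead of the loop


if __name__ == '__main__':
    xs = range(10000)
    for f in (global_lookup, local_lookup):
        print(f.__name__, timeit.timeit(lambda: f(xs), number=20))
</pre>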
</div>

<div id="outline-container-9.1" class="outline-3">
<h3 id="sec-9.1">9.1 <span class="todo">TODO</span> Clean up reestimation code                                    &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span></h3>
<div id="text-9.1">

</div>

</div>

<div id="outline-container-9.2" class="outline-3">
<h3 id="sec-9.2">9.2 <span class="todo">TODO</span> [#A] compare speed of w_left/right(&hellip;) and w(LEFT/RIGHT, &hellip;) &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
<div id="text-9.2">

</div>

</div>

<div id="outline-container-9.3" class="outline-3">
<h3 id="sec-9.3">9.3 <span class="todo">TODO</span> when reestimating P_STOP etc, remove rules with p &lt; epsilon   &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
<div id="text-9.3">

</div>

</div>

<div id="outline-container-9.4" class="outline-3">
<h3 id="sec-9.4">9.4 <span class="todo">TODO</span> inner_dmv, short ranges and impossible attachment             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
<div id="text-9.4">

<p>If s-t &lt;= 2, there can be only one attachment below, so don't recurse
with both Lattach=True and Rattach=True.
</p>
<p>
If s-t &lt;= 1, there can be no attachment below, so only recurse with
Lattach=False, Rattach=False.
</p>
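<p>
The two rules above boil down to a lookup from the length of the range to the <code>(Lattach, Rattach)</code> combinations worth recursing on. A sketch, assuming <code>s-t</code> is the number of words in the range (passed in as <code>span_len</code>); the function name is hypothetical:
</p>
<pre class="src src-python">
from itertools import product


def attach_options(span_len):
    """(Lattach, Rattach) combinations worth recursing on for span_len words.

    Only the head: no attachment below. Two words: at most one attachment.
    Three or more: one on each side is possible.
    """
    max_attachments = min(span_len - 1, 2)
    return [(lattach, rattach)
            for lattach, rattach in product((False, True), repeat=2)
            if lattach + rattach in range(max_attachments + 1)]
</pre>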
<p>
Put this in the loop over the rewrite rules (we could also do it in the STOP
section, but that would only have an effect on very short sentences).
</p></div>

</div>

<div id="outline-container-9.5" class="outline-3">
<h3 id="sec-9.5">9.5 <span class="todo">TODO</span> clean up the module files                                     &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span></h3>
<div id="text-9.5">

<p>Is there a better way to divide dmv and harmonic? There's a two-way
dependency between the modules. I guess there could be a third file that
imports both the initialization and the actual EM code, while a file
containing constants and classes is imported by all the others:
<pre>
 dmv.py imports dmv_EM.py imports dmv_classes.py
 dmv.py imports dmv_inits.py imports dmv_classes.py
</pre>
</p>
</div>

</div>

<div id="outline-container-9.6" class="outline-3">
<h3 id="sec-9.6">9.6 <span class="todo">TOGROK</span> Some (tagged) sentences are bound to come twice             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
<div id="text-9.6">

<p>E.g., first sort and count, so that the corpus
</p>
<pre>
 [['nn','vbd','det','nn'],
  ['vbd','nn','det','nn'],
  ['nn','vbd','det','nn']]
</pre>
<p>
becomes
</p>
<pre>
 [(['nn','vbd','det','nn'],2),
  (['vbd','nn','det','nn'],1)]
</pre>
<p>
and then, in each loop through the sentences, make sure we weight each
sentence by its frequency.
</p>
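<p>
A sketch of the sort-and-count step with <code>collections.Counter</code> (sentences become tuples so they can be dict keys), plus the weighting during reestimation: each distinct sentence is processed once and its contribution is multiplied by its count. <code>per_sentence</code> is a hypothetical stand-in for whatever per-sentence quantity the reestimation sums:
</p>
<pre class="src src-python">
from collections import Counter


def count_sents(corpus):
    """[(sentence, frequency), ...], sorted, one entry per distinct sentence."""
    counts = Counter(tuple(sent) for sent in corpus)
    return [(list(sent), n) for sent, n in sorted(counts.items())]


def weighted_sum(counted_corpus, per_sentence):
    """Sum per_sentence(sent) over the corpus, counting each duplicate n times."""
    return sum(n * per_sentence(sent) for sent, n in counted_corpus)
</pre>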
<p>
Is there much to gain here?
</p>
</div>

</div>

<div id="outline-container-9.7" class="outline-3">
<h3 id="sec-9.7">9.7 <span class="todo">TOGROK</span> tags as numbers or tags as strings?                         &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
<div id="text-9.7">

<p>Need to clean up the representation.
</p>
<p>
Perhaps stick with tag strings in initialization, then switch to numbers
for the IO algorithm? We can probably afford more string matching in
initialization.
</p></div>
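<p>
One way to do the switch: keep strings during initialization and build a tag-to-number mapping, plus its inverse for printing, just before the IO algorithm. A sketch with hypothetical function names:
</p>
<pre class="src src-python">
def make_tag_index(corpus):
    """Number the tags 0..T-1; tags[num] maps a number back to its string."""
    tags = sorted(set(tag for sent in corpus for tag in sent))
    tag2num = dict((tag, num) for num, tag in enumerate(tags))
    return tags, tag2num


def encode(corpus, tag2num):
    """Replace every tag string with its number, e.g. before running IO."""
    return [[tag2num[tag] for tag in sent] for sent in corpus]
</pre>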
</div>

</div>

<div id="outline-container-10" class="outline-2">
<h2 id="sec-10">10 Adjacency and combining it with the inside-outside algorithm</h2>
<div id="text-10">

<p>Each DMV_Rule has both a probN and a probA (non-adjacent and adjacent
probability); inner() and outer() need the correct one in each case.
</p>
<p>
In each inner() call, loc_h is the location of the head of this
dependency structure. In each outer() call, it's the head of the <i>Node</i>,
the structure we're looking outside of.
</p>
<p>
We call inner() for each possible head location, and for each terminal,
loc_h must equal <code>i</code> (and <code>loc_h+1</code> must equal <code>j</code>). In the recursive attachment
calls, we use the locations (sentence indices) of words to the left or
right of the head in calls to inner(). <i>loc_h lets us check whether we need probN or probA</i>.
</p>
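<p>
A sketch of how loc_h can decide between the two, assuming ranges are given as <code>(i, j)</code> with <code>j</code> exclusive, as in the terminal case above, and that <code>(i, j)</code> is the range the head's structure covers <i>before</i> the STOP or attachment being scored. The <code>DMV_Rule</code> here is a minimal stand-in with only the two probability fields, not the real class, and the method name is hypothetical:
</p>
<pre class="src src-python">
LEFT, RIGHT = 0, 1      # placeholder direction constants


def adjacent(i, j, loc_h, direction):
    """True if the head at loc_h has no argument yet in this direction.

    The head has taken a left argument only if its range starts left of
    loc_h, and a right argument only if it ends after loc_h + 1.
    """
    if direction == LEFT:
        return i == loc_h
    return j == loc_h + 1


class DMV_Rule(object):
    """Minimal stand-in for the real DMV_Rule: just the two probabilities."""

    def __init__(self, probN, probA):
        self.probN = probN
        self.probA = probA

    def prob(self, i, j, loc_h, direction):
        if adjacent(i, j, loc_h, direction):
            return self.probA
        return self.probN
</pre>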
</div>

<div id="outline-container-10.1" class="outline-3">
<h3 id="sec-10.1">10.1 Possible alternate type of adjacency</h3>
<div id="text-10.1">

<p>K&amp;M's adjacency is just whether or not an argument has been generated
in the current direction yet. One could also make a stronger type of
adjacency, where h and a are not adjacent if b is in between; e.g., with
the sentence "a b h" and the structure ((h-&gt;a), (a-&gt;b)), h is
K&amp;M-adjacent to a, but not next to a, since b is in between. It's easy
to check this type of adjacency in inner(), but it would need new rules for
P_STOP reestimation.
</p></div>
</div>

</div>

<div id="outline-container-11" class="outline-2">
<h2 id="sec-11">11 Python-stuff</h2>
<div id="text-11">

<p>Make those debug statements steal a bit less attention in emacs:
<pre>
(font-lock-add-keywords
 'python-mode                   ; not really regexp, a bit slow
 '(("^\\( *\\)\\(\\if +'.+' +in +io.DEBUG. *\\(
\\1    .+$\\)+\\)" 2 font-lock-preprocessor-face t)))
(font-lock-add-keywords
 'python-mode
 '(("\\&lt;\\(\\(io\\.\\)?debug(.+)\\)" 1 font-lock-preprocessor-face t)))
</pre>
</p>
<ul>
<li>
<a href="src/pseudo.py">pseudo.py</a>
</li>
<li>
<a href="http://nltk.org/doc/en/structured-programming.html">http://nltk.org/doc/en/structured-programming.html</a> recursive dynamic
</li>
<li>
<a href="http://nltk.org/doc/en/advanced-parsing.html">http://nltk.org/doc/en/advanced-parsing.html</a>
</li>
<li>
<a href="http://jaynes.colorado.edu/PythonIdioms.html">http://jaynes.colorado.edu/PythonIdioms.html</a>
</li>
</ul>
</div>

</div>

<div id="outline-container-12" class="outline-2">
<h2 id="sec-12">12 Git</h2>
<div id="text-12">

<p>Repository web page: <a href="http://repo.or.cz/w/dmvccm.git">http://repo.or.cz/w/dmvccm.git</a>
</p>
<p>
Setting up a new project:
<pre>
 git init
 git add .
 git commit -m "first release"
</pre>
</p>
<p>
Later on (<code>-a</code> automatically stages modified and deleted tracked files, i.e. does
the <code>git add</code>/<code>git rm</code> for them; brand-new files still need an explicit <code>git add</code>):
<pre>
 git commit -a -m "some subsequent release"
</pre>
</p>
<p>
Then push stuff up to the remote server:
<pre>
 git push git+ssh://username@repo.or.cz/srv/git/dmvccm.git master
</pre>
</p>
<p>
(<code>eval `ssh-agent`</code> and <code>ssh-add</code> to avoid having to type in the passphrase all
the time)
</p>
<p>
Make a copy of the (remote) master branch:
<pre>
 git clone git://repo.or.cz/dmvccm.git
</pre>
</p>
<p>
Create a new branch in this folder and switch to it:
<pre>
 git checkout -b mybranch
</pre>
</p>
<p>
To save changes in <code>mybranch</code>:
<pre>
 git commit -a
</pre>
</p>
<p>
Go back to the master branch (uncommitted changes from <code>mybranch</code> are
carried over):
<pre>
 git checkout master
</pre>
</p>
<p>
Try out:
<pre>
 git add --interactive
</pre>
</p>
<p>
Good tutorial:
<a href="http://www-cs-students.stanford.edu/~blynn//gitmagic/">http://www-cs-students.stanford.edu/~blynn//gitmagic/</a>
</p></div>
</div>
<div id="postamble"><p class="author"> Author: Kevin Brubeck Unhammer
<a href="mailto:K.BrubeckUnhammer at student uva nl ">&lt;K.BrubeckUnhammer at student uva nl &gt;</a>
</p>
<p class="date"> Date: 2008/07/25 14:16:20</p>
</div><p class="nn-postamble">Written with emacs + <a href='http://orgmode.org/'>org-mode</a></p></body>
</html>