DMVCCM.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   2                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   3 <html xmlns="http://www.w3.org/1999/xhtml"
   4 lang="en" xml:lang="en">
   5 <head>
   6 <title>DMV/CCM &ndash; todo-list / progress</title>
   7 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
   8 <meta name="generator" content="Org-mode"/>
   9 <meta name="generated" content="2008/08/01 21:56:44"/>
  10 <meta name="author" content="Kevin Brubeck Unhammer"/>
   11 <link rel="stylesheet" type="text/css" href="http://www.student.uib.no/~kun041/org.css"/>
   12 <!-- override with local style.css: -->
   13 <link rel="stylesheet" type="text/css" href="./style.css"/>
  14
   15 <script type="text/javascript">
   16 function toggleSemester (semID) {
   17     var sem = document.getElementById(semID);
   18     var cN = sem.childNodes;
   19     for (var i = 0; i < cN.length; i++) {
  20         if (cN[i].className=="outline-4") {
  21             cN[i].className="outline-4-hidden";
  22         }
  23         else if(cN[i].className=="outline-4-hidden") {
  24             cN[i].className="outline-4";
  25         }
  26     }
  27 }
  28 </script>
  29 </head><body>
  30 <h1 class="title">DMV/CCM &ndash; todo-list / progress</h1>
  31 <div id="table-of-contents">
  32 <h2>Table of Contents</h2>
  33 <div id="text-table-of-contents">
  34 <ul>
  35 <li><a href="#sec-1">1 DMV/CCM report and project</a></li>
  36 <li><a href="#sec-2">2 Notation</a></li>
  37 <li><a href="#sec-3">3 Testing the dependency parsed WSJ</a>
  38 <ul>
  39 <li><a href="#sec-3.1">3.1 [#A] Should <code>def evaluate</code> use add_root?</a></li>
  40 </ul>
  41 </li>
  42 <li><a href="#sec-4">4 [#C] Alternative CNF for DMV</a>
  43 <ul>
  44 <li><a href="#sec-4.1">4.1 Do we use special P_ROOT rules?</a></li>
  45 <li><a href="#sec-4.2">4.2 complete programming the dmv2cnf versions of dmv.py and harmonic.py </a></li>
  46 <li><a href="#sec-4.3">4.3 move as much as possible into common_dmv.py</a></li>
  47 <li><a href="#sec-4.4">4.4 dmv2cnf re-estimation formulas</a></li>
  48 <li><a href="#sec-4.5">4.5 dmv2cnf IO formulas</a></li>
  49 </ul>
  50 </li>
  51 <li><a href="#sec-5">5 Combine CCM with DMV</a></li>
  52 <li><a href="#sec-6">6 Reestimate P_ORDER ?</a></li>
  53 <li><a href="#sec-7">7 Most Probable Parse</a>
  54 <ul>
  55 <li><a href="#sec-7.1">7.1 Find MPP with CCM</a></li>
  56 <li><a href="#sec-7.2">7.2 Find Most Probable Parse of given test sentence, in DMV</a></li>
  57 </ul>
  58 </li>
  59 <li><a href="#sec-8">8 Initialization   </a>
  60 <ul>
  61 <li><a href="#sec-8.1">8.1 CCM Initialization    </a></li>
  62 </ul>
  63 </li>
  64 <li><a href="#sec-9">9 [#C] Deferred</a>
  65 <ul>
  66 <li><a href="#sec-9.1">9.1 Clean up reestimation code</a></li>
  67 <li><a href="#sec-9.2">9.2 [#A] compare speed of w_left/right(&hellip;) and w(LEFT/RIGHT, &hellip;)</a></li>
  68 <li><a href="#sec-9.3">9.3 when reestimating P_STOP etc, remove rules with p &lt; epsilon</a></li>
  69 <li><a href="#sec-9.4">9.4 inner_dmv, short ranges and impossible attachment</a></li>
  70 <li><a href="#sec-9.5">9.5 clean up the module files</a></li>
  71 <li><a href="#sec-9.6">9.6 Some (tagged) sentences are bound to come twice</a></li>
  72 <li><a href="#sec-9.7">9.7 tags as numbers or tags as strings?</a></li>
  73 </ul>
  74 </li>
  75 <li><a href="#sec-10">10 Adjacency and combining it with the inside-outside algorithm</a>
  76 <ul>
  77 <li><a href="#sec-10.1">10.1 Possible alternate type of adjacency</a></li>
  78 </ul>
  79 </li>
  80 <li><a href="#sec-11">11 Python-stuff</a></li>
  81 <li><a href="#sec-12">12 Git</a></li>
  82 </ul>
  83 </div>
  84 </div>
  85
  86 <div id="outline-container-1" class="outline-2">
  87 <h2 id="sec-1">1 DMV/CCM report and project</h2>
  88 <div id="text-1">
  89
  90 <ul>
  91 <li>
  92 DMV-<a href="tex/formulas.pdf">formulas.pdf</a>  &ndash; <i>clear</i> information =D
  93 </li>
  94 <li>
  95 <a href="src/main.py">main.py</a> &ndash; evaluation
  96 </li>
  97 <li>
  98 <a href="src/wsjdep.py">wsjdep.py</a> &ndash; corpus
  99 </li>
 100 <li>
 101 <a href="src/loc_h_dmv.py">loc_h_dmv.py</a> &ndash; DMV-IO and reestimation
 102 </li>
 103 <li>
 104 <a href="src/loc_h_harmonic.py">loc_h_harmonic.py</a> &ndash; DMV initialization
 105 </li>
 106 <li>
 107 <a href="src/common_dmv.py">common_dmv.py</a> &ndash; various functions used by loc_h_dmv and others
 108 </li>
 109 <li>
 110 <a href="src/io.py">io.py</a> &ndash; non-DMV IO
 111
 112
 113 </li>
 114 </ul>
 115
 116 <p><a href="http://www.student.uib.no/~kun041/dmvccm/DMVCCM_archive.html">Archived entries</a> from this file.
 117 </p></div>
 118
 119 </div>
 120
 121 <div id="outline-container-2" class="outline-2">
 122 <h2 id="sec-2">2 Notation</h2>
 123 <div id="text-2">
 124
 125 <p><pre>
 126  old notes:   new notes:   in tex/code (constants):    in Klein thesis:
 127 --------------------------------------------------------------------------------------
 128  _h_            _h_            SEAL                    bar over h
 129   h_             h&gt;&lt;           RGOL                    right-under-left-arrow over h
 130   h              h&gt;            GOR                     right-arrow over h
 131
 132                &gt;&lt;h             LGOR                    left-under-right-arrow over h
 133                 &lt;h             GOL                     left-arrow over h
 134 </pre>
 135 These are represented in the code as pairs <code>(s_h,h)</code>, where <code>h</code> is an
 136 integer (POS-tag) and <code>s_h</code> &isin; <code>{SEAL,RGOL,GOR,LGOR,GOL}</code>.
 137 </p>
 138 <p>
  139 <code>P_ATTACH</code> and <code>P_CHOOSE</code> are synonymous; I try to use the
  140 former. Also,
 141 <pre>
  142  P_GO_AT(a|h,dir,adj) := P_ATTACH(a|h,dir)*(1-P_STOP(STOP|h,dir,adj))
 143 </pre>
 144 </p>
 145 <p>
 146 (precalculated after each reestimation with <code>g.p_GO_AT = make_GO_AT(g.p_STOP,g.p_ATTACH)</code>)
 147 </p>
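<p>
A minimal sketch of that precalculation, assuming both tables are plain
dicts keyed as described in the comments. The key layout and this body of
<code>make_GO_AT</code> are illustrative guesses, not the code in loc_h_dmv.py.
</p>

```python
# Assumed (hypothetical) table layout:
#   p_ATTACH[(a, h, direction)]   = P_ATTACH(a | h, dir)
#   p_STOP[(h, direction, adj)]   = P_STOP(STOP | h, dir, adj)
LEFT, RIGHT = "L", "R"
ADJ, NON_ADJ = True, False


def make_GO_AT(p_STOP, p_ATTACH):
    """P_GO_AT(a|h,dir,adj) = P_ATTACH(a|h,dir) * (1 - P_STOP(STOP|h,dir,adj))."""
    p_GO_AT = {}
    for (a, h, direction), p_at in p_ATTACH.items():
        for adj in (ADJ, NON_ADJ):
            p_stop = p_STOP.get((h, direction, adj))
            if p_stop is None:
                continue  # no STOP entry for this context, nothing to precompute
            p_GO_AT[(a, h, direction, adj)] = p_at * (1.0 - p_stop)
    return p_GO_AT


p_STOP = {(0, LEFT, ADJ): 0.25, (0, LEFT, NON_ADJ): 0.5}
p_ATTACH = {(1, 0, LEFT): 0.5}
p_GO_AT = make_GO_AT(p_STOP, p_ATTACH)
```

<p>
Because the table only depends on <code>p_STOP</code> and <code>p_ATTACH</code>, rebuilding it
once per reestimation keeps the product out of the inner loops.
</p>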
 148 </div>
 149
 150 </div>
 151
 152 <div id="outline-container-3" class="outline-2">
 153 <h2 id="sec-3">3 Testing the dependency parsed WSJ</h2>
 154 <div id="text-3">
 155
 156 <p><a href="src/wsjdep.py">wsjdep.py</a> uses NLTK (sort of) to get a dependency parsed version of
 157 WSJ10 into the format used in mpp() in loc_h_dmv.py.
 158 </p>
 159 <p>
 160 As a default, <code>WSJDepCorpusReader</code> looks for the file <code>wsj.combined.10.dep</code> in
 161 <code>../corpus/wsjdep</code>.
 162 </p>
 163 <p>
  164 Only <code>sents()</code>, <code>tagged_sents()</code> and <code>parsed_sents()</code> (plus a new function
  165 <code>tagonly_sents()</code>) are implemented; the other NLTK corpus functions are
  166 left undefined.
 167 </p>
 168 </div>
 169
 170 <div id="outline-container-3.1" class="outline-3">
 171 <h3 id="sec-3.1">3.1 <span class="todo">TODO</span> [#A] Should <code>def evaluate</code> use add_root?</h3>
 172 <div id="text-3.1">
 173
 174 <p><a href="src/main.py">main.py</a> evaluate
 175 <a href="src/wsjdep.py">wsjdep.py</a> add_root
 176 </p>
 177 <p>
  178 (It just has to count how many of the proposed pairs are in the gold parse, for precision and recall.)
 179 </p></div>
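<p>
A rough sketch of that count, assuming both parses are given as sets of
<code>(head, argument)</code> pairs. The function name and the pair format are
made up for illustration; whether the <code>ROOT</code> pair belongs in the sets is
exactly the <code>add_root</code> question, and here it is simply included in both.
</p>

```python
def precision_recall(proposed, gold):
    """Unlabelled precision and recall over two sets of (head, argument) pairs."""
    correct = len(proposed.intersection(gold))
    precision = correct / float(len(proposed)) if proposed else 0.0
    recall = correct / float(len(gold)) if gold else 0.0
    return precision, recall


# Each word is written as (tag, position); ROOT is a plain string.
gold = {("ROOT", ("vbd", 1)),
        (("vbd", 1), ("nn", 0)),
        (("vbd", 1), ("nn", 3)),
        (("nn", 3), ("det", 2))}
proposed = {("ROOT", ("vbd", 1)),
            (("vbd", 1), ("nn", 0)),
            (("vbd", 1), ("det", 2)),
            (("nn", 3), ("det", 2))}
precision, recall = precision_recall(proposed, gold)
```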
 180 </div>
 181
 182 </div>
 183
 184 <div id="outline-container-4" class="outline-2">
 185 <h2 id="sec-4">4 <span class="todo">TODO</span> [#C] Alternative CNF for DMV</h2>
 186 <div id="text-4">
 187
 188
 189 <p>
 190 <a name="dmv2cnf">&nbsp;</a>
 191 </p><ul>
 192 <li>
 193 <a href="src/cnf_dmv.py">cnf_dmv.py</a>
 194 </li>
 195 <li>
 196 <a href="src/cnf_harmonic.py">cnf_harmonic.py</a>
 197
 198 </li>
 199 </ul>
 200
 201 <p>See section 5 of <a href="tex/formulas.pdf">formulas.pdf</a>.
 202 </p>
 203 <p>
 204 Given a grammar with certain p_ATTACH, p_STOP and p_ROOT, we get:
 205 <pre>
  206 &gt;&gt;&gt; print testgrammar_h()
 207   h&gt;&lt; --&gt;   h&gt;  STOP   [0.30]
 208   h&gt;&lt; --&gt;  &gt;h&gt;  STOP   [0.40]
 209  _h_  --&gt; STOP    h&gt;&lt;  [1.00]
 210  _h_  --&gt; STOP   &lt;h&gt;&lt;  [1.00]
 211  &gt;h&gt;  --&gt;   h&gt;   _h_   [1.00]
 212  &gt;h&gt;  --&gt;  &gt;h&gt;   _h_   [1.00]
 213  &lt;h&gt;&lt; --&gt;  _h_    h&gt;&lt;  [0.70]
 214  &lt;h&gt;&lt; --&gt;  _h_   &lt;h&gt;&lt;  [0.60]
 215 ROOT  --&gt; STOP   _h_   [1.00]
 216 </pre>
 217 </p>
 218
 219
 220 </div>
 221
 222 <div id="outline-container-4.1" class="outline-3">
 223 <h3 id="sec-4.1">4.1 <span class="todo">TOGROK</span> Do we use special P_ROOT rules?</h3>
 224 <div id="text-4.1">
 225
 226 <p>P<sub>ROOT</sub> doesn't care about adjacency anyway, so I guess it doesn't
 227 matter if we just sum over all head <b>types</b> in a sentence.
 228 </p>
 229 <p>
 230 Or?
 231 </p></div>
 232
 233 </div>
 234
 235 <div id="outline-container-4.2" class="outline-3">
 236 <h3 id="sec-4.2">4.2 <span class="todo">TODO</span> complete programming the dmv2cnf versions of dmv.py and harmonic.py </h3>
 237 <div id="text-4.2">
 238
 239 </div>
 240
 241 </div>
 242
 243 <div id="outline-container-4.3" class="outline-3">
 244 <h3 id="sec-4.3">4.3 <span class="todo">TODO</span> move as much as possible into common_dmv.py</h3>
 245 <div id="text-4.3">
 246
 247 <p><a href="src/common_dmv.py">common_dmv.py</a>
 248 </p>
 249 <p>
 250 &hellip;and improve cnf vs loc_h classes (at <i>least</i> give them different names)
 251 </p></div>
 252
 253 </div>
 254
 255 <div id="outline-container-4.4" class="outline-3">
 256 <h3 id="sec-4.4">4.4 <span class="todo">TODO</span> dmv2cnf re-estimation formulas</h3>
 257 <div id="text-4.4">
 258
 259 <p><a href="tex/formulas.tex">tex</a>
 260 </p>
 261 <p>
 262 The reestimation function still has to sum over the various
 263 possibilities of N's and A's; but it seems to be simpler than the
 264 loc_h-method altogether.
 265 </p>
 266 <p>
 267 Question: Would it be the same thing to reestimate using completely
 268 regular IO reestimation?
 269 </p></div>
 270
 271 </div>
 272
 273 <div id="outline-container-4.5" class="outline-3">
 274 <h3 id="sec-4.5">4.5 <span class="done">DONE</span> dmv2cnf IO formulas</h3>
 275 <div id="text-4.5">
 276
 277 <p><span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-07-30 Wed 00:40</span><br/>
 278 </p></div>
 279 </div>
 280
 281 </div>
 282
 283 <div id="outline-container-5" class="outline-2">
 284 <h2 id="sec-5">5 <span class="todo">TOGROK</span> Combine CCM with DMV</h2>
 285 <div id="text-5">
 286
 287
 288 <p>
 289 <a name="comboquestions">&nbsp;</a>
 290 </p>
 291 <p>
 292 Questions about the <code>P_COMBO</code> info in <a href="http://www.eecs.berkeley.edu/~klein/papers/klein_thesis.pdf">Klein's thesis</a>:
 293 </p><ul>
 294 <li>
 295 Page 109 (pdf: 125): We have to premultiply "all our probabilities"
 296 by the CCM base product <i>&Pi;<sub>&lt;i,j&gt;</sub>   P<sub>SPAN</sub>(&alpha;(i,j,s)|false)P<sub>CONTEXT</sub>(&beta;(i,j,s)|false)</i>; which
 297 probabilities are included under "all"? I'm assuming this includes
 298 <code>P_ATTACH</code> since each time <code>P_ATTACH</code> is used, <i>&phi;</i> is multiplied in
 299 (pp.110-111 ibid.); but <i>&phi;</i> is not used for STOPs, so should we not
 300 have our CCM product multiplied in there? How about <code>P_ROOT</code>?
 301 (Guessing <code>P_ORDER</code> is way out of the question&hellip;)
 302 </li>
 303 <li>
 304 For the outside probabilities, is it correct to assume we multiply
 305 in <i>&phi;(j,k)</i> or <i>&phi;(k,i)</i> when calculating <code>inner(i,j...)</code>? (Eg., only
 306 for the outside part, not for the whole range.) I don't understand
 307 the notation in <code>O()</code> on p.103.
 308 </li>
 309 </ul>
 310 </div>
 311
 312 </div>
 313
 314 <div id="outline-container-6" class="outline-2">
 315 <h2 id="sec-6">6 <span class="todo">TOGROK</span> Reestimate P_ORDER ?</h2>
 316 <div id="text-6">
 317
 318 </div>
 319
 320 </div>
 321
 322 <div id="outline-container-7" class="outline-2">
 323 <h2 id="sec-7">7 Most Probable Parse</h2>
 324 <div id="text-7">
 325
 326
 327 </div>
 328
 329 <div id="outline-container-7.1" class="outline-3">
 330 <h3 id="sec-7.1">7.1 <span class="todo">TOGROK</span> Find MPP with CCM</h3>
 331 <div id="text-7.1">
 332
 333 </div>
 334
 335 </div>
 336
 337 <div id="outline-container-7.2" class="outline-3">
 338 <h3 id="sec-7.2">7.2 <span class="done">DONE</span> Find Most Probable Parse of given test sentence, in DMV</h3>
 339 <div id="text-7.2">
 340
 341 <p><span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-07-23 Wed 10:56</span><br/>
  342 inner() optionally keeps track of the highest-probability children of
  343 any node in <code>mpptree</code>. Say we're computing <code>inner(i,j,(s_h,h),loc_h)</code> for
  344 a certain sentence and find some possible left and right children:
  345 we then add to <code>mpptree[i,j,(s_h,h),loc_h]</code> the triple <code>(p, L, R)</code>, where <code>L</code> and
  346 <code>R</code> are of the same form as the key (<code>i,j,(s_h,h),loc_h</code>) and <code>p</code> is the
  347 probability of this node rewriting to <code>L</code> and <code>R</code>,
  348 e.g. <code>inner(L)*inner(R)*p_GO_AT</code>, or <code>p_STOP</code>, and so on. We only add this
  349 entry to <code>mpptree</code> if there wasn't a higher-probability entry there
  350 before.
 351 </p>
 352 <p>
 353 Then, after <code>inner_sent</code> makes an <code>mpptree</code>, we find the <i>relevant</i>
 354 head-argument pairs by searching through the tree using a queue,
 355 adding the <code>L</code> and <code>R</code> keys of any entry to the queue as we find them
 356 (skipping <code>STOP</code> keys), and adding any attachment entries to a set of
 357 triples <code>(head,argument,dir)</code>. Thus we have our most probable parse,
 358 eg.
 359 <pre>
 360  set([( ROOT, (vbd,2),RIGHT),
 361       ((vbd,2),(nn,1),LEFT),
 362       ((vbd,2),(nn,3),RIGHT),
 363       ((nn,1),(det,0),LEFT)])
 364 </pre>
 365 </p></div>
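<p>
The bookkeeping and the queue search could look roughly like the sketch
below. It assumes a <code>STOP</code> child is stored as a plain placeholder value and
that an attachment can be recognised by the two children having different
head locations; both are simplifications for illustration (the <code>ROOT</code>
attachment is left out), not the actual code in loc_h_dmv.py.
</p>

```python
from collections import deque

STOP = "STOP"                  # placeholder key for a STOP child (assumed)
LEFT, RIGHT = "LEFT", "RIGHT"


def record(mpptree, key, p, L, R):
    """Keep (p, L, R) for key only if it beats the entry already stored."""
    if key not in mpptree or p > mpptree[key][0]:
        mpptree[key] = (p, L, R)


def attachments(mpptree, start):
    """Walk the best-children table from start, collecting (head, argument, dir)."""
    found = set()
    queue = deque([start])
    while queue:
        key = queue.popleft()
        if key == STOP or key not in mpptree:
            continue               # STOP keys and terminals have no children
        p, L, R = mpptree[key]
        queue.extend([L, R])
        if STOP in (L, R):
            continue               # a STOP rewrite, not an attachment
        loc_h = key[3]
        # The child that shares the parent's head location is the head.
        if L[3] == loc_h:
            head, arg, direction = L, R, RIGHT
        else:
            head, arg, direction = R, L, LEFT
        found.add(((head[2][1], head[3]), (arg[2][1], arg[3]), direction))
    return found


# Toy table for the tag sequence "nn vbd", keys are (i, j, (s_h, h), loc_h).
mpptree = {}
nn_gor, vbd_gor = (0, 1, ("GOR", "nn"), 0), (1, 2, ("GOR", "vbd"), 1)
nn_rgol, vbd_rgol = (0, 1, ("RGOL", "nn"), 0), (1, 2, ("RGOL", "vbd"), 1)
nn_seal = (0, 1, ("SEAL", "nn"), 0)
vbd_rgol_wide = (0, 2, ("RGOL", "vbd"), 1)
top = (0, 2, ("SEAL", "vbd"), 1)
record(mpptree, nn_rgol, 0.9, nn_gor, STOP)
record(mpptree, nn_seal, 0.8, STOP, nn_rgol)
record(mpptree, vbd_rgol, 0.7, vbd_gor, STOP)
record(mpptree, vbd_rgol_wide, 0.1, nn_seal, vbd_rgol)
record(mpptree, vbd_rgol_wide, 0.05, nn_seal, vbd_rgol)   # lower p, ignored
record(mpptree, top, 0.04, STOP, vbd_rgol_wide)
parse = attachments(mpptree, top)
```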
 366 </div>
 367
 368 </div>
 369
 370 <div id="outline-container-8" class="outline-2">
 371 <h2 id="sec-8">8 Initialization   </h2>
 372 <div id="text-8">
 373
 374 <p><a href="/Users/kiwibird/Documents/Skole/V08/Probability/dmvccm/src/dmv.py">dmv-inits</a>
 375 </p>
 376 <p>
 377 We go through the corpus, since the probabilities are based on how far
 378 away in the sentence arguments are from their heads.
 379 </p>
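<p>
As an illustration of a distance-based pass over the corpus, the sketch
below gives every possible head-argument pair in a sentence a weight of
1/distance and normalises per head and direction. The exact weighting and
the table layout are assumptions; loc_h_harmonic.py may use different
constants.
</p>

```python
from collections import defaultdict

LEFT, RIGHT = "L", "R"


def init_attach(corpus):
    """Distance-weighted initial P_ATTACH(a | h, dir) from tag sequences."""
    weight = defaultdict(float)    # (a, h, dir) -> summed 1/distance
    total = defaultdict(float)     # (h, dir)    -> normaliser
    for sent in corpus:
        for loc_h, h in enumerate(sent):
            for loc_a, a in enumerate(sent):
                if loc_a == loc_h:
                    continue
                direction = RIGHT if loc_a > loc_h else LEFT
                w = 1.0 / abs(loc_a - loc_h)   # closer arguments weigh more
                weight[(a, h, direction)] += w
                total[(h, direction)] += w
    return {key: w / total[(key[1], key[2])] for key, w in weight.items()}


p_ATTACH = init_attach([["det", "nn", "vbd"], ["nn", "vbd", "det", "nn"]])
```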
 380 </div>
 381
 382 <div id="outline-container-8.1" class="outline-3">
 383 <h3 id="sec-8.1">8.1 <span class="todo">TOGROK</span> CCM Initialization    </h3>
 384 <div id="text-8.1">
 385
 386 <p>P<sub>SPLIT</sub> used here&hellip; how, again?
 387 </p></div>
 388 </div>
 389
 390 </div>
 391
 392 <div id="outline-container-9" class="outline-2">
 393 <h2 id="sec-9">9 [#C] Deferred</h2>
 394 <div id="text-9">
 395
  396 <p><a href="http://wiki.python.org/moin/PythonSpeed/PerformanceTips">http://wiki.python.org/moin/PythonSpeed/PerformanceTips</a> E.g., use
  397 map/reduce/filter/[i for i in [i's]]/(i for i in [i's]) instead of
  398 for-loops; use local variables in place of globals (global variables or
  399 functions), etc.
 400 </p>
 401 </div>
 402
 403 <div id="outline-container-9.1" class="outline-3">
 404 <h3 id="sec-9.1">9.1 <span class="todo">TODO</span> Clean up reestimation code                                    &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span></h3>
 405 <div id="text-9.1">
 406
 407 </div>
 408
 409 </div>
 410
 411 <div id="outline-container-9.2" class="outline-3">
 412 <h3 id="sec-9.2">9.2 <span class="todo">TODO</span> [#A] compare speed of w_left/right(&hellip;) and w(LEFT/RIGHT, &hellip;) &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
 413 <div id="text-9.2">
 414
 415 </div>
 416
 417 </div>
 418
 419 <div id="outline-container-9.3" class="outline-3">
 420 <h3 id="sec-9.3">9.3 <span class="todo">TODO</span> when reestimating P_STOP etc, remove rules with p &lt; epsilon   &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
 421 <div id="text-9.3">
 422
 423 </div>
 424
 425 </div>
 426
 427 <div id="outline-container-9.4" class="outline-3">
 428 <h3 id="sec-9.4">9.4 <span class="todo">TODO</span> inner_dmv, short ranges and impossible attachment             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
 429 <div id="text-9.4">
 430
  431 <p>If t-s &lt;= 2, there can be only one attachment below, so don't recurse
  432 with both Lattach=True and Rattach=True.
  433 </p>
  434 <p>
  435 If t-s &lt;= 1, there can be no attachment below, so only recurse with
  436 Lattach=False, Rattach=False.
 437 </p>
 438 <p>
 439 Put this in the loop under rewrite rules (could also do it in the STOP
 440 section, but that would only have an effect on very short sentences).
 441 </p></div>
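<p>
One way to express that pruning, assuming the span runs from <code>s</code> to <code>t</code>
(so its width is <code>t - s</code>) and that the recursion is driven by a list of
<code>(Lattach, Rattach)</code> flag pairs. The helper name is hypothetical.
</p>

```python
def attach_flags(s, t):
    """(Lattach, Rattach) combinations worth recursing on for the span s..t."""
    width = t - s
    if 1 >= width:
        return [(False, False)]                  # no room for any attachment
    if 2 >= width:
        # room for at most one attachment, so never both flags at once
        return [(False, False), (True, False), (False, True)]
    return [(False, False), (True, False), (False, True), (True, True)]
```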
 442
 443 </div>
 444
 445 <div id="outline-container-9.5" class="outline-3">
 446 <h3 id="sec-9.5">9.5 <span class="todo">TODO</span> clean up the module files                                     &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span></h3>
 447 <div id="text-9.5">
 448
  449 <p>Is there a better way to divide dmv and harmonic? There's a two-way
 450 dependency between the modules. Guess there could be a third file that
 451 imports both the initialization and the actual EM stuff, while a file
 452 containing constants and classes could be imported by all others:
 453 <pre>
 454  dmv.py imports dmv_EM.py imports dmv_classes.py
 455  dmv.py imports dmv_inits.py imports dmv_classes.py
 456 </pre>
 457 </p>
 458 </div>
 459
 460 </div>
 461
 462 <div id="outline-container-9.6" class="outline-3">
 463 <h3 id="sec-9.6">9.6 <span class="todo">TOGROK</span> Some (tagged) sentences are bound to come twice             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
 464 <div id="text-9.6">
 465
  466 <p>E.g., first sort and count, so that the corpus
  467 <pre>[['nn','vbd','det','nn'],
  468  ['vbd','nn','det','nn'],
  469  ['nn','vbd','det','nn']]</pre>
  470 becomes
  471 <pre>[(['nn','vbd','det','nn'],2),
  472  (['vbd','nn','det','nn'],1)]</pre>
  473 and then in each loop through sentences, make sure we handle the
  474 frequency correctly.
 475 </p>
 476 <p>
 477 Is there much to gain here?
 478 </p>
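<p>
A small sketch of the sort-and-count step using <code>collections.Counter</code>
from the standard library. The function name is made up, and how the
frequency is then used inside reestimation is only hinted at in the last
line.
</p>

```python
from collections import Counter


def count_sentences(corpus):
    """Collapse repeated tag sequences into sorted (sentence, frequency) pairs."""
    counts = Counter(tuple(sent) for sent in corpus)   # lists are unhashable
    return sorted((list(sent), n) for sent, n in counts.items())


corpus = [['nn', 'vbd', 'det', 'nn'],
          ['vbd', 'nn', 'det', 'nn'],
          ['nn', 'vbd', 'det', 'nn']]
counted = count_sentences(corpus)
# Weight each sentence's contribution by its frequency instead of revisiting it.
total_tokens = sum(n * len(sent) for sent, n in counted)
```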
 479 </div>
 480
 481 </div>
 482
 483 <div id="outline-container-9.7" class="outline-3">
 484 <h3 id="sec-9.7">9.7 <span class="todo">TOGROK</span> tags as numbers or tags as strings?                         &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span></h3>
 485 <div id="text-9.7">
 486
 487 <p>Need to clean up the representation.
 488 </p>
 489 <p>
  490 Stick with tag-strings in initialization, then switch to numbers for the
  491 IO-algorithm perhaps? Can probably afford more string-matching in
  492 initialization.
 493 </p></div>
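<p>
If the switch happens between initialization and the IO-algorithm, the
conversion could be as small as the sketch below. The names are
hypothetical; tags are numbered in order of first appearance.
</p>

```python
def make_tag_maps(corpus):
    """Number the tag strings in order of first appearance."""
    tag_to_num = {}
    for sent in corpus:
        for tag in sent:
            tag_to_num.setdefault(tag, len(tag_to_num))
    num_to_tag = {num: tag for tag, num in tag_to_num.items()}
    return tag_to_num, num_to_tag


corpus = [['det', 'nn', 'vbd'], ['nn', 'vbd', 'det', 'nn']]
tag_to_num, num_to_tag = make_tag_maps(corpus)
numbered = [[tag_to_num[tag] for tag in sent] for sent in corpus]
```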
 494 </div>
 495
 496 </div>
 497
 498 <div id="outline-container-10" class="outline-2">
 499 <h2 id="sec-10">10 Adjacency and combining it with the inside-outside algorithm</h2>
 500 <div id="text-10">
 501
  502 <p>Each DMV_Rule has both a probN and a probA, for adjacencies. inner()
  503 and outer() need the correct one in each case.
 504 </p>
 505 <p>
 506 In each inner() call, loc_h is the location of the head of this
 507 dependency structure. In each outer() call, it's the head of the <i>Node</i>,
 508 the structure we're looking outside of.
 509 </p>
 510 <p>
 511 We call inner() for each location of a head, and on each terminal,
 512 loc_h must equal <code>i</code> (and <code>loc_h+1</code> equal <code>j</code>). In the recursive attachment
 513 calls, we use the locations (sentence indices) of words to the left or
 514 right of the head in calls to inner(). <i>loc_h lets us check whether we need probN or probA</i>.
 515 </p>
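<p>
A sketch of how that check might look, under the assumption that
<code>(i, j)</code> is the span the head already covers before the new attachment,
so the head is adjacent on a side exactly when the span does not extend
past it on that side. <code>Rule</code> is a stand-in for <code>DMV_Rule</code>, and the
function names are made up.
</p>

```python
from collections import namedtuple

LEFT, RIGHT = "L", "R"
Rule = namedtuple("Rule", ["probN", "probA"])   # stand-in for DMV_Rule


def is_adjacent(i, j, loc_h, direction):
    """True if the head at loc_h has no argument yet on that side of span i..j."""
    if direction == LEFT:
        return i == loc_h          # span starts at the head, nothing to its left
    return j == loc_h + 1          # span ends right after the head


def attach_prob(rule, i, j, loc_h, direction):
    """Pick probA or probN from a rule, depending on adjacency."""
    return rule.probA if is_adjacent(i, j, loc_h, direction) else rule.probN


rule = Rule(probN=0.2, probA=0.6)
p_first_left = attach_prob(rule, 2, 3, 2, LEFT)    # head alone in its span
p_later_left = attach_prob(rule, 1, 3, 2, LEFT)    # already has a left argument
```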
 516 </div>
 517
 518 <div id="outline-container-10.1" class="outline-3">
 519 <h3 id="sec-10.1">10.1 Possible alternate type of adjacency</h3>
 520 <div id="text-10.1">
 521
 522 <p>K&amp;M's adjacency is just whether or not an argument has been generated
 523 in the current direction yet. One could also make a stronger type of
 524 adjacency, where h and a are not adjacent if b is in between, eg. with
 525 the sentence "a b h" and the structure ((h-&gt;a), (a-&gt;b)), h is
 526 K&amp;M-adjacent to a, but not next to a, since b is in between. It's easy
 527 to check this type of adjacency in inner(), but it needs new rules for
 528 P_STOP reestimation.
 529 </p></div>
 530 </div>
 531
 532 </div>
 533
 534 <div id="outline-container-11" class="outline-2">
 535 <h2 id="sec-11">11 Python-stuff</h2>
 536 <div id="text-11">
 537
 538 <p>Make those debug statements steal a bit less attention in emacs:
 539 <pre>
 540 (font-lock-add-keywords
 541  'python-mode                   ; not really regexp, a bit slow
 542  '(("^\\( *\\)\\(\\if +'.+' +in +io.DEBUG. *\\(
 543 \\1    .+$\\)+\\)" 2 font-lock-preprocessor-face t)))
 544 (font-lock-add-keywords
 545  'python-mode
 546  '(("\\&lt;\\(\\(io\\.\\)?debug(.+)\\)" 1 font-lock-preprocessor-face t)))
 547 </pre>
 548 </p>
 549 <ul>
 550 <li>
 551 <a href="src/pseudo.py">pseudo.py</a>
 552 </li>
 553 <li>
 554 <a href="http://nltk.org/doc/en/structured-programming.html">http://nltk.org/doc/en/structured-programming.html</a> recursive dynamic
 555 </li>
 556 <li>
 557 <a href="http://nltk.org/doc/en/advanced-parsing.html">http://nltk.org/doc/en/advanced-parsing.html</a>
 558 </li>
 559 <li>
 560 <a href="http://jaynes.colorado.edu/PythonIdioms.html">http://jaynes.colorado.edu/PythonIdioms.html</a>
 561
 562
 563
 564 </li>
 565 </ul>
 566 </div>
 567
 568 </div>
 569
 570 <div id="outline-container-12" class="outline-2">
 571 <h2 id="sec-12">12 Git</h2>
 572 <div id="text-12">
 573
 574 <p>Repository web page: <a href="http://repo.or.cz/w/dmvccm.git">http://repo.or.cz/w/dmvccm.git</a>
 575 </p>
 576 <p>
 577 Setting up a new project:
 578 <pre>
 579  git init
 580  git add .
 581  git commit -m "first release"
 582 </pre>
 583 </p>
 584 <p>
  585 Later on (<code>-a</code> stages modified and deleted tracked files automatically; new files still need <code>git add</code>):
  586 <pre>
  588  git commit -a -m "some subsequent release"
 589 </pre>
 590 </p>
 591 <p>
 592 Then push stuff up to the remote server:
 593 <pre>
 594  git push git+ssh://username@repo.or.cz/srv/git/dmvccm.git master
 595 </pre>
 596 </p>
 597 <p>
  598 (<code>eval `ssh-agent`</code> and <code>ssh-add</code> to avoid having to type in the passphrase all
 599 the time)
 600 </p>
 601 <p>
 602 Make a copy of the (remote) master branch:
 603 <pre>
 604  git clone git://repo.or.cz/dmvccm.git
 605 </pre>
 606 </p>
 607 <p>
  608 Make and name a new branch in this folder:
 609 <pre>
 610  git checkout -b mybranch
 611 </pre>
 612 </p>
 613 <p>
 614 To save changes in <code>mybranch</code>:
 615 <pre>
 616  git commit -a
 617 </pre>
 618 </p>
 619 <p>
 620 Go back to the master branch (uncommitted changes from <code>mybranch</code> are
 621 carried over):
 622 <pre>
 623  git checkout master
 624 </pre>
 625 </p>
 626 <p>
 627 Try out:
 628 <pre>
 629  git add --interactive
 630 </pre>
 631 </p>
 632 <p>
 633 Good tutorial:
 634 <a href="http://www-cs-students.stanford.edu/~blynn//gitmagic/">http://www-cs-students.stanford.edu/~blynn//gitmagic/</a>
 635 </p></div>
 636 </div>
 637 <div id="postamble"><p class="author"> Author: Kevin Brubeck Unhammer
 638 <a href="mailto:K.BrubeckUnhammer at student uva nl ">&lt;K.BrubeckUnhammer at student uva nl &gt;</a>
 639 </p>
 640 <p class="date"> Date: 2008/08/01 21:56:44</p>
  641 </div><div class="post-postamble" id="nn-postamble">Written using emacs + <a href='http://orgmode.org/'>org-mode</a></div></body>
 642 </html>