DMVCCM.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   2                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   3 <html xmlns="http://www.w3.org/1999/xhtml"
   4 lang="en" xml:lang="en">
   5 <head>
   6 <title>DMV/CCM &ndash; todo-list / progress</title>
   7 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
   8 <meta name="generator" content="Org-mode"/>
   9 <meta name="generated" content="2008/06/09 13:04:26"/>
  10 <meta name="author" content="Kevin Brubeck Unhammer"/>
  11 <link rel="stylesheet" type="text/css" href="http://www.student.uib.no/~kun041/org.css"/>
  12 </head><body>
  13 <h1 class="title">DMV/CCM &ndash; todo-list / progress</h1>
  14
  15
  16 <div id="table-of-contents">
  17 <h2>Table of Contents</h2>
  18 <div id="text-table-of-contents">
  19 <ul>
  20 <li><a href="#sec-1">1 dmvccm report and project</a></li>
  21 <li><a href="#sec-2">2 [#A] P_STOP and P_CHOOSE for IO/EM</a></li>
  22 <li><a href="#sec-3">3 Find Most Probable Parse of given test sentence</a></li>
  23 <li><a href="#sec-4">4 Combine CCM with DMV</a></li>
  24 <li><a href="#sec-5">5 Get a dependency parsed version of WSJ to test with</a></li>
  25 <li><a href="#sec-6">6 Initialization   </a></li>
  26 <li><a href="#sec-7">7 [#C] Deferred</a></li>
  27 <li><a href="#sec-8">8 Adjacency and combining it with inner()</a></li>
  28 <li><a href="#sec-9">9 Expectation Maximization in IO/DMV-terms</a></li>
  29 <li><a href="#sec-10">10 Python-stuff</a></li>
  30 <li><a href="#sec-11">11 Git</a></li>
  31 </ul>
  32 </div>
  33 </div>
  34
  35 <div id="outline-container-1" class="outline-2">
  36 <h2 id="sec-1">1 dmvccm report and project</h2>
  37 <div id="text-1">
  38
  39 <p><span class="timestamp-kwd">DEADLINE: </span> <span class="timestamp">2008-06-30 Mon</span><br/>
  40 But absolute, extended, really-quite-dead-now deadline: August
  41 31&hellip;
  42 </p><ul>
  43 <li>
  44 <a href="src/dmv.py">dmv.py</a>
  45 </li>
  46 <li>
  47 <a href="src/io.py">io.py</a>
  48 </li>
  49 <li>
  50 <a href="src/harmonic.py">harmonic.py</a>
  51
  52
  53 </li>
  54 </ul>
  55 </div>
  56
  57 </div>
  58
  59 <div id="outline-container-2" class="outline-2">
  60 <h2 id="sec-2">2 <span class="todo">TODO</span> [#A] P_STOP and P_CHOOSE for IO/EM</h2>
  61 <div id="text-2">
  62
  63 <p><a href="src/dmv.py">dmv-P_STOP</a>
  64 Remember: The P<sub>STOP</sub> formula is upside-down (left-to-right also).
  65 (In the article, not the thesis.)
  66 </p>
  67 <p>
  68 Remember: Initialization makes some "short-cut" rules; these will also
  69 have to be updated along with the other P<sub>STOP</sub> updates:
  70 </p><ul>
  71 <li>
  72 b[(NOBAR, n<sub>h</sub>), 'h'] = 1.0       # always
  73 </li>
  74 <li>
  75 b[(RBAR, n<sub>h</sub>), 'h'] = h_.probA  # h_ is RBAR stop rule
  76 </li>
  77 <li>
  78 b[(LRBAR, n<sub>h</sub>), 'h'] = h_.probA * _h_.probA  # _h_ is LRBAR stop rule
  79
  80 </li>
  81 <li id="sec-2.1">P_STOP formulas for various dir and adj:<br/>
  82 Assuming this:
  83 <ul>
  84 <li>
  85 P<sub>STOP</sub>(STOP|h,L,non_adj) = &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t</sub>
  86 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t</sub>
  87 inner(s,t,h_, &hellip;)
  88 </li>
  89 <li>
  90 P<sub>STOP</sub>(STOP|h,L,adj) = &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>t</sub>
  91 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>t</sub>
  92 inner(s,t,h_, &hellip;)
  93 </li>
  94 <li>
  95 P<sub>STOP</sub>(STOP|h,R,non_adj) = &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t&gt;loc(h)</sub>
  96 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t&gt;loc(h)</sub>
  97 inner(s,t,h_, &hellip;)
  98 </li>
  99 <li>
 100 P<sub>STOP</sub>(STOP|h,R,adj) = &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t=loc(h)</sub>
 101 inner(s,t,_h_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s</sub> &sum;<sub>t=loc(h)</sub>
 102 inner(s,t,h_, &hellip;)
 103
 104 </li>
 105 </ul>
 106
 107 <p>(And P<sub>STOP</sub>(-STOP|&hellip;) = 1 - P<sub>STOP</sub>(STOP|&hellip;) )
 108 </p></li>
 109 <li id="sec-2.2">P_CHOOSE formula:<br/>
 110 Assuming this:
 111 <ul>
 112 <li>
 113 P<sub>CHOOSE</sub>(a|h,L) = &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t&gt;=loc(h)</sub> &sum;<sub>r&lt;t</sub>
 114 inner(s,r,_a_, &hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s&lt;loc(h)</sub> &sum;<sub>t&gt;=loc(h)</sub>
 115 inner(s,t,h_,&hellip;)
 116 <ul>
 117 <li>
 118 t &gt;= loc(h) since there are many possibilities for
 119 right-attachments below, and each of them alone gives a lower
 120 probability (through multiplication) to the upper tree (so sum
 121 them all)
 122 </li>
 123 </ul>
 124 </li>
 125 <li>
 126 P<sub>CHOOSE</sub>(a|h,R) = &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>r&gt;s</sub>
 127 inner(s,r,_a_,&hellip;) / &sum;<sub>corpus</sub> &sum;<sub>s=loc(h)</sub> &sum;<sub>t</sub>
 128 inner(s,t,h, &hellip;)
 129
 130
 131 </li>
 132 </ul>
 133 </li>
 134 </ul>
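<p>
The P<sub>STOP</sub> ratios above can be sketched in Python as follows. This is only
a minimal sketch under stated assumptions, not the code in dmv.py: the
<code>inner</code> callable, its argument order, the node encoding
<code>('_h_', head)</code> / <code>('h_', head)</code>, and the helper names
<code>spans</code> and <code>reestimate_p_stop</code> are all hypothetical, and
spans are assumed to be inclusive and to contain loc(h).
</p>

```python
def spans(loc_h, n, direction, adjacent):
    """Yield inclusive (s, t) spans around loc_h that match the summation
    limits of one direction/adjacency case (assumption: s and t are
    inclusive word indices and every span contains loc_h)."""
    if direction == 'L':
        # adjacent: s = loc(h); non-adjacent: s strictly left of loc(h)
        starts = [loc_h] if adjacent else range(0, loc_h)
        ends = range(loc_h, n)
    else:
        starts = range(0, loc_h + 1)
        # adjacent: t = loc(h); non-adjacent: t strictly right of loc(h)
        ends = [loc_h] if adjacent else range(loc_h + 1, n)
    for s in starts:
        for t in ends:
            yield s, t


def reestimate_p_stop(corpus, head, direction, adjacent, inner):
    """Ratio of summed inner values for the stopped node (_h_) over the
    not-yet-stopped node (h_), summed over the whole corpus.

    `inner` is a hypothetical callable inner(s, t, node, loc_h, sent)."""
    num = 0.0
    den = 0.0
    for sent in corpus:
        n = len(sent)
        for loc_h, tag in enumerate(sent):
            if tag != head:
                continue
            for s, t in spans(loc_h, n, direction, adjacent):
                num += inner(s, t, ('_h_', head), loc_h, sent)
                den += inner(s, t, ('h_', head), loc_h, sent)
    # Guard against heads that never occur with such a span.
    return num / den if den > 0 else 0.0
```

<p>
P<sub>STOP</sub>(-STOP|&hellip;) is then one minus the returned value; the
P<sub>CHOOSE</sub> sums could reuse the same span enumeration with an extra
loop over the split point r.
</p>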
 135 </div>
 136
 137 </div>
 138
 139 <div id="outline-container-3" class="outline-2">
 140 <h2 id="sec-3">3 <span class="todo">TOGROK</span> Find Most Probable Parse of given test sentence</h2>
 141 <div id="text-3">
 142
 143 <p>We could probably do it pretty easily from the chart:
 144 </p><ul>
 145 <li>
 146 for all loc_h, select the (0,len(sent),ROOT,loc_h) that has the highest p
 147 </li>
 148 <li>
 149 for stops, just keep going
 150 </li>
 151 <li>
 152 for rewrites, for loc_dep select the one that has the highest p
 153 </li>
 154 </ul>
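<p>
A minimal sketch of the selection steps above, assuming (hypothetically) that
the chart is a dict keyed by <code>(s, t, LHS, loc_h)</code> tuples; the
function names are made up for illustration. Note that if the chart holds
summed inner probabilities, picking local maxima only approximates the most
probable parse; a Viterbi-style chart would store maxima instead of sums.
</p>

```python
def best_root(chart, sent):
    """Return (loc_h, p) for the ROOT entry spanning the whole sentence
    that has the highest probability. Missing entries count as 0.0."""
    n = len(sent)
    loc_h = max(range(n), key=lambda i: chart.get((0, n, 'ROOT', i), 0.0))
    return loc_h, chart.get((0, n, 'ROOT', loc_h), 0.0)


def best_dependent(candidates):
    """For a rewrite step: given (loc_dep, p) pairs, keep the pair with
    the highest p."""
    return max(candidates, key=lambda pair: pair[1])
```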
 155 </div>
 156
 157 </div>
 158
 159 <div id="outline-container-4" class="outline-2">
 160 <h2 id="sec-4">4 <span class="todo">TOGROK</span> Combine CCM with DMV</h2>
 161 <div id="text-4">
 162
 163 </div>
 164
 165 </div>
 166
 167 <div id="outline-container-5" class="outline-2">
 168 <h2 id="sec-5">5 <span class="todo">TODO</span> Get a dependency parsed version of WSJ to test with</h2>
 169 <div id="text-5">
 170
 171 </div>
 172
 173 </div>
 174
 175 <div id="outline-container-6" class="outline-2">
 176 <h2 id="sec-6">6 Initialization   </h2>
 177 <div id="text-6">
 178
 179 <p><a href="src/dmv.py">dmv-inits</a>
 180 </p>
 181 <p>
 182 We do have to go through the corpus, since the probabilities are based
 183 on how far away in the sentence arguments are from their heads.
 184 </p><ul>
 185 <li id="sec-6.1"><span class="todo">TOGROK</span> CCM Initialization    <br/>
 186 P<sub>SPLIT</sub> used here&hellip; how, again?
 187 </li>
 188 <li id="sec-6.2"><span class="done">DONE</span> Separate initialization to another file?                      &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span><br/>
 189 <span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-06-08 Sun 12:51</span><br/>
 190 <a href="src/harmonic.py">harmonic.py</a>
 191 </li>
 192 <li id="sec-6.3"><span class="done">DONE</span> DMV Initialization probabilities<br/>
 193 (from initialization frequency)
 194 </li>
 195 <li id="sec-6.4"><span class="done">DONE</span> DMV Initialization frequencies    <br/>
 196 <span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-05-27 Tue 20:04</span><br/>
 197 <ul>
 198 <li id="sec-6.4.1">P_STOP    <br/>
 199 P<sub>STOP</sub> is not well defined by K&amp;M. One possible interpretation given
 200 the sentence [det nn vb nn] is
 201 <pre>
 202  f_{STOP}( STOP|det, L, adj) +1
 203  f_{STOP}(-STOP|det, L, adj) +0
 204  f_{STOP}( STOP|det, L, non_adj) +1
 205  f_{STOP}(-STOP|det, L, non_adj) +0
 206  f_{STOP}( STOP|det, R, adj) +0
 207  f_{STOP}(-STOP|det, R, adj) +1
 208
 209  f_{STOP}( STOP|nn, L, adj) +0
 210  f_{STOP}(-STOP|nn, L, adj) +1
 211  f_{STOP}( STOP|nn, L, non_adj) +1  # since there's at least one to the left
 212  f_{STOP}(-STOP|nn, L, non_adj) +0
 213 </pre>
 214 <ul>
 215 <li id="sec-6.4.1.1"><span class="todo">TODO</span> tweak<br/>
 216 <pre>
 217             f[head,  'STOP', 'LN'] += (i_h &lt;= 1)     # first two words
 218             f[head, '-STOP', 'LN'] += (not i_h &lt;= 1)
 219             f[head,  'STOP', 'LA'] += (i_h == 0)     # very first word
 220             f[head, '-STOP', 'LA'] += (not i_h == 0)
 221             f[head,  'STOP', 'RN'] += (i_h &gt;= n - 2) # last two words
 222             f[head, '-STOP', 'RN'] += (not i_h &gt;= n - 2)
 223             f[head,  'STOP', 'RA'] += (i_h == n - 1) # very last word
 224             f[head, '-STOP', 'RA'] += (not i_h == n - 1)
 225 </pre>
 226 vs
 227 <pre>
 228             # this one requires some additional rewriting since it
 229             # introduces divisions by zero
 230             f[head,  'STOP', 'LN'] += (i_h == 1)     # second word
 231             f[head, '-STOP', 'LN'] += (not i_h &lt;= 1) # not first two
 232             f[head,  'STOP', 'LA'] += (i_h == 0)     # first word
 233             f[head, '-STOP', 'LA'] += (not i_h == 0) # not first
 234             f[head,  'STOP', 'RN'] += (i_h == n - 2)     # second-to-last
 235             f[head, '-STOP', 'RN'] += (not i_h &gt;= n - 2) # not last two
 236             f[head,  'STOP', 'RA'] += (i_h == n - 1)     # last word
 237             f[head, '-STOP', 'RA'] += (not i_h == n - 1) # not last
 238 </pre>
 239 vs
 240 <pre>
 241             f[head,  'STOP', 'LN'] += (i_h == 1)     # second word
 242             f[head, '-STOP', 'LN'] += (not i_h == 1) # not second
 243             f[head,  'STOP', 'LA'] += (i_h == 0)     # first word
 244             f[head, '-STOP', 'LA'] += (not i_h == 0) # not first
 245             f[head,  'STOP', 'RN'] += (i_h == n - 2)     # second-to-last
 246             f[head, '-STOP', 'RN'] += (not i_h == n - 2) # not second-to-last
 247             f[head,  'STOP', 'RA'] += (i_h == n - 1)     # last word
 248             f[head, '-STOP', 'RA'] += (not i_h == n - 1) # not last
 249 </pre>
 250 vs
 251 "all words take the same number of arguments" interpreted as
 252 <pre>
 253 for all heads:
 254     p_STOP(head, 'STOP', 'LN') = 0.3
 255     p_STOP(head, 'STOP', 'LA') = 0.5
 256     p_STOP(head, 'STOP', 'RN') = 0.4
 257     p_STOP(head, 'STOP', 'RA') = 0.7
 258 </pre>
 259 (which we easily may tweak in init_zeros())
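<p>
For comparison, the first variant above could be made runnable roughly like
this. It is a sketch only: the function names and the
<code>(head, dir_adj)</code> keys of the probability table are assumptions,
not what harmonic.py actually does.
</p>

```python
from collections import defaultdict


def stop_frequencies(corpus):
    """Count STOP / -STOP events per head tag using the first variant:
    position in the sentence decides whether the head stops."""
    f = defaultdict(float)
    for sent in corpus:
        n = len(sent)
        for i_h, head in enumerate(sent):
            stops = {
                'LN': 1 >= i_h,        # first two words
                'LA': i_h == 0,        # very first word
                'RN': i_h >= n - 2,    # last two words
                'RA': i_h == n - 1,    # very last word
            }
            for dir_adj, stop in stops.items():
                f[head, 'STOP', dir_adj] += stop
                f[head, '-STOP', dir_adj] += not stop
    return f


def stop_probabilities(f):
    """Normalize the counts so that STOP and -STOP sum to one."""
    p = {}
    for head, dir_adj in {(h, d) for h, _, d in f}:
        stop = f[head, 'STOP', dir_adj]
        total = stop + f[head, '-STOP', dir_adj]
        p[head, dir_adj] = stop / total
    return p
```

<p>
In this variant every occurrence of a head adds exactly one count per
direction/adjacency pair, so the normalization never divides by zero.
</p>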
 260 </li>
 261 </ul>
 262 </li>
 263 <li id="sec-6.4.2">P_CHOOSE<br/>
 264 Go through the corpus, counting distances between heads and
 265 arguments. In [det nn vb nn], we give
 266 <ul>
 267 <li>
 268 f<sub>CHOOSE</sub>(nn|det, R) +1/1 + C
 269 </li>
 270 <li>
 271 f<sub>CHOOSE</sub>(vb|det, R) +1/2 + C
 272 </li>
 273 <li>
 274 f<sub>CHOOSE</sub>(nn|det, R) +1/3 + C
 275 <ul>
 276 <li>
 277 If this were the full corpus, P<sub>CHOOSE</sub>(nn|det, R) would be
 278 (1+1/3+2C) / sum_a f<sub>CHOOSE</sub>(a|det, R)
 279
 280 </li>
 281 </ul>
 282 </li>
 283 </ul>
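<p>
The distance-based counting above can be sketched as follows. The function
names and the <code>(arg, head, direction)</code> key layout are assumptions
made for illustration; <code>C</code> is the constant from the list above.
</p>

```python
from collections import defaultdict


def choose_frequencies(corpus, C=0.0):
    """Every other word in the sentence is a potential argument of the
    head, weighted by 1/distance plus the constant C."""
    f = defaultdict(float)
    for sent in corpus:
        for i_h, head in enumerate(sent):
            for i_a, arg in enumerate(sent):
                if i_a == i_h:
                    continue
                direction = 'R' if i_a > i_h else 'L'
                f[arg, head, direction] += 1.0 / abs(i_h - i_a) + C
    return f


def choose_probabilities(f):
    """Normalize over all arguments of the same head and direction."""
    totals = defaultdict(float)
    for (arg, head, direction), value in f.items():
        totals[head, direction] += value
    return {key: value / totals[key[1], key[2]] for key, value in f.items()}
```

<p>
With the single sentence [det nn vb nn] and C = 0, this gives
f<sub>CHOOSE</sub>(nn|det, R) = 1 + 1/3, in line with the counts listed above.
</p>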
 284
 285 <p>The ROOT gets "each argument with equal probability", so in a sentence
 286 of three words, 1/3 for each (in [nn vb nn], 'nn' gets 2/3). Basically
 287 just a frequency count of the corpus&hellip;
 288 </p></li>
 289 </ul>
 290 </li>
 291 </ul>
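<p>
The ROOT counts described under P_CHOOSE amount to a normalized frequency
count. A small sketch, with a hypothetical function name and a plain dict
keyed by tag:
</p>

```python
from collections import defaultdict


def root_frequencies(corpus):
    """Each word of a sentence is equally likely to be the ROOT argument,
    so a tag gets 1/len(sent) for every position it occupies."""
    f = defaultdict(float)
    for sent in corpus:
        for tag in sent:
            f[tag] += 1.0 / len(sent)
    return f
```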
 292 </div>
 293
 294 </div>
 295
 296 <div id="outline-container-7" class="outline-2">
 297 <h2 id="sec-7">7 [#C] Deferred</h2>
 298 <div id="text-7">
 299
 300 <ul>
 301 <li id="sec-7.1"><span class="todo">TODO</span> if loc_h == t, no need to try right-attachment rules          &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 302 </li>
 303 <li id="sec-7.2"><span class="todo">TODO</span> if loc_h == s, no need to try left-attachment rules           &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/></li>
 304 <li id="sec-7.3"><span class="todo">TODO</span> when reestimating P_STOP etc, remove rules with p &lt; epsilon   &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 305 </li>
 306 <li id="sec-7.4"><span class="todo">TODO</span> inner_dmv, short ranges and impossible attachment             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 307 If t-s &lt;= 2, there can be only one attachment below, so don't recurse
 308 with both Lattach=True and Rattach=True.
 309
 310 <p>
 311 If t-s &lt;= 1, there can be no attachment below, so only recurse with
 312 Lattach=False, Rattach=False.
 313 </p>
 314 <p>
 315 Put this in the loop under rewrite rules (could also do it in the STOP
 316 section, but that would only have an effect on very short sentences).
 317 </p></li>
 318 <li id="sec-7.5"><span class="todo">TODO</span> clean up the module files                                     &nbsp;&nbsp;&nbsp;<span class="tag">PRETTIER</span><br/>
 319 Is there a better way to divide dmv and harmonic? There's a two-way
 320 dependency between the modules. Guess there could be a third file that
 321 imports both the initialization and the actual EM stuff, while a file
 322 containing constants and classes could be imported by all others:
 323 <pre>
 324  dmv.py imports dmv_EM.py imports dmv_classes.py
 325  dmv.py imports dmv_inits.py imports dmv_classes.py
 326 </pre>
 327
 328 </li>
 329 <li id="sec-7.6"><span class="todo">TOGROK</span> Some (tagged) sentences are bound to come twice             &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 330 E.g., first sort and count, so that the corpus
 331 [['nn','vbd','det','nn'],
 332 ['vbd','nn','det','nn'],
 333 ['nn','vbd','det','nn']]
 334 becomes
 335 [(['nn','vbd','det','nn'],2),
 336 (['vbd','nn','det','nn'],1)]
 337 and then in each loop through sentences, make sure we handle the
 338 frequency correctly.
 339
 340 <p>
 341 Is there much to gain here?
 342 </p>
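<p>
A sketch of the counting step using the standard library's
<code>collections.Counter</code>; the function name is hypothetical, and
sentences are turned into tuples because lists cannot be dict keys:
</p>

```python
from collections import Counter


def count_sentences(corpus):
    """Collapse duplicate tagged sentences into (sentence, frequency)
    pairs, most frequent first."""
    return Counter(tuple(sent) for sent in corpus).most_common()
```

<p>
Each loop over sentences would then multiply that sentence's contribution by
its frequency instead of recomputing it.
</p>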
 343 </li>
 344 <li id="sec-7.7"><span class="todo">TOGROK</span> tags as numbers or tags as strings?                         &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 345 Need to clean up the representation.
 346
 347 <p>
 348 Stick with tag-strings in initialization then switch to numbers for
 349 IO-algorithm perhaps? Can probably afford more string-matching in
 350 initialization.
 351 </p></li>
 352 <li id="sec-7.8"><span class="done">DONE</span> inner_dmv() should disregard rules with heads not in sent     &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 353 <span class="timestamp-kwd">CLOSED: </span> <span class="timestamp">2008-06-08 Sun 10:18</span><br/>
 354 If the sentence is "nn vbd det nn", we should not even look at rules
 355 where
 356 <pre>
 357  rule.head() not in "nn vbd det nn".split()
 358 </pre>
 359 This is ruled out by getting rules from g.rules(LHS, sent).
 360
 361 <p>
 362 Also, we optimize this further by saying we don't even recurse into
 363 attachment rules where
 364 <pre>
 365  rule.head() not in sent[ s :r+1]
 366  rule.head() not in sent[r+1:t+1]
 367 </pre>
 368 meaning, if we're looking at the span "vbd det", we only use
 369 attachment rules where both daughters are members of ['vbd','det']
 370 (although we don't (yet) care about removing rules that rewrite to the
 371 same tag if there are no duplicate tags in the span, etc., that would
 372 be a lot of trouble for little potential gain).
 373 </p></li>
 374 </ul>
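<p>
The span check from the last item above could look roughly like this. It is a
sketch that takes the head tags of the two daughters directly; the function
name and signature are assumptions, with s, r and t as inclusive indices as
in the slices above.
</p>

```python
def usable_attachment(left_head, right_head, sent, s, r, t):
    """Only recurse into an attachment rule if the left daughter's head
    tag occurs in sent[s:r+1] and the right daughter's in sent[r+1:t+1]."""
    return left_head in sent[s:r + 1] and right_head in sent[r + 1:t + 1]
```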
 375 </div>
 376
 377 </div>
 378
 379 <div id="outline-container-8" class="outline-2">
 380 <h2 id="sec-8">8 Adjacency and combining it with inner()</h2>
 381 <div id="text-8">
 382
 383 <p>Each DMV_Rule now has both a probN and a probA, for
 384 adjacencies. inner() needs the correct one in each case.
 385 </p>
 386 <p>
 387 Adjacency gives a problem with duplicate words/tags, e.g. in the
 388 sentence "a a b". If this has the dependency structure b-&gt;a<sub>0</sub>-&gt;a<sub>1</sub>,
 389 then b is non-adjacent to a<sub>0</sub> and should use probN (for the LRStop and
 390 the attachment of a<sub>0</sub>), while the other rules should all use
 391 probA. But within the e(0,2,b) we can't just say "oh, a has index 0
 392 so it's not adjacent to 2", since there's also an a at index 1, and
 393 there's also a dependency structure b-&gt;a<sub>1</sub>-&gt;a<sub>0</sub> for that. We want
 394 both. And in possibly much more complex versions.
 395 </p>
 396 <p>
 397 So now we call inner() for each location of a head, and on each
 398 terminal, loc_h must equal s (and t). In the recursive attachment
 399 calls, we use the locations (sentence indices) of words to the left or
 400 right of the head in calls to inner(). loc_h lets us check whether we
 401 need probN or probA.
 402 </p>
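<p>
A minimal sketch of both ideas, under the assumption that a head is adjacent
in a direction exactly when its own sub-span does not extend past loc_h on
that side (i.e. it has no dependents there yet). The function names and
parameters are hypothetical; <code>prob_n</code> and <code>prob_a</code> stand
for a rule's probN and probA.
</p>

```python
def head_locations(sent, tag):
    """All sentence indices where the tag occurs; inner() would be called
    once per location so that duplicate tags are kept apart."""
    return [i for i, word in enumerate(sent) if word == tag]


def rule_prob(prob_n, prob_a, loc_h, head_start, head_end, direction):
    """Pick probA if the head has no dependents yet on the attaching side
    of its own sub-span [head_start, head_end], otherwise probN."""
    if direction == 'L':
        adjacent = (loc_h == head_start)
    else:
        adjacent = (loc_h == head_end)
    return prob_a if adjacent else prob_n
```

<p>
For "a a b", <code>head_locations</code> gives [0, 1] for a, so both
b-&gt;a<sub>0</sub>-&gt;a<sub>1</sub> and b-&gt;a<sub>1</sub>-&gt;a<sub>0</sub>
get their own inner() calls.
</p>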
 403 <p>
 404 In each inner() call, loc_h is the location of the root of this
 405 dependency structure.
 406 </p><ul>
 407 <li id="sec-8.1">Possible alternate type of adjacency<br/>
 408 K&amp;M's adjacency is just whether or not an argument has been generated
 409 in the current direction yet. One could also make a stronger type of
 410 adjacency, where h and a are not adjacent if b is in between, e.g. with
 411 the sentence "a b h" and the structure ((h-&gt;a), (a-&gt;b)), h is
 412 K&amp;M-adjacent to a, but not next to a, since b is in between. It's easy
 413 to check this type of adjacency in inner(), but it needs new rules for
 414 P_STOP reestimation.
 415 </li>
 416 </ul>
 417 </div>
 418
 419 </div>
 420
 421 <div id="outline-container-9" class="outline-2">
 422 <h2 id="sec-9">9 Expectation Maximization in IO/DMV-terms</h2>
 423 <div id="text-9">
 424
 425 <p>inner(s,t,LHS) calculates the inner probability: the summed probability of all trees headed by LHS
 426 from s to t (sentence positions). This uses the P_STOP and P_CHOOSE
 427 values, which have been conveniently distributed into CNF rules as
 428 probN and probA (non-adjacent and adjacent probabilities).
 429 </p>
 430 <p>
 431 When re-estimating, we use the expected values from inner() to get new
 432 values for P_STOP and P_CHOOSE. When we've re-estimated for the entire
 433 corpus, we distribute P_STOP and P_CHOOSE into the CNF rules again, so
 434 that in the next round we use new probN and probA to find
 435 inner-probabilities.
 436 </p>
 437 <p>
 438 The distribution of P_STOP and P_CHOOSE into CNF rules also happens in
 439 init_normalize() (here along with the creation of P_STOP and
 440 P_CHOOSE); P_STOP is used to create CNF rules where one branch of the
 441 rule is STOP, P_CHOOSE is used to create rules of the form
 442 <pre>
 443  h  -&gt; h  _a_
 444  h_ -&gt; h_ _a_
 445 </pre>
 446 </p>
 447 <p>
 448 Since "adjacency" is not captured in regular CNF rules, we need two
 449 probabilities for each rule, and inner() has to know when to use which.
 450 </p>
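<p>
The accumulate-then-renormalize structure of the re-estimation rounds can be
sketched generically. This shows only the loop skeleton, with sentences as
the outer loop; <code>expected_counts</code> and <code>normalize</code> are
hypothetical callables standing in for the inner()-based sums and for the
redistribution into probN and probA.
</p>

```python
from collections import defaultdict


def reestimate(corpus, params, expected_counts, normalize, iterations=10):
    """Run a fixed number of rounds: sum per-sentence expected counts
    under the current parameters, then turn the totals into new
    parameters for the next round."""
    for _ in range(iterations):
        totals = defaultdict(float)
        for sent in corpus:
            for key, value in expected_counts(sent, params).items():
                totals[key] += value
        params = normalize(totals)
    return params
```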
 451 <ul>
 452 <li id="sec-9.1"><span class="todo">TODO</span> Corpus access<br/>
 453 </li>
 454 <li id="sec-9.2"><span class="todo">TOGROK</span> sentences or rules as the "outer loop"?                     &nbsp;&nbsp;&nbsp;<span class="tag">OPTIMIZE</span><br/>
 455 In regard to the E/M-step, finding P<sub>STOP</sub>, P<sub>CHOOSE</sub>.
 456
 457
 458 </li>
 459 </ul>
 460 </div>
 461
 462 </div>
 463
 464 <div id="outline-container-10" class="outline-2">
 465 <h2 id="sec-10">10 Python-stuff</h2>
 466 <div id="text-10">
 467
 468 <p>Make those debug statements take a bit less space in emacs:
 469 <pre>
 470 (font-lock-add-keywords
 471  'python-mode
 472  '(("\\&lt;\\(\\(io\\.\\)?debug(.+)\\)" 1 font-lock-preprocessor-face t)))
 473 </pre>
 474 </p>
 475 <ul>
 476 <li>
 477 <a href="src/pseudo.py">pseudo.py</a>
 478 </li>
 479 <li>
 480 <a href="http://nltk.org/doc/en/structured-programming.html">http://nltk.org/doc/en/structured-programming.html</a> recursive dynamic
 481 </li>
 482 <li>
 483 <a href="http://nltk.org/doc/en/advanced-parsing.html">http://nltk.org/doc/en/advanced-parsing.html</a>
 484 </li>
 485 <li>
 486 <a href="http://jaynes.colorado.edu/PythonIdioms.html">http://jaynes.colorado.edu/PythonIdioms.html</a>
 487
 488
 489
 490 </li>
 491 </ul>
 492 </div>
 493
 494 </div>
 495
 496 <div id="outline-container-11" class="outline-2">
 497 <h2 id="sec-11">11 Git</h2>
 498 <div id="text-11">
 499
 500 <p>Repository web page: <a href="http://repo.or.cz/w/dmvccm.git">http://repo.or.cz/w/dmvccm.git</a>
 501 </p>
 502 <p>
 503 Setting up a new project:
 504 <pre>
 505  git init
 506  git add .
 507  git commit -m "first release"
 508 </pre>
 509 </p>
 510 <p>
 511 Later on: (<code>-a</code> automatically stages modified and deleted tracked files; new files still need <code>git add</code>)
 512 <pre>
 514  git commit -a -m "some subsequent release"
 515 </pre>
 516 </p>
 517 <p>
 518 Then push stuff up to the remote server:
 519 <pre>
 520  git push git+ssh://username@repo.or.cz/srv/git/dmvccm.git master
 521 </pre>
 522 </p>
 523 <p>
 524 (<code>eval `ssh-agent`</code> and <code>ssh-add</code> to avoid having to type in the passphrase all
 525 the time)
 526 </p>
 527 <p>
 528 Make a copy of the (remote) master branch:
 529 <pre>
 530  git clone git://repo.or.cz/dmvccm.git
 531 </pre>
 532 </p>
 533 <p>
 534 Make and name a new branch in this folder:
 535 <pre>
 536  git checkout -b mybranch
 537 </pre>
 538 </p>
 539 <p>
 540 To save changes in <code>mybranch</code>:
 541 <pre>
 542  git commit -a
 543 </pre>
 544 </p>
 545 <p>
 546 Go back to the master branch (uncommitted changes from <code>mybranch</code> are
 547 carried over):
 548 <pre>
 549  git checkout master
 550 </pre>
 551 </p>
 552 <p>
 553 Try out:
 554 <pre>
 555  git add --interactive
 556 </pre>
 557 </p>
 558 <p>
 559 Good tutorial:
 560 <a href="http://www-cs-students.stanford.edu/~blynn//gitmagic/">http://www-cs-students.stanford.edu/~blynn//gitmagic/</a>
 561 </p></div>
 562 </div>
 563 <div id="postamble"><p class="author"> Author: Kevin Brubeck Unhammer
 564 <a href="mailto:K.BrubeckUnhammer at student uva nl ">&lt;K.BrubeckUnhammer at student uva nl &gt;</a>
 565 </p>
 566 <p class="date"> Date: 2008/06/09 13:04:26</p>
 567 </div><p class="postamble">Written using emacs + <a href='http://orgmode.org/'>org-mode</a></p></body>
 568 </html>