Update Friday, 30th of January, Anno Domini MMIX, at the hour of the Buffalo
1 <html>
2 <head>
3 <title>Dscho's blog</title>
4 <meta http-equiv="Content-Type"
5 content="text/html; charset=UTF-8"/>
6 </head>
7 <body style="width:800px;background-image:url(dscho.git?a=blob_plain;hb=832be85c785c80202f17b87db7f063ae57ec2cac;f=paper.jpg);background-repeat:repeat-y;background-attachment:scroll;padding:0px;">
8 <div style="width:610px;margin-left:120px;margin-top:50px;align:left;vertical-align:top;">
9 <h1>Dscho's blog</h1>
10 <div style="position:absolute;top:50px;left:810px;width:400px">
11 <table width=400px bgcolor=#e0e0e0 border=1>
12 <tr><th>Table of contents:</th></tr>
13 <tr><td>
14 <p><ul>
15 <li><a href=#1233277286>30 Jan 2009 More valgrind fun</a>
16 <li><a href=#1233193467>29 Jan 2009 Interactive stash</a>
17 <li><a href=#1233154567>28 Jan 2009 Splitting topic branches</a>
18 <li><a href=#1233102919>28 Jan 2009 Showing off that you're an Alpine user ... priceless!</a>
19 <li><a href=#1233101919>28 Jan 2009 Progress with the interactive rebase preserving merges</a>
20 <li><a href=#1233099894>28 Jan 2009 Another midnight riddle?</a>
21 <li><a href=#1233022809>27 Jan 2009 Fun with calculus after midnight</a>
22 <li><a href=#1232997290>26 Jan 2009 Valgrind takes a loooong time</a>
23 <li><a href=#1232927812>26 Jan 2009 A day full of rebase... and a little valgrind</a>
24 <li><a href=#1232888842>25 Jan 2009 Regular diff with word coloring (as opposed to word diff)</a>
25 </ul></p>
26 <a href=dscho.git?a=blob_plain;hb=9206ba5d401f83ed00b74a943b5499be5e7dcd8c;f=index.html>Older posts</a>
27 </td></tr></table>
28 <br>
29 <div style="text-align:right;">
30 <a href="dscho.git?a=blob_plain;hb=blog;f=blog.rss"
31 title="Subscribe to my RSS feed"
32 class="rss" rel="nofollow"
33 style="background-color:orange;text-decoration:none;color:white;font-family:sans-serif;">RSS</a>
34 </div>
35 <br>
36 <table width=400px bgcolor=#e0e0e0 border=1>
37 <tr><th>About this blog:</th></tr>
38 <tr><td>
39 <p>It is an active <a href=http://repo.or.cz/w/git/dscho.git?a=blob;f=source-1232626236.txt;h=1edde0467a>abuse</a> of <a href=http://repo.or.cz/>repo.or.cz</a>,
40 letting gitweb unpack the objects in the current tip of the branch <i>blog</i>,
41 including the images and the RSS feed.
42 </p><p>
43 Publishing means running a script that collects the posts, turns them into
44 HTML, makes sure all the images are checked in, and pushes the result.
45 </p><p>
46 This blog also serves to grace the world with Dscho's random thoughts on and
47 around Git.
48 </p>
49 </td></tr></table>
50 <br>
51 <table width=400px bgcolor=#e0e0e0 border=1>
52 <tr><th>Links:</th></tr>
53 <tr><td>
54 <ul>
55 <li> <a href=http://git-scm.com/>Git's homepage</a>
56 <li> <a href=http://gitster.livejournal.com/>Junio's blog</a>
57 <li> <a href=http://www.spearce.org/>Shawn's blog</a> seems to be sitting
58 idle ever since he started working for Google...
59 <li> <a href=http://torvalds-family.blogspot.com/>Linus' blog</a> does not
60 talk much about Git...
61 <li> Scott Chacon's <a href=http://whygitisbetterthanx.com/>Why Git is better
62 than X</a> site
63 <li> <a href=http://vilain.net/>The blog of mugwump</a>
64 <li> <a href=http://blogs.gnome.org/newren/>Elijah Newren</a> chose the
65 same path as Cogito, offering an alternative porcelain (an approach
66 that is doomed in my opinion)
67 <li> <a href=http://msysgit.googlecode.com/>The msysGit project</a>, a (mostly)
68 failed experiment to lure the many Windows developers out there to
69 contribute to Open Source for a change.
70 </ul>
71 </td></tr></table>
88 </div>
89 <h6>Friday, 30th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
90 <a name=1233277286>
91 <h2>More valgrind fun</h2>
93 <p>
94 </p><p>
95 So I spent quite a number of hours on that funny zlib/valgrind issue. The
96 thing is, zlib people claim that even if their code accesses uninitialized
97 memory, it does not produce erroneous data (by cutting out the results of the
98 uninitialized data, which is cheaper than checking for the end of the buffer
99 in an unaligned manner), so zlib will always be special for valgrind.
100 </p><p>
101 However, the bug I was chasing is funny, and different from said issue. zlib
102 deflates an input buffer to an output buffer that is exactly 58 bytes long.
103 But valgrind claims that the 52nd of those bytes is uninitialized, and <u>only</u>
104 that one.
105 </p><p>
106 But it is not. It must be 0x2c, otherwise zlib refuses to inflate the
107 buffer.
108 </p><p>
109 Now, I went into a debugging frenzy, and finally found out that zlib just
110 passes fine (with the default suppressions because of the "cute" way it
111 uses uninitialized memory), <u>except</u> when it is compiled with UNALIGNED_OK
112 defined.
113 </p><p>
114 Which Ubuntu does, of course. Ubuntu, the biggest forker of all.
115 </p><p>
116 The bad part is that it sounds like a bug in valgrind, and I <u>could</u> imagine
117 that it is an issue of an optimized memcpy() that copies int by int, and
118 that valgrind misses out on the fact that a part of that int is actually
119 <u>not</u> uninitialized.
120 </p><p>
121 But my debugging session's results disagree with that.
122 </p><p>
123 With the help of Julian Seward, the original author of valgrind, I instrumented
124 zlib's source code so that valgrind checks earlier whether the byte is initialized
125 or not, to find out where the cause of the issue lies.
126 </p><p>
127 The sad part is that when I added the instrumentation to both the <u>end</u> of
128 the while() loop in compress_block() in zlib's trees.c, and just <u>after</u> the
129 while() loop (whose condition is a plain <i>variable < variable</i> comparison,
130 nothing fancy, certainly not changing any memory), only the <u>latter</u> catches
131 a valgrind error.
132 </p><p>
133 And that is truly strange.
134 </p>
135 <h6>Thursday, 29th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
136 <a name=1233193467>
137 <h2>Interactive stash</h2>
140 </p><p>
141 There is an easy way to split a patch:
142 </p><p>
143 <table
144 border=1 bgcolor=black>
145 <tr><td bgcolor=lightblue colspan=3>
146 <pre> </pre>
147 </td></tr>
148 <tr><td>
149 <table cellspacing=5 border=0
150 style="color:white;">
151 <tr><td>
152 <pre>
153 $ git reset HEAD^
154 $ git add -i
155 $ git commit
156 $ git diff -R HEAD@{1} | git apply --index
157 $ git commit
158 </pre>
159 </td></tr>
160 </table>
161 </td></tr>
162 </table>
163 </p><p>
164 but it ignores the fact that the first of the two commits does not
165 reflect the state of the working directory at any time.
166 </p><p>
167 So I think something like an interactive <i>stash</i> is needed: a method
168 to specify what you want to keep in the working directory, while the rest gets
169 stashed. The idea would be something like this:
170 </p><p>
171 <ol>
172 <li>Add the desired changes into a temporary index.
173 <li>Put the rest of the changes in another temporary index.
174 <li>Stash the latter index.
175 <li>Synchronize the working directory with the first index.
176 <li>Clean up temporary indices.
177 </ol>
178 </p><p>
179 Or in code:
180 </p><p>
181 <table
182 border=1 bgcolor=black>
183 <tr><td bgcolor=lightblue colspan=3>
184 <pre> </pre>
185 </td></tr>
186 <tr><td>
187 <table cellspacing=5 border=0
188 style="color:white;">
189 <tr><td>
190 <pre>
191 $ cp .git/index .git/interactive-stash-1
192 $ GIT_INDEX_FILE=.git/interactive-stash-1 git add -i
193 $ cp .git/index .git/interactive-stash-2
194 $ GIT_INDEX_FILE=.git/interactive-stash-1 git diff -R |
195 (GIT_INDEX_FILE=.git/interactive-stash-2 git apply --index)
196 $ tree=$(GIT_INDEX_FILE=.git/index git write-tree)
197 $ commit=$(echo Current index | git commit-tree $tree -p HEAD)
198 $ tree=$(GIT_INDEX_FILE=.git/interactive-stash-2 git write-tree)
199 $ commit=$(echo Edited out | git commit-tree $tree -p HEAD -p $commit)
200 $ git update-ref refs/stash $commit
201 $ GIT_INDEX_FILE=.git/interactive-stash-1 git checkout-index -a -f
202 $ rm .git/interactive-stash-1 .git/interactive-stash-2
203 </pre>
204 </td></tr>
205 </table>
206 </td></tr>
207 </table>
208 </p><p>
209 This should probably go into <i>git-stash.sh</i>, maybe even with a switch
210 to start git-gui to do the interactive adding instead of git-add.
211 </p>
212 <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Monkey</h6>
213 <a name=1233154567>
214 <h2>Splitting topic branches</h2>
217 </p><p>
218 One might be put off easily by the overarching use of buzzwords in the
219 description of how <i>Darcs</i> works. I, for one, do not expect an intelligent
220 author when I read <i>Theory of patches</i> and <i>based on quantum physics</i>.
221 </p><p>
222 The true story, however, is much simpler, and is actually not that dumb:
223 Let's call two commits "conflicting" when they contain at least one
224 overlapping change.
225 </p><p>
226 The idea is now: given a list of commits (not a set, as the order is important),
227 sort them into smaller lists such that conflicting commits end up in the same
228 sublist (a "topic branch") and the sublists are minimal, i.e. no two
229 non-conflicting commits are in the same sublist.
230 </p><p>
231 The idea has flaws, of course, as you can have a patch changing the code,
232 and another changing the documentation, but splitting a list of commits
233 in that way is a first step to sort out my <i>my-next</i> mess, where I have
234 a linear perl of not-necessarily-dependent commits.
235 </p><p>
236 And actually, my whole rebase revamp aimed at the clean-up for my own
237 <i>my-next</i> branch, so I am currently writing a script that can be used
238 as a GIT_EDITOR for git-rebase which implements the Darcs algorithm. Kind of:
239 the result is not implicit, but explicit and can be fixed up later.
240 </p>
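<p>
A minimal sketch of that grouping, assuming (as a simplification that is not in
the post) that two commits "conflict" whenever they touch the same path, rather
than comparing overlapping hunks. The function name <i>split_topics</i> and the
input format are made up for illustration. Commits that are only connected
through a chain of conflicts land in the same topic, so this computes connected
components and is not Darcs' actual algorithm.
</p>

```shell
# Input (stdin): one commit per line, "<commit-id> <path>...", oldest first.
# Output: "<topic-number> <commit-id>" per commit, in the original order.
# Assumption: paths contain no whitespace.
split_topics () {
	awk '
	{
		t = 0
		# reuse the topic of the first path that an earlier commit touched
		for (i = 2; i <= NF; i++) if ($i in topic) { t = topic[$i]; break }
		if (!t) t = ++next_topic
		for (i = 2; i <= NF; i++) {
			old = ($i in topic) ? topic[$i] : 0
			if (old && old != t) {
				# this commit bridges two topics: fold the other one into t
				for (p in topic) if (topic[p] == old) topic[p] = t
				for (c = 1; c < NR; c++) if (of[c] == old) of[c] = t
			}
			topic[$i] = t
		}
		id[NR] = $1; of[NR] = t
	}
	END { for (c = 1; c <= NR; c++) print of[c], id[c] }
	'
}
```

<p>
Topic numbers are only labels and need not be contiguous after a merge of two
topics. The input lines could be derived from something like
<i>git log --reverse --name-only</i>, reshaped to one line per commit.
</p>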
241 <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
242 <a name=1233102919>
243 <h2>Showing off that you're an Alpine user ... priceless!</h2>
246 </p><p>
247 So I was in a hurry to send the patches, and sent all the patches as replies
248 to the cover-letter, and therefore typed in <i>rnyn</i> all the time, which is the
249 mantra I need to say to Alpine for <i>Reply</i>, ... include quoted message?
250 <i>No</i>, ... reply to all recipients? <i>Yes</i>, ... use first role?
251 <i>No, use default role</i>.
252 </p><p>
253 That was pretty embarrassing, as it shows everybody that I still do not trust
254 <i>send-email</i>, and rather paste every single patch by hand. Which is rather
255 annoying.
256 </p><p>
257 So I started using format-patch today, to output directly to Alpine's
258 <i>postponed-msgs</i> folder, so that I can do some touchups in the mailer
259 before sending the patch series on its way.
260 </p><p>
261 However, when running format-patch with <i>--thread</i>, it generates Message-ID
262 strings that Alpine does not like, and therefore replaces.
263 </p><p>
264 Oh, well, I'll probably just investigate how the Message-IDs are supposed to
265 look, and then use sed to replace the generated ones with Alpine-friendly ones
266 during the redirection to <i>postponed-msgs</i>.
267 </p><p>
268 But I already realized that doing it that way is dramatically faster than the
269 workflow I had before.
270 </p><p>
271 And safer: no more <i>rnyn</i>.
272 </p>
273 <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
274 <a name=1233101919>
275 <h2>Progress with the interactive rebase preserving merges</h2>
278 </p><p>
279 I thought about the "dropped" commits a bit more, after all, and it is
280 probably a good thing to substitute them with their parent, as Stephen did.
281 </p><p>
282 Imagine that you have merged a branch with two commits. One is in upstream,
283 and you want to rebase (preserving merges) onto upstream. Then you still
284 want to merge the single commit.
285 </p><p>
286 Even better, if there is no commit left, the <i>$REWRITTEN</i> mechanism will
287 substitute the commit onto which we are rebasing, so a merge will just
288 result in a fast-forward!
289 </p><p>
290 Oh, another thing: merge commits should not have a patch id, as they have
291 <u>multiple</u> patches. However, I borked the code a long time ago (9c6efa36)
292 and merges get the patch-id of their diff to the first parent. Which is
293 probably wrong. So I guess I'll have to fix that with my rebase revamp.
294 </p><p>
295 So what about a root commit? If that was dropped, we will just substitute
296 it with the commit onto which we rebase (as a root commit did not really
297 have a parent, but will get the onto-commit as new parent).
298 </p><p>
299 Now that I finally realized that t3410 is so strange because of a bug <u>I</u>
300 introduced, I can finally go about fixing it.
301 </p>
302 <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Rat</h6>
303 <a name=1233099894>
304 <h2>Another midnight riddle?</h2>
307 </p><p>
308 Okay, here's another riddle: what is the next line?
309 </p><p>
310 <pre>
314 1 1 1 2
315 3 1 1 2
316 2 1 1 2 1 3
318 </pre>
319 </p><p>
320 And when does the line get wider than 10 digits?
321 </p>
322 <h6>Tuesday, 27th of January, Anno Domini MMIX, at the hour of the Tiger</h6>
323 <a name=1233022809>
324 <h2>Fun with calculus after midnight</h2>
327 </p><p>
328 Problem: what is the shortest way of defining a variable consisting of <i>N</i>
329 spaces? I.e. for <i>N=80</i> the result will look something like
330 </p><p>
331 <table
332 border=1 bgcolor=black>
333 <tr><td bgcolor=lightblue colspan=3>
334 <pre> </pre>
335 </td></tr>
336 <tr><td>
337 <table cellspacing=5 border=0
338 style="color:white;">
339 <tr><td>
340 <pre>
341 s='    '
342 s="$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s"
343 </pre>
344 </td></tr>
345 </table>
346 </td></tr>
347 </table>
348 </p><p>
349 Let's see. Let the minimal number of characters needed be <i>A(N)</i>. For
350 simplicity, let's say that we only use one variable. Then, certainly, <i>A(N)</i>
351 cannot be larger than <i>5+N</i>, as we could define a variable using 1 character
352 for the name, 1 for the equal sign, 2 for the quotes, and one for the semicolon
353 or newline character (whichever).
354 </p><p>
355 Now, let's assume <i>N</i> is a product <i>K*L</i>. Then certainly, <i>A(N)</i> cannot
356 be larger than <i>A(K)+5+2*L</i>, as we could first define a variable that has
357 exactly <i>K</i> spaces and then use that to define the end result (in the example
358 above, <i>K=4</i> and <i>L=20</i>).
359 </p><p>
360 So, for which <i>N=K*L</i> is it better to use two definitions instead of one?
361 </p><p>
362 Simple calculus says that <i>5+K*L>5+K+5+2*L</i> must hold true, or (after some
363 scribbling): <i>L>1+7/(K-2)</i>. Which means that it makes no sense to define
364 a variable with 1 or 2 spaces first, which is kinda obvious (writing '$s'
365 alone would use two characters, so we could write the spaces right away).
366 </p><p>
367 But what about the other values? For <i>K=3</i>, <i>L</i> must be at least 9 to make
368 sense (in other words, <i>N</i> must be at least 27). For <i>K=4</i>, <i>L</i> needs
369 to be greater than or equal to 5 (<i>N>=20</i>); the next pairs are <i>(5,4)</i>,
370 <i>(6,3)</i>, <i>(7,3)</i>, <i>(8,3)</i>, <i>(9,3)</i> and starting with <i>K=10</i>, any
371 <i>L>1</i> makes sense.
372 </p><p>
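The thresholds above can be double-checked with a few lines of shell. This is
only a sketch: the helper name <i>min_l</i> is made up, and it simply searches
for the smallest <i>L</i> for which the two-definition cost <i>(5+K)+(5+2*L)</i>
drops below the one-definition cost <i>5+K*L</i>.

```shell
# Print the smallest L for which two definitions beat one, for a given K.
# Only meaningful for K > 2; for K <= 2 the loop would never terminate.
min_l () {
	k=$1
	l=1
	while [ $((5 + k * l)) -le $((10 + k + 2 * l)) ]; do
		l=$((l + 1))
	done
	echo "$l"
}

for spaces in 3 4 5 6 7 8 9 10; do
	copies=$(min_l "$spaces")
	echo "K=$spaces needs L>=$copies, i.e. N>=$((spaces * copies))"
done
```

</p><p>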
373 The second definition can also contain spaces at the end, however, so for any
374 <i>N=K*L+M</i>, <i>A(N)</i> cannot be larger than <i>A(K)+5+2*L+M</i>.
375 </p><p>
376 Not surprisingly, this leads to exactly the same <i>L>1+7/(K-2)</i> (as we can
377 append the <i>M</i> spaces in the last definition, no matter if we use 1 or
378 2 definitions).
379 </p><p>
380 However, that means that as soon as <i>N>=18</i>, we should use two definitions;
381 below that, it makes no sense.
382 </p><p>
383 So for <i>N<18</i>, <i>A(N)=5+N</i>.
384 </p><p>
385 But what <i>K</i> should one choose, i.e. how many spaces in the first definition?
386 In other words, what is <i>A(N)</i> given that we use two definitions?
387 </p><p>
388 That will have to wait for another midnight. Just a teaser: <i>A(80)=36</i>. Oh,
389 and with 80 characters, you can define a string of 9900 spaces...
390 </p>
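<p>
For the impatient, the bound can be brute-forced. The sketch below assumes one
variable and at most two definitions, and just minimizes
<i>(5+K)+5+2*L+M</i> over all <i>K</i>, with <i>L=N/K</i> and <i>M=N%K</i>; the
function name <i>best_cost</i> is made up. It says nothing about what a third
definition might gain.
</p>

```shell
# Print "<cost> <K>" for a string of N spaces, where K is the number of
# spaces in the first definition, or 0 if a single definition is cheapest.
best_cost () {
	n=$1
	best=$((5 + n))        # single definition: name, '=', two quotes, newline
	best_k=0
	k=1
	while [ "$k" -le "$n" ]; do
		l=$((n / k))       # how often the first variable is repeated
		m=$((n % k))       # literal spaces appended to the second definition
		cost=$(( (5 + k) + 5 + 2 * l + m ))
		if [ "$cost" -lt "$best" ]; then
			best=$cost
			best_k=$k
		fi
		k=$((k + 1))
	done
	echo "$best $best_k"
}

best_cost 80
```

<p>
For <i>N=80</i> this prints <i>36 10</i>, which agrees with the teaser, and for
<i>N=17</i> it falls back to the single definition.
</p>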
391 <h6>Monday, 26th of January, Anno Domini MMIX, at the hour of the Dog</h6>
392 <a name=1232997290>
393 <h2>Valgrind takes a loooong time</h2>
396 </p><p>
397 Yesterday, I started a run on a fast machine, and it took roughly 5.5
398 hours by the machine's clock.
399 </p><p>
400 And of course, I redirected stdout only... *sigh*
401 </p><p>
402 Which triggered a Google search for how to force redirection of all the output
403 in the test scripts to a file and the terminal at the same time.
404 </p><p>
405 It seems as if that is not easily done. I tried
406 <center><table
407 border=1 bgcolor=black>
408 <tr><td bgcolor=lightblue colspan=3>
409 <pre> </pre>
410 </td></tr>
411 <tr><td>
412 <table cellspacing=5 border=0
413 style="color:white;">
414 <tr><td>
415 <pre>
416 exec >(tee out) 2>&1
417 </pre>
418 </td></tr>
419 </table>
420 </td></tr>
421 </table></center>
422 </p><p>
423 but that did not work: it mumbled something about invalid file handles or some
424 such.
425 </p><p>
426 The only solution I found was:
427 <center><table
428 border=1 bgcolor=black>
429 <tr><td bgcolor=lightblue colspan=3>
430 <pre> </pre>
431 </td></tr>
432 <tr><td>
433 <table cellspacing=5 border=0
434 style="color:white;">
435 <tr><td>
436 <pre>
437 mkfifo pipe
438 tee out < pipe &
439 exec > pipe 2>&1
440 </pre>
441 </td></tr>
442 </table>
443 </td></tr>
444 </table></center>
445 </p><p>
446 That is a problem for parallel execution, though, so I am still looking for a
447 better way to do it.
448 </p><p>
449 Once I have the output, it is relatively easy to analyze it, as I already
450 made a script which dissects the output into valgrind output and the test
451 case it came from, then groups by common valgrind output and shows the
452 result to the user.
453 </p>
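<p>
One possible way around the shared pipe name, as a sketch only: give every
invocation its own FIFO in a private temporary directory. The function name
<i>log_both</i> is made up, it wraps a single command instead of using
<i>exec</i> for the whole script, and it assumes <i>mktemp -d</i> is available
(it is not part of POSIX).
</p>

```shell
# Run a command with stdout and stderr going to both the terminal and a log.
# Usage: log_both <logfile> <command> [<args>...]
# (bash alone also accepts "exec > >(tee out) 2>&1"; plain sh does not.)
log_both () {
	log=$1
	shift
	dir=$(mktemp -d) || return 1      # private directory, unique per call
	mkfifo "$dir/pipe" || return 1
	tee "$log" <"$dir/pipe" &         # reader: copies the pipe to log and stdout
	tee_pid=$!
	rc=0
	"$@" >"$dir/pipe" 2>&1 || rc=$?   # writer: both streams go into the pipe
	wait "$tee_pid"                   # let tee flush before cleaning up
	rm -rf "$dir"
	return "$rc"
}
```

<p>
Since each call creates and removes its own pipe, several test scripts can run
side by side without stepping on each other's toes.
</p>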
454 <h6>Monday, 26th of January, Anno Domini MMIX, at the hour of the Rat</h6>
455 <a name=1232927812>
456 <h2>A day full of rebase... and a little valgrind</h2>
459 </p><p>
460 I think that I am progressing nicely with my rebase -p work, so much so
461 that I will soon be able to use it myself to work on topic branches <u>and</u>
462 rebase all the time without much hassle.
463 </p><p>
464 In other words, I would like to be able to rebase all my topic branches
465 to Junio's <i>next</i> branch whenever that has new commits. With a single
466 rebase.
467 </p><p>
468 And finally, I got the idea of the thing Stephen implemented for dropped
469 commits; however, I am quite sure I do not like it.
470 </p><p>
471 So what are "dropped" commits?
472 </p><p>
473 When you rebase, chances are that the upstream already has applied at
474 least some of your patches. So we filter those out with <i>--cherry-pick</i>.
475 Stephen calls those "dropped" commits.
476 </p><p>
477 Then he goes on to reinvent the "$REWRITTEN" system: a directory containing
478 the mappings of old commit names to new commit names. That is easily fixed.
479 </p><p>
480 But worse, he substitutes the dropped commits with their <u>parents</u>, instead
481 of substituting them with the corresponding commits in upstream.
482 </p><p>
483 I guess this will be a medium-sized fight on the mailing list, depending
484 on how much energy Stephen wants to put in to defend his strategy.
485 </p><p>
486 Anyway, I finally got to a point where only three of the tests are failing,
487 t3404, t3410 and t3412. Somewhat disappointing is t3404, as its name suggests
488 that it does not exercise -p at all. Oh well, I guess I'll see what is broken tomorrow.
489 </p><p>
490 Another part of the day was dedicated to the Valgrind patch series, which
491 should give us yet another level of code quality.
492 </p><p>
493 After having confused myself with several diverging/obsolete branches, I did
494 indeed finally manage to send that patch series off. Woohoo.
495 </p>
496 <h6>Sunday, 25th of January, Anno Domini MMIX, at the hour of the Goat</h6>
497 <a name=1232888842>
498 <h2>Regular diff with word coloring (as opposed to word diff)</h2>
501 </p><p>
502 You know, if I were a bit faster with everything I do, I could do so much more!
503 </p><p>
504 For example, Junio's idea that you could keep showing a regular diff, only
505 coloring the words that have been removed/added.
506 </p><p>
507 Just imagine looking at the diff of a long line in LaTeX source code. It
508 should be much nicer to the eye to see the complete removed/added sentences
509 instead of one sentence with colored words in between, disrupting your reading
510 flow.
511 </p><p>
512 Compare these two versions:
513 </p><p>
514 Regular diff with colored words:
515 <blockquote><tt>
516 -This sentence has a <font color=red>tyop</font> in it.<br>
517 +This sentence has a <font color=green>typo</font> in it.<br>
518 </tt></blockquote>
519 </p><p>
520 Word diff:
521 <blockquote><tt>
522 This sentence has a <font color=red>tyop</font><font color=green>typo</font> in it.<br>
523 </tt></blockquote>
524 </p><p>
525 And it should not be hard to do at all!
526 </p><p>
527 In <i>diff_words_show()</i>, we basically get the minus lines as
528 <i>diff_words->minus</i> and the plus lines as <i>diff_words->plus</i>. The
529 function then prepares the word lists and calls the xdiff engine to do all the
530 hard work, analyzing the result from xdiff and printing the lines in
531 <i>fn_out_diff_words_aux()</i>.
532 </p><p>
533 So all that would have to be changed would be to <u>record</u> the positions
534 of the removed/added words instead of outputting them, and at the end printing
535 the minus/plus buffers using the recorded information to color the words.
536 </p><p>
537 This would involve
538 </p><p>
539 <ul>
540 <li>adding two new members holding the offsets in the <i>diff_words</i>
541 struct,
542 <li>adding special handling for that mode in
543 <i>fn_out_diff_words_aux()</i> that appends the offsets and
544 returns,
545 <li>adding a function <i>show_lines_with_colored_words()</i> that
546 outputs a buffer with a given prefix ('-' or '+') and colors the words at
547 given offsets with a given color,
548 <li>modifying <i>diff_words_show()</i> to call that function for the "special
549 case: only removal" and at the end of the function, and
550 <li> disabling the <i>fwrite()</i> at the end of <i>diff_words_show()</i> for that
551 mode.
552 </ul>
553 </p><p>
554 Of course, the hardest part is to find a nice user interface for that. Maybe
555 <i>--colored-words</i>? &#x263a;
556 </p>
557 </div>
558 </body>
559 </html>