1 <?xml version="1.0" encoding="utf-8"?>
2 <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
3 <channel>
4 <title>Dscho's blog</title>
5 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html</link>
6 <atom:link href="http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=blog.rss" rel="self" type="application/rss+xml"/>
7 <description>A few stories told by Dscho</description>
8 <lastBuildDate>Sat, 07 Feb 2009 22:05:44 +0100</lastBuildDate>
9 <language>en-us</language>
10 <item>
11 <title>The infamous mark command in the rebase command</title>
12 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1234040744</link>
13 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1234040744</guid>
14 <pubDate>Sat, 07 Feb 2009 22:05:44 +0100</pubDate>
15 <description><![CDATA[The infamous <i>mark</i> command in the <i>rebase</i> command
16 </p><p>
17 I realized today how easy it is to lose commits with the "merge preserving"
18 mode of the interactive rebase. In my case, it was when I tried to move a
19 bunch of commits from the tip of my branch into a topic branch.
20 </p><p>
21 But after moving the commits, I forgot to update the parent of the merge
22 commit. Possibly a mark command could have helped. The very same command
23 I called a nightmare for usability.
24 </p><p>
25 So I was wrong. Big news. &#x263a;
26 </p><p>
27 However, I think that the syntax "mark :1" is something best left for
28 machine consumption, not for human beings.
29 </p><p>
30 But I have an idea: we could use some garbled commit subject, or in case of
31 merge parents, the merge subject as some human readable title of the mark.
32 </p><p>
33 The rebase script would then look something like this:
34 </p><p>
35 <table
36 border=1 bgcolor=white>
37 <tr><td bgcolor=lightblue colspan=3>
38 <pre> </pre>
39 </td></tr>
40 <tr><td>
41 <table cellspacing=5 border=0
42 style="color:black;">
43 <tr><td>
44 <pre>
45 pick abcdefg Some ultra cool commit
46 bookmark ultra-cool
47 goto upstream
48 pick hijklmn Some other cool commit
49 merge parent ultra-cool Merge 'ultra-cool' into master
50 </pre>
51 </td></tr>
52 </table>
53 </td></tr>
54 </table>
55 </p><p>
56 The good news is: I added code that refuses to finish a rebase when there
57 are commits that were rewritten, but not part of the new HEAD's ancestry.]]></description>
58 </item>
59 <item>
60 <title>New valgrind series</title>
61 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233707628</link>
62 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233707628</guid>
63 <pubDate>Wed, 04 Feb 2009 01:33:48 +0100</pubDate>
64 <description><![CDATA[New valgrind series
65 </p><p>
66 I spent quite some time cleaning up that patch series, and feel pretty
67 exhausted.
68 </p><p>
69 Granted, the new <i>git rebase -i -p</i> does its job without complaint so far
70 (so much so that I think I'll release a version of my <i>rebase</i> series
71 soonish), but it <u>is</u> a hassle when you have patches whose order and
72 commit boundaries are hard to decide upon.
73 </p><p>
74 For example, I could imagine that the patch making the location of the
75 templates independent of the location of the Git binaries should come
76 <u>before</u> my patch series, and the valgrind specific part should then
77 be squashed into the first valgrind commit.
78 </p><p>
79 Also, it uses two features of valgrind 3.4.0:
80 </p><p>
81 <ul>
82 <li><i>...</i> in the suppression file, and
83 <li><i>--track-origins=yes</i>
84 </ul>
85 </p><p>
86 The latter is actually the reason I am pretty willing to keep the
87 requirement of that valgrind version, as it is really, really useful.
88 </p><p>
89 I guess we will see what happens to it.]]></description>
90 </item>
91 <item>
92 <title>Problems with split-topic-branches.sh</title>
93 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233706294</link>
94 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233706294</guid>
95 <pubDate>Wed, 04 Feb 2009 01:11:34 +0100</pubDate>
96 <description><![CDATA[Problems with split-topic-branches.sh
97 </p><p>
98 So my little script that should help me to split my topic branches does
99 not work properly.
100 </p><p>
101 First some background: the idea was to let <i>git blame</i> do the hard work
102 to find overlapping changes, i.e. changes that would conflict when
103 changing the order (or skipping the first change, on which the next builds).
104 </p><p>
105 The first problem with that approach: when lines are <u>removed</u> by one
106 commit, and the next commit touches the same location, <i>git blame</i> does
107 not find that the first commit is required by the second.
108 </p><p>
109 Therefore I introduced a really slow reverse thing which tries to find
110 those commits whose removals survived until the parent of a particular
111 commit, but not further.
112 </p><p>
113 However, it does not work properly. Basically, only context sizes that
114 span the whole files lead to conflict-free topic branches so far.
115 </p><p>
116 As a consequence, I think I'll add an option --sprout to the revision
117 walker which will fake octopus merges (or a series of two-parent merges)
118 whenever it finds a string of non-merge commits that are theoretically
119 independent, i.e. whose patches apply cleanly.]]></description>
120 </item>
121 <item>
122 <title>More valgrind fun</title>
123 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233277286</link>
124 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233277286</guid>
125 <pubDate>Fri, 30 Jan 2009 02:01:26 +0100</pubDate>
126 <description><![CDATA[More valgrind fun
127 </p><p>
128 So I spent quite a number of hours on that funny zlib/valgrind issue. The
129 thing is, zlib people claim that even if their code accesses uninitialized
130 memory, it does not produce erroneous data (by cutting out the results of the
131 uninitialized data, which is cheaper than checking for the end of the buffer
132 in an unaligned manner), so zlib will always be special for valgrind.
133 </p><p>
134 However, the bug I was chasing is funny, and different from said issue. zlib
135 deflates an input buffer to an output buffer that is exactly 58 bytes long.
136 But valgrind claims that the 52nd of those bytes is uninitialized, and <u>only</u>
137 that one.
138 </p><p>
139 But it is not. It must be 0x2c, otherwise zlib refuses to inflate the
140 buffer.
141 </p><p>
142 Now, I went into a debugging frenzy, and finally found out that zlib just
143 passes fine (with the default suppressions because of the "cute" way it
144 uses uninitialized memory), <u>except</u> when it is compiled with UNALIGNED_OK
145 defined.
146 </p><p>
147 Which Ubuntu does, of course. Ubuntu, the biggest forker of all.
148 </p><p>
149 The bad part is that it sounds like a bug in valgrind, and I <u>could</u> imagine
150 that it is an issue of an optimized memcpy() that copies int by int, and
151 that valgrind misses out on the fact that a part of that int is actually
152 <u>not</u> uninitialized.
153 </p><p>
154 But my debugging session's results disagree with that.
155 </p><p>
156 With the help of Julian Seward, the original author of valgrind, I instrumented
157 zlib's source code so that valgrind checks earlier if the byte is initialized
158 or not, to find out where the cause of the issue lies.
159 </p><p>
160 The sad part is that when I added the instrumentation to both the <u>end</u> of
161 the while() loop in compress_block() in zlib's trees.c, and just <u>after</u> the
162 while() loop (whose condition is a plain <i>variable &lt; variable</i> comparison,
163 nothing fancy, certainly not changing any memory), only the <u>latter</u> catches
164 a valgrind error.
165 </p><p>
166 And that is truly strange.]]></description>
167 </item>
168 <item>
169 <title>Interactive stash</title>
170 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233193467</link>
171 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233193467</guid>
172 <pubDate>Thu, 29 Jan 2009 02:44:27 +0100</pubDate>
173 <description><![CDATA[Interactive stash
174 </p><p>
175 There is an easy way to split a patch:
176 </p><p>
177 <table
178 border=1 bgcolor=white>
179 <tr><td bgcolor=lightblue colspan=3>
180 <pre> </pre>
181 </td></tr>
182 <tr><td>
183 <table cellspacing=5 border=0
184 style="color:black;">
185 <tr><td>
186 <pre>
187 $ git reset HEAD^
188 $ git add -i
189 $ git commit
190 $ git diff -R HEAD@{1} | git apply --index
191 $ git commit
192 </pre>
193 </td></tr>
194 </table>
195 </td></tr>
196 </table>
197 </p><p>
198 but it ignores the fact that the first of the two commits does not
199 reflect a state the working directory was ever in.
200 </p><p>
201 So I think something like an interactive <i>stash</i> is needed. A method
202 to specify what you want to keep in the working directory, the rest should
203 be stashed. The idea would be something like this:
204 </p><p>
205 <ol>
206 <li>Add the desired changes into a temporary index.
207 <li>Put the rest of the changes in another temporary index.
208 <li>Stash the latter index.
209 <li>Synchronize the working directory with the first index.
210 <li>Clean up temporary indices.
211 </ol>
212 </p><p>
213 Or in code:
214 </p><p>
215 <table
216 border=1 bgcolor=white>
217 <tr><td bgcolor=lightblue colspan=3>
218 <pre> </pre>
219 </td></tr>
220 <tr><td>
221 <table cellspacing=5 border=0
222 style="color:black;">
223 <tr><td>
224 <pre>
225 $ cp .git/index .git/interactive-stash-1
226 $ GIT_INDEX_FILE=.git/interactive-stash-1 git add -i
227 $ cp .git/index .git/interactive-stash-2
228 $ GIT_INDEX_FILE=.git/interactive-stash-1 git diff -R |
229 (GIT_INDEX_FILE=.git/interactive-stash-2 git apply --index)
230 $ tree=$(GIT_INDEX_FILE=.git/index git write-tree)
231 $ commit=$(echo Current index | git commit-tree $tree -p HEAD)
232 $ tree=$(GIT_INDEX_FILE=.git/interactive-stash-2 git write-tree)
233 $ commit=$(echo Edited out | git commit-tree $tree -p HEAD -p $commit)
234 $ git update-ref refs/stash $commit
235 $ GIT_INDEX_FILE=.git/interactive-stash-1 git checkout-index -a -f
236 $ rm .git/interactive-stash-1 .git/interactive-stash-2
237 </pre>
238 </td></tr>
239 </table>
240 </td></tr>
241 </table>
242 </p><p>
243 This should probably go into <i>git-stash.sh</i>, maybe even with a switch
244 to start git-gui to do the interactive adding instead of git-add.]]></description>
245 </item>
246 <item>
247 <title>Splitting topic branches</title>
248 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233154567</link>
249 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233154567</guid>
250 <pubDate>Wed, 28 Jan 2009 15:56:07 +0100</pubDate>
251 <description><![CDATA[Splitting topic branches
252 </p><p>
253 One might be put off easily by the overarching use of buzzwords in the
254 description of how <i>Darcs</i> works. I, for one, do not expect an intelligent
255 author when I read <i>Theory of patches</i> and <i>based on quantum physics</i>.
256 </p><p>
257 The true story, however, is much simpler, and is actually not that dumb:
258 Let's call two commits "conflicting" when they contain at least one
259 overlapping change.
260 </p><p>
261 The idea is now: given a list of commits (not a set, as the order is important),
262 sort them into smaller lists such that conflicting commits end up in the same
263 sublist ("topic branch") and the sublists are minimal, i.e. no two
264 non-conflicting commits are in the same sublist.
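</p><p>
A rough sketch of that splitting in Python, under loud assumptions: every commit is reduced to a list of hypothetical <i>(file, first line, last line)</i> hunks, two hunks overlap when they touch the same file and their line ranges intersect, and commits that conflict only indirectly (A with B, B with C) still land in the same sublist. The commit names and file names are made up, and shifting line numbers between commits are ignored, so this is an illustration of the idea and not the script mentioned in this post.
</p><p>

```python
def hunks_overlap(a, b):
    """Hunks are (file, first_line, last_line); ranges are inclusive."""
    file_a, start_a, end_a = a
    file_b, start_b, end_b = b
    return file_a == file_b and start_a <= end_b and start_b <= end_a


def commits_conflict(hunks_a, hunks_b):
    """Two commits conflict when at least one pair of hunks overlaps."""
    return any(hunks_overlap(a, b) for a in hunks_a for b in hunks_b)


def split_into_topics(commits):
    """commits is an ordered list of (name, hunks) pairs.

    Returns lists of names; each list keeps the original commit order.
    Assumption: conflicts are treated as transitive (connected components).
    """
    parent = list(range(len(commits)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(commits)):
        for j in range(i):
            if commits_conflict(commits[i][1], commits[j][1]):
                parent[find(i)] = find(j)

    topics = {}
    for i, (name, _) in enumerate(commits):
        topics.setdefault(find(i), []).append(name)
    return list(topics.values())


# Hypothetical example data, not taken from any real branch.
commits = [
    ("A", [("rebase.sh", 10, 20)]),
    ("B", [("README", 1, 5)]),
    ("C", [("rebase.sh", 15, 30)]),
    ("D", [("Makefile", 3, 4)]),
    ("E", [("README", 5, 9), ("Makefile", 40, 41)]),
]
print(split_into_topics(commits))  # [['A', 'C'], ['B', 'E'], ['D']]
```

</p><p>
Grouping by connected components is only one reading of "minimal"; a stricter reading would need to decide what to do with commits that conflict only through a third one.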
265 </p><p>
266 The idea has flaws, of course, as you can have a patch changing the code,
267 and another changing the documentation, but splitting a list of commits
268 in that way is a first step to sort out my <i>my-next</i> mess, where I have
269 a linear string of not-necessarily-dependent commits.
270 </p><p>
271 And actually, my whole rebase revamp aimed at the clean-up for my own
272 <i>my-next</i> branch, so I am currently writing a script that can be used
273 as a GIT_EDITOR for git-rebase which implements the Darcs algorithm. Kind of:
274 the result is not implicit, but explicit and can be fixed up later.]]></description>
275 </item>
276 <item>
277 <title>Showing off that you're an Alpine user ... priceless!</title>
278 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233102919</link>
279 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233102919</guid>
280 <pubDate>Wed, 28 Jan 2009 01:35:19 +0100</pubDate>
281 <description><![CDATA[Showing off that you're an Alpine user ... priceless!
282 </p><p>
283 So I was in a hurry to send the patches, and sent all the patches as replies
284 to the cover-letter, and therefore typed in <i>rnyn</i> all the time, which is the
285 mantra I need to say to Alpine for <i>Reply</i>, ... include quoted message?
286 <i>No</i>, ... reply to all recipients? <i>Yes</i>, ... use first role?
287 <i>No, use default role</i>.
288 </p><p>
289 That was pretty embarrassing, as it shows everybody that I still do not trust
290 <i>send-email</i>, and rather paste every single patch by hand. Which is rather
291 annoying.
292 </p><p>
293 So I started using format-patch today, to output directly to Alpine's
294 <i>postponed-msgs</i> folder, so that I can do some touchups in the mailer
295 before sending the patch series on its way.
296 </p><p>
297 However, when running format-patch with <i>--thread</i>, it generates Message-ID
298 strings that Alpine does not like, and therefore replaces.
299 </p><p>
300 Oh, well, I'll probably just investigate how the Message-IDs are supposed to
301 look, and then use sed to replace the generated ones with Alpine-friendly ones
302 during the redirection to <i>postponed-msgs</i>.
303 </p><p>
304 But I already realized that doing it that way is dramatically faster than the
305 workflow I had before.
306 </p><p>
307 And safer: no more <i>rnyn</i>.]]></description>
308 </item>
309 <item>
310 <title>Progress with the interactive rebase preserving merges</title>
311 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233101919</link>
312 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233101919</guid>
313 <pubDate>Wed, 28 Jan 2009 01:18:39 +0100</pubDate>
314 <description><![CDATA[Progress with the interactive rebase preserving merges
315 </p><p>
316 I thought about the "dropped" commits a bit more, after all, and it is
317 probably a good thing to substitute them with their parent, as Stephen did.
318 </p><p>
319 Imagine that you have merged a branch with two commits. One is in upstream,
320 and you want to rebase (preserving merges) onto upstream. Then you still
321 want to merge the single commit.
322 </p><p>
323 Even better, if there is no commit left, the <i>$REWRITTEN</i> mechanism will
324 substitute the commit onto which we are rebasing, so a merge will just
325 result in a fast-forward!
326 </p><p>
327 Oh, another thing: merge commits should not have a patch id, as they have
328 <u>multiple</u> patches. However, I borked the code a long time ago (9c6efa36)
329 and merges get the patch-id of their diff to the first parent. Which is
330 probably wrong. So I guess I'll have to fix that with my rebase revamp.
331 </p><p>
332 So what about a root commit? If that was dropped, we will just substitute
333 it with the commit onto which we rebase (as a root commit did not really
334 have a parent, but will get the onto-commit as new parent).
335 </p><p>
336 Now that I finally realized that t3410 is so strange because of a bug <u>I</u>
337 introduced, I can finally go about fixing it.]]></description>
338 </item>
339 <item>
340 <title>Another midnight riddle?</title>
341 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233099894</link>
342 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233099894</guid>
343 <pubDate>Wed, 28 Jan 2009 00:44:54 +0100</pubDate>
344 <description><![CDATA[Another midnight riddle?
345 </p><p>
346 Okay, here's another riddle: what is the next line?
347 </p><p>
348 <pre>
352 1 1 1 2
353 3 1 1 2
354 2 1 1 2 1 3
356 </pre>
357 </p><p>
358 And when does the line get wider than 10 digits?]]></description>
359 </item>
360 <item>
361 <title>Fun with calculus after midnight</title>
362 <link>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233022809</link>
363 <guid>http://repo.or.cz/w/git/dscho.git?a=blob_plain;hb=blog;f=index.html#1233022809</guid>
364 <pubDate>Tue, 27 Jan 2009 03:20:09 +0100</pubDate>
365 <description><![CDATA[Fun with calculus after midnight
366 </p><p>
367 Problem: what is the shortest way of defining a variable consisting of <i>N</i>
368 spaces? I.e. for <i>N=80</i> the result will look something like
369 </p><p>
370 <table
371 border=1 bgcolor=white>
372 <tr><td bgcolor=lightblue colspan=3>
373 <pre> </pre>
374 </td></tr>
375 <tr><td>
376 <table cellspacing=5 border=0
377 style="color:black;">
378 <tr><td>
379 <pre>
380 s='    '
381 s="$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s"
382 </pre>
383 </td></tr>
384 </table>
385 </td></tr>
386 </table>
387 </p><p>
388 Let's see. Let the minimal number of characters needed be <i>A(N)</i>. For
389 simplicity, let's say that we only use one variable. Then, certainly, <i>A(N)</i>
390 cannot be larger than <i>5+N</i>, as we could define a variable using 1 character
391 for the name, 1 for the equal sign, 2 for the quotes, and one for the semicolon
392 or newline character (whichever).
393 </p><p>
394 Now, let's assume <i>N</i> is a product <i>K*L</i>. Then certainly, <i>A(N)</i> cannot
395 be larger than <i>A(K)+5+2*L</i>, as we could first define a variable that has
396 exactly <i>K</i> spaces and then use that to define the end result (in the example
397 above, <i>K=4</i> and <i>L=20</i>).
398 </p><p>
399 So, for which <i>N=K*L</i> is it better to use two definitions instead of one?
400 </p><p>
401 Simple calculus says that <i>5+K*L>5+K+5+2*L</i> must hold true, or (after some
402 scribbling): <i>L>1+7/(K-2)</i>. Which means that it makes no sense to define
403 a variable with 1 or 2 spaces first, which is kinda obvious (writing '$s'
404 alone would use two characters, so we could write the spaces right away).
405 </p><p>
406 But what about the other values? For <i>K=3</i>, <i>L</i> must be at least 9 to make
407 sense (in other words, <i>N</i> must be at least 27). For <i>K=4</i>, <i>L</i> needs
408 to be greater than or equal to 5 (<i>N>=20</i>); the next pairs are <i>(5,4)</i>,
409 <i>(6,3)</i>, <i>(7,3)</i>, <i>(8,3)</i>, <i>(9,3)</i> and starting with <i>K=10</i>, any
410 <i>L>1</i> makes sense.
411 </p><p>
412 The second definition can also contain spaces at the end, however, so for any
413 <i>N=K*L+M</i>, <i>A(N)</i> cannot be larger than <i>A(K)+5+2*L+M</i>.
414 </p><p>
415 Not surprisingly, this leads to exactly the same <i>L>1+7/(K-2)</i> (as we can
416 append the <i>M</i> spaces in the last definition, no matter if we use 1 or
417 2 definitions).
418 </p><p>
419 However, that means that as soon as <i>N>=18</i>, we should use two definitions;
420 prior to that, it makes no sense.
421 </p><p>
422 So for <i>N&lt;18</i>, <i>A(N)=5+N</i>.
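</p><p>
The threshold can be double-checked by brute force. The Python sketch below assumes exactly the cost model used so far: one definition costs <i>5+N</i> characters, and two definitions cost <i>(5+K)+(5+2*L+M)</i> with <i>N=K*L+M</i>, where the first definition is written out literally. The function names are made up for this illustration.
</p><p>

```python
def one_definition_cost(n):
    # name (1) + '=' (1) + two quotes (2) + newline or semicolon (1) + n spaces
    return 5 + n


def best_two_definitions(n):
    """Cheapest (cost, k, l, m) with n == k * l + m, found by brute force.

    The first definition holds k literal spaces, the second one repeats
    "$s" l times and appends m literal spaces.
    """
    best = None
    for k in range(1, n + 1):
        for l in range(1, n // k + 1):
            m = n - k * l
            cost = (5 + k) + (5 + 2 * l + m)
            if best is None or cost < best[0]:
                best = (cost, k, l, m)
    return best


def first_n_where_two_definitions_win(limit=100):
    """Smallest n up to limit for which two definitions are strictly shorter."""
    for n in range(1, limit + 1):
        if best_two_definitions(n)[0] < one_definition_cost(n):
            return n
    return None


print(first_n_where_two_definitions_win())  # prints 18
```

</p><p>
The search only covers the one- and two-definition cases discussed here; it says nothing about chains of three or more definitions.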
423 </p><p>
424 But what <i>K</i> should one choose, i.e. how many spaces in the first definition?
425 In other words, what is <i>A(N)</i> given that we use two definitions?
426 </p><p>
427 That will have to wait for another midnight. Just a teaser: <i>A(80)=36</i>. Oh,
428 and with 80 characters, you can define a string of 9900 spaces...]]></description>
429 </item>
430 </channel>
431 </rss>