index.html

   1 <html>
   2         <head>
   3                 <title>Dscho's blog</title>
   4                 <meta http-equiv="Content-Type"
   5                         content="text/html; charset=UTF-8"/>
   6         </head>
   7         <body style="width:800px;background-image:url(dscho.git?a=blob_plain;hb=832be85c785c80202f17b87db7f063ae57ec2cac;f=paper.jpg);background-repeat:repeat-y;background-attachment:scroll;padding:0px;">
   8                 <div style="width:610px;margin-left:120px;margin-top:50px;align:left;vertical-align:top;">
   9                         <h1>Dscho's blog</h1>
  10                         <div style="position:absolute;top:50px;left:810px;width:400px">
  11                         <table width=400px bgcolor=#e0e0e0 border=1>
  12                         <tr><th>Table of contents:</th></tr>
  13                         <tr><td>
  14                         <p><ul>
  15                         <li><a href=#1233277286>30 Jan 2009 More valgrind fun</a>
  16                         <li><a href=#1233193467>29 Jan 2009 Interactive stash</a>
  17                         <li><a href=#1233154567>28 Jan 2009 Splitting topic branches</a>
  18                         <li><a href=#1233102919>28 Jan 2009 Showing off that you're an Alpine user ... priceless!</a>
  19                         <li><a href=#1233101919>28 Jan 2009 Progress with the interactive rebase preserving merges</a>
  20                         <li><a href=#1233099894>28 Jan 2009 Another midnight riddle?</a>
  21                         <li><a href=#1233022809>27 Jan 2009 Fun with calculus after midnight</a>
  22                         <li><a href=#1232997290>26 Jan 2009 Valgrind takes a loooong time</a>
  23                         <li><a href=#1232927812>26 Jan 2009 A day full of rebase... and a little valgrind</a>
  24                         <li><a href=#1232888842>25 Jan 2009 Regular diff with word coloring (as opposed to word diff)</a>
  25                         </ul></p>
  26                         <a href=dscho.git?a=blob_plain;hb=9206ba5d401f83ed00b74a943b5499be5e7dcd8c;f=index.html>Older posts</a>
  27                         </td></tr></table>
  28                         <br>
  29                         <div style="text-align:right;">
  30                         <a href="dscho.git?a=blob_plain;hb=blog;f=blog.rss"
  31                            title="Subscribe to my RSS feed"
  32                            class="rss" rel="nofollow"
  33                            style="background-color:orange;text-decoration:none;color:white;font-family:sans-serif;">RSS</a>
  34                         </div>
  35                         <br>
  36                         <table width=400px bgcolor=#e0e0e0 border=1>
  37                         <tr><th>About this blog:</th></tr>
  38                         <tr><td>
  39                         <p>It is an active <a href=http://repo.or.cz/w/git/dscho.git?a=blob;f=source-1232626236.txt;h=1edde0467a>abuse</a> of <a href=http://repo.or.cz/>repo.or.cz</a>,
  40                         letting gitweb unpack the objects in the current tip of the branch <i>blog</i>,
  41                         including the images and the RSS feed.
  42                         </p><p>
  43                         Publishing means running a script that collects the posts, turns them into
  44                         HTML, makes sure all the images are checked in, and pushes the result.
  45                         </p><p>
  46                         This blog also serves to grace the world with Dscho's random thoughts on and
  47                         around Git.
  48                         </p>
  49                         </td></tr></table>
  50                         <br>
  51                         <table width=400px bgcolor=#e0e0e0 border=1>
  52                         <tr><th>Links:</th></tr>
  53                         <tr><td>
  54                         <ul>
  55                         <li> <a href=http://git-scm.com/>Git's homepage</a>
  56                         <li> <a href=http://gitster.livejournal.com/>Junio's blog</a>
  57                         <li> <a href=http://www.spearce.org/>Shawn's blog</a> seems to be sitting
  58                               idle ever since he started working for Google...
  59                         <li> <a href=http://torvalds-family.blogspot.com/>Linus' blog</a> does not
  60                               talk much about Git...
  61                         <li> Scott Chacon's <a href=http://whygitisbetterthanx.com/>Why Git is better
  62                              than X</a> site
  63                         <li> <a href=http://vilain.net/>The blog of mugwump</a>
  64                         <li> <a href=http://blogs.gnome.org/newren/>Elijah Newren</a> chose the
  65                               same path as Cogito, offering an alternative porcelain (an approach
  66                               that is doomed in my opinion)
  67                         <li> <a href=http://msysgit.googlecode.com/>The msysGit project</a>, a (mostly)
  68                               failed experiment to lure the many Windows developers out there to
  69                               contribute to Open Source for a change.
  70                         </ul>
  71                         </td></tr></table>
  72                         <br>
  73                         <table width=400px bgcolor=#e0e0e0 border=1>
  74                         <tr><th>Google Ads:</th></tr>
  75                         <tr><td>
  76                         <script type="text/javascript"><!--
  77                         google_ad_client = "pub-5106407705643819";
  78                         /* 300x250, created 1/22/09 */
  79                         google_ad_slot = "6468207338";
  80                         google_ad_width = 300;
  81                         google_ad_height = 250;
  82                         //-->
  83                         </script>
  84                         <script type="text/javascript"
  85                         src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
  86                         </script>
  87                         </td></tr></table>
  88                         </div>
  89                         <h6>Friday, 30th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
  90                         <a name=1233277286>
  91                         <h2>More valgrind fun</h2>
  92
  93                         <p>
  94                         </p><p>
  95                         So I spent quite a number of hours on that funny zlib/valgrind issue.  The
  96                         thing is, zlib people claim that even if their code accesses uninitialized
  97                         memory, it does not produce erroneous data (by cutting out the results of the
  98                         uninitialized data, which is cheaper than checking for the end of the buffer
  99                         in an unaligned manner), so zlib will always be special for valgrind.
 100                         </p><p>
 101                         However, the bug I was chasing is funny, and different from said issue.  zlib
 102                         deflates an input buffer to an output buffer that is exactly 58 bytes long.
 103                         But valgrind claims that the 52nd of those bytes is uninitialized, and <u>only</u>
 104                         that one.
 105                         </p><p>
 106                         But it is not.  It must be 0x2c, otherwise zlib refuses to inflate the
 107                         buffer.
 108                         </p><p>
 109                         Now, I went into a debugging frenzy, and finally found out that zlib just
 110                         passes fine (with the default suppressions because of the "cute" way it
 111                         uses uninitialized memory), <u>except</u> when it is compiled with UNALIGNED_OK
 112                         defined.
 113                         </p><p>
 114                         Which Ubuntu does, of course.  Ubuntu, the biggest forker of all.
 115                         </p><p>
 116                         The bad part is that it sounds like a bug in valgrind, and I <u>could</u> imagine
 117                         that it is an issue of an optimized memcpy() that copies int by int, and
 118                         that valgrind misses out on the fact that a part of that int is actually
 119                         <u>not</u> uninitialized.
 120                         </p><p>
 121                         But my debugging session's results disagree with that.
 122                         </p><p>
 123                         With the help of Julian Seward, the original author of valgrind, I instrumented
 124                         zlib's source code so that valgrind checks earlier if the byte is initialized
 125                         or not, to find out where the cause of the issue lies.
 126                         </p><p>
 127                         The sad part is that when I added the instrumentation to both the <u>end</u> of
 128                         the while() loop in compress_block() in zlib's trees.c, and just <u>after</u> the
 129                         while() loop (whose condition is a plain <i>variable &lt; variable</i> comparison,
 130                         nothing fancy, certainly not changing any memory), only the <u>latter</u> catches
 131                         a valgrind error.
 132                         </p><p>
 133                         And that is truly strange.
 134                         </p>
 135                         <h6>Thursday, 29th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
 136                         <a name=1233193467>
 137                         <h2>Interactive stash</h2>
 138
 139                         <p>
 140                         </p><p>
 141                         There is an easy way to split a patch:
 142                         </p><p>
 143                         <table
 144                                                         border=1 bgcolor=black>
 145                                                 <tr><td bgcolor=lightblue colspan=3>
 146                                 <pre>                                                                                </pre>
 147                         </td></tr>
 148                         <tr><td>
 149                                 <table cellspacing=5 border=0
 150                                          style="color:white;">
 151                                 <tr><td>
 152                                         <pre>
 153 $ git reset HEAD^
 154 $ git add -i
 155 $ git commit
 156 $ git diff -R HEAD@{1} | git apply --index
 157 $ git commit
 158 </pre>
 159                                                         </td></tr>
 160                                                         </table>
 161                                                 </td></tr>
 162                                                 </table>
 163                         </p><p>
 164                         but it misses out on the fact that the first of the two commits does not
 165                         reflect the state of the working directory at any time.
 166                         </p><p>
 167                         So I think something like an interactive <i>stash</i> is needed.  A method
 168                         to specify what you want to keep in the working directory, the rest should
 169                         be stashed.  The idea would be something like this:
 170                         </p><p>
 171                         <ol>
 172                         <li>Add the desired changes into a temporary index.
 173                         <li>Put the rest of the changes in another temporary index.
 174                         <li>Stash the latter index.
 175                         <li>Synchronize the working directory with the first index.
 176                         <li>Clean up temporary indices.
 177                         </ol>
 178                         </p><p>
 179                         Or in code:
 180                         </p><p>
 181                         <table
 182                                                         border=1 bgcolor=black>
 183                                                 <tr><td bgcolor=lightblue colspan=3>
 184                                 <pre>                                                                                </pre>
 185                         </td></tr>
 186                         <tr><td>
 187                                 <table cellspacing=5 border=0
 188                                          style="color:white;">
 189                                 <tr><td>
 190                                         <pre>
 191 $ cp .git/index .git/interactive-stash-1
 192 $ GIT_INDEX_FILE=.git/interactive-stash-1 git add -i
 193 $ cp .git/index .git/interactive-stash-2
 194 $ GIT_INDEX_FILE=.git/interactive-stash-1 git diff -R |
 195         (GIT_INDEX_FILE=.git/interactive-stash-2 git apply --index)
 196 $ tree=$(GIT_INDEX_FILE=.git/index git write-tree)
 197 $ commit=$(echo Current index | git commit-tree $tree -p HEAD)
 198 $ tree=$(GIT_INDEX_FILE=.git/interactive-stash-2 git write-tree)
 199 $ commit=$(echo Edited out | git commit-tree $tree -p HEAD -p $commit)
 200 $ git update-ref refs/stash $commit
 201 $ GIT_INDEX_FILE=.git/interactive-stash-1 git checkout-index -a -f
 202 $ rm .git/interactive-stash-1 .git/interactive-stash-2
 203 </pre>
 204                                                         </td></tr>
 205                                                         </table>
 206                                                 </td></tr>
 207                                                 </table>
 208                         </p><p>
 209                         This should probably go into <i>git-stash.sh</i>, maybe even with a switch
 210                         to start git-gui to do the interactive adding instead of git-add.
 211                         </p>
 212                         <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Monkey</h6>
 213                         <a name=1233154567>
 214                         <h2>Splitting topic branches</h2>
 215
 216                         <p>
 217                         </p><p>
 218                         One might be put off easily by the overarching use of buzzwords in the
 219                         description of how <i>Darcs</i> works.  I, for one, do not expect an intelligent
 220                         author when I read <i>Theory of patches</i> and <i>based on quantum physics</i>.
 221                         </p><p>
 222                         The true story, however, is much simpler, and is actually not that dumb:
 223                         Let's call two commits "conflicting" when they contain at least one
 224                         overlapping change.
 225                         </p><p>
 226                         The idea is now: given a list of commits (not a set, as the order is important),
 227                         sort them into smaller lists such that conflicting commits end up in the same
 228                         sublist (a "topic branch") and the sublists are minimal, i.e. no two
 229                         non-conflicting commits are in the same sublist.
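                        </p><p>
                        A rough sketch of that grouping, under a deliberately coarse assumption:
                        two commits count as "conflicting" as soon as they touch the same path
                        (real overlap detection would have to look at hunks).  The function name
                        <i>split_topics</i> and the input format are made up for illustration, and
                        paths containing spaces are not handled.
                        </p>

```shell
# Input on stdin: one "commit path" pair per line, oldest commit first.
# Such a list could come from a loop over `git diff-tree --name-only -r`;
# that part is left out so the sketch runs without a repository.
# Output: "group commit" lines in the original order.  Commits with the
# same group number are linked by shared paths, directly or via a chain.
split_topics () {
    awk '
    # Follow parent links until we reach the representative of a group.
    function find(g) { while (parent[g] != g) g = parent[g]; return g }
    {
        c = $1; f = $2
        if (!(c in group)) {
            # First time we see this commit: give it a group of its own.
            order[++n] = c
            group[c] = ++ngroups
            parent[ngroups] = ngroups
        }
        g = find(group[c])
        if (f in owner) {
            # An earlier commit touched this path: merge the two groups,
            # keeping the smaller (older) group number.
            h = find(owner[f])
            if (h < g) { parent[g] = h; g = h }
            else if (h > g) parent[h] = g
        }
        owner[f] = g
    }
    END { for (i = 1; i <= n; i++) print find(group[order[i]]), order[i] }
    '
}
```

                        <p>
                        Feeding it the pairs "A x", "B y", "C x", "D z", "D y" puts A and C into
                        one group and B and D into another, which is the kind of explicit,
                        fixable result the rebase script is supposed to produce.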
 230                         </p><p>
 231                         The idea has flaws, of course, as you can have a patch changing the code,
 232                         and another changing the documentation, but splitting a list of commits
 233                         in that way is a first step to sort out my <i>my-next</i> mess, where I have
 234                         a linear string of not-necessarily-dependent commits.
 235                         </p><p>
 236                         And actually, my whole rebase revamp aimed at the clean-up for my own
 237                         <i>my-next</i> branch, so I am currently writing a script that can be used
 238                         as a GIT_EDITOR for git-rebase which implements the Darcs algorithm.  Kind of:
 239                         the result is not implicit, but explicit and can be fixed up later.
 240                         </p>
 241                         <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
 242                         <a name=1233102919>
 243                         <h2>Showing off that you're an Alpine user ... priceless!</h2>
 244
 245                         <p>
 246                         </p><p>
 247                         So I was in a hurry to send the patches, and sent all the patches as replies
 248                         to the cover-letter, and therefore typed in <i>rnyn</i> all the time, which is the
 249                         mantra I need to say to Alpine for <i>Reply</i>, ... include quoted message?
 250                         <i>No</i>, ... reply to all recipients? <i>Yes</i>, ... use first role?
 251                         <i>No, use default role</i>.
 252                         </p><p>
 253                         That was pretty embarrassing, as it shows everybody that I still do not trust
 254                         <i>send-email</i>, and rather paste every single patch by hand.  Which is rather
 255                         annoying.
 256                         </p><p>
 257                         So I started using format-patch today, to output directly to Alpine's
 258                         <i>postponed-msgs</i> folder, so that I can do some touchups in the mailer
 259                         before sending the patch series on its way.
 260                         </p><p>
 261                         However, when running format-patch with <i>--thread</i>, it generates Message-ID
 262                         strings that Alpine does not like, and therefore replaces.
 263                         </p><p>
 264                         Oh, well, I'll probably just investigate how the Message-IDs are supposed to
 265                         look, and then use sed to replace the generated ones with Alpine-friendly ones
 266                         during the redirection to <i>postponed-msgs</i>.
 267                         </p><p>
 268                         But I already realized that doing it that way is dramatically faster than the
 269                         workflow I had before.
 270                         </p><p>
 271                         And safer: no more <i>rnyn</i>.
 272                         </p>
 273                         <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Buffalo</h6>
 274                         <a name=1233101919>
 275                         <h2>Progress with the interactive rebase preserving merges</h2>
 276
 277                         <p>
 278                         </p><p>
 279                         I thought about the "dropped" commits a bit more, after all, and it is
 280                         probably a good thing to substitute them by their parent, as Stephen did.
 281                         </p><p>
 282                         Imagine that you have merged a branch with two commits.  One is in upstream,
 283                         and you want to rebase (preserving merges) onto upstream.  Then you still
 284                         want to merge the single commit.
 285                         </p><p>
 286                         Even better, if there is no commit left, the <i>$REWRITTEN</i> mechanism will
 287                         substitute the commit onto which we are rebasing, so a merge will just
 288                         result in a fast-forward!
 289                         </p><p>
 290                         Oh, another thing: merge commits should not have a patch id, as they have
 291                         <u>multiple</u> patches.  However, I borked the code a long time ago (9c6efa36)
 292                         and merges get the patch-id of their diff to the first parent.  Which is
 293                         probably wrong.  So I guess I'll have to fix that with my rebase revamp.
 294                         </p><p>
 295                         So what about a root commit?  If that was dropped, we will just substitute
 296                         it with the commit onto which we rebase (as a root commit did not really
 297                         have a parent, but will get the onto-commit as new parent).
 298                         </p><p>
 299                         Now that I finally realized that t3410 is so strange because of a bug <u>I</u>
 300                         introduced, I can finally go about fixing it.
 301                         </p>
 302                         <h6>Wednesday, 28th of January, Anno Domini MMIX, at the hour of the Rat</h6>
 303                         <a name=1233099894>
 304                         <h2>Another midnight riddle?</h2>
 305
 306                         <p>
 307                         </p><p>
 308                         Okay, here's another riddle: what is the next line?
 309                         </p><p>
 310 <pre>
 311        1
 312       1 1
 313       2 1
 314     1 1 1 2
 315     3 1 1 2
 316   2 1 1 2 1 3
 317 ...
 318 </pre>
 319                         </p><p>
 320                         And when does the line get wider than 10 digits?
 321                         </p>
 322                         <h6>Tuesday, 27th of January, Anno Domini MMIX, at the hour of the Tiger</h6>
 323                         <a name=1233022809>
 324                         <h2>Fun with calculus after midnight</h2>
 325
 326                         <p>
 327                         </p><p>
 328                         Problem: what is the shortest way of defining a variable consisting of <i>N</i>
 329                         spaces?  I.e. for <i>N=80</i> the result will look something like
 330                         </p><p>
 331                         <table
 332                                                         border=1 bgcolor=black>
 333                                                 <tr><td bgcolor=lightblue colspan=3>
 334                                 <pre>                                                                                </pre>
 335                         </td></tr>
 336                         <tr><td>
 337                                 <table cellspacing=5 border=0
 338                                          style="color:white;">
 339                                 <tr><td>
 340                                         <pre>
 341 s='    '
 342 s="$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s"
 343 </pre>
 344                                                         </td></tr>
 345                                                         </table>
 346                                                 </td></tr>
 347                                                 </table>
 348                         </p><p>
 349                         Let's see.  Let the minimal number of characters needed be <i>A(N)</i>.  For
 350                         simplicity, let's say that we only use one variable.  Then, certainly, <i>A(N)</i>
 351                         cannot be larger than <i>5+N</i>, as we could define a variable using 1 character
 352                         for the name, 1 for the equal sign, 2 for the quotes, and one for the semicolon
 353                         or newline character (whichever).
 354                         </p><p>
 355                         Now, let's assume <i>N</i> is a product <i>K*L</i>.  Then certainly, <i>A(N)</i> cannot
 356                         be larger than <i>A(K)+5+2*L</i>, as we could first define a variable that has
 357                         exactly <i>K</i> spaces and then use that to define the end result (in the example
 358                         above, <i>K=5</i> and <i>L=20</i>).
 359                         </p><p>
 360                         So, for which <i>N=K*L</i> is it better to use two definitions instead of one?
 361                         </p><p>
 362                         Simple calculus says that <i>5+K*L>5+K+5+2*L</i> must hold true, or (after some
 363                         scribbling): <i>L>1+7/(K-2)</i>. Which means that it makes no sense to define
 364                         a variable with 1 or 2 spaces first, which is kinda obvious (writing '$s'
 365                         alone would use two characters, so we could write the spaces right away).
 366                         </p><p>
 367                         But what about the other values?  For <i>K=3</i>, <i>L</i> must be at least 9 to make
 368                         sense (in other words, <i>N</i> must be at least 27).  For <i>K=4</i>, <i>L</i> needs
 369                         to be greater than or equal to 5 (<i>N>=20</i>); the next pairs are <i>(5,4)</i>,
 370                         <i>(6,3)</i>, <i>(7,3)</i>, <i>(8,3)</i>, <i>(9,3)</i> and starting with <i>K=10</i>, any
 371                         <i>L>1</i> makes sense.
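                        </p><p>
                        These thresholds can be double-checked with a few lines of shell.  This is
                        only a sketch: the helper name <i>min_l</i> is made up, and it just finds the
                        smallest integer <i>L</i> with <i>L*(K-2) &gt; K+5</i>, which is the inequality
                        above with both sides multiplied by <i>K-2</i>.
                        </p>

```shell
# min_l K: smallest integer L for which two definitions beat one,
# i.e. the smallest L with L*(K-2) > K+5.  For K <= 2 no L works.
min_l () {
    k=$1
    if [ "$k" -le 2 ]; then
        echo never
        return
    fi
    # floor((K+5)/(K-2)) + 1 is the first integer strictly above the bound
    echo $(( (k + 5) / (k - 2) + 1 ))
}

for k in 1 2 3 4 5 6 7 8 9 10; do
    echo "K=$k: L >= $(min_l "$k")"
done
```

                        <p>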
 372                         </p><p>
 373                         The second definition can also contain spaces at the end, however, so for any
 374                         <i>N=K*L+M</i>, <i>A(N)</i> cannot be larger than <i>A(K)+5+2*L+M</i>.
 375                         </p><p>
 376                         Not surprisingly, this leads to exactly the same <i>L>1+7/(K-2)</i> (as we can
 377                         append the <i>M</i> spaces in the last definition, no matter if we use 1 or
 378                         2 definitions).
 379                         </p><p>
 380                         However, that means that as soon as <i>N>=18</i>, we should use two definitions;
 381                         prior to that, it makes no sense.
 382                         </p><p>
 383                         So for <i>N&lt;18</i>, <i>A(N)=5+N</i>.
 384                         </p><p>
 385                         But what <i>K</i> should one choose, i.e. how many spaces in the first definition?
 386                         In other words, what is <i>A(N)</i> given that we use two definitions?
 387                         </p><p>
 388                         That will have to wait for another midnight.  Just a teaser: <i>A(80)=36</i>.  Oh,
 389                         and with 80 characters, you can define a string of 9900 spaces...
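                        </p><p>
                        For the impatient, here is a brute-force sketch that evaluates the bounds
                        above for every <i>K</i>, <i>L</i> and <i>M</i>.  It assumes exactly the scheme
                        described so far: a single variable, where each definition is either plain
                        spaces (cost <i>5+N</i>) or <i>L</i> copies of <i>$s</i> followed by <i>M</i> spaces
                        (cost <i>5+2*L+M</i> on top of the earlier definitions).  The name <i>a_of</i>
                        is made up, and cleverer schemes are not considered.
                        </p>

```shell
# a_of N: minimal number of characters to define N spaces, under the
# single-variable scheme only.  Results for smaller values are kept in
# the shell variables A_1, A_2, ... (plain POSIX sh has no arrays).
a_of () {
    n_max=$1
    n=1
    while [ "$n" -le "$n_max" ]; do
        best=$((5 + n))                 # one definition, all spaces literal
        k=1
        while [ "$k" -lt "$n" ]; do
            eval "ak=\$A_$k"            # best known cost for K spaces
            l=2
            while [ $((k * l)) -le "$n" ]; do
                # L copies of $s plus M = N - K*L literal spaces
                cost=$((ak + 5 + 2 * l + n - k * l))
                if [ "$cost" -lt "$best" ]; then
                    best=$cost
                fi
                l=$((l + 1))
            done
            k=$((k + 1))
        done
        eval "A_$n=$best"
        n=$((n + 1))
    done
    echo "$best"
}
```

                        <p>
                        Running <i>a_of 80</i> is a quick way to compare this scheme against the
                        teaser value.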
 390                         </p>
 391                         <h6>Monday, 26th of January, Anno Domini MMIX, at the hour of the Dog</h6>
 392                         <a name=1232997290>
 393                         <h2>Valgrind takes a loooong time</h2>
 394
 395                         <p>
 396                         </p><p>
 397                         Yesterday, I started a run on a fast machine, and it took roughly 5.5
 398                         hours by the machine's clock.
 399                         </p><p>
 400                         And of course, I redirected stdout only... *sigh*
 401                         </p><p>
 402                         Which triggered a Google search for how to force redirection of all the output
 403                         in the test scripts to a file and the terminal at the same time.
 404                         </p><p>
 405                         It seems as if that is not easily done.  I tried
 406                         <center><table
 407                                                         border=1 bgcolor=black>
 408                                                 <tr><td bgcolor=lightblue colspan=3>
 409                                 <pre>                                                                                </pre>
 410                         </td></tr>
 411                         <tr><td>
 412                                 <table cellspacing=5 border=0
 413                                          style="color:white;">
 414                                 <tr><td>
 415                                         <pre>
 416 exec >(tee out) 2>&1
 417 </pre>
 418                                                         </td></tr>
 419                                                         </table>
 420                                                 </td></tr>
 421                                                 </table></center>
 422                         </p><p>
 423                         but that did not work: it mumbled something about invalid file handles or some
 424                         such.
 425                         </p><p>
 426                         The only solution I found was:
 427                         <center><table
 428                                                         border=1 bgcolor=black>
 429                                                 <tr><td bgcolor=lightblue colspan=3>
 430                                 <pre>                                                                                </pre>
 431                         </td></tr>
 432                         <tr><td>
 433                                 <table cellspacing=5 border=0
 434                                          style="color:white;">
 435                                 <tr><td>
 436                                         <pre>
 437 mkfifo pipe
 438 tee out < pipe &
 439 exec > pipe 2>&1
 440 </pre>
 441                                                         </td></tr>
 442                                                         </table>
 443                                                 </td></tr>
 444                                                 </table></center>
 445                         </p><p>
 446                         That is a problem for parallel execution, though, so I am still looking for a
 447                         better way to do it.
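                        </p><p>
                        One possible way around the clash between parallel runs is to give every
                        script its own FIFO in a private directory.  This is a sketch, not something
                        that is in the test suite: the function name <i>tee_all</i> is made up, and
                        it assumes that <i>mktemp -d</i> is available (it is common, but not part of
                        POSIX).
                        </p>

```shell
# tee_all LOG: from now on, send stdout and stderr of the current shell
# both to the terminal and to the file LOG.
tee_all () {
    log=$1
    # A private directory, so that parallel scripts never share a FIFO.
    fifo_dir=$(mktemp -d) || return 1
    fifo=$fifo_dir/pipe
    mkfifo "$fifo" || return 1
    # The reader: copies everything from the FIFO to LOG and to the
    # original stdout.
    tee "$log" <"$fifo" &
    tee_pid=$!
    # The writer: this blocks until tee has opened the FIFO for reading.
    exec >"$fifo" 2>&1
    # Both ends are open now, so the name on disk is no longer needed.
    rm -rf "$fifo_dir"
}
```

                        <p>
                        A script would call <i>tee_all out</i> once at the top.  Before exiting, it
                        can redirect its output elsewhere and <i>wait</i> for <i>$tee_pid</i>, so that
                        tee gets a chance to flush the last lines.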
 448                         </p><p>
 449                         Once I have the output, it is relatively easy to analyze it, as I already
 450                         made a script which dissects the output into valgrind output and the test
 451                         case it came from, then groups by common valgrind output and shows the
 452                         result to the user.
 453                         </p>
 454                         <h6>Monday, 26th of January, Anno Domini MMIX, at the hour of the Rat</h6>
 455                         <a name=1232927812>
 456                         <h2>A day full of rebase... and a little valgrind</h2>
 457
 458                         <p>
 459                         </p><p>
 460                         I think that I am progressing nicely with my rebase -p work, so much so
 461                         that I will soon be able to use it myself to work on topic branches <u>and</u>
 462                         rebase all the time without much hassle.
 463                         </p><p>
 464                         In other words, I would like to be able to rebase all my topic branches
 465                         to Junio's <i>next</i> branch whenever that has new commits.  With a single
 466                         rebase.
 467                         </p><p>
 468                         And finally, I got the idea of the thing Stephen implemented for dropped
 469                         commits; however, I am quite sure I do not like it.
 470                         </p><p>
 471                         So what are "dropped" commits?
 472                         </p><p>
 473                         When you rebase, chances are that the upstream already has applied at
 474                         least some of your patches.  So we filter those out with <i>--cherry-pick</i>.
 475                         Stephen calls those "dropped" commits.
 476                         </p><p>
 477                         Then he goes on to reinvent the "$REWRITTEN" system: a directory containing
 478                         the mappings of old commit names to new commit names.  That is easily fixed.
 479                         </p><p>
 480                         But worse, he substitutes the dropped commits with their <u>parents</u>, instead
 481                         of substituting them with the corresponding commits in upstream.
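                        </p><p>
                        To make the difference concrete, here is a toy model in Python (not git's
                        code, and not Stephen's).  Commit names and patch ids are invented strings,
                        <i>find_dropped()</i> and <i>rewrite_parent()</i> are hypothetical helpers,
                        and following the parent chain past several dropped commits is my own
                        assumption about how the "parent" strategy would have to behave.
                        </p>

```python
# Toy model: commit names are plain strings, and "patch ids" stand in for
# what git computes from a commit's diff.  All data below is made up.

def find_dropped(local, upstream, patch_id):
    """Return {local commit: upstream commit} for patches already upstream."""
    upstream_by_patch = {patch_id[c]: c for c in upstream}
    return {c: upstream_by_patch[patch_id[c]]
            for c in local if patch_id[c] in upstream_by_patch}

def rewrite_parent(parent, dropped, first_parent, strategy):
    """Pick the commit that replaces `parent` if it was dropped."""
    if parent not in dropped:
        return parent
    if strategy == "parent":
        # Walk up until we reach a commit that was not dropped (assumption).
        while parent in dropped:
            parent = first_parent[parent]
        return parent
    # strategy == "upstream": use the equivalent commit in upstream.
    return dropped[parent]

# Local topic branch: base <- A <- B <- C; upstream already contains B as U2.
patch_id = {"A": "p1", "B": "p2", "C": "p3", "U1": "p9", "U2": "p2"}
first_parent = {"A": "base", "B": "A", "C": "B"}
dropped = find_dropped(["A", "B", "C"], ["U1", "U2"], patch_id)

parent_of_c_old_style = rewrite_parent("B", dropped, first_parent, "parent")
parent_of_c_new_style = rewrite_parent("B", dropped, first_parent, "upstream")
```

                        <p>
                        In this model, C ends up on top of A with the "parent" strategy, but on top
                        of U2 (the upstream twin of B) with the "upstream" strategy, which is what
                        I would want.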
 482                         </p><p>
 483                         I guess this will be a medium-sized fight on the mailing list, depending
 484                         on how much energy Stephen wants to put in to defend his strategy.
 485                         </p><p>
 486                         Anyway, I finally got to a point where only three of the tests are failing,
 487                         t3404, t3410 and t3412.  Somewhat disappointing is t3404, as its name pretends
 488                         not to exercise -p at all.  Oh well, I guess I'll see what is broken tomorrow.
 489                         </p><p>
 490                         Another part of the day was dedicated to the Valgrind patch series, which
 491                         should give us yet another level of code quality.
 492                         </p><p>
 493                         After having confused myself with several diverging/obsolete branches, I did
 494                         indeed finally manage to send that patch series off.  Woohoo.
 495                         </p>
 496                         <h6>Sunday, 25th of January, Anno Domini MMIX, at the hour of the Goat</h6>
 497                         <a name=1232888842>
 498                         <h2>Regular diff with word coloring (as opposed to word diff)</h2>
 499
 500                         <p>
 501                         </p><p>
 502                         You know, if I were a bit faster with everything I do, I could do so much more!
 503                         </p><p>
 504                         For example, Junio's idea that you could keep showing a regular diff, only
 505                         coloring the words that have been removed/added.
 506                         </p><p>
 507                         Just imagine looking at the diff of a long line in LaTeX source code.  It
 508                         should be much nicer to the eye to see the complete removed/added sentences
 509                         instead of one sentence with colored words in between, disrupting your
 510                         reading flow.
 511                         </p><p>
 512                         Compare these two versions:
 513                         </p><p>
 514                         Regular diff with colored words:
 515                         <blockquote><tt>
 516                         -This sentence has a <font color=red>tyop</font> in it.<br>
 517                         +This sentence has a <font color=green>typo</font> in it.<br>
 518                         </tt></blockquote>
 519                         </p><p>
 520                         Word diff:
 521                         <blockquote><tt>
 522                         This sentence has a <font color=red>tyop</font><font color=green>typo</font> in it.<br>
 523                         </tt></blockquote>
 524                         </p><p>
 525                         And it should not be hard to do at all!
 526                         </p><p>
 527                         In <i>diff_words_show()</i>, we basically get the minus lines as
 528                         <i>diff_words->minus</i> and the plus lines as <i>diff_words->plus</i>.  The
 529                         function then prepares the word lists and calls the xdiff engine to do all the
 530                         hard work, analyzing the result from xdiff and printing the lines in
 531                         <i>fn_out_diff_words_aux()</i>.
 532                         </p><p>
 533                         So all that would have to be changed would be to <u>record</u> the positions
 534                         of the removed/added words instead of outputting them, and to print the
 535                         minus/plus buffers at the end, using the recorded information to color the words.
 536                         </p><p>
 537                         This would involve
 538                         </p><p>
 539                         <ul>
 540                         <li>adding two new members holding the offsets in the <i>diff_words</i>
 541                         struct,
 542                         <li>having a special handling for that mode in
 543                         <i>fn_out_diff_words_aux()</i> that appends the offsets and
 544                         returns,
 545                         <li>adding a function <i>show_lines_with_colored_words()</i> that
 546                         outputs a buffer with a given prefix ('-' or '+') and coloring the words at
 547                         given offsets with a given color,
 548                         <li>modifying <i>diff_words_show()</i> to call that function for the "special
 549                         case: only removal" and at the end of the function, and
 550                         <li> disabling the <i>fwrite()</i> at the end of <i>diff_words_show()</i> for that
 551                         mode.
 552                         </ul>
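                        </p><p>
                        The record-then-print idea can be sketched outside of git like this.  It is
                        a Python illustration only, not a patch to diff.c: <i>difflib</i> stands in
                        for the xdiff engine, <i>word_spans()</i> and
                        <i>record_changed_words()</i> are hypothetical helpers, and the Python
                        <i>show_lines_with_colored_words()</i> merely borrows the name proposed
                        above.  Words are assumed to be runs of non-whitespace.
                        </p>

```python
import difflib
import re

RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[m"

def word_spans(text):
    """Return (word, start, end) for every run of non-whitespace."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def record_changed_words(minus, plus):
    """Record offsets of removed/added words instead of printing them."""
    minus_words, plus_words = word_spans(minus), word_spans(plus)
    matcher = difflib.SequenceMatcher(
        None,
        [w[0] for w in minus_words],
        [w[0] for w in plus_words],
        autojunk=False,
    )
    minus_spans, plus_spans = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        minus_spans += [(start, end) for _, start, end in minus_words[i1:i2]]
        plus_spans += [(start, end) for _, start, end in plus_words[j1:j2]]
    return minus_spans, plus_spans

def show_lines_with_colored_words(buffer, prefix, spans, color):
    """Return `buffer` with every line prefixed and the given spans colored.

    `spans` must be sorted (start, end) offsets that do not cross newlines.
    """
    pieces, pos = [], 0
    for start, end in spans:
        pieces.append(buffer[pos:start])
        pieces.append(color + buffer[start:end] + RESET)
        pos = end
    pieces.append(buffer[pos:])
    text = "".join(pieces)
    return "".join(prefix + line for line in text.splitlines(keepends=True))

minus = "This sentence has a tyop in it.\n"
plus = "This sentence has a typo in it.\n"
minus_spans, plus_spans = record_changed_words(minus, plus)
colored = (show_lines_with_colored_words(minus, "-", minus_spans, RED)
           + show_lines_with_colored_words(plus, "+", plus_spans, GREEN))
```

                        <p>
                        Here <i>colored</i> holds the complete minus line followed by the complete
                        plus line, with only "tyop" and "typo" wrapped in color escapes, which is
                        the first of the two versions compared above.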
 553                         </p><p>
 554                         Of course, the hardest part is to find a nice user interface for that.  Maybe
 555                         <i>--colored-words</i>? &#x263a;
 556                         </p>
 557                 </div>
 558         </body>
 559 </html>