Documentation/technical/remembering-renames.txt

   1 Rebases and cherry-picks involve a sequence of merges whose results are
   2 recorded as new single-parent commits.  The first parent side of those
   3 merges represent the "upstream" side, and often include a far larger set of
   4 changes than the second parent side.  Traditionally, the renames on the
   5 first-parent side of that sequence of merges were repeatedly re-detected
   6 for every merge.  This file explains why it is safe and effective during
   7 rebases and cherry-picks to remember renames on the upstream side of
   8 history as an optimization, assuming all merges are automatic and clean
   9 (i.e. no conflicts and not interrupted for user input or editing).
  10
  11 Outline:
  12
  13   0. Assumptions
  14
  15   1. How rebasing and cherry-picking work
  16
  17   2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
  18      superset of the renames on MERGE_SIDE1 for the next pick.
  19
  20   3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
  21      a rename on MERGE_SIDE1 for the next pick
  22
  23   4. A detailed description of the the counter-examples to #3.
  24
  25   5. Why the special cases in #4 are still fully reasonable to use to pair
  26      up files for three-way content merging in the merge machinery, and why
  27      they do not affect the correctness of the merge.
  28
  29   6. Interaction with skipping of "irrelevant" renames
  30
  31   7. Additional items that need to be cached
  32
  33   8. How directory rename detection interacts with the above and why this
  34      optimization is still safe even if merge.directoryRenames is set to
  35      "true".
  36
  37
  38 === 0. Assumptions ===
  39
  40 There are two assumptions that will hold throughout this document:
  41
  42   * The upstream side where commits are transplanted to is treated as the
  43     first parent side when rebase/cherry-pick call the merge machinery
  44
  45   * All merges are fully automatic
  46
  47 and a third that will hold in sections 2-5 for simplicity, that I'll later
  48 address in section 8:
  49
  50   * No directory renames occur
  51
  52
  53 Let me explain more about each assumption and why I include it:
  54
  55
  56 The first assumption is merely for the purposes of making this document
  57 clearer; the optimization implementation does not actually depend upon it.
  58 However, the assumption does hold in all cases because it reflects the way
  59 that both rebase and cherry-pick were implemented; and the implementation
  60 of cherry-pick and rebase are not readily changeable for backwards
  61 compatibility reasons (see for example the discussion of the --ours and
  62 --theirs flag in the documentation of `git checkout`, particularly the
  63 comments about how they behave with rebase).  The optimization avoids
  64 checking first-parent-ness, though.  It checks the conditions that make the
  65 optimization valid instead, so it would still continue working if someone
  66 changed the parent ordering that cherry-pick and rebase use.  But making
  67 this assumption does make this document much clearer and prevents me from
  68 having to repeat every example twice.
  69
  70 If the second assumption is violated, then the optimization simply is
  71 turned off and thus isn't relevant to consider.  The second assumption can
  72 also be stated as "there is no interruption for a user to resolve conflicts
  73 or to just further edit or tweak files".  While real rebases and
  74 cherry-picks are often interrupted (either because it's an interactive
  75 rebase where the user requested to stop and edit, or because there were
  76 conflicts that the user needs to resolve), the cache of renames is not
  77 stored on disk, and thus is thrown away as soon as the rebase or cherry
  78 pick stops for the user to resolve the operation.
  79
  80 The third assumption makes sections 2-5 simpler, and allows people to
  81 understand the basics of why this optimization is safe and effective, and
  82 then I can go back and address the specifics in section 8.  It is probably
  83 also worth noting that if directory renames do occur, then the default of
  84 merge.directoryRenames being set to "conflict" means that the operation
  85 will stop for users to resolve the conflicts and the cache will be thrown
  86 away, and thus that there won't be an optimization to apply.  So, the only
  87 reason we need to address directory renames specifically, is that some
  88 users will have set merge.directoryRenames to "true" to allow the merges to
  89 continue to proceed automatically.  The optimization is still safe with
  90 this config setting, but we have to discuss a few more cases to show why;
  91 this discussion is deferred until section 8.
  92
  93
  94 === 1. How rebasing and cherry-picking work ===
  95
  96 Consider the following setup (from the git-rebase manpage):
  97
  98                      A---B---C topic
  99                     /
 100                D---E---F---G main
 101
 102 After rebasing or cherry-picking topic onto main, this will appear as:
 103
 104                              A'--B'--C' topic
 105                             /
 106                D---E---F---G main
 107
 108 The way the commits A', B', and C' are created is through a series of
 109 merges, where rebase or cherry-pick sequentially uses each of the three
 110 A-B-C commits in a special merge operation.  Let's label the three commits
 111 in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2.  For
 112 this picture, the three commits for each of the three merges would be:
 113
 114 To create A':
 115    MERGE_BASE:   E
 116    MERGE_SIDE1:  G
 117    MERGE_SIDE2:  A
 118
 119 To create B':
 120    MERGE_BASE:   A
 121    MERGE_SIDE1:  A'
 122    MERGE_SIDE2:  B
 123
 124 To create C':
 125    MERGE_BASE:   B
 126    MERGE_SIDE1:  B'
 127    MERGE_SIDE2:  C
 128
 129 Sometimes, folks are surprised that these three-way merges are done.  It
 130 can be useful in understanding these three-way merges to view them in a
 131 slightly different light.  For example, in creating C', you can view it as
 132 either:
 133
 134   * Apply the changes between B & C to B'
 135   * Apply the changes between B & B' to C
 136
 137 Conceptually the two statements above are the same as a three-way merge of
 138 B, B', and C, at least the parts before you decide to record a commit.
 139
 140
 141 === 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
 142 ===    superset of the renames on MERGE_SIDE1 for the next pick.     ===
 143
 144 The merge machinery uses the filenames it is fed from MERGE_BASE,
 145 MERGE_SIDE1, and MERGE_SIDE2.  It will only move content to a different
 146 filename under one of three conditions:
 147
 148   * To make both pieces of a conflict available to a user during conflict
 149     resolution (examples: directory/file conflict, add/add type conflict
 150     such as symlink vs. regular file)
 151
 152   * When MERGE_SIDE1 renames the file.
 153
 154   * When MERGE_SIDE2 renames the file.
 155
 156 First, let's remember what commits are involved in the first and second
 157 picks of the cherry-pick or rebase sequence:
 158
 159 To create A':
 160    MERGE_BASE:   E
 161    MERGE_SIDE1:  G
 162    MERGE_SIDE2:  A
 163
 164 To create B':
 165    MERGE_BASE:   A
 166    MERGE_SIDE1:  A'
 167    MERGE_SIDE2:  B
 168
 169 So, in particular, we need to show that the renames between E and G are a
 170 superset of those between A and A'.
 171
 172 A' is created by the first merge.  A' will only have renames for one of the
 173 three reasons listed above.  The first case, a conflict, results in a
 174 situation where the cache is dropped and thus this optimization doesn't
 175 take effect, so we need not consider that case.  The third case, a rename
 176 on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up
 177 in A -- therefore when diffing A and A' that path does not show up as a
 178 rename.  The only remaining way for renames to show up in A' is for the
 179 rename to come from MERGE_SIDE1.  Therefore, all renames between A and A'
 180 are a subset of those between E and G.  Equivalently, all renames between E
 181 and G are a superset of those between A and A'.
 182
 183
 184 === 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_   ===
 185 ===    always also a rename on MERGE_SIDE1 for the next pick.        ===
 186
 187 Let's again look at the first two picks:
 188
 189 To create A':
 190    MERGE_BASE:   E
 191    MERGE_SIDE1:  G
 192    MERGE_SIDE2:  A
 193
 194 To create B':
 195    MERGE_BASE:   A
 196    MERGE_SIDE1:  A'
 197    MERGE_SIDE2:  B
 198
 199 Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
 200 any given rename from E to G.  Let's use the filenames 'oldfile' and
 201 'newfile' for demonstration purposes.  That first pick will function as
 202 follows; when the rename is detected, the merge machinery will do a
 203 three-way content merge of the following:
 204     E:oldfile
 205     G:newfile
 206     A:oldfile
 207 and produce a new result:
 208     A':newfile
 209
 210 Note above that I've assumed that E->A did not rename oldfile.  If that
 211 side did rename, then we most likely have a rename/rename(1to2) conflict
 212 that will cause the rebase or cherry-pick operation to halt and drop the
 213 in-memory cache of renames and thus doesn't need to be considered further.
 214 In the special case that E->A does rename the file but also renames it to
 215 newfile, then there is no conflict from the renaming and the merge can
 216 succeed.  In this special case, the rename is not valid to cache because
 217 the second merge will find A:newfile in the MERGE_BASE (see also the new
 218 testcases in t6429 with "rename same file identically" in their
 219 description).  So a rename/rename(1to1) needs to be specially handled by
 220 pruning renames from the cache and decrementing the dir_rename_counts in
 221 the current and leading directories associated with those renames.  Or,
 222 since these are really rare, one could just take the easy way out and
 223 disable the remembering renames optimization when a rename/rename(1to1)
 224 happens.
 225
 226 The previous paragraph handled the cases for E->A renaming oldfile, let's
 227 continue assuming that oldfile is not renamed in A.
 228
 229 As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
 230 to A'.  So, we are curious whether A:oldfile and A':newfile will be viewed
 231 as renames.  Note that:
 232
 233   * There will be no A':oldfile (because there could not have been a
 234     G:oldfile as we do not do break detection in the merge machinery and
 235     G:newfile was detected as a rename, and by the construction of the
 236     rename above that merged cleanly, the merge machinery will ensure there
 237     is no 'oldfile' in the result).
 238
 239   * There will be no A:newfile (if there had been, we would have had a
 240     rename/add conflict).
 241
 242   * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
 243     clean three-way content merge involving A:oldfile).
 244
 245 We can also expound on the third point above, by noting that three-way
 246 content merges can also be viewed as applying the differences between the
 247 base and one side to the other side.  Thus we can view A':newfile as
 248 having been created by taking the changes between E:oldfile and G:newfile
 249 (which were detected as being related, i.e. <50% changed) to A:oldfile.
 250
 251 Thus A:oldfile and A':newfile are just as related as E:oldfile and
 252 G:newfile are -- they have exactly identical differences.  Since the latter
 253 were detected as renames, A:oldfile and A':newfile should also be
 254 detectable as renames almost always.
 255
 256
 257 === 4. A detailed description of the counter-examples to #3.         ===
 258
 259 We already noted in section 3 that rename/rename(1to1) (i.e. both sides
 260 renaming a file the same way) was one counter-example.  The more
 261 interesting bit, though, is why did we need to use the "almost" qualifier
 262 when stating that A:oldfile and A':newfile are "almost" always detectable
 263 as renames?
 264
 265 Let's repeat an earlier point that section 3 made:
 266
 267   A':newfile was created by applying the changes between E:oldfile and
 268   G:newfile to A:oldfile.  The changes between E:oldfile and G:newfile were
 269   <50% of the size of E:oldfile.
 270
 271 If those changes that were <50% of the size of E:oldfile are also <50% of
 272 the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
 273 renames.  However, if there is a dramatic size reduction between E:oldfile
 274 and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
 275 still somehow merge cleanly), then traditional rename detection would not
 276 detect A:oldfile and A':newfile as renames.
 277
 278 Here's an example where that can happen:
 279   * E:oldfile had 20 lines
 280   * G:newfile added 10 new lines at the beginning of the file
 281   * A:oldfile kept the first 3 lines of the file, and deleted all the rest
 282 then
 283   => A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
 284 E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
 285 A':newfile would not be.
 286
 287
 288 === 5. Why the special cases in #4 are still fully reasonable to use to    ===
 289 ===    pair up files for three-way content merging in the merge machinery, ===
 290 ===    and why they do not affect the correctness of the merge.            ===
 291
 292 In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
 293 since they use the *same* filename.  However, files with the same filename
 294 are obviously fine to pair up for three-way content merging (the merge
 295 machinery has never employed break detection).  The interesting
 296 counter-example case is thus not the rename/rename(1to1) case, but the case
 297 where A did not rename oldfile.  That was the case that we spent most of
 298 the time discussing in sections 3 and 4.  The remainder of this section
 299 will be devoted to that case as well.
 300
 301 So, even if A:oldfile and A':newfile aren't detectable as renames, why is
 302 it still reasonable to pair them up for three-way content merging in the
 303 merge machinery?  There are multiple reasons:
 304
 305   * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
 306     is *exactly* the same as the diff between E:oldfile and G:newfile.  The
 307     latter pair were detected as renames, so it seems unlikely to surprise
 308     users for us to treat A:oldfile and A':newfile as renames.
 309
 310   * In fact, "oldfile" and "newfile" were at one point detected as renames
 311     due to how they were constructed in the E..G chain.  And we used that
 312     information once already in this rebase/cherry-pick.  I think users
 313     would be unlikely to be surprised at us continuing to treat the files
 314     as renames and would quickly understand why we had done so.
 315
 316   * Marking or declaring files as renames is *not* the end goal for merges.
 317     Merges use renames to determine which files make sense to be paired up
 318     for three-way content merges.
 319
 320   * A:oldfile and A':newfile were _already_ paired up in a three-way
 321     content merge; that is how A':newfile was created.  In fact, that
 322     three-way content merge was clean.  So using them again in a later
 323     three-way content merge seems very reasonable.
 324
 325 However, the above is focusing on the common scenarios.  Let's try to look
 326 at all possible unusual scenarios and compare without the optimization to
 327 with the optimization.  Consider the following theoretical cases; we will
 328 then dive into each to determine which of them are possible,
 329 and if so, what they mean:
 330
 331   1. Without the optimization, the second merge results in a conflict.
 332      With the optimization, the second merge also results in a conflict.
 333      Questions: Are the conflicts confusingly different?  Better in one case?
 334
 335   2. Without the optimization, the second merge results in NO conflict.
 336      With the optimization, the second merge also results in NO conflict.
 337      Questions: Are the merges the same?
 338
 339   3. Without the optimization, the second merge results in a conflict.
 340      With the optimization, the second merge results in NO conflict.
 341      Questions: Possible?  Bug, bugfix, or something else?
 342
 343   4. Without the optimization, the second merge results in NO conflict.
 344      With the optimization, the second merge results in a conflict.
 345      Questions: Possible?  Bug, bugfix, or something else?
 346
 347 I'll consider all four cases, but out of order.
 348
 349 The fourth case is impossible.  For the code without the remembering
 350 renames optimization to not get a conflict, B:oldfile would need to exactly
 351 match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
 352 If A:oldfile matches B:oldfile exactly, then a three-way content merge
 353 between A:oldfile, A':newfile, and B:oldfile would have no conflict and
 354 just give us the version of newfile from A' as the result.
 355
 356 From the same logic as the above paragraph, the second case would indeed
 357 result in identical merges.  When A:oldfile exactly matches B:oldfile, an
 358 undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
 359 and the other side deleted it.  I'll delete it.  And I see you have this
 360 brand new file named 'newfile' in A', so I'll keep it."  That gives the
 361 same results as three-way content merging A:oldfile, A':newfile, and
 362 B:oldfile -- a removal of oldfile with the version of newfile from A'
 363 showing up in the result.
 364
 365 The third case is interesting.  It means that A:oldfile and A':newfile were
 366 not just similar enough, but that the changes between them did not conflict
 367 with the changes between A:oldfile and B:oldfile.  This would validate our
 368 hunch that the files were similar enough to be used in a three-way content
 369 merge, and thus seems entirely correct for us to have used them that way.
 370 (Sidenote: One particular example here may be enlightening.  Let's say that
 371 B was an immediate revert of A.  B clearly would have been a clean revert
 372 of A, since A was B's immediate parent.  One would assume that if you can
 373 pick a commit, you should also be able to cherry-pick its immediate revert.
 374 However, this is one of those funny corner cases; without this
 375 optimization, we just successfully picked a commit cleanly, but we are
 376 unable to cherry-pick its immediate revert due to the size differences
 377 between E:oldfile and A:oldfile.)
 378
 379 That leaves only the first case to consider -- when we get conflicts both
 380 with or without the optimization.  Without the optimization, we'll have a
 381 modify/delete conflict, where both A':newfile and B:oldfile are left in the
 382 tree for the user to deal with and no hints about the potential similarity
 383 between the two.  With the optimization, we'll have a three-way content
 384 merged A:oldfile, A':newfile, and B:oldfile with conflict markers
 385 suggesting we thought the files were related but giving the user the chance
 386 to resolve.  As noted above, I don't think users will find us treating
 387 'oldfile' and 'newfile' as related as a surprise since they were between E
 388 and G.  In any event, though, this case shouldn't be concerning since we
 389 hit a conflict in both cases, told the user what we know, and asked them to
 390 resolve it.
 391
 392 So, in summary, case 4 is impossible, case 2 yields the same behavior, and
 393 cases 1 and 3 seem to provide as good or better behavior with the
 394 optimization than without.
 395
 396
 397 === 6. Interaction with skipping of "irrelevant" renames ===
 398
 399 Previous optimizations involved skipping rename detection for paths
 400 considered to be "irrelevant".  See for example the following commits:
 401
 402   * 32a56dfb99 ("merge-ort: precompute subset of sources for which we
 403                 need rename detection", 2021-03-11)
 404   * 2fd9eda462 ("merge-ort: precompute whether directory rename
 405                 detection is needed", 2021-03-11)
 406   * 9bd342137e ("diffcore-rename: determine which relevant_sources are
 407                 no longer relevant", 2021-03-13)
 408
 409 Relevance is always determined by what the _other_ side of history has
 410 done, in terms of modifing a file that our side renamed, or adding a
 411 file to a directory which our side renamed.  This means that a path
 412 that is "irrelevant" when picking the first commit of a series in a
 413 rebase or cherry-pick, may suddenly become "relevant" when picking the
 414 next commit.
 415
 416 The upshot of this is that we can only cache rename detection results
 417 for relevant paths, and need to re-check relevance in subsequent
 418 commits.  If those subsequent commits have additional paths that are
 419 relevant for rename detection, then we will need to redo rename
 420 detection -- though we can limit it to the paths for which we have not
 421 already detected renames.
 422
 423
 424 === 7. Additional items that need to be cached ===
 425
 426 It turns out we have to cache more than just renames; we also cache:
 427
 428   A) non-renames (i.e. unpaired deletes)
 429   B) counts of renames within directories
 430   C) sources that were marked as RELEVANT_LOCATION, but which were
 431      downgraded to RELEVANT_NO_MORE
 432   D) the toplevel trees involved in the merge
 433
 434 These are all stored in struct rename_info, and respectively appear in
 435   * cached_pairs (along side actual renames, just with a value of NULL)
 436   * dir_rename_counts
 437   * cached_irrelevant
 438   * merge_trees
 439
 440 The reason for (A) comes from the irrelevant renames skipping
 441 optimization discussed in section 6.  The fact that irrelevant renames
 442 are skipped means we only get a subset of the potential renames
 443 detected and subsequent commits may need to run rename detection on
 444 the upstream side on a subset of the remaining renames (to get the
 445 renames that are relevant for that later commit).  Since unpaired
 446 deletes are involved in rename detection too, we don't want to
 447 repeatedly check that those paths remain unpaired on the upstream side
 448 with every commit we are transplanting.
 449
 450 The reason for (B) is that diffcore_rename_extended() is what
 451 generates the counts of renames by directory which is needed in
 452 directory rename detection, and if we don't run
 453 diffcore_rename_extended() again then we need to have the output from
 454 it, including dir_rename_counts, from the previous run.
 455
 456 The reason for (C) is that merge-ort's tree traversal will again think
 457 those paths are relevant (marking them as RELEVANT_LOCATION), but the
 458 fact that they were downgraded to RELEVANT_NO_MORE means that
 459 dir_rename_counts already has the information we need for directory
 460 rename detection.  (A path which becomes RELEVANT_CONTENT in a
 461 subsequent commit will be removed from cached_irrelevant.)
 462
 463 The reason for (D) is that is how we determine whether the remember
 464 renames optimization can be used.  In particular, remembering that our
 465 sequence of merges looks like:
 466
 467    Merge 1:
 468    MERGE_BASE:   E
 469    MERGE_SIDE1:  G
 470    MERGE_SIDE2:  A
 471    => Creates    A'
 472
 473    Merge 2:
 474    MERGE_BASE:   A
 475    MERGE_SIDE1:  A'
 476    MERGE_SIDE2:  B
 477    => Creates    B'
 478
 479 It is the fact that the trees A and A' appear both in Merge 1 and in
 480 Merge 2, with A as a parent of A' that allows this optimization.  So
 481 we store the trees to compare with what we are asked to merge next
 482 time.
 483
 484
 485 === 8. How directory rename detection interacts with the above and   ===
 486 ===    why this optimization is still safe even if                   ===
 487 ===    merge.directoryRenames is set to "true".                      ===
 488
 489 As noted in the assumptions section:
 490
 491     """
 492     ...if directory renames do occur, then the default of
 493     merge.directoryRenames being set to "conflict" means that the operation
 494     will stop for users to resolve the conflicts and the cache will be
 495     thrown away, and thus that there won't be an optimization to apply.
 496     So, the only reason we need to address directory renames specifically,
 497     is that some users will have set merge.directoryRenames to "true" to
 498     allow the merges to continue to proceed automatically.
 499     """
 500
 501 Let's remember that we need to look at how any given pick affects the next
 502 one.  So let's again use the first two picks from the diagram in section
 503 one:
 504
 505   First pick does this three-way merge:
 506     MERGE_BASE:   E
 507     MERGE_SIDE1:  G
 508     MERGE_SIDE2:  A
 509     => creates A'
 510
 511   Second pick does this three-way merge:
 512     MERGE_BASE:   A
 513     MERGE_SIDE1:  A'
 514     MERGE_SIDE2:  B
 515     => creates B'
 516
 517 Now, directory rename detection exists so that if one side of history
 518 renames a directory, and the other side adds a new file to the old
 519 directory, then the merge (with merge.directoryRenames=true) can move the
 520 file into the new directory.  There are two qualitatively different ways to
 521 add a new file to an old directory: create a new file, or rename a file
 522 into that directory.  Also, directory renames can be done on either side of
 523 history, so there are four cases to consider:
 524
 525   * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to   old dir
 526   * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames  file into old dir
 527   * MERGE_SIDE1 adds new file to   old dir, MERGE_SIDE2 renames old dir
 528   * MERGE_SIDE1 renames  file into old dir, MERGE_SIDE2 renames old dir
 529
 530 One last note before we consider these four cases: There are some
 531 important properties about how we implement this optimization with
 532 respect to directory rename detection that we need to bear in mind
 533 while considering all of these cases:
 534
 535   * rename caching occurs *after* applying directory renames
 536
 537   * a rename created by directory rename detection is recorded for the side
 538     of history that did the directory rename.
 539
 540   * dir_rename_counts, the nested map of
 541         {oldname => {newname => count}},
 542     is cached between runs as well.  This basically means that directory
 543     rename detection is also cached, though only on the side of history
 544     that we cache renames for (MERGE_SIDE1 as far as this document is
 545     concerned; see the assumptions section).  Two interesting sub-notes
 546     about these counts:
 547
 548     * If we need to perform rename-detection again on the given side (e.g.
 549       some paths are relevant for rename detection that weren't before),
 550       then we clear dir_rename_counts and recompute it, making use of
 551       cached_pairs.  The reason it is important to do this is optimizations
 552       around RELEVANT_LOCATION exist to prevent us from computing
 553       unnecessary renames for directory rename detection and from computing
 554       dir_rename_counts for irrelevant directories; but those same renames
 555       or directories may become necessary for subsequent merges.  The
 556       easiest way to "fix up" dir_rename_counts in such cases is to just
 557       recompute it.
 558
 559     * If we prune rename/rename(1to1) entries from the cache, then we also
 560       need to update dir_rename_counts to decrement the counts for the
 561       involved directory and any relevant parent directories (to undo what
 562       update_dir_rename_counts() in diffcore-rename.c incremented when the
 563       rename was initially found).  If we instead just disable the
 564       remembering renames optimization when the exceedingly rare
 565       rename/rename(1to1) cases occur, then dir_rename_counts will get
 566       re-computed the next time rename detection occurs, as noted above.
 567
 568   * the side with multiple commits to pick, is the side of history that we
 569     do NOT cache renames for.  Thus, there are no additional commits to
 570     change the number of renames in a directory, except for those done by
 571     directory rename detection (which always pad the majority).
 572
 573   * the "renames" we cache are modified slightly by any directory rename,
 574     as noted below.
 575
 576 Now, with those notes out of the way, let's go through the four cases
 577 in order:
 578
 579 Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
 580
 581   This case looks like this:
 582
 583     MERGE_BASE:   E,   Has olddir/
 584     MERGE_SIDE1:  G,   Renames olddir/ -> newdir/
 585     MERGE_SIDE2:  A,   Adds olddir/newfile
 586     => creates    A',  With newdir/newfile
 587
 588     MERGE_BASE:   A,   Has olddir/newfile
 589     MERGE_SIDE1:  A',  Has newdir/newfile
 590     MERGE_SIDE2:  B,   Modifies olddir/newfile
 591     => expected   B',  with threeway-merged newdir/newfile from above
 592
 593   In this case, with the optimization, note that after the first commit:
 594     * MERGE_SIDE1 remembers olddir/ -> newdir/
 595     * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
 596   Given the cached rename noted above, the second merge can proceed as
 597   expected without needing to perform rename detection from A -> A'.
 598
 599 Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames  file into old dir
 600
 601   This case looks like this:
 602     MERGE_BASE:   E    oldfile, olddir/
 603     MERGE_SIDE1:  G    oldfile, olddir/ -> newdir/
 604     MERGE_SIDE2:  A    oldfile -> olddir/newfile
 605     => creates    A',  With newdir/newfile representing original oldfile
 606
 607     MERGE_BASE:   A    olddir/newfile
 608     MERGE_SIDE1:  A'   newdir/newfile
 609     MERGE_SIDE2:  B    modify olddir/newfile
 610     => expected   B',  with threeway-merged newdir/newfile from above
 611
 612   In this case, with the optimization, note that after the first commit:
 613     * MERGE_SIDE1 remembers olddir/ -> newdir/
 614     * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
 615                   (NOT oldfile -> newdir/newfile; compare to case with
 616                    (p->status == 'R' && new_path) in possibly_cache_new_pair())
 617
 618   Given the cached rename noted above, the second merge can proceed as
 619   expected without needing to perform rename detection from A -> A'.
 620
 621 Case 3: MERGE_SIDE1 adds new file to   old dir, MERGE_SIDE2 renames old dir
 622
 623   This case looks like this:
 624
 625     MERGE_BASE:   E,   Has olddir/
 626     MERGE_SIDE1:  G,   Adds olddir/newfile
 627     MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
 628     => creates    A',  With newdir/newfile
 629
 630     MERGE_BASE:   A,   Has newdir/, but no notion of newdir/newfile
 631     MERGE_SIDE1:  A',  Has newdir/newfile
 632     MERGE_SIDE2:  B,   Has newdir/, but no notion of newdir/newfile
 633     => expected   B',  with newdir/newfile from A'
 634
 635   In this case, with the optimization, note that after the first commit there
 636   were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
 637   But the second merge didn't need any renames so this is fine.
 638
 639 Case 4: MERGE_SIDE1 renames  file into old dir, MERGE_SIDE2 renames old dir
 640
 641   This case looks like this:
 642
 643     MERGE_BASE:   E,   Has olddir/
 644     MERGE_SIDE1:  G,   Renames oldfile -> olddir/newfile
 645     MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
 646     => creates    A',  With newdir/newfile representing original oldfile
 647
 648     MERGE_BASE:   A,   Has oldfile
 649     MERGE_SIDE1:  A',  Has newdir/newfile
 650     MERGE_SIDE2:  B,   Modifies oldfile
 651     => expected   B',  with threeway-merged newdir/newfile from above
 652
 653   In this case, with the optimization, note that after the first commit:
 654     * MERGE_SIDE1 remembers oldfile -> newdir/newfile
 655                   (NOT oldfile -> olddir/newfile; compare to case of second
 656                    block under p->status == 'R' in possibly_cache_new_pair())
 657     * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered
 658
 659   Given the cached rename noted above, the second merge can proceed as
 660   expected without needing to perform rename detection from A -> A'.
 661
 662 Finally, I'll just note here that interactions with the
 663 skip-irrelevant-renames optimization means we sometimes don't detect
 664 renames for any files within a directory that was renamed, in which
 665 case we will not have been able to detect any rename for the directory
 666 itself.  In such a case, we do not know whether the directory was
 667 renamed; we want to be careful to avoid cacheing some kind of "this
 668 directory was not renamed" statement.  If we did, then a subsequent
 669 commit being rebased could add a file to the old directory, and the
 670 user would expect it to end up in the correct directory -- something
 671 our erroneous "this directory was not renamed" cache would preclude.