Documentation/MyFirstObjectWalk.txt

   1 = My First Object Walk
   2
   3 == What's an Object Walk?
   4
   5 The object walk is a key concept in Git - this is the process that underpins
   6 operations like object transfer and fsck. Beginning from a given commit, the
   7 list of objects is found by walking parent relationships between commits (commit
   8 X based on commit W) and containment relationships between objects (tree Y is
   9 contained within commit X, and blob Z is located within tree Y, giving our
  10 working tree for commit X something like `y/z.txt`).
  11
  12 A related concept is the revision walk, which is focused on commit objects and
  13 their parent relationships and does not delve into other object types. The
  14 revision walk is used for operations like `git log`.
  15
  16 === Related Reading
  17
  18 - `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  19   the revision walker in its various incarnations.
  20 - `revision.h`
  21 - https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  22   gives a good overview of the types of objects in Git and what your object
  23   walk is really describing.
  24
  25 == Setting Up
  26
  27 Create a new branch from `master`.
  28
  29 ----
  30 git checkout -b revwalk origin/master
  31 ----
  32
  33 We'll put our fiddling into a new command. For fun, let's name it `git walken`.
  34 Open up a new file `builtin/walken.c` and set up the command handler:
  35
  36 ----
  37 /*
  38  * "git walken"
  39  *
  40  * Part of the "My First Object Walk" tutorial.
  41  */
  42
  43 #include "builtin.h"
  44 #include "trace.h"
  45
  46 int cmd_walken(int argc, const char **argv, const char *prefix)
  47 {
  48         trace_printf(_("cmd_walken incoming...\n"));
  49         return 0;
  50 }
  51 ----
  52
  53 NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in
  54 that it can be turned on or off at runtime. For the purposes of this
  55 tutorial, we will write `walken` as though it is intended for use as
  56 a "plumbing" command: that is, a command which is used primarily in
  57 scripts, rather than interactively by humans (a "porcelain" command).
  58 So we will send our debug output to `trace_printf()` instead.
  59 When running, enable trace output by setting the environment variable `GIT_TRACE`.
  60
  61 Add usage text and `-h` handling, like all subcommands should consistently do
  62 (our test suite will notice and complain if you fail to do so).
  63 We'll need to include the `parse-options.h` header.
  64
  65 ----
  66 #include "parse-options.h"
  67
  68 ...
  69
  70 int cmd_walken(int argc, const char **argv, const char *prefix)
  71 {
  72         const char * const walken_usage[] = {
  73                 N_("git walken"),
  74                 NULL,
  75         };
  76         struct option options[] = {
  77                 OPT_END()
  78         };
  79
  80         argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
  81
  82         ...
  83 }
  84 ----
  85
  86 Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
  87
  88 ----
  89 int cmd_walken(int argc, const char **argv, const char *prefix);
  90 ----
  91
  92 Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
  93 maintaining alphabetical ordering:
  94
  95 ----
  96 { "walken", cmd_walken, RUN_SETUP },
  97 ----
  98
  99 Add it to the `Makefile` near the line for `builtin/worktree.o`:
 100
 101 ----
 102 BUILTIN_OBJS += builtin/walken.o
 103 ----
 104
 105 Build and test out your command, without forgetting to ensure the `DEVELOPER`
 106 flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
 107
 108 ----
 109 $ echo DEVELOPER=1 >>config.mak
 110 $ make
 111 $ GIT_TRACE=1 ./bin-wrappers/git walken
 112 ----
 113
 114 NOTE: For a more exhaustive overview of the new command process, take a look at
 115 `Documentation/MyFirstContribution.txt`.
 116
 117 NOTE: A reference implementation can be found at
 118 https://github.com/nasamuffin/git/tree/revwalk.
 119
 120 === `struct rev_cmdline_info`
 121
 122 The definition of `struct rev_cmdline_info` can be found in `revision.h`.
 123
 124 This struct is contained within the `rev_info` struct and is used to reflect
 125 parameters provided by the user over the CLI.
 126
 127 `nr` represents the number of `rev_cmdline_entry` present in the array.
 128
 129 `alloc` is used by the `ALLOC_GROW` macro. Check `alloc.h` - this variable is
 130 used to track the allocated size of the list.
 131
 132 Per entry, we find:
 133
 134 `item` is the object provided upon which to base the object walk. Items in Git
 135 can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
 136
 137 `name` is the object ID (OID) of the object - a hex string you may be familiar
 138 with from using Git to organize your source in the past. Check the tutorial
 139 mentioned above towards the top for a discussion of where the OID can come
 140 from.
 141
 142 `whence` indicates some information about what to do with the parents of the
 143 specified object. We'll explore this flag more later on; take a look at
 144 `Documentation/revisions.txt` to get an idea of what could set the `whence`
 145 value.
 146
 147 `flags` are used to hint the beginning of the revision walk and are the first
 148 block under the `#include`s in `revision.h`. The most likely ones to be set in
 149 the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
 150 can be used during the walk, as well.
 151
 152 === `struct rev_info`
 153
 154 This one is quite a bit longer, and many fields are only used during the walk
 155 by `revision.c` - not configuration options. Most of the configurable flags in
 156 `struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
 157 good idea to take some time and read through that document.
 158
 159 == Basic Commit Walk
 160
 161 First, let's see if we can replicate the output of `git log --oneline`. We'll
 162 refer back to the implementation frequently to discover norms when performing
 163 an object walk of our own.
 164
 165 To do so, we'll first find all the commits, in order, which preceded the current
 166 commit. We'll extract the name and subject of the commit from each.
 167
 168 Ideally, we will also be able to find out which ones are currently at the tip of
 169 various branches.
 170
 171 === Setting Up
 172
 173 Preparing for your object walk has some distinct stages.
 174
 175 1. Perform default setup for this mode, and others which may be invoked.
 176 2. Check configuration files for relevant settings.
 177 3. Set up the `rev_info` struct.
 178 4. Tweak the initialized `rev_info` to suit the current walk.
 179 5. Prepare the `rev_info` for the walk.
 180 6. Iterate over the objects, processing each one.
 181
 182 ==== Default Setups
 183
 184 Before examining configuration files which may modify command behavior, set up
 185 default state for switches or options your command may have. If your command
 186 utilizes other Git components, ask them to set up their default states as well.
 187 For instance, `git log` takes advantage of `grep` and `diff` functionality, so
 188 its `init_log_defaults()` sets its own state (`decoration_style`) and asks
 189 `grep` and `diff` to initialize themselves by calling each of their
 190 initialization functions.
 191
 192 ==== Configuring From `.gitconfig`
 193
 194 Next, we should have a look at any relevant configuration settings (i.e.,
 195 settings readable and settable from `git config`). This is done by providing a
 196 callback to `git_config()`; within that callback, you can also invoke methods
 197 from other components you may need that need to intercept these options. Your
 198 callback will be invoked once per each configuration value which Git knows about
 199 (global, local, worktree, etc.).
 200
 201 Similarly to the default values, we don't have anything to do here yet
 202 ourselves; however, we should call `git_default_config()` if we aren't calling
 203 any other existing config callbacks.
 204
 205 Add a new function to `builtin/walken.c`.
 206 We'll also need to include the `config.h` header:
 207
 208 ----
 209 #include "config.h"
 210
 211 ...
 212
 213 static int git_walken_config(const char *var, const char *value,
 214                              const struct config_context *ctx, void *cb)
 215 {
 216         /*
 217          * For now, we don't have any custom configuration, so fall back to
 218          * the default config.
 219          */
 220         return git_default_config(var, value, ctx, cb);
 221 }
 222 ----
 223
 224 Make sure to invoke `git_config()` with it in your `cmd_walken()`:
 225
 226 ----
 227 int cmd_walken(int argc, const char **argv, const char *prefix)
 228 {
 229         ...
 230
 231         git_config(git_walken_config, NULL);
 232
 233         ...
 234 }
 235 ----
 236
 237 ==== Setting Up `rev_info`
 238
 239 Now that we've gathered external configuration and options, it's time to
 240 initialize the `rev_info` object which we will use to perform the walk. This is
 241 typically done by calling `repo_init_revisions()` with the repository you intend
 242 to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
 243 struct.
 244
 245 Add the `struct rev_info` and the `repo_init_revisions()` call.
 246 We'll also need to include the `revision.h` header:
 247
 248 ----
 249 #include "revision.h"
 250
 251 ...
 252
 253 int cmd_walken(int argc, const char **argv, const char *prefix)
 254 {
 255         /* This can go wherever you like in your declarations.*/
 256         struct rev_info rev;
 257         ...
 258
 259         /* This should go after the git_config() call. */
 260         repo_init_revisions(the_repository, &rev, prefix);
 261
 262         ...
 263 }
 264 ----
 265
 266 ==== Tweaking `rev_info` For the Walk
 267
 268 We're getting close, but we're still not quite ready to go. Now that `rev` is
 269 initialized, we can modify it to fit our needs. This is usually done within a
 270 helper for clarity, so let's add one:
 271
 272 ----
 273 static void final_rev_info_setup(struct rev_info *rev)
 274 {
 275         /*
 276          * We want to mimic the appearance of `git log --oneline`, so let's
 277          * force oneline format.
 278          */
 279         get_commit_format("oneline", rev);
 280
 281         /* Start our object walk at HEAD. */
 282         add_head_to_pending(rev);
 283 }
 284 ----
 285
 286 [NOTE]
 287 ====
 288 Instead of using the shorthand `add_head_to_pending()`, you could do
 289 something like this:
 290 ----
 291         struct setup_revision_opt opt;
 292
 293         memset(&opt, 0, sizeof(opt));
 294         opt.def = "HEAD";
 295         opt.revarg_opt = REVARG_COMMITTISH;
 296         setup_revisions(argc, argv, rev, &opt);
 297 ----
 298 Using a `setup_revision_opt` gives you finer control over your walk's starting
 299 point.
 300 ====
 301
 302 Then let's invoke `final_rev_info_setup()` after the call to
 303 `repo_init_revisions()`:
 304
 305 ----
 306 int cmd_walken(int argc, const char **argv, const char *prefix)
 307 {
 308         ...
 309
 310         final_rev_info_setup(&rev);
 311
 312         ...
 313 }
 314 ----
 315
 316 Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
 317 now, this is all we need.
 318
 319 ==== Preparing `rev_info` For the Walk
 320
 321 Now that `rev` is all initialized and configured, we've got one more setup step
 322 before we get rolling. We can do this in a helper, which will both prepare the
 323 `rev_info` for the walk, and perform the walk itself. Let's start the helper
 324 with the call to `prepare_revision_walk()`, which can return an error without
 325 dying on its own:
 326
 327 ----
 328 static void walken_commit_walk(struct rev_info *rev)
 329 {
 330         if (prepare_revision_walk(rev))
 331                 die(_("revision walk setup failed"));
 332 }
 333 ----
 334
 335 NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
 336 `stderr` it's likely to be seen by a human, so we will localize it.
 337
 338 ==== Performing the Walk!
 339
 340 Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
 341 can also be used as an iterator; we move to the next item in the walk by using
 342 `get_revision()` repeatedly. Add the listed variable declarations at the top and
 343 the walk loop below the `prepare_revision_walk()` call within your
 344 `walken_commit_walk()`:
 345
 346 ----
 347 #include "pretty.h"
 348
 349 ...
 350
 351 static void walken_commit_walk(struct rev_info *rev)
 352 {
 353         struct commit *commit;
 354         struct strbuf prettybuf = STRBUF_INIT;
 355
 356         ...
 357
 358         while ((commit = get_revision(rev))) {
 359                 strbuf_reset(&prettybuf);
 360                 pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
 361                 puts(prettybuf.buf);
 362         }
 363         strbuf_release(&prettybuf);
 364 }
 365 ----
 366
 367 NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
 368 command we expect to be machine-parsed, we're sending it directly to stdout.
 369
 370 Give it a shot.
 371
 372 ----
 373 $ make
 374 $ ./bin-wrappers/git walken
 375 ----
 376
 377 You should see all of the subject lines of all the commits in
 378 your tree's history, in order, ending with the initial commit, "Initial revision
 379 of "git", the information manager from hell". Congratulations! You've written
 380 your first revision walk. You can play with printing some additional fields
 381 from each commit if you're curious; have a look at the functions available in
 382 `commit.h`.
 383
 384 === Adding a Filter
 385
 386 Next, let's try to filter the commits we see based on their author. This is
 387 equivalent to running `git log --author=<pattern>`. We can add a filter by
 388 modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
 389
 390 First some setup. Add `grep_config()` to `git_walken_config()`:
 391
 392 ----
 393 static int git_walken_config(const char *var, const char *value,
 394                              const struct config_context *ctx, void *cb)
 395 {
 396         grep_config(var, value, ctx, cb);
 397         return git_default_config(var, value, ctx, cb);
 398 }
 399 ----
 400
 401 Next, we can modify the `grep_filter`. This is done with convenience functions
 402 found in `grep.h`. For fun, we're filtering to only commits from folks using a
 403 `gmail.com` email address - a not-very-precise guess at who may be working on
 404 Git as a hobby. Since we're checking the author, which is a specific line in the
 405 header, we'll use the `append_header_grep_pattern()` helper. We can use
 406 the `enum grep_header_field` to indicate which part of the commit header we want
 407 to search.
 408
 409 In `final_rev_info_setup()`, add your filter line:
 410
 411 ----
 412 static void final_rev_info_setup(int argc, const char **argv,
 413                 const char *prefix, struct rev_info *rev)
 414 {
 415         ...
 416
 417         append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
 418                 "gmail");
 419         compile_grep_patterns(&rev->grep_filter);
 420
 421         ...
 422 }
 423 ----
 424
 425 `append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
 426 it won't work unless we compile it with `compile_grep_patterns()`.
 427
 428 NOTE: If you are using `setup_revisions()` (for example, if you are passing a
 429 `setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
 430 to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
 431
 432 NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
 433 wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
 434 `enum grep_pat_token` for us.
 435
 436 === Changing the Order
 437
 438 There are a few ways that we can change the order of the commits during a
 439 revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
 440 typical orderings.
 441
 442 `topo_order` is the same as `git log --topo-order`: we avoid showing a parent
 443 before all of its children have been shown, and we avoid mixing commits which
 444 are in different lines of history. (`git help log`'s section on `--topo-order`
 445 has a very nice diagram to illustrate this.)
 446
 447 Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
 448 `REV_SORT_BY_AUTHOR_DATE`. Add the following:
 449
 450 ----
 451 static void final_rev_info_setup(int argc, const char **argv,
 452                 const char *prefix, struct rev_info *rev)
 453 {
 454         ...
 455
 456         rev->topo_order = 1;
 457         rev->sort_order = REV_SORT_BY_COMMIT_DATE;
 458
 459         ...
 460 }
 461 ----
 462
 463 Let's output this into a file so we can easily diff it with the walk sorted by
 464 author date.
 465
 466 ----
 467 $ make
 468 $ ./bin-wrappers/git walken > commit-date.txt
 469 ----
 470
 471 Then, let's sort by author date and run it again.
 472
 473 ----
 474 static void final_rev_info_setup(int argc, const char **argv,
 475                 const char *prefix, struct rev_info *rev)
 476 {
 477         ...
 478
 479         rev->topo_order = 1;
 480         rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
 481
 482         ...
 483 }
 484 ----
 485
 486 ----
 487 $ make
 488 $ ./bin-wrappers/git walken > author-date.txt
 489 ----
 490
 491 Finally, compare the two. This is a little less helpful without object names or
 492 dates, but hopefully we get the idea.
 493
 494 ----
 495 $ diff -u commit-date.txt author-date.txt
 496 ----
 497
 498 This display indicates that commits can be reordered after they're written, for
 499 example with `git rebase`.
 500
 501 Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
 502 Set that flag somewhere inside of `final_rev_info_setup()`:
 503
 504 ----
 505 static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
 506                 struct rev_info *rev)
 507 {
 508         ...
 509
 510         rev->reverse = 1;
 511
 512         ...
 513 }
 514 ----
 515
 516 Run your walk again and note the difference in order. (If you remove the grep
 517 pattern, you should see the last commit this call gives you as your current
 518 HEAD.)
 519
 520 == Basic Object Walk
 521
 522 So far we've been walking only commits. But Git has more types of objects than
 523 that! Let's see if we can walk _all_ objects, and find out some information
 524 about each one.
 525
 526 We can base our work on an example. `git pack-objects` prepares all kinds of
 527 objects for packing into a bitmap or packfile. The work we are interested in
 528 resides in `builtin/pack-objects.c:get_object_list()`; examination of that
 529 function shows that the all-object walk is being performed by
 530 `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
 531 functions reside in `list-objects.c`; examining the source shows that, despite
 532 the name, these functions traverse all kinds of objects. Let's have a look at
 533 the arguments to `traverse_commit_list()`.
 534
 535 - `struct rev_info *revs`: This is the `rev_info` used for the walk. If
 536   its `filter` member is not `NULL`, then `filter` contains information for
 537   how to filter the object list.
 538 - `show_commit_fn show_commit`: A callback which will be used to handle each
 539   individual commit object.
 540 - `show_object_fn show_object`: A callback which will be used to handle each
 541   non-commit object (so each blob, tree, or tag).
 542 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
 543   and `show_object`.
 544
 545 In addition, `traverse_commit_list_filtered()` has an additional parameter:
 546
 547 - `struct oidset *omitted`: A linked-list of object IDs which the provided
 548   filter caused to be omitted.
 549
 550 It looks like these methods use callbacks we provide instead of needing us
 551 to call it repeatedly ourselves. Cool! Let's add the callbacks first.
 552
 553 For the sake of this tutorial, we'll simply keep track of how many of each kind
 554 of object we find. At file scope in `builtin/walken.c` add the following
 555 tracking variables:
 556
 557 ----
 558 static int commit_count;
 559 static int tag_count;
 560 static int blob_count;
 561 static int tree_count;
 562 ----
 563
 564 Commits are handled by a different callback than other objects; let's do that
 565 one first:
 566
 567 ----
 568 static void walken_show_commit(struct commit *cmt, void *buf)
 569 {
 570         commit_count++;
 571 }
 572 ----
 573
 574 The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
 575 the `buf` argument is actually the context buffer that we can provide to the
 576 traversal calls - `show_data`, which we mentioned a moment ago.
 577
 578 Since we have the `struct commit` object, we can look at all the same parts that
 579 we looked at in our earlier commit-only walk. For the sake of this tutorial,
 580 though, we'll just increment the commit counter and move on.
 581
 582 The callback for non-commits is a little different, as we'll need to check
 583 which kind of object we're dealing with:
 584
 585 ----
 586 static void walken_show_object(struct object *obj, const char *str, void *buf)
 587 {
 588         switch (obj->type) {
 589         case OBJ_TREE:
 590                 tree_count++;
 591                 break;
 592         case OBJ_BLOB:
 593                 blob_count++;
 594                 break;
 595         case OBJ_TAG:
 596                 tag_count++;
 597                 break;
 598         case OBJ_COMMIT:
 599                 BUG("unexpected commit object in walken_show_object\n");
 600         default:
 601                 BUG("unexpected object type %s in walken_show_object\n",
 602                         type_name(obj->type));
 603         }
 604 }
 605 ----
 606
 607 Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
 608 context pointer that `walken_show_commit()` receives: the `show_data` argument
 609 to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
 610 `str` contains the name of the object, which ends up being something like
 611 `foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
 612
 613 To help assure us that we aren't double-counting commits, we'll include some
 614 complaining if a commit object is routed through our non-commit callback; we'll
 615 also complain if we see an invalid object type. Since those two cases should be
 616 unreachable, and would only change in the event of a semantic change to the Git
 617 codebase, we complain by using `BUG()` - which is a signal to a developer that
 618 the change they made caused unintended consequences, and the rest of the
 619 codebase needs to be updated to understand that change. `BUG()` is not intended
 620 to be seen by the public, so it is not localized.
 621
 622 Our main object walk implementation is substantially different from our commit
 623 walk implementation, so let's make a new function to perform the object walk. We
 624 can perform setup which is applicable to all objects here, too, to keep separate
 625 from setup which is applicable to commit-only walks.
 626
 627 We'll start by enabling all types of objects in the `struct rev_info`.  We'll
 628 also turn on `tree_blobs_in_commit_order`, which means that we will walk a
 629 commit's tree and everything it points to immediately after we find each commit,
 630 as opposed to waiting for the end and walking through all trees after the commit
 631 history has been discovered. With the appropriate settings configured, we are
 632 ready to call `prepare_revision_walk()`.
 633
 634 ----
 635 static void walken_object_walk(struct rev_info *rev)
 636 {
 637         rev->tree_objects = 1;
 638         rev->blob_objects = 1;
 639         rev->tag_objects = 1;
 640         rev->tree_blobs_in_commit_order = 1;
 641
 642         if (prepare_revision_walk(rev))
 643                 die(_("revision walk setup failed"));
 644
 645         commit_count = 0;
 646         tag_count = 0;
 647         blob_count = 0;
 648         tree_count = 0;
 649 ----
 650
 651 Let's start by calling just the unfiltered walk and reporting our counts.
 652 Complete your implementation of `walken_object_walk()`.
 653 We'll also need to include the `list-objects.h` header.
 654
 655 ----
 656 #include "list-objects.h"
 657
 658 ...
 659
 660         traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
 661
 662         printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
 663                 blob_count, tag_count, tree_count);
 664 }
 665 ----
 666
 667 NOTE: This output is intended to be machine-parsed. Therefore, we are not
 668 sending it to `trace_printf()`, and we are not localizing it - we need scripts
 669 to be able to count on the formatting to be exactly the way it is shown here.
 670 If we were intending this output to be read by humans, we would need to localize
 671 it with `_()`.
 672
 673 Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
 674 command line options is out of scope for this tutorial, so we'll just hardcode
 675 a branch we can change at compile time. Where you call `final_rev_info_setup()`
 676 and `walken_commit_walk()`, instead branch like so:
 677
 678 ----
 679         if (1) {
 680                 add_head_to_pending(&rev);
 681                 walken_object_walk(&rev);
 682         } else {
 683                 final_rev_info_setup(argc, argv, prefix, &rev);
 684                 walken_commit_walk(&rev);
 685         }
 686 ----
 687
 688 NOTE: For simplicity, we've avoided all the filters and sorts we applied in
 689 `final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
 690 want, you can certainly use the filters we added before by moving
 691 `final_rev_info_setup()` out of the conditional and removing the call to
 692 `add_head_to_pending()`.
 693
 694 Now we can try to run our command! It should take noticeably longer than the
 695 commit walk, but an examination of the output will give you an idea why. Your
 696 output should look similar to this example, but with different counts:
 697
 698 ----
 699 Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
 700 ----
 701
 702 This makes sense. We have more trees than commits because the Git project has
 703 lots of subdirectories which can change, plus at least one tree per commit. We
 704 have no tags because we started on a commit (`HEAD`) and while tags can point to
 705 commits, commits can't point to tags.
 706
 707 NOTE: You will have different counts when you run this yourself! The number of
 708 objects grows along with the Git project.
 709
 710 === Adding a Filter
 711
 712 There are a handful of filters that we can apply to the object walk laid out in
 713 `Documentation/rev-list-options.txt`. These filters are typically useful for
 714 operations such as creating packfiles or performing a partial clone. They are
 715 defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
 716 will use the "tree:1" filter, which causes the walk to omit all trees and blobs
 717 which are not directly referenced by commits reachable from the commit in
 718 `pending` when the walk begins. (`pending` is the list of objects which need to
 719 be traversed during a walk; you can imagine a breadth-first tree traversal to
 720 help understand. In our case, that means we omit trees and blobs not directly
 721 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
 722 `HEAD` in the `pending` list.)
 723
 724 For now, we are not going to track the omitted objects, so we'll replace those
 725 parameters with `NULL`. For the sake of simplicity, we'll add a simple
 726 build-time branch to use our filter or not. Preface the line calling
 727 `traverse_commit_list()` with the following, which will remind us which kind of
 728 walk we've just performed:
 729
 730 ----
 731         if (0) {
 732                 /* Unfiltered: */
 733                 trace_printf(_("Unfiltered object walk.\n"));
 734         } else {
 735                 trace_printf(
 736                         _("Filtered object walk with filterspec 'tree:1'.\n"));
 737
 738                 parse_list_objects_filter(&rev->filter, "tree:1");
 739         }
 740         traverse_commit_list(rev, walken_show_commit,
 741                              walken_show_object, NULL);
 742 ----
 743
 744 The `rev->filter` member is usually built directly from a command
 745 line argument, so the module provides an easy way to build one from a string.
 746 Even though we aren't taking user input right now, we can still build one with
 747 a hardcoded string using `parse_list_objects_filter()`.
 748
 749 With the filter spec "tree:1", we are expecting to see _only_ the root tree for
 750 each commit; therefore, the tree object count should be less than or equal to
 751 the number of commits. (For an example of why that's true: `git commit --revert`
 752 points to the same tree object as its grandparent.)
 753
 754 === Counting Omitted Objects
 755
 756 We also have the capability to enumerate all objects which were omitted by a
 757 filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
 758 change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
 759 able to populate an `omitted` list.  Asking for this list of filtered objects
 760 may cause performance degradations, however, because in this case, despite
 761 filtering objects, the possibly much larger set of all reachable objects must
 762 be processed in order to populate that list.
 763
 764 First, add the `struct oidset` and related items we will use to iterate it:
 765
 766 ----
 767 #include "oidset.h"
 768
 769 ...
 770
 771 static void walken_object_walk(
 772         ...
 773
 774         struct oidset omitted;
 775         struct oidset_iter oit;
 776         struct object_id *oid = NULL;
 777         int omitted_count = 0;
 778         oidset_init(&omitted, 0);
 779
 780         ...
 781 ----
 782
 783 Replace the call to `traverse_commit_list()` with
 784 `traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset
 785 defined and initialized above:
 786
 787 ----
 788         ...
 789
 790                 traverse_commit_list_filtered(rev,
 791                         walken_show_commit, walken_show_object, NULL, &omitted);
 792
 793         ...
 794 ----
 795
 796 Then, after your traversal, the `oidset` traversal is pretty straightforward.
 797 Count all the objects within and modify the print statement:
 798
 799 ----
 800         /* Count the omitted objects. */
 801         oidset_iter_init(&omitted, &oit);
 802
 803         while ((oid = oidset_iter_next(&oit)))
 804                 omitted_count++;
 805
 806         printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
 807                 commit_count, blob_count, tag_count, tree_count, omitted_count);
 808 ----
 809
 810 By running your walk with and without the filter, you should find that the total
 811 object count in each case is identical. You can also time each invocation of
 812 the `walken` subcommand, with and without `omitted` being passed in, to confirm
 813 to yourself the runtime impact of tracking all omitted objects.
 814
 815 === Changing the Order
 816
 817 Finally, let's demonstrate that you can also reorder walks of all objects, not
 818 just walks of commits. First, we'll make our handlers chattier - modify
 819 `walken_show_commit()` and `walken_show_object()` to print the object as they
 820 go:
 821
 822 ----
 823 #include "hex.h"
 824
 825 ...
 826
 827 static void walken_show_commit(struct commit *cmt, void *buf)
 828 {
 829         trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
 830         commit_count++;
 831 }
 832
 833 static void walken_show_object(struct object *obj, const char *str, void *buf)
 834 {
 835         trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
 836
 837         ...
 838 }
 839 ----
 840
 841 NOTE: Since we will be examining this output directly as humans, we'll use
 842 `trace_printf()` here. Additionally, since this change introduces a significant
 843 number of printed lines, using `trace_printf()` will allow us to easily silence
 844 those lines without having to recompile.
 845
 846 (Leave the counter increment logic in place.)
 847
 848 With only that change, run again (but save yourself some scrollback):
 849
 850 ----
 851 $ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
 852 ----
 853
 854 Take a look at the top commit with `git show` and the object ID you printed; it
 855 should be the same as the output of `git show HEAD`.
 856
 857 Next, let's change a setting on our `struct rev_info` within
 858 `walken_object_walk()`. Find where you're changing the other settings on `rev`,
 859 such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
 860 `reverse` setting at the bottom:
 861
 862 ----
 863         ...
 864
 865         rev->tree_objects = 1;
 866         rev->blob_objects = 1;
 867         rev->tag_objects = 1;
 868         rev->tree_blobs_in_commit_order = 1;
 869         rev->reverse = 1;
 870
 871         ...
 872 ----
 873
 874 Now, run again, but this time, let's grab the last handful of objects instead
 875 of the first handful:
 876
 877 ----
 878 $ make
 879 $ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
 880 ----
 881
 882 The last commit object given should have the same OID as the one we saw at the
 883 top before, and running `git show <oid>` with that OID should give you again
 884 the same results as `git show HEAD`. Furthermore, if you run and examine the
 885 first ten lines again (with `head` instead of `tail` like we did before applying
 886 the `reverse` setting), you should see that now the first commit printed is the
 887 initial commit, `e83c5163`.
 888
 889 == Wrapping Up
 890
 891 Let's review. In this tutorial, we:
 892
 893 - Built a commit walk from the ground up
 894 - Enabled a grep filter for that commit walk
 895 - Changed the sort order of that filtered commit walk
 896 - Built an object walk (tags, commits, trees, and blobs) from the ground up
 897 - Learned how to add a filter-spec to an object walk
 898 - Changed the display order of the filtered object walk