Documentation/MyFirstObjectWalk.txt

   1 = My First Object Walk
   2
   3 == What's an Object Walk?
   4
   5 The object walk is a key concept in Git - this is the process that underpins
   6 operations like object transfer and fsck. Beginning from a given commit, the
   7 list of objects is found by walking parent relationships between commits (commit
   8 X based on commit W) and containment relationships between objects (tree Y is
   9 contained within commit X, and blob Z is located within tree Y, giving our
  10 working tree for commit X something like `y/z.txt`).
  11
  12 A related concept is the revision walk, which is focused on commit objects and
  13 their parent relationships and does not delve into other object types. The
  14 revision walk is used for operations like `git log`.
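
To make the difference between the two walks concrete, here is a minimal sketch using toy structs. Everything named `toy_*` is invented for illustration and is not part of Git; the real walk operates on `struct commit`, `struct tree`, and friends.

```c
#include <stddef.h>

/* Toy stand-ins for Git objects; these are not Git's real structs. */
struct toy_tree {
        size_t nr_blobs;             /* blobs stored directly in this tree */
        size_t nr_subtrees;          /* trees stored directly in this tree */
        struct toy_tree **subtrees;
};

struct toy_commit {
        struct toy_commit *parent;   /* NULL for the initial commit */
        struct toy_tree *tree;       /* root tree recorded by this commit */
};

struct toy_counts {
        size_t commits, trees, blobs;
};

/* Containment relationships: a tree holds blobs and further trees. */
static void toy_walk_tree(const struct toy_tree *tree, struct toy_counts *out)
{
        size_t i;

        out->trees++;
        out->blobs += tree->nr_blobs;
        for (i = 0; i < tree->nr_subtrees; i++)
                toy_walk_tree(tree->subtrees[i], out);
}

/* Revision walk: follow parent relationships only. */
static size_t toy_revision_walk(const struct toy_commit *tip)
{
        size_t nr = 0;

        for (; tip; tip = tip->parent)
                nr++;
        return nr;
}

/* Object walk: follow parents, and descend into each commit's tree. */
static struct toy_counts toy_object_walk(const struct toy_commit *tip)
{
        struct toy_counts out = { 0, 0, 0 };

        for (; tip; tip = tip->parent) {
                out.commits++;
                toy_walk_tree(tip->tree, &out);
        }
        return out;
}
```

Note that this sketch counts a tree once per commit that reaches it. A real walk remembers which objects it has already seen, so treat the toy counts as an upper bound on what Git would report.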
  15
  16 === Related Reading
  17
  18 - `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  19   the revision walker in its various incarnations.
  20 - `Documentation/technical/api-revision-walking.txt`
  21 - https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  22   gives a good overview of the types of objects in Git and what your object
  23   walk is really describing.
  24
  25 == Setting Up
  26
  27 Create a new branch from `master`.
  28
----
$ git checkout -b revwalk origin/master
----
  32
  33 We'll put our fiddling into a new command. For fun, let's name it `git walken`.
  34 Open up a new file `builtin/walken.c` and set up the command handler:
  35
  36 ----
  37 /*
  38  * "git walken"
  39  *
  40  * Part of the "My First Object Walk" tutorial.
  41  */
  42
  43 #include "builtin.h"
  44
  45 int cmd_walken(int argc, const char **argv, const char *prefix)
  46 {
  47         trace_printf(_("cmd_walken incoming...\n"));
  48         return 0;
  49 }
  50 ----
  51
  52 NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
  53 off at runtime. For the purposes of this tutorial, we will write `walken` as
  54 though it is intended for use as a "plumbing" command: that is, a command which
  55 is used primarily in scripts, rather than interactively by humans (a "porcelain"
  56 command). So we will send our debug output to `trace_printf()` instead. When
  57 running, enable trace output by setting the environment variable `GIT_TRACE`.
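
If the idea of output that can be switched on at runtime is unfamiliar, the following sketch shows the general shape. It is not how Git implements `trace_printf()`: the `toy_*` names and the `TOY_TRACE` variable are made up, and the on/off rule is a simplification (as far as we know, Git's `GIT_TRACE` also accepts file descriptors and paths).

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified rule: unset, empty, "0" and "false" all mean "off". */
static int toy_trace_wanted(const char *value)
{
        if (!value || !*value)
                return 0;
        return strcmp(value, "0") && strcmp(value, "false");
}

/*
 * Print to stderr only when the (hypothetical) TOY_TRACE environment
 * variable asks for it. Returns the number of characters written, or 0
 * when tracing is off.
 */
static int toy_trace_printf(const char *fmt, ...)
{
        va_list ap;
        int written;

        if (!toy_trace_wanted(getenv("TOY_TRACE")))
                return 0;

        va_start(ap, fmt);
        written = vfprintf(stderr, fmt, ap);
        va_end(ap);
        return written;
}
```

Because the check happens on every call, the same binary can be quiet in scripts and chatty while debugging, with no rebuild in between.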
  58
  59 Add usage text and `-h` handling, like all subcommands should consistently do
  60 (our test suite will notice and complain if you fail to do so).
  61
----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
        const char * const walken_usage[] = {
                N_("git walken"),
                NULL,
        };
        struct option options[] = {
                OPT_END()
        };

        argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

        ...
}
----
  78
  79 Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
  80
  81 ----
  82 int cmd_walken(int argc, const char **argv, const char *prefix);
  83 ----
  84
  85 Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
  86 maintaining alphabetical ordering:
  87
  88 ----
  89 { "walken", cmd_walken, RUN_SETUP },
  90 ----
  91
  92 Add it to the `Makefile` near the line for `builtin/worktree.o`:
  93
  94 ----
  95 BUILTIN_OBJS += builtin/walken.o
  96 ----
  97
Build and test out your command. Make sure the `DEVELOPER` flag is set, and
enable `GIT_TRACE` so the debug output can be seen:
 100
 101 ----
 102 $ echo DEVELOPER=1 >>config.mak
 103 $ make
 104 $ GIT_TRACE=1 ./bin-wrappers/git walken
 105 ----
 106
 107 NOTE: For a more exhaustive overview of the new command process, take a look at
 108 `Documentation/MyFirstContribution.txt`.
 109
 110 NOTE: A reference implementation can be found at
 111 https://github.com/nasamuffin/git/tree/revwalk.
 112
 113 === `struct rev_cmdline_info`
 114
 115 The definition of `struct rev_cmdline_info` can be found in `revision.h`.
 116
 117 This struct is contained within the `rev_info` struct and is used to reflect
 118 parameters provided by the user over the CLI.
 119
 120 `nr` represents the number of `rev_cmdline_entry` present in the array.
 121
 122 `alloc` is used by the `ALLOC_GROW` macro. Check
 123 `Documentation/technical/api-allocation-growing.txt` - this variable is used to
 124 track the allocated size of the list.
 125
 126 Per entry, we find:
 127
 128 `item` is the object provided upon which to base the object walk. Items in Git
 129 can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
 130
 131 `name` is the object ID (OID) of the object - a hex string you may be familiar
 132 with from using Git to organize your source in the past. Check the tutorial
 133 mentioned above towards the top for a discussion of where the OID can come
 134 from.
 135
 136 `whence` indicates some information about what to do with the parents of the
 137 specified object. We'll explore this flag more later on; take a look at
 138 `Documentation/revisions.txt` to get an idea of what could set the `whence`
 139 value.
 140
 141 `flags` are used to hint the beginning of the revision walk and are the first
 142 block under the `#include`s in `revision.h`. The most likely ones to be set in
 143 the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
 144 can be used during the walk, as well.
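
The `nr`/`alloc` pair is a pattern you will meet all over the Git codebase. Here is a rough, standalone sketch of it. The `toy_*` names are invented, and the growth formula is written from memory of Git's `alloc_nr()` helper, so treat it as an assumption and not a specification.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy version of one command line entry; not Git's real struct. */
struct toy_entry {
        const char *name;
        unsigned flags;
};

struct toy_cmdline {
        size_t nr;      /* entries currently in use */
        size_t alloc;   /* entries we have room for */
        struct toy_entry *rev;
};

/* Append one entry, growing the array when it is full. */
static int toy_cmdline_add(struct toy_cmdline *info, const char *name,
                           unsigned flags)
{
        if (info->nr == info->alloc) {
                /* Assumed growth step: a bit more than 1.5x each time. */
                size_t want = (info->alloc + 16) * 3 / 2;
                struct toy_entry *grown;

                grown = realloc(info->rev, want * sizeof(*grown));
                if (!grown)
                        return -1;
                info->rev = grown;
                info->alloc = want;
        }
        assert(info->nr < info->alloc);

        info->rev[info->nr].name = name;
        info->rev[info->nr].flags = flags;
        info->nr++;
        return 0;
}
```

Start with a zero-initialized `struct toy_cmdline` and call `toy_cmdline_add()` as often as you like; the caller never needs to know how much space has been reserved, only how many entries (`nr`) are valid.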
 145
 146 === `struct rev_info`
 147
 148 This one is quite a bit longer, and many fields are only used during the walk
 149 by `revision.c` - not configuration options. Most of the configurable flags in
 150 `struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
 151 good idea to take some time and read through that document.
 152
 153 == Basic Commit Walk
 154
 155 First, let's see if we can replicate the output of `git log --oneline`. We'll
 156 refer back to the implementation frequently to discover norms when performing
 157 an object walk of our own.
 158
 159 To do so, we'll first find all the commits, in order, which preceded the current
 160 commit. We'll extract the name and subject of the commit from each.
 161
 162 Ideally, we will also be able to find out which ones are currently at the tip of
 163 various branches.
 164
 165 === Setting Up
 166
 167 Preparing for your object walk has some distinct stages.
 168
 169 1. Perform default setup for this mode, and others which may be invoked.
 170 2. Check configuration files for relevant settings.
 171 3. Set up the `rev_info` struct.
 172 4. Tweak the initialized `rev_info` to suit the current walk.
 173 5. Prepare the `rev_info` for the walk.
 174 6. Iterate over the objects, processing each one.
 175
 176 ==== Default Setups
 177
 178 Before examining configuration files which may modify command behavior, set up
 179 default state for switches or options your command may have. If your command
 180 utilizes other Git components, ask them to set up their default states as well.
 181 For instance, `git log` takes advantage of `grep` and `diff` functionality, so
 182 its `init_log_defaults()` sets its own state (`decoration_style`) and asks
 183 `grep` and `diff` to initialize themselves by calling each of their
 184 initialization functions.
 185
 186 For our first example within `git walken`, we don't intend to use any other
 187 components within Git, and we don't have any configuration to do.  However, we
 188 may want to add some later, so for now, we can add an empty placeholder. Create
 189 a new function in `builtin/walken.c`:
 190
 191 ----
 192 static void init_walken_defaults(void)
 193 {
 194         /*
 195          * We don't actually need the same components `git log` does; leave this
 196          * empty for now.
 197          */
 198 }
 199 ----
 200
 201 Make sure to add a line invoking it inside of `cmd_walken()`.
 202
 203 ----
 204 int cmd_walken(int argc, const char **argv, const char *prefix)
 205 {
 206         init_walken_defaults();
 207 }
 208 ----
 209
 210 ==== Configuring From `.gitconfig`
 211
Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the config
callbacks of any other components you rely on, so that they can intercept the
options they care about. Your callback will be invoked once for each
configuration value which Git knows about (global, local, worktree, etc.).
 218
 219 Similarly to the default values, we don't have anything to do here yet
 220 ourselves; however, we should call `git_default_config()` if we aren't calling
 221 any other existing config callbacks.
 222
 223 Add a new function to `builtin/walken.c`:
 224
 225 ----
 226 static int git_walken_config(const char *var, const char *value, void *cb)
 227 {
 228         /*
 229          * For now, we don't have any custom configuration, so fall back to
 230          * the default config.
 231          */
 232         return git_default_config(var, value, cb);
 233 }
 234 ----
 235
 236 Make sure to invoke `git_config()` with it in your `cmd_walken()`:
 237
 238 ----
 239 int cmd_walken(int argc, const char **argv, const char *prefix)
 240 {
 241         ...
 242
 243         git_config(git_walken_config, NULL);
 244
 245         ...
 246 }
 247 ----
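
If the callback flow feels abstract, this standalone sketch mimics it with toy types. None of the `toy_*` names exist in Git, and `walken.verbose` is a hypothetical setting invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_config_entry {
        const char *var;
        const char *value;
};

struct toy_walken_settings {
        int seen;      /* how many values the callback was offered */
        int verbose;   /* hypothetical "walken.verbose" setting */
};

/* Stand-in for a shared default handler that knows common settings. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
        (void)var;
        (void)value;
        (void)cb;
        return 0;
}

/* Handle our own keys first, then defer everything else. */
static int toy_walken_config(const char *var, const char *value, void *cb)
{
        struct toy_walken_settings *settings = cb;

        settings->seen++;
        if (!strcmp(var, "walken.verbose")) {
                settings->verbose = !strcmp(value, "true");
                return 0;
        }
        return toy_default_config(var, value, cb);
}

/* Offer every known value to the callback, one call per value. */
static int toy_config(const struct toy_config_entry *entries, size_t nr,
                      toy_config_fn fn, void *cb)
{
        size_t i;

        for (i = 0; i < nr; i++)
                if (fn(entries[i].var, entries[i].value, cb) < 0)
                        return -1;
        return 0;
}
```

The point to take away is the chaining: the callback sees every value, picks out the ones it understands, and hands the rest to the next handler in line.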
 248
 249 ==== Setting Up `rev_info`
 250
 251 Now that we've gathered external configuration and options, it's time to
 252 initialize the `rev_info` object which we will use to perform the walk. This is
 253 typically done by calling `repo_init_revisions()` with the repository you intend
 254 to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
 255 struct.
 256
Add the `struct rev_info` and the `repo_init_revisions()` call:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
        /* This can go wherever you like in your declarations. */
        struct rev_info rev;
        ...

        /* This should go after the git_config() call. */
        repo_init_revisions(the_repository, &rev, prefix);

        ...
}
----
 271
 272 ==== Tweaking `rev_info` For the Walk
 273
 274 We're getting close, but we're still not quite ready to go. Now that `rev` is
 275 initialized, we can modify it to fit our needs. This is usually done within a
 276 helper for clarity, so let's add one:
 277
 278 ----
 279 static void final_rev_info_setup(struct rev_info *rev)
 280 {
 281         /*
 282          * We want to mimic the appearance of `git log --oneline`, so let's
 283          * force oneline format.
 284          */
 285         get_commit_format("oneline", rev);
 286
 287         /* Start our object walk at HEAD. */
 288         add_head_to_pending(rev);
 289 }
 290 ----
 291
 292 [NOTE]
 293 ====
 294 Instead of using the shorthand `add_head_to_pending()`, you could do
 295 something like this:
 296 ----
 297         struct setup_revision_opt opt;
 298
 299         memset(&opt, 0, sizeof(opt));
 300         opt.def = "HEAD";
 301         opt.revarg_opt = REVARG_COMMITTISH;
 302         setup_revisions(argc, argv, rev, &opt);
 303 ----
 304 Using a `setup_revision_opt` gives you finer control over your walk's starting
 305 point.
 306 ====
 307
 308 Then let's invoke `final_rev_info_setup()` after the call to
 309 `repo_init_revisions()`:
 310
 311 ----
 312 int cmd_walken(int argc, const char **argv, const char *prefix)
 313 {
 314         ...
 315
 316         final_rev_info_setup(&rev);
 317
 318         ...
 319 }
 320 ----
 321
 322 Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
 323 now, this is all we need.
 324
 325 ==== Preparing `rev_info` For the Walk
 326
 327 Now that `rev` is all initialized and configured, we've got one more setup step
 328 before we get rolling. We can do this in a helper, which will both prepare the
 329 `rev_info` for the walk, and perform the walk itself. Let's start the helper
 330 with the call to `prepare_revision_walk()`, which can return an error without
 331 dying on its own:
 332
 333 ----
 334 static void walken_commit_walk(struct rev_info *rev)
 335 {
 336         if (prepare_revision_walk(rev))
 337                 die(_("revision walk setup failed"));
 338 }
 339 ----
 340
 341 NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
 342 `stderr` it's likely to be seen by a human, so we will localize it.
 343
 344 ==== Performing the Walk!
 345
 346 Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
 347 can also be used as an iterator; we move to the next item in the walk by using
 348 `get_revision()` repeatedly. Add the listed variable declarations at the top and
 349 the walk loop below the `prepare_revision_walk()` call within your
 350 `walken_commit_walk()`:
 351
----
static void walken_commit_walk(struct rev_info *rev)
{
        struct commit *commit;
        struct strbuf prettybuf = STRBUF_INIT;

        ...

        /* get_revision() returns NULL once the walk is exhausted. */
        while ((commit = get_revision(rev))) {
                strbuf_reset(&prettybuf);
                pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
                puts(prettybuf.buf);
        }
        strbuf_release(&prettybuf);
}
----
 371
 372 NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
 373 command we expect to be machine-parsed, we're sending it directly to stdout.
 374
 375 Give it a shot.
 376
 377 ----
 378 $ make
 379 $ ./bin-wrappers/git walken
 380 ----
 381
 382 You should see all of the subject lines of all the commits in
 383 your tree's history, in order, ending with the initial commit, "Initial revision
 384 of "git", the information manager from hell". Congratulations! You've written
 385 your first revision walk. You can play with printing some additional fields
 386 from each commit if you're curious; have a look at the functions available in
 387 `commit.h`.
 388
 389 === Adding a Filter
 390
 391 Next, let's try to filter the commits we see based on their author. This is
 392 equivalent to running `git log --author=<pattern>`. We can add a filter by
 393 modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
 394
 395 First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
 396 `grep_config()` to `git_walken_config()`:
 397
 398 ----
 399 static void init_walken_defaults(void)
 400 {
 401         init_grep_defaults(the_repository);
 402 }
 403
 404 ...
 405
 406 static int git_walken_config(const char *var, const char *value, void *cb)
 407 {
 408         grep_config(var, value, cb);
 409         return git_default_config(var, value, cb);
 410 }
 411 ----
 412
 413 Next, we can modify the `grep_filter`. This is done with convenience functions
 414 found in `grep.h`. For fun, we're filtering to only commits from folks using a
 415 `gmail.com` email address - a not-very-precise guess at who may be working on
 416 Git as a hobby. Since we're checking the author, which is a specific line in the
 417 header, we'll use the `append_header_grep_pattern()` helper. We can use
 418 the `enum grep_header_field` to indicate which part of the commit header we want
 419 to search.
 420
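In `final_rev_info_setup()`, add your filter line (the examples from here on
also pass `argc`, `argv`, and `prefix` to this helper; if you adopt that
signature, update the call in `cmd_walken()` to match):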
 422
 423 ----
 424 static void final_rev_info_setup(int argc, const char **argv,
 425                 const char *prefix, struct rev_info *rev)
 426 {
 427         ...
 428
 429         append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
 430                 "gmail");
 431         compile_grep_patterns(&rev->grep_filter);
 432
 433         ...
 434 }
 435 ----
 436
 437 `append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
 438 it won't work unless we compile it with `compile_grep_patterns()`.
 439
 440 NOTE: If you are using `setup_revisions()` (for example, if you are passing a
 441 `setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
 442 to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
 443
 444 NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
 445 wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
 446 `enum grep_pat_token` for us.
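
To see why a header filter differs from a plain search over the whole commit, consider this simplified sketch. It is not Git's grep machinery: `toy_author_matches()` is a made-up helper that does a fixed-string search where Git compiles real patterns, and it assumes the commit is laid out as header lines, a blank line, then the message.

```c
#include <string.h>

/*
 * Return 1 if the "author " header line of a raw commit contains
 * pattern as a substring, 0 otherwise. Only header lines are examined;
 * the commit message after the first blank line is ignored.
 */
static int toy_author_matches(const char *buf, const char *pattern)
{
        const char *line = buf;
        size_t plen = strlen(pattern);

        while (*line && *line != '\n') {
                const char *eol = strchr(line, '\n');
                size_t len = eol ? (size_t)(eol - line) : strlen(line);

                if (len >= 7 && !strncmp(line, "author ", 7)) {
                        size_t i;

                        for (i = 7; i + plen <= len; i++)
                                if (!strncmp(line + i, pattern, plen))
                                        return 1;
                        return 0;
                }
                if (!eol)
                        break;
                line = eol + 1;
        }
        return 0;
}
```

A commit whose committer or message mentions "gmail" does not match here; only the author line counts, which is the behavior we are asking for with `GREP_HEADER_AUTHOR`.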
 447
 448 === Changing the Order
 449
 450 There are a few ways that we can change the order of the commits during a
 451 revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
 452 typical orderings.
 453
 454 `topo_order` is the same as `git log --topo-order`: we avoid showing a parent
 455 before all of its children have been shown, and we avoid mixing commits which
 456 are in different lines of history. (`git help log`'s section on `--topo-order`
 457 has a very nice diagram to illustrate this.)
 458
 459 Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
 460 `REV_SORT_BY_AUTHOR_DATE`. Add the following:
 461
 462 ----
 463 static void final_rev_info_setup(int argc, const char **argv,
 464                 const char *prefix, struct rev_info *rev)
 465 {
 466         ...
 467
 468         rev->topo_order = 1;
 469         rev->sort_order = REV_SORT_BY_COMMIT_DATE;
 470
 471         ...
 472 }
 473 ----
 474
 475 Let's output this into a file so we can easily diff it with the walk sorted by
 476 author date.
 477
 478 ----
 479 $ make
 480 $ ./bin-wrappers/git walken > commit-date.txt
 481 ----
 482
 483 Then, let's sort by author date and run it again.
 484
 485 ----
 486 static void final_rev_info_setup(int argc, const char **argv,
 487                 const char *prefix, struct rev_info *rev)
 488 {
 489         ...
 490
 491         rev->topo_order = 1;
 492         rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
 493
 494         ...
 495 }
 496 ----
 497
 498 ----
 499 $ make
 500 $ ./bin-wrappers/git walken > author-date.txt
 501 ----
 502
 503 Finally, compare the two. This is a little less helpful without object names or
 504 dates, but hopefully we get the idea.
 505
 506 ----
 507 $ diff -u commit-date.txt author-date.txt
 508 ----
 509
 510 This display indicates that commits can be reordered after they're written, for
 511 example with `git rebase`.
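
The effect is easy to reproduce in miniature. The sketch below sorts a toy list of commits newest-first by either date; the struct and function names are invented, and it ignores the topological constraints that `topo_order` adds in the real walk.

```c
#include <stdlib.h>

/* Toy commit carrying just the two timestamps we care about. */
struct toy_commit_dates {
        const char *subject;
        long author_date;   /* when the change was first written */
        long commit_date;   /* when this commit object was created */
};

/* Newest commit date first. */
static int toy_cmp_commit_date(const void *a, const void *b)
{
        const struct toy_commit_dates *x = a, *y = b;

        return (x->commit_date < y->commit_date) -
               (x->commit_date > y->commit_date);
}

/* Newest author date first. */
static int toy_cmp_author_date(const void *a, const void *b)
{
        const struct toy_commit_dates *x = a, *y = b;

        return (x->author_date < y->author_date) -
               (x->author_date > y->author_date);
}

static void toy_sort(struct toy_commit_dates *list, size_t nr,
                     int by_author_date)
{
        qsort(list, nr, sizeof(*list),
              by_author_date ? toy_cmp_author_date : toy_cmp_commit_date);
}
```

Give one entry an old `author_date` and a recent `commit_date`, as a rebased commit would have, and it lands in a different position depending on which comparison you pick.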
 512
 513 Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
 514 Set that flag somewhere inside of `final_rev_info_setup()`:
 515
 516 ----
 517 static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
 518                 struct rev_info *rev)
 519 {
 520         ...
 521
 522         rev->reverse = 1;
 523
 524         ...
 525 }
 526 ----
 527
 528 Run your walk again and note the difference in order. (If you remove the grep
 529 pattern, you should see the last commit this call gives you as your current
 530 HEAD.)
 531
 532 == Basic Object Walk
 533
 534 So far we've been walking only commits. But Git has more types of objects than
 535 that! Let's see if we can walk _all_ objects, and find out some information
 536 about each one.
 537
We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list_filtered()`, which are a superset of the
arguments to the unfiltered version.
 547
 548 - `struct list_objects_filter_options *filter_options`: This is a struct which
 549   stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
 550 - `struct rev_info *revs`: This is the `rev_info` used for the walk.
 551 - `show_commit_fn show_commit`: A callback which will be used to handle each
 552   individual commit object.
 553 - `show_object_fn show_object`: A callback which will be used to handle each
 554   non-commit object (so each blob, tree, or tag).
 555 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
 556   and `show_object`.
- `struct oidset *omitted`: A set of object IDs which the provided filter caused
  to be omitted.
 559
 560 It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
 561 instead of needing us to call it repeatedly ourselves. Cool! Let's add the
 562 callbacks first.
 563
 564 For the sake of this tutorial, we'll simply keep track of how many of each kind
 565 of object we find. At file scope in `builtin/walken.c` add the following
 566 tracking variables:
 567
 568 ----
 569 static int commit_count;
 570 static int tag_count;
 571 static int blob_count;
 572 static int tree_count;
 573 ----
 574
 575 Commits are handled by a different callback than other objects; let's do that
 576 one first:
 577
 578 ----
 579 static void walken_show_commit(struct commit *cmt, void *buf)
 580 {
 581         commit_count++;
 582 }
 583 ----
 584
 585 The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
 586 the `buf` argument is actually the context buffer that we can provide to the
 587 traversal calls - `show_data`, which we mentioned a moment ago.
 588
 589 Since we have the `struct commit` object, we can look at all the same parts that
 590 we looked at in our earlier commit-only walk. For the sake of this tutorial,
 591 though, we'll just increment the commit counter and move on.
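
The context pointer is worth a closer look, because it offers an alternative to the file-scope counters used in this tutorial. The following sketch is a toy model of the callback arrangement, with every `toy_*` name invented for the purpose; it only shows how a caller-owned struct can travel through `show_data`.

```c
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
        enum toy_type type;
};

typedef void (*toy_show_fn)(const struct toy_object *obj, void *show_data);

/* State owned by the caller and threaded through the traversal. */
struct toy_tally {
        int commits;
        int others;
};

static void toy_show_commit(const struct toy_object *obj, void *show_data)
{
        struct toy_tally *tally = show_data;

        (void)obj;
        tally->commits++;
}

static void toy_show_object(const struct toy_object *obj, void *show_data)
{
        struct toy_tally *tally = show_data;

        (void)obj;
        tally->others++;
}

/* Route commits to one callback and everything else to the other. */
static void toy_traverse(const struct toy_object *objs, size_t nr,
                         toy_show_fn show_commit, toy_show_fn show_object,
                         void *show_data)
{
        size_t i;

        for (i = 0; i < nr; i++) {
                if (objs[i].type == TOY_COMMIT)
                        show_commit(&objs[i], show_data);
                else
                        show_object(&objs[i], show_data);
        }
}
```

Keeping the counters in a struct like this means two walks can run without sharing state. The tutorial sticks with static variables only to keep the code short.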
 592
 593 The callback for non-commits is a little different, as we'll need to check
 594 which kind of object we're dealing with:
 595
 596 ----
 597 static void walken_show_object(struct object *obj, const char *str, void *buf)
 598 {
 599         switch (obj->type) {
 600         case OBJ_TREE:
 601                 tree_count++;
 602                 break;
 603         case OBJ_BLOB:
 604                 blob_count++;
 605                 break;
 606         case OBJ_TAG:
 607                 tag_count++;
 608                 break;
 609         case OBJ_COMMIT:
 610                 BUG("unexpected commit object in walken_show_object\n");
 611         default:
 612                 BUG("unexpected object type %s in walken_show_object\n",
 613                         type_name(obj->type));
 614         }
 615 }
 616 ----
 617
 618 Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
 619 context pointer that `walken_show_commit()` receives: the `show_data` argument
 620 to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
 621 `str` contains the name of the object, which ends up being something like
 622 `foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
 623
 624 To help assure us that we aren't double-counting commits, we'll include some
 625 complaining if a commit object is routed through our non-commit callback; we'll
 626 also complain if we see an invalid object type. Since those two cases should be
 627 unreachable, and would only change in the event of a semantic change to the Git
 628 codebase, we complain by using `BUG()` - which is a signal to a developer that
 629 the change they made caused unintended consequences, and the rest of the
 630 codebase needs to be updated to understand that change. `BUG()` is not intended
 631 to be seen by the public, so it is not localized.
 632
 633 Our main object walk implementation is substantially different from our commit
 634 walk implementation, so let's make a new function to perform the object walk. We
 635 can perform setup which is applicable to all objects here, too, to keep separate
 636 from setup which is applicable to commit-only walks.
 637
 638 We'll start by enabling all types of objects in the `struct rev_info`.  We'll
 639 also turn on `tree_blobs_in_commit_order`, which means that we will walk a
 640 commit's tree and everything it points to immediately after we find each commit,
 641 as opposed to waiting for the end and walking through all trees after the commit
 642 history has been discovered. With the appropriate settings configured, we are
 643 ready to call `prepare_revision_walk()`.
 644
 645 ----
 646 static void walken_object_walk(struct rev_info *rev)
 647 {
 648         rev->tree_objects = 1;
 649         rev->blob_objects = 1;
 650         rev->tag_objects = 1;
 651         rev->tree_blobs_in_commit_order = 1;
 652
 653         if (prepare_revision_walk(rev))
 654                 die(_("revision walk setup failed"));
 655
 656         commit_count = 0;
 657         tag_count = 0;
 658         blob_count = 0;
 659         tree_count = 0;
 660 ----
 661
 662 Let's start by calling just the unfiltered walk and reporting our counts.
 663 Complete your implementation of `walken_object_walk()`:
 664
 665 ----
 666         traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
 667
 668         printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
 669                 blob_count, tag_count, tree_count);
 670 }
 671 ----
 672
 673 NOTE: This output is intended to be machine-parsed. Therefore, we are not
 674 sending it to `trace_printf()`, and we are not localizing it - we need scripts
 675 to be able to count on the formatting to be exactly the way it is shown here.
 676 If we were intending this output to be read by humans, we would need to localize
 677 it with `_()`.
 678
 679 Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
 680 command line options is out of scope for this tutorial, so we'll just hardcode
 681 a branch we can change at compile time. Where you call `final_rev_info_setup()`
 682 and `walken_commit_walk()`, instead branch like so:
 683
 684 ----
 685         if (1) {
 686                 add_head_to_pending(&rev);
 687                 walken_object_walk(&rev);
 688         } else {
 689                 final_rev_info_setup(argc, argv, prefix, &rev);
 690                 walken_commit_walk(&rev);
 691         }
 692 ----
 693
 694 NOTE: For simplicity, we've avoided all the filters and sorts we applied in
 695 `final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
 696 want, you can certainly use the filters we added before by moving
 697 `final_rev_info_setup()` out of the conditional and removing the call to
 698 `add_head_to_pending()`.
 699
 700 Now we can try to run our command! It should take noticeably longer than the
 701 commit walk, but an examination of the output will give you an idea why. Your
 702 output should look similar to this example, but with different counts:
 703
----
commits 55733
blobs 100274
tags 0
trees 104210
----
 707
 708 This makes sense. We have more trees than commits because the Git project has
 709 lots of subdirectories which can change, plus at least one tree per commit. We
 710 have no tags because we started on a commit (`HEAD`) and while tags can point to
 711 commits, commits can't point to tags.
 712
 713 NOTE: You will have different counts when you run this yourself! The number of
 714 objects grows along with the Git project.
 715
 716 === Adding a Filter
 717
 718 There are a handful of filters that we can apply to the object walk laid out in
 719 `Documentation/rev-list-options.txt`. These filters are typically useful for
 720 operations such as creating packfiles or performing a partial clone. They are
 721 defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
 722 will use the "tree:1" filter, which causes the walk to omit all trees and blobs
 723 which are not directly referenced by commits reachable from the commit in
 724 `pending` when the walk begins. (`pending` is the list of objects which need to
 725 be traversed during a walk; you can imagine a breadth-first tree traversal to
 726 help understand. In our case, that means we omit trees and blobs not directly
 727 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
 728 `HEAD` in the `pending` list.)
 729
First, we'll need to `#include "list-objects-filter-options.h"` and set up the
`struct list_objects_filter_options` at the top of the function.
 732
----
static void walken_object_walk(struct rev_info *rev)
{
        struct list_objects_filter_options filter_options = { 0 };

        ...
----
 740
 741 For now, we are not going to track the omitted objects, so we'll replace those
 742 parameters with `NULL`. For the sake of simplicity, we'll add a simple
 743 build-time branch to use our filter or not. Replace the line calling
 744 `traverse_commit_list()` with the following, which will remind us which kind of
 745 walk we've just performed:
 746
 747 ----
 748         if (0) {
 749                 /* Unfiltered: */
 750                 trace_printf(_("Unfiltered object walk.\n"));
 751                 traverse_commit_list(rev, walken_show_commit,
 752                                 walken_show_object, NULL);
 753         } else {
 754                 trace_printf(
 755                         _("Filtered object walk with filterspec 'tree:1'.\n"));
 756                 parse_list_objects_filter(&filter_options, "tree:1");
 757
 758                 traverse_commit_list_filtered(&filter_options, rev,
 759                         walken_show_commit, walken_show_object, NULL, NULL);
 760         }
 761 ----
 762
 763 `struct list_objects_filter_options` is usually built directly from a command
 764 line argument, so the module provides an easy way to build one from a string.
 765 Even though we aren't taking user input right now, we can still build one with
 766 a hardcoded string using `parse_list_objects_filter()`.
 767
With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (For an example of why that's true: a commit made with
`git revert` that undoes its own parent points to the same tree object as its
grandparent.)
 772
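To build some intuition for what a depth limit does, here is a small sketch of a depth-limited walk over a single toy tree. The `toy_*` names are invented, and the rule it applies (keep an object only if its depth below the root tree is less than the limit) is our reading of the `tree:<depth>` description in `Documentation/rev-list-options.txt`, so double-check it there.

```c
#include <stddef.h>

/* Toy tree entry: either a blob, or a tree with its own entries. */
struct toy_node {
        int is_tree;
        size_t nr;                   /* number of entries; 0 for blobs */
        struct toy_node **entries;
};

/*
 * Count the objects at or below node which survive a depth limit.
 * The root tree sits at depth 0, its entries at depth 1, and so on;
 * anything at a depth greater than or equal to limit is omitted.
 */
static size_t toy_count_kept(const struct toy_node *node,
                             unsigned long depth, unsigned long limit)
{
        size_t i, kept;

        if (depth >= limit)
                return 0;

        kept = 1;
        if (node->is_tree)
                for (i = 0; i < node->nr; i++)
                        kept += toy_count_kept(node->entries[i],
                                               depth + 1, limit);
        return kept;
}
```

With a limit of 1 only the root tree itself is kept, which matches the expectation above that the filtered walk reports at most one tree per commit.
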
 773 === Counting Omitted Objects
 774
We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --objects --filter=<spec> --filter-print-omitted`.
Asking `traverse_commit_list_filtered()` to populate the `omitted` list means
that our object walk does not perform any better than an unfiltered object walk;
all reachable objects are walked in order to populate the list.
 780
 781 First, add the `struct oidset` and related items we will use to iterate it:
 782
----
static void walken_object_walk(struct rev_info *rev)
{
        ...

        struct oidset omitted;
        struct oidset_iter oit;
        struct object_id *oid = NULL;
        int omitted_count = 0;
        oidset_init(&omitted, 0);

        ...
----
 795
 796 Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
 797 object:
 798
 799 ----
 800         ...
 801
 802                 traverse_commit_list_filtered(&filter_options, rev,
 803                         walken_show_commit, walken_show_object, NULL, &omitted);
 804
 805         ...
 806 ----
 807
 808 Then, after your traversal, the `oidset` traversal is pretty straightforward.
 809 Count all the objects within and modify the print statement:
 810
----
        /* Count the omitted objects. */
        oidset_iter_init(&omitted, &oit);

        while ((oid = oidset_iter_next(&oit)))
                omitted_count++;

        printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
                commit_count, blob_count, tag_count, tree_count, omitted_count);
----
 821
 822 By running your walk with and without the filter, you should find that the total
 823 object count in each case is identical. You can also time each invocation of
 824 the `walken` subcommand, with and without `omitted` being passed in, to confirm
 825 to yourself the runtime impact of tracking all omitted objects.
 826
 827 === Changing the Order
 828
 829 Finally, let's demonstrate that you can also reorder walks of all objects, not
 830 just walks of commits. First, we'll make our handlers chattier - modify
 831 `walken_show_commit()` and `walken_show_object()` to print the object as they
 832 go:
 833
 834 ----
 835 static void walken_show_commit(struct commit *cmt, void *buf)
 836 {
 837         trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
 838         commit_count++;
 839 }
 840
 841 static void walken_show_object(struct object *obj, const char *str, void *buf)
 842 {
 843         trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
 844
 845         ...
 846 }
 847 ----
 848
 849 NOTE: Since we will be examining this output directly as humans, we'll use
 850 `trace_printf()` here. Additionally, since this change introduces a significant
 851 number of printed lines, using `trace_printf()` will allow us to easily silence
 852 those lines without having to recompile.
 853
 854 (Leave the counter increment logic in place.)
 855
 856 With only that change, run again (but save yourself some scrollback):
 857
----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----
 861
 862 Take a look at the top commit with `git show` and the object ID you printed; it
 863 should be the same as the output of `git show HEAD`.
 864
 865 Next, let's change a setting on our `struct rev_info` within
 866 `walken_object_walk()`. Find where you're changing the other settings on `rev`,
 867 such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
 868 `reverse` setting at the bottom:
 869
 870 ----
 871         ...
 872
 873         rev->tree_objects = 1;
 874         rev->blob_objects = 1;
 875         rev->tag_objects = 1;
 876         rev->tree_blobs_in_commit_order = 1;
 877         rev->reverse = 1;
 878
 879         ...
 880 ----
 881
 882 Now, run again, but this time, let's grab the last handful of objects instead
 883 of the first handful:
 884
----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----
 889
 890 The last commit object given should have the same OID as the one we saw at the
 891 top before, and running `git show <oid>` with that OID should give you again
 892 the same results as `git show HEAD`. Furthermore, if you run and examine the
 893 first ten lines again (with `head` instead of `tail` like we did before applying
 894 the `reverse` setting), you should see that now the first commit printed is the
 895 initial commit, `e83c5163`.
 896
 897 == Wrapping Up
 898
 899 Let's review. In this tutorial, we:
 900
 901 - Built a commit walk from the ground up
 902 - Enabled a grep filter for that commit walk
 903 - Changed the sort order of that filtered commit walk
 904 - Built an object walk (tags, commits, trees, and blobs) from the ground up
 905 - Learned how to add a filter-spec to an object walk
 906 - Changed the display order of the filtered object walk