MyFirstObjectWalk.txt

= My First Object Walk

== What's an Object Walk?

The object walk is a key concept in Git: it is the process that underpins
operations like object transfer and fsck. Beginning from a given commit, the
list of objects is found by walking parent relationships between commits (commit
X based on commit W) and containment relationships between objects (tree Y is
contained within commit X, and blob Z is located within tree Y, giving our
working tree for commit X something like `y/z.txt`).

A related concept is the revision walk, which is focused on commit objects and
their parent relationships and does not delve into other object types. The
revision walk is used for operations like `git log`.
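To make the distinction concrete, here is a minimal standalone C sketch. The `toy_object` type and the `toy_walk()` and `toy_clear_seen()` helpers are hypothetical and much simpler than Git's own object structures. The sketch shows only the core idea: an object walk follows both parent and containment links, a revision walk follows parent links alone, and a "seen" mark keeps an object that is reachable along several paths from being counted twice.

----
#include <stddef.h>

/* Hypothetical toy model; not Git's object types. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_object {
        enum toy_type type;
        struct toy_object *tree;        /* commits only: the root tree */
        struct toy_object *links[4];    /* commit: parents; tree: entries */
        size_t nr_links;
        int seen;                       /* set once a walk visits us */
};

/*
 * Count the objects reachable from `obj`. With all_objects == 0 this is a
 * revision walk (commits and parent links only); otherwise it is an object
 * walk that also descends into each commit's tree.
 */
size_t toy_walk(struct toy_object *obj, int all_objects)
{
        size_t n, i;

        if (!obj || obj->seen)
                return 0;
        obj->seen = 1;
        n = 1;
        if (all_objects)
                n += toy_walk(obj->tree, all_objects);
        for (i = 0; i < obj->nr_links; i++)
                n += toy_walk(obj->links[i], all_objects);
        return n;
}

/* Clear the marks left by toy_walk() so that another walk can start fresh. */
void toy_clear_seen(struct toy_object *obj)
{
        size_t i;

        if (!obj || !obj->seen)
                return;
        obj->seen = 0;
        toy_clear_seen(obj->tree);
        for (i = 0; i < obj->nr_links; i++)
                toy_clear_seen(obj->links[i]);
}
----

With a commit `x` whose parent is `w`, where both root trees contain the same subtree `y` (holding blob `z`) and `x`'s root tree also holds a blob `a`, `toy_walk(&x, 1)` counts 7 objects, with `y` and `z` counted once; after `toy_clear_seen(&x)`, `toy_walk(&x, 0)` counts only the 2 commits.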

=== Related Reading

- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  the revision walker in its various incarnations.
- `revision.h`
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  gives a good overview of the types of objects in Git and what your object
  walk is really describing.

== Setting Up

Create a new branch based on `origin/master`:

----
git checkout -b revwalk origin/master
----

We'll put our fiddling into a new command. For fun, let's name it `git walken`.
Open up a new file `builtin/walken.c` and set up the command handler:

----
/*
 * "git walken"
 *
 * Part of the "My First Object Walk" tutorial.
 */

#include "builtin.h"
#include "trace.h"

int cmd_walken(int argc, const char **argv, const char *prefix)
{
        trace_printf(_("cmd_walken incoming...\n"));
        return 0;
}
----

NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in
that it can be turned on or off at runtime. For the purposes of this
tutorial, we will write `walken` as though it is intended for use as
a "plumbing" command: that is, a command which is used primarily in
scripts, rather than interactively by humans (a "porcelain" command).
Since scripts may parse whatever a plumbing command prints to `stdout`, we
send our debug output to `trace_printf()` instead. When running, enable
trace output by setting the environment variable `GIT_TRACE`.
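The runtime switch can be pictured with a small standalone sketch. `toy_trace_wanted()` and `toy_trace_printf()` are hypothetical stand-ins, not Git's `trace.c`. They assume a simplified rule: an unset or empty `GIT_TRACE`, or the values `0` and `false` (compared case-sensitively here), mean "off", and anything else sends the message to `stderr`. Git itself also accepts other values, such as a file descriptor number or an absolute path to append to.

----
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified rule: is tracing on for this GIT_TRACE value? */
int toy_trace_wanted(const char *value)
{
        if (!value || !*value)
                return 0;
        return strcmp(value, "0") != 0 && strcmp(value, "false") != 0;
}

/* Like printf(), but prints to stderr only when GIT_TRACE enables it. */
int toy_trace_printf(const char *fmt, ...)
{
        va_list ap;
        int ret;

        if (!toy_trace_wanted(getenv("GIT_TRACE")))
                return 0;
        va_start(ap, fmt);
        ret = vfprintf(stderr, fmt, ap);
        va_end(ap);
        return ret;
}
----

A program calling `toy_trace_printf()` prints its messages to `stderr` when run with `GIT_TRACE=1`, and prints nothing when the variable is unset, without being recompiled.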

Add usage text and `-h` handling, as every subcommand should (our test suite
will notice and complain if you fail to do so). We'll need to include the
`parse-options.h` header.

----
#include "parse-options.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
        const char * const walken_usage[] = {
                N_("git walken"),
                NULL,
        };
        struct option options[] = {
                OPT_END()
        };

        argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

        ...
}
----

Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix);
----

Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
maintaining alphabetical ordering:

----
{ "walken", cmd_walken, RUN_SETUP },
----
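As a rough picture of what that table does, here is a standalone sketch of a name-to-handler table with a linear lookup. `struct toy_cmd`, `toy_get_builtin()`, `toy_commands_sorted()`, and the handler functions are hypothetical; the real entries in `git.c` also carry option bits such as `RUN_SETUP`.

----
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a builtin command table. */
typedef int (*toy_cmd_fn)(int argc, const char **argv, const char *prefix);

struct toy_cmd {
        const char *name;
        toy_cmd_fn fn;
};

static int toy_cmd_version(int argc, const char **argv, const char *prefix)
{
        (void)argc; (void)argv; (void)prefix;
        return 1;
}

static int toy_cmd_walken(int argc, const char **argv, const char *prefix)
{
        (void)argc; (void)argv; (void)prefix;
        return 2;
}

static int toy_cmd_whatchanged(int argc, const char **argv, const char *prefix)
{
        (void)argc; (void)argv; (void)prefix;
        return 3;
}

/* Kept in alphabetical order, following the convention described above. */
static const struct toy_cmd toy_commands[] = {
        { "version", toy_cmd_version },
        { "walken", toy_cmd_walken },
        { "whatchanged", toy_cmd_whatchanged },
};

#define TOY_NR_COMMANDS (sizeof(toy_commands) / sizeof(toy_commands[0]))

/* Find the handler registered under `name`, or NULL if there is none. */
const struct toy_cmd *toy_get_builtin(const char *name)
{
        size_t i;

        for (i = 0; i < TOY_NR_COMMANDS; i++)
                if (!strcmp(name, toy_commands[i].name))
                        return &toy_commands[i];
        return NULL;
}

/* Return 1 if the table is in strictly increasing alphabetical order. */
int toy_commands_sorted(void)
{
        size_t i;

        for (i = 1; i < TOY_NR_COMMANDS; i++)
                if (strcmp(toy_commands[i - 1].name, toy_commands[i].name) >= 0)
                        return 0;
        return 1;
}
----

This sketch uses a linear scan, so ordering does not affect lookup correctness here; `toy_commands_sorted()` only enforces the convention.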

Add it to the `Makefile` near the line for `builtin/worktree.o`:

----
BUILTIN_OBJS += builtin/walken.o
----

Build and test your command, making sure that the `DEVELOPER` flag is set and
that `GIT_TRACE` is enabled so the debug output can be seen:

----
$ echo DEVELOPER=1 >>config.mak
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken
----

NOTE: For a more exhaustive overview of the new command process, take a look at
`Documentation/MyFirstContribution.txt`.

NOTE: A reference implementation can be found at
https://github.com/nasamuffin/git/tree/revwalk.

=== `struct rev_cmdline_info`

The definition of `struct rev_cmdline_info` can be found in `revision.h`.

This struct is contained within the `rev_info` struct and is used to reflect
parameters provided by the user on the command line.

`nr` represents the number of `rev_cmdline_entry` elements present in the array.

`alloc` is used by the `ALLOC_GROW` macro (see `alloc.h`) to track the
allocated size of the list.

Per entry, we find:

`item` is the object provided upon which to base the object walk. Items in Git
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)

`name` is the name by which the object was specified, such as a branch name or
an object ID (OID), the hex string you may be familiar with from using Git to
organize your source in the past. Check the tutorial mentioned above for a
discussion of where the OID can come from.

`whence` indicates some information about what to do with the parents of the
specified object; take a look at `Documentation/revisions.txt` to get an idea
of what could set the `whence` value.

`flags` are used to hint the beginning of the revision walk and are the first
block under the `#include`s in `revision.h`. The most likely ones to be set in
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
can be used during the walk, as well.
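The `nr`/`alloc` pair follows a common Git pattern: `nr` counts the entries in use, `alloc` counts the slots allocated, and the array grows only when a new entry would not fit. Below is a standalone sketch of that pattern. `struct toy_cmdline_info`, `toy_cmdline_add()`, and `TOY_ALLOC_NR()` are hypothetical names, though the growth formula mirrors the one used by Git's `alloc_nr()`.

----
#include <stdlib.h>

/* Growth formula mirroring Git's alloc_nr(). */
#define TOY_ALLOC_NR(x) (((x) + 16) * 3 / 2)

struct toy_cmdline_entry {
        const char *name;
        unsigned flags;
};

/* Hypothetical stand-in for the nr/alloc/array trio in rev_cmdline_info. */
struct toy_cmdline_info {
        size_t nr;                      /* entries in use */
        size_t alloc;                   /* entries allocated */
        struct toy_cmdline_entry *rev;
};

/* Append one entry, growing the array first if needed; -1 on failure. */
int toy_cmdline_add(struct toy_cmdline_info *info, const char *name,
                    unsigned flags)
{
        if (info->nr + 1 > info->alloc) {
                size_t want = TOY_ALLOC_NR(info->alloc);
                struct toy_cmdline_entry *grown;

                if (want < info->nr + 1)
                        want = info->nr + 1;
                grown = realloc(info->rev, want * sizeof(*grown));
                if (!grown)
                        return -1;
                info->rev = grown;
                info->alloc = want;
        }
        info->rev[info->nr].name = name;
        info->rev[info->nr].flags = flags;
        info->nr++;
        return 0;
}
----

Starting from an empty list, the first append allocates 24 slots ((0 + 16) * 3 / 2), and the 25th append grows the array to 60 ((24 + 16) * 3 / 2), so reallocation stays infrequent as the list grows.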

=== `struct rev_info`

This struct is quite a bit longer, and many of its fields are not configuration
options but are used only by `revision.c` during the walk. Most of the
configurable flags in `struct rev_info` have a mirror in
`Documentation/rev-list-options.txt`. It's a good idea to take some time and
read through that document.

== Basic Commit Walk

First, let's see if we can replicate the output of `git log --oneline`. We'll
refer back to the implementation frequently to discover norms when performing
an object walk of our own.

To do so, we'll first find all the commits, in order, which preceded the current
commit. We'll extract the name and subject of the commit from each.

Ideally, we will also be able to find out which ones are currently at the tip of
various branches.

=== Setting Up

Preparing for your object walk has some distinct stages.

1. Perform default setup for this mode, and others which may be invoked.
2. Check configuration files for relevant settings.
3. Set up the `rev_info` struct.
4. Tweak the initialized `rev_info` to suit the current walk.
5. Prepare the `rev_info` for the walk.
6. Iterate over the objects, processing each one.

==== Default Setups

Before examining configuration files which may modify command behavior, set up
default state for switches or options your command may have. If your command
utilizes other Git components, ask them to set up their default states as well.
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
`grep` and `diff` to initialize themselves by calling each of their
initialization functions.

==== Configuring From `.gitconfig`

Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the
config callbacks of any other components you use that need to see these
options. Your callback will be invoked once for each configuration value which
Git knows about (global, local, worktree, etc.).

Similarly to the default values, we don't have anything to do here yet
ourselves; however, we should call `git_default_config()` if we aren't calling
any other existing config callbacks.

Add a new function to `builtin/walken.c`.
We'll also need to include the `config.h` header:

----
#include "config.h"

...

static int git_walken_config(const char *var, const char *value, void *cb)
{
        /*
         * For now, we don't have any custom configuration, so fall back to
         * the default config.
         */
        return git_default_config(var, value, cb);
}
----

Make sure to invoke `git_config()` with it in your `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
        ...

        git_config(git_walken_config, NULL);

        ...
}
----
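To see how a callback like `git_walken_config()` fits in, here is a standalone sketch of the same shape: a driver calls one callback per configuration entry, the command handles the keys it owns, and everything else falls through to a default handler. `toy_config()`, `toy_default_config()`, `toy_walken_config()`, and the `walken.verbose` key are all hypothetical, and unlike `git_config()` the driver reads an in-memory array rather than Git's configuration files.

----
#include <stddef.h>
#include <string.h>

typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

/* Context handed to every callback through the `cb` pointer. */
struct toy_walken_opts {
        int verbose;            /* from the hypothetical walken.verbose */
        int passed_to_default;  /* keys handled by the default callback */
};

/* Stand-in for git_default_config(): here it only counts what it sees. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
        struct toy_walken_opts *opts = cb;

        (void)var;
        (void)value;
        opts->passed_to_default++;
        return 0;
}

/* Handle our own key and defer everything else to the default. */
static int toy_walken_config(const char *var, const char *value, void *cb)
{
        struct toy_walken_opts *opts = cb;

        if (!strcmp(var, "walken.verbose")) {
                opts->verbose = value && !strcmp(value, "true");
                return 0;
        }
        return toy_default_config(var, value, cb);
}

/* Stand-in for git_config(): call `fn` once per (var, value) pair. */
int toy_config(toy_config_fn fn, const char *const pairs[][2], size_t nr,
               void *cb)
{
        size_t i;

        for (i = 0; i < nr; i++) {
                int ret = fn(pairs[i][0], pairs[i][1], cb);

                if (ret < 0)
                        return ret;
        }
        return 0;
}
----

Because `toy_walken_config()` hands every key it does not own to the default handler, a single pass over the configuration serves both the command and the components it builds on.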

==== Setting Up `rev_info`

Now that we've gathered external configuration and options, it's time to
initialize the `rev_info` object which we will use to perform the walk. This is
typically done by calling `repo_init_revisions()` with the repository you intend
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
struct.

Add the `struct rev_info` and the `repo_init_revisions()` call.
We'll also need to include the `revision.h` header:

----
#include "revision.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
        /* This can go wherever you like in your declarations. */
        struct rev_info rev;
        ...

        /* This should go after the git_config() call. */
        repo_init_revisions(the_repository, &rev, prefix);

        ...
}
----

==== Tweaking `rev_info` For the Walk

We're getting close, but we're still not quite ready to go. Now that `rev` is
initialized, we can modify it to fit our needs. This is usually done within a
helper for clarity, so let's add one:

----
static void final_rev_info_setup(struct rev_info *rev)
{
        /*
         * We want to mimic the appearance of `git log --oneline`, so let's
         * force oneline format.
         */
        get_commit_format("oneline", rev);

        /* Start our object walk at HEAD. */
        add_head_to_pending(rev);
}
----

[NOTE]
====
Instead of using the shorthand `add_head_to_pending()`, you could do
something like this:
----
        struct setup_revision_opt opt;

        memset(&opt, 0, sizeof(opt));
        opt.def = "HEAD";
        opt.revarg_opt = REVARG_COMMITTISH;
        setup_revisions(argc, argv, rev, &opt);
----
Using a `setup_revision_opt` gives you finer control over your walk's starting
point.
====

Then let's invoke `final_rev_info_setup()` after the call to
`repo_init_revisions()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
        ...

        final_rev_info_setup(&rev);

        ...
}
----

Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
now, this is all we need.

==== Preparing `rev_info` For the Walk

Now that `rev` is all initialized and configured, we've got one more setup step
before we get rolling. We can do this in a helper, which will both prepare the
`rev_info` for the walk and perform the walk itself. Let's start the helper
with the call to `prepare_revision_walk()`, which can return an error without
dying on its own:

----
static void walken_commit_walk(struct rev_info *rev)
{
        if (prepare_revision_walk(rev))
                die(_("revision walk setup failed"));
}
----

NOTE: `die()` prints to `stderr` and exits the program. Since its message is
likely to be seen by a human, we will localize it.

==== Performing the Walk!

Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
can also be used as an iterator; we move to the next item in the walk by calling
`get_revision()` repeatedly. Add the listed variable declarations at the top and
the walk loop below the `prepare_revision_walk()` call within your
`walken_commit_walk()`:

----
#include "pretty.h"

...

static void walken_commit_walk(struct rev_info *rev)
{
        struct commit *commit;
        struct strbuf prettybuf = STRBUF_INIT;

        ...

        while ((commit = get_revision(rev))) {
                strbuf_reset(&prettybuf);
                pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
                puts(prettybuf.buf);
        }
        strbuf_release(&prettybuf);
}
----

NOTE: `puts()` prints a `char*` to `stdout`, followed by a newline. Since this
is the part of the command we expect to be machine-parsed, we're sending it
directly to stdout.

Give it a shot.

----
$ make
$ ./bin-wrappers/git walken
----

You should see the subject lines of all the commits in your tree's history, in
order, ending with the initial commit, "Initial revision of "git", the
information manager from hell". Congratulations! You've written your first
revision walk. You can play with printing some additional fields from each
commit if you're curious; have a look at the functions available in
`commit.h`.
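The iterator behaviour of `get_revision()` can be modelled in a few lines. The sketch below is a hypothetical, heavily simplified stand-in, not `revision.c`. Each call returns the queued commit with the newest commit date and then queues that commit's parents. An `added` mark, playing the role of Git's `SEEN` flag, ensures that a commit reachable along two paths is returned only once.

----
#include <stddef.h>

#define TOY_MAX_PARENTS 2
#define TOY_QUEUE_MAX 64        /* each commit is queued at most once, so */
                                /* histories up to 64 commits always fit */

/* Hypothetical commit type; not Git's struct commit. */
struct toy_commit {
        const char *subject;
        long date;                               /* commit date */
        struct toy_commit *parents[TOY_MAX_PARENTS];
        size_t nr_parents;
        int added;                               /* plays the role of SEEN */
};

struct toy_rev_walk {
        struct toy_commit *queue[TOY_QUEUE_MAX];
        size_t nr;
};

/* Queue a commit unless it was queued before; -1 if the queue is full. */
static int toy_add_pending(struct toy_rev_walk *w, struct toy_commit *c)
{
        if (c->added)
                return 0;
        if (w->nr == TOY_QUEUE_MAX)
                return -1;
        c->added = 1;
        w->queue[w->nr++] = c;
        return 0;
}

/* Pop the newest queued commit and queue its parents; NULL when done. */
static struct toy_commit *toy_get_revision(struct toy_rev_walk *w)
{
        struct toy_commit *c;
        size_t i, best = 0;

        if (!w->nr)
                return NULL;
        for (i = 1; i < w->nr; i++)
                if (w->queue[i]->date > w->queue[best]->date)
                        best = i;
        c = w->queue[best];
        w->queue[best] = w->queue[--w->nr];
        for (i = 0; i < c->nr_parents; i++)
                toy_add_pending(w, c->parents[i]);
        return c;
}
----

For a merge `M` (date 5) of `C` (date 4, parent `B`) and `D` (date 3), where `B` (date 2) and `D` both have parent `A` (date 1), the loop `while ((c = toy_get_revision(&w)))` yields M, C, D, B, A. Because the order is purely by date, `D` lands between `C` and its parent `B`.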

=== Adding a Filter

Next, let's try to filter the commits we see based on their author. This is
equivalent to running `git log --author=<pattern>`. We can add a filter by
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.

First some setup. Add `grep_config()` to `git_walken_config()`:

----
static int git_walken_config(const char *var, const char *value, void *cb)
{
        grep_config(var, value, cb);
        return git_default_config(var, value, cb);
}
----

Next, we can modify the `grep_filter`. This is done with convenience functions
found in `grep.h`. For fun, we're filtering to only commits from folks using a
`gmail.com` email address, a not-very-precise guess at who may be working on
Git as a hobby. Since we're checking the author, which is a specific line in the
header, we'll use the `append_header_grep_pattern()` helper. We can use
the `enum grep_header_field` to indicate which part of the commit header we want
to search.

In `final_rev_info_setup()`, add your filter line. (From here on, the snippets
show `final_rev_info_setup()` taking `argc`, `argv`, and `prefix` in addition
to `rev`, which you will need if you switch to `setup_revisions()` as described
in the note above; if you change the signature, update the call in
`cmd_walken()` to match.)

----
static void final_rev_info_setup(int argc, const char **argv,
                const char *prefix, struct rev_info *rev)
{
        ...

        append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
                "gmail");
        compile_grep_patterns(&rev->grep_filter);

        ...
}
----

`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
it won't work unless we compile it with `compile_grep_patterns()`.

NOTE: If you are using `setup_revisions()` (for example, if you are passing a
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.

NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
`enum grep_pat_token` for us.
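Conceptually, the filter keeps only commits whose author header matches the pattern. The standalone sketch below models that with a plain substring test over an in-memory list. `struct toy_commit_info` and `toy_filter_by_author()` are hypothetical, and Git's grep machinery treats the pattern as a regular expression rather than as a fixed string.

----
#include <stddef.h>
#include <string.h>

/* Hypothetical commit summary: the author header and the subject line. */
struct toy_commit_info {
        const char *author;     /* e.g. "A U Thor <author@example.com>" */
        const char *subject;
};

/*
 * Store pointers to the commits whose author contains `pattern` in out[]
 * (which must have room for nr entries) and return how many were kept.
 */
size_t toy_filter_by_author(const struct toy_commit_info *in, size_t nr,
                            const char *pattern,
                            const struct toy_commit_info **out)
{
        size_t i, kept = 0;

        for (i = 0; i < nr; i++)
                if (strstr(in[i].author, pattern))
                        out[kept++] = &in[i];
        return kept;
}
----

Given three commits, two of them by authors with `gmail.com` addresses, filtering on "gmail" keeps those two in their original order.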

=== Changing the Order

There are a few ways that we can change the order of the commits during a
revision walk. First, we can use the `enum rev_sort_order` to choose from some
typical orderings.

`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
before all of its children have been shown, and we avoid mixing commits which
are in different lines of history. (`git help log`'s section on `--topo-order`
has a very nice diagram to illustrate this.)
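The "no parent before all of its children" rule is a topological sort. Below is a standalone sketch, not Git's implementation, and `struct toy_node` and `toy_topo_sort()` are hypothetical. It counts each commit's children, starts from commits that have none, and emits a commit only once all of its children have been emitted. Using a stack rather than a queue tends to finish one line of history before starting another, which loosely models the "avoid mixing lines of history" part.

----
#include <stddef.h>

#define TOY_MAX_COMMITS 32

/* Hypothetical commit node: parents are indices into the array, -1 = none. */
struct toy_node {
        int parents[2];
};

/*
 * Write a topological order of commits c[0..n-1] into out[], children
 * before parents, and return how many were written (0 if n is too big).
 */
size_t toy_topo_sort(const struct toy_node *c, size_t n, int *out)
{
        int children[TOY_MAX_COMMITS] = { 0 };
        int stack[TOY_MAX_COMMITS];
        size_t sp = 0, nr_out = 0, i;
        int k, p;

        if (n > TOY_MAX_COMMITS)
                return 0;
        for (i = 0; i < n; i++)
                for (k = 0; k < 2; k++)
                        if ((p = c[i].parents[k]) >= 0)
                                children[p]++;
        /* Push the tips (childless commits) so the lowest index pops first. */
        for (i = n; i-- > 0; )
                if (!children[i])
                        stack[sp++] = (int)i;
        while (sp) {
                int cur = stack[--sp];

                out[nr_out++] = cur;
                /* Push the second parent first so the first parent pops next. */
                for (k = 1; k >= 0; k--)
                        if ((p = c[cur].parents[k]) >= 0 && --children[p] == 0)
                                stack[sp++] = p;
        }
        return nr_out;
}
----

Consider a merge `M` whose first-parent line is `C`, then `B`, and whose second parent is `D`, with `B` and `D` both descending from the root `A`. This yields M, C, B, D, A: `A` waits until both of its children have been emitted, and the `C`, `B` line is finished before `D` starts.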

Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
`REV_SORT_BY_AUTHOR_DATE`. Add the following:

----
static void final_rev_info_setup(int argc, const char **argv,
                const char *prefix, struct rev_info *rev)
{
        ...

        rev->topo_order = 1;
        rev->sort_order = REV_SORT_BY_COMMIT_DATE;

        ...
}
----

Let's output this into a file so we can easily diff it with the walk sorted by
author date.

----
$ make
$ ./bin-wrappers/git walken > commit-date.txt
----

Then, let's sort by author date and run it again.

----
static void final_rev_info_setup(int argc, const char **argv,
                const char *prefix, struct rev_info *rev)
{
        ...

        rev->topo_order = 1;
        rev->sort_order = REV_SORT_BY_AUTHOR_DATE;

        ...
}
----

----
$ make
$ ./bin-wrappers/git walken > author-date.txt
----

Finally, compare the two. This is a little less helpful without object names or
dates, but hopefully we get the idea.

----
$ diff -u commit-date.txt author-date.txt
----

This display indicates that a commit's author date and commit date can diverge:
commits can be rewritten after they are first written, for example with
`git rebase`, which gives them a new commit date but keeps the original author
date.

Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
Set that flag somewhere inside of `final_rev_info_setup()`:

----
static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
                struct rev_info *rev)
{
        ...

        rev->reverse = 1;

        ...
}
----

Run your walk again and note the difference in order. (If you remove the grep
pattern, you should see the last commit this call gives you as your current
HEAD.)

== Basic Object Walk

So far we've been walking only commits. But Git has more types of objects than
that! Let's see if we can walk _all_ objects, and find out some information
about each one.

We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list()`.

- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
  its `filter` member is not `NULL`, then `filter` contains information for
  how to filter the object list.
- `show_commit_fn show_commit`: A callback which will be used to handle each
  individual commit object.
- `show_object_fn show_object`: A callback which will be used to handle each
  non-commit object (so each blob, tree, or tag).
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
  and `show_object`.

`traverse_commit_list_filtered()` takes one additional parameter:

- `struct oidset *omitted`: A set of object IDs which the provided
  filter caused to be omitted.
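The callback style can be modelled without any Git code. In the standalone sketch below, which is hypothetical and much simpler than `list-objects.c`, `toy_traverse()` reports each commit to one callback and each tree or blob to another. It walks a commit's tree right after the commit (roughly what `tree_blobs_in_commit_order` asks for) and passes the same `show_data` pointer to both callbacks. For brevity, history is limited to a single chain of parents.

----
#include <stddef.h>

enum toy_type { TOY_TREE, TOY_BLOB };

/* Hypothetical tree/blob object; not Git's struct object. */
struct toy_obj {
        enum toy_type type;
        const char *name;               /* what the callback gets as `str` */
        struct toy_obj *entries[4];     /* tree entries */
        size_t nr_entries;
        int seen;
};

struct toy_commit {
        struct toy_obj *tree;
        struct toy_commit *parent;      /* single parent, for brevity */
};

typedef void (*toy_show_commit_fn)(struct toy_commit *, void *);
typedef void (*toy_show_object_fn)(struct toy_obj *, const char *, void *);

static void toy_walk_tree(struct toy_obj *obj, toy_show_object_fn show_object,
                          void *show_data)
{
        size_t i;

        if (obj->seen)
                return;
        obj->seen = 1;
        show_object(obj, obj->name, show_data);
        for (i = 0; i < obj->nr_entries; i++)
                toy_walk_tree(obj->entries[i], show_object, show_data);
}

/* Report each commit, then the not-yet-seen objects of its tree. */
void toy_traverse(struct toy_commit *tip, toy_show_commit_fn show_commit,
                  toy_show_object_fn show_object, void *show_data)
{
        struct toy_commit *c;

        for (c = tip; c; c = c->parent) {
                show_commit(c, show_data);
                toy_walk_tree(c->tree, show_object, show_data);
        }
}

/* Example callbacks: keep the counts in show_data instead of globals. */
struct toy_counts {
        int commits, trees, blobs;
};

void toy_count_commit(struct toy_commit *c, void *show_data)
{
        struct toy_counts *counts = show_data;

        (void)c;
        counts->commits++;
}

void toy_count_object(struct toy_obj *obj, const char *str, void *show_data)
{
        struct toy_counts *counts = show_data;

        (void)str;
        if (obj->type == TOY_TREE)
                counts->trees++;
        else
                counts->blobs++;
}
----

Passing a `struct toy_counts` through `show_data` is one alternative to file-scope counters: both work, but the context pointer avoids global state and lets two walks keep separate tallies.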

It looks like these functions invoke callbacks we provide, instead of needing
us to call an iterator like `get_revision()` repeatedly ourselves. Cool! Let's
add the callbacks first.

For the sake of this tutorial, we'll simply keep track of how many of each kind
of object we find. At file scope in `builtin/walken.c` add the following
tracking variables:

----
static int commit_count;
static int tag_count;
static int blob_count;
static int tree_count;
----

Commits are handled by a different callback than other objects; let's do that
one first:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
        commit_count++;
}
----

The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
the `buf` argument is actually the context buffer that we can provide to the
traversal calls: `show_data`, which we mentioned a moment ago.

Since we have the `struct commit` object, we can look at all the same parts that
we looked at in our earlier commit-only walk. For the sake of this tutorial,
though, we'll just increment the commit counter and move on.

The callback for non-commits is a little different, as we'll need to check
which kind of object we're dealing with:

----
static void walken_show_object(struct object *obj, const char *str, void *buf)
{
        switch (obj->type) {
        case OBJ_TREE:
                tree_count++;
                break;
        case OBJ_BLOB:
                blob_count++;
                break;
        case OBJ_TAG:
                tag_count++;
                break;
        case OBJ_COMMIT:
                BUG("unexpected commit object in walken_show_object");
        default:
                BUG("unexpected object type %s in walken_show_object",
                        type_name(obj->type));
        }
}
----

Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
context pointer that `walken_show_commit()` receives: the `show_data` argument
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
`str` contains the name of the object, which ends up being something like
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).

To help assure us that we aren't double-counting commits, we'll include some
complaining if a commit object is routed through our non-commit callback; we'll
also complain if we see an invalid object type. Since those two cases should be
unreachable, and would only change in the event of a semantic change to the Git
codebase, we complain by using `BUG()`, which is a signal to a developer that
the change they made caused unintended consequences, and the rest of the
codebase needs to be updated to understand that change. `BUG()` is not intended
to be seen by the public, so it is not localized.

Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can perform setup which is applicable to all objects here, too, to keep it
separate from setup which is applicable to commit-only walks.

We'll start by enabling all types of objects in the `struct rev_info`. We'll
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
commit's tree and everything it points to immediately after we find each commit,
as opposed to waiting for the end and walking through all trees after the commit
history has been discovered. With the appropriate settings configured, we are
ready to call `prepare_revision_walk()`.

----
static void walken_object_walk(struct rev_info *rev)
{
        rev->tree_objects = 1;
        rev->blob_objects = 1;
        rev->tag_objects = 1;
        rev->tree_blobs_in_commit_order = 1;

        if (prepare_revision_walk(rev))
                die(_("revision walk setup failed"));

        commit_count = 0;
        tag_count = 0;
        blob_count = 0;
        tree_count = 0;
----

Let's start by calling just the unfiltered walk and reporting our counts.
Complete your implementation of `walken_object_walk()`.
We'll also need to include the `list-objects.h` header.

----
#include "list-objects.h"

...

        traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);

        printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
                blob_count, tag_count, tree_count);
}
----

NOTE: This output is intended to be machine-parsed. Therefore, we are not
sending it to `trace_printf()`, and we are not localizing it; we need scripts
to be able to count on the formatting to be exactly the way it is shown here.
If we were intending this output to be read by humans, we would need to localize
it with `_()`.
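Because the format is fixed, a consumer can parse it mechanically. Here is a hypothetical standalone C parser for the `name count` lines printed above; `struct toy_walk_counts` and `toy_parse_counts()` are illustrative names, not part of Git.

----
#include <stdio.h>
#include <string.h>

struct toy_walk_counts {
        int commits, blobs, tags, trees;
};

/*
 * Parse "name count" lines such as "commits 3\nblobs 5\n" into `out`.
 * Returns how many recognized names were found; unknown names are skipped.
 */
int toy_parse_counts(const char *text, struct toy_walk_counts *out)
{
        char name[16];
        int value, used, found = 0;

        while (sscanf(text, "%15s %d%n", name, &value, &used) == 2) {
                int *slot = NULL;

                if (!strcmp(name, "commits"))
                        slot = &out->commits;
                else if (!strcmp(name, "blobs"))
                        slot = &out->blobs;
                else if (!strcmp(name, "tags"))
                        slot = &out->tags;
                else if (!strcmp(name, "trees"))
                        slot = &out->trees;
                if (slot) {
                        *slot = value;
                        found++;
                }
                text += used;   /* continue after the number just read */
        }
        return found;
}
----

A parser like this breaks as soon as the labels are translated or reworded, which is why this output is neither localized nor sent through `trace_printf()`.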

Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
command line options is out of scope for this tutorial, so we'll just hardcode
a branch we can change at compile time. Where you call `final_rev_info_setup()`
and `walken_commit_walk()`, instead branch like so:

----
        if (1) {
                add_head_to_pending(&rev);
                walken_object_walk(&rev);
        } else {
                final_rev_info_setup(argc, argv, prefix, &rev);
                walken_commit_walk(&rev);
        }
----

NOTE: For simplicity, we've avoided all the filters and sorts we applied in
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
want, you can certainly use the filters we added before by moving
`final_rev_info_setup()` out of the conditional and removing the call to
`add_head_to_pending()`.

Now we can try to run our command! It should take noticeably longer than the
commit walk, but an examination of the output will give you an idea why. Your
output should look similar to this example, but with different counts:

----
commits 55733
blobs 100274
tags 0
trees 104210
----

This makes sense. We have more trees than commits because the Git project has
lots of subdirectories which can change, plus at least one tree per commit. We
have no tags because we started on a commit (`HEAD`) and while tags can point to
commits, commits can't point to tags.

NOTE: You will have different counts when you run this yourself! The number of
objects grows along with the Git project.

=== Adding a Filter

There are a handful of filters, laid out in `Documentation/rev-list-options.txt`,
that we can apply to the object walk. These filters are typically useful for
operations such as creating packfiles or performing a partial clone. They are
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
which are not directly referenced by commits reachable from the commit in
`pending` when the walk begins. (`pending` is the list of objects which need to
be traversed during a walk; you can imagine a breadth-first tree traversal to
help understand. In our case, that means we omit trees and blobs not directly
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
`HEAD` in the `pending` list.)

For now, we are not going to track the omitted objects, so we'll keep calling
`traverse_commit_list()`, which does not take an `omitted` parameter. For the
sake of simplicity, we'll add a simple build-time branch to use our filter or
not. Preface the line calling `traverse_commit_list()` with the following, which
will remind us which kind of walk we've just performed:

----
        if (0) {
                /* Unfiltered: */
                trace_printf(_("Unfiltered object walk.\n"));
        } else {
                trace_printf(
                        _("Filtered object walk with filterspec 'tree:1'.\n"));
                CALLOC_ARRAY(rev->filter, 1);
                parse_list_objects_filter(rev->filter, "tree:1");
        }
        traverse_commit_list(rev, walken_show_commit,
                             walken_show_object, NULL);
----

The `rev->filter` member is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (For an example of why that's true: a commit made with
`git revert` to undo its parent points to the same tree object as its
grandparent, so those two commits share a single root tree.)

=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --objects --filter=<spec> --filter-print-omitted`.
Asking `traverse_commit_list_filtered()` to populate the `omitted` list means
that our object walk does not perform any better than an unfiltered object walk;
all reachable objects are walked in order to populate the list.
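The cost difference can be seen in a small standalone model of a depth filter in the spirit of `tree:<depth>`. This is a hypothetical sketch, not `list-objects-filter.c`. It walks a single root tree, treats the root as depth 0, and omits anything at depth `max_depth` or deeper. Without omitted tracking it can stop descending at the cutoff. With tracking it must keep going to record every omitted object, and the shown and omitted counts then add up to the unfiltered total.

----
#include <stddef.h>

/* Hypothetical tree entry: a tree (with children) or a blob (without). */
struct toy_entry {
        struct toy_entry *kids[4];
        size_t nr_kids;
};

struct toy_filter_counts {
        size_t visited;   /* objects the walk actually touched */
        size_t shown;     /* objects that passed the filter */
        size_t omitted;   /* objects recorded as filtered out */
};

/* Walk `e` at `depth`, omitting objects at depth >= max_depth. */
void toy_filtered_walk(const struct toy_entry *e, unsigned depth,
                       unsigned max_depth, int track_omitted,
                       struct toy_filter_counts *out)
{
        size_t i;

        if (depth >= max_depth && !track_omitted)
                return;         /* nothing below the cutoff can be shown */
        out->visited++;
        if (depth >= max_depth)
                out->omitted++;
        else
                out->shown++;
        for (i = 0; i < e->nr_kids; i++)
                toy_filtered_walk(e->kids[i], depth + 1, max_depth,
                                  track_omitted, out);
}
----

With a root holding one blob and a subtree of two blobs, `max_depth` 1 shows only the root in both modes, but the tracking walk touches all 5 objects to record the 4 it omits, while the non-tracking walk touches just 1.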

First, add the `struct oidset` and related items we will use to iterate it:

----
#include "oidset.h"

...

static void walken_object_walk(struct rev_info *rev)
{
        struct oidset omitted;
        struct oidset_iter oit;
        struct object_id *oid = NULL;
        int omitted_count = 0;
        oidset_init(&omitted, 0);

        ...
----

Replace the call to `traverse_commit_list()` with a call to
`traverse_commit_list_filtered()` that passes your `omitted` set:

----
        ...

        traverse_commit_list_filtered(rev, walken_show_commit,
                                      walken_show_object, NULL, &omitted);

        ...
----

Then, after your walk, iterating over the `oidset` is straightforward. Count
all the objects within it, modify the print statement, and release the set when
you're done:

----
        /* Count the omitted objects. */
        oidset_iter_init(&omitted, &oit);

        while ((oid = oidset_iter_next(&oit)))
                omitted_count++;

        printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
                commit_count, blob_count, tag_count, tree_count, omitted_count);

        /* Free the memory held by the set. */
        oidset_clear(&omitted);
----

By running your walk with and without the filter, you should find that the total
object count, including omitted objects, is identical in each case. You can also
time each invocation of the `walken` subcommand, with and without `omitted` being
passed in, to confirm to yourself the runtime impact of tracking all omitted
objects.

=== Changing the Order

Finally, let's demonstrate that you can also reorder walks of all objects, not
just walks of commits. First, we'll make our handlers chattier: modify
`walken_show_commit()` and `walken_show_object()` to print the object as they
go:

----
#include "hex.h"

...

static void walken_show_commit(struct commit *cmt, void *buf)
{
        trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
        commit_count++;
}

static void walken_show_object(struct object *obj, const char *str, void *buf)
{
        trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));

        ...
}
----

NOTE: Since we will be examining this output directly as humans, we'll use
`trace_printf()` here. Additionally, since this change introduces a significant
number of printed lines, using `trace_printf()` will allow us to easily silence
those lines without having to recompile.

(Leave the counter increment logic in place.)

With only that change, run again (but save yourself some scrollback). Trace
output goes to `stderr`, so redirect it into the pipe:

----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----

Take a look at the top commit with `git show` and the object ID you printed; it
should be the same as the output of `git show HEAD`.

Next, let's change a setting on our `struct rev_info` within
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
`reverse` setting at the bottom:

----
        ...

        rev->tree_objects = 1;
        rev->blob_objects = 1;
        rev->tag_objects = 1;
        rev->tree_blobs_in_commit_order = 1;
        rev->reverse = 1;

        ...
----

Now, run again, but this time, let's grab the last handful of objects instead
of the first handful:

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----

The last commit object given should have the same OID as the one we saw at the
top before, and running `git show <oid>` with that OID should give you again
the same results as `git show HEAD`. Furthermore, if you run and examine the
first ten lines again (with `head` instead of `tail` like we did before applying
the `reverse` setting), you should see that now the first commit printed is the
initial commit, `e83c5163`.

== Wrapping Up

Let's review. In this tutorial, we:

- Built a commit walk from the ground up
- Enabled a grep filter for that commit walk
- Changed the sort order of that filtered commit walk
- Built an object walk (tags, commits, trees, and blobs) from the ground up
- Learned how to add a filter-spec to an object walk
- Changed the display order of the filtered object walk