doc/find-maint.texi

   1 \input texinfo @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename find-maint.info
   4 @settitle Maintaining Findutils
   5 @c For double-sided printing, uncomment:
   6 @c @setchapternewpage odd
   7 @c %**end of header
   8
   9 @include versionmaint.texi
  10
  11 @iftex
  12 @finalout
  13 @end iftex
  14
  15 @dircategory GNU organization
  16 @direntry
  17 * Maintaining Findutils: (find-maint).        Maintaining GNU findutils
  18 @end direntry
  19
  20 @copying
  21 This manual explains how GNU findutils is maintained, how changes should
  22 be made and tested, and what resources exist to help developers.
  23
  24 This is edition @value{EDITION}, for findutils version @value{VERSION}.
  25
  26 Copyright @copyright{} 2007 Free Software Foundation, Inc.
  27
  28 Permission is granted to copy, distribute and/or modify this document
  29 under the terms of the GNU Free Documentation License, Version 1.2
  30 or any later version published by the Free Software Foundation;
  31 with no Invariant Sections, with no
  32 Front-Cover Texts, and with no Back-Cover Texts.
  33 A copy of the license is included in the section entitled ``GNU
  34 Free Documentation License''.
  35 @end copying
  36
  37 @titlepage
  38 @title Maintaining Findutils
  39 @subtitle Edition @value{EDITION}, for GNU findutils version @value{VERSION}
  40 @subtitle @value{UPDATED}
  41 @author by James Youngman
  42
  43 @page
  44 @vskip 0pt plus 1filll
  45 @insertcopying{}
  46 @end titlepage
  47
  48 @contents
  49
  50 @ifnottex
  51 @node Top, Introduction, (dir), (dir)
  52 @top Maintaining GNU Findutils
  53
  54 @insertcopying
  55 @end ifnottex
  56
  57 @menu
  58 * Introduction::
  59 * Maintaining GNU Programs::
  60 * Design Issues::
  61 * Coding Conventions::
  62 * Tools::
  63 * Using the GNU Portability Library::
  64 * Documentation::
  65 * Testing::
  66 * Bugs::
  67 * Distributions::
  68 * Internationalisation::
  69 * Security::
  70 * Making Releases::
  71 * GNU Free Documentation License::
  72 @end menu
  73
  74
  75
  76
  77
  78 @node Introduction
  79 @chapter Introduction
  80
  81 This document explains how to contribute to and maintain GNU
  82 Findutils.  It concentrates on developer-specific issues.  For
  83 information about how to use the software please refer to
  84 @ref{Introduction, ,Introduction,find,The Findutils manual}.
  85
  86 This manual aims to be useful without necessarily being verbose.  It's
  87 also a recent document, so there will be many areas in which
  88 improvements can be made.  If you find that the document misses out
  89 important information or that any part of it is so terse as to be
  90 unhelpful, please ask for help on the @email{bug-findutils@@gnu.org}
  91 mailing list.  We'll try to improve this document too.
  92
  93
  94 @node Maintaining GNU Programs
  95 @chapter Maintaining GNU Programs
  96
  97 GNU Findutils is part of the GNU Project and so there are a number of
  98 documents which set out standards for the maintenance of GNU
  99 software.
 100
 101 @table @file
 102 @item standards.texi
 103 GNU Project Coding Standards.  All changes to findutils should comply
 104 with these standards.  In some areas we go somewhat beyond the
 105 requirements of the standards, but these cases are explained in this
 106 manual.
 107 @item maintain.texi
 108 Information for Maintainers of GNU Software.  This document provides
 109 guidance for GNU maintainers.  Everybody with commit access should
 110 read this document.   Everybody else is welcome to do so too, of
 111 course.
 112 @end table
 113
 114
 115
 116 @node Design Issues
 117 @chapter Design Issues
 118
 119 The findutils package is installed on a great many systems, usually as a
 120 fundamental component.  The programs in the package are often used in
 121 order to successfully boot or fix the system.
 122
 123 This means that for findutils we bear in mind considerations that
 124 may not apply so much to other packages.  For example, the fact
 125 that findutils is often a base component motivates us to:
 126 @itemize
 127 @item Limit dependencies on libraries
 128 @item Avoid dependencies on other large packages (for example, interpreters)
 129 @item Be conservative when making changes to the ``stable'' release branch
 130 @end itemize
 131
 132 All those considerations come before functionality.  Functional
 133 enhancements are still made to findutils, but these are almost
 134 exclusively introduced in the ``development'' release branch, to allow
 135 extensive testing and proving.
 136
 137 Sometimes it is useful to have a priority list to provide guidance
 138 when making design trade-offs.   For findutils, that priority list is:
 139
 140 @enumerate
 141 @item Correctness
 142 @item Standards compliance
 143 @item Security
 144 @item Backward compatibility
 145 @item Performance
 146 @item Functionality
 147 @end enumerate
 148
 149 For example, we support the @code{-exec} action because POSIX
 150 compliance requires this, even though there are security problems with
 151 it and we would otherwise prefer people to use @code{-execdir}.  There
 152 are also cases where some performance is sacrificed in the name of
 153 security.  For example, the sanity checks that @code{find} performs
 154 while traversing a directory tree may slow it down.   We adopt
 155 functional changes, and functional changes are allowed to make
 156 @code{find} slower, but only if there is no detectable impact on users
 157 who don't use the feature.
 158
 159 Backward-incompatible changes do get made in order to comply with
 160 standards (for example the behaviour of @code{-perm -...} changed in
 161 order to comply with POSIX).  However, they don't get made in order to
 162 provide better ease of use; for example the semantics of @code{-size
 163 -2G} are almost always unexpected by users, but we retain the current
 164 behaviour because of backward compatibility and for its similarity to
 165 the block-rounding behaviour of @code{-size -30}.  We might introduce
 166 a change which does not have the unfortunate rounding behaviour, but
 167 we would choose another syntax (for example @code{-size '<2G'}) for
 168 this.
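
As a worked illustration of the rounding mentioned above, here is a
minimal sketch.  It assumes, as the Findutils manual documents, that
@code{-size} rounds each file's size up to a whole number of units
before comparing.  The function names are hypothetical and are not
taken from the findutils sources.

```c
#include <stdint.h>

/* Hypothetical helper: round BYTES up to a whole number of UNIT-sized
   units, which is the rounding -size is assumed to apply.  UNIT must
   be nonzero.  */
uint64_t
size_in_units (uint64_t bytes, uint64_t unit)
{
  return bytes / unit + (bytes % unit != 0);
}

/* Nonzero if a file of BYTES bytes would match "-size -N" when the
   unit is UNIT bytes (for example 512 for blocks, 1073741824 for G).  */
int
matches_size_less_than (uint64_t bytes, uint64_t n, uint64_t unit)
{
  return size_in_units (bytes, unit) < n;
}
```

Under these assumptions a file of exactly 1GiB matches @code{-size -2G},
but a file one byte larger is rounded up to two units and does not
match, even though it is much smaller than 2GiB.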
 169
 170 In a general sense, we try to do test-driven development of the
 171 findutils code; that is, we try to implement test cases for new
 172 features and bug fixes before modifying the code to make the test
 173 pass.  Some features of the code are tested well, but the test
 174 coverage for other features is less good.  If you are about to modify
 175 the code for a predicate and aren't sure about the test coverage, use
 176 @code{grep} on the test directories and measure the coverage with
 177 @code{gcov} or another test coverage tool.
 178
 179 Lastly, we try not to depend on having a ``working system''.  The
 180 findutils suite is used for diagnosis of problems, and this applies
 181 especially to @code{find}.  We should ensure that @code{find} still
 182 works on relatively broken systems, for example systems with damaged
 183 @file{/etc/passwd} files.  Another interesting example is the case
 184 where a system is a client of one or more unresponsive NFS servers.
 185 On such a system, if you try to stat all mount points, your program
 186 will hang indefinitely, waiting for the remote NFS server to respond.
 187
 188
 189
 190 @c Installed on many systems
 191 @c Often part of base
 192 @c Needs to work on broken systems (e.g. unresponsive NFS servers,
 193 @c mode-0 files)
 194
 195 @node Coding Conventions
 196 @chapter Coding Conventions
 197
 198 Coding style documents which set out to establish a uniform look and
 199 feel to source code have worthy goals, for example greater ease of
 200 maintenance and readability.  However, I do not believe that in
 201 general coding style guide authors can envisage every situation, and
 202 it is always possible that it might on occasion be necessary to break
 203 the letter of the style guide in order to honour its spirit, or to
 204 better achieve the style guide's goals.
 205
 206 I've certainly seen many style guides outside the free software world
 207 which make bald statements such as ``functions shall have exactly one
 208 return statement''.  The desire to ensure consistency and obviousness
 209 of control flow is laudable, but it is all too common for such bald
 210 requirements to be followed unthinkingly.  Certainly I've seen such
 211 coding standards result in unmaintainable code with terrible
 212 infelicities such as functions containing @code{if} statements nested
 213 nine levels deep.  I suppose such coding standards don't survive in
 214 free software projects because they tend to drive away potential
 215 contributors or tend to generate heated discussions on mailing lists.
 216 Equally, a nine-level-deep function in a free software program would
 217 quickly get refactored, assuming it is obvious what the function is
 218 supposed to do...
 219
 220 Be that as it may, the approach I will take for this document is to
 221 explain some idioms and practices in use in the findutils source code,
 222 and leave it up to the reader's engineering judgement to decide which
 223 considerations apply to the code they are working on, and whether or
 224 not there is sufficient reason to ignore the guidance in current
 225 circumstances.
 226
 227
 228 @menu
 229 * Make the Compiler Find the Bugs::
 230 * The File System Is Being Modified::
 231 * Don't Trust the File System Contents::
 232 * Debugging is For Users Too::
 233 * Factor Out Repeated Code::
 234 @end menu
 235
 236 @node    Make the Compiler Find the Bugs
 237 @section Make the Compiler Find the Bugs
 238
 239 Finding bugs is tedious.  If you have a filesystem containing two
 240 million files, and a @code{find} command line should print one million of
 241 them, but in fact it misses out 1%, you can tell the program is
 242 printing the wrong result only if you know the right answer for that
 243 filesystem at that time.  If you don't know this, you may just not
 244 find out about that bug.  For this reason it is important to have a
 245 comprehensive test suite.
 246
 247 The test suite is of course not the only way to find the bugs.  The
 248 findutils source code makes liberal use of the @code{assert} macro.  While
 249 these assertions might be a performance drain, the performance
 250 impact of most of these is negligible compared to the time taken to
 251 fetch even one sector from a disk drive.
 252
 253 Assertions should not be used to check the results of operations which
 254 may be affected by the program's external environment.  For example,
 255 never assert that a file could be opened successfully.  Errors
 256 relating to problems with the program's execution environment should
 257 be diagnosed with a user-oriented error message.  An assertion failure
 258 should always denote a bug in the program.
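
The distinction can be sketched as follows.  This is an illustration
only, not code from findutils; the function @code{count_lines} is
hypothetical.  A null argument can only come from a mistake in the
caller, so it is asserted; a failure to open the file depends on the
environment, so it is reported to the user.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in the file called NAME.
   Returns -1, after printing a diagnostic, if it cannot be opened.  */
long
count_lines (const char *name)
{
  FILE *fp;
  long lines = 0;
  int c;

  /* A null NAME can only be a bug in the caller.  */
  assert (name != NULL);

  fp = fopen (name, "r");
  if (fp == NULL)
    {
      /* This depends on the environment: diagnose it, never assert.  */
      fprintf (stderr, "cannot open %s: %s\n", name, strerror (errno));
      return -1;
    }
  while ((c = getc (fp)) != EOF)
    if (c == '\n')
      ++lines;
  fclose (fp);
  return lines;
}
```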
 259
 260 Several programs in the findutils suite perform self-checks.  See for
 261 example the function @code{pred_sanity_check} in @file{find/pred.c}.
 262 This is generally desirable.
 263
 264 There are also a number of small ways in which we can help the
 265 compiler to find the bugs for us.
 266
 267 @subsection Constants in Equality Testing
 268
 269 It's a common error to write @code{=} when @code{==} is meant.
 270 Sometimes this happens in new code and is simply due to finger
 271 trouble.  Sometimes it is the result of the inadvertent deletion of a
 272 character.  In any case, there is a subset of cases where we can
 273 persuade the compiler to generate an error message when we make this
 274 mistake; this is where the equality test is with a constant.
 275
 276 This is an example of a vulnerable piece of code.
 277
 278 @example
 279 if (x == 2)
 280  ...
 281 @end example
 282
 283 A simple typo converts the above into
 284
 285 @example
 286 if (x = 2)
 287  ...
 288 @end example
 289
 290 We've introduced a bug; the condition is always true, and the value of
 291 @code{x} has been changed.  However, a simple change to our practice
 292 makes the compiler reject this typo, since @code{2 = x} is not valid C:
 293
 294 @example
 295 if (2 == x)
 296  ...
 297 @end example
 298
 299 Usually, the Emacs keystroke @kbd{M-t} can be used to swap the operands.
 300
 301
 302 @subsection Spelling of ASCII NUL
 303
 304 Strings in C are just sequences of characters terminated by a NUL.
 305 The ASCII NUL character has the numerical value zero.  It is normally
 306 represented in C code as @samp{\0}.  Here is a typical piece of C
 307 code:
 308
 309 @example
 310 *p = '\0';
 311 @end example
 312
 313 Consider what happens if there is an unfortunate typo:
 314
 315 @example
 316 *p = '0';
 317 @end example
 318
 319 We have changed the meaning of our program and the compiler cannot
 320 diagnose this as an error.  Our string is no longer terminated.  Bad
 321 things will probably happen.  It would be better if the compiler could
 322 help us diagnose this problem.
 323
 324 In C, the type of @code{'\0'} is in fact int, not char.  This provides
 325 us with a simple way to avoid this error.  The constant @code{0} has
 326 the same value and type as the constant @code{'\0'}.  However, it is
 327 not as vulnerable to typos.    For this reason I normally prefer to
 328 use this code:
 329
 330 @example
 331 *p = 0;
 332 @end example
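
For context, here is that idiom inside a small, hypothetical helper
(@code{chomp} is not a findutils function), which cuts a line off at
its first newline.

```c
#include <string.h>

/* Hypothetical helper: cut LINE off at its first newline, if any.  */
void
chomp (char *line)
{
  char *p = strchr (line, '\n');
  if (p != NULL)
    *p = 0;   /* the NUL terminator, spelled as a plain zero */
}
```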
 333
 334
 335 @node    Factor Out Repeated Code
 336 @section Factor Out Repeated Code
 337
 338 Repeated code imposes a greater maintenance burden and increases the
 339 exposure to bugs.  For example, if you discover that something you
 340 want to implement has some similarity with an existing piece of code,
 341 don't cut and paste it.  Instead, factor the code out.  The risk of
 342 cutting and pasting the code, particularly if you do this several
 343 times, is that you end up with several copies of the same code.
 344
 345 If the original code had a bug, you now have N places where this needs
 346 to be fixed.  It's all too easy to miss some out when trying to fix the
 347 bug.  Equally, it's quite possible that when pasting the code into
 348 some function, the pasted code was not quite adapted correctly to its
 349 new environment.  To pick a contrived example, perhaps it modifies a
 350 global variable which that code shouldn't be touching in its new
 351 home.  Worse, perhaps it makes some unstated assumption about the
 352 nature of the input arguments which is in fact not true for the
 353 context of the now duplicated code.
 354
 355 A good example of the use of refactoring in findutils is the
 356 @code{collect_arg} function in @file{find/parser.c}.  A less clear-cut
 357 but larger example is the factoring out of code which would otherwise
 358 have been duplicated between @file{find/find.c} and
 359 @file{find/ftsfind.c}.
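
The following is a minimal sketch of the general idea, not a copy of
@code{collect_arg}.  All of the names in it are hypothetical.  The
end-of-arguments check lives in one helper, and each option parser
calls it rather than repeating the check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: fetch the argument at *ARG_PTR and step past it.
   Returns false, leaving *ARG_PTR alone, if no argument is left.  */
bool
take_arg (char **argv, int *arg_ptr, const char **collected)
{
  if (argv == NULL || argv[*arg_ptr] == NULL)
    {
      *collected = NULL;
      return false;
    }
  *collected = argv[*arg_ptr];
  (*arg_ptr)++;
  return true;
}

/* Two hypothetical option parsers which share the helper instead of
   each carrying a pasted copy of the check above.  */
bool
parse_name (char **argv, int *arg_ptr, const char **pattern)
{
  return take_arg (argv, arg_ptr, pattern);
}

bool
parse_user (char **argv, int *arg_ptr, const char **user)
{
  return take_arg (argv, arg_ptr, user);
}
```

If the check ever needs fixing, there is then exactly one place to
fix it.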
 360
 361 The findutils test suite is comprehensive enough that refactoring code
 362 should not generally be a daunting prospect from a testing point of
 363 view.  Nevertheless, there are some areas which are only
 364 lightly tested:
 365
 366 @enumerate
 367 @item Tests on the ages of files
 368 @item Code which deals with the values returned by operating system calls (for example handling of ENOENT)
 369 @item Code dealing with OS limits (for example, limits on path length
 370 or exec arguments)
 371 @item Code relating to features not all systems have (for example
 372 Solaris Doors)
 373 @end enumerate
 374
 375 Please exercise caution when working in those areas.
 376
 377
 378 @node    Debugging is For Users Too
 379 @section Debugging is For Users Too
 380
 381 Debug and diagnostic code is often used to verify that a program is
 382 working in the way its author thinks it should be.  But users are
 383 often uncertain about what a program is doing, too.  Giving them a
 384 little more diagnostic information can help.  Much of the diagnostic
 385 code in @code{find}, for example, is controlled by the @samp{-D} flag,
 386 as opposed to C preprocessor directives.
 387
 388 Making diagnostic messages available to users also means that the
 389 phrasing of the diagnostic messages becomes important.
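
A run-time switch of this kind might be sketched as below.  This is an
assumption-laden illustration, not the @code{find} implementation: the
category names, the variable @code{debug_options} and both functions
are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical debug categories; the real -D option names may differ.  */
enum debug_flag { DEBUG_STAT = 1, DEBUG_TREE = 2 };

/* Set at run time from the command line, not by the preprocessor.  */
unsigned int debug_options = 0;

/* Enable the category called NAME.  Returns 0 if NAME is unknown.  */
int
enable_debug (const char *name)
{
  if (strcmp (name, "stat") == 0)
    debug_options |= DEBUG_STAT;
  else if (strcmp (name, "tree") == 0)
    debug_options |= DEBUG_TREE;
  else
    return 0;
  return 1;
}

/* Print MSG on stderr if FLAG is enabled.  Returns nonzero if printed.  */
int
debug_message (enum debug_flag flag, const char *msg)
{
  if ((debug_options & flag) == 0)
    return 0;
  fprintf (stderr, "debug: %s\n", msg);
  return 1;
}
```

Because the check happens at run time, a user can ask for the
messages without rebuilding the program.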
 390
 391
 392 @node    Don't Trust the File System Contents
 393 @section Don't Trust the File System Contents
 394
 395 People use @code{find} to search in directories created by other
 396 people.  Sometimes they do this to check for suspicious activity (for
 397 example to look for new setuid binaries).  This means that it would be
 398 bad if @code{find} were vulnerable to, say, a security problem
 399 exploitable by constructing a specially-crafted filename.  The same
 400 consideration would apply to @code{locate} and @code{updatedb}.
 401
 402 Henry Spencer said this well in his fifth commandment:
 403 @quotation
 404 Thou shalt check the array bounds of all strings (indeed, all arrays),
 405 for surely where thou typest @samp{foo} someone someday shall type
 406 @samp{supercalifragilisticexpialidocious}.
 407 @end quotation
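
In C this amounts to checking the length before copying.  The
following is a minimal sketch; @code{copy_name} is a hypothetical
helper, not a findutils function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST, which has room for DST_SIZE
   bytes, only if it fits together with its terminating NUL.
   Returns 0 on success, or -1 (leaving DST untouched) if SRC is too long.  */
int
copy_name (char *dst, size_t dst_size, const char *src)
{
  size_t len = strlen (src);

  if (len >= dst_size)
    return -1;
  memcpy (dst, src, len + 1);
  return 0;
}
```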
 408
 409 Symbolic links can often be a problem.  If @code{find} calls
 410 @code{lstat} on something and discovers that it is a directory, it's
 411 normal for @code{find} to recurse into it.  Even if the @code{chdir}
 412 system call is used immediately, there is still a window of
 413 opportunity between the @code{lstat} and the @code{chdir} in which a
 414 malicious person could rename the directory and substitute a symbolic
 415 link to some other directory.
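
One way to detect such a substitution is to compare device and inode
numbers after the @code{chdir}.  The sketch below illustrates that
idea under stated assumptions; it is not presented as the code
@code{find} uses, and @code{chdir_checked} is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: change into NAME only if, once there, "." is
   the same directory that lstat saw.  Returns 0 on success and -1
   otherwise.  After a mismatch the process is left in the unexpected
   directory; a real implementation would also have to go back.  */
int
chdir_checked (const char *name)
{
  struct stat before, after;

  if (lstat (name, &before) != 0 || !S_ISDIR (before.st_mode))
    return -1;
  if (chdir (name) != 0)
    return -1;
  if (stat (".", &after) != 0)
    return -1;
  /* A different device or inode means something was substituted
     between the lstat and the chdir.  */
  if (before.st_dev != after.st_dev || before.st_ino != after.st_ino)
    return -1;
  return 0;
}
```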
 416
 417 @node    The File System Is Being Modified
 418 @section The File System Is Being Modified
 419
 420 The filesystem gets modified while you are traversing it.  For
 421 example, it's normal for files to get deleted while @code{find} is
 422 traversing a directory.  Issuing an error message seems helpful when a
 423 file is deleted from the one directory you are interested in, but if
 424 @code{find} is searching 15000 directories, such a message becomes
 425 less helpful.
 426
 427 Bear in mind also that the directory @code{find} is
 428 currently searching could be moved to another point in the filesystem,
 429 and that the directory in which @code{find} was started could be
 430 deleted.
 431
 432 Henry Spencer's sixth commandment is also apposite here:
 433 @quotation
 434 If a function be advertised to return an error code in the event of
 435 difficulties, thou shalt check for that code, yea, even though the
 436 checks triple the size of thy code and produce aches in thy typing
 437 fingers, for if thou thinkest ``it cannot happen to me'', the gods
 438 shall surely punish thee for thy arrogance.
 439 @end quotation
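
Applied to files which vanish during a traversal, this might look
like the sketch below.  It assumes a policy of staying silent when a
file has simply disappeared; whether that is the right policy is a
judgement call, and @code{examine} is a hypothetical function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: examine NAME.  Returns 1 if it was examined,
   0 if it vanished before we got to it (assumed not to deserve a
   message), and -1, after a diagnostic, for any other error.  */
int
examine (const char *name, struct stat *st)
{
  if (lstat (name, st) == 0)
    return 1;
  if (errno == ENOENT)
    return 0;
  fprintf (stderr, "%s: %s\n", name, strerror (errno));
  return -1;
}
```

The caller still sees three distinct outcomes, so no error code is
silently discarded.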
 440
 441 There are a lot of files out there.  They come in all dates and
 442 sizes.  There is a condition out there in the real world to exercise
 443 every bit of the code base.  So we try to test that code base before
 444 someone falls over a bug.
 445
 446
 447 @node Tools
 448 @chapter Tools
 449 Most of the tools required to build findutils are mentioned in the
 450 file @file{README-CVS}.  We also use some other tools:
 451
 452 @table @asis
 453 @item System call traces
 454 Much of the execution time of @code{find} is spent waiting for filesystem
 455 operations.  A system call trace (for example, that provided by
 456 @code{strace}) shows what system calls are being made.   Using this
 457 information we can work to remove unnecessary file system operations.
 458
 459 @item Valgrind
 460 Valgrind is a tool which dynamically verifies the memory accesses a
 461 program makes to ensure that they are valid (for example, that the
 462 behaviour of the program does not in any way depend on the contents of
 463 uninitialised memory).
 464
 465 @item DejaGnu
 466 DejaGnu is the test framework used to run the findutils test suite
 467 (the @code{runtest} program is part of DejaGnu).  It would be ideal if
 468 everybody building @code{findutils} also ran the test suite, but many
 469 people don't have DejaGnu installed.  When changes are made to
 470 findutils, DejaGnu is invoked a lot. @xref{Testing}, for more
 471 information.
 472 @end table
 473
 474 @node Using the GNU Portability Library
 475 @chapter Using the GNU Portability Library
 476 The Gnulib library (@url{http://www.gnu.org/software/gnulib/}) makes a
 477 variety of systems look more like a GNU/Linux system and also applies
 478 a bunch of automatic bug fixes and workarounds.  Some of these
 479 apply to GNU/Linux systems too.  For example, the Gnulib regex
 480 implementation is used when we determine that we are building on a
 481 GNU libc system with a bug in the regex implementation.
 482
 483
 484 @section How and Why we Import the Gnulib Code
 485 Gnulib does not have a release process which results in a source
 486 tarball you can download.  Instead, the code is simply made available
 487 by CVS.
 488
 489 GNU projects vary in how they interact with Gnulib.  Many import a
 490 selection of code from Gnulib into the working directory and then
 491 check the updated files into the CVS repository for their project.
 492 The coreutils project does this, for example.
 493
 494 At the last maintainer changeover for findutils (2003) it turned out
 495 that there was a lot of material in findutils in common with Gnulib,
 496 but it had not been updated in a long time.  It was difficult to
 497 figure out which source files were intended to track external sources
 498 and which were intended to contain incompatible changes, or diverge
 499 for other reasons.
 500
 501 To reduce this uncertainty, I decided to treat Gnulib much like
 502 Automake.  Files supplied by Automake are simply absent from the
 503 findutils source tree.  When Automake is run with @code{automake
 504 --add-missing --copy}, it adds in all the files it thinks should be
 505 there which aren't there already.
 506
 507 An analogous approach is taken with Gnulib.  The Gnulib code is
 508 imported from the CVS repository for Gnulib with a findutils helper
 509 script, @code{import-gnulib.sh}.  That script fetches a copy of the
 510 Gnulib code into the subdirectory @file{gnulib-cvs} and then runs
 511 @code{gnulib-tool}.  The @code{gnulib-tool} program copies the
 512 required parts of Gnulib into the findutils source tree in the
 513 subdirectory @file{gnulib}.  This process gives us the property that
 514 the code in @file{gnulib} and @file{gnulib-cvs} is not included in the
 515 findutils CVS tree.   Both directories are listed in @file{.cvsignore}
 516 and so CVS ignores them.
 517
 518 Findutils does not use all the Gnulib code.  The modules we need are
 519 listed in the file @file{import-gnulib.config}.  The same file also
 520 indicates the version of Gnulib that we want to use.  Since Gnulib has
 521 no actual release process, we just use a date.  Both
 522 @file{import-gnulib.sh} and @file{import-gnulib.config} are in the
 523 findutils CVS repository.
 524
 525 The upshot of all this is that we can use the findutils CVS repository
 526 to track which version of Gnulib every findutils release uses.  That
 527 information is also provided when the user invokes a findutils program
 528 with the @samp{--version} option.  It also means that if a file exists
 529 in the findutils CVS repository and is different from a similar file
 530 elsewhere, you can be certain that the difference is deliberate and
 531 is there for a reason.
 532
 533 There are a small number of exceptions to this; the standard
 534 boiler-plate GNU files such as @file{ABOUT-NLS}, @file{INSTALL} and
 535 @file{COPYING}.
 536
 537
 538 @section How We Fix Gnulib Bugs
 539 If we always import the Gnulib code directly from the CVS
 540 repository in this way, it is impossible to maintain a locally
 541 different copy of Gnulib.  This is often a benefit in that accidental
 542 version skew is prevented.
 543
 544 However, sometimes we want deliberate version skew in order to use a
 545 findutils-specific patched version of a Gnulib file, for example
 546 because we fixed a bug.
 547
 548 Gnulib is used by quite a number of GNU projects, and this means that
 549 it gets plenty of testing.  Therefore there are relatively few bugs in
 550 the Gnulib code, but it does happen from time to time.
 551
 552 However, since there is no waiting around for a Gnulib source release
 553 tarball, Gnulib bugs are generally fixed quickly.  Here is an outline
 554 of the way we would contribute a fix to Gnulib (assuming you know it
 555 is not already fixed in current Gnulib CVS):
 556
 557 @table @asis
 558 @item Check that you have already completed a copyright assignment for Gnulib
 559 @item Begin with a vanilla CVS tree
 560 Download the Findutils source code from CVS (or use the tree you have
 561 already)
 562 @item Check out a copy of the Gnulib source
 563 An easy way to do this is to simply use @code{cp -ar} on the
 564 @file{gnulib-cvs} directory.   Have the Gnulib code checked out
 565 somewhere @emph{outside} your working CVS tree for findutils.
 566 @item Import Gnulib from your local copy
 567 The @code{import-gnulib.sh} tool has a @samp{-d} option which you can
 568 use to import the code from a local copy of Gnulib.
 569 @item Build findutils
 570 Build findutils and run the test suite, which should pass.  In our
 571 example we assume you have just noticed a bug in Gnulib, not that
 572 recent Gnulib changes broke the findutils regression tests.
 573 @item Write a test case
 574 If in fact Gnulib did break the findutils regression tests, you can probably
 575 skip this step, since you already have a test case demonstrating the problem.
 576 Otherwise, write a findutils test case for the bug and/or a Gnulib test case.
 577 @item Fix the Gnulib bug
 578 Make sure your editor follows symbolic links so that your changes to
 579 @file{gnulib/...} actually affect the files in the CVS working
 580 directory you checked out earlier.   Observe that your test now passes.
 581 @item Prepare a Gnulib patch
 582 Use @code{cvs -z3 diff -upN} to prepare the patch.  Write a ChangeLog
 583 entry and prepend this to the patch.  Check that the patch conforms
 584 with the GNU coding standards, and email it to the Gnulib mailing
 585 list.
 586 @item Wait for the patch to be applied
 587 Once your bug fix has been applied, you can update your local directory
 588 from CVS, re-import the code into Findutils (still using the @code{-d}
 589 option), and re-run the tests.  This verifies that the fix the Gnulib
 590 team made actually fixes your problem.
 591 @item Reimport the Gnulib code
 592 Update the findutils file @file{import-gnulib.config} to specify a
 593 date which is after the point at which the bug fix was committed to
 594 Gnulib.  Finally, re-import the Gnulib code directly from CVS by using
 595 @samp{import-gnulib.sh} without the @samp{-d} option, and run the
 596 tests again.  This verifies that there was no remaining local change
 597 that we were relying on to fix the bug.
 598
 599 Be aware of the fact that the date specified in the
 600 @file{import-gnulib.config} file selects the latest changes for the
 601 given date, so if you modify @file{import-gnulib.config} as soon as
 602 someone tells you they checked in a bugfix and you set
 603 @var{gnulib_version} to today's date, there will be some file version
 604 instability for the rest of the day.
 605
 606 @end table
 607
 608 @node Documentation
 609 @chapter Documentation
 610
 611 The findutils CVS tree includes several different types of
 612 documentation.
 613
 614 @section User Documentation
 615 User-oriented documentation is provided as manual pages and in
 616 Texinfo.  See
 617 @ref{Introduction,,Introduction,find,The Findutils manual}.
 618
 619 Please make sure both sets of documentation are updated if you make a
 620 change to the code.  The GNU coding standards do not normally call for
 621 maintaining manual pages on the grounds of effort duplication.
 622 However, the manual page format is more convenient for quick
 623 reference, and so it's worth maintaining both types of documentation.
 624 The manual pages are normally rather more terse than the
 625 Texinfo documentation.  The manual pages are suitable for reference
 626 use, but the Texinfo manual should also include introductory and
 627 tutorial material.
 628
 629
 630 @section Build Guidance
 631
 632 @table @file
 633 @item ABOUT-NLS
 634 Describes the Free Translation Project, the translation status of
 635 various GNU projects, and how to participate by translating an
 636 application.
 637 @item AUTHORS
 638 Lists the authors of findutils.
 639 @item COPYING
 640 The copyright license covering findutils; currently, the GNU GPL,
 641 version 3.
 642 @item INSTALL
 643 Generic installation instructions for installing GNU programs.
 644 @item README
 645 Information about how to compile findutils in particular.
 646 @item README-alpha
 647 A README file which is included with testing releases of findutils.
 648 @item README-CVS
 649 Describes how to build findutils from the code in CVS.
 650 @item THANKS
 651 Thanks to people who contributed to findutils.  Generally, if
 652 someone's contribution was significant enough to need a copyright
 653 assignment, their name should go in here.
 654 @item TODO
 655 Mainly obsolete.
 656 @end table
 657
 658
 659 @section Release Information
 660 @table @file
 661 @item NEWS
 662 Enumerates the user-visible changes in each release.  Typical changes
 663 are fixed bugs, functionality changes and documentation changes.
 664 Include the date when a release is made.
 665 @item ChangeLog
 666 This file enumerates all changes to the findutils source code (with
 667 the possible exception of @file{.cvsignore} and @file{.gitignore}
 668 changes).  The level of detail used for this file should be sufficient
 669 to answer the questions ``what changed?'' and ``why was it changed?''.
 670 If a change fixes a bug, always give the bug reference number in both
 671 the @file{ChangeLog} and @file{NEWS} files and of course also in the
 672 checkin message.  In general, it should be possible to enumerate all
 673 material changes to a function by searching for its name in
 674 @file{ChangeLog}.  Mention when each release is made.
 675 @end table
 676
 677 @node Testing
 678 @chapter Testing
 679 This chapter will explain the general procedures for adding tests to
 680 the test suite, and the functions defined in the findutils-specific
 681 DejaGnu configuration.  Where appropriate, references will be made to
 682 the DejaGnu documentation.
 683
 684 @node Bugs
 685 @chapter Bugs
 686
 687 Bugs are logged in the Savannah bug tracker at
 688 @url{http://savannah.gnu.org/bugs/?group=findutils}.  The tracker
 689 offers several fields but their use is largely obvious.  The
 690 life-cycle of a bug is like this:
 691
 692
 693 @table @asis
 694 @item Open
 695 Someone, usually a maintainer, a distribution maintainer or a user,
 696 creates a bug by filling in the form.   They fill in field values as
 697 they see fit.  This will generate an email to
 698 @email{bug-findutils@@gnu.org}.
 699
 700 @item Triage
 701 The bug hangs around with @samp{Status=None} until someone begins to
 702 work on it.  At that point they set the ``Assigned To'' field and will
 703 sometimes set the status to @samp{In Progress}, especially if the bug
 704 will take a while to fix.
 705
 706 @item Non-bugs
 707 Quite a lot of reports are not actually bugs; for these the usual
 708 procedure is to explain why the problem is not a bug, set the status
 709 to @samp{Invalid} and close the bug.   Make sure you set the
 710 @samp{Assigned to} field to yourself before closing the bug.
 711
 712 @item Fixing
 713 When you commit a bug fix into CVS (or in the case of a contributed
 714 patch, commit the change), mark the bug as @samp{Fixed}.  Make sure
 715 you include a new test case where this is relevant.  If you can figure
 716 out which releases are affected, please also set the @samp{Release}
 717 field to the earliest release which is affected by the bug.
 718 Indicate which source branch the fix is included in (for example,
 719 4.2.x or 4.3.x).  Don't close the bug yet.
 720
 721 @item Release
 722 When a release is made which includes the bug fix, make sure the bug
 723 is listed in the NEWS file.  Once the release is made, fill in the
 724 @samp{Fixed Release} field and close the bug.
 725 @end table
 726
 727
 728 @node Distributions
 729 @chapter Distributions
 730 Almost all GNU/Linux distributions include findutils, but only some of
 731 them have a package maintainer who is a member of the mailing list.
 732 Distributions don't often feed back patches to the
 733 @email{bug-findutils@@gnu.org} list, but on the other hand many of
 734 their patches relate only to standards for file locations and so
 735 forth, and are therefore distribution specific.  On an irregular basis
 736 I check the current patches being used by one or two distributions,
 737 but the total number of GNU/Linux distributions is large enough that
 738 we could not hope to cover them all.
 739
 740 Often, bugs are raised against a distribution's bug tracker instead of
 741 GNU's.    Periodically (about every six months) I take a look at some
 742 of the more accessible bug trackers to indicate which bugs have been
 743 fixed upstream.
 744
 745 Many distributions include both findutils and the slocate package,
 746 which provides a replacement @code{locate}.
 747
 748
 749 @node Internationalisation
 750 @chapter Internationalisation
 751 Translation is essentially automated from the maintainer's point of
 752 view.  The Translation Project (TP) mails the maintainer when a new PO
 753 file is available; we then download the @file{.po} file and check it
 754 into the CVS repository.  For more information, please see
 755 @url{http://www.iro.umontreal.ca/translation/HTML/domain-findutils.html}.
 756
 757
 758 @node Security
 759 @chapter Security
 760
 761 See @ref{Security Considerations, ,Security Considerations,find,The
 762 Findutils manual}, for a full description of the findutils approach to
 763 security considerations and discussion of particular tools.
 764
 765 If someone reports a security bug publicly, we should fix it as
 766 rapidly as possible.  If necessary, this can mean issuing a fixed
 767 release containing just the one bug fix.  We try to avoid issuing
 768 releases which include both significant security fixes and functional
 769 changes.
 770
 771 Where someone reports a security problem privately, we generally try
 772 to construct and test a patch without checking the intermediate code
 773 in.  Once everything has been tested, this allows us to commit a patch
 774 and immediately make a release.   The advantage of doing things this
 775 way is that we avoid situations where people watching for CVS commits
 776 can figure out and exploit a security problem before a fixed release
 777 is available.
 778
 779 It's important that security problems be fixed promptly, but don't
 780 rush so much that things go wrong.  Make sure the new release really
 781 fixes the problem.  It's usually best not to include functional
 782 changes in your security-fix release.
 783
 784 If the security problem is serious, send an alert to
 785 @email{vendor-sec@@lst.de}.  The members of the list include most
 786 GNU/Linux distributions.  The point of doing this is to allow them to
 787 prepare to release your security fix to their customers, once the fix
 788 becomes available.  Here is an example alert:
 789
 790 @smallexample
 791 GNU findutils heap buffer overrun (potential privilege escalation)
 792
 793 $Revision: 1.4 $; $Date: 2007/11/26 10:28:00 $
 794
 795
 796 I. BACKGROUND
 797 =============
 798
 799 GNU findutils is a set of programs which search for files on Unix-like
 800 systems.  It is maintained by the GNU Project of the Free Software
 801 Foundation.  For more information, see
 802 @url{http://www.gnu.org/software/findutils}.
 803
 804
 805 II. DESCRIPTION
 806 ===============
 807
 808 When GNU locate reads filenames from an old-format locate database,
 809 they are read into a fixed-length buffer allocated on the heap.
 810 Filenames longer than the 1026-byte buffer can cause a buffer overrun.
 811 The overrunning data can be chosen by any person able to control the
 812 names of files created on the local system.  This will normally
 813 include all local users, but in many cases also remote users (for
 814 example in the case of FTP servers allowing uploads).
 815
 816 III. ANALYSIS
 817 =============
 818
 819 Findutils supports three different formats of locate database, its
 820 native format "LOCATE02", the slocate variant of LOCATE02, and a
 821 traditional ("old") format that locate uses on other Unix systems.
 822
 823 When locate reads filenames from a LOCATE02 database (the default
 824 format), the buffer into which data is read is automatically extended
 825 to accommodate the length of the filenames.
 826
 827 This automatic buffer extension does not happen for old-format
 828 databases.  Instead a 1026-byte buffer is used.  When a longer
 829 pathname appears in the locate database, the end of this buffer is
 830 overrun.  The buffer is allocated on the heap (not the stack).
 831
 832 If the locate database is in the default LOCATE02 format, the locate
 833 program does perform automatic buffer extension, and the program is
 834 not vulnerable to this problem.  The software used to build the
 835 old-format locate database is not itself vulnerable to the same
 836 attack.
 837
 838 Most installations of GNU findutils do not use the old database
 839 format, and so will not be vulnerable.
 840
 841
 842 IV. DETECTION
 843 =============
 844
 845 Software
 846 --------
 847 All existing releases of findutils are affected.
 848
 849
 850 Installations
 851 -------------
 852
 853 To discover the longest path name on a given system, you can use the
 854 following command (requires GNU findutils and GNU coreutils):
 855
 856 @verbatim
 857 find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
 858 @end verbatim
 859
 860 V. EXAMPLE
 861 ==========
 862
 863 This section includes a shell script which determines which of a list
 864 of locate binaries is vulnerable to the problem.  The shell script has
 865 been tested only on glibc based systems having a mktemp binary.
 866
 867 NOTE: This script deliberately overruns the buffer in order to
 868 determine if a binary is affected.  Therefore running it on your
 869 system may have undesirable effects.  We recommend that you read the
 870 script before running it.
 871
 872 @verbatim
 873 #! /bin/sh
 874 set +m
 875 if vanilla_db="$(mktemp nicedb.XXXXXX)" ; then
 876     if updatedb --prunepaths="" --old-format --localpaths="/tmp" \
 877         --output="${vanilla_db}" ; then
 878         true
 879     else
 880         rm -f "${vanilla_db}"
 881         vanilla_db=""
 882         echo "Failed to create old-format locate database; skipping the sanity checks" >&2
 883     fi
 884 fi
 885
 886 make_overrun_db() {
 887     # Start with a valid database
 888     cat "${vanilla_db}"
 889     # Make the final entry really long
 890     dd if=/dev/zero  bs=1 count=1500 2>/dev/null | tr '\000' 'x'
 891 }
 892
 893
 894
 895 ulimit -c 0
 896
 897 usage() { echo "usage: $0 binary [binary...]" >&2; exit $1; }
 898 [ $# -eq 0 ] && usage 1
 899
 900 bad=""
 901 good=""
 902 ugly=""
 903 if dbfile="$(mktemp nasty.XXXXXX)"
 904 then
 905     make_overrun_db > "$dbfile"
 906     for locate ; do
 907       ver="$locate = $("$locate"  --version | head -1)"
 908       if [ -z "$vanilla_db" ] || "$locate" -d "$vanilla_db" "" >/dev/null ; then
 909           "$locate" -d "$dbfile" "" >/dev/null
 910           if [ $? -gt 128 ] ; then
 911               bad="$bad
 912 vulnerable: $ver"
 913           else
 914               good="$good
 915 good: $ver"
 916           fi
 917        else
 918           # the regular locate failed
 919           ugly="$ugly
 920 buggy, may or may not be vulnerable: $ver"
 921        fi
 922     done
 923     rm -f "${dbfile}" "${vanilla_db}"
 924     # good: unaffected.  bad: affected (vulnerable).
 925     # ugly: doesn't even work for a normal old-format database.
 926     echo "$good"
 927     echo "$bad"
 928     echo "$ugly"
 929 else
 930   exit 1
 931 fi
 932 @end verbatim
 933
 934
 935
 936
 937 VI. VENDOR RESPONSE
 938 ===================
 939
 940 The GNU project discovered the problem while 'locate' was being worked
 941 on; this is the first public announcement of the problem.
 942
 943 The GNU findutils maintainer has issued a patch as part of this
 944 announcement.  The patch appears below.
 945
 946 A source release of findutils-4.2.31 will be issued on 2007-05-30.
 947 That release will of course include the patch.  The patch will be
 948 committed to the public CVS repository at the same time.  Public
 949 announcements of the release, including a description of the bug, will
 950 be made at the same time as the release.
 951
 952 A release of findutils-4.3.x will follow and will also include the
 953 patch.
 954
 955
 956 VII. PATCH
 957 ==========
 958
 959 This patch should apply to findutils-4.2.23 and later.
 960 Findutils-4.2.23 was released almost two years ago.
 961 @verbatim
 962 Index: locate/locate.c
 963 ===================================================================
 964 RCS file: /cvsroot/findutils/findutils/locate/locate.c,v
 965 retrieving revision 1.58.2.2
 966 diff -u -p -r1.58.2.2 locate.c
 967 --- locate/locate.c     22 Apr 2007 16:57:42 -0000      1.58.2.2
 968 +++ locate/locate.c     28 May 2007 10:18:16 -0000
 969 @@ -124,9 +124,9 @@ extern int errno;
 970
 971  #include "locatedb.h"
 972  #include <getline.h>
 973 -#include "../gnulib/lib/xalloc.h"
 974 -#include "../gnulib/lib/error.h"
 975 -#include "../gnulib/lib/human.h"
 976 +#include "xalloc.h"
 977 +#include "error.h"
 978 +#include "human.h"
 979  #include "dirname.h"
 980  #include "closeout.h"
 981  #include "nextelem.h"
 982 @@ -468,10 +468,36 @@ visit_justprint_unquoted(struct process_
 983    return VISIT_CONTINUE;
 984  }
 985
 986 +static void
 987 +toolong (struct process_data *procdata)
 988 +{
 989 +  error (1, 0,
 990 +        _("locate database %s contains a "
 991 +          "filename longer than locate can handle"),
 992 +        procdata->dbfile);
 993 +}
 994 +
 995 +static void
 996 +extend (struct process_data *procdata, size_t siz1, size_t siz2)
 997 +{
 998 +  /* Figure out if the addition operation is safe before performing it. */
 999 +  if (SIZE_MAX - siz1 < siz2)
1000 +    {
1001 +      toolong (procdata);
1002 +    }
1003 +  else if (procdata->pathsize < (siz1+siz2))
1004 +    {
1005 +      procdata->pathsize = siz1+siz2;
1006 +      procdata->original_filename = x2nrealloc (procdata->original_filename,
1007 +                                               &procdata->pathsize,
1008 +                                               1);
1009 +    }
1010 +}
1011 +
1012  static int
1013  visit_old_format(struct process_data *procdata, void *context)
1014  {
1015 -  register char *s;
1016 +  register size_t i;
1017    (void) context;
1018
1019    /* Get the offset in the path where this path info starts.  */
1020 @@ -479,20 +505,35 @@ visit_old_format(struct process_data *pr
1021      procdata->count += getw (procdata->fp) - LOCATEDB_OLD_OFFSET;
1022    else
1023      procdata->count += procdata->c - LOCATEDB_OLD_OFFSET;
1024 +  assert(procdata->count > 0);
1025
1026 -  /* Overlay the old path with the remainder of the new.  */
1027 -  for (s = procdata->original_filename + procdata->count;
1028 +  /* Overlay the old path with the remainder of the new.  Read
1029 +   * more data until we get to the next filename.
1030 +   */
1031 +  for (i=procdata->count;
1032         (procdata->c = getc (procdata->fp)) > LOCATEDB_OLD_ESCAPE;)
1033 -    if (procdata->c < 0200)
1034 -      *s++ = procdata->c;              /* An ordinary character.  */
1035 -    else
1036 -      {
1037 -       /* Bigram markers have the high bit set. */
1038 -       procdata->c &= 0177;
1039 -       *s++ = procdata->bigram1[procdata->c];
1040 -       *s++ = procdata->bigram2[procdata->c];
1041 -      }
1042 -  *s-- = '\0';
1043 +    {
1044 +      if (procdata->c < 0200)
1045 +       {
1046 +         /* An ordinary character. */
1047 +         extend (procdata, i, 1u);
1048 +         procdata->original_filename[i++] = procdata->c;
1049 +       }
1050 +      else
1051 +       {
1052 +         /* Bigram markers have the high bit set. */
1053 +         extend (procdata, i, 2u);
1054 +         procdata->c &= 0177;
1055 +         procdata->original_filename[i++] = procdata->bigram1[procdata->c];
1056 +         procdata->original_filename[i++] = procdata->bigram2[procdata->c];
1057 +       }
1058 +    }
1059 +
1060 +  /* Consider the case where we executed the loop body zero times; we
1061 +   * still need space for the terminating null byte.
1062 +   */
1063 +  extend (procdata, i, 1u);
1064 +  procdata->original_filename[i] = 0;
1065
1066    procdata->munged_filename = procdata->original_filename;
1067 @end verbatim
1068
1069
1070 VIII. THANKS
1071 ============
1072
1073 Thanks to Rob Holland <rob@@inversepath.com> and Tavis Ormandy.
1074
1075
1076 IX. CVE INFORMATION
1077 ===================
1078
1079 No CVE candidate number has yet been assigned for this vulnerability.
1080 If someone provides one, I will include it in the public announcement
1081 and change logs.
1082 @end smallexample
1083
1084 The original announcement above was sent out with a cleartext PGP
1085 signature, of course, but that has been omitted from the example.
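
The detection command in the alert can be wrapped into a small check
against the 1026-byte buffer size it mentions.  The following is only
a sketch: the function names @code{longest_path_length} and
@code{has_overlong_path} are hypothetical and are not part of
findutils, and it uses @code{awk} in place of the @code{wc -L} of the
original command so that GNU coreutils is not needed.  It still
assumes a @code{find} which supports @samp{-print0}.

@verbatim
```shell
#! /bin/sh
# Hypothetical helpers for illustration; not part of findutils.

# Print the length in bytes of the longest path name under directory $1.
# Every byte of each path is turned into 'x' first, so that unusual
# characters (including newlines) in file names cannot confuse awk.
longest_path_length() {
    find "$1" -print0 2>/dev/null |
        LC_ALL=C tr -c '\0' 'x' |
        LC_ALL=C tr '\0' '\n' |
        LC_ALL=C awk '{ if (length($0) > max) max = length($0) }
                      END { print max + 0 }'
}

# Succeed if some path under $1 is longer than the 1026-byte buffer
# size quoted in the alert.
has_overlong_path() {
    [ "$(longest_path_length "$1")" -gt 1026 ]
}
```
@end verbatim

For example, @code{has_overlong_path / && echo "overlong path found"}
reports whether any path on the system exceeds that length; scanning
@file{/} can take a long time.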
1086
1087 Once a fixed release is available, announce the new release using the
1088 normal channels.  Any CVE number assigned for the problem should be
1089 included in the @file{ChangeLog} and @file{NEWS} entries. See
1090 @url{http://cve.mitre.org/} for an explanation of CVE numbers.
1091
1092
1093
1094 @node Making Releases
1095 @chapter Making Releases
1096 This chapter will explain how to make a findutils release.   For the
1097 time being here is a terse description of the main steps:
1098
1099 @enumerate
1100 @item Commit changes; make sure your working directory has no
1101 uncommitted changes.
1102 @item Test; make sure that all changes you have made have tests, and
1103 that the tests pass.  Verify this with @code{make distcheck}.
1104 @item Bugs; make sure all Savannah bug entries fixed in this release
1105 are marked as @samp{Fixed}.
1106 @item NEWS; make sure that the @file{NEWS} and @file{configure.in} files
1107 are updated with the new release number (and checked in).
1108 @item Build the release tarball; do this with @code{make distcheck}.
1109 Copy the tarball somewhere safe.
1110 @item Tag the release; findutils releases are tagged in CVS as
1111 FINDUTILS_x_y_z-1.  For example, the tag for findutils release 4.3.8
1112 is FINDUTILS_4_3_8-1.
1113 @item Prepare the upload and upload it.
1114 @xref{Automated FTP Uploads, ,Automated FTP
1115 Uploads, maintain, Information for Maintainers of GNU Software},
1116 for detailed upload instructions.
1117 @item Make a release announcement; include an extract from the NEWS
1118 file which explains what's changed.  Announcements for test releases
1119 should just go to @email{bug-findutils@@gnu.org}.  Announcements for
1120 stable releases should go to @email{info-gnu@@gnu.org} as well.
1121 @item Bump the release numbers in CVS; edit the @file{configure.in}
1122 and @file{NEWS} files to advance the release numbers.   For example,
1123 if you have just released @samp{4.6.2}, bump the release number to
1124 @samp{4.6.3-CVS}.  The point of the @samp{-CVS} suffix here is that a
1125 findutils binary built from CVS will bear a release number indicating
1126 it's not built from the ``official'' source release.
1127 @item Close bugs; any bugs recorded on Savannah which were fixed in this
1128 release should now be marked as closed.   Update the @samp{Fixed
1129 Release} field of these bugs appropriately and make sure the
1130 @samp{Assigned to} field is populated.
1131 @end enumerate
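
The tag-naming and version-bump conventions in the list above can be
sketched as a pair of shell functions.  This is only an illustration:
the function names @code{release_tag} and @code{next_dev_version} are
hypothetical and are not part of findutils, and the sketch assumes a
plain three-part @samp{x.y.z} release number.

@verbatim
```shell
#! /bin/sh
# Hypothetical helpers for illustration; not part of findutils.
# Both assume a release number of the form x.y.z.

# Print the CVS tag for a release, e.g. 4.3.8 gives FINDUTILS_4_3_8-1.
release_tag() {
    printf 'FINDUTILS_%s-1\n' "$(printf '%s' "$1" | tr . _)"
}

# Print the version to put in configure.in and NEWS after a release,
# e.g. 4.6.2 gives 4.6.3-CVS.
next_dev_version() {
    major_minor=${1%.*}     # everything before the last dot
    patch=${1##*.}          # everything after the last dot
    printf '%s.%s-CVS\n' "$major_minor" "$((patch + 1))"
}

release_tag 4.3.8
next_dev_version 4.6.2
```
@end verbatim

The two calls at the end print the tag and the bumped release number
for the examples used in the list above.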
1132
1133
1134 @node GNU Free Documentation License
1135 @appendix GNU Free Documentation License
1136 @include fdl.texi
1137
1138 @bye
1139
1140 @comment texi related words used by Emacs' spell checker ispell.el
1141
1142 @comment LocalWords: texinfo setfilename settitle setchapternewpage
1143 @comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
1144 @comment LocalWords: filll dir samp dfn noindent xref pxref
1145 @comment LocalWords: var deffn texi deffnx itemx emph asis
1146 @comment LocalWords: findex smallexample subsubsection cindex
1147 @comment LocalWords: dircategory direntry itemize
1148
1149 @comment other words used by Emacs' spell checker ispell.el
1150 @comment LocalWords: README fred updatedb xargs Plett Rendell akefile
1151 @comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
1152 @comment LocalWords: ipath regex iregex expr fubar regexps
1153 @comment LocalWords: metacharacters macs sr sc inode lname ilname
1154 @comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
1155 @comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
1156 @comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
1157 @comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
1158 @comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
1159 @comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
1160 @comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
1161 @comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
1162 @comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
1163 @comment LocalWords: bigrams cd chmod comp crc CVS dbfile dum eof
1164 @comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
1165 @comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
1166 @comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
1167 @comment LocalWords: ois ok Pinard printindex proc procs prunefs
1168 @comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
1169 @comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
1170 @comment LocalWords: wildcard zlogout basename execdir wholename iwholename
1171 @comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX