\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename find-maint.info
@settitle Maintaining Findutils
@c For double-sided printing, uncomment:
@c @setchapternewpage odd
@c %**end of header

@include versionmaint.texi

@iftex
@finalout
@end iftex

@dircategory GNU organization
@direntry
* Maintaining Findutils: (find-maint).        Maintaining GNU findutils
@end direntry

@copying
This manual explains how GNU findutils is maintained, how changes should
be made and tested, and what resources exist to help developers.

This is edition @value{EDITION}, for findutils version @value{VERSION}.

Copyright @copyright{} 2007 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no
Front-Cover Texts, and with no Back-Cover Texts.
A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
@end copying

@titlepage
@title Maintaining Findutils
@subtitle Edition @value{EDITION}, for GNU findutils version @value{VERSION}
@subtitle @value{UPDATED}
@author by James Youngman

@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top, Introduction, (dir), (dir)
@top Maintaining GNU Findutils

@insertcopying
@end ifnottex

@menu
* Introduction::
* Maintaining GNU Programs::
* Design Issues::
* Coding Conventions::
* Tools::
* Using the GNU Portability Library::
* Documentation::
* Testing::
* Bugs::
* Distributions::
* Internationalisation::
* Security::
* Making Releases::
@end menu

@node Introduction
@chapter Introduction

This document explains how to contribute to and maintain GNU
Findutils.  It concentrates on developer-specific issues.  For
information about how to use the software, please refer to
@ref{Introduction, ,Introduction,find,The Findutils manual}.

This manual aims to be useful without necessarily being verbose.  It
is also a recent document, so there will be many areas in which
improvements can be made.  If you find that the document omits
important information, or that any part of it is so terse as to be
unhelpful, please ask for help on the @email{bug-findutils@@gnu.org}
mailing list.  We'll try to improve this document too.

@node Maintaining GNU Programs
@chapter Maintaining GNU Programs

GNU Findutils is part of the GNU Project and so there are a number of
documents which set out standards for the maintenance of GNU
software.

@table @file
@item standards.texi
GNU Project Coding Standards.  All changes to findutils should comply
with these standards.  In some areas we go somewhat beyond the
requirements of the standards, but these cases are explained in this
manual.
@item maintain.texi
Information for Maintainers of GNU Software.  This document provides
guidance for GNU maintainers.  Everybody with commit access should
read this document.  Everybody else is welcome to do so too, of
course.
@end table

@node Design Issues
@chapter Design Issues

The findutils package is installed on a great many systems, usually as
a fundamental component.  The programs in the package are often used
in order to successfully boot or fix the system.

This means that for findutils we bear in mind considerations that
may not apply as strongly to other packages.  For example, the fact
that findutils is often a base component motivates us to
@itemize
@item Limit dependencies on libraries
@item Avoid dependencies on other large packages (for example, interpreters)
@item Be conservative when making changes to the ``stable'' release branch
@end itemize

All those considerations come before functionality.  Functional
enhancements are still made to findutils, but these are almost
exclusively introduced in the ``development'' release branch, to allow
extensive testing and proving.

Sometimes it is useful to have a priority list to provide guidance
when making design trade-offs.  For findutils, that priority list is:

@enumerate
@item Correctness
@item Standards compliance
@item Security
@item Backward compatibility
@item Performance
@item Functionality
@end enumerate

For example, we support the @code{-exec} action because POSIX
compliance requires this, even though there are security problems with
it and we would otherwise prefer people to use @code{-execdir}.  There
are also cases where some performance is sacrificed in the name of
security.  For example, the sanity checks that @code{find} performs
while traversing a directory tree may slow it down.  We do adopt
functional changes, and these are allowed to make @code{find} slower,
but only if there is no detectable impact on users who don't use the
new feature.

Backward-incompatible changes do get made in order to comply with
standards (for example the behaviour of @code{-perm -...} changed in
order to comply with POSIX).  However, they don't get made in order to
provide better ease of use; for example the semantics of @code{-size
-2G} are almost always unexpected by users, but we retain the current
behaviour because of backward compatibility and for its similarity to
the block-rounding behaviour of @code{-size -30}.  We might introduce
a change which does not have the unfortunate rounding behaviour, but
we would choose another syntax (for example @code{-size '<2G'}) for
this.
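
The rounding rule behind this surprise can be sketched in C.  This
illustrates the arithmetic only; it is not the code @code{find} uses,
and the function names are hypothetical.  The file size is first
rounded up to a whole number of units, and only then compared with the
number given on the command line, so @code{-size -2G} accepts only
files whose size is at most 1G.

@example
#include <stdbool.h>
#include <stdint.h>

/* Round SIZE bytes up to a whole number of UNIT-byte units.
   Written this way to avoid the overflow that
   (size + unit - 1) / unit could suffer for very large sizes.  */
static uintmax_t
round_up_units (uintmax_t size, uintmax_t unit)
@{
  return size / unit + (0 != size % unit);
@}

/* Hypothetical helper: would a file of SIZE bytes satisfy
   "-size -N" when sizes are counted in UNIT-byte units?  */
bool
size_less_than (uintmax_t size, uintmax_t n, uintmax_t unit)
@{
  return round_up_units (size, unit) < n;
@}
@end example

With a unit of 1073741824 bytes (1G), @code{size_less_than
(1073741824, 2, unit)} is true but @code{size_less_than (1073741825, 2,
unit)} is false.  The same rounding, with 512-byte blocks, governs
@code{-size -30}.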

In a general sense, we try to do test-driven development of the
findutils code; that is, we try to implement test cases for new
features and bug fixes before modifying the code to make the test
pass.  Some features of the code are tested well, but the test
coverage for other features is less good.  If you are about to modify
the code for a predicate and aren't sure about the test coverage, use
@code{grep} to search the test directories for existing tests of that
predicate, and measure the coverage with @code{gcov} or another test
coverage tool.

Lastly, we try not to depend on having a ``working system''.  The
findutils suite is used for diagnosis of problems, and this applies
especially to @code{find}.  We should ensure that @code{find} still
works on relatively broken systems, for example systems with damaged
@file{/etc/passwd} files.  Another interesting example is the case
where a system is a client of one or more unresponsive NFS servers.
On such a system, if you try to @code{stat} all mount points, your
program can hang indefinitely, waiting for the remote NFS server to
respond.

@c Installed on many systems
@c Often part of base
@c Needs to work on broken systems (e.g. unresponsive NFS servers,
@c mode-0 files)

@node Coding Conventions
@chapter Coding Conventions

Coding style documents which set out to establish a uniform look and
feel to source code have worthy goals, for example greater ease of
maintenance and readability.  However, I do not believe that authors
of coding style guides can in general envisage every situation, and it
may occasionally be necessary to break the letter of a style guide in
order to honour its spirit, or to better achieve its goals.

I've certainly seen many style guides outside the free software world
which make bald statements such as ``functions shall have exactly one
return statement''.  The desire to ensure consistency and obviousness
of control flow is laudable, but it is all too common for such bald
requirements to be followed unthinkingly.  Certainly I've seen such
coding standards result in unmaintainable code with terrible
infelicities such as functions containing @code{if} statements nested
nine levels deep.  I suppose such coding standards don't survive in
free software projects because they tend to drive away potential
contributors or to generate heated discussions on mailing lists.
Equally, a nine-level-deep function in a free software program would
quickly get refactored, assuming it is obvious what the function is
supposed to do...

Be that as it may, the approach I will take for this document is to
explain some idioms and practices in use in the findutils source code,
and leave it up to the reader's engineering judgement to decide which
considerations apply to the code they are working on, and whether or
not there is sufficient reason to ignore the guidance in current
circumstances.


@menu
* Make the Compiler Find the Bugs::
* Factor Out Repeated Code::
* Debugging is For Users Too::
* Don't Trust the File System Contents::
* The File System Is Being Modified::
@end menu

@node    Make the Compiler Find the Bugs
@section Make the Compiler Find the Bugs

Finding bugs is tedious.  Suppose a filesystem contains two million
files, and a @code{find} command line should print one million of
them but in fact misses 1% of them.  You can tell that the program is
printing the wrong result only if you know the right answer for that
filesystem at that time.  If you don't know this, you may simply never
find out about the bug.  For this reason it is important to have a
comprehensive test suite.

The test suite is of course not the only way to find bugs.  The
findutils source code makes liberal use of the @code{assert} macro.
Although assertions might in principle be a performance drain, the
cost of most of them is negligible compared to the time taken to
fetch even one sector from a disk drive.

Assertions should not be used to check the results of operations which
may be affected by the program's external environment.  For example,
never assert that a file could be opened successfully.  Errors
relating to problems with the program's execution environment should
be diagnosed with a user-oriented error message.  An assertion failure
should always denote a bug in the program.
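
The distinction can be sketched as follows.  This is a hypothetical
example written for this manual, not code taken from findutils: the
@code{assert} checks an invariant that only a programming error could
break (the caller must pass a non-null name), while a failure of
@code{fopen}, which depends on the environment, is reported to the
user using @code{errno}.

@example
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in the file NAME.
   Returns -1 after printing a message if the file cannot be opened.  */
long
count_lines (const char *name)
@{
  FILE *fp;
  long lines = 0;
  int c;

  /* A null NAME can only come from a bug in the caller.  */
  assert (NULL != name);

  fp = fopen (name, "r");
  if (NULL == fp)
    @{
      /* The environment is at fault, not the program: tell the user.  */
      fprintf (stderr, "%s: %s\n", name, strerror (errno));
      return -1;
    @}
  while (EOF != (c = getc (fp)))
    if ('\n' == c)
      lines++;
  fclose (fp);
  return lines;
@}
@end example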

Several programs in the findutils suite perform self-checks.  See for
example the function @code{pred_sanity_check} in @file{find/pred.c}.
This is generally desirable.

There are also a number of small ways in which we can help the
compiler to find the bugs for us.

@subsection Constants in Equality Testing

It's a common error to write @code{=} when @code{==} is meant.
Sometimes this happens in new code and is simply due to finger
trouble.  Sometimes it is the result of the inadvertent deletion of a
character.  In any case, there is a subset of cases where we can
persuade the compiler to generate an error message when we make this
mistake, namely where one side of the equality test is a constant.

This is an example of a vulnerable piece of code.

@example
if (x == 2)
 ...
@end example

A simple typo converts the above into

@example
if (x = 2)
 ...
@end example

We've introduced a bug; the condition is always true, and the value of
@code{x} has been changed.  However, a simple change to our practice
would have made us immune to this problem, because the mistyped form
@code{2 = x} attempts to assign to a constant, which the compiler
rejects:

@example
if (2 == x)
 ...
@end example

Usually, the Emacs keystroke @kbd{M-t} (@code{transpose-words}) can be
used to swap the operands.
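
The same habit applies to character constants.  The following sketch
(a hypothetical helper, not a findutils function) counts the @samp{/}
characters in a path with the constant written on the left of each
comparison, so that a mistyped @code{'/' = *p} is a compile-time error.

@example
#include <stddef.h>

/* Hypothetical helper: count the '/' characters in PATH.  */
size_t
count_slashes (const char *path)
@{
  size_t n = 0;
  const char *p;

  for (p = path; 0 != *p; p++)
    if ('/' == *p)   /* constant on the left */
      n++;
  return n;
@}
@end example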


@subsection Spelling of ASCII NUL

Strings in C are just sequences of characters terminated by a NUL.
The ASCII NUL character has the numerical value zero.  It is normally
represented in C code as @samp{\0}.  Here is a typical piece of C
code:

@example
*p = '\0';
@end example

Consider what happens if there is an unfortunate typo:

@example
*p = '0';
@end example

We have changed the meaning of our program and the compiler cannot
diagnose this as an error.  The character @samp{0} has the value 48 in
ASCII, so the string is no longer terminated at that point.  Bad
things will probably happen.  It would be better if the compiler could
help us diagnose this problem.

In C, the type of @code{'\0'} is in fact @code{int}, not @code{char}.
This provides us with a simple way to avoid this error.  The constant
@code{0} has the same value and type as the constant @code{'\0'}.
However, it is not as vulnerable to typos.  For this reason I normally
prefer to use this code:

@example
*p = 0;
@end example
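
As a sketch of the idiom in context, here is a hypothetical helper (not
taken from findutils) that removes a trailing newline from a string in
place, terminating it with a plain @code{0}:

@example
#include <string.h>

/* Hypothetical helper: remove one trailing newline from S, if any.  */
void
chop_newline (char *s)
@{
  size_t len = strlen (s);

  if (len > 0 && '\n' == s[len - 1])
    s[len - 1] = 0;   /* same value and type as '\0' */
@}
@end example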


@node    Factor Out Repeated Code
@section Factor Out Repeated Code

Repeated code imposes a greater maintenance burden and increases the
exposure to bugs.  For example, if you discover that something you
want to implement has some similarity with an existing piece of code,
don't cut and paste it.  Instead, factor the code out.  The risk of
cutting and pasting the code, particularly if you do this several
times, is that you end up with several copies of the same code.

If the original code had a bug, you now have N places where this needs
to be fixed.  It's all too easy to miss some out when trying to fix the
bug.  Equally, it's quite possible that when pasting the code into
some function, the pasted code was not quite adapted correctly to its
new environment.  To pick a contrived example, perhaps it modifies a
global variable which that code shouldn't be touching in its new
home.  Worse, perhaps it makes some unstated assumption about the
nature of the input arguments which is in fact not true for the
context of the now duplicated code.

A good example of the use of refactoring in findutils is the
@code{collect_arg} function in @file{find/parser.c}.  A less clear-cut
but larger example is the factoring out of code which would otherwise
have been duplicated between @file{find/find.c} and
@file{find/ftsfind.c}.
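
The general shape of such a refactoring can be sketched as follows.
The sketch is hypothetical: the names @code{take_next_arg},
@code{parse_name} and @code{parse_newer} and their signatures were
invented for this manual and are not the real interface of
@code{collect_arg}.  Both option parsers call one helper, so the check
for a missing argument exists in exactly one place.

@example
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: if ARGV[*IDX] exists, store it in *ARG,
   advance *IDX past it and return true; otherwise return false
   and leave *IDX unchanged.  */
bool
take_next_arg (char **argv, int *idx, const char **arg)
@{
  if (NULL == argv || NULL == argv[*idx])
    return false;
  *arg = argv[*idx];
  (*idx)++;
  return true;
@}

/* Two hypothetical option parsers sharing the helper instead of
   each carrying its own copy of the bounds check.  */
bool
parse_name (char **argv, int *idx, const char **pattern)
@{
  return take_next_arg (argv, idx, pattern);
@}

bool
parse_newer (char **argv, int *idx, const char **reference)
@{
  return take_next_arg (argv, idx, reference);
@}
@end example

If the bounds check later turns out to be wrong, only
@code{take_next_arg} needs fixing.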

The findutils test suite is comprehensive enough that refactoring code
should not generally be a daunting prospect from a testing point of
view.  Nevertheless there are some areas which are only
lightly tested:

@enumerate
@item Tests on the ages of files
@item Code which deals with the values returned by operating system calls (for example handling of @code{ENOENT})
@item Code dealing with OS limits (for example, limits on path length
or exec arguments)
@item Code relating to features not all systems have (for example
Solaris Doors)
@end enumerate

Please exercise caution when working in those areas.


@node    Debugging is For Users Too
@section Debugging is For Users Too

Debug and diagnostic code is often used to verify that a program is
working in the way its author thinks it should be.  But users are
often uncertain about what a program is doing, too.  Exposing a little
more diagnostic information to them can help.  Much of the diagnostic
code in @code{find}, for example, is controlled at run time by the
@samp{-D} flag, rather than by C preprocessor directives.

Making diagnostic messages available to users also means that their
phrasing becomes important.
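
Run-time debug options of this kind can be sketched as a bit mask set
from option names given on the command line.  The sketch below is
hypothetical: the category names, the enum and @code{add_debug_option}
were invented for illustration and do not reproduce @code{find}'s
actual handling of @samp{-D}.  Returning @code{false} for an unknown
name lets the caller print a helpful message rather than silently
ignore a typo.

@example
#include <stdbool.h>
#include <string.h>

/* Hypothetical debug categories, one bit each.  */
enum debug_flag
@{
  DEBUG_TREE   = 1u << 0,
  DEBUG_STAT   = 1u << 1,
  DEBUG_SEARCH = 1u << 2
@};

struct debug_name
@{
  const char *name;
  unsigned int flag;
@};

static const struct debug_name debug_names[] =
@{
  @{ "tree",   DEBUG_TREE @},
  @{ "stat",   DEBUG_STAT @},
  @{ "search", DEBUG_SEARCH @},
@};

/* Set the bit for NAME in *OPTIONS.  Return false, leaving *OPTIONS
   unchanged, if NAME is not a known category.  */
bool
add_debug_option (const char *name, unsigned int *options)
@{
  size_t i;

  for (i = 0; i < sizeof debug_names / sizeof debug_names[0]; i++)
    if (0 == strcmp (name, debug_names[i].name))
      @{
        *options |= debug_names[i].flag;
        return true;
      @}
  return false;
@}
@end example

Diagnostic code can then test a bit, for example
@code{if (options & DEBUG_STAT)}, before writing to standard error, so
the checks cost almost nothing when the option is not in use.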


@node    Don't Trust the File System Contents
@section Don't Trust the File System Contents

People use @code{find} to search in directories created by other
people.  Sometimes they do this to check for suspicious activity (for
example to look for new setuid binaries).  This means that it would be
bad if @code{find} were vulnerable to, say, a security problem
exploitable by constructing a specially-crafted filename.  The same
consideration applies to @code{locate} and @code{updatedb}.

Henry Spencer said this well in his fifth commandment:
@quotation
Thou shalt check the array bounds of all strings (indeed, all arrays),
for surely where thou typest @samp{foo} someone someday shall type
@samp{supercalifragilisticexpialidocious}.
@end quotation

Symbolic links can often be a problem.  If @code{find} calls
@code{lstat} on something and discovers that it is a directory, it's
normal for @code{find} to recurse into it.  Even if the @code{chdir}
system call is used immediately, there is still a window of
opportunity between the @code{lstat} and the @code{chdir} in which a
malicious person could rename the directory and substitute a symbolic
link to some other directory.  @code{find} would then end up working
in a directory outside the tree it was asked to search.
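
One common mitigation is sketched below, assuming a POSIX.1-2008
system (for @code{O_DIRECTORY}, @code{O_NOFOLLOW} and @code{fchdir}).
The function name is hypothetical and this is not findutils' actual
code.  Instead of calling @code{chdir} on the name, open the directory
without following symbolic links, check that the device and inode
numbers of what was opened match the earlier @code{lstat} result, and
only then change into the open descriptor.  The window still exists,
but a substitution is detected instead of being silently followed.

@example
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: change into directory NAME only if it is
   still the object that an earlier lstat() described in *EXPECTED.
   Returns true if the working directory was changed.  */
bool
enter_dir_if_unchanged (const char *name, const struct stat *expected)
@{
  struct stat now;
  bool ok = false;

  /* O_NOFOLLOW makes open() fail if NAME is now a symbolic link;
     O_DIRECTORY makes it fail if NAME is not a directory.  */
  int fd = open (name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
  if (fd < 0)
    return false;

  /* Compare what we actually opened with what lstat() saw.  */
  if (0 == fstat (fd, &now)
      && now.st_dev == expected->st_dev
      && now.st_ino == expected->st_ino)
    ok = (0 == fchdir (fd));

  close (fd);
  return ok;
@}
@end example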

@node    The File System Is Being Modified
@section The File System Is Being Modified

The filesystem gets modified while you are traversing it.  For
example, it's normal for files to get deleted while @code{find} is
traversing a directory.  Issuing an error message seems helpful when a
file is deleted from the one directory you are interested in, but if
@code{find} is searching 15000 directories, such a message becomes
less helpful.

Bear in mind also that the directory @code{find} is currently
searching could be moved to another point in the filesystem, and that
the directory in which @code{find} was started could be deleted.

Henry Spencer's sixth commandment is also apposite here:
@quotation
If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code and produce aches in thy typing
fingers, for if thou thinkest ``it cannot happen to me'', the gods
shall surely punish thee for thy arrogance.
@end quotation
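
Checking the error code also lets a program tell an ordinary race from
a real problem.  In the hypothetical helper below (not findutils code),
a file that has vanished since its directory was read (@code{ENOENT})
is treated as a normal event and skipped quietly, while any other
failure is reported.  Whether to stay silent about vanished files is a
policy decision; the sketch only shows where that decision is made.

@example
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: lstat NAME into *ST.
   Returns 1 on success, 0 if NAME no longer exists, and -1 (after
   printing a message) for any other error.  */
int
stat_or_vanished (const char *name, struct stat *st)
@{
  if (0 == lstat (name, st))
    return 1;
  if (ENOENT == errno)
    return 0;   /* deleted while we were traversing: not an error */
  fprintf (stderr, "%s: %s\n", name, strerror (errno));
  return -1;
@}
@end example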

There are a lot of files out there.  They come in all dates and
sizes.  Somewhere in the real world there is a condition that will
exercise every part of the code base, so we try to test that code base
before someone falls over a bug.


@node Tools
@chapter Tools
Most of the tools required to build findutils are mentioned in the
file @file{README-CVS}.  We also use some other tools:

@table @asis
@item System call traces
Much of the execution time of @code{find} is spent waiting for
filesystem operations.  A system call trace (for example, that
provided by @code{strace}) shows what system calls are being made.
Using this information we can work to remove unnecessary file system
operations.

@item Valgrind
Valgrind is a tool which dynamically verifies the memory accesses a
program makes to ensure that they are valid (for example, that the
behaviour of the program does not in any way depend on the contents of
uninitialised memory).

@item DejaGnu
DejaGnu is the test framework used to run the findutils test suite
(the @code{runtest} program is part of DejaGnu).  It would be ideal if
everybody building @code{findutils} also ran the test suite, but many
people don't have DejaGnu installed.  Maintainers run the test suite
frequently while making changes to findutils.  @xref{Testing}, for
more information.
@end table

@node Using the GNU Portability Library
@chapter Using the GNU Portability Library
The Gnulib library (@url{http://www.gnu.org/software/gnulib/}) makes a
variety of systems look more like a GNU/Linux system and also applies
a number of automatic bug fixes and workarounds.  Some of these apply
to GNU/Linux systems too.  For example, the Gnulib regex
implementation is used when we determine that we are building on a
GNU libc system with a bug in the regex implementation.


@section How and Why we Import the Gnulib Code
Gnulib does not have a release process which results in a source
tarball you can download.  Instead, the code is simply made available
by CVS.

GNU projects vary in how they interact with Gnulib.  Many import a
selection of code from Gnulib into the working directory and then
check the updated files into the CVS repository for their project.
The coreutils project does this, for example.

At the last maintainer changeover for findutils (2003) it turned out
that there was a lot of material in findutils in common with Gnulib,
but it had not been updated in a long time.  It was difficult to
figure out which source files were intended to track external sources
and which were intended to contain incompatible changes, or diverge
for other reasons.

To reduce this uncertainty, I decided to treat Gnulib much like
Automake.  Files supplied by Automake are simply absent from the
findutils source tree.  When Automake is run with @code{automake
--add-missing --copy}, it adds in all the files it thinks should be
there which aren't there already.

An analogous approach is taken with Gnulib.  The Gnulib code is
imported from the CVS repository for Gnulib with a findutils helper
script, @file{import-gnulib.sh}.  That script fetches a copy of the
Gnulib code into the subdirectory @file{gnulib-cvs} and then runs
@code{gnulib-tool}.  The @code{gnulib-tool} program copies the
required parts of Gnulib into the findutils source tree in the
subdirectory @file{gnulib}.  This process gives us the property that
the code in @file{gnulib} and @file{gnulib-cvs} is not included in the
findutils CVS tree.  Both directories are listed in @file{.cvsignore}
and so CVS ignores them.

Findutils does not use all the Gnulib code.  The modules we need are
listed in the file @file{import-gnulib.config}.  The same file also
indicates the version of Gnulib that we want to use.  Since Gnulib has
no actual release process, we just use a date.  Both
@file{import-gnulib.sh} and @file{import-gnulib.config} are in the
findutils CVS repository.

The upshot of all this is that we can use the findutils CVS repository
to track which version of Gnulib every findutils release uses.  That
information is also provided when the user invokes a findutils program
with the @samp{--version} option.  It also means that if a file exists
in the findutils CVS repository and is different from a similar file
elsewhere, you can be certain that the difference is deliberate.

There are a small number of exceptions to this: the standard
boiler-plate GNU files such as @file{ABOUT-NLS}, @file{INSTALL} and
@file{COPYING}.


@section How We Fix Gnulib Bugs
If we always import the Gnulib code directly from the CVS repository
in this way, it is impossible to maintain a locally different copy of
Gnulib.  This is often a benefit in that accidental version skew is
prevented.

However, sometimes we want deliberate version skew in order to use a
findutils-specific patched version of a Gnulib file, for example
because we fixed a bug.

Gnulib is used by quite a number of GNU projects, and this means that
it gets plenty of testing.  Therefore there are relatively few bugs in
the Gnulib code, but they do turn up from time to time.

However, since there is no waiting around for a Gnulib source release
tarball, Gnulib bugs are generally fixed quickly.  Here is an outline
of the way we would contribute a fix to Gnulib (assuming you know it
is not already fixed in current Gnulib CVS):

@table @asis
@item Check that you have already completed a copyright assignment for Gnulib
@item Begin with a vanilla CVS tree
Download the Findutils source code from CVS (or use the tree you have
already).
@item Check out a copy of the Gnulib source
An easy way to do this is to simply use @code{cp -ar} on the
@file{gnulib-cvs} directory.  Have the Gnulib code checked out
somewhere @emph{outside} your working CVS tree for findutils.
@item Import Gnulib from your local copy
The @file{import-gnulib.sh} tool has a @samp{-d} option which you can
use to import the code from a local copy of Gnulib.
@item Build findutils
Build findutils and run the test suite, which should pass.  In our
example we assume you have just noticed a bug in Gnulib, not that
recent Gnulib changes broke the findutils regression tests.
@item Write a test case
If in fact Gnulib did break the findutils regression tests, you can probably
skip this step, since you already have a test case demonstrating the problem.
Otherwise, write a findutils test case for the bug and/or a Gnulib test case.
@item Fix the Gnulib bug
Make sure your editor follows symbolic links so that your changes to
@file{gnulib/...} actually affect the files in the CVS working
directory you checked out earlier.  Observe that your test now passes.
@item Prepare a Gnulib patch
Use @code{cvs -z3 diff -upN} to prepare the patch.  Write a ChangeLog
entry and prepend this to the patch.  Check that the patch conforms
with the GNU coding standards, and email it to the Gnulib mailing
list.
@item Wait for the patch to be applied
Once your bug fix has been applied, you can update your local directory
from CVS, re-import the code into Findutils (still using the @samp{-d}
option), and re-run the tests.  This verifies that the fix the Gnulib
team made actually fixes your problem.
@item Reimport the Gnulib code
Update the findutils file @file{import-gnulib.config} to specify a
date which is after the point at which the bug fix was committed to
Gnulib.  Finally, re-import the Gnulib code directly from CVS by using
@file{import-gnulib.sh} without the @samp{-d} option, and run the
tests again.  This verifies that there was no remaining local change
that we were relying on to fix the bug.

Be aware that the date specified in the @file{import-gnulib.config}
file selects the latest changes for the given date, so if you modify
@file{import-gnulib.config} as soon as someone tells you they checked
in a bugfix and you set @var{gnulib_version} to today's date, there
will be some file version instability for the rest of the day.

@end table

@node Documentation
@chapter Documentation

The findutils CVS tree includes several different types of
documentation.

@section User Documentation
User-oriented documentation is provided as manual pages and in
Texinfo.  @xref{Introduction,,Introduction,find,The Findutils manual}.

Please make sure both sets of documentation are updated if you make a
change to the code.  The GNU coding standards do not normally call for
maintaining manual pages, on the grounds that this duplicates effort.
However, the manual page format is more convenient for quick
reference, and so it's worth maintaining both types of documentation.
The manual pages are normally rather more terse than the Texinfo
documentation: they are suitable for reference use, while the Texinfo
manual should also include introductory and tutorial material.


@section Build Guidance

@table @file
@item ABOUT-NLS
Describes the Free Translation Project, the translation status of
various GNU projects, and how to participate by translating an
application.
@item AUTHORS
Lists the authors of findutils.
@item COPYING
The copyright license covering findutils; currently, the GNU GPL,
version 3.
@item INSTALL
Generic installation instructions for installing GNU programs.
@item README
Information about how to compile findutils in particular.
@item README-alpha
A README file which is included with testing releases of findutils.
@item README-CVS
Describes how to build findutils from the code in CVS.
@item THANKS
Thanks to people who contributed to findutils.  Generally, if
someone's contribution was significant enough to need a copyright
assignment, their name should go in here.
@item TODO
Mainly obsolete.
@end table


@section Release Information
@table @file
@item NEWS
Enumerates the user-visible changes in each release.  Typical changes
are fixed bugs, functionality changes and documentation changes.
@item ChangeLog
This file enumerates all changes to the findutils source code (with
the possible exception of @file{.cvsignore} and @file{.gitignore}
changes).  The level of detail used for this file should be sufficient
to answer the questions ``what changed?'' and ``why was it changed?''.
If a change fixes a bug, always give the bug reference number in both
the @file{ChangeLog} and @file{NEWS} files and of course also in the
checkin message.  In general, it should be possible to enumerate all
material changes to a function by searching for its name in
@file{ChangeLog}.
@end table

@node Testing
@chapter Testing
This chapter will explain the general procedures for adding tests to
the test suite, and the functions defined in the findutils-specific
DejaGnu configuration.  Where appropriate, references will be made to
the DejaGnu documentation.

@node Bugs
@chapter Bugs

Bugs are logged in the Savannah bug tracker
@url{http://savannah.gnu.org/bugs/?group=findutils}.  The tracker
offers several fields but their use is largely obvious.  The
life-cycle of a bug is like this:


@table @asis
@item Open
Someone, usually a maintainer, a distribution maintainer or a user,
creates a bug by filling in the form.  They fill in field values as
they see fit.  This will generate an email to
@email{bug-findutils@@gnu.org}.

@item Triage
The bug hangs around with @samp{Status=None} until someone begins to
work on it.  At that point they set the @samp{Assigned To} field and
will sometimes set the status to @samp{In Progress}, especially if the
bug will take a while to fix.

@item Non-bugs
Quite a lot of reports are not actually bugs; for these the usual
procedure is to explain why the problem is not a bug, set the status
to @samp{Invalid} and close the bug.  Make sure you set the
@samp{Assigned To} field to yourself before closing the bug.

@item Fixing
When you commit a bug fix into CVS (or in the case of a contributed
patch, commit the change), mark the bug as @samp{Fixed}.  Make sure
you include a new test case where this is relevant.  If you can figure
out which releases are affected, please also set the @samp{Release}
field to the earliest release which is affected by the bug.
Indicate which source branch the fix is included in (for example,
4.2.x or 4.3.x).  Don't close the bug yet.

@item Release
When a release is made which includes the bug fix, make sure the bug
is listed in the @file{NEWS} file.  Once the release is made, fill in
the @samp{Fixed Release} field and close the bug.
@end table

@node Distributions
@chapter Distributions
Almost all GNU/Linux distributions include findutils, but only some of
them have a package maintainer who is a member of the mailing list.
Distributions don't often feed back patches to the
@email{bug-findutils@@gnu.org} list, but on the other hand many of
their patches relate only to standards for file locations and so
forth, and are therefore distribution-specific.  On an irregular basis
I check the current patches being used by one or two distributions,
but the total number of GNU/Linux distributions is large enough that
we could not hope to cover them all.

Often, bugs are raised against a distribution's bug tracker instead of
GNU's.  Periodically (about every six months) I look through some of
the more accessible distribution bug trackers and note which bugs have
been fixed upstream.

Many distributions include both findutils and the slocate package,
which provides a replacement @code{locate}.

@node Internationalisation
@chapter Internationalisation
Translation is essentially automated from the maintainer's point of
view.  The Translation Project (TP) mails the maintainer when a new
@file{.po} file is available; we just download it and check it into
the CVS repository.  For more information, please see
@url{http://www.iro.umontreal.ca/translation/HTML/domain-findutils.html}.

@node Security
@chapter Security

See @ref{Security Considerations, ,Security Considerations,find,The
Findutils manual}, for a full description of the findutils approach to
security considerations and discussion of particular tools.

If someone reports a security bug publicly, we should fix it as
rapidly as possible.  If necessary, this can mean issuing a fixed
release containing just the one bug fix.  We try to avoid issuing
releases which include both significant security fixes and functional
changes.

Where someone reports a security problem privately, we generally try
to construct and test a patch without checking the intermediate code
in.  Once everything has been tested, we can commit the patch and
immediately make a release.  This avoids situations where people
watching for CVS commits can figure out and exploit a security problem
before a fixed release is available.

It's important that security problems be fixed promptly, but don't
rush so much that things go wrong.  Make sure the new release really
fixes the problem.  It's usually best not to include functional
changes in your security-fix release.

If the security problem is serious, send an alert to
@email{vendor-sec@@lst.de}.  The members of the list include most
GNU/Linux distributions.  The point of doing this is to allow them to
prepare to release your security fix to their customers, once the fix
becomes available.  Here is an example alert:

@smallexample
GNU findutils heap buffer overrun (potential privilege escalation)

$Revision: 1.2 $; $Date: 2007/07/02 08:25:42 $


I. BACKGROUND
=============

GNU findutils is a set of programs which search for files on Unix-like
systems.  It is maintained by the GNU Project of the Free Software
Foundation.  For more information, see
@url{http://www.gnu.org/software/findutils}.


II. DESCRIPTION
===============

When GNU locate reads filenames from an old-format locate database,
they are read into a fixed-length buffer allocated on the heap.
Filenames longer than the 1026-byte buffer can cause a buffer overrun.
The overrunning data can be chosen by any person able to control the
names of files created on the local system.  This will normally
include all local users, but in many cases also remote users (for
example, in the case of FTP servers allowing uploads).

III. ANALYSIS
=============

Findutils supports three different formats of locate database: its
native format "LOCATE02", the slocate variant of LOCATE02, and a
traditional ("old") format that locate uses on other Unix systems.

When locate reads filenames from a LOCATE02 database (the default
format), the buffer into which data is read is automatically extended
to accommodate the length of the filenames.

This automatic buffer extension does not happen for old-format
databases.  Instead, a fixed 1026-byte buffer is used.  When a longer
pathname appears in the locate database, the end of this buffer is
overrun.  The buffer is allocated on the heap (not the stack).

If the locate database is in the default LOCATE02 format, the locate
program does perform automatic buffer extension, and the program is
not vulnerable to this problem.  The software used to build the
old-format locate database is not itself vulnerable to the same
attack.

Most installations of GNU findutils do not use the old database
format, and so will not be vulnerable.


IV. DETECTION
=============

Software
--------
All existing releases of findutils are affected.


Installations
-------------

To discover the longest path name on a given system, you can use the
following command (requires GNU findutils and GNU coreutils):

@verbatim
find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
@end verbatim

V. EXAMPLE
==========

This section includes a shell script which determines which of a list
of locate binaries is vulnerable to the problem.  The shell script has
been tested only on glibc-based systems having a mktemp binary.

NOTE: This script deliberately overruns the buffer in order to
determine if a binary is affected.  Therefore, running it on your
system may have undesirable effects.  We recommend that you read the
script before running it.

@verbatim
#! /bin/sh
set +m
if vanilla_db="$(mktemp nicedb.XXXXXX)" ; then
    if updatedb --prunepaths="" --old-format --localpaths="/tmp" \
        --output="${vanilla_db}" ; then
        true
    else
        rm -f "${vanilla_db}"
        vanilla_db=""
        echo "Failed to create old-format locate database; skipping the sanity checks" >&2
    fi
fi

make_overrun_db() {
    # Start with a valid database
    cat "${vanilla_db}"
    # Make the final entry really long
    dd if=/dev/zero  bs=1 count=1500 2>/dev/null | tr '\000' 'x'
}

ulimit -c 0

usage() { echo "usage: $0 binary [binary...]" >&2; exit $1; }
[ $# -eq 0 ] && usage 1

bad=""
good=""
ugly=""
if dbfile="$(mktemp nasty.XXXXXX)"
then
    make_overrun_db > "$dbfile"
    for locate ; do
      ver="$locate = $("$locate"  --version | head -1)"
      if [ -z "$vanilla_db" ] || "$locate" -d "$vanilla_db" "" >/dev/null ; then
          "$locate" -d "$dbfile" "" >/dev/null
          if [ $? -gt 128 ] ; then
              bad="$bad
vulnerable: $ver"
          else
              good="$good
good: $ver"
          fi
       else
          # the regular locate failed
          ugly="$ugly
buggy, may or may not be vulnerable: $ver"
       fi
    done
    rm -f "${dbfile}" "${vanilla_db}"
    # good: unaffected.  bad: affected (vulnerable).
    # ugly: doesn't even work for a normal old-format database.
    echo "$good"
    echo "$bad"
    echo "$ugly"
else
  exit 1
fi
@end verbatim


VI. VENDOR RESPONSE
===================

The GNU project discovered the problem while 'locate' was being worked
on; this is the first public announcement of the problem.

The GNU findutils maintainer has issued a patch as part of this
announcement.  The patch appears below.

A source release of findutils-4.2.31 will be issued on 2007-05-30.
That release will of course include the patch.  The patch will be
committed to the public CVS repository at the same time.  Public
announcements of the release, including a description of the bug, will
be made at the same time as the release.

A release of findutils-4.3.x will follow and will also include the
patch.


VII. PATCH
==========

This patch should apply to findutils-4.2.23 and later.
Findutils-4.2.23 was released almost two years ago.
@verbatim
Index: locate/locate.c
===================================================================
RCS file: /cvsroot/findutils/findutils/locate/locate.c,v
retrieving revision 1.58.2.2
diff -u -p -r1.58.2.2 locate.c
--- locate/locate.c     22 Apr 2007 16:57:42 -0000      1.58.2.2
+++ locate/locate.c     28 May 2007 10:18:16 -0000
@@ -124,9 +124,9 @@ extern int errno;

 #include "locatedb.h"
 #include <getline.h>
-#include "../gnulib/lib/xalloc.h"
-#include "../gnulib/lib/error.h"
-#include "../gnulib/lib/human.h"
+#include "xalloc.h"
+#include "error.h"
+#include "human.h"
 #include "dirname.h"
 #include "closeout.h"
 #include "nextelem.h"
@@ -468,10 +468,36 @@ visit_justprint_unquoted(struct process_
   return VISIT_CONTINUE;
 }

+static void
+toolong (struct process_data *procdata)
+{
+  error (1, 0,
+        _("locate database %s contains a "
+          "filename longer than locate can handle"),
+        procdata->dbfile);
+}
+
+static void
+extend (struct process_data *procdata, size_t siz1, size_t siz2)
+{
+  /* Figure out if the addition operation is safe before performing it. */
+  if (SIZE_MAX - siz1 < siz2)
+    {
+      toolong (procdata);
+    }
+  else if (procdata->pathsize < (siz1+siz2))
+    {
+      procdata->pathsize = siz1+siz2;
+      procdata->original_filename = x2nrealloc (procdata->original_filename,
+                                               &procdata->pathsize,
+                                               1);
+    }
+}
+
 static int
 visit_old_format(struct process_data *procdata, void *context)
 {
-  register char *s;
+  register size_t i;
   (void) context;

   /* Get the offset in the path where this path info starts.  */
@@ -479,20 +505,35 @@ visit_old_format(struct process_data *pr
     procdata->count += getw (procdata->fp) - LOCATEDB_OLD_OFFSET;
   else
     procdata->count += procdata->c - LOCATEDB_OLD_OFFSET;
+  assert(procdata->count > 0);

-  /* Overlay the old path with the remainder of the new.  */
-  for (s = procdata->original_filename + procdata->count;
+  /* Overlay the old path with the remainder of the new.  Read
+   * more data until we get to the next filename.
+   */
+  for (i=procdata->count;
        (procdata->c = getc (procdata->fp)) > LOCATEDB_OLD_ESCAPE;)
-    if (procdata->c < 0200)
-      *s++ = procdata->c;              /* An ordinary character.  */
-    else
-      {
-       /* Bigram markers have the high bit set. */
-       procdata->c &= 0177;
-       *s++ = procdata->bigram1[procdata->c];
-       *s++ = procdata->bigram2[procdata->c];
-      }
-  *s-- = '\0';
+    {
+      if (procdata->c < 0200)
+       {
+         /* An ordinary character. */
+         extend (procdata, i, 1u);
+         procdata->original_filename[i++] = procdata->c;
+       }
+      else
+       {
+         /* Bigram markers have the high bit set. */
+         extend (procdata, i, 2u);
+         procdata->c &= 0177;
+         procdata->original_filename[i++] = procdata->bigram1[procdata->c];
+         procdata->original_filename[i++] = procdata->bigram2[procdata->c];
+       }
+    }
+
+  /* Consider the case where we executed the loop body zero times; we
+   * still need space for the terminating null byte.
+   */
+  extend (procdata, i, 1u);
+  procdata->original_filename[i] = 0;

   procdata->munged_filename = procdata->original_filename;
@end verbatim


VIII. THANKS
============

Thanks to Rob Holland <rob@@inversepath.com> and Tavis Ormandy.


IX. CVE INFORMATION
===================

No CVE candidate number has yet been assigned for this vulnerability.
If someone provides one, I will include it in the public announcement
and change logs.
@end smallexample

The original announcement above was sent out with a cleartext PGP
signature, of course, but that has been omitted from the example.
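
The core of the fix is the @code{extend} function in the patch above:
it tests @code{SIZE_MAX - siz1 < siz2} before computing
@code{siz1 + siz2}, because the subtraction can never wrap around
while the addition can.  Here is a minimal, self-contained C sketch of
the same check-then-grow pattern.  It is not findutils code: the
@code{struct growbuf} type and the @code{growbuf_reserve} and
@code{growbuf_append} functions are hypothetical names, and plain
@code{realloc} with a doubling policy stands in for gnulib's
@code{x2nrealloc}.

@verbatim
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* realloc, free */
#include <string.h>   /* memcpy */

/* Hypothetical growable byte buffer, for illustration only.  */
struct growbuf
{
  char *data;     /* heap storage, or NULL before first use */
  size_t size;    /* number of bytes allocated */
};

/* Make room for USED + EXTRA bytes.  Returns 0 on success, or -1 if
   that total would overflow size_t or the allocation fails.  */
int
growbuf_reserve (struct growbuf *b, size_t used, size_t extra)
{
  size_t need, newsize;
  char *p;

  /* SIZE_MAX - used cannot wrap around, but used + extra could,
     so test the safe expression first (as extend() does).  */
  if (SIZE_MAX - used < extra)
    return -1;
  need = used + extra;
  if (need <= b->size)
    return 0;

  /* Grow geometrically; fall back to the exact size near SIZE_MAX.  */
  newsize = b->size ? b->size : 16;
  while (newsize < need)
    newsize = (newsize > SIZE_MAX / 2) ? need : newsize * 2;

  p = realloc (b->data, newsize);
  if (p == NULL)
    return -1;
  b->data = p;
  b->size = newsize;
  return 0;
}

/* Append LEN bytes from SRC at offset *USED, growing first.  */
int
growbuf_append (struct growbuf *b, size_t *used, const char *src,
                size_t len)
{
  if (len == 0)
    return 0;
  if (growbuf_reserve (b, *used, len) != 0)
    return -1;
  memcpy (b->data + *used, src, len);
  *used += len;
  return 0;
}
@end verbatim

A caller that gets @code{-1} back must stop instead of writing past
the end of the buffer; in the patch, the corresponding path is the
call to @code{toolong}, which exits via @code{error}.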

Once a fixed release is available, announce the new release using the
normal channels.  Any CVE number assigned for the problem should be
included in the @file{ChangeLog} and @file{NEWS} entries. See
@url{http://cve.mitre.org/} for an explanation of CVE numbers.


@node Making Releases
@chapter Making Releases
This chapter will explain how to make a findutils release.  For the
time being, here is a terse description of the main steps:

@enumerate
@item Commit changes; make sure your working directory has no
uncommitted changes.
@item Test; make sure that all changes you have made have tests, and
that the tests pass.  Verify this with @code{make distcheck}.
@item Bugs; make sure that all Savannah bug entries fixed in this
release are marked as @samp{Fixed}.
@item NEWS; make sure that the @file{NEWS} and @file{configure.in}
files are updated with the new release number (and checked in).
@item Build the release tarball; do this with @code{make distcheck}.
Copy the tarball somewhere safe.
@item Tag the release; findutils releases are tagged in CVS as
@samp{FINDUTILS_x_y_z-1}.  For example, the tag for findutils release
4.3.8 is @samp{FINDUTILS_4_3_8-1}.
@item Prepare the upload and upload it.
@xref{Automated FTP Uploads, ,Automated FTP
Uploads, maintain, Information for Maintainers of GNU Software},
for detailed upload instructions.
@item Make a release announcement; include an extract from the
@file{NEWS} file which explains what's changed.  Announcements for
test releases should just go to @email{bug-findutils@@gnu.org}.
Announcements for stable releases should go to
@email{info-gnu@@gnu.org} as well.
@item Bump the release numbers in CVS; edit the @file{configure.in}
and @file{NEWS} files to advance the release numbers.  For example,
if you have just released @samp{4.6.2}, bump the release number to
@samp{4.6.3-CVS}.  The point of the @samp{-CVS} suffix here is that a
findutils binary built from CVS will bear a release number indicating
it's not built from the ``official'' source release.
@item Close bugs; any bugs recorded on Savannah which were fixed in this
release should now be marked as closed.  Update the @samp{Fixed
Release} field of these bugs appropriately and make sure the
@samp{Assigned to} field is populated.
@end enumerate
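
The tag name and the post-release version number in the steps above
follow fixed patterns, so they can be derived mechanically.  The
following C sketch is not part of findutils; it assumes release
numbers of the form @samp{x.y.z} with decimal components, as in the
@samp{4.3.8} and @samp{4.6.2} examples, and @code{parse_version},
@code{release_tag} and @code{next_cvs_version} are hypothetical names.

@verbatim
#include <stdio.h>    /* sscanf, snprintf */
#include <string.h>   /* strspn, strlen */

/* Hypothetical helper: split "x.y.z" into three numbers.
   Returns 0 on success, -1 otherwise.  */
static int
parse_version (const char *version, unsigned int *x, unsigned int *y,
               unsigned int *z)
{
  char extra;

  /* %u alone would also accept signs and leading blanks; allow only
     digits and dots.  */
  if (strspn (version, "0123456789.") != strlen (version))
    return -1;
  /* A successful fourth conversion means there was trailing text.  */
  return sscanf (version, "%u.%u.%u%c", x, y, z, &extra) == 3 ? 0 : -1;
}

/* "4.3.8" -> "FINDUTILS_4_3_8-1".  Returns 0 on success, or -1 for a
   malformed version or if OUT is too small.  */
int
release_tag (const char *version, char *out, size_t outsize)
{
  unsigned int x, y, z;
  int n;

  if (parse_version (version, &x, &y, &z) != 0)
    return -1;
  n = snprintf (out, outsize, "FINDUTILS_%u_%u_%u-1", x, y, z);
  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}

/* "4.6.2" -> "4.6.3-CVS", the number to use after a release.
   Same return convention as release_tag.  */
int
next_cvs_version (const char *version, char *out, size_t outsize)
{
  unsigned int x, y, z;
  int n;

  if (parse_version (version, &x, &y, &z) != 0)
    return -1;
  n = snprintf (out, outsize, "%u.%u.%u-CVS", x, y, z + 1);
  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
@end verbatim

For example, @code{release_tag ("4.3.8", buf, sizeof buf)} stores
@samp{FINDUTILS_4_3_8-1} in @code{buf}, and
@code{next_cvs_version ("4.6.2", buf, sizeof buf)} stores
@samp{4.6.3-CVS}.  Anything that is not exactly three dot-separated
numbers, such as @samp{4.3.8-CVS}, is rejected rather than turned into
a plausible but wrong tag.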


@bye

@comment texi related words used by Emacs' spell checker ispell.el

@comment LocalWords: texinfo setfilename settitle setchapternewpage
@comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
@comment LocalWords: filll dir samp dfn noindent xref pxref
@comment LocalWords: var deffn texi deffnx itemx emph asis
@comment LocalWords: findex smallexample subsubsection cindex
@comment LocalWords: dircategory direntry itemize

@comment other words used by Emacs' spell checker ispell.el
@comment LocalWords: README fred updatedb xargs Plett Rendell akefile
@comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
@comment LocalWords: ipath regex iregex expr fubar regexps
@comment LocalWords: metacharacters macs sr sc inode lname ilname
@comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
@comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
@comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
@comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
@comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
@comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
@comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
@comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
@comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
@comment LocalWords: bigrams cd chmod comp crc CVS dbfile dum eof
@comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
@comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
@comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
@comment LocalWords: ois ok Pinard printindex proc procs prunefs
@comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
@comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
@comment LocalWords: wildcard zlogout basename execdir wholename iwholename
@comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX