manual/pattern.texi

   1 @node Pattern Matching, I/O Overview, Searching and Sorting, Top
   2 @c %MENU% Matching shell ``globs'' and regular expressions
   3 @chapter Pattern Matching
   4
   5 @Theglibc{} provides pattern matching facilities for two kinds of
   6 patterns: regular expressions and file-name wildcards.  The library also
   7 provides a facility for expanding variable and command references and
   8 parsing text into words in the way the shell does.
   9
  10 @menu
  11 * Wildcard Matching::    Matching a wildcard pattern against a single string.
  12 * Globbing::             Finding the files that match a wildcard pattern.
  13 * Regular Expressions::  Matching regular expressions against strings.
  14 * Word Expansion::       Expanding shell variables, nested commands,
  15                             arithmetic, and wildcards.
  16                             This is what the shell does with shell commands.
  17 @end menu
  18
  19 @node Wildcard Matching
  20 @section Wildcard Matching
  21
  22 @pindex fnmatch.h
  23 This section describes how to match a wildcard pattern against a
  24 particular string.  The result is a yes or no answer: does the
  25 string fit the pattern or not.  The symbols described here are all
  26 declared in @file{fnmatch.h}.
  27
  28 @comment fnmatch.h
  29 @comment POSIX.2
  30 @deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags})
  31 This function tests whether the string @var{string} matches the pattern
  32 @var{pattern}.  It returns @code{0} if they do match; otherwise, it
  33 returns the nonzero value @code{FNM_NOMATCH}.  The arguments
  34 @var{pattern} and @var{string} are both strings.
  35
  36 The argument @var{flags} is a combination of flag bits that alter the
  37 details of matching.  See below for a list of the defined flags.
  38
  39 In @theglibc{}, @code{fnmatch} cannot experience an ``error''---it
  40 always returns an answer for whether the match succeeds.  However, other
  41 implementations of @code{fnmatch} might sometimes report ``errors''.
  42 They would do so by returning nonzero values that are not equal to
  43 @code{FNM_NOMATCH}.
  44 @end deftypefun
  45
  46 These are the available flags for the @var{flags} argument:
  47
  48 @table @code
  49 @comment fnmatch.h
  50 @comment GNU
  51 @item FNM_FILE_NAME
  52 Treat the @samp{/} character specially, for matching file names.  If
  53 this flag is set, wildcard constructs in @var{pattern} cannot match
  54 @samp{/} in @var{string}.  Thus, the only way to match @samp{/} is with
  55 an explicit @samp{/} in @var{pattern}.
  56
  57 @comment fnmatch.h
  58 @comment POSIX.2
  59 @item FNM_PATHNAME
  60 This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2.  We
  61 don't recommend this name because we don't use the term ``pathname'' for
  62 file names.
  63
  64 @comment fnmatch.h
  65 @comment POSIX.2
  66 @item FNM_PERIOD
  67 Treat the @samp{.} character specially if it appears at the beginning of
  68 @var{string}.  If this flag is set, wildcard constructs in @var{pattern}
  69 cannot match @samp{.} as the first character of @var{string}.
  70
  71 If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the
  72 special treatment applies to @samp{.} following @samp{/} as well as to
  73 @samp{.} at the beginning of @var{string}.  (The shell uses the
  74 @code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching
  75 file names.)
  76
  77 @comment fnmatch.h
  78 @comment POSIX.2
  79 @item FNM_NOESCAPE
  80 Don't treat the @samp{\} character specially in patterns.  Normally,
  81 @samp{\} quotes the following character, turning off its special meaning
  82 (if any) so that it matches only itself.  When quoting is enabled, the
  83 pattern @samp{\?} matches only the string @samp{?}, because the question
  84 mark in the pattern acts like an ordinary character.
  85
  86 If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character.
  87
  88 @comment fnmatch.h
  89 @comment GNU
  90 @item FNM_LEADING_DIR
  91 Ignore a trailing sequence of characters starting with a @samp{/} in
  92 @var{string}; that is to say, test whether @var{string} starts with a
  93 directory name that @var{pattern} matches.
  94
  95 If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern
  96 would match the string @samp{foobar/frobozz}.
  97
  98 @comment fnmatch.h
  99 @comment GNU
 100 @item FNM_CASEFOLD
 101 Ignore case in comparing @var{string} to @var{pattern}.
 102
 103 @comment fnmatch.h
 104 @comment GNU
 105 @item FNM_EXTMATCH
 106 @cindex Korn Shell
 107 @pindex ksh
 108 Recognize beside the normal patterns also the extended patterns
 109 introduced in @file{ksh}.  The patterns are written in the form
 110 explained in the following table where @var{pattern-list} is a @code{|}
 111 separated list of patterns.
 112
 113 @table @code
 114 @item ?(@var{pattern-list})
 115 The pattern matches if zero or one occurrences of any of the patterns
 116 in the @var{pattern-list} allow matching the input string.
 117
 118 @item *(@var{pattern-list})
 119 The pattern matches if zero or more occurrences of any of the patterns
 120 in the @var{pattern-list} allow matching the input string.
 121
 122 @item +(@var{pattern-list})
 123 The pattern matches if one or more occurrences of any of the patterns
 124 in the @var{pattern-list} allow matching the input string.
 125
 126 @item @@(@var{pattern-list})
 127 The pattern matches if exactly one occurrence of any of the patterns in
 128 the @var{pattern-list} allows matching the input string.
 129
 130 @item !(@var{pattern-list})
 131 The pattern matches if the input string cannot be matched with any of
 132 the patterns in the @var{pattern-list}.
 133 @end table
 134 @end table
 135
 136 @node Globbing
 137 @section Globbing
 138
 139 @cindex globbing
 140 The archetypal use of wildcards is for matching against the files in a
 141 directory, and making a list of all the matches.  This is called
 142 @dfn{globbing}.
 143
 144 You could do this using @code{fnmatch}, by reading the directory entries
 145 one by one and testing each one with @code{fnmatch}.  But that would be
 146 slow (and complex, since you would have to handle subdirectories by
 147 hand).
 148
 149 The library provides a function @code{glob} to make this particular use
 150 of wildcards convenient.  @code{glob} and the other symbols in this
 151 section are declared in @file{glob.h}.
 152
 153 @menu
 154 * Calling Glob::             Basic use of @code{glob}.
 155 * Flags for Globbing::       Flags that enable various options in @code{glob}.
 156 * More Flags for Globbing::  GNU specific extensions to @code{glob}.
 157 @end menu
 158
 159 @node Calling Glob
 160 @subsection Calling @code{glob}
 161
 162 The result of globbing is a vector of file names (strings).  To return
 163 this vector, @code{glob} uses a special data type, @code{glob_t}, which
 164 is a structure.  You pass @code{glob} the address of the structure, and
 165 it fills in the structure's fields to tell you about the results.
 166
 167 @comment glob.h
 168 @comment POSIX.2
 169 @deftp {Data Type} glob_t
 170 This data type holds a pointer to a word vector.  More precisely, it
 171 records both the address of the word vector and its size.  The GNU
 172 implementation contains some more fields which are non-standard
 173 extensions.
 174
 175 @table @code
 176 @item gl_pathc
 177 The number of elements in the vector, excluding the initial null entries
 178 if the GLOB_DOOFFS flag is used (see gl_offs below).
 179
 180 @item gl_pathv
 181 The address of the vector.  This field has type @w{@code{char **}}.
 182
 183 @item gl_offs
 184 The offset of the first real element of the vector, from its nominal
 185 address in the @code{gl_pathv} field.  Unlike the other fields, this
 186 is always an input to @code{glob}, rather than an output from it.
 187
 188 If you use a nonzero offset, then that many elements at the beginning of
 189 the vector are left empty.  (The @code{glob} function fills them with
 190 null pointers.)
 191
 192 The @code{gl_offs} field is meaningful only if you use the
 193 @code{GLOB_DOOFFS} flag.  Otherwise, the offset is always zero
 194 regardless of what is in this field, and the first real element comes at
 195 the beginning of the vector.
 196
 197 @item gl_closedir
 198 The address of an alternative implementation of the @code{closedir}
 199 function.  It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
 200 the flag parameter.  The type of this field is
 201 @w{@code{void (*) (void *)}}.
 202
 203 This is a GNU extension.
 204
 205 @item gl_readdir
 206 The address of an alternative implementation of the @code{readdir}
 207 function used to read the contents of a directory.  It is used if the
 208 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 209 this field is @w{@code{struct dirent *(*) (void *)}}.
 210
 211 This is a GNU extension.
 212
 213 @item gl_opendir
 214 The address of an alternative implementation of the @code{opendir}
 215 function.  It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
 216 the flag parameter.  The type of this field is
 217 @w{@code{void *(*) (const char *)}}.
 218
 219 This is a GNU extension.
 220
 221 @item gl_stat
 222 The address of an alternative implementation of the @code{stat} function
 223 to get information about an object in the filesystem.  It is used if the
 224 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 225 this field is @w{@code{int (*) (const char *, struct stat *)}}.
 226
 227 This is a GNU extension.
 228
 229 @item gl_lstat
 230 The address of an alternative implementation of the @code{lstat}
 231 function to get information about an object in the filesystems, not
 232 following symbolic links.  It is used if the @code{GLOB_ALTDIRFUNC} bit
 233 is set in the flag parameter.  The type of this field is @code{@w{int
 234 (*) (const char *,} @w{struct stat *)}}.
 235
 236 This is a GNU extension.
 237
 238 @item gl_flags
 239 The flags used when @code{glob} was called.  In addition, @code{GLOB_MAGCHAR}
 240 might be set.  See @ref{Flags for Globbing} for more details.
 241
 242 This is a GNU extension.
 243 @end table
 244 @end deftp
 245
 246 For use in the @code{glob64} function @file{glob.h} contains another
 247 definition for a very similar type.  @code{glob64_t} differs from
 248 @code{glob_t} only in the types of the members @code{gl_readdir},
 249 @code{gl_stat}, and @code{gl_lstat}.
 250
 251 @comment glob.h
 252 @comment GNU
 253 @deftp {Data Type} glob64_t
 254 This data type holds a pointer to a word vector.  More precisely, it
 255 records both the address of the word vector and its size.  The GNU
 256 implementation contains some more fields which are non-standard
 257 extensions.
 258
 259 @table @code
 260 @item gl_pathc
 261 The number of elements in the vector, excluding the initial null entries
 262 if the GLOB_DOOFFS flag is used (see gl_offs below).
 263
 264 @item gl_pathv
 265 The address of the vector.  This field has type @w{@code{char **}}.
 266
 267 @item gl_offs
 268 The offset of the first real element of the vector, from its nominal
 269 address in the @code{gl_pathv} field.  Unlike the other fields, this
 270 is always an input to @code{glob}, rather than an output from it.
 271
 272 If you use a nonzero offset, then that many elements at the beginning of
 273 the vector are left empty.  (The @code{glob} function fills them with
 274 null pointers.)
 275
 276 The @code{gl_offs} field is meaningful only if you use the
 277 @code{GLOB_DOOFFS} flag.  Otherwise, the offset is always zero
 278 regardless of what is in this field, and the first real element comes at
 279 the beginning of the vector.
 280
 281 @item gl_closedir
 282 The address of an alternative implementation of the @code{closedir}
 283 function.  It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
 284 the flag parameter.  The type of this field is
 285 @w{@code{void (*) (void *)}}.
 286
 287 This is a GNU extension.
 288
 289 @item gl_readdir
 290 The address of an alternative implementation of the @code{readdir64}
 291 function used to read the contents of a directory.  It is used if the
 292 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 293 this field is @w{@code{struct dirent64 *(*) (void *)}}.
 294
 295 This is a GNU extension.
 296
 297 @item gl_opendir
 298 The address of an alternative implementation of the @code{opendir}
 299 function.  It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
 300 the flag parameter.  The type of this field is
 301 @w{@code{void *(*) (const char *)}}.
 302
 303 This is a GNU extension.
 304
 305 @item gl_stat
 306 The address of an alternative implementation of the @code{stat64} function
 307 to get information about an object in the filesystem.  It is used if the
 308 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 309 this field is @w{@code{int (*) (const char *, struct stat64 *)}}.
 310
 311 This is a GNU extension.
 312
 313 @item gl_lstat
 314 The address of an alternative implementation of the @code{lstat64}
 315 function to get information about an object in the filesystems, not
 316 following symbolic links.  It is used if the @code{GLOB_ALTDIRFUNC} bit
 317 is set in the flag parameter.  The type of this field is @code{@w{int
 318 (*) (const char *,} @w{struct stat64 *)}}.
 319
 320 This is a GNU extension.
 321
 322 @item gl_flags
 323 The flags used when @code{glob} was called.  In addition, @code{GLOB_MAGCHAR}
 324 might be set.  See @ref{Flags for Globbing} for more details.
 325
 326 This is a GNU extension.
 327 @end table
 328 @end deftp
 329
 330 @comment glob.h
 331 @comment POSIX.2
 332 @deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr})
 333 The function @code{glob} does globbing using the pattern @var{pattern}
 334 in the current directory.  It puts the result in a newly allocated
 335 vector, and stores the size and address of this vector into
 336 @code{*@var{vector-ptr}}.  The argument @var{flags} is a combination of
 337 bit flags; see @ref{Flags for Globbing}, for details of the flags.
 338
 339 The result of globbing is a sequence of file names.  The function
 340 @code{glob} allocates a string for each resulting word, then
 341 allocates a vector of type @code{char **} to store the addresses of
 342 these strings.  The last element of the vector is a null pointer.
 343 This vector is called the @dfn{word vector}.
 344
 345 To return this vector, @code{glob} stores both its address and its
 346 length (number of elements, not counting the terminating null pointer)
 347 into @code{*@var{vector-ptr}}.
 348
 349 Normally, @code{glob} sorts the file names alphabetically before
 350 returning them.  You can turn this off with the flag @code{GLOB_NOSORT}
 351 if you want to get the information as fast as possible.  Usually it's
 352 a good idea to let @code{glob} sort them---if you process the files in
 353 alphabetical order, the users will have a feel for the rate of progress
 354 that your application is making.
 355
 356 If @code{glob} succeeds, it returns 0.  Otherwise, it returns one
 357 of these error codes:
 358
 359 @vtable @code
 360 @comment glob.h
 361 @comment POSIX.2
 362 @item GLOB_ABORTED
 363 There was an error opening a directory, and you used the flag
 364 @code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero
 365 value.
 366 @iftex
 367 See below
 368 @end iftex
 369 @ifinfo
 370 @xref{Flags for Globbing},
 371 @end ifinfo
 372 for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}.
 373
 374 @comment glob.h
 375 @comment POSIX.2
 376 @item GLOB_NOMATCH
 377 The pattern didn't match any existing files.  If you use the
 378 @code{GLOB_NOCHECK} flag, then you never get this error code, because
 379 that flag tells @code{glob} to @emph{pretend} that the pattern matched
 380 at least one file.
 381
 382 @comment glob.h
 383 @comment POSIX.2
 384 @item GLOB_NOSPACE
 385 It was impossible to allocate memory to hold the result.
 386 @end vtable
 387
 388 In the event of an error, @code{glob} stores information in
 389 @code{*@var{vector-ptr}} about all the matches it has found so far.
 390
 391 It is important to notice that the @code{glob} function will not fail if
 392 it encounters directories or files which cannot be handled without the
 393 LFS interfaces.  The implementation of @code{glob} is supposed to use
 394 these functions internally.  This at least is the assumptions made by
 395 the Unix standard.  The GNU extension of allowing the user to provide
 396 own directory handling and @code{stat} functions complicates things a
 397 bit.  If these callback functions are used and a large file or directory
 398 is encountered @code{glob} @emph{can} fail.
 399 @end deftypefun
 400
 401 @comment glob.h
 402 @comment GNU
 403 @deftypefun int glob64 (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob64_t *@var{vector-ptr})
 404 The @code{glob64} function was added as part of the Large File Summit
 405 extensions but is not part of the original LFS proposal.  The reason for
 406 this is simple: it is not necessary.  The necessity for a @code{glob64}
 407 function is added by the extensions of the GNU @code{glob}
 408 implementation which allows the user to provide own directory handling
 409 and @code{stat} functions.  The @code{readdir} and @code{stat} functions
 410 do depend on the choice of @code{_FILE_OFFSET_BITS} since the definition
 411 of the types @code{struct dirent} and @code{struct stat} will change
 412 depending on the choice.
 413
 414 Beside this difference the @code{glob64} works just like @code{glob} in
 415 all aspects.
 416
 417 This function is a GNU extension.
 418 @end deftypefun
 419
 420 @node Flags for Globbing
 421 @subsection Flags for Globbing
 422
 423 This section describes the flags that you can specify in the
 424 @var{flags} argument to @code{glob}.  Choose the flags you want,
 425 and combine them with the C bitwise OR operator @code{|}.
 426
 427 @vtable @code
 428 @comment glob.h
 429 @comment POSIX.2
 430 @item GLOB_APPEND
 431 Append the words from this expansion to the vector of words produced by
 432 previous calls to @code{glob}.  This way you can effectively expand
 433 several words as if they were concatenated with spaces between them.
 434
 435 In order for appending to work, you must not modify the contents of the
 436 word vector structure between calls to @code{glob}.  And, if you set
 437 @code{GLOB_DOOFFS} in the first call to @code{glob}, you must also
 438 set it when you append to the results.
 439
 440 Note that the pointer stored in @code{gl_pathv} may no longer be valid
 441 after you call @code{glob} the second time, because @code{glob} might
 442 have relocated the vector.  So always fetch @code{gl_pathv} from the
 443 @code{glob_t} structure after each @code{glob} call; @strong{never} save
 444 the pointer across calls.
 445
 446 @comment glob.h
 447 @comment POSIX.2
 448 @item GLOB_DOOFFS
 449 Leave blank slots at the beginning of the vector of words.
 450 The @code{gl_offs} field says how many slots to leave.
 451 The blank slots contain null pointers.
 452
 453 @comment glob.h
 454 @comment POSIX.2
 455 @item GLOB_ERR
 456 Give up right away and report an error if there is any difficulty
 457 reading the directories that must be read in order to expand @var{pattern}
 458 fully.  Such difficulties might include a directory in which you don't
 459 have the requisite access.  Normally, @code{glob} tries its best to keep
 460 on going despite any errors, reading whatever directories it can.
 461
 462 You can exercise even more control than this by specifying an
 463 error-handler function @var{errfunc} when you call @code{glob}.  If
 464 @var{errfunc} is not a null pointer, then @code{glob} doesn't give up
 465 right away when it can't read a directory; instead, it calls
 466 @var{errfunc} with two arguments, like this:
 467
 468 @smallexample
 469 (*@var{errfunc}) (@var{filename}, @var{error-code})
 470 @end smallexample
 471
 472 @noindent
 473 The argument @var{filename} is the name of the directory that
 474 @code{glob} couldn't open or couldn't read, and @var{error-code} is the
 475 @code{errno} value that was reported to @code{glob}.
 476
 477 If the error handler function returns nonzero, then @code{glob} gives up
 478 right away.  Otherwise, it continues.
 479
 480 @comment glob.h
 481 @comment POSIX.2
 482 @item GLOB_MARK
 483 If the pattern matches the name of a directory, append @samp{/} to the
 484 directory's name when returning it.
 485
 486 @comment glob.h
 487 @comment POSIX.2
 488 @item GLOB_NOCHECK
 489 If the pattern doesn't match any file names, return the pattern itself
 490 as if it were a file name that had been matched.  (Normally, when the
 491 pattern doesn't match anything, @code{glob} returns that there were no
 492 matches.)
 493
 494 @comment glob.h
 495 @comment POSIX.2
 496 @item GLOB_NOSORT
 497 Don't sort the file names; return them in no particular order.
 498 (In practice, the order will depend on the order of the entries in
 499 the directory.)  The only reason @emph{not} to sort is to save time.
 500
 501 @comment glob.h
 502 @comment POSIX.2
 503 @item GLOB_NOESCAPE
 504 Don't treat the @samp{\} character specially in patterns.  Normally,
 505 @samp{\} quotes the following character, turning off its special meaning
 506 (if any) so that it matches only itself.  When quoting is enabled, the
 507 pattern @samp{\?} matches only the string @samp{?}, because the question
 508 mark in the pattern acts like an ordinary character.
 509
 510 If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character.
 511
 512 @code{glob} does its work by calling the function @code{fnmatch}
 513 repeatedly.  It handles the flag @code{GLOB_NOESCAPE} by turning on the
 514 @code{FNM_NOESCAPE} flag in calls to @code{fnmatch}.
 515 @end vtable
 516
 517 @node More Flags for Globbing
 518 @subsection More Flags for Globbing
 519
 520 Beside the flags described in the last section, the GNU implementation of
 521 @code{glob} allows a few more flags which are also defined in the
 522 @file{glob.h} file.  Some of the extensions implement functionality
 523 which is available in modern shell implementations.
 524
 525 @vtable @code
 526 @comment glob.h
 527 @comment GNU
 528 @item GLOB_PERIOD
 529 The @code{.} character (period) is treated special.  It cannot be
 530 matched by wildcards.  @xref{Wildcard Matching}, @code{FNM_PERIOD}.
 531
 532 @comment glob.h
 533 @comment GNU
 534 @item GLOB_MAGCHAR
 535 The @code{GLOB_MAGCHAR} value is not to be given to @code{glob} in the
 536 @var{flags} parameter.  Instead, @code{glob} sets this bit in the
 537 @var{gl_flags} element of the @var{glob_t} structure provided as the
 538 result if the pattern used for matching contains any wildcard character.
 539
 540 @comment glob.h
 541 @comment GNU
 542 @item GLOB_ALTDIRFUNC
 543 Instead of the using the using the normal functions for accessing the
 544 filesystem the @code{glob} implementation uses the user-supplied
 545 functions specified in the structure pointed to by @var{pglob}
 546 parameter.  For more information about the functions refer to the
 547 sections about directory handling see @ref{Accessing Directories}, and
 548 @ref{Reading Attributes}.
 549
 550 @comment glob.h
 551 @comment GNU
 552 @item GLOB_BRACE
 553 If this flag is given the handling of braces in the pattern is changed.
 554 It is now required that braces appear correctly grouped.  I.e., for each
 555 opening brace there must be a closing one.  Braces can be used
 556 recursively.  So it is possible to define one brace expression in
 557 another one.  It is important to note that the range of each brace
 558 expression is completely contained in the outer brace expression (if
 559 there is one).
 560
 561 The string between the matching braces is separated into single
 562 expressions by splitting at @code{,} (comma) characters.  The commas
 563 themselves are discarded.  Please note what we said above about recursive
 564 brace expressions.  The commas used to separate the subexpressions must
 565 be at the same level.  Commas in brace subexpressions are not matched.
 566 They are used during expansion of the brace expression of the deeper
 567 level.  The example below shows this
 568
 569 @smallexample
 570 glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result)
 571 @end smallexample
 572
 573 @noindent
 574 is equivalent to the sequence
 575
 576 @smallexample
 577 glob ("foo/", GLOB_BRACE, NULL, &result)
 578 glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result)
 579 glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
 580 glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
 581 @end smallexample
 582
 583 @noindent
 584 if we leave aside error handling.
 585
 586 @comment glob.h
 587 @comment GNU
 588 @item GLOB_NOMAGIC
 589 If the pattern contains no wildcard constructs (it is a literal file name),
 590 return it as the sole ``matching'' word, even if no file exists by that name.
 591
 592 @comment glob.h
 593 @comment GNU
 594 @item GLOB_TILDE
 595 If this flag is used the character @code{~} (tilde) is handled special
 596 if it appears at the beginning of the pattern.  Instead of being taken
 597 verbatim it is used to represent the home directory of a known user.
 598
 599 If @code{~} is the only character in pattern or it is followed by a
 600 @code{/} (slash), the home directory of the process owner is
 601 substituted.  Using @code{getlogin} and @code{getpwnam} the information
 602 is read from the system databases.  As an example take user @code{bart}
 603 with his home directory at @file{/home/bart}.  For him a call like
 604
 605 @smallexample
 606 glob ("~/bin/*", GLOB_TILDE, NULL, &result)
 607 @end smallexample
 608
 609 @noindent
 610 would return the contents of the directory @file{/home/bart/bin}.
 611 Instead of referring to the own home directory it is also possible to
 612 name the home directory of other users.  To do so one has to append the
 613 user name after the tilde character.  So the contents of user
 614 @code{homer}'s @file{bin} directory can be retrieved by
 615
 616 @smallexample
 617 glob ("~homer/bin/*", GLOB_TILDE, NULL, &result)
 618 @end smallexample
 619
 620 If the user name is not valid or the home directory cannot be determined
 621 for some reason the pattern is left untouched and itself used as the
 622 result.  I.e., if in the last example @code{home} is not available the
 623 tilde expansion yields to @code{"~homer/bin/*"} and @code{glob} is not
 624 looking for a directory named @code{~homer}.
 625
 626 This functionality is equivalent to what is available in C-shells if the
 627 @code{nonomatch} flag is set.
 628
 629 @comment glob.h
 630 @comment GNU
 631 @item GLOB_TILDE_CHECK
 632 If this flag is used @code{glob} behaves like as if @code{GLOB_TILDE} is
 633 given.  The only difference is that if the user name is not available or
 634 the home directory cannot be determined for other reasons this leads to
 635 an error.  @code{glob} will return @code{GLOB_NOMATCH} instead of using
 636 the pattern itself as the name.
 637
 638 This functionality is equivalent to what is available in C-shells if
 639 @code{nonomatch} flag is not set.
 640
 641 @comment glob.h
 642 @comment GNU
 643 @item GLOB_ONLYDIR
 644 If this flag is used the globbing function takes this as a
 645 @strong{hint} that the caller is only interested in directories
 646 matching the pattern.  If the information about the type of the file
 647 is easily available non-directories will be rejected but no extra
 648 work will be done to determine the information for each file.  I.e.,
 649 the caller must still be able to filter directories out.
 650
 651 This functionality is only available with the GNU @code{glob}
 652 implementation.  It is mainly used internally to increase the
 653 performance but might be useful for a user as well and therefore is
 654 documented here.
 655 @end vtable
 656
 657 Calling @code{glob} will in most cases allocate resources which are used
 658 to represent the result of the function call.  If the same object of
 659 type @code{glob_t} is used in multiple call to @code{glob} the resources
 660 are freed or reused so that no leaks appear.  But this does not include
 661 the time when all @code{glob} calls are done.
 662
 663 @comment glob.h
 664 @comment POSIX.2
 665 @deftypefun void globfree (glob_t *@var{pglob})
 666 The @code{globfree} function frees all resources allocated by previous
 667 calls to @code{glob} associated with the object pointed to by
 668 @var{pglob}.  This function should be called whenever the currently used
 669 @code{glob_t} typed object isn't used anymore.
 670 @end deftypefun
 671
 672 @comment glob.h
 673 @comment GNU
 674 @deftypefun void globfree64 (glob64_t *@var{pglob})
 675 This function is equivalent to @code{globfree} but it frees records of
 676 type @code{glob64_t} which were allocated by @code{glob64}.
 677 @end deftypefun
 678
 679
 680 @node Regular Expressions
 681 @section Regular Expression Matching
 682
 683 @Theglibc{} supports two interfaces for matching regular
 684 expressions.  One is the standard POSIX.2 interface, and the other is
 685 what @theglibc{} has had for many years.
 686
 687 Both interfaces are declared in the header file @file{regex.h}.
 688 If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2
 689 functions, structures, and constants are declared.
 690 @c !!! we only document the POSIX.2 interface here!!
 691
 692 @menu
 693 * POSIX Regexp Compilation::    Using @code{regcomp} to prepare to match.
 694 * Flags for POSIX Regexps::     Syntax variations for @code{regcomp}.
 695 * Matching POSIX Regexps::      Using @code{regexec} to match the compiled
 696                                    pattern that you get from @code{regcomp}.
 697 * Regexp Subexpressions::       Finding which parts of the string were matched.
 698 * Subexpression Complications:: Find points of which parts were matched.
 699 * Regexp Cleanup::              Freeing storage; reporting errors.
 700 @end menu
 701
 702 @node POSIX Regexp Compilation
 703 @subsection POSIX Regular Expression Compilation
 704
 705 Before you can actually match a regular expression, you must
 706 @dfn{compile} it.  This is not true compilation---it produces a special
 707 data structure, not machine instructions.  But it is like ordinary
 708 compilation in that its purpose is to enable you to ``execute'' the
 709 pattern fast.  (@xref{Matching POSIX Regexps}, for how to use the
 710 compiled regular expression for matching.)
 711
 712 There is a special data type for compiled regular expressions:
 713
 714 @comment regex.h
 715 @comment POSIX.2
 716 @deftp {Data Type} regex_t
 717 This type of object holds a compiled regular expression.
 718 It is actually a structure.  It has just one field that your programs
 719 should look at:
 720
 721 @table @code
 722 @item re_nsub
 723 This field holds the number of parenthetical subexpressions in the
 724 regular expression that was compiled.
 725 @end table
 726
 727 There are several other fields, but we don't describe them here, because
 728 only the functions in the library should use them.
 729 @end deftp
 730
 731 After you create a @code{regex_t} object, you can compile a regular
 732 expression into it by calling @code{regcomp}.
 733
 734 @comment regex.h
 735 @comment POSIX.2
 736 @deftypefun int regcomp (regex_t *restrict @var{compiled}, const char *restrict @var{pattern}, int @var{cflags})
 737 The function @code{regcomp} ``compiles'' a regular expression into a
 738 data structure that you can use with @code{regexec} to match against a
 739 string.  The compiled regular expression format is designed for
 740 efficient matching.  @code{regcomp} stores it into @code{*@var{compiled}}.
 741
 742 It's up to you to allocate an object of type @code{regex_t} and pass its
 743 address to @code{regcomp}.
 744
 745 The argument @var{cflags} lets you specify various options that control
 746 the syntax and semantics of regular expressions.  @xref{Flags for POSIX
 747 Regexps}.
 748
 749 If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from
 750 the compiled regular expression the information necessary to record
 751 how subexpressions actually match.  In this case, you might as well
 752 pass @code{0} for the @var{matchptr} and @var{nmatch} arguments when
 753 you call @code{regexec}.
 754
 755 If you don't use @code{REG_NOSUB}, then the compiled regular expression
 756 does have the capacity to record how subexpressions match.  Also,
 757 @code{regcomp} tells you how many subexpressions @var{pattern} has, by
 758 storing the number in @code{@var{compiled}->re_nsub}.  You can use that
 759 value to decide how long an array to allocate to hold information about
 760 subexpression matches.
 761
 762 @code{regcomp} returns @code{0} if it succeeds in compiling the regular
 763 expression; otherwise, it returns a nonzero error code (see the table
 764 below).  You can use @code{regerror} to produce an error message string
 765 describing the reason for a nonzero value; see @ref{Regexp Cleanup}.
 766
 767 @end deftypefun
 768
 769 Here are the possible nonzero values that @code{regcomp} can return:
 770
 771 @table @code
 772 @comment regex.h
 773 @comment POSIX.2
 774 @item REG_BADBR
 775 There was an invalid @samp{\@{@dots{}\@}} construct in the regular
 776 expression.  A valid @samp{\@{@dots{}\@}} construct must contain either
 777 a single number, or two numbers in increasing order separated by a
 778 comma.
 779
 780 @comment regex.h
 781 @comment POSIX.2
 782 @item REG_BADPAT
 783 There was a syntax error in the regular expression.
 784
 785 @comment regex.h
 786 @comment POSIX.2
 787 @item REG_BADRPT
 788 A repetition operator such as @samp{?} or @samp{*} appeared in a bad
 789 position (with no preceding subexpression to act on).
 790
 791 @comment regex.h
 792 @comment POSIX.2
 793 @item REG_ECOLLATE
 794 The regular expression referred to an invalid collating element (one not
 795 defined in the current locale for string collation).  @xref{Locale
 796 Categories}.
 797
 798 @comment regex.h
 799 @comment POSIX.2
 800 @item REG_ECTYPE
 801 The regular expression referred to an invalid character class name.
 802
 803 @comment regex.h
 804 @comment POSIX.2
 805 @item REG_EESCAPE
 806 The regular expression ended with @samp{\}.
 807
 808 @comment regex.h
 809 @comment POSIX.2
 810 @item REG_ESUBREG
 811 There was an invalid number in the @samp{\@var{digit}} construct.
 812
 813 @comment regex.h
 814 @comment POSIX.2
 815 @item REG_EBRACK
 816 There were unbalanced square brackets in the regular expression.
 817
 818 @comment regex.h
 819 @comment POSIX.2
 820 @item REG_EPAREN
 821 An extended regular expression had unbalanced parentheses,
 822 or a basic regular expression had unbalanced @samp{\(} and @samp{\)}.
 823
 824 @comment regex.h
 825 @comment POSIX.2
 826 @item REG_EBRACE
 827 The regular expression had unbalanced @samp{\@{} and @samp{\@}}.
 828
 829 @comment regex.h
 830 @comment POSIX.2
 831 @item REG_ERANGE
 832 One of the endpoints in a range expression was invalid.
 833
 834 @comment regex.h
 835 @comment POSIX.2
 836 @item REG_ESPACE
 837 @code{regcomp} ran out of memory.
 838 @end table
 839
 840 @node Flags for POSIX Regexps
 841 @subsection Flags for POSIX Regular Expressions
 842
 843 These are the bit flags that you can use in the @var{cflags} operand when
 844 compiling a regular expression with @code{regcomp}.
 845
 846 @table @code
 847 @comment regex.h
 848 @comment POSIX.2
 849 @item REG_EXTENDED
 850 Treat the pattern as an extended regular expression, rather than as a
 851 basic regular expression.
 852
 853 @comment regex.h
 854 @comment POSIX.2
 855 @item REG_ICASE
 856 Ignore case when matching letters.
 857
 858 @comment regex.h
 859 @comment POSIX.2
 860 @item REG_NOSUB
 861 Don't bother storing the contents of the @var{matches-ptr} array.
 862
 863 @comment regex.h
 864 @comment POSIX.2
 865 @item REG_NEWLINE
 866 Treat a newline in @var{string} as dividing @var{string} into multiple
 867 lines, so that @samp{$} can match before the newline and @samp{^} can
 868 match after.  Also, don't permit @samp{.} to match a newline, and don't
 869 permit @samp{[^@dots{}]} to match a newline.
 870
 871 Otherwise, newline acts like any other ordinary character.
 872 @end table
 873
 874 @node Matching POSIX Regexps
 875 @subsection Matching a Compiled POSIX Regular Expression
 876
 877 Once you have compiled a regular expression, as described in @ref{POSIX
 878 Regexp Compilation}, you can match it against strings using
 879 @code{regexec}.  A match anywhere inside the string counts as success,
 880 unless the regular expression contains anchor characters (@samp{^} or
 881 @samp{$}).
 882
 883 @comment regex.h
 884 @comment POSIX.2
 885 @deftypefun int regexec (const regex_t *restrict @var{compiled}, const char *restrict @var{string}, size_t @var{nmatch}, regmatch_t @var{matchptr}[restrict], int @var{eflags})
 886 This function tries to match the compiled regular expression
 887 @code{*@var{compiled}} against @var{string}.
 888
 889 @code{regexec} returns @code{0} if the regular expression matches;
 890 otherwise, it returns a nonzero value.  See the table below for
 891 what nonzero values mean.  You can use @code{regerror} to produce an
 892 error message string describing the reason for a nonzero value;
 893 see @ref{Regexp Cleanup}.
 894
 895 The argument @var{eflags} is a word of bit flags that enable various
 896 options.
 897
 898 If you want to get information about what part of @var{string} actually
 899 matched the regular expression or its subexpressions, use the arguments
 900 @var{matchptr} and @var{nmatch}.  Otherwise, pass @code{0} for
 901 @var{nmatch}, and @code{NULL} for @var{matchptr}.  @xref{Regexp
 902 Subexpressions}.
 903 @end deftypefun
 904
 905 You must match the regular expression with the same set of current
 906 locales that were in effect when you compiled the regular expression.
 907
 908 The function @code{regexec} accepts the following flags in the
 909 @var{eflags} argument:
 910
 911 @table @code
 912 @comment regex.h
 913 @comment POSIX.2
 914 @item REG_NOTBOL
 915 Do not regard the beginning of the specified string as the beginning of
 916 a line; more generally, don't make any assumptions about what text might
 917 precede it.
 918
 919 @comment regex.h
 920 @comment POSIX.2
 921 @item REG_NOTEOL
 922 Do not regard the end of the specified string as the end of a line; more
 923 generally, don't make any assumptions about what text might follow it.
 924 @end table
 925
 926 Here are the possible nonzero values that @code{regexec} can return:
 927
 928 @table @code
 929 @comment regex.h
 930 @comment POSIX.2
 931 @item REG_NOMATCH
 932 The pattern didn't match the string.  This isn't really an error.
 933
 934 @comment regex.h
 935 @comment POSIX.2
 936 @item REG_ESPACE
 937 @code{regexec} ran out of memory.
 938 @end table
 939
 940 @node Regexp Subexpressions
 941 @subsection Match Results with Subexpressions
 942
 943 When @code{regexec} matches parenthetical subexpressions of
 944 @var{pattern}, it records which parts of @var{string} they match.  It
 945 returns that information by storing the offsets into an array whose
 946 elements are structures of type @code{regmatch_t}.  The first element of
 947 the array (index @code{0}) records the part of the string that matched
 948 the entire regular expression.  Each other element of the array records
 949 the beginning and end of the part that matched a single parenthetical
 950 subexpression.
 951
 952 @comment regex.h
 953 @comment POSIX.2
 954 @deftp {Data Type} regmatch_t
 955 This is the data type of the @var{matcharray} array that you pass to
 956 @code{regexec}.  It contains two structure fields, as follows:
 957
 958 @table @code
 959 @item rm_so
 960 The offset in @var{string} of the beginning of a substring.  Add this
 961 value to @var{string} to get the address of that part.
 962
 963 @item rm_eo
 964 The offset in @var{string} of the end of the substring.
 965 @end table
 966 @end deftp
 967
 968 @comment regex.h
 969 @comment POSIX.2
 970 @deftp {Data Type} regoff_t
 971 @code{regoff_t} is an alias for another signed integer type.
 972 The fields of @code{regmatch_t} have type @code{regoff_t}.
 973 @end deftp
 974
 975 The @code{regmatch_t} elements correspond to subexpressions
 976 positionally; the first element (index @code{1}) records where the first
 977 subexpression matched, the second element records the second
 978 subexpression, and so on.  The order of the subexpressions is the order
 979 in which they begin.
 980
 981 When you call @code{regexec}, you specify how long the @var{matchptr}
 982 array is, with the @var{nmatch} argument.  This tells @code{regexec} how
 983 many elements to store.  If the actual regular expression has more than
 984 @var{nmatch} subexpressions, then you won't get offset information about
 985 the rest of them.  But this doesn't alter whether the pattern matches a
 986 particular string or not.
 987
 988 If you don't want @code{regexec} to return any information about where
 989 the subexpressions matched, you can either supply @code{0} for
 990 @var{nmatch}, or use the flag @code{REG_NOSUB} when you compile the
 991 pattern with @code{regcomp}.
 992
 993 @node Subexpression Complications
 994 @subsection Complications in Subexpression Matching
 995
 996 Sometimes a subexpression matches a substring of no characters.  This
 997 happens when @samp{f\(o*\)} matches the string @samp{fum}.  (It really
 998 matches just the @samp{f}.)  In this case, both of the offsets identify
 999 the point in the string where the null substring was found.  In this
1000 example, the offsets are both @code{1}.
1001
1002 Sometimes the entire regular expression can match without using some of
1003 its subexpressions at all---for example, when @samp{ba\(na\)*} matches the
1004 string @samp{ba}, the parenthetical subexpression is not used.  When
1005 this happens, @code{regexec} stores @code{-1} in both fields of the
1006 element for that subexpression.
1007
1008 Sometimes matching the entire regular expression can match a particular
1009 subexpression more than once---for example, when @samp{ba\(na\)*}
1010 matches the string @samp{bananana}, the parenthetical subexpression
1011 matches three times.  When this happens, @code{regexec} usually stores
1012 the offsets of the last part of the string that matched the
1013 subexpression.  In the case of @samp{bananana}, these offsets are
1014 @code{6} and @code{8}.
1015
1016 But the last match is not always the one that is chosen.  It's more
1017 accurate to say that the last @emph{opportunity} to match is the one
1018 that takes precedence.  What this means is that when one subexpression
1019 appears within another, then the results reported for the inner
1020 subexpression reflect whatever happened on the last match of the outer
1021 subexpression.  For an example, consider @samp{\(ba\(na\)*s \)*} matching
1022 the string @samp{bananas bas }.  The last time the inner expression
1023 actually matches is near the end of the first word.  But it is
1024 @emph{considered} again in the second word, and fails to match there.
1025 @code{regexec} reports nonuse of the ``na'' subexpression.
1026
1027 Another place where this rule applies is when the regular expression
1028 @smallexample
1029 \(ba\(na\)*s \|nefer\(ti\)* \)*
1030 @end smallexample
1031 @noindent
1032 matches @samp{bananas nefertiti}.  The ``na'' subexpression does match
1033 in the first word, but it doesn't match in the second word because the
1034 other alternative is used there.  Once again, the second repetition of
1035 the outer subexpression overrides the first, and within that second
1036 repetition, the ``na'' subexpression is not used.  So @code{regexec}
1037 reports nonuse of the ``na'' subexpression.
1038
1039 @node Regexp Cleanup
1040 @subsection POSIX Regexp Matching Cleanup
1041
1042 When you are finished using a compiled regular expression, you can
1043 free the storage it uses by calling @code{regfree}.
1044
1045 @comment regex.h
1046 @comment POSIX.2
1047 @deftypefun void regfree (regex_t *@var{compiled})
1048 Calling @code{regfree} frees all the storage that @code{*@var{compiled}}
1049 points to.  This includes various internal fields of the @code{regex_t}
1050 structure that aren't documented in this manual.
1051
1052 @code{regfree} does not free the object @code{*@var{compiled}} itself.
1053 @end deftypefun
1054
1055 You should always free the space in a @code{regex_t} structure with
1056 @code{regfree} before using the structure to compile another regular
1057 expression.
1058
1059 When @code{regcomp} or @code{regexec} reports an error, you can use
1060 the function @code{regerror} to turn it into an error message string.
1061
1062 @comment regex.h
1063 @comment POSIX.2
1064 @deftypefun size_t regerror (int @var{errcode}, const regex_t *restrict @var{compiled}, char *restrict @var{buffer}, size_t @var{length})
1065 This function produces an error message string for the error code
1066 @var{errcode}, and stores the string in @var{length} bytes of memory
1067 starting at @var{buffer}.  For the @var{compiled} argument, supply the
1068 same compiled regular expression structure that @code{regcomp} or
1069 @code{regexec} was working with when it got the error.  Alternatively,
1070 you can supply @code{NULL} for @var{compiled}; you will still get a
1071 meaningful error message, but it might not be as detailed.
1072
1073 If the error message can't fit in @var{length} bytes (including a
1074 terminating null character), then @code{regerror} truncates it.
1075 The string that @code{regerror} stores is always null-terminated
1076 even if it has been truncated.
1077
1078 The return value of @code{regerror} is the minimum length needed to
1079 store the entire error message.  If this is less than @var{length}, then
1080 the error message was not truncated, and you can use it.  Otherwise, you
1081 should call @code{regerror} again with a larger buffer.
1082
1083 Here is a function which uses @code{regerror}, but always dynamically
1084 allocates a buffer for the error message:
1085
1086 @smallexample
1087 char *get_regerror (int errcode, regex_t *compiled)
1088 @{
1089   size_t length = regerror (errcode, compiled, NULL, 0);
1090   char *buffer = xmalloc (length);
1091   (void) regerror (errcode, compiled, buffer, length);
1092   return buffer;
1093 @}
1094 @end smallexample
1095 @end deftypefun
1096
1097 @node Word Expansion
1098 @section Shell-Style Word Expansion
1099 @cindex word expansion
1100 @cindex expansion of shell words
1101
1102 @dfn{Word expansion} means the process of splitting a string into
1103 @dfn{words} and substituting for variables, commands, and wildcards
1104 just as the shell does.
1105
1106 For example, when you write @samp{ls -l foo.c}, this string is split
1107 into three separate words---@samp{ls}, @samp{-l} and @samp{foo.c}.
1108 This is the most basic function of word expansion.
1109
1110 When you write @samp{ls *.c}, this can become many words, because
1111 the word @samp{*.c} can be replaced with any number of file names.
1112 This is called @dfn{wildcard expansion}, and it is also a part of
1113 word expansion.
1114
1115 When you use @samp{echo $PATH} to print your path, you are taking
1116 advantage of @dfn{variable substitution}, which is also part of word
1117 expansion.
1118
1119 Ordinary programs can perform word expansion just like the shell by
1120 calling the library function @code{wordexp}.
1121
1122 @menu
1123 * Expansion Stages::            What word expansion does to a string.
1124 * Calling Wordexp::             How to call @code{wordexp}.
1125 * Flags for Wordexp::           Options you can enable in @code{wordexp}.
1126 * Wordexp Example::             A sample program that does word expansion.
1127 * Tilde Expansion::             Details of how tilde expansion works.
1128 * Variable Substitution::       Different types of variable substitution.
1129 @end menu
1130
1131 @node Expansion Stages
1132 @subsection The Stages of Word Expansion
1133
1134 When word expansion is applied to a sequence of words, it performs the
1135 following transformations in the order shown here:
1136
1137 @enumerate
1138 @item
1139 @cindex tilde expansion
1140 @dfn{Tilde expansion}: Replacement of @samp{~foo} with the name of
1141 the home directory of @samp{foo}.
1142
1143 @item
1144 Next, three different transformations are applied in the same step,
1145 from left to right:
1146
1147 @itemize @bullet
1148 @item
1149 @cindex variable substitution
1150 @cindex substitution of variables and commands
1151 @dfn{Variable substitution}: Environment variables are substituted for
1152 references such as @samp{$foo}.
1153
1154 @item
1155 @cindex command substitution
1156 @dfn{Command substitution}: Constructs such as @w{@samp{`cat foo`}} and
1157 the equivalent @w{@samp{$(cat foo)}} are replaced with the output from
1158 the inner command.
1159
1160 @item
1161 @cindex arithmetic expansion
1162 @dfn{Arithmetic expansion}: Constructs such as @samp{$(($x-1))} are
1163 replaced with the result of the arithmetic computation.
1164 @end itemize
1165
1166 @item
1167 @cindex field splitting
1168 @dfn{Field splitting}: subdivision of the text into @dfn{words}.
1169
1170 @item
1171 @cindex wildcard expansion
1172 @dfn{Wildcard expansion}: The replacement of a construct such as @samp{*.c}
1173 with a list of @samp{.c} file names.  Wildcard expansion applies to an
1174 entire word at a time, and replaces that word with 0 or more file names
1175 that are themselves words.
1176
1177 @item
1178 @cindex quote removal
1179 @cindex removal of quotes
1180 @dfn{Quote removal}: The deletion of string-quotes, now that they have
1181 done their job by inhibiting the above transformations when appropriate.
1182 @end enumerate
1183
1184 For the details of these transformations, and how to write the constructs
1185 that use them, see @w{@cite{The BASH Manual}} (to appear).
1186
1187 @node Calling Wordexp
1188 @subsection Calling @code{wordexp}
1189
1190 All the functions, constants and data types for word expansion are
1191 declared in the header file @file{wordexp.h}.
1192
1193 Word expansion produces a vector of words (strings).  To return this
1194 vector, @code{wordexp} uses a special data type, @code{wordexp_t}, which
1195 is a structure.  You pass @code{wordexp} the address of the structure,
1196 and it fills in the structure's fields to tell you about the results.
1197
1198 @comment wordexp.h
1199 @comment POSIX.2
1200 @deftp {Data Type} {wordexp_t}
1201 This data type holds a pointer to a word vector.  More precisely, it
1202 records both the address of the word vector and its size.
1203
1204 @table @code
1205 @item we_wordc
1206 The number of elements in the vector.
1207
1208 @item we_wordv
1209 The address of the vector.  This field has type @w{@code{char **}}.
1210
1211 @item we_offs
1212 The offset of the first real element of the vector, from its nominal
1213 address in the @code{we_wordv} field.  Unlike the other fields, this
1214 is always an input to @code{wordexp}, rather than an output from it.
1215
1216 If you use a nonzero offset, then that many elements at the beginning of
1217 the vector are left empty.  (The @code{wordexp} function fills them with
1218 null pointers.)
1219
1220 The @code{we_offs} field is meaningful only if you use the
1221 @code{WRDE_DOOFFS} flag.  Otherwise, the offset is always zero
1222 regardless of what is in this field, and the first real element comes at
1223 the beginning of the vector.
1224 @end table
1225 @end deftp
1226
1227 @comment wordexp.h
1228 @comment POSIX.2
1229 @deftypefun int wordexp (const char *@var{words}, wordexp_t *@var{word-vector-ptr}, int @var{flags})
1230 Perform word expansion on the string @var{words}, putting the result in
1231 a newly allocated vector, and store the size and address of this vector
1232 into @code{*@var{word-vector-ptr}}.  The argument @var{flags} is a
1233 combination of bit flags; see @ref{Flags for Wordexp}, for details of
1234 the flags.
1235
1236 You shouldn't use any of the characters @samp{|&;<>} in the string
1237 @var{words} unless they are quoted; likewise for newline.  If you use
1238 these characters unquoted, you will get the @code{WRDE_BADCHAR} error
1239 code.  Don't use parentheses or braces unless they are quoted or part of
1240 a word expansion construct.  If you use quotation characters @samp{'"`},
1241 they should come in pairs that balance.
1242
1243 The results of word expansion are a sequence of words.  The function
1244 @code{wordexp} allocates a string for each resulting word, then
1245 allocates a vector of type @code{char **} to store the addresses of
1246 these strings.  The last element of the vector is a null pointer.
1247 This vector is called the @dfn{word vector}.
1248
1249 To return this vector, @code{wordexp} stores both its address and its
1250 length (number of elements, not counting the terminating null pointer)
1251 into @code{*@var{word-vector-ptr}}.
1252
1253 If @code{wordexp} succeeds, it returns 0.  Otherwise, it returns one
1254 of these error codes:
1255
1256 @table @code
1257 @comment wordexp.h
1258 @comment POSIX.2
1259 @item WRDE_BADCHAR
1260 The input string @var{words} contains an unquoted invalid character such
1261 as @samp{|}.
1262
1263 @comment wordexp.h
1264 @comment POSIX.2
1265 @item WRDE_BADVAL
1266 The input string refers to an undefined shell variable, and you used the flag
1267 @code{WRDE_UNDEF} to forbid such references.
1268
1269 @comment wordexp.h
1270 @comment POSIX.2
1271 @item WRDE_CMDSUB
1272 The input string uses command substitution, and you used the flag
1273 @code{WRDE_NOCMD} to forbid command substitution.
1274
1275 @comment wordexp.h
1276 @comment POSIX.2
1277 @item WRDE_NOSPACE
1278 It was impossible to allocate memory to hold the result.  In this case,
1279 @code{wordexp} can store part of the results---as much as it could
1280 allocate room for.
1281
1282 @comment wordexp.h
1283 @comment POSIX.2
1284 @item WRDE_SYNTAX
1285 There was a syntax error in the input string.  For example, an unmatched
1286 quoting character is a syntax error.
1287 @end table
1288 @end deftypefun
1289
1290 @comment wordexp.h
1291 @comment POSIX.2
1292 @deftypefun void wordfree (wordexp_t *@var{word-vector-ptr})
1293 Free the storage used for the word-strings and vector that
1294 @code{*@var{word-vector-ptr}} points to.  This does not free the
1295 structure @code{*@var{word-vector-ptr}} itself---only the other
1296 data it points to.
1297 @end deftypefun
1298
1299 @node Flags for Wordexp
1300 @subsection Flags for Word Expansion
1301
1302 This section describes the flags that you can specify in the
1303 @var{flags} argument to @code{wordexp}.  Choose the flags you want,
1304 and combine them with the C operator @code{|}.
1305
1306 @table @code
1307 @comment wordexp.h
1308 @comment POSIX.2
1309 @item WRDE_APPEND
1310 Append the words from this expansion to the vector of words produced by
1311 previous calls to @code{wordexp}.  This way you can effectively expand
1312 several words as if they were concatenated with spaces between them.
1313
1314 In order for appending to work, you must not modify the contents of the
1315 word vector structure between calls to @code{wordexp}.  And, if you set
1316 @code{WRDE_DOOFFS} in the first call to @code{wordexp}, you must also
1317 set it when you append to the results.
1318
1319 @comment wordexp.h
1320 @comment POSIX.2
1321 @item WRDE_DOOFFS
1322 Leave blank slots at the beginning of the vector of words.
1323 The @code{we_offs} field says how many slots to leave.
1324 The blank slots contain null pointers.
1325
1326 @comment wordexp.h
1327 @comment POSIX.2
1328 @item WRDE_NOCMD
1329 Don't do command substitution; if the input requests command substitution,
1330 report an error.
1331
1332 @comment wordexp.h
1333 @comment POSIX.2
1334 @item WRDE_REUSE
1335 Reuse a word vector made by a previous call to @code{wordexp}.
1336 Instead of allocating a new vector of words, this call to @code{wordexp}
1337 will use the vector that already exists (making it larger if necessary).
1338
1339 Note that the vector may move, so it is not safe to save an old pointer
1340 and use it again after calling @code{wordexp}.  You must fetch
1341 @code{we_pathv} anew after each call.
1342
1343 @comment wordexp.h
1344 @comment POSIX.2
1345 @item WRDE_SHOWERR
1346 Do show any error messages printed by commands run by command substitution.
1347 More precisely, allow these commands to inherit the standard error output
1348 stream of the current process.  By default, @code{wordexp} gives these
1349 commands a standard error stream that discards all output.
1350
1351 @comment wordexp.h
1352 @comment POSIX.2
1353 @item WRDE_UNDEF
1354 If the input refers to a shell variable that is not defined, report an
1355 error.
1356 @end table
1357
1358 @node Wordexp Example
1359 @subsection @code{wordexp} Example
1360
1361 Here is an example of using @code{wordexp} to expand several strings
1362 and use the results to run a shell command.  It also shows the use of
1363 @code{WRDE_APPEND} to concatenate the expansions and of @code{wordfree}
1364 to free the space allocated by @code{wordexp}.
1365
1366 @smallexample
1367 int
1368 expand_and_execute (const char *program, const char **options)
1369 @{
1370   wordexp_t result;
1371   pid_t pid
1372   int status, i;
1373
1374   /* @r{Expand the string for the program to run.}  */
1375   switch (wordexp (program, &result, 0))
1376     @{
1377     case 0:                     /* @r{Successful}.  */
1378       break;
1379     case WRDE_NOSPACE:
1380       /* @r{If the error was @code{WRDE_NOSPACE},}
1381          @r{then perhaps part of the result was allocated.}  */
1382       wordfree (&result);
1383     default:                    /* @r{Some other error.}  */
1384       return -1;
1385     @}
1386
1387   /* @r{Expand the strings specified for the arguments.}  */
1388   for (i = 0; options[i] != NULL; i++)
1389     @{
1390       if (wordexp (options[i], &result, WRDE_APPEND))
1391         @{
1392           wordfree (&result);
1393           return -1;
1394         @}
1395     @}
1396
1397   pid = fork ();
1398   if (pid == 0)
1399     @{
1400       /* @r{This is the child process.  Execute the command.} */
1401       execv (result.we_wordv[0], result.we_wordv);
1402       exit (EXIT_FAILURE);
1403     @}
1404   else if (pid < 0)
1405     /* @r{The fork failed.  Report failure.}  */
1406     status = -1;
1407   else
1408     /* @r{This is the parent process.  Wait for the child to complete.}  */
1409     if (waitpid (pid, &status, 0) != pid)
1410       status = -1;
1411
1412   wordfree (&result);
1413   return status;
1414 @}
1415 @end smallexample
1416
1417 @node Tilde Expansion
1418 @subsection Details of Tilde Expansion
1419
1420 It's a standard part of shell syntax that you can use @samp{~} at the
1421 beginning of a file name to stand for your own home directory.  You
1422 can use @samp{~@var{user}} to stand for @var{user}'s home directory.
1423
1424 @dfn{Tilde expansion} is the process of converting these abbreviations
1425 to the directory names that they stand for.
1426
1427 Tilde expansion applies to the @samp{~} plus all following characters up
1428 to whitespace or a slash.  It takes place only at the beginning of a
1429 word, and only if none of the characters to be transformed is quoted in
1430 any way.
1431
1432 Plain @samp{~} uses the value of the environment variable @code{HOME}
1433 as the proper home directory name.  @samp{~} followed by a user name
1434 uses @code{getpwname} to look up that user in the user database, and
1435 uses whatever directory is recorded there.  Thus, @samp{~} followed
1436 by your own name can give different results from plain @samp{~}, if
1437 the value of @code{HOME} is not really your home directory.
1438
1439 @node Variable Substitution
1440 @subsection Details of Variable Substitution
1441
1442 Part of ordinary shell syntax is the use of @samp{$@var{variable}} to
1443 substitute the value of a shell variable into a command.  This is called
1444 @dfn{variable substitution}, and it is one part of doing word expansion.
1445
1446 There are two basic ways you can write a variable reference for
1447 substitution:
1448
1449 @table @code
1450 @item $@{@var{variable}@}
1451 If you write braces around the variable name, then it is completely
1452 unambiguous where the variable name ends.  You can concatenate
1453 additional letters onto the end of the variable value by writing them
1454 immediately after the close brace.  For example, @samp{$@{foo@}s}
1455 expands into @samp{tractors}.
1456
1457 @item $@var{variable}
1458 If you do not put braces around the variable name, then the variable
1459 name consists of all the alphanumeric characters and underscores that
1460 follow the @samp{$}.  The next punctuation character ends the variable
1461 name.  Thus, @samp{$foo-bar} refers to the variable @code{foo} and expands
1462 into @samp{tractor-bar}.
1463 @end table
1464
1465 When you use braces, you can also use various constructs to modify the
1466 value that is substituted, or test it in various ways.
1467
1468 @table @code
1469 @item $@{@var{variable}:-@var{default}@}
1470 Substitute the value of @var{variable}, but if that is empty or
1471 undefined, use @var{default} instead.
1472
1473 @item $@{@var{variable}:=@var{default}@}
1474 Substitute the value of @var{variable}, but if that is empty or
1475 undefined, use @var{default} instead and set the variable to
1476 @var{default}.
1477
1478 @item $@{@var{variable}:?@var{message}@}
1479 If @var{variable} is defined and not empty, substitute its value.
1480
1481 Otherwise, print @var{message} as an error message on the standard error
1482 stream, and consider word expansion a failure.
1483
1484 @c ??? How does wordexp report such an error?
1485 @c WRDE_BADVAL is returned.
1486
1487 @item $@{@var{variable}:+@var{replacement}@}
1488 Substitute @var{replacement}, but only if @var{variable} is defined and
1489 nonempty.  Otherwise, substitute nothing for this construct.
1490 @end table
1491
1492 @table @code
1493 @item $@{#@var{variable}@}
1494 Substitute a numeral which expresses in base ten the number of
1495 characters in the value of @var{variable}.  @samp{$@{#foo@}} stands for
1496 @samp{7}, because @samp{tractor} is seven characters.
1497 @end table
1498
1499 These variants of variable substitution let you remove part of the
1500 variable's value before substituting it.  The @var{prefix} and
1501 @var{suffix} are not mere strings; they are wildcard patterns, just
1502 like the patterns that you use to match multiple file names.  But
1503 in this context, they match against parts of the variable value
1504 rather than against file names.
1505
1506 @table @code
1507 @item $@{@var{variable}%%@var{suffix}@}
1508 Substitute the value of @var{variable}, but first discard from that
1509 variable any portion at the end that matches the pattern @var{suffix}.
1510
1511 If there is more than one alternative for how to match against
1512 @var{suffix}, this construct uses the longest possible match.
1513
1514 Thus, @samp{$@{foo%%r*@}} substitutes @samp{t}, because the largest
1515 match for @samp{r*} at the end of @samp{tractor} is @samp{ractor}.
1516
1517 @item $@{@var{variable}%@var{suffix}@}
1518 Substitute the value of @var{variable}, but first discard from that
1519 variable any portion at the end that matches the pattern @var{suffix}.
1520
1521 If there is more than one alternative for how to match against
1522 @var{suffix}, this construct uses the shortest possible alternative.
1523
1524 Thus, @samp{$@{foo%r*@}} substitutes @samp{tracto}, because the shortest
1525 match for @samp{r*} at the end of @samp{tractor} is just @samp{r}.
1526
1527 @item $@{@var{variable}##@var{prefix}@}
1528 Substitute the value of @var{variable}, but first discard from that
1529 variable any portion at the beginning that matches the pattern @var{prefix}.
1530
1531 If there is more than one alternative for how to match against
1532 @var{prefix}, this construct uses the longest possible match.
1533
1534 Thus, @samp{$@{foo##*t@}} substitutes @samp{or}, because the largest
1535 match for @samp{*t} at the beginning of @samp{tractor} is @samp{tract}.
1536
1537 @item $@{@var{variable}#@var{prefix}@}
1538 Substitute the value of @var{variable}, but first discard from that
1539 variable any portion at the beginning that matches the pattern @var{prefix}.
1540
1541 If there is more than one alternative for how to match against
1542 @var{prefix}, this construct uses the shortest possible alternative.
1543
1544 Thus, @samp{$@{foo#*t@}} substitutes @samp{ractor}, because the shortest
1545 match for @samp{*t} at the beginning of @samp{tractor} is just @samp{t}.
1546
1547 @end table