lispref/searching.texi

   1 @c -*-texinfo-*-
   2 @c This is part of the GNU Emacs Lisp Reference Manual.
   3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
   4 @c See the file elisp.texi for copying conditions.
   5 @setfilename ../info/searching
   6 @node Searching and Matching, Syntax Tables, Text, Top
   7 @chapter Searching and Matching
   8 @cindex searching
   9
  10   GNU Emacs provides two ways to search through a buffer for specified
  11 text: exact string searches and regular expression searches.  After a
  12 regular expression search, you can examine the @dfn{match data} to
  13 determine which text matched the whole regular expression or various
  14 portions of it.
  15
  16 @menu
  17 * String Search::         Search for an exact match.
  18 * Regular Expressions::   Describing classes of strings.
  19 * Regexp Search::         Searching for a match for a regexp.
  20 * Search and Replace::    Internals of @code{query-replace}.
  21 * Match Data::            Finding out which part of the text matched
  22                             various parts of a regexp, after regexp search.
  23 * Searching and Case::    Case-independent or case-significant searching.
  24 * Standard Regexps::      Useful regexps for finding sentences, pages,...
  25 @end menu
  26
  27   The @samp{skip-chars@dots{}} functions also perform a kind of searching.
  28 @xref{Skipping Characters}.
  29
  30 @node String Search
  31 @section Searching for Strings
  32 @cindex string search
  33
  34   These are the primitive functions for searching through the text in a
  35 buffer.  They are meant for use in programs, but you may call them
  36 interactively.  If you do so, they prompt for the search string;
  37 @var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
  38 is set to 1.
  39
  40 @deffn Command search-forward string &optional limit noerror repeat
  41   This function searches forward from point for an exact match for
  42 @var{string}.  If successful, it sets point to the end of the occurrence
  43 found, and returns the new value of point.  If no match is found, the
  44 value and side effects depend on @var{noerror} (see below).
  45 @c Emacs 19 feature
  46
  47   In the following example, point is initially at the beginning of the
  48 line.  Then @code{(search-forward "fox")} moves point after the last
  49 letter of @samp{fox}:
  50
  51 @example
  52 @group
  53 ---------- Buffer: foo ----------
  54 @point{}The quick brown fox jumped over the lazy dog.
  55 ---------- Buffer: foo ----------
  56 @end group
  57
  58 @group
  59 (search-forward "fox")
  60      @result{} 20
  61
  62 ---------- Buffer: foo ----------
  63 The quick brown fox@point{} jumped over the lazy dog.
  64 ---------- Buffer: foo ----------
  65 @end group
  66 @end example
  67
  68   The argument @var{limit} specifies the upper bound to the search.  (It
  69 must be a position in the current buffer.)  No match extending after
  70 that position is accepted.  If @var{limit} is omitted or @code{nil}, it
  71 defaults to the end of the accessible portion of the buffer.
  72
  73 @kindex search-failed
  74   What happens when the search fails depends on the value of
  75 @var{noerror}.  If @var{noerror} is @code{nil}, a @code{search-failed}
  76 error is signaled.  If @var{noerror} is @code{t}, @code{search-forward}
  77 returns @code{nil} and does nothing.  If @var{noerror} is neither
  78 @code{nil} nor @code{t}, then @code{search-forward} moves point to the
  79 upper bound and returns @code{nil}.  (It would be more consistent now
  80 to return the new position of point in that case, but some programs
  81 may depend on a value of @code{nil}.)
  82
  83   If @var{repeat} is non-@code{nil}, then the search is repeated that
  84 many times.  Point is positioned at the end of the last match.
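
The following sketch, which is not one of the original examples, shows
how @var{limit} and @var{noerror} interact.  It assumes a current Emacs
in which the @code{with-temp-buffer} macro is available, and it reuses
the sentence from the example above.

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  (goto-char (point-min))
  (list
   ;; "fox" ends at position 20, so a bound of 15 rules it out.
   ;; With noerror t, the search returns nil and point stays at 1.
   (search-forward "fox" 15 t)
   (point)
   ;; With noerror neither nil nor t, point moves to the bound.
   (search-forward "fox" 15 'move)
   (point)
   ;; With noerror nil, a failed search signals search-failed.
   (condition-case nil
       (search-forward "cat")
     (search-failed 'not-found))))
     @result{} (nil 1 nil 15 not-found)
@end group
@end example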
  85 @end deffn
  86
  87 @deffn Command search-backward string &optional limit noerror repeat
  88 This function searches backward from point for @var{string}.  It is
  89 just like @code{search-forward} except that it searches backwards and
  90 leaves point at the beginning of the match.
  91 @end deffn
  92
  93 @deffn Command word-search-forward string &optional limit noerror repeat
  94 @cindex word search
  95 This function searches forward from point for a ``word'' match for
  96 @var{string}.  If it finds a match, it sets point to the end of the
  97 match found, and returns the new value of point.
  98 @c Emacs 19 feature
  99
 100 Word matching regards @var{string} as a sequence of words, disregarding
 101 punctuation that separates them.  It searches the buffer for the same
 102 sequence of words.  Each word must be distinct in the buffer (searching
 103 for the word @samp{ball} does not match the word @samp{balls}), but the
 104 details of punctuation and spacing are ignored (searching for @samp{ball
 105 boy} does match @samp{ball.  Boy!}).
 106
 107 In this example, point is initially at the beginning of the buffer; the
 108 search leaves it between the @samp{y} and the @samp{!}.
 109
 110 @example
 111 @group
 112 ---------- Buffer: foo ----------
 113 @point{}He said "Please!  Find
 114 the ball boy!"
 115 ---------- Buffer: foo ----------
 116 @end group
 117
 118 @group
 119 (word-search-forward "Please find the ball, boy.")
  120      @result{} 36
 121
 122 ---------- Buffer: foo ----------
 123 He said "Please!  Find
 124 the ball boy@point{}!"
 125 ---------- Buffer: foo ----------
 126 @end group
 127 @end example
 128
 129 If @var{limit} is non-@code{nil} (it must be a position in the current
 130 buffer), then it is the upper bound to the search.  The match found must
 131 not extend after that position.
 132
 133 If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
 134 an error if the search fails.  If @var{noerror} is @code{t}, then it
 135 returns @code{nil} instead of signaling an error.  If @var{noerror} is
 136 neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
 137 end of the buffer) and returns @code{nil}.
 138
 139 If @var{repeat} is non-@code{nil}, then the search is repeated that many
 140 times.  Point is positioned at the end of the last match.
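
As a further illustration (a sketch only, assuming a current Emacs with
@code{with-temp-buffer} and a made-up buffer text), the first search
below skips @samp{balls} because the word must be distinct, and the
second ignores the punctuation and spacing between the two words.
@code{case-fold-search} is bound to @code{t} so that @samp{Ball}
matches.

@example
@group
(with-temp-buffer
  (insert "Two balls, one ball.  Ball boy!")
  (goto-char (point-min))
  (let ((case-fold-search t))
    (list
     ;; Matches the separate word "ball", not "balls".
     (word-search-forward "ball" nil t)
     ;; Continues from there and matches "Ball boy".
     (word-search-forward "ball, boy" nil t))))
     @result{} (20 31)
@end group
@end example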
 141 @end deffn
 142
 143 @deffn Command word-search-backward string &optional limit noerror repeat
 144 This function searches backward from point for a word match to
 145 @var{string}.  This function is just like @code{word-search-forward}
 146 except that it searches backward and normally leaves point at the
 147 beginning of the match.
 148 @end deffn
 149
 150 @node Regular Expressions
 151 @section Regular Expressions
 152 @cindex regular expression
 153 @cindex regexp
 154
 155   A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
 156 denotes a (possibly infinite) set of strings.  Searching for matches for
 157 a regexp is a very powerful operation.  This section explains how to write
 158 regexps; the following section says how to search for them.
 159
 160 @menu
 161 * Syntax of Regexps::       Rules for writing regular expressions.
 162 * Regexp Example::          Illustrates regular expression syntax.
 163 @end menu
 164
 165 @node Syntax of Regexps
 166 @subsection Syntax of Regular Expressions
 167
 168   Regular expressions have a syntax in which a few characters are special
 169 constructs and the rest are @dfn{ordinary}.  An ordinary character is a
 170 simple regular expression which matches that character and nothing else.
 171 The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*},
 172 @samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}; no new special
 173 characters will be defined in the future.  Any other character appearing
 174 in a regular expression is ordinary, unless a @samp{\} precedes it.
 175
 176 For example, @samp{f} is not a special character, so it is ordinary, and
 177 therefore @samp{f} is a regular expression that matches the string
 178 @samp{f} and no other string.  (It does @emph{not} match the string
 179 @samp{ff}.)  Likewise, @samp{o} is a regular expression that matches
 180 only @samp{o}.@refill
 181
 182 Any two regular expressions @var{a} and @var{b} can be concatenated.  The
 183 result is a regular expression which matches a string if @var{a} matches
 184 some amount of the beginning of that string and @var{b} matches the rest of
 185 the string.@refill
 186
 187 As a simple example, we can concatenate the regular expressions @samp{f}
 188 and @samp{o} to get the regular expression @samp{fo}, which matches only
 189 the string @samp{fo}.  Still trivial.  To do something more powerful, you
 190 need to use one of the special characters.  Here is a list of them:
 191
 192 @need 1200
 193 @table @kbd
 194 @item .@: @r{(Period)}
 195 @cindex @samp{.} in regexp
 196 is a special character that matches any single character except a newline.
 197 Using concatenation, we can make regular expressions like @samp{a.b}, which
 198 matches any three-character string that begins with @samp{a} and ends with
 199 @samp{b}.@refill
 200
 201 @item *
 202 @cindex @samp{*} in regexp
 203 is not a construct by itself; it is a suffix operator that means to
 204 repeat the preceding regular expression as many times as possible.  In
 205 @samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
 206 one @samp{f} followed by any number of @samp{o}s.  The case of zero
 207 @samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill
 208
 209 @samp{*} always applies to the @emph{smallest} possible preceding
 210 expression.  Thus, @samp{fo*} has a repeating @samp{o}, not a
 211 repeating @samp{fo}.@refill
 212
 213 The matcher processes a @samp{*} construct by matching, immediately,
 214 as many repetitions as can be found.  Then it continues with the rest
 215 of the pattern.  If that fails, backtracking occurs, discarding some
 216 of the matches of the @samp{*}-modified construct in case that makes
 217 it possible to match the rest of the pattern.  For example, in matching
 218 @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first
 219 tries to match all three @samp{a}s; but the rest of the pattern is
 220 @samp{ar} and there is only @samp{r} left to match, so this try fails.
 221 The next alternative is for @samp{a*} to match only two @samp{a}s.
 222 With this choice, the rest of the regexp matches successfully.@refill
 223
 224 @item +
 225 @cindex @samp{+} in regexp
 226 is a suffix operator similar to @samp{*} except that the preceding
 227 expression must match at least once.  So, for example, @samp{ca+r}
 228 matches the strings @samp{car} and @samp{caaaar} but not the string
 229 @samp{cr}, whereas @samp{ca*r} matches all three strings.
 230
 231 @item ?
 232 @cindex @samp{?} in regexp
 233 is a suffix operator similar to @samp{*} except that the preceding
 234 expression can match either once or not at all.  For example,
  235 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
 236 else.
 237
 238 @item [ @dots{} ]
 239 @cindex character set (in regexp)
 240 @cindex @samp{[} in regexp
 241 @cindex @samp{]} in regexp
 242 @samp{[} begins a @dfn{character set}, which is terminated by a
 243 @samp{]}.  In the simplest case, the characters between the two brackets
 244 form the set.  Thus, @samp{[ad]} matches either one @samp{a} or one
 245 @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
 246 and @samp{d}s (including the empty string), from which it follows that
 247 @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
 248 @samp{caddaar}, etc.@refill
 249
 250 The usual regular expression special characters are not special inside a
 251 character set.  A completely different set of special characters exists
 252 inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
 253
 254 @samp{-} is used for ranges of characters.  To write a range, write two
 255 characters with a @samp{-} between them.  Thus, @samp{[a-z]} matches any
 256 lower case letter.  Ranges may be intermixed freely with individual
 257 characters, as in @samp{[a-z$%.]}, which matches any lower case letter
 258 or @samp{$}, @samp{%} or a period.@refill
 259
 260 To include a @samp{]} in a character set, make it the first character.
 261 For example, @samp{[]a]} matches @samp{]} or @samp{a}.  To include a
  262 @samp{-}, write @samp{-} as the first character in the set, or put it
  263 immediately after a range.  (You can replace one individual character
  264 @var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
  265 @samp{-}.)  There is no way to write a set containing just @samp{-} and
 266 @samp{]}.
 267
 268 To include @samp{^} in a set, put it anywhere but at the beginning of
 269 the set.
 270
 271 @item [^ @dots{} ]
 272 @cindex @samp{^} in regexp
 273 @samp{[^} begins a @dfn{complement character set}, which matches any
 274 character except the ones specified.  Thus, @samp{[^a-z0-9A-Z]}
 275 matches all characters @emph{except} letters and digits.@refill
 276
 277 @samp{^} is not special in a character set unless it is the first
 278 character.  The character following the @samp{^} is treated as if it
 279 were first (thus, @samp{-} and @samp{]} are not special there).
 280
 281 Note that a complement character set can match a newline, unless
 282 newline is mentioned as one of the characters not to match.
 283
 284 @item ^
 285 @cindex @samp{^} in regexp
 286 @cindex beginning of line in regexp
 287 is a special character that matches the empty string, but only at
 288 the beginning of a line in the text being matched.  Otherwise it fails
 289 to match anything.  Thus, @samp{^foo} matches a @samp{foo} which occurs
 290 at the beginning of a line.
 291
 292 When matching a string, @samp{^} matches at the beginning of the string
 293 or after a newline character @samp{\n}.
 294
 295 @item $
 296 @cindex @samp{$} in regexp
 297 is similar to @samp{^} but matches only at the end of a line.  Thus,
 298 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
 299
 300 When matching a string, @samp{$} matches at the end of the string
 301 or before a newline character @samp{\n}.
 302
 303 @item \
 304 @cindex @samp{\} in regexp
 305 has two functions: it quotes the special characters (including
 306 @samp{\}), and it introduces additional special constructs.
 307
 308 Because @samp{\} quotes special characters, @samp{\$} is a regular
 309 expression which matches only @samp{$}, and @samp{\[} is a regular
 310 expression which matches only @samp{[}, and so on.
 311
 312 Note that @samp{\} also has special meaning in the read syntax of Lisp
 313 strings (@pxref{String Type}), and must be quoted with @samp{\}.  For
 314 example, the regular expression that matches the @samp{\} character is
 315 @samp{\\}.  To write a Lisp string that contains the characters
 316 @samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
 317 @samp{\}.  Therefore, the read syntax for a regular expression matching
 318 @samp{\} is @code{"\\\\"}.@refill
 319 @end table
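
One way to try out the special characters in the table above is to
apply @code{string-match} (described later in this chapter) to short
strings.  The following is only a sketch; the test strings are made up
for illustration, and @code{case-fold-search} is bound to @code{nil} so
that case is significant.  Each value is the index where the match
starts, or @code{nil} if there is no match.

@example
@group
(let ((case-fold-search nil))
  (list
   ;; `a*' first takes three a's, then backs off to two.
   (string-match "ca*ar" "caaar")
   (match-end 0)
   ;; `+' needs at least one `a'; `?' allows at most one.
   (string-match "ca+r" "cr")
   (string-match "ca?r" "caar")
   ;; Character sets, including `]' written first in the set.
   (string-match "c[ad]*r" "caddaar")
   (string-match "[]a]" "x]")
   ;; A complement set can match a newline.
   (string-match "[^a-z0-9A-Z]" "ab\ncd")
   ;; `^' and `$' match next to a newline inside a string.
   (string-match "^foo" "bar\nfoo")
   (string-match "x+$" "axx\nb")
   ;; Four backslashes in Lisp syntax match one backslash.
   (string-match "\\\\" "a\\b")))
     @result{} (0 5 nil nil 0 1 2 4 1 1)
@end group
@end example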
 320
 321 @strong{Please note:} for historical compatibility, special characters
 322 are treated as ordinary ones if they are in contexts where their special
 323 meanings make no sense.  For example, @samp{*foo} treats @samp{*} as
 324 ordinary since there is no preceding expression on which the @samp{*}
 325 can act.  It is poor practice to depend on this behavior; better to
 326 quote the special character anyway, regardless of where it
 327 appears.@refill
 328
 329 For the most part, @samp{\} followed by any character matches only
 330 that character.  However, there are several exceptions: characters
 331 which, when preceded by @samp{\}, are special constructs.  Such
 332 characters are always ordinary when encountered on their own.  Here
 333 is a table of @samp{\} constructs:
 334
 335 @table @kbd
 336 @item \|
 337 @cindex @samp{|} in regexp
 338 @cindex regexp alternative
 339 specifies an alternative.
 340 Two regular expressions @var{a} and @var{b} with @samp{\|} in
 341 between form an expression that matches anything that either @var{a} or
 342 @var{b} matches.@refill
 343
 344 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
 345 but no other string.@refill
 346
 347 @samp{\|} applies to the largest possible surrounding expressions.  Only a
 348 surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
 349 @samp{\|}.@refill
 350
 351 Full backtracking capability exists to handle multiple uses of @samp{\|}.
 352
 353 @item \( @dots{} \)
 354 @cindex @samp{(} in regexp
 355 @cindex @samp{)} in regexp
 356 @cindex regexp grouping
 357 is a grouping construct that serves three purposes:
 358
 359 @enumerate
 360 @item
 361 To enclose a set of @samp{\|} alternatives for other operations.
 362 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
 363
 364 @item
 365 To enclose an expression for a suffix operator such as @samp{*} to act
 366 on.  Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
 367 (zero or more) number of @samp{na} strings.@refill
 368
 369 @item
 370 To record a matched substring for future reference.
 371 @end enumerate
 372
 373 This last application is not a consequence of the idea of a
 374 parenthetical grouping; it is a separate feature which happens to be
 375 assigned as a second meaning to the same @samp{\( @dots{} \)} construct
 376 because there is no conflict in practice between the two meanings.
 377 Here is an explanation of this feature:
 378
 379 @item \@var{digit}
 380 matches the same text which matched the @var{digit}th occurrence of a
 381 @samp{\( @dots{} \)} construct.
 382
  383 In other words, after the end of a @samp{\( @dots{} \)} construct, the
 384 matcher remembers the beginning and end of the text matched by that
 385 construct.  Then, later on in the regular expression, you can use
 386 @samp{\} followed by @var{digit} to match that same text, whatever it
 387 may have been.
 388
 389 The strings matching the first nine @samp{\( @dots{} \)} constructs
 390 appearing in a regular expression are assigned numbers 1 through 9 in
 391 the order that the open parentheses appear in the regular expression.
 392 So you can use @samp{\1} through @samp{\9} to refer to the text matched
 393 by the corresponding @samp{\( @dots{} \)} constructs.
 394
 395 For example, @samp{\(.*\)\1} matches any newline-free string that is
 396 composed of two identical halves.  The @samp{\(.*\)} matches the first
 397 half, which may be anything, but the @samp{\1} that follows must match
 398 the same exact text.
 399
 400 @item \w
 401 @cindex @samp{\w} in regexp
 402 matches any word-constituent character.  The editor syntax table
 403 determines which characters these are.  @xref{Syntax Tables}.
 404
 405 @item \W
 406 @cindex @samp{\W} in regexp
 407 matches any character that is not a word-constituent.
 408
 409 @item \s@var{code}
 410 @cindex @samp{\s} in regexp
 411 matches any character whose syntax is @var{code}.  Here @var{code} is a
 412 character which represents a syntax code: thus, @samp{w} for word
 413 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
 414 etc.  @xref{Syntax Tables}, for a list of syntax codes and the
 415 characters that stand for them.
 416
 417 @item \S@var{code}
 418 @cindex @samp{\S} in regexp
 419 matches any character whose syntax is not @var{code}.
 420 @end table
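
The grouping, alternative and back-reference constructs in the table
above can be exercised as follows.  This is a sketch with made-up test
strings; it uses @code{string-match}, @code{match-beginning} and
@code{match-end}, which are described later in this chapter.

@example
@group
(let ((case-fold-search nil)
      (s "a barx"))
  (list
   ;; The group limits the scope of `\|'.
   (string-match "\\(foo\\|bar\\)x" s)
   ;; The text recorded for the first group.
   (substring s (match-beginning 1) (match-end 1))
   ;; `*' acts on the whole group `na'.
   (string-match "ba\\(na\\)*" "bananana")
   (match-end 0)
   ;; `\1' must repeat whatever the group matched.
   (string-match "\\(.*\\)\\1" "abcabc")
   (match-end 1)))
     @result{} (2 "bar" 0 8 0 3)
@end group
@end example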
 421
 422   These regular expression constructs match the empty string---that is,
 423 they don't use up any characters---but whether they match depends on the
 424 context.
 425
 426 @table @kbd
 427 @item \`
 428 @cindex @samp{\`} in regexp
 429 matches the empty string, but only at the beginning
 430 of the buffer or string being matched against.
 431
 432 @item \'
 433 @cindex @samp{\'} in regexp
 434 matches the empty string, but only at the end of
 435 the buffer or string being matched against.
 436
 437 @item \=
 438 @cindex @samp{\=} in regexp
 439 matches the empty string, but only at point.
 440 (This construct is not defined when matching against a string.)
 441
 442 @item \b
 443 @cindex @samp{\b} in regexp
 444 matches the empty string, but only at the beginning or
 445 end of a word.  Thus, @samp{\bfoo\b} matches any occurrence of
 446 @samp{foo} as a separate word.  @samp{\bballs?\b} matches
 447 @samp{ball} or @samp{balls} as a separate word.@refill
 448
 449 @item \B
 450 @cindex @samp{\B} in regexp
 451 matches the empty string, but @emph{not} at the beginning or
 452 end of a word.
 453
 454 @item \<
 455 @cindex @samp{\<} in regexp
 456 matches the empty string, but only at the beginning of a word.
 457
 458 @item \>
 459 @cindex @samp{\>} in regexp
 460 matches the empty string, but only at the end of a word.
 461 @end table
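
The following sketch contrasts some of these constructs with @samp{^}
and @samp{$}.  It assumes a current Emacs that provides the
@code{with-syntax-table} macro, which is used here together with
@code{standard-syntax-table} only to make the notion of a ``word''
predictable; the test strings are made up.

@example
@group
(with-syntax-table (standard-syntax-table)
  (let ((case-fold-search nil))
    (list
     ;; `ball' inside `football' is not a separate word.
     (string-match "\\bballs?\\b" "football balls")
     ;; \` matches only at the very start; ^ also after a newline.
     (string-match "\\`foo" "bar\nfoo")
     (string-match "^foo" "bar\nfoo")
     ;; \' matches only at the very end; $ also before a newline.
     (string-match "foo\\'" "foo\nbar")
     (string-match "foo$" "foo\nbar")
     ;; \< and \> anchor at the beginning and end of a word.
     (string-match "\\<the" "other the")
     (string-match "the\\>" "then the"))))
     @result{} (9 nil 4 nil 0 6 5)
@end group
@end example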
 462
 463 @kindex invalid-regexp
 464   Not every string is a valid regular expression.  For example, a string
  465 with unbalanced square brackets is invalid (with a few exceptions, such
  466 as @samp{[]]}), and so is a string that ends with a single @samp{\}.  If
 467 an invalid regular expression is passed to any of the search functions,
 468 an @code{invalid-regexp} error is signaled.
 469
 470 @defun regexp-quote string
 471 This function returns a regular expression string that matches exactly
 472 @var{string} and nothing else.  This allows you to request an exact
 473 string match when calling a function that wants a regular expression.
 474
 475 @example
 476 @group
 477 (regexp-quote "^The cat$")
 478      @result{} "\\^The cat\\$"
 479 @end group
 480 @end example
 481
 482 One use of @code{regexp-quote} is to combine an exact string match with
 483 context described as a regular expression.  For example, this searches
 484 for the string which is the value of @code{string}, surrounded by
 485 whitespace:
 486
 487 @example
 488 @group
 489 (re-search-forward
 490  (concat "\\s " (regexp-quote string) "\\s "))
 491 @end group
 492 @end example
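
The difference between a string used directly as a regexp and the same
string passed through @code{regexp-quote} can be seen in this sketch.
The variable name @code{needle} and the test strings are made up for
illustration.

@example
@group
(let ((case-fold-search nil)
      (needle "a.b*"))
  (list
   (regexp-quote needle)
   ;; As a regexp, `.' and `*' are special, so "aab" matches.
   (string-match needle "xaab")
   ;; Quoted, only the literal text "a.b*" matches.
   (string-match (regexp-quote needle) "xaab")
   (string-match (regexp-quote needle) "x a.b*")))
     @result{} ("a\\.b\\*" 1 nil 2)
@end group
@end example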
 493 @end defun
 494
 495 @node Regexp Example
 496 @comment  node-name,  next,  previous,  up
 497 @subsection Complex Regexp Example
 498
 499   Here is a complicated regexp, used by Emacs to recognize the end of a
 500 sentence together with any whitespace that follows.  It is the value of
 501 the variable @code{sentence-end}.
 502
 503   First, we show the regexp as a string in Lisp syntax to distinguish
 504 spaces from tab characters.  The string constant begins and ends with a
 505 double-quote.  @samp{\"} stands for a double-quote as part of the
 506 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
 507 tab and @samp{\n} for a newline.
 508
 509 @example
 510 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
 511 @end example
 512
 513   In contrast, if you evaluate the variable @code{sentence-end}, you
 514 will see the following:
 515
 516 @example
 517 @group
 518 sentence-end
 519 @result{}
 520 "[.?!][]\"')@}]*\\($\\| $\\|  \\|  \\)[
 521 ]*"
 522 @end group
 523 @end example
 524
 525 @noindent
 526 In this output, tab and newline appear as themselves.
 527
 528   This regular expression contains four parts in succession and can be
 529 deciphered as follows:
 530
 531 @table @code
 532 @item [.?!]
 533 The first part of the pattern consists of three characters, a period, a
 534 question mark and an exclamation mark, within square brackets.  The
 535 match must begin with one of these three characters.
 536
 537 @item []\"')@}]*
 538 The second part of the pattern matches any closing braces and quotation
 539 marks, zero or more of them, that may follow the period, question mark
 540 or exclamation mark.  The @code{\"} is Lisp syntax for a double-quote in
 541 a string.  The @samp{*} at the end indicates that the immediately
 542 preceding regular expression (a character set, in this case) may be
 543 repeated zero or more times.
 544
  545 @item \\($\\|@ $\\|\t\\|@ @ \\)
  546 The third part of the pattern matches the whitespace that follows the
  547 end of a sentence: the end of a line (optionally preceded by a space),
  548 or a tab, or two spaces.  The double backslashes mark the parentheses
  549 and vertical bars as regular expression syntax; the parentheses mark the
  550 group and the vertical bars separate alternatives.  The dollar sign is
  551 used to match the end of a line.
 552
 553 @item [ \t\n]*
 554 Finally, the last part of the pattern matches any additional whitespace
 555 beyond the minimum needed to end a sentence.
 556 @end table
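
To see the four parts working together, the regexp can be applied to a
few strings with @code{string-match}.  This is only a sketch: the local
variable @code{re} holds a copy of the string shown above (rather than
the value of @code{sentence-end} in any particular Emacs), and the test
strings are made up.

@example
@group
(let ((case-fold-search nil)
      (re "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"))
  (list
   ;; Period, closing parenthesis, then two spaces.
   (string-match re "He left.)  Then")
   (match-end 0)
   ;; Exclamation mark at the end of a line; the newline is
   ;; consumed by the last part of the pattern.
   (string-match re "Done!\nNext")
   (match-end 0)
   ;; A period followed by a single space does not end a sentence.
   (string-match re "Dr. Smith")))
     @result{} (7 11 4 6 nil)
@end group
@end example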
 557
 558 @node Regexp Search
 559 @section Regular Expression Searching
 560 @cindex regular expression searching
 561 @cindex regexp searching
 562 @cindex searching for regexp
 563
 564   In GNU Emacs, you can search for the next match for a regexp either
 565 incrementally or not.  For incremental search commands, see @ref{Regexp
 566 Search, , Regular Expression Search, emacs, The GNU Emacs Manual}.  Here
 567 we describe only the search functions useful in programs.  The principal
 568 one is @code{re-search-forward}.
 569
 570 @deffn Command re-search-forward regexp &optional limit noerror repeat
 571 This function searches forward in the current buffer for a string of
 572 text that is matched by the regular expression @var{regexp}.  The
 573 function skips over any amount of text that is not matched by
 574 @var{regexp}, and leaves point at the end of the first match found.
 575 It returns the new value of point.
 576
 577 If @var{limit} is non-@code{nil} (it must be a position in the current
 578 buffer), then it is the upper bound to the search.  No match extending
 579 after that position is accepted.
 580
 581 What happens when the search fails depends on the value of
 582 @var{noerror}.  If @var{noerror} is @code{nil}, a @code{search-failed}
 583 error is signaled.  If @var{noerror} is @code{t},
 584 @code{re-search-forward} does nothing and returns @code{nil}.  If
 585 @var{noerror} is neither @code{nil} nor @code{t}, then
 586 @code{re-search-forward} moves point to @var{limit} (or the end of the
 587 buffer) and returns @code{nil}.
 588
 589 If @var{repeat} is supplied (it must be a positive number), then the
 590 search is repeated that many times (each time starting at the end of the
 591 previous time's match).  If these successive searches succeed, the
 592 function succeeds, moving point and returning its new value.  Otherwise
 593 the search fails.
 594
 595 In the following example, point is initially before the @samp{T}.
 596 Evaluating the search call moves point to the end of that line (between
 597 the @samp{t} of @samp{hat} and the newline).
 598
 599 @example
 600 @group
 601 ---------- Buffer: foo ----------
 602 I read "@point{}The cat in the hat
 603 comes back" twice.
 604 ---------- Buffer: foo ----------
 605 @end group
 606
 607 @group
 608 (re-search-forward "[a-z]+" nil t 5)
 609      @result{} 27
 610
 611 ---------- Buffer: foo ----------
 612 I read "The cat in the hat@point{}
 613 comes back" twice.
 614 ---------- Buffer: foo ----------
 615 @end group
 616 @end example
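
A common way to use @code{re-search-forward} in programs is to call it
in a loop with @var{noerror} set to @code{t}, so that the loop ends when
no further match is found.  The following sketch collects every match in
the buffer text used above; it assumes a current Emacs that provides
@code{with-temp-buffer} and @code{push}, and the variable name
@code{words} is made up.

@example
@group
(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char (point-min))
  (let ((case-fold-search nil)
        (words nil))
    ;; Each successful search leaves point after the match,
    ;; so the next iteration continues from there.
    (while (re-search-forward "[a-zA-Z]+" nil t)
      (push (buffer-substring (match-beginning 0) (match-end 0))
            words))
    (nreverse words)))
     @result{} ("I" "read" "The" "cat" "in" "the" "hat"
         "comes" "back" "twice")
@end group
@end example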
 617 @end deffn
 618
 619 @deffn Command re-search-backward regexp &optional limit noerror repeat
 620 This function searches backward in the current buffer for a string of
 621 text that is matched by the regular expression @var{regexp}, leaving
 622 point at the beginning of the first text found.
 623
 624 This function is analogous to @code{re-search-forward}, but they are
 625 not simple mirror images.  @code{re-search-forward} finds the match
 626 whose beginning is as close as possible.  If @code{re-search-backward}
 627 were a perfect mirror image, it would find the match whose end is as
 628 close as possible.  However, in fact it finds the match whose beginning
 629 is as close as possible.  The reason is that matching a regular
 630 expression at a given spot always works from beginning to end, and is
 631 done at a specified beginning position.
 632
 633 A true mirror-image of @code{re-search-forward} would require a special
 634 feature for matching regexps from end to beginning.  It's not worth the
 635 trouble of implementing that.
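
The asymmetry can be seen in a small sketch (assuming a current Emacs
with @code{with-temp-buffer}; the buffer text is made up).  Searching
backward from the end of the buffer for @samp{a+} finds only the last
@samp{a}, since that is the match whose beginning is closest to point,
whereas the forward search from the beginning of the buffer matches the
whole run.

@example
@group
(with-temp-buffer
  (insert "xaaaa")
  (list
   ;; Backward: the match begins at 5 and covers one `a'.
   (progn (goto-char (point-max))
          (re-search-backward "a+"))
   (match-end 0)
   ;; Forward: the match begins at 2 and covers all four.
   (progn (goto-char (point-min))
          (re-search-forward "a+"))
   (match-beginning 0)))
     @result{} (5 6 6 2)
@end group
@end example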
 636 @end deffn
 637
 638 @defun string-match regexp string &optional start
 639 This function returns the index of the start of the first match for
 640 the regular expression @var{regexp} in @var{string}, or @code{nil} if
 641 there is no match.  If @var{start} is non-@code{nil}, the search starts
 642 at that index in @var{string}.
 643
 644 For example,
 645
 646 @example
 647 @group
 648 (string-match
 649  "quick" "The quick brown fox jumped quickly.")
 650      @result{} 4
 651 @end group
 652 @group
 653 (string-match
 654  "quick" "The quick brown fox jumped quickly." 8)
 655      @result{} 27
 656 @end group
 657 @end example
 658
 659 @noindent
 660 The index of the first character of the
 661 string is 0, the index of the second character is 1, and so on.
 662
 663 After this function returns, the index of the first character beyond
 664 the match is available as @code{(match-end 0)}.  @xref{Match Data}.
 665
 666 @example
 667 @group
 668 (string-match
 669  "quick" "The quick brown fox jumped quickly." 8)
 670      @result{} 27
 671 @end group
 672
 673 @group
 674 (match-end 0)
 675      @result{} 32
 676 @end group
 677 @end example
 678 @end defun
 679
 680 @defun looking-at regexp
 681 This function determines whether the text in the current buffer directly
 682 following point matches the regular expression @var{regexp}.  ``Directly
 683 following'' means precisely that: the search is ``anchored'' and it can
 684 succeed only starting with the first character following point.  The
 685 result is @code{t} if so, @code{nil} otherwise.
 686
 687 This function does not move point, but it updates the match data, which
 688 you can access using @code{match-beginning} and @code{match-end}.
 689 @xref{Match Data}.
 690
 691 In this example, point is located directly before the @samp{T}.  If it
 692 were anywhere else, the result would be @code{nil}.
 693
 694 @example
 695 @group
 696 ---------- Buffer: foo ----------
 697 I read "@point{}The cat in the hat
 698 comes back" twice.
 699 ---------- Buffer: foo ----------
 700
 701 (looking-at "The cat in the hat$")
 702      @result{} t
 703 @end group
 704 @end example
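
The same situation can be reproduced in a program, as in the following
sketch (assuming a current Emacs with @code{with-temp-buffer}).  It
also shows that the match is anchored at point, that point does not
move, and that the match data is updated.

@example
@group
(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char 9)                    ; directly before the `T'
  (let ((case-fold-search nil))
    (list
     ;; "cat" occurs later on the line, but not at point.
     (looking-at "cat")
     (looking-at "The cat in the hat$")
     (point)
     (match-end 0))))
     @result{} (nil t 9 27)
@end group
@end example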
 705 @end defun
 706
 707 @ignore
 708 @deffn Command delete-matching-lines regexp
 709 This function is identical to @code{delete-non-matching-lines}, save
 710 that it deletes what @code{delete-non-matching-lines} keeps.
 711
 712 In the example below, point is located on the first line of text.
 713
 714 @example
 715 @group
 716 ---------- Buffer: foo ----------
 717 We hold these truths
 718 to be self-evident,
 719 that all men are created
 720 equal, and that they are
 721 ---------- Buffer: foo ----------
 722 @end group
 723
 724 @group
 725 (delete-matching-lines "the")
 726      @result{} nil
 727
 728 ---------- Buffer: foo ----------
 729 to be self-evident,
 730 that all men are created
 731 ---------- Buffer: foo ----------
 732 @end group
 733 @end example
 734 @end deffn
 735
 736 @deffn Command flush-lines regexp
 737 This function is the same as @code{delete-matching-lines}.
 738 @end deffn
 739
 740 @defun delete-non-matching-lines regexp
 741 This function deletes all lines following point which don't
 742 contain a match for the regular expression @var{regexp}.
 743 @end defun
 744
 745 @deffn Command keep-lines regexp
 746 This function is the same as @code{delete-non-matching-lines}.
 747 @end deffn
 748
 749 @deffn Command how-many regexp
 750 This function counts the number of matches for @var{regexp} there are in
 751 the current buffer following point.  It prints this number in
 752 the echo area, returning the string printed.
 753 @end deffn
 754
 755 @deffn Command count-matches regexp
 756 This function is a synonym of @code{how-many}.
 757 @end deffn
 758
 759 @deffn Command list-matching-lines regexp nlines
 760 This function is a synonym of @code{occur}.
 761 Show all lines following point containing a match for @var{regexp}.
 762 Display each line with @var{nlines} lines before and after,
 763 or @code{-}@var{nlines} before if @var{nlines} is negative.
 764 @var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
 765 Interactively it is the prefix arg.
 766
 767 The lines are shown in a buffer named @samp{*Occur*}.
 768 It serves as a menu to find any of the occurrences in this buffer.
 769 @kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
 770 @end deffn
 771
 772 @defopt list-matching-lines-default-context-lines
 773 Default value is 0.
 774 Default number of context lines to include around a @code{list-matching-lines}
 775 match.  A negative number means to include that many lines before the match.
 776 A positive number means to include that many lines both before and after.
 777 @end defopt
 778 @end ignore
 779
 780 @node Search and Replace
 781 @section Search and Replace
 782 @cindex replacement
 783
 784 @defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map
 785 This function is the guts of @code{query-replace} and related commands.
 786 It searches for occurrences of @var{from-string} and replaces some or
 787 all of them.  If @var{query-flag} is @code{nil}, it replaces all
 788 occurrences; otherwise, it asks the user what to do about each one.
 789
 790 If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
 791 considered a regular expression; otherwise, it must match literally.  If
 792 @var{delimited-flag} is non-@code{nil}, then only occurrences
 793 surrounded by word boundaries are considered.
 794
 795 The argument @var{replacements} specifies what to replace occurrences
 796 with.  If it is a string, that string is used.  It can also be a list of
 797 strings, to be used in cyclic order.
 798
 799 If @var{repeat-count} is non-@code{nil}, it should be an integer, the
 800 number of occurrences to consider.  In this case, @code{perform-replace}
 801 returns after considering that many occurrences.
 802
 803 Normally, the keymap @code{query-replace-map} defines the possible user
 804 responses.  The argument @var{map}, if non-@code{nil}, is a keymap to
 805 use instead of @code{query-replace-map}.
 806 @end defun
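
  As a minimal sketch of a non-interactive call, the following replaces
every literal occurrence of @samp{cat} after point with @samp{dog},
without querying.  The two strings are placeholders chosen for
illustration; the argument order follows the definition above.

@example
@group
;; @r{Literal, unqueried replacement starting from point.}
(perform-replace "cat" "dog"
                 nil    ; @r{@var{query-flag}: don't ask the user}
                 nil    ; @r{@var{regexp-flag}: match literally}
                 nil)   ; @r{@var{delimited-flag}: any occurrence}
@end group
@end example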
 807
 808 @defvar query-replace-map
 809 This variable holds a special keymap that defines the valid user
 810 responses for @code{query-replace} and related functions, as well as
 811 @code{y-or-n-p} and @code{map-y-or-n-p}.  It is unusual in two ways:
 812
 813 @itemize @bullet
 814 @item
 815 The ``key bindings'' are not commands, just symbols that are meaningful
 816 to the functions that use this map.
 817
 818 @item
 819 Prefix keys are not supported; each key binding must be for a single-event
 820 key sequence.  This is because the functions don't use @code{read-key-sequence}
 821 to get the input; instead, they read a single event and look it up ``by hand.''
 822 @end itemize
 823 @end defvar
 824
 825 Here are the meaningful ``bindings'' for @code{query-replace-map}.
 826 Several of them are meaningful only for @code{query-replace} and
 827 friends.
 828
 829 @table @code
 830 @item act
 831 Do take the action being considered---in other words, ``yes.''
 832
 833 @item skip
 834 Do not take action for this question---in other words, ``no.''
 835
 836 @item exit
 837 Answer this question ``no,'' and don't ask any more.
 838
 839 @item act-and-exit
 840 Answer this question ``yes,'' and don't ask any more.
 841
 842 @item act-and-show
 843 Answer this question ``yes,'' but show the results---don't advance yet
 844 to the next question.
 845
 846 @item automatic
 847 Answer this question and all subsequent questions in the series with
 848 ``yes,'' without further user interaction.
 849
 850 @item backup
 851 Move back to the previous place about which a question was asked.
 852
 853 @item edit
 854 Enter a recursive edit to deal with this question---instead of any
 855 other action that would normally be taken.
 856
 857 @item delete-and-edit
 858 Delete the text being considered, then enter a recursive edit to replace
 859 it.
 860
 861 @item recenter
 862 Redisplay and center the window, then ask the same question again.
 863
 864 @item quit
 865 Perform a quit right away.  Only @code{y-or-n-p} and related functions
 866 use this answer.
 867
 868 @item help
 869 Display some help, then ask again.
 870 @end table
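
  Because the ``bindings'' are plain symbols, a program can look up a
key in the map and dispatch on the result.  The sketch below assumes
the standard bindings, in which @kbd{y} means @code{act} and @kbd{n}
means @code{skip}; a customized map may give different answers.

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
(lookup-key query-replace-map "n")
     @result{} skip
@end group
@end example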
 871
 872 @node Match Data
 873 @section The Match Data
 874 @cindex match data
 875
 876   Emacs keeps track of the positions of the start and end of segments of
 877 text found during a regular expression search.  This means, for example,
 878 that you can search for a complex pattern, such as a date in an Rmail
 879 message, and then extract parts of the match under control of the
 880 pattern.
 881
 882   Because the match data normally describe the most recent search only,
 883 you must be careful not to do another search inadvertently between the
 884 search you wish to refer back to and the use of the match data.  If you
 885 can't avoid another intervening search, you must save and restore the
 886 match data around it, to prevent it from being overwritten.
 887
 888 @menu
 889 * Simple Match Data::     Accessing single items of match data,
 890                             such as where a particular subexpression started.
 891 * Replacing Match::       Replacing a substring that was matched.
 892 * Entire Match Data::     Accessing the entire match data at once, as a list.
 893 * Saving Match Data::     Saving and restoring the match data.
 894 @end menu
 895
 896 @node Simple Match Data
 897 @subsection Simple Match Data Access
 898
 899   This section explains how to use the match data to find the starting
 900 point or ending point of the text that was matched by a particular
 901 search, or by a particular parenthetical subexpression of a regular
 902 expression.
 903
 904 @defun match-beginning count
 905 This function returns the position of the start of text matched by the
 906 last regular expression searched for, or a subexpression of it.
 907
 908 The argument @var{count}, a number, specifies a subexpression whose
 909 start position is the value.  If @var{count} is zero, then the value is
 910 the position of the start of the text matched by the whole regexp.  If
 911 @var{count} is greater than zero, then the value is the position of the
 912 beginning of the text matched by the @var{count}th subexpression.
 913
 914 Subexpressions of a regular expression are those expressions grouped
 915 inside of parentheses, @samp{\(@dots{}\)}.  The @var{count}th
 916 subexpression is found by counting occurrences of @samp{\(} from the
 917 beginning of the whole regular expression.  The first subexpression is
 918 numbered 1, the second 2, and so on.
 919
 920 The value is @code{nil} for a parenthetical grouping inside of a
 921 @samp{\|} alternative that wasn't used in the match.
 922 @end defun
 923
 924 @defun match-end count
 925 This function returns the position of the end of the text that matched
 926 the last regular expression searched for, or a subexpression of it.
 927 This function is otherwise similar to @code{match-beginning}.
 928 @end defun
 929
 930   Here is an example of using the match data, with a comment showing the
 931 positions within the text:
 932
 933 @example
 934 @group
 935 (string-match "\\(qu\\)\\(ick\\)"
 936               "The quick fox jumped quickly.")
 937               ;0123456789
 938      @result{} 4
 939 @end group
 940
 941 @group
 942 (match-beginning 1)       ; @r{The beginning of the match}
 943      @result{} 4                 ;   @r{with @samp{qu} is at index 4.}
 944 @end group
 945
 946 @group
 947 (match-beginning 2)       ; @r{The beginning of the match}
 948      @result{} 6                 ;   @r{with @samp{ick} is at index 6.}
 949 @end group
 950
 951 @group
 952 (match-end 1)             ; @r{The end of the match}
 953      @result{} 6                 ;   @r{with @samp{qu} is at index 6.}
 954
 955 (match-end 2)             ; @r{The end of the match}
 956      @result{} 9                 ;   @r{with @samp{ick} is at index 9.}
 957 @end group
 958 @end example
 959
 960   Here is another example.  Point is initially located at the beginning
 961 of the line.  Searching moves point to between the space and the word
 962 @samp{in}.  The beginning of the entire match is at the 9th character of
 963 the buffer (@samp{T}), and the beginning of the match for the first
 964 subexpression is at the 13th character (@samp{c}).
 965
 966 @example
 967 @group
 968 (list
 969   (re-search-forward "The \\(cat \\)")
 970   (match-beginning 0)
 971   (match-beginning 1))
 972     @result{} (17 9 13)
 973 @end group
 974
 975 @group
 976 ---------- Buffer: foo ----------
 977 I read "The cat @point{}in the hat comes back" twice.
 978         ^   ^
 979         9  13
 980 ---------- Buffer: foo ----------
 981 @end group
 982 @end example
 983
 984 @noindent
 985 (In this case, the index returned is a buffer position; the first
 986 character of the buffer counts as 1.)
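
  A common use of these functions is to extract the text of a
subexpression from the buffer.  The following sketch assumes a
hypothetical helper named @code{my-next-year}; it returns the next run
of four digits after point as a string, or @code{nil} if there is none.

@example
@group
(defun my-next-year ()
  "Return the next four-digit number after point, or nil."
  ;; @r{The third argument @code{t} makes a failed search return @code{nil}.}
  (if (re-search-forward
       "\\([0-9][0-9][0-9][0-9]\\)" nil t)
      ;; @r{Use the match data at once, before any other search.}
      (buffer-substring (match-beginning 1) (match-end 1))))
@end group
@end example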
 987
 988 @node Replacing Match
 989 @subsection Replacing the Text That Matched
 990
 991   This function replaces the text matched by the last search with
 992 @var{replacement}.
 993
 994 @cindex case in replacements
 995 @defun replace-match replacement &optional fixedcase literal
 996 This function replaces the buffer text matched by the last search, with
 997 @var{replacement}.  It applies only to buffers; you can't use
 998 @code{replace-match} to replace a substring found with
 999 @code{string-match}.
1000
1001 If @var{fixedcase} is non-@code{nil}, then the case of the replacement
1002 text is not changed; otherwise, the replacement text is converted to a
1003 different case depending upon the capitalization of the text to be
1004 replaced.  If the original text is all upper case, the replacement text
1005 is converted to upper case.  If the first word of the original text is
1006 capitalized, then the first word of the replacement text is capitalized.
1007 If the original text contains just one word, and that word is a capital
1008 letter, @code{replace-match} considers this a capitalized first word
1009 rather than all upper case.
1010
1011 If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
1012 exactly as it is, the only alterations being case changes as needed.
1013 If it is @code{nil} (the default), then the character @samp{\} is treated
1014 specially.  If a @samp{\} appears in @var{replacement}, then it must be
1015 part of one of the following sequences:
1016
1017 @table @asis
1018 @item @samp{\&}
1019 @cindex @samp{&} in replacement
1020 @samp{\&} stands for the entire text being replaced.
1021
1022 @item @samp{\@var{n}}
1023 @cindex @samp{\@var{n}} in replacement
1024 @samp{\@var{n}} stands for the text that matched the @var{n}th
1025 subexpression in the original regexp.  Subexpressions are those
1026 expressions grouped inside of @samp{\(@dots{}\)}.  @var{n} is a digit.
1027
1028 @item @samp{\\}
1029 @cindex @samp{\} in replacement
1030 @samp{\\} stands for a single @samp{\} in the replacement text.
1031 @end table
1032
1033 @code{replace-match} leaves point at the end of the replacement text,
1034 and returns @code{t}.
1035 @end defun
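
  For instance, here is a sketch that swaps the two sides of every
@samp{@var{name}=@var{value}} pair after point, using @samp{\1} and
@samp{\2} in the replacement.  The pattern is an assumption made for
illustration; @var{fixedcase} is @code{t} so that the case of the
rearranged text is left alone.

@example
@group
(while (re-search-forward
        "\\([a-z]+\\)=\\([a-z]+\\)" nil t)
  ;; @r{In Lisp syntax, @samp{\\2} and @samp{\\1} denote @samp{\2} and @samp{\1}.}
  (replace-match "\\2=\\1" t))
@end group
@end example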
1036
1037 @node Entire Match Data
1038 @subsection Accessing the Entire Match Data
1039
1040   The functions @code{match-data} and @code{set-match-data} read or
1041 write the entire match data, all at once.
1042
1043 @defun match-data
1044 This function returns a newly constructed list containing all the
1045 information on what text the last search matched.  Element zero is the
1046 position of the beginning of the match for the whole expression; element
1047 one is the position of the end of the match for the expression.  The
1048 next two elements are the positions of the beginning and end of the
1049 match for the first subexpression, and so on.  In general, element
1050 @ifinfo
1051 number 2@var{n}
1052 @end ifinfo
1053 @tex
1054 number {\mathsurround=0pt $2n$}
1055 @end tex
1056 corresponds to @code{(match-beginning @var{n})}; and
1057 element
1058 @ifinfo
1059 number 2@var{n} + 1
1060 @end ifinfo
1061 @tex
1062 number {\mathsurround=0pt $2n+1$}
1063 @end tex
1064 corresponds to @code{(match-end @var{n})}.
1065
1066 All the elements are markers or @code{nil} if matching was done on a
1067 buffer, and all are integers or @code{nil} if matching was done on a
1068 string with @code{string-match}.  (In Emacs 18 and earlier versions,
1069 markers were used even for matching on a string, except in the case
1070 of the integer 0.)
1071
1072 As always, there must be no possibility of intervening searches between
1073 the call to a search function and the call to @code{match-data} that is
1074 intended to access the match data for that search.
1075
1076 @example
1077 @group
1078 (match-data)
1079      @result{}  (#<marker at 9 in foo>
1080           #<marker at 17 in foo>
1081           #<marker at 13 in foo>
1082           #<marker at 17 in foo>)
1083 @end group
1084 @end example
1085 @end defun
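
  After matching on a string, the elements are integers.  This sketch
reuses the strings from the @code{string-match} example given with
@code{match-beginning}; the list pairs up the start and end of the
whole match and then of each subexpression.

@example
@group
(string-match "\\(qu\\)\\(ick\\)"
              "The quick fox jumped quickly.")
     @result{} 4
(match-data)
     @result{} (4 9 4 6 6 9)
@end group
@end example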
1086
1087 @defun set-match-data match-list
1088 This function sets the match data from the elements of @var{match-list},
1089 which should be a list that was the value of a previous call to
1090 @code{match-data}.
1091
1092 If @var{match-list} refers to a buffer that doesn't exist, you don't get
1093 an error; that sets the match data in a meaningless but harmless way.
1094
1095 @findex store-match-data
1096 @code{store-match-data} is an alias for @code{set-match-data}.
1097 @end defun
1098
1099 @node Saving Match Data
1100 @subsection Saving and Restoring the Match Data
1101
1102   All asynchronous process functions (filters and sentinels) and
1103 functions that use @code{recursive-edit} should save and restore the
1104 match data if they do a search or if they let the user type arbitrary
1105 commands.  Saving the match data is useful in other cases as
1106 well---whenever you want to access the match data resulting from an
1107 earlier search, notwithstanding another intervening search.
1108
1109   This example shows the problem that can arise if you fail to
1110 attend to this requirement:
1111
1112 @example
1113 @group
1114 (re-search-forward "The \\(cat \\)")
1115      @result{} 48
1116 (foo)                   ; @r{Perhaps @code{foo} does}
1117                         ;   @r{more searching.}
1118 (match-end 0)
1119      @result{} 61              ; @r{Unexpected result---not 48!}
1120 @end group
1121 @end example
1122
1123   In Emacs versions 19 and later, you can save and restore the match
1124 data with @code{save-match-data}:
1125
1126 @defspec save-match-data body@dots{}
1127 This special form executes @var{body}, saving and restoring the match
1128 data around it.  This is useful if you wish to do a search without
1129 altering the match data that resulted from an earlier search.
1130 @end defspec
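
  As a sketch, the problem shown above goes away if the call to
@code{foo} (a stand-in for any function that may search) is wrapped in
@code{save-match-data}:

@example
@group
(re-search-forward "The \\(cat \\)")
     @result{} 48
(save-match-data
  (foo))                ; @r{@code{foo} may search freely.}
(match-end 0)
     @result{} 48              ; @r{Still describes the first search.}
@end group
@end example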
1131
1132   You can use @code{set-match-data} together with @code{match-data} to
1133 imitate the effect of the special form @code{save-match-data}.  This is
1134 useful for writing code that can run in Emacs 18.  Here is how:
1135
1136 @example
1137 @group
1138 (let ((data (match-data)))
1139   (unwind-protect
1140       @dots{}   ; @r{May change the original match data.}
1141     (set-match-data data)))
1142 @end group
1143 @end example
1144
1145 @ignore
1146   Here is a function which restores the match data provided the buffer
1147 associated with it still exists.
1148
1149 @smallexample
1150 @group
1151 (defun restore-match-data (data)
1152 @c It is incorrect to split the first line of a doc string.
1153 @c If there's a problem here, it should be solved in some other way.
1154   "Restore the match data DATA unless the buffer is missing."
1155   (catch 'foo
1156     (let ((d data))
1157 @end group
1158       (while d
1159         (and (car d)
1160              (null (marker-buffer (car d)))
1161 @group
1162              ;; @file{match-data} @r{buffer is deleted.}
1163              (throw 'foo nil))
1164         (setq d (cdr d)))
1165       (set-match-data data))))
1166 @end group
1167 @end smallexample
1168 @end ignore
1169
1170 @node Searching and Case
1171 @section Searching and Case
1172 @cindex searching and case
1173
1174   By default, searches in Emacs ignore the case of the text they are
1175 searching through; if you specify searching for @samp{FOO}, then
1176 @samp{Foo} or @samp{foo} is also considered a match.  Regexps, and in
1177 particular character sets, are included: thus, @samp{[aB]} would match
1178 @samp{a} or @samp{A} or @samp{b} or @samp{B}.
1179
1180   If you do not want this feature, set the variable
1181 @code{case-fold-search} to @code{nil}.  Then all letters must match
1182 exactly, including case.  This is a buffer-local variable; altering
1183 the variable affects only the current buffer.  (@xref{Intro to
1184 Buffer-Local}.)  Alternatively, you may change the value of
1185 @code{default-case-fold-search}, which is the default value of
1186 @code{case-fold-search} for buffers that do not override it.
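
  To control case sensitivity for one search only, a program can bind
@code{case-fold-search} with @code{let} rather than set it.  Here is a
minimal sketch, using @code{string-match} on sample strings:

@example
@group
(let ((case-fold-search nil))
  (string-match "foo" "FOO"))
     @result{} nil
@end group

@group
(let ((case-fold-search t))
  (string-match "foo" "FOO"))
     @result{} 0
@end group
@end example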
1187
1188   Note that the user-level incremental search feature handles case
1189 distinctions differently.  When given a lower case letter, it looks for
1190 a match of either case, but when given an upper case letter, it looks
1191 for an upper case letter only.  But this has nothing to do with the
1192 search functions that Lisp programs use.
1193
1194 @defopt case-replace
1195 This variable determines whether @code{query-replace} should preserve
1196 case in replacements.  If the variable is @code{nil}, then
1197 @code{replace-match} should not try to convert case.
1198 @end defopt
1199
1200 @defopt case-fold-search
1201 This buffer-local variable determines whether searches should ignore
1202 case.  If the variable is @code{nil} they do not ignore case; otherwise
1203 they do ignore case.
1204 @end defopt
1205
1206 @defvar default-case-fold-search
1207 The value of this variable is the default value for
1208 @code{case-fold-search} in buffers that do not override it.  This is the
1209 same as @code{(default-value 'case-fold-search)}.
1210 @end defvar
1211
1212 @node Standard Regexps
1213 @section Standard Regular Expressions Used in Editing
1214 @cindex regexps used standardly in editing
1215 @cindex standard regexps used in editing
1216
1217   This section describes some variables that hold regular expressions
1218 used for certain purposes in editing:
1219
1220 @defvar page-delimiter
1221 This is the regexp describing line-beginnings that separate pages.  The
1222 default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"}).
1223 @end defvar
1224
1225 @defvar paragraph-separate
1226 This is the regular expression for recognizing the beginning of a line
1227 that separates paragraphs.  (If you change this, you may have to
1228 change @code{paragraph-start} also.)  The default value is @code{"^[
1229 \t\f]*$"}, which is a line that consists entirely of spaces, tabs, and
1230 form feeds.
1231 @end defvar
1232
1233 @defvar paragraph-start
1234 This is the regular expression for recognizing the beginning of a line
1235 that starts @emph{or} separates paragraphs.  The default value is
1236 @code{"^[ \t\n\f]"}, which matches a line starting with a space, tab,
1237 newline, or form feed.
1238 @end defvar
1239
1240 @defvar sentence-end
1241 This is the regular expression describing the end of a sentence.  (All
1242 paragraph boundaries also end sentences, regardless.)  The default value
1243 is:
1244
1245 @example
1246 "[.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*"
1247 @end example
1248
1249 This means a period, question mark or exclamation mark, optionally followed
1250 by closing delimiters, followed by tabs, spaces or newlines.
1251
1252 For a detailed explanation of this regular expression, see @ref{Regexp
1253 Example}.
1254 @end defvar
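
  These variables can be passed directly to the regexp search
functions.  As a sketch, the hypothetical function
@code{my-count-pages} below counts the pages in the current buffer as
one more than the number of matches for @code{page-delimiter}.  It
assumes that @code{page-delimiter} never matches the empty string, which
holds for the default value.

@example
@group
(defun my-count-pages ()
  "Return the number of pages in the current buffer."
  (save-excursion                 ; @r{Leave point where it was.}
    (goto-char (point-min))
    (let ((count 1))
      ;; @r{Each delimiter found starts one more page.}
      (while (re-search-forward page-delimiter nil t)
        (setq count (1+ count)))
      count)))
@end group
@end example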