2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
4 @c See the file elisp.texi for copying conditions.
5 @setfilename ../info/searching
6 @node Searching and Matching, Syntax Tables, Text, Top
7 @chapter Searching and Matching
GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches.  After a
regular expression search, you can examine the @dfn{match data} to
determine which text matched the whole regular expression or various
portions of it.
@menu
* String Search::         Search for an exact match.
* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.
* Search and Replace::    Internals of @code{query-replace}.
* Match Data::            Finding out which part of the text matched
                            various parts of a regexp, after regexp search.
* Searching and Case::    Case-independent or case-significant searching.
* Standard Regexps::      Useful regexps for finding sentences, pages,...
@end menu
27 The @samp{skip-chars@dots{}} functions also perform a kind of searching.
28 @xref{Skipping Characters}.
31 @section Searching for Strings
These are the primitive functions for searching through the text in a
buffer.  They are meant for use in programs, but you may call them
interactively.  If you do so, they prompt for the search string;
@var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
is set to 1.
40 @deffn Command search-forward string &optional limit noerror repeat
41 This function searches forward from point for an exact match for
42 @var{string}. If successful, it sets point to the end of the occurrence
43 found, and returns the new value of point. If no match is found, the
44 value and side effects depend on @var{noerror} (see below).
In the following example, point is initially at the beginning of the
line.  Then @code{(search-forward "fox")} moves point after the last
character of @samp{fox}:

@example
@group
---------- Buffer: foo ----------
@point{}The quick brown fox jumped over the lazy dog.
---------- Buffer: foo ----------
@end group

@group
(search-forward "fox")
     @result{} 20
@end group

@group
---------- Buffer: foo ----------
The quick brown fox@point{} jumped over the lazy dog.
---------- Buffer: foo ----------
@end group
@end example
68 The argument @var{limit} specifies the upper bound to the search. (It
69 must be a position in the current buffer.) No match extending after
70 that position is accepted. If @var{limit} is omitted or @code{nil}, it
71 defaults to the end of the accessible portion of the buffer.
74 What happens when the search fails depends on the value of
75 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
76 error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
77 returns @code{nil} and does nothing. If @var{noerror} is neither
78 @code{nil} nor @code{t}, then @code{search-forward} moves point to the
79 upper bound and returns @code{nil}. (It would be more consistent now
80 to return the new position of point in that case, but some programs
81 may depend on a value of @code{nil}.)
83 If @var{repeat} is non-@code{nil}, then the search is repeated that
84 many times. Point is positioned at the end of the last match.
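
The interaction of @var{limit}, @var{noerror} and @var{repeat} can be
sketched as follows.  This is only an illustration: the buffer text is
made up, and it assumes the macro @code{with-temp-buffer} is available
to create a scratch buffer.

@example
(with-temp-buffer
  (insert "one fish two fish red fish blue fish")
  (goto-char (point-min))
  (list
   ;; REPEAT = 2: stop after the second "fish".
   (search-forward "fish" nil t 2)
   ;; LIMIT = 20 excludes "blue"; NOERROR = t returns nil
   ;; and leaves point alone.
   (search-forward "blue" 20 t)
   (point)
   ;; Any other non-nil NOERROR also returns nil,
   ;; but moves point to LIMIT.
   (search-forward "blue" 20 'move)
   (point)))
     @result{} (18 nil 18 nil 20)
@end example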
87 @deffn Command search-backward string &optional limit noerror repeat
88 This function searches backward from point for @var{string}. It is
89 just like @code{search-forward} except that it searches backwards and
90 leaves point at the beginning of the match.
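
As a brief sketch (the buffer text is illustrative, and
@code{with-temp-buffer} is assumed to be available), a backward search
started from the end of the buffer leaves point before the match:

@example
(with-temp-buffer
  (insert "The quick brown fox")
  ;; After `insert', point is at the end of the buffer.
  (search-backward "quick")
  (point))
     @result{} 5
@end example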
93 @deffn Command word-search-forward string &optional limit noerror repeat
95 This function searches forward from point for a ``word'' match for
96 @var{string}. If it finds a match, it sets point to the end of the
97 match found, and returns the new value of point.
100 Word matching regards @var{string} as a sequence of words, disregarding
101 punctuation that separates them. It searches the buffer for the same
102 sequence of words. Each word must be distinct in the buffer (searching
103 for the word @samp{ball} does not match the word @samp{balls}), but the
104 details of punctuation and spacing are ignored (searching for @samp{ball
105 boy} does match @samp{ball. Boy!}).
107 In this example, point is initially at the beginning of the buffer; the
108 search leaves it between the @samp{y} and the @samp{!}.
---------- Buffer: foo ----------
@point{}He said "Please!  Find
the ball boy!"
---------- Buffer: foo ----------
119 (word-search-forward "Please find the ball, boy.")
122 ---------- Buffer: foo ----------
123 He said "Please! Find
124 the ball boy@point{}!"
125 ---------- Buffer: foo ----------
129 If @var{limit} is non-@code{nil} (it must be a position in the current
130 buffer), then it is the upper bound to the search. The match found must
131 not extend after that position.
133 If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
134 an error if the search fails. If @var{noerror} is @code{t}, then it
135 returns @code{nil} instead of signaling an error. If @var{noerror} is
136 neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
137 end of the buffer) and returns @code{nil}.
139 If @var{repeat} is non-@code{nil}, then the search is repeated that many
140 times. Point is positioned at the end of the last match.
143 @deffn Command word-search-backward string &optional limit noerror repeat
144 This function searches backward from point for a word match to
145 @var{string}. This function is just like @code{word-search-forward}
146 except that it searches backward and normally leaves point at the
147 beginning of the match.
150 @node Regular Expressions
151 @section Regular Expressions
152 @cindex regular expression
155 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
156 denotes a (possibly infinite) set of strings. Searching for matches for
157 a regexp is a very powerful operation. This section explains how to write
158 regexps; the following section says how to search for them.
161 * Syntax of Regexps:: Rules for writing regular expressions.
162 * Regexp Example:: Illustrates regular expression syntax.
165 @node Syntax of Regexps
166 @subsection Syntax of Regular Expressions
168 Regular expressions have a syntax in which a few characters are special
169 constructs and the rest are @dfn{ordinary}. An ordinary character is a
170 simple regular expression which matches that character and nothing else.
171 The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*},
172 @samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}; no new special
173 characters will be defined in the future. Any other character appearing
174 in a regular expression is ordinary, unless a @samp{\} precedes it.
176 For example, @samp{f} is not a special character, so it is ordinary, and
177 therefore @samp{f} is a regular expression that matches the string
178 @samp{f} and no other string. (It does @emph{not} match the string
179 @samp{ff}.) Likewise, @samp{o} is a regular expression that matches
180 only @samp{o}.@refill
Any two regular expressions @var{a} and @var{b} can be concatenated.  The
result is a regular expression which matches a string if @var{a} matches
some amount of the beginning of that string and @var{b} matches the rest of
the string.@refill
187 As a simple example, we can concatenate the regular expressions @samp{f}
188 and @samp{o} to get the regular expression @samp{fo}, which matches only
189 the string @samp{fo}. Still trivial. To do something more powerful, you
190 need to use one of the special characters. Here is a list of them:
194 @item .@: @r{(Period)}
195 @cindex @samp{.} in regexp
196 is a special character that matches any single character except a newline.
Using concatenation, we can make regular expressions like @samp{a.b}, which
matches any three-character string that begins with @samp{a} and ends with
@samp{b}.@refill

@item *
@cindex @samp{*} in regexp
203 is not a construct by itself; it is a suffix operator that means to
204 repeat the preceding regular expression as many times as possible. In
205 @samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
206 one @samp{f} followed by any number of @samp{o}s. The case of zero
207 @samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill
209 @samp{*} always applies to the @emph{smallest} possible preceding
210 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
211 repeating @samp{fo}.@refill
213 The matcher processes a @samp{*} construct by matching, immediately,
214 as many repetitions as can be found. Then it continues with the rest
215 of the pattern. If that fails, backtracking occurs, discarding some
216 of the matches of the @samp{*}-modified construct in case that makes
217 it possible to match the rest of the pattern. For example, in matching
218 @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first
219 tries to match all three @samp{a}s; but the rest of the pattern is
220 @samp{ar} and there is only @samp{r} left to match, so this try fails.
221 The next alternative is for @samp{a*} to match only two @samp{a}s.
222 With this choice, the rest of the regexp matches successfully.@refill

@item +
@cindex @samp{+} in regexp
226 is a suffix operator similar to @samp{*} except that the preceding
227 expression must match at least once. So, for example, @samp{ca+r}
228 matches the strings @samp{car} and @samp{caaaar} but not the string
229 @samp{cr}, whereas @samp{ca*r} matches all three strings.

@item ?
@cindex @samp{?} in regexp
is a suffix operator similar to @samp{*} except that the preceding
expression can match either once or not at all.  For example,
@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
else.

@item [ @dots{} ]
@cindex character set (in regexp)
240 @cindex @samp{[} in regexp
241 @cindex @samp{]} in regexp
242 @samp{[} begins a @dfn{character set}, which is terminated by a
243 @samp{]}. In the simplest case, the characters between the two brackets
244 form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
245 @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
246 and @samp{d}s (including the empty string), from which it follows that
247 @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
248 @samp{caddaar}, etc.@refill
250 The usual regular expression special characters are not special inside a
251 character set. A completely different set of special characters exists
252 inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
254 @samp{-} is used for ranges of characters. To write a range, write two
255 characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
256 lower case letter. Ranges may be intermixed freely with individual
257 characters, as in @samp{[a-z$%.]}, which matches any lower case letter
258 or @samp{$}, @samp{%} or a period.@refill
To include a @samp{]} in a character set, make it the first character.
For example, @samp{[]a]} matches @samp{]} or @samp{a}.  To include a
@samp{-}, write @samp{-} as the first character in the set, or put it
immediately after a range.  (You can replace one individual character
@var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
@samp{-}.)  There is no way to write a set containing just @samp{-} and
@samp{]}.

To include @samp{^} in a set, put it anywhere but at the beginning of
the set.

@item [^ @dots{} ]
@cindex @samp{^} in regexp
273 @samp{[^} begins a @dfn{complement character set}, which matches any
274 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
275 matches all characters @emph{except} letters and digits.@refill
277 @samp{^} is not special in a character set unless it is the first
278 character. The character following the @samp{^} is treated as if it
279 were first (thus, @samp{-} and @samp{]} are not special there).
281 Note that a complement character set can match a newline, unless
282 newline is mentioned as one of the characters not to match.

@item ^
@cindex @samp{^} in regexp
286 @cindex beginning of line in regexp
287 is a special character that matches the empty string, but only at
288 the beginning of a line in the text being matched. Otherwise it fails
289 to match anything. Thus, @samp{^foo} matches a @samp{foo} which occurs
290 at the beginning of a line.
292 When matching a string, @samp{^} matches at the beginning of the string
293 or after a newline character @samp{\n}.

@item $
@cindex @samp{$} in regexp
297 is similar to @samp{^} but matches only at the end of a line. Thus,
298 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
300 When matching a string, @samp{$} matches at the end of the string
301 or before a newline character @samp{\n}.

@item \
@cindex @samp{\} in regexp
305 has two functions: it quotes the special characters (including
306 @samp{\}), and it introduces additional special constructs.
308 Because @samp{\} quotes special characters, @samp{\$} is a regular
309 expression which matches only @samp{$}, and @samp{\[} is a regular
310 expression which matches only @samp{[}, and so on.
312 Note that @samp{\} also has special meaning in the read syntax of Lisp
313 strings (@pxref{String Type}), and must be quoted with @samp{\}. For
314 example, the regular expression that matches the @samp{\} character is
315 @samp{\\}. To write a Lisp string that contains the characters
316 @samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
317 @samp{\}. Therefore, the read syntax for a regular expression matching
318 @samp{\} is @code{"\\\\"}.@refill
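
The special characters described above can be tried out with
@code{string-match}, which returns the index where the match starts or
@code{nil} if there is none.  The strings below are illustrative only,
and the results assume the default setting of @code{case-fold-search}.

@example
(string-match "fo*" "f")              ; `*' allows zero repetitions
     @result{} 0
(string-match "ca+r" "cr")            ; `+' needs at least one `a'
     @result{} nil
(string-match "ca?r" "caar")          ; `?' allows at most one `a'
     @result{} nil
(string-match "c[ad]*r" "caddaar")    ; a repeated character set
     @result{} 0
(string-match "[^a-z0-9A-Z]" "abc d") ; complement set finds the space
     @result{} 3
(string-match "x+$" "axx\nb")         ; `$' matches before a newline
     @result{} 1
(string-match "\\\\" "a\\b")          ; regexp `\\' matches one backslash
     @result{} 1
@end example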
@strong{Please note:} for historical compatibility, special characters
are treated as ordinary ones if they are in contexts where their special
meanings make no sense.  For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act.  It is poor practice to depend on this behavior; better to
quote the special character anyway, regardless of where it
appears.@refill
329 For the most part, @samp{\} followed by any character matches only
330 that character. However, there are several exceptions: characters
331 which, when preceded by @samp{\}, are special constructs. Such
332 characters are always ordinary when encountered on their own. Here
333 is a table of @samp{\} constructs:

@item \|
@cindex @samp{|} in regexp
338 @cindex regexp alternative
339 specifies an alternative.
340 Two regular expressions @var{a} and @var{b} with @samp{\|} in
341 between form an expression that matches anything that either @var{a} or
342 @var{b} matches.@refill
344 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
345 but no other string.@refill
@samp{\|} applies to the largest possible surrounding expressions.  Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill

Full backtracking capability exists to handle multiple uses of @samp{\|}.

@item \( @dots{} \)
@cindex @samp{(} in regexp
@cindex @samp{)} in regexp
@cindex regexp grouping
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations.
Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.

@item
To enclose an expression for a suffix operator such as @samp{*} to act
on.  Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
(zero or more) number of @samp{na} strings.@refill

@item
To record a matched substring for future reference.
@end enumerate
373 This last application is not a consequence of the idea of a
374 parenthetical grouping; it is a separate feature which happens to be
375 assigned as a second meaning to the same @samp{\( @dots{} \)} construct
376 because there is no conflict in practice between the two meanings.
377 Here is an explanation of this feature:

@item \@var{digit}
matches the same text which matched the @var{digit}th occurrence of a
@samp{\( @dots{} \)} construct.

In other words, after the end of a @samp{\( @dots{} \)} construct, the
matcher remembers the beginning and end of the text matched by that
construct.  Then, later on in the regular expression, you can use
@samp{\} followed by @var{digit} to match that same text, whatever it
may have been.
389 The strings matching the first nine @samp{\( @dots{} \)} constructs
390 appearing in a regular expression are assigned numbers 1 through 9 in
391 the order that the open parentheses appear in the regular expression.
392 So you can use @samp{\1} through @samp{\9} to refer to the text matched
393 by the corresponding @samp{\( @dots{} \)} constructs.
For example, @samp{\(.*\)\1} matches any newline-free string that is
composed of two identical halves.  The @samp{\(.*\)} matches the first
half, which may be anything, but the @samp{\1} that follows must match
the same exact text.

@item \w
@cindex @samp{\w} in regexp
matches any word-constituent character.  The editor syntax table
determines which characters these are.  @xref{Syntax Tables}.

@item \W
@cindex @samp{\W} in regexp
matches any character that is not a word-constituent.

@item \s@var{code}
@cindex @samp{\s} in regexp
matches any character whose syntax is @var{code}.  Here @var{code} is a
character which represents a syntax code: thus, @samp{w} for word
constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
etc.  @xref{Syntax Tables}, for a list of syntax codes and the
characters that stand for them.

@item \S@var{code}
@cindex @samp{\S} in regexp
matches any character whose syntax is not @var{code}.

These regular expression constructs match the empty string---that is,
they don't use up any characters---but whether they match depends on the
context.

@item \`
@cindex @samp{\`} in regexp
matches the empty string, but only at the beginning
of the buffer or string being matched against.

@item \'
@cindex @samp{\'} in regexp
matches the empty string, but only at the end of
the buffer or string being matched against.

@item \=
@cindex @samp{\=} in regexp
matches the empty string, but only at point.
(This construct is not defined when matching against a string.)

@item \b
@cindex @samp{\b} in regexp
matches the empty string, but only at the beginning or
end of a word.  Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word.  @samp{\bballs?\b} matches
@samp{ball} or @samp{balls} as a separate word.@refill

@item \B
@cindex @samp{\B} in regexp
matches the empty string, but @emph{not} at the beginning or
end of a word.

@item \<
@cindex @samp{\<} in regexp
matches the empty string, but only at the beginning of a word.

@item \>
@cindex @samp{\>} in regexp
matches the empty string, but only at the end of a word.
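
A few of the @samp{\} constructs can be exercised with
@code{string-match}, as sketched below.  The strings are illustrative,
and the results assume the standard syntax table, in which letters are
word constituents.  Note that each @samp{\} of the regexp is doubled in
Lisp string syntax.

@example
(string-match "\\(foo\\|bar\\)x" "a barx") ; grouped alternatives
     @result{} 2
(string-match "\\(.*\\)\\1" "abcabc")      ; two identical halves
     @result{} 0
(match-end 0)                              ; the whole string matched
     @result{} 6
(string-match "\\bball\\b" "balls and a ball") ; separate word only
     @result{} 12
(string-match "\\w+" "  hello")            ; first word constituent
     @result{} 2
(string-match "^foo" "a\nfoo")             ; `^' matches after a newline
     @result{} 2
(string-match "\\`foo" "a\nfoo")           ; only at start of string
     @result{} nil
@end example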
463 @kindex invalid-regexp
Not every string is a valid regular expression.  For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
as @samp{[]]}), and so is a string that ends with a single @samp{\}.  If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.
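
A program that accepts regexps from the user may want to catch this
error rather than let it propagate.  The following is a minimal sketch
using @code{condition-case}; the regexp and string are illustrative.

@example
(condition-case err
    ;; The open bracket is never closed, so the regexp is invalid.
    (string-match "[a-z" "abc")
  ;; Return the error symbol instead of signaling.
  (invalid-regexp (car err)))
     @result{} invalid-regexp
@end example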
470 @defun regexp-quote string
471 This function returns a regular expression string that matches exactly
472 @var{string} and nothing else. This allows you to request an exact
473 string match when calling a function that wants a regular expression.
477 (regexp-quote "^The cat$")
478 @result{} "\\^The cat\\$"
One use of @code{regexp-quote} is to combine an exact string match with
context described as a regular expression.  For example, this searches
for the string which is the value of @code{string}, surrounded by
whitespace:

@example
@group
(re-search-forward
 (concat "\\s " (regexp-quote string) "\\s "))
@end group
@end example
496 @comment node-name, next, previous, up
497 @subsection Complex Regexp Example
499 Here is a complicated regexp, used by Emacs to recognize the end of a
500 sentence together with any whitespace that follows. It is the value of
501 the variable @code{sentence-end}.
503 First, we show the regexp as a string in Lisp syntax to distinguish
504 spaces from tab characters. The string constant begins and ends with a
505 double-quote. @samp{\"} stands for a double-quote as part of the
506 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
507 tab and @samp{\n} for a newline.
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
513 In contrast, if you evaluate the variable @code{sentence-end}, you
514 will see the following:
520 "[.?!][]\"')@}]*\\($\\| $\\| \\| \\)[
526 In this output, tab and newline appear as themselves.
This regular expression contains four parts in succession and can be
deciphered as follows:

@table @code
@item [.?!]
The first part of the pattern consists of three characters, a period, a
question mark and an exclamation mark, within square brackets.  The
match must begin with one of these three characters.

@item []\"')@}]*
The second part of the pattern matches any closing braces and quotation
marks, zero or more of them, that may follow the period, question mark
or exclamation mark.  The @code{\"} is Lisp syntax for a double-quote in
a string.  The @samp{*} at the end indicates that the immediately
preceding regular expression (a character set, in this case) may be
repeated zero or more times.

@item \\($\\|@ $\\|\t\\|@ @ \\)
The third part of the pattern matches the whitespace that follows the
end of a sentence: the end of a line, or a tab, or two spaces.  The
double backslashes mark the parentheses and vertical bars as regular
expression syntax; the parentheses mark the group and the vertical bars
separate alternatives.  The dollar sign is used to match the end of a
line.

@item [ \t\n]*
Finally, the last part of the pattern matches any additional whitespace
beyond the minimum needed to end a sentence.
@end table
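
To see the four parts work together, the same regexp can be applied to
a string.  This is only a sketch: the sample sentence is made up, and
the regexp is written out literally (with two spaces in the last
alternative) rather than taken from the variable.

@example
(string-match
 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
 "It works!  Really.")
     @result{} 8     ; the `!' starts the match
(match-end 0)
     @result{} 11    ; just past the two spaces
@end example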
559 @section Regular Expression Searching
560 @cindex regular expression searching
561 @cindex regexp searching
562 @cindex searching for regexp
564 In GNU Emacs, you can search for the next match for a regexp either
565 incrementally or not. For incremental search commands, see @ref{Regexp
566 Search, , Regular Expression Search, emacs, The GNU Emacs Manual}. Here
567 we describe only the search functions useful in programs. The principal
568 one is @code{re-search-forward}.
570 @deffn Command re-search-forward regexp &optional limit noerror repeat
571 This function searches forward in the current buffer for a string of
572 text that is matched by the regular expression @var{regexp}. The
573 function skips over any amount of text that is not matched by
574 @var{regexp}, and leaves point at the end of the first match found.
575 It returns the new value of point.
577 If @var{limit} is non-@code{nil} (it must be a position in the current
578 buffer), then it is the upper bound to the search. No match extending
579 after that position is accepted.
581 What happens when the search fails depends on the value of
582 @var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
583 error is signaled. If @var{noerror} is @code{t},
584 @code{re-search-forward} does nothing and returns @code{nil}. If
585 @var{noerror} is neither @code{nil} nor @code{t}, then
586 @code{re-search-forward} moves point to @var{limit} (or the end of the
587 buffer) and returns @code{nil}.
If @var{repeat} is supplied (it must be a positive number), then the
search is repeated that many times (each time starting at the end of the
previous time's match).  If these successive searches succeed, the
function succeeds, moving point and returning its new value.  Otherwise
the search fails.
595 In the following example, point is initially before the @samp{T}.
596 Evaluating the search call moves point to the end of that line (between
597 the @samp{t} of @samp{hat} and the newline).
601 ---------- Buffer: foo ----------
602 I read "@point{}The cat in the hat
604 ---------- Buffer: foo ----------
608 (re-search-forward "[a-z]+" nil t 5)
611 ---------- Buffer: foo ----------
612 I read "The cat in the hat@point{}
614 ---------- Buffer: foo ----------
619 @deffn Command re-search-backward regexp &optional limit noerror repeat
620 This function searches backward in the current buffer for a string of
621 text that is matched by the regular expression @var{regexp}, leaving
622 point at the beginning of the first text found.
624 This function is analogous to @code{re-search-forward}, but they are
625 not simple mirror images. @code{re-search-forward} finds the match
626 whose beginning is as close as possible. If @code{re-search-backward}
627 were a perfect mirror image, it would find the match whose end is as
628 close as possible. However, in fact it finds the match whose beginning
629 is as close as possible. The reason is that matching a regular
630 expression at a given spot always works from beginning to end, and is
631 done at a specified beginning position.
633 A true mirror-image of @code{re-search-forward} would require a special
634 feature for matching regexps from end to beginning. It's not worth the
635 trouble of implementing that.
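
The asymmetry can be seen in a small experiment.  The sketch below
assumes @code{with-temp-buffer} is available and uses made-up buffer
text.  Searching backward from the end of the buffer, the nearest
position at which @samp{a+} can begin a match is just before the last
@samp{a}, so only that one character is matched, even though a forward
search from the start would match all four.

@example
(with-temp-buffer
  (insert "xaaaa")
  ;; After `insert', point is at the end of the buffer.
  (re-search-backward "a+")
  (list (point)
        (buffer-substring (match-beginning 0) (match-end 0))))
     @result{} (5 "a")
@end example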
638 @defun string-match regexp string &optional start
639 This function returns the index of the start of the first match for
640 the regular expression @var{regexp} in @var{string}, or @code{nil} if
641 there is no match. If @var{start} is non-@code{nil}, the search starts
642 at that index in @var{string}.
@example
@group
(string-match
 "quick" "The quick brown fox jumped quickly.")
     @result{} 4
@end group
@group
(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
@end group
@end example

@noindent
The index of the first character of the
string is 0, the index of the second character is 1, and so on.

After this function returns, the index of the first character beyond
the match is available as @code{(match-end 0)}.  @xref{Match Data}.

@example
@group
(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
@end group

@group
(match-end 0)
     @result{} 32
@end group
@end example
680 @defun looking-at regexp
681 This function determines whether the text in the current buffer directly
682 following point matches the regular expression @var{regexp}. ``Directly
683 following'' means precisely that: the search is ``anchored'' and it can
684 succeed only starting with the first character following point. The
685 result is @code{t} if so, @code{nil} otherwise.
687 This function does not move point, but it updates the match data, which
688 you can access using @code{match-beginning} and @code{match-end}.
691 In this example, point is located directly before the @samp{T}. If it
692 were anywhere else, the result would be @code{nil}.
696 ---------- Buffer: foo ----------
697 I read "@point{}The cat in the hat
699 ---------- Buffer: foo ----------
(looking-at "The cat in the hat$")
     @result{} t
708 @deffn Command delete-matching-lines regexp
709 This function is identical to @code{delete-non-matching-lines}, save
710 that it deletes what @code{delete-non-matching-lines} keeps.
712 In the example below, point is located on the first line of text.
716 ---------- Buffer: foo ----------
719 that all men are created
720 equal, and that they are
721 ---------- Buffer: foo ----------
725 (delete-matching-lines "the")
728 ---------- Buffer: foo ----------
730 that all men are created
731 ---------- Buffer: foo ----------
736 @deffn Command flush-lines regexp
737 This function is the same as @code{delete-matching-lines}.
740 @defun delete-non-matching-lines regexp
741 This function deletes all lines following point which don't
742 contain a match for the regular expression @var{regexp}.
745 @deffn Command keep-lines regexp
746 This function is the same as @code{delete-non-matching-lines}.
749 @deffn Command how-many regexp
750 This function counts the number of matches for @var{regexp} there are in
751 the current buffer following point. It prints this number in
752 the echo area, returning the string printed.
755 @deffn Command count-matches regexp
756 This function is a synonym of @code{how-many}.
759 @deffn Command list-matching-lines regexp nlines
760 This function is a synonym of @code{occur}.
761 Show all lines following point containing a match for @var{regexp}.
762 Display each line with @var{nlines} lines before and after,
763 or @code{-}@var{nlines} before if @var{nlines} is negative.
764 @var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
765 Interactively it is the prefix arg.
767 The lines are shown in a buffer named @samp{*Occur*}.
768 It serves as a menu to find any of the occurrences in this buffer.
@kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
772 @defopt list-matching-lines-default-context-lines
774 Default number of context lines to include around a @code{list-matching-lines}
775 match. A negative number means to include that many lines before the match.
776 A positive number means to include that many lines both before and after.
780 @node Search and Replace
781 @section Search and Replace
784 @defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map
785 This function is the guts of @code{query-replace} and related commands.
786 It searches for occurrences of @var{from-string} and replaces some or
787 all of them. If @var{query-flag} is @code{nil}, it replaces all
788 occurrences; otherwise, it asks the user what to do about each one.
790 If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
791 considered a regular expression; otherwise, it must match literally. If
792 @var{delimited-flag} is non-@code{nil}, then only replacements
793 surrounded by word boundaries are considered.
795 The argument @var{replacements} specifies what to replace occurrences
796 with. If it is a string, that string is used. It can also be a list of
797 strings, to be used in cyclic order.
799 If @var{repeat-count} is non-@code{nil}, it should be an integer, the
800 number of occurrences to consider. In this case, @code{perform-replace}
801 returns after considering that many occurrences.
803 Normally, the keymap @code{query-replace-map} defines the possible user
804 responses. The argument @var{map}, if non-@code{nil}, is a keymap to
805 use instead of @code{query-replace-map}.
808 @defvar query-replace-map
809 This variable holds a special keymap that defines the valid user
810 responses for @code{query-replace} and related functions, as well as
811 @code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways:
815 The ``key bindings'' are not commands, just symbols that are meaningful
816 to the functions that use this map.
Prefix keys are not supported; each key binding must be for a single event
key sequence.  This is because the functions don't use
@code{read-key-sequence} to get the input; instead, they read a single
event and look it up ``by hand.''
Here are the meaningful ``bindings'' for @code{query-replace-map}.
Several of them are meaningful only for @code{query-replace} and
friends.

@table @code
@item act
Do take the action being considered---in other words, ``yes.''

@item skip
Do not take action for this question---in other words, ``no.''

@item exit
Answer this question ``no,'' and don't ask any more.

@item act-and-exit
Answer this question ``yes,'' and don't ask any more.

@item act-and-show
Answer this question ``yes,'' but show the results---don't advance yet
to the next question.

@item automatic
Answer this question and all subsequent questions in the series with
``yes,'' without further user interaction.

@item backup
Move back to the previous place that a question was asked about.

@item edit
Enter a recursive edit to deal with this question---instead of any
other action that would normally be taken.

@item delete-and-edit
Delete the text being considered, then enter a recursive edit to replace
it.

@item recenter
Redisplay and center the window, then ask the same question again.

@item quit
Perform a quit right away.  Only @code{y-or-n-p} and related functions
use this answer.

@item help
Display some help, then ask again.
@end table
873 @section The Match Data
Emacs keeps track of the positions of the start and end of segments of
text found during a regular expression search.  This means, for example,
that you can search for a complex pattern, such as a date in an Rmail
message, and then extract parts of the match under control of the
pattern.
882 Because the match data normally describe the most recent search only,
883 you must be careful not to do another search inadvertently between the
884 search you wish to refer back to and the use of the match data. If you
885 can't avoid another intervening search, you must save and restore the
886 match data around it, to prevent it from being overwritten.
889 * Simple Match Data:: Accessing single items of match data,
890 such as where a particular subexpression started.
891 * Replacing Match:: Replacing a substring that was matched.
892 * Entire Match Data:: Accessing the entire match data at once, as a list.
893 * Saving Match Data:: Saving and restoring the match data.
896 @node Simple Match Data
897 @subsection Simple Match Data Access
899 This section explains how to use the match data to find the starting
900 point or ending point of the text that was matched by a particular
901 search, or by a particular parenthetical subexpression of a regular
904 @defun match-beginning count
905 This function returns the position of the start of text matched by the
906 last regular expression searched for, or a subexpression of it.
908 The argument @var{count}, a number, specifies the subexpression whose
909 start position is returned. If @var{count} is zero, then the value is
910 the position of the start of the text matched by the whole regexp. If @var{count} is
911 greater than zero, then the value is the position of the beginning of
912 the text matched by the @var{count}th subexpression.
914 Subexpressions of a regular expression are those expressions grouped
915 inside of parentheses, @samp{\(@dots{}\)}. The @var{count}th
916 subexpression is found by counting occurrences of @samp{\(} from the
917 beginning of the whole regular expression. The first subexpression is
918 numbered 1, the second 2, and so on.
920 The value is @code{nil} for a parenthetical grouping inside of a
921 @samp{\|} alternative that wasn't used in the match.
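
As a small illustration of that last point, the sketch below matches a
regexp with two alternatives against a string chosen for this example.
Only the second alternative takes part in the match, so the first
grouping reports @code{nil}. The positions are string indices, counting
from zero.

@example
@group
(let ((str "a bar"))
  (string-match "\\(foo\\)\\|\\(bar\\)" str)
  (list (match-beginning 0)
        (match-beginning 1)     ; first alternative was not used
        (match-beginning 2)))
;; @result{} (2 nil 2)
@end group
@end example
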
924 @defun match-end count
925 This function returns the position of the end of the text that matched
926 the last regular expression searched for, or a subexpression of it.
927 This function is otherwise similar to @code{match-beginning}.
930 Here is an example of using the match data, with a comment showing the
931 positions within the text:
935 (string-match "\\(qu\\)\\(ick\\)"
936 "The quick fox jumped quickly.")
942 (match-beginning 1) ; @r{The beginning of the match}
943 @result{} 4 ; @r{with @samp{qu} is at index 4.}
947 (match-beginning 2) ; @r{The beginning of the match}
948 @result{} 6 ; @r{with @samp{ick} is at index 6.}
952 (match-end 1) ; @r{The end of the match}
953 @result{} 6 ; @r{with @samp{qu} is at index 6.}
955 (match-end 2) ; @r{The end of the match}
956 @result{} 9 ; @r{with @samp{ick} is at index 9.}
960 Here is another example. Point is initially located at the beginning
961 of the line. Searching moves point to between the space and the word
962 @samp{in}. The beginning of the entire match is at the 9th character of
963 the buffer (@samp{T}), and the beginning of the match for the first
964 subexpression is at the 13th character (@samp{c}).
969 (re-search-forward "The \\(cat \\)")
976 ---------- Buffer: foo ----------
977 I read "The cat @point{}in the hat comes back" twice.
980 ---------- Buffer: foo ----------
985 (In this case, the index returned is a buffer position; the first
986 character of the buffer counts as 1.)
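
The same search can be tried in a scratch buffer. This sketch assumes
that @code{with-temp-buffer} is available (it is not described in this
chapter) and uses @code{buffer-substring} to pull out the text of the
first subexpression. The search itself returns the new value of point,
which is 17 here.

@example
@group
(with-temp-buffer
  (insert "I read \"The cat in the hat comes back\" twice.")
  (goto-char (point-min))
  (list (re-search-forward "The \\(cat \\)")
        (match-beginning 0)
        (match-beginning 1)
        (buffer-substring (match-beginning 1) (match-end 1))))
;; @result{} (17 9 13 "cat ")
@end group
@end example
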
988 @node Replacing Match
989 @subsection Replacing the Text That Matched
991 This function replaces the text matched by the last search with
994 @cindex case in replacements
995 @defun replace-match replacement &optional fixedcase literal
996 This function replaces the buffer text matched by the last search, with
997 @var{replacement}. It applies only to buffers; you can't use
998 @code{replace-match} to replace a substring found with
1001 If @var{fixedcase} is non-@code{nil}, then the case of the replacement
1002 text is not changed; otherwise, the replacement text is converted to a
1003 different case depending upon the capitalization of the text to be
1004 replaced. If the original text is all upper case, the replacement text
1005 is converted to upper case. If the first word of the original text is
1006 capitalized, then the first word of the replacement text is capitalized.
1007 If the original text consists of just one word, and that word is a single capital
1008 letter, @code{replace-match} considers this a capitalized first word
1009 rather than all upper case.
1011 If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
1012 exactly as it is, the only alterations being case changes as needed.
1013 If it is @code{nil} (the default), then the character @samp{\} is treated
1014 specially. If a @samp{\} appears in @var{replacement}, then it must be
1015 part of one of the following sequences:
1019 @cindex @samp{&} in replacement
1020 @samp{\&} stands for the entire text being replaced.
1022 @item @samp{\@var{n}}
1023 @cindex @samp{\@var{n}} in replacement
1024 @samp{\@var{n}} stands for the text that matched the @var{n}th
1025 subexpression in the original regexp. Subexpressions are those
1026 expressions grouped inside of @samp{\(@dots{}\)}. @var{n} is a digit.
1029 @cindex @samp{\} in replacement
1030 @samp{\\} stands for a single @samp{\} in the replacement text.
1033 @code{replace-match} leaves point at the end of the replacement text,
1034 and returns @code{t}.
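
The following sketch gathers these rules into one place.
@code{my-replace-demo} is a hypothetical helper written for this
illustration, not part of Emacs, and it assumes that
@code{with-temp-buffer} is available. It binds @code{case-fold-search}
to @code{t} so that the lower-case regexp also matches capitalized text.

@example
@group
(defun my-replace-demo (text regexp replacement
                             &optional fixedcase literal)
  "Replace the first match of REGEXP in TEXT and return the result."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))
      (if (re-search-forward regexp nil t)
          (replace-match replacement fixedcase literal)))
    (buffer-string)))
@end group

@group
(my-replace-demo "Cat nap" "cat" "dog")       ; @result{} "Dog nap"
(my-replace-demo "CAT NAP" "cat" "dog")       ; @result{} "DOG NAP"
(my-replace-demo "Cat nap" "cat" "dog" t)     ; @result{} "dog nap"
(my-replace-demo "cat nap" "\\(c\\)at" "\\1ow [\\&]")
;; @result{} "cow [cat] nap"
@end group
@end example

The third call passes a non-@code{nil} @var{fixedcase}, so the
replacement keeps its own case; the last call shows @samp{\1} and
@samp{\&} being expanded because @var{literal} is @code{nil}.
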
1037 @node Entire Match Data
1038 @subsection Accessing the Entire Match Data
1040 The functions @code{match-data} and @code{set-match-data} read or
1041 write the entire match data, all at once.
1044 This function returns a newly constructed list containing all the
1045 information on what text the last search matched. Element zero is the
1046 position of the beginning of the match for the whole expression; element
1047 one is the position of the end of the match for the expression. The
1048 next two elements are the positions of the beginning and end of the
1049 match for the first subexpression, and so on. In general, element
1054 number {\mathsurround=0pt $2n$}
1056 corresponds to @code{(match-beginning @var{n})}; and
1062 number {\mathsurround=0pt $2n+1$}
1064 corresponds to @code{(match-end @var{n})}.
1066 All the elements are markers or @code{nil} if matching was done on a
1067 buffer, and all are integers or @code{nil} if matching was done on a
1068 string with @code{string-match}. (In Emacs 18 and earlier versions,
1069 markers were used even for matching on a string, except in the case
1072 As always, there must be no possibility of intervening searches between
1073 the call to a search function and the call to @code{match-data} that is
1074 intended to access the match data for that search.
1079 @result{} (#<marker at 9 in foo>
1080 #<marker at 17 in foo>
1081 #<marker at 13 in foo>
1082 #<marker at 17 in foo>)
1087 @defun set-match-data match-list
1088 This function sets the match data from the elements of @var{match-list},
1089 which should be a list that was the value of a previous call to
1092 If @var{match-list} refers to a buffer that doesn't exist, you don't get
1093 an error; that sets the match data in a meaningless but harmless way.
1095 @findex store-match-data
1096 @code{store-match-data} is an alias for @code{set-match-data}.
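
A minimal sketch of the round trip, using @code{string-match} on a
string chosen for this example so that the saved list holds plain
integers: the second search overwrites the match data, and
@code{set-match-data} puts the saved values back.

@example
@group
(let ((str "The quick fox"))
  (string-match "\\(qu\\)\\(ick\\)" str)
  (let ((saved (match-data)))
    (string-match "fox" str)          ; overwrites the match data
    (let ((clobbered (match-beginning 0)))
      (set-match-data saved)
      (list saved clobbered
            (match-beginning 0) (match-beginning 2)))))
;; @result{} ((4 9 4 6 6 9) 10 4 6)
@end group
@end example
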
1099 @node Saving Match Data
1100 @subsection Saving and Restoring the Match Data
1102 All asynchronous process functions (filters and sentinels) and
1103 functions that use @code{recursive-edit} should save and restore the
1104 match data if they do a search or if they let the user type arbitrary
1105 commands. Saving the match data is useful in other cases as
1106 well---whenever you want to access the match data resulting from an
1107 earlier search, notwithstanding another intervening search.
1109 This example shows the problem that can arise if you fail to
1110 attend to this requirement:
1114 (re-search-forward "The \\(cat \\)")
1115 @result{} 48
1116 (foo) ; @r{Perhaps @code{foo} does}
1117 ; @r{more searching.}
1118 (match-end 0)
1119 @result{} 61 ; @r{Unexpected result---not 48!}
1123 In Emacs versions 19 and later, you can save and restore the match
1124 data with @code{save-match-data}:
1126 @defspec save-match-data body@dots{}
1127 This special form executes @var{body}, saving and restoring the match
1128 data around it. This is useful if you wish to do a search without
1129 altering the match data that resulted from an earlier search.
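
For instance, in the following sketch (the strings are made up for
illustration) the inner search would normally replace the match data,
but because it runs inside @code{save-match-data} the positions from
the first search are still available afterward.

@example
@group
(let ((str "The quick fox"))
  (string-match "\\(qu\\)\\(ick\\)" str)
  (save-match-data
    (string-match "fox" str))         ; an intervening search
  (match-beginning 2))
;; @result{} 6
@end group
@end example
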
1132 You can use @code{set-match-data} together with @code{match-data} to
1133 imitate the effect of the special form @code{save-match-data}. This is
1134 useful for writing code that can run in Emacs 18. Here is how:
1138 (let ((data (match-data)))
1139 (unwind-protect
1140 @dots{} ; @r{May change the original match data.}
1141 (set-match-data data)))
1146 Here is a function which restores the match data provided the buffer
1147 associated with it still exists.
1151 (defun restore-match-data (data)
1152 @c It is incorrect to split the first line of a doc string.
1153 @c If there's a problem here, it should be solved in some other way.
1154 "Restore the match data DATA unless the buffer is missing."
1160 (null (marker-buffer (car d)))
1162 ;; @file{match-data} @r{buffer is deleted.}
1165 (set-match-data data))))
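
For comparison, here is a self-contained sketch of the same idea that
loops without @code{catch}. The name @code{my-restore-match-data} is
hypothetical, and the @code{markerp} test is an added assumption so that
integer match data from @code{string-match} passes through untouched.
The usage example also assumes that @code{with-temp-buffer} is
available.

@example
@group
(defun my-restore-match-data (data)
  "Restore match data DATA unless a marker in it has lost its buffer.
Return t if the data was restored, nil otherwise."
  (let ((d data)
        (ok t))
    (while (and d ok)
      ;; A marker whose buffer was killed has no buffer any more.
      (if (and (markerp (car d))
               (null (marker-buffer (car d))))
          (setq ok nil))
      (setq d (cdr d)))
    (if ok (set-match-data data))
    ok))
@end group

@group
(let (saved)
  (with-temp-buffer
    (insert "The cat in the hat")
    (goto-char (point-min))
    (re-search-forward "cat")
    (setq saved (match-data)))        ; the buffer is killed on exit
  (my-restore-match-data saved))
;; @result{} nil
@end group
@end example
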
1170 @node Searching and Case
1171 @section Searching and Case
1172 @cindex searching and case
1174 By default, searches in Emacs ignore the case of the text they are
1175 searching through; if you specify searching for @samp{FOO}, then
1176 @samp{Foo} or @samp{foo} is also considered a match. Regexps, and in
1177 particular character sets, are included: thus, @samp{[aB]} would match
1178 @samp{a} or @samp{A} or @samp{b} or @samp{B}.
1180 If you do not want this feature, set the variable
1181 @code{case-fold-search} to @code{nil}. Then all letters must match
1182 exactly, including case. This is a per-buffer-local variable; altering
1183 the variable affects only the current buffer. (@xref{Intro to
1184 Buffer-Local}.) Alternatively, you may change the value of
1185 @code{default-case-fold-search}, which is the default value of
1186 @code{case-fold-search} for buffers that do not override it.
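
The effect is easy to see with @code{string-match}. This sketch binds
@code{case-fold-search} temporarily with @code{let} instead of setting
the buffer-local value; that is only a convenience for the example, and
the strings are made up.

@example
@group
(list (let ((case-fold-search t))
        (string-match "FOO" "a foo"))
      (let ((case-fold-search nil))
        (string-match "FOO" "a foo"))
      (let ((case-fold-search t))
        (string-match "[aB]" "xxb")))
;; @result{} (2 nil 2)
@end group
@end example
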
1188 Note that the user-level incremental search feature handles case
1189 distinctions differently. When given a lower case letter, it looks for
1190 a match of either case, but when given an upper case letter, it looks
1191 for an upper case letter only. But this has nothing to do with the
1192 search functions that Lisp programs call.
1194 @defopt case-replace
1195 This variable determines whether @code{query-replace} should preserve
1196 case in replacements. If the variable is @code{nil}, then
1197 @code{replace-match} should not try to convert case.
1200 @defopt case-fold-search
1201 This buffer-local variable determines whether searches should ignore
1202 case. If the variable is @code{nil} they do not ignore case; otherwise
1203 they do ignore case.
1206 @defvar default-case-fold-search
1207 The value of this variable is the default value for
1208 @code{case-fold-search} in buffers that do not override it. This is the
1209 same as @code{(default-value 'case-fold-search)}.
1212 @node Standard Regexps
1213 @section Standard Regular Expressions Used in Editing
1214 @cindex regexps used standardly in editing
1215 @cindex standard regexps used in editing
1217 This section describes some variables that hold regular expressions
1218 used for certain purposes in editing:
1220 @defvar page-delimiter
1221 This is the regexp describing line-beginnings that separate pages. The
1222 default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"}).
1225 @defvar paragraph-separate
1226 This is the regular expression for recognizing the beginning of a line
1227 that separates paragraphs. (If you change this, you may have to
1228 change @code{paragraph-start} also.) The default value is @code{"^[
1229 \t\f]*$"}, which is a line that consists entirely of spaces, tabs, and
1233 @defvar paragraph-start
1234 This is the regular expression for recognizing the beginning of a line
1235 that starts @emph{or} separates paragraphs. The default value is
1236 @code{"^[ \t\n\f]"}, which matches a line starting with a space, tab,
1237 newline, or form feed.
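
To see what these two defaults accept, the sketch below copies the
regexps quoted above into local variables (the names @code{separate}
and @code{start} are chosen for this example) and tries them on a few
one-line strings. It does not consult the variables themselves, whose
values may have been changed.

@example
@group
(let ((separate "^[ \t\f]*$")         ; default paragraph-separate
      (start "^[ \t\n\f]"))           ; default paragraph-start
  (list (string-match separate " \t ")
        (string-match separate "  text")
        (string-match start "  text")
        (string-match start "text")))
;; @result{} (0 nil 0 nil)
@end group
@end example
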
1240 @defvar sentence-end
1241 This is the regular expression describing the end of a sentence. (All
1242 paragraph boundaries also end sentences, regardless.) The default value
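1246 "[.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*"
1249 This means a period, question mark or exclamation mark, followed by any
1250 number of closing brackets, quotes, parentheses or braces, followed by the end of a line, a tab or two spaces, and then by any number of tabs, spaces or newlines.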
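
A short sketch of how this pattern behaves, assuming the form of the
default in which two spaces separate sentences. The regexp is copied
into a local variable named @code{regexp} for the example. With only
one space after the exclamation mark, the first sentence end found is
the final period.

@example
@group
(let ((regexp "[.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*"))
  (list (string-match regexp "Stop!  Go.")    ; two spaces
        (string-match regexp "Stop! Go.")))   ; one space
;; @result{} (4 8)
@end group
@end example
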
1252 For a detailed explanation of this regular expression, see @ref{Regexp