.\" Copyright (C), 1995, Graeme W. Wilford. (Wilf.)
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" Wed Jun 14 16:10:28 BST 1995 Wilf. (G.Wilford@ee.surrey.ac.uk)
.\" Tiny change in formatting - aeb, 950812
.\" Modified 8 May 1998 by Joseph S. Myers (jsm28@cam.ac.uk)
.\" show the synopsis section nicely
.TH REGEX 3 2021-03-22 "GNU" "Linux Programmer's Manual"
regcomp, regexec, regerror, regfree \- POSIX regex functions
.BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex ,
.BI "            int " cflags );
.BI "int regexec(const regex_t *restrict " preg \
", const char *restrict " string ,
.BI "            size_t " nmatch ", regmatch_t " pmatch "[restrict], int " eflags );
.BI "size_t regerror(int " errcode ", const regex_t *restrict " preg ,
.BI "                char *restrict " errbuf ", size_t " errbuf_size );
.BI "void regfree(regex_t *" preg );
.SS POSIX regex compiling
is used to compile a regular expression into a form that is suitable
a pointer to a pattern buffer storage area;
a pointer to the null-terminated string and
flags used to determine the type of compilation.
All regular expression searching must be done via a compiled pattern
must always be supplied with the address of a
.BR regcomp ()-initialized
of zero or more of the following:
Extended Regular Expression syntax when interpreting
Basic Regular Expression syntax is used.
Do not differentiate case.
searches using this pattern buffer will be case insensitive.
Do not report position of matches.
are ignored if the pattern buffer supplied was compiled with this flag set.
Match-any-character operators don't match a newline.
not containing a newline does not match a newline.
Match-beginning-of-line operator
matches the empty string immediately after a newline, regardless of
the execution flags of
Match-end-of-line operator
matches the empty string immediately before a newline, regardless of
.SS POSIX regex matching
is used to match a null-terminated string
against the precompiled pattern buffer,
are used to provide information regarding the location of any matches.
of zero or more of the following flags:
The match-beginning-of-line operator always fails to match (but see the
This flag may be used when different portions of a string are passed to
and the beginning of the string should not be interpreted as the
beginning of the line.
The match-end-of-line operator always fails to match (but see the
on the input string, starting at byte
and ending before byte
.IR pmatch[0].rm_eo .
This allows matching embedded NUL bytes
on input, and does not change
This flag is a BSD extension, not present in POSIX.
was set for the compilation of the pattern buffer, it is possible to
obtain match addressing information.
must be dimensioned to have at least
These are filled in by
with substring match addresses.
The offsets of the subexpression starting at the
open parenthesis are stored in
The entire regular expression's match addresses are stored in
(Note that to return the offsets of
subexpression matches,
Any unused structure elements will contain the value \-1.
structure which is the type of
element that is not \-1 indicates the start offset of the next largest
substring match within the string.
element indicates the end offset of the match,
which is the offset of the first character after the matching text.
.SS POSIX error reporting
is used to turn the error codes that can be returned by both
into error message strings.
is passed the error code,
a pointer to a character string buffer,
and the size of the string buffer,
It returns the size of the
required to contain the null-terminated error message string.
is filled in with the first
.I "errbuf_size \- 1"
characters of the error message and a terminating null byte (\(aq\e0\(aq).
.SS POSIX pattern buffer freeing
with a precompiled pattern buffer,
will free the memory allocated to the pattern buffer by the compiling
returns zero for a successful compilation or an error code for failure.
returns zero for a successful match or
The following errors can be returned by
Invalid use of back reference operator.
Invalid use of pattern operators such as group or list.
Invalid use of repetition operators such as using \(aq*\(aq
as the first character.
Un-matched brace interval operators.
Un-matched bracket list operators.
Invalid collating element.
Unknown character class name.
This is not defined by POSIX.2.
Un-matched parenthesis group operators.
Invalid use of the range operator; for example, the ending point of the range
occurs prior to the starting point.
Compiled regular expression requires a pattern buffer larger than 64\ kB.
This is not defined by POSIX.2.
The regex routines ran out of memory.
Invalid back reference to a subexpression.
For an explanation of the terms used in this section, see
Interface Attribute Value
T} Thread safety MT-Safe locale
T} Thread safety MT-Safe env
T} Thread safety MT-Safe
POSIX.1-2001, POSIX.1-2008.
.EX
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#define ARRAY_SIZE(arr) (sizeof((arr)) / sizeof((arr)[0]))
static const char *const str =
        "1) John Driverhacker;\en2) John Doe;\en3) John Foo;\en";
static const char *const re = "John.*o";

int main(void)
{
    const char  *s = str;   /* start of the not-yet-searched text */
    regex_t     regex;
    regmatch_t  pmatch[1];
    if (regcomp(&regex, re, REG_NEWLINE))
        exit(EXIT_FAILURE);
    printf("String = \e"%s\e"\en", str);
    printf("Matches:\en");
    for (int i = 0; ; i++) {
        if (regexec(&regex, s, ARRAY_SIZE(pmatch), pmatch, 0))
            break;          /* no further match */
        regoff_t off = pmatch[0].rm_so + (s \- str);
        regoff_t len = pmatch[0].rm_eo \- pmatch[0].rm_so;
        printf("#%d:\en", i);
        printf("offset = %jd; length = %jd\en", (intmax_t) off,
                (intmax_t) len);
        printf("substring = \e"%.*s\e"\en", (int) len, s + pmatch[0].rm_so);
        s += pmatch[0].rm_eo;   /* resume after this match */
    }
    regfree(&regex);
    exit(EXIT_SUCCESS);
}
.EE
The glibc manual section,
.I "Regular Expressions"