.\" Copyright (c) 1992, X/Open Company Limited. All Rights Reserved. Portions Copyright (c) 2003, Sun Microsystems, Inc. All Rights Reserved.
.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for permission to reproduce portions of its copyrighted documentation. Original documentation from The Open Group can be obtained online at
.\" http://www.opengroup.org/bookstore/.
.\" The Institute of Electrical and Electronics Engineers and The Open Group, have given us permission to reprint portions of their documentation. In the following statement, the phrase "this text" refers to portions of the system documentation. Portions of this text are reprinted and reproduced in electronic form in the Sun OS Reference Manual, from IEEE Std 1003.1, 2004 Edition, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 6, Copyright (C) 2001-2004 by the Institute of Electrical and Electronics Engineers, Inc and The Open Group. In the event of any discrepancy between these versions and the original IEEE and The Open Group Standard, the original IEEE and The Open Group Standard is the referee document. The original Standard can be obtained online at http://www.opengroup.org/unix/online.html.
.\" This notice shall appear on any product containing this material.
.\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
.\" When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
.TH REGCOMP 3C "Nov 1, 2003"
regcomp, regexec, regerror, regfree \- regular expression matching
#include <sys/types.h>
#include <regex.h>
\fBint\fR \fBregcomp\fR(\fBregex_t *restrict\fR \fIpreg\fR, \fBconst char *restrict\fR \fIpattern\fR,
\fBint\fR \fIcflags\fR);
\fBint\fR \fBregexec\fR(\fBconst regex_t *restrict\fR \fIpreg\fR,
\fBconst char *restrict\fR \fIstring\fR, \fBsize_t\fR \fInmatch\fR,
\fBregmatch_t\fR \fIpmatch\fR[restrict], \fBint\fR \fIeflags\fR);
\fBsize_t\fR \fBregerror\fR(\fBint\fR \fIerrcode\fR, \fBconst regex_t *restrict\fR \fIpreg\fR,
\fBchar *restrict\fR \fIerrbuf\fR, \fBsize_t\fR \fIerrbuf_size\fR);
\fBvoid\fR \fBregfree\fR(\fBregex_t *\fR\fIpreg\fR);
These functions interpret \fIbasic\fR and \fIextended\fR regular expressions
(described on the \fBregex\fR(5) manual page).
The structure type \fBregex_t\fR contains at least the following member:
\fB\fBsize_t\fR \fBre_nsub\fR\fR
Number of parenthesised subexpressions.
The structure type \fBregmatch_t\fR contains at least the following members:
\fB\fBregoff_t\fR \fBrm_so\fR\fR
Byte offset from start of \fIstring\fR to start of substring.
\fB\fBregoff_t\fR \fBrm_eo\fR\fR
Byte offset from start of \fIstring\fR of the first character after the end of
the substring.
The \fBregcomp()\fR function will compile the regular expression contained in
the string pointed to by the \fIpattern\fR argument and place the results in
the structure pointed to by \fIpreg\fR. The \fIcflags\fR argument is the
bitwise inclusive \fBOR\fR of zero or more of the following flags, which are
defined in the header \fB<regex.h>\fR:
\fB\fBREG_EXTENDED\fR\fR
Use Extended Regular Expressions.
\fB\fBREG_ICASE\fR\fR
Ignore case in match.
\fB\fBREG_NOSUB\fR\fR
Report only success or failure in \fBregexec()\fR.
\fB\fBREG_NEWLINE\fR\fR
Change the handling of \fBNEWLINE\fR characters, as described below.
The default regular expression type for \fIpattern\fR is a Basic Regular
Expression. The application can specify Extended Regular Expressions using the
\fBREG_EXTENDED\fR \fIcflags\fR flag.
If the \fBREG_NOSUB\fR flag was not set in \fIcflags\fR, then \fBregcomp()\fR
will set \fIre_nsub\fR to the number of parenthesised subexpressions (delimited
by \e(\e) in basic regular expressions or () in extended regular expressions)
found in \fIpattern\fR.
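As a minimal sketch of the paragraph above, the following function compiles a pattern and returns the subexpression count that \fBregcomp()\fR stored in \fIre_nsub\fR. The name \fBcount_subexpressions()\fR is hypothetical, and returning \fB\(mi1\fR when compilation fails is an illustrative choice, not part of this interface.

```c
#include <regex.h>

/*
 * Hypothetical helper: compile pattern with the given cflags and
 * return the number of parenthesised subexpressions recorded in
 * re_nsub, or -1 if regcomp() fails.
 */
int
count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    int n;

    if (regcomp(&re, pattern, cflags) != 0)
        return (-1);
    n = (int)re.re_nsub;
    regfree(&re);    /* release what regcomp() allocated */
    return (n);
}
```

Under these assumptions, \fBcount_subexpressions\fR("(a)(b(c))", \fBREG_EXTENDED\fR) should yield 3, while \fBcount_subexpressions\fR("(a)", 0) should yield 0, because unescaped parentheses are ordinary characters in a basic regular expression.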
.SS "\fBregexec()\fR"
The \fBregexec()\fR function compares the null-terminated string specified by
\fIstring\fR with the compiled regular expression \fIpreg\fR initialized by a
previous call to \fBregcomp()\fR. The \fIeflags\fR argument is the bitwise
inclusive \fBOR\fR of zero or more of the following flags, which are defined in
the header \fB<regex.h>\fR:
\fB\fBREG_NOTBOL\fR\fR
The first character of the string pointed to by \fIstring\fR is not the
beginning of the line. Therefore, the circumflex character (\fI^\fR), when
taken as a special character, will not match the beginning of \fIstring\fR.
\fB\fBREG_NOTEOL\fR\fR
The last character of the string pointed to by \fIstring\fR is not the end of
the line. Therefore, the dollar sign (\fI$\fR), when taken as a special
character, will not match the end of \fIstring\fR.
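The effect of these flags can be seen with a small sketch that compiles an extended regular expression and calls \fBregexec()\fR with caller-supplied \fIeflags\fR. The helper \fBmatches_with_eflags()\fR is hypothetical; it returns \fB\(mi1\fR if the pattern does not compile.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if string matches the extended
 * regular expression pattern when regexec() is called with eflags,
 * 0 otherwise, and -1 if the pattern does not compile.
 */
int
matches_with_eflags(const char *pattern, const char *string, int eflags)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return (-1);
    /* REG_NOSUB was set, so nmatch and pmatch are ignored. */
    status = regexec(&re, string, 0, NULL, eflags);
    regfree(&re);
    return (status == 0);
}
```

With these assumptions, "^abc" matches "abc" when \fIeflags\fR is 0 but not when it is \fBREG_NOTBOL\fR, and "abc$" does not match "abc" with \fBREG_NOTEOL\fR. This is useful when \fIstring\fR begins or ends partway through a logical line.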
If \fInmatch\fR is zero or \fBREG_NOSUB\fR was set in the \fIcflags\fR argument
to \fBregcomp()\fR, then \fBregexec()\fR will ignore the \fIpmatch\fR argument.
Otherwise, the \fIpmatch\fR argument must point to an array with at least
\fInmatch\fR elements, and \fBregexec()\fR will fill in the elements of that
array with offsets of the substrings of \fIstring\fR that correspond to the
parenthesised subexpressions of \fIpattern\fR:
\fIpmatch\fR\fB[\fR\fIi\fR\fB]\fR.\fIrm_so\fR will be the byte offset of the
beginning and \fIpmatch\fR\fB[\fR\fIi\fR\fB]\fR.\fIrm_eo\fR will be one greater
than the byte offset of the end of substring \fIi\fR. (Subexpression \fIi\fR
begins at the \fIi\fRth matched open parenthesis, counting from 1.) Offsets in
\fIpmatch\fR\fB[0]\fR identify the substring that corresponds to the entire
regular expression. Unused elements of \fIpmatch\fR up to
\fIpmatch\fR\fB[\fR\fInmatch\fR\fB\(mi1]\fR will be filled with \fB\(mi1\fR\&.
If there are more than \fInmatch\fR subexpressions in \fIpattern\fR
(\fIpattern\fR itself counts as a subexpression), then \fBregexec()\fR will
still do the match, but will record only the first \fInmatch\fR substrings.
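A sketch of reading \fIpmatch\fR after a call follows. It assumes an extended regular expression and a caller-provided array; the helper name \fBfind_submatches()\fR and its \fB\(mi1\fR return for a pattern that fails to compile are illustrative only.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match string against the extended regular
 * expression pattern and store up to nmatch offset pairs in pm.
 * Returns the regexec() status, or -1 if the pattern does not compile.
 */
int
find_submatches(const char *pattern, const char *string,
    regmatch_t *pm, size_t nmatch)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (-1);
    /* pm[0] covers the whole match; pm[i] covers subexpression i. */
    status = regexec(&re, string, nmatch, pm, 0);
    regfree(&re);
    return (status);
}
```

For example, matching ([a-z]+)@([a-z]+) against "mail bob@example now" with \fInmatch\fR set to 4 should report offsets 5 and 16 in \fIpmatch\fR[0] ("bob@example"), 5 and 8 in \fIpmatch\fR[1] ("bob"), 9 and 16 in \fIpmatch\fR[2] ("example"), and \fB\(mi1\fR in both fields of the unused \fIpmatch\fR[3].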
When matching a basic or extended regular expression, any given parenthesised
subexpression of \fIpattern\fR might participate in the match of several
different substrings of \fIstring\fR, or it might not match any substring even
though the pattern as a whole did match. The following rules are used to
determine which substrings to report in \fIpmatch\fR when matching regular
expressions:
If subexpression \fIi\fR in a regular expression is not contained within
another subexpression, and it participated in the match several times, then the
byte offsets in \fIpmatch\fR\fB[\fR\fIi\fR\fB]\fR will delimit the last such
match.
If subexpression \fIi\fR is not contained within another subexpression, and it
did not participate in an otherwise successful match, the byte offsets in
\fIpmatch\fR\fB[\fR\fIi\fR\fB]\fR will be \fB\(mi1\fR\&. A subexpression does
not participate in the match when:
\fB*\fR or \fB\e{\e}\fR appears immediately after the subexpression in a basic
regular expression, or \fB*\fR, \fB?\fR, or \fB{\|}\fR appears immediately
after the subexpression in an extended regular expression, and the
subexpression did not match (matched zero times), or
\fB|\fR is used in an extended regular expression to select this subexpression
or another, and the other subexpression matched.
If subexpression \fIi\fR is contained within another subexpression \fIj\fR, and
\fIi\fR is not contained within any other subexpression that is contained
within \fIj\fR, and a match of subexpression \fIj\fR is reported in
\fIpmatch\fR\fB[\fR\fIj\fR\fB]\fR, then the match or non-match of subexpression
\fIi\fR reported in \fIpmatch\fR\fB[\fR\fIi\fR\fB]\fR will be as described in
the first two rules above, but within the substring reported in
\fIpmatch\fR\fB[\fR\fIj\fR\fB]\fR rather than the whole string.
If subexpression \fIi\fR is contained in subexpression \fIj\fR, and the byte
offsets in \fIpmatch\fR\fB[\fR\fIj\fR\fB]\fR are \fB\(mi1\fR, then the byte
offsets in \fIpmatch\fR\fB[\fR\fIi\fR\fB]\fR will also be \fB\(mi1\fR\&.
If subexpression \fIi\fR matched a zero-length string, then both byte offsets
in \fIpmatch\fR\fB[\fR\fIi\fR\fB]\fR will be the byte offset of the character
or terminating null byte immediately following the zero-length string.
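The following sketch reports what \fBregexec()\fR stores in \fIpmatch\fR[1], which makes the rules above easy to try on concrete patterns. The helper \fBsubexp1_offsets()\fR is hypothetical and always compiles its pattern with \fBREG_EXTENDED\fR.

```c
#include <regex.h>

/*
 * Hypothetical helper: match string against the extended regular
 * expression pattern and store the offsets reported for
 * subexpression 1 in *so and *eo. Returns the regexec() status,
 * or -1 if the pattern does not compile.
 */
int
subexp1_offsets(const char *pattern, const char *string,
    regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t pm[2];    /* whole match plus subexpression 1 */
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (-1);
    status = regexec(&re, string, 2, pm, 0);
    regfree(&re);
    if (status == 0) {
        *so = pm[1].rm_so;
        *eo = pm[1].rm_eo;
    }
    return (status);
}
```

For instance, (a|b)* matched against "ab" should report offsets 1 and 2, the last iteration ("b"); (x)?y matched against "y" should report \fB\(mi1\fR for both offsets, because the subexpression did not participate; and a(b*)c matched against "ac" should report 1 and 1, the offset of the "c" that follows the empty match.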
If, when \fBregexec()\fR is called, the locale is different from when the
regular expression was compiled, the result is undefined.
If \fBREG_NEWLINE\fR is not set in \fIcflags\fR, then a \fBNEWLINE\fR character
in \fIpattern\fR or \fIstring\fR will be treated as an ordinary character. If
\fBREG_NEWLINE\fR is set, then newline will be treated as an ordinary character
except as follows:
A \fBNEWLINE\fR character in \fIstring\fR will not be matched by a period
outside a bracket expression or by any form of a non-matching list.
A circumflex (^) in \fIpattern\fR, when used to specify expression anchoring,
will match the zero-length string immediately after a newline in \fIstring\fR,
regardless of the setting of \fBREG_NOTBOL\fR.
A dollar sign ($) in \fIpattern\fR, when used to specify expression anchoring,
will match the zero-length string immediately before a newline in \fIstring\fR,
regardless of the setting of \fBREG_NOTEOL\fR.
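The following sketch compiles an extended regular expression with or without \fBREG_NEWLINE\fR so that the two behaviors can be compared on multi-line text. The helper \fBmatches_lines()\fR is hypothetical and treats a pattern that fails to compile as no match.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the extended regular expression
 * pattern, compiled with the extra cflags (for example REG_NEWLINE),
 * matches anywhere in text; 0 otherwise.
 */
int
matches_lines(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | cflags) != 0)
        return (0);
    status = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (status == 0);
}
```

For a text consisting of the line "abc", a newline, and the line "def", the pattern ^def should match only when \fBREG_NEWLINE\fR is set, and c.d should match only when it is not set, because with \fBREG_NEWLINE\fR the period no longer matches the newline.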
.SS "\fBregfree()\fR"
The \fBregfree()\fR function frees any memory allocated by \fBregcomp()\fR
associated with \fIpreg\fR.
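Because a compiled expression can be passed to \fBregexec()\fR any number of times before \fBregfree()\fR releases it, a typical caller compiles once and frees once. A sketch, using the hypothetical helper \fBcount_matching_lines()\fR:

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many of the n strings in lines match
 * the basic regular expression pattern. The pattern is compiled once,
 * reused for every regexec() call, and released once with regfree().
 * Returns -1 if the pattern does not compile.
 */
int
count_matching_lines(const char *pattern, const char *const lines[], size_t n)
{
    regex_t re;
    size_t i;
    int count = 0;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return (-1);    /* preg is undefined; do not call regfree() */
    for (i = 0; i < n; i++)
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            count++;
    regfree(&re);
    return (count);
}
```

With the strings "apple", "banana", and "cherry", the pattern a should count 2 lines and the pattern an should count 1.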
The following constants are defined as error return values:
\fB\fBREG_NOMATCH\fR\fR
The \fBregexec()\fR function failed to match.
\fB\fBREG_BADPAT\fR\fR
Invalid regular expression.
\fB\fBREG_ECOLLATE\fR\fR
Invalid collating element referenced.
\fB\fBREG_ECTYPE\fR\fR
Invalid character class type referenced.
\fB\fBREG_EESCAPE\fR\fR
Trailing \e in pattern.
\fB\fBREG_ESUBREG\fR\fR
Number in \e\fIdigit\fR invalid or in error.
\fB\fBREG_EBRACK\fR\fR
\fB[\|]\fR imbalance.
\fB\fBREG_ENOSYS\fR\fR
The function is not supported.
\fB\fBREG_EPAREN\fR\fR
\fB\e(\|\e)\fR or \fB()\fR imbalance.
405 \fB\fBREG_BADBR\fR\fR
408 Content of \e{ \e} invalid: not a number, number too large, more than two
409 numbers, first larger than second.
415 \fB\fBREG_ERANGE\fR\fR
418 Invalid endpoint in range expression.
\fB\fBREG_ESPACE\fR\fR
Out of memory.
\fB\fBREG_BADRPT\fR\fR
?, * or + not preceded by a valid regular expression.
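The code returned by \fBregcomp()\fR can be compared against these constants directly. In the sketch below, the helper \fBcompile_status()\fR is hypothetical; it frees the expression only when compilation succeeded.

```c
#include <regex.h>

/*
 * Hypothetical helper: return the code regcomp() reports for pattern
 * (0 on success), releasing the compiled expression when it succeeds.
 */
int
compile_status(const char *pattern, int cflags)
{
    regex_t re;
    int code = regcomp(&re, pattern, cflags);

    if (code == 0)
        regfree(&re);
    return (code);
}
```

With common implementations, an unbalanced ( in an extended regular expression is expected to yield \fBREG_EPAREN\fR, and an unterminated bracket expression such as a[b is expected to yield \fBREG_EBRACK\fR. Because the exact code chosen for a malformed pattern can differ between implementations, portable callers usually treat any nonzero value as failure and use \fBregerror()\fR for the message.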
.SS "\fBregerror()\fR"
The \fBregerror()\fR function provides a mapping from error codes returned by
\fBregcomp()\fR and \fBregexec()\fR to unspecified printable strings. It
generates a string corresponding to the value of the \fIerrcode\fR argument,
which must be the last non-zero value returned by \fBregcomp()\fR or
\fBregexec()\fR with the given value of \fIpreg\fR. If \fIerrcode\fR is not
such a value, an error message indicating that the error code is invalid is
generated.
If \fIpreg\fR is a \fINULL\fR pointer, but \fIerrcode\fR is a value returned by
a previous call to \fBregexec()\fR or \fBregcomp()\fR, \fBregerror()\fR
still generates an error string corresponding to the value of \fIerrcode\fR.
If the \fIerrbuf_size\fR argument is not zero, \fBregerror()\fR will place the
generated string into the buffer of size \fIerrbuf_size\fR bytes pointed to by
\fIerrbuf\fR. If the string (including the terminating null byte) cannot fit
in the buffer, \fBregerror()\fR will truncate the string and null-terminate the
result.
If \fIerrbuf_size\fR is zero, \fBregerror()\fR ignores the \fIerrbuf\fR
argument, and returns the size of the buffer needed to hold the generated
string.
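The size and truncation rules above can be exercised with a short sketch. The helper \fBdescribe_compile_error()\fR is hypothetical; it passes \fBregerror()\fR the same \fIpreg\fR that \fBregcomp()\fR just rejected, together with the caller's buffer.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile pattern as an extended regular
 * expression. On failure, format the message into buf and return the
 * size regerror() says the complete message needs; return 0 if the
 * pattern compiles.
 */
size_t
describe_compile_error(const char *pattern, char *buf, size_t bufsize)
{
    regex_t re;
    int code = regcomp(&re, pattern, REG_EXTENDED);

    if (code == 0) {
        regfree(&re);
        return (0);    /* valid pattern; nothing to describe */
    }
    /* With bufsize 0, buf is ignored and only the size is returned;
       otherwise the message is truncated to fit and null-terminated. */
    return (regerror(code, &re, buf, bufsize));
}
```

Called with an 8-byte buffer on an invalid pattern, it returns the full size needed, which may exceed 8, while the buffer then holds the first 7 bytes of the message followed by a null byte. Passing 0 for the size returns the same value without touching the buffer.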
If the \fIpreg\fR argument to \fBregexec()\fR or \fBregfree()\fR is not a
compiled regular expression returned by \fBregcomp()\fR, the result is
undefined. A \fIpreg\fR is no longer treated as a compiled regular expression
after it is given to \fBregfree()\fR.
See \fBregex\fR(5) for BRE (Basic Regular Expression) Anchoring.
On successful completion, the \fBregcomp()\fR function returns \fB0\fR.
Otherwise, it returns an integer value indicating an error as described in
\fB<regex.h>\fR, and the content of \fIpreg\fR is undefined.
On successful completion, the \fBregexec()\fR function returns \fB0\fR.
Otherwise it returns \fBREG_NOMATCH\fR to indicate no match, or
\fBREG_ENOSYS\fR to indicate that the function is not supported.
Upon successful completion, the \fBregerror()\fR function returns the number of
bytes needed to hold the entire generated string. Otherwise, it returns \fB0\fR
to indicate that the function is not implemented.
The \fBregfree()\fR function returns no value.
No errors are defined.
An application could use:
\fBregerror(code, preg, (char *)NULL, (size_t)0)\fR
to find out how big a buffer is needed for the generated string, \fBmalloc\fR a
buffer to hold the string, and then call \fBregerror()\fR again to get the
string (see \fBmalloc\fR(3C)). Alternatively, it could allocate a fixed, static
buffer that is big enough to hold most strings, and then use \fBmalloc()\fR to
allocate a larger buffer if it finds that this is too small.
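The two-call approach described above might look like the following sketch. The helper \fBregerror_alloc()\fR is hypothetical, and the caller is responsible for freeing the returned string.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for
 * errcode, or NULL if memory is exhausted or regerror() returns 0.
 * The caller frees the result.
 */
char *
regerror_alloc(int errcode, const regex_t *preg)
{
    size_t len;
    char *msg;

    /* First call: errbuf_size 0 asks only for the required size. */
    len = regerror(errcode, preg, NULL, 0);
    if (len == 0)
        return (NULL);    /* regerror() is not implemented */
    if ((msg = malloc(len)) == NULL)
        return (NULL);
    /* Second call: len already includes the terminating null byte. */
    (void) regerror(errcode, preg, msg, len);
    return (msg);
}
```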
\fBExample 1 \fRMatching a string against an extended regular expression
#include <stddef.h>
#include <regex.h>
/*
 * Match string against the extended regular expression in pattern,
 * treating errors as no match; return 1 for match, 0 for no match.
 */
int
match(const char *string, char *pattern)
{
    int status;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return (0);    /* report error */
    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);    /* release the compiled expression */
    return (status == 0);    /* REG_NOMATCH or an error yields 0 */
}
The following demonstrates how the \fBREG_NOTBOL\fR flag could be used with
\fBregexec()\fR to find all substrings in a line that match a pattern supplied
by a user. (For simplicity of the example, very little error checking is done.)
const char *p = buffer;    /* start of the text not yet searched */
(void) regcomp (&re, pattern, 0);
/* this call to regexec(\|) finds the first match on the line */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0) {    /* while matches found */
    /* substring found between p + pm.rm_so and p + pm.rm_eo */
    if (pm.rm_eo == pm.rm_so)
        break;    /* empty match: stop rather than loop forever */
    p += pm.rm_eo;    /* offsets are relative to p, so advance it */
    /* This call to regexec(\|) finds the next match */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}
regfree(&re);
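Offsets in \fIpm\fR are relative to the pointer passed to \fBregexec()\fR, so a loop that searches the rest of a line must advance that pointer itself, and must step past empty matches to make progress. The following self-contained sketch counts matches this way; \fBcount_matches()\fR is a hypothetical name, and skipping one character after an empty match is an illustrative policy rather than something this interface prescribes.

```c
#include <regex.h>

/*
 * Hypothetical helper: count the matches of the basic regular
 * expression pattern in line, restarting each search just past the
 * previous match with REG_NOTBOL. Returns -1 if the pattern does not
 * compile.
 */
int
count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;    /* start of the text not yet searched */
    int count = 0;
    int eflags = 0;          /* the first search starts the line */

    if (regcomp(&re, pattern, 0) != 0)
        return (-1);
    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        if (pm.rm_eo > pm.rm_so)
            p += pm.rm_eo;          /* resume just past this match */
        else if (p[pm.rm_eo] != 0)
            p += pm.rm_eo + 1;      /* empty match: skip one character */
        else
            break;                  /* empty match at end of line */
        eflags = REG_NOTBOL;        /* p is no longer the line start */
    }
    regfree(&re);
    return (count);
}
```

With these assumptions, counting ab in "ab xab ab" should give 3, and counting ^a in "aaa" should give 1, because \fBREG_NOTBOL\fR stops the circumflex from matching after the first call.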
See \fBattributes\fR(5) for descriptions of the following attributes:
.TS
box tab(:);
c | c
l | l .
ATTRIBUTE TYPE:ATTRIBUTE VALUE
_
Interface Stability:Standard
_
MT-Level:MT-Safe with exceptions
.TE
\fBfnmatch\fR(3C), \fBglob\fR(3C), \fBmalloc\fR(3C), \fBsetlocale\fR(3C),
\fBattributes\fR(5), \fBstandards\fR(5), \fBregex\fR(5)
The \fBregcomp()\fR function can be used safely in a multithreaded application
as long as \fBsetlocale\fR(3C) is not being called to change the locale.