.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\" Copyright (c) 1992, 1993, 1994
.\" The Regents of the University of California. All rights reserved.
.\" This code is derived from software contributed to Berkeley by
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
.\" permission to reproduce portions of its copyrighted documentation.
.\" Original documentation from The Open Group can be obtained online at
.\" http://www.opengroup.org/bookstore/.
.\" The Institute of Electrical and Electronics Engineers and The Open
.\" Group, have given us permission to reprint portions of their
.\" In the following statement, the phrase ``this text'' refers to portions
.\" of the system documentation.
.\" Portions of this text are reprinted and reproduced in electronic form
.\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
.\" Standard for Information Technology -- Portable Operating System
.\" Interface (POSIX), The Open Group Base Specifications Issue 6,
.\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
.\" Engineers, Inc and The Open Group. In the event of any discrepancy
.\" between these versions and the original IEEE and The Open Group
.\" Standard, the original IEEE and The Open Group Standard is the referee
.\" document. The original Standard can be obtained online at
.\" http://www.opengroup.org/unix/online.html.
.\" This notice shall appear on any product containing this material.
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\" Copyright (c) 1992, X/Open Company Limited. All Rights Reserved.
.\" Portions Copyright (c) 2003, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.Nd regular-expression library
.Fa "regex_t *restrict preg" "const char *restrict pattern" "int cflags"
.Fa "const regex_t *restrict preg" "const char *restrict string"
.Fa "size_t nmatch" "regmatch_t pmatch[restrict]" "int eflags"
.Fa "int errcode" "const regex_t *restrict preg"
.Fa "char *restrict errbuf" "size_t errbuf_size"
.Fn regfree "regex_t *preg"
These routines implement
regular expressions; see
function compiles an RE written as a string into an internal form,
matches that internal form against a string and reports results,
transforms error codes from either into human-readable messages,
frees any dynamically-allocated storage used by the internal form
declares two structure types,
the former for compiled internal forms and the latter for match reporting.
It also declares the four functions, a type
and a number of constants with names starting with
function compiles the regular expression contained in the
string, subject to the flags in
and places the results in the
structure pointed to by
argument is the bitwise OR of zero or more of the following flags:
.Bl -tag -width REG_EXTENDED
Compile extended regular expressions
rather than the basic regular expressions
that are the default.
This is a synonym for 0, provided as a counterpart to
to improve readability.
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary, so the RE is a literal string.
This is an extension, compatible with but not specified by
and should be used with caution in software intended to be portable to other
may not be used in the same call to
Compile for matching that ignores upper/lower case distinctions.
Compile for matching that need only report success or failure,
not what was matched.
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
bracket expressions and
anchor matches the null string after any newline in the string in addition to
its normal function, and the
anchor matches the null string before any newline in the string in addition to
The regular expression ends, not at the first NUL, but just before the character
member of the structure pointed to by
This flag permits inclusion of NULs in the RE; they are considered ordinary
This is an extension, compatible with but not specified by
and should be used with caution in software intended to be portable to other
returns 0 and fills in the structure pointed to by
One member of that structure
contains the number of parenthesized subexpressions within the RE
.Po except that the value of this member is undefined if the
function matches the compiled RE pointed to by
subject to the flags in
and reports results using
and the returned value.
The RE must have been compiled by a previous invocation of
The compiled form is not altered during execution of
so a single compiled RE can be used simultaneously by multiple threads.
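.Pp
As a minimal sketch of the compile-once, match-many pattern described above,
the following helper compiles an extended, case-insensitive RE and reuses the
compiled form for every string in an array.
The name count_matches and its parameters are illustrative assumptions, not
part of the library.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many of the n strings in v match the
 * extended RE in pattern, ignoring case.  Returns -1 if the RE does not
 * compile.
 */
int
count_matches(const char *pattern, const char *const *v, size_t n)
{
	regex_t re;
	size_t i;
	int count = 0;

	/* REG_NOSUB: only success or failure is needed, not offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
		return (-1);

	/* regexec() does not alter the compiled form, so reuse it. */
	for (i = 0; i < n; i++) {
		if (regexec(&re, v[i], 0, NULL, 0) == 0)
			count++;
	}

	regfree(&re);
	return (count);
}
```

Compiling once and calling regexec() in the loop avoids paying the
compilation cost for every string.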
By default, the NUL-terminated string pointed to by
is considered to be the text of an entire line, minus any terminating
argument is the bitwise OR of zero or more of the following flags:
.Bl -tag -width REG_STARTEND
The first character of the string is treated as the continuation
This means that the anchors
do not match before it; but see
This does not affect the behavior of newlines under
The NUL terminating the string does not end a line, so the
anchor does not match before it.
This does not affect the behavior of newlines under
The string is considered to start at
.Fa pmatch Ns [0]. Ns Fa rm_so
and to end before the byte located at
.Fa pmatch Ns [0]. Ns Fa rm_eo ,
regardless of the value of
See below for the definition of
This is an extension, compatible with but not specified by
and should be used with caution in software intended to be portable to other
is considered the beginning of a line, such that
matches before it, and the beginning of a word if there is a word character at
this position, such that
the character at position
is treated as the continuation of a line, and if
is greater than 0, the preceding character is taken into consideration.
If the preceding character is a newline and the regular expression was compiled
matches before the string; if the preceding character is not a word character
but the string starts with a word character,
match before the string.
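.Pp
The effect of the two standard anchor-related execution flags can be sketched
as follows.
The helper name match_with_eflags is a hypothetical wrapper introduced only for
this illustration; it compiles a basic RE and passes the caller's eflags
straight through to regexec().

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match text against the basic RE in pattern using
 * the given eflags.  Returns the regexec() result, or -1 if the RE does
 * not compile.
 */
int
match_with_eflags(const char *pattern, const char *text, int eflags)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, eflags);
	regfree(&re);
	return (rc);
}
```

With this wrapper, matching "^abc" against "abcdef" succeeds with eflags of 0
but returns REG_NOMATCH with REG_NOTBOL, and "def$" behaves the same way with
REG_NOTEOL.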
for a discussion of what is matched in situations where an RE or a portion
thereof could match any of several substrings of
was specified in the compilation of the RE, or if
.Po but see below for the case where
points to an array of
Such a structure has at least the members
.Po a signed arithmetic type at least as large as an
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
An empty substring is denoted by equal offsets, both indicating the character
following the empty substring.
The 0th member of the
array is filled in to indicate what substring of
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE; member
reports subexpression
with subexpressions counted
by the order of their opening parentheses in the RE, left to right.
Unused entries in the array
.Po corresponding either to subexpressions that did not participate in the match
at all, or to subexpressions that do not exist in the RE
.Fa preg Ns -> Ns Va re_nsub
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
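.Pp
The offset reporting described above can be sketched with a small helper that
copies the text of one subexpression out of the matched string.
The name copy_submatch, its parameters, and the fixed array size of 10 are
assumptions made for this example; the check for an offset of -1 relies on the
POSIX convention for unused entries.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text matched by subexpression idx of the
 * extended RE in pattern into out (of size outsz).  Returns 0 on success,
 * or -1 if the RE does not compile, does not match, the subexpression did
 * not participate in the match, or the text does not fit.
 */
int
copy_submatch(const char *pattern, const char *text, size_t idx,
    char *out, size_t outsz)
{
	regex_t re;
	regmatch_t pm[10];
	size_t len;
	int rc = -1;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);

	/* pm[0] is the whole match; pm[i] is the i-th subexpression. */
	if (idx < 10 && idx <= re.re_nsub &&
	    regexec(&re, text, 10, pm, 0) == 0 && pm[idx].rm_so != -1) {
		len = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);
		if (len < outsz) {
			memcpy(out, text + pm[idx].rm_so, len);
			out[len] = '\0';
			rc = 0;
		}
	}
	regfree(&re);
	return (rc);
}
```

For the RE "([a-z]+)=([0-9]+)" and the string "key=42", index 0 yields the
whole match, index 1 yields "key", and index 2 yields "42".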
.Po Note, as an example in particular, that when the RE
the parenthesized subexpression matches each of the three
and then an infinite number of empty strings following the last
so the reported substring is one of the empties.
must point to at least one
to hold the input offsets for
Use for output is still entirely controlled by
will not be changed by a successful
function maps a non-zero
to a human-readable, printable message.
is non-NULL, the error code should have arisen from use of the
and if the error code came from
it should have been the result from the most recent
may be able to supply a more detailed message using information
function places the NUL-terminated message into the buffer pointed to by
.Pq including the NUL
If the whole message will not fit, as much of it as will fit before the
terminating NUL is supplied.
In any case, the returned value is the size of buffer needed to hold the whole
.Pq including terminating NUL .
is ignored but the return value is still correct.
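.Pp
The truncation behavior described above can be sketched as follows.
The helper name compile_error_message is hypothetical; it compiles an extended
RE and, on failure, asks regerror() to place as much of the message as fits
into a caller-supplied buffer.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile pattern as an extended RE and, on failure,
 * place as much of the error message as fits into buf.  Returns the buffer
 * size needed for the whole message (including the NUL), or 0 if the RE
 * compiled successfully.
 */
size_t
compile_error_message(const char *pattern, char *buf, size_t bufsz)
{
	regex_t re;
	int rc;

	rc = regcomp(&re, pattern, REG_EXTENDED);
	if (rc == 0) {
		regfree(&re);
		return (0);
	}
	/* A long message is cut to bufsz - 1 characters plus the NUL. */
	return (regerror(rc, &re, buf, bufsz));
}
```

Comparing the return value with the buffer size tells the caller whether the
message was truncated.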
that results is the printable name of the error code, e.g.
rather than an explanation thereof.
shall be non-NULL and the
member of the structure it points to must point to the printable name of an
error code; in this case, the result in
is the decimal digits of the numeric value of the error code
.Pq 0 if the name is not recognized .
are intended primarily as debugging facilities; they are extensions,
compatible with but not specified by
and should be used with caution in software intended to be portable to other
function frees any dynamically-allocated storage associated with the compiled RE
is no longer a valid compiled RE and the effect of supplying it to
.Sh IMPLEMENTATION NOTES
There are a number of decisions that
leaves up to the implementor,
either by explicitly saying
or by virtue of them being forbidden by the RE grammar.
This implementation treats them as follows.
There is no particular limit on the length of REs, except insofar as memory is
Memory usage is approximately linear in RE size, and largely insensitive
to RE complexity, except for bounded repetitions.
A backslashed character other than one specifically given a magic meaning by
.Pq such magic meanings occur only in BREs
is taken as an ordinary character.
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
the limit on repetition counts in bounded repetitions, is 255.
A repetition operator
cannot follow another repetition operator.
A repetition operator cannot begin an expression or subexpression
cannot appear first or last in a (sub)expression or after another
cannot be an empty subexpression.
An empty parenthesized subexpression,
is legal and matches an empty (sub)string.
An empty string is not a legal RE.
followed by a digit is considered the beginning of bounds for a bounded
repetition, which must then follow the syntax for bounds.
followed by a digit is considered an ordinary character.
beginning and ending subexpressions in BREs are anchors, not ordinary
On successful completion, the
.Fn regcomp
function returns 0.
Otherwise, it returns an integer value indicating an error as described in
.In regex.h ,
and the content of
.Fa preg
is undefined.
.Pp
On successful completion, the
.Fn regexec
function returns 0.
Otherwise, it returns
.Dv REG_NOMATCH
to indicate no match, or
.Dv REG_ENOSYS
to indicate that the function is not supported.
.Pp
Upon successful completion, the
.Fn regerror
function returns the number of bytes needed to hold the entire generated string.
Otherwise, it returns 0 to indicate that the function is not implemented.
.Pp
The
.Fn regfree
function returns no value.
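.Pp
A caller usually needs to tell apart three outcomes: a match, no match, and an
error.
The following is a minimal sketch of that distinction; the helper name
try_match and its return convention are assumptions made for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if text matches the extended RE in
 * pattern, 0 if it does not, and -1 if regcomp() or regexec() reports
 * any other error.
 */
int
try_match(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	/* On failure the contents of re are undefined, so do not free it. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);

	if (rc == 0)
		return (1);
	return (rc == REG_NOMATCH ? 0 : -1);
}
```

Keeping REG_NOMATCH separate from other non-zero results avoids silently
treating an invalid pattern as a failed match.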
The following constants are defined as error return values:
.Bl -tag -width "REG_ECOLLATE" -compact
function failed to match.
Invalid regular expression.
Invalid collating element referenced.
Invalid character class type referenced.
The function is not supported.
invalid: not a number, number too large, more than two
numbers, first larger than second.
Invalid endpoint in range expression.
not preceded by valid regular expression.
An application could use:
.Bd -literal -offset Ds
regerror(code, preg, (char *)NULL, (size_t)0)
.Ed
.Pp
to find out how big a buffer is needed for the generated string,
allocate a buffer to hold the string, and then call
.Fn regerror
again to get the string.
Alternatively, it could allocate a fixed, static buffer that is big enough to hold
most strings, and then dynamically
allocate a larger buffer if it finds that this is too small.
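.Pp
The two-call approach can be sketched as follows.
The helper name regerror_alloc is hypothetical, and the use of malloc() is one
possible allocation strategy rather than a requirement.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for code,
 * or NULL if the size cannot be determined or allocation fails.  The
 * caller frees the result.
 */
char *
regerror_alloc(int code, const regex_t *preg)
{
	size_t needed;
	char *msg;

	/* First call: a zero-size buffer is ignored; only the size returns. */
	needed = regerror(code, preg, NULL, 0);
	if (needed == 0)
		return (NULL);
	msg = malloc(needed);
	if (msg == NULL)
		return (NULL);
	/* Second call: the buffer now holds the whole message. */
	(void) regerror(code, preg, msg, needed);
	return (msg);
}
```

The fixed-buffer alternative saves an allocation in the common case, at the
cost of a fallback path when the message is longer than expected.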
Matching string against the extended regular expression in pattern.
.Bd -literal -offset Ds
#include <regex.h>
/*
 * Match string against the extended regular expression in
 * pattern, treating errors as no match.
 * return 1 for match, 0 for no match
 */
int
match(const char *string, const char *pattern)
{
	regex_t re;
	if (regcomp(&re, pattern, REG_EXTENDED\||\|REG_NOSUB) != 0)
		return (0);	/* report error */
	int status = regexec(&re, string, (size_t)0, NULL, 0);
	regfree(&re);	/* release the compiled form in all cases */
	return (status == 0);	/* errors count as no match */
}
.Ed
The following demonstrates how the
.Dv REG_NOTBOL
flag could be used with
.Fn regexec
to find all substrings in a line that match a pattern supplied by a user.
.Pq For simplicity of the example, very little error checking is done.
.Bd -literal -offset Ds
const char *p = buffer;	/* start of the unsearched remainder */

(void) regcomp(&re, pattern, 0);
/* this call to regexec() finds the first match on the line */
error = regexec(&re, p, 1, &pm, 0);
while (error == 0) {	/* while matches found */
	/* substring found between p + pm.rm_so and p + pm.rm_eo */
	if (pm.rm_eo == 0)
		break;	/* empty match at p; stop rather than loop forever */
	p += pm.rm_eo;	/* offsets are relative to p, so advance p */
	/* This call to regexec() finds the next match */
	error = regexec(&re, p, 1, &pm, REG_NOTBOL);
}
.Ed
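.Pp
A complete version of the same search loop might look like the sketch below.
The helper name count_all_matches is hypothetical, and stepping over one byte
after an empty match is one possible policy, chosen here only to guarantee
that the loop terminates.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the non-overlapping matches of the basic RE
 * in pattern within line.  Returns -1 if the RE does not compile.
 */
int
count_all_matches(const char *pattern, const char *line)
{
	regex_t re;
	regmatch_t pm;
	const char *p = line;	/* start of the unsearched remainder */
	int eflags = 0;
	int count = 0;

	if (regcomp(&re, pattern, 0) != 0)
		return (-1);

	while (regexec(&re, p, 1, &pm, eflags) == 0) {
		count++;
		/* Offsets are relative to p, not to line. */
		if (pm.rm_eo > 0) {
			p += pm.rm_eo;
		} else if (*p != '\0') {
			p++;		/* empty match: step over one byte */
		} else {
			break;		/* empty match at end of line */
		}
		eflags = REG_NOTBOL;	/* p no longer starts the line */
	}

	regfree(&re);
	return (count);
}
```

Because every call after the first passes REG_NOTBOL, an RE such as "^a"
matches at most once per line.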
No errors are defined.
.Sh CODE SET INDEPENDENCE
.Sh INTERFACE STABILITY
.Sy MT-Safe with exceptions
function can be used safely in a multithreaded application as long as
is not being called to change the locale.
.Pq Regular Expression Notation
.Pq C Binding for Regular Expression Matching .