.TH PCRESYNTAX 3 "11 November 2012" "PCRE 8.32"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
\fBpcrepattern\fP
documentation. This document contains a quick-reference summary of the syntax.
  \ex          where x is non-alphanumeric is a literal x
  \eQ...\eE     treat enclosed characters as literal
  \ea          alarm, that is, the BEL character (hex 07)
  \ecx         "control-x", where x is any ASCII character
  \ef          form feed (hex 0C)
  \er          carriage return (hex 0D)
  \eddd        character with octal code ddd, or backreference
  \exhh        character with hex code hh
  \ex{hhh..}   character with hex code hhh..
  .          any character except newline;
               in dotall mode, any character whatsoever
  \eC         one data unit, even in UTF mode (best avoided)
  \eD         a character that is not a decimal digit
  \eh         a horizontal white space character
  \eH         a character that is not a horizontal white space character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a white space character
  \eS         a character that is not a white space character
  \ev         a vertical white space character
  \eV         a character that is not a vertical white space character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         a Unicode extended grapheme cluster
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
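
The difference between ASCII-only and Unicode-aware matching can be sketched
as follows. This is an illustration under stated assumptions, not PCRE code:
it uses Python's standard re module, which is a separate engine. Its re.ASCII
flag gives roughly the behaviour that PCRE has by default, and its default for
str patterns is roughly what PCRE does when PCRE_UCP is set. The subject
string is made up for the example.

```python
import re

# U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit outside ASCII.
subject = "7\u0663"

# ASCII-only matching, comparable to PCRE's default behaviour.
ascii_digits = re.findall(r"\d", subject, flags=re.ASCII)

# Unicode-aware matching, comparable to PCRE with PCRE_UCP set.
unicode_digits = re.findall(r"\d", subject)

print(ascii_digits)    # only the ASCII digit
print(unicode_digits)  # both digits
```

In the ASCII-only case only "7" is found; in the Unicode-aware case both
characters are found.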
.SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP"
  Pc         Connector punctuation
  Pi         Initial punctuation
  Sm         Mathematical symbol
  Zp         Paragraph separator
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP"
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
.SH "SCRIPT NAMES FOR \ep AND \eP"
Egyptian_Hieroglyphs,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Meroitic_Hieroglyphs,
.SH "CHARACTER CLASSES"
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
  cntrl       control character
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  upper       upper case letter
  xdigit      hexadecimal digit
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
  ?+          0 or 1, possessive
  *+          0 or more, possessive
  ++          1 or more, possessive
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
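
The greedy and lazy forms listed above can be compared with a small sketch.
It assumes Python's standard re module as a stand-in (a separate engine from
PCRE that shares this part of the syntax); the subject string is invented for
the example. The possessive forms are left out because only Python 3.11 and
later accept them.

```python
import re

subject = "123456"

# {2,4}: at least 2, no more than 4, greedy (takes as many as allowed).
bounded_greedy = re.match(r"\d{2,4}", subject).group(0)

# {2,4}?: same bounds, lazy (takes as few as allowed).
bounded_lazy = re.match(r"\d{2,4}?", subject).group(0)

# {2,}: 2 or more, greedy.
open_greedy = re.match(r"\d{2,}", subject).group(0)

# {2,}?: 2 or more, lazy.
open_lazy = re.match(r"\d{2,}?", subject).group(0)

print(bounded_greedy, bounded_lazy, open_greedy, open_lazy)
```

The greedy forms return "1234" and "123456"; both lazy forms stop at "12".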
.SH "ANCHORS AND SIMPLE ASSERTIONS"
  \eB          not a word boundary
                also after internal newline in multiline mode
                also before newline at end of subject
                also before internal newline in multiline mode
                also before newline at end of subject
  \eG          first matching position in subject
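
A short sketch of two of these behaviours follows: the "not a word boundary"
assertion, and the effect of multiline mode on the start-of-line anchor. It
assumes Python's standard re module as a stand-in (a separate engine from
PCRE; it has no equivalent of \eG, so that assertion is not shown). The
subject string is invented for the example.

```python
import re

subject = "cat concat\ncatalog"

# \B requires that the position is NOT a word boundary, so only the
# "cat" inside "concat" qualifies.
inner = [m.start() for m in re.finditer(r"\Bcat", subject)]

# Without multiline mode, ^ matches only at the very start of the subject.
single = [m.start() for m in re.finditer(r"^cat", subject)]

# In multiline mode, ^ also matches after each internal newline.
multi = [m.start() for m in re.finditer(r"^cat", subject, flags=re.MULTILINE)]

print(inner, single, multi)
```

The offsets are 7 for the inner match, 0 without multiline mode, and 0 and 11
with it.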
.SH "MATCH POINT RESET"
  \eK          reset start of match
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
  (?>...)         atomic, non-capturing group
  (?#....)        comment (not nestable)
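
The following sketch combines a named capturing group, a non-capturing group,
a numbered capturing group and a comment. It assumes Python's standard re
module as a stand-in (a separate engine from PCRE), so only the Python-style
named group is used, since that is the form re is known to accept. The
pattern and subject are invented for the example.

```python
import re

# (?P<year>...) is a named capturing group, (?:...) groups without
# capturing, (\d{2}) is numbered group 2, and (?#...) is a comment that
# the engine ignores.
pattern = re.compile(r"(?P<year>\d{4})(?:-|/)(\d{2})(?#month is group 2)")

m = pattern.search("released 2012-11")

year = m.group("year")   # access by name
month = m.group(2)       # access by number
all_groups = m.groups()  # the non-capturing group does not appear here

print(year, month, all_groups)
```

Because the separator sits in a non-capturing group, the month is group 2
rather than group 3.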
  (?J)        allow duplicate names
  (?s)        single line (dotall)
  (?U)        default ungreedy (lazy)
  (?x)        extended (ignore white space)
  (?-...)     unset option(s)
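
Two of the options above, (?s) and (?x), can be tried out with the sketch
below. It assumes Python's standard re module as a stand-in (a separate
engine from PCRE); re accepts these two settings at the start of a pattern,
but it does not implement every option in the list. The subject string is
invented for the example.

```python
import re

subject = "first\nsecond"

# By default "." does not match a newline, so the match stops at line end.
default = re.match(r"f.*", subject).group(0)

# (?s) at the start of the pattern switches on dotall mode.
dotall = re.match(r"(?s)f.*", subject).group(0)

# (?x) makes the engine ignore white space in the pattern.
extended = re.match(r"(?x) f i r s t", subject).group(0)

print(repr(default), repr(dotall), repr(extended))
```

Only the dotall match runs past the newline to cover the whole subject.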
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
  (*NO_START_OPT)  no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)          set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)         set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UTF32)         set UTF-32 mode: 32-bit library (PCRE_UTF32)
  (*UTF)           set appropriate UTF mode for the library in use
  (*UCP)           set PCRE_UCP (use Unicode properties for \ed etc)
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
  (?=...)     positive look ahead
  (?!...)     negative look ahead
  (?<=...)    positive look behind
  (?<!...)    negative look behind
Each top-level branch of a look behind must be of a fixed length.
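
The four assertions can be compared on one subject string, as in the sketch
below. It assumes Python's standard re module as a stand-in (a separate
engine from PCRE that shares this syntax); the subject string is invented for
the example, and every look behind used here has a fixed length.

```python
import re

subject = "cost: $42, 17 items, $8"

# Positive look behind: digits that are immediately preceded by "$".
prices = re.findall(r"(?<=\$)\d+", subject)

# Negative look behind: whole numbers that are not preceded by "$".
counts = re.findall(r"\b(?<!\$)\d+", subject)

# Positive look ahead: digits followed by " items"; the look ahead
# itself consumes nothing, so only the digits are returned.
before_items = re.findall(r"\d+(?= items)", subject)

# Negative look ahead: whole numbers that are not followed by " items".
not_items = re.findall(r"\b\d+\b(?! items)", subject)

print(prices, counts, before_items, not_items)
```

The assertions test the surrounding text without making it part of the match,
which is why none of the results contains "$" or " items".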
  \en          reference by number (can be ambiguous)
  \egn         reference by number
  \eg{n}       reference by number
  \eg{-n}      relative reference by number
  \ek<name>    reference by name (Perl)
  \ek'name'    reference by name (Perl)
  \eg{name}    reference by name (Perl)
  \ek{name}    reference by name (.NET)
  (?P=name)   reference by name (Python)
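
A back reference matches the same text that the referenced group matched,
which the sketch below uses to find doubled words. It assumes Python's
standard re module as a stand-in (a separate engine from PCRE), so only the
plain numbered form and the Python-style named form are shown. The subject
strings are invented for the example.

```python
import re

# \1 refers back to whatever group 1 matched.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# (?P=word) refers back to the named group "word" (Python syntax).
named = re.search(
    r"\b(?P<word>\w+) (?P=word)\b", "it was was late"
).group("word")

print(doubled, named)
```

The first search finds the repeated "is" and the second finds the repeated
"was".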
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
  (?R)        recurse whole pattern
  (?n)        call subpattern by absolute number
  (?+n)       call subpattern by relative number
  (?-n)       call subpattern by relative number
  (?&name)    call subpattern by name (Perl)
  (?P>name)   call subpattern by name (Python)
  \eg<name>    call subpattern by name (Oniguruma)
  \eg'name'    call subpattern by name (Oniguruma)
  \eg<n>       call subpattern by absolute number (Oniguruma)
  \eg'n'       call subpattern by absolute number (Oniguruma)
  \eg<+n>      call subpattern by relative number (PCRE extension)
  \eg'+n'      call subpattern by relative number (PCRE extension)
  \eg<-n>      call subpattern by relative number (PCRE extension)
  \eg'-n'      call subpattern by relative number (PCRE extension)
.SH "CONDITIONAL PATTERNS"
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(<name>)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
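
The absolute reference condition can be sketched as follows. The example
assumes Python's standard re module as a stand-in (a separate engine from
PCRE); re accepts the numbered form used here, but it does not implement the
recursion, DEFINE or assertion conditions. The pattern, the helper name
accepts and the sample addresses are invented for the example.

```python
import re

# Group 1 optionally matches "<". The condition (?(1)>|$) then requires a
# closing ">" if group 1 took part in the match, and the end of the
# subject otherwise.
pattern = re.compile(r"^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")


def accepts(subject):
    """Return True if the hypothetical address pattern matches subject."""
    return pattern.search(subject) is not None


print(accepts("<user@host.com>"))  # both brackets present
print(accepts("user@host.com"))    # no brackets
print(accepts("<user@host.com"))   # opening bracket only
print(accepts("user@host.com>"))   # closing bracket only
```

The two balanced forms are accepted and the two unbalanced forms are
rejected, because the yes-pattern and the no-pattern are chosen according to
whether group 1 matched.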
.SH "BACKTRACKING CONTROL"
The following act as soon as they are reached:
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
.SH "NEWLINE CONVENTIONS"
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
  (*CR)        carriage return only
  (*CRLF)      carriage return followed by linefeed
  (*ANYCRLF)   all three of the above
  (*ANY)       any Unicode newline sequence
.SH "WHAT \eR MATCHES"
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
  (?Cn)       callout with data n
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
University Computing Service
Cambridge CB2 3QH, England.
Last updated: 11 November 2012
Copyright (c) 1997-2012 University of Cambridge.