.TH PCRESYNTAX 3 "11 November 2012" "PCRE 8.32"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains a quick-reference summary of the syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE    treat enclosed characters as literal
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII character
  \ee         escape (hex 1B)
  \ef         form feed (hex 0C)
  \en         newline (hex 0A)
  \er         carriage return (hex 0D)
  \et         tab (hex 09)
  \eddd       character with octal code ddd, or backreference
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .          any character except newline;
               in dotall mode, any character whatsoever
  \eC         one data unit, even in UTF mode (best avoided)
  \ed         a decimal digit
  \eD         a character that is not a decimal digit
  \eh         a horizontal white space character
  \eH         a character that is not a horizontal white space character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a white space character
  \eS         a character that is not a white space character
  \ev         a vertical white space character
  \eV         a character that is not a vertical white space character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         a Unicode extended grapheme cluster
.sp
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
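.sp
As a rough illustration of ASCII-only versus Unicode-aware digit matching,
here is a minimal sketch that uses Python's re module as a stand-in. That
choice is an assumption made for convenience and is not PCRE's own API.
Python's defaults are the reverse of PCRE's: its re.ASCII flag gives the
behaviour PCRE has by default, and omitting the flag resembles setting
PCRE_UCP.

```python
import re

# A Unicode decimal digit (general category Nd) that lies outside ASCII.
ARABIC_INDIC_THREE = "\u0663"

# ASCII-only digit matching, comparable to PCRE's default behaviour.
ascii_only = re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII)

# Unicode-aware digit matching, comparable to PCRE with PCRE_UCP set.
unicode_aware = re.fullmatch(r"\d", ARABIC_INDIC_THREE)

print(ascii_only is None)         # True
print(unicode_aware is not None)  # True
```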
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
.sp
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
.sp
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
.sp
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
.sp
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
.sp
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
.sp
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
.sp
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \ew
  xdigit      hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
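.sp
The difference between a greedy and a lazy quantifier can be sketched as
follows. Python's re module is used here only as a convenient stand-in that
shares this part of the syntax; it is an assumption of the example, not the
PCRE API. Possessive forms are left out because older Python versions do not
accept them.

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as it can while still allowing a match.
greedy = re.match(r"<.+>", subject).group()

# Lazy: .+? takes as little as it can while still allowing a match.
lazy = re.match(r"<.+?>", subject).group()

print(greedy)  # <a><b>
print(lazy)    # <a>
```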
.
.
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb          word boundary
  \eB          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \eA          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \eZ          end of subject
               also before newline at end of subject
  \ez          end of subject
  \eG          first matching position in subject
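.sp
A small sketch of how the circumflex and dollar anchors behave with and
without multiline mode. It uses Python's re module as a stand-in (an
assumption of this example). Only the two anchors are shown, because the
upper-case Z escape has a different meaning in Python than in PCRE.

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
single = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
multi = re.findall(r"^\w+", subject, re.MULTILINE)

# $ matches before the newline that ends the subject.
before_final_newline = re.search(r"two$", subject) is not None

print(single)                # ['one']
print(multi)                 # ['one', 'two']
print(before_final_newline)  # True
```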
.
.
.SH "MATCH POINT RESET"
.rs
.sp
  \eK          reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
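.sp
Alternatives are tried from left to right, and the first one that lets the
match succeed is used, so their order can matter. A minimal sketch, with
Python's re module assumed as a stand-in for PCRE:

```python
import re

# The first alternative already matches, so the longer one is never tried.
short_first = re.match(r"a|ab", "ab").group()

# Putting the longer alternative first changes what is matched.
long_first = re.match(r"ab|a", "ab").group()

print(short_first)  # a
print(long_first)   # ab
```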
.
.
.SH "CAPTURING"
.rs
.sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
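.sp
The following sketch shows how named, numbered, and non-capturing groups
interact. It assumes Python's re module as a stand-in, which accepts only the
Python spelling of named groups from the list above. The date layout and the
group name "year" are hypothetical.

```python
import re

# One named group, one non-capturing group, one plain capturing group.
pattern = re.compile(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})")
m = pattern.fullmatch("2012-11-11")

# A named group is also numbered; the non-capturing group gets no number.
by_name = m.group("year")
by_number = m.group(1)
second_group = m.group(2)
all_groups = m.groups()

print(by_name, by_number, second_group)  # 2012 2012 11
print(all_groups)                        # ('2012', '11')
```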
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)         atomic, non-capturing group
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)        comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
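.sp
A brief sketch of three of the inline options above, (?i), (?s) and (?x),
placed at the start of a pattern. Python's re module is assumed as a stand-in;
it does not implement (?J) or (?U), so those are not shown.

```python
import re

# (?i): letters match regardless of case.
caseless = re.fullmatch(r"(?i)pcre", "PCRE") is not None

# Without (?s), a dot does not match a newline; with it, it does.
dot_default = re.fullmatch(r"a.b", "a\nb") is not None
dot_all = re.fullmatch(r"(?s)a.b", "a\nb") is not None

# (?x): white space in the pattern is ignored.
extended = re.fullmatch(r"(?x) \d+ - \d+", "10-20") is not None

print(caseless, dot_default, dot_all, extended)  # True False True True
```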
.sp
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32)
  (*UTF)          set appropriate UTF mode for the library in use
  (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc)
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?<=...)        positive look behind
  (?<!...)        negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
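.sp
The assertions above can be sketched as follows, with Python's re module
assumed as a stand-in for PCRE. Python is stricter than the rule just stated:
it requires the whole look behind to have a single fixed width, but it rejects
a variable-length look behind in a comparable way.

```python
import re

subject = "cost $42, qty 17"

# Positive look behind: digits that are preceded by a dollar sign.
priced = re.findall(r"(?<=\$)\d+", subject)

# Negative look behind: numbers that are not preceded by a dollar sign.
unpriced = re.findall(r"(?<!\$)\b\d+", subject)

# Positive look ahead: words that are followed by a comma.
before_comma = re.findall(r"\w+(?=,)", "a, b, c")

# A look behind whose length is not fixed is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_length_accepted = True
except re.error:
    variable_length_accepted = False

print(priced)                    # ['42']
print(unpriced)                  # ['17']
print(before_comma)              # ['a', 'b']
print(variable_length_accepted)  # False
```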
.
.
.SH "BACKREFERENCES"
.rs
.sp
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
  \eg{name}        reference by name (Perl)
  \ek{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
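.sp
A short sketch of a numbered and a named backreference. Python's re module is
assumed as a stand-in; of the forms listed above it accepts only the plain
numbered form and the Python named form. The group name "ch" is hypothetical.

```python
import re

# Numbered backreference: find a word that is immediately repeated.
repeated_word = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Named backreference: find the first doubled character.
doubled = re.search(r"(?P<ch>\w)(?P=ch)", "committee").group()

print(repeated_word)  # is
print(doubled)        # mm
```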
.
.
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&name)        call subpattern by name (Perl)
  (?P>name)       call subpattern by name (Python)
  \eg<name>        call subpattern by name (Oniguruma)
  \eg'name'        call subpattern by name (Oniguruma)
  \eg<n>           call subpattern by absolute number (Oniguruma)
  \eg'n'           call subpattern by absolute number (Oniguruma)
  \eg<+n>          call subpattern by relative number (PCRE extension)
  \eg'+n'          call subpattern by relative number (PCRE extension)
  \eg<-n>          call subpattern by relative number (PCRE extension)
  \eg'-n'          call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(<name>)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
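.sp
The absolute reference condition can be sketched as follows: a closing angle
bracket is required only if the optional opening one was matched. Python's re
module is assumed as a stand-in; it supports the numbered and named reference
conditions but not the recursion, DEFINE, or assertion conditions.

```python
import re

# Group 1 optionally matches "<"; the condition (?(1)>) demands ">" only
# when group 1 took part in the match.
pattern = re.compile(r"(<)?\w+(?(1)>)")

results = {s: pattern.fullmatch(s) is not None for s in ["<a>", "a", "<a", "a>"]}

print(results)  # {'<a>': True, 'a': True, '<a': False, 'a>': False}
```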
.
.
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act as soon as they are reached:
.sp
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
.sp
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
.sp
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)      callout
  (?Cn)     callout with data n
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 11 November 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi
 495 Copyright (c) 1997-2012 University of Cambridge.
 496 .fi