lib/libc/tre-regex/re_format.7

   1 .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
   2 .\" Copyright (c) 1992, 1993, 1994
   3 .\"     The Regents of the University of California.  All rights reserved.
   4 .\"
   5 .\" This code is derived from software contributed to Berkeley by
   6 .\" Henry Spencer.
   7 .\"
   8 .\" Redistribution and use in source and binary forms, with or without
   9 .\" modification, are permitted provided that the following conditions
  10 .\" are met:
  11 .\" 1. Redistributions of source code must retain the above copyright
  12 .\"    notice, this list of conditions and the following disclaimer.
  13 .\" 2. Redistributions in binary form must reproduce the above copyright
  14 .\"    notice, this list of conditions and the following disclaimer in the
  15 .\"    documentation and/or other materials provided with the distribution.
  16 .\" 3. Neither the name of the University nor the names of its contributors
  17 .\"    may be used to endorse or promote products derived from this software
  18 .\"    without specific prior written permission.
  19 .\"
  20 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  21 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  22 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  23 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  24 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  25 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  26 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  27 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  28 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  29 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  30 .\" SUCH DAMAGE.
  31 .\"
  32 .\"     @(#)re_format.7 8.3 (Berkeley) 3/20/94
  33 .\" $FreeBSD: src/lib/libc/regex/re_format.7,v 1.12 2008/09/05 17:41:20 keramida Exp $
  34 .\"
  35 .Dd August 6, 2015
  36 .Dt RE_FORMAT 7
  37 .Os
  38 .Sh NAME
  39 .Nm re_format
  40 .Nd POSIX 1003.2 regular expressions
  41 .Sh DESCRIPTION
  42 Regular expressions
  43 .Pq Dq RE Ns s ,
  44 as defined in
  45 .St -p1003.2 ,
  46 come in two forms:
  47 modern REs (roughly those of
  48 .Xr egrep 1 ;
  49 1003.2 calls these
  50 .Dq extended
  51 REs)
  52 and obsolete REs (roughly those of
  53 .Xr ed 1 ;
  54 1003.2
  55 .Dq basic
  56 REs).
  57 Obsolete REs mostly exist for backward compatibility in some old programs;
  58 they will be discussed at the end.
  59 .St -p1003.2
  60 leaves some aspects of RE syntax and semantics open;
  61 `\(dd' marks decisions on these aspects that
  62 may not be fully portable to other
  63 .St -p1003.2
  64 implementations.
  65 .Pp
  66 A (modern) RE is one\(dd or more non-empty\(dd
  67 .Em branches ,
  68 separated by
  69 .Ql \&| .
  70 It matches anything that matches one of the branches.
  71 .Pp
  72 A branch is one\(dd or more
  73 .Em pieces ,
  74 concatenated.
  75 It matches a match for the first, followed by a match for the second, etc.
  76 .Pp
  77 A piece is an
  78 .Em atom
  79 possibly followed
  80 by a single\(dd
  81 .Ql \&* ,
  82 .Ql \&+ ,
  83 .Ql \&? ,
  84 or
  85 .Em bound .
  86 An atom followed by
  87 .Ql \&*
  88 matches a sequence of 0 or more matches of the atom.
  89 An atom followed by
  90 .Ql \&+
  91 matches a sequence of 1 or more matches of the atom.
  92 An atom followed by
  93 .Ql ?\&
  94 matches a sequence of 0 or 1 matches of the atom.
  95 .Pp
  96 A
  97 .Em bound
  98 is
  99 .Ql \&{
 100 followed by an unsigned decimal integer,
 101 possibly followed by
 102 .Ql \&,
 103 possibly followed by another unsigned decimal integer,
 104 always followed by
 105 .Ql \&} .
 106 The integers must lie between 0 and
 107 .Dv RE_DUP_MAX
 108 (255\(dd) inclusive,
 109 and if there are two of them, the first may not exceed the second.
 110 An atom followed by a bound containing one integer
 111 .Em i
 112 and no comma matches
 113 a sequence of exactly
 114 .Em i
 115 matches of the atom.
 116 An atom followed by a bound
 117 containing one integer
 118 .Em i
 119 and a comma matches
 120 a sequence of
 121 .Em i
 122 or more matches of the atom.
 123 An atom followed by a bound
 124 containing two integers
 125 .Em i
 126 and
 127 .Em j
 128 matches
 129 a sequence of
 130 .Em i
 131 through
 132 .Em j
 133 (inclusive) matches of the atom.
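.Pp
The bound rules above can be checked with a short C sketch.
It assumes only the standard
.Fn regcomp
and
.Fn regexec
interfaces; the helper name
.Fn re_ext_match
is hypothetical and not part of any library.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a library function): returns 1 if the
 * extended RE matches somewhere in text, 0 if it does not, and -1
 * if the RE fails to compile.
 */
int
re_ext_match(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

With this helper, the anchored RE "^a{2,3}$" should accept "aa" and "aaa"
but reject "aaaa", and a bound whose first integer exceeds the second,
such as "a{3,2}", should be refused at compile time.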
 134 .Pp
 135 An atom is a regular expression enclosed in
 136 .Ql ()
 137 (matching a match for the
 138 regular expression),
 139 an empty set of
 140 .Ql ()
 141 (matching the null string)\(dd,
 142 a
 143 .Em bracket expression
 144 (see below),
 145 .Ql .\&
 146 (matching any single character),
 147 .Ql \&^
 148 (matching the null string at the beginning of a line),
 149 .Ql \&$
 150 (matching the null string at the end of a line), a
 151 .Ql \e
 152 followed by one of the characters
 153 .Ql ^.[$()|*+?{\e
 154 (matching that character taken as an ordinary character),
 155 a
 156 .Ql \e
 157 followed by any other character\(dd
 158 (matching that character taken as an ordinary character,
 159 as if the
 160 .Ql \e
 161 had not been present\(dd),
 162 or a single character with no other significance (matching that character).
 163 A
 164 .Ql \&{
 165 followed by a character other than a digit is an ordinary
 166 character, not the beginning of a bound\(dd.
 167 It is illegal to end an RE with
 168 .Ql \e .
 169 .Pp
 170 A
 171 .Em bracket expression
 172 is a list of characters enclosed in
 173 .Ql [] .
 174 It normally matches any single character from the list (but see below).
 175 If the list begins with
 176 .Ql \&^ ,
 177 it matches any single character
 178 (but see below)
 179 .Em not
 180 from the rest of the list.
 181 If two characters in the list are separated by
 182 .Ql \&- ,
 183 this is shorthand
 184 for the full
 185 .Em range
 186 of characters between those two (inclusive) in the
 187 collating sequence,
 188 .No e.g. Ql [0-9]
 189 in ASCII matches any decimal digit.
 190 It is illegal\(dd for two ranges to share an
 191 endpoint,
 192 .No e.g. Ql a-c-e .
 193 Ranges are very collating-sequence-dependent,
 194 and portable programs should avoid relying on them.
 195 .Pp
 196 To include a literal
 197 .Ql \&]
 198 in the list, make it the first character
 199 (following a possible
 200 .Ql \&^ ) .
 201 To include a literal
 202 .Ql \&- ,
 203 make it the first or last character,
 204 or the second endpoint of a range.
 205 To use a literal
 206 .Ql \&-
 207 as the first endpoint of a range,
 208 enclose it in
 209 .Ql [.\&
 210 and
 211 .Ql .]\&
 212 to make it a collating element (see below).
 213 With the exception of these and some combinations using
 214 .Ql \&[
 215 (see next paragraphs), all other special characters, including
 216 .Ql \e ,
 217 lose their special significance within a bracket expression.
 218 .Pp
 219 Within a bracket expression, a collating element (a character,
 220 a multi-character sequence that collates as if it were a single character,
 221 or a collating-sequence name for either)
 222 enclosed in
 223 .Ql [.\&
 224 and
 225 .Ql .]\&
 226 stands for the
 227 sequence of characters of that collating element.
 228 The sequence is a single element of the bracket expression's list.
 229 A bracket expression containing a multi-character collating element
 230 can thus match more than one character,
 231 e.g.\& if the collating sequence includes a
 232 .Ql ch
 233 collating element,
 234 then the RE
 235 .Ql [[.ch.]]*c
 236 matches the first five characters
 237 of
 238 .Ql chchcc .
 239 .Pp
 240 Within a bracket expression, a collating element enclosed in
 241 .Ql [=
 242 and
 243 .Ql =]
 244 is an equivalence class, standing for the sequences of characters
 245 of all collating elements equivalent to that one, including itself.
 246 (If there are no other equivalent collating elements,
 247 the treatment is as if the enclosing delimiters were
 248 .Ql [.\&
 249 and
 250 .Ql .] . )
 251 For example, if
 252 .Ql x
 253 and
 254 .Ql y
 255 are the members of an equivalence class,
 256 then
 257 .Ql [[=x=]] ,
 258 .Ql [[=y=]] ,
 259 and
 260 .Ql [xy]
 261 are all synonymous.
 262 An equivalence class may not\(dd be an endpoint
 263 of a range.
 264 .Pp
 265 Within a bracket expression, the name of a
 266 .Em character class
 267 enclosed in
 268 .Ql [:
 269 and
 270 .Ql :]
 271 stands for the list of all characters belonging to that
 272 class.
 273 Standard character class names are:
 274 .Bl -column "alnum" "digit" "xdigit" -offset indent
 275 .It Em "alnum   digit   punct"
 276 .It Em "alpha   graph   space"
 277 .It Em "blank   lower   upper"
 278 .It Em "cntrl   print   xdigit"
 279 .El
 280 .Pp
 281 These stand for the character classes defined in
 282 .Xr ctype 3 .
 283 A locale may provide others.
 284 A character class may not be used as an endpoint of a range.
 285 .Pp
 286 A bracketed expression like
 287 .Ql [[:class:]]
 288 can be used to match a single character that belongs to a character
 289 class.
 290 For the reverse, matching any character that does not belong to a specific
 291 class, the negation operator of bracket expressions may be used:
 292 .Ql [^[:class:]] .
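.Pp
As an illustration of character classes and their negation, the following
C sketch assumes the default C locale and the standard
.Fn regcomp
interface.
The function names
.Fn is_identifier
and
.Fn has_non_digit
are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 on match, 0 on no match, -1 on compile error. */
static int
match_ere(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}

/* A letter or underscore followed by any number of word characters. */
int
is_identifier(const char *s)
{
	return (match_ere("^[[:alpha:]_][[:alnum:]_]*$", s));
}

/* The negated class matches any single character that is not a digit. */
int
has_non_digit(const char *s)
{
	return (match_ere("[^[:digit:]]", s));
}
```

Here "re_format7" should be accepted as an identifier while "7re" should not,
and only strings containing something other than decimal digits should
satisfy the negated class.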
 293 .Pp
 294 There are two special cases\(dd of bracket expressions:
 295 the bracket expressions
 296 .Ql [[:<:]]
 297 and
 298 .Ql [[:>:]]
 299 match the null string at the beginning and end of a word respectively.
 300 A word is defined as a sequence of word characters
 301 which is neither preceded nor followed by
 302 word characters.
 303 A word character is an
 304 .Em alnum
 305 character (as defined by
 306 .Xr ctype 3 )
 307 or an underscore.
 308 This is an extension,
 309 compatible with but not specified by
 310 .St -p1003.2 ,
 311 and should be used with
 312 caution in software intended to be portable to other systems.
 313 .Pp
 314 In the event that an RE could match more than one substring of a given
 315 string,
 316 the RE matches the one starting earliest in the string.
 317 If the RE could match more than one substring starting at that point,
 318 it matches the longest.
 319 Subexpressions also match the longest possible substrings, subject to
 320 the constraint that the whole match be as long as possible,
 321 with subexpressions starting earlier in the RE taking priority over
 322 ones starting later.
 323 Note that higher-level subexpressions thus take priority over
 324 their lower-level component subexpressions.
 325 .Pp
 326 Match lengths are measured in characters, not collating elements.
 327 A null string is considered longer than no match at all.
 328 For example,
 329 .Ql bb*
 330 matches the three middle characters of
 331 .Ql abbbc ,
 332 .Ql (wee|week)(knights|nights)
 333 matches all ten characters of
 334 .Ql weeknights ,
 335 when
 336 .Ql (.*).*\&
 337 is matched against
 338 .Ql abc
 339 the parenthesized subexpression
 340 matches all three characters, and
 341 when
 342 .Ql (a*)*
 343 is matched against
 344 .Ql bc
 345 both the whole RE and the parenthesized
 346 subexpression match the null string.
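.Pp
The earliest-then-longest rule can be observed through the offsets that
.Fn regexec
reports for the whole match.
The sketch below is only an illustration; the helper name
.Fn re_span
is hypothetical, and the results assume an implementation that follows the
rule described above.

```c
#include <regex.h>

/*
 * Hypothetical helper: on a match, stores the byte offsets of the whole
 * match in *start and *end and returns 1.  Returns 0 on no match and -1
 * if the RE fails to compile.
 */
int
re_span(const char *pattern, const char *text, int *start, int *end)
{
	regex_t re;
	regmatch_t m[1];
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);
	rc = regexec(&re, text, 1, m, 0);
	regfree(&re);
	if (rc != 0)
		return (0);
	*start = (int)m[0].rm_so;
	*end = (int)m[0].rm_eo;
	return (1);
}
```

Matching "bb*" against "abbbc" should report offsets 1 and 4, the three
middle characters.
Matching "a|ab" against "ab" should report offsets 0 and 2, because the longer
alternative wins even though the shorter one is listed first.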
 347 .Pp
 348 If case-independent matching is specified,
 349 the effect is much as if all case distinctions had vanished from the
 350 alphabet.
 351 When an alphabetic that exists in multiple cases appears as an
 352 ordinary character outside a bracket expression, it is effectively
 353 transformed into a bracket expression containing both cases,
 354 .No e.g. Ql x
 355 becomes
 356 .Ql [xX] .
 357 When it appears inside a bracket expression, all case counterparts
 358 of it are added to the bracket expression, so that (e.g.)
 359 .Ql [x]
 360 becomes
 361 .Ql [xX]
 362 and
 363 .Ql [^x]
 364 becomes
 365 .Ql [^xX] .
 366 .Pp
 367 No particular limit is imposed on the length of REs\(dd.
 368 Programs intended to be portable should not employ REs longer
 369 than 256 bytes,
 370 as an implementation can refuse to accept such REs and remain
 371 POSIX-compliant.
 372 .Pp
 373 Obsolete
 374 .Pq Dq basic
 375 regular expressions differ in several respects.
 376 .Ql \&|
 377 is an ordinary character and there is no equivalent
 378 for its functionality.
 379 .Ql \&+
 380 and
 381 .Ql ?\&
 382 are ordinary characters, and their functionality
 383 can be expressed using bounds
 384 .No ( Ql {1,}
 385 or
 386 .Ql {0,1}
 387 respectively).
 388 Also note that
 389 .Ql x+
 390 in modern REs is equivalent to
 391 .Ql xx* .
 392 The delimiters for bounds are
 393 .Ql \e{
 394 and
 395 .Ql \e} ,
 396 with
 397 .Ql \&{
 398 and
 399 .Ql \&}
 400 by themselves ordinary characters.
 401 The parentheses for nested subexpressions are
 402 .Ql \e(
 403 and
 404 .Ql \e) ,
 405 with
 406 .Ql \&(
 407 and
 408 .Ql \&)
 409 by themselves ordinary characters.
 410 .Ql \&^
 411 is an ordinary character except at the beginning of the
 412 RE or\(dd the beginning of a parenthesized subexpression,
 413 .Ql \&$
 414 is an ordinary character except at the end of the
 415 RE or\(dd the end of a parenthesized subexpression,
 416 and
 417 .Ql \&*
 418 is an ordinary character if it appears at the beginning of the
 419 RE or the beginning of a parenthesized subexpression
 420 (after a possible leading
 421 .Ql \&^ ) .
 422 Finally, there is one new type of atom, a
 423 .Em back reference :
 424 .Ql \e
 425 followed by a non-zero decimal digit
 426 .Em d
 427 matches the same sequence of characters
 428 matched by the
 429 .Em d Ns th
 430 parenthesized subexpression
 431 (numbering subexpressions by the positions of their opening parentheses,
 432 left to right),
 433 so that (e.g.)
 434 .Ql \e([bc]\e)\e1
 435 matches
 436 .Ql bb
 437 or
 438 .Ql cc
 439 but not
 440 .Ql bc .
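.Pp
The differences listed above can be tried out by compiling without
.Dv REG_EXTENDED .
The following C sketch is a minimal illustration; the helper name
.Fn re_basic_match
is hypothetical.
Note that each backslash of the RE is doubled inside a C string literal.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compiles pattern as an obsolete ("basic") RE.
 * Returns 1 on match, 0 on no match, -1 if the RE fails to compile.
 */
int
re_basic_match(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	/* cflags of 0 selects basic REs. */
	if (regcomp(&re, pattern, 0) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

For instance, the basic RE written in C as "^\\([bc]\\)\\1$" should match "bb"
and "cc" but not "bc", "^ab\\{1,\\}$" should match "abbb", and "^a+$" should
match only the literal two-character string "a+".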
 441 .Sh ENHANCED FEATURES
 442 When the
 443 .Dv REG_ENHANCED
 444 flag is passed to one of the
 445 .Fn regcomp
 446 variants, additional features are activated.
 447 Like the enhanced
 448 .Nm regex
 449 implementations in scripting languages such as
 450 .Xr perl 1
 451 and
 452 .Xr python 1 ,
 453 these additional features may conflict with the
 454 .St -p1003.2
 455 standards in some ways.
 456 Use this with care in situations which require portability
 457 (including to past versions of Mac OS X using the previous
 458 .Nm regex
 459 implementation).
 460 .Pp
 461 For enhanced basic REs,
 462 .Ql \&+ ,
 463 .Ql \&?
 464 and
 465 .Ql \&|
 466 remain regular characters, but
 467 .Ql \e+ ,
 468 .Ql \e?
 469 and
 470 .Ql \e|
 471 have the same special meaning as the unescaped characters do for
 472 extended REs, i.e., one or more matches, zero or one matches and alternation,
 473 respectively.
 474 For enhanced extended REs,
 475 back references are available.
 476 Additional enhanced features are listed below.
 477 .Pp
 478 Within a bracket expression, most characters lose their magic.
 479 This also applies to the additional enhanced features, which don't operate
 480 inside a bracket expression.
 481 .Ss Assertions (available for both enhanced basic and enhanced extended REs)
 482 In addition to
 483 .Ql \&^
 484 and
 485 .Ql \&$
 486 (the assertions that match the null string at the beginning and end of line,
 487 respectively), the following assertions become available:
 488 .Bl -tag -width ".Sy \eB" -offset indent
 489 .It Sy \e<
 490 Matches the null string at the beginning of a word.
 491 This is equivalent to
 492 .Ql [[:<:]] .
 493 .It Sy \e>
 494 Matches the null string at the end of a word.
 495 This is equivalent to
 496 .Ql [[:>:]] .
 497 .It Sy \eb
 498 Matches the null string at a word boundary (either the beginning or end of
 499 a word).
 500 .It Sy \eB
 501 Matches the null string where there is no word boundary.
 502 This is the opposite of
 503 .Ql \eb .
 504 .El
 505 .Ss Shortcuts (available for both enhanced basic and enhanced extended REs)
 506 The following shortcuts can be used to replace more complicated
 507 bracket expressions.
 508 .Bl -tag -width ".Sy \eD" -offset indent
 509 .It Sy \ed
 510 Matches a digit character.
 511 This is equivalent to
 512 .Ql [[:digit:]] .
 513 .It Sy \eD
 514 Matches a non-digit character.
 515 This is equivalent to
 516 .Ql [^[:digit:]] .
 517 .It Sy \es
 518 Matches a space character.
 519 This is equivalent to
 520 .Ql [[:space:]] .
 521 .It Sy \eS
 522 Matches a non-space character.
 523 This is equivalent to
 524 .Ql [^[:space:]] .
 525 .It Sy \ew
 526 Matches a word character.
 527 This is equivalent to
 528 .Ql [[:alnum:]_] .
 529 .It Sy \eW
 530 Matches a non-word character.
 531 This is equivalent to
 532 .Ql [^[:alnum:]_] .
 533 .El
 534 .Ss Literal Sequences (available for both enhanced basic and enhanced extended REs)
 535 Literals are normally just ordinary characters that are matched directly.
 536 Under enhanced mode, certain character sequences are
 537 converted to specific literals.
 538 .Bl -tag -width ".Sy \ea" -offset indent
 539 .It Sy \ea
 540 The
 541 .Dq bell
 542 character (ASCII code 7).
 543 .It Sy \ee
 544 The
 545 .Dq escape
 546 character (ASCII code 27).
 547 .It Sy \ef
 548 The
 549 .Dq form-feed
 550 character (ASCII code 12).
 551 .It Sy \en
 552 The
 553 .Dq new-line/line-feed
 554 character (ASCII code 10).
 555 .It Sy \er
 556 The
 557 .Dq carriage-return
 558 character (ASCII code 13).
 559 .It Sy \et
 560 The
 561 .Dq horizontal-tab
 562 character (ASCII code 9).
 563 .El
 564 .Pp
 565 Literals can also be specified directly, using their wide character values.
 566 Note that when matching a multibyte character string, the string's bytes
 567 are converted to wide characters before comparing.
 568 This means that a single literal wide character value may match more than
 569 one string byte, depending on the locale's wide character encoding.
 570 .Bl -tag -width ".Sy \ex{ Ns Em x.. Ns Sy \&}" -offset indent
 571 .It Sy \ex Ns Em x..
 572 An arbitrary eight-bit value.
 573 The
 574 .Em x..
 575 sequence represents zero, one or two hexadecimal digits.
 576 (Note: if
 577 .Em x..
 578 is less than two hexadecimal digits, and the character following this sequence
 579 happens to be a hexadecimal digit, use the (following) brace form to avoid
 580 confusion.)
 581 .It Sy \ex{ Ns Em x.. Ns Sy \&}
 582 An arbitrary, up to 32-bit value.
 583 The
 584 .Em x..
 585 sequence is an arbitrary sequence of hexadecimal digits that is long enough
 586 to represent the necessary value.
 587 .El
 588 .Ss Inline Literal Mode (available for both enhanced basic and enhanced extended REs)
 589 A
 590 .Ql \eQ
 591 sequence causes literal
 592 .Pq Dq quote
 593 mode to be entered,
 594 while
 595 .Ql \eE
 596 ends literal mode, and returns to normal regular expression processing.
 597 This is similar to specifying the
 598 .Dv REG_NOSPEC
 599 (or
 600 .Dv REG_LITERAL )
 601 option to
 602 .Fn regcomp ,
 603 except that rather than applying to the whole RE string, it only applies to
 604 the part between the
 605 .Ql \eQ
 606 and
 607 .Ql \eE .
 608 Note that it is not possible to have a
 609 .Ql \eE
 610 in the middle of an inline literal range, as that would terminate literal mode
 611 prematurely.
 612 .Ss Minimal Repetitions (available for enhanced extended REs only)
 613 By default, the repetition operators,
 614 .Ql \&* ,
 615 .Em bound ,
 616 .Ql \&?
 617 and
 618 .Ql \&+
 619 are
 620 .Em greedy ;
 621 they try to match as many times as possible.
 622 In enhanced mode, appending a
 623 .Ql \&?
 624 to a repetition operator makes it minimal (or
 625 .Em ungreedy ) ;
 626 it tries to match the fewest number of times (including zero times, as
 627 appropriate).
 628 .Pp
 629 For example, against the string
 630 .Ql aaa ,
 631 the RE
 632 .Ql a*
 633 would match the entire string,
 634 while
 635 .Ql a*?
 636 would match the null string at the beginning of the line
 637 (matches zero times).
 638 Likewise, against the string
 639 .Ql ababab ,
 640 the RE
 641 .Ql .*b
 642 would also match the entire string,
 643 while
 644 .Ql .*?b
 645 would only match the first two characters.
 646 .Pp
 647 The
 648 .Fn regcomp
 649 flag
 650 .Dv REG_UNGREEDY
 651 will make the regular
 652 .Pq greedy
 653 repetition operators ungreedy by default.
 654 Appending
 655 .Ql \&?
 656 makes them greedy again.
 657 .Pp
 658 Note that minimal repetitions are not specified by an official
 659 standard, so there may be differences between different implementations.
 660 In the current implementation, minimal repetitions have a high precedence,
 661 and can cause other standards requirements to be violated.
 662 For instance, on the string
 663 .Ql aaaaa ,
 664 the RE
 665 .Ql (aaa??)*
 666 will only match the first four characters, violating the rules that the longest
 667 possible match is made and the longest subexpressions are matched.
 668 Using
 669 .Ql (aaa??)*$
 670 forces the entire string to be matched.
 671 .Ss Non-capturing Parenthesized Subexpressions (available for enhanced extended REs only)
 672 Normally, the match offsets to parenthesized subexpressions are
 673 recorded in the
 674 .Fa pmatch
 675 array (that is, when
 676 .Dv REG_NOSUB
 677 is not specified, and
 678 .Fa nmatch
 679 is large enough to encompass the parenthesized subexpression in question).
 680 In enhanced mode, if the first two characters following the left parenthesis
 681 are
 682 .Ql ?: ,
 683 grouping of the remaining contents is done, but the corresponding offsets are
 684 not recorded in the
 685 .Fa pmatch
 686 array.
 687 For example, against the string
 688 .Ql fubar ,
 689 the RE
 690 .Ql (fu)(bar)
 691 would have two subexpression matches in
 692 .Fa pmatch ;
 693 the first for
 694 .Ql fu
 695 and the second for
 696 .Ql bar .
 697 But with the RE
 698 .Ql (?:fu)(bar) ,
 699 there would only be one subexpression match, that of
 700 .Ql bar .
 701 Furthermore,
 702 against the string
 703 .Ql fufubar ,
 704 the RE
 705 .Ql (?:fu)*(bar)
 706 would again match the entire string, but only
 707 .Ql bar
 708 would be recorded in
 709 .Fa pmatch .
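.Pp
One way to see the effect of a non-capturing group is to inspect the
.Va re_nsub
field after compilation.
The sketch below assumes a platform whose
.In regex.h
defines
.Dv REG_ENHANCED ;
elsewhere it simply reports that the feature is unavailable.
The function name
.Fn enhanced_group_count
is hypothetical.

```c
#include <regex.h>

/*
 * Hypothetical helper: returns the number of capturing subexpressions
 * in an enhanced extended RE, or -1 if the RE fails to compile or if
 * REG_ENHANCED is not provided by this platform's <regex.h>.
 */
int
enhanced_group_count(const char *pattern)
{
#ifdef REG_ENHANCED
	regex_t re;
	int n;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_ENHANCED) != 0)
		return (-1);
	n = (int)re.re_nsub;
	regfree(&re);
	return (n);
#else
	/* Assumption: without REG_ENHANCED the feature cannot be tested. */
	(void)pattern;
	return (-1);
#endif
}
```

Where enhanced mode is available, "(fu)(bar)" would be expected to report two
subexpressions and "(?:fu)(bar)" only one.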
 710 .Ss Inline Options (available for enhanced extended REs only)
 711 Like the inline literal mode mentioned above, other options can be switched
 712 on and off for part of a RE.
 713 .Ql (? Ns Em o.. Ns \&)
 714 will turn on the options specified in
 715 .Em o..
 716 (one or more options characters; see below), while
 717 .Ql (?- Ns Em o.. Ns \&)
 718 will turn off the specified options, and
 719 .Ql (? Ns Em o1.. Ns \&- Ns Em o2.. Ns \&)
 720 will turn on the first set of options, and turn off the second set.
 721 .Pp
 722 The available options are:
 723 .Bl -tag -width ".Sy \&U" -offset indent
 724 .It Sy \&i
 725 Turning on this option will ignore case during matching, while turning off
 726 will restore case-sensitive matching.
 727 If
 728 .Dv REG_ICASE
 729 was specified to
 730 .Fn regcomp ,
 731 this option can be used to turn that off.
 732 .It Sy \&n
 733 Turn on or off special handling of the newline character.
 734 If
 735 .Dv REG_NEWLINE
 736 was specified to
 737 .Fn regcomp ,
 738 this option can be used to turn that off.
 739 .It Sy \&U
 740 Turning on this option will make ungreedy repetitions the default, while
 741 turning off will make greedy repetitions the default.
 742 If
 743 .Dv REG_UNGREEDY
 744 was specified to
 745 .Fn regcomp ,
 746 this option can be used to turn that off.
 747 .El
 748 .Pp
 749 The scope of the option change begins immediately following the right
 750 parenthesis,
 751 and extends to the end of the enclosing subexpression (if any).
 752 Thus, for example, given the RE
 753 .Ql (fu(?i)bar)baz ,
 754 the
 755 .Ql fu
 756 portion matches case sensitively,
 757 .Ql bar
 758 matches case insensitively, and
 759 .Ql baz
 760 matches case sensitively again (since it is outside the scope of the
 761 subexpression in which the inline option was specified).
 762 .Pp
 763 The inline options syntax can be combined with the non-capturing parenthesized
 764 subexpression to limit the option scope to just that of the subexpression.
 765 Then, for example,
 766 .Ql fu(?i:bar)baz
 767 is similar to the previous example, except for the parenthesized subexpression
 768 around
 769 .Ql fu(?i)bar
 770 in the previous example.
 771 .Ss Inline Comments (available for enhanced extended REs only)
 772 The syntax
 773 .Ql (?# Ns Em comment Ns \&)
 774 can be used to embed comments within a RE.
 775 Note that
 776 .Em comment
 777 cannot contain a right parenthesis.
 778 Also note that while syntactically, option characters can be added before
 779 the
 780 .Ql \&#
 781 character, they will be ignored.
 782 .Sh SEE ALSO
 783 .Xr regex 3
 784 .Rs
 785 .%T Regular Expression Notation
 786 .%R IEEE Std
 787 .%N 1003.2
 788 .%P section 2.8
 789 .Re
 790 .Sh BUGS
 791 Having two kinds of REs is a botch.
 792 .Pp
 793 The current
 794 .St -p1003.2
 795 spec says that
 796 .Ql \&)
 797 is an ordinary character in
 798 the absence of an unmatched
 799 .Ql \&( ;
 800 this was an unintentional result of a wording error,
 801 and change is likely.
 802 Avoid relying on it.
 803 .Pp
 804 Back references are a dreadful botch,
 805 posing major problems for efficient implementations.
 806 They are also somewhat vaguely defined
 807 (does
 808 .Ql a\e(\e(b\e)*\e2\e)*d
 809 match
 810 .Ql abbbd ? ) .
 811 Avoid using them.
 812 .Pp
 813 .St -p1003.2
 814 specification of case-independent matching is vague.
 815 The
 816 .Dq one case implies all cases
 817 definition given above
 818 is the current consensus among implementors as to the right interpretation.
 819 .Pp
 820 The bracket syntax for word boundaries is incredibly ugly.