1 ------------------------------------------------------------------------------
3 -- GNAT LIBRARY COMPONENTS --
5 -- G N A T . S P I T B O L . P A T T E R N S --
9 -- Copyright (C) 1997-2005 Ada Core Technologies, Inc. --
11 -- GNAT is free software; you can redistribute it and/or modify it under --
12 -- terms of the GNU General Public License as published by the Free Soft- --
13 -- ware Foundation; either version 2, or (at your option) any later ver- --
14 -- sion. GNAT is distributed in the hope that it will be useful, but WITH- --
15 -- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY --
16 -- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License --
17 -- for more details. You should have received a copy of the GNU General --
18 -- Public License distributed with GNAT; see file COPYING. If not, write --
19 -- to the Free Software Foundation, 51 Franklin Street, Fifth Floor, --
20 -- Boston, MA 02110-1301, USA. --
22 -- As a special exception, if other files instantiate generics from this --
23 -- unit, or you link this unit with other files to produce an executable, --
24 -- this unit does not by itself cause the resulting executable to be --
25 -- covered by the GNU General Public License. This exception does not --
26 -- however invalidate any other reasons why the executable file might be --
27 -- covered by the GNU Public License. --
29 -- GNAT was originally developed by the GNAT team at New York University. --
30 -- Extensive contributions were provided by Ada Core Technologies Inc. --
32 ------------------------------------------------------------------------------
34 -- SPITBOL-like pattern construction and matching
36 -- This child package of GNAT.SPITBOL provides a complete implementation
37 -- of the SPITBOL-like pattern construction and matching operations. This
38 -- package is based on Macro-SPITBOL created by Robert Dewar.
40 ------------------------------------------------------------
41 -- Summary of Pattern Matching Packages in GNAT Hierarchy --
42 ------------------------------------------------------------
44 -- There are three related packages that perform pattern maching functions.
45 -- the following is an outline of these packages, to help you determine
46 -- which is best for your needs.
48 -- GNAT.Regexp (files g-regexp.ads/g-regexp.adb)
49 -- This is a simple package providing Unix-style regular expression
50 -- matching with the restriction that it matches entire strings. It
51 -- is particularly useful for file name matching, and in particular
52 -- it provides "globbing patterns" that are useful in implementing
53 -- unix or DOS style wild card matching for file names.
55 -- GNAT.Regpat (files g-regpat.ads/g-regpat.adb)
56 -- This is a more complete implementation of Unix-style regular
57 -- expressions, copied from the original V7 style regular expression
58 -- library written in C by Henry Spencer. It is functionally the
59 -- same as this library, and uses the same internal data structures
60 -- stored in a binary compatible manner.
62 -- GNAT.Spitbol.Patterns (files g-spipat.ads/g-spipat.adb)
63 -- This is a completely general patterm matching package based on the
64 -- pattern language of SNOBOL4, as implemented in SPITBOL. The pattern
65 -- language is modeled on context free grammars, with context sensitive
66 -- extensions that provide full (type 0) computational capabilities.
68 with Ada
.Finalization
; use Ada
.Finalization
;
69 with Ada
.Strings
.Maps
; use Ada
.Strings
.Maps
;
70 with Ada
.Text_IO
; use Ada
.Text_IO
;
72 package GNAT
.Spitbol
.Patterns
is
73 pragma Elaborate_Body
(Patterns
);
75 -------------------------------
76 -- Pattern Matching Tutorial --
77 -------------------------------
79 -- A pattern matching operation (a call to one of the Match subprograms)
80 -- takes a subject string and a pattern, and optionally a replacement
81 -- string. The replacement string option is only allowed if the subject
84 -- The pattern is matched against the subject string, and either the
85 -- match fails, or it succeeds matching a contiguous substring. If a
86 -- replacement string is specified, then the subject string is modified
87 -- by replacing the matched substring with the given replacement.
89 -- Concatenation and Alternation
90 -- =============================
92 -- A pattern consists of a series of pattern elements. The pattern is
93 -- built up using either the concatenation operator:
97 -- which means match A followed immediately by matching B, or the
98 -- alternation operator:
102 -- which means first attempt to match A, and then if that does not
105 -- There is full backtracking, which means that if a given pattern
106 -- element fails to match, then previous alternatives are matched.
107 -- For example if we have the pattern:
109 -- (A or B) & (C or D) & (E or F)
111 -- First we attempt to match A, if that succeeds, then we go on to try
112 -- to match C, and if that succeeds, we go on to try to match E. If E
113 -- fails, then we try F. If F fails, then we go back and try matching
114 -- D instead of C. Let's make this explicit using a specific example,
115 -- and introducing the simplest kind of pattern element, which is a
116 -- literal string. The meaning of this pattern element is simply to
117 -- match the characters that correspond to the string characters. Now
118 -- let's rewrite the above pattern form with specific string literals
119 -- as the pattern elements:
121 -- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
123 -- The following strings will be attempted in sequence:
134 -- Here we use the dot simply to separate the pieces of the string
135 -- matched by the three separate elements.
137 -- Moving the Start Point
138 -- ======================
140 -- A pattern is not required to match starting at the first character
141 -- of the string, and is not required to match to the end of the string.
142 -- The first attempt does indeed attempt to match starting at the first
143 -- character of the string, trying all the possible alternatives. But
144 -- if all alternatives fail, then the starting point of the match is
145 -- moved one character, and all possible alternatives are attempted at
146 -- the new anchor point.
148 -- The entire match fails only when every possible starting point has
149 -- been attempted. As an example, suppose that we had the subject
154 -- matched using the pattern in the previous example:
156 -- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
158 -- would succeed, afer two anchor point moves:
165 -- This mode of pattern matching is called the unanchored mode. It is
166 -- also possible to put the pattern matcher into anchored mode by
167 -- setting the global variable Anchored_Mode to True. This will cause
168 -- all subsequent matches to be performed in anchored mode, where the
169 -- match is required to start at the first character.
171 -- We will also see later how the effect of an anchored match can be
172 -- obtained for a single specified anchor point if this is desired.
174 -- Other Pattern Elements
175 -- ======================
177 -- In addition to strings (or single characters), there are many special
178 -- pattern elements that correspond to special predefined alternations:
180 -- Arb Matches any string. First it matches the null string, and
181 -- then on a subsequent failure, matches one character, and
182 -- then two characters, and so on. It only fails if the
183 -- entire remaining string is matched.
185 -- Bal Matches a non-empty string that is parentheses balanced
186 -- with respect to ordinary () characters. Examples of
187 -- balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E".
188 -- Bal matches the shortest possible balanced string on the
189 -- first attempt, and if there is a subsequent failure,
190 -- attempts to extend the string.
192 -- Cancel Immediately aborts the entire pattern match, signalling
193 -- failure. This is a specialized pattern element, which is
194 -- useful in conjunction with some of the special pattern
195 -- elements that have side effects.
197 -- Fail The null alternation. Matches no possible strings, so it
198 -- always signals failure. This is a specialized pattern
199 -- element, which is useful in conjunction with some of the
200 -- special pattern elements that have side effects.
202 -- Fence Matches the null string at first, and then if a failure
203 -- causes alternatives to be sought, aborts the match (like
204 -- a Cancel). Note that using Fence at the start of a pattern
205 -- has the same effect as matching in anchored mode.
207 -- Rest Matches from the current point to the last character in
208 -- the string. This is a specialized pattern element, which
209 -- is useful in conjunction with some of the special pattern
210 -- elements that have side effects.
212 -- Succeed Repeatedly matches the null string (it is equivalent to
213 -- the alternation ("" or "" or "" ....). This is a special
214 -- pattern element, which is useful in conjunction with some
215 -- of the special pattern elements that have side effects.
217 -- Pattern Construction Functions
218 -- ==============================
220 -- The following functions construct additional pattern elements
222 -- Any(S) Where S is a string, matches a single character that is
223 -- any one of the characters in S. Fails if the current
224 -- character is not one of the given set of characters.
226 -- Arbno(P) Where P is any pattern, matches any number of instances
227 -- of the pattern, starting with zero occurrences. It is
228 -- thus equivalent to ("" or (P & ("" or (P & ("" ....)))).
229 -- The pattern P may contain any number of pattern elements
230 -- including the use of alternatiion and concatenation.
232 -- Break(S) Where S is a string, matches a string of zero or more
233 -- characters up to but not including a break character
234 -- that is one of the characters given in the string S.
235 -- Can match the null string, but cannot match the last
236 -- character in the string, since a break character is
237 -- required to be present.
239 -- BreakX(S) Where S is a string, behaves exactly like Break(S) when
240 -- it first matches, but if a string is successfully matched,
241 -- then a susequent failure causes an attempt to extend the
244 -- Fence(P) Where P is a pattern, attempts to match the pattern P
245 -- including trying all possible alternatives of P. If none
246 -- of these alternatives succeeds, then the Fence pattern
247 -- fails. If one alternative succeeds, then the pattern
248 -- match proceeds, but on a subsequent failure, no attempt
249 -- is made to search for alternative matches of P. The
250 -- pattern P may contain any number of pattern elements
251 -- including the use of alternatiion and concatenation.
253 -- Len(N) Where N is a natural number, matches the given number of
254 -- characters. For example, Len(10) matches any string that
255 -- is exactly ten characters long.
257 -- NotAny(S) Where S is a string, matches a single character that is
258 -- not one of the characters of S. Fails if the current
259 -- characer is one of the given set of characters.
261 -- NSpan(S) Where S is a string, matches a string of zero or more
262 -- characters that is among the characters given in the
263 -- string. Always matches the longest possible such string.
264 -- Always succeeds, since it can match the null string.
266 -- Pos(N) Where N is a natural number, matches the null string
267 -- if exactly N characters have been matched so far, and
270 -- Rpos(N) Where N is a natural number, matches the null string
271 -- if exactly N characters remain to be matched, and
274 -- Rtab(N) Where N is a natural number, matches characters from
275 -- the current position until exactly N characters remain
276 -- to be matched in the string. Fails if fewer than N
277 -- unmatched characters remain in the string.
279 -- Tab(N) Where N is a natural number, matches characters from
280 -- the current position until exactly N characters have
281 -- been matched in all. Fails if more than N characters
282 -- have already been matched.
284 -- Span(S) Where S is a string, matches a string of one or more
285 -- characters that is among the characters given in the
286 -- string. Always matches the longest possible such string.
287 -- Fails if the current character is not one of the given
288 -- set of characters.
290 -- Recursive Pattern Matching
291 -- ==========================
293 -- The plus operator (+P) where P is a pattern variable, creates
294 -- a recursive pattern that will, at pattern matching time, follow
295 -- the pointer to obtain the referenced pattern, and then match this
296 -- pattern. This may be used to construct recursive patterns. Consider
299 -- P := ("A" or ("B" & (+P)))
301 -- On the first attempt, this pattern attempts to match the string "A".
302 -- If this fails, then the alternative matches a "B", followed by an
303 -- attempt to match P again. This second attempt first attempts to
304 -- match "A", and so on. The result is a pattern that will match a
305 -- string of B's followed by a single A.
307 -- This particular example could simply be written as NSpan('B') & 'A',
308 -- but the use of recursive patterns in the general case can construct
309 -- complex patterns which could not otherwise be built.
311 -- Pattern Assignment Operations
312 -- =============================
314 -- In addition to the overall result of a pattern match, which indicates
315 -- success or failure, it is often useful to be able to keep track of
316 -- the pieces of the subject string that are matched by individual
317 -- pattern elements, or subsections of the pattern.
319 -- The pattern assignment operators allow this capability. The first
320 -- form is the immediate assignment:
324 -- Here P is an arbitrary pattern, and S is a variable of type VString
325 -- that will be set to the substring matched by P. This assignment
326 -- happens during pattern matching, so if P matches more than once,
327 -- then the assignment happens more than once.
329 -- The deferred assignment operation:
333 -- avoids these multiple assignments by deferring the assignment to the
334 -- end of the match. If the entire match is successful, and if the
335 -- pattern P was part of the successful match, then at the end of the
336 -- matching operation the assignment to S of the string matching P is
339 -- The cursor assignment operation:
343 -- assigns the current cursor position to the natural variable N. The
344 -- cursor position is defined as the count of characters that have been
345 -- matched so far (including any start point moves).
347 -- Finally the operations * and ** may be used with values of type
348 -- Text_IO.File_Access. The effect is to do a Put_Line operation of
349 -- the matched substring. These are particularly useful in debugging
355 -- The pattern construction functions (such as Len and Any) all permit
356 -- the use of pointers to natural or string values, or functions that
357 -- return natural or string values. These forms cause the actual value
358 -- to be obtained at pattern matching time. This allows interesting
359 -- possibilities for constructing dynamic patterns as illustrated in
360 -- the examples section.
362 -- In addition the (+S) operator may be used where S is a pointer to
363 -- string or function returning string, with a similar deferred effect.
365 -- A special use of deferred matching is the construction of predicate
366 -- functions. The element (+P) where P is an access to a function that
367 -- returns a Boolean value, causes the function to be called at the
368 -- time the element is matched. If the function returns True, then the
369 -- null string is matched, if the function returns False, then failure
370 -- is signalled and previous alternatives are sought.
372 -- Deferred Replacement
373 -- ====================
375 -- The simple model given for pattern replacement (where the matched
376 -- substring is replaced by the string given as the third argument to
377 -- Match) works fine in simple cases, but this approach does not work
378 -- in the case where the expression used as the replacement string is
379 -- dependent on values set by the match.
381 -- For example, suppose we want to find an instance of a parenthesized
382 -- character, and replace the parentheses with square brackets. At first
383 -- glance it would seem that:
385 -- Match (Subject, '(' & Len (1) * Char & ')', '[' & Char & ']');
387 -- would do the trick, but that does not work, because the third
388 -- argument to Match gets evaluated too early, before the call to
389 -- Match, and before the pattern match has had a chance to set Char.
391 -- To solve this problem we provide the deferred replacement capability.
392 -- With this approach, which of course is only needed if the pattern
393 -- involved has side effects, is to do the match in two stages. The
394 -- call to Match sets a pattern result in a variable of the private
395 -- type Match_Result, and then a subsequent Replace operation uses
396 -- this Match_Result object to perform the required replacement.
398 -- Using this approach, we can now write the above operation properly
399 -- in a manner that will work:
403 -- Match (Subject, '(' & Len (1) * Char & ')', M);
404 -- Replace (M, '[' & Char & ']');
406 -- As with other Match cases, there is a function and procedure form
407 -- of this match call. A call to Replace after a failed match has no
408 -- effect. Note that Subject should not be modified between the calls.
410 -- Examples of Pattern Matching
411 -- ============================
413 -- First a simple example of the use of pattern replacement to remove
414 -- a line number from the start of a string. We assume that the line
415 -- number has the form of a string of decimal digits followed by a
416 -- period, followed by one or more spaces.
418 -- Digs : constant Pattern := Span("0123456789");
420 -- Lnum : constant Pattern := Pos(0) & Digs & '.' & Span(' ');
422 -- Now to use this pattern we simply do a match with a replacement:
424 -- Match (Line, Lnum, "");
426 -- which replaces the line number by the null string. Note that it is
427 -- also possible to use an Ada.Strings.Maps.Character_Set value as an
428 -- argument to Span and similar functions, and in particular all the
429 -- useful constants 'in Ada.Strings.Maps.Constants are available. This
430 -- means that we could define Digs as:
432 -- Digs : constant Pattern := Span(Decimal_Digit_Set);
434 -- The style we use here, of defining constant patterns and then using
435 -- them is typical. It is possible to build up patterns dynamically,
436 -- but it is usually more efficient to build them in pieces in advance
437 -- using constant declarations. Note in particular that although it is
438 -- possible to construct a pattern directly as an argument for the
439 -- Match routine, it is much more efficient to preconstruct the pattern
440 -- as we did in this example.
442 -- Now let's look at the use of pattern assignment to break a
443 -- string into sections. Suppose that the input string has two
444 -- unsigned decimal integers, separated by spaces or a comma,
445 -- with spaces allowed anywhere. Then we can isolate the two
446 -- numbers with the following pattern:
448 -- Num1, Num2 : aliased VString;
450 -- B : constant Pattern := NSpan(' ');
452 -- N : constant Pattern := Span("0123456789");
454 -- T : constant Pattern :=
455 -- NSpan(' ') & N * Num1 & Span(" ,") & N * Num2;
457 -- The match operation Match (" 124, 257 ", T) would assign the
458 -- string 124 to Num1 and the string 257 to Num2.
460 -- Now let's see how more complex elements can be built from the
461 -- set of primitive elements. The following pattern matches strings
462 -- that have the syntax of Ada 95 based literals:
464 -- Digs : constant Pattern := Span(Decimal_Digit_Set);
465 -- UDigs : constant Pattern := Digs & Arbno('_' & Digs);
467 -- Edig : constant Pattern := Span(Hexadecimal_Digit_Set);
468 -- UEdig : constant Pattern := Edig & Arbno('_' & Edig);
470 -- Bnum : constant Pattern := Udigs & '#' & UEdig & '#';
472 -- A match against Bnum will now match the desired strings, e.g.
473 -- it will match 16#123_abc#, but not a#b#. However, this pattern
474 -- is not quite complete, since it does not allow colons to replace
475 -- the pound signs. The following is more complete:
477 -- Bchar : constant Pattern := Any("#:");
478 -- Bnum : constant Pattern := Udigs & Bchar & UEdig & Bchar;
480 -- but that is still not quite right, since it allows # and : to be
481 -- mixed, and they are supposed to be used consistently. We solve
482 -- this by using a deferred match.
484 -- Temp : aliased VString;
486 -- Bnum : constant Pattern :=
487 -- Udigs & Bchar * Temp & UEdig & (+Temp)
489 -- Here the first instance of the base character is stored in Temp, and
490 -- then later in the pattern we rematch the value that was assigned.
492 -- For an example of a recursive pattern, let's define a pattern
493 -- that is like the built in Bal, but the string matched is balanced
494 -- with respect to square brackets or curly brackets.
496 -- The language for such strings might be defined in extended BNF as
498 -- ELEMENT ::= <any character other than [] or {}>
499 -- | '[' BALANCED_STRING ']'
500 -- | '{' BALANCED_STRING '}'
502 -- BALANCED_STRING ::= ELEMENT {ELEMENT}
504 -- Here we use {} to indicate zero or more occurrences of a term, as
505 -- is common practice in extended BNF. Now we can translate the above
506 -- BNF into recursive patterns as follows:
508 -- Element, Balanced_String : aliased Pattern;
512 -- Element := NotAny ("[]{}")
514 -- ('[' & (+Balanced_String) & ']')
516 -- ('{' & (+Balanced_String) & '}');
518 -- Balanced_String := Element & Arbno (Element);
520 -- Note the important use of + here to refer to a pattern not yet
521 -- defined. Note also that we use assignments precisely because we
522 -- cannot refer to as yet undeclared variables in initializations.
524 -- Now that this pattern is constructed, we can use it as though it
525 -- were a new primitive pattern element, and for example, the match:
527 -- Match ("xy[ab{cd}]", Balanced_String * Current_Output & Fail);
529 -- will generate the output:
547 -- Note that the function of the fail here is simply to force the
548 -- pattern Balanced_String to match all possible alternatives. Studying
549 -- the operation of this pattern in detail is highly instructive.
551 -- Finally we give a rather elaborate example of the use of deferred
552 -- matching. The following declarations build up a pattern which will
553 -- find the longest string of decimal digits in the subject string.
555 -- Max, Cur : VString;
558 -- function GtS return Boolean is
560 -- return Length (Cur) > Length (Max);
563 -- Digit : constant Character_Set := Decimal_Digit_Set;
565 -- Digs : constant Pattern := Span(Digit);
567 -- Find : constant Pattern :=
568 -- "" * Max & Fence & -- initialize Max to null
569 -- BreakX (Digit) & -- scan looking for digits
570 -- ((Span(Digit) * Cur & -- assign next string to Cur
571 -- (+GtS'Unrestricted_Access) & -- check size(Cur) > Size(Max)
572 -- Setcur(Loc'Access)) -- if so, save location
573 -- * Max) & -- and assign to Max
574 -- Fail; -- seek all alternatives
576 -- As we see from the comments here, complex patterns like this take
577 -- on aspects of sequential programs. In fact they are sequential
578 -- programs with general backtracking. In this pattern, we first use
579 -- a pattern assignment that matches null and assigns it to Max, so
580 -- that it is initialized for the new match. Now BreakX scans to the
581 -- next digit. Arb would do here, but BreakX will be more efficient.
582 -- Once we have found a digit, we scan out the longest string of
583 -- digits with Span, and assign it to Cur. The deferred call to GtS
584 -- tests if the string we assigned to Cur is the longest so far. If
585 -- not, then failure is signalled, and we seek alternatives (this
586 -- means that BreakX will extend and look for the next digit string).
587 -- If the call to GtS succeeds then the matched string is assigned
588 -- as the largest string so far into Max and its location is saved
589 -- in Loc. Finally Fail forces the match to fail and seek alternatives,
590 -- so that the entire string is searched.
592 -- If the pattern Find is matched against a string, the variable Max
593 -- at the end of the pattern will have the longest string of digits,
594 -- and Loc will be the starting character location of the string. For
595 -- example, Match("ab123cd4657ef23", Find) will assign "4657" to Max
596 -- and 11 to Loc (indicating that the string ends with the eleventh
597 -- character of the string).
599 -- Note: the use of Unrestricted_Access to reference GtS will not
600 -- be needed if GtS is defined at the outer level, but definitely
601 -- will be necessary if GtS is a nested function (in which case of
602 -- course the scope of the pattern Find will be restricted to this
603 -- nested scope, and this cannot be checked, i.e. use of the pattern
604 -- outside this scope is erroneous). Generally it is a good idea to
605 -- define patterns and the functions they call at the outer level
606 -- where possible, to avoid such problems.
608 -- Correspondence with Pattern Matching in SPITBOL
609 -- ===============================================
611 -- Generally the Ada syntax and names correspond closely to SPITBOL
612 -- syntax for pattern matching construction.
614 -- The basic pattern construction operators are renamed as follows:
623 -- The Ada operators were chosen so that the relative precedences of
624 -- these operators corresponds to that of the Spitbol operators, but
625 -- as always, the use of parentheses is advisable to clarify.
627 -- The pattern construction operators all have similar names except for
634 -- where we have clashes with Ada reserved names.
636 -- Ada requires the use of 'Access to refer to functions used in the
637 -- pattern match, and often the use of 'Unrestricted_Access may be
638 -- necessary to get around the scope restrictions if the functions
639 -- are not declared at the outer level.
641 -- The actual pattern matching syntax is modified in Ada as follows:
646 -- X Y = Z Match (X, Y, Z);
648 -- and pattern failure is indicated by returning a Boolean result from
649 -- the Match function (True for success, False for failure).
651 -----------------------
652 -- Type Declarations --
653 -----------------------
655 type Pattern
is private;
656 -- Type representing a pattern. This package provides a complete set of
657 -- operations for constructing patterns that can be used in the pattern
658 -- matching operations provided.
660 type Boolean_Func
is access function return Boolean;
661 -- General Boolean function type. When this type is used as a formal
662 -- parameter type in this package, it indicates a deferred predicate
663 -- pattern. The function will be called when the pattern element is
664 -- matched and failure signalled if False is returned.
666 type Natural_Func
is access function return Natural;
667 -- General Natural function type. When this type is used as a formal
668 -- parameter type in this package, it indicates a deferred pattern.
669 -- The function will be called when the pattern element is matched
670 -- to obtain the currently referenced Natural value.
672 type VString_Func
is access function return VString
;
673 -- General VString function type. When this type is used as a formal
674 -- parameter type in this package, it indicates a deferred pattern.
675 -- The function will be called when the pattern element is matched
676 -- to obtain the currently referenced string value.
678 subtype PString
is String;
679 -- This subtype is used in the remainder of the package to indicate a
680 -- formal parameter that is converted to its corresponding pattern,
681 -- i.e. a pattern that matches the characters of the string.
683 subtype PChar
is Character;
684 -- Similarly, this subtype is used in the remainder of the package to
685 -- indicate a formal parameter that is converted to its corresponding
686 -- pattern, i.e. a pattern that matches this one character.
688 subtype VString_Var
is VString
;
689 subtype Pattern_Var
is Pattern
;
690 -- These synonyms are used as formal parameter types to a function where,
691 -- if the language allowed, we would use in out parameters, but we are
692 -- not allowed to have in out parameters for functions. Instead we pass
693 -- actuals which must be variables, and with a bit of trickery in the
694 -- body, manage to interprete them properly as though they were indeed
695 -- in out parameters.
697 --------------------------------
698 -- Basic Pattern Construction --
699 --------------------------------
701 function "&" (L
: Pattern
; R
: Pattern
) return Pattern
;
702 function "&" (L
: PString
; R
: Pattern
) return Pattern
;
703 function "&" (L
: Pattern
; R
: PString
) return Pattern
;
704 function "&" (L
: PChar
; R
: Pattern
) return Pattern
;
705 function "&" (L
: Pattern
; R
: PChar
) return Pattern
;
707 -- Pattern concatenation. Matches L followed by R.
709 function "or" (L
: Pattern
; R
: Pattern
) return Pattern
;
710 function "or" (L
: PString
; R
: Pattern
) return Pattern
;
711 function "or" (L
: Pattern
; R
: PString
) return Pattern
;
712 function "or" (L
: PString
; R
: PString
) return Pattern
;
713 function "or" (L
: PChar
; R
: Pattern
) return Pattern
;
714 function "or" (L
: Pattern
; R
: PChar
) return Pattern
;
715 function "or" (L
: PChar
; R
: PChar
) return Pattern
;
716 function "or" (L
: PString
; R
: PChar
) return Pattern
;
717 function "or" (L
: PChar
; R
: PString
) return Pattern
;
718 -- Pattern alternation. Creates a pattern that will first try to match
719 -- L and then on a subsequent failure, attempts to match R instead.
721 ----------------------------------
722 -- Pattern Assignment Functions --
723 ----------------------------------
725 function "*" (P
: Pattern
; Var
: VString_Var
) return Pattern
;
726 function "*" (P
: PString
; Var
: VString_Var
) return Pattern
;
727 function "*" (P
: PChar
; Var
: VString_Var
) return Pattern
;
728 -- Matches P, and if the match succeeds, assigns the matched substring
729 -- to the given VString variable S. This assignment happens as soon as
730 -- the substring is matched, and if the pattern P1 is matched more than
731 -- once during the course of the match, then the assignment will occur
734 function "**" (P
: Pattern
; Var
: VString_Var
) return Pattern
;
735 function "**" (P
: PString
; Var
: VString_Var
) return Pattern
;
736 function "**" (P
: PChar
; Var
: VString_Var
) return Pattern
;
737 -- Like "*" above, except that the assignment happens at most once
738 -- after the entire match is completed successfully. If the match
739 -- fails, then no assignment takes place.
741 ----------------------------------
742 -- Deferred Matching Operations --
743 ----------------------------------
745 function "+" (Str
: VString_Var
) return Pattern
;
746 -- Here Str must be a VString variable. This function constructs a
747 -- pattern which at pattern matching time will access the current
748 -- value of this variable, and match against these characters.
750 function "+" (Str
: VString_Func
) return Pattern
;
751 -- Constructs a pattern which at pattern matching time calls the given
752 -- function, and then matches against the string or character value
753 -- that is returned by the call.
755 function "+" (P
: Pattern_Var
) return Pattern
;
756 -- Here P must be a Pattern variable. This function constructs a
757 -- pattern which at pattern matching time will access the current
758 -- value of this variable, and match against the pattern value.
760 function "+" (P
: Boolean_Func
) return Pattern
;
761 -- Constructs a predicate pattern function that at pattern matching time
762 -- calls the given function. If True is returned, then the pattern matches.
763 -- If False is returned, then failure is signalled.
765 --------------------------------
766 -- Pattern Building Functions --
767 --------------------------------
769 function Arb
return Pattern
;
770 -- Constructs a pattern that will match any string. On the first attempt,
771 -- the pattern matches a null string, then on each successive failure, it
772 -- matches one more character, and only fails if matching the entire rest
775 function Arbno
(P
: Pattern
) return Pattern
;
776 function Arbno
(P
: PString
) return Pattern
;
777 function Arbno
(P
: PChar
) return Pattern
;
778 -- Pattern repetition. First matches null, then on a subsequent failure
779 -- attempts to match an additional instance of the given pattern.
780 -- Equivalent to (but more efficient than) P & ("" or (P & ("" or ...
782 function Any
(Str
: String) return Pattern
;
783 function Any
(Str
: VString
) return Pattern
;
784 function Any
(Str
: Character) return Pattern
;
785 function Any
(Str
: Character_Set
) return Pattern
;
786 function Any
(Str
: access VString
) return Pattern
;
787 function Any
(Str
: VString_Func
) return Pattern
;
788 -- Constructs a pattern that matches a single character that is one of
789 -- the characters in the given argument. The pattern fails if the current
790 -- character is not in Str.
792 function Bal
return Pattern
;
793 -- Constructs a pattern that will match any non-empty string that is
794 -- parentheses balanced with respect to the normal parentheses characters.
795 -- Attempts to extend the string if a subsequent failure occurs.
797 function Break
(Str
: String) return Pattern
;
798 function Break
(Str
: VString
) return Pattern
;
799 function Break
(Str
: Character) return Pattern
;
800 function Break
(Str
: Character_Set
) return Pattern
;
801 function Break
(Str
: access VString
) return Pattern
;
802 function Break
(Str
: VString_Func
) return Pattern
;
803 -- Constructs a pattern that matches a (possibly null) string which
804 -- is immediately followed by a character in the given argument. This
805 -- character is not part of the matched string. The pattern fails if
806 -- the remaining characters to be matched do not include any of the
807 -- characters in Str.
809 function BreakX
(Str
: String) return Pattern
;
810 function BreakX
(Str
: VString
) return Pattern
;
811 function BreakX
(Str
: Character) return Pattern
;
812 function BreakX
(Str
: Character_Set
) return Pattern
;
813 function BreakX
(Str
: access VString
) return Pattern
;
814 function BreakX
(Str
: VString_Func
) return Pattern
;
815 -- Like Break, but the pattern attempts to extend on a failure to find
816 -- the next occurrence of a character in Str, and only fails when the
817 -- last such instance causes a failure.
819 function Cancel
return Pattern
;
820 -- Constructs a pattern that immediately aborts the entire match
822 function Fail
return Pattern
;
823 -- Constructs a pattern that always fails.
825 function Fence
return Pattern
;
826 -- Constructs a pattern that matches null on the first attempt, and then
827 -- causes the entire match to be aborted if a subsequent failure occurs.
829 function Fence
(P
: Pattern
) return Pattern
;
830 -- Constructs a pattern that first matches P. if P fails, then the
831 -- constructed pattern fails. If P succeeds, then the match proceeds,
832 -- but if subsequent failure occurs, alternatives in P are not sought.
833 -- The idea of Fence is that each time the pattern is matched, just
834 -- one attempt is made to match P, without trying alternatives.
836 function Len
(Count
: Natural) return Pattern
;
837 function Len
(Count
: access Natural) return Pattern
;
838 function Len
(Count
: Natural_Func
) return Pattern
;
839 -- Constructs a pattern that matches exactly the given number of
840 -- characters. The pattern fails if fewer than this number of characters
841 -- remain to be matched in the string.
843 function NotAny
(Str
: String) return Pattern
;
844 function NotAny
(Str
: VString
) return Pattern
;
845 function NotAny
(Str
: Character) return Pattern
;
846 function NotAny
(Str
: Character_Set
) return Pattern
;
847 function NotAny
(Str
: access VString
) return Pattern
;
848 function NotAny
(Str
: VString_Func
) return Pattern
;
849 -- Constructs a pattern that matches a single character that is not
850 -- one of the characters in the given argument. The pattern Fails if
851 -- the current character is in Str.
853 function NSpan
(Str
: String) return Pattern
;
854 function NSpan
(Str
: VString
) return Pattern
;
855 function NSpan
(Str
: Character) return Pattern
;
856 function NSpan
(Str
: Character_Set
) return Pattern
;
857 function NSpan
(Str
: access VString
) return Pattern
;
858 function NSpan
(Str
: VString_Func
) return Pattern
;
859 -- Constructs a pattern that matches the longest possible string
860 -- consisting entirely of characters from the given argument. The
861 -- string may be empty, so this pattern always succeeds.
863 function Pos
(Count
: Natural) return Pattern
;
864 function Pos
(Count
: access Natural) return Pattern
;
865 function Pos
(Count
: Natural_Func
) return Pattern
;
866 -- Constructs a pattern that matches the null string if exactly Count
867 -- characters have already been matched, and otherwise fails.
869 function Rest
return Pattern
;
870 -- Constructs a pattern that always succeeds, matching the remaining
871 -- unmatched characters in the pattern.
873 function Rpos
(Count
: Natural) return Pattern
;
874 function Rpos
(Count
: access Natural) return Pattern
;
875 function Rpos
(Count
: Natural_Func
) return Pattern
;
876 -- Constructs a pattern that matches the null string if exactly Count
877 -- characters remain to be matched in the string, and otherwise fails.
879 function Rtab
(Count
: Natural) return Pattern
;
880 function Rtab
(Count
: access Natural) return Pattern
;
881 function Rtab
(Count
: Natural_Func
) return Pattern
;
882 -- Constructs a pattern that matches from the current location until
883 -- exactly Count characters remain to be matched in the string. The
884 -- pattern fails if fewer than Count characters remain to be matched.
886 function Setcur
(Var
: access Natural) return Pattern
;
887 -- Constructs a pattern that matches the null string, and assigns the
888 -- current cursor position in the string. This value is the number of
889 -- characters matched so far. So it is zero at the start of the match.
891 function Span
(Str
: String) return Pattern
;
892 function Span
(Str
: VString
) return Pattern
;
893 function Span
(Str
: Character) return Pattern
;
894 function Span
(Str
: Character_Set
) return Pattern
;
895 function Span
(Str
: access VString
) return Pattern
;
896 function Span
(Str
: VString_Func
) return Pattern
;
897 -- Constructs a pattern that matches the longest possible string
898 -- consisting entirely of characters from the given argument. The
899 -- string cannot be empty , so the pattern fails if the current
900 -- character is not one of the characters in Str.
902 function Succeed
return Pattern
;
903 -- Constructs a pattern that succeeds matching null, both on the first
904 -- attempt, and on any rematch attempt, i.e. it is equivalent to an
905 -- infinite alternation of null strings.
907 function Tab
(Count
: Natural) return Pattern
;
908 function Tab
(Count
: access Natural) return Pattern
;
909 function Tab
(Count
: Natural_Func
) return Pattern
;
910 -- Constructs a pattern that from the current location until Count
911 -- characters have been matched. The pattern fails if more than Count
912 -- characters have already been matched.
914 ---------------------------------
915 -- Pattern Matching Operations --
916 ---------------------------------
918 -- The Match function performs an actual pattern matching operation.
919 -- The versions with three parameters perform a match without modifying
920 -- the subject string and return a Boolean result indicating if the
921 -- match is successful or not. The Anchor parameter is set to True to
922 -- obtain an anchored match in which the pattern is required to match
923 -- the first character of the string. In an unanchored match, which is
925 -- the default, successive attempts are made to match the given pattern
926 -- at each character of the subject string until a match succeeds, or
927 -- until all possibilities have failed.
929 -- Note that pattern assignment functions in the pattern may generate
930 -- side effects, so these functions are not necessarily pure.
932 Anchored_Mode
: Boolean := False;
933 -- This global variable can be set True to cause all subsequent pattern
934 -- matches to operate in anchored mode. In anchored mode, no attempt is
935 -- made to move the anchor point, so that if the match succeeds it must
936 -- succeed starting at the first character. Note that the effect of
937 -- anchored mode may be achieved in individual pattern matches by using
938 -- Fence or Pos(0) at the start of the pattern.
940 Pattern_Stack_Overflow
: exception;
941 -- Exception raised if internal pattern matching stack overflows. This
942 -- is typically the result of runaway pattern recursion. If there is a
943 -- genuine case of stack overflow, then either the match must be broken
944 -- down into simpler steps, or the stack limit must be reset.
946 Stack_Size
: constant Positive := 2000;
947 -- Size used for internal pattern matching stack. Increase this size if
948 -- complex patterns cause Pattern_Stack_Overflow to be raised.
950 -- Simple match functions. The subject is matched against the pattern.
951 -- Any immediate or deferred assignments or writes are executed, and
952 -- the returned value indicates whether or not the match succeeded.
956 Pat
: Pattern
) return Boolean;
960 Pat
: PString
) return Boolean;
964 Pat
: Pattern
) return Boolean;
968 Pat
: PString
) return Boolean;
970 -- Replacement functions. The subject is matched against the pattern.
971 -- Any immediate or deferred assignments or writes are executed, and
972 -- the returned value indicates whether or not the match succeeded.
973 -- If the match succeeds, then the matched part of the subject string
974 -- is replaced by the given Replace string.
977 (Subject
: VString_Var
;
979 Replace
: VString
) return Boolean;
982 (Subject
: VString_Var
;
984 Replace
: VString
) return Boolean;
987 (Subject
: VString_Var
;
989 Replace
: String) return Boolean;
992 (Subject
: VString_Var
;
994 Replace
: String) return Boolean;
996 -- Simple match procedures. The subject is matched against the pattern.
997 -- Any immediate or deferred assignments or writes are executed. No
998 -- indication of success or failure is returned.
1016 -- Replacement procedures. The subject is matched against the pattern.
1017 -- Any immediate or deferred assignments or writes are executed. No
1018 -- indication of success or failure is returned. If the match succeeds,
1019 -- then the matched part of the subject string is replaced by the given
1023 (Subject
: in out VString
;
1028 (Subject
: in out VString
;
1033 (Subject
: in out VString
;
1038 (Subject
: in out VString
;
1042 -- Deferred Replacement
1044 type Match_Result
is private;
1045 -- Type used to record result of pattern match
1047 subtype Match_Result_Var
is Match_Result
;
1048 -- This synonyms is used as a formal parameter type to a function where,
1049 -- if the language allowed, we would use an in out parameter, but we are
1050 -- not allowed to have in out parameters for functions. Instead we pass
1051 -- actuals which must be variables, and with a bit of trickery in the
1052 -- body, manage to interprete them properly as though they were indeed
1053 -- in out parameters.
1056 (Subject
: VString_Var
;
1058 Result
: Match_Result_Var
) return Boolean;
1061 (Subject
: in out VString
;
1063 Result
: out Match_Result
);
1066 (Result
: in out Match_Result
;
1068 -- Given a previous call to Match which set Result, performs a pattern
1069 -- replacement if the match was successful. Has no effect if the match
1070 -- failed. This call should immediately follow the Match call.
1072 ------------------------
1073 -- Debugging Routines --
1074 ------------------------
1076 -- Debugging pattern matching operations can often be quite complex,
1077 -- since there is no obvious way to trace the progress of the match.
1078 -- The declarations in this section provide some debugging assistance.
1080 Debug_Mode
: Boolean := False;
1081 -- This global variable can be set True to generate debugging on all
1082 -- subsequent calls to Match. The debugging output is a full trace of
1083 -- the actions of the pattern matcher, written to Standard_Output. The
1084 -- level of this information is intended to be comprehensible at the
1085 -- abstract level of this package declaration. However, note that the
1086 -- use of this switch often generates large amounts of output.
1088 function "*" (P
: Pattern
; Fil
: File_Access
) return Pattern
;
1089 function "*" (P
: PString
; Fil
: File_Access
) return Pattern
;
1090 function "*" (P
: PChar
; Fil
: File_Access
) return Pattern
;
1091 function "**" (P
: Pattern
; Fil
: File_Access
) return Pattern
;
1092 function "**" (P
: PString
; Fil
: File_Access
) return Pattern
;
1093 function "**" (P
: PChar
; Fil
: File_Access
) return Pattern
;
1094 -- These are similar to the corresponding pattern assignment operations
1095 -- except that instead of setting the value of a variable, the matched
1096 -- substring is written to the appropriate file. This can be useful in
1097 -- following the progress of a match without generating the full amount
1099 -- of information obtained by setting Debug_Mode to True.
1101 Terminal
: constant File_Access
:= Standard_Error
;
1102 Output
: constant File_Access
:= Standard_Output
;
1103 -- Two handy synonyms for use with the above pattern write operations.
1105 -- Finally we have some routines that are useful for determining what
1106 -- patterns are in use, particularly if they are constructed dynamically.
1108 function Image
(P
: Pattern
) return String;
1109 function Image
(P
: Pattern
) return VString
;
1110 -- This procedures yield strings that corresponds to the syntax needed
1111 -- to create the given pattern using the functions in this package. The
1112 -- form of this string is such that it could actually be compiled and
1113 -- evaluated to yield the required pattern except for references to
1114 -- variables and functions, which are output using one of the following
1117 -- access Natural NP(16#...#)
1118 -- access Pattern PP(16#...#)
1119 -- access VString VP(16#...#)
1121 -- Natural_Func NF(16#...#)
1122 -- VString_Func VF(16#...#)
1124 -- where 16#...# is the hex representation of the integer address that
1125 -- corresponds to the given access value
1127 procedure Dump
(P
: Pattern
);
1128 -- This procedure writes information about the pattern to Standard_Out.
1129 -- The format of this information is keyed to the internal data structures
1130 -- used to implement patterns. The information provided by Dump is thus
1131 -- more precise than that yielded by Image, but is also a bit more obscure
1132 -- (i.e. it cannot be interpreted solely in terms of this spec, you have
1133 -- to know something about the data structures).
1141 -- Pattern element, a pattern is a plex structure of PE's. This type
1142 -- is defined and sdescribed in the body of this package.
1144 type PE_Ptr
is access all PE
;
1145 -- Pattern reference. PE's use PE_Ptr values to reference other PE's
1147 type Pattern
is new Controlled
with record
1149 -- Maximum number of stack entries required for matching this
1150 -- pattern. See description of pattern history stack in body.
1153 -- Pointer to initial pattern element for pattern
1156 pragma Finalize_Storage_Only
(Pattern
);
1158 procedure Adjust
(Object
: in out Pattern
);
1159 -- Adjust routine used to copy pattern objects
1161 procedure Finalize
(Object
: in out Pattern
);
1162 -- Finalization routine used to release storage allocated for a pattern.
1164 type VString_Ptr
is access all VString
;
1166 type Match_Result
is record
1168 -- Pointer to subject string. Set to null if match failed.
1170 Start
: Natural := 1;
1171 -- Starting index position (1's origin) of matched section of
1172 -- subject string. Only valid if Var is non-null.
1174 Stop
: Natural := 0;
1175 -- Ending index position (1's origin) of matched section of
1176 -- subject string. Only valid if Var is non-null.
1180 pragma Volatile
(Match_Result
);
1181 -- This ensures that the Result parameter is passed by reference, so
1182 -- that we can play our games with the bogus Match_Result_Var parameter
1183 -- in the function case to treat it as though it were an in out parameter.
1185 end GNAT
.Spitbol
.Patterns
;