Add an UNSPEC_PROLOGUE_USE to prevent the link register from being considered dead.
[official-gcc.git] / gcc / ada / g-spipat.ads
blob2d1d91e86cedc3ac69b1a6b2eb14533e2d15ff37
1 ------------------------------------------------------------------------------
2 -- --
3 -- GNAT LIBRARY COMPONENTS --
4 -- --
5 -- G N A T . S P I T B O L . P A T T E R N S --
6 -- --
7 -- S p e c --
8 -- --
9 -- --
10 -- Copyright (C) 1997-1999 Ada Core Technologies, Inc. --
11 -- --
12 -- GNAT is free software; you can redistribute it and/or modify it under --
13 -- terms of the GNU General Public License as published by the Free Soft- --
14 -- ware Foundation; either version 2, or (at your option) any later ver- --
15 -- sion. GNAT is distributed in the hope that it will be useful, but WITH- --
16 -- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY --
17 -- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License --
18 -- for more details. You should have received a copy of the GNU General --
19 -- Public License distributed with GNAT; see file COPYING. If not, write --
20 -- to the Free Software Foundation, 59 Temple Place - Suite 330, Boston, --
21 -- MA 02111-1307, USA. --
22 -- --
23 -- As a special exception, if other files instantiate generics from this --
24 -- unit, or you link this unit with other files to produce an executable, --
25 -- this unit does not by itself cause the resulting executable to be --
26 -- covered by the GNU General Public License. This exception does not --
27 -- however invalidate any other reasons why the executable file might be --
28 -- covered by the GNU Public License. --
29 -- --
30 -- GNAT is maintained by Ada Core Technologies Inc (http://www.gnat.com). --
31 -- --
32 ------------------------------------------------------------------------------
34 -- SPITBOL-like pattern construction and matching
36 -- This child package of GNAT.SPITBOL provides a complete implementation
37 -- of the SPITBOL-like pattern construction and matching operations. This
38 -- package is based on Macro-SPITBOL created by Robert Dewar.
40 ------------------------------------------------------------
41 -- Summary of Pattern Matching Packages in GNAT Hierarchy --
42 ------------------------------------------------------------
44 -- There are three related packages that perform pattern maching functions.
45 -- the following is an outline of these packages, to help you determine
46 -- which is best for your needs.
48 -- GNAT.Regexp (files g-regexp.ads/g-regexp.adb)
49 -- This is a simple package providing Unix-style regular expression
50 -- matching with the restriction that it matches entire strings. It
51 -- is particularly useful for file name matching, and in particular
52 -- it provides "globbing patterns" that are useful in implementing
53 -- unix or DOS style wild card matching for file names.
55 -- GNAT.Regpat (files g-regpat.ads/g-regpat.adb)
56 -- This is a more complete implementation of Unix-style regular
57 -- expressions, copied from the original V7 style regular expression
58 -- library written in C by Henry Spencer. It is functionally the
59 -- same as this library, and uses the same internal data structures
60 -- stored in a binary compatible manner.
62 -- GNAT.Spitbol.Patterns (files g-spipat.ads/g-spipat.adb)
63 -- This is a completely general patterm matching package based on the
64 -- pattern language of SNOBOL4, as implemented in SPITBOL. The pattern
65 -- language is modeled on context free grammars, with context sensitive
66 -- extensions that provide full (type 0) computational capabilities.
68 with Ada.Finalization; use Ada.Finalization;
69 with Ada.Strings.Maps; use Ada.Strings.Maps;
70 with Ada.Text_IO; use Ada.Text_IO;
72 package GNAT.Spitbol.Patterns is
73 pragma Elaborate_Body (Patterns);
75 -------------------------------
76 -- Pattern Matching Tutorial --
77 -------------------------------
79 -- A pattern matching operation (a call to one of the Match subprograms)
80 -- takes a subject string and a pattern, and optionally a replacement
81 -- string. The replacement string option is only allowed if the subject
82 -- is a variable.
84 -- The pattern is matched against the subject string, and either the
85 -- match fails, or it succeeds matching a contiguous substring. If a
86 -- replacement string is specified, then the subject string is modified
87 -- by replacing the matched substring with the given replacement.
90 -- Concatenation and Alternation
91 -- =============================
93 -- A pattern consists of a series of pattern elements. The pattern is
94 -- built up using either the concatenation operator:
96 -- A & B
98 -- which means match A followed immediately by matching B, or the
99 -- alternation operator:
101 -- A or B
103 -- which means first attempt to match A, and then if that does not
104 -- succeed, match B.
106 -- There is full backtracking, which means that if a given pattern
107 -- element fails to match, then previous alternatives are matched.
108 -- For example if we have the pattern:
110 -- (A or B) & (C or D) & (E or F)
112 -- First we attempt to match A, if that succeeds, then we go on to try
113 -- to match C, and if that succeeds, we go on to try to match E. If E
114 -- fails, then we try F. If F fails, then we go back and try matching
115 -- D instead of C. Let's make this explicit using a specific example,
116 -- and introducing the simplest kind of pattern element, which is a
117 -- literal string. The meaning of this pattern element is simply to
118 -- match the characters that correspond to the string characters. Now
119 -- let's rewrite the above pattern form with specific string literals
120 -- as the pattern elements:
122 -- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
124 -- The following strings will be attempted in sequence:
126 -- ABC . DEF . GH
127 -- ABC . DEF . IJ
128 -- ABC . CDE . GH
129 -- ABC . CDE . IJ
130 -- AB . DEF . GH
131 -- AB . DEF . IJ
132 -- AB . CDE . GH
133 -- AB . CDE . IJ
135 -- Here we use the dot simply to separate the pieces of the string
136 -- matched by the three separate elements.
139 -- Moving the Start Point
140 -- ======================
142 -- A pattern is not required to match starting at the first character
143 -- of the string, and is not required to match to the end of the string.
144 -- The first attempt does indeed attempt to match starting at the first
145 -- character of the string, trying all the possible alternatives. But
146 -- if all alternatives fail, then the starting point of the match is
147 -- moved one character, and all possible alternatives are attempted at
148 -- the new anchor point.
150 -- The entire match fails only when every possible starting point has
151 -- been attempted. As an example, suppose that we had the subject
152 -- string
154 -- "ABABCDEIJKL"
156 -- matched using the pattern in the previous example:
158 -- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
160 -- would succeed, afer two anchor point moves:
162 -- "ABABCDEIJKL"
163 -- ^^^^^^^
164 -- matched
165 -- section
167 -- This mode of pattern matching is called the unanchored mode. It is
168 -- also possible to put the pattern matcher into anchored mode by
169 -- setting the global variable Anchored_Mode to True. This will cause
170 -- all subsequent matches to be performed in anchored mode, where the
171 -- match is required to start at the first character.
173 -- We will also see later how the effect of an anchored match can be
174 -- obtained for a single specified anchor point if this is desired.
177 -- Other Pattern Elements
178 -- ======================
180 -- In addition to strings (or single characters), there are many special
181 -- pattern elements that correspond to special predefined alternations:
183 -- Arb Matches any string. First it matches the null string, and
184 -- then on a subsequent failure, matches one character, and
185 -- then two characters, and so on. It only fails if the
186 -- entire remaining string is matched.
188 -- Bal Matches a non-empty string that is parentheses balanced
189 -- with respect to ordinary () characters. Examples of
190 -- balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E".
191 -- Bal matches the shortest possible balanced string on the
192 -- first attempt, and if there is a subsequent failure,
193 -- attempts to extend the string.
195 -- Cancel Immediately aborts the entire pattern match, signalling
196 -- failure. This is a specialized pattern element, which is
197 -- useful in conjunction with some of the special pattern
198 -- elements that have side effects.
200 -- Fail The null alternation. Matches no possible strings, so it
201 -- always signals failure. This is a specialized pattern
202 -- element, which is useful in conjunction with some of the
203 -- special pattern elements that have side effects.
205 -- Fence Matches the null string at first, and then if a failure
206 -- causes alternatives to be sought, aborts the match (like
207 -- a Cancel). Note that using Fence at the start of a pattern
208 -- has the same effect as matching in anchored mode.
210 -- Rest Matches from the current point to the last character in
211 -- the string. This is a specialized pattern element, which
212 -- is useful in conjunction with some of the special pattern
213 -- elements that have side effects.
215 -- Succeed Repeatedly matches the null string (it is equivalent to
216 -- the alternation ("" or "" or "" ....). This is a special
217 -- pattern element, which is useful in conjunction with some
218 -- of the special pattern elements that have side effects.
221 -- Pattern Construction Functions
222 -- ==============================
224 -- The following functions construct additional pattern elements
226 -- Any(S) Where S is a string, matches a single character that is
227 -- any one of the characters in S. Fails if the current
228 -- character is not one of the given set of characters.
230 -- Arbno(P) Where P is any pattern, matches any number of instances
231 -- of the pattern, starting with zero occurrences. It is
232 -- thus equivalent to ("" or (P & ("" or (P & ("" ....)))).
233 -- The pattern P may contain any number of pattern elements
234 -- including the use of alternatiion and concatenation.
236 -- Break(S) Where S is a string, matches a string of zero or more
237 -- characters up to but not including a break character
238 -- that is one of the characters given in the string S.
239 -- Can match the null string, but cannot match the last
240 -- character in the string, since a break character is
241 -- required to be present.
243 -- BreakX(S) Where S is a string, behaves exactly like Break(S) when
244 -- it first matches, but if a string is successfully matched,
245 -- then a susequent failure causes an attempt to extend the
246 -- matched string.
248 -- Fence(P) Where P is a pattern, attempts to match the pattern P
249 -- including trying all possible alternatives of P. If none
250 -- of these alternatives succeeds, then the Fence pattern
251 -- fails. If one alternative succeeds, then the pattern
252 -- match proceeds, but on a subsequent failure, no attempt
253 -- is made to search for alternative matches of P. The
254 -- pattern P may contain any number of pattern elements
255 -- including the use of alternatiion and concatenation.
257 -- Len(N) Where N is a natural number, matches the given number of
258 -- characters. For example, Len(10) matches any string that
259 -- is exactly ten characters long.
261 -- NotAny(S) Where S is a string, matches a single character that is
262 -- not one of the characters of S. Fails if the current
263 -- characer is one of the given set of characters.
265 -- NSpan(S) Where S is a string, matches a string of zero or more
266 -- characters that is among the characters given in the
267 -- string. Always matches the longest possible such string.
268 -- Always succeeds, since it can match the null string.
270 -- Pos(N) Where N is a natural number, matches the null string
271 -- if exactly N characters have been matched so far, and
272 -- otherwise fails.
274 -- Rpos(N) Where N is a natural number, matches the null string
275 -- if exactly N characters remain to be matched, and
276 -- otherwise fails.
278 -- Rtab(N) Where N is a natural number, matches characters from
279 -- the current position until exactly N characters remain
280 -- to be matched in the string. Fails if fewer than N
281 -- unmatched characters remain in the string.
283 -- Tab(N) Where N is a natural number, matches characters from
284 -- the current position until exactly N characters have
285 -- been matched in all. Fails if more than N characters
286 -- have already been matched.
288 -- Span(S) Where S is a string, matches a string of one or more
289 -- characters that is among the characters given in the
290 -- string. Always matches the longest possible such string.
291 -- Fails if the current character is not one of the given
292 -- set of characters.
294 -- Recursive Pattern Matching
295 -- ==========================
297 -- The plus operator (+P) where P is a pattern variable, creates
298 -- a recursive pattern that will, at pattern matching time, follow
299 -- the pointer to obtain the referenced pattern, and then match this
300 -- pattern. This may be used to construct recursive patterns. Consider
301 -- for example:
303 -- P := ("A" or ("B" & (+P)))
305 -- On the first attempt, this pattern attempts to match the string "A".
306 -- If this fails, then the alternative matches a "B", followed by an
307 -- attempt to match P again. This second attempt first attempts to
308 -- match "A", and so on. The result is a pattern that will match a
309 -- string of B's followed by a single A.
311 -- This particular example could simply be written as NSpan('B') & 'A',
312 -- but the use of recursive patterns in the general case can construct
313 -- complex patterns which could not otherwise be built.
316 -- Pattern Assignment Operations
317 -- =============================
319 -- In addition to the overall result of a pattern match, which indicates
320 -- success or failure, it is often useful to be able to keep track of
321 -- the pieces of the subject string that are matched by individual
322 -- pattern elements, or subsections of the pattern.
324 -- The pattern assignment operators allow this capability. The first
325 -- form is the immediate assignment:
327 -- P * S
329 -- Here P is an arbitrary pattern, and S is a variable of type VString
330 -- that will be set to the substring matched by P. This assignment
331 -- happens during pattern matching, so if P matches more than once,
332 -- then the assignment happens more than once.
334 -- The deferred assignment operation:
336 -- P ** S
338 -- avoids these multiple assignments by deferring the assignment to the
339 -- end of the match. If the entire match is successful, and if the
340 -- pattern P was part of the successful match, then at the end of the
341 -- matching operation the assignment to S of the string matching P is
342 -- performed.
344 -- The cursor assignment operation:
346 -- Setcur(N'Access)
348 -- assigns the current cursor position to the natural variable N. The
349 -- cursor position is defined as the count of characters that have been
350 -- matched so far (including any start point moves).
352 -- Finally the operations * and ** may be used with values of type
353 -- Text_IO.File_Access. The effect is to do a Put_Line operation of
354 -- the matched substring. These are particularly useful in debugging
355 -- pattern matches.
358 -- Deferred Matching
359 -- =================
361 -- The pattern construction functions (such as Len and Any) all permit
362 -- the use of pointers to natural or string values, or functions that
363 -- return natural or string values. These forms cause the actual value
364 -- to be obtained at pattern matching time. This allows interesting
365 -- possibilities for constructing dynamic patterns as illustrated in
366 -- the examples section.
368 -- In addition the (+S) operator may be used where S is a pointer to
369 -- string or function returning string, with a similar deferred effect.
371 -- A special use of deferred matching is the construction of predicate
372 -- functions. The element (+P) where P is an access to a function that
373 -- returns a Boolean value, causes the function to be called at the
374 -- time the element is matched. If the function returns True, then the
375 -- null string is matched, if the function returns False, then failure
376 -- is signalled and previous alternatives are sought.
378 -- Deferred Replacement
379 -- ====================
381 -- The simple model given for pattern replacement (where the matched
382 -- substring is replaced by the string given as the third argument to
383 -- Match) works fine in simple cases, but this approach does not work
384 -- in the case where the expression used as the replacement string is
385 -- dependent on values set by the match.
387 -- For example, suppose we want to find an instance of a parenthesized
388 -- character, and replace the parentheses with square brackets. At first
389 -- glance it would seem that:
391 -- Match (Subject, '(' & Len (1) * Char & ')', '[' & Char & ']');
393 -- would do the trick, but that does not work, because the third
394 -- argument to Match gets evaluated too early, before the call to
395 -- Match, and before the pattern match has had a chance to set Char.
397 -- To solve this problem we provide the deferred replacement capability.
398 -- With this approach, which of course is only needed if the pattern
399 -- involved has side effects, is to do the match in two stages. The
400 -- call to Match sets a pattern result in a variable of the private
401 -- type Match_Result, and then a subsequent Replace operation uses
402 -- this Match_Result object to perform the required replacement.
404 -- Using this approach, we can now write the above operation properly
405 -- in a manner that will work:
407 -- M : Match_Result;
408 -- ...
409 -- Match (Subject, '(' & Len (1) * Char & ')', M);
410 -- Replace (M, '[' & Char & ']');
412 -- As with other Match cases, there is a function and procedure form
413 -- of this match call. A call to Replace after a failed match has no
414 -- effect. Note that Subject should not be modified between the calls.
416 -- Examples of Pattern Matching
417 -- ============================
419 -- First a simple example of the use of pattern replacement to remove
420 -- a line number from the start of a string. We assume that the line
421 -- number has the form of a string of decimal digits followed by a
422 -- period, followed by one or more spaces.
424 -- Digs : constant Pattern := Span("0123456789");
426 -- Lnum : constant Pattern := Pos(0) & Digs & '.' & Span(' ');
428 -- Now to use this pattern we simply do a match with a replacement:
430 -- Match (Line, Lnum, "");
432 -- which replaces the line number by the null string. Note that it is
433 -- also possible to use an Ada.Strings.Maps.Character_Set value as an
434 -- argument to Span and similar functions, and in particular all the
435 -- useful constants 'in Ada.Strings.Maps.Constants are available. This
436 -- means that we could define Digs as:
438 -- Digs : constant Pattern := Span(Decimal_Digit_Set);
440 -- The style we use here, of defining constant patterns and then using
441 -- them is typical. It is possible to build up patterns dynamically,
442 -- but it is usually more efficient to build them in pieces in advance
443 -- using constant declarations. Note in particular that although it is
444 -- possible to construct a pattern directly as an argument for the
445 -- Match routine, it is much more efficient to preconstruct the pattern
446 -- as we did in this example.
448 -- Now let's look at the use of pattern assignment to break a
449 -- string into sections. Suppose that the input string has two
450 -- unsigned decimal integers, separated by spaces or a comma,
451 -- with spaces allowed anywhere. Then we can isolate the two
452 -- numbers with the following pattern:
454 -- Num1, Num2 : aliased VString;
456 -- B : constant Pattern := NSpan(' ');
458 -- N : constant Pattern := Span("0123456789");
460 -- T : constant Pattern :=
461 -- NSpan(' ') & N * Num1 & Span(" ,") & N * Num2;
463 -- The match operation Match (" 124, 257 ", T) would assign the
464 -- string 124 to Num1 and the string 257 to Num2.
466 -- Now let's see how more complex elements can be built from the
467 -- set of primitive elements. The following pattern matches strings
468 -- that have the syntax of Ada 95 based literals:
470 -- Digs : constant Pattern := Span(Decimal_Digit_Set);
471 -- UDigs : constant Pattern := Digs & Arbno('_' & Digs);
473 -- Edig : constant Pattern := Span(Hexadecimal_Digit_Set);
474 -- UEdig : constant Pattern := Edig & Arbno('_' & Edig);
476 -- Bnum : constant Pattern := Udigs & '#' & UEdig & '#';
478 -- A match against Bnum will now match the desired strings, e.g.
479 -- it will match 16#123_abc#, but not a#b#. However, this pattern
480 -- is not quite complete, since it does not allow colons to replace
481 -- the pound signs. The following is more complete:
483 -- Bchar : constant Pattern := Any("#:");
484 -- Bnum : constant Pattern := Udigs & Bchar & UEdig & Bchar;
486 -- but that is still not quite right, since it allows # and : to be
487 -- mixed, and they are supposed to be used consistently. We solve
488 -- this by using a deferred match.
490 -- Temp : aliased VString;
492 -- Bnum : constant Pattern :=
493 -- Udigs & Bchar * Temp & UEdig & (+Temp)
495 -- Here the first instance of the base character is stored in Temp, and
496 -- then later in the pattern we rematch the value that was assigned.
498 -- For an example of a recursive pattern, let's define a pattern
499 -- that is like the built in Bal, but the string matched is balanced
500 -- with respect to square brackets or curly brackets.
502 -- The language for such strings might be defined in extended BNF as
504 -- ELEMENT ::= <any character other than [] or {}>
505 -- | '[' BALANCED_STRING ']'
506 -- | '{' BALANCED_STRING '}'
508 -- BALANCED_STRING ::= ELEMENT {ELEMENT}
510 -- Here we use {} to indicate zero or more occurrences of a term, as
511 -- is common practice in extended BNF. Now we can translate the above
512 -- BNF into recursive patterns as follows:
514 -- Element, Balanced_String : aliased Pattern;
515 -- .
516 -- .
517 -- .
518 -- Element := NotAny ("[]{}")
519 -- or
520 -- ('[' & (+Balanced_String) & ']')
521 -- or
522 -- ('{' & (+Balanced_String) & '}');
524 -- Balanced_String := Element & Arbno (Element);
526 -- Note the important use of + here to refer to a pattern not yet
527 -- defined. Note also that we use assignments precisely because we
528 -- cannot refer to as yet undeclared variables in initializations.
530 -- Now that this pattern is constructed, we can use it as though it
531 -- were a new primitive pattern element, and for example, the match:
533 -- Match ("xy[ab{cd}]", Balanced_String * Current_Output & Fail);
535 -- will generate the output:
537 -- x
538 -- xy
539 -- xy[ab{cd}]
540 -- y
541 -- y[ab{cd}]
542 -- [ab{cd}]
543 -- a
544 -- ab
545 -- ab{cd}
546 -- b
547 -- b{cd}
548 -- {cd}
549 -- c
550 -- cd
551 -- d
553 -- Note that the function of the fail here is simply to force the
554 -- pattern Balanced_String to match all possible alternatives. Studying
555 -- the operation of this pattern in detail is highly instructive.
557 -- Finally we give a rather elaborate example of the use of deferred
558 -- matching. The following declarations build up a pattern which will
559 -- find the longest string of decimal digits in the subject string.
561 -- Max, Cur : VString;
562 -- Loc : Natural;
564 -- function GtS return Boolean is
565 -- begin
566 -- return Length (Cur) > Length (Max);
567 -- end GtS;
569 -- Digit : constant Character_Set := Decimal_Digit_Set;
571 -- Digs : constant Pattern := Span(Digit);
573 -- Find : constant Pattern :=
574 -- "" * Max & Fence & -- initialize Max to null
575 -- BreakX (Digit) & -- scan looking for digits
576 -- ((Span(Digit) * Cur & -- assign next string to Cur
577 -- (+GtS'Unrestricted_Access) & -- check size(Cur) > Size(Max)
578 -- Setcur(Loc'Access)) -- if so, save location
579 -- * Max) & -- and assign to Max
580 -- Fail; -- seek all alternatives
582 -- As we see from the comments here, complex patterns like this take
583 -- on aspects of sequential programs. In fact they are sequential
584 -- programs with general backtracking. In this pattern, we first use
585 -- a pattern assignment that matches null and assigns it to Max, so
586 -- that it is initialized for the new match. Now BreakX scans to the
587 -- next digit. Arb would do here, but BreakX will be more efficient.
588 -- Once we have found a digit, we scan out the longest string of
589 -- digits with Span, and assign it to Cur. The deferred call to GtS
590 -- tests if the string we assigned to Cur is the longest so far. If
591 -- not, then failure is signalled, and we seek alternatives (this
592 -- means that BreakX will extend and look for the next digit string).
593 -- If the call to GtS succeeds then the matched string is assigned
594 -- as the largest string so far into Max and its location is saved
595 -- in Loc. Finally Fail forces the match to fail and seek alternatives,
596 -- so that the entire string is searched.
598 -- If the pattern Find is matched against a string, the variable Max
599 -- at the end of the pattern will have the longest string of digits,
600 -- and Loc will be the starting character location of the string. For
601 -- example, Match("ab123cd4657ef23", Find) will assign "4657" to Max
602 -- and 11 to Loc (indicating that the string ends with the eleventh
603 -- character of the string).
605 -- Note: the use of Unrestricted_Access to reference GtS will not
606 -- be needed if GtS is defined at the outer level, but definitely
607 -- will be necessary if GtS is a nested function (in which case of
608 -- course the scope of the pattern Find will be restricted to this
609 -- nested scope, and this cannot be checked, i.e. use of the pattern
610 -- outside this scope is erroneous). Generally it is a good idea to
611 -- define patterns and the functions they call at the outer level
612 -- where possible, to avoid such problems.
615 -- Correspondence with Pattern Matching in SPITBOL
616 -- ===============================================
618 -- Generally the Ada syntax and names correspond closely to SPITBOL
619 -- syntax for pattern matching construction.
621 -- The basic pattern construction operators are renamed as follows:
623 -- Spitbol Ada
625 -- (space) &
626 -- | or
627 -- $ *
628 -- . **
630 -- The Ada operators were chosen so that the relative precedences of
631 -- these operators corresponds to that of the Spitbol operators, but
632 -- as always, the use of parentheses is advisable to clarify.
634 -- The pattern construction operators all have similar names except for
636 -- Spitbol Ada
638 -- Abort Cancel
639 -- Rem Rest
641 -- where we have clashes with Ada reserved names.
643 -- Ada requires the use of 'Access to refer to functions used in the
644 -- pattern match, and often the use of 'Unrestricted_Access may be
645 -- necessary to get around the scope restrictions if the functions
646 -- are not declared at the outer level.
648 -- The actual pattern matching syntax is modified in Ada as follows:
650 -- Spitbol Ada
652 -- X Y Match (X, Y);
653 -- X Y = Z Match (X, Y, Z);
655 -- and pattern failure is indicated by returning a Boolean result from
656 -- the Match function (True for success, False for failure).
658 -----------------------
659 -- Type Declarations --
660 -----------------------
662 type Pattern is private;
663 -- Type representing a pattern. This package provides a complete set of
664 -- operations for constructing patterns that can be used in the pattern
665 -- matching operations provided.
667 type Boolean_Func is access function return Boolean;
668 -- General Boolean function type. When this type is used as a formal
669 -- parameter type in this package, it indicates a deferred predicate
670 -- pattern. The function will be called when the pattern element is
671 -- matched and failure signalled if False is returned.
673 type Natural_Func is access function return Natural;
674 -- General Natural function type. When this type is used as a formal
675 -- parameter type in this package, it indicates a deferred pattern.
676 -- The function will be called when the pattern element is matched
677 -- to obtain the currently referenced Natural value.
679 type VString_Func is access function return VString;
680 -- General VString function type. When this type is used as a formal
681 -- parameter type in this package, it indicates a deferred pattern.
682 -- The function will be called when the pattern element is matched
683 -- to obtain the currently referenced string value.
685 subtype PString is String;
686 -- This subtype is used in the remainder of the package to indicate a
687 -- formal parameter that is converted to its corresponding pattern,
688 -- i.e. a pattern that matches the characters of the string.
690 subtype PChar is Character;
691 -- Similarly, this subtype is used in the remainder of the package to
692 -- indicate a formal parameter that is converted to its corresponding
693 -- pattern, i.e. a pattern that matches this one character.
695 subtype VString_Var is VString;
696 subtype Pattern_Var is Pattern;
697 -- These synonyms are used as formal parameter types to a function where,
698 -- if the language allowed, we would use in out parameters, but we are
699 -- not allowed to have in out parameters for functions. Instead we pass
700 -- actuals which must be variables, and with a bit of trickery in the
701 -- body, manage to interprete them properly as though they were indeed
702 -- in out parameters.
704 --------------------------------
705 -- Basic Pattern Construction --
706 --------------------------------
708 function "&" (L : Pattern; R : Pattern) return Pattern;
709 function "&" (L : PString; R : Pattern) return Pattern;
710 function "&" (L : Pattern; R : PString) return Pattern;
711 function "&" (L : PChar; R : Pattern) return Pattern;
712 function "&" (L : Pattern; R : PChar) return Pattern;
714 -- Pattern concatenation. Matches L followed by R.
716 function "or" (L : Pattern; R : Pattern) return Pattern;
717 function "or" (L : PString; R : Pattern) return Pattern;
718 function "or" (L : Pattern; R : PString) return Pattern;
719 function "or" (L : PString; R : PString) return Pattern;
720 function "or" (L : PChar; R : Pattern) return Pattern;
721 function "or" (L : Pattern; R : PChar) return Pattern;
722 function "or" (L : PChar; R : PChar) return Pattern;
723 function "or" (L : PString; R : PChar) return Pattern;
724 function "or" (L : PChar; R : PString) return Pattern;
725 -- Pattern alternation. Creates a pattern that will first try to match
726 -- L and then on a subsequent failure, attempts to match R instead.
728 ----------------------------------
729 -- Pattern Assignment Functions --
730 ----------------------------------
732 function "*" (P : Pattern; Var : VString_Var) return Pattern;
733 function "*" (P : PString; Var : VString_Var) return Pattern;
734 function "*" (P : PChar; Var : VString_Var) return Pattern;
735 -- Matches P, and if the match succeeds, assigns the matched substring
736 -- to the given VString variable S. This assignment happens as soon as
737 -- the substring is matched, and if the pattern P1 is matched more than
738 -- once during the course of the match, then the assignment will occur
739 -- more than once.
741 function "**" (P : Pattern; Var : VString_Var) return Pattern;
742 function "**" (P : PString; Var : VString_Var) return Pattern;
743 function "**" (P : PChar; Var : VString_Var) return Pattern;
744 -- Like "*" above, except that the assignment happens at most once
745 -- after the entire match is completed successfully. If the match
746 -- fails, then no assignment takes place.
748 ----------------------------------
749 -- Deferred Matching Operations --
750 ----------------------------------
752 function "+" (Str : VString_Var) return Pattern;
753 -- Here Str must be a VString variable. This function constructs a
754 -- pattern which at pattern matching time will access the current
755 -- value of this variable, and match against these characters.
757 function "+" (Str : VString_Func) return Pattern;
758 -- Constructs a pattern which at pattern matching time calls the given
759 -- function, and then matches against the string or character value
760 -- that is returned by the call.
762 function "+" (P : Pattern_Var) return Pattern;
763 -- Here P must be a Pattern variable. This function constructs a
764 -- pattern which at pattern matching time will access the current
765 -- value of this variable, and match against the pattern value.
767 function "+" (P : Boolean_Func) return Pattern;
768 -- Constructs a predicate pattern function that at pattern matching time
769 -- calls the given function. If True is returned, then the pattern matches.
770 -- If False is returned, then failure is signalled.
772 --------------------------------
773 -- Pattern Building Functions --
774 --------------------------------
776 function Arb return Pattern;
777 -- Constructs a pattern that will match any string. On the first attempt,
778 -- the pattern matches a null string, then on each successive failure, it
779 -- matches one more character, and only fails if matching the entire rest
780 -- of the string.
782 function Arbno (P : Pattern) return Pattern;
783 function Arbno (P : PString) return Pattern;
784 function Arbno (P : PChar) return Pattern;
785 -- Pattern repetition. First matches null, then on a subsequent failure
786 -- attempts to match an additional instance of the given pattern.
787 -- Equivalent to (but more efficient than) P & ("" or (P & ("" or ...
789 function Any (Str : String) return Pattern;
790 function Any (Str : VString) return Pattern;
791 function Any (Str : Character) return Pattern;
792 function Any (Str : Character_Set) return Pattern;
793 function Any (Str : access VString) return Pattern;
794 function Any (Str : VString_Func) return Pattern;
795 -- Constructs a pattern that matches a single character that is one of
796 -- the characters in the given argument. The pattern fails if the current
797 -- character is not in Str.
799 function Bal return Pattern;
800 -- Constructs a pattern that will match any non-empty string that is
801 -- parentheses balanced with respect to the normal parentheses characters.
802 -- Attempts to extend the string if a subsequent failure occurs.
804 function Break (Str : String) return Pattern;
805 function Break (Str : VString) return Pattern;
806 function Break (Str : Character) return Pattern;
807 function Break (Str : Character_Set) return Pattern;
808 function Break (Str : access VString) return Pattern;
809 function Break (Str : VString_Func) return Pattern;
810 -- Constructs a pattern that matches a (possibly null) string which
811 -- is immediately followed by a character in the given argument. This
812 -- character is not part of the matched string. The pattern fails if
813 -- the remaining characters to be matched do not include any of the
814 -- characters in Str.
816 function BreakX (Str : String) return Pattern;
817 function BreakX (Str : VString) return Pattern;
818 function BreakX (Str : Character) return Pattern;
819 function BreakX (Str : Character_Set) return Pattern;
820 function BreakX (Str : access VString) return Pattern;
821 function BreakX (Str : VString_Func) return Pattern;
822 -- Like Break, but the pattern attempts to extend on a failure to find
823 -- the next occurrence of a character in Str, and only fails when the
824 -- last such instance causes a failure.
826 function Cancel return Pattern;
827 -- Constructs a pattern that immediately aborts the entire match
829 function Fail return Pattern;
830 -- Constructs a pattern that always fails.
832 function Fence return Pattern;
833 -- Constructs a pattern that matches null on the first attempt, and then
834 -- causes the entire match to be aborted if a subsequent failure occurs.
836 function Fence (P : Pattern) return Pattern;
837 -- Constructs a pattern that first matches P. if P fails, then the
838 -- constructed pattern fails. If P succeeds, then the match proceeds,
839 -- but if subsequent failure occurs, alternatives in P are not sought.
840 -- The idea of Fence is that each time the pattern is matched, just
841 -- one attempt is made to match P, without trying alternatives.
843 function Len (Count : Natural) return Pattern;
844 function Len (Count : access Natural) return Pattern;
845 function Len (Count : Natural_Func) return Pattern;
846 -- Constructs a pattern that matches exactly the given number of
847 -- characters. The pattern fails if fewer than this number of characters
848 -- remain to be matched in the string.
850 function NotAny (Str : String) return Pattern;
851 function NotAny (Str : VString) return Pattern;
852 function NotAny (Str : Character) return Pattern;
853 function NotAny (Str : Character_Set) return Pattern;
854 function NotAny (Str : access VString) return Pattern;
855 function NotAny (Str : VString_Func) return Pattern;
856 -- Constructs a pattern that matches a single character that is not
857 -- one of the characters in the given argument. The pattern Fails if
858 -- the current character is in Str.
860 function NSpan (Str : String) return Pattern;
861 function NSpan (Str : VString) return Pattern;
862 function NSpan (Str : Character) return Pattern;
863 function NSpan (Str : Character_Set) return Pattern;
864 function NSpan (Str : access VString) return Pattern;
865 function NSpan (Str : VString_Func) return Pattern;
866 -- Constructs a pattern that matches the longest possible string
867 -- consisting entirely of characters from the given argument. The
868 -- string may be empty, so this pattern always succeeds.
870 function Pos (Count : Natural) return Pattern;
871 function Pos (Count : access Natural) return Pattern;
872 function Pos (Count : Natural_Func) return Pattern;
873 -- Constructs a pattern that matches the null string if exactly Count
874 -- characters have already been matched, and otherwise fails.
876 function Rest return Pattern;
877 -- Constructs a pattern that always succeeds, matching the remaining
878 -- unmatched characters in the pattern.
880 function Rpos (Count : Natural) return Pattern;
881 function Rpos (Count : access Natural) return Pattern;
882 function Rpos (Count : Natural_Func) return Pattern;
883 -- Constructs a pattern that matches the null string if exactly Count
884 -- characters remain to be matched in the string, and otherwise fails.
886 function Rtab (Count : Natural) return Pattern;
887 function Rtab (Count : access Natural) return Pattern;
888 function Rtab (Count : Natural_Func) return Pattern;
889 -- Constructs a pattern that matches from the current location until
890 -- exactly Count characters remain to be matched in the string. The
891 -- pattern fails if fewer than Count characters remain to be matched.
893 function Setcur (Var : access Natural) return Pattern;
894 -- Constructs a pattern that matches the null string, and assigns the
895 -- current cursor position in the string. This value is the number of
896 -- characters matched so far. So it is zero at the start of the match.
898 function Span (Str : String) return Pattern;
899 function Span (Str : VString) return Pattern;
900 function Span (Str : Character) return Pattern;
901 function Span (Str : Character_Set) return Pattern;
902 function Span (Str : access VString) return Pattern;
903 function Span (Str : VString_Func) return Pattern;
904 -- Constructs a pattern that matches the longest possible string
905 -- consisting entirely of characters from the given argument. The
906 -- string cannot be empty , so the pattern fails if the current
907 -- character is not one of the characters in Str.
909 function Succeed return Pattern;
910 -- Constructs a pattern that succeeds matching null, both on the first
911 -- attempt, and on any rematch attempt, i.e. it is equivalent to an
912 -- infinite alternation of null strings.
914 function Tab (Count : Natural) return Pattern;
915 function Tab (Count : access Natural) return Pattern;
916 function Tab (Count : Natural_Func) return Pattern;
917 -- Constructs a pattern that from the current location until Count
918 -- characters have been matched. The pattern fails if more than Count
919 -- characters have already been matched.
921 ---------------------------------
922 -- Pattern Matching Operations --
923 ---------------------------------
925 -- The Match function performs an actual pattern matching operation.
926 -- The versions with three parameters perform a match without modifying
927 -- the subject string and return a Boolean result indicating if the
928 -- match is successful or not. The Anchor parameter is set to True to
929 -- obtain an anchored match in which the pattern is required to match
930 -- the first character of the string. In an unanchored match, which is
932 -- the default, successive attempts are made to match the given pattern
933 -- at each character of the subject string until a match succeeds, or
934 -- until all possibilities have failed.
936 -- Note that pattern assignment functions in the pattern may generate
937 -- side effects, so these functions are not necessarily pure.
939 Anchored_Mode : Boolean := False;
940 -- This global variable can be set True to cause all subsequent pattern
941 -- matches to operate in anchored mode. In anchored mode, no attempt is
942 -- made to move the anchor point, so that if the match succeeds it must
943 -- succeed starting at the first character. Note that the effect of
944 -- anchored mode may be achieved in individual pattern matches by using
945 -- Fence or Pos(0) at the start of the pattern.
947 Pattern_Stack_Overflow : exception;
948 -- Exception raised if internal pattern matching stack overflows. This
949 -- is typically the result of runaway pattern recursion. If there is a
950 -- genuine case of stack overflow, then either the match must be broken
951 -- down into simpler steps, or the stack limit must be reset.
953 Stack_Size : constant Positive := 2000;
954 -- Size used for internal pattern matching stack. Increase this size if
955 -- complex patterns cause Pattern_Stack_Overflow to be raised.
957 -- Simple match functions. The subject is matched against the pattern.
958 -- Any immediate or deferred assignments or writes are executed, and
959 -- the returned value indicates whether or not the match succeeded.
961 function Match
962 (Subject : VString;
963 Pat : Pattern)
964 return Boolean;
966 function Match
967 (Subject : VString;
968 Pat : PString)
969 return Boolean;
971 function Match
972 (Subject : String;
973 Pat : Pattern)
974 return Boolean;
976 function Match
977 (Subject : String;
978 Pat : PString)
979 return Boolean;
981 -- Replacement functions. The subject is matched against the pattern.
982 -- Any immediate or deferred assignments or writes are executed, and
983 -- the returned value indicates whether or not the match succeeded.
984 -- If the match succeeds, then the matched part of the subject string
985 -- is replaced by the given Replace string.
987 function Match
988 (Subject : VString_Var;
989 Pat : Pattern;
990 Replace : VString)
991 return Boolean;
993 function Match
994 (Subject : VString_Var;
995 Pat : PString;
996 Replace : VString)
997 return Boolean;
999 function Match
1000 (Subject : VString_Var;
1001 Pat : Pattern;
1002 Replace : String)
1003 return Boolean;
1005 function Match
1006 (Subject : VString_Var;
1007 Pat : PString;
1008 Replace : String)
1009 return Boolean;
1011 -- Simple match procedures. The subject is matched against the pattern.
1012 -- Any immediate or deferred assignments or writes are executed. No
1013 -- indication of success or failure is returned.
1015 procedure Match
1016 (Subject : VString;
1017 Pat : Pattern);
1019 procedure Match
1020 (Subject : VString;
1021 Pat : PString);
1023 procedure Match
1024 (Subject : String;
1025 Pat : Pattern);
1027 procedure Match
1028 (Subject : String;
1029 Pat : PString);
1031 -- Replacement procedures. The subject is matched against the pattern.
1032 -- Any immediate or deferred assignments or writes are executed. No
1033 -- indication of success or failure is returned. If the match succeeds,
1034 -- then the matched part of the subject string is replaced by the given
1035 -- Replace string.
1037 procedure Match
1038 (Subject : in out VString;
1039 Pat : Pattern;
1040 Replace : VString);
1042 procedure Match
1043 (Subject : in out VString;
1044 Pat : PString;
1045 Replace : VString);
1047 procedure Match
1048 (Subject : in out VString;
1049 Pat : Pattern;
1050 Replace : String);
1052 procedure Match
1053 (Subject : in out VString;
1054 Pat : PString;
1055 Replace : String);
1057 -- Deferred Replacement
1059 type Match_Result is private;
1060 -- Type used to record result of pattern match
1062 subtype Match_Result_Var is Match_Result;
1063 -- This synonyms is used as a formal parameter type to a function where,
1064 -- if the language allowed, we would use an in out parameter, but we are
1065 -- not allowed to have in out parameters for functions. Instead we pass
1066 -- actuals which must be variables, and with a bit of trickery in the
1067 -- body, manage to interprete them properly as though they were indeed
1068 -- in out parameters.
1070 function Match
1071 (Subject : VString_Var;
1072 Pat : Pattern;
1073 Result : Match_Result_Var)
1074 return Boolean;
1076 procedure Match
1077 (Subject : in out VString;
1078 Pat : Pattern;
1079 Result : out Match_Result);
1081 procedure Replace
1082 (Result : in out Match_Result;
1083 Replace : VString);
1084 -- Given a previous call to Match which set Result, performs a pattern
1085 -- replacement if the match was successful. Has no effect if the match
1086 -- failed. This call should immediately follow the Match call.
1088 ------------------------
1089 -- Debugging Routines --
1090 ------------------------
1092 -- Debugging pattern matching operations can often be quite complex,
1093 -- since there is no obvious way to trace the progress of the match.
1094 -- The declarations in this section provide some debugging assistance.
1096 Debug_Mode : Boolean := False;
1097 -- This global variable can be set True to generate debugging on all
1098 -- subsequent calls to Match. The debugging output is a full trace of
1099 -- the actions of the pattern matcher, written to Standard_Output. The
1100 -- level of this information is intended to be comprehensible at the
1101 -- abstract level of this package declaration. However, note that the
1102 -- use of this switch often generates large amounts of output.
1104 function "*" (P : Pattern; Fil : File_Access) return Pattern;
1105 function "*" (P : PString; Fil : File_Access) return Pattern;
1106 function "*" (P : PChar; Fil : File_Access) return Pattern;
1107 function "**" (P : Pattern; Fil : File_Access) return Pattern;
1108 function "**" (P : PString; Fil : File_Access) return Pattern;
1109 function "**" (P : PChar; Fil : File_Access) return Pattern;
1110 -- These are similar to the corresponding pattern assignment operations
1111 -- except that instead of setting the value of a variable, the matched
1112 -- substring is written to the appropriate file. This can be useful in
1113 -- following the progress of a match without generating the full amount
1115 -- of information obtained by setting Debug_Mode to True.
1117 Terminal : constant File_Access := Standard_Error;
1118 Output : constant File_Access := Standard_Output;
1119 -- Two handy synonyms for use with the above pattern write operations.
1121 -- Finally we have some routines that are useful for determining what
1122 -- patterns are in use, particularly if they are constructed dynamically.
1124 function Image (P : Pattern) return String;
1125 function Image (P : Pattern) return VString;
1126 -- This procedures yield strings that corresponds to the syntax needed
1127 -- to create the given pattern using the functions in this package. The
1128 -- form of this string is such that it could actually be compiled and
1129 -- evaluated to yield the required pattern except for references to
1130 -- variables and functions, which are output using one of the following
1131 -- forms:
1133 -- access Natural NP(16#...#)
1134 -- access Pattern PP(16#...#)
1135 -- access VString VP(16#...#)
1137 -- Natural_Func NF(16#...#)
1138 -- VString_Func VF(16#...#)
1140 -- where 16#...# is the hex representation of the integer address that
1141 -- corresponds to the given access value
1143 procedure Dump (P : Pattern);
1144 -- This procedure writes information about the pattern to Standard_Out.
1145 -- The format of this information is keyed to the internal data structures
1146 -- used to implement patterns. The information provided by Dump is thus
1147 -- more precise than that yielded by Image, but is also a bit more obscure
1148 -- (i.e. it cannot be interpreted solely in terms of this spec, you have
1149 -- to know something about the data structures).
1151 ------------------
1152 -- Private Part --
1153 ------------------
1155 private
1156 type PE;
1157 -- Pattern element, a pattern is a plex structure of PE's. This type
1158 -- is defined and sdescribed in the body of this package.
1160 type PE_Ptr is access all PE;
1161 -- Pattern reference. PE's use PE_Ptr values to reference other PE's
1163 type Pattern is new Controlled with record
1165 Stk : Natural;
1166 -- Maximum number of stack entries required for matching this
1167 -- pattern. See description of pattern history stack in body.
1169 P : PE_Ptr;
1170 -- Pointer to initial pattern element for pattern
1172 end record;
1174 pragma Finalize_Storage_Only (Pattern);
1176 procedure Adjust (Object : in out Pattern);
1177 -- Adjust routine used to copy pattern objects
1179 procedure Finalize (Object : in out Pattern);
1180 -- Finalization routine used to release storage allocated for a pattern.
1182 type VString_Ptr is access all VString;
1184 type Match_Result is record
1185 Var : VString_Ptr;
1186 -- Pointer to subject string. Set to null if match failed.
1188 Start : Natural;
1189 -- Starting index position (1's origin) of matched section of
1190 -- subject string. Only valid if Var is non-null.
1192 Stop : Natural;
1193 -- Ending index position (1's origin) of matched section of
1194 -- subject string. Only valid if Var is non-null.
1196 end record;
1198 pragma Volatile (Match_Result);
1199 -- This ensures that the Result parameter is passed by reference, so
1200 -- that we can play our games with the bogus Match_Result_Var parameter
1201 -- in the function case to treat it as though it were an in out parameter.
1203 end GNAT.Spitbol.Patterns;