------------------------------------------------------------------------------
--                                                                          --
--                         GNAT COMPILER COMPONENTS                         --
--                                                                          --
--                             G N A T . A W K                              --
--                                                                          --
--                                 S p e c                                  --
--                                                                          --
--                            $Revision$
--                                                                          --
--           Copyright (C) 2000-2001 Ada Core Technologies, Inc.            --
--                                                                          --
-- GNAT is free software; you can redistribute it and/or modify it under    --
-- terms of the GNU General Public License as published by the Free Soft-   --
-- ware Foundation; either version 2, or (at your option) any later ver-    --
-- sion. GNAT is distributed in the hope that it will be useful, but WITH-  --
-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY   --
-- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License  --
-- for more details. You should have received a copy of the GNU General     --
-- Public License distributed with GNAT; see file COPYING. If not, write    --
-- to the Free Software Foundation, 59 Temple Place - Suite 330, Boston,    --
-- MA 02111-1307, USA.                                                      --
--                                                                          --
-- As a special exception, if other files instantiate generics from this    --
-- unit, or you link this unit with other files to produce an executable,   --
-- this unit does not by itself cause the resulting executable to be        --
-- covered by the GNU General Public License. This exception does not       --
-- however invalidate any other reasons why the executable file might be    --
-- covered by the GNU Public License.                                       --
--                                                                          --
-- GNAT is maintained by Ada Core Technologies Inc (http://www.gnat.com).   --
--                                                                          --
------------------------------------------------------------------------------

--  This is an AWK-like unit. It provides an easy interface for parsing one
--  or more files containing formatted data. The file can be viewed as
--  a database where each record is a line and a field is a data element in
--  this line. In this implementation an AWK record is a line. This means
--  that a record cannot span multiple lines. The operating procedure is to
--  read files line by line, with each line being presented to the user of
--  the package. The interface provides services to access specific fields
--  in the line. Thus it is possible to control actions taken on a line based
--  on values of some fields. This can be achieved directly or by registering
--  callbacks triggered on programmed conditions.

--  The state of an AWK run is recorded in an object of type Session_Type.
--  The following is the procedure for using a session to control an
--  AWK run:

--     1) Specify which session is to be used. It is possible to use the
--        default session or to create a new one by declaring an object of
--        type Session_Type. For example:

--           Computers : Session_Type;

--     2) Specify how to cut a line into fields. There are two modes: using
--        character field separators or column widths. This is done by using
--        Set_Field_Separators or Set_Field_Widths. For example:

--           AWK.Set_Field_Separators (";,", Computers);

--        or by using the iterators' Separators parameter.

--     3) Specify which files to parse. This is done with Add_File/Add_Files
--        services, or by using the iterators' Filename parameter. For
--        example:

--           AWK.Add_File ("myfile.db", Computers);

--     4) Run the AWK session using one of the provided iterators.

--           Parse
--              This is the most automated iterator. You can gain control of
--              the session only by registering one or more callbacks (see
--              Register).

--           Get_Line/End_Of_Data
--              This is a manual iterator to be used with a loop. You have
--              complete control of the session. You can use callbacks but
--              this is not required.

--           For_Every_Line
--              This provides a mixture of manual/automated iterator action.

--           Examples of these three approaches appear below.

--  There are many ways to use this package. The following discussion shows
--  three approaches, one for each of the three iterator forms. All examples
--  use the following file (computer.db):

--     Pluton;Windows-NT;Pentium III
--     Mars;Linux;Pentium Pro
--     Venus;Solaris;Sparc
--     Saturn;OS/2;i486
--     Jupiter;MacOS;PPC

--  1) Using Parse iterator

--     Here the first step is to register some action associated with a pattern
--     and then to call the Parse iterator (this is the simplest way to use
--     this unit). The default session is used here. For example, to output
--     the second field (the OS) of computer "Saturn":

--           procedure Action is
--           begin
--              Put_Line (AWK.Field (2));
--           end Action;

--        begin
--           AWK.Register (1, "Saturn", Action'Access);
--           AWK.Parse (";", "computer.db");

--  2) Using the Get_Line/End_Of_Data iterator

--     Here you have full control. For example, to do the same as
--     above but using a specific session, you could write:

--           Computer_File : Session_Type;

--        begin
--           AWK.Set_Current (Computer_File);
--           AWK.Open (Separators => ";",
--                     Filename   => "computer.db");

--           --  Display Saturn OS

--           while not AWK.End_Of_Data loop
--              AWK.Get_Line;

--              if AWK.Field (1) = "Saturn" then
--                 Put_Line (AWK.Field (2));
--              end if;
--           end loop;

--           AWK.Close (Computer_File);

--  3) Using For_Every_Line iterator

--     In this case you use a provided iterator and you pass the procedure
--     that must be called for each record. The previous example could be
--     coded as follows (using the iterator quick interface but without
--     using the current session):

--           Computer_File : Session_Type;

--           procedure Action (Quit : in out Boolean) is
--           begin
--              if AWK.Field (1, Computer_File) = "Saturn" then
--                 Put_Line (AWK.Field (2, Computer_File));
--              end if;
--           end Action;

--           procedure Look_For_Saturn is
--              new AWK.For_Every_Line (Action);

--        begin
--           Look_For_Saturn (Separators => ";",
--                            Filename   => "computer.db",
--                            Session    => Computer_File);

--           Integer_Text_IO.Put
--             (Integer (AWK.NR (Session => Computer_File)));
--           Put_Line (" line(s) have been processed.");

--  You can also use a regular expression for the pattern. Let us output
--  the computer name for all computers for which the OS has a character
--  O in its name.

--           Regexp : String := ".*O.*";

--           Matcher : Regpat.Pattern_Matcher := Regpat.Compile (Regexp);

--           procedure Action is
--           begin
--              --  Field 1 is the computer name; field 2 (the OS) is the
--              --  field matched against Matcher below.
--              Text_IO.Put_Line (AWK.Field (1));
--           end Action;

--        begin
--           AWK.Register (2, Matcher, Action'Unrestricted_Access);
--           AWK.Parse (";", "computer.db");

with Ada.Finalization;
with GNAT.Regpat;

package GNAT.AWK is

   Session_Error : exception;
   --  Raised when a Session is reused but is not closed.

   File_Error : exception;
   --  Raised when there is a file problem (see below).

   End_Error : exception;
   --  Raised when an attempt is made to read beyond the end of the last
   --  file of a session.

   Field_Error : exception;
   --  Raised when accessing a field value which does not exist.

   Data_Error : exception;
   --  Raised when it is not possible to convert a field value to a specific
   --  type.

   type Count is new Natural;

   type Widths_Set is array (Positive range <>) of Positive;
   --  Used to store a set of column widths.

   Default_Separators : constant String := " " & ASCII.HT;

   Use_Current : constant String := "";
   --  Value used when no separator or filename is specified in iterators.

   type Session_Type is limited private;
   --  This is the main exported type. A session is used to keep the state of
   --  a full AWK run. The state comprises a list of files, the current file,
   --  the number of lines processed, the current line, the number of fields in
   --  the current line... A default session is provided (see Set_Current,
   --  Current_Session and Default_Session below).

   ----------------------------
   -- Package initialization --
   ----------------------------

   --  The default session cannot be used in a thread-safe way. Each task
   --  must use its own session and specify it explicitly for every service.

   procedure Set_Current (Session : Session_Type);
   --  Set the session to be used by default. This session will be used when
   --  the Session parameter in following services is not specified.

   function Current_Session return Session_Type;
   --  Returns the session used by default by all services. This is the
   --  latest session specified by Set_Current service or the session
   --  provided by default with this implementation.

   function Default_Session return Session_Type;
   --  Returns the default session provided by this package. Note that this is
   --  the session returned by Current_Session if Set_Current has not been
   --  used.

   procedure Set_Field_Separators
     (Separators : String       := Default_Separators;
      Session    : Session_Type := Current_Session);
   --  Set the field separators. Each character in the string is a field
   --  separator. When a line is read it will be split by field using the
   --  separators set here. Separators can be changed at any point and in this
   --  case the current line is split according to the new separators. In the
   --  special case that Separators is a space and a tabulation
   --  (Default_Separators), fields are separated by runs of spaces and/or
   --  tabs.

   procedure Set_FS
     (Separators : String       := Default_Separators;
      Session    : Session_Type := Current_Session)
     renames Set_Field_Separators;
   --  FS is the AWK abbreviation for above service.

   procedure Set_Field_Widths
     (Field_Widths : Widths_Set;
      Session      : Session_Type := Current_Session);
   --  This is another way to split a line by giving the length (in number of
   --  characters) of each field in a line. Field widths can be changed at any
   --  point and in this case the current line is split according to the new
   --  field lengths. A line split with this method must have a length greater
   --  than or equal to the total of the field widths. All characters
   --  remaining on the line after the last field are added to a new
   --  automatically created field.
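
   --  A minimal sketch of the call, not taken from the examples above. It
   --  assumes a hypothetical fixed-width file "ledger.txt" and a session
   --  declared as Ledger : Session_Type. Three columns of 8, 10 and 4
   --  characters could be described with a positional array aggregate:

   --     AWK.Set_Field_Widths ((8, 10, 4), Ledger);
   --     AWK.Add_File ("ledger.txt", Ledger);

   --  Following the description above, each line would need at least 22
   --  characters, and anything after column 22 would be placed in an
   --  additional field 4.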

   procedure Add_File
     (Filename : String;
      Session  : Session_Type := Current_Session);
   --  Add Filename to the list of files to be processed. There is no limit on
   --  the number of files that can be added. Files are processed in the order
   --  they have been added (i.e. the filename list is FIFO). If Filename does
   --  not exist or if it is not readable, File_Error is raised.

   procedure Add_Files
     (Directory             : String;
      Filenames             : String;
      Number_Of_Files_Added : out Natural;
      Session               : Session_Type := Current_Session);
   --  Add all files matching the regular expression Filenames in the specified
   --  directory to the list of files to be processed. There is no limit on
   --  the number of files that can be added. Files are processed in the
   --  order they have been added (i.e. the filename list is FIFO).
   --  The number of files (possibly 0) added is returned in
   --  Number_Of_Files_Added.

   -------------------------------------
   -- Information about current state --
   -------------------------------------

   function Number_Of_Fields
     (Session : Session_Type := Current_Session)
      return Count;
   pragma Inline (Number_Of_Fields);
   --  Returns the number of fields in the current record. It returns 0 when
   --  no file is being processed.

   function NF
     (Session : Session_Type := Current_Session)
      return Count
      renames Number_Of_Fields;
   --  AWK abbreviation for above service.

   function Number_Of_File_Lines
     (Session : Session_Type := Current_Session)
      return Count;
   pragma Inline (Number_Of_File_Lines);
   --  Returns the current line number in the processed file. It returns 0 when
   --  no file is being processed.

   function FNR
     (Session : Session_Type := Current_Session)
      return Count renames Number_Of_File_Lines;
   --  AWK abbreviation for above service.

   function Number_Of_Lines
     (Session : Session_Type := Current_Session)
      return Count;
   pragma Inline (Number_Of_Lines);
   --  Returns the number of lines processed until now. This is equal to the
   --  number of lines in each already processed file plus FNR. It returns 0
   --  when no file is being processed.

   function NR
     (Session : Session_Type := Current_Session)
      return Count
      renames Number_Of_Lines;
   --  AWK abbreviation for above service.

   function Number_Of_Files
     (Session : Session_Type := Current_Session)
      return Natural;
   pragma Inline (Number_Of_Files);
   --  Returns the number of files associated with Session. This is the total
   --  number of files added with Add_File and Add_Files services.

   function File
     (Session : Session_Type := Current_Session)
      return String;
   --  Returns the name of the file being processed. It returns the empty
   --  string when no file is being processed.

   ---------------------
   -- Field accessors --
   ---------------------

   function Field
     (Rank    : Count;
      Session : Session_Type := Current_Session)
      return String;
   --  Returns field number Rank value of the current record. If Rank = 0 it
   --  returns the current record (i.e. the line as read in the file). It
   --  raises Field_Error if Rank > NF or if Session is not open.

   function Field
     (Rank    : Count;
      Session : Session_Type := Current_Session)
      return Integer;
   --  Returns field number Rank value of the current record as an integer. It
   --  raises Field_Error if Rank > NF or if Session is not open. It
   --  raises Data_Error if the field value cannot be converted to an integer.

   function Field
     (Rank    : Count;
      Session : Session_Type := Current_Session)
      return Float;
   --  Returns field number Rank value of the current record as a float. It
   --  raises Field_Error if Rank > NF or if Session is not open. It
   --  raises Data_Error if the field value cannot be converted to a float.

   generic
      type Discrete is (<>);
   function Discrete_Field
     (Rank    : Count;
      Session : Session_Type := Current_Session)
      return Discrete;
   --  Returns field number Rank value of the current record as a type
   --  Discrete. It raises Field_Error if Rank > NF. It raises Data_Error if
   --  the field value cannot be converted to type Discrete.
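
   --  A minimal sketch of an instantiation. OS_Kind and OS_Field are
   --  hypothetical names introduced for illustration only. The enumeration
   --  literals are assumed to match the text of the field:

   --     type OS_Kind is (Linux, Solaris, MacOS);

   --     function OS_Field is new AWK.Discrete_Field (OS_Kind);

   --     ...
   --     OS : constant OS_Kind := OS_Field (2);

   --  With the computer.db file shown at the top of this unit, values such
   --  as "Windows-NT" or "OS/2" are not literals of OS_Kind, so according
   --  to the description above those lines would be expected to raise
   --  Data_Error.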

   --------------------
   -- Pattern/Action --
   --------------------

   --  AWK defines rules like "PATTERN { ACTION }", which means that ACTION
   --  will be executed if PATTERN matches. A pattern in this implementation
   --  can be a simple string (match function is equality), a regular
   --  expression, or a function returning a boolean. An action is associated
   --  with a pattern using the Register services.

   --  Each procedure Register will add a rule to the set of rules for the
   --  session. Rules are examined in the order they have been added.

   type Pattern_Callback is access function return Boolean;
   --  This is a pattern function pointer. When it returns True the associated
   --  action will be called.

   type Action_Callback is access procedure;
   --  A simple action pointer

   type Match_Action_Callback is
     access procedure (Matches : GNAT.Regpat.Match_Array);
   --  An advanced action pointer used with a regular expression pattern. It
   --  receives an array of all the matches. See GNAT.Regpat for further
   --  information.

   procedure Register
     (Field   : Count;
      Pattern : String;
      Action  : Action_Callback;
      Session : Session_Type := Current_Session);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  simple string that must match exactly the field number specified.

   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Action_Callback;
      Session : Session_Type := Current_Session);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  simple regular expression which must match the field number specified.

   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Match_Action_Callback;
      Session : Session_Type := Current_Session);
   --  Same as above but it passes the set of matches to the action
   --  procedure. This is useful to analyse further why and where a regular
   --  expression did match.

   procedure Register
     (Pattern : Pattern_Callback;
      Action  : Action_Callback;
      Session : Session_Type := Current_Session);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  function that must return a boolean. Action callback will be called if
   --  the pattern callback returns True and nothing will happen if it is
   --  False. This version is more general; the Register services above
   --  trigger an action based on the value of a single field only.
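
   --  A minimal sketch of a pattern function. Is_Pentium and Report are
   --  hypothetical names, and both are assumed to be library-level
   --  subprograms so that 'Access is legal. The function tests whether the
   --  third field of the current record starts with "Pentium":

   --     function Is_Pentium return Boolean is
   --        CPU : constant String := AWK.Field (3);
   --     begin
   --        return CPU'Length >= 7
   --          and then CPU (CPU'First .. CPU'First + 6) = "Pentium";
   --     end Is_Pentium;

   --     ...
   --     AWK.Register (Is_Pentium'Access, Report'Access);

   --  Note that Field raises Field_Error when a record has fewer than
   --  three fields, so a real pattern function would probably check AWK.NF
   --  first.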

   procedure Register
     (Action  : Action_Callback;
      Session : Session_Type := Current_Session);
   --  Register an Action that will be called for every line. This is
   --  equivalent to a Pattern_Callback function always returning True.

   --------------------
   -- Parse iterator --
   --------------------

   procedure Parse
     (Separators : String       := Use_Current;
      Filename   : String       := Use_Current;
      Session    : Session_Type := Current_Session);
   --  Launch the iterator; it will read every line in all of the
   --  session's files. Registered callbacks are then called if the associated
   --  pattern matches. It is possible to specify a filename and a set of
   --  separators directly. This offers a quick way to parse a single
   --  file. These parameters will override those specified by Set_FS and
   --  Add_File. The Session will be opened and closed automatically.
   --  File_Error is raised if there is no file associated with Session, or if
   --  a file associated with Session is no longer readable. It raises
   --  Session_Error if Session is already open.

   -----------------------------------
   -- Get_Line/End_Of_Data Iterator --
   -----------------------------------

   type Callback_Mode is (None, Only, Pass_Through);
   --  These modes are used for Get_Line/End_Of_Data and For_Every_Line
   --  iterators. The associated semantics are:

   --    None
   --       callbacks are not active. This is the default mode for
   --       Get_Line/End_Of_Data and For_Every_Line iterators.

   --    Only
   --       callbacks are active; if at least one pattern matches, the
   --       associated action is called and this line will not be passed to
   --       the user. In the Get_Line case the next line will be read (if
   --       there is some line remaining); in the For_Every_Line case Action
   --       will not be called for this line.

   --    Pass_Through
   --       callbacks are active; for patterns which match, the associated
   --       action is called. Then the line is passed to the user. It means
   --       that Action procedure is called in the For_Every_Line case and
   --       that Get_Line returns with the current line active.
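
   --  A minimal sketch of a manual loop that also fires the registered
   --  callbacks, using the services declared below. Computer_File is
   --  assumed to be declared as Computer_File : Session_Type, and the file
   --  is the computer.db example shown at the top of this unit:

   --     AWK.Open (";", "computer.db", Computer_File);

   --     while not AWK.End_Of_Data (Computer_File) loop
   --        AWK.Get_Line (AWK.Pass_Through, Computer_File);
   --        Put_Line (AWK.Field (0, Computer_File));
   --     end loop;

   --     AWK.Close (Computer_File);

   --  With Pass_Through every line reaches the Put_Line call after any
   --  matching actions have run. With Only, lines handled by a callback
   --  would be skipped by the loop body.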

   procedure Open
     (Separators : String       := Use_Current;
      Filename   : String       := Use_Current;
      Session    : Session_Type := Current_Session);
   --  Open the first file and initialize the unit. This must be called once
   --  before using Get_Line. It is possible to specify a filename and a set of
   --  separators directly. This offers a quick way to parse a single file.
   --  These parameters will override those specified by Set_FS and Add_File.
   --  File_Error is raised if there is no file associated with Session, or if
   --  the first file associated with Session is no longer readable. It raises
   --  Session_Error if Session is already open.

   procedure Get_Line
     (Callbacks : Callback_Mode := None;
      Session   : Session_Type  := Current_Session);
   --  Read a line from the current input file. If the file index is at the
   --  end of the current input file (i.e. End_Of_File is True) then the
   --  following file is opened. If there are no more files to be processed,
   --  exception End_Error will be raised. File_Error will be raised if Open
   --  has not been called. Next call to Get_Line will return the following
   --  line in the file. By default the registered callbacks are not called by
   --  Get_Line; this can be activated by setting Callbacks (see Callback_Mode
   --  description above). File_Error may be raised if a file associated with
   --  Session is not readable.

   --  When Callbacks is not None, it is possible to exhaust all the lines
   --  of all the files associated with Session. In this case, File_Error
   --  is not raised.

   --  This procedure can be used from a subprogram called by procedure Parse
   --  or by an instantiation of For_Every_Line (see below).

   function End_Of_Data
     (Session : Session_Type := Current_Session)
      return Boolean;
   pragma Inline (End_Of_Data);
   --  Returns True if there is no more data to be processed in Session. It
   --  means that the last file of the session is being processed and that
   --  there is no more data to be read in this file (End_Of_File is True).

   function End_Of_File
     (Session : Session_Type := Current_Session)
      return Boolean;
   pragma Inline (End_Of_File);
   --  Returns True when there is no more data to be processed on the current
   --  session's file.

   procedure Close (Session : Session_Type);
   --  Release all data associated with Session. All memory allocated will
   --  be freed, the current file will be closed if needed, the callbacks
   --  will be unregistered. Close is convenient in reestablishing a session
   --  for new use. Get_Line is no longer usable (will raise File_Error)
   --  except after a successful call to Open, Parse or an instantiation
   --  of For_Every_Line.

   -----------------------------
   -- For_Every_Line iterator --
   -----------------------------

   generic
      with procedure Action (Quit : in out Boolean);
   procedure For_Every_Line
     (Separators : String        := Use_Current;
      Filename   : String        := Use_Current;
      Callbacks  : Callback_Mode := None;
      Session    : Session_Type  := Current_Session);
   --  This is another iterator. Action will be called for each new
   --  record. The iterator's termination can be controlled by setting Quit
   --  to True. It is by default set to False. It is possible to specify a
   --  filename and a set of separators directly. This offers a quick way to
   --  parse a single file. These parameters will override those specified by
   --  Set_FS and Add_File. By default the registered callbacks are not called
   --  by For_Every_Line; this can be activated by setting Callbacks (see
   --  Callback_Mode description above). The Session will be opened and
   --  closed automatically. File_Error is raised if there is no file
   --  associated with Session. It raises Session_Error if Session is already
   --  open.

private
   type Session_Data;
   type Session_Data_Access is access Session_Data;

   type Session_Type is new Ada.Finalization.Limited_Controlled with record
      Data : Session_Data_Access;
   end record;

   procedure Initialize (Session : in out Session_Type);
   procedure Finalize   (Session : in out Session_Type);

end GNAT.AWK;