2010-12-20 Tobias Burnus <burnus@net-b.de>
1 ------------------------------------------------------------------------------
2 -- --
3 -- GNAT COMPILER COMPONENTS --
4 -- --
5 -- G N A T . A W K --
6 -- --
7 -- S p e c --
8 -- --
9 -- Copyright (C) 2000-2006, AdaCore --
10 -- --
11 -- GNAT is free software; you can redistribute it and/or modify it under --
12 -- terms of the GNU General Public License as published by the Free Soft- --
13 -- ware Foundation; either version 2, or (at your option) any later ver- --
14 -- sion. GNAT is distributed in the hope that it will be useful, but WITH- --
15 -- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY --
16 -- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License --
17 -- for more details. You should have received a copy of the GNU General --
18 -- Public License distributed with GNAT; see file COPYING. If not, write --
19 -- to the Free Software Foundation, 51 Franklin Street, Fifth Floor, --
20 -- Boston, MA 02110-1301, USA. --
21 -- --
22 -- As a special exception, if other files instantiate generics from this --
23 -- unit, or you link this unit with other files to produce an executable, --
24 -- this unit does not by itself cause the resulting executable to be --
25 -- covered by the GNU General Public License. This exception does not --
26 -- however invalidate any other reasons why the executable file might be --
27 -- covered by the GNU Public License. --
28 -- --
29 -- GNAT was originally developed by the GNAT team at New York University. --
30 -- Extensive contributions were provided by Ada Core Technologies Inc. --
31 -- --
32 ------------------------------------------------------------------------------
34 -- This is an AWK-like unit. It provides an easy interface for parsing one
-- or more files containing formatted data. The file can be viewed as
36 -- a database where each record is a line and a field is a data element in
37 -- this line. In this implementation an AWK record is a line. This means
38 -- that a record cannot span multiple lines. The operating procedure is to
39 -- read files line by line, with each line being presented to the user of
40 -- the package. The interface provides services to access specific fields
41 -- in the line. Thus it is possible to control actions taken on a line based
42 -- on values of some fields. This can be achieved directly or by registering
43 -- callbacks triggered on programmed conditions.
45 -- The state of an AWK run is recorded in an object of type session.
46 -- The following is the procedure for using a session to control an
47 -- AWK run:
49 -- 1) Specify which session is to be used. It is possible to use the
50 -- default session or to create a new one by declaring an object of
51 -- type Session_Type. For example:
53 -- Computers : Session_Type;
-- 2) Specify how to cut a line into fields. There are two modes: using
-- character field separators or column widths. This is done by using
-- Set_Field_Separators or Set_Field_Widths. For example by:
59 -- AWK.Set_Field_Separators (";,", Computers);
61 -- or by using iterators' Separators parameter.
63 -- 3) Specify which files to parse. This is done with Add_File/Add_Files
64 -- services, or by using the iterators' Filename parameter. For
65 -- example:
67 -- AWK.Add_File ("myfile.db", Computers);
69 -- 4) Run the AWK session using one of the provided iterators.
-- Parse
--    This is the most automated iterator. You can gain control of
--    the session only by registering one or more callbacks (see
--    Register).
-- Get_Line/End_Of_Data
--    This is a manual iterator to be used with a loop. You have
--    complete control of the session. You can use callbacks but
--    this is not required.
-- For_Every_Line
--    This provides a mixture of manual/automated iterator action.
-- Examples of these three approaches appear below.
86 -- There are many ways to use this package. The following discussion shows
87 -- three approaches to using this package, using the three iterator forms.
88 -- All examples will use the following file (computer.db):
90 -- Pluton;Windows-NT;Pentium III
91 -- Mars;Linux;Pentium Pro
92 -- Venus;Solaris;Sparc
93 -- Saturn;OS/2;i486
94 -- Jupiter;MacOS;PPC
96 -- 1) Using Parse iterator
98 -- Here the first step is to register some action associated to a pattern
99 -- and then to call the Parse iterator (this is the simplest way to use
-- this unit). The default session is used here. For example, to output the
-- second field (the OS) of computer "Saturn":
103 -- procedure Action is
104 -- begin
105 -- Put_Line (AWK.Field (2));
106 -- end Action;
108 -- begin
109 -- AWK.Register (1, "Saturn", Action'Access);
110 -- AWK.Parse (";", "computer.db");
113 -- 2) Using the Get_Line/End_Of_Data iterator
115 -- Here you have full control. For example to do the same as
116 -- above but using a specific session, you could write:
118 -- Computer_File : Session_Type;
120 -- begin
121 -- AWK.Set_Current (Computer_File);
122 -- AWK.Open (Separators => ";",
123 -- Filename => "computer.db");
125 -- -- Display Saturn OS
-- while not AWK.End_Of_Data loop
128 -- AWK.Get_Line;
130 -- if AWK.Field (1) = "Saturn" then
131 -- Put_Line (AWK.Field (2));
132 -- end if;
133 -- end loop;
135 -- AWK.Close (Computer_File);
138 -- 3) Using For_Every_Line iterator
-- In this case you use a provided iterator and you pass the procedure
-- that must be called for each record. The previous example could be
-- coded as follows (using the iterator quick interface but without using
-- the current session):
145 -- Computer_File : Session_Type;
147 -- procedure Action (Quit : in out Boolean) is
148 -- begin
149 -- if AWK.Field (1, Computer_File) = "Saturn" then
150 -- Put_Line (AWK.Field (2, Computer_File));
151 -- end if;
152 -- end Action;
154 -- procedure Look_For_Saturn is
155 -- new AWK.For_Every_Line (Action);
157 -- begin
158 -- Look_For_Saturn (Separators => ";",
159 -- Filename => "computer.db",
160 -- Session => Computer_File);
162 -- Integer_Text_IO.Put
163 -- (Integer (AWK.NR (Session => Computer_File)));
164 -- Put_Line (" line(s) have been processed.");
166 -- You can also use a regular expression for the pattern. Let us output
-- the computer name for all computers for which the OS has a character
168 -- O in its name.
170 -- Regexp : String := ".*O.*";
172 -- Matcher : Regpat.Pattern_Matcher := Regpat.Compile (Regexp);
174 -- procedure Action is
175 -- begin
-- Text_IO.Put_Line (AWK.Field (1));
177 -- end Action;
179 -- begin
180 -- AWK.Register (2, Matcher, Action'Unrestricted_Access);
181 -- AWK.Parse (";", "computer.db");
184 with Ada.Finalization;
185 with GNAT.Regpat;
187 package GNAT.AWK is
189 Session_Error : exception;
190 -- Raised when a Session is reused but is not closed
192 File_Error : exception;
193 -- Raised when there is a file problem (see below)
195 End_Error : exception;
196 -- Raised when an attempt is made to read beyond the end of the last
197 -- file of a session.
199 Field_Error : exception;
200 -- Raised when accessing a field value which does not exist
202 Data_Error : exception;
203 -- Raised when it is impossible to convert a field value to a specific type
205 type Count is new Natural;
207 type Widths_Set is array (Positive range <>) of Positive;
-- Used to store a set of column widths
210 Default_Separators : constant String := " " & ASCII.HT;
212 Use_Current : constant String := "";
213 -- Value used when no separator or filename is specified in iterators
215 type Session_Type is limited private;
216 -- This is the main exported type. A session is used to keep the state of
217 -- a full AWK run. The state comprises a list of files, the current file,
-- the number of lines processed, the current line, the number of fields in
-- the current line... A default session is provided (see Set_Current,
-- Current_Session and Default_Session below).
222 ----------------------------
223 -- Package initialization --
224 ----------------------------
226 -- To be thread safe it is not possible to use the default provided
-- session. Each task must use a specific session and specify it
-- explicitly for every service.
230 procedure Set_Current (Session : Session_Type);
-- Set the session to be used by default. This session will be used when
-- the Session parameter in following services is not specified.
234 function Current_Session return Session_Type;
235 -- Returns the session used by default by all services. This is the
236 -- latest session specified by Set_Current service or the session
237 -- provided by default with this implementation.
239 function Default_Session return Session_Type;
240 -- Returns the default session provided by this package. Note that this is
-- the session returned by Current_Session if Set_Current has not been used.
243 procedure Set_Field_Separators
244 (Separators : String := Default_Separators;
245 Session : Session_Type);
246 procedure Set_Field_Separators
247 (Separators : String := Default_Separators);
248 -- Set the field separators. Each character in the string is a field
249 -- separator. When a line is read it will be split by field using the
250 -- separators set here. Separators can be changed at any point and in this
251 -- case the current line is split according to the new separators. In the
252 -- special case that Separators is a space and a tabulation
253 -- (Default_Separators), fields are separated by runs of spaces and/or
254 -- tabs.
256 procedure Set_FS
257 (Separators : String := Default_Separators;
258 Session : Session_Type)
259 renames Set_Field_Separators;
260 procedure Set_FS
261 (Separators : String := Default_Separators)
262 renames Set_Field_Separators;
263 -- FS is the AWK abbreviation for above service
265 procedure Set_Field_Widths
266 (Field_Widths : Widths_Set;
267 Session : Session_Type);
268 procedure Set_Field_Widths
269 (Field_Widths : Widths_Set);
270 -- This is another way to split a line by giving the length (in number of
271 -- characters) of each field in a line. Field widths can be changed at any
272 -- point and in this case the current line is split according to the new
-- field lengths. A line split with this method must have a length equal to
-- or greater than the total of the field widths. All characters remaining
-- on the line after the last field are added to a new automatically
-- created field.
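-- For instance, as a minimal sketch (the widths below and the session
-- name Computers are illustrative assumptions, not part of this unit):
--
--    AWK.Set_Field_Widths ((1 => 8, 2 => 12), Computers);
--
-- With these widths, a 25-character line is split into three fields:
-- characters 1 .. 8, characters 9 .. 20, and an automatically created
-- third field holding the remaining characters 21 .. 25.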
278 procedure Add_File
279 (Filename : String;
280 Session : Session_Type);
281 procedure Add_File
282 (Filename : String);
-- Add Filename to the list of files to be processed. There is no limit
-- on the number of files that can be added. Files are processed in the
-- order they have been added (i.e. the filename list is FIFO). If Filename
-- does not exist or if it is not readable, File_Error is raised.
288 procedure Add_Files
289 (Directory : String;
290 Filenames : String;
291 Number_Of_Files_Added : out Natural;
292 Session : Session_Type);
293 procedure Add_Files
294 (Directory : String;
295 Filenames : String;
296 Number_Of_Files_Added : out Natural);
297 -- Add all files matching the regular expression Filenames in the specified
-- directory to the list of files to be processed. There is no limit on
-- the number of files that can be added. Files are processed in the
-- order they have been added (i.e. the filename list is FIFO).
301 -- The number of files (possibly 0) added is returned in
302 -- Number_Of_Files_Added.
304 -------------------------------------
305 -- Information about current state --
306 -------------------------------------
308 function Number_Of_Fields
309 (Session : Session_Type) return Count;
310 function Number_Of_Fields
311 return Count;
312 pragma Inline (Number_Of_Fields);
313 -- Returns the number of fields in the current record. It returns 0 when
314 -- no file is being processed.
316 function NF
317 (Session : Session_Type) return Count
318 renames Number_Of_Fields;
319 function NF
320 return Count
321 renames Number_Of_Fields;
322 -- AWK abbreviation for above service
324 function Number_Of_File_Lines
325 (Session : Session_Type) return Count;
326 function Number_Of_File_Lines
327 return Count;
328 pragma Inline (Number_Of_File_Lines);
329 -- Returns the current line number in the processed file. It returns 0 when
330 -- no file is being processed.
332 function FNR (Session : Session_Type) return Count
333 renames Number_Of_File_Lines;
334 function FNR return Count
335 renames Number_Of_File_Lines;
336 -- AWK abbreviation for above service
338 function Number_Of_Lines
339 (Session : Session_Type) return Count;
340 function Number_Of_Lines
341 return Count;
342 pragma Inline (Number_Of_Lines);
-- Returns the number of lines processed so far. This is equal to the
-- number of lines in each already processed file plus FNR. It returns 0
-- when no file is being processed.
347 function NR (Session : Session_Type) return Count
348 renames Number_Of_Lines;
349 function NR return Count
350 renames Number_Of_Lines;
351 -- AWK abbreviation for above service
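-- As a worked example (the file sizes are hypothetical): if a session
-- holds two files of 5 and 3 lines, then while the second line of the
-- second file is the current record, FNR = 2 and NR = 5 + 2 = 7.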
353 function Number_Of_Files
354 (Session : Session_Type) return Natural;
355 function Number_Of_Files
356 return Natural;
357 pragma Inline (Number_Of_Files);
358 -- Returns the number of files associated with Session. This is the total
359 -- number of files added with Add_File and Add_Files services.
361 function File (Session : Session_Type) return String;
362 function File return String;
363 -- Returns the name of the file being processed. It returns the empty
364 -- string when no file is being processed.
366 ---------------------
367 -- Field accessors --
368 ---------------------
370 function Field
371 (Rank : Count;
372 Session : Session_Type) return String;
373 function Field
374 (Rank : Count) return String;
375 -- Returns field number Rank value of the current record. If Rank = 0 it
376 -- returns the current record (i.e. the line as read in the file). It
377 -- raises Field_Error if Rank > NF or if Session is not open.
379 function Field
380 (Rank : Count;
381 Session : Session_Type) return Integer;
382 function Field
383 (Rank : Count) return Integer;
384 -- Returns field number Rank value of the current record as an integer. It
385 -- raises Field_Error if Rank > NF or if Session is not open. It
386 -- raises Data_Error if the field value cannot be converted to an integer.
388 function Field
389 (Rank : Count;
390 Session : Session_Type) return Float;
391 function Field
392 (Rank : Count) return Float;
393 -- Returns field number Rank value of the current record as a float. It
394 -- raises Field_Error if Rank > NF or if Session is not open. It
395 -- raises Data_Error if the field value cannot be converted to a float.
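-- A minimal sketch of the typed accessors. It assumes a hypothetical data
-- file whose third field holds a number, uses the current session, and
-- assumes Put_Line from Ada.Text_IO is directly visible:
--
--    declare
--       Size : Integer;
--    begin
--       Size := AWK.Field (3);  --  context selects the Integer overload
--       Put_Line (Integer'Image (Size));
--    exception
--       when AWK.Data_Error =>
--          Put_Line ("field 3 is not an integer");
--    end;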
397 generic
398 type Discrete is (<>);
399 function Discrete_Field
400 (Rank : Count;
401 Session : Session_Type) return Discrete;
402 generic
403 type Discrete is (<>);
404 function Discrete_Field_Current_Session
405 (Rank : Count) return Discrete;
406 -- Returns field number Rank value of the current record as a type
407 -- Discrete. It raises Field_Error if Rank > NF. It raises Data_Error if
408 -- the field value cannot be converted to type Discrete.
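-- A minimal sketch of an instantiation. The enumeration OS_Kind and the
-- session Computers are hypothetical, and the field text is assumed to be
-- converted from the name of an enumeration literal:
--
--    type OS_Kind is (Linux, Solaris, MacOS);
--
--    function OS_Field is new AWK.Discrete_Field (OS_Kind);
--    ...
--    if OS_Field (2, Computers) = Linux then
--       Put_Line (AWK.Field (1, Computers));
--    end if;
--
-- Under that assumption, a field such as "Windows-NT", which is not the
-- name of a value of OS_Kind, raises Data_Error.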
410 --------------------
411 -- Pattern/Action --
412 --------------------
-- AWK defines rules like "PATTERN { ACTION }", which means that ACTION
-- will be executed if PATTERN matches. A pattern in this implementation
-- can be a simple string (match function is equality), a regular
-- expression, or a function returning a boolean. An action is associated
-- to a pattern using the Register services.
420 -- Each procedure Register will add a rule to the set of rules for the
421 -- session. Rules are examined in the order they have been added.
423 type Pattern_Callback is access function return Boolean;
424 -- This is a pattern function pointer. When it returns True the associated
425 -- action will be called.
427 type Action_Callback is access procedure;
428 -- A simple action pointer
430 type Match_Action_Callback is
431 access procedure (Matches : GNAT.Regpat.Match_Array);
432 -- An advanced action pointer used with a regular expression pattern. It
433 -- returns an array of all the matches. See GNAT.Regpat for further
434 -- information.
436 procedure Register
437 (Field : Count;
438 Pattern : String;
439 Action : Action_Callback;
440 Session : Session_Type);
441 procedure Register
442 (Field : Count;
443 Pattern : String;
444 Action : Action_Callback);
445 -- Register an Action associated with a Pattern. The pattern here is a
446 -- simple string that must match exactly the field number specified.
448 procedure Register
449 (Field : Count;
450 Pattern : GNAT.Regpat.Pattern_Matcher;
451 Action : Action_Callback;
452 Session : Session_Type);
453 procedure Register
454 (Field : Count;
455 Pattern : GNAT.Regpat.Pattern_Matcher;
456 Action : Action_Callback);
457 -- Register an Action associated with a Pattern. The pattern here is a
458 -- simple regular expression which must match the field number specified.
460 procedure Register
461 (Field : Count;
462 Pattern : GNAT.Regpat.Pattern_Matcher;
463 Action : Match_Action_Callback;
464 Session : Session_Type);
465 procedure Register
466 (Field : Count;
467 Pattern : GNAT.Regpat.Pattern_Matcher;
468 Action : Match_Action_Callback);
-- Same as above but it passes the set of matches to the action
470 -- procedure. This is useful to analyse further why and where a regular
471 -- expression did match.
473 procedure Register
474 (Pattern : Pattern_Callback;
475 Action : Action_Callback;
476 Session : Session_Type);
477 procedure Register
478 (Pattern : Pattern_Callback;
479 Action : Action_Callback);
480 -- Register an Action associated with a Pattern. The pattern here is a
481 -- function that must return a boolean. Action callback will be called if
482 -- the pattern callback returns True and nothing will happen if it is
-- False. This version is more general: the field-based Register services
-- above trigger an action based on the value of a single field only.
486 procedure Register
487 (Action : Action_Callback;
488 Session : Session_Type);
489 procedure Register
490 (Action : Action_Callback);
491 -- Register an Action that will be called for every line. This is
492 -- equivalent to a Pattern_Callback function always returning True.
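-- A minimal sketch of the Pattern_Callback form. Has_Three_Fields and
-- Show_Line are hypothetical subprograms, assumed to be declared at
-- library level so that 'Access is legal; the current session is used:
--
--    function Has_Three_Fields return Boolean is
--    begin
--       return Integer (AWK.NF) = 3;
--    end Has_Three_Fields;
--
--    procedure Show_Line is
--    begin
--       Put_Line (AWK.Field (0));  --  rank 0 is the whole record
--    end Show_Line;
--
-- begin
--    AWK.Register (Has_Three_Fields'Access, Show_Line'Access);
--    AWK.Parse (";", "computer.db");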
494 --------------------
495 -- Parse iterator --
496 --------------------
498 procedure Parse
499 (Separators : String := Use_Current;
500 Filename : String := Use_Current;
501 Session : Session_Type);
502 procedure Parse
503 (Separators : String := Use_Current;
504 Filename : String := Use_Current);
-- Launch the iterator. It will read every line in all of the session's
-- files. Registered callbacks are then called if the associated
-- pattern matches. It is possible to specify a filename and a set of
-- separators directly. This offers a quick way to parse a single
-- file. These parameters will override those specified by Set_FS and
-- Add_File. The Session will be opened and closed automatically.
-- File_Error is raised if there is no file associated with Session, or if
-- a file associated with Session is no longer readable. Session_Error is
-- raised if Session is already open.
515 -----------------------------------
516 -- Get_Line/End_Of_Data Iterator --
517 -----------------------------------
519 type Callback_Mode is (None, Only, Pass_Through);
-- These modes are used for Get_Line/End_Of_Data and For_Every_Line
-- iterators. The associated semantics are:
523 -- None
524 -- callbacks are not active. This is the default mode for
525 -- Get_Line/End_Of_Data and For_Every_Line iterators.
527 -- Only
-- callbacks are active. If at least one pattern matches, the associated
-- action is called and this line will not be passed to the user. In
-- the Get_Line case the next line will be read (if any line
-- remains); in the For_Every_Line case Action will
-- not be called for this line.
534 -- Pass_Through
-- callbacks are active. For each pattern which matches, the associated
-- action is called. Then the line is passed to the user. It means
537 -- that Action procedure is called in the For_Every_Line case and
538 -- that Get_Line returns with the current line active.
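-- A minimal sketch of Pass_Through with the manual iterator, based on the
-- computer.db example at the top of this unit. Show_OS and Computer_File
-- are hypothetical names, and Show_OS is assumed to be declared at
-- library level so that 'Access is legal. The registered action prints
-- the OS of "Saturn", and the loop still sees every line:
--
--    Computer_File : AWK.Session_Type;
--
--    procedure Show_OS is
--    begin
--       Put_Line (AWK.Field (2, Computer_File));
--    end Show_OS;
--
-- begin
--    AWK.Register (1, "Saturn", Show_OS'Access, Computer_File);
--    AWK.Open (";", "computer.db", Computer_File);
--
--    while not AWK.End_Of_Data (Computer_File) loop
--       AWK.Get_Line (AWK.Pass_Through, Computer_File);
--       Put_Line (AWK.Field (1, Computer_File));
--    end loop;
--
--    AWK.Close (Computer_File);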
541 procedure Open
542 (Separators : String := Use_Current;
543 Filename : String := Use_Current;
544 Session : Session_Type);
545 procedure Open
546 (Separators : String := Use_Current;
547 Filename : String := Use_Current);
548 -- Open the first file and initialize the unit. This must be called once
549 -- before using Get_Line. It is possible to specify a filename and a set of
-- separators directly. This offers a quick way to parse a single file.
551 -- These parameters will override those specified by Set_FS and Add_File.
552 -- File_Error is raised if there is no file associated with Session, or if
553 -- the first file associated with Session is no longer readable. It raises
-- Session_Error if Session is already open.
556 procedure Get_Line
557 (Callbacks : Callback_Mode := None;
558 Session : Session_Type);
559 procedure Get_Line
560 (Callbacks : Callback_Mode := None);
-- Read a line from the current input file. If the file index is at the
-- end of the current input file (i.e. End_Of_File is True) then the
-- following file is opened. If there are no more files to be processed,
-- exception End_Error will be raised. File_Error will be raised if Open
-- has not been called. The next call to Get_Line will return the
-- following line in the file. By default the registered callbacks are not
-- called by Get_Line; this can be activated by setting Callbacks (see
-- Callback_Mode description above). File_Error may be raised if a file
-- associated with Session is not readable.
571 -- When Callbacks is not None, it is possible to exhaust all the lines
572 -- of all the files associated with Session. In this case, File_Error
573 -- is not raised.
575 -- This procedure can be used from a subprogram called by procedure Parse
576 -- or by an instantiation of For_Every_Line (see below).
578 function End_Of_Data
579 (Session : Session_Type) return Boolean;
580 function End_Of_Data
581 return Boolean;
582 pragma Inline (End_Of_Data);
583 -- Returns True if there is no more data to be processed in Session. It
-- means that the session's last file is being processed and that
585 -- there is no more data to be read in this file (End_Of_File is True).
587 function End_Of_File
588 (Session : Session_Type) return Boolean;
589 function End_Of_File
590 return Boolean;
591 pragma Inline (End_Of_File);
592 -- Returns True when there is no more data to be processed on the current
593 -- session's file.
595 procedure Close (Session : Session_Type);
-- Release all data associated with Session. All memory allocated will
597 -- be freed, the current file will be closed if needed, the callbacks
598 -- will be unregistered. Close is convenient in reestablishing a session
599 -- for new use. Get_Line is no longer usable (will raise File_Error)
600 -- except after a successful call to Open, Parse or an instantiation
601 -- of For_Every_Line.
603 -----------------------------
604 -- For_Every_Line iterator --
605 -----------------------------
607 generic
608 with procedure Action (Quit : in out Boolean);
609 procedure For_Every_Line
610 (Separators : String := Use_Current;
611 Filename : String := Use_Current;
612 Callbacks : Callback_Mode := None;
613 Session : Session_Type);
614 generic
615 with procedure Action (Quit : in out Boolean);
616 procedure For_Every_Line_Current_Session
617 (Separators : String := Use_Current;
618 Filename : String := Use_Current;
619 Callbacks : Callback_Mode := None);
620 -- This is another iterator. Action will be called for each new
621 -- record. The iterator's termination can be controlled by setting Quit
622 -- to True. It is by default set to False. It is possible to specify a
-- filename and a set of separators directly. This offers a quick way to
-- parse a single file. These parameters will override those specified by
-- Set_FS and Add_File. By default the registered callbacks are not called
-- by For_Every_Line; this can be activated by setting Callbacks (see
-- Callback_Mode description above). The Session will be opened and
-- closed automatically. File_Error is raised if there is no file
-- associated with Session. Session_Error is raised if Session is already
-- open.
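-- A minimal sketch of early termination through Quit. Stop_At_Saturn,
-- Scan and Computer_File are hypothetical names; computer.db is the
-- example file from the top of this unit:
--
--    Computer_File : AWK.Session_Type;
--
--    procedure Stop_At_Saturn (Quit : in out Boolean) is
--    begin
--       Put_Line (AWK.Field (1, Computer_File));
--       Quit := AWK.Field (1, Computer_File) = "Saturn";
--    end Stop_At_Saturn;
--
--    procedure Scan is new AWK.For_Every_Line (Stop_At_Saturn);
--
-- begin
--    Scan (Separators => ";",
--          Filename   => "computer.db",
--          Session    => Computer_File);
--
-- With the example file this prints Pluton, Mars, Venus and Saturn; the
-- line for Jupiter is never presented to Stop_At_Saturn.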
632 private
633 type Session_Data;
634 type Session_Data_Access is access Session_Data;
636 type Session_Type is new Ada.Finalization.Limited_Controlled with record
637 Data : Session_Data_Access;
638 end record;
640 procedure Initialize (Session : in out Session_Type);
641 procedure Finalize (Session : in out Session_Type);
643 end GNAT.AWK;