1 ------------------------------------------------------------------------------
3 -- GNAT COMPILER COMPONENTS --
9 -- Copyright (C) 2014-2018, Free Software Foundation, Inc. --
11 -- GNAT is free software; you can redistribute it and/or modify it under --
12 -- terms of the GNU General Public License as published by the Free Soft- --
13 -- ware Foundation; either version 3, or (at your option) any later ver- --
14 -- sion. GNAT is distributed in the hope that it will be useful, but WITH- --
15 -- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY --
16 -- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License --
17 -- for more details. You should have received a copy of the GNU General --
18 -- Public License distributed with GNAT; see file COPYING3. If not, go to --
19 -- http://www.gnu.org/licenses for a complete copy of the license. --
21 -- GNAT was originally developed by the GNAT team at New York University. --
22 -- Extensive contributions were provided by Ada Core Technologies Inc. --
24 ------------------------------------------------------------------------------
26 -- Expand routines for unnesting subprograms
29 with Types
; use Types
;
37 -- Normally, nested subprograms in the source result in corresponding
38 -- nested subprograms in the resulting tree. We then expect the back end
39 -- to handle such nested subprograms, including all cases of uplevel
40 -- references. For example, the GCC back end can do this relatively easily
41 -- since GNU C (as an extension) allows nested functions with uplevel
42 -- references, and implements an appropriate static chain approach to
43 -- dealing with such uplevel references.
45 -- However, we also want to be able to interface with back ends that do
46 -- not easily handle such uplevel references. One example is the back end
47 -- that translates the tree into standard C source code. In the future,
48 -- other back ends might need the same capability (e.g. a back end that
49 -- generated LLVM intermediate code).
51 -- We could imagine simply handling such references in the appropriate
52 -- back end. For example the back end that generates C could recognize
53 -- nested subprograms and rig up some way of translating them, e.g. by
54 -- making a static-link source level visible.
56 -- Rather than take that approach, we prefer to do a semantics-preserving
57 -- transformation on the GNAT tree, that eliminates the problem before we
58 -- hand the tree over to the back end. There are two reasons for preferring
61 -- First: the work needs only to be done once for all affected back ends
62 -- and we can remain within the semantics of the tree. The front end is
63 -- full of tree transformations, so we have all the infrastructure for
64 -- doing transformations of this type.
66 -- Second: given that the transformation will be semantics-preserving,
67 -- we can still use the standard GCC back end to build code from it.
68 -- This means we can easily run our full test suite to verify that the
69 -- transformations are indeed semantics preserving. It is a lot more
70 -- work to thoroughly test the output of specialized back ends.
72 -- Looking at the problem, we have three situations to deal with. Note
73 -- that in these examples, we use all lower case, since that is the way
74 -- the internal tree is cased.
76 -- First, cases where there are no uplevel references, for example
79 -- function max (m, n : Integer) return integer is
81 -- return integer'max (m, n);
86 -- Second, cases where there are explicit uplevel references.
88 -- procedure case2 (b : integer) is
89 -- procedure Inner (bb : integer);
91 -- procedure inner2 is
97 -- y : constant integer := 15 * 16;
98 -- rv : integer := 10;
100 -- procedure inner (bb : integer) is
102 -- x := rv + y + bb + b;
109 -- In this second example, B, X, RV are uplevel referenced. Y is not
110 -- considered as an uplevel reference since it is a static constant
111 -- where references are replaced by the value at compile time.
113 -- Third, cases where there are implicit uplevel references via types
114 -- whose bounds depend on locally declared constants or variables:
116 -- function case3 (x, y : integer) return boolean is
117 -- subtype dynam is integer range x .. y + 3;
118 -- subtype static is integer range 42 .. 73;
121 -- type darr is array (dynam) of Integer;
122 -- type darec is record
128 -- function inner (b : integer) return boolean is
130 -- return b in dynam and then darecv.b in static;
134 -- return inner (42) and then inner (xx * 3 - y * 2);
137 -- In this third example, the membership test implicitly references the
138 -- the bounds of Dynam, which both involve uplevel references.
140 -- ------------------
141 -- -- The Solution --
142 -- ------------------
144 -- Looking at the three cases above, the first case poses no problem at
145 -- all. Indeed the subprogram could have been declared at the outer level
146 -- (perhaps changing the name). But this style is quite common as a way
147 -- of limiting the scope of a local procedure called only within the outer
148 -- procedure. We could move it to the outer level (with a name change if
149 -- needed), but we don't bother. We leave it nested, and the back end just
150 -- translates it as though it were not nested.
152 -- In general we leave nested procedures nested, rather than trying to move
153 -- them to the outer level (the back end may do that, e.g. as part of the
154 -- translation to C, but we don't do it in the tree itself). This saves a
155 -- LOT of trouble in terms of visibility and semantics.
157 -- But of course we have to deal with the uplevel references. The idea is
158 -- to rewrite these nested subprograms so that they no longer have any such
159 -- uplevel references, so by the time they reach the back end, they all are
160 -- case 1 (no uplevel references) and thus easily handled.
162 -- To deal with explicit uplevel references (case 2 above), we proceed with
163 -- the following steps:
165 -- All entities marked as being uplevel referenced are marked as aliased
166 -- since they will be accessed indirectly via an activation record as
169 -- An activation record is created containing system address values
170 -- for each uplevel referenced entity in a given scope. In the example
171 -- given before, we would have:
173 -- type AREC1T is record
179 -- type AREC1PT is access all AREC1T;
181 -- AREC1 : aliased AREC1T;
182 -- AREC1P : constant AREC1PT := AREC1'Access;
184 -- The fields of AREC1 are set at the point the corresponding entity
185 -- is declared (immediately for parameters).
187 -- Note: the 1 in all these names is a unique index number. Different
188 -- scopes requiring different ARECnT declarations will have different
189 -- values of n to ensure uniqueness.
191 -- Note: normally the field names in the activation record match the
192 -- name of the entity. An exception is when the entity is declared in
193 -- a declare block, in which case we append the entity number, to avoid
194 -- clashes between the same name declared in different declare blocks.
196 -- For all subprograms nested immediately within the corresponding scope,
197 -- a parameter AREC1F is passed, and all calls to these routines have
198 -- AREC1P added as an additional formal.
200 -- Now within the nested procedures, any reference to an uplevel entity
201 -- xxx is replaced by typ'Deref(AREC1.xxx) where typ is the type of the
204 -- Note: the reason that we use Address as the component type in the
205 -- declaration of AREC1T is that we may create this type before we see
206 -- the declaration of this type.
208 -- The following shows example 2 above after this translation:
210 -- procedure case2x (b : aliased Integer) is
211 -- type AREC1T is record
217 -- type AREC1PT is access all AREC1T;
219 -- AREC1 : aliased AREC1T;
220 -- AREC1P : constant AREC1PT := AREC1'Access;
222 -- AREC1.b := b'Address;
224 -- procedure inner (bb : integer; AREC1F : AREC1PT);
226 -- procedure inner2 (AREC1F : AREC1PT) is
231 -- x : aliased integer := 77;
232 -- AREC1.x := X'Address;
234 -- y : constant Integer := 15 * 16;
236 -- rv : aliased Integer;
237 -- AREC1.rv := rv'Address;
239 -- procedure inner (bb : integer; AREC1F : AREC1PT) is
241 -- Integer'Deref(AREC1F.x) :=
242 -- Integer'Deref(AREC1F.rv) + y + b + Integer'Deref(AREC1F.b);
249 -- And now the inner procedures INNER2 and INNER have no uplevel references
250 -- so they have been reduced to case 1, which is the case easily handled by
251 -- the back end. Note that the generated code is not strictly legal Ada
252 -- because of the assignments to AREC1 in the declarative sequence, but the
253 -- GNAT tree always allows such mixing of declarations and statements, so
254 -- the back end must be prepared to handle this in any case.
256 -- Case 3 where we have uplevel references to types is a bit more complex.
257 -- That would especially be the case if we did a full transformation that
258 -- completely eliminated such uplevel references as we did for case 2. But
259 -- instead of trying to do that, we rewrite the subprogram so that the code
260 -- generator can easily detect and deal with these uplevel type references.
262 -- First we distinguish two cases
264 -- Static types are one of the two following cases:
266 -- Discrete types whose bounds are known at compile time. This is not
267 -- quite the same as what is tested by Is_OK_Static_Subtype, in that
268 -- it allows compile time known values that are not static expressions.
270 -- Composite types, whose components are (recursively) static types.
272 -- Dynamic types are one of the two following cases:
274 -- Discrete types with at least one bound not known at compile time.
276 -- Composite types with at least one component that is (recursively)
279 -- Uplevel references to static types are not a problem, the front end
280 -- or the code generator fetches the bounds as required, and since they
281 -- are compile time known values, this value can just be extracted and
282 -- no actual uplevel reference is required.
284 -- Uplevel references to dynamic types are a potential problem, since
285 -- such references may involve an implicit access to a dynamic bound,
286 -- and this reference is an implicit uplevel access.
288 -- To fully unnest such references would be messy, since we would have
289 -- to create local copies of the dynamic types involved, so that the
290 -- front end or code generator could generate an explicit uplevel
291 -- reference to the bound involved. Rather than do that, we set things
292 -- up so that this situation can be easily detected and dealt with when
293 -- there is an implicit reference to the bounds.
295 -- What we do is to always generate a local constant for any dynamic
296 -- bound in a dynamic subtype xx with name xx_FIRST or xx_LAST. The one
297 -- case where we can skip this is where the bound is already a constant.
298 -- E.g. in the third example above, subtype dynam is expanded as
300 -- dynam_LAST : constant Integer := y + 3;
301 -- subtype dynam is integer range x .. dynam_LAST;
303 -- Now if type dynam is uplevel referenced (as it is in this case), then
304 -- the bounds x and dynam_LAST are marked as uplevel references
305 -- so that appropriate entries are made in the activation record. Any
306 -- explicit reference to such a bound in the front end generated code
307 -- will be handled by the normal uplevel reference mechanism which we
308 -- described above for case 2. For implicit references by a back end
309 -- that needs to unnest things, any such implicit reference to one of
310 -- these bounds can be replaced by an appropriate reference to the entry
311 -- in the activation record for xx_FIRST or xx_LAST. Thus the back end
312 -- can eliminate the problematical uplevel reference without the need to
313 -- do the heavy tree modification to do that at the code expansion level.
315 -- Looking at case 3 again, here is the normal -gnatG expanded code
317 -- function case3 (x : integer; y : integer) return boolean is
318 -- dynam_LAST : constant integer := y {+} 3;
319 -- subtype dynam is integer range x .. dynam_LAST;
320 -- subtype static is integer range 42 .. 73;
322 -- [constraint_error when
323 -- not (y in x .. dynam_LAST)
324 -- "range check failed"]
328 -- type darr is array (x .. dynam_LAST) of integer;
329 -- type darec is record
333 -- [type TdarrB is array (x .. dynam_LAST range <>) of integer]
337 -- function inner (b : integer) return boolean is
339 -- return b in x .. dynam_LAST and then darecv.b in 42 .. 73;
342 -- return inner (42) and then inner (xx {*} 3 {-} y {*} 2);
345 -- Note: the actual expanded code has fully qualified names so for
346 -- example function inner is actually function case3__inner. For now
347 -- we ignore that detail to clarify the examples.
349 -- Here we see that some of the bounds references are expanded by the
350 -- front end, so that we get explicit references to y or dynam_Last. These
351 -- cases are handled by the normal uplevel reference mechanism described
352 -- above for case 2. This is the case for the constraint check for the
353 -- initialization of xx, and the range check in function inner.
355 -- But the reference darecv.b in the return statement of function
356 -- inner has an implicit reference to the bounds of dynam, since to
357 -- compute the location of b in the record, we need the length of a.
359 -- Here is the full translation of the third example:
361 -- function case3x (x, y : integer) return boolean is
362 -- type AREC1T is record
364 -- dynam_LAST : Address;
367 -- type AREC1PT is access all AREC1T;
369 -- AREC1 : aliased AREC1T;
370 -- AREC1P : constant AREC1PT := AREC1'Access;
372 -- AREC1.x := x'Address;
374 -- dynam_LAST : constant integer := y {+} 3;
375 -- AREC1.dynam_LAST := dynam_LAST'Address;
376 -- subtype dynam is integer range x .. dynam_LAST;
379 -- [constraint_error when
380 -- not (y in x .. dynam_LAST)
381 -- "range check failed"]
383 -- subtype static is integer range 42 .. 73;
385 -- type darr is array (x .. dynam_LAST) of Integer;
386 -- type darec is record
392 -- function inner (b : integer; AREC1F : AREC1PT) return boolean is
394 -- return b in x .. Integer'Deref(AREC1F.dynam_LAST)
395 -- and then darecv.b in 42 .. 73;
399 -- return inner (42, AREC1P) and then inner (xx * 3, AREC1P);
402 -- And now the back end when it processes darecv.b will access the bounds
403 -- of darecv.a by referencing the d and dynam_LAST fields of AREC1P.
405 -----------------------------
406 -- Multiple Nesting Levels --
407 -----------------------------
409 -- In our examples so far, we have only nested to a single level, but the
410 -- scheme generalizes to multiple levels of nesting and in this section we
411 -- discuss how this generalization works.
413 -- Consider this example with two nesting levels
415 -- To deal with elimination of uplevel references, we follow the same basic
416 -- approach described above for case 2, except that we need an activation
417 -- record at each nested level. Basically the rule is that any procedure
418 -- that has nested procedures needs an activation record. When we do this,
419 -- the inner activation records have a pointer (uplink) to the immediately
420 -- enclosing activation record, the normal arrangement of static links. The
421 -- following shows the full translation of this fourth case.
423 -- function case4x (x : integer) return integer is
424 -- type AREC1T is record
428 -- type AREC1PT is access all AREC1T;
430 -- AREC1 : aliased AREC1T;
431 -- AREC1P : constant AREC1PT := AREC1'Access;
433 -- v1 : integer := x;
434 -- AREC1.v1 := v1'Address;
436 -- function inner1 (y : integer; AREC1F : AREC1PT) return integer is
437 -- type AREC2T is record
442 -- type AREC2PT is access all AREC2T;
444 -- AREC2 : aliased AREC2T;
445 -- AREC2P : constant AREC2PT := AREC2'Access;
447 -- AREC2.AREC1U := AREC1F;
449 -- v2 : integer := Integer'Deref (AREC1F.v1) {+} 1;
450 -- AREC2.v2 := v2'Address;
453 -- (z : integer; AREC2F : AREC2PT) return integer
456 -- return integer(z {+}
457 -- Integer'Deref (AREC2F.AREC1U.v1) {+}
458 -- Integer'Deref (AREC2F.v2).all);
461 -- return integer(y {+}
462 -- inner2 (Integer'Deref (AREC1F.v1), AREC2P));
465 -- return inner1 (x, AREC1P);
468 -- As can be seen in this example, the index numbers following AREC in the
469 -- generated names avoid confusion between AREC names at different levels.
471 -------------------------
472 -- Name Disambiguation --
473 -------------------------
475 -- As described above, the translation scheme would raise issues when the
476 -- code generator did the actual unnesting if identically named nested
477 -- subprograms exist. Similarly overloading would cause a naming issue.
479 -- In fact, the expanded code includes qualified names which eliminate this
480 -- problem. We omitted the qualification from the expanded examples above
481 -- for simplicity. But to see this in action, consider this example:
483 -- function Mnames return Boolean is
484 -- procedure Inner is
485 -- procedure Inner is
492 -- function F (A : Boolean) return Boolean is
496 -- function F (A : Integer) return Boolean is
502 -- return F (42) or F (True);
505 -- The expanded code actually looks like:
507 -- function mnames return boolean is
508 -- procedure mnames__inner is
509 -- procedure mnames__inner__inner is
513 -- end mnames__inner__inner;
515 -- mnames__inner__inner;
517 -- end mnames__inner;
518 -- function mnames__f (a : boolean) return boolean is
522 -- function mnames__f__2 (a : integer) return boolean is
528 -- return mnames__f__2 (42) or mnames__f (true);
531 -- As can be seen from studying this example, the qualification deals both
532 -- with the issue of clashing names (mnames__inner, mnames__inner__inner),
533 -- and with overloading (mnames__f, mnames__f__2).
535 -- In addition, the declarations of ARECnT and ARECnPT get moved to the
536 -- outer level when we actually generate C code, so we suffix these names
537 -- with the corresponding entity name to make sure they are unique.
539 ---------------------------
540 -- Terminology for Calls --
541 ---------------------------
543 -- The level of a subprogram in the nest being analyzed is defined to be
544 -- the level of nesting, so the outer level subprogram (the one passed to
545 -- Unnest_Subprogram) is 1, subprograms immediately nested within this
546 -- outer level subprogram have a level of 2, etc.
548 -- Calls within the nest being analyzed are of three types:
550 -- Downward call: this is a call from a subprogram to a subprogram that
551 -- is immediately nested with in the caller, and thus has a level that
552 -- is one greater than the caller. It is a fundamental property of the
553 -- nesting structure and visibility that it is not possible to make a
554 -- call from level N to level M, where M is greater than N + 1.
556 -- Parallel call: this is a call from a nested subprogram to another
557 -- nested subprogram that is at the same level.
559 -- Upward call: this is a call from a subprogram to a subprogram that
560 -- encloses the caller. The level of the callee is less than the level
561 -- of the caller, and there is no limit on the difference, e.g. for an
562 -- uplevel call, a subprogram at level 5 can call one at level 2 or even
563 -- the outer level subprogram at level 1.
565 -------------------------------------
566 -- Handling of unconstrained types --
567 -------------------------------------
569 -- Objects whose nominal subtype is an unconstrained array type present
570 -- additional complications for translation into LLVM. The address
571 -- attribute of such objects points to the first component of the
572 -- array, and the bounds are found elsewhere, typically ahead of the
573 -- components. In many cases the bounds of an object are stored ahead
574 -- of the components and can be retrieved from it. However, if the
575 -- object is an expression (e.g. a slice) the bounds are not adjacent
576 -- and thus must be conveyed explicitly by means of a so-called
577 -- fat pointer. This leads to the following enhancements to the
578 -- handling of uplevel references described so far. This applies only
579 -- to uplevel references to unconstrained formals of enclosing
582 -- a) Uplevel references are detected as before during the tree traversal
583 -- in Visit_Node. For reference to uplevel formals, we include those with
584 -- an unconstrained array type (e.g. String) even if such a type has
587 -- b) references to unconstrained formals are recognized in the Subp
588 -- table by means of the predicate Needs_Fat_Pointer.
590 -- c) When constructing the required activation record we also construct
591 -- a named access type whose designated type is the unconstrained array
592 -- type. The activation record of a subprogram that contains such an
593 -- uplevel reference includes a component of this access type. The
594 -- declaration for that access type is introduced and analyzed before
595 -- that of the activation record, so it appears in the subprogram that
598 -- d) The uplevel reference is rewritten as an explicit dereference (.all)
599 -- of the corresponding pointer component.
605 -- Table to record subprograms within the nest being currently analyzed.
606 -- Entries in this table are made for each subprogram expanded, and do not
607 -- get cleared as we complete the expansion, since we want the table info
608 -- around in Cprint for the actual unnesting operation. Subps_First in this
609 -- unit records the starting entry in the table for the entries for Subp
610 -- and this is also recorded in the Subps_Index field of the outer level
611 -- subprogram in the nest. The last subps index for the nest can be found
612 -- in the Subp_Entry Last field of this first entry.
614 subtype SI_Type
is Nat
;
615 -- Index type for the table
617 Subps_First
: SI_Type
;
618 -- Record starting index for entries in the current nest (this is the table
619 -- index of the entry for Subp itself, and is recorded in the Subps_Index
620 -- field of the entity for this subprogram).
622 type Subp_Entry
is record
624 -- Entity of the subprogram
627 -- Subprogram_Body node for this subprogram
630 -- Subprogram level (1 = outer subprogram (Subp argument), 2 = nested
631 -- immediately within this outer subprogram etc.)
634 -- This flag is set True if there is a call path from the outer level
635 -- subprogram to this subprogram. If Reachable is False, it means that
636 -- the subprogram is declared but not actually referenced. We remove
637 -- such subprograms from the tree, which simplifies our task, because
638 -- we don't have to worry about e.g. uplevel references from such an
639 -- unreferenced subpogram, which might require (useless) activation
640 -- records to be created. This is computed by setting the outer level
641 -- subprogram (Subp itself) as reachable, and then doing a transitive
642 -- closure following all calls.
645 -- The outermost level which defines entities which this subprogram
646 -- references either directly or indirectly via a call. This cannot
647 -- be greater than Lev. If it is equal to Lev, then it means that the
648 -- subprogram does not make any uplevel references and that thus it
649 -- does not need an activation record pointer passed. If it is less than
650 -- Lev, then an activation record pointer is needed, since there is at
651 -- least one uplevel reference. This is computed by initially setting
652 -- Uplevel_Ref to Lev for all subprograms. Then on the initial tree
653 -- traversal, decreasing Uplevel_Ref for an explicit uplevel reference,
654 -- and finally by doing a transitive closure that follows calls (if A
655 -- calls B and B has an uplevel reference to level X, then A references
656 -- level X indirectly).
658 Declares_AREC
: Boolean;
659 -- This is set True for a subprogram which include the declarations
660 -- for a local activation record to be passed on downward calls. It
661 -- is set True for the target level of an uplevel reference, and for
662 -- all intervening nested subprograms. For example, if a subprogram X
663 -- at level 5 makes an uplevel reference to an entity declared in a
664 -- level 2 subprogram, then the subprograms at levels 4,3,2 enclosing
665 -- the level 5 subprogram will have this flag set True.
668 -- This is a list of entities declared in this subprogram which are
669 -- uplevel referenced. It contains both objects (which will be put in
670 -- the corresponding AREC activation record), and types. The types are
671 -- not put in the AREC activation record, but referenced bounds (i.e.
672 -- generated _FIRST and _LAST entites, and formal parameters) will be
673 -- in the list in their own right.
676 -- This field is set only in the entry for the outer level subprogram
677 -- in a nest, and records the last index in the Subp table for all the
678 -- entries for subprograms in this nest.
681 -- This entity is defined for all subprograms which need an extra formal
682 -- that contains a pointer to the activation record needed for uplevel
683 -- references. ARECnF must be defined for any subprogram which has a
684 -- direct or indirect uplevel reference (i.e. Reference_Level < Lev).
690 -- These AREC entities are defined only for subprograms for which we
691 -- generate an activation record declaration, i.e. for subprograms for
692 -- which the Declares_AREC flag is set True.
695 -- This AREC entity is the uplink component. It is other than Empty only
696 -- for nested subprograms that declare an activation record as indicated
697 -- by Declares_AREC being True, and which have uplevel references (Lev
698 -- greater than Uplevel_Ref). It is the additional component in the
699 -- activation record that references the ARECnF pointer (which points
700 -- the activation record one level higher, thus forming the chain).
704 package Subps
is new Table
.Table
(
705 Table_Component_Type
=> Subp_Entry
,
706 Table_Index_Type
=> SI_Type
,
707 Table_Low_Bound
=> 1,
708 Table_Initial
=> 1000,
709 Table_Increment
=> 200,
710 Table_Name
=> "Unnest_Subps");
711 -- Records the subprograms in the nest whose outer subprogram is Subp
717 function Get_Level
(Subp
: Entity_Id
; Sub
: Entity_Id
) return Nat
;
718 -- Sub is either Subp itself, or a subprogram nested within Subp. This
719 -- function returns the level of nesting (Subp = 1, subprograms that
720 -- are immediately nested within Subp = 2, etc.).
722 function In_Synchronized_Unit
(Subp
: Entity_Id
) return Boolean;
723 -- Predicate to identify subprograms declared in task and protected types.
724 -- These subprograms are called from outside the compilation and therefore
725 -- must be considered reachable (and cannot be eliminated) because we must
726 -- generate code for them.
728 function Subp_Index
(Sub
: Entity_Id
) return SI_Type
;
729 -- Given the entity for a subprogram, return corresponding Subp's index
731 procedure Unnest_Subprograms
(N
: Node_Id
);
732 -- Called to unnest subprograms. If we are in unnest subprogram mode, this
733 -- is the call that traverses the tree N and locates all the library-level
734 -- subprograms with nested subprograms to process them.