1 @c Copyright (C) 2002-2014 Free Software Foundation, Inc.
2 @c This is part of the GCC manual.
3 @c For copying conditions, see the file gcc.texi.
6 @chapter Memory Management and Type Information
10 GCC uses some fairly sophisticated memory management techniques, which
11 involve determining information about GCC's data structures from GCC's
12 source code and using this information to perform garbage collection and
13 implement precompiled headers.
15 A full C++ parser would be too complicated for this task, so a limited
16 subset of C++ is interpreted and special markers are used to determine
17 what parts of the source to look at. All @code{struct}, @code{union}
18 and @code{template} structure declarations that define data structures
19 that are allocated under control of the garbage collector must be
20 marked. All global variables that hold pointers to garbage-collected
21 memory must also be marked. Finally, all global variables that need
22 to be saved and restored by a precompiled header must be marked. (The
23 precompiled header mechanism can only save static variables if they're
24 scalar. Complex data structures must be allocated in garbage-collected
25 memory to be saved in a precompiled header.)
27 The full format of a marker is
29 GTY (([@var{option}] [(@var{param})], [@var{option}] [(@var{param})] @dots{}))
32 but in most cases no options are needed. The outer double parentheses
33 are still necessary, though: @code{GTY(())}. Markers can appear:
37 In a structure definition, before the open brace;
39 In a global variable declaration, after the keyword @code{static} or
42 In a structure field definition, before the name of the field.
45 Here are some examples of marking simple data structures and globals.
48 struct GTY(()) @var{tag}
53 typedef struct GTY(()) @var{tag}
58 static GTY(()) struct @var{tag} *@var{list}; /* @r{points to GC memory} */
59 static GTY(()) int @var{counter}; /* @r{save counter in a PCH} */
62 The parser understands simple typedefs such as
63 @code{typedef struct @var{tag} *@var{name};} and
64 @code{typedef int @var{name};}.
65 These don't need to be marked.
67 Since @code{gengtype}'s understanding of C++ is limited, there are
68 several constructs and declarations that are not supported inside
69 classes/structures marked for automatic GC code generation. The
70 following C++ constructs produce a @code{gengtype} error on
71 structures/classes marked for automatic GC code generation:
75 Type definitions inside classes/structures are not supported.
77 Enumerations inside classes/structures are not supported.
80 If you have a class or structure using any of the above constructs,
81 you need to mark that class as @code{GTY ((user))} and provide your
82 own marking routines (see section @ref{User GC} for details).
It is always valid to include function definitions inside classes.
Those are always ignored by @code{gengtype}, as it only cares about
data members.
89 * GTY Options:: What goes inside a @code{GTY(())}.
90 * Inheritance and GTY:: Adding GTY to a class hierarchy.
91 * User GC:: Adding user-provided GC marking routines.
92 * GGC Roots:: Making global variables GGC roots.
93 * Files:: How the generated files work.
94 * Invoking the garbage collector:: How to invoke the garbage collector.
95 * Troubleshooting:: When something does not work as expected.
99 @section The Inside of a @code{GTY(())}
101 Sometimes the C code is not enough to fully describe the type
102 structure. Extra information can be provided with @code{GTY} options
103 and additional markers. Some options take a parameter, which may be
104 either a string or a type name, depending on the parameter. If an
105 option takes no parameter, it is acceptable either to omit the
106 parameter entirely, or to provide an empty string as a parameter. For
example, @code{@w{GTY ((skip))}} and @code{@w{GTY ((skip ("")))}} are
equivalent.
110 When the parameter is a string, often it is a fragment of C code. Four
111 special escapes may be used in these strings, to refer to pieces of
112 the data structure being marked:
114 @cindex % in GTY option
117 The current structure.
119 The structure that immediately contains the current structure.
121 The outermost structure that contains the current structure.
123 A partial expression of the form @code{[i1][i2]@dots{}} that indexes
124 the array item currently being marked.
127 For instance, suppose that you have a structure of the form
137 and @code{b} is a variable of type @code{struct B}. When marking
138 @samp{b.foo[11]}, @code{%h} would expand to @samp{b.foo[11]},
139 @code{%0} and @code{%1} would both expand to @samp{b}, and @code{%a}
140 would expand to @samp{[11]}.
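The escape expansion described above can be modelled as plain text
substitution. The following is only a toy sketch, not the code
@code{gengtype} uses: the @code{escape_context} type and the
@code{expand_escapes} function are hypothetical names introduced for
illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical context for the field being marked; these member names
// are illustrative and do not come from gengtype.
struct escape_context
{
  std::string current;    // replaces %h
  std::string container;  // replaces %1
  std::string outermost;  // replaces %0
  std::string index;      // replaces %a
};

// Replace each %h, %1, %0 and %a in EXPR with the matching context string.
// Any other character following '%' is copied through unchanged.
std::string
expand_escapes (const std::string &expr, const escape_context &ctx)
{
  std::string out;
  for (std::string::size_type i = 0; i < expr.size (); ++i)
    {
      if (expr[i] != '%' || i + 1 == expr.size ())
        {
          out += expr[i];
          continue;
        }
      switch (expr[++i])
        {
        case 'h': out += ctx.current; break;
        case '1': out += ctx.container; break;
        case '0': out += ctx.outermost; break;
        case 'a': out += ctx.index; break;
        default: out += '%'; out += expr[i]; break;
        }
    }
  return out;
}
```

With the context of the @samp{b.foo[11]} example, an option string such
as @code{"%h.num_elem"} would become @samp{b.foo[11].num_elem} in this
model.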
142 As in ordinary C, adjacent strings will be concatenated; this is
143 helpful when you have a complicated expression.
146 GTY ((chain_next ("TREE_CODE (&%h.generic) == INTEGER_TYPE"
147 " ? TYPE_NEXT_VARIANT (&%h.generic)"
148 " : TREE_CHAIN (&%h.generic)")))
152 The available options are:
156 @item length ("@var{expression}")
158 There are two places the type machinery will need to be explicitly told
159 the length of an array of non-atomic objects. The first case is when a
160 structure ends in a variable-length array, like this:
162 struct GTY(()) rtvec_def @{
163 int num_elem; /* @r{number of elements} */
164 rtx GTY ((length ("%h.num_elem"))) elem[1];
168 In this case, the @code{length} option is used to override the specified
169 array length (which should usually be @code{1}). The parameter of the
170 option is a fragment of C code that calculates the length.
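As a rough illustration of why the length must be given, the sketch
below marks a trailing array by hand. It is not generated code: the
types @code{node} and @code{node_vec} and both functions are
hypothetical, @code{GTY} is defined as an empty macro so that the
fragment compiles on its own, and @code{calloc} stands in for a GC
allocator. Like the example above, it relies on the trailing-array idiom
of declaring one element and allocating more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#define GTY(x) /* empty for the compiler; only gengtype reads it */

struct node { int value; };   // stand-in for a GC-managed object

struct GTY(()) node_vec
{
  int num_elem;                                   // number of elements
  node *GTY ((length ("%h.num_elem"))) elem[1];   // really num_elem long
};

// Allocate zero-filled room for N elements instead of the declared one.
node_vec *
alloc_node_vec (int n)
{
  assert (n >= 1);
  std::size_t size = sizeof (node_vec) + (n - 1) * sizeof (node *);
  node_vec *v = static_cast<node_vec *> (std::calloc (1, size));
  if (v)
    v->num_elem = n;
  return v;
}

// Hand-written analogue of a marking loop: it visits num_elem entries,
// which is the value the length expression supplies, not the declared 1.
int
count_marked (const node_vec *v)
{
  node *const *elems = v->elem;
  int marked = 0;
  for (int i = 0; i < v->num_elem; ++i)
    if (elems[i] != NULL)
      ++marked;
  return marked;
}
```

Without the @code{length} expression, a marker derived from the
declaration alone would stop after the first element.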
172 The second case is when a structure or a global variable contains a
173 pointer to an array, like this:
175 struct gimple_omp_for_iter * GTY((length ("%h.collapse"))) iter;
177 In this case, @code{iter} has been allocated by writing something like
179 x->iter = ggc_alloc_cleared_vec_gimple_omp_for_iter (collapse);
181 and the @code{collapse} provides the length of the field.
183 This second use of @code{length} also works on global variables, like:
185 static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
188 Note that the @code{length} option is only meant for use with arrays of
189 non-atomic objects, that is, objects that contain pointers pointing to
190 other GTY-managed objects. For other GC-allocated arrays and strings
191 you should use @code{atomic}.
196 If @code{skip} is applied to a field, the type machinery will ignore it.
197 This is somewhat dangerous; the only safe use is in a union when one
198 field really isn't ever used.
Use this to mark types that need to be marked by user GC routines but
are not referred to in a template argument. For example, if you have a
user GC type @code{T1} and a non-user GC type @code{T2}, you can give
@code{T2} the @code{for_user} option so that the marking functions for
@code{T1} can call non-mangled functions to mark @code{T2}.
209 @item desc ("@var{expression}")
210 @itemx tag ("@var{constant}")
213 The type machinery needs to be told which field of a @code{union} is
214 currently active. This is done by giving each field a constant
215 @code{tag} value, and then specifying a discriminator using @code{desc}.
216 The value of the expression given by @code{desc} is compared against
217 each @code{tag} value, each of which should be different. If no
218 @code{tag} is matched, the field marked with @code{default} is used if
219 there is one, otherwise no field in the union will be marked.
221 In the @code{desc} option, the ``current structure'' is the union that
222 it discriminates. Use @code{%1} to mean the structure containing it.
There are no escapes available to the @code{tag} option, since it is a
constant.
228 struct GTY(()) tree_binding
230 struct tree_common common;
231 union tree_binding_u @{
232 tree GTY ((tag ("0"))) scope;
233 struct cp_binding_level * GTY ((tag ("1"))) level;
234 @} GTY ((desc ("BINDING_HAS_LEVEL_P ((tree)&%0)"))) xscope;
In this example, the value of @code{BINDING_HAS_LEVEL_P} when applied
to a @code{struct tree_binding *} is presumed to be 0 or 1. If it is 1,
the type mechanism treats the field @code{level} as being present; if it
is 0, it treats the field @code{scope} as being present.
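The effect of @code{desc} and @code{tag} can be pictured as a switch on
the discriminator. The sketch below is hand-written and much simplified:
the @code{binding} type, its @code{has_level} field and the
@code{marked} flags are hypothetical, and @code{GTY} is defined as an
empty macro so that the fragment compiles on its own.

```cpp
#include <cassert>
#include <cstddef>

#define GTY(x) /* ignored by the compiler */

struct scope_info { bool marked; };
struct level_info { bool marked; };

struct GTY(()) binding
{
  int has_level;   // hypothetical discriminator, 0 or 1
  union binding_u
  {
    scope_info *GTY ((tag ("0"))) scope;
    level_info *GTY ((tag ("1"))) level;
  } GTY ((desc ("%1.has_level"))) xscope;
};

// Hand-written analogue of a marker: evaluate the desc expression and
// visit only the union field whose tag matches.
void
mark_binding (binding *b)
{
  switch (b->has_level)
    {
    case 0:
      if (b->xscope.scope)
        b->xscope.scope->marked = true;
      break;
    case 1:
      if (b->xscope.level)
        b->xscope.level->marked = true;
      break;
    default:
      break;   // no tag matched and no field is tagged "default"
    }
}
```

Here @code{%1} is used because, inside @code{desc}, the current
structure is the union itself and the discriminator lives in the
structure that contains it.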
244 The @code{desc} and @code{tag} options can also be used for inheritance
245 to denote which subclass an instance is. See @ref{Inheritance and GTY}
246 for more information.
250 @item param_is (@var{type})
253 Sometimes it's convenient to define some data structure to work on
254 generic pointers (that is, @code{PTR}) and then use it with a specific
255 type. @code{param_is} specifies the real type pointed to, and
@code{use_param} says where in the generic data structure that type
should be put.
259 For instance, to have a @code{htab_t} that points to trees, one would
260 write the definition of @code{htab_t} like this:
262 typedef struct GTY(()) @{
264 void ** GTY ((use_param, @dots{})) entries;
268 and then declare variables like this:
270 static htab_t GTY ((param_is (union tree_node))) ict;
273 @findex param@var{n}_is
274 @findex use_param@var{n}
275 @item param@var{n}_is (@var{type})
276 @itemx use_param@var{n}
278 In more complicated cases, the data structure might need to work on
279 several different types, which might not necessarily all be pointers.
280 For this, @code{param1_is} through @code{param9_is} may be used to
specify the real type of a field identified by @code{use_param1} through
@code{use_param9}.
287 When a structure contains another structure that is parameterized,
there's no need to do anything special; the inner structure inherits the
289 parameters of the outer one. When a structure contains a pointer to a
290 parameterized structure, the type machinery won't automatically detect
291 this (it could, it just doesn't yet), so it's necessary to tell it that
292 the pointed-to structure should use the same parameters as the outer
293 structure. This is done by marking the pointer with the
294 @code{use_params} option.
299 @code{deletable}, when applied to a global variable, indicates that when
garbage collection runs, there's no need to mark anything pointed to
by this variable; it can just be set to @code{NULL} instead. This is used
302 to keep a list of free structures around for re-use.
305 @item if_marked ("@var{expression}")
307 Suppose you want some kinds of object to be unique, and so you put them
308 in a hash table. If garbage collection marks the hash table, these
309 objects will never be freed, even if the last other reference to them
310 goes away. GGC has special handling to deal with this: if you use the
311 @code{if_marked} option on a global hash table, GGC will call the
312 routine whose name is the parameter to the option on each hash table
313 entry. If the routine returns nonzero, the hash table entry will
be marked as usual. If the routine returns zero, the hash table entry
will be removed.
317 The routine @code{ggc_marked_p} can be used to determine if an element
318 has been marked already; in fact, the usual case is to use
319 @code{if_marked ("ggc_marked_p")}.
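The behaviour described above can be sketched with a toy table. Nothing
below is GGC code: @code{marked_objects}, @code{toy_marked_p} and
@code{sweep_table} are hypothetical names, a @code{std::set} stands in
for the hash table, and the sketch assumes that entries for which the
routine returns zero are dropped from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Toy stand-in for the collector's mark state; real GGC keeps its own.
std::set<const void *> marked_objects;

// Toy analogue of ggc_marked_p: nonzero if P was reached during marking.
int
toy_marked_p (const void *p)
{
  return marked_objects.count (p) != 0;
}

// Keep entries for which IF_MARKED returns nonzero and drop the rest.
// Returns the number of entries removed.
std::size_t
sweep_table (std::set<const void *> &table, int (*if_marked) (const void *))
{
  std::size_t removed = 0;
  for (std::set<const void *>::iterator it = table.begin ();
       it != table.end ();)
    if (if_marked (*it))
      ++it;
    else
      {
        table.erase (it++);
        ++removed;
      }
  return removed;
}
```

The table itself never keeps an object alive in this model; an entry
survives only if something else has already marked the object.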
322 @item mark_hook ("@var{hook-routine-name}")
324 If provided for a structure or union type, the given
325 @var{hook-routine-name} (between double-quotes) is the name of a
326 routine called when the garbage collector has just marked the data as
327 reachable. This routine should not change the data, or call any ggc
routine. Its only argument is a pointer to the just marked (const)
structure or union.
334 When applied to a field, @code{maybe_undef} indicates that it's OK if
the structure that this field points to is never defined, so long as
336 this field is always @code{NULL}. This is used to avoid requiring
337 backends to define certain optional structures. It doesn't work with
341 @item nested_ptr (@var{type}, "@var{to expression}", "@var{from expression}")
343 The type machinery expects all pointers to point to the start of an
344 object. Sometimes for abstraction purposes it's convenient to have
345 a pointer which points inside an object. So long as it's possible to
346 convert the original object to and from the pointer, such pointers
347 can still be used. @var{type} is the type of the original object,
348 the @var{to expression} returns the pointer given the original object,
349 and the @var{from expression} returns the original object given
the pointer. The pointer will be available using the @code{%h}
escape.
355 @findex chain_circular
356 @item chain_next ("@var{expression}")
357 @itemx chain_prev ("@var{expression}")
358 @itemx chain_circular ("@var{expression}")
360 It's helpful for the type machinery to know if objects are often
361 chained together in long lists; this lets it generate code that uses
362 less stack space by iterating along the list instead of recursing down
363 it. @code{chain_next} is an expression for the next item in the list,
364 @code{chain_prev} is an expression for the previous item. For singly
365 linked lists, use only @code{chain_next}; for doubly linked lists, use
366 both. The machinery requires that taking the next item of the
367 previous item gives the original item. @code{chain_circular} is similar
to @code{chain_next}, but can be used for circular singly linked lists.
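The stack-space argument can be seen in a small hand-written sketch.
The @code{list_node} type, its @code{marked} flag and @code{mark_chain}
are hypothetical, and @code{GTY} is defined as an empty macro so that
the fragment compiles on its own; the code gengtype actually generates
is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#define GTY(x) /* ignored by the compiler */

struct GTY ((chain_next ("%h.next"))) list_node
{
  list_node *next;
  bool marked;
};

// Walk the chain in a loop: the stack depth stays constant however long
// the list is, whereas recursing on 'next' would use one frame per node.
// Stopping at an already marked node also ends the walk on a circular list.
std::size_t
mark_chain (list_node *n)
{
  std::size_t count = 0;
  for (; n != NULL && !n->marked; n = n->next)
    {
      n->marked = true;
      ++count;
    }
  return count;
}
```

A list of a hundred thousand nodes is marked by this loop without any
growth in stack usage.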
371 @item reorder ("@var{function name}")
373 Some data structures depend on the relative ordering of pointers. If
374 the precompiled header machinery needs to change that ordering, it
375 will call the function referenced by the @code{reorder} option, before
376 changing the pointers in the object that's pointed to by the field the
377 option applies to. The function must take four arguments, with the
378 signature @samp{@w{void *, void *, gt_pointer_operator, void *}}.
379 The first parameter is a pointer to the structure that contains the
380 object being updated, or the object itself if there is no containing
381 structure. The second parameter is a cookie that should be ignored.
382 The third parameter is a routine that, given a pointer, will update it
to its correct new value. The fourth parameter is a cookie that must
be passed as the second argument to that routine.
386 PCH cannot handle data structures that depend on the absolute values
387 of pointers. @code{reorder} functions can be expensive. When
388 possible, it is better to depend on properties of the data, like an ID
389 number or the hash of a string instead.
394 The @code{atomic} option can only be used with pointers. It informs
395 the GC machinery that the memory that the pointer points to does not
396 contain any pointers, and hence it should be treated by the GC and PCH
397 machinery as an ``atomic'' block of memory that does not need to be
398 examined when scanning memory for pointers. In particular, the
399 machinery will not scan that memory for pointers to mark them as
400 reachable (when marking pointers for GC) or to relocate them (when
403 The @code{atomic} option differs from the @code{skip} option.
404 @code{atomic} keeps the memory under Garbage Collection, but makes the
405 GC ignore the contents of the memory. @code{skip} is more drastic in
406 that it causes the pointer and the memory to be completely ignored by
407 the Garbage Collector. So, memory marked as @code{atomic} is
automatically freed when no longer reachable, while memory marked as
@code{skip} is not.
The @code{atomic} option must be used with great care, because all
sorts of problems can occur if it is used incorrectly, that is, if the memory
413 the pointer points to does actually contain a pointer.
415 Here is an example of how to use it:
417 struct GTY(()) my_struct @{
418 int number_of_elements;
419 unsigned int * GTY ((atomic)) elements;
422 In this case, @code{elements} is a pointer under GC, and the memory it
423 points to needs to be allocated using the Garbage Collector, and will
424 be freed automatically by the Garbage Collector when it is no longer
425 referenced. But the memory that the pointer points to is an array of
426 @code{unsigned int} elements, and the GC must not try to scan it to
427 find pointers to mark or relocate, which is why it is marked with the
428 @code{atomic} option.
Note that, currently, global variables cannot be marked with
431 @code{atomic}; only fields of a struct can. This is a known
432 limitation. It would be useful to be able to mark global pointers
433 with @code{atomic} to make the PCH machinery aware of them so that
434 they are saved and restored correctly to PCH files.
437 @item special ("@var{name}")
439 The @code{special} option is used to mark types that have to be dealt
440 with by special case machinery. The parameter is the name of the
441 special case. See @file{gengtype.c} for further details. Avoid
442 adding new special cases unless there is no other alternative.
447 The @code{user} option indicates that the code to mark structure
448 fields is completely handled by user-provided routines. See section
449 @ref{User GC} for details on what functions need to be provided.
452 @node Inheritance and GTY
453 @section Support for inheritance
454 gengtype has some support for simple class hierarchies. You can use
455 this to have gengtype autogenerate marking routines, provided:
459 There must be a concrete base class, with a discriminator expression
460 that can be used to identify which subclass an instance is.
462 Only single inheritance is used.
464 None of the classes within the hierarchy are templates.
467 If your class hierarchy does not fit in this pattern, you must use
468 @ref{User GC} instead.
470 The base class and its discriminator must be identified using the ``desc''
471 option. Each concrete subclass must use the ``tag'' option to identify
472 which value of the discriminator it corresponds to.
474 Every class in the hierarchy must have a @code{GTY(())} marker, as
475 gengtype will only attempt to parse classes that have such a marker
476 @footnote{Classes lacking such a marker will not be identified as being
477 part of the hierarchy, and so the marking routines will not handle them,
leading to an assertion failure within the marking routines due to an
479 unknown tag value (assuming that assertions are enabled).}.
482 class GTY((desc("%h.kind"), tag("0"))) example_base
class GTY((tag("1"))) some_subclass : public example_base
class GTY((tag("2"))) some_other_subclass : public example_base
502 The generated marking routines for the above will contain a ``switch''
503 on ``kind'', visiting all appropriate fields. For example, if kind is
504 2, it will cast to ``some_other_subclass'' and visit fields a, b, and c.
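The shape of such a marking routine can be sketched by hand. The small
hierarchy below is hypothetical and separate from the example above
(@code{shape_base}, @code{shape_one}, @code{obj} and @code{mark_shape}
are illustrative names), and @code{GTY} is defined as an empty macro so
that the fragment compiles on its own.

```cpp
#include <cassert>
#include <cstddef>

#define GTY(x) /* ignored by the compiler */

struct obj { bool marked; };   // stand-in for a GC-managed object

class GTY ((desc ("%h.kind"), tag ("0"))) shape_base
{
public:
  int kind;
  obj *a;
};

class GTY ((tag ("1"))) shape_one : public shape_base
{
public:
  obj *b;
};

static void
mark (obj *o)
{
  if (o)
    o->marked = true;
}

// Hand-written analogue of a generated marker: switch on the
// discriminator, cast to the matching subclass, visit its fields.
void
mark_shape (shape_base *p)
{
  mark (p->a);   // base-class fields are visited for every kind
  switch (p->kind)
    {
    case 0:
      break;
    case 1:
      mark (static_cast<shape_one *> (p)->b);
      break;
    default:
      assert (!"unknown tag value");
    }
}
```

The @code{default} case corresponds to the assertion failure mentioned
in the footnote above: a subclass without a marker has no @code{case}
of its own.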
507 @section Support for user-provided GC marking routines
509 The garbage collector supports types for which no automatic marking
510 code is generated. For these types, the user is required to provide
511 three functions: one to act as a marker for garbage collection, and
two functions to act as marker and pointer walker for pre-compiled
headers.
515 Given a structure @code{struct GTY((user)) my_struct}, the following functions
516 should be defined to mark @code{my_struct}:
519 void gt_ggc_mx (my_struct *p)
521 /* This marks field 'fld'. */
525 void gt_pch_nx (my_struct *p)
527 /* This marks field 'fld'. */
531 void gt_pch_nx (my_struct *p, gt_pointer_operator op, void *cookie)
533 /* For every field 'fld', call the given pointer operator. */
  op (&(p->fld), cookie);
538 In general, each marker @code{M} should call @code{M} for every
539 pointer field in the structure. Fields that are not allocated in GC
540 or are not pointers must be ignored.
542 For embedded lists (e.g., structures with a @code{next} or @code{prev}
pointer), the marker must follow the chain and mark every element in
it.
546 Note that the rules for the pointer walker @code{gt_pch_nx (my_struct
547 *, gt_pointer_operator, void *)} are slightly different. In this
case, the operation @code{op} must be applied to the @emph{address} of
every pointer field.
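The difference between the marker and the pointer walker can be tried
out in isolation. This is a minimal sketch under stated assumptions:
@code{toy_pointer_operator} is a simplified stand-in for
@code{gt_pointer_operator}, @code{item}, @code{toy_mark} and
@code{relocate_to} are hypothetical, and @code{GTY} is defined as an
empty macro. Only two of the three required functions are shown.

```cpp
#include <cassert>
#include <cstddef>

#define GTY(x) /* ignored by the compiler */

// Simplified stand-in for GCC's gt_pointer_operator (an assumption).
typedef void (*toy_pointer_operator) (void *, void *);

struct item { bool marked; };

struct GTY ((user)) my_struct
{
  item *fld;   // pointer into GC memory
  int count;   // not a pointer, so every routine ignores it
};

// Toy marker for the field's type.
void
toy_mark (item *p)
{
  if (p)
    p->marked = true;
}

// Marker: calls the marker for each pointer field, passing its value.
void
gt_ggc_mx (my_struct *p)
{
  toy_mark (p->fld);
}

// Pointer walker: passes the address of each pointer field, so that
// the operator can overwrite the pointer in place.
void
gt_pch_nx (my_struct *p, toy_pointer_operator op, void *cookie)
{
  op (&(p->fld), cookie);
}

// Example operator: redirect the field to the object given as cookie.
void
relocate_to (void *field_addr, void *cookie)
{
  *static_cast<item **> (field_addr) = static_cast<item *> (cookie);
}
```

Passing the field's value instead of its address would leave the walker
unable to update the pointer.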
551 @subsection User-provided marking routines for template types
552 When a template type @code{TP} is marked with @code{GTY}, all
553 instances of that type are considered user-provided types. This means
554 that the individual instances of @code{TP} do not need to be marked
555 with @code{GTY}. The user needs to provide template functions to mark
556 all the fields of the type.
558 The following code snippets represent all the functions that need to
be provided. Note that type @code{TP} may refer to more than one
type. In these snippets, there is only one type @code{T}, but there
could be more.
565 void gt_ggc_mx (TP<T> *tp)
567 extern void gt_ggc_mx (T&);
569 /* This marks field 'fld' of type 'T'. */
574 void gt_pch_nx (TP<T> *tp)
576 extern void gt_pch_nx (T&);
578 /* This marks field 'fld' of type 'T'. */
583 void gt_pch_nx (TP<T *> *tp, gt_pointer_operator op, void *cookie)
  /* For every field 'fld' of 'tp' with type 'T *', call the given
     pointer operator.  */
587 op (&(tp->fld), cookie);
void gt_pch_nx (TP<T> *tp, gt_pointer_operator op, void *cookie)
593 extern void gt_pch_nx (T *, gt_pointer_operator, void *);
595 /* For every field 'fld' of 'tp' with type 'T', call the pointer
596 walker for all the fields of T. */
597 gt_pch_nx (&(tp->fld), op, cookie);
Support for user-defined types is currently limited. The following
restrictions apply:
605 @item Type @code{TP} and all the argument types @code{T} must be
606 marked with @code{GTY}.
608 @item Type @code{TP} can only have type names in its argument list.
610 @item The pointer walker functions are different for @code{TP<T>} and
611 @code{TP<T *>}. In the case of @code{TP<T>}, references to
612 @code{T} must be handled by calling @code{gt_pch_nx} (which
613 will, in turn, walk all the pointers inside fields of @code{T}).
614 In the case of @code{TP<T *>}, references to @code{T *} must be
615 handled by calling the @code{op} function on the address of the
616 pointer (see the code snippets above).
620 @section Marking Roots for the Garbage Collector
621 @cindex roots, marking
622 @cindex marking roots
624 In addition to keeping track of types, the type machinery also locates
625 the global variables (@dfn{roots}) that the garbage collector starts
626 at. Roots must be declared using one of the following syntaxes:
630 @code{extern GTY(([@var{options}])) @var{type} @var{name};}
632 @code{static GTY(([@var{options}])) @var{type} @var{name};}
638 @code{GTY(([@var{options}])) @var{type} @var{name};}
641 is @emph{not} accepted. There should be an @code{extern} declaration
642 of such a variable in a header somewhere---mark that, not the
definition. Or, if the variable is only used in one file, make it
@code{static}.
647 @section Source Files Containing Type Information
648 @cindex generated files
649 @cindex files, generated
651 Whenever you add @code{GTY} markers to a source file that previously
652 had none, or create a new source file containing @code{GTY} markers,
653 there are three things you need to do:
657 You need to add the file to the list of source files the type
658 machinery scans. There are four cases:
662 For a back-end file, this is usually done
663 automatically; if not, you should add it to @code{target_gtfiles} in
664 the appropriate port's entries in @file{config.gcc}.
667 For files shared by all front ends, add the filename to the
668 @code{GTFILES} variable in @file{Makefile.in}.
671 For files that are part of one front end, add the filename to the
672 @code{gtfiles} variable defined in the appropriate
673 @file{config-lang.in}.
674 Headers should appear before non-headers in this list.
677 For files that are part of some but not all front ends, add the
filename to the @code{gtfiles} variable of @emph{all} the front ends
using it.
683 If the file was a header file, you'll need to check that it's included
684 in the right place to be visible to the generated files. For a back-end
685 header file, this should be done automatically. For a front-end header
686 file, it needs to be included by the same file that includes
687 @file{gtype-@var{lang}.h}. For other header files, it needs to be
688 included in @file{gtype-desc.c}, which is a generated file, so add it to
689 @code{ifiles} in @code{open_base_file} in @file{gengtype.c}.
691 For source files that aren't header files, the machinery will generate a
692 header file that should be included in the source file you just changed.
693 The file will be called @file{gt-@var{path}.h} where @var{path} is the
694 pathname relative to the @file{gcc} directory with slashes replaced by
695 @verb{|-|}, so for example the header file to be included in
@file{cp/parser.c} is called @file{gt-cp-parser.h}. The
697 generated header file should be included after everything else in the
source file. Don't forget to mention this file as a dependency in the
@file{Makefile}!
703 For language frontends, there is another file that needs to be included
704 somewhere. It will be called @file{gtype-@var{lang}.h}, where
705 @var{lang} is the name of the subdirectory the language is contained in.
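The naming rule for the generated @file{gt-@var{path}.h} headers can be
written down as a small function. This is an illustrative sketch, not
code from @code{gengtype}: @code{gt_header_name} is a hypothetical name,
and the sketch assumes that the source file's extension is dropped
before @file{.h} is added, as the @file{cp/parser.c} example implies.

```cpp
#include <cassert>
#include <string>

// Map a path relative to the gcc directory to the name of its generated
// header: drop the extension, replace '/' by '-', then add the "gt-"
// prefix and the ".h" suffix.
std::string
gt_header_name (const std::string &path)
{
  std::string stem = path;
  std::string::size_type dot = stem.rfind ('.');
  std::string::size_type slash = stem.rfind ('/');
  // Only treat the dot as an extension if it is in the last component.
  if (dot != std::string::npos
      && (slash == std::string::npos || dot > slash))
    stem.erase (dot);
  for (std::string::size_type i = 0; i < stem.size (); ++i)
    if (stem[i] == '/')
      stem[i] = '-';
  return "gt-" + stem + ".h";
}
```

For example, @code{gt_header_name ("cp/parser.c")} yields
@file{gt-cp-parser.h}.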
707 Plugins can add additional root tables. Run the @code{gengtype}
708 utility in plugin mode as @code{gengtype -P pluginout.h @var{source-dir}
709 @var{file-list} @var{plugin*.c}} with your plugin files
710 @var{plugin*.c} using @code{GTY} to generate the @var{pluginout.h} file.
The GCC build tree must be present in that mode.
714 @node Invoking the garbage collector
715 @section How to invoke the garbage collector
716 @cindex garbage collector, invocation
719 The GCC garbage collector GGC is only invoked explicitly. In contrast
720 with many other garbage collectors, it is not implicitly invoked by
721 allocation routines when a lot of memory has been consumed. So the
722 only way to have GGC reclaim storage is to call the @code{ggc_collect}
723 function explicitly. This call is an expensive operation, as it may
have to scan the entire heap. Beware that, unlike many other garbage
collectors, GGC does not follow local variables (on the GCC call
stack) during such an invocation: you should reference all your data
from static or external @code{GTY}-ed variables, and it is advisable to
call @code{ggc_collect} with a shallow call stack. GGC is an exact
mark-and-sweep garbage collector (so it does not scan the call stack
for pointers). In practice GCC passes don't often call
@code{ggc_collect} themselves, because it is called by the pass manager
between passes.
733 At the time of the @code{ggc_collect} call all pointers in the GC-marked
734 structures must be valid or @code{NULL}. In practice this means that
735 there should not be uninitialized pointer fields in the structures even
736 if your code never reads or writes those fields at a particular
737 instance. One way to ensure this is to use cleared versions of
allocators unless all the fields are initialized manually immediately
after allocation.
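The reason for preferring cleared allocation can be shown with a toy
traversal that, like a collector, follows every pointer field it finds.
Nothing here is GGC code: @code{pair_node}, @code{alloc_cleared_pair_node}
and @code{count_reachable} are hypothetical, and @code{calloc} stands in
for a cleared GC allocator. The sketch also assumes that zero-filled
memory reads back as null pointers, which holds on the hosts GCC
supports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct pair_node
{
  pair_node *left;
  pair_node *right;
  int id;
};

// Toy analogue of a cleared allocator: all fields start out zeroed, so
// every pointer field is NULL until it is assigned.
pair_node *
alloc_cleared_pair_node ()
{
  return static_cast<pair_node *> (std::calloc (1, sizeof (pair_node)));
}

// Follows every pointer field, whether or not the program ever uses it.
// An uninitialized field would send this traversal to a garbage address.
int
count_reachable (const pair_node *n)
{
  if (n == NULL)
    return 0;
  return 1 + count_reachable (n->left) + count_reachable (n->right);
}
```

Here @code{right} is never assigned, yet the traversal is safe because
the field was cleared at allocation time.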
741 @node Troubleshooting
742 @section Troubleshooting the garbage collector
743 @cindex garbage collector, troubleshooting
745 With the current garbage collector implementation, most issues should
746 show up as GCC compilation errors. Some of the most commonly
747 encountered issues are described below.
750 @item Gengtype does not produce allocators for a @code{GTY}-marked type.
751 Gengtype checks if there is at least one possible path from GC roots to
752 at least one instance of each type before outputting allocators. If
753 there is no such path, the @code{GTY} markers will be ignored and no
754 allocators will be output. Solve this by making sure that there exists
at least one such path. If creating it is infeasible or raises a ``code
smell'', consider whether you really must use GC to allocate such a type.
758 @item Link-time errors about undefined @code{gt_ggc_r_foo_bar} and
759 similarly-named symbols. Check if your @file{foo_bar} source file has
760 @code{#include "gt-foo_bar.h"} as its very last line.