1 @c Copyright (C) 2002-2024 Free Software Foundation, Inc.
2 @c This is part of the GCC manual.
3 @c For copying conditions, see the file gcc.texi.
6 @chapter Memory Management and Type Information
10 GCC uses some fairly sophisticated memory management techniques, which
11 involve determining information about GCC's data structures from GCC's
12 source code and using this information to perform garbage collection and
13 implement precompiled headers.
15 A full C++ parser would be too complicated for this task, so a limited
16 subset of C++ is interpreted and special markers are used to determine
17 what parts of the source to look at. All @code{struct}, @code{union}
18 and @code{template} structure declarations that define data structures
19 that are allocated under control of the garbage collector must be
20 marked. All global variables that hold pointers to garbage-collected
21 memory must also be marked. Finally, all global variables that need
22 to be saved and restored by a precompiled header must be marked. (The
23 precompiled header mechanism can only save static variables if they're
24 scalar. Complex data structures must be allocated in garbage-collected
25 memory to be saved in a precompiled header.)
27 The full format of a marker is
29 GTY (([@var{option}] [(@var{param})], [@var{option}] [(@var{param})] @dots{}))
32 but in most cases no options are needed. The outer double parentheses
33 are still necessary, though: @code{GTY(())}. Markers can appear:
@itemize @bullet
@item
In a structure definition, before the open brace;
@item
In a global variable declaration, after the keyword @code{static} or
@code{extern}; and
@item
In a structure field definition, before the name of the field.
@end itemize
45 Here are some examples of marking simple data structures and globals.
@smallexample
struct GTY(()) @var{tag}
@{
  @var{fields}@dots{}
@};

typedef struct GTY(()) @var{tag}
@{
  @var{fields}@dots{}
@} *@var{typename};

static GTY(()) struct @var{tag} *@var{list};   /* @r{points to GC memory} */
static GTY(()) int @var{counter};        /* @r{save counter in a PCH} */
@end smallexample
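
The following is a minimal, self-contained sketch of how such markers read in ordinary C++. It assumes that the markers are invisible to the compiler, which it models with a locally defined empty @code{GTY} macro. The names @code{node}, @code{node_list}, @code{node_counter} and @code{push_node} are hypothetical and do not come from GCC.

```cpp
#include <cassert>

// Assumption: an empty stand-in for the GTY marker, so that only a
// gengtype-like scanner would ever look at the options.
#define GTY(x)

struct GTY(()) node
{
  int value;
  struct node *next;
};

static GTY(()) struct node *node_list;  // points to GC memory: a root
static GTY(()) int node_counter;        // scalar to be saved in a PCH

// Hypothetical helper: push N onto the list and count it.
static void
push_node (node *n)
{
  n->next = node_list;
  node_list = n;
  ++node_counter;
}
```

Because the marker expands to nothing here, the structure and the two globals behave exactly like their unmarked equivalents.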
62 The parser understands simple typedefs such as
63 @code{typedef struct @var{tag} *@var{name};} and
64 @code{typedef int @var{name};}.
65 These don't need to be marked.
However, in combination with @code{GTY}, avoid using typedefs such as
@code{typedef int_hash<@dots{}> @var{name};}
because these lead to generated code that recurses infinitely.
See @uref{https://gcc.gnu.org/PR103157,PR103157}.
Instead, you may use
@code{struct @var{name} : int_hash<@dots{}> @{@};},
for example.
75 Since @code{gengtype}'s understanding of C++ is limited, there are
76 several constructs and declarations that are not supported inside
77 classes/structures marked for automatic GC code generation. The
78 following C++ constructs produce a @code{gengtype} error on
79 structures/classes marked for automatic GC code generation:
@itemize @bullet
@item
Type definitions inside classes/structures are not supported.
@item
Enumerations inside classes/structures are not supported.
@end itemize
88 If you have a class or structure using any of the above constructs,
89 you need to mark that class as @code{GTY ((user))} and provide your
90 own marking routines (see section @ref{User GC} for details).
It is always valid to include function definitions inside classes.
Those are always ignored by @code{gengtype}, as it only cares about
data members.
@menu
* GTY Options::         What goes inside a @code{GTY(())}.
* Inheritance and GTY:: Adding GTY to a class hierarchy.
* User GC::             Adding user-provided GC marking routines.
* GGC Roots::           Making global variables GGC roots.
* Files::               How the generated files work.
* Invoking the garbage collector::   How to invoke the garbage collector.
* Troubleshooting::     When something does not work as expected.
@end menu
@node GTY Options
@section The Inside of a @code{GTY(())}
109 Sometimes the C code is not enough to fully describe the type
110 structure. Extra information can be provided with @code{GTY} options
111 and additional markers. Some options take a parameter, which may be
112 either a string or a type name, depending on the parameter. If an
113 option takes no parameter, it is acceptable either to omit the
parameter entirely, or to provide an empty string as a parameter.  For
example, @code{@w{GTY ((skip))}} and @code{@w{GTY ((skip ("")))}} are
equivalent.
118 When the parameter is a string, often it is a fragment of C code. Four
119 special escapes may be used in these strings, to refer to pieces of
120 the data structure being marked:
@cindex % in GTY option
@table @code
@item %h
The current structure.
@item %1
The structure that immediately contains the current structure.
@item %0
The outermost structure that contains the current structure.
@item %a
A partial expression of the form @code{[i1][i2]@dots{}} that indexes
the array item currently being marked.
@end table
For instance, suppose that you have a structure of the form
@smallexample
struct A @{
  @dots{}
@};
struct B @{
  struct A foo[12];
@};
@end smallexample

@noindent
and @code{b} is a variable of type @code{struct B}.  When marking
146 @samp{b.foo[11]}, @code{%h} would expand to @samp{b.foo[11]},
147 @code{%0} and @code{%1} would both expand to @samp{b}, and @code{%a}
148 would expand to @samp{[11]}.
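
To make the substitution concrete, here is a small sketch that expands the four escapes in an option string. It only illustrates the textual rule described above and is not how @code{gengtype} is implemented; @code{escape_values} and @code{expand_escapes} are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the text each escape stands for.
struct escape_values
{
  std::string h;     // %h: the current structure
  std::string one;   // %1: the immediately containing structure
  std::string zero;  // %0: the outermost containing structure
  std::string a;     // %a: the array index expression
};

// Replace %h, %1, %0 and %a in TMPL; any other text is copied unchanged.
std::string
expand_escapes (const std::string &tmpl, const escape_values &v)
{
  std::string out;
  for (std::size_t i = 0; i < tmpl.size (); ++i)
    {
      if (tmpl[i] == '%' && i + 1 < tmpl.size ())
        {
          switch (tmpl[i + 1])
            {
            case 'h': out += v.h; ++i; continue;
            case '1': out += v.one; ++i; continue;
            case '0': out += v.zero; ++i; continue;
            case 'a': out += v.a; ++i; continue;
            default: break;
            }
        }
      out += tmpl[i];
    }
  return out;
}
```

With the values from the example above (@code{%h} as @samp{b.foo[11]}, @code{%1} and @code{%0} as @samp{b}, @code{%a} as @samp{[11]}), the string @samp{%h.num_elem} becomes @samp{b.foo[11].num_elem}.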
150 As in ordinary C, adjacent strings will be concatenated; this is
151 helpful when you have a complicated expression.
@smallexample
GTY ((chain_next ("TREE_CODE (&%h.generic) == INTEGER_TYPE"
                  " ? TYPE_NEXT_VARIANT (&%h.generic)"
                  " : TREE_CHAIN (&%h.generic)")))
@end smallexample
160 The available options are:
164 @item length ("@var{expression}")
166 There are two places the type machinery will need to be explicitly told
167 the length of an array of non-atomic objects. The first case is when a
168 structure ends in a variable-length array, like this:
@smallexample
struct GTY(()) rtvec_def @{
  int num_elem;         /* @r{number of elements} */
  rtx GTY ((length ("%h.num_elem"))) elem[1];
@};
@end smallexample
176 In this case, the @code{length} option is used to override the specified
177 array length (which should usually be @code{1}). The parameter of the
178 option is a fragment of C code that calculates the length.
180 The second case is when a structure or a global variable contains a
181 pointer to an array, like this:
183 struct gimple_omp_for_iter * GTY((length ("%h.collapse"))) iter;
185 In this case, @code{iter} has been allocated by writing something like
187 x->iter = ggc_alloc_cleared_vec_gimple_omp_for_iter (collapse);
189 and the @code{collapse} provides the length of the field.
191 This second use of @code{length} also works on global variables, like:
193 static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
196 Note that the @code{length} option is only meant for use with arrays of
197 non-atomic objects, that is, objects that contain pointers pointing to
198 other GTY-managed objects. For other GC-allocated arrays and strings
199 you should use @code{atomic} or @code{string_length}.
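
As a rough illustration of the second case, the sketch below shows a structure holding a pointer to an array whose length lives in a sibling field, together with a hand-written stand-in for the loop that a generated marker would need. It assumes an empty @code{GTY} macro so that it compiles on its own; @code{payload}, @code{slot}, @code{slot_table} and @code{mark_slot_table} are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

#define GTY(x)  // assumption: empty stand-in for the marker

struct GTY(()) payload { bool marked; };
struct GTY(()) slot { payload *p; };

struct GTY(()) slot_table
{
  std::size_t count;
  // The option tells the type machinery how many elements to visit.
  slot * GTY ((length ("%h.count"))) slots;
};

// Hand-written stand-in for a generated marker: visit exactly
// "%h.count" elements, with %h being *T.  Returns the number of
// payloads that were marked.
inline std::size_t
mark_slot_table (slot_table *t)
{
  std::size_t visited = 0;
  for (std::size_t i = 0; i < t->count; ++i)
    if (t->slots[i].p)
      {
        t->slots[i].p->marked = true;
        ++visited;
      }
  return visited;
}
```

Without the length, the marker would have no way of knowing where the array ends, which is why the option is required for arrays of non-atomic objects.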
201 @findex string_length
202 @item string_length ("@var{expression}")
204 In order to simplify production of PCH, a structure member that is a plain
205 array of bytes (an optionally @code{const} and/or @code{unsigned} @code{char
206 *}) is treated specially by the infrastructure. Even if such an array has not
207 been allocated in GC-controlled memory, it will still be written properly into
208 a PCH. The machinery responsible for this needs to know the length of the
209 data; by default, the length is determined by calling @code{strlen} on the
210 pointer. The @code{string_length} option specifies an alternate way to
211 determine the length, such as by inspecting another struct member:
@smallexample
struct GTY(()) non_terminated_string @{
  size_t sz;
  const char * GTY((string_length ("%h.sz"))) data;
@};
@end smallexample
Similarly, this is useful for (regular NUL-terminated) strings with
NUL characters embedded (that the default @code{strlen} use would run
into):

@smallexample
struct GTY(()) multi_string @{
  const char * GTY((string_length ("%h.len + 1"))) str;
  size_t len;
@};
@end smallexample
231 The @code{string_length} option currently is not supported for (fields
232 in) global variables.
233 @c <https://inbox.sourceware.org/87bkgqvlst.fsf@euler.schwinge.homeip.net>
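
The difference between the default and an explicit @code{string_length} can be sketched as follows. The snippet assumes an empty @code{GTY} macro; the helper names @code{length_by_strlen} and @code{length_from_option} are hypothetical and merely evaluate the two ways of obtaining the length described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define GTY(x)  // assumption: empty stand-in for the marker

struct GTY(()) multi_string
{
  const char * GTY((string_length ("%h.len + 1"))) str;
  std::size_t len;  // number of characters, excluding the final NUL
};

// Default behaviour: stop at the first NUL character.
inline std::size_t
length_by_strlen (const multi_string &m)
{
  return std::strlen (m.str);
}

// What the option's expression "%h.len + 1" evaluates to for M.
inline std::size_t
length_from_option (const multi_string &m)
{
  return m.len + 1;
}
```

For the five characters @samp{ab\0cd}, the default stops after two characters, while the option's expression covers all five characters plus the terminating NUL.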
@findex skip
@item skip

If @code{skip} is applied to a field, the type machinery will ignore it.
This is somewhat dangerous; the only safe use is in a union when one
field really isn't ever used.
@findex callback
@item callback

@code{callback} should be applied to fields of pointer-to-function type.
It causes the field to be ignored, similarly to @code{skip}, with one
exception: when writing a PCH, if the field is non-@code{NULL}, the
machinery records the field's address so that the pointer can be
relocated if the process reading the PCH has a different load base
from the process that wrote it.
@findex for_user
@item for_user

Use this to mark types that need to be marked by user GC routines but are
not referred to in a template argument.  For example, if you have a user GC
type @code{T1} and a non-user GC type @code{T2}, you can give @code{T2} the
@code{for_user} option so that the marking functions for @code{T1} can call
non-mangled functions to mark @code{T2}.
@findex desc
@findex tag
@findex default
@item desc ("@var{expression}")
@itemx tag ("@var{constant}")
@itemx default
266 The type machinery needs to be told which field of a @code{union} is
267 currently active. This is done by giving each field a constant
268 @code{tag} value, and then specifying a discriminator using @code{desc}.
269 The value of the expression given by @code{desc} is compared against
270 each @code{tag} value, each of which should be different. If no
271 @code{tag} is matched, the field marked with @code{default} is used if
272 there is one, otherwise no field in the union will be marked.
274 In the @code{desc} option, the ``current structure'' is the union that
275 it discriminates. Use @code{%1} to mean the structure containing it.
There are no escapes available to the @code{tag} option, since it is a
constant.
@smallexample
struct GTY(()) tree_binding
@{
  struct tree_common common;
  union tree_binding_u @{
    tree GTY ((tag ("0"))) scope;
    struct cp_binding_level * GTY ((tag ("1"))) level;
  @} GTY ((desc ("BINDING_HAS_LEVEL_P ((tree)&%0)"))) xscope;
@};
@end smallexample
In this example, the value of @code{BINDING_HAS_LEVEL_P} when applied to a
@code{struct tree_binding *} is presumed to be 0 or 1.  If it is 1, the type
mechanism treats the field @code{level} as present; if it is 0, it
treats the field @code{scope} as present.
297 The @code{desc} and @code{tag} options can also be used for inheritance
298 to denote which subclass an instance is. See @ref{Inheritance and GTY}
299 for more information.
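
The effect of @code{desc} and @code{tag} on a union can be sketched with a hand-written marker that switches on the discriminator. The snippet assumes an empty @code{GTY} macro and uses hypothetical types (@code{scope_info}, @code{level_info}, @code{binding}); the function @code{mark_binding} is only a stand-in for code that @code{gengtype} would generate.

```cpp
#include <cassert>

#define GTY(x)  // assumption: empty stand-in for the marker

struct GTY(()) scope_info { bool marked; };
struct GTY(()) level_info { bool marked; };

struct GTY(()) binding
{
  int has_level;  // discriminator: 0 selects scope, 1 selects level
  union binding_u
  {
    scope_info * GTY ((tag ("0"))) scope;
    level_info * GTY ((tag ("1"))) level;
  } GTY ((desc ("%1.has_level"))) xscope;  // %1: the containing struct
};

// Stand-in for a generated marker: compare the desc value against
// each tag and mark only the active field.
inline void
mark_binding (binding *b)
{
  switch (b->has_level)
    {
    case 0:
      if (b->xscope.scope)
        b->xscope.scope->marked = true;
      break;
    case 1:
      if (b->xscope.level)
        b->xscope.level->marked = true;
      break;
    default:
      break;  // no tag matched and no default field: mark nothing
    }
}
```

Reading the inactive field would interpret a pointer of one type as another, which is why the marker must consult the discriminator first.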
@findex cache
@item cache

When the @code{cache} option is applied to a global variable,
@code{gt_cleare_cache} is called on that variable between the mark and
sweep phases of garbage collection.  The @code{gt_cleare_cache} function
is free to mark blocks as used, or to clear pointers in the variable.

In a hash table, the @code{gt_cleare_cache} function discards entries
if the key is not marked, or marks the value if the key is marked.
312 Note that caches should generally use @code{deletable} instead;
313 @code{cache} is only preferable if the value is impractical to
314 recompute from the key when needed.
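
The hash-table rule can be illustrated with a toy model. This sketch uses a plain @code{std::vector} rather than GCC's hash table, and @code{object}, @code{cache_entry} and @code{clear_cache_entries} are hypothetical names chosen for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of a GC-managed object with a mark bit.
struct object { bool marked; };

struct cache_entry
{
  object *key;
  object *value;
};

// Run between mark and sweep: drop entries whose key did not survive
// marking, and keep the value alive for entries whose key did.
inline void
clear_cache_entries (std::vector<cache_entry> &cache)
{
  cache.erase (std::remove_if (cache.begin (), cache.end (),
                               [] (const cache_entry &e)
                               { return !e.key->marked; }),
               cache.end ());
  for (cache_entry &e : cache)
    e.value->marked = true;
}
```

The cache therefore never keeps a key alive on its own, but a live key does keep its cached value alive.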
@findex deletable
@item deletable

@code{deletable}, when applied to a global variable, indicates that when
garbage collection runs, there's no need to mark anything pointed to
by this variable; it can just be set to @code{NULL} instead.  This is used
to keep a list of free structures around for re-use.
@findex maybe_undef
@item maybe_undef

When applied to a field, @code{maybe_undef} indicates that it's OK if
the structure that this field points to is never defined, so long as
this field is always @code{NULL}.  This is used to avoid requiring
backends to define certain optional structures.  It doesn't work with
language frontends.
334 @item nested_ptr (@var{type}, "@var{to expression}", "@var{from expression}")
336 The type machinery expects all pointers to point to the start of an
337 object. Sometimes for abstraction purposes it's convenient to have
338 a pointer which points inside an object. So long as it's possible to
339 convert the original object to and from the pointer, such pointers
340 can still be used. @var{type} is the type of the original object,
341 the @var{to expression} returns the pointer given the original object,
342 and the @var{from expression} returns the original object given
the pointer.  The pointer will be available using the @code{%h}
escape.
348 @findex chain_circular
349 @item chain_next ("@var{expression}")
350 @itemx chain_prev ("@var{expression}")
351 @itemx chain_circular ("@var{expression}")
353 It's helpful for the type machinery to know if objects are often
354 chained together in long lists; this lets it generate code that uses
355 less stack space by iterating along the list instead of recursing down
356 it. @code{chain_next} is an expression for the next item in the list,
357 @code{chain_prev} is an expression for the previous item. For singly
358 linked lists, use only @code{chain_next}; for doubly linked lists, use
359 both. The machinery requires that taking the next item of the
previous item gives the original item.  @code{chain_circular} is similar
to @code{chain_next}, but can be used for circular singly linked lists.
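
The benefit of @code{chain_next} can be sketched with a marker that walks the list in a loop instead of recursing once per element. The snippet assumes an empty @code{GTY} macro; @code{list_node} and @code{mark_chain} are hypothetical names, and the loop is only a stand-in for generated code.

```cpp
#include <cassert>

#define GTY(x)  // assumption: empty stand-in for the marker

struct GTY((chain_next ("%h.next"))) list_node
{
  bool marked;
  list_node *next;
};

// Iterative marking: stack usage stays constant however long the
// list is.  Stops at the end of the list or at an element that is
// already marked.  Returns the number of newly marked elements.
inline int
mark_chain (list_node *n)
{
  int count = 0;
  for (; n && !n->marked; n = n->next)
    {
      n->marked = true;
      ++count;
    }
  return count;
}
```

A recursive marker would instead need one stack frame per list element, which is what the option is meant to avoid for long chains.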
364 @item reorder ("@var{function name}")
366 Some data structures depend on the relative ordering of pointers. If
367 the precompiled header machinery needs to change that ordering, it
368 will call the function referenced by the @code{reorder} option, before
369 changing the pointers in the object that's pointed to by the field the
370 option applies to. The function must take four arguments, with the
371 signature @samp{@w{void *, void *, gt_pointer_operator, void *}}.
372 The first parameter is a pointer to the structure that contains the
373 object being updated, or the object itself if there is no containing
374 structure. The second parameter is a cookie that should be ignored.
375 The third parameter is a routine that, given a pointer, will update it
to its correct new value.  The fourth parameter is a cookie that must
be passed on to the routine given as the third parameter.
379 PCH cannot handle data structures that depend on the absolute values
380 of pointers. @code{reorder} functions can be expensive. When
381 possible, it is better to depend on properties of the data, like an ID
382 number or the hash of a string instead.
@findex atomic
@item atomic

The @code{atomic} option can only be used with pointers.  It informs
the GC machinery that the memory that the pointer points to does not
contain any pointers, and hence it should be treated by the GC and PCH
machinery as an ``atomic'' block of memory that does not need to be
examined when scanning memory for pointers.  In particular, the
machinery will not scan that memory for pointers to mark them as
reachable (when marking pointers for GC) or to relocate them (when
using PCH).
396 The @code{atomic} option differs from the @code{skip} option.
397 @code{atomic} keeps the memory under Garbage Collection, but makes the
398 GC ignore the contents of the memory. @code{skip} is more drastic in
399 that it causes the pointer and the memory to be completely ignored by
the Garbage Collector.  So, memory marked as @code{atomic} is
automatically freed when no longer reachable, while memory marked as
@code{skip} is not.

The @code{atomic} option must be used with great care, because all
sorts of problems can occur if it is used incorrectly, that is, if the
memory the pointer points to does actually contain a pointer.
408 Here is an example of how to use it:
@smallexample
struct GTY(()) my_struct @{
  int number_of_elements;
  unsigned int * GTY ((atomic)) elements;
@};
@end smallexample
415 In this case, @code{elements} is a pointer under GC, and the memory it
416 points to needs to be allocated using the Garbage Collector, and will
417 be freed automatically by the Garbage Collector when it is no longer
418 referenced. But the memory that the pointer points to is an array of
419 @code{unsigned int} elements, and the GC must not try to scan it to
420 find pointers to mark or relocate, which is why it is marked with the
421 @code{atomic} option.
423 Note that, currently, global variables cannot be marked with
424 @code{atomic}; only fields of a struct can. This is a known
425 limitation. It would be useful to be able to mark global pointers
426 with @code{atomic} to make the PCH machinery aware of them so that
427 they are saved and restored correctly to PCH files.
430 @item special ("@var{name}")
432 The @code{special} option is used to mark types that have to be dealt
433 with by special case machinery. The parameter is the name of the
434 special case. See @file{gengtype.cc} for further details. Avoid
435 adding new special cases unless there is no other alternative.
@findex user
@item user

The @code{user} option indicates that the code to mark structure
fields is completely handled by user-provided routines.  See section
@ref{User GC} for details on what functions need to be provided.
445 @node Inheritance and GTY
446 @section Support for inheritance
@code{gengtype} has some support for simple class hierarchies.  You can use
this to have @code{gengtype} autogenerate marking routines, provided:

@itemize @bullet
@item
There must be a concrete base class, with a discriminator expression
that can be used to identify which subclass an instance is.
@item
Only single inheritance is used.
@item
None of the classes within the hierarchy are templates.
@end itemize
460 If your class hierarchy does not fit in this pattern, you must use
461 @ref{User GC} instead.
463 The base class and its discriminator must be identified using the ``desc''
464 option. Each concrete subclass must use the ``tag'' option to identify
465 which value of the discriminator it corresponds to.
Every class in the hierarchy must have a @code{GTY(())} marker, as
@code{gengtype} will only attempt to parse classes that have such a marker
@footnote{Classes lacking such a marker will not be identified as being
part of the hierarchy, and so the marking routines will not handle them,
leading to an assertion failure within the marking routines due to an
unknown tag value (assuming that assertions are enabled).}.
@smallexample
class GTY((desc("%h.kind"), tag("0"))) example_base
@{
public:
    int kind;
    tree a;
@};

class GTY((tag("1"))) some_subclass : public example_base
@{
public:
    tree b;
@};

class GTY((tag("2"))) some_other_subclass : public example_base
@{
public:
    tree c;
@};
@end smallexample

The generated marking routines for the above will contain a ``switch''
on ``kind'', visiting all appropriate fields.  For example, if kind is
2, it will cast to ``some_other_subclass'' and visit fields a and c,
that is, the inherited field of the base class and the subclass's own field.
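
The shape of such a marking routine can be sketched by hand. The snippet assumes an empty @code{GTY} macro and replaces @code{tree} with a hypothetical @code{obj} type carrying a mark bit; @code{mark_obj} and @code{mark_example} are hypothetical stand-ins for generated code. Each subclass here derives directly from the base, so an instance contributes the base's fields plus its own.

```cpp
#include <cassert>

#define GTY(x)  // assumption: empty stand-in for the marker

struct GTY(()) obj { bool marked; };

class GTY((desc("%h.kind"), tag("0"))) example_base
{
public:
  int kind;
  obj *a;
};

class GTY((tag("1"))) some_subclass : public example_base
{
public:
  obj *b;
};

class GTY((tag("2"))) some_other_subclass : public example_base
{
public:
  obj *c;
};

static void
mark_obj (obj *o, int *visited)
{
  if (o)
    {
      o->marked = true;
      ++*visited;
    }
}

// Stand-in for a generated marker: always visit the base's fields,
// then switch on the discriminator to reach the subclass's fields.
// Returns the number of fields visited.
inline int
mark_example (example_base *p)
{
  int visited = 0;
  mark_obj (p->a, &visited);
  switch (p->kind)
    {
    case 1:
      mark_obj (static_cast<some_subclass *> (p)->b, &visited);
      break;
    case 2:
      mark_obj (static_cast<some_other_subclass *> (p)->c, &visited);
      break;
    default:
      break;
    }
  return visited;
}
```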
@node User GC
@section Support for user-provided GC marking routines
502 The garbage collector supports types for which no automatic marking
503 code is generated. For these types, the user is required to provide
three functions: one to act as a marker for garbage collection, and
two functions to act as marker and pointer walker for pre-compiled
headers.
508 Given a structure @code{struct GTY((user)) my_struct}, the following functions
509 should be defined to mark @code{my_struct}:
@smallexample
void gt_ggc_mx (my_struct *p)
@{
  /* This marks field 'fld'.  */
  gt_ggc_mx (p->fld);
@}

void gt_pch_nx (my_struct *p)
@{
  /* This marks field 'fld'.  */
  gt_pch_nx (p->fld);
@}

void gt_pch_nx (my_struct *p, gt_pointer_operator op, void *cookie)
@{
  /* For every field 'fld', call the given pointer operator.  */
  op (&(p->fld), NULL, cookie);
@}
@end smallexample
531 In general, each marker @code{M} should call @code{M} for every
532 pointer field in the structure. Fields that are not allocated in GC
533 or are not pointers must be ignored.
For embedded lists (e.g., structures with a @code{next} or @code{prev}
pointer), the marker must follow the chain and mark every element in
it.

Note that the rules for the pointer walker @code{gt_pch_nx (my_struct
*, gt_pointer_operator, void *)} are slightly different.  In this
case, the operation @code{op} must be applied to the @emph{address} of
every pointer field.
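
A self-contained sketch of the three routines follows. It assumes an empty @code{GTY} macro and a locally declared @code{gt_pointer_operator} type with three @code{void *} parameters, matching the calls shown above; @code{leaf}, its two flags and @code{count_call} are hypothetical helpers so that the routines have something to act on.

```cpp
#include <cassert>
#include <cstddef>

#define GTY(x)  // assumption: empty stand-in for the marker

// Assumption: local stand-in for the pointer operator type.
typedef void (*gt_pointer_operator) (void *, void *, void *);

// Hypothetical GC-allocated type with trivial markers.
struct GTY(()) leaf { bool gc_marked; bool pch_noted; };
inline void gt_ggc_mx (leaf *l) { if (l) l->gc_marked = true; }
inline void gt_pch_nx (leaf *l) { if (l) l->pch_noted = true; }

struct GTY((user)) my_struct
{
  leaf *fld;          // GC pointer: must be handled
  int not_a_pointer;  // not a pointer: must be ignored
};

// GC marker: call the marker of every pointer field.
void gt_ggc_mx (my_struct *p) { gt_ggc_mx (p->fld); }

// PCH marker: call the PCH marker of every pointer field.
void gt_pch_nx (my_struct *p) { gt_pch_nx (p->fld); }

// PCH pointer walker: apply OP to the address of every pointer field.
void
gt_pch_nx (my_struct *p, gt_pointer_operator op, void *cookie)
{
  op (&(p->fld), NULL, cookie);
}

// Hypothetical operator that counts how often it is called.
inline void
count_call (void *, void *, void *cookie)
{
  ++*static_cast<int *> (cookie);
}
```

Note how the first two routines pass the pointer itself, whereas the walker passes the address of the field, so that the operator can rewrite the pointer in place.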
544 @subsection User-provided marking routines for template types
545 When a template type @code{TP} is marked with @code{GTY}, all
546 instances of that type are considered user-provided types. This means
547 that the individual instances of @code{TP} do not need to be marked
548 with @code{GTY}. The user needs to provide template functions to mark
549 all the fields of the type.
The following code snippets represent all the functions that need to
be provided.  Note that type @code{TP} may refer to more than one
type.  In these snippets, there is only one type @code{T}, but there
could be more.

@smallexample
template<typename T>
void gt_ggc_mx (TP<T> *tp)
@{
  extern void gt_ggc_mx (T&);

  /* This marks field 'fld' of type 'T'.  */
  gt_ggc_mx (tp->fld);
@}

template<typename T>
void gt_pch_nx (TP<T> *tp)
@{
  extern void gt_pch_nx (T&);

  /* This marks field 'fld' of type 'T'.  */
  gt_pch_nx (tp->fld);
@}

template<typename T>
void gt_pch_nx (TP<T *> *tp, gt_pointer_operator op, void *cookie)
@{
  /* For every field 'fld' of 'tp' with type 'T *', call the given
     pointer operator.  */
  op (&(tp->fld), NULL, cookie);
@}

template<typename T>
void gt_pch_nx (TP<T> *tp, gt_pointer_operator op, void *cookie)
@{
  extern void gt_pch_nx (T *, gt_pointer_operator, void *);

  /* For every field 'fld' of 'tp' with type 'T', call the pointer
     walker for all the fields of T.  */
  gt_pch_nx (&(tp->fld), op, cookie);
@}
@end smallexample
Support for user-defined types is currently limited.  The following
restrictions apply:

@enumerate
@item Type @code{TP} and all the argument types @code{T} must be
marked with @code{GTY}.

@item Type @code{TP} can only have type names in its argument list.

@item The pointer walker functions are different for @code{TP<T>} and
@code{TP<T *>}.  In the case of @code{TP<T>}, references to
@code{T} must be handled by calling @code{gt_pch_nx} (which
will, in turn, walk all the pointers inside fields of @code{T}).
In the case of @code{TP<T *>}, references to @code{T *} must be
handled by calling the @code{op} function on the address of the
pointer (see the code snippets above).
@end enumerate
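
The difference between the two pointer walkers can be sketched in a form that compiles on its own. The snippet assumes a locally declared @code{gt_pointer_operator} type and uses hypothetical names (@code{inner}, @code{record_address}); @code{TP} is reduced to a single field. It relies on ordinary C++ partial ordering to select the @code{TP<T *>} overload for pointer arguments.

```cpp
#include <cassert>

// Assumption: local stand-in for the pointer operator type.
typedef void (*gt_pointer_operator) (void *, void *, void *);

// Hypothetical argument type with one pointer field and its walker.
struct inner { int *ptr; };

inline void
gt_pch_nx (inner *i, gt_pointer_operator op, void *cookie)
{
  op (&(i->ptr), nullptr, cookie);
}

template<typename T>
struct TP { T fld; };

// TP<T *>: the field is itself a pointer, so apply OP to its address.
template<typename T>
void
gt_pch_nx (TP<T *> *tp, gt_pointer_operator op, void *cookie)
{
  op (&(tp->fld), nullptr, cookie);
}

// TP<T>: the field is an object, so delegate to T's own walker.
template<typename T>
void
gt_pch_nx (TP<T> *tp, gt_pointer_operator op, void *cookie)
{
  gt_pch_nx (&(tp->fld), op, cookie);
}

// Hypothetical operator: remember the address it was handed.
inline void
record_address (void *addr, void *, void *cookie)
{
  *static_cast<void **> (cookie) = addr;
}
```

For a @code{TP<inner>} the operator ends up being applied to the address of @code{fld.ptr}, while for a @code{TP<inner *>} it is applied to the address of @code{fld} itself.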
@node GGC Roots
@section Marking Roots for the Garbage Collector
614 @cindex roots, marking
615 @cindex marking roots
617 In addition to keeping track of types, the type machinery also locates
618 the global variables (@dfn{roots}) that the garbage collector starts
619 at. Roots must be declared using one of the following syntaxes:
@itemize @bullet
@item
@code{extern GTY(([@var{options}])) @var{type} @var{name};}
@item
@code{static GTY(([@var{options}])) @var{type} @var{name};}
@end itemize
@noindent
The syntax
@itemize @bullet
@item
@code{GTY(([@var{options}])) @var{type} @var{name};}
@end itemize
@noindent
is @emph{not} accepted.  There should be an @code{extern} declaration
of such a variable in a header somewhere; mark that declaration, not the
definition.  Or, if the variable is only used in one file, make it
@code{static}.
@node Files
@section Source Files Containing Type Information
641 @cindex generated files
642 @cindex files, generated
644 Whenever you add @code{GTY} markers to a source file that previously
645 had none, or create a new source file containing @code{GTY} markers,
646 there are three things you need to do:
650 You need to add the file to the list of source files the type
651 machinery scans. There are four cases:
@enumerate a
@item
For a back-end file, this is usually done
automatically; if not, you should add it to @code{target_gtfiles} in
the appropriate port's entries in @file{config.gcc}.

@item
For files shared by all front ends, add the filename to the
@code{GTFILES} variable in @file{Makefile.in}.

@item
For files that are part of one front end, add the filename to the
@code{gtfiles} variable defined in the appropriate
@file{config-lang.in}.
Headers should appear before non-headers in this list.

@item
For files that are part of some but not all front ends, add the
filename to the @code{gtfiles} variable of @emph{all} the front ends
that use it.
@end enumerate
676 If the file was a header file, you'll need to check that it's included
677 in the right place to be visible to the generated files. For a back-end
678 header file, this should be done automatically. For a front-end header
679 file, it needs to be included by the same file that includes
680 @file{gtype-@var{lang}.h}. For other header files, it needs to be
681 included in @file{gtype-desc.cc}, which is a generated file, so add it to
682 @code{ifiles} in @code{open_base_file} in @file{gengtype.cc}.
For source files that aren't header files, the machinery will generate a
header file that should be included in the source file you just changed.
The file will be called @file{gt-@var{path}.h} where @var{path} is the
pathname relative to the @file{gcc} directory with slashes replaced by
@verb{|-|}, so for example the header file to be included in
@file{cp/parser.cc} is called @file{gt-cp-parser.h}.  The
generated header file should be included after everything else in the
source file.
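
The naming rule can be sketched as a small helper. This is only an illustration of the rule as stated (drop the extension, replace slashes with @samp{-}, add the @file{gt-} prefix and the @file{.h} suffix); the function name @code{gt_header_name} is hypothetical and is not part of @code{gengtype}.

```cpp
#include <cassert>
#include <string>

// Map a source path relative to the gcc directory to the name of
// the generated header, following the rule described in the text.
std::string
gt_header_name (const std::string &path)
{
  std::string stem = path;

  // Drop the extension, but only if the dot belongs to the file name.
  std::size_t slash = stem.rfind ('/');
  std::size_t dot = stem.rfind ('.');
  if (dot != std::string::npos
      && (slash == std::string::npos || dot > slash))
    stem.erase (dot);

  // Replace directory separators with dashes.
  for (char &c : stem)
    if (c == '/')
      c = '-';

  return "gt-" + stem + ".h";
}
```

For instance, @code{gt_header_name ("cp/parser.cc")} yields @file{gt-cp-parser.h}, matching the example in the text.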
695 For language frontends, there is another file that needs to be included
696 somewhere. It will be called @file{gtype-@var{lang}.h}, where
697 @var{lang} is the name of the subdirectory the language is contained in.
Plugins can add additional root tables.  Run the @code{gengtype}
utility in plugin mode as @code{gengtype -P pluginout.h @var{source-dir}
@var{file-list} @var{plugin*.c}}, where @var{plugin*.c} are your plugin
files using @code{GTY}, to generate the @var{pluginout.h} file.
The GCC build tree must be present in that mode.
706 @node Invoking the garbage collector
707 @section How to invoke the garbage collector
708 @cindex garbage collector, invocation
The GCC garbage collector GGC is only invoked explicitly.  In contrast
with many other garbage collectors, it is not implicitly invoked by
allocation routines when a lot of memory has been consumed.  So the
only way to have GGC reclaim storage is to call the @code{ggc_collect}
function explicitly.
When @var{mode} is @code{GGC_COLLECT_FORCE}, or when it is the default
@code{GGC_COLLECT_HEURISTIC} and the internal heuristic decides to
collect, this call is potentially an expensive operation, as it may
have to scan the entire heap.  Beware that, unlike many other garbage
collectors, GGC does not follow local variables (on the GCC call
stack): you should reference all your data from static or external
@code{GTY}-ed variables, and it is advised to call @code{ggc_collect}
with a shallow call stack.  GGC is an exact mark-and-sweep garbage
collector, so it does not scan the call stack for pointers.  In
practice GCC passes don't often call @code{ggc_collect} themselves,
because it is called by the pass manager between passes.
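
Why data reachable only from local variables is lost can be shown with a toy exact mark-and-sweep collector. This is a deliberately simplified model and not GGC's implementation; @code{toy_object} and @code{toy_heap} are hypothetical names, and roots are modelled as an explicit list of registered variable addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct toy_object
{
  bool marked = false;
  toy_object *child = nullptr;
};

struct toy_heap
{
  std::vector<toy_object *> objects;  // everything ever allocated
  std::vector<toy_object **> roots;   // registered root variables only

  toy_object *alloc ()
  {
    toy_object *o = new toy_object ();
    objects.push_back (o);
    return o;
  }

  // Mark from the registered roots, then free whatever is unmarked.
  // The call stack is never inspected.  Returns the number freed.
  std::size_t collect ()
  {
    for (toy_object **r : roots)
      for (toy_object *o = *r; o && !o->marked; o = o->child)
        o->marked = true;

    std::size_t freed = 0;
    std::vector<toy_object *> live;
    for (toy_object *o : objects)
      if (o->marked)
        {
          o->marked = false;  // reset for the next collection
          live.push_back (o);
        }
      else
        {
          delete o;
          ++freed;
        }
    objects.swap (live);
    return freed;
  }

  ~toy_heap ()
  {
    for (toy_object *o : objects)
      delete o;
  }
};
```

An object that is referenced only through a variable that was never registered as a root is freed by @code{collect}, leaving that variable dangling. This mirrors the advice above to keep all live data reachable from @code{GTY}-ed variables before collecting.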
728 At the time of the @code{ggc_collect} call all pointers in the GC-marked
729 structures must be valid or @code{NULL}. In practice this means that
730 there should not be uninitialized pointer fields in the structures even
731 if your code never reads or writes those fields at a particular
instance.  One way to ensure this is to use cleared versions of
allocators unless all the fields are initialized manually immediately
after allocation.
736 @node Troubleshooting
737 @section Troubleshooting the garbage collector
738 @cindex garbage collector, troubleshooting
740 With the current garbage collector implementation, most issues should
741 show up as GCC compilation errors. Some of the most commonly
742 encountered issues are described below.
@itemize
@item Gengtype does not produce allocators for a @code{GTY}-marked type.
Gengtype checks if there is at least one possible path from GC roots to
at least one instance of each type before outputting allocators.  If
there is no such path, the @code{GTY} markers will be ignored and no
allocators will be output.  Solve this by making sure that there exists
at least one such path.  If creating it is infeasible or raises a ``code
smell'', consider whether you really must use GC for allocating such a type.

@item Link-time errors about undefined @code{gt_ggc_r_foo_bar} and
similarly-named symbols.  Check if your @file{foo_bar} source file has
@code{#include "gt-foo_bar.h"} as its very last line.
@end itemize