@c Copyright (C) 2002, 2003, 2004, 2007, 2008, 2009, 2010
@c Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@chapter Memory Management and Type Information

GCC uses some fairly sophisticated memory management techniques, which
involve determining information about GCC's data structures from GCC's
source code and using this information to perform garbage collection and
implement precompiled headers.

A full C parser would be too complicated for this task, so a limited
subset of C is interpreted and special markers are used to determine
what parts of the source to look at. All @code{struct} and
@code{union} declarations that define data structures that are
allocated under control of the garbage collector must be marked. All
global variables that hold pointers to garbage-collected memory must
also be marked. Finally, all global variables that need to be saved
and restored by a precompiled header must be marked. (The precompiled
header mechanism can only save static variables if they're scalar.
Complex data structures must be allocated in garbage-collected memory
to be saved in a precompiled header.)

The full format of a marker is

GTY (([@var{option}] [(@var{param})], [@var{option}] [(@var{param})] @dots{}))

but in most cases no options are needed. The outer double parentheses
are still necessary, though: @code{GTY(())}. Markers can appear:

In a structure definition, before the open brace;

In a global variable declaration, after the keyword @code{static} or
@code{extern}; and

In a structure field definition, before the name of the field.

Here are some examples of marking simple data structures and globals.

struct GTY(()) @var{tag}

typedef struct GTY(()) @var{tag}

static GTY(()) struct @var{tag} *@var{list}; /* @r{points to GC memory} */
static GTY(()) int @var{counter}; /* @r{save counter in a PCH} */

The parser understands simple typedefs such as
@code{typedef struct @var{tag} *@var{name};} and
@code{typedef int @var{name};}.
These don't need to be marked.

* GTY Options:: What goes inside a @code{GTY(())}.
* GGC Roots:: Making global variables GGC roots.
* Files:: How the generated files work.
* Invoking the garbage collector:: How to invoke the garbage collector.
* Troubleshooting:: When something does not work as expected.

@section The Inside of a @code{GTY(())}

Sometimes the C code is not enough to fully describe the type
structure. Extra information can be provided with @code{GTY} options
and additional markers. Some options take a parameter, which may be
either a string or a type name, depending on the parameter. If an
option takes no parameter, it is acceptable either to omit the
parameter entirely, or to provide an empty string as a parameter. For
example, @code{@w{GTY ((skip))}} and @code{@w{GTY ((skip ("")))}} are
equivalent.

When the parameter is a string, often it is a fragment of C code. Four
special escapes may be used in these strings, to refer to pieces of
the data structure being marked:

@cindex % in GTY option
@table @gcctabopt
@item %h
The current structure.
@item %1
The structure that immediately contains the current structure.
@item %0
The outermost structure that contains the current structure.
@item %a
A partial expression of the form @code{[i1][i2]@dots{}} that indexes
the array item currently being marked.
@end table

For instance, suppose that @code{struct B} has a field @code{foo} that
is an array of at least twelve @code{struct A} elements,
and @code{b} is a variable of type @code{struct B}. When marking
@samp{b.foo[11]}, @code{%h} would expand to @samp{b.foo[11]},
@code{%0} and @code{%1} would both expand to @samp{b}, and @code{%a}
would expand to @samp{[11]}.

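The expansion just described can be pictured with a small sketch. The helper @code{expand_gty_escapes} is hypothetical and is not part of @command{gengtype}; it merely substitutes the four escapes with strings supplied by the caller. The definitions of @code{struct A} and @code{struct B} are assumptions chosen to match the prose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout matching the prose: b.foo has at least 12 elements.  */
struct A { int payload; };
struct B { struct A foo[12]; };

/* Hypothetical helper (not gengtype code): expand %h, %1, %0 and %a in
   TMPL into OUT, which holds OUT_SIZE bytes.  Returns 0 on success, -1
   if the result does not fit or an unknown escape is seen.  */
static int
expand_gty_escapes (const char *tmpl, const char *h, const char *one,
                    const char *zero, const char *a,
                    char *out, size_t out_size)
{
  size_t used = 0;

  assert (out_size > 0);
  for (; *tmpl; tmpl++)
    {
      char literal[2] = { *tmpl, '\0' };
      const char *piece = literal;
      size_t len;

      if (*tmpl == '%')
        {
          tmpl++;
          switch (*tmpl)
            {
            case 'h': piece = h; break;
            case '1': piece = one; break;
            case '0': piece = zero; break;
            case 'a': piece = a; break;
            default: return -1;   /* includes a trailing lone '%' */
            }
        }
      len = strlen (piece);
      if (used + len >= out_size)
        return -1;                /* keep room for the terminator */
      memcpy (out + used, piece, len);
      used += len;
    }
  out[used] = '\0';
  return 0;
}
```

Called with the strings from the example above, a template such as @code{"%0.foo%a"} would be rebuilt from the outermost structure and the index expression.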
As in ordinary C, adjacent strings will be concatenated; this is
helpful when you have a complicated expression.

GTY ((chain_next ("TREE_CODE (&%h.generic) == INTEGER_TYPE"
                  " ? TYPE_NEXT_VARIANT (&%h.generic)"
                  " : TREE_CHAIN (&%h.generic)")))

The available options are:

@item length ("@var{expression}")

There are two places the type machinery will need to be explicitly told
the length of an array. The first case is when a structure ends in a
variable-length array, like this:

struct GTY(()) rtvec_def @{
  int num_elem; /* @r{number of elements} */
  rtx GTY ((length ("%h.num_elem"))) elem[1];
@};

In this case, the @code{length} option is used to override the specified
array length (which should usually be @code{1}). The parameter of the
option is a fragment of C code that calculates the length.

The second case is when a structure or a global variable contains a
pointer to an array, like this:

struct gimple_omp_for_iter * GTY((length ("%h.collapse"))) iter;

In this case, @code{iter} has been allocated by writing something like

  x->iter = ggc_alloc_cleared_vec_gimple_omp_for_iter (collapse);

and @code{collapse} provides the length of the field.

This second use of @code{length} also works on global variables, like:

static GTY((length("reg_known_value_size"))) rtx *reg_known_value;

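As a rough illustration of the second case, the following sketch pairs a counted pointer field with a walker that visits exactly as many elements as the length expression names. All type and function names here are hypothetical, @code{calloc} stands in for a GGC allocator, and @code{GTY} is defined as an empty macro on the assumption that the C compiler never needs to see the marker.

```c
#include <assert.h>
#include <stdlib.h>

#define GTY(x) /* assumption: expands to nothing for the C compiler */

struct GTY(()) iter_like { int visited; };

struct GTY(()) loop_like {
  int collapse;                 /* number of elements behind ITER */
  struct iter_like * GTY ((length ("%h.collapse"))) iter;
};

/* Allocate a loop_like with COLLAPSE zeroed iterators; calloc is only a
   stand-in for a cleared garbage-collected allocation.  */
static struct loop_like *
make_loop_like (int collapse)
{
  struct loop_like *x;

  assert (collapse > 0);
  x = calloc (1, sizeof *x);
  if (x == NULL)
    return NULL;
  x->iter = calloc ((size_t) collapse, sizeof *x->iter);
  if (x->iter == NULL)
    {
      free (x);
      return NULL;
    }
  x->collapse = collapse;
  return x;
}

/* Visit the elements the length expression describes, the way a
   generated marker would have to; returns how many were visited.  */
static int
walk_loop_like (struct loop_like *x)
{
  int i;

  for (i = 0; i < x->collapse; i++)
    x->iter[i].visited = 1;
  return i;
}
```

The point of the sketch is that the element count lives in the object itself, so the walker never needs to know how the array was allocated.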
If @code{skip} is applied to a field, the type machinery will ignore it.
This is somewhat dangerous; the only safe use is in a union when one
field really isn't ever used.

@item desc ("@var{expression}")
@itemx tag ("@var{constant}")

The type machinery needs to be told which field of a @code{union} is
currently active. This is done by giving each field a constant
@code{tag} value, and then specifying a discriminator using @code{desc}.
The value of the expression given by @code{desc} is compared against
each @code{tag} value, each of which should be different. If no
@code{tag} is matched, the field marked with @code{default} is used if
there is one; otherwise no field in the union will be marked.

In the @code{desc} option, the ``current structure'' is the union that
it discriminates. Use @code{%1} to mean the structure containing it.
There are no escapes available to the @code{tag} option, since it is a
constant.

struct GTY(()) tree_binding
@{
  struct tree_common common;
  union tree_binding_u @{
    tree GTY ((tag ("0"))) scope;
    struct cp_binding_level * GTY ((tag ("1"))) level;
  @} GTY ((desc ("BINDING_HAS_LEVEL_P ((tree)&%0)"))) xscope;
@};

In this example, the value of @code{BINDING_HAS_LEVEL_P} when applied to a
@code{struct tree_binding *} is presumed to be 0 or 1. If 1, the type
mechanism will treat the field @code{level} as being present and if 0,
will treat the field @code{scope} as being present.

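The dispatch this implies can be sketched in plain C. The code below is not what @command{gengtype} emits; it is a hand-written approximation with hypothetical type names, a hypothetical @code{has_level} discriminator field, and @code{GTY} defined as an empty macro by assumption.

```c
#include <assert.h>
#include <stddef.h>

#define GTY(x) /* assumption: a no-op for the C compiler */

struct scope_like { int marked; };
struct level_like { int marked; };

struct GTY(()) binding_like {
  int has_level;                /* hypothetical discriminator: 0 or 1 */
  union binding_like_u {
    struct scope_like * GTY ((tag ("0"))) scope;
    struct level_like * GTY ((tag ("1"))) level;
  } GTY ((desc ("%1.has_level"))) xscope;
};

/* Mark only the union member selected by the discriminator.  Returns the
   tag that matched, or -1 when no tag matches (there is no default field
   here, so nothing is marked in that case).  */
static int
mark_binding_like (struct binding_like *b)
{
  assert (b != NULL);
  switch (b->has_level)
    {
    case 0:
      if (b->xscope.scope != NULL)
        b->xscope.scope->marked = 1;
      return 0;
    case 1:
      if (b->xscope.level != NULL)
        b->xscope.level->marked = 1;
      return 1;
    default:
      return -1;
    }
}
```

Because the @code{desc} string is attached to the union field, @code{%1} in it names the enclosing @code{struct binding_like}.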
@item param_is (@var{type})
@itemx use_param

Sometimes it's convenient to define some data structure to work on
generic pointers (that is, @code{PTR}) and then use it with a specific
type. @code{param_is} specifies the real type pointed to, and
@code{use_param} says where in the generic data structure that type
should be put.

For instance, to have a @code{htab_t} that points to trees, one would
write the definition of @code{htab_t} like this:

typedef struct GTY(()) @{
  void ** GTY ((use_param, @dots{})) entries;
@} htab_t;

and then declare variables like this:

  static htab_t GTY ((param_is (union tree_node))) ict;

@findex param@var{n}_is
@findex use_param@var{n}
@item param@var{n}_is (@var{type})
@itemx use_param@var{n}

In more complicated cases, the data structure might need to work on
several different types, which might not necessarily all be pointers.
For this, @code{param1_is} through @code{param9_is} may be used to
specify the real type of a field identified by @code{use_param1} through
@code{use_param9}.

When a structure contains another structure that is parameterized,
there's no need to do anything special; the inner structure inherits the
parameters of the outer one. When a structure contains a pointer to a
parameterized structure, the type machinery won't automatically detect
this (it could, it just doesn't yet), so it's necessary to tell it that
the pointed-to structure should use the same parameters as the outer
structure. This is done by marking the pointer with the
@code{use_params} option.

@code{deletable}, when applied to a global variable, indicates that when
garbage collection runs, there's no need to mark anything pointed to
by this variable; it can just be set to @code{NULL} instead. This is used
to keep a list of free structures around for re-use.

@item if_marked ("@var{expression}")

Suppose you want some kinds of object to be unique, and so you put them
in a hash table. If garbage collection marks the hash table, these
objects will never be freed, even if the last other reference to them
goes away. GGC has special handling to deal with this: if you use the
@code{if_marked} option on a global hash table, GGC will call the
routine whose name is the parameter to the option on each hash table
entry. If the routine returns nonzero, the hash table entry will
be marked as usual. If the routine returns zero, the hash table entry
will be removed.

The routine @code{ggc_marked_p} can be used to determine if an element
has been marked already; in fact, the usual case is to use
@code{if_marked ("ggc_marked_p")}.

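The effect on the table can be approximated with a short simulation. Nothing below is GGC code: the table is a plain array, @code{obj_marked_p} is a hypothetical stand-in for @code{ggc_marked_p} that reads a flag stored in the object, and removal is modelled by clearing the slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object with an explicit mark bit; the real collector
   keeps mark state elsewhere.  */
struct obj_like { int marked; };

/* Stand-in for ggc_marked_p: nonzero if P was reached from another root.  */
static int
obj_marked_p (const void *p)
{
  return ((const struct obj_like *) p)->marked;
}

/* Apply the if_marked rule to N slots: keep entries for which IF_MARKED
   returns nonzero and drop the others.  Returns the number kept.  */
static size_t
sweep_table (struct obj_like **slots, size_t n,
             int (*if_marked) (const void *))
{
  size_t i, kept = 0;

  for (i = 0; i < n; i++)
    {
      if (slots[i] == NULL)
        continue;               /* already empty */
      if (if_marked (slots[i]))
        kept++;                 /* still referenced elsewhere */
      else
        slots[i] = NULL;        /* the table held the last reference */
    }
  return kept;
}
```

In this model the table itself never keeps an object alive; only references from elsewhere do.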
@item mark_hook ("@var{hook-routine-name}")

If provided for a structure or union type, the given
@var{hook-routine-name} (between double-quotes) is the name of a
routine called when the garbage collector has just marked the data as
reachable. This routine should not change the data, or call any ggc
routine. Its only argument is a pointer to the just marked (const)
structure or union.

When applied to a field, @code{maybe_undef} indicates that it's OK if
the structure that this field points to is never defined, so long as
this field is always @code{NULL}. This is used to avoid requiring
backends to define certain optional structures. It doesn't work with
language frontends.

@item nested_ptr (@var{type}, "@var{to expression}", "@var{from expression}")

The type machinery expects all pointers to point to the start of an
object. Sometimes for abstraction purposes it's convenient to have
a pointer which points inside an object. So long as it's possible to
convert the original object to and from the pointer, such pointers
can still be used. @var{type} is the type of the original object,
the @var{to expression} returns the pointer given the original object,
and the @var{from expression} returns the original object given
the pointer. The pointer will be available using the @code{%h}
escape.

@findex chain_circular
@item chain_next ("@var{expression}")
@itemx chain_prev ("@var{expression}")
@itemx chain_circular ("@var{expression}")

It's helpful for the type machinery to know if objects are often
chained together in long lists; this lets it generate code that uses
less stack space by iterating along the list instead of recursing down
it. @code{chain_next} is an expression for the next item in the list;
@code{chain_prev} is an expression for the previous item. For singly
linked lists, use only @code{chain_next}; for doubly linked lists, use
both. The machinery requires that taking the next item of the
previous item gives the original item. @code{chain_circular} is similar
to @code{chain_next}, but can be used for circular singly linked lists.

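The stack-space argument can be seen in a hand-written marker. The sketch below is an approximation, not generated code: the @code{item_like} type and its explicit @code{marked} flag are hypothetical, and @code{GTY} is assumed to expand to nothing. The walker loops along @code{next} and recurses only into @code{child}, so recursion depth is bounded by nesting rather than list length.

```c
#include <assert.h>
#include <stddef.h>

#define GTY(x) /* assumption: a no-op for the C compiler */

struct GTY ((chain_next ("%h.next"))) item_like {
  struct item_like *next;       /* long chain: walked iteratively */
  struct item_like *child;      /* short nesting: walked recursively */
  int marked;
};

/* Mark X and everything reachable from it; returns how many objects
   were newly marked.  Stops as soon as an already-marked item is met.  */
static int
mark_item_like (struct item_like *x)
{
  int count = 0;

  for (; x != NULL && !x->marked; x = x->next)
    {
      x->marked = 1;
      count++;
      count += mark_item_like (x->child);
    }
  return count;
}
```

Without the chain information, a marker would have to treat @code{next} like any other pointer field and recurse once per list element.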
@item reorder ("@var{function name}")

Some data structures depend on the relative ordering of pointers. If
the precompiled header machinery needs to change that ordering, it
will call the function referenced by the @code{reorder} option, before
changing the pointers in the object that's pointed to by the field the
option applies to. The function must take four arguments, with the
signature @samp{@w{void *, void *, gt_pointer_operator, void *}}.
The first parameter is a pointer to the structure that contains the
object being updated, or the object itself if there is no containing
structure. The second parameter is a cookie that should be ignored.
The third parameter is a routine that, given a pointer, will update it
to its correct new value. The fourth parameter is a cookie that must
be passed as the second argument to that routine.

PCH cannot handle data structures that depend on the absolute values
of pointers. @code{reorder} functions can be expensive. When
possible, it is better to depend on properties of the data, like an ID
number or the hash of a string, instead.

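A reorder function with that signature might look roughly like the sketch below. The @code{gt_pointer_operator} typedef is written out here as an assumption about its shape (address of the pointer to update, then the cookie); @code{range_like}, @code{arena_like} and @code{mirror_in_arena} are hypothetical and exist only to make the example self-contained. The function asks the operator what each pointer will become, using local copies, and rearranges the stored pointers so the ordering invariant holds after relocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the callback: first argument is the address of a
   pointer to rewrite in place, second is an opaque cookie.  */
typedef void (*gt_pointer_operator) (void *, void *);

/* Hypothetical structure whose invariant is lo <= hi by address.  */
struct range_like { void *lo; void *hi; };

/* Ask OP what P will become, without touching the stored pointer.  */
static uintptr_t
new_address (void *p, gt_pointer_operator op, void *cookie)
{
  op (&p, cookie);              /* updates only the local copy */
  return (uintptr_t) p;
}

/* Reorder function: swap the fields if relocation would invert them.  */
static void
range_like_reorder (void *obj, void *unused_cookie,
                    gt_pointer_operator op, void *cookie)
{
  struct range_like *r = obj;

  (void) unused_cookie;         /* second parameter: to be ignored */
  assert (r != NULL && op != NULL);
  if (new_address (r->lo, op, cookie) > new_address (r->hi, op, cookie))
    {
      void *tmp = r->lo;
      r->lo = r->hi;
      r->hi = tmp;
    }
}

/* Demo-only relocation: mirror an address inside the arena passed as
   the cookie, which reverses relative order.  */
struct arena_like { char bytes[16]; };

static void
mirror_in_arena (void *ptr_to_ptr, void *cookie)
{
  struct arena_like *arena = cookie;
  void **pp = ptr_to_ptr;
  char *old = *pp;

  *pp = arena->bytes + (15 - (old - arena->bytes));
}
```

Note that the stored pointers still hold their old values when the function returns; only their arrangement changes, and the machinery rewrites them afterwards.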
@findex variable_size

The type machinery expects the types to be of constant size. When this
is not true, for example, with structs that have array fields or unions,
the type machinery cannot tell how many bytes need to be allocated at
each allocation. The @code{variable_size} option is used to mark such types.
The type machinery then provides allocators that take a parameter
indicating an exact size of object being allocated. Note that the size
must be provided in bytes whereas the @code{length} option works with
array lengths in number of elements.

struct GTY((variable_size)) sorted_fields_type @{
  tree GTY((length ("%h.len"))) elts[1];
@};

Then the objects of @code{struct sorted_fields_type} are allocated in GC
memory as follows:

  field_vec = ggc_alloc_sorted_fields_type (size);

If @code{field_vec->elts} stores @var{n} elements, then @var{size}
could be calculated as follows:

  size_t size = sizeof (struct sorted_fields_type) + n * sizeof (tree);

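To make the bytes-versus-elements distinction concrete, here is a minimal sketch that applies the same size formula to a look-alike type. The names ending in @code{_like} are hypothetical, the @code{int len} field is assumed from the @code{%h.len} expression, @code{calloc} stands in for the generated allocator, and @code{GTY} is assumed to be an empty macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GTY(x) /* assumption: a no-op for the C compiler */

typedef struct tree_node_like *tree_like;   /* stand-in for tree */

struct GTY((variable_size)) sorted_fields_like {
  int len;                      /* element count, used by length */
  tree_like GTY((length ("%h.len"))) elts[1];
};

/* Size in BYTES for an object holding N elements, following the
   formula in the text.  */
static size_t
sorted_fields_like_size (size_t n)
{
  return sizeof (struct sorted_fields_like) + n * sizeof (tree_like);
}

/* Allocate a zeroed object for N elements.  The allocator receives a
   byte count, while the len field records the element count.  */
static struct sorted_fields_like *
alloc_sorted_fields_like (size_t n)
{
  struct sorted_fields_like *p = calloc (1, sorted_fields_like_size (n));

  if (p != NULL)
    p->len = (int) n;
  return p;
}
```

Since the structure already declares one element, this formula reserves room for one more element than strictly needed; that is harmless and keeps the computation simple.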
The @code{atomic} option can only be used with pointers. It informs
the GC machinery that the memory that the pointer points to does not
contain any pointers, and hence it should be treated by the GC and PCH
machinery as an ``atomic'' block of memory that does not need to be
examined when scanning memory for pointers. In particular, the
machinery will not scan that memory for pointers to mark them as
reachable (when marking pointers for GC) or to relocate them (when
writing a PCH file).

The @code{atomic} option differs from the @code{skip} option.
@code{atomic} keeps the memory under Garbage Collection, but makes the
GC ignore the contents of the memory. @code{skip} is more drastic in
that it causes the pointer and the memory to be completely ignored by
the Garbage Collector. So, memory marked as @code{atomic} is
automatically freed when no longer reachable, while memory marked as
@code{skip} is not.

The @code{atomic} option must be used with great care, because all
sorts of problems can occur if used incorrectly, that is, if the memory
the pointer points to does actually contain a pointer.

Here is an example of how to use it:

struct GTY(()) my_struct @{
  int number_of_elements;
  unsigned int GTY ((atomic)) * elements;
@};

In this case, @code{elements} is a pointer under GC, and the memory it
points to needs to be allocated using the Garbage Collector, and will
be freed automatically by the Garbage Collector when it is no longer
referenced. But the memory that the pointer points to is an array of
@code{unsigned int} elements, and the GC must not try to scan it to
find pointers to mark or relocate, which is why it is marked with the
@code{atomic} option.

Note that, currently, global variables cannot be marked with
@code{atomic}; only fields of a struct can. This is a known
limitation. It would be useful to be able to mark global pointers
with @code{atomic} to make the PCH machinery aware of them so that
they are saved and restored correctly to PCH files.

@item special ("@var{name}")

The @code{special} option is used to mark types that have to be dealt
with by special case machinery. The parameter is the name of the
special case. See @file{gengtype.c} for further details. Avoid
adding new special cases unless there is no other alternative.

@section Marking Roots for the Garbage Collector
@cindex roots, marking
@cindex marking roots

In addition to keeping track of types, the type machinery also locates
the global variables (@dfn{roots}) that the garbage collector starts
at. Roots must be declared using one of the following syntaxes:

@code{extern GTY(([@var{options}])) @var{type} @var{name};}

@code{static GTY(([@var{options}])) @var{type} @var{name};}

The syntax

@code{GTY(([@var{options}])) @var{type} @var{name};}

is @emph{not} accepted. There should be an @code{extern} declaration
of such a variable in a header somewhere; mark that, not the
definition. Or, if the variable is only used in one file, make it
@code{static}.

@section Source Files Containing Type Information
@cindex generated files
@cindex files, generated

Whenever you add @code{GTY} markers to a source file that previously
had none, or create a new source file containing @code{GTY} markers,
there are three things you need to do:

You need to add the file to the list of source files the type
machinery scans. There are four cases:

For a back-end file, this is usually done
automatically; if not, you should add it to @code{target_gtfiles} in
the appropriate port's entries in @file{config.gcc}.

For files shared by all front ends, add the filename to the
@code{GTFILES} variable in @file{Makefile.in}.

For files that are part of one front end, add the filename to the
@code{gtfiles} variable defined in the appropriate
@file{config-lang.in}. For C, the file is @file{c-config-lang.in}.
Headers should appear before non-headers in this list.

For files that are part of some but not all front ends, add the
filename to the @code{gtfiles} variable of @emph{all} the front ends
that use it.

If the file was a header file, you'll need to check that it's included
in the right place to be visible to the generated files. For a back-end
header file, this should be done automatically. For a front-end header
file, it needs to be included by the same file that includes
@file{gtype-@var{lang}.h}. For other header files, it needs to be
included in @file{gtype-desc.c}, which is a generated file, so add it to
@code{ifiles} in @code{open_base_file} in @file{gengtype.c}.

For source files that aren't header files, the machinery will generate a
header file that should be included in the source file you just changed.
The file will be called @file{gt-@var{path}.h} where @var{path} is the
pathname relative to the @file{gcc} directory with slashes replaced by
@verb{|-|}, so for example the header file to be included in
@file{cp/parser.c} is called @file{gt-cp-parser.h}. The
generated header file should be included after everything else in the
source file. Don't forget to mention this file as a dependency in the
@file{Makefile}!

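The naming rule can be written down as a few lines of C. The function @code{gt_header_name} is hypothetical and only mirrors the rule as stated here (drop the extension, replace slashes with dashes, add the @file{gt-} prefix and @file{.h} suffix); the real @command{gengtype} logic may treat other cases differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive the generated header name for PATH, a
   source file name relative to the gcc directory.  Writes the result
   into OUT (OUT_SIZE bytes) and returns 0, or returns -1 if it does
   not fit.  */
static int
gt_header_name (const char *path, char *out, size_t out_size)
{
  const char *slash = strrchr (path, '/');
  const char *dot = strrchr (path, '.');
  size_t stem_len;
  int n;
  char *p;

  /* A dot inside a directory name is not an extension.  */
  if (dot != NULL && slash != NULL && dot < slash)
    dot = NULL;
  stem_len = dot != NULL ? (size_t) (dot - path) : strlen (path);

  n = snprintf (out, out_size, "gt-%.*s.h", (int) stem_len, path);
  if (n < 0 || (size_t) n >= out_size)
    return -1;

  /* Replace directory separators with dashes.  */
  for (p = out; *p != '\0'; p++)
    if (*p == '/')
      *p = '-';
  return 0;
}
```

For @file{cp/parser.c} this yields the name that the source file would then pass to @code{#include} as its last line.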
For language frontends, there is another file that needs to be included
somewhere. It will be called @file{gtype-@var{lang}.h}, where
@var{lang} is the name of the subdirectory the language is contained in.

Plugins can add additional root tables. Run the @code{gengtype}
utility in plugin mode as @code{gengtype -P pluginout.h @var{source-dir}
@var{file-list} @var{plugin*.c}} with your plugin files
@var{plugin*.c} using @code{GTY} to generate the @var{pluginout.h} file.
The GCC build tree must be present in that mode.

@node Invoking the garbage collector
@section How to invoke the garbage collector
@cindex garbage collector, invocation

The GCC garbage collector GGC is only invoked explicitly. In contrast
with many other garbage collectors, it is not implicitly invoked by
allocation routines when a lot of memory has been consumed. So the
only way to have GGC reclaim storage is to call the @code{ggc_collect}
function explicitly. This call is an expensive operation, as it may
have to scan the entire heap. Beware that, unlike many other garbage
collectors, GGC does not follow local variables (on the GCC call
stack) during such an invocation: you should reference all your data
from static or external @code{GTY}-ed variables, and it is advised to call
@code{ggc_collect} with a shallow call stack. The GGC is an exact mark
and sweep garbage collector (so it does not scan the call stack for
pointers). In practice GCC passes don't often call @code{ggc_collect}
themselves, because it is called by the pass manager between passes.

At the time of the @code{ggc_collect} call all pointers in the GC-marked
structures must be valid or @code{NULL}. In practice this means that
there should not be uninitialized pointer fields in the structures even
if your code never reads or writes those fields at a particular
instance. One way to ensure this is to use cleared versions of
allocators unless all the fields are initialized manually immediately
after allocation.

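The reason cleared allocation helps can be shown with ordinary C. In the sketch below, @code{calloc} stands in for a cleared GGC allocator, @code{pair_like} is a hypothetical type, and @code{GTY} is assumed to be an empty macro; the walker plays the role of the collector, which follows every pointer field whether or not the program ever used it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GTY(x) /* assumption: a no-op for the C compiler */

struct GTY(()) pair_like {
  struct pair_like *left;
  struct pair_like *right;
  int value;
};

/* Stand-in for a cleared allocator: every pointer field starts out as
   NULL (all-bits-zero, as on mainstream platforms), so a walker can
   follow it safely.  */
static struct pair_like *
alloc_cleared_pair_like (void)
{
  return calloc (1, sizeof (struct pair_like));
}

/* Follow every pointer field, as a marking pass would.  This is only
   safe if each field is either NULL or a valid object.  */
static int
count_reachable (const struct pair_like *p)
{
  if (p == NULL)
    return 0;
  return 1 + count_reachable (p->left) + count_reachable (p->right);
}
```

Had the objects come from an uncleared allocation with @code{right} never assigned, the same walk would dereference an indeterminate pointer.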
@node Troubleshooting
@section Troubleshooting the garbage collector
@cindex garbage collector, troubleshooting

With the current garbage collector implementation, most issues should
show up as GCC compilation errors. Some of the most commonly
encountered issues are described below.

@item Gengtype does not produce allocators for a @code{GTY}-marked type.
Gengtype checks if there is at least one possible path from GC roots to
at least one instance of each type before outputting allocators. If
there is no such path, the @code{GTY} markers will be ignored and no
allocators will be output. Solve this by making sure that there exists
at least one such path. If creating it is infeasible or raises a ``code
smell'', consider if you really must use GC for allocating such a type.

@item Link-time errors about undefined @code{gt_ggc_r_foo_bar} and
similarly-named symbols. Check if your @file{foo_bar} source file has
@code{#include "gt-foo_bar.h"} as its very last line.