Import boehm-gc snapshot, taken from
[official-gcc.git] / boehm-gc / doc / finalization.html
blob3733c0c499270b866906a3e94ac5900383359235
1 <!DOCTYPE HTML>
2 <HEAD>
3 <TITLE>Finalization in the Boehm-Demers-Weiser collector </TITLE>
4 </HEAD>
5 <BODY>
6 <H1>Finalization</h1>
7 Many garbage collectors provide a facility for executing user code
8 just before an object is collected. This can be used to reclaim any
9 system resources or non-garbage-collected memory associated with the
10 object.
11 Experience has shown that this can be a useful facility.
12 It is indispensable in cases in which system resources are embedded
13 in complex data structures (<I>e.g.</i> file descriptors
14 in the <A type="text/plain" HREF="../include/cord.h">cord package</a>).
15 <P>
16 Our collector provides the necessary functionality through
17 <TT>GC_register_finalizer</tt> in
18 <A type="text/plain" HREF="../include/gc.h">gc.h</a>, or by
19 inheriting from <TT>gc_cleanup</tt>
20 in <A type="text/plain" HREF="../include/gc_cpp.h">gc_cpp.h</a>.
21 <P>
22 However, finalization should not be used in the same way as C++
23 destructors. In well-written programs there will typically be
24 very few uses of finalization. (Garbage collected programs that
25 interact with explicitly memory-managed libraries may be an exception.)
26 <P>
27 In general the following guidelines should be followed:
28 <UL>
29 <LI>
30 Actions that must be executed promptly do not belong in finalizers.
31 They should be handled by explicit calls in the code (or C++
32 destructors if you prefer). If you expect the action to occur at
33 a specific point, this is probably not hard.
34 <LI>
35 Finalizers are intended for resource reclamation.
36 <LI>
37 Scarce system resources should be managed explicitly whenever
38 convenient. Use finalizers only as a backup mechanism for the
39 cases that would be hard to handle explicitly.
40 <LI>
41 If scarce resources are managed with finalization, the allocation
42 routine for that resource (<I>e.g.</i> open for file handles) should force
43 a garbage collection (two if that doesn't suffice) if it finds itself
44 short of the resource.
45 <LI>
46 If extremely scarce resources are managed by finalization (<I>e.g.</i>
47 file descriptors on systems which have a limit of 20 open files),
48 it may be necessary to introduce a descriptor caching scheme to
49 hide the resource limit.
50 (<I>E.g.</i>, the program would keep real file descriptors
51 for the 20 most recently used logically open files.
52 Any other needed files would be closed after saving their state.
53 They would then be reopened on demand.
54 Finalization would logically close the file, closing the
55 real descriptor only if it happened to be cached.)
56 Note that most modern systems (<I>e.g.</i> Irix&#174;) allow hundreds or
57 thousands of open files, and this is typically not an issue.
58 <LI>
59 Finalization code may
60 be run anyplace an allocation or other call to the collector
61 takes place.
62 In multi-threaded programs, finalizers have to obey the normal
63 locking conventions to ensure safety.
64 Code run directly from finalizers should not acquire locks that may
65 be held during allocation. This restriction can be easily circumvented
66 by registering a finalizer which enqueues the real action for execution
67 in a separate thread.
68 <P>
69 In single-threaded code, it is also often easiest to have finalizers
70 queue actions, which are then explicitly run during an
71 explicit call by the user's program.
72 </ul>
73 <H1>Topologically Ordered Finalization</h1>
74 Our <A HREF="overview.html">conservative garbage collector</a> supports
75 a form of finalization
76 (with <TT>GC_register_finalizer</tt>)
77 in which objects are finalized in topological
78 order. If <I>A</i> points to <I>B</i>, and both are registered for
79 finalization, it is guaranteed the <I>A</i> will be finalized first.
80 This usually guarantees that finalization procedures see only
81 unfinalized objects.
82 <P>
83 This decision is often questioned, particularly since it has an obvious
84 disadvantage. The current implementation finalizes long chains of
85 finalizable objects one per collection. This is hard to avoid, since
86 the first finalizer invoked may store a pointer to the rest of the chain
87 in a global variable, making it accessible again. Or it may mutate the
88 rest of the chain.
89 <P>
90 Cycles involving one or more finalizable objects are never finalized.
91 <H1>
92 Why topological ordering?
93 </h1>
94 It is important to keep in mind that the choice of finalization ordering
95 matters only in relatively rare cases. In spite of the fact that it has
96 received a lot of discussion, it is not one of the more important
97 decisions in designing a system. Many, especially smaller, applications
98 will never notice the difference. Nonetheless, we believe that topologically
99 ordered finalization is the right choice.
101 To understand the justification, observe that if <I>A</i>s
102 finalization procedure does not refer to <I>B</i>, we could fairly easily have
103 avoided the dependency. We could have split <I>A</i> into <I>A'</i>
104 and <I>A''</i> such that any references to <I>A</i> become references to
105 <I>A'</i>, <I>A'</i> points to <I>A''</i> but not vice-versa, only fields
106 needed for finalization are stored in <I>A''</i>, and <I>A''</i> is enabled
107 for finalization. (<TT>GC_register_disappearing_link</tt> provides an
108 alternative mechanism that does not require breaking up objects.)
110 Thus assume that <I>A</i> actually does need access to <I>B</i> during
111 finalization. To make things concrete, assume that <I>B</i> is
112 finalizable because it holds a pointer to a C object, which must be
113 explicitly deallocated. (This is likely to be one of the most common
114 uses of finalization.) If <I>B</i> happens to be finalized first,
115 <I>A</i> will see a dangling pointer during its finalization. But a
116 principal goal of garbage collection was to avoid dangling pointers.
118 Note that the client program could enforce topological ordering
119 even if the system didn't. A pointer to <I>B</i> could be stored in
120 some globally visible place, where it is cleared only by <I>A</i>s
121 finalizer. But this puts the burden to ensure safety back on the
122 programmer.
124 With topologically ordered finalization, the programmer
125 can fail to split an object, thus leaving an accidental cycle. This
126 results in a leak, which is arguably less dangerous than a dangling
127 pointer. More importantly, it is <I>much</i> easier to diagnose,
128 since the garbage collector would have to go out of its way not to
129 notice finalization cycles. It can trivially report them.
131 Furthermore unordered finalization does not really solve the problem
132 of cycles. Consider the above case in which <I>A</i>s
133 finalization procedure depends on <I>B</i>, and thus a pointer to <I>B</i>
134 is stored in a global data structure, to be cleared by <I>A</i>s finalizer.
135 If there is an accidental pointer from <I>B</i> back to <I>A</i>, and
136 thus a cycle, neither <I>B</i> nor <I>A</i> will become unreachable.
137 The leak is there, just as in the topologically ordered case, but it is
138 hidden from easy diagnosis.
140 A number of alternative finalization orderings have been proposed, e.g.
141 based on statically assigned priorities. In our opinion, these are much
142 more likely to require complex programming discipline to use in a large
143 modular system. (Some of them, e.g. Guardians proposed by Dybvig,
144 Bruggeman, and Eby, do avoid some problems which arise in combination
145 with certain other collection algorithms.)
147 Fundamentally, a garbage collector assumes that objects reachable
148 via pointer chains may be accessed, and thus should be preserved.
149 Topologically ordered finalization simply extends this to object finalization;
150 an finalizable object reachable from another finalizer via a pointer chain
151 is presumed to be accessible by the finalizer, and thus should not be
152 finalized.
154 <H1>Programming with topological finalization</h1>
155 Experience with Cedar has shown that cycles or long chains of finalizable
156 objects are typically not a problem.
157 Finalizable objects are typically rare.
158 There are several ways to reduce spurious dependencies between finalizable
159 objects. Splitting objects as discussed above is one technique.
160 The collector also provides <TT>GC_register_disappearing_link</tt>, which
161 explicitly nils a pointer before determining finalization ordering.
163 Some so-called "operating systems" fail to clean up some resources associated
164 with a process. These resources must be deallocated at all cost before
165 process exit whether or not they are still referenced. Probably the best
166 way to deal with those is by not relying exclusively on finalization.
167 They should be registered in a table of weak pointers (implemented as
168 disguised pointers cleared by the finalization procedure that deallocates
169 the resource). If any references are still left at process exit, they
170 can be explicitly deallocated then.
172 <H1>Getting around topological finalization ordering</h1>
173 There are certain situations in which cycles between finalizable objects are
174 genuinely unavoidable. Most notably, C++ compilers introduce self-cycles
175 to represent inheritance. <TT>GC_register_finalizer_ignore_self</tt> tells the
176 finalization part of the collector to ignore self cycles.
177 This is used by the C++ interface.
179 Finalize.c actually contains an intentionally undocumented mechanism
180 for registering a finalizable object with user-defined dependencies.
181 The problem is that this dependency information is also used for memory
182 reclamation, not just finalization ordering. Thus misuse can result in
183 dangling pointers even if finalization doesn't create any.
184 The risk of dangling pointers can be eliminated by building the collector
185 with -DJAVA_FINALIZATION. This forces objects reachable from finalizers
186 to be marked, even though this dependency is not considered for finalization
187 ordering.
189 </body>
190 </html>