4 Especially when inlining is active, it can happen that temporary
5 variables add pressure to the register allocator, producing bad
8 The idea is that some of these temporaries can be totally eliminated
9 my moving the MonoInst tree that defines them directly to the use
10 point in the code (so the name "tree mover").
12 Please note that this is *not* an optimization: it is mostly a
13 workaround to issues we have in the regalloc.
14 Actually, with the new linear IR this will not be possible at all
15 (there will be no more trees in the code!).
16 Anyway, this workaround turns out to be useful in the current state
19 -----------------------------------------------------------------------
23 If a local is defined by a value which is a proper expression (a tree
24 of MonoInst, not just another local or a constant), and this definition
25 is used only once, the tree can be moved directly to the use location,
26 and the definition eliminated.
27 Of course, none of the variables used in the tree must be defined in
28 the code path between the definition and the use, and the tree must be
30 We do not handle the cases when the tree is just a local or a constant
31 because they are handled by copyprop and consprop, respectively.
33 To make things simpler, we restrict the tree move to the case when:
34 - the definition and the use are in the same BB, and
35 - the use is followed by another definition in the same BB (it is not
36 possible that the 1st value is used again), or alternatively there
37 is no BB in the whole CFG that contains a use of this local before a
38 definition (so, again, there is no code path that can lead to a
41 To handle this, we maintain an ACT array (Available Copy Tree, similar
42 to the ACP), where we store the "state" of every local.
43 Ideally, every local can be in the following state:
44 [E] Undefined (by a tree, it could be in the ACP but we don't care).
45 [D] Defined (by a tree), and waiting for a use.
46 [U] Used, with a tree definition available in the same BB, but still
47 without a definition following the use (always in the same BB).
48 Of course state [E] (empty) is the initial one.
50 Besides, there are two sort of "meta states", or flags:
51 [W] Still waiting for a use or definition in this BB (we have seen no
52 occurrence of the local yet).
53 [X] Used without being previously defined in the same BB (note that if
54 there is a definition that precedes the use in the same BB, even if
55 the definition is not a tree or is not available because of side
56 effects or because the tree value has changed the local is not in
58 Also note that state [X] is a sort of "global" condition, which if set
59 in one BB will stay valid for the whole CFG, even if the local will
60 otherwise change state. The idea of flagging a local as [X] is that if
61 there is a definition/use pair that reaches the end of a BB, it could
62 be that there is a CFG path that then leads to the BB flagging it as
63 [X] (which contains a use), so the tree cannot be moved.
64 So state [X] will always be set, and never examined in all the state
65 transitions we will describe.
66 In practice, we use flag [W] to set state [X]: if, when traversing a
67 BB, we find a use for a local in state [W], then that local is flagged
71 For each BB, we initialize all states to [E] and [W], and then we
72 traverse the code one inst at a time, and update the variable states
73 in the ACT in the following ways:
76 - Flag [W] is cleared.
77 - All "affected trees" are killed (go from state [D] to [E]).
78 The "affected trees" are the trees which contain (use) the defined
79 local, and the rationale is that the tree value changed, so the
80 tree is no longer available.
81 - If the local was in state [U], *that* tree move is marked "safe"
82 (because *this* definition makes us sure that the previous tree
83 cannot be used again in any way).
84 The idea is that "safe" moves can happen even if the local is
85 flagged [X], because the second definition "covers" the use.
86 The tree move is then saved in the "todo" list (and the affecting
88 - If the local was defined by a tree, it goes to state [D], the tree
89 is recorded, and all the locals used in it are marked as "affecting
90 this tree" (of course these markers are lists, because each local
91 could affect more than one tree).
94 - All potentially affected trees (in state [D]) are killed.
97 - If the local is still [W], it is flagged [X] (the [W] goes away).
98 - If the local is in state [D], it goes to state [U].
99 The tree move must not yet be recorded in the "todo" list, it still
100 stays in the ACT slot belonging to this local.
101 Anyway, the "affecting" nodes are updated, because now a definition
102 of a local used in this tree will affect only "indirect" (or also
103 "propagated") moves, but not *this* move (see below).
104 - If the local is in state [U], then the tree cannot be moved (it is
105 used two times): the move is canceled, and the state goes [E].
106 - If the local is in state [E], the use is ignored.
109 - All potentially affected trees (in state [D] or [U]) are killed.
112 - Tree is marked as "unmovable".
114 Then, at the end of the BB, for each ACT slot:
115 - If state is [U], the tree move is recorded in the "todo" list, but
117 - Anyway, state goes to [E], the [W] flag is set, and all "affecting"
118 lists are cleared (we get ready to traverse the next BB).
119 Finally, when all BBs has been scanned, we traverse the "todo" list,
120 moving all "safe" entries, and moving "unsafe" ones only if their ACT
121 slot is not flagged [X].
124 But there are two issues that make things harder :-(
126 The first is the concept of "indirect tree move".
127 It can happen that a tree is scheduled for moving, and its destination
128 is a use that is located in a second tree, which could also be moved.
129 The main issue is that a definition of a variable of the 1st tree on
130 the path between the definition and the use of the 2nd one must prevent
132 But which move? The 1st or the 2nd?
133 Well, any of the two!
134 The point is, the 2nd move must be prevented *only* if the 1st one
135 happens: if it is aborted (for an [X] flag or any other reason), the
136 2nd move is OK, and vice versa...
137 We must handle this in the following way:
138 - The ACT must still remember if a slot is scheduled for moving in
139 this BB, and if it is, all the locals used in the tree.
140 We say that the slot is in state [M].
141 Note that [M] is (like [X] and [W]) a sort of "meta state": a local
142 is flagged [M] when it goes to state [U], and the flag is cleared
143 when the tree move is cancelled
144 - A tree that uses a local whose slot is in state [M] is also using all
145 the locals used by the tree in state [M], but the use is "indirect".
146 These use nodes are also included in the "affecting" lists.
147 - The definition of a variable used in an "indirect" way has the
148 effect of "linking" the two involved tree moves, saying that only one
149 of the two can happen in practice, but not both.
150 - When the 2nd tree is scheduled for moving, the 1st one is *still* in
151 state [M], because a third move could "carry it forward", and all the
152 *three* moves should be mutually exclusive (to be safe!).
154 The second tricky complication is the "tree forwarding" that can happen
155 when copyprop is involved.
156 It is conceptually similar to the "indirect tree move".
157 Only, the 2nd tree is not really a tree, it is just the local defined
158 in the 1st tree move.
159 It can happen that copyprop will propagate the definition.
160 We cannot make treeprop do the same job of copyprop, because copyprop
161 has less constraints, and is therefore more powerful in its scope.
162 The main issue is that treeprop cannot propagate a tree to *two* uses,
163 while copyprop is perfectly capable of propagating one definition to
164 two (or more) different places.
165 So we must let copyprop do its job otherwise we'll miss optimizations,
166 but we must also make it play safe with treeprop.
167 Let's clarify with an example:
168 a = v1 + v2; //a is defined by a tree, state [D], uses v2 and v2
169 b = a; //a is used, state [U] with move scheduled, and
170 //b is defined by a, ACP[b] is a, and b is in state [DC]
171 c = b + v3; // b is used, goes to state [U]
172 The real trouble is that copyprop happens *immediately*, while treeprop
173 is deferred to the end of the CFG traversal.
174 So, in the 3rd statement, the "b" is immediately turned into an "a" by
175 copyprop, regardless of what treeprop will do.
176 Anyway, if we are careful, this is not so bad.
177 First of all, we must "accept" the fact that in the 3rd statement the
178 "b" is in fact an "a", as treeprop must happen *after* copyprop.
179 The real problem is that "a" is used twice: in the 2nd and 3rd lines.
180 In our usual setup, the 2nd line would set it to [U], and the 3rd line
181 would kill the move (and set "a" to [E]).
182 I have tried to play tricks, and reason as of copyprop didn't happen,
183 but everything becomes really messy.
184 Instead, we should note that the 2nd line is very likely to be dead.
185 At least in this BB, copyprop will turn all "b"s into "a"s as long as
186 it can, and when it cannot, it will be because either "a" or "b" have
187 been redefined, which would be after the tree move anyway.
188 So, the reasoning gets different: let's pretend that "b" will be dead.
189 This will make the "a" use in the 2nd statement useless, so there we
190 can "reset" "a" to [D], but also take note that if "b" will end up
191 not being dead, the tree move associated to this [D] must be aborted.
192 We can detect this in the following way:
193 - Either "b" is used before being defined in this BB, or
194 - It will be flagged "unsafe".
195 Both things are very easy to check.
196 The only quirk is that the "affecting" lists must not be cleared when
197 a slot goes to state [U], because a "propagation" could put it back
198 to state [D] (where those lists are needed, because it can be killed
199 by a definition to a used slot).
201 -----------------------------------------------------------------------
205 All the implementation runs inside the existing mono_local_cprop
206 function, and a separate memory pool is used to hold the temporary
209 A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
210 the list of scheduled moves and auxiliary things.
211 This struct is allocated if the tree move pass is requested, and is
212 then passed along to all the involved functions, which are therefore
213 aware of the tree mover state.
215 The ACT is an array of slots, obviously one per local.
216 Each slot is of type MonoTreeMoverActSlot, and contains the used and
217 affected locals, a pointer to the pending tree move and the "waiting"
220 The "affecting" lists a built from "dependency nodes", of type
221 MonoTreeMoverDependencyNode.
222 Each of the nodes contains the used and affected local, and is in
223 two lists: the locals used by a slot, and the locals affected by a
224 slot (obviously a different one).
225 So, each node means: "variable x is used in tree t, so a definition
226 of x affects tree t".
227 The "affecting" lists are doubly linked, to allow for O(1) deletion.
228 The "used" lists are simply linked, but when they are mantained there
229 is always a pointer to the last element to allow for O(1) list moving.
230 When a used list is dismissed (which happens often, any time a node is
231 killed), its nodes are unlinked from their respective affecting lists
232 and are then put in a "free" list in the MonoTreeMover to be reused.
234 Each tree move is represented by a struct (MonoTreeMoverTreeMove),
236 - the definition and use points,
237 - the "affected" moves (recall the concept of "indirect tree move"),
238 - the "must be dead" slots (recall "tree forwarding"). and
239 - a few utility flags.
240 The tree moves stays in the relevant ACT slot until it is ready to be
241 scheduled for moving, at which point it is put in a list in the
243 The tree moves structs are reused when they are killed, so there is
244 also a "free" list for them in the MonoTreeMover.
246 The tree mover code has been added to all the relevant functions that
247 participate in consprop and copyprop, particularly:
248 - mono_cprop_copy_values takes care of variable uses (transitions from
249 states [D] to [U] and [U] to [E] because of killing),
250 - mono_cprop_invalidate_values takes care of side effects (indirect
252 - mono_local_cprop_bb sets up and cleans the traversals for each BB,
253 and for each MonoInst it takes care of variable definitions.
254 To each of them has been added a MonoTreeMover parameter, which is not
255 NULL if the tree mover is running.
256 After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
257 the list of all the pending moves, which must be walked to actually
258 perform the moves (when possible, because "unsafe" flags, "affected"
259 moves and "must be dead" slots can still have their effects, which
260 must be handled now because they are fully known only at the end of