2 Arrays Bounds Check Elimination (ABC)
5 Massimiliano Mantione (mass@ximian.com)
7 Here "abc" stays for "array bounds check", or "array bound checks", or
8 some combination of the two.
12 Simply use the "abcrem" optimization invoking mono.
14 To see if bound checks are actually removed, use "mono -v" and
15 grep for "ARRAY-ACCESS" in the output, there should be a
16 message for each check that has been removed.
18 To trace the algorithm execution, use "-v -v -v", and be
19 prepared to be totally submersed by debugging messages...
23 The abc removal code can now always remove bound checks from
24 "clean" array scans in loops, and generally anyway there are
25 clear conditions that already state that the index is "safe".
27 To be clearer, and give an idea of what the algorithm can and
28 cannot do without describing it in detail... keep in mind that
29 only "redundant" checks cannot be removed. By "redundant", I
30 mean "already explicitly checked" in the method code.
32 Unfortunately, analyzing complex expressions is not so easy
33 (see below for details), so the capabilities of this "abc
36 These are simple guidelines:
38 - Only expressions of the following kinds are handled:
40 - variable [+/- constant]
41 - Only comparisons between "handled" expressions are understood.
42 - "switch" statements are not yet handled.
44 This means that code like this will be handled well:
46 for (int i = 0; i < a.Length; i++) {
50 The "i" variable could be declared out of the "for", the "for"
51 could be a "while", and maybe even implemented with "goto",
52 the array could be scanned in reverse order, and everything
55 I could have a temporary variable storing the array length,
56 and check on it inside the loop, and the abc removal would
57 still occurr, like this:
68 for (int i = l; i > 0; i--) {
72 The following two examples would work:
74 for (int i = 0; i < (a.Length -1); i++) .....
75 for (int i = 0; i < a.Length; i += 2) .....
77 But just something like this
80 for (int i = 0; i < (a.Length + delta); i++) .....
85 for (int i = 0; i < a.Length; i += delta) .....
87 would not work, the check would stay there. (unless
88 a combination of cfold, consprop and copyprop is used, too,
89 which would make the constant value of "delta" explicit).
91 Just to make you understand how things are tricky... this would work!
93 int limit = a.Length - 1;
94 for (int i = 0; i < limit; i++) {
98 A detailed explanation of the reason why things are done like
103 This array bound check removal (abc removal) algorithm is
104 based on symbolic execution concepts. I handle the removal
105 like an "unreachable code elimination" (and in fact the
106 optimization could be extended to remove also other
107 unreachable sections of code, due to branches that "always go
110 In symbolic execution, variables do not have actual (numeric)
111 values, but instead symbolic expressions (like "a", or "x+y").
112 Also, branch conditions are handled like symbolic conditions
113 (like "i<k"), which state relations between variable values.
115 The SSA representation inside mini is somewhat close to a
116 symbolic representation of the execution of the compiled
119 Particularly, the representation of variable values is exactly
120 a symbolic one. It is enough to find all CEE_STIND_*
121 instructions which store to a local variable, and their second
122 argument is exactly the variable value. Actually, "cfg->vars
123 [<variable-index>]->def" should contain exactly those store
124 instructions, and the "get_variable_value_from_ssa_store"
125 function extracts the variable value from there.
127 On the other hand, the conditions under which each basic block
128 is executed are not made fully explicit.
130 However, it is not difficult to make them so.
132 Each BB that has more than one exit BB, in practice must end
133 either with a conditional branch instruction or with a switch
136 In the first case, the BB has exactly two exit BBs, and their
137 execution conditions are easy to get from the condition of the
138 branch (see the "get_relation_from_branch_instruction"
139 function, and especially the end of "analyze_block" in
142 If there is a switch, the jump condition of every exit BB is
143 the equality of the switch argument with the particular index
144 associated with its case (but the current implementation does
145 not handle switch statements yet).
147 These individual conditions are in practice associated to each
148 arc that connects BBs in the CFG (with the simple case that
149 unconditional branches have a "TRUE" condition, because they
152 So, for each BB, its *proper* entry condition is the union of
153 all the conditions associated to arcs that enter the BB. The
154 "union" is like a logical "or", in the sense that either of
155 the condition could be true, they are not necessarily all
156 true. This means that if I can enter a BB in two ways, and in
157 one case I know that "x>0", and in the other that "x==0",
158 actually in the BB I know that "x>=0", which is a weaker
159 condition (the union of the two).
161 Also, the *complete* entry condition for a BB is the
162 "intersection" of all the entry conditions of its
163 dominators. This is true because each dominator is the only
164 way to reach the BB, so the entry condition of each dominator
165 must be true if the control flow reached the BB. This
166 translates to the logical "and" of all the "proper" conditions
167 of the BBs met walking up in the dominator tree. So, if one
168 says "x>0", and another "x==1", then I know that "x==1", which
169 is a stronger condition (the intersection of the two).
171 Note that, if the two conditions were "x>0" and "x==0", then
172 the block would be unreachable (the intersection is empty),
173 because some branch is impossible.
175 Another observation is that, inside each BB, every variable is
176 subject to the complete entry condition of that very same BB,
177 and not the one in which it is defined (with the "complete
178 entry condition" being the thing I defined before, sorry if
179 these terms "proper" and "complete" are strange, I found
182 This happens because the branch conditions are related to the
183 control flow. I can define "i=a", and if I am in a BB where
184 "a>0", then "i>0", but not otherwise.
186 So, intuitively, if the available conditions say "i>=0", and i
187 is used as an index in an array access, then the lower bound
188 check can be omitted. If the condition also says
189 "(i>=0)&&(i<array.length)", the abc removal can occur.
191 So, a complete solution to the problem of abc removal would be
192 the following: for each array access, build a system of
193 equations containing:
195 [1] all the symbolic variable definitions
197 [2] the complete entry condition of the BB in which
198 the array access occurs
200 [3] the two "goal functions" ("index >=0" and "index <
203 If the system is valid for *each possible* variable value, then the goal
204 functions are always true, and the abc can be removed.
206 All this discussion is useful to give a precise specification
207 to the problem we are trying to solve.
209 The trouble is that, in the general case, the resulting system
210 of equations is like a predicate in first order logic, which
211 is semi-decidable, and its general solution is anyway too
212 complex to be attempted in a JIT compiler (which should not
213 contain a full fledged theorem prover).
215 Therefore, we must cut some corner.
217 There is also another big problem, which is caused by
218 "recursive" symbolic definitions. These definition can (and
219 generally do) happen every time there is a loop. For instance,
220 in the following piece of code:
222 for ( int i = 0; i < array.length; i++ ) {
223 Console.WriteLine( "array [i] = " + array [i] );
226 one of the definitions of i is a PHI that can be either 0 or
229 Now, we know that mathematically "i = i + 1" does not make
230 sense, and in fact symbolic values are not "equations", they
231 are "symbolic definitions".
233 The actual symbolic value of i is a generic "n", where "n" is
234 the number of iterations of the loop, but this is terrible to
235 handle (and in more complex examples the symbolic value of i
236 simply cannot be written, because i is calculated in an
239 However, the definition "i = i + 1" tells us something about
240 i: it tells us that i "grows". So (from the PHI definition) we
241 know that i is either 0, or "grows". This is enough to tell
242 that "i>=0", which is what we want! It is important to note
243 that recursive definitions can only occurr inside PHI
244 definitions, because actually a variable cannot be defined
245 *only* in terms of itself!
247 At this point, I can explain which corners I want to cut to
248 make the problem solvable. It will not remove all the abc that
249 could theoretically be removed, but at least it will work.
251 The easiest way to cut corners is to only handle expressions
252 which are "reasonably simple", and ignore the rest.
254 Keep in mind that ignoring an expression is not harmful in
255 itself. The algorithm will be simply "less powerful", because
256 it will ignore conditions that could have caused to the
257 removal of an abc, but will not remove checks "by mistake" (so
258 the resulting code will be in any case correct).
260 The expressions we handle are the following (all of integer
265 - variable + constant
266 - constant + variable
267 - variable - constant
269 And, of course, PHI definitions.
271 Any other expression causes the introduction of an "any" value
272 in the evaluation, which makes all values that depend from it
275 We will call these kind of definitions "summarizable"
278 In a first attempt, we can consider only branch conditions
279 that have the simplest possible form (the comparison of two
280 summarizable expressions).
282 We can also simplify the effect of variable definitions,
283 keeping only what is relevant to know: their value range with
284 respect to zero and with respect to the length of the array we
285 are currently handling.
287 One particular note on PHI functions: they work (obviously)
288 like the logical "or" of their definitions, and therefore are
289 equivalent to the "logical or" of the summarization of their
292 About recursive definitions (which, believe me, are the worst
293 thing in all this mess), we handle only "monotonic" ones. That
294 is, we try to understand if the recursive definition (which,
295 as we said above, must happen because of a loop) always
296 "grows" or "gets smaller". In all other cases, we decide we
299 One critical thing, once we have defined all these data
300 structures, is how the evaluation is actually performed.
302 In a first attempt I coded a "brute force" approach, which for
303 each BB tried to examine all possible conditions between all
304 variables, filling a sort of "evaluation matrix". The problem
305 was that the complexity of this evaluation was quadratic (or
306 worse) on the number of variables, and that many variables
307 were examined even if they were not involved in any array
310 Following the ABCD paper:
312 http://citeseer.ist.psu.edu/bodik00abcd.html
314 I rewrote the algorithm in a more "sparse" way.
316 Now, the main data structure is a graph of relations between
317 variables, and each attempt to remove a check performs a
318 traversal of the graph, looking for a path from the index to
319 the array length that satisfies the properties "index >= 0"
320 and "index < length". If such a path is found, the check is
321 removed. It is true that in theory *each* traversal has a
322 complexity which is exponential on the number of variables,
323 but in practice the graph is not very connected, so the
324 traversal terminates quickly.
327 Then, the algorithm to optimize one method looks like this:
331 [1a] Build the SSA representation.
333 [1b] Prepare the evaluation graph (empty)
335 [1b] Summarize each variable definition, and put
336 the resulting relations in the evaluation
339 [2] Analyze each BB, starting from the entry point and
340 following the dominator tree:
342 [2a] Summarize its entry condition, and put the resulting relations
343 in the evaluation graph (this is the reason
344 why the BBs are examined following the
345 dominator tree, so that conditions are added
346 to the graph in a "cumulative" way)
348 [2b] Scan the BB instructions, and for each array
349 access perform step [3]
351 [2c] Process children BBs following the dominator
354 [2d] Remove from the evaluation area the conditions added at step [2a]
355 (so that backtracking along the tree the area
358 [3] Attempt the removal:
360 [3a] Summarize the index expression, to see if we can handle it; there
361 are three cases: the index is either a
362 constant, or a variable (with an optional
363 delta) or cannot be handled (is a "any")
365 [3b] If the index can be handled, traverse the evaluation area searching
366 a path from the index variable to the array
367 length (if the index is a constant, just
368 examine the array length to see if it has
369 some relation with this constant)
371 [3c] Use the results of step [3b] to decide if the check is redundant