[official-gcc.git] / gcc / tree-predcom.c
/* Predictive commoning.
   Copyright (C) 2005-2021 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
/* This file implements the predictive commoning optimization.  Predictive
   commoning can be viewed as CSE around a loop, and with some improvements,
   as generalized strength reduction -- i.e., reusing values computed in
   earlier iterations of a loop in the later ones.  So far, the pass only
   handles the most useful case, that is, reusing values of memory references.
   If you think this is all just a special case of PRE, you are sort of right;
   however, concentrating on loops is simpler, and makes it possible to
   incorporate data dependence analysis to detect the opportunities, perform
   loop unrolling to avoid copies together with renaming immediately,
   and if needed, we could also take register pressure into account.

   Let us demonstrate what is done on an example:

   for (i = 0; i < 100; i++)
     {
       a[i+2] = a[i] + a[i+1];
       b[10] = b[10] + i;
       c[i] = c[99 - i];
       d[i] = d[i + 1];
     }
   1) We find data references in the loop, and split them to mutually
      independent groups (i.e., we find components of a data dependence
      graph).  We ignore read-read dependences whose distance is not constant.
      (TODO -- we could also ignore antidependences).  In this example, we
      find the following groups:

      a[i]{read}, a[i+1]{read}, a[i+2]{write}
      b[10]{read}, b[10]{write}
      c[99 - i]{read}, c[i]{write}
      d[i + 1]{read}, d[i]{write}

   2) Inside each of the groups, we verify several conditions:
      a) all the references must differ in indices only, and the indices
	 must all have the same step
      b) the references must dominate the loop latch (and thus, they must be
	 ordered by the dominance relation).
      c) the distance of the indices must be a small multiple of the step
      We are then able to compute the difference of the references (# of
      iterations before they point to the same place as the first of them).
      Also, in case there are writes in the loop, we split the groups into
      chains whose head is the write whose values are used by the reads in
      the same chain.  The chains are then processed independently,
      making the further transformations simpler.  Also, the shorter chains
      need the same number of registers, but may require a lower unrolling
      factor in order to get rid of the copies on the loop latch.
   In our example, we get the following chains (the chain for c is invalid).

   a[i]{read,+0}, a[i+1]{read,-1}, a[i+2]{write,-2}
   b[10]{read,+0}, b[10]{write,+0}
   d[i + 1]{read,+0}, d[i]{write,+1}

   3) For each read, we determine the read or write whose value it reuses,
      together with the distance of this reuse.  I.e. we take the last
      reference before it with distance 0, or the last of the references
      with the smallest positive distance to the read.  Then, we remove
      the references that are not used in any of these chains, discard the
      empty groups, and propagate all the links so that they point to the
      single root reference of the chain (adjusting their distance
      appropriately).  Some extra care needs to be taken for references with
      step 0.  In our example (the numbers indicate the distance of the
      reuse),

      a[i] --> (*) 2, a[i+1] --> (*) 1, a[i+2] (*)
      b[10] --> (*) 1, b[10] (*)

   4) The chains are combined together if possible.  If the corresponding
      elements of two chains are always combined together with the same
      operator, we remember just the result of this combination, instead
      of remembering the values separately.  We may need to perform
      reassociation to enable combining, for example

      e[i] + f[i+1] + e[i+1] + f[i]

      can be reassociated as

      (e[i] + f[i]) + (e[i+1] + f[i+1])

      and we can combine the chains for e and f into one chain.
   5) For each root reference (end of the chain) R, let N be maximum distance
      of a reference reusing its value.  Variables R0 up to RN are created,
      together with phi nodes that transfer values from R1 .. RN to
      R0 .. R(N-1).
      Initial values are loaded to R0..R(N-1) (in case not all references
      must necessarily be accessed and they may trap, we may fail here;
      TODO sometimes, the loads could be guarded by a check for the number
      of iterations).  Values loaded/stored in roots are also copied to
      RN.  Other reads are replaced with the appropriate variable Ri.
      Everything is put to SSA form.

      As a small improvement, if R0 is dead after the root (i.e., all uses of
      the value with the maximum distance dominate the root), we can avoid
      creating RN and use R0 instead of it.

      In our example, we get (only the parts concerning a and b are shown):

      for (i = 0; i < 100; i++)
	{
	  f = phi (a[0], s);
	  s = phi (a[1], f);
	  x = phi (b[10], x);

	  f = f + s;
	  a[i+2] = f;
	  x = x + i;
	  b[10] = x;
	}
   6) Factor F for unrolling is determined as the smallest common multiple of
      (N + 1) for each root reference (N for references for which we avoided
      creating RN).  If F and the loop are small enough, the loop is unrolled
      F times.  The stores to RN (R0) in the copies of the loop body are
      periodically replaced with R0, R1, ... (R1, R2, ...), so that they can
      be coalesced and the copies can be eliminated.

      TODO -- copy propagation and other optimizations may change the live
      ranges of the temporary registers and prevent them from being coalesced;
      this may increase the register pressure.

      In our case, F = 2 and the (main loop of the) result is

      for (i = 0; i < ...; i += 2)
	{
	  f = phi (a[0], f);
	  s = phi (a[1], s);
	  x = phi (b[10], x);

	  f = f + s;
	  a[i+2] = f;
	  x = x + i;
	  b[10] = x;

	  s = s + f;
	  a[i+3] = s;
	  x = x + i;
	  b[10] = x;
	}
   Apart from predictive commoning on Load-Load and Store-Load chains, we
   also support Store-Store chains -- stores killed by other stores can be
   eliminated.  Consider the example below:

   for (i = 0; i < n; i++)
     {
       a[i] = 1;
       a[i+2] = 2;
     }

   It can be replaced with:

   t0 = a[0];
   t1 = a[1];
   for (i = 0; i < n; i++)
     {
       a[i] = 1;
       t2 = 2;
       t0 = t1;
       t1 = t2;
     }
   a[n] = t0;
   a[n+1] = t1;

   If the loop runs more than one iteration, it can be further simplified into:

   for (i = 0; i < n; i++)
     a[i] = 1;

   a[n] = 2;
   a[n+1] = 2;

   The interesting part is that this can be viewed either as general store
   motion or as general dead store elimination in either an intra- or
   inter-iteration way.
   With trivial effort, we also support a load inside Store-Store chains if
   the load is dominated by a store statement in the same iteration of the
   loop.  You can see this as a restricted Store-Mixed-Load-Store chain.

   TODO: For now, we don't support store-store chains in multi-exit loops.
   We force no unrolling in case of a store-store chain even if other chains
   might ask for an unroll.

   Predictive commoning can be generalized for arbitrary computations (not
   just memory loads), and also nontrivial transfer functions (e.g., replacing
   i * i with ii_last + 2 * i + 1), to generalize strength reduction.  */
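The transformation described in steps 5) and 6) above can be sketched as plain, runnable C.  This is a hand-applied illustration, not the pass itself: `recurrence_orig` is the first loop from the example, and `recurrence_pcom` replaces the loads of `a[i]` and `a[i+1]` with the rotating scalars `f` and `s`, unrolled by F = 2 so the copies between them disappear (the function names and the scalar epilogue are illustrative additions).

```c
#include <assert.h>

/* Original form of the first loop from the comment above: each iteration
   re-reads values written one and two iterations earlier.  */
static void
recurrence_orig (int *a, int n)
{
  for (int i = 0; i < n - 2; i++)
    a[i + 2] = a[i] + a[i + 1];
}

/* The same computation after predictive commoning by hand: the loads of
   a[i] and a[i+1] become the rotating scalars f and s (the variables R0
   and R1 of step 5), and unrolling by F = 2 removes the copies between
   them (step 6).  */
static void
recurrence_pcom (int *a, int n)
{
  if (n < 3)
    return;
  int f = a[0], s = a[1];	/* initial values of the chain */
  int i = 0;
  for (; i + 1 < n - 2; i += 2)
    {
      f = f + s;		/* f now holds a[i+2] */
      a[i + 2] = f;
      s = s + f;		/* s now holds a[i+3] */
      a[i + 3] = s;
    }
  /* Epilogue for an odd number of remaining iterations.  */
  for (; i < n - 2; i++)
    a[i + 2] = a[i] + a[i + 1];
}
```

Both versions fill the array with the same Fibonacci-like sequence; the commoned version simply performs no loads in the steady state.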
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "rtl.h"
#include "tree.h"
#include "gimple.h"
#include "predict.h"
#include "tree-pass.h"
#include "ssa.h"
#include "gimple-pretty-print.h"
#include "alias.h"
#include "fold-const.h"
#include "cfgloop.h"
#include "tree-eh.h"
#include "gimplify.h"
#include "gimple-iterator.h"
#include "gimplify-me.h"
#include "tree-ssa-loop-ivopts.h"
#include "tree-ssa-loop-manip.h"
#include "tree-ssa-loop-niter.h"
#include "tree-ssa-loop.h"
#include "tree-into-ssa.h"
#include "tree-dfa.h"
#include "tree-ssa.h"
#include "tree-data-ref.h"
#include "tree-scalar-evolution.h"
#include "tree-affine.h"
#include "builtins.h"
/* The maximum number of iterations between the considered memory
   references.  */

#define MAX_DISTANCE (target_avail_regs < 16 ? 4 : 8)
/* Data references (or phi nodes that carry data reference values across
   loop iterations).  */

typedef class dref_d
{
public:
  /* The reference itself.  */
  struct data_reference *ref;

  /* The statement in which the reference appears.  */
  gimple *stmt;

  /* In case that STMT is a phi node, this field is set to the SSA name
     defined by it in replace_phis_by_defined_names (in order to avoid
     pointing to a phi node that got reallocated in the meantime).  */
  tree name_defined_by_phi;

  /* Distance of the reference from the root of the chain (in number of
     iterations of the loop).  */
  unsigned distance;

  /* Number of iterations offset from the first reference in the component.  */
  widest_int offset;

  /* Number of the reference in a component, in dominance ordering.  */
  unsigned pos;

  /* True if the memory reference is always accessed when the loop is
     entered.  */
  unsigned always_accessed : 1;
} *dref;
/* Type of the chain of the references.  */

enum chain_type
{
  /* The addresses of the references in the chain are constant.  */
  CT_INVARIANT,

  /* There are only loads in the chain.  */
  CT_LOAD,

  /* Root of the chain is a store, the rest are loads.  */
  CT_STORE_LOAD,

  /* There are only stores in the chain.  */
  CT_STORE_STORE,

  /* A combination of two chains.  */
  CT_COMBINATION
};
/* Chains of data references.  */

typedef struct chain
{
  /* Type of the chain.  */
  enum chain_type type;

  /* For combination chains, the operator and the two chains that are
     combined, and the type of the result.  */
  enum tree_code op;
  tree rslt_type;
  struct chain *ch1, *ch2;

  /* The references in the chain.  */
  auto_vec<dref> refs;

  /* The maximum distance of the reference in the chain from the root.  */
  unsigned length;

  /* The variables used to copy the value throughout iterations.  */
  auto_vec<tree> vars;

  /* Initializers for the variables.  */
  auto_vec<tree> inits;

  /* Finalizers for the eliminated stores.  */
  auto_vec<tree> finis;

  /* gimple stmts initializing the initial variables of the chain.  */
  gimple_seq init_seq;

  /* gimple stmts finalizing the eliminated stores of the chain.  */
  gimple_seq fini_seq;

  /* True if there is a use of a variable with the maximal distance
     that comes after the root in the loop.  */
  unsigned has_max_use_after : 1;

  /* True if all the memory references in the chain are always accessed.  */
  unsigned all_always_accessed : 1;

  /* True if this chain was combined together with some other chain.  */
  unsigned combined : 1;

  /* True if this is a store elimination chain and the eliminated stores
     store a loop invariant value into memory.  */
  unsigned inv_store_elimination : 1;
} *chain_p;
/* Describes the knowledge about the step of the memory references in
   the component.  */

enum ref_step_type
{
  /* The step is zero.  */
  RS_INVARIANT,

  /* The step is nonzero.  */
  RS_NONZERO,

  /* The step may or may not be nonzero.  */
  RS_ANY
};
/* Components of the data dependence graph.  */

struct component
{
  /* The references in the component.  */
  auto_vec<dref> refs;

  /* What we know about the step of the references in the component.  */
  enum ref_step_type comp_step;

  /* True if all references in component are stores and we try to do
     intra/inter loop iteration dead store elimination.  */
  bool eliminate_store_p;

  /* Next component in the list.  */
  struct component *next;
};
/* A class to encapsulate the global states used for predictive
   commoning work on top of one given LOOP.  */

class pcom_worker
{
public:
  pcom_worker (loop_p l) : m_loop (l), m_cache (NULL) {}

  ~pcom_worker ()
  {
    free_data_refs (m_datarefs);
    free_dependence_relations (m_dependences);
    free_affine_expand_cache (&m_cache);
    release_chains ();
  }

  pcom_worker (const pcom_worker &) = delete;
  pcom_worker &operator= (const pcom_worker &) = delete;

  /* Performs predictive commoning.  */
  unsigned tree_predictive_commoning_loop (bool allow_unroll_p);

  /* Perform the predictive commoning optimization for chains; public
     because it is called in the callback execute_pred_commoning_cbck.  */
  void execute_pred_commoning (bitmap tmp_vars);

private:
  /* The pointer to the given loop.  */
  loop_p m_loop;

  /* All data references.  */
  auto_vec<data_reference_p, 10> m_datarefs;

  /* All data dependences.  */
  auto_vec<ddr_p, 10> m_dependences;

  /* All chains.  */
  auto_vec<chain_p> m_chains;

  /* Bitmap of ssa names defined by looparound phi nodes covered by chains.  */
  auto_bitmap m_looparound_phis;

  typedef hash_map<tree, name_expansion *> tree_expand_map_t;
  /* Cache used by tree_to_aff_combination_expand.  */
  tree_expand_map_t *m_cache;

  /* Splits the dependence graph to components.  */
  struct component *split_data_refs_to_components ();

  /* Check the conditions on references inside each of components COMPS,
     and remove the unsuitable components from the list.  */
  struct component *filter_suitable_components (struct component *comps);

  /* Find roots of the values and determine distances in components COMPS,
     and separate the references into chains.  */
  void determine_roots (struct component *comps);

  /* Prepare initializers for chains, and free chains that cannot
     be used because the initializers might trap.  */
  void prepare_initializers ();

  /* Generates finalizer memory references for chains.  Returns true if
     finalizer code generation for chains breaks loop closed ssa form.  */
  bool prepare_finalizers ();

  /* Try to combine the chains.  */
  void try_combine_chains ();

  /* Frees CHAINS.  */
  void release_chains ();

  /* Frees a chain CHAIN.  */
  void release_chain (chain_p chain);

  /* Prepare initializers for CHAIN.  Returns false if this is impossible
     because one of these initializers may trap, true otherwise.  */
  bool prepare_initializers_chain (chain_p chain);

  /* Generates finalizer memory references for CHAIN.  Returns true
     if finalizer code for CHAIN can be generated, otherwise false.  */
  bool prepare_finalizers_chain (chain_p chain);

  /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET.  */
  void aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset);

  /* Determines the number of iterations of the innermost enclosing loop
     before B refers to exactly the same location as A and stores it to
     OFF.  */
  bool determine_offset (struct data_reference *a, struct data_reference *b,
			 poly_widest_int *off);

  /* Returns true if the component COMP satisfies the conditions
     described in 2) at the beginning of this file.  */
  bool suitable_component_p (struct component *comp);

  /* Returns true if REF is a valid initializer for ROOT with given
     DISTANCE (in iterations of the innermost enclosing loop).  */
  bool valid_initializer_p (struct data_reference *ref, unsigned distance,
			    struct data_reference *root);

  /* Finds the looparound phi node of the loop that copies the value of
     REF.  */
  gphi *find_looparound_phi (dref ref, dref root);

  /* Find roots of the values and determine distances in the component
     COMP.  The references are redistributed into chains.  */
  void determine_roots_comp (struct component *comp);

  /* For references in CHAIN that are copied around the loop, add the
     results of such copies to the chain.  */
  void add_looparound_copies (chain_p chain);

  /* Returns the single statement in which NAME is used, excepting
     the looparound phi nodes contained in one of the chains.  */
  gimple *single_nonlooparound_use (tree name);

  /* Remove statement STMT, as well as the chain of assignments in which
     it is used.  */
  void remove_stmt (gimple *stmt);

  /* Perform the predictive commoning optimization for a chain CHAIN.  */
  void execute_pred_commoning_chain (chain_p chain, bitmap tmp_vars);

  /* Returns the modify statement that uses NAME.  */
  gimple *find_use_stmt (tree *name);

  /* If the operation used in STMT is associative and commutative, go
     through the tree of the same operations and return its root.  */
  gimple *find_associative_operation_root (gimple *stmt, unsigned *distance);

  /* Returns the common statement in which NAME1 and NAME2 have a use.  */
  gimple *find_common_use_stmt (tree *name1, tree *name2);

  /* Checks whether R1 and R2 are combined together using CODE, with the
     result in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order
     R2 CODE R1 if it is true.  */
  bool combinable_refs_p (dref r1, dref r2, enum tree_code *code, bool *swap,
			  tree *rslt_type);

  /* Reassociates the expression in which NAME1 and NAME2 are used so that
     they are combined in a single statement, and returns this statement.  */
  gimple *reassociate_to_the_same_stmt (tree name1, tree name2);

  /* Returns the statement that combines references R1 and R2.  */
  gimple *stmt_combining_refs (dref r1, dref r2);

  /* Tries to combine chains CH1 and CH2 together.  */
  chain_p combine_chains (chain_p ch1, chain_p ch2);
};
/* Dumps data reference REF to FILE.  */

extern void dump_dref (FILE *, dref);
void
dump_dref (FILE *file, dref ref)
{
  if (ref->ref)
    {
      fprintf (file, "    ");
      print_generic_expr (file, DR_REF (ref->ref), TDF_SLIM);
      fprintf (file, " (id %u%s)\n", ref->pos,
	       DR_IS_READ (ref->ref) ? "" : ", write");

      fprintf (file, "      offset ");
      print_decs (ref->offset, file);
      fprintf (file, "\n");

      fprintf (file, "      distance %u\n", ref->distance);
    }
  else
    {
      if (gimple_code (ref->stmt) == GIMPLE_PHI)
	fprintf (file, "    looparound ref\n");
      else
	fprintf (file, "    combination ref\n");
      fprintf (file, "      in statement ");
      print_gimple_stmt (file, ref->stmt, 0, TDF_SLIM);
      fprintf (file, "\n");
      fprintf (file, "      distance %u\n", ref->distance);
    }
}
/* Dumps CHAIN to FILE.  */

extern void dump_chain (FILE *, chain_p);
void
dump_chain (FILE *file, chain_p chain)
{
  dref a;
  const char *chain_type;
  unsigned i;
  tree var;

  switch (chain->type)
    {
    case CT_INVARIANT:
      chain_type = "Load motion";
      break;

    case CT_LOAD:
      chain_type = "Loads-only";
      break;

    case CT_STORE_LOAD:
      chain_type = "Store-loads";
      break;

    case CT_STORE_STORE:
      chain_type = "Store-stores";
      break;

    case CT_COMBINATION:
      chain_type = "Combination";
      break;

    default:
      gcc_unreachable ();
    }

  fprintf (file, "%s chain %p%s\n", chain_type, (void *) chain,
	   chain->combined ? " (combined)" : "");
  if (chain->type != CT_INVARIANT)
    fprintf (file, "  max distance %u%s\n", chain->length,
	     chain->has_max_use_after ? "" : ", may reuse first");

  if (chain->type == CT_COMBINATION)
    {
      fprintf (file, "  equal to %p %s %p in type ",
	       (void *) chain->ch1, op_symbol_code (chain->op),
	       (void *) chain->ch2);
      print_generic_expr (file, chain->rslt_type, TDF_SLIM);
      fprintf (file, "\n");
    }

  if (chain->vars.exists ())
    {
      fprintf (file, "  vars");
      FOR_EACH_VEC_ELT (chain->vars, i, var)
	{
	  fprintf (file, " ");
	  print_generic_expr (file, var, TDF_SLIM);
	}
      fprintf (file, "\n");
    }

  if (chain->inits.exists ())
    {
      fprintf (file, "  inits");
      FOR_EACH_VEC_ELT (chain->inits, i, var)
	{
	  fprintf (file, " ");
	  print_generic_expr (file, var, TDF_SLIM);
	}
      fprintf (file, "\n");
    }

  fprintf (file, "  references:\n");
  FOR_EACH_VEC_ELT (chain->refs, i, a)
    dump_dref (file, a);

  fprintf (file, "\n");
}
/* Dumps CHAINS to FILE.  */

void
dump_chains (FILE *file, const vec<chain_p> &chains)
{
  chain_p chain;
  unsigned i;

  FOR_EACH_VEC_ELT (chains, i, chain)
    dump_chain (file, chain);
}
/* Dumps COMP to FILE.  */

extern void dump_component (FILE *, struct component *);
void
dump_component (FILE *file, struct component *comp)
{
  dref a;
  unsigned i;

  fprintf (file, "Component%s:\n",
	   comp->comp_step == RS_INVARIANT ? " (invariant)" : "");
  FOR_EACH_VEC_ELT (comp->refs, i, a)
    dump_dref (file, a);
  fprintf (file, "\n");
}
/* Dumps COMPS to FILE.  */

extern void dump_components (FILE *, struct component *);
void
dump_components (FILE *file, struct component *comps)
{
  struct component *comp;

  for (comp = comps; comp; comp = comp->next)
    dump_component (file, comp);
}
/* Frees a chain CHAIN.  */

void
pcom_worker::release_chain (chain_p chain)
{
  dref ref;
  unsigned i;

  if (chain == NULL)
    return;

  FOR_EACH_VEC_ELT (chain->refs, i, ref)
    free (ref);

  if (chain->init_seq)
    gimple_seq_discard (chain->init_seq);

  if (chain->fini_seq)
    gimple_seq_discard (chain->fini_seq);

  free (chain);
}
/* Frees CHAINS.  */

void
pcom_worker::release_chains ()
{
  unsigned i;
  chain_p chain;

  FOR_EACH_VEC_ELT (m_chains, i, chain)
    release_chain (chain);
}
/* Frees the list of components COMPS.  */

static void
release_components (struct component *comps)
{
  struct component *act, *next;

  for (act = comps; act; act = next)
    {
      next = act->next;
      XDELETE (act);
    }
}
/* Finds the root of the tree given by FATHERS that contains A, and performs
   path shortening.  */

static unsigned
component_of (vec<unsigned> &fathers, unsigned a)
{
  unsigned root, n;

  for (root = a; root != fathers[root]; root = fathers[root])
    continue;

  for (; a != root; a = n)
    {
      n = fathers[a];
      fathers[a] = root;
    }

  return root;
}
/* Join operation for DFU.  FATHERS gives the tree, SIZES are sizes of the
   components, A and B are components to merge.  */

static void
merge_comps (vec<unsigned> &fathers, vec<unsigned> &sizes,
	     unsigned a, unsigned b)
{
  unsigned ca = component_of (fathers, a);
  unsigned cb = component_of (fathers, b);

  if (ca == cb)
    return;

  if (sizes[ca] < sizes[cb])
    {
      sizes[cb] += sizes[ca];
      fathers[ca] = cb;
    }
  else
    {
      sizes[ca] += sizes[cb];
      fathers[cb] = ca;
    }
}
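The two helpers above are a classic disjoint-set union (DFU) with path shortening in the find and union-by-size in the join.  A minimal standalone sketch of the same structure, using plain arrays instead of `vec<unsigned>` (the `uf_` names are illustrative, not GCC API):

```c
#include <assert.h>

/* Find the representative of A, compressing the path to the root the way
   component_of does.  */
static unsigned
uf_find (unsigned *father, unsigned a)
{
  unsigned root, n;
  for (root = a; root != father[root]; root = father[root])
    continue;
  for (; a != root; a = n)
    {
      n = father[a];
      father[a] = root;		/* path shortening */
    }
  return root;
}

/* Merge the components of A and B, attaching the smaller tree under the
   larger one, the way merge_comps does.  */
static void
uf_union (unsigned *father, unsigned *size, unsigned a, unsigned b)
{
  unsigned ca = uf_find (father, a);
  unsigned cb = uf_find (father, b);
  if (ca == cb)
    return;
  if (size[ca] < size[cb])
    {
      size[cb] += size[ca];
      father[ca] = cb;
    }
  else
    {
      size[ca] += size[cb];
      father[cb] = ca;
    }
}
```

The union-by-size heuristic keeps the trees shallow, so with path shortening each operation runs in near-constant amortized time.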
/* Returns true if A is a reference that is suitable for predictive commoning
   in the innermost loop that contains it.  REF_STEP is set according to the
   step of the reference A.  */

static bool
suitable_reference_p (struct data_reference *a, enum ref_step_type *ref_step)
{
  tree ref = DR_REF (a), step = DR_STEP (a);

  if (!step
      || TREE_THIS_VOLATILE (ref)
      || !is_gimple_reg_type (TREE_TYPE (ref))
      || tree_could_throw_p (ref))
    return false;

  if (integer_zerop (step))
    *ref_step = RS_INVARIANT;
  else if (integer_nonzerop (step))
    *ref_step = RS_NONZERO;
  else
    *ref_step = RS_ANY;

  return true;
}
/* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET.  */

void
pcom_worker::aff_combination_dr_offset (struct data_reference *dr,
					aff_tree *offset)
{
  tree type = TREE_TYPE (DR_OFFSET (dr));
  aff_tree delta;

  tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, &m_cache);
  aff_combination_const (&delta, type, wi::to_poly_widest (DR_INIT (dr)));
  aff_combination_add (offset, &delta);
}
/* Determines the number of iterations of the innermost enclosing loop before B
   refers to exactly the same location as A and stores it to OFF.  If A and
   B do not have the same step, they never meet, or anything else fails,
   returns false, otherwise returns true.  Both A and B are assumed to
   satisfy suitable_reference_p.  */

bool
pcom_worker::determine_offset (struct data_reference *a,
			       struct data_reference *b, poly_widest_int *off)
{
  aff_tree diff, baseb, step;
  tree typea, typeb;

  /* Check that both references access the location in the same type.  */
  typea = TREE_TYPE (DR_REF (a));
  typeb = TREE_TYPE (DR_REF (b));
  if (!useless_type_conversion_p (typeb, typea))
    return false;

  /* Check whether the base address and the step of both references are the
     same.  */
  if (!operand_equal_p (DR_STEP (a), DR_STEP (b), 0)
      || !operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0))
    return false;

  if (integer_zerop (DR_STEP (a)))
    {
      /* If the references have a loop invariant address, check that they
	 access exactly the same location.  */
      *off = 0;
      return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0)
	      && operand_equal_p (DR_INIT (a), DR_INIT (b), 0));
    }

  /* Compare the offsets of the addresses, and check whether the difference
     is a multiple of step.  */
  aff_combination_dr_offset (a, &diff);
  aff_combination_dr_offset (b, &baseb);
  aff_combination_scale (&baseb, -1);
  aff_combination_add (&diff, &baseb);

  tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)),
				  &step, &m_cache);
  return aff_combination_constant_multiple_p (&diff, &step, off);
}
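Stripped of the affine-combination machinery, the arithmetic of determine_offset reduces to a divisibility check: two references with the same base and step meet OFF iterations apart iff the difference of their offsets is an exact multiple of the step.  A sketch of just that arithmetic on plain integers (the function name and signature are illustrative; the real code works on symbolic affine trees, not constants):

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true and sets *OFF to the iteration distance at which the
   reference at OFF_B reaches the location of the reference at OFF_A,
   given that both advance by STEP bytes per iteration; returns false if
   they never coincide.  Mirrors the constant case of determine_offset.  */
static bool
iteration_offset (long off_a, long off_b, long step, long *off)
{
  if (step == 0)
    {
      /* Loop invariant addresses: they coincide only if they are equal.  */
      *off = 0;
      return off_a == off_b;
    }

  long diff = off_b - off_a;
  if (diff % step != 0)
    return false;		/* the references never meet */

  *off = diff / step;
  return true;
}
```

For example, for 4-byte ints, `a[i]` versus `a[i+2]` gives offsets 0 and 8 with step 4, so the distance is 2 iterations; `a[i]` versus a reference 6 bytes away never coincides.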
/* Returns the last basic block in LOOP for which we are sure that
   it is executed whenever the loop is entered.  */

static basic_block
last_always_executed_block (class loop *loop)
{
  unsigned i;
  auto_vec<edge> exits = get_loop_exit_edges (loop);
  edge ex;
  basic_block last = loop->latch;

  FOR_EACH_VEC_ELT (exits, i, ex)
    last = nearest_common_dominator (CDI_DOMINATORS, last, ex->src);

  return last;
}
/* Splits the dependence graph on DATAREFS described by DEPENDENCES to
   components.  */

struct component *
pcom_worker::split_data_refs_to_components ()
{
  unsigned i, n = m_datarefs.length ();
  unsigned ca, ia, ib, bad;
  struct data_reference *dr, *dra, *drb;
  struct data_dependence_relation *ddr;
  struct component *comp_list = NULL, *comp;
  dref dataref;
  /* Don't do store elimination if the loop has multiple exit edges.  */
  bool eliminate_store_p = single_exit (m_loop) != NULL;
  basic_block last_always_executed = last_always_executed_block (m_loop);
  auto_bitmap no_store_store_comps;
  auto_vec<unsigned> comp_father (n + 1);
  auto_vec<unsigned> comp_size (n + 1);
  comp_father.quick_grow (n + 1);
  comp_size.quick_grow (n + 1);

  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      if (!DR_REF (dr))
	/* A fake reference for call or asm_expr that may clobber memory;
	   just fail.  */
	return NULL;
      /* The predcom pass isn't prepared to handle calls with data
	 references.  */
      if (is_gimple_call (DR_STMT (dr)))
	return NULL;
      dr->aux = (void *) (size_t) i;
      comp_father[i] = i;
      comp_size[i] = 1;
    }

  /* A component reserved for the "bad" data references.  */
  comp_father[n] = n;
  comp_size[n] = 1;

  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      enum ref_step_type dummy;

      if (!suitable_reference_p (dr, &dummy))
	{
	  ia = (unsigned) (size_t) dr->aux;
	  merge_comps (comp_father, comp_size, n, ia);
	}
    }

  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
    {
      poly_widest_int dummy_off;

      if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
	continue;

      dra = DDR_A (ddr);
      drb = DDR_B (ddr);

      /* Don't do store elimination if there is any unknown dependence for
	 any store data reference.  */
      if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb))
	  && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
	      || DDR_NUM_DIST_VECTS (ddr) == 0))
	eliminate_store_p = false;

      ia = component_of (comp_father, (unsigned) (size_t) dra->aux);
      ib = component_of (comp_father, (unsigned) (size_t) drb->aux);
      if (ia == ib)
	continue;

      bad = component_of (comp_father, n);

      /* If both A and B are reads, we may ignore unsuitable dependences.  */
      if (DR_IS_READ (dra) && DR_IS_READ (drb))
	{
	  if (ia == bad || ib == bad
	      || !determine_offset (dra, drb, &dummy_off))
	    continue;
	}
      /* If A is a read and B a write, or vice versa, and there is an
	 unsuitable dependence, instead of merging both components into a
	 component that will certainly not pass suitable_component_p, just
	 put the read into the bad component; perhaps at least the write
	 together with all the other data refs in its component will be
	 optimizable.  */
      else if (DR_IS_READ (dra) && ib != bad)
	{
	  if (ia == bad)
	    {
	      bitmap_set_bit (no_store_store_comps, ib);
	      continue;
	    }
	  else if (!determine_offset (dra, drb, &dummy_off))
	    {
	      bitmap_set_bit (no_store_store_comps, ib);
	      merge_comps (comp_father, comp_size, bad, ia);
	      continue;
	    }
	}
      else if (DR_IS_READ (drb) && ia != bad)
	{
	  if (ib == bad)
	    {
	      bitmap_set_bit (no_store_store_comps, ia);
	      continue;
	    }
	  else if (!determine_offset (dra, drb, &dummy_off))
	    {
	      bitmap_set_bit (no_store_store_comps, ia);
	      merge_comps (comp_father, comp_size, bad, ib);
	      continue;
	    }
	}
      else if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb)
	       && ia != bad && ib != bad
	       && !determine_offset (dra, drb, &dummy_off))
	{
	  merge_comps (comp_father, comp_size, bad, ia);
	  merge_comps (comp_father, comp_size, bad, ib);
	  continue;
	}

      merge_comps (comp_father, comp_size, ia, ib);
    }

  if (eliminate_store_p)
    {
      tree niters = number_of_latch_executions (m_loop);

      /* Don't do store elimination if the niters info is unknown, because
	 stores in the last iteration can't be eliminated and we need to
	 recover them after the loop.  */
      eliminate_store_p = (niters != NULL_TREE && niters != chrec_dont_know);
    }

  auto_vec<struct component *> comps;
  comps.safe_grow_cleared (n, true);
  bad = component_of (comp_father, n);
  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      ia = (unsigned) (size_t) dr->aux;
      ca = component_of (comp_father, ia);
      if (ca == bad)
	continue;

      comp = comps[ca];
      if (!comp)
	{
	  comp = XCNEW (struct component);
	  comp->refs.create (comp_size[ca]);
	  comp->eliminate_store_p = eliminate_store_p;
	  comps[ca] = comp;
	}

      dataref = XCNEW (class dref_d);
      dataref->ref = dr;
      dataref->stmt = DR_STMT (dr);
      dataref->offset = 0;
      dataref->distance = 0;

      dataref->always_accessed
	= dominated_by_p (CDI_DOMINATORS, last_always_executed,
			  gimple_bb (dataref->stmt));
      dataref->pos = comp->refs.length ();
      comp->refs.quick_push (dataref);
    }

  if (eliminate_store_p)
    {
      bitmap_iterator bi;
      EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi)
	{
	  ca = component_of (comp_father, ia);
	  if (ca != bad)
	    comps[ca]->eliminate_store_p = false;
	}
    }

  for (i = 0; i < n; i++)
    {
      comp = comps[i];
      if (comp)
	{
	  comp->next = comp_list;
	  comp_list = comp;
	}
    }
  return comp_list;
}
/* Returns true if the component COMP satisfies the conditions
   described in 2) at the beginning of this file.  */

bool
pcom_worker::suitable_component_p (struct component *comp)
{
  unsigned i;
  dref a, first;
  basic_block ba, bp = m_loop->header;
  bool ok, has_write = false;

  FOR_EACH_VEC_ELT (comp->refs, i, a)
    {
      ba = gimple_bb (a->stmt);

      if (!just_once_each_iteration_p (m_loop, ba))
	return false;

      gcc_assert (dominated_by_p (CDI_DOMINATORS, ba, bp));
      bp = ba;

      if (DR_IS_WRITE (a->ref))
	has_write = true;
    }

  first = comp->refs[0];
  ok = suitable_reference_p (first->ref, &comp->comp_step);
  gcc_assert (ok);
  first->offset = 0;

  for (i = 1; comp->refs.iterate (i, &a); i++)
    {
      /* Polynomial offsets are no use, since we need to know the
	 gap between iteration numbers at compile time.  */
      poly_widest_int offset;
      if (!determine_offset (first->ref, a->ref, &offset)
	  || !offset.is_constant (&a->offset))
	return false;

      enum ref_step_type a_step;
      gcc_checking_assert (suitable_reference_p (a->ref, &a_step)
			   && a_step == comp->comp_step);
    }

  /* If there is a write inside the component, we must know whether the
     step is nonzero or not -- we would not otherwise be able to recognize
     whether the value accessed by reads comes from the OFFSET-th iteration
     or the previous one.  */
  if (has_write && comp->comp_step == RS_ANY)
    return false;

  return true;
}
1121 /* Check the conditions on references inside each of the components COMPS,
1122 and remove the unsuitable components from the list. The new list
1123 of components is returned. The conditions are described in 2) at
1124 the beginning of this file. */
1126 struct component *
1127 pcom_worker::filter_suitable_components (struct component *comps)
1129 struct component **comp, *act;
1131 for (comp = &comps; *comp; )
1133 act = *comp;
1134 if (suitable_component_p (act))
1135 comp = &act->next;
1136 else
1138 dref ref;
1139 unsigned i;
1141 *comp = act->next;
1142 FOR_EACH_VEC_ELT (act->refs, i, ref)
1143 free (ref);
1144 XDELETE (act);
1148 return comps;
1151 /* Compares two drefs A and B by their offset and position. Callback for
1152 qsort. */
1154 static int
1155 order_drefs (const void *a, const void *b)
1157 const dref *const da = (const dref *) a;
1158 const dref *const db = (const dref *) b;
1159 int offcmp = wi::cmps ((*da)->offset, (*db)->offset);
1161 if (offcmp != 0)
1162 return offcmp;
1164 return (*da)->pos - (*db)->pos;
1167 /* Compares two drefs A and B by their position. Callback for qsort. */
1169 static int
1170 order_drefs_by_pos (const void *a, const void *b)
1172 const dref *const da = (const dref *) a;
1173 const dref *const db = (const dref *) b;
1175 return (*da)->pos - (*db)->pos;
1178 /* Returns root of the CHAIN. */
1180 static inline dref
1181 get_chain_root (chain_p chain)
1183 return chain->refs[0];
1186 /* Given CHAIN, returns the last write ref at DISTANCE, or NULL if it doesn't
1187 exist. */
1189 static inline dref
1190 get_chain_last_write_at (chain_p chain, unsigned distance)
1192 for (unsigned i = chain->refs.length (); i > 0; i--)
1193 if (DR_IS_WRITE (chain->refs[i - 1]->ref)
1194 && distance == chain->refs[i - 1]->distance)
1195 return chain->refs[i - 1];
1197 return NULL;
1200 /* Given CHAIN, returns the last write ref with the same distance before load
1201 at index LOAD_IDX, or NULL if it doesn't exist. */
1203 static inline dref
1204 get_chain_last_write_before_load (chain_p chain, unsigned load_idx)
1206 gcc_assert (load_idx < chain->refs.length ());
1208 unsigned distance = chain->refs[load_idx]->distance;
1210 for (unsigned i = load_idx; i > 0; i--)
1211 if (DR_IS_WRITE (chain->refs[i - 1]->ref)
1212 && distance == chain->refs[i - 1]->distance)
1213 return chain->refs[i - 1];
1215 return NULL;
1218 /* Adds REF to the chain CHAIN. */
1220 static void
1221 add_ref_to_chain (chain_p chain, dref ref)
1223 dref root = get_chain_root (chain);
1225 gcc_assert (wi::les_p (root->offset, ref->offset));
1226 widest_int dist = ref->offset - root->offset;
1227 gcc_assert (wi::fits_uhwi_p (dist));
1229 chain->refs.safe_push (ref);
1231 ref->distance = dist.to_uhwi ();
1233 if (ref->distance >= chain->length)
1235 chain->length = ref->distance;
1236 chain->has_max_use_after = false;
1239 /* Promote this chain to CT_STORE_STORE if it has multiple stores. */
1240 if (DR_IS_WRITE (ref->ref))
1241 chain->type = CT_STORE_STORE;
1243 /* Don't set the flag for a store-store chain since there is no use. */
1244 if (chain->type != CT_STORE_STORE
1245 && ref->distance == chain->length
1246 && ref->pos > root->pos)
1247 chain->has_max_use_after = true;
1249 chain->all_always_accessed &= ref->always_accessed;
1252 /* Returns the chain for invariant component COMP. */
1254 static chain_p
1255 make_invariant_chain (struct component *comp)
1257 chain_p chain = XCNEW (struct chain);
1258 unsigned i;
1259 dref ref;
1261 chain->type = CT_INVARIANT;
1263 chain->all_always_accessed = true;
1265 FOR_EACH_VEC_ELT (comp->refs, i, ref)
1267 chain->refs.safe_push (ref);
1268 chain->all_always_accessed &= ref->always_accessed;
1271 chain->inits = vNULL;
1272 chain->finis = vNULL;
1274 return chain;
1277 /* Make a new chain of type TYPE rooted at REF. */
1279 static chain_p
1280 make_rooted_chain (dref ref, enum chain_type type)
1282 chain_p chain = XCNEW (struct chain);
1284 chain->type = type;
1285 chain->refs.safe_push (ref);
1286 chain->all_always_accessed = ref->always_accessed;
1287 ref->distance = 0;
1289 chain->inits = vNULL;
1290 chain->finis = vNULL;
1292 return chain;
1295 /* Returns true if CHAIN is not trivial. */
1297 static bool
1298 nontrivial_chain_p (chain_p chain)
1300 return chain != NULL && chain->refs.length () > 1;
1303 /* Returns the ssa name that contains the value of REF, or NULL_TREE if there
1304 is no such name. */
1306 static tree
1307 name_for_ref (dref ref)
1309 tree name;
1311 if (is_gimple_assign (ref->stmt))
1313 if (!ref->ref || DR_IS_READ (ref->ref))
1314 name = gimple_assign_lhs (ref->stmt);
1315 else
1316 name = gimple_assign_rhs1 (ref->stmt);
1318 else
1319 name = PHI_RESULT (ref->stmt);
1321 return (TREE_CODE (name) == SSA_NAME ? name : NULL_TREE);
1324 /* Returns true if REF is a valid initializer for ROOT with given DISTANCE (in
1325 iterations of the innermost enclosing loop). */
1327 bool
1328 pcom_worker::valid_initializer_p (struct data_reference *ref, unsigned distance,
1329 struct data_reference *root)
1331 aff_tree diff, base, step;
1332 poly_widest_int off;
1334 /* Both REF and ROOT must be accessing the same object. */
1335 if (!operand_equal_p (DR_BASE_ADDRESS (ref), DR_BASE_ADDRESS (root), 0))
1336 return false;
1338 /* The initializer is defined outside of the loop, hence its address must be
1339 invariant inside the loop. */
1340 gcc_assert (integer_zerop (DR_STEP (ref)));
1342 /* If the address of the reference is invariant, initializer must access
1343 exactly the same location. */
1344 if (integer_zerop (DR_STEP (root)))
1345 return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0)
1346 && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0));
1348 /* Verify that this index of REF is equal to the root's index at
1349 -DISTANCE-th iteration. */
1350 aff_combination_dr_offset (root, &diff);
1351 aff_combination_dr_offset (ref, &base);
1352 aff_combination_scale (&base, -1);
1353 aff_combination_add (&diff, &base);
1355 tree_to_aff_combination_expand (DR_STEP (root), TREE_TYPE (DR_STEP (root)),
1356 &step, &m_cache);
1357 if (!aff_combination_constant_multiple_p (&diff, &step, &off))
1358 return false;
1360 if (maybe_ne (off, distance))
1361 return false;
1363 return true;
1366 /* Finds the looparound phi node of the loop that copies the value of REF, and if its
1367 initial value is correct (equal to initial value of REF shifted by one
1368 iteration), returns the phi node. Otherwise, NULL_TREE is returned. ROOT
1369 is the root of the current chain. */
1371 gphi *
1372 pcom_worker::find_looparound_phi (dref ref, dref root)
1374 tree name, init, init_ref;
1375 gphi *phi = NULL;
1376 gimple *init_stmt;
1377 edge latch = loop_latch_edge (m_loop);
1378 struct data_reference init_dr;
1379 gphi_iterator psi;
1381 if (is_gimple_assign (ref->stmt))
1383 if (DR_IS_READ (ref->ref))
1384 name = gimple_assign_lhs (ref->stmt);
1385 else
1386 name = gimple_assign_rhs1 (ref->stmt);
1388 else
1389 name = PHI_RESULT (ref->stmt);
1390 if (!name)
1391 return NULL;
1393 for (psi = gsi_start_phis (m_loop->header); !gsi_end_p (psi); gsi_next (&psi))
1395 phi = psi.phi ();
1396 if (PHI_ARG_DEF_FROM_EDGE (phi, latch) == name)
1397 break;
1400 if (gsi_end_p (psi))
1401 return NULL;
1403 init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (m_loop));
1404 if (TREE_CODE (init) != SSA_NAME)
1405 return NULL;
1406 init_stmt = SSA_NAME_DEF_STMT (init);
1407 if (gimple_code (init_stmt) != GIMPLE_ASSIGN)
1408 return NULL;
1409 gcc_assert (gimple_assign_lhs (init_stmt) == init);
1411 init_ref = gimple_assign_rhs1 (init_stmt);
1412 if (!REFERENCE_CLASS_P (init_ref)
1413 && !DECL_P (init_ref))
1414 return NULL;
1416 /* Analyze the behavior of INIT_REF with respect to LOOP (innermost
1417 loop enclosing PHI). */
1418 memset (&init_dr, 0, sizeof (struct data_reference));
1419 DR_REF (&init_dr) = init_ref;
1420 DR_STMT (&init_dr) = phi;
1421 if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, m_loop,
1422 init_stmt))
1423 return NULL;
1425 if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref))
1426 return NULL;
1428 return phi;
1431 /* Adds a reference for the looparound copy of REF in PHI to CHAIN. */
1433 static void
1434 insert_looparound_copy (chain_p chain, dref ref, gphi *phi)
1436 dref nw = XCNEW (class dref_d), aref;
1437 unsigned i;
1439 nw->stmt = phi;
1440 nw->distance = ref->distance + 1;
1441 nw->always_accessed = 1;
1443 FOR_EACH_VEC_ELT (chain->refs, i, aref)
1444 if (aref->distance >= nw->distance)
1445 break;
1446 chain->refs.safe_insert (i, nw);
1448 if (nw->distance > chain->length)
1450 chain->length = nw->distance;
1451 chain->has_max_use_after = false;
1455 /* For references in CHAIN that are copied around the loop (created previously
1456 by PRE, or by the user), add the results of such copies to the chain. This
1457 enables us to remove the copies by unrolling, and may need fewer registers
1458 (also, it may allow us to combine chains together). */
1460 void
1461 pcom_worker::add_looparound_copies (chain_p chain)
1463 unsigned i;
1464 dref ref, root = get_chain_root (chain);
1465 gphi *phi;
1467 if (chain->type == CT_STORE_STORE)
1468 return;
1470 FOR_EACH_VEC_ELT (chain->refs, i, ref)
1472 phi = find_looparound_phi (ref, root);
1473 if (!phi)
1474 continue;
1476 bitmap_set_bit (m_looparound_phis, SSA_NAME_VERSION (PHI_RESULT (phi)));
1477 insert_looparound_copy (chain, ref, phi);
1481 /* Find roots of the values and determine distances in the component COMP.
1482 The references are redistributed into chains. */
1484 void
1485 pcom_worker::determine_roots_comp (struct component *comp)
1487 unsigned i;
1488 dref a;
1489 chain_p chain = NULL;
1490 widest_int last_ofs = 0;
1491 enum chain_type type;
1493 /* Invariants are handled specially. */
1494 if (comp->comp_step == RS_INVARIANT)
1496 chain = make_invariant_chain (comp);
1497 m_chains.safe_push (chain);
1498 return;
1501 /* Trivial component. */
1502 if (comp->refs.length () <= 1)
1504 if (comp->refs.length () == 1)
1506 free (comp->refs[0]);
1507 comp->refs.truncate (0);
1509 return;
1512 comp->refs.qsort (order_drefs);
1514 /* For a Store-Store chain, we only support a load if it is dominated by a
1515 store statement in the same iteration of the loop. */
1516 if (comp->eliminate_store_p)
1517 for (a = NULL, i = 0; i < comp->refs.length (); i++)
1519 if (DR_IS_WRITE (comp->refs[i]->ref))
1520 a = comp->refs[i];
1521 else if (a == NULL || a->offset != comp->refs[i]->offset)
1523 /* If there is a load that is not dominated by a store in the
1524 same iteration of the loop, clear the flag so no Store-Store
1525 chain is generated for this component. */
1526 comp->eliminate_store_p = false;
1527 break;
1531 /* Determine roots and create chains for components. */
1532 FOR_EACH_VEC_ELT (comp->refs, i, a)
1534 if (!chain
1535 || (chain->type == CT_LOAD && DR_IS_WRITE (a->ref))
1536 || (!comp->eliminate_store_p && DR_IS_WRITE (a->ref))
1537 || wi::leu_p (MAX_DISTANCE, a->offset - last_ofs))
1539 if (nontrivial_chain_p (chain))
1541 add_looparound_copies (chain);
1542 m_chains.safe_push (chain);
1544 else
1545 release_chain (chain);
1547 /* Determine type of the chain. If the root reference is a load,
1548 this can only be a CT_LOAD chain; other chains are initialized
1549 to CT_STORE_LOAD and might be promoted to CT_STORE_STORE when a
1550 new reference is added. */
1551 type = DR_IS_READ (a->ref) ? CT_LOAD : CT_STORE_LOAD;
1552 chain = make_rooted_chain (a, type);
1553 last_ofs = a->offset;
1554 continue;
1557 add_ref_to_chain (chain, a);
1560 if (nontrivial_chain_p (chain))
1562 add_looparound_copies (chain);
1563 m_chains.safe_push (chain);
1565 else
1566 release_chain (chain);
1569 /* Find roots of the values and determine distances in components COMPS, and
1570 separate the references into chains. */
1572 void
1573 pcom_worker::determine_roots (struct component *comps)
1575 struct component *comp;
1577 for (comp = comps; comp; comp = comp->next)
1578 determine_roots_comp (comp);
1581 /* Replace the reference in statement STMT with temporary variable
1582 NEW_TREE. If SET is true, NEW_TREE is instead initialized to the value of
1583 the reference in the statement. IN_LHS is true if the reference
1584 is in the lhs of STMT, false if it is in rhs. */
1586 static void
1587 replace_ref_with (gimple *stmt, tree new_tree, bool set, bool in_lhs)
1589 tree val;
1590 gassign *new_stmt;
1591 gimple_stmt_iterator bsi, psi;
1593 if (gimple_code (stmt) == GIMPLE_PHI)
1595 gcc_assert (!in_lhs && !set);
1597 val = PHI_RESULT (stmt);
1598 bsi = gsi_after_labels (gimple_bb (stmt));
1599 psi = gsi_for_stmt (stmt);
1600 remove_phi_node (&psi, false);
1602 /* Turn the phi node into GIMPLE_ASSIGN. */
1603 new_stmt = gimple_build_assign (val, new_tree);
1604 gsi_insert_before (&bsi, new_stmt, GSI_NEW_STMT);
1605 return;
1608 /* Since the reference is of gimple_reg type, it should only
1609 appear as lhs or rhs of modify statement. */
1610 gcc_assert (is_gimple_assign (stmt));
1612 bsi = gsi_for_stmt (stmt);
1614 /* If we do not need to initialize NEW_TREE, just replace the use of OLD. */
1615 if (!set)
1617 gcc_assert (!in_lhs);
1618 gimple_assign_set_rhs_from_tree (&bsi, new_tree);
1619 stmt = gsi_stmt (bsi);
1620 update_stmt (stmt);
1621 return;
1624 if (in_lhs)
1626 /* We have statement
1628 OLD = VAL
1630 If OLD is a memory reference, then VAL is gimple_val, and we transform
1631 this to
1633 OLD = VAL
1634 NEW = VAL
1636 Otherwise, we are replacing a combination chain,
1637 VAL is the expression that performs the combination, and OLD is an
1638 SSA name. In this case, we transform the assignment to
1640 OLD = VAL
1641 NEW = OLD
1645 val = gimple_assign_lhs (stmt);
1646 if (TREE_CODE (val) != SSA_NAME)
1648 val = gimple_assign_rhs1 (stmt);
1649 gcc_assert (gimple_assign_single_p (stmt));
1650 if (TREE_CLOBBER_P (val))
1651 val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (new_tree));
1652 else
1653 gcc_assert (gimple_assign_copy_p (stmt));
1656 else
1658 /* VAL = OLD
1660 is transformed to
1662 VAL = OLD
1663 NEW = VAL */
1665 val = gimple_assign_lhs (stmt);
1668 new_stmt = gimple_build_assign (new_tree, unshare_expr (val));
1669 gsi_insert_after (&bsi, new_stmt, GSI_NEW_STMT);
1672 /* Returns a memory reference to DR in the (NITERS + ITER)-th iteration
1673 of the loop it was analyzed in. Append init stmts to STMTS. */
1675 static tree
1676 ref_at_iteration (data_reference_p dr, int iter,
1677 gimple_seq *stmts, tree niters = NULL_TREE)
1679 tree off = DR_OFFSET (dr);
1680 tree coff = DR_INIT (dr);
1681 tree ref = DR_REF (dr);
1682 enum tree_code ref_code = ERROR_MARK;
1683 tree ref_type = NULL_TREE;
1684 tree ref_op1 = NULL_TREE;
1685 tree ref_op2 = NULL_TREE;
1686 tree new_offset;
1688 if (iter != 0)
1690 new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));
1691 if (TREE_CODE (new_offset) == INTEGER_CST)
1692 coff = size_binop (PLUS_EXPR, coff, new_offset);
1693 else
1694 off = size_binop (PLUS_EXPR, off, new_offset);
1697 if (niters != NULL_TREE)
1699 niters = fold_convert (ssizetype, niters);
1700 new_offset = size_binop (MULT_EXPR, DR_STEP (dr), niters);
1701 if (TREE_CODE (niters) == INTEGER_CST)
1702 coff = size_binop (PLUS_EXPR, coff, new_offset);
1703 else
1704 off = size_binop (PLUS_EXPR, off, new_offset);
1707 /* While data-ref analysis punts on bit offsets it still handles
1708 bitfield accesses at byte boundaries. Cope with that. Note that
1709 if the bitfield object also starts at a byte-boundary we can simply
1710 replicate the COMPONENT_REF, but we have to subtract the component's
1711 byte-offset from the MEM_REF address first.
1712 Otherwise we simply build a BIT_FIELD_REF knowing that the bits
1713 start at offset zero. */
1714 if (TREE_CODE (ref) == COMPONENT_REF
1715 && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
1717 unsigned HOST_WIDE_INT boff;
1718 tree field = TREE_OPERAND (ref, 1);
1719 tree offset = component_ref_field_offset (ref);
1720 ref_type = TREE_TYPE (ref);
1721 boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
1722 /* This can occur in Ada. See the comment in get_bit_range. */
1723 if (boff % BITS_PER_UNIT != 0
1724 || !tree_fits_uhwi_p (offset))
1726 ref_code = BIT_FIELD_REF;
1727 ref_op1 = DECL_SIZE (field);
1728 ref_op2 = bitsize_zero_node;
1730 else
1732 boff >>= LOG2_BITS_PER_UNIT;
1733 boff += tree_to_uhwi (offset);
1734 coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
1735 ref_code = COMPONENT_REF;
1736 ref_op1 = field;
1737 ref_op2 = TREE_OPERAND (ref, 2);
1738 ref = TREE_OPERAND (ref, 0);
1741 tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
1742 addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
1743 is_gimple_mem_ref_addr, NULL_TREE);
1744 tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
1745 tree type = build_aligned_type (TREE_TYPE (ref),
1746 get_object_alignment (ref));
1747 ref = build2 (MEM_REF, type, addr, alias_ptr);
1748 if (ref_type)
1749 ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
1750 return ref;
1753 /* Get the initialization expression for the INDEX-th temporary variable
1754 of CHAIN. */
1756 static tree
1757 get_init_expr (chain_p chain, unsigned index)
1759 if (chain->type == CT_COMBINATION)
1761 tree e1 = get_init_expr (chain->ch1, index);
1762 tree e2 = get_init_expr (chain->ch2, index);
1764 return fold_build2 (chain->op, chain->rslt_type, e1, e2);
1766 else
1767 return chain->inits[index];
1770 /* Returns a new temporary variable used for the I-th variable carrying
1771 value of REF. The variable's uid is marked in TMP_VARS. */
1773 static tree
1774 predcom_tmp_var (tree ref, unsigned i, bitmap tmp_vars)
1776 tree type = TREE_TYPE (ref);
1777 /* We never access the components of the temporary variable in predictive
1778 commoning. */
1779 tree var = create_tmp_reg (type, get_lsm_tmp_name (ref, i));
1780 bitmap_set_bit (tmp_vars, DECL_UID (var));
1781 return var;
1784 /* Creates the variables for CHAIN, as well as phi nodes for them and
1785 initialization on entry to LOOP. Uids of the newly created
1786 temporary variables are marked in TMP_VARS. */
1788 static void
1789 initialize_root_vars (class loop *loop, chain_p chain, bitmap tmp_vars)
1791 unsigned i;
1792 unsigned n = chain->length;
1793 dref root = get_chain_root (chain);
1794 bool reuse_first = !chain->has_max_use_after;
1795 tree ref, init, var, next;
1796 gphi *phi;
1797 gimple_seq stmts;
1798 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);
1800 /* If N == 0, then all the references are within the single iteration. And
1801 since this is a nonempty chain, reuse_first cannot be true. */
1802 gcc_assert (n > 0 || !reuse_first);
1804 chain->vars.create (n + 1);
1806 if (chain->type == CT_COMBINATION)
1807 ref = gimple_assign_lhs (root->stmt);
1808 else
1809 ref = DR_REF (root->ref);
1811 for (i = 0; i < n + (reuse_first ? 0 : 1); i++)
1813 var = predcom_tmp_var (ref, i, tmp_vars);
1814 chain->vars.quick_push (var);
1816 if (reuse_first)
1817 chain->vars.quick_push (chain->vars[0]);
1819 FOR_EACH_VEC_ELT (chain->vars, i, var)
1820 chain->vars[i] = make_ssa_name (var);
1822 for (i = 0; i < n; i++)
1824 var = chain->vars[i];
1825 next = chain->vars[i + 1];
1826 init = get_init_expr (chain, i);
1828 init = force_gimple_operand (init, &stmts, true, NULL_TREE);
1829 if (stmts)
1830 gsi_insert_seq_on_edge_immediate (entry, stmts);
1832 phi = create_phi_node (var, loop->header);
1833 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
1834 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
1838 /* For inter-iteration store elimination CHAIN in LOOP, returns true if
1839 all stores to be eliminated store loop invariant values into memory.
1840 In this case, we can use these invariant values directly after LOOP. */
1842 static bool
1843 is_inv_store_elimination_chain (class loop *loop, chain_p chain)
1845 if (chain->length == 0 || chain->type != CT_STORE_STORE)
1846 return false;
1848 gcc_assert (!chain->has_max_use_after);
1850 /* If loop iterates for unknown times or fewer times than chain->length,
1851 we still need to setup root variable and propagate it with PHI node. */
1852 tree niters = number_of_latch_executions (loop);
1853 if (TREE_CODE (niters) != INTEGER_CST
1854 || wi::leu_p (wi::to_wide (niters), chain->length))
1855 return false;
1857 /* Check stores in chain for elimination if they only store loop invariant
1858 values. */
1859 for (unsigned i = 0; i < chain->length; i++)
1861 dref a = get_chain_last_write_at (chain, i);
1862 if (a == NULL)
1863 continue;
1865 gimple *def_stmt, *stmt = a->stmt;
1866 if (!gimple_assign_single_p (stmt))
1867 return false;
1869 tree val = gimple_assign_rhs1 (stmt);
1870 if (TREE_CLOBBER_P (val))
1871 return false;
1873 if (CONSTANT_CLASS_P (val))
1874 continue;
1876 if (TREE_CODE (val) != SSA_NAME)
1877 return false;
1879 def_stmt = SSA_NAME_DEF_STMT (val);
1880 if (gimple_nop_p (def_stmt))
1881 continue;
1883 if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
1884 return false;
1886 return true;
1889 /* Creates root variables for store elimination CHAIN in which stores for
1890 elimination only store loop invariant values. In this case, we neither
1891 need to load root variables before the loop nor propagate them with PHI nodes. */
1893 static void
1894 initialize_root_vars_store_elim_1 (chain_p chain)
1896 tree var;
1897 unsigned i, n = chain->length;
1899 chain->vars.create (n);
1900 chain->vars.safe_grow_cleared (n, true);
1902 /* Initialize root value for eliminated stores at each distance. */
1903 for (i = 0; i < n; i++)
1905 dref a = get_chain_last_write_at (chain, i);
1906 if (a == NULL)
1907 continue;
1909 var = gimple_assign_rhs1 (a->stmt);
1910 chain->vars[a->distance] = var;
1913 /* We don't propagate values with PHI nodes, so manually propagate the
1914 value to the bubble positions. */
1915 var = chain->vars[0];
1916 for (i = 1; i < n; i++)
1918 if (chain->vars[i] != NULL_TREE)
1920 var = chain->vars[i];
1921 continue;
1923 chain->vars[i] = var;
1926 /* Reverse the vector. */
1927 for (i = 0; i < n / 2; i++)
1928 std::swap (chain->vars[i], chain->vars[n - i - 1]);
1931 /* Creates root variables for store elimination CHAIN in which stores for
1932 elimination store loop variant values. In this case, we may need to
1933 load root variables before LOOP and propagate them with PHI nodes. Uids
1934 of the newly created root variables are marked in TMP_VARS. */
1936 static void
1937 initialize_root_vars_store_elim_2 (class loop *loop,
1938 chain_p chain, bitmap tmp_vars)
1940 unsigned i, n = chain->length;
1941 tree ref, init, var, next, val, phi_result;
1942 gimple *stmt;
1943 gimple_seq stmts;
1945 chain->vars.create (n);
1947 ref = DR_REF (get_chain_root (chain)->ref);
1948 for (i = 0; i < n; i++)
1950 var = predcom_tmp_var (ref, i, tmp_vars);
1951 chain->vars.quick_push (var);
1954 FOR_EACH_VEC_ELT (chain->vars, i, var)
1955 chain->vars[i] = make_ssa_name (var);
1957 /* Root values are either rhs operand of stores to be eliminated, or
1958 loaded from memory before loop. */
1959 auto_vec<tree> vtemps;
1960 vtemps.safe_grow_cleared (n, true);
1961 for (i = 0; i < n; i++)
1963 init = get_init_expr (chain, i);
1964 if (init == NULL_TREE)
1966 /* Root value is rhs operand of the store to be eliminated if
1967 it isn't loaded from memory before loop. */
1968 dref a = get_chain_last_write_at (chain, i);
1969 val = gimple_assign_rhs1 (a->stmt);
1970 if (TREE_CLOBBER_P (val))
1972 val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (var));
1973 gimple_assign_set_rhs1 (a->stmt, val);
1976 vtemps[n - i - 1] = val;
1978 else
1980 edge latch = loop_latch_edge (loop);
1981 edge entry = loop_preheader_edge (loop);
1983 /* Root value is loaded from memory before loop, we also need
1984 to add PHI nodes to propagate the value across iterations. */
1985 init = force_gimple_operand (init, &stmts, true, NULL_TREE);
1986 if (stmts)
1987 gsi_insert_seq_on_edge_immediate (entry, stmts);
1989 next = chain->vars[n - i];
1990 phi_result = copy_ssa_name (next);
1991 gphi *phi = create_phi_node (phi_result, loop->header);
1992 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
1993 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
1994 vtemps[n - i - 1] = phi_result;
1998 /* Find the insertion position. */
1999 dref last = get_chain_root (chain);
2000 for (i = 0; i < chain->refs.length (); i++)
2002 if (chain->refs[i]->pos > last->pos)
2003 last = chain->refs[i];
2006 gimple_stmt_iterator gsi = gsi_for_stmt (last->stmt);
2008 /* Insert statements copying root value to root variable. */
2009 for (i = 0; i < n; i++)
2011 var = chain->vars[i];
2012 val = vtemps[i];
2013 stmt = gimple_build_assign (var, val);
2014 gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
2018 /* Generates stores for CHAIN's eliminated stores in LOOP's last
2019 (CHAIN->length - 1) iterations. */
2021 static void
2022 finalize_eliminated_stores (class loop *loop, chain_p chain)
2024 unsigned i, n = chain->length;
2026 for (i = 0; i < n; i++)
2028 tree var = chain->vars[i];
2029 tree fini = chain->finis[n - i - 1];
2030 gimple *stmt = gimple_build_assign (fini, var);
2032 gimple_seq_add_stmt_without_update (&chain->fini_seq, stmt);
2035 if (chain->fini_seq)
2037 gsi_insert_seq_on_edge_immediate (single_exit (loop), chain->fini_seq);
2038 chain->fini_seq = NULL;
2042 /* Initializes a variable for load motion for ROOT and prepares phi nodes and
2043 initialization on entry to LOOP if necessary. The ssa name for the variable
2044 is stored in VARS. If WRITTEN is true, a phi node is also created to copy
2045 its value around the loop. Uid of the newly created temporary variable
2046 is marked in TMP_VARS. INITS is the list containing the (single)
2047 initializer. */
2049 static void
2050 initialize_root_vars_lm (class loop *loop, dref root, bool written,
2051 vec<tree> *vars, const vec<tree> &inits,
2052 bitmap tmp_vars)
2054 unsigned i;
2055 tree ref = DR_REF (root->ref), init, var, next;
2056 gimple_seq stmts;
2057 gphi *phi;
2058 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);
2060 /* Find the initializer for the variable, and check that it cannot
2061 trap. */
2062 init = inits[0];
2064 vars->create (written ? 2 : 1);
2065 var = predcom_tmp_var (ref, 0, tmp_vars);
2066 vars->quick_push (var);
2067 if (written)
2068 vars->quick_push ((*vars)[0]);
2070 FOR_EACH_VEC_ELT (*vars, i, var)
2071 (*vars)[i] = make_ssa_name (var);
2073 var = (*vars)[0];
2075 init = force_gimple_operand (init, &stmts, written, NULL_TREE);
2076 if (stmts)
2077 gsi_insert_seq_on_edge_immediate (entry, stmts);
2079 if (written)
2081 next = (*vars)[1];
2082 phi = create_phi_node (var, loop->header);
2083 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
2084 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
2086 else
2088 gassign *init_stmt = gimple_build_assign (var, init);
2089 gsi_insert_on_edge_immediate (entry, init_stmt);
2094 /* Execute load motion for references in chain CHAIN. Uids of the newly
2095 created temporary variables are marked in TMP_VARS. */
2097 static void
2098 execute_load_motion (class loop *loop, chain_p chain, bitmap tmp_vars)
2100 auto_vec<tree> vars;
2101 dref a;
2102 unsigned n_writes = 0, ridx, i;
2103 tree var;
2105 gcc_assert (chain->type == CT_INVARIANT);
2106 gcc_assert (!chain->combined);
2107 FOR_EACH_VEC_ELT (chain->refs, i, a)
2108 if (DR_IS_WRITE (a->ref))
2109 n_writes++;
2111 /* If there are no reads in the loop, there is nothing to do. */
2112 if (n_writes == chain->refs.length ())
2113 return;
2115 initialize_root_vars_lm (loop, get_chain_root (chain), n_writes > 0,
2116 &vars, chain->inits, tmp_vars);
2118 ridx = 0;
2119 FOR_EACH_VEC_ELT (chain->refs, i, a)
2121 bool is_read = DR_IS_READ (a->ref);
2123 if (DR_IS_WRITE (a->ref))
2125 n_writes--;
2126 if (n_writes)
2128 var = vars[0];
2129 var = make_ssa_name (SSA_NAME_VAR (var));
2130 vars[0] = var;
2132 else
2133 ridx = 1;
2136 replace_ref_with (a->stmt, vars[ridx],
2137 !is_read, !is_read);
2141 /* Returns the single statement in which NAME is used, except for
2142 the looparound phi nodes contained in one of the chains. If there is no
2143 such statement, or if there are multiple such statements, NULL is returned. */
2145 gimple *
2146 pcom_worker::single_nonlooparound_use (tree name)
2148 use_operand_p use;
2149 imm_use_iterator it;
2150 gimple *stmt, *ret = NULL;
2152 FOR_EACH_IMM_USE_FAST (use, it, name)
2154 stmt = USE_STMT (use);
2156 if (gimple_code (stmt) == GIMPLE_PHI)
2158 /* Ignore uses in looparound phi nodes. Uses in other phi nodes
2159 could not be processed anyway, so just fail for them. */
2160 if (bitmap_bit_p (m_looparound_phis,
2161 SSA_NAME_VERSION (PHI_RESULT (stmt))))
2162 continue;
2164 return NULL;
2166 else if (is_gimple_debug (stmt))
2167 continue;
2168 else if (ret != NULL)
2169 return NULL;
2170 else
2171 ret = stmt;
2174 return ret;
2177 /* Remove statement STMT, as well as the chain of assignments in which it is
2178 used. */
2180 void
2181 pcom_worker::remove_stmt (gimple *stmt)
2183 tree name;
2184 gimple *next;
2185 gimple_stmt_iterator psi;
2187 if (gimple_code (stmt) == GIMPLE_PHI)
2189 name = PHI_RESULT (stmt);
2190 next = single_nonlooparound_use (name);
2191 reset_debug_uses (stmt);
2192 psi = gsi_for_stmt (stmt);
2193 remove_phi_node (&psi, true);
2195 if (!next
2196 || !gimple_assign_ssa_name_copy_p (next)
2197 || gimple_assign_rhs1 (next) != name)
2198 return;
2200 stmt = next;
2203 while (1)
2205 gimple_stmt_iterator bsi;
2207 bsi = gsi_for_stmt (stmt);
2209 name = gimple_assign_lhs (stmt);
2210 if (TREE_CODE (name) == SSA_NAME)
2212 next = single_nonlooparound_use (name);
2213 reset_debug_uses (stmt);
2215 else
2217 /* This is a store to be eliminated. */
2218 gcc_assert (gimple_vdef (stmt) != NULL);
2219 next = NULL;
2222 unlink_stmt_vdef (stmt);
2223 gsi_remove (&bsi, true);
2224 release_defs (stmt);
2226 if (!next
2227 || !gimple_assign_ssa_name_copy_p (next)
2228 || gimple_assign_rhs1 (next) != name)
2229 return;
2231 stmt = next;
2235 /* Perform the predictive commoning optimization for a chain CHAIN.
2236 Uids of the newly created temporary variables are marked in TMP_VARS. */
2238 void
2239 pcom_worker::execute_pred_commoning_chain (chain_p chain,
2240 bitmap tmp_vars)
2242 unsigned i;
2243 dref a;
2244 tree var;
2245 bool in_lhs;
2247 if (chain->combined)
2249 /* For combined chains, just remove the statements that are used to
2250 compute the values of the expression (except for the root one).
2251 We delay this until after all chains are processed. */
2253 else if (chain->type == CT_STORE_STORE)
2255 if (chain->length > 0)
2257 if (chain->inv_store_elimination)
2259 /* If dead stores in this chain only store loop invariant
2260 values, we can simply record the invariant value and use
2261 it directly after loop. */
2262 initialize_root_vars_store_elim_1 (chain);
2264 else
2266 /* If dead stores in this chain store loop variant values,
2267 we need to set up the variables by loading from memory
2268 before loop and propagating it with PHI nodes. */
2269 initialize_root_vars_store_elim_2 (m_loop, chain, tmp_vars);
2272 /* For inter-iteration store elimination chain, stores at each
2273 distance in loop's last (chain->length - 1) iterations can't
2274 be eliminated, because there is no following killing store.
2275 We need to generate these stores after loop. */
2276 finalize_eliminated_stores (m_loop, chain);
2279 bool last_store_p = true;
2280 for (i = chain->refs.length (); i > 0; i--)
2282 a = chain->refs[i - 1];
2283 /* Preserve the last store of the chain. Eliminate other stores
2284 which are killed by the last one. */
2285 if (DR_IS_WRITE (a->ref))
2287 if (last_store_p)
2288 last_store_p = false;
2289 else
2290 remove_stmt (a->stmt);
2292 continue;
2295 /* Any load in a Store-Store chain must be dominated by a previous
2296 store; we replace the load reference with the rhs of that store. */
2297 dref b = get_chain_last_write_before_load (chain, i - 1);
2298 gcc_assert (b != NULL);
2299 var = gimple_assign_rhs1 (b->stmt);
2300 replace_ref_with (a->stmt, var, false, false);
2303 else
2305 /* For a non-combined chain, set up the variables that hold its value. */
2306 initialize_root_vars (m_loop, chain, tmp_vars);
2307 a = get_chain_root (chain);
2308 in_lhs = (chain->type == CT_STORE_LOAD
2309 || chain->type == CT_COMBINATION);
2310 replace_ref_with (a->stmt, chain->vars[chain->length], true, in_lhs);
2312 /* Replace the uses of the original references by these variables. */
2313 for (i = 1; chain->refs.iterate (i, &a); i++)
2315 var = chain->vars[chain->length - a->distance];
2316 replace_ref_with (a->stmt, var, false, false);
2321 /* Determines the unroll factor necessary to remove as many temporary variable
2322 copies as possible. CHAINS is the list of chains that will be
2323 optimized. */
2325 static unsigned
2326 determine_unroll_factor (const vec<chain_p> &chains)
2328 chain_p chain;
2329 unsigned factor = 1, af, nfactor, i;
2330 unsigned max = param_max_unroll_times;
2332 FOR_EACH_VEC_ELT (chains, i, chain)
2334 if (chain->type == CT_INVARIANT)
2335 continue;
2336 /* For now we can't handle unrolling when eliminating stores. */
2337 else if (chain->type == CT_STORE_STORE)
2338 return 1;
2340 if (chain->combined)
2342 /* For combined chains, we can't handle unrolling if we replace
2343 looparound PHIs. */
2344 dref a;
2345 unsigned j;
2346 for (j = 1; chain->refs.iterate (j, &a); j++)
2347 if (gimple_code (a->stmt) == GIMPLE_PHI)
2348 return 1;
2349 continue;
2352 /* The best unroll factor for this chain is equal to the number of
2353 temporary variables that we create for it. */
2354 af = chain->length;
2355 if (chain->has_max_use_after)
2356 af++;
2358 nfactor = factor * af / gcd (factor, af);
2359 if (nfactor <= max)
2360 factor = nfactor;
2363 return factor;
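The factor accumulation above takes, for each chain, the least common multiple of the running factor and the chain's preferred factor, and keeps the old factor whenever the LCM would exceed param_max_unroll_times. A minimal standalone sketch of that accumulation (illustrative only; `combine_unroll_factors` is a hypothetical name, and `std::gcd` stands in for GCC's internal `gcd`):

```cpp
#include <numeric>   // std::gcd (C++17)
#include <vector>

// Illustrative only: fold per-chain unroll factors into one loop-wide
// factor via LCM, keeping the running factor when the cap would be hit.
static unsigned
combine_unroll_factors (const std::vector<unsigned> &per_chain_factors,
			unsigned max_unroll)
{
  unsigned factor = 1;
  for (unsigned af : per_chain_factors)
    {
      unsigned nfactor = factor * af / std::gcd (factor, af);  // lcm
      if (nfactor <= max_unroll)
	factor = nfactor;
    }
  return factor;
}
```

Chains needing 2 and 3 temporaries thus combine to an unroll factor of 6, provided the cap allows it; a chain whose factor would push the LCM over the cap is simply left with copies.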
2366 /* Perform the predictive commoning optimization for chains.
2367 Uids of the newly created temporary variables are marked in TMP_VARS. */
2369 void
2370 pcom_worker::execute_pred_commoning (bitmap tmp_vars)
2372 chain_p chain;
2373 unsigned i;
2375 FOR_EACH_VEC_ELT (m_chains, i, chain)
2377 if (chain->type == CT_INVARIANT)
2378 execute_load_motion (m_loop, chain, tmp_vars);
2379 else
2380 execute_pred_commoning_chain (chain, tmp_vars);
2383 FOR_EACH_VEC_ELT (m_chains, i, chain)
2385 if (chain->type == CT_INVARIANT)
2387 else if (chain->combined)
2389 /* For combined chains, just remove the statements that are used to
2390 compute the values of the expression (except for the root one). */
2391 dref a;
2392 unsigned j;
2393 for (j = 1; chain->refs.iterate (j, &a); j++)
2394 remove_stmt (a->stmt);
2399 /* For each reference in CHAINS, if its defining statement is
2400 a phi node, record the ssa name that is defined by it. */
2402 static void
2403 replace_phis_by_defined_names (vec<chain_p> &chains)
2405 chain_p chain;
2406 dref a;
2407 unsigned i, j;
2409 FOR_EACH_VEC_ELT (chains, i, chain)
2410 FOR_EACH_VEC_ELT (chain->refs, j, a)
2412 if (gimple_code (a->stmt) == GIMPLE_PHI)
2414 a->name_defined_by_phi = PHI_RESULT (a->stmt);
2415 a->stmt = NULL;
2420 /* For each reference in CHAINS, if name_defined_by_phi is not
2421 NULL, use it to set the stmt field. */
2423 static void
2424 replace_names_by_phis (vec<chain_p> chains)
2426 chain_p chain;
2427 dref a;
2428 unsigned i, j;
2430 FOR_EACH_VEC_ELT (chains, i, chain)
2431 FOR_EACH_VEC_ELT (chain->refs, j, a)
2432 if (a->stmt == NULL)
2434 a->stmt = SSA_NAME_DEF_STMT (a->name_defined_by_phi);
2435 gcc_assert (gimple_code (a->stmt) == GIMPLE_PHI);
2436 a->name_defined_by_phi = NULL_TREE;
2440 /* Wrapper over execute_pred_commoning, to pass it as a callback
2441 to tree_transform_and_unroll_loop. */
2443 struct epcc_data
2445 vec<chain_p> chains;
2446 bitmap tmp_vars;
2447 pcom_worker *worker;
2450 static void
2451 execute_pred_commoning_cbck (class loop *loop ATTRIBUTE_UNUSED, void *data)
2453 struct epcc_data *const dta = (struct epcc_data *) data;
2454 pcom_worker *worker = dta->worker;
2456 /* Restore phi nodes that were replaced by ssa names before
2457 tree_transform_and_unroll_loop (see detailed description in
2458 tree_predictive_commoning_loop). */
2459 replace_names_by_phis (dta->chains);
2460 worker->execute_pred_commoning (dta->tmp_vars);
2463 /* Base NAME and all the names in the chain of phi nodes that use it
2464 on variable VAR. The phi nodes are recognized by being in the copies of
2465 the header of the LOOP. */
2467 static void
2468 base_names_in_chain_on (class loop *loop, tree name, tree var)
2470 gimple *stmt, *phi;
2471 imm_use_iterator iter;
2473 replace_ssa_name_symbol (name, var);
2475 while (1)
2477 phi = NULL;
2478 FOR_EACH_IMM_USE_STMT (stmt, iter, name)
2480 if (gimple_code (stmt) == GIMPLE_PHI
2481 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
2483 phi = stmt;
2484 break;
2487 if (!phi)
2488 return;
2490 name = PHI_RESULT (phi);
2491 replace_ssa_name_symbol (name, var);
2495 /* Given an unrolled LOOP after predictive commoning, remove the
2496 register copies arising from phi nodes by changing the base
2497 variables of SSA names. TMP_VARS is the set of the temporary variables
2498 for those we want to perform this. */
2500 static void
2501 eliminate_temp_copies (class loop *loop, bitmap tmp_vars)
2503 edge e;
2504 gphi *phi;
2505 gimple *stmt;
2506 tree name, use, var;
2507 gphi_iterator psi;
2509 e = loop_latch_edge (loop);
2510 for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
2512 phi = psi.phi ();
2513 name = PHI_RESULT (phi);
2514 var = SSA_NAME_VAR (name);
2515 if (!var || !bitmap_bit_p (tmp_vars, DECL_UID (var)))
2516 continue;
2517 use = PHI_ARG_DEF_FROM_EDGE (phi, e);
2518 gcc_assert (TREE_CODE (use) == SSA_NAME);
2520 /* Base all the ssa names in the ud and du chain of NAME on VAR. */
2521 stmt = SSA_NAME_DEF_STMT (use);
2522 while (gimple_code (stmt) == GIMPLE_PHI
2523 /* In case we could not unroll the loop enough to eliminate
2524 all copies, we may reach the loop header before the defining
2525 statement (in that case, some register copies will be present
2526 in loop latch in the final code, corresponding to the newly
2527 created looparound phi nodes). */
2528 && gimple_bb (stmt) != loop->header)
2530 gcc_assert (single_pred_p (gimple_bb (stmt)));
2531 use = PHI_ARG_DEF (stmt, 0);
2532 stmt = SSA_NAME_DEF_STMT (use);
2535 base_names_in_chain_on (loop, use, var);
2539 /* Returns true if CHAIN is suitable to be combined. */
2541 static bool
2542 chain_can_be_combined_p (chain_p chain)
2544 return (!chain->combined
2545 && (chain->type == CT_LOAD || chain->type == CT_COMBINATION));
2548 /* Returns the modify statement that uses NAME. Skips over assignment
2549 statements, NAME is replaced with the actual name used in the returned
2550 statement. */
2552 gimple *
2553 pcom_worker::find_use_stmt (tree *name)
2555 gimple *stmt;
2556 tree rhs, lhs;
2558 /* Skip over assignments. */
2559 while (1)
2561 stmt = single_nonlooparound_use (*name);
2562 if (!stmt)
2563 return NULL;
2565 if (gimple_code (stmt) != GIMPLE_ASSIGN)
2566 return NULL;
2568 lhs = gimple_assign_lhs (stmt);
2569 if (TREE_CODE (lhs) != SSA_NAME)
2570 return NULL;
2572 if (gimple_assign_copy_p (stmt))
2574 rhs = gimple_assign_rhs1 (stmt);
2575 if (rhs != *name)
2576 return NULL;
2578 *name = lhs;
2580 else if (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
2581 == GIMPLE_BINARY_RHS)
2582 return stmt;
2583 else
2584 return NULL;
2588 /* Returns true if we may perform reassociation for operation CODE in TYPE. */
2590 static bool
2591 may_reassociate_p (tree type, enum tree_code code)
2593 if (FLOAT_TYPE_P (type)
2594 && !flag_unsafe_math_optimizations)
2595 return false;
2597 return (commutative_tree_code (code)
2598 && associative_tree_code (code));
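may_reassociate_p refuses floating-point types unless -funsafe-math-optimizations is on, because IEEE floating-point addition is commutative but not associative. A small standalone demonstration of why (illustrative, not GCC code):

```cpp
// Illustrative only: reassociating float addition changes the result
// when an intermediate sum loses precision to rounding.
static float
sum_left (float a, float b, float c)
{
  return (a + b) + c;
}

static float
sum_right (float a, float b, float c)
{
  return a + (b + c);
}
```

With a = 1e8f, b = -1e8f, c = 1.0f, the left association yields 1.0f, while in the right association b + c rounds back to -1e8f (the ulp near 1e8 in float is 8), so the result is 0.0f.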
2601 /* If the operation used in STMT is associative and commutative, go through the
2602 tree of the same operations and return its root. The distance to the root
2603 is stored in DISTANCE. */
2605 gimple *
2606 pcom_worker::find_associative_operation_root (gimple *stmt, unsigned *distance)
2608 tree lhs;
2609 gimple *next;
2610 enum tree_code code = gimple_assign_rhs_code (stmt);
2611 tree type = TREE_TYPE (gimple_assign_lhs (stmt));
2612 unsigned dist = 0;
2614 if (!may_reassociate_p (type, code))
2615 return NULL;
2617 while (1)
2619 lhs = gimple_assign_lhs (stmt);
2620 gcc_assert (TREE_CODE (lhs) == SSA_NAME);
2622 next = find_use_stmt (&lhs);
2623 if (!next
2624 || gimple_assign_rhs_code (next) != code)
2625 break;
2627 stmt = next;
2628 dist++;
2631 if (distance)
2632 *distance = dist;
2633 return stmt;
2636 /* Returns the common statement in which NAME1 and NAME2 have a use. If there
2637 is no such statement, returns NULL. In case the operation used on
2638 NAME1 and NAME2 is associative and commutative, returns the root of the
2639 tree formed by this operation instead of the statement that uses NAME1 or
2640 NAME2. */
2642 gimple *
2643 pcom_worker::find_common_use_stmt (tree *name1, tree *name2)
2645 gimple *stmt1, *stmt2;
2647 stmt1 = find_use_stmt (name1);
2648 if (!stmt1)
2649 return NULL;
2651 stmt2 = find_use_stmt (name2);
2652 if (!stmt2)
2653 return NULL;
2655 if (stmt1 == stmt2)
2656 return stmt1;
2658 stmt1 = find_associative_operation_root (stmt1, NULL);
2659 if (!stmt1)
2660 return NULL;
2661 stmt2 = find_associative_operation_root (stmt2, NULL);
2662 if (!stmt2)
2663 return NULL;
2665 return (stmt1 == stmt2 ? stmt1 : NULL);
2668 /* Checks whether R1 and R2 are combined together using CODE, with the result
2669 in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order R2 CODE R1
2670 if it is true. If CODE is ERROR_MARK, set these values instead. */
2672 bool
2673 pcom_worker::combinable_refs_p (dref r1, dref r2,
2674 enum tree_code *code, bool *swap, tree *rslt_type)
2676 enum tree_code acode;
2677 bool aswap;
2678 tree atype;
2679 tree name1, name2;
2680 gimple *stmt;
2682 name1 = name_for_ref (r1);
2683 name2 = name_for_ref (r2);
2684 gcc_assert (name1 != NULL_TREE && name2 != NULL_TREE);
2686 stmt = find_common_use_stmt (&name1, &name2);
2688 if (!stmt
2689 /* A simple post-dominance check - make sure the combination
2690 is executed under the same condition as the references. */
2691 || (gimple_bb (stmt) != gimple_bb (r1->stmt)
2692 && gimple_bb (stmt) != gimple_bb (r2->stmt)))
2693 return false;
2695 acode = gimple_assign_rhs_code (stmt);
2696 aswap = (!commutative_tree_code (acode)
2697 && gimple_assign_rhs1 (stmt) != name1);
2698 atype = TREE_TYPE (gimple_assign_lhs (stmt));
2700 if (*code == ERROR_MARK)
2702 *code = acode;
2703 *swap = aswap;
2704 *rslt_type = atype;
2705 return true;
2708 return (*code == acode
2709 && *swap == aswap
2710 && *rslt_type == atype);
2713 /* Remove OP from the operation on rhs of STMT, and replace STMT with
2714 an assignment of the remaining operand. */
2716 static void
2717 remove_name_from_operation (gimple *stmt, tree op)
2719 tree other_op;
2720 gimple_stmt_iterator si;
2722 gcc_assert (is_gimple_assign (stmt));
2724 if (gimple_assign_rhs1 (stmt) == op)
2725 other_op = gimple_assign_rhs2 (stmt);
2726 else
2727 other_op = gimple_assign_rhs1 (stmt);
2729 si = gsi_for_stmt (stmt);
2730 gimple_assign_set_rhs_from_tree (&si, other_op);
2732 /* We should not have reallocated STMT. */
2733 gcc_assert (gsi_stmt (si) == stmt);
2735 update_stmt (stmt);
2738 /* Reassociates the expression in which NAME1 and NAME2 are used so that they
2739 are combined in a single statement, and returns that statement. */
2741 gimple *
2742 pcom_worker::reassociate_to_the_same_stmt (tree name1, tree name2)
2744 gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2;
2745 gassign *new_stmt, *tmp_stmt;
2746 tree new_name, tmp_name, var, r1, r2;
2747 unsigned dist1, dist2;
2748 enum tree_code code;
2749 tree type = TREE_TYPE (name1);
2750 gimple_stmt_iterator bsi;
2752 stmt1 = find_use_stmt (&name1);
2753 stmt2 = find_use_stmt (&name2);
2754 root1 = find_associative_operation_root (stmt1, &dist1);
2755 root2 = find_associative_operation_root (stmt2, &dist2);
2756 code = gimple_assign_rhs_code (stmt1);
2758 gcc_assert (root1 && root2 && root1 == root2
2759 && code == gimple_assign_rhs_code (stmt2));
2761 /* Find the root of the nearest expression in which both NAME1 and NAME2
2762 are used. */
2763 r1 = name1;
2764 s1 = stmt1;
2765 r2 = name2;
2766 s2 = stmt2;
2768 while (dist1 > dist2)
2770 s1 = find_use_stmt (&r1);
2771 r1 = gimple_assign_lhs (s1);
2772 dist1--;
2774 while (dist2 > dist1)
2776 s2 = find_use_stmt (&r2);
2777 r2 = gimple_assign_lhs (s2);
2778 dist2--;
2781 while (s1 != s2)
2783 s1 = find_use_stmt (&r1);
2784 r1 = gimple_assign_lhs (s1);
2785 s2 = find_use_stmt (&r2);
2786 r2 = gimple_assign_lhs (s2);
2789 /* Remove NAME1 and NAME2 from the statements in which they are currently
2790 used. */
2791 remove_name_from_operation (stmt1, name1);
2792 remove_name_from_operation (stmt2, name2);
2794 /* Insert the new statement combining NAME1 and NAME2 before S1, and
2795 combine it with the rhs of S1. */
2796 var = create_tmp_reg (type, "predreastmp");
2797 new_name = make_ssa_name (var);
2798 new_stmt = gimple_build_assign (new_name, code, name1, name2);
2800 var = create_tmp_reg (type, "predreastmp");
2801 tmp_name = make_ssa_name (var);
2803 /* Rhs of S1 may now be either a binary expression with operation
2804 CODE, or gimple_val (in case that stmt1 == s1 or stmt2 == s1,
2805 so that name1 or name2 was removed from it). */
2806 tmp_stmt = gimple_build_assign (tmp_name, gimple_assign_rhs_code (s1),
2807 gimple_assign_rhs1 (s1),
2808 gimple_assign_rhs2 (s1));
2810 bsi = gsi_for_stmt (s1);
2811 gimple_assign_set_rhs_with_ops (&bsi, code, new_name, tmp_name);
2812 s1 = gsi_stmt (bsi);
2813 update_stmt (s1);
2815 gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT);
2816 gsi_insert_before (&bsi, tmp_stmt, GSI_SAME_STMT);
2818 return new_stmt;
2821 /* Returns the statement that combines references R1 and R2. In case R1
2822 and R2 are not used in the same statement, but they are used with an
2823 associative and commutative operation in the same expression, reassociate
2824 the expression so that they are used in the same statement. */
2826 gimple *
2827 pcom_worker::stmt_combining_refs (dref r1, dref r2)
2829 gimple *stmt1, *stmt2;
2830 tree name1 = name_for_ref (r1);
2831 tree name2 = name_for_ref (r2);
2833 stmt1 = find_use_stmt (&name1);
2834 stmt2 = find_use_stmt (&name2);
2835 if (stmt1 == stmt2)
2836 return stmt1;
2838 return reassociate_to_the_same_stmt (name1, name2);
2841 /* Tries to combine chains CH1 and CH2 together. If this succeeds, the
2842 description of the new chain is returned, otherwise we return NULL. */
2844 chain_p
2845 pcom_worker::combine_chains (chain_p ch1, chain_p ch2)
2847 dref r1, r2, nw;
2848 enum tree_code op = ERROR_MARK;
2849 bool swap = false;
2850 chain_p new_chain;
2851 unsigned i;
2852 tree rslt_type = NULL_TREE;
2854 if (ch1 == ch2)
2855 return NULL;
2856 if (ch1->length != ch2->length)
2857 return NULL;
2859 if (ch1->refs.length () != ch2->refs.length ())
2860 return NULL;
2862 for (i = 0; (ch1->refs.iterate (i, &r1)
2863 && ch2->refs.iterate (i, &r2)); i++)
2865 if (r1->distance != r2->distance)
2866 return NULL;
2868 if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type))
2869 return NULL;
2872 if (swap)
2873 std::swap (ch1, ch2);
2875 new_chain = XCNEW (struct chain);
2876 new_chain->type = CT_COMBINATION;
2877 new_chain->op = op;
2878 new_chain->ch1 = ch1;
2879 new_chain->ch2 = ch2;
2880 new_chain->rslt_type = rslt_type;
2881 new_chain->length = ch1->length;
2883 for (i = 0; (ch1->refs.iterate (i, &r1)
2884 && ch2->refs.iterate (i, &r2)); i++)
2886 nw = XCNEW (class dref_d);
2887 nw->stmt = stmt_combining_refs (r1, r2);
2888 nw->distance = r1->distance;
2890 new_chain->refs.safe_push (nw);
2893 ch1->combined = true;
2894 ch2->combined = true;
2895 return new_chain;
2898 /* Recursively update position information of all offspring chains to ROOT
2899 chain's position information. */
2901 static void
2902 update_pos_for_combined_chains (chain_p root)
2904 chain_p ch1 = root->ch1, ch2 = root->ch2;
2905 dref ref, ref1, ref2;
2906 for (unsigned j = 0; (root->refs.iterate (j, &ref)
2907 && ch1->refs.iterate (j, &ref1)
2908 && ch2->refs.iterate (j, &ref2)); ++j)
2909 ref1->pos = ref2->pos = ref->pos;
2911 if (ch1->type == CT_COMBINATION)
2912 update_pos_for_combined_chains (ch1);
2913 if (ch2->type == CT_COMBINATION)
2914 update_pos_for_combined_chains (ch2);
2917 /* Returns true if statement S1 dominates statement S2. */
2919 static bool
2920 pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2)
2922 basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
2924 if (!bb1 || s1 == s2)
2925 return true;
2927 if (bb1 == bb2)
2928 return gimple_uid (s1) < gimple_uid (s2);
2930 return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
2933 /* Try to combine the chains. */
2935 void
2936 pcom_worker::try_combine_chains ()
2938 unsigned i, j;
2939 chain_p ch1, ch2, cch;
2940 auto_vec<chain_p> worklist;
2941 bool combined_p = false;
2943 FOR_EACH_VEC_ELT (m_chains, i, ch1)
2944 if (chain_can_be_combined_p (ch1))
2945 worklist.safe_push (ch1);
2947 while (!worklist.is_empty ())
2949 ch1 = worklist.pop ();
2950 if (!chain_can_be_combined_p (ch1))
2951 continue;
2953 FOR_EACH_VEC_ELT (m_chains, j, ch2)
2955 if (!chain_can_be_combined_p (ch2))
2956 continue;
2958 cch = combine_chains (ch1, ch2);
2959 if (cch)
2961 worklist.safe_push (cch);
2962 m_chains.safe_push (cch);
2963 combined_p = true;
2964 break;
2968 if (!combined_p)
2969 return;
2971 /* Set up UIDs for all statements in dominance order. */
2972 basic_block *bbs = get_loop_body_in_dom_order (m_loop);
2973 renumber_gimple_stmt_uids_in_blocks (bbs, m_loop->num_nodes);
2974 free (bbs);
2976 /* Re-association in combined chains may generate statements in an order
2977 different from that of the references in the original chain. We need to
2978 keep the references of a combined chain in dominance order so that all
2979 uses are inserted after their definitions. Note:
2980 A) This is necessary for all combined chains.
2981 B) This is only necessary for ZERO-distance references, because other
2982 references inherit their value from loop-carried PHIs.
2984 We first update position information for all combined chains. */
2985 dref ref;
2986 for (i = 0; m_chains.iterate (i, &ch1); ++i)
2988 if (ch1->type != CT_COMBINATION || ch1->combined)
2989 continue;
2991 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
2992 ref->pos = gimple_uid (ref->stmt);
2994 update_pos_for_combined_chains (ch1);
2996 /* Then sort references according to newly updated position information. */
2997 for (i = 0; m_chains.iterate (i, &ch1); ++i)
2999 if (ch1->type != CT_COMBINATION && !ch1->combined)
3000 continue;
3002 /* Find the first reference with non-ZERO distance. */
3003 if (ch1->length == 0)
3004 j = ch1->refs.length ();
3005 else
3007 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
3008 if (ref->distance != 0)
3009 break;
3012 /* Sort all ZERO distance references by position. */
3013 qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos);
3015 if (ch1->combined)
3016 continue;
3018 /* For a ZERO-length chain, has_max_use_after must be true since the root
3019 combined stmt must dominate the others. */
3020 if (ch1->length == 0)
3022 ch1->has_max_use_after = true;
3023 continue;
3025 /* Check if there is a use at the maximum distance after the root for
3026 combined chains and set the flag accordingly. */
3027 ch1->has_max_use_after = false;
3028 gimple *root_stmt = get_chain_root (ch1)->stmt;
3029 for (j = 1; ch1->refs.iterate (j, &ref); ++j)
3031 if (ref->distance == ch1->length
3032 && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt))
3034 ch1->has_max_use_after = true;
3035 break;
3041 /* Prepare initializers for store elimination CHAIN in LOOP. Returns false
3042 if this is impossible because one of these initializers may trap, true
3043 otherwise. */
3045 static bool
3046 prepare_initializers_chain_store_elim (class loop *loop, chain_p chain)
3048 unsigned i, n = chain->length;
3050 /* For now we can't eliminate stores if some of them are conditionally
3051 executed. */
3052 if (!chain->all_always_accessed)
3053 return false;
3055 /* Nothing to initialize for intra-iteration store elimination. */
3056 if (n == 0 && chain->type == CT_STORE_STORE)
3057 return true;
3059 /* For a store elimination chain, there is nothing to initialize if the
3060 stores to be eliminated only store loop-invariant values into memory. */
3061 if (chain->type == CT_STORE_STORE
3062 && is_inv_store_elimination_chain (loop, chain))
3064 chain->inv_store_elimination = true;
3065 return true;
3068 chain->inits.create (n);
3069 chain->inits.safe_grow_cleared (n, true);
3071 /* For a store elimination chain like the one below:
3073 for (i = 0; i < len; i++)
3075 a[i] = 1;
3076 // a[i + 1] = ...
3077 a[i + 2] = 3;
3080 the store to a[i + 1] is missing from the loop body; it acts like a
3081 bubble. The content of a[i + 1] remains the same if the loop iterates
3082 fewer times than chain->length. We need to set up root variables for
3083 such stores by loading from memory before the loop. Note we only need
3084 to load the bubble elements, because the loop body is guaranteed to be
3085 executed at least once after the loop's preheader edge. */
3086 auto_vec<bool> bubbles;
3087 bubbles.safe_grow_cleared (n + 1, true);
3088 for (i = 0; i < chain->refs.length (); i++)
3089 bubbles[chain->refs[i]->distance] = true;
3091 struct data_reference *dr = get_chain_root (chain)->ref;
3092 for (i = 0; i < n; i++)
3094 if (bubbles[i])
3095 continue;
3097 gimple_seq stmts = NULL;
3099 tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
3100 if (stmts)
3101 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3103 chain->inits[i] = init;
3106 return true;
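In the function above, the bubbles[] bitmap actually marks the distances at which the loop body does store; the pre-loop loads are then generated for the unmarked slots, which are the bubbles the comment describes. A standalone sketch of that selection (illustrative only; `bubble_distances` is a hypothetical helper, not GCC code):

```cpp
#include <vector>

// Illustrative only: given the store distances present in a chain of
// length n, return the "bubble" distances whose root variables must be
// initialized by loading from memory before the loop.
static std::vector<unsigned>
bubble_distances (const std::vector<unsigned> &store_distances, unsigned n)
{
  std::vector<bool> stored (n + 1, false);
  for (unsigned d : store_distances)
    stored[d] = true;

  std::vector<unsigned> bubbles;
  for (unsigned i = 0; i < n; i++)
    if (!stored[i])
      bubbles.push_back (i);
  return bubbles;
}
```

For the a[i] / a[i + 2] example above (store distances 2 and 0 in a chain of length 2), the only bubble is distance 1, i.e. a[i + 1].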
3109 /* Prepare initializers for CHAIN. Returns false if this is impossible
3110 because one of these initializers may trap, true otherwise. */
3112 bool
3113 pcom_worker::prepare_initializers_chain (chain_p chain)
3115 unsigned i, n = (chain->type == CT_INVARIANT) ? 1 : chain->length;
3116 struct data_reference *dr = get_chain_root (chain)->ref;
3117 tree init;
3118 dref laref;
3119 edge entry = loop_preheader_edge (m_loop);
3121 if (chain->type == CT_STORE_STORE)
3122 return prepare_initializers_chain_store_elim (m_loop, chain);
3124 /* Find the initializers for the variables, and check that they cannot
3125 trap. */
3126 chain->inits.create (n);
3127 for (i = 0; i < n; i++)
3128 chain->inits.quick_push (NULL_TREE);
3130 /* If we have replaced some looparound phi nodes, use their initializers
3131 instead of creating our own. */
3132 FOR_EACH_VEC_ELT (chain->refs, i, laref)
3134 if (gimple_code (laref->stmt) != GIMPLE_PHI)
3135 continue;
3137 gcc_assert (laref->distance > 0);
3138 chain->inits[n - laref->distance]
3139 = PHI_ARG_DEF_FROM_EDGE (laref->stmt, entry);
3142 for (i = 0; i < n; i++)
3144 gimple_seq stmts = NULL;
3146 if (chain->inits[i] != NULL_TREE)
3147 continue;
3149 init = ref_at_iteration (dr, (int) i - n, &stmts);
3150 if (!chain->all_always_accessed && tree_could_trap_p (init))
3152 gimple_seq_discard (stmts);
3153 return false;
3156 if (stmts)
3157 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3159 chain->inits[i] = init;
3162 return true;
3165 /* Prepare initializers for chains, and free chains that cannot
3166 be used because the initializers might trap. */
3168 void
3169 pcom_worker::prepare_initializers ()
3171 chain_p chain;
3172 unsigned i;
3174 for (i = 0; i < m_chains.length (); )
3176 chain = m_chains[i];
3177 if (prepare_initializers_chain (chain))
3178 i++;
3179 else
3181 release_chain (chain);
3182 m_chains.unordered_remove (i);
3187 /* Generates finalizer memory references for CHAIN. Returns true
3188 if finalizer code for CHAIN can be generated, otherwise false. */
3190 bool
3191 pcom_worker::prepare_finalizers_chain (chain_p chain)
3193 unsigned i, n = chain->length;
3194 struct data_reference *dr = get_chain_root (chain)->ref;
3195 tree fini, niters = number_of_latch_executions (m_loop);
3197 /* For now we can't eliminate stores if some of them are conditionally
3198 executed. */
3199 if (!chain->all_always_accessed)
3200 return false;
3202 chain->finis.create (n);
3203 for (i = 0; i < n; i++)
3204 chain->finis.quick_push (NULL_TREE);
3206 /* We never use looparound phi node for store elimination chains. */
3208 /* Find the finalizers for the variables, and check that they cannot
3209 trap. */
3210 for (i = 0; i < n; i++)
3212 gimple_seq stmts = NULL;
3213 gcc_assert (chain->finis[i] == NULL_TREE);
3215 if (TREE_CODE (niters) != INTEGER_CST && TREE_CODE (niters) != SSA_NAME)
3217 niters = unshare_expr (niters);
3218 niters = force_gimple_operand (niters, &stmts, true, NULL);
3219 if (stmts)
3221 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3222 stmts = NULL;
3225 fini = ref_at_iteration (dr, (int) 0 - i, &stmts, niters);
3226 if (stmts)
3227 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3229 chain->finis[i] = fini;
3232 return true;
3235 /* Generates finalizer memory references for chains. Returns true if
3236 finalizer code generation for the chains breaks loop-closed SSA form. */
3238 bool
3239 pcom_worker::prepare_finalizers ()
3241 chain_p chain;
3242 unsigned i;
3243 bool loop_closed_ssa = false;
3245 for (i = 0; i < m_chains.length ();)
3247 chain = m_chains[i];
3249 /* Finalizer is only necessary for inter-iteration store elimination
3250 chains. */
3251 if (chain->length == 0 || chain->type != CT_STORE_STORE)
3253 i++;
3254 continue;
3257 if (prepare_finalizers_chain (chain))
3259 i++;
3260 /* Be conservative and assume loop-closed SSA form is corrupted by
3261 the store-store chain, though that is not always the case if the
3262 eliminated stores only store loop-invariant values into
3263 memory. */
3264 loop_closed_ssa = true;
3266 else
3268 release_chain (chain);
3269 m_chains.unordered_remove (i);
3272 return loop_closed_ssa;
3275 /* Insert all initializing gimple stmts into LOOP's entry edge. */
3277 static void
3278 insert_init_seqs (class loop *loop, vec<chain_p> &chains)
3280 unsigned i;
3281 edge entry = loop_preheader_edge (loop);
3283 for (i = 0; i < chains.length (); ++i)
3284 if (chains[i]->init_seq)
3286 gsi_insert_seq_on_edge_immediate (entry, chains[i]->init_seq);
3287 chains[i]->init_seq = NULL;
3291 /* Performs predictive commoning for LOOP. Sets bit 1<<1 of the return
3292 value if LOOP was unrolled; sets bit 1<<2 of the return value if
3293 loop-closed SSA form was corrupted. A non-zero return value indicates
3294 that some changes were applied to this loop. */
3296 unsigned
3297 pcom_worker::tree_predictive_commoning_loop (bool allow_unroll_p)
3299 struct component *components;
3300 unsigned unroll_factor = 0;
3301 class tree_niter_desc desc;
3302 bool unroll = false, loop_closed_ssa = false;
3304 if (dump_file && (dump_flags & TDF_DETAILS))
3305 fprintf (dump_file, "Processing loop %d\n", m_loop->num);
3307 /* Nothing for predictive commoning if the loop iterates only once. */
3308 if (get_max_loop_iterations_int (m_loop) == 0)
3310 if (dump_file && (dump_flags & TDF_DETAILS))
3311 fprintf (dump_file, "Loop iterates only 1 time, nothing to do.\n");
3313 return 0;
3316 /* Find the data references and split them into components according to their
3317 dependence relations. */
3318 auto_vec<loop_p, 3> loop_nest;
3319 if (!compute_data_dependences_for_loop (m_loop, true, &loop_nest, &m_datarefs,
3320 &m_dependences))
3322 if (dump_file && (dump_flags & TDF_DETAILS))
3323 fprintf (dump_file, "Cannot analyze data dependencies\n");
3324 return 0;
3327 if (dump_file && (dump_flags & TDF_DETAILS))
3328 dump_data_dependence_relations (dump_file, m_dependences);
3330 components = split_data_refs_to_components ();
3332 loop_nest.release ();
3333 if (!components)
3334 return 0;
3336 if (dump_file && (dump_flags & TDF_DETAILS))
3338 fprintf (dump_file, "Initial state:\n\n");
3339 dump_components (dump_file, components);
3342 /* Find the suitable components and split them into chains. */
3343 components = filter_suitable_components (components);
3345 auto_bitmap tmp_vars;
3346 determine_roots (components);
3347 release_components (components);
3349 if (!m_chains.exists ())
3351 if (dump_file && (dump_flags & TDF_DETAILS))
3352 fprintf (dump_file,
3353 "Predictive commoning failed: no suitable chains\n");
3354 return 0;
3357 prepare_initializers ();
3358 loop_closed_ssa = prepare_finalizers ();
3360 /* Try to combine the chains that are always worked with together. */
3361 try_combine_chains ();
3363 insert_init_seqs (m_loop, m_chains);
3365 if (dump_file && (dump_flags & TDF_DETAILS))
3367 fprintf (dump_file, "Before commoning:\n\n");
3368 dump_chains (dump_file, m_chains);
3371 if (allow_unroll_p)
3372 /* Determine the unroll factor, and if the loop should be unrolled, ensure
3373 that its number of iterations is divisible by the factor. */
3374 unroll_factor = determine_unroll_factor (m_chains);
3376 if (unroll_factor > 1)
3377 unroll = can_unroll_loop_p (m_loop, unroll_factor, &desc);
3379 /* Execute the predictive commoning transformations, and possibly unroll the
3380 loop. */
3381 if (unroll)
3383 struct epcc_data dta;
3385 if (dump_file && (dump_flags & TDF_DETAILS))
3386 fprintf (dump_file, "Unrolling %u times.\n", unroll_factor);
3388 dta.tmp_vars = tmp_vars;
3389 dta.chains = m_chains.to_vec_legacy ();
3390 dta.worker = this;
3392 /* Cfg manipulations performed in tree_transform_and_unroll_loop before
3393 execute_pred_commoning_cbck is called may cause phi nodes to be
3394 reallocated, which is a problem since CHAINS may point to these
3395 statements. To fix this, we store the ssa names defined by the
3396 phi nodes here instead of the phi nodes themselves, and restore
3397 the phi nodes in execute_pred_commoning_cbck. A bit hacky. */
3398 replace_phis_by_defined_names (m_chains);
3400 edge exit = single_dom_exit (m_loop);
3401 tree_transform_and_unroll_loop (m_loop, unroll_factor, exit, &desc,
3402 execute_pred_commoning_cbck, &dta);
3403 eliminate_temp_copies (m_loop, tmp_vars);
3405 else
3407 if (dump_file && (dump_flags & TDF_DETAILS))
3408 fprintf (dump_file,
3409 "Executing predictive commoning without unrolling.\n");
3410 execute_pred_commoning (tmp_vars);
3413 return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1);
3416 /* Runs predictive commoning. */
3418 unsigned
3419 tree_predictive_commoning (bool allow_unroll_p)
3421 unsigned ret = 0, changed = 0;
3423 initialize_original_copy_tables ();
3424 for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
3425 if (optimize_loop_for_speed_p (loop))
3427 pcom_worker w(loop);
3428 changed |= w.tree_predictive_commoning_loop (allow_unroll_p);
3430 free_original_copy_tables ();
3432 if (changed > 0)
3434 ret = TODO_update_ssa_only_virtuals;
3436 /* Some loop(s) got unrolled. */
3437 if (changed > 1)
3439 scev_reset ();
3441 /* Need to fix up loop closed SSA. */
3442 if (changed >= 4)
3443 rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
3445 ret |= TODO_cleanup_cfg;
3449 return ret;
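The per-loop results are OR-ed into `changed` and then tested as thresholds rather than as individual bits: any non-zero value means something changed, a value above 1 means some loop was unrolled (bit 1<<1), and a value of 4 or more means loop-closed SSA must be rewritten (bit 1<<2). A sketch of that decoding (illustrative only; `decode_changed` and `pcom_result` are hypothetical names, not GCC code):

```cpp
// Illustrative only: interpret the OR-accumulated per-loop results the
// same way tree_predictive_commoning does.
struct pcom_result
{
  bool changed;       // some loop was transformed
  bool unrolled;      // some loop was unrolled (bit 1<<1 seen)
  bool lcssa_broken;  // loop-closed SSA needs rewriting (bit 1<<2 seen)
};

static pcom_result
decode_changed (unsigned changed)
{
  return { changed > 0, changed > 1, changed >= 4 };
}
```

Because a loop that is merely transformed returns 1, one that is unrolled returns at least 2, and one that breaks LCSSA contributes 4, the three threshold tests recover all three facts from the single accumulated value.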
3452 /* Predictive commoning Pass. */
3454 static unsigned
3455 run_tree_predictive_commoning (struct function *fun, bool allow_unroll_p)
3457 if (number_of_loops (fun) <= 1)
3458 return 0;
3460 return tree_predictive_commoning (allow_unroll_p);
3463 namespace {
3465 const pass_data pass_data_predcom =
3467 GIMPLE_PASS, /* type */
3468 "pcom", /* name */
3469 OPTGROUP_LOOP, /* optinfo_flags */
3470 TV_PREDCOM, /* tv_id */
3471 PROP_cfg, /* properties_required */
3472 0, /* properties_provided */
3473 0, /* properties_destroyed */
3474 0, /* todo_flags_start */
3475 0, /* todo_flags_finish */
3478 class pass_predcom : public gimple_opt_pass
3480 public:
3481 pass_predcom (gcc::context *ctxt)
3482 : gimple_opt_pass (pass_data_predcom, ctxt)
3485 /* opt_pass methods: */
3486 virtual bool
3487 gate (function *)
3489 if (flag_predictive_commoning != 0)
3490 return true;
3491 /* Loop vectorization enables predictive commoning implicitly
3492 only if predictive commoning isn't set explicitly, and it
3493 doesn't allow unrolling. */
3494 if (flag_tree_loop_vectorize
3495 && !global_options_set.x_flag_predictive_commoning)
3496 return true;
3498 return false;
3501 virtual unsigned int
3502 execute (function *fun)
3504 bool allow_unroll_p = flag_predictive_commoning != 0;
3505 return run_tree_predictive_commoning (fun, allow_unroll_p);
3508 }; // class pass_predcom
3510 } // anon namespace
3512 gimple_opt_pass *
3513 make_pass_predcom (gcc::context *ctxt)
3515 return new pass_predcom (ctxt);