c++: wrong error with MVP and pushdecl [PR64679]
[official-gcc.git] / gcc / tree-predcom.cc
blobfb457250bbd36dca0171e39505046e0a5aca5063
1 /* Predictive commoning.
2 Copyright (C) 2005-2022 Free Software Foundation, Inc.
4 This file is part of GCC.
6 GCC is free software; you can redistribute it and/or modify it
7 under the terms of the GNU General Public License as published by the
8 Free Software Foundation; either version 3, or (at your option) any
9 later version.
11 GCC is distributed in the hope that it will be useful, but WITHOUT
12 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
13 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
14 for more details.
16 You should have received a copy of the GNU General Public License
17 along with GCC; see the file COPYING3. If not see
18 <http://www.gnu.org/licenses/>. */
20 /* This file implements the predictive commoning optimization. Predictive
21 commoning can be viewed as CSE around a loop, and with some improvements,
22 as generalized strength reduction-- i.e., reusing values computed in
23 earlier iterations of a loop in the later ones. So far, the pass only
24 handles the most useful case, that is, reusing values of memory references.
25 If you think this is all just a special case of PRE, you are sort of right;
26 however, concentrating on loops is simpler, and makes it possible to
27 incorporate data dependence analysis to detect the opportunities, perform
28 loop unrolling to avoid copies together with renaming immediately,
29 and if needed, we could also take register pressure into account.
31 Let us demonstrate what is done on an example:
33 for (i = 0; i < 100; i++)
35 a[i+2] = a[i] + a[i+1];
36 b[10] = b[10] + i;
37 c[i] = c[99 - i];
38 d[i] = d[i + 1];
41 1) We find data references in the loop, and split them to mutually
42 independent groups (i.e., we find components of a data dependence
43 graph). We ignore read-read dependences whose distance is not constant.
44 (TODO -- we could also ignore antidependences). In this example, we
45 find the following groups:
47 a[i]{read}, a[i+1]{read}, a[i+2]{write}
48 b[10]{read}, b[10]{write}
49 c[99 - i]{read}, c[i]{write}
50 d[i + 1]{read}, d[i]{write}
52 2) Inside each of the group, we verify several conditions:
53 a) all the references must differ in indices only, and the indices
54 must all have the same step
55 b) the references must dominate loop latch (and thus, they must be
56 ordered by dominance relation).
57 c) the distance of the indices must be a small multiple of the step
58 We are then able to compute the difference of the references (# of
59 iterations before they point to the same place as the first of them).
60 Also, in case there are writes in the loop, we split the groups into
61 chains whose head is the write whose values are used by the reads in
62 the same chain. The chains are then processed independently,
63 making the further transformations simpler. Also, the shorter chains
64 need the same number of registers, but may require lower unrolling
65 factor in order to get rid of the copies on the loop latch.
67 In our example, we get the following chains (the chain for c is invalid).
69 a[i]{read,+0}, a[i+1]{read,-1}, a[i+2]{write,-2}
70 b[10]{read,+0}, b[10]{write,+0}
71 d[i + 1]{read,+0}, d[i]{write,+1}
73 3) For each read, we determine the read or write whose value it reuses,
74 together with the distance of this reuse. I.e. we take the last
75 reference before it with distance 0, or the last of the references
76 with the smallest positive distance to the read. Then, we remove
77 the references that are not used in any of these chains, discard the
78 empty groups, and propagate all the links so that they point to the
79 single root reference of the chain (adjusting their distance
80 appropriately). Some extra care needs to be taken for references with
81 step 0. In our example (the numbers indicate the distance of the
82 reuse),
84 a[i] --> (*) 2, a[i+1] --> (*) 1, a[i+2] (*)
85 b[10] --> (*) 1, b[10] (*)
87 4) The chains are combined together if possible. If the corresponding
88 elements of two chains are always combined together with the same
89 operator, we remember just the result of this combination, instead
90 of remembering the values separately. We may need to perform
91 reassociation to enable combining, for example
93 e[i] + f[i+1] + e[i+1] + f[i]
95 can be reassociated as
97 (e[i] + f[i]) + (e[i+1] + f[i+1])
99 and we can combine the chains for e and f into one chain.
101 5) For each root reference (end of the chain) R, let N be maximum distance
102 of a reference reusing its value. Variables R0 up to RN are created,
103 together with phi nodes that transfer values from R1 .. RN to
104 R0 .. R(N-1).
105 Initial values are loaded to R0..R(N-1) (in case not all references
106 must necessarily be accessed and they may trap, we may fail here;
107 TODO sometimes, the loads could be guarded by a check for the number
108 of iterations). Values loaded/stored in roots are also copied to
109 RN. Other reads are replaced with the appropriate variable Ri.
110 Everything is put to SSA form.
112 As a small improvement, if R0 is dead after the root (i.e., all uses of
113 the value with the maximum distance dominate the root), we can avoid
114 creating RN and use R0 instead of it.
116 In our example, we get (only the parts concerning a and b are shown):
117 for (i = 0; i < 100; i++)
119 f = phi (a[0], s);
120 s = phi (a[1], f);
121 x = phi (b[10], x);
123 f = f + s;
124 a[i+2] = f;
125 x = x + i;
126 b[10] = x;
129 6) Factor F for unrolling is determined as the smallest common multiple of
130 (N + 1) for each root reference (N for references for that we avoided
131 creating RN). If F and the loop is small enough, loop is unrolled F
132 times. The stores to RN (R0) in the copies of the loop body are
133 periodically replaced with R0, R1, ... (R1, R2, ...), so that they can
134 be coalesced and the copies can be eliminated.
136 TODO -- copy propagation and other optimizations may change the live
137 ranges of the temporary registers and prevent them from being coalesced;
138 this may increase the register pressure.
140 In our case, F = 2 and the (main loop of the) result is
142 for (i = 0; i < ...; i += 2)
144 f = phi (a[0], f);
145 s = phi (a[1], s);
146 x = phi (b[10], x);
148 f = f + s;
149 a[i+2] = f;
150 x = x + i;
151 b[10] = x;
153 s = s + f;
154 a[i+3] = s;
155 x = x + i;
156 b[10] = x;
159 Apart from predictive commoning on Load-Load and Store-Load chains, we
160 also support Store-Store chains -- stores killed by other store can be
161 eliminated. Given below example:
163 for (i = 0; i < n; i++)
165 a[i] = 1;
166 a[i+2] = 2;
169 It can be replaced with:
171 t0 = a[0];
172 t1 = a[1];
173 for (i = 0; i < n; i++)
175 a[i] = 1;
176 t2 = 2;
177 t0 = t1;
178 t1 = t2;
180 a[n] = t0;
181 a[n+1] = t1;
183 If the loop runs more than 1 iterations, it can be further simplified into:
185 for (i = 0; i < n; i++)
187 a[i] = 1;
189 a[n] = 2;
190 a[n+1] = 2;
192 The interesting part is this can be viewed either as general store motion
193 or general dead store elimination in either intra/inter-iterations way.
195 With trivial effort, we also support load inside Store-Store chains if the
196 load is dominated by a store statement in the same iteration of loop. You
197 can see this as a restricted Store-Mixed-Load-Store chain.
199 TODO: For now, we don't support store-store chains in multi-exit loops. We
200 force to not unroll in case of store-store chain even if other chains might
201 ask for unroll.
203 Predictive commoning can be generalized for arbitrary computations (not
204 just memory loads), and also nontrivial transfer functions (e.g., replacing
205 i * i with ii_last + 2 * i + 1), to generalize strength reduction. */
207 #include "config.h"
208 #include "system.h"
209 #include "coretypes.h"
210 #include "backend.h"
211 #include "rtl.h"
212 #include "tree.h"
213 #include "gimple.h"
214 #include "predict.h"
215 #include "tree-pass.h"
216 #include "ssa.h"
217 #include "gimple-pretty-print.h"
218 #include "alias.h"
219 #include "fold-const.h"
220 #include "cfgloop.h"
221 #include "tree-eh.h"
222 #include "gimplify.h"
223 #include "gimple-iterator.h"
224 #include "gimplify-me.h"
225 #include "tree-ssa-loop-ivopts.h"
226 #include "tree-ssa-loop-manip.h"
227 #include "tree-ssa-loop-niter.h"
228 #include "tree-ssa-loop.h"
229 #include "tree-into-ssa.h"
230 #include "tree-dfa.h"
231 #include "tree-ssa.h"
232 #include "tree-data-ref.h"
233 #include "tree-scalar-evolution.h"
234 #include "tree-affine.h"
235 #include "builtins.h"
236 #include "opts.h"
238 /* The maximum number of iterations between the considered memory
239 references. */
241 #define MAX_DISTANCE (target_avail_regs < 16 ? 4 : 8)
243 /* Data references (or phi nodes that carry data reference values across
244 loop iterations). */
246 typedef class dref_d
248 public:
249 /* The reference itself. */
250 struct data_reference *ref;
252 /* The statement in that the reference appears. */
253 gimple *stmt;
255 /* In case that STMT is a phi node, this field is set to the SSA name
256 defined by it in replace_phis_by_defined_names (in order to avoid
257 pointing to phi node that got reallocated in the meantime). */
258 tree name_defined_by_phi;
260 /* Distance of the reference from the root of the chain (in number of
261 iterations of the loop). */
262 unsigned distance;
264 /* Number of iterations offset from the first reference in the component. */
265 widest_int offset;
267 /* Number of the reference in a component, in dominance ordering. */
268 unsigned pos;
270 /* True if the memory reference is always accessed when the loop is
271 entered. */
272 unsigned always_accessed : 1;
273 } *dref;
276 /* Type of the chain of the references. */
278 enum chain_type
280 /* The addresses of the references in the chain are constant. */
281 CT_INVARIANT,
283 /* There are only loads in the chain. */
284 CT_LOAD,
286 /* Root of the chain is store, the rest are loads. */
287 CT_STORE_LOAD,
289 /* There are only stores in the chain. */
290 CT_STORE_STORE,
292 /* A combination of two chains. */
293 CT_COMBINATION
296 /* Chains of data references. */
298 typedef struct chain
300 chain (chain_type t) : type (t), op (ERROR_MARK), rslt_type (NULL_TREE),
301 ch1 (NULL), ch2 (NULL), length (0), init_seq (NULL), fini_seq (NULL),
302 has_max_use_after (false), all_always_accessed (false), combined (false),
303 inv_store_elimination (false) {}
305 /* Type of the chain. */
306 enum chain_type type;
308 /* For combination chains, the operator and the two chains that are
309 combined, and the type of the result. */
310 enum tree_code op;
311 tree rslt_type;
312 struct chain *ch1, *ch2;
314 /* The references in the chain. */
315 auto_vec<dref> refs;
317 /* The maximum distance of the reference in the chain from the root. */
318 unsigned length;
320 /* The variables used to copy the value throughout iterations. */
321 auto_vec<tree> vars;
323 /* Initializers for the variables. */
324 auto_vec<tree> inits;
326 /* Finalizers for the eliminated stores. */
327 auto_vec<tree> finis;
329 /* gimple stmts intializing the initial variables of the chain. */
330 gimple_seq init_seq;
332 /* gimple stmts finalizing the eliminated stores of the chain. */
333 gimple_seq fini_seq;
335 /* True if there is a use of a variable with the maximal distance
336 that comes after the root in the loop. */
337 unsigned has_max_use_after : 1;
339 /* True if all the memory references in the chain are always accessed. */
340 unsigned all_always_accessed : 1;
342 /* True if this chain was combined together with some other chain. */
343 unsigned combined : 1;
345 /* True if this is store elimination chain and eliminated stores store
346 loop invariant value into memory. */
347 unsigned inv_store_elimination : 1;
348 } *chain_p;
351 /* Describes the knowledge about the step of the memory references in
352 the component. */
354 enum ref_step_type
356 /* The step is zero. */
357 RS_INVARIANT,
359 /* The step is nonzero. */
360 RS_NONZERO,
362 /* The step may or may not be nonzero. */
363 RS_ANY
366 /* Components of the data dependence graph. */
368 struct component
370 component (bool es) : comp_step (RS_ANY), eliminate_store_p (es),
371 next (NULL) {}
373 /* The references in the component. */
374 auto_vec<dref> refs;
376 /* What we know about the step of the references in the component. */
377 enum ref_step_type comp_step;
379 /* True if all references in component are stores and we try to do
380 intra/inter loop iteration dead store elimination. */
381 bool eliminate_store_p;
383 /* Next component in the list. */
384 struct component *next;
387 /* A class to encapsulate the global states used for predictive
388 commoning work on top of one given LOOP. */
390 class pcom_worker
392 public:
393 pcom_worker (loop_p l) : m_loop (l), m_cache (NULL) {}
395 ~pcom_worker ()
397 free_data_refs (m_datarefs);
398 free_dependence_relations (m_dependences);
399 free_affine_expand_cache (&m_cache);
400 release_chains ();
403 pcom_worker (const pcom_worker &) = delete;
404 pcom_worker &operator= (const pcom_worker &) = delete;
406 /* Performs predictive commoning. */
407 unsigned tree_predictive_commoning_loop (bool allow_unroll_p);
409 /* Perform the predictive commoning optimization for chains, make this
410 public for being called in callback execute_pred_commoning_cbck. */
411 void execute_pred_commoning (bitmap tmp_vars);
413 private:
414 /* The pointer to the given loop. */
415 loop_p m_loop;
417 /* All data references. */
418 auto_vec<data_reference_p, 10> m_datarefs;
420 /* All data dependences. */
421 auto_vec<ddr_p, 10> m_dependences;
423 /* All chains. */
424 auto_vec<chain_p> m_chains;
426 /* Bitmap of ssa names defined by looparound phi nodes covered by chains. */
427 auto_bitmap m_looparound_phis;
429 typedef hash_map<tree, name_expansion *> tree_expand_map_t;
430 /* Cache used by tree_to_aff_combination_expand. */
431 tree_expand_map_t *m_cache;
433 /* Splits dependence graph to components. */
434 struct component *split_data_refs_to_components ();
436 /* Check the conditions on references inside each of components COMPS,
437 and remove the unsuitable components from the list. */
438 struct component *filter_suitable_components (struct component *comps);
440 /* Find roots of the values and determine distances in components COMPS,
441 and separates the references to chains. */
442 void determine_roots (struct component *comps);
444 /* Prepare initializers for chains, and free chains that cannot
445 be used because the initializers might trap. */
446 void prepare_initializers ();
448 /* Generates finalizer memory reference for chains. Returns true if
449 finalizer code generation for chains breaks loop closed ssa form. */
450 bool prepare_finalizers ();
452 /* Try to combine the chains. */
453 void try_combine_chains ();
455 /* Frees CHAINS. */
456 void release_chains ();
458 /* Frees a chain CHAIN. */
459 void release_chain (chain_p chain);
461 /* Prepare initializers for CHAIN. Returns false if this is impossible
462 because one of these initializers may trap, true otherwise. */
463 bool prepare_initializers_chain (chain_p chain);
465 /* Generates finalizer memory references for CHAIN. Returns true
466 if finalizer code for CHAIN can be generated, otherwise false. */
467 bool prepare_finalizers_chain (chain_p chain);
469 /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */
470 void aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset);
472 /* Determines number of iterations of the innermost enclosing loop before
473 B refers to exactly the same location as A and stores it to OFF. */
474 bool determine_offset (struct data_reference *a, struct data_reference *b,
475 poly_widest_int *off);
477 /* Returns true if the component COMP satisfies the conditions
478 described in 2) at the beginning of this file. */
479 bool suitable_component_p (struct component *comp);
481 /* Returns true if REF is a valid initializer for ROOT with given
482 DISTANCE (in iterations of the innermost enclosing loop). */
483 bool valid_initializer_p (struct data_reference *ref, unsigned distance,
484 struct data_reference *root);
486 /* Finds looparound phi node of loop that copies the value of REF. */
487 gphi *find_looparound_phi (dref ref, dref root);
489 /* Find roots of the values and determine distances in the component
490 COMP. The references are redistributed into chains. */
491 void determine_roots_comp (struct component *comp);
493 /* For references in CHAIN that are copied around the loop, add the
494 results of such copies to the chain. */
495 void add_looparound_copies (chain_p chain);
497 /* Returns the single statement in that NAME is used, excepting
498 the looparound phi nodes contained in one of the chains. */
499 gimple *single_nonlooparound_use (tree name);
501 /* Remove statement STMT, as well as the chain of assignments in that
502 it is used. */
503 void remove_stmt (gimple *stmt);
505 /* Perform the predictive commoning optimization for a chain CHAIN. */
506 void execute_pred_commoning_chain (chain_p chain, bitmap tmp_vars);
508 /* Returns the modify statement that uses NAME. */
509 gimple *find_use_stmt (tree *name);
511 /* If the operation used in STMT is associative and commutative, go
512 through the tree of the same operations and returns its root. */
513 gimple *find_associative_operation_root (gimple *stmt, unsigned *distance);
515 /* Returns the common statement in that NAME1 and NAME2 have a use. */
516 gimple *find_common_use_stmt (tree *name1, tree *name2);
518 /* Checks whether R1 and R2 are combined together using CODE, with the
519 result in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order
520 R2 CODE R1 if it is true. */
521 bool combinable_refs_p (dref r1, dref r2, enum tree_code *code, bool *swap,
522 tree *rslt_type);
524 /* Reassociates the expression in that NAME1 and NAME2 are used so that
525 they are combined in a single statement, and returns this statement. */
526 gimple *reassociate_to_the_same_stmt (tree name1, tree name2);
528 /* Returns the statement that combines references R1 and R2. */
529 gimple *stmt_combining_refs (dref r1, dref r2);
531 /* Tries to combine chains CH1 and CH2 together. */
532 chain_p combine_chains (chain_p ch1, chain_p ch2);
535 /* Dumps data reference REF to FILE. */
537 extern void dump_dref (FILE *, dref);
538 void
539 dump_dref (FILE *file, dref ref)
541 if (ref->ref)
543 fprintf (file, " ");
544 print_generic_expr (file, DR_REF (ref->ref), TDF_SLIM);
545 fprintf (file, " (id %u%s)\n", ref->pos,
546 DR_IS_READ (ref->ref) ? "" : ", write");
548 fprintf (file, " offset ");
549 print_decs (ref->offset, file);
550 fprintf (file, "\n");
552 fprintf (file, " distance %u\n", ref->distance);
554 else
556 if (gimple_code (ref->stmt) == GIMPLE_PHI)
557 fprintf (file, " looparound ref\n");
558 else
559 fprintf (file, " combination ref\n");
560 fprintf (file, " in statement ");
561 print_gimple_stmt (file, ref->stmt, 0, TDF_SLIM);
562 fprintf (file, "\n");
563 fprintf (file, " distance %u\n", ref->distance);
568 /* Dumps CHAIN to FILE. */
570 extern void dump_chain (FILE *, chain_p);
571 void
572 dump_chain (FILE *file, chain_p chain)
574 dref a;
575 const char *chain_type;
576 unsigned i;
577 tree var;
579 switch (chain->type)
581 case CT_INVARIANT:
582 chain_type = "Load motion";
583 break;
585 case CT_LOAD:
586 chain_type = "Loads-only";
587 break;
589 case CT_STORE_LOAD:
590 chain_type = "Store-loads";
591 break;
593 case CT_STORE_STORE:
594 chain_type = "Store-stores";
595 break;
597 case CT_COMBINATION:
598 chain_type = "Combination";
599 break;
601 default:
602 gcc_unreachable ();
605 fprintf (file, "%s chain %p%s\n", chain_type, (void *) chain,
606 chain->combined ? " (combined)" : "");
607 if (chain->type != CT_INVARIANT)
608 fprintf (file, " max distance %u%s\n", chain->length,
609 chain->has_max_use_after ? "" : ", may reuse first");
611 if (chain->type == CT_COMBINATION)
613 fprintf (file, " equal to %p %s %p in type ",
614 (void *) chain->ch1, op_symbol_code (chain->op),
615 (void *) chain->ch2);
616 print_generic_expr (file, chain->rslt_type, TDF_SLIM);
617 fprintf (file, "\n");
620 if (chain->vars.exists ())
622 fprintf (file, " vars");
623 FOR_EACH_VEC_ELT (chain->vars, i, var)
625 fprintf (file, " ");
626 print_generic_expr (file, var, TDF_SLIM);
628 fprintf (file, "\n");
631 if (chain->inits.exists ())
633 fprintf (file, " inits");
634 FOR_EACH_VEC_ELT (chain->inits, i, var)
636 fprintf (file, " ");
637 print_generic_expr (file, var, TDF_SLIM);
639 fprintf (file, "\n");
642 fprintf (file, " references:\n");
643 FOR_EACH_VEC_ELT (chain->refs, i, a)
644 dump_dref (file, a);
646 fprintf (file, "\n");
649 /* Dumps CHAINS to FILE. */
651 void
652 dump_chains (FILE *file, const vec<chain_p> &chains)
654 chain_p chain;
655 unsigned i;
657 FOR_EACH_VEC_ELT (chains, i, chain)
658 dump_chain (file, chain);
661 /* Dumps COMP to FILE. */
663 extern void dump_component (FILE *, struct component *);
664 void
665 dump_component (FILE *file, struct component *comp)
667 dref a;
668 unsigned i;
670 fprintf (file, "Component%s:\n",
671 comp->comp_step == RS_INVARIANT ? " (invariant)" : "");
672 FOR_EACH_VEC_ELT (comp->refs, i, a)
673 dump_dref (file, a);
674 fprintf (file, "\n");
677 /* Dumps COMPS to FILE. */
679 extern void dump_components (FILE *, struct component *);
680 void
681 dump_components (FILE *file, struct component *comps)
683 struct component *comp;
685 for (comp = comps; comp; comp = comp->next)
686 dump_component (file, comp);
689 /* Frees a chain CHAIN. */
691 void
692 pcom_worker::release_chain (chain_p chain)
694 dref ref;
695 unsigned i;
697 if (chain == NULL)
698 return;
700 FOR_EACH_VEC_ELT (chain->refs, i, ref)
701 free (ref);
703 if (chain->init_seq)
704 gimple_seq_discard (chain->init_seq);
706 if (chain->fini_seq)
707 gimple_seq_discard (chain->fini_seq);
709 delete chain;
712 /* Frees CHAINS. */
714 void
715 pcom_worker::release_chains ()
717 unsigned i;
718 chain_p chain;
720 FOR_EACH_VEC_ELT (m_chains, i, chain)
721 release_chain (chain);
724 /* Frees list of components COMPS. */
726 static void
727 release_components (struct component *comps)
729 struct component *act, *next;
731 for (act = comps; act; act = next)
733 next = act->next;
734 delete act;
738 /* Finds a root of tree given by FATHERS containing A, and performs path
739 shortening. */
741 static unsigned
742 component_of (vec<unsigned> &fathers, unsigned a)
744 unsigned root, n;
746 for (root = a; root != fathers[root]; root = fathers[root])
747 continue;
749 for (; a != root; a = n)
751 n = fathers[a];
752 fathers[a] = root;
755 return root;
758 /* Join operation for DFU. FATHERS gives the tree, SIZES are sizes of the
759 components, A and B are components to merge. */
761 static void
762 merge_comps (vec<unsigned> &fathers, vec<unsigned> &sizes,
763 unsigned a, unsigned b)
765 unsigned ca = component_of (fathers, a);
766 unsigned cb = component_of (fathers, b);
768 if (ca == cb)
769 return;
771 if (sizes[ca] < sizes[cb])
773 sizes[cb] += sizes[ca];
774 fathers[ca] = cb;
776 else
778 sizes[ca] += sizes[cb];
779 fathers[cb] = ca;
783 /* Returns true if A is a reference that is suitable for predictive commoning
784 in the innermost loop that contains it. REF_STEP is set according to the
785 step of the reference A. */
787 static bool
788 suitable_reference_p (struct data_reference *a, enum ref_step_type *ref_step)
790 tree ref = DR_REF (a), step = DR_STEP (a);
792 if (!step
793 || TREE_THIS_VOLATILE (ref)
794 || !is_gimple_reg_type (TREE_TYPE (ref))
795 || tree_could_throw_p (ref))
796 return false;
798 if (integer_zerop (step))
799 *ref_step = RS_INVARIANT;
800 else if (integer_nonzerop (step))
801 *ref_step = RS_NONZERO;
802 else
803 *ref_step = RS_ANY;
805 return true;
808 /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */
810 void
811 pcom_worker::aff_combination_dr_offset (struct data_reference *dr,
812 aff_tree *offset)
814 tree type = TREE_TYPE (DR_OFFSET (dr));
815 aff_tree delta;
817 tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, &m_cache);
818 aff_combination_const (&delta, type, wi::to_poly_widest (DR_INIT (dr)));
819 aff_combination_add (offset, &delta);
822 /* Determines number of iterations of the innermost enclosing loop before B
823 refers to exactly the same location as A and stores it to OFF. If A and
824 B do not have the same step, they never meet, or anything else fails,
825 returns false, otherwise returns true. Both A and B are assumed to
826 satisfy suitable_reference_p. */
828 bool
829 pcom_worker::determine_offset (struct data_reference *a,
830 struct data_reference *b, poly_widest_int *off)
832 aff_tree diff, baseb, step;
833 tree typea, typeb;
835 /* Check that both the references access the location in the same type. */
836 typea = TREE_TYPE (DR_REF (a));
837 typeb = TREE_TYPE (DR_REF (b));
838 if (!useless_type_conversion_p (typeb, typea))
839 return false;
841 /* Check whether the base address and the step of both references is the
842 same. */
843 if (!operand_equal_p (DR_STEP (a), DR_STEP (b), 0)
844 || !operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0))
845 return false;
847 if (integer_zerop (DR_STEP (a)))
849 /* If the references have loop invariant address, check that they access
850 exactly the same location. */
851 *off = 0;
852 return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0)
853 && operand_equal_p (DR_INIT (a), DR_INIT (b), 0));
856 /* Compare the offsets of the addresses, and check whether the difference
857 is a multiple of step. */
858 aff_combination_dr_offset (a, &diff);
859 aff_combination_dr_offset (b, &baseb);
860 aff_combination_scale (&baseb, -1);
861 aff_combination_add (&diff, &baseb);
863 tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)),
864 &step, &m_cache);
865 return aff_combination_constant_multiple_p (&diff, &step, off);
868 /* Returns the last basic block in LOOP for that we are sure that
869 it is executed whenever the loop is entered. */
871 static basic_block
872 last_always_executed_block (class loop *loop)
874 unsigned i;
875 auto_vec<edge> exits = get_loop_exit_edges (loop);
876 edge ex;
877 basic_block last = loop->latch;
879 FOR_EACH_VEC_ELT (exits, i, ex)
880 last = nearest_common_dominator (CDI_DOMINATORS, last, ex->src);
882 return last;
885 /* Splits dependence graph on DATAREFS described by DEPENDENCES to
886 components. */
888 struct component *
889 pcom_worker::split_data_refs_to_components ()
891 unsigned i, n = m_datarefs.length ();
892 unsigned ca, ia, ib, bad;
893 struct data_reference *dr, *dra, *drb;
894 struct data_dependence_relation *ddr;
895 struct component *comp_list = NULL, *comp;
896 dref dataref;
897 /* Don't do store elimination if loop has multiple exit edges. */
898 bool eliminate_store_p = single_exit (m_loop) != NULL;
899 basic_block last_always_executed = last_always_executed_block (m_loop);
900 auto_bitmap no_store_store_comps;
901 auto_vec<unsigned> comp_father (n + 1);
902 auto_vec<unsigned> comp_size (n + 1);
903 comp_father.quick_grow (n + 1);
904 comp_size.quick_grow (n + 1);
906 FOR_EACH_VEC_ELT (m_datarefs, i, dr)
908 if (!DR_REF (dr))
909 /* A fake reference for call or asm_expr that may clobber memory;
910 just fail. */
911 return NULL;
912 /* predcom pass isn't prepared to handle calls with data references. */
913 if (is_gimple_call (DR_STMT (dr)))
914 return NULL;
915 dr->aux = (void *) (size_t) i;
916 comp_father[i] = i;
917 comp_size[i] = 1;
920 /* A component reserved for the "bad" data references. */
921 comp_father[n] = n;
922 comp_size[n] = 1;
924 FOR_EACH_VEC_ELT (m_datarefs, i, dr)
926 enum ref_step_type dummy;
928 if (!suitable_reference_p (dr, &dummy))
930 ia = (unsigned) (size_t) dr->aux;
931 merge_comps (comp_father, comp_size, n, ia);
935 FOR_EACH_VEC_ELT (m_dependences, i, ddr)
937 poly_widest_int dummy_off;
939 if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
940 continue;
942 dra = DDR_A (ddr);
943 drb = DDR_B (ddr);
945 /* Don't do store elimination if there is any unknown dependence for
946 any store data reference. */
947 if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb))
948 && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
949 || DDR_NUM_DIST_VECTS (ddr) == 0))
950 eliminate_store_p = false;
952 ia = component_of (comp_father, (unsigned) (size_t) dra->aux);
953 ib = component_of (comp_father, (unsigned) (size_t) drb->aux);
954 if (ia == ib)
955 continue;
957 bad = component_of (comp_father, n);
959 /* If both A and B are reads, we may ignore unsuitable dependences. */
960 if (DR_IS_READ (dra) && DR_IS_READ (drb))
962 if (ia == bad || ib == bad
963 || !determine_offset (dra, drb, &dummy_off))
964 continue;
966 /* If A is read and B write or vice versa and there is unsuitable
967 dependence, instead of merging both components into a component
968 that will certainly not pass suitable_component_p, just put the
969 read into bad component, perhaps at least the write together with
970 all the other data refs in it's component will be optimizable. */
971 else if (DR_IS_READ (dra) && ib != bad)
973 if (ia == bad)
975 bitmap_set_bit (no_store_store_comps, ib);
976 continue;
978 else if (!determine_offset (dra, drb, &dummy_off))
980 bitmap_set_bit (no_store_store_comps, ib);
981 merge_comps (comp_father, comp_size, bad, ia);
982 continue;
985 else if (DR_IS_READ (drb) && ia != bad)
987 if (ib == bad)
989 bitmap_set_bit (no_store_store_comps, ia);
990 continue;
992 else if (!determine_offset (dra, drb, &dummy_off))
994 bitmap_set_bit (no_store_store_comps, ia);
995 merge_comps (comp_father, comp_size, bad, ib);
996 continue;
999 else if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb)
1000 && ia != bad && ib != bad
1001 && !determine_offset (dra, drb, &dummy_off))
1003 merge_comps (comp_father, comp_size, bad, ia);
1004 merge_comps (comp_father, comp_size, bad, ib);
1005 continue;
1008 merge_comps (comp_father, comp_size, ia, ib);
1011 if (eliminate_store_p)
1013 tree niters = number_of_latch_executions (m_loop);
1015 /* Don't do store elimination if niters info is unknown because stores
1016 in the last iteration can't be eliminated and we need to recover it
1017 after loop. */
1018 eliminate_store_p = (niters != NULL_TREE && niters != chrec_dont_know);
1021 auto_vec<struct component *> comps;
1022 comps.safe_grow_cleared (n, true);
1023 bad = component_of (comp_father, n);
1024 FOR_EACH_VEC_ELT (m_datarefs, i, dr)
1026 ia = (unsigned) (size_t) dr->aux;
1027 ca = component_of (comp_father, ia);
1028 if (ca == bad)
1029 continue;
1031 comp = comps[ca];
1032 if (!comp)
1034 comp = new component (eliminate_store_p);
1035 comp->refs.reserve_exact (comp_size[ca]);
1036 comps[ca] = comp;
1039 dataref = XCNEW (class dref_d);
1040 dataref->ref = dr;
1041 dataref->stmt = DR_STMT (dr);
1042 dataref->offset = 0;
1043 dataref->distance = 0;
1045 dataref->always_accessed
1046 = dominated_by_p (CDI_DOMINATORS, last_always_executed,
1047 gimple_bb (dataref->stmt));
1048 dataref->pos = comp->refs.length ();
1049 comp->refs.quick_push (dataref);
1052 if (eliminate_store_p)
1054 bitmap_iterator bi;
1055 EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi)
1057 ca = component_of (comp_father, ia);
1058 if (ca != bad)
1059 comps[ca]->eliminate_store_p = false;
1063 for (i = 0; i < n; i++)
1065 comp = comps[i];
1066 if (comp)
1068 comp->next = comp_list;
1069 comp_list = comp;
1072 return comp_list;
1075 /* Returns true if the component COMP satisfies the conditions
1076 described in 2) at the beginning of this file. */
1078 bool
1079 pcom_worker::suitable_component_p (struct component *comp)
1081 unsigned i;
1082 dref a, first;
1083 basic_block ba, bp = m_loop->header;
1084 bool ok, has_write = false;
1086 FOR_EACH_VEC_ELT (comp->refs, i, a)
1088 ba = gimple_bb (a->stmt);
1090 if (!just_once_each_iteration_p (m_loop, ba))
1091 return false;
1093 gcc_assert (dominated_by_p (CDI_DOMINATORS, ba, bp));
1094 bp = ba;
1096 if (DR_IS_WRITE (a->ref))
1097 has_write = true;
1100 first = comp->refs[0];
1101 ok = suitable_reference_p (first->ref, &comp->comp_step);
1102 gcc_assert (ok);
1103 first->offset = 0;
1105 for (i = 1; comp->refs.iterate (i, &a); i++)
1107 /* Polynomial offsets are no use, since we need to know the
1108 gap between iteration numbers at compile time. */
1109 poly_widest_int offset;
1110 if (!determine_offset (first->ref, a->ref, &offset)
1111 || !offset.is_constant (&a->offset))
1112 return false;
1114 enum ref_step_type a_step;
1115 gcc_checking_assert (suitable_reference_p (a->ref, &a_step)
1116 && a_step == comp->comp_step);
1119 /* If there is a write inside the component, we must know whether the
1120 step is nonzero or not -- we would not otherwise be able to recognize
1121 whether the value accessed by reads comes from the OFFSET-th iteration
1122 or the previous one. */
1123 if (has_write && comp->comp_step == RS_ANY)
1124 return false;
1126 return true;
1129 /* Check the conditions on references inside each of components COMPS,
1130 and remove the unsuitable components from the list. The new list
1131 of components is returned. The conditions are described in 2) at
1132 the beginning of this file. */
1134 struct component *
1135 pcom_worker::filter_suitable_components (struct component *comps)
1137 struct component **comp, *act;
1139 for (comp = &comps; *comp; )
1141 act = *comp;
1142 if (suitable_component_p (act))
1143 comp = &act->next;
1144 else
1146 dref ref;
1147 unsigned i;
1149 *comp = act->next;
1150 FOR_EACH_VEC_ELT (act->refs, i, ref)
1151 free (ref);
1152 delete act;
1156 return comps;
1159 /* Compares two drefs A and B by their offset and position. Callback for
1160 qsort. */
1162 static int
1163 order_drefs (const void *a, const void *b)
1165 const dref *const da = (const dref *) a;
1166 const dref *const db = (const dref *) b;
1167 int offcmp = wi::cmps ((*da)->offset, (*db)->offset);
1169 if (offcmp != 0)
1170 return offcmp;
1172 return (*da)->pos - (*db)->pos;
1175 /* Compares two drefs A and B by their position. Callback for qsort. */
1177 static int
1178 order_drefs_by_pos (const void *a, const void *b)
1180 const dref *const da = (const dref *) a;
1181 const dref *const db = (const dref *) b;
1183 return (*da)->pos - (*db)->pos;
1186 /* Returns root of the CHAIN. */
1188 static inline dref
1189 get_chain_root (chain_p chain)
1191 return chain->refs[0];
1194 /* Given CHAIN, returns the last write ref at DISTANCE, or NULL if it doesn't
1195 exist. */
1197 static inline dref
1198 get_chain_last_write_at (chain_p chain, unsigned distance)
1200 for (unsigned i = chain->refs.length (); i > 0; i--)
1201 if (DR_IS_WRITE (chain->refs[i - 1]->ref)
1202 && distance == chain->refs[i - 1]->distance)
1203 return chain->refs[i - 1];
1205 return NULL;
1208 /* Given CHAIN, returns the last write ref with the same distance before load
1209 at index LOAD_IDX, or NULL if it doesn't exist. */
1211 static inline dref
1212 get_chain_last_write_before_load (chain_p chain, unsigned load_idx)
1214 gcc_assert (load_idx < chain->refs.length ());
1216 unsigned distance = chain->refs[load_idx]->distance;
1218 for (unsigned i = load_idx; i > 0; i--)
1219 if (DR_IS_WRITE (chain->refs[i - 1]->ref)
1220 && distance == chain->refs[i - 1]->distance)
1221 return chain->refs[i - 1];
1223 return NULL;
1226 /* Adds REF to the chain CHAIN. */
1228 static void
1229 add_ref_to_chain (chain_p chain, dref ref)
1231 dref root = get_chain_root (chain);
1233 gcc_assert (wi::les_p (root->offset, ref->offset));
1234 widest_int dist = ref->offset - root->offset;
1235 gcc_assert (wi::fits_uhwi_p (dist));
1237 chain->refs.safe_push (ref);
1239 ref->distance = dist.to_uhwi ();
1241 if (ref->distance >= chain->length)
1243 chain->length = ref->distance;
1244 chain->has_max_use_after = false;
1247 /* Promote this chain to CT_STORE_STORE if it has multiple stores. */
1248 if (DR_IS_WRITE (ref->ref))
1249 chain->type = CT_STORE_STORE;
1251 /* Don't set the flag for store-store chain since there is no use. */
1252 if (chain->type != CT_STORE_STORE
1253 && ref->distance == chain->length
1254 && ref->pos > root->pos)
1255 chain->has_max_use_after = true;
1257 chain->all_always_accessed &= ref->always_accessed;
1260 /* Returns the chain for invariant component COMP. */
1262 static chain_p
1263 make_invariant_chain (struct component *comp)
1265 chain_p chain = new struct chain (CT_INVARIANT);
1266 unsigned i;
1267 dref ref;
1269 chain->all_always_accessed = true;
1271 FOR_EACH_VEC_ELT (comp->refs, i, ref)
1273 chain->refs.safe_push (ref);
1274 chain->all_always_accessed &= ref->always_accessed;
1277 chain->inits = vNULL;
1278 chain->finis = vNULL;
1280 return chain;
1283 /* Make a new chain of type TYPE rooted at REF. */
1285 static chain_p
1286 make_rooted_chain (dref ref, enum chain_type type)
1288 chain_p chain = new struct chain (type);
1290 chain->refs.safe_push (ref);
1291 chain->all_always_accessed = ref->always_accessed;
1292 ref->distance = 0;
1294 chain->inits = vNULL;
1295 chain->finis = vNULL;
1297 return chain;
1300 /* Returns true if CHAIN is not trivial. */
1302 static bool
1303 nontrivial_chain_p (chain_p chain)
1305 return chain != NULL && chain->refs.length () > 1;
1308 /* Returns the ssa name that contains the value of REF, or NULL_TREE if there
1309 is no such name. */
1311 static tree
1312 name_for_ref (dref ref)
1314 tree name;
1316 if (is_gimple_assign (ref->stmt))
1318 if (!ref->ref || DR_IS_READ (ref->ref))
1319 name = gimple_assign_lhs (ref->stmt);
1320 else
1321 name = gimple_assign_rhs1 (ref->stmt);
1323 else
1324 name = PHI_RESULT (ref->stmt);
1326 return (TREE_CODE (name) == SSA_NAME ? name : NULL_TREE);
1329 /* Returns true if REF is a valid initializer for ROOT with given DISTANCE (in
1330 iterations of the innermost enclosing loop). */
1332 bool
1333 pcom_worker::valid_initializer_p (struct data_reference *ref, unsigned distance,
1334 struct data_reference *root)
1336 aff_tree diff, base, step;
1337 poly_widest_int off;
1339 /* Both REF and ROOT must be accessing the same object. */
1340 if (!operand_equal_p (DR_BASE_ADDRESS (ref), DR_BASE_ADDRESS (root), 0))
1341 return false;
1343 /* The initializer is defined outside of loop, hence its address must be
1344 invariant inside the loop. */
1345 gcc_assert (integer_zerop (DR_STEP (ref)));
1347 /* If the address of the reference is invariant, initializer must access
1348 exactly the same location. */
1349 if (integer_zerop (DR_STEP (root)))
1350 return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0)
1351 && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0));
1353 /* Verify that this index of REF is equal to the root's index at
1354 -DISTANCE-th iteration. */
1355 aff_combination_dr_offset (root, &diff);
1356 aff_combination_dr_offset (ref, &base);
1357 aff_combination_scale (&base, -1);
1358 aff_combination_add (&diff, &base);
1360 tree_to_aff_combination_expand (DR_STEP (root), TREE_TYPE (DR_STEP (root)),
1361 &step, &m_cache);
1362 if (!aff_combination_constant_multiple_p (&diff, &step, &off))
1363 return false;
1365 if (maybe_ne (off, distance))
1366 return false;
1368 return true;
1371 /* Finds looparound phi node of loop that copies the value of REF, and if its
1372 initial value is correct (equal to initial value of REF shifted by one
1373 iteration), returns the phi node. Otherwise, NULL_TREE is returned. ROOT
1374 is the root of the current chain. */
1376 gphi *
1377 pcom_worker::find_looparound_phi (dref ref, dref root)
1379 tree name, init, init_ref;
1380 gimple *init_stmt;
1381 edge latch = loop_latch_edge (m_loop);
1382 struct data_reference init_dr;
1383 gphi_iterator psi;
1385 if (is_gimple_assign (ref->stmt))
1387 if (DR_IS_READ (ref->ref))
1388 name = gimple_assign_lhs (ref->stmt);
1389 else
1390 name = gimple_assign_rhs1 (ref->stmt);
1392 else
1393 name = PHI_RESULT (ref->stmt);
1394 if (!name)
1395 return NULL;
1397 tree entry_vuse = NULL_TREE;
1398 gphi *phi = NULL;
1399 for (psi = gsi_start_phis (m_loop->header); !gsi_end_p (psi); gsi_next (&psi))
1401 gphi *p = psi.phi ();
1402 if (PHI_ARG_DEF_FROM_EDGE (p, latch) == name)
1403 phi = p;
1404 else if (virtual_operand_p (gimple_phi_result (p)))
1405 entry_vuse = PHI_ARG_DEF_FROM_EDGE (p, loop_preheader_edge (m_loop));
1406 if (phi && entry_vuse)
1407 break;
1409 if (!phi || !entry_vuse)
1410 return NULL;
1412 init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (m_loop));
1413 if (TREE_CODE (init) != SSA_NAME)
1414 return NULL;
1415 init_stmt = SSA_NAME_DEF_STMT (init);
1416 if (gimple_code (init_stmt) != GIMPLE_ASSIGN)
1417 return NULL;
1418 gcc_assert (gimple_assign_lhs (init_stmt) == init);
1420 init_ref = gimple_assign_rhs1 (init_stmt);
1421 if (!REFERENCE_CLASS_P (init_ref)
1422 && !DECL_P (init_ref))
1423 return NULL;
1425 /* Analyze the behavior of INIT_REF with respect to LOOP (innermost
1426 loop enclosing PHI). */
1427 memset (&init_dr, 0, sizeof (struct data_reference));
1428 DR_REF (&init_dr) = init_ref;
1429 DR_STMT (&init_dr) = phi;
1430 if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, m_loop,
1431 init_stmt))
1432 return NULL;
1434 if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref))
1435 return NULL;
1437 /* Make sure nothing clobbers the location we re-use the initial value
1438 from. */
1439 if (entry_vuse != gimple_vuse (init_stmt))
1441 ao_ref ref;
1442 ao_ref_init (&ref, init_ref);
1443 unsigned limit = param_sccvn_max_alias_queries_per_access;
1444 tree vdef = entry_vuse;
1447 gimple *def = SSA_NAME_DEF_STMT (vdef);
1448 if (limit-- == 0 || gimple_code (def) == GIMPLE_PHI)
1449 return NULL;
1450 if (stmt_may_clobber_ref_p_1 (def, &ref))
1451 /* When the stmt is an assign to init_ref we could in theory
1452 use its RHS for the initial value of the looparound PHI
1453 we replace in prepare_initializers_chain, but we have
1454 no convenient place to store this info at the moment. */
1455 return NULL;
1456 vdef = gimple_vuse (def);
1458 while (vdef != gimple_vuse (init_stmt));
1461 return phi;
1464 /* Adds a reference for the looparound copy of REF in PHI to CHAIN. */
1466 static void
1467 insert_looparound_copy (chain_p chain, dref ref, gphi *phi)
1469 dref nw = XCNEW (class dref_d), aref;
1470 unsigned i;
1472 nw->stmt = phi;
1473 nw->distance = ref->distance + 1;
1474 nw->always_accessed = 1;
1476 FOR_EACH_VEC_ELT (chain->refs, i, aref)
1477 if (aref->distance >= nw->distance)
1478 break;
1479 chain->refs.safe_insert (i, nw);
1481 if (nw->distance > chain->length)
1483 chain->length = nw->distance;
1484 chain->has_max_use_after = false;
1488 /* For references in CHAIN that are copied around the loop (created previously
1489 by PRE, or by user), add the results of such copies to the chain. This
1490 enables us to remove the copies by unrolling, and may need less registers
1491 (also, it may allow us to combine chains together). */
1493 void
1494 pcom_worker::add_looparound_copies (chain_p chain)
1496 unsigned i;
1497 dref ref, root = get_chain_root (chain);
1498 gphi *phi;
1500 if (chain->type == CT_STORE_STORE)
1501 return;
1503 FOR_EACH_VEC_ELT (chain->refs, i, ref)
1505 phi = find_looparound_phi (ref, root);
1506 if (!phi)
1507 continue;
1509 bitmap_set_bit (m_looparound_phis, SSA_NAME_VERSION (PHI_RESULT (phi)));
1510 insert_looparound_copy (chain, ref, phi);
1514 /* Find roots of the values and determine distances in the component COMP.
1515 The references are redistributed into chains. */
1517 void
1518 pcom_worker::determine_roots_comp (struct component *comp)
1520 unsigned i;
1521 dref a;
1522 chain_p chain = NULL;
1523 widest_int last_ofs = 0;
1524 enum chain_type type;
1526 /* Invariants are handled specially. */
1527 if (comp->comp_step == RS_INVARIANT)
1529 chain = make_invariant_chain (comp);
1530 m_chains.safe_push (chain);
1531 return;
1534 /* Trivial component. */
1535 if (comp->refs.length () <= 1)
1537 if (comp->refs.length () == 1)
1539 free (comp->refs[0]);
1540 comp->refs.truncate (0);
1542 return;
1545 comp->refs.qsort (order_drefs);
1547 /* For Store-Store chain, we only support load if it is dominated by a
1548 store statement in the same iteration of loop. */
1549 if (comp->eliminate_store_p)
1550 for (a = NULL, i = 0; i < comp->refs.length (); i++)
1552 if (DR_IS_WRITE (comp->refs[i]->ref))
1553 a = comp->refs[i];
1554 else if (a == NULL || a->offset != comp->refs[i]->offset)
1556 /* If there is load that is not dominated by a store in the
1557 same iteration of loop, clear the flag so no Store-Store
1558 chain is generated for this component. */
1559 comp->eliminate_store_p = false;
1560 break;
1564 /* Determine roots and create chains for components. */
1565 FOR_EACH_VEC_ELT (comp->refs, i, a)
1567 if (!chain
1568 || (chain->type == CT_LOAD && DR_IS_WRITE (a->ref))
1569 || (!comp->eliminate_store_p && DR_IS_WRITE (a->ref))
1570 || wi::leu_p (MAX_DISTANCE, a->offset - last_ofs))
1572 if (nontrivial_chain_p (chain))
1574 add_looparound_copies (chain);
1575 m_chains.safe_push (chain);
1577 else
1578 release_chain (chain);
1580 /* Determine type of the chain. If the root reference is a load,
1581 this can only be a CT_LOAD chain; other chains are intialized
1582 to CT_STORE_LOAD and might be promoted to CT_STORE_STORE when
1583 new reference is added. */
1584 type = DR_IS_READ (a->ref) ? CT_LOAD : CT_STORE_LOAD;
1585 chain = make_rooted_chain (a, type);
1586 last_ofs = a->offset;
1587 continue;
1590 add_ref_to_chain (chain, a);
1593 if (nontrivial_chain_p (chain))
1595 add_looparound_copies (chain);
1596 m_chains.safe_push (chain);
1598 else
1599 release_chain (chain);
1602 /* Find roots of the values and determine distances in components COMPS, and
1603 separates the references to chains. */
1605 void
1606 pcom_worker::determine_roots (struct component *comps)
1608 struct component *comp;
1610 for (comp = comps; comp; comp = comp->next)
1611 determine_roots_comp (comp);
1614 /* Replace the reference in statement STMT with temporary variable
1615 NEW_TREE. If SET is true, NEW_TREE is instead initialized to the value of
1616 the reference in the statement. IN_LHS is true if the reference
1617 is in the lhs of STMT, false if it is in rhs. */
1619 static void
1620 replace_ref_with (gimple *stmt, tree new_tree, bool set, bool in_lhs)
1622 tree val;
1623 gassign *new_stmt;
1624 gimple_stmt_iterator bsi, psi;
1626 if (gimple_code (stmt) == GIMPLE_PHI)
1628 gcc_assert (!in_lhs && !set);
1630 val = PHI_RESULT (stmt);
1631 bsi = gsi_after_labels (gimple_bb (stmt));
1632 psi = gsi_for_stmt (stmt);
1633 remove_phi_node (&psi, false);
1635 /* Turn the phi node into GIMPLE_ASSIGN. */
1636 new_stmt = gimple_build_assign (val, new_tree);
1637 gsi_insert_before (&bsi, new_stmt, GSI_NEW_STMT);
1638 return;
1641 /* Since the reference is of gimple_reg type, it should only
1642 appear as lhs or rhs of modify statement. */
1643 gcc_assert (is_gimple_assign (stmt));
1645 bsi = gsi_for_stmt (stmt);
1647 /* If we do not need to initialize NEW_TREE, just replace the use of OLD. */
1648 if (!set)
1650 gcc_assert (!in_lhs);
1651 gimple_assign_set_rhs_from_tree (&bsi, new_tree);
1652 stmt = gsi_stmt (bsi);
1653 update_stmt (stmt);
1654 return;
1657 if (in_lhs)
1659 /* We have statement
1661 OLD = VAL
1663 If OLD is a memory reference, then VAL is gimple_val, and we transform
1664 this to
1666 OLD = VAL
1667 NEW = VAL
1669 Otherwise, we are replacing a combination chain,
1670 VAL is the expression that performs the combination, and OLD is an
1671 SSA name. In this case, we transform the assignment to
1673 OLD = VAL
1674 NEW = OLD
1678 val = gimple_assign_lhs (stmt);
1679 if (TREE_CODE (val) != SSA_NAME)
1681 val = gimple_assign_rhs1 (stmt);
1682 gcc_assert (gimple_assign_single_p (stmt));
1683 if (TREE_CLOBBER_P (val))
1684 val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (new_tree));
1685 else
1686 gcc_assert (gimple_assign_copy_p (stmt));
1689 else
1691 /* VAL = OLD
1693 is transformed to
1695 VAL = OLD
1696 NEW = VAL */
1698 val = gimple_assign_lhs (stmt);
1701 new_stmt = gimple_build_assign (new_tree, unshare_expr (val));
1702 gsi_insert_after (&bsi, new_stmt, GSI_NEW_STMT);
1705 /* Returns a memory reference to DR in the (NITERS + ITER)-th iteration
1706 of the loop it was analyzed in. Append init stmts to STMTS. */
1708 static tree
1709 ref_at_iteration (data_reference_p dr, int iter,
1710 gimple_seq *stmts, tree niters = NULL_TREE)
1712 tree off = DR_OFFSET (dr);
1713 tree coff = DR_INIT (dr);
1714 tree ref = DR_REF (dr);
1715 enum tree_code ref_code = ERROR_MARK;
1716 tree ref_type = NULL_TREE;
1717 tree ref_op1 = NULL_TREE;
1718 tree ref_op2 = NULL_TREE;
1719 tree new_offset;
1721 if (iter != 0)
1723 new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));
1724 if (TREE_CODE (new_offset) == INTEGER_CST)
1725 coff = size_binop (PLUS_EXPR, coff, new_offset);
1726 else
1727 off = size_binop (PLUS_EXPR, off, new_offset);
1730 if (niters != NULL_TREE)
1732 niters = fold_convert (ssizetype, niters);
1733 new_offset = size_binop (MULT_EXPR, DR_STEP (dr), niters);
1734 if (TREE_CODE (niters) == INTEGER_CST)
1735 coff = size_binop (PLUS_EXPR, coff, new_offset);
1736 else
1737 off = size_binop (PLUS_EXPR, off, new_offset);
1740 /* While data-ref analysis punts on bit offsets it still handles
1741 bitfield accesses at byte boundaries. Cope with that. Note that
1742 if the bitfield object also starts at a byte-boundary we can simply
1743 replicate the COMPONENT_REF, but we have to subtract the component's
1744 byte-offset from the MEM_REF address first.
1745 Otherwise we simply build a BIT_FIELD_REF knowing that the bits
1746 start at offset zero. */
1747 if (TREE_CODE (ref) == COMPONENT_REF
1748 && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
1750 unsigned HOST_WIDE_INT boff;
1751 tree field = TREE_OPERAND (ref, 1);
1752 tree offset = component_ref_field_offset (ref);
1753 ref_type = TREE_TYPE (ref);
1754 boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
1755 /* This can occur in Ada. See the comment in get_bit_range. */
1756 if (boff % BITS_PER_UNIT != 0
1757 || !tree_fits_uhwi_p (offset))
1759 ref_code = BIT_FIELD_REF;
1760 ref_op1 = DECL_SIZE (field);
1761 ref_op2 = bitsize_zero_node;
1763 else
1765 boff >>= LOG2_BITS_PER_UNIT;
1766 boff += tree_to_uhwi (offset);
1767 coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
1768 ref_code = COMPONENT_REF;
1769 ref_op1 = field;
1770 ref_op2 = TREE_OPERAND (ref, 2);
1771 ref = TREE_OPERAND (ref, 0);
1774 tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
1775 addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
1776 is_gimple_mem_ref_addr, NULL_TREE);
1777 tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
1778 tree type = build_aligned_type (TREE_TYPE (ref),
1779 get_object_alignment (ref));
1780 ref = build2 (MEM_REF, type, addr, alias_ptr);
1781 if (ref_type)
1782 ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
1783 return ref;
1786 /* Get the initialization expression for the INDEX-th temporary variable
1787 of CHAIN. */
1789 static tree
1790 get_init_expr (chain_p chain, unsigned index)
1792 if (chain->type == CT_COMBINATION)
1794 tree e1 = get_init_expr (chain->ch1, index);
1795 tree e2 = get_init_expr (chain->ch2, index);
1797 return fold_build2 (chain->op, chain->rslt_type, e1, e2);
1799 else
1800 return chain->inits[index];
1803 /* Returns a new temporary variable used for the I-th variable carrying
1804 value of REF. The variable's uid is marked in TMP_VARS. */
1806 static tree
1807 predcom_tmp_var (tree ref, unsigned i, bitmap tmp_vars)
1809 tree type = TREE_TYPE (ref);
1810 /* We never access the components of the temporary variable in predictive
1811 commoning. */
1812 tree var = create_tmp_reg (type, get_lsm_tmp_name (ref, i));
1813 bitmap_set_bit (tmp_vars, DECL_UID (var));
1814 return var;
1817 /* Creates the variables for CHAIN, as well as phi nodes for them and
1818 initialization on entry to LOOP. Uids of the newly created
1819 temporary variables are marked in TMP_VARS. */
1821 static void
1822 initialize_root_vars (class loop *loop, chain_p chain, bitmap tmp_vars)
1824 unsigned i;
1825 unsigned n = chain->length;
1826 dref root = get_chain_root (chain);
1827 bool reuse_first = !chain->has_max_use_after;
1828 tree ref, init, var, next;
1829 gphi *phi;
1830 gimple_seq stmts;
1831 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);
1833 /* If N == 0, then all the references are within the single iteration. And
1834 since this is an nonempty chain, reuse_first cannot be true. */
1835 gcc_assert (n > 0 || !reuse_first);
1837 chain->vars.create (n + 1);
1839 if (chain->type == CT_COMBINATION)
1840 ref = gimple_assign_lhs (root->stmt);
1841 else
1842 ref = DR_REF (root->ref);
1844 for (i = 0; i < n + (reuse_first ? 0 : 1); i++)
1846 var = predcom_tmp_var (ref, i, tmp_vars);
1847 chain->vars.quick_push (var);
1849 if (reuse_first)
1850 chain->vars.quick_push (chain->vars[0]);
1852 FOR_EACH_VEC_ELT (chain->vars, i, var)
1853 chain->vars[i] = make_ssa_name (var);
1855 for (i = 0; i < n; i++)
1857 var = chain->vars[i];
1858 next = chain->vars[i + 1];
1859 init = get_init_expr (chain, i);
1861 init = force_gimple_operand (init, &stmts, true, NULL_TREE);
1862 if (stmts)
1863 gsi_insert_seq_on_edge_immediate (entry, stmts);
1865 phi = create_phi_node (var, loop->header);
1866 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
1867 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
1871 /* For inter-iteration store elimination CHAIN in LOOP, returns true if
1872 all stores to be eliminated store loop invariant values into memory.
1873 In this case, we can use these invariant values directly after LOOP. */
1875 static bool
1876 is_inv_store_elimination_chain (class loop *loop, chain_p chain)
1878 if (chain->length == 0 || chain->type != CT_STORE_STORE)
1879 return false;
1881 gcc_assert (!chain->has_max_use_after);
1883 /* If loop iterates for unknown times or fewer times than chain->length,
1884 we still need to setup root variable and propagate it with PHI node. */
1885 tree niters = number_of_latch_executions (loop);
1886 if (TREE_CODE (niters) != INTEGER_CST
1887 || wi::leu_p (wi::to_wide (niters), chain->length))
1888 return false;
1890 /* Check stores in chain for elimination if they only store loop invariant
1891 values. */
1892 for (unsigned i = 0; i < chain->length; i++)
1894 dref a = get_chain_last_write_at (chain, i);
1895 if (a == NULL)
1896 continue;
1898 gimple *def_stmt, *stmt = a->stmt;
1899 if (!gimple_assign_single_p (stmt))
1900 return false;
1902 tree val = gimple_assign_rhs1 (stmt);
1903 if (TREE_CLOBBER_P (val))
1904 return false;
1906 if (CONSTANT_CLASS_P (val))
1907 continue;
1909 if (TREE_CODE (val) != SSA_NAME)
1910 return false;
1912 def_stmt = SSA_NAME_DEF_STMT (val);
1913 if (gimple_nop_p (def_stmt))
1914 continue;
1916 if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
1917 return false;
1919 return true;
1922 /* Creates root variables for store elimination CHAIN in which stores for
1923 elimination only store loop invariant values. In this case, we neither
1924 need to load root variables before loop nor propagate it with PHI nodes. */
1926 static void
1927 initialize_root_vars_store_elim_1 (chain_p chain)
1929 tree var;
1930 unsigned i, n = chain->length;
1932 chain->vars.create (n);
1933 chain->vars.safe_grow_cleared (n, true);
1935 /* Initialize root value for eliminated stores at each distance. */
1936 for (i = 0; i < n; i++)
1938 dref a = get_chain_last_write_at (chain, i);
1939 if (a == NULL)
1940 continue;
1942 var = gimple_assign_rhs1 (a->stmt);
1943 chain->vars[a->distance] = var;
1946 /* We don't propagate values with PHI nodes, so manually propagate value
1947 to bubble positions. */
1948 var = chain->vars[0];
1949 for (i = 1; i < n; i++)
1951 if (chain->vars[i] != NULL_TREE)
1953 var = chain->vars[i];
1954 continue;
1956 chain->vars[i] = var;
1959 /* Revert the vector. */
1960 for (i = 0; i < n / 2; i++)
1961 std::swap (chain->vars[i], chain->vars[n - i - 1]);
1964 /* Creates root variables for store elimination CHAIN in which stores for
1965 elimination store loop variant values. In this case, we may need to
1966 load root variables before LOOP and propagate it with PHI nodes. Uids
1967 of the newly created root variables are marked in TMP_VARS. */
1969 static void
1970 initialize_root_vars_store_elim_2 (class loop *loop,
1971 chain_p chain, bitmap tmp_vars)
1973 unsigned i, n = chain->length;
1974 tree ref, init, var, next, val, phi_result;
1975 gimple *stmt;
1976 gimple_seq stmts;
1978 chain->vars.create (n);
1980 ref = DR_REF (get_chain_root (chain)->ref);
1981 for (i = 0; i < n; i++)
1983 var = predcom_tmp_var (ref, i, tmp_vars);
1984 chain->vars.quick_push (var);
1987 FOR_EACH_VEC_ELT (chain->vars, i, var)
1988 chain->vars[i] = make_ssa_name (var);
1990 /* Root values are either rhs operand of stores to be eliminated, or
1991 loaded from memory before loop. */
1992 auto_vec<tree> vtemps;
1993 vtemps.safe_grow_cleared (n, true);
1994 for (i = 0; i < n; i++)
1996 init = get_init_expr (chain, i);
1997 if (init == NULL_TREE)
1999 /* Root value is rhs operand of the store to be eliminated if
2000 it isn't loaded from memory before loop. */
2001 dref a = get_chain_last_write_at (chain, i);
2002 val = gimple_assign_rhs1 (a->stmt);
2003 if (TREE_CLOBBER_P (val))
2005 val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (var));
2006 gimple_assign_set_rhs1 (a->stmt, val);
2009 vtemps[n - i - 1] = val;
2011 else
2013 edge latch = loop_latch_edge (loop);
2014 edge entry = loop_preheader_edge (loop);
2016 /* Root value is loaded from memory before loop, we also need
2017 to add PHI nodes to propagate the value across iterations. */
2018 init = force_gimple_operand (init, &stmts, true, NULL_TREE);
2019 if (stmts)
2020 gsi_insert_seq_on_edge_immediate (entry, stmts);
2022 next = chain->vars[n - i];
2023 phi_result = copy_ssa_name (next);
2024 gphi *phi = create_phi_node (phi_result, loop->header);
2025 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
2026 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
2027 vtemps[n - i - 1] = phi_result;
2031 /* Find the insertion position. */
2032 dref last = get_chain_root (chain);
2033 for (i = 0; i < chain->refs.length (); i++)
2035 if (chain->refs[i]->pos > last->pos)
2036 last = chain->refs[i];
2039 gimple_stmt_iterator gsi = gsi_for_stmt (last->stmt);
2041 /* Insert statements copying root value to root variable. */
2042 for (i = 0; i < n; i++)
2044 var = chain->vars[i];
2045 val = vtemps[i];
2046 stmt = gimple_build_assign (var, val);
2047 gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
2051 /* Generates stores for CHAIN's eliminated stores in LOOP's last
2052 (CHAIN->length - 1) iterations. */
2054 static void
2055 finalize_eliminated_stores (class loop *loop, chain_p chain)
2057 unsigned i, n = chain->length;
2059 for (i = 0; i < n; i++)
2061 tree var = chain->vars[i];
2062 tree fini = chain->finis[n - i - 1];
2063 gimple *stmt = gimple_build_assign (fini, var);
2065 gimple_seq_add_stmt_without_update (&chain->fini_seq, stmt);
2068 if (chain->fini_seq)
2070 gsi_insert_seq_on_edge_immediate (single_exit (loop), chain->fini_seq);
2071 chain->fini_seq = NULL;
2075 /* Initializes a variable for load motion for ROOT and prepares phi nodes and
2076 initialization on entry to LOOP if necessary. The ssa name for the variable
2077 is stored in VARS. If WRITTEN is true, also a phi node to copy its value
2078 around the loop is created. Uid of the newly created temporary variable
2079 is marked in TMP_VARS. INITS is the list containing the (single)
2080 initializer. */
2082 static void
2083 initialize_root_vars_lm (class loop *loop, dref root, bool written,
2084 vec<tree> *vars, const vec<tree> &inits,
2085 bitmap tmp_vars)
2087 unsigned i;
2088 tree ref = DR_REF (root->ref), init, var, next;
2089 gimple_seq stmts;
2090 gphi *phi;
2091 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);
2093 /* Find the initializer for the variable, and check that it cannot
2094 trap. */
2095 init = inits[0];
2097 vars->create (written ? 2 : 1);
2098 var = predcom_tmp_var (ref, 0, tmp_vars);
2099 vars->quick_push (var);
2100 if (written)
2101 vars->quick_push ((*vars)[0]);
2103 FOR_EACH_VEC_ELT (*vars, i, var)
2104 (*vars)[i] = make_ssa_name (var);
2106 var = (*vars)[0];
2108 init = force_gimple_operand (init, &stmts, written, NULL_TREE);
2109 if (stmts)
2110 gsi_insert_seq_on_edge_immediate (entry, stmts);
2112 if (written)
2114 next = (*vars)[1];
2115 phi = create_phi_node (var, loop->header);
2116 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
2117 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
2119 else
2121 gassign *init_stmt = gimple_build_assign (var, init);
2122 gsi_insert_on_edge_immediate (entry, init_stmt);
2127 /* Execute load motion for references in chain CHAIN. Uids of the newly
2128 created temporary variables are marked in TMP_VARS. */
2130 static void
2131 execute_load_motion (class loop *loop, chain_p chain, bitmap tmp_vars)
2133 auto_vec<tree> vars;
2134 dref a;
2135 unsigned n_writes = 0, ridx, i;
2136 tree var;
2138 gcc_assert (chain->type == CT_INVARIANT);
2139 gcc_assert (!chain->combined);
2140 FOR_EACH_VEC_ELT (chain->refs, i, a)
2141 if (DR_IS_WRITE (a->ref))
2142 n_writes++;
2144 /* If there are no reads in the loop, there is nothing to do. */
2145 if (n_writes == chain->refs.length ())
2146 return;
2148 initialize_root_vars_lm (loop, get_chain_root (chain), n_writes > 0,
2149 &vars, chain->inits, tmp_vars);
2151 ridx = 0;
2152 FOR_EACH_VEC_ELT (chain->refs, i, a)
2154 bool is_read = DR_IS_READ (a->ref);
2156 if (DR_IS_WRITE (a->ref))
2158 n_writes--;
2159 if (n_writes)
2161 var = vars[0];
2162 var = make_ssa_name (SSA_NAME_VAR (var));
2163 vars[0] = var;
2165 else
2166 ridx = 1;
2169 replace_ref_with (a->stmt, vars[ridx],
2170 !is_read, !is_read);
2174 /* Returns the single statement in that NAME is used, excepting
2175 the looparound phi nodes contained in one of the chains. If there is no
2176 such statement, or more statements, NULL is returned. */
2178 gimple *
2179 pcom_worker::single_nonlooparound_use (tree name)
2181 use_operand_p use;
2182 imm_use_iterator it;
2183 gimple *stmt, *ret = NULL;
2185 FOR_EACH_IMM_USE_FAST (use, it, name)
2187 stmt = USE_STMT (use);
2189 if (gimple_code (stmt) == GIMPLE_PHI)
2191 /* Ignore uses in looparound phi nodes. Uses in other phi nodes
2192 could not be processed anyway, so just fail for them. */
2193 if (bitmap_bit_p (m_looparound_phis,
2194 SSA_NAME_VERSION (PHI_RESULT (stmt))))
2195 continue;
2197 return NULL;
2199 else if (is_gimple_debug (stmt))
2200 continue;
2201 else if (ret != NULL)
2202 return NULL;
2203 else
2204 ret = stmt;
2207 return ret;
2210 /* Remove statement STMT, as well as the chain of assignments in that it is
2211 used. */
2213 void
2214 pcom_worker::remove_stmt (gimple *stmt)
2216 tree name;
2217 gimple *next;
2218 gimple_stmt_iterator psi;
2220 if (gimple_code (stmt) == GIMPLE_PHI)
2222 name = PHI_RESULT (stmt);
2223 next = single_nonlooparound_use (name);
2224 reset_debug_uses (stmt);
2225 psi = gsi_for_stmt (stmt);
2226 remove_phi_node (&psi, true);
2228 if (!next
2229 || !gimple_assign_ssa_name_copy_p (next)
2230 || gimple_assign_rhs1 (next) != name)
2231 return;
2233 stmt = next;
2236 while (1)
2238 gimple_stmt_iterator bsi;
2240 bsi = gsi_for_stmt (stmt);
2242 name = gimple_assign_lhs (stmt);
2243 if (TREE_CODE (name) == SSA_NAME)
2245 next = single_nonlooparound_use (name);
2246 reset_debug_uses (stmt);
2248 else
2250 /* This is a store to be eliminated. */
2251 gcc_assert (gimple_vdef (stmt) != NULL);
2252 next = NULL;
2255 unlink_stmt_vdef (stmt);
2256 gsi_remove (&bsi, true);
2257 release_defs (stmt);
2259 if (!next
2260 || !gimple_assign_ssa_name_copy_p (next)
2261 || gimple_assign_rhs1 (next) != name)
2262 return;
2264 stmt = next;
2268 /* Perform the predictive commoning optimization for a chain CHAIN.
2269 Uids of the newly created temporary variables are marked in TMP_VARS.*/
2271 void
2272 pcom_worker::execute_pred_commoning_chain (chain_p chain,
2273 bitmap tmp_vars)
2275 unsigned i;
2276 dref a;
2277 tree var;
2278 bool in_lhs;
2280 if (chain->combined)
2282 /* For combined chains, just remove the statements that are used to
2283 compute the values of the expression (except for the root one).
2284 We delay this until after all chains are processed. */
2286 else if (chain->type == CT_STORE_STORE)
2288 if (chain->length > 0)
2290 if (chain->inv_store_elimination)
2292 /* If dead stores in this chain only store loop invariant
2293 values, we can simply record the invariant value and use
2294 it directly after loop. */
2295 initialize_root_vars_store_elim_1 (chain);
2297 else
2299 /* If dead stores in this chain store loop variant values,
2300 we need to set up the variables by loading from memory
2301 before loop and propagating it with PHI nodes. */
2302 initialize_root_vars_store_elim_2 (m_loop, chain, tmp_vars);
2305 /* For inter-iteration store elimination chain, stores at each
2306 distance in loop's last (chain->length - 1) iterations can't
2307 be eliminated, because there is no following killing store.
2308 We need to generate these stores after loop. */
2309 finalize_eliminated_stores (m_loop, chain);
2312 bool last_store_p = true;
2313 for (i = chain->refs.length (); i > 0; i--)
2315 a = chain->refs[i - 1];
2316 /* Preserve the last store of the chain. Eliminate other stores
2317 which are killed by the last one. */
2318 if (DR_IS_WRITE (a->ref))
2320 if (last_store_p)
2321 last_store_p = false;
2322 else
2323 remove_stmt (a->stmt);
2325 continue;
2328 /* Any load in Store-Store chain must be dominated by a previous
2329 store, we replace the load reference with rhs of the store. */
2330 dref b = get_chain_last_write_before_load (chain, i - 1);
2331 gcc_assert (b != NULL);
2332 var = gimple_assign_rhs1 (b->stmt);
2333 replace_ref_with (a->stmt, var, false, false);
2336 else
2338 /* For non-combined chains, set up the variables that hold its value. */
2339 initialize_root_vars (m_loop, chain, tmp_vars);
2340 a = get_chain_root (chain);
2341 in_lhs = (chain->type == CT_STORE_LOAD
2342 || chain->type == CT_COMBINATION);
2343 replace_ref_with (a->stmt, chain->vars[chain->length], true, in_lhs);
2345 /* Replace the uses of the original references by these variables. */
2346 for (i = 1; chain->refs.iterate (i, &a); i++)
2348 var = chain->vars[chain->length - a->distance];
2349 replace_ref_with (a->stmt, var, false, false);
2354 /* Determines the unroll factor necessary to remove as many temporary variable
2355 copies as possible. CHAINS is the list of chains that will be
2356 optimized. */
2358 static unsigned
2359 determine_unroll_factor (const vec<chain_p> &chains)
2361 chain_p chain;
2362 unsigned factor = 1, af, nfactor, i;
2363 unsigned max = param_max_unroll_times;
2365 FOR_EACH_VEC_ELT (chains, i, chain)
2367 if (chain->type == CT_INVARIANT)
2368 continue;
2369 /* For now we can't handle unrolling when eliminating stores. */
2370 else if (chain->type == CT_STORE_STORE)
2371 return 1;
2373 if (chain->combined)
2375 /* For combined chains, we can't handle unrolling if we replace
2376 looparound PHIs. */
2377 dref a;
2378 unsigned j;
2379 for (j = 1; chain->refs.iterate (j, &a); j++)
2380 if (gimple_code (a->stmt) == GIMPLE_PHI)
2381 return 1;
2382 continue;
2385 /* The best unroll factor for this chain is equal to the number of
2386 temporary variables that we create for it. */
2387 af = chain->length;
2388 if (chain->has_max_use_after)
2389 af++;
2391 nfactor = factor * af / gcd (factor, af);
2392 if (nfactor <= max)
2393 factor = nfactor;
2396 return factor;
2399 /* Perform the predictive commoning optimization for chains.
2400 Uids of the newly created temporary variables are marked in TMP_VARS. */
2402 void
2403 pcom_worker::execute_pred_commoning (bitmap tmp_vars)
2405 chain_p chain;
2406 unsigned i;
2408 FOR_EACH_VEC_ELT (m_chains, i, chain)
2410 if (chain->type == CT_INVARIANT)
2411 execute_load_motion (m_loop, chain, tmp_vars);
2412 else
2413 execute_pred_commoning_chain (chain, tmp_vars);
2416 FOR_EACH_VEC_ELT (m_chains, i, chain)
2418 if (chain->type == CT_INVARIANT)
2420 else if (chain->combined)
2422 /* For combined chains, just remove the statements that are used to
2423 compute the values of the expression (except for the root one). */
2424 dref a;
2425 unsigned j;
2426 for (j = 1; chain->refs.iterate (j, &a); j++)
2427 remove_stmt (a->stmt);
2432 /* For each reference in CHAINS, if its defining statement is
2433 phi node, record the ssa name that is defined by it. */
2435 static void
2436 replace_phis_by_defined_names (vec<chain_p> &chains)
2438 chain_p chain;
2439 dref a;
2440 unsigned i, j;
2442 FOR_EACH_VEC_ELT (chains, i, chain)
2443 FOR_EACH_VEC_ELT (chain->refs, j, a)
2445 if (gimple_code (a->stmt) == GIMPLE_PHI)
2447 a->name_defined_by_phi = PHI_RESULT (a->stmt);
2448 a->stmt = NULL;
2453 /* For each reference in CHAINS, if name_defined_by_phi is not
2454 NULL, use it to set the stmt field. */
2456 static void
2457 replace_names_by_phis (vec<chain_p> chains)
2459 chain_p chain;
2460 dref a;
2461 unsigned i, j;
2463 FOR_EACH_VEC_ELT (chains, i, chain)
2464 FOR_EACH_VEC_ELT (chain->refs, j, a)
2465 if (a->stmt == NULL)
2467 a->stmt = SSA_NAME_DEF_STMT (a->name_defined_by_phi);
2468 gcc_assert (gimple_code (a->stmt) == GIMPLE_PHI);
2469 a->name_defined_by_phi = NULL_TREE;
2473 /* Wrapper over execute_pred_commoning, to pass it as a callback
2474 to tree_transform_and_unroll_loop. */
2476 struct epcc_data
2478 vec<chain_p> chains;
2479 bitmap tmp_vars;
2480 pcom_worker *worker;
2483 static void
2484 execute_pred_commoning_cbck (class loop *loop ATTRIBUTE_UNUSED, void *data)
2486 struct epcc_data *const dta = (struct epcc_data *) data;
2487 pcom_worker *worker = dta->worker;
2489 /* Restore phi nodes that were replaced by ssa names before
2490 tree_transform_and_unroll_loop (see detailed description in
2491 tree_predictive_commoning_loop). */
2492 replace_names_by_phis (dta->chains);
2493 worker->execute_pred_commoning (dta->tmp_vars);
2496 /* Base NAME and all the names in the chain of phi nodes that use it
2497 on variable VAR. The phi nodes are recognized by being in the copies of
2498 the header of the LOOP. */
2500 static void
2501 base_names_in_chain_on (class loop *loop, tree name, tree var)
2503 gimple *stmt, *phi;
2504 imm_use_iterator iter;
2506 replace_ssa_name_symbol (name, var);
2508 while (1)
2510 phi = NULL;
2511 FOR_EACH_IMM_USE_STMT (stmt, iter, name)
2513 if (gimple_code (stmt) == GIMPLE_PHI
2514 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
2516 phi = stmt;
2517 break;
2520 if (!phi)
2521 return;
2523 name = PHI_RESULT (phi);
2524 replace_ssa_name_symbol (name, var);
2528 /* Given an unrolled LOOP after predictive commoning, remove the
2529 register copies arising from phi nodes by changing the base
2530 variables of SSA names. TMP_VARS is the set of the temporary variables
2531 for those we want to perform this. */
2533 static void
2534 eliminate_temp_copies (class loop *loop, bitmap tmp_vars)
2536 edge e;
2537 gphi *phi;
2538 gimple *stmt;
2539 tree name, use, var;
2540 gphi_iterator psi;
2542 e = loop_latch_edge (loop);
2543 for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
2545 phi = psi.phi ();
2546 name = PHI_RESULT (phi);
2547 var = SSA_NAME_VAR (name);
2548 if (!var || !bitmap_bit_p (tmp_vars, DECL_UID (var)))
2549 continue;
2550 use = PHI_ARG_DEF_FROM_EDGE (phi, e);
2551 gcc_assert (TREE_CODE (use) == SSA_NAME);
2553 /* Base all the ssa names in the ud and du chain of NAME on VAR. */
2554 stmt = SSA_NAME_DEF_STMT (use);
2555 while (gimple_code (stmt) == GIMPLE_PHI
2556 /* In case we could not unroll the loop enough to eliminate
2557 all copies, we may reach the loop header before the defining
2558 statement (in that case, some register copies will be present
2559 in loop latch in the final code, corresponding to the newly
2560 created looparound phi nodes). */
2561 && gimple_bb (stmt) != loop->header)
2563 gcc_assert (single_pred_p (gimple_bb (stmt)));
2564 use = PHI_ARG_DEF (stmt, 0);
2565 stmt = SSA_NAME_DEF_STMT (use);
2568 base_names_in_chain_on (loop, use, var);
2572 /* Returns true if CHAIN is suitable to be combined. */
2574 static bool
2575 chain_can_be_combined_p (chain_p chain)
2577 return (!chain->combined
2578 && (chain->type == CT_LOAD || chain->type == CT_COMBINATION));
2581 /* Returns the modify statement that uses NAME. Skips over assignment
2582 statements, NAME is replaced with the actual name used in the returned
2583 statement. */
2585 gimple *
2586 pcom_worker::find_use_stmt (tree *name)
2588 gimple *stmt;
2589 tree rhs, lhs;
2591 /* Skip over assignments. */
2592 while (1)
2594 stmt = single_nonlooparound_use (*name);
2595 if (!stmt)
2596 return NULL;
2598 if (gimple_code (stmt) != GIMPLE_ASSIGN)
2599 return NULL;
2601 lhs = gimple_assign_lhs (stmt);
2602 if (TREE_CODE (lhs) != SSA_NAME)
2603 return NULL;
2605 if (gimple_assign_copy_p (stmt))
2607 rhs = gimple_assign_rhs1 (stmt);
2608 if (rhs != *name)
2609 return NULL;
2611 *name = lhs;
2613 else if (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
2614 == GIMPLE_BINARY_RHS)
2615 return stmt;
2616 else
2617 return NULL;
2621 /* Returns true if we may perform reassociation for operation CODE in TYPE. */
2623 static bool
2624 may_reassociate_p (tree type, enum tree_code code)
2626 if (FLOAT_TYPE_P (type)
2627 && !flag_unsafe_math_optimizations)
2628 return false;
2630 return (commutative_tree_code (code)
2631 && associative_tree_code (code));
2634 /* If the operation used in STMT is associative and commutative, go through the
2635 tree of the same operations and returns its root. Distance to the root
2636 is stored in DISTANCE. */
2638 gimple *
2639 pcom_worker::find_associative_operation_root (gimple *stmt, unsigned *distance)
2641 tree lhs;
2642 gimple *next;
2643 enum tree_code code = gimple_assign_rhs_code (stmt);
2644 tree type = TREE_TYPE (gimple_assign_lhs (stmt));
2645 unsigned dist = 0;
2647 if (!may_reassociate_p (type, code))
2648 return NULL;
2650 while (1)
2652 lhs = gimple_assign_lhs (stmt);
2653 gcc_assert (TREE_CODE (lhs) == SSA_NAME);
2655 next = find_use_stmt (&lhs);
2656 if (!next
2657 || gimple_assign_rhs_code (next) != code)
2658 break;
2660 stmt = next;
2661 dist++;
2664 if (distance)
2665 *distance = dist;
2666 return stmt;
2669 /* Returns the common statement in that NAME1 and NAME2 have a use. If there
2670 is no such statement, returns NULL_TREE. In case the operation used on
2671 NAME1 and NAME2 is associative and commutative, returns the root of the
2672 tree formed by this operation instead of the statement that uses NAME1 or
2673 NAME2. */
2675 gimple *
2676 pcom_worker::find_common_use_stmt (tree *name1, tree *name2)
2678 gimple *stmt1, *stmt2;
2680 stmt1 = find_use_stmt (name1);
2681 if (!stmt1)
2682 return NULL;
2684 stmt2 = find_use_stmt (name2);
2685 if (!stmt2)
2686 return NULL;
2688 if (stmt1 == stmt2)
2689 return stmt1;
2691 stmt1 = find_associative_operation_root (stmt1, NULL);
2692 if (!stmt1)
2693 return NULL;
2694 stmt2 = find_associative_operation_root (stmt2, NULL);
2695 if (!stmt2)
2696 return NULL;
2698 return (stmt1 == stmt2 ? stmt1 : NULL);
2701 /* Checks whether R1 and R2 are combined together using CODE, with the result
2702 in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order R2 CODE R1
2703 if it is true. If CODE is ERROR_MARK, set these values instead. */
2705 bool
2706 pcom_worker::combinable_refs_p (dref r1, dref r2,
2707 enum tree_code *code, bool *swap, tree *rslt_type)
2709 enum tree_code acode;
2710 bool aswap;
2711 tree atype;
2712 tree name1, name2;
2713 gimple *stmt;
2715 name1 = name_for_ref (r1);
2716 name2 = name_for_ref (r2);
2717 gcc_assert (name1 != NULL_TREE && name2 != NULL_TREE);
2719 stmt = find_common_use_stmt (&name1, &name2);
2721 if (!stmt
2722 /* A simple post-dominance check - make sure the combination
2723 is executed under the same condition as the references. */
2724 || (gimple_bb (stmt) != gimple_bb (r1->stmt)
2725 && gimple_bb (stmt) != gimple_bb (r2->stmt)))
2726 return false;
2728 acode = gimple_assign_rhs_code (stmt);
2729 aswap = (!commutative_tree_code (acode)
2730 && gimple_assign_rhs1 (stmt) != name1);
2731 atype = TREE_TYPE (gimple_assign_lhs (stmt));
2733 if (*code == ERROR_MARK)
2735 *code = acode;
2736 *swap = aswap;
2737 *rslt_type = atype;
2738 return true;
2741 return (*code == acode
2742 && *swap == aswap
2743 && *rslt_type == atype);
2746 /* Remove OP from the operation on rhs of STMT, and replace STMT with
2747 an assignment of the remaining operand. */
2749 static void
2750 remove_name_from_operation (gimple *stmt, tree op)
2752 tree other_op;
2753 gimple_stmt_iterator si;
2755 gcc_assert (is_gimple_assign (stmt));
2757 if (gimple_assign_rhs1 (stmt) == op)
2758 other_op = gimple_assign_rhs2 (stmt);
2759 else
2760 other_op = gimple_assign_rhs1 (stmt);
2762 si = gsi_for_stmt (stmt);
2763 gimple_assign_set_rhs_from_tree (&si, other_op);
2765 /* We should not have reallocated STMT. */
2766 gcc_assert (gsi_stmt (si) == stmt);
2768 update_stmt (stmt);
2771 /* Reassociates the expression in that NAME1 and NAME2 are used so that they
2772 are combined in a single statement, and returns this statement. */
2774 gimple *
2775 pcom_worker::reassociate_to_the_same_stmt (tree name1, tree name2)
2777 gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2;
2778 gassign *new_stmt, *tmp_stmt;
2779 tree new_name, tmp_name, var, r1, r2;
2780 unsigned dist1, dist2;
2781 enum tree_code code;
2782 tree type = TREE_TYPE (name1);
2783 gimple_stmt_iterator bsi;
2785 stmt1 = find_use_stmt (&name1);
2786 stmt2 = find_use_stmt (&name2);
2787 root1 = find_associative_operation_root (stmt1, &dist1);
2788 root2 = find_associative_operation_root (stmt2, &dist2);
2789 code = gimple_assign_rhs_code (stmt1);
2791 gcc_assert (root1 && root2 && root1 == root2
2792 && code == gimple_assign_rhs_code (stmt2));
2794 /* Find the root of the nearest expression in that both NAME1 and NAME2
2795 are used. */
2796 r1 = name1;
2797 s1 = stmt1;
2798 r2 = name2;
2799 s2 = stmt2;
2801 while (dist1 > dist2)
2803 s1 = find_use_stmt (&r1);
2804 r1 = gimple_assign_lhs (s1);
2805 dist1--;
2807 while (dist2 > dist1)
2809 s2 = find_use_stmt (&r2);
2810 r2 = gimple_assign_lhs (s2);
2811 dist2--;
2814 while (s1 != s2)
2816 s1 = find_use_stmt (&r1);
2817 r1 = gimple_assign_lhs (s1);
2818 s2 = find_use_stmt (&r2);
2819 r2 = gimple_assign_lhs (s2);
2822 /* Remove NAME1 and NAME2 from the statements in that they are used
2823 currently. */
2824 remove_name_from_operation (stmt1, name1);
2825 remove_name_from_operation (stmt2, name2);
2827 /* Insert the new statement combining NAME1 and NAME2 before S1, and
2828 combine it with the rhs of S1. */
2829 var = create_tmp_reg (type, "predreastmp");
2830 new_name = make_ssa_name (var);
2831 new_stmt = gimple_build_assign (new_name, code, name1, name2);
2833 var = create_tmp_reg (type, "predreastmp");
2834 tmp_name = make_ssa_name (var);
2836 /* Rhs of S1 may now be either a binary expression with operation
2837 CODE, or gimple_val (in case that stmt1 == s1 or stmt2 == s1,
2838 so that name1 or name2 was removed from it). */
2839 tmp_stmt = gimple_build_assign (tmp_name, gimple_assign_rhs_code (s1),
2840 gimple_assign_rhs1 (s1),
2841 gimple_assign_rhs2 (s1));
2843 bsi = gsi_for_stmt (s1);
2844 gimple_assign_set_rhs_with_ops (&bsi, code, new_name, tmp_name);
2845 s1 = gsi_stmt (bsi);
2846 update_stmt (s1);
2848 gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT);
2849 gsi_insert_before (&bsi, tmp_stmt, GSI_SAME_STMT);
2851 return new_stmt;
2854 /* Returns the statement that combines references R1 and R2. In case R1
2855 and R2 are not used in the same statement, but they are used with an
2856 associative and commutative operation in the same expression, reassociate
2857 the expression so that they are used in the same statement. */
2859 gimple *
2860 pcom_worker::stmt_combining_refs (dref r1, dref r2)
2862 gimple *stmt1, *stmt2;
2863 tree name1 = name_for_ref (r1);
2864 tree name2 = name_for_ref (r2);
2866 stmt1 = find_use_stmt (&name1);
2867 stmt2 = find_use_stmt (&name2);
2868 if (stmt1 == stmt2)
2869 return stmt1;
2871 return reassociate_to_the_same_stmt (name1, name2);
2874 /* Tries to combine chains CH1 and CH2 together. If this succeeds, the
2875 description of the new chain is returned, otherwise we return NULL. */
2877 chain_p
2878 pcom_worker::combine_chains (chain_p ch1, chain_p ch2)
2880 dref r1, r2, nw;
2881 enum tree_code op = ERROR_MARK;
2882 bool swap = false;
2883 chain_p new_chain;
2884 unsigned i;
2885 tree rslt_type = NULL_TREE;
2887 if (ch1 == ch2)
2888 return NULL;
2889 if (ch1->length != ch2->length)
2890 return NULL;
2892 if (ch1->refs.length () != ch2->refs.length ())
2893 return NULL;
2895 for (i = 0; (ch1->refs.iterate (i, &r1)
2896 && ch2->refs.iterate (i, &r2)); i++)
2898 if (r1->distance != r2->distance)
2899 return NULL;
2901 if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type))
2902 return NULL;
2905 if (swap)
2906 std::swap (ch1, ch2);
2908 new_chain = new struct chain (CT_COMBINATION);
2909 new_chain->op = op;
2910 new_chain->ch1 = ch1;
2911 new_chain->ch2 = ch2;
2912 new_chain->rslt_type = rslt_type;
2913 new_chain->length = ch1->length;
2915 for (i = 0; (ch1->refs.iterate (i, &r1)
2916 && ch2->refs.iterate (i, &r2)); i++)
2918 nw = XCNEW (class dref_d);
2919 nw->stmt = stmt_combining_refs (r1, r2);
2920 nw->distance = r1->distance;
2922 new_chain->refs.safe_push (nw);
2925 ch1->combined = true;
2926 ch2->combined = true;
2927 return new_chain;
2930 /* Recursively update position information of all offspring chains to ROOT
2931 chain's position information. */
2933 static void
2934 update_pos_for_combined_chains (chain_p root)
2936 chain_p ch1 = root->ch1, ch2 = root->ch2;
2937 dref ref, ref1, ref2;
2938 for (unsigned j = 0; (root->refs.iterate (j, &ref)
2939 && ch1->refs.iterate (j, &ref1)
2940 && ch2->refs.iterate (j, &ref2)); ++j)
2941 ref1->pos = ref2->pos = ref->pos;
2943 if (ch1->type == CT_COMBINATION)
2944 update_pos_for_combined_chains (ch1);
2945 if (ch2->type == CT_COMBINATION)
2946 update_pos_for_combined_chains (ch2);
2949 /* Returns true if statement S1 dominates statement S2. */
2951 static bool
2952 pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2)
2954 basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
2956 if (!bb1 || s1 == s2)
2957 return true;
2959 if (bb1 == bb2)
2960 return gimple_uid (s1) < gimple_uid (s2);
2962 return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
2965 /* Try to combine the chains. */
2967 void
2968 pcom_worker::try_combine_chains ()
2970 unsigned i, j;
2971 chain_p ch1, ch2, cch;
2972 auto_vec<chain_p> worklist;
2973 bool combined_p = false;
2975 FOR_EACH_VEC_ELT (m_chains, i, ch1)
2976 if (chain_can_be_combined_p (ch1))
2977 worklist.safe_push (ch1);
2979 while (!worklist.is_empty ())
2981 ch1 = worklist.pop ();
2982 if (!chain_can_be_combined_p (ch1))
2983 continue;
2985 FOR_EACH_VEC_ELT (m_chains, j, ch2)
2987 if (!chain_can_be_combined_p (ch2))
2988 continue;
2990 cch = combine_chains (ch1, ch2);
2991 if (cch)
2993 worklist.safe_push (cch);
2994 m_chains.safe_push (cch);
2995 combined_p = true;
2996 break;
3000 if (!combined_p)
3001 return;
3003 /* Setup UID for all statements in dominance order. */
3004 basic_block *bbs = get_loop_body_in_dom_order (m_loop);
3005 renumber_gimple_stmt_uids_in_blocks (bbs, m_loop->num_nodes);
3006 free (bbs);
3008 /* Re-association in combined chains may generate statements different to
3009 order of references of the original chain. We need to keep references
3010 of combined chain in dominance order so that all uses will be inserted
3011 after definitions. Note:
3012 A) This is necessary for all combined chains.
3013 B) This is only necessary for ZERO distance references because other
3014 references inherit value from loop carried PHIs.
3016 We first update position information for all combined chains. */
3017 dref ref;
3018 for (i = 0; m_chains.iterate (i, &ch1); ++i)
3020 if (ch1->type != CT_COMBINATION || ch1->combined)
3021 continue;
3023 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
3024 ref->pos = gimple_uid (ref->stmt);
3026 update_pos_for_combined_chains (ch1);
3028 /* Then sort references according to newly updated position information. */
3029 for (i = 0; m_chains.iterate (i, &ch1); ++i)
3031 if (ch1->type != CT_COMBINATION && !ch1->combined)
3032 continue;
3034 /* Find the first reference with non-ZERO distance. */
3035 if (ch1->length == 0)
3036 j = ch1->refs.length();
3037 else
3039 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
3040 if (ref->distance != 0)
3041 break;
3044 /* Sort all ZERO distance references by position. */
3045 qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos);
3047 if (ch1->combined)
3048 continue;
3050 /* For ZERO length chain, has_max_use_after must be true since root
3051 combined stmt must dominates others. */
3052 if (ch1->length == 0)
3054 ch1->has_max_use_after = true;
3055 continue;
3057 /* Check if there is use at max distance after root for combined chains
3058 and set flag accordingly. */
3059 ch1->has_max_use_after = false;
3060 gimple *root_stmt = get_chain_root (ch1)->stmt;
3061 for (j = 1; ch1->refs.iterate (j, &ref); ++j)
3063 if (ref->distance == ch1->length
3064 && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt))
3066 ch1->has_max_use_after = true;
3067 break;
3073 /* Prepare initializers for store elimination CHAIN in LOOP. Returns false
3074 if this is impossible because one of these initializers may trap, true
3075 otherwise. */
3077 static bool
3078 prepare_initializers_chain_store_elim (class loop *loop, chain_p chain)
3080 unsigned i, n = chain->length;
3082 /* For now we can't eliminate stores if some of them are conditional
3083 executed. */
3084 if (!chain->all_always_accessed)
3085 return false;
3087 /* Nothing to intialize for intra-iteration store elimination. */
3088 if (n == 0 && chain->type == CT_STORE_STORE)
3089 return true;
3091 /* For store elimination chain, there is nothing to initialize if stores
3092 to be eliminated only store loop invariant values into memory. */
3093 if (chain->type == CT_STORE_STORE
3094 && is_inv_store_elimination_chain (loop, chain))
3096 chain->inv_store_elimination = true;
3097 return true;
3100 chain->inits.create (n);
3101 chain->inits.safe_grow_cleared (n, true);
3103 /* For store eliminatin chain like below:
3105 for (i = 0; i < len; i++)
3107 a[i] = 1;
3108 // a[i + 1] = ...
3109 a[i + 2] = 3;
3112 store to a[i + 1] is missed in loop body, it acts like bubbles. The
3113 content of a[i + 1] remain the same if the loop iterates fewer times
3114 than chain->length. We need to set up root variables for such stores
3115 by loading from memory before loop. Note we only need to load bubble
3116 elements because loop body is guaranteed to be executed at least once
3117 after loop's preheader edge. */
3118 auto_vec<bool> bubbles;
3119 bubbles.safe_grow_cleared (n + 1, true);
3120 for (i = 0; i < chain->refs.length (); i++)
3121 bubbles[chain->refs[i]->distance] = true;
3123 struct data_reference *dr = get_chain_root (chain)->ref;
3124 for (i = 0; i < n; i++)
3126 if (bubbles[i])
3127 continue;
3129 gimple_seq stmts = NULL;
3131 tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
3132 if (stmts)
3133 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3135 chain->inits[i] = init;
3138 return true;
3141 /* Prepare initializers for CHAIN. Returns false if this is impossible
3142 because one of these initializers may trap, true otherwise. */
3144 bool
3145 pcom_worker::prepare_initializers_chain (chain_p chain)
3147 unsigned i, n = (chain->type == CT_INVARIANT) ? 1 : chain->length;
3148 struct data_reference *dr = get_chain_root (chain)->ref;
3149 tree init;
3150 dref laref;
3151 edge entry = loop_preheader_edge (m_loop);
3153 if (chain->type == CT_STORE_STORE)
3154 return prepare_initializers_chain_store_elim (m_loop, chain);
3156 /* Find the initializers for the variables, and check that they cannot
3157 trap. */
3158 chain->inits.create (n);
3159 for (i = 0; i < n; i++)
3160 chain->inits.quick_push (NULL_TREE);
3162 /* If we have replaced some looparound phi nodes, use their initializers
3163 instead of creating our own. */
3164 FOR_EACH_VEC_ELT (chain->refs, i, laref)
3166 if (gimple_code (laref->stmt) != GIMPLE_PHI)
3167 continue;
3169 gcc_assert (laref->distance > 0);
3170 chain->inits[n - laref->distance]
3171 = PHI_ARG_DEF_FROM_EDGE (laref->stmt, entry);
3174 for (i = 0; i < n; i++)
3176 gimple_seq stmts = NULL;
3178 if (chain->inits[i] != NULL_TREE)
3179 continue;
3181 init = ref_at_iteration (dr, (int) i - n, &stmts);
3182 if (!chain->all_always_accessed && tree_could_trap_p (init))
3184 gimple_seq_discard (stmts);
3185 return false;
3188 if (stmts)
3189 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3191 chain->inits[i] = init;
3194 return true;
3197 /* Prepare initializers for chains, and free chains that cannot
3198 be used because the initializers might trap. */
3200 void
3201 pcom_worker::prepare_initializers ()
3203 chain_p chain;
3204 unsigned i;
3206 for (i = 0; i < m_chains.length (); )
3208 chain = m_chains[i];
3209 if (prepare_initializers_chain (chain))
3210 i++;
3211 else
3213 release_chain (chain);
3214 m_chains.unordered_remove (i);
3219 /* Generates finalizer memory references for CHAIN. Returns true
3220 if finalizer code for CHAIN can be generated, otherwise false. */
3222 bool
3223 pcom_worker::prepare_finalizers_chain (chain_p chain)
3225 unsigned i, n = chain->length;
3226 struct data_reference *dr = get_chain_root (chain)->ref;
3227 tree fini, niters = number_of_latch_executions (m_loop);
3229 /* For now we can't eliminate stores if some of them are conditional
3230 executed. */
3231 if (!chain->all_always_accessed)
3232 return false;
3234 chain->finis.create (n);
3235 for (i = 0; i < n; i++)
3236 chain->finis.quick_push (NULL_TREE);
3238 /* We never use looparound phi node for store elimination chains. */
3240 /* Find the finalizers for the variables, and check that they cannot
3241 trap. */
3242 for (i = 0; i < n; i++)
3244 gimple_seq stmts = NULL;
3245 gcc_assert (chain->finis[i] == NULL_TREE);
3247 if (TREE_CODE (niters) != INTEGER_CST && TREE_CODE (niters) != SSA_NAME)
3249 niters = unshare_expr (niters);
3250 niters = force_gimple_operand (niters, &stmts, true, NULL);
3251 if (stmts)
3253 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3254 stmts = NULL;
3257 fini = ref_at_iteration (dr, (int) 0 - i, &stmts, niters);
3258 if (stmts)
3259 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3261 chain->finis[i] = fini;
3264 return true;
3267 /* Generates finalizer memory reference for chains. Returns true if
3268 finalizer code generation for chains breaks loop closed ssa form. */
3270 bool
3271 pcom_worker::prepare_finalizers ()
3273 chain_p chain;
3274 unsigned i;
3275 bool loop_closed_ssa = false;
3277 for (i = 0; i < m_chains.length ();)
3279 chain = m_chains[i];
3281 /* Finalizer is only necessary for inter-iteration store elimination
3282 chains. */
3283 if (chain->length == 0 || chain->type != CT_STORE_STORE)
3285 i++;
3286 continue;
3289 if (prepare_finalizers_chain (chain))
3291 i++;
3292 /* Be conservative, assume loop closed ssa form is corrupted
3293 by store-store chain. Though it's not always the case if
3294 eliminated stores only store loop invariant values into
3295 memory. */
3296 loop_closed_ssa = true;
3298 else
3300 release_chain (chain);
3301 m_chains.unordered_remove (i);
3304 return loop_closed_ssa;
3307 /* Insert all initializing gimple stmts into LOOP's entry edge. */
3309 static void
3310 insert_init_seqs (class loop *loop, vec<chain_p> &chains)
3312 unsigned i;
3313 edge entry = loop_preheader_edge (loop);
3315 for (i = 0; i < chains.length (); ++i)
3316 if (chains[i]->init_seq)
3318 gsi_insert_seq_on_edge_immediate (entry, chains[i]->init_seq);
3319 chains[i]->init_seq = NULL;
3323 /* Performs predictive commoning for LOOP. Sets bit 1<<1 of return value
3324 if LOOP was unrolled; Sets bit 1<<2 of return value if loop closed ssa
3325 form was corrupted. Non-zero return value indicates some changes were
3326 applied to this loop. */
3328 unsigned
3329 pcom_worker::tree_predictive_commoning_loop (bool allow_unroll_p)
3331 struct component *components;
3332 unsigned unroll_factor = 0;
3333 class tree_niter_desc desc;
3334 bool unroll = false, loop_closed_ssa = false;
3336 if (dump_file && (dump_flags & TDF_DETAILS))
3337 fprintf (dump_file, "Processing loop %d\n", m_loop->num);
3339 /* Nothing for predicitive commoning if loop only iterates 1 time. */
3340 if (get_max_loop_iterations_int (m_loop) == 0)
3342 if (dump_file && (dump_flags & TDF_DETAILS))
3343 fprintf (dump_file, "Loop iterates only 1 time, nothing to do.\n");
3345 return 0;
3348 /* Find the data references and split them into components according to their
3349 dependence relations. */
3350 auto_vec<loop_p, 3> loop_nest;
3351 if (!compute_data_dependences_for_loop (m_loop, true, &loop_nest, &m_datarefs,
3352 &m_dependences))
3354 if (dump_file && (dump_flags & TDF_DETAILS))
3355 fprintf (dump_file, "Cannot analyze data dependencies\n");
3356 return 0;
3359 if (dump_file && (dump_flags & TDF_DETAILS))
3360 dump_data_dependence_relations (dump_file, m_dependences);
3362 components = split_data_refs_to_components ();
3364 loop_nest.release ();
3365 if (!components)
3366 return 0;
3368 if (dump_file && (dump_flags & TDF_DETAILS))
3370 fprintf (dump_file, "Initial state:\n\n");
3371 dump_components (dump_file, components);
3374 /* Find the suitable components and split them into chains. */
3375 components = filter_suitable_components (components);
3377 auto_bitmap tmp_vars;
3378 determine_roots (components);
3379 release_components (components);
3381 if (!m_chains.exists ())
3383 if (dump_file && (dump_flags & TDF_DETAILS))
3384 fprintf (dump_file,
3385 "Predictive commoning failed: no suitable chains\n");
3386 return 0;
3389 prepare_initializers ();
3390 loop_closed_ssa = prepare_finalizers ();
3392 /* Try to combine the chains that are always worked with together. */
3393 try_combine_chains ();
3395 insert_init_seqs (m_loop, m_chains);
3397 if (dump_file && (dump_flags & TDF_DETAILS))
3399 fprintf (dump_file, "Before commoning:\n\n");
3400 dump_chains (dump_file, m_chains);
3403 if (allow_unroll_p)
3404 /* Determine the unroll factor, and if the loop should be unrolled, ensure
3405 that its number of iterations is divisible by the factor. */
3406 unroll_factor = determine_unroll_factor (m_chains);
3408 if (unroll_factor > 1)
3409 unroll = can_unroll_loop_p (m_loop, unroll_factor, &desc);
3411 /* Execute the predictive commoning transformations, and possibly unroll the
3412 loop. */
3413 if (unroll)
3415 struct epcc_data dta;
3417 if (dump_file && (dump_flags & TDF_DETAILS))
3418 fprintf (dump_file, "Unrolling %u times.\n", unroll_factor);
3420 dta.tmp_vars = tmp_vars;
3421 dta.chains = m_chains.to_vec_legacy ();
3422 dta.worker = this;
3424 /* Cfg manipulations performed in tree_transform_and_unroll_loop before
3425 execute_pred_commoning_cbck is called may cause phi nodes to be
3426 reallocated, which is a problem since CHAINS may point to these
3427 statements. To fix this, we store the ssa names defined by the
3428 phi nodes here instead of the phi nodes themselves, and restore
3429 the phi nodes in execute_pred_commoning_cbck. A bit hacky. */
3430 replace_phis_by_defined_names (m_chains);
3432 tree_transform_and_unroll_loop (m_loop, unroll_factor, &desc,
3433 execute_pred_commoning_cbck, &dta);
3434 eliminate_temp_copies (m_loop, tmp_vars);
3436 else
3438 if (dump_file && (dump_flags & TDF_DETAILS))
3439 fprintf (dump_file,
3440 "Executing predictive commoning without unrolling.\n");
3441 execute_pred_commoning (tmp_vars);
3444 return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1);
3447 /* Runs predictive commoning. */
3449 unsigned
3450 tree_predictive_commoning (bool allow_unroll_p)
3452 unsigned ret = 0, changed = 0;
3454 initialize_original_copy_tables ();
3455 for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
3456 if (optimize_loop_for_speed_p (loop))
3458 pcom_worker w(loop);
3459 changed |= w.tree_predictive_commoning_loop (allow_unroll_p);
3461 free_original_copy_tables ();
3463 if (changed > 0)
3465 ret = TODO_update_ssa_only_virtuals;
3467 /* Some loop(s) got unrolled. */
3468 if (changed > 1)
3470 scev_reset ();
3472 /* Need to fix up loop closed SSA. */
3473 if (changed >= 4)
3474 rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
3476 ret |= TODO_cleanup_cfg;
3480 return ret;
3483 /* Predictive commoning Pass. */
3485 static unsigned
3486 run_tree_predictive_commoning (struct function *fun, bool allow_unroll_p)
3488 if (number_of_loops (fun) <= 1)
3489 return 0;
3491 return tree_predictive_commoning (allow_unroll_p);
3494 namespace {
3496 const pass_data pass_data_predcom =
3498 GIMPLE_PASS, /* type */
3499 "pcom", /* name */
3500 OPTGROUP_LOOP, /* optinfo_flags */
3501 TV_PREDCOM, /* tv_id */
3502 PROP_cfg, /* properties_required */
3503 0, /* properties_provided */
3504 0, /* properties_destroyed */
3505 0, /* todo_flags_start */
3506 0, /* todo_flags_finish */
3509 class pass_predcom : public gimple_opt_pass
3511 public:
3512 pass_predcom (gcc::context *ctxt)
3513 : gimple_opt_pass (pass_data_predcom, ctxt)
3516 /* opt_pass methods: */
3517 virtual bool
3518 gate (function *)
3520 if (flag_predictive_commoning != 0)
3521 return true;
3522 /* Loop vectorization enables predictive commoning implicitly
3523 only if predictive commoning isn't set explicitly, and it
3524 doesn't allow unrolling. */
3525 if (flag_tree_loop_vectorize
3526 && !OPTION_SET_P (flag_predictive_commoning))
3527 return true;
3529 return false;
3532 virtual unsigned int
3533 execute (function *fun)
3535 bool allow_unroll_p = flag_predictive_commoning != 0;
3536 return run_tree_predictive_commoning (fun, allow_unroll_p);
3539 }; // class pass_predcom
3541 } // anon namespace
3543 gimple_opt_pass *
3544 make_pass_predcom (gcc::context *ctxt)
3546 return new pass_predcom (ctxt);