/* gcc/tree-predcom.c */
/* Predictive commoning.
   Copyright (C) 2005-2021 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
/* This file implements the predictive commoning optimization.  Predictive
   commoning can be viewed as CSE around a loop, and with some improvements,
   as generalized strength reduction -- i.e., reusing values computed in
   earlier iterations of a loop in the later ones.  So far, the pass only
   handles the most useful case, that is, reusing values of memory references.
   If you think this is all just a special case of PRE, you are sort of right;
   however, concentrating on loops is simpler, and makes it possible to
   incorporate data dependence analysis to detect the opportunities, perform
   loop unrolling to avoid copies together with renaming immediately,
   and if needed, we could also take register pressure into account.

   Let us demonstrate what is done on an example:

   for (i = 0; i < 100; i++)
     {
       a[i+2] = a[i] + a[i+1];
       b[10] = b[10] + i;
       c[i] = c[99 - i];
       d[i] = d[i + 1];
     }
   1) We find data references in the loop, and split them to mutually
      independent groups (i.e., we find components of a data dependence
      graph).  We ignore read-read dependences whose distance is not constant.
      (TODO -- we could also ignore antidependences).  In this example, we
      find the following groups:

      a[i]{read}, a[i+1]{read}, a[i+2]{write}
      b[10]{read}, b[10]{write}
      c[99 - i]{read}, c[i]{write}
      d[i + 1]{read}, d[i]{write}

   2) Inside each of the groups, we verify several conditions:
      a) all the references must differ in indices only, and the indices
         must all have the same step
      b) the references must dominate the loop latch (and thus, they must be
         ordered by the dominance relation).
      c) the distance of the indices must be a small multiple of the step
      We are then able to compute the difference of the references (# of
      iterations before they point to the same place as the first of them).
      Also, in case there are writes in the loop, we split the groups into
      chains whose head is the write whose values are used by the reads in
      the same chain.  The chains are then processed independently,
      making the further transformations simpler.  Also, the shorter chains
      need the same number of registers, but may require a lower unrolling
      factor in order to get rid of the copies on the loop latch.

      In our example, we get the following chains (the chain for c is
      invalid):

      a[i]{read,+0}, a[i+1]{read,-1}, a[i+2]{write,-2}
      b[10]{read,+0}, b[10]{write,+0}
      d[i + 1]{read,+0}, d[i]{write,+1}
   3) For each read, we determine the read or write whose value it reuses,
      together with the distance of this reuse.  I.e. we take the last
      reference before it with distance 0, or the last of the references
      with the smallest positive distance to the read.  Then, we remove
      the references that are not used in any of these chains, discard the
      empty groups, and propagate all the links so that they point to the
      single root reference of the chain (adjusting their distance
      appropriately).  Some extra care needs to be taken for references with
      step 0.  In our example (the numbers indicate the distance of the
      reuse),

      a[i] --> (*) 2, a[i+1] --> (*) 1, a[i+2] (*)
      b[10] --> (*) 1, b[10] (*)

   4) The chains are combined together if possible.  If the corresponding
      elements of two chains are always combined together with the same
      operator, we remember just the result of this combination, instead
      of remembering the values separately.  We may need to perform
      reassociation to enable combining, for example

      e[i] + f[i+1] + e[i+1] + f[i]

      can be reassociated as

      (e[i] + f[i]) + (e[i+1] + f[i+1])

      and we can combine the chains for e and f into one chain.
   5) For each root reference (end of the chain) R, let N be the maximum
      distance of a reference reusing its value.  Variables R0 up to RN are
      created, together with phi nodes that transfer values from R1 .. RN to
      R0 .. R(N-1).

      Initial values are loaded to R0..R(N-1) (in case not all references
      must necessarily be accessed and they may trap, we may fail here;
      TODO sometimes, the loads could be guarded by a check for the number
      of iterations).  Values loaded/stored in roots are also copied to
      RN.  Other reads are replaced with the appropriate variable Ri.
      Everything is put to SSA form.

      As a small improvement, if R0 is dead after the root (i.e., all uses of
      the value with the maximum distance dominate the root), we can avoid
      creating RN and use R0 instead of it.

      In our example, we get (only the parts concerning a and b are shown):

      for (i = 0; i < 100; i++)
        {
          f = phi (a[0], s);
          s = phi (a[1], f);
          x = phi (b[10], x);

          f = f + s;
          a[i+2] = f;
          x = x + i;
          b[10] = x;
        }
   6) The factor F for unrolling is determined as the smallest common multiple
      of (N + 1) for each root reference (N for references for which we
      avoided creating RN).  If F and the loop are small enough, the loop is
      unrolled F times.  The stores to RN (R0) in the copies of the loop body
      are periodically replaced with R0, R1, ... (R1, R2, ...), so that they
      can be coalesced and the copies can be eliminated.

      TODO -- copy propagation and other optimizations may change the live
      ranges of the temporary registers and prevent them from being coalesced;
      this may increase the register pressure.

      In our case, F = 2 and the (main loop of the) result is

      for (i = 0; i < ...; i += 2)
        {
          f = phi (a[0], f);
          s = phi (a[1], s);
          x = phi (b[10], x);

          f = f + s;
          a[i+2] = f;
          x = x + i;
          b[10] = x;

          s = s + f;
          a[i+3] = s;
          x = x + i;
          b[10] = x;
        }
   Apart from predictive commoning on Load-Load and Store-Load chains, we
   also support Store-Store chains -- stores killed by another store can be
   eliminated.  Given the example below:

     for (i = 0; i < n; i++)
       {
         a[i] = 1;
         a[i+2] = 2;
       }

   it can be replaced with:

     t0 = a[0];
     t1 = a[1];
     for (i = 0; i < n; i++)
       {
         a[i] = 1;
         t2 = 2;
         t0 = t1;
         t1 = t2;
       }
     a[n] = t0;
     a[n+1] = t1;

   If the loop runs for more than 1 iteration, this can be further simplified
   into:

     for (i = 0; i < n; i++)
       a[i] = 1;
     a[n] = 2;
     a[n+1] = 2;

   The interesting part is that this can be viewed either as general store
   motion or as general dead store elimination, working both within and
   across iterations.

   With trivial effort, we also support a load inside a Store-Store chain if
   the load is dominated by a store statement in the same iteration of the
   loop.  You can see this as a restricted Store-Mixed-Load-Store chain.

   TODO: For now, we don't support store-store chains in multi-exit loops.
   We also force the loop not to be unrolled when a store-store chain is
   present, even if other chains might ask for unrolling.

   Predictive commoning can be generalized for arbitrary computations (not
   just memory loads), and also nontrivial transfer functions (e.g., replacing
   i * i with ii_last + 2 * i + 1), to generalize strength reduction.  */
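/* As a hypothetical, freestanding illustration (not part of GCC, and the
   function names are invented for this sketch), the transformation the
   comment above describes for the a[] recurrence can be written out by
   hand: the commoned version carries a[i] and a[i+1] in the scalars f and
   s and unrolls by F = 2, so no loads remain in the body and no copies are
   needed on the latch.  The sketch assumes an even iteration count,
   matching the "main loop" shown above.  */

```c
#include <assert.h>
#include <string.h>

/* Original loop: every iteration reloads a[i] and a[i+1] from memory.  */
static void
fib_like_original (int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i + 2] = a[i] + a[i + 1];
}

/* After predictive commoning and unrolling by 2: f and s hold the two most
   recently computed elements across iterations.  N is assumed even.  */
static void
fib_like_commoned (int *a, int n)
{
  int f = a[0], s = a[1];
  for (int i = 0; i < n; i += 2)
    {
      f = f + s;	/* value of a[i + 2] */
      a[i + 2] = f;
      s = s + f;	/* value of a[i + 3] */
      a[i + 3] = s;
    }
}
```

Running both versions on identically initialized arrays produces identical
results, which is the property the pass must preserve.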
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "rtl.h"
#include "tree.h"
#include "gimple.h"
#include "predict.h"
#include "tree-pass.h"
#include "ssa.h"
#include "gimple-pretty-print.h"
#include "alias.h"
#include "fold-const.h"
#include "cfgloop.h"
#include "tree-eh.h"
#include "gimplify.h"
#include "gimple-iterator.h"
#include "gimplify-me.h"
#include "tree-ssa-loop-ivopts.h"
#include "tree-ssa-loop-manip.h"
#include "tree-ssa-loop-niter.h"
#include "tree-ssa-loop.h"
#include "tree-into-ssa.h"
#include "tree-dfa.h"
#include "tree-ssa.h"
#include "tree-data-ref.h"
#include "tree-scalar-evolution.h"
#include "tree-affine.h"
#include "builtins.h"
#include "opts.h"
/* The maximum number of iterations between the considered memory
   references.  */

#define MAX_DISTANCE (target_avail_regs < 16 ? 4 : 8)
/* Data references (or phi nodes that carry data reference values across
   loop iterations).  */

typedef class dref_d
{
public:
  /* The reference itself.  */
  struct data_reference *ref;

  /* The statement in which the reference appears.  */
  gimple *stmt;

  /* In case that STMT is a phi node, this field is set to the SSA name
     defined by it in replace_phis_by_defined_names (in order to avoid
     pointing to a phi node that got reallocated in the meantime).  */
  tree name_defined_by_phi;

  /* Distance of the reference from the root of the chain (in number of
     iterations of the loop).  */
  unsigned distance;

  /* Number of iterations offset from the first reference in the component.  */
  widest_int offset;

  /* Number of the reference in a component, in dominance ordering.  */
  unsigned pos;

  /* True if the memory reference is always accessed when the loop is
     entered.  */
  unsigned always_accessed : 1;
} *dref;
/* Type of the chain of the references.  */

enum chain_type
{
  /* The addresses of the references in the chain are constant.  */
  CT_INVARIANT,

  /* There are only loads in the chain.  */
  CT_LOAD,

  /* Root of the chain is a store, the rest are loads.  */
  CT_STORE_LOAD,

  /* There are only stores in the chain.  */
  CT_STORE_STORE,

  /* A combination of two chains.  */
  CT_COMBINATION
};
/* Chains of data references.  */

typedef struct chain
{
  /* Type of the chain.  */
  enum chain_type type;

  /* For combination chains, the operator and the two chains that are
     combined, and the type of the result.  */
  enum tree_code op;
  tree rslt_type;
  struct chain *ch1, *ch2;

  /* The references in the chain.  */
  auto_vec<dref> refs;

  /* The maximum distance of the reference in the chain from the root.  */
  unsigned length;

  /* The variables used to copy the value throughout iterations.  */
  auto_vec<tree> vars;

  /* Initializers for the variables.  */
  auto_vec<tree> inits;

  /* Finalizers for the eliminated stores.  */
  auto_vec<tree> finis;

  /* gimple stmts initializing the initial variables of the chain.  */
  gimple_seq init_seq;

  /* gimple stmts finalizing the eliminated stores of the chain.  */
  gimple_seq fini_seq;

  /* True if there is a use of a variable with the maximal distance
     that comes after the root in the loop.  */
  unsigned has_max_use_after : 1;

  /* True if all the memory references in the chain are always accessed.  */
  unsigned all_always_accessed : 1;

  /* True if this chain was combined together with some other chain.  */
  unsigned combined : 1;

  /* True if this is a store elimination chain and the eliminated stores
     store a loop invariant value into memory.  */
  unsigned inv_store_elimination : 1;
} *chain_p;
/* Describes the knowledge about the step of the memory references in
   the component.  */

enum ref_step_type
{
  /* The step is zero.  */
  RS_INVARIANT,

  /* The step is nonzero.  */
  RS_NONZERO,

  /* The step may or may not be nonzero.  */
  RS_ANY
};
/* Components of the data dependence graph.  */

struct component
{
  /* The references in the component.  */
  auto_vec<dref> refs;

  /* What we know about the step of the references in the component.  */
  enum ref_step_type comp_step;

  /* True if all references in the component are stores and we try to do
     intra/inter loop iteration dead store elimination.  */
  bool eliminate_store_p;

  /* Next component in the list.  */
  struct component *next;
};
/* A class to encapsulate the global states used for predictive
   commoning work on top of one given LOOP.  */

class pcom_worker
{
public:
  pcom_worker (loop_p l) : m_loop (l), m_cache (NULL) {}

  ~pcom_worker ()
  {
    free_data_refs (m_datarefs);
    free_dependence_relations (m_dependences);
    free_affine_expand_cache (&m_cache);
    release_chains ();
  }

  pcom_worker (const pcom_worker &) = delete;
  pcom_worker &operator= (const pcom_worker &) = delete;

  /* Performs predictive commoning.  */
  unsigned tree_predictive_commoning_loop (bool allow_unroll_p);

  /* Perform the predictive commoning optimization for chains; make this
     public so that it can be called in callback execute_pred_commoning_cbck.  */
  void execute_pred_commoning (bitmap tmp_vars);

private:
  /* The pointer to the given loop.  */
  loop_p m_loop;

  /* All data references.  */
  auto_vec<data_reference_p, 10> m_datarefs;

  /* All data dependences.  */
  auto_vec<ddr_p, 10> m_dependences;

  /* All chains.  */
  auto_vec<chain_p> m_chains;

  /* Bitmap of ssa names defined by looparound phi nodes covered by chains.  */
  auto_bitmap m_looparound_phis;

  typedef hash_map<tree, name_expansion *> tree_expand_map_t;
  /* Cache used by tree_to_aff_combination_expand.  */
  tree_expand_map_t *m_cache;

  /* Splits dependence graph to components.  */
  struct component *split_data_refs_to_components ();

  /* Check the conditions on references inside each of components COMPS,
     and remove the unsuitable components from the list.  */
  struct component *filter_suitable_components (struct component *comps);

  /* Find roots of the values and determine distances in components COMPS,
     and separate the references into chains.  */
  void determine_roots (struct component *comps);

  /* Prepare initializers for chains, and free chains that cannot
     be used because the initializers might trap.  */
  void prepare_initializers ();

  /* Generates finalizer memory references for chains.  Returns true if
     finalizer code generation for chains breaks loop closed ssa form.  */
  bool prepare_finalizers ();

  /* Try to combine the chains.  */
  void try_combine_chains ();

  /* Frees CHAINS.  */
  void release_chains ();

  /* Frees a chain CHAIN.  */
  void release_chain (chain_p chain);

  /* Prepare initializers for CHAIN.  Returns false if this is impossible
     because one of these initializers may trap, true otherwise.  */
  bool prepare_initializers_chain (chain_p chain);

  /* Generates finalizer memory references for CHAIN.  Returns true
     if finalizer code for CHAIN can be generated, otherwise false.  */
  bool prepare_finalizers_chain (chain_p chain);

  /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET.  */
  void aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset);

  /* Determines number of iterations of the innermost enclosing loop before
     B refers to exactly the same location as A and stores it to OFF.  */
  bool determine_offset (struct data_reference *a, struct data_reference *b,
			 poly_widest_int *off);

  /* Returns true if the component COMP satisfies the conditions
     described in 2) at the beginning of this file.  */
  bool suitable_component_p (struct component *comp);

  /* Returns true if REF is a valid initializer for ROOT with given
     DISTANCE (in iterations of the innermost enclosing loop).  */
  bool valid_initializer_p (struct data_reference *ref, unsigned distance,
			    struct data_reference *root);

  /* Finds the looparound phi node of the loop that copies the value of REF.  */
  gphi *find_looparound_phi (dref ref, dref root);

  /* Find roots of the values and determine distances in the component
     COMP.  The references are redistributed into chains.  */
  void determine_roots_comp (struct component *comp);

  /* For references in CHAIN that are copied around the loop, add the
     results of such copies to the chain.  */
  void add_looparound_copies (chain_p chain);

  /* Returns the single statement in which NAME is used, excepting
     the looparound phi nodes contained in one of the chains.  */
  gimple *single_nonlooparound_use (tree name);

  /* Remove statement STMT, as well as the chain of assignments in which
     it is used.  */
  void remove_stmt (gimple *stmt);

  /* Perform the predictive commoning optimization for a chain CHAIN.  */
  void execute_pred_commoning_chain (chain_p chain, bitmap tmp_vars);

  /* Returns the modify statement that uses NAME.  */
  gimple *find_use_stmt (tree *name);

  /* If the operation used in STMT is associative and commutative, go
     through the tree of the same operations and return its root.  */
  gimple *find_associative_operation_root (gimple *stmt, unsigned *distance);

  /* Returns the common statement in which NAME1 and NAME2 have a use.  */
  gimple *find_common_use_stmt (tree *name1, tree *name2);

  /* Checks whether R1 and R2 are combined together using CODE, with the
     result in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order
     R2 CODE R1 if it is true.  */
  bool combinable_refs_p (dref r1, dref r2, enum tree_code *code, bool *swap,
			  tree *rslt_type);

  /* Reassociates the expression in which NAME1 and NAME2 are used so that
     they are combined in a single statement, and returns this statement.  */
  gimple *reassociate_to_the_same_stmt (tree name1, tree name2);

  /* Returns the statement that combines references R1 and R2.  */
  gimple *stmt_combining_refs (dref r1, dref r2);

  /* Tries to combine chains CH1 and CH2 together.  */
  chain_p combine_chains (chain_p ch1, chain_p ch2);
};
/* Dumps data reference REF to FILE.  */

extern void dump_dref (FILE *, dref);
void
dump_dref (FILE *file, dref ref)
{
  if (ref->ref)
    {
      fprintf (file, "    ");
      print_generic_expr (file, DR_REF (ref->ref), TDF_SLIM);
      fprintf (file, " (id %u%s)\n", ref->pos,
	       DR_IS_READ (ref->ref) ? "" : ", write");

      fprintf (file, "      offset ");
      print_decs (ref->offset, file);
      fprintf (file, "\n");

      fprintf (file, "      distance %u\n", ref->distance);
    }
  else
    {
      if (gimple_code (ref->stmt) == GIMPLE_PHI)
	fprintf (file, "    looparound ref\n");
      else
	fprintf (file, "    combination ref\n");
      fprintf (file, "      in statement ");
      print_gimple_stmt (file, ref->stmt, 0, TDF_SLIM);
      fprintf (file, "\n");
      fprintf (file, "      distance %u\n", ref->distance);
    }
}
/* Dumps CHAIN to FILE.  */

extern void dump_chain (FILE *, chain_p);
void
dump_chain (FILE *file, chain_p chain)
{
  dref a;
  const char *chain_type;
  unsigned i;
  tree var;

  switch (chain->type)
    {
    case CT_INVARIANT:
      chain_type = "Load motion";
      break;

    case CT_LOAD:
      chain_type = "Loads-only";
      break;

    case CT_STORE_LOAD:
      chain_type = "Store-loads";
      break;

    case CT_STORE_STORE:
      chain_type = "Store-stores";
      break;

    case CT_COMBINATION:
      chain_type = "Combination";
      break;

    default:
      gcc_unreachable ();
    }

  fprintf (file, "%s chain %p%s\n", chain_type, (void *) chain,
	   chain->combined ? " (combined)" : "");
  if (chain->type != CT_INVARIANT)
    fprintf (file, "  max distance %u%s\n", chain->length,
	     chain->has_max_use_after ? "" : ", may reuse first");

  if (chain->type == CT_COMBINATION)
    {
      fprintf (file, "  equal to %p %s %p in type ",
	       (void *) chain->ch1, op_symbol_code (chain->op),
	       (void *) chain->ch2);
      print_generic_expr (file, chain->rslt_type, TDF_SLIM);
      fprintf (file, "\n");
    }

  if (chain->vars.exists ())
    {
      fprintf (file, "  vars");
      FOR_EACH_VEC_ELT (chain->vars, i, var)
	{
	  fprintf (file, " ");
	  print_generic_expr (file, var, TDF_SLIM);
	}
      fprintf (file, "\n");
    }

  if (chain->inits.exists ())
    {
      fprintf (file, "  inits");
      FOR_EACH_VEC_ELT (chain->inits, i, var)
	{
	  fprintf (file, " ");
	  print_generic_expr (file, var, TDF_SLIM);
	}
      fprintf (file, "\n");
    }

  fprintf (file, "  references:\n");
  FOR_EACH_VEC_ELT (chain->refs, i, a)
    dump_dref (file, a);

  fprintf (file, "\n");
}
/* Dumps CHAINS to FILE.  */

void
dump_chains (FILE *file, const vec<chain_p> &chains)
{
  chain_p chain;
  unsigned i;

  FOR_EACH_VEC_ELT (chains, i, chain)
    dump_chain (file, chain);
}
/* Dumps COMP to FILE.  */

extern void dump_component (FILE *, struct component *);
void
dump_component (FILE *file, struct component *comp)
{
  dref a;
  unsigned i;

  fprintf (file, "Component%s:\n",
	   comp->comp_step == RS_INVARIANT ? " (invariant)" : "");
  FOR_EACH_VEC_ELT (comp->refs, i, a)
    dump_dref (file, a);
  fprintf (file, "\n");
}
/* Dumps COMPS to FILE.  */

extern void dump_components (FILE *, struct component *);
void
dump_components (FILE *file, struct component *comps)
{
  struct component *comp;

  for (comp = comps; comp; comp = comp->next)
    dump_component (file, comp);
}
/* Frees a chain CHAIN.  */

void
pcom_worker::release_chain (chain_p chain)
{
  dref ref;
  unsigned i;

  if (chain == NULL)
    return;

  FOR_EACH_VEC_ELT (chain->refs, i, ref)
    free (ref);

  if (chain->init_seq)
    gimple_seq_discard (chain->init_seq);

  if (chain->fini_seq)
    gimple_seq_discard (chain->fini_seq);

  free (chain);
}
/* Frees CHAINS.  */

void
pcom_worker::release_chains ()
{
  unsigned i;
  chain_p chain;

  FOR_EACH_VEC_ELT (m_chains, i, chain)
    release_chain (chain);
}
/* Frees list of components COMPS.  */

static void
release_components (struct component *comps)
{
  struct component *act, *next;

  for (act = comps; act; act = next)
    {
      next = act->next;
      XDELETE (act);
    }
}
/* Finds a root of the tree given by FATHERS containing A, and performs path
   shortening.  */

static unsigned
component_of (vec<unsigned> &fathers, unsigned a)
{
  unsigned root, n;

  for (root = a; root != fathers[root]; root = fathers[root])
    continue;

  for (; a != root; a = n)
    {
      n = fathers[a];
      fathers[a] = root;
    }

  return root;
}
/* Join operation for DFU.  FATHERS gives the tree, SIZES are sizes of the
   components, A and B are components to merge.  */

static void
merge_comps (vec<unsigned> &fathers, vec<unsigned> &sizes,
	     unsigned a, unsigned b)
{
  unsigned ca = component_of (fathers, a);
  unsigned cb = component_of (fathers, b);

  if (ca == cb)
    return;

  if (sizes[ca] < sizes[cb])
    {
      sizes[cb] += sizes[ca];
      fathers[ca] = cb;
    }
  else
    {
      sizes[ca] += sizes[cb];
      fathers[cb] = ca;
    }
}
/* Returns true if A is a reference that is suitable for predictive commoning
   in the innermost loop that contains it.  REF_STEP is set according to the
   step of the reference A.  */

static bool
suitable_reference_p (struct data_reference *a, enum ref_step_type *ref_step)
{
  tree ref = DR_REF (a), step = DR_STEP (a);

  if (!step
      || TREE_THIS_VOLATILE (ref)
      || !is_gimple_reg_type (TREE_TYPE (ref))
      || tree_could_throw_p (ref))
    return false;

  if (integer_zerop (step))
    *ref_step = RS_INVARIANT;
  else if (integer_nonzerop (step))
    *ref_step = RS_NONZERO;
  else
    *ref_step = RS_ANY;

  return true;
}
/* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET.  */

void
pcom_worker::aff_combination_dr_offset (struct data_reference *dr,
					aff_tree *offset)
{
  tree type = TREE_TYPE (DR_OFFSET (dr));
  aff_tree delta;

  tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, &m_cache);
  aff_combination_const (&delta, type, wi::to_poly_widest (DR_INIT (dr)));
  aff_combination_add (offset, &delta);
}
/* Determines the number of iterations of the innermost enclosing loop before
   B refers to exactly the same location as A and stores it to OFF.  If A and
   B do not have the same step, they never meet, or anything else fails,
   returns false, otherwise returns true.  Both A and B are assumed to
   satisfy suitable_reference_p.  */

bool
pcom_worker::determine_offset (struct data_reference *a,
			       struct data_reference *b, poly_widest_int *off)
{
  aff_tree diff, baseb, step;
  tree typea, typeb;

  /* Check that both the references access the location in the same type.  */
  typea = TREE_TYPE (DR_REF (a));
  typeb = TREE_TYPE (DR_REF (b));
  if (!useless_type_conversion_p (typeb, typea))
    return false;

  /* Check whether the base address and the step of both references are the
     same.  */
  if (!operand_equal_p (DR_STEP (a), DR_STEP (b), 0)
      || !operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0))
    return false;

  if (integer_zerop (DR_STEP (a)))
    {
      /* If the references have a loop invariant address, check that they
	 access exactly the same location.  */
      *off = 0;
      return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0)
	      && operand_equal_p (DR_INIT (a), DR_INIT (b), 0));
    }

  /* Compare the offsets of the addresses, and check whether the difference
     is a multiple of step.  */
  aff_combination_dr_offset (a, &diff);
  aff_combination_dr_offset (b, &baseb);
  aff_combination_scale (&baseb, -1);
  aff_combination_add (&diff, &baseb);

  tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)),
				  &step, &m_cache);
  return aff_combination_constant_multiple_p (&diff, &step, off);
}
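/* Once both references are known to share the same base and the same
   nonzero step, the check above reduces to scalar arithmetic: the
   iteration distance exists iff the difference of the constant byte
   offsets divides evenly by the step.  A sketch with invented names (this
   is not the affine-combination machinery, just the arithmetic it
   performs in the constant case); note the sign convention matches the
   chain listings in the header comment, where a[i+1] relative to a[i]
   gets distance -1.  */

```c
#include <assert.h>
#include <stdbool.h>

/* Computes the iteration distance between two references with byte
   offsets OFFSET_A and OFFSET_B and common per-iteration step STEP,
   as (OFFSET_A - OFFSET_B) / STEP; fails unless the division is exact.  */
static bool
iteration_distance (long offset_a, long offset_b, long step, long *off)
{
  if (step == 0 || (offset_a - offset_b) % step != 0)
    return false;
  *off = (offset_a - offset_b) / step;
  return true;
}
```

For 4-byte elements, a[i] vs. a[i+1] gives offsets 0 and 4 and distance -1;
offsets differing by a non-multiple of the step (e.g. 6) make the
references never meet.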
/* Returns the last basic block in LOOP for which we are sure that
   it is executed whenever the loop is entered.  */

static basic_block
last_always_executed_block (class loop *loop)
{
  unsigned i;
  auto_vec<edge> exits = get_loop_exit_edges (loop);
  edge ex;
  basic_block last = loop->latch;

  FOR_EACH_VEC_ELT (exits, i, ex)
    last = nearest_common_dominator (CDI_DOMINATORS, last, ex->src);

  return last;
}
/* Splits the dependence graph on DATAREFS described by DEPENDENCES to
   components.  */

struct component *
pcom_worker::split_data_refs_to_components ()
{
  unsigned i, n = m_datarefs.length ();
  unsigned ca, ia, ib, bad;
  struct data_reference *dr, *dra, *drb;
  struct data_dependence_relation *ddr;
  struct component *comp_list = NULL, *comp;
  dref dataref;
  /* Don't do store elimination if the loop has multiple exit edges.  */
  bool eliminate_store_p = single_exit (m_loop) != NULL;
  basic_block last_always_executed = last_always_executed_block (m_loop);
  auto_bitmap no_store_store_comps;
  auto_vec<unsigned> comp_father (n + 1);
  auto_vec<unsigned> comp_size (n + 1);
  comp_father.quick_grow (n + 1);
  comp_size.quick_grow (n + 1);

  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      if (!DR_REF (dr))
	/* A fake reference for call or asm_expr that may clobber memory;
	   just fail.  */
	return NULL;
      /* The predcom pass isn't prepared to handle calls with data
	 references.  */
      if (is_gimple_call (DR_STMT (dr)))
	return NULL;
      dr->aux = (void *) (size_t) i;
      comp_father[i] = i;
      comp_size[i] = 1;
    }

  /* A component reserved for the "bad" data references.  */
  comp_father[n] = n;
  comp_size[n] = 1;

  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      enum ref_step_type dummy;

      if (!suitable_reference_p (dr, &dummy))
	{
	  ia = (unsigned) (size_t) dr->aux;
	  merge_comps (comp_father, comp_size, n, ia);
	}
    }

  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
    {
      poly_widest_int dummy_off;

      if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
	continue;

      dra = DDR_A (ddr);
      drb = DDR_B (ddr);

      /* Don't do store elimination if there is any unknown dependence for
	 any store data reference.  */
      if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb))
	  && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
	      || DDR_NUM_DIST_VECTS (ddr) == 0))
	eliminate_store_p = false;

      ia = component_of (comp_father, (unsigned) (size_t) dra->aux);
      ib = component_of (comp_father, (unsigned) (size_t) drb->aux);
      if (ia == ib)
	continue;

      bad = component_of (comp_father, n);

      /* If both A and B are reads, we may ignore unsuitable dependences.  */
      if (DR_IS_READ (dra) && DR_IS_READ (drb))
	{
	  if (ia == bad || ib == bad
	      || !determine_offset (dra, drb, &dummy_off))
	    continue;
	}
      /* If A is a read and B a write or vice versa and there is an
	 unsuitable dependence, instead of merging both components into a
	 component that will certainly not pass suitable_component_p, just
	 put the read into the bad component; perhaps at least the write
	 together with all the other data refs in its component will be
	 optimizable.  */
      else if (DR_IS_READ (dra) && ib != bad)
	{
	  if (ia == bad)
	    {
	      bitmap_set_bit (no_store_store_comps, ib);
	      continue;
	    }
	  else if (!determine_offset (dra, drb, &dummy_off))
	    {
	      bitmap_set_bit (no_store_store_comps, ib);
	      merge_comps (comp_father, comp_size, bad, ia);
	      continue;
	    }
	}
      else if (DR_IS_READ (drb) && ia != bad)
	{
	  if (ib == bad)
	    {
	      bitmap_set_bit (no_store_store_comps, ia);
	      continue;
	    }
	  else if (!determine_offset (dra, drb, &dummy_off))
	    {
	      bitmap_set_bit (no_store_store_comps, ia);
	      merge_comps (comp_father, comp_size, bad, ib);
	      continue;
	    }
	}
      else if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb)
	       && ia != bad && ib != bad
	       && !determine_offset (dra, drb, &dummy_off))
	{
	  merge_comps (comp_father, comp_size, bad, ia);
	  merge_comps (comp_father, comp_size, bad, ib);
	  continue;
	}

      merge_comps (comp_father, comp_size, ia, ib);
    }

  if (eliminate_store_p)
    {
      tree niters = number_of_latch_executions (m_loop);

      /* Don't do store elimination if niters info is unknown because stores
	 in the last iteration can't be eliminated and we need to recover
	 them after the loop.  */
      eliminate_store_p = (niters != NULL_TREE && niters != chrec_dont_know);
    }

  auto_vec<struct component *> comps;
  comps.safe_grow_cleared (n, true);
  bad = component_of (comp_father, n);
  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      ia = (unsigned) (size_t) dr->aux;
      ca = component_of (comp_father, ia);
      if (ca == bad)
	continue;

      comp = comps[ca];
      if (!comp)
	{
	  comp = XCNEW (struct component);
	  comp->refs.create (comp_size[ca]);
	  comp->eliminate_store_p = eliminate_store_p;
	  comps[ca] = comp;
	}

      dataref = XCNEW (class dref_d);
      dataref->ref = dr;
      dataref->stmt = DR_STMT (dr);
      dataref->offset = 0;
      dataref->distance = 0;

      dataref->always_accessed
	= dominated_by_p (CDI_DOMINATORS, last_always_executed,
			  gimple_bb (dataref->stmt));
      dataref->pos = comp->refs.length ();
      comp->refs.quick_push (dataref);
    }

  if (eliminate_store_p)
    {
      bitmap_iterator bi;
      EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi)
	{
	  ca = component_of (comp_father, ia);
	  if (ca != bad)
	    comps[ca]->eliminate_store_p = false;
	}
    }

  for (i = 0; i < n; i++)
    {
      comp = comps[i];
      if (comp)
	{
	  comp->next = comp_list;
	  comp_list = comp;
	}
    }
  return comp_list;
}
1068 /* Returns true if the component COMP satisfies the conditions
1069 described in 2) at the beginning of this file. */
1071 bool
1072 pcom_worker::suitable_component_p (struct component *comp)
1074 unsigned i;
1075 dref a, first;
1076 basic_block ba, bp = m_loop->header;
1077 bool ok, has_write = false;
1079 FOR_EACH_VEC_ELT (comp->refs, i, a)
1081 ba = gimple_bb (a->stmt);
1083 if (!just_once_each_iteration_p (m_loop, ba))
1084 return false;
1086 gcc_assert (dominated_by_p (CDI_DOMINATORS, ba, bp));
1087 bp = ba;
1089 if (DR_IS_WRITE (a->ref))
1090 has_write = true;
1093 first = comp->refs[0];
1094 ok = suitable_reference_p (first->ref, &comp->comp_step);
1095 gcc_assert (ok);
1096 first->offset = 0;
1098 for (i = 1; comp->refs.iterate (i, &a); i++)
1100 /* Polynomial offsets are no use, since we need to know the
1101 gap between iteration numbers at compile time. */
1102 poly_widest_int offset;
1103 if (!determine_offset (first->ref, a->ref, &offset)
1104 || !offset.is_constant (&a->offset))
1105 return false;
1107 enum ref_step_type a_step;
1108 gcc_checking_assert (suitable_reference_p (a->ref, &a_step)
1109 && a_step == comp->comp_step);
1112 /* If there is a write inside the component, we must know whether the
1113 step is nonzero or not -- we would not otherwise be able to recognize
1114 whether the value accessed by reads comes from the OFFSET-th iteration
1115 or the previous one. */
1116 if (has_write && comp->comp_step == RS_ANY)
1117 return false;
1119 return true;
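/* For instance, in the example at the beginning of this file, the
   component {a[i], a[i+1], a[i+2]} is suitable: each reference is
   executed exactly once per iteration, all have the same step, and
   their offsets differ by the compile-time constants 0, 1 and 2.
   A pair of references whose mutual offset is not constant, such as
   c[i] and c[99 - i], makes a component unsuitable, since we could
   not tell how many iterations apart the same location is accessed.  */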
1122 /* Check the conditions on references inside each of the components COMPS,
1123 and remove the unsuitable components from the list. The new list
1124 of components is returned. The conditions are described in 2) at
1125 the beginning of this file. */
1127 struct component *
1128 pcom_worker::filter_suitable_components (struct component *comps)
1130 struct component **comp, *act;
1132 for (comp = &comps; *comp; )
1134 act = *comp;
1135 if (suitable_component_p (act))
1136 comp = &act->next;
1137 else
1139 dref ref;
1140 unsigned i;
1142 *comp = act->next;
1143 FOR_EACH_VEC_ELT (act->refs, i, ref)
1144 free (ref);
1145 XDELETE (act);
1149 return comps;
1152 /* Compares two drefs A and B by their offset and position. Callback for
1153 qsort. */
1155 static int
1156 order_drefs (const void *a, const void *b)
1158 const dref *const da = (const dref *) a;
1159 const dref *const db = (const dref *) b;
1160 int offcmp = wi::cmps ((*da)->offset, (*db)->offset);
1162 if (offcmp != 0)
1163 return offcmp;
1165 return (*da)->pos - (*db)->pos;
1168 /* Compares two drefs A and B by their position. Callback for qsort. */
1170 static int
1171 order_drefs_by_pos (const void *a, const void *b)
1173 const dref *const da = (const dref *) a;
1174 const dref *const db = (const dref *) b;
1176 return (*da)->pos - (*db)->pos;
1179 /* Returns root of the CHAIN. */
1181 static inline dref
1182 get_chain_root (chain_p chain)
1184 return chain->refs[0];
1187 /* Given CHAIN, returns the last write ref at DISTANCE, or NULL if it doesn't
1188 exist. */
1190 static inline dref
1191 get_chain_last_write_at (chain_p chain, unsigned distance)
1193 for (unsigned i = chain->refs.length (); i > 0; i--)
1194 if (DR_IS_WRITE (chain->refs[i - 1]->ref)
1195 && distance == chain->refs[i - 1]->distance)
1196 return chain->refs[i - 1];
1198 return NULL;
1201 /* Given CHAIN, returns the last write ref with the same distance before the load
1202 at index LOAD_IDX, or NULL if it doesn't exist. */
1204 static inline dref
1205 get_chain_last_write_before_load (chain_p chain, unsigned load_idx)
1207 gcc_assert (load_idx < chain->refs.length ());
1209 unsigned distance = chain->refs[load_idx]->distance;
1211 for (unsigned i = load_idx; i > 0; i--)
1212 if (DR_IS_WRITE (chain->refs[i - 1]->ref)
1213 && distance == chain->refs[i - 1]->distance)
1214 return chain->refs[i - 1];
1216 return NULL;
1219 /* Adds REF to the chain CHAIN. */
1221 static void
1222 add_ref_to_chain (chain_p chain, dref ref)
1224 dref root = get_chain_root (chain);
1226 gcc_assert (wi::les_p (root->offset, ref->offset));
1227 widest_int dist = ref->offset - root->offset;
1228 gcc_assert (wi::fits_uhwi_p (dist));
1230 chain->refs.safe_push (ref);
1232 ref->distance = dist.to_uhwi ();
1234 if (ref->distance >= chain->length)
1236 chain->length = ref->distance;
1237 chain->has_max_use_after = false;
1240 /* Promote this chain to CT_STORE_STORE if it has multiple stores. */
1241 if (DR_IS_WRITE (ref->ref))
1242 chain->type = CT_STORE_STORE;
1244 /* Don't set the flag for a store-store chain since there is no use. */
1245 if (chain->type != CT_STORE_STORE
1246 && ref->distance == chain->length
1247 && ref->pos > root->pos)
1248 chain->has_max_use_after = true;
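/* For instance, in a chain rooted at a load of a[i], adding a load of
   a[i+2] gives the new reference distance 2: the value it reads in the
   current iteration is the one the root will read two iterations later,
   so the chain length grows to 2 and the value must be carried across
   two iterations.  */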
1250 chain->all_always_accessed &= ref->always_accessed;
1253 /* Returns the chain for invariant component COMP. */
1255 static chain_p
1256 make_invariant_chain (struct component *comp)
1258 chain_p chain = XCNEW (struct chain);
1259 unsigned i;
1260 dref ref;
1262 chain->type = CT_INVARIANT;
1264 chain->all_always_accessed = true;
1266 FOR_EACH_VEC_ELT (comp->refs, i, ref)
1268 chain->refs.safe_push (ref);
1269 chain->all_always_accessed &= ref->always_accessed;
1272 chain->inits = vNULL;
1273 chain->finis = vNULL;
1275 return chain;
1278 /* Make a new chain of type TYPE rooted at REF. */
1280 static chain_p
1281 make_rooted_chain (dref ref, enum chain_type type)
1283 chain_p chain = XCNEW (struct chain);
1285 chain->type = type;
1286 chain->refs.safe_push (ref);
1287 chain->all_always_accessed = ref->always_accessed;
1288 ref->distance = 0;
1290 chain->inits = vNULL;
1291 chain->finis = vNULL;
1293 return chain;
1296 /* Returns true if CHAIN is not trivial. */
1298 static bool
1299 nontrivial_chain_p (chain_p chain)
1301 return chain != NULL && chain->refs.length () > 1;
1304 /* Returns the ssa name that contains the value of REF, or NULL_TREE if there
1305 is no such name. */
1307 static tree
1308 name_for_ref (dref ref)
1310 tree name;
1312 if (is_gimple_assign (ref->stmt))
1314 if (!ref->ref || DR_IS_READ (ref->ref))
1315 name = gimple_assign_lhs (ref->stmt);
1316 else
1317 name = gimple_assign_rhs1 (ref->stmt);
1319 else
1320 name = PHI_RESULT (ref->stmt);
1322 return (TREE_CODE (name) == SSA_NAME ? name : NULL_TREE);
1325 /* Returns true if REF is a valid initializer for ROOT with the given DISTANCE (in
1326 iterations of the innermost enclosing loop). */
1328 bool
1329 pcom_worker::valid_initializer_p (struct data_reference *ref, unsigned distance,
1330 struct data_reference *root)
1332 aff_tree diff, base, step;
1333 poly_widest_int off;
1335 /* Both REF and ROOT must be accessing the same object. */
1336 if (!operand_equal_p (DR_BASE_ADDRESS (ref), DR_BASE_ADDRESS (root), 0))
1337 return false;
1339 /* The initializer is defined outside of the loop, hence its address must be
1340 invariant inside the loop. */
1341 gcc_assert (integer_zerop (DR_STEP (ref)));
1343 /* If the address of the reference is invariant, the initializer must access
1344 exactly the same location. */
1345 if (integer_zerop (DR_STEP (root)))
1346 return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0)
1347 && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0));
1349 /* Verify that this index of REF is equal to the root's index at
1350 the -DISTANCE-th iteration. */
1351 aff_combination_dr_offset (root, &diff);
1352 aff_combination_dr_offset (ref, &base);
1353 aff_combination_scale (&base, -1);
1354 aff_combination_add (&diff, &base);
1356 tree_to_aff_combination_expand (DR_STEP (root), TREE_TYPE (DR_STEP (root)),
1357 &step, &m_cache);
1358 if (!aff_combination_constant_multiple_p (&diff, &step, &off))
1359 return false;
1361 if (maybe_ne (off, distance))
1362 return false;
1364 return true;
1367 /* Finds the looparound phi node of the loop that copies the value of REF,
1368 and if its initial value is correct (equal to the initial value of REF
1369 shifted by one iteration), returns the phi node. Otherwise, NULL is returned. ROOT
1370 is the root of the current chain. */
1372 gphi *
1373 pcom_worker::find_looparound_phi (dref ref, dref root)
1375 tree name, init, init_ref;
1376 gphi *phi = NULL;
1377 gimple *init_stmt;
1378 edge latch = loop_latch_edge (m_loop);
1379 struct data_reference init_dr;
1380 gphi_iterator psi;
1382 if (is_gimple_assign (ref->stmt))
1384 if (DR_IS_READ (ref->ref))
1385 name = gimple_assign_lhs (ref->stmt);
1386 else
1387 name = gimple_assign_rhs1 (ref->stmt);
1389 else
1390 name = PHI_RESULT (ref->stmt);
1391 if (!name)
1392 return NULL;
1394 for (psi = gsi_start_phis (m_loop->header); !gsi_end_p (psi); gsi_next (&psi))
1396 phi = psi.phi ();
1397 if (PHI_ARG_DEF_FROM_EDGE (phi, latch) == name)
1398 break;
1401 if (gsi_end_p (psi))
1402 return NULL;
1404 init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (m_loop));
1405 if (TREE_CODE (init) != SSA_NAME)
1406 return NULL;
1407 init_stmt = SSA_NAME_DEF_STMT (init);
1408 if (gimple_code (init_stmt) != GIMPLE_ASSIGN)
1409 return NULL;
1410 gcc_assert (gimple_assign_lhs (init_stmt) == init);
1412 init_ref = gimple_assign_rhs1 (init_stmt);
1413 if (!REFERENCE_CLASS_P (init_ref)
1414 && !DECL_P (init_ref))
1415 return NULL;
1417 /* Analyze the behavior of INIT_REF with respect to LOOP (innermost
1418 loop enclosing PHI). */
1419 memset (&init_dr, 0, sizeof (struct data_reference));
1420 DR_REF (&init_dr) = init_ref;
1421 DR_STMT (&init_dr) = phi;
1422 if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, m_loop,
1423 init_stmt))
1424 return NULL;
1426 if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref))
1427 return NULL;
1429 return phi;
1432 /* Adds a reference for the looparound copy of REF in PHI to CHAIN. */
1434 static void
1435 insert_looparound_copy (chain_p chain, dref ref, gphi *phi)
1437 dref nw = XCNEW (class dref_d), aref;
1438 unsigned i;
1440 nw->stmt = phi;
1441 nw->distance = ref->distance + 1;
1442 nw->always_accessed = 1;
1444 FOR_EACH_VEC_ELT (chain->refs, i, aref)
1445 if (aref->distance >= nw->distance)
1446 break;
1447 chain->refs.safe_insert (i, nw);
1449 if (nw->distance > chain->length)
1451 chain->length = nw->distance;
1452 chain->has_max_use_after = false;
1456 /* For references in CHAIN that are copied around the loop (created previously
1457 by PRE, or by the user), add the results of such copies to the chain. This
1458 enables us to remove the copies by unrolling, and may need fewer registers
1459 (also, it may allow us to combine chains together). */
1461 void
1462 pcom_worker::add_looparound_copies (chain_p chain)
1464 unsigned i;
1465 dref ref, root = get_chain_root (chain);
1466 gphi *phi;
1468 if (chain->type == CT_STORE_STORE)
1469 return;
1471 FOR_EACH_VEC_ELT (chain->refs, i, ref)
1473 phi = find_looparound_phi (ref, root);
1474 if (!phi)
1475 continue;
1477 bitmap_set_bit (m_looparound_phis, SSA_NAME_VERSION (PHI_RESULT (phi)));
1478 insert_looparound_copy (chain, ref, phi);
1482 /* Find roots of the values and determine distances in the component COMP.
1483 The references are redistributed into chains. */
1485 void
1486 pcom_worker::determine_roots_comp (struct component *comp)
1488 unsigned i;
1489 dref a;
1490 chain_p chain = NULL;
1491 widest_int last_ofs = 0;
1492 enum chain_type type;
1494 /* Invariants are handled specially. */
1495 if (comp->comp_step == RS_INVARIANT)
1497 chain = make_invariant_chain (comp);
1498 m_chains.safe_push (chain);
1499 return;
1502 /* Trivial component. */
1503 if (comp->refs.length () <= 1)
1505 if (comp->refs.length () == 1)
1507 free (comp->refs[0]);
1508 comp->refs.truncate (0);
1510 return;
1513 comp->refs.qsort (order_drefs);
1515 /* For a Store-Store chain, we only support a load if it is dominated by a
1516 store statement in the same iteration of the loop. */
1517 if (comp->eliminate_store_p)
1518 for (a = NULL, i = 0; i < comp->refs.length (); i++)
1520 if (DR_IS_WRITE (comp->refs[i]->ref))
1521 a = comp->refs[i];
1522 else if (a == NULL || a->offset != comp->refs[i]->offset)
1524 /* If there is a load that is not dominated by a store in the
1525 same iteration of the loop, clear the flag so no Store-Store
1526 chain is generated for this component. */
1527 comp->eliminate_store_p = false;
1528 break;
1532 /* Determine roots and create chains for components. */
1533 FOR_EACH_VEC_ELT (comp->refs, i, a)
1535 if (!chain
1536 || (chain->type == CT_LOAD && DR_IS_WRITE (a->ref))
1537 || (!comp->eliminate_store_p && DR_IS_WRITE (a->ref))
1538 || wi::leu_p (MAX_DISTANCE, a->offset - last_ofs))
1540 if (nontrivial_chain_p (chain))
1542 add_looparound_copies (chain);
1543 m_chains.safe_push (chain);
1545 else
1546 release_chain (chain);
1548 /* Determine the type of the chain. If the root reference is a load,
1549 this can only be a CT_LOAD chain; other chains are initialized
1550 to CT_STORE_LOAD and might be promoted to CT_STORE_STORE when a
1551 new reference is added. */
1552 type = DR_IS_READ (a->ref) ? CT_LOAD : CT_STORE_LOAD;
1553 chain = make_rooted_chain (a, type);
1554 last_ofs = a->offset;
1555 continue;
1558 add_ref_to_chain (chain, a);
1561 if (nontrivial_chain_p (chain))
1563 add_looparound_copies (chain);
1564 m_chains.safe_push (chain);
1566 else
1567 release_chain (chain);
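/* For instance, if the sorted references of a component are a load of
   a[i], a load of a[i+1] and a store to a[i+2], the two loads form a
   CT_LOAD chain rooted at a[i], while the store roots a separate
   CT_STORE_LOAD chain, since a write cannot be added to a CT_LOAD
   chain.  Similarly, a reference whose offset is at least MAX_DISTANCE
   past the offset of the current root starts a new chain, which keeps
   the number of values carried around the loop bounded.  */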
1570 /* Find roots of the values and determine distances in components COMPS, and
1571 separate the references into chains. */
1573 void
1574 pcom_worker::determine_roots (struct component *comps)
1576 struct component *comp;
1578 for (comp = comps; comp; comp = comp->next)
1579 determine_roots_comp (comp);
1582 /* Replace the reference in statement STMT with temporary variable
1583 NEW_TREE. If SET is true, NEW_TREE is instead initialized to the value of
1584 the reference in the statement. IN_LHS is true if the reference
1585 is in the lhs of STMT, false if it is in rhs. */
1587 static void
1588 replace_ref_with (gimple *stmt, tree new_tree, bool set, bool in_lhs)
1590 tree val;
1591 gassign *new_stmt;
1592 gimple_stmt_iterator bsi, psi;
1594 if (gimple_code (stmt) == GIMPLE_PHI)
1596 gcc_assert (!in_lhs && !set);
1598 val = PHI_RESULT (stmt);
1599 bsi = gsi_after_labels (gimple_bb (stmt));
1600 psi = gsi_for_stmt (stmt);
1601 remove_phi_node (&psi, false);
1603 /* Turn the phi node into GIMPLE_ASSIGN. */
1604 new_stmt = gimple_build_assign (val, new_tree);
1605 gsi_insert_before (&bsi, new_stmt, GSI_NEW_STMT);
1606 return;
1609 /* Since the reference is of gimple_reg type, it should only
1610 appear as lhs or rhs of modify statement. */
1611 gcc_assert (is_gimple_assign (stmt));
1613 bsi = gsi_for_stmt (stmt);
1615 /* If we do not need to initialize NEW_TREE, just replace the use of OLD. */
1616 if (!set)
1618 gcc_assert (!in_lhs);
1619 gimple_assign_set_rhs_from_tree (&bsi, new_tree);
1620 stmt = gsi_stmt (bsi);
1621 update_stmt (stmt);
1622 return;
1625 if (in_lhs)
1627 /* We have statement
1629 OLD = VAL
1631 If OLD is a memory reference, then VAL is gimple_val, and we transform
1632 this to
1634 OLD = VAL
1635 NEW = VAL
1637 Otherwise, we are replacing a combination chain,
1638 VAL is the expression that performs the combination, and OLD is an
1639 SSA name. In this case, we transform the assignment to
1641 OLD = VAL
1642 NEW = OLD
1646 val = gimple_assign_lhs (stmt);
1647 if (TREE_CODE (val) != SSA_NAME)
1649 val = gimple_assign_rhs1 (stmt);
1650 gcc_assert (gimple_assign_single_p (stmt));
1651 if (TREE_CLOBBER_P (val))
1652 val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (new_tree));
1653 else
1654 gcc_assert (gimple_assign_copy_p (stmt));
1657 else
1659 /* VAL = OLD
1661 is transformed to
1663 VAL = OLD
1664 NEW = VAL */
1666 val = gimple_assign_lhs (stmt);
1669 new_stmt = gimple_build_assign (new_tree, unshare_expr (val));
1670 gsi_insert_after (&bsi, new_stmt, GSI_NEW_STMT);
1673 /* Returns a memory reference to DR in the (NITERS + ITER)-th iteration
1674 of the loop it was analyzed in. Append init stmts to STMTS. */
1676 static tree
1677 ref_at_iteration (data_reference_p dr, int iter,
1678 gimple_seq *stmts, tree niters = NULL_TREE)
1680 tree off = DR_OFFSET (dr);
1681 tree coff = DR_INIT (dr);
1682 tree ref = DR_REF (dr);
1683 enum tree_code ref_code = ERROR_MARK;
1684 tree ref_type = NULL_TREE;
1685 tree ref_op1 = NULL_TREE;
1686 tree ref_op2 = NULL_TREE;
1687 tree new_offset;
1689 if (iter != 0)
1691 new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));
1692 if (TREE_CODE (new_offset) == INTEGER_CST)
1693 coff = size_binop (PLUS_EXPR, coff, new_offset);
1694 else
1695 off = size_binop (PLUS_EXPR, off, new_offset);
1698 if (niters != NULL_TREE)
1700 niters = fold_convert (ssizetype, niters);
1701 new_offset = size_binop (MULT_EXPR, DR_STEP (dr), niters);
1702 if (TREE_CODE (niters) == INTEGER_CST)
1703 coff = size_binop (PLUS_EXPR, coff, new_offset);
1704 else
1705 off = size_binop (PLUS_EXPR, off, new_offset);
1708 /* While data-ref analysis punts on bit offsets it still handles
1709 bitfield accesses at byte boundaries. Cope with that. Note that
1710 if the bitfield object also starts at a byte-boundary we can simply
1711 replicate the COMPONENT_REF, but we have to subtract the component's
1712 byte-offset from the MEM_REF address first.
1713 Otherwise we simply build a BIT_FIELD_REF knowing that the bits
1714 start at offset zero. */
1715 if (TREE_CODE (ref) == COMPONENT_REF
1716 && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
1718 unsigned HOST_WIDE_INT boff;
1719 tree field = TREE_OPERAND (ref, 1);
1720 tree offset = component_ref_field_offset (ref);
1721 ref_type = TREE_TYPE (ref);
1722 boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
1723 /* This can occur in Ada. See the comment in get_bit_range. */
1724 if (boff % BITS_PER_UNIT != 0
1725 || !tree_fits_uhwi_p (offset))
1727 ref_code = BIT_FIELD_REF;
1728 ref_op1 = DECL_SIZE (field);
1729 ref_op2 = bitsize_zero_node;
1731 else
1733 boff >>= LOG2_BITS_PER_UNIT;
1734 boff += tree_to_uhwi (offset);
1735 coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
1736 ref_code = COMPONENT_REF;
1737 ref_op1 = field;
1738 ref_op2 = TREE_OPERAND (ref, 2);
1739 ref = TREE_OPERAND (ref, 0);
1742 tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
1743 addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
1744 is_gimple_mem_ref_addr, NULL_TREE);
1745 tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
1746 tree type = build_aligned_type (TREE_TYPE (ref),
1747 get_object_alignment (ref));
1748 ref = build2 (MEM_REF, type, addr, alias_ptr);
1749 if (ref_type)
1750 ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
1751 return ref;
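/* For instance, for a data reference with DR_STEP 4 (an array of
   4-byte elements) and ITER == -2, the constant offset is decreased
   by 8, yielding the location the reference accessed two iterations
   earlier.  */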
1754 /* Get the initialization expression for the INDEX-th temporary variable
1755 of CHAIN. */
1757 static tree
1758 get_init_expr (chain_p chain, unsigned index)
1760 if (chain->type == CT_COMBINATION)
1762 tree e1 = get_init_expr (chain->ch1, index);
1763 tree e2 = get_init_expr (chain->ch2, index);
1765 return fold_build2 (chain->op, chain->rslt_type, e1, e2);
1767 else
1768 return chain->inits[index];
1771 /* Returns a new temporary variable used for the I-th variable carrying
1772 the value of REF. The variable's uid is marked in TMP_VARS. */
1774 static tree
1775 predcom_tmp_var (tree ref, unsigned i, bitmap tmp_vars)
1777 tree type = TREE_TYPE (ref);
1778 /* We never access the components of the temporary variable in predictive
1779 commoning. */
1780 tree var = create_tmp_reg (type, get_lsm_tmp_name (ref, i));
1781 bitmap_set_bit (tmp_vars, DECL_UID (var));
1782 return var;
1785 /* Creates the variables for CHAIN, as well as phi nodes for them and
1786 initialization on entry to LOOP. Uids of the newly created
1787 temporary variables are marked in TMP_VARS. */
1789 static void
1790 initialize_root_vars (class loop *loop, chain_p chain, bitmap tmp_vars)
1792 unsigned i;
1793 unsigned n = chain->length;
1794 dref root = get_chain_root (chain);
1795 bool reuse_first = !chain->has_max_use_after;
1796 tree ref, init, var, next;
1797 gphi *phi;
1798 gimple_seq stmts;
1799 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);
1801 /* If N == 0, then all the references are within a single iteration. And
1802 since this is a nonempty chain, reuse_first cannot be true. */
1803 gcc_assert (n > 0 || !reuse_first);
1805 chain->vars.create (n + 1);
1807 if (chain->type == CT_COMBINATION)
1808 ref = gimple_assign_lhs (root->stmt);
1809 else
1810 ref = DR_REF (root->ref);
1812 for (i = 0; i < n + (reuse_first ? 0 : 1); i++)
1814 var = predcom_tmp_var (ref, i, tmp_vars);
1815 chain->vars.quick_push (var);
1817 if (reuse_first)
1818 chain->vars.quick_push (chain->vars[0]);
1820 FOR_EACH_VEC_ELT (chain->vars, i, var)
1821 chain->vars[i] = make_ssa_name (var);
1823 for (i = 0; i < n; i++)
1825 var = chain->vars[i];
1826 next = chain->vars[i + 1];
1827 init = get_init_expr (chain, i);
1829 init = force_gimple_operand (init, &stmts, true, NULL_TREE);
1830 if (stmts)
1831 gsi_insert_seq_on_edge_immediate (entry, stmts);
1833 phi = create_phi_node (var, loop->header);
1834 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
1835 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
1839 /* For inter-iteration store elimination CHAIN in LOOP, returns true if
1840 all stores to be eliminated store loop invariant values into memory.
1841 In this case, we can use these invariant values directly after LOOP. */
1843 static bool
1844 is_inv_store_elimination_chain (class loop *loop, chain_p chain)
1846 if (chain->length == 0 || chain->type != CT_STORE_STORE)
1847 return false;
1849 gcc_assert (!chain->has_max_use_after);
1851 /* If the loop iterates an unknown number of times, or fewer times than
1852 chain->length, we still need to set up the root variable and propagate it with a PHI node. */
1853 tree niters = number_of_latch_executions (loop);
1854 if (TREE_CODE (niters) != INTEGER_CST
1855 || wi::leu_p (wi::to_wide (niters), chain->length))
1856 return false;
1858 /* Check that the stores in the chain to be eliminated only store loop
1859 invariant values. */
1860 for (unsigned i = 0; i < chain->length; i++)
1862 dref a = get_chain_last_write_at (chain, i);
1863 if (a == NULL)
1864 continue;
1866 gimple *def_stmt, *stmt = a->stmt;
1867 if (!gimple_assign_single_p (stmt))
1868 return false;
1870 tree val = gimple_assign_rhs1 (stmt);
1871 if (TREE_CLOBBER_P (val))
1872 return false;
1874 if (CONSTANT_CLASS_P (val))
1875 continue;
1877 if (TREE_CODE (val) != SSA_NAME)
1878 return false;
1880 def_stmt = SSA_NAME_DEF_STMT (val);
1881 if (gimple_nop_p (def_stmt))
1882 continue;
1884 if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
1885 return false;
1887 return true;
1890 /* Creates root variables for store elimination CHAIN in which stores for
1891 elimination only store loop invariant values. In this case, we neither
1892 need to load root variables before the loop nor propagate them with PHI nodes. */
1894 static void
1895 initialize_root_vars_store_elim_1 (chain_p chain)
1897 tree var;
1898 unsigned i, n = chain->length;
1900 chain->vars.create (n);
1901 chain->vars.safe_grow_cleared (n, true);
1903 /* Initialize root value for eliminated stores at each distance. */
1904 for (i = 0; i < n; i++)
1906 dref a = get_chain_last_write_at (chain, i);
1907 if (a == NULL)
1908 continue;
1910 var = gimple_assign_rhs1 (a->stmt);
1911 chain->vars[a->distance] = var;
1914 /* We don't propagate values with PHI nodes, so manually propagate value
1915 to the bubble (empty) positions. */
1916 var = chain->vars[0];
1917 for (i = 1; i < n; i++)
1919 if (chain->vars[i] != NULL_TREE)
1921 var = chain->vars[i];
1922 continue;
1924 chain->vars[i] = var;
1927 /* Reverse the vector. */
1928 for (i = 0; i < n / 2; i++)
1929 std::swap (chain->vars[i], chain->vars[n - i - 1]);
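/* For instance, with CHAIN->length == 3 and last writes recorded only
   at distances 0 and 2, the vector starts as {v0, NULL, v2}; the
   forward propagation above fills the hole, giving {v0, v0, v2}, and
   the final reversal yields {v2, v0, v0}.  */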
1932 /* Creates root variables for store elimination CHAIN in which stores for
1933 elimination store loop variant values. In this case, we may need to
1934 load root variables before LOOP and propagate them with PHI nodes. Uids
1935 of the newly created root variables are marked in TMP_VARS. */
1937 static void
1938 initialize_root_vars_store_elim_2 (class loop *loop,
1939 chain_p chain, bitmap tmp_vars)
1941 unsigned i, n = chain->length;
1942 tree ref, init, var, next, val, phi_result;
1943 gimple *stmt;
1944 gimple_seq stmts;
1946 chain->vars.create (n);
1948 ref = DR_REF (get_chain_root (chain)->ref);
1949 for (i = 0; i < n; i++)
1951 var = predcom_tmp_var (ref, i, tmp_vars);
1952 chain->vars.quick_push (var);
1955 FOR_EACH_VEC_ELT (chain->vars, i, var)
1956 chain->vars[i] = make_ssa_name (var);
1958 /* Root values are either the rhs operands of stores to be eliminated, or
1959 loaded from memory before the loop. */
1960 auto_vec<tree> vtemps;
1961 vtemps.safe_grow_cleared (n, true);
1962 for (i = 0; i < n; i++)
1964 init = get_init_expr (chain, i);
1965 if (init == NULL_TREE)
1967 /* Root value is rhs operand of the store to be eliminated if
1968 it isn't loaded from memory before the loop. */
1969 dref a = get_chain_last_write_at (chain, i);
1970 val = gimple_assign_rhs1 (a->stmt);
1971 if (TREE_CLOBBER_P (val))
1973 val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (var));
1974 gimple_assign_set_rhs1 (a->stmt, val);
1977 vtemps[n - i - 1] = val;
1979 else
1981 edge latch = loop_latch_edge (loop);
1982 edge entry = loop_preheader_edge (loop);
1984 /* The root value is loaded from memory before the loop; we also need
1985 to add PHI nodes to propagate the value across iterations. */
1986 init = force_gimple_operand (init, &stmts, true, NULL_TREE);
1987 if (stmts)
1988 gsi_insert_seq_on_edge_immediate (entry, stmts);
1990 next = chain->vars[n - i];
1991 phi_result = copy_ssa_name (next);
1992 gphi *phi = create_phi_node (phi_result, loop->header);
1993 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
1994 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
1995 vtemps[n - i - 1] = phi_result;
1999 /* Find the insertion position. */
2000 dref last = get_chain_root (chain);
2001 for (i = 0; i < chain->refs.length (); i++)
2003 if (chain->refs[i]->pos > last->pos)
2004 last = chain->refs[i];
2007 gimple_stmt_iterator gsi = gsi_for_stmt (last->stmt);
2009 /* Insert statements copying root value to root variable. */
2010 for (i = 0; i < n; i++)
2012 var = chain->vars[i];
2013 val = vtemps[i];
2014 stmt = gimple_build_assign (var, val);
2015 gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
2019 /* Generates stores for CHAIN's eliminated stores in LOOP's last
2020 (CHAIN->length - 1) iterations. */
2022 static void
2023 finalize_eliminated_stores (class loop *loop, chain_p chain)
2025 unsigned i, n = chain->length;
2027 for (i = 0; i < n; i++)
2029 tree var = chain->vars[i];
2030 tree fini = chain->finis[n - i - 1];
2031 gimple *stmt = gimple_build_assign (fini, var);
2033 gimple_seq_add_stmt_without_update (&chain->fini_seq, stmt);
2036 if (chain->fini_seq)
2038 gsi_insert_seq_on_edge_immediate (single_exit (loop), chain->fini_seq);
2039 chain->fini_seq = NULL;
2043 /* Initializes a variable for load motion for ROOT and prepares phi nodes and
2044 initialization on entry to LOOP if necessary. The ssa name for the variable
2045 is stored in VARS. If WRITTEN is true, a phi node to copy its value
2046 around the loop is also created. Uid of the newly created temporary variable
2047 is marked in TMP_VARS. INITS is the list containing the (single)
2048 initializer. */
2050 static void
2051 initialize_root_vars_lm (class loop *loop, dref root, bool written,
2052 vec<tree> *vars, const vec<tree> &inits,
2053 bitmap tmp_vars)
2055 unsigned i;
2056 tree ref = DR_REF (root->ref), init, var, next;
2057 gimple_seq stmts;
2058 gphi *phi;
2059 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);
2061 /* Find the initializer for the variable, and check that it cannot
2062 trap. */
2063 init = inits[0];
2065 vars->create (written ? 2 : 1);
2066 var = predcom_tmp_var (ref, 0, tmp_vars);
2067 vars->quick_push (var);
2068 if (written)
2069 vars->quick_push ((*vars)[0]);
2071 FOR_EACH_VEC_ELT (*vars, i, var)
2072 (*vars)[i] = make_ssa_name (var);
2074 var = (*vars)[0];
2076 init = force_gimple_operand (init, &stmts, written, NULL_TREE);
2077 if (stmts)
2078 gsi_insert_seq_on_edge_immediate (entry, stmts);
2080 if (written)
2082 next = (*vars)[1];
2083 phi = create_phi_node (var, loop->header);
2084 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
2085 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
2087 else
2089 gassign *init_stmt = gimple_build_assign (var, init);
2090 gsi_insert_on_edge_immediate (entry, init_stmt);
2095 /* Execute load motion for references in chain CHAIN. Uids of the newly
2096 created temporary variables are marked in TMP_VARS. */
2098 static void
2099 execute_load_motion (class loop *loop, chain_p chain, bitmap tmp_vars)
2101 auto_vec<tree> vars;
2102 dref a;
2103 unsigned n_writes = 0, ridx, i;
2104 tree var;
2106 gcc_assert (chain->type == CT_INVARIANT);
2107 gcc_assert (!chain->combined);
2108 FOR_EACH_VEC_ELT (chain->refs, i, a)
2109 if (DR_IS_WRITE (a->ref))
2110 n_writes++;
2112 /* If there are no reads in the loop, there is nothing to do. */
2113 if (n_writes == chain->refs.length ())
2114 return;
2116 initialize_root_vars_lm (loop, get_chain_root (chain), n_writes > 0,
2117 &vars, chain->inits, tmp_vars);
2119 ridx = 0;
2120 FOR_EACH_VEC_ELT (chain->refs, i, a)
2122 bool is_read = DR_IS_READ (a->ref);
2124 if (DR_IS_WRITE (a->ref))
2126 n_writes--;
2127 if (n_writes)
2129 var = vars[0];
2130 var = make_ssa_name (SSA_NAME_VAR (var));
2131 vars[0] = var;
2133 else
2134 ridx = 1;
2137 replace_ref_with (a->stmt, vars[ridx],
2138 !is_read, !is_read);
2142 /* Returns the single statement in which NAME is used, except for
2143 the looparound phi nodes contained in one of the chains. If there is no
2144 such statement, or if there are multiple such statements, NULL is returned. */
2146 gimple *
2147 pcom_worker::single_nonlooparound_use (tree name)
2149 use_operand_p use;
2150 imm_use_iterator it;
2151 gimple *stmt, *ret = NULL;
2153 FOR_EACH_IMM_USE_FAST (use, it, name)
2155 stmt = USE_STMT (use);
2157 if (gimple_code (stmt) == GIMPLE_PHI)
2159 /* Ignore uses in looparound phi nodes. Uses in other phi nodes
2160 could not be processed anyway, so just fail for them. */
2161 if (bitmap_bit_p (m_looparound_phis,
2162 SSA_NAME_VERSION (PHI_RESULT (stmt))))
2163 continue;
2165 return NULL;
2167 else if (is_gimple_debug (stmt))
2168 continue;
2169 else if (ret != NULL)
2170 return NULL;
2171 else
2172 ret = stmt;
2175 return ret;
2178 /* Remove statement STMT, as well as the chain of assignments in which it is
2179 used. */
2181 void
2182 pcom_worker::remove_stmt (gimple *stmt)
2184 tree name;
2185 gimple *next;
2186 gimple_stmt_iterator psi;
2188 if (gimple_code (stmt) == GIMPLE_PHI)
2190 name = PHI_RESULT (stmt);
2191 next = single_nonlooparound_use (name);
2192 reset_debug_uses (stmt);
2193 psi = gsi_for_stmt (stmt);
2194 remove_phi_node (&psi, true);
2196 if (!next
2197 || !gimple_assign_ssa_name_copy_p (next)
2198 || gimple_assign_rhs1 (next) != name)
2199 return;
2201 stmt = next;
2204 while (1)
2206 gimple_stmt_iterator bsi;
2208 bsi = gsi_for_stmt (stmt);
2210 name = gimple_assign_lhs (stmt);
2211 if (TREE_CODE (name) == SSA_NAME)
2213 next = single_nonlooparound_use (name);
2214 reset_debug_uses (stmt);
2216 else
2218 /* This is a store to be eliminated. */
2219 gcc_assert (gimple_vdef (stmt) != NULL);
2220 next = NULL;
2223 unlink_stmt_vdef (stmt);
2224 gsi_remove (&bsi, true);
2225 release_defs (stmt);
2227 if (!next
2228 || !gimple_assign_ssa_name_copy_p (next)
2229 || gimple_assign_rhs1 (next) != name)
2230 return;
2232 stmt = next;
2236 /* Perform the predictive commoning optimization for a chain CHAIN.
2237 Uids of the newly created temporary variables are marked in TMP_VARS. */
2239 void
2240 pcom_worker::execute_pred_commoning_chain (chain_p chain,
2241 bitmap tmp_vars)
2243 unsigned i;
2244 dref a;
2245 tree var;
2246 bool in_lhs;
2248 if (chain->combined)
2250 /* For combined chains, just remove the statements that are used to
2251 compute the values of the expression (except for the root one).
2252 We delay this until after all chains are processed. */
2254 else if (chain->type == CT_STORE_STORE)
2256 if (chain->length > 0)
2258 if (chain->inv_store_elimination)
2260 /* If dead stores in this chain only store loop invariant
2261 values, we can simply record the invariant value and use
2262 it directly after loop. */
2263 initialize_root_vars_store_elim_1 (chain);
2265 else
2267 /* If dead stores in this chain store loop variant values,
2268 we need to set up the variables by loading from memory
2269 before loop and propagating it with PHI nodes. */
2270 initialize_root_vars_store_elim_2 (m_loop, chain, tmp_vars);
2273 /* For inter-iteration store elimination chain, stores at each
2274 distance in the loop's last (chain->length - 1) iterations can't
2275 be eliminated, because there is no following killing store.
2276 We need to generate these stores after loop. */
2277 finalize_eliminated_stores (m_loop, chain);
2280 bool last_store_p = true;
2281 for (i = chain->refs.length (); i > 0; i--)
2283 a = chain->refs[i - 1];
2284 /* Preserve the last store of the chain. Eliminate other stores
2285 which are killed by the last one. */
2286 if (DR_IS_WRITE (a->ref))
2288 if (last_store_p)
2289 last_store_p = false;
2290 else
2291 remove_stmt (a->stmt);
2293 continue;
2296 /* Any load in a store-store chain must be dominated by a previous
2297 store; we replace the load reference with the rhs of that store. */
2298 dref b = get_chain_last_write_before_load (chain, i - 1);
2299 gcc_assert (b != NULL);
2300 var = gimple_assign_rhs1 (b->stmt);
2301 replace_ref_with (a->stmt, var, false, false);
2304 else
2306 /* For non-combined chains, set up the variables that hold its value. */
2307 initialize_root_vars (m_loop, chain, tmp_vars);
2308 a = get_chain_root (chain);
2309 in_lhs = (chain->type == CT_STORE_LOAD
2310 || chain->type == CT_COMBINATION);
2311 replace_ref_with (a->stmt, chain->vars[chain->length], true, in_lhs);
2313 /* Replace the uses of the original references by these variables. */
2314 for (i = 1; chain->refs.iterate (i, &a); i++)
2316 var = chain->vars[chain->length - a->distance];
2317 replace_ref_with (a->stmt, var, false, false);
2322 /* Determines the unroll factor necessary to remove as many temporary variable
2323 copies as possible. CHAINS is the list of chains that will be
2324 optimized. */
2326 static unsigned
2327 determine_unroll_factor (const vec<chain_p> &chains)
2329 chain_p chain;
2330 unsigned factor = 1, af, nfactor, i;
2331 unsigned max = param_max_unroll_times;
2333 FOR_EACH_VEC_ELT (chains, i, chain)
2335 if (chain->type == CT_INVARIANT)
2336 continue;
2337 /* For now we can't handle unrolling when eliminating stores. */
2338 else if (chain->type == CT_STORE_STORE)
2339 return 1;
2341 if (chain->combined)
2343 /* For combined chains, we can't handle unrolling if we replace
2344 looparound PHIs. */
2345 dref a;
2346 unsigned j;
2347 for (j = 1; chain->refs.iterate (j, &a); j++)
2348 if (gimple_code (a->stmt) == GIMPLE_PHI)
2349 return 1;
2350 continue;
2353 /* The best unroll factor for this chain is equal to the number of
2354 temporary variables that we create for it. */
2355 af = chain->length;
2356 if (chain->has_max_use_after)
2357 af++;
2359 nfactor = factor * af / gcd (factor, af);
2360 if (nfactor <= max)
2361 factor = nfactor;
2364 return factor;
2367 /* Perform the predictive commoning optimization for chains.
2368 Uids of the newly created temporary variables are marked in TMP_VARS. */
2370 void
2371 pcom_worker::execute_pred_commoning (bitmap tmp_vars)
2373 chain_p chain;
2374 unsigned i;
2376 FOR_EACH_VEC_ELT (m_chains, i, chain)
2378 if (chain->type == CT_INVARIANT)
2379 execute_load_motion (m_loop, chain, tmp_vars);
2380 else
2381 execute_pred_commoning_chain (chain, tmp_vars);
2384 FOR_EACH_VEC_ELT (m_chains, i, chain)
2386 if (chain->type == CT_INVARIANT)
2388 else if (chain->combined)
2390 /* For combined chains, just remove the statements that are used to
2391 compute the values of the expression (except for the root one). */
2392 dref a;
2393 unsigned j;
2394 for (j = 1; chain->refs.iterate (j, &a); j++)
2395 remove_stmt (a->stmt);
2400 /* For each reference in CHAINS, if its defining statement is
2401 a phi node, record the ssa name that is defined by it. */
2403 static void
2404 replace_phis_by_defined_names (vec<chain_p> &chains)
2406 chain_p chain;
2407 dref a;
2408 unsigned i, j;
2410 FOR_EACH_VEC_ELT (chains, i, chain)
2411 FOR_EACH_VEC_ELT (chain->refs, j, a)
2413 if (gimple_code (a->stmt) == GIMPLE_PHI)
2415 a->name_defined_by_phi = PHI_RESULT (a->stmt);
2416 a->stmt = NULL;
2421 /* For each reference in CHAINS, if name_defined_by_phi is not
2422 NULL, use it to set the stmt field. */
2424 static void
2425 replace_names_by_phis (vec<chain_p> chains)
2427 chain_p chain;
2428 dref a;
2429 unsigned i, j;
2431 FOR_EACH_VEC_ELT (chains, i, chain)
2432 FOR_EACH_VEC_ELT (chain->refs, j, a)
2433 if (a->stmt == NULL)
2435 a->stmt = SSA_NAME_DEF_STMT (a->name_defined_by_phi);
2436 gcc_assert (gimple_code (a->stmt) == GIMPLE_PHI);
2437 a->name_defined_by_phi = NULL_TREE;
2441 /* Wrapper over execute_pred_commoning, to pass it as a callback
2442 to tree_transform_and_unroll_loop. */
2444 struct epcc_data
2446 vec<chain_p> chains;
2447 bitmap tmp_vars;
2448 pcom_worker *worker;
2451 static void
2452 execute_pred_commoning_cbck (class loop *loop ATTRIBUTE_UNUSED, void *data)
2454 struct epcc_data *const dta = (struct epcc_data *) data;
2455 pcom_worker *worker = dta->worker;
2457 /* Restore phi nodes that were replaced by ssa names before
2458 tree_transform_and_unroll_loop (see detailed description in
2459 tree_predictive_commoning_loop). */
2460 replace_names_by_phis (dta->chains);
2461 worker->execute_pred_commoning (dta->tmp_vars);
2464 /* Base NAME and all the names in the chain of phi nodes that use it
2465 on variable VAR. The phi nodes are recognized by being in the copies of
2466 the header of the LOOP. */
2468 static void
2469 base_names_in_chain_on (class loop *loop, tree name, tree var)
2471 gimple *stmt, *phi;
2472 imm_use_iterator iter;
2474 replace_ssa_name_symbol (name, var);
2476 while (1)
2478 phi = NULL;
2479 FOR_EACH_IMM_USE_STMT (stmt, iter, name)
2481 if (gimple_code (stmt) == GIMPLE_PHI
2482 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
2484 phi = stmt;
2485 break;
2488 if (!phi)
2489 return;
2491 name = PHI_RESULT (phi);
2492 replace_ssa_name_symbol (name, var);
2496 /* Given an unrolled LOOP after predictive commoning, remove the
2497 register copies arising from phi nodes by changing the base
2498 variables of SSA names. TMP_VARS is the set of the temporary variables
2499 for which we want to perform this. */
2501 static void
2502 eliminate_temp_copies (class loop *loop, bitmap tmp_vars)
2504 edge e;
2505 gphi *phi;
2506 gimple *stmt;
2507 tree name, use, var;
2508 gphi_iterator psi;
2510 e = loop_latch_edge (loop);
2511 for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
2513 phi = psi.phi ();
2514 name = PHI_RESULT (phi);
2515 var = SSA_NAME_VAR (name);
2516 if (!var || !bitmap_bit_p (tmp_vars, DECL_UID (var)))
2517 continue;
2518 use = PHI_ARG_DEF_FROM_EDGE (phi, e);
2519 gcc_assert (TREE_CODE (use) == SSA_NAME);
2521 /* Base all the ssa names in the ud and du chain of NAME on VAR. */
2522 stmt = SSA_NAME_DEF_STMT (use);
2523 while (gimple_code (stmt) == GIMPLE_PHI
2524 /* In case we could not unroll the loop enough to eliminate
2525 all copies, we may reach the loop header before the defining
2526 statement (in that case, some register copies will be present
2527 in loop latch in the final code, corresponding to the newly
2528 created looparound phi nodes). */
2529 && gimple_bb (stmt) != loop->header)
2531 gcc_assert (single_pred_p (gimple_bb (stmt)));
2532 use = PHI_ARG_DEF (stmt, 0);
2533 stmt = SSA_NAME_DEF_STMT (use);
2536 base_names_in_chain_on (loop, use, var);
2540 /* Returns true if CHAIN is suitable to be combined. */
2542 static bool
2543 chain_can_be_combined_p (chain_p chain)
2545 return (!chain->combined
2546 && (chain->type == CT_LOAD || chain->type == CT_COMBINATION));
2549 /* Returns the modify statement that uses NAME. Skips over assignment
2550 statements; NAME is replaced with the actual name used in the returned
2551 statement. */
2553 gimple *
2554 pcom_worker::find_use_stmt (tree *name)
2556 gimple *stmt;
2557 tree rhs, lhs;
2559 /* Skip over assignments. */
2560 while (1)
2562 stmt = single_nonlooparound_use (*name);
2563 if (!stmt)
2564 return NULL;
2566 if (gimple_code (stmt) != GIMPLE_ASSIGN)
2567 return NULL;
2569 lhs = gimple_assign_lhs (stmt);
2570 if (TREE_CODE (lhs) != SSA_NAME)
2571 return NULL;
2573 if (gimple_assign_copy_p (stmt))
2575 rhs = gimple_assign_rhs1 (stmt);
2576 if (rhs != *name)
2577 return NULL;
2579 *name = lhs;
2581 else if (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
2582 == GIMPLE_BINARY_RHS)
2583 return stmt;
2584 else
2585 return NULL;
2589 /* Returns true if we may perform reassociation for operation CODE in TYPE. */
2591 static bool
2592 may_reassociate_p (tree type, enum tree_code code)
2594 if (FLOAT_TYPE_P (type)
2595 && !flag_unsafe_math_optimizations)
2596 return false;
2598 return (commutative_tree_code (code)
2599 && associative_tree_code (code));
2602 /* If the operation used in STMT is associative and commutative, go through the
2603 tree of the same operations and return its root. The distance to the root
2604 is stored in DISTANCE. */
2606 gimple *
2607 pcom_worker::find_associative_operation_root (gimple *stmt, unsigned *distance)
2609 tree lhs;
2610 gimple *next;
2611 enum tree_code code = gimple_assign_rhs_code (stmt);
2612 tree type = TREE_TYPE (gimple_assign_lhs (stmt));
2613 unsigned dist = 0;
2615 if (!may_reassociate_p (type, code))
2616 return NULL;
2618 while (1)
2620 lhs = gimple_assign_lhs (stmt);
2621 gcc_assert (TREE_CODE (lhs) == SSA_NAME);
2623 next = find_use_stmt (&lhs);
2624 if (!next
2625 || gimple_assign_rhs_code (next) != code)
2626 break;
2628 stmt = next;
2629 dist++;
2632 if (distance)
2633 *distance = dist;
2634 return stmt;
2637 /* Returns the common statement in which NAME1 and NAME2 have a use. If there
2638 is no such statement, returns NULL_TREE. In case the operation used on
2639 NAME1 and NAME2 is associative and commutative, returns the root of the
2640 tree formed by this operation instead of the statement that uses NAME1 or
2641 NAME2. */
2643 gimple *
2644 pcom_worker::find_common_use_stmt (tree *name1, tree *name2)
2646 gimple *stmt1, *stmt2;
2648 stmt1 = find_use_stmt (name1);
2649 if (!stmt1)
2650 return NULL;
2652 stmt2 = find_use_stmt (name2);
2653 if (!stmt2)
2654 return NULL;
2656 if (stmt1 == stmt2)
2657 return stmt1;
2659 stmt1 = find_associative_operation_root (stmt1, NULL);
2660 if (!stmt1)
2661 return NULL;
2662 stmt2 = find_associative_operation_root (stmt2, NULL);
2663 if (!stmt2)
2664 return NULL;
2666 return (stmt1 == stmt2 ? stmt1 : NULL);
2669 /* Checks whether R1 and R2 are combined together using CODE, with the result
2670 in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order R2 CODE R1
2671 if it is true. If CODE is ERROR_MARK, set these values instead. */
2673 bool
2674 pcom_worker::combinable_refs_p (dref r1, dref r2,
2675 enum tree_code *code, bool *swap, tree *rslt_type)
2677 enum tree_code acode;
2678 bool aswap;
2679 tree atype;
2680 tree name1, name2;
2681 gimple *stmt;
2683 name1 = name_for_ref (r1);
2684 name2 = name_for_ref (r2);
2685 gcc_assert (name1 != NULL_TREE && name2 != NULL_TREE);
2687 stmt = find_common_use_stmt (&name1, &name2);
2689 if (!stmt
2690 /* A simple post-dominance check - make sure the combination
2691 is executed under the same condition as the references. */
2692 || (gimple_bb (stmt) != gimple_bb (r1->stmt)
2693 && gimple_bb (stmt) != gimple_bb (r2->stmt)))
2694 return false;
2696 acode = gimple_assign_rhs_code (stmt);
2697 aswap = (!commutative_tree_code (acode)
2698 && gimple_assign_rhs1 (stmt) != name1);
2699 atype = TREE_TYPE (gimple_assign_lhs (stmt));
2701 if (*code == ERROR_MARK)
2703 *code = acode;
2704 *swap = aswap;
2705 *rslt_type = atype;
2706 return true;
2709 return (*code == acode
2710 && *swap == aswap
2711 && *rslt_type == atype);
2714 /* Remove OP from the operation on rhs of STMT, and replace STMT with
2715 an assignment of the remaining operand. */
2717 static void
2718 remove_name_from_operation (gimple *stmt, tree op)
2720 tree other_op;
2721 gimple_stmt_iterator si;
2723 gcc_assert (is_gimple_assign (stmt));
2725 if (gimple_assign_rhs1 (stmt) == op)
2726 other_op = gimple_assign_rhs2 (stmt);
2727 else
2728 other_op = gimple_assign_rhs1 (stmt);
2730 si = gsi_for_stmt (stmt);
2731 gimple_assign_set_rhs_from_tree (&si, other_op);
2733 /* We should not have reallocated STMT. */
2734 gcc_assert (gsi_stmt (si) == stmt);
2736 update_stmt (stmt);
2739 /* Reassociates the expression in which NAME1 and NAME2 are used so that they
2740 are combined in a single statement, and returns this statement. */
2742 gimple *
2743 pcom_worker::reassociate_to_the_same_stmt (tree name1, tree name2)
2745 gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2;
2746 gassign *new_stmt, *tmp_stmt;
2747 tree new_name, tmp_name, var, r1, r2;
2748 unsigned dist1, dist2;
2749 enum tree_code code;
2750 tree type = TREE_TYPE (name1);
2751 gimple_stmt_iterator bsi;
2753 stmt1 = find_use_stmt (&name1);
2754 stmt2 = find_use_stmt (&name2);
2755 root1 = find_associative_operation_root (stmt1, &dist1);
2756 root2 = find_associative_operation_root (stmt2, &dist2);
2757 code = gimple_assign_rhs_code (stmt1);
2759 gcc_assert (root1 && root2 && root1 == root2
2760 && code == gimple_assign_rhs_code (stmt2));
2762 /* Find the root of the nearest expression in which both NAME1 and NAME2
2763 are used. */
2764 r1 = name1;
2765 s1 = stmt1;
2766 r2 = name2;
2767 s2 = stmt2;
2769 while (dist1 > dist2)
2771 s1 = find_use_stmt (&r1);
2772 r1 = gimple_assign_lhs (s1);
2773 dist1--;
2775 while (dist2 > dist1)
2777 s2 = find_use_stmt (&r2);
2778 r2 = gimple_assign_lhs (s2);
2779 dist2--;
2782 while (s1 != s2)
2784 s1 = find_use_stmt (&r1);
2785 r1 = gimple_assign_lhs (s1);
2786 s2 = find_use_stmt (&r2);
2787 r2 = gimple_assign_lhs (s2);
2790 /* Remove NAME1 and NAME2 from the statements in which they are
2791 currently used. */
2792 remove_name_from_operation (stmt1, name1);
2793 remove_name_from_operation (stmt2, name2);
2795 /* Insert the new statement combining NAME1 and NAME2 before S1, and
2796 combine it with the rhs of S1. */
2797 var = create_tmp_reg (type, "predreastmp");
2798 new_name = make_ssa_name (var);
2799 new_stmt = gimple_build_assign (new_name, code, name1, name2);
2801 var = create_tmp_reg (type, "predreastmp");
2802 tmp_name = make_ssa_name (var);
2804 /* The rhs of S1 may now be either a binary expression with operation
2805 CODE, or a gimple_val (in case stmt1 == s1 or stmt2 == s1,
2806 so that name1 or name2 was removed from it). */
2807 tmp_stmt = gimple_build_assign (tmp_name, gimple_assign_rhs_code (s1),
2808 gimple_assign_rhs1 (s1),
2809 gimple_assign_rhs2 (s1));
2811 bsi = gsi_for_stmt (s1);
2812 gimple_assign_set_rhs_with_ops (&bsi, code, new_name, tmp_name);
2813 s1 = gsi_stmt (bsi);
2814 update_stmt (s1);
2816 gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT);
2817 gsi_insert_before (&bsi, tmp_stmt, GSI_SAME_STMT);
2819 return new_stmt;
2822 /* Returns the statement that combines references R1 and R2. In case R1
2823 and R2 are not used in the same statement, but they are used with an
2824 associative and commutative operation in the same expression, reassociate
2825 the expression so that they are used in the same statement. */
2827 gimple *
2828 pcom_worker::stmt_combining_refs (dref r1, dref r2)
2830 gimple *stmt1, *stmt2;
2831 tree name1 = name_for_ref (r1);
2832 tree name2 = name_for_ref (r2);
2834 stmt1 = find_use_stmt (&name1);
2835 stmt2 = find_use_stmt (&name2);
2836 if (stmt1 == stmt2)
2837 return stmt1;
2839 return reassociate_to_the_same_stmt (name1, name2);
2842 /* Tries to combine chains CH1 and CH2 together. If this succeeds, the
2843 description of the new chain is returned, otherwise we return NULL. */
2845 chain_p
2846 pcom_worker::combine_chains (chain_p ch1, chain_p ch2)
2848 dref r1, r2, nw;
2849 enum tree_code op = ERROR_MARK;
2850 bool swap = false;
2851 chain_p new_chain;
2852 unsigned i;
2853 tree rslt_type = NULL_TREE;
2855 if (ch1 == ch2)
2856 return NULL;
2857 if (ch1->length != ch2->length)
2858 return NULL;
2860 if (ch1->refs.length () != ch2->refs.length ())
2861 return NULL;
2863 for (i = 0; (ch1->refs.iterate (i, &r1)
2864 && ch2->refs.iterate (i, &r2)); i++)
2866 if (r1->distance != r2->distance)
2867 return NULL;
2869 if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type))
2870 return NULL;
2873 if (swap)
2874 std::swap (ch1, ch2);
2876 new_chain = XCNEW (struct chain);
2877 new_chain->type = CT_COMBINATION;
2878 new_chain->op = op;
2879 new_chain->ch1 = ch1;
2880 new_chain->ch2 = ch2;
2881 new_chain->rslt_type = rslt_type;
2882 new_chain->length = ch1->length;
2884 for (i = 0; (ch1->refs.iterate (i, &r1)
2885 && ch2->refs.iterate (i, &r2)); i++)
2887 nw = XCNEW (class dref_d);
2888 nw->stmt = stmt_combining_refs (r1, r2);
2889 nw->distance = r1->distance;
2891 new_chain->refs.safe_push (nw);
2894 ch1->combined = true;
2895 ch2->combined = true;
2896 return new_chain;
2899 /* Recursively update the position information of all offspring chains to
2900 match the ROOT chain's position information. */
2902 static void
2903 update_pos_for_combined_chains (chain_p root)
2905 chain_p ch1 = root->ch1, ch2 = root->ch2;
2906 dref ref, ref1, ref2;
2907 for (unsigned j = 0; (root->refs.iterate (j, &ref)
2908 && ch1->refs.iterate (j, &ref1)
2909 && ch2->refs.iterate (j, &ref2)); ++j)
2910 ref1->pos = ref2->pos = ref->pos;
2912 if (ch1->type == CT_COMBINATION)
2913 update_pos_for_combined_chains (ch1);
2914 if (ch2->type == CT_COMBINATION)
2915 update_pos_for_combined_chains (ch2);
2918 /* Returns true if statement S1 dominates statement S2. */
2920 static bool
2921 pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2)
2923 basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
2925 if (!bb1 || s1 == s2)
2926 return true;
2928 if (bb1 == bb2)
2929 return gimple_uid (s1) < gimple_uid (s2);
2931 return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
2934 /* Try to combine the chains. */
2936 void
2937 pcom_worker::try_combine_chains ()
2939 unsigned i, j;
2940 chain_p ch1, ch2, cch;
2941 auto_vec<chain_p> worklist;
2942 bool combined_p = false;
2944 FOR_EACH_VEC_ELT (m_chains, i, ch1)
2945 if (chain_can_be_combined_p (ch1))
2946 worklist.safe_push (ch1);
2948 while (!worklist.is_empty ())
2950 ch1 = worklist.pop ();
2951 if (!chain_can_be_combined_p (ch1))
2952 continue;
2954 FOR_EACH_VEC_ELT (m_chains, j, ch2)
2956 if (!chain_can_be_combined_p (ch2))
2957 continue;
2959 cch = combine_chains (ch1, ch2);
2960 if (cch)
2962 worklist.safe_push (cch);
2963 m_chains.safe_push (cch);
2964 combined_p = true;
2965 break;
2969 if (!combined_p)
2970 return;
2972 /* Set up UIDs for all statements in dominance order. */
2973 basic_block *bbs = get_loop_body_in_dom_order (m_loop);
2974 renumber_gimple_stmt_uids_in_blocks (bbs, m_loop->num_nodes);
2975 free (bbs);
2977 /* Re-association in combined chains may generate statements in an order
2978 different from the order of references in the original chain. We need to
2979 keep the references of a combined chain in dominance order so that all
2980 uses are inserted after definitions. Note:
2981 A) This is necessary for all combined chains.
2982 B) This is only necessary for ZERO distance references because other
2983 references inherit value from loop carried PHIs.
2985 We first update position information for all combined chains. */
2986 dref ref;
2987 for (i = 0; m_chains.iterate (i, &ch1); ++i)
2989 if (ch1->type != CT_COMBINATION || ch1->combined)
2990 continue;
2992 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
2993 ref->pos = gimple_uid (ref->stmt);
2995 update_pos_for_combined_chains (ch1);
2997 /* Then sort references according to newly updated position information. */
2998 for (i = 0; m_chains.iterate (i, &ch1); ++i)
3000 if (ch1->type != CT_COMBINATION && !ch1->combined)
3001 continue;
3003 /* Find the first reference with non-ZERO distance. */
3004 if (ch1->length == 0)
3005 j = ch1->refs.length ();
3006 else
3008 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
3009 if (ref->distance != 0)
3010 break;
3013 /* Sort all ZERO distance references by position. */
3014 qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos);
3016 if (ch1->combined)
3017 continue;
3019 /* For a ZERO length chain, has_max_use_after must be true since the root
3020 combined stmt must dominate the others. */
3021 if (ch1->length == 0)
3023 ch1->has_max_use_after = true;
3024 continue;
3026 /* Check if there is a use at the maximum distance after the root for
3027 combined chains and set the flag accordingly. */
3028 ch1->has_max_use_after = false;
3029 gimple *root_stmt = get_chain_root (ch1)->stmt;
3030 for (j = 1; ch1->refs.iterate (j, &ref); ++j)
3032 if (ref->distance == ch1->length
3033 && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt))
3035 ch1->has_max_use_after = true;
3036 break;
3042 /* Prepare initializers for store elimination CHAIN in LOOP. Returns false
3043 if this is impossible because one of these initializers may trap, true
3044 otherwise. */
3046 static bool
3047 prepare_initializers_chain_store_elim (class loop *loop, chain_p chain)
3049 unsigned i, n = chain->length;
3051 /* For now we can't eliminate stores if some of them are conditionally
3052 executed. */
3053 if (!chain->all_always_accessed)
3054 return false;
3057 /* Nothing to initialize for intra-iteration store elimination. */
3057 if (n == 0 && chain->type == CT_STORE_STORE)
3058 return true;
3060 /* For store elimination chain, there is nothing to initialize if stores
3061 to be eliminated only store loop invariant values into memory. */
3062 if (chain->type == CT_STORE_STORE
3063 && is_inv_store_elimination_chain (loop, chain))
3065 chain->inv_store_elimination = true;
3066 return true;
3069 chain->inits.create (n);
3070 chain->inits.safe_grow_cleared (n, true);
3072 /* For a store elimination chain like the one below:
3074 for (i = 0; i < len; i++)
3076 a[i] = 1;
3077 // a[i + 1] = ...
3078 a[i + 2] = 3;
3081 the store to a[i + 1] is missing from the loop body; it acts like a
3082 bubble. The content of a[i + 1] remains the same if the loop iterates
3083 fewer times than chain->length. We need to set up root variables for
3084 such stores by loading from memory before the loop. Note we only need
3085 to load the bubble elements, because the loop body is guaranteed to be
3086 executed at least once after the loop's preheader edge. */
3087 auto_vec<bool> bubbles;
3088 bubbles.safe_grow_cleared (n + 1, true);
3089 for (i = 0; i < chain->refs.length (); i++)
3090 bubbles[chain->refs[i]->distance] = true;
3092 struct data_reference *dr = get_chain_root (chain)->ref;
3093 for (i = 0; i < n; i++)
3095 if (bubbles[i])
3096 continue;
3098 gimple_seq stmts = NULL;
3100 tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
3101 if (stmts)
3102 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3104 chain->inits[i] = init;
3107 return true;
3110 /* Prepare initializers for CHAIN. Returns false if this is impossible
3111 because one of these initializers may trap, true otherwise. */
3113 bool
3114 pcom_worker::prepare_initializers_chain (chain_p chain)
3116 unsigned i, n = (chain->type == CT_INVARIANT) ? 1 : chain->length;
3117 struct data_reference *dr = get_chain_root (chain)->ref;
3118 tree init;
3119 dref laref;
3120 edge entry = loop_preheader_edge (m_loop);
3122 if (chain->type == CT_STORE_STORE)
3123 return prepare_initializers_chain_store_elim (m_loop, chain);
3125 /* Find the initializers for the variables, and check that they cannot
3126 trap. */
3127 chain->inits.create (n);
3128 for (i = 0; i < n; i++)
3129 chain->inits.quick_push (NULL_TREE);
3131 /* If we have replaced some looparound phi nodes, use their initializers
3132 instead of creating our own. */
3133 FOR_EACH_VEC_ELT (chain->refs, i, laref)
3135 if (gimple_code (laref->stmt) != GIMPLE_PHI)
3136 continue;
3138 gcc_assert (laref->distance > 0);
3139 chain->inits[n - laref->distance]
3140 = PHI_ARG_DEF_FROM_EDGE (laref->stmt, entry);
3143 for (i = 0; i < n; i++)
3145 gimple_seq stmts = NULL;
3147 if (chain->inits[i] != NULL_TREE)
3148 continue;
3150 init = ref_at_iteration (dr, (int) i - n, &stmts);
3151 if (!chain->all_always_accessed && tree_could_trap_p (init))
3153 gimple_seq_discard (stmts);
3154 return false;
3157 if (stmts)
3158 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3160 chain->inits[i] = init;
3163 return true;
3166 /* Prepare initializers for chains, and free chains that cannot
3167 be used because the initializers might trap. */
3169 void
3170 pcom_worker::prepare_initializers ()
3172 chain_p chain;
3173 unsigned i;
3175 for (i = 0; i < m_chains.length (); )
3177 chain = m_chains[i];
3178 if (prepare_initializers_chain (chain))
3179 i++;
3180 else
3182 release_chain (chain);
3183 m_chains.unordered_remove (i);
3188 /* Generates finalizer memory references for CHAIN. Returns true
3189 if finalizer code for CHAIN can be generated, otherwise false. */
3191 bool
3192 pcom_worker::prepare_finalizers_chain (chain_p chain)
3194 unsigned i, n = chain->length;
3195 struct data_reference *dr = get_chain_root (chain)->ref;
3196 tree fini, niters = number_of_latch_executions (m_loop);
3198 /* For now we can't eliminate stores if some of them are conditionally
3199 executed. */
3200 if (!chain->all_always_accessed)
3201 return false;
3203 chain->finis.create (n);
3204 for (i = 0; i < n; i++)
3205 chain->finis.quick_push (NULL_TREE);
3207 /* We never use looparound phi nodes for store elimination chains. */
3209 /* Find the finalizers for the variables, and check that they cannot
3210 trap. */
3211 for (i = 0; i < n; i++)
3213 gimple_seq stmts = NULL;
3214 gcc_assert (chain->finis[i] == NULL_TREE);
3216 if (TREE_CODE (niters) != INTEGER_CST && TREE_CODE (niters) != SSA_NAME)
3218 niters = unshare_expr (niters);
3219 niters = force_gimple_operand (niters, &stmts, true, NULL);
3220 if (stmts)
3222 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3223 stmts = NULL;
3226 fini = ref_at_iteration (dr, (int) 0 - i, &stmts, niters);
3227 if (stmts)
3228 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3230 chain->finis[i] = fini;
3233 return true;
3236 /* Generates finalizer memory references for chains. Returns true if
3237 finalizer code generation for chains breaks loop closed ssa form. */
3239 bool
3240 pcom_worker::prepare_finalizers ()
3242 chain_p chain;
3243 unsigned i;
3244 bool loop_closed_ssa = false;
3246 for (i = 0; i < m_chains.length ();)
3248 chain = m_chains[i];
3250 /* A finalizer is only necessary for inter-iteration store elimination
3251 chains. */
3252 if (chain->length == 0 || chain->type != CT_STORE_STORE)
3254 i++;
3255 continue;
3258 if (prepare_finalizers_chain (chain))
3260 i++;
3261 /* Be conservative and assume loop closed ssa form is corrupted
3262 by the store-store chain, though that's not always the case if
3263 the eliminated stores only store loop invariant values into
3264 memory. */
3265 loop_closed_ssa = true;
3267 else
3269 release_chain (chain);
3270 m_chains.unordered_remove (i);
3273 return loop_closed_ssa;
3276 /* Insert all initializing gimple stmts into LOOP's entry edge. */
3278 static void
3279 insert_init_seqs (class loop *loop, vec<chain_p> &chains)
3281 unsigned i;
3282 edge entry = loop_preheader_edge (loop);
3284 for (i = 0; i < chains.length (); ++i)
3285 if (chains[i]->init_seq)
3287 gsi_insert_seq_on_edge_immediate (entry, chains[i]->init_seq);
3288 chains[i]->init_seq = NULL;
3292 /* Performs predictive commoning for LOOP. Sets bit 1<<1 of the return
3293 value if LOOP was unrolled; sets bit 1<<2 if loop closed ssa form was
3294 corrupted. A non-zero return value indicates that some changes were
3295 applied to this loop. */
3297 unsigned
3298 pcom_worker::tree_predictive_commoning_loop (bool allow_unroll_p)
3300 struct component *components;
3301 unsigned unroll_factor = 0;
3302 class tree_niter_desc desc;
3303 bool unroll = false, loop_closed_ssa = false;
3305 if (dump_file && (dump_flags & TDF_DETAILS))
3306 fprintf (dump_file, "Processing loop %d\n", m_loop->num);
3308 /* Nothing for predictive commoning if the loop iterates only once. */
3309 if (get_max_loop_iterations_int (m_loop) == 0)
3311 if (dump_file && (dump_flags & TDF_DETAILS))
3312 fprintf (dump_file, "Loop iterates only 1 time, nothing to do.\n");
3314 return 0;
3317 /* Find the data references and split them into components according to their
3318 dependence relations. */
3319 auto_vec<loop_p, 3> loop_nest;
3320 if (!compute_data_dependences_for_loop (m_loop, true, &loop_nest, &m_datarefs,
3321 &m_dependences))
3323 if (dump_file && (dump_flags & TDF_DETAILS))
3324 fprintf (dump_file, "Cannot analyze data dependencies\n");
3325 return 0;
3328 if (dump_file && (dump_flags & TDF_DETAILS))
3329 dump_data_dependence_relations (dump_file, m_dependences);
3331 components = split_data_refs_to_components ();
3333 loop_nest.release ();
3334 if (!components)
3335 return 0;
3337 if (dump_file && (dump_flags & TDF_DETAILS))
3339 fprintf (dump_file, "Initial state:\n\n");
3340 dump_components (dump_file, components);
3343 /* Find the suitable components and split them into chains. */
3344 components = filter_suitable_components (components);
3346 auto_bitmap tmp_vars;
3347 determine_roots (components);
3348 release_components (components);
3350 if (!m_chains.exists ())
3352 if (dump_file && (dump_flags & TDF_DETAILS))
3353 fprintf (dump_file,
3354 "Predictive commoning failed: no suitable chains\n");
3355 return 0;
3358 prepare_initializers ();
3359 loop_closed_ssa = prepare_finalizers ();
3361 /* Try to combine the chains that are always used together. */
3362 try_combine_chains ();
3364 insert_init_seqs (m_loop, m_chains);
3366 if (dump_file && (dump_flags & TDF_DETAILS))
3368 fprintf (dump_file, "Before commoning:\n\n");
3369 dump_chains (dump_file, m_chains);
3372 if (allow_unroll_p)
3373 /* Determine the unroll factor, and if the loop should be unrolled, ensure
3374 that its number of iterations is divisible by the factor. */
3375 unroll_factor = determine_unroll_factor (m_chains);
3377 if (unroll_factor > 1)
3378 unroll = can_unroll_loop_p (m_loop, unroll_factor, &desc);
3380 /* Execute the predictive commoning transformations, and possibly unroll the
3381 loop. */
3382 if (unroll)
3384 struct epcc_data dta;
3386 if (dump_file && (dump_flags & TDF_DETAILS))
3387 fprintf (dump_file, "Unrolling %u times.\n", unroll_factor);
3389 dta.tmp_vars = tmp_vars;
3390 dta.chains = m_chains.to_vec_legacy ();
3391 dta.worker = this;
3393 /* Cfg manipulations performed in tree_transform_and_unroll_loop before
3394 execute_pred_commoning_cbck is called may cause phi nodes to be
3395 reallocated, which is a problem since CHAINS may point to these
3396 statements. To fix this, we store the ssa names defined by the
3397 phi nodes here instead of the phi nodes themselves, and restore
3398 the phi nodes in execute_pred_commoning_cbck. A bit hacky. */
3399 replace_phis_by_defined_names (m_chains);
3401 tree_transform_and_unroll_loop (m_loop, unroll_factor, &desc,
3402 execute_pred_commoning_cbck, &dta);
3403 eliminate_temp_copies (m_loop, tmp_vars);
3405 else
3407 if (dump_file && (dump_flags & TDF_DETAILS))
3408 fprintf (dump_file,
3409 "Executing predictive commoning without unrolling.\n");
3410 execute_pred_commoning (tmp_vars);
3413 return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1);
3416 /* Runs predictive commoning. */
3418 unsigned
3419 tree_predictive_commoning (bool allow_unroll_p)
3421 unsigned ret = 0, changed = 0;
3423 initialize_original_copy_tables ();
3424 for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
3425 if (optimize_loop_for_speed_p (loop))
3427 pcom_worker w(loop);
3428 changed |= w.tree_predictive_commoning_loop (allow_unroll_p);
3430 free_original_copy_tables ();
3432 if (changed > 0)
3434 ret = TODO_update_ssa_only_virtuals;
3436 /* Some loop(s) got unrolled. */
3437 if (changed > 1)
3439 scev_reset ();
3441 /* Need to fix up loop closed SSA. */
3442 if (changed >= 4)
3443 rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
3445 ret |= TODO_cleanup_cfg;
3449 return ret;
3452 /* Predictive commoning Pass. */
3454 static unsigned
3455 run_tree_predictive_commoning (struct function *fun, bool allow_unroll_p)
3457 if (number_of_loops (fun) <= 1)
3458 return 0;
3460 return tree_predictive_commoning (allow_unroll_p);
3463 namespace {
3465 const pass_data pass_data_predcom =
3467 GIMPLE_PASS, /* type */
3468 "pcom", /* name */
3469 OPTGROUP_LOOP, /* optinfo_flags */
3470 TV_PREDCOM, /* tv_id */
3471 PROP_cfg, /* properties_required */
3472 0, /* properties_provided */
3473 0, /* properties_destroyed */
3474 0, /* todo_flags_start */
3475 0, /* todo_flags_finish */
3478 class pass_predcom : public gimple_opt_pass
3480 public:
3481 pass_predcom (gcc::context *ctxt)
3482 : gimple_opt_pass (pass_data_predcom, ctxt)
3485 /* opt_pass methods: */
3486 virtual bool
3487 gate (function *)
3489 if (flag_predictive_commoning != 0)
3490 return true;
3491 /* Loop vectorization enables predictive commoning implicitly
3492 only if predictive commoning isn't set explicitly, and it
3493 doesn't allow unrolling. */
3494 if (flag_tree_loop_vectorize
3495 && !OPTION_SET_P (flag_predictive_commoning))
3496 return true;
3498 return false;
3501 virtual unsigned int
3502 execute (function *fun)
3504 bool allow_unroll_p = flag_predictive_commoning != 0;
3505 return run_tree_predictive_commoning (fun, allow_unroll_p);
3508 }; // class pass_predcom
3510 } // anon namespace
3512 gimple_opt_pass *
3513 make_pass_predcom (gcc::context *ctxt)
3515 return new pass_predcom (ctxt);