/* Predictive commoning.
   Copyright (C) 2005-2022 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* This file implements the predictive commoning optimization.  Predictive
   commoning can be viewed as CSE around a loop, and with some improvements,
   as generalized strength reduction -- i.e., reusing values computed in
   earlier iterations of a loop in the later ones.  So far, the pass only
   handles the most useful case, that is, reusing values of memory references.
   If you think this is all just a special case of PRE, you are sort of right;
   however, concentrating on loops is simpler, and makes it possible to
   incorporate data dependence analysis to detect the opportunities, perform
   loop unrolling to avoid copies together with renaming immediately,
   and if needed, we could also take register pressure into account.

   Let us demonstrate what is done on an example:

   for (i = 0; i < 100; i++)
     {
       a[i+2] = a[i] + a[i+1];
       b[10] = b[10] + i;
       c[i] = c[99 - i];
       d[i] = d[i + 1];
     }

   1) We find data references in the loop, and split them to mutually
      independent groups (i.e., we find components of a data dependence
      graph).  We ignore read-read dependences whose distance is not constant.
      (TODO -- we could also ignore antidependences).  In this example, we
      find the following groups:

      a[i]{read}, a[i+1]{read}, a[i+2]{write}
      b[10]{read}, b[10]{write}
      c[99 - i]{read}, c[i]{write}
      d[i + 1]{read}, d[i]{write}

   2) Inside each of the groups, we verify several conditions:
      a) all the references must differ in indices only, and the indices
	 must all have the same step
      b) the references must dominate the loop latch (and thus, they must be
	 ordered by the dominance relation).
      c) the distance of the indices must be a small multiple of the step
      We are then able to compute the difference of the references (# of
      iterations before they point to the same place as the first of them).
      Also, in case there are writes in the loop, we split the groups into
      chains whose head is the write whose values are used by the reads in
      the same chain.  The chains are then processed independently,
      making the further transformations simpler.  Also, the shorter chains
      need the same number of registers, but may require a lower unrolling
      factor in order to get rid of the copies on the loop latch.

      In our example, we get the following chains (the chain for c is invalid).

      a[i]{read,+0}, a[i+1]{read,-1}, a[i+2]{write,-2}
      b[10]{read,+0}, b[10]{write,+0}
      d[i + 1]{read,+0}, d[i]{write,+1}

   3) For each read, we determine the read or write whose value it reuses,
      together with the distance of this reuse.  I.e. we take the last
      reference before it with distance 0, or the last of the references
      with the smallest positive distance to the read.  Then, we remove
      the references that are not used in any of these chains, discard the
      empty groups, and propagate all the links so that they point to the
      single root reference of the chain (adjusting their distance
      appropriately).  Some extra care needs to be taken for references with
      step 0.  In our example (the numbers indicate the distance of the
      reuse),

      a[i] --> (*) 2, a[i+1] --> (*) 1, a[i+2] (*)
      b[10] --> (*) 1, b[10] (*)

   4) The chains are combined together if possible.  If the corresponding
      elements of two chains are always combined together with the same
      operator, we remember just the result of this combination, instead
      of remembering the values separately.  We may need to perform
      reassociation to enable combining, for example

      e[i] + f[i+1] + e[i+1] + f[i]

      can be reassociated as

      (e[i] + f[i]) + (e[i+1] + f[i+1])

      and we can combine the chains for e and f into one chain.
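
      As an illustration (a sketch of the effect, not the exact code the
      pass emits), the loop

      for (i = 0; i < n; i++)
	g[i] = e[i] + f[i+1] + e[i+1] + f[i];

      can then carry a single combined value between iterations:

      t0 = e[0] + f[0];
      for (i = 0; i < n; i++)
	{
	  t1 = e[i+1] + f[i+1];
	  g[i] = t0 + t1;
	  t0 = t1;
	}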

   5) For each root reference (end of the chain) R, let N be the maximum
      distance of a reference reusing its value.  Variables R0 up to RN are
      created, together with phi nodes that transfer values from R1 .. RN to
      R0 .. R(N-1).
      Initial values are loaded to R0..R(N-1) (in case not all references
      must necessarily be accessed and they may trap, we may fail here;
      TODO sometimes, the loads could be guarded by a check for the number
      of iterations).  Values loaded/stored in roots are also copied to
      RN.  Other reads are replaced with the appropriate variable Ri.
      Everything is put into SSA form.

      As a small improvement, if R0 is dead after the root (i.e., all uses of
      the value with the maximum distance dominate the root), we can avoid
      creating RN and use R0 instead of it.

      In our example, we get (only the parts concerning a and b are shown):

      for (i = 0; i < 100; i++)
	{
	  f = phi (a[0], s);
	  s = phi (a[1], f);
	  x = phi (b[10], x);

	  f = f + s;
	  a[i+2] = f;
	  x = x + i;
	  b[10] = x;
	}

   6) Factor F for unrolling is determined as the smallest common multiple of
      (N + 1) for each root reference (N for references for which we avoided
      creating RN).  If F is small enough, and so is the loop, the loop is
      unrolled F times.  The stores to RN (R0) in the copies of the loop body
      are periodically replaced with R0, R1, ... (R1, R2, ...), so that they
      can be coalesced and the copies can be eliminated.

      TODO -- copy propagation and other optimizations may change the live
      ranges of the temporary registers and prevent them from being coalesced;
      this may increase the register pressure.

      In our case, F = 2 and the (main loop of the) result is

      for (i = 0; i < ...; i += 2)
	{
	  f = phi (a[0], f);
	  s = phi (a[1], s);
	  x = phi (b[10], x);

	  f = f + s;
	  a[i+2] = f;
	  x = x + i;
	  b[10] = x;

	  s = s + f;
	  a[i+3] = s;
	  x = x + i;
	  b[10] = x;
	}
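
      For instance (a made-up combination of chains, just to illustrate the
      formula), if one root reference has N = 1 and another has N = 2, the
      unroll factor would be F = lcm (2, 3) = 6.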

   Apart from predictive commoning on Load-Load and Store-Load chains, we
   also support Store-Store chains -- stores killed by another store can be
   eliminated.  Consider the example below:

   for (i = 0; i < n; i++)
     {
       a[i] = 1;
       a[i+2] = 2;
     }

   It can be replaced with:

   t0 = a[0];
   t1 = a[1];
   for (i = 0; i < n; i++)
     {
       a[i] = 1;
       t2 = 2;
       t0 = t1;
       t1 = t2;
     }
   a[n] = t0;
   a[n+1] = t1;

   If the loop runs more than 1 iteration, it can be further simplified into:

   for (i = 0; i < n; i++)
     {
       a[i] = 1;
     }
   a[n] = 2;
   a[n+1] = 2;

   The interesting part is that this can be viewed either as general store
   motion or general dead store elimination, in either an intra- or
   inter-iteration way.

   With trivial effort, we also support a load inside Store-Store chains if
   the load is dominated by a store statement in the same iteration of the
   loop.  You can see this as a restricted Store-Mixed-Load-Store chain.
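
   For example (a sketch of the restricted form that is accepted), the load
   of a[i] below is dominated by the store to a[i] in the same iteration,
   so the component can still be handled as a Store-Store chain:

   for (i = 0; i < n; i++)
     {
       a[i] = 1;
       x = a[i];
       a[i+2] = x + 1;
     }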

   TODO: For now, we don't support store-store chains in multi-exit loops.
   We also force the loop not to be unrolled when there is a store-store
   chain, even if other chains might ask for unrolling.

   Predictive commoning can be generalized for arbitrary computations (not
   just memory loads), and also nontrivial transfer functions (e.g., replacing
   i * i with ii_last + 2 * i + 1), to generalize strength reduction.  */
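
/* For instance (an illustrative sketch of the generalization mentioned at
   the end of the comment above, not something this pass implements), the
   multiplication in

     for (i = 0; i < n; i++)
       x[i] = i * i;

   could be carried between iterations as

     ii = 0;
     for (i = 0; i < n; i++)
       {
	 x[i] = ii;
	 ii = ii + 2 * i + 1;
       }

   reusing the value computed in the previous iteration instead of
   recomputing the square.  */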

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "rtl.h"
#include "tree.h"
#include "gimple.h"
#include "predict.h"
#include "tree-pass.h"
#include "ssa.h"
#include "gimple-pretty-print.h"
#include "alias.h"
#include "fold-const.h"
#include "cfgloop.h"
#include "tree-eh.h"
#include "gimplify.h"
#include "gimple-iterator.h"
#include "gimplify-me.h"
#include "tree-ssa-loop-ivopts.h"
#include "tree-ssa-loop-manip.h"
#include "tree-ssa-loop-niter.h"
#include "tree-ssa-loop.h"
#include "tree-into-ssa.h"
#include "tree-dfa.h"
#include "tree-ssa.h"
#include "tree-data-ref.h"
#include "tree-scalar-evolution.h"
#include "tree-affine.h"
#include "builtins.h"
#include "opts.h"

/* The maximum number of iterations between the considered memory
   references.  */

#define MAX_DISTANCE (target_avail_regs < 16 ? 4 : 8)
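
/* An illustrative reading of the limit above (not a statement about any
   particular target): on a target reporting fewer than 16 available
   registers, references such as a[i] and a[i+4] are considered too far
   apart to be rooted in the same chain, since carrying the intermediate
   values would need too many temporaries.  */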

/* Data references (or phi nodes that carry data reference values across
   loop iterations).  */

typedef class dref_d
{
public:
  /* The reference itself.  */
  struct data_reference *ref;

  /* The statement in which the reference appears.  */
  gimple *stmt;

  /* In case that STMT is a phi node, this field is set to the SSA name
     defined by it in replace_phis_by_defined_names (in order to avoid
     pointing to a phi node that got reallocated in the meantime).  */
  tree name_defined_by_phi;

  /* Distance of the reference from the root of the chain (in number of
     iterations of the loop).  */
  unsigned distance;

  /* Number of iterations offset from the first reference in the component.  */
  widest_int offset;

  /* Number of the reference in a component, in dominance ordering.  */
  unsigned pos;

  /* True if the memory reference is always accessed when the loop is
     entered.  */
  unsigned always_accessed : 1;
} *dref;


/* Type of the chain of the references.  */

enum chain_type
{
  /* The addresses of the references in the chain are constant.  */
  CT_INVARIANT,

  /* There are only loads in the chain.  */
  CT_LOAD,

  /* Root of the chain is store, the rest are loads.  */
  CT_STORE_LOAD,

  /* There are only stores in the chain.  */
  CT_STORE_STORE,

  /* A combination of two chains.  */
  CT_COMBINATION
};

/* Chains of data references.  */

typedef struct chain
{
  chain (chain_type t) : type (t), op (ERROR_MARK), rslt_type (NULL_TREE),
    ch1 (NULL), ch2 (NULL), length (0), init_seq (NULL), fini_seq (NULL),
    has_max_use_after (false), all_always_accessed (false), combined (false),
    inv_store_elimination (false) {}

  /* Type of the chain.  */
  enum chain_type type;

  /* For combination chains, the operator and the two chains that are
     combined, and the type of the result.  */
  enum tree_code op;
  tree rslt_type;
  struct chain *ch1, *ch2;

  /* The references in the chain.  */
  auto_vec<dref> refs;

  /* The maximum distance of the reference in the chain from the root.  */
  unsigned length;

  /* The variables used to copy the value throughout iterations.  */
  auto_vec<tree> vars;

  /* Initializers for the variables.  */
  auto_vec<tree> inits;

  /* Finalizers for the eliminated stores.  */
  auto_vec<tree> finis;

  /* gimple stmts initializing the initial variables of the chain.  */
  gimple_seq init_seq;

  /* gimple stmts finalizing the eliminated stores of the chain.  */
  gimple_seq fini_seq;

  /* True if there is a use of a variable with the maximal distance
     that comes after the root in the loop.  */
  unsigned has_max_use_after : 1;

  /* True if all the memory references in the chain are always accessed.  */
  unsigned all_always_accessed : 1;

  /* True if this chain was combined together with some other chain.  */
  unsigned combined : 1;

  /* True if this is a store elimination chain and the eliminated stores
     store a loop invariant value into memory.  */
  unsigned inv_store_elimination : 1;
} *chain_p;

/* Describes the knowledge about the step of the memory references in
   the component.  */

enum ref_step_type
{
  /* The step is zero.  */
  RS_INVARIANT,

  /* The step is nonzero.  */
  RS_NONZERO,

  /* The step may or may not be nonzero.  */
  RS_ANY
};

/* Components of the data dependence graph.  */

struct component
{
  component (bool es) : eliminate_store_p (es), next (NULL) {}

  /* The references in the component.  */
  auto_vec<dref> refs;

  /* What we know about the step of the references in the component.  */
  enum ref_step_type comp_step;

  /* True if all references in component are stores and we try to do
     intra/inter loop iteration dead store elimination.  */
  bool eliminate_store_p;

  /* Next component in the list.  */
  struct component *next;
};

/* A class to encapsulate the global states used for predictive
   commoning work on top of one given LOOP.  */

class pcom_worker
{
public:
  pcom_worker (loop_p l) : m_loop (l), m_cache (NULL) {}

  ~pcom_worker ()
  {
    free_data_refs (m_datarefs);
    free_dependence_relations (m_dependences);
    free_affine_expand_cache (&m_cache);
    release_chains ();
  }

  pcom_worker (const pcom_worker &) = delete;
  pcom_worker &operator= (const pcom_worker &) = delete;

  /* Performs predictive commoning.  */
  unsigned tree_predictive_commoning_loop (bool allow_unroll_p);

  /* Perform the predictive commoning optimization for chains; made public
     so that it can be called from the callback execute_pred_commoning_cbck.  */
  void execute_pred_commoning (bitmap tmp_vars);

private:
  /* The pointer to the given loop.  */
  loop_p m_loop;

  /* All data references.  */
  auto_vec<data_reference_p, 10> m_datarefs;

  /* All data dependences.  */
  auto_vec<ddr_p, 10> m_dependences;

  /* All chains.  */
  auto_vec<chain_p> m_chains;

  /* Bitmap of ssa names defined by looparound phi nodes covered by chains.  */
  auto_bitmap m_looparound_phis;

  typedef hash_map<tree, name_expansion *> tree_expand_map_t;
  /* Cache used by tree_to_aff_combination_expand.  */
  tree_expand_map_t *m_cache;

  /* Splits dependence graph into components.  */
  struct component *split_data_refs_to_components ();

  /* Check the conditions on references inside each of components COMPS,
     and remove the unsuitable components from the list.  */
  struct component *filter_suitable_components (struct component *comps);

  /* Find roots of the values and determine distances in components COMPS,
     and separate the references into chains.  */
  void determine_roots (struct component *comps);

  /* Prepare initializers for chains, and free chains that cannot
     be used because the initializers might trap.  */
  void prepare_initializers ();

  /* Generates finalizer memory references for chains.  Returns true if
     finalizer code generation for chains breaks loop closed ssa form.  */
  bool prepare_finalizers ();

  /* Try to combine the chains.  */
  void try_combine_chains ();

  /* Frees CHAINS.  */
  void release_chains ();

  /* Frees a chain CHAIN.  */
  void release_chain (chain_p chain);

  /* Prepare initializers for CHAIN.  Returns false if this is impossible
     because one of these initializers may trap, true otherwise.  */
  bool prepare_initializers_chain (chain_p chain);

  /* Generates finalizer memory references for CHAIN.  Returns true
     if finalizer code for CHAIN can be generated, otherwise false.  */
  bool prepare_finalizers_chain (chain_p chain);

  /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET.  */
  void aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset);

  /* Determines number of iterations of the innermost enclosing loop before
     B refers to exactly the same location as A and stores it to OFF.  */
  bool determine_offset (struct data_reference *a, struct data_reference *b,
			 poly_widest_int *off);

  /* Returns true if the component COMP satisfies the conditions
     described in 2) at the beginning of this file.  */
  bool suitable_component_p (struct component *comp);

  /* Returns true if REF is a valid initializer for ROOT with given
     DISTANCE (in iterations of the innermost enclosing loop).  */
  bool valid_initializer_p (struct data_reference *ref, unsigned distance,
			    struct data_reference *root);

  /* Finds looparound phi node of loop that copies the value of REF.  */
  gphi *find_looparound_phi (dref ref, dref root);

  /* Find roots of the values and determine distances in the component
     COMP.  The references are redistributed into chains.  */
  void determine_roots_comp (struct component *comp);

  /* For references in CHAIN that are copied around the loop, add the
     results of such copies to the chain.  */
  void add_looparound_copies (chain_p chain);

  /* Returns the single statement in which NAME is used, except for
     the looparound phi nodes contained in one of the chains.  */
  gimple *single_nonlooparound_use (tree name);

  /* Remove statement STMT, as well as the chain of assignments in which
     it is used.  */
  void remove_stmt (gimple *stmt);

  /* Perform the predictive commoning optimization for a chain CHAIN.  */
  void execute_pred_commoning_chain (chain_p chain, bitmap tmp_vars);

  /* Returns the modify statement that uses NAME.  */
  gimple *find_use_stmt (tree *name);

  /* If the operation used in STMT is associative and commutative, go
     through the tree of the same operations and return its root.  */
  gimple *find_associative_operation_root (gimple *stmt, unsigned *distance);

  /* Returns the common statement in which NAME1 and NAME2 have a use.  */
  gimple *find_common_use_stmt (tree *name1, tree *name2);

  /* Checks whether R1 and R2 are combined together using CODE, with the
     result in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order
     R2 CODE R1 if it is true.  */
  bool combinable_refs_p (dref r1, dref r2, enum tree_code *code, bool *swap,
			  tree *rslt_type);

  /* Reassociates the expression in which NAME1 and NAME2 are used so that
     they are combined in a single statement, and returns this statement.  */
  gimple *reassociate_to_the_same_stmt (tree name1, tree name2);

  /* Returns the statement that combines references R1 and R2.  */
  gimple *stmt_combining_refs (dref r1, dref r2);

  /* Tries to combine chains CH1 and CH2 together.  */
  chain_p combine_chains (chain_p ch1, chain_p ch2);
};

/* Dumps data reference REF to FILE.  */

extern void dump_dref (FILE *, dref);
void
dump_dref (FILE *file, dref ref)
{
  if (ref->ref)
    {
      fprintf (file, " ");
      print_generic_expr (file, DR_REF (ref->ref), TDF_SLIM);
      fprintf (file, " (id %u%s)\n", ref->pos,
	       DR_IS_READ (ref->ref) ? "" : ", write");

      fprintf (file, " offset ");
      print_decs (ref->offset, file);
      fprintf (file, "\n");

      fprintf (file, " distance %u\n", ref->distance);
    }
  else
    {
      if (gimple_code (ref->stmt) == GIMPLE_PHI)
	fprintf (file, " looparound ref\n");
      else
	fprintf (file, " combination ref\n");
      fprintf (file, " in statement ");
      print_gimple_stmt (file, ref->stmt, 0, TDF_SLIM);
      fprintf (file, "\n");
      fprintf (file, " distance %u\n", ref->distance);
    }
}

/* Dumps CHAIN to FILE.  */

extern void dump_chain (FILE *, chain_p);
void
dump_chain (FILE *file, chain_p chain)
{
  dref a;
  const char *chain_type;
  unsigned i;
  tree var;

  switch (chain->type)
    {
    case CT_INVARIANT:
      chain_type = "Load motion";
      break;

    case CT_LOAD:
      chain_type = "Loads-only";
      break;

    case CT_STORE_LOAD:
      chain_type = "Store-loads";
      break;

    case CT_STORE_STORE:
      chain_type = "Store-stores";
      break;

    case CT_COMBINATION:
      chain_type = "Combination";
      break;

    default:
      gcc_unreachable ();
    }

  fprintf (file, "%s chain %p%s\n", chain_type, (void *) chain,
	   chain->combined ? " (combined)" : "");
  if (chain->type != CT_INVARIANT)
    fprintf (file, " max distance %u%s\n", chain->length,
	     chain->has_max_use_after ? "" : ", may reuse first");

  if (chain->type == CT_COMBINATION)
    {
      fprintf (file, " equal to %p %s %p in type ",
	       (void *) chain->ch1, op_symbol_code (chain->op),
	       (void *) chain->ch2);
      print_generic_expr (file, chain->rslt_type, TDF_SLIM);
      fprintf (file, "\n");
    }

  if (chain->vars.exists ())
    {
      fprintf (file, " vars");
      FOR_EACH_VEC_ELT (chain->vars, i, var)
	{
	  fprintf (file, " ");
	  print_generic_expr (file, var, TDF_SLIM);
	}
      fprintf (file, "\n");
    }

  if (chain->inits.exists ())
    {
      fprintf (file, " inits");
      FOR_EACH_VEC_ELT (chain->inits, i, var)
	{
	  fprintf (file, " ");
	  print_generic_expr (file, var, TDF_SLIM);
	}
      fprintf (file, "\n");
    }

  fprintf (file, " references:\n");
  FOR_EACH_VEC_ELT (chain->refs, i, a)
    dump_dref (file, a);

  fprintf (file, "\n");
}

/* Dumps CHAINS to FILE.  */

void
dump_chains (FILE *file, const vec<chain_p> &chains)
{
  chain_p chain;
  unsigned i;

  FOR_EACH_VEC_ELT (chains, i, chain)
    dump_chain (file, chain);
}

/* Dumps COMP to FILE.  */

extern void dump_component (FILE *, struct component *);
void
dump_component (FILE *file, struct component *comp)
{
  dref a;
  unsigned i;

  fprintf (file, "Component%s:\n",
	   comp->comp_step == RS_INVARIANT ? " (invariant)" : "");
  FOR_EACH_VEC_ELT (comp->refs, i, a)
    dump_dref (file, a);
  fprintf (file, "\n");
}

/* Dumps COMPS to FILE.  */

extern void dump_components (FILE *, struct component *);
void
dump_components (FILE *file, struct component *comps)
{
  struct component *comp;

  for (comp = comps; comp; comp = comp->next)
    dump_component (file, comp);
}

/* Frees a chain CHAIN.  */

void
pcom_worker::release_chain (chain_p chain)
{
  dref ref;
  unsigned i;

  if (chain == NULL)
    return;

  FOR_EACH_VEC_ELT (chain->refs, i, ref)
    free (ref);

  if (chain->init_seq)
    gimple_seq_discard (chain->init_seq);

  if (chain->fini_seq)
    gimple_seq_discard (chain->fini_seq);

  delete chain;
}

/* Frees CHAINS.  */

void
pcom_worker::release_chains ()
{
  unsigned i;
  chain_p chain;

  FOR_EACH_VEC_ELT (m_chains, i, chain)
    release_chain (chain);
}

/* Frees list of components COMPS.  */

static void
release_components (struct component *comps)
{
  struct component *act, *next;

  for (act = comps; act; act = next)
    {
      next = act->next;
      delete act;
    }
}

/* Finds a root of tree given by FATHERS containing A, and performs path
   shortening.  */

static unsigned
component_of (vec<unsigned> &fathers, unsigned a)
{
  unsigned root, n;

  for (root = a; root != fathers[root]; root = fathers[root])
    continue;

  for (; a != root; a = n)
    {
      n = fathers[a];
      fathers[a] = root;
    }

  return root;
}

/* Join operation for DFU.  FATHERS gives the tree, SIZES are sizes of the
   components, A and B are components to merge.  */

static void
merge_comps (vec<unsigned> &fathers, vec<unsigned> &sizes,
	     unsigned a, unsigned b)
{
  unsigned ca = component_of (fathers, a);
  unsigned cb = component_of (fathers, b);

  if (ca == cb)
    return;

  if (sizes[ca] < sizes[cb])
    {
      sizes[cb] += sizes[ca];
      fathers[ca] = cb;
    }
  else
    {
      sizes[ca] += sizes[cb];
      fathers[cb] = ca;
    }
}
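
/* An illustrative (hypothetical) use of the two union-find helpers above,
   mirroring how split_data_refs_to_components drives them: start with each
   data reference in its own singleton set and union the sets of references
   related by a dependence.

     auto_vec<unsigned> father (4), size (4);
     father.quick_grow (4);
     size.quick_grow (4);
     for (unsigned i = 0; i < 4; i++)
       {
	 father[i] = i;
	 size[i] = 1;
       }
     merge_comps (father, size, 0, 2);
     merge_comps (father, size, 2, 3);
     unsigned r = component_of (father, 3);  -- equals component_of (father, 0)

   After the merges, references 0, 2 and 3 share one representative while
   reference 1 stays in its own component.  */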

/* Returns true if A is a reference that is suitable for predictive commoning
   in the innermost loop that contains it.  REF_STEP is set according to the
   step of the reference A.  */

static bool
suitable_reference_p (struct data_reference *a, enum ref_step_type *ref_step)
{
  tree ref = DR_REF (a), step = DR_STEP (a);

  if (!step
      || TREE_THIS_VOLATILE (ref)
      || !is_gimple_reg_type (TREE_TYPE (ref))
      || tree_could_throw_p (ref))
    return false;

  if (integer_zerop (step))
    *ref_step = RS_INVARIANT;
  else if (integer_nonzerop (step))
    *ref_step = RS_NONZERO;
  else
    *ref_step = RS_ANY;

  return true;
}

/* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET.  */

void
pcom_worker::aff_combination_dr_offset (struct data_reference *dr,
					aff_tree *offset)
{
  tree type = TREE_TYPE (DR_OFFSET (dr));
  aff_tree delta;

  tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, &m_cache);
  aff_combination_const (&delta, type, wi::to_poly_widest (DR_INIT (dr)));
  aff_combination_add (offset, &delta);
}

/* Determines number of iterations of the innermost enclosing loop before B
   refers to exactly the same location as A and stores it to OFF.  If A and
   B do not have the same step, they never meet, or anything else fails,
   returns false, otherwise returns true.  Both A and B are assumed to
   satisfy suitable_reference_p.  */

bool
pcom_worker::determine_offset (struct data_reference *a,
			       struct data_reference *b, poly_widest_int *off)
{
  aff_tree diff, baseb, step;
  tree typea, typeb;

  /* Check that both the references access the location in the same type.  */
  typea = TREE_TYPE (DR_REF (a));
  typeb = TREE_TYPE (DR_REF (b));
  if (!useless_type_conversion_p (typeb, typea))
    return false;

  /* Check whether the base address and the step of both references is the
     same.  */
  if (!operand_equal_p (DR_STEP (a), DR_STEP (b), 0)
      || !operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0))
    return false;

  if (integer_zerop (DR_STEP (a)))
    {
      /* If the references have loop invariant address, check that they access
	 exactly the same location.  */
      *off = 0;
      return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0)
	      && operand_equal_p (DR_INIT (a), DR_INIT (b), 0));
    }

  /* Compare the offsets of the addresses, and check whether the difference
     is a multiple of step.  */
  aff_combination_dr_offset (a, &diff);
  aff_combination_dr_offset (b, &baseb);
  aff_combination_scale (&baseb, -1);
  aff_combination_add (&diff, &baseb);

  tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)),
				  &step, &m_cache);
  return aff_combination_constant_multiple_p (&diff, &step, off);
}
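
/* A worked example of the computation above (hedged -- the concrete trees
   depend on how the front end lowered the accesses): take A = a[i+2] and
   B = a[i] in a unit-stride loop over a 4-byte element type.  Both have
   DR_STEP 4 and the same base address; the difference of their affine
   offsets is 8, which is the constant multiple 2 of the step, so *OFF is
   set to 2: two more iterations bring B to the location A accesses now.  */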

/* Returns the last basic block in LOOP for which we are sure that
   it is executed whenever the loop is entered.  */

static basic_block
last_always_executed_block (class loop *loop)
{
  unsigned i;
  auto_vec<edge> exits = get_loop_exit_edges (loop);
  edge ex;
  basic_block last = loop->latch;

  FOR_EACH_VEC_ELT (exits, i, ex)
    last = nearest_common_dominator (CDI_DOMINATORS, last, ex->src);

  return last;
}

/* Splits dependence graph on DATAREFS described by DEPENDENCES into
   components.  */

struct component *
pcom_worker::split_data_refs_to_components ()
{
  unsigned i, n = m_datarefs.length ();
  unsigned ca, ia, ib, bad;
  struct data_reference *dr, *dra, *drb;
  struct data_dependence_relation *ddr;
  struct component *comp_list = NULL, *comp;
  dref dataref;
  /* Don't do store elimination if loop has multiple exit edges.  */
  bool eliminate_store_p = single_exit (m_loop) != NULL;
  basic_block last_always_executed = last_always_executed_block (m_loop);
  auto_bitmap no_store_store_comps;
  auto_vec<unsigned> comp_father (n + 1);
  auto_vec<unsigned> comp_size (n + 1);
  comp_father.quick_grow (n + 1);
  comp_size.quick_grow (n + 1);

  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      if (!DR_REF (dr))
	/* A fake reference for call or asm_expr that may clobber memory;
	   just fail.  */
	return NULL;
      /* predcom pass isn't prepared to handle calls with data references.  */
      if (is_gimple_call (DR_STMT (dr)))
	return NULL;
      dr->aux = (void *) (size_t) i;
      comp_father[i] = i;
      comp_size[i] = 1;
    }

  /* A component reserved for the "bad" data references.  */
  comp_father[n] = n;
  comp_size[n] = 1;

  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      enum ref_step_type dummy;

      if (!suitable_reference_p (dr, &dummy))
	{
	  ia = (unsigned) (size_t) dr->aux;
	  merge_comps (comp_father, comp_size, n, ia);
	}
    }

  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
    {
      poly_widest_int dummy_off;

      if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
	continue;

      dra = DDR_A (ddr);
      drb = DDR_B (ddr);

      /* Don't do store elimination if there is any unknown dependence for
	 any store data reference.  */
      if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb))
	  && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
	      || DDR_NUM_DIST_VECTS (ddr) == 0))
	eliminate_store_p = false;

      ia = component_of (comp_father, (unsigned) (size_t) dra->aux);
      ib = component_of (comp_father, (unsigned) (size_t) drb->aux);
      if (ia == ib)
	continue;

      bad = component_of (comp_father, n);

      /* If both A and B are reads, we may ignore unsuitable dependences.  */
      if (DR_IS_READ (dra) && DR_IS_READ (drb))
	{
	  if (ia == bad || ib == bad
	      || !determine_offset (dra, drb, &dummy_off))
	    continue;
	}
      /* If A is read and B write or vice versa and there is unsuitable
	 dependence, instead of merging both components into a component
	 that will certainly not pass suitable_component_p, just put the
	 read into bad component, perhaps at least the write together with
	 all the other data refs in its component will be optimizable.  */
      else if (DR_IS_READ (dra) && ib != bad)
	{
	  if (ia == bad)
	    {
	      bitmap_set_bit (no_store_store_comps, ib);
	      continue;
	    }
	  else if (!determine_offset (dra, drb, &dummy_off))
	    {
	      bitmap_set_bit (no_store_store_comps, ib);
	      merge_comps (comp_father, comp_size, bad, ia);
	      continue;
	    }
	}
      else if (DR_IS_READ (drb) && ia != bad)
	{
	  if (ib == bad)
	    {
	      bitmap_set_bit (no_store_store_comps, ia);
	      continue;
	    }
	  else if (!determine_offset (dra, drb, &dummy_off))
	    {
	      bitmap_set_bit (no_store_store_comps, ia);
	      merge_comps (comp_father, comp_size, bad, ib);
	      continue;
	    }
	}
      else if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb)
	       && ia != bad && ib != bad
	       && !determine_offset (dra, drb, &dummy_off))
	{
	  merge_comps (comp_father, comp_size, bad, ia);
	  merge_comps (comp_father, comp_size, bad, ib);
	  continue;
	}

      merge_comps (comp_father, comp_size, ia, ib);
    }

  if (eliminate_store_p)
    {
      tree niters = number_of_latch_executions (m_loop);

      /* Don't do store elimination if niters info is unknown because stores
	 in the last iteration can't be eliminated and we need to recover it
	 after loop.  */
      eliminate_store_p = (niters != NULL_TREE && niters != chrec_dont_know);
    }

  auto_vec<struct component *> comps;
  comps.safe_grow_cleared (n, true);
  bad = component_of (comp_father, n);
  FOR_EACH_VEC_ELT (m_datarefs, i, dr)
    {
      ia = (unsigned) (size_t) dr->aux;
      ca = component_of (comp_father, ia);
      if (ca == bad)
	continue;

      comp = comps[ca];
      if (!comp)
	{
	  comp = new component (eliminate_store_p);
	  comp->refs.reserve_exact (comp_size[ca]);
	  comps[ca] = comp;
	}

      dataref = XCNEW (class dref_d);
      dataref->ref = dr;
      dataref->stmt = DR_STMT (dr);
      dataref->offset = 0;
      dataref->distance = 0;

      dataref->always_accessed
	= dominated_by_p (CDI_DOMINATORS, last_always_executed,
			  gimple_bb (dataref->stmt));
      dataref->pos = comp->refs.length ();
      comp->refs.quick_push (dataref);
    }

  if (eliminate_store_p)
    {
      bitmap_iterator bi;
      EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi)
	{
	  ca = component_of (comp_father, ia);
	  if (ca != bad)
	    comps[ca]->eliminate_store_p = false;
	}
    }

  for (i = 0; i < n; i++)
    {
      comp = comps[i];
      if (comp)
	{
	  comp->next = comp_list;
	  comp_list = comp;
	}
    }
  return comp_list;
}

/* Returns true if the component COMP satisfies the conditions
   described in 2) at the beginning of this file.  */

bool
pcom_worker::suitable_component_p (struct component *comp)
{
  unsigned i;
  dref a, first;
  basic_block ba, bp = m_loop->header;
  bool ok, has_write = false;

  FOR_EACH_VEC_ELT (comp->refs, i, a)
    {
      ba = gimple_bb (a->stmt);

      if (!just_once_each_iteration_p (m_loop, ba))
	return false;

      gcc_assert (dominated_by_p (CDI_DOMINATORS, ba, bp));
      bp = ba;

      if (DR_IS_WRITE (a->ref))
	has_write = true;
    }

  first = comp->refs[0];
  ok = suitable_reference_p (first->ref, &comp->comp_step);
  gcc_assert (ok);
  first->offset = 0;

  for (i = 1; comp->refs.iterate (i, &a); i++)
    {
      /* Polynomial offsets are no use, since we need to know the
	 gap between iteration numbers at compile time.  */
      poly_widest_int offset;
      if (!determine_offset (first->ref, a->ref, &offset)
	  || !offset.is_constant (&a->offset))
	return false;

      enum ref_step_type a_step;
      gcc_checking_assert (suitable_reference_p (a->ref, &a_step)
			   && a_step == comp->comp_step);
    }

  /* If there is a write inside the component, we must know whether the
     step is nonzero or not -- we would not otherwise be able to recognize
     whether the value accessed by reads comes from the OFFSET-th iteration
     or the previous one.  */
  if (has_write && comp->comp_step == RS_ANY)
    return false;

  return true;
}

/* Check the conditions on references inside each of components COMPS,
   and remove the unsuitable components from the list.  The new list
   of components is returned.  The conditions are described in 2) at
   the beginning of this file.  */

struct component *
pcom_worker::filter_suitable_components (struct component *comps)
{
  struct component **comp, *act;

  for (comp = &comps; *comp; )
    {
      act = *comp;
      if (suitable_component_p (act))
	comp = &act->next;
      else
	{
	  dref ref;
	  unsigned i;

	  *comp = act->next;
	  FOR_EACH_VEC_ELT (act->refs, i, ref)
	    free (ref);
	  delete act;
	}
    }

  return comps;
}

/* Compares two drefs A and B by their offset and position.  Callback for
   qsort.  */

static int
order_drefs (const void *a, const void *b)
{
  const dref *const da = (const dref *) a;
  const dref *const db = (const dref *) b;
  int offcmp = wi::cmps ((*da)->offset, (*db)->offset);

  if (offcmp != 0)
    return offcmp;

  return (*da)->pos - (*db)->pos;
}

/* Compares two drefs A and B by their position.  Callback for qsort.  */

static int
order_drefs_by_pos (const void *a, const void *b)
{
  const dref *const da = (const dref *) a;
  const dref *const db = (const dref *) b;

  return (*da)->pos - (*db)->pos;
}

/* Returns root of the CHAIN.  */

static inline dref
get_chain_root (chain_p chain)
{
  return chain->refs[0];
}

/* Given CHAIN, returns the last write ref at DISTANCE, or NULL if it doesn't
   exist.  */

static inline dref
get_chain_last_write_at (chain_p chain, unsigned distance)
{
  for (unsigned i = chain->refs.length (); i > 0; i--)
    if (DR_IS_WRITE (chain->refs[i - 1]->ref)
	&& distance == chain->refs[i - 1]->distance)
      return chain->refs[i - 1];

  return NULL;
}

/* Given CHAIN, returns the last write ref with the same distance before load
   at index LOAD_IDX, or NULL if it doesn't exist.  */

static inline dref
get_chain_last_write_before_load (chain_p chain, unsigned load_idx)
{
  gcc_assert (load_idx < chain->refs.length ());

  unsigned distance = chain->refs[load_idx]->distance;

  for (unsigned i = load_idx; i > 0; i--)
    if (DR_IS_WRITE (chain->refs[i - 1]->ref)
	&& distance == chain->refs[i - 1]->distance)
      return chain->refs[i - 1];

  return NULL;
}

/* Adds REF to the chain CHAIN.  */

static void
add_ref_to_chain (chain_p chain, dref ref)
{
  dref root = get_chain_root (chain);

  gcc_assert (wi::les_p (root->offset, ref->offset));
  widest_int dist = ref->offset - root->offset;
  gcc_assert (wi::fits_uhwi_p (dist));

  chain->refs.safe_push (ref);

  ref->distance = dist.to_uhwi ();

  if (ref->distance >= chain->length)
    {
      chain->length = ref->distance;
      chain->has_max_use_after = false;
    }

  /* Promote this chain to CT_STORE_STORE if it has multiple stores.  */
  if (DR_IS_WRITE (ref->ref))
    chain->type = CT_STORE_STORE;

  /* Don't set the flag for store-store chain since there is no use.  */
  if (chain->type != CT_STORE_STORE
      && ref->distance == chain->length
      && ref->pos > root->pos)
    chain->has_max_use_after = true;

  chain->all_always_accessed &= ref->always_accessed;
}

/* Returns the chain for invariant component COMP.  */

static chain_p
make_invariant_chain (struct component *comp)
{
  chain_p chain = new struct chain (CT_INVARIANT);
  unsigned i;
  dref ref;

  chain->all_always_accessed = true;

  FOR_EACH_VEC_ELT (comp->refs, i, ref)
    {
      chain->refs.safe_push (ref);
      chain->all_always_accessed &= ref->always_accessed;
    }

  chain->inits = vNULL;
  chain->finis = vNULL;

  return chain;
}

/* Make a new chain of type TYPE rooted at REF.  */

static chain_p
make_rooted_chain (dref ref, enum chain_type type)
{
  chain_p chain = new struct chain (type);

  chain->refs.safe_push (ref);
  chain->all_always_accessed = ref->always_accessed;
  ref->distance = 0;

  chain->inits = vNULL;
  chain->finis = vNULL;

  return chain;
}

/* Returns true if CHAIN is not trivial.  */

static bool
nontrivial_chain_p (chain_p chain)
{
  return chain != NULL && chain->refs.length () > 1;
}

/* Returns the ssa name that contains the value of REF, or NULL_TREE if there
   is no such name.  */

static tree
name_for_ref (dref ref)
{
  tree name;

  if (is_gimple_assign (ref->stmt))
    {
      if (!ref->ref || DR_IS_READ (ref->ref))
	name = gimple_assign_lhs (ref->stmt);
      else
	name = gimple_assign_rhs1 (ref->stmt);
    }
  else
    name = PHI_RESULT (ref->stmt);

  return (TREE_CODE (name) == SSA_NAME ? name : NULL_TREE);
}

/* Returns true if REF is a valid initializer for ROOT with given DISTANCE (in
   iterations of the innermost enclosing loop).  */

bool
pcom_worker::valid_initializer_p (struct data_reference *ref, unsigned distance,
				  struct data_reference *root)
{
  aff_tree diff, base, step;
  poly_widest_int off;

  /* Both REF and ROOT must be accessing the same object.  */
  if (!operand_equal_p (DR_BASE_ADDRESS (ref), DR_BASE_ADDRESS (root), 0))
    return false;

  /* The initializer is defined outside of loop, hence its address must be
     invariant inside the loop.  */
  gcc_assert (integer_zerop (DR_STEP (ref)));

  /* If the address of the reference is invariant, initializer must access
     exactly the same location.  */
  if (integer_zerop (DR_STEP (root)))
    return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0)
	    && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0));

  /* Verify that this index of REF is equal to the root's index at
     -DISTANCE-th iteration.  */
  aff_combination_dr_offset (root, &diff);
  aff_combination_dr_offset (ref, &base);
  aff_combination_scale (&base, -1);
  aff_combination_add (&diff, &base);

  tree_to_aff_combination_expand (DR_STEP (root), TREE_TYPE (DR_STEP (root)),
				  &step, &m_cache);
  if (!aff_combination_constant_multiple_p (&diff, &step, &off))
    return false;

  if (maybe_ne (off, distance))
    return false;

  return true;
}
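
/* For illustration (a hypothetical situation, assuming a 4-byte element
   type): if the chain is rooted at the read a[i] and the loop preheader
   contains the load t = a[-1], then for DISTANCE 1 the difference of the
   affine offsets of root and initializer is 4, exactly 1 times the root's
   step, so the preheader load is accepted as the initializer feeding the
   looparound phi node.  */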

/* Finds looparound phi node of loop that copies the value of REF, and if its
   initial value is correct (equal to initial value of REF shifted by one
   iteration), returns the phi node.  Otherwise, NULL is returned.  ROOT
   is the root of the current chain.  */

gphi *
pcom_worker::find_looparound_phi (dref ref, dref root)
{
  tree name, init, init_ref;
  gphi *phi = NULL;
  gimple *init_stmt;
  edge latch = loop_latch_edge (m_loop);
  struct data_reference init_dr;
  gphi_iterator psi;

  if (is_gimple_assign (ref->stmt))
    {
      if (DR_IS_READ (ref->ref))
	name = gimple_assign_lhs (ref->stmt);
      else
	name = gimple_assign_rhs1 (ref->stmt);
    }
  else
    name = PHI_RESULT (ref->stmt);
  if (!name)
    return NULL;

  for (psi = gsi_start_phis (m_loop->header); !gsi_end_p (psi); gsi_next (&psi))
    {
      phi = psi.phi ();
      if (PHI_ARG_DEF_FROM_EDGE (phi, latch) == name)
	break;
    }

  if (gsi_end_p (psi))
    return NULL;

  init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (m_loop));
  if (TREE_CODE (init) != SSA_NAME)
    return NULL;
  init_stmt = SSA_NAME_DEF_STMT (init);
  if (gimple_code (init_stmt) != GIMPLE_ASSIGN)
    return NULL;
  gcc_assert (gimple_assign_lhs (init_stmt) == init);

  init_ref = gimple_assign_rhs1 (init_stmt);
  if (!REFERENCE_CLASS_P (init_ref)
      && !DECL_P (init_ref))
    return NULL;

  /* Analyze the behavior of INIT_REF with respect to LOOP (innermost
     loop enclosing PHI).  */
  memset (&init_dr, 0, sizeof (struct data_reference));
  DR_REF (&init_dr) = init_ref;
  DR_STMT (&init_dr) = phi;
  if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, m_loop,
			     init_stmt))
    return NULL;

  if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref))
    return NULL;

  return phi;
}

/* Adds a reference for the looparound copy of REF in PHI to CHAIN.  */

static void
insert_looparound_copy (chain_p chain, dref ref, gphi *phi)
{
  dref nw = XCNEW (class dref_d), aref;
  unsigned i;

  nw->stmt = phi;
  nw->distance = ref->distance + 1;
  nw->always_accessed = 1;

  FOR_EACH_VEC_ELT (chain->refs, i, aref)
    if (aref->distance >= nw->distance)
      break;
  chain->refs.safe_insert (i, nw);

  if (nw->distance > chain->length)
    {
      chain->length = nw->distance;
      chain->has_max_use_after = false;
    }
}

/* For references in CHAIN that are copied around the loop (created previously
   by PRE, or by the user), add the results of such copies to the chain.  This
   enables us to remove the copies by unrolling, and may need fewer registers
   (also, it may allow us to combine chains together).  */

void
pcom_worker::add_looparound_copies (chain_p chain)
{
  unsigned i;
  dref ref, root = get_chain_root (chain);
  gphi *phi;

  if (chain->type == CT_STORE_STORE)
    return;

  FOR_EACH_VEC_ELT (chain->refs, i, ref)
    {
      phi = find_looparound_phi (ref, root);
      if (!phi)
	continue;

      bitmap_set_bit (m_looparound_phis, SSA_NAME_VERSION (PHI_RESULT (phi)));
      insert_looparound_copy (chain, ref, phi);
    }
}

/* Find roots of the values and determine distances in the component COMP.
   The references are redistributed into chains.  */

void
pcom_worker::determine_roots_comp (struct component *comp)
{
  unsigned i;
  dref a;
  chain_p chain = NULL;
  widest_int last_ofs = 0;
  enum chain_type type;

  /* Invariants are handled specially.  */
  if (comp->comp_step == RS_INVARIANT)
    {
      chain = make_invariant_chain (comp);
      m_chains.safe_push (chain);
      return;
    }

  /* Trivial component.  */
  if (comp->refs.length () <= 1)
    {
      if (comp->refs.length () == 1)
	{
	  free (comp->refs[0]);
	  comp->refs.truncate (0);
	}
      return;
    }

  comp->refs.qsort (order_drefs);

  /* For Store-Store chain, we only support a load if it is dominated by a
     store statement in the same iteration of the loop.  */
  if (comp->eliminate_store_p)
    for (a = NULL, i = 0; i < comp->refs.length (); i++)
      {
	if (DR_IS_WRITE (comp->refs[i]->ref))
	  a = comp->refs[i];
	else if (a == NULL || a->offset != comp->refs[i]->offset)
	  {
	    /* If there is a load that is not dominated by a store in the
	       same iteration of the loop, clear the flag so no Store-Store
	       chain is generated for this component.  */
	    comp->eliminate_store_p = false;
	    break;
	  }
      }

  /* Determine roots and create chains for components.  */
  FOR_EACH_VEC_ELT (comp->refs, i, a)
    {
      if (!chain
	  || (chain->type == CT_LOAD && DR_IS_WRITE (a->ref))
	  || (!comp->eliminate_store_p && DR_IS_WRITE (a->ref))
	  || wi::leu_p (MAX_DISTANCE, a->offset - last_ofs))
	{
	  if (nontrivial_chain_p (chain))
	    {
	      add_looparound_copies (chain);
	      m_chains.safe_push (chain);
	    }
	  else
	    release_chain (chain);

	  /* Determine type of the chain.  If the root reference is a load,
	     this can only be a CT_LOAD chain; other chains are initialized
	     to CT_STORE_LOAD and might be promoted to CT_STORE_STORE when
	     a new reference is added.  */
	  type = DR_IS_READ (a->ref) ? CT_LOAD : CT_STORE_LOAD;
	  chain = make_rooted_chain (a, type);
	  last_ofs = a->offset;
	  continue;
	}

      add_ref_to_chain (chain, a);
    }

  if (nontrivial_chain_p (chain))
    {
      add_looparound_copies (chain);
      m_chains.safe_push (chain);
    }
  else
    release_chain (chain);
}

/* Find roots of the values and determine distances in components COMPS, and
   separate the references into chains.  */

void
pcom_worker::determine_roots (struct component *comps)
{
  struct component *comp;

  for (comp = comps; comp; comp = comp->next)
    determine_roots_comp (comp);
}

/* Replace the reference in statement STMT with temporary variable
   NEW_TREE.  If SET is true, NEW_TREE is instead initialized to the value of
   the reference in the statement.  IN_LHS is true if the reference
   is in the lhs of STMT, false if it is in rhs.  */

static void
replace_ref_with (gimple *stmt, tree new_tree, bool set, bool in_lhs)
{
  tree val;
  gassign *new_stmt;
  gimple_stmt_iterator bsi, psi;

  if (gimple_code (stmt) == GIMPLE_PHI)
    {
      gcc_assert (!in_lhs && !set);

      val = PHI_RESULT (stmt);
      bsi = gsi_after_labels (gimple_bb (stmt));
      psi = gsi_for_stmt (stmt);
      remove_phi_node (&psi, false);

      /* Turn the phi node into GIMPLE_ASSIGN.  */
      new_stmt = gimple_build_assign (val, new_tree);
      gsi_insert_before (&bsi, new_stmt, GSI_NEW_STMT);
      return;
    }

  /* Since the reference is of gimple_reg type, it should only
     appear as lhs or rhs of modify statement.  */
  gcc_assert (is_gimple_assign (stmt));

  bsi = gsi_for_stmt (stmt);

  /* If we do not need to initialize NEW_TREE, just replace the use of OLD.  */
  if (!set)
    {
      gcc_assert (!in_lhs);
      gimple_assign_set_rhs_from_tree (&bsi, new_tree);
      stmt = gsi_stmt (bsi);
      update_stmt (stmt);
      return;
    }

  if (in_lhs)
    {
      /* We have statement

	 OLD = VAL

	 If OLD is a memory reference, then VAL is gimple_val, and we transform
	 this to

	 OLD = VAL
	 NEW = VAL

	 Otherwise, we are replacing a combination chain,
	 VAL is the expression that performs the combination, and OLD is an
	 SSA name.  In this case, we transform the assignment to

	 OLD = VAL
	 NEW = OLD  */

      val = gimple_assign_lhs (stmt);
      if (TREE_CODE (val) != SSA_NAME)
	{
	  val = gimple_assign_rhs1 (stmt);
	  gcc_assert (gimple_assign_single_p (stmt));
	  if (TREE_CLOBBER_P (val))
	    val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (new_tree));
	  else
	    gcc_assert (gimple_assign_copy_p (stmt));
	}
    }
  else
    {
      /* VAL = OLD

	 is transformed to

	 VAL = OLD
	 NEW = VAL  */

      val = gimple_assign_lhs (stmt);
    }

  new_stmt = gimple_build_assign (new_tree, unshare_expr (val));
  gsi_insert_after (&bsi, new_stmt, GSI_NEW_STMT);
}

/* Returns a memory reference to DR in the (NITERS + ITER)-th iteration
   of the loop it was analyzed in.  Append init stmts to STMTS.  */

static tree
ref_at_iteration (data_reference_p dr, int iter,
		  gimple_seq *stmts, tree niters = NULL_TREE)
{
  tree off = DR_OFFSET (dr);
  tree coff = DR_INIT (dr);
  tree ref = DR_REF (dr);
  enum tree_code ref_code = ERROR_MARK;
  tree ref_type = NULL_TREE;
  tree ref_op1 = NULL_TREE;
  tree ref_op2 = NULL_TREE;
  tree new_offset;

  if (iter != 0)
    {
      new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));
      if (TREE_CODE (new_offset) == INTEGER_CST)
	coff = size_binop (PLUS_EXPR, coff, new_offset);
      else
	off = size_binop (PLUS_EXPR, off, new_offset);
    }

  if (niters != NULL_TREE)
    {
      niters = fold_convert (ssizetype, niters);
      new_offset = size_binop (MULT_EXPR, DR_STEP (dr), niters);
      if (TREE_CODE (niters) == INTEGER_CST)
	coff = size_binop (PLUS_EXPR, coff, new_offset);
      else
	off = size_binop (PLUS_EXPR, off, new_offset);
    }

  /* While data-ref analysis punts on bit offsets it still handles
     bitfield accesses at byte boundaries.  Cope with that.  Note that
     if the bitfield object also starts at a byte-boundary we can simply
     replicate the COMPONENT_REF, but we have to subtract the component's
     byte-offset from the MEM_REF address first.
     Otherwise we simply build a BIT_FIELD_REF knowing that the bits
     start at offset zero.  */
  if (TREE_CODE (ref) == COMPONENT_REF
      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
    {
      unsigned HOST_WIDE_INT boff;
      tree field = TREE_OPERAND (ref, 1);
      tree offset = component_ref_field_offset (ref);
      ref_type = TREE_TYPE (ref);
      boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
      /* This can occur in Ada.  See the comment in get_bit_range.  */
      if (boff % BITS_PER_UNIT != 0
	  || !tree_fits_uhwi_p (offset))
	{
	  ref_code = BIT_FIELD_REF;
	  ref_op1 = DECL_SIZE (field);
	  ref_op2 = bitsize_zero_node;
	}
      else
	{
	  boff >>= LOG2_BITS_PER_UNIT;
	  boff += tree_to_uhwi (offset);
	  coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
	  ref_code = COMPONENT_REF;
	  ref_op1 = field;
	  ref_op2 = TREE_OPERAND (ref, 2);
	  ref = TREE_OPERAND (ref, 0);
	}
    }
  tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
  addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
				 is_gimple_mem_ref_addr, NULL_TREE);
  tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
  tree type = build_aligned_type (TREE_TYPE (ref),
				  get_object_alignment (ref));
  ref = build2 (MEM_REF, type, addr, alias_ptr);
  if (ref_type)
    ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
  return ref;
}

/* Get the initialization expression for the INDEX-th temporary variable
   of CHAIN.  */

static tree
get_init_expr (chain_p chain, unsigned index)
{
  if (chain->type == CT_COMBINATION)
    {
      tree e1 = get_init_expr (chain->ch1, index);
      tree e2 = get_init_expr (chain->ch2, index);

      return fold_build2 (chain->op, chain->rslt_type, e1, e2);
    }
  else
    return chain->inits[index];
}

/* Returns a new temporary variable used for the I-th variable carrying
   value of REF.  The variable's uid is marked in TMP_VARS.  */

static tree
predcom_tmp_var (tree ref, unsigned i, bitmap tmp_vars)
{
  tree type = TREE_TYPE (ref);
  /* We never access the components of the temporary variable in predictive
     commoning.  */
  tree var = create_tmp_reg (type, get_lsm_tmp_name (ref, i));
  bitmap_set_bit (tmp_vars, DECL_UID (var));
  return var;
}

/* Creates the variables for CHAIN, as well as phi nodes for them and
   initialization on entry to LOOP.  Uids of the newly created
   temporary variables are marked in TMP_VARS.  */

static void
initialize_root_vars (class loop *loop, chain_p chain, bitmap tmp_vars)
{
  unsigned i;
  unsigned n = chain->length;
  dref root = get_chain_root (chain);
  bool reuse_first = !chain->has_max_use_after;
  tree ref, init, var, next;
  gphi *phi;
  gimple_seq stmts;
  edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);

  /* If N == 0, then all the references are within the single iteration.  And
     since this is a nonempty chain, reuse_first cannot be true.  */
  gcc_assert (n > 0 || !reuse_first);

  chain->vars.create (n + 1);

  if (chain->type == CT_COMBINATION)
    ref = gimple_assign_lhs (root->stmt);
  else
    ref = DR_REF (root->ref);

  for (i = 0; i < n + (reuse_first ? 0 : 1); i++)
    {
      var = predcom_tmp_var (ref, i, tmp_vars);
      chain->vars.quick_push (var);
    }
  if (reuse_first)
    chain->vars.quick_push (chain->vars[0]);

  FOR_EACH_VEC_ELT (chain->vars, i, var)
    chain->vars[i] = make_ssa_name (var);

  for (i = 0; i < n; i++)
    {
      var = chain->vars[i];
      next = chain->vars[i + 1];
      init = get_init_expr (chain, i);

      init = force_gimple_operand (init, &stmts, true, NULL_TREE);
      if (stmts)
	gsi_insert_seq_on_edge_immediate (entry, stmts);

      phi = create_phi_node (var, loop->header);
      add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
      add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
    }
}
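
/* As an illustration (schematic GIMPLE, not literal pass output): for the
   chain over a[i], a[i+1] and a[i+2] from the example in the header comment
   (N = 2, reuse_first), the function above creates roughly

     # f_1 = PHI <a[0](preheader), s_2(latch)>
     # s_2 = PHI <a[1](preheader), f_1(latch)>

   in the loop header, so the values loaded in the previous two iterations
   are available without re-reading memory.  */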

/* For inter-iteration store elimination CHAIN in LOOP, returns true if
   all stores to be eliminated store loop invariant values into memory.
   In this case, we can use these invariant values directly after LOOP.  */

static bool
is_inv_store_elimination_chain (class loop *loop, chain_p chain)
{
  if (chain->length == 0 || chain->type != CT_STORE_STORE)
    return false;

  gcc_assert (!chain->has_max_use_after);

  /* If loop iterates for unknown times or fewer times than chain->length,
     we still need to setup root variable and propagate it with PHI node.  */
  tree niters = number_of_latch_executions (loop);
  if (TREE_CODE (niters) != INTEGER_CST
      || wi::leu_p (wi::to_wide (niters), chain->length))
    return false;

  /* Check stores in chain for elimination if they only store loop invariant
     values.  */
  for (unsigned i = 0; i < chain->length; i++)
    {
      dref a = get_chain_last_write_at (chain, i);
      if (a == NULL)
	continue;

      gimple *def_stmt, *stmt = a->stmt;
      if (!gimple_assign_single_p (stmt))
	return false;

      tree val = gimple_assign_rhs1 (stmt);
      if (TREE_CLOBBER_P (val))
	return false;

      if (CONSTANT_CLASS_P (val))
	continue;

      if (TREE_CODE (val) != SSA_NAME)
	return false;

      def_stmt = SSA_NAME_DEF_STMT (val);
      if (gimple_nop_p (def_stmt))
	continue;

      if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
	return false;
    }
  return true;
}
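
/* For example (an invariant store-store chain in the sense above), in

     for (i = 0; i < n; i++)
       {
	 a[i] = c;
	 a[i + 2] = c;
       }

   where c does not change inside the loop, the killed stores need not be
   carried in variables at all; the code after the loop can simply store c
   to a[n] and a[n + 1].  */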

/* Creates root variables for store elimination CHAIN in which stores for
   elimination only store loop invariant values.  In this case, we neither
   need to load root variables before the loop nor propagate them with PHI
   nodes.  */

static void
initialize_root_vars_store_elim_1 (chain_p chain)
{
  tree var;
  unsigned i, n = chain->length;

  chain->vars.create (n);
  chain->vars.safe_grow_cleared (n, true);

  /* Initialize root value for eliminated stores at each distance.  */
  for (i = 0; i < n; i++)
    {
      dref a = get_chain_last_write_at (chain, i);
      if (a == NULL)
	continue;

      var = gimple_assign_rhs1 (a->stmt);
      chain->vars[a->distance] = var;
    }

  /* We don't propagate values with PHI nodes, so manually propagate value
     to bubble positions.  */
  var = chain->vars[0];
  for (i = 1; i < n; i++)
    {
      if (chain->vars[i] != NULL_TREE)
	{
	  var = chain->vars[i];
	  continue;
	}
      chain->vars[i] = var;
    }

  /* Reverse the vector.  */
  for (i = 0; i < n / 2; i++)
    std::swap (chain->vars[i], chain->vars[n - i - 1]);
}
1935 /* Creates root variables for store elimination CHAIN in which stores for
1936 elimination store loop variant values. In this case, we may need to
1937 load root variables before LOOP and propagate it with PHI nodes. Uids
1938 of the newly created root variables are marked in TMP_VARS. */
1940 static void
1941 initialize_root_vars_store_elim_2 (class loop *loop,
1942 chain_p chain, bitmap tmp_vars)
1944 unsigned i, n = chain->length;
1945 tree ref, init, var, next, val, phi_result;
1946 gimple *stmt;
1947 gimple_seq stmts;
1949 chain->vars.create (n);
1951 ref = DR_REF (get_chain_root (chain)->ref);
1952 for (i = 0; i < n; i++)
1954 var = predcom_tmp_var (ref, i, tmp_vars);
1955 chain->vars.quick_push (var);
1958 FOR_EACH_VEC_ELT (chain->vars, i, var)
1959 chain->vars[i] = make_ssa_name (var);
1961 /* Root values are either rhs operand of stores to be eliminated, or
1962 loaded from memory before loop. */
1963 auto_vec<tree> vtemps;
1964 vtemps.safe_grow_cleared (n, true);
1965 for (i = 0; i < n; i++)
1967 init = get_init_expr (chain, i);
1968 if (init == NULL_TREE)
1970 /* Root value is rhs operand of the store to be eliminated if
1971 it isn't loaded from memory before loop. */
1972 dref a = get_chain_last_write_at (chain, i);
1973 val = gimple_assign_rhs1 (a->stmt);
1974 if (TREE_CLOBBER_P (val))
1976 val = get_or_create_ssa_default_def (cfun, SSA_NAME_VAR (var));
1977 gimple_assign_set_rhs1 (a->stmt, val);
1980 vtemps[n - i - 1] = val;
1982 else
1984 edge latch = loop_latch_edge (loop);
1985 edge entry = loop_preheader_edge (loop);
1987 /* Root value is loaded from memory before loop, we also need
1988 to add PHI nodes to propagate the value across iterations. */
1989 init = force_gimple_operand (init, &stmts, true, NULL_TREE);
1990 if (stmts)
1991 gsi_insert_seq_on_edge_immediate (entry, stmts);
1993 next = chain->vars[n - i];
1994 phi_result = copy_ssa_name (next);
1995 gphi *phi = create_phi_node (phi_result, loop->header);
1996 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
1997 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
1998 vtemps[n - i - 1] = phi_result;
2002 /* Find the insertion position. */
2003 dref last = get_chain_root (chain);
2004 for (i = 0; i < chain->refs.length (); i++)
2006 if (chain->refs[i]->pos > last->pos)
2007 last = chain->refs[i];
2010 gimple_stmt_iterator gsi = gsi_for_stmt (last->stmt);
2012 /* Insert statements copying root value to root variable. */
2013 for (i = 0; i < n; i++)
2015 var = chain->vars[i];
2016 val = vtemps[i];
2017 stmt = gimple_build_assign (var, val);
2018 gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
2022 /* Generates stores for CHAIN's eliminated stores in LOOP's last
2023 (CHAIN->length - 1) iterations. */
2025 static void
2026 finalize_eliminated_stores (class loop *loop, chain_p chain)
2028 unsigned i, n = chain->length;
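  /* Note: the loop below pairs chain->vars[i] with chain->finis[n - i - 1],
     i.e. the finalizer references are indexed in the reverse order of the
     root variables; the resulting store sequence is then inserted on the
     loop's single exit edge. */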
2030 for (i = 0; i < n; i++)
2032 tree var = chain->vars[i];
2033 tree fini = chain->finis[n - i - 1];
2034 gimple *stmt = gimple_build_assign (fini, var);
2036 gimple_seq_add_stmt_without_update (&chain->fini_seq, stmt);
2039 if (chain->fini_seq)
2041 gsi_insert_seq_on_edge_immediate (single_exit (loop), chain->fini_seq);
2042 chain->fini_seq = NULL;
2046 /* Initializes a variable for load motion for ROOT and prepares phi nodes and
2047 initialization on entry to LOOP if necessary. The ssa name for the variable
2048 is stored in VARS. If WRITTEN is true, a phi node is also created to copy
2049 its value around the loop. Uid of the newly created temporary variable
2050 is marked in TMP_VARS. INITS is the list containing the (single)
2051 initializer. */
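/* For illustration, assuming an invariant location *p that the loop both
   reads and writes: the temporary is initialized from *p on the preheader
   edge and a PHI node in the loop header carries the latest value around the
   loop; if *p is only read, the preheader initialization alone suffices. */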
2053 static void
2054 initialize_root_vars_lm (class loop *loop, dref root, bool written,
2055 vec<tree> *vars, const vec<tree> &inits,
2056 bitmap tmp_vars)
2058 unsigned i;
2059 tree ref = DR_REF (root->ref), init, var, next;
2060 gimple_seq stmts;
2061 gphi *phi;
2062 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop);
2064 /* Find the initializer for the variable, and check that it cannot
2065 trap. */
2066 init = inits[0];
2068 vars->create (written ? 2 : 1);
2069 var = predcom_tmp_var (ref, 0, tmp_vars);
2070 vars->quick_push (var);
2071 if (written)
2072 vars->quick_push ((*vars)[0]);
2074 FOR_EACH_VEC_ELT (*vars, i, var)
2075 (*vars)[i] = make_ssa_name (var);
2077 var = (*vars)[0];
2079 init = force_gimple_operand (init, &stmts, written, NULL_TREE);
2080 if (stmts)
2081 gsi_insert_seq_on_edge_immediate (entry, stmts);
2083 if (written)
2085 next = (*vars)[1];
2086 phi = create_phi_node (var, loop->header);
2087 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION);
2088 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION);
2090 else
2092 gassign *init_stmt = gimple_build_assign (var, init);
2093 gsi_insert_on_edge_immediate (entry, init_stmt);
2098 /* Execute load motion for references in chain CHAIN. Uids of the newly
2099 created temporary variables are marked in TMP_VARS. */
2101 static void
2102 execute_load_motion (class loop *loop, chain_p chain, bitmap tmp_vars)
2104 auto_vec<tree> vars;
2105 dref a;
2106 unsigned n_writes = 0, ridx, i;
2107 tree var;
2109 gcc_assert (chain->type == CT_INVARIANT);
2110 gcc_assert (!chain->combined);
2111 FOR_EACH_VEC_ELT (chain->refs, i, a)
2112 if (DR_IS_WRITE (a->ref))
2113 n_writes++;
2115 /* If there are no reads in the loop, there is nothing to do. */
2116 if (n_writes == chain->refs.length ())
2117 return;
2119 initialize_root_vars_lm (loop, get_chain_root (chain), n_writes > 0,
2120 &vars, chain->inits, tmp_vars);
2122 ridx = 0;
2123 FOR_EACH_VEC_ELT (chain->refs, i, a)
2125 bool is_read = DR_IS_READ (a->ref);
2127 if (DR_IS_WRITE (a->ref))
2129 n_writes--;
2130 if (n_writes)
2132 var = vars[0];
2133 var = make_ssa_name (SSA_NAME_VAR (var));
2134 vars[0] = var;
2136 else
2137 ridx = 1;
2140 replace_ref_with (a->stmt, vars[ridx],
2141 !is_read, !is_read);
2145 /* Returns the single statement in which NAME is used, excepting
2146 the looparound phi nodes contained in one of the chains. If there is no
2147 such statement, or if there are multiple such statements, NULL is returned. */
2149 gimple *
2150 pcom_worker::single_nonlooparound_use (tree name)
2152 use_operand_p use;
2153 imm_use_iterator it;
2154 gimple *stmt, *ret = NULL;
2156 FOR_EACH_IMM_USE_FAST (use, it, name)
2158 stmt = USE_STMT (use);
2160 if (gimple_code (stmt) == GIMPLE_PHI)
2162 /* Ignore uses in looparound phi nodes. Uses in other phi nodes
2163 could not be processed anyway, so just fail for them. */
2164 if (bitmap_bit_p (m_looparound_phis,
2165 SSA_NAME_VERSION (PHI_RESULT (stmt))))
2166 continue;
2168 return NULL;
2170 else if (is_gimple_debug (stmt))
2171 continue;
2172 else if (ret != NULL)
2173 return NULL;
2174 else
2175 ret = stmt;
2178 return ret;
2181 /* Remove statement STMT, as well as the chain of assignments in which it is
2182 used. */
2184 void
2185 pcom_worker::remove_stmt (gimple *stmt)
2187 tree name;
2188 gimple *next;
2189 gimple_stmt_iterator psi;
2191 if (gimple_code (stmt) == GIMPLE_PHI)
2193 name = PHI_RESULT (stmt);
2194 next = single_nonlooparound_use (name);
2195 reset_debug_uses (stmt);
2196 psi = gsi_for_stmt (stmt);
2197 remove_phi_node (&psi, true);
2199 if (!next
2200 || !gimple_assign_ssa_name_copy_p (next)
2201 || gimple_assign_rhs1 (next) != name)
2202 return;
2204 stmt = next;
2207 while (1)
2209 gimple_stmt_iterator bsi;
2211 bsi = gsi_for_stmt (stmt);
2213 name = gimple_assign_lhs (stmt);
2214 if (TREE_CODE (name) == SSA_NAME)
2216 next = single_nonlooparound_use (name);
2217 reset_debug_uses (stmt);
2219 else
2221 /* This is a store to be eliminated. */
2222 gcc_assert (gimple_vdef (stmt) != NULL);
2223 next = NULL;
2226 unlink_stmt_vdef (stmt);
2227 gsi_remove (&bsi, true);
2228 release_defs (stmt);
2230 if (!next
2231 || !gimple_assign_ssa_name_copy_p (next)
2232 || gimple_assign_rhs1 (next) != name)
2233 return;
2235 stmt = next;
2239 /* Perform the predictive commoning optimization for a chain CHAIN.
2240 Uids of the newly created temporary variables are marked in TMP_VARS. */
2242 void
2243 pcom_worker::execute_pred_commoning_chain (chain_p chain,
2244 bitmap tmp_vars)
2246 unsigned i;
2247 dref a;
2248 tree var;
2249 bool in_lhs;
2251 if (chain->combined)
2253 /* For combined chains, just remove the statements that are used to
2254 compute the values of the expression (except for the root one).
2255 We delay this until after all chains are processed. */
2257 else if (chain->type == CT_STORE_STORE)
2259 if (chain->length > 0)
2261 if (chain->inv_store_elimination)
2263 /* If dead stores in this chain only store loop invariant
2264 values, we can simply record the invariant value and use
2265 it directly after loop. */
2266 initialize_root_vars_store_elim_1 (chain);
2268 else
2270 /* If dead stores in this chain store loop variant values,
2271 we need to set up the variables by loading from memory
2272 before the loop and propagating them with PHI nodes. */
2273 initialize_root_vars_store_elim_2 (m_loop, chain, tmp_vars);
2276 /* For an inter-iteration store elimination chain, stores at each
2277 distance in the loop's last (chain->length - 1) iterations can't
2278 be eliminated, because there is no following killing store.
2279 We need to generate these stores after the loop. */
2280 finalize_eliminated_stores (m_loop, chain);
2283 bool last_store_p = true;
2284 for (i = chain->refs.length (); i > 0; i--)
2286 a = chain->refs[i - 1];
2287 /* Preserve the last store of the chain. Eliminate other stores
2288 which are killed by the last one. */
2289 if (DR_IS_WRITE (a->ref))
2291 if (last_store_p)
2292 last_store_p = false;
2293 else
2294 remove_stmt (a->stmt);
2296 continue;
2299 /* Any load in a Store-Store chain must be dominated by a previous
2300 store, so we replace the load reference with the rhs of the store. */
2301 dref b = get_chain_last_write_before_load (chain, i - 1);
2302 gcc_assert (b != NULL);
2303 var = gimple_assign_rhs1 (b->stmt);
2304 replace_ref_with (a->stmt, var, false, false);
2307 else
2309 /* For non-combined chains, set up the variables that hold its value. */
2310 initialize_root_vars (m_loop, chain, tmp_vars);
2311 a = get_chain_root (chain);
2312 in_lhs = (chain->type == CT_STORE_LOAD
2313 || chain->type == CT_COMBINATION);
2314 replace_ref_with (a->stmt, chain->vars[chain->length], true, in_lhs);
2316 /* Replace the uses of the original references by these variables. */
2317 for (i = 1; chain->refs.iterate (i, &a); i++)
2319 var = chain->vars[chain->length - a->distance];
2320 replace_ref_with (a->stmt, var, false, false);
2325 /* Determines the unroll factor necessary to remove as many temporary variable
2326 copies as possible. CHAINS is the list of chains that will be
2327 optimized. */
2329 static unsigned
2330 determine_unroll_factor (const vec<chain_p> &chains)
2332 chain_p chain;
2333 unsigned factor = 1, af, nfactor, i;
2334 unsigned max = param_max_unroll_times;
2336 FOR_EACH_VEC_ELT (chains, i, chain)
2338 if (chain->type == CT_INVARIANT)
2339 continue;
2340 /* For now we can't handle unrolling when eliminating stores. */
2341 else if (chain->type == CT_STORE_STORE)
2342 return 1;
2344 if (chain->combined)
2346 /* For combined chains, we can't handle unrolling if we replace
2347 looparound PHIs. */
2348 dref a;
2349 unsigned j;
2350 for (j = 1; chain->refs.iterate (j, &a); j++)
2351 if (gimple_code (a->stmt) == GIMPLE_PHI)
2352 return 1;
2353 continue;
2356 /* The best unroll factor for this chain is equal to the number of
2357 temporary variables that we create for it. */
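      /* A worked example (illustrative only): two chains needing 2 and 3
	 temporaries combine to an unroll factor of lcm (2, 3) == 6, computed
	 below as factor * af / gcd (factor, af); the new factor is only
	 accepted while it does not exceed param_max_unroll_times. */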
2358 af = chain->length;
2359 if (chain->has_max_use_after)
2360 af++;
2362 nfactor = factor * af / gcd (factor, af);
2363 if (nfactor <= max)
2364 factor = nfactor;
2367 return factor;
2370 /* Perform the predictive commoning optimization for chains.
2371 Uids of the newly created temporary variables are marked in TMP_VARS. */
2373 void
2374 pcom_worker::execute_pred_commoning (bitmap tmp_vars)
2376 chain_p chain;
2377 unsigned i;
2379 FOR_EACH_VEC_ELT (m_chains, i, chain)
2381 if (chain->type == CT_INVARIANT)
2382 execute_load_motion (m_loop, chain, tmp_vars);
2383 else
2384 execute_pred_commoning_chain (chain, tmp_vars);
2387 FOR_EACH_VEC_ELT (m_chains, i, chain)
2389 if (chain->type == CT_INVARIANT)
2391 else if (chain->combined)
2393 /* For combined chains, just remove the statements that are used to
2394 compute the values of the expression (except for the root one). */
2395 dref a;
2396 unsigned j;
2397 for (j = 1; chain->refs.iterate (j, &a); j++)
2398 remove_stmt (a->stmt);
2403 /* For each reference in CHAINS, if its defining statement is
2404 a phi node, record the ssa name that is defined by it. */
2406 static void
2407 replace_phis_by_defined_names (vec<chain_p> &chains)
2409 chain_p chain;
2410 dref a;
2411 unsigned i, j;
2413 FOR_EACH_VEC_ELT (chains, i, chain)
2414 FOR_EACH_VEC_ELT (chain->refs, j, a)
2416 if (gimple_code (a->stmt) == GIMPLE_PHI)
2418 a->name_defined_by_phi = PHI_RESULT (a->stmt);
2419 a->stmt = NULL;
2424 /* For each reference in CHAINS, if name_defined_by_phi is not
2425 NULL, use it to set the stmt field. */
2427 static void
2428 replace_names_by_phis (vec<chain_p> chains)
2430 chain_p chain;
2431 dref a;
2432 unsigned i, j;
2434 FOR_EACH_VEC_ELT (chains, i, chain)
2435 FOR_EACH_VEC_ELT (chain->refs, j, a)
2436 if (a->stmt == NULL)
2438 a->stmt = SSA_NAME_DEF_STMT (a->name_defined_by_phi);
2439 gcc_assert (gimple_code (a->stmt) == GIMPLE_PHI);
2440 a->name_defined_by_phi = NULL_TREE;
2444 /* Wrapper over execute_pred_commoning, to pass it as a callback
2445 to tree_transform_and_unroll_loop. */
2447 struct epcc_data
2449 vec<chain_p> chains;
2450 bitmap tmp_vars;
2451 pcom_worker *worker;
2454 static void
2455 execute_pred_commoning_cbck (class loop *loop ATTRIBUTE_UNUSED, void *data)
2457 struct epcc_data *const dta = (struct epcc_data *) data;
2458 pcom_worker *worker = dta->worker;
2460 /* Restore phi nodes that were replaced by ssa names before
2461 tree_transform_and_unroll_loop (see detailed description in
2462 tree_predictive_commoning_loop). */
2463 replace_names_by_phis (dta->chains);
2464 worker->execute_pred_commoning (dta->tmp_vars);
2467 /* Base NAME and all the names in the chain of phi nodes that use it
2468 on variable VAR. The phi nodes are recognized by being in the copies of
2469 the header of the LOOP. */
2471 static void
2472 base_names_in_chain_on (class loop *loop, tree name, tree var)
2474 gimple *stmt, *phi;
2475 imm_use_iterator iter;
2477 replace_ssa_name_symbol (name, var);
2479 while (1)
2481 phi = NULL;
2482 FOR_EACH_IMM_USE_STMT (stmt, iter, name)
2484 if (gimple_code (stmt) == GIMPLE_PHI
2485 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
2487 phi = stmt;
2488 break;
2491 if (!phi)
2492 return;
2494 name = PHI_RESULT (phi);
2495 replace_ssa_name_symbol (name, var);
2499 /* Given an unrolled LOOP after predictive commoning, remove the
2500 register copies arising from phi nodes by changing the base
2501 variables of SSA names. TMP_VARS is the set of the temporary variables
2502 for which we want to perform this. */
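/* For illustration: after unrolling, each copy of a chain temporary gets its
   own SSA name, connected by copies on the loop latch. Rebasing all of the
   names in such a chain on the same variable lets the out-of-ssa phase
   coalesce them, so the latch copies disappear from the final code. */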
2504 static void
2505 eliminate_temp_copies (class loop *loop, bitmap tmp_vars)
2507 edge e;
2508 gphi *phi;
2509 gimple *stmt;
2510 tree name, use, var;
2511 gphi_iterator psi;
2513 e = loop_latch_edge (loop);
2514 for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
2516 phi = psi.phi ();
2517 name = PHI_RESULT (phi);
2518 var = SSA_NAME_VAR (name);
2519 if (!var || !bitmap_bit_p (tmp_vars, DECL_UID (var)))
2520 continue;
2521 use = PHI_ARG_DEF_FROM_EDGE (phi, e);
2522 gcc_assert (TREE_CODE (use) == SSA_NAME);
2524 /* Base all the ssa names in the ud and du chain of NAME on VAR. */
2525 stmt = SSA_NAME_DEF_STMT (use);
2526 while (gimple_code (stmt) == GIMPLE_PHI
2527 /* In case we could not unroll the loop enough to eliminate
2528 all copies, we may reach the loop header before the defining
2529 statement (in that case, some register copies will be present
2530 in loop latch in the final code, corresponding to the newly
2531 created looparound phi nodes). */
2532 && gimple_bb (stmt) != loop->header)
2534 gcc_assert (single_pred_p (gimple_bb (stmt)));
2535 use = PHI_ARG_DEF (stmt, 0);
2536 stmt = SSA_NAME_DEF_STMT (use);
2539 base_names_in_chain_on (loop, use, var);
2543 /* Returns true if CHAIN is suitable to be combined. */
2545 static bool
2546 chain_can_be_combined_p (chain_p chain)
2548 return (!chain->combined
2549 && (chain->type == CT_LOAD || chain->type == CT_COMBINATION));
2552 /* Returns the modify statement that uses NAME. Skips over assignment
2553 statements; NAME is replaced with the actual name used in the returned
2554 statement. */
2556 gimple *
2557 pcom_worker::find_use_stmt (tree *name)
2559 gimple *stmt;
2560 tree rhs, lhs;
2562 /* Skip over assignments. */
2563 while (1)
2565 stmt = single_nonlooparound_use (*name);
2566 if (!stmt)
2567 return NULL;
2569 if (gimple_code (stmt) != GIMPLE_ASSIGN)
2570 return NULL;
2572 lhs = gimple_assign_lhs (stmt);
2573 if (TREE_CODE (lhs) != SSA_NAME)
2574 return NULL;
2576 if (gimple_assign_copy_p (stmt))
2578 rhs = gimple_assign_rhs1 (stmt);
2579 if (rhs != *name)
2580 return NULL;
2582 *name = lhs;
2584 else if (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
2585 == GIMPLE_BINARY_RHS)
2586 return stmt;
2587 else
2588 return NULL;
2592 /* Returns true if we may perform reassociation for operation CODE in TYPE. */
2594 static bool
2595 may_reassociate_p (tree type, enum tree_code code)
2597 if (FLOAT_TYPE_P (type)
2598 && !flag_unsafe_math_optimizations)
2599 return false;
2601 return (commutative_tree_code (code)
2602 && associative_tree_code (code));
2605 /* If the operation used in STMT is associative and commutative, goes through the
2606 tree of the same operations and returns its root. The distance to the root
2607 is stored in DISTANCE. */
2609 gimple *
2610 pcom_worker::find_associative_operation_root (gimple *stmt, unsigned *distance)
2612 tree lhs;
2613 gimple *next;
2614 enum tree_code code = gimple_assign_rhs_code (stmt);
2615 tree type = TREE_TYPE (gimple_assign_lhs (stmt));
2616 unsigned dist = 0;
2618 if (!may_reassociate_p (type, code))
2619 return NULL;
2621 while (1)
2623 lhs = gimple_assign_lhs (stmt);
2624 gcc_assert (TREE_CODE (lhs) == SSA_NAME);
2626 next = find_use_stmt (&lhs);
2627 if (!next
2628 || gimple_assign_rhs_code (next) != code)
2629 break;
2631 stmt = next;
2632 dist++;
2635 if (distance)
2636 *distance = dist;
2637 return stmt;
2640 /* Returns the common statement in which NAME1 and NAME2 have a use. If there
2641 is no such statement, returns NULL. In case the operation used on
2642 NAME1 and NAME2 is associative and commutative, returns the root of the
2643 tree formed by this operation instead of the statement that uses NAME1 or
2644 NAME2. */
2646 gimple *
2647 pcom_worker::find_common_use_stmt (tree *name1, tree *name2)
2649 gimple *stmt1, *stmt2;
2651 stmt1 = find_use_stmt (name1);
2652 if (!stmt1)
2653 return NULL;
2655 stmt2 = find_use_stmt (name2);
2656 if (!stmt2)
2657 return NULL;
2659 if (stmt1 == stmt2)
2660 return stmt1;
2662 stmt1 = find_associative_operation_root (stmt1, NULL);
2663 if (!stmt1)
2664 return NULL;
2665 stmt2 = find_associative_operation_root (stmt2, NULL);
2666 if (!stmt2)
2667 return NULL;
2669 return (stmt1 == stmt2 ? stmt1 : NULL);
2672 /* Checks whether R1 and R2 are combined together using CODE, with the result
2673 in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order R2 CODE R1
2674 if it is true. If CODE is ERROR_MARK, set these values instead. */
2676 bool
2677 pcom_worker::combinable_refs_p (dref r1, dref r2,
2678 enum tree_code *code, bool *swap, tree *rslt_type)
2680 enum tree_code acode;
2681 bool aswap;
2682 tree atype;
2683 tree name1, name2;
2684 gimple *stmt;
2686 name1 = name_for_ref (r1);
2687 name2 = name_for_ref (r2);
2688 gcc_assert (name1 != NULL_TREE && name2 != NULL_TREE);
2690 stmt = find_common_use_stmt (&name1, &name2);
2692 if (!stmt
2693 /* A simple post-dominance check - make sure the combination
2694 is executed under the same condition as the references. */
2695 || (gimple_bb (stmt) != gimple_bb (r1->stmt)
2696 && gimple_bb (stmt) != gimple_bb (r2->stmt)))
2697 return false;
2699 acode = gimple_assign_rhs_code (stmt);
2700 aswap = (!commutative_tree_code (acode)
2701 && gimple_assign_rhs1 (stmt) != name1);
2702 atype = TREE_TYPE (gimple_assign_lhs (stmt));
2704 if (*code == ERROR_MARK)
2706 *code = acode;
2707 *swap = aswap;
2708 *rslt_type = atype;
2709 return true;
2712 return (*code == acode
2713 && *swap == aswap
2714 && *rslt_type == atype);
2717 /* Remove OP from the operation on rhs of STMT, and replace STMT with
2718 an assignment of the remaining operand. */
2720 static void
2721 remove_name_from_operation (gimple *stmt, tree op)
2723 tree other_op;
2724 gimple_stmt_iterator si;
2726 gcc_assert (is_gimple_assign (stmt));
2728 if (gimple_assign_rhs1 (stmt) == op)
2729 other_op = gimple_assign_rhs2 (stmt);
2730 else
2731 other_op = gimple_assign_rhs1 (stmt);
2733 si = gsi_for_stmt (stmt);
2734 gimple_assign_set_rhs_from_tree (&si, other_op);
2736 /* We should not have reallocated STMT. */
2737 gcc_assert (gsi_stmt (si) == stmt);
2739 update_stmt (stmt);
2742 /* Reassociates the expression in which NAME1 and NAME2 are used so that they
2743 are combined in a single statement, and returns this statement. */
2745 gimple *
2746 pcom_worker::reassociate_to_the_same_stmt (tree name1, tree name2)
2748 gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2;
2749 gassign *new_stmt, *tmp_stmt;
2750 tree new_name, tmp_name, var, r1, r2;
2751 unsigned dist1, dist2;
2752 enum tree_code code;
2753 tree type = TREE_TYPE (name1);
2754 gimple_stmt_iterator bsi;
2756 stmt1 = find_use_stmt (&name1);
2757 stmt2 = find_use_stmt (&name2);
2758 root1 = find_associative_operation_root (stmt1, &dist1);
2759 root2 = find_associative_operation_root (stmt2, &dist2);
2760 code = gimple_assign_rhs_code (stmt1);
2762 gcc_assert (root1 && root2 && root1 == root2
2763 && code == gimple_assign_rhs_code (stmt2));
2765 /* Find the root of the nearest expression in which both NAME1 and NAME2
2766 are used. */
2767 r1 = name1;
2768 s1 = stmt1;
2769 r2 = name2;
2770 s2 = stmt2;
2772 while (dist1 > dist2)
2774 s1 = find_use_stmt (&r1);
2775 r1 = gimple_assign_lhs (s1);
2776 dist1--;
2778 while (dist2 > dist1)
2780 s2 = find_use_stmt (&r2);
2781 r2 = gimple_assign_lhs (s2);
2782 dist2--;
2785 while (s1 != s2)
2787 s1 = find_use_stmt (&r1);
2788 r1 = gimple_assign_lhs (s1);
2789 s2 = find_use_stmt (&r2);
2790 r2 = gimple_assign_lhs (s2);
2793 /* Remove NAME1 and NAME2 from the statements in which they are
2794 currently used. */
2795 remove_name_from_operation (stmt1, name1);
2796 remove_name_from_operation (stmt2, name2);
2798 /* Insert the new statement combining NAME1 and NAME2 before S1, and
2799 combine it with the rhs of S1. */
2800 var = create_tmp_reg (type, "predreastmp");
2801 new_name = make_ssa_name (var);
2802 new_stmt = gimple_build_assign (new_name, code, name1, name2);
2804 var = create_tmp_reg (type, "predreastmp");
2805 tmp_name = make_ssa_name (var);
2807 /* Rhs of S1 may now be either a binary expression with operation
2808 CODE, or gimple_val (in case that stmt1 == s1 or stmt2 == s1,
2809 so that name1 or name2 was removed from it). */
2810 tmp_stmt = gimple_build_assign (tmp_name, gimple_assign_rhs_code (s1),
2811 gimple_assign_rhs1 (s1),
2812 gimple_assign_rhs2 (s1));
2814 bsi = gsi_for_stmt (s1);
2815 gimple_assign_set_rhs_with_ops (&bsi, code, new_name, tmp_name);
2816 s1 = gsi_stmt (bsi);
2817 update_stmt (s1);
2819 gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT);
2820 gsi_insert_before (&bsi, tmp_stmt, GSI_SAME_STMT);
2822 return new_stmt;
2825 /* Returns the statement that combines references R1 and R2. In case R1
2826 and R2 are not used in the same statement, but they are used with an
2827 associative and commutative operation in the same expression, reassociate
2828 the expression so that they are used in the same statement. */
2830 gimple *
2831 pcom_worker::stmt_combining_refs (dref r1, dref r2)
2833 gimple *stmt1, *stmt2;
2834 tree name1 = name_for_ref (r1);
2835 tree name2 = name_for_ref (r2);
2837 stmt1 = find_use_stmt (&name1);
2838 stmt2 = find_use_stmt (&name2);
2839 if (stmt1 == stmt2)
2840 return stmt1;
2842 return reassociate_to_the_same_stmt (name1, name2);
2845 /* Tries to combine chains CH1 and CH2 together. If this succeeds, the
2846 description of the new chain is returned, otherwise we return NULL. */
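/* For illustration: in a loop containing
     t = x[i] + y[i];
   where the only uses of the loads x[i] and y[i] at every distance are in the
   same PLUS_EXPR, the two load chains can be combined into one CT_COMBINATION
   chain whose references are the statements computing the sums. */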
2848 chain_p
2849 pcom_worker::combine_chains (chain_p ch1, chain_p ch2)
2851 dref r1, r2, nw;
2852 enum tree_code op = ERROR_MARK;
2853 bool swap = false;
2854 chain_p new_chain;
2855 unsigned i;
2856 tree rslt_type = NULL_TREE;
2858 if (ch1 == ch2)
2859 return NULL;
2860 if (ch1->length != ch2->length)
2861 return NULL;
2863 if (ch1->refs.length () != ch2->refs.length ())
2864 return NULL;
2866 for (i = 0; (ch1->refs.iterate (i, &r1)
2867 && ch2->refs.iterate (i, &r2)); i++)
2869 if (r1->distance != r2->distance)
2870 return NULL;
2872 if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type))
2873 return NULL;
2876 if (swap)
2877 std::swap (ch1, ch2);
2879 new_chain = new struct chain (CT_COMBINATION);
2880 new_chain->op = op;
2881 new_chain->ch1 = ch1;
2882 new_chain->ch2 = ch2;
2883 new_chain->rslt_type = rslt_type;
2884 new_chain->length = ch1->length;
2886 for (i = 0; (ch1->refs.iterate (i, &r1)
2887 && ch2->refs.iterate (i, &r2)); i++)
2889 nw = XCNEW (class dref_d);
2890 nw->stmt = stmt_combining_refs (r1, r2);
2891 nw->distance = r1->distance;
2893 new_chain->refs.safe_push (nw);
2896 ch1->combined = true;
2897 ch2->combined = true;
2898 return new_chain;
2901 /* Recursively update position information of all offspring chains to ROOT
2902 chain's position information. */
2904 static void
2905 update_pos_for_combined_chains (chain_p root)
2907 chain_p ch1 = root->ch1, ch2 = root->ch2;
2908 dref ref, ref1, ref2;
2909 for (unsigned j = 0; (root->refs.iterate (j, &ref)
2910 && ch1->refs.iterate (j, &ref1)
2911 && ch2->refs.iterate (j, &ref2)); ++j)
2912 ref1->pos = ref2->pos = ref->pos;
2914 if (ch1->type == CT_COMBINATION)
2915 update_pos_for_combined_chains (ch1);
2916 if (ch2->type == CT_COMBINATION)
2917 update_pos_for_combined_chains (ch2);
2920 /* Returns true if statement S1 dominates statement S2. */
2922 static bool
2923 pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2)
2925 basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
2927 if (!bb1 || s1 == s2)
2928 return true;
2930 if (bb1 == bb2)
2931 return gimple_uid (s1) < gimple_uid (s2);
2933 return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
2936 /* Try to combine the chains. */
2938 void
2939 pcom_worker::try_combine_chains ()
2941 unsigned i, j;
2942 chain_p ch1, ch2, cch;
2943 auto_vec<chain_p> worklist;
2944 bool combined_p = false;
2946 FOR_EACH_VEC_ELT (m_chains, i, ch1)
2947 if (chain_can_be_combined_p (ch1))
2948 worklist.safe_push (ch1);
2950 while (!worklist.is_empty ())
2952 ch1 = worklist.pop ();
2953 if (!chain_can_be_combined_p (ch1))
2954 continue;
2956 FOR_EACH_VEC_ELT (m_chains, j, ch2)
2958 if (!chain_can_be_combined_p (ch2))
2959 continue;
2961 cch = combine_chains (ch1, ch2);
2962 if (cch)
2964 worklist.safe_push (cch);
2965 m_chains.safe_push (cch);
2966 combined_p = true;
2967 break;
2971 if (!combined_p)
2972 return;
2974 /* Set up UIDs for all statements in dominance order. */
2975 basic_block *bbs = get_loop_body_in_dom_order (m_loop);
2976 renumber_gimple_stmt_uids_in_blocks (bbs, m_loop->num_nodes);
2977 free (bbs);
2979 /* Re-association in combined chains may generate statements in an order
2980 different from the order of the references in the original chain. We need
2981 to keep the references of a combined chain in dominance order so that all
2982 uses will be inserted after definitions. Note:
2983 A) This is necessary for all combined chains.
2984 B) This is only necessary for ZERO distance references because other
2985 references inherit their values from loop carried PHIs.
2987 We first update position information for all combined chains. */
2988 dref ref;
2989 for (i = 0; m_chains.iterate (i, &ch1); ++i)
2991 if (ch1->type != CT_COMBINATION || ch1->combined)
2992 continue;
2994 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
2995 ref->pos = gimple_uid (ref->stmt);
2997 update_pos_for_combined_chains (ch1);
2999 /* Then sort references according to newly updated position information. */
3000 for (i = 0; m_chains.iterate (i, &ch1); ++i)
3002 if (ch1->type != CT_COMBINATION && !ch1->combined)
3003 continue;
3005 /* Find the first reference with non-ZERO distance. */
3006 if (ch1->length == 0)
3007 j = ch1->refs.length();
3008 else
3010 for (j = 0; ch1->refs.iterate (j, &ref); ++j)
3011 if (ref->distance != 0)
3012 break;
3015 /* Sort all ZERO distance references by position. */
3016 qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos);
3018 if (ch1->combined)
3019 continue;
3021 /* For a ZERO length chain, has_max_use_after must be true since the root
3022 combined stmt must dominate the others. */
3023 if (ch1->length == 0)
3025 ch1->has_max_use_after = true;
3026 continue;
3028 /* Check if there is a use at the max distance after the root for combined
3029 chains and set the flag accordingly. */
3030 ch1->has_max_use_after = false;
3031 gimple *root_stmt = get_chain_root (ch1)->stmt;
3032 for (j = 1; ch1->refs.iterate (j, &ref); ++j)
3034 if (ref->distance == ch1->length
3035 && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt))
3037 ch1->has_max_use_after = true;
3038 break;
3044 /* Prepare initializers for store elimination CHAIN in LOOP. Returns false
3045 if this is impossible because one of these initializers may trap, true
3046 otherwise. */
3048 static bool
3049 prepare_initializers_chain_store_elim (class loop *loop, chain_p chain)
3051 unsigned i, n = chain->length;
3053 /* For now we can't eliminate stores if some of them are conditionally
3054 executed. */
3055 if (!chain->all_always_accessed)
3056 return false;
3058 /* Nothing to initialize for intra-iteration store elimination. */
3059 if (n == 0 && chain->type == CT_STORE_STORE)
3060 return true;
3062 /* For store elimination chain, there is nothing to initialize if stores
3063 to be eliminated only store loop invariant values into memory. */
3064 if (chain->type == CT_STORE_STORE
3065 && is_inv_store_elimination_chain (loop, chain))
3067 chain->inv_store_elimination = true;
3068 return true;
3071 chain->inits.create (n);
3072 chain->inits.safe_grow_cleared (n, true);
3074 /* For a store elimination chain like the one below:
3076 for (i = 0; i < len; i++)
3078 a[i] = 1;
3079 // a[i + 1] = ...
3080 a[i + 2] = 3;
3083 The store to a[i + 1] is missing from the loop body; it acts like a bubble. The
3084 content of a[i + 1] remains the same if the loop iterates fewer times
3085 than chain->length. We need to set up root variables for such stores
3086 by loading from memory before loop. Note we only need to load bubble
3087 elements because loop body is guaranteed to be executed at least once
3088 after loop's preheader edge. */
3089 auto_vec<bool> bubbles;
3090 bubbles.safe_grow_cleared (n + 1, true);
3091 for (i = 0; i < chain->refs.length (); i++)
3092 bubbles[chain->refs[i]->distance] = true;
3094 struct data_reference *dr = get_chain_root (chain)->ref;
3095 for (i = 0; i < n; i++)
3097 if (bubbles[i])
3098 continue;
3100 gimple_seq stmts = NULL;
3102 tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
3103 if (stmts)
3104 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3106 chain->inits[i] = init;
3109 return true;
3112 /* Prepare initializers for CHAIN. Returns false if this is impossible
3113 because one of these initializers may trap, true otherwise. */
3115 bool
3116 pcom_worker::prepare_initializers_chain (chain_p chain)
3118 unsigned i, n = (chain->type == CT_INVARIANT) ? 1 : chain->length;
3119 struct data_reference *dr = get_chain_root (chain)->ref;
3120 tree init;
3121 dref laref;
3122 edge entry = loop_preheader_edge (m_loop);
3124 if (chain->type == CT_STORE_STORE)
3125 return prepare_initializers_chain_store_elim (m_loop, chain);
3127 /* Find the initializers for the variables, and check that they cannot
3128 trap. */
3129 chain->inits.create (n);
3130 for (i = 0; i < n; i++)
3131 chain->inits.quick_push (NULL_TREE);
3133 /* If we have replaced some looparound phi nodes, use their initializers
3134 instead of creating our own. */
3135 FOR_EACH_VEC_ELT (chain->refs, i, laref)
3137 if (gimple_code (laref->stmt) != GIMPLE_PHI)
3138 continue;
3140 gcc_assert (laref->distance > 0);
3141 chain->inits[n - laref->distance]
3142 = PHI_ARG_DEF_FROM_EDGE (laref->stmt, entry);
3145 for (i = 0; i < n; i++)
3147 gimple_seq stmts = NULL;
3149 if (chain->inits[i] != NULL_TREE)
3150 continue;
3152 init = ref_at_iteration (dr, (int) i - n, &stmts);
3153 if (!chain->all_always_accessed && tree_could_trap_p (init))
3155 gimple_seq_discard (stmts);
3156 return false;
3159 if (stmts)
3160 gimple_seq_add_seq_without_update (&chain->init_seq, stmts);
3162 chain->inits[i] = init;
3165 return true;
3168 /* Prepare initializers for chains, and free chains that cannot
3169 be used because the initializers might trap. */
3171 void
3172 pcom_worker::prepare_initializers ()
3174 chain_p chain;
3175 unsigned i;
3177 for (i = 0; i < m_chains.length (); )
3179 chain = m_chains[i];
3180 if (prepare_initializers_chain (chain))
3181 i++;
3182 else
3184 release_chain (chain);
3185 m_chains.unordered_remove (i);
3190 /* Generates finalizer memory references for CHAIN. Returns true
3191 if finalizer code for CHAIN can be generated, otherwise false. */
3193 bool
3194 pcom_worker::prepare_finalizers_chain (chain_p chain)
3196 unsigned i, n = chain->length;
3197 struct data_reference *dr = get_chain_root (chain)->ref;
3198 tree fini, niters = number_of_latch_executions (m_loop);
3200 /* For now we can't eliminate stores if some of them are conditionally
3201 executed. */
3202 if (!chain->all_always_accessed)
3203 return false;
3205 chain->finis.create (n);
3206 for (i = 0; i < n; i++)
3207 chain->finis.quick_push (NULL_TREE);
3209 /* We never use looparound phi nodes for store elimination chains. */
3211 /* Find the finalizers for the variables, and check that they cannot
3212 trap. */
3213 for (i = 0; i < n; i++)
3215 gimple_seq stmts = NULL;
3216 gcc_assert (chain->finis[i] == NULL_TREE);
3218 if (TREE_CODE (niters) != INTEGER_CST && TREE_CODE (niters) != SSA_NAME)
3220 niters = unshare_expr (niters);
3221 niters = force_gimple_operand (niters, &stmts, true, NULL);
3222 if (stmts)
3224 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3225 stmts = NULL;
3228 fini = ref_at_iteration (dr, (int) 0 - i, &stmts, niters);
3229 if (stmts)
3230 gimple_seq_add_seq_without_update (&chain->fini_seq, stmts);
3232 chain->finis[i] = fini;
3235 return true;
3238 /* Generates finalizer memory references for chains. Returns true if
3239 finalizer code generation for chains breaks loop closed ssa form. */
3241 bool
3242 pcom_worker::prepare_finalizers ()
3244 chain_p chain;
3245 unsigned i;
3246 bool loop_closed_ssa = false;
3248 for (i = 0; i < m_chains.length ();)
3250 chain = m_chains[i];
3252 /* Finalizer is only necessary for inter-iteration store elimination
3253 chains. */
3254 if (chain->length == 0 || chain->type != CT_STORE_STORE)
3256 i++;
3257 continue;
3260 if (prepare_finalizers_chain (chain))
3262 i++;
3263 /* Be conservative: assume loop closed ssa form is corrupted
3264 by the store-store chain, though this is not always the case if
3265 the eliminated stores only store loop invariant values into
3266 memory. */
3267 loop_closed_ssa = true;
3269 else
3271 release_chain (chain);
3272 m_chains.unordered_remove (i);
3275 return loop_closed_ssa;
3278 /* Insert all initializing gimple stmts into LOOP's entry edge. */
3280 static void
3281 insert_init_seqs (class loop *loop, vec<chain_p> &chains)
3283 unsigned i;
3284 edge entry = loop_preheader_edge (loop);
3286 for (i = 0; i < chains.length (); ++i)
3287 if (chains[i]->init_seq)
3289 gsi_insert_seq_on_edge_immediate (entry, chains[i]->init_seq);
3290 chains[i]->init_seq = NULL;
3294 /* Performs predictive commoning for LOOP. Sets bit 1<<1 of the return value
3295 if LOOP was unrolled; sets bit 1<<2 of the return value if loop closed ssa
3296 form was corrupted. A non-zero return value indicates that some changes
3297 were applied to this loop. */
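/* For example, a return value of 6 means the loop was unrolled and loop
   closed ssa form was corrupted, while a return value of 1 means the
   transformation was applied without unrolling. */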
3299 unsigned
3300 pcom_worker::tree_predictive_commoning_loop (bool allow_unroll_p)
3302 struct component *components;
3303 unsigned unroll_factor = 0;
3304 class tree_niter_desc desc;
3305 bool unroll = false, loop_closed_ssa = false;
3307 if (dump_file && (dump_flags & TDF_DETAILS))
3308 fprintf (dump_file, "Processing loop %d\n", m_loop->num);
3310 /* Nothing for predictive commoning if the loop iterates only 1 time. */
3311 if (get_max_loop_iterations_int (m_loop) == 0)
3313 if (dump_file && (dump_flags & TDF_DETAILS))
3314 fprintf (dump_file, "Loop iterates only 1 time, nothing to do.\n");
3316 return 0;
3319 /* Find the data references and split them into components according to their
3320 dependence relations. */
3321 auto_vec<loop_p, 3> loop_nest;
3322 if (!compute_data_dependences_for_loop (m_loop, true, &loop_nest, &m_datarefs,
3323 &m_dependences))
3325 if (dump_file && (dump_flags & TDF_DETAILS))
3326 fprintf (dump_file, "Cannot analyze data dependencies\n");
3327 return 0;
3330 if (dump_file && (dump_flags & TDF_DETAILS))
3331 dump_data_dependence_relations (dump_file, m_dependences);
3333 components = split_data_refs_to_components ();
3335 loop_nest.release ();
3336 if (!components)
3337 return 0;
3339 if (dump_file && (dump_flags & TDF_DETAILS))
3341 fprintf (dump_file, "Initial state:\n\n");
3342 dump_components (dump_file, components);
3345 /* Find the suitable components and split them into chains. */
3346 components = filter_suitable_components (components);
3348 auto_bitmap tmp_vars;
3349 determine_roots (components);
3350 release_components (components);
3352 if (!m_chains.exists ())
3354 if (dump_file && (dump_flags & TDF_DETAILS))
3355 fprintf (dump_file,
3356 "Predictive commoning failed: no suitable chains\n");
3357 return 0;
3360 prepare_initializers ();
3361 loop_closed_ssa = prepare_finalizers ();
3363 /* Try to combine the chains that are always worked with together. */
3364 try_combine_chains ();
3366 insert_init_seqs (m_loop, m_chains);
3368 if (dump_file && (dump_flags & TDF_DETAILS))
3370 fprintf (dump_file, "Before commoning:\n\n");
3371 dump_chains (dump_file, m_chains);
3374 if (allow_unroll_p)
3375 /* Determine the unroll factor, and if the loop should be unrolled, ensure
3376 that its number of iterations is divisible by the factor. */
3377 unroll_factor = determine_unroll_factor (m_chains);
3379 if (unroll_factor > 1)
3380 unroll = can_unroll_loop_p (m_loop, unroll_factor, &desc);
3382 /* Execute the predictive commoning transformations, and possibly unroll the
3383 loop. */
3384 if (unroll)
3386 struct epcc_data dta;
3388 if (dump_file && (dump_flags & TDF_DETAILS))
3389 fprintf (dump_file, "Unrolling %u times.\n", unroll_factor);
3391 dta.tmp_vars = tmp_vars;
3392 dta.chains = m_chains.to_vec_legacy ();
3393 dta.worker = this;
3395 /* Cfg manipulations performed in tree_transform_and_unroll_loop before
3396 execute_pred_commoning_cbck is called may cause phi nodes to be
3397 reallocated, which is a problem since CHAINS may point to these
3398 statements. To fix this, we store the ssa names defined by the
3399 phi nodes here instead of the phi nodes themselves, and restore
3400 the phi nodes in execute_pred_commoning_cbck. A bit hacky. */
3401 replace_phis_by_defined_names (m_chains);
3403 tree_transform_and_unroll_loop (m_loop, unroll_factor, &desc,
3404 execute_pred_commoning_cbck, &dta);
3405 eliminate_temp_copies (m_loop, tmp_vars);
3407 else
3409 if (dump_file && (dump_flags & TDF_DETAILS))
3410 fprintf (dump_file,
3411 "Executing predictive commoning without unrolling.\n");
3412 execute_pred_commoning (tmp_vars);
3415 return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1);
3418 /* Runs predictive commoning. */
3420 unsigned
3421 tree_predictive_commoning (bool allow_unroll_p)
3423 unsigned ret = 0, changed = 0;
3425 initialize_original_copy_tables ();
3426 for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
3427 if (optimize_loop_for_speed_p (loop))
3429 pcom_worker w(loop);
3430 changed |= w.tree_predictive_commoning_loop (allow_unroll_p);
3432 free_original_copy_tables ();
3434 if (changed > 0)
3436 ret = TODO_update_ssa_only_virtuals;
3438 /* Some loop(s) got unrolled. */
3439 if (changed > 1)
3441 scev_reset ();
3443 /* Need to fix up loop closed SSA. */
3444 if (changed >= 4)
3445 rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
3447 ret |= TODO_cleanup_cfg;
3451 return ret;
3454 /* Predictive commoning Pass. */
3456 static unsigned
3457 run_tree_predictive_commoning (struct function *fun, bool allow_unroll_p)
3459 if (number_of_loops (fun) <= 1)
3460 return 0;
3462 return tree_predictive_commoning (allow_unroll_p);
3465 namespace {
3467 const pass_data pass_data_predcom =
3469 GIMPLE_PASS, /* type */
3470 "pcom", /* name */
3471 OPTGROUP_LOOP, /* optinfo_flags */
3472 TV_PREDCOM, /* tv_id */
3473 PROP_cfg, /* properties_required */
3474 0, /* properties_provided */
3475 0, /* properties_destroyed */
3476 0, /* todo_flags_start */
3477 0, /* todo_flags_finish */
3480 class pass_predcom : public gimple_opt_pass
3482 public:
3483 pass_predcom (gcc::context *ctxt)
3484 : gimple_opt_pass (pass_data_predcom, ctxt)
3487 /* opt_pass methods: */
3488 virtual bool
3489 gate (function *)
3491 if (flag_predictive_commoning != 0)
3492 return true;
3493 /* Loop vectorization enables predictive commoning implicitly
3494 only if predictive commoning isn't set explicitly, and it
3495 doesn't allow unrolling. */
3496 if (flag_tree_loop_vectorize
3497 && !OPTION_SET_P (flag_predictive_commoning))
3498 return true;
3500 return false;
3503 virtual unsigned int
3504 execute (function *fun)
3506 bool allow_unroll_p = flag_predictive_commoning != 0;
3507 return run_tree_predictive_commoning (fun, allow_unroll_p);
3510 }; // class pass_predcom
3512 } // anon namespace
3514 gimple_opt_pass *
3515 make_pass_predcom (gcc::context *ctxt)
3517 return new pass_predcom (ctxt);