1 /*
2 +----------------------------------------------------------------------+
3 | HipHop for PHP |
4 +----------------------------------------------------------------------+
5 | Copyright (c) 2010-present Facebook, Inc. (http://www.facebook.com) |
6 +----------------------------------------------------------------------+
7 | This source file is subject to version 3.01 of the PHP license, |
8 | that is bundled with this package in the file LICENSE, and is |
9 | available through the world-wide-web at the following url: |
10 | http://www.php.net/license/3_01.txt |
11 | If you did not receive a copy of the PHP license and are unable to |
12 | obtain it through the world-wide-web, please send a note to |
13 | license@php.net so we can mail you a copy immediately. |
14 +----------------------------------------------------------------------+
16 #include "hphp/runtime/vm/jit/opt.h"
18 #include <iterator>
19 #include <string>
21 #include <folly/Format.h>
22 #include <folly/ScopeGuard.h>
24 #include "hphp/util/bisector.h"
25 #include "hphp/util/bitset-utils.h"
26 #include "hphp/util/dataflow-worklist.h"
27 #include "hphp/util/match.h"
28 #include "hphp/util/trace.h"
30 #include "hphp/runtime/vm/jit/alias-analysis.h"
31 #include "hphp/runtime/vm/jit/analysis.h"
32 #include "hphp/runtime/vm/jit/cfg.h"
33 #include "hphp/runtime/vm/jit/containers.h"
34 #include "hphp/runtime/vm/jit/fixup.h"
35 #include "hphp/runtime/vm/jit/ir-unit.h"
36 #include "hphp/runtime/vm/jit/memory-effects.h"
37 #include "hphp/runtime/vm/jit/mutation.h"
38 #include "hphp/runtime/vm/jit/pass-tracer.h"
39 #include "hphp/runtime/vm/jit/state-multi-map.h"
40 #include "hphp/runtime/vm/jit/state-vector.h"
41 #include "hphp/runtime/vm/jit/timer.h"
42 #include "hphp/runtime/vm/jit/vm-reg-liveness.h"
45 This implements partial-redundancy elimination for stores.
47 The basic algorithm follows Morel & Renvoise "Global Optimization by
48 Suppression of Partial Redundancies". That paper talks about redundancy of
49 expressions, so we have to "reverse" everything to apply it to stores
50 (Anticipated <=> Available, In <=> Out).
52 Some general terminology:
53 - A store to a location L is /available/ at code position P if it has
54 happened on all paths to P, and there is no interference (loads or
55 other stores) of L between those stores and P.
56 - A store to a location L is /anticipated/ at code position P if, on every
57 path from P, a store to L occurs before any subsequent use of L.
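For example (a hypothetical, simplified HHIR sequence):

  StStk<off> sp, t1    // S1: store to stack slot L
  StStk<off> sp, t2    // S2: another store to L, with no use of L between
  t3 = LdStk<off> sp   // use of L

S1 is available at the point between S1 and S2 (it has happened on the only
path to that point, with nothing touching L in between). S2 is anticipated
at that same point (a store to L occurs before any subsequent use of L),
which is exactly what makes S1 redundant and deletable.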
59 There are two forms of (non)-transparency for stores. A (possible) use of
60 the result of a candidate store prevents the candidate from being moved past
61 the use, and also prevents the candidate from being killed by a later store.
62 This is very similar to transparency of expressions. But in addition, a
63 possibly interfering store following a candidate store prevents the candidate
64 from moving past the store, but does not prevent the candidate from being
65 killed by a later store. We represent these two criteria via alteredAnt and
66 alteredAvl (see below).
68 This also affects local availability; a store may not be available in the
69 sense that it can be moved to the end of the block (because of a possible
70 conflicting write), but it may still be available in the sense that it can be
71 killed by a later store. We split this out into AvlLoc (can be moved to the
72 end), and DelLoc (can be deleted if it's anticipated out).
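For example (hypothetical): if a block ends with

  StStk<off> sp, t1    // S
  SomeInst ...         // may write to L, but does not read it

then S cannot be moved to the end of the block (SomeInst may also write L),
so it is not AvlLoc; but it can still be deleted if some later block is
guaranteed to store to L before any read of it, so it is DelLoc. This is the
same asymmetry as the two forms of non-transparency above: an instruction
that may write L without reading it contributes to alteredAvl[L] but not to
alteredAnt[L].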
74 Finally, we make some simplifications to the CONST term in M & R to
75 reduce the number of bitvectors involved, and reduce the bidirectionality
76 of the data flow equations.
78 The final equations are:
80 Local bitvectors (contents of BlockAnalysis), per block B:
81 antLoc[L] : There exists a store to L in B which is anticipated at the
82 start of B---and thus can be moved there, without changing
83 the meaning of the program.
84 avlLoc[L] : There exists a store to L in B which is available at the
85 end of B---and thus can be moved there, without changing
86 the meaning of the program.
87 delLoc[L] : There exists a store to L in B which could be killed if it
88 would be redundant at the end of B (even though it may not
89 be legal to move it there).
91 alteredAnt[L] : B contains a possible use of L. This prevents a store to L
92 from being killed by an otherwise-redundant store in a
93 later block. (It also prevents stores from being moved
94 forward through this block.)
95 alteredAvl[L] : B contains a possible use or store of L. This prevents a
96 store to L from being moved through this block, but stores
97 might still be allowed to be eliminated due to redundancy
98 with later stores (unless, of course, alteredAnt[L] holds).
100 Global bitvectors:
101 Anticipated: (backward walk, initial 1s except in exit blocks)
102 AntIn = AntLoc | (AntOut & ~alteredAnt);
103 AntOut = Product(succs, AntIn(s))
105 Partially Anticipated: (backward walk, initial 0s)
106 PAntIn = AntLoc | (PAntOut & ~alteredAnt)
107 PAntOut = Union(succs, PAntIn(s))
109 Available: (forward walk, initial 1s except in entry block)
110 AvlIn = Product(preds, AvlOut(p))
111 AvlOut = AvlLoc | (AvlIn & ~alteredAvl);
113 Placement Possible: (forward walk; PPIn initialized to AvlIn except in the
114 entry block, and PPOut to AvlOut except in exit blocks)
115 PPIn = Product(preds, PPOut(p))
116 PPOut = PAntOut & (AvlLoc | (PPIn & ~alteredAvl)) &
117 Product(succs, PPIn(s) | AntIn(s))
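For intuition, consider a (hypothetical) diamond: block B stores to L and
branches to S1 and S2; S1 stores to L again before any read of L, while S2
reads L first. Then AntIn(S1)[L] = 1 and AntIn(S2)[L] = 0, so
AntOut(B)[L] = 0 but PAntOut(B)[L] = 1: the store at the end of B is not
redundant on every path, but it is redundant along the path through S1.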
119 Roughly speaking, at any given block, we're interested in the stores that
120 are PAntOut, because they're redundant on at least one path from this
121 block to the exit of the region. We can place a store at the end of a
122 block if it's AvlLoc (by definition) or if it's PPIn and "transparent" to
123 the block; but since we're actually going to be inserting at the start of
124 blocks, there's no point marking a block PPOut unless, in each successor,
125 the store is either PPIn or redundant (AntIn) there.
127 Note that we could use 1s instead of AvlIn/AvlOut, but convergence would
128 be slower, and we would still need to deal with stores where the
129 bitvectors say they're available, but we can't construct a suitable
130 store.
132 Insert = PPIn & ~AntIn & (~PPOut | alteredAvl);
133 We want to insert if it's possible, and the store is not redundant, and we
134 can't push the store any later.
136 Delete = (AntOut & DelLoc) | (PPOut & AvlLoc);
137 AntOut & DelLoc means it's redundant; we'll just kill it.
138 PPOut => PPIn | AntIn in each successor, meaning that for each successor
139 (recursively) we can either push the store into that successor, or it's
140 redundant there.
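Continuing the diamond example above (B stores to L; S1 overwrites L before
reading it, S2 reads it): the store is avlLoc and delLoc in B, and each
successor satisfies PPIn | AntIn (S1 via AntIn, S2 via PPIn), so PPOut(B)
holds and Delete = PPOut & AvlLoc removes the store from B. In S2 the store
is PPIn but not AntIn, and it cannot be pushed any later (S2's read makes it
alteredAvl), so Insert places a copy at the start of S2. The net effect is
that the path through S1 no longer executes the store.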
143 namespace HPHP::jit {
145 TRACE_SET_MOD(hhir_store);
147 namespace {
149 //////////////////////////////////////////////////////////////////////
151 using PostOrderId = uint32_t;
154 TrackedStore is used to keep track of which stores are available at
155 the beginning and end of each Block.
156 Its value can be
157 - Unseen (either there is no store available, or we
158 haven't processed it yet),
159 - Instruction, in which case it holds a pointer to the
160 store which is available there,
161 - Phi..Phi+kMaxTrackedALocs, in which case it holds a pointer to the
162 Block where the phi would have to be inserted, or
163 - Bad, which means that although stores are available on all
164 paths to this point, they're not compatible in some way.
166 - Pending and Processed are used while building phis to handle
167 cycles of phis.
169 For the Phi case, we just need to ensure that we can differentiate
170 Phis in the same block for different ALocs; this is only used
171 by the same() method in combine_ts().
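For example (illustrative): a TrackedStore for ALoc id 3 that needs a phi at
block B7 has kind Phi + 3 and points at B7, while a TrackedStore for a
different ALoc in the same block gets a different kind, so same() can tell
them apart.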
173 struct TrackedStore {
174 enum Kind : int16_t {
175 Unseen,
176 Instruction,
177 Phi,
178 Bad = Phi + kMaxTrackedALocs,
179 Pending,
180 Processed,
183 TrackedStore() = default;
184 TrackedStore(const TrackedStore&) = default;
185 explicit TrackedStore(IRInstruction* i) { m_ptr.set(Instruction, i); }
186 explicit TrackedStore(Block* b, uint32_t id) {
187 m_ptr.set(static_cast<Kind>(Phi + id), b);
190 static TrackedStore BadVal() { TrackedStore s; s.setBad(); return s; }
191 const IRInstruction* instruction() const {
192 return kind() == Instruction ?
193 static_cast<const IRInstruction*>(m_ptr.ptr()) : nullptr;
195 IRInstruction* instruction() {
196 return kind() == Instruction ?
197 static_cast<IRInstruction*>(m_ptr.ptr()) : nullptr;
199 const Block* block() const {
200 return isPhi() ? static_cast<const Block*>(m_ptr.ptr()) : nullptr;
202 Block* block() {
203 return isPhi() ? static_cast<Block*>(m_ptr.ptr()) : nullptr;
205 Block* pending() {
206 return kind() == Pending ? static_cast<Block*>(m_ptr.ptr()) : nullptr;
208 const Block* pending() const {
209 return kind() == Pending ? static_cast<const Block*>(m_ptr.ptr()) : nullptr;
211 IRInstruction* processed() {
212 return kind() == Processed ?
213 static_cast<IRInstruction*>(m_ptr.ptr()) : nullptr;
215 const IRInstruction* processed() const {
216 return kind() == Processed ?
217 static_cast<const IRInstruction*>(m_ptr.ptr()) : nullptr;
219 bool isUnseen() const {
220 return kind() == Unseen;
222 bool isBad() const {
223 return kind() == Bad;
225 bool isPhi() const {
226 return kind() >= Phi && kind() < Bad;
228 void set(IRInstruction* inst) { m_ptr.set(Instruction, inst); }
229 void set(Block* block, uint32_t id) {
230 m_ptr.set(static_cast<Kind>(Phi + id), block);
232 void setPending(Block* block) { m_ptr.set(Pending, block); }
233 void setProcessed(IRInstruction* inst) { m_ptr.set(Processed, inst); }
234 void reset() { m_ptr.set(Unseen, nullptr); }
235 void setBad() { m_ptr.set(Bad, nullptr); }
236 bool same(const TrackedStore& other) const {
237 return kind() == other.kind() && m_ptr.ptr() == other.m_ptr.ptr();
239 friend DEBUG_ONLY bool operator>=(const TrackedStore& a,
240 const TrackedStore& b) {
241 return a.kind() >= b.kind();
243 std::string toString() const {
244 if (isUnseen()) return "Unseen";
245 if (isBad()) return "Bad";
246 const char* type;
247 if (instruction()) type = "Inst";
248 else if (isPhi()) type = "Phi";
249 else if (pending()) type = "Pending";
250 else if (processed()) type = "Processed";
251 else type = "XXX";
252 return folly::sformat("{}:0x{:x}", type, uintptr_t(m_ptr.ptr()));
254 private:
255 Kind kind() const { return m_ptr.tag(); }
256 CompactTaggedPtr<void, Kind> m_ptr;
259 struct StoreKey {
260 enum Where { In, Out };
261 StoreKey(uint32_t id, Where w, uint32_t aId) :
262 blkId(id), where(w), alocId(aId) {}
263 StoreKey(const Block* blk, Where w, uint32_t aId) :
264 blkId(blk->id()), where(w), alocId(aId) {}
265 StoreKey(const Block& blk, Where w, uint32_t aId) :
266 blkId(blk.id()), where(w), alocId(aId) {}
268 uint32_t blkId;
269 Where where;
270 uint32_t alocId;
273 struct StoreKeyHashCmp {
274 size_t operator()(const StoreKey& k) const {
275 return (k.blkId * 2 + (int)k.where) * kMaxTrackedALocs + k.alocId;
277 bool operator()(const StoreKey& k1, const StoreKey& k2) const {
278 return
279 k1.blkId == k2.blkId &&
280 k1.where == k2.where &&
281 k1.alocId == k2.alocId;
283 static uint32_t size(uint32_t numBlocks) {
284 return (numBlocks + 1) * 2 * kMaxTrackedALocs;
288 using MovableStoreMap = hphp_hash_map<StoreKey, TrackedStore,
289 StoreKeyHashCmp, StoreKeyHashCmp>;
291 struct BlockState {
292 PostOrderId id;
293 ALocBits antIn;
294 ALocBits antOut;
295 ALocBits pAntIn;
296 ALocBits pAntOut;
297 ALocBits ppIn;
298 ALocBits ppOut;
301 // Environment for the whole optimization pass.
302 struct Global {
303 explicit Global(IRUnit& unit)
304 : unit(unit)
305 , poBlockList(poSortCfg(unit))
306 , ainfo(collect_aliases(unit, poBlockList))
307 , blockStates(unit, BlockState{})
308 , seenStores(unit, 0)
309 , vmRegsLiveness(analyzeVMRegLiveness(unit, poBlockList))
312 IRUnit& unit;
313 BlockList poBlockList;
314 AliasAnalysis ainfo;
317 * Keep a mapping from DefLabel blocks to the ALocs of stores that
318 * might depend on them. Normally we don't need to worry about
319 * this, because SSA guarantees that a store is dominated by the def
320 * of its address. But in a loop, if the store's address depends on
321 * a phi, the store might be partially available at the DefLabel.
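* For example (hypothetical): a loop that advances a pointer via a phi and
* stores through it on each iteration. The store's address is defined by the
* loop header's DefLabel, so addPhiDeps records the store's ALoc against that
* block, and optimizeStores() later folds those bits into the block's
* alteredAnt/alteredAvl transfer function.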
323 hphp_fast_map<Block*,ALocBits> blk2Aloc;
325 MovableStoreMap trackedStoreMap;
326 // Block states are indexed by block->id(). These are only meaningful after
327 // we do dataflow.
328 StateVector<Block,BlockState> blockStates;
329 jit::vector<IRInstruction*> reStores;
330 // Used to prevent cycles in find_candidate_store
331 StateVector<Block,uint32_t> seenStores;
332 StateVector<IRInstruction,KnownRegState> vmRegsLiveness;
333 uint32_t seenStoreId{0};
334 bool needsReflow{false};
337 // Block-local environment.
338 struct Local {
339 explicit Local(Global& global)
340 : global(global)
343 Global& global;
345 ALocBits antLoc; // Copied to BlockAnalysis::antLoc
346 ALocBits mayLoad; // Things that may be read in the block
347 ALocBits mayStore; // Things that may be written in the block
348 ALocBits avlLoc; // Copied to BlockAnalysis::avlLoc
349 ALocBits delLoc; // Copied to BlockAnalysis::delLoc
351 ALocBits reStores;
354 //////////////////////////////////////////////////////////////////////
356 using jit::show;
358 const char* show(StoreKey::Where w) {
359 switch (w) {
360 case StoreKey::In: return "In";
361 case StoreKey::Out: return "Out";
363 not_reached();
366 std::string show(TrackedStore ts) {
367 if (ts.isUnseen()) return "U";
368 if (ts.isBad()) return "B";
369 if (auto i = ts.instruction()) return folly::sformat("I{}", i->id());
370 if (auto i = ts.processed()) return folly::sformat("I*{}", i->id());
371 if (auto b = ts.block()) return folly::sformat("P{}", b->id());
372 if (auto b = ts.pending()) return folly::sformat("P*{}", b->id());
373 not_reached();
376 Optional<uint32_t> pure_store_bit(Local& env, AliasClass acls) {
377 if (auto const meta = env.global.ainfo.find(canonicalize(acls))) {
378 return meta->index;
380 return std::nullopt;
383 void set_movable_store(Local& env, uint32_t bit, IRInstruction& inst) {
384 env.global.trackedStoreMap[
385 StoreKey { inst.block(), StoreKey::Out, bit }].set(&inst);
388 bool isDead(Local& env, int bit) {
389 return env.antLoc[bit];
392 bool isDeadSet(Local& env, const ALocBits& bits) {
393 return (~env.antLoc & bits).none();
396 bool removeDead(Local& env, IRInstruction& inst, bool trash) {
397 BISECTOR(store_delete);
398 if (!store_delete.go()) return false;
400 FTRACE(4, " dead (removed)\n");
402 IRInstruction* dbgInst = nullptr;
403 if (trash && RuntimeOption::EvalHHIRGenerateAsserts) {
404 switch (inst.op()) {
405 case StStk:
406 case StStkMeta:
407 dbgInst = env.global.unit.gen(
408 DbgTrashStk,
409 inst.bcctx(),
410 *inst.extra<IRSPRelOffsetData>(),
411 inst.src(0)
413 break;
414 case StMem:
415 case StMemMeta:
416 dbgInst = env.global.unit.gen(DbgTrashMem, inst.bcctx(), inst.src(0));
417 break;
418 default:
419 dbgInst = nullptr;
420 break;
424 auto block = inst.block();
425 auto pos = block->erase(&inst);
426 if (dbgInst) {
427 block->insert(pos, dbgInst);
428 return false;
430 return true;
433 void addLoadSet(Local& env, const ALocBits& bits) {
434 FTRACE(4, " load: {}\n", show(bits));
435 env.mayLoad |= bits;
436 env.antLoc &= ~bits;
439 void addAllLoad(Local& env) {
440 FTRACE(4, " load: -1\n");
441 env.mayLoad.set();
442 env.antLoc.reset();
445 void mustStore(Local& env, int bit) {
446 FTRACE(4, " mustStore: {}\n", bit);
447 if (!env.antLoc[bit] &&
448 !env.mayLoad[bit]) {
449 env.delLoc[bit] = 1;
451 env.antLoc[bit] = 1;
452 env.mayStore[bit] = 0;
453 env.reStores[bit] = 0;
456 void mustStoreSet(Local& env, const ALocBits& bits) {
457 FTRACE(4, " mustStore: {}\n", show(bits));
458 env.delLoc |= bits & ~(env.antLoc | env.mayLoad);
459 env.antLoc |= bits;
460 env.mayStore &= ~bits;
461 env.reStores &= ~bits;
464 void killSet(Local& env, const ALocBits& bits) {
465 FTRACE(4, " kill: {}\n", show(bits));
466 env.mayLoad &= ~bits;
467 mustStoreSet(env, bits);
470 //////////////////////////////////////////////////////////////////////
472 void load(Local& env, AliasClass acls) {
473 addLoadSet(env, env.global.ainfo.may_alias(canonicalize(acls)));
476 void mayStore(Local& env, AliasClass acls) {
477 auto const canon = canonicalize(acls);
478 auto const mayStore = env.global.ainfo.may_alias(canon);
479 FTRACE(4, " mayStore: {}\n", show(mayStore));
480 env.mayStore |= mayStore;
481 env.reStores &= ~mayStore;
484 void store(Local& env, AliasClass acls) {
485 mayStore(env, acls);
486 auto const canon = canonicalize(acls);
487 mustStoreSet(env, env.global.ainfo.expand(canon));
490 void kill(Local& env, AliasClass acls) {
491 auto const canon = canonicalize(acls);
492 killSet(env, env.global.ainfo.expand(canon));
495 //////////////////////////////////////////////////////////////////////
497 void addPhiDeps(Local& env, uint32_t bit, SSATmp* dep) {
498 if (dep->inst()->is(DefLabel) || dep->inst()->producesReference()) {
499 FTRACE(2, " B{} alters {} due to potential address change\n",
500 dep->inst()->block()->id(), bit);
501 env.global.blk2Aloc[dep->inst()->block()][bit] = 1;
502 return;
505 for (auto src : dep->inst()->srcs()) {
506 addPhiDeps(env, bit, src);
510 //////////////////////////////////////////////////////////////////////
512 void visit(Local& env, IRInstruction& inst) {
513 auto const effects = memory_effects(inst);
514 FTRACE(3, " {}\n"
515 " {}\n",
516 inst.toString(),
517 show(effects));
519 auto const doPureStore = [&] (PureStore l) {
520 if (auto bit = pure_store_bit(env, l.dst)) {
521 if (l.dep) addPhiDeps(env, *bit, l.dep);
522 if (isDead(env, *bit)) {
523 if (!removeDead(env, inst, true)) {
524 mayStore(env, l.dst);
525 // These don't actually perform a store, so they cannot be a
526 // "mustStore" (they cannot kill a previous store).
527 if (!inst.is(StLocMeta, StStkMeta, StMemMeta)) {
528 mustStore(env, *bit);
531 return;
533 if (!env.antLoc[*bit] &&
534 !env.mayLoad[*bit] &&
535 !env.mayStore[*bit]) {
536 env.avlLoc[*bit] = 1;
537 set_movable_store(env, *bit, inst);
539 mayStore(env, l.dst);
540 if (inst.is(StLocMeta, StStkMeta, StMemMeta)) return;
541 mustStore(env, *bit);
542 if (!l.value || l.value->inst()->block() != inst.block()) return;
543 auto const le = memory_effects(*l.value->inst());
544 auto pl = boost::get<PureLoad>(&le);
545 if (!pl) return;
546 auto lbit = pure_store_bit(env, pl->src);
547 if (!lbit || *lbit != *bit) return;
549 * The source of the store is a load from the same address, which is
550 * also in this block. If there's no interference, we can kill this
551 * store. We won't know until we see the load though.
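* For example (hypothetical):
*   t1 = LdStk<off> sp
*   ...                 // nothing reads or writes the slot in between
*   StStk<off> sp, t1   // stores back the value that was just loaded
* Since the block is visited backwards, we record the store in reStores here
* and remove it when we later reach the matching LdStk (see the PureLoad
* case below).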
553 if (env.global.reStores.size() <= *bit) {
554 env.global.reStores.resize(*bit + 1);
556 env.global.reStores[*bit] = &inst;
557 env.reStores[*bit] = 1;
558 return;
560 mayStore(env, l.dst);
563 if (inst.is(BeginCatch)) {
564 // The unwinder fixes up the VMRegs before entering a catch trace, so we
565 // must account for these effects. Otherwise, store-elim may sink sync
566 // operations across the catch trace boundary.
567 auto const doMustStore = [&](AliasClass acls) {
568 if (auto const bit = pure_store_bit(env, acls)) {
569 mustStore(env, *bit);
572 doMustStore(AVMRegState);
573 doMustStore(AVMFP);
574 doMustStore(AVMSP);
575 doMustStore(AVMPC);
576 doMustStore(AVMRetAddr);
577 return;
580 match<void>(
581 effects,
582 [&] (IrrelevantEffects) {
583 switch (inst.op()) {
584 case AssertLoc:
585 load(env, ALocal { inst.src(0), inst.extra<AssertLoc>()->locId });
586 return;
587 case AssertStk:
588 load(env, AStack::at(inst.extra<AssertStk>()->offset));
589 return;
590 default:
591 return;
594 [&] (UnknownEffects) {
595 addAllLoad(env);
596 env.mayStore.set();
597 if (env.global.vmRegsLiveness[inst] == KnownRegState::Dead) {
598 // If the VMRegs are dead, we know they can't be accessed. Kill them
599 // before pessimizing loads.
600 kill(env, AVMRegAny);
603 [&] (PureLoad l) {
604 if (auto bit = pure_store_bit(env, l.src)) {
605 if (env.reStores[*bit]) {
606 auto const st = memory_effects(*env.global.reStores[*bit]);
607 auto const pst = boost::get<PureStore>(&st);
608 if (pst && pst->value && pst->value == inst.dst()) {
609 FTRACE(4, "Killing self-store: {}\n",
610 env.global.reStores[*bit]->toString());
611 removeDead(env, *env.global.reStores[*bit], false);
612 env.reStores[*bit] = 0;
616 load(env, l.src);
618 [&] (GeneralEffects preL) {
619 auto const liveness = env.global.vmRegsLiveness[inst];
620 auto const l = general_effects_for_vmreg_liveness(preL, liveness);
621 load(env, l.loads);
622 load(env, l.inout);
623 load(env, l.backtrace);
624 mayStore(env, l.stores);
625 mayStore(env, l.inout);
626 kill(env, l.kills);
629 [&] (ReturnEffects l) {
630 // Return from the main function. Locations other than the frame and
631 // stack (e.g. object properties and whatnot) are always live on a
632 // function return---so mark everything read before we start killing
633 // things.
634 addAllLoad(env);
635 killSet(env, env.global.ainfo.all_local);
636 kill(env, l.kills);
637 if (env.global.vmRegsLiveness[inst] == KnownRegState::Dead) {
638 // If the VMRegs are dead, we know they can't be accessed. Kill them
639 // before pessimizing loads.
640 kill(env, AVMRegAny);
644 [&] (ExitEffects l) {
645 load(env, l.live);
646 kill(env, l.kills);
649 [&] (PureInlineCall l) {
650 store(env, l.base);
651 load(env, l.actrec);
655 * Call instructions can potentially read any heap location or the VM
656 * register state (which must be dirty), but we can be more precise about
657 * everything else.
659 [&] (CallEffects l) {
660 store(env, l.outputs);
661 load(env, AHeapAny);
662 load(env, AVMRegState);
663 load(env, ARdsAny);
664 load(env, l.locals);
665 load(env, l.inputs);
666 store(env, l.actrec);
667 kill(env, l.kills);
670 [&] (PureStore l) { doPureStore(l); }
674 void block_visit(Local& env, Block* block) {
675 FTRACE(2, " visiting B{}\n", block->id());
676 assertx(!block->instrs().empty());
678 auto prev = std::prev(block->instrs().end());
679 for (;;) {
680 auto it = prev;
681 if (it != block->instrs().begin()) {
682 prev = std::prev(it);
683 visit(env, *it);
684 } else {
685 visit(env, *it);
686 break;
691 //////////////////////////////////////////////////////////////////////
693 struct BlockAnalysis {
694 ALocBits antLoc;
695 ALocBits alteredAnt;
696 ALocBits alteredAvl;
697 ALocBits avlLoc;
698 ALocBits delLoc;
701 BlockAnalysis analyze_block(Global& genv, Block* block) {
702 auto env = Local { genv };
703 env.antLoc.reset(); // During this first analysis pass, we need to pretend
704 // none of the stores is anticipated (because we don't yet
705 // know). We may still remove some stores during this
706 // visit if they are locally proven dead, however.
707 block_visit(env, block);
708 return BlockAnalysis {
709 env.antLoc,
710 env.mayLoad,
711 // Note that we unset must-store bits in env.mayStore, so including
712 // env.antLoc and env.delLoc is required for correctness here.
713 env.mayLoad | env.mayStore | env.antLoc | env.delLoc,
714 env.avlLoc,
715 env.delLoc
719 void find_all_stores(Global& genv, Block* blk, uint32_t id,
720 jit::vector<IRInstruction*>& stores,
721 jit::hash_set<void*>& seen) {
722 ITRACE(7, "find_all_stores: {} B{}\n", id, blk->id());
723 Trace::Indent _i;
724 blk->forEachPred(
725 [&](Block* pred) {
726 if (!seen.insert(pred).second) {
727 ITRACE(7, "find_all_stores: {} B{} skipping pred B{}\n",
728 id, blk->id(), pred->id());
729 return;
731 ITRACE(7, "find_all_stores: {} B{} processing pred B{}\n",
732 id, blk->id(), pred->id());
733 auto& pst =
734 genv.trackedStoreMap[StoreKey { pred, StoreKey::Out, id }];
735 IRInstruction* inst;
736 if ((inst = pst.instruction()) != nullptr ||
737 (inst = pst.processed()) != nullptr) {
738 if (seen.insert(inst).second) {
739 ITRACE(7, "find_all_stores: {} B{} pred B{}: adding {}\n",
740 id, blk->id(), pred->id(), inst->toString());
741 stores.push_back(inst);
742 } else {
743 ITRACE(7, "find_all_stores: {} B{} pred B{}: dropping {}\n",
744 id, blk->id(), pred->id(), inst->toString());
746 return;
748 Block* b;
749 if ((b = pst.block()) != nullptr ||
750 (b = pst.pending()) != nullptr) {
751 if (b != pred && seen.count(b)) {
752 ITRACE(7,
753 "find_all_stores: {} B{} pred B{} previously processed B{}\n",
754 id, blk->id(), pred->id(), b->id());
755 return;
757 ITRACE(7, "find_all_stores: {} B{} pred B{} recur to B{}\n",
758 id, blk->id(), pred->id(), b->id());
759 return find_all_stores(genv, b, id, stores, seen);
761 always_assert(false);
766 IRInstruction* resolve_ts(Global& genv, Block* blk,
767 StoreKey::Where w, uint32_t id);
769 void resolve_cycle(Global& genv, TrackedStore& ts, Block* blk, uint32_t id) {
770 genv.needsReflow = true;
771 jit::vector<IRInstruction*> stores;
772 jit::hash_set<void*> seen;
773 // find all the stores, so we can determine
774 // whether a phi is actually required for each
775 // src (also, we need a candidate store to clone)
777 seen.insert(blk);
779 ITRACE(7, "resolve_cycle - store id {}:\n", id);
780 Trace::Indent _i;
782 find_all_stores(genv, blk, id, stores, seen);
783 always_assert(stores.size() > 0);
784 if (Trace::moduleEnabled(TRACEMOD, 7)) {
785 for (auto const DEBUG_ONLY st : stores) {
786 ITRACE(7, " - {}\n", st->toString());
789 auto const cand = stores[0];
790 if (stores.size() == 1) {
791 ts.set(cand);
792 return;
794 jit::vector<uint32_t> srcsToPhi;
795 for (uint32_t i = 0; i < cand->numSrcs(); i++) {
796 SSATmp* prev = nullptr;
797 for (auto const st : stores) {
798 auto const si = st->src(i);
799 if (prev && prev != si) {
800 srcsToPhi.push_back(i);
801 break;
803 prev = si;
806 if (!srcsToPhi.size()) {
807 // the various stores all had the same inputs
808 // so nothing to do.
809 ts.set(cand);
810 return;
812 bool needsProcessed = false;
813 auto const inst = genv.unit.clone(cand);
814 for (auto i : srcsToPhi) {
815 auto t = TBottom;
816 for (auto const st : stores) {
817 t |= st->src(i)->type();
819 if (t.admitsSingleVal()) {
820 inst->setSrc(i, genv.unit.cns(t));
821 continue;
823 needsProcessed = true;
824 // create a Mov; we'll use its dst as the src of the store, and
825 // when we eventually create the phi, we'll set its dst as the src
826 // of the Mov (this allows us to avoid creating a new phi if
827 // there's already a suitable one there). We also set the Mov's
828 // src to be its dst, so that we can identify it as needing to be
829 // fixed up in resolve_flat.
830 auto mv = genv.unit.gen(Mov, blk->front().bcctx(), cand->src(i));
831 blk->prepend(mv);
832 inst->setSrc(i, mv->dst());
833 mv->setSrc(0, mv->dst());
834 mv->dst()->setType(t);
835 ITRACE(7, " + created {} for {}\n", mv->toString(), inst->toString());
837 if (needsProcessed) {
838 ts.setProcessed(inst);
839 } else {
840 ts.set(inst);
844 IRInstruction* resolve_flat(Global& genv, Block* blk, uint32_t id,
845 TrackedStore& ts) {
846 ts.setPending(blk);
848 jit::vector<IRInstruction*> stores;
849 blk->forEachPred([&](Block* pred) {
850 stores.push_back(resolve_ts(genv, pred, StoreKey::Out, id));
852 always_assert(stores.size() > 0);
853 if (auto const rep = ts.instruction()) {
854 ITRACE(7, "resolve_flat: returning {}\n", rep->toString());
855 return rep;
858 auto const cand = stores[0];
859 if (stores.size() == 1) return cand;
860 assertx(blk->numPreds() == stores.size());
861 auto& preds = blk->preds();
862 jit::vector<SSATmp*> newSrcs;
863 jit::vector<uint32_t> srcsToPhi;
864 for (uint32_t i = 0; i < cand->numSrcs(); i++) {
865 bool same = true;
866 SSATmp* temp = nullptr;
867 jit::hash_map<Block*, SSATmp*> phiInputs;
868 uint32_t j = 0;
869 for (auto const& edge : preds) {
870 auto const st = stores[j++];
871 auto si = st->src(i);
872 if (!si->inst()->is(DefConst) && si->type().admitsSingleVal()) {
873 // If edges do not all have the same SSATmp src,
874 // rewrite srcs with constant values to use DefConst.
875 phiInputs[edge.from()] = genv.unit.cns(si->type());
876 } else {
877 phiInputs[edge.from()] = si;
879 if (temp == nullptr) temp = si;
880 if (si != temp) same = false;
882 if (!same) {
883 srcsToPhi.push_back(i);
884 newSrcs.push_back(insertPhi(genv.unit, blk, phiInputs));
885 } else if (ts.processed()) {
886 // even if we don't need a phi, resolve_cycle might have thought
887 // we did; if so, we still need to fix up the input.
888 auto const mv = ts.processed()->src(i)->inst();
889 if (mv->is(Mov) && mv->src(0) == mv->dst()) {
890 srcsToPhi.push_back(i);
891 newSrcs.push_back(temp);
896 if (auto rep = ts.processed()) {
897 if (Trace::moduleEnabled(TRACEMOD, 7)) {
898 ITRACE(7, "resolve_flat: fixing {}\n", rep->toString());
899 for (auto const DEBUG_ONLY st : stores) {
900 ITRACE(7, " - {}\n", st->toString());
903 // the replacement was constructed during the recursive
904 // walk. Just need to hook up the new phis.
905 for (uint32_t ix = 0; ix < srcsToPhi.size(); ix++) {
906 auto i = srcsToPhi[ix];
907 auto src = rep->src(i);
908 auto mv = src->inst();
909 always_assert(mv->is(Mov) && mv->src(0) == mv->dst());
910 mv->setSrc(0, newSrcs[ix]);
911 retypeDests(mv, &genv.unit);
913 return rep;
916 auto rep = genv.unit.clone(cand);
917 for (uint32_t ix = 0; ix < srcsToPhi.size(); ix++) {
918 auto i = srcsToPhi[ix];
919 rep->setSrc(i, newSrcs[ix]);
922 return rep;
925 IRInstruction* resolve_ts(Global& genv, Block* blk,
926 StoreKey::Where w, uint32_t id) {
927 auto& ts = genv.trackedStoreMap[StoreKey { blk, w, id }];
929 ITRACE(7, "resolve_ts: B{}:{} store:{} ts:{}\n",
930 blk->id(), show(w), id, show(ts));
931 Trace::Indent _i;
933 if (ts.pending()) {
934 always_assert(ts.pending() == blk);
935 resolve_cycle(genv, ts, blk, id);
936 assertx(ts.instruction() || ts.processed());
939 if (auto inst = ts.instruction()) {
940 ITRACE(7, "-> inst: {}\n", inst->toString());
941 return inst;
943 if (auto inst = ts.processed()) {
944 ITRACE(7, "-> proc: {}\n", inst->toString());
945 return inst;
948 always_assert(ts.block());
949 if (w != StoreKey::In || blk != ts.block()) {
950 ITRACE(7, "direct recur: B{}:{} -> B{}:In\n",
951 blk->id(), show(w), ts.block()->id());
952 ts.set(resolve_ts(genv, ts.block(), StoreKey::In, id));
953 } else {
954 ts.set(resolve_flat(genv, blk, id, ts));
956 auto const rep = ts.instruction();
958 ITRACE(7, "-> {} (resolve_ts B{}, store {})\n",
959 rep->toString(), blk->id(), id);
960 return rep;
963 void optimize_block_pre(Global& genv, Block* block,
964 const jit::vector<BlockAnalysis>& blockAnalysis) {
965 auto& state = genv.blockStates[block];
966 auto const& transfer = blockAnalysis[state.id];
967 auto insertBits = state.ppIn & ~state.antIn &
968 (~state.ppOut | transfer.alteredAvl);
969 auto const deleteBits =
970 (state.antOut & transfer.delLoc) | (state.ppOut & transfer.avlLoc);
972 if (deleteBits.any()) {
973 FTRACE(1,
974 "Optimize B{}:\n"
975 " Delete: {}\n",
976 block->id(),
977 show(deleteBits));
979 auto env = Local { genv };
980 env.antLoc = deleteBits;
981 block_visit(env, block);
983 if (insertBits.any()) {
984 // warning: if you turn off any stores using this bisector, you
985 // need to disable the corresponding deletes (or simpler, all
986 // deletes) using store_delete.
987 BISECTOR(store_insert);
988 FTRACE(1,
989 "Optimize B{}:\n"
990 " Insert: {}\n",
991 block->id(),
992 show(insertBits));
994 bitset_for_each_set(
995 insertBits,
996 [&](size_t i){
997 if (!store_insert.go()) return;
998 auto const cinst = resolve_ts(genv, block, StoreKey::In, i);
999 FTRACE(1, " Inserting store {}: {}\n", i, cinst->toString());
1000 auto const inst = genv.unit.clone(cinst);
1001 block->prepend(inst);
1002 assertx(!inst->is(InlineCall));
1008 void optimize_block(Global& genv, Block* block) {
1009 auto& state = genv.blockStates[block];
1010 auto env = Local { genv };
1011 FTRACE(1,
1012 "Optimize B{}:\n"
1013 " antOut: {}\n",
1014 block->id(),
1015 show(state.antOut));
1016 env.antLoc = state.antOut;
1017 block_visit(env, block);
1020 enum class Direction { Forward, Backward };
1021 template <Direction dir, typename IFunc, typename UFunc, typename RFunc>
1022 void compute_dataflow(Global& genv,
1023 const jit::vector<BlockAnalysis>& blockAnalysis,
1024 IFunc init,
1025 UFunc update,
1026 RFunc reschedule) {
1027 using sorter = typename std::conditional<dir == Direction::Forward,
1028 std::less<PostOrderId>,
1029 std::greater<PostOrderId>>::type;
1031 auto incompleteQ =
1032 dataflow_worklist<PostOrderId, sorter>(genv.unit.numBlocks());
1034 for (auto poId = uint32_t{0}; poId < genv.poBlockList.size(); ++poId) {
1035 auto const blk = genv.poBlockList[poId];
1036 auto& state = genv.blockStates[blk];
1037 init(blk, state);
1038 incompleteQ.push(poId);
1041 while (!incompleteQ.empty()) {
1042 auto const poId = incompleteQ.pop();
1043 auto const blk = genv.poBlockList[poId];
1045 auto& state = genv.blockStates[blk];
1046 auto const& transfer = blockAnalysis[poId];
1047 assertx(state.id == poId);
1049 if (!update(blk, transfer, state)) continue;
1052 * Update the predecessors / successors if things change
1054 auto doblock = [&] (Block* other) {
1055 FTRACE(4, " -> {}\n", other->id());
1056 auto& otherState = genv.blockStates[other];
1058 if (reschedule(state, otherState, incompleteQ)) {
1059 incompleteQ.push(otherState.id);
1062 if (dir == Direction::Forward) {
1063 blk->forEachSucc(doblock);
1064 } else {
1065 blk->forEachPred(doblock);
1071 * Iterate on the antIn/antOut states until we reach a fixed point.
1073 void compute_anticipated(Global& genv,
1074 const jit::vector<BlockAnalysis>& blockAnalysis) {
1075 FTRACE(4, "Iterating Anticipated\n");
1076 compute_dataflow<Direction::Backward>(
1077 genv, blockAnalysis,
1079 * Important note: every block starts with a full antOut set,
1080 * including blocks that are exiting the region. When an HHIR
1081 * region is exited, there's always some instruction we can use
1082 * to indicate via memory_effects what may be read (e.g.
1083 * EndCatch, RetCtrl, ReqBindJmp, etc). When we start iterating,
1084 * we'll appropriately mark things as read based on these.
1086 [](Block*, BlockState& state) { state.antOut.set(); },
1087 [](Block* blk, const BlockAnalysis& transfer, BlockState& state) {
1088 state.antIn = transfer.antLoc |
1089 (state.antOut & ~transfer.alteredAnt);
1090 state.pAntIn = transfer.antLoc |
1091 (state.pAntOut & ~transfer.alteredAnt);
1092 FTRACE(4,
1093 " block B{}\n"
1094 " antIn : {}\n"
1095 " pAntIn : {}\n"
1096 " antLoc : {}\n"
1097 " altered : {}\n"
1098 " antOut : {}\n"
1099 " pAntOut : {}\n",
1100 blk->id(),
1101 show(state.antIn),
1102 show(state.pAntIn),
1103 show(transfer.antLoc),
1104 show(transfer.alteredAnt),
1105 show(state.antOut),
1106 show(state.pAntOut));
1107 return true;
1109 [](const BlockState& state, BlockState& pred,
1110 dataflow_worklist<PostOrderId>&) {
1111 auto const oldAntOut = pred.antOut;
1112 auto const oldPAntOut = pred.pAntOut;
1113 pred.antOut &= state.antIn;
1114 pred.pAntOut |= state.pAntIn;
1115 return pred.antOut != oldAntOut || pred.pAntOut != oldPAntOut;
1119 TrackedStore find_candidate_store_helper(Global& genv, TrackedStore ts,
1120 uint32_t id) {
1121 auto block = ts.block();
1122 assertx(block);
1123 TrackedStore ret;
1124 if (genv.seenStores[block] == genv.seenStoreId) return ret;
1126 // look for a candidate in the immediate predecessors
1127 block->forEachPred([&](Block* pred) {
1128 if (ret.isBad() || ret.instruction()) return;
1129 auto const pred_ts =
1130 genv.trackedStoreMap[StoreKey { pred, StoreKey::Out, id }];
1131 if (pred_ts.isBad() || pred_ts.instruction()) {
1132 ret = pred_ts;
1133 return;
1137 if (ret.isBad() || ret.instruction()) return ret;
1139 genv.seenStores[block] = genv.seenStoreId;
1140 // recursively search the predecessors
1141 block->forEachPred([&](Block* pred) {
1142 if (ret.isBad() || ret.instruction()) return;
1143 auto const pred_ts =
1144 genv.trackedStoreMap[StoreKey { pred, StoreKey::Out, id }];
1145 if (!pred_ts.block()) {
1146 always_assert_flog(pred_ts.isUnseen(),
1147 "pred_ts: {}", pred_ts.toString());
1148 return;
1150 ret = find_candidate_store_helper(genv, pred_ts, id);
1153 return ret;
1157 * Find a candidate store instruction; any one will do, because
1158 * compatibility between stores is transitive.
1160 TrackedStore find_candidate_store(Global& genv, TrackedStore ts, uint32_t id) {
1161 if (!ts.block()) return ts;
1162 genv.seenStoreId++;
1163 return find_candidate_store_helper(genv, ts, id);
1166 bool pureStoreSupportsPhi(Opcode op) {
1167 switch (op) {
1168 /* Instructions that require constant arguments cannot have differing
1169 * sources Phi-ed together. */
1170 case StVMReturnAddr:
1171 case StVMPC:
1172 return false;
1173 /* The Phi-ing of frame pointers is catastrophic. */
1174 case StVMFP:
1175 return false;
1176 default:
1177 return true;
1181 TrackedStore combine_ts(Global& genv, uint32_t id,
1182 TrackedStore s1,
1183 TrackedStore s2, Block* succ) {
1184 if (s1.same(s2)) return s1;
1185 if (s1.isUnseen() || s2.isBad()) return s2;
1186 if (s2.isUnseen() || s1.isBad()) return s1;
1188 enum class Compat { Same, Compat, Bad };
1189 auto compat = [](TrackedStore store1, TrackedStore store2) {
1190 auto i1 = store1.instruction();
1191 auto i2 = store2.instruction();
1192 assertx(i1 && i2);
1193 if (i1->op() != i2->op()) return Compat::Bad;
1194 if (i1->numSrcs() != i2->numSrcs()) return Compat::Bad;
1195 for (auto i = i1->numSrcs(); i--; ) {
1196 // Ptr and Lval types are incompatible, as one requires two registers,
1197 // while the other requires only one. This stops us from phiing the
1198 // two types together to eliminate a store.
1199 auto const& t1 = i1->src(i)->type();
1200 auto const& t2 = i2->src(i)->type();
1201 if ((t1.maybe(TPtr) && t2.maybe(TLval)) ||
1202 (t2.maybe(TPtr) && t1.maybe(TLval))) {
1203 return Compat::Bad;
1206 for (auto i = i1->numSrcs(); i--; ) {
1207 if (i1->src(i) != i2->src(i)) {
1208 if (!pureStoreSupportsPhi(i1->op())) return Compat::Bad;
1209 return Compat::Compat;
1212 return Compat::Same;
1215 auto trackedBlock = [&]() {
1216 return TrackedStore(succ, id);
1219 if (s1.block() || s2.block()) {
1220 auto c1 = find_candidate_store(genv, s1, id);
1221 auto c2 = find_candidate_store(genv, s2, id);
1222 if (c1.instruction() && c2.instruction() &&
1223 compat(c1, c2) != Compat::Bad) {
1224 return trackedBlock();
1226 return TrackedStore::BadVal();
1229 switch (compat(s1, s2)) {
1230 case Compat::Same: return s1;
1231 case Compat::Compat: return trackedBlock();
1232 case Compat::Bad: break;
1234 return TrackedStore::BadVal();
1238 * Compute a TrackedStore for succ by merging its preds.
1240 TrackedStore recompute_ts(Global& genv, uint32_t id, Block* succ) {
1241 TrackedStore ret;
1242 succ->forEachPred([&](Block* pred) {
1243 auto const ts =
1244 genv.trackedStoreMap[StoreKey { pred, StoreKey::Out, id }];
1245 ret = combine_ts(genv, id, ts, ret, succ);
1247 return ret;
1251 * Compute a starting point for ppIn/ppOut based on global availability.
1252 * At the same time, compute exactly which stores are available at each
1253 * point. This is essentially the bitvector implementation of global
1254 * availability, but disallowing the cases where we can't find/create
1255 * a suitable store.
1257 void compute_available_stores(
1258 Global& genv, const jit::vector<BlockAnalysis>& blockAnalysis) {
1260 FTRACE(4, "\nIterating Available Stores\n");
1261 compute_dataflow<Direction::Forward>(
1262 genv, blockAnalysis,
1263 [](Block* blk, BlockState& state) {
1264 if (blk->numPreds()) state.ppIn.set();
1266 [&](Block* blk, const BlockAnalysis& transfer, BlockState& state) {
1267 state.ppOut = transfer.avlLoc | (state.ppIn & ~transfer.alteredAvl);
1268 auto propagate = state.ppOut & ~transfer.avlLoc;
1269 bitset_for_each_set(
1270 propagate,
1271 [&](uint32_t i) {
1272 auto const& tsIn =
1273 genv.trackedStoreMap[StoreKey { blk, StoreKey::In, i }];
1274 auto& tsOut =
1275 genv.trackedStoreMap[StoreKey { blk, StoreKey::Out, i }];
1276 assertx(!tsIn.isUnseen());
1277 tsOut = tsIn;
1278 if (tsOut.isBad()) {
1279 state.ppOut[i] = false;
1283 auto showv DEBUG_ONLY = [&] (StoreKey::Where w, const ALocBits& pp) {
1284 std::string r;
1285 bitset_for_each_set(
1287 [&](uint32_t i) {
1288 auto& ts = genv.trackedStoreMap[StoreKey { blk, w, i}];
1289 folly::format(&r, " {}:{}", i, show(ts));
1292 return r;
1294 FTRACE(4,
1295 " block B{}\n"
1296 " ppIn : {}\n"
1297 " ppInV : {}\n"
1298 " avlLoc : {}\n"
1299 " altered : {}\n"
1300 " ppOut : {}\n"
1301 " ppOutV : {}\n",
1302 blk->id(),
1303 show(state.ppIn),
1304 showv(StoreKey::In, state.ppIn),
1305 show(transfer.avlLoc),
1306 show(transfer.alteredAvl),
1307 show(state.ppOut),
1308 showv(StoreKey::Out, state.ppOut));
1309 return true;
1311 [&](const BlockState& state, BlockState& succState,
1312 dataflow_worklist<PostOrderId, std::less<PostOrderId>>&) {
1313 auto const oldPpIn = succState.ppIn;
1314 succState.ppIn &= state.ppOut;
1315 bool changed = succState.ppIn != oldPpIn;
1316 auto blk = genv.poBlockList[state.id];
1317 auto succ = genv.poBlockList[succState.id];
1318 bitset_for_each_set(
1319 succState.ppIn,
1320 [&](uint32_t i) {
1321 auto& ts =
1322 genv.trackedStoreMap[StoreKey { blk, StoreKey::Out, i }];
1323 auto& tsSucc =
1324 genv.trackedStoreMap[StoreKey { succ, StoreKey::In, i }];
1325 auto tsNew = succ->numPreds() == 1 ? ts : recompute_ts(genv, i, succ);
1326 if (!tsNew.same(tsSucc)) {
1327 changed = true;
1328 assertx(tsNew >= tsSucc);
1329 tsSucc = tsNew;
1330 if (tsSucc.isBad()) {
1331 succState.ppIn[i] = false;
1336 return changed;
1341 * Iterate on the ppIn/ppOut states until we reach a fixed point.
1343 void compute_placement_possible(
1344 Global& genv, const jit::vector<BlockAnalysis>& blockAnalysis) {
1346 compute_available_stores(genv, blockAnalysis);
1348 FTRACE(4, "\nIterating Placement Possible\n");
1349 compute_dataflow<Direction::Forward>(
1350 genv, blockAnalysis,
1351 [](Block* blk, BlockState& state) {
1352 if (!blk->numPreds()) state.ppIn.reset();
1353 if (!blk->numSuccs()) state.ppOut.reset();
1355 [&](Block* blk, const BlockAnalysis& transfer, BlockState& state) {
1356 auto trace = [&] () {
1357 FTRACE(4,
1358 " block B{}\n"
1359 " ppIn : {}\n"
1360 " avlLoc: {}\n"
1361 " altLoc: {}\n"
1362 " ppOut: {}\n",
1363 blk->id(),
1364 show(state.ppIn),
1365 show(transfer.avlLoc),
1366 show(transfer.alteredAvl),
1367 show(state.ppOut));
1370 if (!blk->numSuccs()) {
1371 trace();
1372 // ppOut is empty for a block with no succs (since placement
1373 // is definitely not possible in any of its successors), and
1374 // it's initialized to empty, so nothing to do.
1376 // Note that this is not just an optimization - the update
1377 // below would produce the wrong results.
1378 return false;
1381 state.ppOut = transfer.avlLoc | (state.ppIn & ~transfer.alteredAvl);
1382 auto restrictBits = state.pAntOut;
1384 blk->forEachSucc([&] (Block* succ) {
1385 auto const& succState = genv.blockStates[succ];
1386 restrictBits &= succState.ppIn | succState.antIn;
1389 state.ppOut &= restrictBits;
1390 trace();
1391 return true;
1393 [&](const BlockState& state, BlockState& succState,
1394 dataflow_worklist<PostOrderId, std::less<PostOrderId>>& incompleteQ) {
1395 auto const oldPPIn = succState.ppIn;
1396 succState.ppIn &= state.ppOut;
1398 if (succState.ppIn == oldPPIn) return false;
1399 FTRACE(4, " reprocessing succ {}\n",
1400 genv.poBlockList[succState.id]->id());
1401 auto const ssppa = succState.ppIn | succState.antIn;
1402 if (ssppa != (oldPPIn | succState.antIn)) {
1403 // this change might affect this successor's
1404 // predecessors; reschedule them if necessary
1405 auto succ = genv.poBlockList[succState.id];
1406 auto blk = genv.poBlockList[state.id];
1407 succ->forEachPred([&] (Block* pred) {
1408 if (pred == blk) return; // can't be affected
1409 auto const& psState = genv.blockStates[pred];
1410 if ((psState.ppOut & ssppa) != psState.ppOut) {
1411 FTRACE(4, " reprocessing succ's pred {}\n", pred->id());
1412 incompleteQ.push(psState.id);
1416 return true;
1420 //////////////////////////////////////////////////////////////////////
1424 void optimizeStores(IRUnit& unit) {
1425 PassTracer tracer{&unit, Trace::hhir_store, "optimizeStores"};
1426 Timer t(Timer::optimize_stores, unit.logEntry().get_pointer());
1428 // This isn't required for correctness, but it may allow removing stores that
1429 // otherwise we would leave alone.
1430 splitCriticalEdges(unit);
1433 * Global state for this pass, visible while processing any block.
1435 auto genv = Global { unit };
1436 auto const& poBlockList = genv.poBlockList;
1437 if (genv.ainfo.locations.size() == 0) {
1438 FTRACE(1, "no memory accesses to possibly optimize\n");
1439 return;
1441 FTRACE(1, "\nLocations:\n{}\n", show(genv.ainfo));
1444 * Initialize the block state structures.
1446 for (auto poId = uint32_t{0}; poId < poBlockList.size(); ++poId) {
1447 genv.blockStates[poBlockList[poId]].id = poId;
1451 * Analyze each block to compute its transfer function.
1453 * The blockAnalysis vector is indexed by post order id.
1455 auto const blockAnalysis = [&] () -> jit::vector<BlockAnalysis> {
1456 auto ret = jit::vector<BlockAnalysis>{};
1457 ret.reserve(unit.numBlocks());
1458 for (auto poId : poBlockList) {
1459 ret.push_back(analyze_block(genv, poId));
1461 for (auto& elm : genv.blk2Aloc) {
1462 auto poId = genv.blockStates[elm.first].id;
1463 auto& ba = ret[poId];
1464 ba.antLoc &= ~elm.second;
1465 ba.alteredAnt |= elm.second;
1466 ba.alteredAvl |= elm.second;
1468 genv.blk2Aloc.clear();
1469 return ret;
1470 }();
1472 FTRACE(2, "\nTransfer functions:\n{}\n",
1473 [&]() -> std::string {
1474 auto ret = std::string{};
1475 for (auto poId = uint32_t{0}; poId < poBlockList.size(); ++poId) {
1476 auto& analysis = blockAnalysis[poId];
1477 folly::format(
1478 &ret,
1479 " B{}\n"
1480 " antLoc : {}\n"
1481 " alteredAnt: {}\n"
1482 " alteredAvl: {}\n"
1483 " avlLoc : {}\n"
1484 " delLoc : {}\n",
1485 poBlockList[poId]->id(),
1486 show(analysis.antLoc),
1487 show(analysis.alteredAnt),
1488 show(analysis.alteredAvl),
1489 show(analysis.avlLoc),
1490 show(analysis.delLoc)
1493 return ret;
1497 compute_anticipated(genv, blockAnalysis);
1499 if (!RuntimeOption::EvalHHIRStorePRE) {
1501 * We've reached a fixed point.
1502 * Delete redundant stores
1504 FTRACE(2, "\nFixed point:\n{}\n",
1505 [&]() -> std::string {
1506 auto ret = std::string{};
1507 for (auto poId = uint32_t{0}; poId < poBlockList.size(); ++poId) {
1508 auto const blk = poBlockList[poId];
1509 auto const& state = genv.blockStates[blk];
1510 folly::format(
1511 &ret,
1512 " B{: <3}: antOut {}\n",
1513 blk->id(), show(state.antOut)
1516 return ret;
1519 for (auto& block : poBlockList) {
1520 optimize_block(genv, block);
1522 return;
1525 compute_placement_possible(genv, blockAnalysis);
1527 for (auto& block : poBlockList) {
1528 optimize_block_pre(genv, block, blockAnalysis);
1531 if (genv.needsReflow) {
1532 reflowTypes(genv.unit);
1536 //////////////////////////////////////////////////////////////////////