1 YARI Pipeline Design, version 1
3 Assuming the classic five stage pipeline:
4 - IF - instruction fetch,
5 - DE - decode and register fetch,
6 - EX - bypassing and execution,
7 - ME - memory access, and
8 - WB - results write-back.
10 Implementing this is trivial if you ignore hazards.
12 The main hazards we have to deal with are:
14 - WAR hazards - in the simplest pipeline this only happens for loads
15 followed by an immediate use. The most obvious way to deal with this
18 - Control transfers, such as branches (B and Bxx), jumps (J),
19 subroutine calls (JAL and JRAL), etc. We'll call all of these
20 branches for simplicity. MIPS adds the twist of delayed branches,
21 meaning that the instruction immediately following the branch (the
22 "delay slot") is executed before the control is transferred. If any
23 of the instructions following the delay are present in the pipe they
26 - Memory busy / data cache (D$) misses. Again, the most obvious way to
27 deal with this is to stall the pipe until the memory/cache is ready.
29 - Instruction cache (I$) misses.
31 Unfortunately, stalling the pipeline not only is very expensive (a
32 global signal gating all flip-flops in the pipeline, and memory blocks
33 can't be stalled so needs extra logic), but stalling for WAR hazards
34 means introducing bubbles in the pipeline, which in turn makes dealing
35 with the delay slot very complicated (branches must flush everything
36 following the delay slot, but the delay slot can be in several
37 locations depending on the history of stalls).
39 We thus reject stalling and instead restart the pipeline whenever we
40 encounter a hazard. This generally means that means a higher cost of
41 hazards in number of pipeline bubbles, but this is partially offset by
42 a shorter cycle time and the fact that hazards doesn't happen all the
43 time (given the probability of the hazard, you can work out much speed
44 up you need to offset the additional hazard costs).
47 Restarting upon hazards:
49 - I$ miss - not much choice here, the IF stage will emit bubbles until
52 - Branches - to simplify the handling of delay slots we will require
53 that a taken branch will never see a bubble in it's delay slot. If
54 that is detected, we treat it as a different kind of hazard and
55 restart the branch instruction, flushing everything following and
56 including the branch itself. This trick only works because all
57 branches in MIPS are idempotent. Otherwise, a taken branch flushes
58 everything following the delay slot and restart from the branch
61 - D$ misses/memory busy - there are several options here, but they are
62 somewhat dependent on how the ME stage is implemented.
64 The simplest option is to kill the pipeline while memory is
65 busy/while the D$ fills and restart the instruction following the
66 store/load. Unfortunately, as this hazard is fairly deep in the
67 pipeline it pays the highest penalty for restarting as opposed to
70 A slightly better option is keep restarting the following
71 instruction until we know that memory will be ready before that
72 instruction propagates down to the ME stage.
74 We can further reduce store hazards by applying a small store buffer
75 (~ FIFO). The caveat is of course that loads would either need to
76 check against the store buffer or stall until it has drained.
78 Raising the D$ size and/or associativity will help lower the
79 frequency of the load hazards. Another technique that can sometimes
80 hide load latencies is non-blocking loads. Here the idea is simply
81 to decouple the load pipeline, keeping track of which registers have
82 outstanding loads, thus causes WAR hazards whenever a use of such a
83 register is attempted. As data eventually arrives they can be
84 written to the register file using a second write port (requiring
85 two write ports to the register file), or they can wait from the
86 write port to be otherwise idle and write back the loaded
87 results. Alas, in practice for MIPS code there tends to be a fairly
88 short distance between the load and its use, rendering this
89 elaborate technique less valuable.
91 - WAR - simply restart (**) the use. This hazard is non-existent for
92 properly scheduled code (assuming blocking caches).
94 (**) Alas, nothing is quite that simple. The instructions restarted
95 in the WAR hazards could be delay-slots, and if so, would cause the
96 effect of the branch to be ignored. To issue this restart, we must
97 know if the instruction is in a delay-slot, and if it is, we restart
98 the preceding instruction. (This again depends on the impotence of
102 Choosing between hazards: as hazards are generally detected in
103 detected in different stages, it's important to ensure that the hazard
104 of the instruction closet to committing, that is, furthest along in
105 the pipe, takes priority.
108 Implementing the hazard handling
110 Intuitively we would want to restart as soon as a hazard can be
111 detected, however to more stages that can issue a restart, the larger
112 the priority arbitration as well larger the mux for the restart
113 control (address and stages to flush). It will be better to focus on
114 the frequent hazards (say, branches and D$ misses) and only accept
115 restarts from the corresponding stages (say, EX and ME). A less
116 important hazard as fx. WAR detected in DE, will have to ask EX to
117 perform the restart. However, care must be taken to ensure than these
118 pending hazards are handled correctly (for example, that no
119 instruction that will be flushed by the pending restart can cause any
120 hazards or other effects).
122 Naively implemented flushing a pipeline stage means clearing all the
123 flip flops, turning instructions into nops. This is very expensive, so
124 instead provide all stage with a "valid" bit. Any effect from a stage
125 (hazards, memory ops, register writes etc) must be ignored if the
126 stage is invalid (that is, it's a pipeline bubble).