Content
- Overview
- Pipelined Control
- Hazards
- Structural Hazards
- Data Hazards
- Control Hazards
Overview
- 5 stages of instruction execution
- Instruction Fetch
- Instruction Decode
- Execution
- Memory Operation
- Write Back
- Pipeline registers separate the stages; compared with a single-cycle design:
- Latency of a single instruction increases
- Throughput per cycle stays the same (at most one instruction per cycle)
- Throughput per unit time increases
- Cycle time is shorter
- Performance
- Ideal speedup = number of stages (assumes balanced stages, no pipeline-register overhead, and no hazards)
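The latency and throughput trade-off above can be checked with a little arithmetic. The following is a minimal sketch, not a model of any particular processor: the stage delays, the register overhead, and all function names are hypothetical values chosen for illustration.

```python
def single_cycle_time(stage_delays):
    """A single-cycle design must fit all stages into one long cycle."""
    return sum(stage_delays)


def pipelined_cycle_time(stage_delays, reg_overhead):
    """The pipelined cycle is set by the slowest stage plus register overhead."""
    return max(stage_delays) + reg_overhead


def pipelined_total_time(n_instr, stage_delays, reg_overhead):
    """The first instruction takes k cycles; each later one finishes a cycle after."""
    k = len(stage_delays)
    return (k + n_instr - 1) * pipelined_cycle_time(stage_delays, reg_overhead)


def speedup(n_instr, stage_delays, reg_overhead):
    """Single-cycle time divided by pipelined time for the same n instructions."""
    single = n_instr * single_cycle_time(stage_delays)
    return single / pipelined_total_time(n_instr, stage_delays, reg_overhead)


# Hypothetical delays in picoseconds for IF, ID, EX, MEM, WB.
unbalanced = [200, 100, 200, 200, 100]
balanced = [160, 160, 160, 160, 160]

# Latency of one instruction: single-cycle vs. pipelined (with 20 ps overhead).
print(single_cycle_time(unbalanced), pipelined_total_time(1, unbalanced, 20))

# Speedup over many instructions: approaches the stage count only when
# stages are balanced and the register overhead is zero.
print(speedup(1_000_000, balanced, 0))
print(speedup(1_000_000, unbalanced, 20))
```

With these assumed numbers, the latency of one instruction grows while the long-run speedup stays below the stage count unless the stages are perfectly balanced.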
Pipelined Control
The instruction and its control signals must travel down the pipeline together, so the instruction register and decoder are replicated at every stage.
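One way to picture this is a toy model in which every stage holds its own copy of the instruction and decodes only the signals it needs. This is a rough sketch under stated assumptions: the `CONTROL` table, the signal names, and the opcode strings are hypothetical and much simpler than a real decoder.

```python
# Hypothetical control table: the signals each opcode asserts.
CONTROL = {
    "add": {"mem_read": False, "mem_write": False, "reg_write": True},
    "lw":  {"mem_read": True,  "mem_write": False, "reg_write": True},
    "sw":  {"mem_read": False, "mem_write": True,  "reg_write": False},
}
NOP = {"mem_read": False, "mem_write": False, "reg_write": False}

IF, ID, EX, MEM, WB = range(5)


def decode(ir):
    """Per-stage decoder: an empty stage (None) behaves like a NOP."""
    return CONTROL.get(ir, NOP)


def simulate(program, cycles):
    """Return the per-stage instruction registers after each cycle."""
    irs = [None] * 5          # one instruction register per stage
    history = []
    for cycle in range(cycles):
        fetched = program[cycle] if cycle < len(program) else None
        irs = [fetched] + irs[:-1]   # every instruction moves one stage down
        history.append(list(irs))
    return history


history = simulate(["lw", "add", "sw"], cycles=6)
for cycle, irs in enumerate(history):
    mem = decode(irs[MEM])    # the MEM stage decodes its own copy
    wb = decode(irs[WB])      # the WB stage decodes its own copy
    print(cycle, irs, mem["mem_read"], mem["mem_write"], wb["reg_write"])
```

Each stage looks only at the instruction it currently holds, so the memory and write-back signals fire in the cycle when the corresponding instruction actually reaches that stage.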
Hazards
On every cycle, the hardware needs to detect and resolve all types of hazards while keeping the pipeline as full as possible, so that CPI stays close to 1. In real systems, a slightly higher CPI is accepted in return for a higher clock speed.
Structural Hazards
More than one stage needs access to the same physical resource in the same cycle.
- Solutions
- Add an extra copy of the resource
- Design the hardware to handle concurrent use
- Require the stages to access the resource at different times
Data Hazards
Stages access a data location (register or memory) in an order that violates the ISA's contract with the programmer.
- 3 types
- WAR (write after read)
- WAW (write after write)
- RAW (read after write): the only type that arises in the 5-stage RISC-V pipeline
- Solutions
- Stalling: freeze earlier stages
- Feedback & interlock
- Later stages provide dependence information to earlier stages, which can then kill or stall instructions
- Need to prevent deadlock
- Pipeline bubbles: NOP, special control decoding, disable pipeline registers, etc.
- Load & store hazards: sometimes resolved in pipeline or in the memory system itself
- Data forwarding: route new data to earlier stages
- Bypass
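- The only case that still needs a stall: load-use (an instruction uses a loaded value immediately after the load)
- Can often be avoided by the compiler through instruction scheduling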
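The forwarding and load-use rules above can be sketched in a few lines. This is an illustrative model only, assuming a classic 5-stage pipeline with full forwarding and a one-cycle load-use penalty; `Instr`, `forward_select`, and `load_use_stalls` are hypothetical names, and register 0 is treated as the hard-wired zero register.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    rd: Optional[int] = None    # destination register, if any
    rs1: Optional[int] = None   # first source register, if any
    rs2: Optional[int] = None   # second source register, if any


def forward_select(src, ex_mem_rd, ex_mem_writes, mem_wb_rd, mem_wb_writes):
    """Pick where an EX-stage operand comes from; the newest value wins."""
    if ex_mem_writes and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"        # result of the instruction one ahead
    if mem_wb_writes and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"        # result of the instruction two ahead
    return "REGFILE"           # no RAW hazard on this operand


def load_use_stalls(program):
    """Count bubbles: one per load whose result is used by the next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "lw" and prev.rd not in (None, 0) \
                and prev.rd in (cur.rs1, cur.rs2):
            stalls += 1
    return stalls


# lw x1, 0(x2); add x3, x1, x4; addi x6, x7, 1
naive = [Instr("lw", rd=1, rs1=2),
         Instr("add", rd=3, rs1=1, rs2=4),
         Instr("addi", rd=6, rs1=7)]

# Same work, with the independent addi scheduled between the load and its use.
scheduled = [naive[0], naive[2], naive[1]]

print(load_use_stalls(naive), load_use_stalls(scheduled))
print(forward_select(1, ex_mem_rd=1, ex_mem_writes=True,
                     mem_wb_rd=1, mem_wb_writes=True))
```

The reordered sequence shows the compiler-side fix: moving an independent instruction between the load and its consumer removes the bubble without changing the result.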
Control Hazards
Caused by branches and jumps: the next PC is not yet known when the following instructions are fetched.
- Solutions
- Stalling (X)
- Wait for EX stage to compute the target address & branch condition
- Stall for 2 cycles
- The pipeline does not know an instruction is a branch until the ID stage
- Changing the ISA
- The 2 instructions following the branch instruction will always be executed
- Branch delay slots: the extra cycles after the branch, filled with useful instructions or NOPs
- Speculate & Kill
- Speculate that the instructions in the slots after the branch will be executed
- If the branch is taken, kill those instructions; if not, continue
- Pros: wastes cycles only when the branch is taken
- Cons: complicated hardware
- Must interact correctly with stalls
- Branch/jump in delay slots?
- Pipelining jumps (JAL)
- Branch condition always true
- Always kill instructions
- The jump itself proceeds to write-back (to write the link register)
- Concerns for branch delay slots
- Hurts performance due to the penalty from I-cache misses
- Complicates advanced microarchitectures
- Difficult to find enough instructions to fill the slots in deeply pipelined processors
- Branch prediction and predicated instructions can help
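The cost of always stalling versus speculate-and-kill can be compared with a simple CPI estimate. This is a back-of-the-envelope sketch: the 2-cycle penalty comes from the notes above, while the branch frequency and taken rate are hypothetical workload numbers, and other hazards are ignored.

```python
def cpi_always_stall(branch_frac, penalty=2):
    """Every branch pays the full penalty while the pipeline waits for EX."""
    return 1 + branch_frac * penalty


def cpi_speculate_and_kill(branch_frac, taken_frac, penalty=2):
    """Only taken branches pay: the wrongly fetched instructions are killed."""
    return 1 + branch_frac * taken_frac * penalty


# Hypothetical workload: 20% of instructions are branches, half are taken.
BRANCH_FRAC = 0.2
TAKEN_FRAC = 0.5

stall_cpi = cpi_always_stall(BRANCH_FRAC)
spec_cpi = cpi_speculate_and_kill(BRANCH_FRAC, TAKEN_FRAC)
print(f"always stall:       CPI = {stall_cpi:.2f}")
print(f"speculate and kill: CPI = {spec_cpi:.2f}")
```

Under these assumptions, speculation recovers the cycles that stalling spends on not-taken branches; the gap narrows as the taken rate approaches 1.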