Content

Instruction Scheduling
1. H/W Resources
2. Data Dependencies
Software Pipelining
1. Example
2. Modulo Scheduling

Instruction Scheduling

Motivation
- RISC architecture philosophy: expose H/W features to compilers for optimizations
Scheduling Constraints
- H/W resources
- Data dependencies
- Control dependencies

H/W Resources

Conflict between register allocation & instruction scheduling
- Register allocation: minimize # of registers
- Instruction scheduling: reduce stalls by using more registers
Cache
- e.g. loop reordering, cache blocking

Data Dependencies

Directed acyclic graph with edges indicating dependency. Find topological sort to produce better sequence.

Heuristics
- Instructions that can run to completion without interference before those with interference
- Instructions with more dependents before those with less dependents
- Latency-weighted path

Software Pipelining

Hiding instruction latency
- Increase distance between dependent operations by spreading them over more iterations; latency can be absorbed by other non-dependent operations

Often used in combination with loop unrolling
- Variable latency of instructions?
- Unroll multiple times, loop body huge; bad for cache

Example

for i = 1 to bignumber
  A(i) ; 3 cycle latency
  B(i) ; 3
  C(i) ; 12(perhaps a floating point operation)
  D(i) ; 3
  E(i) ; 3
  F(i) ; 3
end

->

prologue
for i = 1 to (bignumber - 6)
  A(i+6)
  B(i+5)
  C(i+4)
  D(i+2) ; note that we skip i+3
  E(i+1)
  F(i)
end
epilogue

Distance of 12 between C(i) and D(i).

; loop prologue (arranged on lines for clarity)
A(1)
A(2), B(1)
A(3), B(2), C(1)
A(4), B(3), C(2)
A(5), B(4), C(3), D(1)
A(6), B(5), C(4), D(2), E(1)

; loop epilogue (arranged on lines for clarity)
B(bignumber), C(bignumber-1), D(bignumber-3), E(bignumber-4), F(bignumber-5)
C(bignumber), D(bignumber-2), E(bignumber-3), F(bignumber-4)
D(bignumber-1), E(bignumber-2), F(bignumber-3)
D(bignumber), E(bignumber-1), F(bignumber-2)
E(bignumber), F(bignumber-1)
F(bignumber)

Modulo Scheduling

Interleaving instructions of an inner loop to achieve better parallelism of software pipelining.

Repeatedly schedule body of loop with different initiation intervals until perfect schedule reached.

References

Modulo Scheduling

Instruction Scheduling & Software Pipelining