Content
- Performance Measurement
- CPU Time
- CISC vs. RISC
- Amdahl's Law
- Benchmark
- Fallacies & Pitfalls
Performance Measurement
- Execution time: time to finish a task
- Includes disk & memory accesses, I/O, OS overhead, CPU execution time, etc.
- Important for individual PCs
- Throughput: number of tasks finished per unit time
- Important for batch processing (datacenters)
Performance = 1 / Execution Time
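A quick sanity check of this relation, as a minimal sketch (the task and times below are made up):

# Hypothetical execution times for the same task on two machines.
time_a = 10.0  # seconds on machine A
time_b = 15.0  # seconds on machine B

perf_a = 1 / time_a
perf_b = 1 / time_b

# Relative performance: A is perf_a / perf_b = 1.5x faster than B.
print(f"A is {perf_a / perf_b:.2f}x faster than B")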
Measuring Execution Time
- Wall clock time (elapsed time): time a user experiences for a computer to finish a task
- Includes OS overhead, I/O, idle time, time shared with other users
- CPU time: time the CPU spends working on the task
- Includes user CPU and OS CPU time
Output of the Unix `time` command:
- real: wall clock time
- user: user CPU time
- sys: OS CPU time
- user + sys = CPU time; this can exceed real time on a multiprocessor computer
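The difference can be observed directly from Python, as a minimal sketch (the busy loop is just a stand-in workload):

import time

start_wall = time.perf_counter()  # wall clock, like `real`
start_cpu = time.process_time()   # user + sys CPU time of this process

total = sum(range(10_000_000))    # CPU-bound work: counts toward both
time.sleep(1)                     # idle time: counts toward wall clock only

print(f"wall: {time.perf_counter() - start_wall:.2f}s")
print(f"cpu:  {time.process_time() - start_cpu:.2f}s")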
CPU Time
* The Iron Law
CPU time = Cycle Count * Cycle Time
         = Cycle Count / Clock Frequency
         = (# of instructions / program)   ===> A
         * (# of cycles / instruction)     ===> B
         * (time / cycle)                  ===> C
A: algorithm; language; compiler; ISA
B: language; compiler; ISA; microarchitecture
C: ISA; H/W design
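Plugging hypothetical numbers into the Iron Law (the program and machine below are made up):

instruction_count = 2_000_000_000  # dynamic instructions executed
avg_cpi = 1.5                      # average cycles per instruction
clock_rate = 3e9                   # 3 GHz, i.e. cycle time = 1/3 ns

cycle_count = instruction_count * avg_cpi
cpu_time = cycle_count / clock_rate  # = IC * CPI * cycle time

print(f"CPU time: {cpu_time:.2f}s")  # 2e9 * 1.5 / 3e9 = 1.00s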
Synchronous Sequential Circuits
All state elements connected to the same clock signal.
input -> |combinational| -> |state| -> |combinational| -> ... -> output
         |    logic    |    |elem.|    |    logic    |
                              ^
clk --------------------------+------------------------ (to all state elements)
The propagation delay through the combinational logic must be shorter than the clock period. The longest such path is called the critical path; it determines the maximum clock speed.
Increasing the clock frequency shortens each cycle, so less work is accomplished within 1 cycle -> more cycles needed.
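For example (assumed delay value): if the critical path takes 0.5 ns, the clock period must be at least 0.5 ns, so the maximum clock frequency is 1 / 0.5 ns = 2 GHz.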
Cycle Count = Instruction Count * Cycles Per Instruction
Different instructions require different numbers of cycles.
* Average CPI (n classes of instructions)
CPI = Cycle Count / Instruction Count
    = sum_{i=1}^{n} (CPI_i * IC_i) / Instruction Count
where IC_i is the count of instructions in class i (numeric sketch after the example below).
// Example (assume CPI = 1 for every instruction)
      i = 0
loop: a = a + 1
      i = i + 1
      if i < 10 goto loop
// static instructions: 4 (lines of code)
// dynamic instructions: 31 (1 + 3 per iteration * 10 iterations)
// cycles: 31 (CPI = 1, so cycles = dynamic instruction count)
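A minimal sketch of the weighted-average CPI formula (the instruction classes and counts are made up):

# (CPI_i, instruction count IC_i) per class, e.g. ALU, load/store, branch
classes = [(1, 500), (2, 300), (3, 200)]  # hypothetical mix

total_cycles = sum(cpi * ic for cpi, ic in classes)
total_insts = sum(ic for _, ic in classes)

avg_cpi = total_cycles / total_insts
print(f"average CPI = {avg_cpi:.2f}")  # (500 + 600 + 600) / 1000 = 1.70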
- Number of instructions in a program depends on:
- Nature of application
- Compiler techniques
- Instructions available in the ISA
- Average CPI
- CPU microarchitecture
- ISA (CISC vs. RISC)
- Current running state of the CPU
CISC vs. RISC
| | CISC | RISC |
|---|---|---|
| Instructions | Complex (memory-memory; LOAD & STORE incorporated) | Simple (register-register; LOAD & STORE independent) |
| Addressing modes | Complex | Simple |
| H/W design | Harder | Simpler (enables H/W optimization; pipelining) |
| Compiler | Easier implementation (shorter compiled code); less attractive as compiling techniques improve | Easier optimization (simpler choice of instructions) |
| Emphasis | H/W | S/W |
| Registers | Fewer | More needed (each instruction requires less chip space, leaving room for registers) |
| Microarchitecture | CPI | Cycle Time |
|---|---|---|
| CISC | > 1 | Short |
| RISC (single-cycle, unpipelined) | 1 | Long |
| RISC (pipelined) | 1 | Short |
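To see how these trade-offs interact in the Iron Law, compare CPU time under hypothetical numbers (all values below are made up for illustration):

def cpu_time(instruction_count, cpi, cycle_time_ns):
    return instruction_count * cpi * cycle_time_ns  # nanoseconds

# The same hypothetical program compiled for two machines:
cisc = cpu_time(1_000_000, cpi=4.0, cycle_time_ns=0.5)   # fewer, complex instructions
risc = cpu_time(2_500_000, cpi=1.0, cycle_time_ns=0.5)   # more, simple instructions

print(f"CISC: {cisc / 1e6:.2f} ms, RISC: {risc / 1e6:.2f} ms")  # 2.00 ms vs 1.25 ms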
Amdahl's Law
The maximum overall speedup is limited by the fraction of the program that cannot be sped up.
P = portion of the program that can be optimized
S = speedup of that portion
Overall Speedup = 1 / ((1 - P) + P / S)
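For example: if 90% of a program can be sped up 10x, the overall speedup is only 1 / (0.1 + 0.9/10) ≈ 5.26x, not 10x. A minimal sketch:

def amdahl(p, s):
    """Overall speedup when portion p of the program is sped up by factor s."""
    return 1 / ((1 - p) + p / s)

print(f"{amdahl(0.9, 10):.2f}x")            # ~5.26x
print(f"{amdahl(0.9, float('inf')):.2f}x")  # limit as s -> inf: 1/(1-p) = 10.00x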
Benchmark
A set of programs representative of typical workloads, used to compare processor performance.
SPEC CPU Benchmark
Overall performance = geometric mean of performance ratios (reference time / execution time)
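A minimal sketch of the SPEC-style aggregation (the ratios below are made up; each real SPECratio is reference time / measured execution time for one benchmark):

import math

spec_ratios = [12.0, 8.0, 15.0, 10.0]  # hypothetical per-benchmark ratios

overall = math.prod(spec_ratios) ** (1 / len(spec_ratios))
print(f"overall performance: {overall:.2f}")  # geometric mean ~= 10.95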
Fallacies & Pitfalls
- Pitfall: Expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement.
- Fallacy: Computers at low utilization use little power.
- Fallacy: Designing for performance and designing for energy efficiency are unrelated goals.
- Pitfall: Using a subset of the performance equation as a performance metric.
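To illustrate the last pitfall with made-up numbers: MIPS (millions of instructions per second) uses only part of the performance equation and ignores instruction count differences between ISAs, so it can rank machines incorrectly:

# Hypothetical machines running the same task with different ISAs:
machines = [("A", 10e9, 1.0, 2e9),  # instruction count, CPI, clock rate (Hz)
            ("B", 4e9, 1.2, 2e9)]

for name, ic, cpi, clock in machines:
    cpu_time = ic * cpi / clock
    mips = ic / (cpu_time * 1e6)
    print(f"{name}: {mips:.0f} MIPS, {cpu_time:.1f}s")
# A: 2000 MIPS, 5.0s  <- higher MIPS
# B: 1667 MIPS, 2.4s  <- but B finishes first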