Content

  1. Performance Measurement
  2. CPU Time
  3. CISC vs. RISC
  4. Amdahl's Law
  5. Benchmark
  6. Fallacies & Pitfalls

Performance Measurement

  • Execution time: time to finish a task
    • Disk & memory accesses, I/O, OS overhead, CPU execution time, etc.
    • Important for individual PCs
  • Throughput: number of tasks finished per unit time
    • Important for batch processing (datacenters)

Performance = 1 / Execution Time
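As a quick illustration with hypothetical times, relative performance is the inverse ratio of execution times:

```python
# Hypothetical execution times for the same program on two machines.
time_a = 10.0  # seconds on machine A
time_b = 15.0  # seconds on machine B

perf_a = 1 / time_a
perf_b = 1 / time_b

# "A is n times faster than B" means perf_a / perf_b = time_b / time_a.
n = perf_a / perf_b
print(n)  # 1.5
```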

Measuring Execution Time

  • Wall clock time (elapsed time): time a user experiences for a computer to finish a task
    • Includes OS overhead, I/O, idle time, time shared with other users
  • CPU time: time spent on user task in CPU
    • Includes user CPU and OS CPU time
real: wall clock time
user: user CPU time --|--> CPU time; can exceed real time on a multiprocessor
sys: OS CPU time    --|
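The difference between wall-clock time and CPU time can be observed with Python's timers (a minimal sketch; the 0.2 s sleep is an arbitrary choice):

```python
import time

start_wall = time.perf_counter()  # wall clock ("real")
start_cpu = time.process_time()   # user + sys CPU time of this process

time.sleep(0.2)               # idle time: counts toward real time only
total = sum(range(100_000))   # busy work: counts toward both

real = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu

# Sleeping consumes wall-clock time but almost no CPU time, so real > cpu here.
```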

CPU Time

* The Iron Law
CPU time = Cycle Count * Cycle Time
         = Cycle Count / Cycle Frequency
         = # of instructions / program * ===> A
           # of cycles / instruction *   ===> B
           time / cycle                  ===> C
A: algorithm; language; compiler; ISA
B: language; compiler; ISA; microarchitecture
C: ISA; H/W design
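The three factors multiply out directly. A numeric check with hypothetical values (10^9 instructions, average CPI of 2, a 2 GHz clock):

```python
# Iron Law with hypothetical numbers.
instruction_count = 1_000_000_000  # instructions / program
cpi = 2.0                          # cycles / instruction
clock_hz = 2_000_000_000           # 2 GHz -> cycle time = 0.5 ns

cycle_count = instruction_count * cpi
cycle_time = 1 / clock_hz          # time / cycle

cpu_time = cycle_count * cycle_time  # = cycle_count / clock_hz
print(cpu_time)  # 1.0 second
```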
  1. CPU time = Cycle Count * Cycle Time = Cycle Count / Cycle Frequency

Synchronous Sequential Circuits

All state elements connected to the same clock signal.

input |combinational| - |  | - |combinational| - ... - output
      | logic       |   |/\|   | logic       |
clk ---------------------|------------------------| ...

The propagation delay through the combinational logic must be shorter than the clock period. The longest such path is called the critical path, and it determines the maximum clock speed.

Increasing the clock frequency shortens each cycle, so less work is accomplished within one cycle and more cycles are needed to finish the same task.
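For instance, if the critical path takes 0.8 ns (a hypothetical figure), the clock period cannot be shorter than that:

```python
# Maximum clock frequency allowed by a hypothetical 0.8 ns critical path.
critical_path_s = 0.8e-9

f_max_hz = 1 / critical_path_s
print(f_max_hz / 1e9)  # 1.25 (GHz)
```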

  2. Cycle Count = Instruction Count * Cycles Per Instruction

Different instructions require different numbers of cycles.

* Average CPI (n classes of instructions)
CPI = Cycle Count / Instruction Count
    = sum_{i=1}^{n}(CPI_i * Instruction Count_i) / Instruction Count
// Example (CPI = 1)
        i = 0 

loop:   a = a + 1 

        i = i + 1 

        if i < 10 goto loop
// static instructions: 4
// dynamic instructions: 31
// cycles: 31
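The dynamic count above can be verified by simulating the loop and counting each executed instruction (a sketch; `a` is assumed to be initialized elsewhere, so its setup is not counted):

```python
# Count dynamic instructions for the loop above.
dynamic = 0
a = 0  # assumed initialized before the counted code

i = 0; dynamic += 1          # i = 0
while True:
    a = a + 1; dynamic += 1  # a = a + 1
    i = i + 1; dynamic += 1  # i = i + 1
    dynamic += 1             # if i < 10 goto loop
    if not (i < 10):
        break

print(dynamic)  # 31 = 1 + 10 iterations * 3 instructions
```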
  • Number of instructions in a program depends on:
    • Nature of application
    • Compiler techniques
    • Available instructions of ISA
  • Average CPI
    • CPU microarchitecture
    • ISA (CISC vs. RISC)
    • Current running state of CPU

CISC vs. RISC

                 CISC                                     RISC
Instructions     Complex (memory-memory;                  Simple (register-register;
                 LOAD & STORE incorporated)               LOAD & STORE independent)
Addressing mode  Complex                                  Simple
H/W design       Harder                                   Simpler (H/W optimization;
                                                          pipelining)
Compiler         Easier implementation (shorter           Easier optimization (simpler
                 compiled code); less attractive          choice of instructions)
                 as compiling techniques improve
Emphasis         H/W                                      S/W
Registers        Fewer (needed)                           More (each instruction
                                                          requires less chip space)

Microarchitecture                  CPI    Cycle time
CISC                               > 1    Short
RISC (single-cycle, unpipelined)   1      Long
RISC (pipelined)                   1      Short
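The CPI / cycle-time tradeoff in the table can be made concrete with the Iron Law (all numbers below are hypothetical):

```python
def cpu_time_ns(instructions, cpi, cycle_time_ns):
    # Iron Law: instructions * (cycles / instruction) * (time / cycle)
    return instructions * cpi * cycle_time_ns

# Same program on three hypothetical designs.
cisc        = cpu_time_ns(8e8, 4.0, 0.5)  # fewer, complex instructions; CPI > 1
risc_single = cpu_time_ns(1e9, 1.0, 2.0)  # CPI = 1 but a long cycle
risc_pipe   = cpu_time_ns(1e9, 1.0, 0.5)  # CPI = 1 and a short cycle

# The pipelined RISC design wins by combining CPI 1 with a short cycle.
```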

Amdahl's Law

The maximum overall speedup is limited by the fraction of the program that cannot be sped up.

P = fraction of execution time that can be optimized
S = speedup of the optimized fraction
Overall Speedup = 1 / ((1-P) + P/S)
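A small helper makes the limit visible: even an infinite speedup of the optimizable fraction caps the overall speedup at 1 / (1 - P).

```python
def speedup(p, s):
    """Amdahl's Law: overall speedup when fraction p is sped up by s."""
    return 1 / ((1 - p) + p / s)

print(speedup(0.9, 10))    # ~5.26: 90% of the program sped up 10x
print(speedup(0.9, 1e12))  # ~10:   the ceiling 1 / (1 - 0.9)
```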

Benchmark

A set of programs representing normal workloads, used to compare processor performance.

SPEC CPU Benchmark

Overall performance = geometric mean of the performance ratios (SPECratio = reference time / execution time) across all benchmarks
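With hypothetical SPECratios for four benchmarks, the summary figure is their geometric mean:

```python
import math

# Hypothetical SPECratios (reference time / execution time).
ratios = [8.0, 10.0, 12.5, 16.0]

overall = math.prod(ratios) ** (1 / len(ratios))
print(round(overall, 2))  # geometric mean, ~11.25
```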

Fallacies & Pitfalls

  • Pitfall: Expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement.
  • Fallacy: Computers at low utilization use little power.
  • Fallacy: Designing for performance and designing for energy efficiency are unrelated goals.
  • Pitfall: Using a subset of the performance equation as a performance metric.
