Content

  1. Performance Measurement
  2. CPU Time
  3. CISC vs. RISC
  4. Amdahl's Law
  5. Benchmark
  6. Fallacies & Pitfalls

Performance Measurement

  • Execution time: time to finish a task
    • Disk & memory accesses, I/O, OS overhead, CPU execution time, etc.
    • Important for individual PCs
  • Throughput: number of tasks finished per unit time
    • Important for batch processing (datacenters)

Performance = 1 / Execution Time
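As a quick illustration with hypothetical times, relative performance is the inverse ratio of execution times:

```python
# Hypothetical execution times for the same program on two machines.
time_a = 10.0  # seconds on machine A
time_b = 15.0  # seconds on machine B

perf_a = 1 / time_a
perf_b = 1 / time_b

# "A is n times faster than B" means perf_a / perf_b = time_b / time_a.
n = perf_a / perf_b
print(n)  # 1.5
```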

Measuring Execution Time

  • Wall clock time (elapsed time): time a user experiences for a computer to finish a task
    • Includes OS overhead, I/O, idle time, time shared with other users
  • CPU time: time spent on user task in CPU
    • Includes user CPU and OS CPU time
real: wall clock time
user: user CPU time --|--> CPU time; can exceed real time on a multiprocessor
sys: OS CPU time    --|
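The difference between wall-clock time and CPU time can be observed with Python's timers (a minimal sketch; the 0.2 s sleep is an arbitrary choice):

```python
import time

start_wall = time.perf_counter()  # wall clock ("real")
start_cpu = time.process_time()   # user + sys CPU time of this process

time.sleep(0.2)               # idle time: counts toward real time only
total = sum(range(100_000))   # busy work: counts toward both

real = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu

# Sleeping consumes wall-clock time but almost no CPU time, so real > cpu here.
```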

CPU Time

* The Iron Law
CPU time = Cycle Count * Cycle Time
         = Cycle Count / Cycle Frequency
         = # of instructions / program * ===> A
           # of cycles / instruction *   ===> B
           time / cycle                  ===> C
A: algorithm; language; compiler; ISA
B: language; compiler; ISA; microarchitecture
C: ISA; H/W design
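The three factors multiply out directly. A numeric check with hypothetical values (10^9 instructions, average CPI of 2, a 2 GHz clock):

```python
# Iron Law with hypothetical numbers.
instruction_count = 1_000_000_000  # instructions / program
cpi = 2.0                          # cycles / instruction
clock_hz = 2_000_000_000           # 2 GHz -> cycle time = 0.5 ns

cycle_count = instruction_count * cpi
cycle_time = 1 / clock_hz          # time / cycle

cpu_time = cycle_count * cycle_time  # = cycle_count / clock_hz
print(cpu_time)  # 1.0 second
```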
  1. CPU time = Cycle Count * Cycle Time = Cycle Count / Cycle Frequency

Synchronous Sequential Circuits

All state elements connected to the same clock signal.

input |combinational| - |  | - |combinational| - ... - output
      | logic       |   |/\|   | logic       |
clk ---------------------|------------------------| ...

The propagation delay through the combinational logic must be shorter than the clock period. The longest such path is called the critical path, and it determines the maximum clock speed.

Increasing the clock frequency shortens each cycle, so less work is accomplished within one cycle and more cycles are needed to finish the same task.
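For instance, if the critical path takes 0.8 ns (a hypothetical figure), the clock period cannot be shorter than that:

```python
# Maximum clock frequency allowed by a hypothetical 0.8 ns critical path.
critical_path_s = 0.8e-9

f_max_hz = 1 / critical_path_s
print(f_max_hz / 1e9)  # 1.25 (GHz)
```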

  2. Cycle Count = Instruction Count * Cycles Per Instruction

Different instructions require different numbers of cycles.

* Average CPI (n classes of instructions)
CPI = Cycle Count / Instruction Count
    = sum_{i=1}^{n}(CPI_i * Instruction Count_i) / Instruction Count
// Example (CPI = 1)
        i = 0 

loop:   a = a + 1 

        i = i + 1 

        if i < 10 goto loop
// static instructions: 4
// dynamic instructions: 31
// cycles: 31
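The dynamic count above can be verified by simulating the loop and counting each executed instruction (a sketch; `a` is assumed to be initialized elsewhere, so its setup is not counted):

```python
# Count dynamic instructions for the loop above.
dynamic = 0
a = 0  # assumed initialized before the counted code

i = 0; dynamic += 1          # i = 0
while True:
    a = a + 1; dynamic += 1  # a = a + 1
    i = i + 1; dynamic += 1  # i = i + 1
    dynamic += 1             # if i < 10 goto loop
    if not (i < 10):
        break

print(dynamic)  # 31 = 1 + 10 iterations * 3 instructions
```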
  • Number of instructions in a program depends on:
    • Nature of application
    • Compiler techniques
    • Available instructions of ISA
  • Average CPI
    • CPU microarchitecture
    • ISA (CISC vs. RISC)
    • Current running state of CPU

CISC vs. RISC

                 CISC                                     RISC
Instructions     Complex (memory-memory;                  Simple (register-register;
                 LOAD & STORE incorporated)               LOAD & STORE independent)
Addressing mode  Complex                                  Simple
H/W design       Harder                                   Simpler (H/W optimization;
                                                          pipelining)
Compiler         Easier implementation (shorter           Easier optimization (simpler
                 compiled code); less attractive          choice of instructions)
                 as compiling techniques improve
Emphasis         H/W                                      S/W
Registers        Fewer (needed)                           More (each instruction
                                                          requires less chip space)

Microarchitecture                  CPI    Cycle time
CISC                               > 1    Short
RISC (single-cycle, unpipelined)   1      Long
RISC (pipelined)                   1      Short
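The CPI / cycle-time tradeoff in the table can be made concrete with the Iron Law (all numbers below are hypothetical):

```python
def cpu_time_ns(instructions, cpi, cycle_time_ns):
    # Iron Law: instructions * (cycles / instruction) * (time / cycle)
    return instructions * cpi * cycle_time_ns

# Same program on three hypothetical designs.
cisc        = cpu_time_ns(8e8, 4.0, 0.5)  # fewer, complex instructions; CPI > 1
risc_single = cpu_time_ns(1e9, 1.0, 2.0)  # CPI = 1 but a long cycle
risc_pipe   = cpu_time_ns(1e9, 1.0, 0.5)  # CPI = 1 and a short cycle

# The pipelined RISC design wins by combining CPI 1 with a short cycle.
```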

Amdahl's Law

The maximum overall speedup is limited by the fraction of the program that cannot be sped up.

P = fraction of execution time that can be optimized
S = speedup of the optimized fraction
Overall Speedup = 1 / ((1-P) + P/S)
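A small helper makes the limit visible: even an infinite speedup of the optimizable fraction caps the overall speedup at 1 / (1 - P).

```python
def speedup(p, s):
    """Amdahl's Law: overall speedup when fraction p is sped up by s."""
    return 1 / ((1 - p) + p / s)

print(speedup(0.9, 10))    # ~5.26: 90% of the program sped up 10x
print(speedup(0.9, 1e12))  # ~10:   the ceiling 1 / (1 - 0.9)
```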

Benchmark

A set of programs representing normal workloads, used to compare processor performance.

SPEC CPU Benchmark

Overall performance = geometric mean of the performance ratios (SPECratio = reference time / execution time) across all benchmarks
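With hypothetical SPECratios for four benchmarks, the summary figure is their geometric mean:

```python
import math

# Hypothetical SPECratios (reference time / execution time).
ratios = [8.0, 10.0, 12.5, 16.0]

overall = math.prod(ratios) ** (1 / len(ratios))
print(round(overall, 2))  # geometric mean, ~11.25
```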

Fallacies & Pitfalls

  • Pitfall: Expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement.
  • Fallacy: Computers at low utilization use little power.
  • Fallacy: Designing for performance and designing for energy efficiency are unrelated goals.
  • Pitfall: Using a subset of the performance equation as a performance metric.
