Content

  1. Measurements of Programs & Computers
    1. Measurements
    2. Amdahl's Law
    3. Tools for Measuring
  2. Compiler Optimizations
    1. Common Compiler Optimizations
  3. Basics of Computer Architecture
    1. Pipelines
    2. Branch Prediction
    3. Out-of-Order Execution
    4. Superscalar & Simultaneous Multithreading(SMT)/Hyperthreading

Measurements of Programs & Computers

Measurements

  • IPS: instructions per second
  • FLOPS: floating point operations per second
  • IPC: instructions per cycle
  • CPI: cycles per instruction

Amdahl's Law

  • speedup = OldTime/NewTime
  • NewTime = OldTime x [(1 - f) + f/s]
    speedup = 1 / (1 - f + f/s)
    • f = fraction of execution time the optimization applies to
    • s = improvement factor

Tools for Measuring

  • S/W timers
    • C library: <sys/times.h>
    • OS-level timers: /usr/bin/time
  • H/W timers & performance counters
    • Built into processor chip: low-level architecture events e.g. cache misses, branch mispredictions, ...
    • S/W packages to make them easier to use: perf
  • Instrumentation

    • Self implemented codes
    • gprof
      • Periodically interrupt program
      • Measure time spent & # of calls made in each function
    • gcov
      • Profile of execution within a function e.g. # of times each line of code was executed, which loops/conditional statements are important
    • Disturbing the system slows down execution
    • Comparison

      | | gprof | gcov | |:---|:------|:-----| | Compile | faster (insert counter func for each function) | slower (insert counter func for each line) | | Size | smaller | bigger | | Runtime | slower (frequent interrupts) | faster |

Compiler Optimizations

  • Preserve correctness
  • Improve performance
    • Fewer CPI
      • Schedule instructions
      • Improve cache/memory behavior
    • Fewer instructions
  • Worth the effort
---> Front End ---> Optimizer ---> Code Generator --->
 HLL            IR             IR                  LLL
  • Limitations
    • Cannot change program behavior
    • Inter-procedural analysis too expensive -> most analysis performed within procedures
    • Hard to anticipate run-time inputs -> most analysis based on static information
    • Must be conservative
  • Programmers should:
    • Select best algorithm
    • Write readable & maintainable code
    • Eliminate optimization blockers
      • Memory aliasing
        • 2 different memory references specify single location
      • Procedural side-effects

Common Compiler Optimizations

  • Machine independent
    • Constant propagation
    • Constant folding
    • Common subexpression elimination
    • Dead code elimination
    • Loop invariant code motion
    • Function inlining
      • Space-speed tradeoff
      • Overloading issue
    • Reduction in strength: replace costly operation with simpler one
      • Shifting (machine-dependent)
      • Make use of registers
    • Direct access
      • Avoid bound checking
  • Machine dependent
    • Instruction scheduling
    • Loop unrolling
      • Enables more aggressive instruction scheduling
    • Parallel unrolling

Basics of Computer Architecture

  • RISC: simpler instructions -> pipeline
  • 64-bit: larger databases, counter not overflowing, better math performance -> code size increases

Pipelines

  • Important factors
    • Branch prediction
    • Data dependency
  • Pipeline deeper -> misprediction penalty larger
  • Multiple instruction issue: flushing & refetching more instructions
  • OOP: More indirect branches to be predicted

Branch Prediction

  • Predict future based on history
  • Global predictor: predict based on global & local history
    • (m, n) prediction: m-bit global x n-bit local

Out-of-Order Execution

  • Solve data dependency
  • Mask cache miss delay

Superscalar & Simultaneous Multithreading(SMT)/Hyperthreading

    1                1 2              1 2 6
    2                3 4 5            3 4 5    
    3                6                7 8
    4                7 8              9
    5                9
    6
    7
    8
    9
single-issue     superscalar    out-of-order superscalar

    1    1'            1 2 6             1 2 1'
    2    2'            3 4 5             2'6 6'
    3    3'            7 8               3'3 5
    4    4'            9                 4 4'5'
    5    5'            1'2'6'            7 8 7'
    6    6'            3'4'5'            9 8'
    7    7'            7'8'              9'
    8    8'            9'
    9    9'
2 applications        fast         hyperthreading
                context switching

results matching ""

    No results matching ""