Content

  1. Overview
  2. Multicore Processors
    1. Direct Connections
    2. On-Chip Networks
    3. Shared Memory Cores
    4. Symmetric Multiprocessors
    5. Synchronization
      1. Sequential Consistency
      2. Memory Fences
    6. Memory Coherence
      1. Cache Coherence vs. Memory Consistency
      2. Example: Parallel I/O
    7. Intervention
    8. False Sharing
    9. Out-of-Order Loads/Stores & CC

Overview

  • Instruction level parallelism
    • Dynamic: Super-scalar processor, OOO execution
    • Static: VLIW
  • Data level parallelism
    • Vector machines, SIMD extensions
  • Thread level parallelism

Multicore Processors

Uniprocessor speed is limited by the power wall -> multicore

Direct Connections

  • Low latency, high throughput, point-to-point network between processors
    • Bypass I/O subsystems
  • Low latency between neighboring processors
    • Sometimes dedicated machine instructions
  • Multi-hop routing between further processors
  • Often tied to distributed system design
  • Often proprietary design

On-Chip Networks

  • Building network for system-on-chip
    • Complete computer system on a chip, including graphics, peripheral and memory controllers, and accelerators
  • MPSoC (Multi-processor system on a chip)
    • Multiple compute cores in the system
  • Mostly proprietary

Shared Memory Cores

Symmetric Multiprocessors

Each processor equally far away from memory; any processor can do any I/O.

Synchronization

  • Producer-consumer

  • Mutual exclusion

Sequential Consistency

A system is sequentially consistent if the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each processor appear in this sequence in the order specified by its program.

Sequential consistency = arbitrary order-preserving interleaving of memory references of sequential programs

Imposes more memory ordering constraints than those imposed by uniprocessor program dependencies.

  • Issues
    • Out-of-order execution capability
    • Caches: a store may not yet be seen by other processors

No common commercial architecture has a sequentially consistent memory model.

Memory Fences

Processors with relaxed/weak memory models (permit load/store to different addresses to be reordered) need to provide memory fence instructions to force serialization of memory accesses.
Expensive, but cost only paid when needed.

Memory Coherence

Both write-back and write-through may still cause memory inconsistency.

  • H/W support
    • Only 1 processor has write permission to a memory location at a time
    • No processor can load a stale copy

Cache Coherence vs. Memory Consistency

  • Cache coherence protocol
    • Ensures all writes by 1 processor are eventually visible to others, for 1 memory address
    • Not enough to ensure sequential consistency
  • Memory consistency model
    • Gives rules for when a write becomes visible to another processor's read, across different addresses
  • Cache coherence protocol + processor memory reorder buffer -> implement memory consistency model

Example: Parallel I/O

DMA: direct memory access; I/O devices can read/write memory autonomously, without involving the CPU

  • Problem
    • Memory -> disk: the disk may read stale data if the latest copy is in a dirty cache
    • Disk -> memory: cached copies become stale once DMA overwrites memory
  • Solution: snoopy cache
    • Cache watches DMA transfers on the bus
    • Tags are dual-ported

  Observed Bus Cycle           Cache State           Cache Action
  DMA read (memory -> disk)    Not cached            N/A
                               Cached, not modified  N/A
                               Cached, modified      Cache intervenes
  DMA write (disk -> memory)   Not cached            N/A
                               Cached, not modified  Cache purges copy
                               Cached, modified      ???

  • Snoopy cache coherence protocol
    • Write miss
      • Invalidate address in all other caches
    • Read miss
      • Force write-back from dirty cache to memory
  • Cache state transition diagram
    • MSI
    • MESI

Intervention

When a cache issues a bus read for a line that another cache holds modified, memory (whose copy is stale) may also respond to the request.
The owning cache must intervene, through the memory controller, to supply the correct data.

False Sharing

Two processors access different words that happen to lie in the same cache line; if at least one of them writes, the line ping-pongs between the caches even though no word is actually shared.

Out-of-Order Loads/Stores & CC

  • Blocking caches
    • One request at a time + cache coherence => sequential consistency
  • Non-blocking caches
    • Multiple requests (to different addresses) + cache coherence => relaxed memory models
