Content
1. Overview
2. Paged Memory System
1. Private Address Space
3. Demand Paging
1. Page Fault Handling
4. Page Tables
1. Linear Page Table
2. Hierarchical Page Table
3. Translation Lookaside Buffer (TLB)
1. TLB Designs
5. Address Translation
1. TLB Miss Handling
1. Software
2. Hardware
3. Page Table Walk
2. Performance
1. Physical Addressed Cache vs. Virtual Addressed Cache
2. Virtual-Index Physical-Tag Caches
6. Historical Use
Overview
- Virtualization
- Assume having access to infinite memory address space
- Relocation
- Assume running at fixed memory addresses
- Protection
- Assume being the only running program
Paged Memory System
Address -> | page number | offset |
2^(number of offset bits) B = page size
Translation needs to be fast and space-efficient.
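The address split above can be sketched in a few lines. This is a minimal illustration, assuming 4 KiB pages (12 offset bits) and a plain dict standing in for the page table; the names `split_address` and `translate` are hypothetical.

```python
PAGE_OFFSET_BITS = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_OFFSET_BITS     # 2^(offset bits) bytes


def split_address(vaddr):
    """Return (page_number, offset) for an address."""
    page_number = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    return page_number, offset


def translate(vaddr, page_table):
    """Map a virtual address to a physical one.

    page_table is a dict {virtual page number: physical page number};
    a missing key (KeyError) stands in for a page fault.
    """
    vpn, offset = split_address(vaddr)
    ppn = page_table[vpn]
    # The offset passes through translation unchanged.
    return (ppn << PAGE_OFFSET_BITS) | offset
```

For example, `translate(0x12345, {0x12: 0x7})` yields `0x7345`: only the page number changes, the offset `0x345` is kept.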
Private Address Space
Demand Paging
A program's virtual memory requirements can exceed physical memory; disk is used as backing store, and pages are brought into memory only when accessed.
Page Fault Handling
- Exception -> OS takes over
- Create new page OR locate page on disk
- Copy content to memory
- If no physical memory is free, first swap a page out to swap space on disk
- Thrashing: excessive swapping, e.g. when the working set exceeds physical memory
- Update page tables
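Transferring a page from disk takes a long time (milliseconds) -> page faults are handled entirely in software by the OS, which runs in untranslated addressing mode so it can access the page tables. Other jobs can run on the CPU while the transfer is pending.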
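The fault-handling steps can be sketched as a toy pager. This is an illustration only, not how any particular OS does it: the class `TinyPager`, the FIFO choice of victim, and the dict used as swap space are all assumptions.

```python
from collections import OrderedDict


class TinyPager:
    """Toy demand pager with a fixed number of physical frames."""

    def __init__(self, frames):
        self.frames = frames
        self.resident = OrderedDict()  # vpn -> page data, oldest first
        self.swap = {}                 # vpn -> page data held "on disk"
        self.faults = 0

    def access(self, vpn):
        """Return the page for vpn, faulting it in if needed."""
        if vpn in self.resident:
            return self.resident[vpn]
        return self._page_fault(vpn)

    def _page_fault(self, vpn):
        self.faults += 1
        # No free frame: swap the oldest resident page out (FIFO, assumed).
        if len(self.resident) >= self.frames:
            victim, data = self.resident.popitem(last=False)
            self.swap[victim] = data
        # Locate the page on disk, or create a new zeroed page.
        data = self.swap.pop(vpn, None)
        if data is None:
            data = bytearray(16)       # tiny "page" for illustration
        # Update the page table: the page is now resident.
        self.resident[vpn] = data
        return data
```

With 2 frames, cycling through pages 0, 1, 2 repeatedly faults on every access, which is thrashing in miniature: the working set (3 pages) does not fit in memory (2 frames).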
Page Tables
Too large to keep in registers -> keep in memory. Each reference then costs 2 memory accesses: one to read the PTE (giving the page base), one for the data at base + offset.
Linear Page Table
- Page table entry (PTE)
- Valid bit
- Physical page number (PPN)
- Disk page number (DPN)
- Status bits (protection & usage)
- Limitations
- Too big
- Larger pages?
- Increased internal fragmentation
- Increased page fault penalty
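The "too big" limitation is easy to quantify. The sketch below assumes one PTE per virtual page and illustrative PTE sizes of 4 and 8 bytes; the function name is hypothetical.

```python
def linear_page_table_bytes(va_bits, page_offset_bits, pte_bytes):
    """Size of a linear page table with one PTE per virtual page."""
    entries = 1 << (va_bits - page_offset_bits)
    return entries * pte_bytes


MIB = 1 << 20

# 32-bit addresses, 4 KiB pages, 4-byte PTEs: 2^20 entries, 4 MiB per process.
small = linear_page_table_bytes(32, 12, 4)

# 64-bit addresses, 4 KiB pages, 8-byte PTEs: 2^52 entries, 2^55 bytes.
huge = linear_page_table_bytes(64, 12, 8)

# Larger pages (1 MiB) shrink the table but it is still 2^47 bytes.
bigger_pages = linear_page_table_bytes(64, 20, 8)
```

Larger pages reduce the entry count, but at 64 bits the table is still far too large, and the page-size increase brings the fragmentation and fault-penalty costs listed above.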
Hierarchical Page Table
Translation Lookaside Buffer (TLB)
- TLB hit: single-cycle
- TLB miss: page table walk
TLB Designs
- Typically 32-128 entries, fully-associative
- Each entry maps a whole page -> less spatial locality across pages -> more likely for 2 entries to conflict, hence full associativity
- Larger TLBs (256-512 entries) are sometimes 4-8 way set-associative
- Larger systems can have multi-level TLBs
- Random or FIFO replacement policy
- Process information
- Process identifier: may waste space, and is of little use when modern processes with large address spaces are running
- TLB reach: size of virtual memory space that can be simultaneously mapped by TLB
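TLB reach and FIFO replacement can be sketched together. This is a toy model under stated assumptions: a fully-associative TLB, a dict standing in for the page table walk, and hypothetical names `tlb_reach_bytes` and `FifoTLB`.

```python
from collections import OrderedDict


def tlb_reach_bytes(entries, page_size):
    """Virtual memory that the TLB can map at one time."""
    return entries * page_size


class FifoTLB:
    """Fully-associative TLB with FIFO replacement (toy model)."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()   # vpn -> ppn, in insertion order
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.map:
            self.hits += 1         # hit: FIFO order is not updated
            return self.map[vpn]
        self.misses += 1
        ppn = page_table[vpn]      # stands in for a page table walk
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)   # evict the oldest entry
        self.map[vpn] = ppn
        return ppn
```

A 64-entry TLB with 4 KiB pages has a reach of `tlb_reach_bytes(64, 4096)`, i.e. 256 KiB; a working set larger than that will miss in the TLB even if every page is resident in memory.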
Address Translation
TLB Miss Handling
- Handling TLB miss: S/W or H/W
- Handling page fault: restartable exception, so the faulting instruction can resume after S/W has handled the fault
- Handling protection fault: abort the process, or handle like a page fault
Software
- TLB miss raises an exception; the OS walks the page table and reloads the TLB
- Privileged untranslated addressing mode
Hardware
- Memory Management Unit (MMU) walks the page table and reloads the TLB
- If page not found, raise page fault exception
Page Table Walk
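A walk over a two-level hierarchical page table can be sketched as follows. The 10/10/12 bit split of a 32-bit address, the nested dicts used as tables, and the names `walk` and `PageFault` are assumptions for illustration.

```python
L1_BITS, L2_BITS, OFFSET_BITS = 10, 10, 12   # assumed 32-bit layout


class PageFault(Exception):
    """Raised when the walk finds no valid mapping."""


def walk(root, vaddr):
    """Translate vaddr using a two-level table.

    root maps a level-1 index to a second-level table (a dict),
    which maps a level-2 index to a physical page number.
    """
    l1 = (vaddr >> (L2_BITS + OFFSET_BITS)) & ((1 << L1_BITS) - 1)
    l2 = (vaddr >> OFFSET_BITS) & ((1 << L2_BITS) - 1)
    offset = vaddr & ((1 << OFFSET_BITS) - 1)

    second_level = root.get(l1)        # first memory access
    if second_level is None:
        raise PageFault(hex(vaddr))
    ppn = second_level.get(l2)         # second memory access
    if ppn is None:
        raise PageFault(hex(vaddr))
    return (ppn << OFFSET_BITS) | offset
```

With `root = {1: {2: 0x55}}`, `walk(root, 0x402ABC)` returns `0x55ABC`. Second-level tables exist only for regions that are in use, which is where the space saving over a linear table comes from; the cost is one extra memory access per level on a TLB miss.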
Performance
- TLB lookup adds latency to every cache access; options:
- Slow down clock
- Pipeline TLB and cache
- Virtual addressed cache
- Parallel TLB/cache access
Physical Addressed Cache vs. Virtual Addressed Cache
- Physical addressed cache
- Virtual addressed cache
- One-step process
- Need to flush on context switch or use address space identifiers (ASID) in tags
- Aliasing problem
- 2 virtual addresses mapping to the same physical data give 2 copies in the cache; a write to one copy is not visible through the other
- Prevent aliases coexisting in cache OR shared pages must agree in index bits
- Cache coherence problem
Virtual-Index Physical-Tag Caches
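The cache is indexed with virtual address bits while the TLB translates; tags are compared against the physical address afterwards. Allows concurrent access to TLB and cache.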
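The parallel lookup is alias-free when the index and block-offset bits all come from the page offset, which is identical in the virtual and physical address. That condition reduces to a size check, sketched here with a hypothetical helper name.

```python
def vipt_index_within_page_offset(cache_bytes, ways, page_bytes):
    """True when index + block-offset bits fit inside the page offset.

    Bytes per way = sets * block size = 2^(index bits + block-offset bits),
    so the condition is cache_bytes / ways <= page_bytes.
    """
    return cache_bytes // ways <= page_bytes


KIB = 1024

# 32 KiB, 8-way, 4 KiB pages: 4 KiB per way, index fits in the page offset.
ok = vipt_index_within_page_offset(32 * KIB, 8, 4 * KIB)

# 64 KiB, 4-way, 4 KiB pages: 16 KiB per way, index needs translated bits.
too_big = vipt_index_within_page_offset(64 * KIB, 4, 4 * KIB)
```

When the check fails, the cache can still be virtually indexed, but the aliasing problem from the previous section returns and must be handled separately. Raising associativity is the usual way to grow such a cache without growing the index.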
Historical Use
- Bare machine with physical addresses only
- One program
- Batch-style multiprogramming
- Multiple programs share the CPU: one runs while others wait for I/O
- Base & bound: translation & protection between programs
- External fragmentation
- Time sharing
- More interactive programs
- Motivated the move to fixed-size page translation & protection -> no external fragmentation, but some internal fragmentation
- Motivated adoption of virtual memory
- Virtual machine monitor
- Multiple OS sharing 1 machine
- Guest OS virtual -> Guest OS physical -> Host machine physical
- Full demand-paged virtual memory
- Portability
- Protection
- Share small physical memory among active tasks
- Simplified implementation of some OS features
- Vector supercomputers
- Translation & protection
- Rarely complete demand-paging
- Prevent thrashing
- Difficult to implement restartable vector instructions
- Embedded systems & DSPs: physical address only
- Cannot afford area/speed/power of VM support
- No secondary storage to swap to
- Customised for specific memory configuration
- Difficult to implement restartable instructions for exposed architectures