Content
- Paging Overview
- Introduction to Virtual Memory
- Paging MMU
- Page Tables
- Translation Lookaside Buffer (TLB)
- TLB Basics
- Memory Protection
- Demand Paging
- Demand Paging Overview
- Virtual Memory Hierarchy
- Managing Virtual Address Space
- Managing Physical Memory
- Managing Swap Area
- Processes & Virtual Memory
- Interaction of Processes with VM System
- Page Sharing & Memory-Mapped Files
- Kernel Address Space
- Page Replacement Algorithms
- Page Replacement Algorithms Overview
- Paging Issues
Paging Overview
Introduction to Virtual Memory
Problems with Contiguous Memory
- Growing program requires copying entire program
- Wasted memory due to internal & external fragmentation
- Running program requires loading entire program
- Max program size limited by memory size
Paging MMU
Pages & Frames
- Page: contiguous, fixed-size chunk of virtual address space
- Frame: chunk of physical memory that a page maps to
Virtual & Physical Addresses
# 32-bit machine, virtual address space 2^32 B
# page size 2^12, # of pages 2^20
|...20 bits...|.12 bits.|
page num offset
# e.g. physical address space 2^30 B
# frame size 2^12, # of frames 2^18
|...18 bits...|.12 bits.|
frame num offset
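The bit splits above can be checked with a small sketch (constants taken from the 32-bit example; the function name is illustrative):

```python
PAGE_SIZE_BITS = 12              # 2^12 B pages, per the example above
PAGE_SIZE = 1 << PAGE_SIZE_BITS

def split_va(va):
    """Split a 32-bit virtual address into (page number, offset)."""
    return va >> PAGE_SIZE_BITS, va & (PAGE_SIZE - 1)

page, offset = split_va(0x00403ABC)   # -> page 0x403, offset 0xABC
```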
             VA           PA
Processor -----> MMU -----> Bus
    <------------------------
              data
Benefits of Paging
- Growing program requires allocating a page at a time
- No external fragmentation (page granularity); internal fragmentation 1/2 page per region
- Pages can be loaded in memory as program runs
- Max program size limited by disk size
Page Tables
Mapping information used by the MMU to translate virtual addresses (maintained by the OS).
Page Table Entries (PTE)
|...18 bits...|.7 bits.|C|D|R|W|V|
frame num unused
Storing Page Table
- Stored in memory
- Each memory access requires 2 memory accesses: PTE & PA
- Solution: Cache PTE in TLB
- Page table register (PTR)
- For MMU to store the location of start of page table
- OS associates a separate page table with each process
- Address space switch switches PTR
Page Table Size
#pages * PTE size (typically word size)
= (va_size / page_size) * PTE size
- Smaller page size: lower internal fragmentation
- Larger page size: fewer PTE & memory overhead
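As a quick check of the formula, a minimal sketch (function name illustrative):

```python
def page_table_size(va_bits, page_bits, pte_bytes=4):
    """#PTEs = 2^(va_bits - page_bits); total = #PTEs * PTE size."""
    return (1 << (va_bits - page_bits)) * pte_bytes

# 32-bit VA with 4 KiB pages and 4-byte PTEs: 2^20 entries * 4 B = 4 MiB
size = page_table_size(32, 12)
```

Doubling the page size halves the number of PTEs, which is the trade-off noted above.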
Multi-Level Page Table
|.10 bits.|.10 bits.|.12 bits.|
PT1 PT2 offset
- Slower than a single-level page table (requires more memory accesses per translation)
- Saves space
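The 10/10/12 split above can be sketched as (names illustrative):

```python
def split_two_level(va):
    """Split a 32-bit VA into (PT1 index, PT2 index, offset) per the 10/10/12 layout."""
    pt1 = (va >> 22) & 0x3FF     # top 10 bits index the first-level table
    pt2 = (va >> 12) & 0x3FF     # next 10 bits index the second-level table
    offset = va & 0xFFF          # low 12 bits are the page offset
    return pt1, pt2, offset
```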
Inverted Page Table
Maps frame num -> (thread id, page num)
|....|...52 bits...|.12 bits.|
 TID    page num     offset
  |        |
  ----------
      |
  hash func ----> index into page table
- Exhaustive search is slow
- Use hash table indexed by page number
- Hashing function needs to be good
- Poor cache locality
- Adjacent pages hashed to scattered locations
- Sharing memory is complicated
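A minimal sketch of the hash-indexed lookup, using a Python dict as a stand-in for the hash function and its collision chains (all names illustrative):

```python
class InvertedPageTable:
    def __init__(self, n_frames):
        self.table = [None] * n_frames   # frame num -> (tid, page num)
        self.index = {}                  # (tid, page num) -> frame num

    def map(self, frame, tid, page):
        self.table[frame] = (tid, page)
        self.index[(tid, page)] = frame

    def lookup(self, tid, page):
        # Hash lookup avoids an exhaustive search over all frames.
        return self.index.get((tid, page))   # None -> page fault
```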
Translation Lookaside Buffer (TLB)
TLB Basics
- H/W cache of PTE
- Small # of entries
- Exploits locality (program uses small # of pages at a time)
| key: virtual page number (VPN) | data: PTE |
PTE: |...18 bits...|.7 bits.|C|D|R|W|V|
       frame num     unused
V: valid, W: writable, R: referenced (H/W), D: dirty (H/W), C: cacheable
V: is the entry valid or not?
W: is the page writable?
C: is the page cacheable? when not cacheable, processor should bypass the cache when accessing the page, e.g., to access memory-mapped device registers
D: has a writable page been written to (is it dirty with respect to the data on disk)?
R: has a page been referenced?
unused: unused by hardware, can be used by OS
H/W may or may not provide an execute bit; some hardware instead treats readable pages as executable
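A sketch of packing and testing these bits, assuming the layout above (V in bit 0, W bit 1, R bit 2, D bit 3, C bit 4, frame number in bits 12-29):

```python
V, W, R, D, C = (1 << i for i in range(5))   # flag bit masks

def make_pte(frame_num, flags):
    """Pack an 18-bit frame number above 7 unused bits and 5 flag bits."""
    return (frame_num << 12) | flags

def frame_of(pte):
    return pte >> 12

pte = make_pte(0x2ABCD, V | W)   # valid, writable, not yet dirty
```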
TLB Operations
TLB Lookup
Fully associative TLB example
TLB cache miss handling
- TLB lookup fails -> page table lookup -> TLB entry replaced (per TLB replacement policy)
- Handled by H/W or OS
- H/W managed TLB
- H/W defines page table format & replacement policy
- PTR to locate page table in physical memory
- Fast miss handling
- S/W managed TLB
- H/W generates trap called TLB miss fault
- Read fault
- Write fault
- OS handles TLB miss similar to exception handling
- OS figures out the correct PTE and adds it to the TLB (CPU has instructions to modify the TLB)
- Page tables become entirely an OS data structure (no PTR in H/W; H/W doesn't know about the page table)
- TLB replacement policy managed in S/W
- Slower but more flexible miss handling
TLB cache invalidate
- Adds cost to context switching
- Changing PTR + invalidating TLB + TLB misses afterwards
- Invalidate options
- Clear TLB: clearing valid bit of all entries
- Tagged TLB: H/W maintains id tag on each entry
- Compare current thread id (stored in register) to the tag
- No invalidation; enables space multiplexing of entries
Memory Protection
- Generate protection fault if memory access inconsistent with protection bits
- Read-only fault
- No-execute fault
Demand Paging
Demand Paging Overview
- Motivation: program size/# of running programs limited by physical memory
- Make memory a cache for data on disk
- Basic Mechanism
- PTE unset for invalid page
- Cache miss when program accesses invalid page
- Generates page fault
- OS page fault handler performs miss handling: allocate frame, update PTE, set valid bit, restart instruction
- Benefits
- Allows running programs to exceed available memory (limited by VA space (processor architecture) or disk)
- Faster program startup (no need to load entire program)
Virtual Memory Hierarchy
Swap Handler
- Chooses a page to evict using page replacement algorithm
- If page modified, write to a free location on swap
- Find pages that map to the evicted frame
- Need frame-to-page mapping (coremap)
- Change all PTEs to invalid
- Keep track of the evicted frame location in swap
- Return newly freed frame to page fault handler
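The steps above can be sketched as a toy model (the Swap class, coremap layout, and PTE encoding are all illustrative, not a real kernel API):

```python
class Swap:
    """Toy swap area: allocate slots and write evicted page contents."""
    def __init__(self):
        self.slots, self.next_slot = {}, 0
    def allocate(self):
        s = self.next_slot
        self.next_slot += 1
        return s
    def write(self, slot, data):
        self.slots[slot] = data

def evict_frame(frame, coremap, page_tables, swap):
    entry = coremap[frame]                   # coremap: frame -> mapping info
    slot = None
    if entry["dirty"]:                       # modified page: write to a free swap slot
        slot = swap.allocate()
        swap.write(slot, entry["data"])
    for pid, page in entry["mappings"]:      # invalidate every PTE mapping this frame
        page_tables[pid][page] = ("swap", slot)   # remember where the page went
    entry["mappings"] = []
    return frame                             # newly freed frame for the fault handler
```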
Other Notes
- If TLB miss handling is performed by H/W, no TLB miss fault generated; if by S/W, H/W doesn't access PT
- Page fault handler:
- checks segmentation fault, protection fault
- allocates frame, PTE
- loads data from disk into frame
- maps page to frame by updating PTE
- Page contents may:
  - be all zeros
  - be in swap
  - be in the executable
- OS accesses PT when:
- on TLB-miss fault (for h/w managed TLB), it reads the page table
- on page fault, it allocates frame and updates page table
- whenever address space of current process is modified, (frame is allocated/deallocated, context switch), it updates page table
Managing Virtual Address Space
Address Space
- Broken into regions or segments
- OS must track the regions within an address space & PT holding translations
- API
  - id = as_create(): create empty address space, allocate new PT with no pages mapped to frames
  - as_destroy(id): destroy as, free PT
  - as_define_region(id,...): add new region
  - as_modify_region(id,...): change size (e.g. heap, stack)
  - as_find_region(id,va): find valid region or give seg fault
  - as_load_page(id,...): load page from file into memory
  - new_id = as_copy(id): create copy of as & PT (actually copy data to other frames)
  - as_switch(id): mappings in TLB must be removed
Managing Physical Memory
Coremap
- Array of structures, one per frame
- Tracks:
- whether frame allocated
- address spaces & pages map to this frame (list of (pid, page_id))
- kernel or user
- API
  - frame = allocate_frame(n): alloc n free contiguous frames (user needs 1 frame; kernel may need more)
  - free_frame(frame)
  - map(id, page_nr, frame)
  - unmap(id, page_nr)
  - evict(frame): for swap handler to unmap pages associated with frame
Managing Swap Area
- Dirty pages need to be placed in swap
- Bitmap or linked list of swap frames on disk; keep location in invalid PTE
- If a shared frame is loaded from swap and a process needs to write to it:
  - allocate a frame and keep the swap copy
  - next time, allocate another frame and free the swap copy
# PTE
|frame number...............|unused|R|D|C|W|1|  valid: virtual page in memory
|000......................................|0|  invalid PTE
|index in swap area.......................|0|  virtual page in swap
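A minimal sketch of the bitmap option (slot count and names illustrative):

```python
class SwapBitmap:
    """Tracks which swap slots on disk are in use."""
    def __init__(self, nslots):
        self.bits = [False] * nslots
    def alloc(self):
        for i, used in enumerate(self.bits):
            if not used:
                self.bits[i] = True
                return i                 # slot index is stored in the invalid PTE
        raise MemoryError("swap full")
    def free(self, i):
        self.bits[i] = False
```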
Processes & Virtual Memory
Interaction of Processes with VM System
- OS implements VM system using 3 abstractions: address space, physical memory, & swap management
- Rest of OS invokes the VM system when the address space of the current process changes
- Process creation (fork)
  - Create new PT & copy content from parent's regions
- Process execution (execv)
  - Destroy old address space structure & create new one
- Process termination (exit)
- Context switch
  - H/W managed TLB: change PTR, flush TLB
  - S/W managed TLB: flush TLB
- Memory allocation/deallocation (sbrk, stack growth)
  - Heap: system call (sbrk) to grow; allows OS to detect errors
  - Stack: when faulting address is close to stack, extend stack region using the standard page fault handler
Page Sharing & Memory-Mapped Files
- Benefits
- Fast communication between processes
- Good isolation (only share when needed)
- Applications
- Sharing text regions
- e.g. dynamically loaded libraries
- Must update all PTEs when frame evicted
- Copy-on-write pages
- Child shares pages with parent until pages modified
- PTE marks COW & read-only
- On protection fault: allocate new frame, copy, remap, make writable, update TLB, resume
- Memory-mapped files
- Threads r/w files using load/store instead of system calls
  - mmap(addr,length,prot_flags,fd,offset): maps a file at given offset contiguously within address space
  - Storing offset provides flexibility, since different processes may map shared memory at different addresses
- File data loaded on page fault
- Instead of swap area, use file itself as backing store
- More efficient than r/w system calls, which must copy data through a kernel buffer
- Shared memory
- Threads r/w to mapped memory region
- PTE can have different protections for different threads
- Memory can be mapped at same or different VA in each process
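Python's standard mmap module can demonstrate the idea: the file is mapped into the address space and then accessed with plain loads and stores rather than read()/write() system calls.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")

with mmap.mmap(fd, 0) as m:       # map the whole file into the address space
    first = m[0:5]                # a plain memory load, not a read() system call
    m[0:5] = b"HELLO"             # a store to the file-backed page

os.close(fd)
with open(path, "rb") as f:       # the store reached the file itself
    data = f.read()
os.remove(path)
```

The file acts as the backing store: the modified page is written back to the file, not to swap.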
Kernel Address Space
- OS typically lies in low physical memory
- Options
- Paging turned off in kernel mode; OS accesses physical memory directly
- OS needs to simulate paging, i.e. do address translation in software, to copy in or out user parameters on a system call, deal with dirty bit, etc.
- OS uses separate address space
- Pros
- Clean isolation
- Cons
- System call requires switching MMU -> TLB flush
- Copy in/out of system call parameters requires traversing PT in S/W
- OS maps to address space of each thread
- Typically mapped to high addresses
- Pros
- No system call
- Copy in/out of system call parameters can reuse paging H/W
- Cons
- Reduces the virtual address space available to processes
- PT needs to be setup to access OS code
- H/W may allow OS to bypass PT
Page Replacement Algorithms
Page Replacement Algorithms Overview
- When will paging work?
- If page faults occur rarely
- Spatial locality & temporal locality
Algorithms
- Optimal algorithm: page that will not be needed for longest time
- Need to know the future
- Can be used to compare performance
- Generate a log of VM accesses (reference string); use log to simulate various replacement policies
- FIFO: oldest page
- Oldest page may be needed soon
- Faults may increase if given more memory (Belady's Anomaly)
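Belady's Anomaly is easy to reproduce with the classic reference string 1 2 3 4 1 2 5 1 2 3 4 5:

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames, order, faults = set(), deque(), 0
    for p in refs:
        if p not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.remove(order.popleft())   # evict the oldest page
            frames.add(p)
            order.append(p)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# 3 frames -> 9 faults, but 4 frames -> 10 faults
```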
- Clock: FIFO giving a second chance to referenced pages
- Referenced bit: set by processor when page read/written; cleared by OS/software
- Dirty bit: set by processor when page written; cleared by OS/software
- OS synchronizes R/D bits in TLB with PTE (H/W: write-through or write-back; S/W: needs to implement synchronization)
OS simulates the bits if H/W doesn't maintain them
  - On TLB read fault: set referenced bit; make page read-only
  - On TLB read-only protection fault / write fault: set referenced & dirty bits; make page writeable
  Algorithm
    Choose page starting from clock hand:
      if referenced bit set:
        unset it, go to next
      else if page dirty:
        schedule page write, go to next
      else:
        select for replacement
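A minimal sketch of the scan (referenced-bit handling only; the dirty-page write scheduling is omitted for brevity, and names are illustrative):

```python
def clock_select(pages, hand):
    """pages: list of {'ref': bool}; returns (victim index, new hand position)."""
    while True:
        if pages[hand]["ref"]:
            pages[hand]["ref"] = False          # give the page a second chance
            hand = (hand + 1) % len(pages)
        else:
            return hand, (hand + 1) % len(pages)
```

The scan always terminates: each set referenced bit is cleared at most once per sweep, so a victim is found within two passes.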
- LRU: page used least recently
- Updating LRU on each memory access is expensive
  - If MMU maintains a counter incremented at each clock cycle:
    - When page accessed: write counter value to PTE
    - On page fault: S/W scans PT, identifies entry with oldest timestamp
  - If MMU doesn't provide such a counter, OS maintains one in S/W (LRU approximation):
    - Periodically (on timer interrupt) increment counter; granularity depends on timer interrupt
    - If referenced bit set: write counter value, clear referenced bit
    - On page fault: S/W scans PT & identifies entry with the oldest timestamp
- Working set clock (WSClock): keep working set in memory
  - Working set: set of pages a program currently needs
  - Working set interval T: WS(T) = {pages accessed in interval (now - T, now)}
  - Max WS: how much memory a program needs
  Algorithm
    Choose page starting from clock hand:
      if referenced bit set:
        update time-of-last-use to current virtual time, unset bit, go to next
      else if (current time - time-of-last-use) < T:
        page is in working set, go to next
      else if page dirty:
        schedule page write, go to next
      else:
        select for replacement
- Page fault frequency (PFF): estimate of working set needs of a program
  Measurements
    For each thread:
      On fault: f = f + 1
      Every second, update faults/sec (fe) via aging:
        fe = (1-a) * fe + a * f; f = 0
        # 0 < a < 1, weighting factor; a -> 1 means history is ignored
  - Allocate frames s.t. PFF is equal across programs
  - Global replacement: victim process is the one with lowest PFF
  - Local replacement: use algorithms above (clock, LRU, etc.) to evict a page within the victim process
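The aging update for the faults/sec estimate can be sketched as:

```python
def age_pff(fe, f, a=0.5):
    """Blend the fault-rate history fe with the last second's fault count f."""
    return (1 - a) * fe + a * f

fe = 0.0
for f in [10, 10, 0, 0]:   # faults observed over four consecutive seconds
    fe = age_pff(fe, f)    # estimate decays toward the recent rate
```

With a = 0.5 the estimate halves each quiet second, so a burst of faults fades from the estimate quickly.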
Comparison
|Algorithm|Comment|
|:--------|:------|
|Optimal|Not implementable; useful as benchmark|
|FIFO|Might throw out important pages|
|Clock|Realistic|
|LRU|Excellent; difficult to implement efficiently|
|WSC|Efficient working set algorithm|
|PFF|Fairness in working set allocation|
Paging Issues
Paging & I/O Interaction
- Problem: a frame waiting for I/O may be selected for eviction
- Solution: give each frame a do-not-evict flag; a flagged frame holds a pinned page; un-pin after I/O completes
Paging Performance
- Paging works best if many free frames
- If memory is full of dirty pages, 2 disk operations (swap out + swap in) needed on each page fault
- Paging daemon: swap out in advance
- OS maintains a pool of free frames using paging thread/daemon
- Daemon runs replacement algorithm periodically or when pool reaches low watermark:
- Writes out dirty pages
- Frees enough pages until pool reaches high watermark
- Frames can be rescued if page referenced before reallocation, because previous content still holds
- Prefetching: swap in in advance
- Predict future page usage at current faults
- Works well when pages read sequentially
Thrashing
- Livelock: OS spends its time paging data to/from disk; programs make no progress
- Causes
- Pages replacement algorithm not working
- Not enough memory to hold working set of all running programs
- Solutions
- Swapping: suspend some programs for a while
- Buy more memory