Content
- Overview
- Memory Hierarchy
- Reliability & Efficiency- Disk Failure
- Mirroring
- Data Striping
- Parity Check
- RAID
 
- File Organization- Records- Fixed-Length Records
- Variable-Length Records
 
- Organizing Records in Files- Heap File
- Sequential File
- Hashing
- Multitable Clustering
 
 
- Records
- Buffer Manager
Overview
- Cache- Query processing data structure/algorithm design
- KB ~ MB
 
- Memory- Too small for the entire enterprise database
- 2 GB ~ 8 GB
- The buffer for disk so that DBMS can operate on
 
- Magnetic disk
   - 2 surfaces per platter; track 1 on surface 1, track 2 on surface 2, etc.
- 50000 ~ 100000 track on a platter
- 500 ~ 1000 sectors per inner track
- 1000 ~ 2000 sectors per outer track
- Sector size typically 512 B
- Steps- Seek track
- Rotate disk to place data under head
- Transfer data
 
- Access time: request ~ start of data transfer - Seek + rotational delay
 
- Data transfer rate: rate retrieving/storing data from/to disks- Disk controller may be shared across multiple disks; its processing speed is crucial
 
- Data Block: data transfer unit between memory & disk- 4 KB ~ 16 KB (multiple sectors)
- I/O operation: reading/writing of data blocks- Seek time + rotational delay + transfer time
 
 
 
- Magnetic tape- Backup storage
- Only sequential accesses (no seeks)
- ~5 TB can be achieved
 
- Optical storage- Slower than disk
- CD-ROM (640 MB), DVD (4.7~17 GB)
- Usually write-once read-many (WORM)
 
- Flash memory- Reads roughly as fast as memory; writes slow
- Each location write-once, or erase the entire bank
- Limit of number of writes at each location (100000~1000000)
 
Memory Hierarchy
 
 
- Data put on disk
- Data manipulated on memory
- Data backup on tertiary storage
Reliability & Efficiency
Disk Failure
- Mean time to failure (MTTF): average time the disk is expected to run continuously without failure
- Mean time to data loss: depends on MTTF & disk organization- Shortened if multiple disks are present with the same MTTF; MTTF/number_of_disks
 
- Shortened if multiple disks are present with the same MTTF; 
Mirroring

Storing redundant copies in other disks.
- Reliability- High yet expensive
 
- Efficiency- Read request handling rate doubled
- Speed of processing each read request remains the same
 
Data Striping

Partition data into blocks and store in different disks.
- Reliability- No improvement
 
- Efficiency- High data transfer rate by parallel read
 
Parity Check

Extra parity disk storing the XOR results of all disks.
- Reliability- Tolerates failure of 1 disk
 
- Efficiency- No improvement
 
RAID
Redundant Arrays of Independent Disks, different combinations of mirroring, striping, and error-correction strategies.
- RAID 0: striping only
- RAID 1/ RAID 10/ RAID 1+0: striping + mirroring
- Others: RAID 2, 3, 4, 5, 6
RAID 5
All disks share the same workload of read requests.
 
    
File Organization
Large-scale DB systems don't usually rely on the underlying OS for file management; one large OS file is allocated to the DB system instead, and all relations are stored in this file and are managed by the DB system.
- Blocks: storage/data transfer unit
- Records: several records in a block
Records
Fixed-Length Records
 

- Disallow records to cross block borders to avoid doubling the I/O requests

Variable-Length Records
- Different types of record may appear in the same block
- Variable lengths are allowed for the same record types
- Slotted-page structure
   
Organizing Records in Files
Heap File

Stores records in any block with empty space.
- Adv.- Simplicity
 
- Disadv.- Blocks in a file scattered over the disk
- Need to scan all data to locate a record
 
Sequential File

Records in sequential order of the search key.
- Adv.- Faster processing for search keys in sorted order
- e.g. SELECT * FROM T WHERE ID < xxx;
 
- Disadv.- Hard to maintain
 
- Insertion- Locate the record just before
- If free slot in the same block, insert there
- Otherwise, insert into overflow block
 
Hashing

Hash values computed based on some attributes of each record.
Multitable Clustering

Put >= 2 related relations in the same file to achieve faster joins.
- Adv.- Faster processing for joins and finding related records in relations
 
- Disadv.- Suffers data on only 1 relation
- Hard to deal with variable-length records
 
Buffer Manager
- Buffer: portion of memory storing copies of disk blocks- Buffer miss
- Buffer hit
- Buffer write- Data written to buffer will be updated to disk when buffer is flushed or upon other reliability requirements
 
- Buffer eviction
 
- Buffer replacement policy- LRU, LFU, ...
 
- Buffer manager: program responsible for buffer allocation in memory & moving blocks between memory and disk to minimize I/O
- Data dictionary/system catalog: stores metadata; frequently accessed by buffer manager & query optimizer, hence stay in memory- Info about relations e.g. names/types of relations/attributes, integrity constraints, views
- Statistical data e.g. number of tuples
- Physical file organization