Content
- Overview
- Memory Hierarchy
- Reliability & Efficiency
- Disk Failure
- Mirroring
- Data Striping
- Parity Check
- RAID
- File Organization
- Records
- Fixed-Length Records
- Variable-Length Records
- Organizing Records in Files
- Heap File
- Sequential File
- Hashing
- Multitable Clustering
- Records
- Buffer Manager
Overview
- Cache
- Query processing data structure/algorithm design
- KB ~ MB
- Memory
- Too small for the entire enterprise database
- 2 GB ~ 8 GB
- The buffer for disk so that DBMS can operate on
- Magnetic disk
- 2 surfaces per platter; track 1 on surface 1, track 2 on surface 2, etc.
- 50000 ~ 100000 track on a platter
- 500 ~ 1000 sectors per inner track
- 1000 ~ 2000 sectors per outer track
- Sector size typically 512 B
- Steps
- Seek track
- Rotate disk to place data under head
- Transfer data
- Access time: request ~ start of data transfer
- Seek + rotational delay
- Data transfer rate: rate retrieving/storing data from/to disks
- Disk controller may be shared across multiple disks; its processing speed is crucial
- Data Block: data transfer unit between memory & disk
- 4 KB ~ 16 KB (multiple sectors)
- I/O operation: reading/writing of data blocks
- Seek time + rotational delay + transfer time
- Magnetic tape
- Backup storage
- Only sequential accesses (no seeks)
- ~5 TB can be achieved
- Optical storage
- Slower than disk
- CD-ROM (640 MB), DVD (4.7~17 GB)
- Usually write-once read-many (WORM)
- Flash memory
- Reads roughly as fast as memory; writes slow
- Each location write-once, or erase the entire bank
- Limit of number of writes at each location (100000~1000000)
Memory Hierarchy
- Data put on disk
- Data manipulated on memory
- Data backup on tertiary storage
Reliability & Efficiency
Disk Failure
- Mean time to failure (MTTF): average time the disk is expected to run continuously without failure
- Mean time to data loss: depends on MTTF & disk organization
- Shortened if multiple disks are present with the same MTTF;
MTTF/number_of_disks
- Shortened if multiple disks are present with the same MTTF;
Mirroring
Storing redundant copies in other disks.
- Reliability
- High yet expensive
- Efficiency
- Read request handling rate doubled
- Speed of processing each read request remains the same
Data Striping
Partition data into blocks and store in different disks.
- Reliability
- No improvement
- Efficiency
- High data transfer rate by parallel read
Parity Check
Extra parity disk storing the XOR results of all disks.
- Reliability
- Tolerates failure of 1 disk
- Efficiency
- No improvement
RAID
Redundant Arrays of Independent Disks, different combinations of mirroring, striping, and error-correction strategies.
- RAID 0: striping only
- RAID 1/ RAID 10/ RAID 1+0: striping + mirroring
- Others: RAID 2, 3, 4, 5, 6
RAID 5
All disks share the same workload of read requests.
File Organization
Large-scale DB systems don't usually rely on the underlying OS for file management; one large OS file is allocated to the DB system instead, and all relations are stored in this file and are managed by the DB system.
- Blocks: storage/data transfer unit
- Records: several records in a block
Records
Fixed-Length Records
- Disallow records to cross block borders to avoid doubling the I/O requests
Variable-Length Records
- Different types of record may appear in the same block
- Variable lengths are allowed for the same record types
- Slotted-page structure
Organizing Records in Files
Heap File
Stores records in any block with empty space.
- Adv.
- Simplicity
- Disadv.
- Blocks in a file scattered over the disk
- Need to scan all data to locate a record
Sequential File
Records in sequential order of the search key.
- Adv.
- Faster processing for search keys in sorted order
- e.g. SELECT * FROM T WHERE ID < xxx;
- Disadv.
- Hard to maintain
- Insertion
- Locate the record just before
- If free slot in the same block, insert there
- Otherwise, insert into overflow block
Hashing
Hash values computed based on some attributes of each record.
Multitable Clustering
Put >= 2 related relations in the same file to achieve faster joins.
- Adv.
- Faster processing for joins and finding related records in relations
- Disadv.
- Suffers data on only 1 relation
- Hard to deal with variable-length records
Buffer Manager
- Buffer: portion of memory storing copies of disk blocks
- Buffer miss
- Buffer hit
- Buffer write
- Data written to buffer will be updated to disk when buffer is flushed or upon other reliability requirements
- Buffer eviction
- Buffer replacement policy
- LRU, LFU, ...
- Buffer manager: program responsible for buffer allocation in memory & moving blocks between memory and disk to minimize I/O
- Data dictionary/system catalog: stores metadata; frequently accessed by buffer manager & query optimizer, hence stay in memory
- Info about relations e.g. names/types of relations/attributes, integrity constraints, views
- Statistical data e.g. number of tuples
- Physical file organization