Content

Overview
Memory Hierarchy
Reliability & Efficiency
1. Disk Failure
2. Mirroring
3. Data Striping
4. Parity Check
5. RAID
File Organization
1. Records
  1. Fixed-Length Records
  2. Variable-Length Records
2. Organizing Records in Files
  1. Heap File
  2. Sequential File
  3. Hashing
  4. Multitable Clustering
Buffer Manager

Overview

Cache
- Query processing data structure/algorithm design
- KB ~ MB
Memory
- Too small for the entire enterprise database
- 2 GB ~ 8 GB
- The buffer for disk so that DBMS can operate on
Magnetic disk
- 2 surfaces per platter; track 1 on surface 1, track 2 on surface 2, etc.
- 50000 ~ 100000 track on a platter
- 500 ~ 1000 sectors per inner track
- 1000 ~ 2000 sectors per outer track
- Sector size typically 512 B
- Steps
  1. Seek track
  2. Rotate disk to place data under head
  3. Transfer data
- Access time: request ~ start of data transfer
  - Seek + rotational delay
- Data transfer rate: rate retrieving/storing data from/to disks
  - Disk controller may be shared across multiple disks; its processing speed is crucial
- Data Block: data transfer unit between memory & disk
  - 4 KB ~ 16 KB (multiple sectors)
  - I/O operation: reading/writing of data blocks
    - Seek time + rotational delay + transfer time
Magnetic tape
- Backup storage
- Only sequential accesses (no seeks)
- ~5 TB can be achieved
Optical storage
- Slower than disk
- CD-ROM (640 MB), DVD (4.7~17 GB)
- Usually write-once read-many (WORM)
Flash memory
- Reads roughly as fast as memory; writes slow
- Each location write-once, or erase the entire bank
- Limit of number of writes at each location (100000~1000000)

Memory Hierarchy

Data put on disk
Data manipulated on memory
Data backup on tertiary storage

Reliability & Efficiency

Disk Failure

Mean time to failure (MTTF): average time the disk is expected to run continuously without failure
Mean time to data loss: depends on MTTF & disk organization
- Shortened if multiple disks are present with the same MTTF; MTTF/number_of_disks

Mirroring

Storing redundant copies in other disks.

Reliability
- High yet expensive
Efficiency
- Read request handling rate doubled
- Speed of processing each read request remains the same

Data Striping

Partition data into blocks and store in different disks.

Reliability
- No improvement
Efficiency
- High data transfer rate by parallel read

Parity Check

Extra parity disk storing the XOR results of all disks.

Reliability
- Tolerates failure of 1 disk
Efficiency
- No improvement

RAID

Redundant Arrays of Independent Disks, different combinations of mirroring, striping, and error-correction strategies.

RAID 0: striping only
RAID 1/ RAID 10/ RAID 1+0: striping + mirroring
Others: RAID 2, 3, 4, 5, 6

RAID 5

All disks share the same workload of read requests.

File Organization

Large-scale DB systems don't usually rely on the underlying OS for file management; one large OS file is allocated to the DB system instead, and all relations are stored in this file and are managed by the DB system.

Blocks: storage/data transfer unit
Records: several records in a block

Records

Fixed-Length Records

Disallow records to cross block borders to avoid doubling the I/O requests

Variable-Length Records

Different types of record may appear in the same block
Variable lengths are allowed for the same record types
Slotted-page structure

Organizing Records in Files

Heap File

Stores records in any block with empty space.

Adv.
- Simplicity
Disadv.
- Blocks in a file scattered over the disk
- Need to scan all data to locate a record

Sequential File

Records in sequential order of the search key.

Adv.
- Faster processing for search keys in sorted order
- e.g. SELECT * FROM T WHERE ID < xxx;
Disadv.
- Hard to maintain
Insertion
1. Locate the record just before
2. If free slot in the same block, insert there
3. Otherwise, insert into overflow block

Hashing

Hash values computed based on some attributes of each record.

Multitable Clustering

Put >= 2 related relations in the same file to achieve faster joins.

Adv.
- Faster processing for joins and finding related records in relations
Disadv.
- Suffers data on only 1 relation
- Hard to deal with variable-length records

Buffer Manager

Buffer: portion of memory storing copies of disk blocks
- Buffer miss
- Buffer hit
- Buffer write
  - Data written to buffer will be updated to disk when buffer is flushed or upon other reliability requirements
- Buffer eviction
Buffer replacement policy
- LRU, LFU, ...
Buffer manager: program responsible for buffer allocation in memory & moving blocks between memory and disk to minimize I/O
Data dictionary/system catalog: stores metadata; frequently accessed by buffer manager & query optimizer, hence stay in memory
- Info about relations e.g. names/types of relations/attributes, integrity constraints, views
- Statistical data e.g. number of tuples
- Physical file organization

File & Storage