Content
- Overview
- B+ Tree
- Searching
- Insertion
- Deletion
- Hashing
- Static Hashing
- Dynamic/Extendable Hashing
- Indexing & SQL
Overview
- Search key: an attribute or a set of attributes for table lookup
- Primary: defines the sequential order of the file
- Secondary: allows to access data with different search key
- Indices
- Ordered
- Indexed-sequential file, B+ tree, ...
- Hashed
- Extendable hash-index
- Ordered
- Index evaluation
- Access types
- Equality v.s. range search
- Single attr v.s. multi attr search
- Access time
- Insertion/deletion time
- Space overhead
- Access types
B+ Tree
- All paths from root to tree are of the same length (balanced)
- Node size = a block
- A node contains >= 1 keys
- Minimize block retrieval instead of key comparison as in balanced binary search tree
- Low in height
- Range search supported
- Node
n
pointersn-1
search keys- Search keys within a node are sorted
- Leaf node
- At most
n-1
values - At least
ceil((n-1)/2)
values - Last pointer: chain leaf nodes
- Pointer before a search key: points to record
- At most
- Non-leaf node
- At most
n
pointers - At least
ceil(n/2)
pointers - Pointer left to key
K
: points to subtree containing values <K
- Pointer right to key
K
: points to subtree containing values >=K
- At most
Searching
- Traverse from root to leaf
- Search in leaf nodes
- Find pointer, retrieve record
Insertion
- Search
- Insert if free space on node
- Split node if node full
- Leaf node:
n
records- Create a new node
- Distribute
ceil(n/2)
records to one node, remaining to the other - Update parent
- Non-leaf node:
n+1
pointers- Create a new node
- Distribute the pointers
- Consider the keys required in each slot among the 2 nodes; move key to parent node to separate the search keys
- Leaf node:
Deletion
- Search
- Remove record from file
- Remove key from leaf node
- If leaf node underfull, merge with sibling nodes
- Merge
- Update parent
- If merging fails, redistribute the entries
- Redistribute
- Update keys
Hashing
- No order
- Entries divided into buckets (typically a disk block)
- Records with different search keys may map to the same bucket -> collision
- Sequential search within the bucket
- Best for equality comparisons
- Not for range search or ORDER BY operations
Static Hashing
- If number of blocks grow to an extent, performance degrades due to excessive overflow buckets
- If DB size initially set large, waste space at first
Dynamic/Extendable Hashing
Bucket address table
- Hash prefix
n
n
pointers pointing to buckets containing that hash value
- Hash prefix
Bucket
- Integer
i_j
- Records sharing the same first
i_j
hash values
- Integer
Bucket splitting
- If
n == i_j
- Increment
n
- Double the bucket address table
- Reinsert entries in that bucket
- Increment
- If
n > i_j
- Create new bucket
z
, seti_j = i_z = i_j + 1
- Change lower-half of entries pointing to bucket
j
to point to bucketz
- Remove & reinsert entries in the buckets
- Create new bucket
- If
Indexing & SQL
CREATE INDEX <index-name> // CREATE UNIQUE INDEX
ON <relation-name> ( <attribute-list> )
USING {BTREE | HASH} // optional index_type
DROP INDEX <index-name>