Content

  1. Overview
  2. B+ Tree
    1. Searching
    2. Insertion
    3. Deletion
  3. Hashing
    1. Static Hashing
    2. Dynamic/Extendable Hashing
  4. Indexing & SQL

Overview

  • Search key: an attribute or a set of attributes for table lookup
    • Primary: defines the sequential order of the file
    • Secondary: allows to access data with different search key
  • Indices
    • Ordered
      • Indexed-sequential file, B+ tree, ...
    • Hashed
      • Extendable hash-index
  • Index evaluation
    • Access types
      • Equality v.s. range search
      • Single attr v.s. multi attr search
    • Access time
    • Insertion/deletion time
    • Space overhead

B+ Tree

  • All paths from root to tree are of the same length (balanced)
  • Node size = a block
    • A node contains >= 1 keys
    • Minimize block retrieval instead of key comparison as in balanced binary search tree
  • Low in height
  • Range search supported

  • Node
    • n pointers
    • n-1 search keys
    • Search keys within a node are sorted
  • Leaf node
    • At most n-1 values
    • At least ceil((n-1)/2) values
    • Last pointer: chain leaf nodes
    • Pointer before a search key: points to record
  • Non-leaf node
    • At most n pointers
    • At least ceil(n/2) pointers
    • Pointer left to key K: points to subtree containing values < K
    • Pointer right to key K: points to subtree containing values >= K

Searching

  1. Traverse from root to leaf
  2. Search in leaf nodes
  3. Find pointer, retrieve record

Insertion

  1. Search
  2. Insert if free space on node
  3. Split node if node full
    • Leaf node: n records
      1. Create a new node
      2. Distribute ceil(n/2) records to one node, remaining to the other
      3. Update parent
    • Non-leaf node: n+1 pointers
      1. Create a new node
      2. Distribute the pointers
      3. Consider the keys required in each slot among the 2 nodes; move key to parent node to separate the search keys

Deletion

  1. Search
  2. Remove record from file
  3. Remove key from leaf node
  4. If leaf node underfull, merge with sibling nodes
    1. Merge
    2. Update parent
    3. If merging fails, redistribute the entries
      1. Redistribute
      2. Update keys

Hashing

  • No order
  • Entries divided into buckets (typically a disk block)
  • Records with different search keys may map to the same bucket -> collision
    • Sequential search within the bucket
  • Best for equality comparisons
  • Not for range search or ORDER BY operations

Static Hashing

  • If number of blocks grow to an extent, performance degrades due to excessive overflow buckets
  • If DB size initially set large, waste space at first

Dynamic/Extendable Hashing

  • Bucket address table

    • Hash prefix n
    • n pointers pointing to buckets containing that hash value
  • Bucket

    • Integer i_j
    • Records sharing the same first i_j hash values
  • Bucket splitting

    • If n == i_j
      • Increment n
      • Double the bucket address table
      • Reinsert entries in that bucket
    • If n > i_j
      • Create new bucket z, set i_j = i_z = i_j + 1
      • Change lower-half of entries pointing to bucket j to point to bucket z
      • Remove & reinsert entries in the buckets

Indexing & SQL

CREATE INDEX <index-name> // CREATE UNIQUE INDEX
ON <relation-name> ( <attribute-list> ) 
USING {BTREE | HASH} // optional index_type
DROP INDEX <index-name>

results matching ""

    No results matching ""