Query Evaluation
- Parsing & translation into a relational algebra expression
  - RA is procedural
 
- Optimization
- Evaluation

Cost of Query Evaluation
Factors Affecting the Cost of Each Operation
- Size of input
- Technique used to access the input
- Technique used to evaluate the operation
Measures of Query Cost
Total elapsed time for answering a query.
- Disk access
  - Number of seeks * avg seek cost
  - Number of block transfers * avg block transfer time
    - Number of blocks read * avg block read cost
    - Number of blocks written * avg block write cost
- Buffer size
- CPU
- Network
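To make the disk-access terms concrete, here is a minimal Python sketch that adds up the seek and transfer components. The device parameters `t_S` (avg seek time) and `t_T` (avg block transfer time) and the sample numbers are illustrative assumptions, not values from these notes.

```python
# Sketch: estimating the disk-access portion of a query's cost.
# t_S and t_T are assumed, illustrative device parameters.

def disk_access_cost(seeks, blocks_read, blocks_written,
                     t_S=0.004, t_T=0.0001):
    """Return the estimated disk time in seconds."""
    transfers = blocks_read + blocks_written
    return seeks * t_S + transfers * t_T

# Example: 100 seeks, 10,000 blocks read, 500 blocks written.
print(disk_access_cost(seeks=100, blocks_read=10_000, blocks_written=500))
```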
Evaluation of Expressions
- Evaluation of an entire expression tree
  - Materialization
    - Evaluate one operation at a time, starting from the lowest level
    - Intermediate results are materialized into temporary relations and passed on to the next-level operations
  - Pipelining
    - Evaluate several operations simultaneously, passing the results of one operation on to the next
    - May not always be possible
Pipelining vs. Blocking Operators
Operators: algorithms for evaluating a basic operation.

- Blocking operator
  - Requires the whole input to be available before it starts producing results
- Pipelining operator
  - May produce results before one or both inputs are fully seen
Interface of Pipelining Operators
- Open(): initialize state by allocating buffers for I/O
- GetNext(): get the next output tuple of the operator
- Close(): terminate the operator by closing I/O
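A minimal Python sketch of this iterator interface, assuming an in-memory list stands in for a disk-resident relation; the Scan and Select operator names and the sample data are illustrative.

```python
# Sketch of the Open()/GetNext()/Close() iterator interface for
# pipelining operators. A real engine would read blocks from disk.

class Scan:
    def __init__(self, rows):
        self.rows = rows
    def open(self):                 # Open(): initialize state / buffers
        self.pos = 0
    def get_next(self):             # GetNext(): return one tuple or None
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):                # Close(): release resources
        self.pos = None

class Select:
    """Pipelining selection: produces output before its input is fully seen."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def get_next(self):
        while (row := self.child.get_next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()

# Usage: select tuples with salary > 1000 from a small relation.
op = Select(Scan([("alice", 900), ("bob", 1500)]), lambda r: r[1] > 1000)
op.open()
while (t := op.get_next()) is not None:
    print(t)
op.close()
```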
Operations
Selection Operation
- Linear search
- Binary search
- Index-based search
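The sketch below contrasts linear search with binary search for a selection A = v; binary search assumes the file is sorted on the selection attribute, and the block layout is simulated with lists of tuples for illustration.

```python
# Sketch: selection A = v by linear search vs. binary search.
from bisect import bisect_left

def linear_search(blocks, key, value):
    """Scan every block; works for any file organization."""
    return [t for block in blocks for t in block if t[key] == value]

def binary_search(blocks, key, value):
    """File sorted on attribute `key`: binary-search the block boundaries
    for the first candidate block, then scan forward while blocks may match."""
    first_keys = [block[0][key] for block in blocks]
    i = max(bisect_left(first_keys, value) - 1, 0)   # match may start one block earlier
    out = []
    while i < len(blocks) and blocks[i][0][key] <= value:
        out.extend(t for t in blocks[i] if t[key] == value)
        i += 1
    return out

blocks = [[(1, "a"), (2, "b")], [(2, "c"), (3, "d")], [(4, "e"), (5, "f")]]
print(linear_search(blocks, 0, 2))   # [(2, 'b'), (2, 'c')]
print(binary_search(blocks, 0, 2))   # [(2, 'b'), (2, 'c')]
```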
B+-Tree
Primary Index
Cost = height of tree + number of blocks containing matching records
Secondary Index
- Cost = height of tree + 1, if the search key is a candidate key
- Cost = height of tree + number of matching records, otherwise
Sorting
- Logical sorting: sort by secondary index
- Physical sorting: sort the relation itself
  - Internal sorting: if the relation fits in memory
  - External sorting: sort-merge algorithm
External Sort-Merge Algorithm
Let M be the number of blocks in memory and N be the number of runs created.
- Create sorted runs
  - Read M blocks of the relation into memory
  - Sort them
  - Write them to run Ri
- If N < M: merge the runs (N-way merge)
  - Use N blocks of memory to buffer input runs, 1 block to buffer output
  - Read the first block of each run into its buffer page
  - Repeat until all input buffer pages are empty:
    - Select the first record in sorted order among the buffer pages, write it to the output buffer, and delete it from its input buffer
      - If the output buffer is full, write it to disk
      - If an input buffer is empty, load the next block of that run
- If N >= M: merge every M-1 runs; each pass reduces the number of runs by a factor of M-1
  - Repeat until the runs are merged into 1
- Cost
  - $$b_r * (2 * ceil(\log_{M-1}(\frac{b_r}{M})) + 1)$$
  - Each block in the relation incurs
    - 2 reads (internal sort + merge)
    - 1 write (output to disk)
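A compact Python sketch of the two phases, assuming each "block" holds one record and using heapq to drive the N-way merge; a real implementation would spill runs to disk and manage the buffer pages described above.

```python
# Sketch of external sort-merge: run creation, then an N-way merge.
import heapq

def create_runs(relation, M):
    """Phase 1: read M blocks (here, M records) at a time, sort, emit a run."""
    return [sorted(relation[i:i + M]) for i in range(0, len(relation), M)]

def merge_runs(runs):
    """Phase 2: N-way merge, repeatedly emitting the smallest head record."""
    return list(heapq.merge(*runs))

def external_sort(relation, M):
    runs = create_runs(relation, M)
    # If the number of runs exceeded M - 1, a real implementation would
    # merge them in multiple passes of M - 1 runs each.
    return merge_runs(runs)

print(external_sort([5, 3, 8, 1, 9, 2, 7], M=3))  # [1, 2, 3, 5, 7, 8, 9]
```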
 
 
Join Operation
- Merge join
  - Sorted output
- Hash join
  - Cheap
- Nested-loop join
  - Can pipeline
 
Nested-Loop Join

r: outer relation; s: inner relation
- Steps
  - Compare each tuple in r with the tuples in s
- Cost
  - $$b_r + n_r * b_s$$
- Advantages
  - Simple
  - Requires no indices
  - Works for general join conditions
- Disadvantage
  - Expensive
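A minimal sketch of the tuple-at-a-time algorithm on relations represented as lists of dicts; the sample relations and the join predicate are illustrative.

```python
# Sketch of a nested-loop join: every pair of tuples is compared,
# so any join condition works.

def nested_loop_join(r, s, theta):
    out = []
    for tr in r:                 # outer relation: scanned once
        for ts in s:             # inner relation: scanned once per outer tuple
            if theta(tr, ts):
                out.append({**tr, **ts})
    return out

r = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
s = [{"id": 1, "dept": "cs"}, {"id": 3, "dept": "ee"}]
print(nested_loop_join(r, s, lambda tr, ts: tr["id"] == ts["id"]))
```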
 
Block Nested-Loop Join

- Steps
  - Compare each tuple in each block of r with the tuples in each block of s
- Cost
  - $$b_r + b_r * b_s$$
  - Can be reduced to $$b_r + ceil(\frac{b_r}{M-2}) * b_s$$
    - M-2 memory blocks are used for loading blocks of r
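A sketch of the blocked variant, where M-2 buffers worth of r are joined against each scan of s; the chunk size, sample data, and predicate are assumptions for illustration.

```python
# Sketch of a block nested-loop join: the inner relation is scanned once
# per chunk of M - 2 "blocks" of r, not once per tuple of r.

def block_nested_loop_join(r, s, theta, M=4):
    chunk = M - 2                       # M - 2 buffers hold blocks of r
    out = []
    for i in range(0, len(r), chunk):   # ceil(b_r / (M-2)) outer chunks
        r_chunk = r[i:i + chunk]
        for ts in s:                    # one scan of s per chunk of r
            out.extend({**tr, **ts} for tr in r_chunk if theta(tr, ts))
    return out

r = [{"id": k} for k in range(5)]
s = [{"id": k, "v": k * k} for k in range(0, 10, 2)]
print(block_nested_loop_join(r, s, lambda a, b: a["id"] == b["id"]))
```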
 
 
Indexed Nested-Loop Join

- Steps
  - For each tuple in r, use the index to look up matching tuples in s
- Cost
  - $$b_r + n_r * \text{index search cost}$$
- Disadvantages
  - Applicable only to equi-join or natural join
  - An index needs to be available on the inner relation's join attribute
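A sketch in which a Python dict stands in for the index on the inner relation's join attribute; in a real system the index (e.g. a B+-tree) would already exist rather than being built here.

```python
# Sketch of an indexed nested-loop join for an equi-join:
# one index probe per tuple of r instead of a scan of s.
from collections import defaultdict

def indexed_nested_loop_join(r, s, attr):
    index = defaultdict(list)           # stand-in for an existing index on s
    for ts in s:
        index[ts[attr]].append(ts)
    out = []
    for tr in r:                        # probe the index once per tuple of r
        for ts in index.get(tr[attr], []):
            out.append({**tr, **ts})
    return out

r = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
s = [{"id": 1, "dept": "cs"}, {"id": 2, "dept": "ee"}, {"id": 2, "dept": "me"}]
print(indexed_nested_loop_join(r, s, "id"))
```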
 
Merge Join

- Steps
  - Sort r and s on their join attribute
  - Merge (similar to the sort-merge algorithm)
    - Need to handle duplicate values in the join attribute
- Cost
  - $$\text{sort}(r) + \text{sort}(s) + b_r + b_s$$
- Disadvantage
  - Applicable only to equi-join or natural join
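A sketch of the merge phase on inputs already sorted on the join attribute, including the duplicate-value handling mentioned above; the sample relations are illustrative.

```python
# Sketch of a merge join: advance through both sorted inputs, and for
# each shared join value emit the cross product of the equal groups.

def merge_join(r, s, attr):
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        if r[i][attr] < s[j][attr]:
            i += 1
        elif r[i][attr] > s[j][attr]:
            j += 1
        else:
            v = r[i][attr]
            i2 = i                       # collect the group of equal tuples in r
            while i2 < len(r) and r[i2][attr] == v:
                i2 += 1
            j2 = j                       # and the matching group in s
            while j2 < len(s) and s[j2][attr] == v:
                j2 += 1
            out.extend({**tr, **ts} for tr in r[i:i2] for ts in s[j:j2])
            i, j = i2, j2
    return out

r = sorted([{"id": 1}, {"id": 2}, {"id": 2}], key=lambda t: t["id"])
s = sorted([{"id": 2, "v": "x"}, {"id": 3, "v": "y"}], key=lambda t: t["id"])
print(merge_join(r, s, "id"))   # two result tuples with id == 2
```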
 
Hash Join


r: probe input; s: build input
- Steps
  - Use hash function h to partition the tuples of r and s
    - h maps the common attributes of r and s to buckets 0..n
    - Tuples in bucket r_i need only be compared with tuples in bucket s_i
  - For each i
    - Load s_i into memory
    - Build an in-memory hash table on the join attribute
      - Using a different hash function than the earlier h
    - Read the tuples in r_i one by one, locating matching tuples in s_i using the in-memory hash table
    - Output the concatenation of their attributes
- Choice
  - Number of buckets n
    - s_i should fit in memory, r_i need not
    - Typically $$n = ceil(\frac{b_s}{M}) * f$$
      - f: fudge factor, typically around 1.2
- Cost
  - $$3 * (b_r + b_s) + \text{a little}$$
- Disadvantage
  - Applicable only to equi-join or natural join
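A sketch of the partition-and-probe structure described above, with Python's built-in hash standing in for h and a dict standing in for the in-memory hash table on each s_i; the partition count and sample data are assumptions.

```python
# Sketch of a hash join: partition both inputs with h, then build an
# in-memory table on each s_i and probe it with the tuples of r_i.

def hash_join(r, s, attr, n=4):
    h = lambda v: hash(v) % n                 # partitioning hash function
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:
        r_parts[h(t[attr])].append(t)
    for t in s:
        s_parts[h(t[attr])].append(t)

    out = []
    for r_i, s_i in zip(r_parts, s_parts):
        table = {}                            # in-memory hash table on s_i
        for ts in s_i:
            table.setdefault(ts[attr], []).append(ts)
        for tr in r_i:                        # probe with the tuples of r_i
            for ts in table.get(tr[attr], []):
                out.append({**tr, **ts})
    return out

r = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
s = [{"id": 2, "dept": "cs"}, {"id": 3, "dept": "ee"}]
print(hash_join(r, s, "id"))
```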
 
Projection Operation
Drop some columns & eliminate duplicate tuples.
Duplicate Elimination
- Index exists on the projection attributes
  - Cost: index scan (cheaper than a file scan)
- Otherwise
  - Partition (by sorting or hashing) to bring duplicates together, then eliminate them
  - Cost: sorting or hashing
    - External sorting: $$O(N \log_{M-1} N)$$
      - N: number of pages of input
      - M: number of pages that fit in memory
    - Hashing: same as above
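A sketch showing both ways of bringing duplicates together after projection, by sorting and by hashing; the attribute names are illustrative and everything runs in memory rather than over disk pages.

```python
# Sketch of duplicate elimination after projection.

def project_distinct_sort(relation, attrs):
    """Sort the projected rows so equal rows become adjacent, then drop repeats."""
    projected = sorted(tuple(t[a] for a in attrs) for t in relation)
    out = []
    for row in projected:
        if not out or out[-1] != row:
            out.append(row)
    return out

def project_distinct_hash(relation, attrs):
    """Hash the projected rows (a Python set) so duplicates collapse."""
    return list({tuple(t[a] for a in attrs) for t in relation})

rel = [{"dept": "cs", "name": "a"}, {"dept": "cs", "name": "b"}]
print(project_distinct_sort(rel, ["dept"]))  # [('cs',)]
print(project_distinct_hash(rel, ["dept"]))  # [('cs',)]
```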
 
Aggregation Operation
- Similar to duplicate elimination
- Optimization
  - Compute partial aggregate values during run generation & intermediate merges
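A sketch of partial aggregation, assuming an average over (group, value) tuples: each run is reduced to per-group (sum, count) pairs, and the merge step combines those partials instead of re-reading the raw tuples.

```python
# Sketch: partial aggregates per run, combined during the merge.

def partial_aggregate(run):
    """Aggregate one run of (group, value) tuples into {group: (sum, count)}."""
    acc = {}
    for group, value in run:
        s, c = acc.get(group, (0, 0))
        acc[group] = (s + value, c + 1)
    return acc

def merge_partials(partials):
    """Combine partial aggregates from several runs and finish the average."""
    total = {}
    for part in partials:
        for group, (s, c) in part.items():
            ts, tc = total.get(group, (0, 0))
            total[group] = (ts + s, tc + c)
    return {group: s / c for group, (s, c) in total.items()}

runs = [[("cs", 10), ("ee", 20)], [("cs", 30)]]
print(merge_partials([partial_aggregate(r) for r in runs]))  # {'cs': 20.0, 'ee': 20.0}
```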