Chs 12 - 14
[Figure: DBMS architecture diagram; recoverable labels: Web Forms, Application Front Ends, SQL Interface, Parser, Optimizer, Concurrency Control, Index Files]
CISC 432/832
Lectures in Module 2
Overview, Sort, Join, Other operations
[Figure labels: Plan Generator, Catalog Manager]
Each operator is typically implemented using a "pull" interface: when an operator is "pulled" for its next output tuples, it "pulls" on its inputs and computes them.
Ideally: Want to find best plan. Practically: Avoid worst plans! System R approach discussed in text.
Evaluation of Expressions
Materialization: generate the result of an expression whose inputs are relations or are already computed, and materialize (store) it on disk. Repeat.
Pipelining: pass tuples on to parent operations even as an operation is being executed.
Materialization
Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use intermediate results materialized into temporary relations to evaluate next-level operations. E.g., for the expression π name (σ building="Watson" (department) ⋈ instructor), compute and store
σ building="Watson" (department)
then compute and store its join with instructor, and finally compute the projection on name.
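The three steps above can be sketched in Python. This is only an illustration: the sample rows, the join column `dept_name`, and the helper `materialize` are assumptions, and in-memory lists stand in for temporary relations on disk.

```python
# Hypothetical sample data; only `building` and `name` come from the example.
department = [
    {"dept_name": "Physics", "building": "Watson"},
    {"dept_name": "Music", "building": "Packard"},
]
instructor = [
    {"name": "Einstein", "dept_name": "Physics"},
    {"name": "Mozart", "dept_name": "Music"},
]

def materialize(rows):
    """Stand-in for writing an intermediate result to a temporary relation."""
    return list(rows)

# Step 1: lowest-level operation, the selection on department.
temp1 = materialize(d for d in department if d["building"] == "Watson")

# Step 2: join the stored selection result with instructor and store it again.
# The join column dept_name is an assumption made for this sketch.
temp2 = materialize(
    {**d, **i}
    for d in temp1
    for i in instructor
    if d["dept_name"] == i["dept_name"]
)

# Step 3: projection on name, reading the stored join result.
result = materialize({"name": row["name"]} for row in temp2)
print(result)
```

Each step finishes completely before the next one starts, which is what distinguishes this from pipelining.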
Pipelining
Result of one operator is pipelined to another without creating a temporary table.
Pipelines can be executed in two ways: demand driven and producer driven.
Pipelining (Cont.)
In demand-driven (lazy) evaluation:
The system repeatedly requests the next tuple from the top-level operation.
Each operation requests the next tuple from its children operations as required, in order to output its next tuple.
In between calls, an operation has to maintain state so it knows what to return next.
In producer-driven (eager) pipelining:
The system schedules operations that have space in their output buffer and can process more input tuples.
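Demand-driven evaluation maps naturally onto Python generators, which keep their own state between requests. The sketch below is illustrative only: the operator names, the `calls` log, and the sample rows are assumptions, not part of the lecture.

```python
calls = []  # records the order in which operators do work

def scan(rows):
    """Leaf operator: hands out one stored row per request."""
    for row in rows:
        calls.append(("scan", row["id"]))
        yield row

def select(child, predicate):
    """Pulls from its child until a row satisfies the predicate."""
    for row in child:
        if predicate(row):
            calls.append(("select", row["id"]))
            yield row

def project(child, key):
    """Keeps a single column of each row it pulls."""
    for row in child:
        yield row[key]

# Hypothetical input relation.
rows = [{"id": 1, "v": 10}, {"id": 2, "v": 60}, {"id": 3, "v": 70}]
plan = project(select(scan(rows), lambda r: r["v"] > 50), "id")

# The "system" requests a single tuple from the top-level operation.
first = next(plan)
print(first)
print(calls)
```

After the single request, only the first two rows have been scanned; the third is read only if another tuple is requested.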
Iterator Interface
Relational operators at nodes in plan tree support a uniform iterator interface
Open: initializes state by allocating input and output buffers, and passes arguments to the operator.
Get_next: calls operator-specific code to process input tuples and generate output tuples.
Close: deallocates state information when all output has been produced.
The interface hides whether an operator pipelines or materializes its input tuples. It is also used to encapsulate access methods like B+ tree and hash indexes.
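A minimal sketch of the Open / Get_next / Close interface in Python follows. The class names `Scan` and `Select`, the driver `run`, and the use of `None` as the end-of-stream marker are assumptions made for illustration; a real system would work on pages and buffers rather than Python lists.

```python
class Scan:
    """Leaf operator over an in-memory list (stand-in for a file or index scan)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0  # initialize state

    def get_next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None  # release state


class Select:
    """Filters the tuples of its child; the child can be any operator."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def get_next(self):
        # Pull on the input until a qualifying tuple is found.
        while (row := self.child.get_next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self):
        self.child.close()


def run(root):
    """Drives a plan by pulling on its root until the stream ends."""
    root.open()
    out = []
    while (row := root.get_next()) is not None:
        out.append(row)
    root.close()
    return out


print(run(Select(Scan([3, 8, 5, 9]), lambda x: x > 5)))
```

Because `Select` only ever calls `open`, `get_next`, and `close` on its child, it does not need to know whether the child is a scan, an index lookup, or another operator.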
Number of tuples (NTuples) and number of pages (NPages) for each relation.
Number of distinct key values (NKeys) and NPages for each index.
Index height and low/high key values (Low/High) for each tree index.
Updating these statistics whenever data changes is too expensive; there is a lot of approximation anyway, so slight inconsistency is ok.
More detailed information (e.g., histograms of the values in some field) is sometimes stored.
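As a rough illustration of how such statistics feed size estimates, the sketch below divides a relation's size by the number of distinct key values, which assumes values are uniformly distributed. The class and function names are hypothetical, and the numbers (100,000 tuples on 1000 pages, 100 distinct values) are taken from the Reserves example used later in these notes.

```python
from dataclasses import dataclass
import math


@dataclass
class RelationStats:
    ntuples: int  # NTuples
    npages: int   # NPages


@dataclass
class IndexStats:
    nkeys: int    # NKeys: number of distinct key values


def equality_selection_tuples(rel, idx):
    """Estimated tuples with column = constant, assuming a uniform distribution."""
    return math.ceil(rel.ntuples / idx.nkeys)


def equality_selection_pages(rel, idx):
    """Estimated pages those tuples occupy, assuming they are stored together."""
    return math.ceil(rel.npages / idx.nkeys)


reserves = RelationStats(ntuples=100_000, npages=1000)
bid_index = IndexStats(nkeys=100)

print(equality_selection_tuples(reserves, bid_index))
print(equality_selection_pages(reserves, bid_index))
```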
The System R approach is the most widely used currently; it works well for < 10 joins.
Statistics, maintained in system catalogs, are used to estimate the cost of operations and result sizes.
Cost estimates consider a combination of CPU and I/O costs.
Only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
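To make the notion of a left-deep plan concrete, the following sketch enumerates every left-deep join tree for a small set of relations. It only illustrates the shape and size of the search space; the function name and the relation names A, B, C are assumptions, and a System R style optimizer uses dynamic programming with pruning rather than listing every order.

```python
from itertools import permutations


def left_deep_plans(relations):
    """Yield each left-deep join tree as nested (left, right) tuples."""
    for order in permutations(relations):
        plan = order[0]
        for rel in order[1:]:
            # The right input of every join is a base relation,
            # so the left input can be pipelined into the join.
            plan = (plan, rel)
        yield plan


plans = list(left_deep_plans(["A", "B", "C"]))
print(len(plans))
print(plans[0])
```

With n relations there are n! left-deep join orders, which is why restricting the search space still leaves a lot of work for more than a handful of joins.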
Cost Estimation
For each plan considered, must estimate cost:
Selection (σ): Selects a subset of rows from a relation.
Projection (π): Selects desired columns from a relation.
Cross-product (×): Combines two relations.
Set-difference (−): Tuples in one relation but not the other.
Union (∪): Tuples that appear in either relation.
Additional operations: intersection (∩), join (⋈), outer joins.
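The basic operations can be tried out on small in-memory relations. In this sketch a relation is a Python set of tuples; the two-column schema (sid, rating) and the sample values are assumptions chosen only for illustration.

```python
# Relations as sets of tuples; the schema (sid, rating) is hypothetical.
r = {(1, 7), (2, 3), (3, 9)}
s = {(3, 9), (4, 5)}

selection = {t for t in r if t[1] > 5}      # rows with rating > 5
projection = {(t[0],) for t in r}           # keep only the sid column
cross = {a + b for a in r for b in s}       # every pairing of an r row with an s row
difference = r - s                          # in r but not in s
union = r | s                               # in r or in s
intersection = r & s                        # in both r and s

print(sorted(selection))
print(sorted(projection))
print(len(cross))
print(sorted(difference), sorted(union), sorted(intersection))
```

Sets drop duplicates automatically, which matches relational algebra but not SQL, where duplicates are kept unless DISTINCT is requested.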
Reserves: Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
Sailors: Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
Example
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
What is the equivalent relational algebra query?
Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
RA Tree: π sname (σ bid=100 ∧ rating>5 (Sailors ⋈ sid=sid Reserves))
Plan:
  π sname                    (On-the-fly)
    σ bid=100 ∧ rating>5     (On-the-fly)
      ⋈ sid=sid              (PO Nested Loops, i.e., page-oriented nested loops)
        Sailors, Reserves
Cost: 500+500*1000 I/Os
By no means the worst plan!
Misses several opportunities: selections could have been "pushed" earlier, no use is made of any available indexes, etc.
Goal of optimization: To find more efficient plans that compute the same answer.
Plan: π sname (On-the-fly) over a page-oriented nested loops (PONL) join of σ bid=100 (Reserves) and σ rating>5 (Sailors).
Cost of plan:
Scan Reserves (1000): produces 10 pages, if we have 100 boats, uniform distribution.
Scan Sailors (500) + write temp T1 (250 pages, if we have 10 ratings).
PONL: 10 * 250 = 2500
Total: 1000 + 500 + 250 + 2500 = 4250 page I/Os.
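The arithmetic for this plan, next to the 500+500*1000 plan without pushed selections, can be checked with a few lines of Python. The constant and function names are assumptions; the page counts and selectivities are the ones given in the notes.

```python
RESERVES_PAGES = 1000
SAILORS_PAGES = 500


def ponl_cost(outer_pages, inner_pages):
    """Page-oriented nested loops: read the outer once, the inner once per outer page."""
    return outer_pages + outer_pages * inner_pages


# Plan without pushed selections: Sailors as outer, Reserves as inner.
naive = ponl_cost(SAILORS_PAGES, RESERVES_PAGES)

# Plan with pushed selections.
selected_reserves = RESERVES_PAGES // 100  # bid=100 with 100 boats, uniform: 10 pages
t1_pages = SAILORS_PAGES // 2              # rating > 5 with 10 ratings: 250 pages
pushed = (
    RESERVES_PAGES                   # scan Reserves; the selection feeds the join directly
    + SAILORS_PAGES                  # scan Sailors
    + t1_pages                       # write temp T1
    + selected_reserves * t1_pages   # PONL: scan T1 once per page of selected Reserves
)

print(naive, pushed)
```

Pushing the selections below the join cuts the estimate from 500,500 to 4250 page I/Os, roughly a factor of 118.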
Plan: π sname (On-the-fly) over ⋈ sid=sid, with σ bid=100 (Reserves) as the outer input and Sailors as the inner input.
This plan relies on the availability of a sid index on Sailors.
Cost: Selection of Reserves tuples (10 I/Os); for each, must get the matching Sailors tuple (1000*1.2 I/Os); total 1210 I/Os.
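A short check of this estimate in Python follows. The variable names are assumptions. The 10 I/Os for the selection presume the matching Reserves tuples sit together on 10 pages, and 1.2 I/Os per lookup is simply the per-probe figure used in the cost line above.

```python
RESERVES_TUPLES = 100 * 1000  # 100 tuples per page, 1000 pages
RESERVES_PAGES = 1000
NUM_BOATS = 100               # distinct bid values, uniform distribution
PROBE_COST = 1.2              # I/Os per lookup in the sid index on Sailors

matching_tuples = RESERVES_TUPLES // NUM_BOATS  # Reserves tuples with bid=100
selection_cost = RESERVES_PAGES // NUM_BOATS    # pages read to fetch them
probe_cost = matching_tuples * PROBE_COST       # one index lookup per matching tuple

total = selection_cost + probe_cost
print(matching_tuples, selection_cost, round(total))
```

The join cost now depends on the number of qualifying Reserves tuples rather than on the size of Sailors, which is why the index-based plan is cheaper than scanning Sailors for this query.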
Summary
There are several alternative evaluation algorithms for each relational operator.
A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
Must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
Two parts to optimizing a query: