Академический Документы
Профессиональный Документы
Культура Документы
Query Processing
Chapter VI
Query Processing
Basic Steps
Cost Model
Join Operations
Other Operations
Evaluation of Expressions
Sattler
Query Processing
Basic Steps
Information Systems
Query Processing
61
Basic Steps
SQL Query
Parser &
Translator
Relational Algebra
Expression
Statistics
Optimizer
Evaluation Engine
Execution Plan
(Data Dictionary)
Data Files
Sattler
Information Systems
62
Sattler
Information Systems
63
Query Processing
Basic Steps
Query Processing
Translate the query into its internal form (parse tree). This is then
translated into an expression of the relational algebra.
Parser checks syntax, validates relations, attributes and access
permissions
CName ((CUSTOMERS o
n ORDERS) o
n (Price>5000 (OFFERS)))
Representation as evaluation plan (query tree):
Evaluation
I
Basic Steps
CName
CName
OFFERS
ORDERS
CUSTOMERS
ORDERS
OFFERS
Information Systems
Query Processing
64
Sattler
Basic Steps
Information Systems
Query Processing
65
Cost Model
FR : blocking factor; number of tuples from R that fit into one block
(FR = dNR /BR e)
V(A, R): number of distinct values for attribute A in R.
Information Systems
66
Sattler
Information Systems
67
Query Processing
Cost Model
Query Processing
Cost Model
Selection Operation
Sattler
Information Systems
Query Processing
68
dlog2 (BR )e cost to locate the first tuple using binary search
Second term blocks that contain records satisfying the selection.
If A is primary key and SC(A, R) = 1, then cost(S2) = dlog2 (BR )e.
In case of a uniform distribution of attribute values for A, selection
A=a (R) retrieves SC(A, R) = NR /V(A, R) tuples.
Sattler
Information Systems
69
FEmployee = 10;
V(Deptno, Employee) = 50 (different departments)
FR
Cost Model
Information Systems
Query Processing
Sattler
Cost Model
610
Information Systems
611
Query Processing
Cost Model
Query Processing
Cost Model
Sattler
Information Systems
Query Processing
612
S6 A primary index on A, or
S7 A secondary index on A (in this case, typically a linear file
scan may be cheaper; but this depends on the selectivity of A)
Sattler
Information Systems
613
Cost Model
Complex Selections
Information Systems
Query Processing
Sattler
Cost Model
General pattern:
I
I
I
614
Sattler
Information Systems
615
Query Processing
Join Operations
Query Processing
Join Operations
Nested-Loop Join
Block Nested-Loop Join
Index Nested-Loop Join
Sort-Merge Join
Hash-Join
I
I
I
I
I
For this, join size estimates are required and in particular cost
estimates for outer-level operations in a relational algebra
expression.
Sattler
Join Operations
Information Systems
Query Processing
616
Sattler
Join Operations
Information Systems
Query Processing
617
Join Operations
Sattler
Information Systems
618
Sattler
Information Systems
619
Query Processing
Join Operations
Query Processing
Join Operations
Sattler
Information Systems
Query Processing
620
I
I
If equi-join attribute is the key on inner relation, stop inner loop with
first match
Use M 2 disk blocks as blocking unit for outer relation, where M =
db buffer size in blocks; use remaining two blocks to buffer inner
relation and output.
Reduces number of scans of inner relation greatly.
Scan inner loop forward and backward alternately, to make use of
blocks remaining in buffer (with LRU replacement strategy)
Use index on inner relation, if available.
Sattler
Information Systems
621
Join Operations
Worst case: db buffer can only hold one block of each relation
= BR + BR BS disk accesses.
Information Systems
Query Processing
Sattler
Join Operations
622
lookup on S.
Cost of the join: BR + NR c, where c is the cost of a single selection
on S using the join condition.
If indexes are available on both R and S, use the one with the
fewer tuples as the outer relation.
Sattler
Information Systems
623
Query Processing
Join Operations
Query Processing
Sort-Merge Join
Example:
I
I
I
I
I
Join Operations
Basic idea: first sort both relations on join attribute (if not already
sorted this way)
Join steps are similar to the merge stage in the external
sort-merge algorithm (discussed later)
Every pair with same value on join attribute must be matched.
Compute CUSTOMERS o
n ORDERS, with CUSTOMERS as the outer relation.
Let ORDERS have a primary B+ -tree index on the join-attribute CName,
which contains 20 entries per index node
Since ORDERS has 10,000 tuples, the height of the tree is 4, and one
more access is needed to find the actual data records (based on
tuple identifier).
Since NCUSTOMERS is 5,000, the total cost is 250 + 5000 5 = 25,250
disk accesses.
This cost is lower than the 100,250 accesses needed for a block
nested-loop join.
Relation R
1
2
2
3
4
5
1
2
3
3
5
Relation S
Information Systems
Query Processing
624
Sattler
Join Operations
Information Systems
625
Join Operations
Hash-Join
Sattler
Information Systems
Query Processing
626
Sattler
Information Systems
627
Query Processing
Other Operations
Query Processing
Other Operations
Sorting:
Duplicate Elimination:
quick-sort
Sorting: remove all but one copy of tuples having identical value(s)
on projection attribute(s)
Hashing:
Or, build index on the relation, and use index to read relation in
sorted order.
Relation that does not fit into db buffer
external sort-merge
1
2
I
I
Sattler
Information Systems
Query Processing
628
Sattler
Other Operations
Information Systems
Query Processing
Set Operations:
Grouping:
Sorting or hashing
Sorting or hashing.
Hashing:
I
I
I
I
I
Sattler
Information Systems
630
Sattler
629
Other Operations
Information Systems
631
Query Processing
Evaluation of Expressions
Query Processing
Evaluation of Expressions
Evaluation of Expressions
ORDERS
OFFERS
Information Systems
Query Processing
632
Sattler
Evaluation of Expressions
Information Systems
Query Processing
633
Evaluation of Expressions
1
2
Sattler
Information Systems
634
Sattler
Information Systems
635
Query Processing
Evaluation of Expressions
Query Processing
Evaluation of Expressions
Equivalence of Expressions
5
6
(E1 E2 ) E3 (E1 E3 ) E2
E1 o
n E2 E2 o
n E1 (E1 o
n E2 ) o
n E3 E1 o
n (E2 o
n E3 )
Sattler
Information Systems
Winter term 2013/14
637
The application of equivalence rules to a relational algebra
expression
Query Processing
Evaluation of Expressions
is also sometimes called algebraic optimization.
9
Sattler
Information Systems
Query Processing
636
Evaluation of Expressions
Examples
Examples (cont.)
Selection:
I
Projection:
Find the name of all customers who have ordered a product for
more than $5,000 from a supplier located in Ilmenau.
CName (SAddress like 0 %Ilmenau%0
CName,account (CUSTOMERS o
n Prodname=0 CDROM0 (ORDERS))
Reduce the size of argument relation in join
Price>5000
(CUSTOMERS o
n (ORDERS o
n (OFFERS o
n SUPPLIERS))))
Perform selection as early as possible (but take existing indexes on
relations into account)
CName (CUSTOMERS o
n (ORDERS o
n
CName,account (CUSTOMERS o
n CName (Prodname=0 CDROM0 (ORDERS)))
Projection should not be shifted before selections, because
minimizing the number of tuples in general leads to more efficient
plans than reducing the size of tuples.
(Price>5000 (OFFERS) o
n (SAddress like 0 %Ilmenau%0 (SUPPLIERS)))))
Sattler
Information Systems
638
Sattler
Information Systems
639
Query Processing
Evaluation of Expressions
Query Processing
Join Ordering
Evaluation of Expressions
Information Systems
Query Processing
640
Sattler
Information Systems
Evaluation of Expressions
Query Processing
Evaluation Plan
641
Evaluation of Expressions
Query: List the name of all customers who have ordered a product
that costs more than $5,000.
Assume that for both CUSTOMERS and ORDERS an index on
CName exists: I1 (CName, CUSTOMERS), I2 (CName, ORDERS).
CName
pipeline
I1(CName, CUSTOMERS)
Sattler
Information Systems
642
Sattler
pipeline
I2(CName, ORDERS)
Information Systems
643
Query Processing
Evaluation of Expressions
Query Processing
Evaluation of Expressions
Cost-based: enumerate all the plans and choose the best plan in a
cost-based fashion.
Rule-based: Use rules (heuristics) to choose plan.
Sattler
Information Systems
Query Processing
644
Evaluation of Expressions
Summary
Steps of query processing: translate a SQL query into an
execution plan
Cost estimation based on statistics and cost formulas
Physical query operators: scans, join implementations, etc.
Evaluation of query expressions
Query optimization: search strategy for finding the best (cheapest)
plan
Sattler
Information Systems
646
Sattler
Information Systems
645