QUERY OPTIMIZATION

BY NAJIYA P M No:100919002

OUTLINE
• INTRODUCTION
• QUERY PROCESSOR
• QUERY OPTIMIZATION
• QUERY OPTIMIZATION: AS A SEARCH PROBLEM
• TRANSFORMATIONS OF RELATIONAL ALGEBRA
• STATISTICS OF EXPRESSION RESULTS
• COST BASED OPTIMIZATION
• CONCLUSION
• REFERENCES
16/1/2012

INTRODUCTION
• Query processing and optimization is a fundamental, if not critical, part of any DBMS.
• Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer.
• All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run.
• What is the plan that needs the least amount of time?
• The cost difference between two alternatives can be enormous.


QUERY PROCESSOR
• Parsing and translation: the first step, which converts a query into a form usable by the query processing engine.
• Optimization: the query processor applies rules to the internal data structures of the query to transform them into equivalent but more efficient representations.
• Evaluation: the best evaluation plan generated by the optimization engine is selected and then executed.

QUERY PROCESSOR(Cont)

Query - Example
Query: find the names of all customers who have an account at any branch located in Brooklyn.

• Branch-schema = (branch-name, branch-city, assets)
• Account-schema = (account-number, branch-name, balance)
• Depositor-schema = (customer-name, account-number)
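
The query can be tried out against the three schemas with a minimal sketch using Python's built-in `sqlite3` module. The sample rows, the underscore column names, and the helper `brooklyn_customers` are assumptions made for illustration; only the schemas and the query itself come from the slide.

```python
import sqlite3

# Hypothetical sample data; only the three schemas come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch    (branch_name TEXT, branch_city TEXT, assets INTEGER);
CREATE TABLE account   (account_number TEXT, branch_name TEXT, balance INTEGER);
CREATE TABLE depositor (customer_name TEXT, account_number TEXT);

INSERT INTO branch VALUES ('Downtown', 'Brooklyn', 9000000);
INSERT INTO branch VALUES ('Perryridge', 'Horseneck', 1700000);
INSERT INTO account VALUES ('A-101', 'Downtown', 500);
INSERT INTO account VALUES ('A-102', 'Perryridge', 400);
INSERT INTO depositor VALUES ('Johnson', 'A-101');
INSERT INTO depositor VALUES ('Hayes', 'A-102');
""")

# Names of all customers who have an account at any Brooklyn branch.
QUERY = """
SELECT DISTINCT d.customer_name
FROM branch AS b
JOIN account AS a ON a.branch_name = b.branch_name
JOIN depositor AS d ON d.account_number = a.account_number
WHERE b.branch_city = 'Brooklyn'
ORDER BY d.customer_name
"""

def brooklyn_customers(connection):
    """Return the customer names selected by the example query."""
    return [row[0] for row in connection.execute(QUERY)]

print(brooklyn_customers(conn))  # ['Johnson']
```

The SQL text states only what is wanted; which join runs first is left to the DBMS, which is where the optimizer comes in.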


Example

Evaluation plans

QUERY OPTIMIZATION
• Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies usually possible for processing a given query, especially if the query is complex.
• The function of a DBMS query optimization engine is to find an evaluation plan that reduces the overall execution cost of a query.


Aspects of optimization
• One aspect: at the relational algebra level, the system attempts to find an expression that is equivalent to the given expression but more efficient to execute.
• Another aspect: selecting a detailed strategy for processing the query.


QUERY OPTIMIZATION: AS A SEARCH PROBLEM



Transformations of Relational Algebra


• Two relational algebra expressions are said to be equivalent if, on every legal database instance, the two expressions generate the same set of tuples.
• In SQL, inputs and outputs are multisets of tuples.
  ◦ Two expressions in the multiset version of the relational algebra are said to be equivalent if, on every legal database instance, the two expressions generate the same multiset of tuples.
• An equivalence rule says that expressions of two forms are equivalent.
  ◦ An expression of the first form can be replaced by one of the second form, or vice versa.
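
As an illustration of one such rule, the sketch below checks that applying a selection after a join and applying it before the join give the same tuples on a tiny instance. The rows and the helpers `select`, `natural_join`, and `as_set` are hypothetical. Agreement on one instance only illustrates the rule; equivalence requires agreement on every legal instance.

```python
# Relations are modelled as lists of dicts; the sample rows are hypothetical.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Perryridge", "branch_city": "Horseneck"},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown"},
    {"account_number": "A-102", "branch_name": "Perryridge"},
    {"account_number": "A-201", "branch_name": "Downtown"},
]

def select(relation, predicate):
    """Keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def natural_join(r, s):
    """Join on all attribute names the two relations have in common."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result

def as_set(relation):
    """Set-semantics view of a relation, used to compare two results."""
    return {tuple(sorted(t.items())) for t in relation}

def in_brooklyn(t):
    return t["branch_city"] == "Brooklyn"

form1 = select(natural_join(branch, account), in_brooklyn)  # select after join
form2 = natural_join(select(branch, in_brooklyn), account)  # select before join

print(as_set(form1) == as_set(form2))  # True
```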

Statistics of Expression Results


• The cost of an operation depends on the size and other statistics of its inputs.
  ◦ Use the statistics of the inputs to estimate statistics on the results of relational algebra expressions; these are only estimates and need not be accurate.


Catalog information
• $n_r$: number of tuples in a relation r.
• $b_r$: number of blocks containing tuples of r.
• $l_r$: size of a tuple of r.
• $f_r$: blocking factor of r, i.e., the number of tuples of r that fit into one block.
• $V(A, r)$: number of distinct values that appear in r for attribute A; same as the size of $\Pi_A(r)$.
• If tuples of r are stored together physically in a file, then:

  $b_r = \lceil n_r / f_r \rceil$
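
A minimal sketch of the block-count formula, using integer arithmetic for the ceiling. The function name `blocks_needed` and the sample values are illustrative assumptions.

```python
def blocks_needed(n_r: int, f_r: int) -> int:
    """b_r = ceil(n_r / f_r) for a relation stored contiguously in one file."""
    return (n_r + f_r - 1) // f_r  # integer ceiling division

print(blocks_needed(10_000, 25))  # 400
print(blocks_needed(10_001, 25))  # 401: one extra tuple needs one extra block
```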

Cost based optimization


• Generation of query-evaluation plans for an expression involves several steps:
  1. Generating logically equivalent expressions using equivalence rules.
  2. Annotating the resultant expressions to get alternative query plans.
  3. Choosing the cheapest plan based on estimated cost.
• The overall process is called cost-based optimization.


Cost based optimization(cont)


• There is no need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of $\{r_1, r_2, \ldots, r_n\}$ is computed only once and stored for future use.
• To find the best join tree for a set of n relations:
  ◦ To find the best plan for a set S of n relations, consider all possible plans of the form $S_1 \bowtie (S - S_1)$, where $S_1$ is any non-empty proper subset of S.
  ◦ Recursively compute costs for joining subsets of S to find the cost of each plan. Choose the cheapest of the $2^n - 2$ alternatives.
  ◦ When the plan for any subset is computed, store it and reuse it when it is required again, instead of recomputing it (dynamic programming).

Cost based optimization(cont)


• The exact cost of an evaluation plan cannot be calculated without actually evaluating the plan.
• Optimizers therefore estimate the cost using statistical information about the relations.
• Disk access typically dominates the cost of processing a query.


JOIN ORDER OPTIMIZATION ALGORITHM


procedure findbestplan(S)
    if (bestplan[S].cost ≠ ∞)
        return bestplan[S]
    // else bestplan[S] has not been computed earlier, compute it now
    if (S contains only 1 relation)
        set bestplan[S].plan and bestplan[S].cost based on the best way of accessing S
    else for each non-empty subset S1 of S such that S1 ≠ S
        P1 = findbestplan(S1)
        P2 = findbestplan(S − S1)
        A = best algorithm for joining results of P1 and P2
        cost = P1.cost + P2.cost + cost of A
        if cost < bestplan[S].cost
            bestplan[S].cost = cost
            bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
    return bestplan[S]
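
The procedure can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the optimizer of any real DBMS: the cardinalities in `CARD`, the selectivities in `SEL`, and the toy cost model (scanning a base relation costs its cardinality, a join costs the estimated size of its output) are all hypothetical. `functools.lru_cache` plays the role of the `bestplan` table.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics (not from the slides): base cardinalities and
# selectivities of the join predicates between pairs of relations.
CARD = {"r1": 10, "r2": 1000, "r3": 200}
SEL = {
    frozenset({"r1", "r2"}): Fraction(1, 100),
    frozenset({"r2", "r3"}): Fraction(1, 1000),
}  # pairs without an entry join as a cross product (selectivity 1)

def size(rels):
    """Estimated number of tuples produced by joining all relations in rels."""
    n = Fraction(1)
    for r in rels:
        n *= CARD[r]
    for a, b in combinations(sorted(rels), 2):
        n *= SEL.get(frozenset({a, b}), 1)
    return n

@lru_cache(maxsize=None)  # memo table, playing the role of bestplan[]
def find_best_plan(rels):
    """Return (cost, plan) for the cheapest join tree over frozenset rels."""
    if len(rels) == 1:
        (r,) = rels
        return Fraction(CARD[r]), r  # toy access cost: scan every tuple
    best = None
    members = sorted(rels)
    for k in range(1, len(members)):
        for subset in combinations(members, k):  # non-empty S1 with S1 != S
            s1 = frozenset(subset)
            s2 = rels - s1
            c1, p1 = find_best_plan(s1)
            c2, p2 = find_best_plan(s2)
            cost = c1 + c2 + size(rels)  # toy join cost: size of the output
            if best is None or cost < best[0]:
                best = (cost, (p1, p2))
    return best

cost, plan = find_best_plan(frozenset(CARD))
print(cost, plan)
```

Each subset of relations is solved once and then served from the cache, which is what keeps the search from re-deriving the same sub-plan for every enclosing plan.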

Cost based optimization


• With dynamic programming, the time complexity of optimization with bushy join trees is $O(3^n)$ [1].
  ◦ With n = 10, this number is about 59,000, instead of the roughly 17.6 billion join orders, $(2(n-1))!/(n-1)!$, that exhaustive enumeration would have to consider.
• Cost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small n, generally < 10).
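
The two numbers being compared can be checked with a few lines of arithmetic. The formula $(2(n-1))!/(n-1)!$ for the number of complete join orders of n relations is the standard textbook count and is an assumption here, since the slide does not state it; the function name `join_orders` is illustrative.

```python
from math import factorial

def join_orders(n: int) -> int:
    """Number of complete join orders for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

n = 10
print(3 ** n)          # 59049
print(join_orders(n))  # 17643225600
```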


CONCLUSION
• Cost-based optimization is expensive.
• Other optimization techniques, such as heuristic-based optimization, can be used.
• Cost-based optimization is only as good as its cost estimates.
• To a large extent, the success of a DBMS lies in the quality, functionality, and sophistication of its query optimizer, since that determines much of the system's performance.


REFERENCES
[1] Michael L. Rupley, Jr., Introduction to Query Processing and Optimization
[2] Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems
[3] Henry F. Korth, Abraham Silberschatz, and S. Sudarshan, Database System Concepts
[4] Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems


THANK YOU


HEURISTIC OPTIMIZATION
• Cost-based optimization is expensive, even with dynamic programming.
• Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query tree by using a set of rules that typically (but not in all cases) improve execution performance:
  ◦ Perform selection early (reduces the number of tuples).
  ◦ Perform projection early (reduces the number of attributes).
  ◦ Perform the most restrictive selection and join operations before other similar operations.
• Some systems use only heuristics; others combine heuristics with partial cost-based optimization.

ENUMERATION OF EQUIVALENT EXPRESSIONS


• Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression.
• Conceptually, generate all equivalent expressions by repeatedly executing the following step until no more expressions can be found:
  ◦ For each expression found so far, use all applicable equivalence rules.
  ◦ Add newly generated expressions to the set of expressions found so far.
• The above approach is very expensive in space and time.
• Space requirements are reduced by sharing common subexpressions:
  ◦ When E1 is generated from E2 by an equivalence rule, usually only the top level of the two differs; the subtrees below are the same and can be shared (e.g., when applying join associativity).
• Time requirements are reduced by not generating all expressions.
  ◦ More details shortly.
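
The conceptual enumeration loop can be sketched for join expressions using just two rules, join commutativity and join associativity. The representation (a base relation is a string, a join is a pair) and the function names `rewrites` and `equivalent_expressions` are assumptions chosen for illustration.

```python
def rewrites(expr):
    """Yield every expression obtained by one rule application within expr.

    A base relation is a string; a join is a pair (left, right).
    """
    if isinstance(expr, str):
        return
    left, right = expr
    yield (right, left)              # commutativity: L join R = R join L
    if isinstance(left, tuple):      # associativity: (A B) C = A (B C)
        a, b = left
        yield (a, (b, right))
    if isinstance(right, tuple):     # associativity in the other direction
        b, c = right
        yield ((left, b), c)
    for new_left in rewrites(left):  # apply the rules inside subexpressions
        yield (new_left, right)
    for new_right in rewrites(right):
        yield (left, new_right)

def equivalent_expressions(start):
    """Apply the rules repeatedly until no new expression is found."""
    found = {start}
    frontier = [start]
    while frontier:
        expr = frontier.pop()
        for new in rewrites(expr):
            if new not in found:
                found.add(new)
                frontier.append(new)
    return found

exprs = equivalent_expressions((("r1", "r2"), "r3"))
print(len(exprs))  # 12
```

Each rewritten expression is a new top-level tuple that refers to the existing subtree objects rather than copying them, a small-scale version of the subexpression sharing described above.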


JOIN OPERATION -EXAMPLE


Running example: $depositor \bowtie customer$

Catalog information for join examples:
• $n_{customer}$ = 10,000.
• $f_{customer}$ = 25, which implies that $b_{customer}$ = 10000/25 = 400.
• $n_{depositor}$ = 5000.
• $f_{depositor}$ = 50, which implies that $b_{depositor}$ = 5000/50 = 100.
• V(customer_name, depositor) = 2500, which implies that, on average, each customer has two accounts.
  ◦ Also assume that customer_name in depositor is a foreign key on customer.
• V(customer_name, customer) = 10000 (primary key!)
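
With these catalog values, the size of the join result can be estimated. The sketch below uses the standard textbook estimates $n_r \cdot n_s / V(A, s)$ and $n_r \cdot n_s / V(A, r)$, which assume uniformly distributed values; the formulas and the function name `join_size_estimates` are assumptions, since the slide lists only the statistics.

```python
n_customer, n_depositor = 10_000, 5_000
v_depositor = 2_500   # V(customer_name, depositor)
v_customer = 10_000   # V(customer_name, customer)

def join_size_estimates(n_r, n_s, v_r, v_s):
    """Two estimates of the size of r join s on a common attribute A:
    n_r * n_s / V(A, s) and n_r * n_s / V(A, r)."""
    return n_r * n_s // v_s, n_r * n_s // v_r

est_a, est_b = join_size_estimates(n_depositor, n_customer, v_depositor, v_customer)
print(est_a, est_b, min(est_a, est_b))  # 5000 20000 5000
```

The lower estimate, 5000, equals $n_{depositor}$. That agrees with the foreign-key assumption: every depositor tuple matches exactly one customer tuple, so the join has exactly as many tuples as depositor.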

JOIN ORDERING -EXAMPLE


• Consider the expression

  $\Pi_{customer\_name}((\sigma_{branch\_city = \text{"Brooklyn"}}(branch)) \bowtie (account \bowtie depositor))$

• We could compute $account \bowtie depositor$ first and join the result with $\sigma_{branch\_city = \text{"Brooklyn"}}(branch)$, but $account \bowtie depositor$ is likely to be a large relation.
• Only a small fraction of the bank's customers are likely to have accounts in branches located in Brooklyn, so it is better to compute $\sigma_{branch\_city = \text{"Brooklyn"}}(branch) \bowtie account$ first.
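
The effect of the join order on intermediate result size can be seen on a small synthetic instance. Everything in this sketch is hypothetical: 10 branches of which one is in Brooklyn, 100 accounts spread evenly over the branches, one depositor per account, and a naive `natural_join` helper.

```python
# Synthetic data, invented for illustration only.
branch = [{"branch_name": f"B{i}",
           "branch_city": "Brooklyn" if i == 0 else "Other"} for i in range(10)]
account = [{"account_number": f"A{j}", "branch_name": f"B{j % 10}"}
           for j in range(100)]
depositor = [{"customer_name": f"C{j}", "account_number": f"A{j}"}
             for j in range(100)]

def natural_join(r, s):
    """Join on all attribute names the two relations have in common."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result

def names(relation):
    """Sorted customer names, for comparing the final results."""
    return sorted(t["customer_name"] for t in relation)

brooklyn = [t for t in branch if t["branch_city"] == "Brooklyn"]

# Plan 1: join account with depositor first, then with the Brooklyn branches.
inter1 = natural_join(account, depositor)
result1 = natural_join(brooklyn, inter1)

# Plan 2: join the Brooklyn branches with account first, then with depositor.
inter2 = natural_join(brooklyn, account)
result2 = natural_join(inter2, depositor)

print(len(inter1), len(inter2), names(result1) == names(result2))  # 100 10 True
```

Both plans return the same customers, but the second one carries a tenth as many intermediate tuples into its final join.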



Difference in cost - example

