Qry Myppt

QUERY OPTIMIZATION
BY NAJIYA P M No:100919002
OUTLINE
• INTRODUCTION
• QUERY PROCESSOR
• QUERY OPTIMIZATION
• QUERY OPTIMIZATION: AS A SEARCH PROBLEM
• TRANSFORMATIONS OF RELATIONAL ALGEBRA
• STATISTICS OF EXPRESSION RESULTS
• COST BASED OPTIMIZATION
• CONCLUSION
• REFERENCES
16/1/2012
INTRODUCTION
• Query processing and optimization is a fundamental, if not critical, part of any DBMS.
• Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer.
• All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run.
• What is the plan that needs the least amount of time?
• The cost difference between two alternatives can be enormous.
QUERY PROCESSOR
• Parsing and translation: the first step converts a query into a form usable by the query processing engine.
• Optimization: the query processor applies rules to the internal data structures of the query to transform them into equivalent, but more efficient, representations.
• Evaluation: the best candidate evaluation plan generated by the optimization engine is selected and then executed.
QUERY PROCESSOR (cont.)
Query - Example
Query: find the names of all customers who have an account at any branch located in Brooklyn.
• Branch-schema = (branch-name, branch-city, assets)
• Account-schema = (account-number, branch-name, balance)
• Depositor-schema = (customer-name, account-number)
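As a rough illustration, the example query can be written in SQL and run against a tiny in-memory database. This is only a sketch: the sample rows are made up for illustration, and the hyphenated attribute names from the schemas are written with underscores so that they are valid SQL identifiers.

```python
import sqlite3

# In-memory database holding the three relations from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch    (branch_name TEXT, branch_city TEXT, assets INTEGER);
    CREATE TABLE account   (account_number TEXT, branch_name TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);

    -- Hypothetical sample rows, not taken from the slides.
    INSERT INTO branch VALUES ('Downtown', 'Brooklyn', 9000000),
                              ('Perryridge', 'Horseneck', 1700000);
    INSERT INTO account VALUES ('A-101', 'Downtown', 500),
                               ('A-102', 'Perryridge', 400);
    INSERT INTO depositor VALUES ('Johnson', 'A-101'),
                                 ('Hayes', 'A-102');
""")

# Names of all customers with an account at any branch located in Brooklyn.
QUERY = """
    SELECT DISTINCT d.customer_name
    FROM branch b
    JOIN account a   ON a.branch_name = b.branch_name
    JOIN depositor d ON d.account_number = a.account_number
    WHERE b.branch_city = 'Brooklyn'
"""
names = sorted(row[0] for row in conn.execute(QUERY))
print(names)  # ['Johnson']
```

The SQL text says nothing about the order in which the joins are evaluated; choosing that order is the optimizer's job.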
Example
Evaluation plans
QUERY OPTIMIZATION
• Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies usually possible for processing a given query, especially if the query is complex.
• The function of a DBMS query optimization engine is to find an evaluation plan that reduces the overall execution cost of a query.
Aspects of optimization
• One aspect works at the relational algebra level: the system attempts to find an expression that is equivalent to the given expression, but more efficient to execute.
• Another aspect is to select a detailed strategy for processing the query.
QUERY OPTIMIZATION: AS A SEARCH PROBLEM

Transformations of Relational Algebra

• Two relational algebra expressions are said to be equivalent if on every legal database instance the two expressions generate the same set of tuples.
• In SQL, inputs and outputs are multisets of tuples.
  • Two expressions in the multiset version of the relational algebra are said to be equivalent if on every legal database instance the two expressions generate the same multiset of tuples.
• An equivalence rule says that expressions of two forms are equivalent.
  • An expression of the first form can be replaced by one of the second form, or vice versa.
Statistics of Expression Results

• The cost of an operation depends on the size and other statistics of its inputs.
• Use these statistics to estimate statistics on the results of relational algebra expressions; the estimates may be inaccurate.
Catalog information
• $n_r$: number of tuples in a relation $r$.
• $b_r$: number of blocks containing tuples of $r$.
• $l_r$: size of a tuple of $r$.
• $f_r$: blocking factor of $r$, i.e., the number of tuples of $r$ that fit into one block.
• $V(A, r)$: number of distinct values that appear in $r$ for attribute $A$; same as the size of $\Pi_A(r)$.
• If tuples of $r$ are stored together physically in a file, then:
  $b_r = \lceil n_r / f_r \rceil$
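A minimal sketch of the block-count formula, assuming the tuples of the relation are stored together in one file. The function name `blocks_needed` is chosen here for illustration only.

```python
def blocks_needed(n_r: int, f_r: int) -> int:
    """Number of blocks b_r for n_r tuples with blocking factor f_r.

    Integer ceiling division: a partially filled last block still
    occupies a whole block.
    """
    return (n_r + f_r - 1) // f_r


print(blocks_needed(10_000, 25))  # 400
print(blocks_needed(5_000, 50))   # 100
print(blocks_needed(10_001, 25))  # 401 (one extra tuple needs one more block)
```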
Cost based optimization

• Generation of query-evaluation plans for an expression involves several steps:
  1. Generating logically equivalent expressions using equivalence rules.
  2. Annotating the resultant expressions to get alternative query plans.
  3. Choosing the cheapest plan based on estimated cost.
• The overall process is called cost-based optimization.
Cost based optimization (cont.)

• There is no need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of $\{r_1, r_2, \ldots, r_n\}$ is computed only once and stored for future use.
• To find the best join tree (plan) for a set $S$ of $n$ relations, consider all possible plans of the form $S_1 \bowtie (S - S_1)$, where $S_1$ is any non-empty subset of $S$.
• Recursively compute costs for joining subsets of $S$ to find the cost of each plan. Choose the cheapest of the $2^n - 1$ alternatives.
• When the plan for any subset is computed, store it and reuse it when it is required again, instead of recomputing it. This is dynamic programming.
Cost based optimization (cont.)

• The exact cost of an evaluation plan cannot be calculated without actually evaluating the plan, so the optimizer works with estimates.
• Optimizers make use of statistical information about the relations.
• Disk access dominates the cost of processing a query.
JOIN ORDER OPTIMIZATION ALGORITHM

procedure findbestplan(S)
    if (bestplan[S].cost ≠ ∞)
        return bestplan[S]
    // else bestplan[S] has not been computed earlier, compute it now
    if (S contains only 1 relation)
        set bestplan[S].plan and bestplan[S].cost based on the best way of accessing S
    else for each non-empty subset S1 of S such that S1 ≠ S
        P1 = findbestplan(S1)
        P2 = findbestplan(S - S1)
        A = best algorithm for joining results of P1 and P2
        cost = P1.cost + P2.cost + cost of A
        if cost < bestplan[S].cost
            bestplan[S].cost = cost
            bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
    return bestplan[S]
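The procedure above can be sketched in Python with memoization playing the role of the `bestplan` table. Everything about the cost model here is an assumption made for illustration and does not come from the slides: the cardinalities, the join selectivities, the cost of accessing a single relation (its cardinality), and the cost of a join (the product of its input sizes, roughly a nested-loop join). `brooklyn_branch` stands for the result of the Brooklyn selection on branch.

```python
import math
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics (assumptions for this sketch only).
CARD = {"brooklyn_branch": 5, "account": 10_000, "depositor": 5_000}
SELECTIVITY = {
    frozenset({"brooklyn_branch", "account"}): Fraction(1, 50),
    frozenset({"account", "depositor"}): Fraction(1, 10_000),
}


def result_size(rels: frozenset) -> Fraction:
    """Estimated size of joining all relations in rels."""
    size = Fraction(1)
    for rel in rels:
        size *= CARD[rel]
    for pair, selectivity in SELECTIVITY.items():
        if pair <= rels:  # join predicate applies inside this set
            size *= selectivity
    return size


@lru_cache(maxsize=None)  # memo table: each subset is solved only once
def find_best_plan(rels: frozenset):
    """Return (cost, plan) for the cheapest way to join rels."""
    if len(rels) == 1:
        (name,) = rels
        return Fraction(CARD[name]), name  # assumed cost of a full scan
    best_cost, best_plan = math.inf, None
    members = sorted(rels)
    # Every non-empty proper subset S1, paired with its complement S - S1.
    for k in range(1, len(members)):
        for subset in combinations(members, k):
            s1 = frozenset(subset)
            s2 = rels - s1
            cost1, plan1 = find_best_plan(s1)
            cost2, plan2 = find_best_plan(s2)
            join_cost = result_size(s1) * result_size(s2)  # assumed join cost
            cost = cost1 + cost2 + join_cost
            if cost < best_cost:
                best_cost, best_plan = cost, (plan1, plan2)
    return best_cost, best_plan


cost, plan = find_best_plan(frozenset(CARD))
print(plan, cost)
```

Under these assumed numbers the cheapest plan joins `brooklyn_branch` with `account` first and only then brings in `depositor`; a different cost model or different statistics could change the outcome.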
Cost based optimization

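• With dynamic programming, the time complexity of optimization with bushy trees is $O(3^n)$. [1]
  • With $n = 10$, this number is about 59,000 ($3^{10} = 59{,}049$), instead of the roughly 17.6 billion join orders, $(2(n-1))!/(n-1)!$, that exhaustive enumeration would have to consider!
• Cost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small $n$, generally $< 10$).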
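The arithmetic behind this comparison can be checked with a few lines of Python. The join-order count used below, $(2(n-1))!/(n-1)!$, is the usual textbook formula for the number of complete join orders over n relations; it is not stated on the slide, so treat it as an assumption of this sketch.

```python
from math import factorial


def dp_steps(n: int) -> int:
    """Order of the work done by dynamic programming over bushy trees: 3^n."""
    return 3 ** n


def join_orders(n: int) -> int:
    """Number of distinct join orders for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


print(dp_steps(10))     # 59049
print(join_orders(7))   # 665280
print(join_orders(10))  # 17643225600
```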
CONCLUSION
• Cost-based optimization is expensive.
• Other optimization techniques, such as heuristic-based optimization, can be used.
• Cost-based optimization is only as good as its cost estimates.
• To a large extent, the success of a DBMS lies in the quality, functionality, and sophistication of its query optimizer, since that determines much of the system's performance.
REFERENCES
[1] Michael L. Rupley, Jr., Introduction to Query Processing and Optimization
[2] Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems
[3] Henry F. Korth, Abraham Silberschatz, and S. Sudarshan, Database System Concepts
[4] Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems
THANK YOU
HEURISTIC OPTIMIZATION
• Cost-based optimization is expensive, even with dynamic programming.
• Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query tree by using a set of rules that typically (but not in all cases) improve execution performance:
  • Perform selection early (reduces the number of tuples).
  • Perform projection early (reduces the number of attributes).
  • Perform the most restrictive selection and join operations before other similar operations.
• Some systems use only heuristics; others combine heuristics with partial cost-based optimization.
ENUMERATION OF EQUIVALENT EXPRESSIONS

• Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression.
• Conceptually, generate all equivalent expressions by repeatedly executing the following step until no more expressions can be found:
  • for each expression found so far, use all applicable equivalence rules
  • add newly generated expressions to the set of expressions found so far
• The above approach is very expensive in space and time.
• Space requirements are reduced by sharing common subexpressions:
  • when E1 is generated from E2 by an equivalence rule, usually only the top level of the two differs; the subtrees below are the same and can be shared
  • e.g., when applying join associativity
• Time requirements are reduced by not generating all expressions.
  • More details shortly
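The conceptual enumeration loop can be sketched as a fixed-point computation. The sketch below is deliberately naive and makes several assumptions: expressions are join trees only (a relation name, or a pair `(left, right)`), and just two equivalence rules are applied, join commutativity and join associativity. It does no subexpression sharing or pruning, which is exactly why the approach is expensive.

```python
def rewrites(expr):
    """Yield every expression obtained by applying one rule at one position."""
    if isinstance(expr, str):  # a base relation: nothing to rewrite
        return
    left, right = expr
    yield (right, left)                 # commutativity: L ⋈ R = R ⋈ L
    if isinstance(left, tuple):         # associativity: (a ⋈ b) ⋈ c = a ⋈ (b ⋈ c)
        a, b = left
        yield (a, (b, right))
    if isinstance(right, tuple):        # associativity, other direction
        b, c = right
        yield ((left, b), c)
    for new_left in rewrites(left):     # apply rules inside the left subtree
        yield (new_left, right)
    for new_right in rewrites(right):   # apply rules inside the right subtree
        yield (left, new_right)


def equivalent_expressions(expr):
    """Repeat the rewrite step until no new expressions are found."""
    found = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in found:
                found.add(candidate)
                frontier.append(candidate)
    return found


all_orders = equivalent_expressions((("r1", "r2"), "r3"))
print(len(all_orders))  # 12
```

For three relations the loop finds all 12 join orders; the set grows very quickly as relations are added.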
JOIN OPERATION - EXAMPLE

Running example: $depositor \bowtie customer$
Catalog information for join examples:
• $n_{customer} = 10000$.
• $f_{customer} = 25$, which implies that $b_{customer} = 10000/25 = 400$.
• $n_{depositor} = 5000$.
• $f_{depositor} = 50$, which implies that $b_{depositor} = 5000/50 = 100$.
• $V(customer\_name, depositor) = 2500$, which implies that, on average, each customer has two accounts.
  • Also assume that customer_name in depositor is a foreign key on customer.
• $V(customer\_name, customer) = 10000$ (primary key!)
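As a hedged illustration of how these catalog numbers might be used, the sketch below estimates the size of depositor ⋈ customer. The estimation rule (take the lower of n_r·n_s/V(A, s) and n_r·n_s/V(A, r)) is a common textbook heuristic that the slide does not state, and the function name `estimate_join_size` is made up for this example.

```python
def estimate_join_size(n_r: int, n_s: int, v_a_r: int, v_a_s: int) -> int:
    """Estimate |r ⋈ s| on a common attribute A.

    Takes the lower of the two estimates n_r*n_s/V(A, s) and
    n_r*n_s/V(A, r), where V(A, x) is the number of distinct A values in x.
    """
    return min(n_r * n_s // v_a_s, n_r * n_s // v_a_r)


# Catalog values from the running example.
n_depositor, n_customer = 5_000, 10_000
v_depositor, v_customer = 2_500, 10_000

estimate = estimate_join_size(n_depositor, n_customer, v_depositor, v_customer)
print(estimate)  # 5000
```

The result agrees with the foreign-key assumption: each depositor tuple matches exactly one customer tuple, so the join has as many tuples as depositor.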
JOIN ORDERING - EXAMPLE

• Consider the expression
  $\Pi_{customer\_name}((\sigma_{branch\_city = \text{"Brooklyn"}}(branch)) \bowtie (account \bowtie depositor))$
• We could compute $account \bowtie depositor$ first and join the result with $\sigma_{branch\_city = \text{"Brooklyn"}}(branch)$, but $account \bowtie depositor$ is likely to be a large relation.
• Only a small fraction of the bank's customers are likely to have accounts in branches located in Brooklyn, so it is better to compute $\sigma_{branch\_city = \text{"Brooklyn"}}(branch) \bowtie account$ first.
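A toy illustration of the two orderings, using plain Python lists in place of relations. The rows are invented for this sketch, and only the attributes needed for the joins are kept. Both plans return the same answer, but they materialize intermediate results of different sizes.

```python
# Hypothetical sample data: (branch_name, branch_city), (account_number,
# branch_name) and (customer_name, account_number).
branch = [("Downtown", "Brooklyn"), ("Perryridge", "Horseneck"), ("Mianus", "Horseneck")]
account = [("A-101", "Downtown"), ("A-102", "Perryridge"),
           ("A-201", "Mianus"), ("A-215", "Mianus")]
depositor = [("Johnson", "A-101"), ("Hayes", "A-102"),
             ("Smith", "A-201"), ("Smith", "A-215")]

# The selection on branch_city, shared by both plans.
brooklyn = [name for (name, city) in branch if city == "Brooklyn"]

# Plan 1: compute account ⋈ depositor first, then join with the selection.
acc_dep = [(cust, acc_no, br)
           for (acc_no, br) in account
           for (cust, d_acc) in depositor
           if acc_no == d_acc]
plan1 = {cust for (cust, _, br) in acc_dep if br in brooklyn}

# Plan 2: join the selection with account first, then with depositor.
brooklyn_accounts = [acc_no for (acc_no, br) in account if br in brooklyn]
plan2 = {cust for (cust, d_acc) in depositor if d_acc in brooklyn_accounts}

print(plan1 == plan2)          # True
print(len(acc_dep))            # 4 intermediate tuples in plan 1
print(len(brooklyn_accounts))  # 1 intermediate tuple in plan 2
```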

Difference in cost - example
