BY NAJIYA P M No:100919002
OUTLINE
• INTRODUCTION
• QUERY PROCESSOR
• QUERY OPTIMIZATION
• QUERY OPTIMIZATION: AS A SEARCH PROBLEM
• TRANSFORMATIONS OF RELATIONAL ALGEBRA
• STATISTICS OF EXPRESSION RESULTS
• COST BASED OPTIMIZATION
• CONCLUSION
• REFERENCES
16/1/2012
INTRODUCTION
• Query processing and optimization is a fundamental, if not
their cost, i.e., the amount of time that they need to run.
• What is the plan that needs the least amount of time?
• The cost difference between two alternatives can be enormous.
QUERY PROCESSOR
• Parsing and translation
  First step: convert a query into a form usable by the query processing engine.
• Optimization
  The query processor applies rules to the internal data structures of the query to transform these structures into equivalent, but more efficient, representations.
• Evaluation
  The best evaluation plan candidate generated by the optimization engine is selected and then executed.
QUERY PROCESSOR(Cont)
Query - Example
Query: find the names of all customers who have an account at any branch located in Brooklyn
• Branch-schema = (branch-name, branch-city, assets)
• Account-schema = (account-number, branch-name, balance)
• Depositor-schema = (customer-name, account-number)
Example
Evaluation plans
QUERY OPTIMIZATION
• Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies usually possible for processing a given query, especially if the query is complex.
• The function of a DBMS query optimization engine is to find an evaluation plan that reduces the overall execution cost of a query.
Aspects of optimization
• One aspect: at the relational algebra level, the system attempts to find an expression that is equivalent to the given expression, but more efficient to execute.
• Another aspect: selecting a detailed strategy for processing the query.
• Two relational algebra expressions are said to be equivalent if on every legal database instance the two expressions generate the same set of tuples
• In SQL, inputs and outputs are multisets of tuples
  Two expressions in the multiset version of the relational algebra are said to be equivalent if on every legal database instance the two expressions generate the same multiset of tuples
• An equivalence rule says that expressions of two forms are equivalent
  Can replace an expression of the first form by the second, or vice versa
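To make the idea of an equivalence rule concrete, here is a minimal Python sketch that checks, on one toy database instance, that selecting before a join yields the same tuples as selecting after it. The relation contents and the helper names `select`, `natural_join` and `as_set` are made up for illustration. Agreement on a single instance only illustrates the rule; equivalence requires agreement on every legal instance.

```python
def select(rel, pred):
    """Selection: keep the tuples (dicts) that satisfy pred."""
    return [t for t in rel if pred(t)]

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for a in r:
        for b in s:
            shared = set(a) & set(b)
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out

def as_set(rel):
    """Turn a list of dict tuples into a set so two results can be compared."""
    return {frozenset(t.items()) for t in rel}

# Toy instance (hypothetical data).
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Uptown", "branch_city": "Manhattan"},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown"},
    {"account_number": "A-215", "branch_name": "Uptown"},
    {"account_number": "A-305", "branch_name": "Downtown"},
]

def in_brooklyn(t):
    return t["branch_city"] == "Brooklyn"

# First form: join, then select. Second form: select, then join.
late = select(natural_join(branch, account), in_brooklyn)
early = natural_join(select(branch, in_brooklyn), account)
equivalent = as_set(late) == as_set(early)
```

Both forms return the two Downtown accounts here, so an optimizer is free to pick whichever form it estimates to be cheaper.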
Catalog information
• $n_r$: number of tuples in a relation $r$.
• $b_r$: number of blocks containing tuples of $r$.
• $l_r$: size of a tuple of $r$.
• $f_r$: blocking factor of $r$, i.e., the number of tuples of $r$ that fit into one block.
• $V(A, r)$: number of distinct values that appear in $r$ for attribute $A$; same as the size of $\Pi_A(r)$.
• If tuples of $r$ are stored together physically in a file, then:
  $b_r = \lceil n_r / f_r \rceil$
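The catalog quantities can be tried out with a few lines of Python. This is only a sketch: the function names, the 4096-byte block size and the sample numbers are assumptions chosen for illustration, and tuples are assumed to be fixed-size and not to span blocks.

```python
def blocking_factor(block_size, tuple_size):
    """f_r: how many whole tuples of size l_r fit into one block."""
    return block_size // tuple_size

def blocks_needed(n_r, f_r):
    """b_r = ceil(n_r / f_r), computed with integer arithmetic."""
    return (n_r + f_r - 1) // f_r

def distinct_values(rel, attr):
    """V(A, r): number of distinct values of attribute A in relation r."""
    return len({t[attr] for t in rel})

# Hypothetical numbers: 4096-byte blocks, 100-byte tuples, 10,000 tuples.
f_r = blocking_factor(block_size=4096, tuple_size=100)  # 40 tuples per block
b_r = blocks_needed(n_r=10_000, f_r=f_r)                # 250 blocks

# Hypothetical relation instance for V(A, r).
account = [
    {"account_number": "A-101", "branch_name": "Downtown"},
    {"account_number": "A-215", "branch_name": "Uptown"},
    {"account_number": "A-305", "branch_name": "Downtown"},
]
v_branch = distinct_values(account, "branch_name")      # 2
```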
equivalence rules.
2. Annotating resultant expressions to get alternative query plans
3. Choosing the cheapest plan based on estimated cost
• The overall process is called cost based optimization.
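The generate, estimate, choose loop can be sketched in Python as follows. Everything numeric here is a hypothetical assumption: the tuple counts, the join selectivities, and the cost measure (the sum of estimated intermediate result sizes). The sketch also simplifies the process: it only enumerates left-deep join orders instead of applying general equivalence rules, and it attaches a cost estimate to each order rather than choosing an algorithm for each operator.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical statistics: number of tuples per relation.
n = {"branch": 50, "account": 10_000, "depositor": 12_000}

# Hypothetical join selectivities. Pairs that are not listed share no
# attribute, so joining them directly is a cross product (selectivity 1).
sel = {
    frozenset(("branch", "account")): Fraction(1, 50),
    frozenset(("account", "depositor")): Fraction(1, 10_000),
}

def estimated_cost(order):
    """Sum of the estimated intermediate result sizes of a left-deep join order."""
    joined = [order[0]]
    size = Fraction(n[order[0]])
    cost = Fraction(0)
    for rel in order[1:]:
        factor = Fraction(1)
        for prev in joined:
            factor *= sel.get(frozenset((prev, rel)), Fraction(1))
        size = size * n[rel] * factor
        cost += size
        joined.append(rel)
    return cost

candidates = list(permutations(n))                         # generate alternatives
costed = {order: estimated_cost(order) for order in candidates}  # estimate each
best = min(costed, key=costed.get)                         # choose the cheapest
```

With these made-up numbers, the orders that start with a cross product of branch and depositor are estimated to be far more expensive than the others, which is the kind of difference the optimizer is looking for.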
actually evaluating the plan.
• Optimizers make use of statistical information about the relations
• Disk access dominates the cost of processing a query
queries on large datasets (typical queries have small n, generally < 10)
[1]
CONCLUSION
• Cost based optimization is expensive
• Other optimization techniques like heuristic based optimization can be used
• Cost based optimization is only as good as its cost estimates
• To a large extent, the success of a DBMS lies in the quality, functionality and sophistication of its query optimizer, since that determines much of the system's performance
REFERENCES
[1] Michael L. Rupley, Jr., Introduction to Query Processing and Optimization
[2] Surajit Chaudhuri, An Overview of Query Optimization in Relational Systems
[3] Henry F. Korth, Abraham Silberschatz and S. Sudarshan, Database System Concepts
[4] Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems
THANK YOU
HEURISTIC OPTIMIZATION
• Cost-based optimization is expensive, even with dynamic programming.
• Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:
  ◦ Perform selection early (reduces the number of tuples)
  ◦ Perform projection early (reduces the number of attributes)
  ◦ Perform most restrictive selection and join operations before other similar operations.
• Some systems use only heuristics, others combine heuristics with partial cost-based optimization.
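The "perform selection early" rule can be sketched as a rewrite on a query tree. The tuple encoding of tree nodes and the function names below are hypothetical, chosen only for this illustration; the sketch handles equality selections over natural joins and nothing else.

```python
# Query tree nodes (hypothetical encoding):
#   ("rel", name, attrs)             base relation with its attribute set
#   ("join", left, right)            natural join
#   ("select", attr, value, child)   selection attr = value

def attrs(node):
    """Attributes produced by a subtree."""
    kind = node[0]
    if kind == "rel":
        return node[2]
    if kind == "join":
        return attrs(node[1]) | attrs(node[2])
    return attrs(node[3])  # a selection keeps its child's attributes

def push_selection(node):
    """Heuristic rewrite: move each selection as close to the leaves as possible."""
    kind = node[0]
    if kind == "rel":
        return node
    if kind == "join":
        return ("join", push_selection(node[1]), push_selection(node[2]))
    _, attr, value, child = node
    child = push_selection(child)
    if child[0] == "join":
        _, left, right = child
        if attr in attrs(left):
            return ("join", push_selection(("select", attr, value, left)), right)
        if attr in attrs(right):
            return ("join", left, push_selection(("select", attr, value, right)))
    return ("select", attr, value, child)

branch = ("rel", "branch", frozenset({"branch_name", "branch_city", "assets"}))
account = ("rel", "account", frozenset({"account_number", "branch_name", "balance"}))
depositor = ("rel", "depositor", frozenset({"customer_name", "account_number"}))

# Selection applied on top of the three-way join ...
tree = ("select", "branch_city", "Brooklyn",
        ("join", branch, ("join", account, depositor)))
# ... is moved down so that it filters branch before any join runs.
rewritten = push_selection(tree)
```

No statistics are consulted here, which is what distinguishes a heuristic rewrite from a cost-based choice.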
expressions equivalent to the given expression
• Conceptually, generate all equivalent expressions by repeatedly executing the following step until no more expressions can be found:
  ◦ for each expression found so far, use all applicable equivalence rules
  ◦ add newly generated expressions to the set of expressions found so far
• The above approach is very expensive in space and time
• Space requirements reduced by sharing common subexpressions:
  ◦ when E1 is generated from E2 by an equivalence rule, usually only the top level of the two are different, subtrees below are the same and can be shared
  ◦ E.g. when applying join associativity
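The repeat-until-nothing-new procedure can be sketched for join expressions in Python. Two equivalence rules are assumed here, join commutativity and join associativity, and the nested-pair encoding and function names are hypothetical. The sketch stores every expression as a whole tree and does not attempt the subexpression sharing mentioned above.

```python
def rewrites(expr):
    """Yield every expression obtained by applying one rule once, anywhere in expr.

    A relation is a string; a join is a pair (left, right).
    """
    if isinstance(expr, str):
        return
    left, right = expr
    yield (right, left)                  # commutativity: L join R = R join L
    if isinstance(left, tuple):          # associativity: (A join B) join C = A join (B join C)
        a, b = left
        yield (a, (b, right))
    if isinstance(right, tuple):         # associativity in the other direction
        b, c = right
        yield ((left, b), c)
    for new_left in rewrites(left):      # apply the rules inside subexpressions
        yield (new_left, right)
    for new_right in rewrites(right):
        yield (left, new_right)

def all_equivalent(expr):
    """Apply all applicable rules to every expression found so far, until no new one appears."""
    found = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for new in rewrites(current):
            if new not in found:
                found.add(new)
                frontier.append(new)
    return found

plans = all_equivalent((("branch", "account"), "depositor"))
```

Three relations already give 12 distinct join expressions, and the count grows quickly with the number of relations, which is why the exhaustive approach is expensive in space and time.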
$\Pi_{customer\_name}((\sigma_{branch\_city = \text{"Brooklyn"}}(branch)) \bowtie (account \bowtie depositor))$
• Could compute $account \bowtie depositor$ first, and join the result with $\sigma_{branch\_city = \text{"Brooklyn"}}(branch)$, but $account \bowtie depositor$ is likely to be a large relation.
• Only a small fraction of the bank's customers are likely to have accounts in branches located in Brooklyn
  ◦ it is better to compute
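The effect of join order on intermediate result size can be seen on a toy instance. In the Python sketch below, the data and the helper `natural_join` are made up for illustration, and the second ordering (joining the filtered branch relation with account first) is an assumption about which alternative is cheaper, not something stated above.

```python
def natural_join(r, s):
    """Natural join of two lists of dict tuples on their shared attributes."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in set(a) & set(b))]

# Hypothetical data: one Brooklyn branch out of three.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Uptown", "branch_city": "Manhattan"},
    {"branch_name": "Midtown", "branch_city": "Manhattan"},
]
account = [
    {"account_number": "A-1", "branch_name": "Downtown"},
    {"account_number": "A-2", "branch_name": "Uptown"},
    {"account_number": "A-3", "branch_name": "Uptown"},
    {"account_number": "A-4", "branch_name": "Midtown"},
    {"account_number": "A-5", "branch_name": "Midtown"},
    {"account_number": "A-6", "branch_name": "Midtown"},
]
depositor = [{"customer_name": name, "account_number": acct}
             for name, acct in [("Ann", "A-1"), ("Bob", "A-2"), ("Cho", "A-3"),
                                ("Dee", "A-4"), ("Eli", "A-5"), ("Fay", "A-6")]]

brooklyn = [t for t in branch if t["branch_city"] == "Brooklyn"]

# Order 1: account join depositor first, then join with the Brooklyn branches.
inter1 = natural_join(account, depositor)
names1 = {t["customer_name"] for t in natural_join(brooklyn, inter1)}

# Order 2: Brooklyn branches join account first, then join with depositor.
inter2 = natural_join(brooklyn, account)
names2 = {t["customer_name"] for t in natural_join(inter2, depositor)}

sizes = (len(inter1), len(inter2))  # intermediate result sizes: (6, 1)
```

Both orders return the same customer names, but the first builds an intermediate relation as large as account while the second keeps only the Brooklyn accounts.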