Throwaway 2

Martin Berg (martin.berg@oracle.
com)
Server Technology
System Management & Performance
Oracle Denmark APS
July 2000
Query tuning by eliminating throwaway

This paper deals with optimizing non optimal performing queries.
Abstract
During time there has been many descriptions on how to tune queries, but most descriptions
ends up being a list of recommendations (best practices) or examples on experienced
problems (worst practices).
The result of this is that tuning to a large degree is left over to guesswork and the experience
of the SQL tuner.
Several experienced SQL tuners today are using information about the row sources used in a
query for tuning purposes, but the method(s) has sofar not been formalized nor described and
has been up to the ingenuity of the SQL tuner.
This paper attempts to deliver a structured method for query tuning using tools already
available with the Oracle RDBMS.
The method described here was first introduced in a very limited way in the Oracle 8
Statement Tuning courseware from 1998.
Intended audience is all developers and DBAs with the need to tune queries.
The required level is above novice level as many concepts from basic SQL execution are
assumed as known.
Why is a query not performing optimal?
There can be more reasons why a query is not performing optimal.
For the purpose of this paper these reasons will be categorized in 4 main areas
Doing too many IO operations
Using expensive operations on rows (like sorting)
Not utilizing the available resources (for example parallelization)
Processing too many rows
Experience has shown that a large part of non optimal performing queries can credit the lack
of performance to the fourth reason - processing too many rows.
The number of rows processed within a query is dependent on output generated, the
operations used and how many rows are retrieved, but then thrown away, because they were
not used in the output of the query.
Most badly performing queries usually exhibits a high amount of rows processed and then
thrown away because they were not needed in the output.
This paper focuses on how to identify these processed, but thrown away rows and how to
use this identification to tune queries.
This idea of rows thrown away will hereafter be called throwaway.
July 2000
Page 1
Structure of paper
The rest of the paper will use following outline:
1. Basic Method for Query Tuning
2. Initial Examples
3. Defining Throwaway
4. Oracle Version Dependencies
5. Detailed Description of Throwaway in Row Sources
6. Secondary Throwaway
7. Tuning SQL by Eliminating Throwaway, the Method
8. Examples on Applying the Method
9. Using SQL Trace and tkprof
10. Tracing Issues
11. Miscellanous Issues
If you already are familiar with using the quantification of row sources found in the tkprof
output you can skip directly to section 3 and then continue with sections 6, 7 and 8 and use
sections 4, 5, 9, 10 and 11 as reference.
July 2000
Page 2
1. A Basic Method for Performing Query Tuning

Before describing the concept of throwaway it would be appropriate to describe a generic
method for performing query tuning.
Basically tuning of queries can be done by following high level method:
1. Set goals
2. Identify queries to be tuned
3. For every query identified:
4. Analyze query
5. Suggest changes to improve performance
6. Implement changes
7. Test improvement
8. Repeat until goal reached (or time spend)
Traditionally the hardest part of this is to get from step 4 (analyze) to step 5 (suggest
changes) and has - as described in the introduction - been up to the guesswork and/or
experience of the statement tuner.
The concept of tuning by eliminating throwaway is meant to directly address this part of the
tuning method by quantifying the work done in each of the execution steps a query is build
upon.
July 2000
Page 3
2. Initial Examples
Lets demonstrate the concept of throwaway by looking at two simple examples:
Query 1:
Select CLASS_ID
From CLASSES
Where TYPE = 'S/EC'
Output from tkprof:
Rows
---2386
2387
Execution Plan
---------------------------------------------TABLE ACCESS (BY INDEX ROWID) OF 'CLASSES'
INDEX (RANGE SCAN) OF 'CLASSES_TYPE' (NON-UNIQUE)
Index: CLASSES_TYPE on CLASSES(TYPE)

This trace shows that 2387 rows was retrieved from the index row source (range scan on
CLASSES_TYPE) and that all these rows (except for one caused by the index range scan)
resulted in a row returned from the table access.
Query 2:
Select CLASS_ID
From CLASSES
Where TYPE = 'S/EC'
and DAYS = 15
Output from tkprof:
Rows
---4
2387
Execution Plan
----------------------------------------------TABLE ACCESS (BY INDEX ROWID) OF 'CLASSES'
Same index as Query 1

This trace shows again that 2387 rows was retrieved from the index row source (range scan on
CLASSES_TYPE).
The difference is that now after retrieving the 2387 rows from the index these are used to
retrieve the same number of rows from the table (minus the extra row from the range scan) just
as before, but then most of these rows are filtered away (caused by the extra non-indexed
predicate in the select) - only leaving 4 rows.
Another way of describing this is that to execute the query some 99.8% of the rows retrieved are
not returned as output and are just thrown away after further evaluation (the extra predicate) hence the term throwaway.
It should now be intuitively obvious that the tkprof output for Query 2 shows that there is
unnecessary work done - and that it has something to do with how the predicates for CLASSES
are evaluated.
July 2000
Page 4
The experienced SQL tuner will of cause immediately spot the probable cause: the index used is
an single column index and by adding an extra column - DAYS - to the index the predicate
selectivity in the query will be matched by the selectivity in the index.
Now, a more tricky example:
Query 3:
Select *
From TABLE2
Where COL1 = 102
and COL2 = 23
(Index: I_COL1_COL2 on TABLE2(COL1, COL2))
A simple execution plan (by SQL*Plus AUTOTRACE or manual explain plan) would show:
TABLE ACCESS (BY INDEX ROWID) OF TABLE2
INDEX ACCESS (RANGE SCAN) OF I_COL1_COL2 (NON-UNIQUE)
This would lead most SQL tuners to the conclusion that this query was performing as good as
it can (all predicates can be found in the leading columns of an concatenated index).
The tkprof output is this case could show:
Rows
---71
3401
Execution Plan
--------------------------------------------------TABLE ACCESS (BY INDEX ROWID) OF TABLE2
INDEX ACCESS (RANGE SCAN) OF I_COL1_COL2 (NONUNIQUE)
According to the previous example this shows that there is work done that gets thrown away
before output.
This should only happen if not all predicates are used for the index retrieval.
Now the problem is to determine why only a part of the predicates are used for index retrieval.
A likely candidate would be to check for a implicit type conversion on the predicate
corresponding to the non-leading column in the index - COL2 (the leading column will be used otherwise the index would not have been used at all),
This example shows that by using the Rows column from the execution plan in the tkprof
output more information about the performance of a particular query can be obtained than by
just using the execution plan alone.
July 2000
Page 5
3. Defining Throwaway
3.1 Formal definition of throwaway
After the initial examples it is time to formalize a definition of the term throwaway:
There are two types of throwaway:
Primary throwaway is the amount of rows from the input to a row source that can not
be found in the output from that particular row source.
Secondary throwaway is the amount of rows generated as output from a row source
which are then thrown away in a following row source with primary throwaway.
3.2 Primary vs. secondary throwaway
To illustrate the difference between primary throwaway and secondary throwaway consider
following graphical representation of an execution plan:
#R1
Row Source 1
#R2
Row Source 2
Row Source 2 generates #R2 number of rows which are then processed in Row Source 1
resulting in a final set of rows - #R1.
Primary throwaway in Row Source 1 can be described as this row source has to process all the
input rows (#R2), but only delivers #R1 worth of output - it is the row source itself that in
processing the input rows is throwing away the ones not making it to the output.
Row Source 2 will in this case generate #R2 number of rows as output - which represents a
certain amount of work.
The secondary throwaway is then the amount of this work already done which is thrown away in
a following row source (Row Source 1 in this case).
The distinction between these two types of throwaway is very important when analysing
complex queries (for example joins).
To evaluate the total work thrown away by an operation the implied secondary throwaway
should be included in the evaluation. This will be further described in 6. Secondary
Throwaway.
July 2000
Page 6
3.3 Background - execution plans and row sources

To analyze queries for throwaway a clear understanding of execution plans, row sources and the
interpretation of the tkprof output is needed:
Every line in the execution plan (except for the very top one which just describes what type of
statement it is and informes about the optimizer approach) represents a row source.
A row source either generates rows directly by accessing a data object or accepts one or more
other row sources as input, manipulates these and then returns these as a new stream of rows.
For purpose of this paper all row sources has an input - either from a database object or from
one or more other row sources.
If the execution plan is interpreted as a tree with the row sources being leafs and branches the
leaf row sources will be row sources that accesses data and all branch row sources of the tree
will be row sources that accepts a row source as input.
The number of rows processed by a row source can be found left of the execution plan in the
tkprof output. In this context it should be noted that this definition of rows processed is not
consistent over different versions of Oracle (see 4. Oracle Version Dependencies).
The input rows is received from a row source which can be found down right in the execution
plan and the output rows are delivered up left to a receiving row source.
In above examples - Query 3 - the TABLE ACCESS (BY INDEX ROWID) row source receives
3401 rows as input and generates 71 rows output.
This can also be described as the rows starts from a leaf row source and then walks up the tree
through the branch row sources ending at the root.
July 2000
Page 7
3.4 What row sources can introduce throwaway?

It is not all row sources that can introduce throwaway - it depends on what processing the row
source is doing on the input rows.
The following table list the most common row sources and how throwaway can be identified for
each
Row source
Table access (full)
Table access (by index rowid)
Table access (cluster)
Table access (hash)
Index access
Nested Loops
Merge (join)
Merge (cartesian)
Hash join
Filter
Sort (aggregate)
Sort (unique)
Sort (group by)
Sort (join)
Sort (order by)
And-equal
Concatenation
Stop key
Connect by
Count
Minus
Intersection
Union-All
Group by
July 2000
Number of input row sources

Data access from table
1 (rowids from index)
1 (rowids from cluster index)
Data access from table
Data access from index
2
2
2
2
1 primary to be filtered and 0 or more
secondary for executing the filtering
1
1
1
1
1
2-5 (rowids from indexes)
2 or more
1
3
1
2
2
2 or more
1
Page 8
4. Oracle Version Dependencies

Before going into details with the different row sources it should be noted that there are some
version dependent issues on what numbers are reported for a row source.
4.1 Row source counts and calculation of throwaway in Oracle 7.3 and Oracle 8.0
In Oracle 7.3 and 8.0 the number reported for non-join operations is the total number of rows
processed within the row source.
For join operations the number reported basically is the number of rows processed in inner table
access by the join operation.
The result of this is that when running Query 2 from above following tkprof output would be
generated under Oracle 7.3 and Oracle 8.0:
Rows
---2386*)
2387
Execution Plan
----------------------------------------------TABLE ACCESS (BY INDEX ROWID) OF 'CLASSES'
*) In Oracle8i this was reported as 4.

For Oracle 7.3 and 8.0 this basically means that for simple row sources (one input) the
throwaway in a particular row source can be calculated as the number of rows processed by the
row source of interest minus the number of rows processed by the receiving row source.
In the case where there is no receiving row source (as the table access in above example) the
number of rows output from the statement (this information can be found elsewhere in the
tkprof output) is the number that should be subtracted.
In the case where the receiving row source reports output rows (for example a join operation)
the throwaway in the row source of interest can not always be determined - see later.
In above example 4 rows are returned from the select and therefore the throwaway in the table
access row source can be calculated as number of rows processed in table access operation
minus number of rows from statement: 2386-4 = 2382 rows (or a throwaway of 99.8% of the
rows processed).
4.2 Row source counts and calculation of throwaway in Oracle8i
In Oracle8i the number of rows reported is always the number of rows output from the row
source.
Throwaway in simple row sources in Oracle8i can then be calculated as number of rows
received from the input row source minus the number of rows from the row source of interest.
In the case where the row source of interest is accessing data (full table scans and index scans) it
becomes necessary to quantify the number of rows in the accessed data.
In the Oracle8i version of above example the throwaway from the table access row source can
be calculated as number of rows from index scan operation minus number of rows from table
access operation: (2387 - 1) - 4 = 2382 rows (or a throwaway of 99.8% of the rows processed).
The minus one from the formula comes from the range scan of the indes - see later.
July 2000
Page 9
5. Detailed Description of Simple Row Sources

In the following the most relevant simple row sources are described in further detail.
By simple is meant that the row sources being described all are row sources that either
accesses data (index or table) or accepts one row source as input.
Not all simple row sources mentioned above will be described - only the ones relevant for
statement tuning by elimination of throwaway.
For each described row source the calculation of throwaway and a probable cause and cure for
throwaway is described.
5.1 Throwaway in row source: TABLE ACCESS (FULL)
The table access (full) row source process all rows from the specified table and filters away throws away - the rows not satifying non-indexed predicates (if there are any).
For full table scans the number of rows processed in the accessed data is the number of rows in
the table multiplied by number of times the table was scanned during execution of the statement.
A full table scan should only be repeated if the full table scan can be found somewhere as a
inner row source of a nested loops join or if the full table scan is part of a filter operation.
Calculating throwaway:
Rows Execution plan
---- ----------------A TABLE ACCESS (FULL) OF TABLE
For Oracle 7.3 and 8.0 the throwaway can be calculated as:
A - #rows output
For Oracle8i the throwaway can be calculated as:
#N * Card(TABLE) - A
Where #N is the number of times the table was scanned (for example as the
inner table in a nested loops join) and Card(TABLE) is the number of rows in
TABLE.
In some cases there is reported one row more than output when using Oracle8i.
Actually this plus one can appear not on the full table scan, but later in one of
the row sources processing the rows from the full table scan.
In the real world this extra row does not matter, as it does not seem reasonable
to elaborate on one single row more or less.
Cure:
Throwaway in a table access (full) operation can only be caused by predicates that does
not use an index and the only solution is to use an index for all the predicates that
results in (a significant part of) the throwaway.
Limitation:
Not all predicates can be indexed.
July 2000
Page 10
5.2 Throwaway in row source: TABLE ACCESS (BY INDEX ROWID)

This is one of the most commonly seen row sources: it accepts rowids comming from an index
scan row source, retrieves the corresponding rows from the table and finally throws away the
rows not satifying non-indexed predicates (if there are any). Thereby the index scan is also
subjected to secondary throwaway.
Calculating (primary) throwaway, scenario 1 - Index Range Scan:
Rows Execution plan
---- ----------------A TABLE ACCESS (BY INDEX ROWID) OF TABLE
B
INDEX (RANGE SCAN) OF
A - #rows output
B - #N - A , where #N is the number of lookups in the index.
The explanation for above #N is that Oracle 7.3, 8.0 and 8i reports one row
extra from an index range scan per lookup - which in above case is 1 lookup.
In the case where the index is used for several lookups - for example when
being used for accessing the inner table in a nested loops join - #N will be the
number of times the index is used to find rows in the table - which for the
nested loops join means the number of rows gotten from the outer table (see 5.6
Throwaway in row sources: NESTED LOOPS join operation).
Calculating (primary) throwaway, scenario 2 - Index Unique Scan:
Rows Execution plan
B
INDEX (UNIQUE SCAN) OF
A - #rows output
(B - #N) - A , where #N is the number of lookups in the index
The explanation for above #N is that Oracle8i (but not Oracle 7.3 and 8.0)
also reports one row extra from an index unique scan per lookup - just as for
index range scans.
Cure:
Like the table access (full) row source throwaway can only be caused by non-indexed
predicates - and the cure is to add the non-indexed predicates to the index already in
use.
Limitation:
Not all predicates can be indexed.
July 2000
Page 11
5.3 Throwaway row source: INDEX SCAN

Currently only B*tree indexes are dealt with in this paper.
An index scan row source can either be a UNIQUE, RANGE, FULL or FAST FULL scan of
an B*tree index (a reverse key index can not be FULL scanned).
The UNIQUE scan requires that all columns in an unique index is used for the index lookup.
The FULL scan is used when all entries in the index is retrieved with the purpose of avoiding
sorting the table rows after retrieval.
There is a special version of the FULL scan - the FULL SCAN(MIN/MAX) introduced in
Oracle8i for finding resp. the minimum or the maximum value from an index (aka: go directly
to one end of the index).
All other index uses will be RANGE scans (and the FULL scan can be seen as a special case of
the RANGE scan and was not reported as such before Oracle 7.3).
Throwaway in index scan row sources can in normal cases only occur when doing index range
scans on concatenated indexes.
Sofar following three possibilities for throwaway have been discovered:
1. When a column between the leading column and the last used column from the index is not
being addressed by a predicate.
Example:
Select *
From CLASSES
Where TYPE = 'S/EC'
and DAYS = 15
(Index on CLASSES(TYPE, STATUS, DAYS))
The first predicate corresponding to the first column in the index is used for the lookup in the
index.
Hereafter all the rows satifying this predicate are evaluated using column values retrieved from
the index to satify the last predicate and the STATUS column is not used at all.
It should be noted that all the predicates are evaluated through the index - but only the first
predicate can be used to navigate down the index tree and the last predicate has to be
processed by retrieving the column value from the index, evaluate the condition and then
finally filter away the rows not satifying the condition.
This last filtering of retrieved rows from the index is throwaway.
2. When an indexed column is being processed by a function call.
Example:
Select *
From CLASSES
Where TYPE = 'S/EC'
and STATUS = 'BOOK'
and to_char(DAYS) = '5'
July 2000
Page 12
(Index on CLASSES(TYPE, STATUS, DAYS))

Here the two first predicates corresponding to the two first columns in the index are used for the
lookup in the index.
Hereafter all the rows satifying these predicates are evaluated to satify the last predicate.
It should be noted that all the predicates are evaluated through the index - but the difference is
that the two first predicates can be used to navigate down the index tree and the last
predicate has to be processed by retrieving the column value from the index, execute the
function (to_char) on the retrieved value, evaluate the condition and then finally filter away
the rows not satifying the condition.
This last filtering of retrieved rows from the index is throwaway.
3. When a column in the index is subjected to a range scan and there is a column placed later
in the index which also is used for evaluating predicates.
Example:
Select *
From REGISTRATIONS
Where REG_DATE between sysdate-10 and sysdate
and STATUS = 'HOLD'
(Index on REGISTRATIONS(REG_DATE, STATUS))
In this case all index entries satisfying the first predicate will be scanned and those not satifying
the second (equality) predicate will be filtered away, thus resulting in throwaway.
Calculating throwaway, Index Range Scan:
Rows Execution plan
B
INDEX (RANGE SCAN) OF
(B - #N) - A, where #N is the number of lookups in the index.
The explanation for above #N is the same as for table access (by index rowid).
For Oracle8i the throwaway can NOT be calculated.
In Oracle8i the rows reported for an index scan row source is (just as for a table
access row source) the rows returned (and not the rows processed as in Oracle 7.3 and
8.0) and it is impossible (or very difficult) to determine the number of rows retrieved
internally in the index as would be required to determine the throwaway.
Cure:
In the case where not all leading columns in the index is used for the index lookup (case
1 above) the solution could be to rearrange the sequence of the columns in the index to
match the predicates used.
July 2000
Page 13
In the case with a function disabling the full usage of predicates for index lookup (case
2 above) the only solution is to change the statement so that the function is not
performed on the indexed column side of the predicate.
In the case with a range scan on a leading column (case 3 above) the solution could be
to rearrange the sequence of the indexed columns to move the column being range
scanned to be the last column in the index.
Limitations:
Rearranging the column sequence in concatenated indexes migth result in unwanted
behaviour for other statements previosly using the same index.
Not all function calls can be avoided.
With range scans on more columns in the index it can prove impossible to find a
sequence that can not give any throwaway.
July 2000
Page 14
5.4 Throwaway in row source: SORT (UNIQUE)

There are several sort operations in Oracle - sort (order by), sort (group by), sort
(aggregate), sort (unique) etc. - but only the sort (unique) can introduce throwaway.
The sort (unique) row source is used to extract unique rows from the input row source - for
example when using the distinct operator.
The sort (aggregate) and sort (group by) operations both returns fewer rows than the input,
but as all input rows is included in the final then throwaway is not introduced in these
operations.
Calculating primary throwaway:
Rows Execution plan
---- ----------------A SORT (UNIQUE)
B
TABLE ACCESS ()OF
A - #rows output
B-A
Cure:
Nothing can be done by changing the statement or by adding (or removing) indexes
except if a predicate limiting the number of duplicate rows can be squeezed into the
query to be evaluated before the sort(unique) operation.
Some times it is a possibility is to change the data model or use the existing data model
in another manner to avoid the possible duplicate rows and thereby avoiding the
sort(unique) operation..
Limitations
The cures described might - if possible at all - be very complex to implement.
July 2000
Page 15
5.5 Throwaway in join operations, generic

Join operation row sources always takes two row sources as input, throws away the rows from
each input row source that can not be matched with a row from the other input row source and
finally returns the matched sets of rows.
This means that both input sources can be subjected to throwaway.
It should be noted that it is dependent on the join operation whether both input row sources or
only one of the input row sources can be subjected to secondary throwaway - this will be
described under each join operation.
It is not possible in all cases to quantify the amount of throwaway in join operations.
Therefore there must be adopted some pragmatic rules covering the cases where there is no
exact quantification possible.
If the output rows from the join operation is less than the largest input row source then there
without doubt are rows being thrown away, but it can not necessarily be determined from what
input the rows are thrown away or said in another way:
Throwaway in join operations can be identified by the output rows being less than an input row
source.
It is possible though to have throwaway from an input row source even if it is smaller than the
output rows.
July 2000
Page 16
5.6 Throwaway in row source: NESTED LOOPS join operation

For the nested loops join operation it is necessary to distinguish between the outer (the driving
row source) and the inner row source.
Non-join predicates on the outer row source (in the case where the outer row source is a table
access) will be applied before the join operation and throwaway caused by this is not considered
as throwaway from the nested loops join operation, but rather as throwaway from the row
source itself.
If the outer row source is not a table access, but instead the result of another join operation then
the non-join predicates has already been evaluated and can not cause this type of throwaway
from the outer row source.
There is subtle case where non-join predicates are applied on the outer row source just before
the join operation even if this row source is the result of another join operation - and this is
when the outer row source is evaluated in a non-merged view.
Throwaway from the outer row source during join happens when a row is retrieved from the
outer row source (after applying non-join predicates) and then a matching row from the inner
row source can not be found.
The amount of throwaway from the outer row source can therefore only be determined if the
number of lookups in the inner row source that did not result in any rows can be determined.
As the rows from the outer row source already has been retrieved, throwaway from the outer
row source will imply secondary throwaway on the outer row source.
A precise calculation of the number of thown away rows from the outer row source depends on
the access path to the inner row source.
To simplify the calculation of throwaway in nested loops join operations use following list of
scenarios that all deals with both row sources being table accesses (the Card(table name)
represents the number of rows in that particular table):
Calculating throwaway, scenario 1 (inner table access via unique scan on index):
Rows Execution plan
---- ----------------A NESTED LOOPS
B
TABLE ACCESS (FULL) OF TABLE_O
C
TABLE ACCESS (BY INDEX ROWID) OF TABLE_I
D
INDEX ACCESS (UNIQUE SCAN) OF (UNIQUE)
Throwaway from outer table (before join):
Throwaway from outer table (during join):
Throwaway from inner table (during join):
B-D
D - A (Outer joins: always 0)
C-A
Note that the output from a nested loops join operation is the number of row
retrieved from the inner table and not the number of rows from the join
operation. For normal (inner) joins these two numbers will be identical.
In case of an outer join these two row count will not be identical and to
determine the number of rows output from the nested loops operation use the
number of rows from the outer table minus throwaway before join:
B - (B - D) = D (and remember that this is for an index unique scan on the inner
table).
July 2000
Page 17

Card(TABLE_O) - B
B-A
D-B-C
Calculating throwaway, scenario 2 (inner table access via non-unique scan on index):
Rows Execution plan
---- ----------------A NESTED LOOPS
B
C
TABLE ACCESS (BY INDEX ROWID) OF TABLE_I
D
INDEX ACCESS (RANGE SCAN) OF (NON-UNIQUE)
B - (D - C)
If B > A then:
At least B - A
Otherwise indeterminable
Always 0 for outer joins
C-A

Card(TABLE_O) - B
If B > A then:
At least B - A
D-B-C
Calculating throwaway, scenario 3 (inner table access using full table scan):
Rows Execution plan
---- ----------------A NESTED LOOPS
B
C
TABLE ACCESS (FULL) OF TABLE_I
July 2000
Page 18
B - (C / Card(TABLE_I) )
If B > A then:
At least B - A
C-A

Card(TABLE_O) - B
If B > A then:
At least B - A
B * Card(TABLE_I) - A
In most pratical cases of this particular scenario only the throwaway from the inner
table is important.
Above formulas takes following peculiarity in Oracle8i into account:
All index scans (both unique and range scans) reports one row extra per lookup (in Oracle
7.3 and 8.0 this only happens for range scans).
Calculating throwaway, generic scenario:
Not all nested loops falls into above categories (especially when dealing with joins with
one or both row sources not being table accesses) and it is necessary to be able to
determine throwaway from a generic nested loops operations:
Rows
---A
B
C
Execution plan
----------------NESTED LOOPS
Outer Row Source
Inner Row Source
For all versions of Oracle following pragmatic rule can be formulated:

If A < B then the throwaway from the outer row source is AT LEAST:
B-A
If A < C then the throwaway from the inner row source is AT LEAST:
C-A
That C > A should only happen for Oracle 7.3 or 8.0 as Oracle8i reports the
retrieved rows for both the inner row source and the nested loops - and this
should not be different.
Cure:
If the join operation is part of a larger join with several join operations, please refer to
the section 6. Secondary Throwaway.
Throwaway from the outer table before the nested loops join operation can be cured
using the same principles as for curing throwaway from a table access (indexed or full
table scan).
The cure for throwaway from the inner table is the same as for throwaway from a table
access (indexed or full table scan).
If the join operation is throwing rows from the outer table (and thereby subjecting this
table access to secondary throwaway) this can only be counteracted by reversing the
join order.
July 2000
Page 19
Whether this reversal will indeed reduce the number of rows processed depends on the
predicates on both the inner and the outer table..
Limitations:
Change of join order can not always remove (join based) throwaway from the outer
table.
Removal of throwaway from the inner table and the the outer table (before the join) is
subjected to same limitations as throwaway from a table access.
July 2000
Page 20
5.7 Throwaway in row source: SORT/MERGE join operation

In a sort merge join the non-join predicates are applied before the sort operations and will not
cause throwaway from the join operation itself.
Throwaway from a sort merge join can therefore only be introduced by rows from the first sort
operation not matching rows from the second sort operation (resulting in throwaway from one or
both inputs).
As both row sources are retrieved and sorted before the merge operation any throwaway in the
join operation (actually the merge operation) will imply secondary throwaway in one or both of
the row sources.
Rows Execution plan
---- ----------------A
MERGE JOIN
B
SORT (JOIN)
C
D
SORT (JOIN)
E
C-B
If C > A then:
At least C - A
E-D


Cure:
Card(TABLE_O) - C
If C > A then:
At least C - A
E-D
Only throwaway from one table can be cured and this is done by changing the join
operation to a nested loops operation with the row source being subjected to throwaway
as the inner table and at the same time ensure that all join and non-join predicates on
this table can be evaluated using an index.
Limitations:
Changing a sort/merge join operation to a nested loops operation with fully indexed
predicates on the inner table is not always possible - for example not all predicates are
indexable.
Furthermore a change from a sort/merge join to a nested loops operation with indexed
inner table access will change the I/O pattern from multiblock reads to single block
access (by going from full table scans to indexed lookups) which easily can result in
more time spend waiting for I/O.
July 2000
Page 21
5.8 Throwaway in row source: HASH JOIN operation

The hash join will act as a sort merge join regarding throwaway, but as fewer row sources are
involved there is less possibility to determine throwaway and to distinguish between throwaway
introduced by non-join predicates and the join operation itself.
As for the sort merge join operation both input row sources are retrieved before being matched
in the join operation, meaning that both row sources can be subjected to secondary throwaway
in the case where the hash join operation discards rows not matching.
Rows Execution plan
---- ----------------A
HASH JOIN
B
C
For Oracle 7.3 and 8.0 tests have shown that the number of rows from the hash join
operation (which for nested loops and sort merge join operations are equal to the
number of rows output) is somewhat unprecisely defined.
All tests has resulted in numbers that are larger than the rows output and depends on the
size of the row sources and can not be used for a precise determination of throwaway
from the join operation.
Card(TABLE_O) - B
If B > A then:
At least B - A
C-A
Cure and limitations: see 5.7 Throwaway in row source: SORT/MERGE join operation
July 2000
Page 22
5.9 Throwaway in row source: FILTER

A filter row source filters rows from being output.
The filter row source will not be used for simple predicates on tables but is used for more
complex predicates - for example having clauses and subqueries.
The filter row source always has the row source to be filtered as input (for later reference this
row source will be called the primary input row source).
Example:
select *
from courses crs
where not exists (select X
from classes cls
where cls.crs_id = crs.crs_id)
Rows
---587
1019
1018
Execution Plan
----------------------------------------------FILTER
TABLE ACCESS (FULL) OF 'COURSES'
INDEX (RANGE SCAN) OF 'CLASS_CRS' (NON-UNIQUE)
The primary row source to the filter operation is the full table scan of COURSES (this is the
row source being filtered).
The filtering can be done using one or more subqueries. These subqueries are secondary input
row sources to the filter row source and are executed very much like the row retrieval from the
inner table in a nested loops join operation.
In above example the index range scan on CLASS_CRS is executed for each row in the primary
row source to determine what rows should be filtered away.
The filter row source will always introduce throwaway - except when it does not filter anything
away.
Another source of throwaway from filter operations is the number of rows from the secondary
row sources (those used for filtering).
It can be disputed whether these secondary row sources are thrown away (they do not appear in
the output) or are just separate queries requering work to be evaluated.
No matter what terms are used these secondary row sources represents work done - and has been
seen on several occations to process a bigger number of rows than the primary query itself.
Therefore a separate tuning effort on the secondary row sources can be beneficial.
The amount of throwaway can be found by determing how much smaller the output
rows is than the number of rows from the primary input row source.
In above example this results in a throwaway of 432 rows (1019 - 587).
Cure:
Often a change from a filter operation to a join can help.
Limitations:
The filter operation is often caused by the data model not being able to satisfy specific
requirements, and to change the filter into a join or removing it often requires major
changes in the application.
July 2000
Page 23
6. Secondary Throwaway
In more complex statements it becomes less and less simple to quantify the amount of secondary
throwaway.
Consider following example:
Rows #1
Operation
#1
Rows #4
Operation
#4
Rows #2
Rows #3
Operation
#2
Operation
#3
Rows #5
Operation
#5
If Operation #1 throws away rows from Operation #2 and thereby not only subjecting Operation
#2, but also Operation #4 and Operation #5 to secondary throwaway, then it can be difficult to
determine precisely how many rows from Operation #4 and Operation #5 have been thrown
away.
To adopt a pragmatic approach to quantify this secondary throwaway it is assumed that the
secondary throwaway is distributed evenly across all the row sources contributing to the row
source being subjected to primary throwaway and that the fraction of secondary throwaway
equals the primary throwaway.
This leads to following formula for calculating secondary throwaway:
Primary Throwaway (fraction of input) * sum(rows in contributing row sources)
Based on the above example the secondary throwaway generated by the primary throwaway in
Operation #1 can then be quantified by:
Primary Throwaway:
Rows #2 - Rows #1
Primary Throwaway fraction:
(Rows #2 - Rows #1)/Rows #2
Sum(rows in contributing row sources): Rows #3 + Rows #4 + Rows #5
This leads to:
Secondary throwaway: (Rows #2 - Rows #1)/Rows #2 * (Rows #3 + Rows #4 + Rows #5)
July 2000
Page 24
Curing secondary throwaway:

The primary throwaway can be cured (if indeed it is possible to design a cure) by
following the suggestions given for each type of row source.
This however can not be used to reduce the amount of secondary throwaway.
The simplest way to describe this is that the rows thrown away by secondary throwaway
are generated already when they arrive to the row source introducing the primary
throwaway.
This means that the only cure for secondary throwaway is to move the row source
generating the primary throwaway which in turn is responsible for the secondary
throwaway to an earlier evaluation in the execution plan.
In a sense this is what happens when an unindexed predicate is changed to be evaluated
in an index lookup - which is an evaluation of the predicate before the table access
previously having the primary throwaway.
Limitations:
It is not always feasible to change the execution plan to accommodate a reduction in
secondary throwaway.
An often seen consequence of moving a row source that generates throwaway to another
position in the execution plan (with the purpose of reducing secondary throwaway) is
that the row source no longer can limit the amount of rows as it needs information from
the row sources now positioned later in the execution plan to fully limit the amount of
rows.
July 2000
Page 25
6.1 Example on calculating secondary throwaway

To illustrate the concept of primary versus secondary throwaway consider following example
that is based on a thought case with joins with throwaway:
Id
Rows
--- -----1)
10
2) 20560
3) 10920
4)
110
5)
111
6) 10920
7) 11030
8) 20560
9) 31480
10)
10
11) 20570
Execution plan
-------------NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS (BY INDEX ROWID) OF <A>
INDEX ACCESS (RANGE SCAN) OF
TABLE ACCESS (BY INDEX ROWID) OF <B>
TABLE ACCESS (BY INDEX ROWID) OF <C>
TABLE ACCESS (BY INDEX ROWID) OF <D>
To ease the interpretation of what the execution plan shows following graphical representation
is helpful:
10
1) Nested
Loops #3
10
20560
10) Table
<D>
2) Nested
Loops #2
10920
4) Table
<A>
111
5) Index
Range
11) Index
Range
8) Table
3) Nested
Loops #1
110
20570
20560
<C>
31480
10920
9) Index
Range
6) Table
<B>
11030
7) Index
Range
First lets determine the amount of throwaway in the row sources (the primary throwaway).
Assuming Oracle8i the throwaway can be determined using the descriptions found in 5.6
Throwaway in row source: NESTED LOOPS join operation.
July 2000
Page 26
TA from table A before join:

TA from table A during join:
TA from table B during join:
TA from NL #1 before join:
TA from NL #1 during join:
TA from table C during join:
TA from NL #2 before join:
TA from NL #2 during join:
TA from table D during join:
111 - 1 = 0 (all rows from index lookup is returned)

Not determinable (Join #1 returns more rows than outer row
source)
11.030 - 110 - 10.920 = 0
none (would have been applied earlier)
Not determinable (Join #2 returns more rows than outer row
source)
31.480 - 10.920 - 20.560 = 0
none (would have been applied earlier)
At least 20.560 - 10 = at least 20.550
20.570 - 20.560 - 10 = 0
(TA: ThrowAway)
The only row source subjected to significant (primary) throwaway is marked in bold.
As it now has been determined that there is throwaway from a row source which is generated by
the combined work of several other row sources these row sources are subjected to secondary
throwaway.
As the primary throwaway fraction is at least 20.550/20.560 = 0.9995 then according to above
quantification of secondary throwaway at least the fraction 0.9995 of all the rows in the two first
join operations, the three first table accesses (tables <A>, <B> and <C>) and the corresponding
index accesses are thrown away.
According to above formula the secondary throwaway can be calculated as:
primary throwaway fraction * sum(rows in contributing row sources)
= 0.9995* sum(row sources 2, 3, 4, 5, 6, 7, 8 and 9)
= 0.9995 * (111+110+11.030+10.920+10.920+31.480+20.560+20.560) rows
= 105.638 rows
This means that the total throwaway introduced in the last join operation is (at least):
20.550 rows (primary throwaway) + 105.638 rows (secondary throwaway)
= 126.188 rows out of the total 126.281 rows processed in the query , which equals a
throwaway of 99.93% of all the rows processed.
July 2000
Page 27
7. Tuning SQL Queries by Eliminating Throwaway

As described in the beginning of this paper the core method of tuning a single query is as
follows:
1.
2.
3.
4.
5.
Analyze query
Suggest changes to improve performance
Implement changes
Test improvement
Repeat until goal reached
By using identification and elimination of throwaway for analysis and suggested changes the
method can be expanded:
1. Calculate throwaway (primary and secondary) for relevant row sources in the execution plan
2. Identify row sources that introduce a significant amount of throwaway
3. Map row sources to the part of the statement they correspond to and determine the predicates
for relevant row sources
4. Suggest changes in execution plan to reduce secondary throwaway. This must be done by
moving the row source to be evaluated earlier in the execution plan
5. If a row source is introducing primary throwaway (or will be expected to do so after reducing
secondary throwaway) then analyze whether it is possible to reduce the amount of
throwaway
6. Identify the means to change the existing execution plan into the proposed one
7. Implement changes
8. Verify that expected execution plan is obtained
9. Test improvement
10. Repeat until goal reached or all throwaway has been eliminated
Re 1): The calculation of throwaway follows the formulas described for the different row
sources (for primary throwaway) and for complex statements (for secondary throwaway).
Re 2): A row source can be said to introduce a significant amount of throwaway if the total
amount of throwaway (primary + secondary throwaway) generated by that particular row source
is a significant part of all the rows processed within the statement.
Re 3): This part will not be described in sufficient detail to solve all possible scenarios in this
version of the paper.
The meaning of the step is to take the all row sources in the execution plan and then identify
what part of the statement these row sources correspond to (: what part of the statement the row
source is solving).
It could be argued that only the relevant row sources needs to be mapped to the statement, but in
practise this limitation will invariably lead to mistakes - especially when dealing with complex
statements.
It is normally straight forward to map row sources to the statement when the statement only
addresses each table once, but in the cases where the same table is referenced several times in
the statement (including views) it can be a challenge to identify exactly where a particular table
access row source belongs in the statement.
July 2000
Page 28
The interesting part of this mapping is the mapping of what predicates are in action for what
row source. This might seem trivial, but consider following simple join:
select cls.start_date, crs.short_name
from classes cls, courses crs
where cls.instr_id = crs.dev_id
and cls.status = AVAI
and crs.cat_id = 34;
If the execution plan for above statement looks like:
Execution plan
-------------NESTED LOOPS
TABLE ACCESS (BY INDEX ROWID)
TABLE ACCESS (BY INDEX ROWID)
OF CLASSES
OF COURSES
Then the join order would be CLASSES - COURSES and the predicates for CLASSES and
COURSES would be:
CLASSES:
status = AVAI
COURSES:
dev_id = <cls.instr_id>
cat_id = 34
If on the other hand the join order is COURSES - CLASSES then the predicates for CLASSES
and COURSES would be:
COURSES:
cat_id = 34
CLASSES:
instr_id = <crs.dev_id>
status = AVAI
Above demonstrates that the predicates and therefore the optimal indexes (for eliminating
primary throwaway) will depend on the join order and that focus on the predicates is important.
Re 4): Secondary throwaway - which is often seen to play the biggest part in work wasted by
throwaway in complex joins - needs as described above changes in the execution plan so that
the row source introducing the (primary) throwaway is evaluated earlier in the execution plan.
Moving a row source to an earlier evaluation in the execution plan will in most cases also
change the predicates for that particular row source and potentially all other row sources
between the old and the new position in the execution plan.
Changing the predicates for a row source has a direct influence on the ability of the row source
to throw rows away.
A prerequisite for reducing secondary throwaway by moving the row source with the primary
throwaway is that this row source is still capable of throwing rows away after having been
moved to the new position in the execution plan.
A second prerequsite is that the row sources having previously being subjected to secondary
throwaway has predicates that after the change in execution plan can limit the number of rows
processed.
An other way of stating this second prerequsite is that there must be some predicates connecting
the row source introducing the primary throwaway and the row source(s) subjected to secondary
July 2000
Page 29
throwaway and that these predicates by being reversed by the proposed change in the
execution plan can actually limit the number of rows processed in the row source(s) originally
subjected to secondary throwaway.
Therefore it is necessary to evaluate what predicates are available after the proposed change in
execution plan for the both the row source introducing the primary throwaway and the row
source(s) subjected to secondary throwaway.
This is a partial repetition of step 3, but using the new order of evaluation in the execution plan.
Re 5): As primary throwaway is local to one row source it is rather straight forward to verify if
there is a cure available and what means can be used to implement the cure.
The local nature of primary throwaway also makes it more likely that the cure can be
implemented without side-effects in other parts of the execution plan which has the pleasant
consequence of easing testing.
Re 6) The means to implement the wanted execution plan includes:
Creation of indexes
Adding/changing hints (for specifying join execution plans the ORDERED,
USE_NL, USE_HASH and INDEX hints often are quite usefull)
Changing optimizer mode
Changing other parameters influencing the optimizer
Changing statistics
Changing the where clause (for example: forcing join order by disabling index usage
under the Rule Based Optimizer)
Changing the statement to allow other access paths
Changing table definitions (for example by changing a NULL column to NOT
NULL in order to allow anti-join optimization)
Re 8) It is recommended that the expected execution plan is verified before testing the result especially if there is a possibility of getting long response times.
For this purpose the AUTOTRACE TRACE EXPLAIN setting in SQL*Plus (which only
outputs the execution plan) is sufficient and rather easy to use.
Re 9) The actual test should be performed using SQL trace as this will verify that the expected
elimination of throwaway has been successfull.
July 2000
Page 30
8. Examples on Applying the Method

In the following a series of examples are described to demonstrate how the described method
can be applied.
It should be noted that the original - non optimal performing - queries has been made to perform
badly by using wrong statistics and hints.
Furthermore as described other places in this paper all versions of Oracle does display some
lack of precision when counting the row numbers. These imprecisions has been edited out, as it
only confuses the demonstrations and does not give any improved understanding of what is
going on - see 11.5 Accuracy of reported number of rows from row sources.
8.1 Join order in 2 table join
Following example examines a 2 table join:
select crs.short_name, cls.start_date
from courses crs,
classes cls
where crs.crs_id = cls.crs_id
and cls.days = 15
One not optimal execution plan would be:
Rows
Execution Plan
----- --------------------------------------------------4
NESTED LOOPS
1018
4
TABLE ACCESS (BY INDEX ROWID) OF 'CLASSES'
12181
Step 1,1 (calculate throwaway)
Using the formulas described in 5.6 Throwaway in row sources: NESTED LOOPS join
operation the throwaway can be determined as:
Throwaway from the COURSES table access before join: 0 (as there is 1018 rows in the
COURSES table).
Throwaway (secondary) from the COURSES table access during join: At least 1014 rows (1018
- 4)
Throwaway from the CLASSES table during join: 11159 rows (12181 - 1018 - 4)
Step 2,1 (identify row sources introducing significant throwaway)
There is significant throwaway from both the outer table (COURSES) and the inner table
(CLASSES) during join.
Step 3,1 (map row sources to statement)
Row source:
Corresponds to table COURSES in the from clause with no predicates.
July 2000
Page 31
Row sources:
INDEX (RANGE SCAN) OF 'CLASS_CRS'
Corresponds to table CLASSES in the from clause with (join and non-join) predicates:
crs_id
= <courses.crs_id>
days = 15
Step 4,1 (Suggest changes in execution plan to reduce secondary throwaway)
As the only possible solution to the secondary throwaway from the COURSES table access is to
move the evaluation of the row source introducing the primary throwaway which causes the
secondary throwaway to an earlier evaluation in the execution plan.
This means that the join order should be reversed:
CLASSES - COURSES
Lets analyze this join order for cardinalities and throwaway:
Join order: CLASSES - COURSES
Cardinalities after join operation:
CLASSES:
Predicates:
days = 15
Cardinality:
4 (select count(*) from classes where days = 15)
COURSES:
Predicates (including join predicates)

crs_id = <classes.crs_id>
Cardinality after join: 4 (result of query)
When only looking at throwaway from the tables this join order will result in a total of 8
retrieved rows (4 + 4) which compares favorably against to the original querys 1022 rows (1018
+ 4).
Step 5,1 (Reduce primary throwaway)
The chosen join order (CLASSES - COURSES) will result in following predicates (as described
above):
CLASSES:
days = 15
Available indexes: None
Optimal index: (DAYS)
COURSES:
crs_id = <classes.crs_id>
Available indexes: (CRS_ID)
Optimal index: (CRS_ID)
To reduce primary throwaway the aim is to have fully indexed predicates where beneficial:
The optimal index on CLASSES is not available and as this index - CLASSES(DAYS) will be very selective (only retrieving 4 rows out of all the rows in CLASSES) it is
assumed that the creation of this particular index will be beneficial.
The optimal index on COURSES is available and no further considerations are needed.
Steps 4 and 5 then results in following first suggestion for changes to reduce throwaway to
obtain a better performing query:
July 2000
Page 32
New join order: CLASSES - COURSES

Join operation: No change
New index:
CLASSES(DAYS)
Step 6,1 (Identify means to implement suggested changes)
Above join order and join operation can be obtained by rewriting (and hinting) the query as
follows:
select /*+ORDERED USE_NL(crs) index(cls) */
crs.short_name, cls.start_date
from classes cls,
courses crs
where crs.crs_id = cls.crs_id
and cls.days = 15
Create an index:
CREATE INDEX CLASS_DAYS ON CLASSES(DAYS);
Steps 7,1, 8,1 and 9,1 (Implement, verify execution plan and test improvement)
Which gives the result:
Rows
Execution Plan
----- --------------------------------------------------4
NESTED LOOPS
4
5
INDEX (RANGE SCAN) OF 'CLASS_DAYS' (NON-UNIQUE)
4
TABLE ACCESS (BY INDEX ROWID) OF 'COURSES'
8
INDEX (UNIQUE SCAN) OF 'CRS_PK' (UNIQUE)
The statement now requires processing 25 rows in total - compared to the original querys 13207
rows (including rows from indexes).
On the used hardware the CPU consumption went down from 0.19 seconds to below 0.01
seconds
No further action is relevant using the described method as all throwaway has been eliminated.
July 2000
Page 33
8.2 Join order in 3 table join (with secondary throwaway)

To illustrate the method on an example with a 3-table join and secondary throwaway consider
following example:
select cls.start_date, loc.capacity
from locations loc,
classes
cls,
courses
crs
where loc.name = 'Belmont Shores Ed Center'
and loc.capacity between 20 and 30
and loc.loc_id = cls.loc_id
and cls.crs_id = crs.crs_id
and crs.dev_id = 121688
One (not optimal) execution plan would be:
Rows
Execution Plan
----- --------------------------------------------------40 NESTED LOOPS
597 NESTED LOOPS
6
TABLE ACCESS (BY INDEX ROWID) OF LOCATIONS
16
INDEX (RANGE SCAN) OF 'LOC_NAME' (NON-UNIQUE)
597
TABLE ACCESS (BY INDEX ROWID) OF CLASSES
603
INDEX (RANGE SCAN) OF 'CLASS_LOC' (NON-UNIQUE)
40 TABLE ACCESS (BY INDEX ROWID) OF COURSES
1194
INDEX (UNIQUE SCAN) OF 'CRS_PK' (UNIQUE)
Step 1,1 (Calculate throwaway):
The first join (joining LOCATIONS and CLASSES) does not throw any rows away from
CLASSES and most likely nothing from LOCATIONS (but this cant be determined precisely)
The second join (joining COURSES to the sofar obtained result set) throws 557 out of 597 rows
from the first join operation away - this is secondary throwaway.
The range scan of the LOC_NAME index results in 15 (16 - 1) rows, but the corresponding
table access (LOCATIONS) only returns 6 rows - which means that 9 rows are thrown away due
to non-indexed predicates on LOCATIONS.
The unique scan of the CRS_PK index results in 597 rows (1194 - 597 : remember that
Oracle8i reports one row extra per index lookup also when doing unique scans).
The corresponding table access (COURSES) only returns 40 rows, which means that 557 rows
are thrown away in the table access on COURSES by non-indexed predicates.
Step 2,1 (Identify row sources introducing significant throwaway):
As the second join operation throws away 93% of the work done by the first join operation this
throwaway can be claimed to be significant (secondary throwaway).
Furthermore the COURSES table access is also subjected to significant throwaway.
Step 3,1 (Map row sources to statement)
Row sources:
TABLE ACCESS (BY INDEX ROWID) OF LOCATIONS
INDEX (RANGE SCAN) OF 'LOC_NAME'
July 2000
Page 34
Corresponds to table LOCATIONS in the from clause with (non-join) predicates:

name = Belmont Shore Ed Center
capacity between 20 and 30
Row sources:
TABLE ACCESS (BY INDEX ROWID) OF CLASSES
INDEX (RANGE SCAN) OF 'CLASS_LOC'
Corresponds to table CLASSES in the from clause with (join) predicate:
loc_id = <locations.loc_id>
Row sources:
TABLE ACCESS (BY INDEX ROWID) OF COURSES
INDEX (UNIQUE SCAN) OF 'CRS_PK'
Corresponds to table COURSES in the from clause with predicates:
crs_id = <classes.cr_id>
dev_id = 121688
To cure the secondary throwaway the joining of the COURSES table (introducing the primary
throwaway) should be moved to an earlier evaluation in the join order.
This would result in one of the following four join orders:
A) LOCATIONS - COURSES - CLASSES
B) COURSES - LOCATIONS - CLASSES
C) COURSES - CLASSES - LOCATIONS
D) CLASSES - COURSES - LOCATIONS
It should be noted that A) and B) would result in a cartesian product between LOCATIONS and
COURSES.
To find the optimal join order the predicates and cardinalities for each table in above join orders
are analyzed:
As there are more candidates for join orders it is usefull to discard the join orders where the
cardinality of the driving table is significantly larger than the number of rows from the total
query (as this is bound to generate significant throwaway).
To do this the basic cardinalities (cardinality after applying non-join predicates) of the three
tables can be determined as:
LOCATIONS:
Non-join predicates:
Basic cardinality:
CLASSES:
Basic cardinality:
July 2000

6 (can be determined from obtained trace above)
None
11163 (select count(*) from classes or obtained from table
statistics)
Page 35
COURSES:
Basic cardinality:
dev_id = 121688
7 (select count(*) from courses where dev_id = 121688 or
obtained from table and column statistics)
With a total of 40 rows output from the query CLASSES does not seem like a good candidate
for choice of driving table (this would result in a throwaway of at least 11123 rows which is
much worse than the total number a rows thrown away in the original query).
This leaves the join orders:
A) LOCATIONS - COURSES - CLASSES
B) COURSES - LOCATIONS - CLASSES
C) COURSES - CLASSES - LOCATIONS
Now, lets analyze the join orders for cardinalities and throwaway:
Join order A): LOCATIONS - COURSES - CLASSES
Cardinalities after join operations
LOCATIONS: 6 (determined under basic cardinalities)
COURSES:

dev_id = 121688
Cardinality after join: 7*7 (cartesian product)
CLASSES:
crs_id = <courses.crs_id>
Join order B): COURSES - LOCATIONS - CLASSES

COURSES:
7 (determined under basic cardinalities)
LOCATIONS: Predicates (including join predicates)
Cardinality after join: 6*7 (cartesian product)
CLASSES:
A) and B) are similar in the sense that the two leading tables are identical (but reversed) and are
not joined togther, resulting in cartesian products with an identical set of rows (42).
Join order C): COURSES - CLASSES - LOCATIONS
COURSES:
July 2000
Page 36
CLASSES:
Cardinality:
Needs a test and cant be determined by analysis, but
must be at least 40 as the join predicate to
LOCATIONS is referencing the primary key in
LOCATIONS.
LOCATIONS: Predicates (including join predicates)

loc_id = <classes.loc_id>
After the analysis it can be determined that join orders A) and B) at maximum should generate a
throwaway of 2 rows (42 rows from the cartesian product between COURSES and
LOCATIONS - 40 rows from the query output). It should be noted that this calculation of
throwaway focuses on throwaway from table row sources - not indexes.
Join order C) might work as well, but can not possibly be significantly better and can
(depending on the data contents) be a lot worse than A) and B)..
Therefore only join orders A) and B) will be considered and join order A) will be chosen to
reduce secondary throwaway in the query.
The chosen join order (LOCATIONS - COURSES - CLASSES) will result in following
predicates (as described above):
LOCATIONS: name = Belmont Shore Ed Center
Available indexes: (NAME)
Optimal index: (NAME, CAPACITY)
COURSES:
dev_id = 121688
Available indexes: (DEV_ID)
Optimal index: (DEV_ID)
CLASSES:
Available indexes: (CRS_ID), (LOC_ID)
Optimal index: (CRS_ID, LOC_ID)
As the throwaway caused by the non-optimal index on LOCATIONS (8 rows as calculated
above) is small compared to the total number of rows processed in the statement, it does not
seem worthwhile to add an index on (NAME, CAPACITY).
The optimal index on COURSES is available and no further considerations are needed.
During the initital calculation of throwaway it was determined that the retrieval from CLASSES
using the LOC_ID index (named CLASS_NAME) resulted in 597 rows retrieved - which means
that this index alone will introduce a level of primary throwaway.
It is not easily determinable whether the simple index on CRS_ID will be enough for
eliminating primary throwaway from this last table access.
July 2000
Page 37
First recommendation is therefore to perform at test without the optimal index and then verify
whether an acceptable result can be obtained without building new indexes - which normally is
the best solution as indexes are expensive (maintenance, space).
New join order: LOCATIONS - COURSES - CLASSES
Join operations: No change (use nested loops)
New indexes: None
Above join order and usage of join operations can be obtained by rewriting (and hinting) the
query as follows:
select /*+ORDERED USE_NL(cls crs) */
cls.start_date, loc.capacity
from locations loc,
courses
crs,
classes
cls
where loc.name = 'Belmont Shores Ed Center'
and loc.capacity between 20 and 30
and loc.loc_id = cls.loc_id
and cls.crs_id = crs.crs_id
and crs.dev_id = 121688
Rows
Execution Plan
----- -------------------------------------------------40 NESTED LOOPS
42
NESTED LOOPS
6
TABLE ACCESS (BY INDEX ROWID) OF 'LOCATIONS'
16
42
48
INDEX (RANGE SCAN) OF 'CRS_DEV' (NON-UNIQUE)
40
82
AND-EQUAL
383
363
INDEX (RANGE SCAN) OF 'CLASS_LOC' (NON-UNIQUE)
This is a much better result than the original, but there is still a significant amount of throwaway
- therefore an new iteration is executed (described somewhat more compact):
Step 1,2 and 2,2 (Calculate significant throwaway):
There is no throwaway to speak of (actually no throwaway exists) in the two join operations.
The only operation with throwaway is the AND-EQUAL operation on the two index range scan
operations on CLASS_CRS and CLASS_LOC indexes where in total 746 rows (383 + 363) are
input to the AND-EQUAL operation, but only 82 are output.
Step 3,2 (Map row sources to statement)
July 2000
Page 38
Row sources:

AND-EQUAL
INDEX (RANGE SCAN) OF 'CLASS_CRS'
INDEX (RANGE SCAN) OF 'CLASS_LOC'
Corresponds to table CLASSES in the from clause with join predicates - note: this is exactly the
indexes that was left over for further evaluation in the first iteration (step 5,1).
No secondary throwaway.

As the join order will not be changed (for reducing secondary throwaway) this step can directly
address the calculated primary throwaway around the CLASSES table access.
According to above discussion on the optimal index for accessing the CLASSES table it can be
determined that an index on CLASSES(CRS_ID, LOC_ID) should reduce the amount of
primary throwaway identified.
Create an index:
CREATE INDEX CLASS_CRS_LOC ON CLASSES(CRS_ID, LOC_ID);
Rows
Execution Plan
----- -------------------------------------------------40 NESTED LOOPS
42
NESTED LOOPS
6
TABLE ACCESS (BY INDEX ROWID) OF 'LOCATIONS'
16
42
48
INDEX (RANGE SCAN) OF 'CRS_DEV' (NON-UNIQUE)
40
82
INDEX (RANGE SCAN) OF 'CLASS_CRS_LOC' (NON-UNIQUE)
The statement now processes 322 rows in total - compared to the original 2093 rows.
On the hardware used for executing this demonstration the CPU consumption went down from
0.06 seconds to 0.01 seconds (the latter being somewhat unprecise as the resolution of the
measurement is 0.01 seconds).
It may be worthwhile to note that this query ended up being tuned by changing the execution
plan into a plan that for all pratical purposes can be seen as a star join.
July 2000
Page 39
8.3 N-way join (with secondary throwaway)

Following example examines a 3 table join with all tables joining to all tables.
select crs.description, cls.start_date
from courses
crs,
classes
cls,
employees emp
where crs.dev_id
= emp.emp_id
and cls.instr_id = emp.mgr_id
and cls.crs_id
= crs.crs_id
One not optimal execution plan would be:
Rows
Execution Plan
----- --------------------------------------------------1
NESTED LOOPS
11163
NESTED LOOPS
11163
TABLE ACCESS (FULL) OF 'CLASSES'
11163
22326
INDEX (RANGE SCAN) OF 'CRS_PK' (UNIQUE)
1
TABLE ACCESS (BY INDEX ROWID) OF 'EMPLOYEES'
11164
INDEX (UNIQUE SCAN) OF 'EMP_PK' (UNIQUE)
Step 1,1 (calculate throwaway)
The first join (joining CLASSES and COURSES) does not throw any rows away from either
CLASSES nor COURSES.
The second join (joining EMPLOYEES to the sofar obtained result set) throws 11162 out of
11163 rows from the first join operation away
The unique scan on the EMP_PK index results in 11163 rows (22326 - number of lookups,
11163) out of which 11162 rows are thrown away due to unindexed predicates.
Step 2,1 (identify row sources introducing significant throwaway)
As the second join operation throws away 99,99% of all the work done by the first join
operation this throwaway can be claimed to be significant.
Furthermore the EMPLOYEES table access is also subjected to significant throwaway.
Step 3,1 (map row sources to statement)
Row source:
Corresponds to table CLASSES in the from clause with no predicates.
Row sources:
INDEX (RANGE SCAN) OF 'CRS_PK'
Corresponds to table COURSES in the from clause with (join) predicates:
crs_id
= <classes.crs_id>
July 2000
Page 40
Row sources:
INDEX (UNIQUE SCAN) OF 'EMP_PK'
Corresponds to table EMPLOYEES in the from clause with (join) predicates:
emp_id = <courses.dev_id>
mgr_id = <classes.instr_id>
To cure the secondary throwaway the join operation joining the EMPLOYEES table
(introducing the primary throwaway) should be moved to an earlier evaluation in the join order.
This would result in one of the following four join orders:
A) EMPLOYEES - CLASSES - COURSES
B) EMPLOYEES - COURSES - CLASSES
C) CLASSES - EMPLOYEES - COURSES
D) COURSES - EMPLOYEES - CLASSES
To find the optimal join order the predicates and cardinalities for each table in above join orders
are analyzed.
As there are more candidates for join orders it is usefull to discard join orders where the
cardinality of the driving table is significantly larger than the number of rows from the total
query.
To do this the basic cardinalities (cardinality after applying non-join predicates, if any) of the
three involved tables can be determined as:
EMPLOYEES:
Basic cardinality:
None
15132 (select count(*) .. or table statistics)
CLASSES:
Basic cardinality:
None
COURSES:
Basic cardinality:
None
With a total of 1 row output from the query the tables EMPLOYEES and CLASSES seems like
the worst possible choices for driving table.
This leaves the join order:
D) COURSES - EMPLOYEES - CLASSES
It would now be tempting to continue to next step (reducing primary throwaway), but for
completeness lets analyze this join order for cardinalities and throwaway:
Join order D): COURSES - EMPLOYEES - CLASSES
Cardinalities after join operations:
COURSES:
July 2000
Page 41
EMPLOYEES: Predicates (including join predicates)

emp_id = <courses.dev_id>
Cardinality after join: 1018 (unique scan)
CLASSES:

instr_id = <employees.mgr_id>
When only looking at throwaway from the tables this join order will result in a total of 2037
retrieved rows (1018 + 1018 + 1) which compares favorably against to the original querys
22327 rows (11163 + 11163 + 1).
The chosen join order (COURSES - EMPLOYEES - CLASSES) will result in following
predicates (as described above):
COURSES:
None
EMPLOYEES: emp_id = <courses.dev_id>

Available indexes: (EMP_ID)
Optimal index: (EMP_ID)
CLASSES:
instr_id = <employees.mgr_id>
Available indexes: (CRS_ID)
Optimal index: (INSTR_ID, CRS_ID)
The optimal index on EMPLOYEES is available and no further considerations are needed.
Regarding indexes on CLASSES it must be assumed that the available index is insuffient.
This can be argued as follows: As all courses are retrieved, also all 11163 classes must be
retrieved using the available index - resulting in having to throw away 11162 rows to return the
one correct row from the query.
This assumption however relies on still using nested loops join operations and having 1018
lookups via an index resulting n only one row found.
This can very well be more resource consuming than executing a hash join operation between
the 1018 from joining COURSES and EMPLOYEES and the 11163 rows from CLASSES
(without any indexes on CLASSES).
So, again with the wish to avoid unnecesary index creations it is recommended to test this latter
approach before creating an index on CLASSES(INSTR_ID, CRS_ID).
New join order: COURSES - EMPLOYEES - CLASSES
July 2000
Page 42
Join operations: NL on EMPLOYEES, HASH on CLASSES

New indexes: None
Above join order and usage of join operations can be obtained by rewriting (and hinting) the
query as follows:
select /*+ORDERED USE_NL(emp) USE_HASH(cls) */
crs.description, cls.start_date
from courses
crs,
employees emp,
classes
cls
where crs.dev_id
= emp.emp_id
and cls.instr_id = emp.mgr_id
and cls.crs_id
= crs.crs_id
Rows
Execution Plan
----- --------------------------------------------------1 HASH JOIN
1018
NESTED LOOPS
1018
1018
2036
INDEX (UNIQUE SCAN) OF 'EMP_PK' (UNIQUE)
11163
The statement now requires processing 15254 rows in total - compared to the original querys
66981 rows (including rows from indexes).
On the used hardware the CPU consumption went down from 1.13 seconds to 0.12 seconds
Assuming this is acceptable, no further analysis is required.
(For completeness the alternative solution with an extra index on CLASSES (INSTR_ID,
CRS_ID) was tested. This resulted in a CPU consumption of 0.11 seconds and apr. 50% more
logical I/Os with all physical I/O as single block I/O - resulting in potential more wait time
spend on reads from the I/O subsystem).
July 2000
Page 43
9. Using SQL Trace and tkprof

This section is a short description on how to obtain the number of row sources on which the
whole concept of tuning by eliminating throwaway is based.
Sofar (that is: up to and including Oracle8i release 2) the number of rows processed or returned
from row sources can only be extracted when using the SQL trace facility. No other possibility
exists (no V$ or X$ tables).
The reason for emphazising this is that SQL trace has been considered difficult to use as it
requires the SQL statement tuner to be able to acquire the generated trace files on the database
server, which with 2 tier, 3 tier etc. technologies seems to be harder and harder to get to.
The basic steps are:
1. Activate SQL trace, either:
sql_trace = TRUE (in init.ora)
alter session set sql_trace = TRUE; (from the session)
statistics = y (Oracle Forms)
dbms_session.set_sql_trace(TRUE)
dbms_system.set_sql_trace_in_session(sid,#serial,TRUE)
2. Run the statements to analyze
3. Deactivate SQL trace (using the same methods as activating, but with FALSE as argument)
4. Identify trace file in user_dump_dest:
Typically this will be the youngest file in the directory
5. Run tkprof on trace file:
tkprof <tracefile> <outfile> explain=<un>/<pw>@<tnsid>
Suggestions:
Set the timed_statistics parameter to true as this will help identify time consuming statements.
Use the SYS=NO and SORT=EXEELA FCHELA arguments with tkprof (in addition to the
above)
July 2000
Page 44
10. Tracing Issues

When tracing applications you might quite often experience that the row source counts are 0 for
all row sources in a statement - maybe for all traced statements.
This can happen for a couple of reasons:
The trace file does not contain STAT lines.
The STAT lines are written to the trace file only when a cursor is closed - either explicitly or
implicitly when the cursor is reparsed (as in SQL*Plus).
The STAT lines are therefore not written to the trace file if your application just exits without
explicitly closing the cursor (SQL*Plus does not perform an explicit close cursor before
exiting).
If you have problems with this the best solution might be to take the offending statements and
execute these from SQL*Plus.
After the last statement execute any statement to force an implicit cursor close (for example
alter session set sql_trace = falseor select * from dual).
Of course the STAT lines will not be written if you terminate the program prematurely or if you
use the trace file before the offending statement has completed its work.
The execution plan described in the STAT lines does not match the execution plan when
executing explain plan during tkprof.
This can happen if information to optimize the statement has changed - for example changed
statistics or the creation of new indexes.
In Oracle 7.3 and 8.0 the information is then ignored - giving 0 for the row counts.
In Oracle8i tkprof tries to explain what happened anyway as demonstrated in following
example:
1. create index emp_fname on employees(first_name);
2. alter session set optimizer_mode = first_rows;
3. alter session set sql_trace = true;
4. select * from employees
where first_name = Chris;
5. alter session set sql_trace = false;
6. drop index emp_fname;
Produces following tkprof output:
Rows Row Source Operation
---- -----------------------------------------------16 TABLE ACCESS BY INDEX ROWID EMPLOYEES
17
INDEX RANGE SCAN (object id 3183)
Rows
---0
16
17
0
Execution Plan
-----------------------------------------------SELECT STATEMENT
GOAL: FIRST_ROWS
TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF
'EMPLOYEES'
BITMAP CONVERSION (TO ROWIDS)
BITMAP INDEX (FULL SCAN) OF 'EMP_JOB'
The first partial execution plan is generated directly from the STAT lines in the trace file thus
showing the actual execution plan when the statement was executed (having no name for the
index as this index does not exist anymore).
July 2000
Page 45
The second execution plan shows what the optimizer would have chosen with the current state
of the database and the row counts should not be taken seriously.
You run on Oracle 8.0.4
On several platforms there was a bug in this release where the STAT lines was written with a
wrong number for the corresponding cursor (#0 instead of #1). This only happened if you
traced from an SQL*Plus session.
If you can not upgrade to a later version the problem can be corrected by editing the trace file
and do a global change of STAT #0 to STAT #1
July 2000
Page 46
11. Miscellanous Issues

11.1 I/Os
Tuning by eliminating throwaway is mostly based on limiting the amount of CPU work spend.
It is therefore not given that a statement that exhibits no significant throwaway also is
performing optimally as there still might be a significant amount of time spend doing reads from
the datafiles.
In many cases the elimination of throwaway will indeed result in fewer I/Os.
This comes from the fact that by eliminating throwaway only rows which is part of the output
from the query will be retrieved from the database.
This means that only the blocks containing data which is needed in the output will be visited
having the consequence that the query can not be solved using fewer block visits.
This however does not guarantee that the blocks could not be read more efficiently (for example
by doing multi block reads instead of single block reads).
Furthermore by focusing purely on eliminating throwaway there will be a tendency to replace
full table scans with index retrieval.
To avoid this (and thereby getting the advantange obtained by using multiblock reads) care
should be taken to only eliminating throwaway if it represents a significant part of the rows
processed within the statement.
This is saying the same as the well known advice only to use indexes instead of full table scans
if the selectivity of the involved predicates is sufficiently high.
11.2 Rows are not equal
Throughout the description of throwaway it has been assumed that handling a row within a row
source has the same cost in terms of work spend without taking the type of row source into
consideration.
Without going into details it can be assumed that following rank could be made:
Cheapest
Most expensive
Retrieve a row via a full table scan

Retrieve a row via a rowid
Go through an index to find a row
Sort a row
In this raking it should be noted that the focus is rows - not blocks (as more work is required to
handle a block during a full table scan than for example handling a block during a rowid
retrieval).
Ideally this ranking should be included in the determination of how significant a throwaway is.
This would require a reasonable precise quantification of above ranking (aka: getting a row
from an index is 3.5 times more expensive than retrieving a row from a table via a rowid).
For the time being there has not been made such a quantification of this ranking and it is
doubtful if it is possible at all to do without adding to much complexity.
July 2000
Page 47
Seen from a pragmatic point of view however experience has sofar shown that it is a sufficiently
precise assumption that all rows require the same amount of work disregarding what row source
is handing the row.
This however makes the prediction of the expected result of the query tuning unprecise.
11.3 PL/SQL functions called from row sources
Above pragamatic view on rows representing more or less the same amount of work should not
be adopted for row sources calling PL/SQL functions.
As just the task switch between the SQL and the PL/SQL engines is rather expensive then
reducing the number rows from row sources calling PL/SQL functions can reduce the CPU
consumption significantly - even if the result of this change is more rows throwaway from
normal row sources.
11.4 Optimizer mode
It should be noted that this paper does not deal with the optimizer.
The optimizer (disregarding if it is the RBO or the CBO) just generates the execution plan (with
the small twist that the CBO has some more row sources to choose from): When the execution
plan first has been determined it does not matter for performance what optimizer mode was
used.
The idea of throwaway is purely a view on the result of applying the exection plan on the
involved database objects and the goal of tuning by eliminating throwaway is obtained by
finding a better execution plan. In this the choice of optimizer just becomes a means to generate
the wanted execution plan.
11.5 Accuracy of reported number of rows from row sources
During the preparation of this paper it was several times experienced that the numbers reported
for row source counters in the trace files were not always precise - in the sense that the rules for
how the counters are updated might change for a row source depending on where in the
execution plan the row source is used.
Some good examples on this are:
1. There seems to be a consistent extra row reported somewhere in the execution plan
above a full table scan (not always on the full table scan itself) when using Oracle8i
2. The number of rows from hash joins in Oracle 7.3 and 8.0 resembles (!) the number of rows
from the hash join, but are higher and the difference changes according to how the tables are
ordered in the hash join operation.
3. Using Oracle8i the number of rows from a table access (by rowid) could be double as high
as physically possible if the table access was used as the inner table of a nested loops join but not always - this even happenes if other parts of the execution plan changes.
This can make the precise quantification of throwaway impossible, but in all cases seen during
the preparation of this paper and in real life it does not prevent the identification of where in a
query the work is wasted by throwaway.
In many cases an evaluation of the query can identify these discrepancies and in some cases
even an determination of the correct numbers can be done.
July 2000
Page 48

Throwaway 2

Загружено:

Сведения о документе

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Throwaway 2

Загружено:

Авторское право:

Доступные форматы

Martin Berg (martin.berg@oracle.

Query tuning by eliminating throwaway

1. A Basic Method for Performing Query Tuning

Index: CLASSES_TYPE on CLASSES(TYPE)

Same index as Query 1

3.3 Background - execution plans and row sources

3.4 What row sources can introduce throwaway?

Number of input row sources

4. Oracle Version Dependencies

*) In Oracle8i this was reported as 4.

5. Detailed Description of Simple Row Sources

5.2 Throwaway in row source: TABLE ACCESS (BY INDEX ROWID)

5.3 Throwaway row source: INDEX SCAN

(Index on CLASSES(TYPE, STATUS, DAYS))

5.4 Throwaway in row source: SORT (UNIQUE)

5.5 Throwaway in join operations, generic

5.6 Throwaway in row source: NESTED LOOPS join operation

For Oracle8i the throwaway can be calculated as:

Throwaway from inner table (during join):

For Oracle8i the throwaway can be calculated as:

Throwaway from inner table (during join):

Throwaway from inner table (during join):

For Oracle8i the throwaway can be calculated as:

Throwaway from inner table (during join):

For all versions of Oracle following pragmatic rule can be formulated:

5.7 Throwaway in row source: SORT/MERGE join operation

Throwaway from inner table (during join):

For Oracle8i the throwaway can be calculated as:

Throwaway from inner table (during join):

5.8 Throwaway in row source: HASH JOIN operation

Throwaway from inner table (during join):

5.9 Throwaway in row source: FILTER

Often a change from a filter operation to a join can help.

Curing secondary throwaway:

6.1 Example on calculating secondary throwaway

TA from table A before join:

111 - 1 = 0 (all rows from index lookup is returned)

7. Tuning SQL Queries by Eliminating Throwaway

8. Examples on Applying the Method

Predicates (including join predicates)

New join order: CLASSES - COURSES

8.2 Join order in 3 table join (with secondary throwaway)

Corresponds to table LOCATIONS in the from clause with (non-join) predicates:

name = Belmont Shore Ed Center

Predicates (including join predicates)

Join order B): COURSES - LOCATIONS - CLASSES

LOCATIONS: Predicates (including join predicates)

TABLE ACCESS (BY INDEX ROWID) OF 'CLASSES'

Step 5,2 (Reduce primary throwaway)

8.3 N-way join (with secondary throwaway)

EMPLOYEES: Predicates (including join predicates)

Predicates (including join predicates)

EMPLOYEES: emp_id = <courses.dev_id>

New join order: COURSES - EMPLOYEES - CLASSES

Join operations: NL on EMPLOYEES, HASH on CLASSES

9. Using SQL Trace and tkprof

10. Tracing Issues

11. Miscellanous Issues

Retrieve a row via a full table scan

Вам также может понравиться