Вы находитесь на странице: 1из 14

Meet the experts: Terry Purcell on coding predicates in

outer joins: Advanced outer join constructs


Part 2 of 2
Terry Purcell (terry_purcell@ylassoc.com)
Senior Consultant
Yevich, Lawson and Associates

17 January 2002

In this second of two parts, Terry Purcell, an IBM Gold Consultant with Yevich, Lawson, and
Associates, and the the winner of the Best Overall Speaker Award at the 2001 IDUG Asia
Pacific conference in Canberra, Australia, offers more advice on coding predicates in outer
joins.
View more content in this series
2002 International Business Machines Corporation. All rights reserved.

Introduction
This is part two of my column in which I attempt to make it easier for you to understand and use
the powerful outer join feature of the SQL language.
Part one provided a simple comparison of inner and outer joins, and also introduced new
terminology for explaining the tables in the outer join operations. Finally, I discussed the different
predicate types and at what stages DB2 is able to apply them.
In this part, I will provide background on coding predicates on the NULL-supplying table, how DB2
may simplify your query to improve performance, and what steps you can take to ensure that you
get the results you expect.

Left and right outer join NULL-supplying table predicates


Outer join simplification
Preserving NULLs
Before-join predicates
Copyright IBM Corporation 2002
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs

Trademarks
Page 1 of 14

developerWorks

ibm.com/developerWorks/

Outer join simplification


To recap some terminology that I introduced in Part 1:
The preserved row table is the table for which rows are preserved for an unmatched row in
the join operation.
The NULL-supplying table is the table for which rows are supplied with NULLs for an
unmatched row in the join.
Whereas the most overwhelming attribute of a WHERE clause predicate applied to the preservedrow table is that DB2 can apply the predicate before or after the join; a WHERE clause predicate
applied to the NULL-supplying table has a very different impact, in that it causes DB2 to simplify
the join if the predicate negates NULLs introduced by the outer join.
To explain what I mean by this, see Figure 13 , which shows an example of a WHERE clause
predicate that references the NULL-supplying table.

Left Outer Join -- join simplification

A WHERE clause predicate must evaluate to TRUE for a row to qualify. If there is an unmatched
row in the join, the columns from the NULL-supplying table are NULL. When DB2 compares the
WHERE clause predicate D.DEPTNAME NOT LIKE '%CENTER%' to a NULL, the result is neither TRUE
nor FALSE, but UNKNOWN. This row is not returned because it does not evaluate to TRUE.
Thus, NULLs that are supplied by the left outer join are negated by the WHERE clause predicate.
This causes DB2 to determine that a left outer join is unnecessary, and it causes the query to be
rewritten as an inner join, which may or may not be what you intended when you coded the query.
The advantage to having DB2 rewrite the left outer as an inner join is that performance might
improve. The predicate that is applied to table D can now be applied before the join, rather than
after, since it now references a non-NULL-supplying table. Preserved-row and NULL-supplying are
unrelated terms for inner joins; given that there will never be an unmatched row that is returned in
the join, that is, both tables are non-NULL-supplying.
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs

Page 2 of 14

ibm.com/developerWorks/

developerWorks

Preserving NULLs
If this outer join simplification does not give you the results you wanted -- that is, you require
that the NULL (or optional) rows be returned -- then use OR D.DEPTNAME IS NULL to preserve the
NULLs in the answer set.
An example of this is shown in Figure 14.

Left Outer Join: NULL preservation

DB2 must apply the WHERE clause predicate after the join, because it is not known until after
the join whether the row has been matched (and thus the first part of the predicate -- D.DEPTNAME
NOT LIKE '%CENTER%' -- applies) or unmatched (and thus the second part of the predicate -- OR
D.DEPTNAME IS NULL -- applies).

Before-join predicates
What happens if you choose to code the WHERE clause predicate to be applied by DB2 upon the
NULL-supplying table before the join?
If you do this, there is no WHERE clause predicate to limit rows on the preserved-row table or in
the final result. You are only limiting rows from the NULL-supplying table. Figure 15 shows the
result of this.

Meet the experts: Terry Purcell on coding predicates in outer


joins: Advanced outer join constructs

Page 3 of 14

developerWorks

ibm.com/developerWorks/

Predicates on NULL-supplying table

Figure 16 shows that if you recode the before-join predicate to an ON clause predicate, the same
results are returned.

Predicates on NULL-supplying table: simplified

Whether the rows from the NULL-supplying table are filtered before or during the join is a matter
for performance only, and DB2 decides this based on the join method used. For producing the
correct result, either method is valid. Because the row does not match in the join, removing the row
before the join does not affect the output.
DB2 is able (from Version 6 on) to merge any unnecessary nested table expressions such as
those on the NULL-supplying table ( Figure 15 ) to be rewritten as a single query block and applied
as a during-join or before-join predicate ( Figure 16 ).
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs

Page 4 of 14

ibm.com/developerWorks/

developerWorks

Full outer join NULL-supplying table predicates


Outer join simplification
Preserving NULLs
Before-join predicates

Outer join simplification


The rule for outer join simplification that is applicable to left and right outer joins is also valid for
full joins. A WHERE clause predicate applied to the NULL-supplying table that causes the NULLs
to be negated causes DB2 to simplify the join. Because both tables supply NULLs, this clause
negates NULLs regardless of which table it is applied to.
Figure 17 shows an example of a WHERE clause predicate applied to a full outer join, which
causes DB2 to simplify the join.

Full Outer Join: join simplification

Without join simplification, the predicate is applied as a totally-after-join predicate. Given that the
optimizer can determine that the predicate negates NULLs, it can rewrite the query as a left outer
join. A full outer join can be rewritten as either a left or right outer join, depending on the table join
sequence chosen.
Rewriting the query as a left outer join means that the WHERE predicate can now be applied as a
before-join predicate, resulting in fewer rows being joined. If the WHERE clause predicates negate
the NULLs from both tables, then simplification causes the query to be rewritten as an inner join.
Executing the query as an inner join allows predicates on both tables to be applied before the join.
Join simplification can be identified by the JOIN_TYPE column of the explain (plan table) output. A
value of 'L' indicates that the join has been simplified to a left outer join (as there is no runtime right
outer join), and a 'blank' indicates an inner join.
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs

Page 5 of 14

developerWorks

ibm.com/developerWorks/

Preserving NULLs
If you require that the NULLs from both tables be returned in the result set, alter the WHERE
clause predicate to ensure that they are not removed.
Figure 18 shows an example of a WHERE clause predicate that does preserves NULLs.
COALESCE returns the first value in the list that is not NULL. Thus, the WHERE comparison is
always against an actual value (unless the column is defined to allow NULLs).

Full Outer Join: NULL preservation

In the example shown in Figure 18 , the WHERE clause predicate is applied totally after the join,
because the WHERE predicate depends on columns from both tables.
Is there a better alternative?

Before-join predicates
You can get better performance by coding the query to apply these predicates before the join (as
shown in Figure 19 ).

Meet the experts: Terry Purcell on coding predicates in outer


joins: Advanced outer join constructs

Page 6 of 14

ibm.com/developerWorks/

developerWorks

Full Outer Join: NULL preservation alternative

Be careful when coding multiple outer joins


Missing rows
Finding missing rows

Missing rows
In Part one , I mentioned that obtaining the correct result is the major focus when writing any SQL
statement. And with multiple joins involving outer joins, it is very easy to accidentally lose rows
based upon the source of the join predicates.
Of course, I have already explained that columns from a NULL-supplied table will be NULL for
an unmatched row. If a column from a NULL-supplied table is referenced as a join predicate
in a subsequent join, then NULL will never equate, and thus a further match will not be made.
Without interrogating the data, there may be no indication that these rows are missing, because
it is possible that outer join simplification was not necessary by DB2. Outer join simplification is at
least identifiable in the plan table (based on JOIN_TYPE column).
Figure 20 shows an example of a NULL-supplied column that is used in a subsequent join.

Meet the experts: Terry Purcell on coding predicates in outer


joins: Advanced outer join constructs

Page 7 of 14

developerWorks

ibm.com/developerWorks/

Multiple Outer Joins: Missing Rows

Step 1 applies the before-join predicate. Step 2 performs the left outer join. The left outer join from
the department and employee tables (step 2) does not produce a matching row in this example.
Thus, when step 3 is performed (subsequent left outer join to the project table), the value of the
join predicate from the previous join is NULL, because it was from the NULL-supplied table. The
row is still preserved, however, because it is a left outer join. For an inner join, the row would not
have been preserved.

Finding missing rows


If you ensure that subsequent join predicates always refer to columns from preserved-row tables,
the true value will be available for the later join. This is an extremely important point to ensure that
the correct rows are returned in the join.
Figure 21 shows an example whereby the second join refers to a join predicate from the
preserved-row table.

Meet the experts: Terry Purcell on coding predicates in outer


joins: Advanced outer join constructs

Page 8 of 14

ibm.com/developerWorks/

developerWorks

Multiple Outer Joins: Finding Missing Rows

In this example, I have corrected the error from the previous example ( Figure 20 ). The resultant
data from step 2 (first left outer join) contains the actual value 'D01' from the preserved row table.
This value is used in step 3 (subsequent left outer join) to match against the project table, rather
than against NULL.
The dependence on join sequence for this query requires that the department table be accessed
TM

first. The DB2 for z/OS


optimizer (from Version 6 on) is able to determine the remaining
join sequences based on the lowest cost, rather than the coding sequence. Each of the joins
highlighted as step 2 and 3 depend on the department table only, and not on each other.
For full outer joins, it is more difficult to specify a join column from the preserved-row (or nonNULL-supplying) table, because both tables can supply NULLs. You must always specify the
COALESCE clause to ensure that a preserved value is always specified by the join. For example:
ON P.DEPTNO = COALESCE (D.DEPTNO, E.WORKDEPT)

Nesting outer joins


Outer join simplification
Preserving NULLs

Outer join simplification


I have described the situation in which a WHERE clause causes DB2 to simplify the outer join. But
there is also a situation in which the ON clause can cause this simplification to occur, and that is
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs

Page 9 of 14

developerWorks

ibm.com/developerWorks/

when an outer join is nested within another outer join. This type of simplification performed by DB2
is only briefly explained in the DB2 Application Programming and SQL Guide .
Nesting joins appear to defy the following rule for coding an ON clause: the ON clause must
immediately succeed the join for which it relates. This is shown in Figure 22 .

Outer Join Nesting -- join simplification

In Figure 22 , table D is"left joined" to the result of the full outer join of table P and E. Thus, the
ON clause dictates that the join between P and E occurs first. This becomes more logical if you
strategically place parentheses and indent the query for readability, such as:
SELECT D.DEPTNO, D.DEPTNAME, P.DEPTNO, P.PROJNAME, E.EMPNO
FROMDEPARTMENT D LEFT OUTER JOIN
(PROJECT P FULL OUTER JOIN EMPLOYEE EM
ONP.DEPTNO = E.WORKDEPT)
OND.DEPTNO = P.DEPTNO
ORDER BY D.DEPTNO
The parentheses more easily show that the full outer join is performed first, and that the resultant
table is the right table of a left outer join.
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs

Page 10 of 14

ibm.com/developerWorks/

developerWorks

After the full outer join (step 1), the result contains NULLs that are supplied from either the left or
right table for any unmatched rows. Next, the Department table is left joined to that result set (step
2). For step 2, there are no rows in D that can equal the supplied NULLs, so these rows are not
joined. The consequence is that the NULLs are negated by the final ON clause. Therefore, DB2
realizes that there is no requirement for a full join to be coded initially.
The full outer join from the previous example can be rewritten by the DB2 optimizer as a left outer
join (as in Figure 23 ).

Outer Join Nesting: query rewrite

Now there are no NULL-supplied rows negated by the ON clause.

Preserving NULLs
To avoid DB2 performing this simplification, or more importantly, to maintain all rows introduced by
the full outer join from Figure 22 , use COALESCE to ensure that the resultant join column is not
NULL.

Meet the experts: Terry Purcell on coding predicates in outer


joins: Advanced outer join constructs

Page 11 of 14

developerWorks

ibm.com/developerWorks/

Outer Join Nesting: NULL preservation

The presence of COALESCE ensures that the subsequent join column cannot be NULL.
COALESCE dictates that when the column is NULL-supplied take the value from the preserved
row table. For example, when the left table is preserved, the right is NULL-supplied, and vice
versa.
Now, because of COALESCE, there aren't any NULLs introduced on the subsequent join column,
and thus no rows are inadvertently negated by the ON clause.
Important: As I mentioned earlier, any subsequent table join after a full join must contain the
COALESCE in the ON clause to ensure that NULL-supplied rows are not negated by the join.

Recapping what we've learned about predicates


From the examples shown in Part 1 and 2 of this article, we can draw the following conclusions
about how DB2 evaluates predicates:
WHERE clause predicates:
Applied to a preserved-row table:
Filter rows as either:
Before-join predicates
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs

Page 12 of 14

ibm.com/developerWorks/

developerWorks

After-join-step predicates
Totally-after-join predicates.
Explicitly coded in a nested table expression can be written in the WHERE clause.
Applied to a NULL-supplying table:
Optimizer simplifies the join, unless the predicate includes either OR colx IS NULL or
COALESCE to preserve NULLs.
Explicitly coded in a nested table expression can be written in ON clause.
ON clause predicates:
Only filter rows on the NULL-supplying table as with during-join predicates.
Do not filter rows on the preserved-row table.

Conclusion
In this part of my discussion about outer joins I specifically focused coding predicates on the
NULL-supplied table.
On the top of the list was outer join simplification, whereby a SQL coder inadvertently writes a
WHERE or ON clause predicate that negates NULL-supplied rows. If a user explicitly codes the
outer join, then it is unlikely that it was intentional to negate these rows. Of course, producing the
correct result is the objective when coding any SQL statement.
With this newfound knowledge I hope that you are able to write your outer join queries to produce
the result you expect, with the performance you desire. If nothing else I hope to have opened your
eyes to the power of DB2's implementation of outer joins and given you something to think about
when you code your queries.

Meet the experts: Terry Purcell on coding predicates in outer


joins: Advanced outer join constructs

Page 13 of 14

developerWorks

ibm.com/developerWorks/

About the author


Terry Purcell
Terry Purcell is a Senior Consultant with YL&A (Yevich, Lawson & Associates) and
is an industry recognized authority in DB2 SQL. Terry has presented at numerous
conferences on complex SQL and published many articles on SQL performance
including an IBM white paper entitled "Evolution of Star Joins." He is a regular author
for the DB2 Performance Journal and frequent contributor to the DB2 List Server.
Terry has been working with DB2 for over 10 years in both database administration
and application development. He is also an IBM Certified Solutions Expert in DB2
V7 Database Administration for OS/390 and a member of the IBM Gold Consultants
program for DB2. Terry can be reached at Terry_Purcell@ylassoc.com.
Copyright IBM Corporation 2002
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)

Meet the experts: Terry Purcell on coding predicates in outer


joins: Advanced outer join constructs

Page 14 of 14

Вам также может понравиться