Академический Документы
Профессиональный Документы
Культура Документы
17 January 2002
In this second of two parts, Terry Purcell, an IBM Gold Consultant with Yevich, Lawson, and
Associates, and the the winner of the Best Overall Speaker Award at the 2001 IDUG Asia
Pacific conference in Canberra, Australia, offers more advice on coding predicates in outer
joins.
View more content in this series
2002 International Business Machines Corporation. All rights reserved.
Introduction
This is part two of my column in which I attempt to make it easier for you to understand and use
the powerful outer join feature of the SQL language.
Part one provided a simple comparison of inner and outer joins, and also introduced new
terminology for explaining the tables in the outer join operations. Finally, I discussed the different
predicate types and at what stages DB2 is able to apply them.
In this part, I will provide background on coding predicates on the NULL-supplying table, how DB2
may simplify your query to improve performance, and what steps you can take to ensure that you
get the results you expect.
Trademarks
Page 1 of 14
developerWorks
ibm.com/developerWorks/
A WHERE clause predicate must evaluate to TRUE for a row to qualify. If there is an unmatched
row in the join, the columns from the NULL-supplying table are NULL. When DB2 compares the
WHERE clause predicate D.DEPTNAME NOT LIKE '%CENTER%' to a NULL, the result is neither TRUE
nor FALSE, but UNKNOWN. This row is not returned because it does not evaluate to TRUE.
Thus, NULLs that are supplied by the left outer join are negated by the WHERE clause predicate.
This causes DB2 to determine that a left outer join is unnecessary, and it causes the query to be
rewritten as an inner join, which may or may not be what you intended when you coded the query.
The advantage to having DB2 rewrite the left outer as an inner join is that performance might
improve. The predicate that is applied to table D can now be applied before the join, rather than
after, since it now references a non-NULL-supplying table. Preserved-row and NULL-supplying are
unrelated terms for inner joins; given that there will never be an unmatched row that is returned in
the join, that is, both tables are non-NULL-supplying.
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs
Page 2 of 14
ibm.com/developerWorks/
developerWorks
Preserving NULLs
If this outer join simplification does not give you the results you wanted -- that is, you require
that the NULL (or optional) rows be returned -- then use OR D.DEPTNAME IS NULL to preserve the
NULLs in the answer set.
An example of this is shown in Figure 14.
DB2 must apply the WHERE clause predicate after the join, because it is not known until after
the join whether the row has been matched (and thus the first part of the predicate -- D.DEPTNAME
NOT LIKE '%CENTER%' -- applies) or unmatched (and thus the second part of the predicate -- OR
D.DEPTNAME IS NULL -- applies).
Before-join predicates
What happens if you choose to code the WHERE clause predicate to be applied by DB2 upon the
NULL-supplying table before the join?
If you do this, there is no WHERE clause predicate to limit rows on the preserved-row table or in
the final result. You are only limiting rows from the NULL-supplying table. Figure 15 shows the
result of this.
Page 3 of 14
developerWorks
ibm.com/developerWorks/
Figure 16 shows that if you recode the before-join predicate to an ON clause predicate, the same
results are returned.
Whether the rows from the NULL-supplying table are filtered before or during the join is a matter
for performance only, and DB2 decides this based on the join method used. For producing the
correct result, either method is valid. Because the row does not match in the join, removing the row
before the join does not affect the output.
DB2 is able (from Version 6 on) to merge any unnecessary nested table expressions such as
those on the NULL-supplying table ( Figure 15 ) to be rewritten as a single query block and applied
as a during-join or before-join predicate ( Figure 16 ).
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs
Page 4 of 14
ibm.com/developerWorks/
developerWorks
Without join simplification, the predicate is applied as a totally-after-join predicate. Given that the
optimizer can determine that the predicate negates NULLs, it can rewrite the query as a left outer
join. A full outer join can be rewritten as either a left or right outer join, depending on the table join
sequence chosen.
Rewriting the query as a left outer join means that the WHERE predicate can now be applied as a
before-join predicate, resulting in fewer rows being joined. If the WHERE clause predicates negate
the NULLs from both tables, then simplification causes the query to be rewritten as an inner join.
Executing the query as an inner join allows predicates on both tables to be applied before the join.
Join simplification can be identified by the JOIN_TYPE column of the explain (plan table) output. A
value of 'L' indicates that the join has been simplified to a left outer join (as there is no runtime right
outer join), and a 'blank' indicates an inner join.
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs
Page 5 of 14
developerWorks
ibm.com/developerWorks/
Preserving NULLs
If you require that the NULLs from both tables be returned in the result set, alter the WHERE
clause predicate to ensure that they are not removed.
Figure 18 shows an example of a WHERE clause predicate that does preserves NULLs.
COALESCE returns the first value in the list that is not NULL. Thus, the WHERE comparison is
always against an actual value (unless the column is defined to allow NULLs).
In the example shown in Figure 18 , the WHERE clause predicate is applied totally after the join,
because the WHERE predicate depends on columns from both tables.
Is there a better alternative?
Before-join predicates
You can get better performance by coding the query to apply these predicates before the join (as
shown in Figure 19 ).
Page 6 of 14
ibm.com/developerWorks/
developerWorks
Missing rows
In Part one , I mentioned that obtaining the correct result is the major focus when writing any SQL
statement. And with multiple joins involving outer joins, it is very easy to accidentally lose rows
based upon the source of the join predicates.
Of course, I have already explained that columns from a NULL-supplied table will be NULL for
an unmatched row. If a column from a NULL-supplied table is referenced as a join predicate
in a subsequent join, then NULL will never equate, and thus a further match will not be made.
Without interrogating the data, there may be no indication that these rows are missing, because
it is possible that outer join simplification was not necessary by DB2. Outer join simplification is at
least identifiable in the plan table (based on JOIN_TYPE column).
Figure 20 shows an example of a NULL-supplied column that is used in a subsequent join.
Page 7 of 14
developerWorks
ibm.com/developerWorks/
Step 1 applies the before-join predicate. Step 2 performs the left outer join. The left outer join from
the department and employee tables (step 2) does not produce a matching row in this example.
Thus, when step 3 is performed (subsequent left outer join to the project table), the value of the
join predicate from the previous join is NULL, because it was from the NULL-supplied table. The
row is still preserved, however, because it is a left outer join. For an inner join, the row would not
have been preserved.
Page 8 of 14
ibm.com/developerWorks/
developerWorks
In this example, I have corrected the error from the previous example ( Figure 20 ). The resultant
data from step 2 (first left outer join) contains the actual value 'D01' from the preserved row table.
This value is used in step 3 (subsequent left outer join) to match against the project table, rather
than against NULL.
The dependence on join sequence for this query requires that the department table be accessed
TM
Page 9 of 14
developerWorks
ibm.com/developerWorks/
when an outer join is nested within another outer join. This type of simplification performed by DB2
is only briefly explained in the DB2 Application Programming and SQL Guide .
Nesting joins appear to defy the following rule for coding an ON clause: the ON clause must
immediately succeed the join for which it relates. This is shown in Figure 22 .
In Figure 22 , table D is"left joined" to the result of the full outer join of table P and E. Thus, the
ON clause dictates that the join between P and E occurs first. This becomes more logical if you
strategically place parentheses and indent the query for readability, such as:
SELECT D.DEPTNO, D.DEPTNAME, P.DEPTNO, P.PROJNAME, E.EMPNO
FROMDEPARTMENT D LEFT OUTER JOIN
(PROJECT P FULL OUTER JOIN EMPLOYEE EM
ONP.DEPTNO = E.WORKDEPT)
OND.DEPTNO = P.DEPTNO
ORDER BY D.DEPTNO
The parentheses more easily show that the full outer join is performed first, and that the resultant
table is the right table of a left outer join.
Meet the experts: Terry Purcell on coding predicates in outer
joins: Advanced outer join constructs
Page 10 of 14
ibm.com/developerWorks/
developerWorks
After the full outer join (step 1), the result contains NULLs that are supplied from either the left or
right table for any unmatched rows. Next, the Department table is left joined to that result set (step
2). For step 2, there are no rows in D that can equal the supplied NULLs, so these rows are not
joined. The consequence is that the NULLs are negated by the final ON clause. Therefore, DB2
realizes that there is no requirement for a full join to be coded initially.
The full outer join from the previous example can be rewritten by the DB2 optimizer as a left outer
join (as in Figure 23 ).
Preserving NULLs
To avoid DB2 performing this simplification, or more importantly, to maintain all rows introduced by
the full outer join from Figure 22 , use COALESCE to ensure that the resultant join column is not
NULL.
Page 11 of 14
developerWorks
ibm.com/developerWorks/
The presence of COALESCE ensures that the subsequent join column cannot be NULL.
COALESCE dictates that when the column is NULL-supplied take the value from the preserved
row table. For example, when the left table is preserved, the right is NULL-supplied, and vice
versa.
Now, because of COALESCE, there aren't any NULLs introduced on the subsequent join column,
and thus no rows are inadvertently negated by the ON clause.
Important: As I mentioned earlier, any subsequent table join after a full join must contain the
COALESCE in the ON clause to ensure that NULL-supplied rows are not negated by the join.
Page 12 of 14
ibm.com/developerWorks/
developerWorks
After-join-step predicates
Totally-after-join predicates.
Explicitly coded in a nested table expression can be written in the WHERE clause.
Applied to a NULL-supplying table:
Optimizer simplifies the join, unless the predicate includes either OR colx IS NULL or
COALESCE to preserve NULLs.
Explicitly coded in a nested table expression can be written in ON clause.
ON clause predicates:
Only filter rows on the NULL-supplying table as with during-join predicates.
Do not filter rows on the preserved-row table.
Conclusion
In this part of my discussion about outer joins I specifically focused coding predicates on the
NULL-supplied table.
On the top of the list was outer join simplification, whereby a SQL coder inadvertently writes a
WHERE or ON clause predicate that negates NULL-supplied rows. If a user explicitly codes the
outer join, then it is unlikely that it was intentional to negate these rows. Of course, producing the
correct result is the objective when coding any SQL statement.
With this newfound knowledge I hope that you are able to write your outer join queries to produce
the result you expect, with the performance you desire. If nothing else I hope to have opened your
eyes to the power of DB2's implementation of outer joins and given you something to think about
when you code your queries.
Page 13 of 14
developerWorks
ibm.com/developerWorks/
Page 14 of 14