Вы находитесь на странице: 1из 8

Tuning SQL Statements That Use the IN Operator

Roger Schrag, Database Specialists, Inc.


A product manager at a software development company called me recently in a panic. His team had built an
application that runs against Oracle databases, and in their test environment the application offered good response
time. Then a key customer installed the software on their system and discovered that mission critical queries that
should return results in under a second were taking over a minute to complete.
The product manager, suspecting the customers database was not properly tuned or configured, asked me to
examine the system. Gaining access to the customers database was at first politically touchy, but in the end I got
excellent cooperation from the customers DBA staff and found their database was in fine shape.
I quickly switched to my application developer hat as I traced sessions on an active database and ran Tkprof to
collect information about what was going on inside the customers Oracle server. After an hour of generating trace
files and looking at the results, the problem was clear: The application was using the IN operator in its SQL to an
extreme I had never seen before, and the customers Oracle 7.3 database didnt take kindly to this.
The software development company, which Ill refer to as Acme Software, had used an Oracle 7.2 database inhouse to build the application. It turned out that Oracle 7.2 handled this applications far-out use of the IN operator
somewhat better than Oracle 7.3. Also clouding the issue was a flaw in Acmes testing strategy that made it appear
as if the application ran very fast on the Oracle 7.2 database when in reality performance was still not so hot.
So at the end of the day Acme had built an application whose core queries ran in five seconds against an Oracle 7.2
database, 70 seconds against an Oracle 7.3 database, and under one second against an Oracle 8.0 database. All three
databases had the exact same data and ran on comparable hardware. My mission was to explain to the product
manager why different releases of Oracle would give such varied results and, more importantly, help his developers
get the application to run quickly on all releases of Oracle.

The IN Operator
The IN operator in SQL gives you the ability to allow for multiple conditions in one SQL statement. The following
query, for example, will give you the order numbers of all orders that are either canceled or invalid:
SELECT order_number
FROM
orders
WHERE status IN (canceled, invalid)

The list of values enclosed in the parentheses is called an inlist. The above example had an inlist with two values in
it. The following two queries will list the names of all employees in departments 10, 20 and 30. Both queries yield
the same resultsthe IN operator and the = ANY operator mean the same thing to the Oracle server:
SELECT ename
FROM
emp
WHERE deptno IN (10, 20, 30)
SELECT ename
FROM
emp
WHERE deptno = ANY (10, 20, 30)

In each of these queries, the IN operator has been used to select data based on multiple constant values (order status
codes in the first query and department numbers in the second and third). The IN operator can also be used with
subqueries. The query below retrieves names of all employees in departments where the name of the department
starts with the word Shoes:
SELECT ename

FROM
WHERE

emp
deptno
(
SELECT
FROM
WHERE
)

IN
deptno
dept
deptname LIKE Shoes %

IN operators used with subqueries can often be rewritten using a join between the two tables instead. For example,
the previous query is identical to:
SELECT
FROM
WHERE
AND

emp.ename
dept, emp
dept.deptname LIKE Shoes %
emp.deptno = dept.deptno

The query optimizer in the Oracle server is well aware of this trick, and will usually process IN operators with
subqueries as joins instead wherever possible. This has been true since the Oracle V6 days. Developers still use the
IN operator with subqueries, though, because sometimes it makes the code easier to understand. Also, there are
times when this use of the IN operator is not the same as an ordinary join.

How Oracle 7 Processes IN Operators


So far weve looked at examples of how the IN operator can be used to handle multiple conditions in one query
either from a list of enumerated constants or a subquery that returns multiple values. But how does Oracle actually
process the IN operator when executing a SQL statement? The answer can depend on which release of Oracle you
are using, and whether the IN operator is being used with a subquery or enumerated values.
First lets look at the scenario where the IN operator is being used with a subquery. This is the more straightforward
case, and seems to be consistent across all releases of Oracle. Fig. 1 shows the same query used in a previous
example, along with its execution plan. You can see that the Oracle server performs the query as if it were a simple
join between the dept and emp tables: First Oracle finds the departments with names starting with the word Shoes
and then it finds all of the employees in these departments.
SELECT ename
FROM
emp
WHERE deptno
(
SELECT
FROM
WHERE
)

IN
deptno
dept
deptname LIKE Shoes %

Execution Plan
------------------------------------------------------SELECT STATEMENT
NESTED LOOPS
TABLE ACCESS BY ROWID (DEPT)
INDEX RANGE SCAN (DEPT_NAME_IDX)
TABLE ACCESS BY ROWID (EMP)
INDEX RANGE SCAN (EMP_FK_DEPT_IDX)

Fig. 1: A query using the IN operator with a subquery, and its execution plan.
Next lets look at how Oracle processes IN operators where constant values are enumerated in the inlist instead of a
subquery. Fig. 2 shows the same query used in the very first example along with its execution plan. In this query
we wish to retrieve the order numbers of all orders that have been canceled or are invalid.
SELECT order_number
FROM
orders
WHERE status IN (canceled, invalid)

Execution Plan
------------------------------------------------------SELECT STATEMENT
CONCATENATION
TABLE ACCESS BY ROWID (ORDERS)
INDEX RANGE SCAN (ORDER_STATUS_IDX)
TABLE ACCESS BY ROWID (ORDERS)
INDEX RANGE SCAN (ORDER_STATUS_IDX)

Fig. 2: A simple query using the IN operator with a list of constant values, and the resulting execution plan.
If we take it as a given that the status column in the orders table is indexed, you might expect Oracle to simply scan
the status index twice (once for the value canceled and once for the value invalid), and then combine the results.
In a roundabout way, that is in fact exactly what Oracle is doing. First, the Oracle server has transformed the
original query into the following:
SELECT order_number
FROM
orders
WHERE (status = canceled OR status = invalid)

Next, Oracle transforms the query further as follows:


SELECT order_number
FROM
orders
WHERE status = canceled
UNION ALL
SELECT order_number
FROM
orders
WHERE status = invalid

This second transformation should help you understand the execution plan shown in Fig. 2. Conceptually, the
Oracle server is doing about what we would expect: It is looking up all canceled orders and all invalid orders and
combining the results together. Oracle calls this combining action a concatenation.
But suppose the query were more complex. Fig. 3 shows a more sophisticated query using the IN operator with
enumerated values in the inlist, and the resulting execution plan. In this query we are retrieving more information
about canceled and invalid orders than just the order number as in the previous example. Because of the way Oracle
7 transforms the IN operator into an OR clause and subsequently into a compound SQL statement using the UNION
ALL set operator, you can see that the execution plan doubles in size for a query with an IN operator that has two
constant values in its inlist.
SELECT
FROM
WHERE
AND

ORD.order_id, ORD.order_number,
ORD.total, CUST.customer_name
orders ORD, customers CUST
ORD.status IN (canceled, invalid)
CUST.customer_id = ORD.customer_id

Execution Plan
------------------------------------------------------SELECT STATEMENT
CONCATENATION
NESTED LOOPS
TABLE ACCESS BY ROWID (ORDERS)
INDEX RANGE SCAN (ORDER_STATUS_IDX)
TABLE ACCESS BY ROWID (CUSTOMERS)
INDEX UNIQUE SCAN (CUSTOMERS_PK)
NESTED LOOPS
TABLE ACCESS BY ROWID (ORDERS)
INDEX RANGE SCAN (ORDER_STATUS_IDX)
TABLE ACCESS BY ROWID (CUSTOMERS)
INDEX UNIQUE SCAN (CUSTOMERS_PK)

Fig. 3: A more complex query using the IN operator with a list of constant values, and the resulting execution

plan.
As you can imagine, Oracle 7s strategy for processing SQL statements with IN operators and constant values works
well for small inlists, but is not very scaleable. In fact Oracle 7 will return an ORA-01795 error if the inlist for an
IN operator contains more than 254 values.
With this in mind, lets go back to the core queries in Acmes application. These queries, it turns out, hinge on a four
table join and an IN operator with 200 constant values enumerated in the inlist. The execution plan for this query on
their Oracle 7.2 database was over 2000 lines long!
Fig. 4a shows a query similar to one of Acmes core queries with the extreme IN operators. Fig. 4b shows the
execution plan generated by an Oracle 7.2 database. Part of it anyway; you dont need to see all fifty pages to get
the idea!
SELECT
FROM
WHERE
AND
AND
AND

ORD.order_id, ORD.order_number,
ORD.status, CUST.customer_name,
CAT.category, SHIP.description
orders ORD, customers CUST,
customer_categories CAT,
shipping_methods SHIP
CUST.customer_id = ORD.customer_id
CAT.category_id = CUST.category_id
SHIP.shipping_method_id =
ORD.shipping_method_id
ORD.order_id IN
(
385, 409, 507, 511, 601, 633, 641, 651,
715, 799, 972, 984, 1038, 1074, 1122, 1135,
1143, 1189, 1241, 1242, 1370, 1442, 1565, 1581,
1593, 1614, 1803, 1881, 1890, 1899, 1943, 2015,
2049, 2217, 2269, 2389, 2409, 2411, 2486, 2527,
2576, 2696, 2704, 2759, 2768, 2846, 2924, 2927,
2973, 3129, 3138, 3162, 3181, 3252, 3341, 3357,
3383, 3503, 3551, 3738, 3741, 3753, 3822, 3864,
3914, 4090, 4096, 4185, 4221, 4236, 4263, 4271,
4278, 4305, 4396, 4465, 4485, 4570, 4578, 4607,
4656, 4742, 4777, 4799, 4824, 4838, 4940, 5030,
5134, 5137, 5199, 5302, 5360, 5531, 5569, 5592,
5787, 5946, 5986, 6051, 6161, 6191, 6193, 6196,
6261, 6392, 6465, 6597, 6611, 6725, 6782, 6793,
6813, 6822, 6843, 6896, 6995, 7037, 7076, 7141,
7305, 7307, 7327, 7374, 7387, 7410, 7424, 7442,
7443, 7473, 7488, 7496, 7579, 7637, 7662, 7664,
7748, 7830, 7891, 7902, 7947, 7997, 8008, 8092,
8130, 8180, 8224, 8304, 8351, 8400, 8466, 8475,
8480, 8491, 8509, 8572, 8592, 8704, 8766, 8775,
8860, 8897, 8965, 9024, 9034, 9052, 9083, 9089,
9102, 9107, 9230, 9269, 9358, 9412, 9423, 9429,
9490, 9507, 9530, 9581, 9591, 9730, 9768, 9770,
9771, 9772, 9784, 9791, 9854, 9858, 9929, 9964,
9974, 9982, 9987, 9996, 9998,10003,10004,10092
)

Fig. 4a: A query using the IN operator with a long inlist of constant values.

Execution Plan
------------------------------------------------------SELECT STATEMENT
CONCATENATION
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS BY ROWID (ORDERS)
INDEX UNIQUE SCAN (ORDERS_PK)
TABLE ACCESS BY ROWID (SHIPPING_METHODS)
INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)
TABLE ACCESS BY ROWID (CUSTOMERS)
INDEX UNIQUE SCAN (CUSTOMERS_PK)
TABLE ACCESS BY ROWID (CUSTOMER_CATEGORIES)
INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS BY ROWID (ORDERS)
INDEX UNIQUE SCAN (ORDERS_PK)
TABLE ACCESS BY ROWID (SHIPPING_METHODS)
INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)
TABLE ACCESS BY ROWID (CUSTOMERS)
INDEX UNIQUE SCAN (CUSTOMERS_PK)
TABLE ACCESS BY ROWID (CUSTOMER_CATEGORIES)
INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)
. . .

Fig. 4b: The execution plan generated by an Oracle 7.2 database for the query shown in Fig. 4a.
With release 7.3, Oracle apparently altered the optimizers strategy for transforming SQL statements with large
inlists. Instead of transforming one query into 200 queries and using UNION ALL to bring the results together,
Oracle 7.3 will often ignore the relevant index and instead perform one full table scan. In this scenario, Oracle reads
every row of data from the table and compares each to all values listed in the inlist of the IN operator. If the
underlying table is large, this full table scan can clobber performance.
To illustrate this point, Fig. 4c shows the execution plan generated by an Oracle 7.3 database for the query shown in
Fig. 4a. Instead of looking up each of the 200 order_id values in the primary key index on the orders table, the
Oracle 7.3 server has elected to perform a full scan of the orders table, finding all rows that have any of the 200
values in the order_id column, and then join the results to the other three tables.
Execution Plan
------------------------------------------------------SELECT STATEMENT
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS FULL (ORDERS)
TABLE ACCESS BY ROWID (CUSTOMERS)
INDEX UNIQUE SCAN (CUSTOMERS_PK)
TABLE ACCESS BY ROWID (CUSTOMER_CATEGORIES)
INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)
TABLE ACCESS BY ROWID (SHIPPING_METHODS)
INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)

Fig. 4c: The execution plan generated by an Oracle 7.3 database for the query shown in Fig. 4a.
Its not clear exactly where Oracle 7.3 draws the line. If the inlist is quite small (only a few constant values) then
Oracle 7.3 will use the same transformation strategy as Oracle 7.2. But when the inlist is large (over perhaps 30 or
40 enumerated values) the full table scan seems to be used instead. You can force Oracle to use the transformation
strategy, regardless of software release and inlist size, by using the USE_CONCAT optimizer hint.

Enter Oracle 8

Oracle 8 seems to handle IN operators with subqueries in much the same way that Oracle 7 and V6 did. This was
already very efficient and needed no improvement. But for IN operators with many constant values in the inlist,
Oracle 8 introduces the new inlist iterator data access path. This new method in the query optimizer makes
handling of large inlists of constants extremely efficient and intuitive. Fig. 4d shows the execution plan generated
by an Oracle 8.0 database for the query shown in Fig. 4a.
Execution Plan
------------------------------------------------------SELECT STATEMENT
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
INLIST ITERATOR CONCATENATED
TABLE ACCESS BY INDEX ROWID (ORDERS)
INDEX RANGE SCAN (ORDERS_PK)
TABLE ACCESS BY INDEX ROWID (CUSTOMERS)
INDEX UNIQUE SCAN (CUSTOMERS_PK)
TABLE ACCESS BY INDEX ROWID (CUSTOMER_CATEGORIES)
INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)
TABLE ACCESS BY INDEX ROWID (SHIPPING_METHODS)
INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)

Fig. 4d: The execution plan generated by an Oracle 8.0 database for the query shown in Fig. 4a.
With the inlist iterator of Oracle 8, there is no transforming of the SQL statement into something far more complex
than the original. And there is also no setting aside indexes in favor of full table scans. Instead, the Oracle 8 server
just does what you would expect: It walks down the list of constant values spelled out in the inlist and looks each
one up in the index, collects matching rows from the table, and moves on.
There is no hint to turn on the inlist iterator data access path. The optimizer will simply use it when appropriate. If
for some reason you need Oracle 8 to mimic the Oracle 7 behavior, you can use the USE_CONCAT hint to force the
UNION ALL transformation.

Back to the Original Puzzle


Using the query shown in Fig. 4a one can now explain the puzzling results Acme Software observed. Their in-house
Oracle 7.2 database transformed the core queries in their application into extremely complex statements as
demonstrated in Fig. 4b. It took their Oracle server about five seconds to parse such a query and build the
enormous execution plan. Actually executing and fetching the query, though, proved fast because only unique index
scans and concatenations were involved.
The first time Acme ran a regression test of their application, the core query took five seconds because of the long
parse. Subsequent runs took under one second because the SQL statement was already in the shared pool and did
not need to be parsed. So Acme had concluded that the query ran very fast in their regression test. (This was
actually a flawed conclusion because in real life the criteria would be slightly different on each invocation, and so
the five second parse would apply each time an end-user initiated a search.)
On an Oracle 7.3 database, meanwhile, the query always took approximately 70 seconds. This was directly
attributable to the full table scan of the orders tableone of the largest tables in their databaseas demonstrated in
Fig. 4c.
And finally on the Oracle 8.0 database, the query always ran in less than one second because the new inlist iterator
data access path put an end to complex transformations and full table scans.
All of this served to explain why Acmes application performed differently on different Oracle releases. But the real
interest was in getting the application to run fast on any release of Oracle. One approach would be to add a
USE_CONCAT optimizer hint to the query in Fig. 4a to cause Oracle releases 7.2, 7.3, and 8.0 to all process the
query in the same way (as shown in Fig. 4b). This could benefit Oracle 7.3 users, but would significantly harm

Oracle 8 users. And at five seconds per query, the performance would not be very satisfying.
A better solution was to do away with the enormous inlists altogether. After some research, it turned out that the
application in question was a PowerBuilder screen where an end-user could query order information from the
database based on many different types of criteria. Instead of building one query to retrieve order information based
on the end-users criteria, the application worked by retrieving order_ids of orders meeting the criteria and then
retrieving the orders themselves using a second query along the lines of the query shown in Fig. 4a.
Breaking the operation into two separate SQL statements like this perhaps made the application easier for the
developer to build, but offered no other benefits. At the same time, this strategy came at a high performance cost for
Oracle 7.2 and 7.3 users. Acme ultimately got the application to run faster on all releases of Oracle by merging the
two SQL statements together and replacing queries like the one in Fig. 4a with dynamically generated queries such
as the one shown in Fig. 5.
SELECT
FROM
WHERE
AND
AND
AND
AND
AND

ORD.order_id, ORD.order_number,
ORD.status, CUST.customer_name,
CAT.category, SHIP.description
orders ORD, customers CUST,
customer_categories CAT,
shipping_methods SHIP
CUST.customer_id = ORD.customer_id
CAT.category_id = CUST.category_id
SHIP.shipping_method_id =
ORD.shipping_method_id
ORD.order_date BETWEEN :date_lo AND :date_hi
ORD.status = :order_status
ORD.total BETWEEN :total_lo AND :total_hi

Fig. 5: A query to efficiently retrieve orders based on ad hoc search criteria.

Conclusion
The IN operator allows one SQL statement to accommodate multiple possible conditions. These conditions could
come from a subquery that retrieves multiple values or from an enumerated list of constant values. IN operators
with subqueries are a very powerful tool in the SQL language. IN operators with relatively small numbers of
constant values can also be very useful.
The Oracle server has handled IN operators that use subqueries in a very efficient manner since the days of Oracle
V6. Oracle 7 handles IN operators with a small set of constant values in a reasonably efficient manner, but it is not
until Oracle 8 that you can use arbitrarily large numbers of constant values in an inlist without paying performance
penalties.
Until I began troubleshooting Acme Softwares application, the idea of using the IN operator with large numbers of
constant values in the inlist had never occurred to me. As was the case with Acmes application, I believe that the
large inlists are often unnecessary. But if such uses of the IN operator are something you feel you cannot live
without, then Oracle 8 has you covered. Still using Oracle 7? Then try to eliminate your giant inlists or consider an
upgrade to Oracle 8.

References
Oracle 7 Server SQL Reference, Oracle Corporation
Oracle 7 Server Tuning, Oracle Corporation
Oracle 8 Server Tuning, Oracle Corporation
Oracle Performance Tuning, OReilly & Associates, Inc.

About the Author


Roger Schrag has been an Oracle DBA and application architect for over nine years, starting out at Oracle

Corporation on the Oracle Financials development team. He is currently a senior consultant at Database Specialists,
Inc., a small consulting group specializing in business solutions based on Oracle technology. You can visit Database
Specialists on the web at www.dbspecialists.com, and you can reach Roger by calling +1.415.841.1429 or via email
at rschrag@dbspecialists.com.

Вам также может понравиться