
The Impact of Missing Partition Level Column Stats on Partition Key

EDWLITE has not had a long-running report query for a while. Let's check what this one is doing; it has been running for more than 10 hours.

1. Realtime tracking screen, Stats Tab of the SQL panel.
2. Review the SQL elapsed time (ELAPSED_MS) against the other time columns.
3. The majority of the elapsed time has been on CPU (46246215/47063224 = 98%). High CPU time is frequently caused by high cache access (memory access), although not always.
4. LIO buffer_gets is 2B (2,115,581,456).
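As a cross-check, the same numbers can be pulled straight from gv$sql in SQL*Plus. A minimal sketch, assuming the SQL_ID is known (&sql_id is a placeholder):

    select inst_id, sql_id, child_number,
           round(elapsed_time/1e6) elapsed_s,  -- gv$sql stores time in microseconds
           round(cpu_time/1e6)     cpu_s,
           buffer_gets,                        -- logical I/O (LIO)
           disk_reads,
           direct_writes
    from   gv$sql
    where  sql_id = '&sql_id'
    order  by inst_id, child_number;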

High DIRECT_WRITES is usually caused by a large HASH JOIN or SORT operation.

1. Some systems allow a parallel operation to cross multiple instances, so it is better to check the SQL stats on all instances.
2. A PX SQL execution can have more than one child cursor: one for the QC and one or more for the slave processes. In that case, it is better to blank out the Cursor# text box.
3. Multiple slave cursors can be generated for a parallel DDL operation across multiple instances.
4. In this case, the majority of the activity is on node 4.
5. Also, two PX slave processes have already finished their tasks.

1. Understanding the business meaning and the schema characteristics can significantly help troubleshooting and tuning. Here the only thing I can do is guess.
2. This SQL queries data for a list of customer IDs. Usually this should be the start of the query plan.
3. This SQL queries a fact table for a time range, and the time range is specified inside a DATE/TIME dimension table. This is a typical pattern of some Y! report queries and a constant source of trouble, especially in 10G when dealing with partition pruning.
4. We can use the Related Table/Index Info tab to research the table and index structures and stats, to get a sense of how this query could run.

1. For 11G, v$sql_monitor and v$sql_plan_monitor are always our best friends for troubleshooting.
2. This information is usually kept for only a short period after the SQL completes. In any case, when researching a problematic SQL (not necessarily long running like this one), try to gather information from both views as early as possible. You can use the Plan Monitor tab for this purpose: SQL Monitor View is for gv$sql_monitor, Summary View is for the aggregated gv$sql_plan_monitor, and Detail View is for every session inside gv$sql_plan_monitor for the SQL in question (a plain-SQL sketch of both views follows below).
3. In this case, we only get v$sql_monitor information from one session. The other sessions might have finished long ago or idled for too long, so their in-memory info has been purged. We already know two PX sessions have finished. It is also interesting to research why a long running SQL with PX shows activity on only a single session: it could be caused by data skew, or some steps may not be running in PX.
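A minimal plain-SQL sketch of what these views expose, assuming the SQL_ID is known (column lists trimmed):

    -- One row per monitored session (QC and PX slaves)
    select inst_id, sid, process_name, status,
           elapsed_time, cpu_time, buffer_gets
    from   gv$sql_monitor
    where  sql_id = '&sql_id';

    -- Per-plan-line stats, aggregated across all sessions (the "Summary View")
    select plan_line_id, plan_operation, plan_options,
           sum(starts) starts, sum(output_rows) output_rows
    from   gv$sql_plan_monitor
    where  sql_id = '&sql_id'
    group  by plan_line_id, plan_operation, plan_options
    order  by plan_line_id;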

1. Here is the v$sql_plan_monitor info. The PID (parent plan line ID) column is blank; for a PX operation, it is recorded inside the QC session. Because the query has run for too long, we have lost the information recorded inside the QC session, which also includes PLAN_OBJECT_NAME and CARD (the plan cardinality estimate). But we can always refer back to the execution plan for that information.
2. Always pay attention to any very large values in the OUTPUT_ROWS and TIME_SEC columns. For an IO-intensive query, READ_REQ and WRT_REQ are the most important ones.
3. Here we can see that the line 8 HASH JOIN has large values for both, and the first row source of this HASH JOIN used only 822 seconds, with a huge output of 44,809,229 rows.
4. While the output of the HASH JOIN is very large, it is the first row source (outer table) of a NESTED LOOPS JOIN (line 7). That can cause huge LIOs (cache accesses) on the inner table of the NL join.

1. The "start" count of step 32 is consistent with the output of HASH JOIN at step 8. This step is an INLIST ITERATOR. Because the INLIST ITERATOR will be operated on each record from HASH JOIN output, the overall "start" count will be amplified by the INLIST size. That is why we see step 33/34 with a huge number 4,294,957,578, 121 times of 35,302,788. 2. Step 34 INDEX UNIQUE SCAN returned 211,816,728 index records. Oracle retrieved each related rows from the table then applied the JOIN condition plus other filter condition. The final output is only 1996. So there is significant inefficiency here. 3. Remember earlier we saw the v$sql has buffer_gets around 2B, so pretty much here is the source of LIO. 4. Note most of the runtime information from step 11 to 31 have been lost because the query has run for too long.

Use the Active Sessions tab of the top panel to check the current session status on the system. Only one session for the query in question is still active on this node. Right-click to invoke the context menu and select "Track PX Operation".

1. One important tool for tracking query status is v$session_longops (a plain-SQL sketch follows below).
2. Inside "SQL Sessions", we can see all sessions (active and idle) related to this query. Right-click any row for the SQL in question to invoke the context menu and select "Long Ops by SQL".
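A rough plain-SQL equivalent of "Long Ops by SQL" (again assuming the SQL_ID is known):

    select sid, serial#, opname, target, sofar, totalwork,
           round(sofar/nullif(totalwork,0)*100, 1) pct_done,
           elapsed_seconds, time_remaining
    from   v$session_longops
    where  sql_id = '&sql_id'
    and    sofar < totalwork          -- still-active operations only
    order  by start_time;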

1. There is only one active operation, at the top, and it is a HASH JOIN.
2. SID 413 spent plenty of time scanning the MYDIM.DIM_ACCOUNT_C table at plan step 31. This must be the other row source of the HASH JOIN. Since v$sql_plan_monitor has no information about step 31, we can assume this step finished long ago.
3. Usually high CPU usage from a HASH JOIN is caused by data skew, when some hash buckets have very long chains. I believe there is some data skew, because only one session is left showing meaningful activity. But the CPU usage could also come from the other steps the HASH JOIN is feeding its output to. In any case, we can use information from v$sesstat to check which assumption is true.

V$sesstat is one method to check what the SQL is actually doing.
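A minimal sketch of checking a session's top stats by hand (the :sid bind is a placeholder; run it twice and diff the values to get the delta, which is what the tool's Refresh button automates):

    select *
    from  (select sn.name, ss.value
           from   v$sesstat  ss,
                  v$statname sn
           where  ss.statistic# = sn.statistic#
           and    ss.sid        = :sid
           and    ss.value      > 0
           order  by ss.value desc)
    where rownum <= 20;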

1. After the initial display, click the Refresh button to get the delta and average values.
2. In this case, the majority of the session stats are LIO related.

1. The PIO stats are not very big, averaging 111KB/s.
2. An average of 46K LIOs per second is a high number. In a data-skew case, high LIO is normally not expected, because the majority of the CPU would be wasted traversing the linked-list chains inside a few hash buckets.
3. Since the delta shows that the major stat changes are LIO related, we can conclude the issue is LIO related.

Session Events will show us what the session has been waiting on so far.

The other useful source of info is ASH. The "Activity" tab will give us the ASH information for the related cursor. The dominant activity is CPU.
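A hedged sketch of pulling the same ASH breakdown by hand, attributing samples to plan lines (each row in v$active_session_history is roughly one second of active time per session):

    select sql_plan_line_id,
           session_state,                 -- 'ON CPU' vs 'WAITING'
           event,                         -- null when on CPU
           count(*) samples
    from   v$active_session_history
    where  sql_id = '&sql_id'
    group  by sql_plan_line_id, session_state, event
    order  by samples desc;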

1. The HASH JOIN is inside PX, so there may be some data skew, since one session has already finished.
2. At step 8, the estimated HASH JOIN cardinality is 2587, but the actual output is beyond 35M and still counting. So the bad estimate could be the source of the suboptimal plan.
3. There is a HASH JOIN between lines 15 and 22, joining a fact table with a date dimension table. By common sense, we should expect the JOIN output to be in line with the output from the fact table. The fact table has an estimated output of 51M rows, but the final HASH JOIN estimate is only 2874. The bad news is that we have lost the runtime data for these steps inside v$sql_plan_monitor, so we cannot simply verify this with V$ view data.

The plan predicates show us the fact table date range and the INLIST we mentioned on the SQL PLAN MONITOR screen.

1. Usually a suboptimal plan is the result of bad or missing stats.
2. Use Related Table/Index Info to check the table/index stats.
3. Here is the list of all the tables/indexes used by the SQL cursor.

Table stats for DIM_DATE

1. For the JOIN cardinality estimate, table/partition stats and join column stats are very important.
2. The column stats for DATE_SID are fine for DIM_DATE.

Check the column stats LOW and HIGH values to make sure the dates inside the SQL are in range. A fake out-of-range condition is a frequent cause of cardinality underestimates (a decoding sketch follows below).
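Because LOW_VALUE/HIGH_VALUE are stored as RAW in the dictionary, a small PL/SQL sketch can decode them. This assumes DATE_SID is a NUMBER key like 20120517 and that DIM_DATE lives in the MYDIM schema (a guess based on the other MYDIM objects in this document); dbms_stats.convert_raw_value has a NUMBER overload for exactly this:

    -- (run with: set serveroutput on)
    declare
      v_low  number;
      v_high number;
    begin
      for r in (select low_value, high_value
                from   dba_tab_col_statistics
                where  owner       = 'MYDIM'      -- assumed schema
                and    table_name  = 'DIM_DATE'
                and    column_name = 'DATE_SID') loop
        dbms_stats.convert_raw_value(r.low_value,  v_low);
        dbms_stats.convert_raw_value(r.high_value, v_high);
        dbms_output.put_line('low=' || v_low || '  high=' || v_high);
      end loop;
    end;
    /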

The fact table has no global stats.
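This can be verified in the dictionary. A sketch with a hypothetical fact table name (the real name is not shown here):

    select owner, table_name, num_rows, last_analyzed,
           global_stats          -- 'NO' means no global-level stats
    from   dba_tables
    where  table_name = 'MY_FACT_TABLE';   -- hypothetical name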

The fact table is partitioned by DATE_SID.

1. Let's try the partition stats.
2. We don't know the partition names yet, so start with a % wildcard to learn the partition naming convention.

1. Modify the partition pattern here, using % as a wildcard. Here is the result for 201203. We can do the same for 201202, 201204 and 201205, since those months are all referenced (a dictionary sketch follows below).
2. We do have partition-level stats, and each partition is not very large.
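The equivalent dictionary query, again with hypothetical owner and table names:

    select partition_name, num_rows, blocks, last_analyzed
    from   dba_tab_partitions
    where  table_owner = 'MYFACT'              -- hypothetical owner
    and    table_name  = 'MY_FACT_TABLE'       -- hypothetical name
    and    partition_name like '%201203%'
    order  by partition_name;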

Let's check the column stats. For now, we can only check one partition at a time, so we have to pick several partitions to check.

1. The column stats for DATE_SID, the join key, are missing for the 20120517 partition, the end date of the query (a dictionary query to spot the gap follows below).
2. Because some columns have stats but not all of them do, the stats-gathering scripts must have used something like FOR ALL INDEXED COLUMNS.
3. The partition key DATE_SID is not in any index, because it is assumed there is only one DATE_SID value per partition.
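A sketch to scan all partitions at once instead of one at a time (hypothetical owner/table names again; partitions with no row, or with a NULL NUM_DISTINCT, are the ones missing DATE_SID stats):

    select partition_name, column_name,
           num_distinct, density, last_analyzed
    from   dba_part_col_statistics
    where  owner       = 'MYFACT'              -- hypothetical owner
    and    table_name  = 'MY_FACT_TABLE'       -- hypothetical name
    and    column_name = 'DATE_SID'
    order  by partition_name;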

Column stats for DATE_SID do exist for earlier days. We can find that DATE_SID has column stats up to 20120408.

1. We need to investigate further the root cause of the high LIO, that is, why the estimated join output between DIM_DATE and the fact table is so low.
2. One method is to construct small queries as subsets of the original query and manipulate some predicates, for example changing the date range, and then use "explain plan" to see the difference, especially since we know the boundary dates 04/08 and 04/09 (see the sketch after this list).
3. This Framework does not have Explain Plan capability yet, so we have to use SQLPLUS for this purpose.
4. If the DATE_SID range goes up to 08-04-2012, the JOIN cardinality is good.
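The experiment looks roughly like this in SQL*Plus. The query text below is a made-up stand-in for the real subset query (table and column names are hypothetical):

    explain plan for
    select count(*)
    from   my_fact_table f,                     -- hypothetical fact table
           dim_date      d
    where  f.date_sid = d.date_sid
    and    d.cal_date between date '2012-02-01' -- cal_date is a hypothetical column
                          and date '2012-04-08';

    select * from table(dbms_xplan.display);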

The estimate is not good for 09-04-2012, when one more day is added. We expected more rows, but got fewer.

1. We have to use a 10053 event trace to find out why a one-day difference can make the plan so different.
2. The 10053 CBO event is very useful for investigating these types of delicate changes. Here we enable the 10053 event before rerunning the two explain plans with different dates, then compare the stats, cardinality, join order, and cost calculation differences (see the sketch below).
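Enabling and disabling the trace around the two explain plans looks like this (standard event syntax; the trace file lands in the diag trace directory, tagged by the identifier):

    alter session set tracefile_identifier = 'cbo_10053';
    alter session set events '10053 trace name context forever, level 1';

    -- rerun the two EXPLAIN PLAN statements here:
    --   explain plan for ... (date range ending 08-04-2012)
    --   explain plan for ... (date range ending 09-04-2012)

    alter session set events '10053 trace name context off';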

1. You can generate one trace file per explain plan and compare the trace files with a text diff tool.
2. Here I have run both explain plans inside the same trace file, so I have to read through it.

1. Here is the table stats part for the first, good query.
2. The Oracle CBO does know the partitions involved and considers their stats.

1. The CBO needs column stats at the partition or global level for the JOIN cardinality estimate.
2. Here is the summary for the good case, with the date as 08-04-2012. Note the NDV value of 51 and the density of 0.019608.

Here is the JOIN cardinality calculation; the good case uses a sel(ectivity) of 1/51 (the textbook formula is sketched below).
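For reference, this follows the classic CBO join selectivity formula (textbook form, not quoted from the trace):

    join_sel  = 1 / max( NDV(T1.join_col), NDV(T2.join_col) )
    join_card = card(T1) * card(T2) * join_sel

    good case: NDV(DATE_SID) = 51  =>  join_sel = 1/51 = 0.019608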

Here is the plan with the good JOIN cardinality estimate.

Here is the query with the bad plan, with the additional date 09-04-2012, which has no partition column stats for DATE_SID.

1. Note the information for Part#: 525, which has NDV 889211, compared to Part#: 524, which has column stats, with NDV 1.
2. The reason is "NO STATISTICS (using defaults)".

1. Now the overall NDV is 889262.
2. The density is 0.000001, whereas the density for the good plan was 0.019608.

1. Here is the impact of the bad NDV and density on the JOIN cardinality estimate.
2. The JOIN cardinality calculation is way off: we added an additional day and expected more rows, but got fewer. All of this is because of the missing column stats for the additional day, which change the NDV for DATE_SID from 51 to 889262 and the sel(ectivity) from 1/51 to 1/889262 (the arithmetic is spelled out below).
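The scale of the damage follows directly from the selectivity change:

    sel_good / sel_bad = (1/51) / (1/889262) = 889262 / 51 ≈ 17,437

So the join cardinality estimate is deflated by a factor of roughly 17,000 once the defaulted partition is included.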

And, of course, we get the bad plan.
