
Function Based Index and Column Statistics

Introduction
While tuning some YCRM queries, I was puzzled that Oracle refused to use a function based index created on a single column when there were several candidate indexes. The column covered by the function based index has much better selectivity than the column in the normal index that Oracle actually picked. The original query is very long: around 400 lines joining 64 tables, a typical Siebel application query. For this investigation only the main table, S_ASSET, matters, so I extracted a subquery that involves only S_ASSET:
select * from siebel.s_asset T1
where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL) AND T1.TEST_ASSET_FLG != 'Y' ) AND (T1.BU_ID = :2) )
AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER') = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'))

Index Definitions, Table and Column Statistics


S_ASSET has a lot of columns and indexes. Here are the ones we are really concerned with:

S_ASSET_U3: (BU_ID, ASSET_NUM, PROD_ID, CONFLICT_ID)
FIDX_NLS_ASSET: (SYS_NC00165$), that is, NLS_UPPER(X_YMIT_ADCTR_ACCNT_NUM,'nls_sort=GENERIC_BASELETTER')

Table Statistics
#Rows: 2100413  #Blks: 131858  AvgRowLen: 460.00  ChainCnt: 0.00

Column Statistics
Column                   NDV     Nulls     Density   AvgLen  Histogram        Buckets
BU_ID                    16      0         0.000165  8       Frequency        12
X_YMIT_ADCTR_ACCNT_NUM   14310   2086103   0.000069  2       Height Balanced  254

Index Statistics
FIDX_NLS_ASSET: LVLS: 1  #LB: 41     #DK: 14374    LB/K: 1.00  DB/K: 1.00  CLUF: 14335.00
S_ASSET_U3:     LVLS: 2  #LB: 16825  #DK: 2161293  LB/K: 1.00  DB/K: 1.00  CLUF: 2017636.00
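A quick back-of-the-envelope check with these statistics shows why I expected the function based index to win. This is a minimal sketch assuming the simplest textbook model (an equality predicate matches 1/NDV of the non-null rows, values uniformly distributed); the real CBO also looks at histograms, density and bind handling, so treat the numbers as rough.

```python
# Rough selectivity check for the two candidate columns, using the statistics above.
# Assumed textbook model (not Oracle's exact CBO logic): an equality predicate
# matches (num_rows - num_nulls) / NDV rows, i.e. 1/NDV of the non-null rows.

NUM_ROWS = 2_100_413  # S_ASSET #Rows


def rows_per_value(num_rows: int, ndv: int, num_nulls: int) -> float:
    """Expected rows for `column = :bind`, assuming uniformly distributed values."""
    return (num_rows - num_nulls) / ndv


def null_fraction(num_rows: int, num_nulls: int) -> float:
    """Share of rows where the column is NULL."""
    return num_nulls / num_rows


# BU_ID: NDV 16, no nulls (leading column of S_ASSET_U3)
bu_id_sel = 1 / 16
bu_id_rows = rows_per_value(NUM_ROWS, ndv=16, num_nulls=0)

# X_YMIT_ADCTR_ACCNT_NUM: NDV 14310, 2086103 nulls (the column behind FIDX_NLS_ASSET)
acct_nulls = null_fraction(NUM_ROWS, 2_086_103)
acct_rows = rows_per_value(NUM_ROWS, ndv=14_310, num_nulls=2_086_103)

print(f"BU_ID: selectivity {bu_id_sel}, about {bu_id_rows:,.0f} rows per value")
print(f"X_YMIT_ADCTR_ACCNT_NUM: {acct_nulls:.2%} nulls, {acct_rows:.2f} rows per value")
```

The BU_ID figure (selectivity 0.0625, roughly 131K rows) lines up with the S_ASSET_U3 numbers in the plan and trace. The non-null values of X_YMIT_ADCTR_ACCNT_NUM, on the other hand, are essentially unique: each value is expected to match about one row.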

Explain Plan
Given the selectivity of column X_YMIT_ADCTR_ACCNT_NUM, I expected the explain plan to pick the function based index FIDX_NLS_ASSET for my simple test query. I was surprised to see the opposite when the following result came back:
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
Plan hash value: 1326221515

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |    78 | 35880 |  1272   (0)| 00:00:16 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| S_ASSET    |    78 | 35880 |  1272   (0)| 00:00:16 |
|*  2 |   INDEX RANGE SCAN          | S_ASSET_U3 |   131K|       |    11   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("T1"."INTERNAL_ASSET_FLG" IS NULL OR "T1"."INTERNAL_ASSET_FLG"='N') AND
              NLS_UPPER("X_YMIT_ADCTR_ACCNT_NUM",'nls_sort=''GENERIC_BASELETTER''')=NLS_UPPER(:4,'nls_sort=''GENERIC_BASELETTER''') AND
              "T1"."TEST_ASSET_FLG"<>'Y')
   2 - access("T1"."BU_ID"=:2)

18 rows selected.

But the real surprise came when I used a hint to force the index FIDX_NLS_ASSET: the resulting plan had a much lower cost.
SQL> explain plan for
select /*+ index(T1 FIDX_NLS_ASSET) */ * from siebel.s_asset T1
where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL) AND T1.TEST_ASSET_FLG != 'Y' ) AND (T1.BU_ID = :2) )
AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER') = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'))

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2315759423

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    78 | 35880 |    84   (0)| 00:00:02 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| S_ASSET        |    78 | 35880 |    84   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | FIDX_NLS_ASSET |  8402 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."BU_ID"=:2 AND ("T1"."INTERNAL_ASSET_FLG" IS NULL OR
              "T1"."INTERNAL_ASSET_FLG"='N') AND "T1"."TEST_ASSET_FLG"<>'Y')
   2 - access(NLS_UPPER("X_YMIT_ADCTR_ACCNT_NUM",'nls_sort=''GENERIC_BASELETTER''')=NLS_UPPER(:4,'nls_sort=''GENERIC_BASELETTER'''))

17 rows selected.

CBO Trace
Why did Oracle pick the plan with the much higher estimated cost? Did it ever consider the cheaper function based index plan? A very useful tool here is the CBO trace (event 10053), which can be produced simply by running EXPLAIN PLAN with the event enabled. Of course, if you can actually run the query with event 10053 enabled, even better.
SQL> alter session set events '10053 trace name context forever, level 1';
SQL> explain plan for
select * from siebel.s_asset T1
where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL) AND T1.TEST_ASSET_FLG != 'Y' ) AND (T1.BU_ID = :2) )
AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER') = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'))

The resulting CBO trace file shows that Oracle did evaluate the function based index:
***** Virtual column Adjustment ******
Column name SYS_NC00165$
cost_cpu 300.00
cost_io 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
***** End virtual column Adjustment ******
Access Path: index (AllEqGuess)
  Index: FIDX_NLS_ASSET
  resc_io: 8380.00  resc_cpu: 90572845
  ix_sel: 0.584503  ix_sel_with_filters: 0.584503
  Cost: 83.85  Resp: 83.85  Degree: 1
Access Path: index (RangeScan)
  Index: S_ASSET_U3
  resc_io: 127157.00  resc_cpu: 1432407278
  ix_sel: 0.062500  ix_sel_with_filters: 0.062500
  Cost: 1272.32  Resp: 1272.32  Degree: 1

The trace shows that the access path using index FIDX_NLS_ASSET has a much lower estimated cost (83.85) than the access path using index S_ASSET_U3 (1272.32). Yet Oracle picked S_ASSET_U3 anyway:
Best:: AccessPath: IndexRange  Index: S_ASSET_U3
       Cost: 1272.32  Degree: 1  Resp: 1272.32  Card: 78.11  Bytes: 0
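The resc_io figure for S_ASSET_U3 can be reproduced with the classic textbook formula for an index range scan followed by table access by rowid: blevel + ceil(ix_sel × leaf_blocks) + ceil(ix_sel × clustering_factor). The sketch below is that simplified formula, not Oracle's internal code; the function name is my own, and it ignores CPU costing and whatever adjustments turn resc_io into the final Cost column.

```python
import math


def index_range_io_cost(blevel: int, leaf_blocks: int,
                        clustering_factor: float, ix_sel: float) -> int:
    """Textbook I/O cost of an index range scan plus TABLE ACCESS BY INDEX ROWID.

    blevel               -> walk from root to leaf (LVLS in the index statistics)
    ix_sel * leaf_blocks -> leaf blocks read for the matching key range (#LB)
    ix_sel * CLUF        -> table blocks visited through the rowids (CLUF)
    CPU costing and any cost adjustment parameters are deliberately ignored.
    """
    return (blevel
            + math.ceil(ix_sel * leaf_blocks)
            + math.ceil(ix_sel * clustering_factor))


# S_ASSET_U3: LVLS 2, #LB 16825, CLUF 2017636; ix_sel 0.0625 (1/16 for BU_ID)
u3_io = index_range_io_cost(blevel=2, leaf_blocks=16_825,
                            clustering_factor=2_017_636, ix_sel=0.0625)
print(u3_io)  # 127157, the resc_io shown for S_ASSET_U3 in the trace

# FIDX_NLS_ASSET with the rounded, guessed ix_sel printed in the trace
fidx_io = index_range_io_cost(blevel=1, leaf_blocks=41,
                              clustering_factor=14_335, ix_sel=0.584503)
print(fidx_io)  # 8404, not the 8380 reported by the trace
```

The same formula does not reproduce the FIDX_NLS_ASSET figure exactly, which suggests the AllEqGuess path involves adjustments that this simple model does not capture.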

What made Oracle choose the higher-cost plan? Note that the Virtual column Adjustment section reports an enormous cost_io. I have no idea what exactly it means, but the final cost with index FIDX_NLS_ASSET is still only 83.85. Where can we find a clue? There is no better place to look than the CBO trace file itself. After reading through the full trace file, I could not find any text like "this access path is rejected because of ...". There is, however, one interesting hint in the evaluation of the access path using FIDX_NLS_ASSET: AllEqGuess. Furthermore, the following lines provided more clues:
Column (#165): SYS_NC00165$( NO STATISTICS (using defaults) AvgLen: 122 NDV: 65638 Nulls: 0 Density: 0.000015

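Although the CBO trace file includes column statistics for BU_ID and several other columns involved in the query, it does not include any column statistics for X_YMIT_ADCTR_ACCNT_NUM. Instead, the predicate is evaluated against the virtual column SYS_NC00165$, for which the trace reports NO STATISTICS and falls back to default values.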
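Reading a full 10053 trace by eye is tedious. A small script can pull out the clues discussed above: columns reported with NO STATISTICS, each index access path with its type (AllEqGuess, RangeScan, ...) and cost, and the path finally chosen as Best::. This is a hypothetical helper (scan_trace is my own name); its regular expressions match only the trace excerpts quoted in this article, and 10053 formats change between Oracle versions, so adapt the patterns before relying on it.

```python
import re

# Patterns written against the 10053 excerpts quoted in this article (hypothetical helper).
NO_STATS = re.compile(r"Column \(#(\d+)\):\s*([\w$]+)\(?[^\n]*NO STATISTICS")
ACCESS_PATH = re.compile(
    r"Access Path: index \((\w+)\)\s+Index: (\w+).*?Cost: ([\d.]+)", re.S)
BEST = re.compile(r"Best::\s+AccessPath:\s+\w+\s+Index:\s+(\w+)\s+Cost:\s+([\d.]+)")


def scan_trace(text: str):
    """Return (columns without statistics, evaluated index paths, chosen path)."""
    missing = [name for _num, name in NO_STATS.findall(text)]
    paths = [(kind, index, float(cost)) for kind, index, cost in ACCESS_PATH.findall(text)]
    best = BEST.search(text)
    chosen = (best.group(1), float(best.group(2))) if best else None
    return missing, paths, chosen


# Excerpt of the first trace in this article, re-wrapped onto separate lines
SAMPLE = """
Column (#165): SYS_NC00165$(  NO STATISTICS (using defaults)
  AvgLen: 122 NDV: 65638 Nulls: 0 Density: 0.000015
Access Path: index (AllEqGuess)
  Index: FIDX_NLS_ASSET
  resc_io: 8380.00  resc_cpu: 90572845
  ix_sel: 0.584503  ix_sel_with_filters: 0.584503
  Cost: 83.85  Resp: 83.85  Degree: 1
Access Path: index (RangeScan)
  Index: S_ASSET_U3
  resc_io: 127157.00  resc_cpu: 1432407278
  ix_sel: 0.062500  ix_sel_with_filters: 0.062500
  Cost: 1272.32  Resp: 1272.32  Degree: 1
Best:: AccessPath: IndexRange  Index: S_ASSET_U3
       Cost: 1272.32  Degree: 1  Resp: 1272.32  Card: 78.11  Bytes: 0
"""

missing, paths, chosen = scan_trace(SAMPLE)
cheapest = min(paths, key=lambda p: p[2])
print("columns with NO STATISTICS:", missing)
for kind, index, cost in paths:
    note = "  <- guessed selectivity" if "Guess" in kind else ""
    print(f"{index:<16} {kind:<12} cost {cost}{note}")
print(f"cheapest evaluated: {cheapest[1]}, Best:: picked: {chosen[0]}")
```

When the cheapest evaluated path differs from the Best:: path and a guess appears next to a column without statistics, missing virtual column statistics are a good first suspect.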

Virtual Column Statistics


When Oracle creates a function based index, it also creates a hidden virtual column. For index FIDX_NLS_ASSET, the virtual column is SYS_NC00165$. Interestingly, while Oracle gathered index statistics when the function based index was created, it left the column statistics for the virtual column alone. YCRM's statistics job uses METHOD_OPT set to FOR ALL INDEXED COLUMNS, and that job left this virtual column out as well. Virtual column information can be found in the data dictionary view dba_tab_cols:
SQL> select column_name, num_distinct, density, num_nulls
from dba_tab_cols
where owner='SIEBEL' and table_name='S_ASSET'
order by column_name;

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS
------------------------------ ------------ ---------- ----------
...
SP_NUM
SRL_NUM_VRFD_FLG
STATUS_CD                                 7 3.8252E-06    1968889
SYS_NC00165$
TEST_ASSET_FLG                            1 2.3299E-07          0

Oracle recommends gathering new column statistics after a function based index is created, with METHOD_OPT set to FOR ALL HIDDEN COLUMNS, so that statistics are gathered for the virtual column. The recommendation can be found at http://docs.oracle.com/cd/B28359_01/server.111/b28274/stats.htm, section 13.3.1.9:
You should gather new column statistics on a table after creating a function-based index, to allow Oracle to collect column statistics equivalent information for the expression. This is done by calling the statistics-gathering procedure with the METHOD_OPT argument set to FOR ALL HIDDEN COLUMNS.
SQL> exec dbms_stats.gather_table_stats(ownname=>'SIEBEL', tabname=>'S_ASSET', estimate_percent=>1, method_opt=>'FOR ALL HIDDEN COLUMNS', degree=>4);

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, num_nulls
from dba_tab_cols
where owner='SIEBEL' and table_name='S_ASSET' and column_name like 'SYS_%';

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS
------------------------------ ------------ ---------- ----------
SYS_NC00165$                          14200 .000070423    2078300
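The gathered numbers differ sharply from the defaults the optimizer used earlier (NDV 65638, Nulls 0, Density 0.000015). The sketch below compares the two under a simplified model, assuming that the rows expected for an equality predicate are (num_rows - num_nulls) / NDV and reusing #Rows = 2100413 from the original table statistics (the 1% sample may have produced a slightly different row count). It is an illustration, not a reproduction of the optimizer's cardinality estimate.

```python
# Default vs. gathered statistics for the virtual column SYS_NC00165$.
# Assumed simplified model: rows per value = (num_rows - num_nulls) / NDV.
# NUM_ROWS comes from the original table statistics; the 1% sample may differ slightly.
NUM_ROWS = 2_100_413

DEFAULTS = {"ndv": 65_638, "nulls": 0, "density": 0.000015}            # trace: NO STATISTICS (using defaults)
GATHERED = {"ndv": 14_200, "nulls": 2_078_300, "density": 0.000070423}  # dba_tab_cols after FOR ALL HIDDEN COLUMNS


def rows_per_value(stats: dict) -> float:
    """Expected rows for `SYS_NC00165$ = :bind` under the simplified model."""
    return (NUM_ROWS - stats["nulls"]) / stats["ndv"]


for label, stats in (("defaults", DEFAULTS), ("gathered", GATHERED)):
    non_null = 1 - stats["nulls"] / NUM_ROWS
    print(f"{label}: 1/NDV = {1 / stats['ndv']:.9f} (density {stats['density']}), "
          f"non-null share {non_null:.2%}, about {rows_per_value(stats):.2f} rows per value")
```

In both cases the reported density is simply 1/NDV. The real difference is the null count: the defaults assume every row carries a value (about 32 rows per key), while the gathered statistics say almost 99% of rows are null (about 1.6 rows per key).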

The Impact of Virtual Column Statistics


Now that we have statistics on the virtual column behind the function based index, will Oracle pick that index for the test query? Let's try again:

SQL> explain plan for
select * from siebel.s_asset T1
where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL) AND T1.TEST_ASSET_FLG != 'Y' ) AND (T1.BU_ID = :2) )
AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER') = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'))
/

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2315759423

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |   453 |     1   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| S_ASSET        |     1 |   453 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | FIDX_NLS_ASSET |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."BU_ID"=:2 AND ("T1"."INTERNAL_ASSET_FLG" IS NULL OR
              "T1"."INTERNAL_ASSET_FLG"='N') AND "T1"."TEST_ASSET_FLG"<>'Y')
   2 - access(NLS_UPPER("X_YMIT_ADCTR_ACCNT_NUM",'nls_sort=''GENERIC_BASELETTER''')=NLS_UPPER(:4,'nls_sort=''GENERIC_BASELETTER'''))

17 rows selected.

Not only did Oracle pick the function based index, the estimated cost also dropped to 1. The cost and cardinality estimates dropped because of the large number of null values in column X_YMIT_ADCTR_ACCNT_NUM. When the virtual column had no statistics, the default column statistics assumed zero nulls. I enabled event 10053 again; here are the relevant evaluation traces:
***** Virtual column Adjustment ******
Column name SYS_NC00165$
cost_cpu 300.00
cost_io 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
***** End virtual column Adjustment ******
Access Path: index (AllEqRange)
  Index: FIDX_NLS_ASSET
  resc_io: 2.00  resc_cpu: 18770
  ix_sel: 0.000069  ix_sel_with_filters: 0.000069
  Cost: 1.00  Resp: 1.00  Degree: 1
Access Path: index (RangeScan)
  Index: S_ASSET_U3
  resc_io: 121276.00  resc_cpu: 1366625036
  ix_sel: 0.062500  ix_sel_with_filters: 0.062500
  Cost: 1213.48  Resp: 1213.48  Degree: 1

Note that the cost_io in the Virtual column Adjustment section is still the same mysterious monster number. The evaluation of the path using FIDX_NLS_ASSET no longer uses AllEqGuess but AllEqRange. It looks like Oracle does not like AllEqGuess: with AllEqRange, Oracle happily picks the function based index FIDX_NLS_ASSET.
Best:: AccessPath: IndexRange  Index: FIDX_NLS_ASSET
       Cost: 1.00  Degree: 1  Resp: 1.00  Card: 0.55  Bytes: 0
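The cost_io value that puzzled me in both Virtual column Adjustment sections is, digit for digit, the largest finite IEEE 754 double-precision number (DBL_MAX in C, about 1.8e308). The short check below compares the two, with the digits copied from the trace and the trailing .00 dropped. My reading, which neither the trace nor the documentation quoted above confirms, is that the value acts as a placeholder for an unavailable or "infinite" cost rather than a real I/O estimate. Since it appears unchanged in both traces while the plan choice changed, it does not by itself explain the decision.

```python
import sys

# cost_io from the "Virtual column Adjustment" section, joined from the trace
# with the trailing ".00" removed
COST_IO = (
    "1797693134862315708145274237317043567980705675258449965989174768031572607800285387605895586327668"
    "7817154045895351438246423432132688946418276846754670353751698604991057655128207624549009038932894"
    "4075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250"
    "404026184124858368"
)

# Largest finite double; every float this large is an exact integer.
dbl_max = int(sys.float_info.max)

print(len(COST_IO))                          # number of decimal digits
print(int(COST_IO) == dbl_max)               # exact integer comparison
print(float(COST_IO) == sys.float_info.max)  # same check via float parsing
```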

Conclusion
When a function based index is used, we need to make sure that statistics are gathered on the related virtual column. Even when the original column has accurate statistics, they may not represent the true distribution of the virtual column. Worse, Oracle may not consider the original column statistics at all. This might not be an issue for databases whose statistics job uses the default METHOD_OPT or FOR ALL COLUMNS, but it presents challenges for databases that gather column statistics only on selected columns, for example with METHOD_OPT set to FOR ALL INDEXED COLUMNS.
