Function Based Index and Column Statistics
Introduction
While investigating some YCRM query tuning issues, I was puzzled that Oracle refused to use a function-based index created on a single column, even though several candidate indexes were available. The column covered by the function-based index has much better selectivity than the column in the normal index that Oracle actually picked. The original query is very long: around 400 lines joining 64 tables, a typical Siebel application query. For this investigation only the main table, S_ASSET, matters, so I extracted the part of the query that involves S_ASSET alone:
select * from siebel.s_asset T1
 where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL)
         AND T1.TEST_ASSET_FLG != 'Y')
        AND (T1.BU_ID = :2))
   AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER')
        = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'))
Column Statistics
             BU_ID     X_YMIT_ADCTR_ACCNT_NUM
Nulls        0         2086103
AvgLen       8         2
Buckets      12        254

Index Statistics
FIDX_NLS_ASSET : LVLS: 1  #LB: 41     #DK: 14374    LB/K: 1.00  DB/K: 1.00  CLUF: 14335.00
S_ASSET_U3     : LVLS: 2  #LB: 16825  #DK: 2161293  LB/K: 1.00  DB/K: 1.00  CLUF: 2017636.00
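The figures above can also be pulled straight from the data dictionary. The sketch below assumes access to the DBA_ views; the mapping of the 10053 trace abbreviations (LVLS, #LB, #DK, LB/K, DB/K, CLUF) to DBA_INDEXES columns in the comments is my reading of the trace format rather than something stated in this investigation.

```sql
-- Column statistics for the two competing columns
SELECT column_name, num_distinct, num_nulls, avg_col_len, num_buckets, density
  FROM dba_tab_col_statistics
 WHERE owner = 'SIEBEL'
   AND table_name = 'S_ASSET'
   AND column_name IN ('BU_ID', 'X_YMIT_ADCTR_ACCNT_NUM');

-- Index statistics, in the terms the 10053 trace prints them:
--   LVLS = BLEVEL, #LB = LEAF_BLOCKS, #DK = DISTINCT_KEYS,
--   LB/K = AVG_LEAF_BLOCKS_PER_KEY, DB/K = AVG_DATA_BLOCKS_PER_KEY,
--   CLUF = CLUSTERING_FACTOR
SELECT index_name, blevel, leaf_blocks, distinct_keys,
       avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor
  FROM dba_indexes
 WHERE owner = 'SIEBEL'
   AND index_name IN ('FIDX_NLS_ASSET', 'S_ASSET_U3');
```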
Explain Plan
Given the selectivity of column X_YMIT_ADCTR_ACCNT_NUM, I expected the optimizer to pick the function-based index FIDX_NLS_ASSET for my simple test query. I was surprised when the plan came back showing the opposite:
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
Plan hash value: 1326221515

-----------------------------------------------------------------------------------------
| Id | Operation                   | Name       | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |            |   78 | 35880 |  1272   (0)| 00:00:16 |
|* 1 |  TABLE ACCESS BY INDEX ROWID| S_ASSET    |   78 | 35880 |  1272   (0)| 00:00:16 |
|* 2 |   INDEX RANGE SCAN          | S_ASSET_U3 |  131K|       |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("T1"."INTERNAL_ASSET_FLG" IS NULL OR "T1"."INTERNAL_ASSET_FLG"='N') AND
              NLS_UPPER("X_YMIT_ADCTR_ACCNT_NUM",'nls_sort=''GENERIC_BASELETTER''')=NLS_UPPER(:4,'nls_sort=''GENERIC_BASELETTER''') AND
              "T1"."TEST_ASSET_FLG"<>'Y')
   2 - access("T1"."BU_ID"=:2)

18 rows selected.
But the real surprise came after I used a hint to force the query onto index FIDX_NLS_ASSET: that plan had a much lower cost.
SQL> explain plan for
select /*+ index(T1 FIDX_NLS_ASSET) */ * from siebel.s_asset T1
 where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL)
         AND T1.TEST_ASSET_FLG != 'Y')
        AND (T1.BU_ID = :2))
   AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER')
        = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'));

SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
Plan hash value: 2315759423

---------------------------------------------------------------------------------------------
| Id | Operation                   | Name           | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                |   78 | 35880 |    84   (0)| 00:00:02 |
|* 1 |  TABLE ACCESS BY INDEX ROWID| S_ASSET        |   78 | 35880 |    84   (0)| 00:00:02 |
|* 2 |   INDEX RANGE SCAN          | FIDX_NLS_ASSET |  8402|       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."BU_ID"=:2 AND ("T1"."INTERNAL_ASSET_FLG" IS NULL OR
CBO Trace
Why did Oracle pick the plan with the much higher estimated cost? Did it ever consider the cheaper function-based index plan? A very useful tool here is the CBO trace (event 10053) generated while running EXPLAIN PLAN. Of course, if you can actually run the query with event 10053 enabled, that is even better.
SQL> alter session set events '10053 trace name context forever, level 1';

SQL> explain plan for
select * from siebel.s_asset T1
 where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL)
         AND T1.TEST_ASSET_FLG != 'Y')
        AND (T1.BU_ID = :2))
   AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER')
        = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'));
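A slightly fuller version of that session is sketched below. It tags the trace file name so the file is easy to find, switches the event off afterwards, and asks the database where the trace was written. The identifier 'fbi_test' is an arbitrary, hypothetical label, and the V$DIAG_INFO lookup assumes 11g or later. Keep in mind that a 10053 trace is only written when the statement is actually optimized (hard parsed), so re-running a statement whose cursor is already shared may produce no trace at all.

```sql
-- Tag this session's trace file name (the identifier is arbitrary)
ALTER SESSION SET tracefile_identifier = 'fbi_test';

-- Turn on the CBO trace for this session only
ALTER SESSION SET events '10053 trace name context forever, level 1';

EXPLAIN PLAN FOR
select * from siebel.s_asset T1
 where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL)
         AND T1.TEST_ASSET_FLG != 'Y')
        AND (T1.BU_ID = :2))
   AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER')
        = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'));

-- Turn the CBO trace off again
ALTER SESSION SET events '10053 trace name context off';

-- 11g and later: full path of this session's trace file
SELECT value FROM v$diag_info WHERE name = 'Default Trace File';
```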
The resulting CBO trace file shows that Oracle did evaluate the function-based index:
***** Virtual column Adjustment ******
Column name     SYS_NC00165$
cost_cpu 300.00
cost_io  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
***** End virtual column Adjustment ******
  Access Path: index (AllEqGuess)
    Index: FIDX_NLS_ASSET
    resc_io: 8380.00  resc_cpu: 90572845
    ix_sel: 0.584503  ix_sel_with_filters: 0.584503
    Cost: 83.85  Resp: 83.85  Degree: 1
  Access Path: index (RangeScan)
    Index: S_ASSET_U3
    resc_io: 127157.00  resc_cpu: 1432407278
    ix_sel: 0.062500  ix_sel_with_filters: 0.062500
    Cost: 1272.32  Resp: 1272.32  Degree: 1
The trace shows that the access path through index FIDX_NLS_ASSET has a much lower estimated cost than the access path through index S_ASSET_U3. Yet Oracle picked S_ASSET_U3 anyway:
  Best:: AccessPath: IndexRange
  Index: S_ASSET_U3
         Cost: 1272.32  Degree: 1  Resp: 1272.32  Card: 78.11  Bytes: 0
What makes Oracle decide on the higher-cost plan? Note that the Virtual column Adjustment section reports a monster cost_io value. I have no idea what exactly it means (its digits are exactly those of the largest finite IEEE 754 double-precision number, about 1.8E+308, so it looks more like a placeholder than a real estimate), but the final cost with index FIDX_NLS_ASSET is only 83.85. Where can we find a clue? There is no better place to look than the CBO trace file itself. After reading through the full trace file, I could not find any text like "this access path is rejected because of". There is, however, one interesting hint in the evaluation of the access path through FIDX_NLS_ASSET: AllEqGuess. Furthermore, the following lines provide more clues.
Column (#165): SYS_NC00165$( NO STATISTICS (using defaults) AvgLen: 122 NDV: 65638 Nulls: 0 Density: 0.000015
Although the CBO trace file includes column statistics for BU_ID and several other columns involved, it does not include any column statistics for X_YMIT_ADCTR_ACCNT_NUM.
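To confirm which hidden column backs the function-based index and whether that column has any statistics, a query along the following lines can help. It is a sketch assuming access to the DBA_ views; COLUMN_EXPRESSION is a LONG column, so the join is written in the traditional comma style rather than ANSI JOIN syntax.

```sql
-- Which hidden column (SYS_NC...) and which expression back the index?
SELECT ic.index_name, ic.column_name, ie.column_expression
  FROM dba_ind_columns ic, dba_ind_expressions ie
 WHERE ie.index_owner     = ic.index_owner
   AND ie.index_name      = ic.index_name
   AND ie.column_position = ic.column_position
   AND ic.index_owner     = 'SIEBEL'
   AND ic.index_name      = 'FIDX_NLS_ASSET';

-- Does that hidden column have statistics yet?
-- NUM_DISTINCT and LAST_ANALYZED stay NULL until statistics are gathered on it.
SELECT column_name, num_distinct, num_nulls, density, last_analyzed
  FROM dba_tab_cols
 WHERE owner = 'SIEBEL'
   AND table_name = 'S_ASSET'
   AND hidden_column = 'YES';
```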
Oracle recommends gathering new column statistics after a function-based index is created, with METHOD_OPT set to FOR ALL HIDDEN COLUMNS, so that statistics on the hidden virtual column are collected. The recommendation can be found at http://docs.oracle.com/cd/B28359_01/server.111/b28274/stats.htm, section 13.3.1.9:
You should gather new column statistics on a table after creating a function-based index, to allow Oracle to collect column statistics equivalent information for the expression. This is done by calling the statistics gathering procedure with the METHOD_OPT argument set to FOR ALL HIDDEN COLUMNS.
SQL> exec dbms_stats.gather_table_stats(ownname=>'SIEBEL', tabname=>'S_ASSET', estimate_percent=>1, method_opt=>'FOR ALL HIDDEN COLUMNS', degree=>4);

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, num_nulls from dba_tab_cols
     where owner='SIEBEL' and table_name='S_ASSET' and column_name like 'SYS_%';

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS
------------------------------ ------------ ---------- ----------
SYS_NC00165$                          14200 .000070423    2078300
SQL> explain plan for
select * from siebel.s_asset T1
 where (((T1.INTERNAL_ASSET_FLG = 'N' OR T1.INTERNAL_ASSET_FLG IS NULL)
         AND T1.TEST_ASSET_FLG != 'Y')
        AND (T1.BU_ID = :2))
   AND (NLS_UPPER(T1.X_YMIT_ADCTR_ACCNT_NUM,'NLS_SORT=GENERIC_BASELETTER')
        = NLS_UPPER(:4,'NLS_SORT=GENERIC_BASELETTER'))
/

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
Plan hash value: 2315759423

---------------------------------------------------------------------------------------------
| Id | Operation                   | Name           | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                |    1 |   453 |     1   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS BY INDEX ROWID| S_ASSET        |    1 |   453 |     1   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | FIDX_NLS_ASSET |    1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."BU_ID"=:2 AND ("T1"."INTERNAL_ASSET_FLG" IS NULL OR
              "T1"."INTERNAL_ASSET_FLG"='N') AND "T1"."TEST_ASSET_FLG"<>'Y')
   2 - access(NLS_UPPER("X_YMIT_ADCTR_ACCNT_NUM",'nls_sort=''GENERIC_BASELETTER''')=
              NLS_UPPER(:4,'nls_sort=''GENERIC_BASELETTER'''))

17 rows selected.
Not only did Oracle pick the function-based index; the estimated cost also dropped to 1. The reason for the drop in both the cost and the cardinality estimates is the large number of null values in column X_YMIT_ADCTR_ACCNT_NUM. While statistics for the virtual column were absent, the default column statistics assumed zero nulls. I enabled event 10053 again; here are the relevant parts of the evaluation trace:
***** Virtual column Adjustment ******
Column name     SYS_NC00165$
cost_cpu 300.00
cost_io  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
***** End virtual column Adjustment ******
  Access Path: index (AllEqRange)
    Index: FIDX_NLS_ASSET
    resc_io: 2.00  resc_cpu: 18770
    ix_sel: 0.000069  ix_sel_with_filters: 0.000069
    Cost: 1.00  Resp: 1.00  Degree: 1
  Access Path: index (RangeScan)
    Index: S_ASSET_U3
    resc_io: 121276.00  resc_cpu: 1366625036
    ix_sel: 0.062500  ix_sel_with_filters: 0.062500
    Cost: 1213.48  Resp: 1213.48  Degree: 1
Note that cost_io in the Virtual column Adjustment section is still the same mysterious monster number. However, the evaluation of the FIDX_NLS_ASSET access path no longer uses AllEqGuess; it uses AllEqRange instead. It looks like Oracle does not like AllEqGuess: with AllEqRange, it happily picks the function-based index FIDX_NLS_ASSET.
  Best:: AccessPath: IndexRange
  Index: FIDX_NLS_ASSET
         Cost: 1.00  Degree: 1  Resp: 1.00  Card: 0.55  Bytes: 0
Conclusion
When a function-based index is used, we need to make sure that statistics on the related virtual column are gathered. Even when the original column has accurate statistics, they may not represent the true distribution of the virtual column. Worse, Oracle may not consider the original column's statistics at all. This might not be an issue for databases whose statistics job uses the default METHOD_OPT or FOR ALL COLUMNS, but it will present challenges for databases that gather column statistics only on selected columns, for example with a METHOD_OPT such as FOR ALL INDEXED COLUMNS.
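For sites whose statistics job only covers selected columns, a periodic check for hidden virtual columns that have never been analyzed may help. The sketch below assumes 11g or later (for the VIRTUAL_COLUMN column of DBA_TAB_COLS) and uses the SIEBEL schema only as an example. Besides function-based index columns, it will also list extended-statistics columns that lack statistics. The follow-up call reuses the METHOD_OPT shown earlier, leaving the sample size at its default.

```sql
-- Hidden virtual columns (e.g. behind function-based indexes) with no statistics
SELECT table_name, column_name, data_default
  FROM dba_tab_cols
 WHERE owner = 'SIEBEL'
   AND hidden_column  = 'YES'
   AND virtual_column = 'YES'
   AND last_analyzed IS NULL
 ORDER BY table_name, column_name;

-- Then, for each table reported, gather statistics on its hidden columns
EXEC dbms_stats.gather_table_stats(ownname => 'SIEBEL', tabname => 'S_ASSET', method_opt => 'FOR ALL HIDDEN COLUMNS');
```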