25 August, 2011

Does GATHER_TABLE_STATS update Index Statistics ?

In 9i, the GATHER_TABLE_STATS procedure would default CASCADE to FALSE.
In 10g and 11g, the procedure defaults CASCADE to AUTO.
In theory, you would expect that if the optimizer knows that the number of updates to the index since the last gather index stats is not significant, the AUTO behaviour for CASCADE would mean that Oracle would not include Indexes when running a GATHER_TABLE_STATS.
(Of course, if the Index statistics are old and there has been significant DML since then, AUTO would cause Index statistics to be gathered).

Here's a test in 11.2 (11.2.0.1) :

First I confirm that my CASCADE still defaults to AUTO :

SQL> select dbms_stats.get_prefs('CASCADE') from dual;

DBMS_STATS.GET_PREFS('CASCADE')
--------------------------------------------------------------------------
DBMS_STATS.AUTO_CASCADE

SQL>


Next I run two tests whereby
a) I CREATE INDEX *after* the table is populated -- so the CREATE INDEX does it's own GATHER_INDEX_STATS
b) I run GATHER_TABLE_STATS
c) I then verify if the GATHER_TABLE_STATS did re-gather index stats

In the first test, the table is populated via "normal" DML. In the second test, the table is populated via a Direct Path INSERT.

However (surprisingly ??) I find that although the CREATE INDEX has actually populated the Index Statistics with 100% sampling, the subsequent call to GATHER_TABLE_STATS seems to re-gather Index Statistics !

23:28:10 SQL> alter session set nls_date_format='DD-MON-RR HH24:MI:SS';

Session altered.

23:28:10 SQL>
23:28:10 SQL> REM REM ##########################################
23:28:10 SQL> REM -- This table has data inserted normally
23:28:10 SQL>
23:28:10 SQL> create table GTS_NORMAL as select * from sales_fact where 1=2;

Table created.

23:28:10 SQL> insert into GTS_NORMAL select * from sales_fact;

1599995 rows created.

23:28:17 SQL> create index GTS_NORMAL_CUST_NDX on GTS_NORMAL (CUST_ID);

Index created.

23:28:27 SQL> -- check existing statistics
23:28:27 SQL> select table_name Object_name, num_rows, last_analyzed
23:28:27 2 from user_tables
23:28:27 3 where table_name = 'GTS_NORMAL'
23:28:27 4 union
23:28:27 5 select index_name Object_name, num_rows, last_analyzed
23:28:27 6 from user_indexes
23:28:27 7 where table_name = 'GTS_NORMAL'
23:28:27 8 /

OBJECT_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ------------------
GTS_NORMAL
GTS_NORMAL_CUST_NDX 1599995 25-AUG-11 23:28:27

23:28:27 SQL> exec dbms_lock.sleep(180);

PL/SQL procedure successfully completed.

23:31:28 SQL> -- gather statistics
23:31:28 SQL> exec dbms_stats.gather_table_stats('','GTS_NORMAL',estimate_percent=>30,degree=>2);

PL/SQL procedure successfully completed.

23:31:37 SQL> -- recheck statistics
23:31:37 SQL> select table_name Object_name, num_rows, last_analyzed
23:31:37 2 from user_tables
23:31:37 3 where table_name = 'GTS_NORMAL'
23:31:37 4 union
23:31:37 5 select index_name Object_name, num_rows, last_analyzed
23:31:37 6 from user_indexes
23:31:37 7 where table_name = 'GTS_NORMAL'
23:31:37 8 /

OBJECT_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ------------------
GTS_NORMAL 1597223 25-AUG-11 23:31:36
GTS_NORMAL_CUST_NDX 1572991 25-AUG-11 23:31:37

23:31:37 SQL>

The CREATE INDEX had gathered index statistics with 100% sampling and presenting the correct NUM_ROWS count at 23:28:27. However, the GATHER_TABLE_STATS re-updated Index Statistics as well, this time using the 30% sampling ! It overwrote the Index statistics that had ben gathered only 3minutes earlier (with no new DML occurring in-between !).

Here's the test with a table where DIRECT PATH INSERT is used :

23:31:37 SQL> REM REM ##########################################
23:31:37 SQL> REM -- This table has data inserted via DirectPath
23:31:37 SQL>
23:31:37 SQL> create table GTS_DIRECT as select * from sales_fact where 1=2;

Table created.

23:31:37 SQL> insert /*+ APPEND */ into GTS_DIRECT select * from sales_fact;

1599995 rows created.

23:31:44 SQL> create index GTS_DIRECT_CUST_NDX on GTS_DIRECT (CUST_ID);

Index created.

23:31:57 SQL> -- check existing statistics
23:31:57 SQL> select table_name Object_name, num_rows, last_analyzed
23:31:57 2 from user_tables
23:31:57 3 where table_name = 'GTS_DIRECT'
23:31:57 4 union
23:31:57 5 select index_name Object_name, num_rows, last_analyzed
23:31:57 6 from user_indexes
23:31:57 7 where table_name = 'GTS_DIRECT'
23:31:57 8 /

OBJECT_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ------------------
GTS_DIRECT
GTS_DIRECT_CUST_NDX 1599995 25-AUG-11 23:31:57

23:31:57 SQL> exec dbms_lock.sleep(180);

PL/SQL procedure successfully completed.

23:34:58 SQL> -- gather statistics
23:34:58 SQL> exec dbms_stats.gather_table_stats('','GTS_DIRECT',estimate_percent=>30,degree=>2);

PL/SQL procedure successfully completed.

23:35:06 SQL> -- recheck statistics
23:35:06 SQL> select table_name Object_name, num_rows, last_analyzed
23:35:06 2 from user_tables
23:35:06 3 where table_name = 'GTS_DIRECT'
23:35:06 4 union
23:35:06 5 select index_name Object_name, num_rows, last_analyzed
23:35:06 6 from user_indexes
23:35:06 7 where table_name = 'GTS_DIRECT'
23:35:06 8 /

OBJECT_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ------------------
GTS_DIRECT 1599103 25-AUG-11 23:35:04
GTS_DIRECT_CUST_NDX 1645512 25-AUG-11 23:35:06

23:35:06 SQL>

Again, the GATHER_TABLE_STATS resulted in a fresh gathering of Index Statistics, overwriting those that the CREATE INDEX had meticulously done !

What if I try again after some time ?

23:43:04 SQL> exec dbms_stats.gather_table_stats('','GTS_NORMAL',estimate_percent=>30,degree=>2);

PL/SQL procedure successfully completed.

23:43:24 SQL> select table_name Object_name, num_rows, last_analyzed
from user_tables
where table_name = 'GTS_NORMAL'
union
select index_name Object_name, num_rows, last_analyzed
from user_indexes
where table_name = 'GTS_NORMAL'
/
23:43:55 2 23:43:55 3 23:43:55 4 23:43:55 5 23:43:55 6 23:43:55 7 23:43:55 8
OBJECT_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ------------------
GTS_NORMAL 1597310 25-AUG-11 23:43:23
GTS_NORMAL_CUST_NDX 1594325 25-AUG-11 23:43:24

23:43:55 SQL>
23:43:55 SQL> exec dbms_stats.gather_table_stats('','GTS_DIRECT',estimate_percent=>30,degree=>2);

PL/SQL procedure successfully completed.

23:44:58 SQL> select table_name Object_name, num_rows, last_analyzed
from user_tables
where table_name = 'GTS_DIRECT'
union
select index_name Object_name, num_rows, last_analyzed
from user_indexes
where table_name = 'GTS_DIRECT'
/
23:45:13 2 23:45:13 3 23:45:13 4 23:45:13 5 23:45:13 6 23:45:13 7 23:45:13 8
OBJECT_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ------------------
GTS_DIRECT 1598393 25-AUG-11 23:44:57
GTS_DIRECT_CUST_NDX 1629881 25-AUG-11 23:44:58

23:45:13 SQL>


So, a run of DBMS_STATS.GATHER_TABLE_STATS seems to re-gather Index Statistics, even if Index Statistics are very fresh and do not need to be updated. What gives ? Why does AUTO_CASCADE behave in this manner ?

.
.
.



2 comments:

Joel Garry said...

I'm not sure why it should be different in theory. You are telling it to auto-cascade, which means you are smarter than it is about whether the interval between last gathering and the cascade gathering has a significant amount of data change. One data point can add enough skew to influence the optimizer with an edge case, right?

I suspect even zero data change can have an effect, if you consider something like an outside change that affects the optimizer (like changing the SGA size that flips a particular table between a small and normal size definition, perhaps system statistics, moving data files to a different device, init parameters, who knows?).

word: sheroca

Hemant K Chitale said...

Joel,
In this case there has been absolutely no change to the data OR to any parameters after the CREATE INDEX.
I wouldn't agree with the logic of re-gathering statistics because there has been an external change (parameters, system_stats, relocation of files etc). The expected behaviour is that if parameters/system_stats etc change then the optimizer may come up with a different plan for the same statistics. But table/index statistics gathering should not be determined by 'non-data" changes. While "non-data" changes can and will influence the optimizer.