Hemant's Oracle DBA Blog: Testing Gather Stats behaviour based on COL

Here I am posting some results on tests with COL_USAGE$ and GATHER_STATS behaviours of 'FOR ALL COLUMNS SIZE AUTOSKEWONLY250' with different ESTIMATE_PERCENTs.

I've put in some, what I hope are, explanatory notes in the output.

These tests were on 10.2.0.4 where a few bugs are fixed. I had slightly different results in 10.2.0.2 (a different database so different rows counts). I plan to post the same test results on other 10.2.0.1/10.2.0.3 databases, when possible. The bugs relate to a higher degree of inaccuracy in colum histograms / histogram types. 10.2.0.4 seems to be more reliable, but not 100%

23:26:08 SQL> select table_name, num_rows, last_analyzed from user_tables;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED                                                                            
------------------------------ ---------- ---------------                                                                          
SOURCE_TABLE                                                                                                                       

23:26:19 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

no rows selected

23:26:38 SQL> create table test_data (owner_col varchar2(30), name_col varchar2(128), id_col number, type_col varchar2(19), created_col date);

Table created.

23:28:06 SQL> insert into test_data select owner, object_name, object_id, object_type, created from dba_objects;

50605 rows created.

23:28:28 SQL> commit;

Commit complete.

23:28:33 SQL> select object_id, data_object_id from user_objects where object_name  ='TEST_DATA';

 OBJECT_ID DATA_OBJECT_ID                                                                                                          
---------- --------------                                                                                                          
     53392          53392                                                                                                          

23:28:57 SQL> insert into test_data select * from test_data union all select * from test_data;

101210 rows created.

23:29:20 SQL> commit;

Commit complete.

23:29:24 SQL> select count(*) from test_data;

  COUNT(*)                                                                                                                         
----------                                                                                                                         
    151815                                                                                                                         

23:29:47 SQL> select count(*) from test_data where owner_col = 'HEMANT';

  COUNT(*)                                                                                                                         
----------                                                                                                                         
         9                                                                                                                         

23:30:09 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:30:26 SQL> select count(*) from test_data where owner_col = 'HEMANT';

  COUNT(*)                                                                                                                         
----------                                                                                                                         
         9                                                                                                                         

23:30:50 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:31:26 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP                  
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------            
     53392          1              1              0                 0           0          0          0 01-AUG 23:30:26            

23:31:43 SQL> select object_name, object_id from user_objects;

OBJECT_NAME                                                                                                                        
--------------------------------------------------------------------------------------------------------------------------------   
 OBJECT_ID                                                                                                                         
----------                                                                                                                         
SOURCE_TABLE                                                                                                                       
     53391                                                                                                                         
                                                                                                                                   
TEST_DATA                                                                                                                          
     53392                                                                                                                         
                                                                                                                                   
LEV_SEQ                                                                                                                            
     53322                                                                                                                         
                                                                                                                                   

23:32:00 SQL> REM So, for OBJ 53392 (table TEST_DATA), COL_USAGE$ show that Col#1 (OWNER_COL has had 1 query on Equality Predicate)
23:38:44 SQL>
23:38:44 SQL> select count(*) from test_data where id_col > 1000;

  COUNT(*)                                                                                                                         
----------                                                                                                                         
    148953                                                                                                                         

23:39:04 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP                  
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------            
     53392          1              1              0                 0           0          0          0 01-AUG 23:30:26            

REM  The query on iD_COL (which is INTCOL#3 in the table) isn't reflected in COL_USAGE$ yet.
REM  We'll flush database monitoring to ensure that the data is reflected
REM   note : A Gather_Statistics call would be expected to flush monitoring as well

23:39:13 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:39:23 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP                  
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------            
     53392          1              1              0                 0           0          0          0 01-AUG 23:30:26            
     53392          3              0              0                 0           1          0          0 01-AUG 23:39:23            

23:39:31 SQL> REM The entry for Col# (ID_COL) now shows that a RangeScan was executed
23:40:09 SQL>
23:40:09 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:40:20 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANALYZED                                                                
------------------------------ ---------- ----------- ---------------                                                              
SOURCE_TABLE                       303108       60119 01-AUG 23:40:20                                                              
TEST_DATA                          152116       38029 01-AUG 23:40:20                                                              

23:40:36 SQL> REM Oracle used an Auto Size to sample Table statistics.
23:40:58 SQL> REM what about columns ?

23:41:07 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:41:28   2  from user_tab_col_statistics
23:41:32   3  where table_name = 'TEST_DATA';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED                       
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------                     
OWNER_COL                                22 3.3324E-06          22 FREQUENCY              5832 01-AUG 23:40:20                     
NAME_COL                              28387 .000035227           1 NONE                  38029 01-AUG 23:40:20                     
ID_COL                                50487 .000019807           1 NONE                  38029 01-AUG 23:40:20                     
TYPE_COL                                 31 .032258065           1 NONE                   5832 01-AUG 23:40:20                     
CREATED_COL                             935 .001069519           1 NONE                   5832 01-AUG 23:40:20                     

23:41:49 SQL> REM Oracle gathered a Frequency Histogram on OWNER_COL.

23:42:44 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100);

PL/SQL procedure successfully completed.

23:43:06 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANALYZED                                                                
------------------------------ ---------- ----------- ---------------                                                              
SOURCE_TABLE                       303666      303666 01-AUG 23:43:04                                                              
TEST_DATA                          151815      151815 01-AUG 23:43:06                                                              

23:43:14 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:43:22   2  from user_tab_col_statistics
23:43:28   3  where table_name = 'TEST_DATA';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED                       
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------                     
OWNER_COL                                26 3.2935E-06          26 FREQUENCY            151815 01-AUG 23:43:06                     
NAME_COL                              30275 .000033031           1 NONE                 151815 01-AUG 23:43:06                     
ID_COL                                50605 .000019761           1 NONE                 151815 01-AUG 23:43:06                     
TYPE_COL                                 41 .024390244           1 NONE                 151815 01-AUG 23:43:06                     
CREATED_COL                            1380 .000724638           1 NONE                 151815 01-AUG 23:43:06                     

23:43:33 SQL> REM  Sampling is 100% but no new histograms are gathered
23:44:24 SQL>
23:44:44 SQL> select count(*) from test_data where type_col like 'P%';

  COUNT(*)                                                                                                                         
----------                                                                                                                         
      5268                                                                                                                         

23:44:53 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP                  
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------            
     53392          1              1              0                 0           0          0          0 01-AUG 23:30:26            
     53392          3              0              0                 0           1          0          0 01-AUG 23:39:23            

23:45:06 SQL> REM Col_Usage still shows queries on OWNER_COL and ID_COL
23:45:21 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:45:31 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP                  
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------            
     53392          1              1              0                 0           0          0          0 01-AUG 23:30:26            
     53392          3              0              0                 0           1          0          0 01-AUG 23:39:23            
     53392          4              0              0                 0           0          1          0 01-AUG 23:45:31            

23:45:38 SQL> REM Now we see the LIKE_PREDs having been used for Col#4 (TYPE_COL)
23:46:10 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:46:27 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANALYZED                                                                
------------------------------ ---------- ----------- ---------------                                                              
SOURCE_TABLE                       303256       75814 01-AUG 23:46:27                                                              
TEST_DATA                          152828       38207 01-AUG 23:46:27                                                              

23:46:40 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:47:10   2  from user_tab_col_statistics
23:47:23   3  where table_name = 'TEST_DATA';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED                       
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------                     
OWNER_COL                                22 .000003283          22 FREQUENCY              6332 01-AUG 23:46:27                     
NAME_COL                              28368 .000035251           1 NONE                  38207 01-AUG 23:46:27                     
ID_COL                                50662 .000019739           1 NONE                  38207 01-AUG 23:46:27                     
TYPE_COL                                 30 .000003283          30 FREQUENCY              6332 01-AUG 23:46:27                     
CREATED_COL                             950 .001052632           1 NONE                   6332 01-AUG 23:46:27                     

23:47:36 SQL> REM Now, Table level statistics are updated and a new Histogram on TYPE_COL has been gathered
23:48:44 SQL> REM The Presence of TYPE_COL (INTCOL#=4) in SYS.COL_USAGE$ drove the 'FOR ALL COLUMN SIZE AUTO
23:49:16 SQL> REM to generate a histogram on this column
23:49:34 SQL> REM Earlier, there was no Histogram on this column
23:50:04 SQL> REM Therefore, the first one or few queries on this column would have been executed without the
23:50:23 SQL> REM advantage of Histogram !
23:50:37 SQL>
23:50:47 SQL> create table other_test_data as select * from test_data;

Table created.

23:51:01 SQL> exec dbms_Stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:51:11 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANALYZED                                                                
------------------------------ ---------- ----------- ---------------                                                              
OTHER_TEST_DATA                                                                                                                    
SOURCE_TABLE                       303256       75814 01-AUG 23:46:27                                                              
TEST_DATA                          152828       38207 01-AUG 23:46:27                                                              

23:51:20 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:51:57 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANALYZED                                                                
------------------------------ ---------- ----------- ---------------                                                              
OTHER_TEST_DATA                    151376       37844 01-AUG 23:51:56                                                              
SOURCE_TABLE                       301861       51707 01-AUG 23:51:57                                                              
TEST_DATA                          151992       37998 01-AUG 23:51:57                                                              

23:52:05 SQL> REM OK, we now have table statistics on OTHER_TEST_DATA
23:52:14 SQL> REM   ... what about columns ?
23:52:20 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:52:32   2  from user_tab_col_statistics
23:52:40   3  wehre table_name = 'OTHER_TEST_DATA'
23:52:47 SQL> l3
  3* wehre table_name = 'OTHER_TEST_DATA'
23:52:49 SQL> c/wehre/where
  3* where table_name = 'OTHER_TEST_DATA'
23:52:55 SQL> l
  1  select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
  2  from user_tab_col_statistics
  3* where table_name = 'OTHER_TEST_DATA'
23:52:55 SQL> /

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED                       
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------                     
OWNER_COL                                25        .04           1 NONE                   5781 01-AUG 23:51:56                     
NAME_COL                              28382 .000035234           1 NONE                  37844 01-AUG 23:51:56                     
ID_COL                                50681 .000019731           1 NONE                  37844 01-AUG 23:51:56                     
TYPE_COL                                 30 .033333333           1 NONE                   5781 01-AUG 23:51:56                     
CREATED_COL                             929 .001076426           1 NONE                   5781 01-AUG 23:51:56                     

23:52:56 SQL> REM This table has exactly the same data as table "TEST_DATA"
23:53:10 SQL> REM Yet, Oracle's method_opt=>'FOR ALL COLUMNS SIZE AUTO' behaviour did not gather any histograms
23:53:34 SQL> REM The reason ? That sys.col_usage$ does not indicate any queries as having been executed yet !
23:53:55 SQL> REM     slightly counter-intuitive ? Until queries are executed, Oracle will not decide which
23:54:18 SQL> REM histograms to create.
23:54:23 SQL> REM     but the first few queries will, probably, "suffer" for lack of histograms
23:54:46 SQL>
23:54:46 SQL> exec dbms_stats.gather_schema_stats(user, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

23:55:41 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANALYZED                                                                
------------------------------ ---------- ----------- ---------------                                                              
OTHER_TEST_DATA                    151724       37931 01-AUG 23:55:40                                                              
SOURCE_TABLE                       302095       58785 01-AUG 23:55:41                                                              
TEST_DATA                          152784       38196 01-AUG 23:55:41                                                              

23:55:54 SQL> col table_name format a18
23:56:15 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:56:23   2  from user_tab_col_statistics
23:56:27   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2,1;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
OTHER_TEST_DATA    CREATED_COL                             903  .00110742           1 NONE                   4915 01-AUG 23:55:40  
TEST_DATA          CREATED_COL                             899 .001112347           1 NONE                   5021 01-AUG 23:55:41  
OTHER_TEST_DATA    ID_COL                                50377  .00001985           1 NONE                  37931 01-AUG 23:55:40  
TEST_DATA          ID_COL                                50762   .0000197           1 NONE                  38196 01-AUG 23:55:41  
OTHER_TEST_DATA    NAME_COL                              28466  .00003513           1 NONE                  37931 01-AUG 23:55:40  
TEST_DATA          NAME_COL                              28556 .000035019           1 NONE                  38196 01-AUG 23:55:41  
OTHER_TEST_DATA    OWNER_COL                                23 .043478261           1 NONE                   4915 01-AUG 23:55:40  
TEST_DATA          OWNER_COL                                20        .05           1 NONE                   5021 01-AUG 23:55:41  
OTHER_TEST_DATA    TYPE_COL                                 32     .03125           1 NONE                   4915 01-AUG 23:55:40  
TEST_DATA          TYPE_COL                                 31 .032258065           1 NONE                   5021 01-AUG 23:55:41  

10 rows selected.

23:57:05 SQL> REM  All the columns are without Histograms
23:57:16 SQL> REM  The "method_opt=>'FOR ALL COLUMNS SIZE 1'" has deleted the Histograms
23:57:45 SQL>
23:57:55 SQL> exec dbms_stats.gather_schema_stats(user,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

23:58:11 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME           NUM_ROWS SAMPLE_SIZE LAST_ANALYZED                                                                            
------------------ ---------- ----------- ---------------                                                                          
OTHER_TEST_DATA        151456       37864 01-AUG 23:58:07                                                                          
SOURCE_TABLE           303666      303666 01-AUG 23:58:10                                                                          
TEST_DATA              152804       38201 01-AUG 23:58:11                                                                          

23:58:19 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:58:28   2  from user_tab_col_statistics
23:58:35   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                             957 .002409639         254 HEIGHT BALANCED        6456 01-AUG 23:58:11  
OTHER_TEST_DATA    CREATED_COL                             886  .00243309         254 HEIGHT BALANCED        5061 01-AUG 23:58:07  
TEST_DATA          ID_COL                                50294 .000019883           1 NONE                  38201 01-AUG 23:58:11  
OTHER_TEST_DATA    ID_COL                                50622 .000019754           1 NONE                  37864 01-AUG 23:58:07  
TEST_DATA          NAME_COL                              28410 .000052648         254 HEIGHT BALANCED       38201 01-AUG 23:58:11  
OTHER_TEST_DATA    NAME_COL                              28447 .000052918         254 HEIGHT BALANCED       37864 01-AUG 23:58:07  
TEST_DATA          OWNER_COL                                23 3.3141E-06          23 FREQUENCY              6456 01-AUG 23:58:11  
OTHER_TEST_DATA    OWNER_COL                                20 3.2683E-06          20 FREQUENCY              5061 01-AUG 23:58:07  
TEST_DATA          TYPE_COL                                 31 3.3141E-06          31 FREQUENCY              6456 01-AUG 23:58:11  
OTHER_TEST_DATA    TYPE_COL                                 31 3.2683E-06          31 FREQUENCY              5061 01-AUG 23:58:07  

10 rows selected.

23:58:47 SQL> REM Aah ! SIZE SKEWONLY caused Histograms to be gathered on most columns
23:59:13 SQL> REM  but do you notice that sample sizes are different between the two tables ?
23:59:26 SQL> REM  and that estmated "num_distinct" and "num_buckets"  aren't the same ! (even though the data is the same !)
23:59:51 SQL> REM  sampling errors have creeped in !
REM  I'd like to test this back in 10.2.0.x  where x is less than 4 !!
00:00:00 SQL>
00:00:01 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

00:00:41 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:00:49   2  from user_tab_col_statistics
00:00:55   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                            1380 .002350099         254 HEIGHT BALANCED      151815 02-AUG 00:00:41  
OTHER_TEST_DATA    CREATED_COL                            1380 .002350099         254 HEIGHT BALANCED      151815 02-AUG 00:00:30  
TEST_DATA          ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:00:41  
OTHER_TEST_DATA    ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:00:30  
TEST_DATA          NAME_COL                              30275   .0000387         254 HEIGHT BALANCED      151815 02-AUG 00:00:41  
OTHER_TEST_DATA    NAME_COL                              30275   .0000387         254 HEIGHT BALANCED      151815 02-AUG 00:00:30  
TEST_DATA          OWNER_COL                                26 3.2935E-06          26 FREQUENCY            151815 02-AUG 00:00:41  
OTHER_TEST_DATA    OWNER_COL                                26 3.2935E-06          26 FREQUENCY            151815 02-AUG 00:00:30  
TEST_DATA          TYPE_COL                                 41 3.2935E-06          41 FREQUENCY            151815 02-AUG 00:00:41  
OTHER_TEST_DATA    TYPE_COL                                 41 3.2935E-06          41 FREQUENCY            151815 02-AUG 00:00:30  

10 rows selected.

00:01:03 SQL> REM  Now, with 100% sampling and SIZE SKEWONLY, we have "better" histograms
00:02:18 SQL> REM  (see, e.g. how OWNER_COL's NUM_DISTINCT is up from 23 and 20 to 26 and 26 for the two tables)
00:02:45 SQL> REM  (or, TYPE_COL NUM_DISTINCT is up from 31 to 41)
REM  I'd like to test this back in 10.2.0.x  where x is less than 4 !!
00:03:01 SQL>
00:09:42 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

00:11:02 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:11:09   2  from user_tab_col_statistics
00:11:15   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                            1380 .000724638           1 NONE                 151815 02-AUG 00:11:02  
OTHER_TEST_DATA    CREATED_COL                            1380 .000724638           1 NONE                 151815 02-AUG 00:10:54  
TEST_DATA          ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:11:02  
OTHER_TEST_DATA    ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:10:54  
TEST_DATA          NAME_COL                              30275 .000033031           1 NONE                 151815 02-AUG 00:11:02  
OTHER_TEST_DATA    NAME_COL                              30275 .000033031           1 NONE                 151815 02-AUG 00:10:54  
TEST_DATA          OWNER_COL                                26 3.2935E-06          26 FREQUENCY            151815 02-AUG 00:11:02  
OTHER_TEST_DATA    OWNER_COL                                26 .038461538           1 NONE                 151815 02-AUG 00:10:54  
TEST_DATA          TYPE_COL                                 41 3.2935E-06          41 FREQUENCY            151815 02-AUG 00:11:02  
OTHER_TEST_DATA    TYPE_COL                                 41 .024390244           1 NONE                 151815 02-AUG 00:10:54  

10 rows selected.

00:11:21 SQL> REM  Note how the "SIZE AUTO" reverted to Histograms on OWNER_COL and TYPE_COL only in TEST_DATA !
00:12:29 SQL> REM  It isn't intelligent enough to figure out that OTHER_TEST_DATA's columns are also candidates
00:13:03 SQL>
00:13:20 SQL> select count(*) from other_test_data where type_col like 'T%';

  COUNT(*)                                                                                                                         
----------                                                                                                                         
     12291                                                                                                                         

00:13:34 SQL> exec dbms_Stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

00:14:00 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

00:14:27 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:14:28   2  from user_tab_col_statistics
00:14:34   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                            1380 .000724638           1 NONE                 151815 02-AUG 00:14:27  
OTHER_TEST_DATA    CREATED_COL                            1380 .000724638           1 NONE                 151815 02-AUG 00:14:19  
TEST_DATA          ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:14:27  
OTHER_TEST_DATA    ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:14:19  
TEST_DATA          NAME_COL                              30275 .000033031           1 NONE                 151815 02-AUG 00:14:27  
OTHER_TEST_DATA    NAME_COL                              30275 .000033031           1 NONE                 151815 02-AUG 00:14:19  
TEST_DATA          OWNER_COL                                26 3.2935E-06          26 FREQUENCY            151815 02-AUG 00:14:27  
OTHER_TEST_DATA    OWNER_COL                                26 .038461538           1 NONE                 151815 02-AUG 00:14:19  
TEST_DATA          TYPE_COL                                 41 3.2935E-06          41 FREQUENCY            151815 02-AUG 00:14:27  
OTHER_TEST_DATA    TYPE_COL                                 41 3.2935E-06          41 FREQUENCY            151815 02-AUG 00:14:19  

10 rows selected.

00:14:41 SQL> REM Now it has gathered a Histogram on OTHER_TEST_DATA.TYPE_COL because it is now reflected in SYS.COL_USAGE$
00:15:23 SQL> select t.table_name, cu.* from  user_objects t, sys.col_usage$ cu
00:15:55   2  where t.object_id=cu.obj#
00:16:04 SQL> l1
  1* select t.table_name, cu.* from  user_objects t, sys.col_usage$ cu
00:16:07 SQL> c/table/object
  1* select t.object_name, cu.* from  user_objects t, sys.col_usage$ cu
00:16:10 SQL> l
  1  select t.object_name, cu.* from  user_objects t, sys.col_usage$ cu
  2* where t.object_id=cu.obj#
00:16:11 SQL> /

OBJECT_NAME                                                                                                                        
--------------------------------------------------------------------------------------------------------------------------------   
      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP                  
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------            
TEST_DATA                                                                                                                          
     53392          4              0              0                 0           0          1          0 01-AUG 23:45:31            
                                                                                                                                   
TEST_DATA                                                                                                                          
     53392          3              0              0                 0           1          0          0 01-AUG 23:39:23            
                                                                                                                                   
TEST_DATA                                                                                                                          
     53392          1              1              0                 0           0          0          0 01-AUG 23:30:26            
                                                                                                                                   
OTHER_TEST_DATA                                                                                                                    
     53395          4              0              0                 0           0          1          0 02-AUG 00:14:00            
                                                                                                                                   

00:16:12 SQL> REM Col#4 in both tables is TYPE_COL
00:16:45 SQL> REM  In both tables, I have run a LIKE query on this column
00:17:02 SQL> REM The TIMESTAMP shows that there has been no query on TEST_DATA recently, but there has been 1 query on OTHER_TEST_DATA
00:17:32 SQL>
00:17:32 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

00:18:43 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:18:50   2  from user_tab_col_statistics
00:19:04   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                            1380 .000724638           1 NONE                 151815 02-AUG 00:18:43  
OTHER_TEST_DATA    CREATED_COL                            1380 .000724638           1 NONE                 151815 02-AUG 00:18:36  
TEST_DATA          ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:18:43  
OTHER_TEST_DATA    ID_COL                                50605 .000019761           1 NONE                 151815 02-AUG 00:18:36  
TEST_DATA          NAME_COL                              30275 .000033031           1 NONE                 151815 02-AUG 00:18:43  
OTHER_TEST_DATA    NAME_COL                              30275 .000033031           1 NONE                 151815 02-AUG 00:18:36  
TEST_DATA          OWNER_COL                                26 .038461538           1 NONE                 151815 02-AUG 00:18:43  
OTHER_TEST_DATA    OWNER_COL                                26 .038461538           1 NONE                 151815 02-AUG 00:18:36  
TEST_DATA          TYPE_COL                                 41 .024390244           1 NONE                 151815 02-AUG 00:18:43  
OTHER_TEST_DATA    TYPE_COL                                 41 .024390244           1 NONE                 151815 02-AUG 00:18:36  

10 rows selected.

00:19:08 SQL> REM  Now, the "SIZE 1" has deleted all Histograms
00:19:21 SQL>
00:19:21 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250');

PL/SQL procedure successfully completed.

00:19:51 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:19:53   2 
00:20:00 SQL>
00:20:01 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:20:05   2  from user_tab_col_statistics
00:20:11   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                            1380 .002372155         250 HEIGHT BALANCED      151815 02-AUG 00:19:51  
OTHER_TEST_DATA    CREATED_COL                            1380 .002372155         250 HEIGHT BALANCED      151815 02-AUG 00:19:41  
TEST_DATA          ID_COL                                50605 .000019761         250 HEIGHT BALANCED      151815 02-AUG 00:19:51  
OTHER_TEST_DATA    ID_COL                                50605 .000019761         250 HEIGHT BALANCED      151815 02-AUG 00:19:41  
TEST_DATA          NAME_COL                              30275   .0000387         250 HEIGHT BALANCED      151815 02-AUG 00:19:51  
OTHER_TEST_DATA    NAME_COL                              30275   .0000387         250 HEIGHT BALANCED      151815 02-AUG 00:19:41  
TEST_DATA          OWNER_COL                                26 3.2935E-06          26 FREQUENCY            151815 02-AUG 00:19:51  
OTHER_TEST_DATA    OWNER_COL                                26 3.2935E-06          26 FREQUENCY            151815 02-AUG 00:19:41  
TEST_DATA          TYPE_COL                                 41 3.2935E-06          41 FREQUENCY            151815 02-AUG 00:19:51  
OTHER_TEST_DATA    TYPE_COL                                 41 3.2935E-06          41 FREQUENCY            151815 02-AUG 00:19:41  

10 rows selected.

00:20:21 SQL> REM Conclusions :
00:21:30 SQL> REM 1.  SIZE AUTO  relies on prior queries being reflected on SYS.COL_USAGE$
00:21:48 SQL> REM     If this is a new table or the column hasn't been queried, Oracle will not gather Histograms even if it is a candidate
00:22:15 SQL> REM 2.  SIZE SKEWONLY gathers histograms irrespective of whether the table and/or column has been queried before
00:22:48 SQL> REM 3.  The Sample Size is a signficant determinant of accuracy (NUM_DISTINCT and NUM_BUCKETS)
REM  Another thing to test in 10.2.0.[1-3]
00:23:23 SQL>
00:23:24 SQL>
00:23:24 SQL>
REM  Causing a very high degree of skew
00:24:18 SQL> update test_data set owner_col = 'HEMANT_X' where mod(id_col,100) != 0;

150273 rows updated.

00:24:43 SQL> commit;

Commit complete.

00:24:45 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

00:25:15 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:25:26   2  from user_tab_col_statistics
00:25:32   3  where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                             934 .001070664           1 NONE                   5950 02-AUG 00:25:15  
OTHER_TEST_DATA    CREATED_COL                             941 .001062699           1 NONE                   5730 02-AUG 00:25:13  
TEST_DATA          ID_COL                                50279 .000019889           1 NONE                  37863 02-AUG 00:25:15  
OTHER_TEST_DATA    ID_COL                                50729 .000019713           1 NONE                  37813 02-AUG 00:25:13  
TEST_DATA          NAME_COL                              28255 .000035392           1 NONE                  37863 02-AUG 00:25:15  
OTHER_TEST_DATA    NAME_COL                              28643 .000034913           1 NONE                  37813 02-AUG 00:25:13  
TEST_DATA          OWNER_COL                                10 3.2925E-06          10 FREQUENCY              5950 02-AUG 00:25:15  
OTHER_TEST_DATA    OWNER_COL                                24 .041666667           1 NONE                   5730 02-AUG 00:25:13  
TEST_DATA          TYPE_COL                                 35 3.2925E-06          35 FREQUENCY              5950 02-AUG 00:25:15  
OTHER_TEST_DATA    TYPE_COL                                 30 3.2623E-06          30 FREQUENCY              5730 02-AUG 00:25:13  

10 rows selected.

00:25:42 SQL> select owner_col, count(*) from test_data group by owner_col;

OWNER_COL                        COUNT(*)                                                                                          
------------------------------ ----------                                                                                          
MDSYS                                  27                                                                                          
DMSYS                                   3                                                                                          
PUBLIC                                987                                                                                          
OLAPSYS                                27                                                                                          
CTXSYS                                 12                                                                                          
SYSTEM                                  9                                                                                          
EXFSYS                                  6                                                                                          
SH                                      9                                                                                          
ORDSYS                                 48                                                                                          
ORDPLUGINS                              3                                                                                          
SYSMAN                                 39                                                                                          
OE                                      6                                                                                          
XDB                                    27                                                                                          
IX                                      3                                                                                          
BI                                      3                                                                                          
HEMANT_X                           150273                                                                                          
SYS                                   321                                                                                          
WMSYS                                  12                                                                                          

18 rows selected.

00:26:11 SQL> REM  With very high skew (HEMANT_X accounting for 99% of the rows)
00:26:28 SQL> REM  the AUTO_SAMPLE_SIZE of 5,950 generated a Frequency Histogram of 10 buckets for 10 distinct values
00:27:17 SQL> REM  However, the  column actually has 18 distint values !!
REM  Another thing to test in 10.2.0.[1-3]

00:27:33 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>20);

PL/SQL procedure successfully completed.

00:27:58 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:28:13   2  from user_tab_col_statistics
00:28:24   3   where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME         COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM       SAMPLE_SIZE LAST_ANALYZED    
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------  
TEST_DATA          CREATED_COL                            1269 .000788022           1 NONE                  30211 02-AUG 00:27:58  
OTHER_TEST_DATA    CREATED_COL                            1262 .000792393           1 NONE                  30390 02-AUG 00:27:54  
TEST_DATA          ID_COL                                50492 .000019805           1 NONE                  30211 02-AUG 00:27:58  
OTHER_TEST_DATA    ID_COL                                50341 .000019865           1 NONE                  30390 02-AUG 00:27:54  
TEST_DATA          NAME_COL                              28226 .000035428           1 NONE                  30211 02-AUG 00:27:58  
OTHER_TEST_DATA    NAME_COL                              28169   .0000355           1 NONE                  30390 02-AUG 00:27:54  
TEST_DATA          OWNER_COL                                17 3.3101E-06          17 FREQUENCY             30211 02-AUG 00:27:58  
OTHER_TEST_DATA    OWNER_COL                                26 .038461538           1 NONE                  30390 02-AUG 00:27:54  
TEST_DATA          TYPE_COL                                 38 3.3101E-06          38 FREQUENCY             30211 02-AUG 00:27:58  
OTHER_TEST_DATA    TYPE_COL                                 36 3.2906E-06          36 FREQUENCY             30390 02-AUG 00:27:54  

10 rows selected.

00:28:33 SQL> REM With a 20% sample, Oracle came up with an estimate of 17 distinct values.
00:29:36 SQL>
00:29:37 SQL> spool off
I hope that you can follow what might seem haphazard testing.
I hadn't prepared a "test script" with all the queries in advance but was building the queries as I went along, based on results of preceding queries.
Hemant's Oracle DBA Blog

02 August, 2008

Testing Gather Stats behaviour based on COL_USAGE

No comments:

Aggregated by orafaq.com

OCP 12c