02 August, 2008

Testing Gather Stats behaviour based on COL_USAGE

Here I am posting some results on tests with COL_USAGE$ and GATHER_STATS behaviours of 'FOR ALL COLUMNS SIZE AUTOSKEWONLY250' with different ESTIMATE_PERCENTs.

I've put in some, what I hope are, explanatory notes in the output.

These tests were on 10.2.0.4 where a few bugs are fixed. I had slightly different results in 10.2.0.2 (a different database so different rows counts). I plan to post the same test results on other 10.2.0.1/10.2.0.3 databases, when possible. The bugs relate to a higher degree of inaccuracy in colum histograms / histogram types. 10.2.0.4 seems to be more reliable, but not 100%



23:26:08 SQL> select table_name, num_rows, last_analyzed from user_tables;

TABLE_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ---------------
SOURCE_TABLE

23:26:19 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

no rows selected

23:26:38 SQL> create table test_data (owner_col varchar2(30), name_col varchar2(128), id_col number, type_col varchar2(19), created_col date);

Table created.

23:28:06 SQL> insert into test_data select owner, object_name, object_id, object_type, created from dba_objects;

50605 rows created.

23:28:28 SQL> commit;

Commit complete.

23:28:33 SQL> select object_id, data_object_id from user_objects where object_name ='TEST_DATA';

OBJECT_ID DATA_OBJECT_ID
---------- --------------
53392 53392

23:28:57 SQL> insert into test_data select * from test_data union all select * from test_data;

101210 rows created.

23:29:20 SQL> commit;

Commit complete.

23:29:24 SQL> select count(*) from test_data;

COUNT(*)
----------
151815

23:29:47 SQL> select count(*) from test_data where owner_col = 'HEMANT';

COUNT(*)
----------
9

23:30:09 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:30:26 SQL> select count(*) from test_data where owner_col = 'HEMANT';

COUNT(*)
----------
9

23:30:50 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:31:26 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26

23:31:43 SQL> select object_name, object_id from user_objects;

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
OBJECT_ID
----------
SOURCE_TABLE
53391

TEST_DATA
53392

LEV_SEQ
53322


23:32:00 SQL> REM So, for OBJ 53392 (table TEST_DATA), COL_USAGE$ show that Col#1 (OWNER_COL has had 1 query on Equality Predicate)
23:38:44 SQL>
23:38:44 SQL> select count(*) from test_data where id_col > 1000;

COUNT(*)
----------
148953

23:39:04 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26

REM The query on iD_COL (which is INTCOL#3 in the table) isn't reflected in COL_USAGE$ yet.
REM We'll flush database monitoring to ensure that the data is reflected
REM note : A Gather_Statistics call would be expected to flush monitoring as well

23:39:13 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:39:23 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26
53392 3 0 0 0 1 0 0 01-AUG 23:39:23

23:39:31 SQL> REM The entry for Col# (ID_COL) now shows that a RangeScan was executed
23:40:09 SQL>
23:40:09 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:40:20 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
SOURCE_TABLE 303108 60119 01-AUG 23:40:20
TEST_DATA 152116 38029 01-AUG 23:40:20

23:40:36 SQL> REM Oracle used an Auto Size to sample Table statistics.
23:40:58 SQL> REM what about columns ?

23:41:07 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:41:28 2 from user_tab_col_statistics
23:41:32 3 where table_name = 'TEST_DATA';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 22 3.3324E-06 22 FREQUENCY 5832 01-AUG 23:40:20
NAME_COL 28387 .000035227 1 NONE 38029 01-AUG 23:40:20
ID_COL 50487 .000019807 1 NONE 38029 01-AUG 23:40:20
TYPE_COL 31 .032258065 1 NONE 5832 01-AUG 23:40:20
CREATED_COL 935 .001069519 1 NONE 5832 01-AUG 23:40:20

23:41:49 SQL> REM Oracle gathered a Frequency Histogram on OWNER_COL.

23:42:44 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100);

PL/SQL procedure successfully completed.

23:43:06 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
SOURCE_TABLE 303666 303666 01-AUG 23:43:04
TEST_DATA 151815 151815 01-AUG 23:43:06

23:43:14 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:43:22 2 from user_tab_col_statistics
23:43:28 3 where table_name = 'TEST_DATA';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 01-AUG 23:43:06
NAME_COL 30275 .000033031 1 NONE 151815 01-AUG 23:43:06
ID_COL 50605 .000019761 1 NONE 151815 01-AUG 23:43:06
TYPE_COL 41 .024390244 1 NONE 151815 01-AUG 23:43:06
CREATED_COL 1380 .000724638 1 NONE 151815 01-AUG 23:43:06

23:43:33 SQL> REM Sampling is 100% but no new histograms are gathered
23:44:24 SQL>
23:44:44 SQL> select count(*) from test_data where type_col like 'P%';

COUNT(*)
----------
5268

23:44:53 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26
53392 3 0 0 0 1 0 0 01-AUG 23:39:23

23:45:06 SQL> REM Col_Usage still shows queries on OWNER_COL and ID_COL
23:45:21 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:45:31 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26
53392 3 0 0 0 1 0 0 01-AUG 23:39:23
53392 4 0 0 0 0 1 0 01-AUG 23:45:31

23:45:38 SQL> REM Now we see the LIKE_PREDs having been used for Col#4 (TYPE_COL)
23:46:10 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:46:27 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
SOURCE_TABLE 303256 75814 01-AUG 23:46:27
TEST_DATA 152828 38207 01-AUG 23:46:27

23:46:40 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:47:10 2 from user_tab_col_statistics
23:47:23 3 where table_name = 'TEST_DATA';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 22 .000003283 22 FREQUENCY 6332 01-AUG 23:46:27
NAME_COL 28368 .000035251 1 NONE 38207 01-AUG 23:46:27
ID_COL 50662 .000019739 1 NONE 38207 01-AUG 23:46:27
TYPE_COL 30 .000003283 30 FREQUENCY 6332 01-AUG 23:46:27
CREATED_COL 950 .001052632 1 NONE 6332 01-AUG 23:46:27

23:47:36 SQL> REM Now, Table level statistics are updated and a new Histogram on TYPE_COL has been gathered
23:48:44 SQL> REM The Presence of TYPE_COL (INTCOL#=4) in SYS.COL_USAGE$ drove the 'FOR ALL COLUMN SIZE AUTO
23:49:16 SQL> REM to generate a histogram on this column
23:49:34 SQL> REM Earlier, there was no Histogram on this column
23:50:04 SQL> REM Therefore, the first one or few queries on this column would have been executed without the
23:50:23 SQL> REM advantage of Histogram !
23:50:37 SQL>
23:50:47 SQL> create table other_test_data as select * from test_data;

Table created.

23:51:01 SQL> exec dbms_Stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:51:11 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
OTHER_TEST_DATA
SOURCE_TABLE 303256 75814 01-AUG 23:46:27
TEST_DATA 152828 38207 01-AUG 23:46:27

23:51:20 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:51:57 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
OTHER_TEST_DATA 151376 37844 01-AUG 23:51:56
SOURCE_TABLE 301861 51707 01-AUG 23:51:57
TEST_DATA 151992 37998 01-AUG 23:51:57

23:52:05 SQL> REM OK, we now have table statistics on OTHER_TEST_DATA
23:52:14 SQL> REM ... what about columns ?
23:52:20 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:52:32 2 from user_tab_col_statistics
23:52:40 3 wehre table_name = 'OTHER_TEST_DATA'
23:52:47 SQL> l3
3* wehre table_name = 'OTHER_TEST_DATA'
23:52:49 SQL> c/wehre/where
3* where table_name = 'OTHER_TEST_DATA'
23:52:55 SQL> l
1 select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
2 from user_tab_col_statistics
3* where table_name = 'OTHER_TEST_DATA'
23:52:55 SQL> /

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 25 .04 1 NONE 5781 01-AUG 23:51:56
NAME_COL 28382 .000035234 1 NONE 37844 01-AUG 23:51:56
ID_COL 50681 .000019731 1 NONE 37844 01-AUG 23:51:56
TYPE_COL 30 .033333333 1 NONE 5781 01-AUG 23:51:56
CREATED_COL 929 .001076426 1 NONE 5781 01-AUG 23:51:56

23:52:56 SQL> REM This table has exactly the same data as table "TEST_DATA"
23:53:10 SQL> REM Yet, Oracle's method_opt=>'FOR ALL COLUMNS SIZE AUTO' behaviour did not gather any histograms
23:53:34 SQL> REM The reason ? That sys.col_usage$ does not indicate any queries as having been executed yet !
23:53:55 SQL> REM slightly counter-intuitive ? Until queries are executed, Oracle will not decide which
23:54:18 SQL> REM histograms to create.
23:54:23 SQL> REM but the first few queries will, probably, "suffer" for lack of histograms
23:54:46 SQL>
23:54:46 SQL> exec dbms_stats.gather_schema_stats(user, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

23:55:41 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
OTHER_TEST_DATA 151724 37931 01-AUG 23:55:40
SOURCE_TABLE 302095 58785 01-AUG 23:55:41
TEST_DATA 152784 38196 01-AUG 23:55:41

23:55:54 SQL> col table_name format a18
23:56:15 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:56:23 2 from user_tab_col_statistics
23:56:27 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2,1;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OTHER_TEST_DATA CREATED_COL 903 .00110742 1 NONE 4915 01-AUG 23:55:40
TEST_DATA CREATED_COL 899 .001112347 1 NONE 5021 01-AUG 23:55:41
OTHER_TEST_DATA ID_COL 50377 .00001985 1 NONE 37931 01-AUG 23:55:40
TEST_DATA ID_COL 50762 .0000197 1 NONE 38196 01-AUG 23:55:41
OTHER_TEST_DATA NAME_COL 28466 .00003513 1 NONE 37931 01-AUG 23:55:40
TEST_DATA NAME_COL 28556 .000035019 1 NONE 38196 01-AUG 23:55:41
OTHER_TEST_DATA OWNER_COL 23 .043478261 1 NONE 4915 01-AUG 23:55:40
TEST_DATA OWNER_COL 20 .05 1 NONE 5021 01-AUG 23:55:41
OTHER_TEST_DATA TYPE_COL 32 .03125 1 NONE 4915 01-AUG 23:55:40
TEST_DATA TYPE_COL 31 .032258065 1 NONE 5021 01-AUG 23:55:41

10 rows selected.

23:57:05 SQL> REM All the columns are without Histograms
23:57:16 SQL> REM The "method_opt=>'FOR ALL COLUMNS SIZE 1'" has deleted the Histograms
23:57:45 SQL>
23:57:55 SQL> exec dbms_stats.gather_schema_stats(user,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

23:58:11 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------ ---------- ----------- ---------------
OTHER_TEST_DATA 151456 37864 01-AUG 23:58:07
SOURCE_TABLE 303666 303666 01-AUG 23:58:10
TEST_DATA 152804 38201 01-AUG 23:58:11

23:58:19 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:58:28 2 from user_tab_col_statistics
23:58:35 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 957 .002409639 254 HEIGHT BALANCED 6456 01-AUG 23:58:11
OTHER_TEST_DATA CREATED_COL 886 .00243309 254 HEIGHT BALANCED 5061 01-AUG 23:58:07
TEST_DATA ID_COL 50294 .000019883 1 NONE 38201 01-AUG 23:58:11
OTHER_TEST_DATA ID_COL 50622 .000019754 1 NONE 37864 01-AUG 23:58:07
TEST_DATA NAME_COL 28410 .000052648 254 HEIGHT BALANCED 38201 01-AUG 23:58:11
OTHER_TEST_DATA NAME_COL 28447 .000052918 254 HEIGHT BALANCED 37864 01-AUG 23:58:07
TEST_DATA OWNER_COL 23 3.3141E-06 23 FREQUENCY 6456 01-AUG 23:58:11
OTHER_TEST_DATA OWNER_COL 20 3.2683E-06 20 FREQUENCY 5061 01-AUG 23:58:07
TEST_DATA TYPE_COL 31 3.3141E-06 31 FREQUENCY 6456 01-AUG 23:58:11
OTHER_TEST_DATA TYPE_COL 31 3.2683E-06 31 FREQUENCY 5061 01-AUG 23:58:07

10 rows selected.

23:58:47 SQL> REM Aah ! SIZE SKEWONLY caused Histograms to be gathered on most columns
23:59:13 SQL> REM but do you notice that sample sizes are different between the two tables ?
23:59:26 SQL> REM and that estmated "num_distinct" and "num_buckets" aren't the same ! (even though the data is the same !)
23:59:51 SQL> REM sampling errors have creeped in !
REM I'd like to test this back in 10.2.0.x where x is less than 4 !!
00:00:00 SQL>
00:00:01 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

00:00:41 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:00:49 2 from user_tab_col_statistics
00:00:55 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .002350099 254 HEIGHT BALANCED 151815 02-AUG 00:00:41
OTHER_TEST_DATA CREATED_COL 1380 .002350099 254 HEIGHT BALANCED 151815 02-AUG 00:00:30
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:00:41
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:00:30
TEST_DATA NAME_COL 30275 .0000387 254 HEIGHT BALANCED 151815 02-AUG 00:00:41
OTHER_TEST_DATA NAME_COL 30275 .0000387 254 HEIGHT BALANCED 151815 02-AUG 00:00:30
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:00:41
OTHER_TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:00:30
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:00:41
OTHER_TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:00:30

10 rows selected.

00:01:03 SQL> REM Now, with 100% sampling and SIZE SKEWONLY, we have "better" histograms
00:02:18 SQL> REM (see, e.g. how OWNER_COL's NUM_DISTINCT is up from 23 and 20 to 26 and 26 for the two tables)
00:02:45 SQL> REM (or, TYPE_COL NUM_DISTINCT is up from 31 to 41)
REM I'd like to test this back in 10.2.0.x where x is less than 4 !!
00:03:01 SQL>
00:09:42 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

00:11:02 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:11:09 2 from user_tab_col_statistics
00:11:15 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:11:02
OTHER_TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:10:54
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:11:02
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:10:54
TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:11:02
OTHER_TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:10:54
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:11:02
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:10:54
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:11:02
OTHER_TEST_DATA TYPE_COL 41 .024390244 1 NONE 151815 02-AUG 00:10:54

10 rows selected.

00:11:21 SQL> REM Note how the "SIZE AUTO" reverted to Histograms on OWNER_COL and TYPE_COL only in TEST_DATA !
00:12:29 SQL> REM It isn't intelligent enough to figure out that OTHER_TEST_DATA's columns are also candidates
00:13:03 SQL>
00:13:20 SQL> select count(*) from other_test_data where type_col like 'T%';

COUNT(*)
----------
12291

00:13:34 SQL> exec dbms_Stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

00:14:00 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

00:14:27 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:14:28 2 from user_tab_col_statistics
00:14:34 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:14:27
OTHER_TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:14:19
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:14:27
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:14:19
TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:14:27
OTHER_TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:14:19
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:14:27
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:14:19
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:14:27
OTHER_TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:14:19

10 rows selected.

00:14:41 SQL> REM Now it has gathered a Histogram on OTHER_TEST_DATA.TYPE_COL because it is now reflected in SYS.COL_USAGE$
00:15:23 SQL> select t.table_name, cu.* from user_objects t, sys.col_usage$ cu
00:15:55 2 where t.object_id=cu.obj#
00:16:04 SQL> l1
1* select t.table_name, cu.* from user_objects t, sys.col_usage$ cu
00:16:07 SQL> c/table/object
1* select t.object_name, cu.* from user_objects t, sys.col_usage$ cu
00:16:10 SQL> l
1 select t.object_name, cu.* from user_objects t, sys.col_usage$ cu
2* where t.object_id=cu.obj#
00:16:11 SQL> /

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
TEST_DATA
53392 4 0 0 0 0 1 0 01-AUG 23:45:31

TEST_DATA
53392 3 0 0 0 1 0 0 01-AUG 23:39:23

TEST_DATA
53392 1 1 0 0 0 0 0 01-AUG 23:30:26

OTHER_TEST_DATA
53395 4 0 0 0 0 1 0 02-AUG 00:14:00


00:16:12 SQL> REM Col#4 in both tables is TYPE_COL
00:16:45 SQL> REM In both tables, I have run a LIKE query on this column
00:17:02 SQL> REM The TIMESTAMP shows that there has been no query on TEST_DATA recently, but there has been 1 query on OTHER_TEST_DATA
00:17:32 SQL>
00:17:32 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

00:18:43 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:18:50 2 from user_tab_col_statistics
00:19:04 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:18:36
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:18:36
TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:18:36
TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:18:36
TEST_DATA TYPE_COL 41 .024390244 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA TYPE_COL 41 .024390244 1 NONE 151815 02-AUG 00:18:36

10 rows selected.

00:19:08 SQL> REM Now, the "SIZE 1" has deleted all Histograms
00:19:21 SQL>
00:19:21 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250');

PL/SQL procedure successfully completed.

00:19:51 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:19:53 2
00:20:00 SQL>
00:20:01 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:20:05 2 from user_tab_col_statistics
00:20:11 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .002372155 250 HEIGHT BALANCED 151815 02-AUG 00:19:51
OTHER_TEST_DATA CREATED_COL 1380 .002372155 250 HEIGHT BALANCED 151815 02-AUG 00:19:41
TEST_DATA ID_COL 50605 .000019761 250 HEIGHT BALANCED 151815 02-AUG 00:19:51
OTHER_TEST_DATA ID_COL 50605 .000019761 250 HEIGHT BALANCED 151815 02-AUG 00:19:41
TEST_DATA NAME_COL 30275 .0000387 250 HEIGHT BALANCED 151815 02-AUG 00:19:51
OTHER_TEST_DATA NAME_COL 30275 .0000387 250 HEIGHT BALANCED 151815 02-AUG 00:19:41
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:19:51
OTHER_TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:19:41
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:19:51
OTHER_TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:19:41

10 rows selected.

00:20:21 SQL> REM Conclusions :
00:21:30 SQL> REM 1. SIZE AUTO relies on prior queries being reflected on SYS.COL_USAGE$
00:21:48 SQL> REM If this is a new table or the column hasn't been queried, Oracle will not gather Histograms even if it is a candidate
00:22:15 SQL> REM 2. SIZE SKEWONLY gathers histograms irrespective of whether the table and/or column has been queried before
00:22:48 SQL> REM 3. The Sample Size is a signficant determinant of accuracy (NUM_DISTINCT and NUM_BUCKETS)
REM Another thing to test in 10.2.0.[1-3]
00:23:23 SQL>
00:23:24 SQL>
00:23:24 SQL>
REM Causing a very high degree of skew
00:24:18 SQL> update test_data set owner_col = 'HEMANT_X' where mod(id_col,100) != 0;

150273 rows updated.

00:24:43 SQL> commit;

Commit complete.

00:24:45 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

00:25:15 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:25:26 2 from user_tab_col_statistics
00:25:32 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 934 .001070664 1 NONE 5950 02-AUG 00:25:15
OTHER_TEST_DATA CREATED_COL 941 .001062699 1 NONE 5730 02-AUG 00:25:13
TEST_DATA ID_COL 50279 .000019889 1 NONE 37863 02-AUG 00:25:15
OTHER_TEST_DATA ID_COL 50729 .000019713 1 NONE 37813 02-AUG 00:25:13
TEST_DATA NAME_COL 28255 .000035392 1 NONE 37863 02-AUG 00:25:15
OTHER_TEST_DATA NAME_COL 28643 .000034913 1 NONE 37813 02-AUG 00:25:13
TEST_DATA OWNER_COL 10 3.2925E-06 10 FREQUENCY 5950 02-AUG 00:25:15
OTHER_TEST_DATA OWNER_COL 24 .041666667 1 NONE 5730 02-AUG 00:25:13
TEST_DATA TYPE_COL 35 3.2925E-06 35 FREQUENCY 5950 02-AUG 00:25:15
OTHER_TEST_DATA TYPE_COL 30 3.2623E-06 30 FREQUENCY 5730 02-AUG 00:25:13

10 rows selected.

00:25:42 SQL> select owner_col, count(*) from test_data group by owner_col;

OWNER_COL COUNT(*)
------------------------------ ----------
MDSYS 27
DMSYS 3
PUBLIC 987
OLAPSYS 27
CTXSYS 12
SYSTEM 9
EXFSYS 6
SH 9
ORDSYS 48
ORDPLUGINS 3
SYSMAN 39
OE 6
XDB 27
IX 3
BI 3
HEMANT_X 150273
SYS 321
WMSYS 12

18 rows selected.

00:26:11 SQL> REM With very high skew (HEMANT_X accounting for 99% of the rows)
00:26:28 SQL> REM the AUTO_SAMPLE_SIZE of 5,950 generated a Frequency Histogram of 10 buckets for 10 distinct values
00:27:17 SQL> REM However, the column actually has 18 distint values !!
REM Another thing to test in 10.2.0.[1-3]

00:27:33 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>20);

PL/SQL procedure successfully completed.

00:27:58 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:28:13 2 from user_tab_col_statistics
00:28:24 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1269 .000788022 1 NONE 30211 02-AUG 00:27:58
OTHER_TEST_DATA CREATED_COL 1262 .000792393 1 NONE 30390 02-AUG 00:27:54
TEST_DATA ID_COL 50492 .000019805 1 NONE 30211 02-AUG 00:27:58
OTHER_TEST_DATA ID_COL 50341 .000019865 1 NONE 30390 02-AUG 00:27:54
TEST_DATA NAME_COL 28226 .000035428 1 NONE 30211 02-AUG 00:27:58
OTHER_TEST_DATA NAME_COL 28169 .0000355 1 NONE 30390 02-AUG 00:27:54
TEST_DATA OWNER_COL 17 3.3101E-06 17 FREQUENCY 30211 02-AUG 00:27:58
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 30390 02-AUG 00:27:54
TEST_DATA TYPE_COL 38 3.3101E-06 38 FREQUENCY 30211 02-AUG 00:27:58
OTHER_TEST_DATA TYPE_COL 36 3.2906E-06 36 FREQUENCY 30390 02-AUG 00:27:54

10 rows selected.

00:28:33 SQL> REM With a 20% sample, Oracle came up with an estimate of 17 distinct values.
00:29:36 SQL>
00:29:37 SQL> spool off




I hope that you can follow what might seem haphazard testing.
I hadn't prepared a "test script" with all the queries in advance but was building the queries as I went along, based on results of preceding queries.

No comments: