27 April, 2008

Row Sizes and Sort Operations

The Row Size can have a signicant impact on a sort operation. A SORT operation would be required not only when an explicit SORT BY is specified but also for operations like GROUP BY (10g has introduced "Group By Hash" operations to speed up Grouping without doing explicit sorts, but there are some bugs with this feature).

Here is a test case where I create a "TEST_SORT_TABLE" and index it on "ID_COLUMN" and "ITEM_TYPE".




-- create the source table with slightly large row lengths
drop table test_sort_table;
create table test_sort_table (id_column number not null, item_name varchar2(36) not null, item_type varchar2(19) not null, owner_name varchar2(30) not null, other_cols varchar2(128)) ;
alter table test_sort_table nologging;
insert /*+ APPEND */ into test_sort_table
2 select object_id, substr(object_name,1,32), object_type, owner, rpad('a',124,'f')
3 from dba_objects
4 where object_id is not null
5 order by substr(object_type,1,6)substr(object_name,4,9) ;
commit;
insert /*+ APPEND */ into test_sort_table
2 select * from test_sort_table
3 union all select * from test_sort_table
4 order by substr(item_name,1,8)substr(owner_name,2,4);
commit;
create index test_sort_table_ndx_1 on test_sort_table (id_column , item_name);
exec dbms_stats.gather_table_Stats('','TEST_SORT_TABLE',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);

-- get a "feel" for the size of the table (row count, row length, block count)
select count(*) from test_sort_table;

COUNT(*)
----------
155025
select num_rows, avg_row_len, blocks from user_tables where table_name = 'TEST_SORT_TABLE';

NUM_ROWS AVG_ROW_LEN BLOCKS
---------- ----------- ----------
155025 168 3835
select num_rows, blevel, leaf_blocks from user_indexes where index_name = 'TEST_SORT_TABLE_NDX_1';

NUM_ROWS BLEVEL LEAF_BLOCKS
---------- ---------- -----------
155025 2 865

explain plan for
2 select /* only the index columns */ id_column, item_name from test_sort_table order by id_column , item_name ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2301815179

------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 4541K 873 (1) 00:00:11
1 INDEX FULL SCAN TEST_SORT_TABLE_NDX_1 155K 4541K 873 (1) 00:00:11
------------------------------------------------------------------------------------------

explain plan for
2 select /* one additional column */ id_column, item_name, item_type from test_sort_table order by id_column , item_name ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1390622354

----------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
----------------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 5904K 2429 (2) 00:00:30
1 SORT ORDER BY 155K 5904K 15M 2429 (2) 00:00:30
2 TABLE ACCESS FULL TEST_SORT_TABLE 155K 5904K 851 (2) 00:00:11
----------------------------------------------------------------------------------------------

explain plan for
2 select /* all columns, not ordered */ * from test_sort_table ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2991649059

-------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
-------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 24M 852 (2) 00:00:11
1 TABLE ACCESS FULL TEST_SORT_TABLE 155K 24M 852 (2) 00:00:11
-------------------------------------------------------------------------------------

explain plan for
2 select /* all columns, ordered */ * from test_sort_table order by id_column , item_name ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1390622354

----------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
----------------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 24M 6597 (1) 00:01:20
1 SORT ORDER BY 155K 24M 55M 6597 (1) 00:01:20
2 TABLE ACCESS FULL TEST_SORT_TABLE 155K 24M 852 (2) 00:00:11
----------------------------------------------------------------------------------------------





In the first query, I query for only the indexed columns and request an ORDER BY on
these columns. Oracle is able to do an "Index Full Scan" (which retrieves the values *in order*) for all 155thousand rows at a "COST" of 873.

In the second query, I request 3 columns (ie, including one additional column not in the index). Since Oracle has to go to the Table to fetch the third column, it decides to do a Full Table Scan as it has to read ALL the rows from the table. It avoids using the Index. That is sensible. However, note that the total "Bytes" fetched has gone up from 4541K to 5904K because of the additional column. What is worse is that the SORT operation requires 15MB of TempSpace and increases the total "COST". Apparently, the "COST" of the SORT operation is 1,578 (2429-851).
One additional column ("ITEM_TYPE" varchar2(19) -- doesn't seem very large ?) has increased the "COST" of my query from 873 to 2,429

If we compare the third and fourth queries the TempSpace for the SORT operation is now 55MB and the "COST" goes up from 852 to 6,597. Why such a big difference ? Although my Sort is supposed to be on only 2 columns, since Oracle fetches the full row, it has to effectively "sort" all the columns. It might have been better if we could pre-sort from the Index and, then, go to the Table, to fetch the rows in the sorted order. Would it ? Actually , no. Even if we use the Index to pre-sort and get ROWIDs in sorted order, we'd be introducing multiple read calls to then fetch the rows from the table -- a single block read call for each row. Clustering of the table might help reducing physical reads but not 'consistent gets'. I leave that experiement for you to try.

So what we get is :


1. Fetch Only 2 Indexed Columns in Sorted Order : COST 873, TempSpace NIL
3. Fetch 2+1 Columns in Sorted Order : COST 2,429, TempSpace 15M
4. Fetch ALL Columns without a Sort : COST 852, TempSpace NIL
5. Fetch ALL columns and Sort on 2 columns : COST 6,597, TempSpace 55M




The next time you write a query that does a "SELECT *" stop and think about how the
Row Size affects the "COST". If you add an "ORDER BY" stop again and think twice
as hard.

21 April, 2008

Using SYS

Many DBAs regularly use SYS -- either in SQLPlus or in Enterprise Manager or in RMAN scripts. DBAs not familiar with the reasoning behind the "AS SYSDBA" might not even be aware that they connect as SYS.
I prefer to avoid using SYS as far as possible. There is no reason to use SYS to query data dictionary views and performance views if you can use DBSNMP or PERFSTAT or setup an account with the MONITORER role. Best would be to setup an account with the required privileges, reading from the privileges granted to DBSNMP and PERFSTAT.

Besides the fact that SYS is owner of the data dictionary, the "normal rules" of behaviour for monitoring and DBA accounts do not always apply to SYS. For example SYS does not get audited in the normal database audit defined by AUDIT_TRAIL. 9i introduced AUDIT_SYS_OPERATIONS to cater to the requirement to audit SYS but this does not provide the same structure as the DBA_AUDIT% views. Another difference in SYS is in read consistency. If you use SYS for Exports, please stop doing so. SYS may be used for Transportable Tablespaces in 8i/9i or under Oracle Support's direction but *not* for regular Exports. If you have any doubts, please read MetaLink Note#277237.1.

15 April, 2008

Indexed column (unique or not) -- What if it is NULLable

One of the numerous facets of the Optimizer that Jonathan Lewis touches on is the fact that the Optimizer can and does make use of information about the possible presence of NULLs in an indexed column. Where we have a single column index (or even a multi-column index if none of the columns is NOT NULL), the Optimizer has to assume that there may be one or more NULLs present for that column. It doesn't matter if statistics have been gathered very recently (just before a query is to be optimized) ; it doesn't matter if we (the DBA and Analyst) *know* that there are no NULLs in the column.
The Optimizer must assume that a column not defined as NOT NULL may have NULLs.
From that assumption, it must also assume that not every row in the table is included in an index on that column as NULLs are not indexed. {See my other postings [1] and [2] on indexing NULLs}.
Whether a column is NULLable or is explicitly NOT NULL can help Oracle determine if an Index is to be used.

I've built a small case study here. Assume that we have a PARTS table with a Unique Key (but not defined as a Primary Key) on PART_ID. What would be the COST and CARD estimates for queries against PART_ID ? These become more important when a query on PARTS is a subquery in or part of a much larger, complex query.

(note : I've substituted "geq" and "leq" for ">=" and "<=" in the Predicate information section of the explain plan for the Range Scan queries so that the ">" and "<" don't get misinterpreted)

SQL>
SQL> set pages600
SQL> set linesize132
SQL> set SQLPrompt ''

set feedback off

-- create the PARTS table
drop table parts;
create table parts (part_id number, part_name varchar2(128), part_descr varchar2(128)) nologging;
insert /*+ APPEND */ into parts
2 select object_id, object_name'_'rownum, owner'_'object_name'_'object_type
3 from dba_objects where object_id is not null
4 -- deliberately order by part_id to get good clustering ?
5 order by object_id ;
commit;

-- currently the column is NULLable, let's create a Unique Index
create unique index parts_uk on parts(part_id) nologging;

alter system flush buffer_cache;
-- force any delayed block cleanout and load blocks into sga. This isn't important to the Optimizer in 10g.
select /*+ FULL (p) */ count(*) from parts p;

COUNT(*)
----------
51750
select /*+ INDEX (p parts_uk) */ count(*) from parts p where part_id is not null;

COUNT(*)
----------
51750


exec dbms_stats.gather_table_stats('',tabname=>'PARTS',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);

select blocks, num_rows, avg_row_len from user_tables where table_name = 'PARTS';

BLOCKS NUM_ROWS AVG_ROW_LEN
---------- ---------- -----------
581 51750 73
select blevel, leaf_blocks, num_rows, clustering_factor from user_indexes where index_name = 'PARTS_UK';

BLEVEL LEAF_BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ---------- -----------------
1 107 51750 563

-- begin analysis . What are Oracle's COST and CARD for this table ?
-- the impact is to be considered if these queries were subqueries in more detailed/complex queries

-- first case : All PART_IDs, ordered
explain plan for select part_id from parts order by part_id;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3769467330

------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
------------------------------------------------------------------------------------
0 SELECT STATEMENT 51750 252K 297 (4) 00:00:04
1 SORT ORDER BY 51750 252K 1240K 297 (4) 00:00:04
2 TABLE ACCESS FULL PARTS 51750 252K 131 (2) 00:00:02
------------------------------------------------------------------------------------
rollback;

-- second case : All PART_IDs, order doesn't matter
explain plan for select part_id from parts;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3931018009

---------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
---------------------------------------------------------------------------
0 SELECT STATEMENT 51750 252K 131 (2) 00:00:02
1 TABLE ACCESS FULL PARTS 51750 252K 131 (2) 00:00:02
---------------------------------------------------------------------------
rollback;

-- third case : Single PART_ID
explain plan for select part_id from parts where part_id=20001;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2169424942

------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------
0 SELECT STATEMENT 1 5 1 (0) 00:00:01
* 1 INDEX UNIQUE SCAN PARTS_UK 1 5 1 (0) 00:00:01
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("PART_ID"=20001)
rollback;

-- third case : Range of PART_IDs
explain plan for select part_id from parts where part_id between 25000 and 25050;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2044929039

-----------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
-----------------------------------------------------------------------------
0 SELECT STATEMENT 47 235 2 (0) 00:00:01
* 1 INDEX RANGE SCAN PARTS_UK 47 235 2 (0) 00:00:01
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("PART_ID"geq25000 AND "PART_ID"leq25050)
rollback;


REM REM REM !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-- We now set PART_ID to be NOT NULL
REM
ALTER TABLE PARTS MODIFY (PART_ID NOT NULL);

-- We re-run the test queries

-- first case : All PART_IDs, ordered
explain plan for select part_id from parts order by part_id;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1189395249

-----------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
-----------------------------------------------------------------------------
0 SELECT STATEMENT 51750 252K 110 (2) 00:00:02
1 INDEX FULL SCAN PARTS_UK 51750 252K 110 (2) 00:00:02
-----------------------------------------------------------------------------
rollback;

-- second case : All PART_IDs, order doesn't matter
explain plan for select part_id from parts;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2423297136

---------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
---------------------------------------------------------------------------------
0 SELECT STATEMENT 51750 252K 26 (4) 00:00:01
1 INDEX FAST FULL SCAN PARTS_UK 51750 252K 26 (4) 00:00:01
---------------------------------------------------------------------------------
rollback;

-- third case : Single PART_ID
explain plan for select part_id from parts where part_id=20001;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2169424942

------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------
0 SELECT STATEMENT 1 5 1 (0) 00:00:01
* 1 INDEX UNIQUE SCAN PARTS_UK 1 5 1 (0) 00:00:01
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("PART_ID"=20001)
rollback;

-- third case : Range of PART_IDs
explain plan for select part_id from parts where part_id between 25000 and 25050;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2044929039

-----------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
-----------------------------------------------------------------------------
0 SELECT STATEMENT 47 235 2 (0) 00:00:01
* 1 INDEX RANGE SCAN PARTS_UK 47 235 2 (0) 00:00:01
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("PART_ID"geq25000 AND "PART_ID"leq25050)
rollback;





While the column (PART_ID) was not a NOT NULL, Oracle would choose a
FullTableScan for queries attempting to retrieve all the rows from the table.
The ORDER BY in the first query also adds to the COST.
Queries for a single value (20001) or a range of values (25000 to 25050) would be executed by Unique Scan and Range Scan respectively, as we expect.


What happens when the column is defined as a NOT NULL ? This, being a DDL,
invalidates parsed statements so the next execution of the same SQLs requires re-parsing.
This time, the Optimizer DOES decide to use the Index on PART_ID for the first two queries. We also see that where the query specifies an ORDER BY, Oracle does an Index Full Scan -- it "walks" the Leaf Blocks from the lowest value to the highest value, retrieving all the values in order. Oracle can now avoid having to do a SORT. Where an ORDER BY is not required, Oracle does a faster Index Fast Full Scan.

We have learnt three things :
1. If a column is guaranteed to contain no NULLs, it is best to explicitly specify it as a NOT NULL so that the Optimizer can take a "better informed" decision on the usability of an index on it.
2. When Oracle percieves a lower cost to an Index access, this can have a greater impact on the execution plan of a larger query which the table and column under consideration may only be a part of.
3. An ordered retrieval of rows can be executed by an Index Full Scan.

What I have not done in this test case is to force a fetch from table rows. The query can be satisfied by the Index alone, in this case. In the real world, too, we do come across and/or can create such indexes where a query or subquery is satisfied by an Index alone.

14 April, 2008

The Worst Ever SQL Rewrite


What is the worst ever re-write you've done in attempting to tune an SQL statement ?

This is what I did a few weeks ago. I took a complicated (it had 3 correlated complex sub-queries) of the form :



SELECT A.column_1
, A.column_2
...
FROM table_A A
,table_P P
...
WHERE A.column_1=E.column_1
...
AND D.column_15 = (
SELECT /*+ NO_MERGE */ MAX(column_25)
FROM
....
.. 71 line SQL statement

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 888888888

------------------------------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
------------------------------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 1 337 47M (1)159:03:36
* 1 FILTER
* 2 HASH JOIN RIGHT OUTER 48M 15G 177K (1) 00:35:26
3 INDEX FAST FULL SCAN A_TABLE 2687 67175 5 (0) 00:00:01


so you can see that it had already been hinted during the course of a tuning exercise ...

and I transformed it with



SELECT /*+ ORDERED */ A.column_1
...
-- FROM table_A A
-- , table_P P
FROM
table_E E
, table_D D
... 74lines

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 999999999

------------------------------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
------------------------------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 1 337 250T (2)999:59:59
* 1 FILTER
* 2 HASH JOIN RIGHT OUTER 48M 15G 250T (2)999:59:59
3 INDEX FAST FULL SCAN A_TABLE 2687 67175 5 (0) 00:00:01




I managed to increase the "COST" from 47million to 250trillion.
Oracle's projected query runtime went up from 159hours to to in excess of 1000 hours. !

And my mistake ? Wrongly ordering the tables in the FROM clause after putting in an ORDERED Hint ! Very fortunately, I hadn't run the query else I would have waited too long before realising that I had forced Oracle into an atrocious execution plan.

Moral of the story ? Hints do work as Directives ! Be careful with Hints.

Complex View Merging -- 7

My final test. What happens if I disable Cost Based Transformation ?


SQL> alter session set "_optimizer_cost_based_transformation"=OFF;

Session altered.

SQL>
SQL> select /* disabled cost_based_trasnfm */ m.author, m.book_name from my_fat_book_list m
2 where
3 m.published_date
4 = (select max(l.published_date)
5 from library_fat_book_list l
6 where l.author=m.author
7 and l.book_name=m.book_name)
8 order by 1,2
9 /
(output deleted)
123 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1796690992

-------------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
-------------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 101K 7666K 3315 (2) 00:00:40
1 MERGE JOIN 101K 7666K 3315 (2) 00:00:40
2 SORT JOIN 101K 3982K 10M 3310 (2) 00:00:40
3 VIEW VW_SQ_1 101K 3982K 2250 (2) 00:00:28
4 HASH GROUP BY 101K 6969K 16M 2250 (2) 00:00:28
5 TABLE ACCESS FULL LIBRARY_FAT_BOOK_LIST 101K 6969K 556 (2) 00:00:07
* 6 SORT JOIN 260 9620 5 (20) 00:00:01
7 TABLE ACCESS FULL MY_FAT_BOOK_LIST 260 9620 4 (0) 00:00:01
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("AUTHOR"="M"."AUTHOR" AND "BOOK_NAME"="M"."BOOK_NAME" AND
"M"."PUBLISHED_DATE"="VW_COL_1")
filter("BOOK_NAME"="M"."BOOK_NAME" AND "AUTHOR"="M"."AUTHOR" AND
"M"."PUBLISHED_DATE"="VW_COL_1")


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2470 consistent gets
0 physical reads
0 redo size
5107 bytes sent via SQL*Net to client
483 bytes received via SQL*Net from client
10 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
123 rows processed

SQL>




Much more expensive. The 10.2 Cost Based Transformation WAS better.

12 April, 2008

Complex View Merging - 4,5,6

Continuing the tests that I did in postings 1 , 2 , 3 , I now use "fatter" tables, adding 128 bytes in the row size.

This is the Test Data Creation :

SQL> drop table library_fat_book_list;
SQL> drop table my_fat_book_list;
SQL> drop table friends_fat_book_list;
SQL>
SQL> create table library_fat_book_list as
2 select owner author, substr(object_name,1,30) book_name, rpad('a',128,'f') other_cols, trunc(created) published_date from dba_objects
3 union
4 select owner author, substr(object_name,1,30) book_name, rpad('a',128,'f') other_cols, trunc(created)+object_id published_date from dba_objects
5 /
SQL>
SQL> create table my_fat_book_list as
2 select owner author, substr(object_name,1,30) book_name, rpad('a',128,'f') other_cols, trunc(created) published_date from dba_objects where mod(object_id,400)=0
3 union
4 select owner author, substr(object_name,1,30) book_name, rpad('a',128,'f') other_cols, trunc(created)+object_id published_date from dba_objects where mod(object_id,400)=0
5 /
SQL>
SQL> create table friends_fat_book_list as
2 select owner author, substr(object_name,1,30) book_name, rpad('a',128,'f') other_cols, trunc(created) published_date from dba_objects where mod(object_id,80)=0
3 union
4 select owner author, substr(object_name,1,30) book_name, rpad('a',128,'f') other_cols, trunc(created)+object_id from dba_objects where mod(object_id,80)=0
5 /
SQL>
SQL>
SQL> create index library_bl_ndx on library_fat_book_list(author,book_name);
SQL> create index my_bl_ndx on my_fat_book_list(author,book_name);
SQL>
SQL> exec dbms_stats.gather_table_stats('','LIBRARY_FAT_BOOK_LIST',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);
SQL> exec dbms_stats.gather_table_stats('','MY_FAT_BOOK_LIST',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);
SQL> exec dbms_stats.gather_table_stats('','FRIENDS_FAT_BOOK_LIST',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);
SQL>
SQL> select count(*) from library_fat_book_list;

COUNT(*)
----------
101968
SQL> select count(*) from my_fat_book_list;

COUNT(*)
----------
260
SQL> select count(*) from friends_fat_book_list;

COUNT(*)
----------
1296
SQL> select table_name, blocks, num_rows from user_tables where table_name in ('LIBRARY_FAT_BOOK_LIST','MY_FAT_BOOK_LIST','FRIENDS_FAT_BOOK_LIST') order by 1;

TABLE_NAME BLOCKS NUM_ROWS
------------------------------ ---------- ----------
FRIENDS_FAT_BOOK_LIST 36 1296
LIBRARY_FAT_BOOK_LIST 2502 101968
MY_FAT_BOOK_LIST 10 260
SQL>


Notice that the Row Count is almost exactly the same as in the first two test. However,
the Block Count is 4 times as much as earlier (see the first posting).

So, what do I see for MY_FAT_BOOK_LIST vis-a-vis MY_BOOK_LIST ?


SQL> select m.author, m.book_name from my_fat_book_list m
2 where
3 m.published_date
4 = (select max(l.published_date)
5 from library_fat_book_list l
6 where l.author=m.author
7 and l.book_name=m.book_name)
8 order by 1,2
9 /
(deleting the output)
123 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4124939080

--------------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 101K 3684K 526 (1) 00:00:07
1 SORT ORDER BY 101K 3684K 526 (1) 00:00:07
* 2 FILTER
3 TABLE ACCESS FULL MY_FAT_BOOK_LIST 260 9620 4 (0) 00:00:01
4 SORT AGGREGATE 1 39
5 TABLE ACCESS BY INDEX ROWID LIBRARY_FAT_BOOK_LIST 1 39 4 (0) 00:00:01
* 6 INDEX RANGE SCAN LIBRARY_BL_NDX 2 3 (0) 00:00:01
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("M"."PUBLISHED_DATE"= (SELECT MAX("L"."PUBLISHED_DATE") FROM
"LIBRARY_FAT_BOOK_LIST" "L" WHERE "L"."BOOK_NAME"=:B1 AND "L"."AUTHOR"=:B2))
6 - access("L"."AUTHOR"=:B1 AND "L"."BOOK_NAME"=:B2)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
405 consistent gets
0 physical reads
0 redo size
4218 bytes sent via SQL*Net to client
406 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
123 rows processed



Now, that is surprising ! The Cardinality Estimate for MY_FAT_BOOK_LIST is the same
as was for MY_BOOK_LIST. But Oracle now decides to use the FILTER operation against the subquery and *not* merge the subquery ! It's estimate of the resulting rowset count (101K) is way off ! Also, note the expected Byte count against that for the MY_BOOK_LIST query. Expanding the row size has made a significant difference.

And what happens with an explicit NO_MERGE Hint ?


SQL> select m.author, m.book_name from my_fat_book_list m
2 where
3 m.published_date
4 = (select /*+ NO_MERGE */ max(l.published_date)
5 from library_fat_book_list l
6 where l.author=m.author
7 and l.book_name=m.book_name)
8 order by 1,2
9 /
(deleting the output
123 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4124939080

--------------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 101K 3684K 526 (1) 00:00:07
1 SORT ORDER BY 101K 3684K 526 (1) 00:00:07
* 2 FILTER
3 TABLE ACCESS FULL MY_FAT_BOOK_LIST 260 9620 4 (0) 00:00:01
4 SORT AGGREGATE 1 39
5 TABLE ACCESS BY INDEX ROWID LIBRARY_FAT_BOOK_LIST 1 39 4 (0) 00:00:01
* 6 INDEX RANGE SCAN LIBRARY_BL_NDX 2 3 (0) 00:00:01
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("M"."PUBLISHED_DATE"= (SELECT /*+ NO_MERGE */ MAX("L"."PUBLISHED_DATE") FROM
"LIBRARY_FAT_BOOK_LIST" "L" WHERE "L"."BOOK_NAME"=:B1 AND "L"."AUTHOR"=:B2))
6 - access("L"."AUTHOR"=:B1 AND "L"."BOOK_NAME"=:B2)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
405 consistent gets
0 physical reads
0 redo size
4218 bytes sent via SQL*Net to client
406 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
123 rows processed

SQL>
)


Yes, it is the same. So, Oracle had chosen NOT to merge the view even if I didn't provide
the Hint ! (could this be the "cost based transformation" that 9i doesn't do ?)


What happens with FRIENDS_FAT_BOOK_LIST ? How does that fare against FRIENDS_BOOK_LIST ?

Let's see :

SQL> select f.author, f.book_name from friends_fat_book_list f
2 where
3 f.published_date
4 = (select max(l.published_date)
5 from library_fat_book_list l
6 where l.author=f.author
7 and l.book_name=f.book_name)
8 order by 1,2
9 /
(deleting output)
628 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 700107581

----------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
----------------------------------------------------------------------------------------------
0 SELECT STATEMENT 2 240 569 (2) 00:00:07
* 1 FILTER
2 SORT GROUP BY 2 240 569 (2) 00:00:07
* 3 HASH JOIN 1296 151K 568 (2) 00:00:07
4 TABLE ACCESS FULL FRIENDS_FAT_BOOK_LIST 1296 64800 10 (0) 00:00:01
5 TABLE ACCESS FULL LIBRARY_FAT_BOOK_LIST 101K 6970K 556 (2) 00:00:07
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("F"."PUBLISHED_DATE"=MAX("L"."PUBLISHED_DATE"))
3 - access("L"."AUTHOR"="F"."AUTHOR" AND "L"."BOOK_NAME"="F"."BOOK_NAME")


Statistics
----------------------------------------------------------
24 recursive calls
0 db block gets
2496 consistent gets
0 physical reads
0 redo size
19720 bytes sent via SQL*Net to client
461 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
628 rows processed

SQL>




Hmm ! This is similar to FRIENDS_BOOK_LIST. Oracle *did* decide to merge the subquery
and use a Hash Join !

Let's see the NO_MERGE against MY_FRIENDS_BOOK_LIST.


SQL> select f.author, f.book_name from friends_fat_book_list f
2 where
3 f.published_date
4 = (select /*+ NO_MERGE */ max(l.published_date)
5 from library_fat_book_list l
6 where l.author=f.author
7 and l.book_name=f.book_name)
8 order by 1,2
9 /
(output deleted)
628 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1487416122

--------------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 101K 3783K 2607 (1) 00:00:32
1 SORT ORDER BY 101K 3783K 2607 (1) 00:00:32
* 2 FILTER
3 TABLE ACCESS FULL FRIENDS_FAT_BOOK_LIST 1296 49248 10 (0) 00:00:01
4 SORT AGGREGATE 1 39
5 TABLE ACCESS BY INDEX ROWID LIBRARY_FAT_BOOK_LIST 1 39 4 (0) 00:00:01
* 6 INDEX RANGE SCAN LIBRARY_BL_NDX 2 3 (0) 00:00:01
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("F"."PUBLISHED_DATE"= (SELECT /*+ NO_MERGE */ MAX("L"."PUBLISHED_DATE") FROM
"LIBRARY_FAT_BOOK_LIST" "L" WHERE "L"."BOOK_NAME"=:B1 AND "L"."AUTHOR"=:B2))
6 - access("L"."AUTHOR"=:B1 AND "L"."BOOK_NAME"=:B2)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2010 consistent gets
0 physical reads
0 redo size
19720 bytes sent via SQL*Net to client
461 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
628 rows processed

SQL>




The expected Cardinality and Cost have changed, along with Byte count.

08 April, 2008

Complex View Merging -- 3

Continuing the testing of Complex View Merging, this time with the "larger" FRIENDS_BOOK_LIST Table :



SQL> REM --------------------------------------------------------------------------------------------------------------
> REM REM REM ******* With Complex View Merging ******
SQL>
SQL> select f.author, f.book_name from friends_book_list f
2 where
3 f.published_date
4 = (select max(l.published_date)
5 from library_book_list l
6 where l.author=f.author
7 and l.book_name=f.book_name)
8 order by 1,2
9 /
(deleted output)

628 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2027338862

------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------------------
0 SELECT STATEMENT 2 148 151 (6) 00:00:02
* 1 FILTER
2 SORT GROUP BY 2 148 151 (6) 00:00:02
* 3 HASH JOIN 1294 95756 149 (5) 00:00:02
4 TABLE ACCESS FULL FRIENDS_BOOK_LIST 1294 47878 4 (0) 00:00:01
5 TABLE ACCESS FULL LIBRARY_BOOK_LIST 101K 3685K 143 (3) 00:00:02
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("F"."PUBLISHED_DATE"=MAX("L"."PUBLISHED_DATE"))
3 - access("L"."AUTHOR"="F"."AUTHOR" AND "L"."BOOK_NAME"="F"."BOOK_NAME")


Statistics
----------------------------------------------------------
24 recursive calls
0 db block gets
625 consistent gets
0 physical reads
0 redo size
19721 bytes sent via SQL*Net to client
461 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
628 rows processed




Oracle has chose to Join the two tables (it has "merged" the
subquery into the parent query).

So let's see what happens when I use the NO_MERGE Hint :



SQL> REM --------------------------------------------------------------------------------------------------------------
> REM REM REM ******* WITHOUT Complex View Merging ******
SQL>
SQL> select f.author, f.book_name from friends_book_list f
2 where
3 f.published_date
4 = (select /*+ NO_MERGE */ max(l.published_date)
5 from library_book_list l
6 where l.author=f.author
7 and l.book_name=f.book_name)
8 order by 1,2
9 /
(deleted output)

628 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 258794817

----------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
----------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 101K 3685K 2596 (1) 00:00:32
1 SORT ORDER BY 101K 3685K 2596 (1) 00:00:32
* 2 FILTER
3 TABLE ACCESS FULL FRIENDS_BOOK_LIST 1294 47878 4 (0) 00:00:01
4 SORT AGGREGATE 1 37
5 TABLE ACCESS BY INDEX ROWID LIBRARY_BOOK_LIST 1 37 4 (0) 00:00:01
* 6 INDEX RANGE SCAN LIBRARY_BL_NDX 2 3 (0) 00:00:01
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("F"."PUBLISHED_DATE"= (SELECT /*+ NO_MERGE */ MAX("L"."PUBLISHED_DATE") FROM
"LIBRARY_BOOK_LIST" "L" WHERE "L"."BOOK_NAME"=:B1 AND "L"."AUTHOR"=:B2))
6 - access("L"."AUTHOR"=:B1 AND "L"."BOOK_NAME"=:B2)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1966 consistent gets
0 physical reads
0 redo size
19721 bytes sent via SQL*Net to client
461 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
628 rows processed

SQL>




Aah ! Something more complicated. Oracle has instantiated the subquery as a View.
However, this time we see that the NO_MERGE that we forced Oracle to do has significantly increased the total effort. The "COST" has gone up from 151 to 2,596 but, more importantly, the "consistent gets" is up from 625 to 1,966.

So, with a larger FRIENDS_BOOK_LIST table, Oracle was actually better off doing Complex View Merging !
There is no "hard-and-fast" rule as to whether you should have Complex View Merging enabled or not. Oracle 10g does evaluate costs when choosing between a Merged and an Instantiated representation of the Aggregation subquery ("complex view").

In the next few postings, I will explore a "fatter" table (more columns increasing the number of table blocks without increasing the number of rows at all) and also see if I can figure out how the cost based transformation works.

07 April, 2008

Complex View Merging -- 2

After having described the test data , here are the results of my first test :

[on 10.2.0.3 on 32-bit Windows (8KB Block Size) ]

Querying for My Private Library (fewer books) against the Public Library, with Complex View Merging enabled by default, I see that the tables MY_BOOK_LIST and LIBRARY_BOOK_LIST are joined. The implicit view in the Subquery that does gets the max(published_date) has actually been merged into the parent query.



SQL> REM --------------------------------------------------------------------------------------------------------------
> REM REM REM ******* With Complex View Merging ******
SQL>
SQL>
SQL>
SQL>
SQL> select m.author, m.book_name from my_book_list m
2 where
3 m.published_date
4 = (select max(l.published_date)
5 from library_book_list l
6 where l.author=m.author
7 and l.book_name=m.book_name)
8 order by 1,2
9 /

(deleting the output)

123 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2352755801

------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------------------
0 SELECT STATEMENT 2 146 149 (5) 00:00:02
* 1 FILTER
2 SORT GROUP BY 2 146 149 (5) 00:00:02
* 3 HASH JOIN 260 18980 148 (5) 00:00:02
4 TABLE ACCESS FULL MY_BOOK_LIST 260 9360 3 (0) 00:00:01
5 TABLE ACCESS FULL LIBRARY_BOOK_LIST 101K 3685K 143 (3) 00:00:02
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("M"."PUBLISHED_DATE"=MAX("L"."PUBLISHED_DATE"))
3 - access("L"."AUTHOR"="M"."AUTHOR" AND "L"."BOOK_NAME"="M"."BOOK_NAME")


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
616 consistent gets
0 physical reads
0 redo size
4218 bytes sent via SQL*Net to client
406 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
123 rows processed





Note that Oracle expected to retrieve only 2 rows at a "Cost" of 149.
This execution plan did 616 logical reads.

What happens if I disable Complex_View_Merging by using the "NO_MERGE" Hint ?

This is what I get :


SQL> REM --------------------------------------------------------------------------------------------------------------
> REM REM REM ******* WITHOUT Complex View Merging ******
SQL>
SQL>
SQL> select m.author, m.book_name from my_book_list m
2 where
3 m.published_date
4 = (select /*+ NO_MERGE */ max(l.published_date)
5 from library_book_list l
6 where l.author=m.author
7 and l.book_name=m.book_name)
8 order by 1,2
9 /

(deleting the output)
123 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3701741988

----------------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
----------------------------------------------------------------------------------------------------
0 SELECT STATEMENT 101K 3585K 525 (1) 00:00:07
1 SORT ORDER BY 101K 3585K 525 (1) 00:00:07
* 2 FILTER
3 TABLE ACCESS FULL MY_BOOK_LIST 260 9360 3 (0) 00:00:01
4 SORT AGGREGATE 1 37
5 TABLE ACCESS BY INDEX ROWID LIBRARY_BOOK_LIST 1 37 4 (0) 00:00:01
* 6 INDEX RANGE SCAN LIBRARY_BL_NDX 2 3 (0) 00:00:01
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("M"."PUBLISHED_DATE"= (SELECT /*+ NO_MERGE */ MAX("L"."PUBLISHED_DATE") FROM
"LIBRARY_BOOK_LIST" "L" WHERE "L"."BOOK_NAME"=:B1 AND "L"."AUTHOR"=:B2))
6 - access("L"."AUTHOR"=:B1 AND "L"."BOOK_NAME"=:B2)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
396 consistent gets
0 physical reads
0 redo size
4218 bytes sent via SQL*Net to client
406 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
123 rows processed

SQL>



What we now see is that Oracle had executed the subquery (operations 4 to 6) and passed
AUTHOR and BOOK_NAME from each row fetched in the outer query (the FILTER
operation at Operation 2 from the Full Table Scan of MY_BOOK_LIST at Operation 3).
Thus, for each row retrieved from MY_BOOK_LIST, Oracle did an Indexed Range Scan
of LIBRARY_BOOK_LIST. Surprisingly, with this Execution Plan, Oracle thought it would retrieve 101 thousand rows at a "Cost" of 525.
It is no surpise, then, that the default behaviour of Complex View Merging which Oracle
thought had a lower cost (149 against 525) was chosen. However, if we look at the count
of logical reads in "consistent gets", the first plan was actually more "expensive" in that
it read 616 blocks while the second plan read 396 blocks.

My next pair of tests will be using the larger "FRIENDS_BOOK_LIST" table to drive
the subquery.

Complex View Merging -- 1

Complex View Merging occurs in the Transformation phase -- before the actual Optimization which is the choosing of Access Paths.
In Oracle 9i, if Oracle can do a Transformation it will do so, without considering the costs. That is, costs are not considered before the Transformation.
In Oracle 10g, the optimizer will attempt to work out the costs before deciding on a Transformed version.

Thus, where a query uses a subquery (or a correlated subquery) with an Aggregation function, Oracle can determine that the subquery is a complex view and choose to either execute the joins having merged the subquery (aka "Complex View Merging") or treat the subquery as a seperate View and then execute the joins.

In my next few blog postings, I am going to take up a simulated case of Libraries : One Public Library holding very many books and two private libraries of different sizes and the SQL queries will attempt to determine the latest published books (from the "master" list in the Public Library) also present in the two private libraries.

The "Aggregation" here would be the max(published_date) operation.

Thus, this is the test data :


SQL> rem We have three tables
SQL> rem LIBRARY_BOOK_list is the list of books in the local library
SQL> rem MY_BOOK_list is the books that I own and have at home
SQL> rem FRIENDS_BOOK_list is the much larger private collection of a friend I envy
SQL> rem There are 2 published_dates for each book
SQL>
SQL> rem My first query is to identify the latest library books that I, too, have in my private collection
SQL> rem The second query is the same for my friend's collection
SQL>
SQL>
SQL>
SQL> drop table library_book_list;
SQL> drop table my_book_list;
SQL> drop table friends_book_list;
SQL>
SQL> create table library_book_list as
2 select owner author, substr(object_name,1,30) book_name, trunc(created) published_date from dba_objects
3 union
4 select owner author, substr(object_name,1,30) book_name, trunc(created)+object_id published_date from dba_objects
5 /
SQL>
SQL> create table my_book_list as
2 select owner author, substr(object_name,1,30) book_name, trunc(created) published_date from dba_objects where mod(object_id,400)=0
3 union
4 select owner author, substr(object_name,1,30) book_name, trunc(created)+object_id published_date from dba_objects where mod(object_id,400)=0
5 /
SQL>
SQL> create table friends_book_list as
2 select owner author, substr(object_name,1,30) book_name, trunc(created) published_date from dba_objects where mod(object_id,80)=0
3 union
4 select owner author, substr(object_name,1,30) book_name, trunc(created)+object_id from dba_objects where mod(object_id,80)=0
5 /
SQL>
SQL>
SQL> create index library_bl_ndx on library_book_list(author,book_name);
SQL> create index my_bl_ndx on my_book_list(author,book_name);
SQL>
SQL> exec dbms_stats.gather_table_stats('','LIBRARY_BOOK_LIST',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);
SQL> exec dbms_stats.gather_table_stats('','MY_BOOK_LIST',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);
SQL> exec dbms_stats.gather_table_stats('','FRIENDS_BOOK_LIST',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);
SQL>
SQL> select count(*) from library_book_list;

COUNT(*)
----------
101986
SQL> select count(*) from my_book_list;

COUNT(*)
----------
260
SQL> select count(*) from friends_book_list;

COUNT(*)
----------
1294
SQL> select table_name, blocks, num_rows from user_tables where table_name in ('LIBRARY_BOOK_LIST','MY_BOOK_LIST','FRIENDS_BOOK_LIST') order by 1;

TABLE_NAME BLOCKS NUM_ROWS
------------------------------ ---------- ----------
FRIENDS_BOOK_LIST 11 1294
LIBRARY_BOOK_LIST 626 101986
MY_BOOK_LIST 5 260
SQL>





and the test results on 10.2.0.3 on 32-bit Windows (8KB Block Size) are published in the next few posts.