21 September, 2011

An Index that is a "subset" of a pre-existing Index

It is a common [mis]understanding that creating a new Index that is a subset of an existing Index is an unnecessary overhead. The argument goes "if you have an Index on (A,B,C) this Index can be used to satisfy queries where predicate 'A' is specified, so creating an additional Index on (A) alone is an unnecessary overhead".
It is true that the the Index could be used, but the Cost Based Optimizer also computes a "COST" for using the Index. A composite Index is larger (in terms of Leaf Blocks to navigate) then a single-column Index. Similarly, at times, the composite Index becomes so large that many queries are executed as FullTableScans !

This question arose in a forums discussion today on how a CREATE INDEX an be speeded up. The CREATion of a Index on (A) can actually read the existing Index on (A,B,C) without having to do a FullTableScan. Furthermore, the new Index on (A) can satisfy queries on the single predicate better than the existing Index.

Here I demonstrate both cases :
1. A CREATE INDEX reading an existing Index and avoiding a FullTableScan
2. An Index that is a "subset" of an existing Index being useful

I first setup the "demo" table
SQL> drop table objects_list purge;

Table dropped.

SQL> -- assume a "largeish" table -- the Remarks column makes the row length greater
SQL> create table objects_list as
2 select owner, object_name, subobject_name, object_type, created, status,
3 dbms_random.string('X',200) as Remarks
4 from dba_objects
5 /

Table created.

SQL>
SQL> -- in the absence of object_id, these columns constitute the Unique Index
SQL> create unique index objects_list_u1 on objects_list (owner, object_name, subobject_name, object_type);

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','OBJECTS_LIST',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL> select blocks, num_rows from user_tables where table_name = 'OBJECTS_LIST';

BLOCKS NUM_ROWS
---------- ----------
2867 76955

SQL>


Now, I verify if I can use the composite Index to execute a query on the leading column alone :
SQL> --- we should be able to use the Unique Index when querying for the leading column alone
SQL> explain plan for
2 select owner, count(*) from objects_list where owner like 'O%' group by owner order by owner;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 544682346

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 39 (0)| 00:00:01 |
| 1 | SORT GROUP BY NOSORT| | 3 | 21 | 39 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | OBJECTS_LIST_U1 | 5141 | 35987 | 39 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER" LIKE 'O%')
filter("OWNER" LIKE 'O%')

15 rows selected.

SQL> select owner, count(*) from objects_list where owner like 'O%' group by owner order by owner;

OWNER COUNT(*)
------------------------------ ----------
OBE 75
OE 84
OE1 55
OLAPSYS 719
ORACLE_OCM 8
ORDDATA 248
ORDPLUGINS 10
ORDSYS 2532
OUTLN 9
OWBSYS 2
OWBSYS_AUDIT 12

11 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 544682346

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 39 (0)| 00:00:01 |
| 1 | SORT GROUP BY NOSORT| | 3 | 21 | 39 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | OBJECTS_LIST_U1 | 5141 | 35987 | 39 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER" LIKE 'O%')
filter("OWNER" LIKE 'O%')


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
30 consistent gets
0 physical reads
0 redo size
676 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed

SQL>

Yes. The composite Index is used even for the single column predicate "OWNER". The query does 30 block gets. (I've actually executed the query twice to eliminate recursive calls at the second execution).

What if I created an Index on "OWNER" alone ? It would be a "subset of the existing Index.
SQL> create index objects_list_n1 on objects_list (owner);

Index created.

SQL>
SQL> --- now Oracle can use the leaner index for the same query
SQL> --- check the "Cost"
SQL> explain plan for
2 select owner, count(*) from objects_list where owner like 'O%' group by owner order by owner;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1721948621

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 14 (0)| 00:00:01 |
| 1 | SORT GROUP BY NOSORT| | 3 | 21 | 14 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | OBJECTS_LIST_N1 | 5141 | 35987 | 14 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER" LIKE 'O%')
filter("OWNER" LIKE 'O%')

15 rows selected.

SQL>
SQL> select owner, count(*) from objects_list where owner like 'O%' group by owner order by owner;

OWNER COUNT(*)
------------------------------ ----------
OBE 75
OE 84
OE1 55
OLAPSYS 719
ORACLE_OCM 8
ORDDATA 248
ORDPLUGINS 10
ORDSYS 2532
OUTLN 9
OWBSYS 2
OWBSYS_AUDIT 12

11 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1721948621

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 14 (0)| 00:00:01 |
| 1 | SORT GROUP BY NOSORT| | 3 | 21 | 14 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | OBJECTS_LIST_N1 | 5141 | 35987 | 14 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER" LIKE 'O%')
filter("OWNER" LIKE 'O%')


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
676 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed

SQL>


The new Index is used ! It has a lower "COST" (14 for the new Index versus 39 for the composite Index) and doing fewer block gets (12 versus 30).

So, this new Index is better than the existing one for such queries.

How was the Index created ? I actually obtained a trace of the CREATE INDEX run.
LOCK TABLE "OBJECTS_LIST" IN SHARE MODE  NOWAIT

create index objects_list_n1 on objects_list (owner)


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 2 1 0
Execute 1 0.27 0.29 1 77000 714 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.27 0.29 1 77002 715 0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 187

Rows Row Source Operation
------- ---------------------------------------------------
1 INDEX BUILD NON UNIQUE OBJECTS_LIST_N1 (cr=77119 pr=1 pw=184 time=0 us)(object id 0)
76955 SORT CREATE INDEX (cr=76965 pr=1 pw=0 time=857870 us)
76955 INDEX FAST FULL SCAN OBJECTS_LIST_U1 (cr=76965 pr=1 pw=0 time=1649397 us)(object id 88291)
Thus the CREATE INDEX OBJECTS_LIST_N1 actually did an Index Fast Full Scan of the existing Index OBJECTS_LIST_U1 !

.
.
.



4 comments:

Anonymous said...

Hi Hemant. Interesting article. This means that CBO decides to make an index fast full scan on already created table instead querying the original table, because the first one will be faster

Thanks for sharing!

Facebook Apps said...
This comment has been removed by a blog administrator.
Brooke Kay said...

Marvelous work.Just wanted to drop a comment and say I am new to your blog and really like what I am reading.

robertson said...

Lighter index is always used in comparison to heavier index(like composite index with more columns). Example and Concept is 100% correct.

Not an suggestible option to have multiple indexes ( Composite/single indexes on same column) unless there are daily frequent queries having where clause on single column as well other queries where clause having clause on composite columns (consituting the single column too) with pretty tight response times in milliseconds.

Lets say, there are 22 crore records joining another table of 22 crores ( real time example)... We have 30 indexes of which 18 are lighter indexes (subset index of composite index)... Example: Subset means.. index on status field, composite means index on status and merchant_id and txn_date field.
A Settlement batch job inserts 3+ lakh records sequentially one after the other at end of day along with multiple checks with other referencing tables...

"Will this not lead to growth of all the 30 indexes on the table, hence slowening the INSERTS" ( 5 millisecond per row insert leads to 200+ millisecond per row insert, as all Indexes needs to grow ).

In this case, we prefer to have required composite indexes only ( 12 in our case instead of 30 ) for speedening up the INSERTS. We preferred to drop subset indexes, which speedened our INSERTS.. Approx...Job run time of 4.5 hrs reduces drastically to 2 hrs 45 mins.. ! My 2 cents.. ! Robertson Bhadrachalam