26 September, 2010

Data Skew changing over time --- and the Cardinality Estimate as well !

What happens if the nature of "data skew" changes over time ? For example, in the first few years of operations, you manufacture and sell 4 types of widgets ("A", "B", "C" and "D") where the volume of sales is approximately equally distributed amongst the 4 (Sales may be increasing over the years, but the distribution -- i.e. proportions -- remains the same). After say, 5 years, you add new widgets ("X", "Y" and "Z") to your product line. You notice that sales for "A" and "B" are now actually declining while sales for "Z" over the next 3 years are growing much faster than "C", "D", "X" and "Y".
What you have is changing data skew.
How does the optimizer estimate cardinality ?

Column Statistics used for a Histogram are at the *Table* level. A Histogram of the frequency of occurrences of these widgets in your sales would be computed merely as the number of occurrences across the table -- without accounting for dates. At the end of year 4, Oracle may say that "A" accounted for 25% of sales. At the end of year 8, "A" may now account for only 5% of total sales across all 8 years. And that is why the histogram would say. However, the histogram would not now (at the end of 8 years) say that "A" still accounts for 25% of sales in the first 4 years ! The information content is now "normalised" -- the metadata loses some vital information. (Of course, actually querying the table still shows you that "A" accounts for 25% of sales in the first 4 years, but the Optimizer's statistics can no longer reflect this knowledge). The Optimizer's cardinality estimates after 8 years are actually "weaker" because it has "normalised" (or "averaged out") the skew.


I've takan the "SH" schema's SALES table (this is available as one of the EXAMPLE schemas) from an Oracle 10.2 installation to demonstrate this. I take the example of 3.5inch diskettes whose sales decline. To simplify the case, I count the number of Sales Orders, not the quantities across all orders (for example, we could assume that each Sales Order is for exactly 1000 diskettes).

First, I start with SALES information for the years 1998 to 2002 :
SQL> exec dbms_stats.gather_table_stats('','SALES',estimate_percent=>100,granularity=>'ALL',degree=>4,-
> method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

PL/SQL procedure successfully completed.

SQL>
SQL> -- What is PROD_ID 31 ?
SQL> select prod_id, prod_name
  2  from products
  3  where prod_id=31
  4  /

   PROD_ID PROD_NAME
---------- --------------------------------------------------
        31 1.44MB External 3.5" Diskette

SQL>
SQL> -- What are the range of dates in the Sales (history) table ?
SQL> select min(time_id), max(time_id)
  2  from times
  3  /

MIN(TIME_ MAX(TIME_
--------- ---------
01-JAN-98 31-DEC-02

SQL>
SQL> -- What are the total number of Sales records by year ?
SQL> select /*+ FULL(s) PARALLEL (s 4) */ to_char(time_id,'YYYY'), count(*)
  2  from sales s
  3  group by to_char(time_id,'YYYY')
  4  order by 1
  5  /

TO_C   COUNT(*)
---- ----------
1998     178834
1999     247945
2000     232646
2001     259418

SQL>
SQL> -- How many sales exist for the range 01-Jan-1998 to 31-Dec-2001 ?
SQL> select count(*)
  2  from sales
  3  where time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  4  /

  COUNT(*)
----------
    918843

SQL>
SQL> -- How many sales of 3.5" diskettes were made in the range 01-Jan-1998 to 31-Dec-2001 ?
SQL> select prod_id, to_char(time_id,'YYYY'), count(*)
  2  from sales
  3  where
  4  prod_id=31
  5  and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  group by prod_id,to_char(time_id,'YYYY')
  7  order by 1,2
  8  /

   PROD_ID TO_C   COUNT(*)
---------- ---- ----------
        31 1998       6602
        31 1999       6586
        31 2000       7568
        31 2001       2352

SQL>
SQL> -- How many sales 3.5" diskettes were made in  2002 ?
SQL> select prod_id, to_char(time_id,'YYYY'), count(*)
  2  from sales
  3  where
  4  prod_id=31
  5  and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  6  group by prod_id,to_char(time_id,'YYYY')
  7  order by 1,2
  8  /

no rows selected

SQL> 

We know that gross Sales (for all products) have (generally) been increasing over the years -- from 179thousand orders in 1998 to 259thousand in 2001 (with a slight dip in the year 2000).
However, sales of 3.5" diskettes declined in 2001.
Sales for 2002 are not recorded.

We look at the Optimizer's cardinality estimates based on available statistics :
SQL> -- Verify the Cardinality Estimates by year
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-1999','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  4478 | 53736 |    22  (10)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  4478 | 53736 |    22  (10)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  4478 | 53736 |     6   (0)| 00:00:01 |     4 |     8 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  4478 | 53736 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     4 |     8 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  4478 | 53736 |    15   (7)| 00:00:01 |     4 |     8 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  4478 | 53736 |    15   (7)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     4 |     8 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              1999-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  6275 | 75300 |    27   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  6275 | 75300 |    27   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  6275 | 75300 |     6   (0)| 00:00:01 |     8 |    12 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  6275 | 75300 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     8 |    12 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  6275 | 75300 |    20   (5)| 00:00:01 |     8 |    12 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  6275 | 75300 |    20   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     8 |    12 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5859 | 70308 |    25   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5859 | 70308 |    25   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5859 | 70308 |     6   (0)| 00:00:01 |    12 |    16 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5859 | 70308 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    12 |    16 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5859 | 70308 |    18   (6)| 00:00:01 |    12 |    16 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5859 | 70308 |    18   (6)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    12 |    16 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  6528 | 78336 |    27   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  6528 | 78336 |    27   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  6528 | 78336 |     6   (0)| 00:00:01 |    16 |    20 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  6528 | 78336 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    16 |    20 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  6528 | 78336 |    20   (5)| 00:00:01 |    16 |    20 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  6528 | 78336 |    20   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    16 |    20 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1295169200

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |    16 |   192 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR    |                |    16 |   192 |     4   (0)| 00:00:01 |    20 |    24 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                |    16 |   192 |     4   (0)| 00:00:01 |       |       |
|   3 |    BITMAP AND                |                |       |       |            |          |       |       |
|   4 |     BITMAP MERGE             |                |       |       |            |          |       |       |
|*  5 |      BITMAP INDEX RANGE SCAN | SALES_TIME_BIX |       |       |            |          |    20 |    24 |
|*  6 |     BITMAP INDEX SINGLE VALUE| SALES_PROD_BIX |       |       |            |          |    20 |    24 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   6 - access("PROD_ID"=31)

20 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2002','DD-MON-YYYY') and time_id < to_date('01-JAN-2004','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2294783259

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                |    16 |   192 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR          |                |    16 |   192 |     2   (0)| 00:00:01 |    24 |    28 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID| SALES          |    16 |   192 |     2   (0)| 00:00:01 |    24 |    28 |
|   3 |    BITMAP CONVERSION TO ROWIDS     |                |       |       |            |          |       |       |
|*  4 |     BITMAP INDEX RANGE SCAN        | SALES_TIME_BIX |       |       |            |          |    24 |    28 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PROD_ID"=31)
   4 - access("TIME_ID">TO_DATE(' 2002-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

18 rows selected.
SQL>
SQL> 


We can see that the estimates for PROD_ID=31 do not match reality. Particularly for 2001.

Let's now create 2002 data :
SQL> -- let's create  2002 sales data
SQL> alter index sales_time_bix modify partition sales_q1_2002 unusable;

Index altered.

SQL> alter index sales_time_bix modify partition sales_q2_2002 unusable;

Index altered.

SQL> alter index sales_time_bix modify partition sales_q3_2002 unusable;

Index altered.

SQL> alter index sales_time_bix modify partition sales_q4_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q1_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q2_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q3_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q4_2002 unusable;

Index altered.

SQL>
SQL>
SQL> insert /*+ APPEND */ into sales
  2  select prod_id, cust_id, time_id+365, channel_id, promo_id, quantity_sold, amount_sold
  3  from sales
  4  where 1=1
  5  and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  and decode(prod_id,31,mod(cust_id,3),0)=0
  7  /

257877 rows created.

SQL> alter index sales_time_bix rebuild partition sales_q1_2002 ;

Index altered.

SQL> alter index sales_time_bix rebuild partition sales_q2_2002 ;

Index altered.

SQL> alter index sales_time_bix rebuild partition sales_q3_2002 ;

Index altered.

SQL> alter index sales_time_bix rebuild partition sales_q4_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q1_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q2_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q3_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q4_2002 ;

Index altered.

SQL>
SQL>
SQL> -- the number of total sales orders in 2002
SQL> select count(*)
  2  from sales
  3  where time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  4  /

  COUNT(*)
----------
    257877

SQL>
SQL> -- the number of sales orders for 3.5" diskettes in 2002
SQL> select count(*)
  2  from sales
  3  where time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  4  and prod_id=31
  5  /

  COUNT(*)
----------
       811

SQL> -- so we had only 811 sales orders for 3.5" diskettes
SQL>
SQL> -- Gather Statistics
SQL> exec dbms_stats.gather_table_stats('','SALES',estimate_percent=>100,granularity=>'ALL',degree=>4,-
> method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

PL/SQL procedure successfully completed.

SQL> 


Thus, although gross sales for all products did not decline significantly in 2002, sales for 3.5" diskettes, at 811 orders, were only 1/3rd the number in 2001.

Let's revisit the Optimizer's cardinality estimates :
SQL> -- ReVerify the Cardinality Estimates by year
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-1999','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  3636 | 43632 |    22  (10)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  3636 | 43632 |    22  (10)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  3636 | 43632 |     6   (0)| 00:00:01 |     4 |     8 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  3636 | 43632 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     4 |     8 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  3636 | 43632 |    15   (7)| 00:00:01 |     4 |     8 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  3636 | 43632 |    15   (7)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     4 |     8 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              1999-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5047 | 60564 |    27   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5047 | 60564 |    27   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5047 | 60564 |     6   (0)| 00:00:01 |     8 |    12 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5047 | 60564 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     8 |    12 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5047 | 60564 |    20   (5)| 00:00:01 |     8 |    12 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5047 | 60564 |    20   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     8 |    12 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  4755 | 57060 |    26   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  4755 | 57060 |    26   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  4755 | 57060 |     6   (0)| 00:00:01 |    12 |    16 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  4755 | 57060 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    12 |    16 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  4755 | 57060 |    19   (6)| 00:00:01 |    12 |    16 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  4755 | 57060 |    19   (6)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    12 |    16 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5277 | 63324 |    28   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5277 | 63324 |    28   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5277 | 63324 |     6   (0)| 00:00:01 |    16 |    20 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5277 | 63324 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    16 |    20 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5277 | 63324 |    21   (5)| 00:00:01 |    16 |    20 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5277 | 63324 |    21   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    16 |    20 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5250 | 63000 |    28   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5250 | 63000 |    28   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5250 | 63000 |     6   (0)| 00:00:01 |    20 |    24 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5250 | 63000 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    20 |    24 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5250 | 63000 |    21   (5)| 00:00:01 |    20 |    24 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5250 | 63000 |    21   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    20 |    24 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2002','DD-MON-YYYY') and time_id < to_date('01-JAN-2004','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1295169200

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |    13 |   156 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR    |                |    13 |   156 |     4   (0)| 00:00:01 |    24 |    28 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                |    13 |   156 |     4   (0)| 00:00:01 |       |       |
|   3 |    BITMAP AND                |                |       |       |            |          |       |       |
|   4 |     BITMAP MERGE             |                |       |       |            |          |       |       |
|*  5 |      BITMAP INDEX RANGE SCAN | SALES_TIME_BIX |       |       |            |          |    24 |    28 |
|*  6 |     BITMAP INDEX SINGLE VALUE| SALES_PROD_BIX |       |       |            |          |    24 |    28 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("TIME_ID">TO_DATE(' 2002-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   6 - access("PROD_ID"=31)

20 rows selected.

SQL> 


Oracle now seems to have continued "normalising" (or "averaging out") the distribution across all the years. The discrepancy between the actual count of rows and the estimated count of rows has increased significantly.
Actual versus Estimated Cardinality for PROD_ID=31


Year            Actual  Estimated  Estimated
                before inserting   after inserting
                Year 2002 rows     Year 2002 rows

1998              6602       4478       3636   
1999              6586       6275       5047
2000              7568       5859       4755
2001              2352       6528       5277
2002                 0         16         na
2002 inserted      811         na       5250
2003                 0         16         13
                                                               


Although the number of occurrences of Sales Orders for PROD_ID=31 has declined considerably in 2001 and 2002, the Optimizer cannot adjust for the decline. The discrepancy between Actual and Estimated row counts increases over time as the skew changes.
The Cardinality estimate is computed from Column statistics (number of distinct values, selectivity, histogram) at the Table level but combined with Partition row counts without any knowledge of the occurrences of PRODUCT_ID=31 in each of the target Partitions for the year 2002.


SUGGESTED TESTS For Readers :
1. What if we had a BTree Index ?
2. What if the SALES table wasn't Partitioned ?


UPDATE : See the next post, with a Non-Partitioned Table
.
.
.

No comments: