27 June, 2010

Some Common Errors - 6 - Not collecting Metrics

In recent versions, Oracle Enterprise Manager, StatsPack and AWRs have automated collection of Metrics. Unfortunately, these aren't preserved for a long time.

We might expect a DBA (Apps Admin / SysAdmin / Ops Manager) to be able to answer questions like :

a. What has been the Server CPU Utilisation / IO Wait Time 1 month / 3 months / 6 months ago ?

b. What was the size of the (specific) Table / Schema / Database 1 month / 3 months / 6 months / 1 year / 2 years ago ?

c. Which application programs and/or SQL statements have been the cause of most complaints of performance (throughput / response time) or waits / "hangs" (really meaning 'enqueues') from the application / users / management and what has been done to monitor and/or improve them ? (statistics are necessary to provide evidence).

d. What errors occur in the database and whether patches have been identified / requested in the past 3 months / 6 months ?

e. What has been the volume of transactions in the past 1 month / 3 months / 6 months / 1 year ? As a corollary, what have been user volumes ? (Unfortunately, far too often, some manager might think that there is a high correlation between user volumes and transaction volumes / performance / throughput.)


Much of the evidence / metrics can be statistics. Some of it can be (unformatted) notes and documents and/or email trails.

.
.
.



08 June, 2010

Know your data and write a better query

If you "know" your data you can find opportunities to improve your query. This without having to add indexes and even as you have 100% statistics on the table and full histograms.

Here is a simple case study :

First I present a summary of the data. BIZ_COUNTRIES is a table listing the countries that this organisation does business in. BIZ_PARTNERS lists it's partners -- it has a separate partner enterprise in each country. PRODUCTS and SALES are self-explanatory. Each of these tables has a Primary Key and Foreign Key constraints have been defined for SALES to BIZ_PARTNERS and PRODUCTS and BIZ_PARTNERS to BIZ_COUNTRIES.

create table BIZ_COUNTRIES
(country_id number not null primary key,
country_name varchar2(50) not null,
country_notes varchar2(200),
biz_start_date date)
/

create table BIZ_PARTNERS
(partner_id number not null primary key,
partner_country_id number not null,
partner_name varchar2(50) not null,
partner_notes varchar2(200),
partner_start_date date)
/
alter table BIZ_PARTNERS
add constraint p_c_fk foreign key (partner_country_id)
references biz_countries(country_id)
/

create table PRODUCTS
(product_id number not null primary key,
product_name varchar2(50) not null)
/

create table SALES
(sales_inv_id number not null primary key,
sale_partner_id number,
sale_product_id number,
sale_quantity number,
sale_date date)
/
alter table SALES
add constraint s_ptnr_fk foreign key (sale_partner_id)
references biz_partners(partner_id)
/
alter table SALES
add constraint s_prd_fk foreign key (sale_product_id)
references products(product_id)
/

create index sales_partner_ndx on sales(sale_partner_id);
create index sales_product on sales(sale_product_id);


SQL> desc BIZ_COUNTRIES
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
COUNTRY_ID NOT NULL NUMBER
COUNTRY_NAME NOT NULL VARCHAR2(50)
COUNTRY_NOTES VARCHAR2(200)
BIZ_START_DATE DATE

SQL> desc BIZ_PARTNERS
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
PARTNER_ID NOT NULL NUMBER
PARTNER_COUNTRY_ID NOT NULL NUMBER
PARTNER_NAME NOT NULL VARCHAR2(50)
PARTNER_NOTES VARCHAR2(200)
PARTNER_START_DATE DATE

SQL> desc PRODUCTS
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
PRODUCT_ID NOT NULL NUMBER
PRODUCT_NAME NOT NULL VARCHAR2(50)

SQL> desc SALES
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
SALES_INV_ID NOT NULL NUMBER
SALE_PARTNER_ID NUMBER
SALE_PRODUCT_ID NUMBER
SALE_QUANTITY NUMBER
SALE_DATE DATE

SQL>
SQL> REM *after* loading data, statistics are gathered as :
exec dbms_stats.gather_table_stats('','BIZ_COUNTRIES',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE);
exec dbms_stats.gather_table_stats('','BIZ_PARTNERS',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE);
exec dbms_stats.gather_table_stats('','PRODUCTS',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE);
exec dbms_stats.gather_table_stats('','SALES',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE);


This is the data present :

SQL> select country_id, country_name from biz_countries order by 1;

COUNTRY_ID COUNTRY_NAME
---------- --------------------------------------------------
1 India
2 Singapore
3 Malaysia
4 Thailand
5 Australia
6 UK
7 USA
8 Brazil

8 rows selected.

SQL> select partner_id, partner_country_id, partner_name from biz_partners order by 1;

PARTNER_ID PARTNER_COUNTRY_ID PARTNER_NAME
---------- ------------------ --------------------------------------------------
1 1 Star Enterprises Pvt Ltd
2 2 LionCity Traders
3 3 Growth Traders
4 4 Rising Pvt Ltd
5 5 Star Enterprises _5_ Inc.
6 6 Star Enterprises _6_ Inc.
7 7 Star Enterprises _7_ Inc.
8 8 Star Enterprises _8_ Inc.

8 rows selected.

SQL> select product_id, product_name from products order by 1;

PRODUCT_ID PRODUCT_NAME
---------- --------------------------------------------------
1 YellowWidgets
2 RedWidgets
3 GreenWidgets
4 BlueWidgets
5 BlackWidgets
6 WhiteWidgets

6 rows selected.

SQL> select count(*) from sales;

COUNT(*)
----------
999999

SQL> select min(sale_date), max(sale_date) from sales;

MIN(SALE_ MAX(SALE_
--------- ---------
01-JAN-07 26-SEP-09

SQL> select sale_partner_id, count(*) from sales group by sale_partner_id;

SALE_PARTNER_ID COUNT(*)
--------------- ----------
1 749999
2 250000

SQL>


Thus, we see that only 2 Partners have made sales for us, and Partner ID 1 ("Star Enterprises Pvt Ltd") accounts for almost 75% of the sales -- all in Country ID 1 ("India").
(We can assume that all the products have the same sale price so we do not need to trace sale price and revenue by product).

Here is the first query where I query for Sales made after 01-Jan-2009 by Partner 1 :

SQL> -- QUERY 1
SQL> select c.country_id, c.country_name, p.partner_id, count(s.sale_date)
2 from biz_countries c, biz_partners p, sales s
3 where c.country_id=p.partner_country_id
4 and p.partner_id=s.sale_partner_id
5 and s.sale_date > to_date('01-JAN-2009','DD-MON-YYYY')
6 and p.partner_name = 'Star Enterprises Pvt Ltd'
7 group by c.country_id, c.country_name, p.partner_id
8 order by 1,2,3
9 /

COUNTRY_ID COUNTRY_NAME PARTNER_ID COUNT(S.SALE_DATE)
---------- -------------------------------------------------- ---------- ------------------
1 India 1 201749

SQL>
SQL> -- let's see the optimizer's cardinality estimates
SQL> explain plan for
2 select c.country_id, c.country_name, p.partner_id, count(s.sale_date)
3 from biz_countries c, biz_partners p, sales s
4 where c.country_id=p.partner_country_id
5 and p.partner_id=s.sale_partner_id
6 and s.sale_date > to_date('01-JAN-2009','DD-MON-YYYY')
7 and p.partner_name = 'Star Enterprises Pvt Ltd'
8 group by c.country_id, c.country_name, p.partner_id
9 order by 1,2,3
10 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 473150350

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32 | 1632 | 1052 (3)| 00:00:13 |
| 1 | SORT GROUP BY | | 32 | 1632 | 1052 (3)| 00:00:13 |
|* 2 | HASH JOIN | | 33625 | 1674K| 1050 (3)| 00:00:13 |
| 3 | NESTED LOOPS | | 1 | 40 | 4 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | BIZ_PARTNERS | 1 | 29 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| BIZ_COUNTRIES | 1 | 11 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | SYS_C006892 | 1 | | 0 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | SALES | 269K| 2889K| 1043 (2)| 00:00:13 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("P"."PARTNER_ID"="S"."SALE_PARTNER_ID")
4 - filter("P"."PARTNER_NAME"='Star Enterprises Pvt Ltd')
6 - access("C"."COUNTRY_ID"="P"."PARTNER_COUNTRY_ID")
7 - filter("S"."SALE_DATE">TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

22 rows selected.

SQL>

If you read the Execution Plan carefully you will see that the Optimizer expected to fetch 33,625 rows in the Joins between the three tables and generate 32 distinct groupings in the Group By. The actual result was 201,749 rows and only 1 single grouping !
Even with 100% statistics the optimizer may not be very accurate ?!

What if I rewrite the query :

SQL> explain plan for
2 select c.country_id, c.country_name, p.partner_id, count(s.sale_date)
3 from biz_countries c, biz_partners p, sales s
4 where c.country_id=p.partner_country_id
5 and p.partner_id=s.sale_partner_id
6 and s.sale_date > to_date('01-JAN-2009','DD-MON-YYYY')
7 -- and p.partner_name = 'Star Enterprises Pvt Ltd'
8 and p.partner_id = 1
9 group by c.country_id, c.country_name, p.partner_id
10 order by 1,2,3
11 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 574353167

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32 | 896 | 1048 (3)| 00:00:13 |
| 1 | HASH GROUP BY | | 32 | 896 | 1048 (3)| 00:00:13 |
| 2 | NESTED LOOPS | | 201K| 5516K| 1048 (3)| 00:00:13 |
| 3 | NESTED LOOPS | | 1 | 17 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| BIZ_PARTNERS | 1 | 6 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C006896 | 1 | | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| BIZ_COUNTRIES | 8 | 88 | 1 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | SYS_C006892 | 1 | | 0 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | SALES | 201K| 2167K| 1046 (3)| 00:00:13 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("P"."PARTNER_ID"=1)
7 - access("C"."COUNTRY_ID"="P"."PARTNER_COUNTRY_ID")
8 - filter("S"."SALE_DATE">TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
AND "S"."SALE_PARTNER_ID"=1)

23 rows selected.

SQL>

What did I change ? Instead of supplying the Partner Name, I supplied the Partner ID in my query. That was the only change I made to the query. Yet, if you see the Predicate Information section of the Execution Plan, Oracle has now passed "SALE_PARTNER_ID=1" as one of the filter predicates on the SALES table. My query did not explicitly specify what the SALE_PARTNER_ID was to be. It only specified the PARTNER_ID (from the BIZ_PARTNERS table). Yet, Oracle was able to transform (i.e. rewrite the query) and, instead of merely using a Join on SALE_PARTNER_ID, it decided to apply SALE_PARTNER_ID=1 as a filter against the SALES table !
The Optimizer has also come up with a more accurate estimate of 201K rows to be fetched from the SALES table and also to be the result of the Joins.

Because I knew that the SALE_PARTNER_ID for "Star Enterprises Pvt Ltd" was 1 and that there is no other Partner with the same name (this is the key piece of information !), I could change my query predicate from "and p.partner_name = 'Star Enterprises Pvt Ltd'" to "and p.partner_id = 1". This is based on my knowledge of the data !


Let me look at the second query (very very similar !) :

SQL> -- QUERY 2
SQL> select c.country_id, c.country_name, p.partner_id, count(s.sale_date)
2 from biz_countries c, biz_partners p, sales s
3 where c.country_id=p.partner_country_id
4 and p.partner_id=s.sale_partner_id
5 and s.sale_date > to_date('01-JAN-2009','DD-MON-YYYY')
6 and p.partner_name = 'Star Enterprises _7_ Inc.'
7 group by c.country_id, c.country_name, p.partner_id
8 order by 1,2,3
9 /

no rows selected

SQL>

Partner ID 7 ("Star Enterprises _7_ Inc.") has made no sales.

What does the Optimizer estimate ? :

SQL> explain plan for
2 select c.country_id, c.country_name, p.partner_id, count(s.sale_date)
3 from biz_countries c, biz_partners p, sales s
4 where c.country_id=p.partner_country_id
5 and p.partner_id=s.sale_partner_id
6 and s.sale_date > to_date('01-JAN-2009','DD-MON-YYYY')
7 and p.partner_name = 'Star Enterprises _7_ Inc.'
8 group by c.country_id, c.country_name, p.partner_id
9 order by 1,2,3
10 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 473150350

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32 | 1632 | 1052 (3)| 00:00:13 |
| 1 | SORT GROUP BY | | 32 | 1632 | 1052 (3)| 00:00:13 |
|* 2 | HASH JOIN | | 33625 | 1674K| 1050 (3)| 00:00:13 |
| 3 | NESTED LOOPS | | 1 | 40 | 4 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | BIZ_PARTNERS | 1 | 29 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| BIZ_COUNTRIES | 1 | 11 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | SYS_C006892 | 1 | | 0 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | SALES | 269K| 2889K| 1043 (2)| 00:00:13 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("P"."PARTNER_ID"="S"."SALE_PARTNER_ID")
4 - filter("P"."PARTNER_NAME"='Star Enterprises _7_ Inc.')
6 - access("C"."COUNTRY_ID"="P"."PARTNER_COUNTRY_ID")
7 - filter("S"."SALE_DATE">TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

22 rows selected.

SQL>

Isn't that very surprising ? The Optimizer estimates the same number of rows for sales by "Star Enterprises _7_ Inc." (which are actually 0 rows) as it did for sales by "Star Enterprises Pvt Ltd" (which were 201thousand !).

Let me rewrite my query again :

SQL> explain plan for
2 select c.country_id, c.country_name, p.partner_id, count(s.sale_date)
3 from biz_countries c, biz_partners p, sales s
4 where c.country_id=p.partner_country_id
5 and p.partner_id=s.sale_partner_id
6 and s.sale_date > to_date('01-JAN-2009','DD-MON-YYYY')
7 -- and p.partner_name = 'Star Enterprises _7_ Inc.'
8 and p.partner_id = 7
9 group by c.country_id, c.country_name, p.partner_id
10 order by 1,2,3
11 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2893611923

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 28 | 5 (0)| 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 28 | 5 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 28 | 5 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 17 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| BIZ_PARTNERS | 1 | 6 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C006896 | 1 | | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| BIZ_COUNTRIES | 8 | 88 | 1 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | SYS_C006892 | 1 | | 0 (0)| 00:00:01 |
|* 8 | TABLE ACCESS BY INDEX ROWID | SALES | 1 | 11 | 3 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | SALES_PARTNER_NDX | 1 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("P"."PARTNER_ID"=7)
7 - access("C"."COUNTRY_ID"="P"."PARTNER_COUNTRY_ID")
8 - filter("S"."SALE_DATE">TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
9 - access("S"."SALE_PARTNER_ID"=7)

24 rows selected.

SQL>

Once again, I used my knowledge of the fact that "Star Enterprises _7_ Inc." appears only once in BIZ_PARTNERS and is PARTNER_ID 7 and replaced the query predicate "and p.partner_name = 'Star Enterprises _7_ Inc.'" with "and p.partner_id = 7". Oracle could add it's own "SALE_PARTNER_ID=7" access predicate on the SALES table.
This time the Optimizer presents 1 row as the estimated row count from SALES (an estimate of 0 gets rounded up to 1 row !) and, correspondingly, a group by result of 1 row. Since it expected only 1 row from SALES, it even avoided a FullTableScan on the SALES table and used the SALE_PARTNER_NDX. The estimated "Cost" of the query comes down from 1052 to 5. (Although, as I have repeatedly said, ignore "Cost" look at the actual execution steps and look at the real effort -- buffer gets, consistent reads, physical reads, elapsed time -- when the query is executed).

So, with prior knowledge of the data, I can rewrite my user's query and supply the proper predicate to allow the optimizer to further improve on my query and add it's own filter on the SALES table !


Also read this post on Transitive Closure by Jonathan Lewis.

.
.
.

04 June, 2010

RECOVER DATABASE starts with an update -- 2

Continuing my previous post where I showed that a RECOVER DATABASE USING BACKUP CONTROLFILE starts with an Update

I STARTUP MOUNT the database :

ora10204>sqlplus '/ as sysdba'

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Jun 4 22:55:11 2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to an idle instance.

SQL> startup mount;
ORACLE instance started.

Total System Global Area 880803840 bytes
Fixed Size 2087992 bytes
Variable Size 218104776 bytes
Database Buffers 654311424 bytes
Redo Buffers 6299648 bytes
Database mounted.
SQL>


From another session, I check the type of controlfile :

SQL> select controlfile_type from v$database;

CONTROL
-------
CURRENT

SQL>

So, the controlfile "type" is "CURRENT" , this really being the current controlfile.

I then issue my RECOVER .. USING BACKUP CONTROLFILE command :

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 3632262 generated at 06/04/2010 22:54:20 needed for thread 1
ORA-00289: suggestion :
/oracle_fs/FRAs/ORT24FS/ORT24FS/archivelog/2010_06_04/o1_mf_1_1_%u_.arc
ORA-00280: change 3632262 for thread 1 is in sequence #1


Specify log: {=suggested | filename | AUTO | CANCEL}


I then re-check the type of controlfile :

SQL> select controlfile_type from v$database;

CONTROL
-------
BACKUP

SQL>

Aah ! The Controlfile itself has been updated and is now a BACKUP file, not the CURRENT file.
This will be the status regardless of whether ReCOVERy succeeds or fails.

CANCEL
Media recovery cancelled.
SQL>

From the other session :

SQL> select controlfile_type from v$database;

CONTROL
-------
BACKUP

SQL>

Can I open the database, knowing that no file are inconsitent (had files been inconsistent, I would have recived a warning "file %s needs more recovery") :

SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open


SQL>

The status remains a BACKUP :

SQL> select controlfile_type from v$database;

CONTROL
-------
BACKUP

SQL>

Since I cannot OPEN without resetting logs when I have a Backup Controlfile (the RESETLOGS will synchronise the controlfile with the "latest" current information from the Datafiles, even as updated by application of Redo (if any)), I must RESETLOGS :

SQL> alter database open resetlogs;

Database altered.

SQL>

The status now is CURRENT:

SQL> select controlfile_type from v$database;

CONTROL
-------
CURRENT

SQL>


.
.
.