31 May, 2011

Interpreting an AWR report when the ArchiveLog Dest or FRA was full

It can be very interesting / challenging / confusing / misleading / wrong to attempt to interpret an AWR report for the duration when ArchiveLogs could not be generated. Such a case is when the ArchiveLog Destination directory or FRA (whichever is in use) went to 100% full usage (or hit the db_recovery_file_dest_size and Oracle cannot auto-delete any files in the FRA, all files present being necessary to meet the RETENTION Policy).

Take such a case which I simulated today. I started a "batch" job that generates a high volume of redo and also does some disconnect/reconnect calls. This job started at Tue May 31 22:56:30 SGT 2011 and completed at Tue May 31 23:13:28 SGT 2011 (I've taken a small time frame as I cannot afford to sit up to 3am to simulate a 4hour job !).
The User / Application Adminsitrator / DBA may think that the job to 17 minutes to run. This job was expected to complete in 5 minutes. (extrapolate to 4hours and 1hour respectively for a "real world" job ?).
The DBA was asked to investigate why the job took 3 times as long. He generated an AWR report.
             Snap Id      Snap Time      Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 1215 31-May-11 22:52:52 30 1.2
End Snap: 1216 31-May-11 23:12:47 34 1.0
Elapsed: 19.92 (mins)
DB Time: 119.47 (mins)

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 6.0 57.4 0.62 4.17
DB CPU(s): 0.0 0.4 0.00 0.03
Redo size: 386,454.6 3,694,963.3
Logical reads: 1,557.5 14,891.5
Block changes: 1,822.6 17,426.5
Physical reads: 2.1 19.8
Physical writes: 29.0 276.8
User calls: 1.4 13.7
Parses: 6.0 57.0
Hard parses: 0.4 3.6
W/A MB processed: 0.1 0.7
Logons: 0.1 0.7
Executes: 9.6 92.0
Rollbacks: 0.0 0.0
Transactions: 0.1

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.84 Redo NoWait %: 98.01
Buffer Hit %: 99.87 In-memory Sort %: 100.00
Library Hit %: 92.83 Soft Parse %: 93.71
Execute to Parse %: 38.05 Latch Hit %: 100.00
Parse CPU to Parse Elapsd %: 39.34 % Non-Parse CPU: 96.00

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 70.72 78.91
% SQL with executions>1: 55.21 86.47
% Memory for SQL w/exec>1: 56.19 80.02

Only 440MB of redo was generated in 19.92 minutes. Only about 12MB of logical I/O per second ?
And DB Time was a phenomenal 119.47 minutes against an elapsed 19.92minutes ?
Why was the job so slow ? Was the CPU running slowly ? Was there a memory fault ? Were I/Os performed very slowly ?
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
buffer busy waits 5 1,964 4.E+05 27.4 Concurrenc
log file switch (archiving nee 5 1,900 4.E+05 26.5 Configurat
DB CPU 48 .7
JS kgl get object wait 30 24 803 .3 Administra
log buffer space 32 19 585 .3 Configurat

OMG ! Buffer Busy Waits accounted for 27.4% of total database response time !
Yes, log file switch (archiving needed) accounted for 26.5% of total database response time !
Action Item for the DBA : He MUST find the reason for those buffer busy waits ! These seem to be more important than the log file switch waits. He MUST find why the job ran so slowly.


This is the sort of error that occurs when you look at figures in isolation and also when you look at figures "averaged" out over the AWR elapsed duration.

But guess what ? The job didn't run "slow". The server CPU and Memory and I/O didn't run "slow" or had faults. The database didn't run "slow".
What had happened is that all Redo generation halted when LGWR couldn't switch into an Online Redo Log file because it hadn't been archived.
The database instance was "stuck" at this position for almost 13minutes. From 23:00:35 :
Tue May 31 23:00:35 2011
Thread 1 advanced to log sequence 37 (LGWR switch)
Current log# 1 seq# 37 mem# 0: /home/oracle/app/oracle/oradata/orcl/redo01.log
Tue May 31 23:00:35 2011
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_arc0_2435.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 4294967296 bytes is 100.00% used, and has 0 remaining bytes available.
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
system command was used to delete files, then use RMAN CROSSCHECK and
DELETE EXPIRED commands.
************************************************************************
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_arc0_2435.trc:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 41253888 bytes disk space from 4294967296 limit
ARC0: Error 19809 Creating archive log file to '/addtl/oracle/flash_recovery_area/ORCL/archivelog/2011_05_31/o1_mf_1_36_%u_.arc'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl - Archival Error
ORA-16038: log 3 sequence# 36 cannot be archived
ORA-19809: limit exceeded for recovery files

To 23:12:58 :
Tue May 31 23:12:58 2011
Archived Log entry 605 added for thread 1 sequence 36 ID 0x4c1c54ba dest 1:
Archiver process freed from errors. No longer stopped
Tue May 31 23:12:59 2011
Thread 1 advanced to log sequence 39 (LGWR switch)
Current log# 3 seq# 39 mem# 0: /home/oracle/app/oracle/oradata/orcl/redo03.log

Of the 19.92 minutes, 12minutes22seconds were completely lost when the database instance was "frozen".
That 440MB of Redo was not generated over 19.92minutes but over about 7minutes33seconds.

Guess what ? Those 1,964 seconds of buffer busy waits were the process "stuck" in the wait -- the buffer wasn't being freed because no activity was possible. The actual number of logical I/Os wasn't 1,557.5 per second but double that -- because the database instance was active for only half the elapsed duration.

Remember :
1. One wait event may well be attributed to another wait event. AWR or the DBA or the Performance Analyst may wrongly "double count" waits.

2. Statistics presented for an elapsed duration do not necessarily mean that every second of that duration was equally loaded. Some of the elapsed duration (9minutes22seconds here) may have been completely idle time. Unfortunately, when we read an AWR we assume that the whole duration of the AWR has had a uniform load.

3. Statistics are accumulated for multiple sessions. In less than 20minutes elapsed time we had 32minutes of "buffer busy waits" time !

4. Such wrong interpretations are more pernicious (if that is the right word) when an AWR is for a longer duration. If a job runs for 30minutes and no other activity occurs over the next 30minutes, an AWR report for a 60minute duration will under-report the load of the job by 50% !

.
.
.

29 May, 2011

Deleting SQL Plan Baselines

Following the example of capturing SQL Plan Baselines, here's a deletion :
SQL> select sql_handle, sql_text from dba_sql_plan_baselines
2 where sql_text like 'SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS %';

SQL_HANDLE SQL_TEXT
------------------------------ --------------------------------------------------------------------------------
SYS_SQL_2bdf77bedbcdea50 SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS WHERE (UPPER('SQL*Plus') LIKE UPPE

SQL>
SQL> set serveroutput on
SQL> declare
2 ret_value pls_integer;
3 sql_handle_in varchar2(30);
4 cursor c1 is
5 select sql_handle from dba_sql_plan_baselines
6 where sql_text like 'SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS %' ;
7 begin
8 open c1 ;
9 loop
10 fetch c1 into sql_handle_in;
11 exit when c1%notfound;
12 ret_value := dbms_spm.drop_sql_plan_baseline(sql_handle=>sql_handle_in,plan_name=>NULL);
13 dbms_output.put_line(ret_value);
14 end loop;
15 end;
16 /
1

PL/SQL procedure successfully completed.

SQL> select sql_handle, sql_text from dba_sql_plan_baselines
2 where sql_text like 'SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS %'
3 /

no rows selected

SQL>


Deletion of multiple SQL_HANDLEs or PLAN_NAMEs can also be executed similarly.

.
.
.

28 May, 2011

Database Audit setup -- 2 interesting findings

Jonathan Lewis blogs about two interesting findings in 3 days :
1. Object name being captured in audit trail
2. Collateral damage (my words !) from enabling extended audit, without even starting auditing

And another one :


.

.

.

Capturing SQL PLAN Baselines

An introductory :
SQL> select count(*) from dba_sql_plan_baselines ;

COUNT(*)
----------
       2

SQL>
SQL> declare
2  ret_value pls_integer;
3  sql_id_in varchar2(30);
4  cursor c1 is
5  select distinct sql_id from v$sql,dba_users
6  where parsing_user_id = user_id
7  and username = 'HEMANT'
8  order by 1;
9  begin
10  open c1 ;
11  loop
12  fetch c1 into sql_id_in;
13  exit when c1%NOTFOUND;
14  dbms_output.put_line('For SQL_ID  :  '     ||  sql_id_in);
15  ret_value := dbms_spm.load_plans_from_cursor_cache(sql_id=>sql_id_in);
16  -- dbms_output.put_line(ret_value);
17  end loop;
18  end;
19  /
For SQL_ID  :  0v8ygnt3qfnwq
For SQL_ID  :  1761jd8v8v5vu
For SQL_ID  :  2rcwxc7j3b6cu
For SQL_ID  :  3kvqxcy1cg1av
For SQL_ID  :  3qxyrb5h82cra
For SQL_ID  :  5qgz1p0cut7mx
For SQL_ID  :  7cfz5wy9caaf4
For SQL_ID  :  7hys3h7ysgf9m
For SQL_ID  :  7na10rxj8ksw5
For SQL_ID  :  9rvgxuv7w16qg
For SQL_ID  :  buuzqhh3qshuw
For SQL_ID  :  cf19zu91tn7mj
For SQL_ID  :  cuwm7p53jggtb
For SQL_ID  :  cw6vxf0kbz3v1
For SQL_ID  :  d6vwqbw6r2ffk
For SQL_ID  :  dpbugmu3fw5qk
For SQL_ID  :  dt0hqkzcdq5ny
For SQL_ID  :  du2u5hgh368qt
For SQL_ID  :  dyk4dprp70d74
For SQL_ID  :  g3f3cw3zy5aat
For SQL_ID  :  g4y6nw3tts7cc
For SQL_ID  :  gpp3q1rch3vpx

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from dba_sql_plan_baselines ;

COUNT(*)
----------
      21

SQL>  



(Note : 1 had already been captured earlier)
Evolving baselines to follow...
.


Another blog post (by Kellyn Pot'vin) :  Capturing SQL Baselines for Consistent Performance for 10g Upgrades to 11g



UPDATE  14-Jun-19 :  See this post :  Automatic SQL Plan Management in 19c 


23 May, 2011

11g OCP

I passed the 1Z0-055 "Oracle Database 11g: New Features for 9i OCPs" examination today.

(yes, I skipped 10g certification)
.
.
.

15 May, 2011

Getting all the instance parameters

A simple method (in 11g -- here tested in 11.2.0.1) to get all the database instance parameters.

Currently, I am using an SPFILE :
[oracle@localhost ~]$ cd $ORACLE_HOME/dbs
[oracle@localhost dbs]$ ls -ltr
total 32
-rw-r--r-- 1 oracle oracle 2851 May 16 2009 init.ora
drwx------ 2 oracle oracle 4096 Oct 30 2009 peshm_DBUA0_0
-rw-rw---- 1 oracle oracle 1544 Oct 30 2009 hc_DBUA0.dat
drwx------ 2 oracle oracle 4096 Oct 30 2009 peshm_orcl_0
-rw-rw---- 1 oracle oracle 24 Apr 7 2010 lkORCL
-rw-r----- 1 oracle oracle 1536 Jul 7 2010 orapworcl
-rw-rw---- 1 oracle oracle 1544 May 15 12:59 hc_orcl.dat
-rw-r----- 1 oracle oracle 2560 May 15 12:59 spfileorcl.ora
[oracle@localhost dbs]$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Sun May 15 14:27:26 2011

Copyright (c) 1982, 2009, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> show parameter spfile

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string /home/oracle/app/oracle/produc
t/11.2.0/dbhome_1/dbs/spfileor
cl.ora
SQL>

If I create a PFILE, I have 25 lines of parameters :
(UPDATE : As suggested by Coskan, use "create pfile=/tmp/init.ora from spfile;" so as to not overwrite an existing pfile -- e.g. if it happens to be different from the current set of processes.)
SQL> create pfile from spfile;

File created.

SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@localhost dbs]$ ls -ltr|tail
total 36
-rw-r--r-- 1 oracle oracle 2851 May 16 2009 init.ora
drwx------ 2 oracle oracle 4096 Oct 30 2009 peshm_DBUA0_0
-rw-rw---- 1 oracle oracle 1544 Oct 30 2009 hc_DBUA0.dat
drwx------ 2 oracle oracle 4096 Oct 30 2009 peshm_orcl_0
-rw-rw---- 1 oracle oracle 24 Apr 7 2010 lkORCL
-rw-r----- 1 oracle oracle 1536 Jul 7 2010 orapworcl
-rw-rw---- 1 oracle oracle 1544 May 15 12:59 hc_orcl.dat
-rw-r----- 1 oracle oracle 2560 May 15 12:59 spfileorcl.ora
-rw-rw-r-- 1 oracle oracle 959 May 15 14:28 initorcl.ora
[oracle@localhost dbs]$ wc -l spfileorcl.ora
25 spfileorcl.ora
[oracle@localhost dbs]$ wc -l initorcl.ora
25 initorcl.ora
[oracle@localhost dbs]$


But what if I want to see all the "hidden" parameters ? I could query some view.

I could also do this :
(UPDATE : As suggested by Coskan, use "create pfile=/tmp/init.ora from memory;" so as to not overwrite an existing pfile -- e.g. if it happens to be different from the current set of processes.)
[oracle@localhost dbs]$ rm initorcl.ora
[oracle@localhost dbs]$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Sun May 15 14:30:07 2011

Copyright (c) 1982, 2009, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create pfile from memory;

File created.

SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@localhost dbs]$ ls -l initorcl.ora
-rw-rw-r-- 1 oracle oracle 5625 May 15 14:30 initorcl.ora
[oracle@localhost dbs]$ wc -l initorcl.ora
163 initorcl.ora
[oracle@localhost dbs]$

I now have 163 entries in my PFILE !

And these contain 137 parameters beginning with the "_" character :
[oracle@localhost dbs]$ strings -a initorcl.ora |grep '^\_' | wc -l
137
[oracle@localhost dbs]$


I can also create an SPFILE with all the parameters :
[oracle@localhost dbs]$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Sun May 15 14:36:38 2011

Copyright (c) 1982, 2009, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create spfile='spfileorcl.bck' from memory;

File created.

SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@localhost dbs]$ wc -l spfileorcl.bck
164 spfileorcl.bck
[oracle@localhost dbs]$
[oracle@localhost dbs]$ strings -a spfileorcl.bck | grep '^\*\.\_' | wc -l
136
[oracle@localhost dbs]$


The "CREATE SPFILE || PFILE FROM MEMORY" is a new command in 11g.

UPDATE : You can still query the V$PARAMETER, V$SYSTEM_PARAMETER (and V$PARAMETER2, V$SYSTEM_PARAMETER2) views WHERE ISDEFAULT=TRUE to get the default values (of "normal" parameters) that you hadn't put in your parameter file.

I leave it to you to find the reasons for the slight discrepancies in line counts.
.
.
.

05 May, 2011

RMAN's COPY command

Following some thread on forums, I had logged a documentation query against the 11gR2 RMAN documentation on 31-Jan.
My question to Oracle's 11gR2 documentation team had been

"Why is the COPY command not documented in the Reference ?" (referring to the 11gR2 Backup and Recovery Reference).



I have today received a response from the documentation team :


Dear RMAN Reader,

Thank you for your feedback and question.

The RMAN COPY command was deprecated in Oracle Database 10gR1. This change is highlighted in "Deprecated RMAN Syntax" section of the documentation. The command that replaces it is BACKUP AS COPY. Previously, when you created a backup set with the COPY command you then had to use RMAN to extract individual datafiles from that backup set. The BACKUP AS COPY command creates image copies and this eliminates the need to extract them from a backup set. The BACKUP AS COPY command allows you to copy a database, tablespaces, datafiles, archived redo logs, and control files.


For example:

RMAN> BACKUP AS COPY DATAFILE 4 FORMAT '+TEST';


I hope this answers your question.

Regards,
Oracle Documentation Team


Although I can't now recall which forums thread was it that piqued my curiousity about this issue with the documentation, I'd thought I should now post this response here.
.
.
.

04 May, 2011

Collection of my Oracle Blog posts on Backup and Recovery

I have uploaded a collection of my Oracle Blog posts on Backup and Recovery, Control files, ArchiveLogs and such. This is now available as a PDF file. I hope that it will be useful as it is downloadable and viewable "off the web".

The entries go as far back as December 2006 and are current as of upto 03-May-2011. I have also included comments as of upto 03-May-2011.

I intend to upload collections of posts on Performance and Optimization and on Architecture and Design soon.


Incidentally, this post, today, is my 250th post to the blog.
.
.
.