Performance Tuning Basics 15 - AWR Report Analysis - Expert Oracle
PUBLISHED FEBRUARY 6, 2018 by BRIJESH GOGIA
Oracle's Automatic Workload Repository (AWR) collects, processes, and maintains
performance statistics for problem detection and self-tuning purposes. The AWR
report is large, and it can take years of experience to understand all of its
aspects. In this post we will explain some important sections of the report,
their significance, and some practical tips. Explaining every section of AWR is
not possible here, so we will stick to some of the most frequently used sections.
Note that this is not comprehensive information; the goal is to give Junior DBAs
an overview of a few key sections as a primer and to encourage them to build
further knowledge in related fields.
To start with, let us mention some high-level tips regarding AWR:
1. Collect Multiple AWR Reports: It is beneficial to have two AWR reports, one for a
good period and one for when performance is poor. Alternatively, create three reports
(before/meantime/after) spanning the time frame in which the problem was experienced
and compare them with the time frames before and after it.
2. Stick to a Particular Time: You should have a specific time when the database was
slow, so that you can choose a shorter time frame and get a more precise report.
3. Split a Large AWR Report into Smaller Reports: Instead of one report covering a long
period, such as 3 hours, it is better to have three reports of one hour each. This
helps isolate the problem.
4. For RAC, take each instance's individual report: In a RAC environment, generate a
separate report for each instance to see whether all the instances are balanced the
way they should be.
5. Use ASH as well: Use AWR to identify the troublesome areas and then use ASH to
confirm those areas.
6. Increase the retention period: On instances where you see more performance issues,
increase the retention time so that you have historical data to compare against (see
the sketch after this list).
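As a minimal sketch (not from the original post; the retention and interval values below are illustrative only), snapshots can be listed and the retention changed with the DBMS_WORKLOAD_REPOSITORY package, and the report itself is usually generated with the awrrpt.sql script:

-- List available snapshots to pick a begin/end pair for the report.
SELECT snap_id, begin_interval_time
FROM   dba_hist_snapshot
ORDER  BY snap_id;

-- Keep 30 days of history (43200 minutes), snap every 30 minutes.
BEGIN
  dbms_workload_repository.modify_snapshot_settings(
    retention => 43200,
    interval  => 30);
END;
/

-- Generate the report interactively from SQL*Plus:
-- @?/rdbms/admin/awrrpt.sql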
Time units used in AWR reports:
-> s – second
-> cs – centisecond (1/100 of a second)
-> ms – millisecond (1/1,000 of a second)
-> us – microsecond (1/1,000,000 of a second)
Top Header
This contains information about the database and environment, along with the
snapshot IDs and times. The important thing to check is that configuration such as
CPU and memory has not changed between the baseline period and the period when
performance degraded.
DB Time
Description: Time spent in the database during the elapsed time; in other words, the
sum of the time taken by all sessions in the database during the 'Elapsed' time.
DB Time = CPU time + non-idle wait time. Note: it does not include background
processes.
Analysis: DB Time greater than Elapsed Time means that sessions were active on the
database concurrently. You can find the average active sessions during the AWR
period as DB Time / Elapsed Time => 1964.97 / 899.99 = 2.18, so the database load
(average active sessions) is 2.18. It means that, on average, about 2.18 sessions
were actively working in the database at any moment during the snapshot window.
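As a sketch (not part of the original report), you can compute DB Time between two snapshots yourself from DBA_HIST_SYS_TIME_MODEL; values are in microseconds, and the snap IDs 100/101 are placeholders:

-- DB Time accumulated between snapshots 100 and 101 (placeholders).
SELECT (e.value - b.value) / 1e6 AS db_time_secs
FROM   dba_hist_sys_time_model b,
       dba_hist_sys_time_model e
WHERE  b.stat_name = 'DB time'
AND    e.stat_name = 'DB time'
AND    b.snap_id = 100
AND    e.snap_id = 101
AND    b.dbid = e.dbid
AND    b.instance_number = e.instance_number;

Dividing the result by the elapsed wall-clock seconds between the two snapshots gives the average active sessions figure discussed above.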
Load Profile
SIGNIFICANCE OF THIS SECTION:
Here in the load profile (average active sessions, DB CPU, logical and physical reads,
user calls, executions, parses, hard parses, logons, rollbacks, transactions),
check whether the numbers are consistent with each other and with the general database
profile (OLTP/DWH/mixed).
Pay most attention to physical reads, physical writes, the hard parse to parse ratio,
and the executes to transaction ratio.
The ratio of hard parses to parses tells you how often SQL is being fully
parsed. Full parsing of SQL statements has a negative effect on performance.
A high hard parse ratio (above 2-3 percent) indicates probable bind variable issues or
perhaps versioning problems (a quick way to compute this ratio is sketched below).
Rows per sort can also be reviewed here to see if large sorts are occurring.
This section can help in the load testing for application releases. You can
compare this section for the baseline as well as high load situation.
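As a minimal sketch (not from the original post), the hard-parse-to-parse ratio can be computed from V$SYSSTAT; note these counters are instance-wide since startup, not per AWR interval:

-- Percentage of parses that were hard parses since instance startup.
SELECT ROUND(100 * hard.value / total.value, 2) AS hard_parse_pct
FROM   v$sysstat total,
       v$sysstat hard
WHERE  total.name = 'parse count (total)'
AND    hard.name  = 'parse count (hard)';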
Redo Size (Bytes)
Description: The main sources of redo are (in roughly descending order) INSERT,
UPDATE and DELETE; for INSERTs and UPDATEs the redo volume is roughly proportional
to the amount of data changed.
Analysis: Not a very scary number in our report. High redo figures mean that either
lots of new data is being saved into the database, or existing data is undergoing
lots of changes.
User Calls
Description: The number of calls from a user process into the database – things like
'parse', 'fetch', 'execute' and 'close'.
Analysis: This is an extremely useful piece of information, because it sets the scale
for other statistics (such as commits, hard parses etc.).
Soft Parse %
Description: Shows the percentage of times the SQL in the shared pool is reused,
i.e. how often sessions issued a SQL statement that is already in the shared pool
and could use an existing version of that statement.
Analysis: A low soft parse percentage indicates bind variable and versioning issues.
With 99.25% soft parsing, about 0.75% (100 – soft parse %) of parses are hard parses.
A low hard parse rate is good for us.
% Non-Parse CPU
Description: Shows how much CPU Oracle spends on statement execution rather than on
parsing.
Analysis: A value near 100% means most of the CPU resources are used for operations
other than parsing, which is good for database health. Most of our statements were
already parsed, so we were not doing a lot of re-parsing. Re-parsing is CPU-intensive
and should be avoided.
Latch Hit %
Description: Shows the percentage of times latches are acquired without having to
wait.
Analysis: If Latch Hit % is below 99%, you may have a latch problem. Tune latches to
reduce cache contention.
Note that there could be significant waits that are not listed here, so check the
Foreground Wait Events (Wait Event Statistics) section for any other time-consuming
wait events. For the largest waits, look at the Wait Event Histogram to identify the
distribution of waits.
DB CPU
Description: Time running on CPU (time waiting in the run queue is not included).
Analysis: Here 84.8% is the % DB Time for this event, which is really HIGH! From it
we can derive:
1) DB CPU load = DB CPU time / elapsed time = 1.85
2) DB CPU utilization % = (DB CPU load / number of cores) x 100
   = (1.85 / 8) x 100 = 23% of host cores
enq: TX – row lock contention
Description: Time waited for locked rows.
Analysis: This event is currently only 0.2% of total DB Time, so we do not have to
worry too much about it.
db file sequential read
Description: Single-block I/O; a sequential read is typically an index read.
Analysis: The average I/O call is 2 ms, which is not very high. A very high wait
average, for example 100 ms or 200 ms, would point to a slow or overloaded I/O
subsystem.
log file sync
Analysis: Here the wait Avg (ms) is 6, which is not a scary number; above 20 ms we
do not consider it a good number. Also go to the 'Instance Activity Stats' section,
see how many commits actually happened, and then check here what percentage of
commits had to wait. Remember that short transactions with frequent commits are a
property of OLTP applications (a sketch for this check follows).
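As a sketch (instance-wide counters since startup, not per AWR interval), the commit count can be set against log file sync waits:

-- Roughly what fraction of commits had to wait on log file sync.
SELECT s.value        AS user_commits,
       e.total_waits  AS log_file_sync_waits,
       ROUND(100 * e.total_waits / s.value, 2) AS pct_commits_waiting
FROM   v$sysstat s,
       v$system_event e
WHERE  s.name  = 'user commits'
AND    e.event = 'log file sync';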
Wait Class column
Description: Helps classify whether the issue is related to the application or to the
infrastructure.
Analysis: Wait events are broadly classified into different wait classes:
Administrative, Application, Concurrency, User I/O, System I/O, Cluster, Commit,
Configuration, Idle and Network.
Host CPU
A high level of DB CPU usage in the Top N Foreground Events (or Instance CPU: %Busy
CPU) does not necessarily mean that CPU is a bottleneck. In this example, too, DB CPU
is the highest-consuming category in the 'Top 10 Foreground Events'.
Look at the Host CPU and Instance CPU sections. The key things to look for are the
values “%Idle” in the “Host CPU” section and “%Total CPU” in the “Instance CPU” section.
If the “%Idle” is low and “%Total CPU” is high then the instance could have a bottleneck
in CPU (be CPU constrained). Otherwise, the high DB CPU usage just means that the
database is spending a lot of time in CPU (processing) compared to I/O and other
events. In either case (CPU is a bottleneck or not) there could be individual expensive
SQLs with high CPU time, which could indicate suboptimal
execution plans, especially if accompanied with high (buffer) gets.
In our case %Idle is high (74%) and %Total CPU is just 7.45%, so CPU is not a
bottleneck in this example.
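As a sketch, the same host CPU figures the report uses can be checked live in V$OSSTAT:

-- Host CPU counts and cumulative busy/idle time (centiseconds).
SELECT stat_name, value
FROM   v$osstat
WHERE  stat_name IN ('NUM_CPUS', 'NUM_CPU_CORES', 'BUSY_TIME', 'IDLE_TIME');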
Cache Sizes
SIGNIFICANCE OF THIS SECTION:
From Oracle 10g onwards, the database server performs Automatic Memory Management for
the PGA and SGA components. Based on load, the server keeps allocating and
deallocating memory assigned to the different components of the SGA and PGA. For this
reason, we can observe different sizes for the Buffer Cache and Shared Pool at the
beginning and end of the AWR snapshot period.
DB CPU
Description: DB CPU represents time spent on CPU by foreground user processes. This
time does not include waiting for CPU.
Analysis: DB CPU usage as a percentage of available CPU can be estimated as
CPU time / (NUM_CPUS x elapsed time), where NUM_CPUS is found in the Operating System
Statistics section. Of course, if the database is not the only consumer on the
system, the formula becomes less reliable; to check that, look at CPU usage directly
in the OS.
SQL*Net message from client
An idle wait event. We can find the number of average inactive sessions from this
wait event.
SQL*Net message to client
Waits on this event almost always indicate network contention.
SQL*Net more data from client
If this is very low, it indicates that the Oracle Net session data unit (SDU) size is
likely set correctly.
db file scattered read
Usually indicates excessive full table scans; look at the AWR segment statistics for
the tables that are being fully scanned.
SQL ordered by Elapsed Time
In this report, look for a query that has few executions and a high Elapsed Time per
Exec (s); such a query is a candidate for troubleshooting or optimization. In the
above report you can see the first query has the maximum elapsed time but only 2
executions, so it is the one to investigate (a sketch for pulling the same ranking
from the AWR views follows).
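As a sketch (not from the original post; the FETCH FIRST clause needs 12c or later, and the snap IDs are placeholders), the same ranking can be pulled from DBA_HIST_SQLSTAT:

-- Top 10 SQL by elapsed time between two AWR snapshots.
SELECT sql_id,
       SUM(executions_delta) AS execs,
       ROUND(SUM(elapsed_time_delta) / 1e6, 1) AS elapsed_secs,
       ROUND(SUM(elapsed_time_delta) / 1e6
             / NULLIF(SUM(executions_delta), 0), 2) AS secs_per_exec
FROM   dba_hist_sqlstat
WHERE  snap_id BETWEEN 101 AND 104   -- placeholder snap range
GROUP  BY sql_id
ORDER  BY elapsed_secs DESC
FETCH FIRST 10 ROWS ONLY;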
NOTE 1: The Elapsed time can indicate if a SQL is multithreaded (either Parallel
DML/SQL or multiple workers). In this case the elapsed time will be multiple times the
AWR duration (or the
observed clock time of the process/SQL). The elapsed time for multithreaded SQL will
be the total of elapsed time for all workers or parallel slaves.
NOTE 2: The 'SQL ordered' sections often contain the PL/SQL call that contains the
SQLs. In this case the procedure WF_ENGINE (via nested procedures) ultimately calls
the SQL b6mcn03jvfg41. Likewise, the first line here is a package call, BEGIN
XXINV7566…., and inside this package it runs the SQL query shown on line 2, which is
the insert into XXINV7566_IQR…..
SQL Module
Typical values include DBMS, sqlplusw, TOAD, rman, SQL, Enterprise Manager, ORACLE,
MMON_SLAVE, emagent etc.
Physical Read Reqs
Note that the 'Physical Read Reqs' column in the 'SQL ordered by Physical Reads
(UnOptimized)' section is the number of I/O requests, not the number of blocks
returned. Be careful not to confuse these with the Physical Reads statistics from the
AWR section 'SQL ordered by Reads', which counts database blocks read from disk
rather than actual I/Os (a single I/O operation may return many blocks from disk).
SQL ordered by Parse Calls
Tablespace IO Stats
SIGNIFICANCE OF THIS SECTION:
These are useful to see what your hot tablespaces are. For example, having the
SYSTEM tablespace as the number one source of IO could indicate you have improper
temporary tablespace assignments as these used to default to SYSTEM. Having the
TEMP or UNDO tablespaces in the top position has already been discussed. Usually in
an OLTP system one of your index tablespaces should be at the top. In a DWH or OLAP
a data tablespace should be at the top. Also look at the latency values; for
disk-based systems, around 5.0 ms is considered good read performance (a sketch for
checking file I/O outside AWR follows).
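As a sketch (cumulative counters since instance startup; READTIM is in centiseconds, hence the factor of 10 to convert to milliseconds), per-tablespace I/O can also be checked live:

-- File I/O aggregated by tablespace, with average read latency in ms.
SELECT t.name AS tablespace,
       SUM(f.phyrds)  AS phys_reads,
       SUM(f.phywrts) AS phys_writes,
       ROUND(10 * SUM(f.readtim) / NULLIF(SUM(f.phyrds), 0), 1) AS avg_read_ms
FROM   v$filestat f, v$datafile d, v$tablespace t
WHERE  f.file# = d.file#
AND    d.ts#   = t.ts#
GROUP  BY t.name
ORDER  BY phys_reads DESC;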
Est LC Time Saved
LC stands for Library Cache. This column shows how much time you could save if you
increased your shared pool.
SGA Target Advisory
The SGA target advisory report is somewhat of a summation of all the advisory reports
previously presented in the AWR report. It helps you determine the impact of changing
the settings of the SGA target size in terms of overall database performance. The
report uses a value called DB Time as a measure of the increase or decrease in
performance relative to the memory change made. Also the report will summarize an
estimate of physical reads associated with the listed setting for the SGA.
Start at a 'Size Factor' of 1, which indicates the current size of the SGA. If the
'Est DB Time (s)' decreases significantly as the 'Size Factor' increases, then
increasing the SGA will significantly reduce physical reads and improve performance.
Here in our example, however, Est DB Time does not decrease much as the SGA grows, so
increasing the SGA will not be beneficial in our case (the same advisory data can be
queried directly, as sketched below).
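As a sketch, the advisory shown in this section is also exposed through V$SGA_TARGET_ADVICE:

-- Estimated DB time and physical reads for candidate SGA sizes.
SELECT sga_size, sga_size_factor, estd_db_time, estd_physical_reads
FROM   v$sga_target_advice
ORDER  BY sga_size_factor;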
When the SQL requires a large volume of data access, increasing the SGA_TARGET
size can reduce the amount of disk I/O and improve the SQL performance.
The buffer wait statistics report helps you drill down on specific buffer wait events
and where the waits are occurring.
We focus on Total Wait Time (s), and in this example the value is only 702 seconds.
Enqueue Activities
The Enqueue activity report provides information on enqueues (higher level Oracle
locking) that occur. As with other reports, if you see high levels of wait times in these
reports, you might dig further into the nature of the enqueue and determine the cause
of the delays.
This gives some more information about enqueue waits (e.g. Requests, Successful Gets,
Failed Gets), which indicates the percentage of times an enqueue had to wait and the
number of failed gets.
In our example the top row does have failed gets, but the number of waits is only 55
and the wait time (s) is also not a high number, so enqueues are not a major issue in
this AWR (the same counters can be checked live, as sketched below).
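As a sketch, cumulative enqueue statistics since instance startup are available in V$ENQUEUE_STAT:

-- Enqueue types ordered by cumulative wait time (centiseconds).
SELECT eq_type, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time
FROM   v$enqueue_stat
ORDER  BY cum_wait_time DESC;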
STO / OOS
Description: Represents counts for Snapshot Too Old and Out Of Space errors that
occurred during the snapshot period.
Analysis: In this example, 0 errors occurred during this period.
Latch Activity
Segments by Logical Reads
This statistic displays segment details based on logical reads, sorted in descending
order on the 'Logical Reads' column. It shows the segments on which the most logical
reads are happening. Most of the corresponding SQLs can be found under SQL Statistics
-> SQL ordered by Gets.
These reports can help you find objects that are “hot” objects in the database. You may
want to review the objects and determine why they are hot, and if there are any tuning
opportunities available on those objects (e.g. partitioning), or on SQL accessing those
objects.
Segments suffering from high logical I/O are listed here. When a table has high
logical reads but its index has relatively few, there is a high possibility that some
SQL is using the index inefficiently, causing throw-away in the table (rows visited
and then discarded). Find the columns of the conditions evaluated on the table side
and move them into the index. When the index itself has high logical reads, the index
is being scanned over an excessively wide range; reduce the range with an additional
filtering condition on columns that are in the same index (a sketch for finding such
segments outside AWR follows).
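As a sketch (cumulative since startup, not per snapshot; FETCH FIRST needs 12c or later), hot segments can also be found in V$SEGMENT_STATISTICS; swap the statistic name for 'physical reads' to get the physical-read ranking discussed further below:

-- Top 10 segments by logical reads since instance startup.
SELECT owner, object_name, object_type, value AS logical_reads
FROM   v$segment_statistics
WHERE  statistic_name = 'logical reads'
ORDER  BY value DESC
FETCH FIRST 10 ROWS ONLY;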
If a SQL is suboptimal then this can indicate the tables and indexes where the
workload or throwaway occurs and where the performance issue lies. It can be
particularly useful if there are no actual statistics elsewhere (e.g. Row Source Operation
Counts (STAT lines) in the SQL Trace or no actuals in the SQLT/Display Cursor report).
If there are a high number of physical read waits (db file scattered read, db file
sequential read and direct path read) then this section can indicate on which segments
(tables or
indexes) the issue occurs.
This can help identify suboptimal execution plan lines. It can also help identify changes
to tablespace and storage management that will improve performance.
When the SQLs need excessive physical reads on the particular segments, this section
lists them. You need to check if some of SQLs are using unnecessary full scan and
wide range scan.
Segments by Physical Reads
This statistic displays segment details based on physical reads, sorted in descending
order on the 'Physical Reads' column. It shows the segments on which the most
physical reads are happening.
Queries using these segments should be analysed to check whether any full table scans
(FTS) are happening on them. If so, proper indexes should be created to eliminate the
FTS. Most of these SQLs can be found under SQL Statistics -> SQL ordered by Reads.
For example, if an object is showing up on the physical reads report, it may be that an
index is needed on that object.
If there is a high level of 'enq: TX – allocate ITL entry' waits then this section
can identify the segments (tables/indexes) on which they occur.
Whenever a transaction modifies a segment block, it first adds its transaction ID to
the Interested Transaction List (ITL) of the block. The size of this list is a
block-level configurable attribute (INITRANS); based on its value, that many ITL
slots are created in each block.
ITL waits happen when the number of transactions trying to update the same block at
the same time is greater than the number of available ITL slots.
The total waits in the example are very low; 34 is the maximum. Hence it is not
necessary to increase the ITL setting here (when it is, a sketch follows).
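When ITL waits are significant, the usual remedy is to raise INITRANS. A minimal sketch, with hypothetical segment names (SCOTT.EMP and its primary key index), noting that the new value only applies to newly formatted blocks unless the segment is rebuilt:

-- Raise the initial ITL slot count; rebuild so existing blocks pick it up.
ALTER TABLE scott.emp INITRANS 8;
ALTER TABLE scott.emp MOVE;
ALTER INDEX scott.emp_pk REBUILD INITRANS 8;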
Usually, segments suffering from row-lock waits are listed in this section. The
general solution is to make the SQL's predicate more selective so that it locks only
the rows it must, and to commit or roll back as soon as possible after the DML.
If there is a high level of “Buffer Busy Waits” waits then this section can identify the
segments (tables/indexes) on which they occur.
The section lists segments that are suffering from buffer busy waits. Based on the
reason code or class#, the treatment differs. Physical segment attributes such as
freelists, freelist groups, pctfree, pctused and so on are handled by rebuilding the
object; but before that treatment, check whether your SQLs can be made to visit
different blocks, to avoid the contention in the first place.
Buffer busy waits happen when more than one transaction tries to access the same
block at the same time. The first transaction to acquire the lock on the block can
proceed, while the other transactions wait for it to finish.
If more than one instance of a process continuously polls the database by executing
the same SQL (to check whether any records are available for processing), the same
block is read concurrently by all instances of that process, and this results in
buffer busy waits (the per-block-class breakdown can be checked as sketched below).
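As a sketch, the buffer busy wait breakdown by block class (data block, segment header, undo header, etc.) is visible in V$WAITSTAT:

-- Buffer busy waits by block class, cumulative since startup.
SELECT class, count, time
FROM   v$waitstat
ORDER  BY time DESC;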
This is one of the posts in the Performance Tuning Fundamentals series.
Brijesh Gogia
I’m an experienced Cloud/Oracle Applications/DBA Architect with more than 15 years of
full-time DBA/Architect experience. I have gained wide knowledge on Oracle and Non-
Oracle software stack running on-prem and on Cloud and have worked on several big
projects for multi-national companies. I enjoy working with leading-edge technology and
have a passion for Cloud architecture, automation, database performance, and stability.
Thankfully my work allows me time for researching new technologies (and to write about
them).