Analyzing Statspack Report Good Oracle 9i
Introduction
If you could choose just two Oracle utilities to find and monitor performance problems in your
database system, those two utilities would be Oracle Enterprise Manager and Statspack. Which area of
the Summary page you focus on will depend on whether you are investigating a performance problem
or monitoring changes in load. If you are investigating a performance problem, start by checking the
Top 5 Wait Events section.
When statistics and wait events can be misleading
There are certain checks which can be performed to help identify whether a statistic or event is really of
interest. When timed_statistics is false, wait events are ordered by the number of waits. This
information may indicate which events are of interest; however, it can be misleading. An event may be
waited for a large number of times, but the wait time (if it were available for comparison) may
show that the actual time waited is small despite the high count, so the event is not really of interest. If
wait time is available, a useful comparison can be made by taking the total wait time for an event, and
comparing it to the elapsed time between snapshots. For example, if the wait event accounts for only 30
seconds out of a two hour period, there is probably little to be gained by investigating this event.
However, if the event accounts for 30 minutes of a 45 minute period, the event may be worth
investigating. There is a warning here, too: even an event which had a wait of 30 minutes in a 45
minute snapshot may not be indicative of a problem, when you take into account there were 2000 users
on the system, and the host hardware was a 64 node machine.
When interpreting computed statistics (such as percentages, or per-second rates), it is important to
cross-verify the computed statistic with the actual statistic counts. This acts as a sanity check to
determine whether the derived rates are really of interest. On initial examination, a soft-parse ratio of
50% would normally indicate a potential tuning area. However if the actual statistic counts are small,
this would not be an area of interest. For example, if there was one hard parse and one soft parse during
the Statspack interval, the soft-parse ratio would be 50%, even though the statistic counts show this is
not an area of concern.
SNAPSHOT LEVELS
LEVEL 0 - GENERAL PERFORMANCE
This level can be used to gather general performance information about the database.
LEVEL 5 - GENERAL PERFORMANCE + SQL STATEMENTS (DEFAULT)
This snapshot level will gather all the information from the previous levels, plus it will collect
performance data on high resource SQL statements. This is also the default snapshot level when
Statspack is installed.
LEVEL 6 - GENERAL PERFORMANCE + SQL STATEMENTS + SQL PLANS AND SQL PLAN
USAGE
This level is new in Oracle9i and it will include all the information collected from the previous
snapshot levels, plus execution path and plan usage information as they relate to high resource SQL
statements. This type of information can prove critical when determining if the execution path or plan
has changed for high resource SQL statements. Oracle recommends using this level when one of the
following situations has occurred:
- A plan has possibly changed after large volumes of data have been added.
- Obtaining new optimizer setting information.
LEVEL 10 - GENERAL PERFORMANCE + SQL STATEMENTS + SQL PLANS AND SQL PLAN
USAGE + PARENT AND CHILD LATCHES
This level will include all the information collected from previous snapshot levels, plus parent and
child latch information. Snapshots at this level take even longer to complete: the previous levels are
already information-gathering intensive (the volume collected depends on the shared_pool_size and on
the amount of SQL statement information), and the parent and child latch information is gathered on
top of that. It is Oracle's recommendation to use this level only when requested by Oracle technical
support personnel.
LEVEL SETTING RECOMMENDATION
It is recommended to set timed_statistics to true BEFORE taking the first snapshot, because it helps to
establish a better baseline; otherwise another baseline will be needed AFTER it is turned on. This can
be done with the ALTER SYSTEM command and/or by setting the parameter in the init.ora file.
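For example, the parameter can be enabled dynamically, and the init.ora entry keeps the setting across
restarts (a minimal sketch):
SQL> ALTER SYSTEM SET timed_statistics = TRUE;
and, in the init.ora file:
timed_statistics = true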
SESSION SPECIFIC SNAPSHOT
Statspack also provides the capability to gather session specific information. Passing the i_session_id
value to the Statspack.snap procedure will enable this option.
The following is an example of using this feature:
SQL> EXECUTE STATSPACK.SNAP(i_session_id=>20);
EXECUTING SNAPSHOTS
Executing a snapshot interactively can be as easy as accessing SQL*Plus as the PERFSTAT user and
executing the STATSPACK.SNAP procedure, or automating when a snapshot is executed. The interactive
method is highly beneficial for when a problem is reported in the database and a snapshot could prove
beneficial for troubleshooting, whereas the value of an automated snapshot is realized when a problem
is reported at a later time and a comparison needs to be made between two specific times that occurred
in the past.
INTERACTIVE METHOD
Access SQL*Plus as the PERFSTAT user and execute either method 1, 2 or 3 as discussed in the above
snapshot Configuration section. The simplest form of the interactive mode is as follows:
SQL> EXECUTE STATSPACK.SNAP
AUTOMATED METHOD
The ability to automate a snapshot is another one of the great features of the Statspack utility.
Automating and scheduling when to take snapshots allows for the collection of database performance
information that would be beneficial for troubleshooting performance problems that occurred earlier.
The following are two ways that snapshots can be automated:
- Oracle's DBMS_JOB utility to schedule snapshots. This utility will be discussed in greater detail.
- An operating-system-specific job scheduler. For example, on Unix, shell scripts can be written and then
scheduled through the cron scheduler; on NT, the AT scheduler can be used in combination with .cmd files.
DBMS_JOB UTILITY
The DBMS_JOB utility provides a way to schedule database related tasks that are controlled within the
database. Through the DBMS_JOB utility snapshots can be taken at a scheduled interval. When the
spcpkg.sql script was executed as part of the Statspack installation, the DBMS_JOB package was
created for the PERFSTAT user. One of the requirements to use the DBMS_JOB utility is that the
init.ora parameter job_queue_processes must be set to a value greater than 0. The spauto.sql script is
designed to setup the automation of executing snapshots once every hour. The following line from the
script is how the job is added to the schedule:
dbms_job.submit(:jobno, 'statspack.snap;', trunc(sysdate+1/24,'HH'),
                'trunc(SYSDATE+1/24,''HH'')', TRUE, :instno);
The benefits of using the spauto.sql script are that it:
- Displays the job number assigned
- Identifies the number of job_queue_processes set for the database
- Displays the next time that the snapshot will occur
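As a hedged illustration (the job number 1 below is hypothetical; use the number reported by
spauto.sql), the scheduled snapshot job can be verified, and removed if no longer needed, while
connected as PERFSTAT:
SQL> SELECT job, what, next_date, interval FROM user_jobs;
SQL> EXECUTE DBMS_JOB.REMOVE(1);
SQL> COMMIT;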
Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:            351,530.67              7,007.37
              Logical reads:              5,449.81                108.64
              Block changes:              1,042.08                 20.77
             Physical reads:                 37.71                  0.75
            Physical writes:                134.68                  2.68
                 User calls:              1,254.72                 25.01
                     Parses:                  4.92                  0.10
                Hard parses:                  0.02                  0.00
                      Sorts:                 15.73                  0.31
                     Logons:                 -0.01                  0.00
                   Executes:                473.73                  9.44
               Transactions:                 50.17
Where:
. Redo size: This is the amount of redo generated during this
report.
. Logical Reads: This is calculated as Consistent Gets + DB Block
Gets = Logical Reads
. Block changes: The number of blocks modified during the sample
interval
. Physical Reads: The number of requests for a block that caused a
physical I/O.
. Physical Writes: The number of physical writes issued.
. User Calls: The number of calls made by user processes, such as logins, parses, fetches, and executes
. Parses: Total of all parses: both hard and soft
. Hard Parses: Those parses requiring a completely new parse of the
SQL statement. A ‘hard parse’ rate of greater than 100 per second
indicates there is a very high amount of hard parsing on the system.
High hard parse rates cause serious performance issues, and must be
investigated. A high hard parse rate is usually accompanied by latch
contention on the shared pool and library cache latches. Check
whether waits for ‘latch free’ appear in the top-5 wait events, and
if so, examine the latching sections of the Statspack report. Of
course, we want a low number here.
. Soft Parses: Not listed but derived by subtracting the hard parses
from parses. A soft parse reuses a previous hard parse and hence
consumes far fewer resources. A high soft parse rate would be in the
range of 300 or more per second. Unnecessary soft
parses also limit application scalability; optimally a SQL statement
should be soft-parsed once per session, and executed many times.
. Sorts and Logons are all self explanatory
. Executes: how many statements we are executing per second /
transaction
. Transactions: how many transactions per second we process
This gives an overall view of the load on the server. In this case, we are looking at a very good hard
parse number and a fairly high system load.
The per-second statistics show you the changes in throughput (i.e. whether the instance is performing
more work per second). For example:
• a significant increase in ‘redo size’, ‘block changes’ and ‘pct of blocks changed per read’ would
indicate the instance is performing more inserts/updates/deletes.
• an increase in the ‘redo size’ without an increase in the number of ‘transactions per second’ would
indicate a changing transaction profile.
Similarly, looking at the per-transaction statistics allows you to identify changes in the application
characteristics by comparing these to the corresponding statistics from the baseline report.
Interpreting the ratios in this section can be slightly more complex than it may seem at first glance.
While high values for the ratios are generally good (indicating high efficiency), such values can be
misleading: your system may be doing something efficiently that it would be better off not doing at all.
Similarly, low values aren't always bad. For example, a low in-memory sort ratio (indicating a low
percentage of sorts performed in memory) would not necessarily be a cause for concern in a decision-
support system (DSS) environment, where user response time is less critical than in an online
transaction processing (OLTP) environment.
Basically, you need to keep in mind the characteristics of your application - whether it is query-
intensive or update-intensive, whether it involves lots of sorting, and so on - when you're evaluating the
Instance Efficiency Percentages. Here's how each ratio is calculated, along with which related sections
of the report you should look at when investigating suspicious values:
It is possible for both the 'buffer hit ratio' and the 'execute to parse' ratios to be negative. In the case
of the buffer hit ratio, the buffer cache is too small and the data is being aged out before it can be
reused, so it must be retrieved again. This is a form of thrashing which degrades performance immensely.
Execute to Parse: if the value is negative, it means that the number of parses is larger than the number of
executions. Another cause of a negative execute to parse ratio is a shared pool that is too small, so that
queries are aging out of the shared pool and need to be reparsed. This is another form of thrashing
which also degrades performance tremendously. This is very BAD!!
Buffer Nowait Ratio. This ratio relates to requests that a server process makes for a specific buffer; it
gives the percentage of those requests in which the requested buffer is immediately available. All buffer
types are included in this statistic. If the ratio is low, check the Buffer Wait Statistics section of the
report for more detail on which type of block is being contended for.
Buffer Hit Ratio. This ratio, also known as the buffer-cache hit ratio, gives the percentage of block
requests that were satisfied within the cache without requiring physical I/O. Although historically
known as one of the most important statistics to evaluate, this ratio can sometimes be misleading. A
low buffer hit ratio does not necessarily mean the cache is too small; it may be that potentially valid
full-table scans are artificially reducing what is otherwise a good ratio. Similarly, a high buffer hit ratio
(say, 99 percent) normally indicates that the cache is adequately sized, but this assumption may not
always be valid. For example, frequently executed SQL statements that repeatedly refer to a small
number of buffers via indexed lookups can create a misleadingly high buffer hit ratio. When these
buffers are read, they are placed at the most recently used (MRU) end of the buffer cache; iterative
access to these buffers can artificially inflate the buffer hit ratio. This inflation makes tuning the buffer
cache a challenge. Sometimes you can identify a too-small buffer cache by the appearance of the write
complete waits event, which indicates that hot blocks (that is, blocks that are still being modified) are
aging out of the cache while they are still needed; check the Wait Events list for evidence of this event.
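For a rough cross-check outside Statspack, the ratio can be sketched from v$sysstat (note these
counters are cumulative since instance startup, whereas Statspack computes the ratio between two
snapshots):
SELECT 1 - (phys.value / (db.value + cons.value)) AS buffer_hit_ratio
FROM   v$sysstat phys, v$sysstat db, v$sysstat cons
WHERE  phys.name = 'physical reads'
AND    db.name   = 'db block gets'
AND    cons.name = 'consistent gets';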
Library Hit Ratio. This ratio, also known as the library-cache hit ratio, gives the percentage of pin
requests that result in pin hits. A pin hit occurs when the SQL or PL/SQL code to be executed is already
in the library cache and is valid to execute. If the "Library Hit ratio" is low, it could be indicative of a
shared pool that is too small (SQL is prematurely aging out), or just as likely, that the system did not
make correct use of bind variables in the application. If the soft parse ratio is also low, check whether
there's a parsing issue.
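A quick way to see the per-namespace pin hit ratios behind this figure is to query v$librarycache
directly (a sketch only; the figures are cumulative since startup):
SELECT namespace, pins, pinhits,
       ROUND(pinhits / DECODE(pins, 0, 1, pins) * 100, 2) AS pin_hit_pct
FROM   v$librarycache;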
Redo Nowait Ratio. This ratio indicates the proportion of redo entries generated for which there was
space available in the redo log. The percentage is calculated as follows:
100 x (1 - (redo log space requests / redo entries))
The redo-log space-request statistic is incremented when an Oracle process attempts to write a redo-log
entry but there is not sufficient space remaining in the online redo log. Thus, a value close to 100
percent for the redo nowait ratio indicates minimal time spent waiting for redo logs to become
available, either because the logs are not filling up very often or because the database is able to switch
to a new log quickly whenever the current log fills up.
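The same calculation can be sketched directly against v$sysstat (cumulative values since startup, so
treat the result only as an approximation of the per-interval figure Statspack reports):
SELECT ROUND(100 * (1 - req.value / ent.value), 2) AS redo_nowait_pct
FROM   v$sysstat req, v$sysstat ent
WHERE  req.name = 'redo log space requests'
AND    ent.name = 'redo entries';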
If your alert log shows that you are switching logs frequently (that is, more than once every 15
minutes), you may be able to reduce the amount of switching by increasing the size of the online redo
logs. If the log switches are not frequent, check the disks on which the redo logs reside to see why the
switches are not happening quickly. If these disks are not overloaded, they may be slow, which means
you could put the files on faster disks.
In-Memory Sort Ratio. This ratio gives the percentage of sorts that were performed in memory, rather
than requiring a disk-sort segment to complete the sort. Optimally, in an OLTP environment, this ratio
should be high. Setting the PGA_AGGREGATE_TARGET (or SORT_AREA_SIZE) initialization
parameter appropriately will help to eliminate excessive disk sorts.
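The equivalent calculation can be sketched from v$sysstat (again cumulative since startup):
SELECT ROUND(mem.value / (mem.value + dsk.value) * 100, 2) AS in_memory_sort_pct
FROM   v$sysstat mem, v$sysstat dsk
WHERE  mem.name = 'sorts (memory)'
AND    dsk.name = 'sorts (disk)';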
Soft Parse Ratio. This ratio gives the percentage of parses that were soft, as opposed to hard. A soft
parse occurs when a session attempts to execute a SQL statement and a usable version of the statement
is already in the shared pool. In other words, all data (such as the optimizer execution plan) pertaining
to the statement in the shared pool is equally applicable to the statement currently being issued. A hard
parse, on the other hand, occurs when the current SQL statement is either not in the shared pool or not
there in a shareable form. An example of the latter case would be when the SQL statement in the shared
pool is textually identical to the current statement but the tables referred to in the two statements
resolve to physically different tables.
Hard parsing is an expensive operation and should be kept to a minimum in an OLTP environment. The
aim is to parse once, execute many times.
Ideally, the soft parse ratio should be greater than 95 percent. When the soft parse ratio falls much
below 80 percent, investigate whether you can share SQL by using bind variables or force cursor
sharing by using the init.ora parameter cursor_sharing (new in Oracle8i Release 8.1.6).
The Soft Parse % value is one of the most important (if not the only important) ratios in the database.
For a typical OLTP system, it should be as near to 100% as possible. You quite simply do not hard
parse after the database has been up for a while in your typical transactional / general-purpose database.
Before you jump to any conclusions about your soft parse ratio, however, be sure to compare it against
the actual hard and soft parse rates shown in the Load Profile. If the rates are low (for example, 1 parse
per second), parsing may not be a significant issue in your system. Another useful standard of
comparison is the proportion of parse time that was not CPU-related, given by the following ratio:
(parse time CPU) / (parse time elapsed)
A low value for this ratio could mean that the non-CPU-related parse time was spent waiting for
latches, which might indicate a parsing or latching problem. To investigate further, look at the shared-
pool and library-cache latches in the Latch sections of the report for indications of contention on these
latches.
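As a sketch, the same ratio can be computed from v$sysstat (cumulative values; Statspack shows the
per-interval equivalent):
SELECT ROUND(cpu.value / ela.value, 2) AS parse_cpu_to_elapsed
FROM   v$sysstat cpu, v$sysstat ela
WHERE  cpu.name = 'parse time cpu'
AND    ela.name = 'parse time elapsed';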
Latch Hit Ratio. This ratio is derived from the total number of latch misses and the number of latch gets
for all latches (100 percent means no get resulted in a miss). A low value for this ratio indicates a
latching problem, whereas a high value is generally good.
However, as the data is rolled up over all latches, a high latch hit ratio can artificially mask a low get
rate on a specific latch. Cross-check this value with the Top 5 Wait Events to see if latch free is in the
list, and refer to the Latch sections of the report. Latch Hit % of less than 99 percent is usually a big
problem.
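To see which individual latches contribute the misses behind this rolled-up figure, a simple sketch
against v$latch can be used (the Latch sections of the report show the same data per interval):
SELECT name, gets, misses, sleeps
FROM   v$latch
WHERE  misses > 0
ORDER  BY misses DESC;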
Also check the "Shared Pool Statistics"; if the "End" value is in the high 95%-100% range, this is an
indication that the shared pool needs to be increased (especially if the "Begin" value is much smaller).
***Please see the following NOTES on shared pool issues
[NOTE:146599.1] Diagnosing and Resolving Error ORA-04031
[NOTE:62143.1] Understanding and Tuning the Shared Pool
[NOTE:105813.1] SCRIPT TO SUGGEST MINIMUM SHARED POOL SIZE
This section is among the most important and relevant sections in the Statspack report. Here is
where you find out what events (typically wait events) are consuming the most time. In Oracle9i
Release 2, this section includes a new event: CPU time
When you are trying to eliminate bottlenecks on your system, your Statspack report's Top 5 Timed
Events section is the first place to look. This section of the report shows the top 5 wait events, the full
list of wait events, and the background wait events. If your system's TIMED_STATISTICS
initialization parameter is set to true, the events are ordered by time waited, which is preferable, since
the number of waits alone can be misleading. If TIMED_STATISTICS is false, the events are ordered
by the number of waits.
Listing 1 shows a large number of waits related to reading a single block (db file sequential read) as
well as waits for latches (latch free). You can see in this listing high waits for some of the writing to
datafiles and log files. To identify which of these are major issues, you must narrow down the list by
investigating the granular reports within other sections of Statspack.
Code Listing 1: Statspack report showing waits related to reading a single block
Top 5 Wait Events
------------------------------------
                                                                    % Total
Event                                    Waits       Time (s)    Elap. Time
------------------------------- -------------- -------------- -------------
db file sequential read             18,977,104     22,379,571         82.29
latch free                           4,016,773      2,598,496          9.55
log file sync                        1,057,224        733,490          2.70
log file parallel write              1,054,006        503,695          1.85
db file parallel write               1,221,755        404,230          1.49
Based on this listing we may be tempted to immediately start looking at the causes of the 'db file
sequential read' and 'latch free' waits and to try to tune them. This approach would not take into
account 'Service Time'. The statistic that measures 'Service Time' ('CPU used by this session') appears
in the Instance Activity Stats listing shown later in this report.
In this section we list the I/O-related Wait Events that occur most often in Oracle databases together
with reference notes describing each wait.
Datafile I/O-Related Wait Events:
'db file sequential read' [NOTE:34559.1]
'db file scattered read' [NOTE:34558.1]
'db file parallel read'
'direct path read' [NOTE:50415.1]
'direct path write' [NOTE:50416.1]
'direct path read (lob)'
'direct path write (lob)'
Controlfile I/O-Related Wait Events:
'control file parallel write'
'control file sequential read'
'control file single write'
Redo Logging I/O-Related Wait Events:
'log file parallel write' [NOTE:34583.1]
'log file sync' [NOTE:34592.1]
'log file sequential read'
'log file single write'
'switch logfile command'
'log file switch completion'
'log file switch (clearing log file)'
'log file switch (checkpoint incomplete)'
'log switch/archive'
'log file switch (archiving needed)'
Buffer Cache I/O-Related Wait Events:
'db file parallel write' [NOTE:34416.1]
'db file single write'
'write complete waits'
'free buffer waits'
Approaches for handling General I/O problems
Some of these approaches can be used regardless of the particular Wait Event.
o Redistribute database I/O by manual placement of database files across different filesystems,
controllers and physical devices
This is an approach used in the absence of advanced modern storage technologies. The aim is to
distribute the database I/O so that no single set of disks or controllers becomes saturated by I/O
requests while there is still unused disk throughput. It is hard to get right and is most often less
successful.
(Sample output from the SQL sections of the report: the supporting queries join the captured SQL text
to its plan rows on hash_value, filtering on p.options = 'FULL' for full table scans and
p.operation = 'INDEX' for index accesses. The SQL Plans and SQL Plan Usage output then lists, for
each high-resource statement, the SQL Text, the Plan Hash Value recorded when the statement was
first found in the shared pool, and the plan steps (LOAD AS SELECT, VIEW, FILTER, HASH JOIN,
NESTED LOOPS, FAST DUAL) with their Rows, Bytes and Cost estimates.)
Instance Activity Stats for DB: PHS2  Instance: phs2  Snaps: 100 -104

Statistic                                       Total   per Second    per Trans
--------------------------------- ------------------ ------------ ------------
CPU used by this session                       84,161         23.4      3,825.5
CPU used when call started                    196,346         54.5      8,924.8
CR blocks created                                 709          0.2         32.2
DBWR buffers scanned                                0          0.0          0.0
DBWR checkpoint buffers written                   245          0.1         11.1
DBWR checkpoints                                   33          0.0          1.5
DBWR cross instance writes                         93          0.0          4.2
DBWR free buffers found                             0          0.0          0.0
....
Trustworthy if:
(db version >= 8.1.7.2 and 9.0.1) OR (db version >= 9.0.1.1)
OR (db version >= 8.0.6.0 AND not using job_queue_processes AND CPU_PER_CALL = default)
recursive cpu usage = This component can be high if large amounts of PL/SQL are being processed.
It is outside the scope of this document to go into detail with this, but you will need to identify your
complete set of PL/SQL, including stored procedures, finding the ones with the highest CPU load and
optimize these. If most work done in PL/SQL is procedural processing (rather than executing SQL), a
high recursive cpu usage can actually indicate a potential area for tuning.
parse time cpu= Parsing SQL statements is a heavy operation that should be avoided by reusing SQL
statements as much as possible. In precompiler programs, unnecessary parsing of implicit SQL
statements can be avoided by increasing the cursor cache (MAXOPENCURSORS parameter) and by
reusing cursors. In programs using the Oracle Call Interface, you need to write the code so that it re-
executes (instead of reparsing) cursors with frequently executed SQL statements. The v$sql view
contains PARSE_CALLS and EXECUTIONS columns that can be used to identify SQL that is parsed
often or is executed only once per parse.
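For example, a hedged query against v$sql (the thresholds are arbitrary and only illustrative) to spot
statements that are parsed nearly as often as they are executed:
SELECT parse_calls, executions, sql_text
FROM   v$sql
WHERE  parse_calls > 100
AND    executions < 2 * parse_calls
ORDER  BY parse_calls DESC;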
other cpu= The source of other cpu is primarily the handling of buffers in the buffer cache. It can
generally be assumed that the CPU time spent by a SQL statement is approximately proportional to the
number of buffer gets for that SQL statement; hence, you should identify and sort SQL statements by
buffer gets in v$sql. In your statspack report, look at the part ‘SQL ordered by Gets for DB’. Start
tuning SQL statements from the top of this list. In Oracle9i, the v$sql view contains a column,
CPU_TIME, which directly shows the cpu time associated with executing the SQL statement.
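A sketch of such a query, ordering statements by buffer gets and including the 9i CPU_TIME column:
SELECT buffer_gets, executions,
       ROUND(buffer_gets / DECODE(executions, 0, 1, executions)) AS gets_per_exec,
       cpu_time, sql_text
FROM   v$sql
ORDER  BY buffer_gets DESC;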
- DBWR BUFFERS SCANNED: the number of buffers looked at when scanning the lru portion of
the buffer cache for dirty buffers to make clean. Divide by "dbwr lru scans" to find the average number
of buffers scanned. This count includes both dirty and clean buffers. The average buffers scanned may
be different from the average scan depth due to write batches filling up before a scan is complete. Note
that this includes scans for reasons other than make free buffer requests.
- DBWR CHECKPOINTS: the number of checkpoints messages that were sent to DBWR and not
necessarily the total number of actual checkpoints that took place. During a checkpoint there is a slight
decrease in performance since data blocks are being written to disk and that causes I/O. If the number
of checkpoints is reduced, the performance of normal database operations improve but recovery after
instance failure is slower.
- DBWR TIMEOUTS: the number of timeouts when DBWR had been idle since the last timeout.
These are the times that DBWR looked for buffers to idle write.
- DIRTY BUFFERS INSPECTED: the number of times a foreground encountered a dirty buffer
which had aged out through the lru queue, when foreground is looking for a buffer to reuse. This
should be zero if DBWR is keeping up with foregrounds.
- FREE BUFFERS INSPECTED: the number of buffers skipped over from the end of the LRU queue
in order to find a free buffer. The difference between this and "dirty buffers inspected" is the number of
buffers that could not be used because they were busy or needed to be written after rapid aging out.
They may have a user, a waiter, or be in the process of being read or written.
- RECURSIVE CALLS: Recursive calls occur because of cache misses and segment extension. In
general if recursive calls is greater than 30 per process, the data dictionary cache should be optimized
and segments should be rebuilt with storage clauses that have few large extents. Segments include
tables, indexes, rollback segment, and temporary segments.
NOTE: PL/SQL can generate extra recursive calls which may be unavoidable.
- REDO BUFFER ALLOCATION RETRIES: total number of retries necessary to allocate space in
the redo buffer. Retries are needed because either the redo writer has gotten behind, or because an
event (such as a log switch) is occurring.
- REDO LOG SPACE REQUESTS: indicates how many times a user process waited for space in the
redo log buffer. Try increasing the init.ora parameter LOG_BUFFER so that zero Redo Log Space
Requests are made.
- REDO WASTAGE: Number of bytes "wasted" because redo blocks needed to be written before they
are completely full. Early writing may be needed to commit transactions, to be able to write a
database buffer, or to switch logs
- SUMMED DIRTY QUEUE LENGTH: the sum of the lruw queue length after every write request
completes. (divide by write requests to get average queue length after write completion)
- TABLE FETCH BY ROWID: the number of rows that were accessed by a rowid. This includes
rows that were accessed using an index and rows that were accessed using the statement where rowid =
'xxxxxxxx.xxxx.xxxx'.
- TABLE FETCH BY CONTINUED ROW: indicates the number of rows that are chained to another
block. In some cases (i.e. tables with long columns) this is unavoidable, but the ANALYZE table
command should be used to further investigate the chaining, and where possible, should be eliminated
by rebuilding the table.
- Table Scans (long tables) is the total number of full table scans performed on tables with more than 5
database blocks. If the number of full table scans is high the application should be tuned to effectively
use Oracle indexes. Indexes, if they exist, should be used on long tables if less than 10-20% (depending
on parameter settings and CPU count) of the rows from the table are returned. If this is not the case,
check the db_file_multiblock_read_count parameter setting. It may be too high. You may also need to
tweak optimizer_index_caching and optimizer_index_cost_adj.
- Table Scans (short tables) is the number of full table scans performed on tables with less than 5
database blocks. It is optimal to perform full table scans on short tables rather than using indexes.
The buffer wait statistics in the analyzed report show no real contention. Typically, when there is
buffer contention, it is due to data block contention with large average wait times, as in the example below:
Buffer wait Statistics for DB: GLOVP Instance: glovp Snaps: 454 - 455
                                 Tot Wait       Avg
Class                    Waits  Time (cs) Time (cs)
------------------ ----------- ---------- ---------
data block               9,698     17,097         2
undo block                 210      1,225         6
segment header             259        367         1
undo header                259        366         1
file header block           24          1        33
system undo header           1          0         0
Enqueue Activity
An enqueue is simply a locking mechanism. This section is very useful and must be used when the wait
event "enqueue" is listed in the "Top 5 timed events".
Enqueue activity for DB: S901 Instance: S901 Snaps: 2 -3
-> Enqueue stats gathered prior to 9i should not be compared with 9i data
-> ordered by Wait Time desc, Waits desc
                                                            Avg Wt         Wait
Eq     Requests    Succ Gets  Failed Gets       Waits    Time (ms)     Time (s)
-- ------------ ------------ ------------ ----------- ------------ ------------
TC       44,270       44,270            0       8,845       619.37        5,478
TX   13,072,864   13,072,809            0       4,518       641.72        2,899
CU    5,532,494    5,532,494            0      33,355         4.78          159
SQ      418,547      418,547            0       1,251        15.10           19
PS    5,950,717    5,189,366      761,354      69,381          .19           13
US        4,912        4,912            0         282        45.16           13
PR        8,325        8,325            0          11       213.64            2
CI       67,060       67,060            0          15        18.93            0
JD      165,560      165,560            0           1       261.00            0
HW       56,401       56,401            0           3         2.67            0
The action to take depends on the lock type that is causing the most problems. The most common lock
waits are generally for:
- TX (Transaction Lock): Generally due to application concurrency mechanisms, or table setup issues.
The TX lock is acquired when a transaction initiates its first change and is held until the transaction
does a COMMIT or ROLLBACK. It is used mainly as a queuing mechanism so that other resources
can wait for a transaction to complete.
- TM (DML enqueue): Generally due to application issues, particularly if foreign key constraints have
not been indexed. This lock/enqueue is acquired when performing an insert, update, or delete on a
parent or child table.
- ST (Space management enqueue): Usually caused by too much space management activity occurring
on the database. For example: create table as select on large tables on busy instances, small extent
sizes, or lots of sorting to disk.
The description of the view V$UNDOSTAT in the Oracle9i Database Reference guide provides
some insight into the column definitions. Should the client encounter system-managed undo (SMU)
problems, monitoring this view every few minutes would provide more useful information.
- Undo Segment Stats for DB
Undo Segment Stats for DB: S901 Instance: S901 Snaps: 2 -3
-> ordered by Time desc
This section provides a more detailed look at the statistics in the previous section by listing the
information as it appears in each snapshot.
It should be noted that 9i introduces an optional init.ora parameter called UNDO_RETENTION
which allows the DBA to specify how long the system will attempt to retain undo information
for a committed transaction before it is overwritten or recaptured. This parameter, expressed in
units of wall-clock seconds, applies universally to all undo segments.
Use of UNDO_RETENTION can potentially increase the size of the undo segment for a given
period of time, so the retention period should not be arbitrarily set too high. The UNDO
tablespace still must be sized appropriately. The following calculation can be used to determine
how much space a given undo segment will consume given a set value of
UNDO_RETENTION.
Undo Segment Space Required = (undo_retention_time * undo_blocks_per_second * db_block_size)
As an example, an UNDO_RETENTION of 5 minutes with 50 undo blocks/second and an 8K
block size will generate:
Undo Segment Space Required = (300 seconds * 50 blocks/second * 8K/block) = 120 MB
The retention information (transaction commit time) is stored in every transaction table block
and each extent map block. When the retention period has expired, SMON will be signaled to
perform undo reclaims, done by scanning each transaction table for undo timestamps and
deleting the information from the undo segment extent map. Only during extreme space
constraint issues will the retention period not be obeyed.
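The undo_blocks_per_second figure for the formula above can be sketched from V$UNDOSTAT (each
row normally covers a 10-minute collection period):
SELECT MAX(undoblks / ((end_time - begin_time) * 86400)) AS undo_blocks_per_sec
FROM   v$undostat;
Multiplying this value by UNDO_RETENTION and by the database block size gives the approximate
undo space required.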
Latch Information
Latch information is provided in the following three sections:
. Latch Activity
. Latch Sleep breakdown
. Latch Miss Sources
This information should be checked whenever the "latch free" wait event or other latch wait events
experience long waits. This section is particularly useful for determining latch contention on an
instance. Latch contention generally indicates resource contention and supports indications of it in
other sections. Latch contention is indicated by a Pct Miss of greater than 1.0% or a relatively high
value in Avg Sleeps/Miss. While each latch can indicate contention on some resource, the more
common latches to watch are:
cache buffer chain= The cache buffer chain latch protects the hash chain of cache buffers, and is used
for each access to cache buffers. Contention for this latch can often only be reduced by reducing the
amount of access to cache buffers. Using the X$BH fixed table can identify if some hash chains have
many buffers associated with them. Often, a single hot block, such as an index root block, can cause
contention for this latch. In Oracle9i, this is a shared latch, which minimizes contention for blocks
being read only. Contention on this latch confirms a hot block issue.
shared pool= The shared pool latch is heavily used during parsing, in particular during hard parse. If
your application is written so that it generally uses literals instead of bind variables, you will have high
contention on this latch. Contention on this latch in conjunction with reloads in the SQL Area of the
library cache section indicates that the shared pool is too small. In release 8.1.6 and later, you can set
the cursor_sharing parameter in init.ora to the value ‘force’ to reduce the hard parsing and reduce some
of the contention for the shared pool latch. Applications that are coded to only parse once per cursor
and execute multiple times will almost completely avoid contention for the shared pool latch.
• Literal SQL is being used. See Note 62143.1 'Understanding and Tuning the Shared Pool' for an
excellent discussion of this topic.
• On versions 8.1.7.2 and higher, session_cached_cursors might need to be set. See enhancement
bug 1589185 for details.
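If the application cannot be changed immediately, cursor_sharing can be set dynamically as a stop-gap
(a sketch only; verify the effect on execution plans before relying on it):
SQL> ALTER SYSTEM SET cursor_sharing = FORCE;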
library cache= The library cache latch is heavily used during both hard and soft parsing. If you have
high contention for this latch, your application should be modified to avoid parsing if at all possible.
Setting the cursor_sharing parameter in init.ora to the value ‘force’ provides some reduction in the
library cache latch needs for hard parses, and setting the session_cached_cursors sufficiently high
provides some reduction in the library cache latch needs for repeated soft parsing within a single
session. There is minor contention for this latch involved in executing SQL statements, which can be
reduced further by setting cursor_space_for_time=true, if the application is properly written to parse
statements once and execute multiple times.
row cache= The row cache latch protects the data dictionary information, such as information about
tables and columns. During hard parsing, this latch is used extensively. In release 8.1.6 and above, the
cursor_sharing parameter can be used to completely avoid the row cache latch lookup during parsing.
cache buffer lru chain= The buffer cache has a set of chains of LRU blocks, each protected by one of
these latches. Contention for this latch can often be reduced by increasing the db_block_lru_latches
parameter or by reducing the amount of access to cache buffers.
Values in Pct Misses or Reloads in the SQL Area, Tables/Procedures or Trigger rows indicate that the
shared pool may be too small. Consistent (not sporadic) values in Pct Misses or Reloads in the Index
row indicate that the buffer cache is too small (this row is no longer available in 9i).
Values in Invalidations in the SQL Area indicate that a table definition changed while a query was
being run against it or a PL/SQL package being used was recompiled.
Other sections of the report and what to use them for:
Wait Events: Look for excessive waits and wait times; drill down to specific problems.
SQL Ordered by Buffer Gets, Physical Reads, and Rows Processed: Figure out which SQL statements to tune.
Instance Activity Statistics: Compare with baseline report; compute additional statistics.
Tablespace and File I/O: Investigate I/O bottlenecks; identify files and tablespaces with heavy I/O.
Buffer Pool: Identify specific buffer pools with high contention or I/O.
Buffer Wait Statistics: Identify types of buffers with a large number of buffer waits.
Enqueue Activity: Investigate specific lock types that are causing the most waits.
Rollback Segment Statistics and Storage: Investigate waits for rollback segment headers.
Latch Activity, Latch Sleep Breakdown, Latch Miss Sources: Identify latching bottlenecks; diagnose related problems.
Common wait events and typical responses:
DB File Scattered Read: Wait for a multi-block read of a table or index (full scan). Tune the code and/or cache small tables.
DB File Sequential Read: Wait for a single-block read of a table or index. Indicates many index reads: tune the code (especially joins).
DB File Parallel Read: Used when Oracle performs reads in parallel from multiple datafiles into non-contiguous buffers in memory (PGA or buffer cache). Similar to db file sequential read.
Free Buffer: Increase the DB_CACHE_SIZE; shorten the checkpoint; tune the code.
Buffer Busy: Data block: separate "hot" data; use reverse key indexes and/or smaller blocks.
Buffer Busy: Data block: increase INITRANS and/or MAXTRANS.
Buffer Busy: Undo block: commit more often; use larger rollback segments or areas.
Log Buffer Space: Increase the log buffer; use faster disks for the redo logs.
Log File Switch: Archive destination slow or full; add more or larger redo logs.
Log File Sync: Commit more records at a time; use faster redo log disks or raw devices.
Direct Path Read: Used by Oracle when reading directly into the PGA (sort or hash).
Direct Path Write: Used by Oracle when writing directly from the PGA (sort or hash).
Idle Event: Ignore it.
Lock manager wait for remote message: Oracle9i Real Application Clusters.