Oracle Database Performance Diagnostics

Oracle Database “Performance diagnostics” : Before you begin
Hemant K Chitale

Introduction
What do you, as the DBA / Developer / System Administrator / Analyst / Performance
Analyst / Application Manager, do when you get calls like:
1. The “system” is slow
2. The batch job is “hanging”
3. Users cannot log in
Are these Database Performance issues? Always?
Where do you begin diagnostics? Do you jump into trace files, StatsPack / AWR, OS
statistics, etc.?
This article is a primer on what you should be aware of *before* you begin looking at
Oracle Trace Files, Explain Plans, Statistics and what-not.
The diagnostic process must be able to help the Oracle Database Performance Analyst
identify :
a. Whether there really is an “issue”
b. How well the issue is defined (and, if necessary, how it should be redefined)
c. Where the cause arises
d. What can be done to address the cause

Note : This article is NOT about how to use Oracle and OS methods to diagnose a
performance issue and/or tune an SQL/Application/Schema/Database. It is about what
you should be aware of before you begin.

Environmental Factors
Let’s begin with some basic factors:
1. Response Time
“Response Time” is what users (and application servers!) see. They do not see
‘consistent gets’ or ‘redo size’ or ‘enq: TX - row lock contention’.
User perception of a system’s usability is significantly impacted by Response Time. “Fit
for use” (the application is usable) must co-exist with “fit for purpose” (the application
does what it is supposed to do).
For a batch job, on the other hand, Response Time can mean anything from the execution
time of a single (significant) SQL call to the elapsed time of a key stage in the job.
© Hemant K Chitale 2011.

http://hemantoracledba.blogspot.com

2. Tiers
There are very many tiers through which a response reaches a user (or an application
server, depending on who/what has “response issues”).
A request travels from the desktop, via a browser, over the internet/intranet to an
application server, where it is rewritten as an SQL call to the database. The database
parses and executes the call, consuming CPU and I/O cycles to fetch, filter and compute
values. There are round-trips between the application server and database server,
formatting on the application server and latency back down to the user’s desktop. All of
these tiers contribute to an application’s performance. Such tiers also exist in a batch
job, where the round-trips between the application server and database server are often
ignored.
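
One way to make the tiers concrete is to break a single response time into per-tier components and see which tier dominates. The sketch below does this in Python. The tier names and timings are invented for illustration and are not measurements from any real system.

```python
# Hypothetical per-tier timings (seconds) for one user request.
# The names and numbers are illustrative assumptions only.
tier_seconds = {
    "browser rendering": 0.30,
    "network (user <-> app server)": 0.45,
    "app server processing": 0.60,
    "round-trips (app server <-> database)": 1.20,
    "database parse and execute": 0.95,
}

def breakdown(timings):
    """Return (total, rows) where rows are (tier, seconds, share), largest first."""
    total = sum(timings.values())
    rows = sorted(
        ((tier, secs, secs / total) for tier, secs in timings.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return total, rows

total, rows = breakdown(tier_seconds)
print(f"Response time seen by the user: {total:.2f}s")
for tier, secs, share in rows:
    print(f"  {tier:<40} {secs:5.2f}s  {share:6.1%}")
```

With these made-up numbers the database itself accounts for less than a third of the response time, which is the kind of finding that should be established before any trace file is opened.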
3. Capacity
Each “component” (be it the User’s Desktop or the WAN Link or the App Server CPU or
the App Server RAM etc … down through the Tiers) has a defined Capacity – theoretical
and practical. Within a database instance, also, there are capacity parameters – e.g. SGA
sizing parameters, the processes parameter etc.
4. Usage
Usage of the available capacity of any component varies from time to time. Any tool that
“measures” usage has to collect a snapshot of usage at a certain point in time. Multiple
snapshots must be analyzed together.
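
Many usage counters are cumulative, so a single snapshot says little. The difference between two snapshots, divided by the interval between them, gives the usage rate over that interval. A minimal sketch, assuming hypothetical snapshots of the form (seconds since start, cumulative counter value):

```python
def usage_rates(snapshots):
    """Turn cumulative snapshots [(time_s, counter), ...] into per-interval rates.

    Returns a list of (start_time, end_time, rate_per_second).
    """
    ordered = sorted(snapshots)
    rates = []
    for (t0, c0), (t1, c1) in zip(ordered, ordered[1:]):
        if t1 == t0:
            continue  # skip duplicate timestamps to avoid dividing by zero
        rates.append((t0, t1, (c1 - c0) / (t1 - t0)))
    return rates

# Hypothetical snapshots: (seconds since start, cumulative physical reads)
snaps = [(0, 10_000), (600, 16_000), (1200, 52_000), (1800, 58_000)]
for start, end, rate in usage_rates(snaps):
    print(f"{start:>5}s to {end:>5}s: {rate:6.1f} reads/s")
```

Taken over the whole half hour these figures average about 26.7 reads/s, which hides the spike in the middle interval. This is why the snapshots have to be looked at together rather than summarised into one number.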
5. Throughput
Throughput is the volume of “load” (Transactions/Queries/Rows/Users – each is a
different facet of “load”) that is being serviced by the “system”.
6. Constraints
Capacity is a constraint.
Concurrency is a constraint as well. Two users/processes/sessions may not be permitted
to modify the same row/resource at the same time.
7. Serialisation
Because Capacity is not Unlimited and because there are Constraints
(automatic/system/artificial/user-defined), there may well be some points in application
code or database code or the operating system where serialisation occurs.
8. Requirements
Volume requirements, usability requirements and control requirements are defined by
users / analysts and must be built into the “system”. Requirements also add to code
complexity.
9. Scalability
Scalability of the system is its ability to handle additional workload without more than a
proportional increase in component resource (CPU, RAM, I/O) usage. Scalability is
adversely impacted by points of contention or serialisation in the requirements / design /
code.
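
The "no more than proportional" test can be written down as a simple ratio of growth rates. The sketch below is one possible way to express it. The function name and the sample figures are assumptions made for illustration.

```python
def scalability_ratio(workload_before, workload_after, resource_before, resource_after):
    """Ratio of resource growth to workload growth.

    <= 1.0 : resource usage grew no faster than the workload
    >  1.0 : resource usage grew faster than the workload
    """
    workload_growth = workload_after / workload_before
    resource_growth = resource_after / resource_before
    return resource_growth / workload_growth

# Hypothetical figures: transactions per hour and CPU-seconds consumed per hour.
ratio = scalability_ratio(10_000, 12_000, 1_500, 2_400)
print(f"resource growth / workload growth = {ratio:.2f}")
```

Here a 20% increase in workload came with a 60% increase in CPU usage, so the ratio is above 1 and the system is not scaling proportionally for this workload.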
10. Non-Linearity
Many systems are non-linear. If a query that processes ten thousand rows that are always
in memory, and never overflows to disk for Group/Sort operations, takes 1 second to run, it
doesn’t necessarily follow that a hundred thousand rows would take 10 seconds. The
hundred thousand row query may require multiple disk reads because not all rows are
cached in memory and, furthermore, the Group/Sort operations may also overflow to disk.
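
A toy cost model shows how quickly the arithmetic departs from a straight line once rows stop fitting in memory. Every constant below is invented for the illustration, and the model is not how Oracle estimates or executes anything.

```python
# Toy cost model with made-up constants (assumptions for illustration only).
CACHED_ROWS = 10_000          # rows assumed to fit in memory
MEM_COST_PER_ROW = 0.0001     # seconds per row served from memory
DISK_COST_PER_ROW = 0.002     # seconds per row that needs a disk read

def estimated_seconds(rows):
    """Estimated elapsed time: cached rows are cheap, the remainder pay the disk cost."""
    in_memory = min(rows, CACHED_ROWS)
    from_disk = max(rows - CACHED_ROWS, 0)
    return in_memory * MEM_COST_PER_ROW + from_disk * DISK_COST_PER_ROW

for rows in (10_000, 100_000):
    print(f"{rows:>7} rows -> {estimated_seconds(rows):6.1f}s")
```

Under these assumptions ten times the rows costs about 181 seconds rather than 10, because ninety thousand of the rows pay the disk price.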

11. Shared Resources
A database server may be configured to host multiple databases. The CPU and I/O load
of one or more “other” databases may well be “interference” in the performance of a
database under review. The “cost” of such “interference” must be computed and
accounted for. Similarly, within a database, Batch reports may interfere with online
queries. Also, when multiple schemas (e.g. for different “applications”) are provided for
within a database, they share and contend for shared pool, library cache and buffer cache
resources as well as for CPU and I/O.

These basic Factors apply to any System. They apply to Airports and Aeroplanes. They
apply to Factories and Refineries. They apply to Hotels and Restaurants. They apply to
Applications using Oracle Databases.

As an Oracle Database Performance Analyst (a DBA or a Developer or a System
Administrator), it is necessary to be aware of these Factors.

Definition of “Issue”
The definition is the first step in the process.
Start by identifying which command/process/job is under contention. Is it a
daily task? How many components (see Factor 2 “Tiers”) does it involve? Do you / the
team need to evaluate the capacity, usage and throughput of each of the Tiers? Can a
specific Tier be identified as a constraint?
Typically for a performance issue, the best reference is “Time Taken”. How long does
the particular command/process/job take to run? How long did it take to run on previous
occasions? Was there any variance in run times on previous occasions?
Can a test system / test run be executed? Can the test be traced (end to end, from the
user to system level waits and back to the user)? Can the production run be traced? Can
both traces be compared? Remember: The test may not have the same level of
components, capacity, throughput, usage and may have a different set of constraints.
When analysing the performance of a particular system/job/process/function, it is also
important to differentiate between “short, sharp” queries and sessions and “medium to
long running” batch jobs and reports. A system may have a mix of such operations.
Some of these questions may not need to be formally asked. The answers may be well
known or documented (e.g. the components and capacity). Others may need to be
discovered (e.g. previous response times, usage). Throughput and constraints may get
identified only during the diagnostics phase (unless some of them are “well known” and
documented).

A good definition of an issue might be:
Program “A”, which takes 15 minutes to run at (approximately) the same time every day (on
the same server) for the same volume of data, has been taking 45 minutes for the last
2 days, although no change to program code or parameters has been made.

Another good definition might be:
Users are usually able to view the details on screen within 5 seconds of submitting the
query, navigate through all the screens in 15 seconds and commit in 2 seconds, but the
same query on the same data is now taking 25 seconds, 30 seconds and 5 seconds
respectively, under the same user workload.

Another example might be:
We have exactly doubled the incoming data volume for the ETL job but processing time is
now 5x with no other changes to the system.
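
What these definitions have in common is that each can be reduced to numbers: a baseline, a current value and the conditions under which both were measured. A minimal sketch of recording that, where the class and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class IssueDefinition:
    """One measurable statement of a performance issue."""
    what: str
    baseline_minutes: float
    current_minutes: float
    volume_factor: float = 1.0   # 1.0 means "same volume of data"

    @property
    def slowdown(self):
        """How many times longer the run takes now than at baseline."""
        return self.current_minutes / self.baseline_minutes

    @property
    def excess_over_volume(self):
        """Slowdown that a proportional increase in volume does not explain."""
        return self.slowdown / self.volume_factor

# Program "A": 15 minutes before, 45 minutes now, same volume.
program_a = IssueDefinition("Program A daily run", 15, 45)

# ETL job: only ratios are given in the example, so times are in relative units.
etl = IssueDefinition("ETL job", 1.0, 5.0, volume_factor=2.0)

print(program_a.what, "slowdown:", program_a.slowdown)            # 3.0
print(etl.what, "slowdown beyond volume:", etl.excess_over_volume)  # 2.5
```

The second figure is the one worth investigating: doubling the volume accounts for a factor of 2, and the remaining factor of 2.5 has to come from somewhere else.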

Collection of Data
Use the “Example Questionnaire for Issue Definition” in the Appendix. Remember, not all
questions need to be formally raised. Some information may be available from
documentation. Some recursion may be necessary: questions or answers that were
deemed “insignificant” during the first round of diagnostics may have to be revisited and
reviewed. For example, early discussion may have concluded that the network was always
stable, but testing or trace files may indicate that network round trips are significant, so
the network component (“Tier”) may have to be revisited.
Some of the data collection may take time, e.g. running a trace and analyzing the trace
file. You may need to prioritize which data is to be collected early while other collection
can run “in the background”. Time Data should always be the first priority.

Time Data
Data about “Time taken to process/run the query/request/job/batch” should be in terms of
Seconds or Minutes (where the time exceeds an hour).
Data about “Time for on-screen query” should be in terms of seconds.
Data about “SQL Execution time” should be in terms of Milliseconds, Seconds or
Minutes.
Time data for previous runs (including min/avg/max) and test runs should also be
collected.
When collecting data about different executions, ensure that the executions are
comparable – e.g. at the same time of day, for the same volume.
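
Once comparable runs have been collected, the min/avg/max summary and the comparison against the latest run take only a few lines. The run times below are hypothetical.

```python
from statistics import mean

def summarise(run_seconds):
    """Return (min, avg, max) for a list of run times in seconds."""
    return min(run_seconds), mean(run_seconds), max(run_seconds)

# Hypothetical elapsed times (seconds) of earlier, comparable runs.
previous_runs = [880, 905, 910, 925, 930]
latest_run = 2700

lo, avg, hi = summarise(previous_runs)
print(f"previous runs: min={lo}s avg={avg:.0f}s max={hi}s")
print(f"latest run is {latest_run / avg:.1f}x the previous average")
if latest_run > hi:
    print("latest run is outside the range seen before")
```

Keeping the minimum and maximum alongside the average matters: a run that is slower than average but still within the historical range is a different kind of complaint from one that is well outside it.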

Time Series Data
Time Series Data (as different from “Time Data”) is about plotting performance
information and statistics over time and validating if a trend exists. If such a trend exists,
it must be considered as a factor when evaluating and projecting load and performance.
Such Time Series Data covers not only performance and response times but also volume
and workload, concurrency and throughput.
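
A simple way to check whether a trend exists is to fit a straight line through the observations and look at its slope. The sketch below uses an ordinary least-squares fit against the observation index. The weekly figures are hypothetical, and a straight line is only one possible model of a trend.

```python
def linear_trend(values):
    """Least-squares slope and intercept of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical weekly elapsed times (minutes) for the same job.
weekly_minutes = [15, 16, 16, 18, 19, 21]
slope, intercept = linear_trend(weekly_minutes)
print(f"run time is changing by about {slope:.2f} minutes per week")
```

The same function can be applied to volume, concurrency or throughput figures, so that a growing run time can be set against a growing workload.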

Components (Tiers) data
Data about the Tiers involved should include :
a. Hardware Size (number of CPUs/Cores, CPU Speeds, RAM, HBAs, Network
Interfaces)
b. Operating System and FileSystem types
c. OS performance counters – sar, vmstat, iostat, top, topas
d. Latency (min/avg/max)

Volume / Workload data
Data about Volume and Workload should include:
a. Number of concurrent, active users
b. Number and sizes of rows being processed
c. Number and sizes of batch jobs running concurrently
Such workloads impact throughput and concurrency.

Execution Plans, Statistics, Wait Events
Details about SQL Execution Plans and Execution Statistics (e.g. “consistent gets”) and
Wait Events are to be collected and analyzed when it is determined that performance
within the database needs to be reviewed. Let me emphasise: This is only after you
have determined that the database and, in particular, a specific portion of the application
needs to be reviewed. Do not jump into this too soon. I put this last in the list of data to
be collected.

Interpreting the Data
The Time data must be interpreted to identify patterns. For example, has the job been
taking progressively more time as the weeks/months have progressed? Does the job
take more time on certain days or at certain times? Is there a correlation between the
Time and the Volume? Can a report that is to be run every 30 minutes be allowed to take
10 minutes to run? Should the report OR the schema OR the data loads be redesigned to
have the report run in less than 1 minute? Or should the frequency of the report be
changed to run every 60 minutes?
Workload/Volume/Usage and Capacity/Throughput/Tiers data must be correlated. Does a
20% increase in Workload/Volume/Usage result in a 20% increase in CPU usage?
Oracle Trace Files, Oracle Wait Statistics, Server Performance (sar, vmstat, ping latency)
data must be reviewed to identify component resource utilisation. The key resources
CPU, RAM and I/O are used to transfer data to the user. Therefore, it is necessary to
correlate the usage of these resources to the volume of data. Does the query that fetches
100 rows without having to do any aggregation really need to do 1 million buffer gets?
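
The closing question can be turned into a ratio of work done to data delivered. The helper below is a sketch with a hypothetical name. What counts as "too many" gets per row depends on the query and is a matter of judgement, not something the code decides.

```python
def gets_per_row(buffer_gets, rows_returned):
    """Logical reads performed for each row actually returned."""
    if rows_returned == 0:
        return float("inf") if buffer_gets else 0.0
    return buffer_gets / rows_returned

# Figures from the example above: 1 million buffer gets to fetch 100 rows.
print(gets_per_row(1_000_000, 100))   # 10000.0
```

Ten thousand buffer gets for every row returned, with no aggregation involved, is the sort of disproportion that justifies a closer look at the execution plan.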

Making Recommendations
What changes (schema, code, architecture) you recommend will, to a considerable
degree, depend on your prior experience and “confidence” level in the tools and methods
used. Remember that your proposed changes may interact with and impact other
environmental factors!
Identify which “environmental factors” are impinging on performance. Your
recommendation should be able to address the factor.

A cardinal rule of Performance is “never do anything that is not necessary”. For
example, when you review a user requirement, ask the questions “Is this
requirement necessary? Has it already been met by some portion of the design that the
user is not aware of? Should the data be duplicated?” Similarly, when reviewing a
system, configuration or code (or a diagnostic trace), ask the questions “Is this
component necessary? Is it duplicated? Is the same task being done repeatedly (e.g. a
lookup on the same rows or a validation being done twice)?”
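
The repeated-lookup case is easy to demonstrate. In the sketch below a function stands in for a lookup that would normally be a query against a reference table, and a cache ensures that each distinct key is looked up only once. The function name and the data are hypothetical, and caching is only one of several ways to remove such repetition.

```python
from functools import lru_cache

lookup_calls = 0  # counts how often the "expensive" lookup really runs

@lru_cache(maxsize=None)
def currency_name(code):
    """Stand-in for a lookup that would normally query a reference table."""
    global lookup_calls
    lookup_calls += 1
    return {"USD": "US Dollar", "SGD": "Singapore Dollar"}.get(code, "Unknown")

rows = ["USD", "SGD", "USD", "USD", "SGD"]
names = [currency_name(code) for code in rows]
print(names)
print(f"{len(rows)} rows processed, {lookup_calls} lookups executed")
```

Five rows are processed but only two lookups are executed. In a batch job that processes millions of rows against a small reference table, the same pattern removes a correspondingly large number of round-trips.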

Managing Changes
Once the root cause of an issue is identified and recommendations are made, the steps of
defining, creating, testing and migrating the change (or changes) required have to be
carefully managed.
Some issues can be addressed by workarounds while others may require changes with
long term impacts. However, workarounds, themselves, may have adverse consequences.
A reasonable degree of confidence in the impact assessment is a requirement.

Appendix
Example Questionnaire for Issue Definition:
What is the command/process/job that is under contention? What is it called?
Is it a daily task?
How many components (see Factor 2 “Tiers”) does it involve? List each component.
Do you / the team need to evaluate the capacity, usage and throughput of each of the
Tiers?
Can a specific Tier be identified as a constraint?
How long does the particular command/process/job take to run?
How long did it take to run on previous occasions?
Was there any variance in run times on previous occasions?
Can a test system / test run be executed?
Can the test be traced (end to end, from the user to system level waits and back to the
user)?
Can the production run be traced?
Can both traces be compared?

Oracle database performance diagnostics - before your begin

  • 1. Oracle Database Performance Diagnostics Oracle Database “Performance diagnostics” : Before you begin Hemant K Chitale Introduction What do you, as the DBA / Developer / System Administrator / Analyst / Performance Analyst / Application Manager, do when you get calls like: 1. The “system” is slow 2. The batch job is “hanging” 3. Users cannot login Are these Database Performance issues? Always? Where do you begin diagnostics? Do you jump into trace files, StatsPack / AWR, OS statistics etc ? This article is a primer on what you should be aware of *before* you begin looking at Oracle Trace Files, Explain Plans, Statistics and what-not. The diagnostic process must be able to help the Oracle Database Performance Analyst identify : a. Whether there really is an “issue” b. How well the issue is defined, if necessary redefine it c. Where the cause arises d. What can be done to address the cause Note : This article is NOT about how to use Oracle and OS methods to diagnose a performance issue and/or tune an SQL/Application/Schema/Database. It is about what you should be aware of before you begin. Environmental Factors Let’s begin with some basic factors: 1. Response Time “Response Time” is what users (and application servers!) see. They do not see ‘consistent gets’ or ‘redo size’ or ‘enq: TX - row lock contention”. User perception of a system’s usability is significantly impacted by Response Time. “fit for use” (the application is usable) must co-exist with “fit for purpose” (the application does what it is supposed to do). On the other hand, Response Time for a batch job can vary from execution time for a (significant) single SQL call to the elapsed time for a key stage in the job. © Hemant K Chitale 2011. http://hemantoracledba.blogspot.com
  • 2. Oracle Database Performance Diagnostics 2. Tiers There are very many tiers through which a response reaches a user (or an application server, depending on who/what has “response issues”). From the desktop, via a browser, over the internet/intranet to an application server, rewritten as an SQL call to the database, parsed and executed by the database, CPU and I/O cycles consumed to fetch, filter and compute values, round-trips between the application server and database server, formatting on the application server, latency down to the user’s desktop; there are very many tiers that are comprised in an application’s performance. Such tiers also exist in a batch job – often ignored are the round-trips between the application server and database server. 3. Capacity Each “component” (be it the User’s Desktop or the WAN Link or the App Server CPU or the App Server RAM etc … down through the Tiers) has a defined Capacity – theoretical and practical. Within a database instance, also, there are capacity parameters – e.g. SGA sizing parameters, the processes parameter etc. 4. Usage Usage of the available capacity of any component varies from time to time. Any tool that “measures” usage has to collect a snapshot of usage at a certain point in time. Multiple snapshots must be analyzed together. 5. Throughput Throughput is the volume of “load” (Transactions/Queries/Rows/Users – each is a different facet of “load”) that is being serviced by the “system”. 6. Constraints Capacity is a constraint. Concurrency is a constraint as well. Two users/processes/sessions may not be permitted to modify the same row/resource at the same time. 7. Serialisation Because Capacity is not Unlimited and because there are Constraints (automatic/system/artificial/user-defined), there may well be some points in application code or database code or the operating system where serialisation occurs. 8. 
Requirements Volume requirements, usability requirements and control requirements are defined by users / analysts and must be built into the “system”. Requirements also add to code complexity. 9. Scalability Scalability of the system is it’s ability to handle additional workload without more than a proportional increase in component resources (CPU, RAM, I/O) usage. Scalability is © Hemant K Chitale 2011. http://hemantoracledba.blogspot.com
  • 3. Oracle Database Performance Diagnostics adversely impacted by points of contention or serialisation in the requirements / design / code. 10. Non-Linearity Many systems are non-linear. If a query that processes ten thousand rows that are always in memory and never overflows to disk for Group/Sort operations takes 1second to run, it doesn’t necessarily follow that a hundred thousand rows would take 10seconds. The hundred thousand row query may require multiple disk reads because not all rows are cached in memory and, furthermore, the Group/Sort operations also overflow to disk. 11. Shared Resources A database server may be configured to host multiple databases. The CPU and I/O load of one or more “other” databases may well be “interference” in the performance of a database under review. The “cost” of such “interference” must be computed and accounted for. Similarly, within a database, Batch reports may interfere with online queries. Also, when multiple schemas (e.g. for different “applications”) are provided for within a database, they share and contend for shared pool, library cache and buffer cache resources as well as for CPU and I/O. These basic Factors apply to any System. They apply to Airports and Aeroplanes. They apply to Factories and Refineries. They apply to Hotels and Restaurants. They apply to Applications using Oracle Databases. As an Oracle Database Performance Analyst (a DBA or a Developer or a System Administrator), it is necessary to be aware of these Factors. Definition of “Issue” The definition is the first step in the process. First start with identify what the command/process/job is that is under contention. Is it a daily task? How many components (see Factor 2 “Tiers”) does it involve? Do you / the team need to evaluate the capacity, usage and throughput of each of the Tiers? Can a specific Tier be identified as a constraint? Typically for a performance issue, the best reference is “Time Taken”. 
How long does the particular command/process/job take to run? How long did it take to run on previous occasions? Was there any variance in run times on previous occasions?
Can a test system / test run be executed? Can the test be traced (end to end, from the user to system level waits and back to the user)? Can the production run be traced? Can both traces be compared?
Remember: The test system may not have the same level of components, capacity, throughput and usage, and may have a different set of constraints.

It is also important, when analysing the performance of a particular system/job/process/function, to differentiate between “short, sharp” queries and sessions and “medium to long running” batch jobs and reports. A system may have a mix of such operations.

Some of these questions may not need to be formally asked. The answers may be well known or documented (e.g. the components and capacity). Others may need to be discovered (e.g. previous response times, usage). Throughput and constraints may get identified only during the diagnostics phase (unless some of them are “well known” and documented).

A good definition of an issue might be:
Program “A”, which takes 15 minutes to run at (approximately) the same time every day (on the same server) for the same volume of data, has been taking 45 minutes for the last 2 days, although no change to program code or parameters has been made.
Another good definition might be:
Users are usually able to view the details on screen within 5 seconds of submitting the query, navigate through all the screens in 15 seconds and commit in 2 seconds, but the same query on the same data is now taking 25 seconds, 30 seconds and 5 seconds respectively, under the same user workload.
Another example might be:
We have exactly doubled the incoming data volume for the ETL job but processing time is now 5x, with no other changes to the system.

Collection of Data
Use the “Example Questionnaire for Issue Definition” in the Appendix. Remember, not all questions need to be formally raised. Some information may be available from documentation.
Some recursion may be necessary: questions or answers that were deemed “insignificant” during the first round of diagnostics may have to be revisited and reviewed (e.g. an early discussion may have assumed that the network was always stable, but testing or trace files may indicate that network round trips are significant, so the network component (“Tier”) may have to be revisited).
Some of the data collection may take time, e.g. running a trace and analyzing the trace file. You may need to prioritize which data is to be collected early while other collection runs “in the background”. Time Data should always be the first priority.

Time Data
Data about “Time taken to process/run the query/request/job/batch” should be in terms of Seconds or Minutes (where the time exceeds an hour).
Data about “Time for on-screen query” should be in terms of Seconds.
Data about “SQL Execution time” should be in terms of Milliseconds, Seconds or Minutes.
Time data for previous runs (including min/avg/max) and test runs should also be collected.
When collecting data about different executions, ensure that the executions are comparable, e.g. at the same time of day and for the same volume.

Time Series Data
Time Series Data (as distinct from “Time Data”) is about plotting performance information and statistics over time and validating whether a trend exists. If such a trend exists, it must be considered as a factor when evaluating and projecting load and performance. Such Time Series Data covers not only performance and response times but also volume and workload, concurrency and throughput.

Components (Tiers) data
Data about the Tiers involved should include:
a. Hardware Size (number of CPUs/Cores, CPU Speeds, RAM, HBAs, Network Interfaces)
b. Operating System and FileSystem types
c. OS performance counters: sar, vmstat, iostat, top, topas
d. Latency (min/avg/max)

Volume / Workload data
Data about Volume and Workload should include:
a. Number of concurrent, active users
b. Number and sizes of rows being processed
c. Number and sizes of batch jobs running concurrently
Such workloads impact throughput and concurrency.
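The min/avg/max summary asked for under “Time Data” and the trend check described under “Time Series Data” can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample run times are hypothetical, and the least-squares slope is just one simple way of checking whether a trend exists. It assumes the runs being compared are already comparable (same time of day, same volume).

```python
from statistics import mean


def summarize_run_times(minutes):
    """Return (min, avg, max) for a list of elapsed times in minutes."""
    return min(minutes), mean(minutes), max(minutes)


def trend_slope(values):
    """Least-squares slope of the values against run index 0, 1, 2, ...

    A clearly positive slope suggests run times are creeping up over time.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical elapsed times (minutes) for the same job on consecutive days.
daily_run_minutes = [15, 15, 16, 15, 17, 18, 18, 20]

fastest, average, slowest = summarize_run_times(daily_run_minutes)
slope = trend_slope(daily_run_minutes)
print(f"min={fastest} avg={average:.2f} max={slowest} trend={slope:+.2f} minutes per run")
```

A slope close to zero alongside a single very slow run points to a one-off event rather than a trend, which is a different diagnostic path from a steady increase.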
Execution Plans, Statistics, Wait Events
Details about SQL Execution Plans, Execution Statistics (e.g. “consistent gets”) and Wait Events are to be collected and analyzed when it is determined that performance within the database needs to be reviewed. Let me emphasise: This is only after you have determined that the database and, in particular, a specific portion of the application needs to be reviewed. Do not jump into this too soon. I put this last in the list of data to be collected.

Interpreting the Data
The Time data must be interpreted to identify patterns. For example, has the job been taking progressively more time as the weeks/months have progressed? Does the job take more time on certain days or at certain times? Is there a correlation between the Time and the Volume?
Can a report that is to be run every 30 minutes be allowed to take 10 minutes to run? Should the report OR the schema OR the data loads be redesigned to have the report run in less than 1 minute? Or should the frequency of the report be changed to run every 60 minutes?
Workload/Volume/Usage and Capacity/Throughput/Tiers data must be correlated. Does a 20% increase in Workload/Volume/Usage result in a 20% increase in CPU usage?
Oracle Trace Files, Oracle Wait Statistics and Server Performance (sar, vmstat, ping latency) data must be reviewed to identify component resource utilisation.
The key resources (CPU, RAM and I/O) are used to transfer data to the user. Therefore, it is necessary to correlate the usage of these resources to the volume of data. Does the query that fetches 100 rows without having to do any aggregation really need to do 1 million buffer gets?

Making Recommendations
The changes (schema, code, architecture) that you recommend will depend, to a considerable degree, on your prior experience and your “confidence” level in the tools and methods used.
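Before recommending anything, the correlation questions raised under “Interpreting the Data” (does cost grow in step with workload, and how many buffer gets does each returned row cost?) come down to simple arithmetic, sketched here in Python. The function names, the 10% tolerance and the row counts and timings of the ETL example are assumptions made purely for illustration; only the “doubled volume, 5x time”, “20% increase” and “100 rows, 1 million buffer gets” figures come from the text.

```python
def scaling_factor(before, after):
    """Ratio of the new value to the old one, e.g. 2.0 means 'doubled'."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return after / before


def scales_proportionally(workload_ratio, cost_ratio, tolerance=0.10):
    """True if cost grew within `tolerance` (relative) of the workload growth.

    The 10% default is an arbitrary, assumed threshold.
    """
    return abs(cost_ratio - workload_ratio) <= tolerance * workload_ratio


def buffer_gets_per_row(buffer_gets, rows_fetched):
    """Average number of buffer gets spent for each row returned."""
    if rows_fetched <= 0:
        raise ValueError("rows_fetched must be positive")
    return buffer_gets / rows_fetched


# ETL example: volume exactly doubled, elapsed time is now 5x (hypothetical figures).
volume_ratio = scaling_factor(1_000_000, 2_000_000)  # rows loaded
time_ratio = scaling_factor(30, 150)                 # elapsed minutes
print("ETL scales with volume:", scales_proportionally(volume_ratio, time_ratio))

# A 20% increase in workload matched by a 20% increase in CPU usage.
print("CPU scales with workload:", scales_proportionally(1.2, 1.2))

# The query that fetches 100 rows but performs 1 million buffer gets.
print("buffer gets per row:", buffer_gets_per_row(1_000_000, 100))
```

A cost ratio well above the workload ratio, as in the ETL case, suggests that some resource or design constraint has been reached rather than simply “more data, more time”.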
Remember that your proposed changes may interact with and impact other environmental factors! Identify which “environmental factors” are impinging on performance. Your recommendation should be able to address the factor.
A cardinal rule of Performance is “never do anything that is not necessary”. For example, when you review a user requirement, you ask the questions “Is this requirement necessary? Has it already been met by some portion of the design that the user is not aware of? Should the data be duplicated?” Similarly, when reviewing a system, configuration or code (or a diagnostic trace), ask the questions “Is this component necessary? Is it duplicated? Is the same task being done repeatedly (e.g. a lookup on the same rows or a validation being done twice)?”

Managing Changes
Once the root cause for an issue is identified and recommendations are made, the steps of defining, creating, testing and migrating the change (or changes) required have to be carefully managed. Some issues can be addressed by workarounds while others may require changes with long term impacts. However, workarounds themselves may have adverse consequences. A reasonable degree of confidence in the impact assessment is a requirement.

Appendix
Example Questionnaire for Issue Definition:
What is the command/process/job that is under contention? What is it called? Is it a daily task?
How many components (see Factor 2 “Tiers”) does it involve? List each component. Do you / the team need to evaluate the capacity, usage and throughput of each of the Tiers? Can a specific Tier be identified as a constraint?
How long does the particular command/process/job take to run? How long did it take to run on previous occasions? Was there any variance in run times on previous occasions?
Can a test system / test run be executed? Can the test be traced (end to end, from the user to system level waits and back to the user)? Can the production run be traced? Can both traces be compared?