Db2Monitoring v1.0.2

A Monitoring Approach
This article provides a monitoring approach for IBM Db2 databases via Db2 table functions. It
describes how relevant data can be collected and displayed for further tracking.
The following terms are registered trademarks of International Business Machines Corporation in the
United States and/or other countries: AIX, AS/400, DB2, IBM, Micro Channel, MQSeries, Netfinity,
NUMA-Q, OS/390, OS/400, Parallel Sysplex, PartnerLink, POWERparallel, RS/6000, S/390, Scalable
POWERparallel Systems, Sequent, SP2, System/390, ThinkPad, WebSphere.
The following terms are trademarks of International Business Machines Corporation in the United States
and/or other countries: DB2 Universal Database, DEEP BLUE, e-business (logo), ~, GigaProcessor,
HACMP/6000, Intelligent Miner, iSeries, Network Station, NUMACenter, POWER2 Architecture,
PowerPC 604, pSeries, Sequent (logo), SmoothStart, SP, xSeries, zSeries. A full list of U.S. trademarks
owned by IBM may be found at http://www.ibm.com/legal/copytrade.shtml. NetView, Tivoli and TME are
registered trademarks and TME Enterprise is a trademark of Tivoli Systems, Inc. in the United States and/or
other countries.
Oracle, MetaLink are registered trademarks of Oracle Corporation in the USA and/or other countries.
Microsoft, Windows, Windows NT and the Windows logo are registered trademarks of Microsoft
Corporation in the United States and/or other countries.
UNIX is a registered trademark in the United States and other countries licensed exclusively through The
Open Group.
LINUX is a registered trademark of Linus Torvalds.
Intel and Pentium are registered trademarks and MMX, Pentium II Xeon and Pentium III Xeon are
trademarks of Intel Corporation in the United States and/or other countries.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United
States and/or other countries.
Other company, product and service names may be trademarks or service marks of others.
Abstract
This document describes how to collect monitoring data for a database on Db2 LUW© in a
certain time frame.
The following key items are main characteristics of the approach described.
• The information is collected in iterations. This allows you to monitor the database over a
certain time period.
• Metrics represented by counter values are displayed as differences, rather than absolute
values. This eases the identification of possible pain points within the monitored time
frame.
• The procedure described runs the data evaluation independently from the data collection.
This allows you to perform the investigation in a separate, less critical environment
rather than on a sensitive production database.
• The data can be viewed in a spreadsheet. Key metrics can be graphically visualized.
The monitoring method allows you to keep track of changes of Db2 metrics with high granularity
across an extended period of time.
As an example, see the following chart that shows the total request time vs extended latch waits
and extended latch wait time for a single application handle.
Publications available so far describe two consecutive data collections and the differences
between the considered metrics. This requires detailed knowledge of the problem, and the right
point in time to be used for data collections. Examples are the db2mon script
(https://ibm.ent.box.com/s/iz3ytk28d8wsfg03s1lxwjsl7mifvfmu/file/160446905792) and the
article “Tuning and Monitoring Database System Performance”
(https://www.ibm.com/developerworks/community/wikis/home?
lang=en#!/wiki/Wc9a068d7f6a6_4434_aece_0d297ea80ab1/page/Tuning%20and%20Monitoring
%20Database%20System%20Performance).
This document provides one of countless ways to monitor a Db2 database. We'd like to encourage
you to modify and even improve the described approach to fit it into your own monitoring
scenarios.
Table of Contents
Motivation
Prerequisites and conventions
Terminology
1 Collect data
   1.1 How to query table functions
   1.2 Choose a data format for the collection
   1.3 How to set up the collection process
   1.4 Which data sets should be collected?
2 Prepare for investigation
   2.1 Import the data
   2.2 The first query
   2.3 Shortcomings
3 A fully generic approach
   3.1 Developing the parts of the generator query
   3.2 Putting it all together
   3.3 Using the generated SQL query
   3.4 Data representation
Appendix
   A) Some SQL concepts
   B) A monitoring scenario
   C) Extended list of fields to be shown with absolute values
References
Motivation

Db2 LUW© ships a rich set of monitoring table functions that provide detailed insight into the
state of a database. However, many table functions return a large number of fields. This circumstance can make
tracking potential problems a nasty task. In addition, values for counters are collected from the
start of an entity (database start or connect time of an application process), or from the creation of
an object (an SQL statement in the SQL cache). Problem tracking often requires that you monitor
changes over a defined period of time. Consequently, the task requires that you check the change
of a value over that period of time rather than investigate absolute values.
This document therefore describes an approach for
• collecting Db2 monitoring data in iterations over an extended, but defined period of time
• preparing the data for efficient tracking and investigation
Prerequisites and conventions

Note that in order to follow the steps in this document, you require some scripting
abilities (shell, awk) and enhanced SQL knowledge. We recommend that you have a test system
to reproduce the SQL statements and scripts that are discussed throughout this document.
This document
• has been written for Db2 LUW© version 9.7, or higher, and
• uses the Korn shell as it is available in Linux and UNIX environments. In addition, we
also provide bash examples.
The SQL commands presented in this document can be run against a database via the Db2
CLP. Unless stated otherwise, copy and paste the SQL text into a file, replace the parts shown in blue by the
proper values, terminate the statement with a semicolon, and run the file using the following OS
command:
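For example (the file name my_statement.sql is only a placeholder):

db2 -tvf my_statement.sql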
Terminology

Term             Definition
diff value       Difference between two values
Table function   A Db2 LUW© monitoring table function as shipped with the product.
                 This document refers to mon_get_...() table functions.
1 Collect data
1.1 How to query table functions
The number of table functions that come with Db2 LUW© is increasing with higher release levels.
In addition, the content is changing as there are more columns for many of the table functions as
well. To ensure that you don’t miss important information that is available, but still have a
procedure that works across different release levels of Db2 LUW©, you need a generic SQL
structure that allows for easy access to the complete data set you intend to collect.
Furthermore, when you query the result set of a monitoring table function like
mon_get_database(), you need to take extra care to save the time of the data collection.
Therefore, the data collection procedure requires the following.
• Queries on one table function can be modified easily to query another table function.
• Additional information for easier problem analysis, like the time of the data collection,
can easily be added.
For table function mon_get_database(), a query that meets these requirements looks as follows:
select
current timestamp as collection_timestamp,
t.*
from table(mon_get_database(-2) ) as t
The time of the data collection is retrieved from the Db2 special register CURRENT TIMESTAMP.
You can also add other information that you might find useful.
This SQL query is generic with regard to the number of columns returned by the table function.
We can expect this query to run on every level of Db2 LUW©, provided that the table function
exists on that level. For example, the corresponding query for table function
mon_get_connection() is:
select
current timestamp as collection_timestamp,
t.*
from table( mon_get_connection(NULL,-2) ) as t
So you just need to replace one table function with another and make sure to use the proper
parameter list for the table function.
The generalized SQL query structure then is the one shown below. The part to be adjusted is
shown in blue.
select
current timestamp as collection_timestamp,
t.*
from table( TABLE_FUNCTION ( [parameter list] )) as t
Below, these options are discussed in more detail. However, this document focuses on the export
of data.
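As a sketch, such an export can be issued from the Db2 CLP roughly as follows (FILENAME is a
placeholder, TABLE_FUNCTION stands for the table function being monitored, and the LOB clauses
are only needed if the table function returns LOB columns):

db2 "export to FILENAME.ixf of ixf
     lobs to . lobfile FILENAME modified by lobsinfile
     select current timestamp as collection_timestamp, t.*
     from table( TABLE_FUNCTION ( [parameter list] ) ) as t"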
With the above command, the data part of the table function is then stored in FILENAME.ixf
while the LOB data go to a file named FILENAME.001.lob. Of course, to benefit from this
method, you have to import the data into a target environment. This is described later in this
document.
The EXPORT command of Db2 LUW© issues a warning SQL27984W indicating it does
not have any table structure information that can be stored in the output file.
This warning is returned due to the nature of the SELECT statement being used for the
EXPORT, and is not a reason for concern.
Once the table exists, you can collect data using the following command:
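As a sketch (the table name mon_data is only a placeholder for the previously created table, and
TABLE_FUNCTION stands for the table function being monitored):

db2 "insert into mon_data
     select current timestamp as collection_timestamp, t.*
     from table( TABLE_FUNCTION ( [parameter list] ) ) as t"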
For most table functions, querying with a delay of at least 30 seconds does not cause too
much impact. For other table functions, it is advisable to be a bit more careful. The most
prominent example is mon_get_pkg_cache_stmt() which returns the content of the
SQL cache. Apart from the impact on the database being monitored, the amount of data
returned is considerable, and storing the data in a table takes a considerable amount of time as
well. Therefore, to query the total SQL cache, use iterations with a delay of several
minutes. A good starting point is 10 or 15 minutes.
To run in iterations, the usual procedure is to use a shell script. We use the Korn shell here, as it is
installed in every Linux and UNIX environment running Db2 LUW©. In addition, we also show a
bash variant, because this shell is very convenient and the preferred shell of many administrators.
#!/bin/ksh
_num_iterations=10
_delay=60
(( i = 0 ))
while [[ i -lt ${_num_iterations} ]]; do
< some activity >
(( i = i + 1 ))
sleep "${_delay}"
done
The intention is to run 10 iterations with a 1-minute delay. The problem with this traditional
approach is that the activities (marked blue) take time and therefore cause the iterations to take
longer than the amount of time indicated by parameter _delay.
Better results are achieved e.g. by a co-process that runs in the background and indicates when
the next iteration can start. To do so, the co-process writes a line once per iteration. The main
process then reads from the co-process and starts the next iteration once the data is read.
_num_iterations=10
_delay=60
# co-process started
{
while true; do
sleep ${_delay}
print "next"
done
} |&
A simple script for collecting data for table function mon_get_database() then looks as shown
below.
_num_iterations=10
_delay=60
_database="DB0"
_logfile="script.log"
touch "${_logfile}"
# co-process started
{
while true; do
sleep "${_delay}"
print "next"
done
} |&
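A sketch of the main loop that could follow, exporting one file per iteration with the file names
mon_get_db0.ixf, mon_get_db1.ixf, … that are used in section 2.1:

db2 connect to ${_database} >> ${_logfile} 2>&1

(( i = 0 ))
while [[ i -lt ${_num_iterations} ]]; do
    db2 -v "export to mon_get_db${i}.ixf of ixf
            select current timestamp as collection_timestamp, t.*
            from table( mon_get_database(-2) ) as t" >> ${_logfile} 2>&1
    (( i = i + 1 ))
    # wait until the co-process signals the start of the next iteration
    read -p _line
done

db2 connect reset >> ${_logfile} 2>&1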
This processing runs with a roughly constant frequency of iterations. Of course, the example
given is a very simple realization of the principles sketched above. However, it works. A
more sophisticated solution based on the script shown is certainly possible.
Using bash
You can derive a basic script for the bash in the same way as above. A useful example is the
following script.
_num_iterations=10
_delay=60
_database="DB0"
_logfile="script.log"
touch "${_logfile}"
# co-process started
coproc drummer {
while true; do
sleep "${_delay}"
echo "next"
done
}
Most parts of the script remain unchanged. However, the syntax of the co-process is different.
You must replace the shell’s print command with echo, and the shell’s read command takes
different arguments, as shown below.
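For example, with the coproc shown above, the main loop can wait for the next iteration like
this (a sketch):

# wait until the co-process signals the start of the next iteration;
# the coproc's output file descriptor is ${drummer[0]}
read -r _line <&"${drummer[0]}"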
• In case of a single performance problem scenario, the tracking methods heavily depend
on the nature of this scenario. In many cases, a general monitoring approach as discussed
in this document is not required.
For a general monitoring approach, a good starting point is the database level, i.e. the table
function mon_get_database(). Depending on the results and to investigate at a more granular
level, you might choose e.g. mon_get_connection() to identify a problem that occurs in
specific connections, or mon_get_transaction_log() if there are problems in the logging
area.
It is advisable to collect all data at the same time from all monitoring table functions being
considered.
The Db2 LUW© level must be compatible with that of the environment where the data
was collected. This means the target environment should have the same Db2 LUW©
version. For levels as of Db2 LUW© 11.1, it must also have the same Modification Pack
(Mod Pack) level.
Tables created to import and investigate data should reside in a separate tablespace so that
there is no interference with other data in the database.
Example
If you take the example of table function mon_get_database(), you will have the export files
mon_get_db0.ixf, mon_get_db1.ixf, and so on from the script developed in section 1.3 .
On the target side, you need to create a tablespace and a table to import the data into.
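One possible way to create these objects (the tablespace name mon_tbsp is only an example, and
the statement assumes that mon_get_database() also exists on the target database; alternatively,
the table can be created with an explicit column list):

db2 "create tablespace mon_tbsp"
db2 "create table tf_database as
     ( select current timestamp as collection_timestamp, t.*
       from table( mon_get_database(-2) ) as t )
     with no data in mon_tbsp"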
The amount of data will be one row per database member in each data collection. In addition, the
table function has no LOBs. Thus, the import command can be used in its simplest form.
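As a sketch, the command file could contain one IMPORT statement per export file:

import from mon_get_db0.ixf of ixf insert into tf_database;
import from mon_get_db1.ixf of ixf insert into tf_database;
-- ... one line per collected file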
Let’s assume the command file has the name tf_database_import.clp. Then you can run
this command file as follows.
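For example (the log file name is only an example):

db2 -tvf tf_database_import.clp -z tf_database_import.log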
Screen output will be dumped to the logfile specified with option -z.
While this is a pattern that is specific to our example, we need a more general SQL pattern that
applies to other table functions as well. Let’s assume we have placed all data from iterations on
some table function in the table called data_table. Then, our SQL pattern will look as shown
below.
select
<field list>
from
data_table a
inner join
data_table b
on
a.join_field1 = b.join_field1 and
...
b.collection_timestamp in (
select max(x.collection_timestamp) from data_table x
where a.collection_timestamp > x.collection_timestamp
)
order by a.member, a.join_field2, … , a.collection_timestamp
The names and numbers of join fields depend on the table function that was queried. The
following rules apply:
• For the join, use either all arguments or a subset of the argument list of the table function.
• All the join fields from the table function, plus the field collection_timestamp,
uniquely identify a row in the table of collected data ( data_table in our example)
Additional information:
• The ORDER BY clause in the SQL pattern ensures that data rows that need to be
compared are grouped together. The field COLLECTION_TIMESTAMP is the last ORDER
BY field because data rows with identical join fields of the table function need to be
compared with each other.
• Counters
Here we have metric data that is counted from the creation of an object or from the start
of the database. Counters typically have integer data types. To properly interpret this data,
we need to look at the diff values of the data in consecutive iterations.
• Non-integer data
Finally, there are non-numerical data types and time stamps. This data is also displayed
“as is”.
The table functions return almost all numeric data as integer data types. Thus, there is no
fractional numeric data to take care of.
So in the query pattern above, there is no generalized form of the field list. In the following
example, we use an SQL query that lists the fields explicitly.
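As a sketch (the choice of fields is only illustrative), such a query for the table tf_database
could look like this:

select
    timestampdiff(2, char(a.collection_timestamp - b.collection_timestamp))
                                          as time_interval,
    a.member                              as member,
    a.collection_timestamp                as collection_timestamp,
    a.num_locks_held                      as num_locks_held,
    a.rows_read      - b.rows_read        as "*ROWS_READ",
    a.rows_returned  - b.rows_returned    as "*ROWS_RETURNED",
    a.total_cpu_time - b.total_cpu_time   as "*TOTAL_CPU_TIME"
from
    tf_database a
    inner join
    tf_database b
on
    a.member = b.member and
    b.collection_timestamp in (
        select max(x.collection_timestamp) from tf_database x
        where a.collection_timestamp > x.collection_timestamp
    )
order by a.member, a.collection_timestamp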
You might notice correlation names with a leading asterisk symbol. This is a good method to
distinguish between fields showing absolute values, like identifiers, and diff values.
2.3 Shortcomings
What is missing in the approach developed so far? Basically, at least two items:
• Compiling and writing down the selection field list in the SQL can quickly become a
tedious task.
• If there are several potential areas to look into, the fields have to be identified, and
classified first (e.g. whether diff values or absolute values are to be shown).
These facts immediately show that the current approach cannot be the end of the story. To have a
useful procedure, we need to continue.
• A fixed part
This means we need an SQL statement that works as a generator of an SQL query.
If you run this SQL query, the name of the data table is returned. The disadvantage of having the
table name in the generator statement is that each time a new data table is being used you need to
modify the generator query. To avoid this, use an SQL variable.
You can create the variable outside the script. As value, assign the proper table name. From OS
level, run the following commands:
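For example (the value 'TF_DATABASE' is only an example for the name of your data table):

db2 "create variable mondb2_tabname varchar(128)"
db2 "set mondb2_tabname = 'TF_DATABASE'"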
To make use of this construct, you can put the result of the SQL statement into a table within a
compound SQL statement.
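A minimal sketch of such a compound statement, assuming a hypothetical helper table
gen_lines(txt varchar(4000)) and a non-default CLP statement terminator:

--#SET TERMINATOR @
begin
    delete from gen_lines;
    insert into gen_lines ( txt )
        values ( mondb2_tabname );
end@
--#SET TERMINATOR ;
select txt from gen_lines;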
The result of this SQL statement is the same as above. Compound statements will turn out to
become an essential part of the solution.
This SQL statement will run fine if you have created the SQL variable
mondb2_tabname as described above. Create an SQL command file as described in
section Prerequisites and conventions at the beginning of this document, and run it
against a Db2 database to retrieve the current content of the SQL variable.
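For example:

db2 "values(mondb2_tabname)"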
Field (or column) names have the same meaning across all table functions, although the
scope may differ. This means that the field NUM_LOCKS_HELD is always the number of
locks being held currently. However, in table function mon_get_database(), this
number is evaluated for the whole database, while in table function
mon_get_unit_of_work() it refers to the number of locks being held by the UOW
(database transaction) that the collected row refers to.
This means that the list of fields with absolute values can be used for all table functions.
For our query, the column names again go into a table, just as before.
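As a sketch, such a table could be created and filled like this (using the table name ABS_COLS
that the text refers to; the field names listed are only an illustrative subset):

create table abs_cols ( colname varchar(128) not null );
insert into abs_cols ( colname )
values ('MEMBER'), ('APPLICATION_HANDLE'),
       ('NUM_LOCKS_HELD'), ('COLLECTION_TIMESTAMP');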
The list above is incomplete in the sense that across the existing monitoring table functions there
are quite a lot of fields that should be displayed as absolute values. However, the list can be easily
extended to include additional fields. For a more comprehensive list, refer to appendix C.
The SQL statement just returns the field names put into table ABS_COLS above.
The SQL statements in this document assume that you always work in a single database
schema. In particular, it is required that
• the data table exists, and is created in the current schema of your database
connection.
• the SQL variable mondb2_tabname exists, and is created in the current schema
of your database connection.
If you reproduce the SQL statements in a local environment, then your database
connection, the data table, and the SQL variable must have the same schema value. For
information on how to adjust schema settings, see Appendix A.
Note that the generated field list starts with a comma. The selection fields are separated by
commas, so when the final SQL command is put together, a first selection field still needs to be
specified.
Note that potential join conditions are those fields that together identify a single row within the
collected data. Therefore, apart from the field COLLECTION_TIMESTAMP, this list will consist of
the parameter list (or a subset of it) of the table function used in the data collection. Conversely, if
certain identifier fields like APPLICATION_HANDLE exist, the conclusion is that this field will
need to be used as a join condition.
The entries are enumerated according to the database catalog. Thus, the sorting of the data table
can be adjusted to follow the order of occurrence of fields.
Again, the assumption is that the data table has the same table schema as the schema of the
current connection.
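As a sketch, the column list of the data table can be retrieved from the catalog view
SYSCAT.COLUMNS like this:

select colno, colname, typename
from   syscat.columns
where  tabschema = current schema
and    tabname   = mondb2_tabname
order by colno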
From the list returned, the join conditions and the ORDER BY list can be built.
As mentioned before, the field list of the SQL statement requires a first field. This is added as the
time difference of the data rows being compared, measured in number of seconds between
iterations. Apart from its use for a correct SQL syntax, the field just added is useful for verifying
the length of each iteration.
Similarly, the first entry in the join conditions has to be treated differently than the rest.
A look at the general form of the SQL statement to be generated helps to get a picture of what to
do.
As a special requirement, it turns out that the static part consists of several pieces spread across
the SQL text. To realise the correct order in the final SQL statement, we use an additional
numbering scheme.
If you want to display the time difference between data rows in a different format, for
example as <seconds>.<microseconds>, replace the first field in the SQL
statement above by the following expression:
timestampdiff(2,char(timestamp(a.collection_timestamp)-
timestamp(b.collection_timestamp))) +
( (microsecond(timestamp(a.collection_timestamp)-
timestamp(b.collection_timestamp)) + 1000000)
% 1000000) * 0.000001
as time_interval
If you maintain the table name in the SQL command for the static part, the result resembles the
first query from section 2.2.
We then need the static part that takes care of fixed language elements in the SQL query to be
generated.
Note that the sub-selects for the selection fields, join conditions, and the ORDER BY clause
contain a literal number as first field to properly fit into the static part that was carefully designed
for this task.
After you have put together all the pieces, end the SQL statement with a semicolon and save the
result in a file. Let’s say the file name is SQLMonGenerator.sql. Then you can run the SQL
query via Db2 CLP.
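For example (the value 'TF_DATABASE' is only an example for the name of your data table):

db2 "create variable mondb2_tabname varchar(128)"
db2 "set mondb2_tabname = 'TF_DATABASE'"
db2 -x -tf SQLMonGenerator.sql > SQL_Output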
The first command is only required if the SQL variable mondb2_tabname does not yet exist. For
more information, refer to appendix A. To work with the output of the generated SQL query, save
the retrieved information in a file. Let’s use the file name SQL_Output.
The command line option -x instructs the Db2 CLP command to suppress header and trailing
information. In addition, the SQL query being generated is terminated by a semicolon, so the
output can be run immediately. This looks similar to what was used in section 2.2. However,
instead of just a few fields, the query is using all fields available in the table.
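Running the generated query and saving its result could then, as a sketch, look like this (the
output file name query_result.out is only an example):

db2 -tf SQL_Output > query_result.out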
Performance
For a small number of rows in the data table, the performance of the generated SQL query will be
fine. However, depending on the table function being used and the number of iterations run to
generate the data, the join condition on field COLLECTION_TIMESTAMP potentially causes a
performance issue. Therefore, from a performance perspective, it is helpful to create an index on
this field.
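For example (the index name is only an example):

db2 "create index tf_database_ix1 on tf_database (collection_timestamp)"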
Manual modifications
You might want to add further information to the data displayed, e.g. the ratio between certain
fields. The following is an example of the ratio of rows returned compared to rows read that you
may want to calculate.
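A sketch of such a derived field, to be added manually to the generated field list (the output
column name is only an example):

case when a.rows_read - b.rows_read > 0
     then decimal(a.rows_returned - b.rows_returned, 31, 6)
          / (a.rows_read - b.rows_read)
     else null
end as "*ROWS_RETURNED_PER_ROW_READ"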
Note that this will make the data generation more complex because you need to take extra care,
for example, to avoid division by zero.
It is easier to use the data as generated by the provided SQL procedure, and to add derived
information later, during data representation, e.g. by using a pivot table in the spreadsheet.
There are still situations where manual modifications of the generated query can be helpful. To do
so, see the example discussed below where the amount of data returned is limited.
The dashed line can be used to identify the length and offset of the field content. A simple form of
the script is shown here. The field separator character of the output is @ by default.
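A minimal sketch of such a script (the exact implementation may differ) could look like this; it
derives the column offsets from the dashed line and takes the output field separator from the
awk variable sep:

#!/usr/bin/awk -f
# Sketch of a converter for Db2 CLP query output:
# the dashed line below the column headers defines offset and length of
# every field; all following lines are split accordingly and printed with
# the separator taken from the awk variable "sep" (default: @).
BEGIN { if (sep == "") sep = "@" }

function trim(s) { gsub(/^[ \t]+/, "", s); gsub(/[ \t]+$/, "", s); return s }

# the dashed line: groups of "-" separated by single blanks
nf == 0 && /^-[- ]*-[ ]*$/ {
    n = split($0, g, " ")
    pos = 1
    for (i = 1; i <= n; i++) { start[i] = pos; len[i] = length(g[i]); pos += len[i] + 1 }
    nf = n
    # print the header line that directly precedes the dashed line
    out = ""
    for (i = 1; i <= nf; i++) out = out (i > 1 ? sep : "") trim(substr(header, start[i], len[i]))
    print out
    next
}
nf == 0 { header = $0; next }          # remember the line before the dashes
/record\(s\) selected/ { next }        # skip the CLP trailer line
nf > 0 && NF > 0 {
    out = ""
    for (i = 1; i <= nf; i++) out = out (i > 1 ? sep : "") trim(substr($0, start[i], len[i]))
    print out
}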
If you need to use a different field separator like the colon, simply run the script with the
modified command line shown below.
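For example, assuming the sketch above is saved as convert_output.awk and the query output is
in query_result.out:

awk -f convert_output.awk query_result.out > query_result.txt
awk -v sep=":" -f convert_output.awk query_result.out > query_result.txt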
You can then read the output file using a spreadsheet calculator. Specify the proper field
separator character and no text delimiter character.
There are other table functions that potentially provide much more data. For example, take
mon_get_connection() which returns one row per connection and database member. If you
monitor a database with one member and e.g. 2000 connections, the data collection outlined
above will give you 120 data sets of 2000 entries each, providing a total of roughly 240,000 entries.
That’s why working with output from this table function should start with information that lets
you limit the investigation of this data to a subset of database connections and focus on a certain
area.
Let’s assume you have collected data for mon_get_database() and mon_get_connection()
in parallel. You see from mon_get_database() that there was a particularly high amount of
rows read during the monitoring time period, indicated by a.rows_read-b.rows_read (field
"*ROWS_READ" in the SQL output). You then want to find the application handles that were
reading the highest amount of data. To do so, you take output from table function
mon_get_connection() that was collected at the same time as mon_get_database() to
import the data collection into a table, e.g. tf_connection. You generate the SQL for that data
collection following the procedure as described. Now you modify the query to put it into another
select * with an ORDER BY clause on field "*ROWS_READ". It may be useful to limit the
amount of data returned using a FETCH FIRST clause. In the example used here, the changes will
be as follows (marked red).
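As a sketch, such a wrapper around the generated query could look like this (the inner part
stands for the generated query, and the FETCH FIRST value is only an example):

select * from (

    -- the generated SQL query for table tf_connection goes here, unchanged
    -- (drop the trailing semicolon of the generated query)

) as t
order by "*ROWS_READ" desc
fetch first 20 rows only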
This query returns the data collections and application handles with the highest amount of rows
read. You can then take the original query and restrict it to just the most significant application
handles that give you an amount of data you can handle more easily.
With the approach outlined, you can always put the generated query into a select statement, so
modifications consist of adding lines to the beginning and end of the generated query.
Appendix
A) Some SQL concepts
In this section, you can find help on basic SQL concepts used in this document.
To query the current value of an SQL variable, e.g. var_name, run the following command:
db2 "values(var_name)"
To query the current schema of your database connection, run the following command:
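For example:

db2 "values(current schema)"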
You can modify the current schema of your database connection to a new value, e.g.
new_schema, using the following command:
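For example:

db2 "set current schema new_schema"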
The schema of a table, e.g. tf_database, is queried from the database catalog. Use the
following SQL statement:
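For example (note that unqualified object names are stored in uppercase in the catalog):

db2 "select tabschema from syscat.tables where tabname = 'TF_DATABASE'"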
You can query the schema of an SQL variable in a similar way. Let’s use the SQL variable
mondb2_tabname. Use the following SQL statement:
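For example:

db2 "select varschema from syscat.variables where varname = 'MONDB2_TABNAME'"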
The schema of an existing SQL variable cannot be modified. However, with the above
information, it is easy to first set the proper database connection schema, create the SQL variable,
and assign the desired value to the (re-)created SQL variable.
B) A monitoring scenario

In this scenario, the collection script from section 1.3 runs 61 iterations with a delay of two
minutes:
_num_iterations=61
_delay=120
3. Transfer the 61 files to the target environment for investigation, create the data table, and
import the data as shown in section 2.1.
5. Set SQL variable mondb2_tabname to the proper table name and run the generator SQL
query script.
6. Now, the query to be used for investigation is available. Run that query and save the SQL
output in a file.
7. Convert the SQL output file using awk as explained in section 3.4 and redirect the output
to a text file.
8. Open the generated text file using the spreadsheet calculator of your office product.
Note the time interval, which in this example is always roughly 120 seconds. The leading asterisk
in the header of the CPU time column indicates that the displayed numbers are per iteration
(i.e. per ~120 seconds).
From this data, you can easily generate a chart that clearly shows the peak in the CPU time
consumption. To further investigate, you can then narrow down the tracking to the time of high
CPU utilization and e.g. check for applications that could have been responsible for this increase.
References
• Monitoring Elements
  https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.mon.doc/doc/c0059125.html