
The Association of System

Performance Professionals

The Computer Measurement Group, commonly called CMG, is a not for profit, worldwide organization of data processing
professionals committed to the measurement and management of computer systems. CMG members are primarily concerned
with performance evaluation of existing systems to maximize performance (e.g., response time, throughput, etc.) and with capacity
management where planned enhancements to existing systems or the design of new systems are evaluated to find the necessary
resources required to provide adequate performance at a reasonable cost.

This paper was originally published in the Proceedings of the Computer Measurement Group’s 2010 International Conference.

For more information on CMG please visit http://www.cmg.org

Copyright 2010 by The Computer Measurement Group, Inc. All Rights Reserved
Published by The Computer Measurement Group, Inc., a non-profit Illinois membership corporation. Permission to reprint in whole
or in any part may be granted for educational and scientific purposes upon written application to the Editor, CMG Headquarters,
151 Fries Mill Road, Suite 104, Turnersville, NJ 08012. Permission is hereby granted to CMG members to reproduce this
publication in whole or in part solely for internal distribution with the member’s organization provided the copyright notice above is
set forth in full text on the title page of each item reproduced. The ideas and concepts set forth in this publication are solely those
of the respective authors, and not of CMG, and CMG does not endorse, guarantee or otherwise certify any such ideas or concepts
in any application or usage. Printed in the United States of America.
WORKLOAD CORRELATION AND VISUALIZATION

Tom Wilson

A great way to make your point is to “show them”. This paper demonstrates
some simple comparison and correlation methods for studying workload using
visualization as an aid in the analysis. First, we examine some metrics for a real
system and perform correlations between a few parameters. Then, we compare
workloads from various samples to determine an appropriately representative
interval upon which to base a performance test. These methods provide confidence
for the system development and testing.

1 Introduction
As engineers in a multi-program environment, we sometimes become part of a team where the program is already well
under way. Part of the learning curve involves reading established documents and reviewing existing work. Occasionally,
we find opportunities for improvements and have to justify them to a decision-maker who is not necessarily familiar with
the analysis details. One of the simplest communication mechanisms is a graph or chart. It is far easier to argue a point
by visualizing the data rather than by showing a table of numbers, arguing emotionally, and waving your hands. We will
demonstrate several visualization analyses, two of which convinced the decision-makers to make improvements to the work
being done.
This particular program consists of a series of projects developing proprietary transaction systems. Each system builds
upon the previous, bringing more functionality to a continuously-growing user base. The systems support logistics and
maintenance for military equipment. For the purpose of anonymity, only general details about the system are provided
here. One of the concerns common to all of the systems is performance.
A key component to system performance testing is the usage of a relevant workload. Since the performance test has a
short duration (i.e., on the order of hours), the test workload should represent that portion of real workload that stresses
the resources of the system. When an existing system provides the real workload, we are left with the decision as to how
to sample that data to create the test workload. If we consider two samples, how can we compare them?
A workload characterization effort would associate resource usage with the transactions in order to determine some
equivalence. For our system, associating resources with transactions was deemed difficult due to lack of support within the
existing software. Carrying out a large number of isolated tests as an alternative to determining this association was not
considered feasible. We need an alternative to characterization so that we can quickly compare workloads. The literature
contains numerous papers on workload characterization (e.g., [EM02]), but little on simple comparison.
In this paper, we want to investigate several concerns. Is a sample workload representative? If the wrong workload
is modeled, then an incorrect evaluation of performance will exist. If the test results are good, then problems may arise
in production because a different workload exists. If the test results are bad, then over-engineering may occur, raising
cost to solve problems that may never arise. In order to answer this question, we need to be able to compare workloads
and make decisions based on the comparison. The results of the comparisons should guide us toward selecting the workload
upon which to base a test. Past testing selected a sample based on experience and had no supporting analysis. This is
common in a pressure-filled development environment, especially where review is absent, but does not necessarily lead to
a wrong answer.
Will the workload change as load increases? If it does, we need to understand how, so that our tests are appropriate.
To answer this question, we must first confirm that load is a function of users and that the user base is growing. If both
are confirmed, we need to understand how the workload has changed as the user base grew in the past.
We will begin by analyzing some source data. In this paper, the analyses will be constrained to those that support
answering our questions. Then, we will define how we can compare workloads.
2 Source Data Analysis
A significant amount of source data is available from the existing system against which many analyses can be performed.
Some of those data are captured in transaction logs, which include the beginning and ending times of the transactions.
From these data, a sampling was performed to determine the number of transactions per hour, the number of users per
hour, and the number of logins per hour (see footnote 1). The source data cover an eight-month period during which the
system was in continual use. User load varies because of many factors, such as the day of the week, the time of day, and
seasonal events (e.g., holidays). This behavior will be illustrated in Section 2.1.
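As an illustration of the hourly sampling described above, the following is a minimal sketch only. The paper does not describe the log format, so the record layout (start time, user identifier, transaction name), the `Login` transaction name, and the helper `hourly_metrics` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (start time, user id, transaction name).
# The real transaction log layout is not given in the paper.
records = [
    ("2010-10-12 08:05:10", "u1", "Login"),
    ("2010-10-12 08:07:42", "u1", "Search"),
    ("2010-10-12 08:30:00", "u2", "Login"),
    ("2010-10-12 09:01:15", "u2", "Search"),
    ("2010-10-12 09:10:00", "u1", "Update"),
]

def hourly_metrics(records, login_name="Login"):
    """Return {hour: (transactions, distinct users, logins)}."""
    transactions = defaultdict(int)
    users = defaultdict(set)
    logins = defaultdict(int)
    for start, user, name in records:
        # Truncate the start time to the beginning of its hour.
        hour = datetime.strptime(start, "%Y-%m-%d %H:%M:%S").replace(
            minute=0, second=0)
        transactions[hour] += 1
        users[hour].add(user)
        if name == login_name:  # assumption: logins appear as a named transaction
            logins[hour] += 1
    return {h: (transactions[h], len(users[h]), logins[h])
            for h in sorted(transactions)}

metrics = hourly_metrics(records)
for hour, (tx, user_count, login_count) in metrics.items():
    print(hour, tx, user_count, login_count)
```

Counting distinct users per hour with a set keeps a user who logs in several times from being counted more than once, which is what allows the users and logins series to diverge.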
The source data contain various problems, most of which affect only a small percentage of the total volume. However,
one frequent problem concerned the naming of transactions. For reasons that we cannot explain, a large number of
transactions were named “Unknown Transaction”. This many-to-one mapping masks what the real frequencies of these
transactions are. At one point during the maintenance lifetime of the system, many of these transactions were distinctly
renamed. We anticipate a poor result when comparing a sample from before this point to a sample after this point.
First, we will look at metrics for a few parameters during several intervals. Then, we will perform correlations between
those parameters. Finally, we will perform a trend analysis. These analyses are foundational to answering our workload
questions.

2.1 Metrics
Figure 1 shows metrics for the three parameters for the eight-month interval. The data consist of hourly counts for
each metric. A weekly pattern is fairly apparent; each “column” consists of data for the weekdays, while the gaps in
between them are the weekends. The “valleys” that occur in April and August are due to holidays. The users and logins
graphs use the same y-axis scale, so as to better appreciate their relationship.

[Figure: three stacked time-series panels titled Transactions, Users, and Logins; x-axis: month (Mar through Nov); y-axis: Count.]

Figure 1: These charts show metrics for the three parameters over an eight-month period: transactions, users, and logins.
The users and logins charts use the same y-axis scale; the transactions chart does not.

Figure 2 shows a few charts of the same three parameters for specific one-month and one-week time intervals. In
Figure 2a, the weekly pattern is now more obvious: five work days followed by two non-work days (i.e., a weekend). While
there are some minor differences in the data for each day, nothing easily explains them. Figure 2b shows data for
one week. Most weekdays show a common double-hump feature that results from peaks before and after a lunch break,
although Friday lacks this feature. This double-hump artifact is a property of people, not the system.
Most other months are similar to the example, although the peaks might be lower. Months with holidays are certainly
different, but we are not expecting to base a test on a holiday load.
Footnote 1: We have provided login data purely for contrast with the other parameters.
[Figure: two panels, (a) "Month: October" and (b) "Week: October 11-18", each plotting Transactions, Users, and Logins; x-axis: Date; y-axis: Count.]

Figure 2: These charts show the three metrics over (a) a one-month interval and (b) a one-week interval.

Figure 3 shows two different one-day intervals. Figure 3a shows transaction counts during a typical work day. The
corresponding graphs of the other metrics would look similar and are not shown. Figure 3b shows transaction counts
during an atypical work day, where some event has caused a reduction in transaction counts around 8:00. Figure 3c shows
the user and login counts for the same time period. The user counts are similar in shape to transaction counts, while the
login counts reveal a spike reflecting users logging in numerous times. The event probably brought the system down and
required the users to log in again.

[Figure: three panels, (a) "Typical Day: Transactions" (October 12), (b) "Atypical Day: Transactions" (October 13), and (c) "Atypical Day: Users and Logins" (October 13); x-axis: Time; y-axis: Transactions for (a) and (b), Count for (c).]

Figure 3: These charts show the three metrics during a day interval: (a) Transactions during a typical work day, (b) transactions
during a work day with an anomaly, and (c) logins and users during that same atypical day.

These metrics give some insight into system usage and are the basis for several subsequent analyses. What we did not
study here is what the users did on the system (i.e., the activities).
2.2 Parameter Correlation
Correlation expresses a (linear) relationship between two parameters. If the relationship is strong, then one parameter
may be a function of the other. Correlation can be computed by a function and expressed as a quantity (called the
correlation coefficient), and it can be visualized with a scatterplot and a regression line. We will look at the correlation
between pairs of our three parameters. One expectation is that the number of users dictates the number of transactions.
Figure 4 plots each parameter against the other as a matrix of graphs. The cells of the diagonal contain the parameter
name. Any other cell is a comparison of the two parameters specified in the cell’s row and column. The cells below the
diagonal show a scatterplot with a blue regression line and the correlation coefficient (in green). Cells above the diagonal
repeat the coefficient shown below the diagonal for readability purposes. The correlation coefficient for users and
transactions is very high (0.990), which confirms our expectation that the number of users drives the number of
transactions.
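The paper does not state which correlation coefficient or regression method was used. Assuming the usual Pearson coefficient and an ordinary least-squares line, for $n$ paired hourly observations $(x_i, y_i)$ (for example, users and transactions in hour $i$; the symbols are introduced here for illustration), the quantities would be:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\hat{y} = a + b x,
\quad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\quad
a = \bar{y} - b \bar{x}
```

Under this assumption, $r$ lies between $-1$ and $1$, and a value near $1$ means the points fall close to a rising regression line.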

[Figure: "Parameter Correlation" scatterplot matrix of Users, Logins, and Transactions. Correlation coefficients: Users vs. Logins 0.964; Users vs. Transactions 0.990; Logins vs. Transactions 0.961.]

Figure 4: This chart shows the correlation between three metrics: users, logins, and transactions. Note that axes have different
scales.

For contrast, we also compare the number of logins to the number of users. The correlation coefficient is still fairly
high, but not as high as the previous comparison. It is easy to see many outlying values that reduce the coefficient. One
of them corresponds to the feature in Figure 3c where the number of logins was much higher than the number of users.
Many others correspond to the spikes in Figure 1. The anomalies are not numerous and can probably be explained (e.g.,
a system crash). A similar correlation occurs between logins and transactions. Many outlying values are visible. Note that
the scatterplot tells us more than the correlation coefficient does (because it expresses more data).
We have performed the correlation to establish a strong relationship between users and transactions. This is necessary
as we consider a growing user base. This will imply a growing transaction load. Our later analysis will attempt to determine
if the workload changes with the growth.
Another reason for doing this analysis is that some testing results were providing evidence against
a relationship. The results here say that we should question our tests if the number of transactions in a test
does not relate to the number of users. It turns out that some simulated users were failing during tests, resulting in a varying
number of transactions for the tests. In reality, the number of users was sometimes lower than was reported. Therefore,
the relationship was still probably very strong. Thorough analysis will often uncover problems and eventually explain them.

2.3 Trending
Trending analysis provides us with an understanding of a general parameter behavior over time. More specifically, we
are interested in whether or not the load on the system is growing. The rate of growth is not an issue here, but could be
for a capacity planning effort. What we need to know is whether or not the workload changes as the load grows. We will
limit ourselves to the “growth” aspect here and will consider the impact of the growth on workload in Section 3.2.
A trend is computed via linear regression. Figure 5a shows the growth trend for transaction counts across the entire
measurement interval. Even though the trend shows an increase, its location is deceptive (it is well below the peaks). This
is because the data vary so much.

[Figure: two panels, (a) "Transaction Count Trending" (Transactions and Trend; x-axis: Date, Mar through Nov; y-axis: Count) and (b) "Transaction Count Peak Trending" (Transaction Peaks and Trend; x-axis: Week Number, 10 through 40; y-axis: Count).]

Figure 5: These charts show the transaction growth trend for the measurement interval. (a) This chart shows the trend
computed from all of the data. (b) This chart shows the trend computed from weekly maximums.

We would prefer to understand how the peaks are trending. Peaks can be computed in a few ways, but we will compute
a maximum transaction count for each week. This will eliminate a lot of data from the regression. Figure 5b shows the
growth trend for transactions by looking only at the peaks. Again, an increase is shown. The elimination of detail is very
helpful. The seasonal events are more obvious with this view of the data. Their impact on the trend location is worth
noting.
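The peak-trend computation described above can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be (week number, hourly transaction count) pairs, the sample values are invented, and the helper names `weekly_peaks` and `linear_trend` are hypothetical. The fit is an ordinary least-squares line.

```python
from collections import defaultdict

def weekly_peaks(hourly):
    """Map week number -> maximum hourly count seen in that week."""
    peaks = defaultdict(int)
    for week, count in hourly:
        peaks[week] = max(peaks[week], count)
    return dict(sorted(peaks.items()))

def linear_trend(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (week number, hourly transaction count) samples.
hourly = [(10, 9000), (10, 12000), (11, 12500), (11, 4000),
          (12, 13000), (12, 11000), (13, 13500)]

peaks = weekly_peaks(hourly)
slope, intercept = linear_trend(list(peaks), list(peaks.values()))
print(peaks)
print(slope, intercept)
```

Reducing each week to a single maximum discards the low overnight and weekend counts, so the fitted line sits near the peaks instead of well below them.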
Both charts in Figure 5 tell us that there is an increase in load over time. This is expected and the explanation is
simple: The customer has been increasing its user base over time as more and more organizations have transitioned to
using the system. This explanation implies a user growth rather than a transaction growth, but we already know there is
a strong relationship between users and transactions. While the first chart is sufficient to establish the growth trend, it
may not leave a lasting impression due to the location of the trend line. The peak trend is certainly more demonstrative.

3 Workload Correlation
Workload is the collection of activities being executed on the system. It is a function of the number of users, the types
of activities, and their timings. The timing of activities will be effectively eliminated by just counting the activities over an
interval. These counts are a function of the number of users. For simplicity, we will normalize the counts to frequencies.
So, a workload is summarized as a set of frequencies. Different user counts and different intervals can be compared if we
can compare sets of frequencies.
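A minimal sketch of the normalization step follows, assuming a sample is simply a list of transaction names observed during an interval. The function name `frequencies` and the sample data are hypothetical.

```python
from collections import Counter

def frequencies(transaction_names):
    """Normalize per-transaction counts to fractions of the sample total."""
    counts = Counter(transaction_names)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

# Two hypothetical samples of very different size but the same mix.
small_sample = ["Search"] * 3 + ["Update"]
large_sample = ["Search"] * 300 + ["Update"] * 100

small_freq = frequencies(small_sample)
large_freq = frequencies(large_sample)
print(small_freq)
print(large_freq)
```

Because both samples reduce to the same set of frequencies, intervals with different user counts or durations can be compared directly.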
We will define workload correlation and then compare various workloads in order to answer our remaining questions.
Our analyses in the previous section support the analyses here.
3.1 Example Correlation
We will adjust the concept of correlation slightly to apply it to workload. Instead of comparing data for two parameters,
we will be comparing two sets of frequencies. In the case of the parameter correlation presented earlier, the data are paired
according to the time at which they were measured. Such a pairing can also exist based upon a subject with which the
measurements are associated (e.g., a subject’s height and weight). Here, the pairing is formed by the transaction name
(e.g., a search transaction).
We will demonstrate the workload correlation concept with three trivial workload samples shown in Table 1. There
are five transactions. Each sample consists of a collection of transactions from which the frequencies of the transactions
are derived. The first two samples “look” similar to each other and should have a high correlation. The third sample is
significantly different.

Table 1: Sample Workloads

                        Sample
Transaction         1      2      3
Transaction 1      35%    35%    20%
Transaction 2      30%    35%    15%
Transaction 3      25%    20%    30%
Transaction 4       7%    10%    10%
Transaction 5       3%     0%    25%

Figure 6 shows a scatterplot, the regression line, and the correlation coefficient for each pair of samples. Here, strong
correlation is shown in green and poor correlation is shown in red. Sets 1 and 2 are similar; set 3 is significantly different
when compared to either of the other two. Other examples can be created where the correlation coefficient can take on
any value between -1 and 1 (see footnote 2). What values are considered to be “good” is subjective.
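The coefficients for the three samples in Table 1 can be reproduced with a short sketch. It assumes the Pearson coefficient and pairs the frequencies by transaction name. Treating a transaction that is missing from one sample as frequency 0 is an assumption made here, and the function name `workload_correlation` is hypothetical.

```python
from math import sqrt

# Frequencies from Table 1, keyed by transaction name.
sample1 = {"Transaction 1": 0.35, "Transaction 2": 0.30,
           "Transaction 3": 0.25, "Transaction 4": 0.07,
           "Transaction 5": 0.03}
sample2 = {"Transaction 1": 0.35, "Transaction 2": 0.35,
           "Transaction 3": 0.20, "Transaction 4": 0.10,
           "Transaction 5": 0.00}
sample3 = {"Transaction 1": 0.20, "Transaction 2": 0.15,
           "Transaction 3": 0.30, "Transaction 4": 0.10,
           "Transaction 5": 0.25}

def workload_correlation(a, b):
    """Pearson correlation of two frequency sets paired by transaction name."""
    names = sorted(set(a) | set(b))
    # Assumption: a name absent from one sample has frequency 0 there.
    xs = [a.get(name, 0.0) for name in names]
    ys = [b.get(name, 0.0) for name in names]
    n = len(names)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r12 = workload_correlation(sample1, sample2)
r13 = workload_correlation(sample1, sample3)
r23 = workload_correlation(sample2, sample3)
print(round(r12, 3), round(r13, 3), round(r23, 3))
```

Pairing by name is also why renaming transactions matters: a renamed transaction shows up as a new name with a nonzero frequency in one sample and zero in the other, which pulls the coefficient down.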

[Figure: "Workload Sample Correlation" scatterplot matrix of Sample 1, Sample 2, and Sample 3. Correlation coefficients: Sample 1 vs. Sample 2 0.964; Sample 1 vs. Sample 3 0.100; Sample 2 vs. Sample 3 -0.154.]

Figure 6: This notional example illustrates how correlation is performed for workload samples. A good correlation coefficient
is in green; a bad one is in red.

This example had only 5 transactions. Our data have almost 350. That observation does not change the approach—the
more data the better.
Footnote 2: Whether or not a result of -1 has a logical meaning has not been investigated. Such a result would mean that
one workload is a reflection of the other.
3.2 Correlation of Various Workloads
If we are going to compare real workloads, we need to define some. Figure 7 shows the transaction counts for a
one-month interval where circles highlight the peak transaction hour for each work day and a square highlights the peak
transaction hour for the month. We will define several periods for each month: All hours (AH) is the period (i.e., the
entire month) that contains all data; work-day maximum hours (WMH) is the collection of one-hour periods containing
the most transactions for each work day (i.e., a week day that is not a holiday); and monthly maximum hour (MMH) is
the one-hour period containing the most transactions for the month. So, in Figure 7, the entire interval represents AH,
all of the circles represent WMH, and the square represents MMH. The AH data set contains intervals where the number
of transactions varies greatly. The WMH data set is a small subset of the AH data set, where the number of transactions
in the intervals does not vary greatly. The MMH data set is a subset of the WMH data set.
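The three periods could be selected from hourly counts roughly as follows. This is a sketch under assumptions: the input is a mapping from the start of each hour to its transaction count, the dates and counts are invented, and `select_periods` is a hypothetical helper. MMH is taken as the largest of the WMH hours, which matches the statement that MMH is a subset of WMH and presumes the monthly maximum falls on a work day.

```python
from datetime import datetime

def select_periods(hourly_counts, holidays=frozenset()):
    """Return (AH, WMH, MMH) for one month of {hour start: count} data."""
    ah = sorted(hourly_counts)  # all hours
    peak_hour_by_day = {}
    for hour in ah:
        day = hour.date()
        if day.weekday() >= 5 or day in holidays:
            continue  # skip weekends and holidays
        best = peak_hour_by_day.get(day)
        if best is None or hourly_counts[hour] > hourly_counts[best]:
            peak_hour_by_day[day] = hour
    wmh = sorted(peak_hour_by_day.values())  # one peak hour per work day
    mmh = max(wmh, key=hourly_counts.get)    # peak hour of the month
    return ah, wmh, mmh

# Hypothetical counts: a Monday, a Tuesday, and a Saturday.
counts = {
    datetime(2010, 10, 11, 10): 12000,
    datetime(2010, 10, 11, 14): 13000,
    datetime(2010, 10, 12, 10): 14000,
    datetime(2010, 10, 12, 14): 13500,
    datetime(2010, 10, 16, 10): 900,
}
ah, wmh, mmh = select_periods(counts)
print(len(ah), wmh, mmh)
```

The workload for each period would then be built from only the transactions that fall inside the selected hours.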

[Figure: two panels, (a) "Peak Hours for October" and (b) "Week: October 11-18", plotting hourly transaction counts with work-day peak hours marked by circles and the monthly peak hour marked by a square; x-axis: Date; y-axis: Transactions for (a), Count for (b).]

Figure 7: This chart highlights peak values for the transaction count metric for each weekday (i.e., weekends are not considered)
using circles. The maximum value is highlighted with a square. (a) This chart shows the entire month. (b) This chart shows
one week in order to highlight that the circles and square correspond to only one hour.

Figure 8 shows a matrix of month-by-month comparisons of the AH workloads. The text color is green if the correlation
coefficient is greater than or equal to 0.990, purple if it is greater than or equal to 0.900 but less than 0.990 (there are
none in this figure), and red otherwise. The coloring scheme and the associated ranges are arbitrary. A lot of information
can be quickly consumed with this visualization approach.
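The coloring rule described above is simple enough to state as a small function. The thresholds are those given in the text; the function name `correlation_color` is hypothetical.

```python
def correlation_color(r):
    """Map a correlation coefficient to the text color used in the matrices."""
    if r >= 0.990:
        return "green"
    if r >= 0.900:
        return "purple"
    return "red"

for value in (0.997, 0.944, 0.251):
    print(value, correlation_color(value))
```

Since the ranges are arbitrary, keeping them in one place makes it easy to try other cut-offs when exploring a new data set.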
For months March through July, there is high correlation between each pair of months—similarly, for months August
through October. When comparing a month from the former range to a month from the latter range, the correlation
is very poor. The bad correlation is a result of the transaction renaming modifications as previously described (refer to
Section 2). The unique naming of many transactions had a large impact on the correlation because the pairings are
affected. However, we can probably safely conclude that the workloads are consistent over the eight-month interval.
Figure 9a shows a matrix of month-by-month comparisons of the WMH workloads. The analysis is similar to that for
the AH workloads. Again, we can conclude that the workload is not changing appreciably for different WMH intervals.
Figure 9b shows a matrix of month-by-month comparisons of the MMH workloads. In this case, all of the correlation
coefficients are purple, reflecting less correlation than the other workloads. While the workloads have high correlation,
there is more variation across the workloads in comparison to the AH and WMH workload correlations. While the risk is
probably low that a bad test would be created, it would be advisable not to base a test on such a sample if using a larger
sample is not costly to analyze.
All of the matrices of graphs provide a quick glance at the comparisons. The conclusion that we can draw is this:
Constructing a workload from an interval that is one hour out of a month carries some risk, even if the risk is low. However,
[Figure: "AH Workload Correlation" scatterplot matrix, March through October. Correlation coefficients by month pair:]

       Apr     May     Jun     Jul     Aug     Sep     Oct
Mar   0.997   0.999   0.999   0.999   0.251   0.241   0.248
Apr           0.997   0.998   0.997   0.266   0.256   0.263
May                   0.999   0.998   0.249   0.240   0.247
Jun                           0.999   0.258   0.247   0.254
Jul                                   0.267   0.257   0.264
Aug                                           0.995   0.996
Sep                                                   0.999

Figure 8: This matrix of charts shows a month-by-month comparison of the AH workloads. Color is used as a quick indicator
of high and low correlation.

WMH Workload Correlation MMH Workload Correlation

0.00 0.08 0.00 0.08 0.00 0.06 0.00 0.06 0.00 0.06 0.00 0.06 0.00 0.06 0.00 0.06
0.08

0.00 0.08
Mar 0.995 0.997 0.998 0.997 0.274 0.275 0.281 Mar 0.944 0.969 0.967 0.975 0.236 0.290 0.256
0.00

● ●
0.995 0.944
0.08
0.08

Apr Apr
●

●●
●
●
●●
●
0.997 0.997 0.997 0.291 0.291 0.298 ● ●
● ● ●
● 0.951 0.942 0.900 0.268 0.348 0.302
●●
● ●●● ● ●●
● ●●
●
0.00

0.00

●
●
●
●●● ●●●●
●
● ●
●●●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●●
●●●
● ●
●
●

● ● ● ●
0.997 0.997 0.969 0.951

0.08
0.08

● ●

●
●
●●
●
●
●
●
●●
●●
●● May 0.999 0.998 0.274 0.275 0.280
●
●
●
●
●
●●●
●

●●
●
●
●●● ●
●

May 0.981 0.962 0.246 0.313 0.267

●● ●● ●●
● ●●●
● ●● ●
●
0.00

0.00
●
●●
●
●●● ●
●●
●● ●●● ●●●
● ●
●●
●
●●
●●
●
●●●
●
● ●
●●
●●
●
●
●●
●
●●
●
●●
●●
●
●●●●
●●
●
●
●
●
●
●● ●
●
●
●●
● ●
●
●
●
●
●● ●
●
●
●
●●●

● ● ● ● ● ●
0.998 0.997 0.999 0.967 0.942 0.981
0.08

0.00 0.06

● ● ● ● ● ●

●●
●●
●
●●
●
●●
●●
●
●
●
●●
●
●●
●
●
●
●●
●●
Jun 0.998 0.284 0.285 0.290 ●● ●
● ●● ●
● ●
●
●
●● ●
●
●
● ● ●
● ● ●●
●
●
●
●
● ●●●
●●
●●
●●
●
● ●●
●
●
Jun 0.977 0.265 0.339 0.289
●
● ●●
●● ● ●
●● ●
●● ●● ● ●●
●
0.00

●
●
● ● ●●
● ● ●●● ●
● ● ● ● ●●
●
●
●
●
●● ●●
●
●
●
● ●
●
●
●● ●
●●
●●
● ●●●
●
●
● ●
●●●
●●
● ●●
●
● ●
●●
●
● ●
●
●
●
●
●●
● ●
●
●
●●● ●
● ●
●
●
●
●●●
●
●
●
●
● ●
●
● ●
● ●
●
● ●
●
●
● ●
●
● ●●

● ● ● ● ● ● ● ●
0.997 0.997 0.998 0.998 0.975 0.900 0.962 0.977

0.08
0.08

● ● ● ●

●●
●●
●
●
●
●
●
●●
●●
●
●
●
●●
●●
●
●●●
●
●●
●●
●
●
●●
●
●
●
● Jul 0.281 0.281 0.287
●
●
●
● ●
●
●

●●●● ●
●

●
●
●●

● ●● ●
●●
●

●●
●
●
●
●●
●●●
●
●● ●

●
●●
●●
●
●
●
●
●● ●
Jul 0.240 0.293 0.257
● ●
●●● ●●
● ● ● ● ●
0.00

0.00
●
● ● ●
●●
●
●
● ●●●
● ●
●
●●
● ●
●
●●
●
● ●●●●
●● ● ●
●● ●
●● ● ●●●●
●●● ● ●●●● ●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
● ●
●●
●
●
● ●
●●
●
●
●
● ●●
●
●
●
●●
●
●
● ●
●
●●
●●●●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
● ●
● ●
●
●● ●
●
●● ●
●
●
●●●● ●
●
●●

● ● ● ● ● ● ● ● ● ●
0.274 0.291 0.274 0.284 0.281 0.236 0.268 0.246 0.265 0.240
0.06

● ● ● ● ●
0.06

● ● ● ● ●

●
●
● ●
● ●● ●
●
●
●

● ●
●
●
●
●
● ●
●
●●●
●
●
●
●

●
●
●
●
● ●
●
●
●●
●
●
●

●
●
●
● ●●
● ● ●
●
●
●●

●
●
●
● ●●
● ● ●
●
●
●

●
Aug 0.993 0.993
●

● ●
●
●
●
● ● ●
● ●
●
●

● ●
●
●
●
● ● ●
●
●
●
●

● ●
●
●
●
● ●● ●
●●
●
●
●

● ●
●
●
●
● ●●●
●●
●
●

● ●
●
●
●
● ●● ●
●●
●
Aug 0.955 0.958
●
● ●● ● ●●●
● ●
● ● ● ● ●●
● ● ●●●
● ●● ● ● ●
● ●●
●● ● ● ●● ● ●
● ●●● ●
●●● ● ●●●● ● ● ● ● ● ● ●●● ●
0.00

0.00

● ● ● ● ● ● ● ● ● ● ● ●
0.275 0.291 0.275 0.285 0.281 0.993● 0.290 0.348 0.313 0.339 0.293 0.955
0.06

● ● ● ● ● ● ● ● ● ● ●
0.04
● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ●

●
●
●
●
● ●
●●●
●
●
●
●
●
●
●
●
●
● ● ●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
● ●
● ●●
●
●
●
●
●

●
●
●
●
●●●●
●
●●
●
●●
●
Sep 0.997 ●
●●●
●
● ●
●
●
●● ●
●●●
●
●
●

●
●
● ●
●
● ●
●
●
●●
●●
●
●
●
●
●●
●
● ●
●
●
●
●
●●● ●
●
●
●
●●● ●●
●
● ●●●●
●
●
●
●
●● ● ●
●
●●
●
● ●
● ●
●
●
●
●
●●● ●
●
●
● ●
●● ●
●●●
●
●
●●
●
●●
●
● ● ●
● ●●
●
●● ●
Sep 0.977
●●
● ● ● ●
● ●● ● ● ●
● ●
● ● ● ●
● ●● ● ● ●
● ●● ● ●●●
●
●
●● ●●
●●●● ●
●●●● ●
●
● ●
●●●
● ●
● ●●●
● ●
● ● ●
●
● ●● ●
●
●
●
●●●
0.00

0.00

● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.06

0.281 0.298 0.280 0.290 0.287 0.993● 0.997 0.256 0.302 0.267 0.289 0.257 0.958 0.977
0.06

● ● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ●● ●●

Oct Oct
● ● ● ● ● ● ●
● ● ● ●
● ● ● ● ● ● ● ●
●● ●●
● ● ● ● ● ● ● ●
● ● ●
● ● ● ● ●●
● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ●
● ●
● ●●
●
●
● ● ●● ●●
●
● ●● ●● ●●
●
● ● ● ●
●● ●● ●
● ●●
●
● ●
● ●● ●
● ●
●● ● ● ●
●●● ●
●
● ●
●● ● ●● ●●
● ●
● ●●● ●
●
● ●
● ●● ● ●● ● ●
●
●
● ● ●
● ●● ● ●●
● ●●
●
●
● ● ● ● ● ●●
●
●●
●
●
●●●●●
● ● ●●
● ● ● ●
● ● ● ●
● ● ● ●●
● ● ●
●●
●●
● ●
●●●
●
● ●●●●●●
● ●
● ●●●
●●
●
● ●●● ●
● ● ●
●
● ● ●● ●
● ●●●● ● ●●● ● ● ●
● ●●●●●●
● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ●●● ●
●
0.00

0.00

0.00 0.08 0.00 0.08 0.00 0.08 0.00 0.06 0.00 0.08 0.00 0.08 0.00 0.08 0.00 0.04

(a) (b)

Figure 9: These two matrices of charts show a month-by-month comparison of the WMH and MMH workloads. The correlation
of the MMH workloads is slightly less than the correlation of the AH workloads or the WMH workloads.

it is just as easy to construct a workload from a larger sample that has even lower risk. We prefer the WMH workload to
the AH workload, even though the two workloads do not seem to be noticeably different.
Next we compare the different workloads within a month: AH vs. WMH vs. MMH. Rather than creating
a collection of matrices of graphs, we have opted to create a graph of correlation coefficients. Figure 10a shows these
comparisons in a chart. Although we have plotted the results with lines, the x-axis is categorical. This
presentation format is easier to read than a collection of bars in a bar chart or a collection of matrices. The chart
shows that there is some variation in workload if we limit ourselves to sampling one hour out of the month. If we
instead sampled the work day hours of one month (typically including 20-23 hours), there would be less risk.

[Figure 10 image. (a) Line chart titled "Workload Correlation Within Month": x-axis "Month" (Mar through Oct), y-axis
"Correlation Coefficient" (0.90 to 1.00), with one series each for AH vs. WMH, WMH vs. MMH, and AH vs. MMH.
(b) Scatterplot matrix titled "April Workload Correlation" with the coefficients AH vs. WMH 0.994, AH vs. MMH 0.950,
and WMH vs. MMH 0.956.]

Figure 10: (a) This chart shows how the different workloads compare to each other within the same month. The WMH
workload consistently correlates against the AH workload. The MMH also has good correlation against AH and WMH, but
variation is higher. We should highlight that all values are above 0.9 and are still good. Note that the y-axis does not extend
to 0. (b) This chart compares correlation results for various workloads within the month of April. The results define the points
in the April column of (a).

Figure 10b compares the three data sets within April to illustrate how the data in Figure 10a are computed. April was
selected because it had the lowest correlation coefficients in Figure 10a. The comparisons show the following: When WMH is
compared to AH, a very high correlation exists. When MMH is compared to AH or WMH, the correlation is a little lower.
Considering AH alone, there does not seem to be very much workload variation. This is further supported by analysis
of WMH. We can also conclude that the workload is not changing with growth, since our trending analysis shows that the
user base has grown during the eight-month interval.

4 Use R!
We have intentionally not made this paper about R, but want to highlight its merits. Originally, most of the analysis
was done in Microsoft Excel. The major exceptions are the matrix figures. Excel’s advantages are (1) familiarity to others,
(2) an interactive nature, and (3) the ability to see a lot of numbers quickly. The interactive nature comes from using
forms/controls to select data and to turn features on and off. Excel’s disadvantages are (1) slowness in computing large
spreadsheets, (2) the difficulty in creating some chart types (e.g., creating a boxplot), and (3) the changes that occur
across releases and/or inconsistency across platforms (e.g., PC vs. Macintosh). It is certainly an acceptable analysis tool,
especially considering advantage #1.
The use of R has been described previously (e.g., [Hol04] and [Hol05]), and there are numerous books on it (e.g.,
[ZIM09] and [Sar08]). Once the learning curve is overcome, it is possible to process data more quickly than with Excel.
An expert in Excel requires time to build a system of formulas to present the data. Significantly less work is required in
R. R allows us to graph our data quickly and control its presentation. It allows us to put many parameters in the same
figure or examine the data according to various groups. The graphs in this paper were created with the standard graphics
package ([R D09]) and the lattice package ([Sar09]).
5 Conclusions
One of our primary concerns was the selection of a sample of real workload on which to base a test workload. The
existing choice was to select the one hour period within a month that had the largest transaction count. Such samples from
several months were compared to each other as well as to larger samples. A tremendous amount of data can be quickly
consumed when presented as a matrix of graphs. We concluded that choosing such a sample is not very risky, although
some of that risk is unnecessary. Choosing a larger sample resulted in very little variation in content. Larger samples
included peak-transaction-hours from many work days of the month. The visualization techniques were not necessary, but
were helpful in reaching this conclusion.
We also looked at growth trends using visualization. This confirmed our expectation of how the system was being used.
Visualization was also not necessary here. However, visualization simplifies finding irregularities in the usage patterns (like
the login spike of Figure 3c).
We computed some correlations between metrics for the system. A key relationship was the one between users and
transactions. We confirmed this relationship and suggested corrections to our testing model. Again, the visualization is
not necessary, but is helpful in communicating the results to a decision maker.

References
[EM02] Said Elnaffar and Pat Martin. “Characterizing Computer Systems’ Workloads”. Technical report, 2002.

[Hol04] James Holtman. “Using R for System Performance Analysis”. CMG 2004, 2004.
[Hol05] James Holtman. “Visualization Techniques for Analyzing Patterns in System Performance Data”. CMG 2005,
2005.
[R D09] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria, 2009.
[Sar08] Deepayan Sarkar. Lattice: Multivariate Data Visualization with R. Use R! series. Springer Science+Business
Media, 2008.
[Sar09] Deepayan Sarkar. Lattice Graphics, 2009. R package version 0.17-25.

[ZIM09] Alain Zuur, Elena Ieno, and Erik Meesters. A Beginner’s Guide to R. Use R! series. Springer Science+Business
Media, 2009.
