
The Association of System

Performance Professionals

The Computer Measurement Group, commonly called CMG, is a not-for-profit, worldwide organization of data processing professionals committed to the
measurement and management of computer systems. CMG members are primarily concerned with performance evaluation of existing systems to maximize
performance (e.g., response time, throughput) and with capacity management, where planned enhancements to existing systems or the design of new
systems are evaluated to find the necessary resources required to provide adequate performance at a reasonable cost.

This paper was originally published in the Proceedings of the Computer Measurement Group’s 2001 International Conference.

For more information on CMG please visit http://www.cmg.org

Copyright Notice and License

Copyright 2001 by The Computer Measurement Group, Inc. All Rights Reserved. Published by The Computer Measurement Group, Inc. (CMG), a non-profit
Illinois membership corporation. Permission to reprint in whole or in any part may be granted for educational and scientific purposes upon written application to
the Editor, CMG Headquarters, 151 Fries Mill Road, Suite 104, Turnersville, NJ 08012.

BY DOWNLOADING THIS PUBLICATION, YOU ACKNOWLEDGE THAT YOU HAVE READ, UNDERSTOOD AND AGREE TO BE BOUND BY THE
FOLLOWING TERMS AND CONDITIONS:

License: CMG hereby grants you a nonexclusive, nontransferable right to download this publication from the CMG Web site for personal use on a single
computer owned, leased or otherwise controlled by you. In the event that the computer becomes dysfunctional, such that you are unable to access the
publication, you may transfer the publication to another single computer, provided that it is removed from the computer from which it is transferred and its use
on the replacement computer otherwise complies with the terms of this Copyright Notice and License.

Concurrent use on two or more computers or on a network is not allowed.

Copyright: No part of this publication or electronic file may be reproduced or transmitted in any form to anyone else, including transmittal by e-mail, by file
transfer protocol (FTP), or by being made part of a network-accessible system, without the prior written permission of CMG. You may not merge, adapt,
translate, modify, rent, lease, sell, sublicense, assign or otherwise transfer the publication, or remove any proprietary notice or label appearing on the
publication.

Disclaimer; Limitation of Liability: The ideas and concepts set forth in this publication are solely those of the respective authors, and not of CMG, and CMG
does not endorse, approve, guarantee or otherwise certify any such ideas or concepts in any application or usage. CMG assumes no responsibility or liability
in connection with the use or misuse of the publication or electronic file. CMG makes no warranty or representation that the electronic file will be free from
errors, viruses, worms or other elements or codes that manifest contaminating or destructive properties, and it expressly disclaims liability arising from such
errors, elements or codes.

General: CMG reserves the right to terminate this Agreement immediately upon discovery of violation of any of its terms.
Learn the basics and latest aspects of IT Service Management at CMG's Annual Conference - www.cmg.org/conference

How to validate application quality, performance and scalability

Prepared by: Jack Woolley, Kaiser Permanente

Buy the Latest Conference Proceedings and Find Latest Computer Performance Management 'How To' for All Platforms at www.cmg.org

This member paper presents a time-proven methodology to validate an application's availability, performance and scalability. By the end of the paper, you will know how to apply this methodology yourself.

Join over 14,000 peers - subscribe to free CMG publication, MeasureIT(tm), at www.cmg.org/subscribe

INTRODUCTION

Why Validate Application Quality?

Our management recognized that much of their CPU utilization, response time and availability issues were a result of a "conflict of objectives". This "conflict of objectives" is between the application development groups and the system software/capacity groups.

At the root of this conflict is, on the one hand, the development group's focus on "implementing on time and within budget". (Obviously a proper goal for their role in data processing.) Frequently their employees' performance reviews and/or bonuses rely significantly on these two criteria. Unfortunately, meeting this objective often means doing things the "easy way" and perhaps not necessarily the most efficient way for CPU resource utilization and response times. Sometimes meeting the "on time and within budget" objective means cutting corners and not coding fault tolerance and recovery procedures into the application. The lack of fault tolerance and recovery procedures can cause many availability issues.

All too often the rush is to get the application implemented. The mindset of the application developers is often "implement now and fix it later". Unfortunately, 'later' frequently never comes…

On the other hand, the system software and capacity groups maintain and execute applications on a daily basis. Because of this, their primary objectives are different from the application developers'. Their focus is on application efficiency, application performance and application availability.

In an attempt to balance these disparate objectives, a Production Certification testing methodology was created. In essence, this Production Certification testing is meant to be a final (and overall) application quality check before the application implementation is accepted for deployment into the production environment.

History

Our company didn't begin with a Production Certification testing program. It evolved slowly over a period of more than nine years. The initial objective of the Production Certification testing program was to ensure that applications that used a "new" database (named DB2) didn't over-utilize mainframe CPU resources and cause CICS outages. At that point, this testing was simply named "Stress Testing".

Stress Testing used a mainframe product called TeleProcessing Network Simulator (TPNS). It was used to simulate CICS production transaction loads in order to verify that the new application performed well and consumed a rational amount of CPU resource.

We quickly found that this stress testing allowed us to pre-tune the CICS environments so that new applications could be implemented without causing outages on the first day of production implementation. With this in mind, all new applications (or major application modifications) were required to "pass" a stress test. At this time the pass/fail criterion was based solely on whether the application caused a CICS outage or not.

Later, when availability was under control, our management began to focus on application performance. Soon a "one size fits all" response time objective was created. This criterion consisted of two parts, an "internal" CICS response time and an "external" response time. An internal response time objective of less than two seconds, 95%-ile, and an external response time objective of less than five seconds, 95%-ile, were instituted. The Stress Testing pass/fail criteria were expanded to include these two additional requirements.

Things went along pretty smoothly for several years, and then Client Server (C/S) applications started appearing.

Find a CMG regional meeting near you at www.cmg.org/regions

Obviously the mainframe product (TPNS) was not going to be suitable for this new application category. A frantic search ensued for a C/S stress test tool. We tried several and selected one very expensive tool that "recorded and then simulated" multiple-workstation C/S network traffic to stress the C/S application server. Unfortunately we discovered that testing in this manner consistently gave us misleading and inaccurate results, and we abandoned the use of this tool.

Shortly after this abandonment, we found a relatively inexpensive tool named "Microsoft Test". After several tests we found that it worked well with our stress testing methodology. The "Microsoft Test" tool was later purchased by Rational Software and renamed "Visual Test". Although there are many equally good tools, we have had several years of success testing C/S and Web based applications with this stress test tool.

The stress testing process grew to include several more evaluation criteria as the years passed. Stress testing included workstation CPU consumption, workstation memory consumption, standard Windows GUI compliance, client usability concerns, network utilization, application-workstation compatibility, hardware comparison, failure under load, … The process grew so much that the original name "stress testing" no longer applied. Thus the name was changed to "Production Certification" in an attempt to more accurately describe the methodology.

Implementation

Early in this evolution there were several discussions about creating a devoted "lab" environment to support Production Certification. Up to that point we had always used the client environment (off prime hours) to perform the tests.

The cost of creating a dedicated "lab" environment seemed feasible. However, when we added the costs associated with maintaining the "lab", the rapidly changing software and hardware costs made the dedicated "lab" totally unreasonable. Later we found an added benefit of performing these tests in the client environment… the results were much more accurate than any lab simulation we could have created.

About the same time, some "rules" for Production Certification were identified:

· Production Certification should take no more than two to four weeks.
· The Production Certification team does not resolve the issues/problems that are discovered. They are turned over to the developers/vendors for resolution. However, the Production Certification team will make itself available to assist in resolving issues.
· Application functional testing is to be completed before Production Certification starts. All significant issues need to be completely resolved before testing begins.
· Application user acceptance is complete before Production Certification starts, and all the resulting application modifications are complete.
· Production Certification will only include the three to five most frequently used functions of the application.
· Production Certification will be completed with the database size and the forecasted load of the fully deployed application two years in the future.
· Although the application development group identifies the volumetrics and the most frequently used three to five functions, the testing methodology is "fixed" and is not altered.
· If Production Certification discovers critical functionality, resource usage or performance issues, Production Certification ceases and the application is assigned a "Failed Production Certification" designation. Thereafter, the Production Change Control Council will review these findings.
· If the application has a "Failed Production Certification" assessment and the issues have been resolved, another Production Certification test can be requested. The application receives another risk assessment based solely on the subsequent testing effort.

PRODUCTION CERTIFICATION REQUIREMENTS

Early in this process we recognized a need to clearly outline the requirements to begin the Production Certification effort. A document was created that defines the strict requirements that must be met before Production Certification can begin. These are outlined in the following text.

Frequently we have to point out that Production Certification does not necessarily start with the date on some application development project plan. We find ourselves reminding people that Production Certification begins when these requirements are met, not on a calendar date.

Stable Environment

A stable environment is one that closely resembles the production implementation environment. For CICS based applications: a quality assurance environment. For GUI applications: network, workstations and servers in the expected production hardware and software implementation configurations. The term "stable" not only refers to the availability of the environment, but also mandates a "frozen" development environment.

If this is a new application, we frequently schedule the hardware implementations a few weeks early and perform the testing on the pre-implemented environment. If we are testing in an existing environment, we frequently disable the "hot" failover hardware/software and perform the testing on the idle failover server.

Stable Application

A stable application is one that closely resembles the expected production implementation. The application should have completed integration testing and have had all significant problems completely resolved. The application also should have user acceptance testing completed and all the resulting application modifications complete. The term "stable" not only refers to the availability of the application, but also mandates a "frozen" application.

Stable Database

A stable database is one that closely resembles the full production implementation. The database structure and volume of application data should reflect two years of full implementation. The term "stable" not only refers to the availability of the database, but also mandates a "frozen" database.

NOTE: In some cases we are requested to assist in creating the appropriate amount of application test data. The required time for this additional effort is not included in the Production Certification time estimates.

Screen Flows

Workflows/screenflows of the three to five application tasks that are most frequently performed by the clients are required. Please note that the "most frequent" functions are often not the "most critical" functions. We assume that the most critical functions have been very well tested in integration testing and are therefore not included in the Production Certification effort. Production Certification focuses only on the most frequently used functions.

Input Data

A list of valid input data to be used in the above workflows/screenflows is also needed. For example, if the most frequently used function is to search for a customer, obviously a list of several customer numbers is required. (Keep in mind that these workflows/screenflows sometimes need to be executed thousands of times.)

Function Usage Volumetric

In order to accurately simulate the future production environment, an indication of the mix of activity for the three to five application functions is necessary. Please keep in mind that these estimates should reflect the activity after two years of full production implementation.

Total System Volumetric

Again, in order to accurately simulate the future production environment, we need an indication of total system activity (including the three to five functions listed above). And once again, these estimates should reflect the activity after two years of full production implementation.

Pass / Fail Criteria

There are several pass/fail criteria. Some of these are:

· Internal response time (complete application server processing, not including network traversal) must be under two seconds, 95%-ile.
· External response time is defined as starting when the client issues a request and ending when the client interface is available for additional activities. (Please note: this criterion does not indicate that the client request is fulfilled. It merely indicates that the client can continue with their workflow.) This time must be less than five seconds, 95%-ile.
· Workflow response time (the time to fulfill the client request) must not inhibit the client from proceeding with their workflow for more than five seconds, 95%-ile.
· The total number of network packets that traverse the network for any single client interaction must not exceed 375 packets.
· No "memory leak" is acceptable on the server.
· No "memory leak" is acceptable on the workstation.
· No deviation from these criteria is allowed.
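As a rough illustration (my own sketch, not from the paper), the two response-time criteria reduce to a 95th-percentile check over the response times collected during a test run. The thresholds are the paper's; the function names and sample measurements are invented:

```python
# Sketch of a 95th-percentile pass/fail check for the criteria above.
# Thresholds (2 s internal, 5 s external) come from the paper; the
# function names and sample data are invented for illustration.

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value covering pct percent of samples."""
    ordered = sorted(samples)
    rank = -(-len(ordered) * pct // 100)  # ceiling division
    return ordered[max(int(rank), 1) - 1]

def certify_response_times(internal_secs, external_secs):
    """Apply the internal/external response-time criteria; returns (passed, details)."""
    p95_internal = percentile(internal_secs, 95)
    p95_external = percentile(external_secs, 95)
    passed = p95_internal < 2.0 and p95_external < 5.0
    return passed, {"internal_p95": p95_internal, "external_p95": p95_external}

# Invented measurements, in seconds:
internal = [0.4, 0.6, 0.5, 1.1, 0.9, 0.7, 1.8, 0.8, 0.6, 0.5]
external = [1.2, 2.0, 1.6, 3.1, 2.4, 1.9, 4.2, 2.2, 1.7, 1.5]
ok, detail = certify_response_times(internal, external)
print(ok, detail)  # True: both 95th percentiles are under their limits
```

The nearest-rank definition is one common choice; interpolating percentile definitions would give slightly different values on small samples.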

Workstations For Client Server Production Certification

The Production Certification testing methodology exercises the complete application. As a result, we use tools that create client activity using the "real" client interface. For Client Server applications this means that several client workstations are required to perform the tests. By cutting down the client "think time" we have found that each workstation can normally generate the activity of 50 to 100 clients.

As stated previously in this paper, we've decided not to maintain a dedicated "testing lab". The application development group normally provides the client workstations to perform the Production Certification testing.

Time On The Implementation Schedule

Normally, two weeks are adequate for 3270 (character based) applications which have completed functionality testing and user acceptance testing. Four weeks are normally adequate for GUI Client/Server applications.

The testing itself requires much less time. Most of this scheduled time is meant for the development team/vendor to resolve the issues that are discovered as a result of the Production Certification effort.

This scheduled time period is "fixed". For example, if the user acceptance testing is a week behind schedule, the full Production Certification timeframe is still required. If the project is behind schedule, the implementation date needs to be adjusted accordingly before Production Certification begins.

Application Development Support

Inevitably, issues are discovered while the Production Certification proceeds. Application development resources (or vendor resources) need to be aware that they are likely to be called upon to fix the discovered issues.

TESTING SCENARIOS

Network Usage Test

In the Network Usage test, we perform each of the identified high-frequency application functions while a network analyzer monitors the network traffic. This testing is designed to identify application network behavior and measure the utilization of the network resources. Again, our pass/fail criterion is that less than 375 packets traverse the network for any client interaction.

There are two basic reasons for this test. We have found that the application network utilization normally has a significant impact on the application being able to meet the client response time expectations. The other reason for this testing is to make sure that no single application monopolizes the network resources in a way that would impact other applications.

Network Resource Example:

Using the Network Usage test, we discovered an application that caused 3,200 network packets to traverse the network as the client opened an empty "search" window. Opening this empty search window caused over 1.5 megabytes of data to traverse the network. In talking to the developers about this network traffic, they indicated that all the possible search criteria were being loaded on the local workstation "just in case" the client needed the selection. One of these pre-loaded selection criteria had a drop-down list that was over 500 entries long.

We had the developers alter the "search" window logic to populate the search criteria only when the client selects the particular criterion. Although this logic is slightly more complex, it significantly decreased the number of network packets. After this was done, the application met our pass criteria for the Network Usage test.

Functional Contention Test

For the Functional Contention test, each of the high-frequency functions is executed on multiple workstations at high levels of activity for an extended time. This testing is designed to identify application functions that contend with themselves. In other words, we identify intra-function database deadlocks and abnormal application locking contention with these tests.
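A minimal sketch (my own, not from the paper) of the usual cure for the intra-function deadlocks this test surfaces: deadlocks arise when two transactions acquire the same locks in opposite orders, so every transaction acquires its locks in one fixed global order, which removes the circular wait. The lock names and transactions below are invented:

```python
# Illustrative sketch: acquiring locks in one canonical global order
# (here: by object id) prevents the circular wait behind most deadlocks.
import threading

lock_a = threading.Lock()  # e.g. a row lock on table A
lock_b = threading.Lock()  # e.g. a row lock on table B
results = []

def with_ordered_locks(locks, work):
    """Acquire all locks in a canonical global order, run work, then release."""
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    try:
        work()
    finally:
        for lk in reversed(ordered):
            lk.release()

def txn1():
    with_ordered_locks([lock_a, lock_b], lambda: results.append("txn1"))

def txn2():
    # The caller names the locks in the "wrong" order; sorting makes it safe.
    with_ordered_locks([lock_b, lock_a], lambda: results.append("txn2"))

threads = [threading.Thread(target=t) for t in (txn1, txn2) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 100: every transaction completed, no deadlock
```

Without the sorting step, txn1 and txn2 would take the two locks in opposite orders and could block each other forever under load, which is exactly the behavior this test is designed to expose.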

Functional Contention Example:

We were performing the Functional Contention test on a web based, Java application. We noticed that when we started a second workstation in the test, the application seemed to "lose track" of the execution sequence of the logic. (It lost track of the client's selections and the flow of the screens was not consistent.) The developers were notified, and their investigation discovered that several Java "global" variables should have been defined as "local" variables. This was fixed and the application passed the Functional Contention test.

In another example, we were testing a 3270 character based CICS/DB2 application. The Functional Contention testing clearly indicated that at a load of two transactions a second, the application would "hang" with a DB2 deadlock situation. (Database deadlocks are caused by the sequence of the locks and the duration of these locks.) We passed this information to the development group, who then modified the sequence of the database locks. The resulting application was able to execute at over ten transactions a second without a single database deadlock issue.

Longevity Test

The Longevity test executes each of the high-frequency functions on a single workstation for an extended amount of time. This testing is designed to identify any memory leaks or resource over-utilization/abuse within the application.

Workstation resources are carefully monitored to detect workstation "memory leaks". (Memory leaks occur when the application acquires memory to create objects and then "forgets" to release the memory for reuse. As a result, the application memory usage continues to grow as the application executes.) Our studies indicate that when the committed memory on a workstation reaches 1.4 times the amount of physical memory on the machine, there is a measurable negative impact on workstation performance.

The resulting response times are also carefully collected and statistically analyzed for indications of server "memory leaks". These are indicated by the same functional activity (at the same application load) slowing over the time of the Longevity test. Ultimately, when a large amount of server memory has "leaked", the server stops processing completely.

Longevity Test Example:

When we were performing a Longevity test on a two-tier Client Server application, we noticed a significant memory leak in an "automatic update" feature. We found that the application (without keyboard or mouse activity) "leaked" over 300 K of memory every two minutes. We notified the developers, and at first they refuted our findings. Later they attempted to resolve the issue. Unfortunately they were unable to resolve the issue, and the application implementation was permanently canceled.

Progressive Load Test

In a Progressive Load test, the testing automation is started on two workstations. They are allowed to execute for 15 minutes. After 15 minutes, another two workstations are started. (So, a total of 4 workstations are executing.) After another 15 minutes, another two workstations are started, and so on, until all the workstations are executing at once. After the testing is over, the response time data is collected from all workstations. This data is then imported into an Excel spreadsheet and sorted by application function and time of day.

The resulting table is then split by function with an increasing aggregate load. A graph is generated with increasing aggregate load across the x-axis and increasing response time across the y-axis. A gentle upward trend on the graph indicates that the application is scalable well above the tested levels of activity. A sharp "knee of the curve" on the graph indicates an application that is not scalable above the tested aggregate activity.

Progressive Load Example:

The following graph shows a gentle upward trend.

[Figure: "Upward Trend" - Response Time (y-axis, 0-10) versus Functions Per Minute (x-axis, 3-30), rising gently]
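The spreadsheet analysis described above can be sketched in code. This is my own rough interpretation, not the paper's tooling: bucket response times by aggregate load level, then report the first level whose mean response time jumps sharply over the previous level. The jump factor and sample data are invented:

```python
# Rough sketch of the Progressive Load analysis: group response times by
# aggregate load and flag a sharp "knee of the curve", if any.
from collections import defaultdict

def knee_of_curve(samples, jump_factor=2.0):
    """samples: (functions_per_minute, response_secs) pairs.
    Returns the load level where mean response time first exceeds
    jump_factor times the previous level's mean, or None if the curve
    is a gentle upward trend (scalable beyond the tested load)."""
    by_load = defaultdict(list)
    for load, rt in samples:
        by_load[load].append(rt)
    levels = sorted(by_load)
    means = [sum(by_load[lvl]) / len(by_load[lvl]) for lvl in levels]
    for prev, cur, level in zip(means, means[1:], levels[1:]):
        if cur > prev * jump_factor:
            return level
    return None

# Invented data: a gentle trend up to 24 functions/minute, then a knee at 27.
gentle = [(lvl, 1.0 + 0.05 * lvl) for lvl in range(3, 25, 3) for _ in range(5)]
with_knee = gentle + [(27, 9.0)] * 5 + [(30, 12.0)] * 5
print(knee_of_curve(gentle))     # None: gentle upward trend
print(knee_of_curve(with_knee))  # 27: knee of the curve
```

A fixed jump factor is the simplest knee detector; in practice the paper's visual inspection of the graph serves the same purpose.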

The following graph shows a "knee of the curve".

[Figure: "Knee Of The Curve" - Response Time (y-axis, 0-10) versus Functions Per Minute (x-axis, 3-30), with a sharp upturn]

Production Simulation Test

In a Production Simulation test, the test automation is tuned to simulate the forecasted application function mix. This mix is then executed at the forecasted total peak production rate. This Production Simulation test is allowed to execute for an extended time, and any anomalies are recorded and investigated.

This testing is used to identify any high-frequency application functions that conflict with each other at the forecasted peak production rate. We frequently identify inter-function database deadlocks and locking delays with these tests. The associated response times are collected and analyzed for compliance to the pass/fail response time criteria.

Production Simulation Example:

In this example, a 3270 CICS/DB2 application was able to pass the Functional Contention test. However, when we executed the various functions at the same time, we discovered that DB2 deadlocks occurred at relatively low activity. We shared this information with the developers and the database administration group. These two groups worked together to resolve the DB2 deadlock issue.

Subsequent testing indicated that the two groups did not entirely resolve the issue, but they were able to allow the application to execute at a much higher rate of aggregate activity before the deadlocks occurred. Because this higher activity rate was well above the forecasted production activity, the application was assigned a "pass" for this test.

Hardware Comparison Test

The Hardware Comparison test is optional. Sometimes the application group does not have the information necessary to size their server or workstation requirements. As a result, they request our assistance. In essence they are caught between purchasing a server too small (and risking an implementation failure) or purchasing a server too large (and wasting money on a machine which will be obsolete within two years).

We can help by executing the Production Simulation test for an extended time and having our capacity group measure the server CPU consumption and memory usage. This gives valuable information for accurately sizing the server.

Sometimes we get requests to perform our automation on several models of workstations. We execute the Production Simulation test and carefully measure the workstation response times, CPU consumption and memory usage. This information is valuable in workstation cost-benefit analysis.

Hardware Comparison Example:

Our company had planned to purchase a Client Server application from a well-known software company. However, we were going to scale the application much higher than any of their other customers. As a result, the application vendor could not provide information about a recommended server size, memory configuration, I/O subsystem configuration, … We were asked to perform the Production Simulation test on a specific server configuration. As a result of this test, they were able to successfully size the server to optimize the cost-benefit.

Failure Under Load Test

The Failure Under Load test is also optional. This testing scenario has several purposes:

· The Failure Under Load test can create situations that are not visible under non-load failure tests. A Production Simulation test is started and a portion of the application is failed (server, database, workstation, …). We then attempt a recovery and make sure that the recovery is successful. Failure recovery documentation is frequently created as a result of this testing.
· The Failure Under Load test can simulate a production failure. It can make sure that existing failure recovery documentation functions properly.
· The Failure Under Load test can make sure that "failure rollover" automation (HACMP) functions properly. (We have consistently found that improperly configured HACMP configurations pass non-load failure tests but fail under load.)
· The Failure Under Load test can also verify that application and database recovery/rollback processing functions properly.
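The forecasted function mix that drives the Production Simulation test described earlier can be approximated with a weighted random pick, so that over a long run the simulated load matches the forecast. This is my own sketch; the function names and percentages are invented, not the paper's:

```python
# Hypothetical sketch of driving a Production Simulation mix: each simulated
# client picks its next function according to the forecasted usage mix.
import random

forecast_mix = {               # forecasted share of activity, two years out
    "search_customer": 0.50,
    "open_account":    0.30,
    "post_payment":    0.20,
}

def next_function(rng=random):
    """Weighted pick so the simulated load matches the forecast mix."""
    names = list(forecast_mix)
    return rng.choices(names, weights=[forecast_mix[n] for n in names])[0]

# Over many picks, the realized mix converges on the forecast:
random.seed(1)
picks = [next_function() for _ in range(10_000)]
share = picks.count("search_customer") / len(picks)
assert abs(share - 0.50) < 0.05  # realized share tracks the forecast
print(round(share, 2))
```

The same weighted pick, paced to the forecasted total peak rate, would reproduce both the Function Usage Volumetric and the Total System Volumetric the paper asks for.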

Failure Under Load Example:

When our Call Centers failed, we found that the recovery procedures would frequently not recover the entire application. We were requested to provide an application load (in a development environment) to validate several failure scenarios.

As a result of this test, the Call Centers now have validated recovery procedures that are very reliable.

CONCLUSION

Our company has found that the Production Certification methodology creates an acceptable compromise between the competing objectives of the application development and operations/systems groups. As a result, our application development groups continue to develop applications at an optimal rate while resource usage, availability and response times remain acceptable.

This author believes that other organizations can significantly benefit from implementing their own flavor of this Production Certification methodology.

Find a CMG regional meeting near you at www.cmg.org/regions