client server - end to end response time
Performance Professionals
The Computer Measurement Group, commonly called CMG, is a not for profit, worldwide organization of data processing professionals committed to the
measurement and management of computer systems. CMG members are primarily concerned with performance evaluation of existing systems to maximize
performance (e.g., response time, throughput, etc.) and with capacity management where planned enhancements to existing systems or the design of new
systems are evaluated to find the necessary resources required to provide adequate performance at a reasonable cost.
This paper was originally published in the Proceedings of the Computer Measurement Group’s 1996 International Conference.
Copyright 1996 by The Computer Measurement Group, Inc. All Rights Reserved. Published by The Computer Measurement Group, Inc. (CMG), a non-profit
Illinois membership corporation. Permission to reprint in whole or in any part may be granted for educational and scientific purposes upon written application to
the Editor, CMG Headquarters, 151 Fries Mill Road, Suite 104, Turnersville, NJ 08012.
CLIENT/SERVER END-TO-END RESPONSE TIME:
REAL LIFE EXPERIENCE
Mark Maccabee
IBM Thomas J. Watson Research Center
Yorktown Heights, NY
Abstract:
This paper deals with the use of End-to-End Response Time (ETE RT) information in a
production client/server application. The paper describes the instrumentation of the application,
the introduction of ETE RT into the production environment, the business requirements for which
the ETE RT information is used and the current state of the evolving methodology for use of ETE
RT information. The main impression from user experience is that ETE RT is a powerful
business tool in a corporate environment. ETE RT data drives configuration updates and
software improvements. It is discussed almost daily at operations management meetings.
Although the data is technical (and contains a wealth of information) it relates well to terms that
end users and executives understand. The ETE RT facility was accepted and adopted very
naturally by the corporation.
The key information is in the client column that identifies the user, the date and time columns, the RT
column that gives ETE RT, the Query Type column and the Rows column (which indicates the number of rows
returned by the query).

There was a design question of controlling the collection, storage and processing of this data. Thus the
designers/developers of the sensors had to contend with systems management issues, in addition to the
issues of actually gathering the data. To determine which users are monitored, a control file was
introduced. The employee ids of those users whose transactions are to be measured were included in this
file.

After some experimentation it was decided to store the ETE RT records in a relational (Oracle) table
on the NT server. Thus every transaction resulted in a record added to this table. As for reporting,
management felt that client/server response time reporting should follow the pattern of mainframe
response time reporting.
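The record layout and storage just described can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names, the `ete_rt` table name, the sample employee ids, and the use of SQLite as a stand-in for the Oracle table are all hypothetical, since the paper does not give the actual schema.

```python
import sqlite3
from typing import NamedTuple


class EteRtRecord(NamedTuple):
    """One ETE RT measurement (field names are assumed, not from the paper)."""
    client: str        # employee id that identifies the user
    event_date: str    # date of the transaction
    event_time: str    # time of day of the transaction
    rt_ms: int         # end-to-end response time in milliseconds
    query_type: str    # e.g. "Perf" or "Polis"
    row_count: int     # number of rows returned by the query


def open_store(path=":memory:"):
    """Open the record store; SQLite stands in for the Oracle table here."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ete_rt ("
        "client TEXT, event_date TEXT, event_time TEXT, "
        "rt_ms INTEGER, query_type TEXT, row_count INTEGER)"
    )
    return conn


def record_transaction(conn, monitored_ids, rec):
    """Store the record only if the user is listed in the control file set."""
    if rec.client not in monitored_ids:
        return False
    conn.execute("INSERT INTO ete_rt VALUES (?, ?, ?, ?, ?, ?)", rec)
    conn.commit()
    return True
```

Keeping the monitored-user check next to the insert means unmonitored users add no rows to the table, which matches the role the control file plays in the text.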
[Figure: Response Time (milliseconds) versus Event Measurement Time of Day, 7:00 to 18:00. Series: Perf RT; Perf RT Trend (Poly 4).]
[Figure: Response Time (milliseconds) versus Number of Rows, 0 to 40. Series: Perf; Polis; Linear (Perf); Linear (Polis).]
This graph shows that response time increases with the number of rows and that the "Polis" query is more
sensitive to the increase in the number of rows than is the "Perf" query ("Polis" response time increases
faster). Both of the preceding graphs show that important information about a client/server application
can be gained from ETE RT data.

4.3. Initial Production Measurements

One aspect of those measurements was determining whether geographical distance makes a difference as far
as response time is concerned.

4.3.1. Effect of Geographical Distance

The background to this is the following. About 10 of the client workstations were installed in North
Carolina (see Figure 1). The data base server as well as the application server were located in
Middletown, Pennsylvania. The workstations in North Carolina exhibited 3-4 minutes response time. A
controlled experiment was conducted. Three different queries were issued at each of three client
workstations. All queries returned under 100 rows of information. One workstation was located as close
as possible to the data base and the application servers. Another workstation was located about 40 miles
from the servers and was connected to the servers by means of a T3 line. A third workstation was in
North Carolina and the connection was through a T1 line. The closest workstation had response times of
under 3 seconds. The workstation at the end of the T3 line ran the queries in under 3 seconds (although
a fraction of a second slower than the closest workstation). The workstations in North Carolina
exhibited 3-4 minutes response time. It seemed that the size of the data returned and the speed of the
line to the data base could not have been the reason for such a drastic change in response time. Since
there was no Application Server in North Carolina a solution was attempted by moving an
[Figure: Response Time (milliseconds) versus Number of Rows, 0 to 40. Series: COM63441 (Perf); COM97443 (Perf).]
This graph shows that the response time of the local user is smaller than the response time of the
remote user for the "Perf" query. Response time of both users increases with the number of rows.

4.3.2. Extent of Usage

As soon as the system was operational, management wanted to know how widely used it is. This was
motivated by a few factors. Those had to do with justifying the investment, tracking the biggest users,
tracking patterns of use, etc. A monthly report revealed low usage, perhaps not surprising in view of
the fact that the system was just put in production. Of the 13 divisions of the corporation, three
divisions did not use the system at all. The other ten divisions averaged 100 queries per month, with
the heaviest use by one division topping 600 queries per month. An analysis by the manager of the
client/server system indicated that more queries are issued at the beginning of the month (when the data
is fresh) with queries tapering off towards the end of the month. Since the manager knows how the users
use the system, she believes that some of the usage is due to users playing with the system, rather than
actual production. This is another aspect of systems management we noticed in this client/server system:
it is done by a person familiar with the business of the company rather than with the technology. The
measurement technology for this manager is just an aid rather than the major source for her information
about what goes on in the system.

C/S End-to-End Response Time: Customer Experience - CMG96

4.4. Production Measurements
4.4.1. Description
A typical incident is the following. A user
complained about a response time (of a "Perf" report)
being higher than the norm he is used to. The
manager of the Supplier Metrics system turned on the
measurement facility for this user. The
measurements showed that the response time was
high and it was significantly higher than the response
time experienced by the manager when she tried to
issue the same query from her workstation. The
manager’s workstation was connected to an Ethernet
segment that is farther from the data base server than
the segment of the user who reported the problem. The manager
suspected that the response time problem may be
shared by other users in the hub where the
complaining user is connected. She turned on the
measurement for a few users in the hub and found all
of them had a response time problem. Some
experienced a response time as high as 31 seconds
for a report involving one row of data returned. It
seemed that the problem had to do with this particular
hub. To get a resource oriented perspective the
manager requested measurement with the Sniffer (a
LAN traffic measurement tool).
The response time problem was reported
originally at 4 PM. The measurements on the
workstation of the complaining user, the analysis of
his data, the measurements for the workstations of the
other users on the hub, the analysis of their data and
the activation of the Sniffer on the hub all occurred on
the day the problem was reported. The
manager of the Supplier Metrics system felt she
needed to act fast because she did not want her
system to get bad publicity. She felt that the response
time measurement facility gave her the functionality she
needed and that this functionality could be activated fast.
Note: To activate the measurements the manager
needs to specify the employee id (COM#) in a control
file and the user needs to log off and to log back on.
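The activation step in this note might look roughly like the sketch below. The control file format (one employee id per line, blank lines ignored), the sample ids, and the names `parse_control_file` and `Session` are assumptions for illustration; the paper states only that employee ids are listed in a control file. Reading the file once at logon is one plausible reason, not confirmed by the paper, why the user has to log off and log back on before measurement starts.

```python
def parse_control_file(text):
    """Return the set of employee ids listed in the control file text.

    Assumed format: one id per line; blank lines and surrounding
    whitespace are ignored, and ids are compared case-insensitively.
    """
    ids = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry:
            ids.add(entry.upper())
    return ids


class Session:
    """A client session that decides once, at logon, whether it is measured."""

    def __init__(self, employee_id, control_text):
        self.employee_id = employee_id.upper()
        # The decision is fixed for the lifetime of the session, so a later
        # change to the control file takes effect only at the next logon.
        self.measured = self.employee_id in parse_control_file(control_text)
```

For example, a session opened before the manager adds the user's id keeps `measured == False` until the user logs on again and a new `Session` is created.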
4.4.2. Observations
Some of the observations about this incident are
of interest.
1. The detection of the problem was done manually;
i.e. a user complained. The existence of a
measurement tool, such as the one described in
this paper, may in the future allow detection by a
performance management application.
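As a hedged illustration of what such automated detection could look like, the sketch below flags users whose median response time exceeds a multiple of a baseline. The function name, the threshold factor, the minimum sample count, and the baseline value are all hypothetical; the paper does not describe such an application.

```python
from statistics import median


def flag_slow_users(samples, baseline_ms, factor=3.0, min_samples=3):
    """Return sorted employee ids whose median RT exceeds factor * baseline_ms.

    samples is an iterable of (employee_id, rt_ms) pairs. Users with fewer
    than min_samples measurements are skipped so that a single slow
    transaction does not raise an alert.
    """
    by_user = {}
    for user, rt_ms in samples:
        by_user.setdefault(user, []).append(rt_ms)
    return sorted(
        user
        for user, rts in by_user.items()
        if len(rts) >= min_samples and median(rts) > factor * baseline_ms
    )
```

Using the median rather than the mean keeps one outlier from dominating, and grouping flagged users by hub or segment afterwards would mirror the manual reasoning in the incident above.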
4.4.3. Follow on
Measurements with the Sniffer, as well as
additional response time measurements, indicate the
following condition. All the workstations on the local
segment tend to get long response times. Whenever
workstations request service within the local
segment (Supplier Metrics is not local), the response
time is good. The conclusion is that the activity
generated by the people and devices on this segment
of the LAN is too much for the router (that is
connected to this segment) to handle. A LAN
hardware solution (e.g. placing the users on
2 separate segments) will probably be undertaken after
additional measurements.
[Figure: Response Time (milliseconds) versus Number of Rows, 0 to 40. Series: Perf; Graph; Linear (Perf); Linear (Graph).]
These production measurements show that response time increases with the number of rows (we have seen
this already in pre-production measurements). We also see that the "Graph" query is more sensitive to
the increase in the number of rows than the "Perf" query.

5. Upcoming Developments

The use of the response time measurement facility led to additional requirements and uses of this
capability. Here we will list a few of those.

The company is planning to start operation of a new client/server system. This one is planned for about
3000 workstations and 9 large data base servers. The application for this system was developed using
PowerBuilder as well.

To allow response time measurements, the sensors from Supplier Metrics will be used. For use on this
system, the sensors will be re-implemented in object oriented technology based on Microsoft distributed
OLE. Once operational on the new system, the sensors on the Supplier Metrics system will be replaced
with the new OLE conforming sensors.

The manager of the Supplier Metrics system complained that in cases where response time is long, it is
hard to determine whether the problem is in the data base or in the network (her expression was "we are
blind without this data"). For this purpose one developer was assigned to investigate the possibility of
developing sensors in the Oracle Data Base to measure the time spent in the data base. The requirement
to measure the components of response time is fairly typical once the total response time is available.

6. Summary and Conclusions

The work described in this paper is the first methodical application of the ETE RT technology we have
done in a production corporate environment. This work is part of ongoing research in the methodology
and technology of ETE RT.

Throughout our work, both conceptual and practical, we gained an appreciation of the importance of
user-oriented measurements. One motivating example is described in the Summary Section of [4]. The
ability to have a metric is an important management requirement. It is important both to
1. Management is interested in ETE RT information on a day to day basis. What is measured affects
daily business operation and is of direct business consequence. Thus we find discussions based on
ETE RT data at management meetings; participants share their perspectives on the data, and
discussion sometimes centers on the validity of the data. ETE RT is clearly important information
for management. This tells management whether the system delivers the information the purchase
managers need within a

Although highly productive, the facility described here has limitations. For one, it is limited to a
simple client/server system. More complex systems have additional complicating issues to deal with.
Larger systems have tens of thousands of clients. There are cases of three or even four levels of
hierarchy. Those complex systems have multiple types of servers and servers requesting service from
other servers. Then there are internet systems. Beyond this is the subject of breaking the total
response time (ETE RT) into its components, or decomposition. Decomposition is harder to implement than
ETE RT. Especially daunting is the task of