Proceedings of The 1997 Winter Simulation Conference Ed. S. Andradóttir, K. J. Healy, D. H. Withers, and B. L. Nelson
William S. Keezer
may be modeled separately from the rest of the operating system workload. Detailed presentations of the UNIX operating system can be found in two sources, Bach (1986) and Leffler et al. (1990). Processing in a UNIX system occurs in one of two logical divisions, user space or system space. User space contains all the application-specific data, buffers, and compiled code, and system space contains all the I/O buffers, system memory, and operating system code. To maintain integrity, any data to be processed by the operating system is rewritten to separate memory locations in system space from the user space. The processing in user space is only for the application-level calculations. All work scheduling and I/O is handled by the operating system.

Multiprocessor systems have the potential for parallel processing, but only for user spaces. The operating system has only one lock on the kernel, and therefore any system work is single-threaded through that lock. System work takes precedence over user space work, and a request for a system function will pre-empt any user work in the processor. Though system work functions have priorities among themselves, no system call can pre-empt any other system call; it must wait until the current system function is complete. Once a system function is complete, the next function will be the one with the highest priority, but it in turn will be in the processor for its full time-slice or until it requests an I/O. This of itself does not lead to complications, but the fact that there is only one kernel lock for the operating system means all system calls are single-threaded through the equivalent of one processor; actually slightly less, because of scheduling overhead and the handling of the lock between processors.

The implications of this arrangement are that a computationally intensive application or set of applications will have a high degree of parallelism on a multiprocessor system, but an I/O-intensive system will be throttled to slightly more than the capacity of one processor, and all I/O handling will be serialized. It is possible to have multiple I/Os outstanding, but the actual interrupt handling is only done under the one kernel lock.

The simulation of such an arrangement is actually much simpler than might at first be supposed. One models a multiple-server queue and increases the CPU time to account for the single-threading through the kernel lock. This does not lead to inflated CPU utilization times, because when a user process is waiting for the kernel lock, it goes into a spin-loop checking for the availability of the lock. Thus the CPU is being utilized, though not directly for productive work. In I/O-intensive functions, cache management may need to be modeled separately from the rest of the CPU usage (Nelson, Keezer, and Schuppe 1996). One can treat it as a separate resource with a capacity equal to that of one processor on a multiprocessor system. Though the additional detail of modeling the various work-spaces and the queueing for the kernel lock might provide a slightly more accurate value for CPU processing, in the author's experience the above technique was adequate for design- and development-phase work. However, the exact nature of the workload sharing must be kept in mind. In cases where there is a master processor and the remainder are slaves, the master processor must be separately modeled, because its workload is different. By the time truly accurate answers are required, the system under development is usually up and running and can be measured and modified faster than a model. The exception would be a model to do what-if planning for an existing system.

As a side note, MVS systems can be readily modeled as multiple-server systems, but without the concerns over the kernel locks. MVS has matured to the point that there are thousands of kernel locks, and with fourteen levels of interrupts, system functions can be pre-empted for more critical system functions, and all processors are equal in capability.

2.2 Memory

UNIX was developed in a compute-intensive environment, and most systems come with sufficient memory so as not to be memory constrained. In the case of applications with extremely large memory requirements, however, memory will have to be tracked. It is possible to allocate more memory for a process than is available as physical memory. This is called virtual memory. In principle, it is not necessary to be concerned about virtual memory; it is a programming constraint. What is important is the actual use of physical memory.

Memory is managed by keeping the most active portions in physical memory and putting the inactive portions out on specially allocated DASD, to be brought into physical memory as necessary (paging out and paging in, respectively). Physical memory is also used as the cache for I/O, those portions being handled separately by versions of UNIX that use separate caches. Some versions of UNIX, e.g., Solaris and SunOS from Sun Microsystems, do not partition off a separate cache allocation, but simply use as much memory as necessary for the function.

Memory is modeled as a pool from which allocations are withdrawn as necessary. When the pool is depleted or reaches some set level of empty pages, the least-recently-used pages are paged to disc and freed for reuse. It is not necessary to track locations, simply the amount left. One can allocate precisely or with a distribution, and when the page-out to reuse space occurs, add some
Simulation of Computer Systems and Applications 105
distributed increment back into the pool. When allocating memory, allocate the entire amount necessary, then let the paging process replenish the pool; this is what occurs in the actual system. The actual details will vary with the system under study, and the modeler must maintain communications with the development staff to help determine what the necessary parameters are. One area that requires care is the termination of an application. The amount of physical memory recovered is not equal to the total allocation, but only that part in physical memory.

2.3 DASD

It has been the author's experience that storage I/O, and in particular DASD I/O, is a major potential bottleneck in UNIX systems. DASD I/O is inherently slower than processing, with one I/O typically taking ten milliseconds, while processing speeds are measured in nanoseconds per instruction.

There has been a lot of progress in the last several years, since the development of the SCSI controller, in improving DASD performance. At one time a DASD I/O required the complete attention of the central processor. Data was moved to the system buffer for a write or the buffer allocated for a read; the data was transferred to or from the disc a block at a time, and then the results transferred back to the calling program. With the advent of SCSI controllers, the data is transferred to a system buffer and the I/O commands issued to the controller. The controller then executes the I/O and puts the results in a system buffer, reading and writing multiple blocks with DMA (direct memory access). At the completion, the SCSI controller then issues an interrupt for the system to process the result. This is quite similar to the manner in which MVS DASD functions, with DASD controllers and channel processors functioning similarly to SCSI controllers.

The discs themselves have also been improved, with on-board caches and processors, some of which could run DOS. Discs can perform prefetch for sequential data and store the results in the cache to speed up the transfer of data and reduce the potential for hardware delays. Details of typical disc operations are given in Ruemmler and Wilkes (1992, 1993).

When modeling I/O for applications, the two most important operations are read and write. Opens and closes generally are not of major importance, even though they can have high overhead, unless there are many of them relative to the other activities. The major impact of an open command is to increase the response time on a first return from a transaction, if it opens a file. The major impact of a close command is to increase the response time on the last return, if it closes a file.

Read operations are all handled similarly, but there are three main types of writes: synchronous, asynchronous, and DASD fast write. A synchronous write holds the program until the I/O results return to the program. This increases response time by the time of the entire write operation. Asynchronous writes do not wait for a return, and, once the write has been set up, the program continues. DASD fast write occurs on discs which have non-volatile storage (storage that maintains its contents in a power outage or power down). In this case, as soon as the data reaches the on-board cache of the disc, a result is returned, eliminating the need to wait for the completion of the physical I/O.

UNIX stores data in memory in a cache, as mentioned above. Before a physical I/O is performed, a check is made to see if the data is in cache. If so, then the I/O consists of a simple transfer from cache to the user space. Generally, cache hits are approached on a percentage basis, which leads to a probability for modeling purposes. If the I/O is intensive, and the number of files is small relative to the amount of cache allocated, in some versions of UNIX the cache manager must be explicitly modeled as a significant contributor to response time.

Details on programming UNIX I/O can be found in Nelson, Keezer, and Schuppe (1996). As a first step in the earliest models, one can replace the details of the I/O with a branch to either a cache hit or a physical I/O with a distribution of possible response times. Generally a skewed distribution with a mode close to the minimum response time is adequate. Separate provision may be made for the CPU involvement. The next complexity would be to break down the I/O into disc fixed overhead, seek time, rotational delay, and data transfer time. Finally, one can model I/Os with a high degree of precision using the methods in the referenced paper.

As RAID (redundant arrays of inexpensive discs) devices become more common for servers, the models will simplify. RAID architecture separates the physical I/O from the requested logical I/O with large caches, and performance times for cache misses tend to be more uniform. With the very large caches seen on some RAID controllers, cache hit percentages can routinely run in the 90-99% range. RAID controllers can also reduce CPU overhead values, since mirroring and other management functions can be handled in hardware in some products.

2.4 Network

Networks have been the subject of intense simulation activity for years. For the purposes of this tutorial, networks are simply a combination of CPU and memory overhead and a time delay, possibly distributed. The
details of network protocols are usually not important in modeling the interaction of applications in systems. Results are mostly dependent on how long a transmission takes, regardless of protocol.

3 WORKLOAD COMPONENTS

3.1 Load Generation

Creating a realistic workload for the system is a critical part of the model. Because of the complexity of the transaction interactions at a resource level, small changes can sometimes have major effects. Arrival rates, transaction mixes, and transaction sources are important. The sources of transactions are other systems, internal schedulers, and external users.

The easiest to handle are the internal schedulers. One can simply generate transactions similarly to the schedule in the system under study. The other two sources are more difficult. Transactions from other systems can be handled in two different ways. If the other systems are part of the model, then their requests will naturally arise from the execution of the model. If the request-generating systems are external to the model, then they need to be handled similarly to external users.

The three main parameters to consider with external users are how many are there at one time, what transactions are they submitting, and how long do they think between transactions. The number of users can be handled as an arrival-rate problem, based on the average session time and the arrival rate. The simulation of transaction choices can depend on how many transactions there are from which to choose. If there are a number of repetitive transactions, such as those from data entry clerks, then each of these could simply be modeled as separate generators with correct arrival rates.

If, as in the author's experience, there are a large number of transactions, and the choices of the next transaction are varied, a transition matrix has proven to be a good method for generating the arrivals. Each user is generated as an entity and then chooses a transaction starting at a given place in the matrix. The choices are based strictly on the probability of a transition from the last transaction to any given transaction. A corresponding pair of matrices with the user think times as a mean and standard deviation is used to generate the delay between transaction submissions. The details of the method may be found in Keezer, Fenic, and Nelson (1992).

3.2 Applications

There are two ways to model applications: create a sub-model for each different transaction of the application(s), or parameterize the transactions and use a generic sub-model. If the transactions are complex, the impact on the system may vary greatly with small changes. In such a case the individual sub-model approach is necessary. In the author's experience this creates a high maintenance effort and large amounts of code, since in the design and development phases changes are constantly being made in response to model results or to programming problems.

The author has used parameterization to simulate relational database transactions successfully (Keezer, in preparation). For standard applications, there are five basic parameters: system ID (if the model consists of more than one system), CPU used, memory used, the number of DASD I/Os, and the number of network I/Os. To parameterize transactions, create tables with one column for each system in which the transaction occurs, e.g.,

system ID
CPU
memory
DASD I/O
network I/O
others as needed.

One may then index into the table to obtain the necessary parameters. The distribution of the work in the generic sub-model would be via a set of indexed loops. The indices would be for the number of DASD I/Os, the number of network I/Os, and their total plus one or two for distributing the CPU. The CPU data should only include the requirements for processing, not for the system calls; those are provided by the system call simulation. The parameterization of relational database procedures is not as simple, though the method is much the same. In relational databases there are both the various tables and the operations on them by the transactions to parameterize. The details are beyond the scope of this tutorial.

Unless there is a known large skew in the way resources are consumed, the most straightforward method of modeling their consumption is to divide the total CPU requirement by the total DASD and network I/Os plus one, and the DASD I/Os by the network I/Os. One then models a cycle of CPU consumption followed by a DASD I/O, and occasionally one of CPU consumption followed by a network I/O. The final pass is the remaining CPU consumption. The memory is allocated at the beginning of the sub-model, similarly to the way most transactions work, unless it is known that additional allocations of memory are done inside the transaction, in which case the additional memory allocations can be done intermittently, as were the network I/Os.
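The parameter-table lookup and the CPU-distribution cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual model code: the transaction names, parameter values, and function names are invented for the example, and the even division of CPU among the I/Os follows the simple no-skew case.

```python
import random

# Hypothetical parameter table: one row per transaction type. In a multi-system
# model there would be one such entry per system the transaction runs on.
TXN_TABLE = {
    #                system ID, CPU (ms),    memory (KB),  DASD I/Os,   network I/Os
    "order_entry":  {"sys": 1, "cpu_ms": 40.0, "mem_kb": 512, "dasd_io": 8, "net_io": 2},
    "status_query": {"sys": 1, "cpu_ms": 10.0, "mem_kb": 128, "dasd_io": 3, "net_io": 1},
}

def expand_transaction(txn_type):
    """Expand a parameterized transaction into the step list a generic
    sub-model would execute: memory allocated up front, then a cycle of
    CPU slices each followed by a DASD or network I/O, with the remaining
    CPU slice as the final pass."""
    p = TXN_TABLE[txn_type]
    total_io = p["dasd_io"] + p["net_io"]
    cpu_slice = p["cpu_ms"] / (total_io + 1)   # divide total CPU by the I/Os plus one

    steps = [("alloc_memory", p["mem_kb"])]    # allocate at the beginning of the sub-model
    # Interleave the two kinds of I/O so network I/Os occur occasionally among DASD I/Os.
    ios = ["dasd"] * p["dasd_io"] + ["net"] * p["net_io"]
    random.shuffle(ios)
    for kind in ios:
        steps.append(("cpu", cpu_slice))
        steps.append(("dasd_io" if kind == "dasd" else "net_io", 1))
    steps.append(("cpu", cpu_slice))           # final pass: the remaining CPU consumption
    return steps
```

In a full model, each `("cpu", t)` step would seize the CPU resource for that many milliseconds and each I/O step would invoke the DASD or network sub-model; the step list simply makes the cycle explicit.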
in this is Jain (1991). Many designers have not done this before starting development. Remember, the cheapest, fastest simulation is the one that doesn't need to be built. Many designs implicitly assume that resources are unconstrained until proven otherwise. Find the obvious constraints the easy way.

Once the overall numbers indicate the system may be able to perform, the first iteration of the model may be started. This iteration should be as simple as possible, with no details on consumption of resources. The load can even be one or two general transactions. The idea is to obtain a first feel for the behavior of the system. If there are surprises at this stage, good! Everybody stands to win once they are analyzed; early problems are far easier to solve than later ones.

The overall structure of the model should be as modular as possible. Each system in the model should be represented by either a sub-model or a generic system model with a table of parameters for each system being studied. Within a system, each function should be as independent as possible. The whole purpose of this is to allow for easy change as designs change. It also makes the job of trying various alternatives easier. One can even start as simply as a single system, with the interactions with other systems modeled as distributed delays. Later, the various systems can be combined for a more detailed look at the overall combination. The internal divisions of a system are a load generator, the CPU, memory, DASD, and a network connection.

Unless it is definitely known ahead of time, one can model the dynamics of the various components using distributions as approximations. Exponential arrivals, log-normal processing and network times, and triangular DASD responses will provide a good high-level approximation of the overall dynamics. Other than a reasonable estimate of the mean and dispersion, the inputs do not have to be created yet, but some first answers can be obtained. If various functions are modeled as sub-models, then as more detail is needed and the necessary supporting input data is generated, those functions can be replaced easily.

The outputs of interest include resource utilization, throughput-response curves for various systems, and overall transaction response times. Throughput-response curves are very useful and sensitive for validating and calibrating components of a model against benchmark data, and the process of making the model reproduce the benchmark results can lead to insights into the operation of the systems that are not necessarily documented.

5 CONCLUSION

Just as in manufacturing, simulation studies of computer systems during the design phase and throughout development can provide considerable assistance to the developers, revealing potential problems and finding better ways to distribute work. It is important to keep everything as simple as possible. The model should never be more complex than the development concepts it supports, and generally does not need to be as specific as the development efforts. In this area, knowledge of computer systems is as important as simulation technique; the simulations flow in a straightforward manner once the systems are understood.

REFERENCES

Alexander, T.B., et al. 1994. Corporate Business Servers: An Alternative to Mainframes for Business Computing. Hewlett-Packard Journal 45, no. 3: 8-30.
Bach, M.J. 1986. The Design of the UNIX Operating System. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Jain, R. 1991. The Art of Computer Systems Performance Analysis. New York: John Wiley and Sons.
Keezer, W.S. In preparation. Simulating Relational Databases.
Keezer, W.S., Fenic, A.P., and Nelson, B.L. 1992. Representation of User Transaction Processing Behavior with a State Transition Matrix. In Proceedings of the 1992 Winter Simulation Conference, Vol. 25, ed. Swain, J.J., Goldsman, D., Crain, R.D., and Wilson, J.R., 1223-1231. Baltimore, MD: Association for Computing Machinery.
Leffler, S.J., McKusick, M.K., Karels, M.J., and Quarterman, J.S. 1990. The Design and Implementation of the 4.3BSD UNIX Operating System. Reading, MA: Addison-Wesley.
McBeath, D.F., and Keezer, W.S. 1993. Simulation Support of Software Development. In Proceedings of the 1993 Winter Simulation Conference, ed. Evans, G.W., Mollaghasemi, M., Russell, E.C., and Biles, W.E., 1143. Piscataway, NJ: IEEE.
Nelson, B.L., Keezer, W.S., and Schuppe, T.F. 1996. A Hybrid Simulation-Queueing Module for Modeling UNIX I/O in Performance Analysis. In Proceedings of the 1996 Winter Simulation Conference, ed. Charnes, J.M., Morrice, D.M., Brunner, D.T., and Swain, J.J., 1238. Piscataway, NJ: IEEE.
Ruemmler, C., and Wilkes, J. 1992. UNIX Disk Access Patterns. In USENIX Winter 1993 Technical Conference Proceedings, San Diego, CA, January.
Ruemmler, C., and Wilkes, J. 1993. Modeling Disks. HP Laboratories Technical Report HPL-93-68, revision 1, December.
SPARCcenter 2000 Technical White Paper. 1992. Sun Microsystems, Inc., November.
AUTHOR BIOGRAPHY