Essentials of Monte Carlo Simulation: Statistical Methods for Building Simulation Models
Nick T. Thomopoulos
I was fortunate to have a diverse career in industry and academia. This included
working at International Harvester as supervisor of operations research in the
corporate headquarters; at IIT Research Institute (IITRI) as a senior scientist with applications spanning industry and government worldwide; as a professor in the Industrial Engineering Department at the Illinois Institute of Technology (IIT) and in the Stuart School of Business at IIT; at FIC Inc. as a consultant for a software house that specializes in supply chain applications; and through many years of consulting assignments with industry and government throughout the world. At IIT, I was
fortunate to be assigned a broad array of courses, gaining a wide breadth from the variety of topics, from the added knowledge I acquired from the students, and from every repeat of a course. I also was privileged to serve as the advisor to many
bright Ph.D. students as they carried on their dissertation research. Bits of knowl-
edge from the various courses and research helped me in the classroom, and also in
my consulting assignments. I used my industry knowledge in classroom lectures so
the students could see how some of the textbook methodologies actually are applied
in industry. At the same time, the knowledge from the classroom helped to
formulate and develop Monte Carlo solutions to industry applications as they
unfolded. This variety of experience allowed me to see how simulation can be used in industry. This book is based on that total experience.
Simulation has been a valuable tool in my professional life, and some of the
applications are listed below. The simulation models were built for real applications and were coded in various languages: FORTRAN, C++, Basic, and Visual Basic. Some models were coded in an hour, others in several hours, and some in many days, depending on the complexity of the system under study. The knowledge gained from the output of the simulation models proved invaluable to the research team and to the project under study. The simulation results allowed the team to confidently make the decisions needed for the applications at hand. For convenience, the models below are listed by type of application.
Forecasting

• Compare the accuracy of the horizontal forecast model when using 12, 24, or 36
months of history.
• Compare the accuracy of the trend forecast model when using 12, 24, or 36
months of history.
• Compare the accuracy of the seasonal forecast model when using 12, 24, or 36
months of history.
• Compare the accuracy of forecasts between weekly and monthly forecast
intervals.
• Compare the accuracy benefit of forecasts when using month-to-date demands to
revise monthly forecasts.
• Compare the accuracy of the horizontal forecast model with the choice of the
alternative forecast parameters.
• Compare the accuracy of the trend forecast model with the choice of the
alternative forecast parameters.
• Compare the accuracy of the seasonal forecast model with the choice of the
alternative forecast parameters.
• In seasonal forecast models, measure how the size of the forecast error varies as
the season changes from low-demand months to high-demand months.
Order Quantity
• Compare the inventory costs for parts (with horizontal, trend, and seasonal
demand patterns) when stock is replenished by use of the following strategies:
EOQ, Month-in-Buy or Least Unit Cost.
• Compare various strategies to determine the mix of shoe styles to have in a store
that yields the desired service level and satisfies the store quota.
• Compare various strategies to determine the mix of shoe sizes for each style type
to have in a store that yields the desired service level and satisfies the store quota
for the style.
• Compare various strategies to find the initial-order-quantity that yields the least
cost for a new part in a service parts distribution center.
• Compare various strategies to find the all-time-requirement that yields the least
cost for a part in a service parts distribution center.
• Compare various ways to measure lost sales demand for an individual part at a
dealer.
• Compare strategies, for a multi-distribution system, on how often to run a
transfer routine that determines for each part when and how much stock to
transfer from one location to another to avoid mal-distribution.
Safety Stock
• Compare the costs between the four basic methods of generating safety stock:
month’s supply, availability, service level and Lagrange.
• Compare how the service level for a part reacts as the size of the safety stock and
the order quantity vary.
• Compare how a late delivery of stock by the supplier affects the service level of a
part.
• Compare strategies on how to find the minimum amount of early stock to have
available to offset the potential of late delivery by the supplier.
• Measure the relationship between the service level of a part and the amount of
lost sales on the part.
Production
Other
Nick T. Thomopoulos
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Random Number Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Computer Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Computer Simulation Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Basic Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter Summaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Random Number Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Modular Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Linear Congruential Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Generating Uniform Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
32-Bit Word Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Random Number Generator Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Length of the Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Mean and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Chi Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Pseudo Random Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Generating Random Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Inverse Transform Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Continuous Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Discrete Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Accept-Reject Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Truncated Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Order Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Sorted Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Minimum Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Maximum Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Triangular Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Empirical Ungrouped Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Empirical Grouped Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Generating Continuous Random Variates . . . . . . . . . . . . . . . . . . . . 27
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Continuous Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Standard Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Erlang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
When k < 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
When k > 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Standard Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Weibull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Hastings Approximation of F(z) from z . . . . . . . . . . . . . . . . . . . . . 36
Hastings Approximation of z from F(z) . . . . . . . . . . . . . . . . . . . . . 37
Hastings Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Convolution Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Sine-Cosine Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Chi-Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Approximation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Relation to Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Generate a Random Chi-Square Variate . . . . . . . . . . . . . . . . . . . . . 41
Student’s t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Generate a Random Variate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Fisher’s F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5 Generating Discrete Random Variates . . . . . . . . . . . . . . . . . . . . . . 45
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Discrete Arbitrary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Discrete Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Bernoulli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
When n is Small . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Normal Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Poisson Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Hyper Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Pascal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Relation to the Exponential Distribution . . . . . . . . . . . . . . . . . . . . . 53
Generating a Random Poisson Variate . . . . . . . . . . . . . . . . . . . . . . 53
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6 Generating Multivariate Random Variates . . . . . . . . . . . . . . . . . . . 57
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Multivariate Discrete Arbitrary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Generate a Random Set of Variates . . . . . . . . . . . . . . . . . . . . . . . . 58
Multinomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Generating Random Multinomial Variates . . . . . . . . . . . . . . . . . . . 60
Multivariate Hyper Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Generating Random Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Bivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Marginal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Conditional Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Generate Random Variates (x1, x2) . . . . . . . . . . . . . . . . . . . . . . . . 64
Bivariate Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Generate a Random Pair (x1, x2) . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Multivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Generate a Random Set [x1, . . . , xk] . . . . . . . . . . . . . . . . . . . . . . . 67
Multivariate Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Generate a Random Set [x1, . . . , xk] . . . . . . . . . . . . . . . . . . . . . . . 69
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7 Special Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Constant Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Batch Arrivals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Active Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Generate a Random Variate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Standby Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Generate a Random Variate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Random Integers Without Replacement . . . . . . . . . . . . . . . . . . . . . . . 76
Generate a Random Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Poker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Generate Random Hands to Players A and B . . . . . . . . . . . . . . . . . 77
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
8 Output from Simulation Runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Terminating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Nonterminating Transient Equilibrium Systems . . . . . . . . . . . . . . . . . 80
Identifying the End of the Transient Stage . . . . . . . . . . . . . . . . . . . 81
Output Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Partitions and Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Nonterminating Transient Cyclical Systems . . . . . . . . . . . . . . . . . . . . 83
Output Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Cyclical Partitions and Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Other Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Forecasting Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Forecast and Replenish Database . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
9 Analysis of Output Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Variable Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Proportion Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Analysis of Variable Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Sample Mean and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Confidence Interval of μ when x is Normal . . . . . . . . . . . . . . . . . 93
Approximate Confidence Interval of μ when x
is Not Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
When Need More Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Analysis of Proportion Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Proportion Estimate and Its Variance . . . . . . . . . . . . . . . . . . . . . . . 97
Confidence Interval of p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
When Need More Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Comparing Two Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Comparing Two Means when Variable Type Data . . . . . . . . . . . . . . . 101
Comparing x1 and x2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Confidence Interval of (μ1 − μ2) when Normal Distribution . . . . . . 101
Significant Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
When σ1 = σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
When σ1 ≠ σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Approximate Confidence Interval of (μ1 − μ2)
when Not Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
As Degrees of Freedom Increases . . . . . . . . . . . . . . . . . . . . . . . . . 104
Pascal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Q-Q Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
P-P Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Adjustment for Ties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
11 Choosing the Probability Distribution When No Data . . . . . . . . . . 137
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Continuous Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Triangular . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Weibull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Solving for k1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Solving for k2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Appendix B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Appendix C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Chapter 1
Introduction
Monte Carlo Method

To apply the Monte Carlo method, the analyst constructs a mathematical model that simulates a real system and then applies a large number of random samplings of the model, yielding an equally large number of random output results. The method originated in the 1940s with three scientists, John von Neumann, Stanislaw Ulam and Nicholas Metropolis, who were employed on a secret assignment at the Los Alamos National Laboratory, working on a nuclear weapon effort called the Manhattan Project. They conceived a new mathematical method that would become known as the Monte Carlo method. Stanislaw Ulam coined the name after the Monte Carlo Casino in Monaco, a tiny country just south of France facing the Mediterranean Sea that is famous for its beauty, casinos, beaches, and auto racing. The Manhattan team formulated a model of a system they were studying that included input variables and a series of algorithms that were too complicated to solve analytically.
The method is based on running the model many times, as in random sampling. For each sample, random variates are generated for each input variable, and computations are run through the model, yielding random outcomes on each output variable. Since each input is random, the outcomes are random. In this way, the team generated thousands of such samples and achieved thousands of outcomes for each output variable. To carry out this method, a large stream of random numbers was needed. Von Neumann developed a way to calculate pseudo random numbers using a middle-square method. He realized the method had faults, but he reasoned that it was the fastest then available and that he would be aware when the method fell out of alignment.
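As a simple illustration of these mechanics, the Python sketch below (not from the book; the model function is a hypothetical stand-in for a real system equation) generates random variates for two input variables, runs them through the model, and summarizes the random outcomes:

import random

def model(x1, x2):
    # Hypothetical system equation standing in for a real model.
    return x1 + 2.0 * x2

random.seed(1)
outcomes = []
for _ in range(10_000):
    x1 = random.expovariate(1.0)    # input variable 1: exponential, mean 1
    x2 = random.uniform(0.0, 1.0)   # input variable 2: uniform on (0, 1)
    outcomes.append(model(x1, x2))

print(sum(outcomes) / len(outcomes))  # near E(x1) + 2 E(x2) = 2.0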
The Monte Carlo method proved successful and was an important instrument in the Manhattan Project. After the war, during the late 1940s, the method continued in use and became a prominent tool in the development of the hydrogen bomb. The Rand Corporation and the U.S. Air Force were two of the top organizations funding and circulating information on the use of the Monte Carlo method. Soon, applications started appearing in all sorts of situations in business, engineering, science and finance.
Computer Languages
Since the 1940s, many computer languages have been developed and put to use in one way or another, allowing programmers to write code for countless applications. Early languages like COBOL, FORTRAN, Basic, Visual Basic, Java and C++ were popular for developing computer simulation models.
As simulation became more popular for solving complex problems in business, engineering, science, and finance, a new set of computer languages (e.g., GAUSS, SAS, SPSS, R) evolved. Some serve one of the two major types of simulation, continuous or discrete, and some can handle both. These languages allow the user to construct the process he/she is emulating, and they also provide the ability to gather statistics, perform data analysis and produce tables and graphs of outcome summaries.

Discrete simulation models are those where events change one at a time, as in queuing models where customers arrive at or depart from the system individually or in batches. Continuous simulation models are those where events change continuously over time according to a set of differential equations, as in the trajectory of a rocket.
Computer Simulation Software
The number of simulation software packages has also exploded over the years, and most apply to specific processes in industry, gaming and finance. The software may include a series of mathematical equations and algorithms associated with a given process. When the software fits the process under study, the user can apply it and quickly observe the outcomes from a new or modified arrangement of the process without actually building the real system. This ability yields a large savings in development time and cost.
In the 1990s, Frontline introduced the Solver application for spreadsheet use to solve linear and nonlinear problems. This soon led to Microsoft’s Excel Solver. Improvements followed, and in the 2000s Monte Carlo simulation was introduced as another Excel application in which many trials of a process are automatically performed from probability distributions specified by the user. In 2006, yet another feature, the Risk Solver Engine for Excel, was added; it performs instant Monte Carlo simulations whenever a user changes a number on a spreadsheet.
Basic Fundamentals
The early pioneers of the Manhattan Project were fully aware that the validity of their model depended highly on the authenticity of the algorithms they formed, as well as on the choice of the input probability distributions and the parameter values they selected. An error in the formulation could give misleading results. With their creativity, intellect and careful construction, the output from their simulation model was highly successful.
Monte Carlo methods are now extensively used in all industries and government
to study the behavior of complex systems of all sorts. Many of the applications are
performed with software programs, like the Excel Solver models, described earlier.
The casual user will run the powerful easy-to-use software model and collect the
results, and with them, make decisions in the workplace, and that might be all that
is needed. This user may not be completely aware of how the model works inside
and may not have a need to know.
Another user may want to know more on how the model does what it does. Many
others who code their own simulation models need to know the fundamentals. This
book is meant for these users, giving the mathematical basis for developing simulation models. A description is given of how pseudo random numbers are
generated, and further, how they are used to generate random variates of input
variables that come from specified probability distributions, discrete or continuous.
The book further describes how to cope with simulation models that are
associated with two or more variables that are correlated and jointly related.
These are called multivariate variables, and various distributions of this type are
described. Some are discrete, like the multinomial, multivariate hyper geometric,
and some are continuous like the multivariate normal and multivariate lognormal.
In addition, the text helps those users who are confronted with a probability
distribution that does not comply with those that are available in the software in use.
Further, the system could be a non-terminating system that includes transient and
equilibrium (steady state) stages. The text also gives insight on how to determine
the end of the transient stage and the beginning of the equilibrium stage. Most
analysts only want to collect and analyze the data from the equilibrium stage.
Further, one chapter shows how to generate output data that are independent so they can properly be analyzed with statistical methods, and describes ways to steer the runs so that the output data is independent. Another chapter presents a review of the common statistical methods used to analyze the output results, including measuring the average, the variance, confidence intervals, tests between two means or between two proportions, and the one-way analysis of variance.
The better the analyst can structure a simulation model to emulate the real system, the more reliable the output results in problem-solving decisions. Besides formulating the equations and algorithms of the system properly, the analyst is confronted with selecting the probability distribution that applies to each input variable in the model. This is done with use of the data, empirical or sample, that is available. With this data, the probability distributions are selected and the accompanying parameter values are estimated. The better the fit, the better the model. One of the chapters describes how to do this.
Another issue that sometimes confronts the analyst is choosing the probability distribution and the corresponding parameter value(s) when no data is available for an input variable. In this situation, the analyst relies on the best judgment of one or more experts. Statistical ways are offered in the text to assist in choosing the probability distribution and estimating the parameters.
Chapter Summaries
The following is a list of the remaining chapters and a quick summary on the content
of each.
Chapter 2. Random Number Generators Since early days, the many applications of randomness have led to a wide variety of methods for generating random data of various types, like rolling dice, flipping coins and shuffling cards. But these methods are physical and are not practical when a large amount of random data is needed in an application. Since the advent of computers, a variety of computational methods have been suggested to generate the random data, usually with random numbers. Scientists, engineers and researchers are ever more developing simulation models in their applications, and their models require a large, if not vast, number of random numbers in processing. Developing these simulation models is not possible without a reliable way to generate random numbers.
Chapter 3. Generating Random Variates Random variables are classified as discrete
or continuous. Discrete is when the variable can take on a specified list of values,
Chapter Summaries 5
and continuous is when the variable can assume any value in a specified interval.
The mathematical function that relates the values of the random variable with a
probability is the probability distribution. When a value of the variable is randomly
chosen according to the probability distribution, it is called a random variate. This
chapter describes the common methods to generate random variates for random
variables from various probability distributions. Two methods are in general use for
this purpose, one is called the Inverse Transform method (IT), and the other is the
Accept-Reject method (AR). The IT method is generally preferred, assuming the distribution function transforms readily. If the distribution is mathematically complicated and not easily transformed, the IT method becomes complicated and is not easily used. The AR method generally requires more steps than the IT method
to achieve the random variate. The chapter presents various adaptations of these
two methods.
Chapter 4. Generating Continuous Random Variates A continuous random variable
has a mathematical function that defines the relative likelihood that any value in a
defined interval will occur by chance. The mathematical function is called the
probability density. For example, the interval could be all values from 10 to 50,
or might be all values zero or larger, and so forth. This chapter considers the more
common continuous probability distributions and shows how to generate random
variates for each. The probability distributions described here are the following: the
continuous uniform, exponential, Erlang, gamma, beta, Weibull, normal, lognormal,
chi-square, Student’s t, and Fisher’s F. Because the standard normal distribution is so
useful in statistics and in simulation, and no closed-form formula is available, the
chapter also lists the Hastings approximation formula that measures the relationship
between the variable value and its associated cumulative probability.
Chapter 5. Generating Discrete Random Variates A discrete random variable
includes a specified list of exact values where each is assigned a probability of
occurring by chance. The variable can take on a particular set of discrete events,
like tossing a coin (head or tail), or rolling a die (1,2,3,4,5,6). This chapter considers
the more common discrete probability distributions and shows how to generate
random variates for each. The probability distributions described here are the
following: discrete arbitrary, discrete uniform, Bernoulli, binomial, hyper geometric,
geometric, Pascal and Poisson.
Chapter 6. Generating Multivariate Random Variates When two or more random
variables are jointly related in a probabilistic way, they are labeled as multivariate
random variables. The probability of the variables occurring together is defined by a
joint probability distribution. In most situations, all of the variables included in the distribution are continuous or all are discrete; less often, they are a mixture of continuous and discrete. This chapter considers some of the more
popular multivariate distributions and shows how to generate random variates for
each. The probability distributions described here are the following: multivariate
discrete arbitrary, multinomial, multivariate hyper geometric, bivariate normal,
bivariate lognormal, multivariate normal and multivariate lognormal. The Cholesky decomposition used for the multivariate normal and multivariate lognormal cases is also described.
The appropriate statistical method for each type of data is applied as needed. This
includes measuring the average value and computing the confidence interval of the
true mean. Oftentimes, the simulation model is run with one or more control
variables in a ‘what if’ manner. The output data between the two or more settings
of the control variables can be compared using appropriate statistical tools. This
includes testing for significant difference between two means, between two
proportions, and between k or more means.
Chapter 10. Choosing the Probability Distribution From Sample Data In building a
simulation model, the analyst often includes several input variables of the control
and random type. The control variables are those that are of the “what if” type.
Often, the purpose of the simulation model is to determine how to set the control
variables in the real system seeking optimal results. For example, in an inventory
simulation model, the control variables may be the service level and the holding
rate, both of which are controlled by the inventory manager. On each run of the
model, the analyst sets the values of the control variables and observes the output
measures to see how the system reacts.
Another type of input variable is the random variable, and these are of the continuous and discrete type. This type of variable is needed to match, as best as possible, the real-life system that the simulation model seeks to emulate.
For each such variable, the analyst is confronted with choosing the probability
distribution to apply and the parameter value(s) to use. Often empirical or sample
data is available to assist in choosing the distribution to apply and in estimating
the associated parameter values. Sometimes two or more distributions may seem
appropriate and the one to select is needed. The authenticity of the simulation
model largely depends on how well the analyst emulates the real system. Choosing
the random variables and their parameter values is vital in this process.
This chapter gives guidance on the steps to find the probability distribution to use
in the simulation model and how to estimate the parameter values that pertain. For
each of the random variables in the simulation model with data available, the
following steps are described: verify the data is independent, compute various
statistical measures, choose the candidate probability distributions, estimate the
parameter(s) for each probability distribution, and determine the adequacy of the fit.
Chapter 11. Choosing the Probability Distribution When No Data Sometimes the
analyst has no data to measure the parameters on one or more of the input variables
in a simulation model. When this occurs, the analyst is limited to a few distributions
where the parameters may be estimated without empirical or sample data. Instead of
data, experts are consulted who give their judgment on various parameters of the
distributions. This chapter explores some of the more common distributions where
such expert opinions are useful. The distributions described here are continuous and
are the following: continuous uniform, triangular, beta, lognormal and Weibull. The
type of data provided by the experts includes the minimum value,
maximum value, most likely value, average value, and a p-quantile value.
Chapter 2
Random Number Generators
Introduction
Over many years, numerous applications of randomness have led to a wide variety of methods for generating random data of various types, like rolling dice, flipping coins and shuffling cards. But these methods are physical and are not practical when a large amount of random data is needed in an application. Since
the advent of computers, a variety of computational methods have been suggested
to generate the random data, usually with random numbers. Scientists, engineers
and researchers are ever more developing simulation models in their applications, and their models require a large, if not vast, number of random numbers in
processing. Developing these simulation models is not possible without a reliable
way to generate random numbers. This chapter describes some of the fundamental
considerations in this process.
Modular Arithmetic
Generating random numbers with use of a computer is not easy. Many mathe-
maticians have grappled with the task and only a few acceptable algorithms have
been found. One of the tools used to generate random numbers is the mathematical function called modular arithmetic. For a variable w, the modulo of w with modulus m is denoted as w modulo(m). The function returns the remainder of w when divided by m. In the context here, w and m are integers, and the remainder returned is also an integer. For example, if m = 5,

5 modulo(5) = 0
6 modulo(5) = 1
17 modulo(5) = 2

and so forth. Hence, for example, the numbers 1, 6, 11, 16 are all congruent modulo 5, since each leaves the remainder 1. Note, the difference of any two numbers that have the same remainder is perfectly divisible by m, and thus they are congruent. Also notice, when the modulus is m, the values returned are the integers 0 to m − 1.
Modular arithmetic works somewhat like a clock, where the numbers for the hours always run from 1 to 12. The same applies to the days of the week (1–7) and the months of the year (1–12).
Linear Congruential Generators

In 1951, Lehmer introduced a way to generate random numbers, called the linear congruential generator (LCG). This method adapts well to computer applications and today is the most common technique in use. The method requires the use of the modulo function, as shown below.

The LCG calls for three parameters (a, b, m) and uses modular arithmetic. To obtain the i-th value of w, the function uses the prior value of w, as in the relation below:

wi = (a wi−1 + b) modulo(m)

In the above, w is a sequence of integers generated from this function, and i is the index of the sequence.

Example 2.1 Suppose m = 32, a = 5 and b = 3. Also, assume the seed (at i = 0) for the LCG is w0 = 11. Applying

wi = (5 wi−1 + 3) modulo(32)

generates a full cycle of 32 integers; the complete sequence, beginning 26, 5, 28, 15, . . . and returning to the seed 11, is listed in the Pseudo Random Numbers section below.
The reader should be aware that it is not easy to find the combination of a and b
that gives a full cycle for a modulus m. The trio of m = 32, a = 5 and b = 3 is one
such combination that works.
Example 2.2 Now suppose the parameters are m = 32, a = 7 and b = 13. Also assume the seed for the LCG is w0 = 20. Applying

wi = (7 wi−1 + 13) modulo(32)

yields the sequence 25, 28, 17, 4, 9, 12, 1, 20, 25, 28, . . .

Note there is a cycle of eight numbers, from 25 to 20. In general, when one of the numbers in the sequence is the same as the seed, here w0 = 20, the sequence repeats with the same numbers. In this situation, after eight numbers the seed value is generated, and thereby the cycle of eight numbers continually repeats, in a loop, as shown above.
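A minimal Python sketch of the LCG recursion (the function name is illustrative) reproduces both examples:

def lcg_cycle(a, b, m, seed):
    # Generate w_i = (a*w_{i-1} + b) modulo(m) until the seed reappears,
    # bounded by m steps, the longest possible cycle.
    w, sequence = seed, []
    for _ in range(m):
        w = (a * w + b) % m
        sequence.append(w)
        if w == seed:
            break
    return sequence

print(lcg_cycle(5, 3, 32, 11))   # full cycle of 32 numbers (Example 2.1)
print(lcg_cycle(7, 13, 32, 20))  # cycle of 8 numbers: 25, ..., 20 (Example 2.2)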
The majority of computers today have word lengths of 32 bits. For these machines, the largest number recognized is (2^31 − 1), and the smallest is −(2^31 − 1). The first of the 32 bits identifies whether the number is plus or minus, leaving the remaining 31 bits to determine the number.
So, for these machines, the ideal value of the modulus is m = (2^31 − 1), since a full cycle with this m gives a sequence with the largest number of unique random uniform variates. The goal is to find a combination of parameters that are compatible with the modulus. Fishman and Moore (1982) have done extensive analysis on random number generators, determining their acceptability for simulation use. In 1969, Lewis, Goodman and Miller suggested the parameter values a = 16,807 and b = 0; and also in 1969, Payne, Rabung and Bogyo offered a = 630,360,016 with b = 0. These combinations have been installed in computer compilers and are accepted as parameter combinations suitable for scientific use. The Fishman and Moore reference also identifies other multipliers that achieve good results.
Random Number Generator Tests

Length of the Cycle

The first consideration is how many variates are generated before the cycle repeats. The most important rule is to have a full cycle with the length of the modulus m. Assume that n uniform variates are generated and are labeled as (u1, . . ., un), where n = m or is close to m. The ideal is to generate random numbers that span a full cycle, or very nearly a full cycle.
Mean and Variance

For the set of n variates (u1, . . ., un), the sample average and variance are computed and labeled as ū and su², respectively. The goal of the random number generator is to emulate the standard continuous uniform random variable u, denoted here as u ~ U(0,1), with expected value E(u) = 0.5 and variance V(u) = σ² = 1/12 = 0.0833. So, the appropriate hypothesis tests on the mean and variance are used to compare ū to 0.5 and su² to 0.0833.
Chi Square
The sequence (u1, . . ., un) is sorted into k intervals, say k = 10, where i = 1, . . ., 10 identifies the interval in which each u falls. When k = 10, the intervals are: (0.0–0.1), (0.1–0.2), . . ., (0.9–1.0). Now let fi designate the number of u’s that fall in interval i. Since n is the number of u’s in the total sample, the expected number of u’s in an interval is ei = 0.1n. With the ten sets of fi and ei, a Chi Square (goodness-of-fit) test is used to determine if the sequence of u’s is spread equally over the range from zero to one.

The above chi square test can be expanded to two dimensions, where the pairs of u’s (ui, ui+1) are applied as follows. Assume the same ten intervals are used as above, for both ui and ui+1. That means there are 10 × 10 = 100 possible cells in which a pair can fall. Let fij designate the number of pairs that fall in cell ij. Since n values of u are tested, there are n/2 pairs, so the expected number of pairs to fall in a cell is eij = 0.01(n/2). This allows use of the Chi Square test to determine if fij is significantly different from eij. For a truly uniform distribution, the number of entries in the cells should be equally distributed. When more precision is called for, the length of the intervals can be reduced from 0.10 to 0.05 or to 0.01, for example.

In the same way, triplets of u’s can be tested to determine if the u’s generated follow the expected values from a truly uniform distribution. With k = 10 and three dimensions, the entries fall into 10 × 10 × 10 = 1,000 cubes, and for a truly uniform distribution, the number of entries in the cubes should be equally distributed.
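A sketch of the one-dimensional test in Python (assuming k = 10 intervals; the helper name is illustrative):

import random

def chi_square_statistic(us, k=10):
    # Count how many u's fall in each of k equal intervals, then form
    # the statistic sum (f_i - e_i)^2 / e_i with e_i = n/k.
    f = [0] * k
    for u in us:
        f[min(int(u * k), k - 1)] += 1   # interval index, guarding u == 1.0
    e = len(us) / k
    return sum((fi - e) ** 2 / e for fi in f)

random.seed(7)
us = [random.random() for _ in range(10_000)]
print(chi_square_statistic(us))  # compare with the chi-square(k-1) critical value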
Autocorrelation
Another test computes the autocorrelation between the u’s at various lags of length 1, 2, 3, . . .. The ideal is for all the lag autocorrelations to be near zero, plus or minus. When the lag is k, the estimate of the autocorrelation is the following:

rk = Σi (ui − 0.5)(ui+k − 0.5) / Σi (ui − 0.5)²
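In Python, the lag-k estimate can be sketched as follows (names are illustrative):

import random

def lag_autocorrelation(us, k):
    # r_k = sum_i (u_i - 0.5)(u_{i+k} - 0.5) / sum_i (u_i - 0.5)^2
    d = [u - 0.5 for u in us]
    numerator = sum(d[i] * d[i + k] for i in range(len(d) - k))
    denominator = sum(di * di for di in d)
    return numerator / denominator

random.seed(3)
us = [random.random() for _ in range(10_000)]
print([round(lag_autocorrelation(us, k), 4) for k in (1, 2, 3)])  # all near zero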
Pseudo Random Numbers

In the earlier example with m = 32, a full cycle of w’s was generated with the parameters (a = 5, b = 3). When the seed was set at w0 = 11, the sequence of w’s generated was the following:

26, 5, 28, 15, 14, 9, 16, 19, 2, 13, 4, 23, 22, 17, 24, 27, 10, 21, 12, 31, 30, 25, 0, 3, 18, 29, 20, 7, 6, 1, 8, 11
With this combination of parameters, whenever the seed is w0 = 11, the same sequence of random numbers will emerge. So in essence, these are not truly random numbers, since they are predictable and will fall exactly as listed above. These numbers are thereby called pseudo random numbers, where pseudo means pretend.
Note, in the above, if the seed were changed to w0 = 30, say, the sequence of random numbers would be the following:

25, 0, 3, 18, 29, 20, 7, 6, 1, 8, 11, 26, 5, 28, 15, 14, 9, 16, 19, 2, 13, 4, 23, 22, 17, 24, 27, 10, 21, 12, 31, 30
As the seed changes, another full cycle of 32 numbers is again attained.
The examples here are illustrated with m = 32, a small set of random values. But when m is large, like (2^31 − 1), a very large sequence of random numbers is generated. In some simulation situations, it is useful to reuse the same sequence of random numbers, and then the same seed is applied. In other situations, the seed is changed on each run so that a different sequence of random numbers is used in the analysis.
Summary
The integrity of computer simulation models is only as good as the reliability of the
random number generator that produces the stream of random numbers one after
the other. The chapter describes the difficult task of developing an algorithm to
generate random numbers that are statistically valid and have a large cycle length.
The linear congruential method is currently the common way to generate the random
numbers for a computer. The parameters of this method include the multiplier and
the seed. Only a few multipliers are statistically recommended, and two popular
ones in use for 32-bit word length computers are presented. Another parameter is
the seed and this allows the analyst the choice of altering the sequence of random
numbers with each run, and also when necessary, offers the choice of using the
same sequence of random numbers from one run to another.
Chapter 3
Generating Random Variates
Introduction
Perhaps the most common way to generate a random variate for a random variable
is by the inverse transform method. The method applies to continuous and discrete
variables.
Continuous Variables
Discrete Variables
Accept-Reject Method
For a density f(x) defined over a finite range (a ≤ x ≤ b) with maximum density value f*, the Accept-Reject method uses four steps: (1) generate two uniform variates (u1, u2); (2) set the candidate x = a + u1(b − a); (3) compute f(x); and (4) if u2 ≤ f(x)/f*, accept x, and otherwise repeat from step 1.

Suppose the density is f(x) = x/8 over (0 ≤ x ≤ 4). The mode is at x̃ = 4, and thereby f* = f(4) = 0.5. So, now the four steps noted above are followed:

1. Generate (u1, u2) = (0.26, 0.83), say.
2. x = 0 + 0.26(4 − 0) = 1.04.
3. f(1.04) = 0.13.
4. Since 0.83 > 0.13/0.50 = 0.26, reject x = 1.04, and repeat steps 1–4.

1. Generate (u1, u2) = (0.72, 0.15), say.
2. x = 0 + 0.72(4 − 0) = 2.88.
3. f(2.88) = 0.36.
4. Since 0.15 < 0.36/0.50 = 0.72, accept x = 2.88.
5. Return x = 2.88.
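A Python sketch of the four steps for this density (assuming f(x) = x/8 on [0, 4], as in the example above):

import random

def f(x):
    return x / 8.0          # example density with mode at x = 4, f* = 0.5

def accept_reject(a=0.0, b=4.0, f_star=0.5):
    while True:
        u1, u2 = random.random(), random.random()
        x = a + u1 * (b - a)          # step 2: uniform candidate over [a, b]
        if u2 <= f(x) / f_star:       # step 4: accept with probability f(x)/f*
            return x

random.seed(5)
print(accept_reject())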
Truncated Variables
Sometimes, when the inverse transform method applies, the random variable of interest is a truncated portion of another random variable. For example, suppose x has density f(x) for a ≤ x ≤ b, and F(x) is the associated cumulative distribution function of x. But assume the variable of interest is only a portion of the original density, where c ≤ x ≤ d, and the limits c, d are within the original limits a and b. Therefore, a ≤ c ≤ x ≤ d ≤ b. The density of this truncated variable becomes

f(x)/[F(d) − F(c)]    c ≤ x ≤ d

and a random variate is obtained by the inverse transform x = F⁻¹(F(c) + u[F(d) − F(c)]).
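For instance, a truncated exponential can be sampled by mapping u into the interval [F(c), F(d)] before inverting; a minimal sketch (names illustrative):

import math, random

def truncated_exponential(theta, c, d, u):
    # F(x) = 1 - e^(-theta x); map u into [F(c), F(d)], then invert F.
    F = lambda x: 1.0 - math.exp(-theta * x)
    p = F(c) + u * (F(d) - F(c))
    return -math.log(1.0 - p) / theta

random.seed(2)
print(truncated_exponential(theta=0.5, c=1.0, d=3.0, u=random.random()))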
Order Statistics
Suppose n samples are taken from a continuous random variable x, with density f(x) and cumulative distribution F(x), and these are labeled as (x1, . . ., xn). Sorting the n samples from low to high yields x(1) ≤ x(2) ≤ . . . ≤ x(n), where x(i) is the i-th lowest value in the sample.
Sorted Values
The notation y is used here to denote the i-th sorted value from the n samples of x. From order statistics, the probability density of y becomes:

g(y) = [n!/((i − 1)!(n − i)!)] F(y)^(i−1) [1 − F(y)]^(n−i) f(y)

Note, there is one value of x = y, (i − 1) values of x smaller than y, and (n − i) values larger than y. See Rose and Smith (2002) for more detail on order statistics.
Minimum Value
When y is the smallest value of x, then y = min(x1, . . ., xn), and the cumulative distribution of y becomes:

G(y) = 1 − [1 − F(y)]^n
Maximum Value
When w is the largest value of x, then w = max(x1, . . ., xn). Hence, the cumulative distribution of w becomes:

G(w) = F(w)^n
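Both extremes can be generated directly by the inverse transform: setting G equal to a uniform u and solving gives w = F⁻¹(u^(1/n)) for the maximum and y = F⁻¹(1 − (1 − u)^(1/n)) for the minimum. A sketch with a standard exponential base distribution (θ = 1):

import math, random

def exp_inverse(p):
    # F^-1(p) for the standard exponential, F(x) = 1 - e^(-x)
    return -math.log(1.0 - p)

def max_of_n(n, u):
    return exp_inverse(u ** (1.0 / n))                 # from G(w) = F(w)^n

def min_of_n(n, u):
    return exp_inverse(1.0 - (1.0 - u) ** (1.0 / n))   # from G(y) = 1 - [1 - F(y)]^n

random.seed(4)
u = random.random()
print(min_of_n(10, u), max_of_n(10, u))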
Composition
A density f(x) may be composed of k individual densities as in

f(x) = p1 f1(x) + . . . + pk fk(x)

where pi is the probability that density fi(x) applies, and

p1 + . . . + pk = 1

Note, each of the k densities, fi(x), has a unique cumulative distribution function, Fi(x), and a corresponding unique inverse function, Fi⁻¹(u).

The composition can be described as below.

i    fi(x)    pi    Gi
1    f1(x)    p1    G1 = p1
. . .
k    fk(x)    pk    Gk = Gk−1 + pk

The term Gi is the cumulative distribution function of the pi’s, and when i = k, Gk = 1.
Example 3.7 The density for variable x is composed of two separate densities, f1(x) = 0.125x for (0 ≤ x ≤ 4) and f2(x) = 0.25 for (2 ≤ x ≤ 6). The associated probabilities are p1 = 0.6 for f1(x) and p2 = 0.4 for f2(x). So, G1 = 0.6 and G2 = 1.0. Note also, F1⁻¹(u) = √(u/0.0625) and F2⁻¹(u) = 2 + 4u. To generate a random x, two random uniform variates u ~ U(0,1) are needed, u1 and u2. The steps below are followed:

1. Find two random uniform variates, u ~ U(0,1). Say (u1, u2) = (0.14, 0.53).
2. Since u1 < G1 = 0.60, density f1(x) is used.
3. x = F1⁻¹(0.53) = √(0.53/0.0625) = 2.91.
4. Return x = 2.91.
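A sketch of Example 3.7 in Python (the helper name is illustrative):

import math, random

def composed_variate(u1, u2):
    # Step 1: u1 selects the density via G1 = 0.6; step 2: u2 inverts it.
    if u1 < 0.6:
        return math.sqrt(u2 / 0.0625)   # F1^-1(u) for f1(x) = 0.125x on [0, 4]
    return 2.0 + 4.0 * u2               # F2^-1(u) for f2(x) = 0.25 on [2, 6]

print(composed_variate(0.14, 0.53))     # 2.91, as in the example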
Summation
On some occasions when generating a random variate, the sum of a random variable is needed, as in y = (x1 + . . . + xk). For notational convenience in this section, y will denote the sum of k independent samples of x, where x is a random variable with distribution f(x). This method of summing is applied in subsequent chapters, in a convolution manner, to generate a random variate for the continuous Erlang distribution, and also for the discrete Pascal distribution.
Triangular Distribution
The notation

x ~ T(a, b, x̃)

denotes a triangular variable x with minimum a, maximum b and mode x̃. The corresponding standard triangular variable is

x′ ~ T(0, 1, x̃′)

where

x′ = (x − a)/(b − a)

and

x̃′ = (x̃ − a)/(b − a)

The density of the standard triangular is:

f(x′) = 2x′/x̃′    (0 ≤ x′ ≤ x̃′)
f(x′) = 2(1 − x′)/(1 − x̃′)    (x̃′ < x′ ≤ 1)

The cumulative distribution function is:

F(x′) = x′²/x̃′    (0 ≤ x′ ≤ x̃′)
F(x′) = 1 − (1 − x′)²/(1 − x̃′)    (x̃′ < x′ ≤ 1)

The expected value is:

E(x′) = (x̃′ + 1)/3
The expected value and variance of the triangular x are related to those of the standard triangular x′ as shown below:

E(x) = a + E(x′)(b − a)

V(x) = V(x′)(b − a)²
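Inverting F(x′) branch by branch gives x′ = √(u x̃′) when u ≤ x̃′, and x′ = 1 − √((1 − u)(1 − x̃′)) otherwise; a sketch that then rescales to (a, b):

import math, random

def triangular_variate(a, b, mode, u=None):
    if u is None:
        u = random.random()
    m = (mode - a) / (b - a)             # standard-scale mode
    if u <= m:
        x_prime = math.sqrt(u * m)                        # lower branch of F
    else:
        x_prime = 1.0 - math.sqrt((1.0 - u) * (1.0 - m))  # upper branch of F
    return a + x_prime * (b - a)         # rescale to the (a, b) range

print(triangular_variate(0.0, 10.0, 4.0, u=0.5))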
Empirical Ungrouped Data

Sometimes, in simulation modeling, the data for a variable is not used to seek a theoretical continuous density, but instead is applied directly to define the distribution. The density that results is called the empirical distribution. Suppose the data is denoted as (x1, . . ., xn), and when sorted from low to high it becomes x(1) ≤ x(2) ≤ . . . ≤ x(n). The data is then arranged in tabular form as below, along with the associated cumulative distribution function, Fx(i).

i    x(i)    Fx(i)
1    2       0.00
2    5       0.25
3    10      0.50
4    11      0.75
5    20      1.00
Example 3.11 Assume the same data (2, 5, 10, 11, 20) as in Example 3.10. To generate a random x by the inverse transform method, the following steps are followed:

1. Generate a random uniform variate, say u = 0.82.
2. Since 0.75 ≤ 0.82 < 1.00, i = 4, and
   x = 11 + {[0.82 − 0.75]/[1.00 − 0.75]}[20 − 11] = 13.52.
3. Return x = 13.52.
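A sketch of the routine (the function name is illustrative), interpolating within the segment that contains u:

def empirical_ungrouped(u, xs=(2, 5, 10, 11, 20)):
    n = len(xs)
    F = [i / (n - 1) for i in range(n)]    # 0.00, 0.25, 0.50, 0.75, 1.00
    for i in range(n - 1):
        if F[i] <= u < F[i + 1]:
            # linear interpolation within segment i
            return xs[i] + (u - F[i]) / (F[i + 1] - F[i]) * (xs[i + 1] - xs[i])
    return float(xs[-1])                   # u == 1.0

print(empirical_ungrouped(0.82))           # 13.52, as in Example 3.11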
Empirical Grouped Data

Sometimes the data comes in grouped form, as shown in the table below. Note the k intervals, where [ai, bi) identifies the limits of x within interval i, and fi is the frequency of samples in interval i. The sum of the frequencies is denoted as n. The cumulative distribution function for interval i becomes Fi = Fi−1 + fi/n, where F0 = 0.

i    [ai, bi)    fi    Fi
1    [a1, b1)    f1    F1 = F0 + f1/n
2    [a2, b2)    f2    F2 = F1 + f2/n
. . .
k    [ak, bk)    fk    Fk = Fk−1 + fk/n

For example, the grouped data below has n = 80 samples over k = 5 intervals:

i    [ai, bi)    fi    Fi
1    [5, 8)      2     0.0250
2    [8, 11)     27    0.3625
3    [11, 13)    32    0.7625
4    [13, 15)    15    0.9500
5    [15, 18)    4     1.0000
Another way to generate the random variate is by the inverse transform method, as shown below:

1. Generate a random uniform variate, u ~ U(0,1).
2. Find the interval i where Fi−1 ≤ u < Fi, and set x = ai + {[u − Fi−1]/[Fi − Fi−1]}(bi − ai).
3. Return x.

Example 3.13 Assume the same data as in Example 3.12. To find a random x by the inverse transform method, the routine below applies:

1. Generate a random uniform variate, u ~ U(0,1). Say u = 0.91.
2. Since 0.7625 ≤ 0.91 < 0.9500, i = 4 and
   x = 13 + {[0.91 − 0.7625]/[0.9500 − 0.7625]}(15 − 13) = 14.573.
3. Return x = 14.573.
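A sketch of the grouped-data routine, reproducing Example 3.13:

def empirical_grouped(u, intervals, freqs):
    # Walk the intervals, accumulating F_i = F_{i-1} + f_i/n, and
    # interpolate within the interval where F_{i-1} <= u < F_i.
    n = sum(freqs)
    F_prev = 0.0
    for (a, b), f in zip(intervals, freqs):
        F = F_prev + f / n
        if u < F:
            return a + (u - F_prev) / (F - F_prev) * (b - a)
        F_prev = F
    return float(intervals[-1][1])   # u == 1.0

intervals = [(5, 8), (8, 11), (11, 13), (13, 15), (15, 18)]
freqs = [2, 27, 32, 15, 4]
print(empirical_grouped(0.91, intervals, freqs))  # 14.573, as in Example 3.13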
Summary
This chapter shows how the continuous uniform u ~ U(0,1) random variates are
used to generate random variates for random variables from defined probability
distributions. To accomplish this in a computer simulation model, a random number
generator algorithm is applied whenever a random uniform u ~ U(0,1) variate is
needed. The random number generator is the catalyst that delivers the uniform,
u ~ U(0,1), random variates, one after another, as they are needed in the simulation
model. This is essential since it allows the analyst the opportunity to create
simulation models that use any probability distribution that pertains and gives
flexibility to emulate the actual system under study.
Chapter 4
Generating Continuous Random Variates
Introduction
A continuous random variable has a mathematical function that defines the relative
likelihood that any value in a defined interval will occur by chance. The mathe-
matical function is called the probability density. For example, the interval could be
all values from 10 to 50, or might be all values zero or larger, and so forth. This
chapter considers the more common continuous probability distributions and shows
how to generate random variates for each. The probability distributions described
here are the following: the continuous uniform, exponential, Erlang, gamma, beta,
Weibull, normal, lognormal, chi-square, Student’s t, and Fisher’s F. Because the
standard normal distribution is so useful in statistics and in simulation, and no
closed-form formula is available, the chapter also lists the Hastings approximation
formula that measures the relationship between the variable value and its associated
cumulative probability.
Continuous Uniform
The continuous uniform variable x is equally likely to fall anywhere in the interval (a, b), and its expected value is:

E(x) = (b + a)/2
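By the inverse transform, a random uniform variate on (a, b) is x = a + u(b − a); a minimal sketch (the function name is illustrative):

import random

def uniform_variate(a, b, u=None):
    if u is None:
        u = random.random()
    return a + u * (b - a)    # inverse transform of F(x) = (x - a)/(b - a)

print(uniform_variate(10, 50, u=0.25))  # 20.0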
Exponential
The mean and variance of the exponential variable x with parameter θ are:

μ = 1/θ

and

σ² = 1/θ²

By the inverse transform method, the cumulative distribution function is set equal to a uniform variate,

F(x) = u = 1 − e^(−θx)

and solving for x yields a random exponential variate:

x = −(1/θ) ln(1 − u)
Standard Exponential
Note, the expected value of x is E(x) = 1/θ, and in the special case when θ = 1, the expected value is E(x) = 1. When x = 1 (the mean), F(1) = 0.632, indicating that 63.2% of the values of x fall below the mean and 36.8% above. In this special situation, with E(x) = 1, the distribution is the standard exponential distribution. The list below relates selected values of the cumulative distribution function, F(x), to the corresponding values of x.
F(x) x
0.0 0.000
0.1 0.105
0.2 0.223
0.3 0.357
0.4 0.511
0.5 0.693
0.6 0.916
0.7 1.204
0.8 1.609
0.9 2.303
Note, the median occurs at x ¼ 0.693 and the mean at x ¼ 1.00, indicating the
distribution skews far to the right.
Example 4.2 Assume an exponentially distributed random variable x with a mean of 20, and a random variate of x is called for in a simulation model. To generate the random x, the following steps are followed:

1. Generate a random uniform u ~ U(0,1), say u = 0.17.
2. x = −20 ln(1 − 0.17) = 3.727.
3. Return x = 3.727.
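A sketch of the routine in Python, reproducing Example 4.2:

import math, random

def exponential_variate(mean, u=None):
    # x = -mean * ln(1 - u), the inverse transform with mean = 1/theta
    if u is None:
        u = random.random()
    return -mean * math.log(1.0 - u)

print(round(exponential_variate(20, u=0.17), 3))  # 3.727, as in Example 4.2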
Erlang
In some queuing systems, the time associated with arrivals and service times is
assumed to be an Erlang continuous random variable. The Erlang variable has two
parameters, θ and k. The parameter θ is the scale parameter, and k, an integer,
identifies the number of independent exponential variables that are summed
together to form the Erlang variable. In this way, the Erlang variable x is related
to the exponential variables y as below:
x = (y1 + ... + yk)
The cumulative distribution function is:
F(x) = 1 − e^(−θx) Σ_{j=0}^{k−1} (θx)^j/j!   x ≥ 0
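Since the Erlang variable is the sum of k independent exponential components, a random Erlang variate can be generated by summing k exponential variates produced by the inverse transform. A Python sketch follows (names illustrative; θ is used so that each component has mean 1/θ and E(x) = k/θ):

import math
import random

def erlang_variate(k, theta):
    # Sum of k independent exponentials, each with mean 1/theta
    total = 0.0
    for _ in range(k):
        total += -math.log(1.0 - random.random()) / theta
    return total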
Gamma
The gamma distribution is almost the same as the Erlang distribution, except the
parameter k is any value larger than zero whereas k is a positive integer for the
Erlang. Also, x is any value greater than or equal to zero. The mean and variance of the gamma are:
μ = k/θ
and
σ² = k/θ²
Generating a random gamma variate is not easy. The method presented here
depends on whether k < 1 or k > 1. Note, when k = 1, the distribution is the same
as an exponential.
When k < 1
A routine to generate a random x, when k < 1, comes from Ahrens and Dieter
(1974). It is based on the Accept-Reject method and is shown below in five steps.
In the computations below, x′ is a gamma variate with θ = 1, and x is a gamma
variate with any positive θ:
When k > 1
Cheng (1977) developed the routine to generate a random x when k > 1. The
method uses the Accept-Reject method as shown in the five steps listed below. Note
below where x′ is a gamma variate with θ = 1, and x is a gamma variate with any
positive θ:
1. Set a = 1/√(2k − 1)
   b = k − ln 4, where ln = natural logarithm.
   q = k + 1/a
   c = 4.5
   d = 1 + ln(4.5)
2. Generate two random uniform u ~ U(0,1) variates, u1 and u2.
   v = a ln[u1/(1 − u1)]
   y = k e^v
   z = u1² u2
   w = b + qv − y
3. If w + d − cz ≥ 0, set x′ = y, go to step 5.
   If w + d − cz < 0, go to step 4.
4. If w ≥ ln(z), set x′ = y, go to step 5.
   If w < ln(z), go to step 2.
5. Return x = x′/θ.
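A sketch of the five steps in Python is below (the function name is illustrative; as in step 5, the θ = 1 variate is rescaled by 1/θ):

import math
import random

def gamma_variate_cheng(k, theta):
    # Cheng (1977) accept-reject routine; valid for k > 1
    a = 1.0 / math.sqrt(2.0 * k - 1.0)
    b = k - math.log(4.0)
    q = k + 1.0 / a
    c = 4.5
    d = 1.0 + math.log(4.5)
    while True:                      # step 2: generate two uniforms
        u1, u2 = random.random(), random.random()
        v = a * math.log(u1 / (1.0 - u1))
        y = k * math.exp(v)
        z = u1 * u1 * u2
        w = b + q * v - y
        if w + d - c * z >= 0.0:     # step 3: quick acceptance test
            return y / theta
        if w >= math.log(z):         # step 4: full acceptance test
            return y / theta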
Example 4.4 Suppose x is gamma distributed with a mean of 0.1 and the variance
= 0.02. Since μ = k/θ and σ² = k/θ², solving for k and θ yields k = 0.5 and
θ = 5. Since k < 1, the Ahrens and Dieter method applies. The computations are below:
1. e = 2.718
   b = 1.184
2. (u1, u2) = (0.71, 0.21), say.
   Since p = b u1 = 0.841 ≤ 1, go to step 3.
3. y = p^(1/k) = 0.707
   e^(−y) = 0.493
   Since u2 ≤ 0.493, x′ = 0.707.
4. Return x = 0.707/5 = 0.141.
Example 4.5 Suppose x is gamma distributed with a mean of 10 and the variance
= 66.6. Since μ = k/θ and σ² = k/θ², solving for k and θ yields k = 1.5 and
θ = 0.15. Since k > 1, Cheng's method applies. The computations are below:
1. a = 0.707
   b = 0.114
   q = 2.914
   c = 4.5
   d = 2.504
2. (u1, u2) = (0.15, 0.74), say
   v = −1.226
   y = 0.440
   z = 0.0167
   w = −3.899
3. Since w + d − cz < 0, go to 4.
4. ln(z) = −4.092
   Since w ≥ ln(z), x′ = 0.440.
5. Return x = 0.440/0.15 = 2.933
Beta
The beta distribution has two parameters (k1, k2) where k1 > 0 and k2 > 0, and
takes on many shapes depending on the values of the parameters. The variable,
denoted as x, lies within two limits, a and b, where (a ≤ x ≤ b).
Standard Beta
Another distribution is introduced and is called the standard beta. This distribution
has the same parameters (k1,k2) as the beta distribution, but the limits are
constrained to the range (0,1). So when x is a beta with a range (a,b), x′ is a
standard beta with a range (0,1). When they both have the same parameters, x and x′
are related as below:
x′ = (x − a)/(b − a)
and
x = a + x′(b − a)
where
B(c, d) = beta function = ∫₀¹ t^(c−1) (1 − t)^(d−1) dt
E(x′) = k1/(k1 + k2)
The mode of the standard beta variable could be 0 or 1 depending on the values
of k1 and k2. However, when k1 > 1 and k2 > 1, the mode lies between 0 and 1; the
standard beta mode is x̃′ = (k1 − 1)/(k1 + k2 − 2), and the beta mode is computed by
x̃ = a + x̃′(b − a)
Below is a list describing the relation between the parameters and the shape of
the distribution.
Parameters and Shape
k1 < 1 and k2 ≥ 1: mode at zero (right skewed)
k1 ≥ 1 and k2 < 1: mode at one (left skewed)
k1 = 1 and k2 > 1: ramp down from zero to one
k1 > 1 and k2 = 1: ramp up from zero to one
k1 < 1 and k2 < 1: bathtub shape
k1 > 1 and k2 > 1 and k1 > k2: mode between zero and one (left skewed)
k1 > 1 and k2 > 1 and k2 > k1: mode between zero and one (right skewed)
k1 > 1 and k2 > 1 and k1 = k2: mode in middle, symmetrical, normal like
k1 = k2 = 1: uniform
To generate a random beta variate x with parameters (k1,k2), and with the range
(a, b) the following routine is run:
1. Generate a random gamma variate, g1, with parameters k1 and θ1 = 1.
   Generate a random gamma variate, g2, with parameters k2 and θ2 = 1.
2. x′ = g1/(g1 + g2)
3. x = a + x′(b − a)
4. Return x
Example 4.6 Suppose x is a beta random variable with parameters (2,4) and has a
range of 10–50. The following steps are followed to show how to generate a random x:
1. A random gamma variate is generated with (k1 = 2, θ1 = 1), say, g1 = 1.71.
   A random gamma variate is generated with (k2 = 4, θ2 = 1), say, g2 = 4.01.
2. The random standard beta variate is x′ = 1.71/(1.71 + 4.01) = 0.299
3. The random beta variate is x = 10 + 0.299(50 − 10) = 21.958
4. Return x = 21.958
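The beta routine can be coded directly once a gamma generator is available; Python's standard library happens to provide one (random.gammavariate(alpha, beta), where beta is a scale parameter), so a sketch under that assumption is:

import random

def beta_variate(k1, k2, a, b):
    # Step 1: two gamma variates with shapes k1 and k2 and a common scale
    g1 = random.gammavariate(k1, 1.0)
    g2 = random.gammavariate(k2, 1.0)
    x_std = g1 / (g1 + g2)       # step 2: standard beta on (0,1)
    return a + x_std * (b - a)   # step 3: rescale to (a,b)

# beta_variate(2, 4, 10, 50) mirrors Example 4.6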
Weibull
The Weibull distribution has two parameters, k1 > 0 and k2 > 0, and the random
variable, denoted as x, ranges from zero and above. The density is
f(x) = k1 k2^(−k1) x^(k1−1) exp[−(x/k2)^k1]   x > 0
Recall Γ denotes the gamma function described earlier in this chapter. When the
parameter k1 ≤ 1, the shape of the density is exponential like. When k1 > 1, the
shape has a mode greater than zero and skews to the right, and at k1 ≈ 3, the density
shape starts looking like a normal distribution.
To generate a random x from the Weibull, the inverse transform method is used.
Setting a random uniform variate u ~ U(0,1) to F(x), and solving for x, yields the
following:
x = k2[−ln(1 − u)]^(1/k1)
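A sketch of the Weibull inverse transform in Python (illustrative names):

import math
import random

def weibull_variate(k1, k2):
    # Inverse transform: x = k2 * (-ln(1 - u))^(1/k1)
    u = random.random()
    return k2 * (-math.log(1.0 - u)) ** (1.0 / k1)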
Normal Distribution
The normal distribution is symmetrical with a bell shaped density. Its mean is
denoted as μ and the standard deviation as σ. This is perhaps the most widely used
probability distribution in business and scientific applications. A companion distribution,
the standard normal distribution, has a mean of zero, a standard deviation of
one, and has the same shape as the normal distribution. The notation for the normal
variable is x ~ N(μ, σ²), and its counterpart, the standard normal, is z ~ N(0,1). The
variable z is related to x by the relation z = (x − μ)/σ. In the same way, x is
obtained from z by x = μ + zσ. When k represents a particular value of z, the
probability density is f(k) = (1/√(2π)) exp(−k²/2). The probability that z is less than k
is denoted as F(k) and is computed by F(k) = ∫_{−∞}^{k} f(z) dz.
There is no closed-form solution for the cumulative distribution F(z). A way to
approximate F(z) has been developed by C. Hastings, Jr. (1955), and is also reported
by M. Abramowitz and I. A. Stegun (1964). Two approximations credited to Hastings
are used here. The first yields F(z) from a value of z; it is applied to |z|, and when
z < 0 the result is reflected by F(z) = 1 − F(|z|) before F(z) is returned.
Another useful approximation also comes from Hastings, and gives a formula that
yields a random z from a value of F(z). The routine is listed below:
1. c0 ¼ 2.515517
c1 ¼ 0.802853
c2 ¼ 0.010328
d1 ¼ 1.432788
d2 ¼ 0.189269
d3 ¼ 0.001308
2. H(z) = 1 − F(z)
   If H(z) ≤ 0.5, H = H(z)
   If H(z) > 0.5, H = 1 − H(z)
3. t = √(ln(1/H²)) where ln = natural logarithm
4. k = t − [c0 + c1t + c2t²]/[1 + d1t + d2t² + d3t³]
5. If H(z) ≤ 0.5, z = k
   If H(z) > 0.5, z = −k
6. Return z
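A sketch of this routine in Python, using the coefficients listed above (the function name is illustrative, and 0 < F(z) < 1 is assumed):

import math

def hastings_z_from_F(F):
    # Hastings rational approximation of z with P(Z <= z) = F, Z ~ N(0,1)
    c0, c1, c2 = 2.515517, 0.802853, 0.010328
    d1, d2, d3 = 1.432788, 0.189269, 0.001308
    H = 1.0 - F
    Hs = H if H <= 0.5 else 1.0 - H        # work with the upper-tail area
    t = math.sqrt(math.log(1.0 / (Hs * Hs)))
    k = t - (c0 + c1 * t + c2 * t * t) / (1.0 + d1 * t + d2 * t * t + d3 * t ** 3)
    return k if H <= 0.5 else -k

# hastings_z_from_F(0.975) gives about 1.96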
The literature reports various ways to generate a random standard normal variate z.
Three of the methods are presented here.
Hastings Method
The first way utilizes the Hastings method that finds a z from F(z), and is based
on the inverse transform method. The routine uses one standard uniform variate,
u ~ U(0,1), as shown below:
1. Generate a random continuous uniform variate u ~ U(0,1).
2. Set F(z) ¼ u.
3. Use Hastings Approximation of z from F(z) to generate a random standard
normal variate z.
4. Return z.
Convolution Method
A second way to generate a random standard normal variate uses twelve random
continuous uniform variates. The routine is listed below:
1. z = −6
2. For i = 1 to 12
3. Generate a random uniform variate u ~ U(0,1)
4. z = z + u
5. Next i
6. Return z
Note in the above routine, E(u) = 0.5 and V(u) = 1/12, and thereby, E(z) = 0
and V(z) = 1. Also, since z is based on the convolution of twelve continuous uniform
u ~ U(0,1) variates, the central limit theorem applies and hence, z ~ N(0,1).
Example 4.7 The routine below shows how to use the convolution method to
generate a random z ~ N(0,1):
1. Set z = −6.0
2. Sum 12 random continuous uniform variates, u ~ U(0,1), say Σu = 7.12
3. z = −6.0 + Σu = 1.12
4. Return z = 1.12
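The convolution method is only a few lines in Python (a sketch; the name is illustrative):

import random

def z_by_convolution():
    # Sum of 12 U(0,1) variates, shifted so that E(z) = 0 and V(z) = 1
    return -6.0 + sum(random.random() for _ in range(12))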
Sine-Cosine Method
A third way generates two random standard normal variates, z1, z2. This method
comes from a paper by Box and Muller (1958). The routine requires two random
continuous uniform variates to generate the two random standard normal variates.
The routine is listed below:
1. Generate two random continuous uniform variates, u1 and u2
2. z1 = √(−2 ln(u1)) cos[2π(u2)]
   z2 = √(−2 ln(u1)) sin[2π(u2)]
3. Return z1 and z2
Example 4.8 Suppose x is normally distributed with mean 40 and standard devia-
tion 10, and a random normal variate of x is needed. Using the Sine-Cosine method,
the steps below follow:
1. Two random continuous uniform u ~ U(0,1) variates are (u1, u2) = (0.37, 0.54), say
2. z1 = √(−2 ln(u1)) cos[2π(u2)] = −1.3658
   z2 = √(−2 ln(u1)) sin[2π(u2)] = −0.3506
3. x = 40 − 1.3658 × 10 = 26.342
4. Return x = 26.342.
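A sketch of the sine-cosine method in Python (the complement 1 − u is used inside the logarithm to avoid log(0); otherwise the steps match the routine above):

import math
import random

def box_muller_pair():
    # Box and Muller (1958): two independent N(0,1) variates
    u1 = 1.0 - random.random()   # in (0,1], so log(u1) is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def normal_variate(mu, sigma):
    z, _ = box_muller_pair()
    return mu + z * sigma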
Lognormal
The lognormal distribution with variable x > 0, reaches a peak greater than zero
and skews far to the right. This variable is related to a counterpart normal variable y,
in the following way.
y ¼ lnðxÞ
where ln is the natural logarithm. In the same way, x is related to y by the relation
below:
x ¼ ey :
The variable y is normally distributed with mean and variance, μy and σy²,
respectively, and x is lognormal with mean and variance, μx and σx². The notation
for x and y is as below:
x ~ LN(μy, σy²)
y ~ N(μy, σy²)
Note, the parameters that define the distribution of x are the mean and variance of y.
The parameters of x and y are related in the following way:
μx = exp(μy + σy²/2)
σx² = exp(2μy + σy²)[exp(σy²) − 1]
μy = ln[μx²/√(μx² + σx²)]
σy² = ln[1 + σx²/μx²]
To generate a random x with parameters μy and σy², the following routine is run:
1. Generate a random standard normal variate, z.
2. A random normal variate becomes: y = μy + zσy.
3. The random lognormal variate is x ¼ ey.
4. Return x.
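A sketch of the lognormal routine in Python, including the conversion from the mean and variance of x to the parameters of y (names illustrative):

import math
import random

def lognormal_params(mu_x, var_x):
    # Convert the mean/variance of x to the mean/variance of y = ln(x)
    sigma_y2 = math.log(1.0 + var_x / mu_x ** 2)
    mu_y = math.log(mu_x ** 2 / math.sqrt(mu_x ** 2 + var_x))
    return mu_y, sigma_y2

def lognormal_variate(mu_y, sigma_y):
    z = random.gauss(0.0, 1.0)           # standard normal variate
    return math.exp(mu_y + z * sigma_y)  # x = e^y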
Example 4.9 Suppose x is lognormal with mean μx = 10 and variance σx² = 400,
and a random lognormal variate of x is needed. The steps below show how to find a
random x:
1. The parameters of y are μy = ln[μx²/√(μx² + σx²)] = ln[100/√500] = 1.498 and
   σy² = ln[1 + σx²/μx²] = ln[1 + 400/100] = 1.609.
   The standard deviation of y is σy = 1.269.
2. A random standard normal variate is generated as z = 1.37, say.
3. The random normal variate becomes y = 1.498 + 1.37 × 1.269 = 3.236.
4. The random lognormal variate is x = e^3.236 = 25.44.
5. Return x = 25.44.
Chi-Square
A chi-square variable, denoted as χ², with k degrees of freedom is the sum of k
squared standard normal variates:
χ² = z1² + ... + zk²
The mean and variance of χ² with k degrees of freedom are:
E(χ²) = k
V(χ²) = 2k
Table values of the chi-square with parameter k and probability P(χ² > χ²_α) = α
are listed in most statistical books, and usually for k ≤ 100.
Approximation Formula
When the parameter k is large (k > 30), thanks to the central limit theorem, the
chi-square probability density is shaped like a normal distribution whereby χ² ~
N(k, 2k). Using this relation, an approximation to the α-percent chi-square value is
shown below:
χ²_α ≈ k + z_α √(2k)
where z_α is a standard normal value with P(z > z_α) = α, and thereby
P(χ² > χ²_α) ≈ α.
Relation to Gamma
It is also noted that the density of χ² has the same shape as the gamma distribution
with parameters θ and k′ when the gamma parameters are set as θ = 1/2 and
k′ = k/2 (equivalently, a scale of 2).
Example 4.11 Suppose a simulation model needs a chi-square random variate with
239 degrees of freedom. To accomplish this, the following routine is run:
1. Generate a random standard normal z ~ N(0,1), say z = 1.34.
2. χ² = int[239 + 1.34 √(2 × 239) + 0.5] = 268
3. Return χ² = 268
Student’s t
E(t) = 0
Fisher's F
Table values, F_{α,u,v}, of F with parameters u and v are listed in most statistical
books, where P[F > F_{α,u,v}] = α. Note the relation below that shows how the lower
tail value of F with degrees of freedom v and u is related to the upper tail value
when the degrees of freedom are u and v:
F_{(1−α),v,u} = 1/F_{α,u,v}
is distributed as an F variable with degrees of freedom n1′ = n1 − 1 and n2′ = n2 − 1.
When σ1² and σ2² are equal, the ratio becomes,
F = s1²/s2²
Note, u = n1′ and v = n2′ is the notation for the degrees of freedom. To generate
a random F with parameters, u and v, the following routine is run:
1. Generate χ1² and χ2² with degrees of freedom u and v, respectively.
2. F = [χ1²/u]/[χ2²/v]
3. Return F
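Generating each χ² as a sum of squared standard normal variates, the F routine sketches out in Python as below (names illustrative; practical for moderate degrees of freedom):

import random

def chi_square_variate(k):
    # Sum of k squared standard normal variates
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

def f_variate(u, v):
    # Ratio of two independent chi-squares, each divided by its d.f.
    return (chi_square_variate(u) / u) / (chi_square_variate(v) / v)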
Example 4.13 Suppose a random variate of F with degrees of freedom 4 and 6 is
needed in a simulation run. A random χ1² with 4 degrees of freedom and a random
χ2² with 6 degrees of freedom are generated, and the random F is derived as
F = [χ1²/4]/[χ2²/6].
Summary
This chapter shows how to transform the continuous uniform random variates,
u ~ U(0,1), to random variates for a variable that comes from one of the common
continuous probability distributions. The probability distributions described here
are the following: the continuous uniform, exponential, Erlang, gamma, beta,
Weibull, normal, lognormal, chi-square, Student's t, and Fisher's F. The chapter
also shows how to use the (Hastings) approximation formulas for the standard
normal distribution.
Chapter 5
Generating Discrete Random Variates
Introduction
A discrete random variable includes a specified list of exact values where each is
assigned a probability of occurring by chance. The variable can take on a particular
set of discrete events, like tossing a coin (head or tail), or rolling a die (1,2,3,4,5,6).
This chapter considers the more common discrete probability distributions and
shows how to generate random variates for each. The probability distributions
described here are the following: discrete arbitrary, discrete uniform, Bernoulli,
binomial, hypergeometric, geometric, Pascal and Poisson.
Discrete Arbitrary
where,
E(x²) = Σ_i xi² P(xi)
F(xi) = P(x ≤ xi)
x P(x) F(x)
0 0.4 0.4
1 0.3 0.7
2 0.2 0.9
3 0.1 1.0
Discrete Uniform
A variable x follows the discrete uniform distribution with parameter (a,b) when x
takes on all integers from a to b with equal probabilities. The probability of x
becomes,
P(x) = 1/(b − a + 1)   x = a to b
F(x) = (x − a + 1)/(b − a + 1)   x = a to b
E(x) = (a + b)/2
V(x) = [(b − a + 1)² − 1]/12
Bernoulli
P(x = 0) = 1 − p
P(x = 1) = p
E(x) = p
V(x) = p(1 − p)
To generate a random Bernoulli variate x, the following three steps are taken:
1. Generate a random uniform u ~ U(0,1).
2. If u < p, x = 1; else, x = 0.
3. Return x.
Binomial
E(x) = np
V(x) = np(1 − p)
F(xo) = P(x ≤ xo).
When n is Small
Normal Approximation
When n is large and if p ≤ 0.5 with np > 5, or if p > 0.5 with n(1 − p) > 5, then x
can be approximated with the normal distribution, whereby x ~ N[np, np(1 − p)].
The routine listed below will generate a random x:
1. Generate a random standard normal variate, z ~ N(0,1).
2. x = integer[np + z√(np(1 − p)) + 0.5].
3. Return x.
Poisson Approximation
When n is large and p is small, and the above normal approximation does not apply,
x is approximated by the Poisson distribution that is described subsequently in this
chapter:
1. The expected value of the Poisson variable is denoted here as θ, where E(x) = θ = np.
2. Generate a random Poisson variate x with parameter y.
3. Return x.
Example 5.4 Suppose x is binomial distributed with n = 5 trials and p = 0.375.
To generate a random binomial x, five continuous uniform variates, u ~ U(0,1), are
needed as shown below:
1. Suppose the five uniform variates are the following: 0.286, 0.949, 0.710, 0.633,
and 0.325.
2. Since two of the variates are below p = 0.375, x = 2.
3. Return x = 2.
Example 5.5 Assume 100 independent Bernoulli trials are run where the probability
of a success per trial is p = 0.40. Of interest is to generate a random binomial
variate x. Since n = 100, p = 0.40 and np = 40, the normal approximation to
the binomial can be used. The mean and variance of the normal are μ = np = 40
and σ² = np(1 − p) = 24, respectively. Hence σ = 4.89. The following routine
generates the random x:
1. Generate a random standard normal variate z, say z = 0.87.
2. x = integer[μ + zσ + 0.5] = 44.
3. Return x = 44.
Example 5.6 Suppose a random binomial variate x is needed from a sample of
n = 1,000 trials with p = 0.001. With np = 1.0, the normal distribution does not
apply, but the Poisson distribution is applicable with θ = 1.00. Later in this chapter,
the way to generate a random x from the Poisson distribution is shown:
1. Generate a random Poisson variate with parameter θ = 1.00, say x = 2.
2. Return x = 2.
Hypergeometric
E(x) = nD/N
To generate a random hypergeometric variate, the following routine is run. The
parameter notations are N = population size, D = number of defects in the population,
and n = number of samples without replacement:
1. Set N1 = N, D1 = D and x = 0.
2. For i = 1 to n
   p = D1/N1
   Generate a random continuous uniform u ~ U(0,1)
   N1 = N1 − 1
   If u < p, x = x + 1 and D1 = D1 − 1
   Next i.
3. Return x.
Example 5.7 Suppose a situation where a lot of ten units contains two defectives
and a sample of size four is taken without replacement. The goal is to generate a
random hypergeometric variate on the number of defective units, x, observed in the
sample. Note, N = 10, D = 2 and n = 4. The steps below show how a random x is
generated:
1. Set N1 = N = 10, D1 = D = 2 and n = 4.
2. Start loop.
3. At i = 1, p = D1/N1 = 0.200, u = 0.37 say. Since u ≥ p, set N1 = 9.
4. At i = 2, p = D1/N1 = 0.222, u = 0.51 say. Since u ≥ p, set N1 = 8.
5. At i = 3, p = D1/N1 = 0.250, u = 0.14 say. Since u < p, set N1 = 7, x = 1,
D1 = 1.
6. At i = 4, p = D1/N1 = 0.143, u = 0.84 say. Since u ≥ p, set N1 = 6.
7. End loop.
8. Return x = 1.
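A sketch of the hypergeometric routine in Python (illustrative names):

import random

def hypergeometric_variate(N, D, n):
    # Sample n items without replacement and count the defectives drawn
    N1, D1, x = N, D, 0
    for _ in range(n):
        if random.random() < D1 / N1:
            x += 1
            D1 -= 1
        N1 -= 1
    return x

# hypergeometric_variate(10, 2, 4) mirrors Example 5.7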
Geometric
F(x) = 1 − (1 − p)^x   x = 1, 2, ...
The expected value and the corresponding variance of x are listed below:
E(x) = 1/p
V(x) = (1 − p)/p²
Pascal
A variable x follows the Pascal distribution when x represents the number of trials
needed to gain k successes when the probability of a success is p. This distribution is
also called the negative binomial distribution. The probability of x is listed below:
P(x) = C(x − 1, k − 1) p^k (1 − p)^(x−k)   x = k, k + 1, ...
where C(x − 1, k − 1) is the binomial coefficient. The cumulative distribution is
F(x) = Σ_{y=k}^{x} P(y)   x = k, k + 1, ...
E(x) = k/p
Poisson
E(x) = θ
V(x) = θ
The Poisson and the exponential distributions are related since the time, t, between
events from a Poisson variable is distributed as exponential with E(t) = 1/θ. This
relation is used to randomly generate a value of x.
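The relation suggests a simple generator: accumulate exponential inter-event times with mean 1/θ and count how many events fall within one unit of time. A Python sketch (names illustrative):

import math
import random

def poisson_variate(theta):
    # Count exponential inter-event times (mean 1/theta) within one time unit
    x, t = 0, 0.0
    while True:
        t += -math.log(1.0 - random.random()) / theta
        if t > 1.0:
            return x
        x += 1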
Example 5.11 A gas station is open 24 h a day where 200–300 vehicles arrive for
gas each day, uniformly distributed over that range. Eighty percent of the vehicles
are cars, 15 % are trucks and 5 % are motorcycles. The consumption of gas per
vehicle is a truncated exponential with a minimum and average known by vehicle
type. Cars consume on average 11 gal with a minimum of 3 gal. Trucks consume a
minimum of 8 gal and an average of 20 gal. The motorcycles consume a minimum
of 2 gal and an average of 4 gal. The analyst wants to determine the distribution of
the total consumption of gas for a day.
A simulation model is developed and run for 1,000 days. On the first day, 228
vehicles enter the station and for each of the 228 vehicles, the type of vehicle and
the amount of gas consumed is randomly generated. The sum of gas consumed in
the first day is G = 2,596 gal. The simulation is carried on for 1,000 days and the
amount of gas consumed is recorded for each day. Now with 1,000 values of G, the
gas consumed per day, the next step is to sort the values of G from low to high to
yield G(1) ≤ G(2) ≤ ... ≤ G(1000). The p-quantile is estimated by G(p × 1000).
For example, the 0.01 quantile is estimated using 0.01 × 1000 = 10, where
G(10) = 2,202, that is, the tenth smallest value of G. The table below lists various
p-quantiles of the daily gas consumption.
p G(p)
0.01 2,202
0.05 2,336
0.10 2,436
0.20 2,546
0.30 2,658
0.40 2,768
0.50 2,883
0.60 3,014
0.70 3,138
0.80 3,261
0.90 3,396
0.95 3,503
0.99 3,656
The results show that there is a 5 % chance that G will exceed 3,503 gal and a
1 % chance that G will exceed 3,656 gal. Further, an estimate of the 90 % prediction
interval on G becomes P(2,336 ≤ G ≤ 3,503) = 0.90.
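The quantile estimates above amount to sorting the run totals and indexing into the sorted list; a Python sketch (variable names illustrative):

def quantiles_from_runs(G, probabilities):
    # G: the n simulated daily totals; the p-quantile estimate is G(p*n)
    G_sorted = sorted(G)
    n = len(G_sorted)
    return {p: G_sorted[max(0, int(p * n) - 1)] for p in probabilities}

# e.g., with 1,000 values of G, p = 0.01 picks the 10th smallest value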
Example 5.12 An always-open truck dealership has ten bays to service trucks for
maintenance and service. The arrival of trucks is Poisson distributed with a rate of
6.67 vehicles per day, and the service completions are also Poisson with a rate of
1.33 vehicles per day. The vehicles require n parts in the service operation and the
probability of n, denoted as P(n), is as follows:
n 3 4 5 6 7 8
P(n) 0.11 0.17 0.22 0.28 0.18 0.04
Four types of parts are described depending on the source of the supplier: PDC
(parts distribution center), OEM (original equipment manufacturer), DSH (direct
ship supplier), NST (non-stocked part). The table below lists the probability the
vehicle needs one of these types of parts, P(type); the service level for each type of
part, SL, where SL = P(part is available in dealer); and the lead time to obtain the
part in days (LT). Note, the dealer is limited on space and budget and must set his
service levels accordingly. The higher the service level, the more inventory in
pieces and in investment.
A simulation model is developed and is run until 5,000 vehicles are processed in
the dealership. The first 500 vehicles are used as the transient stage, whereby the
equilibrium stage is for the final 4,500 vehicles. This is where all the measurements
are tallied. The table below shows some of the statistics gathered from the final
4,500 vehicles.
Bay averages per vehicle:
Vehicle averages:
The results show that the average bay is empty for 0.19 days for each vehicle it
processes. Further, the average service time is 0.75 days and the average idle time
per vehicle waiting to receive the out-of-stock part(s) is 0.56 days. The average wait
time a vehicle is in the yard prior to service is 0.09 days and the average time in the
dealership is 1.40 days. Note also that the average time between vehicles for a
bay is 1.50 days.
Summary
This chapter shows how to generate random variates from the more common
discrete probability distributions: discrete arbitrary, discrete uniform, Bernoulli,
binomial, hypergeometric, geometric, Pascal and Poisson.
Chapter 6
Generating Multivariate Random Variates
Introduction
When two or more random variables are jointly related in a probability way, they
are labeled as multivariate random variables. The probability of the variables
occurring together is defined by a joint probability distribution. In most situations,
all of the variables included in the distribution are continuous or all are discrete;
less often, they are a mixture of continuous and discrete. This chapter
considers some of the more popular multivariate distributions and shows how to
generate random variates for each. The probability distributions described here are
the following: multivariate discrete arbitrary, multinomial, multivariate hypergeometric,
bivariate normal, bivariate lognormal, multivariate normal and multivariate
lognormal. The Cholesky decomposition method is also described since it is needed
to generate random variates from the multivariate normal and the multivariate
lognormal distributions.
Multivariate Discrete Arbitrary
Suppose k discrete random variables (x1, ..., xk) are jointly related by the probability
distribution P(x1, ..., xk). The sum of the probabilities over all possible values of the
k variables is one, i.e.,
Σ_{x1...xk} P(x1, ..., xk) = 1.0
Consider one of the k variables, say xj. The marginal probability of xj, denoted as
P(xj...), is obtained by summing the joint probability distribution over all xi except
xj, as shown below,
P(xj...) = Σ_{all x but xj} P(x1, x2, ..., xk)
where
E(xj²...) = Σ_{xj} xj² P(xj...)
The steps below show how to generate a random set of variates for the variables
x1, x2, ..., xk.
1. Get P(x1...), the marginal probability of x1, and also F(x1...), the corresponding
cumulative distribution.
Generate a random continuous uniform variate u ~ U(0,1).
Locate the smallest value of x1 where u < F(x1...), say x1′.
2. Get P(x2|x1′...), the marginal probability of x2 given x1′, and also F(x2|x1′...),
the corresponding cumulative distribution.
Generate a random continuous uniform variate u ~ U(0,1).
Locate the smallest value of x2 where u < F(x2|x1′...), say x2′.
3. Get P(x3|x1′x2′...), the marginal probability of x3 given x1′ and x2′, and also
F(x3|x1′x2′...), the corresponding cumulative distribution.
Generate a random continuous uniform variate u ~ U(0,1).
Locate the smallest value of x3 where u < F(x3|x1′x2′...), say x3′.
4. Repeat in the same way until xk′ is obtained.
5. Return (x1′, ..., xk′).
Example 6.1 Suppose a three variable joint probability distribution with variables
x1, x2, x3, where the possible values for x1 are 0, 1, 2; for x2 they are 0, 1; and for
x3 they are 1, 2, 3. The probability distribution P(x1,x2,x3) is listed below. Note the
sum of all probabilities is one. Below shows how to generate one set of random variates.
P(x1, x2, x3)
             x2 = 0                    x2 = 1
x1      x3 = 1  x3 = 2  x3 = 3    x3 = 1  x3 = 2  x3 = 3
0        0.12    0.10    0.08      0.08    0.06    0.05
1        0.08    0.06    0.04      0.05    0.04    0.03
2        0.06    0.04    0.02      0.04    0.03    0.02
1. The marginal distribution for x1 and the associated cumulative distribution is below:
P(0..) = 0.49   F(0..) = 0.49
P(1..) = 0.30   F(1..) = 0.79
P(2..) = 0.21   F(2..) = 1.00
Generate a random u ~ U(0,1), u = 0.37 say. Hence x1′ = 0.
2. The marginal distribution for x2 given x1′ = 0, and the associated cumulative
distribution is below:
P(0|0.) = 0.30/0.49 = 0.612   F(0|0.) = 0.612
P(1|0.) = 0.19/0.49 = 0.388   F(1|0.) = 1.000
Generate a random u ~ U(0,1), u = 0.65 say. Hence x2′ = 1.
3. The marginal distribution for x3 given x1′ = 0 and x2′ = 1, and the associated
cumulative distribution is below.
P(1|01) = 0.08/0.19 = 0.421   F(1|01) = 0.421
P(2|01) = 0.06/0.19 = 0.316   F(2|01) = 0.737
P(3|01) = 0.05/0.19 = 0.263   F(3|01) = 1.000
Generate a random u ~ U(0,1), u = 0.84 say. Hence x3′ = 3.
4. Return (x1′, x2′, x3′) = (0, 1, 3)
Multinomial
E(xi) = npi
V(xi) = npi(1 − pi)
The steps below show how to randomly generate a set of multinomial variates
(x1, ..., xk) from n trials with probabilities p1, ..., pk.
1. For i = 1 to k
   pi′ = pi / Σ_{j=i}^{k} pj
   ni′ = n − Σ_{j=1}^{i−1} xj
   Generate a random binomial variate, xi, with parameters ni′ and pi′.
   Next i
2. Return x1, . . ., xk
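A sketch of the routine in Python, using a simple Bernoulli-count binomial generator (names illustrative; the min() guards against floating point drift in the conditional probabilities):

import random

def binomial_variate(n, p):
    # Count successes in n Bernoulli trials
    return sum(1 for _ in range(n) if random.random() < p)

def multinomial_variates(n, probs):
    # Sequential conditional binomials, as in the routine above
    xs = []
    n_left = n
    p_left = sum(probs)
    for p in probs:
        p_cond = min(1.0, p / p_left) if p_left > 0 else 0.0
        x = binomial_variate(n_left, p_cond)
        xs.append(x)
        n_left -= x
        p_left -= p
    return xs

# multinomial_variates(5, [0.5, 0.3, 0.2]) mirrors Example 6.2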
Example 6.2 Suppose an experiment with three possible outcomes with
probabilities 0.5, 0.3 and 0.2, respectively, where five trials of the experiment are
run. Of need is to randomly generate the multinomial variate set (x1, x2, x3) for this
situation. The steps below show how this is done.
1. At i = 1, with parameters n1′ = 5, p1′ = 0.5, generate a random binomial, say
x1 = 2.
2. At i = 2, with parameters n2′ = 3, p2′ = 0.6, generate a random binomial, say
x2 = 2.
3. At i = 3, with parameters n3′ = 1, p3′ = 1.0, generate a random binomial, say
x3 = 1.
4. Return (x1, x2, x3) = (2, 2, 1).
Multivariate Hypergeometric
Suppose a population of N items where some are non-defective and the remainder
are defective, falling into k defective categories with numbers of defectives
D1, ..., Dk. A sample of n items is taken without replacement and the outcomes are
x1, ..., xk defective items. Note, xi = number of defective items of the ith category
in the sample. The random variables follow the multivariate hypergeometric distribution.
The probability distribution is listed below,
P(x1, ..., xk) = [C(N − SD, n − Sx) C(D1, x1) ... C(Dk, xk)] / C(N, n)
where C(a, b) denotes the number of combinations of b items from a, and
SD = Σ_{i=1}^{k} Di = sum of defective items in the population
Sx = Σ_{i=1}^{k} xi = sum of defective items in the sample
n − Sx = sum of non-defective items in the sample
To generate a random set of output variates for the multivariate hypergeometric
distribution, the following routine is run. Recall the notation of N, n, D1, ..., Dk and
x1, ..., xk.
1. Initialize D1i = Di and xi = 0 for i = 1 to k, and N1 = N.
2. For j = 1 to n
   Generate a random uniform u ~ U(0,1).
   F = 0
   For i = 1 to k
      p = D1i/N1
      F = F + p
      If u < F, xi = xi + 1, D1i = D1i − 1, go to 3
   Next i
3. N1 = N1 − 1
   Next j
4. Return (x1, ..., xk).
Example 6.3 Suppose a lot of size 20 comes in to a receiving dock with three
types of defectives. There are 4 defectives of type 1, 3 defectives of type 2, and
2 defectives of type 3. Eleven of the items have no defects. A sample of size
four is taken without replacement and of need is to generate a random set of
output variates, x1, x2, x3. The method to obtain the variates is shown below.
For simplicity, the fractions are carried out only to two decimal places.
Bivariate Normal
Consider variables x1 and x2 that are jointly related via the bivariate normal
distribution (BVN) as below:
f(x1, x2) = 1/[2πσ1σ2√(1 − ρ²)] exp{−{[(x1 − μ1)/σ1]² + [(x2 − μ2)/σ2]²
− 2ρ[(x1 − μ1)(x2 − μ2)/(σ1σ2)]}/[2(1 − ρ²)]}
where (μ1, μ2, σ1, σ2, ρ) are the five parameters of the distribution.
Marginal Distributions
x1 ~ N(μ1, σ1²)
and
x2 ~ N(μ2, σ2²)
E(x1) = μ1
V(x1) = σ1²
respectively.
The corresponding values for x2 are
E(x2) = μ2
V(x2) = σ2²
ρ = σ12/(σ1σ2)
where σ12 is the covariance between x1 and x2. The covariance is also denoted as
C(x1, x2) and is obtained from,
C(x1, x2) = E(x1x2) − E(x1)E(x2)
Conditional Distributions
σ²_{x2|x1} = σ2²(1 − ρ²)
σ²_{x1|x2} = σ1²(1 − ρ²)
correlations ρ = −0.5, 0.0 and 0.5. Recall, F(x1′, x2′) = P(x1 ≤ x1′ and
x2 ≤ x2′). To comply, a simulation model is developed to find the probabilities
needed. For a given x1′ and x2′, n trials are run where the values of x1 and x2 are
randomly generated to conform with the stated parameters. The program is run with
n trials where g represents the number of times in the n trials the generated values
were both less than or equal to x1′ and x2′. The estimate of the probability is
computed by F(x1′, x2′) = g/n.
The table below lists the simulation findings where various numbers of trials are
run at n = 50, 100, 500 and 1,000. Note, at ρ = 0, the true value of the cumulative
distribution is known at x1′ and x2′ and is listed in the table at n = ∞. The results
point to the need for more trials to sharpen the probability estimates.
Bivariate Lognormal
When the pair x1, x2 are bivariate lognormal (BVLN), the distribution is noted as
BVLN(μy1, μy2, σy1, σy2, ρy), where μy1 and μy2 are the means of y1 = ln(x1)
and y2 = ln(x2). Also σy1 and σy2 are the corresponding standard deviations of y1
and y2, and ρy is the correlation between y1 and y2. The transformed pair (y1, y2)
are distributed by the bivariate normal distribution and the notation is BVN(μy1, μy2,
σy1, σy2, ρy). Note x1 = e^{y1} and x2 = e^{y2}.
4. Now find the conditional mean and standard deviation of y2, given y1′, from
μ_{y2|y1′} = μy2 + ρy(σy2/σy1)(y1′ − μy1), and σ²_{y2|y1} = σy2²(1 − ρy²).
5. Get a random y2 by y2′ = μ_{y2|y1′} + z2 σ_{y2|y1}.
6. Now, x1 = e^{y1′} and x2 = e^{y2′}.
7. Return x1, x2.
Example 6.6 Assume x1, x2 are a pair of bivariate lognormal variables with
parameters BVLN(5, 8, 1, 2, 0.5), and a set of random variates is needed. The
following steps are followed:
1. From z ~ N(0,1), get the pair of random variates, say: z1 = 0.72 and
z2 = −1.08.
2. A random y1 becomes y1′ = 5 + 0.72(1) = 5.72
3. The mean and standard deviation of y2 given y1′ = 5.72 are μ_{y2|5.72} = 8.72 and
σ_{y2|5.72} = 1.732.
4. So now, the random y2 becomes y2′ = 8.72 − 1.08(1.732) = 6.85.
5. Finally, the pair needed are x1 = e^5.72 = 304.90 and x2 = e^6.85 = 943.88.
6. Return (x1, x2) = (304.90, 943.88).
Multivariate Normal
When k variables, x1, ..., xk, are related by the multivariate normal distribution, the
parameters are μ and Σ. The parameter μ is a k-dimensional vector whose transpose
μᵀ = [μ1, ..., μk] houses the mean of each of the variables, and Σ is a k-by-k
matrix that contains σij in row i and column j, where σij is the covariance between
variables i and j. Note, the covariances along the main diagonal, σii, are the
variances of variable i, i = 1 to k. Thus σi² = σii, i = 1 to k.
Cholesky Decomposition
Σ = CCᵀ
and C is a k-by-k matrix where the upper diagonal is all zeros and the diagonal and
lower diagonal contain the elements cij. For a full discussion, see Gentle (1998).
The values of the elements of matrix C are computed in the three stages listed
below.
Column 1 elements: c_{i1} = σ_{i1}/√σ_{11}   i = 1, ..., k
Main diagonal elements: c_{ii} = √(σ_{ii} − Σ_{m=1}^{i−1} c_{im}²)   i = 2, ..., k
Lower diagonal elements: c_{ij} = (σ_{ij} − Σ_{m=1}^{j−1} c_{im} c_{jm})/c_{jj}   1 < j < i ≤ k
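A sketch of the decomposition and of the transformation X = CZ + μ in Python (illustrative; Σ is assumed symmetric and positive definite, supplied as a nested list):

import math

def cholesky_lower(S):
    # Lower-triangular C with S = C C^T
    k = len(S)
    C = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(C[i][m] * C[j][m] for m in range(j))
            if i == j:
                C[i][i] = math.sqrt(S[i][i] - s)   # diagonal elements
            else:
                C[i][j] = (S[i][j] - s) / C[j][j]  # lower diagonal elements
    return C

def mvn_variates(mu, S, zs):
    # X = C Z + mu, where zs is a vector of standard normal variates
    C = cholesky_lower(S)
    k = len(mu)
    return [mu[i] + sum(C[i][m] * zs[m] for m in range(k)) for i in range(k)]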
When the three entries of the standard normal variates are z1 = 0.02,
z2 = 2.00, z3 = 1.20, and the matrix manipulation of X = CZ + μ is applied,
the random normal variates become:
X = [x1, x2, x3]ᵀ = [99.84, 88.16, 153.20]ᵀ
Multivariate Lognormal
When k variables, x1, ..., xk, are related by the multivariate lognormal distribution,
the associated multivariate normal variables are y1, ..., yk, where yi = ln(xi),
i = 1, ..., k. The parameters for this distribution are derived from the k variables
y1, ..., yk. The parameters are listed in the k-dimensional vector μ, whose transpose
is μᵀ = [μ1, ..., μk], that houses the mean of each of the yi variables, and the k-by-k
matrix Σ, that contains σij in row i and column j, where σij is the covariance between
variables yi and yj. Note, the covariances along the main diagonal, σii, are the
variances of variable yi, i = 1 to k. Thus σi² = σii, i = 1 to k.
Cholesky Decomposition
Σ = CCᵀ
and C is a k-by-k matrix where the upper diagonal is all zeros and the diagonal and
lower diagonal contain the elements cij. As shown earlier, the values of the elements
of matrix C are computed in the three stages listed below.
Column 1 elements: c_{i1} = σ_{i1}/√σ_{11}   i = 1, ..., k
Main diagonal elements: c_{ii} = √(σ_{ii} − Σ_{m=1}^{i−1} c_{im}²)   i = 2, ..., k
Lower diagonal elements: c_{ij} = (σ_{ij} − Σ_{m=1}^{j−1} c_{im} c_{jm})/c_{jj}   1 < j < i ≤ k
When the three entries of the standard normal variates are z1 = 0.72, z2 = 1.00,
z3 = 1.04, and the matrix manipulation of Y = CZ + μy is applied, the random
normal variates become:
Y = [y1, y2, y3]ᵀ = [0.76, 8.13, −4.11]ᵀ
Finally, convert y1, y2, y3 to x1, x2, x3 by xi = e^{yi} for i = 1, 2, 3. The
multivariate lognormal variates are below.
X = [x1, x2, x3]ᵀ = [1.82, 3394.80, 0.02]ᵀ
Summary
This chapter considers some of the more popular multivariate distributions and
shows how to generate random variates for each. The probability distributions
described are the following: multivariate discrete arbitrary, multinomial, multivariate
hypergeometric, bivariate normal, bivariate lognormal, multivariate normal and
multivariate lognormal. The Cholesky decomposition method is also presented
because of its important role in generating random variates from the multivariate
normal and multivariate lognormal distributions.
Chapter 7
Special Applications
Introduction
This chapter shows how to generate random variates for applications that are not
directly bound by a probability distribution as was described in some of the earlier
chapters. The applications are instructively useful and often are needed as such in
simulation models. They are the following: Poisson process, constant Poisson
process, batch arrivals, active redundancy, standby redundancy, random integers
without replacement and poker.
Poisson Process
There are many simulation models where a series of events take place over a fixed
time horizon, say, from t = 0 to T. In this section, the arrivals are from a Poisson
process, whereby the time between events is distributed via the exponential
density. Further, when the expected time between arrivals changes over the time
horizon, additional information is needed concerning various points of time in the
interval, B(j), and the associated expected time between arrivals, A(j). At t = 0, the
average time between arrivals is A(1), and the point in time is B(1) = 0. An interval
of time later, say at B(2), the average time between arrivals is A(2), and so forth. In
this way, A(j) and B(j) jointly identify how the average time between arrivals varies
from t = 0 to T. At t = T, the last entry of j occurs and is denoted as j = J, where
B(J) = T and A(J) gives the associated average time between arrivals at t = T.
For any other time from 0 to T, interpolation is used to determine the average time
at t, as is described in the routine below:
1. Parameters and initial values: T = length of time horizon, J = number of points in
time from 0 to T, B(j) = the j-th point in time, and A(j) = average time between
arrivals at time B(j), j = 1 to J; t(0) = 0 is the starting point, and n = 0 is an index.
2. Using t(n) and {B(j), j = 1 to J}, find the jo where B(jo) ≤ t(n) < B(jo + 1).
3. The average time between arrivals becomes: A = A(jo) + {[t(n) − B(jo)]/
[B(jo + 1) − B(jo)]}[A(jo + 1) − A(jo)].
4. Get a random uniform u ~ U(0,1).
5. Use A and u and the exponential density to generate the random time between
the arrivals, x = −A ln(1 − u).
6. If [t(n) + x] ≤ T, set n = n + 1 and t(n) = t(n − 1) + x, and go to 2.
7. If [t(n) + x] > T, end, go to 8.
8. Return {t(i), i = 1 to n}.
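A sketch of this routine in Python (names illustrative; B and A are lists with B[0] = 0 and B[-1] = T):

import math
import random

def poisson_process_arrivals(T, B, A):
    # Arrival times on (0, T] when the mean time between arrivals varies
    def mean_at(t):
        # Linear interpolation of A between the surrounding B points
        for j in range(len(B) - 1):
            if B[j] <= t < B[j + 1]:
                frac = (t - B[j]) / (B[j + 1] - B[j])
                return A[j] + frac * (A[j + 1] - A[j])
        return A[-1]
    times, t = [], 0.0
    while True:
        x = -mean_at(t) * math.log(1.0 - random.random())
        if t + x > T:
            return times
        t += x
        times.append(t)

# poisson_process_arrivals(24, [0, 8, 14, 18, 24], [5, 4, 2, 3, 5]) mirrors Example 7.1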
Example 7.1 Suppose T = 24 and J = 5, {B(j), j = 1 to 5} = {0, 8, 14, 18, 24},
{A(j), j = 1 to 5} = {5, 4, 2, 3, 5}, and the times between arrivals are from a
Poisson process, with the exponential density. The routine below shows how to find
the time of arrival for the first three units:
1. n = 0.
At t(0) = 0, jo = 1, since B(1) ≤ t(0) < B(2).
The average time is: A = 5 + [(0 − 0)/(8 − 0)](4 − 5) = 5.0.
Get u ~ U(0,1), assume u = 0.72.
The random time is: x = −5.0 ln(1 − 0.72) = 6.36.
n = n + 1 = 1, t(1) = 0 + 6.36 = 6.36.
2. n = 1.
At t(1) = 6.36, jo = 1, since B(1) ≤ t(1) < B(2).
The average time is: A = 5 + [(6.36 − 0)/(8 − 0)](4 − 5) = 4.205.
Get u ~ U(0,1), assume u = 0.38.
The random time is: x = −4.205 ln(1 − 0.38) = 2.01.
n = n + 1 = 2, t(2) = 6.36 + 2.01 = 8.37.
3. n = 2.
At t(2) = 8.37, jo = 2, since B(2) ≤ t(2) < B(3).
The average time is: A = 4 + [(8.37 − 8)/(14 − 8)](2 − 4) = 3.877.
Get u ~ U(0,1), assume u = 0.17.
The random time is: x = −3.877 ln(1 − 0.17) = 0.72.
n = n + 1 = 3, t(3) = 8.37 + 0.72 = 9.09.
4. The first three arrivals occur at times: 6.36, 8.37, 9.09.
In the event the expected inter-arrival time is the same for the whole time horizon,
then J = 2 points are in the time horizon, B(1) = 0 is the start time, B(2) = T is the
end time, and A(1) = A(2) is the constant average time between arrivals.
Batch Arrivals
Consider a simulation model where units arrive to a system in batch sizes of one or
more. The model generates the random time of arrival and the associated batch size.
One way to describe the batch size distribution is by the modified Poisson
distribution. Since each individual batch size, x, is one or larger, the expected value
of x is E(x) ≥ 1.0. The modified Poisson becomes x = y + 1, where y is a Poisson
variable with mean m = E(x) − 1. So, to generate a random x, the following routine
is run:
1. For a Poisson variable y with parameter m, generate a random Poisson y.
2. Set x = y + 1.
3. Return x.
Example 7.2 Suppose the average batch size for a simulation run is 1.6, and a
random variate of the batch size, x, is needed. The following routine is run:
1. From the Poisson distribution with parameter m = E(x) − 1 = 0.6, generate a
random y. Assume the random Poisson y = 0.
2. Hence, x = y + 1 = 0 + 1 = 1.
3. Return x ¼ 1.
Active Redundancy
Example 7.3 Consider a unit that has three active redundant components with run
times following the exponential density and each with an expected run time of 10 h.
The routine below generates one random variate of the unit run time:
1. At i = 1, generate a random exponential with mean 10, say, y1 = 7.4.
2. At i = 2, generate a random exponential with mean 10, say, y2 = 15.1.
3. At i = 3, generate a random exponential with mean 10, say, y3 = 4.2.
4. The run time for the unit is x = max(7.4, 15.1, 4.2) = 15.1.
5. Return x = 15.1.
Example 7.4 A component is in the design stage and will include m identical
subcomponents in an active redundancy manner. All m subcomponents start and
run together. The component run time ends when the last subcomponent fails. The
time to fail, t, for each subcomponent follows a gamma distribution with parameters
k = 3 and 1/θ = 6.0 (1,000 h), whereby the time to fail, t, has a mean of
E(t) = 18 (1,000 h). The design engineer wants to know the minimum number
of the subcomponents to include in the active redundancy package so that the time
to fail for the component, T, has a reliability of R equal to 0.99 or greater at
20 (1,000 h). Note T = max(t1, ..., tm) and the goal is to have
R = P[T ≥ 20 (1,000 h)] ≥ 0.99.
A simulation model is developed to find the minimum number of
subcomponents to achieve the reliability specified. In the table, m denotes the
number of subcomponents, n is the number of trials in a run, where in each trial
T is computed from m random variates of the gamma distribution with the stated
parameters, g is the number of the trials in the run where T ≥ 20 (1,000 h), and
R = g/n is an estimate of the probability that T will be 20 (1,000 h) or greater. The
simulation results are listed below where the number of components (m) increases
by one with each simulation run of n = 1,000 trials. Note, at m = 11, R = 0.993
and at m = 12, R = 0.998. Hence, the minimum value of m is 11 and the reliability
is estimated as R = 0.993.
m n g R
1 1,000 318 0.318
2 1,000 588 0.588
3 1,000 707 0.707
4 1,000 802 0.802
5 1,000 883 0.883
6 1,000 927 0.927
7 1,000 961 0.961
8 1,000 975 0.975
9 1,000 973 0.973
10 1,000 980 0.980
11 1,000 993 0.993
12 1,000 998 0.998
Standby Redundancy
A unit with standby redundancy is defined when the run time of the unit is the sum
of the run times of m components that are run one after the other. That is, when one
component fails, another starts running. This model assumes the run time, y, of each
component is based on the exponential distribution with parameter θ. The run times
for the m components are (y1, ..., ym), and the run time for the unit becomes
x = (y1 + ... + ym).
As in Example 7.4, in each trial T is
computed from m random variates of the gamma distribution with the stated
parameters, g is the number of the trials in the run where T ≥ 20 (1,000 h), and
R = g/n is an estimate of the probability that T will be 20 (1,000 h) or greater.
The simulation results are listed below where the number of components (m)
increases by one with each simulation run of n = 1,000 trials. Note, at m = 3,
R = 0.993 and at m = 4, R = 0.999. Hence, the minimum value of m is three
and the reliability is estimated as R = 0.993.
m n g R
1 1,000 348 0.348
2 1,000 871 0.871
3 1,000 993 0.993
4 1,000 999 0.999
The routine below shows how to generate a random sequence of n samples without
replacement from N unique items in a population:
1. The parameters are N = population size, n = sample size without replacement,
{D(i), i = 1 to N} identifies the N unique items, and {E(j), j = 1 to n}
denotes the n random items in sequence.
2. ND = N = number of unique items remaining.
3. For j = 1 to n.
4. Generate a random discrete uniform integer, k, from (1, ND).
Set E(j) = D(k).
5. ND = ND − 1.
6. For m = k to ND.
7. D(m) = D(m + 1).
8. Next m.
9. Next j.
10. Return [E(1), ..., E(n)].
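In a language with dynamic lists, the shifting in steps 6–8 can be replaced by removing the chosen item; a Python sketch (names illustrative):

import random

def sample_without_replacement(items, n):
    # Select n items in random sequence from the unique items supplied
    pool = list(items)
    out = []
    for _ in range(n):
        k = random.randint(0, len(pool) - 1)  # discrete uniform index
        out.append(pool.pop(k))               # remove it so it cannot repeat
    return out

# sample_without_replacement(range(1, 11), 5) mirrors Example 7.7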
Example 7.7 Suppose a population of N = 10 integers, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
where n = 5 will be selected randomly in sequence. The routine below is run:
1. At j = 1, ND = 10 and D(i) = (1,2,3,4,5,6,7,8,9,10).
Using the discrete uniform (1,10), randomly generate k, say k = 7.
Poker
A deck of cards has 52 unique items. For simplicity, the items are here labeled as
D(i), i = 1 to 52. The notation is (H, D, S, C) for (hearts, diamonds, spades, clubs)
and (A,2,3,4,5,6,7,8,9,10,J,Q,K) for (ace, 2,3,4,5,6,7,8,9,10, jack, queen, king). The
52 items become:
{D(i), i = 1 to 13} = {AH, 2H, 3H, 4H, 5H, 6H, 7H, 8H, 9H, 10H, JH, QH, KH}
{D(i), i = 14 to 26} = {AD, 2D, 3D, 4D, 5D, 6D, 7D, 8D, 9D, 10D, JD, QD, KD}
{D(i), i = 27 to 39} = {AS, 2S, 3S, 4S, 5S, 6S, 7S, 8S, 9S, 10S, JS, QS, KS}
{D(i), i = 40 to 52} = {AC, 2C, 3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C, JC, QC, KC}
The model developed here assumes two players (A, B), each of whom is dealt five
cards from the deck of 52 cards. Altogether, ten cards are dealt, the first five to
player A and the next five to B.
The routine below shows how to generate random hands of five cards each to
players A and B:
1. Using the random integer method on 52 unique integers, randomly generate
n = 10 integers in sequence and label them as {E(j), j = 1 to 10}.
2. Use {E(j), j = 1 to 5} to select from the set D(i) the five cards for player A.
Label them as {C(k), k = 1 to 5}.
3. Use {E(j), j = 6 to 10} to select from the set D(i) the five cards for player B.
Label them as {C(k), k = 6 to 10}.
Summary
This chapter concerns applications that are not from the common probability
distributions, continuous or discrete. The applications are instructive since they
show some popular variations in generating random variates, as is often needed in
building computer simulation models. The applications presented are the Poisson
process, constant Poisson process, batch arrivals, active redundancy, standby
redundancy, random integers without replacement and poker.
Chapter 8
Output from Simulation Runs
Introduction
Terminating System
A terminating system is one where, at some point in time or in events, the system
comes to a natural end. Various portions of each run of the computer simulation
model may be of interest to the analyst, and for each portion, a collection of output
measures is saved for subsequent statistical analysis. To illustrate, an example is
provided below.
Suppose an analyst is developing a computer simulation model of a large car
wash system. Assume the business opens at 8 a.m. and closes at 8 p.m.; and at the
beginning of the day, the car wash is empty. During the open hours, the customer
arrival rate varies over the day. The end of the day is when the last car enters before
the closing time of 8 p.m. The analyst may be interested in the activities at different
intervals of the day, perhaps from 8 a.m. to 12 noon, 12 to 4 p.m. and 4 to 8 p.m. For
convenience, the time periods are labeled as j (j = 1 to 3). Also suppose the
measures of interest (denoted by index k) are collected for each time interval j.
In the example, assume these are the following:
x1j = number of vehicle arrivals in j
x2j = number of vehicles serviced without waiting in j
x3j = total vehicle wait time in j
x4j = total system idle time in j
p1j = x2j/x1j = percent of vehicles that do not need to wait in j
Suppose further, the model is run n times, each with a different string of random
variates, where i denotes the run number, whereby i = 1 to n. So the output data for
the n runs of the simulation model would be the following:
xkji   k = 1 to 4, j = 1 to 3 and i = 1 to n
p1ji   k = 1, j = 1 to 3 and i = 1 to n
Nonterminating Transient Equilibrium Systems
The computer simulation model could be for a system that is nonterminating and
evolves into an equilibrium state. The computer model would begin empty and flow
through a transient stage prior to reaching the equilibrium stage. To illustrate,
suppose the system under study is a mixed model assembly line where at the
beginning of the simulation there are no units on the line. One by one, units are
placed in station 1 and they move on up to station 2 and beyond, while new units are
arriving to station 1. When the line is filled up in all stations, the transient stage
ends and this is the start of the equilibrium stage. From that point on, the system is in
its equilibrium stage. In the example provided, this is fairly obvious; but in the
general case, it is not obvious, and a concern is how to determine when the transient
stage ends.
Suppose a variable of interest from the model is denoted as xki, the average of
output index k in the i-th interval (batch) of events. The batches are run one after the
other and the average for each batch is measured on every output index k. The
difference from one batch to the previous is measured, and the end of the transient
stage is signaled when the differences begin to cycle around zero. The analyst seeks
the event index i where thereafter the average difference between two successive
measures is sufficiently close to zero, i.e., (xk,i+1 − xk,i) ≈ 0. So, the transient
stage, for output variable k, ends when such a value of i is identified, say i = A,
whereby the equilibrium stage follows. For convenience sake, the analyst would
likely choose one value of A that defines the end of the transient stage for all K
output variables.
Output Data
The output for each run begins after the transient period of A events has elapsed.
When K is the number of output variables of interest, and n is the number of
simulation runs during the equilibrium period, the output becomes the following:
xki   k = 1 to K and i = 1 to n.
ρ = 0.10
Runs          w      Pd      dw      dPd
1–500         1.05   0.086    1.05    0.086
501–1,000     1.14   0.112    0.09    0.026
1,001–1,500   1.18   0.106    0.04   −0.006
1,501–2,000   1.05   0.116   −0.13    0.010
2,001–2,500   1.01   0.081   −0.04   −0.035
2,501–3,000   1.08   0.106    0.07    0.025
ρ = 0.50
Runs          w      Pd      dw      dPd
1–500         1.75   0.448    1.75    0.448
501–1,000     2.52   0.548    0.77    0.100
1,001–1,500   2.01   0.474   −0.51   −0.074
1,501–2,000   1.78   0.462   −0.23   −0.012
2,001–2,500   2.05   0.516    0.27    0.054
2,501–3,000   1.77   0.524   −0.28    0.008
At ρ = 0.10, note how dw and dPd start to cycle around zero after 1,000 arrivals,
and this signals the transient stage is ending at n = 1,000 arrivals. At ρ = 0.50, dw
and dPd both start to cycle around zero after 1,500 arrivals, indicating the transient
stage ends at n = 1,500 arrivals, and the equilibrium stage begins.
Another way to collect the data for this system is described here. For all the events
after the transient stage ends, the analyst could specify two parameters, N and M,
that will be used in partitions and buffers, respectively. Each partition is of length N
and every buffer is of length M, where typically N > M. As the simulation model
progresses, the partitions and buffers will follow in a leapfrog manner one after the
other. For example, the progression of events is the following:
1 to N Partition 1
N + 1 to N + M Buffer 1
N + M + 1 to 2N + M Partition 2
2N + M + 1 to 2N + 2M Buffer 2
2N + 2M + 1 to 3N + 2M Partition 3
...
Nonterminating Transient Cyclical Systems
The computer simulation model could be for a system that is nonterminating and
progresses into a cyclical state. The computer model would begin empty and flow
through a transient stage prior to reaching the cyclical stage. To illustrate, suppose
the system under study is a car repair center where customers leave their vehicles
for maintenance or repair. The shop is open 12 h a day and the vehicles remain in
the system, even overnight, until the service is finished. The arrival rate varies by
the time of day and as such the system follows a cyclical daily pattern. In the
computer simulation model, after the transient stage ends, the status evolves into
daily cycles.
Output Data
Upon identifying A, the number of events until the cyclical stage begins, the
computer simulation model now runs with n different strings of random numbers.
The output for each run begins after the transient period of A events has elapsed.
When K is the number of output measures of interest and n is the number of runs,
the output becomes the following:
xki   k = 1 to K and i = 1 to n.
The output measures might be collected for each cycle and even at different
intervals of the cycle. Suppose K is the number of variables of interest, J is the
number of intervals in a cycle to measure, and n is the number of partitions. Hence,
the data collected after n partitions is the following:
Other Models
Some simulation models are not time or event related and have none of the
associated traits described earlier as terminating, transient or equilibrium. Instead,
the simulation model may be used to develop data that can be used in subsequent
analysis. Several examples are provided below.
Forecasting Database
A simulation model can generate a realistic database of part demand histories that
the analyst can compare with the forecast model generated. The analyst could also
occasionally insert an outlier demand in one of the history fields to see how
competent the forecast system is at detecting outliers and adjusting accordingly.
The output record for each part might include the following:
Part, Comments, Number Months History, D1, D2, ..., D24.
In the event the system also has replenish capability, the forecast-replenish
system would then compute the order size, safety stock, order point and order
level. Depending on the on-hand and on-order data, the system would compute
the replenish quantity needed, if any, for each part. The simulation part data now
includes fields with the following: on-hand, on-order, cost per unit, multiple
quantity, minimum buy quantity, lead time, and price break data when
they pertain.
The simulation model has to coordinate this data for each part with the above
forecast data generated to yield a realistic database record for each part.
Comments should be included in a field to allow the analyst to compare the
forecast-replenish system results with the data provided. When the on-hand and
on-order are low, a replenish quantity should be called in the subsequent
replenishment routine. When the on-hand is low and the on-order high, no
replenish quantity is needed, and so forth. The additional data per part may
include the following:
Part, Cost, Multiple Quantity, Min Quantity, Lead Time, On-Hand, On-Order,
Price Break Data.
Example 8.2 An inventory manager is seeking guidance on how to set the
forecasting parameter α (alpha) that plays an important role in controlling the
inventory. In particular, the horizontal single smoothing forecasting model is in
use to generate the forecasts for the future months for many of the parts in the
inventory. The particular values of α under consideration are: 0.05, 0.10, 0.20,
0.30, 0.40 and 0.50.
A simulation model is developed to randomly generate, for each part, 48 months
of demand history that follows a horizontal demand pattern with a coefficient of
variation (cov) set at 0.30. The demands are randomly generated using the normal
distribution. The cov implies the standard deviation of the demands is σ = 0.30μ,
where μ is the average demand per month. Essentially, cov = σ/μ where σ is the
standard deviation of each monthly demand. So for each part in the simulation
study, the demands generated are: x1, ..., x48.
The forecast model is run through the first 24 months of history and the
forecast for the next 12 months of demands is generated. Starting with the month
24 forecast, the forecast errors are measured for each of the next 12 months, and
the standard error of the forecast error is tabulated. From month 24 to month 36,
the forecast model moves forward and the forecast errors are measured in the
same way.
Altogether, 13 sets of forecasts are generated for each part (one each for months
24–36), and the corresponding forecast errors are measured. Finally, the cov of the
1-month-ahead forecast error is measured by cov = s/x̄, where s is the standard
deviation of the 1-month-ahead forecast error, and x̄ is the average 1-month demand.
This process is followed for each of the six parameter values of α listed earlier,
and also for 100 parts where the demand streams of 48 months are each generated
with a different set of random numbers. The average cov from all of the 100 parts
is listed in the table below for each of the α values under review. The results
clearly show that the smallest value of the parameter α yields the best forecast
results. The forecaster is cautiously aware that in the simulation model all the data
are from a horizontal demand pattern and the accuracy results are for history
patterns that are truly of that type.
α     cov
0.05 0.297
0.10 0.305
0.20 0.315
0.30 0.318
0.40 0.325
0.50 0.344
Results were also tabulated for three sets of demand history from the same part,
and they show how the cov increases tremendously when the part has one outlier
in the demand history. The forecast manager can now clearly see how important
it is to include logic in the forecasting routine to detect outlier demands and
adjust accordingly prior to forecasting.
Summary
This chapter describes the output data that are collected from runs of a computer
simulation model: for terminating systems, for nonterminating systems that evolve
into an equilibrium or a cyclical stage, and for models that generate databases for
subsequent analysis.
Introduction
This chapter is a quick review of some of the common statistical tests that are useful in analyzing the output data from runs of a computer simulation model. This pertains when each run of the model yields a group of k unique output measures that are of interest to the analyst. When the model is run n times, each with a different string of continuous uniform u ~ U(0,1) random variates, the output data are generated independently from run to run, and therefore the data can be analyzed using ordinary statistical methods. See, for example, Hines et al. (2003) for a full description of statistical methods. Some of the output data may be of the variable type and some may be of the proportion type. The appropriate statistical method for each type of data is applied as needed. This includes measuring the average value and computing the confidence interval of the true mean. Oftentimes, the simulation model is run with one or more control input variables in a 'what if' manner. The output data between two or more settings of the control variables can be compared using the appropriate statistical tools. This includes testing for a significant difference between two means, between two proportions, and between k or more means.
Example 9.1 A maintenance and repair shop for cars is open Monday through Saturday from 8 a.m. till 6 p.m. The cars needing service arrive during the day at an average arrival rate via the Poisson distribution. The service times vary via a gamma distribution with a location parameter to signify the minimum time of service. A simulation model is developed to emulate the daily activities, and the model collects a series of measures of interest to the analyst. The number of independent runs of the model is n. Some of the measures collected in each run of the model are listed below:
n0 = number of bays (this is a control parameter)
n1 = number of vehicles that arrive for service
n2 = number of parts needed to complete the service
n3 = number of vehicles serviced without a delay
n4 = number of parts that are on hand when needed
S1 = total wait time across the vehicles
S2 = total idle time across the bays
The variable type data for an individual daily run are the following:
x1 = S1/n1 = average wait time per vehicle
x2 = S2/(10 n0) = average idle time per hour per bay (the shop is open 10 h per day)
x3 = n3/n1 = service level for the vehicles
x4 = n4/n2 = service level for the parts
When n replications of the simulation model are run, and n5 is the number of days when one or more vehicles wait at least 60 min for service, then p = n5/n is the proportion of days when a vehicle waits 60 or more minutes.
Analysis of Variable Type Data

Variable type data is like the time to complete a service on an item in repair, the strength of a steel beam, the elasticity of a rubber compound, the labor cost to service a vehicle, and so forth. Upon completion of n replications of a simulation model, the analyst may be interested in determining the point estimate of the variable measured in the model and also the corresponding confidence interval. The analyst may further inquire how many replications are needed to gain the precision desired in the estimate.
Consider the output measure, x, from a run of a simulation model. The variable x
could be the average of a large number of events that take place in the model. In the
service department of the car dealership model, x could be the average time a
customer waits for service on the vehicle. In the typical situation, the distribution
and the mean and variance of x are unknown. When n replications of the simulation
are run, with the same initial conditions and with different streams of continuous
uniform u ~ U(0,1) random variates, then the output data, x1, . . ., xn, for each
random variable can be treated as independent observations of the random variable.
Some of the more common statistical tools for analyzing the output data are presented below. Each of the tools is valid as long as the n sample observations, x1, . . ., xn, are statistically independent.
Using the n output results, x1, . . ., xn, the sample mean and variance of x are computed as below:

$$\bar{x} = \sum_{i=1}^{n} x_i / n$$

$$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$$

The standard deviation of x is $s = \sqrt{s^2}$, and the standard error of the mean, $s_{\bar{x}}$, is obtained by

$$s_{\bar{x}} = s / \sqrt{n}$$
A (1 − α) confidence interval on the true mean μ is

$$L \le \mu \le U$$

where the upper and lower confidence limits (U, L) are obtained from

$$U = \bar{x} + t_{\alpha/2}\, s_{\bar{x}}$$

$$L = \bar{x} - t_{\alpha/2}\, s_{\bar{x}}$$
Note, t_{α/2} has (n − 1) degrees of freedom and is the α/2 upper-tail percentage point of the student's t distribution, where P(t > t_{α/2}) = α/2. Hence,

$$P(L \le \mu \le U) = 1 - \alpha.$$
In the event n is large, say n > 30, the standard normal variable, z, can be used in place of the student's variable, t. In this event, the confidence limits become

$$U = \bar{x} + z_{\alpha/2}\, s_{\bar{x}}$$

$$L = \bar{x} - z_{\alpha/2}\, s_{\bar{x}}$$

where z_{α/2} is the z value from N(0,1) that gives P(z > z_{α/2}) = α/2.
In the event x is not normally distributed and the sample size is small, an approximate (1 − α) confidence interval of μ is computed in the same way. That is,

$$L \le \mu \le U$$

where the upper and lower confidence limits (U, L) are obtained from

$$U = \bar{x} + t_{\alpha/2}\, s_{\bar{x}}$$

$$L = \bar{x} - t_{\alpha/2}\, s_{\bar{x}}$$

The term t_{α/2} is the upper-tail percentage point from the student's t distribution with (n − 1) degrees of freedom, where P(t > t_{α/2}) = α/2. But since x is not normal, the exact probability of the interval is not truly known and is approximated as

$$P(L \le \mu \le U) \approx 1 - \alpha.$$
When n increases, the Central Limit Theorem applies, and the distribution shape of the sample mean approaches a normal distribution. Hence, the standard normal variable, z, replaces the student's variable, t, and the confidence limits become

$$U = \bar{x} + z_{\alpha/2}\, s_{\bar{x}}$$

$$L = \bar{x} - z_{\alpha/2}\, s_{\bar{x}}$$

where z_{α/2} is the z value that gives P(z > z_{α/2}) = α/2. The confidence interval on μ becomes

$$P(L \le \mu \le U) = 1 - \alpha.$$
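As a minimal sketch of these confidence-interval computations, assuming Python with numpy and scipy available (the run outputs listed are illustrative, not from the text):

import numpy as np
from scipy import stats

def mean_ci(x, alpha=0.05):
    """Return (L, U) for the true mean from independent run outputs x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    se = np.std(x, ddof=1) / np.sqrt(n)        # standard error of the mean
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)   # upper-tail t percentage point
    m = x.mean()
    return m - t * se, m + t * se

# Ten illustrative run outputs:
print(mean_ci([18.2, 22.5, 19.9, 25.1, 16.0, 21.3, 24.4, 17.8, 20.6, 14.2]))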
Example 9.2 Suppose a simulation run yields a variable, x, with each trial run of the simulation. Assume further that the simulation is run with n = 10 repetitions, and each repetition begins with the same initial values and terminates over the same length of events. The only difference is the stream of random variates used in each of the simulation runs. So, as much as possible, there are now n output results, x1, . . ., xn, that are from the same distribution and are independently generated. Assume further that the sample mean and variance from the ten samples are x̄ = 20.0 and s² = 25.0, respectively. The standard deviation becomes s = 5, and the standard error of the mean is

$$s_{\bar{x}} = s / \sqrt{n} = 1.58.$$

With (1 − α) = 0.95 and t₀.₀₂₅ = 2.262 at nine degrees of freedom, the approximate confidence limits become U = 20.0 + 2.262(1.58) = 23.57 and L = 20.0 − 2.262(1.58) = 16.43.
Suppose the analyst wants the length of the (1 − α) confidence interval, currently (U − L), to shrink to 2E, and all else remains the same. The number of repetitions to achieve this goal is obtained from the relation below,

$$n = [t_{\alpha/2}\, s / E]^2$$

where s is the sample standard deviation, t is the student's t value, and E is the precision sought. Note the student's t value must be coordinated with (n − 1) degrees of freedom. The problem with the above relation is that the t value cannot be inserted into the formula until the sample size n is known.

A way to approximate the formula is to use the normal z value in place of the student's t value. Replacing t with z yields

$$n = [z_{\alpha/2}\, s / E]^2$$

The above estimate of n will be less than or equal to the counterpart value of n when the student's t is used. As n gets larger (n > 30), the difference between using the t and the z value is minor.
Example 9.3 Consider the approximate 95 % confidence interval shown in Example 9.2. Suppose the analyst wants the length of the 95 % confidence interval to shrink from the current (U − L) = (23.57 − 16.43) = 7.14 to (U − L) = 4.0, whereby E = 2.0. The number of repetitions to achieve this goal is obtained from the relation below:

$$n = [z_{\alpha/2}\, s / E]^2 = [1.96(5)/2.0]^2 = 24.01$$

so approximately 25 repetitions are needed.
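A small sketch of this sample-size relation, assuming scipy is available for the z percentage point; rounding up to the next integer is a conservative convention, not a step from the text:

from math import ceil
from scipy import stats

def reps_needed(s: float, E: float, alpha: float = 0.05) -> int:
    """Repetitions needed for a (1 - alpha) interval of half-width E."""
    z = stats.norm.ppf(1 - alpha / 2)
    return ceil((z * s / E) ** 2)

# Example 9.3 values: s = 5, target half-width E = 2.0
print(reps_needed(5.0, 2.0))   # about 25 repetitions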
Analysis of Proportion Type Data

Proportion type data is like the portion of trials in which an event occurs. Examples are: the portion of units that have defects; the portion of customers who use a credit card for a purchase; the portion of customers in a gas station who use premium gas; the portion of police calls that have to wait more than 10 min for service; and so forth. Upon completion of n replications of a simulation model, the analyst may be interested in determining the point estimate of the proportion measured in the model, and also the corresponding confidence interval. The analyst may also inquire how many replications are needed to gain more precision.
Using the n output results, let w represent the number of the n replications where a specified event occurs. The estimated proportion is obtained by

$$\hat{p} = w/n$$

and the variance of the estimate is

$$s_{\hat{p}}^2 = \hat{p}(1 - \hat{p})/n$$

Confidence Interval of p

A (1 − α) confidence interval on the true proportion p is

$$L \le p \le U$$

where

$$U = \hat{p} + z_{\alpha/2}\, s_{\hat{p}}$$

$$L = \hat{p} - z_{\alpha/2}\, s_{\hat{p}}$$

and

$$P(L \le p \le U) = (1 - \alpha).$$
Example 9.4 Suppose a simulation model is run with n = 60 replications and a specified event occurs in w = 6 of them. The estimate of the proportion is

$$\hat{p} = 6/60 = 0.10.$$

The standard error is $s_{\hat{p}} = \sqrt{0.10(0.90)/60} = 0.0387$, and with z₀.₀₂₅ = 1.96, the 95 % confidence interval on p becomes

$$(0.024 \le p \le 0.176)$$
In the event the analyst desires more accuracy in the estimate of the proportion, p, the number of repetitions needs to increase. The following formula computes the size of n for the accuracy desired. Suppose (1 − α) is fixed and the tolerance E = 0.5(U − L) desired is specified. Then the number of repetitions becomes

$$n = p(1 - p)[z_{\alpha/2}/E]^2$$

If an estimate of the proportion p is not available, then set p = 0.5, and the number of repetitions becomes

$$n = (0.25)[z_{\alpha/2}/E]^2$$
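A brief sketch of the proportion estimate, its confidence limits, and the repetition-count formula, assuming scipy is available; the function names are illustrative:

from math import ceil, sqrt
from scipy import stats

def proportion_ci(w: int, n: int, alpha: float = 0.05):
    """Confidence limits (L, U) on p from w occurrences in n replications."""
    p = w / n
    se = sqrt(p * (1 - p) / n)
    z = stats.norm.ppf(1 - alpha / 2)
    return p - z * se, p + z * se

def reps_for_tolerance(p: float, E: float, alpha: float = 0.05) -> int:
    """Repetitions needed so the half-width of the interval is about E."""
    z = stats.norm.ppf(1 - alpha / 2)
    return ceil(p * (1 - p) * (z / E) ** 2)

print(proportion_ci(6, 60))            # roughly (0.024, 0.176), as in Example 9.4
print(reps_for_tolerance(0.10, 0.05))  # roughly 139, as in Example 9.5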
Example 9.5 Consider Example 9.4 again, and assume the analyst needs more accuracy in the estimate of p. He/she wants to lower the 95 % tolerance from 0.5(0.176 − 0.024) = 0.076 to E = 0.050. The question is how many more repetitions are needed to accomplish this. The above formula is used assuming p = 0.10 and letting (1 − α) = 0.95; hence α = 0.05 and z_{α/2} = 1.96. The number of repetitions needed becomes

$$n = (0.1)(0.9)[1.96/0.05]^2 = 138.3.$$
Example 9.6 Consider a production process where Ns units are started into production in order to meet an order of No = 10 good units, since some of the started units may turn out defective. For each setting of the control variable Ns, the simulation model is run 1,000 times, and Ng, the number of good units produced, is tallied. The estimated probabilities of meeting the order are below.

No    Ns    P(Ng ≥ No)
10    10    0.156
10    11    0.537
10    12    0.850
10    13    0.952
10    14    0.988
10    15    0.999
Example 9.7 In Example 9.6, from 1,000 trials with control variable Ns = 13 as the number of units to start production, the number of runs producing ten or more good units becomes 952. This is a proportion situation where the estimate of the proportion, p, becomes p̂ = 952/1000 = 0.952. The associated standard deviation of this estimate is

$$s_{\hat{p}} = \sqrt{(0.952)(0.048)/1000} = 0.0067.$$

Hence, the 90 % confidence limits of p are

$$U = 0.952 + 1.645(0.0067) = 0.963$$

$$L = 0.952 - 1.645(0.0067) = 0.941$$
Based on the confidence interval, since L = 0.941 is less than 0.95, management should be aware, with 90 % confidence, that the proportion of good units could be lower than the goal of 0.95 when Ns = 13. Instead of settling on Ns = 14 units to start the production process, the analyst could consider taking more samples to gain higher precision on the estimate of p at Ns = 13.
Sometimes the analyst is seeking the better solution when two or more control options are in consideration. An example could be a mixed-model assembly line where k different models are produced on the line and the analyst is seeking the best way to send the different models down the line for the day. One way is to send each model down the line in batches; another is to send them down the line in a random order. The assembly time for each model is known by station. A simulation model is developed and is run for a day's activity, and the following types of measures are tallied. Idle time is recorded when an operator in a station must wait for the next unit in sequence before he/she can begin work. Congestion occurs when the operator has extended work on a unit and is forced to continue working on the unit even when the next unit arrives at the station on a moving conveyor. A goal is to minimize the idle time and congestion time over the day's activities.
S1 = sum of idle time across all stations and models for the day.
S2 = sum of congestion time across all stations and models for the day.
N1 = number of units assembled for the day.
N2 = number of stations on the line.
N3 = number of units with congestion over the day.
The computations for the day are the following:
x1 = S1/(N1 × N2) = average idle time per station per unit assembled.
x2 = S2/(N1 × N2) = average congestion time per station per unit assembled.
p1 = N3/N1 = proportion of units with congestion for the day.
Note, the measures x1 and x2 are variable type data, and p1 is proportion type data.
Comparing Two Means when Variable Type Data

Suppose a terminating simulation model has two different options (1 and 2) and the analyst wants to compare the one against the other to see which is preferable. For option 1, n1 repetitions of simulation runs are taken and the output yields a sample mean x̄1 and variance s1². For option 2, n2 simulation runs are generated and the results give x̄2 and s2². Typically, the number of simulation runs is the same, whereby n1 = n2. The true means and variances of options 1 and 2 are not known and are estimated from the sample simulation runs.
Comparing x̄1 and x̄2

The estimate of the difference between the true means is measured by $\bar{x}_1 - \bar{x}_2$. Denoting the true means of options 1 and 2 by μ1 and μ2, respectively, the difference between the sample means is an estimate of the difference between the true means. The expected value of the difference yields

$$E(\bar{x}_1 - \bar{x}_2) = (\mu_1 - \mu_2)$$

A (1 − α) confidence interval on the difference is

$$L \le (\mu_1 - \mu_2) \le U$$

and

$$P[L \le (\mu_1 - \mu_2) \le U] = (1 - \alpha)$$

The lower and upper limits are computed in the following way,

$$U = (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_{(\bar{x}_1 - \bar{x}_2)}$$

$$L = (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_{(\bar{x}_1 - \bar{x}_2)}$$

where
s(x̄1 − x̄2) = standard error of the difference between the two means,
t_{α/2} = the student's t value with degrees of freedom df.
The way to compute the above standard error and degrees of freedom is given subsequently.
Significance Test

The significance of the difference between the two options can be noted by use of the confidence interval, by observing the range of values from confidence limits L to U. In the event the interval (L to U) passes through zero, the means of the two options are not significantly different at the (1 − α) confidence level. When the interval is entirely positive, the mean of option 1 is significantly higher than the mean of option 2. On the other hand, if the interval is entirely negative, the mean of option 1 is significantly smaller than the mean of option 2.
When σ1 = σ2

When the true standard deviations of the two options are assumed equal, the standard error of the difference between the two means is measured as shown below. First, the two sample variances are combined to estimate the common variance. This is called the pooled estimate of the variance and is computed as

$$s^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/[n_1 + n_2 - 2]$$

Second, the standard error of the difference between the two means becomes

$$s_{(\bar{x}_1 - \bar{x}_2)} = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

and the degrees of freedom are

$$df = (n_1 + n_2 - 2).$$
When σ1 ≠ σ2

When the true standard deviations of the two options are not assumed equal, the standard error of the difference between the two means is measured as

$$s_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where

$$s_{\bar{x}_1}^2 = s_1^2/n_1$$

$$s_{\bar{x}_2}^2 = s_2^2/n_2$$

and the degrees of freedom are approximated by

$$df = \left[s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2\right]^2 \Big/ \left[s_{\bar{x}_1}^4/(n_1 + 1) + s_{\bar{x}_2}^4/(n_2 + 1)\right] - 2$$
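A minimal sketch of the pooled (σ1 = σ2) confidence interval on (μ1 − μ2), assuming scipy is available; the summary statistics used are those of Example 9.8 below:

from math import sqrt
from scipy import stats

def two_mean_ci(m1, s1sq, n1, m2, s2sq, n2, alpha=0.05):
    """Pooled-variance confidence limits (L, U) on mu1 - mu2."""
    s2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(s2) * sqrt(1 / n1 + 1 / n2)                     # standard error
    t = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
    d = m1 - m2
    return d - t * se, d + t * se

print(two_mean_ci(50, 36, 10, 46, 27, 10))   # roughly (-1.27, 9.27)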
When one or both variables x1 and x2 are not normally distributed, it is possible to compute an approximate (1 − α) confidence interval on the difference between the two means, (μ1 − μ2), in the same way as shown above when the normal distribution applies. The result yields approximate upper and lower confidence limits (U, L), where

$$L \le (\mu_1 - \mu_2) \le U$$

and

$$P[L \le (\mu_1 - \mu_2) \le U] \approx (1 - \alpha)$$

The upper and lower limits are computed in the following way,

$$U = (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_{(\bar{x}_1 - \bar{x}_2)}$$

$$L = (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_{(\bar{x}_1 - \bar{x}_2)}$$

Via the central limit theorem, as the degrees of freedom increase, the shape of the distribution of $(\bar{x}_1 - \bar{x}_2)$ increasingly resembles a normal distribution, and eventually the approximation in the confidence interval can be dropped.
Example 9.8 Suppose a terminating simulation model has two options (1, 2), and ten simulation runs of each are taken. The goal is to compare the difference between the means of the two options. Assume the sample mean and variance of each option are computed and the results are listed below.
Option 1: n1 = 10, x̄1 = 50, and s1² = 36.
Option 2: n2 = 10, x̄2 = 46, and s2² = 27.
The analyst assumes the variables x1 and x2 are sufficiently close to a normal distribution, and also that the variances of the two options are equal. Hence, the standard error of the difference between the two means is computed as follows. First, the pooled estimate of the variance is

$$s^2 = [(10 - 1)36 + (10 - 1)27]/[10 + 10 - 2] = 31.5$$

so s = 5.61. Second, the standard error of the difference between the means is

$$s_{(\bar{x}_1 - \bar{x}_2)} = 5.61\sqrt{\frac{1}{10} + \frac{1}{10}} = 2.51$$

Hence, with t₀.₀₂₅ = 2.101 at df = 18, the upper and lower confidence limits are

$$U = (50 - 46) + 2.101(2.51) = 9.27$$

$$L = (50 - 46) - 2.101(2.51) = -1.27$$

Note the range of the confidence interval passes across zero. Hence, with the sample sizes taken so far, there is no evidence of a significant difference between the means of the two options at the (1 − α) = 95 % confidence level.
Comparing the Proportions Between Two Options

Suppose a terminating simulation model has two different options (1 and 2) and the analyst wants to compare the one against the other to see which is preferable. For option 1, n1 repetitions of simulation runs are taken and x1 of them have an attribute of interest. The proportion of repetitions that have the attribute is measured by p̂1 = x1/n1. For option 2, n2 repetitions are taken, x2 have the attribute, and the proportion becomes p̂2 = x2/n2. Typically, the number of simulation runs is the same, whereby n1 = n2. The true proportions of options 1 and 2 are not known and are estimated from the sample simulation runs.
Comparing p̂1 and p̂2

The estimate of the difference between the two true proportions p1 and p2 is measured by the difference of their estimates p̂1 and p̂2. The expected value of the difference yields

$$E(\hat{p}_1 - \hat{p}_2) = (p_1 - p_2)$$

and the estimate of the difference in the two proportions is $\hat{p}_1 - \hat{p}_2$.
When a sufficient number of repetitions (n1, n2) are taken, the normal distribution applies to the shape of the difference between p̂1 and p̂2, and it is possible to compute a (1 − α) confidence interval on the difference between the two proportions, (p1 − p2). The result yields upper and lower confidence limits (U, L), where

$$L \le (p_1 - p_2) \le U$$

and

$$P[L \le (p_1 - p_2) \le U] = (1 - \alpha)$$

The lower and upper limits are computed in the following way,

$$U = (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\, s_{(\hat{p}_1 - \hat{p}_2)}$$

$$L = (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\, s_{(\hat{p}_1 - \hat{p}_2)}$$

where
s(p̂1 − p̂2) = standard error of the difference between the two proportions,
z_{α/2} = the standard normal variable where P(z > z_{α/2}) = α/2.
The standard error is calculated from $s_{\hat{p}_i}^2 = \hat{p}_i(1 - \hat{p}_i)/n_i$ for i = 1, 2, as

$$s_{(\hat{p}_1 - \hat{p}_2)} = [s_{\hat{p}_1}^2 + s_{\hat{p}_2}^2]^{0.5}$$
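A short sketch of the confidence interval on (p1 − p2), assuming scipy for the z value; the counts used are those of Example 9.9 below:

from math import sqrt
from scipy import stats

def two_prop_ci(x1, n1, x2, n2, alpha=0.05):
    """Confidence limits (L, U) on p1 - p2 from two sets of repetitions."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = stats.norm.ppf(1 - alpha / 2)
    d = p1 - p2
    return d - z * se, d + z * se

print(two_prop_ci(28, 200, 44, 200))   # roughly (-0.155, -0.005)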
Significance Test

The significance of the difference between the two options can be noted by use of the confidence interval, by observing the range of values from confidence limits L to U. In the event the interval (L to U) passes through zero, the proportions of the two options are not significantly different at the (1 − α) confidence level. When the interval is entirely positive, the proportion of option 1 is significantly higher than the proportion of option 2. On the other hand, if the interval is entirely negative, the proportion of option 1 is significantly smaller than the proportion of option 2.
Example 9.9 Suppose a terminating simulation model is run with two options to determine which is preferable with respect to an event, A. Option 1 is run with n1 = 200 repetitions and event A occurs on x1 = 28 occasions; hence p̂1 = x1/n1 = 0.14 is the portion of runs in which event A occurred. Option 2 is run with n2 = 200 repetitions and event A occurs on x2 = 44 occasions, whereby p̂2 = x2/n2 = 0.22. The analyst wants to determine the 95 % confidence interval on the difference between the two proportions, (p1 − p2).

The point estimate of the true difference between the two proportions is

$$\hat{p}_1 - \hat{p}_2 = 0.14 - 0.22 = -0.08$$

The standard error between the two proportions is computed as

$$s_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{0.00060 + 0.00086} = 0.0382$$

and with z₀.₀₂₅ = 1.96 the 95 % confidence interval becomes

$$(-0.1546 \le p_1 - p_2 \le -0.0052)$$

Finally, since the range from the lower limit to the upper limit is all negative numbers, the difference between the two proportions is significant at the 0.95 confidence level, and thereby p1 is significantly smaller than p2.
Suppose a simulation model with terminating type data where the analyst is comparing various options, seeking the combination that yields the best efficiency. This could be a mixed-model assembly line simulation where the analyst is seeking the best way to sequence the units down the line to minimize the total idle time and congestion time on the line, and where the output measure is variable type data. Each option is run with n repetitions and the various output results are measured. The analyst wants to determine whether the differences in the output measure are significant.

The one-way analysis of variance is a method to test for significant differences in the output results. In the event the options are found significantly different in an output measure, the next step is to determine which option(s) give significantly better results. Below shows how to use the one-way analysis of variance method.
Assume k treatments and n repetitions of each are observed (in simulation runs). Note, in this situation, a treatment is the same as an option; treatment is the common term in use for analysis of variance. With k treatments and n repetitions, the data available are:

$$x_{ij} \quad i = 1, \ldots, k \quad \text{and} \quad j = 1, \ldots, n.$$

The one-way analysis of variance method assumes each of the options, i, has mean μi and common variance σ², all unknown, and all data normally distributed. The null hypothesis is

$$H_o: \mu_1 = \ldots = \mu_k$$

In the event Ho is rejected, the analyst will seek the option(s) that yield significantly better results. One way to do this is by comparing the difference of two variables, as described earlier in this chapter. More on this is given subsequently.
A first step in using this method is to calculate the sample averages for each treatment, i, and for the total of all treatments, as shown below:

$$\bar{x}_i = \sum_{j=1}^{n} x_{ij}/n \quad i = 1 \text{ to } k$$

$$\bar{\bar{x}} = \sum_{i=1}^{k} \bar{x}_i / k$$

A second step is to compute the sum of squares of treatments (SSTR) and the sum of squares of error (SSE), as below:

$$SSTR = n\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})^2$$

$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2$$
The degrees of freedom for the treatments and for the error are

$$df_{TR} = k - 1$$

$$df_E = nk - k$$

Next, the mean square for the treatments and the mean square of the errors are obtained as follows:

$$MS_{TR} = SSTR/df_{TR}$$

$$MS_E = SSE/df_E$$

The residual error for each observation is denoted as $e_{ij} = (x_{ij} - \bar{x}_i)$, and the estimate of the variance becomes

$$\hat{\sigma}^2 = MS_E$$

since $E(MS_E) = \sigma^2$.

To test whether the null hypothesis is true, Fisher's test is applied and Fo is computed by

$$F_o = MS_{TR}/MS_E$$
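A minimal sketch of these one-way analysis of variance computations, assuming numpy; the data array is that of Example 9.10 below:

import numpy as np

x = np.array([[10,  9, 11,  8, 12],
              [ 6, 10,  7,  8,  9],
              [ 9,  5,  6,  9,  6]], dtype=float)
k, n = x.shape
row_means = x.mean(axis=1)                        # treatment averages
grand_mean = row_means.mean()

sstr = n * np.sum((row_means - grand_mean) ** 2)  # treatment sum of squares
sse = np.sum((x - row_means[:, None]) ** 2)       # error sum of squares
mstr, mse = sstr / (k - 1), sse / (n * k - k)
f0 = mstr / mse
print(f"SSTR={sstr:.2f} SSE={sse:.2f} MSTR={mstr:.2f} MSE={mse:.3f} Fo={f0:.2f}")
# Expected: SSTR=23.33, SSE=34.00, MSTR=11.67, MSE=2.833, Fo=4.12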
Example 9.10 Suppose k = 3 options are each run with n = 5 repetitions, and the output data and sample averages are the following:

Option i    Observations j = 1 to 5    Average
1           10   9  11   8  12         x̄1 = 10.0
2            6  10   7   8   9         x̄2 = 8.0
3            9   5   6   9   6         x̄3 = 7.0
Total (grand average)                  8.33
The associated sums of squares, degrees of freedom, and mean squares are:

$$SSTR = 5[(10 - 8.33)^2 + (8 - 8.33)^2 + (7 - 8.33)^2] = 23.33$$

$$SSE = 34.00$$

$$df_{TR} = 2, \quad df_E = 12$$

$$MS_{TR} = 23.33/2 = 11.67, \quad MS_E = 34.00/12 = 2.833$$

$$F_o = 11.67/2.833 = 4.12$$

The F value in Fisher's tables with α = 0.05 and degrees of freedom (df_TR, df_E) = (2, 12) yields F₀.₀₅(2,12) = 3.89. Since Fo > 3.89, the null hypothesis is rejected, indicating a significant difference in the means of two or more of the options.
In this example, the smaller the mean, the better. The simulation results show that option 3 gives the best results and option 2 is the next best. The next step is to determine whether the sample mean values of options 2 and 3 are significantly different or not.
Example 9.11 Continuing with Example 9.10, the goal now is to determine whether the means of options 2 and 3 are significantly different. The estimate of the variance of the residual errors is σ̂² = MSE = 2.833; thereby the standard error is σ̂ = √2.833 = 1.683, and the associated degrees of freedom are dfE = 12. The (1 − α) = 95 % confidence limits (U and L) on the difference between the means of options 2 and 3 are computed using the student's t value with α/2 = 0.025 and dfE = 12, whereby t_{α/2} = 2.179. Hence,

$$U = (\bar{x}_3 - \bar{x}_2) + t_{\alpha/2}\, \hat{\sigma}/\sqrt{2n} = 0.160$$

$$L = (\bar{x}_3 - \bar{x}_2) - t_{\alpha/2}\, \hat{\sigma}/\sqrt{2n} = -2.160$$

so that

$$(-2.16 \le \mu_3 - \mu_2 \le 0.16)$$

Because the values from L to U pass through zero, the mean of option 3 is not significantly smaller than the mean of option 2 with 95 % confidence.
Example 9.12 The 95 % confidence intervals for all three pairwise comparisons are below:

$$(-3.16 \le \mu_2 - \mu_1 \le -0.84)$$

$$(-4.16 \le \mu_3 - \mu_1 \le -1.84)$$

$$(-2.16 \le \mu_3 - \mu_2 \le 0.16)$$

The results show that option 1 is significantly higher than options 2 and 3, since the upper and lower limits of those comparisons are both negative and do not pass through zero. As stated, options 2 and 3 are not significantly different from each other, but option 1 is significantly higher. The analyst might consider taking more samples to gain further precision on the difference between options 2 and 3.
Summary

This chapter describes the common statistical methods that are used to analyze the output data from computer models based on terminating and nonterminating systems. The statistical methods are essentially the same as those described in the common statistical textbooks. They include measuring the average, the standard deviation, and the confidence interval from output data, some of the variable type and some of the proportion type. The methods described also pertain when two or more variables are under review.
Chapter 10
Choosing the Probability Distribution from Data
Introduction
In building a simulation model, the analyst often includes several input variables of
the control and random type. The control variables are those that are of the “what if”
type. Often, the purpose of the simulation model is to determine how to set the
control variables seeking optimal results. For example, in an inventory simulation
model, the control variables may be the service level and the holding rate, both of
which are controlled by the inventory manager. On each run of the model, the
analyst sets the values of the control variables and observes the output measures to
see how the system reacts.
Another type of variable is the input random variable; these are of the continuous and discrete type. This type of variable is needed to match, as well as possible, the real-life system that the simulation model is seeking to emulate. For each such variable, the analyst is confronted with choosing the probability distribution to apply and the parameter value(s) to use. Often, empirical or sample data is available to assist in choosing the distribution to apply and in estimating the associated parameter values. Sometimes two or more distributions may seem appropriate and a choice among them is needed. The authenticity of the simulation model largely depends on how well the analyst can emulate the real system. Choosing the random variables and their parameter values is vital in this process.
This chapter gives guidance on the steps to find the probability distribution to
use in the simulation model and how to estimate the parameter values that
pertain. For each of the random variables in the simulation model with data
available, the following steps are described: verify the data is independent,
compute various statistical measures, choose the candidate probability
distributions, estimate the parameter(s) for each probability distribution, and
determine the adequacy of the fit.
For each random variable in the simulation model, the analyst is obliged to seek actual data (empirical or sample) from the real system under study. For example, if the variable is the number of units purchased with each customer order for an item, the data collected might be the history of the number of pieces in each order for the item over the past year, say. Since the numbers are integers, the variable is from a discrete probability distribution. The data is recorded and identified as x1, . . ., xn, where n is the number of data entries collected. Subsequently, this sample or empirical data is the catalyst in selecting the probability distribution for the variable. The data is also needed to estimate the distribution's parameter value(s), and subsequently is applied to compare fitted values with the actual values.
Autocorrelation

Suppose the data, x1, . . ., xn, are sequentially collected and thereby may not be independent. A way to test for independence is by measuring the autocorrelation of the data at various lags. The sample autocorrelation at lag k, denoted as rk, is computed as

$$r_k = \frac{\sum_{i=k+1}^{n}(x_i - \bar{x})(x_{i-k} - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad k = 1, 2, 3, \ldots$$

where x̄ is the average of all the x's. When all sample autocorrelations are near zero, plus or minus, the data is assumed independent. In the event the data appears not independent, the sample should be retaken, perhaps sampling one item out of each five items, or one item per hour, and so forth.
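A small sketch of the lag-k sample autocorrelation above, assuming numpy; the data list reuses the Example 10.2 sample purely for illustration:

import numpy as np

def autocorr(x, k: int) -> float:
    """Sample autocorrelation at lag k, per the formula above."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    num = np.sum((x[k:] - xbar) * (x[:-k] - xbar))  # pairs (x_i, x_{i-k})
    den = np.sum((x - xbar) ** 2)
    return float(num / den)

data = [5.3, 9.8, 5.1, 0.6, 3.9, 8.1, 4.0, 0.1, 4.6, 0.6,
        2.9, 7.1, 2.7, 7.0, 2.5, 5.8, 3.0, 7.6, 3.5, 7.7]
print([round(autocorr(data, k), 3) for k in (1, 2, 3)])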
Example 10.1 Assume a series of observations is taken in sequence and the first three autocorrelations, say, are 0.89, 0.54 and 0.39 for lags of k = 1, 2, 3, respectively. Since they are not near zero (plus or minus), the data does not appear independent. On the other hand, if the first three autocorrelations were 0.07, 0.13 and 0.18, the data does appear independent.
A variety of statistical measures can be computed from the sample data, x1, . . ., xn. Some that are useful in selecting the probability distribution are listed here:

x(1) = minimum
x(n) = maximum
x̄ = average
s = standard deviation
cov = s/x̄ = coefficient of variation
τ = s²/x̄ = lexis ratio
Example 10.2 Suppose 20 samples are the following: [5.3, 9.8, 5.1, 0.6, 3.9, 8.1, 4.0, 0.1, 4.6, 0.6, 2.9, 7.1, 2.7, 7.0, 2.5, 5.8, 3.0, 7.6, 3.5, 7.7]. The statistical measures from this data are listed below.

x(1) = 0.1
x(20) = 9.8
x̄ = 4.595
s = 2.713
cov = 0.590
τ = 1.602
Location Parameter

When a distribution has a location (shift) parameter, γ, it can be estimated from the order statistics by

$$\hat{\gamma} = [x_{(1)} x_{(n)} - x_{(k)}^2]/[x_{(1)} + x_{(n)} - 2x_{(k)}]$$

where x(k) is an intermediate order statistic between the minimum and the maximum.
The typical probability distribution candidates for a continuous random variable are the following: continuous uniform, normal, exponential, lognormal, gamma, beta and Weibull. The more common discrete probability distributions are the discrete uniform, binomial, geometric, Pascal and Poisson.
Transforming Variables

When the location estimate a0 = x(1) is applied to shift the variable, the adjusted coefficient of variation becomes cov′ = [s/(x̄ − a0)]. The cov of this measure may be useful when selecting the distribution to apply.
Transform Data to (x′ ≥ 0)

A way to convert a variable to a range where x′ lies approximately at zero and larger is described here. Recall again the summary statistics of the variable x as listed earlier, and once more use the notation a0 = x(1) for the minimum. When x is converted to x′ by the relation x′ = (x − a0), the range of x′ becomes zero or larger. The corresponding sample average and standard deviation become x̄′ = (x̄ − a0) and s′ = s, respectively. Finally, the coefficient of variation is cov′ = s/(x̄ − a0).
Continuous Uniform

The random variable x from the continuous uniform distribution (0,1) has a range of zero to one. The mean is μ = 0.5 and the standard deviation is σ = 1/√12 = 0.289, and thereby the coefficient of variation becomes cov = σ/μ = 0.577.
Normal

When a variable x is normally distributed with mean μ and standard deviation σ, the notation is x ~ N(μ, σ²). Note the coefficient of variation for x is cov = σ/μ. When essentially all the x values are zero or larger, the coefficient of variation is 0.33 or smaller, i.e., cov ≤ 0.33.
Exponential

Recall the exponential distribution, where the variable x is zero or larger. The mean, μ, and standard deviation, σ, of this distribution have the same value, and thereby the coefficient of variation is cov = σ/μ = 1.00.
Lognormal
Gamma

The variable x from the (standard) gamma distribution is always zero or larger and has parameters (k, θ). Recall, the mean and variance of x are μ = k/θ and σ² = k/θ², respectively, and therefore the coefficient of variation is cov = 1/√k. When k > 1, cov is less than one; when k ≤ 1, cov is one or larger. Note, the mode is (k − 1)/θ when k ≥ 1, and is zero when k < 1.
Beta

The variable x from a beta distribution has many shapes that can skew right or left or be symmetric, and can look like the uniform or the normal, and may even have a bathtub-like shape. This distribution emulates most shapes, but is a bit difficult to apply. The parameters are (k1, k2), and the mean and variance are

$$\mu = \frac{k_1}{k_1 + k_2}$$

$$\sigma^2 = \frac{k_1 k_2}{(k_1 + k_2)^2 (k_1 + k_2 + 1)}$$
Weibull

The variable x from a Weibull distribution has three parameters, (k1, k2, γ), where γ is the location parameter and can be estimated from the relation given earlier. The values of x are greater than γ and the shape is skewed to the right after the mode is reached. With γ = 0, the mean and variance are

$$\mu = \frac{k_2}{k_1}\,\Gamma\!\left(\frac{1}{k_1}\right)$$

$$\sigma^2 = \frac{k_2^2}{k_1}\left[2\,\Gamma\!\left(\frac{2}{k_1}\right) - \frac{1}{k_1}\,\Gamma\!\left(\frac{1}{k_1}\right)^{2}\right]$$
Some Candidate Discrete Distributions

Discrete Uniform

The variable x with the discrete uniform distribution has parameters (a, b), where x takes the integers from a to b. The mean and variance of x are μ = (a + b)/2 and σ² = [(b − a + 1)² − 1]/12, respectively. When a = 0, the lexis ratio is τ = σ²/μ = [(b + 1)² − 1]/6b. Note, when b ≥ 4, τ ≥ 1.
Binomial

The parameters for the binomial distribution are n (number of trials) and p (probability of a success per trial). The random variable is x (number of successes in n trials). The mean of x is μ = np, and the variance is σ² = np(1 − p). Hence, the lexis ratio is τ = σ²/μ = (1 − p) < 1.
Geometric

The geometric distribution has one parameter, p (the probability of a success per trial), and the random variable x (the number of failures until the first success) takes the values x = 0, 1, 2, . . .. The mean is μ = (1 − p)/p and the variance is σ² = (1 − p)/p², so the lexis ratio is τ = σ²/μ = 1/p > 1.
Pascal

The parameters for the Pascal distribution are p (probability of a success) and k (number of successes). The random variable is x (number of failures until k successes), where x = 0, 1, 2, . . .; the mean is μ = k(1 − p)/p, and the variance is σ² = k(1 − p)/p². The lexis ratio is τ = σ²/μ = 1/p > 1.

But when x′ = (x + k) is the number of trials until k successes, x′ = k, k + 1, . . ., the mean is μ = k/p, the variance remains σ² = k(1 − p)/p², and the lexis ratio becomes τ = σ²/μ = (1 − p)/p. In this situation, the lexis ratio ranges above and below one.
Poisson

The parameter for the Poisson distribution is θ (rate per unit of measure), where the unit of measure is typically a unit of time (minute, hour), and so forth. The random variable is x (number of events in a unit of measure). Since the mean of x is μ = θ and the variance is σ² = θ, the lexis ratio becomes τ = σ²/μ = 1.
Estimating Parameters for Continuous Distributions

Below are the popular ways to estimate the parameters for the common continuous distributions. These are the maximum-likelihood estimators and/or the method-of-moments estimators.
Continuous Uniform

The parameters of the continuous uniform distribution are (a, b), where the variable x is equally likely to fall anywhere from a to b. When the data x1, . . ., xn is available, the maximum likelihood estimates of the parameters are as follows:

$$\hat{a} = \min(x_1, \ldots, x_n)$$

$$\hat{b} = \max(x_1, \ldots, x_n)$$
Example 10.4 Consider a situation where a sample of n = 20 yields the following sorted data: [0.1, 0.6, 0.6, 2.5, 2.7, 2.9, 3.0, 3.5, 3.9, 4.0, 4.6, 5.1, 5.3, 5.8, 7.0, 7.1, 7.6, 7.7, 8.1, 9.8]. Suppose the analyst suspects the data comes from a continuous uniform distribution and thereby needs estimates of the parameters, a and b. From the maximum likelihood estimator method, the estimates of the parameters are â = 0.1 and b̂ = 9.8.

Another way to estimate the parameters is by the method of moments, using â = x̄ − √3 s and b̂ = x̄ + √3 s. To find the estimates this way, the average and standard deviation of the data entries are needed, and they are x̄ = 4.595 and s = 2.713, respectively. Thereby the method-of-moments estimates become â = −0.10 and b̂ = 9.29.
Normal Distribution

The normal distribution has two parameters: μ, the mean, and σ², the variance. The estimates are obtained from the sample mean, x̄, and sample variance, s², as below:

$$\hat{\mu} = \bar{x}$$

$$\hat{\sigma}^2 = s^2$$

Example 10.5 Suppose the analyst has ten sorted sample data entries as [1.3, 6.4, 7.1, 8.7, 9.1, 10.2, 11.5, 14.3, 16.1, 18.0]. The sample average is x̄ = 10.27 and the standard deviation is s = 4.95. Hence, x is estimated as N(10.27, 4.95²).
Exponential

The exponential distribution has one parameter, θ, where the mean and standard deviation of x are equal, whereby μ = σ = 1/θ. The maximum-likelihood estimator of the parameter is based on the sample mean, x̄, as shown below:

$$\hat{\theta} = 1/\bar{x}$$

Example 10.6 Suppose the analyst has the following data with n = 10 observations: 3.0, 5.7, 10.8, 0.3, 1.5, 2.5, 4.5, 7.3, 1.3, 2.1, and assumes the data comes from an exponential distribution. The sample average is x̄ = 3.90, and thereby the estimate of the exponential parameter is θ̂ = 1/x̄ = 1/3.90 = 0.256. Upon further computation, the standard deviation of the ten observations is measured as s = 3.24, not too far from the average of 3.90, which is consistent with the exponential property μ = σ.
Lognormal

Consider the variable x of the lognormal distribution, and another related variable, y, that is the natural logarithm of x, i.e., y = ln(x). The parameters for x are the mean and variance of y, denoted as μy and σy², respectively. To estimate the parameters, the n corresponding values of y (y1, . . ., yn) are needed to give the sample average, ȳ, and the sample variance, sy². The estimates of the parameters for the lognormal distribution are the following:

$$\hat{\mu}_y = \bar{y}$$

$$\hat{\sigma}_y^2 = s_y^2$$

Example 10.7 Assume the analyst has collected ten sample entries as X = [0.3, 1.3, 1.5, 2.1, 2.5, 3.0, 4.5, 5.7, 7.3, 10.8]. Upon taking the natural logarithm of each, y = ln(x), the sample now has ten values of y. The corresponding values are Y = [−1.204, 0.262, 0.405, 0.742, 0.916, 1.099, 1.504, 1.741, 1.988, 2.379]. The mean and variance of the n = 10 observations of y are ȳ = 0.983 and sy² = 1.057.
Gamma

The variable x from the gamma distribution has two parameters (k, θ). The mean of x is μ = k/θ, and the variance is σ² = k/θ². One way to estimate the parameters is by the method of moments, using the sample average, x̄, and the sample variance, s², computed from the data x1, . . ., xn. The estimates of the gamma parameters are derived from

$$\hat{\theta} = \bar{x}/s^2$$

$$\hat{k} = \bar{x}\,\hat{\theta}$$
Example 10.8 Assume a sample of n entries, x1, . . ., xn, is collected, from which the average and variance are measured as x̄ = 10.8 and s² = 4.3, respectively. The analyst wants to estimate the gamma parameters for this data. Using the method of moments, the estimates are θ̂ = 2.51 and k̂ = 27.12.
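A tiny sketch of the method-of-moments gamma estimates, in plain Python:

def gamma_mom(xbar: float, s2: float):
    """Method-of-moments gamma estimates from a sample mean and variance."""
    theta = xbar / s2          # rate parameter estimate
    k = xbar * theta           # shape parameter estimate
    return k, theta

k_hat, theta_hat = gamma_mom(10.8, 4.3)
print(f"k={k_hat:.2f}, theta={theta_hat:.2f}")   # roughly k = 27.1, theta = 2.51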
Beta

The variable x from the beta distribution on (0, 1) has two parameters (k1, k2). The mean of x is

$$\mu = \frac{k_1}{k_1 + k_2}$$

and when both parameters are greater than one, (k1 > 1, k2 > 1), the mode is

$$\tilde{x} = \frac{(k_1 - 1)}{(k_1 + k_2 - 2)}$$

In the typical situation, the distribution skews to the right; this occurs when k2 > k1 > 1. For this situation, a way to estimate the parameters is with use of the sample average, x̄, and the sample mode, x̃. From the two equations in two unknowns, and some algebra, the estimates of the parameters are computed as below:

$$\hat{k}_1 = \bar{x}(2\tilde{x} - 1)/(\tilde{x} - \bar{x})$$

$$\hat{k}_2 = (1 - \bar{x})\hat{k}_1/\bar{x}$$

Example 10.9 Assume sample data of x that lies between 0 and 1 yields the average and mode as x̄ = 0.4 and x̃ = 0.2, respectively. The analyst wants to estimate the parameters for a beta distribution on the range (0, 1). The estimates are below:

$$\hat{k}_1 = 0.4[2(0.2) - 1]/[0.2 - 0.4] = 1.2$$

$$\hat{k}_2 = (1 - 0.4)(1.2)/0.4 = 1.8$$
Estimating Parameters for Discrete Distributions

Below are the popular ways to estimate the parameters for the common discrete distributions. These are the maximum-likelihood estimators and/or the method-of-moments estimators.
Discrete Uniform

The variable x from the discrete uniform distribution has two parameters (a, b), where the variable x is equally likely to fall on any integer from a to b. The sample data (x1, . . ., xn) is used to find the minimum, x(1), and maximum, x(n). The maximum likelihood estimators of the parameters, a and b, are obtained as below:

$$\hat{a} = x_{(1)}$$

$$\hat{b} = x_{(n)}$$
Example 10.10 Suppose an analyst collects ten discrete sample data [7, 5, 4, 8, 5, 4, 12, 9, 2, 8] and wants to estimate the parameters of a discrete uniform distribution. Using the maximum likelihood estimators, the minimum and maximum estimates are:

$$\hat{a} = 2$$

$$\hat{b} = 12$$
Binomial

The variable x from the binomial distribution has parameters (n, p), where typically n is known and p is not. The expected value of x is E(x) = np, and thereby, when a sample of n trials yields x successes, the maximum likelihood estimate of p is

$$\hat{p} = x/n.$$

In the event the n-trial experiment is run m times, with results (x1, . . ., xm) and average x̄, the estimate of p becomes

$$\hat{p} = \bar{x}/n.$$

Example 10.11 Suppose the n = 8 trial experiment is run m = 5 times and the numbers of successes observed are [0, 1, 2, 2, 3]. The average is x̄ = 1.6, and the estimate of the success probability becomes p̂ = 1.6/8 = 0.20.
Geometric

Consider the geometric distribution, where the variable x (0, 1, 2, . . .) is the number of failures before the first success and p is the probability of a success per trial. The expected value of x is E(x) = (1 − p)/p. When m samples of x are taken, with results (x1, . . ., xm) and a sample average x̄, the maximum-likelihood estimator of p becomes

$$\hat{p} = 1/(\bar{x} + 1).$$

Example 10.12 Suppose m = 8 samples of geometric data are observed and yield the following values of x: [3, 6, 2, 5, 4, 4, 1, 5], where x is the number of failures until the first success. The analyst wants to estimate the probability of a success, p, and since the average of x is x̄ = 3.75, the estimate becomes p̂ = 1/(3.75 + 1) = 0.211.
Pascal

Recall the Pascal distribution, where the variable x is the number of failures until k successes. The parameters are (k, p), where k is known, and assume p is not known. When m samples of x are taken, with results (x1, . . ., xm), and a sample average x̄ is computed, the maximum-likelihood estimator of p becomes the following:

$$\hat{p} = k/(\bar{x} + k).$$

Example 10.13 Suppose m = 5 samples from the Pascal distribution with parameter k = 4 are observed and yield the following data entries of x: [6, 4, 7, 5, 6], where x is the number of failures until k successes. The analyst wants to estimate the probability of a success, and since the average of x is x̄ = 5.60, the estimate is p̂ = 4/(5.60 + 4) = 0.417.
Poisson

The variable x (0, 1, 2, . . .) from the Poisson distribution has parameter θ. The expected value of x is E(x) = θ. When m samples of x, (x1, . . ., xm), are collected, the sample average x̄ is readily computed. Using the sample mean, the maximum likelihood estimator of θ becomes

$$\hat{\theta} = \bar{x}.$$

Example 10.14 Suppose m = 10 samples of Poisson data are observed and yield the following values of x: [0, 0, 1, 2, 2, 0, 1, 2, 0, 1]. The analyst wants to estimate the Poisson parameter, θ, and since the average of x is x̄ = 0.90, the estimate is θ̂ = 0.90.
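A compact sketch collecting the maximum-likelihood estimators above, in plain Python; the sample averages are those of Examples 10.12–10.14:

def binomial_p(xbar: float, n: int) -> float:
    """MLE of p from the average number of successes in n trials."""
    return xbar / n

def geometric_p(xbar: float) -> float:
    """MLE of p from the average number of failures before a success."""
    return 1 / (xbar + 1)

def pascal_p(xbar: float, k: int) -> float:
    """MLE of p from the average number of failures until k successes."""
    return k / (xbar + k)

def poisson_theta(xbar: float) -> float:
    """MLE of theta is simply the sample average."""
    return xbar

print(geometric_p(3.75), pascal_p(5.6, 4), poisson_theta(0.9))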
Q-Q Plot

The Q-Q plot is a graphical way to compare the quantiles of sample (empirical) data to the quantiles from a specified probability distribution as a way of observing the goodness-of-fit. This plot applies to continuous probability distributions. See Wilk and Gnanadesikan (1968) for a fuller description of the Q-Q (quantile-quantile) plot.

To carry out the plot, the empirical or sample data [x1, . . ., xn] are first arranged in sorted order [x(1), . . ., x(n)], where x(1) is the smallest value and x(n) the largest. The quantiles for the sample data are merely [x(1), . . ., x(n)]. The empirical cumulative distribution function (cdf) of the sample quantiles is computed and denoted as

$$F[x_{(i)}] = w_i = (i - 0.5)/n \quad i = 1 \text{ to } n$$

For each quantile from the sample, a corresponding quantile is computed from the fitted probability distribution by the inverse cdf,

$$x_i' = F^{-1}(w_i) \quad i = 1 \text{ to } n$$

For convenience, the pairs of quantiles are labeled as Xs = [x(1), . . ., x(n)] for the sample data, and Xf = [x1′, . . ., xn′] for the fit from the probability model.

The n pairs of quantiles are now placed on a scatter plot with the sample quantiles, Xs, on the x-axis and the probability model quantiles, Xf, on the y-axis. In the event the probability model is a good fit to the sample data, the scatter plot will look like a straight line going through a 45° angle from the lower left-hand side to the upper right-hand side, and the scales of the x and y axes will be similar. In the literature, it is noted that some references place Xs on the y-axis and Xf on the x-axis.
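A minimal sketch of building the Q-Q quantile pairs against a fitted normal distribution, assuming numpy and scipy; the inverse cdf is supplied by scipy's ppf:

import numpy as np
from scipy import stats

def qq_pairs_normal(sample):
    """Return (Xs, Xf): sorted sample quantiles and fitted normal quantiles."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    w = (np.arange(1, n + 1) - 0.5) / n          # empirical cdf values
    xf = stats.norm.ppf(w, loc=xs.mean(), scale=xs.std(ddof=1))
    return xs, xf

xs, xf = qq_pairs_normal([1.3, 6.4, 7.1, 8.7, 9.1, 10.2, 11.5, 14.3, 16.1, 18.0])
print(np.round(xf, 2))   # close to the fitted quantiles of Example 10.18 below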
Example 10.15 Suppose n = 5 sample (or empirical) data of a variable are observed as: [8.3, 2.5, 1.3, 9.4, 5.0]. The data are sorted and the sample quantiles are Xs = [1.3, 2.5, 5.0, 8.3, 9.4]. The set of empirical probabilities obtained from the n samples, wi = (i − 0.5)/5, are listed in vector form as Ps = [0.1, 0.3, 0.5, 0.7, 0.9].

Suppose the candidate model is the continuous uniform distribution over (0, 10), whose cdf is F(x) = x/10. For each wi in Ps, the fitted quantile is obtained by

$$x_i' = w_i/0.1 \quad i = 1 \text{ to } 5.$$

Thereby, the five probability fit quantiles are Xf = [1.0, 3.0, 5.0, 7.0, 9.0]. The Q-Q plot for the pairs of quantiles (Xs, Xf) is shown in Fig. 10.1. Since the scatter appears much like a straight line with a 45° fit, the conclusion is that the sample data is a reasonably close fit to the continuous uniform distribution under consideration.
Example 10.16 Consider once more the sample data from Example 10.15, where n = 5 and the sorted data yield the quantile set Xs = [1.3, 2.5, 5.0, 8.3, 9.4] and the associated empirical probabilities Ps = [0.1, 0.3, 0.5, 0.7, 0.9].

Now suppose the sample data are to be compared to a continuous distribution, f(x) = x/50 for 0 ≤ x ≤ 10. Note, the cumulative distribution of x becomes F(x) = x²/100 = w. So for probability w, the quantile is computed by x = F⁻¹(w) = √(100w) = 10√w. Hence, for each wi in the sample set Ps, an associated fitted quantile is obtained by

$$x_i' = 10\sqrt{w_i} \quad i = 1 \text{ to } 5.$$

Thereby, the five quantiles for the probability fit become Xf = [3.2, 5.5, 7.1, 8.4, 9.5]. The Q-Q plot for the pairs of quantiles (Xs, Xf) is shown in Fig. 10.2. Since the scatter plot is not close to the 45° line, the conclusion is that the sample data is not a good fit to the probability distribution under consideration.
Example 10.17 (Continuous Uniform Q-Q Plot) Consider the sample data from Example 10.4, and suppose the analyst wants to run a Q-Q plot assuming the probability distribution is a continuous uniform distribution. Recall from the earlier example, the MLE estimates of the parameters are â = 0.1 and b̂ = 9.8. Hence, the probability density is estimated as f(x) = 1/9.7 for 0.1 ≤ x ≤ 9.8, and the cumulative distribution is F(x) = (x − 0.1)/9.7. So, when the cumulative probability is w = F(x), the associated quantile becomes x′ = F⁻¹(w) = 0.1 + 9.7w. At i = 1, the minimum rank, w1 = (1 − 0.5)/20 = 0.025 and x1′ = 0.1 + 0.025(9.7) = 0.34. At i = 20, the maximum rank, w20 = (20 − 0.5)/20 = 0.975 and x20′ = 0.1 + 0.975(9.7) = 9.56, and so forth. The full set of quantiles for the probability fit is denoted as Xf and, for simplicity, is listed here with one-decimal accuracy: Xf = [0.3, 0.8, 1.3, 1.8, 2.3, 2.8, 3.3, 3.7, 4.2, 4.7, 5.2, 5.7, 6.2, 6.7, 7.1, 7.6, 8.1, 8.6, 9.1, 9.6].
Using the pairs (Xs, Xf), the Q-Q plot is in Fig. 10.3. The vector Xs contains the sorted sample data from Example 10.4.

[Figure 10.3: Q-Q plot of the sample quantiles Xs (x-axis) against the fitted quantiles Xf (y-axis).]

Note, the scatter plot closely fits a 45° angle from the lower left corner to the upper right corner, and thereby the sample data appears as a good fit to the continuous uniform distribution with parameters â = 0.1 and b̂ = 9.8.
Example 10.18 (Normal Q-Q Plot) Recall Example 10.5, where an analyst has collected a sample of n = 10 observations: [7.1, 9.1, 11.5, 16.1, 18.0, 14.3, 10.2, 8.7, 6.4, 1.3], and where the sample average and standard deviation of the data are x̄ = 10.27 and s = 4.95, respectively. The sorted values give the sample quantile set Xs = [1.3, 6.4, 7.1, 8.7, 9.1, 10.2, 11.5, 14.3, 16.1, 18.0]. With n = 10, the ith sorted cumulative probability for the sample quantiles is wi = (i − 0.5)/10 for (i = 1 to 10). The set Ps of cumulative probabilities is Ps = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]. Note the cumulative probability for i = 1 is w1 = 0.05, for i = 2 it is w2 = 0.15, and so forth.

The analyst wishes to explore how the data fits the normal distribution. To do this, the z variable from the standard normal distribution is needed for each w entry in the sample probability set Ps. This gives another set, denoted here as Z = [−1.645, −1.036, −0.674, −0.385, −0.125, 0.125, 0.385, 0.674, 1.036, 1.645]. See Table A.1 in the Appendix. Note at i = 1, w1 = 0.05 and z1 = −1.645, whereby P(z < −1.645) = 0.05. In the same way, all the z values are obtained from the standard normal distribution. Now using the average, x̄, standard deviation, s, and z values, it is possible to compute the n = 10 fitted quantiles for the normal distribution by the formula

$$x_i' = \bar{x} + z_i s \quad i = 1 \text{ to } 10$$
[Two Q-Q plot figures: sample quantiles Xs on the x-axis against fitted quantiles Xf on the y-axis.]
Applying the above formula yields the quantiles for the probability model, Xf = [2.12, 5.14, 6.93, 8.36, 9.65, 10.89, 12.18, 13.61, 15.40, 18.42]. The Q-Q plot comparing the quantiles from the sample, Xs, with the quantiles from the probability fit, Xf, is shown in Fig. 10.4. Since the plot closely follows the 45° line from the lower left-hand side to the upper right-hand side, the sample data seems like a good fit with the normal distribution.
[Figure 10.4: Q-Q plot of the sample quantiles Xs (x-axis) against the fitted normal quantiles Xf (y-axis).]

For the exponential distribution (Example 10.19), the fitted quantile associated with cumulative probability w is obtained from the inverse cdf,

$$x = -(1/\theta)\ln(1 - w)$$

[Figure 10.5: Q-Q plot of the sample quantiles Xs (x-axis) against the fitted exponential quantiles Xf (y-axis).]
Example 10.20 (Lognormal Q-Q Plot) Suppose the analyst wants to run a Q-Q plot comparing the lognormal distribution on the same data as Example 10.7. Recall, the ten observations are Xs = [0.3, 1.3, 1.5, 2.1, 2.5, 3.0, 4.5, 5.7, 7.3, 10.8], and the cumulative probabilities are Ps = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]. This gives the set Z = [−1.645, −1.036, −0.674, −0.385, −0.125, 0.125, 0.385, 0.674, 1.036, 1.645]. Recall at i = 1, w1 = 0.05 and z1 = −1.645, whereby P(z < −1.645) = 0.05, and so forth; all the z values are obtained from the standard normal distribution. To test for the lognormal, the natural logarithm of each sample is taken as yi = ln(xi) for i = 1 to 10. The ten quantiles of the transformed data are denoted as Ys = [−1.204, 0.262, 0.405, 0.742, 0.916, 1.099, 1.504, 1.741, 1.988, 2.379]. The average and standard deviation of the ten values of y are ȳ = 0.9833 and s = 1.0283, respectively. For each zi, the corresponding fitted entry is obtained from the relation below,

$$y_i' = \bar{y} + z_i s \quad \text{for } i = 1 \text{ to } 10.$$

The ten fitted values of the normal distribution are now compared to their counterparts yi (i = 1 to 10), and are listed as Yf = [−0.706, −0.080, 0.292, 0.589, 0.857, 1.113, 1.381, 1.678, 2.051, 2.677]. The ten paired data of Ys and Yf form the Q-Q plot in Fig. 10.6. Since the plotted data lie below the 45° line from the lower left-hand corner to the upper right-hand corner, the lognormal distribution does not appear as a good fit for the data.
[Figure 10.6: Q-Q plot of the transformed sample quantiles Ys (x-axis) against the fitted quantiles Yf (y-axis).]
P-P Plot

The P-P plot is a graphical way to compare the cumulative distribution function (cdf) of sample (or empirical) data to the cdf of a specified probability distribution as a way of detecting the goodness-of-fit. The plot applies for both continuous and discrete probability distributions. See Wilk and Gnanadesikan (1968) for a fuller description of the P-P (probability-probability) plot. To carry out the plot, the sample data (x1, . . ., xn) are first arranged in sorted order [x(1), . . ., x(n)], where x(1) is the smallest value and x(n) the largest. The cdf values for the sample data, F[x(i)], i = 1 to n, are denoted here as (w1, . . ., wn), where

$$w_i = F[x_{(i)}] = (i - 0.5)/n \quad i = 1 \text{ to } n$$

For each cdf value from the sample, a corresponding cdf value is computed from the fitted probability distribution,

$$w_i' = F(x_{(i)}) \quad i = 1 \text{ to } n$$

For convenience, the pairs of cdf's are labeled as Fs = [w1, . . ., wn] for the sample data, and Ff = [w1′, . . ., wn′] for the probability model.

The n pairs of cdf's are now placed on a scatter plot with the sample cdf's, Fs, on the x-axis and the probability model cdf's, Ff, on the y-axis. In the event the probability model is a good fit to the sample data, the scatter plot will look like a straight line going through a 45° angle from the lower left-hand side to the upper right-hand side, and the scales of the x-axis and y-axis are similar. Note, in some references, Fs is placed on the y-axis and Ff on the x-axis.
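A minimal sketch of building the P-P pairs (Fs, Ff), assuming numpy; the fitted cdf is passed in as a function, here the discrete uniform of Example 10.21 below:

import numpy as np

def pp_pairs(sample, fitted_cdf):
    """Return (Fs, Ff): empirical cdf values and fitted model cdf values."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    fs = (np.arange(1, n + 1) - 0.5) / n        # sample cdf values
    ff = np.array([fitted_cdf(x) for x in xs])  # model cdf values
    return fs, ff

a, b = 2, 12
fs, ff = pp_pairs([7, 5, 4, 8, 5, 4, 12, 9, 2, 8],
                  lambda x: (x - a + 1) / (b - a + 1))
print(np.round(fs, 2), np.round(ff, 2))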
Example 10.21 (Discrete Uniform) Recall Example 10.10, where n = 10 samples were taken from data assumed discrete uniform, and where the maximum likelihood estimates of the parameters are â = 2 and b̂ = 12. The sorted data becomes [2, 4, 4, 5, 5, 7, 8, 8, 9, 10]. Since n = 10, the sample cdf values for this data are listed here as Fs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]. Recall, for a discrete uniform distribution with parameters (a, b), the cdf is computed by F[x] = (x − a + 1)/(b − a + 1). Hence, the cdf values for the fitted probability model become Ff = [0.09, 0.27, 0.27, 0.36, 0.36, 0.55, 0.64, 0.64, 0.73, 0.81]. The P-P plot is shown in Fig. 10.7. Since the plotted points are similar to a 45° line, the sample data appears reasonably close to a discrete uniform distribution.

[Figure 10.7: P-P plot of the sample cdf values Fs (x-axis) against the fitted cdf values Ff (y-axis).]
Example 10.22 (Binomial) Consider Example 10.11, where m = 5 samples of binomial data for the number of successes in n = 8 trials yield an estimate of p̂ = 0.20. The sorted data is Xs = [0, 1, 2, 2, 3], and the cdf of the sample data becomes Fs = [0.1, 0.3, 0.5, 0.7, 0.9]. From the binomial distribution, the probabilities of x successes in n = 8 trials with p = 0.2 are computed as: p(0) = 0.168, p(1) = 0.336, p(2) = 0.293, p(3) = 0.146, . . .. Hence the associated cdf values for the fitted probability model are F(0) = 0.168, F(1) = 0.504, F(2) = 0.797, F(3) = 0.943, . . ., and thereby Ff = [0.168, 0.504, 0.797, 0.797, 0.943]. Figure 10.8 is the P-P plot for this data. The plot somewhat follows a 45° line and, as such, the binomial probability distribution appears a fair fit to the data.

[Figure 10.8: P-P plot of the sample cdf values Fs (x-axis) against the fitted cdf values Ff (y-axis).]
Example 10.23 (Geometric) Recall Example 10.12, where m = 8 samples are taken from a geometric distribution, where x is the number of failures until a success. The estimate of the success probability for the example is p̂ = 0.211. The sorted data are Xs = [1, 2, 3, 4, 4, 5, 5, 6], and the corresponding sample cdf values are Fs = [0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375]. Using p = 0.211 and the cumulative distribution function

$$F(x) = 1 - (1 - p)^{x+1} \quad x = 0, 1, \ldots$$

the fitted cdf values become Ff = [0.377, 0.508, 0.612, 0.694, 0.694, 0.758, 0.758, 0.809]. Figure 10.9 is the P-P plot for this example. Since the plot does not follow a 45° line, the geometric distribution does not appear a good fit to the data.

[Figure 10.9: P-P plot of the sample cdf values Fs (x-axis) against the fitted cdf values Ff (y-axis).]
Example 10.24 (Poisson) Recall Example 10.14, where m = 10 samples of Poisson data [0, 0, 1, 2, 2, 0, 1, 2, 0, 1] yield the estimate θ̂ = 0.90. In this example, there are (many) ties in the sample data, and special consideration is applied in the P-P plot. With ties in the sample (or empirical) data, the P-P plot considers only one entry for each unique value of the data, taken at its largest rank. In this way, Xs = [0, 1, 2], Fs = [0.35, 0.65, 0.95], and Ff = [0.407, 0.773, 0.938]. Figure 10.10 is the P-P plot for the cdf's of the sample and the fit. Since Fs is similar to Ff, the fit seems appropriate.

[Figure 10.10: P-P plot of the sample cdf values Fs (x-axis) against the fitted cdf values Ff (y-axis).]
Summary

Computer simulation models often include one or more random variables that play important roles in the model. Some of the random variables are of the continuous type and others are discrete. The analyst is confronted with choosing the proper probability distribution for each variable, and also with estimating the associated parameter value(s). This chapter describes some of the common ways to select the distribution and to estimate the associated parameter values when empirical or sample data is available from the real system.
Chapter 11
Choosing the Probability Distribution
When No Data
Introduction
Sometimes the analyst has no data to estimate the parameters of one or more of the input variables in a simulation model. When this occurs, the analyst is limited to a few distributions where the parameter(s) may be estimated without empirical or sample data. Instead of data, experts are consulted who give their judgment on various parameters of the distributions. This chapter explores some of the more common distributions where such expert opinions are useful. The distributions described here are continuous and are the following: continuous uniform, triangular, beta, lognormal and Weibull. The data provided by the experts is of the following type: minimum value, maximum value, most likely value, average value, and a p-quantile value.
Continuous Uniform

Assume the expert(s) can give the following two estimates for the distribution: â = an estimate of the minimum value of x, and xα = an estimate of the α-quantile of x, where P[x ≤ xα] = α. Since the cumulative distribution is F(x) = (x − a)/(b − a), setting F(xα) = α and solving gives the estimate of the maximum parameter b as

$$\hat{b} = \hat{a} + (x_\alpha - \hat{a})/\alpha$$

Example 11.1 Suppose the analyst wants to use the continuous uniform distribution in the simulation model and has estimates of the minimum value of x as â = 10 and the 0.9-quantile as x0.9 = 15. With this information, the estimate of b becomes

$$\hat{b} = 10 + \frac{15 - 10}{0.90} = 15.56.$$
Now assume the expert(s) can give the following two estimates on the
distribution:
b^ ¼ an estimate of the maximum value of x, and xa ¼ an estimate of the of a-
quantile of x where P½x xa ¼ a. An estimate of the parameter a is needed to use
the distribution in the simulation model. Using the estimates provided, the cumula-
tive distribution becomes,
and thereby, using b^ and xa , and some algebra, the estimate of the minimum
parameter a is,
^
a^ ¼ ðxa abÞ=ð1 aÞ
Example 11.2 Assume a situation where the simulation analyst wants to use the
continuous uniform distribution and has estimates of b^ ¼ 16 and the 0.1-quantile
x0.1 ¼ 11. The estimate of a becomes,
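Both estimators are one-line computations. A minimal Python sketch follows, reproducing Examples 11.1 and 11.2; the function names are illustrative rather than from the text.

```python
def uniform_b_from_min_and_quantile(a_hat, x_alpha, alpha):
    """Given the minimum a and the alpha-quantile, solve F(x_alpha) = alpha for b."""
    return a_hat + (x_alpha - a_hat) / alpha

def uniform_a_from_max_and_quantile(b_hat, x_alpha, alpha):
    """Given the maximum b and the alpha-quantile, solve F(x_alpha) = alpha for a."""
    return (x_alpha - alpha * b_hat) / (1.0 - alpha)

print(uniform_b_from_min_and_quantile(10, 15, 0.90))   # Example 11.1: 15.56
print(uniform_a_from_max_and_quantile(16, 11, 0.10))   # Example 11.2: 10.44
```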
The following sections show how to apply the triangular, beta, Weibull and
lognormal distributions in the simulation model when no data is available. See Law,
pages 370–375 (2007) for further discussion.
Triangular

Recall, the triangular distribution applies for a continuous variable x with three parameters (a, b, x̃), where the range of x is from a to b, and the mode is denoted as x̃. When the analyst wants to use this distribution in a simulation model and has no empirical or sample data to estimate the three parameters, he/she may turn to one or more experts to gain estimates of the following type: â = an estimate of the minimum value of x, b̂ = an estimate of the maximum value of x, and x̃ = an estimate of the most likely value of x.

So now, the triangular distribution can be used with parameters â, b̂, x̃.

The associated standard triangular distribution, T(0, 1, x̃′), with variable x′, falls in the range from 0 to 1. The most likely value of x′ is the mode, denoted as x̃′. The mode of the standard triangular variable is computed from the corresponding triangular parameters by

x̃′ = (x̃ − â)/(b̂ − â)

Example 11.3 Suppose the analyst wants to use the continuous triangular distribution in the simulation model and from expert opinions has estimates of â = 10, b̂ = 60 and x̃ = 20. To apply the standard triangular, with variable x′, the estimate of the mode becomes:

x̃′ = (20 − 10)/(60 − 10) = 0.20

So the triangular distribution is T(10, 60, 20) and the associated standard triangular distribution is T(0, 1, 0.20).
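A variate from the standard triangular can be generated by the inverse transform method; the sketch below is a minimal Python rendering under that assumption, with illustrative helper names, and scales the result back to T(a, b, x̃).

```python
import random

def standard_triangular_variate(c, u=None):
    """Inverse-transform variate from T(0, 1, c), where c is the mode in (0, 1)."""
    if u is None:
        u = random.random()
    if u <= c:
        return (u * c) ** 0.5                    # F(x) = x^2 / c on [0, c]
    return 1.0 - ((1.0 - u) * (1.0 - c)) ** 0.5  # F(x) = 1 - (1-x)^2/(1-c) on [c, 1]

def triangular_variate(a, b, mode):
    """Scale a standard triangular variate to the range [a, b] with the given mode."""
    c = (mode - a) / (b - a)
    return a + (b - a) * standard_triangular_variate(c)

# Example 11.3 values: a = 10, b = 60, mode = 20
print(triangular_variate(10, 60, 20))
```

Python's built-in random.triangular(low, high, mode) produces the same variate directly.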
Beta
Recall the beta distributionhas two parameters (k1,k2) where k1> 0 and k2> 0,
and takes on many shapes depending on the values of the parameters. The
variable denoted as x, lies within two limits(a and b) where (a x b). The
distribution takes on many shapes, where it can skew to the right, skew to
the left, where the mode is at either of the limit end points (a, b), includes
various bathtub configurations, and also has symmetrical and uniform shapes.
These shapes depend on the values of the two parameters, k1, k2. Perhaps the
most common situations occur when k2>k1> 1 whereby the mode is greater than
the low limit, a, and the distribution is skewed to the right. This is the distribu-
tion of interest in this chapter.
When the analyst wants to use this distribution in a simulation model and has no empirical or sample data to estimate the four parameters, he/she may turn to one or more experts who could provide estimates of the following type: â = an estimate of the minimum value of x, b̂ = an estimate of the maximum value of x, m̂ = an estimate of the average value of x, and x̃ = an estimate of the most likely value of x.

Recall, for the beta distribution, the mean and mode of x are computed from the parameters as follows:

m = a + k1(b − a)/(k1 + k2)
x̃ = a + (k1 − 1)(b − a)/(k1 + k2 − 2)

Note there are two equations and two unknowns (k1, k2) when estimates of (â, b̂, m̂, x̃) are given. Using algebra, and the estimates provided, it is possible to estimate the unknown shape parameters k1, k2 with the two equations listed below:

k̂1 = (m̂ − â)(2x̃ − â − b̂)/[(x̃ − m̂)(b̂ − â)]
k̂2 = (b̂ − m̂)k̂1/(m̂ − â)

So now the analyst can use the beta distribution with estimates of all four parameters (k̂1, k̂2, â, b̂) in the simulation model.
Example 11.4 Assume a simulation model where the analyst wants to use the beta distribution but has no empirical or sample data to estimate the parameters. The analyst gets advice from an expert(s) that provides estimates of â = 10, b̂ = 60, m̂ = 30 and x̃ = 20. Using the above equations, the estimates of the parameters become

k̂1 = (30 − 10)(2 × 20 − 10 − 60)/[(20 − 30)(60 − 10)] = 1.2
k̂2 = (60 − 30)(1.2)/(30 − 10) = 1.8

Hence, the beta distribution can now be applied with the parameters â = 10, b̂ = 60, k̂1 = 1.2, and k̂2 = 1.8.
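The two shape-parameter equations are easy to program; a minimal Python sketch follows, reproducing Example 11.4. The function name is illustrative, and the sampling line shows just one possible way to obtain a variate, by scaling a standard beta variate to [a, b].

```python
import random

def beta_shapes_from_expert_estimates(a, b, mean, mode):
    """Solve the mean and mode equations of the beta on [a, b] for (k1, k2)."""
    k1 = (mean - a) * (2.0 * mode - a - b) / ((mode - mean) * (b - a))
    k2 = (b - mean) * k1 / (mean - a)
    return k1, k2

k1, k2 = beta_shapes_from_expert_estimates(10, 60, 30, 20)
print(k1, k2)   # Example 11.4: 1.2 and 1.8

# One way to sample: scale a standard beta variate to the range [a, b]
x = 10 + (60 - 10) * random.betavariate(k1, k2)
print(x)
```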
Lognormal

Recall the lognormal distribution where the variable x has a location parameter γ (x > γ), and where y = ln(x − γ) is normally distributed with mean μ and standard deviation σ. Suppose the expert(s) provide estimates of the location parameter γ̂, the most likely value x̃, and the α-quantile xα. The mode of x′ = x − γ is x̃′ = e^(μ − σ²), whereby the mode and the α-quantile of x become

x̃ = γ + e^(μ − σ²)
xα = γ + e^(μ + zα σ)

where zα is the α-quantile of the standard normal distribution. Taking logarithms,

μ − σ² = ln(x̃ − γ)
μ + zα σ = ln(xα − γ)

and subtracting the second equation from the first eliminates μ:

−[σ² + zα σ] = ln[(x̃ − γ)/(xα − γ)] = c

Hence σ̂ is the positive root of σ² + zα σ + c = 0, that is,

σ̂ = [−zα + √(zα² − 4c)]/2

and thereby

μ̂ = σ̂² + ln(x̃ − γ̂)
Finally, the analyst can now apply the lognormal distribution to the variable x using the parameters:

γ̂ = location parameter of x
μ̂ = mean of y
σ̂ = standard deviation of y

In essence, x ~ LN(γ̂, μ̂, σ̂²).
Example 11.5 A simulation model is being developed and the analyst wants to use the lognormal distribution but has no empirical or sample data to estimate the parameters. The analyst gets advice from an expert(s) who provides estimates of γ̂ = 100, x̃ = 200 and x0.9 = 800. Note α = 0.90 and z0.90 = 1.282. Using the above results, the estimates of the parameters become

c = ln[(200 − 100)/(800 − 100)] = −1.946
σ̂ = [−1.282 + √(1.282² − 4(−1.946))]/2 = 0.894
μ̂ = 0.894² + ln(200 − 100) = 5.404

Hence, the lognormal distribution can now be applied in the simulation model with the parameters γ̂ = 100, μ̂ = 5.404 and σ̂ = 0.894.
A quick check to ensure the estimates are correct is to measure the mode and/or the α-quantile that were provided at the outset. The computations are below.

x̃ = γ + e^(μ̂ − σ̂²) = 100 + e^(5.404 − 0.894²) = 200
x0.90 = γ + e^(μ̂ + z0.90 σ̂) = 100 + e^(5.404 + 1.282 × 0.894) ≈ 800

Since the measures are the same as the specifications provided (x̃ = 200 and x0.90 = 800), the computations are accepted.
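Because the relations reduce to a quadratic in σ, no iterative search is needed. A minimal Python sketch under the relations above follows; the function name is illustrative.

```python
from math import exp, log, sqrt

def lognormal_params_from_mode_and_quantile(gamma, mode, x_alpha, z_alpha):
    """Solve mu - sigma^2 = ln(mode - gamma) and mu + z*sigma = ln(x_alpha - gamma)."""
    c = log((mode - gamma) / (x_alpha - gamma))
    sigma = (-z_alpha + sqrt(z_alpha ** 2 - 4.0 * c)) / 2.0   # positive root
    mu = sigma ** 2 + log(mode - gamma)
    return mu, sigma

mu, sigma = lognormal_params_from_mode_and_quantile(100, 200, 800, 1.282)
print(mu, sigma)                        # Example 11.5: about 5.404 and 0.894
print(100 + exp(mu - sigma ** 2))       # recovers the mode, 200
print(100 + exp(mu + 1.282 * sigma))    # recovers the 0.9-quantile, about 800
```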
Weibull

Recall the Weibull distribution where the variable x has a location parameter γ (x ≥ γ), a shape parameter k1 > 0 and a scale parameter k2 > 0, and the cumulative distribution function is

F(x) = 1 − e^(−[(x − γ)/k2]^k1)    x ≥ γ

Suppose the expert(s) provide estimates of the location parameter γ̂, the most likely value x̃, and the α-quantile xα. When k1 < 1, the mode of x′ = x − γ is at x′ = 0. The analysis here is when k1 ≥ 1 and the mode of x′ is greater than zero. For this situation, the mode is measured as below:

x̃ = γ + k2[(k1 − 1)/k1]^(1/k1)

Hence, setting F(xα) = α and solving for the α-quantile,

xα = γ + k2{ln[1/(1 − α)]}^(1/k1)

So now, taking the ratio of (x̃ − γ) to (xα − γ), the scale parameter k2 cancels, whereby

{[(k1 − 1)/k1]/ln[1/(1 − α)]}^(1/k1) = (x̃ − γ)/(xα − γ)

Solving for k1

Because estimates of x̃, γ and xα are provided, along with α, the only unknown in the above equation is k1. At this point, an iterative search is made to find the value of k1 where the left-hand side of the above equation is equal to the right-hand side. The result is k̂1.

Solving for k2

Having found k̂1, the other parameter, k2, is now obtained from

k̂2 = (x̃ − γ)/[(k̂1 − 1)/k̂1]^(1/k̂1)
Example 11.6 A simulation model is being developed and the analyst wants to use the Weibull distribution but has no empirical or sample data to estimate the parameters. The analyst gets advice from an expert(s) that provides estimates of γ̂ = 100, x̃ = 130 and x0.9 = 500. Note α = 0.90. To find the estimate of k1, the following computation is needed to begin the iterative search:

(x̃ − γ)/(xα − γ) = (130 − 100)/(500 − 100) = 0.075

The search for k1 begins with k1 = 2.00, and continues until k1 = 1.14:

At k1 = 2.00, LHS = 0.46
At k1 = 1.50, LHS = 0.26
At k1 = 1.20, LHS = 0.11
At k1 = 1.14, LHS = 0.075

Hence, k̂1 = 1.14.

So now, the estimate of k2 is the following:
k̂2 = (x̃ − γ)/[(k̂1 − 1)/k̂1]^(1/k̂1) = (130 − 100)/(0.14/1.14)^(1/1.14) = 188.9

Finally, the estimates of the parameters are (k̂1 = 1.14, k̂2 = 188.9).
A quick check to ensure the estimates are correct requires measuring the mode and/or the α-quantile and comparing the measures with those that were provided at the outset. The computations are below.

x0.90 = γ + k̂2{ln[1/(1 − α)]}^(1/k̂1) = 100 + 188.9{ln[1/(1 − 0.90)]}^(1/1.14) = 492.5

Since the above measure is sufficiently near the data provided (x̃ = 130 and x0.90 = 500), the parameters estimated are accepted.
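The iterative search lends itself to a simple bisection, since the left-hand-side expression increases with k1 over the range of interest here. The sketch below is a minimal Python rendering; the function name, the bracketing interval, and the tolerance are assumptions for illustration, not from the text, which uses a coarser manual search.

```python
from math import log

def weibull_params_from_mode_and_quantile(gamma, mode, x_alpha, alpha,
                                          lo=1.0001, hi=10.0, tol=1e-6):
    """Bisection search for k1 from the mode and an alpha-quantile, then compute k2.

    Solves {[(k1-1)/k1] / ln[1/(1-alpha)]}^(1/k1) = (mode-gamma)/(x_alpha-gamma);
    the left side increases with k1 over this bracket, so bisection applies.
    """
    target = (mode - gamma) / (x_alpha - gamma)
    q = log(1.0 / (1.0 - alpha))

    def lhs(k1):
        return (((k1 - 1.0) / k1) / q) ** (1.0 / k1)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            hi = mid     # LHS too large: k1 must be smaller
        else:
            lo = mid
    k1 = 0.5 * (lo + hi)
    k2 = (mode - gamma) / (((k1 - 1.0) / k1) ** (1.0 / k1))
    return k1, k2

k1, k2 = weibull_params_from_mode_and_quantile(100, 130, 500, 0.90)
print(k1, k2)   # about 1.138 and 192; the text rounds k1 to 1.14, giving k2 = 188.9
```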
Summary

Sometimes the analyst may need to develop a computer simulation model that includes one or more variables for which no empirical or sample data is available. In these situations, he/she seeks opinions from one or more experts who give estimates of the characteristics of the variable. The chapter pertains to these situations and shows some of the common ways to select the probability distribution and estimate the associated parameters.
Appendix A
Problems
In solving the problems, the student will occasionally need one or more random variates of the uniform type, u ~ U(0,1), or of the standard normal type, z ~ N(0,1). These are provided in the Appendix with random variates of each type, Table A.3 for u ~ U(0,1) and Table A.4 for z ~ N(0,1). On each problem, the student should begin at the first row and first column to retrieve the random variate, then the first row and second column, and so forth. Hence, for u ~ U(0,1), the variates are: 0.3650, 0.4899, and so forth. For z ~ N(0,1), they are: −0.058, 1.167, and so forth.
Chapter 2
Chapter 3
3.1 The variable x is continuous with probability density f(x) = (3/8)x² for 0 ≤ x ≤ 2. Use the inverse transform method to generate a random variate of x.

x      −1     0      1      2      3      4
p(x)   0.10   0.20   0.30   0.20   0.15   0.05

[a, b)      Frequency
[0, 10)     36
[10, 20)    10
[20, 30)    4
Chapter 4
4.1 The variable x is a continuous uniform for 10 < x < 30. Use the inverse transform method to generate a random variate of x.
4.2 The variable x is exponential with E(x) = 5. Generate a random variate of x.
4.3 The variable x is Erlang with k = 5 and E(x) = 20. Generate a random variate of x.
4.4 The variable x is Gamma with sample data of x̄ = 5.0 and s² = 10.0. Generate a random variate of x.
4.5 The variable x is Gamma with sample data of x̄ = 1.0 and s² = 10.0. Generate a random variate of x.
4.6 The variable x is Beta with parameters (k1, k2) = (1, 8) and (a, b) = (10, 90). Two random Gammas are g1 = 13 for (k1, k2) = (10, 1) and g2 = 20 for (k1, k2) = (8, 1). Find the random variate for the Beta.
4.7 The variable x is Weibull with parameters (k1, k2) = (2, 20). Generate a random variate of x.
4.8 The variable x is Normal with mean = 100 and variance = 100. Use the Sine-Cosine method to generate a random variate of x.
4.9 The variable x is Lognormal with μx = 5 and σx = 100. Generate a random variate of x.
4.10 Generate a random variate for a chi-square variable with degrees of freedom = 5.
4.11 Generate a random variate for a chi-square variable with degrees of freedom = 153.
4.12 Generate a random variate for a student's t with degrees of freedom = 5. Get z first.
4.13 Generate a random variate for an F distribution with degrees of freedom 5 and 2. Get chi-square (df = 5) first.
Chapter 5
x      −1     0      1      2
P(x)   0.50   0.30   0.15   0.05
Chapter 6
6.1 The variables x1, x2, x3, x4 are jointly related by the following probabilities:

                  x1 = 0            x1 = 1
x3    x4      x2 = 1  x2 = 2    x2 = 1  x2 = 2
0     3        .01     .09       .02     .10
0     4        .03     .01       .04     .02
1     3        .05     .06       .06     .09
1     4        .07     .12       .08     .15

Generate one set of random variates for x1, x2, x3, x4.
6.3 Consider the multinomial distribution with k = 4, p1 = .4, p2 = .3, p3 = .2, p4 = .1 and n = 5. Generate one set of random variates for x1, x2, x3, x4.
6.4 Consider x1, x2 that are related by the bivariate normal distribution with μ1 = 1.0, μ2 = 0.8, σ1 = 0.2, σ2 = 0.1 and ρ = 0.5. Generate one random variate set of x1, x2.
6.6 Consider x1, x2 that are related by the bivariate lognormal distribution with parameters μy1 = 20, μy2 = 2, σy1 = 2, σy2 = 0.5 and ρy = 0.8, where yi = ln(xi) for i = 1, 2. Generate one random variate set of x1, x2.
6.8 Generate the Cholesky matrix from the variance-covariance matrix:

| 16   4   2 |
|  4   4   1 |
|  2   1   1 |

6.9 The variables x1, x2, x3 are from a multivariate normal distribution with the following matrices. Generate one random set of x1, x2, x3:

μ = | 20 |      C = | 4     0     0   |
    |  6 |          | 1     3     0   |
    | 10 |          | 0.5   0.5   0.5 |

6.10 The variables x1, x2, x3 are from a multivariate lognormal distribution with the following matrices from the transformed values of y1, y2, y3. Generate one random set of x1, x2, x3:

μy = | 2.0 |      Cy = | 4     0     0   |
     | 0.6 |           | 1     3     0   |
     | 1.0 |           | 0.5   0.5   0.5 |
Chapter 7
7.1 Consider a Poisson process where A(j) = arrival rate and B(j) = time, for j = 1, 2, 3, 4:

j       1   2   3   4   ...
A(j)    2   3   5   4
B(j)    0   1   2   3
Chapter 9
9.2 Simulation results of n = 11 runs yield x̄ = 100 and s = 18. Compute the 0.95 confidence limits for the true mean. Note, t10,0.025 = 2.228.
9.3 In Problem 9.2, the analyst wants (U − L) = 5.0. How many more runs are needed?
9.4 Simulation results of n = 40 runs yield w = 8 units with an attribute. Find the 0.95 confidence limits on the true proportion of units with the attribute.
9.5 In Problem 9.4, the analyst wants (U − L) = 0.04. How many more runs are needed?
9.6 Consider a machine shop where an order of No = 15 is needed. A simulation is run where the units started is Ns = 20, and after n = 1,000 simulation runs, Ng = 958 is the number of good units. The management wants to be 95% certain that the number of good units will exceed No. Find the 0.95 confidence interval on the probability that Ng will be equal to or larger than No.
9.7 For Problem 9.6, how many more runs, if any, are needed so that the 0.95 confidence interval (from L to U) is always above the 0.95 specification mark?
9.9 Suppose two options are run in a simulation with results of n1 = 20, x̄1 = 100 and s1 = 30 for option 1, and n2 = 20, x̄2 = 95 and s2 = 25 for option 2. Assuming the two variances are the same, find the 0.95 confidence limits on the difference between the two means.
9.10 A simulation is run with four options (i) and four observations (j) of each are run with results in the table below. Use the one-way analysis of variance method to determine if all the means of the four options are the same at the 0.05 significance level. Note F3,12,0.05 = 3.49.

             Observations (j)
Options (i)   1   2   3   4
    1         5   7   6   4
    2         3   5   6   2
    3         8   9   7   8
    4         6   4   4   6

9.12 Using the results of Problem 9.10, compute the 0.95 confidence interval on the difference between each pair of means, and label any that are significantly different. Note, t0.025,12 = 2.179.
Chapter 10
10.2 A sample of n = 10 data entries are the following: (10, 13, 9, 7, 8, 12, 15, 10, 3, 8). Compute the following: x(1), x(n), x̄, s, cov, t.
10.3 From the data of Problem 10.2, compute the estimate of the location parameter, γ.
10.4 Consider the (n = 15) sample data from a continuous uniform distribution: (1.3, 1.4, 1.8, 2.3, 2.4, 2.5, 2.9, 3.1, 3.4, 3.9, 4.1, 4.7, 5.2, 5.7, 6.1). Find the maximum likelihood estimates of the min and max parameters, (a, b). Now find the method-of-moments estimators for a and b.
10.5 Suppose the n = 12 sample data entries are the following: (10.4, 12.3, 13.5, 14.6, 15.1, 15.8, 16.2, 16.5, 17.3, 16.3, 15.1, 19.4). Assuming the data are normally distributed, estimate the parameters of the mean and standard deviation.
10.6 From the n = 8 sample data: (0.7, 1.2, 1.8, 2.4, 4.0, 10.3, 0.9, 1.4), estimate the parameter for an exponential distribution.
10.7 Assume the lognormal variable x, with sample data: (10, 12, 15, 23, 40, 90, 217). Estimate the parameters for this distribution.
10.8 Suppose the variable x is assumed as gamma distributed and a sample of n = 50 yields the sample average of 33 and the sample variance of 342. Estimate the parameters for the gamma distribution.
10.9 Suppose the variable x is assumed as beta distributed and estimates of the following are given: mean = 60, mode = 80, min = 0, max = 100. Estimate the parameters for the beta distribution.
10.10 The following (n = 11) sample data are: (3, 3, 5, 7, 8, 8, 11, 12, 13, 15, 16). Assuming the data comes from a discrete uniform distribution, find the maximum likelihood estimates for the min and max (a, b). Now estimate the parameters using the method of moments.
10.11 Suppose the variable x is from a binomial distribution with n = 10, p is unknown, and m = 8 samples of x are (3, 2, 1, 5, 3, 4, 3, 2). Find the maximum likelihood estimate of p.
10.12 Suppose the variable x is from a geometric distribution where p is unknown, and m = 6 samples of x are (3, 5, 8, 4, 7, 3). Find the maximum likelihood estimate of p. Recall, x = number of failures till a success.
10.13 Suppose the variable x is from a Pascal distribution with k = 3, p is unknown, and m = 10 samples of x are (8, 9, 9, 12, 10, 13, 10, 12, 11, 13). Find the maximum likelihood estimate of p. Recall, x = number of failures till three successes.
10.14 Suppose the variable x is from a Poisson distribution where the parameter is unknown, and m = 7 samples of x are (3, 5, 2, 7, 4, 4, 1). Find the maximum likelihood estimate of the parameter.
10.15 Consider the (n = 15) sample data from Problem 10.4 and the maximum likelihood estimates of the parameters. Assuming the continuous uniform distribution, list the vectors Xs and Xf for the Q-Q Plot.
10.16 Consider the (n = 11) sample data from Problem 10.10 and the maximum likelihood estimates of the parameters. Assuming the discrete uniform distribution pertains, list the vectors Fs and Ff for the P-P Plot.
Chapter 11
11.1 Suppose the variable x is from a continuous uniform distribution and an expert estimates the minimum value is 50 and the 0.75-quantile is 90. Estimate the parameters for this distribution.
11.2 Suppose the variable x is from a continuous uniform distribution and an expert estimates the maximum value is 100 and the 0.20-quantile is 40. Estimate the parameters for this distribution.
11.3 Assume a variable x from the triangular distribution where an expert estimates the following: min = 5, most likely = 20 and the max = 30. Estimate the parameters for the standard triangular distribution.
11.4 Assume a variable x from the beta distribution where an expert estimates the following: min = 5, mean = 18, most likely = 20 and the max = 30. Estimate the parameters for the beta distribution.
Solutions
5.3 1
5.4 3
5.5 99
5.6 1
5.7 0
5.8 2
5.9 8
5.10 3
6.1 0, 2, 0, 4
6.3 3, 0, 2, 0
6.4 19.884, 2.326
6.6 2.686, 2.454
6.8 C = | 4      0       0     |
        | 1      1.732   0     |
        | 0.5    0.289   0.666 |
6.9 X = (19.768, 9.443, 11.028)′
7.1 0.908, 2.865, 3.562
7.2 x = 6
7.3 67.14
7.5 193.76
7.7 8, 11, 3, 12, 6
7.8 8H, AD, 6D, AS, KD
9.2 (L, U) = (87.908, 112.091)
9.3 188 more
9.4 (L, U) = (0.076, 0.324)
9.5 1,497 more
9.6 (L, U) = (0.946, 0.970)
9.7 1,415 more
9.8 (L, U) = (12.11, 22.11)
9.9 (L, U) = (−0.1917, 0.0317)
9.10 F = 6.61, significant
9.12 1v2 (0.476, 2.524) s
     1v3 (−3.524, −1.475) s
     1v4 (−0.524, 1.524) ns
     2v3 (−5.024, −2.976) s
     2v4 (−2.024, 0.024) ns
     3v4 (1.976, 4.024) s
10.2 x(1) = 3, x(10) = 15, x̄ = 9.5, s = 3.375, cov = 0.355, t = 1.190
10.3 1
10.4 MLE: a = 1.3, b = 6.1
     MOM: a = 0.746, b = 6.025
Appendix C
References
Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formulas, Graphs and
Tables, pp. 931–937. U.S. Department of Commerce, Washington (1964)
Ahrens, J.H., Dieter, U.: Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing 12, 223–246 (1974)
Barnett, V.: Probability plotting methods and order statistics. Appl. Stat. 24, 95–108 (1975)
Box, G.E.P., Muller, M.E.: A note on the generation of random normal deviates. Ann. Math. Stat. 29, 610–611 (1958)
Cheng, R.C.H.: The generation of gamma variables with non-integral shape parameter. Appl. Stat. 26, 71–75 (1977)
Dubey, S.D.: On some permissible estimators of the location parameter of the Weibull and certain other distributions. Technometrics 9, 293–307 (1967)
Fishman, G.S., Moore, L.R.: A statistical evaluation of multiplicative congruential random number generators with modulus 2^31 − 1. J. Am. Stat. Assoc. 77, 129–136 (1982)
Gentle, J.E.: Cholesky factorization. In: Numerical Linear Algebra for Applications in Statistics, pp. 93–95. Springer, Berlin (1998)
Hahn, G.J., Shapiro, S.S.: Statistical Models in Engineering. Wiley, New York (1994)
Hastings Jr., C.: Approximation for Digital Computers. Princeton University Press, Princeton
(1955)
Hines, W.W., Montgomery, D.C., Goldsman, D.M., Borror, C.M.: Probability and Statistics in
Engineering. Wiley, New Jersey (2003)
Kelton, W.D., Sadowski, R.P., Sturrock, D.T.: Simulation with Arena. McGraw Hill, New York
(2007)
Law, A.M.: Simulation Modeling and Analysis, 4th edn. McGraw Hill, Boston (2007)
Lehmer, D.H.: Mathematical methods in large scale computing units. Ann. Comput. Lab. 26,
142–146 (1951). Harvard University
Lewis, P.A.W., Goodman, A.S., Miller, J.M.: A pseudo-random number generator for the System/360. IBM Syst. J. 8, 136–146 (1969)
Payne, W.H., Rabung, J.R., Bogyo, T.P.: Coding the Lehmer pseudorandom number generator. Commun. Assoc. Comput. Mach. 12, 85–86 (1969)
Rand Corporation: A Million Random Digits with 100,000 Normal Deviates. Rand Corporation, Santa Monica (1955)
Rose, C., Smith, M.D.: Order statistics. In: Mathematical Statistics with Mathematica, Sect. 9.4, pp. 311–332. Springer, New York (2002)
Scheuer, E.M., Stoller, D.S.: On the generation of normal random vectors. Technometrics 4, 278–281 (1962)
Wilk, M.B., Gnanadesikan, R.: Probability plotting methods for the analysis of data. Biometrika
55, 1–17 (1968)
Zanakis, S.H.: A simulation study of some simple estimators for the three-parameter Weibull distribution. J. Stat. Comput. Simul. 9, 101–116 (1979)