
Essentials of Monte Carlo Simulation

Nick T. Thomopoulos

Essentials of Monte Carlo Simulation
Statistical Methods for Building Simulation Models
Nick T. Thomopoulos
Stuart School of Business
Illinois Institute of Technology
Chicago, Illinois, USA

ISBN 978-1-4614-6021-3 ISBN 978-1-4614-6022-0 (eBook)


DOI 10.1007/978-1-4614-6022-0
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2012953256

© Springer Science+Business Media New York 2013


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts
in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being
entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication
of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from
Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center.
Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


For my wife,
my children,
and my grandchildren.
Preface

I was fortunate to have a diverse career in industry and academia. This included
working at International Harvester as supervisor of operations research in the
corporate headquarters; at IIT Research Institute (IITRI) as a senior scientist with
applications that spanned industry and government worldwide; as a professor in
the Industrial Engineering Department at the Illinois Institute of Technology (IIT),
in the Stuart School of Business at IIT; at FIC Inc. as a consultant for a software
house that specializes in supply chain applications; and the many years of consult-
ing assignments with industry and government throughout the world. At IIT, I was
fortunate to be assigned a broad array of courses, gaining a wide breadth with the
variety of topics, and with the added knowledge I acquired from the students, and
with every repeat of the course. I also was privileged to serve as the advisor to many
bright Ph.D. students as they carried on their dissertation research. Bits of knowl-
edge from the various courses and research helped me in the classroom, and also in
my consulting assignments. I used my industry knowledge in classroom lectures so
the students could see how some of the textbook methodologies actually are applied
in industry. At the same time, the knowledge from the classroom helped to
formulate and develop Monte Carlo solutions to industry applications as they
unfolded. This variety of experience allowed me to view how simulation
can be used in industry. This book is based on that total experience.
Simulation has been a valuable tool in my professional life, and some of the
applications are listed below. The simulation models were from real applications
and were coded in various languages: FORTRAN, C++, Basic, and Visual Basic.
Some models were coded in an hour, others in several hours, and some in many
days, depending on the complexity of the system under study. The knowledge
gained from the output of the simulation models proved to be invaluable to the
research team and to the project under study. The simulation results allowed
the team to confidently make the decisions needed for the applications at hand. For
convenience, the models below are listed by type of application.


Time Series Forecasting

• Compare the accuracy of the horizontal forecast model when using 12, 24 or 36
months of history.
• Compare the accuracy of the trend forecast model when using 12, 24 or 36
months of history.
• Compare the accuracy of the seasonal forecast model when using 12, 24, or 36
months of history.
• Compare the accuracy of forecasts between weekly and monthly forecast
intervals.
• Compare the accuracy benefit of forecasts when using month-to-date demands to
revise monthly forecasts.
• Compare the accuracy of the horizontal forecast model with the choice of the
alternative forecast parameters.
• Compare the accuracy of the trend forecast model with the choice of the
alternative forecast parameters.
• Compare the accuracy of the seasonal forecast model with the choice of the
alternative forecast parameters.
• In seasonal forecast models, measure how the size of the forecast error varies as
the season changes from low-demand months to high-demand months.

Order Quantity

• Compare the inventory costs for parts (with horizontal, trend, and seasonal
demand patterns) when stock is replenished by use of the following strategies:
EOQ, Month-in-Buy or Least Unit Cost.
• Compare various strategies to determine the mix of shoe styles to have in a store
that yields the desired service level and satisfies the store quota.
• Compare various strategies to determine the mix of shoe sizes for each style type
to have in a store that yields the desired service level and satisfies the store quota
for the style.
• Compare various strategies to find the initial-order-quantity that yields the least
cost for a new part in a service parts distribution center.
• Compare various strategies to find the all-time-requirement that yields the least
cost for a part in a service parts distribution center.
• Compare various ways to determine how to measure lost sales demand for an
individual part in a dealer.
• Compare strategies, for a multi-distribution system, on how often to run a
transfer routine that determines for each part when and how much stock to
transfer from one location to another to avoid mal-distribution.

Safety Stock

• Compare the costs between the four basic methods of generating safety stock:
month’s supply, availability, service level and Lagrange.
• Compare how the service level for a part reacts as the size of the safety stock and
the order quantity vary.
• Compare how a late delivery of stock by the supplier affects the service level of a
part.
• Compare strategies on how to find the minimum amount of early stock to have
available to offset the potential of late delivery by the supplier.
• Measure the relationship between the service level of a part and the amount of
lost sales on the part.

Production

• In mixed-model (make-to-stock) assembly, compare various strategies on how
to sequence the models down the line.
• In mixed-model (make-to-order) assembly, compare various strategies on how
to sequence the individual jobs down the line.
• In job-shop operations, determine how many units to initially produce to satisfy
the order needs and minimize the material, machine, and labor costs.
• In machine-loading operations, compare strategies on how to schedule the jobs
through the shop to meet due dates and minimize machine idle times.
• Compare strategies on how to set the number of bays (for maintenance and
repair) in a truck dealership that meets the customer needs and minimizes the
dealer labor costs.

Other

• In the bivariate normal distribution, estimate the cumulative distribution
function for any combination of observations when the means and variances are
given, and the correlation varies between −1.0 and 1.0.
• In the bivariate lognormal distribution, estimate the cumulative distribution
function for any combination of observations when the means and variances of
the transformed variables are known, and the correlation varies from −1.0 to 1.0.
• In the multivariate normal distribution with k variables, an estimate of the
cumulative distribution function is obtained for any combination of
observations when both the mean vector and the variance-covariance matrix
are known.

• In an airport noise abatement study, noise measures were estimated, as in a
contour map, for the airport and for all blocks surrounding the airport. The noise
was measured with various combinations of: daily number of flights in and out,
the type of aircraft and engines, and the direction of the runways in use.
• In a study for the Navy, some very complex queuing systems were under
consideration. Analytical solutions were developed, and when a level of doubt
was present in a solution, simulation models were developed to verify the
accuracy of the analytical solutions.
• A simulated database of part numbers was needed in the process of developing
various routines in forecasting and inventory replenishment for software
systems. These were for systems with one or more stocking locations. The
database was essential to test the effectiveness of the routines in carrying out
their functions in forecasting and inventory replenishment. The reader may note
that many of the fields in the database were jointly related and thereby simulated
in a correlated way.
Acknowledgments

Thanks especially to my wife, Elaine Thomopoulos, who encouraged me to write
this book, and who gave consultation whenever needed. Thanks also to the many
people who have helped and inspired me over the years, some of whom are former
IIT students from my simulation classes. I can name only a few here: Bob Allen
(R. R. Donnelly), Wayne Bancroft (Walgreens), Fred Bock (IIT Research Institute),
Harry Bock (Florsheim Shoe Company), Dan Cahill (International Truck and
Engine), Debbie Cernauskas (Benedictine University), Dick Chiapetta (Chiapetta,
Welch and Associates), Edine Dahel (Monterey Institute), Frank Donahue
(Navistar), John Garofalakis (Patras University), Tom Galvin (Northern Illinois
University), Tom Georginis (Lewis University), Shail Godambe (Motorola, North-
ern Illinois University), M. Zia Hassan (Illinois Institute of Technology), Willard
Huson (Navistar), Robert Janc (IIT Research Institute), Marsha Jance (Indiana
University – Richmond), Chuck Jones (Decision Sciences, Inc.), Tom Knowles
(Illinois Institute of Technology), Joachim Lauer (Northern Illinois University),
Carol Lindee (Panduit), Anatol Longinow (IIT Research Institute), Louca Loucas
(University of Cyprus), Nick Malham (FIC Inc.), Barry Marks (IIT Research
Institute), Jamshid Mohammadi (Illinois Institute of Technology), Fotis Mouzakis
(Cass Business School of London), Pricha Pantumsinchai (M-Focus), Ted Prenting
(Marist College), Ornlatcha Sivarak (Mahidol University), Spencer Smith (Illinois
Institute of Technology), Mark Spieglan (FIC Inc), Paul Spirakis (Patras Univer-
sity) and Tolis Xanthopoulos (IIT).

Nick T. Thomopoulos

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Random Number Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Computer Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Computer Simulation Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Basic Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter Summaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Random Number Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Modular Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Linear Congruential Generators . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Generating Uniform Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
32-Bit Word Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Random Number Generator Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Length of the Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Mean and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Chi Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Pseudo Random Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Generating Random Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Inverse Transform Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Continuous Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Discrete Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Accept-Reject Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Truncated Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17


Order Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Sorted Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Minimum Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Maximum Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Triangular Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Empirical Ungrouped Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Empirical Grouped Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Generating Continuous Random Variates . . . . . . . . . . . . . . . . . . . . 27
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Continuous Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Standard Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Erlang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
When k < 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
When k > 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Standard Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Weibull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Hastings Approximation of F(z) from z . . . . . . . . . . . . . . . . . . . . . 36
Hastings Approximation of z from F(z) . . . . . . . . . . . . . . . . . . . . . 37
Hastings Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Convolution Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Sine-Cosine Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Chi-Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Approximation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Relation to Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Generate a Random Chi-Square Variate . . . . . . . . . . . . . . . . . . . . . 41
Student’s t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Generate a Random Variate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Fisher’s F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5 Generating Discrete Random Variates . . . . . . . . . . . . . . . . . . . . . . 45
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Discrete Arbitrary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Discrete Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Bernoulli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
When n is Small . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Normal Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Poisson Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Hyper Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Pascal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Relation to the Exponential Distribution . . . . . . . . . . . . . . . . . . . . . 53
Generating a Random Poisson Variate . . . . . . . . . . . . . . . . . . . . . . 53
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6 Generating Multivariate Random Variates . . . . . . . . . . . . . . . . . . . 57
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Multivariate Discrete Arbitrary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Generate a Random Set of Variates . . . . . . . . . . . . . . . . . . . . . . . . 58
Multinomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Generating Random Multinomial Variates . . . . . . . . . . . . . . . . . . . 60
Multivariate Hyper Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Generating Random Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Bivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Marginal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Conditional Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Generate Random Variates (x1, x2) . . . . . . . . . . . . . . . . . . . . . . . . 64
Bivariate Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Generate a Random Pair (x1, x2) . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Multivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Generate a Random Set [x1, . . . , xk] . . . . . . . . . . . . . . . . . . . . . . . 67
Multivariate Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Generate a Random Set [x1, . . . , xk] . . . . . . . . . . . . . . . . . . . . . . . 69
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7 Special Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Constant Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Batch Arrivals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Active Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Generate a Random Variate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Standby Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Generate a Random Variate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Random Integers Without Replacement . . . . . . . . . . . . . . . . . . . . . . . 76
Generate a Random Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Poker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Generate Random Hands to Players A and B . . . . . . . . . . . . . . . . . 77
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
8 Output from Simulation Runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Terminating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Nonterminating Transient Equilibrium Systems . . . . . . . . . . . . . . . . . 80
Identifying the End of the Transient Stage . . . . . . . . . . . . . . . . . . . 81
Output Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Partitions and Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Nonterminating Transient Cyclical Systems . . . . . . . . . . . . . . . . . . . . 83
Output Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Cyclical Partitions and Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Other Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Forecasting Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Forecast and Replenish Database . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
9 Analysis of Output Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Variable Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Proportion Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Analysis of Variable Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Sample Mean and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Confidence Interval of μ when x is Normal . . . . . . . . . . . . . . . . . . 93
Approximate Confidence Interval of μ when x
is Not Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
When Need More Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Analysis of Proportion Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Proportion Estimate and Its Variance . . . . . . . . . . . . . . . . . . . . . . . 97
Confidence Interval of p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
When Need More Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Comparing Two Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Comparing Two Means when Variable Type Data . . . . . . . . . . . . . . . 101
Comparing x1 and x2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Confidence Interval of (μ1 − μ2) when Normal Distribution . . . . . . 101
Significant Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
When σ1 = σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
When σ1 ≠ σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Approximate Confidence Interval of (μ1 − μ2)
when Not Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
As Degrees of Freedom Increases . . . . . . . . . . . . . . . . . . . . . . . . . 104

Comparing the Proportions Between Two Options . . . . . . . . . . . . . . . 105


Comparing p1 and p2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Confidence Interval of (p1 − p2) . . . . . . . . . . . . . . . . . . . . . . . . 106
Significant Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Comparing k Means of Variable Type Data . . . . . . . . . . . . . . . . . . . . 108
One-Way Analysis of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
10 Choosing the Probability Distribution from Data . . . . . . . . . . . . . . 113
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Collecting the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Test for Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Some Useful Statistical Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Location Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Candidate Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Transforming Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Transform Data to (0,1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Transform Data to (x ≥ 0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Candidate Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Continuous Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Weibull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Some Candidate Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . 119
Discrete Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Pascal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Estimating Parameters for Continuous Distributions . . . . . . . . . . . . . . 120
Continuous Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Estimating Parameters for Discrete Distributions . . . . . . . . . . . . . . . . 123
Discrete Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Pascal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Q-Q Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
P-P Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Adjustment for Ties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
11 Choosing the Probability Distribution When No Data . . . . . . . . . . 137
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Continuous Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Triangular . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Lognormal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Weibull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Solving for k1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Solving for k2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Appendix B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Appendix C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Chapter 1
Introduction

Monte Carlo Method

To apply the Monte Carlo method, the analyst constructs a mathematical model that
simulates a real system. The model is sampled repeatedly at random, yielding a
large number of random output results. The method originated in the 1940s with
three scientists, John von Neumann, Stanislaw Ulam and Nicholas Metropolis, who
were employed on a secret assignment at the Los Alamos National Laboratory
while working on a nuclear weapon project called the Manhattan Project. They
conceived of a new mathematical method that would become known as the Monte
Carlo method. Stanislaw Ulam coined the name after the Monte Carlo Casino in
Monaco. Monaco is a tiny country located just south of France facing the
Mediterranean Sea, and is famous for its beauty, casinos, beaches, and auto racing.
The Manhattan team formulated a model of a system they were studying that
included input variables and a series of algorithms that were too complicated to
solve analytically.
The method is based on running the model many times, as in random sampling.
For each sample, random variates are generated on each input variable; computations
are run through the model, yielding random outcomes on each output
variable. Since each input is random, the outcomes are random. In this way, the
team generated thousands of such samples and achieved thousands of outcomes for
each output variable. To carry out this method, a large stream of random
numbers was needed. Von Neumann developed a way to calculate pseudo random
numbers using a middle-square method. Von Neumann realized the method had
faults, but he reasoned that it was the fastest then available, and that he
would be aware when it fell out of alignment.
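As an illustration, a minimal sketch of the middle-square idea is given below in
Python; the four-digit seed and the digit counts are illustrative choices, not
values from the original work.

    # Von Neumann's middle-square method: square the current number and keep
    # the middle digits as the next pseudo random number. Illustrative only;
    # the method degenerates quickly and is unsuitable for serious work.
    def middle_square(seed, n):
        x = seed                         # assume a 4-digit seed, e.g., 5735
        out = []
        for _ in range(n):
            x = (x * x) // 100 % 10000   # middle 4 digits of the 8-digit square
            out.append(x)
        return out

    print(middle_square(5735, 5))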
The Monte Carlo method proved to be successful and was an important instru-
ment in the Manhattan Project. After the War, during the 1940s, the method was
continually in use and became a prominent tool in the development of the hydrogen
bomb. The Rand Corporation and the U.S. Air Force were two of the top

N.T. Thomopoulos, Essentials of Monte Carlo Simulation: Statistical Methods 1


for Building Simulation Models, DOI 10.1007/978-1-4614-6022-0_1,
# Springer Science+Business Media New York 2013
2 1 Introduction

organizations that were funding and circulating information on the use of the Monte
Carlo method. Soon, applications started popping up in all sorts of situations in
business, engineering, science and finance.

Random Number Generators

A random number generator is a computerized or physical method that produces
numbers that have no sequential pattern and are arranged purely by chance. Since
early times, many ways have been applied to generate random deviates, such as
rolling dice, flipping coins, roulette wheels, and shuffling cards. These physical
methods are not practical when a large number of random numbers is needed in
applications. In 1947, the Rand Corporation generated random numbers by use of
an electronic roulette type device that was connected to a computer. A book, with a
list of all the numbers, was published by the Rand Corporation (1946). The numbers
were also available on punched cards and tape. The numbers had found many
applications in statistics, experimental design, cryptography and other scientific
disciplines. However, with the advent of high-speed computers in the 1950s,
mathematical algorithms became practical and new developments led to improved
ways of generating a large stream of random numbers.

Computer Languages

Since the 1940s, many computer languages have been developed and put to use in
one way or another, allowing programmers to write code for countless applications.
Early languages such as COBOL, FORTRAN, Basic, Visual Basic, JAVA and C++
were popular in developing computer simulation models.
As simulation became more popular in solving complex problems in business,
engineering, science, and finance, a new set of computer languages (e.g., GAUSS,
SAS, SPSS, R) evolved. Some are for one of the two major types of simulation,
continuous or discrete, and some can handle both. These languages allow the user
to construct the process he/she is emulating and also provide the ability to
gather statistics, perform data analysis, and produce tables and graphs of outcome
summaries.
Discrete simulation models are those in which events change one at a time, as in
queuing models where customers arrive at or depart the system individually or in
batches. Continuous simulation models are those in which events change
continuously over time, according to a set of differential equations; an example
is the trajectory of a rocket.

Computer Simulation Software

The number of simulation software packages has also exploded over the years, and
most apply to specific processes in industry, gaming and finance. The software
may include a series of mathematical equations and algorithms that are associated
with a given process. When the software fits the process under study, the user can
apply the software and quickly observe the outcomes from a new or modified
arrangement of the process without actually developing the real system. This ability
is a large savings in time and cost of development.
In the 1990s, Frontline introduced the Solver application for spreadsheet use to
solve linear and nonlinear problems. This soon led to Microsoft’s Excel Solver.
Improvements followed, and in the 2000s Monte Carlo simulation was introduced
as another Excel application in which many trials of a process are automatically
performed from probability distributions specified by the user. In 2006, yet
another feature was added: the RISK Solver Engine for Excel, which performs
instant Monte Carlo simulations whenever a user changes a number on a spreadsheet.

Basic Fundamentals

The early pioneers of the Manhattan Project were fully aware that the validity of
their model highly depended on the authenticity of the algorithms they formed as
well as the choice of the input probability distributions and parameter values they
selected. An error in the formulation could give misleading results. With their
creativity, intellect and careful construction, the output from their simulation
model was highly successful.
Monte Carlo methods are now extensively used in all industries and government
to study the behavior of complex systems of all sorts. Many of the applications are
performed with software programs, like the Excel Solver models, described earlier.
The casual user will run the powerful easy-to-use software model and collect the
results, and with them, make decisions in the work place, and that might be all that
is needed. This user may not be completely aware of how the model works inside
and may not have a need to know.
Another user may want to know more on how the model does what it does. Many
others who code their own simulation models need to know the fundamentals. This
book is meant for these users, giving the mathematical basis for developing
simulation models. A description is given of how pseudo random numbers are
generated, and further, how they are used to generate random variates of input
variables that come from specified probability distributions, discrete or continuous.
The book further describes how to cope with simulation models that are
associated with two or more variables that are correlated and jointly related.
These are called multivariate variables, and various distributions of this type are
described. Some are discrete, like the multinomial, multivariate hyper geometric,
and some are continuous like the multivariate normal and multivariate lognormal.

In addition, the text helps those users who are confronted with a probability
distribution that does not comply with those that are available in the software in use.
Further, the system could be a non-terminating system that includes transient and
equilibrium (steady state) stages. The text also gives insight on how to determine
the end of the transient stage and the beginning of the equilibrium stage. Most
analysts only want to collect and analyze the data from the equilibrium stage.
Further, one chapter shows how to generate output data that are independent so
they can properly be analyzed with statistical methods. Methods are described to steer
the results so that the output data is independent. Another chapter presents a review
on the common statistical methods that are used to analyze the output results. This
includes the measuring of the average, variance, confidence intervals, tests between
two means or between two proportions, and the one-way analysis of variance.
The better the analyst can structure a simulation model to emulate the real system,
the more reliable the output results in problem solving decisions. Besides formulating
the equations and algorithms of the system properly, the analyst is confronted with
selecting the probability distributions that apply for each input variable in the model.
This is done with use of the data, empirical or sample, that is available. With this
data, the probability distributions are selected and the accompanying parameter
values are estimated. The better the fit, the better the model. One of the chapters
describes how to do this.
Another issue that sometimes confronts the analyst is to choose the probability
distribution and the corresponding parameter value(s) when no data is available for
an input variable. In this situation, the analyst relies on the best judgment of one or
more experts. Statistical ways are offered in the text to assist in choosing the
probability distribution and estimating the parameters.

Chapter Summaries

The following is a list of the remaining chapters and a quick summary on the content
of each.
Chapter 2. Random Number Generators Since early days, the many applications of
randomness have led to a wide variety of methods for generating random data of
various types, like rolling dice, flipping coins and shuffling cards. But these methods
are physical and are not practical when a large number of random data is needed in
an application. Since the advent of computers, a variety of computational methods
have been suggested to generate the random data, usually with random numbers.
Scientists, engineers and researchers are ever more developing simulation models
in their applications; and their models require a large – if not vast – number of
random numbers in processing. Developing these simulation models is not possible
without a reliable way to generate random numbers.
Chapter 3. Generating Random Variates Random variables are classified as discrete
or continuous. Discrete is when the variable can take on a specified list of values,
and continuous is when the variable can assume any value in a specified interval.
The mathematical function that relates the values of the random variable with a
probability is the probability distribution. When a value of the variable is randomly
chosen according to the probability distribution, it is called a random variate. This
chapter describes the common methods to generate random variates for random
variables from various probability distributions. Two methods are in general use for
this purpose, one is called the Inverse Transform method (IT), and the other is the
Accept-Reject method (AR). The IT method is generally preferred assuming the
distribution function transforms readilly. If the distribution is mathematically
complicated and not easily transformed, the IT method becomes complicated and
is not easily used. The AR method generally requires more steps than the IT method
to achieve the random variate. The chapter presents various adaptations of these
two methods.
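As a preview of the IT method, a minimal sketch is given below for the
exponential distribution, whose distribution function F(x) = 1 − exp(−λx)
inverts in closed form; the parameter value 2.0 is an illustrative choice.

    import math
    import random

    # Inverse transform: draw u ~ U(0,1) and solve F(x) = 1 - exp(-lam*x) = u.
    def exponential_variate(lam):
        u = random.random()
        return -math.log(1.0 - u) / lam

    print([exponential_variate(2.0) for _ in range(5)])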
Chapter 4. Generating Continuous Random Variates A continuous random variable
has a mathematical function that defines the relative likelihood that any value in a
defined interval will occur by chance. The mathematical function is called the
probability density. For example, the interval could be all values from 10 to 50,
or might be all values zero or larger, and so forth. This chapter considers the more
common continuous probability distributions and shows how to generate random
variates for each. The probability distributions described here are the following: the
continuous uniform, exponential, Erlang, gamma, beta, Weibull, normal, lognormal,
chi-square, Student’s t, and Fisher’s F. Because the standard normal distribution is so
useful in statistics and in simulation, and no closed-form formula is available, the
chapter also lists the Hastings approximation formula that measures the relationship
between the variable value and its associated cumulative probability.
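For a flavor of such approximations, one widely used Hastings-type formula for
the standard normal cumulative probability (the version reproduced in Abramowitz
and Stegun) is sketched below; it may not be the exact variant the chapter lists.

    import math

    # Hastings-type approximation of F(z) for the standard normal distribution.
    def normal_cdf(z):
        t = 1.0 / (1.0 + 0.2316419 * abs(z))
        poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
               + t * (-1.821255978 + t * 1.330274429))))
        p = 1.0 - math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * poly
        return p if z >= 0.0 else 1.0 - p

    print(normal_cdf(1.96))   # approximately 0.9750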
Chapter 5. Generating Discrete Random Variates A discrete random variable
includes a specified list of exact values where each is assigned a probability of
occurring by chance. The variable can take on a particular set of discrete events,
like tossing a coin (head or tail), or rolling a die (1,2,3,4,5,6). This chapter considers
the more common discrete probability distributions and shows how to generate
random variates for each. The probability distributions described here are the
following: discrete arbitrary, discrete uniform, Bernoulli, binomial, hyper geometric,
geometric, Pascal and Poisson.
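To show the common thread among these generators, a minimal sketch of the
inverse transform for an arbitrary discrete distribution follows; the loaded-die
probabilities are hypothetical.

    import random

    # Inverse transform for a discrete distribution: accumulate the
    # probabilities until the running total exceeds u ~ U(0,1).
    def discrete_variate(values, probs):
        u = random.random()
        cum = 0.0
        for v, p in zip(values, probs):
            cum += p
            if u < cum:
                return v
        return values[-1]   # guard against floating point round-off

    print(discrete_variate([1, 2, 3, 4, 5, 6],
                           [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]))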
Chapter 6. Generating Multivariate Random Variates When two or more random
variables are jointly related in a probability way, they are labeled as multivariate
random variables. The probability of the variables occurring together is defined by a
joint probability distribution. In most situations, all of the variables included in the
distribution are continuous or all are discrete; less often, they are a mixture of
continuous and discrete. This chapter considers some of the more
popular multivariate distributions and shows how to generate random variates for
each. The probability distributions described here are the following: multivariate
discrete arbitrary, multinomial, multivariate hyper geometric, bivariate normal,
bivariate lognormal, multivariate normal and multivariate lognormal. The Cholesky
decomposition method is also described since it is needed to generate random


variates from the multivariate normal and the multivariate lognormal distributions.
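To give a flavor of the Cholesky approach, a minimal sketch using NumPy is shown
below; the mean vector and covariance matrix are hypothetical example values.

    import numpy as np

    mu = np.array([10.0, 5.0, 2.0])      # hypothetical mean vector
    cov = np.array([[4.0, 1.2, 0.6],     # hypothetical covariance matrix
                    [1.2, 2.0, 0.5],
                    [0.6, 0.5, 1.0]])

    L = np.linalg.cholesky(cov)          # cov = L @ L.T
    z = np.random.standard_normal(3)     # independent N(0,1) variates
    x = mu + L @ z                       # one random set [x1, x2, x3]
    print(x)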
Chapter 7. Special Applications This chapter shows how to generate random
variates for applications that are not directly bound by a probability distribution
as was described in some of the earlier chapters. The applications are instructively
useful and often are needed as such in simulation models. They are the following:
Poisson process, constant Poisson process, batch arrivals, active redundancy,
standby redundancy, random integers without replacement and poker.
Chapter 8. Output From Simulation Runs Computer simulation models are gener-
ally developed to study the performance of a system that is too complicated for
analytical solutions. The usual goal of the analyst is to develop a computer simula-
tion model that emulates the activities of the actual system as best as possible. Many
of these models are from terminating and nonterminating systems.
A terminating system is when a defined starting event B and an ending event C
are specified, and so, each run of the simulation model begins at B and ends at C.
This could be a model of a car wash that opens each day at 6 a.m. and closes at
8 p.m. Each simulation run would randomly emulate the activities from B to C.
A nonterminating system is one where there are no beginning or ending events to the
system. The system often begins in a transient stage and eventually falls into either
an equilibrium stage or a cyclical stage. This could be a study of a maintenance
and repair shop that is always open. At the outset of the simulation model run, the
system is empty and may take some time to enter either an equilibrium stage or a
cyclical stage. This initial time period is called the transient stage.
A nonterminating system with transient and equilibrium stages might be a
system where the inter-arrival flow of new customers to the shop is steadily coming
from the same probability distribution. In the run of the simulation model, the
system begins in the transient stage and thereafter the flow of activities continues in
the equilibrium stage.
A nonterminating model with transient and cyclical stages could be a model of a
system where the probability distribution of the inter-arrival flow of new customers
varies by the hour of the day. The simulation run begins in a transient stage and
passes to the cyclical stage thereafter.
In either system, while the analyst is developing the computer model, he/she
includes code in the model to collect data of interest for later analysis. This output
data is used subsequently to statistically analyze the performance of the system.
Chapter 9. Analysis of Output Data This chapter is a quick review on some of the
common statistical tests that are useful in analyzing the output data from runs of a
computer simulation model. This pertains when each run of the model yields a
group of k unique output measures that are of interest to the analyst. When the
model is run n times, each with a different string of continuous uniform u ~ U(0,1)
random variates, the output data is generated independently from run to run, and
therefore the data can be analyzed using ordinary statistical methods. Some of
the output data may be of the variable type and some may be of the proportion type.
The appropriate statistical method for each type of data is applied as needed. This
includes measuring the average value and computing the confidence interval of the
true mean. Oftentimes, the simulation model is run with one or more control
variables in a ‘what if’ manner. The output data between the two or more settings
of the control variables can be compared using appropriate statistical tools. This
includes testing for significant difference between two means, between two
proportions, and between k or more means.
Chapter 10. Choosing the Probability Distribution From Sample Data In building a
simulation model, the analyst often includes several input variables of the control
and random type. The control variables are those that are of the “what if” type.
Often, the purpose of the simulation model is to determine how to set the control
variables in the real system seeking optimal results. For example, in an inventory
simulation model, the control variables may be the service level and the holding
rate, both of which are controlled by the inventory manager. On each run of the
model, the analyst sets the values of the control variables and observes the output
measures to see how the system reacts.
Another type of variable is the random input variables, and these are of the
continuous and discrete type. This type of variable is needed to match, as best as
possible, the real life system for which the simulation model is seeking to emulate.
For each such variable, the analyst is confronted with choosing the probability
distribution to apply and the parameter value(s) to use. Often empirical or sample
data is available to assist in choosing the distribution to apply and in estimating
the associated parameter values. Sometimes two or more distributions may seem
appropriate and the one to select is needed. The authenticity of the simulation
model largely depends on how well the analyst emulates the real system. Choosing
the random variables and their parameter values is vital in this process.
This chapter gives guidance on the steps to find the probability distribution to use
in the simulation model and how to estimate the parameter values that pertain. For
each of the random variables in the simulation model with data available, the
following steps are described: verify the data is independent, compute various
statistical measures, choose the candidate probability distributions, estimate the
parameter(s) for each probability distribution, and determine the adequacy of the fit.
Chapter 11. Choosing the Probability Distribution When No Data Sometimes the
analyst has no data to measure the parameters on one or more of the input variables
in a simulation model. When this occurs, the analyst is limited to a few distributions
where the parameters may be estimated without empirical or sample data. Instead of
data, experts are consulted who give their judgment on various parameters of the
distributions. This chapter explores some of the more common distributions where
such expert opinions are useful. The distributions described here are continuous and
are the following: continuous uniform, triangular, beta, lognormal and Weibull. The
type of data provided by the experts includes: minimum value,
maximum value, most likely value, average value, and a p-quantile value.
Chapter 2
Random Number Generators

Introduction

Over many years, numerous applications of randomness have led to a wide
variety of methods for generating random data of various types, like rolling dice,
flipping coins and shuffling cards. But these methods are physical and are not
practical when a large number of random data is needed in an application. Since
the advent of computers, a variety of computational methods have been suggested
to generate the random data, usually with random numbers. Scientists, engineers
and researchers are ever more developing simulation models in their applications;
and their models require a large – if not vast – number of random numbers in
processing. Developing these simulation models is not possible without a reliable
way to generate random numbers. This chapter describes some of the fundamental
considerations in this process.

Modular Arithmetic

Generating random numbers with use of a computer is not easy. Many mathe-
maticians have grappled with the task and only a few acceptable algorithms have
been found. One of the tools used to generate random numbers is by way of the
mathematical function called modular arithmetic. For a variable w, the modulo of
w with modulus m is denoted as: w modulo(m). The function returns the remainder
of w when divided by m. In the context here, w and m are integers, and the function
returns the remainder, which also is an integer. For example, if m = 5 and w = 1,

w modulo(m) = 1 modulo(5) = 1.


In the same way, should w = 5, 6 or 17, then

5 modulo(5) = 0
6 modulo(5) = 1
17 modulo(5) = 2

and so forth. Hence, for example, the numbers 1, 6, 11, 16 are all congruent
when m = 5, since each has remainder 1. Note that the difference of any two
numbers with the same remainder is perfectly divisible by m, and thus they are
congruent. Also notice that, with modulus m, the values returned are all
integers from 0 to m − 1.
An example of modular arithmetic is somewhat like the clock where the
numbers for the hours are always from 1 to 12. The same applies with the days of
the week (1–7), and the months of the year (1–12).
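As a quick check, the examples above can be reproduced with the remainder
operator found in most languages; a minimal sketch in Python:

    # Reproduce the w modulo(5) examples with Python's remainder operator.
    for w in (1, 5, 6, 17):
        print(w, 'modulo(5) =', w % 5)   # prints 1, 0, 1, 2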

Linear Congruential Generators

In 1951, Lehmer introduced a way to generate random numbers, called the linear
congruential generator, LCG. This method adapts well to computer applications and
today is the most common technique in use. The method requires the use of the
modulo function as shown below.
The LCG calls for three parameters (a, b, m) and uses modular arithmetic. To obtain
the i-th value of w, the function uses the prior value of w, as in the method
shown below:

wi = (a wi−1 + b) modulo(m)

In the above, w is a sequence of integers generated from this function,
and i is the index of the sequence.
Example 2.1 Suppose m = 32, a = 5 and b = 3. Also, assume the seed (at i = 0)
for the LCG is w0 = 11. Applying

wi = (5 wi−1 + 3) modulo(32)

yields the sequence:

26, 5, 28, 15, 14, 9, 16, 19, 2, 13, 4, 23, 22, 17, 24, 27, 10, 21, 12, 31, 30,
25, 0, 3, 18, 29, 20, 7, 6, 1, 8, 11
Notice there are 32 values of w for i ranging from 1 to 32, and all are different.
This is called a full cycle of all the possible remainders 0–31 when m ¼ 32. The last
number in the sequence is the same as the seed, w0 ¼ 11. The sequence of numbers
that are generated for the second cycle would be the same as the sequence from the
first cycle of 32 numbers because the seed has been repeated.
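As a minimal sketch in Python, the recursion above reproduces the full cycle of Example 2.1; the function name lcg is illustrative, not from the text:

def lcg(m, a, b, seed, n):
    # Generate n values of w from w_i = (a*w_{i-1} + b) modulo(m).
    w, sequence = seed, []
    for _ in range(n):
        w = (a * w + b) % m
        sequence.append(w)
    return sequence

seq = lcg(m=32, a=5, b=3, seed=11, n=32)
print(seq)              # 26, 5, 28, ..., 8, 11 -- the full cycle above
print(len(set(seq)))    # 32 -- every remainder 0 to 31 appears exactly once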

The reader should be aware that it is not easy to find a combination of a and b
that gives a full cycle for a modulus m. The trio of m = 32, a = 5 and b = 3 is one
such combination that works.
Example 2.2 Now suppose the parameters are m = 32, a = 7 and b = 13. Also
assume the seed for the LCG is w0 = 20. Applying

wi = (7 wi−1 + 13) modulo(32)

yields the sequence of 32 numbers:

25, 28, 17, 4, 9, 12, 1, 20,
25, 28, 17, 4, 9, 12, 1, 20,
25, 28, 17, 4, 9, 12, 1, 20,
25, 28, 17, 4, 9, 12, 1, 20

Note there is a cycle of eight numbers, 25 through 20. In general, when one of the
numbers in the sequence is the same as the seed, w0 = 20, the sequence repeats
with the same numbers, 25 through 20. In this situation, after eight numbers, the seed value
is generated, and thereby the cycle of eight numbers will continually repeat, in a
loop, as shown in the example above.

Generating Uniform Variates

The standard continuous uniform random variable, denoted as u, is a variable that is
equally likely to fall anywhere in the range from 0 to 1. The LCG is used to convert
the values of w to u by dividing w by m, i.e., u = w/m. The generated values of
u will range from 0/m to (m − 1)/m, or from zero to just less than 1. To illustrate,
the 32 values of w generated in Example 2.1 are used for this purpose. In Example
2.3, u = w/m for the values listed earlier.
Example 2.3 The 32 values of u listed below (in 3 decimals) are derived from the
corresponding 32 values of w listed in Example 2.1. Note, ui = wi/32 for i = 1–32.

0.812, 0.156, 0.875, 0.468, 0.437, 0.281, 0.500, 0.593, 0.062, 0.406, 0.125,
0.718, 0.687, 0.531, 0.750, 0.843, 0.312, 0.656, 0.375, 0.968, 0.937, 0.781, 0.000,
0.093, 0.562, 0.906, 0.625, 0.218, 0.187, 0.031, 0.250, 0.343

32-Bit Word Length

The majority of computers today have word lengths of 32 bits. For these machines,
the largest number that is recognized is (2^31 − 1), and the smallest number is −(2^31 − 1).
The first of the 32 bits is used to identify whether the number is plus or minus,
leaving the remaining 31 bits to determine the number.

So, for these machines, the ideal value of the modulus is m = (2^31 − 1), since a full
cycle with this m gives a sequence with the largest number of unique random uniform
variates. The goal is to find a combination of parameters that is compatible with
the modulus. Fishman and Moore (1982) have done extensive analysis on random
number generators, determining their acceptability for simulation use. In 1969,
Lewis, Goodman and Miller suggested the parameter values of a = 16,807 and
b = 0; and also in 1969, Payne, Rabung and Bogyo offered a = 630,360,016 with
b = 0. These combinations have been installed in computer compilers and are
accepted as parameter combinations that are acceptable for scientific use. The Fishman
and Moore reference also identifies other multipliers that achieve good results.
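As a sketch, the Lewis, Goodman and Miller multiplier can be placed directly in the LCG recursion. The code below assumes Python integers, which do not overflow at 32 bits; the function name next_uniform and the seed value are illustrative:

M = 2**31 - 1      # modulus m = (2^31 - 1)
A = 16807          # multiplier a; b = 0

def next_uniform(state):
    # One step of w_i = (a*w_{i-1}) modulo(m); u = w/m falls in (0, 1).
    state = (A * state) % M
    return state, state / M

state = 12345      # an arbitrary nonzero seed
for _ in range(3):
    state, u = next_uniform(state)
    print(u)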

Random Number Generator Tests

Mathematicians have developed a series of tests to evaluate how good a sequence of
uniform variates is with respect to truly random uniform variates. Some of the
tests are described below.

Length of the Cycle

The first consideration is how many variates are generated before the cycle repeats.
The most important rule is to have a full cycle with the length of the modulus m.
Assume that n uniform variates are generated and are labeled as (u1, . . ., un) where
n = m or is close to m. The ideal is to generate random numbers where the
numbers span a full cycle or very near a full cycle.

Mean and Variance

For the set of n variates (u1, . . ., un), the sample average and variance are computed
and labeled as ū and su², respectively. The goal of the random number generator
is to emulate the standard continuous uniform random variable u, denoted here
as u ~ U(0,1), with expected value E(u) = 0.5 and variance V(u) = σ² = 1/12 =
0.0833. So, the appropriate hypothesis tests on the mean and variance are used to
compare ū to 0.5 and su² to 0.0833.

Chi Square

The sequence (u1, . . ., un) is set in k intervals, say k = 10, where i = 1–10
identifies the interval in which each u falls. When k = 10, the intervals are:
(0.0–0.1), (0.1–0.2), . . ., (0.9–1.0). Now let fi designate the number of u's
that fall in interval i. Since n is the number of u's in the total sample, the expected
number of u's in an interval is ei = 0.1n. With the ten sets of fi and ei, a chi-square
(goodness-of-fit) test is used to determine if the u's are spread equally
in the range from zero to one.
The above chi-square test can be expanded to two dimensions, where the pairs of
u's (ui, ui+1) are applied as follows. Assume the same ten intervals are used as above
for both ui and ui+1. That means there are 10 × 10 = 100 possible cells where
the pair can fall. Let fij designate the number of pairs that fall in cell ij. Since n
values of u are tested, there are n/2 pairs. So the expected number of pairs to fall in a
cell is eij = 0.01n/2. This allows use of the chi-square test to determine if fij is
significantly different from eij. For a truly uniform distribution, the number of
entries in a cell should be equally distributed. When more precision is called for, the
length of the intervals can be reduced from 0.10 to 0.05 or to 0.01, for example.
In the same way, triplets of u's can be tested to determine if the u's generated
follow the expected values from a truly uniform distribution. With k = 10 and
three dimensions, the entries fall into 10 × 10 × 10 = 1,000 cubes, and for a
truly uniform distribution, the number of entries in the cubes should be equally
distributed.

Autocorrelation

Another test computes the autocorrelation between the u's with various lags of
length 1, 2, 3, . . .. The ideal is for all the lag autocorrelations to be not significantly
different from zero, plus or minus. When the lag is k, the estimate of the
autocorrelation is the following:

rk = Σ (ui − 0.5)(ui−k − 0.5) / Σ (ui − 0.5)²

where the sums run over the index i.
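A minimal sketch of three of these checks (the sample mean and variance, the 10-interval chi-square count, and the lag autocorrelation estimate) is below; it is applied to Python's built-in generator purely for illustration:

import random

us = [random.random() for _ in range(10000)]
n = len(us)

# Mean and variance: compare against E(u) = 0.5 and V(u) = 1/12 = 0.0833.
mean = sum(us) / n
var = sum((u - mean) ** 2 for u in us) / (n - 1)

# Chi-square statistic over k = 10 equal intervals; e_i = n/10 per interval.
k = 10
counts = [0] * k
for u in us:
    counts[min(int(u * k), k - 1)] += 1
chi_sq = sum((f - n / k) ** 2 / (n / k) for f in counts)

# Lag autocorrelation estimate r_k as defined above.
def lag_autocorr(u, lag):
    num = sum((u[i] - 0.5) * (u[i - lag] - 0.5) for i in range(lag, len(u)))
    den = sum((ui - 0.5) ** 2 for ui in u)
    return num / den

print(mean, var, chi_sq, lag_autocorr(us, 1))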

Pseudo Random Numbers

In the earlier example when m = 32, a full cycle of w's was generated with the
parameters (a = 5, b = 3). Further, when the seed was set at w0 = 11, the sequence
of w's generated was the following:

26, 5, 28, 15, 14, 9, 16, 19, 2, 13, 4, 23, 22, 17, 24, 27, 10, 21, 12, 31, 30, 25, 0, 3, 18, 29, 20, 7, 6, 1, 8, 11

With this combination of parameters, whenever the seed is w0 = 11, the same
sequence of random numbers will emerge. So, in essence, these are not truly random
numbers, since they are predictable and will fall exactly as listed above. These
numbers are thereby called pseudo random numbers, where pseudo is another term
for pretend.

Note, in the above, if the seed were changed to w0 = 30, say, the sequence of
random numbers would be the following:

25, 0, 3, 18, 29, 20, 7, 6, 1, 8, 11, 26, 5, 28, 15, 14, 9, 16, 19, 2, 13, 4, 23, 22, 17, 24, 27, 10, 21, 12, 31, 30

As the seed changes, another full cycle of 32 numbers is again attained.
The examples here are illustrated with m = 32, a small set of random values.
But when m is large, like (2^31 − 1), a very large sequence of random numbers is
generated. In several simulation situations, it is useful to use the same sequence of
random numbers, and therefore the same seed is appropriately applied. In other
situations, the seed is changed on each run so that a different sequence of random
numbers is used in the analysis.

Summary

The integrity of computer simulation models is only as good as the reliability of the
random number generator that produces the stream of random numbers, one after
the other. The chapter describes the difficult task of developing an algorithm to
generate random numbers that are statistically valid and have a large cycle length.
The linear congruent method is currently the common way to generate random
numbers on a computer. The parameters of this method include the multiplier and
the seed. Only a few multipliers are statistically recommended, and two popular
ones in use for 32-bit word-length computers are presented. The other parameter,
the seed, allows the analyst the choice of altering the sequence of random numbers
with each run, and also, when necessary, offers the choice of using the same
sequence of random numbers from one run to another.
Chapter 3
Generating Random Variates

Introduction

Random variables are classified as discrete or continuous. Discrete is when the
variable can take on a specified list of values, and continuous is when the variable
can assume any value in a specified interval. The mathematical function that relates
the values of the random variable to a probability is the probability distribution.
When a value of the variable is randomly chosen according to the probability
distribution, it is called a random variate. This chapter describes the common
methods to generate random variates for random variables from various probability
distributions. Two methods are in general use for this purpose: one is called the
inverse transform method (IT), and the other is the accept-reject method (AR).
The IT method is generally preferred, assuming the distribution function transforms as
needed. If the distribution is mathematically complicated and not easily transformed,
the IT method becomes complicated and is not easily used. The AR method generally
requires more steps than the IT method. The chapter presents various adaptations of
these two methods.
For notation in this chapter, and in the entire book, when a continuous uniform
variate falls equally in the range from zero to one, the notation will be u ~ U(0,1).
In the examples of the book, when a variate of u ~ U(0,1) is obtained, for simplic-
ity, only two or three decimals are used to show how the routine is run. Of course,
in real simulation situations, the u variates with all decimals in place are needed.

Inverse Transform Method

Perhaps the most common way to generate a random variate for a random variable
is by the inverse transform method. The method applies to continuous and discrete
variables.


Continuous Variables

Suppose x is a continuous random variable with probability density f(x) for a ≤ x ≤ b.
The cumulative distribution function (cdf) of x becomes F(x) = ∫ from a to x of f(t)dt,
where 0 ≤ F(x) ≤ 1. Since u ~ U(0,1) and F(x) both range between 0 and 1, a random
variate of u is generated and then F(x) is set equal to u, from which the associated
value of x is found. The routine below describes the procedure:
1. Generate a standard uniform random variate u ~ U(0,1).
2. Set F(x) = u.
3. Find the value of x that corresponds to F(x) = u, i.e., x = F⁻¹(u).
4. Return x.
The function F⁻¹(u) is called the inverse function of F(x) = u.
Example 3.1 Suppose x is a random variable with

f(x) = 0.125x for 0 ≤ x ≤ 4.

The associated cdf is below:

F(x) = 0.0625x² for 0 ≤ x ≤ 4.

To find a random variate of x, the inverse function of F(x) is derived as below:
1. Set u = F(x) = 0.0625x².
2. Hence, x = F⁻¹(u) = √(u/0.0625).
3. Generate a random variate, u ~ U(0,1). Assume u = 0.71.
4. Compute x = √(0.71/0.0625) = 3.370.
5. Return x = 3.370.
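A minimal sketch of this inverse transform in Python, for the density of Example 3.1 (the function name is illustrative):

import math
import random

def inverse_transform_x():
    # Invert F(x) = 0.0625 x^2 on 0 <= x <= 4: x = sqrt(u/0.0625).
    u = random.random()
    return math.sqrt(u / 0.0625)

print(inverse_transform_x())   # with u = 0.71 this returns 3.370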

Discrete Variables

Now consider a discrete random variable, xi, where i = 1, 2, . . ., with probability
distribution P(xi) for i = 1, 2, . . .. The cumulative distribution function of xi is
F(xi) = P(x ≤ xi). To generate a random variate with the inverse transform
method, the following routine is run:
1. Generate a random standard uniform variate, u ~ U(0,1).
2. From F(xi), find the minimum i, say i0, where u < F(xi).
3. Return xi0.
Example 3.2 Suppose a discrete random variable x has the range (0, 1, 2, 3) and
probabilities p(0) = 0.4, p(1) = 0.3, p(2) = 0.2 and p(3) = 0.1. Note the cumulative
distribution function becomes: F(0) = 0.4, F(1) = 0.7, F(2) = 0.9 and F(3) = 1.0.

Assume a simulation model is in progress and a random variate of x is needed.
To comply, the following routine is run:
1. Generate a random standard uniform, u ~ U(0,1), say, u = 0.548.
2. Find the minimum x where u < F(x). Note this is x = 1.
3. Return x = 1.
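A sketch of the discrete inverse transform for Example 3.2, assuming the distribution is stored as two parallel lists:

import random

values = [0, 1, 2, 3]
cdf = [0.4, 0.7, 0.9, 1.0]        # F(0), F(1), F(2), F(3)

def discrete_variate():
    # Return the smallest x where u < F(x).
    u = random.random()
    for x, F in zip(values, cdf):
        if u < F:
            return x

print(discrete_variate())          # with u = 0.548 this returns 1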

Accept-Reject Method

Consider x as a continuous random variable with density f(x) for a ≤ x ≤ b.
Further, for notation's sake, let x̃ denote the mode of x, and thereby f′ = f(x̃) is the
density value at the mode of x. Note that all values of f(x) are equal to or less
than f′. To find a random variate of x, the following routine of five steps is run:
1. Generate two uniform random variates, u ~ U(0,1), as u1, u2.
2. Find x = a + u1(b − a).
3. Compute f(x).
4. If u2 < f(x)/f′, then accept x and go to 5.
   Else, reject x and repeat steps 1–4.
5. Return x.
Example 3.3 Suppose x is a random variable with

f(x) = 0.125x for 0 ≤ x ≤ 4.

The mode is at x̃ = 4, and thereby f′ = f(4) = 0.5. So, now the five steps noted
above are followed:
1. Generate (u1, u2) = (0.26, 0.83), say.
2. x = 0 + 0.26(4 − 0) = 1.04.
3. f(1.04) = 0.13.
4. Since 0.83 > 0.13/0.50 = 0.26, reject x = 1.04, and repeat steps 1–4.
1. Generate (u1, u2) = (0.72, 0.15), say.
2. x = 0 + 0.72(4 − 0) = 2.88.
3. f(2.88) = 0.36.
4. Since 0.15 < 0.36/0.50 = 0.72, accept x = 2.88.
5. Return x = 2.88.
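A sketch of the accept-reject routine for this density, looping over steps 1–4 until a candidate is accepted:

import random

def accept_reject():
    # f(x) = 0.125x on [0, 4]; the density peak is f' = f(4) = 0.5.
    while True:
        u1, u2 = random.random(), random.random()
        x = 0 + u1 * (4 - 0)        # candidate spread uniformly over [a, b]
        if u2 < (0.125 * x) / 0.5:  # accept with probability f(x)/f'
            return x

print(accept_reject())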

Truncated Variables

Sometimes, when the inverse transform method applies, the random variable of
interest is a truncated portion of another random variable. For example, suppose x
has density f(x) for a ≤ x ≤ b, and F(x) is the associated cumulative distribution
function of x. But assume the variable of interest is only a portion of the original
density where c ≤ x ≤ d, and the limits c, d are within the original limits of a and b.
Therefore, a ≤ c ≤ x ≤ d ≤ b. Note the new density of this truncated variable
becomes

g(x) = f(x)/[F(d) − F(c)]    for c ≤ x ≤ d

To find a random variate of this truncated variable, the following routine is
applied. Note, however, the routine listed below does not need the truncated density
g(x) just described above:
1. Compute F(c) and F(d).
2. Generate a random uniform variate u ~ U(0,1).
3. Find v = F(c) + u[F(d) − F(c)].
   Note, F(c) ≤ v ≤ F(d).
4. Set the cumulative distribution to v, i.e., F(x) = v.
5. Find the value of x that corresponds to F(x) = v, i.e., x = F⁻¹(v).
6. Return x.
Example 3.4 Suppose x is a random variable with

f(x) = 0.125x for 0 ≤ x ≤ 4.

and recall the cdf below:

F(x) = 0.0625x² for 0 ≤ x ≤ 4.

Assume a random variate between the limits of 1 and 2 is required in a simulation
analysis, i.e., 1 ≤ x ≤ 2. To accomplish this, the five steps below are followed:
1. Note, F(1) = 0.0625 and F(2) = 0.2500.
2. Generate a random u ~ U(0,1), say u = 0.63.
3. v = 0.0625 + 0.63[0.2500 − 0.0625] = 0.1806.
4. x = F⁻¹(0.1806) = √(0.1806/0.0625) = 1.70.
5. Return x = 1.70.
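A sketch of the truncated-variable routine of Example 3.4, assuming the cdf and inverse derived above:

import math
import random

F = lambda x: 0.0625 * x * x               # cdf of f(x) = 0.125x
F_inv = lambda v: math.sqrt(v / 0.0625)    # its inverse

def truncated_variate(c=1.0, d=2.0):
    u = random.random()
    v = F(c) + u * (F(d) - F(c))           # v falls between F(c) and F(d)
    return F_inv(v)

print(truncated_variate())                  # with u = 0.63 this returns 1.70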

Order Statistics

Suppose n samples are taken from a continuous random variable, x, with density
f(x) and cumulative distribution F(x), and these are labeled as (x1, . . ., xn). Sorting
the n samples from low to high yields x(1) ≤ x(2) ≤ . . . ≤ x(n), where x(i) is the i-th
lowest value in the sample.

Sorted Values

The notation y is used here to denote the i-th sorted value from the n samples of x.
From order statistics, the probability density of y becomes:

g(y) = n!/[(i − 1)!(n − i)!] f(y) F(y)^(i−1) [1 − F(y)]^(n−i)

Note, there is one value of x = y, (i − 1) values of x smaller than y, and (n − i)
values larger than y. See Rose and Smith (2002) for more detail on order statistics.

Minimum Value

Suppose y is the smallest value of x, whereby y = min(x1, . . ., xn). The probability
density of y becomes:

g(y) = n f(y)[1 − F(y)]^(n−1)

and the corresponding complementary distribution function, the probability that
the minimum exceeds y, is:

G(y) = [1 − F(y)]^n

To generate a random variate for the minimum of n values from a continuous
random variable x, the following routine is run:
1. Generate a random u ~ U(0,1).
2. Set G(y) = u and F(y) = v.
   Hence, u = (1 − v)^n and v = [1 − u^(1/n)].
3. Using the inverse transform method, compute y = F⁻¹(v).
4. Return y.

Maximum Value

When w is the largest value of x, then w = max(x1, . . ., xn). Hence, the probability
density of w becomes:

g(w) = n f(w) F(w)^(n−1)

and the corresponding distribution function is:

G(w) = F(w)^n

To generate a random variate for the maximum of n values of a continuous
random variable x, the following routine is run:
1. Generate a random u ~ U(0,1).
2. Set G(w) = u and F(w) = v.
   Hence, u = v^n and v = u^(1/n).
3. Using the inverse transform, compute w = F⁻¹(v).
4. Return w.
Example 3.5 Suppose the density for a random variable x is f(x) = 0.125x
(0 ≤ x ≤ 4), and thereby the cumulative distribution is

F(x) = 0.0625x² for 0 ≤ x ≤ 4.

Recall also that the inverse function is

F⁻¹(u) = √(u/0.0625)

Suppose n = 8 samples of x are taken, and of interest is to generate a random
variate for the minimum value of the samples. To accomplish this, the following steps
are taken:
1. Generate a random variate u ~ U(0,1), say u = 0.37.
2. v = [1 − 0.37^(1/8)] = 0.117.
3. y = √(0.117/0.0625) = 1.367.
4. Return y = 1.367.
Note F(1.367) = 0.117.
Example 3.6 Suppose the density for a random variable x is f(x) = 0.125x
(0 ≤ x ≤ 4), and thereby the cumulative distribution is

F(x) = 0.0625x² for 0 ≤ x ≤ 4.

Recall also that the inverse function is

F⁻¹(u) = √(u/0.0625)

Suppose n = 8 samples of x are taken, and of interest is to generate a random
variate for the maximum value of the samples. To accomplish this, the following steps
are taken:
1. Generate a random variate u ~ U(0,1), say u = 0.28.
2. v = 0.28^(1/8) = 0.853.
3. w = √(0.853/0.0625) = 3.694.
4. Return w = 3.694.
Note F(3.694) = 0.853.
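A sketch of both order-statistic routines for this density, producing the minimum or maximum of n = 8 samples from a single uniform variate:

import math
import random

F_inv = lambda v: math.sqrt(v / 0.0625)   # inverse cdf of f(x) = 0.125x

def min_variate(n=8):
    u = random.random()
    return F_inv(1 - u ** (1 / n))        # v = 1 - u^(1/n)

def max_variate(n=8):
    u = random.random()
    return F_inv(u ** (1 / n))            # v = u^(1/n)

print(min_variate(), max_variate())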

Composition

Sometimes the random variable is composed of a series of probability densities
where each density occurs with a probability. This happens when there are k
densities, fi(x), where the probability of density i being selected is pi, and the sum
of all the pi is one. In essence, x is a random variable with probability density
as below:

f(x) = p1 f1(x) + . . . + pk fk(x)

with p1 + . . . + pk = 1.

Note that each of the k densities, fi(x), has a unique cumulative distribution
function, Fi(x), and a corresponding unique inverse function, Fi⁻¹(u).
The composition can be described as below.

i      fi(x)      pi      Gi
1      f1(x)      p1      G1 = p1
...
k      fk(x)      pk      Gk = Gk−1 + pk

The term Gi is the cumulative distribution function of the pi's, and when i = k,
Gk = 1.
Example 3.7 The density for variable x is composed of two separate densities,
f1(x) = 0.125x for (0 ≤ x ≤ 4) and f2(x) = 0.25 for (2 ≤ x ≤ 6). The associated
probabilities are p1 = 0.6 for f1(x), and p2 = 0.4 for f2(x). So, G1 = 0.6 and
G2 = 1.0. Note also, F1⁻¹(u) = √(u/0.0625), and F2⁻¹(u) = 2 + 4u. To generate
a random x, two random u ~ U(0,1) variates are needed, u1 and u2. The steps below
are followed:
1. Find two random uniform variates, u ~ U(0,1). Say (u1, u2) = (0.14, 0.53).
2. Since u1 < G1 = 0.60, density f1(x) is used.
3. x = F1⁻¹(0.53) = √(0.53/0.0625) = 2.91.
4. Return x = 2.91.
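A sketch of the composition routine for Example 3.7, where u1 selects the density and u2 is applied to that density's inverse function:

import math
import random

def composition_variate():
    u1, u2 = random.random(), random.random()
    if u1 < 0.6:                        # select f1 with probability p1 = 0.6
        return math.sqrt(u2 / 0.0625)   # F1^-1(u2)
    return 2 + 4 * u2                   # F2^-1(u2)

print(composition_variate())            # with (0.14, 0.53) this returns 2.91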

Summation

On some occasions when generating a random variate, the sum of a random variable
is needed, as in y = (x1 + . . . + xk). For notational convenience in this section, y
denotes the sum of k independent samples of x, where x is a random variable
with distribution f(x). This method of summing is applied in subsequent chapters, in
a convolution manner, to generate a random variate for the continuous Erlang
distribution, and also for the discrete Pascal distribution.
To generate a random value of y from a continuous x with the inverse transform,
F⁻¹(u), known, the following loop is run:
1. Set y = 0.
2. For i = 1 to k.
3. Generate a random ui ~ U(0,1).
4. xi = F⁻¹(ui).
5. y = y + xi.
6. Next i.
7. Return y.
Example 3.8 Suppose x is a continuous random variable with density f(x) =
0.125x for (0 ≤ x ≤ 4). The associated inverse function of x is F⁻¹(u) = √(u/0.0625).
Assume a sum of k = 2 is called for, and the result is another random
variate y. The steps below show how one random observation of y is generated:
1. Generate (u1, u2) = (0.44, 0.23), say.
2. x1 = F⁻¹(0.44) = √(0.44/0.0625) = 2.653;
   y = 2.653.
3. x2 = F⁻¹(0.23) = √(0.23/0.0625) = 1.918;
   y = 2.653 + 1.918 = 4.571.
4. Return y = 4.571.

Triangular Distribution

The triangular is a continuous distribution that is sometimes used in simulation
models when the true distribution of the variable is not known. To apply it, the analyst
estimates the minimum value of the variable, the maximum value, and the most
likely value. These estimated values are denoted as a, b and x̃, respectively. The
random variable is labeled as x, where (a ≤ x ≤ b) and (a ≤ x̃ ≤ b).
Another variable is now introduced that follows the standard triangular distribution.
This random variable is x′, and it ranges from 0 to 1, where the most likely
value (the mode) is labeled x̃′. So (0 ≤ x′ ≤ 1) and (0 ≤ x̃′ ≤ 1). The notation for
the two variables, triangular x and standard triangular x′, is below:

x ~ T(a, b, x̃)

x′ ~ T(0, 1, x̃′)

The standard triangular x′ is related to the triangular x as follows:

x′ = (x − a)/(b − a)

and

x̃′ = (x̃ − a)/(b − a)

The probability density of x′ is the following:

f(x′) = 2x′/x̃′                  for (0 ≤ x′ ≤ x̃′)
f(x′) = 2(1 − x′)/(1 − x̃′)      for (x̃′ < x′ ≤ 1)

and the associated cumulative distribution function becomes:

F(x′) = x′²/x̃′                  for (0 ≤ x′ ≤ x̃′)
F(x′) = 1 − (1 − x′)²/(1 − x̃′)  for (x̃′ < x′ ≤ 1)

The mean and variance of the standard triangular x′ are below:

E(x′) = (x̃′ + 1)/3

V(x′) = (1 + x̃′² − x̃′)/18

The expected value and variance of the triangular x are related to those of the
standard triangular x′ as shown below:

E(x) = a + E(x′)[b − a]

V(x) = V(x′)[b − a]²

To find a random variate for a standard triangular x′ ~ T(0,1,x̃′), the following
routine is run:
1. Generate a random u ~ U(0,1).
2. If u ≤ x̃′, x′ = √(u x̃′).
   If u > x̃′, x′ = 1 − √((1 − x̃′)(1 − u)).
3. Return x′.
Note the corresponding value of T(a,b,x̃) becomes x = a + x′(b − a).
Example 3.9 Consider a variable x that is from a triangular distribution where x
ranges from 20 to 60 and the most likely value is 30. When a random variate of x is
needed in a simulation model, the following steps take place:
1. The mode x̃′ for the standard triangular distribution is computed as x̃′ = (30 − 20)/
(60 − 20) = 0.25.
2. A random uniform variate, u ~ U(0,1), is generated. Say u = 0.38.
3. Since u > x̃′, x′ = 1 − √((1 − 0.25)(1 − 0.38)) = 0.318.
4. x = 20 + 0.318[60 − 20] = 32.72.
5. Return x = 32.72.
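A sketch of the triangular routine of Example 3.9; the parameters a, b and the mode are the analyst's estimates:

import math
import random

def triangular_variate(a=20.0, b=60.0, mode=30.0):
    m = (mode - a) / (b - a)          # mode of the standard triangular
    u = random.random()
    if u <= m:
        xp = math.sqrt(u * m)
    else:
        xp = 1 - math.sqrt((1 - m) * (1 - u))
    return a + xp * (b - a)           # rescale from T(0,1,m) to T(a,b,mode)

print(triangular_variate())            # with u = 0.38 this returns 32.72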

Empirical Ungrouped Data

Sometimes, in simulation modeling, the data for a variable is not used to seek a
theoretical continuous density, but instead is applied directly to define the distribution.
The density that results is called the empirical distribution. Suppose the data is
denoted as (x1, . . ., xn), and when sorted from low to high, it becomes x(1) ≤ x(2)
≤ . . . ≤ x(n). The data is then arranged in tabular form as below, along with the
associated cumulative distribution function, Fx(i).

i      x(i)      Fx(i) = (i − 1)/(n − 1)
1      x(1)      0
2      x(2)      1/(n − 1)
...
n      x(n)      1

To generate a random x, the composition method is used in the following way:
1. Generate two random uniform variates, u ~ U(0,1), u1 and u2.
2. If Fx(i) ≤ u1 < Fx(i+1), set x = x(i) + u2[x(i+1) − x(i)].
3. Return x.
Example 3.10 Suppose five observations of a continuous random variable are the
following: 10, 5, 2, 20, 11. When sorted, they become: 2, 5, 10, 11, 20. The tabular
form and the cumulative distribution function are listed below:

i      x(i)      Fx(i)
1      2         0.00
2      5         0.25
3      10        0.50
4      11        0.75
5      20        1.00

To generate a random x by the composition method, the following steps are
followed:
1. Generate two random uniform variates, say (u1, u2) = (0.82, 0.13).
2. Since 0.75 ≤ 0.82 < 1.00, i = 4, and x = 11 + 0.13[20 − 11] = 12.17.
3. Return x = 12.17.
Another way to generate the random variate is by the inverse transform method,
as given below:
1. Generate a random uniform variate, u ~ U(0,1).
2. If Fx(i) ≤ u < Fx(i+1), set x = x(i) + {[u − Fx(i)]/[Fx(i+1) − Fx(i)]}[x(i+1) − x(i)].
3. Return x.

Example 3.11 Assume the same data (2, 5, 10, 11, 20) as Example 3.10.
To generate a random x by the inverse transform method, the following steps are
followed:
1. Generate a random uniform variate, say u = 0.82.
2. Since 0.75 ≤ 0.82 < 1.00, i = 4, and
   x = 11 + {[0.82 − 0.75]/[1.00 − 0.75]}[20 − 11] = 13.52.
3. Return x = 13.52.
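A sketch of the empirical inverse transform for the sorted data of Example 3.10:

import random

xs = [2, 5, 10, 11, 20]
Fx = [i / (len(xs) - 1) for i in range(len(xs))]   # 0.00, 0.25, ..., 1.00

def empirical_variate():
    u = random.random()
    for i in range(len(xs) - 1):
        if Fx[i] <= u < Fx[i + 1]:
            frac = (u - Fx[i]) / (Fx[i + 1] - Fx[i])
            return xs[i] + frac * (xs[i + 1] - xs[i])
    return xs[-1]                                  # guard for u at 1.0

print(empirical_variate())    # with u = 0.82 this returns 13.52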

Empirical Grouped Data

Sometimes the data comes in grouped form, as shown in the table below. Note the k
intervals, where [ai, bi) identifies the limits of x within each interval, and fi is
the frequency of samples in interval i. The sum of the frequencies is denoted as n.
The cumulative distribution function for interval i becomes Fi = Fi−1 + fi/n, where
F0 = 0.

i      [ai, bi)      fi      Fi
                             F0 = 0
1      [a1,b1)       f1      F1 = F0 + f1/n
2      [a2,b2)       f2      F2 = F1 + f2/n
...
k      [ak,bk)       fk      Fk = Fk−1 + fk/n

To generate a random x, the composition method is used in the following way:
1. Generate two random uniform variates, u ~ U(0,1), u1 and u2.
2. Find the interval, i, where Fi−1 ≤ u1 < Fi, and set x = ai + u2(bi − ai).
3. Return x.
Example 3.12 Suppose a variable to be used in a simulation study is presented in
grouped form with the five intervals listed below. Note, the table lists the range
within each interval, the associated frequency, and also the cumulative distribution.
The sum of the frequencies is n = 80.

i      [ai, bi)      fi      Fi
1      [5,8)         2       0.0250
2      [8,11)        27      0.3625
3      [11,13)       32      0.7625
4      [13,15)       15      0.9500
5      [15,18)       4       1.0000

To find a random x by the composition method, the routine below is applied:
1. Generate two random uniform variates, u ~ U(0,1). Say (u1, u2) = (0.91, 0.37).
2. Since 0.7625 ≤ 0.91 < 0.9500, i = 4 and x = 13 + 0.37(15 − 13) = 13.74.
3. Return x = 13.74.

Another way to generate the random variate is by the inverse transform method,
as shown below:
1. Generate a random uniform variate, u ~ U(0,1).
2. Find the interval, i, where Fi−1 ≤ u < Fi, and set x = ai + {[u − Fi−1]/
[Fi − Fi−1]}(bi − ai).
3. Return x.
Example 3.13 Assume the same data as in Example 3.12. To find a random x by
the inverse transform method, the routine below applies:
1. Generate a random uniform variate, u ~ U(0,1). Say u = 0.91.
2. Since 0.7625 ≤ 0.91 < 0.9500, i = 4 and
   x = 13 + {[0.91 − 0.7625]/[0.9500 − 0.7625]}(15 − 13) = 14.573.
3. Return x = 14.573.

Summary

This chapter shows how continuous uniform u ~ U(0,1) random variates are
used to generate random variates for random variables from defined probability
distributions. To accomplish this in a computer simulation model, a random number
generator algorithm is applied whenever a random uniform u ~ U(0,1) variate is
needed. The random number generator is the catalyst that delivers the uniform,
u ~ U(0,1), random variates, one after another, as they are needed in the simulation
model. This is essential since it allows the analyst the opportunity to create
simulation models that use any probability distribution that pertains, and gives
flexibility to emulate the actual system under study.
Chapter 4
Generating Continuous Random Variates

Introduction

A continuous random variable has a mathematical function that defines the relative
likelihood that any value in a defined interval will occur by chance. This mathematical
function is called the probability density. For example, the interval could be
all values from 10 to 50, or might be all values zero or larger, and so forth. This
chapter considers the more common continuous probability distributions and shows
how to generate random variates for each. The probability distributions described
here are the following: the continuous uniform, exponential, Erlang, gamma, beta,
Weibull, normal, lognormal, chi-square, student's t, and Fisher's F. Because the
standard normal distribution is so useful in statistics and in simulation, and no
closed-form formula is available, the chapter also lists the Hastings approximation
formulas that relate the variable value and its associated cumulative probability.

Continuous Uniform

A variable x is defined as continuous uniform with parameters (a, b) when x is
equally likely to fall anywhere from a to b. For example, suppose an experiment in a
laboratory gives a temperature reading that rounds to 30° Fahrenheit. The true
temperature, though, in three decimals, would be somewhere between 29.500° and
30.499°. Hence, assume a continuous distribution where x has parameters
a = 29.500 and b = 30.499.
The probability density of x is

f(x) = 1/(b − a) for a ≤ x ≤ b


and the cumulative distribution function becomes

F(x) = (x − a)/(b − a) for a ≤ x ≤ b

The expected value and the variance of x are the following:

E(x) = (b + a)/2

V(x) = (b − a)²/12

To generate a random variate from the continuous uniform distribution, the
following routine is run:
1. Generate a random uniform u ~ U(0,1).
2. x = a + u(b − a).
3. Return x.
Example 4.1 Suppose x is continuous uniform with parameters (10, 20) and a
random variate of x is needed. To accomplish this, the following steps are followed:
1. Generate a random uniform u ~ U(0,1), say u = 0.68.
2. x = 10 + 0.68(20 − 10) = 16.8.
3. Return x = 16.8.

Exponential

The exponential distribution is heavily used in many applications, and especially in
queuing systems, to define the time between units arriving to a system, and also the
times associated with servicing the units in the system. The probability density,
f(x), is largest at x = 0 and continually decreases as x increases. The density has
one parameter, θ, and is defined as below:

f(x) = θ e^(−θx)    for x ≥ 0

The associated cumulative distribution function becomes

F(x) = 1 − e^(−θx)    for x ≥ 0

The mean and variance of x are the following:

μ = 1/θ

and

σ² = 1/θ²

The inverse transform method is used to generate a random variate x in the
following way. A random continuous uniform variate u ~ U(0,1) is obtained
and is set equal to F(x) as below:

F(x) = u = 1 − e^(−θx)

Now, solving for x yields the random variate of x by the relation

x = −(1/θ) ln(1 − u)

where ln denotes the natural logarithm.

Standard Exponential

Note, the expected value of x is E(x) = 1/θ, and in the special case when θ = 1, the
expected value of x is E(x) = 1. When x = 1 (the mean), F(1) = 0.632, indicating
that 63.2 % of the values of x are below the mean value and 36.8 % are above.
In this special situation at E(x) = 1, the distribution is like a standard exponential
distribution. The list below relates some selected values of the cumulative distribution
function, F(x), with the corresponding values of x.

F(x)    x
0.0     0.000
0.1     0.105
0.2     0.223
0.3     0.357
0.4     0.511
0.5     0.693
0.6     0.916
0.7     1.204
0.8     1.609
0.9     2.303

Note, the median occurs at x = 0.693 and the mean at x = 1.00, indicating the
distribution skews far to the right.
Example 4.2 Assume an exponentially distributed random variable x with a mean of
20, and a random variate of x is called for in a simulation model. To generate the
random x, the following steps are followed:
1. Generate a random uniform u ~ U(0,1), say u = 0.17.
2. x = −20 ln(1 − 0.17) = 3.727.
3. Return x = 3.727.
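A sketch of the exponential inverse transform with mean 1/θ = 20, as in Example 4.2:

import math
import random

def exponential_variate(mean=20.0):
    u = random.random()
    return -mean * math.log(1 - u)    # x = -(1/theta) ln(1 - u)

print(exponential_variate())           # with u = 0.17 this returns 3.727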

Erlang

In some queuing systems, the time associated with arrivals and service times is
assumed to be an Erlang continuous random variable. The Erlang variable has two
parameters, θ and k. The parameter θ is the scale parameter, and k, an integer,
identifies the number of independent exponential variables that are summed
together to form the Erlang variable. In this way, the Erlang variable x is related
to the exponential variable y as below:

x = (y1 + . . . + yk)

The expected value of x is related to the expected value of y as below:

E(x) = k E(y) = k/θ

and the variance of x is derived from adding k variances of y, V(y), as below:

V(x) = k V(y) = k/θ²

Note, when k = 1, the Erlang variable is the same as an exponential variable,
where the mode is zero and the density skews far to the right. As k increases, the
shape of the Erlang density resembles a normal variable, via the central limit
theorem.
The probability density of x is

f(x) = x^(k−1) θ^k e^(−θx)/(k − 1)!    x ≥ 0

and the cumulative distribution function is

F(x) = 1 − e^(−θx) Σ from j = 0 to k−1 of (θx)^j/j!    x ≥ 0

To generate a random Erlang variate of x with parameters θ and k, the following
routine is run:
1. Set x = 0.
2. For j = 1 to k.
3. Generate a random continuous uniform variate u ~ U(0,1).
4. y = −(1/θ) ln(1 − u).
5. Sum x = x + y.
6. Next j.
7. Return x.

Example 4.3 Suppose a random Erlang variate is needed in a simulation run
for variable x with parameter k = 4 and whose mean is 20. Note, because E(x) =
4/θ = 20, θ = 4/20 = 0.20. The following steps yield the random x:
1. Generate four random continuous uniform u ~ U(0,1) variates, say (u1, u2, u3, u4) =
(0.27, 0.69, 0.18, 0.76).
2. Using y = −(1/0.2) ln(1 − u), the corresponding random exponential variates
are: (y1, y2, y3, y4) = (1.574, 5.856, 0.992, 7.136).
3. Summing the four exponentials yields x = 15.558.
4. Return x = 15.558.
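A sketch of the Erlang routine as a sum of k exponential variates, using the parameters of Example 4.3 (k = 4, θ = 0.20):

import math
import random

def erlang_variate(k=4, theta=0.20):
    x = 0.0
    for _ in range(k):
        u = random.random()
        x += -(1 / theta) * math.log(1 - u)   # one exponential variate
    return x

print(erlang_variate())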

Gamma

The gamma distribution is almost the same as the Erlang distribution, except that the
parameter k is any value larger than zero, whereas k is a positive integer for the
Erlang. Also, x is any value greater than or equal to zero. The density of the gamma is:

f(x) = x^(k−1) θ^k e^(−θx)/Γ(k)    x ≥ 0

where Γ(k) is called the gamma function (not a density), defined as

Γ(k) = ∫ from 0 to ∞ of t^(k−1) e^(−t) dt    for k > 0

When k is a positive integer, Γ(k) = (k − 1)!
The mean and variance of x are the following:

μ = k/θ

and

σ² = k/θ²

Generating a random gamma variate is not easy. The method presented here
depends on whether k < 1 or k > 1. Note, when k = 1, the distribution is the same
as an exponential.

When k < 1

A routine to generate a random x when k < 1 comes from Ahrens and Dieter
(1974). It is based on the accept-reject method and is shown below in five steps.
In the computations below, x′ is a gamma variate with θ = 1, and x is a gamma
variate with any positive θ:
1. Set b = (e + k)/e, where e ≈ 2.71828 is the base of the natural logarithm.
2. Generate two random uniform u ~ U(0,1) variates, u1 and u2.
   p = b u1.
   If p > 1, go to step 4.
   If p ≤ 1, go to step 3.
3. y = p^(1/k).
   If u2 ≤ e^(−y), set x′ = y and go to step 5.
   If u2 > e^(−y), go to step 2.
4. y = −ln[(b − p)/k].
   If u2 ≤ y^(k−1), set x′ = y and go to step 5.
   If u2 > y^(k−1), go to step 2.
5. Return x = x′/θ.

When k > 1

Cheng (1977) developed a routine to generate a random x when k > 1. The
method uses the accept-reject method, as shown in the five steps listed below. Note
below where x′ is a gamma variate with θ = 1, and x is a gamma variate with any
positive θ:
1. Set a = 1/√(2k − 1),
   b = k − ln 4, where ln = natural logarithm,
   q = k + 1/a,
   c = 4.5,
   d = 1 + ln(4.5).
2. Generate two random uniform u ~ U(0,1) variates, u1 and u2.
   v = a ln[u1/(1 − u1)],
   y = k e^v,
   z = u1² u2,
   w = b + qv − y.
3. If w + d − cz ≥ 0, set x′ = y and go to step 5.
   If w + d − cz < 0, go to step 4.
4. If w ≥ ln(z), set x′ = y and go to step 5.
   If w < ln(z), go to step 2.
5. Return x = x′/θ.
Example 4.4 Suppose x is gamma distributed with a mean of 0.1 and a variance of
0.02. Since μ = k/θ and σ² = k/θ², solving for k and θ yields k = 0.5 and
θ = 5. The computations are below:
1. e = 2.718,
   b = 1.184.
2. (u1, u2) = (0.71, 0.21), say.
   Since p = 0.841 ≤ 1, go to step 3.
3. y = 0.707,
   e^(−y) = 0.493.
   Since u2 ≤ 0.493, x′ = 0.707.
4. Return x = 0.707/5 = 0.141.
Example 4.5 Suppose x is gamma distributed with a mean of 10 and a variance of
66.6. Since μ = k/θ and σ² = k/θ², solving for k and θ yields k = 1.5 and
θ = 0.15. The computations are below:
1. a = 0.707,
   b = 0.114,
   q = 2.914,
   c = 4.5,
   d = 2.504.
2. (u1, u2) = (0.15, 0.74), say.
   v = −1.226,
   y = 0.440,
   z = 0.0167,
   w = −3.899.
3. Since w + d − cz < 0, go to step 4.
4. ln(z) = −4.092.
   Since w ≥ ln(z), x′ = 0.440.
5. Return x = 0.440/0.15 = 2.933.
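A sketch of Cheng's accept-reject routine for k > 1, following the five steps above (the function name is illustrative):

import math
import random

def gamma_variate_cheng(k=1.5, theta=0.15):
    a = 1 / math.sqrt(2 * k - 1)
    b = k - math.log(4)
    q = k + 1 / a
    c, d = 4.5, 1 + math.log(4.5)
    while True:
        u1, u2 = random.random(), random.random()
        v = a * math.log(u1 / (1 - u1))
        y = k * math.exp(v)
        z = u1 * u1 * u2
        w = b + q * v - y
        if w + d - c * z >= 0 or w >= math.log(z):
            return y / theta          # x = x'/theta

print(gamma_variate_cheng())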

Beta

The beta distribution has two parameters (k1, k2), where k1 > 0 and k2 > 0, and it
takes on many shapes depending on the values of the parameters. The variable,
denoted as x, lies within two limits, a and b, where (a ≤ x ≤ b).

Standard Beta

Another distribution is introduced here, called the standard beta. This distribution
has the same parameters (k1, k2) as the beta distribution, but the limits are
constrained to the range (0,1). So when x is a beta with a range (a, b), x′ is a
standard beta with a range (0,1). When they both have the same parameters, x and x′
are related as below:

x′ = (x − a)/(b − a)

and

x = a + x′(b − a)

The probability density for x′ is the following:

f(x′) = (x′)^(k1−1) (1 − x′)^(k2−1)/B(k1, k2)    (0 ≤ x′ ≤ 1)

where

B(c, d) = beta function = ∫ from 0 to 1 of t^(c−1) (1 − t)^(d−1) dt

The expected value of x′ is

E(x′) = k1/(k1 + k2)

and the variance is

V(x′) = k1 k2/[(k1 + k2)² (k1 + k2 + 1)]

The corresponding expected value and variance of x become

E(x) = a + E(x′)(b − a)

V(x) = (b − a)² V(x′)

The mode of the standard beta variable could be 0 or 1 depending on the values
of k1 and k2. However, when k1 > 1 and k2 > 1, the mode lies between 0 and 1 and
is computed by

x̃′ = (k1 − 1)/(k1 + k2 − 2)

The mode for the beta variable becomes:

x̃ = a + x̃′(b − a)

Below is a list describing the relation between the parameters and the shape of
the distribution.

Parameters                                Shape
k1 < 1 and k2 ≥ 1                         Mode at zero (right skewed)
k1 ≥ 1 and k2 < 1                         Mode at one (left skewed)
k1 = 1 and k2 > 1                         Ramp down from zero to one
k1 > 1 and k2 = 1                         Ramp up from zero to one
k1 < 1 and k2 < 1                         Bathtub shape
k1 > 1 and k2 > 1 and k1 > k2             Mode between zero and one (left skewed)
k1 > 1 and k2 > 1 and k2 > k1             Mode between zero and one (right skewed)
k1 > 1 and k2 > 1 and k1 = k2             Mode in middle, symmetrical, normal like
k1 = k2 = 1                               Uniform

To generate a random beta variate x with parameters (k1, k2), and with the range
(a, b), the following routine is run:
1. Generate a random gamma variate, g1, with parameters k1 and θ1 = 1.
   Generate a random gamma variate, g2, with parameters k2 and θ2 = 1.
2. x′ = g1/(g1 + g2).
3. x = a + x′(b − a).
4. Return x.
Example 4.6 Suppose x is a beta random variable with parameters (2,4) and has a
range of 10–50. The following steps show how to generate a random x:
1. A random gamma variate is generated with (k1 = 2, θ1 = 1), say g1 = 1.71.
   A random gamma variate is generated with (k2 = 4, θ2 = 1), say g2 = 4.01.
2. The random standard beta variate is x′ = 1.71/(1.71 + 4.01) = 0.299.
3. The random beta variate is x = 10 + 0.299(50 − 10) = 21.958.
4. Return x = 21.958.
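A sketch of the beta routine is below; the standard-library call random.gammavariate(shape, 1.0) stands in for the gamma routines of this chapter:

import random

def beta_variate(k1=2.0, k2=4.0, a=10.0, b=50.0):
    g1 = random.gammavariate(k1, 1.0)   # gamma with theta = 1
    g2 = random.gammavariate(k2, 1.0)
    xp = g1 / (g1 + g2)                 # standard beta on (0, 1)
    return a + xp * (b - a)

print(beta_variate())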

Weibull

The Weibull distribution has two parameters, k1 > 0 and k2 > 0, and the random
variable, denoted as x, ranges from zero and above. The density is

f(x) = k1 k2^(−k1) x^(k1−1) exp[−(x/k2)^k1]    x > 0

and the cumulative distribution function is

F(x) = 1 − exp[−(x/k2)^k1]    x > 0

The expected value and variance of x are listed below:

E(x) = (k2/k1) Γ(1/k1)

V(x) = (k2²/k1)[2 Γ(2/k1) − (1/k1) Γ(1/k1)²]

Recall Γ denotes the gamma function described earlier in this chapter. When the
parameter k1 ≤ 1, the shape of the density is exponential-like. When k1 > 1, the
shape has a mode greater than zero and skews to the right, and at k1 ≈ 3, the density
shape starts looking like a normal distribution.
To generate a random x from the Weibull, the inverse transform method is used.
Setting a random uniform variate u ~ U(0,1) equal to F(x), and solving for x, yields the
following:

x = k2[−ln(1 − u)]^(1/k1)

Example 4.7 Suppose x is Weibull distributed with parameters k1 = 4 and k2 = 10,
and a random x is called for in a simulation run. The following steps are followed:
1. Generate a random uniform u ~ U(0,1). Say u = 0.92.
2. x = 10[−ln(1 − 0.92)]^(1/4) = 12.61.
3. Return x = 12.61.
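A sketch of the Weibull inverse transform with the parameters of Example 4.7:

import math
import random

def weibull_variate(k1=4.0, k2=10.0):
    u = random.random()
    return k2 * (-math.log(1 - u)) ** (1 / k1)

print(weibull_variate())   # with u = 0.92 this returns 12.61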

Normal Distribution

The normal distribution is symmetrical, with a bell-shaped density. Its mean is
denoted as μ and the standard deviation as σ. This is perhaps the most widely used
probability distribution in business and scientific applications. A companion distribution,
the standard normal distribution, has a mean of zero, a standard deviation of
one, and the same shape as the normal distribution. The notation for the normal
variable is x ~ N(μ, σ²), and its counterpart, the standard normal, is z ~ N(0,1). The
variable z is related to x by the relation z = (x − μ)/σ. In the same way, x is
obtained from z by x = μ + zσ. When k represents a particular value of z, the
probability density is f(k) = (1/√(2π)) exp(−k²/2). The probability that z is less than k
is denoted as F(k) and is computed by F(k) = ∫ from −∞ to k of f(z)dz.
There is no closed-form solution for the cumulative distribution F(z). A way to
approximate F(z) has been developed by C. Hastings, Jr. (1955), and is also reported
by M. Abramowitz and I. A. Stegun (1964). Two methods credited to Hastings are
listed below.

Hastings Approximation of F(z) from z

To find F(z) from a particular value of z, the following routine is run:
1. d1 = 0.0498673470
   d2 = 0.0211410061
   d3 = 0.0032776263
   d4 = 0.0000380036
   d5 = 0.0000488906
   d6 = 0.0000053830
2. If z ≥ 0, k = z.
   If z < 0, k = −z.
3. F = 1 − 0.5[1 + d1k + d2k² + d3k³ + d4k⁴ + d5k⁵ + d6k⁶]^(−16)
4. If z ≥ 0, F(z) = F.
   If z < 0, F(z) = 1 − F.
5. Return F(z).

Hastings Approximation of z from F(z)

Another useful approximation also comes from Hastings, and gives a formula that
yields a value of z from a value of F(z). The routine is listed below:
1. c0 = 2.515517
   c1 = 0.802853
   c2 = 0.010328
   d1 = 1.432788
   d2 = 0.189269
   d3 = 0.001308
2. H(z) = 1 − F(z).
   If H(z) ≤ 0.5, H = H(z).
   If H(z) > 0.5, H = 1 − H(z).
3. t = √(ln(1/H²)), where ln = natural logarithm.
4. k = t − [c0 + c1t + c2t²]/[1 + d1t + d2t² + d3t³].
5. If H(z) ≤ 0.5, z = k.
   If H(z) > 0.5, z = −k.
6. Return z.
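A sketch of both Hastings approximations, with the coefficients copied from the two routines above:

import math

D = [0.0498673470, 0.0211410061, 0.0032776263,
     0.0000380036, 0.0000488906, 0.0000053830]
C = [2.515517, 0.802853, 0.010328]
DEN = [1.432788, 0.189269, 0.001308]

def F_of_z(z):
    # Approximate the standard normal cdf F(z).
    k = abs(z)
    s = 1 + sum(d * k ** (i + 1) for i, d in enumerate(D))
    F = 1 - 0.5 * s ** -16
    return F if z >= 0 else 1 - F

def z_of_F(Fz):
    # Approximate z from a cumulative probability F(z).
    H = 1 - Fz
    Hc = H if H <= 0.5 else 1 - H
    t = math.sqrt(math.log(1 / Hc ** 2))
    k = t - (C[0] + C[1] * t + C[2] * t * t) / (
        1 + DEN[0] * t + DEN[1] * t * t + DEN[2] * t ** 3)
    return k if H <= 0.5 else -k

print(F_of_z(1.0))        # about 0.8413
print(z_of_F(0.8413))     # about 1.00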
The literature reports various ways to generate a random standard normal variate z.
Three of the methods are presented here.

Hastings Method

The first way utilizes the Hastings method that finds a z from F(z), and is based
on the inverse transform method. The routine uses one standard uniform variate,
u ~ U(0,1), as shown below:
1. Generate a random continuous uniform variate u ~ U(0,1).
2. Set F(z) = u.
3. Use the Hastings approximation of z from F(z) to generate a random standard
normal variate z.
4. Return z.

Convolution Method

A second way to generate a random standard normal variate uses twelve random
continuous uniform variates. The routine is listed below:
1. z = −6.
2. For i = 1 to 12.
3. Generate a random uniform variate u ~ U(0,1).
4. z = z + u.
5. Next i.
6. Return z.
Note in the above routine, E(u) = 0.5 and V(u) = 1/12, and thereby E(z) = 0
and V(z) = 1. Also, since z is based on the convolution of twelve continuous uniform
u ~ U(0,1) variates, the central limit theorem applies, and hence z ≈ N(0,1).
Example 4.7 The routine below shows how to use the convolution method to
generate a random z ~ N(0,1):
1. Set z = −6.0.
2. Sum 12 random continuous uniform variates, u ~ U(0,1), say Σu = 7.12.
3. z = −6.0 + Σu = 1.12.
4. Return z = 1.12.

Sine-Cosine Method

A third way generates two random standard normal variates, z1 and z2. This method
comes from a paper by Box and Muller (1958). The routine requires two random
continuous uniform variates to generate the two random standard normal variates.
The routine is listed below:
1. Generate two random continuous uniform variates, u1 and u2.
2. z1 = √(−2 ln(u1)) cos[2π(u2)]
   z2 = √(−2 ln(u1)) sin[2π(u2)]
3. Return z1 and z2.
Example 4.8 Suppose x is normally distributed with mean 40 and standard deviation
10, and a random normal variate of x is needed. Using the sine-cosine method,
the steps below follow:
1. Two random continuous uniform u ~ U(0,1) variates are (u1, u2) = (0.37, 0.54), say.
2. z1 = √(−2 ln(u1)) cos[2π(u2)] = −1.3658
   z2 = √(−2 ln(u1)) sin[2π(u2)] = −0.3506
3. x = 40 − 1.3658 × 10 = 26.342.
4. Return x = 26.342.
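A sketch of the sine-cosine (Box and Muller) method, rescaled to the normal variable of Example 4.8:

import math
import random

def box_muller():
    # Two uniforms in, two independent standard normal variates out.
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

z1, z2 = box_muller()
x = 40 + z1 * 10          # rescale z to x ~ N(40, 10^2)
print(z1, z2, x)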

Lognormal

The lognormal distribution, with variable x > 0, reaches a peak greater than zero
and skews far to the right. This variable is related to a counterpart normal variable y
in the following way:

y = ln(x)

where ln is the natural logarithm. In the same way, x is related to y by the relation
below:

x = e^y.

The variable y is normally distributed with mean and variance μy and σy²,
respectively, and x is lognormal with mean and variance μx and σx². The notation
for x and y is as below:

x ~ LN(μy, σy²)
y ~ N(μy, σy²)

Note, the parameters that define the distribution of x are the mean and variance of y.
The parameters of x and y are related in the following way:

μx = exp(μy + σy²/2)
σx² = exp(2μy + σy²)[exp(σy²) − 1]
μy = ln[μx²/√(μx² + σx²)]
σy² = ln[1 + σx²/μx²]

To generate a random x with parameters μy and σy², the following routine is run:
1. Generate a random standard normal variate, z.
2. A random normal variate becomes y = μy + zσy.
3. The random lognormal variate is x = e^y.
4. Return x.
Example 4.9 Suppose x is lognormal with mean μx = 10 and variance σx² = 400,
and a random lognormal variate of x is needed. The steps below show how to find a
random x:
1. The mean and variance of y become:

   μy = ln[μx²/√(μx² + σx²)] = 1.498
   σy² = ln[1 + σx²/μx²] = 1.609

   The standard deviation of y is σy = 1.269.
2. A random standard normal variate is generated as z = 1.37, say.
3. The random normal variate becomes y = 1.498 + 1.37 × 1.269 = 3.236.
4. The random lognormal variate is x = e^3.236 = 25.44.
5. Return x = 25.44.
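A sketch of the lognormal routine of Example 4.9; random.gauss(0, 1) stands in for any of the standard normal methods above:

import math
import random

def lognormal_variate(mu_x=10.0, var_x=400.0):
    mu_y = math.log(mu_x ** 2 / math.sqrt(mu_x ** 2 + var_x))
    sd_y = math.sqrt(math.log(1 + var_x / mu_x ** 2))
    z = random.gauss(0, 1)
    return math.exp(mu_y + z * sd_y)    # x = e^y with y = mu_y + z*sd_y

print(lognormal_variate())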

Chi-Square

The chi-square distribution is one of the most frequently used distributions in
statistical analysis, usually to test the variability of the variance of a variable. The
chi-square variable is denoted as χ² and is associated with a parameter k, the
degrees of freedom. The variable χ² is greater than or equal to zero, and the parameter
k is a positive integer. When w = χ², the probability density of w, with parameter
k, is listed below:

f(w) = w^(k/2−1) e^(−w/2)/[2^(k/2) Γ(k/2)]    w ≥ 0

The mean and variance of w (and χ²) are E(w) = k and V(w) = 2k, respectively.
So, the mean and variance of χ² with k degrees of freedom are:

E(χ²) = k

V(χ²) = 2k

Probability table values of chi-square with parameter k and probability P(χ² >
χ²α) = α are listed in most statistics books, and usually for k ≤ 100.
The chi-square variable with degrees of freedom k is related to the standard
normal variable as shown below:

χ² = z1² + . . . + zk²

where z1 to zk are standard normal variates.



Approximation Formula

When the parameter k is large (k > 30), thanks to the central limit theorem, the
chi-square probability density is shaped like a normal distribution, whereby χ² ≈
N(k, 2k). Using this relation, an approximation to the α-percent chi-square value is
shown below:

χ²α ≈ k + zα √(2k)

where z is a standard normal variable with P(z > zα) = α, and thereby
P(χ² > χ²α) ≈ α.

Relation to Gamma

It is also noted that the density f(w) has the same shape as the gamma distribution
with parameters θ and k′ when the gamma parameters are set as θ = 1/2 and k′ = k/2;
with θ as the rate parameter of this chapter's gamma density, the gamma mean
k′/θ then equals k, as required.

Generate a Random Chi-Square Variate

A random chi-square variate with degrees of freedom k can be generated by
summing k squared standard normal variates, as shown in the routine below:
1. χ² = 0.
2. For i = 1 to k.
3. Generate a standard normal variate z.
4. χ² = χ² + z².
5. Next i.
6. Return χ².
Another way to generate the chi-square variate is by using the gamma relationship
noted above. For chi-square with parameter k, generate a gamma variate with parameters
(θ = 1/2, k′ = k/2), and the outcome becomes the random chi-square variate.
When k is large (k > 30), the normal approximation given above can be used
to generate the chi-square variate. In this situation the chi-square variate is
approximated by the normal distribution with mean k and variance 2k.
Example 4.10 Suppose a random chi-square with degrees of freedom k = 3 is
needed in a simulation run. To generate the random chi-square, the steps below are
followed:
1. Suppose three random standard normal variates are (z1, z2, z3) = (0.47, 0.81,
1.04), say.
2. χ² = 0.47² + 0.81² + 1.04² = 1.9586.
3. Return χ² = 1.9586.
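A sketch of the chi-square routine, summing k squared standard normal variates:

import random

def chi_square_variate(k=3):
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

print(chi_square_variate())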

Example 4.11 Suppose a simulation model needs a chi-square random variate with
239 degrees of freedom. To accomplish this, the following routine is run:
1. Generate a random standard normal z ~ N(0,1), say z = 1.34.
2. χ² = int[239 + 1.34√(2 × 239) + 0.5] = 268.
3. Return χ² = 268.

Student’s t

The student's t distribution is an important distribution used in statistical analysis,
usually to test the significance of the mean value of a variable. The distribution is
often referred to as the t distribution. The spread of the distribution is much like the
standard normal distribution, but the tails can reach out farther to the right and left,
depending on a parameter k, the degrees of freedom. The expected value and
variance of t are listed below:

E(t) = 0

V(t) = k/(k − 2)    for k > 2

When k > 30, the student's t distribution is approximated by the standard
normal distribution.
The variable t, with parameter k, is related to the standard normal distribution
and the chi-square distribution by the relation below:

t = z/√(χ²k/k)

Generate a Random Variate

To generate a random t with parameter k, the following routine is run:
1. Generate a random standard normal variate, z ~ N(0,1).
2. Generate a random chi-square variate with parameter k, χ²k.
3. t = z/√(χ²k/k).
4. Return t.
Example 4.12 Suppose a random variate t with degrees of freedom k = 6 is
needed in a simulation analysis. To accomplish this, the steps below are followed:
1. A random standard normal variate is generated as z = 0.71, say.
2. A random chi-square variate with parameter k = 6 is generated as χ²6 = 6.29, say.
3. t = 0.71/√(6.29/6) = 0.693.
4. Return t = 0.693.
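A sketch of the student's t routine, combining a standard normal variate with a chi-square variate:

import math
import random

def t_variate(k=6):
    z = random.gauss(0, 1)
    chi_sq = sum(random.gauss(0, 1) ** 2 for _ in range(k))
    return z / math.sqrt(chi_sq / k)

print(t_variate())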

Fisher's F

Fisher's F distribution is an important distribution used in statistical analysis and
pertains when two or more variances from normal variables are under review.
The variable F is greater than zero and has two parameters, u and v, where both are
positive integers. The variable F is derived from two independent chi-square
variables, χ1² and χ2², with degrees of freedom u and v, respectively, as shown below:

F = [χ1²/u]/[χ2²/v]    and F > 0

The expected value and variance of F are listed below:

E(F) = v/(v − 2)    when v > 2

V(F) = [2v²(u + v − 2)]/[u(v − 2)²(v − 4)]    when v > 4

Table values, Fα,u,v, of F with parameters u and v are listed in most statistics
books, where P[F > Fα,u,v] = α. Note the relation below, which shows how the lower
tail values of F with degrees of freedom v and u are related to the upper tail values
when the degrees of freedom are u and v:

F(1−α),v,u = 1/Fα,u,v

In statistical analysis, suppose x1 and x2 are two normally distributed variables
with variances σ1² and σ2², respectively, and s1² and s2² are the corresponding
sample variances when n1 and n2 are the number of samples taken of x1 and x2,
respectively. The ratio

F = [s1²/σ1²]/[s2²/σ2²]

is distributed as an F variable with degrees of freedom n1′ = n1 − 1 and n2′ = n2 − 1.
When σ1² and σ2² are equal, the ratio becomes

F = s1²/s2²

Note, u = n1′ and v = n2′ is the notation for the degrees of freedom. To generate
a random F with parameters u and v, the following routine is run:
1. Generate χ1² and χ2² with degrees of freedom u and v, respectively.
2. F = [χ1²/u]/[χ2²/v].
3. Return F.
Example 4.13 Suppose a random variate of F with degrees of freedom 4 and 6 is
needed in a simulation run. The steps below show how the random F is derived:
1. Suppose χ1² = 3.4 and χ2² = 8.3 are randomly generated.
2. F = [3.4/4]/[8.3/6] = 0.61.
3. Return F = 0.61.

Summary

This chapter shows how to transform the continuous uniform random variates,
u ~ U(0,1), into random variates for a variable that comes from one of the common
continuous probability distributions. The probability distributions described here
are the following: the continuous uniform, exponential, Erlang, gamma, beta,
Weibull, normal, lognormal, chi-square, student's t, and Fisher's F. The chapter
also shows how to use the Hastings approximation formulas for the standard
normal distribution.
Chapter 5
Generating Discrete Random Variates

Introduction

A discrete random variable includes a specified list of exact values, where each is
assigned a probability of occurring by chance. The variable can take on a particular
set of discrete events, like tossing a coin (head or tail) or rolling a die (1,2,3,4,5,6).
This chapter considers the more common discrete probability distributions and
shows how to generate random variates for each. The probability distributions
described here are the following: discrete arbitrary, discrete uniform, Bernoulli,
binomial, hypergeometric, geometric, Pascal and Poisson.

Discrete Arbitrary

A variable x is defined as discrete when a set number of values of x can occur, as xi
for i = 1 to N, where N could be finite or infinite. Generally, the xi are the nonnegative
integers, xi = 0, 1, 2, . . .. The probability of a particular value xi is denoted as
P(xi) = P(x = xi). Hence, P(x1), . . ., P(xN) define the probability distribution of x.
The sum of all the probabilities is equal to one, i.e.,

Σ P(xi) = 1

The expected value of x is obtained as below:

E(x) = Σ xi P(xi)

and the associated variance is

V(x) = E(x²) − E(x)²


where

E(x²) = Σ xi² P(xi)

The cumulative distribution function of x is denoted as F(xi) and is computed by

F(xi) = P(x ≤ xi)

To generate a random variate x from an arbitrary probability distribution, the
following routine is run:
1. For each xi, find F(xi), i = 1 to N.
2. Generate a random continuous uniform u ~ U(0,1).
3. Locate the smallest xi where u < F(xi).
4. Set x = xi.
5. Return x.
5. Return x.
Example 5.1 Suppose x is discrete with the following probability distribution and
the associated cumulative distribution function.

x    P(x)    F(x)
0    0.4     0.4
1    0.3     0.7
2    0.2     0.9
3    0.1     1.0

To obtain a random x, the following steps are taken:
1. Generate a random u ~ U(0,1), say u = 0.57.
2. Note, x = 1 is the smallest x where u = 0.57 < F(1) = 0.7.
3. Return x = 1.

Discrete Uniform

A variable x follows the discrete uniform distribution with parameters (a,b) when x takes on all integers from a to b with equal probabilities. The probability of x becomes,

p(x) = 1/(b − a + 1)   x = a to b

The cumulative distribution function is

F(x) = (x − a + 1)/(b − a + 1)   x = a to b

The expected value and the variance of x are listed below:

E(x) = (a + b)/2

V(x) = [(b − a + 1)² − 1]/12

To generate a random discrete uniform variate of x, the routine below is followed:
1. Generate a random continuous uniform u ~ U(0,1).
2. x = ceiling[(a − 1) + u(b − a + 1)].
3. Return x.
Example 5.2 Suppose an analyst needs a random discrete uniform variate for use in a simulation model in progress where x includes all integers from 10 to 20. To compute it, the following steps are taken:
1. Generate a continuous uniform random u ~ U(0,1), say u = 0.714.
2. Calculate x = ceiling[(10 − 1) + 0.714(20 − 10 + 1)] = ceiling[16.854] = 17.
3. Return x = 17.

Bernoulli

Suppose the variable x is distributed as a Bernoulli variable with x = 0 or 1, where the probability of each is the following:

P(x = 0) = 1 − p

P(x = 1) = p

The expected value and variance of x are the following:

E(x) = p

V(x) = p(1 − p)

To generate a random Bernoulli variate x, the following three steps are taken:
1. Generate a random uniform u ~ U(0,1).
2. If u < p, x = 1; else, x = 0.
3. Return x.

Example 5.3 Consider a Bernoulli x with p = 0.70. A random x is generated as follows:
1. Generate a random u ~ U(0,1), say u = 0.48.
2. Since u < p = 0.70, x = 1.
3. Return x = 1.

Binomial

The variable x is distributed as a binomial when x is the number of successes in n independent trials of an experiment with p the probability of a success per trial. The variable x can take on the integer values 0 to n. The probability of x is the following:

P(x) = n!/[x!(n − x)!] p^x (1 − p)^(n−x)   x = 0, ..., n

The expected value and variance of x are listed below:

E(x) = np

V(x) = np(1 − p)

The cumulative distribution function of x, denoted as F(x), is the probability of the variable achieving the value of x or smaller. When x = xo, say,

F(xo) = P(x ≤ xo).

To generate a random binomial variate of x, one of the three routines listed


below may be used.

When n is Small

When n is small to moderate in size, the following routine is efficient:
1. Set x = 0.
2. For i = 1 to n
   Generate a random continuous uniform variate, u ~ U(0,1)
   If u < p, x = x + 1
   Next i.
3. Return x.

Normal Approximation

When n is large and if p ≤ 0.5 with np > 5, or if p > 0.5 with n(1 − p) > 5, then x can be approximated with the normal distribution, whereby x ~ N[np, np(1 − p)]. The routine listed below will generate a random x:
1. Generate a random standard normal variate, z ~ N(0,1).
2. x = integer[np + z√(np(1 − p)) + 0.5].
3. Return x.

Poisson Approximation

When n is large and p is small, and the above normal approximation does not apply, x is approximated by the Poisson distribution that is described subsequently in this chapter:
1. The expected value of the Poisson variable is denoted here as θ, where E(x) = θ = np.
2. Generate a random Poisson variate x with parameter θ.
3. Return x.
Example 5.4 Suppose x is binomially distributed with n = 5 trials and p = 0.375. To generate a random binomial x, five continuous uniform variates, u ~ U(0,1), are needed as shown below:
1. Suppose the five uniform variates are the following: 0.286, 0.949, 0.710, 0.633, and 0.325.
2. Since two of the variates are below p = 0.375, x = 2.
3. Return x = 2.
Example 5.5 Assume 100 independent Bernoulli trials are run where the probability of a success per trial is p = 0.40. Of interest is to generate a random binomial variate x. Since n = 100, p = 0.40 and np = 40, the normal approximation to the binomial can be used. The mean and variance of the normal are μ = np = 40 and σ² = np(1 − p) = 24, respectively. Hence σ = 4.89. The following routine generates the random x:
1. Generate a random standard normal variate z, say z = 0.87.
2. x = integer[μ + zσ + 0.5] = 44.
3. Return x = 44.
Example 5.6 Suppose a random binomial variate x is needed from a sample of n = 1,000 trials with p = 0.001. With np = 1.0, the normal distribution does not apply, but the Poisson distribution is applicable with θ = 1.00. Later in this chapter, the way to generate a random x from the Poisson distribution is shown:
1. Generate a random Poisson variate with parameter θ = 1.00, say x = 2.
2. Return x = 2.
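A Python sketch combining the three routines is below; the regime cutoffs for switching between the exact method and the two approximations are illustrative assumptions, and poisson_variate follows the exponential-sum routine given later in this chapter.

```python
import math, random

def poisson_variate(theta):
    # Sum exponential inter-event times until they exceed one time unit.
    x, total = 0, 0.0
    while True:
        total += -math.log(1.0 - random.random()) / theta
        if total > 1.0:
            return x
        x += 1

def binomial_variate(n, p):
    if n * p > 5 and n * (1 - p) > 5:
        z = random.gauss(0.0, 1.0)                   # normal approximation
        return int(n * p + z * math.sqrt(n * p * (1 - p)) + 0.5)
    if n > 100 and p < 0.05:                         # assumed cutoff
        return poisson_variate(n * p)                # Poisson approximation
    return sum(random.random() < p for _ in range(n))  # small-n routine

print(binomial_variate(5, 0.375))   # Example 5.4
print(binomial_variate(100, 0.40))  # Example 5.5
```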

Hypergeometric

The variable x is distributed as a hypergeometric when x is the number of defectives in n samples taken without replacement from a population of size N with D defectives. The variable x can take on the integer values from zero to the smaller of D and n. With C(a,b) denoting the binomial coefficient "a choose b", the probability of x is the following:

P(x) = C(D, x) C(N − D, n − x) / C(N, n)   x = 0, ..., min(n, D)

The expected value and variance of x are listed below:

E(x) = nD/N

V(x) = n[D/N][1 − D/N][N − n]/[N − 1]

To generate a random hypergeometric variate, the following routine is run. The parameter notations are N = population size, D = number of defects in the population, and n = number of samples without replacement:
1. Set N1 = N, D1 = D and x = 0.
2. For i = 1 to n
   p = D1/N1
   Generate a random continuous uniform u ~ U(0,1)
   N1 = N1 − 1
   If u < p, x = x + 1 and D1 = D1 − 1
   Next i.
3. Return x.
Example 5.7 Suppose a situation where a lot of ten units contains two defectives and a sample of size four is taken without replacement. The goal is to generate a random hypergeometric variate on the number of defective units, x, observed in the sample. Note, N = 10, D = 2 and n = 4. The steps below show how a random x is generated:
1. Set N1 = N = 10, D1 = D = 2 and n = 4.
2. Start loop.
3. At i = 1, p = D1/N1 = 0.200, u = 0.37 say. Since u ≥ p, set N1 = 9.
4. At i = 2, p = D1/N1 = 0.222, u = 0.51 say. Since u ≥ p, set N1 = 8.
5. At i = 3, p = D1/N1 = 0.250, u = 0.14 say. Since u < p, set x = 1, D1 = 1, N1 = 7.
6. At i = 4, p = D1/N1 = 0.143, u = 0.84 say. Since u ≥ p, set N1 = 6.
7. End loop.
8. Return x = 1.
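A Python sketch of the sequential draw, with an illustrative function name:

```python
import random

def hypergeometric_variate(N, D, n):
    # Sample n items one at a time without replacement; count defectives.
    N1, D1, x = N, D, 0
    for _ in range(n):
        if random.random() < D1 / N1:  # drawn item is defective
            x += 1
            D1 -= 1
        N1 -= 1                        # one fewer item remains either way
    return x

print(hypergeometric_variate(10, 2, 4))  # Example 5.7 parameters
```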

Geometric

A variable x is distributed by the geometric distribution when x measures the number of trials to achieve a success, and where the probability of a success, p, remains the same for each trial. The probability of x is listed below:

P(x) = p(1 − p)^(x−1)   x = 1, 2, ...

The cumulative distribution function of x is the following:

F(x) = 1 − (1 − p)^x   x = 1, 2, ...

The expected value and the corresponding variance of x are listed below:

E(x) = 1/p

V(x) = (1 − p)/p²

To generate a random geometric variate of x (the number of trials to achieve a success), the following routine is run:
1. Generate a random continuous uniform u ~ U(0,1).
2. x = integer[ln(1 − u)/ln(1 − p)] + 1, where ln = natural logarithm.
3. Return x.
Example 5.8 Suppose an experiment is run where the probability of a success is p = 0.20 and a random geometric variate of x, the number of trials till the first success, is needed. To accomplish this, the following three steps are shown:
1. Generate a random uniform from u ~ U(0,1), say u = 0.27.
2. x = integer[ln(1 − 0.27)/ln(1 − 0.2)] + 1 = 2.
3. Return x = 2.
When the variable is defined as the number of failures till obtaining a success, the variable is x′ = x − 1 and x′ = 0, 1, .... The probability of x′ is below:

P(x′) = p(1 − p)^x′   x′ = 0, 1, 2, ...

The cumulative distribution function is the following:

F(x′) = 1 − (1 − p)^(x′+1)   x′ = 0, 1, 2, ...

Also, E(x′) = E(x) − 1 = (1 − p)/p and V(x′) = V(x) = (1 − p)/p².
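Both forms reduce to one line of code with the inverse transform; a minimal sketch, with illustrative names:

```python
import math, random

def geometric_variate(p):
    # Number of trials until the first success (inverse transform).
    return int(math.log(1.0 - random.random()) / math.log(1.0 - p)) + 1

def geometric_failures(p):
    # Variant x' = x - 1: failures before the first success.
    return geometric_variate(p) - 1
```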



Pascal

A variable x follows the Pascal distribution when x represents the number of trials needed to gain k successes when the probability of a success is p. This distribution is also called the negative binomial distribution. The probability of x is listed below:

P(x) = C(x − 1, k − 1) p^k (1 − p)^(x−k)   x = k, k + 1, ...

The cumulative distribution function is,

F(x) = Σy=k..x P(y)   x = k, k + 1, ...

The mean and variance of x are given below:

E(x) = k/p

V(x) = k(1 − p)/p²

To generate a random Pascal variate of x, the following routine is run:
1. x = 0
2. For i = 1 to k
   Generate y, a random geometric variate with parameter p
   (Note, y = number of trials till a success.)
   x = x + y
   Next i
3. Return x
Example 5.9 Suppose x is distributed as a Pascal variable with p = 0.5 and k = 5, whereby x represents the number of trials until five successes. The following steps illustrate how x is generated:
1. x = 0.
2. At i = 1, generate a geometric y with p = 0.5, say y = 3. x = 3.
3. At i = 2, generate a geometric y with p = 0.5, say y = 1. x = 4.
4. At i = 3, generate a geometric y with p = 0.5, say y = 2. x = 6.
5. At i = 4, generate a geometric y with p = 0.5, say y = 4. x = 10.
6. At i = 5, generate a geometric y with p = 0.5, say y = 2. x = 12.
7. Return x = 12.
When the variable is defined as the number of failures till obtaining k successes, the variable is x′ = x − k and x′ = 0, 1, .... Also, E(x′) = E(x) − k = k(1 − p)/p and V(x′) = V(x) = k(1 − p)/p².
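A Python sketch of the Pascal routine as a sum of geometric variates; the names are illustrative:

```python
import math, random

def geometric_variate(p):
    # Trials until the first success (inverse transform).
    return int(math.log(1.0 - random.random()) / math.log(1.0 - p)) + 1

def pascal_variate(k, p):
    # Trials to reach k successes: sum of k geometric variates.
    return sum(geometric_variate(p) for _ in range(k))

print(pascal_variate(5, 0.5))  # Example 5.9 parameters
```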

Poisson

The variable x is described as Poisson distributed when events occur at a constant rate, θ, during a specified interval of time. Examples are the number of vehicles crossing an intersection each minute, or the demand for a product over a month's interval of time. The probability of x is listed below:

P(x) = θ^x e^(−θ)/x!   x = 0, 1, 2, ...

The expected value and variance of x are shown below:

E(x) = θ
V(x) = θ

Relation to the Exponential Distribution

The Poisson and the exponential distributions are related since the time, t, between events from a Poisson variable is distributed as exponential with E(t) = 1/θ. This relation is used to randomly generate a value of x.

Generating a Random Poisson Variate

To generate a random variate of x from the Poisson distribution with parameter θ, the following routine is run:
1. Set x = 0, i = 0 and St = 0.
2. i = i + 1; generate a random exponential variate, t, with E(t) = 1/θ, and set St = St + t.
3. If St > 1, go to step 5.
4. If St ≤ 1, x = x + 1; go to step 2.
5. Return x.
Example 5.10 Assume x is a random variable that follows the Poisson distribution where the expected number of occurrences in an interval of time is θ = 2.4. In the steps below, note how a random Poisson variate x is derived from randomly generated exponential t variates with expected value 1/2.4:
1. x = 0 and St = 0.
2. At i = 1, t = 0.21 say, St = 0.21, x = 1.
3. At i = 2, t = 0.43 say, St = 0.64, x = 2.
4. At i = 3, t = 0.09 say, St = 0.73, x = 3.
5. At i = 4, t = 0.31 say, St = 1.04.
6. Return x = 3.
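The routine in Python, as a minimal sketch with an illustrative name:

```python
import math, random

def poisson_variate(theta):
    # Accumulate exponential inter-event times (mean 1/theta); the count
    # of events before the sum exceeds one time unit is the variate.
    x, St = 0, 0.0
    while True:
        St += -math.log(1.0 - random.random()) / theta
        if St > 1.0:
            return x
        x += 1

print(poisson_variate(2.4))  # Example 5.10 parameter
```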

Example 5.11 A gas station is open 24 h a day where 200–300 vehicles arrive for gas each day, equally distributed. Eighty percent of the vehicles are cars, 15 % are trucks and 5 % are motorcycles. The consumption of gas per vehicle is a truncated exponential with a minimum and average known by vehicle type. Cars consume on average 11 gal with a minimum of 3 gal. Trucks consume a minimum of 8 gal and an average of 20 gal. The motorcycles consume a minimum of 2 gal and an average of 4 gal. The analyst wants to determine the distribution of the total consumption of gas for a day.
A simulation model is developed and run for 1,000 days. On the first day, 228 vehicles enter the station and for each of the 228 vehicles, the type of vehicle and the amount of gas consumed is randomly generated. The sum of gas consumed in the first day is G = 2,596 gal. The simulation is carried on for 1,000 days and the amount of gas consumed is recorded for each day. Now with 1,000 values of G, the gas consumed per day, the next step is to sort the values of G from low to high to yield G(1) ≤ G(2) ≤ ... ≤ G(1000). The p-quantile is estimated by G(p × 1000). For example, the 0.01 quantile is estimated using 0.01 × 1000 = 10, where G(10) = 2,202, the tenth smallest value of G. The table below lists various p-quantiles of the daily gas consumption.

p G(p)
0.01 2,202
0.05 2,336
0.10 2,436
0.20 2,546
0.30 2,658
0.40 2,768
0.50 2,883
0.60 3,014
0.70 3,138
0.80 3,261
0.90 3,396
0.95 3,503
0.99 3,656

The results show there is a 5 % chance that G will exceed 3,503 gal and a 1 % chance that G will exceed 3,656 gal. Further, an estimate of the 90 % prediction interval on G becomes P(2,336 ≤ G ≤ 3,503) = 0.90.
Example 5.12 An always-open truck dealership has ten bays to service trucks for maintenance and repair. The arrival of trucks is Poisson distributed with a rate of 6.67 vehicles per day, and the service per bay is also Poisson with a rate of 1.33 vehicles per day. The vehicles require n parts in the service operation and the probability of n, denoted as P(n), is as follows:

n 3 4 5 6 7 8
P(n) 0.11 0.17 0.22 0.28 0.18 0.04

Four types of parts are described depending on the source of the supplier: PDC (parts distribution center), OEM (original equipment manufacturer), DSH (direct ship supplier), and NST (non-stocked part). The table below lists the probability the vehicle needs one of these types of parts, P(type); the service level for each type of part, SL, where SL = P(part is available in dealer); and the lead time to obtain the part in days (LT). Note, the dealer is limited on space and budget and must set his service levels accordingly. The higher the service level, the more inventory in pieces and in investment.

Part type PDC OEM DSH NST


P(type) 0.50 0.30 0.19 0.01
SL 0.93 0.92 0.95 0.00
LT 1.25 1.50 2.50 1.50

A simulation model is developed and is run until 5,000 vehicles are processed in
the dealership. The first 500 vehicles are used as the transient stage, whereby the
equilibrium stage is for the final 4,500 vehicles. This is where all the measurements
are tallied. The table below shows some of the statistics gathered from the final
4,500 vehicles.
Bay averages per vehicle:

Empty time = 0.19 days
Service time = 0.75 days
Wait time for part(s) = 0.56 days
Total time = 1.50 days

Vehicle averages:

Wait time in yard = 0.09 days
Service time = 0.75 days
Wait time for part(s) = 0.56 days
Total time = 1.40 days

The results show the average bay is empty for 0.19 days for each vehicle it processes. Further, the average service time is 0.75 days, and the average idle time per vehicle waiting to receive the out-of-stock part(s) is 0.56 days. The average wait time a vehicle is in the yard prior to service is 0.09 days, and the average time in the dealership is 1.40 days. Note also that the average time between vehicles for a bay is 1.50 days.

Summary

This chapter shows how to transform continuous uniform random variates, u ~ U(0,1), to random discrete variates for a variable that comes from one of the more common discrete probability distributions. The probability distributions described here are the following: discrete arbitrary, discrete uniform, Bernoulli, binomial, hypergeometric, geometric, Pascal and Poisson.
Chapter 6
Generating Multivariate Random Variates

Introduction

When two or more random variables are jointly related in a probability way, they are labeled as multivariate random variables. The probability of the variables occurring together is defined by a joint probability distribution. In most situations, all of the variables included in the distribution are continuous or all are discrete; less often, they are a mixture of continuous and discrete. This chapter considers some of the more popular multivariate distributions and shows how to generate random variates for each. The probability distributions described here are the following: multivariate discrete arbitrary, multinomial, multivariate hypergeometric, bivariate normal, bivariate lognormal, multivariate normal and multivariate lognormal. The Cholesky decomposition method is also described since it is needed to generate random variates from the multivariate normal and the multivariate lognormal distributions.

Multivariate Discrete Arbitrary

Suppose k discrete random variables (x1, ..., xk) are jointly related by the probability distribution P(x1, ..., xk). The sum of the probabilities over all possible values of the k variables is one, i.e.,

Σx1...xk P(x1, ..., xk) = 1.0

Consider one of the k variables, say xj. The marginal probability of xj, denoted as P(xj..), is obtained by summing the joint probability distribution over all xi except xj, as shown below,

P(xj..) = Σall x but xj P(x1, x2, ..., xk)


The partial expectation of xj is obtained as follows,

E(xj..) = Σxj xj P(xj..)

and the partial variance is

V(xj..) = E(xj²..) − E(xj..)²

where

E(xj²..) = Σxj xj² P(xj..)

Generate a Random Set of Variates

The steps below show how to generate a random set of variates for the variables x1, x2, ..., xk:
1. Get P(x1..), the marginal probability of x1, and also F(x1..), the corresponding cumulative distribution.
   Generate a random continuous uniform variate u ~ U(0,1).
   Locate the smallest value of x1 where u < F(x1..), say x10.
2. Get P(x2|x10..), the marginal probability of x2 given x10, and also F(x2|x10..), the corresponding cumulative distribution.
   Generate a random continuous uniform variate u ~ U(0,1).
   Locate the smallest value of x2 where u < F(x2|x10..), say x20.
3. Get P(x3|x10x20..), the marginal probability of x3 given x10 and x20, and also F(x3|x10x20..), the corresponding cumulative distribution.
   Generate a random continuous uniform variate u ~ U(0,1).
   Locate the smallest value of x3 where u < F(x3|x10x20..), say x30.
4. Repeat in the same way until xk0 is obtained.
5. Return (x10, ..., xk0).
Example 6.1 Suppose a three-variable joint probability distribution with variables x1, x2, x3, where the possible values for x1 are 0, 1, 2; for x2 they are 0, 1; and for x3 they are 1, 2, 3. The probability distribution P(x1,x2,x3) is listed below; note the sum of all probabilities is one. Below shows how to generate one set of random variates.

P(x1,x2,x3)

            x2 = 0              x2 = 1
x3:       1     2     3       1     2     3
x1
0       0.12  0.10  0.08    0.08  0.06  0.05
1       0.08  0.06  0.04    0.05  0.04  0.03
2       0.06  0.04  0.02    0.04  0.03  0.02

1. The marginal distribution for x1 and the associated cumulative distribution is below:
   P(0..) = 0.49  F(0..) = 0.49
   P(1..) = 0.30  F(1..) = 0.79
   P(2..) = 0.21  F(2..) = 1.00
   Generate a random u ~ U(0,1), u = 0.37 say. Hence x10 = 0.
2. The marginal distribution for x2 given x10 = 0, and the associated cumulative distribution is below:
   P(0|0.) = 0.30/0.49 = 0.612  F(0|0.) = 0.612
   P(1|0.) = 0.19/0.49 = 0.388  F(1|0.) = 1.000
   Generate a random u ~ U(0,1), u = 0.65 say. Hence x20 = 1.
3. The marginal distribution for x3 given x10 = 0 and x20 = 1, and the associated cumulative distribution is below:
   P(1|01) = 0.08/0.19 = 0.421  F(1|01) = 0.421
   P(2|01) = 0.06/0.19 = 0.316  F(2|01) = 0.737
   P(3|01) = 0.05/0.19 = 0.263  F(3|01) = 1.000
   Generate a random u ~ U(0,1), u = 0.84 say. Hence x30 = 3.
4. Return (x10, x20, x30) = (0, 1, 3)
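A compact Python sketch of this conditional decomposition, using the joint distribution of Example 6.1; the dictionary layout and function name are illustrative:

```python
import random

# Joint distribution keyed by (x1, x2, x3), from Example 6.1.
P = {
    (0,0,1):0.12, (0,0,2):0.10, (0,0,3):0.08, (0,1,1):0.08, (0,1,2):0.06, (0,1,3):0.05,
    (1,0,1):0.08, (1,0,2):0.06, (1,0,3):0.04, (1,1,1):0.05, (1,1,2):0.04, (1,1,3):0.03,
    (2,0,1):0.06, (2,0,2):0.04, (2,0,3):0.02, (2,1,1):0.04, (2,1,2):0.03, (2,1,3):0.02,
}

def sample_joint(P):
    chosen = ()
    k = len(next(iter(P)))
    for pos in range(k):
        # Marginal weights of this coordinate given the values already fixed.
        weights = {}
        for key, p in P.items():
            if key[:pos] == chosen:
                weights[key[pos]] = weights.get(key[pos], 0.0) + p
        total = sum(weights.values())
        u, cum, pick = random.random(), 0.0, max(weights)
        for value in sorted(weights):
            cum += weights[value] / total
            if u < cum:
                pick = value
                break
        chosen += (pick,)
    return chosen

print(sample_joint(P))
```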

Multinomial

Suppose an experiment has k mutually exclusive possible outcomes, A1, ..., Ak, with probabilities p1, ..., pk, respectively, where Σi=1..k pi = 1.0. With n independent trials of the experiment, the random variables are x1, ..., xk, representing the number of times event Ai (i = 1, ..., k) has occurred. Note Σi=1..k xi = n. The probability of (x1, ..., xk) is as follows:

P(x1, ..., xk) = n!/[x1! ··· xk!] p1^x1 ··· pk^xk



The marginal probability of each individual variable, xi, is a binomial random variable with parameters n and pi. The associated mean and variance of xi are listed below:

E(xi) = npi

V(xi) = npi(1 − pi)

Generating Random Multinomial Variates

The steps below show how to randomly generate a set of multinomial variates (x1, ..., xk) from n trials with probabilities p1, ..., pk:
1. For i = 1 to k
   pi′ = pi/(pi + pi+1 + ··· + pk)
   ni′ = n − (x1 + ··· + xi−1)
   Generate a random binomial variate, xi, with parameters ni′ and pi′.
   Next i
2. Return x1, ..., xk
Example 6.2 Suppose an experiment with three possible outcomes with probabilities 0.5, 0.3 and 0.2, respectively, where five trials of the experiment are run. The need is to randomly generate the multinomial variate set (x1, x2, x3) for this situation. The steps below show how this is done:
1. At i = 1, with parameters n1′ = 5, p1′ = 0.5, generate a random binomial, say x1 = 2.
2. At i = 2, with parameters n2′ = 3, p2′ = 0.6, generate a random binomial, say x2 = 2.
3. At i = 3, with parameters n3′ = 1, p3′ = 1.0, generate a random binomial, say x3 = 1.
4. Return (x1, x2, x3) = (2, 2, 1).
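A Python sketch of the sequential-binomial method; the small-n binomial routine from Chapter 5 is reused, and the names are illustrative:

```python
import random

def binomial_variate(n, p):
    # Count successes in n Bernoulli trials (small-n routine).
    return sum(random.random() < p for _ in range(n))

def multinomial_variates(n, probs):
    # Each count is binomial on the trials not yet assigned,
    # with the probability renormalized over the remaining outcomes.
    xs, remaining, tail = [], n, sum(probs)
    for p in probs:
        x = binomial_variate(remaining, p / tail)
        xs.append(x)
        remaining -= x
        tail -= p
    return xs

print(multinomial_variates(5, [0.5, 0.3, 0.2]))  # Example 6.2 parameters
```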

Multivariate Hypergeometric

Suppose a population of N items where some are non-defective and the remainder are defective, falling into k defective categories with numbers of defectives D1, ..., Dk. A sample of n items is taken without replacement and the outcomes are x1, ..., xk defective items. Note, xi = number of defective items of the i-th category in the sample. The random variables follow the multivariate hypergeometric distribution. The probability distribution is listed below,

P(x1, ..., xk) = C(D1, x1) ··· C(Dk, xk) C(N − SD, n − Sx) / C(N, n)

where

SD = Σi=1..k Di = sum of defective items in the population
Sx = Σi=1..k xi = sum of defective items in the sample
n − Sx = sum of non-defective items in the sample

Generating Random Variates

To generate a random set of output variates for the multivariate hypergeometric distribution, the following routine is run. Recall the notation of N, n, D1, ..., Dk and x1, ..., xk:
1. Initialize D1i = Di for i = 1 to k, and N1 = N.
2. For j = 1 to n
   Generate a random uniform u ~ U(0,1).
   F = 0
   For i = 1 to k
     p = D1i/N1
     F = F + p
     If u < F, xi = xi + 1, D1i = D1i − 1, go to 3
   Next i
3. N1 = N1 − 1
   Next j
4. Return (x1, ..., xk).
Example 6.3 Suppose a lot of size 20 comes into a receiving dock with three types of defectives. There are 4 defectives of type 1, 3 defectives of type 2, and 2 defectives of type 3. Eleven of the items have no defects. A sample of size four is taken without replacement and the need is to generate a random set of output variates, x1, x2, x3. The method to obtain the variates is shown below. For simplicity, the fractions are carried out only to two decimal places.

1. Initialize N1 = 20, (D11, D12, D13) = (4,3,2), x1 = 0, x2 = 0, x3 = 0.
2. At j = 1, u = 0.71 say, F = 0.
   At i = 1, p = 4/20 = 0.20, F = 0.20
   At i = 2, p = 3/20 = 0.15, F = 0.35
   At i = 3, p = 2/20 = 0.10, F = 0.45
   N1 = 19
3. At j = 2, u = 0.34 say, F = 0.
   At i = 1, p = 4/19 = 0.21, F = 0.21
   At i = 2, p = 3/19 = 0.16, F = 0.37, x2 = x2 + 1, D12 = 2
   N1 = 18
4. At j = 3, u = 0.63 say, F = 0.
   At i = 1, p = 4/18 = 0.22, F = 0.22
   At i = 2, p = 2/18 = 0.11, F = 0.33
   At i = 3, p = 2/18 = 0.11, F = 0.44
   N1 = 17
5. At j = 4, u = 0.14 say, F = 0.
   At i = 1, p = 4/17 = 0.23, F = 0.23, x1 = x1 + 1, D11 = 3
   N1 = 16
6. Return (x1, x2, x3) = (1,1,0)

Bivariate Normal

Consider variables x1 and x2 that are jointly related via the bivariate normal distribution (BVN) as below:

f(x1,x2) = 1/[2πσ1σ2√(1 − ρ²)] exp{−{[(x1 − μ1)/σ1]² + [(x2 − μ2)/σ2]² − 2ρ[(x1 − μ1)(x2 − μ2)/σ1σ2]}/[2(1 − ρ²)]}

where (μ1, μ2, σ1, σ2, ρ) are the five parameters of the distribution.

Marginal Distributions

The marginal distributions of x1 and x2 are normally distributed, whereby,

x1 ~ N(μ1, σ1²)

and

x2 ~ N(μ2, σ2²)

The expected value and variance of x1 are,

E(x1) = μ1

V(x1) = σ1²

respectively. The corresponding values for x2 are

E(x2) = μ2

V(x2) = σ2²

The correlation between x1 and x2 is ρ. Note,

ρ = σ12/(σ1σ2)

where σ12 is the covariance between x1 and x2. The covariance is also denoted as C(x1,x2) and is obtained from,

C(x1,x2) = E(x1x2) − E(x1)E(x2)

Conditional Distributions

When x1 = x10, say, the conditional mean of x2 is

μx2|x10 = μ2 + ρ(σ2/σ1)(x10 − μ1)

The corresponding variance is

σ²x2|x1 = σ2²(1 − ρ²)

and is the same for all values of x1. The associated conditional distribution of x2 given x10 is also normally distributed as,

x2|x10 ~ N(μx2|x10, σ²x2|x1)

In the same way, when x2 = x20, say, the conditional mean of x1 is

μx1|x20 = μ1 + ρ(σ1/σ2)(x20 − μ2)

The corresponding variance is

σ²x1|x2 = σ1²(1 − ρ²)

and is the same for all values of x2. The associated conditional distribution of x1 given x20 is also normally distributed as,

x1|x20 ~ N(μx1|x20, σ²x1|x2)

Generate Random Variates (x1, x2)

To generate a random pair of x1 and x2 with parameters (μ1, μ2, σ1, σ2, ρ), the following routine is run:
1. Generate a random standard normal, z ~ N(0,1), say z1.
2. A random x1 is computed by x10 = μ1 + z1σ1.
3. The conditional mean and variance of x2 now become μx2|x10 = μ2 + ρ(σ2/σ1)(x10 − μ1) and σ²x2|x1 = σ2²(1 − ρ²), respectively.
4. Generate another random standard normal, z ~ N(0,1), say z2.
5. The random x2 is computed by x20 = μx2|x10 + z2σx2|x1.
6. Return (x10, x20).
Example 6.4 Suppose the pair (x1,x2) are related by the bivariate normal distribution with parameters (5, 8, 1, 4, 0.5) and a random variate of the pair is needed. The following routine is run:
1. Generate a random standard normal from z ~ N(0,1), say z1 = 0.72.
2. The random variate of x1 becomes x10 = 5 + 0.72(1) = 5.72.
3. The mean and variance of the conditional x2 variable now are μx2|5.72 = 8 + 0.5(2/1)(5.72 − 5.00) = 8.72, and σ²x2|5.72 = 4(1 − 0.5²) = 3.
4. The conditional standard deviation is σx2|5.72 = 3^0.5 = 1.732.
5. Generate another standard normal variate of z ~ N(0,1), say z2 = −1.08.
6. The random variate for x2 now becomes x20 = 8.72 − 1.08(1.732) = 6.85.
7. Return (x10, x20) = (5.72, 6.85).
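A Python sketch of the conditional method; the function name is illustrative, and σ2 = 2 is used to match the arithmetic in Example 6.4:

```python
import math, random

def bivariate_normal(mu1, mu2, s1, s2, rho):
    # Draw x1 from its marginal, then x2 from the normal conditional
    # distribution of x2 given x1.
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x1 = mu1 + z1 * s1
    cond_mean = mu2 + rho * (s2 / s1) * (x1 - mu1)
    cond_sd = s2 * math.sqrt(1.0 - rho ** 2)
    return x1, cond_mean + z2 * cond_sd

print(bivariate_normal(5.0, 8.0, 1.0, 2.0, 0.5))
```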
Example 6.5 Consider the bivariate normal variables (x1,x2) with parameters μ1 = 0, μ2 = 0, σ1 = 1, σ2 = 1 and ρ. A researcher is seeking the four following cumulative probability distributions, F(0,0), F(1,0), F(0,1), F(1,1), at the three correlations ρ = −0.5, 0.0 and 0.5. Recall, F(x10,x20) = P(x1 ≤ x10 and x2 ≤ x20). To comply, a simulation model is developed to find the probabilities needed. For a given x10 and x20, n trials are run where the values of x1 and x2 are randomly generated to conform with the stated parameters. In the n trials, g represents the number of times the generated values were both less than or equal to x10 and x20. The estimate of the probability is computed by F(x10,x20) = g/n.
The table below lists the simulation findings where various numbers of trials are run at n = 50, 100, 500 and 1,000. Note, at ρ = 0, the true value of the cumulative distribution is known at x10 and x20 and is listed in the table at n = ∞. The results point to the need for more trials to sharpen the probability estimates.

ρ        n      F(0,0)  F(1,0)  F(0,1)  F(1,1)
−0.50    50     0.180   0.310   0.460   0.760
        100     0.160   0.410   0.340   0.710
        500     0.174   0.372   0.372   0.696
      1,000     0.159   0.360   0.357   0.652
 0.00    50     0.360   0.580   0.480   0.780
        100     0.250   0.490   0.330   0.650
        500     0.234   0.412   0.400   0.668
      1,000     0.233   0.395   0.405   0.687
          ∞     0.250   0.421   0.421   0.708
 0.50    50     0.380   0.500   0.520   0.760
        100     0.270   0.430   0.420   0.660
        500     0.326   0.476   0.448   0.742
      1,000     0.370   0.468   0.509   0.748

Bivariate Lognormal

When the pair (x1,x2) is bivariate lognormal (BVLN), the distribution is noted as BVLN(μy1, μy2, σy1, σy2, ρy), where μy1 and μy2 are the means of y1 = ln(x1) and y2 = ln(x2). Also, σy1 and σy2 are the corresponding standard deviations of y1 and y2, and ρy is the correlation between y1 and y2. The transformed pair (y1, y2) is distributed by the bivariate normal distribution and the notation is BVN(μy1, μy2, σy1, σy2, ρy). Note x1 = e^y1 and x2 = e^y2.

Generate a Random Pair (x1, x2)

To generate a random pair (x1, x2), the following routine is run:
1. Find μy1, μy2, σy1, σy2 and ρy.
2. From the standard normal, z ~ N(0,1), generate two random values, say z1, z2.
3. Get a random y1 by y10 = μy1 + z1σy1.
4. Now find the conditional mean and standard deviation of y2, given y10, from μy2|y10 = μy2 + ρy(σy2/σy1)(y10 − μy1), and σ²y2|y1 = σy2²(1 − ρy²).
5. Get a random y2 by y20 = μy2|y10 + z2σy2|y1.
6. Now, x1 = e^y10 and x2 = e^y20.
7. Return (x1, x2).
Example 6.6 Assume (x1, x2) is a pair of bivariate lognormal variables with parameters BVLN(5, 8, 1, 2, 0.5), and a set of random variates is needed. The following steps are taken:
1. From z ~ N(0,1), get a pair of random variates, say z1 = 0.72 and z2 = −1.08.
2. A random y1 becomes y10 = 5 + 0.72(1) = 5.72.
3. The mean and standard deviation of y2 given y10 = 5.72 are μy2|5.72 = 8.72 and σy2|5.72 = 1.732.
4. So now, the random y2 becomes y20 = 8.72 − 1.08(1.732) = 6.85.
5. Finally, the pair needed are x1 = e^5.72 = 304.90 and x2 = e^6.85 = 943.88.
6. Return (x1, x2) = (304.90, 943.88).

Multivariate Normal

When k variables, x1, ..., xk, are related by the multivariate normal distribution, the parameters are μ and Σ. The parameter μ is a k-dimensional vector whose transpose μᵀ = [μ1, ..., μk] houses the mean of each of the variables, and Σ is a k-by-k matrix that contains σij in row i and column j, where σij is the covariance between variables i and j. Note, the covariances along the main diagonal, σii, are the variances of variable i, i = 1 to k. Thus σi² = σii, i = 1 to k.

Cholesky Decomposition

An important relation is the Cholesky decomposition of the matrix Σ where,

Σ = CCᵀ

and C is a k-by-k matrix whose upper triangle is all zeros while the diagonal and lower triangle contain the elements cij. For a full discussion, see Gentle (1998). The values of the elements of matrix C are computed in the three stages listed below:

Column 1 elements: ci1 = σi1/√σ11,  i = 1, ..., k

Main diagonal elements: cii = √(σii − Σm=1..i−1 c²im),  i = 2, ..., k

Lower triangle elements: cij = (σij − Σm=1..j−1 cim cjm)/cjj,  1 < j < i ≤ k

As stated, the upper-triangle elements of C are all zero.

Generate a Random Set [x1, . . . , xk]

To generate a random set of variates from the k-dimensional multivariate normal distribution, the following routine is run:
1. From the variance-covariance matrix Σ, compute the matrix C.
2. Using the standard normal distribution, z ~ N(0,1), generate k random variates z1, ..., zk, and insert them in a k-dimensional vector Z, where the transpose is Zᵀ = [z1, ..., zk].
3. The random variates of the k variables will be placed in a k-dimensional vector X, whose transpose is Xᵀ = [x1, ..., xk].
4. So now, the k random variates of the k variables are obtained by the matrix manipulation X = CZ + μ. Another way is to compute each xi as below:

xi = Σj=1..k cij zj + μi,  i = 1, ..., k

Example 6.7 When k = 2, the matrices Σ, C and X are as below:

Σ = | σ1²     ρσ1σ2 |
    | ρσ2σ1   σ2²   |

C = | σ1    0           |
    | ρσ2   σ2√(1 − ρ²) |

X = | x1 | = | μ1 + z1σ1                  |
    | x2 |   | μ2 + z1ρσ2 + z2σ2√(1 − ρ²) |

Example 6.8 Consider the k = 3 dimensional multivariate normal distribution with the mean vector and variance-covariance matrix as below:

μ = | 100 |
    |  80 |
    | 140 |

Σ = | 64  20   10 |
    | 20  25   36 |
    | 10  36  100 |

The C matrix becomes:

C = | 8     0     0    |
    | 2.5   4.33  0    |
    | 1.25  9.04  4.09 |

When the three entries of the standard normal variates are z1 = −0.02, z2 = 2.00, z3 = −1.20, and the matrix manipulation X = CZ + μ is applied, the random normal variates become:

X = | x1 | = |  99.84 |
    | x2 |   |  88.16 |
    | x3 |   | 153.20 |
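A minimal sketch of the same computation using numpy's library Cholesky factor (its entries need not match the hand-computed C above exactly):

```python
import numpy as np

mu = np.array([100.0, 80.0, 140.0])
Sigma = np.array([[64.0, 20.0,  10.0],
                  [20.0, 25.0,  36.0],
                  [10.0, 36.0, 100.0]])

C = np.linalg.cholesky(Sigma)      # lower triangular, Sigma = C C^T
Z = np.random.standard_normal(3)   # k standard normal variates
X = C @ Z + mu                     # X = CZ + mu
print(X)
```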

Multivariate Lognormal

When k variables, x1, ..., xk, are related by the multivariate lognormal distribution, the associated multivariate normal variables are y1, ..., yk, where yi = ln(xi), i = 1, ..., k. The parameters for this distribution are derived from the k variables y1, ..., yk. The parameters are listed in the k-dimensional vector μ, whose transpose is μᵀ = [μ1, ..., μk], that houses the mean of each of the yi variables, and the k-by-k matrix Σ that contains σij in row i and column j, where σij is the covariance between variables yi and yj. Note, the covariances along the main diagonal, σii, are the variances of variable yi, i = 1 to k. Thus σi² = σii, i = 1 to k.

Cholesky Decomposition

The Cholesky decomposition of the matrix Σ is used here where,

Σ = CCᵀ

and C is a k-by-k matrix whose upper triangle is all zeros while the diagonal and lower triangle contain the elements cij. As shown earlier, the values of the elements of matrix C are computed in the three stages listed below:

Column 1 elements: ci1 = σi1/√σ11,  i = 1, ..., k

Main diagonal elements: cii = √(σii − Σm=1..i−1 c²im),  i = 2, ..., k

Lower triangle elements: cij = (σij − Σm=1..j−1 cim cjm)/cjj,  1 < j < i ≤ k

The upper-triangle elements of C are all zero.



Generate a Random Set [x1, . . . , xk]

To generate a random set of variates from the k-dimensional multivariate lognormal distribution, the following routine is run:
1. Convert the lognormal variables x to y by yi = ln(xi), i = 1, ..., k.
2. From the data of (y1, ..., yk), compute the mean vector μ and the variance-covariance matrix Σ.
3. From Σ compute the matrix C.
4. Using the standard normal distribution, z ~ N(0,1), generate k random variates z1, ..., zk, and insert them in a k-dimensional vector Z, where the transpose is Zᵀ = [z1, ..., zk].
5. The random variates of the k variables will be placed in a k-dimensional vector Y, whose transpose is Yᵀ = [y1, ..., yk].
6. So now, the k random variates of the k variables are obtained by the matrix manipulation Y = CZ + μ. Another way is to compute each yi as below:

yi = Σj=1..k cij zj + μi,  i = 1, ..., k

7. Finally, convert the k random y variates to k random x variates by the relation xi = e^yi, i = 1, ..., k.
8. Return (x1, ..., xk).
Example 6.9 Consider variables (x1, x2, x3) from the multivariate lognormal distribution with converted variables (y1, y2, y3) from the associated multivariate normal distribution, and their parameters below:

μy = | 5 |
     | 2 |
     | 8 |

Σy = | 64  20   10 |
     | 20  25   36 |
     | 10  36  100 |

The C matrix from the above becomes:

C = | 8     0     0    |
    | 2.5   4.33  0    |
    | 1.25  9.04  4.09 |

When the three entries of the standard normal variates are z1 = −0.72, z2 = 1.00, z3 = −1.04, and the matrix manipulation Y = CZ + μy is applied, the random normal variates become:

Y = | y1 | = |  0.76 |
    | y2 |   |  8.13 |
    | y3 |   | −4.11 |

Finally, convert y1, y2, y3 to x1, x2, x3 by xi = e^yi for i = 1, 2, 3. The multivariate lognormal variates are below:

X = | x1 | = |    1.82 |
    | x2 |   | 3394.80 |
    | x3 |   |    0.02 |

Summary

This chapter considers some of the more popular multivariate distributions and shows how to generate random variates for each. The probability distributions described are the following: multivariate discrete arbitrary, multinomial, multivariate hypergeometric, bivariate normal, bivariate lognormal, multivariate normal and multivariate lognormal. The Cholesky decomposition method is also presented because of its important role in generating random variates from the multivariate normal and multivariate lognormal distributions.
Chapter 7
Special Applications

Introduction

This chapter shows how to generate random variates for applications that are not
directly bound by a probability distribution as was described in some of the earlier
chapters. The applications are instructively useful and often are needed as such in
simulation models. They are the following: Poisson process, constant Poisson
process, batch arrivals, active redundancy, standby redundancy, random integers
without replacement and poker.

Poisson Process

There are many simulation models where a series of events takes place over a fixed time horizon, say from t = 0 to T. In this section, the arrivals are from a Poisson process, whereby the times between events are distributed via the exponential density. Further, when the expected time between arrivals changes over the time horizon, additional information is needed concerning various points of time in the interval, B(j), and the associated expected time between arrivals, A(j). At t = 0, the average time between arrivals is A(1), and the point in time is B(1) = 0. An interval of time later, say at B(2), the average time between arrivals is A(2), and so forth. In this way, A(j) and B(j) jointly identify how the average time between arrivals varies from t = 0 to T. At t = T, the last entry of j occurs and is denoted as j = J, whereby B(J) = T and A(J) gives the associated average time at t = T. For any other time from 0 to T, interpolation is used to determine the average time at t, as described in the routine below:
1. Parameters and initial values: T = length of time horizon, J = number of points in time from 0 to T, B(j) = the j-th point in time, and A(j) = average time between arrivals at time B(j), j = 1 to J; t(0) = 0 is the starting point, and n is an index.


2. Using t(n) and {B(j), j = 1 to J}, find the largest jo where B(jo) ≤ t(n).
3. The average time between arrivals becomes: A = A(jo) + {[t(n) − B(jo)]/[B(jo + 1) − B(jo)]}[A(jo + 1) − A(jo)].
4. Get a random uniform u ~ U(0,1).
5. Use A and u and the exponential density to generate the random time between the arrivals, x = −A ln(1 − u).
6. If [t(n) + x] ≤ T, n = n + 1, t(n) = t(n − 1) + x, go to 2.
7. If [t(n) + x] > T, end; go to 8.
8. Return {t(i), i = 1 to n}.
Example 7.1 Suppose T = 24 and J = 5, with {B(j), j = 1 to 5} = {0, 8, 14, 18, 24} and {A(j), j = 1 to 5} = {5, 4, 2, 3, 5}, and the times between arrivals are from a Poisson process, with the exponential density. The routine below shows how to find the time of arrival for the first three units:
1. n = 0.
   At t(0) = 0, jo = 1, since B(1) ≤ t(0) < B(2).
   The average time is: A = 5 + (0 − 0)/(8 − 0) × (4 − 5) = 5.0.
   Get u ~ U(0,1), assume u = 0.72.
   The random time is: x = −5.0 × ln(1 − 0.72) = 6.36.
   n = n + 1 = 1, t(1) = 0 + 6.36 = 6.36.
2. n = 1.
   At t(1) = 6.36, jo = 1, since B(1) ≤ t(1) < B(2).
   The average time is: A = 5 + (6.36 − 0)/(8 − 0) × (4 − 5) = 4.205.
   Get u ~ U(0,1), assume u = 0.38.
   The random time is: x = −4.205 × ln(1 − 0.38) = 2.01.
   n = n + 1 = 2, t(2) = 6.36 + 2.01 = 8.37.
3. n = 2.
   At t(2) = 8.37, jo = 2, since B(2) ≤ t(2) < B(3).
   The average time is: A = 4 + (8.37 − 8)/(14 − 8) × (2 − 4) = 3.877.
   Get u ~ U(0,1), assume u = 0.17.
   The random time is: x = −3.877 × ln(1 − 0.17) = 0.72.
   n = n + 1 = 3, t(3) = 8.37 + 0.72 = 9.09.
4. The first three arrivals occur at times: 6.36, 8.37, 9.09.
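A Python sketch of the interpolated Poisson-process routine; the function name is illustrative, and the knots are those of Example 7.1:

```python
import math, random

def arrival_times(T, B, A):
    # Arrival times on [0, T]; the expected inter-arrival time is
    # linearly interpolated between the knots (B[j], A[j]).
    times, t = [], 0.0
    while True:
        jo = max(j for j in range(len(B)) if B[j] <= t)  # largest B[jo] <= t
        if jo == len(B) - 1:
            a = A[-1]
        else:
            a = A[jo] + (t - B[jo]) / (B[jo + 1] - B[jo]) * (A[jo + 1] - A[jo])
        x = -a * math.log(1.0 - random.random())  # exponential gap
        if t + x > T:
            return times
        t += x
        times.append(t)

print(arrival_times(24, [0, 8, 14, 18, 24], [5, 4, 2, 3, 5]))
```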

Constant Poisson Process

In the event the expected inter-arrival time is the same for the whole time horizon, then J = 2 points span the time horizon: B(1) = 0 is the start time, B(2) = T is the end time, and A(1) = A(2) is the common expected time between arrivals.

Batch Arrivals

Consider a simulation model where units arrive to a system in batch sizes of one or more. The model generates the random time of arrival and the associated batch size. One way to describe the batch size distribution is by the modified Poisson distribution. Since each individual batch size, x, is one or larger, the expected value of x is E(x) ≥ 1.0. The modified Poisson becomes x = y + 1, where y is a Poisson variable with mean μ = E(x) − 1. So, to generate a random x, the following routine is run:
1. For a Poisson variable y with parameter μ, generate a random Poisson y.
2. Set x = y + 1.
3. Return x.
3. Return x.
Example 7.2 Suppose the average batch size for a simulation run is 1.6, and a random variate of the batch size, x, is needed. The following routine is run:
1. From the Poisson distribution with parameter μ = E(x) − 1 = 0.6, generate a random y. Assume the random Poisson y = 0.
2. Hence, x = y + 1 = 0 + 1 = 1.
3. Return x = 1.

Active Redundancy

An active redundancy is when the reliability of a unit with m components is satisfied as long as one of the components of the unit is still running. The m components run simultaneously, and the run time of the unit is the time at which the last component fails. Assume the run time, y, of each component is based on the exponential distribution with parameter θ. The run times for the m components are (y1, ..., ym) and the run time for the unit becomes x = max(y1, ..., ym).

Generate a Random Variate

To generate a random variate for a unit with an active redundancy of m components with expected run time E(y) = 1/θ, the following routine is run:
1. For i = 1 to m.
2. From the continuous uniform u ~ U(0,1), generate a random u.
3. Generate a random exponential by yi = −(1/θ) ln(1 − u).
4. Next i.
5. x = max(y1, ..., ym).
6. Return x.

Example 7.3 Consider a unit that has three active redundant components with run times following the exponential density and each with an expected run time of 10 h. The routine below generates one random variate of the unit run time:
1. At i = 1, generate a random exponential with mean 10, say y1 = 7.4.
2. At i = 2, generate a random exponential with mean 10, say y2 = 15.1.
3. At i = 3, generate a random exponential with mean 10, say y3 = 4.2.
4. The run time for the unit is x = max(7.4, 15.1, 4.2) = 15.1.
5. Return x = 15.1.
Example 7.4 A component is in the design stage and will include m identical subcomponents in an active redundancy manner. All m subcomponents start and run together. The component run time ends when the last subcomponent fails. The time to fail, t, for each subcomponent follows a gamma distribution with parameters k = 3 and 1/θ = 6.0 (1,000 h), whereby the time to fail, t, has a mean of E(t) = 18 (1,000 h). The design engineer wants to know the minimum number of subcomponents to include in the active redundancy package so that the time to fail for the component, T, has a reliability R of 0.99 or greater at 20 (1,000 h). Note T = max(t1, ..., tm) and the goal is to have R = P[T ≥ 20 (1,000 h)] ≥ 0.99.
A simulation model is developed to find the minimum number of subcomponents to achieve the specified reliability. In the table, m denotes the number of subcomponents; n is the number of trials in a run, where in each trial T is computed from m random variates of the gamma distribution with the stated parameters; g is the number of trials in the run where T ≥ 20 (1,000 h); and R = g/n is an estimate of the probability that T will be 20 (1,000 h) or greater. The simulation results are listed below where the number of subcomponents (m) increases by one with each simulation run of n = 1,000 trials. Note, at m = 11, R = 0.993 and at m = 12, R = 0.998. Hence, the minimum value of m is 11 and the reliability is estimated as R = 0.993.

m n g R
1 1,000 318 0.318
2 1,000 588 0.588
3 1,000 707 0.707
4 1,000 802 0.802
5 1,000 883 0.883
6 1,000 927 0.927
7 1,000 961 0.961
8 1,000 975 0.975
9 1,000 973 0.973
10 1,000 980 0.980
11 1,000 993 0.993
12 1,000 998 0.998
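A Python sketch of the reliability estimation; gamma variates with integer shape k = 3 are built as sums of three exponentials, and the names are illustrative:

```python
import random

def gamma_k3(scale):
    # Gamma(k = 3, 1/theta = scale): sum of three exponential variates.
    return sum(random.expovariate(1.0 / scale) for _ in range(3))

def reliability_active(m, n=1000, scale=6.0, target=20.0):
    # Fraction of trials where the last of m subcomponents fails at or
    # after the target time (active redundancy: T = max of the m times).
    g = sum(max(gamma_k3(scale) for _ in range(m)) >= target for _ in range(n))
    return g / n

for m in range(1, 13):
    print(m, reliability_active(m))
```

For the standby case described next, replacing max with sum estimates R for T = t1 + ... + tm.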

Standby Redundancy

A unit with standby redundancy is defined when the run time of the unit is the sum of the run times of m components that are run one after the other. That is, when one component fails, another starts running. This model assumes the run time, y, of each component is based on the exponential distribution with parameter θ. The run times for the m components are (y1, ..., ym), and the run time for the unit becomes x = (y1 + ... + ym).

Generate a Random Variate

To generate a random variate for a unit with a standby redundancy of m components with expected run time E(y) = 1/θ, the following routine is run:
1. For i = 1 to m.
2. From the continuous uniform u ~ U(0,1), generate a random u.
3. Generate a random exponential by yi = −(1/θ) ln(1 − u).
4. Next i.
5. x = (y1 + ... + ym).
6. Return x.
Example 7.5 Consider a unit that has four standby redundant components with run times following the exponential density and each with an expected run time of 5 h. The routine below generates one random variate of the unit run time:
1. At i = 1, generate a random exponential with mean 5, say y1 = 2.7.
2. At i = 2, generate a random exponential with mean 5, say y2 = 9.3.
3. At i = 3, generate a random exponential with mean 5, say y3 = 7.2.
4. At i = 4, generate a random exponential with mean 5, say y4 = 1.8.
5. The run time for the unit is x = (2.7 + 9.3 + 7.2 + 1.8) = 19.0.
6. Return x = 19.0.
Example 7.6 A component is in the design stage and will include m identical subcomponents in a standby redundancy manner. One subcomponent runs at a time; when it fails, and another is still available, the next subcomponent starts its run. The component run time ends when the last subcomponent fails. The time to fail for each subcomponent follows a gamma distribution with parameters k = 3 and 1/θ = 6.0 (1,000 h), whereby the time to fail, t, has a mean of E(t) = 18 (1,000 h). The design engineer wants to know the minimum number of subcomponents to include in the standby redundancy package so that the time to fail for the component, T, has a reliability R of 0.99 or larger at 20 (1,000 h). Note T = t1 + ... + tm and the goal is to have R = P[T ≥ 20 (1,000 h)] ≥ 0.99.
A simulation model is developed to find the minimum number of subcomponents to achieve the specified reliability. In the table, m denotes the number of subcomponents; n is the number of trials in a run, where in each trial T is computed from m random variates of the gamma distribution with the stated parameters; g is the number of trials in the run where T ≥ 20 (1,000 h); and R = g/n is an estimate of the probability that T will be 20 (1,000 h) or greater. The simulation results are listed below where the number of subcomponents (m) increases by one with each simulation run of n = 1,000 trials. Note, at m = 3, R = 0.993 and at m = 4, R = 0.999. Hence, the minimum value of m is three and the reliability is estimated as R = 0.993.

m n g R
1 1,000 348 0.348
2 1,000 871 0.871
3 1,000 993 0.993
4 1,000 999 0.999

Random Integers Without Replacement

Consider N unique items where n of them will be arranged in a random sequence and none will be repeated. The N items are identified by D(i), i = 1 to N. The n items in sequence are E(1), ..., E(n).

Generate a Random Sequence

The routine below shows how to generate a random sequence of n samples without replacement from N unique items in a population:
1. The parameters are N = population size, n = sample size without replacement, {D(i), i = 1 to N} identifies the N unique items, and {E(j), j = 1 to n} denotes the n random items in sequence.
2. ND = N = number of unique items remaining.
3. For j = 1 to n.
4. Generate a random discrete uniform integer, k, from (1, ND).
   Set E(j) = D(k).
5. ND = ND − 1.
6. For m = k to ND.
7. D(m) = D(m + 1).
8. Next m.
9. Next j.
10. Return [E(1), ..., E(n)].
Example 7.7 Suppose a population of N = 10 integers, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, where n = 5 will be selected randomly in sequence. The routine below is run:
1. At j = 1, ND = 10 and D(i) = (1,2,3,4,5,6,7,8,9,10).
   Using the discrete uniform (1,10), randomly generate k, say k = 7.
   Hence, E(1) = 7. ND = 9 and D(i) = (1,2,3,4,5,6,8,9,10).
2. At j = 2, ND = 9 and D(i) = (1,2,3,4,5,6,8,9,10).
   Using the discrete uniform (1,9), randomly generate k, say k = 4.
   Hence, E(2) = 4. ND = 8 and D(i) = (1,2,3,5,6,8,9,10).
3. At j = 3, ND = 8 and D(i) = (1,2,3,5,6,8,9,10).
   Using the discrete uniform (1,8), randomly generate k, say k = 6.
   Hence, E(3) = 8. ND = 7 and D(i) = (1,2,3,5,6,9,10).
4. At j = 4, ND = 7 and D(i) = (1,2,3,5,6,9,10).
   Using the discrete uniform (1,7), randomly generate k, say k = 4.
   Hence, E(4) = 5. ND = 6 and D(i) = (1,2,3,6,9,10).
5. At j = 5, ND = 6 and D(i) = (1,2,3,6,9,10).
   Using the discrete uniform (1,6), randomly generate k, say k = 6.
   Hence, E(5) = 10. ND = 5 and D(i) = (1,2,3,6,9).
6. Return (7, 4, 8, 5, 10).
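A Python sketch of the routine; popping the selected index from a list plays the role of the D(m) = D(m + 1) shift:

```python
import random

def random_sequence(items, n):
    # Draw n items in random order without replacement.
    pool = list(items)
    out = []
    for _ in range(n):
        k = random.randint(1, len(pool))  # discrete uniform on (1, ND)
        out.append(pool.pop(k - 1))       # select and remove item k
    return out

print(random_sequence(range(1, 11), 5))  # Example 7.7 parameters
```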

Poker

A deck of cards has 52 unique items. For simplicity, the items are here labeled as D(i), i = 1 to 52. The notation is (H, D, S, C) for (hearts, diamonds, spades, clubs) and (A,2,3,4,5,6,7,8,9,10,J,Q,K) for (ace, 2,3,4,5,6,7,8,9,10, jack, queen, king). The 52 items become:
{D(i) i = 1 to 13} = {AH, 2H, 3H, 4H, 5H, 6H, 7H, 8H, 9H, 10H, JH, QH, KH}
{D(i) i = 14 to 26} = {AD, 2D, 3D, 4D, 5D, 6D, 7D, 8D, 9D, 10D, JD, QD, KD}
{D(i) i = 27 to 39} = {AS, 2S, 3S, 4S, 5S, 6S, 7S, 8S, 9S, 10S, JS, QS, KS}
{D(i) i = 40 to 52} = {AC, 2C, 3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C, JC, QC, KC}
The model developed here assumes two players (A, B), each dealt five cards from the deck of 52 cards. Altogether, ten cards are dealt: the first five to player A and the next five to B.

Generate Random Hands to Players A and B

The routine below shows how to generate random hands of five cards each to
players A and B:
1. Using the random integer method with 52 unique integers, randomly generate n = 10 integers in sequence and label them {E(j), j = 1 to 10}.
2. Use {E(j), j = 1 to 5} to select from the set D(i) the five cards for player A. Label them {C(k), k = 1 to 5}.
3. Use {E(j), j = 6 to 10} to select from the set D(i) the five cards for player B. Label them {C(k), k = 6 to 10}.
4. Return {C(k), k = 1 to 5} for player A, and {C(k), k = 6 to 10} for player B.


Example 7.8 The routine below shows how to deal five cards each to players A and B from the deck of 52 cards, where the dealt cards are without replacement:
1. Using the random integer method with N = 52, generate n = 10 randomly sequenced integers, {E(1), ..., E(10)}. Say, {27, 5, 24, 16, 14, 32, 47, 31, 4, 25}.
2. For player A, the first five integers {27, 5, 24, 16, 14} yield {AS, 5H, JD, 3D, AD}.
3. For player B, the next five integers {32, 47, 31, 4, 25} yield {6S, 8C, 5S, 4H, QD}.
4. Return {C(k), k = 1 to 5} = {AS, 5H, JD, 3D, AD} for player A, and {C(k), k = 6 to 10} = {6S, 8C, 5S, 4H, QD} for player B.

Summary

This chapter concerns applications that are not from the common probability
distributions, continuous or discrete. The applications are instructive since they
show some popular deviations in generating random variates as is often needed in
building computer simulation models. The applications presented are the Poisson
process, constant Poisson process, batch arrivals, active redundancy, standby
redundancy, random integers without replacement and poker.
Chapter 8
Output from Simulation Runs

Introduction

Computer simulation models are generally developed to study the performance of a system that is too complicated for analytical solutions. The usual goal of the analyst is to develop a computer simulation model that emulates the activities of the actual system as closely as possible. Many of these models are of terminating and nonterminating systems.
A terminating system is one where a defined starting event B and an ending event C are specified, and so each run of the simulation model begins at B and ends at C. This could be a model of a car wash that opens each day at 6 a.m. and closes at 8 p.m. Each simulation run would randomly emulate the activities from B to C.
A nonterminating system is one where there are no beginning or ending events to the system. The system often begins in a transient stage and eventually falls into either an equilibrium stage or a cyclical stage. This could be a study of a maintenance and repair shop that is always open. At the outset of the simulation model run, the system is empty and may take some time to enter either an equilibrium stage or a cyclical stage. This initial time period is called the transient stage.
A nonterminating system with transient and equilibrium stages might be a
system where the inter-arrival flow of new customers to the shop is steadily coming
from the same probability distribution. In the run of the simulation model, the
system begins in the transient stage and thereafter the flow of activities continues in
the equilibrium stage.
A nonterminating model with transient and cyclical stages could be a model of a
system where the probability distribution of the inter-arrival flow of new customers
varies by the hour of the day. The simulation run begins in a transient stage and
passes to the cyclical stage thereafter.
In either system, while the analyst is developing the computer model, he/she
includes code in the model to collect data of interest for later study. This output data
is used subsequently to statistically analyze the performance of the system.


Terminating System

A terminating system is when at some point in time, or in events, the system comes
to a natural end. Various portions of each run of the computer simulation model
may be of interest to the analyst, and for each portion, a collection of output
measures are saved for subsequent statistical analysis. To illustrate, an example is
provided below.
Suppose an analyst is developing a computer simulation model of a large car wash system. Assume the business opens at 8 a.m. and closes at 8 p.m., and at the beginning of the day the car wash is empty. During the open hours, the customer arrival rate varies over the day. The end of the day is when the last car enters before the closing time of 8 p.m. The analyst may be interested in the activities at different time intervals of the day, perhaps from 8 a.m. to 12 p.m., 12 to 4 p.m., and 4 to 8 p.m. For convenience, the time periods are labeled as j (j = 1 to 3). Also suppose the measures of interest (denoted by index k) are collected for each time interval j. In the example, assume these are the following:
x1j = number of vehicle arrivals in j
x2j = number of vehicles serviced without waiting in j
x3j = total vehicle wait time in j
x4j = total system idle time in j
p1j = x2j/x1j = percent of vehicles that do not need to wait in j
Suppose further, the model is run n times, each with a different string of random variates, where i denotes the run number, i = 1 to n. So the output data for the n runs of the simulation model would be the following:

xkji  k = 1 to 4, j = 1 to 3 and i = 1 to n

p1ji  j = 1 to 3 and i = 1 to n

As demonstrated, a large variety of output data could be collected from the simulation model, and this is the type of data that will subsequently require some sort of statistical analysis to measure the performance of the system. Chapter 9 describes the typical statistical methods.

Nonterminating Transient Equilibrium Systems

The computer simulation model could be for a system that is nonterminating and
evolves into an equilibrium state. The computer model would begin empty and flow
through a transient stage prior to reaching the equilibrium stage. To illustrate,
suppose the system under study is a mixed model assembly line where at the
beginning of the simulation there are no units on the line. One by one, units are

placed in station 1 and they move on up to station 2 and beyond, while new units are arriving to station 1. When the line is filled up in all stations, the transient stage ends and this is the start of the equilibrium stage. From that point on, the system is in its equilibrium stage. In the example provided, this is fairly obvious; but in the general case, it is not obvious and a concern is how to determine when the transient stage ends.

Identifying the End of the Transient Stage

Suppose a variable of interest from the model is denoted as xki, the average of output index k in the i-th interval (batch) of events. The batches are run one after the other, and the average for each batch is measured on every output index k. The difference from one batch to the previous one is measured, and the end of the transient stage is signaled when the differences begin to cycle around zero. The analyst seeks the event index i where, thereafter, the average difference between two successive measures is sufficiently close to zero, i.e., (xk,i+1 − xk,i) ≈ 0. So the transient stage, for output variable k, ends when such a value of i is identified, say i = A, whereby the equilibrium stage follows. For convenience, the analyst would likely choose one value of A that defines the end of the transient stage for all K output variables.
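A minimal sketch of this rule is below; the tolerance eps, the window of consecutive small differences, and the sample data are illustrative assumptions, not prescriptions from the text, and the index found will vary with eps (the visual judgment in Example 8.1 below puts A at batch 2).

def end_of_transient(batch_means, eps, window=3):
    """Return the first batch index A such that the next `window`
    successive batch-mean differences are all within eps of zero."""
    diffs = [b - a for a, b in zip(batch_means, batch_means[1:])]
    for i in range(len(diffs) - window + 1):
        if all(abs(d) <= eps for d in diffs[i:i + window]):
            return i + 1          # batches up to A form the transient stage
    return None                   # equilibrium not yet detected

# usage with the w column of Example 8.1 (rho = 0.10)
A = end_of_transient([1.05, 1.14, 1.18, 1.05, 1.01, 1.08], eps=0.15)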

Output Data

The output for each run begins after the transient period of A events has elapsed.
When K is the number of output variables of interest, and n is the number of
simulation runs during the equilibrium period, the output becomes the following:

xki    k = 1 to K and i = 1 to n.

This is the output data that is subsequently analyzed by statistical methods.
Example 8.1 A single-server queuing system with infinite capacity is under review, with parameters ta = expected time between arrivals and ts = expected service time. The arrival rate is λ = 1/ta and the service rate is μ = 1/ts. The interarrival times and the service times are exponentially distributed. The utilization ratio is ρ = λ/μ = ts/ta, where ρ must be less than one to attain an equilibrium system. This is a common queuing system (M/M/1) listed in many textbooks. Two of the performance measures of this system are the expected time a unit is in the system, denoted as w, and the probability a new arrival is delayed in the queue before it enters the service facility, Pd. The analytical solution for this model is w = 1/[(1 − ρ)μ] and Pd = ρ.

A simulation model is developed for this system to demonstrate some of the concepts. Two sets of parameters are shown: one where ρ = 0.10 and another where ρ = 0.50. In both situations, μ = 1. The simulation is run in batches of n = 500 arrivals to measure the average time a unit spends in the system, w, and the probability a new arrival has to wait for service, Pd. The results are listed in the table below, where arrivals reach 3,000. In the table, for each batch of n = 500 arrivals, w is the average time an arrival is in the system and Pd is the portion that are delayed in the queue before entering the service facility. The difference from one batch to another is measured as dw and dPd. For example, at ρ = 0.1, dw = (1.05 − 0.00) = 1.05 for the first batch, dw = (1.14 − 1.05) = 0.09 for the second batch, and so forth. The end of the transient stage is signaled when the differences from one batch to another start to cycle around zero.

ρ = 0.10
Runs         w     Pd     dw     dPd
1–500        1.05  0.086  1.05   0.086
501–1,000    1.14  0.112  0.09   0.026
1,001–1,500  1.18  0.106  0.04   −0.006
1,501–2,000  1.05  0.116  −0.13  0.010
2,001–2,500  1.01  0.081  −0.04  −0.035
2,501–3,000  1.08  0.106  0.07   0.025

ρ = 0.50
Runs         w     Pd     dw     dPd
1–500        1.75  0.448  1.75   0.448
501–1,000    2.52  0.548  0.77   0.100
1,001–1,500  2.01  0.474  −0.51  −0.074
1,501–2,000  1.78  0.462  −0.23  −0.012
2,001–2,500  2.05  0.516  0.27   0.044
2,501–3,000  1.77  0.524  −0.28  0.008
At ρ = 0.10, note how dw and dPd start to cycle around zero after 1,000 arrivals, signaling that the transient stage ends at n = 1,000 arrivals. At ρ = 0.50, dw and dPd both start to cycle around zero after 1,500 arrivals, indicating the transient stage ends at n = 1,500 arrivals and the equilibrium stage begins.
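A minimal sketch, not from the text, of how such batches could be generated: the waiting time of each arrival in the queue is advanced with Lindley's recursion, and each batch of 500 arrivals reports w and Pd. The seed and the formatting are arbitrary choices.

import random

def mm1_batches(rho, mu=1.0, batch=500, n_batches=6, seed=1):
    """Simulate an M/M/1 queue; return (w, Pd) per batch of arrivals."""
    rng = random.Random(seed)
    lam = rho * mu                        # arrival rate
    wq = 0.0                              # queue wait of the current arrival
    out = []
    for _ in range(n_batches):
        total_time, delayed = 0.0, 0
        for _ in range(batch):
            s = rng.expovariate(mu)       # exponential service time
            total_time += wq + s          # time in system = wait + service
            if wq > 0.0:
                delayed += 1              # this arrival had to wait
            a = rng.expovariate(lam)      # time until the next arrival
            wq = max(0.0, wq + s - a)     # Lindley's recursion
        out.append((total_time / batch, delayed / batch))
    return out

for w, pd in mm1_batches(rho=0.50):
    print(f"w = {w:.2f}   Pd = {pd:.3f}")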

Partitions and Buffers

Another way to collect the data for this system is described here. For all the events after the transient stage ends, the analyst could specify two parameters, N and M, that will be used for partitions and buffers, respectively. Each partition is of length N and every buffer is of length M, where typically N ≥ M. As the simulation model progresses, the partitions and buffers will follow in a leapfrog manner one after the other. For example, the progression of events is the following:

1 to N Partition 1
N + 1 to N + M Buffer 1
N + M + 1 to 2N + M Partition 2
2N + M + 1 to 2N + 2M Buffer 2
2N + 2M + 1 to 3N + 2M Partition 3
...

The run stops after n partitions.

Statistical measures will only include the data from the partitions, not the buffers. The buffers are needed so that the measures from one partition to the next are far enough apart that the status of one partition does not influence the status of the next. In this way, the variable measures from the individual partitions are independent.
Suppose n is the number of partitions in the run and K is the number of output measures of interest that are collected for each partition. So, for output measure k of partition i, the output data saved would be xki, where k = 1 to K and i = 1 to n.
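A minimal sketch of the leapfrog scheme, assuming the post-transient events have been reduced to a list of per-event observations for one output measure; the function and its arguments are illustrative, with N, M, and n following the notation of the text.

def partition_means(observations, N, M, n_partitions):
    """Average each partition of N events, skipping a buffer of M events
    between partitions so successive partition means are nearly
    independent."""
    means, pos = [], 0
    while len(means) < n_partitions and pos + N <= len(observations):
        means.append(sum(observations[pos:pos + N]) / N)
        pos += N + M              # leapfrog over the buffer
    return means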

Nonterminating Transient Cyclical Systems

The computer simulation model could be for a system that is nonterminating and progresses into a cyclical state. The model would begin empty and flow through a transient stage prior to reaching the cyclical stage. To illustrate, suppose the system under study is a car repair center where customers leave their vehicles for maintenance or repair. The shop is open 12 h a day, and the vehicles remain in the system, even overnight, until the service is finished. The arrival rate varies by the time of day, and as such the system follows a cyclical daily pattern. In the computer simulation model, after the transient stage ends, the status evolves into daily cycles.

Output Data

Upon identifying A, the number of events until the cyclical stage begins, the computer simulation model now runs with n different strings of random numbers. The output for each run begins after the transient period of A events has elapsed. When K is the number of output measures of interest and n is the number of runs, the output becomes the following:

xki    k = 1 to K and i = 1 to n.

This is the output data that is subsequently analyzed statistically.



Cyclical Partitions and Buffers

The output measures might be collected for each cycle and even at different intervals of the cycle. Suppose K is the number of variables of interest, J is the number of intervals in a cycle to measure, and n is the number of partitions. Hence, the data collected after n partitions is the following:

xkji    k = 1 to K, j = 1 to J, and i = 1 to n

This would be the data for the subsequent statistical analysis.


To ensure the data from each cycle is independent, the concept of partitions and buffers could still be in place. To begin, the partitions would be every second daily cycle and the buffers would be the days in between. So the n partitions collected for the output would come from every second day of the simulation run. On some occasions, the analyst may select every third or fourth day to represent a partition.

Other Models

Some simulation models are not time or event related and have none of the associated traits described earlier as terminating, transient, or equilibrium. Instead, the simulation model may be used to develop data for subsequent analysis. Several examples are provided below.

Forecasting Database

An analyst is testing a time series forecasting system that uses up to 24 months of historical demands and generates forecasts for the coming 12 months. The forecasts are of the horizontal, trend and seasonal type. The forecast system processes one part at a time, determines the forecast model to use, and estimates the coefficients associated with the forecast model. The forecasts for each of the next 12 months are then generated.
To analyze the efficiency of the forecast system, the analyst wants to generate a database record for a series of parts to process through the forecast system. This requires the fields in the part record as: part number, description (optional), number of months of history (1–24), and the 24 most recent monthly demands that are available (could be from a horizontal, trend or seasonal pattern). The simulation model would need to generate the data, one field at a time, in a way that is realistic and would allow the analyst to measure the efficiency of the forecasting system.
The simulation model could generate any number of part records (10, 100, 1,000) as needed. It would be good for each part record to have a comment field identifying the type of history data (horizontal, trend, seasonal) so the results can be compared with the forecast model generated. The analyst could also occasionally insert an outlier demand in one of the history fields to see how competent the forecast system is at detecting outliers and adjusting accordingly. The output record for each part might include the following:

Part, Comments, Number Months History, D1, D2, . . ., D24.
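A minimal sketch of such a generator; the field names, the assumed trend slope, and the assumed seasonal amplitude are illustrative choices, not specifications from the text.

import math
import random

def make_part_record(part_no, pattern, months=24, mean=100.0, cov=0.30,
                     rng=random):
    """Build one part record with a simulated demand history."""
    demands = []
    for t in range(1, months + 1):
        level = mean
        if pattern == "trend":
            level = mean + 2.0 * t                          # assumed slope
        elif pattern == "seasonal":
            level = mean * (1 + 0.4 * math.sin(2 * math.pi * t / 12))
        demands.append(max(0, round(rng.gauss(level, cov * level))))
    return {"part": part_no, "comments": pattern,
            "nmh": months, "history": demands}

records = [make_part_record(i, p)
           for i, p in enumerate(["horizontal", "trend", "seasonal"], 1)]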

Forecast and Replenish Database

In the event the system also has replenishment capability, the forecast-replenish system would then compute the order size, safety stock, order point and order level. Depending on the on-hand and on-order data, the system would compute the replenishment quantity needed, if any, for each part. The simulated part data now includes fields with the following: on-hand, on-order, cost per unit, multiple quantity, minimum buy quantity, lead time, and price break data when they pertain.
The simulation model has to coordinate this data for each part with the forecast data generated above to yield a realistic database record for each part. Comments should be included in a field to allow the analyst to compare the forecast-replenish system results with the data provided. When the on-hand and on-order are low, a replenishment quantity should be called for in the subsequent replenishment routine. When the on-hand is low and the on-order high, no replenishment quantity is needed, and so forth. The additional data per part may include the following:

Part, Cost, Multiple Quantity, Min Quantity, Lead Time, On-Hand, On-Order, Price Break Data.
Example 8.2 An inventory manager is seeking guidance on how to set the forecasting parameter α (alpha) that plays an important role in controlling the inventory. In particular, the horizontal single smoothing forecasting model is in use to generate the forecasts of the future months for many of the parts in the inventory. The particular values of α under consideration are: 0.05, 0.10, 0.20, 0.30, 0.40 and 0.50.
A simulation model is developed to randomly generate, for each part, 48 months of demand history that follows a horizontal demand pattern with a coefficient of variation (cov) set at 0.30. The demands are randomly generated using the normal distribution. The cov implies the standard deviation of the demands is σ = 0.30μ, where μ is the average demand per month. Essentially, cov = σ/μ, where σ is the standard deviation of each monthly demand. So for each part in the simulation study, the demands generated are x1, . . ., x48.
The forecast model is run through the first 24 months of history and the forecast for the next 12 months of demands is generated. Starting with the month-24 forecast, the forecast errors are measured for each of the next 12 months, and the standard error of the forecast error is tabulated. From month 24 to month 36, the forecast model moves forward one month at a time and the forecast errors are measured in the same way.

Altogether, 13 sets of forecasts are generated for each part (one each for months 24–36), and the corresponding forecast errors are measured. Finally, the cov of the 1-month-ahead forecast error is measured by cov = s/x̄, where s is the standard deviation of the 1-month-ahead forecast error, and x̄ is the average 1-month demand.
This process is followed for each of the six parameter values of α listed earlier, and also for 100 parts, where the demand streams of 48 months are each generated with a different set of random numbers. The average cov from all 100 parts is listed in the table below for each of the α values under review. The results clearly show that the smaller values of the parameter α yield the best forecast results. The forecaster is cautiously aware that in the simulation model all the data are from a horizontal demand pattern, and the accuracy results apply to history patterns that are truly of that type.

α     cov
0.05  0.297
0.10  0.305
0.20  0.315
0.30  0.318
0.40  0.325
0.50  0.344
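A minimal sketch of one way this experiment could be replicated: single exponential smoothing over a simulated horizontal demand stream, with the cov of the 1-month-ahead error measured after a 24-month warm-up. The initialization of the smoothed value and the shortened 36-month horizon are assumptions, so the printed averages will only approximate the table above.

import random
import statistics as st

def one_part_cov(alpha, mu=100.0, cov=0.30, months=36, warmup=24, seed=None):
    rng = random.Random(seed)
    x = [max(0.0, rng.gauss(mu, cov * mu)) for _ in range(months)]
    f = x[0]                                   # assumed initial forecast
    errors = []
    for t in range(1, months):
        if t >= warmup:                        # 1-month-ahead errors after
            errors.append(x[t] - f)            # the 24-month warm-up
        f = alpha * x[t] + (1 - alpha) * f     # single smoothing update
    return st.stdev(errors) / st.mean(x[warmup:])   # cov = s / x-bar

for alpha in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    covs = [one_part_cov(alpha, seed=i) for i in range(100)]
    print(alpha, round(st.mean(covs), 3))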

Example 8.3 An inventory manager is concerned about the forecast accuracy for a part depending on the number of months of history (nmh) of demand that is available to generate the forecast. The forecast manager is using the horizontal single smoothing model that generates the forecasts of the future months for many of the parts in the inventory. The parameter for the forecast model is set at α = 0.10.
A simulation model is developed to randomly generate 36 months of demand history that follows a horizontal demand pattern with a coefficient of variation (cov) set at 0.30. The demands are randomly varied using the normal distribution. The standard deviation of the demands is σ = 0.30μ, where μ is the average demand per month. So for each part in the simulation study, the demands generated are x1, . . ., x36.
The forecast model is run through the first 24 months of history and the forecast for the next 12 months of demands is generated. The forecast errors are measured for each of the next 12 months, from which the standard error of the forecast error is tabulated. Finally, the cov of the 1-month forecast error is measured by cov = s/x̄, where s is the standard deviation of the 1-month forecast error, and x̄ is the average 1-month demand.
The table below lists the results from one such part by nmh. The table clearly shows how the cov improves as the nmh increases from 1 to 24.

Month  nmh  cov
1      1    0.71
2      2    0.60
3      3    0.55
4      4    0.50
5      5    0.45
6      6    0.46
7      7    0.56
8      8    0.55
9      9    0.45
10     10   0.43
11     11   0.38
12     12   0.36
13     13   0.38
14     14   0.31
15     15   0.31
16     16   0.32
17     17   0.34
18     18   0.36
19     19   0.27
20     20   0.28
21     21   0.31
22     22   0.28
23     23   0.28
24     24   0.32

Example 8.4 An inventory manager is wondering whether to include logic in the forecasting model to detect and adjust the demand history for outlier demands prior to generating the forecasts. An outlier demand is where one (or more) months have a demand that is much larger (or smaller) than the normal flow of the other demands in the history. Of specific interest is the effect on the horizontal single smoothing model that generates the forecasts for the future months for many of the parts in the inventory. The parameter for the forecast model is set at α = 0.10.
A simulation model is developed to randomly generate 36 months of demand history that follows a horizontal demand pattern with a coefficient of variation (cov) set at 0.30. The demands are randomly varied using the normal distribution. The standard deviation of the demands is σ = 0.30μ, where μ is the average demand per month. So for each part in the simulation study, the demands generated are x1, . . ., x36. Three sets of demand history are generated for each part. The first set has no outliers. The next two sets each have one outlier inserted in the demand history somewhere in months 1–24.
For each of the three sets (no outlier, one outlier, one outlier), the forecast model is run through the first 24 months of history and the forecast for the next 12 months of demands is generated. The forecast errors are measured for each of the next 12 months, from which the standard error of the forecast error is tabulated. Finally, the cov of the 1-month forecast error is measured by cov = s/x̄, where s is the standard deviation of the 1-month forecast error, and x̄ is the average 1-month demand.

The table below lists the results for each of the three sets of demand history from the same part. The table shows how the cov increases dramatically when the part has one outlier in the demand history. The forecast manager can now clearly see how important it is to include logic in the forecasting routine to detect outlier demands and adjust accordingly prior to forecasting.

Part  Demand history  cov
1     No outlier      0.343
2     One outlier     0.611
3     One outlier     0.476
Example 8.5 An inventory manager is seeking guidance on various decisions concerning the seasonal forecasting model. One of the decisions concerns the three parameters of the model: α (for the average), β (for the trend) and γ (for the seasonal pattern). The particular values of α under consideration are: 0.05, 0.10, 0.20, 0.30, 0.40 and 0.50. The values for β and γ are both 0.10 and 0.20.
A simulation model is developed to randomly generate, for each part, 48 months of demand history that follows a seasonal demand pattern with a coefficient of variation (cov) set at 0.30. The demands are randomly generated using the normal distribution. So for each part in the simulation study, the demands generated are x1, . . ., x48.
The forecast model is run through the first 24 months of history and the forecast for the next 12 months of demands is generated. Starting with the month-24 forecast, the forecast errors are measured for each of the next 12 months, and the standard error of the forecast error is tabulated. From month 24 to month 36, the forecast model moves forward one month at a time, and the forecast errors are measured in the same way.
Altogether, 13 sets of forecasts are computed for each part (one each for months 24–36), and the corresponding forecast errors are measured. Finally, the cov of the 1-month-ahead forecast error is measured by cov = s/x̄, where s is the standard deviation of the 1-month-ahead forecast error and x̄ is the average 1-month demand.
This process is followed for each combination of the parameter values of α, β, γ listed earlier, and also for 100 parts, where the demand streams of 48 months are each generated with a different set of random numbers. The average cov from all of the 100 parts is listed in the table below for each parameter (α, β, γ) combination. The results show that the smaller parameter values yield the best results. The forecaster is cautiously aware that all the data are from a seasonal demand pattern and the accuracy results are for history patterns that are truly of that type.
Cov when the demand pattern is seasonal and the seasonal forecast model is run
with the following parameter values:
       β = 0.10   β = 0.10   β = 0.20   β = 0.20
       γ = 0.10   γ = 0.20   γ = 0.10   γ = 0.20
α      cov        cov        cov        cov
0.05   0.306      0.316      0.307      0.312
0.10   0.314      0.317      0.315      0.329
0.20   0.318      0.325      0.320      0.333
0.30   0.337      0.338      0.347      0.357
0.40   0.341      0.351      0.347      0.357
0.50   0.357      0.366      0.369      0.383

Example 8.6 An inventory manager is seeking guidance on various decisions concerning the seasonal demand pattern. One of the decisions concerns the forecast model to apply: horizontal, trend or seasonal. The horizontal model uses the parameter α; the trend model uses parameters α, β; and the seasonal model uses the parameters α (for the average), β (for the trend) and γ (for the seasonal pattern). The particular values of α under consideration are: 0.05, 0.10, 0.20, 0.30, 0.40 and 0.50. The values for β and γ are both set at 0.10.
A simulation model is developed to randomly generate, for each part, 48 months of demand history that follows a seasonal demand pattern with a coefficient of variation (cov) set at 0.30. The demands are randomly generated using the normal distribution. So for each part in the simulation study, the demands generated are x1, . . ., x48. The forecast model is run through the first 24 months of history and the forecast for the next 12 months of demands is generated. Starting with the month-24 forecast, the forecast errors are measured for each of the next 12 months, and the standard error of the forecast error is tabulated. From month 24 to month 36, the forecast model moves forward one month at a time, and the forecast errors are measured in the same way. Altogether, 13 sets of forecasts are generated for each part (one each for months 24–36), and the corresponding forecast errors are measured. Finally, the cov of the 1-month-ahead forecast error is measured by cov = s/x̄, where s is the standard deviation of the 1-month-ahead forecast error, and x̄ is the average 1-month demand.
This process is followed for each combination of the parameter values of α, β, γ listed earlier, and also for 100 parts, where the demand streams of 48 months are each generated with a different set of random numbers. Each of the three forecast models is run with the same data. The average cov from all of the 100 parts is listed in the table below for each forecast model and every α, β, γ combination. The results clearly show that the best forecasts for data from a seasonal demand pattern are those generated from the seasonal forecast model. The better forecasts are also those with the smaller parameter values.
Cov when the demand pattern is seasonal and the forecast models (horizontal,
trend, seasonal) are run with the following parameter values:
       Forecast model
       horizontal   trend      seasonal
                    β = 0.10   β = 0.10
                               γ = 0.10
α      cov          cov        cov
0.05   0.443        0.404      0.306
0.10   0.429        0.419      0.314
0.20   0.424        0.435      0.325
0.30   0.423        0.442      0.337
0.40   0.409        0.428      0.341
0.50   0.400        0.418      0.357

Summary

Computer simulation models are mainly developed to emulate actual systems that are too complex to analyze mathematically. The systems often fall into the terminating or the nonterminating type. This chapter describes how output data is collected for either type of system. Terminating systems have a defined beginning and ending event, and nonterminating systems include a combination of transient, equilibrium, and cyclical stages. The output data from these systems are subsequently needed to statistically analyze the performance of the system under study. Another simulation model presented is one that creates a database to be used as test data for software applications like forecasting and inventory replenishment.
Chapter 9
Analysis of Output Data

Introduction

This chapter is a quick review of some of the common statistical tests that are useful in analyzing the output data from runs of a computer simulation model. This pertains when each run of the model yields a group of k unique output measures that are of interest to the analyst. When the model is run n times, each with a different string of continuous uniform u ~ U(0,1) random variates, the output data is generated independently from run to run, and therefore the data can be analyzed using ordinary statistical methods. See, for example, Hines et al. (2003) for a full description of statistical methods. Some of the output data may be of the variable type and some may be of the proportion type. The appropriate statistical method for each type of data is applied as needed. This includes measuring the average value and computing the confidence interval of the true mean. Oftentimes, the simulation model is run with one or more control input variables in a 'what if' manner. The output data between two or more settings of the control variables can be compared using appropriate statistical tools. This includes testing for a significant difference between two means, between two proportions, and between k or more means.
Example 9.1 A maintenance and repair shop for cars is open Monday through Saturday from 8 a.m. till 6 p.m. The cars needing service arrive during the day at an average arrival rate via the Poisson distribution. The service times vary via a gamma distribution with a location parameter to signify the minimum time of service. A simulation model is developed to emulate the daily activities, and the model collects a series of measures of interest to the analyst. The number of independent runs of the model is n. Some of the measures collected in each run of the model are listed below:

n0 = number of bays (this is a control parameter).
n1 = number of vehicles that arrive for service.
n2 = number of parts needed to complete the service.
n3 = number of vehicles serviced without a delay.
n4 = number of needed parts that are available in the stock room.
n5 = the number of vehicles that wait in queue over 60 min.
S1 = sum of wait time for delayed vehicles.
S2 = sum of idle time for a repair bay over the day.

Variable Type Data

The variable type data for the individual daily run are the following:

x1 = S1/n1 = average wait time per vehicle.
x2 = S2/(10 × n0) = average idle time per hour per bay.
x3 = n3/n1 = service level for the vehicles.
x4 = n4/n2 = service level for the parts.

Proportion Type Data

When n replications of the simulation model are run, and n5 is the number of days when one or more vehicles wait at least 60 min for service, then

p = n5/n is the proportion of days when a vehicle waits 60 or more minutes.

Analysis of Variable Type Data

Variable type data is data like the time to complete a service on an item in repair, the strength of a steel beam, the elasticity of a rubber compound, the labor cost to service a vehicle, and so forth. Upon completion of n replications of a simulation model, the analyst may be interested in determining the point estimate of the variable measured in the model and also the corresponding confidence interval. The analyst may further inquire how many replications are needed to gain the precision desired in the estimate.
Consider the output measure, x, from a run of a simulation model. The variable x could be the average of a large number of events that take place in the model. In the service department of the car dealership model, x could be the average time a customer waits for service on the vehicle. In the typical situation, the distribution and the mean and variance of x are unknown. When n replications of the simulation are run, with the same initial conditions and with different streams of continuous uniform u ~ U(0,1) random variates, the output data, x1, . . ., xn, can be treated as independent observations of the random variable.

Some of the more common statistical tools for analyzing the output data are presented below. Each of the tools is valid as long as the n sample observations, x1, . . ., xn, are statistically independent.

Sample Mean and Variance

Using the n output results, x1, . . ., xn, the sample mean and variance of x are computed as below:

x̄ = Σ xi/n        (sum over i = 1 to n)

s² = Σ (xi − x̄)²/(n − 1)

The standard deviation of x is s = √s², and the standard error of the mean, sx̄, is obtained by,

sx̄ = s/√n

The associated degrees of freedom is (n − 1).
So now, x̄ is an estimate of the true mean, μ, and s is an estimate of the true standard deviation, σ. Note the true values of the parameters (μ and σ) are unknown. Also not known is the distribution of x.
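A minimal sketch of these computations with Python's statistics module; the data values are hypothetical run outputs.

import math
import statistics as st

x = [18.2, 23.5, 19.9, 21.4, 17.0]       # hypothetical outputs x1, ..., xn
n = len(x)
xbar = st.mean(x)                         # sample mean
s = st.stdev(x)                           # sample standard deviation (n - 1)
se = s / math.sqrt(n)                     # standard error of the mean
df = n - 1                                # degrees of freedom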

Confidence Interval of μ when x is Normal

In the event x is normally distributed, the (1 − α) confidence interval of μ becomes,

L ≤ μ ≤ U

where the upper and lower confidence limits (U, L) are obtained from

U = x̄ + tα/2 sx̄

L = x̄ − tα/2 sx̄

Note, tα/2 has (n − 1) degrees of freedom and is the α/2 upper-tail percentage-point of the student's t distribution, where P(t > tα/2) = α/2. Hence,

P(L ≤ μ ≤ U) = 1 − α.

In the event n is large, say n > 30, the standard normal variable, z, can be used in place of the student's variable, t. In this event, the confidence limits become,

U = x̄ + zα/2 sx̄

L = x̄ − zα/2 sx̄

where zα/2 is the z value from N(0,1) that gives P(z > zα/2) = α/2.
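A minimal sketch of the interval, using scipy for the upper-tail percentage point; with the sample values of Example 9.2 below, it returns about (16.43, 23.57).

from scipy import stats

def mean_ci(xbar, se, df, alpha=0.05):
    t = stats.t.ppf(1 - alpha / 2, df)    # t value with P(T > t) = alpha/2
    return xbar - t * se, xbar + t * se   # (L, U)

L, U = mean_ci(20.0, 1.58, df=9)          # data of Example 9.2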

Approximate Confidence Interval of μ when x is Not Normal

In the event x is not normally distributed and the sample size is small, an approximate (1 − α) confidence interval of μ is computed in the same way. That is,

L ≤ μ ≤ U

where the upper and lower confidence limits (U, L) are obtained from

U = x̄ + tα/2 sx̄

L = x̄ − tα/2 sx̄

The term tα/2 is the upper-tail percentage-point from the student's t distribution with degrees of freedom (n − 1), where P(t > tα/2) = α/2. But since x is not normal, the exact probability of the interval is not truly known and is approximated as,

P(L ≤ μ ≤ U) ≈ 1 − α.

Central Limit Theorem

When n increases, the Central Limit Theorem applies, and the distribution shape of the sample mean approaches a normal distribution. Hence, the standard normal variable, z, replaces the student's variable, t, and the confidence limits become,

U = x̄ + zα/2 sx̄

L = x̄ − zα/2 sx̄

where zα/2 is the z value that gives P(z > zα/2) = α/2. The confidence interval on μ becomes,

P(L ≤ μ ≤ U) = 1 − α.

Example 9.2 Suppose a simulation run yields a variable, x, with each trial run of the simulation. Assume further the simulation is run with n = 10 repetitions, and each repetition begins with the same initial values and terminates over the same length of events. The only difference is the stream of random variates used in each of the simulation runs. So, as much as possible, there are now n output results, x1, . . ., xn, that are from the same distribution and are independently generated. Assume further, the sample mean and variance from the ten samples are the following:

x̄ = 20.0

and

s² = 25.0,

respectively. The standard deviation becomes s = 5, and the standard error of the mean is,

sx̄ = s/√n = 1.58.

Also, the degrees of freedom is (10 − 1) = 9 for this data.
Because the true distribution of x is not known, an approximate confidence interval is computed. The student's t variable needed for 95 % confidence is t0.025. Since the degrees of freedom is 9, a search of the student's t distribution yields t0.025 = 2.262. The approximate upper and lower 95 % confidence limits on μ can now be computed and become,

U = 20 + 2.262 × 1.58 = 23.57

L = 20 − 2.262 × 1.58 = 16.43

The corresponding approximate confidence interval is (16.43 ≤ μ ≤ 23.57), where

P(16.43 ≤ μ ≤ 23.57) ≈ 0.95



When More Accuracy Is Needed

Suppose the analyst wants the length of the (1 − α) confidence interval, currently (U − L), to shrink to 2E, and all else remains the same. The number of repetitions to achieve this goal is obtained from the relation below,

n = [(s × tα/2)/E]²

where s is the sample standard deviation, tα/2 is the student's t value, and E is the precision sought. Note the student's t value must be coordinated with degrees of freedom (n − 1). The problem with the above relation is that the t value cannot be inserted in the formula until the sample size n is known.
A way to approximate the formula is to use the normal z value instead of the student's t value in the above formula. Replacing t with z yields,

n = [(s × zα/2)/E]²

The above estimate of n will be less than or equal to the counterpart value of n when the student's t is used. As n gets larger (n > 30), the difference between using the t and z values is minor.
Example 9.3 Consider the approximate 95 % confidence interval shown in Example 9.2. Suppose the analyst wants the length of the 95 % confidence interval to shrink from the current (U − L) = (23.57 − 16.43) = 7.14 to (U − L) = 4.0. The number of repetitions to achieve this goal is obtained from the relation below,

n = [(s × zα/2)/E]² = [(5 × 1.96)/2]² = 24.01

So, in this example, n = 24 repetitions are needed. Because n = 10 repetitions have already been run, 14 new repetitions are needed.
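A minimal sketch of the z-based sample-size formula with the numbers of this example; math.ceil rounds the 24.01 up, the conservative choice when the formula gives a fractional n.

import math
from scipy import stats

def runs_needed(s, E, alpha=0.05):
    z = stats.norm.ppf(1 - alpha / 2)     # z_{alpha/2}
    return math.ceil((s * z / E) ** 2)

print(runs_needed(s=5.0, E=2.0))          # 25, from (5 x 1.96 / 2)^2 = 24.01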

Analysis of Proportion Type Data

Proportion type data is like the portion of trials in which an event occurs. Examples are: the portion of units that have defects; the portion of customers who use a credit card for a purchase; the portion of customers in a gas station who use premium gas; the portion of police calls that wait more than 10 min for service; and so forth. Upon completion of n replications of a simulation model, the analyst may be interested in determining the point estimate of the proportion measured in the model, and also the corresponding confidence interval. The analyst may also inquire how many replications are needed to gain more precision.

Proportion Estimate and Its Variance

Using the n output results, let w represent the number from the n replications where a specified event occurs. The estimated proportion is obtained by,

p̂ = w/n

The corresponding variance is

sp̂² = p̂(1 − p̂)/n

and the standard error of p̂ becomes,

sp̂ = √[p̂(1 − p̂)/n]

Confidence Interval of p

The distribution of the estimate of p is approximated by the normal distribution when np ≥ 5 at p ≤ 0.50, or when n(1 − p) ≥ 5 at p ≥ 0.50. So when the number of replications, n, is sufficiently large, the (1 − α) confidence interval on the true proportion, p, can be computed in the following way. Let U and L be the upper and lower confidence limits, respectively, whereby,

L ≤ p ≤ U

The confidence limits are computed as follows,

U = p̂ + zα/2 sp̂

L = p̂ − zα/2 sp̂

The probability on the confidence interval is,

P(L ≤ p ≤ U) = (1 − α).
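A minimal sketch of the normal-approximation interval; the call reproduces the numbers of Example 9.4 below.

import math
from scipy import stats

def proportion_ci(w, n, alpha=0.05):
    p = w / n                              # point estimate p-hat
    se = math.sqrt(p * (1 - p) / n)        # standard error of p-hat
    z = stats.norm.ppf(1 - alpha / 2)
    return p - z * se, p + z * se          # (L, U)

L, U = proportion_ci(6, 60)                # about (0.024, 0.176)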

Example 9.4 Suppose a terminating simulation model is run with n = 60 replications, and an event A occurs on w = 6 of the replications. The analyst wants to estimate the portion of times the event will occur. The estimate of the portion of times event A occurs is

p̂ = 6/60 = 0.10.

The variance of the estimate is the following,

sp̂² = 0.10(1 − 0.10)/60 = 0.0015

and the standard error of p̂ becomes,

sp̂ = √0.0015 = 0.039

Since the estimate of p (p̂ = 0.10) is less than 0.50 and np̂ = 6.0 is larger than 5, the normal approximation can be applied and the confidence limits are computed as below. Recall, z0.025 = 1.96.
The 95 % confidence limits on p become,

U = 0.10 + 1.96 × 0.039 = 0.176

L = 0.10 − 1.96 × 0.039 = 0.024

whereby, the 95 % confidence interval is

(0.024 ≤ p ≤ 0.176)

and the associated probability is

P(0.024 ≤ p ≤ 0.176) = 0.95.

When More Accuracy Is Needed

In the event the analyst desires more accuracy on the estimate of the proportion, p, the number of repetitions needs to increase. The following formula computes the size of n for the accuracy desired. Suppose (1 − α) is fixed and the tolerance E = 0.5(U − L) desired is specified. The number of repetitions becomes,

n = p(1 − p)[zα/2/E]²

If an estimate of the proportion p is not available, then set p = 0.5, and the number of repetitions becomes,

n = (0.25)[zα/2/E]²

Example 9.5 Consider Example 9.4 again, and assume the analyst needs more accuracy in the estimate of p. He/she wants to lower the 95 % tolerance from 0.5(0.176 − 0.024) = 0.076 to E = 0.050. The question is how many more repetitions are needed to accomplish this. The above formula is used assuming p = 0.10 and letting (1 − α) = 0.95. Hence, α = 0.05 and zα/2 = 1.96. The number of repetitions needed becomes,

n = (0.1)(0.9)[1.96/0.05]² = 138.3.

Since 60 repetitions have already been run, 79 more are needed.
Example 9.6 A machine shop has an order for ten units (No = 10) of a product that require processing on two machines, M1 and M2. One component is produced on M1 and another on M2. The two components are combined to yield the final product. The machine processing is expensive and difficult, and defective components can occur on each machine. The management of the shop wants to determine in advance how many units to start, Ns, at the outset to be 95 % certain the number of good units, Ng, of the final product is equal to or larger than the required units of No = 10. In essence, they want to determine Ns where P(Ng ≥ No) ≥ 0.95.
When M1 launches Ns units at the start, the raw material is gathered and the units are processed one after the other. The number of good units from M1 is denoted as g1, whereby the number of defective units is d1 = Ns − g1. A defective unit from processing on M1 can occur in two ways: (1) when the raw material is defective, and (2) when the processed unit fails a strength test. The probability of a defective raw material is P(d) = 0.03. The strength of the material, S1, is equally likely to fall in 10 ≤ S1 ≤ 20. The force, F1, follows an exponential distribution with expected value E(F1) = 5.0. If F1 > S1, the unit is defective since the strength is not adequate.
When M2 launches Ns units at the start, the raw material is gathered and the units are processed one after the other. The number of good units from M2 is denoted as g2, whereby the number of defective units is d2 = Ns − g2. A defective unit from processing on M2 can occur in two ways: (1) when the raw material is defective, and (2) when the processed unit fails a strength test. The probability of a defective raw material is P(d) = 0.05. The strength of the material, S2, is equally likely to fall in 15 ≤ S2 ≤ 24. The force, F2, follows an exponential distribution with expected value E(F2) = 6.0. If F2 > S2, the unit is defective since the strength is not adequate.
The good units from M1 and M2 are combined to yield the final product. The number of good units in the final product is Ng = Min(g1, g2). Recall the goal is to begin Ns units on both machines so that the probability of Ng ≥ No is 0.95 or better. A simulation model is developed to guide the management on the size of Ns to apply.
The results are listed below, where the number of units to start, Ns, ranges from 10 to 15. For each attempt at Ns, the simulation is run 1,000 times to estimate the probability that the number of good units will be ten or larger. At Ns = 10, for example, the probability is 0.156, indicating that on 156 of the 1,000 trials Ng was ten or larger, and this is far short of the goal. Note at Ns = 13, the probability reaches 0.952, and this is the minimum value of Ns that achieves the specified goal of the management.

No   Ns   P(Ng ≥ No)
10   10   0.156
10   11   0.537
10   12   0.850
10   13   0.952
10   14   0.988
10   15   0.999
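A minimal sketch of the model just described; the sampling follows the stated distributions, while the seed and the 1,000-trial count are arbitrary, so the printed probabilities will only approximate the table above.

import random

def good_units(ns, p_def, s_lo, s_hi, f_mean, rng):
    """Count good units out of ns starts on one machine."""
    g = 0
    for _ in range(ns):
        if rng.random() < p_def:               # defective raw material
            continue
        strength = rng.uniform(s_lo, s_hi)     # equally likely strength
        force = rng.expovariate(1.0 / f_mean)  # exponential force
        if force <= strength:                  # passes the strength test
            g += 1
    return g

def prob_meeting_order(ns, no=10, trials=1000, seed=7):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g1 = good_units(ns, 0.03, 10, 20, 5.0, rng)   # machine M1
        g2 = good_units(ns, 0.05, 15, 24, 6.0, rng)   # machine M2
        if min(g1, g2) >= no:                         # Ng = Min(g1, g2)
            hits += 1
    return hits / trials

for ns in range(10, 16):
    print(ns, prob_meeting_order(ns))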

Example 9.7 In Example 9.6, from 1,000 trials with control variable Ns = 13 as the number of units to start production, the number of runs producing ten or more good units is 952. This is a proportion situation where the estimate of the proportion of successful runs, p, becomes p̂ = 952/1000 = 0.952. The associated standard deviation of this estimate is sp̂ = √[(0.952 × 0.048)/1000] = 0.0067. Hence, the 90 % confidence limits of p are,

L = 0.952 − 1.645 × 0.0067 = 0.941

U = 0.952 + 1.645 × 0.0067 = 0.963

Finally, the 90 % confidence interval is

P(0.941 ≤ p ≤ 0.963) = 0.90.

Based on the confidence interval, since L = 0.941 is less than 0.95, the management should be aware, with 90 % confidence, that the proportion of successful runs could be lower than the goal of 0.95 when Ns = 13. Instead of settling on Ns = 14 units to start the production process, the analyst could consider taking more samples to gain higher precision on the estimate of p at Ns = 13.

Comparing Two Options

Sometimes the analyst is seeking the better solution when two or more control options are in consideration. An example could be a mixed model assembly line where k different models are produced on the line and the analyst is seeking the best way to send the different models down the line for the day. One way is to send each model down the line in batches, and another way is to send them down the line in a random order. The assembly time for each model is known by station. A simulation model is developed and run for a day's activity, and the following types of measures are tallied. Idle time is recorded when an operator in a station must wait for the next unit in sequence before he/she can begin work. Congestion is when the operator has extended work on a unit and is forced to continue working on the unit even when the next unit arrives at the station on a moving conveyor. A goal is to minimize the idle time and congestion time over the day's activities.

S1 = sum of idle time across all stations and models for the day.
S2 = sum of congestion time across all stations and models for the day.
N1 = number of units assembled for the day.
N2 = number of stations on the line.
N3 = number of units with congestion over the day.

The computations for the day are the following:

x1 = S1/(N1 × N2) = average idle time per station per unit assembled.

x2 = S2/(N1 × N2) = average congestion time per station per unit assembled.

p1 = N3/N1 = proportion of units with congestion.

Note, the measures x1 and x2 are variable type data, and p1 is proportion type data.

Comparing Two Means with Variable Type Data

Suppose a terminating simulation model has two different options (1 and 2), and the analyst wants to compare one against the other to see which is preferable. For option 1, n1 repetitions of simulation runs are taken, and the output yields a sample mean x̄1 and variance s1². For option 2, n2 simulation runs are generated, and the results give x̄2 and s2². Typically, the number of simulation runs is the same, whereby n1 = n2. The true means and variances of options 1 and 2 are not known and are estimated from the sample simulation runs.

Comparing x̄1 and x̄2

The estimate of the difference between the true means is measured by x̄1 − x̄2. Denoting the true means of options 1 and 2 by μ1 and μ2, respectively, the difference between the sample means is an estimate of the difference between the true means. The expected value of the difference yields the following,

E(x̄1 − x̄2) = (μ1 − μ2)

Confidence Interval of (μ1 − μ2) when the Normal Distribution Applies

When both variables x1 and x2 are normally distributed, it is possible to compute a (1 − α) confidence interval on the difference between the two means, (μ1 − μ2). The result will yield upper and lower confidence limits, (U, L), where,

L ≤ (μ1 − μ2) ≤ U

and

P[L ≤ (μ1 − μ2) ≤ U] = (1 − α)

The lower and upper limits are computed in the following way,

U = (x̄1 − x̄2) + tα/2 s(x̄1−x̄2)

L = (x̄1 − x̄2) − tα/2 s(x̄1−x̄2)

where

s(x̄1−x̄2) = standard error of the difference between the two means,
tα/2 = the student's t value with degrees of freedom df.

The way to compute the above standard error and degrees of freedom is given subsequently.

Significance Test

The significance of the difference between the two options can be noted from the confidence interval by observing the range of values from confidence limit L to U. In the event the interval (L to U) passes through zero, the means of the two options are not significantly different at the (1 − α) confidence level. When the interval is entirely positive, the mean of option 1 is significantly higher than the mean of option 2. On the other hand, if the interval is entirely negative, the mean of option 1 is significantly smaller than the mean of option 2.

When σ1 = σ2

When the true standard deviations of the two options are assumed the same, the way to measure the standard error of the difference between the two means is shown below.
First, the two sample variances are combined to estimate the common variance. This is called the pooled estimate of the variance and is computed as follows,

s² = [(n1 − 1)s1² + (n2 − 1)s2²]/[n1 + n2 − 2]

Second, the standard error of the difference between the two means becomes,

s(x̄1−x̄2) = s √(1/n1 + 1/n2)

and the corresponding degrees of freedom is,

df = (n1 + n2 − 2).

When σ1 ≠ σ2

When the true standard deviations of the two options are not assumed the same, the way to measure the standard error of the difference between the two means is below:

s(x̄1−x̄2) = √(s1²/n1 + s2²/n2)

The corresponding degrees of freedom, df, is computed in the following way. First, the variance errors of the mean for options 1 and 2 are measured as below,

sx̄1² = s1²/n1

sx̄2² = s2²/n2

The degrees of freedom becomes,

df = [sx̄1² + sx̄2²]²/[sx̄1⁴/(n1 + 1) + sx̄2⁴/(n2 + 1)] − 2

Approximate Confidence Interval of (μ1 − μ2) when Not Normal

When one or both variables x1 and x2 are not normally distributed, it is possible to compute an approximate (1 − α) confidence interval on the difference between the two means, (μ1 − μ2), in the same way as shown above when the normal distribution applies. The result will yield approximate upper and lower confidence limits, (U, L), where,

L ≤ (μ1 − μ2) ≤ U

and

P[L ≤ (μ1 − μ2) ≤ U] ≈ (1 − α)

The upper and lower limits are computed in the following way,

U = (x̄1 − x̄2) + tα/2 s(x̄1−x̄2)

L = (x̄1 − x̄2) − tα/2 s(x̄1−x̄2)

As Degrees of Freedom Increases

Via the central limit theorem, as the degrees of freedom increases, the shape of the distribution of (x̄1 − x̄2) increasingly resembles a normal distribution, and eventually the approximation in the confidence interval can be dropped.
Example 9.8 Suppose a terminating simulation model has two options (1, 2), and ten simulation runs of each are taken. The goal is to compare the difference between the means of each option. Assume the sample means and variances of the two options are computed and the results are listed below.

Option 1: n1 = 10, x̄1 = 50, and s1² = 36.
Option 2: n2 = 10, x̄2 = 46, and s2² = 27.

The analyst assumes the variables x1 and x2 are sufficiently close to a normal distribution, and also that the variances of the two options are equal. Hence, the standard error of the difference between the two means is computed as follows.
First, the pooled estimate of the variance is calculated as below,

s² = [(10 − 1)36 + (10 − 1)27]/[10 + 10 − 2] = 31.5

The pooled standard deviation becomes,

s = √31.5 = 5.61

Second, the standard error of the difference between the means is calculated as,

s(x̄1−x̄2) = 5.61 √(1/10 + 1/10) = 2.51

The associated degrees of freedom is df = (n1 + n2 − 2) = 18.
To compute the (1 − α) = 0.95 confidence interval, the lower and upper confidence limits are found in the following way. The student's t variable for α/2 = 0.025 and 18 degrees of freedom is found in the student's t table as,

t0.025 = 2.101

Hence, the upper and lower confidence limits are computed as below,

U = (50 − 46) + 2.101 × 2.51 = 9.27

L = (50 − 46) − 2.101 × 2.51 = −1.27

The 0.95 confidence interval is,

−1.27 ≤ (μ1 − μ2) ≤ 9.27.

Note, the range of the confidence interval passes across zero. Hence, with the sample sizes taken so far, there is no evidence of a significant difference between the means of the two options at the (1 − α) = 95 % confidence level.
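A minimal sketch of the pooled-variance interval, reproducing Example 9.8; scipy supplies the t percentage point.

import math
from scipy import stats

def pooled_mean_diff_ci(xbar1, s1sq, n1, xbar2, s2sq, n2, alpha=0.05):
    s2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)  # pooled var
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))                    # std error
    t = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)
    d = xbar1 - xbar2
    return d - t * se, d + t * se

L, U = pooled_mean_diff_ci(50, 36, 10, 46, 27, 10)            # (-1.27, 9.27)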

Comparing the Proportions Between Two Options

Suppose a terminating simulation model has two different options (1 and 2), and the analyst wants to compare one against the other to see which is preferable. For option 1, n1 repetitions of simulation runs are taken and x1 of them have an attribute of interest. The proportion of repetitions that have the attribute is measured by p̂1 = x1/n1. For option 2, n2 repetitions are taken, x2 have the attribute, and the proportion becomes p̂2 = x2/n2. Typically, the number of simulation runs is the same, whereby n1 = n2. The true proportions of options 1 and 2 are not known and are estimated from the sample simulation runs.

Comparing p1 and p2

The estimate of the difference between the two true proportions p1 and p2 is measured by the difference of their estimates p̂1 and p̂2. The expected value of the difference yields the following,

E(p̂1 − p̂2) = (p1 − p2)

and the estimate of the difference in the two proportions is p̂1 − p̂2.

Confidence Interval of (p1 − p2)

When a sufficient number of repetitions (n1, n2) are taken, the normal distribution applies to the shape of the difference between p̂1 and p̂2, and it is possible to compute a (1 − α) confidence interval on the difference between the two proportions, (p1 − p2). The result will yield upper and lower confidence limits, (U, L), where,

L ≤ (p1 − p2) ≤ U

and

P[L ≤ (p1 − p2) ≤ U] = (1 − α)

The lower and upper limits are computed in the following way,

L = (p̂1 − p̂2) − zα/2 s(p̂1−p̂2)

U = (p̂1 − p̂2) + zα/2 s(p̂1−p̂2)

where

s(p̂1−p̂2) = standard error of the difference between the two proportions,
zα/2 = the standard normal variable where P(z > zα/2) = α/2.

The standard error s(p̂1−p̂2) is calculated as follows,

sp̂1² = p̂1(1 − p̂1)/n1

sp̂2² = p̂2(1 − p̂2)/n2

s(p̂1−p̂2) = [sp̂1² + sp̂2²]^0.5

Significance Test

The significance of the difference between the two options can be noted from the confidence interval by observing the range of values from confidence limit L to U. In the event the interval (L to U) passes through zero, the proportions of the two options are not significantly different at the (1 − α) confidence level. When the interval is entirely positive, the proportion of option 1 is significantly higher than the proportion of option 2. On the other hand, if the interval is entirely negative, the proportion of option 1 is significantly smaller than the proportion of option 2.
Example 9.9 Suppose a terminating simulation model is run with two options to determine which is preferable with respect to an event, A. Option 1 is run with n1 = 200 repetitions and event A occurs on x1 = 28 occasions. Hence, p̂1 = x1/n1 = 0.14 is the portion of times that event A has occurred. Option 2 is run with n2 = 200 repetitions and event A occurs on x2 = 44 occasions, whereby p̂2 = x2/n2 = 0.22. The analyst wants to determine the 95 % confidence interval on the difference between the two proportions, (p1 − p2).
The point estimate of the true difference between the two proportions is,

(p̂1 − p̂2) = (0.14 − 0.22) = −0.08.

The variance of each of the proportions is,

sp̂1² = (0.14)(1 − 0.14)/200 = 0.00060.

sp̂2² = (0.22)(1 − 0.22)/200 = 0.00086.

The standard error of the difference between the two proportions is now computed as below,

s(p̂1−p̂2) = √(0.00060 + 0.00086) = 0.0382

To compute the confidence interval with (1 − α) = 0.95, we need the standard normal value z0.025 = 1.96. The confidence limits become,

U = (0.14 − 0.22) + 1.96 × 0.0382 = −0.0052

L = (0.14 − 0.22) − 1.96 × 0.0382 = −0.1546

Therefore, the 95 % confidence interval is,

(−0.1546 ≤ p1 − p2 ≤ −0.0052)

and the associated probability is

P(−0.1546 ≤ p1 − p2 ≤ −0.0052) = 0.95.

Finally, since the range from the lower limit to the upper limit is all negative numbers, the difference between the two proportions is significant at the 0.95 confidence level, and thereby p1 is significantly smaller than p2.
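A minimal sketch of the two-proportion comparison, reproducing Example 9.9.

import math
from scipy import stats

def prop_diff_ci(x1, n1, x2, n2, alpha=0.05):
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = stats.norm.ppf(1 - alpha / 2)
    d = p1 - p2
    return d - z * se, d + z * se

L, U = prop_diff_ci(28, 200, 44, 200)      # about (-0.155, -0.005)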

Comparing k Means of Variable Type Data

Suppose a simulation model with terminating type data where the analyst is comparing various options in the model, seeking the combination of options that yields the optimal efficiency. This could be a mixed model assembly line simulation model where the analyst is seeking the best way to sequence the units down the line to minimize the total idle time and congestion time on the line. One of the output measures is variable type data. Each option is run with n repetitions, and various output results are measured. The analyst wants to determine whether the differences in the output measure are significant.
The one-way analysis of variance is a method to test for significant differences in the output results. In the event the options are found significantly different in an output measure, the next step is to determine which option(s) give significantly better results. The following shows how to use the one-way analysis of variance method.

One-Way Analysis of Variance

Assume k treatments and n repetitions of each are observed (in simulation runs). Note, in this situation, treatment is the same as option and is the common term in use for analysis of variance. With k treatments and n repetitions, the data available are:

xij    i = 1, . . ., k and j = 1, . . ., n.

The one-way analysis of variance method assumes each of the options, i, has mean μi and common variance σ², all unknown, and all normally distributed. The null hypothesis is below:

Ho: μ1 = . . . = μk

The Type I error for this test is: α = P(reject Ho | Ho is true).

In the event Ho is rejected, the analyst will seek the option(s) that yield significantly better results. One way to do this is by comparing the difference between two means, as described earlier in this chapter. More on this is given subsequently.
A first step in using this method is to calculate the sample averages for each treatment, i, and for the total of all treatments, as shown below.

x̄i = Σj xij/n        i = 1 to k

x̄ = Σi x̄i/k

A second step is to compute the sum of squares of treatments (SSTR) and the sum of squares of error (SSE), as below.

SSTR = n Σi (x̄i − x̄)²

SSE = Σi Σj (xij − x̄i)²

The degrees of freedom for the treatments and for the error are the following:

dfTR = k − 1

dfE = nk − k

Next, the mean square of the treatments and the mean square of error are obtained as follows:

MSTR = SSTR/dfTR

MSE = SSE/dfE

The residual errors for each observation are denoted as eij = (xij − x̄i), and the estimate of the variance becomes,

σ̂² = MSE

Further, the expected value of MSE is

E(MSE) = σ²

The expected value of MSTR depends on whether Ho is true or not, as below,

E(MSTR) = σ²   when Ho is true

E(MSTR) > σ²   when Ho is not true

To test whether the null hypothesis is true, Fisher's test is applied and Fo is computed by,

Fo = MSTR/MSE

Fo has the pair of degrees of freedom (dfTR, dfE).
Next, the Fisher's F table is searched to find the value with significance level α and degrees of freedom (dfTR, dfE), denoted as Fα(dfTR, dfE). Finally, Ho is accepted or rejected depending on the outcome from below:

If Fo ≤ Fα(dfTR, dfE), accept Ho

If Fo > Fα(dfTR, dfE), reject Ho
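A minimal sketch of the test; scipy's f_oneway returns the same Fo that is computed by hand in Example 9.10 below.

from scipy import stats

option1 = [10, 9, 11, 8, 12]               # data of Example 9.10
option2 = [6, 10, 7, 8, 9]
option3 = [9, 5, 6, 9, 6]

Fo, p_value = stats.f_oneway(option1, option2, option3)   # Fo about 4.12
F_crit = stats.f.ppf(0.95, 2, 12)          # F_0.05(2, 12) = 3.89
reject_Ho = Fo > F_crit                    # True: reject Ho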

Example 9.10 Suppose a simulation model is run with k = 3 options (treatments) and n = 5 repetitions for each option. One of the output measures is of the variable type, and the smaller the value, the better. The analyst is seeking whether any of the options yields significantly better results. The sample averages of the options and for the total are below:

                Observation j
Option i    1    2    3    4    5    Average
1          10    9   11    8   12    x̄1 = 10.0
2           6   10    7    8    9    x̄2 = 8.0
3           9    5    6    9    6    x̄3 = 7.0
Total                                x̄ = 8.33

The associated sums of squares, degrees of freedom, and mean squares are below.

Sum of squares   Degrees of freedom   Mean squares
SSTR = 23.35     dfTR = 2             MSTR = 11.675
SSE = 34.00      dfE = 12             MSE = 2.833

The measure of Fisher's F is

Fo = MSTR/MSE = 11.675/2.833 = 4.12


The F value in Fisher’s tables with a ¼ 0.05 and degrees of freedom (dfTR, dfE)
¼ (2, 12) yields, F0.05(2,12) ¼ 3.89.
Since, Fo > 3.89, the null hypothesis is rejected, indicating there is a significant
difference in the means from two or more of the options.
In this example, the smaller the mean, the better. The simulation results show
where option 3 gives the best results and option 2 is the next best. The next step is to
determine whether the sample mean values of options 2 and 3 are significantly
different of not.
Example 9.11 Continuing with Example 9.10, the goal now is to determine whether the means of options 2 and 3 are significantly different. The estimate of the variance for the residual errors is σ̂² = MSE = 2.833; thereby, the standard error is σ̂ = √σ̂² = 1.683, and the associated degrees of freedom is dfE = 12. The (1 − α) = 95 % confidence limits (U and L) on the difference between the means of options 2 and 3 are computed using the student's t value with α/2 = 0.025 and dfE = 12, whereby tα/2 = 2.179. Hence,

U = (x̄3 − x̄2) + tα/2 σ̂/√(2n) = 0.160

L = (x̄3 − x̄2) − tα/2 σ̂/√(2n) = −2.160

The 95 % confidence interval becomes,

(−2.16 ≤ μ3 − μ2 ≤ 0.16)

and the corresponding probability is

P(−2.16 ≤ μ3 − μ2 ≤ 0.16) = 0.95

Because the values from L to U pass through zero, the mean of option 3 is not significantly smaller than the mean of option 2 with 95 % confidence.
Example 9.12 The 95 % confidence intervals of all three comparisons are below:

P(−2.16 ≤ μ3 − μ2 ≤ 0.16) = 0.95

P(−3.16 ≤ μ2 − μ1 ≤ −0.84) = 0.95

P(−4.16 ≤ μ3 − μ1 ≤ −1.84) = 0.95

The results show that option 1 is significantly higher than options 2 and 3, since the upper and lower limits are both negative and do not pass through zero. As stated, options 2 and 3 are not significantly different from each other, but option 1 is significantly higher. The analyst might consider taking more samples to gain further precision on comparing the difference between options 2 and 3.

Summary

This chapter describes the common statistical methods that are used to analyze the output data from computer models based on terminating and nonterminating systems. The statistical methods are essentially the same as those described in common statistical textbooks. They include measuring the average, standard deviation, and confidence interval from output data, some of the variable type and some of the proportion type. The methods described also pertain when two or more variables are under review.
Chapter 10
Choosing the Probability Distribution from Data

Introduction

In building a simulation model, the analyst often includes several input variables of
the control and random type. The control variables are those that are of the “what if”
type. Often, the purpose of the simulation model is to determine how to set the
control variables seeking optimal results. For example, in an inventory simulation
model, the control variables may be the service level and the holding rate, both of
which are controlled by the inventory manager. On each run of the model, the
analyst sets the values of the control variables and observes the output measures to
see how the system reacts.
Another type of variable is the input random variable, and these are of the continuous and discrete type. This type of variable is needed to match, as best as possible, the real-life system that the simulation model seeks to emulate. For each such variable, the analyst is confronted with choosing the probability distribution to apply and the parameter value(s) to use. Often empirical or sample data is available to assist in choosing the distribution to apply and in estimating the associated parameter values. Sometimes two or more distributions may seem appropriate, and a choice between them is needed. The authenticity of the simulation model largely depends on how well the analyst can emulate the real system. Choosing the random variables and their parameter values is vital in this process.
This chapter gives guidance on the steps to find the probability distribution to
use in the simulation model and how to estimate the parameter values that
pertain. For each of the random variables in the simulation model with data
available, the following steps are described: verify the data is independent,
compute various statistical measures, choose the candidate probability
distributions, estimate the parameter(s) for each probability distribution, and
determine the adequacy of the fit.


Collecting the Data

For each random variable in the simulation model, the analyst is obliged to seek
actual data (empirical or sample) from the real system under study. For example, if
the variable is the number of units purchased with each customer order for an item,
the data collected might be the history of number of pieces from each order, for the
item, in the past year, say. Since the numbers are integers, the variable is from a
discrete probability distribution. The data is recorded and identified as x1, . . ., xn,
where n is the number of data entries collected. Subsequently, this sample or
empirical data is the catalyst in selecting the probability distribution for the
variable. The data is also needed to estimate the distribution’s parameter value,
and subsequently is applied to compare fitted values with the actual values.

Test for Independence

The data should be independently collected so that the subsequent statistical measures yield valid estimates. An example of non-independent data is the wait time for cars at a tollbooth line when the times are observed one car after the other. When the line is long, the successive wait times will remain high for the consecutive cars in the line. A way to avoid correlated data is to spread the samples out and not collect them one after the other. The common method to test for independence of sequential data is to measure the autocorrelation at various lags.

Autocorrelation

Suppose the data x₁, . . ., xₙ is sequentially collected, and thereby may not be independent. A way to detect independence is by measuring the autocorrelation of the data at various lags. The sample autocorrelation with a lag of k, denoted as r_k, is computed as below:

    r_k = Σ_{i=k+1}^{n} (x_i − x̄)(x_{i−k} − x̄) / Σ_{i=1}^{n} (x_i − x̄)²    k = 1, 2, 3, . . .

where x̄ is the average of all x's. When all sample autocorrelations are near zero, plus or minus, the data is assumed independent. In the event the data appears not independent, the sample should be retaken, perhaps sampling one item out of each five items, or one item per hour, and so forth.
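
As an illustration, a minimal Python sketch of this computation is below. The function name autocorrelation and the short data series are illustrative only, not part of the text.

    import numpy as np

    def autocorrelation(x, k):
        """Sample autocorrelation r_k of the series x at lag k."""
        x = np.asarray(x, dtype=float)
        d = x - x.mean()
        # numerator: sum over i = k+1..n of (x_i - xbar)(x_{i-k} - xbar)
        num = np.sum(d[k:] * d[:-k])
        # denominator: sum over i = 1..n of (x_i - xbar)^2
        den = np.sum(d ** 2)
        return num / den

    # lags 1-3 for a sequentially collected sample
    data = [5.3, 9.8, 5.1, 0.6, 3.9, 8.1, 4.0, 0.1, 4.6, 0.6]
    print([round(autocorrelation(data, k), 3) for k in (1, 2, 3)])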

Example 10.1 Assume a series of observations is taken in sequence and the first three autocorrelations, say, are 0.89, 0.54 and 0.39 for lags of k = 1, 2, 3, respectively. Since they are not near zero (plus or minus), the data does not appear to be independent. On the other hand, if the first three autocorrelations were 0.07, 0.13 and 0.18, the data does appear independent.

Some Useful Statistical Measures

A variety of statistical measures can be computed from the sample data x₁, . . ., xₙ. Some that are useful in selecting the probability distribution are listed here:

    x(1) = minimum
    x(n) = maximum
    x̄ = average
    s = standard deviation
    cov = s/x̄ = coefficient of variation
    τ = s²/x̄ = lexis ratio

Example 10.2 Suppose 20 samples are the following: [5.3, 9.8, 5.1, 0.6, 3.9, 8.1, 4.0, 0.1, 4.6, 0.6, 2.9, 7.1, 2.7, 7.0, 2.5, 5.8, 3.0, 7.6, 3.5, 7.7]. The statistical measures from this data are listed below.

    x(1) = 0.1
    x(20) = 9.8
    x̄ = 4.595
    s = 2.713
    cov = 0.590
    τ = 1.602
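
A short Python sketch that reproduces these measures is below. The helper name summary_measures is hypothetical; the sample standard deviation (ddof = 1) is assumed, which matches s = 2.713 for this data.

    import numpy as np

    def summary_measures(x):
        """Summary statistics used for screening candidate distributions."""
        x = np.asarray(x, dtype=float)
        xbar = x.mean()
        s = x.std(ddof=1)          # sample standard deviation
        return {"min": x.min(), "max": x.max(), "mean": xbar,
                "std": s, "cov": s / xbar, "lexis": s ** 2 / xbar}

    data = [5.3, 9.8, 5.1, 0.6, 3.9, 8.1, 4.0, 0.1, 4.6, 0.6,
            2.9, 7.1, 2.7, 7.0, 2.5, 5.8, 3.0, 7.6, 3.5, 7.7]
    print(summary_measures(data))  # mean 4.595, std 2.713, cov 0.590, lexis 1.602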

Location Parameter

Sometimes it is useful to estimate the location parameter for a distribution, labeled here as γ. This represents the minimum value of x, whereby x ≥ γ. Consider the sorted sample data denoted as x(1) ≤ x(2) ≤ . . . ≤ x(n). A way to estimate γ, given by Zanakis (1979) in the context of the Weibull distribution, is below:

    γ̂ = [x(1)·x(n) − x(k)²] / [x(1) + x(n) − 2x(k)]

where k is the smallest index with x(k) > x(1).


Example 10.3 Suppose 50 observations are taken and are sorted as follows: [16.3, 21.3, 27.4, 35.7, 38.4, . . ., 51.4], where 16.3 is the smallest and 51.4 the largest in the sample. Using the formula given above, the estimate of the minimum value of x becomes

    γ̂ = (16.3 × 51.4 − 21.3²)/(16.3 + 51.4 − 2 × 21.3) = 15.3
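
A small Python sketch of the Zanakis estimator, under the same definition of x(k), is below; the helper name is hypothetical.

    def zanakis_location(sorted_x):
        """Zanakis (1979) estimate of the location parameter from sorted data."""
        x1, xn = sorted_x[0], sorted_x[-1]
        # x(k): the smallest order statistic strictly greater than x(1)
        xk = next(v for v in sorted_x if v > x1)
        return (x1 * xn - xk ** 2) / (x1 + xn - 2 * xk)

    print(zanakis_location([16.3, 21.3, 27.4, 35.7, 38.4, 51.4]))  # about 15.3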

Candidate Probability Distributions

The typical probability distribution candidates for a continuous random variable are the following: continuous uniform, normal, exponential, lognormal, gamma, beta and Weibull. The more common discrete probability distributions are the discrete uniform, binomial, geometric, Pascal and Poisson.

Transforming Variables

In the pursuit of seeking the candidate distribution to use, it is sometimes helpful to convert a variable x to another variable, x′, where x′ ranges from zero to one, or where x′ is zero or larger. More discussion is below.

Transform Data to (0,1)

A way to convert a variable to a range where x′ lies between 0 and 1 is described here. Recall the summary statistics of the variable x as listed earlier. It is possible to estimate the summary statistics when the variable is transformed to lie between 0 and 1. For convenience in notation, let a′ = x(1) for the minimum, and b′ = x(n) for the maximum. When x is converted to x′ by the relation x′ = (x − a′)/(b′ − a′), the sample average and standard deviation become x̄′ = (x̄ − a′)/(b′ − a′) and s′ = s/(b′ − a′), respectively. The corresponding coefficient of variation is cov′ = s/(x̄ − a′). This measure of cov may be useful when selecting the distribution to apply.

Transform Data to (x ≥ 0)

A way to convert a variable to a range where x′ lies approximately zero and larger is described here. Recall again the summary statistics of the variable x as listed earlier, and once more use the notation a′ = x(1) for the minimum. When x is converted to x′ by the relation x′ = (x − a′), the range of x′ becomes zero or larger. The corresponding sample average and standard deviation become x̄′ = (x̄ − a′) and s′ = s, respectively. Finally, the coefficient of variation is cov′ = s/(x̄ − a′).

Candidate Continuous Distributions

Below is a brief review of some of the properties of the continuous probability distributions. These are the following: continuous uniform, normal, exponential, lognormal, gamma, beta and Weibull. Of particular interest with each distribution is the coefficient of variation (cov) and its range of values that apply. When sample data is available, the sample cov can be measured and compared to each distribution's range to help narrow the choice for a candidate distribution.

Continuous Uniform

The random variable x from the continuous uniform distribution (0,1) has a range of zero to one. The mean is μ = 0.5 and the standard deviation is σ = 1/√12 = 0.289, and thereby the coefficient of variation becomes cov = σ/μ = 0.577.

Normal

When a variable x is normally distributed with mean μ and standard deviation σ, the notation is x ~ N(μ, σ²). Note, the coefficient of variation for x is cov = σ/μ. When all x values are zero or larger, the coefficient of variation is always 0.33 or smaller, i.e., cov ≤ 0.33.

Exponential

Recall the exponential distribution where the variable x is zero or larger. The mean, μ, and standard deviation, σ, of this distribution have the same value, and thereby the coefficient of variation is cov = σ/μ = 1.00.

Lognormal

When the variable x of a lognormal distribution is converted to the natural logarithm (x′ = ln(x)), the notation for x is x ~ LN(μ′, σ′²), and for the transformation it is x′ ~ N(μ′, σ′²). Note, the parameters μ′ and σ′² are the mean and variance, respectively, of x′, the normally distributed variable, and not of the lognormally distributed variable. The coefficient of variation for the variable x′ becomes cov′ = σ′/μ′.

Gamma

The variable x from the (standard) gamma distribution is always zero or larger and has parameters (k, θ). Recall, the mean and variance of x are μ = k/θ and σ² = k/θ², respectively, and therefore the coefficient of variation is cov = 1/√k. When k > 1, cov is less than one. When k ≤ 1, cov is one or larger. Note, the mode is (k − 1)/θ when k ≥ 1, and is zero when k < 1.

Beta

The variable x from a beta distribution has many shapes that could skew right or left or be symmetric, and could look like the uniform, the normal, and may even have a bathtub-like shape. This distribution emulates most shapes, but is a bit difficult to apply. The parameters are (k₁, k₂), and the mean and variance are shown below:

    μ = k₁/(k₁ + k₂)

    σ² = (k₁k₂) / [(k₁ + k₂)²(k₁ + k₂ + 1)]

Weibull

The variable x from a Weibull distribution has three parameters, (k₁, k₂, γ), where γ is the location parameter and can be estimated from the relation given earlier. The values of x are greater than γ and the shape is skewed to the right after the mode is reached. The mean and variance are below:

    μ = (k₂/k₁) Γ(1/k₁)

    σ² = (k₂²/k₁) {2Γ(2/k₁) − (1/k₁)[Γ(1/k₁)]²}

Some Candidate Discrete Distributions

An important statistic in determining the candidate discrete distribution is the lexis ratio, τ = σ²/μ. The lexis ratio can be estimated from sample data by τ̂ = σ̂²/μ̂, where μ̂ = x̄ = sample average and σ̂² = s² = sample variance. Below is a description of some of the properties concerning the lexis ratio for the more common discrete distributions.

Discrete Uniform

The variable x with the discrete uniform distribution has parameters (a,b), where x takes all the integer values from a to b. The mean and variance of x are μ = (a + b)/2 and σ² = [(b − a + 1)² − 1]/12, respectively. When a = 0, the lexis ratio is τ = σ²/μ = [(b + 1)² − 1]/(6b). Note, when b ≥ 4, τ ≥ 1.

Binomial

The parameters for the binomial distribution are n (number of trials) and p (probability of a success per trial). The random variable is x (number of successes in n trials). The mean of x is μ = np, and the variance is σ² = np(1 − p). Hence, the lexis ratio is τ = σ²/μ = (1 − p) < 1.

Geometric

Recall the geometric distribution where the parameter is p, the probability of a success on each trial. When the random variable is x (number of fails until the first success), x = 0, 1, . . ., the mean is μ = (1 − p)/p, the variance is σ² = (1 − p)/p², and the lexis ratio becomes τ = σ²/μ = 1/p, which is always larger than one.
    But when x′ = (x + 1), the variable is the number of trials until a success, x′ = 1, 2, . . ., the mean is μ = 1/p, and the variance is σ² = (1 − p)/p². The lexis ratio, τ = σ²/μ = (1 − p)/p, is inconclusive since the ratio ranges below and above one.

Pascal

The parameters for the Pascal distribution are p (probability of a success) and k (number of successes). The random variable is x (number of fails till k successes), where x = 0, 1, 2, . . ., the mean is μ = k(1 − p)/p, and the variance is σ² = k(1 − p)/p². The lexis ratio is τ = σ²/μ = 1/p > 1.
    But when x′ = (x + k) is the number of trials until k successes, x′ = k, k + 1, . . ., the mean is μ = k/p, the variance remains σ² = k(1 − p)/p², and the lexis ratio becomes τ = σ²/μ = (1 − p)/p. In this situation, the lexis ratio ranges above and below one.

Poisson

The parameter for the Poisson distribution is θ (rate per unit of measure), where the unit of measure is typically a unit of time (minute, hour), and so forth. The random variable is x (number of events in a unit of measure). Since the mean of x is μ = θ and the variance is σ² = θ, the lexis ratio becomes τ = σ²/μ = 1.

Estimating Parameters for Continuous Distributions

Below are the popular ways to estimate the parameters for the common continuous distributions. These are the maximum-likelihood estimators and/or the method-of-moments estimators.

Continuous Uniform

The parameters of the continuous uniform distribution are (a,b), where the variable x is equally likely to fall anywhere from a to b. When the data x₁, . . ., xₙ is available, the maximum likelihood estimates of the parameters are as follows:

    â = min(x₁, . . ., xₙ)
    b̂ = max(x₁, . . ., xₙ)

Another way to estimate the parameters for this distribution is by the method of moments. The same data is used to first compute the sample average, x̄, and the sample standard deviation, s. Next, the estimates of the parameters are obtained in the following way:

    â = x̄ − √12·s/2
    b̂ = x̄ + √12·s/2

Example 10.4 Consider a situation where a sample of n = 20 yields the following sorted data: [0.1, 0.6, 0.6, 2.5, 2.7, 2.9, 3.0, 3.5, 3.9, 4.0, 4.6, 5.1, 5.3, 5.8, 7.0, 7.1, 7.6, 7.7, 8.1, 9.8], and suppose the analyst suspects the data comes from a continuous uniform distribution and thereby needs estimates of the parameters a and b. From the maximum likelihood estimator method, the estimates of the parameters are â = 0.1 and b̂ = 9.8.
    Another way to estimate the parameters is by the method of moments. To find the estimates this way, the average and standard deviation of the data entries are needed, and they are x̄ = 4.595 and s = 2.713, respectively. Thereby the method-of-moments estimates become â = −0.10 and b̂ = 9.29.
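
A brief Python sketch of both estimation methods is below; the helper names are hypothetical.

    import math

    def uniform_mle(x):
        """Maximum likelihood estimates (a, b) for the continuous uniform."""
        return min(x), max(x)

    def uniform_mom(x):
        """Method-of-moments estimates (a, b) for the continuous uniform."""
        n = len(x)
        xbar = sum(x) / n
        s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
        half = math.sqrt(12.0) * s / 2.0
        return xbar - half, xbar + half

    data = [0.1, 0.6, 0.6, 2.5, 2.7, 2.9, 3.0, 3.5, 3.9, 4.0,
            4.6, 5.1, 5.3, 5.8, 7.0, 7.1, 7.6, 7.7, 8.1, 9.8]
    print(uniform_mle(data))  # (0.1, 9.8)
    print(uniform_mom(data))  # about (-0.10, 9.29)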

Normal Distribution

The normal distribution has two parameters: μ, the mean, and σ², the variance. The estimates are obtained from the sample mean, x̄, and sample variance, s², as below:

    μ̂ = x̄
    σ̂² = s²

Example 10.5 Suppose the analyst has ten sorted sample data entries as [1.3, 6.4, 7.1, 8.7, 9.1, 10.2, 11.5, 14.3, 16.1, 18.0]. The sample average is x̄ = 10.27 and the standard deviation is s = 4.95. Hence, x is estimated as N(10.27, 4.95²).

Exponential

The exponential distribution has one parameter, θ, where the mean and standard deviation of x are equal, whereby μ = σ = 1/θ. The maximum-likelihood estimator of the parameter is based on the sample mean, x̄, as shown below:

    θ̂ = 1/x̄

Example 10.6 Suppose the analyst has the following data with n = 10 observations: 3.0, 5.7, 10.8, 0.3, 1.5, 2.5, 4.5, 7.3, 1.3, 2.1, and assumes the data comes from an exponential distribution. The sample average is x̄ = 3.90, and thereby the estimate of the exponential parameter is θ̂ = 1/x̄ = 1/3.90 = 0.256. Upon further computations, the standard deviation of the ten observations is measured as s = 3.24, not too far away from the average of 3.90.

Lognormal

Consider the variable x of the lognormal distribution, and another related variable, y, that is the natural logarithm of x, i.e., y = ln(x). The parameters for x are the mean and variance of y, denoted as μ_y and σ_y², respectively. To estimate the parameters, the n corresponding values of y (y₁, . . ., yₙ) are needed to give the sample average, ȳ, and the sample variance, s_y². The estimates of the parameters for the lognormal distribution are the following:

    μ̂_y = ȳ
    σ̂_y² = s_y²

Example 10.7 Assume the analyst has collected ten sample entries as X = [0.3, 1.3, 1.5, 2.1, 2.5, 3.0, 4.5, 5.7, 7.3, 10.8]. Upon taking the natural logarithm of each, y = ln(x), the sample now has ten values of y. The corresponding values of y are Y = [−1.204, 0.262, 0.405, 0.742, 0.916, 1.099, 1.504, 1.741, 1.988, 2.379]. The mean and variance of the n = 10 observations of y are ȳ = 0.983 and s_y² = 1.057.

Gamma

The variable x from the gamma distribution has two parameters (k, θ). The mean of x is μ = k/θ, and the variance is σ² = k/θ². One way to estimate the parameters is by the method of moments using the sample average, x̄, and the sample variance, s², that are computed from the data x₁, . . ., xₙ. The estimates of the gamma parameters are derived from

    θ̂ = x̄/s²
    k̂ = x̄·θ̂

Example 10.8 Assume a sample of n entries, x₁, . . ., xₙ, is collected, from which the average and variance are measured as x̄ = 10.8 and s² = 4.3, respectively. The analyst wants to estimate the gamma parameters for this data. Using the method of moments, the estimates are θ̂ = 2.51 and k̂ = 27.12.
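
A minimal Python sketch of the method-of-moments computation is below; the helper name is hypothetical.

    def gamma_mom(xbar, s2):
        """Method-of-moments estimates (k, theta) for the gamma distribution."""
        theta = xbar / s2
        k = xbar * theta
        return k, theta

    print(gamma_mom(10.8, 4.3))  # k about 27.12, theta about 2.51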

Beta

The variable x from the beta distribution (0-1) has two parameters (k₁, k₂). The mean of x is μ = k₁/(k₁ + k₂), and when the two parameters are greater than zero (k₁ > 0, k₂ > 0), the mode is x̃ = (k₁ − 1)/(k₁ + k₂ − 2). In the typical situation, the distribution skews to the right. This occurs when k₂ > k₁ > 1. For this situation, a way to estimate the parameters is with use of the sample average, x̄, and the sample mode, x̃. From the two equations and two unknowns, and some algebra, the estimates of the parameters are computed as below:

    k̂₁ = x̄(2x̃ − 1)/(x̃ − x̄)
    k̂₂ = (1 − x̄)k̂₁/x̄

Example 10.9 Assume sample data of x that lies between 0 and 1 yields the average and mode as x̄ = 0.4 and x̃ = 0.2, respectively. The analyst wants to estimate the parameters for a beta distribution in the range (0-1). The estimates are below:

    k̂₁ = 0.4[2 × 0.2 − 1]/[0.2 − 0.4] = 1.2
    k̂₂ = [1 − 0.4]1.2/[0.4] = 1.8
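
A minimal Python sketch of these two equations is below; the helper name is hypothetical, and the formulas apply only to the right-skewed case described above.

    def beta_from_mean_mode(xbar, xmode):
        """Estimates (k1, k2) of the beta (0-1) shape parameters from the
        sample mean and sample mode (right-skewed case, k2 > k1 > 1)."""
        k1 = xbar * (2 * xmode - 1) / (xmode - xbar)
        k2 = (1 - xbar) * k1 / xbar
        return k1, k2

    print(beta_from_mean_mode(0.4, 0.2))  # (1.2, 1.8)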

Estimating Parameters for Discrete Distributions

Below are the popular ways to estimate the parameters for the common discrete distributions. These are the maximum-likelihood estimators and/or the method-of-moments estimators.

Discrete Uniform

The variable x from the discrete uniform distribution has two parameters (a,b), where the variable x is equally likely to fall on any integer from a to b. The sample data (x₁, . . ., xₙ) is used to find the minimum, x(1), and maximum, x(n). The maximum likelihood estimates of the parameters, a and b, are obtained as below:

    â = x(1)
    b̂ = x(n)

Another way to estimate the parameters is by the method of moments. The mean of x is μ = (a + b)/2 and the variance is σ² = [(b − a + 1)² − 1]/12. Using the sample mean, x̄, and sample variance, s², and a bit of algebra, the following parameter estimates are found:

    â = floor integer of (x̄ + 0.5 − 0.5√(12s² + 1))
    b̂ = ceiling integer of (x̄ − 0.5 + 0.5√(12s² + 1))

Example 10.10 Suppose an analyst collects ten discrete sample data [7, 5, 4, 8, 5, 4, 12, 9, 2, 8] and wants to estimate the min and max parameters of a discrete uniform distribution. Using the maximum likelihood estimator, the minimum and maximum estimates are:

    â = 2
    b̂ = 12

The method-of-moments estimate of the parameters requires finding the sample average and sample variance. These are x̄ = 6.4 and s² = 8.711. So now, the estimates of the parameters become

    â = floor(6.4 + 0.5 − 0.5√(12 × 8.711 + 1)) = floor(1.764) = 1
    b̂ = ceiling(6.4 − 0.5 + 0.5√(12 × 8.711 + 1)) = ceiling(11.036) = 12
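
A small Python sketch of the method-of-moments estimates is below; the helper name is hypothetical and the sample variance uses ddof = 1, which matches s² = 8.711 for this data.

    import math

    def discrete_uniform_mom(x):
        """Method-of-moments estimates (a, b) for the discrete uniform."""
        n = len(x)
        xbar = sum(x) / n
        s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)   # sample variance
        half = 0.5 * math.sqrt(12 * s2 + 1)
        return math.floor(xbar + 0.5 - half), math.ceil(xbar - 0.5 + half)

    print(discrete_uniform_mom([7, 5, 4, 8, 5, 4, 12, 9, 2, 8]))  # (1, 12)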

Binomial

The variable x from the binomial distribution has parameters (n, p), where typically n is known and p is not. The expected value of x is E(x) = np, and thereby, when a sample of n trials yields x successes, the maximum likelihood estimate of p is

    p̂ = x/n

In the event the n-trial experiment is run m times, and the results are (x₁, . . ., xₘ) with an average of x̄, the estimate of p becomes

    p̂ = x̄/n

Example 10.11 Suppose m = 5 experiments of binomial data with n = 8 trials are observed with the results [1, 3, 2, 2, 0]. Since the average is x̄ = 1.6, the estimate of p is p̂ = 1.6/8 = 0.2.

Geometric

Consider the geometric distribution where the variable x (0, 1, 2, . . .) is the number of fails before the first success and p is the probability of a success per trial. The expected value of x is E(x) = (1 − p)/p. When m samples of x are taken, with results (x₁, . . ., xₘ) and a sample average x̄, the maximum-likelihood estimator of p becomes

    p̂ = 1/(x̄ + 1)

Example 10.12 Suppose m = 8 samples from geometric data are observed and yield the following values of x: [3, 6, 2, 5, 4, 4, 1, 5], where x is the number of failures till the first success. The analyst wants to estimate the probability of a success, p, and since the average of x is x̄ = 3.75, the estimate becomes p̂ = 1/(3.75 + 1) = 0.211.

Pascal

Recall the Pascal distribution where the variable x is the number of failures till k successes. The parameters are (k, p), where k is known, and assume p is not known. When m samples of x are taken, with results (x₁, . . ., xₘ), and a sample average x̄ is computed, the maximum-likelihood estimator of p becomes the following:

    p̂ = k/(x̄ + k)

Example 10.13 Suppose m = 5 samples from the Pascal distribution with parameter k = 4 are observed and yield the following data entries of x: [6, 4, 7, 5, 6], where x is the number of failures till k successes. The analyst wants to estimate the probability of a success, and since the average of x is x̄ = 5.60, the estimate is p̂ = 4/(5.60 + 4) = 0.417.

Poisson

The variable x (0, 1, 2, . . .) from the Poisson distribution has parameter θ. The expected value of x is E(x) = θ. When m samples of x (x₁, . . ., xₘ) are collected, the sample average of x is readily computed as x̄. Using the sample mean, the maximum likelihood estimator of θ becomes

    θ̂ = x̄

Example 10.14 Suppose m = 10 samples from Poisson data are observed and yield the following values of x: [0, 0, 1, 2, 2, 0, 1, 2, 0, 1]. The analyst wants to estimate the Poisson parameter, θ, and since the average of x is x̄ = 0.90, the estimate is θ̂ = 0.90.

Q-Q Plot

The Q-Q plot is a graphical way to compare the quantiles of sample (empirical) data to the quantiles from a specified probability distribution as a way of observing the goodness-of-fit. This plot applies to continuous probability distributions. See Wilk and Gnanadesikan (1968) for a fuller description of the Q-Q (quantile-to-quantile) plot.
    To carry out the plot, the empirical or sample data [x₁, . . ., xₙ] are first arranged in sorted order [x(1), . . ., x(n)], where x(1) is the smallest value and x(n) the largest. The quantiles for the sample data are merely [x(1), . . ., x(n)]. The empirical cumulative distribution function (cdf) of the sample quantiles is computed and denoted as

    F[x(i)] = w_i = (i − 0.5)/n    i = 1 to n

For example, if n = 10 and i = 1, F[x(1)] = w₁ = 0.05. At i = 2, F[x(2)] = w₂ = 0.15; at i = 10, F[x(10)] = w₁₀ = 0.95, and so forth. The set of ten probabilities is denoted as Ps = [w₁, . . ., w₁₀]. Note, for each x(i) there is an associated w_i.
    Consider a probability distribution f(x) where the cumulative probability distribution is F(x). The corresponding quantiles for this distribution are obtained by the inverse function

    x_i′ = F⁻¹(w_i)    i = 1 to n

For each quantile from the sample, a corresponding quantile is computed for the probability distribution. For convenience, the pairs of quantiles are labeled as Xs = [x(1), . . ., x(n)] for the sample data, and Xf = [x₁′, . . ., xₙ′] for the fit from the probability model.
    The n pairs of quantiles are now placed on a scatter plot with the sample quantiles, Xs, on the x-axis and the probability model quantiles, Xf, on the y-axis. In the event the probability model is a good fit to the sample data, the scatter plot will look like a straight line going through a 45° angle from the lower left-hand side to the upper right-hand side, and the scales of the x and y axes will be similar. In the literature, it is noted that some references place Xs on the y-axis and Xf on the x-axis.
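
A compact Python sketch of the procedure is below. The helper qq_plot and the use of matplotlib are illustrative choices, not part of the text; the inverse cdf of the fitted distribution is passed in as a function.

    import numpy as np
    import matplotlib.pyplot as plt

    def qq_plot(sample, inverse_cdf):
        """Q-Q plot: sample quantiles vs. quantiles of a fitted distribution."""
        xs = np.sort(np.asarray(sample, dtype=float))   # sample quantiles Xs
        n = len(xs)
        w = (np.arange(1, n + 1) - 0.5) / n             # w_i = (i - 0.5)/n
        xf = inverse_cdf(w)                             # fitted quantiles Xf
        plt.scatter(xs, xf)
        lo, hi = min(xs.min(), xf.min()), max(xs.max(), xf.max())
        plt.plot([lo, hi], [lo, hi])                    # 45-degree reference line
        plt.xlabel("Xs (sample)"); plt.ylabel("Xf (fit)")
        plt.show()

    # Example 10.15 below: continuous uniform on (0, 10), so F^-1(w) = w/0.1
    qq_plot([8.3, 2.5, 1.3, 9.4, 5.0], lambda w: w / 0.1)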
Example 10.15 Suppose n = 5 sample (or empirical) data of a variable are observed as [8.3, 2.5, 1.3, 9.4, 5.0]. The data are sorted and the sample quantiles are Xs = [1.3, 2.5, 5.0, 8.3, 9.4]. The set of empirical probabilities is obtained from the n samples and is listed in vector form as Ps = [0.1, 0.3, 0.5, 0.7, 0.9].

Assume the sample data are to be compared to a continuous uniform distribution where f(x) = 0.1 for 0 ≤ x ≤ 10. Since the cumulative distribution of x becomes F(x) = 0.1x = w, the quantile for each w is obtained by x = F⁻¹(w) = w/0.1. For each probability in the sample set, Ps, an associated fit from the model quantile is computed as

    x_i = w_i/0.1    i = 1 to 5

Thereby, the five probability fit quantiles are Xf = [1.0, 3.0, 5.0, 7.0, 9.0]. The Q-Q plot for the pairs of quantiles (Xs, Xf) is shown in Fig. 10.1. Since the scatter appears much like a straight line with a 45° fit, the conclusion is that the sample data is a reasonably close fit to the continuous uniform distribution that is under consideration.
Example 10.16 Consider once more the sample data from Example 10.15 where n = 5 and the sorted data yield the quantile set Xs = [1.3, 2.5, 5.0, 8.3, 9.4], and associated empirical probabilities Ps = [0.1, 0.3, 0.5, 0.7, 0.9].
    Now suppose the sample data are to be compared to a continuous distribution f(x) = x/50 for 0 ≤ x ≤ 10. Note, the cumulative distribution of x becomes F(x) = x²/100 = w. So now, for probability w, the quantile is computed by x = F⁻¹(w) = √(100w) = 10√w. Hence, for each w in the sample set, Ps, an associated fitted quantile is obtained by

    x_i = 10√w_i    i = 1 to 5

Thereby, the five quantiles for the probability fit become Xf = [3.2, 5.5, 7.1, 8.4, 9.5]. The Q-Q plot for the pairs of quantiles (Xs, Xf) is shown in Fig. 10.2. Since the scatter plot is not close to the 45° line, the conclusion is that the sample data is not a good fit to the probability distribution under consideration.
Example 10.17 (Continuous Uniform Q-Q Plot) Consider the sample data from Example 10.4, and suppose the analyst wants to run a Q-Q plot assuming the probability distribution is a continuous uniform distribution. Recall from the earlier example, the MLE estimates of the parameters are â = 0.1 and b̂ = 9.8. Hence, the probability function is estimated as f(x) = 1/9.7 for 0.1 ≤ x ≤ 9.8, and the cumulative distribution is F(x) = (x − 0.1)/9.7. So, when the cumulative probability is w = F(x), the associated quantile becomes x = F⁻¹(w) = 0.1 + 9.7w. At i = 1, the minimum rank, w₁ = (1 − 0.5)/20 = 0.025 and x₁′ = 0.1 + 0.025(9.7) = 0.34. At i = 20, the maximum rank, w₂₀ = (20 − 0.5)/20 = 0.975 and x₂₀′ = 0.1 + 0.975(9.7) = 9.56, and so forth. The full set of quantiles for the probability fit is denoted as Xf and, for simplicity, is listed here with one-decimal accuracy: Xf = [0.3, 0.8, 1.3, 1.8, 2.3, 2.8, 3.3, 3.7, 4.2, 4.7, 5.2, 5.7, 6.2, 6.7, 7.1, 7.6, 8.1, 8.6, 9.1, 9.6].
    Using the pairs (Xs, Xf), the Q-Q plot is in Fig. 10.3. The vector Xs contains the sample data from Example 10.4. Note, the scatter plot closely fits a 45° angle from the lower left corner to the upper right corner, and thereby the sample data appears as a good fit to the continuous uniform distribution with parameters â = 0.1 and b̂ = 9.8.
[Fig. 10.1 Q-Q plot when f(x) = 0.1 for 0 ≤ x ≤ 10; scatter of Xs vs. Xf]
Example 10.18 (Normal Q-Q Plot) Recall Example 10.5 where an analyst has collected a sample of n = 10 observations: [7.1, 9.1, 11.5, 16.1, 18.0, 14.3, 10.2, 8.7, 6.4, 1.3], and where the sample average and standard deviation of the data are x̄ = 10.27 and s = 4.95, respectively. The sorted values give the sample quantile set Xs = [1.3, 6.4, 7.1, 8.7, 9.1, 10.2, 11.5, 14.3, 16.1, 18.0]. With n = 10, the ith sorted cumulative probability for the sample quantiles is w_i = (i − 0.5)/10 for (i = 1 to 10). The set Ps of cumulative probabilities is Ps = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]. Note the cumulative probability for i = 1 is w₁ = 0.05, for i = 2 it is w₂ = 0.15, and so forth.
    The analyst wishes to explore how the data fits with the normal distribution. To do this, the z variable from the standard normal distribution is needed for each w entry in the sample probability set Ps. This gives another set, here denoted as Z = [−1.645, −1.036, −0.674, −0.385, −0.125, 0.125, 0.385, 0.674, 1.036, 1.645]. See Table A.1 in the Appendix. Note at i = 1, w₁ = 0.05 and z₁ = −1.645, whereby P(z < −1.645) = 0.05. In the same way, all the z values are obtained from the standard normal distribution. Now using the average, x̄, standard deviation, s, and z values, it is possible to compute the n = 10 fitted quantiles for the normal distribution by the following formula:

    x_i′ = x̄ + z_i·s    i = 1 to 10

[Fig. 10.2 Q-Q plot when f(x) = x/50 for 0 ≤ x ≤ 10; scatter of Xs vs. Xf]

[Fig. 10.3 Q-Q plot for the continuous uniform example; scatter of Xs vs. Xf]

Applying the above formula yields the quantiles for the probability model, Xf = [2.12, 5.14, 6.93, 8.36, 9.65, 10.89, 12.18, 13.61, 15.40, 18.42]. The Q-Q plot comparing the quantiles from the sample, Xs, with the quantiles from the probability fit, Xf, is shown in Fig. 10.4. Since the plot closely follows the 45° line from the lower left-hand side to the upper right-hand side, the sample data seems like a good fit with the normal distribution.

[Fig. 10.4 Q-Q plot for the normal example; scatter of Xs vs. Xf]

Example 10.19 (Exponential Q-Q Plot) Recall the n = 10 observations from Example 10.6 where the sample average and standard deviation were x̄ = 3.90 and s = 3.24, respectively. The sorted data yields the sample quantiles Xs = [0.3, 1.3, 1.5, 2.1, 2.5, 3.0, 4.5, 5.7, 7.3, 10.8]. Since n = 10, the ith sorted cumulative probability for the sample quantiles is w_i = (i − 0.5)/10 for (i = 1 to 10), and the set Ps of cumulative probabilities is Ps = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95].
    Now assume the analyst wants to compare the data to an exponential distribution with parameter θ = 1/x̄ = 1/3.9 = 0.256. The exponential density is f(x) = θ exp(−θx), and the cumulative distribution is F(x) = 1 − exp(−θx). For a given cumulative probability w = F(x), the associated value of x is obtained by the relation below:

    x = −(1/θ) ln(1 − w)

where ln is the natural logarithm.
    So for the n = 10 values of w listed above, the corresponding values of x are computed and are the ten fitted quantiles for the exponential distribution. They are labeled as Xf, where Xf = [0.20, 0.63, 1.12, 1.68, 2.33, 3.11, 4.09, 5.41, 7.40, 11.68]. See Fig. 10.5 showing the Q-Q plot that relates the sample quantiles, Xs, with the exponential quantiles, Xf. Because the plot is a good fit through a 45° line from the lower left-hand corner to the upper right-hand corner, the exponential distribution appears as a good fit to the ten sample observations.

[Fig. 10.5 Q-Q plot for the exponential example; scatter of Xs vs. Xf]

Example 10.20 (Lognormal Q-Q Plot) Suppose the analyst wants to run a Q-Q plot comparing the lognormal distribution on the same data of Example 10.7. Recall, the ten observations are Xs = [0.3, 1.3, 1.5, 2.1, 2.5, 3.0, 4.5, 5.7, 7.3, 10.8], and the cumulative probabilities are Ps = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]. This gives the set Z = [−1.645, −1.036, −0.674, −0.385, −0.125, 0.125, 0.385, 0.674, 1.036, 1.645]. Recall at i = 1, w₁ = 0.05 and z₁ = −1.645, whereby P(z < −1.645) = 0.05, and so forth. All the z values are obtained from the standard normal distribution. To test for the lognormal, the natural logarithm of each sample is taken as y_i = ln(x_i) for i = 1 to 10. The ten quantiles are the transformed data and are denoted as Ys = [−1.204, 0.262, 0.405, 0.742, 0.916, 1.099, 1.504, 1.741, 1.988, 2.379]. The average and standard deviation of the ten values of y are ȳ = 0.9833 and s = 1.0283, respectively. For each z_i, the corresponding (fitted) entry is obtained from the relation below:

    y_i′ = ȳ + z_i·s    for i = 1 to 10

The ten fitted values of the normal distribution are now compared to their counterparts y_i (i = 1 to 10), and are listed as Yf = [−0.706, 0.080, 0.292, 0.589, 0.857, 1.113, 1.381, 1.678, 2.051, 2.677]. The ten paired data of Ys and Yf form the Q-Q plot in Fig. 10.6. Since the plotted data lie below the 45° line from the lower left-hand corner to the upper right-hand corner, the lognormal distribution does not appear as a good fit for the data.

[Fig. 10.6 Q-Q plot for the lognormal example; scatter of Ys vs. Yf]

P-P Plot

The P-P plot is a graphical way to compare the cumulative distribution function (cdf) of sample (or empirical) data to the cdf of a specified probability distribution as a way of detecting the goodness-of-fit. The plot applies for both continuous and discrete probability distributions. See Wilk and Gnanadesikan (1968) for a fuller description of the P-P (probability-to-probability) plot. To carry out the plot, the sample data (x₁, . . ., xₙ) are first arranged in sorted order [x(1), . . ., x(n)], where x(1) is the smallest value and x(n) the largest. The cdf values for the sample data, F[x(i)], i = 1 to n, are denoted here as (w₁, . . ., wₙ), where

    w_i = F[x(i)] = (i − 0.5)/n    i = 1 to n

For example, if n = 10 and i = 1, F[x(1)] = w₁ = 0.05. At i = 2, F[x(2)] = 0.15; at i = 10, F[x(10)] = w₁₀ = 0.95, and so forth. The set of ten probabilities is denoted as Fs = [w₁, . . ., w₁₀]. Note, there is one w_i for each x(i).
    Now consider a probability distribution f(x) where the cumulative probability distribution is F(x). The corresponding cdf values for this distribution are obtained by

    w_i′ = F(x(i))    i = 1 to n

For each cdf value from the sample, a corresponding cdf value is computed for the probability distribution. For convenience, the pairs of cdf values are labeled as Fs = [w₁, . . ., wₙ] for the sample data, and Ff = [w₁′, . . ., wₙ′] for the probability model.
    The n pairs of cdf values are now placed on a scatter plot with the sample cdf values, Fs, on the x-axis and the probability model cdf values, Ff, on the y-axis. In the event the probability model is a good fit to the sample data, the scatter plot will look like a straight line going through a 45° angle from the lower left-hand side to the upper right-hand side, and the scales of the x-axis and y-axis are similar. Note, in some references, Fs is placed on the y-axis and Ff on the x-axis.
[Fig. 10.7 P-P plot for the discrete uniform example; scatter of Fs vs. Ff]
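
A compact Python sketch of the procedure is below; as with the Q-Q sketch earlier, the helper name and the use of matplotlib are illustrative, not part of the text. The call reuses the discrete uniform fit of Example 10.21 below, with the data values of Example 10.10.

    import numpy as np
    import matplotlib.pyplot as plt

    def pp_plot(sample, cdf):
        """P-P plot: empirical cdf values vs. fitted cdf values."""
        xs = np.sort(np.asarray(sample, dtype=float))
        n = len(xs)
        fs = (np.arange(1, n + 1) - 0.5) / n   # empirical cdf w_i = (i - 0.5)/n
        ff = cdf(xs)                           # fitted cdf at each sorted point
        plt.scatter(fs, ff)
        plt.plot([0, 1], [0, 1])               # 45-degree reference line
        plt.xlabel("Fs (sample)"); plt.ylabel("Ff (fit)")
        plt.show()

    # discrete uniform with a = 2, b = 12: F(x) = (x - a + 1)/(b - a + 1)
    pp_plot([7, 5, 4, 8, 5, 4, 12, 9, 2, 8], lambda x: (x - 2 + 1) / (12 - 2 + 1))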
Example 10.21 (Discrete Uniform) Recall Example 10.10 where n = 10 samples were taken from data assumed as discrete uniform, and where the maximum likelihood estimates of the parameters are â = 2 and b̂ = 12. The sorted data becomes [2, 4, 4, 5, 5, 7, 8, 8, 9, 10]. Since n = 10, the sample cdf values for this data are listed here as Fs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]. Recall, for a discrete uniform distribution with parameters (a,b), the cdf is computed by F[x] = (x − a + 1)/(b − a + 1). Hence, the cdf values for the fitted probability model become Ff = [0.09, 0.27, 0.27, 0.36, 0.36, 0.55, 0.64, 0.64, 0.73, 0.81]. The P-P plot is shown in Fig. 10.7. Since the plotted points are similar to a 45° line, the sample data appears reasonably close to a discrete uniform distribution.
Example 10.22 (Binomial) Consider Example 10.11 where m = 5 samples on binomial data for the number of successes in n = 8 trials yield an estimate of p̂ = 0.20. The sorted data is Xs = [0, 1, 2, 2, 3] and the cdf of the sample data becomes Fs = [0.1, 0.3, 0.5, 0.7, 0.9]. From the binomial distribution, the probabilities of x successes in n = 8 trials with p = 0.2 are computed as p(0) = 0.168, p(1) = 0.336, p(2) = 0.293, p(3) = 0.146, . . . Hence the associated cdf values for the fitted probability model are F(0) = 0.168, F(1) = 0.504, F(2) = 0.797, F(3) = 0.943, . . ., and thereby Ff = [0.168, 0.504, 0.797, 0.797, 0.943]. Figure 10.8 is the P-P plot for this data. The plot somewhat follows a 45° line and, as such, the binomial probability distribution appears as a fair fit to the data.
Example 10.23 (Geometric) Recall Example 10.12 where m = 8 samples are taken from a geometric distribution where x is the number of failures till a success. The estimate of the success probability for the example is p̂ = 0.211. The sorted

[Fig. 10.8 P-P plot for the binomial example; scatter of Fs vs. Ff]

[Fig. 10.9 P-P plot for the geometric example; scatter of Fs vs. Ff]

data are Xs = [1, 2, 3, 4, 4, 5, 5, 6], and the corresponding cdf values are Fs = [0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375]. Using p = 0.211 and the cumulative distribution function

    F(x) = 1 − (1 − p)^(x+1)    x = 0, 1, . . .

the fitted cdf values are Ff = [0.377, 0.508, 0.612, 0.694, 0.694, 0.758, 0.758, 0.809]. Figure 10.9 is the P-P plot for this example. Since the plot does not follow a 45° line, the geometric distribution does not appear as a good fit to the data.

[Fig. 10.10 P-P plot for the Poisson example; scatter of Fs vs. Ff]

Example 10.24 (Poisson) Consider Example 10.14 with m = 10 samples on Poisson data where the parameter estimate is θ̂ = 0.90 and the sorted sample data is Xs = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]. The cdf of the sample data is Fs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]. Applying the Poisson probability with θ = 0.90, the probabilities of x are p(0) = 0.407, p(1) = 0.366, p(2) = 0.165, . . ., which yields the cdf for the fit as Ff = [0.407, 0.407, 0.407, 0.407, 0.773, 0.773, 0.773, 0.938, 0.938, 0.938].

Adjustment for Ties

In this example, there are (many) ties in the sample data, and special consideration is applied in the P-P plot. With ties in the sample (or empirical) data, the P-P plot only considers one entry for each unique value of the data. In this way, Xs = [0, 1, 2], Fs = [0.35, 0.65, 0.95], and Ff = [0.407, 0.773, 0.938]. Figure 10.10 is the P-P plot for the cdf values of the sample and the fit. Since Fs is similar to Ff, the fit seems appropriate.

Summary

Computer simulation models often include one or more random variables that play important roles in the model. Some of the random variables are of the continuous type and others are discrete. The analyst is confronted with choosing the proper probability distribution for each variable, and also with estimating the associated parameter value(s). The chapter describes some of the common ways to select the distribution and to estimate the associated parameter values when some empirical or sample data is available from the real system.
Chapter 11
Choosing the Probability Distribution
When No Data

Introduction

Sometimes the analyst has no data to measure the parameters of one or more of the input variables in a simulation model. When this occurs, the analyst is limited to a few distributions where the parameter(s) may be estimated without empirical or sample data. Instead of data, experts are consulted who give their judgment on various parameters of the distributions. This chapter explores some of the more common distributions where such expert opinions are useful. The distributions described here are continuous and are the following: continuous uniform, triangular, beta, lognormal and Weibull. The data provided by the experts is of the following type: minimum value, maximum value, most likely value, average value, and a p-quantile value.

Continuous Uniform

Recall the continuous uniform distribution, CU(a,b), with parameters a and b, where the variable x is equally likely to fall anywhere from a to b. The probability distribution is f(x) = 1/(b − a), and the cumulative distribution is F(x) = (x − a)/(b − a) for a ≤ x ≤ b. Sometimes the analyst wants to use this distribution but does not have data to estimate the parameters (a, b). Suppose expert(s) can help by providing their opinions on the statistics below.
    First assume the expert(s) can give the following two estimates on the distribution: â = an estimate of the minimum value of x, and x_α = an estimate of the α-quantile of x, where P[x ≤ x_α] = α. An estimate of the parameter b is needed to use the distribution in the simulation model. Using the estimates provided, the cumulative distribution becomes

    α = F(x_α) = (x_α − â)/(b − â),


and thereby the estimate of b is

    b̂ = â + (x_α − â)/α

Example 11.1 Suppose the analyst wants to use the continuous uniform distribution in the simulation model and has estimates of the minimum value of x as â = 10 and the 0.9-quantile as x₀.₉ = 15. With this information, the estimate of b becomes

    b̂ = 10 + (15 − 10)/0.90 = 15.56

So now, the distribution to use is

    f(x) = 1/5.56 for 10.00 ≤ x ≤ 15.56

Now assume the expert(s) can give the following two estimates on the distribution: b̂ = an estimate of the maximum value of x, and x_α = an estimate of the α-quantile of x, where P[x ≤ x_α] = α. An estimate of the parameter a is needed to use the distribution in the simulation model. Using the estimates provided, the cumulative distribution becomes

    α = F(x_α) = (x_α − a)/(b̂ − a),

and thereby, using b̂ and x_α, and some algebra, the estimate of the minimum parameter a is

    â = (x_α − αb̂)/(1 − α)

Example 11.2 Assume a situation where the simulation analyst wants to use the continuous uniform distribution and has estimates of b̂ = 16 and the 0.1-quantile x₀.₁ = 11. The estimate of a becomes

    â = (11 − 0.1 × 16)/(1 − 0.1) = 10.44

So now, the distribution to use is

    f(x) = 1/5.56 for 10.44 ≤ x ≤ 16.00

The following sections show how to apply the triangular, beta, Weibull and lognormal distributions in the simulation model when no data is available. See Law (2007), pages 370-375, for further discussion.

Triangular

Recall, the triangular distribution applies for a continuous variable x with three parameters, (a, b, x̃), where the range of x is from a to b, and the mode is denoted as x̃. When the analyst wants to use this distribution in a simulation model and has no empirical or sample data to estimate the three parameters, he/she may turn to one or more experts to gain estimates of the following type:

    â = an estimate of the minimum value of x
    b̂ = an estimate of the maximum value of x
    x̃ = an estimate of the most likely value of x

So now, the triangular distribution can be used with parameters (â, b̂, x̃).
    The associated standard triangular distribution, T(0, 1, x̃′), with variable x′, falls in the range from 0 to 1. The most likely value of x′ is the mode, denoted as x̃′. The mode of the standard triangular variable is computed from the corresponding triangular variable by x̃′ = (x̃ − â)/(b̂ − â).

Example 11.3 Suppose the analyst wants to use the continuous triangular distribution in the simulation model and from expert opinions has estimates of â = 10, b̂ = 60 and x̃ = 20. To apply the standard triangular, with variable x′, the estimate of the mode becomes

    x̃′ = (20 − 10)/(60 − 10) = 0.20

So the triangular distribution is T(10, 20, 60) and the associated standard triangular distribution is T(0, 1, 0.20).

Beta

Recall the beta distribution has two shape parameters (k₁, k₂), where k₁ > 0 and k₂ > 0, and takes on many shapes depending on the values of the parameters. The variable, denoted as x, lies within two limits (a and b), where a ≤ x ≤ b. The distribution takes on many shapes: it can skew to the right or to the left, the mode can be at either of the limit end points (a, b), and it includes various bathtub configurations as well as symmetrical and uniform shapes. These shapes depend on the values of the two parameters, k₁ and k₂. Perhaps the most common situations occur when k₂ > k₁ > 1, whereby the mode is greater than the low limit, a, and the distribution is skewed to the right. This is the distribution of interest in this chapter.

When the analyst wants to use this distribution in a simulation model and has no empirical or sample data to estimate the four parameters, he/she may turn to one or more experts who could provide estimates of the following type:

    â = an estimate of the minimum value of x
    b̂ = an estimate of the maximum value of x
    μ̂ = an estimate of the mean of x
    x̃ = an estimate of the most likely value of x

Recall, for the beta distribution, the mean and mode of x are computed from the parameters as follows:

    μ = â + [k₁/(k₁ + k₂)](b̂ − â)
    x̃ = â + [(k₁ − 1)/(k₁ + k₂ − 2)](b̂ − â)

Note there are two equations and two unknowns (k₁, k₂) when estimates of (â, b̂, μ̂, x̃) are given. Using algebra and the estimates provided, it is possible to estimate the unknown shape parameters, k₁ and k₂, with the two equations listed below:

    k̂₁ = [(μ̂ − â)(2x̃ − â − b̂)] / [(x̃ − μ̂)(b̂ − â)]
    k̂₂ = [(b̂ − μ̂)k̂₁] / (μ̂ − â)

So now the analyst can use the beta distribution with estimates of all four parameters, (k̂₁, k̂₂, â, b̂), in the simulation model.

Example 11.4 Assume a simulation model where the analyst wants to use the beta distribution but has no empirical or sample data to estimate the parameters. The analyst gets advice from an expert(s) who provides estimates of â = 10, b̂ = 60, μ̂ = 30 and x̃ = 20. Using the above equations, the estimates of the parameters become

    k̂₁ = [(30 − 10)(2 × 20 − 10 − 60)] / [(20 − 30)(60 − 10)] = 1.2
    k̂₂ = [(60 − 30) × 1.2] / (30 − 10) = 1.8

Hence, the beta distribution can now be applied with the parameters â = 10, b̂ = 60, k̂₁ = 1.2, and k̂₂ = 1.8.
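
A minimal Python sketch of these two equations is below; the helper name is hypothetical.

    def beta_from_expert(a, b, mean, mode):
        """Shape parameters (k1, k2) from expert estimates of the range,
        mean and mode of a beta variable (right-skewed case)."""
        k1 = (mean - a) * (2 * mode - a - b) / ((mode - mean) * (b - a))
        k2 = (b - mean) * k1 / (mean - a)
        return k1, k2

    print(beta_from_expert(10, 60, 30, 20))  # (1.2, 1.8)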
b

Lognormal

Suppose a variable x ≥ γ where γ is a location parameter of x. Now let x′ = x − γ, where x′ ≥ 0 and x′ is lognormal. The corresponding normal variable to x′ is y = ln(x′), where ln is the natural logarithm, and thereby x′ = e^y. The mean and variance of y are denoted as μ and σ², respectively, whereby y ~ N(μ, σ²) and x′ ~ LN(μ, σ²). Note also x = γ + x′ = γ + e^y. Assume the simulation analyst wants to apply the lognormal variable x in the simulation model but does not have any empirical or sample data to estimate the parameters. Instead, the analyst relies on expert(s) who are able to give the following estimates on the variable x:

    γ = an estimate of the location parameter of x
    x̃ = an estimate of the most likely value (mode) of x
    x_α = an estimate of the α-quantile value of x

Note, the modes of x′ and x are the following:

    x̃′ = e^(μ−σ²)
    x̃ = γ + e^(μ−σ²)

respectively. The α-quantile value of x becomes

    x_α = γ + e^(μ+z_α·σ)

where z ~ N(0,1) and P(z ≤ z_α) = α.
    Now note,

    μ − σ² = ln(x̃ − γ)
    μ + z_α·σ = ln(x_α − γ)

Applying some algebra,

    [−σ² − z_α·σ] = ln[(x̃ − γ)/(x_α − γ)] = c

Solving for σ via the quadratic equation,

    σ̂ = [−z_α + √(z_α² − 4c)]/2

The estimate of the mean of y is below:

    μ̂ = σ̂² + ln(x̃ − γ)

Finally, the analyst can now apply the lognormal distribution to the variable x using the parameters:

    γ = location parameter of x
    μ̂ = mean of y
    σ̂ = standard deviation of y

In essence, x ~ LN(γ, μ̂, σ̂²).

Example 11.5 A simulation model is being developed and the analyst wants to use the lognormal distribution but has no empirical or sample data to estimate the parameters. The analyst gets advice from an expert(s) who provides estimates of γ̂ = 100, x̃ = 200 and x₀.₉ = 800. Note α = 0.90 and z₀.₉₀ = 1.282. Using the above results, the estimates of the parameters become

    c = ln[(200 − 100)/(800 − 100)] = −1.946
    σ̂ = [−1.282 + √((1.282)² − 4(−1.946))]/2 = 0.894
    μ̂ = 0.894² + ln(200 − 100) = 5.404

Hence, the lognormal distribution can now be applied in the simulation model with the parameters γ̂ = 100, μ̂ = 5.404 and σ̂ = 0.894.
    A quick check to ensure the estimates are correct is to measure the mode and/or the α-quantile that were provided at the outset. The computations are below:

    x̃ = γ + e^(μ−σ²) = 100 + e^(5.404−0.894²) = 200
    x_α = γ + e^(μ+z_α·σ) = 100 + e^(5.404+1.282×0.894) = 800

Since the measures are the same as the specifications provided (x̃ = 200 and x₀.₉₀ = 800), the computations are accepted.
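
A minimal Python sketch of this procedure is below; the helper name is hypothetical, and z_α must be supplied (e.g., from Table A.1).

    import math

    def lognormal_from_expert(gamma, mode, x_alpha, z_alpha):
        """Estimates (mu, sigma) of y = ln(x - gamma) from expert values of
        the location, mode and an alpha-quantile of x."""
        c = math.log((mode - gamma) / (x_alpha - gamma))
        sigma = (-z_alpha + math.sqrt(z_alpha ** 2 - 4 * c)) / 2
        mu = sigma ** 2 + math.log(mode - gamma)
        return mu, sigma

    mu, sigma = lognormal_from_expert(100, 200, 800, 1.282)
    print(round(mu, 3), round(sigma, 3))  # about 5.404 and 0.894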

Weibull

Suppose a variable x ≥ γ where γ is a location parameter of x. Now let x′ = x − γ, where x′ ≥ 0 and x′ is Weibull distributed with parameters (k₁, k₂). Assume the simulation analyst wants to apply the Weibull distribution in the simulation model but does not have any empirical data to estimate the parameters. Instead, the analyst relies on expert(s) who are able to give the following estimates on the variable x:

    γ = an estimate of the location parameter of x
    x̃ = an estimate of the most likely value (mode) of x
    x_α = an estimate of the α-quantile value of x

When k₁ < 1, the mode of x′ is at x′ = 0. The analysis here is for when k₁ ≥ 1 and the mode of x′ is greater than zero. For this situation, the mode is measured as below:

    x̃′ = k₂[(k₁ − 1)/k₁]^(1/k₁)

The corresponding mode of x is

    x̃ = γ + k₂[(k₁ − 1)/k₁]^(1/k₁)

Using algebra, k₂ becomes

    k₂ = (x̃ − γ)/[(k₁ − 1)/k₁]^(1/k₁)

The cumulative distribution for the α-quantile is obtained by the following:

    F(x_α) = 1 − exp{−[(x_α − γ)/k₂]^k₁} = α

Hence,

    −ln(1 − α) = [(x_α − γ)/k₂]^k₁

Applying algebra and solving for k₂,

    k₂ = (x_α − γ)/{ln[1/(1 − α)]}^(1/k₁)

So now,

    (x̃ − γ)/[(k₁ − 1)/k₁]^(1/k₁) = (x_α − γ)/{ln[1/(1 − α)]}^(1/k₁)

whereby,

    (x̃ − γ)/(x_α − γ) = {(k₁ − 1)/[k₁·ln[1/(1 − α)]]}^(1/k₁)

Solving for k₁

Because estimates of x̃, γ and x_α are provided, along with α, the only unknown in the above equation is k₁. At this point, an iterative search is made to find the value of k₁ where the right-hand side of the above equation is equal to the left-hand side. The result is k̂₁.

Solving for k₂

Having found k̂₁, the other parameter, k₂, is now obtained from

    k̂₂ = (x̃ − γ)/[(k̂₁ − 1)/k̂₁]^(1/k̂₁)

Example 11.6 A simulation model is being developed and the analyst wants to use the Weibull distribution but has no empirical or sample data to estimate the parameters. The analyst gets advice from an expert(s) who provides estimates of γ̂ = 100, x̃ = 130 and x₀.₉ = 500. Note α = 0.90. To find the estimate of k₁, the following computations are needed to begin the iterative search:

    (x̃ − γ)/(x_α − γ) = (130 − 100)/(500 − 100) = 0.075

    {(k₁ − 1)/[k₁·ln[1/(1 − α)]]}^(1/k₁) = {(k₁ − 1)/[k₁·ln[1/(1 − 0.9)]]}^(1/k₁)
                                        = {(k₁ − 1)/[k₁ × 2.302]}^(1/k₁)

Note the left-hand side (LHS) of the equation below. An iterative search of k₁ is now followed until the LHS is near to 0.075:

    LHS = {(k₁ − 1)/[k₁ × 2.302]}^(1/k₁) = 0.075

The search for k₁ begins with k₁ = 2.00 and continues until k₁ = 1.14:

    At k₁ = 2.00, LHS = 0.46
    At k₁ = 1.50, LHS = 0.26
    At k₁ = 1.20, LHS = 0.11
    At k₁ = 1.14, LHS = 0.075

Hence, k̂₁ = 1.14.
So now, the estimate of k₂ is the following:

    k̂₂ = (x̃ − γ)/[(k̂₁ − 1)/k̂₁]^(1/k̂₁)
       = (130 − 100)/[(1.14 − 1.00)/1.14]^(1/1.14)
       = 188.9

Finally, the estimates of the parameters are (k̂₁ = 1.14, k̂₂ = 188.9).
    A quick check to ensure the estimates are correct requires measuring the mode and/or the α-quantile and comparing the measures with those that were provided at the outset. The computations are below:

    x̃ = γ + k₂[(k₁ − 1)/k₁]^(1/k₁)
      = 100 + 188.9[(1.14 − 1.00)/1.14]^(1/1.14)
      = 130

    x_α = γ + k₂{ln[1/(1 − α)]}^(1/k₁)
        = 100 + 188.9{ln[1/(1 − 0.90)]}^(1/1.14)
        = 492.5

Since the above measures are sufficiently near the data provided (x̃ = 130 and x₀.₉₀ = 500), the parameters estimated are accepted.
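
A minimal Python sketch of the search is below. The helper name is hypothetical, and a bisection over an assumed range 1 < k₁ ≤ 10 replaces the manual stepping shown above; LHS(k₁) increases with k₁ on this range, so the bisection converges to the same root.

    import math

    def weibull_from_expert(gamma, mode, x_alpha, alpha):
        """Estimates (k1, k2) from expert values of the location, mode and an
        alpha-quantile; k1 is found by bisection on the mode/quantile ratio."""
        target = (mode - gamma) / (x_alpha - gamma)
        ln_term = math.log(1.0 / (1.0 - alpha))

        def lhs(k1):
            return ((k1 - 1.0) / (k1 * ln_term)) ** (1.0 / k1)

        lo, hi = 1.0001, 10.0          # assumed bracket: 1 < k1 <= 10
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if lhs(mid) < target:
                lo = mid               # LHS too small: k1 must be larger
            else:
                hi = mid
        k1 = (lo + hi) / 2.0
        k2 = (mode - gamma) / (((k1 - 1.0) / k1) ** (1.0 / k1))
        return k1, k2

    print(weibull_from_expert(100, 130, 500, 0.90))  # about (1.14, 188.9)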

Summary

Sometimes the analyst may need to develop a computer simulation model that includes one or more variables where no empirical or sample data is available. This is where he/she seeks opinions from one or more experts who give some estimates on the characteristics of the variable. The chapter pertains to these situations and shows some of the common ways to select the probability distribution and estimate the associated parameters.
Appendix A

Table A.1 Measures from the standard normal distribution

    F(z)      z        F(z)      z
    0.010   −2.327     0.510    0.025
    0.020   −2.054     0.520    0.050
    0.030   −1.881     0.530    0.075
    0.040   −1.751     0.540    0.100
    0.050   −1.645     0.550    0.125
    0.060   −1.555     0.560    0.151
    0.070   −1.476     0.570    0.176
    0.080   −1.405     0.580    0.202
    0.090   −1.341     0.590    0.227
    0.100   −1.282     0.600    0.253
    0.110   −1.227     0.610    0.279
    0.120   −1.175     0.620    0.305
    0.130   −1.126     0.630    0.331
    0.140   −1.080     0.640    0.358
    0.150   −1.036     0.650    0.385
    0.160   −0.994     0.660    0.412
    0.170   −0.954     0.670    0.439
    0.180   −0.915     0.680    0.467
    0.190   −0.878     0.690    0.495
    0.200   −0.841     0.700    0.524
    0.210   −0.806     0.710    0.553
    0.220   −0.772     0.720    0.582
    0.230   −0.739     0.730    0.612
    0.240   −0.706     0.740    0.643
    0.250   −0.674     0.750    0.674
    0.260   −0.643     0.760    0.706
    0.270   −0.612     0.770    0.739
    0.280   −0.582     0.780    0.772
    0.290   −0.553     0.790    0.806
    0.300   −0.524     0.800    0.841
    (continued)


Table A.1 (continued)

    F(z)      z        F(z)      z
    0.310   −0.495     0.810    0.878
    0.320   −0.467     0.820    0.915
    0.330   −0.439     0.830    0.954
    0.340   −0.412     0.840    0.994
    0.350   −0.385     0.850    1.036
    0.360   −0.358     0.860    1.080
    0.370   −0.331     0.870    1.126
    0.380   −0.305     0.880    1.175
    0.390   −0.279     0.890    1.227
    0.400   −0.253     0.900    1.282
    0.410   −0.227     0.910    1.341
    0.420   −0.202     0.920    1.405
    0.430   −0.176     0.930    1.476
    0.440   −0.151     0.940    1.555
    0.450   −0.125     0.950    1.645
    0.460   −0.100     0.960    1.751
    0.470   −0.075     0.970    1.881
    0.480   −0.050     0.980    2.054
    0.490   −0.025     0.990    2.327
    0.500    0.000

Table A.2 Probability distributions, random variables, notation and parameters

Continuous distributions:
    Standard uniform        u ~ U(0,1)
    Continuous uniform      x ~ CU(a,b)
    Exponential             x ~ Exp(θ)
    Erlang                  x ~ Erl(k,θ)
    Gamma                   x ~ Gam(k,θ)
    Beta                    x ~ Beta(k1,k2,a,b)
    Weibull                 x ~ We(k1,k2,γ)
    Normal                  x ~ N(μ,σ²)
    Lognormal               x ~ LN(μy,σy²)
    Triangular              x ~ TR(a,b,x̃)

Discrete distributions:
    Discrete uniform        x ~ DU(a,b)
    Bernoulli               x ~ Be(p)
    Binomial                x ~ Bin(n,p)
    Geometric               x ~ Ge(p)
    Pascal                  x ~ Pa(k,p)
    Hypergeometric          x ~ HG(n,N,D)
    Poisson                 x ~ Po(θ)

Multivariate distributions:
    Multivariate arbitrary       x1,. . .,xk ~ MA(p1. . .k)
    Multinomial                  x1,. . .,xk ~ MN(n,p1,. . .,pk)
    Multivariate hypergeometric  x1,. . .,xk ~ MHG(n,N,D1,. . .,Dk)
    Bivariate normal             x1, x2 ~ BVN(μ1,μ2,σ1,σ2,ρ)
    Bivariate lognormal          x1, x2 ~ BVLN(μy1,μy2,σy1,σy2,ρy)
    Multivariate normal          x1,. . .,xk ~ MVN(μ,Σ)
    Multivariate lognormal       x1,. . .,xk ~ MVLN(μy,Σy)

Table A.3 Continuous uniform u ~ U(0,1) random variates


0.3650 0.4899 0.1557 0.4745 0.2573 0.6288 0.5421 0.1563
0.5061 0.3905 0.1074 0.7840 0.4596 0.7537 0.5961 0.8327
0.0740 0.1055 0.3317 0.1282 0.0002 0.5368 0.6571 0.5440
0.1919 0.6789 0.4542 0.3570 0.1500 0.7044 0.9288 0.5302
0.4018 0.4619 0.4922 0.2076 0.3297 0.0954 0.5898 0.1699
0.4439 0.2729 0.8725 0.7507 0.2729 0.6736 0.2566 0.0899
0.7901 0.2973 0.2353 0.4805 0.2546 0.3406 0.0449 0.4824
0.5886 0.7549 0.9279 0.3310 0.5429 0.0807 0.6344 0.4100
0.9234 0.6202 0.3477 0.1492 0.4800 0.2194 0.9937 0.1304
0.5477 0.9230 0.5382 0.4064 0.8472 0.8262 0.6724 0.7219
0.4952 0.4130 0.6953 0.1791 0.4229 0.5432 0.8147 0.5409
0.2278 0.6192 0.4898 0.6808 0.8866 0.3705 0.3025 0.2929
0.2233 0.5845 0.3635 0.8760 0.4780 0.1906 0.6841 0.7474
0.1617 0.8078 0.2026 0.9568 0.0659 0.0615 0.7932 0.3796
0.1155 0.1738 0.0481 0.7148 0.5330 0.5610 0.2167 0.4680
0.3989 0.9031 0.7460 0.0886 0.6346 0.7130 0.0157 0.4311
0.9854 0.8026 0.6961 0.4176 0.7345 0.2772 0.3566 0.4335
0.6460 0.3478 0.1044 0.1854 0.0777 0.4328 0.9593 0.5420
0.2178 0.3790 0.3958 0.2815 0.5034 0.1387 0.5173 0.9654
0.6573 0.4411 0.6930 0.0645 0.7561 0.7005 0.4971 0.1554
0.7845 0.0503 0.5180 0.7570 0.8007 0.3252 0.9727 0.8043
0.8758 0.4166 0.1231 0.9542 0.7973 0.6963 0.4016 0.0163
0.5097 0.4061 0.1061 0.2761 0.6430 0.8491 0.4980 0.1878
0.3236 0.7708 0.2180 0.4470 0.2360 0.8784 0.6104 0.3744
0.5859 0.9316 0.5172 0.3303 0.8685 0.2591 0.2595 0.1787
0.7423 0.8409 0.2786 0.7030 0.4049 0.8116 0.7418 0.4377
0.3394 0.7106 0.3123 0.7988 0.1518 0.5930 0.9562 0.2431
0.9843 0.6330 0.5989 0.9026 0.5749 0.2452 0.8602 0.0750
0.2457 0.3786 0.3972 0.5266 0.2704 0.5812 0.2097 0.0787
0.6524 0.9003 0.2316 0.9499 0.8462 0.4412 0.4920 0.7695
0.1955 0.3262 0.4132 0.1527 0.6198 0.0994 0.2050 0.6925
0.9914 0.4714 0.0040 0.4258 0.2887 0.7525 0.8913 0.8219
0.0103 0.1517 0.3774 0.1881 0.9795 0.8721 0.5815 0.7294
0.0282 0.8279 0.7834 0.7912 0.3327 0.4509 0.5551 0.8033
0.2076 0.3647 0.5735 0.3442 0.5282 0.4255 0.5730 0.0500
0.9627 0.9331 0.9926 0.8396 0.4093 0.8053 0.9894 0.2584
0.0170 0.3391 0.6925 0.1104 0.1097 0.2906 0.3989 0.5590
0.8079 0.3096 0.3758 0.4010 0.8414 0.4096 0.7246 0.6588
0.6456 0.5161 0.2233 0.5828 0.7485 0.4565 0.9044 0.2830
0.2814 0.3681 0.0142 0.2947 0.9840 0.7613 0.5809 0.6057
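A table of u ~ U(0,1) variates like Table A.3 is the output of a uniform random number generator. The text does not state which generator produced the table, so the sketch below is only an illustration; it uses the multiplicative congruential constants a = 16807 and m = 2^31 − 1 of Lewis, Goodman and Miller (1969), cited in the References.

def lcg_uniforms(seed, n, a=16807, m=2**31 - 1):
    w, out = seed, []
    for _ in range(n):
        w = (a * w) % m          # w(i) = a * w(i-1) mod m
        out.append(w / m)        # scale to u ~ U(0,1)
    return out

print(lcg_uniforms(seed=12345, n=4))     # four u ~ U(0,1) variates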

Table A.4 Standard normal random variates z ~ N(0,1)


0.058 1.167 0.948 0.173 1.114 0.823 1.163 0.847
1.843 0.465 0.503 2.375 0.416 0.291 1.131 0.533
0.369 0.207 1.338 0.367 0.019 1.885 1.382 0.321
0.776 0.003 0.168 0.817 0.380 0.852 0.026 1.273
0.567 0.303 0.302 2.234 0.068 1.506 0.891 0.292
0.460 1.049 0.086 0.627 0.922 1.526 0.500 1.494
1.204 1.251 0.585 0.822 1.785 0.661 0.817 1.110
1.005 1.529 0.219 0.506 0.662 0.638 1.243 0.528
0.636 1.845 2.548 1.332 0.587 0.320 0.385 0.674
1.311 0.065 0.183 0.841 1.055 0.282 1.208 0.364
0.320 0.114 0.752 0.091 0.614 0.174 0.736 1.151
1.169 1.373 0.067 0.288 0.553 0.746 1.651 0.127
2.710 1.830 0.061 0.102 1.228 1.074 1.635 0.383
0.019 0.044 0.580 0.596 2.391 1.648 0.382 0.701
1.580 0.992 0.465 0.452 0.840 2.280 0.237 0.182
1.329 1.847 0.599 0.213 1.323 0.629 0.030 0.447
1.022 1.652 1.785 0.840 0.771 1.062 0.425 0.253
0.212 0.098 1.578 1.564 2.791 0.890 1.356 1.868
1.049 0.556 0.350 1.569 0.482 0.604 0.524 0.486
0.122 2.494 0.842 0.630 1.341 0.364 1.270 0.139
0.545 1.334 0.614 1.533 0.966 0.020 0.938 0.312
0.512 0.867 1.187 0.313 0.480 0.069 0.045 0.720
0.193 0.386 0.030 0.472 1.273 0.230 0.357 0.471
0.836 1.022 0.288 2.560 0.125 1.392 0.255 1.256
1.784 0.587 1.051 0.648 1.813 0.322 0.280 1.066
0.547 1.636 0.219 0.409 1.953 1.191 0.688 1.230
0.477 0.120 0.869 0.199 0.270 1.595 0.745 0.324
1.096 0.362 1.561 0.843 0.301 0.478 1.170 0.473
2.071 0.791 1.278 0.672 1.145 1.655 0.173 0.505
0.061 1.781 0.265 1.101 1.535 2.265 0.219 0.771
0.526 0.385 0.278 0.762 0.514 0.132 0.456 0.244
0.527 0.138 1.715 1.463 1.007 1.651 0.099 1.421
1.220 0.651 1.251 1.132 1.338 0.462 2.048 0.369
0.315 0.677 0.425 1.238 1.432 0.527 0.077 2.720
1.036 0.195 0.095 0.787 0.251 0.577 0.401 0.666
0.754 1.024 1.087 0.073 0.672 1.405 3.332 0.964
0.288 0.456 1.264 0.685 0.234 0.049 0.032 1.068
1.299 0.699 0.775 0.232 1.773 0.352 1.175 0.451
0.417 0.995 0.791 1.750 1.436 1.364 0.797 0.036
1.402 0.500 0.409 0.858 0.322 0.407 1.502 0.523
0.031 2.155 0.615 0.612 1.195 0.519 1.559 1.558
0.483 0.223 1.511 0.493 0.773 0.116 0.349 1.661
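Variates like those of Table A.4 can be generated from pairs of uniform variates with the sine-cosine (Box-Muller) method of Chapter 4. A short sketch follows; the two uniforms shown are simply the first two entries of Table A.3, not a claim about how Table A.4 itself was built.

import math

def sine_cosine(u1, u2):
    # two independent u ~ U(0,1) in, two independent z ~ N(0,1) out
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

z1, z2 = sine_cosine(0.3650, 0.4899)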

Table A.5 Standard exponential random variates when E(x) = 1.00


2.247 2.012 0.485 3.367 0.699 4.144 0.592 1.51
2.118 0.866 1.978 1.659 1.46 2.577 0.052 1.742
0.054 2.334 4.419 0.304 0.776 0.42 0.24 0.225
0.882 1.945 0.092 0.224 1.005 0.617 0.337 1.793
0.506 1.569 1.639 0.936 0.44 0.881 1.081 0.875
0.061 0.164 0.38 0.572 0.551 1.941 0.058 2.276
1.516 0.037 0.147 1.106 0.305 1.847 0.236 0.256
0.933 0.106 2.865 0.482 0.427 0.041 2.209 0.68
0.289 0.808 0.045 1.683 0.635 0.789 0.473 0.159
0.236 0.39 0.551 1.352 0.126 0.572 1.741 0.852
0.073 0.978 0.698 1.166 0.434 1.002 0.175 2.371
0.179 0.691 0.034 1.377 1.837 0.349 0.547 0.722
3.754 0.098 1.516 0.045 3.378 8.186 1.167 1.164
3.402 1.187 2.164 0.136 0.163 0.315 0.032 0.392
1.207 1.777 0.091 1.706 1.674 0.226 0.11 0.978
0.619 2.865 0.214 0.354 0.933 0.283 0.375 0.098
1.323 0.558 0.615 0.617 1.94 0.53 1.227 0.472
0.258 0.072 1.562 1.729 0.404 0.545 5.107 1.201
0.415 0.165 2.362 0.715 1.553 0.954 0.424 3.283
0.247 1.487 2.243 0.121 2.342 0.972 2.208 1.414
1.738 0.166 1.576 0.56 0.785 0.074 0.058 1.804
1.069 3.084 0.128 0.485 0.679 2.303 0.186 1.745
0.048 2.424 0.675 1.11 0.068 0.014 3.12 0.236
0.186 0.339 1.387 1.923 0.296 0.427 2.025 1.956
0.21 2.888 0.299 0.018 0.389 0.917 0.476 1.028
0.204 0.059 0.282 0.221 3.73 0.195 1.067 2.397
1.474 1.421 0.493 0.169 0.846 0.128 0.317 1.945
0.835 0.264 0.016 0.295 3.046 0.057 1.916 0.084
2.436 0.055 0.912 0.692 2.082 0.589 1.776 1.751
0.709 1.572 0.563 3.776 0.153 1.117 1.18 1.322
1.918 0.19 1.091 0.323 2.044 1.323 0.502 0.103
1.119 0.371 0.091 1.002 0.796 0.866 1.837 2.713
1.259 1.665 0.386 1.891 1.505 0.482 0.878 0.463
0.884 1.49 0.395 0.305 2.009 0.3 0.173 0.175
0.742 3.262 0.501 0.353 1.115 0.263 1.18 0.398
0.319 1.123 6.492 0.478 0.455 0.612 1.27 0.295
0.624 0.134 0.222 0.281 5.387 0.016 0.397 1.244
0.161 0.39 0.648 2.583 0.04 1.441 0.077 0.994
0.519 1.427 0.191 2.758 0.392 0.072 0.356 1.05
0.178 0.571 2.612 0.629 4.506 0.565 1.453 1.249
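Table A.5 follows from the inverse transform method of Chapter 3: if u ~ U(0,1), then x = −ln(1 − u) is exponential with E(x) = 1.00, and multiplying by the desired mean rescales it. A minimal sketch:

import math

def exponential_variate(u, mean=1.0):
    return -mean * math.log(1.0 - u)     # inverse of F(x) = 1 - exp(-x/mean)

x = exponential_variate(0.3650)              # a standard exponential variate
x5 = exponential_variate(0.3650, mean=5.0)   # about 2.271, matching Solution 4.2 in Appendix C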
Appendix B

Problems

In solving the problems, the student will occasionally need one or more random variates of the uniform type, u ~ U(0,1), or of the standard normal type, z ~ N(0,1). These are provided in Appendix A: Table A.3 for u ~ U(0,1) and Table A.4 for z ~ N(0,1). On each problem, the student should begin at the first row and first column to retrieve a random variate, then take the first row and second column, and so forth. Hence, for u ~ U(0,1) the variates are 0.3650, 0.4899, and so forth; for z ~ N(0,1) they are 0.058, 1.167, and so forth.

Chapter 2

2.1 Using the Linear Congruent method with parameters m = 16, a = 5, b = 1 and the seed w0 = 7, list the next 16 entries of wi for i = 1 to 16.
2.2 Use the Linear Congruent method with parameters a = 20, b = 0, m = 64 and w0 = 3 to generate the next three entries of w.
2.3 Use the results of Problem 2.2 and list the three entries of the corresponding uniform, u, distribution.

Chapter 3

3.1 The variable x is continuous with probability density f(x) = (3/8)x² for 0 ≤ x ≤ 2. Use the inverse transform method to generate a random variate of x.


3.2 The variable x is discrete with probability distribution:

x      −1     0      1      2      3      4
p(x)   0.10   0.20   0.30   0.20   0.15   0.05

Use the inverse transform method to generate a random variate of x.


3.3 The variable x is continuous with probability density f(x) = (3/8)x² for 0 ≤ x ≤ 2. Use the Accept-Reject method to generate a random variate of x.
3.4 The variable x is continuous with probability density f(x) = (3/8)x² for 0 ≤ x ≤ 2. Use the inverse transform method to generate a random variate of x that is restricted to lie within 1 and 2.
3.5 The variable x is continuous with probability density f(x) = (3/8)x² for 0 ≤ x ≤ 2. Use the inverse transform method to generate a random variate of y = min(x1, . . ., x8).
3.6 The variable x is continuous with probability density f(x) = (3/8)x² for 0 ≤ x ≤ 2. Use the inverse transform method to generate a random variate of y = max(x1, . . ., x8).
3.7 The variable x is composed of three distributions, f1(x), f2(x) and f3(x), with probabilities 0.5, 0.3, 0.2, respectively: f1(x) = 0.1 for 0 < x < 10, f2(x) = 0.02x for 0 < x < 10, and f3(x) = 0.003x² for 0 < x < 10. Generate one random variate of x.
3.8 The variable x is continuous with density f(x) = 0.25x³ for 0 < x < 2. Generate a random variate of y = x1 + x2 + x3.
3.9 The variable x is triangular with a minimum value of −10, mode of 0 and maximum of 20. Generate a random variate of x.
3.10 The variable x is continuous with empirical data of: 5, 8, 12, 20, 25, 3, 6, 10, 15. Generate one random variate of x using the composition method.
3.11 The variable x is continuous with grouped empirical data as follows:

[a, b)       Frequency
[0, 10)      36
[10, 20)     10
[20, 30)     4

Generate one random variate of x using the composition method.

Chapter 4

4.1 The variable x is a continuous uniform for 10 < x < 30. Use the inverse transform method to generate a random variate of x.
4.2 The variable x is exponential with E(x) = 5. Generate a random variate of x.
4.3 The variable x is Erlang with k = 5 and E(x) = 20. Generate a random variate of x.

4.4 The variable x is Gamma with sample data of x̄ = 5.0 and s² = 10.0. Generate a random variate of x.
4.5 The variable x is Gamma with sample data of x̄ = 1.0 and s² = 10.0. Generate a random variate of x.
4.6 The variable x is Beta with parameters (k1, k2) = (10, 8) and (a, b) = (10, 90). Two random Gammas are g1 = 13 for (k1, k2) = (10, 1) and g2 = 20 for (k1, k2) = (8, 1). Find the random variate for the Beta.
4.7 The variable x is Weibull with parameters (k1, k2) = (2, 20). Generate a random variate of x.
4.8 The variable x is Normal with mean = 100 and variance = 100. Use the Sine-Cosine method to generate a random variate of x.
4.9 The variable x is Lognormal with μx = 5 and σx = 100. Generate a random variate of x.
4.10 Generate a random variate for a chi-square variable with degrees of freedom = 5.
4.11 Generate a random variate for a chi-square variable with degrees of freedom = 153.
4.12 Generate a random variate for a student's t with degrees of freedom = 5. Get z first.
4.13 Generate a random variate for an F distribution with degrees of freedom 5 and 2. Get the chi-square (df = 5) first.

Chapter 5

5.1 The variable x is discrete with the following probability distribution.

x      −1     0      1      2
P(x)   0.50   0.30   0.15   0.05

Generate one random variate of x.

5.2 The variable x is from a discrete uniform distribution where x ranges from −5 to 5. Generate a random variate of x.
5.3 Generate a random variate of x for a Bernoulli with p = 0.30.
5.4 Generate a random variate of x for a Binomial with n = 10 and p = 0.30.
5.5 Generate a random variate of x for a Binomial with n = 500 and p = 0.20.
5.6 Generate a random variate of x for a Binomial with n = 500 and p = 0.001.
5.7 Generate a random variate of x for a Hyper Geometric with N = 20, D = 5 and n = 2.
5.8 Generate a random variate of x for a Geometric with p = 0.30. Note, x is the number of trials till the first success.
5.9 Generate a random variate for a Pascal with p = 0.30 and k = 5. Note, x is the number of trials till five successes.
5.10 Generate a random variate for a Poisson with E(x) = 1.5.

Chapter 6

6.1 The variables x1, x2, x3, x4 are jointly related by the following probabilities:

            x1:    0           1
            x2:    1     2     1     2
x3  x4
0   3             .01   .09   .02   .10
0   4             .03   .01   .04   .02
1   3             .05   .06   .06   .09
1   4             .07   .12   .08   .15

Generate one set of random variates for x1, x2, x3, x4.
6.3 Consider the multinomial distribution with k = 4, p1 = 0.4, p2 = 0.3, p3 = 0.2, p4 = 0.1 and n = 5. Generate one set of random variates for x1, x2, x3, x4.
6.4 Consider x1, x2 that are related by the bivariate normal distribution with μ1 = 1.0, μ2 = 0.8, σ1 = 0.2, σ2 = 0.1 and ρ = 0.5. Generate one random variate set of x1, x2.
6.6 Consider x1, x2 that are related by the bivariate lognormal distribution with parameters μy1 = 20, μy2 = 2, σy1 = 2, σy2 = 0.5 and ρy = 0.8, where yi = ln(xi) for i = 1, 2. Generate one random variate set of x1, x2.
6.8 Generate the Cholesky matrix from the variance-covariance matrix:

    16   4   2
     4   4   1
     2   1   1

6.9 The variables x1, x2, x3 are from a multivariate normal distribution with the following matrices. Generate one random set of x1, x2, x3:

    μ = (20, 6, 10)

    C = 4     0     0
        1     3     0
        0.5   0.5   0.5

6.10 The variables x1, x2, x3 are from a multivariate lognormal distribution with the following matrices from the transformed values of y1, y2, y3. Generate one random set of x1, x2, x3:

    μy = (2.0, 0.6, 1.0)

    Cy = 4     0     0
         1     3     0
         0.5   0.5   0.5
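Problems 6.8–6.10 exercise the Cholesky route of Chapter 6: factor the variance-covariance matrix S into a lower-triangular C with CC' = S, then form x = μ + Cz from independent z ~ N(0,1) variates. A sketch, assuming NumPy is available (Problem 6.9 supplies its C matrix directly, so only the last two steps are needed there):

import numpy as np

S = np.array([[16.0, 4.0, 2.0],          # variance-covariance matrix of Problem 6.8
              [ 4.0, 4.0, 1.0],
              [ 2.0, 1.0, 1.0]])
C = np.linalg.cholesky(S)                # lower triangular; its first column is 4, 1, 0.5

mu = np.array([20.0, 6.0, 10.0])         # mean vector as in Problem 6.9
z = np.random.standard_normal(3)         # independent z ~ N(0,1)
x = mu + C @ z                           # one multivariate normal random vector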

Chapter 7

7.1 Consider a Poisson process where A(j) = arrival rate and B(j) = time, for j = 1, 2, 3, 4:

j       1   2   3   4   ...
A(j)    2   3   5   4
B(j)    0   1   2   3

Generate the random times of the first three arrivals.

7.2 Generate the batch size, x, for an arrival where x = y + 1 and y is Poisson distributed with E(y) = 2.4.
7.3 Generate a random variate, x, that is the time to fail for a unit that has four active redundant units, with times denoted as y that are Exponential with E(y) = 100.
7.5 Generate a random variate, x, that is the time to fail for a unit that has four standby redundant units, with times denoted as y that are Exponential with E(y) = 100.
7.7 From the integers 1–20, generate a random sequence of n = 5 of them without replacement.
7.8 From a deck of N = 52 regular cards, generate a random hand for a player who will receive five of the cards. Use the same index to identify cards as listed in the text.

Chapter 9

9.2 Simulation results of n = 11 runs yield x̄ = 100 and s = 18. Compute the 0.95 confidence limits for the true mean. Note, t10,0.025 = 2.228.
9.3 In Problem 9.2, the analyst wants (U − L) = 5.0. How many more runs are needed?
9.4 Simulation results of n = 40 runs yield w = 8 units with an attribute. Find the 0.95 confidence limits on the true proportion of units with the attribute.
9.5 In Problem 9.4, the analyst wants (U − L) = 0.04. How many more runs are needed?
9.6 Consider a machine shop where an order of No = 15 is needed. A simulation is run where the number of units started is Ns = 20, and after n = 1,000 simulation runs, Ng = 958 is the number of good units. The management wants to be 95% certain that the number of good units will exceed No. Find the 0.95 confidence interval on the probability that Ng will be equal to or larger than No.
9.7 For Problem 9.6, how many more runs, if any, are needed so that the 0.95 confidence interval (from L to U) is always above the 0.95 specification mark?

9.9 Suppose two options are run in a simulation with results of n1 = 20, x̄1 = 100 and s1 = 30 for option 1, and n2 = 20, x̄2 = 95 and s2 = 25 for option 2. Assuming the two variances are the same, find the 0.95 confidence limits on the difference between the two means.
9.10 A simulation is run with four options (i) and four observations (j) of each, with results in the table below. Use the one-way analysis-of-variance method to determine if the means of the four options are all the same at the 0.05 significance level. Note F3,12,0.05 = 3.49.

               Observations (j)
Options (i)     1    2    3    4
     1          5    7    6    4
     2          3    5    6    2
     3          8    9    7    8
     4          6    4    4    6

9.12 Using the results of Problem 9.10, compute the 0.95 confidence interval on the difference between each pair of means, and label any that are significantly different. Note, t0.025,12 = 2.179.
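The variable-type confidence intervals called for in Problems 9.2–9.12 all take the form estimate ± t × standard error. A small sketch using Problem 9.2's numbers (the t-value is the one quoted in the problem):

import math

def mean_ci(xbar, s, n, t):
    half = t * s / math.sqrt(n)          # t times the standard error of the mean
    return xbar - half, xbar + half

L, U = mean_ci(xbar=100, s=18, n=11, t=2.228)
print(round(L, 2), round(U, 2))          # about (87.91, 112.09); compare Solution 9.2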

Chapter 10

10.2 A sample of n = 10 data entries is the following: (10, 13, 9, 7, 8, 12, 15, 10, 3, 8). Compute the following: x(1), x(n), x̄, s, cov, t.
10.3 From the data of Problem 10.2, compute the estimate of the location parameter, γ.
10.4 Consider the (n = 15) sample data from a continuous uniform distribution: (1.3, 1.4, 1.8, 2.3, 2.4, 2.5, 2.9, 3.1, 3.4, 3.9, 4.1, 4.7, 5.2, 5.7, 6.1). Find the maximum likelihood estimates of the min and max parameters, (a, b). Now find the method-of-moments estimators for a and b.
10.5 Suppose the n = 12 sample data entries are the following: (10.4, 12.3, 13.5, 14.6, 15.1, 15.8, 16.2, 16.5, 17.3, 16.3, 15.1, 19.4). Assuming the data are normally distributed, estimate the parameters of the mean and standard deviation.
10.6 From the n = 8 sample data (0.7, 1.2, 1.8, 2.4, 4.0, 10.3, 0.9, 1.4), estimate the parameter for an exponential distribution.
10.7 Assume the lognormal variable x, with sample data: (10, 12, 15, 23, 40, 90, 217). Estimate the parameters for this distribution.
10.8 Suppose the variable x is assumed gamma distributed and a sample of n = 50 yields a sample average of 33 and a sample variance of 342. Estimate the parameters for the gamma distribution.

10.9 Suppose the variable x is assumed beta distributed and estimates of the following are given: mean = 60, mode = 80, min = 0, max = 100. Estimate the parameters for the beta distribution.
10.10 The following (n = 11) sample data are: (3, 3, 5, 7, 8, 8, 11, 12, 13, 15, 16). Assuming the data come from a discrete uniform distribution, find the maximum likelihood estimates of the min and max (a, b). Now estimate the parameters using the method of moments.
10.11 Suppose the variable x is from a binomial distribution with n = 10, p unknown, and m = 8 samples of x are (3, 2, 1, 5, 3, 4, 3, 2). Find the maximum likelihood estimate of p.
10.12 Suppose the variable x is from a geometric distribution where p is unknown, and m = 6 samples of x are (3, 5, 8, 4, 7, 3). Find the maximum likelihood estimate of p. Recall, x = number of failures till a success.
10.13 Suppose the variable x is from a Pascal distribution with k = 3, p unknown, and m = 10 samples of x are (8, 9, 9, 12, 10, 13, 10, 12, 11, 13). Find the maximum likelihood estimate of p. Recall, x = number of failures till three successes.
10.14 Suppose the variable x is from a Poisson distribution where the parameter is unknown, and m = 7 samples of x are (3, 5, 2, 7, 4, 4, 1). Find the maximum likelihood estimate of the parameter.
10.15 Consider the (n = 15) sample data from Problem 10.4 and the maximum likelihood estimates of the parameters. Assuming the continuous uniform distribution, list the vectors Xs and Xf for the Q-Q plot.
10.16 Consider the (n = 11) sample data from Problem 10.10 and the maximum likelihood estimates of the parameters. Assuming the discrete uniform distribution pertains, list the vectors Fs and Ff for the P-P plot.

Chapter 11

11.1 Suppose the variable x is from a continuous uniform distribution and an expert estimates the minimum value is 50 and the 0.75-quantile is 90. Estimate the parameters for this distribution.
11.2 Suppose the variable x is from a continuous uniform distribution and an expert estimates the maximum value is 100 and the 0.20-quantile is 40. Estimate the parameters for this distribution.
11.3 Assume a variable x from the triangular distribution where an expert estimates the following: min = 5, most likely = 20 and max = 30. Estimate the parameters for the standard triangular distribution.
11.4 Assume a variable x from the beta distribution where an expert estimates the following: min = 5, mean = 18, most likely = 20 and max = 30. Estimate the parameters for the beta distribution.

11.5 Assume a variable x from the lognormal distribution where an expert estimates the following: min = 0, most likely = 20 and the 0.95-quantile = 100. Estimate the parameters for the lognormal distribution.
11.6 Assume a variable x from the Weibull distribution where an expert estimates the following: min = 0, most likely = 20 and the 0.95-quantile = 100. Estimate the parameters for the Weibull distribution.
Appendix C

Solutions

2.1 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7
2.2 60, 48, 0
2.3 0.9375, 0.750, 0.000
3.1 1.429
3.2 1
3.3 1.084
3.4 1.526
3.5 1.260
3.6 1.918
3.7 4.899
3.8 4.484
3.9 0.500
3.10 6.978
3.11 4.899
4.1 4.600
4.2 2.271
4.3 8.948
4.4 4.230
4.5 0.00058
4.6 41.515
4.7 13.478
4.8 114.18
4.9 1.283
4.10 3.535
4.11 151.98
4.12 0.063
4.13 0.697
5.1 −1
5.2 −1
(continued)


5.3 1
5.4 3
5.5 99
5.6 1
5.7 0
5.8 2
5.9 8
5.10 3
6.1 0,2,0,4
6.3 3,0,2,0
6.4 19.884, 2.326
6.6 2.686, 2.454
6.8 C = 4      0      0
        1      1.732  0
        0.5    0.289  0.666
6.9 X = 19.768
        9.443
        11.028
7.1 0.908, 2.865, 3.562
7.2 x = 6
7.3 67.14
7.5 193.76
7.7 8, 11, 3, 12, 6
7.8 8 H, AD, 6D, AS, KD
9.2 (L,U) = (87.908, 112.091)
9.3 188 more
9.4 (L,U) = (0.076, 0.324)
9.5 1,497 more
9.6 (L,U) = (0.946, 0.970)
9.7 1,415 more
9.8 (L,U) = (12.11, 22.11)
9.9 (L,U) = (−0.1917, 0.0317)
9.10 F = 6.61 Significant
9.12 1v2 (0.476, 2.524) s
     1v3 (−3.524, −1.475) s
     1v4 (−0.524, 1.524) ns
     2v3 (−5.024, −2.976) s
     2v4 (−2.024, 0.024) ns
     3v4 (1.976, 4.024) s
10.2 x(1) = 3
     x(10) = 15
     x̄ = 9.5
     s = 3.375
     cov = 0.355
     t = 1.190
10.3 1
10.4 MLE: a = 1.3, b = 6.1
     MOM: a = 0.746, b = 6.025
(continued)

10.5 N(15.208, 2.355²)
10.6 0.352
10.7 LN(3.457, 1.139²)
10.8 θ = 0.096
     k = 3.168
10.9 (a, b, k1, k2) = (0, 100, 1.8, 1.2)
10.10 MLE: a = 3, b = 16
      MOM: a = 1, b = 17
10.11 0.2875
10.12 0.167
10.13 0.219
10.14 θ = 3.714
10.15 Xs = (1.3, 1.4, 1.8, 2.3, 2.4, 2.5, 2.9, 3.1, 3.4, 3.9, 4.1, 4.7, 5.2, 5.7, 6.1)
      Xf = (1.46, 1.78, 2.10, 2.42, 2.74, 3.06, 3.38, 3.70, 4.02, 4.34, 4.66, 4.98, 5.30, 5.62, 5.94)
10.16 Fs = (.045, .136, .227, .318, .409, .500, .591, .683, .774, .865, .955)
      Ff = (.071, .071, .214, .357, .429, .429, .643, .714, .786, .928, 1.000)
11.1 103.33
11.2 25
11.3 (0, 0.6, 1)
11.4 (5, 30, 1.3, 1.2)
11.5 LN(3.470, 0.686²)
11.6 k1 = 1.44
     k2 = 45.54
References

Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formulas, Graphs and
Tables, pp. 931–937. U.S. Department of Commerce, Washington (1964)
Ahrens, J.H., Dieter, U.: Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing 12, 223–246 (1974)
Barnett, V.: Probability plotting methods and order statistics. Appl. Stat. 24, 95–108 (1975)
Box, G.E.P., Muller, M.E.: A note on the generation of random normal deviates. Ann. Math. Stat. 29, 610–611 (1958)
Cheng, R.C.H.: The Generation of gamma variables with non-integral shape parameter. Appl. Stat.
26, 71–75 (1977)
Dubey, S.D.: On some permissible estimators of the location parameters of the Wiebull and certain
other distributions. Technometrics 9, 293–307 (1967)
Fishman, G.S., Moore, L.R.: A statistical evaluation of multiplicative congruential random number generators with modulus 2^31 − 1. J. Am. Stat. Assoc. 77, 129–136 (1982)
Gentle, J.E.: Cholesky factorization. In: Numerical Linear Algebra for Applications in Statistics, pp. 93–95. Springer, Berlin (1998)
Hahn, G.J., Shapiro, S.S.: Statistical Models in Engineering. Wiley, New York (1994)
Hastings Jr., C.: Approximation for Digital Computers. Princeton University Press, Princeton
(1955)
Hines, W.W., Montgomery, D.C., Goldsman, D.M., Borror, C.M.: Probability and Statistics in
Engineering. Wiley, New Jersey (2003)
Kelton, W.D., Sadowski, R.P., Sturrock, D.T.: Simulation with Arena. McGraw Hill, New York
(2007)
Law, A.M.: Simulation Modeling and Analysis, 4th edn. McGraw Hill, Boston (2007)
Lehmer, D.H.: Mathematical methods in large scale computing units. Ann. Comput. Lab. 26,
142–146 (1951). Harvard University
Lewis, P.A.W., Goodman, A.S., Miller, J.M.: A pseudo-random number generator for the System/360. IBM Syst. J. 8, 136–146 (1969)
Payne, W.H., Rabung, J.R., Bogyo, T.P.: Coding the Lehmer pseudo-random number generator. Commun. Assoc. Comput. Mach. 12, 85–86 (1969)
Rand Corporation: A Million Random Digits with 100,000 Normal Deviates. Rand Corporation, Santa Monica (1955)
Rose, C., Smith, M.D.: Order statistics. In: Mathematical Statistics with Mathematica, Sect. 9.4, pp. 311–332. Springer, New York (2002)
Scheuer, E.M., Stoller, D.S.: On the generation of normal random vectors. Technometrics 4, 278–281 (1962)
Wilk, M.B., Gnanadesikan, R.: Probability plotting methods for the analysis of data. Biometrika
55, 1–17 (1968)
Zanakis, S.H.: A simulation study of some simple estimators for the three-parameter Weibull distribution. J. Stat. Comput. Simul. 9, 101–116 (1979)
