Input Modeling: Banks, Carson, Nelson & Nicol

This document discusses input modeling for simulation models. It covers the four main steps of input model development: collecting real system data, identifying a probability distribution to represent the input process, choosing parameters for the distribution, and evaluating the fit of the chosen distribution. The key points covered are identifying distributions based on histograms of the data and the physical characteristics of the input process. Common distribution families like exponential, normal, Poisson, and Weibull are discussed.




Chapter 9: Input Modeling
Banks, Carson, Nelson & Nicol
Discrete-Event System Simulation

Purpose & Overview

- Input models provide the driving force for a simulation model.
- The quality of the output is no better than the quality of the inputs.
- In this chapter, we will discuss the four steps of input model development:
  - Collect data from the real system
  - Identify a probability distribution to represent the input process
  - Choose parameters for the distribution
  - Evaluate the chosen distribution and parameters for goodness of fit.

Data Collection

- One of the biggest tasks in solving a real problem: GIGO (garbage in, garbage out).
- Suggestions that may enhance and facilitate data collection:
  - Plan ahead: begin with a practice or pre-observation session; watch for unusual circumstances.
  - Analyze the data as they are being collected: check their adequacy.
  - Combine homogeneous data sets, e.g. successive time periods, or the same time period on successive days.
  - Be aware of data censoring: when a quantity is not observed in its entirety, there is a danger of leaving out long process times.
  - Check for relationships between variables, e.g. build a scatter diagram.
  - Check for autocorrelation.
  - Collect input data, not performance data.

Identifying the Distribution

- Histograms
- Selecting families of distributions
- Parameter estimation
- Goodness-of-fit tests
- Fitting a non-stationary process

Histograms [Identifying the distribution]

- A frequency distribution or histogram is useful in determining the shape of a distribution.
- The number of class intervals depends on:
  - The number of observations
  - The dispersion of the data
  - Suggested: the square root of the sample size
- For continuous data:
  - Corresponds to the probability density function of a theoretical distribution
- For discrete data:
  - Corresponds to the probability mass function
- If few data points are available: combine adjacent cells to eliminate the ragged appearance of the histogram.

- Vehicle Arrival Example: the number of vehicles arriving at an intersection between 7:00 am and 7:05 am was monitored for 100 random workdays (a histogram sketch follows below).

    Arrivals per Period    Frequency
    0                      12
    1                      10
    2                      19
    3                      17
    4                      10
    5                       8
    6                       7
    7                       5
    8                       5
    9                       3
    10                      3
    11                      1

- There are ample data, so the histogram may have a cell for each possible value in the data range.
- [Figure: the same data plotted as histograms with different interval sizes.]
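The following is a minimal sketch (not from the text) of how this histogram could be built in Python with numpy and matplotlib; the two arrays simply restate the tabulated vehicle-arrival data, and the square-root rule appears only to illustrate the class-interval suggestion for continuous data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Tabulated vehicle-arrival data from the example above
arrivals_per_period = np.arange(12)          # 0, 1, ..., 11
frequency = np.array([12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1])

# Expand the frequency table back into the 100 raw observations
raw_data = np.repeat(arrivals_per_period, frequency)
n = raw_data.size                            # 100 workdays

# Discrete data with ample observations: one cell per possible value
plt.figure()
plt.bar(arrivals_per_period, frequency, width=0.9)
plt.xlabel("Arrivals per period")
plt.ylabel("Frequency")

# For continuous data, the suggested number of class intervals is sqrt(n)
k = int(np.sqrt(n))                          # ~10 intervals for n = 100
plt.figure()
plt.hist(raw_data, bins=k)
plt.show()
```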


Selecting the Family of Distributions [Identifying the distribution]

- A family of distributions is selected based on:
  - The context of the input variable
  - The shape of the histogram
- Frequently encountered distributions:
  - Easier to analyze: exponential, normal, and Poisson
  - Harder to analyze: beta, gamma, and Weibull

- Use the physical basis of the distribution as a guide, for example:
  - Binomial: number of successes in n trials
  - Poisson: number of independent events that occur in a fixed amount of time or space
  - Normal: distribution of a process that is the sum of a number of component processes
  - Exponential: time between independent events, or a process time that is memoryless
  - Weibull: time to failure for components
  - Discrete or continuous uniform: models complete uncertainty
  - Triangular: a process for which only the minimum, most likely, and maximum values are known
  - Empirical: resamples from the actual data collected

Selecting the Family of Distributions [Identifying the distribution]

- Remember the physical characteristics of the process:
  - Is the process naturally discrete or continuous valued?
  - Is it bounded?
- There is no "true" distribution for any stochastic input process.
- Goal: obtain a good approximation.

Quantile-Quantile Plots [Identifying the distribution]

- A Q-Q plot is a useful tool for evaluating distribution fit.
- If X is a random variable with cdf F, then the q-quantile of X is the value $\gamma$ such that

    $F(\gamma) = P(X \le \gamma) = q$, for $0 < q < 1$

- When F has an inverse, $\gamma = F^{-1}(q)$.
- Let $\{x_i, i = 1, 2, \dots, n\}$ be a sample of data from X and $\{y_j, j = 1, 2, \dots, n\}$ be the observations in ascending order; then

    $y_j$ is approximately $F^{-1}\left(\frac{j - 0.5}{n}\right)$

  where j is the ranking or order number.

Quantile-Quantile Plots [Identifying the distribution]

- The plot of $y_j$ versus $F^{-1}((j - 0.5)/n)$ is:
  - Approximately a straight line if F is a member of an appropriate family of distributions
  - A line with slope 1 if F is a member of an appropriate family of distributions with appropriate parameter values

- Example: check whether the door installation times follow a normal distribution.
- The observations are ordered from smallest to largest:

    j    Value     j    Value     j    Value
    1    99.55     6    99.98     11   100.26
    2    99.56     7    100.02    12   100.27
    3    99.62     8    100.06    13   100.33
    4    99.65     9    100.17    14   100.41
    5    99.79     10   100.23    15   100.47

- The $y_j$ are plotted versus $F^{-1}((j - 0.5)/n)$, where F is the normal distribution with the sample mean (99.99 sec) and sample variance (0.2832² sec²); a plotting sketch follows below.
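A minimal sketch, assuming scipy and matplotlib are available, of how the normal Q-Q plot for the door-installation data could be produced; the ordered values and the fitted parameters (mean 99.99, standard deviation 0.2832) are taken from the example above.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Ordered door-installation times from the example
y = np.array([99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
              100.17, 100.23, 100.26, 100.27, 100.33, 100.41, 100.47])
n = y.size
j = np.arange(1, n + 1)

# Theoretical quantiles F^{-1}((j - 0.5)/n) for the fitted normal distribution
mean, std = 99.99, 0.2832          # sample mean and standard deviation
q = norm.ppf((j - 0.5) / n, loc=mean, scale=std)

plt.plot(q, y, "o")                       # Q-Q points
plt.plot([q[0], q[-1]], [q[0], q[-1]])    # reference line with slope 1
plt.xlabel("Theoretical quantiles")
plt.ylabel("Ordered observations")
plt.show()
```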


Quantile-Quantile Plots [Identifying the distribution]

- Example (continued): check whether the door installation times follow a normal distribution.
- [Figure: Q-Q plot of the ordered installation times. The points fall close to a straight line, supporting the hypothesis of a normal distribution; a second panel shows the histogram with the superimposed density function of the fitted normal distribution.]

- Consider the following while evaluating the linearity of a Q-Q plot:
  - The observed values never fall exactly on a straight line.
  - The ordered values are ranked and hence not independent, so it is unlikely for the points to be scattered evenly about the line.
  - The variance of the extremes is higher than that of the middle; linearity of the points in the middle of the plot is more important.
- A Q-Q plot can also be used to check homogeneity:
  - Check whether a single distribution can represent both sample sets.
  - Plot the ordered values of the two data samples against each other.

Parameter Estimation [Identifying the distribution]

- Next step after selecting a family of distributions.
- If the observations in a sample of size n are X1, X2, ..., Xn (discrete or continuous), the sample mean and variance are:

    $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$        $S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$

- If the data are discrete and have been grouped in a frequency distribution:

    $\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}$        $S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n - 1}$

  where $f_j$ is the observed frequency of value $X_j$ and k is the number of distinct values.

- When raw data are unavailable (data are grouped into class intervals), the approximate sample mean and variance are:

    $\bar{X} = \frac{\sum_{j=1}^{c} f_j m_j}{n}$        $S^2 = \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n - 1}$

  where $f_j$ is the observed frequency in the jth class interval, $m_j$ is the midpoint of the jth interval, and c is the number of class intervals.

- A parameter is an unknown constant, but an estimator is a statistic.
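A small illustrative sketch of the grouped (class-interval) formulas above; the class intervals and frequencies here are made-up values, not data from the text.

```python
import numpy as np

# Hypothetical grouped (class-interval) data, for illustration only:
# intervals [0,2), [2,4), [4,6), [6,8) with observed frequencies
midpoints = np.array([1.0, 3.0, 5.0, 7.0])   # m_j: interval midpoints
freq = np.array([14, 27, 36, 23])            # f_j: observed frequencies
n = freq.sum()

# Approximate sample mean and variance from grouped data
xbar = (freq * midpoints).sum() / n
s2 = ((freq * midpoints**2).sum() - n * xbar**2) / (n - 1)
print(xbar, s2)
```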

Parameter Estimation [Identifying the distribution]

- Vehicle Arrival Example (continued): the frequency table from the histogram example (Table 9.1 in the book) can be analyzed to obtain:

    $n = 100$, $f_1 = 12$, $X_1 = 0$, $f_2 = 10$, $X_2 = 1$, ...,
    $\sum_{j=1}^{k} f_j X_j = 364$  and  $\sum_{j=1}^{k} f_j X_j^2 = 2080$

- The sample mean and variance are:

    $\bar{X} = \frac{364}{100} = 3.64$

    $S^2 = \frac{2080 - 100 \cdot (3.64)^2}{99} = 7.63$

- The histogram suggests that X has a Poisson distribution.
  - However, note that the sample mean is not equal to the sample variance.
  - Reason: each estimator is a random variable and is not perfect.
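A minimal sketch that re-derives the sample mean and variance above directly from the frequency table; the values are the vehicle-arrival counts from the histogram example.

```python
import numpy as np

# Frequency table for the vehicle-arrival example
x = np.arange(12)                                        # values 0..11
f = np.array([12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1])  # frequencies
n = f.sum()                                              # 100

sum_fx = (f * x).sum()        # 364
sum_fx2 = (f * x**2).sum()    # 2080

xbar = sum_fx / n                             # 3.64
s2 = (sum_fx2 - n * xbar**2) / (n - 1)        # ~7.63
print(xbar, s2)
```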
Goodness-of-Fit Tests [Identifying the distribution]

- Conduct hypothesis testing on the input data distribution using:
  - The Kolmogorov-Smirnov test
  - The chi-square test
- No single correct distribution exists in a real application.
  - If very little data are available, it is unlikely that any candidate distribution will be rejected.
  - If a lot of data are available, it is likely that all candidate distributions will be rejected.


Chi-Square Test [Goodness-of-Fit Tests]

- Intuition: compare the histogram of the data to the shape of the candidate density or mass function.
- Valid for large sample sizes, when parameters are estimated by maximum likelihood.
- Arranging the n observations into a set of k class intervals or cells, the test statistic is:

    $\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$

  where $O_i$ is the observed frequency in the ith class interval, $E_i = n p_i$ is the expected frequency, and $p_i$ is the theoretical probability of the ith interval (suggested minimum $E_i$ is 5).
- The statistic approximately follows the chi-square distribution with k - s - 1 degrees of freedom, where s = the number of parameters of the hypothesized distribution estimated from the sample statistics.

- The hypotheses of a chi-square test are:
  - H0: The random variable, X, conforms to the distributional assumption with the parameter(s) given by the estimate(s).
  - H1: The random variable X does not conform.
- If the distribution tested is discrete and combining adjacent cells is not required (so that each $E_i$ exceeds the minimum requirement):
  - Each value of the random variable should be a class interval, unless combining is necessary, and

    $p_i = p(x_i) = P(X = x_i)$

Chi-Square Test [Goodness-of-Fit Tests]

- If the distribution tested is continuous:

    $p_i = \int_{a_{i-1}}^{a_i} f(x)\, dx = F(a_i) - F(a_{i-1})$

  where $a_{i-1}$ and $a_i$ are the endpoints of the ith class interval, f(x) is the assumed pdf, and F(x) is the assumed cdf.
- Recommended number of class intervals (k):

    Sample Size, n    Number of Class Intervals, k
    20                Do not use the chi-square test
    50                5 to 10
    100               10 to 20
    > 100             n^(1/2) to n/5

- Caution: different groupings of the data (i.e., different k) can affect the result of the hypothesis test.

- Vehicle Arrival Example (continued):
  - H0: the random variable is Poisson distributed.
  - H1: the random variable is not Poisson distributed.
  - $E_i = n\, p(x_i) = n\, \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$

    x_i     Observed Frequency, O_i    Expected Frequency, E_i    (O_i - E_i)^2 / E_i
    0       12                          2.6                       } 7.87  (cells 0 and 1 combined)
    1       10                          9.6
    2       19                         17.4                          0.15
    3       17                         21.1                          0.80
    4       10                         19.2                          4.41
    5        8                         14.0                          2.57
    6        7                          8.5                          0.26
    7        5                          4.4                       } 11.62 (cells 7 through >= 11
    8        5                          2.0                                combined because of the
    9        3                          0.8                                minimum E_i)
    10       3                          0.3
    >= 11    1                          0.1
    Total  100                        100.0                         27.68

- The degrees of freedom are k - s - 1 = 7 - 1 - 1 = 5; since $\chi_0^2 = 27.68 > \chi_{0.05,5}^2 = 11.1$, the hypothesis is rejected at the 0.05 level of significance (a computational sketch follows below).
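A minimal sketch, assuming scipy is available, of the chi-square computation above; the cell grouping mirrors the table (cells 0-1 combined, cells 7 through >= 11 combined), and the Poisson mean 3.64 is the sample-mean estimate from the parameter-estimation example.

```python
import numpy as np
from scipy.stats import poisson, chi2

# Observed frequencies for arrivals 0..11 (from the histogram example)
observed = np.array([12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1])
n = observed.sum()
lam = 3.64                       # sample mean used as the Poisson estimate

# Expected frequencies: E_i = n * p(x_i), with the last cell taking P(X >= 11)
p = poisson.pmf(np.arange(11), lam)
p = np.append(p, 1.0 - p.sum())
expected = n * p

# Combine cells so every expected count meets the suggested minimum of 5:
# cells {0,1}, {2}, {3}, {4}, {5}, {6}, {7,...,>=11}  ->  k = 7 cells
groups = [(0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 12)]
O = np.array([observed[a:b].sum() for a, b in groups])
E = np.array([expected[a:b].sum() for a, b in groups])

chi_sq = ((O - E) ** 2 / E).sum()            # ~27.68
df = len(groups) - 1 - 1                     # k - s - 1 = 5
critical = chi2.ppf(0.95, df)                # ~11.07
p_value = chi2.sf(chi_sq, df)                # ~4e-5
print(chi_sq, critical, p_value)
```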

Kolmogorov-Smirnov Test [Goodness-of-Fit Tests]

- Intuition: formalizes the idea behind examining a Q-Q plot.
- Recall from Chapter 7.4.1:
  - The test compares the continuous cdf, F(x), of the hypothesized distribution with the empirical cdf, $S_N(x)$, of the N sample observations.
  - It is based on the maximum difference statistic (tabulated in Table A.8):

      $D = \max |F(x) - S_N(x)|$

- A more powerful test, particularly useful when:
  - Sample sizes are small, and
  - No parameters have been estimated from the data.
- When parameter estimates have been made:
  - The critical values in Table A.8 are biased (too large).
  - The test is more conservative, i.e., the actual Type I error is smaller than specified.
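A minimal sketch of a K-S test using scipy.stats.kstest, applied to the door-installation times from the Q-Q example with a fitted normal distribution; as the slide cautions, the parameters are estimated from the same data, so the tabulated critical values (and the reported p-value) are not exact.

```python
import numpy as np
from scipy.stats import kstest

# Door-installation times from the Q-Q plot example
x = np.array([99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
              100.17, 100.23, 100.26, 100.27, 100.33, 100.41, 100.47])

# D = max |F(x) - S_N(x)| against the fitted normal cdf
mean, std = x.mean(), x.std(ddof=1)
result = kstest(x, "norm", args=(mean, std))

# Caution: the parameters were estimated from this same data, so the
# standard critical values/p-value overstate the fit (see the note above).
print(result.statistic, result.pvalue)
```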

p-Values and "Best Fits" [Goodness-of-Fit Tests]

- p-value for the test statistic:
  - The significance level at which one would just reject H0 for the given test statistic value.
  - A measure of fit: the larger, the better.
  - Large p-value: good fit.
  - Small p-value: poor fit.
- Vehicle Arrival Example (continued):
  - H0: the data are Poisson.
  - Test statistic: $\chi_0^2 = 27.68$, with 5 degrees of freedom.
  - p-value = 0.00004, meaning we would just reject H0 at the 0.00004 significance level; hence the Poisson distribution is a poor fit.


p-Values and "Best Fits" [Goodness-of-Fit Tests]

- Much software uses the p-value as the ranking measure to automatically determine the "best fit". Things to be cautious about:
  - The software may not know about the physical basis of the data; the distribution families it suggests may be inappropriate.
  - Close conformance to the data does not always lead to the most appropriate input model.
  - The p-value does not say much about where the lack of fit occurs.
- Recommended: always inspect the automatic selection using graphical methods.

Fitting a Non-stationary Poisson Process

- Fitting an NSPP to arrival data is difficult; possible approaches:
  - Fit a very flexible model with lots of parameters, or
  - Approximate the arrival rate as constant over some basic interval of time, but vary it from time interval to time interval (our focus).
- Suppose we need to model arrivals over the time period [0, T]; this approach is most appropriate when we can:
  - Observe the time period repeatedly, and
  - Count arrivals / record arrival times.

Fitting a Non-stationary Poisson Process

- The estimated arrival rate during the ith time period is:

    $\hat{\lambda}(t) = \frac{1}{n \Delta t} \sum_{j=1}^{n} C_{ij}$

  where n = the number of observation periods, $\Delta t$ = the length of each time interval, and $C_{ij}$ = the number of arrivals during the ith time interval on the jth observation period.
- Example: divide a 10-hour business day [8am, 6pm] into k = 20 equal intervals of length $\Delta t$ = ½ hour, and observe over n = 3 days:

    Time Period     Day 1    Day 2    Day 3    Estimated Arrival Rate (arrivals/hr)
    8:00 - 8:30     12       14       10       24
    8:30 - 9:00     23       26       32       54
    9:00 - 9:30     27       18       32       52
    9:30 - 10:00    20       13       12       30

  For instance, the estimate for 8:30 - 9:00 is 1/(3 × 0.5) × (23 + 26 + 32) = 54 arrivals/hour.
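A minimal sketch of the piecewise-constant rate estimate above; the counts array holds the three days of observations from the table.

```python
import numpy as np

# Arrival counts C_ij: rows = time intervals, columns = observation days
counts = np.array([
    [12, 14, 10],   # 8:00 - 8:30
    [23, 26, 32],   # 8:30 - 9:00
    [27, 18, 32],   # 9:00 - 9:30
    [20, 13, 12],   # 9:30 - 10:00
])
n_days = counts.shape[1]        # n = 3 observation periods
dt = 0.5                        # interval length in hours

# lambda_hat_i = (1 / (n * dt)) * sum_j C_ij  for each interval i
rate = counts.sum(axis=1) / (n_days * dt)
print(rate)                     # [24.  54.  51.33  30.]
```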
Selecting Model without Data

- If data are not available, some possible sources of information about the process are:
  - Engineering data: often a product or process has performance ratings provided by the manufacturer, or company rules specify time or production standards.
  - Expert opinion: people who are experienced with the process or similar processes can often provide optimistic, pessimistic, and most-likely times, and they may know the variability as well.
  - Physical or conventional limitations: physical limits on performance, or limits and bounds that narrow the range of the input process.
  - The nature of the process.
- The uniform, triangular, and beta distributions are often used as input models.

Selecting Model without Data

- Example: production planning simulation.
  - The sales volume of various products is required as input. The salesperson for product XYZ says that:
    - No fewer than 1,000 units and no more than 5,000 units will be sold.
    - Given her experience, she believes there is a 90% chance of selling more than 2,000 units, a 25% chance of selling more than 2,500 units, and only a 1% chance of selling more than 4,500 units.
  - Translating this information into a cumulative probability of being less than or equal to those goals for simulation input:

    i    Interval (Sales)       Cumulative Frequency, c_i
    1    1000 <= x <= 2000      0.10
    2    2000 <  x <= 3000      0.75
    3    3000 <  x <= 4000      0.99
    4    4000 <  x <= 5000      1.00
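A hedged sketch of one way to turn the cumulative table above into a simulation input: inverse-transform sampling with linear interpolation inside each interval. The breakpoints and cumulative values come from the table; the interpolation choice is an assumption made for illustration, not something the text specifies.

```python
import numpy as np

# Breakpoints and cumulative probabilities from the table above
breakpoints = np.array([1000, 2000, 3000, 4000, 5000])
cum_prob = np.array([0.00, 0.10, 0.75, 0.99, 1.00])

def sample_sales(rng, size=1):
    """Inverse-transform sampling with linear interpolation within each
    interval (an assumed continuous empirical distribution)."""
    u = rng.random(size)
    return np.interp(u, cum_prob, breakpoints)

rng = np.random.default_rng(42)
print(sample_sales(rng, size=5))
```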
Multivariate and Time-Series Input Models

- Multivariate:
  - For example, lead time and annual demand for an inventory model: an increase in demand results in an increase in lead time, hence the variables are dependent.
- Time-series:
  - For example, the times between arrivals of orders to buy and sell stocks: buy and sell orders tend to arrive in bursts, hence the times between arrivals are dependent.


Covariance and Correlation [Multivariate/Time Series]

- Consider a model that describes the relationship between X1 and X2:

    $(X_1 - \mu_1) = \beta (X_2 - \mu_2) + \varepsilon$

  where $\varepsilon$ is a random variable with mean 0 that is independent of X2.
  - $\beta = 0$: X1 and X2 are statistically independent.
  - $\beta > 0$: X1 and X2 tend to be above or below their means together.
  - $\beta < 0$: X1 and X2 tend to be on opposite sides of their means.
- Covariance between X1 and X2:

    $\text{cov}(X_1, X_2) = E[(X_1 - \mu_1)(X_2 - \mu_2)] = E(X_1 X_2) - \mu_1 \mu_2$

  - cov(X1, X2) = 0, then $\beta = 0$
  - cov(X1, X2) < 0, then $\beta < 0$
  - cov(X1, X2) > 0, then $\beta > 0$

- Correlation between X1 and X2 (values between -1 and 1):

    $\rho = \text{corr}(X_1, X_2) = \frac{\text{cov}(X_1, X_2)}{\sigma_1 \sigma_2}$

  - corr(X1, X2) = 0, then $\beta = 0$
  - corr(X1, X2) < 0, then $\beta < 0$
  - corr(X1, X2) > 0, then $\beta > 0$
- The closer $\rho$ is to -1 or 1, the stronger the linear relationship between X1 and X2.

Covariance and Correlation [Multivariate/Time Series]

- A time series is a sequence of random variables X1, X2, X3, ..., that are identically distributed (same mean and variance) but dependent.
  - cov(Xt, Xt+h) is the lag-h autocovariance.
  - corr(Xt, Xt+h) is the lag-h autocorrelation.
  - If the autocovariance depends only on h and not on t, the time series is covariance stationary.

Multivariate Input Models [Multivariate/Time Series]

- If X1 and X2 are normally distributed, dependence between them can be modeled by the bivariate normal distribution with $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ and correlation $\rho$.
  - To estimate $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, see "Parameter Estimation" above (Section 9.3.2 in the book).
  - To estimate $\rho$, suppose we have n independent and identically distributed pairs (X11, X21), (X12, X22), ..., (X1n, X2n); then:

    $\widehat{\text{cov}}(X_1, X_2) = \frac{1}{n-1} \sum_{j=1}^{n} (X_{1j} - \bar{X}_1)(X_{2j} - \bar{X}_2) = \frac{1}{n-1} \left( \sum_{j=1}^{n} X_{1j} X_{2j} - n \bar{X}_1 \bar{X}_2 \right)$

    $\hat{\rho} = \frac{\widehat{\text{cov}}(X_1, X_2)}{\hat{\sigma}_1 \hat{\sigma}_2}$

  where $\hat{\sigma}_1$ and $\hat{\sigma}_2$ are the sample standard deviations.
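A minimal sketch of estimating the correlation from paired data and then generating bivariate-normal inputs with the fitted parameters; the paired sample here (labeled as lead time and demand) is made-up illustration data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical paired observations (e.g., lead time and annual demand)
x1 = np.array([6.5, 4.3, 6.9, 6.0, 6.9, 6.9, 5.8, 7.3, 4.5, 6.3])
x2 = np.array([103., 83., 116., 97., 112., 104., 106., 109., 92., 96.])
n = x1.size

# Sample covariance and correlation estimate (formulas above)
cov_hat = ((x1 - x1.mean()) * (x2 - x2.mean())).sum() / (n - 1)
rho_hat = cov_hat / (x1.std(ddof=1) * x2.std(ddof=1))

# Generate correlated bivariate-normal input pairs with the fitted parameters
mean = [x1.mean(), x2.mean()]
cov = [[x1.var(ddof=1), cov_hat],
       [cov_hat, x2.var(ddof=1)]]
samples = rng.multivariate_normal(mean, cov, size=5)
print(rho_hat)
print(samples)
```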

Time-Series Input Models [Multivariate/Time Series]

- If X1, X2, X3, ... is a sequence of identically distributed, but dependent and covariance-stationary random variables, then we can represent the process with models such as:
  - The autoregressive order-1 model, AR(1)
  - The exponential autoregressive order-1 model, EAR(1)
- Both have the characteristic that:

    $\rho_h = \text{corr}(X_t, X_{t+h}) = \rho^h$, for $h = 1, 2, \dots$

  - The lag-h autocorrelation decreases geometrically as the lag increases; hence, observations far apart in time are nearly independent.

AR(1) Time-Series Input Models [Multivariate/Time Series]

- Consider the time-series model:

    $X_t = \mu + \phi (X_{t-1} - \mu) + \varepsilon_t$, for $t = 2, 3, \dots$

  where $\varepsilon_2, \varepsilon_3, \dots$ are i.i.d. normally distributed with mean 0 and variance $\sigma_\varepsilon^2$.
- If X1 is chosen appropriately, then:
  - X1, X2, ... are normally distributed with mean $\mu$ and variance $\sigma_\varepsilon^2 / (1 - \phi^2)$
  - Autocorrelation $\rho_h = \phi^h$
- To estimate $\phi$, $\mu$, $\sigma_\varepsilon^2$:

    $\hat{\mu} = \bar{X}$,   $\hat{\sigma}_\varepsilon^2 = \hat{\sigma}^2 (1 - \hat{\phi}^2)$,   $\hat{\phi} = \frac{\widehat{\text{cov}}(X_t, X_{t+1})}{\hat{\sigma}^2}$

  where $\widehat{\text{cov}}(X_t, X_{t+1})$ is the lag-1 autocovariance and $\hat{\sigma}^2$ is the sample variance.
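A minimal sketch of generating an AR(1) input process with given mu, phi, and sigma_eps; X1 is drawn from the stationary distribution so the series has the stated mean and variance throughout.

```python
import numpy as np

def generate_ar1(mu, phi, sigma_eps, n, rng):
    """Generate X_1, ..., X_n from the AR(1) model
    X_t = mu + phi * (X_{t-1} - mu) + eps_t."""
    x = np.empty(n)
    # Start from the stationary distribution: N(mu, sigma_eps^2 / (1 - phi^2))
    x[0] = rng.normal(mu, sigma_eps / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        x[t] = mu + phi * (x[t - 1] - mu) + rng.normal(0.0, sigma_eps)
    return x

rng = np.random.default_rng(1)
series = generate_ar1(mu=10.0, phi=0.8, sigma_eps=1.0, n=1000, rng=rng)
# Lag-1 autocorrelation should be close to phi = 0.8
print(np.corrcoef(series[:-1], series[1:])[0, 1])
```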


EAR(1) Time-Series Input Models [Multivariate/Time Series]

- Consider the time-series model:

    $X_t = \begin{cases} \phi X_{t-1}, & \text{with probability } \phi \\ \phi X_{t-1} + \varepsilon_t, & \text{with probability } 1 - \phi \end{cases}$   for $t = 2, 3, \dots$

  where $\varepsilon_2, \varepsilon_3, \dots$ are i.i.d. exponentially distributed with mean $1/\lambda$, and $0 \le \phi < 1$.
- If X1 is chosen appropriately, then:
  - X1, X2, ... are exponentially distributed with mean $1/\lambda$
  - Autocorrelation $\rho_h = \phi^h$; only positive correlation is allowed.
- To estimate $\phi$, $\lambda$:

    $\hat{\lambda} = 1 / \bar{X}$,   $\hat{\phi} = \hat{\rho} = \frac{\widehat{\text{cov}}(X_t, X_{t+1})}{\hat{\sigma}^2}$

  where $\widehat{\text{cov}}(X_t, X_{t+1})$ is the lag-1 autocovariance.
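A minimal sketch of generating an EAR(1) input process as defined above; the lambda and phi values are arbitrary illustration choices.

```python
import numpy as np

def generate_ear1(lam, phi, n, rng):
    """Generate X_1, ..., X_n from the EAR(1) model:
    X_t = phi * X_{t-1}              with probability phi
    X_t = phi * X_{t-1} + eps_t      with probability 1 - phi,
    where eps_t is exponential with mean 1/lam."""
    x = np.empty(n)
    x[0] = rng.exponential(1.0 / lam)        # stationary marginal: Exp(mean 1/lam)
    for t in range(1, n):
        x[t] = phi * x[t - 1]
        if rng.random() >= phi:              # with probability 1 - phi, add noise
            x[t] += rng.exponential(1.0 / lam)
    return x

rng = np.random.default_rng(3)
series = generate_ear1(lam=0.5, phi=0.7, n=5000, rng=rng)
# Marginal mean should be near 1/lam = 2; lag-1 autocorrelation near phi = 0.7
print(series.mean(), np.corrcoef(series[:-1], series[1:])[0, 1])
```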

Summary

- In this chapter, we described the four steps in developing input data models:
  - Collecting the raw data
  - Identifying the underlying statistical distribution
  - Estimating the parameters
  - Testing for goodness of fit
