7 Input Modeling 2024

Uploaded by

Thành Duy
Input Modelling

Books: Jerry Banks, chapters 9 and 5
Kelton, chapters 4 & 5
Robinson, chapter 7

Dr. P.H.Tram & Dr Tran Duc Vi , 2023


How to simulate “randomness” of the inputs?

World view: Process view or Event view?


- Events
- Activities
- System states
Descriptive statistics to evaluate/ predict system performances?

Steps in a Simulation Study

- Why do we need to do simulation?

- What do we need to create a simulation model?

- What are we to do with a simulation model?
Outline

• Structural modelling (logic) vs Quantitative modelling (input modelling)
• Data collection
• Using data
• Special cases:
- Non-Stationary process
- No data
- Multivariate and correlated data
• Using data with Arena

Simulation modelling
● Structural modelling:
Entities, resources, path…

● Quantitative modelling:
Interarrival time, processing time, downtime… → Input
modelling

● Deterministic vs Random inputs

Interarrival time: 1 minute
Service time: 59 s
Data
• Central to development and use of
simulation models
• If data are inaccurate -> model inaccurate
(GIGO: garbage-in-garbage-out)
• Practical Issues:
– Data requirements for simulation models
– Data collection
– Modelling variability using statistical distributions
Data requirements
• Quantitative (numeric data)
• Qualitative data (pictures, diagrams, words, logic
statements….)
• Raw data ≠ information (information is data with interpretation, i.e. data that
have been analyzed for some purpose)
• 3 types
– Contextual data: to develop an understanding of the problem
– Data for model realisation: develop computer model
– Data for model validation
• Obtain Data
– Category A: Available
– Category B: Not available but collectable
– Category C: Not available and not collectable
Data Collection
• Hard, time-consuming, frustrating,
boring (non-existence, not available,
incomplete, too much..)

• Suggestions:
– Plan ahead: begin by a practice or pre-observing session, watch
for unusual circumstances
– Combine homogeneous data sets, e.g. successive time
periods, during the same time period on successive days
– Check for relationship between variables, e.g. build scatter
diagram
– Sensitivity analysis: try a range of values and see which inputs are important
– Match model detail with quality of data
– Collect input data, not performance data (however, performance
data can be used for validation)
– Remember Garbage In, Garbage Out
Course projects

- How do you plan to collect the data?


- What are the input data and performance data?
Using data
● Use data “directly” in simulation
(trace-driven model)
- Read actual observed values to drive the model inputs
(eg. set of orders: order items, quantity; truck arrival
time using gate records; time to failure and down time,
product mix and quantities, production sequence)

● Or, fit probability distribution to data


- “Draw” or “generate” synthetic observations from this
distribution to drive the model inputs

Pros & Cons?


Using data

● Use data “directly” in simulation


- All values will be “legal” and realistic (useful for
dependent data if any)
- But can never go outside your observed data
- May not have enough data for long or many runs
- Computationally slow (reading disk files)

● Or, fit probability distribution to data


- Can go beyond observed data (good and bad)
- May not get a good “fit” to data
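
The two options above can be contrasted in a few lines of Python. This is a minimal sketch, not part of the original slides: the observed interarrival times are hypothetical, and the exponential family is assumed only for illustration.

```python
import random
from statistics import mean

# Hypothetical observed interarrival times in minutes (not from the slides).
observed = [0.8, 2.1, 0.3, 1.7, 0.9, 3.2, 0.5, 1.1]

def trace_driven_inputs(trace):
    """Replay the recorded values in order; stops when the trace is exhausted."""
    yield from trace

def fitted_inputs(rng, sample, count):
    """Draw synthetic values from an exponential whose mean matches the sample mean."""
    rate = 1.0 / mean(sample)
    return [rng.expovariate(rate) for _ in range(count)]

replayed = list(trace_driven_inputs(observed))                    # never leaves the observed data
synthetic = fitted_inputs(random.Random(2024), observed, 20)      # can go beyond the observed range
```

The trace supplies only as many values as were recorded, while the fitted distribution can supply any number of values for long or repeated runs.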
4 Steps of Input modelling

1. Data Collection
2. Identifying Distribution (Family)
3. Parameter Estimation
4. Goodness-of-Fit Tests
Identify the Distribution (Family)

• Histograms
• Context
• Practical

Example of Arrival process

Time Period: 10 minutes


Histogram
• Show shape of a distribution
• Correspond to PDF (continuous data) or
PMF (discrete data)

• The shape of the histogram depends on the number of class intervals
→ Suggestion:
+ number of intervals = the square root of
the sample size
+ If few data points are available,
combine adjacent cells to eliminate the
ragged appearance of the histogram

• Plot histograms of discrete data and continuous data

Fig 9.3. Ragged, coarse, and appropriate histograms
a) Original data: too ragged
b) Combining adjacent cells: too coarse
c) Combining adjacent cells: appropriate
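
The square-root suggestion can be sketched as follows. The sample values are illustrative only (not from the slides), and equal-width class intervals are an assumption of this sketch.

```python
import math

def histogram_counts(data, num_bins=None):
    """Count observations per equal-width class interval.

    If num_bins is not given, use the square root of the sample size.
    """
    n = len(data)
    if num_bins is None:
        num_bins = max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins or 1.0   # guard against all-equal data
    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last interval.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return lo, width, counts

# 16 illustrative observations, so sqrt(16) = 4 class intervals.
sample = [1.2, 0.4, 3.1, 2.2, 0.9, 1.7, 2.8, 0.3,
          1.1, 4.0, 0.7, 1.9, 2.5, 0.6, 1.4, 3.6]
lo, width, counts = histogram_counts(sample)
```

Calling `histogram_counts(sample, num_bins=2)` merges adjacent cells, which is one way to smooth a ragged histogram.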
Discrete Data

Example 9.5 page 342

The number of vehicles arriving at the northwest corner of an intersection in a 5-minute
period between 7:00 AM and 7:05 AM was monitored for five workdays over a
20-week period.

Table 9.1 Number of Arrivals in a 5-Minute Period

Fig 9.4 Histogram of number of arrivals per period


Selecting the Family of Distribution

• A family of distributions is selected based on:


– Shape of the histogram
– The context of the input variable

Reference:
● Understanding and choosing the right probability distribution
https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781119197096.app03
● Jerry Banks, chapter 5: Statistical models in simulation
● Jerry Banks, chapter 9: page 346
Identify distributions!
Exponential
Normal
Guideline of Probability Distributions
● Binomial: # of successes in n trials
● Negative binomial: # of trials for r successes
● Geometric: # of trials for 1st success
● Poisson: # of independent events that occur in a fixed amount of
time or space

● Normal: distribution of a process that is the sum of a number of
component processes. E.g. assembly
● Lognormal: distribution of a process that can be thought of as
the product of (meaning to multiply together) a number of
component processes. E.g. ROI with compounded interest
● Exponential: time between independent events, or a process
time that is memoryless
Guideline of Probability Distributions
● Gamma: models nonnegative random variables. The gamma can be
shifted away from 0 by adding a constant.
● Beta: models bounded (fixed upper and lower limits) random
variables on [0,1] (unit interval). The beta can be shifted away
from 0 by adding a constant
● Erlang: sum of several exponentially distributed processes;
● Weibull: time to failure for components

● Discrete or continuous uniform: models complete uncertainty
● Triangular: a process for which only the minimum, most likely, and
maximum values are known
● Empirical: resamples from the actual data collected
Common random variables
● Queuing system: Interarrival time, Service time
→ constant, Normal (>0), Exponential, Gamma, Weibull
Common random variables

● Inventory & SC:
- Number of units demanded: Geometric, Poisson, Negative
binomial
- Lead time: Gamma
- Time between demands

● Reliability & maintainability:
- Time to failure: Weibull, Exponential, Normal

<Jerry Banks, chapter 5>


Example of Weibull Distribution - Failure distribution

Infant mortality: failure rate decreases

β = 0.2-0.6

<https://www.weibull.com/hotwire/issue21/hottopics21.htm>
Practical Selection of Distribution Family
Probability Distribution

Start from the histogram of the data:

Histogram: uniform or single hump → Theoretical distribution
- mathematical formulation
- ARENA: TRIA( ), NORM( )..
- Bounded/unbounded? e.g. TRIA is preferred to NORM
- discrete/continuous?
- ease of parameter manipulation? e.g. EXPO is preferred to WEIB

Histogram: several groupings of values (multimodal), data points different
from main set (outliers) → Empirical distribution (large sample size)
- divide data into groupings, calculate proportion, interpolate
- ARENA: CONT( ), DISC( )
- discrete/continuous? e.g. assigning entity type
Parameter Estimation (signature)
• To identify a specific instance of the distribution family
• Location parameters — they shift the density function
• Shape parameters — they change the shape of the
density function
• Scale parameters

Normal, location & scale parameters


Weibull, shape parameter; Weibull, scale parameter
Parameter Estimation

• A parameter is an unknown constant, but an estimator is a statistic.
• If observations in a sample of size n are X1, X2, …, Xn
(discrete or continuous), the sample mean and variance are:
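
The formulas on this slide did not survive extraction. The standard definitions of the sample mean and sample variance, written with the slide's symbols, are assumed to be what was shown:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\,\bar{X}^2}{n-1}
```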

Example of Raw data of Component life (days)

If the data come from an exponential distribution, the rate is estimated by λ̂ = 1 / (sample mean).
• If the data are discrete and have been grouped in a
frequency distribution:
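
For grouped discrete data the same estimators are computed from the frequency table. The sketch below assumes the standard grouped-data formulas and uses a small illustrative frequency table, not the slide's Table 9.1.

```python
def grouped_mean_variance(values, freqs):
    """Sample mean and variance from a frequency table.

    values[j] is the j-th distinct value X_j and freqs[j] its observed frequency f_j:
        mean     = sum(f_j * X_j) / n
        variance = (sum(f_j * X_j**2) - n * mean**2) / (n - 1)
    """
    n = sum(freqs)
    total = sum(f * x for x, f in zip(values, freqs))
    total_sq = sum(f * x * x for x, f in zip(values, freqs))
    mean = total / n
    variance = (total_sq - n * mean ** 2) / (n - 1)
    return mean, variance

# Illustrative table: number of arrivals per period and how often each count occurred.
values = [0, 1, 2, 3, 4]
freqs = [2, 4, 6, 5, 3]
mean, variance = grouped_mean_variance(values, freqs)
```

For a Poisson model the estimate of the rate parameter is the sample mean, so comparing `mean` with `variance` gives a quick plausibility check on the Poisson assumption.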

The histogram + context → X follows a Poisson distribution

Table: Number of Arrivals in a 5-Minute Period

Why is the sample mean not equal to the sample variance?

Component life (days)
• If data are grouped into class intervals (raw data are not
available)

fj is the observed frequency in the jth class interval
mj is the _________ of the jth interval
c is the ___________ of class intervals

Compare with previous calculation from the raw data?


Parameter Estimation

Example 9.10 Exponential Distribution (page 353)


Example 9.11 Weibull Distribution (page 354)
Example 9.12 Poisson Distribution (page 356)
Example 9.13 Lognormal Distribution (page 356)
Example 9.14 Normal Distribution (page 356)
Example 9.15 Gamma Distribution (page 356)
Example 9.16 Beta Distribution (page 358)

Goodness-of-fit Test

• Graphical approach:
– Q-Q plot: graphs the quantiles of the fitted distribution
vs. the sample quantiles.
– P-P plot: graphs the fitted CDF vs. the empirical CDF

• Statistical Test:
– Kolmogorov-Smirnov test
– Chi-square test

Quantile-Quantile Plot

A Median is a
_______% quantile

● Q-Q plot: the plot of $y_j$ versus $F^{-1}((j-0.5)/n)$


Recall: Random variate
● Exponential Distribution:

● Weibull distribution

Figure: Inverse-transform technique for exp(λ = 1)
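
The formulas on this slide were lost in extraction. A minimal sketch of the inverse-transform technique is given below, assuming the usual parameterisations: exponential with rate λ, and Weibull with scale α, shape β and location 0. The function names are illustrative.

```python
import math
import random

def exponential_variate(lam, u):
    """Invert F(x) = 1 - exp(-lam * x):  x = -ln(1 - u) / lam."""
    return -math.log(1.0 - u) / lam

def weibull_variate(alpha, beta, u):
    """Invert F(x) = 1 - exp(-(x / alpha)**beta):  x = alpha * (-ln(1 - u))**(1 / beta)."""
    return alpha * (-math.log(1.0 - u)) ** (1.0 / beta)

rng = random.Random(1)
# random() returns u in [0, 1), so 1 - u is never zero and the logarithm is defined.
exp_sample = [exponential_variate(1.0, rng.random()) for _ in range(5)]
wei_sample = [weibull_variate(2.0, 0.5, rng.random()) for _ in range(5)]
```

The same inverse CDF is what the Q-Q plot evaluates at the points (j − 0.5)/n.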

Quantile-Quantile Plot

● The plot of $y_j$ versus $F^{-1}((j-0.5)/n)$ is
– Approximately a straight line if F is a member of an
appropriate family of distributions
– A line with slope 1 if F is a member of an appropriate
family of distributions with appropriate parameter values
● Q-Q plot can also be used to check homogeneity
– Check whether a single distribution can represent both
sample sets
– Plot the ordered values of the two data samples against
each other
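
The coordinates of a Q-Q plot can be computed as sketched below. The installation times are hypothetical values, and the normal family with parameters estimated from the sample is an assumption of the example.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample, dist):
    """Return (theoretical quantile, ordered observation) pairs.

    The j-th smallest observation y_j is paired with F^{-1}((j - 0.5) / n).
    """
    ordered = sorted(sample)
    n = len(ordered)
    return [(dist.inv_cdf((j - 0.5) / n), y) for j, y in enumerate(ordered, start=1)]

# Hypothetical installation times in seconds (not the slide's data).
times = [99.79, 100.26, 100.23, 99.55, 99.96, 99.56, 100.41, 100.27, 99.62, 99.90]
fitted = NormalDist(mean(times), stdev(times))
points = qq_points(times, fitted)
```

If the points lie close to the line y = x, the fitted normal distribution with these parameter values is a plausible model.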
Quantile-Quantile Plot

• Example: Check whether the door installation times
follow a normal distribution.
– The observations are now ordered from smallest to
largest:
– $y_j$ are plotted versus $F^{-1}((j-0.5)/n)$, where F is the
normal distribution with the sample mean (99.99 sec)
and sample variance ($0.2832^2$ sec$^2$)
Quantile-Quantile Plot
• Example (continued): Check whether the door
installation times follow a normal distribution.

Exercise

Test Exponential,
Uniform
Quantile-Quantile Plot

Randomness tends to obscure things,
especially with small samples
Statistical Test
We want to test the null hypothesis:
H0: The random variable X conforms to the distributional
assumption with the parameter(s) given by the
estimate(s).
H1: The random variable X does not conform.

α = Pr(Type I error) = Pr(reject H0 | H0 is true)
β = Pr(Type II error) = Pr(fail to reject H0 | H0 is false)
Power = 1 − β = Pr(reject H0 | H0 is false)
p-value: smallest significance level α that leads to rejection
of H0
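Chi-square Test
• Test statistic:

$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$

where $O_i$ is the observed frequency in the ith class interval, $E_i = n p_i$ is the
expected frequency, and k is the number of class intervals. The statistic
approximately follows the chi-square distribution
with k-s-1 degrees of freedom, where s = number of
parameters of the hypothesized distribution estimated by
the sample statistics.
- One should use Ei ≥ 5

• Reject H0 if $\chi_0^2 > \chi^2_{\alpha,\,k-s-1}$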
Chi-square Test
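• Discrete distribution: each value of the random variable is a class interval
(unless adjacent values are combined), with $p_i = p(x_i) = P(X = x_i)$

• Continuous distribution: with class interval endpoints $a_{i-1}$ and $a_i$,
$p_i = \int_{a_{i-1}}^{a_i} f(x)\,dx = F(a_i) - F(a_{i-1})$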
Chi-square Test

– Recommended number of class intervals (k):

– Caution: Different grouping of data (i.e., k) can affect
the hypothesis testing result.
– If using equal probabilities, then pi = 1/k

Recommended: Ei = n pi ≥ 5, i.e. k ≤ n/5
Chi-square Test
• Vehicle Arrival Example :
H0: the random variable is Poisson distributed.
H1: the random variable is not Poisson distributed.

Combined because
of min Ei

– Degrees of freedom: k-s-1 = 7-1-1 = 5. The test statistic exceeds the
critical value $\chi^2_{0.05,5} = 11.07$, hence the
hypothesis is rejected at the 0.05 level of significance.
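
The mechanics of the test (estimate the parameter, compute expected frequencies, combine cells with small Ei, compare with the critical value) can be sketched as follows. The observed counts are illustrative, not the slide's table; the left-to-right merging rule is one simple choice among several; and the critical values are the standard 0.05-level table entries.

```python
import math

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def chi_square_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def merge_small_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells, left to right, until each merged cell has E_i >= min_expected."""
    merged_o, merged_e = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0:  # leftover tail: fold it into the last merged cell
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e

# Illustrative counts of periods with x = 0, 1, ..., 5 arrivals (n = 50 periods).
counts = [6, 15, 13, 9, 4, 3]
n = sum(counts)
lam_hat = sum(x * f for x, f in enumerate(counts)) / n   # s = 1 estimated parameter

# Expected frequencies; the last cell takes the whole upper tail P(X >= 5).
probs = [poisson_pmf(x, lam_hat) for x in range(len(counts) - 1)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

obs_m, exp_m = merge_small_cells(counts, expected)
chi0 = chi_square_statistic(obs_m, exp_m)
df = len(exp_m) - 1 - 1                                   # k - s - 1

# Chi-square critical values at alpha = 0.05 (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}
reject_h0 = chi0 > CRITICAL_05[df]
```

Because the grouping affects k and therefore the degrees of freedom, a different merging rule can change the test result.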
Chi-square Test
● Component Life Example
H0: the random variable is exponentially distributed.
H1: the random variable is not exponentially distributed.

50 data points → number of intervals?

Ei = n pi ≥ 5, i.e. k ≤ n/5
Exercise
Test Uniform distribution
Kolmogorov-Smirnov

Sn(x)
Kolmogorov-Smirnov
• Test statistic:
$D = \max_x |F(x) - S_n(x)|$, where $S_n(x)$ is the empirical CDF of the n observations

• Reject H0 if $D > d_{n,\alpha}$

• A more powerful test, particularly useful when:
– Sample sizes are small,
– No parameters have been estimated from the data.
– discrete distribution
• When parameter estimates have been made:
– Critical values in Table A.8 are biased, too large.
– More conservative, i.e., smaller Type I error than specified.
Example: Test whether these 5 numbers come from the uniform distribution unif(0,1):
0.44, 0.81, 0.14, 0.05, 0.93.

Step 1: Arrange R(i) from smallest to largest.
Step 2: Compute D+ = max {i/N – R(i)} and D- = max {R(i) – (i-1)/N}
(a dash marks a negative difference).

i                   1      2      3      4      5
R(i)                0.05   0.14   0.44   0.81   0.93
i/N                 0.20   0.40   0.60   0.80   1.00
i/N – R(i)          0.15   0.26   0.16   -      0.07
R(i) – (i-1)/N      0.05   -      0.04   0.21   0.13

Step 3: D = max(D+, D-) = max(0.26, 0.21) = 0.26

Step 4: For α = 0.05,
Dα = 0.565 > D

Hence, H0 is not rejected.
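
The four steps of this example can be reproduced with a short Python sketch. The function name is illustrative; the critical value 0.565 is the one quoted on the slide for N = 5 and α = 0.05.

```python
def ks_uniform_statistic(sample):
    """Kolmogorov-Smirnov statistic for H0: the sample comes from unif(0, 1)."""
    r = sorted(sample)                                                 # Step 1
    n = len(r)
    d_plus = max(i / n - r[i - 1] for i in range(1, n + 1))            # Step 2
    d_minus = max(r[i - 1] - (i - 1) / n for i in range(1, n + 1))
    return d_plus, d_minus, max(d_plus, d_minus)                       # Step 3

d_plus, d_minus, d = ks_uniform_statistic([0.44, 0.81, 0.14, 0.05, 0.93])

D_CRITICAL = 0.565            # from the slide: N = 5, alpha = 0.05
reject_h0 = d > D_CRITICAL    # Step 4
```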

Exercise

Test Exponential,
Uniform
Important points

• No “true” distribution for any stochastic input process
– If very little data are available, a goodness-of-fit test is unlikely to reject
any candidate distribution
– If a lot of data are available, a goodness-of-fit test is likely to reject all
candidate distributions

• Do sensitivity analysis
• Goal: obtain a good approximation
Fitting a Non-stationary Poisson Process

• External events (often arrivals) whose rate varies over time:
- Lunchtime at fast-food restaurants
- Rush-hour traffic in cities
- Telephone call centers
- Seasonal demands for a manufactured product

• Possible approaches:
– Fit a very flexible model with lots of parameters
or
– Approximate constant arrival rate over some basic
interval of time, but vary the rate from time interval to
time interval. (piecewise constant)
Fitting a Non-stationary Poisson Process

• The estimated arrival rate during the ith time period is:

$\hat{\lambda}(t) = \frac{1}{n\,\Delta t} \sum_{j=1}^{n} C_{ij}$

where n = # of observation periods, Δt = time interval length,
Cij = # of arrivals during the ith time interval on the jth observation period

- More on:
- Kelton’s chapter 4,5
- Rossetti’s chapter 10
Fitting a Non-stationary Poisson Process

Example: Consider a 10-hour business day [8am, 6pm]. Allow
the rate to change every half hour; then T = 10, k = 20 and Δt =
½, and observe over n = 3 days.

→ 1/(3(0.5)) × (23 + 26 + 32) = 54 arrivals/hour
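
The piecewise-constant rate estimate can be computed for every interval at once. In this sketch the first row holds the counts used in the example above; the second row is hypothetical and only shows how further intervals are handled.

```python
def estimate_rates(counts, dt):
    """counts[i][j] = arrivals in time interval i on observation day j.

    Returns the estimated rate (arrivals per hour when dt is in hours) per interval:
        rate_i = sum_j C_ij / (n * dt)
    """
    return [sum(row) / (len(row) * dt) for row in counts]

counts = [
    [23, 26, 32],   # interval from the example above, n = 3 days
    [20, 13, 12],   # hypothetical counts for another half-hour interval
]
rates = estimate_rates(counts, dt=0.5)
```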

Stationary
Need more data for Chi_square
test on Poisson distribution

nonstationary
Arrival Schedule in Arena

The “Time unit” is for the “duration”

The “rate” must be per HOUR regardless of the “Time unit”
No Data
• Some possible sources to obtain information about
the process are:
– Engineering data: performance ratings provided by the
manufacturer or company rules specify time or
production standards.
– Expert opinion: provide optimistic, pessimistic and most-likely
times, variability
– Physical or conventional limitations: physical limits on
performance, limits or bounds that narrow the range of
the input process.
– The nature of the process.

Possible No Data Distribution

Distribution | Parameters     | Characteristics                 | Example use
Exponential  | Mean           | High variance; bounded on left; | Interarrival time; time to machine
             |                | unbounded on right              | failure (constant failure rate)
Triangular   | Min, Mode, Max | Symmetric or non-symmetric;     | Activity times
             |                | bounded on both sides           |
Uniform      | Min, Max       | All values equally likely;      | Little known about the process
             |                | bounded on both sides           |
No Data
• Example 9.20: Production planning simulation.
– Input of sales volume of various products is required,
salesperson of product XYZ-123 says that:
• No fewer than 1,000 units and no more than 5,000 units will be sold.
• Given her experience, she believes there is a 90% chance of selling
more than 2,000 units, a 25% chance of selling more than 3,500
units, and only a 1% chance of selling more than 4,500 units.

– Translating this information into cumulative probabilities of
sales being less than or equal to those goals for simulation input:
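
The translation is simple arithmetic: a 90% chance of selling more than 2,000 units means P(X ≤ 2000) = 0.10, and so on. The sketch below builds that cumulative table and samples from it by inverse transform; linear interpolation between the breakpoints is an assumption of this sketch, not something stated on the slide.

```python
import bisect
import random

# (sales level, cumulative probability P(X <= level)) derived from the expert's statements.
points = [
    (1000, 0.00),   # no fewer than 1,000 units
    (2000, 0.10),   # 1 - 0.90
    (3500, 0.75),   # 1 - 0.25
    (4500, 0.99),   # 1 - 0.01
    (5000, 1.00),   # no more than 5,000 units
]

def sales_variate(u):
    """Inverse transform with linear interpolation between breakpoints; u in [0, 1]."""
    cum = [c for _, c in points]
    i = bisect.bisect_left(cum, u)
    if i == 0:
        return float(points[0][0])
    (x0, c0), (x1, c1) = points[i - 1], points[i]
    return x0 + (u - c0) / (c1 - c0) * (x1 - x0)

rng = random.Random(3)
demand = [sales_variate(rng.random()) for _ in range(5)]
```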

Multi Modal
Example:
1000 observations of the time, in minutes, required to pass through the
metal detector station were collected. The average value of these data
was about 1/2 minute, with standard deviation almost the same. This
suggests that a good input model for the time to pass through the metal
detector is the exponential distribution

$F(x) = \frac{N_1}{N} F_1(x) + \frac{N_2}{N} F_2(x)$

<Kelton 4.6.4>
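
Sampling from such a mixture means first choosing a component with probability N1/N or N2/N and then drawing from that component. The sketch below uses two hypothetical uniform components purely for illustration; in practice F1 and F2 would be fitted to the two groups of observations.

```python
import random

def mixture_variate(rng, n1, n2, sample1, sample2):
    """Draw from F(x) = (N1/N) F1(x) + (N2/N) F2(x), with N = N1 + N2."""
    if rng.random() < n1 / (n1 + n2):
        return sample1(rng)
    return sample2(rng)

# Hypothetical components (minutes): a fast group and a slow group.
def fast(rng):
    return rng.uniform(0.1, 0.3)

def slow(rng):
    return rng.uniform(0.8, 1.2)

rng = random.Random(7)
draws = [mixture_variate(rng, 700, 300, fast, slow) for _ in range(1000)]
```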
Multivariate and Time-series input model

● So far we have assumed that all generated
random observations are independent

● When this isn’t true, ignoring the relations
can invalidate the model
Multivariate and Time-Series Input Models

• Examples:
Multivariate:
– Lead time and annual demand for an inventory model,
increase in demand results in lead time increase,
hence variables are dependent.

Time-series:
– Monthly demand, Time between arrivals of orders to
buy and sell stocks, buy and sell orders tend to arrive
in bursts, hence, times between arrivals are
dependent.

Multivariate and Time-Series Input Models

● Checking correlation of data: scatter diagram, time series plot,
ACF plot
● Fit Multivariate model
- Multivariate Normal
- Multivariate Lognormal
- Arbitrarily specified marginal distributions & correlation ..
● Fit Time series model
- AR, ARMA
- EAR
- NORTA ..
More on:
Jerry Banks’s
Law & Kelton 2000’s (Simulation modeling and analysis)
Rossetti’s (Simulation modeling and Arena)
Check correlation
Multivariate model
Suppose that we have n independent and identically distributed pairs
$(X_{11}, X_{21}), (X_{12}, X_{22}), \ldots, (X_{1n}, X_{2n})$.
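
From such pairs the dependence between the two inputs is usually summarised by the sample correlation. The sketch below assumes the standard (n − 1)-denominator estimators; the lead-time and demand values are hypothetical.

```python
import math

def sample_correlation(x1, x2):
    """Sample correlation of paired observations (X_1j, X_2j), j = 1..n."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / (n - 1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / (n - 1))
    return cov / (s1 * s2)

# Hypothetical paired data: lead time (weeks) and demand (units).
lead_time = [6.5, 4.3, 6.9, 6.0, 6.9, 6.9, 5.8, 7.3, 4.5, 6.3]
demand = [103, 83, 116, 97, 112, 104, 106, 109, 92, 96]
rho_hat = sample_correlation(lead_time, demand)
```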
Time series

• A time series is a sequence of random variables
X1, X2, X3, … that are identically distributed (same
mean and variance) but dependent.
– cov(Xt, Xt+h) is the lag-h autocovariance
– corr(Xt, Xt+h) is the lag-h autocorrelation
– If the autocovariance value depends only on h and not
on t, the time series is covariance stationary
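
The lag-h autocorrelation can be estimated from an observed series as sketched below. This uses one common estimator (both sums divided by n); other texts divide the autocovariance by n − h instead.

```python
def lag_autocorrelation(series, h):
    """Estimate corr(X_t, X_{t+h}) for a covariance-stationary series."""
    n = len(series)
    m = sum(series) / n
    var = sum((x - m) ** 2 for x in series) / n
    cov = sum((series[t] - m) * (series[t + h] - m) for t in range(n - h)) / n
    return cov / var

# A strictly alternating series is strongly negatively correlated at lag 1.
alternating = [1.0, -1.0] * 4
r0 = lag_autocorrelation(alternating, 0)
r1 = lag_autocorrelation(alternating, 1)
```

Plotting this estimate against h gives the ACF plot mentioned earlier for checking dependence.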
• If X1, X2, X3,… is a sequence of identically distributed, but
dependent and covariance-stationary random variables,
then we can represent the process as follows:
– Autoregressive order-1 model, AR(1)
– Exponential autoregressive order-1 model, EAR(1)
• Both have the characteristic that:
• Lag-h autocorrelation decreases geometrically as the lag increases,
hence, observations far apart in time are nearly independent
AR(1) Autoregressive Order-1 Model
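
The formulas on the AR(1) slides were lost in extraction. The sketch below assumes the standard stationary form X_t = μ + φ(X_{t−1} − μ) + ε_t, with independent normal noise ε_t of mean 0 and standard deviation σ_ε, and |φ| < 1; the parameter names are illustrative.

```python
import math
import random

def ar1_series(rng, n, mu, phi, sigma_eps):
    """Generate n values of a stationary AR(1) process (requires |phi| < 1)."""
    # Start from the stationary distribution: variance = sigma_eps**2 / (1 - phi**2).
    x = rng.gauss(mu, sigma_eps / math.sqrt(1.0 - phi ** 2))
    out = [x]
    for _ in range(n - 1):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma_eps)
        out.append(x)
    return out

series = ar1_series(random.Random(11), 200, mu=20.0, phi=0.7, sigma_eps=1.0)
```

Under this model the lag-h autocorrelation is φ^h, which is the geometric decay described above.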
EAR(1) Exponential Autoregressive Order-1
Model
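
The EAR(1) formulas were also lost in extraction. The sketch below assumes the standard form: X_t = φX_{t−1} with probability φ, and X_t = φX_{t−1} + ε_t with probability 1 − φ, where ε_t is exponential with rate λ and 0 ≤ φ < 1. The function and parameter names are illustrative.

```python
import random

def ear1_series(rng, n, lam, phi):
    """Generate n values of an EAR(1) process with exponential(lam) marginals."""
    x = rng.expovariate(lam)          # start from the exponential marginal
    out = [x]
    for _ in range(n - 1):
        x = phi * x
        if rng.random() >= phi:       # with probability 1 - phi add a new exponential term
            x += rng.expovariate(lam)
        out.append(x)
    return out

interarrivals = ear1_series(random.Random(5), 200, lam=1.0, phi=0.6)
```

Unlike AR(1), every generated value is nonnegative, which is why this model suits dependent interarrival or service times.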
NORTA - The Normal-to-Anything
Transformation

More in Jerry Banks, chapter 9


SUMMARY
● What is input modelling?

● How to fit a distribution to data?

● How to deal with special cases


- Cannot fit theoretical distribution
- Multimodal
- Nonstationary process
- No data
- Multivariate & Time-series
