SM_Lect_07 (1)

The document outlines the process of analyzing simulation data, focusing on input modeling, data collection, and the identification of statistical distributions. It details methods for data collection, types of data, and techniques for testing the goodness of fit of distributions. The four key steps in developing input data models are collecting raw data, identifying statistical distributions, estimating parameters, and testing for goodness of fit.

Uploaded by

ifexplora

Simulation And Modeling

CS-805: SIMULATION and MODELING

Analysis of Simulation Data


Lecture 13, Unit-4
Contents

Analysis of Simulation Data


• Input Modelling:
• Data collection
• Identifying the distribution with data
• Parameter estimation, goodness-of-fit tests
• Selection of input models without data, multivariate and time series analysis
• Verification and Validation of Model: model building, verification, calibration and validation of models
Introduction

Input Modelling: Why?


Input Data

• The ultimate use of input data is to drive the simulation.
• The process involves:
• Collecting the input data
• Analyzing the input data
• Using the analysis of the input data in the simulation model
Collection of input data

• The data may not exist
• E.g. a project that involves analysis of new capital equipment

• Collection of historical data
• E.g. sales data

• The data may be collected in real time
• E.g. changes in traffic patterns
Sources for input data
• Historical records
• Old data may not be of much use
• Reliability factor
• Complete information is not available

• Manufacturer specifications
• Whether or not these claims can actually be achieved in a real environment has to be proven

• Vendor claims
• The vendor or distributor should already have some experience with the type of system that is being considered.

• Operator claims
• If the operator is knowledgeable about the system, it may be possible to obtain some performance estimates that can be used as input data

• Management estimates
• Their input may be helpful when an experienced operator is not available

• Automatic data capture
• This is analogous to the traffic volume monitors that are frequently encountered on the road.

• Direct observations
• The most physically and mentally demanding form of data collection
• This approach can be particularly grueling and costly when a large amount of data on infrequently occurring events must be captured.
Data Collection Mechanisms

• Data collection devices
• With equipment
• With video
• With the help of programs

• Time collection mode and units
• Event-advance system vs. fixed time interval
• Time metric: nanoseconds/milliseconds, seconds, minutes, hours, weeks…

• Other data collection considerations
• Unbiased data
• Data collection without disruption
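
The two time collection modes above can be illustrated with a minimal sketch. It assumes event timestamps were logged in seconds; the function names and the sample timestamps are hypothetical and not taken from the lecture.

```python
def interarrival_times(event_times):
    """Event-advance view: gaps between consecutive event timestamps."""
    ordered = sorted(event_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def counts_per_interval(event_times, interval, horizon):
    """Fixed-interval view: number of events in each window of length `interval`."""
    n_windows = int(horizon // interval)
    counts = [0] * n_windows
    for t in event_times:
        index = int(t // interval)
        if 0 <= index < n_windows:
            counts[index] += 1
    return counts


# Hypothetical arrival timestamps in seconds (illustrative values only).
arrivals = [2.0, 5.5, 6.0, 14.0, 21.5, 29.0]
gaps = interarrival_times(arrivals)
windows = counts_per_interval(arrivals, interval=10.0, horizon=30.0)
print("interarrival times:", gaps)
print("events per 10 s window:", windows)
```

The same log yields continuous data (the gaps) in the event-advance view and discrete data (the counts) in the fixed-interval view, so the collection mode also affects which distributions are candidates later.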
Types of Data

• Identify the data type
• Deterministic vs. probabilistic data
• Deterministic: conveyor velocities, preventive maintenance schedules
• Probabilistic: interarrival times, customer service processes, repair times
• Discrete vs. continuous data
• Discrete: number of people who arrive in the system as a group or batch, number of jobs processed before a machine experiences a breakdown
• Continuous: time between arrivals, service times, route times
Common Data Distributions

Already covered
• Bernoulli
• Uniform
• Exponential
• Normal
• Triangular
• Weibull
• Erlang
Selecting the Family of Distributions
Use the physical basis of the distribution as a guide, e.g.:
• Binomial: Number of successes in n trials
• Negative binomial and geometric: Number of trials to achieve k successes
• Poisson: Number of independent events that occur in a fixed amount of time or space
• Normal: Distribution of a process that is the sum of a number of component processes
• Lognormal: Distribution of a process that is the product of a number of component processes
• Exponential: Time between independent events, or a process time that is memoryless
• Weibull: Time to failure for components
• Discrete or continuous uniform: Models complete uncertainty
• Triangular: A process for which only the minimum, most likely, and maximum values are known
• Empirical: Re-samples from the actual data collected
Analysis of input data
• The process of determining the underlying theoretical distribution for a set of data usually involves what is known as a goodness-of-fit test.
• These tests are based on a comparison between the observed data distribution and a corresponding theoretical distribution.
• If the difference between the observed data distribution and the corresponding theoretical distribution is small, then it may be stated with some level of confidence that the input data could have come from a population with the same distribution and parameters as the theoretical distribution.
• Methods:
• Graphic approach
• Chi-square test
• Kolmogorov–Smirnov test
• Square error
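
As one illustration of the methods listed above, here is a minimal sketch of the chi-square test for an exponential fit. The sample is synthetic, the helper names are hypothetical, and the choice of four equal-probability cells is an assumption made to keep the example short.

```python
import math
import random


def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def chi_square_statistic(data, edges, cdf):
    """Sum over cells of (observed - expected)^2 / expected.

    Cell i covers the half-open range [edges[i], edges[i + 1]).
    """
    n = len(data)
    statistic = 0.0
    for low, high in zip(edges, edges[1:]):
        observed = sum(1 for x in data if low <= x < high)
        expected = n * (cdf(high) - cdf(low))
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical sample: 40 synthetic service times (not data from the lecture).
rng = random.Random(0)
sample = [rng.expovariate(0.5) for _ in range(40)]

# Estimate the rate from the sample (1 / sample mean).
rate_hat = len(sample) / sum(sample)

# Four equal-probability cells under the fitted distribution (quartile boundaries).
edges = [0.0] + [-math.log(1.0 - p) / rate_hat for p in (0.25, 0.5, 0.75)] + [math.inf]

statistic = chi_square_statistic(sample, edges, lambda x: exponential_cdf(x, rate_hat))

# Degrees of freedom = cells - estimated parameters - 1 = 4 - 1 - 1 = 2.
# The tabulated chi-square critical value for 2 degrees of freedom at the 0.05 level is 5.991.
CRITICAL_VALUE = 5.991
print(f"chi-square = {statistic:.3f}; reject exponential fit: {statistic > CRITICAL_VALUE}")
```

A statistic below the critical value means the data give no evidence against the fitted distribution. It does not prove that the distribution is the true one.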
Analysis of input data: Graphic Approach
• This approach consists of a visual, qualitative comparison between the actual data distribution and a theoretical distribution from which the observed data may have come.
• Steps
• Create a histogram of the observed data
• Create a histogram for the theoretical distribution
• Visually compare the two histograms for similarity
• Make a qualitative decision as to the similarity of the two data sets

• The practitioner must first decide
• how wide a data range each bar in the histogram covers and how many bars to graph.
• The number of observations in each data cell determines the height of the histogram bars.

• There are two common approaches for determining the cells:
• Equal-interval approach
• Equal-probability approach
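
The two cell approaches above can be sketched as follows. The equal-probability version is shown for an exponential distribution only, which is an assumption made for illustration; the function names are hypothetical.

```python
import math


def equal_interval_edges(data, n_cells):
    """Cell boundaries of equal width spanning the observed data range."""
    low, high = min(data), max(data)
    width = (high - low) / n_cells
    return [low + i * width for i in range(n_cells + 1)]


def equal_probability_edges_exponential(rate, n_cells):
    """Cell boundaries that each hold probability 1 / n_cells under Exp(rate).

    Uses the inverse CDF x = -ln(1 - p) / rate; the last cell is open-ended.
    """
    edges = [0.0]
    edges += [-math.log(1.0 - i / n_cells) / rate for i in range(1, n_cells)]
    edges.append(math.inf)
    return edges


print(equal_interval_edges([0.0, 10.0], 5))
print(equal_probability_edges_exponential(rate=1.0, n_cells=4))
```

Equal-interval cells are easier to read on a plot. Equal-probability cells give every cell the same expected count, which is convenient when the cells are later reused for a chi-square test.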
Data Collection
• Suggestions that may enhance and facilitate data collection:
• Analyze the data as it is being collected: check adequacy
• Combine homogeneous data sets: successive time periods, or the same time period on successive days
• Be aware of data censoring: the quantity is not observed in its entirety, with the danger of leaving out long process times
• Check for relationships between variables (scatter diagram)
• Check for autocorrelation
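
The autocorrelation check in the last suggestion can be sketched with a lag-k sample autocorrelation. The function name and the sample values are hypothetical. The 2/sqrt(n) threshold is only a rough rule of thumb for independent data, not a formal test.

```python
import math


def autocorrelation(data, lag=1):
    """Sample autocorrelation of `data` at the given lag."""
    n = len(data)
    mean = sum(data) / n
    variance_sum = sum((x - mean) ** 2 for x in data)
    covariance_sum = sum(
        (data[i] - mean) * (data[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical service times in collection order (illustrative values only).
service_times = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 5.1, 3.6, 4.5]
r1 = autocorrelation(service_times, lag=1)
threshold = 2.0 / math.sqrt(len(service_times))
print(f"lag-1 autocorrelation = {r1:.3f}, rough threshold = {threshold:.3f}")
```

Values well outside the threshold suggest that successive observations are not independent, in which case fitting a single distribution to them can be misleading.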
Identifying the Distribution
Histograms
• A frequency distribution or histogram is useful in determining the shape of a distribution
• The number of class intervals depends on:
• The number of observations
• The dispersion of the data
• Suggested number of intervals: the square root of the sample size
• For continuous data:
• Corresponds to the probability density function (pdf) of a theoretical distribution
• For discrete data:
• Corresponds to the probability mass function (pmf)
• If few data points are available:
• Combine adjacent cells to eliminate the ragged appearance of the histogram
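
A minimal sketch of the square-root rule above, using equal-width cells. The function name is hypothetical, and rounding the square root to the nearest integer is an assumption, since the slide does not say how to round.

```python
import math


def histogram(data, n_cells=None):
    """Count observations in equal-width cells.

    If `n_cells` is omitted, use the square root of the sample size.
    Returns (lowest value, cell width, list of counts).
    """
    if n_cells is None:
        n_cells = max(1, round(math.sqrt(len(data))))
    low, high = min(data), max(data)
    width = (high - low) / n_cells
    if width == 0:
        # All observations are identical: a single cell holds everything.
        return low, 0.0, [len(data)]
    counts = [0] * n_cells
    for x in data:
        # The maximum value falls on the upper edge; place it in the last cell.
        index = min(int((x - low) / width), n_cells - 1)
        counts[index] += 1
    return low, width, counts


low, width, counts = histogram(list(range(16)))
for i, count in enumerate(counts):
    print(f"[{low + i * width:6.2f}, {low + (i + 1) * width:6.2f}) " + "*" * count)
```

Passing a smaller `n_cells` is one way to combine adjacent cells when the sample is small and the histogram looks ragged.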
Histograms

Same data with different interval sizes
Histograms
Example
• Vehicle arrival example: the number of vehicles arriving at an intersection between 7:00 am and 7:05 am was monitored for 100 random workdays.
• There are ample data, so the histogram may have a cell for each possible value in the data range
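
For discrete data like the vehicle counts above, a histogram with one cell per possible value can be sketched as below. The observations are hypothetical placeholders, since the slide does not list the recorded counts, and the function name is an assumption.

```python
from collections import Counter


def discrete_histogram(observations):
    """Frequency of every integer value from the smallest to the largest observed."""
    frequency = Counter(observations)
    return {
        value: frequency.get(value, 0)
        for value in range(min(observations), max(observations) + 1)
    }


# Hypothetical arrivals per 5-minute period on a few workdays (illustrative only).
arrivals_per_period = [3, 0, 2, 5, 3, 1, 2, 3, 4, 2]
for value, count in discrete_histogram(arrivals_per_period).items():
    print(f"{value:2d} | " + "*" * count)
```

Values that never occurred still get a cell with a zero count, which keeps gaps in the data visible.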
Histograms: Example

• Sample size 10,000, with different numbers of bins
Identifying the Distribution
Scatter diagrams
A scatter diagram is a quality tool that can show the relationship between paired data
• Random variable X = Data 1
• Random variable Y = Data 2
• Plot random variable X on the x-axis and Y on the y-axis
Scatter diagrams
• Linear relationship
• Correlation: measures how well the data line up
• Slope: measures the steepness of the data
• Direction
• Y-intercept
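
The quantities read off a scatter diagram can also be computed directly. This is a minimal sketch using the Pearson correlation and an ordinary least-squares line, which is an assumed choice of method; the function name and paired values are hypothetical.

```python
import math


def linear_summary(xs, ys):
    """Return (correlation, slope, intercept) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # steepness of the fitted line
    intercept = mean_y - slope * mean_x     # value of Y where X = 0
    correlation = sxy / math.sqrt(sxx * syy)  # how well the points line up
    return correlation, slope, intercept


# Hypothetical paired data, e.g. order size (X) and processing time (Y).
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
r, slope, intercept = linear_summary(x_values, y_values)
print(f"correlation = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The sign of the slope (and of the correlation) gives the direction of the relationship.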
Identifying the Distribution
Selecting the Family of Distributions
A family of distributions is selected based on:
• The context of the input variable
• Shape of the histogram
• Frequently encountered distributions:
Selecting the Family of Distributions
Use the physical basis of the distribution as a guide, e.g.:
• Binomial: Number of successes in n trials
• Negative binomial and geometric: Number of trials to achieve k successes
• Poisson: Number of independent events that occur in a fixed amount of time or space
• Normal: Distribution of a process that is the sum of a number of component processes
• Lognormal: Distribution of a process that is the product of a number of component processes
• Exponential: Time between independent events, or a process time that is memoryless
• Weibull: Time to failure for components
• Discrete or continuous uniform: Models complete uncertainty
• Triangular: A process for which only the minimum, most likely, and maximum values are known
• Empirical: Re-samples from the actual data collected
Selecting the Family of Distributions
Remember the physical characteristics of the process
• Is the process naturally discrete or continuous valued?
• Is it bounded?
• Value range?
• Only positive values
• Only negative values
• Interval of [-a, b]
• There is no “true” distribution for any stochastic input process
• Goal: obtain a good approximation
Summary
• In this unit, we described the 4 steps in developing input data models:
(1) Collecting the raw data
(2) Identifying the underlying statistical distribution
(3) Estimating the parameters
(4) Testing for goodness of fit
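
The four steps can be sketched end to end as follows. The sketch assumes an exponential family, uses synthetic data in place of collected data, and applies a Kolmogorov-Smirnov statistic for step (4). The critical value 1.36/sqrt(n) is the usual large-sample approximation at the 0.05 level and is only approximate when the parameter is estimated from the same data. All names are hypothetical.

```python
import math
import random


def estimate_exponential_rate(data):
    """Step 3: maximum-likelihood estimate of the exponential rate (1 / sample mean)."""
    return len(data) / sum(data)


def ks_statistic(data, cdf):
    """Step 4: largest gap between the empirical CDF and the fitted CDF."""
    ordered = sorted(data)
    n = len(ordered)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(ordered))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(ordered))
    return max(d_plus, d_minus)


# Step 1: raw data. Synthetic interarrival times stand in for collected data.
rng = random.Random(42)
raw = [rng.expovariate(0.5) for _ in range(100)]

# Step 2: family. Assumed exponential, e.g. from the histogram shape and physical basis.

# Step 3: parameter estimate.
rate_hat = estimate_exponential_rate(raw)

# Step 4: goodness of fit.
d = ks_statistic(raw, lambda x: 1.0 - math.exp(-rate_hat * x))
critical = 1.36 / math.sqrt(len(raw))
print(f"rate = {rate_hat:.3f}, D = {d:.3f}, critical = {critical:.3f}, reject: {d > critical}")
```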
Reference

• Christopher A. Chung, Simulation Modeling Handbook: A Practical Approach, CRC Press, 2004, ISBN 0-8493-1241-8
Thank You

Wish you a fruitful Simulation
