Batch Means Method: S-38.3148 Simulation of Data Networks / Data Collection and Analysis 1

The batch means and regenerative methods are used to analyze data from simulation runs. The batch means method divides a simulation run into batches and takes the average of each batch. The averages of all batches are then averaged to estimate the expected value. The regenerative method identifies regenerative states in a system where the system's development does not depend on past states. Statistics are collected between regenerative states to estimate values while avoiding initial transient bias. Both methods provide confidence intervals for estimates but the regenerative method does not require fixing parameters like batch size in advance.

S-38.3148 Simulation of data networks / Data collection and analysis

Batch means method


• Batch means method is used frequently
• Simulation is done as a single (long) run
– let the length of simulation be M
∗ here we consider the system from a customer's point of view, so M may be read as
the number of observations of interest (M may equally well be taken to represent
time)
– let the observed variable be X (for instance, waiting time in a queue) and the task is to
estimate its expected value µ = E[X]
• From the beginning of the simulation, the warm-up period of K observations is rejected
• The useful run (of length M − K) is divided into N batches; thus each batch contains
$$
n = \frac{M - K}{N}
$$
observations

Batch means method (continued)


• In batch i we get for X the sample average (Xij denotes the jth observation in the ith batch)
$$
\bar{X}_i = \frac{1}{n} \sum_{j=1}^{n} X_{ij}
$$

• The final estimator for the expectation µ is
$$
\hat{\mu}_N = \frac{1}{N} \sum_{i=1}^{N} \bar{X}_i = \frac{1}{nN} \sum_{i=1}^{N} \sum_{j=1}^{n} X_{ij}
$$

• This is simply the sample average of the whole run (after the warm-up period)
– the division in batches has no bearing from the point of view of the estimator
– the sole purpose of the division is to get an idea of the confidence interval of the estimator
• Assuming that the batches are long enough, the sample averages X̄i of the batches are
approximately independent
• Their sample variance then provides an estimate for the variance of a single X̄i
$$
S^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (\bar{X}_i - \hat{\mu}_N)^2
$$

Batch means method (continued)


• The confidence interval of the estimator (at confidence level 1 − β) is
$$
\hat{\mu}_N \pm z_{1-\beta/2} \frac{S}{\sqrt{N}}
$$
• The advantage of the method is that there is only one warm-up period
• There should be at least 20-30 batches in order to estimate the variance reliably
• The batches should be long enough (much longer than the duration of the initial transient) to
guarantee that the X̄i are approximately independent
• If there is dependence, the correlation is usually positive
• Then the true confidence interval of µ̂N is wider than the one given above, which is based on
the assumption of independent batches
– the dependence does not degrade the estimator itself in any way
– it can only mislead the user into believing that the estimator is more accurate than it
actually is
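The batch means procedure described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `batch_means_ci` and its parameters are hypothetical, the leftover observations are simply dropped when M − K is not divisible by N, and the AR(1) test sequence in the demo is an assumed stand-in for correlated simulation output.

```python
import math
import random
from statistics import NormalDist, mean, variance


def batch_means_ci(observations, warmup, num_batches, beta=0.05):
    """Return (estimate, half_width) for E[X] using the batch means method."""
    useful = observations[warmup:]            # discard the K warm-up observations
    n = len(useful) // num_batches            # batch size n = (M - K) / N, rounded down
    if num_batches < 2 or n < 1:
        raise ValueError("need at least 2 batches with at least 1 observation each")
    useful = useful[: n * num_batches]        # drop the leftover tail so every batch has size n
    batch_avgs = [mean(useful[i * n:(i + 1) * n]) for i in range(num_batches)]
    estimate = mean(batch_avgs)               # equals the sample average of the useful run
    s2 = variance(batch_avgs)                 # sample variance of batch averages (divisor N - 1)
    z = NormalDist().inv_cdf(1 - beta / 2)    # standard normal quantile z_{1 - beta/2}
    return estimate, z * math.sqrt(s2 / num_batches)


if __name__ == "__main__":
    random.seed(1)
    # Assumed test input: a positively correlated AR(1) sequence with mean 0.
    x, data = 0.0, []
    for _ in range(100_000):
        x = 0.9 * x + random.gauss(0.0, 1.0)
        data.append(x)
    est, hw = batch_means_ci(data, warmup=1_000, num_batches=30)
    print(f"estimate = {est:.3f} +/- {hw:.3f}")
```

Rerunning the demo with many short batches (for example `num_batches=3000`) tends to give a narrower interval, which illustrates the point made above: correlated batch averages make the estimator look more accurate than it is.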

Regenerative method
• Is applicable in so-called regenerative systems
• A regenerative system has at least one regenerative state
– the stochastic development of the system from that point on does not at all depend on how
this state has been reached
– every state of a Markovian system is regenerative
– in a G/G/1 queue the state where the system is empty is a regenerative state
• If there are several regenerative states, one of them is chosen as the basis for the data collection
method
– in the sequel, the regenerative state refers to the chosen regenerative state
• Every now and then the system visits the regenerative state or “regenerates itself”
– this starts “a new life” which does not depend on the past

Regenerative method (continued)


• The instant, when the system returns to the regenerative state, is called the regeneration point
• The period between two regeneration points is called the regeneration period
• The developments of different regeneration periods are fully independent of each other
– this is the “point” of the method

[Figure: time axis with successive regeneration points t1, t2, t3, t4, t5]
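To make regeneration points concrete, the following sketch splits the sample path of a single-server FIFO queue into regeneration periods. It rests on several assumptions that are not in the slides: the queue is M/M/1, waiting times are generated with the Lindley recursion, and a regeneration point is taken to be the arrival of a customer who finds the system empty (waiting time exactly 0). The function names are hypothetical.

```python
import random


def simulate_waiting_times(num_customers, arrival_rate, service_rate, rng):
    """Waiting times in queue of successive customers in an M/M/1 FIFO queue."""
    waits = [0.0]                                     # the first customer finds the system empty
    for _ in range(num_customers - 1):
        service = rng.expovariate(service_rate)       # service time of the previous customer
        interarrival = rng.expovariate(arrival_rate)  # time until the next arrival
        # Lindley recursion: leftover work seen by the next customer, never negative
        waits.append(max(0.0, waits[-1] + service - interarrival))
    return waits


def split_into_cycles(waits):
    """Split the path at regeneration points, i.e. at customers with zero waiting time."""
    cycles = []
    for w in waits:
        if w == 0.0 or not cycles:
            cycles.append([])       # a customer finds the system empty: a new period starts
        cycles[-1].append(w)
    return cycles[:-1]              # the last period is generally incomplete, so it is dropped


if __name__ == "__main__":
    rng = random.Random(42)
    waits = simulate_waiting_times(10_000, arrival_rate=0.8, service_rate=1.0, rng=rng)
    cycles = split_into_cycles(waits)
    print(f"{len(cycles)} complete regeneration periods")
```

Each inner list holds the waiting times of the customers served during one regeneration period, so quantities collected per period can be computed directly from it.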

Regenerative method: point estimator


• Let X be the cumulative value of the observed variable during a regenerative period, for
instance,
– the total time the system has spent in a blocking state during a regenerative period
– the total number of packets overflown from a buffer during the period
• Let τ be the “duration” of the regenerative period
– this may refer to the real duration (time) of the period
– it may also refer to e.g. the total number of arrivals during the regenerative period
• The expectation of the observed variable ℓ (for instance, the expectation of time blocking) is
$$
\ell = \frac{E[X]}{E[\tau]}
$$

• In a simulation over n regenerative periods one obtains a (strongly consistent) estimator
$$
\bar{\ell}_n = \frac{\bar{X}}{\bar{\tau}}
$$
where X̄ and τ̄ are the sample averages
$$
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \bar{\tau} = \frac{1}{n} \sum_{i=1}^{n} \tau_i
$$

The confidence interval of the estimator


• Consider the variable Zi = Xi − ℓτi
– the Zi are independent and identically distributed random variables (with mean 0)
– so are the Xi and the τi
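• Denote
$$
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \bar{\tau} = \frac{1}{n} \sum_{i=1}^{n} \tau_i, \qquad \bar{Z} = \frac{1}{n} \sum_{i=1}^{n} Z_i = \bar{X} - \ell \bar{\tau}
$$
• By the central limit theorem we have
$$
\frac{n^{1/2} \bar{Z}}{\sigma} = \frac{n^{1/2} (\bar{X} - \ell \bar{\tau})}{\sigma} \to N(0, 1), \quad \text{when } n \to \infty
$$
where σ² is the variance of Z
$$
\sigma^2 = V[Z] = V[X] - 2 \ell \, \mathrm{Cov}[X, \tau] + \ell^2 V[\tau]
$$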

The confidence interval of the estimator (continued)


• Since X̄ − ℓτ̄ = τ̄(ℓ̄n − ℓ), dividing by τ̄ gives
$$
\frac{n^{1/2} (\bar{\ell}_n - \ell)}{\sigma / \bar{\tau}} \to N(0, 1), \quad \text{when } n \to \infty
$$

• For the point estimator ℓ̄n based on measurement over n regenerative periods we get the
confidence interval (at the confidence level 1 − β)
$$
\bar{\ell}_n \pm \frac{z_{1-\beta/2} \, S}{\sqrt{n} \, \bar{\tau}}
$$
where S² is the (unbiased) estimator of σ² based on the sample
$$
S^2 = S_{11} - 2 \bar{\ell}_n S_{12} + \bar{\ell}_n^2 S_{22}
$$
and S11, S22 and S12 are the sample variances and sample covariance of X and τ
$$
S_{11} = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad S_{22} = \frac{1}{n - 1} \sum_{i=1}^{n} (\tau_i - \bar{\tau})^2, \qquad S_{12} = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(\tau_i - \bar{\tau})
$$
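The point estimator and its confidence interval can be computed from per-period data as in the sketch below. The function name `regenerative_ci` and the input numbers in the demo are hypothetical; the code assumes that the cumulative values Xi and the durations τi of n complete regenerative periods have already been collected.

```python
import math
from statistics import NormalDist, mean


def regenerative_ci(x, tau, beta=0.05):
    """Return (estimate, half_width) for l = E[X]/E[tau] from per-period data."""
    n = len(x)
    if n < 2 or n != len(tau):
        raise ValueError("need at least 2 periods, with one duration per period")
    x_bar, tau_bar = mean(x), mean(tau)
    l_hat = x_bar / tau_bar                                   # point estimator X_bar / tau_bar
    s11 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)        # sample variance of X
    s22 = sum((ti - tau_bar) ** 2 for ti in tau) / (n - 1)    # sample variance of tau
    s12 = sum((xi - x_bar) * (ti - tau_bar)
              for xi, ti in zip(x, tau)) / (n - 1)            # sample covariance of X and tau
    s2 = max(0.0, s11 - 2 * l_hat * s12 + l_hat ** 2 * s22)   # guard against rounding below zero
    z = NormalDist().inv_cdf(1 - beta / 2)                    # standard normal quantile
    return l_hat, z * math.sqrt(s2) / (math.sqrt(n) * tau_bar)


if __name__ == "__main__":
    # Hypothetical data: total waiting time and number of customers in each period.
    x = [0.0, 3.2, 1.1, 0.0, 7.5, 0.4]
    tau = [1, 4, 2, 1, 6, 2]
    l_hat, hw = regenerative_ci(x, tau)
    print(f"estimate = {l_hat:.3f} +/- {hw:.3f}")
```

With Xi taken as the total waiting time in period i and τi as the number of customers served in it, the ratio estimates the mean waiting time per customer.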

Regenerative method: discussion


• Advantages
– separate transient removal is not needed
– one does not have to fix parameters such as the number of batches in advance
– asymptotically accurate
– easy to understand and implement
• There are, however, a few disadvantages
– it may be difficult to identify regenerative states
– even if one can be identified
∗ the regenerative period may be very long (the user has no control over it)
∗ in a complex system the identification of the regenerative state may be computationally
expensive
– with a finite value of n the estimator ℓ̄n is biased
∗ in fact, the initial transient problem does exist, though it is somewhat concealed