Supplemental 1
b. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = 1$

In general, if there are $p$ random variables $x_1, x_2, \ldots, x_p$ then the joint probability density function is $f(x_1, x_2, \ldots, x_p)$, with the properties:

a. $f(x_1, x_2, \ldots, x_p) \ge 0$ for all $x_1, x_2, \ldots, x_p$

b. $\int\int \cdots \int_R f(x_1, x_2, \ldots, x_p)\,dx_1\,dx_2 \cdots dx_p = 1$
2. Random Samples
To properly apply many statistical techniques, the sample drawn from the population of
interest must be a random sample. To properly define a random sample, let x be a
random variable that represents the results of selecting one observation from the
population of interest. Let f ( x ) be the probability distribution of x. Now suppose that n
observations (a sample) are obtained independently from the population under
unchanging conditions. That is, we do not let the outcome from one observation influence
the outcome from another observation. Let xi be the random variable that represents the
observation obtained on the ith trial. Then the observations x1 , x2 ,..., xn are a random
sample.
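As a small illustration, here is a minimal Python sketch that draws a random sample of n independent observations; the normal population, its parameters, and the seed are hypothetical choices made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Draw n independent observations from the same (hypothetical) population,
# here taken to be normal with mean 10 and standard deviation 2.
n = 25
sample = rng.normal(loc=10.0, scale=2.0, size=n)

print(sample.mean(), sample.std(ddof=1))   # sample average and standard deviation
```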
the range space of x_t is R = {0, 1, …}. Assume that the numbers of births during non-overlapping time intervals are independent random variables, and that there is a positive constant λ such that for any small time interval Δt the following statements are true:

1. The probability that exactly one birth will occur in an interval of length Δt is λΔt.
2. The probability that zero births will occur in the interval is 1 − λΔt.
3. The probability that more than one birth will occur in the interval is zero.

The parameter λ is often called the mean arrival rate or the mean birth rate. This type of process, in which the probability of observing exactly one event in a small interval of time is constant (or the probability of occurrence of an event is directly proportional to the length of the time interval) and the occurrence of events in non-overlapping time intervals is independent, is called a Poisson process.
In the following, let

$P\{x_t = x\} = p(x) = p_x(t), \quad x = 0, 1, 2, \ldots$

Suppose that there have been no births up to time t. The probability that there are no births at the end of time t + Δt is

$p_0(t + \Delta t) = (1 - \lambda\Delta t)\,p_0(t)$

Note that

$\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda p_0(t)$

so consequently

$\lim_{\Delta t \to 0}\left[\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t}\right] = p_0'(t) = -\lambda p_0(t)$

For x > 0 births at the end of time t + Δt we have

$p_x(t + \Delta t) = p_{x-1}(t)\,\lambda\Delta t + (1 - \lambda\Delta t)\,p_x(t)$

and

$\lim_{\Delta t \to 0}\left[\frac{p_x(t + \Delta t) - p_x(t)}{\Delta t}\right] = p_x'(t) = \lambda p_{x-1}(t) - \lambda p_x(t)$

Thus we have a system of differential equations that describes the arrivals or births:

$p_0'(t) = -\lambda p_0(t) \quad \text{for } x = 0$
$p_x'(t) = \lambda p_{x-1}(t) - \lambda p_x(t) \quad \text{for } x = 1, 2, \ldots$

The solution to this set of equations is

$p_x(t) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \quad x = 0, 1, 2, \ldots$

Obviously for a fixed value of t this is the Poisson distribution.
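The following minimal Python sketch, with an assumed rate λ and time horizon t chosen only for illustration, simulates the birth process through exponential interarrival times and compares the empirical counts with the Poisson probabilities p_x(t).

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(seed=2)

lam, t = 2.0, 3.0        # assumed birth rate and time horizon (illustrative values)
n_reps = 20_000

# Simulate the process by generating exponential interarrival times with mean
# 1/lam and counting how many arrivals fall in [0, t].
counts = np.empty(n_reps, dtype=int)
for r in range(n_reps):
    total, k = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)
        if total > t:
            break
        k += 1
    counts[r] = k

# Compare the empirical distribution with p_x(t) = (lam*t)^x * exp(-lam*t) / x!
for x in range(6):
    print(x, round(np.mean(counts == x), 4), round(poisson.pmf(x, lam * t), 4))
```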
or

$E(y) = E[h(x)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$
The name for this theorem comes from the fact that we often apply it without consciously
thinking about whether the theorem is true in our particular case.
$E[(x - c)^2] = E[x^2 - 2xc + c^2] = E(x^2) - 2cE(x) + c^2$
is the covariance of the random variables x1 and x2. The covariance is a measure of the linear association between x1 and x2. More specifically, we may show that if x1 and x2 are independent, then $\mathrm{Cov}(x_1, x_2) = 0$.

3. $V(x_1 \pm x_2) = \sigma_1^2 + \sigma_2^2 \pm 2\,\mathrm{Cov}(x_1, x_2)$
Moments
Although we do not make much use of the notion of the moments of a random variable
in the book, for completeness we give the definition. Let the function of the random
variable x be
$h(x) = x^k$

where k is a positive integer. Then the expectation of $h(x) = x^k$ is called the kth moment about the origin of the random variable x and is given by

$E(x^k) = \begin{cases} \sum_{\text{all } x_i} x_i^k\, p(x_i), & x \text{ is a discrete random variable} \\ \int_{-\infty}^{\infty} x^k f(x)\,dx, & x \text{ is a continuous random variable} \end{cases}$

Note that the first origin moment is just the mean μ of the random variable x. The second origin moment is

$E(x^2) = \mu^2 + \sigma^2$

Moments about the mean are defined as

$E[(x - \mu)^k] = \begin{cases} \sum_{\text{all } x_i} (x_i - \mu)^k\, p(x_i), & x \text{ is a discrete random variable} \\ \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx, & x \text{ is a continuous random variable} \end{cases}$

The second moment about the mean is the variance σ² of the random variable x.
$\frac{1}{2\pi}\int_0^{2\pi} d\theta = \frac{2\pi}{2\pi} = 1$

Several useful properties of the standard normal distribution can be found by basic calculus:

1. $\phi(-z) = \phi(z)$ for all real z, so $\phi(z)$ is an even function (symmetric about 0) of z
2. $\phi'(z) = -z\phi(z)$
3. $\phi''(z) = (z^2 - 1)\phi(z)$
Using these properties, the mean of the standard normal distribution is

$E(z) = \int_{-\infty}^{\infty} z\,\phi(z)\,dz = -\phi(z)\Big|_{-\infty}^{\infty} = 0$

and

$E(z^2) = \int_{-\infty}^{\infty} z^2\phi(z)\,dz = \int_{-\infty}^{\infty} [\phi''(z) + \phi(z)]\,dz = \phi'(z)\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \phi(z)\,dz = 0 + 1 = 1$

Because the variance of a random variable can be expressed in terms of expectation as $\sigma^2 = E(z - \mu)^2 = E(z^2) - \mu^2$, we have shown that the mean and variance of the standard normal distribution are 0 and 1, respectively.
Now consider the case where x follows the more general normal distribution. Using the substitution z = (x − μ)/σ, we have

$E(x) = \int_{-\infty}^{\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\,dx = \int_{-\infty}^{\infty} (\mu + z\sigma)\phi(z)\,dz = \mu\int_{-\infty}^{\infty}\phi(z)\,dz + \sigma\int_{-\infty}^{\infty} z\phi(z)\,dz = \mu(1) + \sigma(0) = \mu$

and

$E(x^2) = \int_{-\infty}^{\infty} x^2\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\,dx = \int_{-\infty}^{\infty} (\mu + z\sigma)^2\phi(z)\,dz = \mu^2\int_{-\infty}^{\infty}\phi(z)\,dz + 2\sigma\mu\int_{-\infty}^{\infty} z\phi(z)\,dz + \sigma^2\int_{-\infty}^{\infty} z^2\phi(z)\,dz = \mu^2 + \sigma^2$
$R(t) = P\{x > t\} = 1 - \int_0^t f(x)\,dx = 1 - F(t)$

where, of course, F(t) is the cumulative distribution function. In biomedical applications, the reliability function is usually called the survival function. For the exponential distribution, the reliability function is

$R(t) = e^{-\lambda t}$
The Hazard Function
The mean and variance of a distribution are quite important in reliability applications, but
an additional property called the hazard function or the instantaneous failure rate is also
useful. The hazard function is the conditional density function of failure at time t, given
that the unit has survived until time t. Therefore, letting X denote the random variable
and x denote the realization,
$h(x) = f(x \mid X \ge x) = F'(x \mid X \ge x)$
$= \lim_{\Delta x \to 0} \frac{F(x + \Delta x \mid X \ge x) - F(x \mid X \ge x)}{\Delta x}$
$= \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x \mid X \ge x)}{\Delta x}$
$= \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x,\ X \ge x)}{\Delta x\, P\{X \ge x\}}$
$= \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x)}{\Delta x\,[1 - F(x)]}$
$= \frac{f(x)}{1 - F(x)}$
It turns out that specifying a hazard function completely determines the cumulative distribution function (and vice versa).
The Hazard Function for the Exponential Distribution
For the exponential distribution, the hazard function is
$h(x) = \frac{f(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda$
That is, the hazard function for the exponential distribution is constant, or the failure rate
is just the reciprocal of the mean time to failure.
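A short sketch, using an assumed value of λ purely for illustration, evaluates the reliability and hazard functions of the exponential distribution and confirms that the hazard is constant.

```python
import numpy as np

lam = 0.5                      # assumed failure rate (per unit time)
t = np.linspace(0.0, 10.0, 6)

f = lam * np.exp(-lam * t)     # density
F = 1.0 - np.exp(-lam * t)     # cumulative distribution function
R = 1.0 - F                    # reliability (survival) function
h = f / (1.0 - F)              # hazard function

print(R)
print(h)                       # every entry equals lam = 0.5
```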
A constant failure rate implies that the reliability of the unit at time t does not depend on
its age. This may be a reasonable assumption for some types of units, such as electrical
components, but it’s probably unreasonable for mechanical components. It is probably
not a good assumption for many types of system-level products that are made up of many
components (such as an automobile). Generally, an increasing hazard function indicates
that the unit is more likely to fail in the next increment of time than it would have been in
an earlier increment of time of the same length. This is likely due to aging or wear.
Despite the apparent simplicity of its hazard function, the exponential distribution has
been an important distribution in reliability engineering. This is partly because the
constant failure rate assumption is probably not unreasonable over some region of the
unit’s life.
One of the best methods for obtaining a point estimator of a population parameter is the
method of maximum likelihood. Suppose that x is a random variable with probability distribution f(x; θ), where θ is a single unknown parameter. Let x₁, x₂, …, xₙ be the observations in a random sample of size n. Then the likelihood function of the sample is

$L(\theta) = f(x_1;\theta)\, f(x_2;\theta)\cdots f(x_n;\theta)$

The maximum likelihood estimator of θ is the value of θ that maximizes the likelihood function L(θ).
Example 1 The Exponential Distribution
To illustrate the maximum likelihood estimation procedure, let x be exponentially distributed with parameter λ. The likelihood function of a random sample of size n, say x₁, x₂, …, xₙ, is

$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum_{i=1}^{n} x_i}$

Now it turns out that, in general, if the maximum likelihood estimator maximizes L(θ), it will also maximize the log likelihood, ln L(θ). For the exponential distribution, the log likelihood is

$\ln L(\lambda) = n\ln\lambda - \lambda\sum_{i=1}^{n} x_i$

Now

$\frac{d\ln L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i$

Equating the derivative to zero and solving for the estimator of λ, we obtain

$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$
Thus the maximum likelihood estimator (or the MLE) of O is the reciprocal of the sample
average.
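As a quick check, the sketch below computes the closed-form MLE 1/x̄ and compares it with a direct numerical maximization of the log likelihood; the simulated data and the assumed true value of λ are for illustration only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=3)
x = rng.exponential(scale=1.0 / 0.8, size=200)   # simulated data, true lambda = 0.8

# Closed-form MLE: the reciprocal of the sample average.
lam_hat = 1.0 / x.mean()

# Numerical check: maximize ln L(lambda) = n*ln(lambda) - lambda*sum(x).
neg_loglik = lambda lam: -(len(x) * np.log(lam) - lam * x.sum())
res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded")

print(lam_hat, res.x)   # the two estimates agree closely
```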
Maximum likelihood estimation can be used in situations where there are several unknown parameters, say θ₁, θ₂, …, θ_p, to be estimated. The maximum likelihood estimators are found by equating the p first partial derivatives $\partial L(\theta_1, \theta_2, \ldots, \theta_p)/\partial\theta_i$, i = 1, 2, …, p, of the likelihood (or the log likelihood) to zero and solving the resulting system of equations.
Example 2 The Normal Distribution
Let x be normally distributed with the parameters μ and σ² unknown. The likelihood function of a random sample of size n is

$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}$

The log-likelihood function is

$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$

Now

$\frac{\partial \ln L(\mu, \sigma^2)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$

$\frac{\partial \ln L(\mu, \sigma^2)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$

The solution to these equations yields the MLEs

$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$

$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
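A minimal sketch, using simulated data with assumed parameter values, computes these two estimators directly.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.normal(loc=5.0, scale=2.0, size=500)    # simulated sample

mu_hat = x.mean()                               # MLE of mu is the sample average
sigma2_hat = np.mean((x - mu_hat) ** 2)         # MLE of sigma^2 uses divisor n, not n - 1

print(mu_hat, sigma2_hat)
```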
Generally, we like the method of maximum likelihood because when n is large, (1) it
results in estimators that are approximately unbiased, (2) the variance of a MLE is as
small as or nearly as small as the variance that could be obtained with any other
estimation technique, and (3) MLEs are approximately normally distributed.
Furthermore, the MLE has an invariance property; that is, if θ̂ is the MLE of θ, then the MLE of a function of θ, say h(θ), is the same function h(θ̂) of the MLE. There are also
some other “nice” statistical properties that MLEs enjoy; see a book on mathematical
statistics, such as Hogg and Craig (1978) or Bain and Engelhardt (1987).
$M_k' = \frac{\sum_{i=1}^{n} x_i^k}{n}, \quad k = 1, 2, \ldots, p$

and the first p moments around the origin of the random variable x are just

$\mu_k' = E(x^k), \quad k = 1, 2, \ldots, p$
Example 3 The Normal Distribution
For the normal distribution the first two origin moments are

$\mu_1' = \mu$
$\mu_2' = \mu^2 + \sigma^2$

and the first two sample moments are

$M_1' = \bar{x}$
$M_2' = \frac{1}{n}\sum_{i=1}^{n} x_i^2$

Equating the sample and origin moments results in

$\mu = \bar{x}$
$\mu^2 + \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$

The solution gives the moment estimators of μ and σ²:

$\hat{\mu} = \bar{x}$
$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
The method of moments often yields estimators that are reasonably good. For example,
in the above example the moment estimators are identical to the MLEs. However,
generally moment estimators are not as good as MLEs because they don’t have statistical
properties that are as nice. For example, moment estimators usually have larger
variances than MLEs.
Least Squares Estimation
The method of least squares is one of the oldest and most widely used methods of
parameter estimation. Unlike the method of maximum likelihood and the method of
moments, least squares can be employed when the distribution of the random variable is
unknown.
To illustrate, suppose that the simple location model can describe the random variable x:
$x_i = \mu + \varepsilon_i, \quad i = 1, 2, \ldots, n$

where the parameter μ is unknown and the εᵢ are random errors. We don't know the distribution of the errors, but we can assume that they have mean zero and constant variance. The least squares estimator of μ is chosen so that the sum of the squares of the model errors εᵢ is minimized. The least squares function for a sample of n observations x₁, x₂, …, xₙ is

$L = \sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}(x_i - \mu)^2$

Differentiating L and equating the derivative to zero results in the least squares estimator of μ:

$\hat{\mu} = \bar{x}$
In general, the least squares function will contain p unknown parameters and L will be
minimized by solving the equations that result when the first partial derivatives of L with
respect to the unknown parameters are equated to zero. These equations are called the
least squares normal equations.
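As an illustration of solving the least squares normal equations, the following sketch fits an intercept and slope to a small set of hypothetical data with numpy; the data values are made up for the example.

```python
import numpy as np

# Hypothetical data: a response y observed at several settings of x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Build the model matrix for an intercept and slope, then solve the
# least squares normal equations (X'X) b = X'y.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

print(b)   # [intercept, slope]
```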
The method of least squares dates from work by Carl Friedrich Gauss in the early 1800s. It has a
very well-developed and indeed quite elegant theory. For a discussion of the use of least
squares in estimating the parameters in regression models and many illustrative
examples, see Montgomery, Peck and Vining (2001), and for a very readable and concise
presentation of the theory, see Myers and Milton (1991).
$E(S^2) = \frac{1}{n-1}E\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right) = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(x_i^2) - nE(\bar{x}^2)\right]$

Now $E(x_i^2) = \mu^2 + \sigma^2$ and $E(\bar{x}^2) = \mu^2 + \sigma^2/n$. Therefore
$E(S^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}(\mu^2 + \sigma^2) - n(\mu^2 + \sigma^2/n)\right] = \frac{1}{n-1}\left(n\mu^2 + n\sigma^2 - n\mu^2 - \sigma^2\right) = \frac{(n-1)\sigma^2}{n-1} = \sigma^2$
Note that:
a. These results do not depend on the form of the distribution for the random
variable x. Many people think that an assumption of normality is required, but
this is unnecessary.
b. Even though $E(S^2) = \sigma^2$, the sample standard deviation is not an unbiased
estimator of the population standard deviation. This is discussed more fully in the
next section.
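A small Monte Carlo sketch illustrates both points: S² is unbiased for σ² even for a clearly non-normal population, while S is biased for σ. The sample size, number of replications, and the exponential population are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Exponential population with rate 1, so sigma^2 = 1 and sigma = 1.
n, reps = 5, 100_000
samples = rng.exponential(scale=1.0, size=(reps, n))

s2 = samples.var(axis=1, ddof=1)              # sample variance with divisor n - 1
print(s2.mean())                              # close to 1.0: S^2 is unbiased for sigma^2
print(samples.std(axis=1, ddof=1).mean())     # noticeably below 1.0: S is biased for sigma
```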
$E(S^2) = E\left(\frac{\sigma^2}{n-1}\chi^2_{n-1}\right) = \frac{\sigma^2}{n-1}E(\chi^2_{n-1}) = \frac{\sigma^2}{n-1}(n-1) = \sigma^2$

because the mean of a chi-square random variable with n − 1 degrees of freedom is n − 1. Now it follows that the distribution of

$\frac{\sqrt{n-1}\,S}{\sigma}$

is a chi distribution with n − 1 degrees of freedom, denoted $\chi_{n-1}$. The expected value of S can be written as

$E(S) = E\left(\frac{\sigma}{\sqrt{n-1}}\chi_{n-1}\right) = \frac{\sigma}{\sqrt{n-1}}E(\chi_{n-1})$

The mean of the chi distribution with n − 1 degrees of freedom is

$E(\chi_{n-1}) = \sqrt{2}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}$

where the gamma function is $\Gamma(r) = \int_0^{\infty} y^{r-1}e^{-y}\,dy$. Then

$E(S) = \sigma\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]} = c_4\sigma$
The constant c4 is given in Appendix table VI.
While S is a biased estimator of V , the bias gets small fairly quickly as the sample size n
increases. From Appendix table VI, note that c4 = 0.94 for a sample of n = 5, c4 = 0.9727
for a sample of n = 10, and c4 = 0.9896 or very nearly unity for a sample of n = 25.
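The constant c4 can be computed directly from the gamma-function expression above; the short sketch below reproduces the values just quoted.

```python
from math import gamma, sqrt

def c4(n):
    """Bias-correction constant: E(S) = c4 * sigma for a sample of size n."""
    return sqrt(2.0 / (n - 1)) * gamma(n / 2.0) / gamma((n - 1) / 2.0)

for n in (5, 10, 25):
    print(n, round(c4(n), 4))   # 0.94, 0.9727, 0.9896 as quoted above
```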
However, this unbiased property depends on the assumption that the sample data has
been drawn from a stable process; that is, a process that is in statistical control. In
statistical quality control work we sometimes make this assumption, but if it is incorrect,
it can have serious consequences on the estimates of the process parameters we obtain.
To illustrate, suppose that in the sequence of individual observations
$x_1, x_2, \ldots, x_t, x_{t+1}, \ldots, x_m$

the process is in control with mean μ₀ and standard deviation σ for the first t observations, but between x_t and x_{t+1} an assignable cause occurs that results in a sustained shift in the process mean to a new level μ = μ₀ + δσ, and the mean remains at this new level for the remaining sample observations x_{t+1}, …, x_m. Under these conditions, Woodall and Montgomery (2000-01) show that

$E(S^2) = \sigma^2 + \frac{t(m-t)}{m(m-1)}(\delta\sigma)^2.$   (13-1)
In fact, this result holds for any case in which the mean of t of the observations is μ₀ and the mean of the remaining observations is μ₀ + δσ, since the order of the observations is not relevant in computing S². Note that S² is biased upwards; that is, S² tends to overestimate σ². Furthermore, the extent of the bias in S² depends on the magnitude of the shift in the mean (δσ), the time period following which the shift occurs (t), and the number of available observations (m). For example, if there are m = 25 observations and the process mean shifts from μ₀ to μ = μ₀ + σ (that is, δ = 1) between the 20th and the 21st observation (t = 20), then S² will overestimate σ² by 16.7% on average. If the shift in the mean occurs earlier, say between the 10th and 11th observations, then S² will overestimate σ² by 25% on average.
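These percentages follow directly from Equation (13-1); a short sketch reproduces them.

```python
def s2_inflation(m, t, delta):
    """Relative bias of S^2 from Equation (13-1), as a fraction of sigma^2."""
    return t * (m - t) * delta**2 / (m * (m - 1))

print(s2_inflation(m=25, t=20, delta=1.0))   # 0.1667 -> about 16.7%
print(s2_inflation(m=25, t=10, delta=1.0))   # 0.25   -> 25%
```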
The proof of Equation 13-1 is straightforward. Since we can write
$S^2 = \frac{1}{m-1}\left(\sum_{i=1}^{m} x_i^2 - m\bar{x}^2\right)$

then

$E(S^2) = \frac{1}{m-1}E\left(\sum_{i=1}^{m} x_i^2 - m\bar{x}^2\right) = \frac{1}{m-1}\left(\sum_{i=1}^{m} E(x_i^2) - mE(\bar{x}^2)\right)$

Now

$\frac{1}{m-1}\left(\sum_{i=1}^{m} E(x_i^2)\right) = \frac{1}{m-1}\left(\sum_{i=1}^{t} E(x_i^2) + \sum_{i=t+1}^{m} E(x_i^2)\right)$
$= \frac{1}{m-1}\left[t(\mu_0^2 + \sigma^2) + (m-t)(\mu_0 + \delta\sigma)^2 + (m-t)\sigma^2\right]$
$= \frac{1}{m-1}\left[t\mu_0^2 + (m-t)(\mu_0 + \delta\sigma)^2 + m\sigma^2\right]$

and

$\frac{m}{m-1}E(\bar{x}^2) = \frac{m}{m-1}\left[\left(\mu_0 + \frac{m-t}{m}\delta\sigma\right)^2 + \frac{\sigma^2}{m}\right]$

Therefore

$E(S^2) = \frac{1}{m-1}\left\{t\mu_0^2 + (m-t)(\mu_0 + \delta\sigma)^2 + m\sigma^2 - m\left[\left(\mu_0 + \frac{m-t}{m}\delta\sigma\right)^2 + \frac{\sigma^2}{m}\right]\right\}$
$= \sigma^2 + \frac{1}{m-1}\left[t\mu_0^2 + (m-t)(\mu_0 + \delta\sigma)^2 - m\left(\mu_0 + \frac{m-t}{m}\delta\sigma\right)^2\right]$
$= \sigma^2 + \frac{1}{m-1}\left[(m-t)(\delta\sigma)^2 - \frac{(m-t)^2}{m}(\delta\sigma)^2\right]$
$= \sigma^2 + \frac{1}{m-1}(m-t)(\delta\sigma)^2\left(1 - \frac{m-t}{m}\right)$
$= \sigma^2 + \frac{t(m-t)}{m(m-1)}(\delta\sigma)^2$
[Figure: scatter plot of octane number (y-axis, about 89 to 92) versus formulation (x-axis, coded −1 and +1).]
$2n\hat{\beta}_1 = \sum_{j=1}^{n} y_{2j} - \sum_{j=1}^{n} y_{1j}$
Analysis of Variance
Source DF SS MS F P
Regression 1 0.050 0.050 0.04 0.841
Residual Error 18 21.700 1.206
Total 19 21.750
Notice that the estimate of the slope (given in the column labeled "Coef" and the row labeled "Formulat" above) is $\hat{\beta}_1 = 0.05 = \frac{1}{2}(\bar{y}_2 - \bar{y}_1) = \frac{1}{2}(90.8 - 90.7)$ and the estimate of the intercept is $\hat{\beta}_0 = 90.75 = \frac{1}{2}(\bar{y}_2 + \bar{y}_1) = \frac{1}{2}(90.7 + 90.8)$. Furthermore, notice that the t-statistic associated with the slope is equal to 0.20, exactly the same value (apart from sign, because we subtracted the averages in the reverse order) we gave in the text. Now in simple linear regression, the t-test on the slope is actually testing the hypotheses

$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$

and this is equivalent to testing $H_0: \mu_1 = \mu_2$.
It is easy to show that the t-test statistic used for testing that the slope equals zero in simple linear regression is identical to the usual two-sample t-test. Recall that to test the above hypotheses in simple linear regression the t-statistic is

$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2/S_{xx}}}$

where $S_{xx} = \sum_{i=1}^{2}\sum_{j=1}^{n}(x_{ij} - \bar{x})^2$ is the "corrected" sum of squares of the x's. Now in our example the two formulation levels are coded x = −1 and x = +1, so $S_{xx} = 2n$ and $\hat{\sigma}^2$ is estimated by the pooled variance $S_p^2$. Therefore

$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2/S_{xx}}} = \frac{\frac{1}{2}(\bar{y}_2 - \bar{y}_1)}{\sqrt{S_p^2/(2n)}} = \frac{\bar{y}_2 - \bar{y}_1}{S_p\sqrt{\frac{1}{n} + \frac{1}{n}}}$

This is the usual two-sample t-test statistic for the case of equal sample sizes.
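The equivalence is easy to verify numerically. The sketch below uses the octane data from the Minitab listing that follows and compares the pooled two-sample t-statistic with the regression slope t-statistic; the scipy routines are standard, and only the coding of the data is taken from the table.

```python
import numpy as np
from scipy import stats

# Octane data from the Minitab residual listing below.
y1 = np.array([89.5, 90.0, 91.0, 91.5, 92.5, 91.0, 89.0, 89.5, 91.0, 92.0])  # formulation -1
y2 = np.array([89.5, 91.5, 91.0, 89.0, 91.5, 92.0, 92.0, 90.5, 90.0, 91.0])  # formulation +1

# Two-sample t-test with pooled variance.
t_two_sample, p = stats.ttest_ind(y2, y1, equal_var=True)

# Regression of octane number on the coded formulation variable x = -1, +1.
x = np.concatenate([-np.ones(10), np.ones(10)])
y = np.concatenate([y1, y2])
res = stats.linregress(x, y)
t_slope = res.slope / res.stderr

print(t_two_sample, t_slope)   # both approximately 0.20
```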
Most regression software packages will also compute a table or listing of the residuals
from the model. The residuals from the Minitab regression model fit obtained above are
as follows:
Obs Formulat Octane N Fit SE Fit Residual St Resid
1 -1.00 89.500 90.700 0.347 -1.200 -1.15
2 -1.00 90.000 90.700 0.347 -0.700 -0.67
3 -1.00 91.000 90.700 0.347 0.300 0.29
4 -1.00 91.500 90.700 0.347 0.800 0.77
5 -1.00 92.500 90.700 0.347 1.800 1.73
6 -1.00 91.000 90.700 0.347 0.300 0.29
7 -1.00 89.000 90.700 0.347 -1.700 -1.63
8 -1.00 89.500 90.700 0.347 -1.200 -1.15
9 -1.00 91.000 90.700 0.347 0.300 0.29
10 -1.00 92.000 90.700 0.347 1.300 1.25
11 1.00 89.500 90.800 0.347 -1.300 -1.25
12 1.00 91.500 90.800 0.347 0.700 0.67
13 1.00 91.000 90.800 0.347 0.200 0.19
14 1.00 89.000 90.800 0.347 -1.800 -1.73
15 1.00 91.500 90.800 0.347 0.700 0.67
16 1.00 92.000 90.800 0.347 1.200 1.15
17 1.00 92.000 90.800 0.347 1.200 1.15
18 1.00 90.500 90.800 0.347 -0.300 -0.29
19 1.00 90.000 90.800 0.347 -0.800 -0.77
20 1.00 91.000 90.800 0.347 0.200 0.19
The column labeled “Fit” contains the predicted values of octane number from the
regression model, which just turn out to be the averages of the two samples. The
residuals are in the sixth column of this table. They are just the differences between the
observed values of the octane number and the corresponding predicted values. A normal
probability plot of the residuals follows.
[Figure: normal probability plot of the residuals (percent versus residual), with ML estimates mean ≈ 0 and standard deviation 1.04163, and goodness-of-fit statistic AD* = 0.942.]
Notice that the residuals plot approximately along a straight line, indicating that there is
no problem with the normality assumption in these data. This is equivalent to plotting the
original octane number data on separate probability plots as we did in Chapter 3.
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n$

In addition, we will find the following useful:

$E(\varepsilon_{ij}) = E(\varepsilon_{i.}) = E(\varepsilon_{..}) = 0, \quad E(\varepsilon_{ij}^2) = \sigma^2, \quad E(\varepsilon_{i.}^2) = n\sigma^2, \quad E(\varepsilon_{..}^2) = an\sigma^2$
Now

$E(SS_{\mathrm{Treatments}}) = E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i.}^2\right) - E\left(\frac{y_{..}^2}{an}\right)$

Consider the first term on the right-hand side of the above expression:

$E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i.}^2\right) = \frac{1}{n}\sum_{i=1}^{a} E(n\mu + n\tau_i + \varepsilon_{i.})^2 = \frac{1}{n}\sum_{i=1}^{a}\left(n^2\mu^2 + n^2\tau_i^2 + n\sigma^2\right) = an\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + a\sigma^2$

because the three cross-product terms are all zero. Now consider the second term on the right-hand side of $E(SS_{\mathrm{Treatments}})$:

$E\left(\frac{y_{..}^2}{an}\right) = \frac{1}{an}E\left(an\mu + n\sum_{i=1}^{a}\tau_i + \varepsilon_{..}\right)^2 = \frac{1}{an}E(an\mu + \varepsilon_{..})^2$

since $\sum_{i=1}^{a}\tau_i = 0$. Upon squaring the term in parentheses and taking expectation, we obtain

$E\left(\frac{y_{..}^2}{an}\right) = \frac{1}{an}\left[(an\mu)^2 + an\sigma^2\right] = an\mu^2 + \sigma^2$

since the expected value of the cross-product is zero. Therefore,

$E(SS_{\mathrm{Treatments}}) = an\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + a\sigma^2 - (an\mu^2 + \sigma^2) = \sigma^2(a-1) + n\sum_{i=1}^{a}\tau_i^2$
$E(MS_{\mathrm{Treatments}}) = E\left(\frac{SS_{\mathrm{Treatments}}}{a-1}\right) = \frac{\sigma^2(a-1) + n\sum_{i=1}^{a}\tau_i^2}{a-1} = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a-1}$
This is the result given in the textbook.
For the error mean square, we obtain

$E(MS_E) = E\left(\frac{SS_E}{N-a}\right) = \frac{1}{N-a}E\left[\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2\right] = \frac{1}{N-a}E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2\right]$

Substituting the model into this last expression, we obtain

$E(MS_E) = \frac{1}{N-a}E\left[\sum_{i=1}^{a}\sum_{j=1}^{n}(\mu + \tau_i + \varepsilon_{ij})^2 - \frac{1}{n}\sum_{i=1}^{a}\left(\sum_{j=1}^{n}(\mu + \tau_i + \varepsilon_{ij})\right)^2\right]$

After squaring and taking expectation, this last equation becomes

$E(MS_E) = \frac{1}{N-a}\left(N\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + N\sigma^2 - N\mu^2 - n\sum_{i=1}^{a}\tau_i^2 - a\sigma^2\right) = \sigma^2$
In the random effects model, the treatment effects τᵢ are random variables, because the treatment levels actually used in the experiment have been chosen at random. The population of treatments is assumed to be normally and independently distributed with mean zero and variance $\sigma_\tau^2$. Note that the variance of an observation is

$V(y_{ij}) = V(\mu + \tau_i + \varepsilon_{ij}) = \sigma_\tau^2 + \sigma^2$

We often call $\sigma_\tau^2$ and $\sigma^2$ variance components, and the random model is sometimes called the components of variance model. All of the computations in the random model are the same as in the fixed effects model, but since we are studying an entire population of treatments, it doesn't make much sense to formulate hypotheses about the individual factor levels selected in the experiment. Instead, we test the following hypotheses about the variance of the treatment effects:

$H_0: \sigma_\tau^2 = 0$
$H_1: \sigma_\tau^2 > 0$
The test statistic for these hypotheses is the usual F-ratio, F = MSTreatments/MSE. If the
null hypothesis is not rejected, there is no variability in the population of treatments,
while if the null hypothesis is rejected, there is significant variability among the
treatments in the entire population that was sampled. Notice that the conclusions of the
ANOVA extend to the entire population of treatments.
The expected mean squares in the random model are different from their fixed effects
model counterparts. It can be shown that
$E(MS_{\mathrm{Treatments}}) = \sigma^2 + n\sigma_\tau^2$
$E(MS_E) = \sigma^2$
Frequently, the objective of an experiment involving random factors is to estimate the
variance components. A logical way to do this is to equate the expected values of the
mean squares to their observed values and solve the resulting equations. This leads to
$\hat{\sigma}_\tau^2 = \frac{MS_{\mathrm{Treatments}} - MS_E}{n}$
$\hat{\sigma}^2 = MS_E$
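A tiny sketch implements these two estimators; the mean squares and number of replicates shown are hypothetical values used only for illustration.

```python
def variance_components(ms_treatments, ms_error, n):
    """ANOVA (method of moments) estimates of the variance components
    in the single-factor random effects model."""
    sigma2_tau = (ms_treatments - ms_error) / n
    sigma2 = ms_error
    return sigma2_tau, sigma2

# Hypothetical mean squares from an ANOVA table with n = 4 replicates per treatment.
print(variance_components(ms_treatments=25.0, ms_error=5.0, n=4))   # (5.0, 5.0)
```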
A typical application of experiments where some of the factors are random is in measurement systems capability studies, as discussed in Chapter 7. The model used there is a factorial model, so the analysis and the expected mean squares are somewhat more complicated than in the single-factor model considered here.
although as we point out, there can be some occasional advantage to the use of
probability limits, such as on the range chart to obtain a non-zero lower control limit.
The standard applications of attributes control charts almost always use the three-sigma
limits as well, although their use is potentially somewhat more troublesome here. When
three-sigma limits are used on attributes charts, we are basically assuming that the normal
approximation to either the binomial or Poisson distribution is appropriate, at least to the
extent that the distribution of the attribute chart statistic is approximately symmetric, and
that the symmetric three-sigma control limits are satisfactory.
This will, of course, not always be the case. If the binomial probability p is small and the
sample size n is not large, or if the Poisson mean is small, then symmetric three-sigma
control limits on the p or c chart may not be appropriate, and probability limits may be
much better.
For example, consider a p chart with p = 0.07 and n = 100. The center line is at 0.07 and the usual three-sigma control limits are LCL = −0.007, which is set to 0, and UCL = 0.147. A short table of cumulative binomial probabilities computed from Minitab follows.
x P( X <= x )
0.00 0.0007
1.00 0.0060
2.00 0.0258
3.00 0.0744
4.00 0.1632
5.00 0.2914
6.00 0.4443
7.00 0.5988
8.00 0.7340
9.00 0.8380
10.00 0.9092
11.00 0.9531
12.00 0.9776
13.00 0.9901
14.00 0.9959
15.00 0.9984
16.00 0.9994
17.00 0.9998
18.00 0.9999
19.00 1.0000
20.00 1.0000
If the lower control limit is at zero and the upper control limit is at 0.147, then any
sample with 15 or more defective items will plot beyond the upper control limit. The
above table shows that the probability of obtaining 15 or more defectives when the
process is in-control is 1 – 0.9959 = 0.0041. This is about 50% greater than the false
alarm rate on the normal-theory three-sigma limit control chart (0.0027). However, if we
were to set the lower control limit at 0.01 and the upper control limit at 0.15, and
conclude that the process is out-of-control only if a control limit is exceeded, then the
false alarm rate is 0.0007 + 0.0016 = 0.0023, which is very close to the advertised value
of 0.0027. Furthermore, there is a nonzero LCL, which can be very useful in practice.
Notice that the control limits are not symmetric around the center line. However, the
distribution of p̂ is not symmetric, so this should not be too surprising.
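The binomial calculations above are easy to reproduce, for example with scipy.

```python
from scipy.stats import binom

n, p = 100, 0.07

# False alarm rate with the usual limits (LCL = 0, UCL = 0.147):
# a signal requires 15 or more defectives.
print(1 - binom.cdf(14, n, p))                           # about 0.0041

# False alarm rate with probability-type limits LCL = 0.01, UCL = 0.15:
# a signal is 0 defectives or 16 or more defectives.
print(binom.cdf(0, n, p) + (1 - binom.cdf(15, n, p)))    # about 0.0023
```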
There are several other interesting approaches to setting probability-type limits on
attribute control charts. Refer to Ryan (2000), Acosta-Mejia (1999), Ryan and
Schwertman (1997), Schwertman and Ryan (1997), and Shore (2000).
$\bar{R} = \frac{\sum_{i=1}^{m} R_i}{m},$   (19-1)

that are commonly encountered in practice. The estimator

$\hat{\sigma}_1 = \bar{R}/d_2$   (19-2)
is widely used after the application of control charts to estimate process variability and to
assess process capability. In Chapter 3 we report the relative efficiency of the range
estimator given in Equation (19-2) to the sample standard deviation for various sample
sizes. For example, if n = 5, the relative efficiency of the range estimator compared to
the sample standard deviation is 0.955. Consequently, there is little practical difference
between the two estimators. Equation (19-2) is also frequently used to determine the
usual 3-sigma limits on the Shewhart x̄ chart in statistical process control. The estimator

$\hat{\sigma}_2 = \bar{R}/d_2^*$   (19-3)
is more often used in gauge R & R studies and in variables acceptance sampling. Here
d 2* represents a constant whose value depends on both m and n. See Chrysler, Ford, GM
(1995), Military Standard 414 (1957), and Duncan (1986).
Patnaik (1950) showed that $\bar{R}/\sigma$ is distributed approximately as a multiple of a χ distribution. In particular, $\bar{R}/\sigma$ is distributed approximately as $d_2^*\chi/\sqrt{\nu}$, where ν represents the fractional degrees of freedom for the χ distribution. Patnaik (1950) used the approximation

$d_2^* \approx d_2\left(1 + \frac{1}{4\nu} + \frac{1}{32\nu^2} - \frac{5}{128\nu^3}\right).$   (19-4)
It has been pointed out by Duncan (1986), Wheeler (1995), and Luko (1996), among
others, that $\hat{\sigma}_1$ is an unbiased estimator of σ and that $\hat{\sigma}_2^2$ is an unbiased estimator of σ². For $\hat{\sigma}_2^2$ to be an unbiased estimator of σ², however, David (1951) showed that no approximation for $d_2^*$ was required. He showed that

$d_2^* = (d_2^2 + V_n/m)^{1/2},$   (19-5)

where $V_n$ is the variance of the sample range with sample size n from a normal population with unit variance. It is important to note that $V_n = d_3^2$, so Equation (19-5) can be easily used to determine values of $d_2^*$ from the widely available tables of d2 and d3. Thus, a table of $d_2^*$ values, such as the ones given by Duncan (1986), Wheeler (1995), and many others, is not required so long as values of d2 and d3 are tabled, as they usually are (once again, see Appendix Table VI). Also, use of the approximation

$d_2^* \approx d_2\left(1 + \frac{1}{4\nu}\right)$
given by Duncan (1986) and Wheeler (1995) becomes unnecessary.
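Equation (19-5) is simple to apply directly from tabled values of d2 and d3; a short sketch using the tabled constants for subgroup size n = 5 is given below.

```python
import math

# Tabled control chart constants for subgroup size n = 5 (Appendix Table VI).
d2, d3 = 2.326, 0.864

def d2_star(d2, d3, m):
    """Equation (19-5): d2* = sqrt(d2^2 + d3^2 / m) for m subgroups."""
    return math.sqrt(d2**2 + d3**2 / m)

for m in (1, 5, 20):
    print(m, round(d2_star(d2, d3, m), 4))   # approaches d2 as m increases
```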
The table of d 2* values given by Duncan (1986) is the most frequently recommended. If a
table is required, the ones by Nelson (1975) and Luko (1996) provide values of d 2* that
are slightly more accurate since their values are based on Equation (19-5).
It has been noted that as m increases, d 2* approaches d2. This has frequently been argued
using Equation (19-4) and noting that Q increases as m increases. The fact that
d 2* approaches d2 as m increases is more easily seen, however, from Equation (19-5) as
pointed out by Luko (1996).
Sometimes use of Equation (19-3) is recommended without any explanation. See, for
example, the AIAG measurement systems capability guidelines [Chrysler, Ford, and GM
(1995)]. The choice between $\hat{\sigma}_1$ and $\hat{\sigma}_2$ has often not been explained clearly in the literature. It is frequently stated that the use of Equation (19-2) requires that $\bar{R}$ be obtained from a fairly large number of individual ranges. See, for example, Bissell (1994, p. 289). Grant and Leavenworth (1996, p. 128) state that "Strictly speaking, the validity of the exact value of the d2 factor assumes that the ranges have been averaged for a fair number of subgroups, say, 20 or more. When only a few subgroups are available, a better estimate of σ is obtained using a factor that writers on statistics have designated as $d_2^*$." Nelson (1975) writes, "If fewer than a large number of subgroups are used, Equation (19-2) gives an estimate of σ which does not have the same expected value as the standard deviation estimator." In fact, Equation (19-2) produces an unbiased estimator of σ regardless of the number of samples m, whereas the pooled standard deviation does not (refer to Section 12 of the Supplemental Text Material). The choice between $\hat{\sigma}_1$ and $\hat{\sigma}_2$ depends upon whether one is interested in obtaining an unbiased estimator of σ or σ². As m increases, both estimators (19-2) and (19-3) become equivalent since each is a consistent estimator of σ.
It is interesting to note that among all estimators of the form $c\bar{R}$ (c > 0), the one minimizing the mean squared error in estimating σ has

$c = \frac{d_2}{(d_2^*)^2}.$

The derivation of this result is in the proofs at the end of this section. If we let

$\hat{\sigma}_3 = \frac{d_2}{(d_2^*)^2}\bar{R}$

then

$MSE(\hat{\sigma}_3) = \sigma^2\left[1 - \frac{d_2^2}{(d_2^*)^2}\right]$

Luko (1996) compared the mean squared error of $\hat{\sigma}_2$ in estimating σ to that of $\hat{\sigma}_1$ and recommended $\hat{\sigma}_2$ on the basis of uniformly lower MSE values. By definition, $\hat{\sigma}_3$ leads to further reduction in MSE. It is shown in the proofs at the end of this section that the percentage reduction in MSE using $\hat{\sigma}_3$ instead of $\hat{\sigma}_2$ is

$50\left(\frac{d_2^* - d_2}{d_2^*}\right)$
Values of the percentage reduction are given in Table 19-1. Notice that when both the number of subgroups and the subgroup size are small, a moderate reduction in mean squared error can be obtained by using $\hat{\sigma}_3$.

Table 19-1. Percentage Reduction in Mean Squared Error from Using $\hat{\sigma}_3$ Instead of $\hat{\sigma}_2$

Subgroup          Number of Subgroups, m
Size, n    1        2        3        4        5        7        10       15       20
2 10.1191 5.9077 4.1769 3.2314 2.6352 1.9251 1.3711 0.9267 0.6998
3 5.7269 3.1238 2.1485 1.6374 1.3228 0.9556 0.6747 0.4528 0.3408
4 4.0231 2.1379 1.4560 1.1040 0.8890 0.6399 0.4505 0.3017 0.2268
5 3.1291 1.6403 1.1116 0.8407 0.6759 0.4856 0.3414 0.2284 0.1716
6 2.5846 1.3437 0.9079 0.6856 0.5507 0.3952 0.2776 0.1856 0.1394
7 2.2160 1.1457 0.7726 0.5828 0.4679 0.3355 0.2356 0.1574 0.1182
8 1.9532 1.0058 0.6773 0.5106 0.4097 0.2937 0.2061 0.1377 0.1034
9 1.7536 0.9003 0.6056 0.4563 0.3660 0.2623 0.1840 0.1229 0.0923
10 1.5963 0.8176 0.5495 0.4138 0.3319 0.2377 0.1668 0.1114 0.0836
Proofs
Proof:
Result 2: The value of c that minimizes the mean squared error of estimators of the form $c\bar{R}$ in estimating σ is $\dfrac{d_2}{(d_2^*)^2}$.

Proof:

$MSE(\hat{\sigma}) = \sigma^2\left[c^2(d_2^*)^2 - 2cd_2 + 1\right]$

$\frac{dMSE(\hat{\sigma})}{dc} = \sigma^2\left[2c(d_2^*)^2 - 2d_2\right] = 0$

$c = \frac{d_2}{(d_2^*)^2}.$
Result 3: $MSE(\hat{\sigma}_3) = \sigma^2\left[1 - \dfrac{d_2^2}{(d_2^*)^2}\right]$.

Proof:

$MSE(\hat{\sigma}_3) = \sigma^2\left[\frac{d_2^2(d_2^*)^2}{(d_2^*)^4} - \frac{2d_2^2}{(d_2^*)^2} + 1\right]$ (from Result 1)
$= \sigma^2\left[\frac{d_2^2}{(d_2^*)^2} - \frac{2d_2^2}{(d_2^*)^2} + 1\right]$
$= \sigma^2\left[1 - \frac{d_2^2}{(d_2^*)^2}\right]$
Result 4: Let $\hat{\sigma}_2 = \dfrac{\bar{R}}{d_2^*}$ and $\hat{\sigma}_3 = \dfrac{d_2}{(d_2^*)^2}\bar{R}$. Then $\left[\dfrac{MSE(\hat{\sigma}_2) - MSE(\hat{\sigma}_3)}{MSE(\hat{\sigma}_2)}\right]\times 100$, the percent reduction in mean square error using the minimum mean square error estimator instead of $\bar{R}/d_2^*$ [as recommended by Luko (1996)], is

$50\left(\frac{d_2^* - d_2}{d_2^*}\right)$

Proof:

Luko (1996) shows that $MSE(\hat{\sigma}_2) = \dfrac{2\sigma^2(d_2^* - d_2)}{d_2^*}$, therefore

$MSE(\hat{\sigma}_2) - MSE(\hat{\sigma}_3) = \sigma^2\left[\frac{2(d_2^* - d_2)}{d_2^*} - \frac{(d_2^* - d_2)(d_2^* + d_2)}{(d_2^*)^2}\right]$
$= \sigma^2\,\frac{(d_2^* - d_2)}{d_2^*}\left(2 - \frac{d_2^* + d_2}{d_2^*}\right)$
$= \sigma^2\,\frac{(d_2^* - d_2)}{d_2^*}\left(\frac{d_2^* - d_2}{d_2^*}\right)$
$= \sigma^2\,\frac{(d_2^* - d_2)^2}{(d_2^*)^2}$

Consequently

$\left[\frac{MSE(\hat{\sigma}_2) - MSE(\hat{\sigma}_3)}{MSE(\hat{\sigma}_2)}\right]\times 100 = \frac{\sigma^2(d_2^* - d_2)^2/(d_2^*)^2}{2\sigma^2(d_2^* - d_2)/d_2^*}\times 100 = 50\left(\frac{d_2^* - d_2}{d_2^*}\right).$
20. Control Charting Past Values Versus Future Observations (Phase 1 & Phase 2
of Control Chart Usage)
There are two distinct phases of control chart usage. In Phase 1 (some authors say Stage
1) we plot a group of points all at once in a retrospective analysis, constructing trial
control limits to determine if the process has been in control over the period of time
where the data was collected, and to see if reliable control limits can be established to
monitor future production. In Phase 2 we use the control limits to monitor the process
by comparing the sample statistic for each sample as it is drawn from the process to the
control limits.
Thus in Phase 1, we are comparing a collection of say m points to a set of control limits.
Suppose for the moment that the process is normally distributed with known mean and
variance, and that the usual three-sigma limits are used on the control chart for x̄, so that
the probability of a single point plotting outside the control limits when the process is in
control is 0.0027. The question we address is if the averages of a set of m samples or
subgroups from an in-control process is plotted on this chart, what is the probability that
at least one of the averages will plot outside the control limits?
This is just the probability of obtaining at least one success out of m Bernoulli trials with
constant probability of success p = 0.0027. A brief tabulation of the calculations is
shown below:
Number of subgroups, m:                                         1        5        10       20       25       50
Probability of at least one point beyond the control limits:    0.0027   0.0134   0.0267   0.0526   0.0656   0.1264
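These entries are simple binomial calculations, as the following sketch shows.

```python
# Probability that at least one of m in-control subgroup averages plots
# outside three-sigma limits, when each point independently exceeds the
# limits with probability p = 0.0027.
p = 0.0027
for m in (1, 5, 10, 20, 25, 50):
    print(m, round(1 - (1 - p) ** m, 4))
```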
Notice that as the number of subgroups increases, the probability of finding at least one
point outside the limits increases. Furthermore, for the typical number of samples or
subgroups often used to construct Shewhart control charts (20 or 25), we observe that the
chances of finding at least one sample out of control is about an order of magnitude larger
than the probability of finding a single point out of control when the points are plotted
one-at-a-time in monitoring future production (0.0027).
Two obvious questions are (1) what is going on here, and (2) is this anything to worry
about?
Question 1 has an obvious answer. These are simply binomial probability calculations
and they give you some insight about what is likely to happen when you first analyze a
collection of samples retrospectively to establish control limits. They also point out what
is likely to occur when a batch of new samples is employed to revise the control limits on
a chart. It could also happen in monitoring future production if several points are added
to the chart at once, because the chart is only updated once per shift or once per day.
The answer to question 2 is also simple. In effect, when establishing trial control limits
you should probably expect some points to fall outside of the limits. This may happen
because the process was not in control when the preliminary samples were taken, but
even if the process is stable, the chance of seeing at least one out-of-control point is not 0.0027. It depends on how many preliminary samples are used in the calculations. It is possible to adjust the control limits (make them wider) to compensate for this, but there isn't a practical necessity to do this in most cases. This is also one of the reasons that when
we can’t find an assignable cause for all the out-of-control points in the preliminary data
when establishing trial control limits, we usually just delete the points without too much
fanfare unless there are a lot of these points (you might find it helpful to reread the
discussions on trial control limits in chapters 5 and 6).
These calculations have been performed assuming that the process parameters were
known, so that the limits are based on standard values. In practice, the preliminary
observations will be used to estimate parameters and control limits. Consequently, the
deviations of the group of points plotted on the chart from the limits exhibit correlation,
and we can’t even calculate the false alarm probabilities analytically. In Shewhart
control charting, we assume that enough preliminary data is used so that very reliable
(essentially known) values of the process parameters are being used in the control limit
calculations. That’s why we recommend that 20 – 25 subgroups be used when setting up
control charts. It would be possible to determine the probabilities (or control limits that
give specified probability of false alarms) by simulation. Refer to Sullivan and Woodall
(1996) for a nice discussion of this. We presented their simulation based control limits in
the multivariate control chart example for subgroups of size n = 1 in Chapter 10.
It is fairly typical in Phase I to assume that the process is initially out of control, so a
frequent objective of the analyst is to bring the process into a state of statistical control.
Sometimes this will require several cycles in which the control chart is employed,
assignable causes are detected and corrected, revised control limits are calculated, and the
out-of-control action plan is updated and expanded. Generally, Shewhart control charts
are very effective in Phase I because they are easy to construct, patterns on the control
charts are often easy to interpret and have direct physical meaning, and most often the
types of assignable causes that usually occur in Phase 1 result in fairly large process
shifts – exactly the scenario in which the Shewhart control chart is most effective.
In Phase 2 we usually assume that the process is reasonably stable. Often, the assignable
causes that occur in Phase 2 result in smaller process shifts, because (hopefully) most of
the really ugly sources of variability have been systematically removed during Phase 1.
Our emphasis is now on process monitoring, not on bringing an unruly process into
control. Shewhart control charts are much less likely to be effective in Phase 2 because
they are not very sensitive to small to moderate size process shifts. Attempts to solve this
problem by employing sensitizing rules such as those discussed in Chapter 4 are likely to
be unsatisfactory, because the use of these supplemental sensitizing rules increases the
false alarm rate of the Shewhart control chart. Consequently, Cusum and EWMA control
charts are much more likely to be effective in Phase 2.
and in trying to bring an unruly process into control, but even then they need to be used
carefully to avoid false alarms.
Obviously, Cusum and EWMA control charts provide an effective alternative to
Shewhart control charts for the problem of small shifts. However, Klein (2000) has
proposed another solution. His solution is simple but elegant: use an r out of m
consecutive point rule, but apply the rule to a single control limit rather than to a set of
interior “warning” type limits. He analyzes the following two rules:
1. If two consecutive points exceed a control limit, the process is out of control. The width of the control limits should be 1.78σ.
2. If two out of three consecutive points exceed a control limit, the process is out of control. The width of the control limits should be 1.93σ.
These rules would be applied to one side of the chart at a time, just as we do with the
Western Electric rules.
Klein (2000) presents the ARL performance of these rules for the x̄ chart, using actual control limit widths of ±1.7814σ and ±1.9307σ, as these choices make the in-control ARL exactly equal to 370, the value associated with the usual three-sigma limits on the Shewhart chart. The table shown below is adapted from his results. Notice that Professor Klein's procedure greatly improves the ability of the Shewhart x̄ chart to detect
small shifts. The improvement is not as much as can be obtained with an EWMA or a
Cusum, but it is substantial, and considering the simplicity of Klein’s procedure, it should
be more widely used in practice.
Shift in process mean,        ARL for the Shewhart x̄ chart      ARL for the Shewhart x̄ chart    ARL for the Shewhart x̄ chart
in standard deviation units   with three-sigma control limits    with 1.7814σ control limits      with 1.9307σ control limits
0 370 350 370
0.2 308 277 271
0.4 200 150 142
0.6 120 79 73
0.8 72 44 40
1 44 26 23
2 6.3 4.6 4.3
3 2 2.4 2.4
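The ARL of Klein's two-of-two rule is easy to approximate by simulation. The sketch below is a rough Monte Carlo check for normally distributed subgroup averages; the number of replications and the seed are arbitrary choices, so the results are only approximate.

```python
import numpy as np

rng = np.random.default_rng(seed=6)

def arl_two_of_two(shift, limit=1.7814, reps=5000):
    """Rough Monte Carlo ARL for Klein's rule: signal when two consecutive
    standardized subgroup averages fall beyond the same control limit."""
    run_lengths = []
    for _ in range(reps):
        prev_up = prev_dn = False
        t = 0
        while True:
            t += 1
            z = rng.normal(loc=shift)
            up, dn = z > limit, z < -limit
            if (up and prev_up) or (dn and prev_dn):
                break
            prev_up, prev_dn = up, dn
        run_lengths.append(t)
    return np.mean(run_lengths)

print(arl_two_of_two(shift=0.0))   # in-control ARL, roughly 370
print(arl_two_of_two(shift=1.0))   # roughly 26, close to the table entry
```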
where $\bar{x}_{T,t} = \dfrac{1}{T-t}\sum_{i=t+1}^{T} \bar{x}_i$ is the reverse cumulative average; that is, the average of the T − t most recent subgroup averages. The value of t that maximizes $C_t$ is the estimator of the last subgroup that was selected from the in-control process.
useful in instrument calibration, where one measurement on each unit is from a standard
instrument (say in a laboratory) and the other is from an instrument used in different
conditions (such as in production).
[Figure: schematic of the furnace showing four wafer locations (Furnace Location 1 through Furnace Location 4), each wafer having nine measurement sites numbered 1 through 9, with site 1 at the center.]
Figure 24-1. Diagram of a Furnace where four wafers are simultaneously processed and
nine quality measurements are performed on each wafer.
The most widely used approach to monitoring these processes is to first consider the
average of all mn observations from a run as a single observation and to use a Shewhart
control chart for individuals to monitor the overall process mean. The control limits for
this chart are usually found by applying a moving range to the sequence of averages.
Thus, the control limits for the individuals chart reflect run-to-run variability, not
variability within a run. The variability within a run is monitored by applying a control
chart for S (the standard deviation) or S 2 to all mn observations from each run. It is
interesting to note that this approach is so widely used that at least one popular statistical
software package (Minitab) includes it as a standard control charting option (called the
“between – within” procedure in Minitab). This procedure was illustrated in Example 5-
11.
Runger and Fowler (1999) show how the structure of the data obtained on these processes
can be represented by an analysis of variance model, and how control charts based on
contrasts can be designed to detect specific assignable causes of potential interest.
Below we briefly review their results and relate them to some other methods. Then we
analyze the average run length performance of the contrast charts and show that the use of specifically designed contrast charts can greatly enhance the ability of the monitoring scheme to detect assignable causes. We confine our analysis to Shewhart charts, but both Cusum and EWMA control charts would be very effective alternatives, because they are more effective in detecting small process shifts, which are likely to be of interest in many of these applications.
Contrast Control Charts
We consider the oxidation process in Figure 24-1, but allow m wafers in each run with n
measurements or sites per wafer. The appropriate model for oxide thickness is

$y_{ij} = r_i + s_j + \varepsilon_{ij}$   (24-1)

where $y_{ij}$ is the oxide thickness measurement from run i and site j, $r_i$ is the run effect, $s_j$ is the site effect, and $\varepsilon_{ij}$ is a random error component. We assume that the site effects are fixed effects, since the measurements are generally taken at the same locations on all wafers. The run effect is a random factor and we assume it is distributed as NID(0, $\sigma_r^2$). We assume that the error term is distributed as NID(0, $\sigma_\varepsilon^2$). Notice that equation (24-1) is essentially an analysis of variance model.

Let $\mathbf{y}_t$ be a vector of all measurements from the process at the end of run t. It is customary in most applications to update the control charts at the completion of every run. A contrast is a linear combination of the elements of the observation vector $\mathbf{y}_t$, say

$c = \mathbf{c}'\mathbf{y}_t$

where the elements of the vector $\mathbf{c}$ sum to zero and, for convenience, we assume that the contrast vector has unit length. That is,

$\mathbf{c}'\mathbf{1} = 0 \quad \text{and} \quad \mathbf{c}'\mathbf{c} = 1$

Any contrast vector is orthogonal to the vector that generates the mean, since the mean can be written as

$\bar{y}_t = \mathbf{1}'\mathbf{y}_t / mn$
Thus, a contrast generates information that is different from the information produced by
the overall mean from the current run. Based on the particular problem, the control chart
analyst can choose the elements of the contrast vector c to provide information of interest
to that specific process.
For example, suppose that we were interested in detecting process shifts that could cause
a difference in mean thickness between the top and bottom of the furnace. The
engineering cause of such a difference could be a temperature gradient along the furnace
from top to bottom. To detect this disturbance, we would want the contrast to compare
the average oxide thickness of the top wafer in the furnace to the average thickness of the
bottom wafer. Thus, if m = 4, the vector c has mn = 36 components, the first 9 of which
are +1, the last 9 of which are –1, and the middle 18 elements are zero. To normalize the
contrast to unit length we would actually use
$\mathbf{c}' = [1, 1, \ldots, 1, 0, 0, \ldots, 0, -1, -1, \ldots, -1]/\sqrt{18}$
One could also divide the elements of c by nine to compute the averages of the top and
bottom wafers, but this is not really necessary.
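A short sketch shows how such a contrast is constructed and applied to the measurement vector from one run; the simulated run data, the parameter values, and the seed are illustrative assumptions.

```python
import numpy as np

m, n = 4, 9                      # wafers per run, sites per wafer

# Top-versus-bottom contrast: +1 on the 9 sites of the top wafer,
# -1 on the 9 sites of the bottom wafer, 0 elsewhere, scaled to unit length.
c = np.concatenate([np.ones(n), np.zeros(n * (m - 2)), -np.ones(n)])
c = c / np.linalg.norm(c)        # divides by sqrt(18)

rng = np.random.default_rng(seed=7)
y = rng.normal(loc=100.0, scale=1.0, size=m * n)   # one run of simulated thicknesses

print(c.sum())        # 0 (it is a contrast)
print(c @ c)          # 1 (unit length)
print(c @ y)          # the contrast statistic plotted on the control chart
```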
In practice, a set of k contrasts, say

$\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_k$

can be used to define control charts to monitor a process to detect k assignable causes of interest. These simultaneous control charts have overall false alarm rate α, where

$\alpha = 1 - \prod_{i=1}^{k}(1 - \alpha_i)$   (24-2)

and $\alpha_i$ is the false alarm rate for the ith contrast. If the contrasts are orthogonal, then Equation (24-2) holds exactly, while if the contrasts are not orthogonal then the Bonferroni inequality applies and the α in Equation (24-2) is a lower bound on the false alarm rate.
Related Procedures
Several authors have suggested related approaches for process monitoring when non-standard conditions relative to rational subgrouping apply. Yashchin (1994), Czitrom
and Reese (1997), and Hurwicz and Spagon (1997) all present control charts or other
similar techniques based on variance components. The major difference in this approach
in comparison to these authors is the use of an analysis-of-variance type partitioning
based on contrasts instead of variance components as the basis of the monitoring scheme.
Roes and Does (1995) do discuss the use of contrasts, and Hurwicz and Spagon discuss
contrasts to estimate the variance contributed by sites within a wafer. However, the
Runger and Fowler model is the most widely applicable of all the techniques we have
encountered.
Even though the methodology used to monitor specific differences in processing
conditions has been studied by all these authors, the statistical performance of these
charts has not been demonstrated. We now present some performance results for
Shewhart control charts.
$\mathbf{c}_1' = \frac{1}{\sqrt{18}}[1,1,1,1,1,1,1,1,1,\;0,0,0,0,0,0,0,0,0,\;0,0,0,0,0,0,0,0,0,\;-1,-1,-1,-1,-1,-1,-1,-1,-1]$

$\mathbf{c}_2' = \frac{1}{\sqrt{16}}[0,0,0,0,0,1,1,-1,-1,\;0,0,0,0,0,1,1,-1,-1,\;0,0,0,0,0,1,1,-1,-1,\;0,0,0,0,0,1,1,-1,-1]$

$\mathbf{c}_3' = \frac{1}{\sqrt{32}}[0,1,1,1,1,-1,-1,-1,-1,\;0,1,1,1,1,-1,-1,-1,-1,\;0,1,1,1,1,-1,-1,-1,-1,\;0,1,1,1,1,-1,-1,-1,-1]$
A comparison of the ARL values obtained using these contrasts and the traditional
approach (an individuals control chart for the mean of all 36 observations) is presented
in Tables 24-1, 24-2, and 24-3. From inspection of these tables, we see that the charts for
the orthogonal contrasts, originally with the same in-control ARL as the traditional chart,
are more sensitive to changes at specific locations, thus improving the chances of early
detection of an assignable cause. Notice that the improvement is dramatic for small shifts,
say on the order of 1.5 standard deviations or less.
A similar analysis was performed for a modified version of the process shown in Figure 24-1. In this example, there are seven measurements per wafer for a total of 28 measurements in a run. There are still three measurements at the center of the wafer, but now there are only four measurements at the perimeter, one in each "corner". The same types of contrasts used in the previous example (top versus bottom, left versus right, and edge versus center) were analyzed and the ARL results are presented in Tables 24-4, 24-5, and 24-6.
1 1.9 2.2
2 1 1
2.5 1 1
3 1 1
0.5 23.4 47
1 3.9 10
2 1.1 1.7
2.5 1 1.2
3 1 1
1 4.6 13.6
2 1.1 2.2
2.5 1 1.4
3 1 1.1
Table 24- 4. Average Run Length Comparison between Traditional and Orthogonal
Contrast Charts for a shift in the Edge of all Wafers. In this chart m = 4 and n = 7.
1 4.6 9.8
2 1.1 1.7
2.5 1 1.2
3 1 1
Table 24-5. Average Run Length Comparison between Traditional and Orthogonal
Contrast Charts for a change in the Top Wafer. In this chart m = 4 and n = 7.
Size of Shift in Multiples of σ    Top versus Bottom Contrast    Traditional Chart
1 5.5 13.8
1.5 2 4.7
2 1.2 2.2
2.5 1 1.4
3 1 1.1
Table 24-6. Average Run Length Performance of Traditional and Orthogonal Contrast
Charts for a shift in the left side of all Wafers. In this chart m = 4 and n = 7.
1 4.6 9.8
2 1.1 1.7
2.5 1 1.2
3 1 1
Decreasing the number of measurements per wafer has increased the relative importance
of the changes in the mean of a subset of the observations and the traditional control
charts signal the shift faster than in the previous example. Still, note that the control
charts based on orthogonal contrasts represent a considerable improvement over the
traditional approach.
different because the instruments are in different physical locations. Then operators are
nested within instruments, and the experiment has been conducted as a nested design.
As another example, suppose that the operators are not selected at random, because the
specific operators used in the study are the only ones that actually perform the
measurements. This is a mixed model experiment, and the random effects approach that
the tabular method is based on is inappropriate. The random effects model analysis of
variance approach in the text is also inappropriate for this situation. Dolezal, Burdick,
and Birch (1998) and Montgomery (1997) discuss the mixed model analysis of variance
for gage R & R studies.
The tabular approach does not lend itself to constructing confidence intervals on the
variance components or functions of the variance components of interest. For that reason
we do not recommend the tabular approach for general use. There are three general
approaches to constructing these confidence intervals: (1) the Satterthwaite method, (2)
the maximum likelihood large-sample method, and (3) the modified large sample
method. Montgomery (1997) gives an overview of these different methods. Of the three
approaches, there is good evidence that the modified large sample approach is the best in
the sense that it produces confidence intervals that are closest to the stated level of
confidence.
Hamada and Weerahandi (2000) show how generalized inference can be applied to the
problem of determining confidence intervals in measurement systems capability studies.
The technique is somewhat more involved than the three methods referenced above.
Either numerical integration or simulation must be used to find the desired confidence
intervals.
While the tabular method should be abandoned, the control charting aspect of
measurement systems capability studies should be used more consistently. All too often
a measurement study is conducted and analyzed via some computer program without
adequate graphical analysis of the data. Furthermore, some of the advice in various
quality standards and reference sources regarding these studies is just not very good and
can produce results of questionable validity.
28. The Brook and Evans Markov Chain Approach to Finding the Average Run
Length of the Cusum and EWMA Control Charts
When the observations drawn from the process are independent, average run lengths or
ARLs are easy to determine for Shewhart control charts because the points plotted on the
chart are independent. The distribution of run length is geometric, so the ARL of the
chart is just the mean of the geometric distribution, or 1/p, where p is the probability that
a single point plots outside the control limits.
The sequence of plotted points on Cusum and EWMA charts is not independent, so
another approach must be used to find the ARLs. The Markov chain approach developed
by Brook and Evans (1972) is very widely used. We give a brief discussion of this
procedure for a one-sided Cusum.
The Cusum control chart statistic C⁺ (or C⁻) forms a Markov process with a continuous state space. By discretizing the continuous random variable C⁺ (or C⁻) with a finite set of values, approximate ARLs can be obtained from Markov chain theory. For the upper one-sided Cusum with upper decision interval H, the intervals are defined as follows:

$(-\infty, w/2],\; [w/2, 3w/2],\; \ldots,\; [(k - 1/2)w, (k + 1/2)w],\; \ldots,\; [(m - 3/2)w, H],\; [H, \infty)$

where m + 1 is the number of states and w = 2H/(2m − 1). The elements of the transition probability matrix of the Markov chain $\mathbf{P} = [p_{ij}]$ are

$p_{i0} = \int_{-\infty}^{w/2} f(x - iw + k)\,dx, \quad i = 0, 1, \ldots, m-1$

$p_{ij} = \int_{(j-1/2)w}^{(j+1/2)w} f(x - iw + k)\,dx, \quad i = 0, 1, \ldots, m-1; \; j = 1, 2, \ldots, m-1$

$p_{im} = \int_{H}^{\infty} f(x - iw + k)\,dx, \quad i = 0, 1, \ldots, m-1$

$p_{mj} = 0, \quad j = 0, 1, \ldots, m-1$

$p_{mm} = 1$

The absorbing state is m and f denotes the probability density function of the variable that is being monitored with the Cusum.

From the theory of Markov chains, the expected first passage times from state i to the absorbing state are

$\mu_i = 1 + \sum_{j=0}^{m-1} p_{ij}\mu_j, \quad i = 0, 1, \ldots, m-1$

Thus, $\mu_i$ is the ARL given that the process started in state i. Let Q be the matrix of transition probabilities obtained by deleting the last row and column of P. Then the vector of ARLs μ is found by computing

$\boldsymbol{\mu} = (\mathbf{I} - \mathbf{Q})^{-1}\mathbf{1}$

where 1 is an m × 1 vector of 1s and I is the m × m identity matrix.
When the process is out of control, this procedure gives a vector of initial-state (or zero-
state) ARLs. That is, the process shifts out of control at the initial start-up of the control
chart. It is also possible to calculate steady-state ARLs that describe performance
assuming that the process shifts out of control after the control chart has been operating
for a long period of time. There is typically very little difference between initial-state and
steady-state ARLs.
Let P(n, i) be the probability that the run length takes on the value n given that the chart started in state i. Collect these quantities into a vector, say

$\mathbf{p}_n' = [P(n, 0), P(n, 1), \ldots, P(n, m-1)]$

for n = 1, 2, …. These probabilities can be calculated by solving the following equations:

$\mathbf{p}_1 = (\mathbf{I} - \mathbf{Q})\mathbf{1}$
$\mathbf{p}_n = \mathbf{Q}\,\mathbf{p}_{n-1}, \quad n = 2, 3, \ldots$
This technique can be used to calculate the probability distribution of the run length, given that the control chart started in state i. Some authors believe that the distribution of run length or its percentiles is more useful than the ARL, since the distribution of run length is usually highly skewed and so the ARL may not be a "typical" value in any sense.
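The following sketch implements the Brook and Evans approach for an upper one-sided Cusum with normally distributed observations. The reference value k, decision interval H, and number of states m are illustrative choices, and the transition probabilities follow the discretization described above.

```python
import numpy as np
from scipy.stats import norm

def cusum_arl(k=0.5, H=5.0, shift=0.0, m=100):
    """Brook-Evans Markov chain approximation to the zero-state ARL of an
    upper one-sided Cusum with reference value k and decision interval H,
    applied to normal(shift, 1) observations; m is the number of transient states."""
    w = 2.0 * H / (2.0 * m - 1.0)
    Q = np.zeros((m, m))
    for i in range(m):
        # From state i (Cusum near i*w), the next Cusum value is i*w + x - k.
        Q[i, 0] = norm.cdf(w / 2.0 - i * w + k, loc=shift)
        for j in range(1, m):
            upper = (j + 0.5) * w - i * w + k
            lower = (j - 0.5) * w - i * w + k
            Q[i, j] = norm.cdf(upper, loc=shift) - norm.cdf(lower, loc=shift)
    # Vector of ARLs: mu = (I - Q)^(-1) * 1; the first entry is the zero-state ARL.
    arl = np.linalg.solve(np.eye(m) - Q, np.ones(m))
    return arl[0]

print(cusum_arl(shift=0.0))   # in-control ARL, roughly 940 for this one-sided scheme
print(cusum_arl(shift=1.0))   # roughly 10.4 when the mean shifts by one sigma
```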
29. Integral Equations Versus Markov Chains for Finding the ARL
Two methods are used to find the ARL distribution of control charts, the Markov chain
method and an approach that uses integral equations. The Markov chain method is
described in Section 28 of the Supplemental Text Material. This section gives an
overview of the integral equation approach for the Cusum control chart. Some of the
notation defined in Section 28 will be used here.
Let P(n, u) and R(u) be the probability that the run length takes on the value n and the ARL for the Cusum when the procedure begins with initial value u. For the one-sided upper Cusum

$P(1, u) = 1 - \int_{-\infty}^{H} f(x - u + k)\,dx = 1 - \int_{-\infty}^{w/2} f(x - u + k)\,dx - \sum_{j=1}^{m-1}\int_{(j-1/2)w}^{(j+1/2)w} f(x - u + k)\,dx$

and

$P(n, u) = P(n-1, 0)\int_{-\infty}^{0} f(x - u + k)\,dx + \int_{0}^{H} P(n-1, y)\, f(y - u + k)\,dy$
$= P(n-1, 0)\int_{-\infty}^{0} f(x - u + k)\,dx + \int_{0}^{w/2} P(n-1, y)\, f(y - u + k)\,dy + \sum_{j=1}^{m-1}\int_{(j-1/2)w}^{(j+1/2)w} P(n-1, y)\, f(y - u + k)\,dy$
$\approx P(n-1, 0)\int_{-\infty}^{0} f(x - u + k)\,dx + P(n-1, H_0)\int_{0}^{w/2} f(x - u + k)\,dx + \sum_{j=1}^{m-1} P(n-1, H_j)\int_{(j-1/2)w}^{(j+1/2)w} f(x - u + k)\,dx$
But these last equations are just the equations used for calculating the probabilities of
first-passage times in a Markov chain. Therefore, the solution to the integral equation
approach involves solving equations identical to those used in the Markov chain
procedure.
Champ and Rigdon (1991) give an excellent discussion of the Markov chain and integral
equation techniques for finding ARLs for both the Cusum and the EWMA control charts.
They observe that the Markov chain approach involves obtaining an exact solution to an
approximate formulation of the ARL problem, while the integral equation approach
involves finding an approximate solution to the exact formulation of the ARL problem.
They point out that more accurate solutions can likely be found via the integral equation
approach. However, there are problems for which only the Markov chain method will
work, such as the case of a drifting mean.