Process Capability
In this chapter we give the mathematical background of process capability analysis, in particular
the capability indices Cp and Cpk and related topics like tolerance intervals and density estimation.
For a detailed mathematical account of capability indices, we refer to [13].
If the process is not centred, then the expected proportion of non-conforming items will be higher
than the value of Cp seems to indicate. Therefore the following index has been introduced for
non-centred processes:

Cpk = min( (USL − µ)/(3σ), (µ − LSL)/(3σ) ).
Using the identity min(a, b) = ½(a + b − |a − b|), we obtain the following representations:

Cpk = min(USL − µ, µ − LSL) / (3σ)
    = ( d − |µ − ½(LSL + USL)| ) / (3σ)
    = ( 1 − |µ − ½(LSL + USL)| / d ) Cp.    (1.2)
We immediately read off from (1.2) that Cp ≥ Cpk. Moreover, since Cp = d/(3σ), we also have that Cp = Cpk if and only if the process is centred. The notation

k = |µ − ½(LSL + USL)| / d

is often used. It is also possible to define Cpk in terms of a target value T instead of the process mean µ.
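For concreteness, here is a minimal computational sketch of these definitions (the specification limits and process parameters below are hypothetical, not taken from the text):

def capability_indices(mu, sigma, lsl, usl):
    """Cp, Cpk and the offset k for a normal process with mean mu and standard deviation sigma."""
    d = (usl - lsl) / 2.0                         # half-width of the specification interval
    m = (usl + lsl) / 2.0                         # midpoint of the specification interval
    cp = (usl - lsl) / (6.0 * sigma)              # Cp = d / (3 sigma)
    cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)
    k = abs(mu - m) / d                           # so that Cpk = (1 - k) Cp
    return cp, cpk, k

if __name__ == "__main__":
    print(capability_indices(mu=10.2, sigma=0.5, lsl=8.0, usl=12.0))
    # Cpk <= Cp always, with equality exactly when the process is centred (mu = m).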
The expected proportion of non-conforming items for a non-centred process with a normal distribution can be expressed in terms of Cp and Cpk as follows (cf. (1.1)). The expected proportion equals Φ((LSL − µ)/σ) + 1 − Φ((USL − µ)/σ). Now assume that ½(USL + LSL) ≤ µ ≤ USL. Then Cpk = (USL − µ)/(3σ) and

(LSL − µ)/(3σ) = ( (USL − µ) − (USL − LSL) )/(3σ) = Cpk − 2Cp ≤ −Cpk,

because Cp ≥ Cpk. Hence, the expected proportion of non-conforming items can be expressed as Φ(3(Cpk − 2Cp)) + 1 − Φ(3Cpk).
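This expression can be checked numerically. The sketch below (with illustrative, hypothetical values for a non-centred process) compares the direct computation Φ((LSL − µ)/σ) + 1 − Φ((USL − µ)/σ) with the expression in terms of Cp and Cpk:

from math import erf, sqrt

def phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical, non-centred process with (USL + LSL)/2 <= mu <= USL.
mu, sigma, lsl, usl = 10.4, 0.5, 8.0, 12.0
cp = (usl - lsl) / (6.0 * sigma)
cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)

direct = phi((lsl - mu) / sigma) + 1.0 - phi((usl - mu) / sigma)
via_indices = phi(3.0 * (cpk - 2.0 * cp)) + 1.0 - phi(3.0 * cpk)
print(direct, via_indices)    # the two numbers coincide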
The rationale behind this definition is as follows. The estimate of θ is obtained by maximizing the likelihood function L(θ; x1, . . . , xn) as a function of θ for fixed x1, . . . , xn, while the estimate of τ(θ) is obtained by maximizing the induced likelihood function L(τ(θ); x1, . . . , xn) as a function of τ(θ) for fixed x1, . . . , xn.
The following theorem describes a useful invariance property of Maximum Likelihood estima-
tors. Note that the theorem does not require any assumption on the function τ .
Proof: Define τ⁻¹(ξ) := {θ ∈ Θ | τ(θ) = ξ} for any ξ in the range of τ. Obviously, θ ∈ τ⁻¹(τ(θ)) for all θ ∈ Θ. Hence, we have for any such ξ that

Mτ(ξ; x1, . . . , xn) = sup_{θ ∈ τ⁻¹(ξ)} L(θ; x1, . . . , xn)
                      ≤ sup_{θ ∈ Θ} L(θ; x1, . . . , xn)
                      = L(Θ̂; x1, . . . , xn)
                      = sup_{θ ∈ τ⁻¹(τ(Θ̂))} L(θ; x1, . . . , xn)
                      = Mτ(τ(Θ̂); x1, . . . , xn).

Thus τ(Θ̂) maximizes the induced likelihood function, as required. Inspection of the proof reveals that if Θ̂ is the unique MLE of (θ1, . . . , θk), then τ(Θ̂) is the unique MLE of τ((θ1, . . . , θk)). ¤
We now give some examples that illustrate how to use this invariance property in order to obtain
an MLE of a function of a parameter.
Examples 1.2.3 Let X, X1 , X2 , . . . , Xn be independent random variables, each distributed ac-
cording to the normal distribution with parameters µ and σ 2 . Let Z be a standard normal random
variable with distribution function Φ. Recall that the ML estimators for µ and σ² are µ̂ = X̄ and σ̂² = (1/n) Σ_{i=1}^n (Xᵢ − X̄)², respectively.
b) Suppose we want to estimate 1/σ, when µ is unknown. Theorem 1.2.2 with Θ = (0, ∞) and τ(x) = 1/√x yields that the MLE of 1/σ equals ( (1/n) Σ_{i=1}^n (Xᵢ − X̄)² )^{−1/2}. The MLEs for Cp and Cpk easily follow from the MLE of 1/σ and are given by (USL − LSL)/(6σ̂) and min(USL − X̄, X̄ − LSL)/(3σ̂).
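As an illustration, here is a small sketch of these plug-in MLEs computed from a sample (the data and specification limits are hypothetical); note the divisor n in the estimate of σ, as required by maximum likelihood:

import numpy as np

def mle_capability(x, lsl, usl):
    """ML estimates of Cp and Cpk, using the MLE of sigma (divisor n, not n - 1)."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    sigma_hat = np.sqrt(np.mean((x - xbar) ** 2))    # MLE of sigma
    cp_hat = (usl - lsl) / (6.0 * sigma_hat)
    cpk_hat = min(usl - xbar, xbar - lsl) / (3.0 * sigma_hat)
    return cp_hat, cpk_hat

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.2, scale=0.5, size=50)    # hypothetical process data
print(mle_capability(sample, lsl=8.0, usl=12.0))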
c) Let p be an arbitrary number between 0 and 1 and assume that both µ and σ² are unknown. Suppose that we want to estimate the p-th quantile of X, that is, we want to estimate the unique number x_p such that P(X ≤ x_p) = p. Since

p = P(X ≤ x_p) = P( Z ≤ (x_p − µ)/σ ) = Φ( (x_p − µ)/σ ),

it follows that x_p = µ + z_p σ, where z_p := Φ⁻¹(p). Thus Theorem 1.2.2 with Θ = R × (0, ∞) and τ(x, y) = x + z_p √y yields that the MLE of x_p equals X̄ + z_p σ̂, where σ̂ is as in a).
d) Let a < b be arbitrary real numbers and assume that both µ and σ² are unknown. Suppose we want to estimate P(a < X < b) = F(b) − F(a). Since

P(a < X < b) = P( (a − µ)/σ < Z < (b − µ)/σ ) = Φ( (b − µ)/σ ) − Φ( (a − µ)/σ ),

Theorem 1.2.2 with Θ = R × (0, ∞) and τ(x, y) = Φ( (b − x)/√y ) − Φ( (a − x)/√y ) yields that the MLE for P(a < X < b) equals

Φ( (b − X̄)/σ̂ ) − Φ( (a − X̄)/σ̂ ),

where σ̂ is as in a).
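The invariance property thus reduces both estimation problems to plugging X̄ and σ̂ into the corresponding function of (µ, σ). A brief sketch, with hypothetical data and using scipy for Φ and Φ⁻¹:

import numpy as np
from scipy.stats import norm

def mle_quantile_and_prob(x, p, a, b):
    """MLEs of the p-th quantile x_p and of P(a < X < b), obtained via the invariance property."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    sigma_hat = np.sqrt(np.mean((x - xbar) ** 2))    # MLE of sigma (divisor n)
    x_p = xbar + norm.ppf(p) * sigma_hat             # plug-in MLE of the p-th quantile
    prob = norm.cdf((b - xbar) / sigma_hat) - norm.cdf((a - xbar) / sigma_hat)
    return x_p, prob

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=100)    # hypothetical data
print(mle_quantile_and_prob(sample, p=0.95, a=-1.0, b=1.0))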
Since (X̄, S) are jointly complete sufficient statistics, the Rao–Blackwell theorem in combination with the Lehmann–Scheffé theorem yields that E(Y | X̄, S) is a UMVU estimator of the proportion of non-conforming items. For various explicit formulas for this quantity, we refer to [6, 7, 23].
The corresponding estimators of Cp and Cpk based on a sample are

Ĉp = (USL − LSL)/(6S)

and

Ĉpk = min( (USL − X̄)/(3S), (X̄ − LSL)/(3S) ),

where X̄ denotes the sample mean and S denotes the sample standard deviation.
Confidence intervals and hypothesis tests for Cp easily follow from the identity

P( Ĉp/Cp > c ) = P( χ²_{n−1} < (n − 1)/c² ).
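For example, the identity yields a lower confidence bound for Cp. The following sketch (one common construction, with hypothetical data) computes a 100(1 − α)% lower bound Ĉp √(χ²_{n−1;α}/(n − 1)), where χ²_{n−1;α} denotes the α-quantile of the χ²_{n−1}-distribution:

import numpy as np
from scipy.stats import chi2

def cp_lower_bound(x, lsl, usl, alpha=0.05):
    """100(1 - alpha)% lower confidence bound for Cp, based on (n - 1) S^2 / sigma^2 ~ chi^2_{n-1}."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)                                # sample standard deviation S
    cp_hat = (usl - lsl) / (6.0 * s)
    q = chi2.ppf(alpha, df=n - 1)                    # alpha-quantile of the chi^2_{n-1} distribution
    return cp_hat * np.sqrt(q / (n - 1))

rng = np.random.default_rng(2)
sample = rng.normal(10.0, 0.4, size=30)              # hypothetical data
print(cp_lower_bound(sample, lsl=8.0, usl=12.0))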
where Tν (λ) denotes a random variable distributed according to the noncentral t-distribution with
ν degrees of freedom and noncentrality parameter λ.
Proof: Recall that nσ̂²/σ² follows a χ²-distribution with n degrees of freedom. Combining this with the definition of the noncentral t-distribution (see Definition 1.3.1), we obtain

P( X̄ + z_p σ̂ ≤ t ) = P( (X̄ − µ)/(σ/√n) + z_p √n σ̂/σ ≤ √n (t − µ)/σ )
                    = P( Z + √n (µ − t)/σ ≤ −z_p √n σ̂/σ )
                    = P( T_n( √n (µ − t)/σ ) ≤ −z_p √n ).
¤
With this theorem, we can compute the asymptotic distribution of many estimators, in particular
those discussed above. Among other things, this is useful for constructing confidence intervals
(especially when the finite sample distribution is intractable).
We start with the asymptotic distribution of the sample variance. We first recall a multivariate
version of the Central Limit Theorem.
Proof: Because the variance does not depend on the mean, we assume without loss of generality that µ = 0. Since we have finite fourth moments, we infer from the multivariate Central Limit Theorem (Theorem 1.4.2) that

√n [ ( (1/n) Σ_{i=1}^n Xᵢ , (1/n) Σ_{i=1}^n Xᵢ² ) − (0, σ²) ] →d N(0, Σ).

The last statement follows from the fact that for zero-mean normal distributions µ₄ = 3σ⁴. ¤
Theorem 1.4.4 Let X, X1, X2, . . . be independent normal random variables with mean µ and variance σ². If µ and σ² are unknown, then the following asymptotic result holds for the MLE X̂_p = X̄ + z_p σ̂ of x_p:

√n ( X̂_p − (µ + z_p σ) ) →d N( 0, σ²(1 + ½ z_p²) ).

Proof: The Central Limit Theorem yields that √n (X̄ − µ) →d N(0, σ²). Combining Theorems 1.4.1 and 1.4.3, we have that √n (σ̂ − σ) →d N(0, ½σ²). Now recall that since we have a normal sample, X̄ and σ̂ are independent. Hence, it follows from Slutsky's Lemma that

√n ( X̄ + z_p σ̂ − µ − z_p σ ) →d N(0, σ²) ∗ N(0, ½ z_p² σ²) = N( 0, σ²(1 + ½ z_p²) ). ¤
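The asymptotic variance σ²(1 + ½z_p²) can be checked by simulation. A rough Monte Carlo sketch (sample size, number of replications and parameter values are arbitrary illustrative choices):

import numpy as np
from scipy.stats import norm

# Monte Carlo check of the limit in Theorem 1.4.4 (all parameter values are illustrative).
rng = np.random.default_rng(3)
mu, sigma, p, n, reps = 0.0, 1.0, 0.95, 200, 20000
z_p = norm.ppf(p)

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
sigma_hat = np.sqrt(((samples - xbar[:, None]) ** 2).mean(axis=1))   # MLE of sigma per replication
xp_hat = xbar + z_p * sigma_hat                                      # MLE of the p-th quantile

empirical = n * xp_hat.var()                          # approximates the variance of sqrt(n)(X_p-hat - x_p)
theoretical = sigma ** 2 * (1.0 + 0.5 * z_p ** 2)
print(empirical, theoretical)                         # the two values should be close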
Theorem 1.4.5 Let X, X1, X2, . . . be independent normal random variables with mean µ and variance σ² and let ϕ be the standard normal density. If µ and σ² are unknown, then the following asymptotic result holds for the MLE of P(a < X < b) = P((a, b)):

√n ( Φ((b − X̄)/σ̂) − Φ((a − X̄)/σ̂) − P(a < X < b) ) →d N( 0, σ²( c1² + 4 c1 c2 µ + 2 c2² (2µ² + σ²) ) ),

where

c1 = ϕ((b − µ)/σ) · (bµ − (µ² + σ²))/σ³ − ϕ((a − µ)/σ) · (aµ − (µ² + σ²))/σ³,
c2 = ϕ((b − µ)/σ) · (µ − b)/(2σ³) − ϕ((a − µ)/σ) · (µ − a)/(2σ³).
Proof: We infer from the multivariate Central Limit Theorem (Theorem 1.4.2) that

√n [ ( (1/n) Σ_{i=1}^n Xᵢ , (1/n) Σ_{i=1}^n Xᵢ² ) − (µ, µ² + σ²) ] →d N(0, Σ),

where Σ is the covariance matrix of X and X². Since E Z⁴ = 3 and X =d µ + σZ, where Z is a standard normal random variable, we have

E X³ = µ³ + 3µσ²,
Var X² = µ⁴ + 6µ²σ² + 3σ⁴ − (µ² + σ²)² = 2σ²(2µ² + σ²).

Hence,

Σ = ( σ²        2µσ²
      2µσ²      2σ²(2µ² + σ²) ).
Now we wish to apply Theorem 1.4.1 with

g(x, y) = Φ( (b − x)/√(y − x²) ) − Φ( (a − x)/√(y − x²) ).

Evaluating the partial derivatives of g at x = µ and y = µ² + σ² (with ϕ the standard normal density), we see that they reduce to

∂g/∂x = ϕ((b − µ)/σ) · (bµ − (µ² + σ²))/σ³ − ϕ((a − µ)/σ) · (aµ − (µ² + σ²))/σ³,
∂g/∂y = ϕ((b − µ)/σ) · (µ − b)/(2σ³) − ϕ((a − µ)/σ) · (µ − a)/(2σ³).

Putting everything together yields the result. ¤
Many practical situations require knowledge about the location of the complete distribution. E.g.,
one would like to construct intervals that cover a certain percentage of a distribution. Such
intervals are known as tolerance intervals. Although they are of great practical importance, this
topic is ignored in many textbooks. Many practical applications (and theory) can be found in [1].
The monograph [9], the review paper [16] and the bibliographies [10, 11] are also excellent sources
of information on this topic.
In this section we will give an introduction to tolerance intervals based on the normal distribution.
It is also possible to construct intervals for other distributions (see e.g., [1]).
The random variable P (T (X1 , . . . , Xn )) is called the coverage of the tolerance interval.
This type of tolerance interval is sometimes called a guaranteed content interval. There also exists
a one-sided version of this type of tolerance interval.
There are interesting relations between these concepts and quantiles. Let X1 , . . . , Xn be a sample
from a continuous distribution P with distribution function F . Since
P (F (U ) ≤ β) = P (U ≤ F −1 (β)),
it follows immediately that an upper (lower) (α, β) tolerance limit is an upper (lower) confidence
interval for the quantile F −1 (β) and vice-versa.
Prediction intervals are usually associated with regression analysis, but also appear in other con-
texts as we shall see. The following proposition shows a surprising link between β-expectation
tolerance intervals and prediction intervals. It also has interesting corollaries as we shall see later
on.
E (E (Y | X)) = E Y.
Hence, rewriting the probability in the definition of prediction interval in terms of an expectation,
we obtain:
as required. ¤
Proof: We first have to show that the expectation is finite. Note that the coverage may be written in this case as

F(X̄ + kS) − F(X̄ − kS) = Φ( (X̄ − µ)/σ + k S/σ ) − Φ( (X̄ − µ)/σ − k S/σ ).

Since 0 ≤ Φ(x) ≤ 1 for all x ∈ R, it suffices to show that the expectations of X̄ and S are finite. The first expectation is trivial, while the second one follows from Exercise 4.
Now Proposition 1.5.5 yields that it suffices to choose k such that (X̄ − kS, X̄ + kS) is a 100β% prediction interval. In other words, k must be chosen such that

P( X̄ − kS < X < X̄ + kS ) = β,

where X is independent of X1, . . . , Xn, but follows the same normal distribution. Hence, X − X̄ =d N(0, σ²(1 + 1/n)). Thus we have the following equalities:

β = P( X̄ − kS < X < X̄ + kS )
  = P( −k < (X − X̄)/S < k )
  = P( −k < √(1 + 1/n) T_{n−1} < k ),

from which we see that we must choose k = √(1 + 1/n) t_{n−1;(1−β)/2}. ¤
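A small sketch of the resulting β-expectation tolerance interval (X̄ − kS, X̄ + kS), with hypothetical data; here t_{n−1;(1−β)/2} is obtained as the (1 + β)/2 quantile of the t_{n−1}-distribution:

import numpy as np
from scipy.stats import t

def beta_expectation_interval(x, beta=0.95):
    """beta-expectation tolerance interval (Xbar - k S, Xbar + k S) for a normal sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar, s = x.mean(), x.std(ddof=1)
    # upper (1 - beta)/2 critical point of t_{n-1}, i.e. its (1 + beta)/2 quantile
    k = np.sqrt(1.0 + 1.0 / n) * t.ppf((1.0 + beta) / 2.0, df=n - 1)
    return xbar - k * s, xbar + k * s

rng = np.random.default_rng(4)
sample = rng.normal(10.0, 0.5, size=25)               # hypothetical data
print(beta_expectation_interval(sample, beta=0.95))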
For several explicit tolerance intervals under different assumptions, we refer to the exercises. There
is no closed solution for (α, β) tolerance intervals for normal distributions when both µ and σ 2 are
unknown; numerical procedures for this problem can be found in [4].
where |Ij | denotes the length of the interval Ij . It is clear that the histogram is a stepwise constant
function. Two major disadvantages of the histogram are
• the stepwise constant nature of the histogram
• the fact that the histogram heavily depends on the choice of the partition
In order to illustrate the last point, consider the following two histograms that represent the same
data set:
[Figure: two histograms, based on different partitions, of the same sample of size 50 from a mixture of 2 normal distributions.]
It is because of this phenomenon that histograms are not to be recommended. A natural way to
improve on histograms is to get rid of the fixed partition by putting an interval around each point.
If h > 0 is fixed, then

N̂_n(x) := P_n( (x − h, x + h) ) / (2h)    (1.10)
is called the naive density estimator and was introduced in 1951 by Fix and Hodges in an unpub-
lished report (reprinted in [5]) dealing with discriminant analysis. The motivation for the naive
estimator is that

P(x − h < X < x + h) = ∫_{x−h}^{x+h} f(t) dt ≈ 2h f(x).    (1.11)
Note that the naive estimator is a local procedure; it uses only the observations close to the point
at which one wants to estimate the unknown density. Compare this with the empirical distribution
function, which uses all observations to the right of the point at which one is estimating.
It is intuitively clear from (1.11) that the bias of N̂_n decreases as h tends to 0. However, if h tends to 0, then one is using fewer and fewer observations, and hence the variance of N̂_n increases.
This phenomenon occurs often in density estimation. The optimal value of h is a compromise
between the bias and the variance. We will return to this topic of great practical importance when
we discuss the MSE.
The naive estimator is a special case of the following class of density estimators. Let K be a kernel function, that is, a nonnegative function such that

∫_{−∞}^{∞} K(x) dx = 1.    (1.12)

The kernel estimator with kernel K and bandwidth h is defined by

f̂_n(x) := (1/n) Σ_{i=1}^n (1/h) K( (x − Xᵢ)/h ).    (1.13)
Thus, the kernel indicates the weight that each observation receives in estimating the unknown
density. It is easy to verify that kernel estimators are densities and that the naive estimator is a
kernel estimator with kernel

K(x) = ½ if |x| < 1, and K(x) = 0 otherwise.
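A direct (if slow) implementation of (1.13) is straightforward; the sketch below supports the Gaussian and the naive/rectangular kernel and uses illustrative data from a mixture of two normal distributions:

import numpy as np

def kernel_estimate(x_grid, data, h, kernel="gaussian"):
    """Kernel density estimate (1/(n h)) sum_i K((x - X_i)/h), evaluated on a grid of points."""
    u = (np.asarray(x_grid, dtype=float)[:, None] - np.asarray(data, dtype=float)[None, :]) / h
    if kernel == "gaussian":
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    elif kernel == "naive":                           # rectangular kernel: the naive estimator
        k = 0.5 * (np.abs(u) < 1.0)
    else:
        raise ValueError("unknown kernel")
    return k.mean(axis=1) / h

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(-2, 1, 25), rng.normal(2, 1, 25)])   # mixture of 2 normals
grid = np.linspace(-5.0, 5.0, 201)
f_hat = kernel_estimate(grid, data, h=0.6)
print(f_hat.sum() * (grid[1] - grid[0]))              # integrates to approximately 1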
Remark 1.6.1 The kernel estimator can also be written in terms of the empirical distribution
function F_n:

f̂_n(x) = ∫_{−∞}^{∞} (1/h) K( (x − y)/h ) dF_n(y),
where the integral is a Stieltjes integral.
Examples of other kernels include:

name                 function
Gaussian             (1/√(2π)) e^{−x²/2}
naive/rectangular    ½ · 1_{(−1,1)}(x)
triangular           (1 − |x|) · 1_{(−1,1)}(x)
biweight             (15/16) (1 − x²)² · 1_{(−1,1)}(x)
Epanechnikov         (3/4) (1 − x²) · 1_{(−1,1)}(x)
The kernel estimator is a widely used density estimator. A good impression of kernel estimation
is given by the books [20] and [22]. For other types of estimators, we refer to [20] and [21].
Proof: This follows from the fact that for a random variable X with density f, we have E g(X) = ∫_{−∞}^{∞} g(x) f(x) dx. ¤
( f̂_n(x) )² = (1/n²) Σ_{i=1}^n (1/h²) K²( (x − Xᵢ)/h ) + (1/n²) Σ_{i≠j} (1/h²) K( (x − Xᵢ)/h ) K( (x − Xⱼ)/h ).

Then

E ( f̂_n(x) )² = (1/(nh²)) ∫_{−∞}^{∞} K²( (x − y)/h ) f(y) dy + ((n − 1)/(nh²)) ( ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy )².

Next use Theorem 4.2 and the well-known fact that Var X = E X² − (E X)². ¤
The following general result due to Rosenblatt (see [19] for a slightly more general result) shows
that we cannot have unbiasedness for all x.
Theorem 1.6.4 (Rosenblatt [19]) A kernel estimator cannot be unbiased for all x ∈ R.

Proof: We argue by contradiction. Assume that E f̂_n(x) = f(x) for all x ∈ R. Then ∫_a^b f̂_n(x) dx is an unbiased estimator for F(b) − F(a), since

E ∫_a^b f̂_n(x) dx = ∫_a^b E f̂_n(x) dx = ∫_a^b f(x) dx = F(b) − F(a),
where the interchange of integrals is allowed since the integrand is positive. Now it can be shown
that the only unbiased estimator of F (b) − F (a) symmetric in X1 , . . . , Xn is Fn (b) − Fn (a). This
leads to a contradiction, since it implies that the empirical distribution function is differentiable.
¤
For point estimators, the MSE is a useful concept. We now generalize this concept to density
estimators.
Theorem 1.6.6 For a kernel density estimator f̂_n with kernel K the MSE and MISE can be expressed as:

MSE_x(f̂_n) = (1/(nh²)) ∫_{−∞}^{∞} K²( (x − y)/h ) f(y) dy − (1/(nh²)) { ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy }²
              + ( (1/h) ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy − f(x) )².    (1.18)

MISE(f̂_n) = (1/(nh²)) ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} K²( (x − y)/h ) f(y) dy − { ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy }² ) dx
             + ∫_{−∞}^{∞} ( (1/h) ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy − f(x) )² dx.    (1.19)
Proof: Combination of Exercise 14 with formulas (1.14) and (1.15) yields the formula for the
MSE. Integrating this formula with respect to x, we obtain the formula for the MISE. ¤
The above formulas cannot in general be evaluated explicitly. When both the kernel and the unknown density are Gaussian, straightforward but tedious computations yield explicit formulas, as shown in [8]. These formulas were extended in [14] to the case of mixtures of normal distributions. Marron and Wand claim in [14] that the class of mixtures of normal distributions is very rich and that it is thus possible to perform exact calculations for many distributions. These
calculations can be used to choose an optimal bandwidth h (see [14] for details).
For other examples of explicit MSE calculations, we refer to [3] and the exercises.
We conclude this section with a note on the use of Fourier analysis. Recall that the convolution
of two functions g1 and g2 is defined as

(g1 ∗ g2)(x) := ∫_{−∞}^{∞} g1(t) g2(x − t) dt.

One of the elementary properties of the Fourier transform is that it transforms the complicated convolution operation into the elementary multiplication operation, i.e. F(g1 ∗ g2) = (F g1) · (F g2).
The formulas (1.14) and (1.15) show that E f̂_n(x) and Var f̂_n(x) can be expressed in terms of convolutions of the kernel with the unknown density. The exercises contain examples in which Fourier transforms yield explicit formulas for the mean and the variance of the kernel estimator.
Another (even more important) use of Fourier transforms is the computation of the kernel estimate
itself. Computing density estimates directly from the definition is often very time consuming.
Define the function u by

u(s) = (1/n) Σ_{j=1}^n e^{i s Xⱼ}.    (1.20)
Then the Fourier transform of the kernel estimator is the product of u and the Fourier transform of the kernel (see Exercise 18). Using the Fast Fourier Transform (FFT), one can efficiently compute
good approximations to the kernel estimates. For details we refer to [20, pp. 61-66] and [22,
Appendix D].
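As a rough illustration of the idea (a sketch only, not the exact algorithms described in [20] or [22]): bin the observations on a regular grid and compute the convolution of the bin counts with the kernel by FFT.

import numpy as np

def fft_kde(data, h, grid_size=512, pad=3.0):
    """Approximate Gaussian kernel estimate via binning and FFT-based (circular) convolution."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min() - pad * h, data.max() + pad * h
    grid = np.linspace(lo, hi, grid_size)
    dx = grid[1] - grid[0]
    counts, _ = np.histogram(data, bins=grid_size, range=(lo, hi))     # nearest-bin approximation
    # Gaussian kernel (1/h) K(./h) sampled on the periodic grid, centred at 0
    offsets = np.minimum(np.arange(grid_size), grid_size - np.arange(grid_size)) * dx
    kern = np.exp(-0.5 * (offsets / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    # circular convolution of the bin counts with the kernel, computed with the FFT
    f_hat = np.fft.irfft(np.fft.rfft(counts) * np.fft.rfft(kern), n=grid_size) / data.size
    return grid, f_hat

rng = np.random.default_rng(6)
grid, f_hat = fft_kde(rng.normal(size=500), h=0.3)
print(f_hat.sum() * (grid[1] - grid[0]))              # approximately 1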
Theorem 1.6.7 (Bochner) Let K be a bounded kernel function such that lim|y|→∞ y K(y) = 0.
Define for any absolutely integrable function g the functions
g_n(x) := (1/h_n) ∫_{−∞}^{∞} K( y/h_n ) g(x − y) dy,

where (h_n)_{n∈N} is a sequence of positive numbers such that lim_{n→∞} h_n = 0. If g is continuous at x, then we have

lim_{n→∞} g_n(x) = g(x).    (1.21)
Proof: Since ∫_{−∞}^{∞} (1/h_n) K( y/h_n ) dy = ∫_{−∞}^{∞} K(y) dy = 1, we may write

|g_n(x) − g(x)| = | g_n(x) − g(x) ∫_{−∞}^{∞} (1/h_n) K( y/h_n ) dy |
                ≤ ∫_{−∞}^{∞} |g(x − y) − g(x)| (1/h_n) K( y/h_n ) dy.
Let δ > 0 be arbitrary. We now split the integration interval into 2 parts: {y : |y| ≥ δ} and
{y : |y| < δ}. The first integral can be bounded from above by
∫_{|y|≥δ} ( |g(x − y)|/|y| ) ( |y|/h_n ) K( y/h_n ) dy + |g(x)| ∫_{|y|≥δ} (1/h_n) K( y/h_n ) dy
    ≤ ( sup_{|v|≥δ/h_n} |v K(v)| / δ ) ∫_{|y|≥δ} |g(x − y)| dy + |g(x)| ∫_{|t|≥δ/h_n} K(t) dt
    ≤ ( sup_{|v|≥δ/h_n} |v K(v)| / δ ) ∫_{−∞}^{∞} |g(u)| du + |g(x)| ∫_{|t|≥δ/h_n} K(t) dt.
Letting n → ∞ and using that K is absolutely integrable, we see that these terms can be made
arbitrarily small. The integral over the second region can be bounded from above by
sup_{|y|<δ} |g(x − y) − g(x)| ∫_{|t|<δ/h_n} K(t) dt ≤ sup_{|y|<δ} |g(x − y) − g(x)|.
Since this holds for all δ > 0 and g is continuous at x, the above expression can be made arbitrarily
small. ¤.
As a corollary, we obtain the following asymptotic results (taken from [17]) for the mean and
variance of the kernel estimator at a point x.
Corollary 1.6.8 (Parzen) Let f̂_n be a kernel estimator such that its kernel K is bounded and satisfies lim_{|y|→∞} y K(y) = 0. Then f̂_n is an asymptotically unbiased estimator for f at all continuity points x if lim_{n→∞} h_n = 0.
In the above corollary, there is no restriction on the rate at which (h_n)_{n∈N} converges to 0. The next corollaries show that if (h_n)_{n∈N} converges to 0 more slowly than n^{−1}, then f̂_n(x) is consistent in the sense that the MSE converges to 0.
Corollary 1.6.9 (Parzen) Let f̂_n be a kernel estimator such that its kernel K is bounded and satisfies lim_{|y|→∞} y K(y) = 0. If lim_{n→∞} h_n = 0 and x is a continuity point of the unknown density f, then

lim_{n→∞} n h_n Var f̂_n(x) = f(x) ∫_{−∞}^{∞} K²(y) dy.
Proof: First note that since K is bounded, K 2 also satisfies the conditions of Theorem 1.6.7.
Hence, the result follows from applying Theorem 1.6.7 and Exercise 21 to Formula (1.15). ¤.
Corollary 1.6.10 (Parzen) Let f̂_n be a kernel estimator such that its kernel K is bounded and satisfies lim_{|y|→∞} y K(y) = 0. If lim_{n→∞} h_n = 0, lim_{n→∞} n h_n = ∞ and x is a continuity point of the unknown density f, then

lim_{n→∞} MSE_x(f̂_n) = 0.

Proof: It follows from Corollary 1.6.9 that lim_{n→∞} Var f̂_n(x) = 0. The result now follows by combining Corollary 1.6.8 and Exercise 14. ¤
Although the above theorems give insight into the asymptotic behaviour of density estimators, they are not sufficient for practical purposes. Therefore, we now refine them by using Taylor expansions.
Theorem 1.6.11 Let f̂_n be a kernel estimator such that its kernel K is bounded and symmetric and such that ∫_{−∞}^{∞} |t³| K(t) dt exists and is finite. If the unknown density f has a bounded third derivative, then we have that

E f̂_n(x) = f(x) + ½ h² f″(x) ∫_{−∞}^{∞} t² K(t) dt + o(h²),    h ↓ 0    (1.22)

Var f̂_n(x) = (1/(nh)) f(x) ∫_{−∞}^{∞} K²(t) dt + o(1/(nh)),    h ↓ 0 and nh → ∞    (1.23)

MSE_x(f̂_n) = (1/(nh)) f(x) ∫_{−∞}^{∞} K²(t) dt + ¼ h⁴ (f″(x))² ( ∫_{−∞}^{∞} t² K(t) dt )² + o(1/(nh)) + o(h⁴),
    h ↓ 0 and nh → ∞.    (1.24)
Proof: By Formula (1.14) and a change of variables, we may write the bias as

E f̂_n(x) − f(x) = ∫_{−∞}^{∞} K(t) { f(x − th) − f(x) } dt.

Now Taylor's Theorem with the Lagrange form of the remainder says that

f(x − th) = f(x) − th f′(x) + ((th)²/2) f″(x) − ((th)³/3!) f‴(ξ),

where ξ depends on x, t, and h and is such that |x − ξ| < |th|. Since ∫_{−∞}^{∞} K(t) dt = 1, it follows that

E f̂_n(x) − f(x) = ∫_{−∞}^{∞} K(t) ( −th f′(x) + ((th)²/2) f″(x) − ((th)³/3!) f‴(ξ) ) dt,

which because of the symmetry of K simplifies to

E f̂_n(x) − f(x) = ∫_{−∞}^{∞} K(t) ( ((th)²/2) f″(x) − ((th)³/3!) f‴(ξ) ) dt.

If M denotes an upper bound for f‴, then the first result follows from

| E f̂_n(x) − f(x) − ½ h² f″(x) ∫_{−∞}^{∞} t² K(t) dt | ≤ (h³/3!) ∫_{−∞}^{∞} | t³ K(t) f‴(ξ) | dt ≤ (h³/3!) M ∫_{−∞}^{∞} |t³ K(t)| dt,

where the last term obviously is o(h²).
The asymptotic expansion of the variance follows immediately from Corollary 1.6.9. In order
to obtain the asymptotic expansion for the MSE, it suffices to combine Exercise 14 with Formu-
las (1.22) and (1.23). ¤
These asymptotic expressions are much easier to interpret than the exact expressions of the previous section. For example, we can now clearly see that the bias decreases when h is small and that the variance decreases when h is large (cf. our discussion of the naive density estimator).
Theorem 1.6.11 is essential for obtaining optimal choices of the bandwidth. If we assume that f″ is square integrable, then it follows from Formula (1.24) that for h ↓ 0 and nh → ∞:

MISE(f̂_n) = (1/(nh)) ∫_{−∞}^{∞} K²(t) dt + ¼ h⁴ ∫_{−∞}^{∞} (f″(x))² dx ( ∫_{−∞}^{∞} t² K(t) dt )² + o(1/(nh)) + o(h⁴).    (1.25)
The expression

(1/(nh)) ∫_{−∞}^{∞} K²(t) dt + ¼ h⁴ ∫_{−∞}^{∞} (f″(x))² dx ( ∫_{−∞}^{∞} t² K(t) dt )²    (1.26)
is called the asymptotic MISE, often abbreviated as AMISE. Note that Formula (1.26) is much
easier to understand than Formula (1.19). We now see (cf. Exercise 22) how to balance between
squared bias and variance in order to obtain a choice of h that minimizes the MISE:
h_AMISE = ( ∫_{−∞}^{∞} K²(t) dt / ( n ( ∫_{−∞}^{∞} t² K(t) dt )² ∫_{−∞}^{∞} (f″(x))² dx ) )^{1/5}.    (1.27)
An important drawback of Formula (1.27) is that it depends on ∫_{−∞}^{∞} (f″(x))² dx, which is unknown. However, there are good methods for estimating this quantity. For details, we refer to the literature ([20, 22]). An example of a simple method is given in Exercise 23.
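One simple possibility, presumably in the spirit of Exercise 23, is to evaluate ∫(f″)²(x) dx as if f were a normal density with the estimated standard deviation; for the Gaussian kernel this gives a normal-reference bandwidth of roughly 1.06 σ̂ n^{−1/5}. A sketch (the normal-reference assumption is ours, not a prescription from the text):

import numpy as np

def amise_bandwidth(r_kernel, mu2_kernel, r_f2, n):
    """h_AMISE from R(K) = int K^2, mu2 = int t^2 K(t) dt and R(f'') = int (f'')^2, cf. (1.27)."""
    return (r_kernel / (n * mu2_kernel ** 2 * r_f2)) ** 0.2

def normal_reference_bandwidth(data):
    """Gaussian kernel, with int (f'')^2 evaluated as if f were normal (an assumption about f)."""
    data = np.asarray(data, dtype=float)
    n, sigma_hat = data.size, data.std(ddof=1)
    r_kernel = 1.0 / (2.0 * np.sqrt(np.pi))               # int K^2 for the Gaussian kernel
    mu2_kernel = 1.0                                      # int t^2 K(t) dt for the Gaussian kernel
    r_f2 = 3.0 / (8.0 * np.sqrt(np.pi) * sigma_hat ** 5)  # int (f'')^2 for a N(mu, sigma^2) density
    return amise_bandwidth(r_kernel, mu2_kernel, r_f2, n) # roughly 1.06 * sigma_hat * n**(-1/5)

rng = np.random.default_rng(7)
print(normal_reference_bandwidth(rng.normal(size=400)))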
Given an optimal choice of the bandwidth h, we may wonder which kernel gives the smallest
MISE. It turns out that the Epanechnikov kernel is the optimal kernel. However, the other kernels
perform nearly as well, so that the optimality property of the Epanechnikov kernel is not very
important in practice. For details, we refer to [20, 22].
1.7 Exercises
In all exercises X, X1 , X2 , . . . are independent identically distributed normal random variables with
mean µ and variance σ 2 , unless otherwise stated.
Exercise 1 Assume that the main characteristic of a production process follows a normal distribution and that Cp equals 1.33.
a) What is the percentage of non-conforming items if the process is centred (that is, if µ = (USL + LSL)/2)?
Exercise 6 Construct a β-expectation tolerance interval in the trivial case when both µ and σ 2
are known.
Exercise 9 Construct a β-content tolerance interval at confidence level α in the trivial case when
both µ and σ 2 are known. What values can α take?
Exercise 11 Construct a β-content tolerance interval at confidence level α when µ is known and
σ 2 is unknown.
Exercise 16 Calculate the following estimators for µ and σ², respectively, where f̂_n is a kernel estimator for f with a symmetric kernel K (that is, K(x) = K(−x)):

a) µ̂ = ∫_{−∞}^{∞} x f̂_n(x) dx.

b) σ̂² = ∫_{−∞}^{∞} (x − µ̂)² f̂_n(x) dx.
Exercise 17 Verify by direct computation that the naive estimator is biased in general.
Exercise 18 Use (1.20) to find a formula for the Fourier transform of fbn .
Exercise 19 Suppose that K is a symmetric kernel, i.e. K(x) = K(−x). Show that MISE(f̂_n) equals

(1/(2πnh)) ∫_{−∞}^{∞} (FK)²(t) dt + (1/(2π)) ∫_{−∞}^{∞} { (1 − 1/n) (FK)²(ht) − 2 FK(ht) + 1 } |Ff(t)|² dt.
Exercise 20 The Laplace kernel is defined by K(x) := ½ e^{−|x|}. Use the results of the previous
exercise to derive an expression for the MISE of the kernel estimator with the Laplace kernel when
the density is an exponential density.
Exercise 22 Prove Formula (1.27) for the optimal bandwidth based on the AMISE.
How can this be used to select a bandwidth? What is the rationale behind this bandwidth choice?
Exercise 24 Suppose that we take hn = c n−γ where c > 0 and γ ∈ (0, 1). Which value of γ gives
the optimal rate of convergence for the MSE?
Bibliography
[1] J. Aitchison and I. Dunsmore, Statistical Prediction Analysis, Cambridge University Press,
1975.
[2] R.B. D’Agostino and M.A. Stephens (eds.), Goodness-of-fit Techniques, Marcel Dekker, New
York, 1986.
[4] K.R. Eberhardt, R.W. Mee and C.P. Reeve, Computing factors for exact two-sided tolerance
limits for a normal distribution, Communications in Statistics - Simulation and Computation
18 (1989), 397–413.
[5] E. Fix and J.L. Hodges, Discriminatory analysis - nonparametric discrimination: consistency
properties, International Statistical Reviews 57 (1989), 238–247.
[6] J.L. Folks, D.A. Pierce and C. Stewart, Estimating the fraction of acceptable product, Tech-
nometrics 7 (1965), 43–50.
[7] W.C. Guenther, A note on the Minimum Variance Unbiased estimate of the fraction of a
normal distribution below a specification limit, American Statistician 25 (1971), 18–20.
[8] M.J. Fryer, Some errors associated with the non-parametric estimation of density functions,
Journal of the Institute of Mathematics and its Applications 18 (1976), 371–380.
[9] I. Guttman, Statistical Tolerance regions: Classical and Bayesian, Charles Griffin, 1970.
[11] M. Jı́lek and H. Ackermann, A bibliography of statistical tolerance regions II, Statistics 20
(1989), 165–172.
[12] V.E. Kane, Process Capability Indices, J. Qual. Technol. 18 (1986), 41–52.
[13] S. Kotz and N.L. Johnson, Process Capability Indices, Chapman and Hall, London, 1993.
[14] J.S. Marron and M.P. Wand, Exact mean integrated squared error, Annals of Statistics 20
(1992), 712–736.
[15] D.B. Owen, A survey of properties and applications of the noncentral t-distribution, Techno-
metrics 10 (1968), 445–478.
[16] J.K. Patel, Tolerance limits - a review, Communications in Statistics A - Theory and Methods
15 (1986), 2719–2762.
[17] E. Parzen, On estimation of a probability density function and mode, Annals of Mathematical
Statistics 33 (1962), 1065–1076.
[18] E. Paulson, A note on control limits, Annals of Mathematical Statistics 14 (1943), 90–93.
[23] D.J. Wheeler, The variance of an estimator in variables sampling, Technometrics 12 (1970),
751–755.
[24] P.W. Zehna, Invariance of Maximum Likelihood estimation, Annals of Mathematical Statistics
37 (1966), 744.