
Chapter 1

Process Capability Analysis

In this chapter we give the mathematical background of process capability analysis, in particular
the capability indices Cp and Cpk and related topics like tolerance intervals and density estimation.
For a detailed mathematical account of capability indices, we refer to [13].

1.1 Capability indices


Usually the items produced by a production process have to meet customer requirements.¹ Re-
quirements may also be set by the government through legislation. It is therefore important to
know beforehand whether the inherent variation within the production process is such that it can
meet these requirements. Requirements are usually defined as specification limits. We denote
the upper specification limit by USL and the lower specification limit by LSL. Products that fall
outside specification limits are called non-conforming. Within SPC an investigation whether the
process can meet the requirements is called a Process Capability Analysis.
A straightforward way to describe process capability would be to use the sample mean and sample
standard deviation. As the natural bandwidth of a process one usually takes 6σ, which implicitly
assumes a normal distribution. For a random variable X which is normally distributed with
parameters µ and σ², it holds that P(X > µ + 3σ) = 0.00135, and thus P(µ − 3σ < X < µ + 3σ) = 0.9973.
This is a fairly arbitrary, but widely accepted, choice.
Whether a process fits within the 6σ bandwidth is often indicated in industry by so-called
Process Capability Indices. The simplest capability index is called Cp (in order to avoid confusion
with Mallows' regression diagnostic Cp one sometimes uses Pp) and is defined as

Cp = (USL − LSL) / (6σ).

Note that this quantity has the advantage of being dimensionless. The quantity 1/Cp is known as
the capability ratio (often abbreviated as CR). It will be convenient to write

d = (USL − LSL) / 2.
The capability index Cp is useful if the process is centred around the middle of the specification
interval. If that is the case, then the proportion of non-conforming items of a normally distributed
characteristic X equals

1 − P(LSL < X < USL) = 2Φ(−d/σ) = 2Φ(−3Cp).    (1.1)

If the process is not centred, then the expected proportion of non-conforming items will be higher
than the value of Cp seems to indicate. Therefore the following index has been introduced for
¹ In modern business a customer is any person that receives produced items. Hence, this may be another department within the same plant.


non-centred processes:
Cpk = min( (USL − µ)/(3σ), (µ − LSL)/(3σ) ).

Using the identity min(a, b) = ½((a + b) − |a − b|), we obtain the following representations:

Cpk = min(USL − µ, µ − LSL) / (3σ)
    = ( d − |µ − ½(LSL + USL)| ) / (3σ)
    = ( 1 − |µ − ½(LSL + USL)| / d ) Cp.    (1.2)

We immediately read off from (1.2) that Cp ≥ Cpk. Moreover, since Cp = d/(3σ), we also have
that Cp = Cpk if and only if the process is centred. The notation

k = |µ − ½(LSL + USL)| / d
is often used. It is also possible to define Cpk in terms of a target value T instead of the process
mean µ.
The expected proportion of non-conforming items for a non-centred process with a normal distribution can be expressed in terms of Cp and Cpk as follows (cf. (1.1)). The expected proportion equals Φ((LSL − µ)/σ) + 1 − Φ((USL − µ)/σ). Now assume that ½(USL + LSL) ≤ µ ≤ USL. Then Cpk = (USL − µ)/(3σ) and

(LSL − µ)/(3σ) = ( (USL − µ) − (USL − LSL) )/(3σ) = Cpk − 2Cp ≤ −Cpk,

because Cp ≥ Cpk. Hence, the expected proportion of non-conforming items can be expressed as

1 − P(LSL < X < USL) = Φ(−3(2Cp − Cpk)) + Φ(−3Cpk).    (1.3)
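
To make the formulas above concrete, here is a minimal Python sketch (the specification limits and process parameters are made-up illustration values, not taken from the text) that computes Cp, Cpk and the expected fraction of non-conforming items of a normal process via (1.1)–(1.3).

    # Capability indices and expected fraction non-conforming for a normal process.
    # All numerical inputs below are hypothetical illustration values.
    from scipy.stats import norm

    def capability(mu, sigma, lsl, usl):
        cp = (usl - lsl) / (6.0 * sigma)
        cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)
        # expected proportion non-conforming, cf. (1.3); for a centred process
        # this reduces to 2 * Phi(-3 * Cp), cf. (1.1)
        p_nc = norm.cdf(-3.0 * (2.0 * cp - cpk)) + norm.cdf(-3.0 * cpk)
        return cp, cpk, p_nc

    cp, cpk, p_nc = capability(mu=10.2, sigma=0.5, lsl=8.0, usl=12.0)
    print(f"Cp={cp:.3f}  Cpk={cpk:.3f}  expected fraction non-conforming={p_nc:.2e}")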

1.2 Estimation of capability indices


In this section we first recall some general estimation principles. These principles will be used to
obtain (optimal) estimators for capability indices.

Definition 1.2.1 Let P be a probability distribution depending on parameters θ1, . . . , θk, where θ = (θ1, . . . , θk) ranges over a set Θ ⊂ R^k, and let L(θ; x1, . . . , xn) be the likelihood function of a sample X1, . . . , Xn from P. Let τ be an arbitrary function on Θ. The function

Mτ(ξ; x1, . . . , xn) := sup_{θ∈Θ : τ(θ)=ξ} L(θ; x1, . . . , xn)

is the likelihood function induced by τ. Any value ξ that maximizes Mτ is said to be an MLE of τ(θ).

The rationale behind this definition is as follows. Estimation of θ is obtained by maximizing the
likelihood function L(θ; x1, . . . , xn) as a function of θ for fixed x1, . . . , xn, while estimation of τ(θ)
is obtained by maximizing the induced likelihood function Mτ(ξ; x1, . . . , xn) as a function of ξ
for fixed x1, . . . , xn.
The following theorem describes a useful invariance property of Maximum Likelihood estima-
tors. Note that the theorem does not require any assumption on the function τ .

Theorem 1.2.2 (Zehna [24]) Let P be a distribution depending on parameters θ1, . . . , θk and let Θ̂ = (Θ̂1, . . . , Θ̂k) be an MLE of (θ1, . . . , θk). If τ is an arbitrary function with domain Θ, then τ(Θ̂) is an MLE of τ((θ1, . . . , θk)). If moreover the MLE Θ̂ is unique, then τ(Θ̂) is unique too.

Proof: Define τ⁻¹(ξ) := {θ ∈ Θ | τ(θ) = ξ}. Obviously, θ ∈ τ⁻¹(τ(θ)) for all θ ∈ Θ. Hence, we have for any ξ that

Mτ(ξ; x1, . . . , xn) = sup_{θ ∈ τ⁻¹(ξ)} L(θ; x1, . . . , xn)
                     ≤ sup_{θ ∈ Θ} L(θ; x1, . . . , xn)
                     = L(Θ̂; x1, . . . , xn)
                     = sup_{θ ∈ τ⁻¹(τ(Θ̂))} L(θ; x1, . . . , xn)
                     = Mτ(τ(Θ̂); x1, . . . , xn).

Thus τ(Θ̂) maximizes the induced likelihood function, as required. Inspection of the proof reveals that if Θ̂ is the unique MLE of (θ1, . . . , θk), then τ(Θ̂) is the unique MLE of τ((θ1, . . . , θk)). □

We now give some examples that illustrate how to use this invariance property in order to obtain
an MLE of a function of a parameter.
Examples 1.2.3 Let X, X1, X2, . . . , Xn be independent random variables, each distributed according to the normal distribution with parameters µ and σ². Let Z be a standard normal random variable with distribution function Φ. Recall that the ML estimators for µ and σ² are µ̂ = X̄ and σ̂² = (1/n) Σ_{i=1}^{n} (Xi − X̄)², respectively.

a) Suppose we want to estimate σ, when µ is unknown. Theorem 1.2.2 with Θ = (0, ∞) and τ(x) = √x yields that the MLE σ̂ of σ equals √( (1/n) Σ_{i=1}^{n} (Xi − X̄)² ).

b) Suppose we want to estimate 1/σ, when µ is unknown. Theorem 1.2.2 with Θ = (0, ∞) and τ(x) = 1/√x yields that the MLE of 1/σ equals ( (1/n) Σ_{i=1}^{n} (Xi − X̄)² )^{−1/2}. The MLEs for Cp and Cpk easily follow from the MLE of 1/σ and are given by (USL − LSL)/(6σ̂) and min(USL − X̄, X̄ − LSL)/(3σ̂), respectively.
c) Let p be an arbitrary number between 0 and 1 and assume that both µ and σ² are unknown. Suppose that we want to estimate the p-th quantile of X, that is, we want to estimate the unique number xp such that P(X ≤ xp) = p. Since

p = P(X ≤ xp) = P( Z ≤ (xp − µ)/σ ) = Φ( (xp − µ)/σ ),

it follows that xp = µ + zp σ, where zp := Φ⁻¹(p). Thus Theorem 1.2.2 with Θ = R × (0, ∞) and τ(x, y) = x + zp √y yields that the MLE of xp equals X̄ + zp σ̂, where σ̂ is as in a).
d) Let a < b be arbitrary real numbers and assume that both µ and σ² are unknown. Suppose we want to estimate P(a < X < b) = F(b) − F(a). Since

P(a < X < b) = P( (a − µ)/σ < Z < (b − µ)/σ ) = Φ( (b − µ)/σ ) − Φ( (a − µ)/σ ),

Theorem 1.2.2 with Θ = R × (0, ∞) and τ(x, y) = Φ( (b − x)/√y ) − Φ( (a − x)/√y ) yields that the MLE for P(a < X < b) equals

Φ( (b − X̄)/σ̂ ) − Φ( (a − X̄)/σ̂ ),

where σ̂ is as in a).
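
As an illustration of the examples above, the following sketch (with made-up data and specification limits) computes the maximum likelihood estimates of σ, the p-th quantile and P(a < X < b), together with the ML-based plug-in estimates of Cp and Cpk from b).

    # MLE-based estimates from a normal sample; data and limits are hypothetical.
    import numpy as np
    from scipy.stats import norm

    x = np.array([9.8, 10.1, 10.4, 9.9, 10.2, 10.0, 10.3, 9.7])   # sample
    lsl, usl = 8.0, 12.0

    xbar = x.mean()
    sigma_hat = np.sqrt(np.mean((x - xbar) ** 2))     # MLE of sigma (divisor n)

    p = 0.95
    xp_hat = xbar + norm.ppf(p) * sigma_hat           # MLE of the p-th quantile

    a, b = 9.5, 10.5                                  # MLE of P(a < X < b)
    prob_hat = norm.cdf((b - xbar) / sigma_hat) - norm.cdf((a - xbar) / sigma_hat)

    cp_hat = (usl - lsl) / (6 * sigma_hat)            # ML-based Cp
    cpk_hat = min(usl - xbar, xbar - lsl) / (3 * sigma_hat)
    print(xp_hat, prob_hat, cp_hat, cpk_hat)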

It follows from d) that the ML estimator for the proportion of non-conforming items is given by

1 − Φ( (USL − X̄)/σ̂ ) + Φ( (LSL − X̄)/σ̂ ).

This is a biased estimator (cf. Exercise 5). Using the Rao-Blackwell Theorem, we can find an unbiased estimator. Define

Y = 0 if LSL < X1 < USL, and Y = 1 otherwise.

Since (X̄, S) are jointly complete sufficient statistics, the Rao-Blackwell theorem in combination with the Lehmann-Scheffé theorem yields that E(Y | X̄, S) is a UMVU estimator of the proportion of non-conforming items. For various explicit formulas of this quantity, we refer to [6, 7, 23].

1.3 Exact distribution of capability indices


Now that we have constructed several estimators, we want to study their distribution. It is well-known that MLEs are not unbiased in general; for example, S is a biased estimator of σ.
Recall that if X1, . . . , Xn are independent random variables each with a N(µ, σ²) distribution, then the random variable (n − 1)S²/σ² has a χ²_{n−1}-distribution. The expected value of √(n − 1) S/σ thus equals

∫_0^∞ √t · t^{(n−1)/2 − 1} e^{−t/2} / ( 2^{(n−1)/2} Γ((n − 1)/2) ) dt = √2 Γ(n/2) / Γ((n − 1)/2).

Hence,

E(S) = ( √2 Γ(n/2) / ( √(n − 1) Γ((n − 1)/2) ) ) σ = c4(n) σ,

where

c4(n) = √2 Γ(n/2) / ( √(n − 1) Γ((n − 1)/2) ).

An unbiased estimator for σ is thus given by S/c4(n). Recall that Γ(x + 1) = x Γ(x) for x ≠ 0, −1, −2, . . ., Γ(1) = 1, and Γ(1/2) = √π. Thus we have the following recursion: c4(2) = √2/√π and c4(n + 1) = ( √(n − 1)/√n ) (1/c4(n)).
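
As a quick check of the recursion, the following sketch computes c4(n) both from the closed-form expression and from the recursion above; the two should agree up to rounding.

    # c4(n) via the Gamma function and via the recursion c4(n+1) = (sqrt(n-1)/sqrt(n)) / c4(n).
    import math

    def c4_closed(n):
        return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

    def c4_recursive(n):
        c = math.sqrt(2.0 / math.pi)                    # c4(2)
        for m in range(2, n):
            c = (math.sqrt(m - 1) / math.sqrt(m)) / c   # c4(m + 1)
        return c

    for n in (2, 5, 10, 25):
        print(n, c4_closed(n), c4_recursive(n))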
Instead of the ML estimators for Cp and Cpk, one usually uses the estimators

Ĉp = (USL − LSL) / (6S)

and

Ĉpk = min( (USL − X̄)/(3S), (X̄ − LSL)/(3S) ),

where X̄ denotes the sample mean and S denotes the sample standard deviation.
Confidence intervals and hypothesis tests for Cp easily follow from the identity

P( Ĉp/Cp > c ) = P( χ²_{n−1} < (n − 1)/c² ).

In particular, it follows that

E Ĉp = ( (n − 1)/2 )^{1/2} ( Γ((n − 2)/2) / Γ((n − 1)/2) ) Cp.

The distribution of Ĉpk is quite complicated, but explicit formulas can be given because X̄ and S are independent. We refer to [13] for details.
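
The chi-square identity above immediately yields an exact two-sided confidence interval for Cp (cf. Exercise 3). A small sketch with illustrative numbers:

    # Exact confidence interval for Cp from the chi-square identity; inputs are hypothetical.
    from scipy.stats import chi2

    def cp_confidence_interval(cp_hat, n, alpha=0.05):
        lo = cp_hat * (chi2.ppf(alpha / 2, n - 1) / (n - 1)) ** 0.5
        hi = cp_hat * (chi2.ppf(1 - alpha / 2, n - 1) / (n - 1)) ** 0.5
        return lo, hi

    print(cp_confidence_interval(cp_hat=1.33, n=30))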
In order to describe the exact distribution of the ML estimator for quantiles of a normal
distribution, we need a generalization of the Student t-distribution.

Definition 1.3.1 (Noncentral t-distribution) Let Z be a standard normal random variable and let Y be a χ²-distributed random variable with ν degrees of freedom. If Z and Y are independent, then the distribution of

(Z + δ) / √(Y/ν)

is called a noncentral t-distribution with ν degrees of freedom and non-centrality parameter δ.

For further properties and examples of the use of the noncentral t-distribution, we refer to [13, 15].

Theorem 1.3.2 The MLE X̂p = X̄ + zp σ̂ for xp with an underlying normal distribution is distributed as follows:

P( X̄ + zp σ̂ ≤ t ) = P( T_{n−1}( √n (µ − t)/σ ) ≤ −zp √(n − 1) ),    (1.4)

where Tν(λ) denotes a random variable distributed according to the noncentral t-distribution with ν degrees of freedom and noncentrality parameter λ.

Proof: Recall that n σ̂²/σ² = Σ_{i=1}^{n} (Xi − X̄)²/σ² follows a χ²-distribution with n − 1 degrees of freedom, so that √n σ̂/σ = √Y with Y distributed as χ²_{n−1}. Combining this with the definition of the noncentral t-distribution (see Definition 1.3.1), we obtain

P( X̄ + zp σ̂ ≤ t ) = P( (X̄ − µ)/(σ/√n) + zp √n σ̂/σ ≤ √n (t − µ)/σ )
                   = P( Z + √n (µ − t)/σ ≤ −zp √Y )
                   = P( (Z + √n (µ − t)/σ) / √(Y/(n − 1)) ≤ −zp √(n − 1) )
                   = P( T_{n−1}( √n (µ − t)/σ ) ≤ −zp √(n − 1) ).
□
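
The following sketch (all numerical inputs are made-up) evaluates the right-hand side of (1.4) with SciPy's noncentral t-distribution and compares it with a small Monte Carlo simulation.

    # Distribution function of the quantile MLE X-bar + z_p * sigma-hat, cf. (1.4).
    import numpy as np
    from scipy.stats import nct, norm

    mu, sigma, n, p, t = 10.0, 2.0, 15, 0.95, 14.0
    zp = norm.ppf(p)

    delta = np.sqrt(n) * (mu - t) / sigma
    exact = nct.cdf(-zp * np.sqrt(n - 1), df=n - 1, nc=delta)

    rng = np.random.default_rng(0)
    x = rng.normal(mu, sigma, size=(200_000, n))
    sigma_hat = x.std(axis=1)                    # ddof=0: the MLE of sigma
    mc = np.mean(x.mean(axis=1) + zp * sigma_hat <= t)
    print(exact, mc)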

1.4 Asymptotic distribution of capability indices


We now turn to the asymptotic distribution of the estimators that we encountered. It is well-known
that under suitable assumptions MLE’s are asymptotically normal. In concrete situations, one
may alternatively prove asymptotic normality by using the following theorem. It may be useful
to recall that the Jacobian of a function is the matrix of partial derivatives of the component
functions. For real functions on the real line, the Jacobian reduces to the derivative.

Theorem 1.4.1 (Cramér) Let g be a function from R^m to R^k which is totally differentiable at a point a. If (Xn)_{n∈N} is a sequence of m-dimensional random vectors such that cn (Xn − a) →d X for some random vector X and some sequence of scalars (cn)_{n∈N} with lim_{n→∞} cn = ∞, then

cn ( g(Xn) − g(a) ) →d Jg(a) X,

where Jg is the Jacobian of g.

Proof: See text books on mathematical statistics.

With this theorem, we can compute the asymptotic distribution of many estimators, in particular
those discussed above. Among other things, this is useful for constructing confidence intervals
(especially when the finite sample distribution is intractable).

We start with the asymptotic distribution of the sample variance. We first recall a multivariate
version of the Central Limit Theorem.

Theorem 1.4.2 (Multivariate Central Limit Theorem) Let X1, . . . , Xn be i.i.d. random vectors with existing covariance matrix Σ. Then, as n → ∞,

n^{1/2} ( (1/n) Σ_{i=1}^{n} Xi − E X1 ) →d N(0, Σ).

Theorem 1.4.3 Let X, X1, X2, . . . be independent identically distributed random variables with µ4 = E X⁴ < ∞. Then the following asymptotic result holds for the MLE σ̂² of σ²:

√n ( σ̂² − σ² ) →d N(0, µ4 − σ⁴).

If moreover the parent distribution is normal, then

√n ( σ̂² − σ² ) →d N(0, 2σ⁴).

Proof: Because the variance does not depend on the mean, we assume without loss of generality that µ = 0. Since we have finite fourth moments, we infer from the multivariate Central Limit Theorem (= Theorem 1.4.2) that

√n ( ( (1/n) Σ_{i=1}^{n} Xi , (1/n) Σ_{i=1}^{n} Xi² ) − (0, σ²) ) →d N(0, Σ),

where Σ is the covariance matrix of X and X². Since σ̂² = (1/n) Σ_{i=1}^{n} Xi² − X̄², we apply Theorem 1.4.1 with g(x, y) = y − x². We compute Jg(0, σ²) = (0  1). Now recall that if Y is a random variable with a multinormal distribution N(µ, Σ) and L is a linear map of the right dimensions, then L Y =d N(L µ, L Σ Lᵀ). Hence,

√n ( σ̂² − σ² ) →d N( 0, Jg(0, σ²) Σ Jg(0, σ²)ᵀ ) = N(0, Var X²) = N(0, µ4 − σ⁴).

The last statement follows from the fact that for zero-mean normal distributions µ4 = 3σ⁴. □

For the asymptotic distribution of σ̂ and 1/σ̂, see Exercise 2.
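
A quick Monte Carlo illustration of Theorem 1.4.3 for normal data (arbitrary parameter values): the empirical variance of √n (σ̂² − σ²) should be close to 2σ⁴.

    # Monte Carlo check of the asymptotic variance of the MLE of sigma^2 for normal data.
    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, reps = 0.0, 1.5, 400, 20_000
    x = rng.normal(mu, sigma, size=(reps, n))
    sigma2_hat = x.var(axis=1)                   # ddof=0: the MLE of sigma^2
    z = np.sqrt(n) * (sigma2_hat - sigma**2)
    print(z.var(), 2 * sigma**4)                 # the two numbers should be comparable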

Theorem 1.4.4 Let X, X1, X2, . . . be independent normal random variables with mean µ and variance σ². If µ and σ² are unknown, then the following asymptotic result holds for the MLE X̂p = X̄ + zp σ̂ of xp:

√n ( X̂p − (µ + zp σ) ) →d N( 0, σ² (1 + ½ zp²) ).

Proof: The Central Limit Theorem yields that √n (X̄ − µ) →d N(0, σ²). Combining Theorems 1.4.1 and 1.4.3, we have that √n (σ̂ − σ) →d N(0, ½ σ²). Now recall that since we have a normal sample, X̄ and σ̂ are independent. Hence, it follows from Slutsky's Lemma that

√n ( X̄ + zp σ̂ − µ − zp σ ) →d N(0, σ²) ∗ N(0, ½ zp² σ²).

The result now follows from the elementary fact

N(0, σ1²) ∗ N(0, σ2²) = N(0, σ1² + σ2²).

This concludes the proof. □
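
Theorem 1.4.4 can be used to construct an approximate confidence interval for the quantile xp. The sketch below (with made-up data) replaces the unknown σ in the asymptotic variance by its estimate.

    # Approximate confidence interval for the p-th quantile based on Theorem 1.4.4.
    import numpy as np
    from scipy.stats import norm

    x = np.array([10.3, 9.6, 10.9, 10.1, 9.8, 10.5, 10.0, 10.7, 9.9, 10.2])
    n, p, alpha = len(x), 0.95, 0.05
    zp = norm.ppf(p)

    xbar, sigma_hat = x.mean(), x.std()          # MLEs (ddof=0)
    xp_hat = xbar + zp * sigma_hat
    se = sigma_hat * np.sqrt((1 + 0.5 * zp**2) / n)
    z = norm.ppf(1 - alpha / 2)
    print(xp_hat - z * se, xp_hat + z * se)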



Theorem 1.4.5 Let X, X1, X2, . . . be independent normal random variables with mean µ and variance σ², and let ϕ be the standard normal density. If µ and σ² are unknown, then the following asymptotic result holds for the MLE of P(a < X < b) = P((a, b)):

√n ( Φ((b − X̄)/σ̂) − Φ((a − X̄)/σ̂) − P(a < X < b) ) →d N( 0, σ² ( c1² + 4 c1 c2 µ + 2 c2² (2µ² + σ²) ) ),

where

c1 = ϕ((b − µ)/σ) ( bµ − (µ² + σ²) ) / (2σ³) − ϕ((a − µ)/σ) ( aµ − (µ² + σ²) ) / (2σ³),
c2 = ϕ((b − µ)/σ) (µ − b) / (2σ³) − ϕ((a − µ)/σ) (µ − a) / (2σ³).

Proof: We infer from the multivariate Central Limit Theorem (= Theorem 1.4.2) that

√n ( ( (1/n) Σ_{i=1}^{n} Xi , (1/n) Σ_{i=1}^{n} Xi² ) − (µ, µ² + σ²) ) →d N(0, Σ),

where Σ is the covariance matrix of X and X². Since E Z⁴ = 3 and X =d µ + σZ, where Z is a standard normal random variable, we have

E X³ = µ³ + 3µσ²,
Var X² = µ⁴ + 6µ²σ² + 3σ⁴ − (µ² + σ²)² = 2σ²(2µ² + σ²).

Hence,

Σ = ( σ²       2µσ²
      2µσ²     2σ²(2µ² + σ²) ).

Now we wish to apply Theorem 1.4.1 with

g(x, y) = Φ( (b − x)/√(y − x²) ) − Φ( (a − x)/√(y − x²) ).

This function is totally differentiable, except on the line y = x². Since we evaluate at x = µ and y = µ² + σ², we have that y − x² = σ² > 0. Hence, there are no differentiability problems. Note that the partial derivatives of f(x, y) = (c − x)/√(y − x²) with respect to x and y are given by

(cx − y) / ( 2(y − x²)^{3/2} )   and   (x − c) / ( 2(y − x²)^{3/2} ),

respectively. Thus the transpose of the Jacobian of g is given by

( ϕ( (b − x)/√(y − x²) ) (bx − y)/( 2(y − x²)^{3/2} ) − ϕ( (a − x)/√(y − x²) ) (ax − y)/( 2(y − x²)^{3/2} ) ,
  ϕ( (b − x)/√(y − x²) ) (x − b)/( 2(y − x²)^{3/2} ) − ϕ( (a − x)/√(y − x²) ) (x − a)/( 2(y − x²)^{3/2} ) ),

where ϕ(x) is the standard normal density. Evaluating at x = µ and y = µ² + σ², we see that this reduces to (c1, c2), with c1 and c2 as in the statement of the theorem. Putting everything together yields the result. □

1.5 Tolerance intervals


In the previous section we generalized estimation of parameters to estimation of functions of
parameters. Two important examples were the p-th quantile and the fraction P (a < X < b) of
a distribution. We now present a further generalization. Instead of considering estimators that
are real-valued functions of the sample, we will study estimators that are set-valued functions (in
particular, functions whose values are intervals).

Many practical situations require knowledge about the location of the complete distribution. E.g.,
one would like to construct intervals that cover a certain percentage of a distribution. Such
intervals are known as tolerance intervals. Although they are of great practical importance, tolerance
intervals are ignored in many textbooks. Many practical applications (and theory) can be found in [1].
The monograph [9], the review paper [16] and the bibliographies [10, 11] are also excellent sources
of information on this topic.

In this section we will give an introduction to tolerance intervals based on the normal distribution.
It is also possible to construct intervals for other distributions (see e.g., [1]).

Definition 1.5.1 Let X1 , . . . , Xn be a sample from a continuous distribution P with distribution


function F . An interval T (X1 , . . . , Xn ) = (L, U ) is said to be a β-content tolerance interval at
confidence level α if

P (P (T (X1 , . . . , Xn )) ≥ β) = P (F (U ) − F (L) ≥ β) = α. (1.5)

The random variable P (T (X1 , . . . , Xn )) is called the coverage of the tolerance interval.

This type of tolerance interval is sometimes called a guaranteed content interval. There also exists
a one-sided version of this type of tolerance interval.

Definition 1.5.2 Let X1 , . . . , Xn be a sample from a continuous distribution P with distribution


function F . The estimator U (X1 , . . . , Xn ) is said to be a β-content upper tolerance limit at
confidence level α if
P (F (U (X1 , . . . , Xn )) ≥ β) = α. (1.6)
Similarly, the estimator L(X1 , . . . , Xn ) is said to be a β-content lower tolerance limit at confidence
level α if
P (1 − F (L(X1 , . . . , Xn )) ≥ β) = α. (1.7)

Definition 1.5.3 Let X1 , . . . , Xn be a sample from a continuous distribution P with distribution


function F . An interval T (X1 , . . . , Xn ) = (L, U ) is said to be a β-expectation tolerance interval
if the expected coverage equals β, i.e.

E (P (T (X1 , . . . , Xn ))) = E (F (U ) − F (L)) = β. (1.8)

There are interesting relations between these concepts and quantiles. Let X1 , . . . , Xn be a sample
from a continuous distribution P with distribution function F . Since

P (F (U ) ≤ β) = P (U ≤ F −1 (β)),

it follows immediately that an upper (lower) (α, β) tolerance limit is an upper (lower) confidence
interval for the quantile F −1 (β) and vice-versa.

Definition 1.5.4 Let X1 , . . . , Xn be a sample from a continuous distribution P with distribution


function F . An interval T (X1 , . . . , Xn ) = (L, U ) is said to be a 100 × β% prediction interval if

P (L < X < U ) = β. (1.9)



Prediction intervals are usually associated with regression analysis, but also appear in other con-
texts as we shall see. The following proposition shows a surprising link between β-expectation
tolerance intervals and prediction intervals. It also has interesting corollaries as we shall see later
on.

Proposition 1.5.5 (Paulson [18]) A β-expectation tolerance interval is a 100 × β% prediction


interval.

Proof: We use the following well-known property of conditional expectations:

E( E(Y | X) ) = E Y.

Hence, rewriting the probability in the definition of a prediction interval in terms of an expectation, we obtain:

P(L < X < U) = E( 1_{L<X<U} )
             = E( E( 1_{L<X<U} | L, U ) )
             = E( P( L < X < U | L, U ) )
             = E( P((L, U)) )
             = E( F(U) − F(L) ),

as required. □

For normal distributions, it is rather natural to construct tolerance intervals using the jointly sufficient statistics X̄ and S². In particular, intervals of the form (X̄ − kS, X̄ + kS) are natural candidates. Unfortunately, the distribution of the coverage of even such simple intervals is very complicated if both µ and σ are unknown. However, we may use the Paulson result to compute the first moment, i.e. the expected coverage.

Corollary 1.5.6 Let X1, . . . , Xn be a sample from a normal distribution with mean µ and variance σ². The expected coverage of the interval (X̄ − kS, X̄ + kS) equals β if and only if k = √(1 + 1/n) t_{n−1;(1−β)/2}.

Proof: We first have to show that the expectation is finite. Note that the coverage may be written in this case as

F(X̄ + kS) − F(X̄ − kS) = Φ( (X̄ − µ)/σ + k S/σ ) − Φ( (X̄ − µ)/σ − k S/σ ).

Since 0 ≤ Φ(x) ≤ 1 for all x ∈ R, it suffices to show that the expectations of X̄ and S are finite. The first expectation is trivial, while the second one follows from Exercise 4.

Now Proposition 1.5.5 yields that it suffices to choose k such that (X̄ − kS, X̄ + kS) is a 100 × β% prediction interval. In other words, k must be chosen such that

P( X̄ − kS < X < X̄ + kS ) = β,

where X is independent of X1, . . . , Xn, but follows the same normal distribution. Hence, X − X̄ =d N( 0, σ²(1 + 1/n) ). Thus we have the following equalities:

β = P( X̄ − kS < X < X̄ + kS )
  = P( −k < (X − X̄)/S < k )
  = P( −k < √(1 + 1/n) T_{n−1} < k ),

from which we see that we must choose k = √(1 + 1/n) t_{n−1;(1−β)/2}. □
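
A small sketch (with made-up data) of the β-expectation tolerance interval (X̄ − kS, X̄ + kS) from Corollary 1.5.6, reading t_{n−1;(1−β)/2} as the upper (1 − β)/2 critical point of the t-distribution with n − 1 degrees of freedom.

    # Beta-expectation tolerance interval for a normal sample, cf. Corollary 1.5.6.
    import numpy as np
    from scipy.stats import t

    x = np.array([4.9, 5.2, 5.0, 5.4, 4.8, 5.1, 5.3, 5.0, 4.7, 5.2])   # hypothetical data
    n, beta = len(x), 0.95

    k = np.sqrt(1 + 1 / n) * t.ppf(1 - (1 - beta) / 2, df=n - 1)
    xbar, s = x.mean(), x.std(ddof=1)            # S: sample standard deviation
    print(xbar - k * s, xbar + k * s)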

For several explicit tolerance intervals under different assumptions, we refer to the exercises. There is no closed-form solution for (α, β) tolerance intervals for normal distributions when both µ and σ² are unknown; numerical procedures for this problem can be found in [4].

1.6 Density estimators


Although the distribution function completely characterizes the distribution and hence all charac-
teristics can be computed from it in principle, there are many cases in which one wants to estimate
the density directly. Apart from being more intuitive, knowledge of the density is required for
examining the shape of the distributions (e.g., to assess whether a distribution is a mixture of two
other distributions which is often reflected through bimodality). The density is also required for
estimation of the hazard rate f (x)/(1 − F (x)). Density estimation is also used in nonparametric
pattern recognition (discriminant analysis) when the densities of the feature vectors are unknown
and are to be estimated from training samples (see e.g. [5]).
If we know the shape of the density (e.g., a normal distribution), then density estimation
reduces to parameter estimation. We will study the case where the form of the density is not
known, i.e. we study nonparametric density estimation. A widely used density estimator (although
it is not always recognized as such) is the histogram. Let X1, . . . , Xn be a random
sample from a distribution function F (pertaining to a law P) on R, with continuous derivative F′ = f. As before, we denote the empirical distribution by Pn. Let I be a compact interval on R and suppose that the intervals I1, . . . , Ik form a partition of I, i.e.

I = I1 ∪ . . . ∪ Ik,   Ii ∩ Ij = ∅ if i ≠ j.

The histogram of X1, . . . , Xn with respect to the partition I1, . . . , Ik is defined as

Hn(x) := Σ_{j=1}^{k} Pn(Ij) 1_{Ij}(x) / |Ij|,

where |Ij | denotes the length of the interval Ij . It is clear that the histogram is a stepwise constant
function. Two major disadvantages of the histogram are
• the stepwise constant nature of the histogram
• the fact that the histogram heavily depends on the choice of the partition
In order to illustrate the last point, consider the following two histograms that represent the same
data set:

[Figure: two histograms, based on different partitions, of the same sample of size 50 from a mixture of two normal distributions.]

It is because of this phenomenon that histograms are not to be recommended. A natural way to

improve on histograms is to get rid of the fixed partition by putting an interval around each point.
If h > 0 is fixed, then

N̂n(x) := Pn( (x − h, x + h) ) / (2h)    (1.10)
is called the naive density estimator and was introduced in 1951 by Fix and Hodges in an unpub-
lished report (reprinted in [5]) dealing with discriminant analysis. The motivation for the naive
estimator is that

P(x − h < X < x + h) = ∫_{x−h}^{x+h} f(t) dt ≈ 2h f(x).    (1.11)
Note that the naive estimator is a local procedure; it uses only the observations close to the point
at which one wants to estimate the unknown density. Compare this with the empirical distribution
function, which uses all observations to the right of the point at which one is estimating.
It is intuitively clear from (1.11) that the bias of N̂n decreases as h tends to 0. However, if h
tends to 0, then one is using fewer and fewer observations, and hence the variance of N̂n increases.
This phenomenon occurs often in density estimation. The optimal value of h is a compromise
between the bias and the variance. We will return to this topic of great practical importance when
we discuss the MSE.
The naive estimator is a special case of the following class of density estimators. Let K be a
kernel function, that is a nonnegative function such that

∫_{−∞}^{∞} K(x) dx = 1.    (1.12)
The kernel estimator with kernel K and bandwidth h is defined by

f̂n(x) := (1/n) Σ_{i=1}^{n} (1/h) K( (x − Xi)/h ).    (1.13)
Thus, the kernel indicates the weight that each observation receives in estimating the unknown
density. It is easy to verify that kernel estimators are densities and that the naive estimator is a
kernel estimator with kernel

K(x) = ½ if |x| < 1, and K(x) = 0 otherwise.
Remark 1.6.1 The kernel estimator can also be written in terms of the empirical distribution function Fn:

f̂n(x) = ∫_{−∞}^{∞} (1/h) K( (x − y)/h ) dFn(y),

where the integral is a Stieltjes integral.
Examples of other kernels include:

    name                 function
    Gaussian             (1/√(2π)) exp(−x²/2)
    naive/rectangular    ½ · 1_{(−1,1)}(x)
    triangular           (1 − |x|) · 1_{(−1,1)}(x)
    biweight             (15/16) (1 − x²)² · 1_{(−1,1)}(x)
    Epanechnikov         (3/4) (1 − x²) · 1_{(−1,1)}(x)
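
A direct (and deliberately simple, O(n·m)) implementation of the kernel estimator (1.13) with a Gaussian kernel is sketched below; the data set and the bandwidth are illustrative only.

    # Kernel density estimate on a grid, cf. (1.13); Gaussian kernel, made-up data.
    import numpy as np

    def gaussian_kernel(u):
        return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

    def kernel_estimate(x_grid, data, h, kernel=gaussian_kernel):
        # f_hat(x) = (1/n) * sum_i (1/h) * K((x - X_i)/h)
        u = (x_grid[:, None] - data[None, :]) / h
        return kernel(u).mean(axis=1) / h

    rng = np.random.default_rng(2)
    data = np.concatenate([rng.normal(-2, 1, 25), rng.normal(2, 1, 25)])   # bimodal sample
    grid = np.linspace(-6, 6, 201)
    f_hat = kernel_estimate(grid, data, h=0.6)
    print(np.trapz(f_hat, grid))      # integrates to approximately 1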

The kernel estimator is a widely used density estimator. A good impression of kernel estimation
is given by the books [20] and [22]. For other types of estimators, we refer to [20] and [21].

1.6.1 Finite sample behaviour of density estimators


In order to assess point estimators, we look at properties like unbiasedness and efficiency. In density
estimation, it is very important to know the influence of the bandwidth h (cf. our discussion of the
naive estimator). To combine the assessment of these properties, the Mean Square Error (MSE)
is used. We now discuss the analogues of these properties for density estimators. The difference
is that the estimate is not a single number, but a function. However, we start with pointwise
properties.

Theorem 1.6.2 Let f̂n be a kernel estimator with kernel K. Then

E f̂n(x) = (1/h) ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy = (1/h) ∫_{−∞}^{∞} K(y/h) f(x − y) dy.    (1.14)

Proof: This follows from the fact that for a random variable X with density f, we have E g(X) = ∫_{−∞}^{∞} g(x) f(x) dx. □

Theorem 1.6.3 Let f̂n be a kernel estimator with kernel K. Then

Var f̂n(x) = (1/(nh²)) ∫_{−∞}^{∞} K²( (x − y)/h ) f(y) dy − (1/(nh²)) { ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy }².    (1.15)

Proof: It is easy to see that

( f̂n(x) )² = (1/n²) Σ_{i=1}^{n} (1/h²) K²( (x − Xi)/h ) + (1/n²) Σ_{i≠j} (1/h²) K( (x − Xi)/h ) K( (x − Xj)/h ).

Then

E ( f̂n(x) )² = (1/(nh²)) ∫_{−∞}^{∞} K²( (x − y)/h ) f(y) dy + ( (n − 1)/(nh²) ) ( ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy )².

Next use Theorem 1.6.2 and the well-known fact that Var X = E X² − (E X)². □

The following general result due to Rosenblatt (see [19] for a slightly more general result) shows
that we cannot have unbiasedness for all x.

Theorem 1.6.4 (Rosenblatt [19]) A kernel estimator cannot be unbiased for all x ∈ R.

Proof: We argue by contradiction. Assume that E f̂n(x) = f(x) for all x ∈ R. Then ∫_a^b f̂n(x) dx is an unbiased estimator for F(b) − F(a), since

E ∫_a^b f̂n(x) dx = ∫_a^b E f̂n(x) dx = ∫_a^b f(x) dx = F(b) − F(a),

where the interchange of integrals is allowed since the integrand is positive. Now it can be shown that the only unbiased estimator of F(b) − F(a) symmetric in X1, . . . , Xn is Fn(b) − Fn(a). This leads to a contradiction, since it implies that the empirical distribution function is differentiable. □

For point estimators, the MSE is a useful concept. We now generalize this concept to density
estimators.

Definition 1.6.5 The Mean Square Error at x of a density estimator f̂ is defined as

MSEx(f̂) := E ( f̂(x) − f(x) )².    (1.16)

The Mean Integrated Square Error of a density estimator f̂ is defined as

MISE(f̂) := E ∫_{−∞}^{∞} ( f̂(x) − f(x) )² dx.    (1.17)

Theorem 1.6.6 For a kernel density estimator f̂n with kernel K the MSE and MISE can be expressed as:

MSEx(f̂n) = (1/(nh²)) ∫_{−∞}^{∞} K²( (x − y)/h ) f(y) dy − (1/(nh²)) { ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy }²
            + ( (1/h) ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy − f(x) )².    (1.18)

MISE(f̂n) = (1/(nh²)) ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} K²( (x − y)/h ) f(y) dy − { ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy }² ) dx
            + ∫_{−∞}^{∞} ( (1/h) ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy − f(x) )² dx.    (1.19)

Proof: Combination of Exercise 14 with formulas (1.14) and (1.15) yields the formula for the MSE. Integrating this formula with respect to x, we obtain the formula for the MISE. □

The above formulas can in general not be evaluated explicitly. When both the kernel and the
unknown density are Gaussian, then straightforward but tedious computations yield explicit for-
mulas as shown in [8]. These formulas were extended in [14] to the case of mixtures of normal
distributions. Marron and Wand claim in [14] that the class of mixture of normal distributions is
very rich and that it is thus possible to perform exact calculations for many distributions. These
calculations can be used to choose an optimal bandwidth h (see [14] for details).

For other examples of explicit MSE calculations, we refer to [3] and the exercises.

We conclude this section with a note on the use of Fourier analysis. Recall that the convolution
of two functions g1 and g2 is defined as

(g1 ∗ g2)(x) := ∫_{−∞}^{∞} g1(t) g2(x − t) dt.

One of the elementary properties of the Fourier transform is that it transforms the complicated
convolution operation into the elementary multiplication operation, i.e.

F(g1 ∗ g2 ) = F(g1 ) F(g2 ),

where F(g) denotes the Fourier transform of g, defined by


(F(g))(s) = ∫_{−∞}^{∞} g(t) e^{ist} dt.

The formulas (1.14) and (1.15) show that E f̂n(x) and Var f̂n(x) can be expressed in terms of
convolutions of the kernel with the unknown density. The exercises contain examples in which
Fourier transforms yield explicit formulas for the mean and the variance of the kernel estimator.

Another (even more important) use of Fourier transforms is the computation of the kernel estimate
itself. Computing density estimates directly from the definition is often very time consuming.
Define the function u by

u(s) = (1/n) Σ_{j=1}^{n} e^{i s Xj}.    (1.20)

Then the Fourier transform of the kernel estimator is the product of u and the Fourier transform
of the scaled kernel (see Exercise 18). Using the Fast Fourier Transform (FFT), one can efficiently compute
good approximations to the kernel estimates. For details we refer to [20, pp. 61-66] and [22,
Appendix D].

1.6.2 Asymptotic behaviour of kernel density estimators


We have seen in the previous section that it is possible to evaluate exactly the important properties
of kernel density estimators. However, the unknown density f appears in a complicated way in
exact calculations, which limits the applicability. Such calculations are very important for choosing
the optimal bandwidth h. Therefore, much effort has been put in obtaining asymptotic results
in which the unknown density f appears in a less complicated way. In this section we give an
introduction to these results. Many of the presented results can be found in [17, 19]. For an
overview of more recent results, we refer to the monographs [20, 22].

Theorem 1.6.7 (Bochner) Let K be a bounded kernel function such that lim_{|y|→∞} y K(y) = 0. Define for any absolutely integrable function g the functions

gn(x) := (1/hn) ∫_{−∞}^{∞} K(y/hn) g(x − y) dy,

where (hn)_{n∈N} is a sequence of positive numbers such that lim_{n→∞} hn = 0. If g is continuous at x, then we have

lim_{n→∞} gn(x) = g(x).    (1.21)
Proof: Since ∫_{−∞}^{∞} (1/hn) K(y/hn) dy = ∫_{−∞}^{∞} K(y) dy = 1, we may write

|gn(x) − g(x)| = | gn(x) − g(x) ∫_{−∞}^{∞} (1/hn) K(y/hn) dy |
              ≤ ∫_{−∞}^{∞} | g(x − y) − g(x) | (1/hn) K(y/hn) dy.

Let δ > 0 be arbitrary. We now split the integration interval into two parts: {y : |y| ≥ δ} and {y : |y| < δ}. The first integral can be bounded from above by

∫_{|y|≥δ} ( |g(x − y)|/|y| ) |y/hn| K(y/hn) dy + |g(x)| ∫_{|y|≥δ} (1/hn) K(y/hn) dy
  ≤ ( sup_{|v|≥δ/hn} |v K(v)| / δ ) ∫_{|y|≥δ} |g(x − y)| dy + |g(x)| ∫_{|t|≥δ/hn} K(t) dt
  ≤ ( sup_{|v|≥δ/hn} |v K(v)| / δ ) ∫_{−∞}^{∞} |g(u)| du + |g(x)| ∫_{|t|≥δ/hn} K(t) dt.

Letting n → ∞ and using that K is absolutely integrable, we see that these terms can be made arbitrarily small. The integral over the second region can be bounded from above by

sup_{|y|<δ} |g(x − y) − g(x)| ∫_{|y|<δ} K(y) dy ≤ sup_{|y|<δ} |g(x − y) − g(x)|.

Since this holds for all δ > 0 and g is continuous at x, the above expression can be made arbitrarily small. □

As a corollary, we obtain the following asymptotic results (taken from [17]) for the mean and
variance of the kernel estimator at a point x.

Corollary 1.6.8 (Parzen) Let f̂n be a kernel estimator such that its kernel K is bounded and satisfies lim_{|y|→∞} y K(y) = 0. Then f̂n is an asymptotically unbiased estimator for f at all continuity points x if lim_{n→∞} hn = 0.

Proof: Apply Theorem 1.6.7 to Formula (1.14). □

In the above corollary, there is no restriction on the rate at which (hn)_{n∈N} converges to 0. The next corollaries show that if (hn)_{n∈N} converges to 0 more slowly than n⁻¹, then f̂n(x) is consistent in the sense that the MSE converges to 0.

Corollary 1.6.9 (Parzen) Let f̂n be a kernel estimator such that its kernel K is bounded and satisfies lim_{|y|→∞} y K(y) = 0. If lim_{n→∞} hn = 0 and x is a continuity point of the unknown density f, then

lim_{n→∞} n hn Var f̂n(x) = f(x) ∫_{−∞}^{∞} K²(y) dy.

Proof: First note that since K is bounded, K² also satisfies the conditions of Theorem 1.6.7. Hence, the result follows from applying Theorem 1.6.7 and Exercise 21 to Formula (1.15). □

Corollary 1.6.10 (Parzen) Let f̂n be a kernel estimator such that its kernel K is bounded and satisfies lim_{|y|→∞} y K(y) = 0. If lim_{n→∞} hn = 0, lim_{n→∞} n hn = ∞ and x is a continuity point of the unknown density f, then

lim_{n→∞} MSEx(f̂n) = 0.

Proof: It follows from Corollary 1.6.9 that lim_{n→∞} Var f̂n(x) = 0. The result now follows by combining Corollary 1.6.8 and Exercise 14. □

Although the above theorems give insight into the asymptotic behaviour of density estimators, they
are not sufficient for practical purposes. Therefore, we now refine them by using Taylor expansions.

Theorem 1.6.11 Let f̂n be a kernel estimator such that its kernel K is bounded and symmetric and such that ∫_{−∞}^{∞} |t³| K(t) dt exists and is finite. If the unknown density f has a bounded third derivative, then we have that

E f̂n(x) = f(x) + ½ h² f″(x) ∫_{−∞}^{∞} t² K(t) dt + o(h²),   h ↓ 0,    (1.22)

Var f̂n(x) = (1/(nh)) f(x) ∫_{−∞}^{∞} K²(t) dt + o(1/(nh)),   h ↓ 0 and nh → ∞,    (1.23)

MSEx(f̂n) = (1/(nh)) f(x) ∫_{−∞}^{∞} K²(t) dt + ¼ h⁴ ( f″(x) ∫_{−∞}^{∞} t² K(t) dt )² + o(1/(nh)) + o(h⁴),
            h ↓ 0 and nh → ∞.    (1.24)

Proof: By Formula (1.14) and a change of variables, we may write the bias as

E f̂n(x) − f(x) = ∫_{−∞}^{∞} K(t) { f(x − th) − f(x) } dt.

Now Taylor's Theorem with the Lagrange form of the remainder says that

f(x − th) = f(x) − th f′(x) + ((th)²/2) f″(x) − ((th)³/3!) f‴(ξ),

where ξ depends on x, t, and h and is such that |x − ξ| < |th|. Since ∫_{−∞}^{∞} K(t) dt = 1, it follows that

E f̂n(x) − f(x) = ∫_{−∞}^{∞} K(t) ( −th f′(x) + ((th)²/2) f″(x) − ((th)³/3!) f‴(ξ) ) dt,

which because of the symmetry of K simplifies to

E f̂n(x) − f(x) = ∫_{−∞}^{∞} K(t) ( ((th)²/2) f″(x) − ((th)³/3!) f‴(ξ) ) dt.

If M denotes an upper bound for f‴, then the first result follows from

| E f̂n(x) − f(x) − ½ h² f″(x) ∫_{−∞}^{∞} t² K(t) dt | ≤ (h³/3!) ∫_{−∞}^{∞} | t³ K(t) f‴(ξ) | dt ≤ (h³/3!) M ∫_{−∞}^{∞} |t³ K(t)| dt,

where the last term obviously is o(h²).
The asymptotic expansion of the variance follows immediately from Corollary 1.6.9. In order to obtain the asymptotic expansion for the MSE, it suffices to combine Exercise 14 with Formulas (1.22) and (1.23). □

These expressions show that the asymptotic expressions are much easier to interpret than the exact
expression of the previous section. For example, we can now clearly see that the bias decreases
if h is small and that the variance decreases if h is large (cf. our discussion of the naive density
estimator).

Theorem 1.6.11 is essential for obtaining optimal choices of the bandwidth. If we assume that f″ is square integrable, then it follows from Formula (1.24) that for h ↓ 0 and nh → ∞:

MISE(f̂n) = (1/(nh)) ∫_{−∞}^{∞} K²(t) dt + ¼ h⁴ ∫_{−∞}^{∞} (f″)²(x) dx ( ∫_{−∞}^{∞} t² K(t) dt )² + o(1/(nh)) + o(h⁴).    (1.25)
The expression

(1/(nh)) ∫_{−∞}^{∞} K²(t) dt + ¼ h⁴ ∫_{−∞}^{∞} (f″)²(x) dx ( ∫_{−∞}^{∞} t² K(t) dt )²    (1.26)

is called the asymptotic MISE, often abbreviated as AMISE. Note that Formula (1.26) is much
easier to understand than Formula (1.19). We now see (cf. Exercise 22) how to balance between
squared bias and variance in order to obtain a choice of h that minimizes the MISE:
 1/5
R∞ 2
 −∞
K (t) dt 
hAMISE =  ³R ´2 R  . (1.27)
∞ ∞ 00 )2 (x) dx
4n −∞ t2 K(t) dt −∞
(f

An important drawback of Formula (1.27) is that it depends on ∫_{−∞}^{∞} (f″)²(x) dx, which is unknown. However, there are good methods for estimating this quantity. For details, we refer to the literature ([20, 22]). An example of a simple method is given in Exercise 23.
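
Specialised to the Gaussian kernel, for which ∫ K²(t) dt = 1/(2√π) and ∫ t² K(t) dt = 1, the normal-reference bandwidth of Exercise 23 reduces to (4/(3n))^{1/5} σ; the sketch below replaces the unknown σ by an estimate.

    # Normal-reference AMISE bandwidth for a Gaussian kernel (rule-of-thumb choice).
    import numpy as np

    def h_amise_normal_reference(data):
        n = len(data)
        sigma_hat = data.std(ddof=1)             # sigma is estimated in practice
        return (4.0 / (3.0 * n)) ** 0.2 * sigma_hat

    rng = np.random.default_rng(3)
    data = rng.normal(0.0, 1.0, 200)
    print(h_amise_normal_reference(data))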

Given an optimal choice of the bandwidth h, we may wonder which kernel gives the smallest
MISE. It turns out that the Epanechnikov kernel is the optimal kernel. However, the other kernels
perform nearly as well, so that the optimality property of the Epanechnikov kernel is not very
important in practice. For details, we refer to [20, 22].

1.7 Exercises
In all exercises X, X1 , X2 , . . . are independent identically distributed normal random variables with
mean µ and variance σ 2 , unless otherwise stated.

Exercise 1 Assume that the main characteristic of a production process follows a normal
distribution and that Cp equals 1.33.

a) What is the percentage of non-conforming items if the process is centred (that is, if µ = (USL + LSL)/2)?

b) What is the percentage of non-conforming items if µ = (2 USL + LSL)/3?

Exercise 2 Find the asymptotic distribution of σ̂ and 1/σ̂, where σ̂ is the MLE for σ. What is the asymptotic distribution of Ĉp?

Exercise 3 Compute a 100(1 − α)%-confidence interval for Cp based on Ĉp (both exact and asymptotic).

Exercise 4 Compute Var S.


µ ¶ µ ¶
b−X a−X
Exercise 5 Assume that σ is known. Show that Φ −Φ is a biased estimator
σ σ
for P (a < X < b).

Exercise 6 Construct a β-expectation tolerance interval in the trivial case when both µ and σ 2
are known.

Exercise 7 Construct a β-expectation tolerance interval when µ is unknown and σ 2 is known.

Exercise 8 Construct a β-expectation tolerance interval when µ is known and σ 2 is unknown.

Exercise 9 Construct a β-content tolerance interval at confidence level α in the trivial case when
both µ and σ 2 are known. What values can α take?

Exercise 10 Construct a β-content tolerance interval at confidence level α when µ is unknown


and σ 2 is known.

Exercise 11 Construct a β-content tolerance interval at confidence level α when µ is known and
σ 2 is unknown.

Exercise 12 Verify that the naive estimator is a kernel estimator.

Exercise 13 Verify that the kernel estimator is a density.

Exercise 14 Prove that for any density estimator f̂ we have

MSEx(f̂) = Var f̂(x) + ( E f̂(x) − f(x) )².

Exercise 15 Show that formula (1.19) can be rewritten as

MISE(f̂n) = (1/(nh)) ∫_{−∞}^{∞} K²(y) dy + (1 − 1/n) ∫_{−∞}^{∞} ( (1/h) ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy )² dx
            − (2/h) ∫_{−∞}^{∞} ∫_{−∞}^{∞} K( (x − y)/h ) f(y) dy f(x) dx + ∫_{−∞}^{∞} f²(x) dx.

Exercise 16 Calculate the following estimators for µ and σ², respectively, where f̂n is a kernel estimator for f with a symmetric kernel K (that is, K(x) = K(−x)):

a) µ̂ = ∫_{−∞}^{∞} x f̂n(x) dx.

b) σ̂² = ∫_{−∞}^{∞} (x − µ̂)² f̂n(x) dx.

Exercise 17 Verify by direct computation that the naive estimator is biased in general.

Exercise 18 Use (1.20) to find a formula for the Fourier transform of f̂n.

Exercise 19 Suppose that K is a symmetric kernel, i.e. K(x) = K(−x). Show that MISE(f̂n) equals

(1/(2πnh)) ∫_{−∞}^{∞} (FK)²(t) dt + (1/(2π)) ∫_{−∞}^{∞} { (1 − 1/n) (FK)²(ht) − 2 FK(ht) + 1 } |Ff(t)|² dt.

Hint: use Parseval's identity

∫_{−∞}^{∞} g1(x) g2(x) dx = (1/(2π)) ∫_{−∞}^{∞} Fg1(t) Fg2(t) dt.

Exercise 20 The Laplace kernel is defined by K(x) := ½ e^{−|x|}. Use the results of the previous
exercise to derive an expression for the MISE of the kernel estimator with the Laplace kernel when
the density is an exponential density.

Exercise 21 Show that a version of Bochner's Theorem 1.6.7 holds if we relax the conditions K ≥ 0 and ∫_{−∞}^{∞} K(y) dy = 1 to ∫_{−∞}^{∞} |K(y)| dy < ∞.

Exercise 22 Prove Formula (1.27) for the optimal bandwidth based on the AMISE.

Exercise 23 Prove that if f is a normal density with parameters µ and σ², then

h_AMISE = ( 8√π ∫_{−∞}^{∞} K²(t) dt / ( 3n ( ∫_{−∞}^{∞} t² K(t) dt )² ) )^{1/5} σ.

How can this be used to select a bandwidth? What is the rationale behind this bandwidth choice?

Exercise 24 Suppose that we take hn = c n−γ where c > 0 and γ ∈ (0, 1). Which value of γ gives
the optimal rate of convergence for the MSE?
Bibliography

[1] J. Aitchison, and I. Dunsmore, Statistical Prediction Analysis, Cambridge University Press,
1975.

[2] R.B. D’Agostino and M.A. Stephens (eds.), Goodness-of-fit Techniques, Marcel Dekker, New
York, 1986.

[3] P. Deheuvels, Estimation nonparametrique de la densité par histogrammes generalisés, Revue de Statistique Appliquée 35 (1977), 5–42.

[4] K.R. Eberhardt, R.W. Mee and C.P. Reeve, Computing factors for exact two-sided tolerance
limits for a normal distribution, Communications in Statistics - Simulation and Computation
18 (1989), 397–413.

[5] E. Fix and J.L. Hodges, Discriminatory analysis - nonparametric discrimination: consistency
properties, International Statistical Reviews 57 (1989), 238–247.

[6] J.L. Folks, D.A. Pierce and C. Stewart, Estimating the fraction of acceptable product, Tech-
nometrics 7 (1965), 43–50.

[7] W.C. Guenther, A note on the Minimum Variance Unbiased estimate of the fraction of a
normal distribution below a specification limit, American Statistician 25 (1971), 18–20.

[8] M.J. Fryer, Some errors associated with the non-parametric estimation of density functions,
Journal of the Institute of Mathematics and its Applications 18 (1976), 371–380.

[9] I. Guttman, Statistical Tolerance regions: Classical and Bayesian, Charles Griffin, 1970.

[10] M. Jílek, A bibliography of statistical tolerance regions, Mathematische Operationsforschung und Statistik - Series Statistics 12 (1981), 441–456.

[11] M. Jílek and H. Ackermann, A bibliography of statistical tolerance regions II, Statistics 20
(1989), 165–172.

[12] V.E. Kane, Process Capability Indices, J. Qual. Technol. 18(1986), 41–52.

[13] S. Kotz and N.L. Johnson, Process Capability Indices, Chapman and Hall, London, 1993.

[14] J.S. Marron and M.P. Wand, Exact mean integrated squared error, Annals of Statistics 20
(1992), 712–736.

[15] D.B. Owen, A survey of properties and applications of the noncentral t-distribution, Techno-
metrics 10 (1968), 445–478.

[16] J.K. Patel, Tolerance limits - a review, Communications in Statistics A - Theory and Methods
15 (1986), 2719–2762.

[17] E. Parzen, On estimation of a probability density function and mode, Annals of Mathematical
Statistics 33 (1962), 1065–1076.


[18] E. Paulson, A note on control limits, Annals of Mathematical Statistics 14 (1943), 90–93.

[19] M. Rosenblatt, Remarks on some nonparametric estimates of a density function, Annals of Mathematical Statistics 27 (1956), 827–837.
[20] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall,
London, 1986.
[21] R.A. Tapia and J.R. Thompson, Nonparametric Probability Density Estimation, The Johns Hopkins University Press, Baltimore, 1978.
[22] M.P. Wand and M.C. Jones, Kernel Smoothing, Chapman & Hall, London, 1995.

[23] D.J. Wheeler, The variance of an estimator in variables sampling, Technometrics 12 (1970),
751–755.
[24] P.W. Zehna, Invariance of Maximum Likelihood estimation, Annals of Mathematical Statistics
37 (1966), 744.
