
3. Production Process Characterization

The goal of this chapter is to learn how to plan and conduct a Production
Process Characterization Study (PPC) on manufacturing processes. We will
learn how to model manufacturing processes and use these models to design
a data collection scheme and to guide data analysis activities. We will look
in detail at how to analyze the data collected in characterization studies and
how to interpret and report the results. The accompanying Case Studies
provide detailed examples of several process characterization studies.

1. Introduction
   1. Definition
   2. Uses
   3. Terminology/Concepts
   4. PPC Steps

2. Assumptions
   1. General Assumptions
   2. Specific PPC Models

3. Data Collection
   1. Set Goals
   2. Model the Process
   3. Define Sampling Plan

4. Analysis
   1. First Steps
   2. Exploring Relationships
   3. Model Building
   4. Variance Components
   5. Process Stability
   6. Process Capability
   7. Checking Assumptions

5. Case Studies
   1. Furnace Case Study
   2. Machine Case Study

Detailed Chapter Table of Contents

References



3. Production Process Characterization - Detailed Table of Contents  [3.]

1. Introduction to Production Process Characterization  [3.1.]


1. What is PPC?  [3.1.1.]
2. What are PPC Studies Used For?  [3.1.2.]
3. Terminology/Concepts  [3.1.3.]
1. Distribution (Location, Spread and Shape)  [3.1.3.1.]
2. Process Variability  [3.1.3.2.]
1. Controlled/Uncontrolled Variation  [3.1.3.2.1.]
3. Propagating Error  [3.1.3.3.]
4. Populations and Sampling  [3.1.3.4.]
5. Process Models  [3.1.3.5.]
6. Experiments and Experimental Design  [3.1.3.6.]
4. PPC Steps  [3.1.4.]

2. Assumptions / Prerequisites  [3.2.]
1. General Assumptions  [3.2.1.]
2. Continuous Linear Model  [3.2.2.]
3. Analysis of Variance Models (ANOVA)  [3.2.3.]
1. One-Way ANOVA  [3.2.3.1.]
1. One-Way Value-Splitting  [3.2.3.1.1.]
2. Two-Way Crossed ANOVA  [3.2.3.2.]
1. Two-way Crossed Value-Splitting Example  [3.2.3.2.1.]
3. Two-Way Nested ANOVA  [3.2.3.3.]
1. Two-Way Nested Value-Splitting Example  [3.2.3.3.1.]
4. Discrete Models  [3.2.4.]

3. Data Collection for PPC  [3.3.]


1. Define Goals  [3.3.1.]
2. Process Modeling  [3.3.2.]
3. Define Sampling Plan  [3.3.3.]
1. Identifying Parameters, Ranges and Resolution  [3.3.3.1.]
2. Choosing a Sampling Scheme  [3.3.3.2.]
3. Selecting Sample Sizes  [3.3.3.3.]
4. Data Storage and Retrieval  [3.3.3.4.]
5. Assign Roles and Responsibilities  [3.3.3.5.]

4. Data Analysis for PPC  [3.4.]


1. First Steps  [3.4.1.]
2. Exploring Relationships  [3.4.2.]
1. Response Correlations  [3.4.2.1.]
2. Exploring Main Effects  [3.4.2.2.]


3. Exploring First Order Interactions  [3.4.2.3.]


3. Building Models  [3.4.3.]
1. Fitting Polynomial Models  [3.4.3.1.]
2. Fitting Physical Models  [3.4.3.2.]
4. Analyzing Variance Structure  [3.4.4.]
5. Assessing Process Stability  [3.4.5.]
6. Assessing Process Capability  [3.4.6.]
7. Checking Assumptions  [3.4.7.]

5. Case Studies  [3.5.]
1. Furnace Case Study  [3.5.1.]
1. Background and Data  [3.5.1.1.]
2. Initial Analysis of Response Variable  [3.5.1.2.]
3. Identify Sources of Variation  [3.5.1.3.]
4. Analysis of Variance  [3.5.1.4.]
5. Final Conclusions  [3.5.1.5.]
6. Work This Example Yourself  [3.5.1.6.]
2. Machine Screw Case Study  [3.5.2.]
1. Background and Data  [3.5.2.1.]
2. Box Plots by Factors  [3.5.2.2.]
3. Analysis of Variance  [3.5.2.3.]
4. Throughput  [3.5.2.4.]
5. Final Conclusions  [3.5.2.5.]
6. Work This Example Yourself  [3.5.2.6.]

6. References  [3.6.]



3.1. Introduction to Production Process Characterization

Overview of Section

The goal of this section is to provide an introduction to PPC. We will define PPC and the terminology used and discuss some of the possible uses of a PPC study. Finally, we will look at the steps involved in designing and executing a PPC study.

Contents: Section 1

1. What is PPC?
2. What are PPC studies used for?
3. What terminology is used in PPC?
   1. Location, Spread and Shape
   2. Process Variability
   3. Propagating Error
   4. Populations and Sampling
   5. Process Models
   6. Experiments and Experimental Design
4. What are the steps of a PPC?
   1. Plan PPC
   2. Collect Data
   3. Analyze and Interpret Data
   4. Report Conclusions



3.1.1. What is PPC?

In PPC, we build data-based models

Process characterization is an activity in which we:

- identify the key inputs and outputs of a process,
- collect data on their behavior over the entire operating range,
- estimate the steady-state behavior at optimal operating conditions, and
- build models describing the parameter relationships across the operating range.

The result of this activity is a set of mathematical process models that we can use to monitor and improve the process.

This is a three-step process

This activity is typically a three-step process.

The Screening Step
    In this phase we identify all possible significant process inputs and outputs and conduct a series of screening experiments in order to reduce that list to the key inputs and outputs. These experiments will also allow us to develop initial models of the relationships between those inputs and outputs.
The Mapping Step
    In this step we map the behavior of the key outputs over their expected operating ranges. We do this through a series of more detailed experiments called Response Surface experiments.
The Passive Step
    In this step we allow the process to run at nominal conditions and estimate the process stability and capability.

Not all of the steps need to be performed

The first two steps are only needed for new processes or when the process has undergone some significant engineering change. There are, however, many times throughout the life of a process when the third step is needed. Examples might be: initial process qualification, control chart development, after minor process adjustments, after scheduled equipment maintenance, etc.



3.1.2. What are PPC Studies Used For?

PPC is the core of any CI program

Process characterization is an integral part of any continuous improvement (CI) program. There are many steps in that program for which process characterization is required. These might include:

When process characterization is required

- when we are bringing a new process or tool into use,
- when we are bringing a tool or process back up after scheduled/unscheduled maintenance,
- when we want to compare tools or processes,
- when we want to check the health of our process during the monitoring phase, and
- when we are troubleshooting a bad process.

Process characterization techniques are applicable in other areas

The techniques described in this chapter are equally applicable to the other chapters covered in this Handbook. These include:

- calibration
- process monitoring
- process improvement
- process/product comparison
- reliability



3.1.3. Terminology/Concepts

There are just a few fundamental concepts needed for PPC. This section will review these ideas briefly and provide links to other sections in the Handbook where they are covered in more detail.

Distribution (location, spread, shape)

For basic data analysis, we will need to understand how to estimate location, spread and shape from the data. These three measures comprise what is known as the distribution of the data. We will look at both graphical and numerical techniques.

Process variability

We need to thoroughly understand the concept of process variability. This includes how variation explains the possible range of expected data values, the various classifications of variability, and the role that variability plays in process stability and capability.

Error propagation

We also need to understand how variation propagates through our manufacturing processes and how to decompose the total observed variation into components attributable to the contributing sources.

Populations and sampling

It is important to have an understanding of the various issues related to sampling. We will define a population and discuss how to acquire representative random samples from the population of interest. We will also discuss a useful formula for estimating the number of observations required to answer specific questions.

Modeling

For modeling, we will need to know how to identify important factors and responses. We will also need to know how to graphically and quantitatively build models of the relationships between the factors and responses.

Experiments

Finally, we will need to know about the basics of designed experiments including screening designs and response surface designs so that we can quantify these relationships. This topic will receive only a cursory treatment in this chapter. It is covered in detail in the process improvement chapter. However, examples of its use are in the case studies.



3.1.3.1. Distribution (Location, Spread and Shape)

Distributions are characterized by location, spread and shape

A fundamental concept in representing any of the outputs from a production process is that of a distribution. Distributions arise because any manufacturing process output will not yield the same value every time it is measured. There will be a natural scattering of the measured values about some central tendency value. This scattering about a central value is known as a distribution. A distribution is characterized by three values:

Location
    The location is the expected value of the output being measured. For a stable process, this is the value around which the process has stabilized.
Spread
    The spread is the expected amount of variation associated with the output. This tells us the range of possible values that we would expect to see.
Shape
    The shape shows how the variation is distributed about the location. This tells us if our variation is symmetric about the mean or if it is skewed or possibly multimodal.

A primary goal of PPC is to estimate the distributions of the process outputs

One of the primary goals of a PPC study is to characterize our process outputs in terms of these three measurements. If we can demonstrate that our process is stabilized about a constant location, with a constant variance and a known stable shape, then we have a process that is both predictable and controllable. This is required before we can set up control charts or conduct experiments.

The table below shows the most common numerical and graphical measures of location, spread and shape.

    Parameter   Numerical              Graphical
    Location    mean, median           scatter plot, boxplot, histogram
    Spread      variance, range,       boxplot, histogram
                inter-quartile range
    Shape       skewness, kurtosis     boxplot, histogram, probability plot
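
The numerical measures in the table can be computed directly. The sketch below (assuming NumPy and SciPy are available) applies them to a small, made-up sample of pin diameters; the data and library calls are illustrative only.

```python
# Numerical measures of location, spread and shape for a hypothetical sample.
import numpy as np
from scipy import stats

data = np.array([0.125, 0.127, 0.124, 0.126, 0.128])  # hypothetical pin diameters

print("location: mean =", data.mean(), " median =", np.median(data))
print("spread:   variance =", data.var(ddof=1), " IQR =", stats.iqr(data))
print("shape:    skewness =", stats.skew(data), " kurtosis =", stats.kurtosis(data))
```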



3.1.3.2. Process Variability

Variability is present everywhere

All manufacturing and measurement processes exhibit variation. For example, when we take sample data on the output of a process, such as critical dimensions, oxide thickness, or resistivity, we observe that all the values are NOT the same. This results in a collection of observed values distributed about some location value. This is what we call spread or variability. We represent variability numerically with the variance calculation and graphically with a histogram.

How does the standard deviation describe the spread of the data?

The standard deviation (square root of the variance) gives insight into the spread of the data through the use of what is known as the Empirical Rule. This rule is:

- Approximately 60-78% of the data are within a distance of one standard deviation from the average (x̄ - s, x̄ + s).
- Approximately 90-98% of the data are within a distance of two standard deviations from the average (x̄ - 2s, x̄ + 2s).
- More than 99% of the data are within a distance of three standard deviations from the average (x̄ - 3s, x̄ + 3s).
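
As a quick illustration of the rule, the sketch below (assuming NumPy) draws a simulated sample and counts the fraction of values within one, two and three standard deviations of the average; the data are simulated, not from a real process.

```python
# Empirical check of the rule above on simulated readings.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=990, scale=20, size=1000)  # hypothetical thickness readings
xbar, s = x.mean(), x.std(ddof=1)

for k in (1, 2, 3):
    frac = np.mean(np.abs(x - xbar) <= k * s)
    print(f"within {k} standard deviation(s) of the average: {frac:.1%}")
```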


Variability accumulates from many sources

This observed variability is an accumulation of many different sources of variation that have occurred throughout the manufacturing process. One of the more important activities of process characterization is to identify and quantify these various sources of variation so that they may be minimized.

There are also different types

There are not only different sources of variation, but there are also different types of variation. Two important classifications of variation for the purposes of PPC are controlled variation and uncontrolled variation. Examples of both are shown in the next section.

CONTROLLED VARIATION
    Variation that is characterized by a stable and consistent pattern of variation over time. This type of variation will be random in nature and will be exhibited by a uniform fluctuation about a constant level.
UNCONTROLLED VARIATION
    Variation that is characterized by a pattern of variation that changes over time and hence is unpredictable. This type of variation will typically contain some structure.

Stable processes only exhibit controlled variation

This concept of controlled/uncontrolled variation is important in determining if a process is stable. A process is deemed stable if it runs in a consistent and predictable manner. This means that the average process value is constant and the variability is controlled. If the variation is uncontrolled, then either the process average is changing or the process variation is changing or both. The first process in the examples of the next section is stable; the second is not.

In the course of process characterization we should endeavor to eliminate all sources of uncontrolled variation.


3.1.3.2.1. Controlled/Uncontrolled Variation

Two trend plots

The two figures below are two trend plots from two different oxide growth processes. Thirty wafers were sampled from each process: one per day over 30 days. Thickness at the center was measured on each wafer. The x-axis of each graph is the wafer number and the y-axis is the film thickness in angstroms.

Examples of "in control" and "out of control" processes

The first process is an example of a process that is "in control" with random fluctuation about a process location of approximately 990. The second process is an example of a process that is "out of control" with a process location trending upward after observation 20.

[Figure: first trend plot. This process exhibits controlled variation; note the random fluctuation about a constant mean.]

[Figure: second trend plot. This process exhibits uncontrolled variation; note the structure in the variation in the form of a linear trend.]
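
For readers without the original plots, the sketch below (assuming NumPy) simulates the two kinds of behavior; the 990-angstrom level comes from the text, while the noise and trend magnitudes are invented for illustration.

```python
# Simulated versions of the two trend plots described above.
import numpy as np

rng = np.random.default_rng(1)

in_control = 990 + rng.normal(0, 5, size=30)   # random scatter about 990
trending = 990 + rng.normal(0, 5, size=30)
trending[20:] += 3.0 * np.arange(1, 11)        # location drifts upward after wafer 20

print(in_control.round(1))
print(trending.round(1))
```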



3.1.3.3. Propagating Error

The variation we see can come from many sources

When we estimate the variance at a particular process step, this variance is typically not just a result of the current step, but rather is an accumulation of variation from previous steps and from measurement error. Therefore, an important question that we need to answer in PPC is how the variation from the different sources accumulates. This will allow us to partition the total variation and assign the parts to the various sources. Then we can attack the sources that contribute the most.

How do I partition the error?

Usually we can model the contribution of the various sources of error to the total error through a simple linear relationship. If we have a simple linear relationship between two variables, say,

    y = a*x1 + b*x2,

then the variance associated with y is given by

    Var(y) = a^2*Var(x1) + b^2*Var(x2) + 2ab*Cov(x1, x2).

If the variables are not correlated, then there is no covariance and the last term in the above equation drops off. A good example of this is the case in which we have both process error and measurement error. Since these are usually independent of each other, the total observed variance is just the sum of the variances for process and measurement. Remember: never add standard deviations, always add variances.
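
A small numeric sketch of the formula (assuming NumPy; the coefficients and variances are made up) also shows why variances, not standard deviations, add:

```python
# Variance propagation for y = a*x1 + b*x2 with independent sources.
import numpy as np

a, b = 1.0, 1.0            # e.g., total = process + measurement
var_x1, var_x2 = 4.0, 1.0  # process variance and measurement variance
cov_x1_x2 = 0.0            # independent sources, so the covariance term is zero

var_y = a**2 * var_x1 + b**2 * var_x2 + 2 * a * b * cov_x1_x2
print("total variance:", var_y)           # 5.0
print("total std dev :", np.sqrt(var_y))  # ~2.24, not 2.0 + 1.0
```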

How do I calculate the individual components?

Of course, we rarely have the individual components of variation and wish to know the total variation. Usually, it is the reverse: we have an estimate of the overall variance and wish to break that variance down into its individual components. This is known as components of variance estimation and is dealt with in detail in the analysis of variance page later in this chapter.



3.1.3.4. Populations and Sampling

We take samples from a target population and make inferences

In survey sampling, if you want to know what everyone thinks about a particular topic, you can just ask everyone and record their answers. Depending on how you define the term, everyone (all the adults in a town, all the males in the USA, etc.), it may be impossible or impractical to survey everyone. The other option is to survey a small group (sample) of the people whose opinions you are interested in (target population), record their opinions and use that information to make inferences about what everyone thinks. Opinion pollsters have developed a whole body of tools for doing just that and many of those tools apply to manufacturing as well. We can use these sampling techniques to take a few measurements from a process and make statements about the behavior of that process.

Facts about a sample are not necessarily facts about a population

If it weren't for process variation we could just take one sample and everything would be known about the target population. Unfortunately this is never the case. We cannot take facts about the sample to be facts about the population. Our job is to reach appropriate conclusions about the population despite this variation. The more observations we take from a population, the more our sample data resembles the population. When we have reached the point at which facts about the sample are reasonable approximations of facts about the population, then we say the sample is adequate.

Four attributes of samples

Adequacy of a sample depends on the following four attributes:

- Representativeness of the sample (is it random?)
- Size of the sample
- Variability in the population
- Desired precision of the estimates

We will learn about choosing representative samples of adequate size in the section on defining sampling plans.
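
The "useful formula" mentioned earlier appears in the sampling-plan section; a common version is the normal-approximation rule n = (z*s/d)^2 for estimating a mean to within +/- d. The sketch below (assuming SciPy is available) illustrates that rule; treat it as an assumption-labeled example, not the Handbook's derivation.

```python
# Hypothetical sketch: observations needed to estimate a mean to within +/- d
# with confidence 1 - alpha, using the usual normal-approximation rule.
from math import ceil
from scipy import stats

def sample_size(s, d, alpha=0.05):
    z = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value
    return ceil((z * s / d) ** 2)

print(sample_size(s=20.0, d=10.0))  # 16 observations for this example
```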



3.1.3.5. Process Models

Black box model and fishbone diagram

As we will see in Section 3 of this chapter, one of the first steps in PPC is to model the process that is under investigation. Two very useful tools for doing this are the black-box model and the fishbone diagram.

We use the black-box model to describe our processes

We can use the simple black-box model, shown below, to describe most of the tools and processes we will encounter in PPC. The process will be stimulated by inputs. These inputs can either be controlled (such as recipe or machine settings) or uncontrolled (such as humidity, operators, power fluctuations, etc.). These inputs interact with our process and produce outputs. These outputs are usually some characteristic of our process that we can measure. The measurable inputs and outputs can be sampled in order to observe and understand how they behave and relate to each other.

[Figure: diagram of the black-box model]

These inputs and outputs are also known as Factors and Responses, respectively.

Factors
    Observed inputs used to explain response behavior (also called explanatory variables). Factors may be fixed-level controlled inputs or sampled uncontrolled inputs.
Responses
    Sampled process outputs. Responses may also be functions of sampled outputs such as average thickness or uniformity.

Factors and Responses are further classified by variable type

We further categorize factors and responses according to their Variable Type, which indicates the amount of information they contain. As the name implies, this classification is useful for data modeling activities and is critical for selecting the proper analysis technique. The table below summarizes this categorization. The types are listed in order of the amount of information they contain, with Measurement containing the most information and Nominal containing the least.

    Type         Description                              Example
    Measurement  discrete/continuous, order is            particle count, oxide thickness,
                 important, infinite range                pressure, temperature
    Ordinal      discrete, order is important,            run #, wafer #, site, bin
                 finite range
    Nominal      discrete, no order, very few             good/bad, bin, high/medium/low,
                 possible values                          shift, operator

Fishbone diagrams help to decompose complexity

We can use the fishbone diagram to further refine the modeling process. Fishbone diagrams are very useful for decomposing the complexity of our manufacturing processes. Typically, we choose a process characteristic (either Factors or Responses) and list out the general categories that may influence the characteristic (such as material, machine, method, environment, etc.), and then provide more specific detail within each category. Examples of how to do this are given in the section on Case Studies.

[Figure: sample fishbone diagram]



3.1.3.6. Experiments and Experimental Design

Factors and responses

Besides just observing our processes for evidence of stability and capability, we quite often want to know about the relationships between the various Factors and Responses.

We look for correlations and causal relationships

There are generally two types of relationships that we are interested in for purposes of PPC. They are:

Correlation
    Two variables are said to be correlated if an observed change in the level of one variable is accompanied by a change in the level of another variable. The change may be in the same direction (positive correlation) or in the opposite direction (negative correlation).
Causality
    There is a causal relationship between two variables if a change in the level of one variable causes a change in the other variable.

Note that correlation does not imply causality. It is possible for two variables to be associated with each other without one of them causing the observed behavior in the other. When this is the case it is usually because there is a third (possibly unknown) causal factor.

Our goal is to find causal relationships

Generally, our ultimate goal in PPC is to find and quantify causal relationships. Once this is done, we can then take advantage of these relationships to improve and control our processes.

Find correlations and then try to establish causal relationships

Generally, we first need to find and explore correlations and then try to establish causal relationships. It is much easier to find correlations as these are just properties of the data. It is much more difficult to prove causality as this additionally requires sound engineering judgment. There is a systematic procedure we can use to accomplish this in an efficient manner. We do this through the use of designed experiments.

First we screen, then we build models

When we have many potential factors and we want to see which ones are correlated and have the potential to be involved in causal relationships with the responses, we use screening designs to reduce the number of candidates. Once we have a reduced set of influential factors, we can use response surface designs to model the causal relationships with the responses across the operating range of the process factors.

Techniques discussed in process improvement chapter

The techniques are covered in detail in the process improvement section and will not be discussed much in this chapter. Examples of how the techniques are used in PPC are given in the Case Studies.


3.1.4. PPC Steps

Follow these 4 steps to ensure efficient use of resources

The primary activity of a PPC is to collect and analyze data so that we may draw conclusions about and ultimately improve our production processes. In many industrial applications, access to production facilities for the purposes of conducting experiments is very limited. Thus we must be very careful in how we go about these activities so that we can be sure of doing them in a cost-effective manner.

Step 1: Plan

The most important step by far is the planning step. By faithfully executing this step, we will ensure that we only collect data in the most efficient manner possible and still support the goals of the PPC. Planning should generate the following:

- a statement of the goals
- a descriptive process model (a list of process inputs and outputs)
- a description of the sampling plan (including a description of the procedure and settings to be used to run the process during the study, with clear assignments for each person involved)
- a description of the method of data collection, tasks and responsibilities, formatting, and storage
- an outline of the data analysis

All decisions that affect how the characterization will be conducted should be made during the planning phase. The process characterization should be conducted according to this plan, with all exceptions noted.

Step 2: Collect

Data collection is essentially just the execution of the sampling plan part of the previous step. If a good job were done in the planning step, then this step should be pretty straightforward. It is important to execute to the plan as closely as possible and to note any exceptions.

Step 3: Analyze and interpret

This is the combination of quantitative (regression, ANOVA, correlation, etc.) and graphical (histograms, scatter plots, box plots, etc.) analysis techniques that are applied to the collected data in order to accomplish the goals of the PPC.

Step 4: Report

Reporting is an important step that should not be overlooked. By creating an informative report and archiving it in an accessible place, we can ensure that others have access to the information generated by the PPC. Often, the work involved in a PPC can be minimized by using the results of other, similar studies. Examples of PPC reports can be found in the Case Studies section.

Further information

The planning and data collection steps are described in detail in the data collection section. The analysis and interpretation steps are covered in detail in the analysis section. Examples of the reporting step can be seen in the Case Studies.


3.2. Assumptions / Prerequisites

Primary goal is to identify and quantify sources of variation

The primary goal of PPC is to identify and quantify sources of variation. Only by doing this will we be able to define an effective plan for variation reduction and process improvement. Sometimes, in order to achieve this goal, we must first build mathematical/statistical models of our processes. In these models we will identify influential factors and the responses on which they have an effect. We will use these models to understand how the sources of variation are influenced by the important factors. This subsection will review many of the modeling tools we have at our disposal to accomplish these tasks. In particular, the models covered in this section are linear models, Analysis of Variance (ANOVA) models and discrete models.

Contents: Section 2

1. General Assumptions
2. Continuous Linear
3. Analysis of Variance
   1. One-Way
   2. Crossed
   3. Nested
4. Discrete


3.2.1. General Assumptions

Assumption: process is the sum of a systematic component and a random component

In order to employ the modeling techniques described in this section, there are a few assumptions about the process under study that must be made. First, we must assume that the process can adequately be modeled as the sum of a systematic component and a random component. The systematic component is the mathematical model part and the random component is the error or noise present in the system. We also assume that the systematic component is fixed over the range of operating conditions and that the random component has a constant location, spread and distributional form.

Assumption: data used to fit these models are representative of the process being modeled

Finally, we assume that the data used to fit these models are representative of the process being modeled. As a result, we must additionally assume that the measurement system used to collect the data has been studied and proven to be capable of making measurements to the desired precision and accuracy. If this is not the case, refer to the Measurement Capability Section of this Handbook.


3.2.2. Continuous Linear Model

Description

The continuous linear model (CLM) is probably the most commonly used model in PPC. It is applicable in many instances ranging from simple control charts to response surface models.

The CLM is a mathematical function that relates explanatory variables (either discrete or continuous) to a single continuous response variable. It is called linear because the coefficients of the terms are expressed as a linear sum. The terms themselves do not have to be linear.

Model

The general form of the CLM is:

    y = a0 + a1*f1(x1) + a2*f2(x2) + ... + ap*fp(xp) + e

This equation just says that if we have p explanatory variables then the response is modeled by a constant term plus a sum of functions of those explanatory variables, plus some random error term. This will become clear as we look at some examples below.

Estimation

The coefficients for the parameters in the CLM are estimated by the method of least squares. This is a method that gives estimates which minimize the sum of the squared distances from the observations to the fitted line or plane. See the chapter on Process Modeling for a more complete discussion on estimating the coefficients for these models.

Testing

The tests for the CLM involve testing that the model as a whole is a good representation of the process and whether any of the coefficients in the model are zero or have no effect on the overall fit. Again, the details for testing are given in the chapter on Process Modeling.

Assumptions

For estimation purposes, there are no additional assumptions necessary for the CLM beyond those stated in the assumptions section. For testing purposes, however, it is necessary to assume that the error term is adequately modeled by a Gaussian distribution.

Uses

The CLM has many uses such as building predictive process models over a range of process settings that exhibit linear behavior, control charts, process capability, building models from the data produced by designed experiments, and building response surface models for automated process control applications.

Examples

Shewhart Control Chart - The simplest example of a very common usage of the CLM is the underlying model used for Shewhart control charts. This model assumes that the process parameter being measured is a constant with additive Gaussian noise and is given by:

    y = c + e

Diffusion Furnace - Suppose we want to model the average wafer sheet resistance as a function of the location or zone in a furnace tube, the temperature, and the anneal time. In this case, let there be 3 distinct zones (front, center, back) and let temperature and time be continuous explanatory variables. This model is given by the CLM:

    R = a0 + f(zone) + a1*temperature + a2*time + e,

where f(zone) takes one of three values according to whether the wafer sat in the front, center or back zone.

Diffusion Furnace (cont.) - Usually, the fitted line for the average wafer sheet resistance is not straight but has some curvature to it. This can be accommodated by adding a quadratic term for the time parameter as follows:

    R = a0 + f(zone) + a1*temperature + a2*time + a3*time^2 + e
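
As an illustration of least-squares estimation for such a model, the sketch below (assuming NumPy) fits a constant, temperature, time and time-squared term to invented furnace-style data; it omits the zone term for brevity and is not the Handbook's own fit.

```python
# Least-squares fit of a simple CLM: R = a0 + a1*temp + a2*time + a3*time^2 + e.
import numpy as np

temp = np.array([900.0, 900.0, 950.0, 950.0, 1000.0, 1000.0])
time = np.array([30.0, 60.0, 30.0, 60.0, 30.0, 60.0])
y = np.array([85.0, 92.0, 80.0, 88.0, 76.0, 85.0])   # hypothetical sheet resistance

X = np.column_stack([np.ones_like(temp), temp, time, time**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # minimizes squared residuals
print("fitted coefficients:", coef.round(4))
```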



3.2.3. Analysis of Variance Models (ANOVA)

ANOVA allows us to compare the effects of multiple levels of multiple factors

One of the most common analysis activities in PPC is comparison. We often compare the performance of similar tools or processes. We also compare the effect of different treatments such as recipe settings. When we compare two things, such as two tools running the same operation, we use comparison techniques. When we want to compare multiple things, like multiple tools running the same operation or multiple tools with multiple operators running the same operation, we turn to ANOVA techniques to perform the analysis.

ANOVA splits the data into components

The easiest way to understand ANOVA is through a concept known as value splitting. ANOVA splits the observed data values into components that are attributable to the different levels of the factors. Value splitting is best explained by example.

Example: Turned Pins

The simplest example of value splitting is when we just have one level of one factor. Suppose we have a turning operation in a machine shop where we are turning pins to a diameter of .125 +/- .005 inches. Throughout the course of a day we take five samples of pins and obtain the following measurements: .125, .127, .124, .126, .128.

We can split these data values into a common value (mean) and residuals (what's left over) as follows:

    data:       .125   .127   .124   .126   .128
    common:     .126   .126   .126   .126   .126
    residuals: -.001   .001  -.002   .000   .002

From these tables, also called overlays, we can easily calculate the location and spread of the data as follows:

    mean = .126
    std. deviation = .0016
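
The same split can be reproduced in a few lines (NumPy assumed); the numbers match the overlays above.

```python
# Value splitting for one level of one factor: common value plus residuals.
import numpy as np

pins = np.array([0.125, 0.127, 0.124, 0.126, 0.128])
common = pins.mean()
residuals = pins - common

print(common)            # 0.126
print(residuals)         # [-0.001  0.001 -0.002  0.     0.002]
print(pins.std(ddof=1))  # ~0.0016
```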

Other layouts

While the above example is a trivial structural layout, it illustrates how we can split data values into their components. In the next sections, we will look at more complicated structural layouts for the data. In particular we will look at multiple levels of one factor (One-Way ANOVA) and multiple levels of two factors (Two-Way ANOVA) where the factors are crossed and nested.



3.2.3.1. One-Way ANOVA

Description

A one-way layout consists of a single factor with several levels and multiple observations at each level. With this kind of layout we can calculate the mean of the observations within each level of our factor. The residuals will tell us about the variation within each level. We can also average the means of each level to obtain a grand mean. We can then look at the deviation of the mean of each level from the grand mean to understand something about the level effects. Finally, we can compare the variation within levels to the variation across levels. Hence the name analysis of variance.

Model

It is easy to model all of this with an equation of the form:

    y_ij = m + a_i + e_ij

The equation indicates that the jth data value, from level i, is the sum of three components: the common value (grand mean), the level effect (the deviation of each level mean from the grand mean), and the residual (what's left over).

Estimation

Estimation for the one-way layout can be performed one of two ways. First, we can calculate the total variation, within-level variation and across-level variation. These can be summarized in a table as shown below and tests can be made to determine if the factor levels are significant. The one-way value-splitting example illustrates the calculations involved.

ANOVA table for one-way case

In general, the ANOVA table for the one-way case (I levels of the factor, J observations per level) is given by:

    Source       Sum of Squares                      DoF        Mean Square       F0
    Factor       SSf = J * sum_i (ybar_i - ybar)^2   I - 1      SSf / (I - 1)     MS(Factor) / MS(Residual)
    Residual     SSe = sum_ij (y_ij - ybar_i)^2      I(J - 1)   SSe / (I(J - 1))
    Corr. Total  SSt = sum_ij (y_ij - ybar)^2        IJ - 1

where ybar_i is the mean of level i and ybar is the grand mean, so that SSt = SSf + SSe. The row labeled "Corr. Total" in the ANOVA table contains the corrected total sum of squares and the associated degrees of freedom (DoF).

Level effects must sum to zero

The second way to estimate effects is through the use of CLM techniques. If you look at the model above you will notice that it is in the form of a CLM. The only problem is that the model is saturated and no unique solution exists. We overcome this problem by applying a constraint to the model. Since the level effects are just deviations from the grand mean, they must sum to zero. By applying the constraint that the level effects must sum to zero, we can now obtain a unique solution to the CLM equations. Most analysis programs will handle this for you automatically. See the chapter on Process Modeling for a more complete discussion on estimating the coefficients for these models.

Testing

We are testing to see if the observed data support the hypothesis that the levels of the factor are significantly different from each other. The way we do this is by comparing the within-level variance to the between-level variance.

If we assume that the observations within each level have the same variance, we can calculate the variance within each level and pool these together to obtain an estimate of the overall population variance. This works out to be the mean square of the residuals.

Similarly, if there really were no level effect, the mean square across levels would be an estimate of the overall variance. Therefore, if there really were no level effect, these two estimates would be just two different ways to estimate the same parameter and should be close numerically. However, if there is a level effect, the level mean square will be higher than the residual mean square.

It can be shown that given the assumptions about the data stated below, the ratio of the level mean square and the residual mean square follows an F distribution with degrees of freedom as shown in the ANOVA table. If the F0 value is significant at a given significance level (greater than the cut-off value in an F table), then there is a level effect present in the data.

Assumptions

For estimation purposes, we assume the data can adequately be modeled as the sum of a deterministic component and a random component. We further assume that the fixed (deterministic) component can be modeled as the sum of an overall mean and some contribution from the factor level. Finally, it is assumed that the random component can be modeled with a Gaussian distribution with fixed location and spread.

Uses

The one-way ANOVA is useful when we want to compare the effect of multiple levels of one factor and we have multiple observations at each level. The factor can be either discrete (different machine, different plants, different shifts, etc.) or continuous (different gas flows, temperatures, etc.).

Example

Let's extend the machining example by assuming that we have five different machines making the same part and we take five random samples from each machine to obtain the following diameter data:

    Machine
    1      2      3      4      5
    0.125  0.118  0.123  0.126  0.118
    0.127  0.122  0.125  0.128  0.129
    0.125  0.120  0.125  0.126  0.127
    0.126  0.124  0.124  0.127  0.120
    0.128  0.119  0.126  0.129  0.121

Analyze

Using ANOVA software or the techniques of the value-splitting example, we summarize the data in an ANOVA table as follows:

    Source           Sum of Squares  Deg. of Freedom  Mean Square  F0
    Factor           0.000137        4                0.000034     4.86
    Residual         0.000132        20               0.000007
    Corrected Total  0.000269        24

Test

By dividing the factor-level mean square by the residual mean square, we obtain an F0 value of 4.86 which is greater than the cut-off value of 2.87 from the F distribution with 4 and 20 degrees of freedom and a significance level of 0.05. Therefore, there is sufficient evidence to reject the hypothesis that the levels are all the same.

Conclusion

From the analysis of these data we can conclude that the factor "machine" has an effect. There is a statistically significant difference in the pin diameters across the machines on which they were manufactured.
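
The same test can be run with SciPy's one-way ANOVA, as sketched below. Note that from unrounded sums of squares the F statistic is about 5.21; the 4.86 above comes from the rounded mean squares (0.000034/0.000007). Either way the statistic exceeds the 2.87 cut-off, so the conclusion is unchanged.

```python
# One-way ANOVA on the machine data above (SciPy assumed available).
from scipy import stats

m1 = [0.125, 0.127, 0.125, 0.126, 0.128]
m2 = [0.118, 0.122, 0.120, 0.124, 0.119]
m3 = [0.123, 0.125, 0.125, 0.124, 0.126]
m4 = [0.126, 0.128, 0.126, 0.127, 0.129]
m5 = [0.118, 0.129, 0.127, 0.120, 0.121]

F, p = stats.f_oneway(m1, m2, m3, m4, m5)
print(F, p)  # F ~ 5.2, p ~ 0.005: reject equal level means at the 0.05 level
```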



3.2.3.1.1. One-Way Value-Splitting

Example

Let's use the data from the machining example to illustrate how to use the techniques of value-splitting to break each data value into its component parts. Once we have the component parts, it is then a trivial matter to calculate the sums of squares and form the F-value for the test.

    Machine
    1     2     3     4     5
    .125  .118  .123  .126  .118
    .127  .122  .125  .128  .129
    .125  .120  .125  .126  .127
    .126  .124  .124  .127  .120
    .128  .119  .126  .129  .121

Calculate level means

Remember from our model, y_ij = m + a_i + e_ij, we say each observation is the sum of a common value, a level effect and a residual value. Value-splitting just breaks each observation into its component parts. The first step in value-splitting is to calculate the mean values (rounding to the nearest thousandth) within each machine to get the level means.

    Machine
    1      2      3      4      5
    .1262  .1206  .1246  .1272  .123

Sweep level means

We can then sweep (subtract the level mean from each associated data value) the means through the original data table to get the residuals:

    Machine
    1       2       3       4       5
    -.0012  -.0026  -.0016  -.0012  -.005
     .0008   .0014   .0004   .0008   .006
    -.0012  -.0006   .0004  -.0012   .004
    -.0002   .0034  -.0006  -.0002  -.003
     .0018  -.0016   .0014   .0018  -.002

Calculate the grand mean

The next step is to calculate the grand mean from the individual machine means as:

    Grand Mean = .12432

Sweep the grand mean through the level means

Finally, we can sweep the grand mean through the individual level means to obtain the level effects:

    Machine
    1       2        3       4       5
    .00188  -.00372  .00028  .00288  -.00132

It is easy to verify that the original data table can be constructed by adding the overall mean, the machine effect and the appropriate residual.

Calculate ANOVA values

Now that we have the data values split and the overlays created, the next step is to calculate the various values in the One-Way ANOVA table. We have three values to calculate for each overlay. They are the sums of squares, the degrees of freedom, and the mean squares.

Total sum of squares

The total sum of squares is calculated by summing the squares of all the data values and subtracting from this number the square of the grand mean times the total number of data values. We usually don't calculate the mean square for the total sum of squares because we don't use this value in any statistical test.

Residual sum of squares, degrees of freedom and mean square

The residual sum of squares is calculated by summing the squares of the residual values. This is equal to .000132. The degrees of freedom is the number of unconstrained values. Since the residuals for each level of the factor must sum to zero, once we know four of them, the last one is determined. This means we have four unconstrained values for each level, or 20 degrees of freedom. This gives a mean square of .000007.

Level sum of squares, degrees of freedom and mean square

Finally, to obtain the sum of squares for the levels, we sum the squares of each value in the level effect overlay and multiply the sum by the number of observations for each level (in this case 5) to obtain a value of .000137. Since the deviations of the level means from the grand mean must sum to zero, we have only four unconstrained values so the degrees of freedom for level effects is 4. This produces a mean square of .000034.

Calculate F-value

The last step is to calculate the F-value and perform the test of equal level means. The F-value is just the level mean square divided by the residual mean square. In this case the F-value = 4.86. If we look in an F-table for 4 and 20 degrees of freedom at 95% confidence, we see that the critical value is 2.87, which means that we have a significant result and that there is thus evidence of a strong machine effect. By looking at the level-effect overlay we see that this is driven by machines 2 and 4.



3.2.3.2. Two-Way Crossed ANOVA

Description

When we have two factors with at least two levels and one or more observations at each level, we say we have a two-way layout. We say that the two-way layout is crossed when every level of Factor A occurs with every level of Factor B. With this kind of layout we can estimate the effect of each factor (Main Effects) as well as any interaction between the factors.

Model

If we assume that we have K observations at each combination of I levels of Factor A and J levels of Factor B, then we can model the two-way layout with an equation of the form:

    y_ijk = m + a_i + b_j + (ab)_ij + e_ijk

This equation just says that the kth data value for the jth level of Factor B and the ith level of Factor A is the sum of five components: the common value (grand mean), the level effect for Factor A, the level effect for Factor B, the interaction effect, and the residual. Note that (ab) does not mean multiplication; rather that there is interaction between the two factors.

Estimation

Like the one-way case, the estimation for the two-way layout can be done either by calculating the variance components or by using CLM techniques.

For the two-way ANOVA, we display the data in a two-dimensional table with the levels of Factor A in columns and the levels of Factor B in rows. The replicate observations fill each cell. We can sweep out the common value, the row effects, the column effects, the interaction effects and the residuals using value-splitting techniques (see the two-way value-splitting example). Sums of squares can be calculated and summarized in an ANOVA table; in the balanced case it takes the form:

    Source       Sum of Squares                  DoF             Mean Square  F0
    Factor A     SSa = JK * sum_i a_i^2          I - 1           SSa / DoF    MSa / MSe
    Factor B     SSb = IK * sum_j b_j^2          J - 1           SSb / DoF    MSb / MSe
    Interaction  SSab = K * sum_ij (ab)_ij^2     (I - 1)(J - 1)  SSab / DoF   MSab / MSe
    Residual     SSe = sum_ijk e_ijk^2           IJ(K - 1)       SSe / DoF
    Corr. Total  SSt = sum_ijk (y_ijk - ybar)^2  IJK - 1

The row labeled "Corr. Total" in the ANOVA table contains the corrected total sum of squares and the associated degrees of freedom (DoF).

We can use CLM techniques to do the estimation. We still have the problem that the model is saturated and no unique solution exists. We overcome this problem by applying the constraints to the model that the two main effects and interaction effects each sum to zero.

Testing

Like testing in the one-way case, we are testing that two main effects and the interaction are zero. Again we just form a ratio of each main effect mean square and the interaction mean square to the residual mean square. If the assumptions stated below are true then those ratios follow an F distribution and the test is performed by comparing the F0 ratios to values in an F table with the appropriate degrees of freedom and confidence level.

Assumptions

For estimation purposes, we assume the data can be adequately modeled as described in the model above. It is assumed that the random component can be modeled with a Gaussian distribution with fixed location and spread.

Uses

The two-way crossed ANOVA is useful when we want to compare the effect of multiple levels of two factors and we can combine every level of one factor with every level of the other factor. If we have multiple observations at each level, then we can also estimate the effects of interaction between the two factors.

Example

Let's extend the one-way machining example by assuming that we want to test if there are any differences in pin diameters due to different types of coolant. We still have five different machines making the same part and we take five samples from each machine for each coolant type to obtain the following data:

              Machine
              1      2      3      4      5
    Coolant A 0.125  0.118  0.123  0.126  0.118
              0.127  0.122  0.125  0.128  0.129
              0.125  0.120  0.125  0.126  0.127
              0.126  0.124  0.124  0.127  0.120
              0.128  0.119  0.126  0.129  0.121
    Coolant B 0.124  0.116  0.122  0.126  0.125
              0.128  0.125  0.121  0.129  0.123
              0.127  0.119  0.124  0.125  0.114
              0.126  0.125  0.126  0.130  0.124
              0.129  0.120  0.125  0.124  0.117

Analyze

For analysis details see the crossed two-way value-splitting example. We can summarize the analysis results in an ANOVA table as follows:

    Source           Sum of Squares  Deg. of Freedom  Mean Square  F0
    machine          0.000303        4                0.000076     8.8
    coolant          0.00000392      1                0.00000392   0.45
    interaction      0.00001468      4                0.00000367   0.42
    residuals        0.000346        40               0.0000087
    corrected total  0.000668        49

Test

By dividing the mean square for machine by the mean square for residuals we obtain an F0 value of 8.8 which is greater than the critical value of 2.61 based on 4 and 40 degrees of freedom and a 0.05 significance level. Likewise the F0 values for Coolant and Interaction, obtained by dividing their mean squares by the residual mean square, are less than their respective critical values of 4.08 and 2.61 (0.05 significance level).

Conclusion

From the ANOVA table we can conclude that machine is the most important factor and is statistically significant. Coolant is not significant and neither is the interaction. These results would lead us to believe that some tool-matching efforts would be useful for improving this process.
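
The same two-way analysis can be reproduced with statsmodels (assumed available), as sketched below using the machine-by-coolant data from the table above.

```python
# Two-way crossed ANOVA with interaction via statsmodels.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

a = {1: [0.125, 0.127, 0.125, 0.126, 0.128], 2: [0.118, 0.122, 0.120, 0.124, 0.119],
     3: [0.123, 0.125, 0.125, 0.124, 0.126], 4: [0.126, 0.128, 0.126, 0.127, 0.129],
     5: [0.118, 0.129, 0.127, 0.120, 0.121]}
b = {1: [0.124, 0.128, 0.127, 0.126, 0.129], 2: [0.116, 0.125, 0.119, 0.125, 0.120],
     3: [0.122, 0.121, 0.124, 0.126, 0.125], 4: [0.126, 0.129, 0.125, 0.130, 0.124],
     5: [0.125, 0.123, 0.114, 0.124, 0.117]}

rows = [(m, c, y) for c, d in (("A", a), ("B", b)) for m, ys in d.items() for y in ys]
df = pd.DataFrame(rows, columns=["machine", "coolant", "diameter"])

model = smf.ols("diameter ~ C(machine) * C(coolant)", data=df).fit()
print(anova_lm(model))  # machine F ~ 8.8; coolant and interaction not significant
```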



3.2.3.2.1. Two-way Crossed Value-Splitting Example

Example: Coolant is completely crossed with machine. The data table below contains five samples each collected from five different lathes, each running two different types of coolant. The measurement is the diameter of a turned pin.

             Machine
              1      2      3      4      5
Coolant A    .125   .118   .123   .126   .118
             .127   .122   .125   .128   .129
             .125   .120   .125   .126   .127
             .126   .124   .124   .127   .120
             .128   .119   .126   .129   .121
Coolant B    .124   .116   .122   .126   .125
             .128   .125   .121   .129   .123
             .127   .119   .124   .125   .114
             .126   .125   .126   .130   .124
             .129   .120   .125   .124   .117

For the crossed two-way case, the first thing we need to do is to sweep the cell means from the data table to obtain the residual values. This is shown in the tables below.

The first step is to sweep out the cell means to obtain the residuals and means.

Cell means:

             Machine
              1       2       3       4       5
Coolant A    .1262   .1206   .1246   .1272   .1230
Coolant B    .1268   .1210   .1236   .1268   .1206

Residuals:

             Machine
              1       2       3       4       5
Coolant A   -.0012  -.0026  -.0016  -.0012  -.0050
             .0008   .0014   .0004   .0008   .0060
            -.0012  -.0006   .0004  -.0012   .0040
            -.0002   .0034  -.0006  -.0002  -.0030
             .0018  -.0016   .0014   .0018  -.0020
Coolant B   -.0028  -.0050  -.0016  -.0008   .0044
             .0012   .0040  -.0026   .0022   .0024
             .0002  -.0020   .0004  -.0018  -.0066
            -.0008   .0040   .0024   .0032   .0034
             .0022  -.0010   .0014  -.0028  -.0036

Sweep the row means: The next step is to sweep out the row means. This gives the table below.

             Row mean    Machine
                          1       2       3       4       5
Coolant A    .1243       .0019  -.0037   .0003   .0029  -.0013
Coolant B    .1238       .0030  -.0028  -.0002   .0030  -.0032

Sweep the column means: Finally, we sweep the column means to obtain the grand mean, row (coolant) effects, column (machine) effects and the interaction effects.

             Grand mean/       Machine
             coolant effect     1        2        3        4        5
Common       .1241             .0025   -.0033    .00005   .0030   -.0023
Coolant A    .0003            -.0006   -.0005    .00025   .0000    .0010
Coolant B   -.0003             .0006    .0005   -.00025   .0000   -.0010

What do these tables tell us? By looking at the table of residuals, we see that the residuals for coolant B tend to be a little higher than for coolant A. This implies that there may be more variability in diameter when we use coolant B. From the effects table above, we see that machines 2 and 5 produce smaller pin diameters than the other machines. There is also a very slight coolant effect but the machine effect is larger. Finally, there also appear to be slight interaction effects. For instance, machines 1 and 2 had smaller diameters with coolant A but the opposite was true for machines 3, 4 and 5.

Calculate sums of squares and mean squares: We can calculate the values for the ANOVA table according to the formulae in the table on the crossed two-way page. This gives the table below. From the F-values we see that the machine effect is significant but the coolant and the interaction are not.

Source           Sums of Squares   Degrees of Freedom   Mean Square   F-value
Machine          .000303            4                   .000076       8.8 > 2.61
Coolant          .00000392          1                   .00000392     .45 < 4.08
Interaction      .00001468          4                   .00000367     .42 < 2.61
Residual         .000346           40                   .0000087
Corrected Total  .000668           49
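
The sweeps above are simple array operations. A minimal sketch in Python/NumPy (variable names are ours) that reproduces the effects, residuals and sums of squares:

    import numpy as np

    # Diameters with shape (coolant=2, machine=5, repeat=5), from the data table above.
    y = np.array([
        [[.125, .127, .125, .126, .128], [.118, .122, .120, .124, .119],
         [.123, .125, .125, .124, .126], [.126, .128, .126, .127, .129],
         [.118, .129, .127, .120, .121]],                                  # coolant A
        [[.124, .128, .127, .126, .129], [.116, .125, .119, .125, .120],
         [.122, .121, .124, .126, .125], [.126, .129, .125, .130, .124],
         [.125, .123, .114, .124, .117]],                                  # coolant B
    ])

    cell = y.mean(axis=2, keepdims=True)          # sweep 1: cell means
    resid = y - cell                              # residuals
    coolant = cell.mean(axis=1, keepdims=True)    # sweep 2: row (coolant) means
    grand = coolant.mean()                        # sweep 3: grand mean
    machine = (cell - coolant).mean(axis=0, keepdims=True)   # machine effects
    interact = cell - coolant - machine           # interaction effects
    coolant_eff = coolant - grand                 # coolant effects

    print(grand)                                  # 0.12404
    print(10 * (machine ** 2).sum())              # SS machine     = 0.000303
    print(25 * (coolant_eff ** 2).sum())          # SS coolant     = 0.0000039
    print(5 * (interact ** 2).sum())              # SS interaction = 0.0000147
    print((resid ** 2).sum())                     # SS residual    = 0.000346

Each sum of squares is the squared effect summed over the observations it applies to, which is why the multipliers are 10 per machine, 25 per coolant and 5 per cell.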



3.2.3.3. Two-Way Nested ANOVA

Description Sometimes, constraints prevent us from crossing every level of one factor with every level of the
other factor. In these cases we are forced into what is known as a nested layout. We say we have
a nested layout when fewer than all levels of one factor occur within each level of the other
factor. An example of this might be if we want to study the effects of different machines and
different operators on some output characteristic, but we can't have the operators change the
machines they run. In this case, each operator is not crossed with each machine but rather only
runs one machine.

Model If Factor B is nested within Factor A, then a level of Factor B can only occur within one level of Factor A and there can be no interaction. This gives the following model:

y_ijk = m + a_i + b_j(i) + e_ijk

This equation indicates that each data value is the sum of a common value (grand mean), the level effect for Factor A, the level effect of Factor B nested within Factor A, and the residual.

Estimation For a nested design we typically use variance components methods to perform the analysis.  We
can sweep out the common value, the Factor A effects, the Factor B within A effects and the
residuals using  value-splitting techniques. Sums of squares can be calculated and summarized in
an ANOVA table as shown below.

Click here for nested value-splitting example: It is important to note that with this type of layout, since each level of one factor is only present with one level of the other factor, we can't estimate interaction between the two.

ANOVA table for nested case (a levels of Factor A, b levels of Factor B within each level of A, and n replications per cell):

Source        Sum of Squares                                      DoF       Mean Square
Factor A      SS(A) = bn * SUM_i (ybar_i.. - ybar...)^2           a-1       SS(A)/(a-1)
B within A    SS(B(A)) = n * SUM_i SUM_j (ybar_ij. - ybar_i..)^2  a(b-1)    SS(B(A))/[a(b-1)]
Residual      SSE = SUM_i SUM_j SUM_k (y_ijk - ybar_ij.)^2        ab(n-1)   SSE/[ab(n-1)]
Corr. Total   SST = SUM_i SUM_j SUM_k (y_ijk - ybar...)^2         abn-1


The row labeled "Corr. Total" in the ANOVA table contains the corrected total sum of squares and the associated degrees of freedom (DoF).

As with the crossed layout, we can also use CLM techniques. We still have the problem that the
model is saturated and no unique solution exists. We overcome this problem by applying to the
model the constraints that the two main effects sum to zero.

Testing We are testing that the two main effects are zero. Again we just form a ratio (F0) of each main effect mean square to the appropriate mean-squared error term. (Note that the error term for Factor A is not MSE, but is MSB.) If the assumptions stated below are true then those ratios follow an F distribution and the test is performed by comparing the F0 ratios to values in an F table with the appropriate degrees of freedom and confidence level.

Assumptions For estimation purposes, we assume the data can be adequately modeled by the model above and
that there is more than one variance component. It is assumed that the random component can be
modeled with a Gaussian distribution with fixed location and spread.

Uses The two-way nested ANOVA is useful when we are constrained from combining all the levels of
one factor with all of the levels of the other factor. These designs are most useful when we have
what is called a random effects situation. When the levels of a factor are chosen at random rather
than selected intentionally, we say we have a random effects model. An example of this is when
we select lots from a production run, then select units from the lot. Here the units are nested
within lots and the effect of each factor is random.

Example Let's change the two-way machining example slightly by assuming that we have five different
machines making the same part and each machine has two operators, one for the day shift and
one for the night shift. We take five samples from each machine for each operator to obtain the
following data:

                  Machine
                   1      2      3      4      5
Operator Day     0.125  0.118  0.123  0.126  0.118
                 0.127  0.122  0.125  0.128  0.129
                 0.125  0.120  0.125  0.126  0.127
                 0.126  0.124  0.124  0.127  0.120
                 0.128  0.119  0.126  0.129  0.121
Operator Night   0.124  0.116  0.122  0.126  0.125
                 0.128  0.125  0.121  0.129  0.123
                 0.127  0.119  0.124  0.125  0.114
                 0.126  0.125  0.126  0.130  0.124
                 0.129  0.120  0.125  0.124  0.117

Analyze For analysis details see the nested two-way value splitting example. We can summarize the
analysis results in an ANOVA table as follows: 


Source             Sum of Squares   Deg. of Freedom   Mean Square   F0
Machine            3.03e-4           4                7.58e-5       20.38
Operator(Machine)  1.86e-5           5                3.72e-6        0.428
Residuals          3.46e-4          40                8.70e-6
Corrected Total    6.68e-4          49

Test By dividing the mean square for Machine by the mean square for Operator within Machine, or
Operator(Machine), we obtain an F0 value of 20.38 which is greater than the critical value of
5.19 for 4 and 5 degrees of freedom at the 0.05 significance level. The F0 value for
Operator(Machine), obtained by dividing its mean square by the residual mean square, is less than
the critical value of 2.45 for 5 and 40 degrees of freedom at the 0.05 significance level.

Conclusion From the ANOVA table we can conclude that the Machine is the most important factor and is
statistically significant. The effect of Operator nested within Machine is not statistically
significant. Again, any improvement activities should be focused on the tools.
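
A minimal sketch of the nested analysis in Python/NumPy (variable names are ours), using the balanced-layout sums of squares:

    import numpy as np

    # Diameters with shape (machine=5, operator=2, sample=5); index 0 = Day, 1 = Night.
    y = np.array([
        [[.125, .127, .125, .126, .128], [.124, .128, .127, .126, .129]],
        [[.118, .122, .120, .124, .119], [.116, .125, .119, .125, .120]],
        [[.123, .125, .125, .124, .126], [.122, .121, .124, .126, .125]],
        [[.126, .128, .126, .127, .129], [.126, .129, .125, .130, .124]],
        [[.118, .129, .127, .120, .121], [.125, .123, .114, .124, .117]],
    ])

    a, b, n = y.shape
    cell = y.mean(axis=2)                  # operator-within-machine means
    mach = cell.mean(axis=1)               # machine means
    grand = mach.mean()

    ss_mach = b * n * ((mach - grand) ** 2).sum()        # 4 DoF
    ss_oper = n * ((cell - mach[:, None]) ** 2).sum()    # 5 DoF
    ss_res = ((y - cell[:, :, None]) ** 2).sum()         # 40 DoF

    ms_mach, ms_oper, ms_res = ss_mach / 4, ss_oper / 5, ss_res / 40
    print(ms_mach / ms_oper)   # F0 for Machine against Operator(Machine): 20.38
    print(ms_oper / ms_res)    # F0 for Operator(Machine) against residual: 0.428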



3.2.3.3.1. Two-Way Nested Value-Splitting Example

Example: Operator is nested within machine. The data table below contains data collected from five different lathes, each run by two different operators. Note we are concerned here with the effect of operators, so the layout is nested. If we were concerned with shift instead of operator, the layout would be crossed. The measurement is the diameter of a turned pin.

                        Sample
Machine   Operator     1      2      3      4      5
1         Day         .125   .127   .125   .126   .128
          Night       .124   .128   .127   .126   .129
2         Day         .118   .122   .120   .124   .119
          Night       .116   .125   .119   .125   .120
3         Day         .123   .125   .125   .124   .126
          Night       .122   .121   .124   .126   .125
4         Day         .126   .128   .126   .127   .129
          Night       .126   .129   .125   .130   .124
5         Day         .118   .129   .127   .120   .121
          Night       .125   .123   .114   .124   .117

For the nested two-way case, just as in the crossed case, the first thing we need to do is
to sweep the cell means from the data table to obtain the residual values. We then
sweep the nested factor (Operator) and the top level factor (Machine) to obtain the
table below.

                                                  Sample
Machine  Operator   Common    Machine   Operator    1       2       3       4       5
1        Day        .12404     .00246   -.0003    -.0012   .0008  -.0012  -.0002   .0018
         Night                           .0003    -.0028   .0012   .0002  -.0008   .0022
2        Day                  -.00324   -.0002    -.0026   .0014  -.0006   .0034  -.0016
         Night                           .0002    -.0050   .0040  -.0020   .0040  -.0010
3        Day                   .00006    .0005    -.0016   .0004   .0004  -.0006   .0014
         Night                          -.0005    -.0016  -.0026   .0004   .0024   .0014
4        Day                   .00296    .0002    -.0012   .0008  -.0012  -.0002   .0018
         Night                          -.0002    -.0008   .0022  -.0018   .0032  -.0028
5        Day                  -.00224    .0012    -.0050   .0060   .0040  -.0030  -.0020
         Night                          -.0012     .0044   .0024  -.0066   .0034  -.0036

What does this table tell us? By looking at the residuals we see that machines 2 and 5 have the greatest variability. There does not appear to be much of an operator effect but there is clearly a strong machine effect.

Calculate sums of squares and mean squares: We can calculate the values for the ANOVA table according to the formulae in the table on the nested two-way page. This produces the table below. From the F-values we see that the machine effect is significant but the operator effect is not. (Here it is assumed that both factors are fixed.)

Source             Sums of Squares   Degrees of Freedom   Mean Square   F-value
Machine            .000303            4                   .0000758      8.77 > 2.61
Operator(Machine)  .0000186           5                   .00000372     .428 < 2.45
Residual           .000346           40                   .0000087
Corrected Total    .000668           49



3.2.4. Discrete Models

Description There are many instances when we are faced with the
analysis of discrete data rather than continuous data.
Examples of this are yield (good/bad), speed bins
(slow/fast/faster/fastest), survey results (favor/oppose), etc.
We then try to explain the discrete outcomes with some
combination of discrete and/or continuous explanatory
variables. In this situation the modeling techniques we have
learned so far (CLM and ANOVA) are no longer appropriate.

Contingency table analysis and log-linear model: There are two primary methods available for the analysis of discrete response data. The first one applies to situations in which we have discrete explanatory variables and discrete responses and is known as Contingency Table Analysis. The model for this is covered in detail in this section. The second model applies when we have both discrete and continuous explanatory variables and is referred to as a Log-Linear Model. That model is beyond the scope of this Handbook, but interested readers should refer to the reference section of this chapter for a list of useful books on the topic.

Model Suppose we have n individuals that we classify according to two criteria, A and B. Suppose there are r levels of criterion A and s levels of criterion B. These responses can be displayed in an r x s table. For example, suppose we have a box of manufactured parts that we classify as good or bad and whether they came from supplier 1, 2 or 3.

Now, each cell of this table will have a count of the individuals who fall into its particular combination of classification levels. Let's call this count N_ij. The sum of all of these counts will be equal to the total number of individuals, N. Also, each row of the table will sum to N_i. and each column will sum to N_.j.

Under the assumption that there is no interaction between the two classifying variables (like the number of good or bad parts does not depend on which supplier they came from), we can calculate the counts we would expect to see in each cell. Let's call the expected count for any cell E_ij. Then the expected value for a cell is E_ij = N_i. * N_.j / N. All we need to do then is to compare the expected counts to the observed counts. If there is a considerable difference between the observed counts and the expected values, then the two variables interact in some way.

Estimation The estimation is very simple. All we do is make a table of the observed counts and then calculate the expected counts as described above.

Testing The test is performed using a Chi-Square goodness-of-fit test according to the following formula:

chi^2 = SUM (N_ij - E_ij)^2 / E_ij

where the summation is across all of the cells in the table.

Given the assumptions stated below, this statistic has approximately a chi-square distribution and is therefore compared against a chi-square table with (r-1)(s-1) degrees of freedom, with r and s as previously defined. If the value of the test statistic is less than the chi-square value for a given level of confidence, then the classifying variables are declared independent, otherwise they are judged to be dependent.

Assumptions The estimation and testing results above hold regardless of whether the sample model is Poisson, multinomial, or product-multinomial. The chi-square results start to break down if the counts in any cell are small, say < 5.

Uses The contingency table method is really just a test of interaction between discrete explanatory variables for discrete responses. The example given below is for two factors. The methods are equally applicable to more factors, but as with any interaction, as you add more factors the interpretation of the results becomes more difficult.

Example Suppose we are comparing the yield from two manufacturing processes. We want to know if one process has a higher yield.

Make table of counts:

            Good   Bad   Totals
Process A    86     14    100
Process B    80     20    100
Totals      166     34    200

Table 1. Yields for two production processes

We obtain the expected values by the formula given above. This gives the table below.


Calculate expected counts:

            Good   Bad   Totals
Process A    83     17    100
Process B    83     17    100
Totals      166     34    200

Table 2. Expected values for two production processes

Calculate chi-square statistic and compare to table value: The chi-square statistic is 1.276. This is below the chi-square value of 2.71 for 1 degree of freedom and 90% confidence. Therefore, we conclude that there is not a (significant) difference in process yield.

Conclusion Therefore, we conclude that there is no statistically significant difference between the two processes.
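
A minimal sketch of the same test in Python (assuming SciPy is available):

    import numpy as np
    from scipy.stats import chi2_contingency

    counts = np.array([[86, 14],    # Process A: good, bad
                       [80, 20]])   # Process B: good, bad

    # correction=False gives the plain chi-square statistic used above.
    chi2, p, dof, expected = chi2_contingency(counts, correction=False)
    print(chi2, dof)    # 1.276 on 1 degree of freedom
    print(expected)     # the expected counts of Table 2: 83/17 in each row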



3.3. Data Collection for PPC

Start with careful planning: The data collection process for PPC starts with careful planning. The planning consists of the definition of clear and concise goals, developing process models and devising a sampling plan.

Many things can go wrong in the data collection: This activity of course ends with the actual collection of the data, which is usually not as straightforward as it might appear. Many things can go wrong in the execution of the sampling plan. The problems can be mitigated with the use of check lists and by carefully documenting all exceptions to the original sampling plan.

Table of Contents

1. Set Goals
2. Modeling Processes
   1. Black-Box Models
   2. Fishbone Diagrams
   3. Relationships and Sensitivities
3. Define the Sampling Plan
   1. Identify the parameters, ranges and resolution
   2. Design sampling scheme
   3. Select sample sizes
   4. Design data storage formats
   5. Assign roles and responsibilities



3.3.1. Define Goals

State concise goals: The goal statement is one of the most important parts of the characterization plan. With clearly and concisely stated goals, the rest of the planning process falls naturally into place.

Goals usually defined in terms of key specifications: The goals are usually defined in terms of key specifications or manufacturing indices. We typically want to characterize a process and compare the results against these specifications. However, this is not always the case. We may, for instance, just want to quantify key process parameters and use our estimates of those parameters in some other activity like controller design or process improvement.

Example goal statements: Click on each of the links below to see Goal Statements for each of the case studies.

1. Furnace Case Study (Goal)
2. Machine Case Study (Goal)



3.3.2. Process Modeling

Identify influential parameters: Process modeling begins by identifying all of the important factors and responses. This is usually best done as a team effort and is limited to the scope set by the goal statement.

Document with black-box models: This activity is best documented in the form of a black-box model as seen in the figure below. In this figure all of the outputs are shown on the right and all of the controllable inputs are shown on the left. Any inputs or factors that may be observable but not controllable are shown on the top or bottom.

Model relationships using fishbone diagrams: The next step is to model relationships of the previously identified factors and responses. In this step we choose a parameter and identify all of the other parameters that may have an influence on it. This process is easily documented with fishbone diagrams as illustrated in the figure below.


The influenced parameter is put on the center line and the influential factors are listed off of the centerline and can be grouped into major categories like Tool, Material, Work Methods and Environment.

Document relationships and sensitivities: The final step is to document all known information about the relationships and sensitivities between the inputs and outputs. Some of the inputs may be correlated with each other as well as with the outputs. There may be detailed mathematical models available from other studies, or the information available may be vague; for a machining process, for example, we may only know that as the feed rate increases, the quality of the finish decreases.

It is best to document this kind of information in a table with all of the inputs and outputs listed both on the left column and on the top row. Then, correlation information can be filled in for each of the appropriate cells. See the case studies for an example.

Examples Click on each of the links below to see the process models for each of the case studies.

1. Case Study 1 (Process Model)
2. Case Study 2 (Process Model)



3.3.3. Define Sampling Plan

Sampling plan is detailed outline of measurements to be taken: A sampling plan is a detailed outline of which measurements will be taken at what times, on which material, in what manner, and by whom. Sampling plans should be designed in such a way that the resulting data will contain a representative sample of the parameters of interest and allow for all questions, as stated in the goals, to be answered.

Steps in the sampling plan: The steps involved in developing a sampling plan are:

1. identify the parameters to be measured, the range of possible values, and the required resolution
2. design a sampling scheme that details how and when samples will be taken
3. select sample sizes
4. design data storage formats
5. assign roles and responsibilities

Verify and execute: Once the sampling plan has been developed, it can be verified and then passed on to the responsible parties for execution.



3.3.3.1. Identifying Parameters, Ranges and Resolution

Our goals and the models we built in the previous steps should provide all of the information needed for selecting parameters and determining the expected ranges and the required measurement resolution.

Goals will tell us what to measure and how: The first step is to carefully examine the goals. This will tell you which response variables need to be sampled and how. For instance, if our goal states that we want to determine if an oxide film can be grown on a wafer to within 10 Angstroms of the target value with a uniformity of <2%, then we know we have to measure the film thickness on the wafers to an accuracy of at least +/- 3 Angstroms and we must measure at multiple sites on the wafer in order to calculate uniformity.

The goals and the models we build will also indicate which explanatory variables need to be sampled and how. Since the fishbone diagrams define the known important relationships, these will be our best guide as to which explanatory variables are candidates for measurement.

Ranges help screen outliers: Defining the expected ranges of values is useful for screening outliers. In the machining example, we would not expect to see many values that vary more than +/- .005" from nominal. Therefore we know that any values that are much beyond this interval are highly suspect and should be remeasured.

Resolution helps choose measurement equipment: Finally, the required resolution for the measurements should be specified. This specification will help guide the choice of metrology equipment and help define the measurement procedures. As a rule of thumb, we would like our measurement resolution to be at least 1/10 of our tolerance. For the oxide growth example, this means that we want to measure with an accuracy of 2 Angstroms. Similarly, for the turning operation we would need to measure the diameter within .001". This means that vernier calipers would be adequate as the measurement device for this application.

Examples Click on each of the links below to see the parameter descriptions for each of the case studies.

1. Case Study 1 (Sampling Plan)
2. Case Study 2 (Sampling Plan)



3.3.3.2. Choosing a Sampling Scheme

A sampling scheme defines what data will be obtained and how: A sampling scheme is a detailed description of what data will be obtained and how this will be done. In PPC we are faced with two different situations for developing sampling schemes. The first is when we are conducting a controlled experiment. There are very efficient and exact methods for developing sampling schemes for designed experiments and the reader is referred to the Process Improvement chapter for details.

Passive data collection: The second situation is when we are conducting a passive data collection (PDC) study to learn about the inherent properties of a process. These types of studies are usually for comparison purposes when we wish to compare properties of processes against each other or against some hypothesis. This is the situation that we will focus on here.

There are two principles that guide our choice of sampling scheme: Once we have selected our response parameters, it would seem to be a rather straightforward exercise to take some measurements, calculate some statistics and draw conclusions. There are, however, many things which can go wrong along the way that can be avoided with careful planning and knowing what to watch for. There are two overriding principles that will guide the design of our sampling scheme.

The first is precision: The first principle is that of precision. If the sampling scheme is properly laid out, the difference between our estimate of some parameter of interest and its true value will be due only to random variation. The size of this random variation is measured by a quantity called the standard error. The smaller the standard error, the more precise are our estimates.

Precision of an estimate depends on several factors: The precision of any estimate will depend on:

the inherent variability of the process estimator
the measurement error
the number of independent replications (sample size)
the efficiency of the sampling scheme.


The second is systematic sampling error (or confounded effects): The second principle is the avoidance of systematic errors. Systematic sampling error occurs when the levels of one explanatory variable are the same as those of some other, unaccounted for, explanatory variable. This is also referred to as confounded effects. Systematic sampling error is best seen by example.

Example 1: We want to compare the effect of two different coolants on the resulting surface finish from a turning operation. It is decided to run one lot, change the coolant and then run another lot. With this sampling scheme, there is no way to distinguish the coolant effect from the lot effect or from tool wear considerations. There is systematic sampling error in this sampling scheme.

Example 2: We wish to examine the effect of two pre-clean procedures on the uniformity of an oxide growth process. We clean one cassette of wafers with one method and another cassette with the other method. We load one cassette in the front of the furnace tube and the other cassette in the middle. To complete the run, we fill the rest of the tube with other lots. With this sampling scheme, there is no way to distinguish between the effect of the different pre-clean methods and the cassette effect or the tube location effect. Again, we have systematic sampling errors.

Stratification helps to overcome systematic error: The way to combat systematic sampling errors (and at the same time increase precision) is through stratification and randomization. Stratification is the process of segmenting our population across levels of some factor so as to minimize variability within those segments or strata. For instance, if we want to try several different process recipes to see which one is best, we may want to be sure to apply each of the recipes to each of the three work shifts. This will ensure that we eliminate any systematic errors caused by a shift effect. This is where the ANOVA designs are particularly useful.

Randomization helps too: Randomization is the process of randomly applying the various treatment combinations. In the above example, we would not want to apply recipe 1, 2 and 3 in the same order for each of the three shifts but would instead randomize the order of the three recipes in each shift. This will avoid any systematic errors caused by the order of the recipes.


Examples The issues here are many and complicated. Click on each
of the links below to see the sampling schemes for each of
the case studies.

1. Case Study 1 (Sampling Plan)


2. Case Study 2 (Sampling Plan)



3.3.3.3. Selecting Sample Sizes

Consider these things when selecting a sample size: When choosing a sample size, we must consider the following issues:

What population parameters we want to estimate
Cost of sampling (importance of information)
How much is already known
Spread (variability) of the population
Practicality: how hard is it to collect data
How precise we want the final estimates to be

Cost of taking samples: The cost of sampling issue helps us determine how precise our estimates should be. As we will see below, when choosing sample sizes we need to select risk values. If the decisions we will make from the sampling activity are very valuable, then we will want low risk values and hence larger sample sizes.

Prior information: If our process has been studied before, we can use that prior information to reduce sample sizes. This can be done by using prior mean and variance estimates and by stratifying the population to reduce variation within groups.

Inherent variability: We take samples to form estimates of some characteristic of the population of interest. The variance of that estimate is proportional to the inherent variability of the population divided by the sample size:

Var(theta-hat) ∝ sigma^2 / n

with theta denoting the parameter we are trying to estimate. This means that if the variability of the population is large, then we must take many samples. Conversely, a small population variance means we don't have to take as many samples.

Practicality Of course the sample size you select must make sense. This is where the trade-offs usually occur. We want to take enough observations to obtain reasonably precise estimates of the parameters of interest but we also want to do this within a practical resource budget. The important thing is to quantify the risks associated with the chosen sample size.

Sample size determination: In summary, the steps involved in estimating a sample size are:

1. There must be a statement about what is expected of the sample. We must determine what it is we are trying to estimate, how precise we want the estimate to be, and what we are going to do with the estimate once we have it. This should easily be derived from the goals.
2. We must find some equation that connects the desired precision of the estimate with the sample size. This is a probability statement. A couple are given below; see your statistician if these are not appropriate for your situation.
3. This equation may contain unknown properties of the population such as the mean or variance. This is where prior information can help.
4. If you are stratifying the population in order to reduce variation, sample size determination must be performed for each stratum.
5. The final sample size should be scrutinized for practicality. If it is unacceptable, the only way to reduce it is to accept less precision in the sample estimate.

Sampling proportions: When we are sampling proportions we start with a probability statement about the desired precision. This is given by:

P( |p-hat - P| >= delta ) = alpha

where

p-hat is the estimated proportion
P is the unknown population parameter
delta is the specified precision of the estimate
alpha is the probability value (usually low)

This equation simply shows that we want the probability that our estimate misses the true proportion by more than delta to be alpha. Of course we like to set alpha low, usually .1 or less. Using some assumptions about the proportion being approximately normally distributed we can obtain an estimate of the required sample size as:

n = z^2 * p * q / delta^2

where z is the ordinate on the Normal curve corresponding to alpha, p is a prior estimate of the proportion, and q = 1 - p.


Example Let's say we have a new process we want to try. We plan to run the new process and sample the output for yield (good/bad). Our current process has been yielding 65% (p = .65, q = .35). We decide that we want the estimate of the new process yield to be accurate to within delta = .10 at 95% confidence (alpha = .05, z approximately 2). Using the formula above we get a sample size estimate of n = 91. Thus, if we draw 91 random parts from the output of the new process and estimate the yield, then we are 95% sure the yield estimate is within .10 of the true process yield.
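
A minimal sketch of this calculation in Python (assuming SciPy for the normal quantile). The exact quantile z = 1.96 gives n of about 87; the rounder z = 2 used in the text gives n = 91:

    from scipy.stats import norm

    def n_for_proportion(p, delta, alpha=0.05):
        """Sample size n = z^2 * p * (1 - p) / delta^2."""
        z = norm.ppf(1 - alpha / 2)
        return z ** 2 * p * (1 - p) / delta ** 2

    print(n_for_proportion(0.65, 0.10))       # about 87.4 with z = 1.96
    print(2 ** 2 * 0.65 * 0.35 / 0.10 ** 2)   # 91 with the text's z = 2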

Estimating location: relative error: If we are sampling continuous normally distributed variables, quite often we are concerned about the relative error of our estimates rather than the absolute error. The probability statement connecting the desired precision to the sample size is given by:

P( |ybar - mu| / mu >= eta ) = alpha

where mu is the (unknown) population mean and ybar is the sample mean.

Again, using the normality assumptions we obtain the estimated sample size to be:

n = z^2 * sigma^2 / (eta * mu)^2

with sigma^2 denoting the population variance.

Estimating location: absolute error: If instead of relative error, we wish to use absolute error, the equation for sample size looks a lot like the one for the case of proportions:

n = z^2 * sigma^2 / delta^2

where sigma is the population standard deviation (but in practice is usually replaced by an engineering guesstimate).

Example Suppose we want to sample a stable process that deposits a 500 Angstrom film on a semiconductor wafer in order to determine the process mean so that we can set up a control chart on the process. We want to estimate the mean within 10 Angstroms (delta = 10) of the true mean with 95% confidence (alpha = .05, z approximately 2). Our initial guess regarding the variation in the process is that one standard deviation is about 20 Angstroms. This gives a sample size estimate of n = 16. Thus, if we take at least 16 samples from this process and estimate the mean film thickness, we can be 95% sure that the estimate is within 10 Angstroms of the true mean value.
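
A minimal sketch of the absolute-error case in the same style (again, the exact z = 1.96 gives a slightly smaller n than the text's rounded z = 2):

    from scipy.stats import norm

    def n_for_mean(sigma, delta, alpha=0.05):
        """Sample size n = (z * sigma / delta)^2 for absolute error delta."""
        z = norm.ppf(1 - alpha / 2)
        return (z * sigma / delta) ** 2

    print(n_for_mean(sigma=20, delta=10))   # about 15.4; the text's z = 2 gives n = 16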



3.3.3.4. Data Storage and Retrieval

Data control depends on facility size: If you are in a small manufacturing facility or a lab, you can simply design a sampling plan, run the material, take the measurements, fill in the run sheet and go back to your computer to analyze the results. There really is not much to be concerned with regarding data storage and retrieval.

In most larger facilities, however, the people handling the material usually have nothing to do with the design. Quite often the measurements are taken automatically and may not even be made in the same country where the material was produced. Your data go through a long chain of automatic acquisition, storage, reformatting, and retrieval before you are ever able to see it. All of these steps are fraught with peril and should be examined closely to ensure that valuable data are not lost or accidentally altered.

Know the process involved: In the planning phase of the PPC, be sure to understand the entire data collection process. Things to watch out for include:

automatic measurement machines rejecting outliers
only summary statistics (mean and standard deviation) being saved
values for explanatory variables (location, operator, etc.) not being saved
how missing values are handled

Consult with support staff early on: It is important to consult with someone from the organization responsible for maintaining the data system early in the planning phase of the PPC. It can also be worthwhile to perform some "dry runs" of the data collection to ensure you will be able to actually acquire the data in the format as defined in the plan.



3.3.3.5. Assign Roles and Responsibilities

PPC is a team effort, get everyone involved early: In today's manufacturing environment, it is unusual when an investigative study is conducted by a single individual. Most PPC studies will be a team effort. It is important that all individuals who will be involved in the study become a part of the team from the beginning. Many of the various collateral activities will need approvals and sign-offs. Be sure to account for that cycle time in your plan.

Table showing roles and potential responsibilities: A partial list of these individuals along with their roles and potential responsibilities is given in the table below. There may be multiple occurrences of each of these individuals across shifts or process steps, so be sure to include everyone.

Tool Owner       Controls Tool Operations:
                 Schedules tool time
                 Ensures tool state
                 Advises on experimental design

Process Owner    Controls Process Recipe:
                 Advises on experimental design
                 Controls recipe settings

Tool Operator    Executes Experimental Plan:
                 Executes experimental runs
                 May take measurements

Metrology        Owns Measurement Tools:
                 Maintains metrology equipment
                 Conducts gauge studies
                 May take measurements

CIM              Owns Enterprise Information System:
                 Maintains data collection system
                 Maintains equipment interfaces and data formatters
                 Maintains databases and information access


Statistician     Consultant:
                 Consults on experimental design
                 Consults on data analysis

Quality Control  Controls Material:
                 Ensures quality of incoming material
                 Must approve shipment of outgoing material (especially for recipe changes)



3.4. Data Analysis for PPC

In this section we will learn how to analyze and interpret the data we collected in accordance with our data collection plan.

Click on desired topic to read more: This section discusses the following topics:

1. Initial Data Analysis
   1. Gather Data
   2. Quality Checking the Data
   3. Summary Analysis (Location, Spread and Shape)
2. Exploring Relationships
   1. Response Correlations
   2. Exploring Main Effects
   3. Exploring First-Order Interactions
3. Building Models
   1. Fitting Polynomial Models
   2. Fitting Physical Models
4. Analyzing Variance Structure
5. Assessing Process Stability
6. Assessing Process Capability
7. Checking Assumptions



3.4.1. First Steps

Gather all of the data into one place: After executing the data collection plan for the characterization study, the data must be gathered up for analysis. Depending on the scope of the study, the data may reside in one place or in many different places. It may be in common factory databases, flat files on individual computers, or handwritten on run sheets. Whatever the case, the first step will be to collect all of the data from the various sources and enter it into a single data file. The most convenient format for most data analyses is the variables-in-columns format. This format has the variable names in column headings and the values for the variables in the rows.

Perform a quality check on the data using graphical and numerical techniques: The next step is to perform a quality check on the data. Here we are typically looking for data entry problems, unusual data values, missing data, etc. The two most useful tools for this step are the scatter plot and the histogram. By constructing scatter plots of all of the response variables, any data entry problems will be easily identified. Histograms of response variables are also quite useful for identifying data entry problems. Histograms of explanatory variables help identify problems with the execution of the sampling plan. If the counts for each level of the explanatory variables are not the same as called for in the sampling plan, you know you may have an execution problem. Running numerical summary statistics on all of the variables (both response and explanatory) also helps to identify data problems.

Summarize data by estimating location, spread and shape: Once the data quality problems are identified and fixed, we should estimate the location, spread and shape for all of the response variables. This is easily done with a combination of histograms and numerical summary statistics.



3.4.2. Exploring Relationships

The first analysis of our data is exploration: Once we have a data file created in the desired format, checked the data integrity, and have estimated the summary statistics on the response variables, the next step is to start exploring the data and to try to understand the underlying structure. The most useful tools will be various forms of the basic scatter plot and box plot.

These techniques will allow pairwise explorations for examining relationships between any pair of response variables, any pair of explanatory and response variables, or a response variable as a function of any two explanatory variables. Beyond three dimensions we are pretty much limited by our human frailties at visualization.

Graph everything that makes sense: In this exploratory phase, the key is to graph everything that makes sense to graph. These pictures will not only reveal any additional quality problems with the data but will also reveal influential data points and will guide the subsequent modeling activities.

Graph responses, then explanatory versus response, then conditional plots: The order that generally proves most effective for data analysis is to first graph all of the responses against each other in a pairwise fashion. Then we graph responses against the explanatory variables. This will give an indication of the main factors that have an effect on response variables. Finally, we graph response variables, conditioned on the levels of explanatory factors. This is what reveals interactions between explanatory variables. We will use nested boxplots and block plots to visualize interactions.



3.4.2.1. Response Correlations

Make scatter plots of all of the response variables: In this first phase of exploring our data, we plot all of the response variables in a pairwise fashion. The individual scatter plots are displayed in a matrix form with the y-axis scaling the same for all plots in a row of the matrix.

Check the slope of the data on the scatter plots: The scatterplot matrix shows how the response variables are related to each other. If there is a linear trend with a positive slope, this indicates that the responses are positively correlated. If there is a linear trend with a negative slope, then the variables are negatively correlated. If the data appear random with no slope, the variables are probably not correlated. This will be important information for subsequent model building steps.

This scatterplot matrix shows examples of both negatively and positively correlated variables: An example of a scatterplot matrix is given below. In this semiconductor manufacturing example, three responses, yield (Bin1), N-channel Id effective (NIDEFF), and P-channel Id effective (PIDEFF), are plotted against each other in a scatterplot matrix. We can see that Bin1 is positively correlated with NIDEFF and negatively correlated with PIDEFF. Also, as expected, NIDEFF is negatively correlated with PIDEFF. This kind of information will prove to be useful when we build models for yield improvement.
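
A scatterplot matrix like this takes only a couple of lines in Python; the file name below is a hypothetical stand-in for wherever the three responses are stored:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical file holding the three responses as columns.
    df = pd.read_csv("responses.csv")
    pd.plotting.scatter_matrix(df[["Bin1", "NIDEFF", "PIDEFF"]], figsize=(6, 6))
    plt.show()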



3.4.2.2. Exploring Main Effects

The next step is to look for main effects: The next step in the exploratory analysis of our data is to see which factors have an effect on which response variables and to quantify that effect. Scatter plots and box plots will be the tools of choice here.

Watch out for varying sample sizes across levels: This step is relatively self explanatory. However there are two points of caution. First, be cognizant of not only the trends in these graphs but also the amount of data represented in those trends. This is especially true for categorical explanatory variables. There may be many more observations in some levels of the categorical variable than in others. In any event, take unequal sample sizes into account when making inferences.

Graph implicit as well as explicit explanatory variables: The second point is to be sure to graph the responses against implicit explanatory variables (such as observation order) as well as the explicit explanatory variables. There may be interesting insights in these hidden explanatory variables.

Example: wafer processing: In the example below, we have collected data on the particles added to a wafer during a particular processing step. We ran a number of cassettes through the process and sampled wafers from certain slots in the cassette. We also kept track of which load lock the wafers passed through. This was done for two different process temperatures. We measured both small particles (< 2 microns) and large particles (> 2 microns). We plot the responses (particle counts) against each of the explanatory variables.

Cassette does not appear to be an important factor for small or large particles: This first graph is a box plot of the number of small particles added for each cassette type. The "X"'s in the plot represent the maximum, median, and minimum number of particles.


The second graph is a box plot of the number of large particles added for each cassette
type.


We conclude from these two box plots that cassette does not appear to be an important
factor for small or large particles.
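
Box plots of a response against the levels of a factor are equally short to produce; a minimal sketch in Python/matplotlib, with hypothetical file and column names for the particle data:

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("particles.csv")   # hypothetical columns: cassette, small_particles, ...
    groups = [g["small_particles"].values for _, g in df.groupby("cassette")]
    plt.boxplot(groups)
    plt.xlabel("cassette type")
    plt.ylabel("small particles added")
    plt.show()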

There is a difference between slots for small particles; one slot is different for large particles: We next generate box plots of small and large particles for the slot variable. First, the box plot for small particles.

Next, the box plot for large particles.


We conclude that there is a difference between slots for small particles. We also
conclude that one slot appears to be different for large particles.

Load lock may have a slight effect for small and large particles: We next generate box plots of small and large particles for the load lock variable. First, the box plot for small particles.


Next, the box plot for large particles.

We conclude that there may be a slight effect for load lock for small and large particles.

For small particles, temperature has a strong effect on both location and spread; for large particles, there may be a slight temperature effect but this may just be due to the outliers: We next generate box plots of small and large particles for the temperature variable. First, the box plot for small particles.

Next, the box plot for large particles.


We conclude that temperature has a strong effect on both location and spread for small
particles. We conclude that there might be a small temperature effect for large
particles, but this may just be due to outliers.



3.4.2.3. Exploring First Order Interactions

It is important to identify interactions: The final step (and perhaps the most important one) in the exploration phase is to find any first order interactions. When the difference in the response between the levels of one factor is not the same for all of the levels of another factor we say we have an interaction between those two factors. When we are trying to optimize responses based on factor settings, interactions provide for compromise.

The eyes can be deceiving - be careful: Interactions can be seen visually by using nested box plots. However, caution should be exercised when identifying interactions through graphical means alone. Any graphically identified interactions should be verified by numerical methods as well.

Previous example continued: To continue the previous example, given below are nested box plots of the small and large particles. The load lock is nested within the two temperature values. There is some evidence of possible interaction between these two factors. The effect of load lock is stronger at the lower temperature than at the higher one. This effect is stronger for the smaller particles than for the larger ones. As this example illustrates, when you have significant interactions the main effects must be interpreted conditionally. That is, the main effects do not tell the whole story by themselves.

For small particles, the load lock effect is not as strong for high temperature as it is for low temperature: The following is the box plot of small particles for load lock nested within temperature.


We conclude from this plot that for small particles, the load lock effect is not as strong
for high temperature as it is for low temperature.

The same may be true for large particles but not as strongly: The following is the box plot of large particles for load lock nested within temperature.


We conclude from this plot that for large particles, the load lock effect may not be as
strong for high temperature as it is for low temperature. However, this effect is not as
strong as it is for small particles.
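
Nested box plots of this kind can be drawn by grouping on one factor and splitting on the other; a minimal sketch using the seaborn package, again with hypothetical file and column names:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("particles.csv")   # hypothetical columns: temperature, load_lock, small_particles
    sns.boxplot(data=df, x="temperature", y="small_particles", hue="load_lock")
    plt.show()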



3.4.3. Building Models

Black box models; in our data collection plan we drew process model pictures: When we develop a data collection plan we build black box models of the process we are studying, like the one below.

Numerical models are explicit representations of our process model pictures: In the Exploring Relationships section, we looked at how to identify the input/output relationships through graphical methods. However, if we want to quantify the relationships and test them for statistical significance, we must resort to building mathematical models.

Polynomial models are generic descriptors of our output surface: There are two cases that we will cover for building mathematical models. If our goal is to develop an empirical prediction equation or to identify statistically significant explanatory variables and quantify their influence on output responses, we typically build polynomial models. As the name implies, these are polynomial functions (typically linear or quadratic functions) that describe the relationships between the explanatory variables and the response variable.


Physical models describe the underlying physics of our processes: On the other hand, if our goal is to fit an existing theoretical equation, then we want to build physical models. Again, as the name implies, this pertains to the case when we already have equations representing the physics involved in the process and we want to estimate specific parameter values.



3.4.3.1. Fitting Polynomial Models

Polynomial models are a great tool for determining which input factors drive responses and in what direction: We use polynomial models to estimate and predict the shape of response values over a range of input parameter values. These are also the most common models used for analysis of designed experiments. A quadratic (second-order) polynomial model for two explanatory variables has the form of the equation below:

y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2 + e

The single x-terms are called the main effects. The squared terms are called the quadratic effects and are used to model curvature in the response surface. The cross-product terms are used to model interactions between the explanatory variables.

We generally don't need more than second-order equations: In most engineering and manufacturing applications we are concerned with at most second-order polynomial models. Polynomial equations obviously could become much more complicated as we increase the number of explanatory variables and hence the number of cross-product terms. Fortunately, we rarely see significant interaction terms above the two-factor level. This helps to keep the equations at a manageable level.

Use multiple regression to fit polynomial models: When the number of factors is small (less than 5), the complete polynomial equation can be fitted using the technique known as multiple regression. When the number of factors is large, we should use a technique known as stepwise regression. Most statistical analysis programs have a stepwise regression capability. We just enter all of the terms of the polynomial models and let the software choose which terms best describe the data. For a more thorough discussion of this topic and some examples, refer to the process improvement chapter.
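As a quick illustration, here is a minimal R sketch of this workflow; the data and variable names (x1, x2, y) are made up for the example:

    # Hypothetical data: two explanatory variables and one response
    set.seed(1)
    df <- data.frame(x1 = runif(30), x2 = runif(30))
    df$y <- 5 + 2*df$x1 - 3*df$x2 + rnorm(30, sd = 0.5)

    # Full quadratic (second-order) polynomial model
    full <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = df)

    # Stepwise regression: let the software keep only the useful terms
    reduced <- step(full, direction = "both", trace = FALSE)
    summary(reduced)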

3.4.3.2. Fitting Physical Models

Sometimes we want to use a physical model: Rather than approximating response behavior with polynomial models, sometimes we know and can model the physics behind the underlying process. In these cases we would want to fit physical models to our data. This kind of modeling allows for better prediction and is less subject to variation than polynomial models (as long as the underlying process doesn't change).

We will use a CMP process to illustrate: We will illustrate this concept with an example. We have collected data on a chemical/mechanical planarization process (CMP) at a particular semiconductor processing step. In this process, wafers are polished using a combination of chemicals in a polishing slurry using polishing pads. We polished a number of wafers for differing periods of time in order to calculate material removal rates.

CMP removal rate can be modeled with a non-linear equation: From first principles we know that removal rate changes with time. Early on, removal rate is high and as the wafer becomes more planar the removal rate declines. This is easily modeled with an exponential function of the form:

    removal rate = p1 + p2 * exp(p3 * time)

where p1, p2, and p3 are the parameters we want to estimate.

The equation was fit to the data using a non-linear regression routine. A plot of the original data and the fitted line are given in the image below. The fit is quite good. This fitted equation was subsequently used in process optimization work.
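A minimal sketch of such a fit in R with nls, using simulated stand-in data (the real CMP measurements are not reproduced here, and the parameter values below are hypothetical):

    # Simulated stand-in for the CMP data: rate declines exponentially with time
    set.seed(2)
    cmp <- data.frame(time = seq(0.5, 5, by = 0.25))
    cmp$rate <- 80 + 120 * exp(-0.9 * cmp$time) + rnorm(nrow(cmp), sd = 3)

    # Fit removal rate = p1 + p2 * exp(p3 * time) by non-linear least squares
    fit <- nls(rate ~ p1 + p2 * exp(p3 * time), data = cmp,
               start = list(p1 = 50, p2 = 100, p3 = -1))
    summary(fit)

    # Plot the data with the fitted curve
    plot(cmp$time, cmp$rate, xlab = "time", ylab = "removal rate")
    lines(cmp$time, fitted(fit))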

3.4.4. Analyzing Variance Structure

Studying variation is important in PPC: One of the most common activities in process characterization work is to study the variation associated with the process and to try to determine the important sources of that variation. This is called analysis of variance. Refer to the section of this chapter on ANOVA models for a discussion of the theory behind this kind of analysis.

The key is to know the structure: The key to performing an analysis of variance is identifying the structure represented by the data. In the ANOVA models section we discussed one-way layouts and two-way layouts where the factors are either crossed or nested. Review these sections if you want to learn more about ANOVA structural layouts.

To perform the analysis, we just identify the structure, enter the data for each of the factors and levels into a statistical analysis program and then interpret the ANOVA table and other output. This is all illustrated in the example below.

Example: furnace oxide thickness with a 1-way layout: The example is a furnace operation in semiconductor manufacture where we are growing an oxide layer on a wafer. Each lot of wafers is placed on quartz containers (boats) and then placed in a long tube-furnace. They are then raised to a certain temperature and held for a period of time in a gas flow. We want to understand the important factors in this operation. The furnace is broken down into four sections (zones) and two wafers from each lot in each zone are measured for the thickness of the oxide layer.

Look at effect of zone location on oxide thickness: The first thing to look at is the effect of zone location on the oxide thickness. This is a classic one-way layout. The factor is furnace zone and we have four levels. A plot of the data and an ANOVA table are given below. In the plot, the zone effect is masked by the lot-to-lot variation.

ANOVA table:

Analysis of Variance
Source    DF    SS          Mean Square   F Ratio    Prob > F
Zone        3   912.6905    304.23        0.467612   0.70527
Within    164   106699.1    650.604

Let's account for lot with a nested layout: From the graph there does not appear to be much of a zone effect; in fact, the ANOVA table indicates that it is not significant. The problem is that variation due to lots is so large that it is masking the zone effect. We can fix this by adding a factor for lot. By treating this as a nested two-way layout, we obtain the ANOVA table below.

Now both lot and zone are revealed as important:

Analysis of Variance
Source      DF   SS          Mean Square   F Ratio   Prob > F
Lot         20   61442.29    3072.11       5.37404   1.39e-7
Zone[lot]   63   36014.5     571.659       4.72864   3.9e-11
Within      84   10155       120.893

Conclusions: Since the "Prob > F" is less than 0.05 for both lot and zone, we know that these factors are statistically significant at the 0.05 significance level.
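A sketch of this nested analysis in R, using simulated data in place of the real measurements (the effect sizes below are hypothetical); zone nested within lot is expressed in aov as lot + lot:zone:

    # Simulated stand-in: 21 lots, 4 zones per lot, 2 wafers per zone
    set.seed(3)
    oxide <- expand.grid(lot = factor(1:21), zone = factor(1:4), wafer = 1:2)
    lot.eff  <- rnorm(21, sd = 18)               # lot-to-lot variation
    zone.eff <- rnorm(21 * 4, sd = 15)           # zone-within-lot variation
    oxide$thickness <- 560 +
      lot.eff[as.integer(oxide$lot)] +
      zone.eff[as.integer(interaction(oxide$lot, oxide$zone))] +
      rnorm(nrow(oxide), sd = 11)                # within (measurement) error

    # Nested two-way layout: zone nested within lot
    fit <- aov(thickness ~ lot + lot:zone, data = oxide)
    summary(fit)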

3.4.5. Assessing Process Stability

A process is stable if it has a constant mean and a constant variance over time: A manufacturing process cannot be released to production until it has been proven to be stable. Also, we cannot begin to talk about process capability until we have demonstrated stability in our process. A process is said to be stable when all of the response parameters that we use to measure the process have both constant means and constant variances over time, and also have a constant distribution. This is equivalent to our earlier definition of controlled variation.

The graphical tool we use to assess stability is the scatter plot or the control chart: The graphical tool we use to assess process stability is the scatter plot. We collect a sufficient number of independent samples (greater than 100) from our process over a sufficiently long period of time (this can be specified in days, hours of processing time or number of parts processed) and plot them on a scatter plot with sample order on the x-axis and the sample value on the y-axis. The plot should look like constant random variation about a constant mean. Sometimes it is helpful to calculate control limits and plot them on the scatter plot along with the data. The two plots in the controlled variation example are good illustrations of stable and unstable processes.

Numerically, we assess its stationarity using the autocorrelation function: Numerically, we evaluate process stability through a time series analysis concept known as stationarity. This is just another way of saying that the process has a constant mean and a constant variance. The numerical technique used to assess stationarity is the autocovariance function.

Graphical methods usually good enough: Typically, graphical methods are good enough for evaluating process stability. The numerical methods are generally only used for modeling purposes.
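Both checks are easy to sketch in R; here with simulated stable-process data (the mean and standard deviation are hypothetical):

    # Simulated stable process: constant mean, constant variance
    set.seed(4)
    x <- rnorm(120, mean = 560, sd = 25)

    # Run chart with simple 3-sigma control limits
    plot(x, xlab = "sample order", ylab = "value")
    abline(h = mean(x) + c(-3, 0, 3) * sd(x), lty = c(2, 1, 2))

    # Autocorrelation function: a stationary process shows no
    # significant correlation at non-zero lags
    acf(x)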

3.4.6. Assessing Process Capability

Capability compares a process against its specification: Process capability analysis entails comparing the performance of a process against its specifications. We say that a process is capable if virtually all of the possible variable values fall within the specification limits.

Use a capability chart: Graphically, we assess process capability by plotting the process specification limits on a histogram of the observations. If the histogram falls within the specification limits, then the process is capable. This is illustrated in the graph below. Note how the process is shifted below target and the process variation is too large. This is an example of an incapable process.

Numerically, we use the Cp index: Numerically, we measure capability with a capability index. The general equation for the capability index, Cp, is:

    Cp = (USL - LSL) / (6s)

where USL and LSL are the upper and lower specification limits and s is the process standard deviation.

Interpretation of the Cp index: This equation just says that the measure of our process capability is how much of our observed process variation is covered by the process specifications. In this case the process variation is measured by 6 standard deviations (+/- 3 on each side of the mean). Clearly, if Cp > 1.0, then the process specification covers almost all of our process observations.

Cp does not account for a process that is off center: The only problem with the Cp index is that it does not account for a process that is off-center. We can modify this equation slightly to account for off-center processes to obtain the Cpk index as follows:

    Cpk = min(USL - xbar, xbar - LSL) / (3s)

where xbar is the process mean.

Cpk accounts for a process being off center: This equation just says to take the minimum distance between our specification limits and the process mean and divide it by 3 standard deviations to arrive at the measure of process capability. This is all covered in more detail in the process capability section of the process monitoring chapter. For the example above, note how the Cpk value is less than the Cp value. This is because the process distribution is not centered between the specification limits.

3.4.7. Checking Assumptions

Check the normality of the data: Many of the techniques discussed in this chapter, such as hypothesis tests, control charts and capability indices, assume that the underlying structure of the data can be adequately modeled by a normal distribution. Many times we encounter data where this is not the case.

Some causes of non-normality: There are several things that could cause the data to appear non-normal, such as:

- The data come from two or more different sources. This type of data will often have a multi-modal distribution. This can be solved by identifying the reason for the multiple sets of data and analyzing the data separately.
- The data come from an unstable process. This type of data is nearly impossible to analyze because the results of the analysis will have no credibility due to the changing nature of the process.
- The data were generated by a stable, yet fundamentally non-normal mechanism. For example, particle counts are non-normal by the very nature of the particle generation process. Data of this type can be handled using transformations.

We can sometimes transform the data to make it look normal: For the last case, we could try transforming the data using what is known as a power transformation. The power transformation is given by the equation:

    Y* = Y^lambda

where Y represents the data and lambda is the transformation value. Lambda is typically any value between -2 and 2. Some of the more common values for lambda are 0, 1/2, and -1, which give the following transformations:

    lambda = 1/2:  sqrt(Y)
    lambda = 0:    ln(Y)   (the natural log is the limiting case at lambda = 0)
    lambda = -1:   1/Y

General algorithm for trying to make non-normal data approximately normal: The general algorithm for trying to make non-normal data appear to be approximately normal is to:

1. Determine if the data are non-normal (use a normal probability plot and histogram).
2. Find a transformation that makes the data look approximately normal, if possible. Some data sets may include zeros (i.e., particle data). If the data set does include zeros, you must first add a constant value to the data and then transform the results.


Example: particle count data: As an example, let's look at some particle count data from a semiconductor processing step. Count data are inherently non-normal. Below are histograms and normal probability plots for the original data and the ln, sqrt and inverse of the data. You can see that the log transform does the best job of making the data appear as if it is normal. All analyses can be performed on the log-transformed data and the assumptions will be approximately satisfied.

(Plots: the original data are non-normal, while the log transform looks fairly normal; neither the square root nor the inverse transformation looks normal.)
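A sketch of this comparison in R, using simulated Poisson counts in place of the real particle data:

    # Simulated stand-in for the particle counts (shifted to avoid zeros)
    set.seed(6)
    counts <- rpois(200, lambda = 4) + 1

    trans <- list(original = identity, ln = log,
                  sqrt = sqrt, inverse = function(y) 1 / y)
    par(mfrow = c(4, 2))
    for (nm in names(trans)) {
      z <- trans[[nm]](counts)
      hist(z, main = paste(nm, "- histogram"), xlab = "")
      qqnorm(z, main = paste(nm, "- normal probability plot"))
      qqline(z)
    }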

3.5. Case Studies

Summary: This section presents several case studies that demonstrate the application of production process characterizations to specific problems.

Table of Contents: The following case studies are available.

1. Furnace Case Study
2. Machine Case Study

3.5.1. Furnace Case Study

Introduction: This case study analyzes a furnace oxide growth process.

Table of Contents: The case study is broken down into the following steps.
1. Background and Data
2. Initial Analysis of Response Variable
3. Identify Sources of Variation
4. Analysis of Variance
5. Final Conclusions
6. Work This Example Yourself

3.5.1.1. Background and Data

Introduction: In a semiconductor manufacturing process flow, we have a step whereby we grow an oxide film on the silicon wafer using a furnace. In this step, a cassette of wafers is placed in a quartz "boat" and the boats are placed in the furnace. The furnace can hold four boats. A gas flow is created in the furnace and it is brought up to temperature and held there for a specified period of time (which corresponds to the desired oxide thickness). This study was conducted to determine if the process was stable and to characterize sources of variation so that a process control strategy could be developed.

Goal: The goal of this study is to determine if this process is capable of consistently growing oxide films with a thickness of 560 Angstroms +/- 100 Angstroms. An additional goal is to determine important sources of variation for use in the development of a process control strategy.

Software: The analyses used in this case study can be generated using both Dataplot code and R code.

Process Model: In the picture below we are modeling this process with one output (film thickness) that is influenced by four controlled factors (gas flow, pressure, temperature and time) and two uncontrolled factors (run and zone). The four controlled factors are part of our recipe and will remain constant throughout this study. We know that there is run-to-run variation that is due to many different factors (input material variation, variation in consumables, etc.). We also know that the different zones in the furnace have an effect. A zone is a region of the furnace tube that holds one boat. There are four zones in these tubes. The zones in the middle of the tube grow oxide a little bit differently from the ones on the ends. In fact, there are temperature offsets in the recipe to help minimize this problem.


Sensitivity Model: The sensitivity model for this process is fairly straightforward and is given in the figure below. The effects of the machine are mostly related to the preventative maintenance (PM) cycle. We want to make sure the quartz tube has been cleaned recently, the mass flow controllers are in good shape and the temperature controller has been calibrated recently. The same is true of the measurement equipment where the thickness readings will be taken. We want to make sure a gauge study has been performed. For material, the incoming wafers will certainly have an effect on the outgoing thickness as well as the quality of the gases used. Finally, the recipe will have an effect including gas flow, temperature offset for the different zones, and temperature profile (how quickly we raise the temperature, how long we hold it and how quickly we cool it off).


Sampling Plan: Given our goal statement and process modeling, we can now define a sampling plan. The primary goal is to determine if the process is capable. This just means that we need to monitor the process over some period of time and compare the estimates of process location and spread to the specifications. An additional goal is to identify sources of variation to aid in setting up a process control strategy. Some obvious sources of variation are incoming wafers, run-to-run variability, variation due to operators or shift, and variation due to zones within a furnace tube. One additional constraint that we must work under is that this study should not have a significant impact on normal production operations.

Given these constraints, the following sampling plan was selected. It was decided to monitor the process for one day (three shifts). Because this process is operator independent, we will not keep shift or operator information but just record run number. For each run, we will randomly assign cassettes of wafers to a zone. We will select two wafers from each zone after processing and measure two sites on each wafer. This plan should give reasonable estimates of run-to-run variation and within zone variability as well as good overall estimates of process location and spread.

We are expecting readings around 560 Angstroms. We would not expect many readings above 700 or below 400. The measurement equipment is accurate to within 0.5 Angstroms which is well within the accuracy needed for this study.

Data: The following are the data that were collected for this study.

RUN ZONE WAFER THICKNESS
--------------------------------
1 1 1 546
1 1 2 540
1 2 1 566
1 2 2 564
1 3 1 577
1 3 2 546
1 4 1 543
1 4 2 529
2 1 1 561
2 1 2 556
2 2 1 577
2 2 2 553
2 3 1 563
2 3 2 577
2 4 1 556
2 4 2 540
3 1 1 515
3 1 2 520
3 2 1 548
3 2 2 542
3 3 1 505
3 3 2 487
3 4 1 506
3 4 2 514
4 1 1 568
4 1 2 584
4 2 1 570
4 2 2 545
4 3 1 589
4 3 2 562
4 4 1 569
4 4 2 571
5 1 1 550
5 1 2 550
5 2 1 562
5 2 2 580
5 3 1 560
5 3 2 554
5 4 1 545
5 4 2 546
6 1 1 584
6 1 2 581
6 2 1 567
6 2 2 558
6 3 1 556
6 3 2 560
6 4 1 591
6 4 2 599
7 1 1 593
7 1 2 626
7 2 1 584
7 2 2 559
7 3 1 634
7 3 2 598
7 4 1 569
7 4 2 592
8 1 1 522
8 1 2 535
8 2 1 535
8 2 2 581
8 3 1 527
8 3 2 520
8 4 1 532
8 4 2 539
9 1 1 562
9 1 2 568
9 2 1 548
9 2 2 548
9 3 1 533
9 3 2 553
9 4 1 533
9 4 2 521
10 1 1 555
10 1 2 545
10 2 1 584
10 2 2 572
10 3 1 546
10 3 2 552
10 4 1 586
10 4 2 584
11 1 1 565
11 1 2 557
11 2 1 583
11 2 2 585
11 3 1 582
11 3 2 567
11 4 1 549
11 4 2 533
12 1 1 548
12 1 2 528
12 2 1 563
12 2 2 588
12 3 1 543
12 3 2 540
12 4 1 585
12 4 2 586
13 1 1 580
13 1 2 570
13 2 1 556
13 2 2 569
13 3 1 609
13 3 2 625
13 4 1 570
13 4 2 595
14 1 1 564
14 1 2 555
14 2 1 585
14 2 2 588
14 3 1 564
14 3 2 583
14 4 1 563
14 4 2 558
15 1 1 550
15 1 2 557
15 2 1 538
15 2 2 525
15 3 1 556
15 3 2 547
15 4 1 534
15 4 2 542
16 1 1 552
16 1 2 547
16 2 1 563
16 2 2 578
16 3 1 571
16 3 2 572
16 4 1 575
16 4 2 584
17 1 1 549
17 1 2 546
17 2 1 584
17 2 2 593
17 3 1 567
17 3 2 548
17 4 1 606
17 4 2 607
18 1 1 539
18 1 2 554
18 2 1 533
18 2 2 535
18 3 1 522
18 3 2 521
18 4 1 547
18 4 2 550
19 1 1 610
19 1 2 592
19 2 1 587
19 2 2 587
19 3 1 572
19 3 2 612
19 4 1 566
19 4 2 563
20 1 1 569
20 1 2 609
20 2 1 558
20 2 2 555
20 3 1 577
20 3 2 579
20 4 1 552
20 4 2 558
21 1 1 595
21 1 2 583
21 2 1 599

21 2 2 602
21 3 1 598
21 3 2 616
21 4 1 580
21 4 2 575

3.5.1.2. Initial Analysis of Response Variable

Initial Plots of Response Variable: The initial step is to assess data quality and to look for anomalies. This is done by generating a normal probability plot, a histogram, and a boxplot. For convenience, these are generated on a single page.

Conclusions From the Plots: We can make the following conclusions based on these initial plots.

- The box plot indicates one outlier. However, this outlier is only slightly smaller than the other numbers.
- The normal probability plot and the histogram (with an overlaid normal density) indicate that this data set is reasonably approximated by a normal distribution.

Parameter Estimates: Parameter estimates for the film thickness are summarized in the following table.

Parameter Estimates
Type         Parameter            Estimate   Lower (95%)        Upper (95%)
                                             Confidence Bound   Confidence Bound
Location     Mean                 563.0357   559.1692           566.9023
Dispersion   Standard Deviation   25.3847    22.9297            28.4331

Quantiles: Quantiles for the film thickness are summarized in the following table.

Quantiles for Film Thickness


100.0% Maximum 634.00
99.5%   634.00
97.5%   615.10
90.0%   595.00
75.0% Upper Quartile 582.75
50.0% Median 562.50
25.0% Lower Quartile 546.25
10.0%   532.90
2.5%   514.23
0.5%   487.00
0.0% Minimum 487.00

Capability Analysis: From the above preliminary analysis, it looks reasonable to proceed with the capability analysis.

The lower specification limit is 460, the upper specification limit is 660, and the target specification is 560.

Percent Defective: We summarize the percent defective (i.e., the number of items outside the specification limits) in the following table.

Percentage Outside Specification Limits
Specification               Value      Percent                                               Actual   Theoretical (% Based On Normal)
Lower Specification Limit   460        Percent Below LSL = 100 * Phi((LSL - xbar)/s)         0.0000   0.0025%
Upper Specification Limit   660        Percent Above USL = 100 * (1 - Phi((USL - xbar)/s))   0.0000   0.0067%
Specification Target        560        Percent Below LSL and Above USL                       0.0000   0.0091%
Standard Deviation          25.38468

Here Phi denotes the normal cumulative distribution function, xbar the sample mean, and s the sample standard deviation.
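The theoretical percentages can be reproduced in R from the summary statistics above:

    xbar <- 563.0357; s <- 25.3847
    LSL <- 460; USL <- 660

    pct.below <- 100 * pnorm((LSL - xbar) / s)
    pct.above <- 100 * (1 - pnorm((USL - xbar) / s))
    round(c(below = pct.below, above = pct.above,
            combined = pct.below + pct.above), 4)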

Capability Index Statistics: We summarize various capability index statistics in the following table.

Capability Index Statistics
Capability Statistic   Index   Lower CI   Upper CI
CP                     1.313   1.172      1.454
CPK                    1.273   1.128      1.419
CPM                    1.304   1.165      1.442
CPL                    1.353   1.218      1.488
CPU                    1.273   1.142      1.404

Conclusions: The above capability analysis indicates that the process is capable and we can proceed with the analysis.

3.5.1.3. Identify Sources of Variation

The next part of the analysis is to break down the sources of variation.

Box Plot by Run: The following is a box plot of the thickness by run number.

Conclusions From Box Plot: We can make the following conclusions from this box plot.

1. There is significant run-to-run variation.
2. Although the means of the runs are different, there is no discernable trend due to run.
3. In addition to the run-to-run variation, there is significant within-run variation as well. This suggests that a box plot by furnace location may be useful as well.

Box Plot by Furnace Location: The following is a box plot of the thickness by furnace location.

Conclusions From Box Plot: We can make the following conclusions from this box plot.

1. There is considerable variation within a given furnace location.
2. The variation between furnace locations is small. That is, the locations and scales of each of the four furnace locations are fairly comparable (although furnace location 3 seems to have a few mild outliers).

Box Plot by Wafer: The following is a box plot of the thickness by wafer.

Conclusion From Box Plot: From this box plot, we conclude that wafer does not seem to be a significant factor.

Block Plot: In order to show the combined effects of run, furnace location, and wafer, we draw a block plot of the thickness. Note that for aesthetic reasons, we have used connecting lines rather than enclosing boxes.

Conclusions From Block Plot: We can draw the following conclusions from this block plot.

1. There is significant variation both between runs and between furnace locations. The between-run variation appears to be greater.
2. Run 3 seems to be an outlier.

3.5.1.4. Analysis of Variance

Analysis of Variance: The next step is to confirm our interpretation of the plots in the previous section by running a nested analysis of variance.

Analysis of Variance
Source                  Degrees of Freedom   Sum of Squares   Mean Square   F Ratio   Prob > F
Run                     20                   61,442.29        3,072.11      5.37404   0.0000001
Furnace Location[Run]   63                   36,014.5         571.659       4.72864   3.85e-11
Within                  84                   10,155           120.893
Total                   167                  107,611.8        644.382

Components of Variance: From the above analysis of variance table, we can compute the components of variance. Recall that for this data set we have 2 wafers measured at 4 furnace locations for 21 runs. This leads to the following set of equations:

    3072.11 = (4*2)*Var(Run) + 2*Var(Furnace Location) + Var(Within)
    571.659 = 2*Var(Furnace Location) + Var(Within)
    120.893 = Var(Within)

Solving these equations yields the following components of variance.

Components of Variance
Component               Variance Component   Percent of Total   Sqrt(Variance Component)
Run                     312.55694            47.44              17.679
Furnace Location[Run]   225.38294            34.21              15.013
Within                  120.89286            18.35              10.995
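The back-solving is simple enough to sketch in R, plugging in the mean squares from the ANOVA table above:

    ms.run <- 3072.11; ms.zone <- 571.659; ms.within <- 120.893

    var.within <- ms.within
    var.zone   <- (ms.zone - ms.within) / 2      # 2 wafers per zone
    var.run    <- (ms.run - ms.zone) / (4 * 2)   # 4 zones x 2 wafers per run

    comp <- c(Run = var.run, "Zone[Run]" = var.zone, Within = var.within)
    cbind(Variance = comp, Percent = 100 * comp / sum(comp), Sqrt = sqrt(comp))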

3.5.1.5. Final Conclusions

Final Conclusions: This simple study of a furnace oxide growth process indicated that the process is capable and showed that both run-to-run and zone-within-run are significant sources of variation. We should take this into account when designing the control strategy for this process. The results also pointed to where we should look when we perform process improvement activities.

3.5.1.6. Work This Example Yourself

View Dataplot Macro for this Case Study: This page allows you to repeat the analysis outlined in the case study description on the previous page using Dataplot, if you have downloaded and installed it. Output from each analysis step below will be displayed in one or more of the Dataplot windows. The four main windows are the Output window, the Graphics window, the Command History window and the Data Sheet window. Across the top of the main windows there are menus for executing Dataplot commands. Across the bottom is a command entry window where commands can be typed in.

Data Analysis Steps: Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step.

Results and Conclusions: The links in the results column will connect you with more detailed information about each analysis step from the case study description.

1. Get set up and started.
   1. Read in the data. (Result: You have read 4 columns of numbers into Dataplot, variables run, zone, wafer, and filmthic.)

2. Analyze the response variable.
   1. Normal probability plot, box plot, and histogram of film thickness. (Result: Initial plots indicate that the film thickness is reasonably approximated by a normal distribution with no significant outliers.)
   2. Compute summary statistics and quantiles of film thickness. (Result: Mean is 563.04 and standard deviation is 25.38. Data range from 487 to 634.)
   3. Perform a capability analysis. (Result: Capability analysis indicates that the process is capable.)

3. Identify Sources of Variation.
   1. Generate a box plot by run. (Result: The box plot shows significant variation both between runs and within runs.)
   2. Generate a box plot by furnace location. (Result: The box plot shows significant variation within furnace location but not between furnace locations.)
   3. Generate a box plot by wafer. (Result: The box plot shows no significant effect for wafer.)
   4. Generate a block plot. (Result: The block plot shows both run and furnace location are significant.)

4. Perform an Analysis of Variance.
   1. Perform the analysis of variance and compute the components of variance. (Result: The results of the ANOVA are summarized in an ANOVA table and a components of variance table.)

3.5.2. Machine Screw Case Study

Introduction: This case study analyzes three automatic screw machines with the intent of replacing one of them.

Table of Contents: The case study is broken down into the following steps.
1. Background and Data
2. Box Plots by Factor
3. Analysis of Variance
4. Throughput
5. Final Conclusions
6. Work This Example Yourself

3.5.2.1. Background and Data

Introduction: A machine shop has three automatic screw machines that produce various parts. The shop has enough capital to replace one of the machines. The quality control department has been asked to conduct a study and make a recommendation as to which machine should be replaced. It was decided to monitor one of the most commonly produced parts (a 1/8th inch diameter pin) on each of the machines and see which machine is the least stable.

Goal: The goal of this study is to determine which machine is least stable in manufacturing a steel pin with a diameter of .125 +/- .003 inches. Stability will be measured in terms of a constant variance about a constant mean. If all machines are stable, the decision will be based on process variability and throughput. Namely, the machine with the highest variability and lowest throughput will be selected for replacement.

Software: The analyses used in this case study can be generated using both Dataplot code and R code.

Process Model: The process model for this operation is trivial and need not be addressed.

Sensitivity Model: The sensitivity model, however, is important and is given in the figure below. The material is not very important. All machines will receive barstock from the same source and the coolant will be the same. The method is important. Each machine is slightly different and the operator must make adjustments to the speed (how fast the part rotates), feed (how quickly the cut is made) and stops (where cuts are finished) for each machine. The same operator will be running all three machines simultaneously. Measurement is not too important. An experienced QC engineer will be collecting the samples and making the measurements. Finally, the machine condition is really what this study is all about. The wear on the ways and the lead screws will largely determine the stability of the machining process. Also, tool wear is important. The same type of tool inserts will be used on all three machines. The tool insert wear will be monitored by the operator and they will be changed as needed.

Sampling Plan: Given our goal statement and process modeling, we can now define a sampling plan. The primary goal is to determine if the process is stable and to compare the variances of the three machines. We also need to monitor throughput so that we can compare the productivity of the three machines.

There is an upcoming three-day run of the particular part of interest, so this study will be conducted on that run. There is a suspected time-of-day effect that we must account for. It is sometimes the case that the machines do not perform as well in the morning, when they are first started up, as they do later in the day. To account for this we will sample parts in the morning and in the afternoon. So as not to impact other QC operations too severely, it was decided to sample 10 parts, twice a day, for three days from each of the three machines. Daily throughput will be recorded as well.

We are expecting readings around .125 +/- .003 inches. The parts will be measured using a standard micrometer with readings recorded to 0.0001 of an inch. Throughput will be measured by reading the part counters on the machines at the end of each day.

Data: The following are the data that were collected for this study.

MACHINE DAY TIME SAMPLE DIAMETER
(1-3)  (1-3) (1=AM, 2=PM) (1-10) (inches)
------------------------------------------
1 1 1  1 0.1247
1 1 1  2 0.1264
1 1 1  3 0.1252
1 1 1  4 0.1253
1 1 1  5 0.1263
1 1 1  6 0.1251
1 1 1  7 0.1254
1 1 1  8 0.1239
1 1 1  9 0.1235
1 1 1 10 0.1257
1 1 2  1 0.1271
1 1 2  2 0.1253
1 1 2  3 0.1265
1 1 2  4 0.1254
1 1 2  5 0.1243
1 1 2  6 0.124
1 1 2  7 0.1246
1 1 2  8 0.1244
1 1 2  9 0.1271
1 1 2 10 0.1241
1 2 1  1 0.1251
1 2 1  2 0.1238
1 2 1  3 0.1255
1 2 1  4 0.1234
1 2 1  5 0.1235
1 2 1  6 0.1266
1 2 1  7 0.125
1 2 1  8 0.1246
1 2 1  9 0.1243
1 2 1 10 0.1248
1 2 2  1 0.1248
1 2 2  2 0.1235
1 2 2  3 0.1243
1 2 2  4 0.1265
1 2 2  5 0.127
1 2 2  6 0.1229
1 2 2  7 0.125
1 2 2  8 0.1248
1 2 2  9 0.1252
1 2 2 10 0.1243
1 3 1  1 0.1255
1 3 1  2 0.1237
1 3 1  3 0.1235
1 3 1  4 0.1264
1 3 1  5 0.1239
1 3 1  6 0.1266
1 3 1  7 0.1242
1 3 1  8 0.1231
1 3 1  9 0.1232
1 3 1 10 0.1244
1 3 2  1 0.1233
1 3 2  2 0.1237
1 3 2  3 0.1244
1 3 2  4 0.1254
1 3 2  5 0.1247
1 3 2  6 0.1254
1 3 2  7 0.1258
1 3 2  8 0.126
1 3 2  9 0.1235
1 3 2 10 0.1273
2 1 1  1 0.1239
2 1 1  2 0.1239
2 1 1  3 0.1239
2 1 1  4 0.1231
2 1 1  5 0.1221
2 1 1  6 0.1216
2 1 1  7 0.1233
2 1 1  8 0.1228
2 1 1  9 0.1227
2 1 1 10 0.1229
2 1 2  1 0.122
2 1 2  2 0.1239
2 1 2  3 0.1237
2 1 2  4 0.1216
2 1 2  5 0.1235
2 1 2  6 0.124
2 1 2  7 0.1224
2 1 2  8 0.1236
2 1 2  9 0.1236
2 1 2 10 0.1217
2 2 1  1 0.1247
2 2 1  2 0.122
2 2 1  3 0.1218
2 2 1  4 0.1237
2 2 1  5 0.1234
2 2 1  6 0.1229
2 2 1  7 0.1235
2 2 1  8 0.1237
2 2 1  9 0.1224
2 2 1 10 0.1224
2 2 2  1 0.1239
2 2 2  2 0.1226
2 2 2  3 0.1224
2 2 2  4 0.1239
2 2 2  5 0.1237
2 2 2  6 0.1227
2 2 2  7 0.1218
2 2 2  8 0.122
2 2 2  9 0.1231
2 2 2 10 0.1244
2 3 1  1 0.1219
2 3 1  2 0.1243
2 3 1  3 0.1231
2 3 1  4 0.1223
2 3 1  5 0.1218
2 3 1  6 0.1218
2 3 1  7 0.1225
2 3 1  8 0.1238
2 3 1  9 0.1244
2 3 1 10 0.1236
2 3 2  1 0.1231
2 3 2  2 0.1223
2 3 2  3 0.1241
2 3 2  4 0.1215
2 3 2  5 0.1221
2 3 2  6 0.1236
2 3 2  7 0.1229
2 3 2  8 0.1205
2 3 2  9 0.1241
2 3 2 10 0.1232
3 1 1  1 0.1255
3 1 1  2 0.1215
3 1 1  3 0.1219
3 1 1  4 0.1253
3 1 1  5 0.1232
3 1 1  6 0.1266
3 1 1  7 0.1271
3 1 1  8 0.1209
3 1 1  9 0.1212
3 1 1 10 0.1249
3 1 2  1 0.1228
3 1 2  2 0.126
3 1 2  3 0.1242
3 1 2  4 0.1236
3 1 2  5 0.1248
3 1 2  6 0.1243
3 1 2  7 0.126
3 1 2  8 0.1231
3 1 2  9 0.1234
3 1 2 10 0.1246
3 2 1  1 0.1207
3 2 1  2 0.1279
3 2 1  3 0.1268
3 2 1  4 0.1222
3 2 1  5 0.1244
3 2 1  6 0.1225
3 2 1  7 0.1234
3 2 1  8 0.1244
3 2 1  9 0.1207
3 2 1 10 0.1264
3 2 2  1 0.1224
3 2 2  2 0.1254
3 2 2  3 0.1237
3 2 2  4 0.1254
3 2 2  5 0.1269
3 2 2  6 0.1236
3 2 2  7 0.1248
3 2 2  8 0.1253
3 2 2  9 0.1252
3 2 2 10 0.1237
3 3 1  1 0.1217
3 3 1  2 0.122
3 3 1  3 0.1227
3 3 1  4 0.1202
3 3 1  5 0.127
3 3 1  6 0.1224
3 3 1  7 0.1219
3 3 1  8 0.1266
3 3 1  9 0.1254
3 3 1 10 0.1258
3 3 2  1 0.1236
3 3 2  2 0.1247
3 3 2  3 0.124
3 3 2  4 0.1235
3 3 2  5 0.124
3 3 2  6 0.1217
3 3 2  7 0.1235
3 3 2  8 0.1242
3 3 2  9 0.1247
3 3 2 10 0.125

3.5.2.2. Box Plots by Factors

Initial Steps: The initial step is to plot box plots of the measured diameter for each of the explanatory variables.

Box Plot by Machine: The following is a box plot of the diameter by machine.

Conclusions From Box Plot: We can make the following conclusions from this box plot.

1. The location appears to be significantly different for the three machines, with machine 2 having the smallest median diameter and machine 1 having the largest median diameter.
2. Machines 1 and 2 have comparable variability while machine 3 has somewhat larger variability.

Box Plot by Day: The following is a box plot of the diameter by day.

Conclusions From Box Plot: We can draw the following conclusion from this box plot. Neither the location nor the spread seem to differ significantly by day.

Box Plot by Time of Day: The following is a box plot of the diameter by time of day.

Conclusion From Box Plot: We can draw the following conclusion from this box plot. Neither the location nor the spread seem to differ significantly by time of day.

Box Plot by Sample Number: The following is a box plot of the diameter by sample number.

Conclusion From Box Plot: We can draw the following conclusion from this box plot. Although there are some minor differences in location and spread between the samples, these differences do not show a noticeable pattern and do not seem significant.

3.5.2.3. Analysis of Variance

Analysis of Variance Using All Factors: We can confirm our interpretation of the box plots by running an analysis of variance when all four factors are included.

Source            DF    Sum of Squares   Mean Square   F Statistic   Prob > F
------------------------------------------------------------------
Machine             2   0.000111         0.000055      29.3159       1.3e-11
Day                 2   0.000004         0.000002      0.9884        0.37
Time                1   0.000002         0.000002      1.2478        0.27
Sample              9   0.000009         0.000001      0.5205        0.86
Residual          165   0.000312         0.000002
------------------------------------------------------------------
Corrected Total   179   0.000437         0.000002

Interpretation of ANOVA Output: We fit the model

    diameter = overall mean + machine effect + day effect + time effect + sample effect + error

which has an overall mean, as opposed to the model

    diameter = machine effect + day effect + time effect + sample effect + error

These models are mathematically equivalent. The effect estimates in the first model are relative to the overall mean. The effect estimates for the second model can be obtained by simply adding the overall mean to the effect estimates from the first model.

Only the machine factor is statistically significant. This confirms what the box plots in the previous section had indicated graphically.

Analysis of Variance Using Only Machine: The previous analysis of variance indicated that only the machine factor was statistically significant. The following table displays the ANOVA results using only the machine factor.

Source            DF    Sum of Squares   Mean Square   F Statistic   Prob > F
------------------------------------------------------------------
Machine             2   0.000111         0.000055      30.0094       6.0E-12
Residual          177   0.000327         0.000002
------------------------------------------------------------------
Corrected Total   179   0.000437         0.000002

Interpretation of ANOVA Output: At this stage, we are interested in the level means for the machine variable. These can be summarized in the following table.

Machine Means for One-way ANOVA
Level   Number   Mean       Standard Error   Lower 95% CI   Upper 95% CI
1       60       0.124887   0.00018          0.12454        0.12523
2       60       0.122968   0.00018          0.12262        0.12331
3       60       0.124022   0.00018          0.12368        0.12437

Model Validation: As a final step, we validate the model by generating a 4-plot of the residuals.

The 4-plot does not indicate any significant problems with the ANOVA model.
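Base R has no single 4-plot command, but the four panels are easy to sketch by hand; here with simulated stand-in data (the machine effects and sigma are hypothetical):

    # Simulated stand-in for the machine screw measurements
    set.seed(7)
    screws <- data.frame(machine = factor(rep(1:3, each = 60)))
    screws$diameter <- 0.1235 +
      c(0.0014, -0.0005, 0.0005)[as.integer(screws$machine)] +
      rnorm(180, sd = 0.0014)

    fit <- aov(diameter ~ machine, data = screws)
    r <- residuals(fit)

    # Run sequence, lag plot, histogram, normal probability plot
    par(mfrow = c(2, 2))
    plot(r, type = "l", main = "run sequence")
    plot(head(r, -1), tail(r, -1), main = "lag plot")
    hist(r, main = "histogram")
    qqnorm(r); qqline(r)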

3.5.2.4. Throughput

Summary of Throughput: The throughput is summarized in the following table (this was part of the original data collection, not the result of analysis).

Machine   Day 1   Day 2   Day 3
1         576     604     583
2         657     604     586
3         510     546     571

This table shows that machine 3 had significantly lower throughput.

Graphical Representation of Throughput: We can show the throughput graphically.

The graph clearly shows the lower throughput for machine 3.

Analysis of Variance for Throughput: We can confirm the statistical significance of the lower throughput of machine 3 by running an analysis of variance.

Source            DF   Sum of Squares   Mean Square   F Statistic   Prob > F
-------------------------------------------------------------------
Machine            2   8216.89          4108.45       4.9007        0.0547
Residual           6   5030.00          838.33
-------------------------------------------------------------------
Corrected Total    8   13246.89         1655.86

Interpretation of ANOVA Output: We summarize the machine level means for throughput in the following table.

Machine Level Means for One-way ANOVA
Level   Number   Mean      Standard Error   Lower 95% CI   Upper 95% CI
1       3        587.667   16.717           546.76         628.57
2       3        615.667   16.717           574.76         656.57
3       3        542.33    16.717           501.43         583.24

3.5.2.5. Final Conclusions

Final Conclusions: The analysis shows that machines 1 and 2 had about the same variability but significantly different locations. The throughput for machine 2 was also higher with greater variability than for machine 1. An interview with the operator revealed that he realized the second machine was not set correctly. However, he did not want to change the settings because he knew a study was being conducted and was afraid he might impact the results by making changes. Machine 3 had significantly more variation and lower throughput. The operator indicated that the machine had to be taken down several times for minor repairs. Given the preceding analysis results, the team recommended replacing machine 3.

3.5.2.6. Work This Example Yourself

View Dataplot Macro for this Case Study: This page allows you to repeat the analysis outlined in the case study description on the previous page using Dataplot, if you have downloaded and installed it. Output from each analysis step below will be displayed in one or more of the Dataplot windows. The four main windows are the Output window, the Graphics window, the Command History window and the Data Sheet window. Across the top of the main windows there are menus for executing Dataplot commands. Across the bottom is a command entry window where commands can be typed in.

Data Analysis Steps: Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step.

Results and Conclusions: The links in the results column will connect you with more detailed information about each analysis step from the case study description.

1. Get set up and started.
   1. Read in the data. (Result: You have read 5 columns of numbers into Dataplot, variables machine, day, time, sample, and diameter.)

2. Box Plots by Factor Variables
   1. Generate a box plot by machine. (Result: The box plot shows significant variation for both location and spread.)
   2. Generate a box plot by day. (Result: The box plot shows no significant location or spread effects for day.)
   3. Generate a box plot by time of day. (Result: The box plot shows no significant location or spread effects for time of day.)
   4. Generate a box plot by sample. (Result: The box plot shows no significant location or spread effects for sample.)

3. Analysis of Variance
   1. Perform an analysis of variance with all factors. (Result: The analysis of variance shows that only the machine factor is statistically significant.)
   2. Perform an analysis of variance with only the machine factor. (Result: The analysis of variance shows the overall mean and the effect estimates for the levels of the machine variable.)
   3. Perform model validation by generating a 4-plot of the residuals. (Result: The 4-plot of the residuals does not indicate any significant problems with the model.)

4. Graph of Throughput
   1. Generate a graph of the throughput. (Result: The graph shows the throughput for machine 3 is lower than the other machines.)
   2. Perform an analysis of variance of the throughput. (Result: The effect estimates from the ANOVA are given.)

3.6. References

Box, G.E.P., Hunter, W.G., and Hunter, J.S. (1978), Statistics for
Experimenters, John Wiley and Sons, New York.

Cleveland, W.S. (1993), Visualizing Data, Hobart Press, New Jersey.

Hoaglin, D.C., Mosteller, F., and Tukey, J.W. (1985), Exploring Data
Tables, Trends, and Shapes, John Wiley and Sons, New York.

Hoaglin, D.C., Mosteller, F., and Tukey, J.W. (1991), Fundamentals of Exploratory Analysis of Variance, John Wiley and Sons, New York.
