Engineering Statistics Handbook 3. Production Process Characterization
The goal of this chapter is to learn how to plan and conduct a Production
Process Characterization Study (PPC) on manufacturing processes. We will
learn how to model manufacturing processes and use these models to design
a data collection scheme and to guide data analysis activities. We will look
in detail at how to analyze the data collected in characterization studies and
how to interpret and report the results. The accompanying Case Studies
provide detailed examples of several process characterization studies.
2. Assumptions / Prerequisites [3.2.]
1. General Assumptions [3.2.1.]
2. Continuous Linear Model [3.2.2.]
3. Analysis of Variance Models (ANOVA) [3.2.3.]
1. One-Way ANOVA [3.2.3.1.]
1. One-Way Value-Splitting [3.2.3.1.1.]
2. Two-Way Crossed ANOVA [3.2.3.2.]
1. Two-way Crossed Value-Splitting Example [3.2.3.2.1.]
3. Two-Way Nested ANOVA [3.2.3.3.]
1. Two-Way Nested Value-Splitting Example [3.2.3.3.1.]
4. Discrete Models [3.2.4.]
5. Case Studies [3.5.]
1. Furnace Case Study [3.5.1.]
1. Background and Data [3.5.1.1.]
2. Initial Analysis of Response Variable [3.5.1.2.]
3. Identify Sources of Variation [3.5.1.3.]
4. Analysis of Variance [3.5.1.4.]
5. Final Conclusions [3.5.1.5.]
6. Work This Example Yourself [3.5.1.6.]
2. Machine Screw Case Study [3.5.2.]
1. Background and Data [3.5.2.1.]
2. Box Plots by Factors [3.5.2.2.]
3. Analysis of Variance [3.5.2.3.]
4. Throughput [3.5.2.4.]
5. Final Conclusions [3.5.2.5.]
6. Work This Example Yourself [3.5.2.6.]
6. References [3.6.]
Not all of the steps need to be performed
The first two steps are only needed for new processes or when the process has undergone some significant engineering change. There are, however, many times throughout the life of a process when the third step is needed. Examples might be: initial process qualification, control chart development, after minor process adjustments, after scheduled equipment maintenance, etc.
Process characterization techniques are applicable in other areas
Process characterization techniques are also applicable in other areas, for example:

   calibration
   process monitoring
   process improvement
   process/product comparison
   reliability
3.1.3. Terminology/Concepts
Location
The location is the expected value of the output being
measured. For a stable process, this is the value
around which the process has stabilized.
Spread
The spread is the expected amount of variation
associated with the output. This tells us the range of
possible values that we would expect to see.
Shape
The shape shows how the variation is distributed
about the location. This tells us if our variation is
symmetric about the mean or if it is skewed or
possibly multimodal.
The table below shows the most common numerical and graphical measures of location, spread and shape.

Parameter   Numerical                               Graphical
Location    mean, median                            scatter plot, boxplot, histogram
Spread      variance, range, inter-quartile range   boxplot, histogram
Shape       skewness, kurtosis                      boxplot, histogram, probability plot
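For instance, these measures can be computed in a few lines of R (the language used for the case-study analyses in this chapter); the data vector x below is a made-up placeholder.

    x <- c(0.125, 0.127, 0.125, 0.126, 0.128,
           0.118, 0.122, 0.120, 0.124, 0.119)   # hypothetical sample

    # Location
    mean(x)
    median(x)

    # Spread
    var(x)
    diff(range(x))   # range = max - min
    IQR(x)           # inter-quartile range

    # Shape (from standardized moments; base R has no built-in skewness/kurtosis)
    z <- (x - mean(x)) / sd(x)
    mean(z^3)        # skewness
    mean(z^4) - 3    # excess kurtosis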
Variability is present everywhere
All manufacturing and measurement processes exhibit variation. For example, when we take sample data on the output of a process, such as critical dimensions, oxide thickness, or resistivity, we observe that all the values are NOT the same. This results in a collection of observed values distributed about some location value. This is what we call spread or variability. We represent variability numerically with the variance calculation and graphically with a histogram.
How does the standard deviation describe the spread of the data?
The standard deviation (square root of the variance) gives insight into the spread of the data through the use of what is known as the Empirical Rule. This rule (shown in the graph below) is:

Approximately 60-78% of the data are within a distance of one standard deviation from the average ($\bar{x} \pm s$).

Approximately 90-98% of the data are within a distance of two standard deviations from the average ($\bar{x} \pm 2s$).

More than 99% of the data are within a distance of three standard deviations from the average ($\bar{x} \pm 3s$).
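As a quick illustration of the rule, the following R sketch counts how much of a simulated sample falls within one, two, and three standard deviations of its average (the simulated data are purely illustrative):

    x <- rnorm(1000, mean = 990, sd = 20)   # simulated process output
    xbar <- mean(x)
    s <- sd(x)
    for (k in 1:3) {
      coverage <- mean(abs(x - xbar) <= k * s)
      cat("within", k, "sd:", round(100 * coverage, 1), "%\n")
    }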
Variability accumulates from many sources
This observed variability is an accumulation of many different sources of variation that have occurred throughout the manufacturing process. One of the more important activities of process characterization is to identify and quantify these various sources of variation so that they may be minimized.
There are also different types
There are not only different sources of variation, but there are also different types of variation. Two important classifications of variation for the purposes of PPC are controlled variation and uncontrolled variation.
Two trend plots
The two figures below are two trend plots from two different oxide growth processes. Thirty wafers were sampled from each process: one per day over 30 days. Thickness at the center was measured on each wafer. The x-axis of each graph is the wafer number and the y-axis is the film thickness in angstroms.

Examples of "in control" and "out of control" processes
The first process is an example of a process that is "in control" with random fluctuation about a process location of approximately 990. The second process is an example of a process that is "out of control" with a process location trending upward after observation 20.
This process exhibits controlled variation. Note the random fluctuation about a constant mean.

This process exhibits uncontrolled variation. Note the structure in the variation in the form of a linear trend.
Black box model and fishbone diagram
As we will see in Section 3 of this chapter, one of the first steps in PPC is to model the process that is under investigation. Two very useful tools for doing this are the black-box model and the fishbone diagram.
We use the black-box model to describe our processes
We can use the simple black-box model, shown below, to describe most of the tools and processes we will encounter in PPC. The process will be stimulated by inputs. These inputs can either be controlled (such as recipe or machine settings) or uncontrolled (such as humidity, operators, power fluctuations, etc.). These inputs interact with our process and produce outputs. These outputs are usually some characteristic of our process that we can measure. The measurable inputs and outputs can be sampled in order to observe and understand how they behave and relate to each other.

Diagram of the black-box model
These inputs and outputs are also known as Factors and Responses,
respectively.
Factors
Observed inputs used to explain response behavior (also called
explanatory variables). Factors may be fixed-level controlled inputs or
sampled uncontrolled inputs.
Responses
Sampled process outputs. Responses may also be functions of sampled
outputs such as average thickness or uniformity.
Table describing the different variable types

Type          Description                                    Example
Measurement   discrete/continuous, order is important,       particle count, oxide thickness,
              infinite range                                 pressure, temperature
Ordinal       discrete, order is important, finite range     run #, wafer #, site, bin
Nominal       discrete, no order, very few possible values   good/bad, bin, high/medium/low,
                                                             shift, operator
Fishbone diagrams help to decompose complexity
We can use the fishbone diagram to further refine the modeling process. Fishbone diagrams are very useful for decomposing the complexity of our manufacturing processes. Typically, we choose a process characteristic (either Factors or Responses) and list out the general categories that may influence the characteristic (such as material, machine, method, environment, etc.), and then provide more specific detail within each category. Examples of how to do this are given in the section on Case Studies.

Sample fishbone diagram
We look for correlations and causal relationships
Besides just observing our processes for evidence of stability and capability, we quite often want to know about the relationships between the various Factors and Responses. There are generally two types of relationships that we are interested in for purposes of PPC. They are:

Correlation
Two variables are said to be correlated if an observed
change in the level of one variable is accompanied by
a change in the level of another variable. The change
may be in the same direction (positive correlation) or
in the opposite direction (negative correlation).
Causality
There is a causal relationship between two variables if
a change in the level of one variable causes a change
in the other variable.
Our goal is to find causal relationships
Generally, our ultimate goal in PPC is to find and quantify causal relationships. Once this is done, we can then take advantage of these relationships to improve and control our processes.
Further information
The planning and data collection steps are described in detail in the data collection section. The analysis and interpretation steps are covered in detail in the analysis section. Examples of the reporting step can be seen in the Case Studies.
Assumption: data used to fit these models are representative of the process being modeled
Finally, we assume that the data used to fit these models are representative of the process being modeled. As a result, we must additionally assume that the measurement system used to collect the data has been studied and proven to be capable of making measurements to the desired precision and accuracy. If this is not the case, refer to the Measurement Capability Section of this Handbook.
Estimation The coefficients for the parameters in the CLM are estimated by
the method of least squares. This is a method that gives estimates
which minimize the sum of the squared distances from the
observations to the fitted line or plane. See the chapter on Process
Modeling for a more complete discussion on estimating the
coefficients for these models.
Testing The tests for the CLM involve testing that the model as a whole is
a good representation of the process and whether any of the
coefficients in the model are zero or have no effect on the overall
fit. Again, the details for testing are given in the chapter on
Process Modeling.
Uses
The CLM has many uses such as building predictive process models over a range of process settings that exhibit linear behavior, control charts, process capability, and building models from the data produced by designed experiments.
Diffusion Furnace (cont.) - Usually, the fitted line for the average wafer sheet resistance is not straight but has some curvature to it. This can be accommodated by adding a quadratic term for the time parameter as follows:

$$\text{sheet resistance} = a_0 + a_1 \cdot \text{time} + a_2 \cdot \text{time}^2 + \epsilon$$
Description A one-way layout consists of a single factor with several levels and multiple
observations at each level. With this kind of layout we can calculate the mean of the
observations within each level of our factor. The residuals will tell us about the
variation within each level. We can also average the means of each level to obtain a
grand mean. We can then look at the deviation of the mean of each level from the
grand mean to understand something about the level effects. Finally, we can compare
the variation within levels to the variation across levels. Hence the name analysis of
variance.
Model
The model for the one-way layout can be written as:

$$y_{ij} = m + a_i + \epsilon_{ij}$$

This equation indicates that the jth data value, from level i, is the sum of three components: the common value (grand mean), the level effect (the deviation of each level mean from the grand mean), and the residual (what's left over).
Estimation
click here to see details of one-way value splitting
Estimation for the one-way layout can be performed one of two ways. First, we can calculate the total variation, within-level variation and across-level variation. These can be summarized in a table as shown below and tests can be made to determine if the factor levels are significant. The value splitting example illustrates the calculations involved.
ANOVA table for one-way case
In general, the ANOVA table for the one-way case is given by:

Source          Sum of Squares                                                        DoF        Mean Square        F0
Factor levels   $SS_L = J\sum_{i=1}^{I}(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot})^2$     $I-1$      $SS_L/(I-1)$       $MS_L/MS_E$
Residuals       $SS_E = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar{y}_{i\cdot})^2$      $I(J-1)$   $SS_E/(I(J-1))$
Corr. Total     $SS_T = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar{y}_{\cdot\cdot})^2$  $IJ-1$

where $\bar{y}_{i\cdot}$ is the mean of level $i$, $\bar{y}_{\cdot\cdot}$ is the grand mean, $I$ is the number of levels, and $J$ is the number of observations per level.
The row labeled, "Corr. Total", in the ANOVA table contains the corrected total sum of
squares and the associated degrees of freedom (DoF).
Level effects must sum to zero
The second way to estimate effects is through the use of CLM techniques. If you look at the model above you will notice that it is in the form of a CLM. The only problem is that the model is saturated and no unique solution exists. We overcome this problem by applying a constraint to the model. Since the level effects are just deviations from the grand mean, they must sum to zero. By applying the constraint that the level effects must sum to zero, we can now obtain a unique solution to the CLM equations. Most analysis programs will handle this for you automatically. See the chapter on Process Modeling for a more complete discussion on estimating the coefficients for these models.
Testing
We are testing to see if the observed data support the hypothesis that the levels of the factor are significantly different from each other. The way we do this is by comparing the within-level variance to the between-level variance.
If we assume that the observations within each level have the same variance, we can
calculate the variance within each level and pool these together to obtain an estimate of
the overall population variance. This works out to be the mean square of the residuals.
Similarly, if there really were no level effect, the mean square across levels would be an
estimate of the overall variance. Therefore, if there really were no level effect, these
two estimates would be just two different ways to estimate the same parameter and
should be close numerically. However, if there is a level effect, the level mean square
will be higher than the residual mean square.
It can be shown that given the assumptions about the data stated below, the ratio of the level mean square to the residual mean square follows an F distribution with the degrees of freedom shown in the ANOVA table. If the F0 value is significant at a given significance level (greater than the cut-off value in an F table), then there is a level effect present in the data.
Assumptions For estimation purposes, we assume the data can adequately be modeled as the sum of
a deterministic component and a random component. We further assume that the fixed
(deterministic) component can be modeled as the sum of an overall mean and some
contribution from the factor level. Finally, it is assumed that the random component can
be modeled with a Gaussian distribution with fixed location and spread.
Uses
The one-way ANOVA is useful when we want to compare the effect of multiple levels of one factor and we have multiple observations at each level. The factor can be either discrete (different machines, different plants, different shifts, etc.) or continuous (different gas flows, temperatures, etc.).
Example Let's extend the machining example by assuming that we have five different machines
making the same part and we take five random samples from each machine to obtain the
following diameter data:
Machine
1 2 3 4 5
0.125 0.118 0.123 0.126 0.118
0.127 0.122 0.125 0.128 0.129
0.125 0.120 0.125 0.126 0.127
0.126 0.124 0.124 0.127 0.120
0.128 0.119 0.126 0.129 0.121
Test By dividing the factor-level mean square by the residual mean square, we obtain an F0
value of 4.86 which is greater than the cut-off value of 2.87 from the F distribution with
4 and 20 degrees of freedom and a significance level of 0.05. Therefore, there is
sufficient evidence to reject the hypothesis that the levels are all the same.
Conclusion From the analysis of these data we can conclude that the factor "machine" has an effect.
There is a statistically significant difference in the pin diameters across the machines on
which they were manufactured.
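A minimal sketch of this analysis in R (one of the languages the Handbook's case studies use), entering the data column by column from the table above:

    # Diameters for machines 1 through 5, five observations each
    diameter <- c(0.125, 0.127, 0.125, 0.126, 0.128,   # machine 1
                  0.118, 0.122, 0.120, 0.124, 0.119,   # machine 2
                  0.123, 0.125, 0.125, 0.124, 0.126,   # machine 3
                  0.126, 0.128, 0.126, 0.127, 0.129,   # machine 4
                  0.118, 0.129, 0.127, 0.120, 0.121)   # machine 5
    machine <- factor(rep(1:5, each = 5))

    # One-way ANOVA: F test for the machine effect on 4 and 20 DoF
    summary(aov(diameter ~ machine))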
Example Let's use the data from the machining example to illustrate
how to use the techniques of value-splitting to break each data
value into its component parts. Once we have the component
parts, it is then a trivial matter to calculate the sums of squares
and form the F-value for the test.
Machine
1 2 3 4 5
.125 .118 .123 .126 .118
.127 .122 .125 .128 .129
.125 .120 .125 .126 .127
.126 .124 .124 .127 .120
.128 .119 .126 .129 .121
First, we calculate the mean for each machine (the level means):

          Machine
   1      2      3      4      5
 .1262  .1206  .1246  .1272  .123
Sweep level means
We can then sweep the means through the original data table (subtract the level mean from each associated data value) to get the residuals:

           Machine
    1       2       3       4       5
 -.0012  -.0026  -.0016  -.0012  -.005
  .0008   .0014   .0004   .0008   .006
 -.0012  -.0006   .0004  -.0012   .004
 -.0002   .0034  -.0006  -.0002  -.003
  .0018  -.0016   .0014   .0018  -.002
Calculate the grand mean
The next step is to calculate the grand mean from the individual machine means as:

Grand Mean = .12432
Sweep the grand mean through the level means
Finally, we can sweep the grand mean through the individual level means to obtain the level effects:

            Machine
    1        2        3        4        5
 .00188  -.00372   .00028   .00288  -.00132
Calculate ANOVA values
Now that we have the data values split and the overlays created, the next step is to calculate the various values in the One-Way ANOVA table. We have three values to calculate for each overlay. They are the sums of squares, the degrees of freedom, and the mean squares.
Total sum of squares
The total sum of squares is calculated by summing the squares of all the data values and subtracting from this number the square of the grand mean times the total number of data values. We usually don't calculate the mean square for the total sum of squares because we don't use this value in any statistical test.
Level sum of squares, degrees of freedom and mean square
Finally, to obtain the sum of squares for the levels, we sum the squares of each value in the level effect overlay and multiply the sum by the number of observations for each level (in this case 5) to obtain a value of .000137. Since the deviations from the level means must sum to zero, we have only four unconstrained values so the degrees of freedom for level effects is 4. This produces a mean square of .000034.
Calculate F-value
The last step is to calculate the F-value and perform the test of equal level means. The F-value is just the level mean square divided by the residual mean square. In this case the F-value = 4.86. If we look in an F table for 4 and 20 degrees of freedom at 95% confidence, we see that the critical value is 2.87, which means that we have a significant result and that there is thus evidence of a strong machine effect. By looking at the level-effect overlay we see that this is driven by machines 2 and 4.
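The sweeps above can be reproduced mechanically; a minimal R sketch, reusing the diameter and machine vectors from the earlier one-way example:

    level.means <- tapply(diameter, machine, mean)   # machine (level) means
    res         <- diameter - level.means[machine]   # sweep level means -> residuals
    grand.mean  <- mean(level.means)                 # grand mean (balanced data)
    effects     <- level.means - grand.mean          # level effects

    # Sums of squares, degrees of freedom, mean squares, and the F-value
    ss.level <- 5 * sum(effects^2)                   # 5 observations per level
    ss.resid <- sum(res^2)
    F0 <- (ss.level / 4) / (ss.resid / 20)           # compare to the F(4, 20) cut-off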
Description When we have two factors with at least two levels and one or more observations at each level, we say we have a
two-way layout. We say that the two-way layout is crossed when every level of Factor A occurs with every level
of Factor B. With this kind of layout we can estimate the effect of each factor (Main Effects) as well as any
interaction between the factors.
Model
If we assume that we have K observations at each combination of I levels of Factor A and J levels of Factor B, then we can model the two-way layout with an equation of the form:

$$y_{ijk} = m + a_i + b_j + (ab)_{ij} + \epsilon_{ijk}$$

This equation just says that the kth data value for the jth level of Factor B and the ith level of Factor A is the sum of five components: the common value (grand mean), the level effect for Factor A, the level effect for Factor B, the interaction effect, and the residual. Note that (ab) does not mean multiplication; rather that there is interaction between the two factors.
Estimation Like the one-way case, the estimation for the two-way layout can be done either by calculating the variance
components or by using CLM techniques.
Click here for the value splitting example
For the two-way ANOVA, we display the data in a two-dimensional table with the levels of Factor A in columns and the levels of Factor B in rows. The replicate observations fill each cell. We can sweep out the common value, the row effects, the column effects, the interaction effects and the residuals using value-splitting techniques. Sums of squares can be calculated and summarized in an ANOVA table as shown below.
Source        Sum of Squares                                                                                             DoF            Mean Square               F0
Factor A      $SS_A = JK\sum_i(\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdots})^2$                                                $I-1$          $SS_A/(I-1)$              $MS_A/MS_E$
Factor B      $SS_B = IK\sum_j(\bar{y}_{\cdot j\cdot}-\bar{y}_{\cdots})^2$                                               $J-1$          $SS_B/(J-1)$              $MS_B/MS_E$
Interaction   $SS_{AB} = K\sum_i\sum_j(\bar{y}_{ij\cdot}-\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot j\cdot}+\bar{y}_{\cdots})^2$  $(I-1)(J-1)$   $SS_{AB}/((I-1)(J-1))$   $MS_{AB}/MS_E$
Residuals     $SS_E = \sum_i\sum_j\sum_k(y_{ijk}-\bar{y}_{ij\cdot})^2$                                                   $IJ(K-1)$      $SS_E/(IJ(K-1))$
Corr. Total   $SS_T = \sum_i\sum_j\sum_k(y_{ijk}-\bar{y}_{\cdots})^2$                                                    $IJK-1$
The row labeled, "Corr. Total", in the ANOVA table contains the corrected total sum of squares and the
associated degrees of freedom (DoF).
We can use CLM techniques to do the estimation. We still have the problem that the model is saturated and no unique solution exists. We overcome this problem by applying to the model the constraints that the two main effects and the interaction effects must each sum to zero.
Testing Like testing in the one-way case, we are testing that two main effects and the interaction are zero. Again we just
form a ratio of each main effect mean square and the interaction mean square to the residual mean square. If the
assumptions stated below are true then those ratios follow an F distribution and the test is performed by
comparing the F0 ratios to values in an F table with the appropriate degrees of freedom and confidence level.
Assumptions For estimation purposes, we assume the data can be adequately modeled as described in the model above. It is
assumed that the random component can be modeled with a Gaussian distribution with fixed location and spread.
Uses The two-way crossed ANOVA is useful when we want to compare the effect of multiple levels of two factors
and we can combine every level of one factor with every level of the other factor. If we have multiple
observations at each level, then we can also estimate the effects of interaction between the two factors.
Example Let's extend the one-way machining example by assuming that we want to test if there are any differences in pin
diameters due to different types of coolant. We still have five different machines making the same part and we
take five samples from each machine for each coolant type to obtain the following data:
                   Machine
           1      2      3      4      5
Coolant   0.125  0.118  0.123  0.126  0.118
A         0.127  0.122  0.125  0.128  0.129
          0.125  0.120  0.125  0.126  0.127
          0.126  0.124  0.124  0.127  0.120
          0.128  0.119  0.126  0.129  0.121
Coolant   0.124  0.116  0.122  0.126  0.125
B         0.128  0.125  0.121  0.129  0.123
          0.127  0.119  0.124  0.125  0.114
          0.126  0.125  0.126  0.130  0.124
          0.129  0.120  0.125  0.124  0.117
Analyze For analysis details see the crossed two-way value splitting example. We can summarize the analysis results in
an ANOVA table as follows:
Test By dividing the mean square for machine by the mean square for residuals we obtain an F0 value of 8.8 which is
greater than the critical value of 2.61 based on 4 and 40 degrees of freedom and a 0.05 significance level.
Likewise the F0 values for Coolant and Interaction, obtained by dividing their mean squares by the residual mean
square, are less than their respective critical values of 4.08 and 2.61 (0.05 significance level).
Conclusion From the ANOVA table we can conclude that machine is the most important factor and is statistically
significant. Coolant is not significant and neither is the interaction. These results would lead us to believe that
some tool-matching efforts would be useful for improving this process.
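A sketch of the crossed two-way analysis in R; the diameters are entered row by row from the data table above (coolant A block first, then coolant B):

    diameter <- c(0.125, 0.118, 0.123, 0.126, 0.118,   # coolant A
                  0.127, 0.122, 0.125, 0.128, 0.129,
                  0.125, 0.120, 0.125, 0.126, 0.127,
                  0.126, 0.124, 0.124, 0.127, 0.120,
                  0.128, 0.119, 0.126, 0.129, 0.121,
                  0.124, 0.116, 0.122, 0.126, 0.125,   # coolant B
                  0.128, 0.125, 0.121, 0.129, 0.123,
                  0.127, 0.119, 0.124, 0.125, 0.114,
                  0.126, 0.125, 0.126, 0.130, 0.124,
                  0.129, 0.120, 0.125, 0.124, 0.117)
    machine <- factor(rep(rep(1:5, times = 5), times = 2))  # columns of each row
    coolant <- factor(rep(c("A", "B"), each = 25))

    # Crossed two-way ANOVA with main effects and interaction
    summary(aov(diameter ~ machine * coolant))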
Example: Coolant is completely crossed with machine
The data table below is five samples each collected from five different lathes each running two different types of coolant. The measurement is the diameter of a turned pin.

                  Machine
           1     2     3     4     5
Coolant   .125  .118  .123  .126  .118
A         .127  .122  .125  .128  .129
          .125  .120  .125  .126  .127
          .126  .124  .124  .127  .120
          .128  .119  .126  .129  .121
Coolant   .124  .116  .122  .126  .125
B         .128  .125  .121  .129  .123
          .127  .119  .124  .125  .114
          .126  .125  .126  .130  .124
          .129  .120  .125  .124  .117
Sweep the cell means
The first step is to sweep the cell means from the data table to obtain the residuals:

                    Machine
            1       2       3       4       5
Coolant  -.0012  -.0026  -.0016  -.0012  -.005
A         .0008   .0014   .0004   .0008   .006
         -.0012  -.0006   .0004  -.0012   .004
         -.0002   .0034  -.0006  -.0002  -.003
          .0018  -.0016   .0014   .0018  -.002
Coolant  -.0028  -.005   -.0016  -.0008   .0044
B         .0012   .004   -.0026   .0022   .0024
          .0002  -.002    .0004  -.0018  -.0066
         -.0008   .004    .0024   .0032   .0034
          .0022  -.001    .0014  -.0028  -.0036
Sweep the row means
The next step is to sweep out the row means. This gives the table below.

      Row              Machine
      mean      1       2       3       4       5
A    .1243    .0019  -.0037   .0003   .0029  -.0013
B    .1238    .003   -.0028  -.0002   .003   -.0032
Sweep the column means
Finally, we sweep the column means to obtain the grand mean, row (coolant) effects, column (machine) effects and the interaction effects.

      Grand             Machine
      mean      1        2        3        4       5
     .1241    .0025   -.0033   .00005   .003   -.0023
A    .0003   -.0006   -.0005   .00025   .0000   .001
B   -.0003    .0006    .0005  -.00025   .0000  -.001
Calculate sums of squares and mean squares
We can calculate the values for the ANOVA table according to the formulae in the table on the crossed two-way page. This gives the table below. From the F-values we see that the machine effect is significant but the coolant and the interaction are not.
Description Sometimes, constraints prevent us from crossing every level of one factor with every level of the
other factor. In these cases we are forced into what is known as a nested layout. We say we have
a nested layout when fewer than all levels of one factor occur within each level of the other
factor. An example of this might be if we want to study the effects of different machines and
different operators on some output characteristic, but we can't have the operators change the
machines they run. In this case, each operator is not crossed with each machine but rather only
runs one machine.
Model
If Factor B is nested within Factor A, then a level of Factor B can only occur within one level of Factor A and there can be no interaction. This gives the following model:

$$y_{ijk} = m + a_i + b_{j(i)} + \epsilon_{ijk}$$

This equation indicates that each data value is the sum of a common value (grand mean), the level effect for Factor A, the level effect of Factor B nested within Factor A, and the residual.
Estimation For a nested design we typically use variance components methods to perform the analysis. We
can sweep out the common value, the Factor A effects, the Factor B within A effects and the
residuals using value-splitting techniques. Sums of squares can be calculated and summarized in
an ANOVA table as shown below.
Click here for nested value-splitting example
It is important to note that with this type of layout, since each level of one factor is only present with one level of the other factor, we can't estimate interaction between the two.
ANOVA table for nested case

Source         Sum of Squares                                                      DoF         Mean Square             F0
Factor A       $SS_A = JK\sum_i(\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdots})^2$         $I-1$       $SS_A/(I-1)$            $MS_A/MS_{B(A)}$
Factor B(A)    $SS_{B(A)} = K\sum_i\sum_j(\bar{y}_{ij\cdot}-\bar{y}_{i\cdot\cdot})^2$  $I(J-1)$    $SS_{B(A)}/(I(J-1))$    $MS_{B(A)}/MS_E$
Residuals      $SS_E = \sum_i\sum_j\sum_k(y_{ijk}-\bar{y}_{ij\cdot})^2$            $IJ(K-1)$   $SS_E/(IJ(K-1))$
Corr. Total    $SS_T = \sum_i\sum_j\sum_k(y_{ijk}-\bar{y}_{\cdots})^2$             $IJK-1$
The row labeled, "Corr. Total", in the ANOVA table contains the corrected total sum of squares
and the associated degrees of freedom (DoF).
As with the crossed layout, we can also use CLM techniques. We still have the problem that the
model is saturated and no unique solution exists. We overcome this problem by applying to the
model the constraints that the two main effects sum to zero.
Testing We are testing that two main effects are zero. Again we just form a ratio (F0 ) of each main effect
mean square to the appropriate mean-squared error term. (Note that the error term for Factor A is
not MSE, but is MSB.) If the assumptions stated below are true then those ratios follow an F
distribution and the test is performed by comparing the F0 ratios to values in an F table with the
appropriate degrees of freedom and confidence level.
Assumptions For estimation purposes, we assume the data can be adequately modeled by the model above and
that there is more than one variance component. It is assumed that the random component can be
modeled with a Gaussian distribution with fixed location and spread.
Uses The two-way nested ANOVA is useful when we are constrained from combining all the levels of
one factor with all of the levels of the other factor. These designs are most useful when we have
what is called a random effects situation. When the levels of a factor are chosen at random rather
than selected intentionally, we say we have a random effects model. An example of this is when
we select lots from a production run, then select units from the lot. Here the units are nested
within lots and the effect of each factor is random.
Example Let's change the two-way machining example slightly by assuming that we have five different
machines making the same part and each machine has two operators, one for the day shift and
one for the night shift. We take five samples from each machine for each operator to obtain the
following data:
                   Machine
            1      2      3      4      5
Operator   0.125  0.118  0.123  0.126  0.118
Day        0.127  0.122  0.125  0.128  0.129
           0.125  0.120  0.125  0.126  0.127
           0.126  0.124  0.124  0.127  0.120
           0.128  0.119  0.126  0.129  0.121
Operator   0.124  0.116  0.122  0.126  0.125
Night      0.128  0.125  0.121  0.129  0.123
           0.127  0.119  0.124  0.125  0.114
           0.126  0.125  0.126  0.130  0.124
           0.129  0.120  0.125  0.124  0.117
Analyze For analysis details see the nested two-way value splitting example. We can summarize the
analysis results in an ANOVA table as follows:
Test By dividing the mean square for Machine by the mean square for Operator within Machine, or
Operator(Machine), we obtain an F0 value of 20.38 which is greater than the critical value of
5.19 for 4 and 5 degrees of freedom at the 0.05 significance level. The F0 value for
Operator(Machine), obtained by dividing its mean square by the residual mean square, is less than
the critical value of 2.45 for 5 and 40 degrees of freedom at the 0.05 significance level.
Conclusion From the ANOVA table we can conclude that the Machine is the most important factor and is
statistically significant. The effect of Operator nested within Machine is not statistically
significant. Again, any improvement activities should be focused on the tools.
Example: Operator is nested within machine
The data table below contains data collected from five different lathes, each run by two different operators. Note we are concerned here with the effect of operators, so the layout is nested. If we were concerned with shift instead of operator, the layout would be crossed. The measurement is the diameter of a turned pin.

                              Sample
Machine   Operator     1     2     3     4     5
1         Day        .125  .127  .125  .126  .128
          Night      .124  .128  .127  .126  .129
2         Day        .118  .122  .120  .124  .119
          Night      .116  .125  .119  .125  .120
3         Day        .123  .125  .125  .124  .126
          Night      .122  .121  .124  .126  .125
4         Day        .126  .128  .126  .127  .129
          Night      .126  .129  .125  .130  .124
5         Day        .118  .129  .127  .120  .121
          Night      .125  .123  .114  .124  .117
For the nested two-way case, just as in the crossed case, the first thing we need to do is to sweep the cell means from the data table to obtain the residual values. We then sweep the nested factor (Operator) and the top level factor (Machine) to obtain the table below.

          Machine               Operator
Machine   effect     Operator   effect              Residuals
1          .00246    Day       -.0003    -.0012   .0008  -.0012  -.0002   .0018
                     Night      .0003    -.0028   .0012   .0002  -.0008   .0022
2         -.00324    Day       -.0002    -.0026   .0014  -.0006   .0034  -.0016
                     Night      .0002    -.005    .004   -.002    .004   -.001
3          .00006    Day        .0005    -.0016   .0004   .0004  -.0006   .0014
                     Night     -.0005    -.0016  -.0026   .0004   .0024   .0014
4          .00296    Day        .0002    -.0012   .0008  -.0012  -.0002   .0018
                     Night     -.0002    -.0008   .0022  -.0018   .0032  -.0028
5         -.00224    Day        .0012    -.005    .006    .004   -.003   -.002
                     Night     -.0012     .0044   .0024  -.0066   .0034  -.0036
What does this table tell us?
By looking at the residuals we see that machines 2 and 5 have the greatest variability. There does not appear to be much of an operator effect but there is clearly a strong machine effect.
Calculate sums of squares and mean squares
We can calculate the values for the ANOVA table according to the formulae in the table on the nested two-way page. This produces the table below. From the F-values we see that the machine effect is significant but the operator effect is not. (Here it is assumed that both factors are fixed.)
Description There are many instances when we are faced with the
analysis of discrete data rather than continuous data.
Examples of this are yield (good/bad), speed bins
(slow/fast/faster/fastest), survey results (favor/oppose), etc.
We then try to explain the discrete outcomes with some
combination of discrete and/or continuous explanatory
variables. In this situation the modeling techniques we have
learned so far (CLM and ANOVA) are no longer appropriate.
Contingency table analysis and log-linear model
There are two primary methods available for the analysis of discrete response data. The first one applies to situations in which we have discrete explanatory variables and discrete responses and is known as Contingency Table Analysis. The model for this is covered in detail in this section. The second model applies when we have both discrete and continuous explanatory variables and is referred to as a Log-Linear Model. That model is beyond the scope of this Handbook, but interested readers should refer to the reference section of this chapter for a list of useful books on the topic.
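As a minimal illustration of contingency table analysis, the R sketch below runs a chi-square test of independence on made-up yield counts (the numbers are purely hypothetical):

    # Hypothetical good/bad counts for two shifts
    counts <- matrix(c(40, 10,     # shift 1
                       33, 17),    # shift 2
                     nrow = 2, byrow = TRUE,
                     dimnames = list(shift = c("1", "2"),
                                     yield = c("good", "bad")))

    chisq.test(counts)   # tests whether yield is independent of shift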
Start with careful planning
The data collection process for PPC starts with careful planning. The planning consists of the definition of clear and concise goals, developing process models and devising a sampling plan.
State concise goals
The goal statement is one of the most important parts of the characterization plan. With clearly and concisely stated goals, the rest of the planning process falls naturally into place.
Example goal statements
Click on each of the links below to see Goal Statements for each of the case studies.

1. Furnace Case Study (Goal)
2. Machine Case Study (Goal)
Examples Click on each of the links below to see the process models
for each of the case studies.
Verify and execute
Once the sampling plan has been developed, it can be verified and then passed on to the responsible parties for execution.
Goals will tell us what to measure and how
The first step is to carefully examine the goals. This will tell you which response variables need to be sampled and how. For instance, if our goal states that we want to determine if an oxide film can be grown on a wafer to within 10 Angstroms of the target value with a uniformity of <2%, then we know we have to measure the film thickness on the wafers to an accuracy of at least +/- 3 Angstroms and we must measure at multiple sites on the wafer in order to calculate uniformity.
The goals and the models we build will also indicate which
explanatory variables need to be sampled and how. Since
the fishbone diagrams define the known important
relationships, these will be our best guide as to which
explanatory variables are candidates for measurement.
There are two principles that guide our choice of sampling scheme
Once we have selected our response parameters, it would seem to be a rather straightforward exercise to take some measurements, calculate some statistics and draw conclusions. There are, however, many things which can go wrong along the way that can be avoided with careful planning and knowing what to watch for. There are two overriding principles that will guide the design of our sampling scheme.
Examples The issues here are many and complicated. Click on each
of the links below to see the sampling schemes for each of
the case studies.
Prior information
If our process has been studied before, we can use that prior information to reduce sample sizes. This can be done by using prior mean and variance estimates and by stratifying the population to reduce variation within groups.
Practicality
Of course the sample size you select must make sense. This is where the trade-offs usually occur. We want to take enough observations to obtain reasonably precise estimates of the parameters of interest, but we also do not want to spend more time and money on sampling than necessary.
Know the process involved
In the planning phase of the PPC, be sure to understand the entire data collection process. Things to watch out for include:

   automatic measurement machines rejecting outliers
   only summary statistics (mean and standard deviation) being saved
   values for explanatory variables (location, operator, etc.) not being saved
   how missing values are handled
Table showing roles and potential responsibilities
A partial list of these individuals along with their roles and potential responsibilities is given in the table below. There may be multiple occurrences of each of these individuals across shifts or process steps, so be sure to include everyone.
Gather all of the data into one place
After executing the data collection plan for the characterization study, the data must be gathered up for analysis. Depending on the scope of the study, the data may reside in one place or in many different places. It may be in common factory databases, flat files on individual computers, or handwritten on run sheets. Whatever the case, the first step will be to collect all of the data from the various sources and enter it into a single data file. The most convenient format for most data analyses is the variables-in-columns format. This format has the variable names in column headings and the values for the variables in the rows.
Perform a quality check on the data using graphical and numerical techniques
The next step is to perform a quality check on the data. Here we are typically looking for data entry problems, unusual data values, missing data, etc. The two most useful tools for this step are the scatter plot and the histogram. By constructing scatter plots of all of the response variables, any data entry problems will be easily identified. Histograms of response variables are also quite useful for identifying data entry problems. Histograms of explanatory variables help identify problems with the execution of the sampling plan. If the counts for each level of the explanatory variables are not the same as called for in the sampling plan, you know you may have an execution problem. Running numerical summary statistics on all of the variables (both response and explanatory) also helps to identify data problems.
Summarize data by estimating location, spread and shape
Once the data quality problems are identified and fixed, we should estimate the location, spread and shape for all of the response variables. This is easily done with a combination of histograms and numerical summary statistics.
The first analysis of our data is exploration
Once we have a data file created in the desired format, checked the data integrity, and have estimated the summary statistics on the response variables, the next step is to start exploring the data and to try to understand the underlying structure. The most useful tools will be various forms of the basic scatter plot and box plot.
Graph responses, then explanatory versus response, then conditional plots
The order that generally proves most effective for data analysis is to first graph all of the responses against each other in a pairwise fashion. Then we graph responses against the explanatory variables. This will give an indication of the main factors that have an effect on response variables. Finally, we graph response variables, conditioned on the levels of explanatory factors. This is what reveals interactions between explanatory variables. We will use nested boxplots and block plots to visualize interactions.
Make scatter plots of all of the response variables
In this first phase of exploring our data, we plot all of the response variables in a pairwise fashion. The individual scatter plots are displayed in a matrix form with the y-axis scaling the same for all plots in a row of the matrix.
Check the slope of the data on the scatter plots
The scatterplot matrix shows how the response variables are related to each other. If there is a linear trend with a positive slope, this indicates that the responses are positively correlated. If there is a linear trend with a negative slope, then the variables are negatively correlated. If the data appear random with no slope, the variables are probably not correlated. This will be important information for subsequent model building steps.
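In R, a scatterplot matrix of this kind takes one call; here responses is assumed to be a data frame whose columns are the sampled response variables:

    pairs(responses)   # pairwise scatter plot matrix of all responses
    cor(responses)     # correlation matrix as a numerical cross-check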
The next step is to look for main effects
The next step in the exploratory analysis of our data is to see which factors have an effect on which response variables and to quantify that effect. Scatter plots and box plots will be the tools of choice here.
Watch out for varying sample sizes across levels
This step is relatively self-explanatory. However, there are two points of caution. First, be cognizant of not only the trends in these graphs but also the amount of data represented in those trends. This is especially true for categorical explanatory variables. There may be many more observations in some levels of the categorical variable than in others. In any event, take unequal sample sizes into account when making inferences.
Graph implicit as well as explicit explanatory variables
The second point is to be sure to graph the responses against implicit explanatory variables (such as observation order) as well as the explicit explanatory variables. There may be interesting insights in these hidden explanatory variables.
Example: wafer processing
In the example below, we have collected data on the particles added to a wafer during a particular processing step. We ran a number of cassettes through the process and sampled wafers from certain slots in the cassette. We also kept track of which load lock the wafers passed through. This was done for two different process temperatures. We measured both small particles (< 2 microns) and large particles (> 2 microns). We plot the responses (particle counts) against each of the explanatory variables.
Cassette does not appear to be an important factor for small or large particles
This first graph is a box plot of the number of small particles added for each cassette type. The "X"'s in the plot represent the maximum, median, and minimum number of particles.
The second graph is a box plot of the number of large particles added for each cassette
type.
We conclude from these two box plots that cassette does not appear to be an important
factor for small or large particles.
There is a difference between slots for small particles; one slot is different for large particles
We next generate box plots of small and large particles for the slot variable. First, the box plot for small particles.
We conclude that there is a difference between slots for small particles. We also
conclude that one slot appears to be different for large particles.
Load lock may have a slight effect for small and large particles
We next generate box plots of small and large particles for the load lock variable. First, the box plot for small particles.
We conclude that there may be a slight effect for load lock for small and large
particles.
For small particles, temperature has a strong effect on both location and spread. For large particles, there may be a slight temperature effect but this may just be due to the outliers.
We next generate box plots of small and large particles for the temperature variable. First, the box plot for small particles.
We conclude that temperature has a strong effect on both location and spread for small
particles. We conclude that there might be a small temperature effect for large
particles, but this may just be due to outliers.
It is important to identify interactions
The final step (and perhaps the most important one) in the exploration phase is to find any first order interactions. When the difference in the response between the levels of one factor is not the same for all of the levels of another factor we say we have an interaction between those two factors. When we are trying to optimize responses based on factor settings, interactions force compromises in the factor settings.
The eyes can be deceiving - be careful
Interactions can be seen visually by using nested box plots. However, caution should be exercised when identifying interactions through graphical means alone. Any graphically identified interactions should be verified by numerical methods as well.
Previous example continued
To continue the previous example, given below are nested box plots of the small and large particles. The load lock is nested within the two temperature values. There is some evidence of possible interaction between these two factors. The effect of load lock is stronger at the lower temperature than at the higher one. This effect is stronger for the smaller particles than for the larger ones. As this example illustrates, when you have significant interactions the main effects must be interpreted conditionally. That is, the main effects do not tell the whole story by themselves.
For small particles, the load lock effect is not as strong for high temperature as it is for low temperature
The following is the box plot of small particles for load lock nested within temperature.
We conclude from this plot that for small particles, the load lock effect is not as strong
for high temperature as it is for low temperature.
The same may be true for large particles but not as strongly
The following is the box plot of large particles for load lock nested within temperature.
We conclude from this plot that for large particles, the load lock effect may not be as
strong for high temperature as it is for low temperature. However, this effect is not as
strong as it is for small particles.
Black box models
In our data collection plan we drew process model pictures. When we develop a data collection plan we build black box models of the process we are studying, like the one below:

[Black-box process model diagram]
Polynomial models are generic descriptors of our output surface
There are two cases that we will cover for building mathematical models. If our goal is to develop an empirical prediction equation or to identify statistically significant explanatory variables and quantify their influence on output responses, we typically build polynomial models. As the name implies, these are polynomial functions (typically linear or quadratic functions) that describe the relationships between the explanatory variables and the response variable.
Physical models describe the underlying physics of our processes
On the other hand, if our goal is to fit an existing theoretical equation, then we want to build physical models. Again, as the name implies, this pertains to the case when we already have equations representing the physics involved in the process and we want to estimate specific parameter values.
Use multiple regression to fit polynomial models
When the number of factors is small (less than 5), the complete polynomial equation can be fitted using the technique known as multiple regression. When the number of factors is large, we should use a technique known as stepwise regression. Most statistical analysis programs have a stepwise regression capability. We just enter all of the terms of the polynomial models and let the software choose which terms best describe the data. For a more thorough discussion of this topic and some examples, refer to the process improvement chapter.
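A sketch of this in R, assuming a data frame dat with response y and explanatory variables x1 and x2 (R's step() selects terms by AIC rather than the classical F-to-enter rule):

    # Enter all terms of the quadratic polynomial model...
    full <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)

    # ...and let stepwise selection choose the terms that best describe the data
    reduced <- step(full, direction = "both")
    summary(reduced)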
CMP removal rate can be modeled with a non-linear equation
From first principles we know that removal rate changes with time. Early on, removal rate is high and as the wafer becomes more planar the removal rate declines. This is easily modeled with an exponential function of the form:

$$\text{removal rate} = p_1 + p_2 \cdot \exp(p_3 \cdot \text{time})$$

where $p_1$, $p_2$, and $p_3$ are the parameters we want to estimate.
A non-linear regression routine was used to fit the data to the equation
The equation was fit to the data using a non-linear regression routine. A plot of the original data and the fitted line are given in the image below. The fit is quite good. This fitted equation was subsequently used in process optimization work.
Studying variation is important in PPC
One of the most common activities in process characterization work is to study the variation associated with the process and to try to determine the important sources of that variation. This is called analysis of variance. Refer to the section of this chapter on ANOVA models for a discussion of the theory behind this kind of analysis.

To perform the analysis, we just identify the structure, enter the data for each of the factors and levels into a statistical analysis program and then interpret the ANOVA table and other output. This is all illustrated in the example below.
Look at effect of zone location on oxide thickness
The first thing to look at is the effect of zone location on the oxide thickness. This is a classic one-way layout. The factor is furnace zone and we have four levels. A plot of the data and an ANOVA table are given below.

The zone effect is masked by the lot-to-lot variation
Let's account for lot with a nested layout
From the graph there does not appear to be much of a zone effect; in fact, the ANOVA table indicates that it is not significant. The problem is that variation due to lots is so large that it is masking the zone effect. We can fix this by adding a factor for lot. By treating this as a nested two-way layout, we obtain the ANOVA table below.
Conclusions Since the "Prob > F" is less than 0.05, for both lot and zone, we know that these
factors are statistically significant at the 0.05 significance level.
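A sketch of these two fits in R, assuming a data frame oxide with columns thickness, zone, and lot; treating zone as nested within lot is an assumption consistent with the nested layout described above:

    # One-way layout: zone alone (the lot-to-lot variation masks the zone effect)
    summary(aov(thickness ~ zone, data = oxide))

    # Nested two-way layout: add lot, with zone nested within lot
    summary(aov(thickness ~ lot / zone, data = oxide))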
The graphical tool we use to assess stability is the scatter plot or the control chart
The graphical tool we use to assess process stability is the scatter plot. We collect a sufficient number of independent samples (greater than 100) from our process over a sufficiently long period of time (this can be specified in days, hours of processing time or number of parts processed) and plot them on a scatter plot with sample order on the x-axis and the sample value on the y-axis. The plot should look like constant random variation about a constant mean. Sometimes it is helpful to calculate control limits and plot them on the scatter plot along with the data. The two plots in the controlled variation example are good illustrations of stable and unstable processes.
Capability compares a process against its specification
Process capability analysis entails comparing the performance of a process against its specifications. We say that a process is capable if virtually all of the possible variable values fall within the specification limits.
Use a capability chart
Graphically, we assess process capability by plotting the process specification limits on a histogram of the observations. If the histogram falls within the specification limits, then the process is capable. This is illustrated in the graph below. Note how the process is shifted below target and the process variation is too large. This is an example of an incapable process.

Notice how the process is off target and has too much variation
Numerically, we use the Cp index
Numerically, we measure capability with a capability index. The general equation for the capability index, Cp, is:

$$C_p = \frac{USL - LSL}{6s}$$

where USL and LSL are the upper and lower specification limits and s is the standard deviation of the process observations.
Interpretation of the Cp index
This equation just says that the measure of our process capability is how much of our observed process variation is covered by the process specifications. In this case the process variation is measured by 6 standard deviations (+/- 3 on each side of the mean). Clearly, if Cp > 1.0, then the process specification covers almost all of our process observations.
Cp does not account for a process that is off center; use the Cpk index
The only problem with the Cp index is that it does not account for a process that is off-center. We can modify this equation slightly to account for off-center processes to obtain the Cpk index as follows:

$$C_{pk} = \frac{\min(USL - \bar{x},\; \bar{x} - LSL)}{3s}$$
Cpk accounts for a process being off center
This equation just says to take the minimum distance between our specification limits and the process mean and divide it by 3 standard deviations to arrive at the measure of process capability. This is all covered in more detail in the process capability section of the process monitoring chapter. For the example above, note how the Cpk value is less than the Cp value. This is because the process distribution is not centered between the specification limits.
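Both indices are simple to compute directly; a minimal R sketch with placeholder specification limits and a placeholder data vector x:

    LSL <- 460; USL <- 660          # hypothetical specification limits
    xbar <- mean(x); s <- sd(x)

    Cp  <- (USL - LSL) / (6 * s)
    Cpk <- min(USL - xbar, xbar - LSL) / (3 * s)   # <= Cp; equal only when centered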
Check the normality of the data
Many of the techniques discussed in this chapter, such as hypothesis tests, control charts and capability indices, assume that the underlying structure of the data can be adequately modeled by a normal distribution. Many times we encounter data where this is not the case.
Some causes of non-normality
There are several things that could cause the data to appear non-normal, such as:

   The data come from two or more different sources. This type of data will often have a multi-modal distribution. This can be solved by identifying the reason for the multiple sets of data and analyzing the data separately.

   The data come from an unstable process. This type of data is nearly impossible to analyze because the results of the analysis will have no credibility due to the changing nature of the process.

   The data were generated by a stable, yet fundamentally non-normal mechanism. For example, particle counts are non-normal by the very nature of the particle generation process. Data of this type can be handled using transformations.
We can sometimes transform the data to make it look normal
For the last case, we could try transforming the data using what is known as a power transformation. The power transformation is given by the equation:

$$Y' = Y^{\lambda}$$

where Y represents the data and lambda is the transformation value. Lambda is typically any value between -2 and 2. Some of the more common values for lambda are 0, 1/2, and -1, which give the following transformations: $\ln(Y)$ ($\lambda = 0$, by convention), $\sqrt{Y}$ ($\lambda = 1/2$), and $1/Y$ ($\lambda = -1$).
General algorithm for trying to make non-normal data approximately normal
The general algorithm for trying to make non-normal data appear to be approximately normal is to:

1. Determine if the data are non-normal (use a normal probability plot and a histogram).
2. Find a transformation that makes the data look approximately normal, if possible. Some data sets may include zeros (i.e., particle data). If the data set does include zeros, you must first add a constant value to the data and then transform the results.
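A sketch of this algorithm in R for a hypothetical positive-valued data vector y:

    # Step 1: check normality of the raw data
    hist(y)
    qqnorm(y); qqline(y)

    # Step 2: try the common power transformations
    # (if y contains zeros, add a constant first, e.g. y <- y + 1)
    y.log  <- log(y)    # lambda = 0
    y.sqrt <- sqrt(y)   # lambda = 1/2
    y.inv  <- 1 / y     # lambda = -1

    qqnorm(y.log); qqline(y.log)   # repeat the check for each candidate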
Example: particle count data
As an example, let's look at some particle count data from a semiconductor processing step. Count data are inherently non-normal. Below are histograms and normal probability plots for the original data and the ln, sqrt and inverse of the data. You can see that the log transform does the best job of making the data appear as if it is normal. All analyses can be performed on the log-transformed data and the assumptions will be approximately satisfied.
The original data is non-normal; the log transform looks fairly normal. Neither the square root nor the inverse transformation looks normal.
Summary This section presents several case studies that demonstrate the
application of production process characterizations to specific
problems.
Table of The case study is broken down into the following steps.
Contents
1. Background and Data
2. Initial Analysis of Response Variable
3. Identify Sources of Variation
4. Analysis of Variance
5. Final Conclusions
6. Work This Example Yourself
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Process Model
In the picture below we are modeling this process with one output (film thickness) that is influenced by four controlled factors (gas flow, pressure, temperature and time) and two uncontrolled factors (run and zone). The four controlled factors are part of our recipe and will remain constant throughout this study. We know that there is run-to-run variation that is due to many different factors (input material variation, variation in consumables, etc.). We also know that the different zones in the furnace have an effect. A zone is a region of the furnace tube that holds one boat. There are four zones in these tubes. The zones in the middle of the tube grow oxide a little bit differently from the ones on the ends. In fact, there are temperature offsets in the recipe to help minimize this problem.
Sampling Plan
Given our goal statement and process modeling, we can now define a sampling plan. The primary goal is to determine if the process is capable. This just means that we need to monitor the process over some period of time and compare the estimates of process location and spread to the specifications. An additional goal is to identify sources of variation to aid in setting up a process control strategy. Some obvious sources of variation are incoming wafers, run-to-run variability, variation due to operators or shift, and variation due to zones within a furnace tube. One additional constraint that we must work under is that this study should not have a significant impact on normal production operations.
Data The following are the data that were collected for this study.
RUN ZONE WAFER THICKNESS
--------------------------------
1 1 1 546
1 1 2 540
1 2 1 566
1 2 2 564
1 3 1 577
1 3 2 546
1 4 1 543
1 4 2 529
2 1 1 561
2 1 2 556
2 2 1 577
2 2 2 553
2 3 1 563
2 3 2 577
2 4 1 556
2 4 2 540
3 1 1 515
3 1 2 520
3 2 1 548
3 2 2 542
3 3 1 505
3 3 2 487
3 4 1 506
3 4 2 514
4 1 1 568
4 1 2 584
4 2 1 570
4 2 2 545
4 3 1 589
4 3 2 562
4 4 1 569
4 4 2 571
5 1 1 550
5 1 2 550
5 2 1 562
5 2 2 580
5 3 1 560
5 3 2 554
5 4 1 545
5 4 2 546
6 1 1 584
6 1 2 581
6 2 1 567
6 2 2 558
6 3 1 556
6 3 2 560
6 4 1 591
6 4 2 599
7 1 1 593
7 1 2 626
7 2 1 584
7 2 2 559
7 3 1 634
7 3 2 598
7 4 1 569
7 4 2 592
8 1 1 522
8 1 2 535
8 2 1 535
8 2 2 581
8 3 1 527
8 3 2 520
8 4 1 532
8 4 2 539
9 1 1 562
9 1 2 568
9 2 1 548
9 2 2 548
9 3 1 533
9 3 2 553
9 4 1 533
9 4 2 521
10 1 1 555
10 1 2 545
10 2 1 584
10 2 2 572
10 3 1 546
10 3 2 552
10 4 1 586
10 4 2 584
11 1 1 565
11 1 2 557
11 2 1 583
11 2 2 585
11 3 1 582
11 3 2 567
11 4 1 549
11 4 2 533
12 1 1 548
12 1 2 528
12 2 1 563
12 2 2 588
12 3 1 543
12 3 2 540
12 4 1 585
12 4 2 586
13 1 1 580
13 1 2 570
13 2 1 556
13 2 2 569
13 3 1 609
13 3 2 625
13 4 1 570
13 4 2 595
14 1 1 564
14 1 2 555
14 2 1 585
14 2 2 588
14 3 1 564
14 3 2 583
14 4 1 563
14 4 2 558
15 1 1 550
15 1 2 557
15 2 1 538
15 2 2 525
15 3 1 556
15 3 2 547
15 4 1 534
15 4 2 542
16 1 1 552
16 1 2 547
16 2 1 563
16 2 2 578
16 3 1 571
16 3 2 572
16 4 1 575
16 4 2 584
17 1 1 549
17 1 2 546
17 2 1 584
17 2 2 593
17 3 1 567
17 3 2 548
17 4 1 606
17 4 2 607
18 1 1 539
18 1 2 554
18 2 1 533
18 2 2 535
18 3 1 522
18 3 2 521
18 4 1 547
18 4 2 550
19 1 1 610
19 1 2 592
19 2 1 587
19 2 2 587
19 3 1 572
19 3 2 612
19 4 1 566
19 4 2 563
20 1 1 569
20 1 2 609
20 2 1 558
20 2 2 555
20 3 1 577
20 3 2 579
20 4 1 552
20 4 2 558
21 1 1 595
21 1 2 583
21 2 1 599
21 2 2 602
21 3 1 598
21 3 2 616
21 4 1 580
21 4 2 575
Initial Plots of Response Variable
The initial step is to assess data quality and to look for anomalies. This is
done by generating a normal probability plot, a histogram, and a box plot. For
convenience, these are generated on a single page.
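A sketch in R of these initial plots. The file name furnace.dat and the use of read.table are assumptions, not part of the original case study; the file is assumed to contain a single header row (RUN ZONE WAFER THICKNESS) followed by the data above.

    # Read the furnace data (hypothetical file name; columns as listed above)
    furnace <- read.table("furnace.dat", header = TRUE)

    # Normal probability plot, histogram with normal density, and box plot
    par(mfrow = c(2, 2))
    qqnorm(furnace$THICKNESS)
    qqline(furnace$THICKNESS)
    hist(furnace$THICKNESS, freq = FALSE, xlab = "Film Thickness", main = "")
    curve(dnorm(x, mean(furnace$THICKNESS), sd(furnace$THICKNESS)),
          add = TRUE)                  # overlaid normal density
    boxplot(furnace$THICKNESS, ylab = "Film Thickness")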
Conclusions From the Plots
We can make the following conclusions based on these initial plots.

1. The box plot indicates one outlier. However, this outlier is only slightly
   smaller than the other numbers.
2. The normal probability plot and the histogram (with an overlaid normal
   density) indicate that this data set is reasonably approximated by a normal
   distribution.
Parameter Estimates

Type         Parameter            Estimate   Lower (95%)   Upper (95%)
                                             Confidence    Confidence
                                             Bound         Bound
Location     Mean                 563.0357   559.1692      566.9023
Dispersion   Standard Deviation    25.3847    22.9297       28.4331
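These interval estimates follow from the standard normal-theory formulas: a t interval for the mean and a chi-square interval for the standard deviation. A sketch in R, continuing with the furnace data frame read in above:

    y <- furnace$THICKNESS
    n <- length(y)                       # 168 observations

    # 95 % confidence interval for the mean: 559.17 to 566.90
    mean(y) + c(-1, 1) * qt(0.975, n - 1) * sd(y) / sqrt(n)

    # 95 % confidence interval for the standard deviation: 22.93 to 28.43
    sqrt((n - 1) * var(y) / qchisq(c(0.975, 0.025), n - 1))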
Quantiles
Quantiles for the film thickness are summarized in the following table.

Capability Analysis
From the above preliminary analysis, it looks reasonable to proceed with the
capability analysis.
The lower specification limit is 460, the upper specification limit is 660, and
the target specification is 560.
Percent Defective
We summarize the percent defective (i.e., the number of items outside the
specification limits) in the following table.

                                      Percent    Theoretical
Specification                Value    Actual     (% Based on Normal)
---------------------------------------------------------------------------
Lower Specification Limit    460      0.0000     Percent Below LSL =
                                                 100*Φ((LSL - μ)/s) = 0.0025%
Upper Specification Limit    660      0.0000     Percent Above USL =
                                                 100*(1 - Φ((USL - μ)/s)) = 0.0067%
Specification Target         560      0.0000     Percent Below LSL and
                                                 Above USL = 0.0091%
Standard Deviation           25.38468
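Assuming normality, the theoretical percentages come directly from the normal cumulative distribution function Φ. A sketch in R, continuing with the same data frame:

    lsl <- 460; usl <- 660
    mu  <- mean(furnace$THICKNESS)
    s   <- sd(furnace$THICKNESS)

    100 * pnorm((lsl - mu) / s)          # theoretical % below LSL, about 0.0025
    100 * (1 - pnorm((usl - mu) / s))    # theoretical % above USL, about 0.0067

    # Actual percent of observations outside the specification limits
    100 * mean(furnace$THICKNESS < lsl | furnace$THICKNESS > usl)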
Conclusions
The above capability analysis indicates that the process is capable and we can
proceed with the analysis. The next part of the analysis is to break down the
sources of variation.
Box Plot by Run
The following is a box plot of the thickness by run number.
Conclusions From Box Plot
We can make the following conclusions from this box plot.

1. There is significant run-to-run variation.
2. Although the means of the runs are different, there is no discernible trend
   due to run.
Box Plot by Furnace Location
The following is a box plot of the thickness by furnace location.
Conclusions From Box Plot
We can make the following conclusions from this box plot.

1. There is considerable variation within a given furnace location.
2. The variation between furnace locations is small. That is, the locations
   and scales of each of the four furnace locations are fairly comparable
   (although furnace location 3 seems to have a few mild outliers).
Box Plot by Wafer
The following is a box plot of the thickness by wafer.

Conclusion From Box Plot
From this box plot, we conclude that wafer does not seem to be a significant
factor.
Block Plot
In order to show the combined effects of run, furnace location, and wafer, we
draw a block plot of the thickness. Note that for aesthetic reasons, we have
used connecting lines rather than enclosing boxes.
Conclusions From Block Plot
We can draw the following conclusions from this block plot.

1. There is significant variation both between runs and between furnace
   locations. The between-run variation appears to be greater.
Analysis of Variance

Source                   Degrees of   Sum of       Mean Square   F Ratio   Prob > F
                         Freedom      Squares      Error
Run                       20           61,442.29   3,072.11      5.37404   0.0000001
Furnace Location [Run]    63           36,014.5      571.659     4.72864   3.85e-11
Within                    84           10,155        120.893
Total                    167          107,611.8      644.382
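This nested ANOVA can be reproduced in R (the original analysis was run in Dataplot, so the tooling here is an assumption) by nesting furnace location (zone) within run. Note that summary() tests run against the residual mean square; for the random-effects model, run must be tested against the nested-location mean square by hand.

    furnace$RUN  <- factor(furnace$RUN)
    furnace$ZONE <- factor(furnace$ZONE)

    # Furnace location nested within run: 20, 63, and 84 df
    fit <- summary(aov(THICKNESS ~ RUN + RUN:ZONE, data = furnace))
    fit

    # F ratio for run against the nested-location mean square, about 5.374
    ms <- fit[[1]][["Mean Sq"]]
    ms[1] / ms[2]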
Components of Variance

Component                Variance Component   Percent of Total   Sqrt(Variance Component)
Run                      312.55694            47.44              17.679
Furnace Location [Run]   225.38294            34.21              15.013
Within                   120.89286            18.35              10.995
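For this balanced design, with 2 wafers per furnace location and 8 wafers per run, the variance components follow from the mean squares by equating observed and expected mean squares. A sketch in R using the values from the ANOVA table above:

    ms_run    <- 3072.11                 # mean squares from the ANOVA table
    ms_loc    <- 571.659
    ms_within <- 120.893

    var_within <- ms_within                    # 120.893
    var_loc    <- (ms_loc - ms_within) / 2     # 225.383 (2 wafers per location)
    var_run    <- (ms_run - ms_loc) / 8        # 312.556 (8 wafers per run)

    vc <- c(run = var_run, location = var_loc, within = var_within)
    round(100 * vc / sum(vc), 2)               # percent of total: 47.44 34.21 18.35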
View Dataplot Macro for this Case Study
This page allows you to repeat the analysis outlined in the case study
description on the previous page using Dataplot, if you have downloaded and
installed it. Output from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the Output window, the
Graphics window, the Command History window, and the Data Sheet window. Across
the top of the main windows there are menus for executing Dataplot commands.
Across the bottom is a command entry window where commands can be typed in.
Data Analysis Steps / Results and Conclusions

3. Capability analysis indicates that the process is capable.
3.5.2. Machine Screw Case Study

Table of Contents
The case study is broken down into the following steps.
1. Background and Data
2. Box Plots by Factor
3. Analysis of Variance
4. Throughput
5. Final Conclusions
6. Work This Example Yourself
Software
The analyses used in this case study can be generated using both Dataplot code
and R code.
Process Model
The process model for this operation is trivial and need not be addressed.
Sampling Plan
Given our goal statement and process modeling, we can now define a sampling
plan. The primary goal is to determine if the process is stable and to compare
the variances of the three machines. We also need to monitor throughput so
that we can compare the productivity of the three machines.
Data
The following are the data that were collected for this study.

MACHINE   DAY     TIME     SAMPLE   DIAMETER
(1-3)     (1-3)   1 = AM   (1-10)   (inches)
                  2 = PM
---------------------------------------------
1         1       1        1        0.1247
1         1       1        2        0.1264
1         1       1        3        0.1252
1         1       1        4        0.1253
1         1       1        5        0.1263
1         1       1        6        0.1251
1         1       1        7        0.1254
1         1       1        8        0.1239
1         1       1        9        0.1235
1         1       1        10       0.1257
1         1       2        1        0.1271
1         1       2        2        0.1253
1         1       2        3        0.1265
1         1       2        4        0.1254
1         1       2        5        0.1243
1         1       2        6        0.1240
1         1       2        7        0.1246
1         1       2        8        0.1244
1         1       2        9        0.1271
1         1       2        10       0.1241
1         2       1        1        0.1251
1         2       1        2        0.1238
1         2       1        3        0.1255
1         2       1        4        0.1234
1         2       1        5        0.1235
1         2       1        6        0.1266
1         2       1        7        0.1250
1         2       1        8        0.1246
1         2       1        9        0.1243
1         2       1        10       0.1248
1         2       2        1        0.1248
1         2       2        2        0.1235
1         2       2        3        0.1243
1         2       2        4        0.1265
1         2       2        5        0.1270
1         2       2        6        0.1229
1         2       2        7        0.1250
1         2       2        8        0.1248
1         2       2        9        0.1252
1         2       2        10       0.1243
1         3       1        1        0.1255
1         3       1        2        0.1237
1         3       1        3        0.1235
1         3       1        4        0.1264
1         3       1        5        0.1239
1         3       1        6        0.1266
1         3       1        7        0.1242
1         3       1        8        0.1231
1         3       1        9        0.1232
1         3       1        10       0.1244
1         3       2        1        0.1233
1         3       2        2        0.1237
1         3       2        3        0.1244
1         3       2        4        0.1254
1         3       2        5        0.1247
1         3       2        6        0.1254
1         3       2        7        0.1258
1         3       2        8        0.1260
1         3       2        9        0.1235
1         3       2        10       0.1273
2         1       1        1        0.1239
2         1       1        2        0.1239
2         1       1        3        0.1239
2         1       1        4        0.1231
2         1       1        5        0.1221
2         1       1        6        0.1216
2         1       1        7        0.1233
2         1       1        8        0.1228
2         1       1        9        0.1227
2         1       1        10       0.1229
2         1       2        1        0.1220
2         1       2        2        0.1239
2         1       2        3        0.1237
2         1       2        4        0.1216
2         1       2        5        0.1235
2         1       2        6        0.1240
2         1       2        7        0.1224
2         1       2        8        0.1236
2         1       2        9        0.1236
2         1       2        10       0.1217
2         2       1        1        0.1247
2         2       1        2        0.1220
2         2       1        3        0.1218
2         2       1        4        0.1237
2         2       1        5        0.1234
2         2       1        6        0.1229
2         2       1        7        0.1235
2         2       1        8        0.1237
2         2       1        9        0.1224
2         2       1        10       0.1224
2         2       2        1        0.1239
2         2       2        2        0.1226
2         2       2        3        0.1224
2         2       2        4        0.1239
2         2       2        5        0.1237
2         2       2        6        0.1227
2         2       2        7        0.1218
2         2       2        8        0.1220
2         2       2        9        0.1231
2         2       2        10       0.1244
2         3       1        1        0.1219
2         3       1        2        0.1243
2         3       1        3        0.1231
2         3       1        4        0.1223
2         3       1        5        0.1218
2         3       1        6        0.1218
2         3       1        7        0.1225
2         3       1        8        0.1238
2         3       1        9        0.1244
2         3       1        10       0.1236
2         3       2        1        0.1231
2         3       2        2        0.1223
2         3       2        3        0.1241
2         3       2        4        0.1215
2         3       2        5        0.1221
2         3       2        6        0.1236
2         3       2        7        0.1229
2         3       2        8        0.1205
2         3       2        9        0.1241
2         3       2        10       0.1232
3         1       1        1        0.1255
3         1       1        2        0.1215
3         1       1        3        0.1219
3         1       1        4        0.1253
3         1       1        5        0.1232
3         1       1        6        0.1266
3         1       1        7        0.1271
3         1       1        8        0.1209
3         1       1        9        0.1212
3         1       1        10       0.1249
3         1       2        1        0.1228
3         1       2        2        0.1260
3         1       2        3        0.1242
3         1       2        4        0.1236
3         1       2        5        0.1248
3         1       2        6        0.1243
3         1       2        7        0.1260
3         1       2        8        0.1231
3         1       2        9        0.1234
3         1       2        10       0.1246
3         2       1        1        0.1207
3         2       1        2        0.1279
3         2       1        3        0.1268
3         2       1        4        0.1222
3         2       1        5        0.1244
3         2       1        6        0.1225
3         2       1        7        0.1234
3         2       1        8        0.1244
3         2       1        9        0.1207
3         2       1        10       0.1264
3         2       2        1        0.1224
3         2       2        2        0.1254
3         2       2        3        0.1237
3         2       2        4        0.1254
3         2       2        5        0.1269
3         2       2        6        0.1236
3         2       2        7        0.1248
3         2       2        8        0.1253
3         2       2        9        0.1252
3         2       2        10       0.1237
3         3       1        1        0.1217
3         3       1        2        0.1220
3         3       1        3        0.1227
3         3       1        4        0.1202
3         3       1        5        0.1270
3         3       1        6        0.1224
3         3       1        7        0.1219
3         3       1        8        0.1266
3         3       1        9        0.1254
3         3       1        10       0.1258
3         3       2        1        0.1236
3         3       2        2        0.1247
3         3       2        3        0.1240
3         3       2        4        0.1235
3         3       2        5        0.1240
3         3       2        6        0.1217
3         3       2        7        0.1235
3         3       2        8        0.1242
3         3       2        9        0.1247
3         3       2        10       0.1250
Initial Steps
The initial step is to plot box plots of the measured diameter for each of the
explanatory variables.
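A sketch of these box plots in R, assuming the data above have been saved to a file (the name screws.dat is hypothetical) with a single header row MACHINE DAY TIME SAMPLE DIAMETER:

    # Read the machine screw data (hypothetical file name; columns as above)
    screws <- read.table("screws.dat", header = TRUE)

    # Box plots of diameter against each explanatory variable
    par(mfrow = c(2, 2))
    boxplot(DIAMETER ~ MACHINE, data = screws, xlab = "Machine")
    boxplot(DIAMETER ~ DAY,     data = screws, xlab = "Day")
    boxplot(DIAMETER ~ TIME,    data = screws, xlab = "Time of Day")
    boxplot(DIAMETER ~ SAMPLE,  data = screws, xlab = "Sample")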
Conclusions From Box Plot
We can make the following conclusions from this box plot.

1. The location appears to be significantly different for the three machines,
   with machine 2 having the smallest median diameter and machine 1 having the
   largest median diameter.
Conclusions From Box Plot
We can draw the following conclusion from this box plot. Neither the location
nor the spread seems to differ significantly by day.
Conclusion From Box Plot
We can draw the following conclusion from this box plot. Neither the location
nor the spread seems to differ significantly by time of day.
Conclusion From Box Plot
We can draw the following conclusion from this box plot. Although there are
some minor differences in location and spread between the samples, these
differences do not show a noticeable pattern and do not seem significant.
Analysis of Variance Using All Factors
We can confirm our interpretation of the box plots by running an analysis of
variance with all four factors included.

Source            DF    Sum of     Mean       F Statistic   Prob > F
                        Squares    Square
------------------------------------------------------------------
Machine             2   0.000111   0.000055   29.3159       1.3e-11
Day                 2   0.000004   0.000002    0.9884       0.37
Time                1   0.000002   0.000002    1.2478       0.27
Sample              9   0.000009   0.000001    0.5205       0.86
Residual          165   0.000312   0.000002
------------------------------------------------------------------
Corrected Total   179   0.000437   0.000002

The ANOVA output can be presented in two mathematically equivalent forms: with
effect estimates expressed relative to the overall mean, or as cell means
obtained by adding the overall mean to those effect estimates.

Only the machine factor is statistically significant. This confirms what the
box plots in the previous section indicated graphically.
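A sketch of this four-factor ANOVA in R, continuing with the screws data frame. Each explanatory variable must be converted to a factor so that it is treated as categorical rather than numeric.

    # Treat each explanatory variable as a categorical factor
    for (v in c("MACHINE", "DAY", "TIME", "SAMPLE"))
      screws[[v]] <- factor(screws[[v]])

    summary(aov(DIAMETER ~ MACHINE + DAY + TIME + SAMPLE, data = screws))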
Analysis of Variance Using Only Machine
The previous analysis of variance indicated that only the machine factor was
statistically significant. The following table displays the ANOVA results
using only the machine factor.

Source            DF    Sum of     Mean       F Statistic   Prob > F
                        Squares    Square
------------------------------------------------------------------
Machine             2   0.000111   0.000055   30.0094       6.0e-12
Residual          177   0.000327   0.000002
------------------------------------------------------------------
Corrected Total   179   0.000437   0.000002
Interpretation of ANOVA Output
At this stage, we are interested in the level means for the machine variable.
These can be summarized in the following table.

Machine Means for One-way ANOVA
Level   Number   Mean       Standard Error   Lower 95% CI   Upper 95% CI
1       60       0.124887   0.00018          0.12454        0.12523
2       60       0.122968   0.00018          0.12262        0.12331
3       60       0.124022   0.00018          0.12368        0.12437
Model Validation
As a final step, we validate the model by generating a 4-plot of the residuals.

The 4-plot does not indicate any significant problems with the ANOVA model.
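Dataplot's 4-plot combines a run-order plot, a lag plot, a histogram, and a normal probability plot of the residuals. A rough base-R equivalent for the machine-only model, offered as a sketch rather than a reproduction of the Dataplot graphic:

    fit <- aov(DIAMETER ~ MACHINE, data = screws)   # machine-only model
    r   <- residuals(fit)

    par(mfrow = c(2, 2))
    plot(r, type = "l", ylab = "Residual")          # run-order plot
    plot(head(r, -1), tail(r, -1),
         xlab = "Residual i", ylab = "Residual i+1")  # lag plot
    hist(r, main = "", xlab = "Residual")
    qqnorm(r)
    qqline(r)                                       # normal probability plot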
3.5.2.4. Throughput
Summary of Throughput
The throughput is summarized in the following table (this was part of the
original data collection, not the result of analysis).

Machine   Day 1   Day 2   Day 3
1         576     604     583
2         657     604     586
3         510     546     571
Analysis of Variance for Throughput
We can assess the statistical significance of the lower throughput of machine 3
by running an analysis of variance. The machine effect is significant at the
0.10 level (p = 0.0547), although not at the 0.05 level.

Source            DF   Sum of      Mean      F Statistic   Prob > F
                       Squares     Square
-------------------------------------------------------------------
Machine            2    8216.89    4108.45   4.9007        0.0547
Residual           6    5030.00     838.33
-------------------------------------------------------------------
Corrected Total    8   13246.89    1655.86
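A sketch of this one-way ANOVA in R, using the nine daily throughput values from the summary table above:

    throughput <- c(576, 604, 583,       # machine 1, days 1-3
                    657, 604, 586,       # machine 2, days 1-3
                    510, 546, 571)       # machine 3, days 1-3
    machine <- factor(rep(1:3, each = 3))

    summary(aov(throughput ~ machine))   # F = 4.9007, p = 0.0547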
Interpretation of ANOVA Output
We summarize the throughput level means for the three machines in the
following table.

Throughput Level Means for One-way ANOVA
Level   Number   Mean      Standard Error   Lower 95% CI   Upper 95% CI
1       3        587.667   16.717           546.76         628.57
2       3        615.667   16.717           574.76         656.57
3       3        542.333   16.717           501.43         583.24
Final Conclusions
The analysis shows that machines 1 and 2 had about the same variability but
significantly different locations. The throughput for machine 2 was also
higher, with greater variability, than for machine 1. An interview with the
operator revealed that he realized the second machine was not set correctly.
However, he did not want to change the settings because he knew a study was
being conducted and was afraid he might impact the results by making changes.
Machine 3 had significantly more variation and lower throughput. The operator
indicated that the machine had to be taken down several times for minor
repairs. Given the preceding analysis results, the team recommended replacing
machine 3.
View Dataplot Macro for this Case Study
This page allows you to repeat the analysis outlined in the case study
description on the previous page using Dataplot, if you have downloaded and
installed it. Output from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the Output window, the
Graphics window, the Command History window, and the Data Sheet window. Across
the top of the main windows there are menus for executing Dataplot commands.
Across the bottom is a command entry window where commands can be typed in.
Data Analysis Steps / Results and Conclusions

3. Analysis of Variance

   1. Perform an analysis of variance with all factors.
      The analysis of variance shows that only the machine factor is
      statistically significant.

   2. Perform an analysis of variance with only the machine factor.
      The analysis of variance shows the overall mean and the effect
      estimates for the levels of the machine variable.

   3. Perform model validation by generating a 4-plot of the residuals.
      The 4-plot of the residuals does not indicate any significant problems
      with the model.

4. Graph of Throughput

   1. Generate a graph of the throughput.
      The graph shows the throughput for machine 3 is lower than the other
      machines.

   2. Perform an analysis of variance of the throughput.
      The effect estimates from the ANOVA are given.
3.6. References
Box, G.E.P., Hunter, W.G., and Hunter, J.S. (1978), Statistics for
Experimenters, John Wiley and Sons, New York.
Hoaglin, D.C., Mosteller, F., and Tukey, J.W. (1985), Exploring Data
Tables, Trends, and Shapes, John Wiley and Sons, New York.