DOE Language and Concepts
4.1 INTRODUCTION
Like any highly technical discipline, DOE is rife with acronyms and language that may
intimidate the novice. The purpose of this chapter is to provide an introduction to the
many terms, concepts, and administrative issues of DOE, hopefully in a nonintimidat-
ing manner. Don’t expect to understand all of the aspects of DOE presented in this chap-
ter the first time you read it. Many of the nuances only become apparent after years of
experience and study. But it doesn’t take years to demonstrate success with DOE. After
reading this chapter you should have sufficient understanding of DOE language and
concepts to proceed to the consideration of designed experiments and their applications.
marketing, agriculture, and so on. Most technical publications in any discipline expect that
some form of designed experiment will be used to structure any experimental investiga-
tion. Researchers who don’t use DOE methods don’t get published.
The popularity of DOE is due to its tremendous power and efficiency. When used
correctly, DOE can provide the answers to specific questions about the behavior of a
system, using an optimum number of experimental observations. Since designed exper-
iments are structured to answer specific questions with statistical rigor, experiments
with too few observations won’t deliver the desired confidence in the results and exper-
iments with too many observations will waste resources. DOE gives the answers that
we seek with a minimum expenditure of time and resources.
Figure 4.2 Cause-and-effect diagram for an epoxy process used to bond materials together (inputs include hardener, filler, humidity, and cleanliness; responses include shear strength, durability, and sandability).
The process inputs are shown on the left and the responses
are shown on the right. Note that this diagram is just an elaboration of the simple
process model in Figure 4.1. MINITAB has the ability to create traditional cause-and-
effect diagrams (the left side of Figure 4.2) from the Stat> Quality Tools> Cause and
Effect menu.
It is very important to keep a cause-and-effect diagram for an experiment. The
cause-and-effect diagram:
• Provides a convenient place to collect ideas for new variables.
• Serves as a quick reference tool when things go wrong and quick decisions
are necessary.
• Summarizes all of the variable considerations made over the life of the
experiment.
• Provides an excellent source of information for planning new experiments.
• Quickly impresses managers with the complexity of the problem.
The cause-and-effect diagram should be updated religiously as new variables and
responses are identified.
Figure 4.3 Classification of process variables: intentionally varied, fixed, or uncontrolled, where uncontrolled variables are either measurable or not measurable.
Every process has a multitude of variables and all of them play a role in an exper-
iment. Some variables are intentionally varied by the experimenter to determine their
effect on the responses, others are held constant to ensure that they do not affect the
responses, and some variables are ignored with the hope that they will have little or no
effect on the responses. If the experiment design is a good one, if it is carried out care-
fully, and if the assumptions regarding uncontrolled variables are met, then the experi-
menter may learn something about the problem. This classification of process variables
is summarized in Figure 4.3. A cause-and-effect diagram can be augmented by using
different colored highlighters to classify variables into intentionally varied, fixed, and
uncontrolled classes.
Example 4.1
An experiment was designed to study several design variables for a one-time-use
chemical indicator. Each indicator contains six strips of paper that are reactive to a
specific chemical. In the presence of the chemical in the form of a vapor, the strips
quickly turn from white to black. The six strips in the indicator are arranged in series
along a tortuous diffusion path so that each strip changes color in succession under
continuous exposure to the chemical. The purpose of the experiment was to determine
the geometry of the tortuous path so that: 1) all six of the strips would change color
when the indicator was exposed to a known high concentration of the chemical for a
specified time and 2) none of the strips would change color when the indicator was
exposed to a known low concentration of the chemical for a specified time. In the orig-
inal concept of the experiment, sample indicators would be run under the two test con-
ditions and judged to either pass or fail the relevant test; however, the sample size for
this experiment was found to be prohibitively large. To solve this problem, the response
was modified from the original pass/fail binary response to a pseudo-continuous mea-
surement response. A series of 12 indicators was exposed to the chemical for increas-
ing and evenly spaced periods of time to create a 12-step pseudo-quantitative scale that
spanned the full range of the response from all white strips to all black strips. These
12 indicators were used as the standard against which experimental indicators were
compared to determine their degree of color change. With this new and improved
measurement scale, the number of runs required to study the tortuous path variables
was significantly reduced.
Most experiments are performed for the purpose of learning about a single
response; however, multiple responses can be considered. For example, in the epoxy
performance problem, suppose that the primary goal of the experiment is to increase the
strength of the cured epoxy without compromising the pot life, viscosity, or sandability
characteristics. This will require that all of the responses are recorded during the exe-
cution of the experiment and that models be fitted for each response. An acceptable
solution to the problem must satisfy all of these requirements simultaneously. This type
of problem is common and is completely within the capability of the DOE method.
4.7 INTERACTIONS
When a process contains two or more variables, it is possible that some variables will
interact with each other. An interaction exists between variables when the effect of one
variable on the response depends on the level of another variable. Interactions can occur
between two, three, or more variables but three-variable and higher-order interactions
are usually assumed to be insignificant. This is generally a safe assumption although
there are certain systems where higher-order interactions are important.
With practice, two-factor interactions between variables can often be identified by
simple graphs of the experimental response plotted as a function of the two involved
variables. These plots usually show the response plotted as a function of one of the vari-
ables with the levels of the other variable distinguished by different types of lines or
symbols. Multi-vari charts are also useful for identifying variable interactions.
The management of interactions between variables is a strength of the DOE method
and a weakness of the one-variable-at-a-time (OVAT) method. Whereas DOE recognizes
and quantifies variable interactions so that they can be used to understand and better
manage the response, the OVAT method ignores the interactions and so it will fail in
certain cases when the effects of those interactions are relatively large. DOE’s success
comes from its consideration of all possible combinations of variable levels. OVAT fails
because it relies on a simple but flawed algorithm to determine how the variables affect
the response. In some cases, OVAT will obtain the same result as DOE, but in many
other cases its result will be inferior.
Example 4.2
Two variables, A and B, can both be set to two states indicated by –1 and +1.
Figures 4.4a and b show how the responses Y1 and Y2 depend on A and B. Use these
figures to determine if there is an interaction between the two variables and to demon-
strate how DOE is superior to OVAT if the goal is to maximize the responses.
Solution: In Figure 4.4a, the line segments that connect the two levels of B are sub-
stantially parallel, which indicates that the levels of B do not cause a change in how A
affects the response, so there is probably no interaction between A and B in this case.
In Figure 4.4b, the line segments that connect levels of B diverge, which indicates that
the chosen level of B determines how A affects the response, so there is probably an
interaction between A and B.
Figure 4.4 Two-variable examples without interaction (a) and with interaction (b).
The weakness of the OVAT method is that it follows a limited decision path through
the design space that may or may not lead to the optimal solution. In an experiment to
study the situations in Figures 4.4a and b, if the starting point in the OVAT process is
point (1) in both cases, and the first step in the experiment is to investigate the effect
of variable A, followed by a second step to investigate B, then the desired maximal
response is obtained in Figure 4.4a but not in b. In Figure 4.4a, where there is no
interaction between variables A and B, the optimal solution is obtained regardless of
which variable is studied first. But in Figure 4.4b, where there is an interaction
between A and B, the maximal solution is obtained only if those variables are studied
in the right order. By contrast, the DOE method investigates all four possible configu-
rations in the design space so it is guaranteed to find the maximal solution whether or
not A and B interact.
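The parallel-lines check in Example 4.2 can also be done numerically. The Python sketch below uses hypothetical response values chosen to resemble Figure 4.4 (they are not data from the text) and computes the main effects and the interaction effect from the four corner runs of a two-variable, two-level experiment; a near-zero interaction effect corresponds to parallel line segments.

```python
# Effects from a 2^2 full factorial: responses at the four (A, B) corners.
# Argument naming: y_mp is the response at A = -1 (m) and B = +1 (p), etc.
def effects(y_mm, y_pm, y_mp, y_pp):
    effect_a = ((y_pm + y_pp) - (y_mm + y_mp)) / 2      # average change as A goes - to +
    effect_b = ((y_mp + y_pp) - (y_mm + y_pm)) / 2      # average change as B goes - to +
    interaction = ((y_pp - y_mp) - (y_pm - y_mm)) / 2   # half the difference in A's effect
    return effect_a, effect_b, interaction              # at B = +1 versus B = -1

# Parallel segments (no interaction): A's effect is the same at both levels of B.
print(effects(30, 60, 90, 120))   # interaction term is 0
# Diverging segments (interaction): A's effect depends on the level of B.
print(effects(30, 40, 60, 120))   # interaction term is nonzero
```

The interaction term is exactly half the difference between A's effect at the two levels of B, so parallel segments give zero and diverging segments do not.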
variable that covers a range of values, then the model will consist of an equation that
relates the response to the quantitative predictor. Experiments that involve qualitative
predictors are usually analyzed by analysis of variance (ANOVA). Experiments that
involve quantitative predictors are usually analyzed by regression. Experiments that com-
bine both qualitative and quantitative variables are analyzed using a special regression
model called a general linear model.
Any model must be accompanied by a corresponding description of the errors or dis-
crepancies between the observed and predicted values. These quantities are related by:
yi = ŷi + εi    (4.1)
where yi represents the ith observed value of the response, ŷi represents the corresponding
predicted value from the model, and εi represents the difference between them. The
εi are usually called the residuals. In general, the relationship between the data, model,
and error can be expressed as:
data = model + error    (4.2)
At a minimum, the error statement must include a description of the shape or dis-
tribution of the residuals and a summary measure of their variation. The amount of error
or residual variation is usually reported as a standard deviation called the standard
error of the model indicated with the symbol ŝe or se . When the response is measured
under several different conditions or treatments, which is the usual case in a designed
experiment, then it may be necessary to describe the shape and size of the errors under
each condition.
Most of the statistical analysis techniques that we will use to analyze designed
experiments demand that the errors meet some very specific requirements. The most
common methods that we will use in this book, regression for quantitative predictors
and ANOVA for qualitative predictors, require that the distribution of the errors is
normal in shape with constant standard deviation under all experimental conditions.
When the latter requirement is satisfied, we say that the distribution of the errors is
homoscedastic. A complete error statement for a situation that is to be analyzed by
regression or ANOVA is, “The distribution of errors is normal and homoscedastic with
standard error equal to se ,” where se is some numerical value. If the distribution of errors
does not meet the normality and homoscedasticity requirements, then the models
obtained by regression and ANOVA may be incorrect.* Consequently, it is very impor-
tant to check assumptions about the behavior of the error distribution before accepting
a model. When these conditions are not met, special methods may be required to ana-
lyze the data.
* When the standard deviations of the errors are different under different conditions (for example, treatments) we say
that the error distributions are heteroscedastic.
Figure 4.5 Distributions of the observations in Examples 4.3, 4.4, and 4.5: (a) a single sample, (b) samples from lots A, B, and C, and (c) samples made at temperatures of 20, 25, and 30C.
Example 4.3
A manufacturer wants to study one of the critical quality characteristics of his
process. He draws a random sample of n = 12 units from a production lot and measures
them, obtaining the distribution of parts shown in Figure 4.5a. The mean of the sample
is x̄ = 140 and the standard deviation is s = 10. A normal plot of the sample data (not
shown) indicates that the observations are normally distributed. From this information,
identify the data, model, and the error statement.
Solution: The data values are the n = 12 observations, which can be indicated with
the symbol yi where i = 1, 2, . . . , 12. The model is the one number, ŷi = ȳ = 140, that
best represents all of the observations. The error values are given by εi = yi − ȳ and are
known to be normally distributed with standard deviation se = 10. These definitions
permit Equation 4.2 to be written:
yi = ȳ + εi    (4.3)
Example 4.4
The manufacturer in Example 4.3 presents his data and analysis to his engineer-
ing staff and someone comments that there is considerable lot-to-lot variation in the
product. To test this claim, he randomly samples n = 12 units from three different lots.
The data are shown in Figure 4.5b. The three lot means are ȳA = 126, ȳB = 165, and
ȳC = 123 and the standard deviations are all about se = 10. Normal plots of the errors
indicate that each lot is approximately normally distributed. From this information,
identify the data, model, and the error statement.
Solution: The data are the n = 12 observations drawn from the k = 3 lots indicated
by the symbol yij where i indicates the lot (A, B, or C) and j indicates the observation
(1 to 12) within a lot. The model consists of the three means ȳA = 126, ȳB = 165, and
ȳC = 123. The error statement is, “The errors are normally distributed with constant
standard deviation se ≈ 10.”
Example 4.5
After reviewing the data and analysis described in Example 4.4, someone realizes
that the part temperatures were different for the three lots at a critical point in the
process. They decide to run an experiment by making parts at different temperatures.
n = 12 parts were made at 20, 25, and 30C in completely randomized order and the data
are shown in Figure 4.5c. They use linear regression to fit a line to the data and obtain
y = 60 + 3T where T is the temperature. The errors calculated from the difference
between the observed values and the predicted values (that is, the fitted line) are
approximately normal and have se ≈ 10. From this information, identify the data,
model, and the error statement.
Solution: The data are the 36 sets of paired (Ti , yi) observations. The model is given
by the line y = 60 + 3T. The error statement is, “The errors are normally distributed with
constant standard deviation se ≈ 10.”
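The calculations behind Example 4.5 can be sketched in a few lines. The Python fragment below simulates data from the assumed line y = 60 + 3T with normally distributed errors (the data are simulated, not the book's), fits the line by ordinary least squares, and estimates the standard error of the model.

```python
import random

# Simulated stand-in for Example 4.5: 12 parts at each of T = 20, 25, 30,
# generated from the assumed true line y = 60 + 3T plus normal(0, 10) noise.
random.seed(1)
data = [(T, 60 + 3 * T + random.gauss(0, 10)) for T in (20, 25, 30) for _ in range(12)]

# Ordinary least squares for y = b0 + b1*T, written out longhand.
n = len(data)
t_bar = sum(t for t, _ in data) / n
y_bar = sum(y for _, y in data) / n
sxx = sum((t - t_bar) ** 2 for t, _ in data)
sxy = sum((t - t_bar) * (y - y_bar) for t, y in data)
b1 = sxy / sxx
b0 = y_bar - b1 * t_bar

# Standard error of the model: residual standard deviation with n - 2
# degrees of freedom because two coefficients were estimated.
sse = sum((y - (b0 + b1 * t)) ** 2 for t, y in data)
se = (sse / (n - 2)) ** 0.5
print(f"y-hat = {b0:.1f} + {b1:.2f}*T, se = {se:.1f}")
```

Because the noise standard deviation was set to 10, the fitted coefficients land near 60 and 3 and the estimated se lands near 10, echoing the error statement of the example.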
A first-principles model is always preferred over an empirical model, even if the empirical model provides a
slightly better fit to the data.
Example 4.6
An experiment is performed to study the pressure of a fixed mass of gas as a func-
tion of the gas volume and temperature. Describe empirical and first-principles models
that might be fitted to the data.
Solution: In the absence of any knowledge of the form of the relationship between
the gas pressure (P) and its volume (V) and temperature (T), an empirical model of the
form:
P = a + bV + cT + dVT (4.4)
might be attempted where a, b, c, and d are coefficients to be determined from the data.
For the first-principles model, the kinetic theory of gases suggests that an appropriate
model would be:
P = aT/V    (4.5)
where a is a coefficient to be determined from the data. Although both models might
fit the data equally well, the second model would be preferred because it is suggested by
the theoretical relationship between P, T, and V.
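Fitting the first-principles model of Equation 4.5 is straightforward because it is linear in its single coefficient: with x = T/V, least squares gives a = Σ(Px)/Σ(x²) in closed form. The Python sketch below uses simulated data with a hypothetical coefficient value, not measurements from the text.

```python
# Least-squares fit of the first-principles model P = aT/V (Equation 4.5).
# With x = T/V the model is P = a*x, so the estimate has a closed form.
# The data are simulated from a hypothetical a_true, noise-free for clarity.
a_true = 8.3
data = []
for T in (280, 300, 320):
    for V in (10.0, 20.0, 40.0):
        x = T / V
        data.append((x, a_true * x))

a_hat = sum(p * x for x, p in data) / sum(x * x for x, _ in data)
print(a_hat)  # recovers a_true
```

With real, noisy data the same formula gives the least-squares estimate of a, and the residuals around the fitted curve supply the error statement.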
So why are we so concerned about models? What’s the purpose for building them
in the first place? These questions are also asking about our motivation for doing
designed experiments. The purpose of any designed experiment and its corresponding
model is to relate the response to its predictors so that the response can be optimized by
better management of the predictors. Some of the reasons to build a model are:
• To determine how to maximize, minimize, or set the response to some
target value.
• To learn how to decrease variation in the response.
• To identify which predictor variables are most important.
• To quantify the contribution of predictor variables to the response.
• To learn about interactions between predictor variables.
• To improve the operation of a process by learning how to control it better.
• To simplify complex operating procedures by focusing attention on the most
important variables and by taking advantage of previously unrecognized
relationships between predictor variables and between them and the response.
nerve-wracking task and it’s very important to include the experts, operators, and man-
agers who are knowledgeable of and responsible for the process because they are the
ones most likely to offer valuable guidance.
To emphasize the importance of picking the highest and lowest levels of quantita-
tive variables in an experiment, suppose that there are five variables in an experiment.
If the safe range of operation for each variable is known, but only one half of that range
is used just to be really safe, then the five-variable experiment will only cover (1/2)^5 ≈
0.031 or about three percent of the possible design space. The chances of finding a good
design are significantly reduced by using too narrow a range for the variables. This is a
very common mistake made by novice experimenters.
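The design-space arithmetic above is easy to verify for any number of variables:

```python
# Fraction of the feasible design space covered when each of k variables
# uses only a fraction f of its safe range.
def coverage(f, k):
    return f ** k

print(coverage(0.5, 5))  # 0.03125, about three percent
```

The coverage shrinks geometrically with the number of variables, which is why overly cautious level choices are so costly in larger experiments.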
When three or more levels of a quantitative variable are used in an experiment,
there are several ways to choose the spacing between levels. The most common choice
for a three-level quantitative variable is to use three equally spaced levels, often denoted
with the coded values –1, 0, and +1 or just –, 0, and +. For example, if the batch-size
variable in the epoxy example uses three levels of 50, 100, and 150cc, then the levels
are referred to using the codes –1, 0, and +1, respectively.
When the increment between levels is constant, we say that we have a linear scale for
that variable. It is also possible to design level selections using other schemes. For exam-
ple, levels can be selected on the basis of squares (for example, 1, 4, 9) or on a log scale
(for example, 3, 6, 12). In each case, the three levels are still referenced with the codes –1,
0, and +1. The use of special scaling for levels is usually based on the experimenter’s
understanding of the response and its expected dependence on the study variable.
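The conversion between actual and coded levels can be sketched as follows. The log-scale handling shown is the usual trick of coding the logarithms of the levels; the function names are illustrative, not from the text.

```python
import math

# Convert between actual and coded levels for a quantitative variable,
# using the batch-size example: levels 50, 100, 150 cc map to -1, 0, +1.
def to_coded(x, low, high):
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (x - center) / half_range

def to_actual(c, low, high):
    return (low + high) / 2 + c * (high - low) / 2

print([to_coded(x, 50, 150) for x in (50, 100, 150)])  # [-1.0, 0.0, 1.0]

# For a log-scaled variable (for example, levels 3, 6, 12), apply the same
# formula to log(x); the geometric middle level then codes to 0.
print(to_coded(math.log(6), math.log(3), math.log(12)))  # approximately 0
```

The same two functions are all that is needed to translate a design written in coded units into actual process settings and back.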
the same identification number. It would be just as appropriate, and perhaps clearer, to
identify the lots as 1, 2, and 3 for manufacturer A and 4, 5, and 6 for manufacturer B.
Regardless of how the lots are numbered, they are nested within manufacturers.
4.12 COVARIATES
Figure 4.3 shows that all of the variables in an experiment can be classified as inten-
tionally controlled at desired levels, held constant, or uncontrolled. An uncontrolled
quantitative variable that can be measured during the experiment is called a covariate.
Common covariates are variables like temperature, atmospheric pressure, humidity, and
line voltage. If the covariate has no influence on the response then it is not of any con-
sequence, but in many cases it is unclear if the covariate is important or not. All known
variables that are uncontrolled during the experiment are covariates and should be mea-
sured and recorded. Then when the statistical analysis of the experimental data is per-
formed, the effect of these covariates can be removed from the response. Generally, a
covariate should have a very small, if not unmeasurable, effect on the response.
If the effect of a covariate becomes too large it can interfere with estimates of the effects
of other variables.
Covariates must be continuous (that is, quantitative) variables. They are always ana-
lyzed using regression methods. For this reason, the word covariate is also used to refer
to quantitative variables that are intentionally varied in the experiment, even if they only
appear at two discrete levels, because they are also analyzed with regression methods.
The purpose of this matrix is to clearly define the experimental variables and their
levels. Note the use of the generic variable names x1, x2, and x3. Their use permits ref-
erences to variables without knowing their names or the context. (Sometimes the letters
A, B, and C are used instead of x1, x2, and x3. Some people prefer to use x1, x2, . . . to
indicate quantitative variables and A, B, . . . to indicate qualitative variables but there’s
no standardized convention for assigning generic names to variables.)
Now, using the – and + notation, an experiment design is shown in the design
matrix:
Std Run x1 x2 x3
1 4 – – –
2 6 – – +
3 2 – + –
4 7 – + +
5 8 + – –
6 1 + – +
7 3 + + –
8 5 + + +
This experiment has eight runs. The Std or standard order column uses an integer to
identify each unique configuration of x1, x2, and x3. Each row, called a run or a cell of
the experiment, defines a different set of conditions for the preparation of an epoxy
sample. For example, run number 3 is to be made with levels (x1, x2, x3) = (–, +, –) or
with a 50cc batch size, Manufacturer B’s resin, and a one minute mixing time. This par-
ticular design is called a 2^3 full factorial design because there are three variables, each
at two levels, and the experiment requires 2^3 = 8 runs.
The standard order column identifies the logical order of the experimental runs. The
actual runs of the experiment must not be performed in this order because of the possi-
bility of confusing one of the study variables with a lurking variable, that is, an uncon-
trolled and unobserved variable that changes during the experiment and might affect the
response. The order of the experimental runs is always randomized, such as the order
shown in the Run or run order column. Randomization doesn’t provide perfect protec-
tion against lurking variables, but it is often effective so we always randomize. Randomi-
zation is so important that it is considered to be part of the experiment design—any
design is incomplete if a suitable randomization plan has not been identified.
The matrix of experimental runs is often organized by standard order in the plan-
ning and analysis stages of DOE, but is best organized by random run order when the
experiment is being constructed. This simplifies the job of the person who actually has
to build the experiment, and decreases the chances of making a mistake.
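The construction steps just described can be sketched in Python: build the 2^3 design in standard order, then attach a random run order. The seed is fixed only to make the sketch reproducible; a real experiment would use a fresh randomization each time.

```python
import itertools
import random

# The 2^3 full factorial in standard order: x3 varies fastest and x1 slowest,
# matching the standard-order column of the design matrix above.
standard_order = list(itertools.product((-1, +1), repeat=3))  # 8 rows of (x1, x2, x3)

# Assign a random run order to the eight cells.
random.seed(42)
run_order = list(range(1, len(standard_order) + 1))
random.shuffle(run_order)

for std, (run, row) in enumerate(zip(run_order, standard_order), start=1):
    print(f"Std {std}  Run {run}  {row}")
```

Sorting the rows by the run column, as the text recommends, produces the worksheet in the order the experiment will actually be built.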
Figure 4.6 Pareto chart of the approximate percentage of use of common experiment designs: one-way classification, 2^2, 2^3, two-way classification, 2^4, BB(3), 2^(4–1)IV, 2^(5–1)V, 2^(7–4)III, CC(5–1), and Other.
composite, Box-Behnken, and Plackett-Burman. There are also hybrid designs which
combine characteristics from two or more of these groups. But as complicated as all of
this sounds, only a handful of designs are used for the majority of experiments. You’d
never guess this from looking at a DOE textbook. The books are always full of all kinds
of big elaborate experiment designs because those are the fun ones for the authors to
talk and write about. Figure 4.6 is a Pareto chart that attempts to convey how often spe-
cific designs get built. (The data are fictional and definitely change from experimenter
to experimenter, and technology to technology, but you get the idea.) The figure shows
that there are less than a dozen designs that account for nearly all of the experiments that
get built, and all of the remaining cases fall into the “Other” category. This book will
attempt to focus on the “vital few” designs.
4.15 RANDOMIZATION
It is usually impossible to construct all of the runs of an experiment simultaneously, so
runs are typically made one after the other. Since uncontrolled experimental conditions
could change from run to run, the influence of the order of the runs must be considered.
Even a simple experiment with one variable at two levels would be easiest to build
if all of its runs were done in a convenient order (for example, 11112222); however,
such run order plans run the risk of mistakenly attributing the effect of an unobserved
variable that changes during the experiment to the experimental variable. The accepted
method of protecting against this risk is to randomize the run order of the levels of the
experimental variable (for example, 21121221). By randomizing, the effects of any
unobserved systematic changes in the process unrelated to the experimental variable are
uniformly and randomly distributed over all of the levels of the experimental variable.
This inflates the error variability observed within experimental treatments, but it does
not add bias to the real and interesting differences between treatments. As an example
of this concept, an experiment with a single classification variable with several levels
that are run in random order is called a completely randomized design. Completely ran-
domized designs will be presented in detail in Chapter 5.
Sometimes you must include a variable in an experiment even though you’re not
interested in detecting or making claims about differences between its levels. For
example, an experiment to compare several operators might require so many parts
that raw material for the parts must come from several different raw material lots. If
the lot-to-lot differences are not important, then the experiment could be run using
one lot at a time, one after the other. To be able to make valid claims about differences
between operators, each operator would have to make parts using material from each
lot and the operator order would have to be randomized within lots. For example, if
there were four operators and three material lots, the following run order plan might
be considered:
Run Order 1 2 3 4 5 6 7 8 9 10 11 12
Lot A A A A B B B B C C C C
Operator 2 1 3 4 4 3 2 1 2 4 3 1
In this experiment, called a randomized block design, the raw material lots define
blocks of runs, and operator is the study variable. Because the blocks (that is, the raw
material lots) are not run in random order, it is not safe to interpret any observed dif-
ferences between them because the differences could be due to unobserved variables
that change during the experiment. Since the operators are run in random order, it is safe
to interpret differences between them as being real differences between operators. Even
though the randomized block design has two variables, it is considered to be a one-variable
experiment because claims can be made about only one of the two variables—always
the study variable, which must be run in random order. Despite the loss of information
about differences between the levels of the blocking variable, the use of blocking often
increases the sensitivity of the experiment to differences between the levels of the study
variable.
To summarize:
• If you intend to make claims about differences between the treatment levels of
a variable, then the run order of the treatment levels must be randomized.
• If a variable must be included in an experiment but you don’t intend to make
claims about differences between its levels, then the levels do not need to be
randomized. Instead, the levels of the variable are used to define blocks of
experimental runs.
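The blocking scheme summarized above can be sketched as a simple run-order generator. The Python fragment below follows the earlier four-operator, three-lot example: the lots define blocks run in a convenient order, and the operators (the study variable) are randomized within each block. The seed is fixed only for reproducibility.

```python
import random

# Run-order plan for a randomized block design: operators (the study
# variable) are randomized within each raw-material lot (the blocks).
random.seed(7)
operators = [1, 2, 3, 4]
lots = ["A", "B", "C"]

plan = []
for lot in lots:              # blocks are run in a convenient, unrandomized order...
    order = operators[:]
    random.shuffle(order)     # ...but the study variable is randomized within each block
    plan.extend((lot, op) for op in order)

for run, (lot, op) in enumerate(plan, start=1):
    print(f"Run {run}: Lot {lot}, Operator {op}")
```

Because each operator appears exactly once within every block, operator differences can be interpreted safely even though the lots themselves were not randomized.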
Example 4.7
A powder drying process introduces contamination that forms insoluble particles
in the dried powder. When the powder is dissolved, these particles eventually
clog a critical filter. An alternative drying schedule is proposed that should reduce the
amount of contamination. Describe a run order plan for an experiment to compare
the amount of insoluble particles formed in the two processes. A sample-size calcula-
tion based on historical process information indicates that 10 observations will be
required from each drying process.
Solution: The simplest way to run the experiment would be to configure the drying
system for the first process and then to complete all 10 trials before reconfiguring
the system for the second process and its 10 trials. The run order would be
11111111112222222222. However, due to possible changes in raw material, tempera-
ture, humidity, concentration of the contaminant, the measurement process, and so on,
during the experiment, the 20 experimental trials should be performed in random order,
such as: 22122121211221111212. Then if one or more unobserved variables do
change and have an effect on the response, these effects will be randomly but uniformly
applied to both treatments and not affect the true difference between the treatments.
Example 4.8
Suppose that the three-variable epoxy experiment described in Section 4.13 was
built in the standard run order indicated in the table and that a significant effect due to
x1 was detected. What can you conclude about the effect of x1 on the response from this
experiment?
Solution: Because the experimental runs were not performed in random order,
there is a chance that the observed effect that appears to be caused by x1 is really due
to an unobserved variable. No safe conclusion can be drawn from this experiment about
the effect of x1 on the response. Although the experiment design is a good one, its use-
fulness is compromised by the failure to randomize the run order.
Example 4.9*
A student performed a science fair project to study the distance that golf balls trav-
eled as a function of golf ball temperature. To standardize the process of hitting the golf
balls, he built a machine to hit balls using a five iron, a clay pigeon launcher, a piece
of plywood, two sawhorses, and some duct tape. The experiment was performed using
three sets of six Maxfli golf balls. One set of golf balls was placed in hot water held at
66C for 10 minutes just before they were hit, another set was stored in a freezer at –12C
overnight, and the last set was held at ambient temperature (23C). The distances in
yards that the golf balls traveled are shown in Table 4.1, but the order used to collect
the observations was not reported. Create dotplots of the data and interpret the differ-
ences between the three treatment means assuming that the order of the observations
* Source: “The Effect of Temperature on the Flight of Golf Balls.” John Swang. The National Student Research
Center. Used with permission.
Figure 4.7 Dotplots of golf ball flight distance in yards for the hot, cold, and normal temperature treatments.
was random. How does your interpretation change if the observations were collected in
the order shown—all of the hot trials, all of the cold trials, and finally all of the ambi-
ent temperature trials?
Solution: Dotplots for the distances versus the three golf ball temperature treat-
ments are shown in Figure 4.7. The dotplots and Tukey’s quick test suggest that the
treatment means are all different from each other and that the balls at ambient temper-
ature traveled the farthest. If the order of the observations was indeed random, then this
conclusion is probably justified; if, however, the observations were taken in the order
shown: hot, cold, then normal temperature, the steady increase in the distance suggests
that something else might have been changing during the experiment that caused the
golf balls to travel farther on later trials. If the run order was not randomized then it’s
not safe to conclude from these data that golf balls are sensitive to temperature. Given
the lack of information about the run order, this experiment cannot be used to support
claims about the effect of temperature on golf ball flight distance.
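Tukey's quick test, cited in the solution, is simple enough to sketch in code. The Python fragment below uses hypothetical distances as stand-ins (Table 4.1 is not reproduced in this excerpt), and the function name is an illustration; the count of 7 is the usual rough critical value for a significance level of about 0.05.

```python
def tukey_quick_count(a, b):
    """Tukey's quick test for two samples: if one sample contains the overall
    maximum and the other the overall minimum, count the end values that do
    not overlap the other sample. A count of 7 or more is significant at
    roughly the 0.05 level (10 for 0.01, 13 for 0.001)."""
    if max(a) >= max(b) and min(a) >= min(b):
        hi, lo = a, b
    elif max(b) >= max(a) and min(b) >= min(a):
        hi, lo = b, a
    else:
        return 0  # each sample holds one extreme; the test does not apply
    return sum(1 for x in hi if x > max(lo)) + sum(1 for x in lo if x < min(hi))

# Hypothetical distances (yards) for two of the three treatments:
normal = [37.2, 36.8, 37.9, 38.1, 36.5, 37.4]
cold = [32.1, 31.8, 33.0, 32.5, 31.4, 32.8]

print(tukey_quick_count(normal, cold))  # 12: well above 7
```

As the solution emphasizes, even a decisive count like this is only trustworthy if the run order was randomized.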
There are low-tech and high-tech ways to determine the random order of runs for
an experiment. When the experiment is presented in its logical or standard order, each
cell is assigned a number indicating its position in the experiment: 1, 2, 3, and so on.
Low-tech randomization can be done by writing those numbers on slips of paper, one
number per slip, and pulling them from a hat to determine the random run order. Decks
of cards and dice can also be used.
High-tech randomization uses a computer with random number generating capabil-
ity to assign the run order for the cells. MINITAB can be used to create a random run
order for an experiment with some of the tools from its Calc> Random Data menu. For
example, the Calc> Random Data> Sample from Columns function can be used to
sample, without replacement, from the column showing the experiment’s standard order
into a new column for the random order. For convenience, the experiment design work-
sheet should then be sorted (use Data> Sort) by the random order column so that the
runs are shown in the correct randomized order.
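For readers working outside MINITAB, the same sampling-without-replacement idea takes only a few lines of Python; the eight-run design size below is just an example.

```python
import random

# Standard-order run numbers for an eight-run experiment.
standard_order = list(range(1, 9))

# Sampling without replacement is the software analog of drawing numbered
# slips of paper from a hat (MINITAB: Calc > Random Data > Sample From Columns).
run_order = random.sample(standard_order, k=len(standard_order))
print(run_order)
```

Sorting the design worksheet by this new column (as with Data &gt; Sort in MINITAB) then puts the runs in their randomized build order.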
It’s important to validate your randomization plan before beginning an experiment.
This can be done by analyzing your experiment using the run order as the response. If
any of the design variables are found to be good predictors for the random order of the
runs, then the randomization wasn’t effective and should be performed again.
Randomization is often hard to do and painful but you have no choice—you must
randomize. Randomize any variable that you intend to make claims about and use vari-
ables that you can’t or don’t need to randomize to define blocks. It’s often easy and
tempting to compromise the randomized run order. Don’t! Be especially careful if
someone else is running the experiment for you. Make sure that they understand that
the runs must be performed in the specified order. If it is ever necessary to deviate
from the planned run order of an experiment, make sure to keep careful records of
the actual run order used so that the effect of the change in run order can be assessed.
4.17 BLOCKING
Often while preparing a designed experiment to study one or more variables, another
important variable, a nuisance variable, is identified that cannot be held constant or
randomized. If this variable’s level changes during the course of the experiment and
has a corresponding effect on the response, these changes will inflate the noise in the
experiment, making it less sensitive to small but possibly important differences
between the levels of the study variables. Rather than tolerate the additional noise
introduced by this nuisance variable, the experimental runs should be built in subsets
called blocks, where each block uses a single level of the nuisance variable. The usual
method of assigning runs to blocks is to build one or more complete replicates of the
experiment design within each block. Then when the statistical analysis of the data is
performed, the blocking variable is included in the model so that any differences
between the nuisance variable’s levels are accounted for. This approach isolates the
variation caused by the nuisance variable and recovers the full sensitivity of the exper-
iment design.
Although a blocking variable is included in the statistical analysis of an experiment,
we usually don’t test to see if there are differences between its levels. Such tests would
be unsafe because, since the levels of the blocking variable are not typically run in ran-
dom order, there may be other unidentified variables changing during the experiment
that are the real cause of the apparent differences between levels. When a study variable
cannot be randomized and must be run in blocks, we must be very careful to guarantee
that the experimental conditions are as constant as possible and we must stay conscious
of the risk that our conclusions about differences between blocks might be wrong. If we
really need to determine if there are differences between the levels of some variable,
then we have no choice—its levels must be run in random order. If we don’t need to
determine if there are differences between the levels of some variable, then we can treat
it as a blocking variable.
We almost always want to construct the runs of our experiments in random order;
however, in many cases this is impossible or impractical. For example, in the epoxy
problem, imagine that a large-scale operation required a full day to clean out all of the
equipment to make the change from resin A to resin B. Running the experiment with a
random choice of resin from run to run is desirable from an experimental standpoint,
but the many days required to change resins is definitely not practical. One choice is to
perform all of the experimental runs with one resin before switching to another. Then
only one changeover is required and the experiment will be completed quickly. In this
case, we say that the resin variable is run in two blocks and that resin is a blocking vari-
able. The danger of this approach is that if a steady trend or shift occurs in the process
during the experiment that is unrelated to the differences between resins A and B, the
trend will be misattributed to differences between the resins. If the purpose of the exper-
iment is to study variables other than resin, then it is appropriate to run resin in blocks.
But if the purpose of the experiment is to measure the differences between the resins,
then there is no choice—resins must be run in random order. This is always the case for
blocking variables—effects attributed to blocking variables may have other causes and
until the variable is run in random order you cannot be certain of the real cause of the
observed effect.
It is common to have several nuisance variables dealt with in blocks in a single
experiment in order to study a single independent variable.
Example 4.10
Describe a blocking and randomization plan for the three-variable eight-run exper-
iment design from Section 4.13 if the experiment requires three replicates and only
twenty runs can be completed in a day.
Solution: Since the full experiment requires 24 runs and only 20 runs can be com-
pleted in a day, it is necessary to build the experiment over at least a two-day period.
To account for possible differences between the morning of the first day, the afternoon
of the first day, and the morning of the second day, the experiment will be built in three
blocks. Each block will contain a single replicate of the eight-run experiment design
with the runs within blocks in random order. Table 4.2 suggests a possible blocking and
randomization plan. The numbers in the columns for blocks 1, 2, and 3 indicate the run
order within the blocks.
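A plan like the one in Table 4.2 can be generated by randomizing each replicate separately. A minimal Python sketch, assuming three blocks that each contain one replicate of the eight standard-order runs:

```python
import random

runs = list(range(1, 9))  # standard-order run numbers of the eight-run design

# One replicate per block, randomized independently within each block.
plan = {block: random.sample(runs, k=8) for block in (1, 2, 3)}

for block, order in plan.items():
    print(f"block {block}: build standard-order runs in this sequence: {order}")
```

Each block's list gives the standard-order run numbers in their build sequence; Table 4.2 records the same information as the position of each run within its block.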
Example 4.11
Suppose that management decides that the experiment from Example 4.10 will take
too much time to complete. As a compromise, they decide that all of the runs from resin
manufacturer A will be done first, followed by all of the runs from manufacturer B, so
that the experiment can be completed in a single day. Describe a blocking and ran-
domization plan for the new experiment and discuss how the analysis and conclusions
will differ from the original plan.
Solution: The new experiment will be built with two blocks of twelve runs each,
defined by manufacturer (x2). The study variables will be batch size (x1) and mixing time
(x3). This two-variable experiment requires 2 × 2 = 4 runs per replicate. Since each
block will contain twelve runs, there will be three replicates per block.

Table 4.2 Blocking and randomization plan for a 24-run experiment in three blocks.

                          Block
  Run   x1   x2   x3    1    2    3
   1    –    –    –     3    8    5
   2    –    –    +     2    1    3
   3    –    +    –     8    7    6
   4    –    +    +     5    5    7
   5    +    –    –     4    6    4
   6    +    –    +     6    2    8
   7    +    +    –     7    4    1
   8    +    +    +     1    3    2

If the experimental conditions within blocks are expected to be stable, then the twelve runs within
each block could be completely randomized. If other variables may cause differences
within the original two blocks, however, then each block should consist of three sub-
blocks defined by replicates of the four-run experiment. Generally, the latter choice is
preferred.
4.18 CONFOUNDING
Sometimes, by accident or design, an experiment is constructed so that two variables
have the same levels for each run in the experiment; that is, if our variables are x1 and
x2, then x1 = x2. When this happens, it becomes impossible to separate the effects of the
two variables. It’s like having two input knobs on a machine that are always locked
together. When one is changed, the other changes with it so that the true cause of any
change in the output cannot be determined. When two variables are coupled or locked
together like this we say that the two variables are confounded or aliased. Confounding
should be avoided when possible, but sometimes it’s necessary to design an experiment
with some confounding of the variables. For example, there are certain designs where
a variable is intentionally confounded with an interaction, such as x1 = x23, where x23
denotes the two-factor interaction between x2 and x3.
Variables can still be confounded and not have exactly the same settings. Suppose
that one variable x1 has levels ±1 and a second variable x2 has corresponding levels ∓1.
That is, whenever the first variable is +1, the second is –1, and vice versa. A concise
way of writing this relationship is x1 = – x2. These two variables are still confounded
with each other because the settings of one variable determine the settings of the other.
Confounding is an issue of the ability of one variable to predict another. Ideally we want
our experimental variables to be independent of each other, that is, no variable should
be predictable from another variable or combination of variables. We design experi-
ments so that the variables are independent, that is, not confounded.
Confounding of variables is not a simple binary state. Two variables can be inde-
pendent of each other (that is, not confounded), perfectly confounded with each other,
or they can fall in some intermediate state between the two extremes. Some small
degree of confounding is tolerable under most circumstances, but large amounts of con-
founding can cause problems. One way that minor confounding can appear in an exper-
iment is when a well designed experiment, in which all of the variables are independent,
loses some experimental runs. There are safe ways to handle some missing data, but an
experiment with lots of missing runs will have to be supplemented with new runs or run
again from scratch.
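The degree of confounding between two design columns can be measured by their correlation coefficient: 0 for independent columns, ±1 for perfect confounding, and intermediate values for partial confounding. A small Python illustration (the three-run case simply drops the last run of a 2 × 2 design to mimic a lost observation):

```python
def corr(u, v):
    """Pearson correlation between two design columns: 0 means independent,
    +/-1 means the columns are perfectly confounded."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = sum((x - mu) ** 2 for x in u) ** 0.5
    sv = sum((y - mv) ** 2 for y in v) ** 0.5
    return cov / (su * sv)

# Full 2x2 factorial: x1 and x2 are independent.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
print(corr(x1, x2))                    # 0.0

# x1 = -x2: perfectly confounded even though the settings differ.
print(corr(x1, [-v for v in x1]))      # -1.0

# Losing the fourth run leaves x1 and x2 partially confounded.
print(corr(x1[:3], x2[:3]))            # approximately -0.5
```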
The order of the runs in an experiment is another variable that is always present but
often overlooked. When the levels of a variable change in some systematic way from
the start to the end of the experiment (for example, AAAABBBBCCCC) we say that
the variable is confounded with the run order. When a variable is confounded with the
run order, it is unsafe to attribute an observed effect to that variable because another
unobserved variable, called a lurking variable, that changes during the experiment
could have been the real cause of the effect. Because we can never be certain that there
are no lurking variables present, we must assume that they are there and protect our-
selves from them. We do this by randomizing the run order so that the effects of any
lurking variables are not confounded with the experimental variables.
the experiment. Without the missing values we couldn’t complete the lab report. Being
the creative students that we were, and having few other choices, Steve and I created a
graph of the theoretical relationship between the response and the independent vari-
ables. Then we plotted fictitious points along this curve and worked backward to create
the column of missing settings. Of course we drew the points along the curve with lots
of experimental error. We didn’t want to make unreasonably strong claims about the
relationship—only that the relationship was roughly as the theory predicted.
A few days after turning in our reports, Steve and I were called to a special meet-
ing with the lab professor to discuss our results. When we met with him, Steve and I
were initially relieved to find out that our fakery hadn’t been detected, but then shocked
to realize that the professor was completely thrilled with our work! No other students
had ever successfully obtained the expected relationship between the response and the
independent variables! The experiment had essentially been a study of noise! We
quickly admitted that we had faked the data and luckily got away with passing grades
and a verbal reprimand; however, in any other environment it’s unlikely that the com-
mission of such an act would be dealt with so lightly. Nowadays, in most workplaces,
someone caught faking data would probably be fired, maybe even prosecuted, and cer-
tainly ostracized by their peers like Baltimore and Imanishi-Kari.
The moral of these stories is that you should not fake or in any other way compro-
mise the integrity of your data. If you do, you put yourself and your whole organization
at risk and you will probably be caught and held accountable. Whether you get caught
or not, you will certainly go to data hell. Like statistics hell, the line to get into data hell
is very long and we will probably see each other there.
diagram, such as the example in Figure 4.2, page 95, is a good way to incorporate the
variables and responses into one document.
The initial cause-and-effect diagram that you create to document a process should
evolve as your understanding of the system changes and improves. Try to keep photo-
copies of the most current diagram handy so you can add to it as you get new ideas. The
modified cause-and-effect diagram is also a great way to get a new DOE team member
up to speed on the process or to show an anxious manager that you have command of
the situation.
Although an initial modified cause-and-effect diagram can be started by a single
person, most likely the DOE project team leader, it is essential that all of the people
involved in the process contribute to this step. Inputs from the operators and technicians
who run the process on a daily basis are critical because they are often the only ones
aware of special but important variables. The design and process engineering people are
important because they must provide the more technical and theoretical viewpoint. The
manager of the organization that owns and operates the process should be involved to
make sure that all of the requirements of the process, including broader requirements
that might not be known or obvious to the others, are addressed. If the customer of the
process cannot be involved or consulted at this stage, the manager is also responsible
for representing his or her viewpoint.
The initial creation of the modified cause-and-effect diagram can happen in a rela-
tively short period of time, but the document is rarely complete in its early stages. It is
just so easy to overlook secondary or even important variables or responses that you
must expect to spend quite a bit of time spread out over several days or even weeks to
develop a complete analysis. And always update this document on a regular basis as
new variables and interpretations of the system are discovered.
In many cases, the creation of new procedures or the review of existing procedures
will uncover potentially serious gaps in the system. For example, it may be discovered
that one or more operators really don’t understand how the process is supposed to oper-
ate. These issues must be identified and resolved before proceeding to the next step in
the DOE process.
If the independent variables and/or response are quantitative, then calibration
records and gage error study data should be checked to confirm that all of the necessary
measurements and settings are accurate and precise. If this evidence doesn’t exist, then
it may be worth the time to address the most serious concerns if not all of them.
All of the owners/managers of the process should be involved in documenting the
process. This includes the operators who run the process, the technicians or machine
adjusters who troubleshoot and maintain the process on a daily basis, and the design
and/or process engineers who have overall responsibility for the process.
It’s relatively rare that a process is sufficiently documented prior to performing a
designed experiment. Usually something important among the many procedures, pro-
cess performance records, calibration records, and gage error study results that are nec-
essary to completely document the process is inadequate or completely missing. If these
things are all available and up-to-date, this step of the DOE process might happen
quickly, but it’s more likely that many hours of preparation will be necessary before this
step can be considered complete. Often these activities uncover the problem or prob-
lems that initiated considerations for performing a designed experiment in the first
place. If effective solutions can be found to these problems, it may not be necessary to
continue to the next step in the DOE process.
Figure 4.8 Relationship between the experiment design, data, model, and answers.
the data to be collected, and the experiment design defines the organization and structure
of the data.
Section 4.13 indicated that there are two parts to every experiment design: the vari-
ables matrix, which identifies the experimental variables and their levels, and the design
matrix, which identifies the combination of variable levels that will be used for the
experimental runs. Both of these matrices must be completely defined in this step of the
DOE process. Use the information collected in the previous steps of the DOE process
to determine which variables to include in an experiment and what levels each variable
will have. Then, based on this information and the number and nature of the design vari-
ables, that is, whether they are qualitative, quantitative, or a mixture of the two types,
and how many levels of the variables there are, select an appropriate experiment design.
For example, the experiment design may be: a one-way or multi-way classification
design with qualitative variables, a two-level screening design, a factorial design to
model main effects and interactions, a response surface design to account for curvature
in the response, or a hybrid design involving both qualitative and quantitative variables.
Once the design has been chosen, the matrix of experimental runs and a fictional
response (for example, random normal) should be created and analyzed to confirm that
the desired model can indeed be fitted to the data.
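As a sketch of this check, the following Python code (using NumPy rather than the MINITAB workflow described in the text) builds the model matrix for a 2^3 design with main effects and two-factor interactions, generates a fictional random-normal response, and confirms that the desired model can be fitted:

```python
import numpy as np

rng = np.random.default_rng(1)

# 2^3 factorial in standard order: rows are runs, columns are x1, x2, x3.
levels = np.array([[(i >> b & 1) * 2 - 1 for b in (2, 1, 0)]
                   for i in range(8)])

# Model matrix: intercept, main effects, and two-factor interactions.
X = np.column_stack([
    np.ones(8),
    levels,                          # x1, x2, x3
    levels[:, 0] * levels[:, 1],     # x12
    levels[:, 0] * levels[:, 2],     # x13
    levels[:, 1] * levels[:, 2],     # x23
])

y = rng.normal(size=8)               # fictional random-normal response

# The model is estimable only if the model matrix has full column rank.
assert np.linalg.matrix_rank(X) == X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # the fit succeeds
print("fitted", len(beta), "coefficients")
```

If runs are dropped or a generator confounds two terms, the rank check fails and the design must be revised before any data are collected.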
The DOE project leader should have enough information collected at this point in
the DOE process that he or she can specify the variables and experimental design
matrices; however, he or she may still find it necessary to consult with the appropriate
experts on the process if there are still ambiguities with respect to some of the vari-
ables or their levels. And if there are too many variables and/or variable levels in the
experiment, it may be necessary to consult with the statistical specialist and/or recon-
vene the whole project team to identify a more practical design. In the majority of the
cases, when the previous steps in the DOE process have been completed successfully,
the specification of an appropriate experiment design should take the DOE project
leader less than one hour.
performed after an experiment design has been chosen, but they may raise issues that
force you to reconsider that choice.
After the experiment design has been chosen, a sample-size calculation should be
performed to determine the number of replicates of the design necessary to make the
experiment sufficiently sensitive to practically significant effects. If the indicated total
number of runs exhausts the available time and resources, it may be necessary to revise
or perhaps even abandon the original plan. The total number of runs required and the
rate at which they can be produced will also factor into blocking considerations.
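A rough sample-size calculation can be sketched with the usual normal approximation. The function name and defaults below are illustrative assumptions; exact calculations, such as those in MINITAB's power and sample-size tools, use the noncentral t distribution.

```python
from math import ceil
from statistics import NormalDist

def runs_per_level(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size per treatment level for comparing two
    means: detect a practically significant difference delta against noise
    with standard deviation sigma. A planning sketch, not an exact answer."""
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return ceil(n)

print(runs_per_level(delta=10, sigma=10))  # about 22 runs per level
```

Doubling the smallest difference worth detecting cuts the required runs by roughly a factor of four, which is why the choice of a practically significant effect size dominates the budget for an experiment.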
All experimental variables are either randomized or used to define blocks. If you
intend to make claims about the effect of a variable, then that variable must be ran-
domized. If the experiment is blocked, then the study variables must be randomized
within blocks. The role of blocking variables is limited to reducing the variability asso-
ciated with sources of noise that would reduce the sensitivity of the experiment if they
were overlooked. If the experiment is to be built in blocks, randomize the order of the
blocks and randomize runs involving study variables within blocks. Confirm that the ran-
domization and blocking plan is effective by analyzing the intended order of the experi-
mental runs as if it were the experimental response. If any of the design variables or
other important terms in the model can predict the run order, then the randomization
wasn’t effective and you will have to re-randomize the runs or modify the randomiza-
tion plan. Use this opportunity to confirm that the blocking plan didn’t interfere with
the intended model, such as by confounding a blocking variable with a study variable
or an important interaction between study variables.
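One informal version of this check can be sketched in Python with a single hypothetical design column; a full validation would regress the intended run order on all of the design variables and model terms, as the text describes.

```python
import random
from statistics import mean

x1 = [-1, -1, 1, 1] * 3                        # one design column, 12 runs
run_order = random.sample(range(1, 13), 12)    # proposed random run order

# Treat the run order as the response: if the average run position of the
# +1 runs differs greatly from that of the -1 runs, the design variable
# "predicts" run order and the randomization should be redone.
lo = mean(o for o, x in zip(run_order, x1) if x == -1)
hi = mean(o for o, x in zip(run_order, x1) if x == 1)
print("mean run position at -1 and +1:", lo, hi)
```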
After the randomization plan is validated, data sheets should be created for the oper-
ators. These data sheets should indicate the settings of the experimental runs, with room
to record the response and any special notes. To avoid confusing the operators, the order
of the runs on the data sheets should be the randomized run order. If the experiment is
to be built in blocks, separate data sheets can be created for each block. The team mem-
bers who will participate in actually building the experiment should review the data col-
lection sheets to make sure that they are correct and understood.
The DOE project leader is responsible for determining the randomization and
blocking plan, but it may be necessary for him or her to consult with the statistical spe-
cialist, technicians, or process engineer to determine practical strategies for the plan.
Simple experiments take only a few minutes to randomize and block but a complicated
experiment may take several hours. In severe cases, the difficulties associated with ran-
domization and blocking may require reconsideration of the experiment design.
be necessary to suspend the experiment until the issue is resolved. Sometimes this can be
done immediately after the problem is discovered, but when a clear and effective solu-
tion is not apparent it’s usually better to walk away, regroup, and come back another day.
When the experiment is being performed, great care has to be taken to follow the
correct procedures for the process, to honor the randomization plan, to maintain the iden-
tity of the parts, and to make sure that all of the data are faithfully and accurately recorded.
Any unusual events or conditions should be clearly noted. These observations will be
crucial later on if it becomes necessary to explain outliers in the data set.
All of the key people required to operate the process must be available to run the
experiment. If a crucial person is missing, you’re better off waiting until that person
becomes available. This may or may not include the process engineer depending on
how unusual some of the experimental runs are. If the operators and technicians have
been properly trained and prepared to run the experiment, the DOE project leader
should be able to sit back, observe, and only take action if someone becomes confused
or if some unexpected event occurs.
Example 4.12
The following report is presented as an example of a well-written report of a
designed experiment.
Background: Brake bars are cut using carbide-tipped circular saws lubricated with
LAU-003. Saws must be resharpened when burrs on the perimeter of the cut approach
the allowed tolerances for the cut surface. LAU has suggested that saws lubricated with
LAU-016 instead of LAU-003 would make more cuts between resharpenings. Decreased
downtime and resharpening costs would more than offset the minor price increase for
LAU-016. The purpose of this experiment is to: 1) demonstrate that LAU-016 delivers
more cuts between resharpenings than LAU-003 and 2) confirm that there are no
adverse effects associated with the use of LAU-016.
• Methods
– LAU-003 and LAU-016 are delivered in dry form and mixed with mineral oil.
Both lubricants were mixed according to LAU’s instructions.
– The lubricant tank was drained and refilled between trials that required a lubri-
cant change. No attempt was made to flush out lubricant that was adsorbed on
machine surfaces.
– All saw blade resharpenings were performed in-house on the Heller grinder by
Tony E. Tony also confirmed the critical blade tooth specs before a blade was
released for the experiment.
• Material
– The steel stock used for brake bars is thought to be consistent from lot to lot so
no attempt was made to control for lots. The order of the experimental runs was
randomized to reduce the risk of lot-to-lot differences.
– Saw blades tend to have a ‘personality’ so each blade was run two times—
once with LAU-003 and once with LAU-016.
– LAU lubricants tend to be very consistent from batch to batch and batches are
very large, so single batches of both lubricants were used in the experiment.
– 10 randomly selected saw blades were used for the experiment.
– Standard-grade mineral oil provided by LAU was used to mix the lubricants.
• Manpower
– Bob P. is the primary operator of the brake-bar cutting operation so all experi-
mental runs were performed by him. Bob also mixed the lubricants, switched
the lubricants between trials, and monitored the cutting operation to determine
when a blade had reached its end of life. Bob documented all of these steps in
the brake-bar cutting operation log book.
– Tony E. resharpened blades and confirmed that they met their specs.
• Machines
– All blades were sharpened on the Heller grinder.
– All cuts were made with the dedicated brake bar cutter.
– The number of cuts was determined from the counter on the brake bar cutter.
Experiment Design: Each saw blade was used once with each lubricant so the exper-
iment is a paired-sample design that can be analyzed using a paired-sample t test. The
sample size (n = 10) was determined by prior calculation to deliver a 90 percent proba-
bility of detecting a 10 percent increase in the life of the saw blades using LAU-016. The
standard deviation for the sample-size calculation was estimated from LAU-003 histori-
cal data. The lubricant type was run in completely random order by randomly choosing
a blade from among those scheduled and available for use with the required lubricant.
Experimental Data: The experiment was performed over the period 14–18 October
1999. The experimental data are shown in Figure 4.9 in the order in which they were col-
lected. Blade #9 broke a carbide tip during its first trial so it was repaired, resharpened,
and put back into service. Broken tips are random events thought to be unrelated to the
lubricant so the first observation of blade #9 was omitted from the analysis. There were
no other special events recorded during the execution of the experiment. The original
record of these data is in the brake-bar operation logbook.
Statistical Analysis: The experimental data are plotted by lubricant type and con-
nected by blade in Figure 4.10. The plot clearly shows that the number of cuts obtained
using LAU-016 is, on average, greater than the number of cuts obtained with LAU-003.
The data were analyzed using a paired-sample t test with Stat> Basic Stats>
Paired t in MINITAB V13.1. The output from MINITAB is shown in Figure 4.11. The
mean and standard deviation of the difference between the number of cuts obtained with
LAU-016 versus LAU-003 were Δx̄ = 16.0 and s = 13.5. This result was statistically
significant with p = 0.005. The Δxi were analyzed graphically (not shown) and found to
be at least approximately normal and homoscedastic with respect to run order as required
for the paired-sample t test. The 95 percent confidence interval for the increase in the
mean number of cuts is given by:

P(Δx̄ – t0.025 s/√n < Δμ < Δx̄ + t0.025 s/√n) = 0.95

Relative to the mean number of cuts observed with LAU-003, the 95 percent confidence
interval for the fractional increase in the mean number of cuts is given by:

P(0.05 < Δμ/x̄003 < 0.20) = 0.95
Figure 4.10 Cuts by lubricant type (LAU-003 vs. LAU-016), with the two observations from each blade connected.
Although the lower bound of the 95 percent confidence interval falls below the 10 per-
cent target increase in the number of cuts, the true magnitude of the increase is still rel-
atively uncertain but large enough to justify the change from LAU-003 to LAU-016.
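The paired-sample analysis in the report above can be reproduced with any statistics package. The Python sketch below uses hypothetical cut counts (the report's actual data appear in Figure 4.9 and are not reproduced here) to show the form of the calculation:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired cut counts for nine blades, one value per lubricant.
lau_003 = [130, 138, 125, 142, 133, 128, 140, 135, 131]
lau_016 = [148, 150, 139, 160, 145, 150, 152, 155, 149]

d = [b - a for a, b in zip(lau_003, lau_016)]  # per-blade differences
d_bar, s, n = mean(d), stdev(d), len(d)

t = d_bar / (s / sqrt(n))        # paired-sample t statistic
t_crit = 2.306                   # t(0.025) with n - 1 = 8 degrees of freedom

half_width = t_crit * s / sqrt(n)
print(f"d_bar = {d_bar:.1f}, t = {t:.2f}")
print(f"95% CI for the mean increase: ({d_bar - half_width:.1f}, "
      f"{d_bar + half_width:.1f})")
```

If SciPy is available, scipy.stats.ttest_rel(lau_016, lau_003) returns the paired t statistic and its p-value directly.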
4. Preliminary experiments
• List of specific questions or issues to be resolved with preliminary
experiments. (This list may already exist in the problem statement.)
• Summary statement of the purpose and results of each preliminary
experiment.
• Original data records, notes, and analysis from each preliminary
experiment.
• Notes on any follow-up actions taken as a result of findings from
preliminary experiments.
5. Experiment design
• Final classification of each input variable from the cause-and-effect analysis
into one of the following categories: experimental variable, variable to
be held fixed, uncontrolled variable not recorded, or uncontrolled variable
recorded.
• Copies of the variable and design matrices.
• Copy of the sample-size calculation or other sample-size justification.
• Copy of the analysis of a simulated response demonstrating that the desired
model can be fitted from the design. (This may be postponed and done in
combination with the validation of the randomization and blocking plan in
the next step.)
6. Randomization and blocking plan
• Description of and justification for the randomization and blocking plan.
• Copy of analysis validating the randomization plan (for example, analysis
of run order as the response).
• Copies of the actual data sheets (with the runs in random order) to be used
for data collection.
7. Experiment records
• Copies of any required formal authorizations to build the experiment.
• Copies of all original data records.
• Copies of all notes taken during the execution of the experiment.
8. Statistical analysis
• Copy of the experimental data after transcription into the electronic
worksheet.
• Copies of the analyses of the full and refined models, including the
residuals analysis.
• Copies of any alternative models considered.
• Copies of any special notes or observations from the analysis concerning
unusual observations, and so on, and any related findings from follow-
up activities.
9. Interpretation
• Written interpretation of the statistical analysis with references to graphs
and tables.
• Explanation of special applications of the final model, for example,
optimization, variable settings to achieve a specified value of the response,
and so on.
• Description of any special conditions or observations that might indicate
the need for a follow-up experiment or influence the direction of the
confirmation experiment.
• If appropriate, a brief statement about the strengths and/or weaknesses of
the process, experiment, and/or analysis.
• Recommendations for the next step in the experimental project, for example,
proceed to a confirmation experiment, run more replicates of the original
experiment, perform another designed experiment, and so on.
10. Confirmation experiment
• Description of and justification for the confirmation experiment.
• Copies of the original data records, statistical analysis, and interpretation
of the confirmation experiment.
• Summary statement about the success or failure of the confirmation
experiment and implications for the goals of the experimental project.
11. Report
• Copy of the final experiment report (written and/or slide presentation) and
distribution list.
• Copies of any comments or follow-up to the report.
• List of recommendations and/or warnings for anyone who might reconsider
this problem in the future.