4

DOE Language and Concepts

4.1 INTRODUCTION
Like any highly technical discipline, DOE is rife with acronyms and language that may
intimidate the novice. The purpose of this chapter is to provide an introduction to the
many terms, concepts, and administrative issues of DOE, hopefully in a nonintimidat-
ing manner. Don’t expect to understand all of the aspects of DOE presented in this chap-
ter the first time you read it. Many of the nuances only become apparent after years of
experience and study. But it doesn’t take years to demonstrate success with DOE. After
reading this chapter you should have sufficient understanding of DOE language and
concepts to proceed to the consideration of designed experiments and their applications.

4.2 DESIGN OF EXPERIMENTS: DEFINITION, SCOPE, AND MOTIVATION
Design of experiments (DOE) is a formal structured technique for studying any situa-
tion that involves a response that varies as a function of one or more independent vari-
ables. DOE is specifically designed to address complex problems where more than one
variable may affect a response and two or more variables may interact with each other.
DOE replaces inferior methods such as the traditional but unfortunately still common
method of studying the effect of one variable at a time (OVAT). Compared to DOE, the
OVAT method is an inefficient use of resources and is incapable of detecting the pres-
ence of or quantifying the interactions between variables.
DOE is used wherever experimental data are collected and analyzed. Its use is
expected in all branches of scientific research but DOE is becoming ever more widespread
in engineering, manufacturing, biology, medicine, economics, sociology, psychology,


marketing, agriculture, and so on. Most technical publications in any discipline expect that
some form of designed experiment will be used to structure any experimental investiga-
tion. Researchers who don’t use DOE methods don’t get published.
The popularity of DOE is due to its tremendous power and efficiency. When used
correctly, DOE can provide the answers to specific questions about the behavior of a
system, using an optimum number of experimental observations. Since designed exper-
iments are structured to answer specific questions with statistical rigor, experiments
with too few observations won’t deliver the desired confidence in the results and exper-
iments with too many observations will waste resources. DOE gives the answers that
we seek with a minimum expenditure of time and resources.

4.3 EXPERIMENT DEFINED


A simple model of a process is shown in Figure 4.1. Processes have inputs that deter-
mine how the process operates and outputs that are produced by the process. The purpose
of an experiment is to determine how the inputs affect the outputs. Experiments may be
performed to document the behavior of the inputs and corresponding outputs for scien-
tific purposes, but the goal of engineering experimentation is to learn how to control the
process inputs in order to produce the desired outputs. Process inputs are called vari-
ables, factors, or predictors and process outputs are called responses.
Every experiment involves the observation of both the inputs (the variables) and the
outputs (the responses). The action taken by the experimenter on the inputs determines
whether the experiment is passive or active. When the experimenter merely observes the
system and records any changes that occur in the inputs and the corresponding outputs,
the experiment is passive. This type of experimentation can be costly, time-consuming,
and unproductive. When the experimenter intentionally varies the inputs, then the
experiment is active. Active experimentation, done under controlled conditions in a log-
ical structured manner, is a tremendously powerful tool and is the type of experimenta-
tion used in DOE.

4.4 IDENTIFICATION OF VARIABLES AND RESPONSES


Perhaps the best way to identify and document the many variables and responses of a
process is to construct a modified cause-and-effect diagram. Consider the example
shown in Figure 4.2. The problem of interest is the performance of a two-part epoxy used

Inputs → Process → Outputs

Figure 4.1 Simple model of a process.



[Diagram: input branches for machine, manpower, methods, materials, and environment on the left; response branches for time, finish, and mechanical properties (pot life, tackiness, appearance, ability to take paint, tensile strength, shear strength, durability, sandability) on the right.]

Figure 4.2 Cause-and-effect diagram for epoxy performance.

to bond materials together. The process inputs are shown on the left and the responses
are shown on the right. Note that this diagram is just an elaboration of the simple
process model in Figure 4.1. MINITAB has the ability to create traditional cause-and-
effect diagrams (the left side of Figure 4.2) from the Stat> Quality Tools> Cause and
Effect menu.
It is very important to keep a cause-and-effect diagram for an experiment. The
cause-and-effect diagram:
• Provides a convenient place to collect ideas for new variables.
• Serves as a quick reference tool when things go wrong and quick decisions
are necessary.
• Summarizes all of the variable considerations made over the life of the
experiment.
• Provides an excellent source of information for planning new experiments.
• Quickly impresses managers with the complexity of the problem.
The cause-and-effect diagram should be updated religiously as new variables and
responses are identified.

[Diagram: variables are classified as intentionally varied, fixed, or uncontrolled; uncontrolled variables are further divided into measurable and not measurable.]

Figure 4.3 Disposition of the experimental variables.

Every process has a multitude of variables and all of them play a role in an exper-
iment. Some variables are intentionally varied by the experimenter to determine their
effect on the responses, others are held constant to ensure that they do not affect the
responses, and some variables are ignored with the hope that they will have little or no
effect on the responses. If the experiment design is a good one, if it is carried out care-
fully, and if the assumptions regarding uncontrolled variables are met, then the experi-
menter may learn something about the problem. This classification of process variables
is summarized in Figure 4.3. A cause-and-effect diagram can be augmented by using
different colored highlighters to classify variables into intentionally varied, fixed, and
uncontrolled classes.

4.5 TYPES OF VARIABLES


The inputs to a process are referred to as variables, factors, or predictors. Each variable
in an experiment has its own unique settings referred to as levels or treatments. The
relationship between the levels of a variable determines whether the variable is qualita-
tive or quantitative. The levels of a qualitative variable differ in type. For example, in
the epoxy problem the resin variable may have three qualitative levels determined by
manufacturer: Manufacturer A, Manufacturer B, and Manufacturer C. A quantitative
variable has levels that differ in size. For example, the epoxy batch-size variable may
appear in the experiment at four quantitative levels: 50, 100, 150, and 200 cc. An advan-
tage of a quantitative variable is that the experiment results may be used to interpolate
between the levels of the variable included in the experiment. For example, the behav-
ior of the 50, 100, 150, and 200 cc batches could be used to predict how a batch of size
120 cc would behave.
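The interpolation property of quantitative variables can be sketched numerically. The batch-size levels below are the ones from the text, but the strength responses are hypothetical values invented only to illustrate fitting a simple model and predicting the response at an intermediate 120 cc batch:

```python
import numpy as np

# Batch-size levels (cc) from the text; response values are hypothetical.
batch_size = np.array([50.0, 100.0, 150.0, 200.0])
strength = np.array([41.0, 46.5, 48.0, 46.0])  # invented for illustration

# Fit a quadratic model over the studied range (never extrapolate beyond it).
model = np.poly1d(np.polyfit(batch_size, strength, deg=2))

# Interpolate the predicted behavior of a 120 cc batch.
predicted = model(120.0)
print(round(predicted, 2))  # falls between the 100 and 150 cc observations
```

The same idea would be impossible with a qualitative variable: there is no meaningful "level between Manufacturer A and Manufacturer B" to predict.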
Some experiments include only a single design variable but many of the experi-
ments that we will be interested in will contain two or more variables. Although an
experiment with more than one variable can contain a mixture of qualitative and quan-
titative variables, experiments built with just quantitative variables generally offer more
design possibilities. Sometimes it is possible, and generally it is desirable, to redefine a
qualitative variable so that it becomes quantitative. This may take some imagination,
but with practice and out of necessity it often becomes possible. Methods of redefining
a qualitative variable into a quantitative variable are discussed later in this chapter.

4.6 TYPES OF RESPONSES


Whenever possible, the response of an experiment should be quantitative. Any appro-
priate measurement system may be used but it should be repeatable and reproducible.
This text will not rigorously treat binary or count responses although MINITAB has the
ability to analyze them. The basic experiment design considerations are the same for
these responses but the statistical analyses are different and sample sizes will be much
larger. See Agresti (2002), Christensen (1997), or your neighborhood statistician or
DOE consultant for help.
Sometimes it’s possible to define a severity rating for a binary success/failure type
response that approximates a quantitative response. For example, if the response is the
cleanliness of a utensil after it is machine washed, then a severity rating of zero to 10
might be used to indicate the range of cleanliness from completely clean to really filthy
instead of a simple binary response. The information content of this simple multi-level
response is much greater than the binary response and may be sufficient to allow the
response to be treated as if it were quantitative. This will give the analysis more power
and permit the sample size to be reduced.
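The claim that a multi-level rating permits a smaller sample size than a binary response can be illustrated with standard two-group sample-size formulas. The failure rates, effect size, alpha, and power below are hypothetical choices, not values from the text:

```python
from scipy.stats import norm

alpha, power = 0.05, 0.90
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

# Binary response: units per group to detect a drop in failure rate
# from 50% to 30% (hypothetical rates), by the normal approximation.
p1, p2 = 0.50, 0.30
n_binary = z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

# Quantitative (rated) response: units per group to detect a one-
# standard-deviation shift in the mean rating (effect size d = 1).
d = 1.0
n_rated = 2 * z**2 / d**2

print(round(n_binary), round(n_rated))  # 121 vs. 21 units per group
```

The exact numbers depend entirely on the assumed rates and effect size, but the pattern is general: the quantitative response carries more information per observation, so far fewer observations are needed.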

Example 4.1
An experiment was designed to study several design variables for a one-time-use
chemical indicator. Each indicator contains six strips of paper that are reactive to a
specific chemical. In the presence of the chemical in the form of a vapor, the strips
quickly turn from white to black. The six strips in the indicator are arranged in series
along a tortuous diffusion path so that each strip changes color in succession under
continuous exposure to the chemical. The purpose of the experiment was to determine
the geometry of the tortuous path so that: 1) all six of the strips would change color
when the indicator was exposed to a known high concentration of the chemical for a
specified time and 2) none of the strips would change color when the indicator was
exposed to a known low concentration of the chemical for a specified time. In the orig-
inal concept of the experiment, sample indicators would be run under the two test con-
ditions and judged to either pass or fail the relevant test; however, the sample size for
this experiment was found to be prohibitively large. To solve this problem, the response
was modified from the original pass/fail binary response to a pseudo-continuous mea-
surement response. A series of 12 indicators was exposed to the chemical for increas-
ing and evenly spaced periods of time to create a 12-step pseudo-quantitative scale that
spanned the full range of the response from all white strips to all black strips. These
12 indicators were used as the standard against which experimental indicators were
compared to determine their degree of color change. With this new and improved measurement
scale, the number of runs required to study the tortuous path variables was significantly reduced.

Most experiments are performed for the purpose of learning about a single
response; however, multiple responses can be considered. For example, in the epoxy

performance problem, suppose that the primary goal of the experiment is to increase the
strength of the cured epoxy without compromising the pot life, viscosity, or sandability
characteristics. This will require that all of the responses are recorded during the exe-
cution of the experiment and that models be fitted for each response. An acceptable
solution to the problem must satisfy all of these requirements simultaneously. This type
of problem is common and is completely within the capability of the DOE method.

4.7 INTERACTIONS
When a process contains two or more variables, it is possible that some variables will
interact with each other. An interaction exists between variables when the effect of one
variable on the response depends on the level of another variable. Interactions can occur
between two, three, or more variables but three-variable and higher-order interactions
are usually assumed to be insignificant. This is generally a safe assumption although
there are certain systems where higher-order interactions are important.
With practice, two-factor interactions between variables can often be identified by
simple graphs of the experimental response plotted as a function of the two involved
variables. These plots usually show the response plotted as a function of one of the vari-
ables with the levels of the other variable distinguished by different types of lines or
symbols. Multi-vari charts are also useful for identifying variable interactions.
The management of interactions between variables is a strength of the DOE method
and a weakness of the one-variable-at-a-time (OVAT) method. Whereas DOE recognizes
and quantifies variable interactions so that they can be used to understand and better
manage the response, the OVAT method ignores the interactions and so it will fail in
certain cases when the effects of those interactions are relatively large. DOE’s success
comes from its consideration of all possible combinations of variable levels. OVAT fails
because it relies on a simple but flawed algorithm to determine how the variables affect
the response. In some cases, OVAT will obtain the same result as DOE, but in many
other cases its result will be inferior.

Example 4.2
Two variables, A and B, can both be set to two states indicated by –1 and +1.
Figures 4.4a and b show how the responses Y1 and Y2 depend on A and B. Use these
figures to determine if there is an interaction between the two variables and to demon-
strate how DOE is superior to OVAT if the goal is to maximize the responses.
Solution: In Figure 4.4a, the line segments that connect the two levels of B are sub-
stantially parallel, which indicates that the levels of B do not cause a change in how A
affects the response, so there is probably no interaction between A and B in this case.
In Figure 4.4b, the line segments that connect levels of B diverge, which indicates that
the chosen level of B determines how A affects the response, so there is probably an
interaction between A and B.

[Plots: responses Y1 (a) and Y2 (b) versus A at the two levels of B, with points (1), (2), and (3) marking an OVAT path through the design space; the line segments are parallel in (a) and diverge in (b).]

Figure 4.4 Two-variable examples without interaction (a) and with interaction (b).

The weakness of the OVAT method is that it follows a limited decision path through
the design space that may or may not lead to the optimal solution. In an experiment to
study the situations in Figures 4.4a and b, if the starting point in the OVAT process is
point (1) in both cases, and the first step in the experiment is to investigate the effect
of variable A, followed by a second step to investigate B, then the desired maximal
response is obtained in Figure 4.4a but not in b. In Figure 4.4a, where there is no
interaction between variables A and B, the optimal solution is obtained regardless of
which variable is studied first. But in Figure 4.4b, where there is an interaction
between A and B, the maximal solution is obtained only if those variables are studied
in the right order. By contrast, the DOE method investigates all four possible configu-
rations in the design space so it is guaranteed to find the maximal solution whether or
not A and B interact.
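The 2×2 situation in Example 4.2 can be expressed numerically. The corner responses below are hypothetical values loosely patterned on Figure 4.4b; the interaction is computed the standard way, as half the change in A's effect as B moves between its levels:

```python
# Hypothetical responses at the four (A, B) corners of the design space,
# loosely patterned on Figure 4.4b.
y = {(-1, -1): 30, (1, -1): 60, (-1, 1): 30, (1, 1): 120}

# Effect of A at each level of B.
effect_a_at_b_lo = y[(1, -1)] - y[(-1, -1)]   # 30
effect_a_at_b_hi = y[(1, 1)] - y[(-1, 1)]     # 90

# Interaction: half the change in A's effect as B moves from -1 to +1.
ab_interaction = (effect_a_at_b_hi - effect_a_at_b_lo) / 2
print(ab_interaction)  # nonzero, so A and B interact
```

If the two line segments in the plot were parallel, the two effects of A would be equal and the interaction term would be zero.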

4.8 TYPES OF EXPERIMENTS


Two of the primary considerations that distinguish experiment designs are the number
of design variables that they include and the complexity of the model that they provide.
For a specified number of design variables, there could be many experiment designs to
choose from, but the extreme designs that span all of the others are called screening
experiments and response surface experiments. Screening experiments are used to study
a large number of design variables for the purpose of identifying the most important
ones. Some screening experiments can evaluate many variables with very few experi-
mental runs. For example, the Plackett-Burman designs can handle seven variables in
eight runs, eleven variables in twelve runs, and designs for more variables are available.
Screening experiments use only two levels of each design variable and cannot resolve
interactions between pairs of variables—a characteristic which can make these designs
quite risky.
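For the 8-run case mentioned above, the screening design can be written down directly: the seven ±1 factor columns are the non-constant columns of an order-8 Hadamard matrix. This is a sketch using SciPy; the 12-run Plackett-Burman design comes from a different, cyclic construction not shown here:

```python
import numpy as np
from scipy.linalg import hadamard

# Order-8 Hadamard matrix: entries are +1/-1 and the columns are mutually
# orthogonal. Dropping the all-ones column leaves 7 factor columns,
# giving a 7-factor, 8-run screening design.
H = hadamard(8)
design = H[:, 1:]          # 8 runs x 7 two-level factors

# Orthogonality is what allows 7 main effects to be estimated from only
# 8 runs (with no ability to resolve two-factor interactions).
gram = design.T @ design
print(np.array_equal(gram, 8 * np.eye(7, dtype=int)))  # True
```

The zero off-diagonal entries of the Gram matrix are exactly the property that makes each main-effect estimate independent of the others.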

Response-surface experiments are more complex and difficult to administer than
screening experiments, so they generally involve just two to five variables. Every
variable in a response surface design must be quantitative and three or more levels of each
variable will be required. The benefit of using so many variable levels is that response
surface designs provide very complex models that include at least main effects, two-
factor interactions, and terms to measure the curvature induced in the response by each
design variable.
There is an intermediate set of experiment designs that falls between screening
experiments and response surface experiments in terms of their complexity and capa-
bility. These experiments typically use two levels of each design variable and can
resolve main effects, two-factor interactions, and sometimes higher-order interactions.
When the design variables are all quantitative, a select set of additional runs with inter-
mediate variable levels can be included in these designs to provide a test for, but not
complete resolution of, curvature in the response. The existence of this family of inter-
mediate designs should make it apparent that there is actually a discrete spectrum of
experiment designs for a given number of experimental variables, where the spectrum
is bounded by screening and response surface designs.
When faced with a new situation where there is little prior knowledge or experi-
ence, the best strategy may be to employ a series of smaller experiments instead of com-
mitting all available time and resources to one large experiment. The first experiment
that should be considered is a screening experiment to determine the most influential
variables from among the many variables that could affect the process. A screening
experiment for many variables will usually identify the two or three significant vari-
ables that dominate the process. The next step in the series of experiments would be to
build a more complex experiment involving the key variables identified by the screening
experiment. This design should at least be capable of resolving two-factor interactions,
but often the chosen design is a response surface design which can more completely
characterize the process being studied. Occasionally when such a series of experiments
is planned, the insights provided by the early experiments are sufficient to indicate an
effective solution to the problem that initiated the project, victory can be declared, and
the experimental program can be suspended.

4.9 TYPES OF MODELS


There have been many references to a model to be constructed from experimental data.
The word model refers to the mathematical description of how the response behaves as
a function of the input variable or variables. A good model explains the systematic
behavior of the original data in some concise manner. The specific form of the model
depends on the type of design variable used in the experiment. If an experiment contains
a single qualitative design variable set to several different treatment levels, then the
model consists of the treatment means. There will be as many means for the model as
there are treatments in the experiment. If an experiment contains a single quantitative

variable that covers a range of values, then the model will consist of an equation that
relates the response to the quantitative predictor. Experiments that involve qualitative
predictors are usually analyzed by analysis of variance (ANOVA). Experiments that
involve quantitative predictors are usually analyzed by regression. Experiments that com-
bine both qualitative and quantitative variables are analyzed using a special regression
model called a general linear model.
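The general linear model can be sketched by dummy-coding the qualitative variable into indicator columns alongside the quantitative one. All data values below are hypothetical; the fit is ordinary least squares on the combined design matrix:

```python
import numpy as np

# Hypothetical data: a qualitative lot variable and a quantitative
# temperature variable affecting one response.
lots = np.array(["A", "A", "B", "B", "C", "C"])
temp = np.array([20.0, 30.0, 20.0, 30.0, 20.0, 30.0])
y = np.array([121.0, 152.0, 160.0, 189.0, 118.0, 150.0])

# Design matrix: intercept, indicators for lots B and C (lot A is the
# reference level), and the quantitative temperature column.
X = np.column_stack([
    np.ones(len(y)),
    (lots == "B").astype(float),
    (lots == "C").astype(float),
    temp,
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, lot_b, lot_c, temp_slope = coef
print(round(lot_b, 1), round(temp_slope, 2))  # lot B offset and common slope
```

The lot coefficients play the role of the ANOVA treatment effects while the temperature coefficient is an ordinary regression slope, which is exactly the mixture of qualitative and quantitative predictors the general linear model handles.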
Any model must be accompanied by a corresponding description of the errors or dis-
crepancies between the observed and predicted values. These quantities are related by:

yi = ŷi + εi (4.1)

where yi represents the ith observed value of the response, ŷi represents the corresponding predicted value from the model, and εi represents the difference between them. The
εi are usually called the residuals. In general, the relationship between the data, model,
and error can be expressed as:

Data = Model + Error Statement (4.2)

At a minimum, the error statement must include a description of the shape or dis-
tribution of the residuals and a summary measure of their variation. The amount of error
or residual variation is usually reported as a standard deviation called the standard
error of the model indicated with the symbol ŝe or se . When the response is measured
under several different conditions or treatments, which is the usual case in a designed
experiment, then it may be necessary to describe the shape and size of the errors under
each condition.
Most of the statistical analysis techniques that we will use to analyze designed
experiments demand that the errors meet some very specific requirements. The most
common methods that we will use in this book, regression for quantitative predictors
and ANOVA for qualitative predictors, require that the distribution of the errors is
normal in shape with constant standard deviation under all experimental conditions.
When the latter requirement is satisfied, we say that the distribution of the errors is
homoscedastic. A complete error statement for a situation that is to be analyzed by
regression or ANOVA is, “The distribution of errors is normal and homoscedastic with
standard error equal to se ,” where se is some numerical value. If the distribution of errors
does not meet the normality and homoscedasticity requirements, then the models
obtained by regression and ANOVA may be incorrect.* Consequently, it is very impor-
tant to check assumptions about the behavior of the error distribution before accepting
a model. When these conditions are not met, special methods may be required to ana-
lyze the data.

* When the standard deviations of the errors are different under different conditions (for example, treatments) we say
that the error distributions are heteroscedastic.

[Plots: measured values (vertical scale 100 to 180) for (a) a single production lot, (b) lots A, B, and C, and (c) temperatures of 20, 25, and 30 with a fitted line.]

Figure 4.5 Three experiments and three models.

Example 4.3
A manufacturer wants to study one of the critical quality characteristics of his
process. He draws a random sample of n = 12 units from a production lot and measures
them, obtaining the distribution of parts shown in Figure 4.5a. The mean of the sample
is x̄ = 140 and the standard deviation is s = 10. A normal plot of the sample data (not
shown) indicates that the observations are normally distributed. From this information,
identify the data, model, and the error statement.
Solution: The data values are the n = 12 observations, which can be indicated with
the symbol yi where i = 1, 2, . . . , 12. The model is the one number, ŷi = ȳ = 140, that
best represents all of the observations. The error values are given by εi = yi − ȳ and are
known to be normally distributed with standard deviation se = 10. These definitions
permit Equation 4.2 to be written:

yi = ȳ + εi (4.3)

Example 4.4
The manufacturer in Example 4.3 presents his data and analysis to his engineer-
ing staff and someone comments that there is considerable lot-to-lot variation in the
product. To test this claim, he randomly samples n = 12 units from each of three different
lots. The data are shown in Figure 4.5b. The three lot means are ȳA = 126, ȳB = 165, and
ȳC = 123 and the standard deviations are all about se = 10. Normal plots of the errors

indicate that each lot is approximately normally distributed. From this information,
identify the data, model, and the error statement.
Solution: The data are the n = 12 observations drawn from each of the k = 3 lots, indicated
by the symbol yij where i indicates the lot (A, B, or C) and j indicates the observation
(1 to 12) within a lot. The model consists of the three means ȳA = 126, ȳB = 165, and ȳC
= 123. The error statement is, “The errors are normally distributed with constant standard
deviation se ≈ 10.”

Example 4.5
After reviewing the data and analysis described in Example 4.4, someone realizes
that the part temperatures were different for the three lots at a critical point in the
process. They decide to run an experiment by making parts at different temperatures.
n = 12 parts were made at each of 20, 25, and 30°C in completely randomized order and the data
are shown in Figure 4.5c. They use linear regression to fit a line to the data and obtain
y = 60 + 3T where T is the temperature. The errors calculated from the difference
between the observed values and the predicted values (that is, the fitted line) are
approximately normal and have se ≈ 10. From this information, identify the data,
model, and the error statement.
Solution: The data are the 36 paired (Ti, yi) observations. The model is given
by the line y = 60 + 3T. The error statement is, “The errors are normally distributed with
constant standard deviation se ≈ 10.”
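The analysis in Example 4.5 can be reproduced in simulation. The data below are generated from the reported model y = 60 + 3T with normal errors of standard deviation 10, and the regression recovers estimates close to those values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Twelve parts at each of the three temperatures, as in Example 4.5,
# simulated from the reported model y = 60 + 3T with sigma = 10.
T = np.repeat([20.0, 25.0, 30.0], 12)
y = 60.0 + 3.0 * T + rng.normal(0.0, 10.0, size=T.size)

slope, intercept = np.polyfit(T, y, deg=1)
residuals = y - (intercept + slope * T)
s_e = residuals.std(ddof=2)   # two parameters estimated, so 34 df

print(round(slope, 2), round(intercept, 1), round(s_e, 1))
```

Because the simulated errors are random, the estimates vary from run to run around the true values 3, 60, and 10; that scatter is exactly the sampling variation the error statement describes.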

Models involving quantitative predictors are written in the form of an equation.


These models may be empirical or based on first principles depending on the needs of
the analyst. The goal of an empirical model is to provide an accurate description of the
response independent of the physical mechanisms that cause the predictors to affect
the response. Empirical models tend to be arbitrary. A model based on first principles
gets its functional form from the mechanistic theory that relates the predictors to the
response. First-principles models may be based on very crude to highly accurate ana-
lytical study of the problem. It may be safe to extrapolate a first-principles model but
empirical models should never be extrapolated.
Whether an empirical or first-principles model is fitted to data depends on the
motivation of the experimenter. A scientist who wants to demonstrate that data follow
some theoretical formula will, of course, have to use the first-principles approach. This
may involve some heroics to transform the formula into a form that can be handled by
the software. On the other hand, a manufacturing engineer probably doesn’t care about
the true form of the relationship between the response and the predictors and is usually
willing to settle for an effective empirical model because it gets the job done. He would
still be wise to stay conscious of any available first-principles model because it will
suggest variables, their ranges of suitable values, and other subtleties that might influ-
ence the design of the experiment even if time or model complexity prohibit the use of
the first-principles model. When a first-principles model is available, it is almost

always preferred over an empirical model, even if the empirical model provides a
slightly better fit to the data.

Example 4.6
An experiment is performed to study the pressure of a fixed mass of gas as a func-
tion of the gas volume and temperature. Describe empirical and first-principles models
that might be fitted to the data.
Solution: In the absence of any knowledge of the form of the relationship between
the gas pressure (P) and its volume (V) and temperature (T), an empirical model of the
form:

P = a + bV + cT + dVT (4.4)

might be attempted where a, b, c, and d are coefficients to be determined from the data.
For the first-principles model, the kinetic theory of gases suggests that an appropriate
model would be:

P = aT/V (4.5)

where a is a coefficient to be determined from the data. Although both models might
fit the data equally well, the second model would be preferred because it is suggested by
the theoretical relationship between P, T, and V.
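Because the first-principles model P = aT/V is linear in the single regressor T/V, fitting it is a one-coefficient least-squares problem. The data below are simulated from a = 0.5 (hypothetical units) to show the coefficient being recovered:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated gas data: the true relationship is P = aT/V with a = 0.5
# (hypothetical units), plus normal measurement noise.
T = rng.uniform(250.0, 350.0, size=30)
V = rng.uniform(1.0, 5.0, size=30)
P = 0.5 * T / V + rng.normal(0.0, 1.0, size=30)

# The kinetic-theory model is linear in the regressor x = T/V, so the
# single coefficient comes from least squares through the origin.
x = T / V
a_hat = (x @ P) / (x @ x)
print(round(a_hat, 3))
```

The empirical model of Equation 4.4 would need four coefficients to approximate the same behavior, and unlike the first-principles fit it could not safely be extrapolated outside the studied T and V ranges.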

So why are we so concerned about models? What’s the purpose for building them
in the first place? These questions are also asking about our motivation for doing
designed experiments. The purpose of any designed experiment and its corresponding
model is to relate the response to its predictors so that the response can be optimized by
better management of the predictors. Some of the reasons to build a model are:
• To determine how to maximize, minimize, or set the response to some
target value.
• To learn how to decrease variation in the response.
• To identify which predictor variables are most important.
• To quantify the contribution of predictor variables to the response.
• To learn about interactions between predictor variables.
• To improve the operation of a process by learning how to control it better.
• To simplify complex operating procedures by focusing attention on the most
important variables and by taking advantage of previously unrecognized
relationships between predictor variables and between them and the response.

4.10 SELECTION OF VARIABLE LEVELS


The selection of the variable levels for a designed experiment is a very serious issue.
Many experiments fail because the levels of one or more variables are chosen incor-
rectly. Even when a variable level is chosen badly and much of the data are lost, the
DOE method can often recover lots of information from the surviving data. This is one
aspect of the robustness provided by DOE.

4.10.1 Qualitative Variable Levels


For qualitative variables the choice of levels is not so critical. Just be sure that each level
is practical and can be expected to give valid data. For the epoxy example, do not consider
Manufacturer C in your experiment if you know that their resin has an inherent prob-
lem when used in your process. If, however, you don’t know why their resin is a problem
and it’s much cheaper than the others, you may want to use it in your experiment anyway.
The experiment may show that with the correct choice of other variables, Manufacturer
C’s resin is perfect for the job and will save you money.
Sometimes it’s possible to redefine a qualitative variable as a quantitative variable.
For example, the old classification of Manufacturer A, B, and C for resin would change
if it was determined that the only difference between resins was quantitative, such as if
the resins only differed in wax content, say 1, 1.3, and 3 percent. If this is the case, an
experiment designed to resolve the effects of wax content might predict improved
epoxy performance with two percent wax. Now you’re in a position to compromise and
use the best of resins A, B, and C, or inquire about a special resin with two percent wax, or
mix resins to get two percent wax. Always try to redefine a qualitative variable to make
it quantitative. Even if you don’t choose to analyze it or interpret it in this way, it pro-
vides increased understanding of how the variable behaves.
Sometimes qualitative variables can have only two levels: yes or no. Many process
variables behave like this—you either do the step in the process or you don’t. For exam-
ple, if surface preparation in the epoxy example is done by sanding, then the surface
might be sanded or not sanded.

4.10.2 Quantitative Variable Levels


The selection of levels for quantitative variables can become quite complicated. The
most important issue is the choice of the highest and lowest levels. These levels must
be safe, that is, the product obtained at these levels should be useful or at least the
process should be able to operate at these levels. This tends to force the choice of levels
to be narrow so there’s less risk of losing runs or doing damage. If, however, the levels are
chosen too close together, you may see no difference between them and you may miss
something important outside of the range of experimentation. Experimenters are always
trying to guess the highest and lowest safe levels for variables so that they have a high
likelihood of seeing measurable effects on the responses. This is often a difficult and

nerve-wracking task and it’s very important to include the experts, operators, and man-
agers who are knowledgeable of and responsible for the process because they are the
ones most likely to offer valuable guidance.
To emphasize the importance of picking the highest and lowest levels of quantita-
tive variables in an experiment, suppose that there are five variables in an experiment.
If the safe range of operation for each variable is known, but only one half of that range
is used just to be really safe, then the five-variable experiment will only cover (1/2)^5 =
0.031, or about three percent, of the possible design space. The chances of finding a good
design are significantly reduced by using too narrow a range for the variables. This is a
very common mistake made by novice experimenters.
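This shrinking-coverage arithmetic is easy to check directly. The following sketch (plain Python; the function name is illustrative, not from any DOE library) computes the fraction of the design space covered when each of several variables is restricted to a fraction of its safe range:

```python
def design_space_coverage(fraction_of_range, n_variables):
    """Fraction of the full design space covered when each variable
    uses only the given fraction of its safe operating range."""
    return fraction_of_range ** n_variables

# Using only half of the safe range for each of five variables:
coverage = design_space_coverage(0.5, 5)
print(f"{coverage:.3f}")  # 0.031, i.e. about three percent
```

Note how quickly the coverage collapses as variables are added: with ten variables at half range, less than 0.1 percent of the design space is explored.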
When three or more levels of a quantitative variable are used in an experiment,
there are several ways to choose the spacing between levels. The most common choice
for a three-level quantitative variable is to use three equally spaced levels, often denoted
with the coded values –1, 0, and +1 or just –, 0, and +. For example, if the batch-size
variable in the epoxy example uses three levels of 50, 100, and 150cc, then the levels
are referred to using the codes –1, 0, and +1, respectively.
When the increment between levels is constant, we say that we have a linear scale for
that variable. It is also possible to design level selections using other schemes. For exam-
ple, levels can be selected on the basis of squares (for example, 1, 4, 9) or on a log scale
(for example, 3, 6, 12). In each case, the three levels are still referenced with the codes –1,
0, and +1. The use of special scaling for levels is usually based on the experimenter’s
understanding of the response and its expected dependence on the study variable.
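The coded values are related to the actual levels by a simple linear transformation: subtract the midpoint of the range and divide by the half-range. A minimal sketch, with illustrative function names:

```python
def to_coded(x, low, high):
    """Map an actual level x to a coded value: low -> -1, high -> +1."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (x - center) / half_range

def to_actual(coded, low, high):
    """Inverse transformation: coded value back to actual units."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

# Batch-size example with levels 50, 100, and 150cc:
print(to_coded(50, 50, 150))   # -1.0
print(to_coded(100, 50, 150))  # 0.0
print(to_actual(1, 50, 150))   # 150.0
```

The same transformation applies whether the levels are equally spaced in the original units or on a square or log scale; the coding is always applied to the transformed values.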

4.11 NESTED VARIABLES


Sometimes it is impossible or impractical for the levels of one variable to be expressed
within each level of another variable. In this case we say that one variable is nested within
another. For example, suppose we are interested in a manufacturing process in which
two machines (x1 : A or B) are supposed to be producing the same material and that each
machine has eight heads or stations (x2 : 1 to 8) that are all supposed to perform the
exact same operation. (Since the product that flows into these two machines gets sepa-
rated into 16 separate but hopefully identical channels, this is called a multiple stream
process.) It’s not logical to try to compare pairs of heads with other pairs of heads, such
as the two heads with x2 = 1 on machines A and B with the two heads x2 = 2, since they
are physically different heads. The comparison is just not meaningful. Instead, we say that
heads are nested within machines and treat each head as the unique one that it is. In
order to un-nest the heads it would be necessary to redefine the head variable as having
16 levels (x2 : 1 to 16) and physically move the heads to each of the 16 different posi-
tions on the two machines. Ouch!
Another example of nesting is when two manufacturers (x1 : A or B) are each asked to
provide three different lots of material for evaluation. Someone might choose to identify
each manufacturer’s lots with identification numbers 1, 2, and 3 (x2 : 1, 2, 3), but lot 1 from
manufacturer A has no relationship to lot 1 from manufacturer B other than that they have

the same identification number. It would be just as appropriate, and perhaps clearer, to
identify the lots as 1, 2, and 3 for manufacturer A and 4, 5, and 6 for manufacturer B.
Regardless of how the lots are numbered, they are nested within manufacturers.
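One way to keep nested levels distinct in your records is to assign globally unique labels, as in the 1-through-6 lot numbering just described. A small illustrative sketch (the function name is ours):

```python
def unique_lot_labels(lots_per_source, sources):
    """Assign globally unique lot numbers to lots nested within sources,
    so that no two sources share a lot label."""
    labels = {}
    next_label = 1
    for source in sources:
        for _ in range(lots_per_source):
            labels.setdefault(source, []).append(next_label)
            next_label += 1
    return labels

print(unique_lot_labels(3, ["A", "B"]))
# {'A': [1, 2, 3], 'B': [4, 5, 6]}
```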

4.12 COVARIATES
Figure 4.3 shows that all of the variables in an experiment can be classified as inten-
tionally controlled at desired levels, held constant, or uncontrolled. An uncontrolled
quantitative variable that can be measured during the experiment is called a covariate.
Common covariates are variables like temperature, atmospheric pressure, humidity, and
line voltage. If the covariate has no influence on the response then it is not of any con-
sequence, but in many cases it is unclear if the covariate is important or not. All known
variables that are uncontrolled during the experiment are covariates and should be measured
and recorded. Then when the statistical analysis of the experimental data is performed,
the effect of these covariates can be removed from the response. Generally, a
covariate should have a very small, if not unmeasurable, effect on the response.
If the effect of a covariate becomes too large it can interfere with estimates of the effects
of other variables.
Covariates must be continuous (that is, quantitative) variables. They are always ana-
lyzed using regression methods. For this reason, the word covariate is also used to refer
to quantitative variables that are intentionally varied in the experiment, even if they only
appear at two discrete levels, because they are also analyzed with regression methods.

4.13 DEFINITION OF DESIGN IN DESIGN OF EXPERIMENTS
The word design in the phrase design of experiments refers to the way in which vari-
ables are intentionally varied over many runs in an experiment. Once the experimental
variables are identified and the levels of each variable are chosen, the experiment can
be designed. Usually the experiment design is expressed in the form of two matrices: a
variables matrix and a design matrix. Consider the epoxy example. Suppose the vari-
ables to be considered are batch size, resin manufacturer, and mixing time and that it
has been decided to use two levels for each variable. The following variables matrix
shows one possible way to select variable levels:

Level   x1: Batch size   x2: Resin   x3: Mixing time
–       50cc             A           1 minute
+       150cc            B           3 minutes

The purpose of this matrix is to clearly define the experimental variables and their
levels. Note the use of the generic variable names x1, x2, and x3. Their use permits ref-
erences to variables without knowing their names or the context. (Sometimes the letters

A, B, and C are used instead of x1, x2, and x3. Some people prefer to use x1, x2, . . . to
indicate quantitative variables and A, B, . . . to indicate qualitative variables but there’s
no standardized convention for assigning generic names to variables.)
Now, using the – and + notation, an experiment design is shown in the design
matrix:

Std Run x1 x2 x3
1 4 – – –
2 6 – – +
3 2 – + –
4 7 – + +
5 8 + – –
6 1 + – +
7 3 + + –
8 5 + + +

This experiment has eight runs. The Std or standard order column uses an integer to
identify each unique configuration of x1, x2, and x3. Each row, called a run or a cell of
the experiment, defines a different set of conditions for the preparation of an epoxy
sample. For example, run number 3 is to be made with levels (x1, x2, x3) = (–, +, –) or
with a 50cc batch size, Manufacturer B’s resin, and a one-minute mixing time. This
particular design is called a 2^3 full factorial design because there are three variables,
each at two levels, and the experiment requires 2^3 = 8 runs.
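The standard-order matrix of a 2^n full factorial can be generated mechanically. A sketch using only the Python standard library (the helper name is illustrative):

```python
from itertools import product

def full_factorial_two_level(n):
    """Standard-order design matrix of a 2^n full factorial with coded
    levels -1 and +1. The last variable varies fastest, matching the
    standard order shown in the design matrix above."""
    return [list(run) for run in product([-1, 1], repeat=n)]

design = full_factorial_two_level(3)
print(len(design))  # 8 runs
print(design[2])    # [-1, 1, -1]: standard-order run 3
```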
The standard order column identifies the logical order of the experimental runs. The
actual runs of the experiment must not be performed in this order because of the possi-
bility of confusing one of the study variables with a lurking variable, that is, an uncon-
trolled and unobserved variable that changes during the experiment and might affect the
response. The order of the experimental runs is always randomized, such as the order
shown in the Run or run order column. Randomization doesn’t provide perfect protec-
tion against lurking variables, but it is often effective so we always randomize. Randomi-
zation is so important that it is considered to be part of the experiment design—any
design is incomplete if a suitable randomization plan has not been identified.
The matrix of experimental runs is often organized by standard order in the plan-
ning and analysis stages of DOE, but is best organized by random run order when the
experiment is being constructed. This simplifies the job of the person who actually has
to build the experiment, and decreases the chances of making a mistake.

4.14 TYPES OF DESIGNS


There are many different kinds of experiment designs. Generally they can be classified
into large groups with strange names: factorials, 2^n factorials, fractional factorials,
central composite, Box-Behnken, and Plackett-Burman. There are also hybrid designs
that combine characteristics from two or more of these groups. But as complicated as
all of this sounds, only a handful of designs are used for the majority of experiments.
You’d never guess this from looking at a DOE textbook. The books are always full of
big, elaborate experiment designs because those are the fun ones for the authors to talk
and write about. Figure 4.6 is a Pareto chart that attempts to convey how often specific
designs get built. (The data are fictional and definitely change from experimenter to
experimenter, and technology to technology, but you get the idea.) The figure shows
that fewer than a dozen designs account for nearly all of the experiments that get built,
and all of the remaining cases fall into the “Other” category. This book will attempt to
focus on the “vital few” designs.

[Figure 4.6 here: a Pareto chart of the percentage of experiments built with each
design. In decreasing order of use: one-way classification, 2^2, 2^3, two-way
classification, 2^4, BB(3), 2^(4–1)IV, 2^(5–1)V, 2^(7–4)III, CC(5–1), and Other.]

Figure 4.6 Pareto analysis of experiment designs.

4.15 RANDOMIZATION
It is usually impossible to construct all of the runs of an experiment simultaneously, so
runs are typically made one after the other. Since uncontrolled experimental conditions
could change from run to run, the influence of the order of the runs must be considered.
Even a simple experiment with one variable at two levels would be easiest to build
if all of its runs were done in a convenient order (for example, 11112222); however,
such run order plans run the risk of mistakenly attributing the effect of an unobserved
variable that changes during the experiment to the experimental variable. The accepted
method of protecting against this risk is to randomize the run order of the levels of the
experimental variable (for example, 21121221). By randomizing, the effects of any

unobserved systematic changes in the process unrelated to the experimental variable are
uniformly and randomly distributed over all of the levels of the experimental variable.
This inflates the error variability observed within experimental treatments, but it does
not add bias to the real and interesting differences between treatments. As an example
of this concept, an experiment with a single classification variable with several levels
that are run in random order is called a completely randomized design. Completely ran-
domized designs will be presented in detail in Chapter 5.
Sometimes you must include a variable in an experiment even though you’re not
interested in detecting or making claims about differences between its levels. For
example, an experiment to compare several operators might require so many parts
that raw material for the parts must come from several different raw material lots. If
the lot-to-lot differences are not important, then the experiment could be run using
one lot at a time, one after the other. To be able to make valid claims about differences
between operators, each operator would have to make parts using material from each
lot and the operator order would have to be randomized within lots. For example, if
there were four operators and three material lots, the following run order plan might
be considered:

Run Order 1 2 3 4 5 6 7 8 9 10 11 12
Lot A A A A B B B B C C C C
Operator 2 1 3 4 4 3 2 1 2 4 3 1

In this experiment, called a randomized block design, the raw material lots define
blocks of runs, and operator is the study variable. Because the blocks (that is, the raw
material lots) are not run in random order, it is not safe to interpret any observed dif-
ferences between them because the differences could be due to unobserved variables
that change during the experiment. Since the operators are run in random order, it is safe
to interpret differences between them as being real differences between operators. Even
though the randomized block design has two variables, it is considered to be a one-variable
experiment because claims can be made about only one of the two variables—always
the study variable, which must be run in random order. Despite the loss of information
about differences between the levels of the blocking variable, the use of blocking often
increases the sensitivity of the experiment to differences between the levels of the study
variable.
To summarize:
• If you intend to make claims about differences between the treatment levels of
a variable, then the run order of the treatment levels must be randomized.
• If a variable must be included in an experiment but you don’t intend to make
claims about differences between its levels, then the levels do not need to be
randomized. Instead, the levels of the variable are used to define blocks of
experimental runs.
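A run order plan like the lot/operator table above can be generated by shuffling the study-variable levels separately within each block. A sketch, assuming one replicate of every treatment per block (the function name and seed are ours; the seed is arbitrary, chosen only for reproducibility):

```python
import random

def randomized_block_plan(blocks, treatments, seed=None):
    """Randomized block design: every treatment appears once in each
    block, and the treatment order is re-randomized within each block."""
    rng = random.Random(seed)
    plan = []
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        plan.extend((block, treatment) for treatment in order)
    return plan

# Three raw material lots (blocks) run by four operators (study variable):
plan = randomized_block_plan(["A", "B", "C"], [1, 2, 3, 4], seed=1)
for lot, operator in plan:
    print(lot, operator)
```

Each block contains all four operators exactly once, but the operator order differs from block to block, just as in the table above.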

Example 4.7
A powder drying process introduces contamination that forms insoluble
particles in the dried powder. When the powder is dissolved, these particles eventually
clog a critical filter. An alternative drying schedule is proposed that should reduce the
amount of contamination. Describe a run order plan for an experiment to compare
the amount of insoluble particles formed in the two processes. A sample-size calcula-
tion based on historical process information indicates that 10 observations will be
required from each drying process.
Solution: The simplest way to run the experiment would be to configure the drying
system for the first process and then to complete all 10 trials before reconfiguring
the system for the second process and its 10 trials. The run order would be
11111111112222222222. However, due to possible changes in raw material, tempera-
ture, humidity, concentration of the contaminant, the measurement process, and so on,
during the experiment, the 20 experimental trials should be performed in random order,
such as: 22122121211221111212. Then if one or more unobserved variables do
change and have an effect on the response, these effects will be randomly but uniformly
applied to both treatments and not affect the true difference between the treatments.
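A balanced random run order like the one in the solution can be produced with a few lines of Python (the seed is arbitrary, used only so the result is reproducible):

```python
import random

rng = random.Random(0)             # arbitrary seed, for reproducibility
run_order = [1] * 10 + [2] * 10    # ten trials of each drying process
rng.shuffle(run_order)             # complete randomization of all 20 trials
print("".join(str(r) for r in run_order))
```

Any such shuffle is acceptable; the essential points are that the two treatments stay balanced and that the order is left to chance.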

Example 4.8
Suppose that the three-variable epoxy experiment described in Section 4.13 was
built in the standard run order indicated in the table and that a significant effect due to
x1 was detected. What can you conclude about the effect of x1 on the response from this
experiment?
Solution: Because the experimental runs were not performed in random order,
there is a chance that the observed effect that appears to be caused by x1 is really due
to an unobserved variable. No safe conclusion can be drawn from this experiment about
the effect of x1 on the response. Although the experiment design is a good one, its use-
fulness is compromised by the failure to randomize the run order.

Example 4.9*
A student performed a science fair project to study the distance that golf balls trav-
eled as a function of golf ball temperature. To standardize the process of hitting the golf
balls, he built a machine to hit balls using a five iron, a clay pigeon launcher, a piece
of plywood, two sawhorses, and some duct tape. The experiment was performed using
three sets of six Maxfli golf balls. One set of golf balls was placed in hot water held at
66C for 10 minutes just before they were hit, another set was stored in a freezer at –12C
overnight, and the last set was held at ambient temperature (23C). The distances in
yards that the golf balls traveled are shown in Table 4.1, but the order used to collect
the observations was not reported. Create dotplots of the data and interpret the differ-
ences between the three treatment means assuming that the order of the observations

* Source: “The Effect of Temperature on the Flight of Golf Balls.” John Swang. The National Student Research
Center. Used with permission.

Table 4.1 Flight distance of golf balls versus temperature.

                  Trial
Temp       1      2      3      4      5      6
66C      31.50  32.10  32.18  32.63  32.70  32.00
–12C     32.70  32.78  33.53  33.98  34.64  34.50
23C      33.98  34.65  34.98  35.30  36.53  38.20

[Figure 4.7 here: dotplots of flight distance for the normal, cold, and hot
temperature treatments on a common axis from 31.4 to 38.4 yards.]

Figure 4.7 Golf ball distance versus temperature.

was random. How does your interpretation change if the observations were collected in
the order shown—all of the hot trials, all of the cold trials, and finally all of the ambi-
ent temperature trials?
Solution: Dotplots for the distances versus the three golf ball temperature treat-
ments are shown in Figure 4.7. The dotplots and Tukey’s quick test suggest that the
treatment means are all different from each other and that the balls at ambient temper-
ature traveled the farthest. If the order of the observations was indeed random, then this
conclusion is probably justified; if, however, the observations were taken in the order
shown: hot, cold, then normal temperature, the steady increase in the distance suggests
that something else might have been changing during the experiment that caused the
golf balls to travel farther on later trials. If the run order was not randomized then it’s
not safe to conclude from these data that golf balls are sensitive to temperature. Given
the lack of information about the run order, this experiment cannot be used to support
claims about the effect of temperature on golf ball flight distance.
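The treatment means behind Figure 4.7 are easy to compute from the data in Table 4.1. A short sketch (the treatment labels are ours):

```python
distances = {  # data from Table 4.1, in yards
    "hot (66C)":    [31.50, 32.10, 32.18, 32.63, 32.70, 32.00],
    "cold (-12C)":  [32.70, 32.78, 33.53, 33.98, 34.64, 34.50],
    "normal (23C)": [33.98, 34.65, 34.98, 35.30, 36.53, 38.20],
}
for treatment, values in distances.items():
    print(treatment, round(sum(values) / len(values), 2))
# hot ~32.19, cold ~33.69, normal ~35.61 yards: the means clearly differ,
# but without a randomized run order the differences cannot safely be
# attributed to temperature
```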

There are low-tech and high-tech ways to determine the random order of runs for
an experiment. When the experiment is presented in its logical or standard order, each
cell is assigned a number indicating its position in the experiment: 1, 2, 3, and so on.
Low-tech randomization can be done by writing those numbers on slips of paper, one
number per slip, and pulling them from a hat to determine the random run order. Decks
of cards and dice can also be used.
High-tech randomization uses a computer with random number generating capabil-
ity to assign the run order for the cells. MINITAB can be used to create a random run
order for an experiment with some of the tools from its Calc> Random Data menu. For
example, the Calc> Random Data> Sample from Columns function can be used to

sample, without replacement, from the column showing the experiment’s standard order
into a new column for the random order. For convenience, the experiment design work-
sheet should then be sorted (use Data> Sort) by the random order column so that the
runs are shown in the correct randomized order.
It’s important to validate your randomization plan before beginning an experiment.
This can be done by analyzing your experiment using the run order as the response. If
any of the design variables are found to be good predictors for the random order of the
runs, then the randomization wasn’t effective and should be performed again.
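A quick, informal version of this check for a two-level design with coded ±1 levels is to look at the correlation between each design column and the run order; correlations near zero suggest the randomization is acceptable. A sketch using the x1 column and random run order of the 2^3 design shown earlier in this chapter:

```python
def correlation(x, y):
    """Pearson correlation coefficient (standard library only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# x1 column of the 2^3 design in standard order, paired with the
# random run order from the design matrix
x1 = [-1, -1, -1, -1, 1, 1, 1, 1]
run_order = [4, 6, 2, 7, 8, 1, 3, 5]
print(round(correlation(x1, run_order), 2))  # -0.11: comfortably near zero
```

A full analysis would fit the complete model with run order as the response, but this one-column-at-a-time check catches the worst randomization failures.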
Randomization is often hard to do and painful but you have no choice—you must
randomize. Randomize any variable that you intend to make claims about and use vari-
ables that you can’t or don’t need to randomize to define blocks. It’s often easy and
tempting to compromise the randomized run order. Don’t! Be especially careful if
someone else is running the experiment for you. Make sure that they understand that
the runs must be performed in the specified order. If it is ever necessary to deviate
from the planned run order of an experiment, make sure to keep careful records of
the actual run order used so that the effect of the change in run order can be assessed.

4.16 REPLICATION AND REPETITION


The design matrix of an experiment determines what terms the model will be capable
of resolving, but the sensitivity of the analysis to small variable effects is determined by
the number of times each experimental run is built. Generally, the more times the runs
of an experiment design are built the greater will be the sensitivity of the experiment.
There are two different ways that the runs of an experiment design can be repeated.
When consecutive units are made without changing the levels of the design variables
between units, these like units are called repetitions. When two or more like units are
produced in an experiment, but at different times spaced throughout the experiment and
not as consecutive units, these like units are called replicates.
DOE novices usually have difficulty using and understanding the word replicate
because it is used as both a noun and a verb and is even pronounced differently in the
two cases. As a noun, the word replicate (-cate rhymes with kit) is used to refer to each
set of unique runs that make up a complete experiment design. As a verb, we replicate
(-cate rhymes with late) an experiment design by building replicates.
At first it might seem that the use of repetitions and replicates would give similar, if
not identical, results, but that is usually not the case. Indeed, the values of the response
for both repetitions and replicates will be nearly identical if the process is stable, but
replication almost always leads to greater variation in the response due to changes in
uncontrolled variables. Despite this apparent disadvantage of replication over repetition,
replication generally provides a more realistic measure of the inherent noise in the
process and is the preferred way to increase the number of runs in an experiment. The
difference in the values associated with repetitions and replicates is made clear by how
they are treated in statistical analyses; repeated runs are averaged whereas individual
replicated observations are preserved, so repetitions do comparatively little to increase
the sensitivity of an experiment to small variable effects.

The number of replicates required for an experiment is often chosen arbitrarily,
based on historical choices or guidelines, but should instead be determined by an
objective sample-size calculation similar to those performed in Chapter 3. The inputs required
to complete such sample size calculations are: 1) an estimate of the inherent error vari-
ation in the process, 2) the size of the smallest variable effect considered to be practi-
cally significant, and 3) knowledge of the model to be fitted to the experimental data.
When one or more of these values are unknown, they should be estimated by consider-
ing prior experience with similar processes, information from preliminary experiments,
or expert opinion.
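As a rough illustration of such a calculation, a common normal-approximation shortcut for comparing two treatment means is n ≈ 2((z_a + z_b)σ/δ)² replicates per treatment, where σ is the inherent error variation and δ is the smallest practically significant effect. The sketch below is only this approximation, not a substitute for the exact methods of Chapter 3:

```python
from math import ceil

def replicates_per_treatment(sigma, delta, z_alpha=1.96, z_beta=0.84):
    """Approximate replicates per treatment needed to detect a difference
    delta between two treatment means, given error standard deviation
    sigma. The default z values correspond roughly to a five percent
    two-sided alpha and 80 percent power (normal approximation)."""
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# e.g., noise sigma = 1.0 and smallest important effect delta = 1.5:
print(replicates_per_treatment(1.0, 1.5))  # 7 replicates per treatment
```

Notice how sensitive the answer is to the ratio σ/δ: halving the detectable effect quadruples the required number of replicates.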
It is very important that the number of replicates per cell in the experiment design
be held constant. An unequal number of replicates throws off the balance of the exper-
iment and can lead to biased or even incorrect conclusions. The balance of the experi-
ment is referred to as its orthogonality and we say that an unbalanced experiment has
suffered some loss of orthogonality. There is a rigorous mathematical meaning to the
term orthogonality, but we will only use the term in a binary sense to address the issue
of whether the experiment is balanced or not. Often an experiment intended to have an
equal number of replicates suffers from some loss of units due to breakage or early
product failure. Some experiments, especially those that are replicated several times,
can tolerate lost units, but recovering the integrity of an experiment with a substantial
loss of orthogonality requires finesse and experience. Some recommendations on how
to deal with a few lost units will be presented in this book, but plan on replacing lost
units if it’s at all possible, and consult with your neighborhood statistician when it’s not.
Some experiments can be fractionally replicated. Fractionally replicated experi-
ments use only certain runs, such as one half or one quarter of all of the possible runs.
If they are carefully designed, these experiments can be very efficient but they do have
limitations, such as the inherent confounding (or aliasing) of variables and interactions.
If the confounding is managed correctly, a fractionally replicated experiment can pro-
vide most of the information that a fully replicated experiment would reveal.
The randomization of replicates can take two forms: complete randomization and
limited randomization. In complete randomization, all runs of all replicates are eligible
to be run at any time. With limited randomization, all of the runs of each replicate are
completed before the next replicate is started, with the runs within each replicate
performed in random order. This approach, called blocking on replicates, has the
advantage that biases between the blocked replicates, which would otherwise be attributed to
experimental noise, can be isolated in the experiment analysis. By isolating the block
biases the experimental error is reduced which increases the sensitivity of the experi-
ment to small variable effects. This advantage makes blocking on replicates the pre-
ferred practice over complete randomization.
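Blocking on replicates can be expressed as a small variation on complete randomization: shuffle the run order within each replicate rather than across the whole experiment. An illustrative sketch for three replicates of an eight-run 2^3 design (the seed is arbitrary):

```python
import random
from itertools import product

rng = random.Random(2)                    # arbitrary seed
runs = list(product([-1, 1], repeat=3))   # the eight runs of a 2^3 design

plan = []
for replicate in (1, 2, 3):               # three replicates = three blocks
    block = list(runs)
    rng.shuffle(block)                    # randomize within the block only
    plan.extend((replicate, run) for run in block)

# each block contains every run exactly once, in its own random order
for replicate in (1, 2, 3):
    block_runs = [run for rep, run in plan if rep == replicate]
    assert sorted(block_runs) == sorted(runs)
print(len(plan))  # 24 runs in three blocks of eight
```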

4.17 BLOCKING
Often while preparing a designed experiment to study one or more variables, another
important variable, a nuisance variable, is identified that cannot be held constant or

randomized. If this nuisance variable’s level changes during the experiment and has
a corresponding effect on the response, these changes will inflate the noise in the
experiment, making it less sensitive to small but possibly important differences
between the levels of the study variables. Rather than tolerate the additional noise
introduced by this nuisance variable, the experimental runs should be built in subsets
called blocks, where each block uses a single level of the nuisance variable. The usual
method of assigning runs to blocks is to build one or more complete replicates of the
experiment design within each block. Then when the statistical analysis of the data is
performed, the blocking variable is included in the model so that any differences
between the nuisance variable’s levels are accounted for. This approach isolates the
variation caused by the nuisance variable and recovers the full sensitivity of the exper-
iment design.
Although a blocking variable is included in the statistical analysis of an experiment,
we usually don’t test to see if there are differences between its levels. Such tests would
be unsafe: because the levels of the blocking variable are not typically run in random
order, there may be other unidentified variables changing during the experiment
that are the real cause of the apparent differences between levels. When a study variable
cannot be randomized and must be run in blocks, we must be very careful to guarantee
that the experimental conditions are as constant as possible and we must stay conscious
of the risk that our conclusions about differences between blocks might be wrong. If we
really need to determine if there are differences between the levels of some variable,
then we have no choice—its levels must be run in random order. If we don’t need to
determine if there are differences between the levels of some variable, then we can treat
it as a blocking variable.
We almost always want to construct the runs of our experiments in random order;
however, in many cases this is impossible or impractical. For example, in the epoxy
problem, imagine that a large-scale operation required a full day to clean out all of the
equipment to make the change from resin A to resin B. Running the experiment with a
random choice of resin from run to run is desirable from an experimental standpoint,
but the many days required to change resins is definitely not practical. One choice is to
perform all of the experimental runs with one resin before switching to another. Then
only one changeover is required and the experiment will be completed quickly. In this
case, we say that the resin variable is run in two blocks and that resin is a blocking vari-
able. The danger of this approach is that if a steady trend or shift occurs in the process
during the experiment that is unrelated to the differences between resins A and B, the
trend will be misattributed to differences between the resins. If the purpose of the exper-
iment is to study variables other than resin, then it is appropriate to run resin in blocks.
But if the purpose of the experiment is to measure the differences between the resins,
then there is no choice—resins must be run in random order. This is always the case for
blocking variables—effects attributed to blocking variables may have other causes and
until the variable is run in random order you cannot be certain of the real cause of the
observed effect.
It is common to have several nuisance variables dealt with in blocks in a single
experiment in order to study a single independent variable. Even though an experiment

might contain many variables, it still is often called a single-variable experiment
because only one variable is randomized and all the others are used to define blocks.
Such an experiment is still considered to be a one-variable experiment because there’s
only one variable for which safe conclusions can be drawn.
When an experiment gets so large that it cannot be completed at one time, it should
be built in blocks defined, for example, by days or shifts. Then if there are differences
between the blocks, the differences can be accounted for in the analysis without
decreasing the sensitivity of the original design. When even a single replicate of a
design is too big to build at one time, the design can often be broken into fractional
replicates, such as two half-fractions or four quarter-fractions, so that the number of
runs in each block defined by a fractional replicate is more reasonable in size.
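For example, a single replicate of an eight-run two-level design in three variables can be split into two half-fraction blocks using the sign of the three-factor interaction as the defining contrast. A minimal Python sketch (the defining contrast shown is a common choice used here for illustration, not a prescription from the text):

```python
# Split a 2^3 design into two half-fraction blocks using the sign of
# the x1*x2*x3 interaction as the defining contrast.
runs = [(x1, x2, x3) for x1 in (-1, 1) for x2 in (-1, 1) for x3 in (-1, 1)]

block1 = [r for r in runs if r[0] * r[1] * r[2] == +1]
block2 = [r for r in runs if r[0] * r[1] * r[2] == -1]

print(len(block1), len(block2))  # 4 4
```

Each half-fraction can then be built and analyzed as one block.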
In some designs, blocks can be analyzed independently as they are built and then
the results of several blocks can be combined to improve the analysis. Another advan-
tage of blocking is that occasionally a single block or just a few blocks of a large exper-
iment might be built, analyzed, and found to answer all relevant questions so that the
remaining blocks do not have to be constructed. Use this to your advantage! There is no
better way to earn brownie points with management than to announce that the last half
of an experiment does not have to be built and that a process can again be used to pro-
duce saleable product instead of experimental units.

Example 4.10
Describe a blocking and randomization plan for the three-variable eight-run exper-
iment design from Section 4.13 if the experiment requires three replicates and only
twenty runs can be completed in a day.
Solution: Since the full experiment requires 24 runs and only 20 runs can be com-
pleted in a day, it is necessary to build the experiment over at least a two-day period.
To account for possible differences between the morning of the first day, the afternoon
of the first day, and the morning of the second day, the experiment will be built in three
blocks. Each block will contain a single replicate of the eight-run experiment design
with the runs within blocks in random order. Table 4.2 suggests a possible blocking and
randomization plan. The numbers in the columns for blocks 1, 2, and 3 indicate the run
order within the blocks.
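A plan like the one in Table 4.2 can be generated with a few lines of code. This sketch (an illustration, not the book's software) draws an independent random run order for each block, so its output will match the structure of the table but not its exact numbers:

```python
import random

# The eight runs of the 2^3 design, listed in the standard order of Table 4.2.
runs = [(x1, x2, x3) for x1 in "-+" for x2 in "-+" for x3 in "-+"]

# Assign each block its own random run order 1..8.
for block in (1, 2, 3):
    order = random.sample(range(1, 9), 8)
    print(f"Block {block}:")
    for (x1, x2, x3), position in zip(runs, order):
        print(f"  {x1} {x2} {x3}  run order {position}")
```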

Example 4.11
Suppose that management decides that the experiment from Example 4.10 will take
too much time to complete. As a compromise, they decide that all of the runs from resin
manufacturer A will be done first, followed by all of the runs from manufacturer B, so
that the experiment can be completed in a single day. Describe a blocking and ran-
domization plan for the new experiment and discuss how the analysis and conclusions
will differ from the original plan.
Solution: The new experiment will be built with two blocks of twelve runs each,
defined by manufacturer (x2). The study variables will be batch size (x1) and mixing time
(x3). This two-variable experiment requires 2 × 2 = 4 runs per replicate. Since each

Table 4.2 Blocking and randomization plan for a 24-run experiment in three blocks.
Block
Run x1 x2 x3 1 2 3
1 – – – 3 8 5
2 – – + 2 1 3
3 – + – 8 7 6
4 – + + 5 5 7
5 + – – 4 6 4
6 + – + 6 2 8
7 + + – 7 4 1
8 + + + 1 3 2

block will contain twelve runs there will be three replicates per block. If the experi-
mental conditions within blocks are expected to be stable, then the twelve runs within
each block could be completely randomized. If other variables may cause differences
within the original two blocks, however, then each block should consist of three sub-
blocks defined by replicates of the four-run experiment. Generally, the latter choice is
preferred.
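The compromise plan, two manufacturer-defined blocks each holding three randomized four-run sub-blocks, might be sketched as follows (the layout illustrates the preferred choice above and is not the book's code):

```python
import random

runs = [(x1, x3) for x1 in (-1, 1) for x3 in (-1, 1)]  # batch size, mixing time

plan = []
for resin in "AB":            # two blocks defined by manufacturer (x2)
    for sub in (1, 2, 3):     # three sub-blocks, one replicate each
        for pos, i in enumerate(random.sample(range(4), 4), start=1):
            plan.append((resin, sub, pos) + runs[i])

print(len(plan))  # 24 runs in total
```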

4.18 CONFOUNDING
Sometimes, by accident or design, an experiment is constructed so that two variables
have the same levels for each run in the experiment; that is, if our variables are x1 and
x2, then x1 = x2. When this happens, it becomes impossible to separate the effects of the
two variables. It’s like having two input knobs on a machine that are always locked
together. When one is changed, the other changes with it so that the true cause of any
change in the output cannot be determined. When two variables are coupled or locked
together like this we say that the two variables are confounded or aliased. Confounding
should be avoided when possible, but sometimes it’s necessary to design an experiment
with some confounding of the variables. For example, there are certain designs where
a variable is intentionally confounded with an interaction, such as x1 = x2x3.
Variables can still be confounded without having exactly the same settings. Suppose
that one variable x1 has levels ±1 and a second variable x2 has corresponding levels ∓1.
That is, whenever the first variable is +1, the second is –1, and vice versa. A concise
way of writing this relationship is x1 = –x2. These two variables are still confounded
with each other because the settings of one variable determine the settings of the other.
Confounding is an issue of the ability of one variable to predict another. Ideally we want
our experimental variables to be independent of each other, that is, no variable should
be predictable from another variable or combination of variables. We design experi-
ments so that the variables are independent, that is, not confounded.
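With coded ±1 levels, one simple numeric check of confounding between two columns is their dot product: zero means the columns are orthogonal (independent), while a magnitude equal to the number of runs means perfect confounding. A minimal sketch using invented columns:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Columns of a 2^2 full factorial are orthogonal:
x1 = [-1, -1, +1, +1]
x2 = [-1, +1, -1, +1]
print(dot(x1, x2))  # 0: independent (not confounded)

# A pair with x2 = -x1 on every run is perfectly confounded:
x1c = [-1, +1, -1, +1]
x2c = [+1, -1, +1, -1]
print(dot(x1c, x2c))  # -4: magnitude equals the number of runs
```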
Confounding of variables is not a simple binary state. Two variables can be inde-
pendent of each other (that is, not confounded), perfectly confounded with each other,
or they can fall in some intermediate state between the two extremes. Some small
degree of confounding is tolerable under most circumstances, but large amounts of con-
founding can cause problems. One way that minor confounding can appear in an experiment
is when a well-designed experiment, in which all of the variables are independent,
loses some experimental runs. There are safe ways to handle some missing data, but an
experiment with lots of missing runs will have to be supplemented with new runs or run
again from scratch.
The order of the runs in an experiment is another variable that is always present but
often overlooked. When the levels of a variable change in some systematic way from
the start to the end of the experiment (for example, AAAABBBBCCCC) we say that
the variable is confounded with the run order. When a variable is confounded with the
run order, it is unsafe to attribute an observed effect to that variable because another
unobserved variable, called a lurking variable, that changes during the experiment
could have been the real cause of the effect. Because we can never be certain that there
are no lurking variables present, we must assume that they are there and protect our-
selves from them. We do this by randomizing the run order so that the effects of any
lurking variables are not confounded with the experimental variables.
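Run-order confounding can be checked numerically before any runs are made by correlating the run sequence with each coded design column. A hedged sketch (the eight-run design and the orders shown are illustrative):

```python
import random

# Coded columns of an eight-run 2^3 design, in standard (unrandomized) order.
design = [(x1, x2, x3) for x1 in (-1, 1) for x2 in (-1, 1) for x3 in (-1, 1)]
x1 = [row[0] for row in design]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return sab / (sa * sb)

# Building the runs in standard order confounds x1 with run order:
print(round(pearson(x1, list(range(1, 9))), 2))  # 0.87

# A randomized run order breaks that association (on average):
order = list(range(1, 9))
random.shuffle(order)
print(round(pearson(x1, order), 2))
```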

4.19 OCCAM’S RAZOR AND EFFECT HEREDITY


Most statistical analyses of data from designed experiments involve many model terms.
Some of these terms are main effects, some are two-factor or higher-order interactions,
and when there are quantitative variables the model may also include quadratic and
other terms. Usually the first model fitted to an experimental data set
includes all possible terms in the model, but many of these terms turn out to be statisti-
cally insignificant. Rather than reporting the full model with all of its inherent com-
plexity, we usually fit a simplified or reduced model including just those terms that are
statistically significant. This practice comes from a recommendation by the 14th-century
philosopher William of Occam, who basically said, “The simplest model that explains the
data is probably the best model.” This adage is called Occam’s razor. Occam’s razor says
that if we have to choose between two models for the same data set, one more complex
than the other, the simpler of the two models is more likely to be the correct one. For
example, suppose that two models are fitted to the data from a scatter plot where the
first model is a simple linear model and the second model contains a quadratic term. If
both models fit the data equally well, then the linear model is preferred over the qua-
dratic model because it is simpler.
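The linear-versus-quadratic comparison can be sketched numerically with a pure-Python least-squares fit via the normal equations. The data below are invented for illustration; the point is only that when the quadratic fit barely improves the error sum of squares, Occam's razor favors the linear model:

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for i in range(n):                      # forward elimination
        for j in range(i + 1, n):
            f = a[j][i] / a[i][i]
            a[j] = [ajk - f * aik for ajk, aik in zip(a[j], a[i])]
            b[j] -= f * b[i]
    c = [0.0] * n
    for i in reversed(range(n)):            # back substitution
        c[i] = (b[i] - sum(a[i][k] * c[k] for k in range(i + 1, n))) / a[i][i]
    return c                                # c[0] + c[1]*x + ...

def sse(xs, ys, c):
    return sum((y - sum(ck * x ** k for k, ck in enumerate(c))) ** 2
               for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]            # roughly y = 2x plus noise

linear = polyfit(xs, ys, 1)
quad = polyfit(xs, ys, 2)
print(round(sse(xs, ys, linear), 3), round(sse(xs, ys, quad), 3))
```

The quadratic model's error is only trivially smaller here, so the simpler linear model would be reported.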
Another important concept related to Occam’s razor is the concept of effect heredity.
Effect heredity appears in the context of interactions between variables. Effect heredity
says that it’s more likely that a two-factor interaction will be significant if both of its fac-
tors are significant, it’s less likely that a two-factor interaction will be significant if only
one of its factors is significant, and it’s unlikely that a two-factor interaction will be sig-
nificant if neither of its factors is significant. This concept becomes especially important
when we interpret analyses from some of the designs from Chapter 10.

4.20 DATA INTEGRITY AND ETHICS


DOE is a data-driven decision-making tool. The advantage of using data to make deci-
sions is that data are objective—or at least they are supposed to be. Since one or more
people usually have access to experimental data and are in a position to knowingly or
unknowingly edit, censor, or bias observations, it is possible that the objectivity of the
data will be compromised. Everyone who has contact with the data must understand that
it is absolutely critical to preserve its integrity.
If you recognize the name David Baltimore it should be because he shared the 1975 Nobel
Prize in Physiology or Medicine for his work in virology, but it’s more likely that you would recognize his name because
it’s forever linked to an infamous case where the integrity of experimental data was
called into question. In 1983, David Baltimore got funding from the National Institutes
of Health (NIH) for an important experiment in immunology. The lab work was so dif-
ficult that it was performed at two different labs at MIT, one operated by David Baltimore
and the other operated by Theresa Imanishi-Kari. Baltimore did not personally super-
vise or perform any of the lab work himself, but he was the most prestigious author of
the paper that reported the experimental results (Weaver, 1986). After the results were
published, another researcher working in Imanishi-Kari’s lab questioned the validity of
some of the experimental data. A hearing was held at Tufts University Medical School
that exonerated Baltimore, Imanishi-Kari, and the people who worked in their labs, but
then other people picked up and expanded on the initial accusation. The accusations
grew from simple mismanagement of the data to claims of malicious manipulation and
fabrication of data, which carried a criminal fraud charge. Further investigations were
carried out at MIT; the NIH; a new special operation, created in part by the Baltimore
case, at NIH called the Office of Scientific Integrity (OSI); the congressional subcom-
mittee on oversight and investigation of the House Energy and Commerce Committee,
which was responsible for funding the NIH; and eventually a reorganized version of the
OSI called the Office of Research Integrity (ORI) in the Department of Health and Human
Services. Throughout these investigations, many of them badly managed, Baltimore and
Imanishi-Kari were crucified by the press and shunned by the scientific community.
Nine years after the initial challenge to their Cell paper, a special panel appointed by the
ORI Departmental Appeals Board dropped all charges against Baltimore and Imanishi-
Kari. They acknowledged that there was some sloppy record keeping in lab notebooks
and some misleading descriptions of the methods used in the Cell paper but there was
no evidence of fraud and no indication that the claims in the paper were in error. In the
absence of all of the attention, whatever mistakes were made in the Cell paper would
have been resolved by the normal progression of science. Ironically, David Baltimore’s
Nobel Prize that drew special attention to the infamous Cell paper will always be his
second claim to fame after his unfortunate role in this story.
During my junior year of college, I performed a lab experiment in physical elec-
tronics with my lab partner, Steve. The experiment was very complex so Steve and I
split up the task of recording the variable settings and the corresponding response. A
week later—the night before the lab report was due, of course—we discovered that we
had both forgotten to record one of the independent variables that we adjusted during
the experiment. Without the missing values we couldn’t complete the lab report. Being
the creative students that we were, and having few other choices, Steve and I created a
graph of the theoretical relationship between the response and the independent vari-
ables. Then we plotted fictitious points along this curve and worked backward to create
the column of missing settings. Of course we drew the points along the curve with lots
of experimental error. We didn’t want to make unreasonably strong claims about the
relationship—only that the relationship was roughly as the theory predicted.
A few days after turning in our reports, Steve and I were called to a special meet-
ing with the lab professor to discuss our results. When we met with him, Steve and I
were initially relieved to find out that our fakery hadn’t been detected, but then shocked
to realize that the professor was completely thrilled with our work! No other students
had ever successfully obtained the expected relationship between the response and the
independent variables! The experiment had essentially been a study of noise! We
quickly admitted that we had faked the data and luckily got away with passing grades
and a verbal reprimand; however, in any other environment it’s unlikely that the com-
mission of such an act would be dealt with so lightly. Nowadays, in most workplaces,
someone caught faking data would probably be fired, maybe even prosecuted, and cer-
tainly ostracized by their peers like Baltimore and Imanishi-Kari.
The moral of these stories is that you should not fake or in any other way compro-
mise the integrity of your data. If you do, you put yourself and your whole organization
at risk and you will probably be caught and held accountable. Whether you get caught
or not, you will certainly go to data hell. Like statistics hell, the line to get into data hell
is very long and we will probably see each other there.

4.21 GENERAL PROCEDURE FOR EXPERIMENTATION


The following procedure outlines the steps involved in planning, executing, analyzing,
and reporting an experiment:
1. Prepare a cause-and-effect analysis of all of the process inputs (variables) and
outputs (responses).
2. Document the process using written procedures or flowcharts.
3. Write a detailed problem statement.
4. Perform preliminary experimentation.
5. Design the experiment.
6. Determine the sample size and the blocking and randomization plan.
7. Run the experiment.
8. Perform the statistical analysis of the experimental data.
9. Interpret the statistical analysis.
10. Perform a confirmation experiment.
11. Report the results of the experiment.
Each of these steps is described in detail in the following sections. The descriptions
include a list of the activities that must be considered, recommendations for who should
be involved in these activities, and an estimate of how much time is required. The time
estimates are appropriate for someone proficient in DOE methods. Novices will proba-
bly take longer to complete each step if they have to do any research and review.
The cast of characters involved in the DOE process and their primary functions on
the DOE team are:
• The DOE project leader, who has the primary responsibility for the project.
• The operators, who run the process.
• The technicians or machine adjusters, who maintain the equipment and
implement significant process changes or upgrades.
• The design engineer, who has knowledge of how the product is supposed
to work.
• The process engineer, who has knowledge of how the process is supposed
to run.
• The manager/customer, for whose benefit the experiment is being performed.
• The DOE statistical specialist, who will support the DOE project leader on
complicated statistical issues as necessary.
Table 4.3 provides a summary of which DOE team members are required for each
activity.

4.21.1 Step 1: Cause-and-Effect Analysis


The first step in preparing for a new designed experiment or recovering from a poorly
implemented one is to complete a cause-and-effect analysis. The purpose of this analy-
sis is to create a catalog of all of the possible variables that affect the process. The
traditional variable categories (methods, manpower, machines, material, and environment)
provide a good starting point. All variables, including ones that are not and cannot be
included in the experiment, should be added to the list. Try to identify every possible
source of variation. If late in an experiment you discover that you’ve overlooked an
important variable, you should have at least listed the source of the problem in your
cause-and-effect analysis.
In addition to creating a list of the input variables, it’s also important to create a
complete list of all of the possible responses. Although most experiments tend to focus
on a single response, there are usually secondary responses that must at least meet some
constraints, if not aggressive performance requirements. A modified cause-and-effect
Table 4.3 Cast of characters and their responsibilities.
Activity                     | Project Leader | Operators | Technicians | Design Engineer | Process Engineer | Manager/Customer | Statistical Specialist
1. Cause-and-effect analysis | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
2. Document the process      | ✓ | ✓ | ✓ | ✓ | ✓ | |
3. Problem statement         | ✓ | Review | Review | Review | Review | Review | Review
4. Preliminary experiment    | ✓ | ✓ | ✓ | ✓ | ✓ | |
5. Design the experiment     | ✓ | | | | | | Support
6. Randomization plan        | ✓ | | | | | | Support
7. Run the experiment        | ✓ | ✓ | ✓ | ✓ | ✓ | |
8. Analyze the data          | ✓ | | | | | | Support
9. Interpret the model       | ✓ | | | | | | Support
10. Confirmation experiment  | ✓ | ✓ | ✓ | | | |
11. Report the results       | ✓ | | | Review | Review | Review |
diagram, such as the example in Figure 4.2, page 95, is a good way to incorporate the
variables and responses into one document.
The initial cause-and-effect diagram that you create to document a process should
evolve as your understanding of the system changes and improves. Try to keep photo-
copies of the most current diagram handy so you can add to it as you get new ideas. The
modified cause-and-effect diagram is also a great way to get a new DOE team member
up to speed on the process or to show an anxious manager that you have command of
the situation.
Although an initial modified cause-and-effect diagram can be started by a single
person, most likely the DOE project team leader, it is essential that all of the people
involved in the process contribute to this step. Inputs from the operators and technicians
who run the process on a daily basis are critical because they are often the only ones
aware of special but important variables. The design and process engineering people are
important because they must provide the more technical and theoretical viewpoint. The
manager of the organization that owns and operates the process should be involved to
make sure that all of the requirements of the process, including broader requirements
that might not be known or obvious to the others, are addressed. If the customer of the
process cannot be involved or consulted at this stage, the manager is also responsible
for representing his or her viewpoint.
The initial creation of the modified cause-and-effect diagram can happen in a rela-
tively short period of time, but the document is rarely complete in its early stages. It is
just so easy to overlook secondary or even important variables or responses that you
must expect to spend quite a bit of time spread out over several days or even weeks to
develop a complete analysis. And always update this document on a regular basis as
new variables and interpretations of the system are discovered.

4.21.2 Step 2: Document the Process


The process to be studied should be documented in the form of written procedures or
flowcharts. The documentation should be sufficiently complete that someone unfamil-
iar with the particular process but reasonably skilled in the art could operate the process
and reproduce the results of an experiment. If this documentation doesn’t exist, it is well
worth taking the time to create it. If the documentation already exists, it is wise to
review it carefully for discrepancies between the perceived and real processes. It’s likely
that the problems or issues that instigated a DOE study of the system were caused by a
lack of understanding of the system. This is the time to resolve as many issues as pos-
sible before proceeding to the next step in the DOE procedure.
In addition to the instructions required to operate the process, historical records of
the performance of the process should also be reviewed and summarized. It is almost
pointless to consider performing a designed experiment on a process that is out of sta-
tistical control, so locating up-to-date and relevant control charts or other evidence that
confirms that the process is in control is very important. If this evidence doesn’t exist
or hasn’t been compiled, you should take the opportunity to complete this step instead of
just assuming that everything is OK.

In many cases, the creation of new procedures or the review of existing procedures
will uncover potentially serious gaps in the system. For example, it may be discovered
that one or more operators really don’t understand how the process is supposed to oper-
ate. These issues must be identified and resolved before proceeding to the next step in
the DOE process.
If the independent variables and/or response are quantitative, then calibration
records and gage error study data should be checked to confirm that all of the necessary
measurements and settings are accurate and precise. If this evidence doesn’t exist, then
it may be worth the time to address the most serious concerns if not all of them.
All of the owners/managers of the process should be involved in documenting the
process. This includes the operators who run the process, the technicians or machine
adjusters who troubleshoot and maintain the process on a daily basis, and the design
and/or process engineers who have overall responsibility for the process.
It’s relatively rare that a process is sufficiently documented prior to performing a
designed experiment. Usually something important among the many procedures, pro-
cess performance records, calibration records, and gage error study results that are nec-
essary to completely document the process is inadequate or completely missing. If these
things are all available and up-to-date, this step of the DOE process might happen
quickly, but it’s more likely that many hours of preparation will be necessary before this
step can be considered complete. Often these activities uncover the problem or prob-
lems that initiated considerations for performing a designed experiment in the first
place. If effective solutions can be found to these problems, it may not be necessary to
continue to the next step in the DOE process.

4.21.3 Step 3: Write a Detailed Problem Statement


Most DOE projects involve many people who come from different levels and parts of the
organization. These people often have very different perceptions of the purpose of the spe-
cific DOE project. For example, the expectations of an upper manager who will only see
the final report from a DOE project may be completely different from those of an opera-
tor who has to run the process. The purpose of a DOE problem statement is to unam-
biguously define the scope and goals of the experimental program for everyone involved.
The DOE problem statement should be a written document that is circulated for
comments, formally reviewed, and signed off like a contract between the DOE project
team and the manager that they report to. The problem statement should include:
• A description of the response or responses to be studied and their relevant goals
or constraints.
• An estimate of the smallest practically significant change in the response that
the experiment is expected to detect for the purpose of sample-size calculations.
• A presentation of any relevant theory or physical model for the problem that
might provide additional insight into its behavior.
• A description of relevant historical data or other experiments that were
performed to study the problem.
• A list of the possible experimental variables. This list does not need to be
complete or even accurate at this point in the DOE process, but it helps identify
the scope of the variables that will be considered. Preliminary assignments of
variables to the following categories should be made: 1) a variable intended for
active experimentation, 2) a variable that will be held constant throughout the
experiment, or 3) a variable that cannot be controlled and may or may not be
measured during the experiment.
• A list of expected and possible interactions between design variables.
• Citation of evidence of gage capability for experimental variables and
responses.
• Citation of evidence that the process is in control.
• Estimates of the personnel, amount of time, and material required to perform
the experiment.
• A list of the assumptions that will be made to simplify the design, execution,
and analysis of the experiment.
• Identification of questions that must be answered, such as by preliminary
experimentation, before a large designed experiment is undertaken.
The DOE problem statement is usually drafted by the DOE project team leader, but
the draft should be reviewed by all of the people on the project team. Their changes and
recommendations should be considered and incorporated into the document if appro-
priate. Because there are so many people involved and because this document sets the
stage for most of the later steps in a DOE project, this step can be very time-consuming;
however, if the problem statement is accurate and well written, the DOE project will be
more likely to succeed.

4.21.4 Step 4: Preliminary Experimentation


Often the only way to fill knowledge gaps identified in the DOE problem statement is
to perform some preliminary experiments, either in the lab or on the actual process to
be studied. Successful preliminary experimentation is critical to decreasing the risks of a
large designed experiment. These preliminary experiments usually take the form of small
sets of runs to investigate one variable or procedure at a time.
The purpose of preliminary experimentation is to:
• Gain experience with new experimental variables.
• Confirm that there are no unidentified variables.
• Confirm that the classification of each variable as fixed, experimental, or
uncontrolled is appropriate.
• Identify the safe upper and lower bounds for experimental variables.
• Investigate the need for an intermediate level of a quantitative variable to detect
or quantify curvature in a response.
• Confirm that the procedures used to operate the process are accurate.
• Confirm that the operators and equipment function correctly and as expected.
• Estimate the standard deviation of the response so that a sample-size calculation
can be done.
Preliminary experiments should use no more than 10 to 15 percent of the total
resources allocated for an experiment. The scope of preliminary experiments should be
limited to those questions that must be resolved before the full experiment can be per-
formed. It’s often difficult to decide how much preliminary experimentation is neces-
sary. If an insufficient amount of preliminary experimentation is done, problems will
appear later in the DOE process; besides wasting time, those problems may render some
committed experimental resources useless.
Excessive preliminary experimentation can also cause a DOE project to fail.
Although most preliminary experiments appear to be simple and innocent, there are
often unexpected surprises that consume time, materials, and perhaps most importantly
the patience of the managers waiting for results. Always decide in advance of beginning
any preliminary experiments how much of your resources will be expended. When
those resources are gone, it’s time to push on to the next step in the program. In some
extreme cases, the results of preliminary experiments may also cause you to back up
several steps in the DOE process or even to abandon the DOE project completely.
Which preliminary experiments to run must be decided by the DOE project leader,
the operators, the technicians, and the design and process engineers. After all prelimi-
nary experiments are completed, all of these people must be convinced that they under-
stand the product and process well enough to guarantee that the primary experiment can
be built successfully. The amount of time required to perform preliminary experiments
is very dependent on the process being studied and the number and complexity of issues
that have to be resolved.

4.21.5 Step 5: Design the Experiment


The goal of an experiment is to extract an appropriate model from an experimental data
set that can be used to answer specific questions about the process being studied. This
relationship is shown in Figure 4.8. Although the execution of an experiment flows as in
the figure, the process of selecting an experiment design actually flows backward. The
questions to be answered determine the model that is necessary, the model determines
Experiment design: Data → Model → Answers

Figure 4.8 Relationship between the experiment design, data, model, and answers.

the data to be collected, and the experiment design defines the organization and structure
of the data.
Section 4.13 indicated that there are two parts to every experiment design: the vari-
ables matrix, which identifies the experimental variables and their levels, and the design
matrix, which identifies the combination of variable levels that will be used for the
experimental runs. Both of these matrices must be completely defined in this step of the
DOE process. Use the information collected in the previous steps of the DOE process
to determine which variables to include in an experiment and what levels each variable
will have. Then, based on this information and the number and nature of the design vari-
ables, that is, whether they are qualitative, quantitative, or a mixture of the two types,
and how many levels of the variables there are, select an appropriate experiment design.
For example, the experiment design may be: a one-way or multi-way classification
design with qualitative variables, a two-level screening design, a factorial design to
model main effects and interactions, a response surface design to account for curvature
in the response, or a hybrid design involving both qualitative and quantitative variables.
Once the design has been chosen, the matrix of experimental runs and a fictional
response (for example, random normal) should be created and analyzed to confirm that
the desired model can indeed be fitted to the data.
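The fictional-response check amounts to confirming that every term in the intended model is estimable from the design, that is, that the model matrix has full column rank. A sketch of that check for the main-effects model of a 2^3 design (an illustration, not the book's software):

```python
# Model matrix (intercept + three main effects) for a 2^3 design.
rows = [(1, x1, x2, x3) for x1 in (-1, 1) for x2 in (-1, 1) for x3 in (-1, 1)]

def rank(matrix):
    """Column rank by Gauss-Jordan elimination."""
    m = [list(r) for r in matrix]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if abs(m[i][c]) > 1e-9), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and abs(m[i][c]) > 1e-9:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Full column rank means every model term is estimable, so fitting a
# fictional (for example, random normal) response is guaranteed to succeed.
print(rank(rows) == len(rows[0]))  # True
```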
The DOE project leader should have enough information collected at this point in
the DOE process to specify the variables and design matrices; however, he or she
may still find it necessary to consult with the appropriate experts on the process
if there are still ambiguities with respect to some of the variables or their
levels. And if there are too many variables and/or variable levels in the
experiment, it may be necessary to consult with the statistical specialist and/or recon-
vene the whole project team to identify a more practical design. In the majority of the
cases, when the previous steps in the DOE process have been completed successfully,
the specification of an appropriate experiment design should take the DOE project
leader less than one hour.

4.21.6 Step 6: Sample Size, Randomization, and Blocking


The tasks of determining the sample size and the randomization and blocking plans for
an experiment are often considered to be a part of the experiment design step, but these
tasks are so important, so interrelated, and so frequently botched, that they deserve to
be elevated to a separate DOE process step of their own. These tasks are also usually
128 Chapter Four

performed after an experiment design has been chosen, but they may raise issues that
force you to reconsider that choice.
After the experiment design has been chosen, a sample-size calculation should be
performed to determine the number of replicates of the design necessary to make the
experiment sufficiently sensitive to practically significant effects. If the indicated total
number of runs exhausts the available time and resources, it may be necessary to revise
or perhaps even abandon the original plan. The total number of runs required and the
rate at which they can be produced will also factor into blocking considerations.
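As a hedged illustration of the kind of calculation involved (details vary with the design and the test used), the sketch below applies the common normal-approximation formula for the number of runs per treatment needed to detect a shift of size delta between two treatment means, with two-sided a = 0.05 and 90 percent power; the z quantiles are taken from a table.

```python
import math

# Illustrative normal-approximation sample-size calculation, not a
# substitute for an exact power calculation in statistical software.
def runs_per_treatment(delta, sigma, z_alpha_2=1.960, z_beta=1.282):
    # n = 2 * sigma^2 * (z_{alpha/2} + z_beta)^2 / delta^2, rounded up
    return math.ceil(2 * (sigma / delta) ** 2 * (z_alpha_2 + z_beta) ** 2)

# Example: detecting a one-sigma shift requires about 22 runs per treatment
print(runs_per_treatment(delta=1.0, sigma=1.0))
```

Halving the detectable shift roughly quadruples the required number of runs, which is why the indicated total can quickly exhaust the available time and resources.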
All experimental variables are either randomized or used to define blocks. If you
intend to make claims about the effect of a variable, then that variable must be ran-
domized. If the experiment is blocked, then the study variables must be randomized
within blocks. The role of blocking variables is limited to reducing the variability asso-
ciated with sources of noise that would reduce the sensitivity of the experiment if they
were overlooked. If the experiment is to be built in blocks, randomize the order of the
blocks and randomize runs involving study variables within blocks. Confirm that the ran-
domization and blocking plan is effective by analyzing the intended order of the experi-
mental runs as if it were the experimental response. If any of the design variables or
other important terms in the model can predict the run order, then the randomization
wasn’t effective and you will have to re-randomize the runs or modify the randomiza-
tion plan. Use this opportunity to confirm that the blocking plan didn’t interfere with
the intended model, such as by confounding a blocking variable with a study variable
or an important interaction between study variables.
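A minimal sketch of such a randomization check follows, assuming a hypothetical 16-run, two-factor experiment built in four blocks; a simple correlation between run order and each factor stands in for the full analysis of run order as if it were the response.

```python
import random

# Illustrative sketch: randomize runs within blocks, then check that run
# order is not correlated with any design variable.
random.seed(7)

def randomize_within_blocks(runs, n_blocks):
    # runs: list of factor-level tuples; split evenly into blocks,
    # shuffle the block order and the runs within each block
    size = len(runs) // n_blocks
    blocks = [runs[i * size:(i + 1) * size] for i in range(n_blocks)]
    random.shuffle(blocks)
    for block in blocks:
        random.shuffle(block)
    return [run for block in blocks for run in block]

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

runs = [(a, b) for a in (-1, 1) for b in (-1, 1)] * 4   # 16 runs, 2 factors
order = randomize_within_blocks(runs, n_blocks=4)
run_index = list(range(len(order)))
for factor in range(2):
    r = correlation(run_index, [run[factor] for run in order])
    print("factor %d vs run order: r = %.3f" % (factor, r))
    # if |r| is large, re-randomize before building the experiment
```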
After the randomization plan is validated, data sheets should be created for the oper-
ators. These data sheets should indicate the settings of the experimental runs, with room
to record the response and any special notes. To avoid confusing the operators, the order
of the runs on the data sheets should be the randomized run order. If the experiment is
to be built in blocks, separate data sheets can be created for each block. The team mem-
bers who will participate in actually building the experiment should review the data col-
lection sheets to make sure that they are correct and understood.
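As an illustrative sketch (the factor names, levels, and randomized plan below are made up), per-block data sheets can be generated directly from the randomized run list, with the runs listed in their randomized order and blank columns left for the response and notes:

```python
import csv
import io

# Hypothetical randomized plan: (block, run order, temp, time)
runs = [
    (1, 1, 175, 30), (1, 2, 150, 60), (1, 3, 200, 45),
    (2, 1, 200, 30), (2, 2, 150, 45), (2, 3, 175, 60),
]

def data_sheet(block):
    # Build one data collection sheet for the requested block,
    # with empty Response and Notes columns for the operator.
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Run", "Temp", "Time", "Response", "Notes"])
    for blk, order, temp, time in runs:
        if blk == block:
            writer.writerow([order, temp, time, "", ""])
    return out.getvalue()

print(data_sheet(1))
```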
The DOE project leader is responsible for determining the randomization and
blocking plan, but it may be necessary for him or her to consult with the statistical spe-
cialist, technicians, or process engineer to determine practical strategies for the plan.
Simple experiments take only a few minutes to randomize and block but a complicated
experiment may take several hours. In severe cases, the difficulties associated with ran-
domization and blocking may require reconsideration of the experiment design.

4.21.7 Step 7: Run the Experiment


An experiment should not be run unless there is consensus among the key people
involved that everything is ready. There is so much preparation and investment involved
that no one wants to start an experiment prematurely. Even after a large experiment is
started, you may realize that some aspect of the process was not considered and it may
be necessary to suspend the experiment until the issue is resolved. Sometimes this can be
done immediately after the problem is discovered, but when a clear and effective solu-
tion is not apparent it’s usually better to walk away, regroup, and come back another day.
When the experiment is being performed, great care has to be taken to follow the
correct procedures for the process, to honor the randomization plan, to maintain the iden-
tity of the parts, and to make sure that all of the data are faithfully and accurately recorded.
Any unusual events or conditions should be clearly noted. These observations will be
crucial later on if it becomes necessary to explain outliers in the data set.
All of the key people required to operate the process must be available to run the
experiment. If a crucial person is missing, you’re better off waiting until that person
becomes available. This may or may not include the process engineer depending on
how unusual some of the experimental runs are. If the operators and technicians have
been properly trained and prepared to run the experiment, the DOE project leader
should be able to sit back, observe, and only take action if someone becomes confused
or if some unexpected event occurs.

4.21.8 Step 8: Analyze the Data


Before performing any analysis of the experimental data, the accuracy of the recorded
data values should be confirmed by checking each recorded value in the worksheet
against the original data record. All discrepancies must be resolved before any analysis
is performed.
The experimental data should be analyzed using MINITAB or some other suitable
statistical software package. When possible, the raw data should be plotted in some
meaningful way. The full model, including all relevant main effects, interactions, and
other terms, should be run and a complete set of residuals diagnostic plots should be cre-
ated, including plots of the residuals versus each of the design variables, residuals versus
fitted values, and residuals versus the run order. A normal probability plot of the resid-
uals should also be created. Special consideration should be given to outliers or highly
influential observations. It may be necessary to compare outliers detected in the statis-
tical analysis to records or notes of unusual conditions that occurred during the experi-
ment. Outliers must not be dropped from the data set without correlation to a clear
special cause.
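One simple screening sketch, using made-up residual values, flags runs whose residuals are unusually large so that they can be checked against the experiment log before any decision is made about dropping them:

```python
import statistics

# Illustrative outlier screen: flag residuals beyond a multiple of the
# residual standard deviation as candidates to check against run notes.
def flag_outliers(residuals, limit=2.0):
    s = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > limit * s]

residuals = [0.4, -0.6, 0.2, 5.1, -0.3, 0.5, -0.4, 0.1]  # made-up values
print("runs to check against the experiment log:", flag_outliers(residuals))
```

A flagged run is only a candidate: it is dropped from the data set only if the notes taken during the experiment reveal a clear special cause.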
If the full model is excessively complex, a simplified model should be created. Both
models, the full model and the simplified model, should be retained in their complete form
because it is usually necessary to report both of them. The full model is important
because it documents the terms that were testable but not statistically significant,
which can be as important as, or even more important than, the terms that are significant.
The statistical analysis of the data should be done by the DOE project leader with
support from the DOE statistical specialist if necessary. The analysis of a simple exper-
iment should take less than an hour. More complicated experiments might require sev-
eral different models and so may take several hours to analyze.

4.21.9 Step 9: Interpret the Results


Before attempting to interpret any model, the residuals diagnostic plots should be eval-
uated for normality, homoscedasticity, and independence. Only after these conditions
and any other requirements are found to be valid should the statistical model be inter-
preted. If only one model was constructed from the data, the interpretation of the model
will probably be straightforward. When there are several possible models to consider,
it will be necessary to compare several different aspects of each to determine which is
the best fit to the data. When the best model is identified, a corresponding error state-
ment must be constructed.
When an acceptable model and error statement are found, the model should be
interpreted relative to the goals of the experiment. For example, if there were specific
numerical goals for the response, then valid ranges of the experimental variables that
meet those goals should be identified. Several different strategies for achieving the target
response might be possible. When an experimental variable is quantitative, be careful
not to extrapolate the predicted response outside the range of experimentation.
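A small sketch of this extrapolation guard follows, with a made-up fitted model and made-up experimental ranges; a wrapper simply refuses to predict outside the region that was actually studied.

```python
# Illustrative sketch: only allow predictions inside the range of
# experimentation for each quantitative variable.
def guarded_predict(model, point, ranges):
    for name, value in point.items():
        low, high = ranges[name]
        if not (low <= value <= high):
            raise ValueError(
                "%s = %s is outside the experimented range [%s, %s]"
                % (name, value, low, high))
    return model(point)

# Made-up fitted model y = 10 + 2*temp + 0.5*time and made-up ranges:
model = lambda p: 10 + 2 * p["temp"] + 0.5 * p["time"]
ranges = {"temp": (150, 200), "time": (30, 60)}
print(guarded_predict(model, {"temp": 175, "time": 45}, ranges))  # inside: OK
```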
The interpretation of the statistical model should be done by the DOE project leader
with support from the DOE statistical specialist, if necessary. The interpretation of the
model for a simple experiment will generally be straightforward and take less than an
hour but more complicated problems may take several hours to interpret.

4.21.10 Step 10: Run a Confirmation Experiment


Despite the most diligent efforts and attention to detail, occasionally you will find that
you can’t duplicate the results of a successful experiment. To protect yourself and the
organization from irreproducible results, you should always follow up a designed exper-
iment with a confirmation experiment. The purpose of a confirmation experiment is to
demonstrate the validity of the model derived from the designed experiment. The con-
firmation experiment may be quite small, perhaps consisting of just a single crucial
condition, but it should address the most important claims or conclusions from the full
experiment. It should be run well after the original experiment and under typical oper-
ating conditions. If the conclusions from the original experiment were robust, then the
confirmation experiment will successfully reproduce the desired results. However, if
something was overlooked or changed from the time the original experiment was per-
formed, then the variable or changes need to be identified and incorporated into the
analysis. Never report the results from the original experiment until the confirmation
experiment has been successfully run.
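A hedged sketch of one way to make this comparison: test whether the mean of the confirmation runs is consistent with the model's predicted response. The data and the t quantile below are illustrative only.

```python
import math
import statistics

# Illustrative check: are the confirmation runs consistent with the
# predicted response?  The t quantile is supplied from a table for the
# chosen sample size (here t(0.025, 4) = 2.776 for n = 5).
def confirms(predicted, runs, t_quantile):
    mean = statistics.mean(runs)
    se = statistics.stdev(runs) / math.sqrt(len(runs))
    return abs(mean - predicted) <= t_quantile * se

# Example: model predicts 147 cuts; five made-up confirmation runs
print(confirms(147.0, [151, 144, 149, 146, 150], t_quantile=2.776))
```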
The confirmation experiment is usually designed by the DOE project leader and is
performed by the operators who run the process. The experiment usually doesn’t take
too long or consume a large amount of resources unless its results are inconsistent with
the original experiment. If the confirmation experiment requires an unusual combina-
tion of variable levels, it may be necessary for the process engineer to consult or even
participate in the confirmation experiment.

4.21.11 Step 11: Report the Experiment


Many organizations compile a report of an experiment in the form of slides, for exam-
ple, in the form of a PowerPoint presentation. The eleven-step procedure provides an
effective outline for such a presentation. It can also serve as a checklist that makes it
relatively easy to confirm that all aspects of the experiment have been appropriately
documented.
If a designed experiment must be documented in a written report, most organiza-
tions have a standard report format that must be used. If a standard format has not been
established, the following report organization is effective:
1. Findings. An executive summary of the experiment summarizing the design,
the analysis, and the results. This section should be no longer than just a
few sentences.
2. Background. Some technical background on the process to be studied, a
description of the problem, and a statement of the purpose of the experiment.
This section should be no more than one page long.
3. Experiment design. A description of the experiment design and the
randomization and blocking plan that was used. It may also be necessary
to justify the sample size that was used.
4. Data. A table of the experimental data with a description of the table and
its organization. Special mention should be made of missing values and
any observations that have associated special causes. The location of the
original data in its paper or electronic form should be indicated.
5. Statistical analysis. A description of the statistical analysis with explicit
references to all of the computer analyses and supporting graphs. Discussion
of the analysis can also be integrated into this section.
6. Recommendations. A recommendations section should be included in the
report if a follow-up study is required or if there are any ambiguities
remaining after the analysis is complete. This section may also include a
focused interpretation of the analysis to address a specific problem or goal
of the experiment, for example, to optimize a response.
The formal report should be written by the DOE project leader, but the report
should be reviewed and approved by those members of the team who have the technical
skills to understand it. Most designed experiments can be reported in detail in three
to ten pages with attached figures. Someone skilled in the art of DOE report writing
will require about one to one-and-a-half hours per page to write the report.

Example 4.12
The following report is presented as an example of a well-written report of a
designed experiment.

Report: Analysis of an Experiment to Compare Two Lubricants in a Cutting Operation

Author: Paul Mathews, Mathews Malnar and Bailey, Inc.

For: Dan M., Engineering Manager, XYZ Cutoff Inc.

Date: 22 November 1999

Findings: An experiment was performed to compare the standard lubricant (LAU-003)
to a recommended replacement lubricant (LAU-016) in the brake-bar cutting operation
to determine if the new lubricant would allow more cuts between blade resharpenings.
The new lubricant was confirmed to deliver about 16 more cuts (131 versus 147) on
average than the old lubricant (p = 0.005) and no adverse effects were observed. The
95 percent confidence interval for the increase in the number of cuts with LAU-016 was
P(6.3 < Δμ < 25.7) = 0.95, or between five percent and 20 percent relative to LAU-003.
Based on these results, it is recommended that LAU-003 be replaced with LAU-016 in
the brake-bar cutting operation.

Background: Brake bars are cut using carbide-tipped circular saws lubricated with
LAU-003. Saws must be resharpened when burrs on the perimeter of the cut approach
the allowed tolerances for the cut surface. LAU has suggested that saws lubricated with
LAU-016 instead of LAU-003 would make more cuts between resharpenings. Decreased
downtime and resharpening costs would more than offset the minor price increase for
LAU-016. The purpose of this experiment is to: 1) demonstrate that LAU-016 delivers
more cuts between resharpenings than LAU-003 and 2) confirm that there are no
adverse effects associated with the use of LAU-016.

Preparation: A cause-and-effect analysis and operating procedure review were
performed to identify factors that might influence the number of cuts delivered in the
brake-bar cutting operation. These analyses identified the following factors that were thought
to deserve special attention or comment:

• Methods

– The brake-bar cutting operation is automated so there should be no variation
in the methods used.

– LAU-003 and LAU-016 are delivered in dry form and mixed with mineral oil.
Both lubricants were mixed according to LAU’s instructions.

– LAU-003 lubricant is continuously filtered and eventually replaced on an
established schedule; however, that schedule is not rigorously followed. For this
experiment, new batches of both lubricants were prepared and used for about
10 percent of their scheduled life before the experimental runs were performed.

– The lubricant tank was drained and refilled between trials that required a lubri-
cant change. No attempt was made to flush out lubricant that was adsorbed on
machine surfaces.


– All saw blade resharpenings were performed in-house on the Heller grinder by
Tony E. Tony also confirmed the critical blade tooth specs before a blade was
released for the experiment.
• Material
– The steel stock used for brake bars is thought to be consistent from lot to lot so
no attempt was made to control for lots. The order of the experimental runs was
randomized to reduce the risk of lot-to-lot differences.
– Saw blades tend to have a ‘personality’ so each blade was run two times—
once with LAU-003 and once with LAU-016.
– LAU lubricants tend to be very consistent from batch to batch and batches are
very large, so single batches of both lubricants were used in the experiment.
– Ten randomly selected saw blades were used for the experiment.
– Standard-grade mineral oil provided by LAU was used to mix the lubricants.
• Manpower
– Bob P. is the primary operator of the brake-bar cutting operation so all experi-
mental runs were performed by him. Bob also mixed the lubricants, switched
the lubricants between trials, and monitored the cutting operation to determine
when a blade had reached its end of life. Bob documented all of these steps in
the brake-bar cutting operation log book.
– Tony E. resharpened blades and confirmed that they met their specs.
• Machines
– All blades were sharpened on the Heller grinder.
– All cuts were made with the dedicated brake bar cutter.
– The number of cuts was determined from the counter on the brake bar cutter.
Experiment Design: Each saw blade was used once with each lubricant so the exper-
iment is a paired-sample design that can be analyzed using a paired-sample t test. The
sample size (n = 10) was determined by prior calculation to deliver a 90 percent proba-
bility of detecting a 10 percent increase in the life of the saw blades using LAU-016. The
standard deviation for the sample-size calculation was estimated from LAU-003 histori-
cal data. The lubricant type was run in completely random order by randomly choosing
a blade from among those scheduled and available for use with the required lubricant.
Experimental Data: The experiment was performed over the period 14–18 October
1999. The experimental data are shown in Figure 4.9 in the order in which they were col-
lected. Blade #9 broke a carbide tip during its first trial so it was repaired, resharpened,
and put back into service. Broken tips are random events thought to be unrelated to the
lubricant so the first observation of blade #9 was omitted from the analysis. There were
no other special events recorded during the execution of the experiment. The original
record of these data is in the brake-bar operation logbook.


Row  Rand  Std  Blade  Lube  Cuts
  1     1   15      5     2   162
  *     2    9      9     1     *   Broken tip
  2     2   11      1     2   145
  3     3    4      4     1   117
  4     4    9      9     1   135
  5     5   13      3     2   145
  6     6    2      2     1   124
  7     7    3      3     1   131
  8     8   20     10     2   147
  9     9   12      2     2   146
 10    10   16      6     2   134
 11    11    7      7     1   130
 12    12   19      9     2   142
 13    13    8      8     1   134
 14    14    6      6     1   138
 15    15    5      5     1   123
 16    16   18      8     2   161
 17    17    1      1     1   139
 18    18   17      7     2   139
 19    19   14      4     2   149
 20    20   10     10     1   139

Figure 4.9 Number of cuts by lubricant and blade.

Statistical Analysis: The experimental data are plotted by lubricant type and con-
nected by blade in Figure 4.10. The plot clearly shows that the number of cuts obtained
using LAU-016 is, on average, greater than the number of cuts obtained with LAU-003.
The data were analyzed using a paired-sample t test with Stat> Basic Stats>
Paired t in MINITAB V13.1. The output from MINITAB is shown in Figure 4.11. The
mean and standard deviation of the difference between the number of cuts obtained with
LAU-016 versus LAU-003 were Δx̄ = 16.0 and s = 13.5. This result was statistically sig-
nificant with p = 0.005. The Δxᵢ were analyzed graphically (not shown) and found to be
at least approximately normal and homoscedastic with respect to run order as required
for the paired-sample t test. The 95 percent confidence interval for the increase in the
mean number of cuts is given by:

P(6.3 < Δμ < 25.7) = 0.95

Relative to the mean number of cuts observed with LAU-003, the 95 percent confidence
interval for the fractional increase in the mean number of cuts is given by:

P(0.05 < Δμ/x̄₀₀₃ < 0.20) = 0.95


[Figure: plot of Cuts (vertical axis, 120–160) versus Lube (LAU-003, LAU-016),
with the two results for each of blades 1–10 connected by lines.]

Figure 4.10 Number of cuts versus lubricant by blade.

Paired T-Test and CI: LAU-016, LAU-003

Paired T for LAU-016 – LAU-003

              N     Mean   StDev   SE Mean
LAU-016      10   147.00    8.77      2.77
LAU-003      10   131.00    7.54      2.39
Difference   10    16.00   13.50      4.27

95% CI for mean difference: (6.34, 25.66)
T-Test of mean difference = 0 (vs not = 0): T-Value = 3.75  P-Value = 0.005

Figure 4.11 Paired-sample t test for cuts.
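The arithmetic in Figure 4.11 can be reproduced directly from the Figure 4.9 data. The following sketch is not part of the original analysis, which used MINITAB; it computes the same paired-sample t statistic and confidence interval, with the quantile t(0.025, 9) = 2.262 taken from a table.

```python
import math
import statistics

# Cuts paired by blade, from Figure 4.9 (lube 1 = LAU-003, lube 2 = LAU-016)
lau003 = {1: 139, 2: 124, 3: 131, 4: 117, 5: 123,
          6: 138, 7: 130, 8: 134, 9: 135, 10: 139}
lau016 = {1: 145, 2: 146, 3: 145, 4: 149, 5: 162,
          6: 134, 7: 139, 8: 161, 9: 142, 10: 147}

diffs = [lau016[b] - lau003[b] for b in sorted(lau003)]
n = len(diffs)
mean = statistics.mean(diffs)        # 16.0
s = statistics.stdev(diffs)          # about 13.5
se = s / math.sqrt(n)
t = mean / se                        # about 3.75
half_width = 2.262 * se              # t(0.025, 9) = 2.262 from a table
print("mean difference = %.1f, t = %.2f, 95%% CI = (%.2f, %.2f)"
      % (mean, t, mean - half_width, mean + half_width))
```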

The lower bound of the 95 percent confidence interval falls below the 10 percent
target increase in the number of cuts, so the true magnitude of the increase is still
relatively uncertain; nevertheless, the increase is large enough to justify the change
from LAU-003 to LAU-016.

Conclusions: The experiment indicated that LAU-016 provides a greater number of
brake-bar cuts than LAU-003, and no adverse effects associated with the use of
LAU-016 were observed. Based on these results, it is recommended that LAU-016 be used
to replace LAU-003. Blade life should continue to be monitored after the lubricant
change has been implemented to confirm that the life improvement with LAU-016 is
maintained.

4.22 EXPERIMENT DOCUMENTATION


There are so many people and activities involved in a DOE project that it can be difficult
to keep all of the associated documents organized. Perhaps the best way is to use a ring
binder with twelve tabs or dividers. The first eleven tabs should correspond to the eleven
steps of the DOE procedure and the 12th is for everything else. Some organizations issue
the DOE project leader a standardized project binder or open a special folder on a shared
computer hard drive at the beginning of a project. Then, as the eleven steps are
addressed, the project leader and team members are responsible for seeing that copies of
the appropriate documents are added to the binder or folder. This also keeps all of the
relevant information in one place so everyone on the project and in the organization
knows where to go and has at least some sense of the organization of the material.
The following list describes the general contents of each section of a complete pro-
ject binder for an experimental program:
1. Cause-and-effect analysis
• Copies of each revision of the modified cause-and-effect diagram, including
a list of contributors to each one.
• Relevant notes about any significant changes or additions to the cause-and-
effect diagram.
• Preliminary classification of variables into categories: possible design
variable, variable to be held fixed, uncontrolled variable.
2. Process documentation
• Copies or references to the original and final versions of the process procedures.
• Copies or references to control charts documenting the state of the process.
• Copies or references to relevant process capability analyses.
• Copies or references to GR&R study results.
• Completed checklist of relevant calibration records.
• Completed checklist of operator training records and/or proficiency
test results.
• Completed checklist of relevant maintenance items.
3. Problem statement
• Copies of each revision to the DOE problem statement.
• Copy of the final DOE problem statement with appropriate approvals
(for example, initials or signatures) required prior to proceeding with the
experimental program.
4. Preliminary experiments
• List of specific questions or issues to be resolved with preliminary
experiments. (This list may already exist in the problem statement.)
• Summary statement of the purpose and results of each preliminary
experiment.
• Original data records, notes, and analysis from each preliminary
experiment.
• Notes on any follow-up actions taken as a result of findings from
preliminary experiments.
5. Experiment design
• Final classification of each input variable from the cause-and-effect analysis
into one of the following categories: experimental variable, variable to
be held fixed, uncontrolled variable not recorded, or uncontrolled variable
recorded.
• Copies of the variable and design matrices.
• Copy of the sample-size calculation or other sample-size justification.
• Copy of the analysis of a simulated response demonstrating that the desired
model can be fitted from the design. (This may be postponed and done in
combination with the validation of the randomization and blocking plan in
the next step.)
6. Randomization and blocking plan
• Description of and justification for the randomization and blocking plan.
• Copy of analysis validating the randomization plan (for example, analysis
of run order as the response).
• Copies of the actual data sheets (with the runs in random order) to be used
for data collection.
7. Experiment records
• Copies of any required formal authorizations to build the experiment.
• Copies of all original data records.
• Copies of all notes taken during the execution of the experiment.
8. Statistical analysis
• Copy of the experimental data after transcription into the electronic
worksheet.
• Copies of the analyses of the full and refined models, including the
residuals analysis.
• Copies of any alternative models considered.
• Copies of any special notes or observations from the analysis concerning
unusual observations, and so on, and any related findings from follow-
up activities.
9. Interpretation
• Written interpretation of the statistical analysis with references to graphs
and tables.
• Explanation of special applications of the final model, for example,
optimization, variable settings to achieve a specified value of the response,
and so on.
• Description of any special conditions or observations that might indicate
the need for a follow-up experiment or influence the direction of the
confirmation experiment.
• If appropriate, a brief statement about the strengths and/or weaknesses of
the process, experiment, and/or analysis.
• Recommendations for the next step in the experimental project, for example,
proceed to a confirmation experiment, run more replicates of the original
experiment, perform another designed experiment, and so on.
10. Confirmation experiment
• Description of and justification for the confirmation experiment.
• Copies of the original data records, statistical analysis, and interpretation
of the confirmation experiment.
• Summary statement about the success or failure of the confirmation
experiment and implications for the goals of the experimental project.
11. Report
• Copy of the final experiment report (written and/or slide presentation) and
distribution list.
• Copies of any comments or follow-up to the report.
• List of recommendations and/or warnings for anyone who might reconsider
this problem in the future.

4.23 WHY EXPERIMENTS GO BAD


Despite the most careful planning, preparation, execution, and analysis, experiments
still go bad. In fact, more experiments probably go wrong than right. It’s safe to say that
someone with lots of DOE experience has a better chance of getting an experiment right
the first time, but even the best experimenter runs an unsuccessful experiment now and
then. Hopefully this won’t happen to you too often but as with every other type of prob-
lem that occurs, consider it to be an opportunity for improvement. Remember that
there’s usually more to be learned from an experiment that has gone bad than one that
goes perfectly. Experiments are expected to deliver surprises but not surprises of career-
ending magnitude.
The general procedure outlined in the preceding section consists of just 11 steps.
Every one of those 11 steps contains numerous places where an experiment can fail. It
might seem strange that most of this book is dedicated to steps 5 and 8, experiment
design and data analysis, as those are probably the easiest ones to get right.
One of the crucial skills that distinguishes an experienced experimenter from a
novice is the experienced experimenter’s attention to detail coupled with the ability to
recognize, among the many factors competing for attention, those important but subtle
factors that can cause an experiment to fail. Experienced experimenters develop a spe-
cial type of experimental sense or conscience that borders on intuition. Novices have
this sense, too, but they have to learn to listen to it, develop it, and trust it. If you’re new
to DOE, take special note of the problems that cause your experiments to fail and try to
identify the first moment that you became aware of them. Usually you’ll find that you
had some early sense that a problem existed but didn’t fully appreciate its significance
at the time. With practice, you’ll get better at recognizing and reacting to those prob-
lems in time to minimize the damage to your experiments.
Here’s a short list of some mistakes that can lead you astray:
• Inexperienced experimenter
• The presence of the experimenter changes the process
• Failure to identify an important variable
• Picked the wrong variables for the experiment
• Failure to hold a known variable fixed
• Failure to record the value of a known but uncontrollable variable
• Failure to block on an influential variable
• Poor understanding of the process and procedures
• Multiple processes in use
• Failure to consult the operators and technicians
• Failure to identify significant interactions
• Failure to recognize all of the responses
• Ambiguous operational definition for one or more variables or for the response
• Inadequate repeatability and reproducibility (R&R) to measure the responses
• Failure to do any or enough preliminary experimentation
• Exhausted resources and patience with too much preliminary experimentation
• Picked variable levels too close together
• Picked variable levels too far apart
• Wrong experiment design (overlooked interactions, curvature, or first
principles)
• One experiment instead of several smaller ones
• Several small experiments instead of a single larger one
• Not enough replicates
• Too many replicates
• Repetitions instead of replicates
• Failure to randomize
• Randomization plan ignored by those building the experiment
• Failure to record the actual run order
• Critical person missing when the experiment is run
• Data recording or transcription errors
• Failure to record all of the data
• Lost or confused data records
• Failure to maintain part identity
• Error in setting variable levels
• Deviations from variable target levels
• Unanticipated process change during experiment
• Equipment not properly maintained
• Failure to complete the experiment in the allotted time (for example,
before a shift change)
• Failure to note special occurrences
• Wrong statistical analysis
• Failure to check assumptions (normality, equality of variances, lack of fit,
and so on)
• Failure to specify the model correctly in the analysis software
• Mistreatment of experimental runs that suffered from special causes
• Mistreatment of lost experimental runs
• Failure to refine the model
• Misinterpretation of results
• Extrapolation outside of experimental boundaries
• Failure to perform a confirmation experiment
• Inadequate resources to build a confirmation experiment
• Inadequate documentation of the results
• Inappropriate presentation of the results for the audience
