Basic Engineering Data Collection and Analysis
Stephen B. Vardeman
Iowa State University
J. Marcus Jobe
Miami University
Ames, Iowa
Basic Engineering Data Collection and Analysis
Stephen B. Vardeman and J. Marcus Jobe

© 2001 Stephen B. Vardeman and J. Marcus Jobe

Basic Engineering Data Collection and Analysis is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. You may share and adapt the material, so long as you provide appropriate credit to the original authors, do not use the material for commercial purposes, and any adaptations or remixes of the material which you create are distributed under the same license as the original.

Originally published by Brooks/Cole Cengage Learning in 2001. Published online by Iowa State University Digital Press in 2023.

Sponsoring Editor: Carolyn Crockett
Marketing: Chris Kelly
Editorial Assistant: Ann Day
Production Editor: Janet Hill
Production Service: Martha Emry
Permissions: Sue Ewing
Cover Design/Illustration: Denise Davidson
Interior Design: John Edeen
Interior Illustration: Bob Cordes
Print Buyer: Kristina Waller
Typesetting: Eigentype Compositors

Library of Congress Catalog Number: 00-040358
ISBN-13: 978-0-534-36957-6 (print)
ISBN-10: 0-534-36957-X (print)
ISBN: 978-1-958291-03-0 (PDF)
https://doi.org/10.31274/isudp.2023.127
Iowa State University is located on the ancestral lands and territory of the Baxoje
(bah-kho-dzhe), or Ioway Nation. The United States obtained the land from the
Meskwaki and Sauk nations in the Treaty of 1842. We wish to recognize our
obligations to this land and to the people who took care of it, as well as to the
17,000 Native people who live in Iowa today.
Soli Deo Gloria
Preface
Pedagogical Features
Pedagogical and practical features include:
■ Precise exposition
■ A logical two-color layout, with examples delineated by a color rule
■ Margin notes naming formulas and calling attention to some main issues of
discussion
The Exercises
There are far more exercises in this text than could ever be assigned over several
semesters of teaching from this book. Exercises involving direct application of
section material appear at the end of each section, and answers for most of them
appear at the end of the book. These give the reader immediate reinforcement that
the mechanics and main points of the exposition have been mastered. The rich sets of
Chapter Exercises provide more. Beyond additional practice with the computations
of the chapter, they add significant insight into how engineering statistics is done
and into the engineering implications of the chapter material. These often probe
what kinds of analyses might elucidate the main features of a scenario and facilitate
substantive engineering progress, and ponder what else might be needed. In most
cases, these exercises were written after we had analyzed the data and seriously
considered what they show in the engineering context. These come from a variety
of engineering disciplines, and we expect that instructors will find them useful not only for class assignments but also as lecture examples for many different engineering audiences.
Ancillaries
Several types of ancillary material are available to support this text.
■ The CD packaged with the book provides PowerPoint™ visuals and audio presenting solutions for selected Section Exercises.
■ For instructors only, a complete solutions manual is available through the
local sales representative.
■ The publisher also maintains a web site supporting instruction using Basic
Engineering Data Collection and Analysis at www.brookscole.com.
Acknowledgments
There are many who deserve thanks for their kind help with this project. People at
Duxbury Thomson Learning have been great. We especially thank Carolyn Crockett
for her encouragement and vision in putting this project together. Janet Hill has
been an excellent Production Editor. We appreciate the help of Seema Atwal with
the book’s ancillaries, and are truly pleased with the design work overseen by
Vernon Boes.
First class help has also come from outside of Duxbury Thomson Learning.
Martha Emry of Martha Emry Production Services has simply been dynamite to
work with. She is thorough, knowledgeable, possessed of excellent judgment and
unbelievably patient. Thanks Martha! And although he didn’t work directly on this
project, we gratefully acknowledge the meticulous work of Chuck Lerch, who wrote
the solutions manual and provided the answer section for Statistics for Engineering
Problem Solving. We have borrowed liberally from his essentially flawless efforts
for answers and solutions carried over to this project. We are also grateful to Jimmy
Wright and Victor Chan for their careful work as error checkers. We thank Tom
Andrika for his important contributions to the development of the PowerPoint/audio
CD supplement. We thank Tiffany Lynn Hagemeyer for her help in preparing the
MINITAB, JMP, and Excel data files for download. Andrew Vardeman developed
the web site, providing JMP, MINITAB, and Excel help for the text, and we appreciate his contributions to this effort. John Ramberg, University of Arizona; V. A.
Samaranayake, University of Missouri at Rolla; Paul Joyce, University of Idaho;
James W. Hardin, Texas A & M; and Jagdish K. Patel, University of Missouri at
Rolla provided helpful reviews of this book at various stages of completion, and we
thank them.
It is our hope that this book proves to be genuinely useful to both engineering
students and working engineers, and one that instructors find easy to build their
courses around. We’ll be glad to receive comments and suggestions at our e-mail
addresses.
Steve Vardeman J. Marcus Jobe
[email protected] [email protected]
Contents
1 Introduction 1
2 Data Collection 26
B Tables 785
Introduction
This chapter lays a foundation for all that follows: It contains a road map for the
study of engineering statistics. The subject is defined, its importance is described,
some basic terminology is introduced, and the important issue of measurement is
discussed. Finally, the role of mathematical models in achieving the objectives of
engineering statistics is investigated.
Table 1.1
Thrust Face Runouts (.0001 in.) [data not reproduced]
[Figure: dot diagrams of runout (.0001 in.), on scales of 0 to 40, for gears laid and gears hung]
Example 1 (continued)
But how "precise" is this figure? Runout values are variable. So is there any assurance that the difference seen in the present means would reappear in further testing? Or is it possibly explainable as simply "stray background noise"? Laying gears is more expensive than hanging them. Can one know whether the extra expense is justified?
Drawing inferences from data
These questions point to the need for methods of formal statistical inference from data and translation of those inferences into practical conclusions. Methods presented in this text can, for example, be used to support the following statements about hanging and laying gears:
1. One can be roughly 90% sure that the difference in long-run mean runouts
produced under conditions like those of the engineer’s study is in the range
3.2 to 7.4
2. One can be roughly 95% sure that 95% of runouts for gears laid under
conditions like those of the engineer’s study would fall in the range
3.0 to 22.2
3. One can be roughly 95% sure that 95% of runouts for gears hung under
conditions like those of the engineer’s study would fall in the range
.8 to 35.0
These are formal quantifications of what was learned from the study of laid
and hung gears. To derive practical benefit from statements like these, the process
engineer had to combine them with other information, such as the consequences
of a given amount of runout and the costs for hanging and laying gears, and had to
apply sound engineering judgment. Ultimately, the runout improvement was great
enough to justify some extra expense, and the laying method was implemented.
The example shows how the elements of statistics were helpful in solving an
engineer’s problem. Throughout this text, the intention is to emphasize that the
topics discussed are not ends in themselves, but rather tools that engineers can use
to help them do their jobs effectively.
Section 1 Exercises
1. Explain why engineering practice is an inherently statistical enterprise.
2. Explain why the concept of variability has a central place in the subject of engineering statistics.
3. Describe the difference between descriptive and (formal) inferential statistics.
Definition 3 An experimental study (or, more simply, an experiment) is one in which the
investigator’s role is active. Process variables are manipulated, and the study
environment is regulated.
Most real statistical studies have both observational and experimental features,
and these two definitions should be thought of as representing idealized opposite
ends of a continuum. On this continuum, the experimental end usually provides
the most efficient and reliable ways to collect engineering data. It is typically
much quicker to manipulate process variables and watch how a system responds
to the changes than to passively observe, hoping to notice something interesting or
revealing.
Inferring causality
In addition, it is far easier and safer to infer causality from an experiment than from an observational study. Real systems are complex. One may observe several
instances of good process performance and note that they were all surrounded by
circumstances X without being safe in assuming that circumstances X cause good
process performance. There may be important variables in the background that are
changing and are the true reason for instances of favorable system behavior. These
so-called lurking variables may govern both process performance and circum-
stances X. Or it may simply be that many variables change haphazardly without
appreciable impact on the system and that by chance, during a limited period of
observation, some of these happen to produce X at the same time that good perfor-
mance occurs. In either case, an engineer’s efforts to create X as a means of making
things work well will be wasted effort.
Most engineering studies tend to be of the second type, although some important
engineering applications do involve enumerative work. One such example is the
reliability testing of critical components—e.g., for use in a space shuttle. The interest
is in the components actually in hand and how well they can be expected to perform
rather than on any broader problem like “the behavior of all components of this
type.” Acceptance sampling (where incoming lots are checked before taking formal
receipt) is another important kind of enumerative study. But as indicated, most
engineering studies are analytical in nature.
Example 2 (continued)
The students working on the pelletizing machine were not interested in any particular batch of pellets, but rather in the question of how to make the machine work effectively. They hoped (or tacitly assumed) that what they learned about making fuel pellets would remain valid at later times, at least under shop conditions like those they were facing. Their experimental study was analytical in nature.
Particularly when discussing enumerative studies, the next two definitions are
helpful.
Definition 6 A population is the entire group of objects about which one wishes to gather
information in a statistical study.
Definition 7 A sample is the group of objects on which one actually gathers data. In the
case of an enumerative investigation, the sample is a subset of the population
(and can in some cases include the entire population).
Figure 1.2 shows the relationship between a population and a sample. If a crate of
100 machine parts is delivered to a loading dock and 5 are examined in order to
verify the acceptability of the lot, the 100 parts constitute the population of interest,
and the 5 parts make up a (single) sample of size 5 from the population. (Notice the
word usage here: There is one sample, not five samples.)
[Figure 1.2: a sample depicted as a subset of a population]
There are several ways in which the meanings of the words population and
sample are often extended. For one, it is common to use them to refer to not only
objects under study but also data values associated with those objects. For example,
if one thinks of Rockwell hardness values associated with 100 crated machine parts,
the 100 hardness values might be called a population (of numbers). Five hardness
values corresponding to the parts examined in acceptance sampling could be termed
a sample from that population.
Example 2 (continued)
Cyr, Ellson, and Rickard identified eight different sets of experimental conditions under which to run the pelletizing machine. Several production runs of fuel pellets were made under each set of conditions, and each of these produced its own percentage of conforming pellets. These eight sets of percentages can be referred to as eight different samples (of numbers).
Definition 8 Qualitative or categorical data are the values of basically nonnumerical char-
acteristics associated with items in a sample. There can be an order associated
with qualitative data, but aggregation and counting are required to produce
any meaningful numerical values from such data.
Consider again 5 machine parts constituting a sample from 100 crated parts. If each
part can be classified into one of the (ordered) categories (1) conforming, (2) rework,
and (3) scrap, and one knows the classifications of the 5 parts, one has 5 qualitative
data points. If one aggregates across the 5 and finds 3 conforming, 1 reworkable, and
1 scrap, then numerical summaries have been derived from the original categorical
data by counting.
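
In software, this kind of aggregation is a one-liner. Here is a minimal Python sketch (the five classifications are the hypothetical ones just described):

```python
from collections import Counter

# Ordered categorical classifications for the 5 sampled parts
parts = ["conforming", "conforming", "rework", "conforming", "scrap"]

counts = Counter(parts)  # counting turns qualitative data into numbers
print(counts)  # Counter({'conforming': 3, 'rework': 1, 'scrap': 1})
```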
In contrast to categorical data are numerical data.
Returning to the crated machine parts, Rockwell hardness values for 5 selected
parts would constitute a set of quantitative measurement data. Counts of visible
blemishes on a machined surface for each of the 5 selected parts would make up a
set of quantitative count data.
It is sometimes convenient to act as if infinitely precise measurement were
possible. From that perspective, measured variables are continuous in the sense
that their sets of possible values are whole (continuous) intervals of numbers. For
example, a convenient idealization might be that the Rockwell hardness of a ma-
chine part can lie anywhere in the interval (0, ∞). But of course this is only an
idealization. All real measurements are to the nearest unit (whatever that unit may
be). This is becoming especially obvious as measurement instruments are increas-
ingly equipped with digital displays. So in reality, when looked at under a strong
enough magnifying glass, all numerical data (both measured and count alike) are
discrete in the sense that they have isolated possible values rather than a continuum
of available outcomes. Although (0, ∞) may be mathematically convenient and
completely adequate for practical purposes, the real set of possible values for the
measured Rockwell hardness of a machine part may be more like {.1, .2, .3, . . .}
than like (0, ∞).
Well-known conventional wisdom is that measurement data are preferable to
categorical and count data. Statistical methods for measurements are simpler and
more informative than methods for qualitative data and counts. Further, there is
typically far more to be learned from appropriate measurements than from qualitative
data taken on the same physical objects. However, this must sometimes be balanced
against the fact that measurement can be more time-consuming (and thus expensive)
than the gathering of qualitative data.
Example 3 (continued)
Information on 200 pellets was collected. The students could have simply observed and recorded whether or not a given pellet had mass within the specifications, thereby producing qualitative data. Instead, they took the time necessary to actually measure pellet mass to the nearest .1 gram—thereby collecting measurement data. A graphical summary of their findings is shown in Figure 1.3.
[Figure 1.3: histogram of the 200 measured pellet masses—frequency versus mass (g), on a scale of roughly 3 to 8 g]
Notice that one can recover from the measurements the conformity/noncon-
formity information—about 28.5% (57 out of 200) of the pellets had masses that
did not meet specifications. But there is much more in Figure 1.3 besides this.
The shape of the display can give insights into how the machine is operating and
the likely consequences of simple modifications to the pelletizing process. For
example, note the truncated or chopped-off appearance of the figure. Masses do
not trail off on the high side as they do on the low side. The students reasoned that
this feature of their data had its origin in the fact that after powder is dispensed
into a die, it passes under a paddle that wipes off excess material before a cylinder
compresses the powder in the die. The amount initially dispensed to a given die
may have a fairly symmetric mound-shaped distribution, but the paddle probably
introduces the truncated feature of the display.
Also, from the numerical data displayed in Figure 1.3, one can find a per-
centage of pellet masses in any interval of interest, not just the interval [6.2, 7.0].
And by mentally sliding the figure to the right, it is even possible to project the
likely effects of increasing die size by various amounts.
Definition 10 Univariate data arise when only a single characteristic of each sampled item
is observed.
Definition 11 Multivariate data arise when observations are made on more than one
characteristic of each sampled item. A special case of this involves two
characteristics—bivariate data.
Definition 12 When multivariate data consist of several determinations of basically the same
characteristic (e.g., made with different instruments or at different times),
the data are called repeated measures data. In the special case of bivariate
responses, the term paired data is used.
Definition 13 A (complete) factorial study is one in which several process variables (and
settings of each) are identified as being of interest, and data are collected under
each possible combination of settings of the process variables. The process
variables are usually called factors, and the settings of each variable that are
studied are termed levels of the factor.
For example, suppose there are four factors of interest—call them A, B, C, and D for
convenience. If A has 3 levels, B has 2, C has 2, and D has 4, a study that includes
samples collected under each of the 3 × 2 × 2 × 4 = 48 different possible sets of
conditions would be called a 3 × 2 × 2 × 4 factorial study.
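
As a concrete illustration, here is a minimal Python sketch that enumerates all 48 combinations (the generic factor names A–D and integer level labels follow the example above; a real study would substitute its own):

```python
from itertools import product

# Levels of the four factors in the 3 x 2 x 2 x 4 example above
levels = {
    "A": [1, 2, 3],
    "B": [1, 2],
    "C": [1, 2],
    "D": [1, 2, 3, 4],
}

# One combination = one level chosen from each factor
combinations = list(product(*levels.values()))
print(len(combinations))  # 3 * 2 * 2 * 4 = 48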
Combining these then produced eight sets of conditions under which data were
collected (see Table 1.2).
Table 1.2
Combinations in a 2³ Factorial Study [table body not reproduced]
When many factors and/or levels are involved, the number of samples in a
full factorial study quickly reaches an impractical size. Engineers often find that
they want to collect data for only some of the combinations that would make up a
complete factorial study.
Definition 14 A fractional factorial study is one in which data are collected for only some
of the combinations that would make up a complete factorial study.
One cannot hope to learn as much about how a response is related to a given set
of factors from a fractional factorial study as from the corresponding full factorial
study. Some information must be lost when only part of all possible sets of conditions
are studied. However, some fractional factorial studies will be potentially more
informative than others. If only a fixed number of samples can be taken, which
samples to take is an issue that needs careful consideration. Sections 8.3 and 8.4
discuss fractional factorials in detail, including how to choose good ones, taking
into account what part of the potential information from a full factorial study they
can provide.
Example 2 (continued)
The experiment actually carried out on the pelletizing process was, as indicated in Table 1.2, a full factorial study. Table 1.3 lists four experimental combinations, forming a well-chosen half of the eight possible combinations. (These are the combinations numbered 2, 3, 5, and 8 in Table 1.2.)

Table 1.3
Half of the 2³ Factorial [table body not reproduced]
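
For readers who want to experiment with this idea, the sketch below (an illustration only—the level coding of Tables 1.2 and 1.3 is not reproduced here) builds a 2³ factorial in the conventional ±1 coding and keeps the standard half fraction whose coded levels multiply to +1:

```python
from itertools import product

# The full 2^3 factorial in -1/+1 coding: 8 combinations
full_factorial = list(product([-1, 1], repeat=3))

# A well-chosen half: the 4 combinations where x1 * x2 * x3 = +1
half_fraction = [c for c in full_factorial if c[0] * c[1] * c[2] == 1]
print(half_fraction)
```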
Section 2 Exercises
1. Describe a situation in your field where an observational study might be used to answer a question of real importance. Describe another situation where an experiment might be used.
2. Describe two different contexts in your field where, respectively, qualitative and quantitative data might arise.
3. What kind of information can be derived from a single sample of n bivariate data points (x, y) that can't be derived from two separate samples of, respectively, n data points x and n data points y?
4. Describe a situation in your field where paired data might arise.
5. Consider a study of making paper airplanes, where two different Designs (say, delta versus t wing), two different Papers (say, construction versus typing), and two different Loading Conditions (with a paper clip versus without a paper clip) are of interest in terms of their effects on flight distance. Describe a full factorial and then a fractional factorial data structure that might arise from such a study.
6. Explain why it is safer to infer causality from an experiment than from an observational study.
[Figure: a weighted arm swinging through a metal test part, with angles 20°, 40°, and 60° past vertical marked on its arc]
horizontal position, released, and allowed to swing through a test part firmly
fixed in a vertical position at the bottom of its arc of motion. The number of
degrees past vertical that the arm traversed after impact with the part provided an
effective measure of brittleness.
follows. The book is to be opened to a page somewhere near the beginning and one
somewhere near the end. The stack between the two pages is to be grasped firmly
between the thumb and index finger and stack thickness read to the nearest .1 mm
using an ordinary ruler. Dividing the stack thickness by the number of sheets in the
stack and recording the result to the nearest .0001 mm will then produce a thickness
measurement.
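For instance (the numbers here are purely illustrative), a grasped stack of 310 sheets measuring 15.5 mm thick would be recorded as 15.5/310 = .0500 mm per sheet.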
Figure 1.5 shows a graph of these data and clearly reveals that even repeated
measurements by one person on one book will vary and also that the patterns of
variation for two different individuals can be quite different. (Wendel’s values
are both smaller and more consistent than Gulliver’s.)
[Figure 1.5: dot diagrams of book-paper thickness measurements for Wendel and for Gulliver]
Example 7 (continued)
Ignoring the possibility that some property of Gulliver's book was responsible for his values showing more spread than those of Wendel, it appears that Wendel's measuring technique was more precise than Gulliver's.
The precision of both students’ measurements could probably have been
improved by giving each a binder clip and a micrometer. The binder clip would
provide a relatively constant pressure on the stacks of pages being measured,
thereby eliminating the subjectivity and variation involved in grasping the stack
firmly between thumb and index finger. For obtaining stack thickness, a microm-
eter is clearly a more precise instrument than a ruler.
Maintaining the U.S. reference sets for physical measurement is the business of
the National Institute of Standards and Technology. It is important business. Poorly
calibrated measuring devices may be sufficient for local purposes of comparing
local conditions. But to establish the values of quantities in any absolute sense, or
to expect local values to have meaning at other places and other times, it is essential
to calibrate measurement systems against a constant standard. A millimeter must be
the same today in Iowa as it was last week in Alaska.
The possibility of bias or inaccuracy in measuring systems has at least two important implications for planning statistical engineering studies.
Accuracy and statistical studies
First, the fact that measurement systems can lose accuracy over time demands that their performance be monitored over time and that they be recalibrated as needed. The well-known phenomenon of instrument drift can ruin an otherwise flawless statistical study.
Second, whenever possible, a single system should be used to do all measuring. If
several measurement devices or technicians are used, it is hard to know whether the
differences observed originate with the variables under study or from differences in
devices or technician biases. If the use of several measurement systems is unavoid-
able, they must be calibrated against a standard (or at least against each other). The
following example illustrates the role that human differences can play.
Section 3 Exercises
1. Why might it be argued that in terms of producing useful measurements, one must deal first with the issue of validity, then the issue of precision, and only then the issue of accuracy?
2. Often, in order to evaluate a physical quantity (for example, the mean yield of a batch chemical process run according to some standard plant operating procedures), a large number of measurements of the quantity are made and then averaged. Explain which of the three aspects of measurement quality—validity, precision, and accuracy—this averaging of many measurements can be expected to improve and which it cannot.
3. Explain the importance of the stability of the measurement system to the real-world success of a statistical engineering study.
1.4 Mathematical Models, Reality, and Data Analysis
Mathematical models are themselves not reality, but they can be extremely effective
descriptions of reality. This effectiveness hinges on two somewhat opposing prop-
erties of a mathematical model: (1) its degree of simplicity and (2) its predictive
ability. The most powerful mathematical models are those that simultaneously are
simple and generate good predictions. A model’s simplicity allows one to maneuver
within its framework, deriving mathematical consequences of basic assumptions that
translate into predictions of process behavior. When these are empirically correct,
one has an effective engineering tool.
The elementary “laws” of mechanics are an outstanding example of effective
mathematical modeling. For example, the simple mathematical statement that the
acceleration due to gravity is constant,
a = g
yields, after one easy mathematical maneuver (an integration), the prediction that
beginning with 0 velocity, after a time t in free fall an object will have velocity
v = gt
And a second integration gives the prediction that beginning with 0 velocity, a time t
in free fall produces displacement
d = (1/2)gt²
The beauty of this is that for most practical purposes, these easy predictions are quite
adequate. They agree well with what is observed empirically and can be counted
on as an engineer designs, builds, operates, and/or improves physical processes or
products.
Mathematics and statistics
But then, how does the notion of mathematical modeling interact with the subject of engineering statistics? There are several ways. For one, data collection and analysis are essential in fitting or estimating parameters of mathematical
models. To understand this point, consider again the example of a body in free fall.
If one postulates that the acceleration due to gravity is constant, there remains the
question of what numerical value that constant should have. The parameter g must
be evaluated before the model can be used for practical purposes. One does this by
gathering data and using them to estimate the parameter.
A standard first college physics lab has traditionally been to empirically evalu-
ate g. The method often used is to release a steel bob down a vertical wire running
through a hole in its center and allowing 60-cycle current to arc from the bob through
a paper tape to another vertical wire, burning the tape slightly with every arc. A
schematic diagram of the apparatus used is shown in Figure 1.7. The vertical positions of the burn marks are bob positions at intervals of 1/60 of a second. Table 1.4
gives measurements of such positions. (We are grateful to Dr. Frank Peterson of
the ISU Physics and Astronomy Department for supplying the tape.) Plotting the
bob positions in the table at equally spaced intervals produces the approximately
quadratic plot shown in Figure 1.8. Picking a parabola to fit the plotted points in-
volves identifying an appropriate value for g. A method of curve fitting (discussed
in Chapter 4) called least squares produces a value for g of 9.79 m/sec², not far from the commonly quoted value of 9.8 m/sec².
[Figure 1.7: schematic of the apparatus—a sliding metal bob between two bare vertical wires, with an arc from an AC generator burning a paper tape]
Table 1.4
Measured Displacements of a Bob in Free Fall

Point  Displacement (mm)    Point  Displacement (mm)
 1          .8               13        223.8
 2         4.8               14        260.0
 3        10.8               15        299.2
 4        20.1               16        340.5
 5        31.9               17        385.0
 6        45.9               18        432.2
 7        63.3               19        481.8
 8        83.1               20        534.2
 9       105.8               21        589.8
10       131.3               22        647.7
11       159.5               23        708.8
12       190.5
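
Readers with access to software can reproduce the least squares fit mentioned above from the Table 1.4 data. Here is a minimal Python sketch; it assumes (as described in the text) that displacements are in mm and that successive points are 1/60 second apart:

```python
import numpy as np

# Displacements (mm) of the bob from Table 1.4, points 1-23
d_mm = [0.8, 4.8, 10.8, 20.1, 31.9, 45.9, 63.3, 83.1, 105.8, 131.3,
        159.5, 190.5, 223.8, 260.0, 299.2, 340.5, 385.0, 432.2,
        481.8, 534.2, 589.8, 647.7, 708.8]

t = np.arange(1, 24) / 60.0   # times in seconds
d = np.array(d_mm) / 1000.0   # displacements in meters

# Fit d = c0 + c1*t + c2*t^2 by least squares; the model says c2 = g/2
c2, c1, c0 = np.polyfit(t, d, 2)
print(f"estimated g: {2 * c2:.2f} m/sec^2")  # close to 9.79
```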
Notice that (at least before Newton) the data in Table 1.4 might also have been
used in another way. The parabolic shape of the plot in Figure 1.8 could have
suggested the form of an appropriate model for the motion of a body in free fall.
That is, a careful observer viewing the plot of position versus time should conclude
that there is an approximately quadratic relationship between position and time (and from that proceed via two differentiations to the conclusion that the acceleration due to gravity is roughly constant). This text is full of examples of how helpful it can be to use data both to identify potential forms for empirical models and to then estimate parameters of such models (preparing them for use in prediction).

[Figure 1.8: displacement (mm), on a scale of 100 to 700, plotted against time in units of 1/60 second; the pattern is approximately quadratic]
This discussion has concentrated on the fact that statistics provides raw material
for developing realistic mathematical models of real systems. But there is another
important way in which statistics and mathematics interact. The mathematical theory
of probability provides a framework for quantifying the uncertainty associated with
inferences drawn from data.
If, for example, five students arrive at the five different laboratory values of g,
questions naturally arise as to how to use them to state both a best value for g
and some measure of precision for the value. The theory of probability provides
guidance in addressing these issues. Material in Chapter 6 shows that probability
considerations support using the class average of 9.796 to estimate g and attaching
to it a precision on the order of plus or minus .02 m/sec².
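
A minimal sketch of this sort of calculation (the five values below are hypothetical stand-ins, since the actual class data are not reproduced here) computes a best value and a rough indication of its precision:

```python
import math

g_values = [9.78, 9.82, 9.77, 9.81, 9.80]  # hypothetical lab values (m/sec^2)

n = len(g_values)
mean = sum(g_values) / n                                       # the "best value"
s = math.sqrt(sum((x - mean) ** 2 for x in g_values) / (n - 1))
print(f"mean = {mean:.3f}, standard error = {s / math.sqrt(n):.3f}")
```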
We do not assume that the reader has studied the mathematics of probability,
so this text will supply a minimal introduction to the subject. But do not lose sight
of the fact that probability is not statistics—nor vice versa. Rather, probability is a
branch of mathematics and a useful subject in its own right. It is met in a statistics
course as a tool because the variation that one sees in real data is closely related
conceptually to the notion of chance modeled by the theory of probability.
Section 4 Exercises
1. Explain in your own words the importance of mathematical models to engineering practice.
Chapter 1 Exercises
1. Calibration of measurement equipment is most clearly associated with which of the following concepts: validity, precision, or accuracy? Explain.
2. If factor A has levels 1, 2, and 3, factor B has levels 1 and 2, and factor C has levels 1 and 2, list the combinations of A, B, and C that make up a full factorial arrangement.
3. Explain how paired data might arise in a heat treating study aimed at determining the best way to heat treat parts made from a certain alloy.
4. Losen, Cahoy, and Lewis purchased eight spanner bushings of a particular type from a local machine shop and measured a number of characteristics of these bushings, including their outside diameters. Each of the eight outside diameters was measured once by two student technicians, with the following results. (The units are inches.) Considering both students' measurements, what type of data are given here? Explain.

Bushing       1      2      3      4
Student A  .3690  .3690  .3690  .3700
Student B  .3690  .3695  .3695  .3695

Bushing       5      6      7      8
Student A  .3695  .3700  .3695  .3690
Student B  .3695  .3700  .3700  .3690

5. Describe a situation from your field where a full factorial study might be conducted (name at least three factors, and the levels of each, that would appear in the study).
6. Example 7 concerns the measurement of the thickness of book paper. Variation in measurements is a fact of life. To observe this reality firsthand, measure the thickness of the paper used in this book ten times. Use the method described immediately before Example 7. For each determination, record the measured stack thickness, the number of sheets, and the quotient to four decimal places. If you are using this book in a formal course, be prepared to hand in your results and compare them with the values obtained by others in your class.
7. Exercise 6 illustrates the reality of variation in physical measurement. Another exercise that is similar in spirit, but leads to qualitative data, involves the spinning of U.S. pennies. Spin a penny on a hard surface 20 different times; for each trial, record whether the penny comes to rest with heads or tails showing. Did all the trials have the same outcome? Is the pattern you observed the one you expected to see? If not, do you have any possible explanations?
8. Consider a situation like that of Example 1 (involving the heat treating of gears). Suppose that the original gears can be purchased from a variety of vendors, they can be made out of a variety of materials, they can be heated according to a variety of regimens (involving different times and temperatures), they can be cooled in a number of different ways, and the furnace atmosphere can be adjusted to a variety of different conditions. A number of features of the final gears are of interest, including their flatness, their concentricity, their hardness (both before and after heat treating), and their surface finish.
(a) What kind of data arise if, for a single set of conditions, the Rockwell hardness of several gears is measured both before and after heat treating? (Use the terminology of Section 1.2.) In the same context, suppose that engineering specifications on flatness require that measured flatness not exceed .40 mm. If flatness is measured for several gears and each gear is simply marked Acceptable or Not Acceptable, what kind of data are generated?
(b) Describe a three-factor full factorial study that might be carried out in this situation. Name the factors that will be used and describe the levels of each. Write out a list of all the different combinations of levels of the factors that will be studied.
9. Suppose that you wish to determine "the" axial strength of a type of wooden dowel. Why might it be a good idea to test several such dowels in order to arrive at a value for this "physical constant"?
10. Give an example of a 2 × 3 full factorial data structure that might arise in a student study of the breaking strengths of wooden dowels. (Name the two factors involved, their levels, and write out all six different combinations.) Then make up a data collection form for the study. Plan to record both the breaking strength and whether the break was clean or splintered for each dowel, supposing that three dowels of each type are to be tested.
11. You are a mechanical engineer charged with improving the life-length characteristics of a hydrostatic transmission. You suspect that important variables include such things as the hardnesses, diameters, and surface roughnesses of the pistons and the hardnesses, inside diameters, and surface roughnesses of the bores into which the pistons fit. Describe, in general terms, an observational study to try to determine how to improve life. Then describe an experimental study and say why it might be preferable.
12. In the context of Exercise 9, it might make sense to average the strengths you record. Would you expect such an average to be more or less precise than a single measurement as an estimate of the average strength of this kind of dowel? Explain. Argue that such averages can be no more (or less) accurate than the individual measurements that make them up.
13. A toy catapult launches golf balls. There are a number of things that can be altered on the configuration of the catapult: The length of the arm can be changed, the angle the arm makes when it hits the stop can be changed, the pull-back angle can be changed, the weight of the ball launched can be changed, and the place the rubber cord (used to snap the arm forward) is attached to the arm can be changed. An experiment is to be done to determine how these factors affect the distance a ball is launched.
(a) Describe one three-factor full factorial study that might be carried out. Make out a data collection form that could be used. For each launch, specify the level to be used of each of the three factors and leave a blank for recording the observed value of the response variable. (Suppose two launches will be made for each setup.)
(b) If each of the five factors mentioned above is included in a full factorial experiment, a minimum of how many different combinations of levels of the five factors will be required? If there is time to make only 16 launches with the device during the available lab period, but you want to vary all five factors, what kind of a data collection plan must you use?
14. As a variation on Exercise 6, you could try using only pages in the first four chapters of the book. If there were to be a noticeable change in the ultimate precision of thickness measurement, what kind of a change would you expect? Try this out by applying the method in Exercise 6 ten times to stacks of pages from only the first four chapters. Is there a noticeable difference in precision of measurement from what is obtained using the whole book?
2 Data Collection
2.1 General Principles in the Collection of Engineering Data
2.1.1 Measurement
Good measurement is indispensable in any statistical engineering study. An engi-
neer planning a study ought to ensure that data on relevant variables will be col-
lected by well-trained people using measurement equipment of known and adequate
quality.
When choosing variables to observe in a statistical study, the concepts of mea-
surement validity and precision, discussed in Section 1.3, must be remembered. One
practical point in this regard concerns how directly a measure represents a system
property. When a direct measure exists, it is preferable to an indirect measure,
because it will usually give much better precision.
2.1.2 Sampling
Once it is established how measurement/observation will proceed, the engineer can
consider how much to do, who is to do it, where and under what conditions it is
to be done, etc. Sections 2.2, 2.3, and 2.4 consider the question of choosing what
observations to make, first in enumerative and then in experimental studies. But first,
a few general comments about the issues of “How much?”, “Who?”, and “Where?”.
How much data?
The most common question engineers ask about data collection is "How many observations do I need?" Unfortunately, the proper answer to the question is typically
“it depends.” As you proceed through this book, you should begin to develop some
intuition and some rough guides for choosing sample sizes. For the time being, we
point out that the only factor on which the answer to the sample size question really
depends is the variation in response that one expects (coming both from unit-to-unit
variation and from measurement variation).
This makes sense. If objects to be observed were all alike and perfect measure-
ment were possible, then a single observation would suffice for any purpose. But if
there is increase either in the measurement noise or in the variation in the system
or population under study, the sample size necessary to get a clear picture of reality
becomes larger.
However, one feature of the matter of sample size sometimes catches people a bit
off guard—the fact that in enumerative studies (provided the population size is large),
sample size requirements do not depend on the population size. That is, sample size
requirements are not relative to population size, but, rather, are absolute. If a sample
size of 5 is adequate to characterize compressive strengths of a lot of 1,000 red
clay bricks, then a sample of size 5 would be adequate to characterize compressive
strengths for a lot of 100,000 bricks with similar brick-to-brick variability.
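
This can be checked by simulation. The following rough Python sketch (the brick-strength distribution is hypothetical) shows that sample means of n = 5 from a lot of 1,000 and from a lot of 100,000 with similar item-to-item variability scatter about the lot mean to essentially the same degree:

```python
import random
import statistics

random.seed(0)

def sd_of_sample_means(lot_size, n=5, trials=2000):
    # A hypothetical lot of brick strengths: mean 50, sd 5 (units arbitrary)
    lot = [random.gauss(50, 5) for _ in range(lot_size)]
    means = [statistics.mean(random.sample(lot, n)) for _ in range(trials)]
    return statistics.stdev(means)

print(sd_of_sample_means(1_000))    # about 5 / sqrt(5) = 2.2
print(sd_of_sample_means(100_000))  # essentially the same
```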
Who should collect data?
The "Who?" question of data collection cannot be effectively answered without reference to human nature and behavior. This is true even in a time when automatic
data collection devices are proliferating. Humans will continue to supervise these
and process the information they generate. Those who collect engineering data must
not only be well trained; they must also be convinced that the data they collect will
be used and in a way that is in their best interests. Good data must be seen as a help
in doing a good job, benefiting an organization, and remaining employed, rather
than as pointless or even threatening. If those charged with collecting or releasing
data believe that the data will be used against them, it is unrealistic to expect them
to produce useful data.
Even where those who will gather data are convinced of its importance and are
eager to cooperate, care must be exercised. Personal biases (whether conscious or
subconscious) must not be allowed to enter the data collection process. Sometimes
in a statistical study, hoped-for or predicted best conditions are deliberately or
unwittingly given preference over others. If this is a concern, measurements can be
made blind (i.e., without personnel knowing what set of conditions led to an item
being measured). Other techniques for ensuring fair play, having less to do with
human behavior, will be discussed in the next two sections.
Where should data be collected?
The "Where?" question of engineering data collection can be answered in general terms: "As close as possible in time and space to the phenomenon being studied." The importance of this principle is most obvious in the routine monitoring
of complex manufacturing processes. The performance of one operation in such a
process is most effectively monitored at the operation rather than at some later point.
If items being produced turn out to be unsatisfactory at the end of the line, it is rarely
easy to backtrack and locate the operation responsible. Even if that is accomplished,
unnecessary waste has occurred during the time lag between the onset of operation
malfunction and its later discovery.
Example 3 (continued)
wafers could be identified and eliminated, thus saving the considerable extra expense of further processing. What's more, the need for adjustments to the process was signaled in a timely manner.
2.1.3 Recording
The object of engineering data collection is to get data used. How they are recorded
has a major impact on whether this objective is met. A good data recording format
can make the difference between success and failure.
Table 2.1
Mass and Bottom Piece Widths of PVC Bottles
[columns: Sample, Item, Mass (g), Width (mm); the first three and last three samples were shown in the original—data not reproduced]
[Figure: scatterplot of width (mm), on a scale of 24 to 31, against mass (g), with the plotting symbol varied over time]
is easy to see how the two variables are related. If, as in the figure, the recording
symbol is varied over time, it is also easy to track changes in the characteristics
over time. In the present case, width seems to be inversely related to mass, which
appears to be decreasing over time.
[Figure 2.2: a variables control chart form, with blanks for Part and Drawing, Dimension, Specifications, Units, Gage, Zero Equals, Production Lot(s), Period Covered, Date, Time, Raw Material Lot, Operation, Operator, Machine, the individual Measurements, and their Sum, Mean, and Range]
and values of the experimental variables, and it is also wise to keep track of other
variables that might later prove to be of interest. In the context of routine process
monitoring, data records will be useful in discovering differences in raw material
lots, machines, operators, etc., only if information on these is recorded along with
the responses of interest. Figure 2.2 shows a form commonly used for the routine
collection of measurements for process monitoring. Notice how thoroughly the user
is invited to document the data collection.
Section 1 Exercises
1. Consider the context of a study on making paper airplanes where two different Designs (say delta versus t wing), two different Papers (say construction versus typing), and two different Loading Conditions (with a paper clip versus without a paper clip) are of interest with regard to their impact on flight distance. Give an operational definition of flight distance that you might use in such a study.
2. Explain how training operators in the proper use of measurement equipment might affect both the repeatability and the reproducibility of measurements made by an organization.
3. What would be your response to another engineer's comment, "We have great information on our product—we take 5% samples of every outgoing order, regardless of order size!"?
4. State briefly why it is critical to make careful operational definitions for response variables in statistical engineering studies.
2.2 Sampling in Enumerative Studies
properties in any useful way. There is no good way to take information from samples
drawn via these methods and make reliable statements of likely margins of error. The
method introduced next avoids the deficiencies of systematic and judgment-based
sampling.
Mechanical methods and simple random sampling
Methods for actually carrying out the selection of a simple random sample include mechanical methods and methods using "random digits." Mechanical methods rely for their effectiveness on symmetry and/or thorough mixing in a physical randomizing device. So to speak, the slips of paper in the hat need to be of the same size and well scrambled before sample selection begins.
The first Vietnam-era U.S. draft lottery was a famous case in which adequate
care was not taken to ensure appropriate operation of a mechanical randomizing
device. Birthdays were supposed to be assigned priority numbers 1 through 366 in a
“random” way. However, it was clear after the fact that balls representing birth dates
were placed into a bin by months, and the bin was poorly mixed. When the balls
were drawn out, birth dates near the end of the year received a disproportionately
large share of the low draft numbers. In the present terminology, the first five dates
out of the bin should not have been thought of as a simple random sample of size 5.
Those who operate games of chance more routinely make it their business to know
(via the collection of appropriate data) that their mechanical devices are operating
in a more random manner.
1. each digit 0 through 9 has the same chance of appearing at any particular
location in the table one wants to consider, and
2. knowledge of which digit will occur at a given location provides no help in
predicting which one will appear at another.
Table 2.2
Random Digits
12159 66144 05091 13446 45653 13684 66024 91410 51351 22772
30156 90519 95785 47544 66735 35754 11088 67310 19720 08379
59069 01722 53338 41942 65118 71236 01932 70343 25812 62275
54107 58081 82470 59407 13475 95872 16268 78436 39251 64247
99681 81295 06315 28212 45029 57701 96327 85436 33614 29070
Data Display
C2
56 74 43 61 80 22 30 67 35 7
10 69 19 49 8 45 3 37 21 17
2 12 9 14 72
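
Output of this sort can be produced by essentially any statistical or spreadsheet package, or by a few lines of code. A minimal Python sketch in the same spirit (assuming, purely for illustration, a population of items numbered 1 through 80 and a desired sample size of 25):

```python
import random

random.seed(2023)  # fixed seed only so the result is reproducible

population = range(1, 81)                # items numbered 1 through 80
sample = random.sample(population, 25)   # simple random sample, no repeats
print(sorted(sample))
```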
Section 2 Exercises
1. For the sake of exercise, treat the runout values for 38 laid gears (given in Table 1.1) as a population of interest, and using the random digit table (Table B.1), select a simple random sample of 5 of these runouts. Repeat this selection process a total of four different times. (Begin the selection of the first sample at the upper left of the table and proceed left to right and top to bottom.) Are the four samples identical? Are they each what you would call "representative" of the population?
2. Repeat Exercise 1 using statistical or spreadsheet software to do the random sampling.
3. Explain briefly why in an enumerative study, a simple random sample is or is not guaranteed to be representative of the population from which it is drawn.
2.3 Principles for Effective Experimentation
This section discusses five basic issues in effective experimentation:
1. taxonomy of variables,
2. handling extraneous variables,
3. comparative study,
4. replication, and
5. allocation of resources.
Then Section 2.4 discusses a few generic experimental frameworks for planning a
specific experiment.
[Figure 2.4: a physical process producing values of a response variable, with "knobs" representing managed variables and concomitant variables in the surrounding environment]
Some of the variables that are neither primary responses nor managed in an experi-
ment will nevertheless be observed.
Figure 2.4 is an attempt to picture Definitions 2 through 4. In it, the physical process
somehow produces values of a response. “Knobs” on the process represent managed
variables. Concomitant variables are floating about as part of the experimental
environment without being its main focus.
Example 7 (continued)
The choice Dimond and Dix made to control Drying Time and the Pressure provided a uniform environment for comparing the nine wood/glue combinations.
But strictly speaking, they learned only about joint behavior under their particular
experimental Time and Pressure conditions.
To make projections for other conditions, they had to rely on their expe-
rience and knowledge of material science to decide how far the patterns they
observed were likely to extend. For example, it may have been reasonable to
expect what they observed to also hold up for any drying time at least as long
as the experimental one, because of expert knowledge that the experimental time
was sufficient for the joints to fully set. But such extrapolation is based on other
than statistical grounds.
Example 7 (continued)
Consider embellishing a bit on the gluing study of Dimond and Dix. Imagine that the students were uneasy about two issues, the first being the possibility that
surface roughness differences in the pieces to be glued might mask the wood/glue
combination differences of interest. Suppose also that because of constraints on
schedules, the strength testing was going to have to be done in two different
sessions a day apart. Measuring techniques or variables like ambient humidity
might vary somewhat between such periods. How might such potential problems
have been handled?
Blocking is one way. If the specimens of each wood type were separated into
relatively rough and relatively smooth groups, the factor Roughness could have
then served as an experimental factor. Each of the glues could have been used the
same number of times to join both rough and smooth specimens of each species.
This would set up comparison of wood/glue combinations separately for rough
and for smooth surfaces.
In a similar way, half the testing for each wood/glue/roughness combination
might have been done in each testing session. Then, any consistent differences
between sessions could be identified and prevented from clouding the comparison
of levels of the primary experimental variables. Thus, Testing Period could have
also served as a blocking variable in the study.
Randomization and extraneous variables
Experimenters usually hope that by careful planning they can account for the most important extraneous variables via control and blocking. But not all extraneous variables can be supervised. There are an essentially infinite number, most of which cannot even be named. And there is a way to take out insurance against the possibility that major extraneous variables get overlooked and then produce effects that are mistaken for those of the primary experimental variables.
Example 7 Dimond and Dix took the notion of randomization to heart in their gluing study
(continued ) and, so to speak, randomized everything in sight. In the tension strength testing for
a given type of wood, they glued .5″ × .5″ × 3″ blocks to a .75″ × 3.5″ × 31.5″
board of the same wood type, as illustrated in Figure 2.5.
Each glue was used for three joints on each type of wood. In order to deal
with any unpredicted differences in material properties (e.g., over the extent of
the board) or unforeseen differences in loading by the steel strap used to provide
pressure on the joints, etc., the students randomized the order in which glue was
applied and the blocks placed along the base board. In addition, when it came
time to do the strength testing, that was carried out in a randomly determined
order.
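The mechanics of this kind of randomization are simple to reproduce in software instead of with a random number table. The following is a minimal sketch in Python; the glue labels and the seed are hypothetical, not taken from the students' study:

```python
import random

# Three joints for each of three glues on one board (labels hypothetical)
joints = [(glue, copy) for glue in ("glue A", "glue B", "glue C")
          for copy in (1, 2, 3)]

random.seed(1)                  # fixed seed so the plan can be written down
gluing_order = joints[:]
random.shuffle(gluing_order)    # random positions of the joints along the board

testing_order = joints[:]
random.shuffle(testing_order)   # independent random order for strength testing

print("Board positions:", gluing_order)
print("Testing order:  ", testing_order)
```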
[Figure 2.5: Wood blocks positioned along a wood board, with pressure on the glue joints supplied by a metal strap.]
Example 8 In the gear loading study, hanging was the standard method in use at the time
(continued ) of the study. From its records, the company could probably have located some
values for thrust face runout to use as a baseline for evaluating the laying method.
But the choice to run a comparative study, including both laid and hung gears,
put the engineer on firm ground for drawing conclusions about the new method.
A second usage of "control"
In a potentially confusing use of language, the word control is sometimes used to mean the practice of including a standard or no-change sample in an experiment for comparison purposes. (Notice that this is not the usage in Definition 3.) When a control group is included in a medical study to verify the effectiveness of a new drug, that group is either a standard-treatment or no-treatment group, included to provide a solid basis of comparison for the new treatment.
2.3.4 Replication
In much of what has been said so far, it has been implicit that having more than one
observation for a given setting of experimental variables is a good idea.
Example 10 “like” the two used in the study, they needed to make and test several prototypes
(continued ) for each design.
ISU Professor Emeritus L. Wolins calls the problem of identifying what con-
stitutes replication in an experiment the unit of analysis problem. There must be
replication of the basic experimental unit or object. The agriculturalist who, in order
to study pig blood chemistry, takes hundreds of measurements per hour on one pig,
has a (highly multivariate) sample of size 1. The pig is the unit of analysis.
Without proper replication, one can only hope to be lucky. If experimental error
is small, then accepting conclusions suggested by samples of size 1 will lead to
correct conclusions. But the problem is that without replication, one usually has
little idea of the size of that experimental error.
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Consider again the paper airplane study from Exercise 1 of Section 2.1. Describe some variables that you would want to control in such a study. What are the response and experimental variables that would be appropriate in this context? Name a potential concomitant variable here.
2. In general terms, what is the trade-off that must be weighed in deciding whether or not to control a variable in a statistical engineering study?
3. In the paper airplane scenario of Exercise 1 of Section 2.1, if (because of schedule limitations, for example) two different team members will make the flight distance measurements, discuss how the notion of blocking might be used.
4. Again using the paper airplane scenario of Exercise 1 of Section 2.1, suppose that two students are each going to make and fly one airplane of each of the 2³ = 8 possible types once. Employ the notion of randomization and Table B.1 and develop schedules for Tom and Juanita to use in their flight testing. Explain how the table was used.
5. Continuing the paper airplane scenario of Exercise 1 of Section 2.1, discuss the pros and cons of Tom and Juanita flying each of their own eight planes twice, as opposed to making and flying two planes of each of the eight types, one time each.
6. Random number tables are sometimes used in the planning of both enumerative and analytical/experimental studies. What are the two different terminologies employed in these different contexts, and what are the different purposes behind the use of the tables?
7. What is blocking supposed to accomplish in an engineering experiment?
8. What are some purposes of replication in a statistical engineering study?
9. Comment briefly on the notion that in order for a statistical engineering study to be statistically proper, one should know before beginning data collection exactly how an entire experimental budget is to be spent. (Is this, in fact, a correct idea?)
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Notice that this definition says nothing about how the combinations of settings
of experimental variables included in the study are structured. In fact, they may
be essentially unstructured or produce data with any of the structures discussed
in Section 1.2. That is, there are completely randomized one-factor, factorial, and
fractional factorial experiments. The essential point in Definition 8 is that all else is
randomized except what is restricted by choice of which combinations of levels of
experimental variables are to be used in the study.
Paraphrase of the definition of complete randomization
Although it doesn't really fit every situation (or perhaps even most) in which the term complete randomization is appropriate, language like the following is commonly used to capture the intent of Definition 8. "Experimental units (objects) are allocated at random to the treatment combinations (settings of experimental variables). Experimental runs are made in a randomly determined order. And any post-facto measuring of experimental outcomes is also carried out in a random order."
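In software, this paraphrase amounts to a few independent random permutations. Here is a minimal sketch in Python, using a hypothetical 2 × 2 factorial (not one of the studies in this chapter) with two replicate runs per treatment combination:

```python
import random

random.seed(2)

# Hypothetical 2 x 2 factorial, two replicate runs per treatment combination
treatments = [(a, b) for a in ("A low", "A high") for b in ("B low", "B high")]
runs = treatments * 2

units = list(range(1, len(runs) + 1))
random.shuffle(units)                 # units allocated at random to treatments
plan = list(zip(units, runs))

random.shuffle(plan)                  # runs made in a randomly determined order
for order, (unit, trt) in enumerate(plan, start=1):
    print(f"Run {order}: unit {unit} gets {trt}")
```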
[Figure: Three acid baths, a 100% HF, a 75% HF, and a 50% HF dilution.]
Randomization is a good idea. Its virtues have been discussed at some length.
So it would be wise to point out that using it can sometimes lead to practically
unworkable experimental plans. Dogmatic insistence on complete randomization
can in some cases be quite foolish and unrealistic. Changing experimental variables
according to a completely randomly determined schedule can sometimes be exceed-
ingly inconvenient (and therefore expensive). If the inconvenience is great and the
fear of being misled by the effects of extraneous variables is relatively small, then
backing off from complete to partial randomization may be the only reasonable
course of action. But when choosing not to randomize, the implications of that
choice must be carefully considered.
Example 11 From the discussion of replication in the previous section and present con-
(continued ) siderations of complete randomization, it would seem that the purest method of
conducting the study would be to make a new dilution of HF for each of the rods
as its turn comes for testing. But this would be time-consuming and might require
more acid than was available.
If the investigator had three containers to use for baths but limited acid, an
alternative possibility would be to prepare three different dilutions, one 100%,
one 75%, and one 50% dilution. A given dilution could then be used in testing
all rods assigned to that concentration. Notice that this alternative allows for a
randomized order of testing, but it introduces some question as to whether there
is “true” replication.
Taking the resource restriction idea one step further, notice that even if an
investigator could afford only enough acid for making one bath, there is a way
of proceeding. One could do all 100% concentration testing, then dilute the
acid and do all 75% testing, then dilute the acid again and do all 50% testing.
The resource restriction would not only affect the “purity” of replication but also
prevent complete randomization of the experimental order. Thus, for example, any
unintended effects of increased contamination of the acid (as more and more tests
were made using it) would show up in the experimental data as indistinguishable
from effects of differences in acid concentration.
To choose intelligently between complete randomization (with “true” repli-
cation) and the two plans just discussed, the real severity of resource limitations
would have to be weighed against the likelihood that extraneous factors would
jeopardize the usefulness of experimental results.
blocking variables also have one of these standard structures. The essential points
of Definition 9 are the completeness of each block (in the sense that it contains
each setting of the primary variables) and the randomization within each block. The
following two examples illustrate that depending upon the specifics of a scenario,
Definition 9 can describe a variety of experimental plans.
Example 12 As actually run, Gronberg’s golf ball flight study amounted to a randomized
(continued ) complete block experiment. This is because he hit and recorded flight distances
for all 30 balls on six different evenings (over a six-week period). Note that
this allowed him to have (six different) homogeneous conditions under which to
compare the flight distances of balls having 80, 90, and 100 compression. (The
blocks account for possible changes over time in his physical condition and skill
level as well as varied environmental conditions.)
Notice the structure of the data set that resulted from the study. The settings of
the single primary experimental variable Compression combined with the levels
of the single blocking factor Day to produce a 3 × 6 factorial structure for 18
samples of size 10, as pictured in Figure 2.7.
[Figure 2.7: The 3 × 6 (Compression × Day) factorial structure, with samples of flight distances for 80, 90, and 100 compression balls on each of six days.]

Table 2.4
Half of a 2³ Factorial Run Once in Each of Four Blocks

Block 1 (Operator 1, Day 1): 1 run of each of four combinations
Block 2 (Operator 2, Day 1): 1 run of each of four combinations
Block 3 (Operator 1, Day 2): 1 run of each of four combinations
Block 4 (Operator 2, Day 2): 1 run of each of four combinations
Example 13 In Section 1.2, the pelletizing machine study examined all eight possible com-
(continued ) binations of Volume, Flow, and Mixture. These are listed in Table 2.5. Imagine
that only half of these eight combinations can be run on a given day, and there
is some fear that daily environmental conditions might strongly affect process
performance. How might one proceed?
There are then two blocks (days), each of which will accommodate four
runs. Some possibilities for assigning runs to blocks would clearly be poor. For
example, running combinations 1 through 4 on the first day and 5 through 8 on
the second would make it impossible to distinguish the effects of Mixture from
any important environmental effects.
What turns out to be a far better possibility is to run, say, the four combinations
listed in Table 2.3 (combinations 2, 3, 5, and 8) on one day and the others on
the next. This is illustrated in Table 2.6. In a well-defined sense (explained in
Chapter 8), this choice of an incomplete block plan minimizes the unavoidable
clouding of inferences caused by the fact that all eight combinations of levels of
Volume, Flow, and Mixture cannot be run on a single day.
Table 2.6
A 23 Factorial Run in Two Incomplete Blocks
Table 2.7
A Once-Replicated 23 Factorial Run in Four Incomplete
Blocks
There may be some reader uneasiness and frustration with the “rabbit out of a
hat” nature of the examples of incomplete block experiments, since there has been
no discussion of how to go about making up a good incomplete block plan. Both
the choosing of an incomplete block plan and corresponding techniques of data
analysis are advanced topics that will not be developed until Chapter 8. The purpose
here is to simply introduce the possibility of incomplete blocks as a useful option in
experimental planning.
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. What standard name might be applied to the experimental plan you developed for Exercise 4 of Section 2.3?
2. Consider an experimental situation where the three factors A, B, and C each have two levels, and it is desirable to make three experimental runs for each of the possible combinations of levels of the factors.
(a) Select a completely random order of experimentation. Carefully describe how you use Table B.1 or statistical software to do this. Make an ordered list of combinations of levels of the three factors, prescribing which combination should be run first, second, etc.
(b) Suppose that because of physical constraints, only eight runs can be made on a given day. Carefully discuss how the concept of blocking could be used in this situation when planning which experimental runs to make on each of three consecutive days. What possible purpose would blocking serve?
(c) Use Table B.1 or statistical software to randomize the order of experimentation within the blocks you described in part (b). (Make a list of what combinations of levels of the factors are to be run on each day, in what order.) How does the method you used here differ from what you did in part (a)?
3. Once more referring to the paper airplane scenario of Exercise 1 of Section 2.1, suppose that only the factors Design and Paper are of interest (all planes will be made without paper clips) but that Tom and Juanita can make and test only two planes apiece. Devise an incomplete block plan for this study that gives each student experience with both designs and both papers. (Which two planes will each make and test?)
4. Again in the paper airplane scenario of Exercise 1 of Section 2.1, suppose that Tom and Juanita each have time to make and test only four airplanes apiece, but that in toto they still wish to test all eight possible types of planes. Develop a sensible plan for doing this. (Which planes should each person test?) You will probably want to be careful to make sure that each person tests two delta wing planes, two construction paper planes, and two paper clip planes. Why is this? Can you arrange your plan so that each person tests each Design/Paper combination, each Design/Loading combination, and each Paper/Loading combination once?
5. What standard name might be applied to the plan you developed in Exercise 4?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
[The 12 steps are grouped under three headings: Problem Definition, Study Definition, and Physical Preparation.]
These 12 points are listed in a reasonably rational order, but planning any
real study may involve departures from the listed order as well as a fair amount
of iterating among the steps before they are all accomplished. The need for other
steps (like finding funds to pay for a proposed study) will also be apparent in some
contexts. Nevertheless, steps 1 through 12 form a framework for getting started.
has studied and asked questions in order to gain expert knowledge about a system
is he or she then in a position to decide intelligently what is not known about the
system—and thus what data will be of help.
It is often helpful at step 2 to make flowcharts describing an ideal process and/or
the process as it is currently operating. (Sometimes the comparison of the two is
enough in itself to show an engineer how a process should be modified.) During the
construction of such a chart, data needs and variables of potential interest can be
identified in an organized manner.
[Flowchart of a printing process: work order → typesetting (if needed) → makeready (if needed) → photo lab → masking → plating → printing → cutting (if needed) → folding (if needed) → ship.]
Example 15 what might go wrong in the printing process and at what points what data could
(continued ) be gathered in order to monitor and improve process performance.
Step 3 After determining the general arena and physical context of a statistical engi-
neering study, it is necessary to agree on a statement of purpose and scope for the
study. An engineering project team assigned to work on a wave soldering process
for printed circuit boards must understand the steps in that process and then begin to
define what part(s) of the process will be included in the study and what the goal(s)
of the study will be. Will flux formulation and application, the actual soldering,
subsequent cleaning and inspection, and touch-up all be studied? Or will only some
part of this list be investigated? Is system throughput the primary concern, or is it
instead some aspect of quality or cost? The sharper a statement of purpose and scope
can be made at this point, the easier subsequent planning steps will be.
[Figure 2.11: Cause-and-effect diagram for a molding process; main branches include Tooling and Material. From the Third Symposium on Taguchi Methods. © Copyright, American Supplier Institute, Dearborn, Michigan (U.S.A.). Reproduced by permission under License No. 930403.]
Example 16 wheel quality. Without some kind of organization, it would be all but impossible
(continued ) to develop anything like a complete list of important factors in a complex situation
like this.
Step 6 Armed with (1) a list of variables that might influence the response(s) of interest
and some guesses at their relative importance, (2) a solid understanding of the issues
raised in Section 2.3, and (3) knowledge of resource and physical constraints and
time-frame requirements, one can begin to make decisions about which (if any)
variables are to be managed. Experiments have some real advantages over purely
observational studies (see Section 1.2). Those must be weighed against possible extra
costs and difficulties associated with managing both variables that are of interest
and those that are not. The hope is to choose a physically and financially workable
set of managed variables in such a way that the aggregate effects of variables not of
interest and not managed are not so large as to mask the effects of those variables
that are of interest.
Step 7 Choosing experimental levels and then combinations for managed variables
is part of the task of deciding on a detailed data collection protocol. Levels of
controlled and block variables should usually be chosen to be representative of
the values that will be met in routine system operation. For example, suppose the
amount of contamination in a transmission’s hydraulic fluid is thought to affect
time to failure when the transmission is subjected to stress testing, where Operating
Speed and Pressure are the primary experimental variables. It only makes sense to
see that the contamination level(s) during testing are representative of the level(s)
that will be typical when the transmission is used in the field.
With regard to primary experimental variables, one should also choose typical
levels—with a couple of provisos. Sometimes the goal in an engineering experiment
is to compare an innovative, nonstandard way of doing things to current practice.
In such cases, it is not good enough simply to look at system behavior with typical
settings for primary experimental variables. Also, where primary experimental vari-
ables are believed to have relatively small effects on a response, it may be necessary
to choose ranges for the primary variables that are wider than normal, to see clearly
how they act on the response.
Other physical realities and constraints on data collection may also make it
appropriate to use atypical values of managed variables and subsequently extrapolate
experimental results to "standard" circumstances. For example, it is costly to run studies on pilot plants using small quantities of chemical reagents and miniature equipment, but doing so is much cheaper than experimentation on a full-scale facility. Another
kind of engineering study in which levels of primary experimental variables are
purposely chosen outside normal ranges is the accelerated life test. Such studies
are done to predict the life-length properties of products that in normal usage would
far outlast any study of feasible length. All that can then be done is to turn up
the stress on sample units beyond normal levels, observe performance, and try to
extrapolate back to a prediction for behavior under normal usage. (For example, if
sensitive electronic equipment performs well under abnormally high temperature
and humidity, this could well be expected to imply long useful life under normal
temperature and humidity conditions.)
After the experimental levels of individual manipulated variables are chosen,
they must be combined to form the experimental patterns (combinations) of man-
aged variables. The range of choices is wide: factorial structures, fractional factorial
structures, other standard structures, and patterns tailor-made for a particular prob-
lem. (Tailor-made plans will, for example, be needed in situations where particular
combinations of factor levels prescribed by standard structures are a priori clearly
unsafe or destructive of company property.)
But developing a detailed data collection protocol requires more than even
choices of experimental combinations. Experimental order must be decided. Explicit
instructions for actually carrying out the testing must be agreed upon and written
down in such a way that someone who was not involved in study planning can carry
out the data collection. A timetable for initial data collection must be developed.
In all of this, it must be remembered that several iterations of data collection and
analysis (all within given budget constraints) may be required in order to find a
solution to the original engineering problem.
Section 5 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Either take an engineering system and response variable that you are familiar with from your field or consider, for example, the United Airlines passenger flight system and the response variable Customer Satisfaction, and make a cause-and-effect diagram showing a variety of variables that may potentially affect the response. How might such a diagram be practically useful?
Chapter 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Use Table B.1 and choose a simple random sample of n = 8 out of N = 491 widgets. Describe carefully how you label the widgets. Begin in the upper left corner of the table. Then use spreadsheet or statistical software to redo the selection.
2. Consider a potential student project concerning the making of popcorn. Possible factors affecting the outcome of popcorn making include at least the following: Brand of corn, Temperature of corn at beginning of cooking, Popping Method (e.g., frying versus hot air popping), Type of Oil used (if frying), Amount of Oil used (if frying), Batch Size, initial Moisture Content of corn, and Person doing the evaluation of a single batch. Using these factors and/or any others that you can think of, answer the following questions about such a project:
(a) What is a possible response variable in a popcorn project?
(b) Pick two possible experimental factors in this context and describe a 2 × 2 factorial data structure in those variables that might arise in such a study.
(c) Describe how the concept of randomization might be employed.
(d) Describe how the concept of blocking might be employed.
3. An experiment is to be performed to compare the effects of two different methods for loading gears in a carburizing furnace on the amount of distortion produced in a heat treating process. Thrust face runout will be measured for gears laid and for gears hung while treating.
(a) 20 gears are to be used in the study. Randomly divide the gears into a group (of 10) to be laid and a group (of 10) to be hung, using either Table B.1 or statistical software. Describe carefully how you do this. If you use the table, begin in the upper left corner.
(b) What are some purposes of the randomization used in part (a)?
4. A sanitary engineer wishes to compare two methods for determining chlorine content of Cl2-demand-free water. To do this, eight quite different water samples are split in half, and one determination is made using the MSI method and another using the SIB method. Explain why it could be said that the principle of blocking was used in the engineer's study. Also argue that the resulting data set could be described as consisting of paired measurement data.
5. A research group is testing three different methods of electroplating widgets (say, methods A, B, and C). On a particular day, 18 widgets are available for testing. The effectiveness of electroplating may be strongly affected by the surface texture of the widgets. The engineer running the experiment is able to divide the 18 available widgets into three groups of 6 on the basis of surface texture. (Assume that widgets 1–6 are rough, widgets 7–12 are normal, and widgets 13–18 are smooth.)
(a) Use Table B.1 or statistical software in an appropriate way and assign each of the treatments to 6 widgets. Carefully explain exactly how you do the assignment of levels of treatments A, B, and C to the widgets.
(b) If equipment limitations are such that only one widget can be electroplated at once, but it is possible to complete the plating of all 18 widgets on a single day, in exactly what order would you have the widgets plated? Explain where you got this order.
(c) If, in contrast to the situation in part (b), it is possible to plate only 9 widgets in a single day, make up an appropriate plan for plating 9 on each of two consecutive days.
(d) If measurements of plating effectiveness are made on each of the 18 widgets, what kind of data structure will result from the scenario in part (b)? From the scenario in part (c)?
6. A company wishes to increase the light intensity of its photoflash cartridge. Two wall thicknesses (1/16″ and 1/8″) and two ignition point placements are under study. Two batches of the basic formulation used in the cartridge are to be made up, each batch large enough to make 12 cartridges. Discuss how you would recommend running this initial phase of experimentation if all cartridges can be made and tested in a short time period by a single technician. Be explicit about any randomization and/or blocking you would employ. Say exactly what kinds of cartridges you would make and test, in what order. Describe the structure of the data that would result from your study.
7. Use Table B.1 or statistical software and
(a) Select a simple random sample of 5 widgets from a production run of 354 such widgets. (If you use the table, begin at the upper left corner and move left to right, top to bottom.)
(b) Select a random order of experimentation for a context where an experimental factor A has two levels; a second factor, B, has three levels; and two experimental runs are going to be made for each of the 2 × 3 = 6 different possible combinations of levels of the factors. Carefully describe how you do this.
8. Return to the situation of Exercise 8 of the Chapter 1 Exercises.
(a) Name factors and levels that might be used in a three-factor, full factorial study in this situation. Also name two response variables for the study. Suppose that in accord with good engineering data collection practice, you wish to include some replication in the study. Make up a data collection sheet, listing all the combinations of levels of the factors to be studied, and include blanks where the corresponding observed values of the two responses could be entered for each experimental run.
(b) Suppose that it is feasible to make the runs listed in your answer to part (a) in a completely randomized order. Use a mechanical method (like slips of paper in a hat) to arrive at a random order of experimentation for your study. Carefully describe the physical steps you follow in developing this order for data collection.
9. Use Table B.1 and
(a) Select a simple random sample of 7 widgets from a production run of 619 widgets (begin at the upper left corner of the table and move left to right, top to bottom). Tell how you labeled the widgets and name which ones make up your sample.
(b) Beginning in the table where you left off in (a), select a second simple random sample of 7 widgets. Is this sample the same as the first? Is there any overlap at all?
10. Redo Exercise 9 using spreadsheet or statistical software.
11. Consider a study comparing the lifetimes (measured in terms of numbers of holes drilled before failure) of two different brands of 8-mm drills in drilling 1045 steel. Suppose that steel bars from three different heats (batches) of steel are available for use in the study, and it is possible that the different heats have differing physical properties. The lifetimes of a total of 15 drills of each brand will be measured, and each of the bars available is large enough to accommodate as much drilling as will be done in the entire study.
(a) Describe how the concept of control could be used to deal with the possibility that different heats might have different physical properties (such as hardnesses).
(b) Name one advantage and one drawback to controlling the heat.
(c) Describe how one might use the concept of blocking to deal with the possibility that different heats might have different physical properties.
3
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Elementary
Descriptive Statistics
Engineering data are always variable. Given precise enough measurement, even
supposedly constant process conditions produce differing responses. Therefore, it is
not individual data values that demand an engineer’s attention as much as the pattern
or distribution of those responses. The task of summarizing data is to describe their
important distributional characteristics. This chapter discusses simple methods that
are helpful in this task.
The chapter begins with some elementary graphical and tabular methods of
data summarization. The notion of quantiles of a distribution is then introduced and
used to make other useful graphical displays. Next, standard numerical summary
measures of location and spread for quantitative data are discussed. Finally comes a
brief look at some elementary methods for summarizing qualitative and count data.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
[Figure 3.1: Dot diagrams of thrust face runout (.0001 in.) for gears laid and gears hung.]

Table 3.1
Bullet Penetration Depths (mm)

[Figure 3.2: Dot diagrams of penetration depth (mm) for the two bullet weights.]
The dot diagrams show the penetrations of the 200 grain bullets to be both
larger and more consistent than those of the 230 grain bullets. (The students
had predicted larger penetrations for the lighter bullets on the basis of greater
muzzle velocity and smaller surface area on which friction can act. The different
consistencies of penetration were neither expected nor explained.)
Dot diagrams give the general feel of a data set but do not always allow the
recovery of exactly the values used to make them. A stem-and-leaf plot carries
much the same visual information as a dot diagram while preserving the original
values exactly. A stem-and-leaf plot is made by using the last few digits of each data
point to indicate where it falls.
0 | 5 8 9 9 9 9
1 | 0 0 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 5 5 5 5 6 7 7 8 9
2 | 7
3 |

0 | 5 8 9 9 9 9
1 | 0 0 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4
1 | 5 5 5 6 7 7 8 9
2 |
2 | 7
3 |
3 |
Example 1 Figure 3.3 gives two possible stem-and-leaf plots for the thrust face runouts
(continued ) of laid gears. In both, the first digit of each observation is represented by
the number to the left of the vertical line or “stem” of the diagram. The
numbers to the right of the vertical line make up the “leaves” and give the
second digits of the observed runouts. The second display shows somewhat
more detail than the first by providing “0–4” and “5–9” leaf positions for each
possible leading digit, instead of only a single “0–9” leaf for each leading
digit.
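Stem-and-leaf plots are also easy to produce programmatically. The following is a minimal sketch in Python, using hypothetical two-digit values in the spirit of the runout data (not the full data set):

```python
from collections import defaultdict

# Hypothetical two-digit runout values (.0001 in.)
data = [5, 8, 9, 9, 10, 11, 13, 14, 15, 17, 18, 21, 27]

leaves = defaultdict(list)
for x in sorted(data):
    leaves[x // 10].append(x % 10)   # stem = leading digit, leaf = last digit

for stem in range(min(leaves), max(leaves) + 1):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves[stem]))
```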
Example 2 Figure 3.4 gives two possible stem-and-leaf plots for the penetrations of 200 grain
(continued ) bullets in Table 3.1. On these, it was convenient to use two digits to the left of
the decimal point to make the stem and the two following the decimal point to
create the leaves. The first display was made by recording the leaf values directly
from the table (from left to right and top to bottom). The second display is a
better one, obtained by ordering the values that make up each leaf. Notice that
both plots give essentially the same visual impression as the second dot diagram
in Figure 3.2.
When comparing two data sets, a useful way to use the stem-and-leaf idea is to
make two plots back-to-back.
Example 1 Figure 3.5 gives back-to-back stem-and-leaf plots for the data of Table 1.1 (pg. 3).
(continued ) It shows clearly the differences in location and spread of the two data sets.
Example 1 (continued)
Table 3.2 gives one possible frequency table for the laid gear runouts. The relative frequency values are obtained by dividing the entries in the frequency column by 38, the number of data points. The entries in the cumulative relative frequency column are the ratios of the totals in a given class and all preceding classes to the total number of data points. (Except for round-off, this is the sum of the relative frequencies on the same row and above a given cumulative relative frequency.) The tally column gives the same kind of information about distributional shape that is provided by a dot diagram or a stem-and-leaf plot.

Table 3.2
Frequency Table for Laid Gear Thrust Face Runouts
(columns: Runout (.0001 in.), Tally, Frequency, Relative Frequency, Cumulative Relative Frequency)
Choosing intervals for a frequency table
The choice of intervals to use in making a frequency table is a matter of judgment. Two people will not necessarily choose the same set of intervals. However, there are a number of simple points to keep in mind when choosing them. First, in
order to avoid visual distortion when using the tally column of the table to gain an
impression of distributional shape, intervals of equal length should be employed.
Also, for aesthetic reasons, round numbers are preferable as interval endpoints. Since
there is usually aggregation (and therefore some loss of information) involved in the
reduction of raw data to tallies, the larger the number of intervals used, the more
detailed the information portrayed by the table. On the other hand, if a frequency
table is to have value as a summarization of data, it can’t be cluttered with too many
intervals.
After making a frequency table, it is common to use the organization provided
by the table to create a histogram. A (frequency or relative frequency) histogram is
a kind of bar chart used to portray the shape of a distribution of data points.
Example 2 Table 3.3 is a frequency table for the 200 grain bullet penetration depths, and
(continued ) Figure 3.6 is a translation of that table into the form of a histogram.
Table 3.3
Frequency Table for 200 Grain Penetration Depths
(columns: Penetration Depth (mm), Tally, Frequency, Relative Frequency, Cumulative Relative Frequency)

[Figure 3.6: Frequency histogram of the 200 grain penetration depths (mm).]
The vertical scale in Figure 3.6 is a frequency scale, and the histogram is a frequency
histogram. By changing to relative frequency on the vertical scale, one can produce
a relative frequency histogram. In making Figure 3.6, care was taken to use intervals of equal length, to begin the vertical frequency scale at zero, and to place bars with heights proportional to the frequencies over the intervals they represent on a true horizontal scale.
Following these guidelines results in a display in which equal enclosed areas cor-
respond to equal numbers of data points. Further, data point positioning is clearly
indicated by bar positioning on the horizontal axis. If these guidelines are not fol-
lowed, the resulting bar chart will in one way or another fail to faithfully represent
its data set.
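Software makes it easy to follow these guidelines. Here is a minimal sketch in Python using matplotlib; the depth values are hypothetical stand-ins, since the actual 200 grain data are not reproduced in this excerpt:

```python
import matplotlib.pyplot as plt

# Hypothetical penetration depths (mm), standing in for the 200 grain data
depths = [58.0, 58.7, 59.1, 60.3, 60.9, 61.3, 61.6, 62.3, 62.8, 63.1,
          63.6, 63.8, 64.1, 64.4, 64.7, 65.0, 66.4, 67.1, 68.5, 71.3]

# equal-length intervals with round endpoints, bars on a true horizontal scale
plt.hist(depths, bins=range(58, 74, 2), edgecolor="black")
plt.xlabel("Penetration depth (mm)")
plt.ylabel("Frequency")
plt.show()
```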
Figure 3.7 shows terminology for common distributional shapes encountered
when making and using dot diagrams, stem-and-leaf plots, and histograms.
The graphical and tabular devices discussed to this point are deceptively simple
methods. When routinely and intelligently used, they are powerful engineering
tools. The information on location, spread, and shape that is portrayed so clearly on
a histogram can give strong hints as to the functioning of the physical process that
is generating the data. It can also help suggest physical mechanisms at work in the
process.

Examples of engineering interpretations of distribution shape
For example, if data on the diameters of machined metal cylinders purchased from a vendor produce a histogram that is decidedly bimodal (or multimodal, having several clear humps), this suggests that the machining of the parts was done
on more than one machine, or by more than one operator, or at more than one
time. The practical consequence of such multichannel machining is a distribution
of diameters that has more variation than is typical of a production run of cylinders
from a single machine, operator, and setup. As another possibility, if the histogram
is truncated, this might suggest that the lot of cylinders has been 100% inspected
and sorted, removing all cylinders with excessive diameters. Or, upon marking
engineering specifications (requirements) for cylinder diameter on the histogram,
one may get a picture like that in Figure 3.8. It then becomes obvious that the lathe
turning the cylinders needs adjustment in order to increase the typical diameter.
But it also becomes clear that the basic process variation is so large that this
adjustment will fail to bring essentially all diameters into specifications. Armed
with this realization and a knowledge of the economic consequences of parts failing
to meet specifications, an engineer can intelligently weigh alternative courses of
action: sorting of all incoming parts, demanding that the vendor use more precise
equipment, seeking a new vendor, etc.
Investigating the shape of a data set is useful not only because it can lend insight
into physical mechanisms but also because shape can be important when determining
the appropriateness of methods of formal statistical inference like those discussed
later in this book. A methodology appropriate for one distributional shape may not
be appropriate for another.
[Figure 3.8: Histogram of cylinder diameters, with lower and upper engineering specifications marked.]
Table 3.4
Torques Required to Loosen Two Bolts on Face Plates (ft lb)
(columns: Plate, Bolt 3, Bolt 4, Plate, Bolt 3, Bolt 4; plates 1–17 at left, plates 18–34 at right)
1 16 16 18 15 14
2 15 16 19 17 17
3 15 17 20 14 16
4 15 16 21 17 18
5 20 20 22 19 16
6 19 16 23 19 18
7 19 20 24 19 20
8 17 19 25 15 15
9 15 15 26 12 15
10 11 15 27 18 20
11 17 19 28 13 18
12 18 17 29 14 18
13 18 14 30 18 18
14 15 15 31 18 14
15 18 17 32 15 13
16 15 17 33 16 17
17 18 20 34 16 16
[Figure 3.9: Scatterplot of bolt 4 torque versus bolt 3 torque (ft lb).]
otherwise, unwanted differential forces might act on the face plate. It is also quite
reasonable that bolt 3 and bolt 4 torques be related, since the bolts were tightened
by different heads of a single pneumatic wrench operating off a single source of
compressed air. It stands to reason that variations in air pressure might affect the
tightening of the bolts at the two positions similarly, producing the big-together,
small-together pattern seen in Figure 3.9.
The previous example illustrates the point that relationships seen on scatterplots
suggest a common physical cause for the behavior of variables and can help reveal
that cause.
In the most common version of the scatterplot, the variable on the horizontal
axis is a time variable. A scatterplot in which univariate data are plotted against time
order of observation is called a run chart or trend chart. Making run charts is one
of the most helpful statistical habits an engineer can develop. Seeing patterns on a
run chart leads to thinking about what process variables were changing in concert
with the pattern. This can help develop a keener understanding of how process
behavior is affected by those variables that change over time.
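A run chart takes only a couple of lines of plotting code. The following minimal sketch in Python uses the first eight joint diameters from the table that follows:

```python
import matplotlib.pyplot as plt

# First eight outer diameters (inches above nominal), in order of manufacture
diameters = [-0.005, 0.000, -0.010, -0.030, -0.010, -0.025, -0.030, -0.035]

# consecutive points connected with line segments, per standard practice
plt.plot(range(1, len(diameters) + 1), diameters, marker="o")
plt.xlabel("Time of manufacture (part number)")
plt.ylabel("Diameter (inches above nominal)")
plt.show()
```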
(columns: Joint, Diameter (inches above nominal), repeated; joints 1–15 at left, joints 16–30 at right)
1 −.005 16 .015
2 .000 17 .000
3 −.010 18 .000
4 −.030 19 −.015
5 −.010 20 −.015
6 −.025 21 −.005
7 −.030 22 −.015
8 −.035 23 −.015
9 −.025 24 −.010
10 −.025 25 −.015
11 −.025 26 −.035
12 −.035 27 −.025
13 −.040 28 −.020
14 −.035 29 −.025
15 −.035 30 −.015
Figure 3.10 Dot diagram and run chart of consecutive outer diameters
lathe. Figure 3.10 gives both a dot diagram and a run chart for the data in the
table. In keeping with standard practice, consecutive points on the run chart have
been connected with line segments.
Here the dot diagram is not particularly suggestive of the physical mecha-
nisms that generated the data. But the time information added in the run chart
is revealing. Moving along in time, the outer diameters tend to get smaller until
part 16, where there is a large jump, followed again by a pattern of diameter gen-
erally decreasing in time. In fact, upon checking production records, Williams
and Markowski found that the lathe had been turned off and allowed to cool down
between parts 15 and 16. The pattern seen on the run chart is likely related to the
behavior of the lathe’s hydraulics. When cold, the hydraulics probably don’t do
as good a job pushing the cutting tool into the part being turned as when they are
warm. Hence, the turned parts become smaller as the lathe warms up. In order
to get parts closer to nominal, the aimed-for diameter might be adjusted up by
about .020 in. and parts run only after warming up the lathe.
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. The following are percent yields from 40 runs of a chemical process, taken from J. S. Hunter's article "The Technology of Quality" (RCA Engineer, May/June 1985):

65.6, 65.6, 66.2, 66.8, 67.2, 67.5, 67.8, 67.8, 68.0, 68.0, 68.2, 68.3, 68.3, 68.4, 68.9, 69.0, 69.1, 69.2, 69.3, 69.5, 69.5, 69.5, 69.8, 69.9, 70.0, 70.2, 70.4, 70.6, 70.6, 70.7, 70.8, 70.9, 71.3, 71.7, 72.0, 72.6, 72.7, 72.8, 73.5, 74.2

Make a dot diagram, a stem-and-leaf plot, a frequency table, and a histogram of these data.
2. Make back-to-back stem-and-leaf plots for the two samples in Table 3.1.
3. Osborne, Bishop, and Klein collected manufacturing data on the torques required to loosen bolts holding an assembly on a piece of heavy machinery. The accompanying table shows part of their data concerning two particular bolts. The torques recorded (in ft lb) were taken from 15 different pieces of equipment as they were assembled.
(a) Make a scatterplot of these paired data. Are there any obvious patterns in the plot?
(b) A trick often employed in the analysis of paired data such as these is to reduce the pairs to differences by subtracting the values of one of the variables from the other. Compute differences (top bolt–bottom bolt) here. Then make and interpret a dot diagram for these values.

Piece | Top Bolt | Bottom Bolt
1 | 110 | 125
2 | 115 | 115
3 | 105 | 125
4 | 115 | 115
5 | 115 | 120
6 | 120 | 120
7 | 110 | 115
8 | 125 | 125
9 | 105 | 110
10 | 130 | 110
11 | 95 | 120
12 | 110 | 115
13 | 110 | 120
14 | 95 | 115
15 | 105 | 105
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
the test had worse scores, and roughly 20% had better scores. This concept is also
useful in the description of engineering data. However, because it is often more
convenient to work in terms of fractions between 0 and 1 rather than in percentages
between 0 and 100, slightly different terminology will be used here: “Quantiles,”
rather than percentiles, will be discussed. After the quantiles of a data set are carefully
defined, they are used to create a number of useful tools of descriptive statistics:
quantile plots, boxplots, Q-Q plots, and normal plots (a type of theoretical Q-Q
plot).
Definition 1
For a data set consisting of n values that when ordered are x1 ≤ x2 ≤ ··· ≤ xn,
1. if p = (i − .5)/n for a positive integer i ≤ n, the p quantile of the data set is

Q(p) = Q((i − .5)/n) = xi

(The ith smallest data point will be called the (i − .5)/n quantile.)
2. for any number p between .5/n and (n − .5)/n that is not of the form (i − .5)/n for an integer i, the p quantile of the data set will be obtained by linear interpolation between the two values of Q((i − .5)/n) with corresponding (i − .5)/n that bracket p.
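Definition 1 translates directly into a short function. Here is a minimal sketch in Python, a straightforward transcription that makes no attempt to guard against floating-point round-off at the plotting positions:

```python
def quantile(data, p):
    """Q(p) per Definition 1: Q((i - .5)/n) is the ith smallest point,
    with linear interpolation between bracketing plotting positions."""
    x = sorted(data)
    n = len(x)
    if not 0.5 / n <= p <= (n - 0.5) / n:
        raise ValueError("p must lie between .5/n and (n - .5)/n")
    i = p * n + 0.5          # solve p = (i - .5)/n for the fractional index i
    lo = int(i)              # index of the lower bracketing order statistic
    if lo == i or lo == n:   # p was exactly of the form (i - .5)/n
        return x[lo - 1]
    return x[lo - 1] + (i - lo) * (x[lo] - x[lo - 1])
```

For instance, for a sample of size 10, quantile(data, .25) returns the third smallest value, since .25 = (3 − .5)/10.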
Table 3.6
Ten Paper Towel Breaking Strengths

Table 3.7
Quantiles of the Paper Towel Breaking Strength Distribution
(columns: i, (i − .5)/10, and the ith smallest data point xi = Q((i − .5)/10))
Example 5 the smallest 3 data points and half of the fourth smallest are counted as lying to
(continued ) the left of the desired number, and the largest 6 data points and half of the seventh
largest are counted as lying to the right. Thus, the fourth smallest data point must
be the .35 quantile, as is shown in Table 3.7.
To illustrate convention (2) of Definition 1, consider finding the .5 and .93 quantiles of the strength distribution. Since .5 is (.5 − .45)/(.55 − .45) = .5 of the way from .45 to .55, linear interpolation gives

Q(.5) = .5Q(.45) + .5Q(.55) = 9,088 g

and since .93 is (.93 − .85)/(.95 − .85) = .8 of the way from .85 to .95,

Q(.93) = .2Q(.85) + .8Q(.95)
Definition 3 Q(.25) and Q(.75) are called the first (or lower) quartile and third (or
upper) quartile of a distribution, respectively.
Example 5 (continued)
Referring again to Table 3.7 and the value of Q(.5) previously computed, for the breaking strength distribution the first quartile is Q(.25) = 8,572 g, the median is Q(.5) = 9,088 g, and the third quartile is Q(.75) = 9,614 g.
Definition 4
A quantile plot is a plot of Q(p) versus p. For an ordered data set of size n containing values x1 ≤ x2 ≤ ··· ≤ xn, such a display is made by first plotting the points ((i − .5)/n, xi) and then connecting consecutive plotted points with straight-line segments.
It is because convention (2) in Definition 1 calls for linear interpolation that straight-
line segments enter the picture in making a quantile plot.
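In code, a quantile plot needs nothing beyond the plotting positions (i − .5)/n. A minimal sketch in Python follows; the strength values are hypothetical, chosen only to be consistent with the quantiles quoted in this section (Table 3.6's actual values are not reproduced in this excerpt):

```python
import matplotlib.pyplot as plt

# Hypothetical breaking strengths (g)
x = sorted([7583, 8103, 8572, 8779, 9045, 9131, 9422, 9614, 10150, 10688])
n = len(x)
p = [(i - 0.5) / n for i in range(1, n + 1)]

# connecting consecutive points performs the interpolation of Definition 1
plt.plot(p, x, marker="o")
plt.xlabel("p")
plt.ylabel("Q(p) (g)")
plt.show()
```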
[Figure 3.11: Quantile plot of the paper towel breaking strengths, Q(p) (g) versus p.]
A quantile plot allows the user to do some informal visual smoothing of the plot to
compensate for any jaggedness. (The tacit assumption is that the underlying data-
generating mechanism would itself produce smoother and smoother quantile plots
for larger and larger samples.)
3.2.2 Boxplots
Familiarity with the quantile idea is the principal prerequisite for making boxplots,
an alternative to dot diagrams or histograms. The boxplot carries somewhat less
information, but it has the advantage that many can be placed side-by-side on a
single page for comparison purposes.
There are several common conventions for making boxplots. The one that will be used here is illustrated in generic fashion in Figure 3.12. A box is made to extend from the first to the third quartiles and is divided by a line at the median. Then the interquartile range

Interquartile range
IQR = Q(.75) − Q(.25)

is calculated, and the smallest data point within 1.5IQR of Q(.25) and the largest data point within 1.5IQR of Q(.75) are determined. Lines called whiskers are made to extend out from the box to these values. Typically, most data points will be within the interval [Q(.25) − 1.5IQR, Q(.75) + 1.5IQR]. Any that are not then get plotted individually and are thereby identified as outlying or unusual.
Example 5 Consider making a boxplot for the paper towel breaking strength data. To begin,
(continued )
Q(.25) = 8,572 g
Q(.5) = 9,088 g
Q(.75) = 9,614 g
So

IQR = 9,614 g − 8,572 g = 1,042 g

and

1.5IQR = 1,563 g

Then

Q(.25) − 1.5IQR = 7,009 g

and

Q(.75) + 1.5IQR = 11,177 g
Since all the data points lie in the range 7,009 g to 11,177 g, the boxplot is as
shown in Figure 3.13.
[Figure 3.13: Boxplot of the paper towel breaking strengths (g); median 9,088, box from 8,572 to 9,614, whiskers to 7,583 and 10,688.]
A boxplot shows distributional location through the placement of the box and
whiskers along a number line. It shows distributional spread through the extent of
the box and the whiskers, with the box enclosing the middle 50% of the distribution.
Some elements of distributional shape are indicated by the symmetry (or lack
thereof) of the box and of the whiskers. And a gap between the end of a whisker
and a separately plotted point serves as a reminder that no data values fall in that
interval.
Two or more boxplots drawn to the same scale and side by side provide an
effective way of comparing samples.
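Plotting software produces such side-by-side displays directly. Here is a minimal sketch in Python; the sample values are hypothetical stand-ins in the spirit of the bullet study, and matplotlib computes quartiles by its own interpolation convention, which can differ slightly from Definition 1:

```python
import matplotlib.pyplot as plt

# Hypothetical penetration samples (mm)
grain_230 = [27.75, 37.35, 38.75, 40.50, 41.00, 42.55, 43.85, 47.30, 49.85, 51.25]
grain_200 = [58.00, 59.10, 60.25, 61.30, 62.30, 62.80, 63.55, 63.80, 64.35, 66.35]

# boxes from Q(.25) to Q(.75); whiskers follow the 1.5IQR convention above
plt.boxplot([grain_230, grain_200], labels=["230 grain", "200 grain"])
plt.ylabel("Penetration (mm)")
plt.show()
```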
Example 6 Similar calculations for the 200 grain bullet penetration depths yield
(continued )
Q(.25) = 60.25 mm
Q(.5) = 62.80 mm
Q(.75) = 64.35 mm
Q(.75) + 1.5IQR = 70.50 mm
Q(.25) − 1.5IQR = 54.10 mm
Table 3.8
Quantiles of the Bullet Penetration Depth Distributions
Figure 3.14 then shows boxplots placed side by side on the same scale. The
plots show the larger and more consistent penetration depths of the 200 grain
bullets. They also show the existence of one particularly extreme data point in
the 200 grain data set. Further, the relative lengths of the whiskers hint at some
skewness (recall the terminology introduced with Figure 3.7) in the data. And
all of this is done in a way that is quite uncluttered and compact. Many more of
[Figure 3.14: Side-by-side boxplots of penetration depths (mm) for 200 grain and 230 grain bullets.]
these boxes could be added to Figure 3.14 (to compare other bullet types) without
visual overload.
Then, recognizing ordered data values as quantiles and letting Q1 and Q2 stand for the quantile functions of the two respective data sets, it is clear from display (3.1) that

Q2(p) = 2Q1(p) + 1    (3.2)
Table 3.9
Two Small Artificial Data Sets
[Figure 3.15: Dot diagrams for two small data sets.]

[Figure 3.16: Q-Q plot for the data of Table 3.9.]
That is, the two data sets have quantile functions that are linearly related. Looking at either display (3.1) or (3.2), it is obvious that a plot of the points (Q1((i − .5)/5), Q2((i − .5)/5)) would be exactly linear.
Definition 5
A Q-Q plot for two data sets with respective quantile functions Q1 and Q2 is a plot of ordered pairs (Q1(p), Q2(p)) for appropriate values of p. When two data sets of size n are involved, the values of p used to make the plot will be (i − .5)/n for i = 1, 2, . . . , n. When two data sets of unequal sizes are involved, the values of p used to make the plot will be (i − .5)/n for i = 1, 2, . . . , n, where n is the size of the smaller set.
Steps in making a Q-Q plot
To make a Q-Q plot for two data sets of the same size,
1. order each from the smallest observation to the largest,
2. pair off corresponding values in the two data sets, and
3. plot ordered pairs, with the horizontal coordinates coming from the first data set and the vertical ones from the second.
When data sets of unequal size are involved, the ordered values from the smaller
data set must be paired with quantiles of the larger data set obtained by interpolation.
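For equal-size data sets, the three steps above take only a few lines of code. Here is a minimal sketch in Python using hypothetical samples (the unequal-size case would additionally require interpolating quantiles of the larger set, per Definition 1):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                 # hypothetical equal-size samples
sample1 = np.sort(rng.normal(40, 6, size=15))  # step 1: order each sample
sample2 = np.sort(rng.normal(62, 3, size=15))

# steps 2 and 3: pair off corresponding ordered values and plot the pairs
plt.plot(sample1, sample2, "o")
plt.xlabel("Sample 1 quantile")
plt.ylabel("Sample 2 quantile")
plt.show()
```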
A Q-Q plot that is reasonably linear indicates the two distributions involved have
similar shapes. When there are significant departures from linearity, the character
of those departures reveals the ways in which the shapes differ.
Example 6 Returning again to the bullet penetration depths, Table 3.8 (page 84) gives the
(continued ) raw material for making a Q-Q plot. The depths on each row of that table need
only be paired and plotted in order to make the plot given in Figure 3.17.
The scatterplot in Figure 3.17 is not terribly linear when looked at as a whole.
However, the points corresponding to the 2nd through 13th smallest values in
each data set do look fairly linear, indicating that (except for the extreme lower
ends) the lower ends of the two distributions have similar shapes.
The horizontal jog the plot takes between the 13th and 14th plotted points
indicates that the gap between 43.85 mm and 47.30 mm (for the 230 grain data)
is out of proportion to the gap between 63.55 and 63.80 mm (for the 200 grain
data). This hints that there was some kind of basic physical difference in the
mechanisms that produced the smaller and larger 230 grain penetration depths.
Once this kind of indication is discovered, it is a task for ballistics experts or
materials people to explain the phenomenon.
Because of the marked departure from linearity produced by the 1st plotted
point (27.75, 58.00), there is also a drastic difference in the shapes of the extreme
lower ends of the two distributions. In order to move that point back on line with
the rest of the plotted points, it would need to be moved to the right or down
(i.e., increase the smallest 230 grain observation or decrease the smallest 200
grain observation). That is, relative to the 200 grain distribution, the 230 grain
distribution is long-tailed to the low side. (Or to put it differently, relative to
the 230 grain distribution, the 200 grain distribution is short-tailed to the low
side.) Note that the difference in shapes was already evident in the boxplot in
Figure 3.14. Again, it would remain for a specialist to explain this difference in
distributional shapes.
[Figure 3.17: Q-Q plot of 200 grain penetration (mm) versus 230 grain penetration (mm).]
The Q-Q plotting idea is useful when applied to two data sets, and it is easiest to
explain the notion in such an “empirical versus empirical” context. But its greatest
usefulness is really when it is applied to one quantile function that represents a data
set and a second that represents a theoretical distribution.
Definition 6
A theoretical Q-Q plot or probability plot for a data set of size n and a theoretical distribution, with respective quantile functions Q1 and Q2, is a plot of ordered pairs (Q1(p), Q2(p)) for appropriate values of p. In this text, the values of p of the form (i − .5)/n for i = 1, 2, . . . , n will be used.
Recognizing Q1((i − .5)/n) as the ith smallest data point, one sees that a theoretical Q-Q plot is a plot of points with horizontal plotting positions equal to observed data and vertical plotting positions equal to quantiles of the theoretical distribution. That is, with ordered data x1 ≤ x2 ≤ ··· ≤ xn, the points

Ordered pairs making a probability plot
(xi, Q2((i − .5)/n))

are plotted. Such a plot allows one to ask, "Does the data set have a shape similar to the theoretical distribution?"
Normal plotting
The most famous version of the theoretical Q-Q plot occurs when quantiles for the standard normal or Gaussian distribution are employed. This is the familiar bell-shaped distribution. Table 3.10 gives some quantiles of this distribution. In order to find Q(p) for p equal to one of the values .01, .02, . . . , .98, .99, locate the entry in the row labeled by the first digit after the decimal place and in the column labeled by the second digit after the decimal place. (For example, Q(.37) = −.33.) A simple numerical approximation to the values given in Table 3.10 adequate for most plotting purposes is

Q(p) ≈ 4.9[p^.14 − (1 − p)^.14]
The origin of Table 3.10 is not obvious at this point. It will be explained in
Section 5.2, but for the time being consider the following crude argument to the
effect that the quantiles in the table correspond to a bell-shaped distribution. Imagine
that each entry in Table 3.10 corresponds to a data point in a set of size n = 99. A
possible frequency table for those 99 data points is given as Table 3.11. The tally
column in Table 3.11 shows clearly the bell shape.
The standard normal quantiles can be used to make a theoretical Q-Q plot as
a way of assessing how bell-shaped a data set looks. The resulting plot is called a
normal (probability) plot.
Table 3.10
Standard Normal Quantiles
     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
.0          −2.33  −2.05  −1.88  −1.75  −1.65  −1.55  −1.48  −1.41  −1.34
.1 −1.28 −1.23 −1.18 −1.13 −1.08 −1.04 −.99 −.95 −.92 −.88
.2 −.84 −.81 −.77 −.74 −.71 −.67 −.64 −.61 −.58 −.55
.3 −.52 −.50 −.47 −.44 −.41 −.39 −.36 −.33 −.31 −.28
.4 −.25 −.23 −.20 −.18 −.15 −.13 −.10 −.08 −.05 −.03
.5 0.00 .03 .05 .08 .10 .13 .15 .18 .20 .23
.6 .25 .28 .31 .33 .36 .39 .41 .44 .47 .50
.7 .52 .55 .58 .61 .64 .67 .71 .74 .77 .81
.8 .84 .88 .92 .95 .99 1.04 1.08 1.13 1.18 1.23
.9 1.28 1.34 1.41 1.48 1.55 1.65 1.75 1.88 2.05 2.33
Table 3.11
A Frequency Table for the Standard Normal Quantiles

Interval            Frequency
−2.80 to −2.30      1
−2.29 to −1.79 2
−1.78 to −1.28 7
−1.27 to −.77 12
−.76 to −.26 17
−.25 to .25 21
.26 to .76 17
.77 to 1.27 12
1.28 to 1.78 7
1.79 to 2.29 2
2.30 to 2.80 1
Example 5 (continued)
Consider again the paper towel strength testing scenario and now the issue of
how bell-shaped the data set in Table 3.6 (page 79) is. Table 3.12 was made using
Tables 3.7 (page 79) and 3.10; it gives the information needed to produce the
theoretical Q-Q plot in Figure 3.18.
Considering the small size of the data set involved, the plot in Figure 3.18
is fairly linear, and so the data set is reasonably bell-shaped. As a practical
consequence of this judgment, it is then possible to use the normal probability
models discussed in Section 5.2 to describe breaking strength. These could be
employed to make breaking strength predictions, and methods of formal statistical
inference based on them could be used in the analysis of breaking strength data.
[Figure 3.18: Theoretical Q-Q (normal) plot for the paper towel breaking strengths, standard normal quantile versus observed strength]
Special graph paper, called normal probability paper (or just probability
paper), is available as an alternative way of making normal plots. Instead of plotting
points on regular graph paper using vertical plotting positions taken from Table 3.10,
points are plotted on probability paper using vertical plotting positions of the form
(i − .5)/n. Figure 3.19 is a normal plot of the breaking strength data from Example 5 made
on probability paper. Observe that this is virtually identical to the plot in Figure 3.18.
Normal plots are not the only kind of theoretical Q-Q plots useful to engineers.
Many other types of theoretical distributions are of engineering importance, and
each can be used to make theoretical Q-Q plots. This point is discussed in more
detail in Section 5.3, but the introduction of theoretical Q-Q plotting here makes it
possible to emphasize the relationship between probability plotting and (empirical)
Q-Q plotting.

[Figure 3.19: Normal plot for the paper towel strengths (made on probability paper,
used with permission of the Keuffel and Esser Company)]
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. The following are data (from Introduction to Contemporary Statistical Methods by L. H. Koopmans) on the impact strength of sheets of insulating material cut in two different ways. (The values are in ft lb.)

Lengthwise Cuts: 1.15, .84, .88, .91, .86, .88, .92, .87, .93, .95
Crosswise Cuts: .89, .69, .46, .85, .73, .67, .78, .77, .80, .79

(a) Make quantile plots for these two samples. Find the medians, the quartiles, and the .37 quantiles for the two data sets.
(b) Draw (to scale) carefully labeled side-by-side boxplots for comparing the two cutting methods. Discuss what these show about the two methods.
(c) Make and discuss the appearance of a Q-Q plot for comparing the shapes of these two data sets.

2. Make a Q-Q plot for the two small samples in Table 3.13 in Section 3.3.

3. Make and interpret a normal plot for the yield data of Exercise 1 of Section 3.1.

4. Explain the usefulness of theoretical Q-Q plotting.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

3.3 Standard Numerical Summary Measures
The word average, as used in colloquial speech, has several potential technical
meanings. One is the median, Q(.5), which was introduced in the last section. The
median divides a data set in half. Roughly half of the area enclosed by the bars of a
well-made histogram will lie to either side of the median. As a measure of center,
it is completely insensitive to the effects of a few extreme or outlying observations.
For example, the small set of data
2, 3, 6, 9, 10
has median 6, and this remains true even if the value 10 is replaced by 10,000,000
and/or the value 2 is replaced by −200,000.
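A two-line check of this insensitivity (an added sketch, not from the original text) is:

```python
# The median of 2, 3, 6, 9, 10 is 6, and stays 6 under a wild outlier.
from statistics import mean, median

data = [2, 3, 6, 9, 10]
print(median(data), mean(data))      # 6 and 6.0
data[-1] = 10_000_000                # replace 10 by 10,000,000
print(median(data), mean(data))      # median still 6; mean is now 2000004.0
```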
The previous section used the median as a center value in the making of boxplots.
But the median is not the technical meaning most often attached to the notion of
average in statistical analyses. Instead, it is more common to employ the (arithmetic)
mean.
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ
The mean is sometimes called the first moment or center of mass of a distribution,
drawing on an analogy to mechanics. Think of placing a unit mass along the number
line at the location of each value in a data set—the balance point of the mass
distribution is at x̄.
Table 3.13
Percentage Wastes for Two Suppliers

Supplier 1: .37, .52, .65, .92, 2.89, 3.62
Supplier 2: .89, .99, 1.45, 1.47, 1.58, 2.27, 2.63, 6.54

For the supplier 1 data,

x̄ = (1/6)(.37 + .52 + .65 + .92 + 2.89 + 3.62) = 1.495% waste

and for the supplier 2 data,

x̄ = (1/8)(.89 + .99 + 1.45 + 1.47 + 1.58 + 2.27 + 2.63 + 6.54) = 2.228% waste
Figure 3.20 shows dot diagrams with the medians and means marked. Notice
that a comparison of either medians or means for the two suppliers shows the
supplier 2 waste to be larger than the supplier 1 waste. But there is a substan-
tial difference between the median and mean values for a given supplier. In
both cases, the mean is quite a bit larger than the corresponding median. This
reflects the right-skewed nature of both data sets. In both cases, the center of
mass of the distribution is pulled strongly to the right by a few extremely large
values.
[Figure 3.20: Dot diagrams of the two waste data sets (percent waste from 0 to 6), marked with Q(.5) = .785 and x̄ = 1.495 for supplier 1 and Q(.5) = 1.525 and x̄ = 2.228 for supplier 2]
Example 7 shows clearly that, in contrast to the median, the mean is a mea-
sure of center that can be strongly affected by a few extreme data values. People
sometimes say that because of this, one or the other of the two measures is “better.”
Such statements lack sense. Neither is better; they are simply measures with dif-
ferent properties. And the difference is one that intelligent consumers of statistical
information do well to keep in mind. The “average” income of employees at a com-
pany paying nine workers each $10,000/year and a president $110,000/year can be
described as $10,000/year or $20,000/year, depending upon whether the median or
mean is being used.
R = xₙ − x₁
Notice the word usage here. The word range could be used as a verb to say, “The
data range from 3 to 21.” But to use the word as a noun, one says, “The range is
(21 − 3) = 18.” Since the range depends only on the values of the smallest and
largest points in a data set, it is necessarily highly sensitive to extreme (or outlying)
values. Because it is easily calculated, it has enjoyed long-standing popularity in
industrial settings, particularly as a tool in statistical quality control.
However, most methods of formal statistical inference are based on another mea-
sure of distributional spread. A notion of “mean squared deviation” or “root mean
squared deviation” is employed to produce measures that are called the variance
and the standard deviation, respectively.
s² = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)²

and the sample standard deviation s is the nonnegative square root of s².
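For readers who wish to verify such arithmetic by machine, here is a minimal sketch (not part of the original text) using Python's statistics module, which employs the n − 1 divisor shown above; it matches the hand computations of Example 7 below:

```python
# s^2 and s for the supplier 1 waste data of Table 3.13.
from statistics import variance, stdev   # both use the n - 1 divisor

supplier1 = [0.37, 0.52, 0.65, 0.92, 2.89, 3.62]
print(round(variance(supplier1), 3))     # 1.945 (% waste)^2
print(round(stdev(supplier1), 3))        # 1.394 % waste
```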
Example 7 (continued)
The spreads in the two sets of percentage wastes recorded in Table 3.13 can be
expressed in any of the preceding terms. For the supplier 1 data,

Q(.25) = .52  and  Q(.75) = 2.89

and so

IQR = 2.89 − .52 = 2.37% waste

Also,

R = 3.62 − .37 = 3.25% waste

Further,

s² = (1/(6 − 1))((.37 − 1.495)² + (.52 − 1.495)² + (.65 − 1.495)² + (.92 − 1.495)² + (2.89 − 1.495)² + (3.62 − 1.495)²) = 1.945 (% waste)²

so that

s = √1.945 = 1.394% waste
For the supplier 2 data, Q(.25) = 1.22 and Q(.75) = 2.45, so that

IQR = 2.45 − 1.22 = 1.23% waste

and

R = 6.54 − .89 = 5.65% waste

Further,

s² = (1/(8 − 1))((.89 − 2.228)² + (.99 − 2.228)² + (1.45 − 2.228)² + (1.47 − 2.228)² + (1.58 − 2.228)² + (2.27 − 2.228)² + (2.63 − 2.228)² + (6.54 − 2.228)²) = 3.383 (% waste)²

so

s = 1.839% waste
Supplier 2 has the smaller IQR but the larger R and s. This is consistent with
Figure 3.20. The central portion of the supplier 2 distribution is tightly packed.
But the single extreme data point makes the overall variability larger for the
second supplier than for the first.
Proposition 1 (Chebyschev's Theorem) For any data set and any number k larger than 1, a fraction of at least 1 − 1/k² of the data are within ks of x̄.
This little theorem says, for example, that at least 3/4 of a data set is within 2 standard
deviations of its mean. And at least 8/9 of a data set is within 3 standard deviations of
its mean. So the theorem promises that if a data set has a small standard deviation,
it will be tightly packed about its mean.
Example 7 (continued)
Returning to the waste data, consider illustrating the meaning of Chebyschev's
theorem with the supplier 1 values. For example, taking k = 2, at least 3/4 = 1 − (1/2)²
of the 6 data points (i.e., at least 4.5 of them) must be within 2 standard
deviations of x̄. In fact,

x̄ − 2s = 1.495 − 2(1.394) = −1.293% waste  and  x̄ + 2s = 1.495 + 2(1.394) = 4.283% waste

so simple counting shows that all (a fraction of 1.0) of the data are between these
two values.
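A direct machine check of this count (an added sketch, not the authors' code) is:

```python
# Checking Chebyschev's promise for k = 2 on the supplier 1 data.
from statistics import mean, stdev

data = [0.37, 0.52, 0.65, 0.92, 2.89, 3.62]
xbar, s, k = mean(data), stdev(data), 2
inside = [v for v in data if xbar - k * s < v < xbar + k * s]
print(len(inside) / len(data))   # 1.0, comfortably above the promised 3/4
```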
Definition 10 Numerical summarizations of sample data are called (sample) statistics. Nu-
merical summarizations of population and theoretical distributions are called
(population or model) parameters. Typically, Roman letters are used as sym-
bols for statistics, and Greek letters are used to stand for parameters.
Population mean

μ = (1/N) Σᵢ₌₁ᴺ xᵢ    (3.4)
Comparing this expression to the one in Definition 7, not only is a different symbol
used for the mean but also N is used in place of n. It is standard to denote a
population size as N and a sample size as n. Chapter 5 gives a definition for the
mean of a theoretical distribution. But it is worth saying now that the symbol µ will
be used in that context as well as in the context of equation (3.4).
As another example of the usage suggested by Definition 10, consider the vari-
ance and standard deviation. Definition 9 refers specifically to the sample variance
and standard deviation. If a data set represents an entire population, then it is com-
mon to use the lowercase Greek sigma squared (σ 2 ) to stand for the population
variance and to define
Population variance

σ² = (1/N) Σᵢ₌₁ᴺ (xᵢ − μ)²    (3.5)
The nonnegative square root of σ 2 is then called the population standard devia-
tion, σ . (The division in equation (3.5) is by N , and not the N − 1 that might be
expected on the basis of Definition 9. There are reasons for this change, but they are
not accessible at this point.) Chapter 5 defines a variance and standard deviation for
theoretical distributions, and the symbols σ 2 and σ will be used there as well as in
the context of equation (3.5).
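The two divisors are easy to contrast in a small sketch (an addition to the text), using Python's statistics module:

```python
# Contrasting the n - 1 divisor of the sample variance with the N divisor
# of equation (3.5), treating the same six numbers both ways.
from statistics import variance, pvariance

values = [0.37, 0.52, 0.65, 0.92, 2.89, 3.62]   # the supplier 1 wastes again
print(variance(values))     # s^2 with divisor n - 1 = 5 (about 1.945)
print(pvariance(values))    # sigma^2-style value with divisor N = 6 (about 1.620)
```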
On one point, this text will deviate from the Roman/Greek symbolism conven-
tion laid out in Definition 10: the notation for quantiles. Q( p) will stand for the pth
quantile of a distribution, whether it is from a sample, a population, or a theoretical
model.
Table 3.14
Means and Ranges for a Critical Dimension on Samples of n = 5 Parts
Example 8 (continued)

[Figure: plots of the sample means x̄ and the sample ranges R against sample number for the samples of Table 3.14]
shift were not really systematically any different from the others. Instead, the
person making the measurements for samples 9 through 15 used the gauge in a
fundamentally different way than other employees. The pattern in the x̄ values
was caused by this change in measurement technique.
Terminology and causes for patterns on plots against time

Patterns revealed in the plotting of sample statistics against time ought to alert
an engineer to look for a physical cause and (typically) a cure. Systematic variations
or cycles in a plot of means can often be related to process variables that
come and go on a more or less regular basis. Examples include seasonal or daily
variables like ambient temperature or those caused by rotation of gauges or fixtures.
Instability or variation in excess of that related to basic equipment precision can
sometimes be traced to mixed lots of raw material or overadjustment of equipment
by operators. Changes in level of a process mean can originate in the introduction
of new machinery, raw materials, or employee training and (for example) tool wear.
Mixtures of several patterns of variation on a single plot of some summary statistic
against time can sometimes (as in Example 8) be traced to changes in measurement
calibration. They are also sometimes produced by consistent differences in machines
or streams of raw material.
Plots against process variables

Plots of summary statistics against time are not the only useful ones. Plots
against process variables can also be quite informative.
Table 3.15
Mean Joint Strengths for Nine Wood/Glue Combinations
Wood    Glue    Mean Joint Shear Strength, x̄ (lb)
pine white 131.7
pine carpenter’s 192.7
pine cascamite 201.3
fir white 92.0
fir carpenter’s 146.3
fir cascamite 156.7
oak white 257.7
oak carpenter’s 234.3
oak cascamite 177.7
Example 9 (continued)

[Figure 3.22: Plot of mean joint shear strength (lb) against glue type for pine, fir, and oak joints]
From the plot, it is obvious that the gluing properties of pine and fir are
quite similar, with pine joints averaging around 40–45 lb stronger. For these
two soft woods, cascamite appears slightly better than carpenter’s glue, both of
which make much better joints than white glue. The gluing properties of oak
(a hardwood) are quite different from those of pine and fir. In fact, the glues
perform in exactly the opposite ordering for the strength of oak joints. All of this
is displayed quite clearly by the simple plot in Figure 3.22.
The two previous examples have illustrated the usefulness of plotting sample
statistics against time and against levels of an experimental variable. Other possi-
bilities in specific engineering situations can potentially help the working engineer
understand and manipulate the systems on which he or she works.
figures on the printout do not match exactly those found earlier. MINITAB simply
uses slightly different conventions for those quantities than the ones introduced in
Section 3.2.
High-quality statistical packages like MINITAB (and JMP, SAS, SPSS, SYS-
TAT, SPLUS, etc.) are widely available. One of them should be on the electronic
desktop of every working engineer. Unfortunately, this is not always the case, and
engineers often assume that standard spreadsheet software (perhaps augmented with
third party plug-ins) provides a workable substitute. Often this is true, but sometimes
it is not.
The primary potential problem with using a spreadsheet as a substitute for sta-
tistical software concerns numerical accuracy. Spreadsheets can and do on occasion
return catastrophically wrong values for even simple statistics. Established vendors
of statistical software have many years of experience dealing with subtle numerical
issues that arise in the computation of even simple summaries of even small data
sets. Most vendors of spreadsheet software seem unaware of or indifferent to these
matters. For example, consider the very small data set
0, 1, 2
The sample variance of these data is easily seen to be 1.0, and essentially any
statistical package or spreadsheet will reliably return this value. However, suppose
100,000,000 is added to each of these n = 3 values, producing the data set

100,000,000, 100,000,001, 100,000,002

The actual sample variance is unchanged, and high-quality statistical software will
reliably return the value 1.0. However, as of late 1999, the current version of the
leading spreadsheet program returned the value 0 for this second sample variance.
This is a badly wrong answer to an apparently very simple problem.
So at least until vendors of spreadsheet software choose to integrate an es-
tablished statistical package into their products, we advise extreme caution in the
use of spreadsheets to do statistical computations. A good source of up-to-date
information on this issue is the AP Statistics electronic bulletin board found at
http://forum.swarthmore.edu/epigone/apstat-l.
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Calculate and compare the means, medians, ranges, interquartile ranges, and standard deviations of the two data sets introduced in Exercise 1 of Section 3.2. Discuss the interpretation of these values in the context of comparing the two cutting methods.

2. Are the numerical values you produced in Exercise 1 above most naturally thought of as statistics or as parameters? Explain.

3. Use a statistical package to compute basic summary statistics for the two data sets introduced in Exercise 1 of Section 3.2 and thereby check your answers to Exercise 1 here.

4. Add 1.3 to each of the lengthwise cut impact strengths referred to in Exercise 1 and then recompute the values of the mean, median, range, interquartile range, and standard deviation. How do these compare with the values obtained earlier? Repeat this exercise after multiplying each lengthwise cut impact strength by 2 (instead of adding 1.3).
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

3.4 Descriptive Statistics for Qualitative and Count Data
Sample fraction of items with a characteristic

p̂ = (the number of items in the sample with the characteristic) / n    (3.6)
will be used. A given sample can produce many such values of “ p hat” if either a
single characteristic has many possible categories or many different characteristics
are being monitored simultaneously.
Table 3.16 gives counts of sampled connectors falling into the first four
categories (the four defect categories) over the 30-day period. Then, using the
fact that 30 × 100 = 3,000 connectors were inspected over this period,
p̂ A = 3/3000 = .0010
p̂ B = 0/3000 = .0000
p̂ C = 11/3000 = .0037
p̂ D = 1/3000 = .0003
Table 3.16
Counts of Connectors Classified into Four Defect
Categories
Table 3.17
Counts and Fractions of Tools with Various
Problems
Sample mean occurrences per unit or item

û = (the total number of occurrences) / (the total number of inspection units or sampled items)    (3.7)
is used. û is really closer in meaning to x̄ than to p̂, even though it can turn out to be
a number between 0 and 1 and is sometimes expressed as a percentage and called a
rate.
Although the counts totaled in the numerator of expression (3.7) must all be
integers, the values totaled to create the denominator need not be. For instance,
suppose vinyl floor tiles are being inspected for serious blemishes. If on one occasion
inspection of 1 box yields a total of 2 blemishes, on another occasion .5 box yields
0 blemishes, and on still another occasion 2.5 boxes yield a total of 1 blemish, then
û = (2 + 0 + 1)/(1 + .5 + 2.5) = .75 blemishes/box
Example 10 (continued)
It was possible for a single cable connector to have more than one defect of a
given severity and, in fact, defects of different severities. For example, Delva,
Lynch, and Stephany’s records indicate that in the 3,000 connectors inspected,
1 connector had exactly 2 moderately serious defects (along with a single very
serious defect), 11 connectors had exactly 1 moderately serious defect (and no
others), and 2,988 had no moderately serious defects. So the observed rate of
moderately serious defects could be reported as
û = (2 + 11)/(1 + 11 + 2988) = .0043 moderately serious defects/connector

This is an occurrence rate for moderately serious defects (û), but not a fraction
of connectors having moderately serious defects (p̂).
The difference between the statistics p̂ and û may seem trivial. But it is a point
that constantly causes students confusion. Methods of formal statistical inference
based on p̂ are not the same as those based on û. The distinction between the two
kinds of rates must be kept in mind if those methods are to be applied appropriately.
To carry this warning a step further, note that not every quantity called a
percentage is even of the form p̂ or û. In a laboratory analysis, a specimen may be
declared to be “30% carbon.” The 30% cannot be thought of as having the form of p̂
in equation (3.6) or û in equation (3.7). It is really a single continuous measurement,
not a summary statistic. Statistical methods for p̂ or û have nothing to say about
such rates.
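The distinction is easy to see in a small sketch (an addition, not from the original text), built from the connector records quoted above:

```python
# p-hat versus u-hat for moderately serious connector defects:
# 1 connector with 2 such defects, 11 with exactly 1, and 2,988 with none.
defect_counts = [2] * 1 + [1] * 11 + [0] * 2988

p_hat = sum(1 for c in defect_counts if c > 0) / len(defect_counts)
u_hat = sum(defect_counts) / len(defect_counts)

print(round(p_hat, 4))   # 0.004  -- a fraction of connectors affected
print(round(u_hat, 4))   # 0.0043 -- an occurrence rate per connector
```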
3.4.2 Bar Charts and Plots for Qualitative and Count Data
Often, a study will produce several values of p̂ or û that need to be compared. Bar
charts and simple bivariate plots can be a great aid in summarizing these results.
Example 10 (continued)
Figure 3.23 is a bar chart of the fractions of connectors in the categories A through
D. It shows clearly that most connectors with defects fall into category C, having
moderately serious defects but no serious or very serious defects. This bar chart
is a presentation of the behavior of a single categorical variable.
[Figure 3.23: Bar chart of the fractions of connectors in defect categories A through D]
Example 11 (continued)
Figure 3.24 is a bar chart of the information on tool problems in Table 3.17. It
shows leaks to be the most frequently occurring problems on this production run.
[Figure 3.24: Bar chart of problem rates for 11 tool problem categories (Type 1, 2, and 3 leaks; missing parts 1, 2, and 3; bad parts 4, 5, and 6; wrong parts 7 and 8)]
Figures 3.23 and 3.24 are both bar charts, but they differ considerably. The
first concerns the behavior of a single (ordered) categorical variable—namely, Con-
nector Class. The second concerns the behavior of 11 different present–not present
categorical variables, like Type 1 Leak, Missing Part 3, etc. There may be some
significance to the shape of Figure 3.23, since categories A through D are arranged
in decreasing order of defect severity, and this order was used in the making of
the figure. But the shape of Figure 3.24 is essentially arbitrary, since the particular
ordering of the tool problem categories used to make the figure is arbitrary. Other
equally sensible orderings would give quite different shapes.
The device of segmenting bars on a bar chart and letting the segments stand
for different categories of a single qualitative variable can be helpful, particularly
where several different samples are to be compared.
Table 3.18
Percents Scrap and Rework in a Turning Operation
Job Number   Percent Scrap   Percent Rework
 1           2               25
 2           3               11
 3           0                5
 4           0                0
 5           0               20
 6           2               23
 7           0                6
 8           0                5
 9           2                8
10           3               18
11           0                3
12           1                5
13           0                0
14           0                0
15           0                3
16           0                2
17           0                2
18           1                5
Example 12 (continued)

[Figure: segmented bar chart of percent of production scrapped and reworked, plotted against job number for the 18 jobs of Table 3.18]
Table 3.19
Defects Per Truck on 26 Production Days
[Figure: plot of defects per truck against production date, 12/2 through 1/10]
[Figure: plot against shot size (small and large) for 20% reground powder]
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. From your field, give an example of a variable that is a rate (a) of the form p̂, (b) of the form û, and (c) of neither form.

2. Because gauging is easier, it is sometimes tempting to collect qualitative data related to measurements rather than the measurements themselves. For example, in the context of Example 1 in Chapter 1, if gears with runouts exceeding 15 were considered to be nonconforming, it would be possible to derive fractions nonconforming, p̂, from simple "go–no go" checking of gears. For the two sets of gears represented in Table 1.1, what would have been the sample fractions nonconforming p̂? Give a practical reason why having the values in Table 1.1 might be preferable to knowing only the corresponding p̂ values.

3. Consider the measurement of the percentage copper in brass specimens. The resulting data will be a kind of rate data. Are the rates that will be obtained of the type p̂, of the type û, or of neither type? Explain.
Chapter 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. The accompanying values are gains measured on 120 amplifiers designed to produce a 10 dB gain. These data were originally from the Quality Improvement Tools workbook set (published by the Juran Institute). They were then used as an example in the article "The Tools of Quality" (Quality Progress, September 1990).

8.1, 10.4, 8.8, 9.7, 7.8, 9.9, 11.7, 8.0, 9.3, 9.0, 8.2,
8.9, 10.1, 9.4, 9.2, 7.9, 9.5, 10.9, 7.8, 8.3, 9.1, 8.4,
9.6, 11.1, 7.9, 8.5, 8.7, 7.8, 10.5, 8.5, 11.5, 8.0, 7.9,
8.3, 8.7, 10.0, 9.4, 9.0, 9.2, 10.7, 9.3, 9.7, 8.7, 8.2,
8.9, 8.6, 9.5, 9.4, 8.8, 8.3, 8.4, 9.1, 10.1, 7.8, 8.1,
8.8, 8.0, 9.2, 8.4, 7.8, 7.9, 8.5, 9.2, 8.7, 10.2, 7.9,
9.8, 8.3, 9.0, 9.6, 9.9, 10.6, 8.6, 9.4, 8.8, 8.2, 10.5,
9.7, 9.1, 8.0, 8.7, 9.8, 8.5, 8.9, 9.1, 8.4, 8.1, 9.5,
8.7, 9.3, 8.1, 10.1, 9.6, 8.3, 8.0, 9.8, 9.0, 8.9, 8.1,
9.7, 8.5, 8.2, 9.0, 10.2, 9.5, 8.3, 8.9, 9.1, 10.3, 8.4,
8.6, 9.2, 8.5, 9.6, 9.0, 10.7, 8.6, 10.0, 8.8, 8.6

(a) Make a stem-and-leaf plot and a boxplot for these data. How would you describe the shape of this data set? Does the shape of your stem-and-leaf plot (or a corresponding histogram) give you any clue how a high fraction within specifications was achieved?
(b) Make a normal plot for these data and interpret its shape. (Standard normal quantiles for p = .0042 and p = .9958 are approximately −2.64 and 2.64, respectively.)
(c) Although the nominal gain for these amplifiers was to be 10 dB, the design allowed gains from 7.75 dB to 12.2 dB to be considered acceptable. About what fraction, p, of such amplifiers do you expect to meet these engineering specifications?

2. The article "The Lognormal Distribution for Modeling Quality Data When the Mean is Near Zero" by S. Albin (Journal of Quality Technology, April 1990) described the operation of a Rutgers University plastics recycling pilot plant. The most important material reclaimed from beverage bottles is PET plastic. A serious impurity is aluminum, which later can clog the filters in extruders when the recycled material is used. The following are the amounts (in ppm by weight of aluminum) found in bihourly samples of PET recovered at the plant over roughly a two-day period.

291, 222, 125, 79, 145, 119, 244, 118, 182, 63,
30, 140, 101, 102, 87, 183, 60, 191, 119, 511,
120, 172, 70, 30, 90, 115

(Apparently, the data are recorded in the order in which they were collected, reading left to right, top to bottom.)
(a) Make a run chart for these data. Are there any obvious time trends? What practical engineering reason is there for looking for such trends?
(b) Ignoring the time order information, make a stem-and-leaf diagram. Use the hundreds digit to make the stem and the other two digits (separated by commas to indicate the different data points) to make the leaves. After making an initial stem-and-leaf diagram by recording the data in the (time) order given above, make a second one in which the values have been ordered.
(c) How would you describe the shape of the stem-and-leaf diagram? Is the data set bell-shaped?
(d) Find the median and the first and third quartiles for the aluminum contents and then find the .58 quantile of the data set.
(e) Make a boxplot.
(f) Make a normal plot, using regular graph paper. List the coordinates of the 26 plotted points. Interpret the shape of the plot.
(g) Try transforming the data by taking natural logarithms and again assess the shape. Is the transformed data set more bell-shaped than the raw data set?
(h) Find the sample mean, the sample range, and the sample standard deviation for both the original data and the log-transformed values from (g). Is the mean of the transformed values equal to the natural logarithm of the mean of the original data?
3. The accompanying data are three hypothetical samples of size 10 that are supposed to represent measured manganese contents in specimens of 1045 steel (the units are points, or .01%). Suppose that these measurements were made on standard specimens having "true" manganese contents of 80, using three different analytical methods. (Thirty different specimens were involved.)

Method 1: 87, 74, 78, 81, 78, 77, 84, 80, 85, 78
Method 2: 86, 85, 82, 87, 85, 84, 84, 82, 82, 85
Method 3: 84, 83, 78, 79, 85, 82, 82, 81, 82, 79

(a) Make (on the same coordinate system) side-by-side boxplots that you can use to compare the three analytical methods.
(b) Discuss the apparent effectiveness of the three methods in terms of the appearance of your diagram from (a) and in terms of the concepts of accuracy and precision discussed in Section 1.3.
(c) An alternative method of comparing two such analytical methods is to use both methods of analysis once on each of (say) 10 different specimens (10 specimens and 20 measurements). In the terminology of Section 1.2, what kind of data would be generated by such a plan? If one simply wishes to compare the average measurements produced by two analytical methods, which data collection plan (20 specimens and 20 measurements, or 10 specimens and 20 measurements) seems to you most likely to provide the better comparison? Explain.

4. Gaul, Phan, and Shimonek measured the resistances of 15 resistors of each of 2 × 5 = 10 different types. Two different wattage ratings were involved, and five different nominal resistances were used. All measurements were reported to three significant digits. Their data follow.
(a) Make back-to-back stem-and-leaf plots for comparing the 1/4 watt and 1/2 watt resistance distributions for each nominal resistance. In a few sentences, summarize what these show.
(b) Make pairs of boxplots for comparing the 1/4 watt and 1/2 watt resistance distributions for each nominal resistance.
(c) Make normal plots for the 1/2 watt nominal 20 ohm and nominal 200 ohm resistors. Interpret these in a sentence or two. From the appearance of the second plot, does it seem that if the nominal 200 ohm resistances were treated as if they had a bell-shaped distribution, the tendency would be to overestimate or to underestimate the fraction of resistances near the nominal value?

1/4 Watt Resistors

20 ohm   75 ohm   100 ohm   150 ohm   200 ohm
19.2     72.9     97.4      148       198
19.2     72.4     95.8      148       196
19.3     72.0     97.7      148       199
19.3     72.5     94.1      148       196
19.1     72.7     95.1      148       196
19.0     72.3     95.4      147       195
19.6     72.9     94.9      148       193
19.2     73.2     98.5      148       196
19.3     71.8     94.8      148       196
19.4     73.4     94.6      147       199
19.4     70.9     98.3      147       194
19.3     72.3     96.0      149       195
19.5     72.5     97.3      148       196
19.2     72.1     96.0      148       195
19.1     72.6     94.8      148       199
1/2 Watt Resistors

20 ohm   75 ohm   100 ohm   150 ohm   200 ohm
20.1     73.9     97.2      152       207
19.7     74.2     97.9      151       205
20.2     74.6     96.8      155       214
24.4     72.1     99.2      146       195
20.2     73.8     98.5      148       202
20.1     74.8     95.5      154       211
20.0     75.0     97.2      149       197
20.4     68.6     98.7      150       197
20.3     74.0     96.6      153       199
20.6     71.7     102       149       196
19.9     76.5     103       150       207
19.7     76.2     102       149       210
20.8     72.8     102       145       192
20.4     73.2     100       147       201
20.5     76.7     100       149       257

(d) Compute the sample means and sample standard deviations for all 10 samples. Do these values agree with your qualitative statements made in answer to part (a)?
(e) Make a plot of the 10 sample means computed in part (d), similar to the plot in Figure 3.22. Comment on the appearance of this plot.

5. Blomquist, Kennedy, and Reiter studied the properties of three scales by each weighing a standard 5 g weight, 20 g weight, and 100 g weight twice on each scale. Their data are presented in the accompanying table. Using whatever graphical and numerical data summary methods you find helpful, make sense of these data. Write a several-page discussion of your findings. You will probably want to consider both accuracy and precision and (to the extent possible) make comparisons between scales and between students. Part of your discussion might deal with the concepts of repeatability and reproducibility introduced in Section 2.1. Are the pictures you get of the scale and student performances consistent across the different weights?

5-Gram Weighings
            Scale 1         Scale 2         Scale 3
Student 1   5.03, 5.02      5.07, 5.09      4.98, 4.98
Student 2   5.03, 5.01      5.02, 5.07      4.99, 4.98
Student 3   5.06, 5.00      5.10, 5.08      4.98, 4.98

20-Gram Weighings
            Scale 1         Scale 2         Scale 3
Student 1   20.04, 20.06    20.04, 20.04    19.94, 19.93
Student 2   20.02, 19.99    20.03, 19.93    19.95, 19.95
Student 3   20.03, 20.02    20.06, 20.03    19.91, 19.96

100-Gram Weighings
            Scale 1           Scale 2           Scale 3
Student 1   100.06, 100.35    100.25, 100.08    99.87, 99.88
Student 2   100.05, 100.01    100.10, 100.02    99.87, 99.88
Student 3   100.00, 100.00    100.01, 100.02    99.88, 99.88

6. The accompanying values are the lifetimes (in numbers of 24 mm deep holes drilled in 1045 steel before tool failure) for n = 12 D952-II (8 mm) drills. These were read from a graph in "Computer-assisted Prediction of Drill-failure Using In-process Measurements of Thrust Force" by A. Thangaraj and P. K. Wright (Journal of Engineering for Industry, May 1988).

47, 145, 172, 86, 122, 110, 172, 52, 194, 116, 149, 48

Write a short report to your engineering manager summarizing what these data indicate about the lifetimes of drills of this type in this kind of application. Use whatever graphical and numerical data summary tools make clear the main features of the data set.

7. Losen, Cahoy, and Lewis purchased eight spanner bushings of a particular type from a local machine shop and measured a number of characteristics of these bushings, including their outside diameters. Each of the eight outside diameters was measured
once by each of two student technicians, with the following results (the units are inches):

Bushing       1      2      3      4
Student A   .3690  .3690  .3690  .3700
Student B   .3690  .3695  .3695  .3695

Bushing       5      6      7      8
Student A   .3695  .3700  .3695  .3690
Student B   .3695  .3700  .3700  .3690

A common device when dealing with paired data like these is to analyze the differences. Subtracting B measurements from A measurements gives the following eight values:

.0000, −.0005, −.0005, .0005, .0000, .0000, −.0005, .0000

(a) Find the first and third quartiles for these differences, and their median.
(b) Find the sample mean and standard deviation for the differences.
(c) Your mean in part (b) should be negative. Interpret this in terms of the original measurement problem.
(d) Suppose you want to make a normal plot of the differences on regular graph paper. Give the coordinates of the lower-left point on such a plot.

8. The accompanying data are the times to failure (in millions of cycles) of high-speed turbine engine bearings made out of two different compounds. These were taken from "Analysis of Single Classification Experiments Based on Censored Samples from the Two-parameter Weibull Distribution" by J. I. McCool (The Journal of Statistical Planning and Inference, 1979).

Compound 1
3.03, 5.53, 5.60, 9.30, 9.92, 12.51, 12.95, 15.21, 16.04, 16.84

Compound 2

(a) Find the .84 quantile of the Compound 1 failure times.
(b) Give the coordinates of the two lower-left points that would appear on a normal plot of the Compound 1 data.
(c) Make back-to-back stem-and-leaf plots for comparing the life length properties of bearings made from Compounds 1 and 2.
(d) Make (to scale) side-by-side boxplots for comparing the life lengths for the two compounds. Mark numbers on the plots indicating the locations of their main features.
(e) Compute the sample means and standard deviations of the two sets of lifetimes.
(f) Describe what your answers to parts (c), (d), and (e) above indicate about the life lengths of these turbine bearings.

9. Heyde, Kuebrick, and Swanson measured the heights of 405 steel punches purchased by a company from a single supplier. The stamping machine in which these are used is designed to use .500 in. punches. Frequencies of the measurements they obtained are shown in the accompanying table.

Punch Height (.001 in.)   Frequency     Punch Height (.001 in.)   Frequency
482                        1            496                        7
483                        0            497                       13
484                        1            498                       24
485                        1            499                       56
486                        0            500                       82
487                        1            501                       97
488                        0            502                       64
489                        1            503                       43
490                        0            504                        3
491                        2            505                        1
492                        0            506                        0
493                        0            507                        0
494                        0            508                        0
495                        6            509                        2

(a) Summarize these data, using appropriate graphical and numerical tools. How would you describe the shape of the distribution of punch heights? The specifications for punch heights were in fact .500 in. to .505 in. Does this fact give you any insight as to the origin of the distributional shape observed in the data? Does it appear that the supplier has equipment capable of meeting the engineering specifications on punch height?
(b) In the manufacturing application of these punches, several had to be placed side-by-side on a drum to cut the same piece of material. In this context, why is having small variability in punch height perhaps even more important than having the correct mean punch height?

10. The article "Watch Out for Nonnormal Distributions" by D. C. Jacobs (Chemical Engineering Progress, November 1990) contains 100 measured daily purities of oxygen delivered by a single supplier. These are as follows, listed in the time order of their collection (read left to right, top to bottom). The values given are in hundredths of a percent purity above 99.00% (so 63 stands for 99.63%).

63, 61, 67, 58, 55, 50, 55, 56, 52, 64, 73, 57, 63,
81, 64, 54, 57, 59, 60, 68, 58, 57, 67, 56, 66, 60,
49, 79, 60, 62, 60, 49, 62, 56, 69, 75, 52, 56, 61,
58, 66, 67, 56, 55, 66, 55, 69, 60, 69, 70, 65, 56,
73, 65, 68, 59, 62, 58, 62, 66, 57, 60, 66, 54, 64,
62, 64, 64, 50, 50, 72, 85, 68, 58, 68, 80, 60, 60,
53, 49, 55, 80, 64, 59, 53, 73, 55, 54, 60, 60, 58,
50, 53, 48, 78, 72, 51, 60, 49, 67

You will probably want to use a statistical analysis package to help you do the following:
(a) Make a run chart for these data. Are there any obvious time trends? What would be the practical engineering usefulness of early detection of any such time trend?
(b) Now ignore the time order of data collection and represent these data with a stem-and-leaf plot and a histogram. (Use .02% class widths in making your histogram.) Mark on these the supplier's lower specification limit of 99.50% purity. Describe the shape of the purity distribution.
(c) The author of the article found it useful to reexpress the purities by subtracting 99.30 (remember that the preceding values are in units of .01% above 99.00%) and then taking natural logarithms. Do this with the raw data and make a second stem-and-leaf diagram and a second histogram to portray the shape of the transformed data. Do these figures look more bell-shaped than the ones you made in part (b)?
(d) Make a normal plot for the transformed values from part (c). What does it indicate about the shape of the distribution of the transformed values? (Standard normal quantiles for p = .005 and p = .995 are approximately −2.58 and 2.58, respectively.)

11. The following are some data taken from the article "Confidence Limits for Weibull Regression with Censored Data" by J. I. McCool (IEEE Transactions on Reliability, 1980). They are the ordered failure times (the time units are not given in the paper) for hardened steel specimens subjected to rolling contact fatigue tests at four different values of contact stress.

.87 × 10⁶ psi   .99 × 10⁶ psi   1.09 × 10⁶ psi   1.18 × 10⁶ psi
 1.67            .80             .012             .073
 2.20           1.00             .18              .098
 2.51           1.37             .20              .117
 3.00           2.25             .24              .135
 3.90           2.95             .26              .175
 4.70           3.70             .32              .262
 7.53           6.07             .32              .270
14.7            6.65             .42              .350
27.8            7.05             .44              .386
37.4            7.37             .88              .456

(a) Make side-by-side boxplots for these data. Does it look as if the different stress levels produce life distributions of roughly the same shape? (Engineering experience suggests that different stress levels often change the scale but not the basic shape of life distributions.)
(b) Make Q-Q plots for comparing all six different possible pairs of distributional shapes. Summarize in a few sentences what these indicate about the shapes of the failure time distributions under the different stress levels.

12. Riddle, Peterson, and Harper studied the performance of a rapid-cut industrial shear in a continuous cut mode. They cut nominally 2-in. and 1-in. strips of 14 gauge and 16 gauge steel sheet metal and measured the actual widths of the strips produced by the shear. Their data follow, in units of 10⁻³ in. above nominal.

                              Material Thickness
Machine Setting   14 Gauge                        16 Gauge
1 in.             2, 1, 1, 1, 0, 0,               −2, −6, −1, −2, −1, −2,
                  −2, −10, −5, 1                  −1, −1, −1, −5
2 in.             10, 10, 8, 8, 8, 8,             −4, −3, −4, −2, −3, −3,
                  7, 7, 9, 11                     −3, −3, −4, −4

(a) Compute sample means and standard deviations for the four samples. Plot the means in a manner similar to the plot in Figure 3.22. Make a separate plot of this kind for the standard deviations.
(b) Write a short report to an engineering manager to summarize what these data and your summary statistics and plots show about the performance of the industrial shear. How do you recommend that the shear be set up in the future in order to get strips cut from these materials with widths as close as possible to specified dimensions?

13. The accompanying data are some measured resistivity values from in situ doped polysilicon specimens, taken from the article "LPCVD Process Equipment Evaluation Using Statistical Methods" by R. Rossi (Solid State Technology, 1984). (The units were not given in the article.)

5.55, 5.52, 5.45, 5.53, 5.37, 5.22, 5.62, 5.69, 5.60, 5.58, 5.51, 5.53

(a) Make a dot diagram and a boxplot for these data and compute the statistics x̄ and s.
(b) Make a normal plot for these data. How bell-shaped does this data set look? If you were to say that the shape departs from a perfect bell shape, in what specific way does it? (Refer to characteristics of the normal plot to support your answer.)

14. The article "Thermal Endurance of Polyester Enameled Wires Using Twisted Wire Specimens" by H. Goldenberg (IEEE Transactions on Electrical Insulation, 1965) contains some data on the lifetimes (in weeks) of wire specimens tested for thermal endurance according to AIEE Standard 57. Several different laboratories were used to make the tests, and the results from two of the laboratories, using a test temperature of 200°C, follow:

Laboratory 1: 14, 16, 17, 18, 20, 22, 23, 25, 27, 28
Laboratory 2: 27, 28, 29, 29, 29, 30, 31, 31, 33, 34

Consider first only the Laboratory 1 data.
(a) Find the median and the first and third quartiles for the lifetimes and then find the .64 quantile of the data set.
(b) Make and interpret a normal plot for these data. Would you describe this distribution as bell-shaped? If not, in what way(s) does it depart from being bell-shaped? Give the coordinates of the 10 points you plot on regular graph paper.
(c) Find the sample mean, the sample range, and the sample standard deviation for these data.
Now consider comparing the work of the two different laboratories (i.e., consider both data sets).
(d) Make back-to-back stem-and-leaf plots for these two data sets (use two leaves for observations 10–19, two for observations 20–29, etc.).
(e) Make side-by-side boxplots for these two data sets. (Draw these on the same scale.)
(f) Based on your work in parts (d) and (e), which of the two labs would you say produced the more precise results?
(g) Is it possible to tell from your plots in (d) and (e) which lab produced the more accurate results? Why or why not?

15. Agusalim, Ferry, and Hollowaty made some measurements on the thickness of wallboard during its manufacture. The accompanying table shows thicknesses (in inches) of 12 different 4 ft × 8 ft boards (at a single location on the boards) both before and after drying in a kiln. (These boards were nominally .500 in. thick.)

Board            1     2     3     4     5     6
Before Drying  .514  .505  .500  .490  .503  .500
After Drying   .510  .502  .493  .486  .497  .494

Board            7     8     9    10    11    12
Before Drying  .510  .508  .500  .511  .505  .501
After Drying   .502  .505  .488  .486  .491  .498

(a) Make a scatterplot of these data. Does there appear to be a strong relationship between after-drying thickness and before-drying thickness? How might such a relationship be of practical engineering importance in the manufacture of wallboard?
(b) Calculate the 12 before minus after differences in thickness. Find the sample mean and sample standard deviation of these values. How might the mean value be used in running the sheetrock manufacturing process? (Based on the mean value, what is an ideal before-drying thickness for the boards?) If somehow all variability in before-drying thickness could be eliminated, would substantial after-drying variability in thickness remain? Explain in terms of your calculations.

16. The accompanying values are representative of data summarized in a histogram appearing in the article "Influence of Final Recrystallization Heat Treatment on Zircaloy-4 Strip Corrosion" by Foster, Dougherty, Burke, Bates, and Worcester (Journal of Nuclear Materials, 1990). Given are n = 20 particle diameters observed in a bright-field TEM micrograph of a Zircaloy-4 specimen. The units are 10⁻² µm.

1.73, 2.47, 2.83, 3.20, 3.20, 3.57, 3.93, 4.30, 4.67, 5.03, 5.03, 5.40, 5.77, 6.13, 6.50, 7.23, 7.60, 8.33, 9.43, 11.27

(a) Compute the mean and standard deviation of these particle diameters.
(b) Make both a dot diagram and a boxplot for these data. Sketch the dot diagram on a ruled scale and make the boxplot below it.
(c) Based on your work in (b), how would you describe the shape of this data set?
(d) Make a normal plot of these data. In what specific way does the distribution depart from being bell-shaped?
(e) It is sometimes useful to find a scale of measurement on which a data set is reasonably bell-shaped. To that end, take the natural logarithms of the raw particle diameters. Normal-plot the log diameters. Does this plot appear to be more linear than your plot in (d)?

17. The data in the accompanying tables are measurements of the latent heat of fusion of ice taken from Experimental Statistics (NBS Handbook 91) by M. G. Natrella. The measurements were made (on specimens cooled to −.072°C) using two different methods. The first was an electrical method, and the second was a method of mixtures. The units are calories per gram of mass.
(a) Make side-by-side boxplots for comparing the two measurement methods. Does there appear to be any important difference in the precision of the two methods? Is it fair to say that at least one of the methods must be somewhat inaccurate? Explain.
(b) The dial bore data might well be termed "paired" data. A common method of analysis for such data is to take differences and study those. Compute the ten "notch minus non-notch" differences for the dial bore values. Make a dot diagram for these and then a boxplot. What physical interpretation does a nonzero mean for such differences have? What physical interpretation does a large variability in these differences have?
(c) Make a scatterplot of the air spindler notch measurements versus the dial bore notch measurements. Does it appear that the air spindler and dial bore measurements are strongly related?
(d) How would you suggest trying to determine which of the two gauges is most precise?

20. Duren, Leng and Patterson studied the drilling of holes in a miniature metal part using two different physical processes (laser drilling and electrical discharge machining). Blueprint specifications on these holes called for them to be drilled at an angle of 45° to the top surface of the part in question. The realized angles measured on 13 parts drilled using each process (26 parts in all) are
(c) Find the sample mean, the sample range, and the sample standard deviation for the Laser data.
Now consider comparing the two different drilling methods.
(d) Make back-to-back stem-and-leaf plots for the two data sets.
(e) Make side-by-side boxplots for the two data sets. (Draw these on the same scale.)
(f) Based on your work in parts (d) and (e), which of the two processes would you say produced the most consistent results? Which process produced an "average" angle closest to the nominal angle (45°)?
As it turns out, each metal part actually had two holes drilled in it and their angles measured. Below are the measured angles of the second hole drilled in each of the parts made using the Laser process. (The data are listed in the same part order as earlier.)

Laser (Hole B)
43.1, 44.3, 44.5, 46.3, 43.9, 41.9, 43.4, 49.0, 43.5, 47.2, 44.8, 44.0, 43.9

conversion in the case of the Rockwell readings) were

Dial Rockwell
536.6, 539.2, 524.4, 536.6, 526.8, 531.6, 540.5, 534.0, 526.8, 531.6

501.2, 522.0, 531.6, 522.0, 519.4, 523.2, 522.0, 514.2, 506.4, 518.1

Brinell
542.6, 526.0, 520.5, 514.0, 546.6, 512.6, 516.0, 580.4, 600.0, 601.0

Consider first only the Dial Rockwell data.
(a) Find the median and the first and third quartiles for the hardness measurements. Then find the .27 quantile of the data set.
(b) Make and interpret a normal plot for these data. Would you describe this distribution as bell-shaped? If not, in what way(s) does it depart from being bell-shaped?
(c) Find the sample mean, the sample range, and the sample standard deviation for these data.
Now consider comparing the readings from the different testers (i.e., consider all three data sets).
(d) Make back-to-back stem-and-leaf plots for the two Rockwell data sets. (Use two "leaves" for observations 500–509, two for the observations 510–519, etc.)
(e) Make side-by-side boxplots for all three data sets. (Draw these on the same scale.)
(f) Based on your work in part (e), which of the three machines would you say produced the most precise results?
(g) Is it possible to tell from your plot (e) which machine produced the most accurate results? Why or why not?

22. Ritchey, Bazan, and Buhman did an experiment to compare flight times of several designs of paper helicopters, dropping them from the first to ground floors of the ISU Design Center. The flight times that they reported for two different designs were (the units are seconds)

Design 1: 2.47, 2.45, 2.43, 2.67, 2.69, 2.48, 2.44, 2.71, 2.84, 2.84
Design 2: 3.42, 3.50, 3.29, 3.51, 3.53, 2.67, 2.69, 3.47, 3.40, 2.87

(a) Find the median and the first and third quartiles for the Design 1 data. Then find the .62 quantile of the Design 1 data set.
(b) Make and interpret a normal plot for the Design 1 data. Would you describe this distribution as bell-shaped? If not, in what way(s) does it depart from being bell-shaped?
(c) Find the sample mean, the sample range, and the sample standard deviation for the Design 1 data. Show some work.
Now consider comparing the two different designs.
(d) Make back-to-back stem-and-leaf plots for the two data sets.
(e) Make side-by-side boxplots for the two data sets. (Draw these on the same scale.)
(f) Based on your work in parts (d) and (e), which of the two designs would you say produced the most consistent results? Which design produced the longest flight times?
(g) It is not really clear from the students' report whether the data came from the dropping of one helicopter of each design ten times, or from the dropping of ten helicopters of each design once. Briefly discuss which of these possibilities is preferable if the object of the study was to identify a superior design. (If necessary, review Section 2.3.4.)
4
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Describing Relationships Between Variables
The methods of Chapter 3 are really quite simple. They require little in the way of
calculations and are most obviously relevant to the analysis of a single engineering
variable. This chapter provides methods that address the more complicated prob-
lem of describing relationships between variables and are computationally more
demanding.
The chapter begins with least squares fitting of a line to bivariate quantitative
data and the assessment of the goodness of that fit. Then the line-fitting ideas are
generalized to the fitting of curves to bivariate data and surfaces to multivariate
quantitative data. The next topic is the summarization of data from full factorial
studies in terms of so-called factorial effects. Next, the notion of data transforma-
tions is discussed. Finally, the chapter closes with a short transitional section that
argues that further progress in statistics requires some familiarity with the subject
of probability.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

4.1 Fitting a Line by Least Squares

It is frequently useful to summarize how a response y depends on a quantitative variable x with the approximately linear relationship

y ≈ β₀ + β₁x    (4.1)
Table 4.1
Pressing Pressures and Resultant
Specimen Densities
x, Pressure (psi)    y, Density (g/cc)
2,000 2.486
2,000 2.479
2,000 2.472
4,000 2.558
4,000 2.570
4,000 2.580
6,000 2.646
6,000 2.657
6,000 2.653
8,000 2.724
8,000 2.774
8,000 2.808
10,000 2.861
10,000 2.879
10,000 2.858
[Figure 4.1: Scatterplot of specimen density (g/cc) versus pressing pressure for the data of Table 4.1]
It is very easy to imagine sketching a straight line through the plotted points in
Figure 4.1. Such a line could then be used to summarize how density depends upon
pressing pressure. The principle of least squares provides a method of choosing a
“best” line to describe the data.
Definition 1 To apply the principle of least squares in the fitting of an equation for y to
an n-point data set, values of the equation parameters are chosen to minimize

Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²    (4.2)
In the context of fitting a line to (x, y) data, the prescription offered by Def-
inition 1 amounts to choosing a slope and intercept so as to minimize the sum of
squared vertical distances from (x, y) data points to the line in question. This notion
is shown in generic fashion in Figure 4.2 for a fictitious five-point data set. (It is the
squares of the five indicated differences that must be added and minimized.)
[Figure 4.2: A possible fitted line and the five differences yᵢ − ŷᵢ (some positive, some negative) for a fictitious five-point data set]

Looking at the form of display (4.1), for the fitting of a line

ŷ = β₀ + β₁x

the prescription of Definition 1 says to choose β₀ and β₁ to minimize

S(β₀, β₁) = Σᵢ₌₁ⁿ (yᵢ − (β₀ + β₁xᵢ))²    (4.3)
nβ₀ + (Σ xᵢ)β₁ = Σ yᵢ    (4.4)

and

(Σ xᵢ)β₀ + (Σ xᵢ²)β₁ = Σ xᵢyᵢ    (4.5)
For reasons that are not obvious, equations (4.4) and (4.5) are sometimes called
the normal (as in perpendicular) equations for fitting a line. They are two linear
equations in two unknowns and can be fairly easily solved for β0 and β1 (provided
there are at least two different xi ’s in the data set). Simultaneous solution of equations
(4.4) and (4.5) produces values of β1 and β0 given by
Slope of the least squares line, b₁

b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²    (4.6)

and

Intercept of the least squares line, b₀

b₀ = ȳ − b₁x̄    (4.7)
Notice the notational convention here. The particular numerical slope and intercept
minimizing S(β0 , β1 ) are denoted (not as β’s but) as b1 and b0 .
In display (4.6), somewhat standard practice has been followed (and the sum-
mation notation abused) by not indicating the variable or range of summation (i,
from 1 to n).
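Here is a brief Python sketch (an addition, not the authors' code) of formulas (4.6) and (4.7) applied to the data of Table 4.1; Example 1 below carries out the same arithmetic by hand:

```python
# Least squares slope and intercept for the pressure/density data
# of Table 4.1, using formulas (4.6) and (4.7) directly.
x = [2000]*3 + [4000]*3 + [6000]*3 + [8000]*3 + [10000]*3
y = [2.486, 2.479, 2.472, 2.558, 2.570, 2.580, 2.646, 2.657, 2.653,
     2.724, 2.774, 2.808, 2.861, 2.879, 2.858]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
print(b1, b0)    # roughly .0000487 and 2.375, as found in Example 1
```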
Example 1 (continued)
It is possible to verify that the data in Table 4.1 yield the following summary
statistics:

Σ xᵢ = 2,000 + 2,000 + · · · + 10,000 = 90,000,  so x̄ = 90,000/15 = 6,000

Σ (xᵢ − x̄)² = (2,000 − 6,000)² + (2,000 − 6,000)² + · · · + (10,000 − 6,000)² = 120,000,000

Σ (xᵢ − x̄)(yᵢ − ȳ) = 5,840,  Σ (yᵢ − ȳ)² = .289366,  and  ȳ = 40.005/15 = 2.667

Then the least squares slope and intercept, b₁ and b₀, are given via equations
(4.6) and (4.7) as

b₁ = 5,840/120,000,000 = .0000487 (g/cc)/psi

and

b₀ = 2.667 − (.0000487)(6,000) = 2.375 g/cc

so that the least squares line is

ŷ = 2.375 + .0000487x
Interpretation of the slope of the least squares line
Figure 4.3 shows this line sketched on a scatterplot of the (x, y) points from Table 4.1. Note that the slope on this plot, b1 ≈ .0000487 (g/cc)/psi, has physical meaning as the (approximate) increase in y (density) that accompanies a unit (1 psi) increase in x (pressure). The intercept on the plot, b0 = 2.375 g/cc, positions the line vertically and is the value at which the line cuts the y axis. But it should probably not be interpreted as the density that would accompany a pressing pressure of x = 0 psi. The point is that the reasonably linear-looking relation that the students found for pressures between 2,000 psi and 10,000 psi could well break down at larger or smaller pressures.

Extrapolation
Thinking of b0 as a 0 pressure density amounts to an extrapolation outside the range of data used to fit the equation, something that ought always to be approached with extreme caution.
[Figure 4.3: Scatterplot of the pressure/density data with the least squares line ŷ = 2.375 + .0000487x]
Consider the problem of representing density for a pressure of 4,000 psi. A natural possibility is to average the three densities actually observed at that pressure,

ȳ = (1/3)(2.558 + 2.570 + 2.580) = 2.5693 g/cc

and so to use this as a representative value. But assuming that y is indeed approximately linearly related to x, the fitted value

ŷ = 2.375 + .0000487(4,000) = 2.5698 g/cc

might be even better for representing average density for 4,000 psi pressure.

Interpolation
Looking then at the situation for x = 5,000 psi, there are no data with this x value. The only thing one can do to represent density at that pressure is to ask whether interpolation is sensible from a physical viewpoint. If so, the fitted value

ŷ = 2.375 + .0000487(5,000) = 2.6185 g/cc

can be used to represent 5,000 psi density. A measure of the strength of the linear relationship apparent in a scatterplot of bivariate data is the sample correlation.
Definition 2
The sample (linear) correlation between x and y in a sample of n data pairs (xi, yi) is

r = Σ(xi − x̄)(yi − ȳ) / √( Σ(xi − x̄)² · Σ(yi − ȳ)² )    (4.8)
Interpreting the sample correlation
The sample correlation always lies in the interval from −1 to 1. Further, it is −1 or 1 only when all (x, y) data points fall on a single straight line. Comparison of formulas (4.6) and (4.8) shows that

r = b1 ( Σ(xi − x̄)² / Σ(yi − ȳ)² )^(1/2)

so that b1 and r have the same sign. So a sample correlation of −1 means that y decreases linearly in increasing x, while a sample correlation of +1 means that y increases linearly in increasing x.
Real data sets do not often exhibit perfect (+1 or −1) correlation. Instead r is
typically between −1 and 1. But drawing on the facts about how it behaves, people
take r as a measure of the strength of an apparent linear relationship: r near +1
or −1 is interpreted as indicating a relatively strong linear relationship; r near 0
is taken as indicating a lack of linear relationship. The sign of r is thought of as
indicating whether y tends to increase or decrease with increased x.
Example 1 (continued)
For the pressure/density data, the summary statistics in the example following display (4.7) (page 127) produce

r = 5,840 / √( (120,000,000)(.289366) ) = .9911
This value of r is near +1 and indicates clearly the strong positive linear rela-
tionship evident in Figures 4.1 and 4.3.
Definition 3
The coefficient of determination for an equation fitted to an n-point data set via least squares and producing fitted y values ŷ1, ŷ2, ..., ŷn is

R² = ( Σ(yi − ȳ)² − Σ(yi − ŷi)² ) / Σ(yi − ȳ)²    (4.9)
Interpretation of R²
R² may be interpreted as the fraction of the raw variation in y accounted for using the fitted equation. That is, provided the fitted equation includes a constant term, Σ(yi − ȳ)² ≥ Σ(yi − ŷi)². Further, Σ(yi − ȳ)² is a measure of raw variability in y, while Σ(yi − ŷi)² is a measure of variation in y remaining after fitting the equation. So the nonnegative difference Σ(yi − ȳ)² − Σ(yi − ŷi)² is a measure of the variability in y accounted for in the equation-fitting process. R² then expresses this difference as a fraction (of the total raw variation).
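Definitions 2 and 3 are equally easy to check numerically. Continuing the Python/NumPy sketch begun after display (4.7) (so that x, y, xbar, ybar, b0, and b1 are already defined):

# Sample correlation, formula (4.8)
r = np.sum((x - xbar) * (y - ybar)) / np.sqrt(
        np.sum((x - xbar) ** 2) * np.sum((y - ybar) ** 2))

# Fitted values and the coefficient of determination, formula (4.9)
yhat = b0 + b1 * x
R2 = (np.sum((y - ybar) ** 2) - np.sum((y - yhat) ** 2)) / np.sum((y - ybar) ** 2)

print(r, R2)   # approximately .9911 and .9822; note that r**2 equals R2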
Example 1 (continued)
Using the fitted line, one can find ŷ values for all n = 15 data points in the original data set. These are given in Table 4.2.
Table 4.2
Fitted Density Values

x, Pressure (psi)    ŷ, Fitted Density (g/cc)
 2,000               2.4723
 4,000               2.5697
 6,000               2.6670
 8,000               2.7643
10,000               2.8617

From these fitted values,

R² = (.289366 − .005153)/.289366 = .9822
and the fitted line accounts for over 98% of the raw variability in density, reducing
the “unexplained” variation from .289366 to .005153.
R² as a squared correlation
The coefficient of determination has a second useful interpretation. For equations that are linear in the parameters (which are the only ones considered in this text), R² turns out to be a squared correlation. It is the squared correlation between the observed values yi and the fitted values ŷi. (Since in the present situation of fitting a line, the ŷi values are perfectly correlated with the xi values, R² also turns out to be the squared correlation between the yi and xi values.)
Example 1 (continued)
Since ŷ is perfectly correlated with x, the sample correlation between x and y is also the correlation between ŷ and y. But notice as well that

r² = (.9911)² = .9822 = R²
Definition 4
If the fitting of an equation or model to a data set produces fitted values ŷi corresponding to observed values yi, then the residuals for the fit are

ei = yi − ŷi
If a fitted equation is telling the whole story contained in a data set, then its
residuals ought to be patternless. So when they’re plotted against time order of
observation, values of experimental variables, fitted values, or any other sensible
quantities, the plots should look randomly scattered. When they don’t, the patterns
can themselves suggest what has gone unaccounted for in the fitting and/or how the
data summary might be improved.
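Residuals are a one-line computation once fitted values are in hand, and the plots just described require only a plotting library. As a sketch (continuing the earlier Python/NumPy fragments, now with matplotlib):

import matplotlib.pyplot as plt

e = y - yhat                # the residuals of Definition 4

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(x, e)           # residuals against the experimental variable x
ax1.axhline(0.0)
ax1.set_xlabel("x, Pressure (psi)")
ax1.set_ylabel("Residual")
ax2.scatter(yhat, e)        # residuals against fitted values
ax2.axhline(0.0)
ax2.set_xlabel("Fitted value")
ax2.set_ylabel("Residual")
plt.show()                  # random scatter here supports the fitted equation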
Example 2 concerns a study by Roth of the effect of an ammonium phosphate additive on the compressive strength of fly ash cylinders; the data are given in Table 4.3.

Table 4.3
Additive Concentrations and Compressive Strengths for Fly Ash Cylinders

x, Percent           y, Compressive      x, Percent           y, Compressive
Ammonium Phosphate   Strength (psi)      Ammonium Phosphate   Strength (psi)
0                    1221                3                    1609
0                    1207                3                    1627
0                    1187                3                    1642
1                    1555                4                    1451
1                    1562                4                    1472
1                    1575                4                    1465
2                    1827                5                    1321
2                    1839                5                    1289
2                    1802                5                    1292
Using formulas (4.6) and (4.7), it is possible to show that the least squares line through the (x, y) data in Table 4.3 is

ŷ = 1498.4 − .638x

and Table 4.4 gives the corresponding residuals.

Table 4.4
Residuals from a Straight-Line Fit to the Fly Ash Data

x    y      ŷ        e = y − ŷ       x    y      ŷ        e = y − ŷ
0    1221   1498.4   −277.4          3    1609   1496.5    112.5
0    1207   1498.4   −291.4          3    1627   1496.5    130.5
0    1187   1498.4   −311.4          3    1642   1496.5    145.5
1    1555   1497.8     57.2          4    1451   1495.9    −44.9
1    1562   1497.8     64.2          4    1472   1495.9    −23.9
1    1575   1497.8     77.2          4    1465   1495.9    −30.9
2    1827   1497.2    329.8          5    1321   1495.2   −174.2
2    1839   1497.2    341.8          5    1289   1495.2   −206.2
2    1802   1497.2    304.8          5    1292   1495.2   −203.2
[Figure 4.4: Plot of residuals ei versus percent ammonium phosphate xi for the straight-line fit to the fly ash data]
Figure 4.4 shows a plot of these residuals against x for the fitting of a line to Roth's data, and Figure 4.5 is a simple scatterplot of Roth's data (which in practice should be made before fitting any curve to such data). It is obvious from the scatterplot that the relationship between the amount of ammonium phosphate and compressive strength is decidedly nonlinear. In fact, a quadratic function would come much closer to fitting the data in Table 4.3.
[Figure 4.5: Scatterplot of compressive strength (psi) versus percent ammonium phosphate, with the fitted least squares line]
[Figure 4.6: Patterns in residual plots — Plot 1: residuals versus order of observation i; Plot 2: residuals versus fitted values; Plot 3: residuals for Technician 1 and Technician 2]

Interpreting patterns on residual plots
Figure 4.6 shows several patterns that can occur in plots of residuals against various variables. Plot 1 of Figure 4.6 shows a trend on a plot of residuals versus time order of observation. The pattern suggests that some variable changing in time is acting on y and has not been accounted for in fitting ŷ values. For example, instrument drift (where an instrument reads higher late in a study than it did early on) could produce a pattern like that in Plot 1. Plot 2 shows a fan-shaped pattern on a plot of residuals versus fitted values. Such a pattern indicates that large responses are fitted (and quite possibly produced and/or measured) less consistently than small responses. Plot 3 shows residuals corresponding to observations made by Technician 1 that are on the whole smaller than those made by Technician 2. The suggestion is that Technician 1's work is more precise than that of Technician 2.
Normal-plotting residuals
Another useful way of plotting residuals is to normal-plot them. The idea is that the normal distribution shape is typical of random variation and that normal-plotting of residuals is a way to investigate whether such a distributional shape applies to what is left in the data after fitting an equation or model.
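As a sketch, a normal plot of residuals can be made with a few more lines of code (scipy's norm.ppf function supplies standard normal quantiles; the array e is the set of residuals computed earlier):

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

e_sorted = np.sort(e)                    # ordered residuals
n = len(e_sorted)
p = (np.arange(1, n + 1) - 0.5) / n      # plotting positions
z = norm.ppf(p)                          # standard normal quantiles

plt.scatter(e_sorted, z)
plt.xlabel("Residual quantile")
plt.ylabel("Standard normal quantile")
plt.show()                               # linearity suggests a bell-shaped residual distribution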
Example 1 (continued)
Table 4.5 gives residuals for the fitting of a line to the pressure/density data. The residuals ei were treated as a sample of 15 numbers and normal-plotted (using the methods of Section 3.2) to produce Figure 4.7.
The central portion of the plot in Figure 4.7 is fairly linear, indicating a gen-
erally bell-shaped distribution of residuals. But the plotted point corresponding to
the largest residual, and probably the one corresponding to the smallest residual,
fail to conform to the linear pattern established by the others. Those residuals
seem big in absolute value compared to the others.
From Table 4.5 and the scatterplot in Figure 4.3, one sees that these large
residuals both arise from the 8,000 psi condition. And the spread for the three
densities at that pressure value does indeed look considerably larger than those at
the other pressure values. The normal plot suggests that the pattern of variation
at 8,000 psi is genuinely different from those at other pressures. It may be that
a different physical compaction mechanism was acting at 8,000 psi than at the
other pressures. But it is more likely that there was a problem with laboratory
technique, or recording, or the test equipment when the 8,000 psi tests were made.
In any case, the normal plot of residuals helps draw attention to an idiosyncrasy in the data of Table 4.1 that merits further investigation, and perhaps some further data collection.
Table 4.5
Residuals from the Linear Fit to the Pressure/Density Data

x, Pressure    y, Density    ŷ         e = y − ŷ
2,000 2.486 2.4723 .0137
2,000 2.479 2.4723 .0067
2,000 2.472 2.4723 −.0003
4,000 2.558 2.5697 −.0117
4,000 2.570 2.5697 .0003
4,000 2.580 2.5697 .0103
6,000 2.646 2.6670 −.0210
6,000 2.657 2.6670 −.0100
6,000 2.653 2.6670 −.0140
8,000 2.724 2.7643 −.0403
8,000 2.774 2.7643 .0097
8,000 2.808 2.7643 .0437
10,000 2.861 2.8617 −.0007
10,000 2.879 2.8617 .0173
10,000 2.858 2.8617 −.0037
[Figure 4.7: Normal plot of the residuals from the linear fit to the pressure/density data (residual quantile versus standard normal quantile)]
The sample correlation r and any least squares fit can be strongly influenced by one or a few unusual points in a data set. Consider, for example, the student age and height data pictured in Figure 4.8.

[Figure 4.8: Scatterplot of height (in.) versus age (years) for a group of students, including one 30-year-old, 6′8″ student far from the rest of the data]

A plot of the (x, y) pairs, rather than any analysis of the ages or the heights separately, is what would have identified the 30-year-old student in Figure 4.8 as unusual. That would have raised the possibility of that data point strongly influencing both r and any curve that might be fitted via least squares.
4.1.5 Computing
The examples in this section have no doubt left the impression that computations
were done “by hand.” In practice, such computations are almost always done with
a statistical analysis package. The fitting of a line by least squares is done using a
regression program. Such programs usually also compute R 2 and have an option
that allows the computing and plotting of residuals.
It is not the purpose of this text to teach or recommend the use of any particular
statistical package, but annotated printouts will occasionally be included to show
how MINITAB formats its output. Printout 1 is such a printout for an analysis of
the pressure/density data in Table 4.1, paralleling the discussion in this section.
(MINITAB’s regression routine is found under its “Stat/Regression/Regression”
menu.) MINITAB gives its user much more in the way of analysis for least squares
curve fitting than has been discussed to this point, so your understanding of Printout 1
will be incomplete. But it should be possible to locate values of the major summary
statistics discussed here. The printout shown doesn’t include plots, but it’s worth
noting that the program has options for saving fitted values and residuals for later
plotting.
Analysis of Variance
Source DF SS MS F P
Regression 1 0.28421 0.28421 717.06 0.000
Residual Error 13 0.00515 0.00040
Total 14 0.28937
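An analysis paralleling this printout can be produced by many programs other than MINITAB. For example, here is a sketch using Python's statsmodels package (whose OLS routine plays the role of a regression program):

import numpy as np
import statsmodels.api as sm

x = np.repeat([2000, 4000, 6000, 8000, 10000], 3).astype(float)
y = np.array([2.486, 2.479, 2.472, 2.558, 2.570, 2.580,
              2.646, 2.657, 2.653, 2.724, 2.774, 2.808,
              2.861, 2.879, 2.858])

model = sm.OLS(y, sm.add_constant(x)).fit()   # least squares fit of y on x
print(model.summary())    # coefficients, R-squared, and related summaries
resid = model.resid       # residuals, available for plotting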
At the end of Section 3.3 we warned that using spreadsheet software in place of
high-quality statistical software can, without warning, produce spectacularly wrong
answers. The example provided at the end of Section 3.3 concerns a badly wrong
sample variance of only three numbers. It is important to note that the potential
for numerical inaccuracy shown in that example carries over to the rest of the
statistical methods discussed in this book, including those of the present section.
For example, consider the n = 6 hypothetical (x, y) pairs listed in Table 4.6. For
fitting a line to these data via least squares, MINITAB correctly produces R 2 = .997.
But as recently as late 1999, the current version of the leading spreadsheet program
returned the ridiculously wrong value, R 2 = −.81648. (This data set comes from a
posting by Mark Eakin on the “edstat” electronic bulletin board that can be found
at http://jse.stat.ncsu.edu/archives/.)
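The danger is easy to demonstrate in miniature. The sketch below (Python/NumPy, with artificial numbers chosen to force cancellation) contrasts a numerically stable two-pass computation of Σ(y − ȳ)² with a one-pass "computational formula" carried out in low-precision arithmetic, the kind of shortcut behind such spectacularly wrong answers:

import numpy as np

y = np.array([9_000_001.0, 9_000_002.0, 9_000_003.0], dtype=np.float32)

# Two-pass formula: subtract the mean first, then square and add
ss_stable = np.sum((y - y.mean()) ** 2)

# One-pass "computational" formula: sum of squares minus n times ybar squared
ss_naive = np.sum(y ** 2) - len(y) * y.mean() ** 2

print(ss_stable, ss_naive)   # the correct value is 2.0; the one-pass value is wildly off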
Table 4.6
6 Hypothetical Data Pairs
x y x y
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. The following is a small set of artificial data. Show the hand calculations necessary to do the indicated tasks.

   x   1   2   3   4   5
   y   8   8   6   6   4

   (a) Obtain the least squares line through these data. Make a scatterplot of the data and sketch this line on that scatterplot.
   (b) Obtain the sample correlation between x and y for these data.
   (c) Obtain the sample correlation between y and ŷ for these data and compare it to your answer to part (b).
   (d) Use the formula in Definition 3 and compute R² for these data. Compare it to the square of your answers to parts (b) and (c).
   (e) Find the five residuals from your fit in part (a). How are they portrayed geometrically on the scatterplot for (a)?

2. Use a computer package and redo the computations and plotting required in Exercise 1. Annotate your output, indicating where on the printout you can find the equation of the least squares line, the value of r, the value of R², and the residuals.

3. The article "Polyglycol Modified Poly (Ethylene Ether Carbonate) Polyols by Molecular Weight Advancement" by R. Harris (Journal of Applied Polymer Science, 1990) contains some data on the effect of reaction temperature on the molecular weight of the resulting polyols. The data for eight experimental runs at temperatures 165°C and above are as follows:

   Pot Temperature, x (°C)    Average Molecular Weight, y
   165                         808
   176                         940
   188                        1183
   205                        1545
   220                        2012
   235                        2362
   250                        2742
   260                        2935

   Use a statistical package to help you complete the following (both the plotting and computations):
   (a) What fraction of the observed raw variation in y is accounted for by a linear equation in x?
   (b) Fit a linear relationship y ≈ β0 + β1x to these data via least squares. About what change in average molecular weight seems to accompany a 1°C increase in pot temperature (at least over the experimental range of temperatures)?
   (c) Compute and plot residuals from the linear relationship fit in (b). Discuss what they suggest about the appropriateness of that fitted equation. (Plot residuals versus x, residuals versus ŷ, and make a normal plot of them.)
   (d) These data came from an experiment where the investigator managed the value of x. There is a fairly glaring weakness in the experimenter's data collection efforts. What is it?
   (e) Based on your analysis of these data, what average molecular weight would you predict for an additional reaction run at 188°C? At 200°C? Why would or wouldn't you be willing to make a similar prediction of average molecular weight if the reaction is run at 70°C?

4. Upon changing measurement scales, nonlinear relationships between two variables can sometimes be made linear. The article "The Effect of Experimental Error on the Determination of the Optimum Metal-Cutting Conditions" by Ermer and Wu (The Journal of Engineering for Industry, 1967) contains a data set gathered in a study of tool life in a turning operation. The data here are part of that data set.

   Cutting Speed, x (sfpm)    Tool Life, y (min)
   800                         1.00, 0.90, 0.74, 0.66
   700                         1.00, 1.20, 1.50, 1.60
   600                         2.35, 2.65, 3.00, 3.60
   500                         6.40, 7.80, 9.80, 16.50
   400                        21.50, 24.50, 26.00, 33.00

   (a) Plot y versus x and calculate R² for fitting a linear function of x to y. Does the relationship y ≈ β0 + β1x look like a reasonable explanation of tool life in terms of cutting speed?
   (b) Take natural logs of both x and y and repeat part (a) with these log cutting speeds and log tool lives.
   (c) Using the logged variables as in (b), fit a linear relationship between the two variables using least squares. Based on this fitted equation, what tool life would you predict for a cutting speed of 550? What approximate relationship between x and y is implied by a linear approximate relationship between ln(x) and ln(y)? (Give an equation for this relationship.) By the way, Taylor's equation for tool life is y · x^α = C.
4.2 Fitting Curves and Surfaces by Least Squares
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
The line-fitting of Section 4.1, based on the approximate relationship

y ≈ β0 + β1x    (4.11)

generalizes to the fitting of polynomial relationships of the form

y ≈ β0 + β1x + β2x² + ··· + βkxᵏ    (4.12)
The least squares fitting of equation (4.12) to a set of n pairs (xi , yi ) is conceptually
only slightly more difficult than the task of fitting equation (4.11). The function of
k + 1 variables
S(β0, β1, β2, ..., βk) = Σ(yi − ŷi)² = Σ(yi − (β0 + β1xi + β2xi² + ··· + βkxiᵏ))²

must be minimized. Setting the k + 1 partial derivatives of S equal to zero produces k + 1 linear normal equations in the parameters, which in practice are solved by a multiple linear regression program to give least squares values b0, b1, ..., bk.
Example 3
Consider again Roth's fly ash data and the fitting of the quadratic relationship

y ≈ β0 + β1x + β2x²    (4.13)

to the data of Table 4.3. Printout 2 shows the MINITAB run. (After entering x and y values from Table 4.3 into two columns of the worksheet, an additional column was created by squaring the x values.)
Analysis of Variance
Source DF SS MS F P
Regression 2 658230 329115 48.78 0.000
Residual Error 15 101206 6747
Total 17 759437
Source DF Seq SS
x 1 21
x**2 1 658209
Figure 4.9 shows the fitted curve sketched on a scatterplot of the (x, y) data.
Although the quadratic curve is not an altogether satisfactory summary of Roth’s
data, it does a much better job of following the trend of the data than the line
sketched in Figure 4.5.
[Figure 4.9: Scatterplot of the fly ash data with the fitted quadratic, compressive strength (psi) versus percent ammonium phosphate]
The previous section showed that when fitting a line to (x, y) data, it is helpful
to quantify the goodness of that fit using R 2 . The coefficient of determination can
also be used when fitting a polynomial of form (4.12). Recall once more from
Definition 3 that
Coefficient of determination

R² = ( Σ(yi − ȳ)² − Σ(yi − ŷi)² ) / Σ(yi − ȳ)²    (4.14)
is the fraction of the raw variability in y accounted for by the fitted equation.
Calculation by hand from formula (4.14) is possible, but of course the easiest way
to obtain R 2 is to use a computer package.
Example 3 (continued)
Consulting Printout 2, it can be seen that the equation ŷ = 1242.9 + 382.7x − 76.7x² produces R² = .867. So 86.7% of the raw variability in compressive strength is accounted for using the fitted quadratic. The sample correlation between the observed strengths yi and fitted strengths ŷi is +√.867 = .93.
Comparing what has been done in the present section to what was done in
Section 4.1, it is interesting that for the fitting of a line to the fly ash data, R 2
obtained there was only .000 (to three decimal places). The present quadratic is
a remarkable improvement over a linear equation for summarizing these data.
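The line, quadratic, and cubic fits discussed in Examples 2 and 3 are easily reproduced outside MINITAB. A sketch in Python/NumPy:

import numpy as np

# The fly ash data of Table 4.3
x = np.repeat([0, 1, 2, 3, 4, 5], 3).astype(float)
y = np.array([1221, 1207, 1187, 1555, 1562, 1575, 1827, 1839, 1802,
              1609, 1627, 1642, 1451, 1472, 1465, 1321, 1289, 1292], dtype=float)

for k in (1, 2, 3):                    # degree of the fitted polynomial
    coeffs = np.polyfit(x, y, deg=k)   # least squares polynomial fit
    yhat = np.polyval(coeffs, x)
    R2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
    print(k, round(float(R2), 3))      # roughly .000, .867, and .952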
A natural question to raise is “What about a cubic version of equation (4.12)?”
Printout 3 shows some results of a MINITAB run made to investigate this possi-
bility, and Figure 4.10 shows a scatterplot of the data and a plot of the fitted cubic equation. (x values were squared and cubed to provide x, x², and x³ for each y value to use in the fitting.)
Regression Analysis
Source DF SS MS F P
Regression 3 723197 241066 93.13 0.000
Residual Error 14 36240 2589
Total 17 759437
[Figure 4.10: Scatterplot and fitted cubic for the fly ash data, compressive strength (psi) versus percent ammonium phosphate]
R 2 for the cubic equation is .952, somewhat larger than for the quadratic.
But it is fairly clear from Figure 4.10 that even a cubic polynomial is not totally
satisfactory as a summary of these data. In particular, both the fitted quadratic in
Figure 4.9 and the fitted cubic in Figure 4.10 fail to fit the data adequately near
an ammonium phosphate level of 2%. Unfortunately, this is where compressive
strength is greatest—precisely the area of greatest practical interest.
The example illustrates that R 2 is not the only consideration when it comes to
judging the appropriateness of a fitted polynomial. The examination of plots is also
important. Not only scatterplots of y versus x with superimposed fitted curves but
plots of residuals can be helpful. This can be illustrated on a data set where y is
expected to be nearly perfectly quadratic in x.
Example 4 returns to the bob drop data of Section 1.4, where positions y of a falling bob were marked on a tape at 1/60 sec intervals. Since for a body falling from rest

displacement = gt²/2

and the time at which point number x is marked is t0 + (x − 1)/60 sec for some (unknown) t0, one expects

y ≈ (g/2)(t0 + (x − 1)/60)²

  = (g/2)((x/60) + (t0 − 1/60))²    (4.15)

  = (g/7200)x² + (g/60)(t0 − 1/60)x + (g/2)(t0 − 1/60)²
That is, y is expected to be approximately quadratic in x. Fitting

ŷ = b0 + b1x + b2x²    (4.16)

to the data, an empirical value for g becomes available as 7200b2. This is in fact how the value 9.79 m/sec², quoted in Section 1.4, was obtained.
A multiple linear regression program fits equation (4.16) to the bob drop data, producing an x² coefficient b2 ≈ 1.36 (from which g ≈ 7200b2 ≈ 9790 mm/sec²) with R² that is 1.0 to 6 decimal places. Residuals
for this fit can be calculated using Definition 4 and are also given in Table 4.7.
Figure 4.11 is a normal plot of the residuals. It is reasonably linear and thus not
remarkable (except for some small suggestion that the largest residual or two may
not be as extreme as might be expected, a circumstance that suggests no obvious
physical explanation).
[Figure 4.11: Normal plot of the residuals from the quadratic fit to the bob drop data]
[Figure 4.12: Plot of residuals versus point number x for the bob drop data]
Example 4 (continued)
A plot of the residuals against point number x, however, shows a suggestion of a cyclical pattern. Should the pattern suggested by Figure 4.12 reappear consistently, it would indicate that something in the mechanism generating the 60-cycle current may cause cycles to be alternately slightly shorter and then slightly longer than 1/60 sec. The practical implication of this would be that if a better determination of g were desired, the regularity of the AC current waveform is one matter to be addressed.
What if a polynomial doesn't fit (x, y) data?
Examples 3 and 4 (respectively) illustrate only partial success and then great success in describing an (x, y) data set by means of a polynomial equation. Situations like Example 3 obviously do sometimes occur, and it is reasonable to wonder what to do when they happen. There are two simple things to keep in mind.
For one, although a polynomial may be unsatisfactory as a global description
of a relationship between x and y, it may be quite adequate locally—i.e., for
a relatively restricted range of x values. For example, in the fly ash study, the
quadratic representation of compressive strength as a function of percent ammonium
phosphate is not appropriate over the range 0 to 5%. But having identified the region
around 2% as being of practical interest, it would make good sense to conduct a
follow-up study concentrating on (say) 1.5 to 2.5% ammonium phosphate. It is quite
possible that a quadratic fit only to data with 1.5 ≤ x ≤ 2.5 would be both adequate
and helpful as a summarization of the follow-up data.
The second observation is that the terms x, x², x³, ..., xᵏ in equation (4.12) can
be replaced by any (known) functions of x and what we have said here will remain
essentially unchanged. The normal equations will still be k + 1 linear equations
in β0 , β1 , . . . , βk , and a multiple linear regression program will still produce least
squares values b0 , b1 , . . . , bk . This can be quite useful when there are theoretical
reasons to expect a particular (nonlinear but) simple functional relationship between
x and y. For example, Taylor’s equation for tool life is of the form
y ≈ αx β
for y tool life (e.g., in minutes) and x the cutting speed used (e.g., in sfpm). Taking logarithms,

ln(y) ≈ ln(α) + β ln(x)

This is an equation for ln(y) that is linear in the parameters ln(α) and β, involving the variable ln(x). So, presented with a set of (x, y) data, empirical values for α and β could be determined by fitting ln(y) to ln(x) via least squares and then taking the estimate of α to be the exponential of the fitted intercept and the estimate of β to be the fitted slope.
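As an illustration of this device, the following sketch (Python/NumPy, with purely hypothetical cutting speed and tool life numbers standing in for real measurements) estimates α and β in Taylor's equation from logged data:

import numpy as np

# Hypothetical cutting speeds (sfpm) and tool lives (min), for illustration only
speed = np.array([400.0, 500.0, 600.0, 700.0, 800.0])
life = np.array([25.0, 10.0, 3.0, 1.3, 0.8])

# Fit ln(y) = ln(alpha) + beta * ln(x) by least squares
slope, intercept = np.polyfit(np.log(speed), np.log(life), deg=1)

beta = slope                # exponent in y = alpha * x**beta
alpha = np.exp(intercept)   # multiplier, recovered from the fitted intercept

print(alpha, beta)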
Another useful generalization of the line-fitting of Section 4.1 is the fitting of surfaces: when y depends on several quantitative variables, one may fit

y ≈ β0 + β1x1 + β2x2 + ··· + βkxk    (4.17)

to the data using the least squares principle. This is pictured for a k = 2 case in Figure 4.13, where six (x1, x2, y) data points are pictured in three dimensions, along with a possible fitted surface of the form (4.17). To fit a surface defined by equation (4.17) to a set of n data points (x1i, x2i, ..., xki, yi) via least squares, the function of k + 1 variables

S(β0, β1, β2, ..., βk) = Σ(yi − ŷi)² = Σ(yi − (β0 + β1x1i + ··· + βkxki))²

is minimized, and again the minimization is accomplished in practice by a multiple linear regression program.
[Figure 4.13: Six (x1, x2, y) data points in three dimensions with a possible fitted surface of the form (4.17)]

Example 5 concerns a classical set of data due to Brownlee, taken from the operation of a plant oxidizing ammonia to nitric acid. Table 4.8 gives values of an air flow variable x1, a cooling water inlet temperature x2, an acid concentration x3, and a stack loss response y for 17 periods of plant operation, and an equation of the form

ŷ = b0 + b1x1 + b2x2 + b3x3

will be fit to these data.
Table 4.8
Brownlee’s Stack Loss Data
i,             x1i,        x2i, Cooling Water     x3i, Acid        yi,
Observation    Air Flow    Inlet Temperature      Concentration    Stack Loss
1 80 27 88 37
2 62 22 87 18
3 62 23 87 18
4 62 24 93 19
5 62 24 93 20
6 58 23 87 15
7 58 18 80 14
8 58 18 89 14
9 58 17 88 13
10 58 18 82 11
11 58 19 93 12
12 50 18 89 8
13 50 18 86 7
14 50 19 72 8
15 50 19 79 8
16 50 20 80 9
17 56 20 82 15
Fitted to the data of Table 4.8 via a multiple linear regression program, this form produces the least squares equation (4.18), with R² = .975.

Interpreting fitted coefficients from a multiple regression
The coefficients in this equation can be thought of as rates of change of stack loss with respect to the individual variables x1, x2, and x3, holding the others fixed. For example, b1 = .80 can be interpreted as the increase in stack loss y that accompanies a one-unit increase in air flow x1 if inlet temperature x2 and acid concentration x3 are held fixed. The signs on the coefficients indicate whether y tends to increase or decrease with increases in the corresponding x. For example, the fact that b1 is positive indicates that the higher the rate at which the plant is run, the larger y tends to be (i.e., the less efficiently the plant operates). The large value of R² is a preliminary indicator that the equation (4.18) is an effective summarization of the data.
Source DF SS MS F P
Regression 3 795.83 265.28 169.04 0.000
Residual Error 13 20.40 1.57
Total 16 816.24
Source DF Seq SS
air 1 775.48
water 1 18.49
acid 1 1.86
Unusual Observations
Obs air stack Fit StDev Fit Residual St Resid
10 58.0 11.000 13.506 0.552 -2.506 -2.23R
Example 5 (continued)
In the context of the nitrogen plant, it is sensible to ask whether all three variables, x1, x2, and x3, are required to adequately account for the observed variation in
y. For example, the behavior of stack loss might be adequately explained using
only one or two of the three x variables. There would be several consequences
of practical engineering importance if this were so. For one, in such a case, a
simple or parsimonious version of equation (4.17) could be used in describing
the oxidation process. And if a variable is not needed to predict y, then it is
possible that the expense of measuring it might be saved. Or, if a variable doesn’t
seem to have much impact on y (because it doesn’t seem to be essential to include
it when writing an equation for y), it may be possible to choose its level on purely
economic grounds, without fear of degrading process performance.
As a means of investigating whether indeed some subset of x 1 , x2 , and x3
is adequate to explain stack loss behavior, R 2 values for equations based on all
possible subsets of x1 , x2 , and x3 were obtained and placed in Table 4.9. This
shows, for example, that 95% of the raw variability in y can be accounted for
using a linear equation in only the air flow variable x 1 . Use of both x1 and the
water temperature variable x 2 can account for 97.3% of the raw variability in
stack loss. Inclusion of x 3 , the acid concentration variable, in an equation already
involving x 1 and x2 , increases R 2 only from .973 to .975.
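A table like Table 4.9 can be produced mechanically by looping over all possible subsets of predictors with any least squares routine. A sketch (Python/NumPy, with the data of Table 4.8 entered directly):

import numpy as np
from itertools import combinations

x1 = np.array([80, 62, 62, 62, 62, 58, 58, 58, 58, 58, 58, 50, 50, 50, 50, 50, 56], float)
x2 = np.array([27, 22, 23, 24, 24, 23, 18, 18, 17, 18, 19, 18, 18, 19, 19, 20, 20], float)
x3 = np.array([88, 87, 87, 93, 93, 87, 80, 89, 88, 82, 93, 89, 86, 72, 79, 80, 82], float)
y = np.array([37, 18, 18, 19, 20, 15, 14, 14, 13, 11, 12, 8, 7, 8, 8, 9, 15], float)

predictors = {"x1": x1, "x2": x2, "x3": x3}
for r in (1, 2, 3):
    for names in combinations(predictors, r):
        # Design matrix: a column of 1's plus the chosen x columns
        X = np.column_stack([np.ones_like(y)] + [predictors[n] for n in names])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares coefficients
        yhat = X @ b
        R2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
        print(names, round(float(R2), 3))           # should reproduce Table 4.9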
If identifying a simple equation for stack loss that seems to fit the data well
is the goal, the message in Table 4.9 would seem to be “Consider an x 1 term first,
and then possibly an x2 term.” On the basis of R 2 , including an x3 term in an
equation for y seems unnecessary. And in retrospect, this is entirely consistent
with the character of the fitted equation (4.18): x 3 varies from 72 to 93 in the
original data set, and this means that ŷ changes only a small total amount as x3 moves over its entire range (b3 being quite small in absolute value).

Like fitted equations of other forms, equations of the form

ŷ = b0 + b1x1 + b2x2 + b3x3

can and should go through thorough residual analyses before they are adopted as data summaries. As an example, consider a fitted equation involving
Table 4.9
R²'s for Equations Predicting Stack Loss

Equation Fit    R²
y ≈ β0 + β1 x1 .950
y ≈ β0 + β2 x2 .695
y ≈ β0 + β3 x3 .165
y ≈ β0 + β1 x1 + β2 x2 .973
y ≈ β0 + β1 x1 + β3 x3 .952
y ≈ β0 + β2 x2 + β3 x3 .706
y ≈ β0 + β1 x1 + β2 x2 + β3 x3 .975
x1 and x2. A multiple linear regression program can be used to produce the fitted equation

ŷ = −42.00 + .78x1 + .57x2    (4.19)

with R² = .973.
Dropping variables from a fitted equation typically changes coefficients
(Notice that b0, b1, and b2 in equation (4.19) differ somewhat from the corresponding values in equation (4.18). That is, equation (4.19) was not obtained from equation (4.18) by simply dropping the last term in the equation. In general, the values of the coefficients b will change depending on which x variables are and are not included in the fitting.)
Residuals for equation (4.19) can be computed and plotted in any number
of potentially useful ways. Figure 4.14 shows a normal plot of the residuals and
three other plots of the residuals against, respectively, x 1 , x2 , and ŷ. There are
no really strong messages carried by the plots in Figure 4.14 except that the
data set contains one unusually large x 1 value and one unusually large ŷ (which
corresponds to the large x1). But there is enough of a curvilinear "up-then-down-then-back-up-again" pattern in the plot of residuals against x1 to suggest the possibility of adding an x1² term to the fitted equation (4.19). You might want to verify that fitting the equation

y ≈ β0 + β1x1 + β2x2 + β3x1²

produces a fitted version, equation (4.20), with corresponding R² = .980 and residuals that show even less of a pattern than
those for the fitted equation (4.19). In particular, the hint of curvature on the plot
of residuals versus x1 for equation (4.19) is not present in the corresponding plot
for equation (4.20). Interestingly, looking back over this example, one sees that
fitted equation (4.20) has a better R 2 value than even fitted equation (4.18), in
spite of the fact that equation (4.18) involves the process variable x3 and equation (4.20) does not.

[Figure 4.14: Plots of residuals from the two-variable equation (4.19) fit to the stack loss data (ŷ = −42.00 + .78x1 + .57x2): a normal plot of the residuals and plots of residuals against air flow x1, against inlet temperature x2, and against fitted stack loss ŷ]
Equation (4.20) is somewhat more complicated than equation (4.19). But
because it still really only involves two different input x’s and also eliminates the
slight pattern seen on the plot of residuals for equation (4.19) versus x 1 , it seems
an attractive choice for summarizing the stack loss data. A two-dimensional rep-
resentation of the fitted surface defined by equation (4.20) is given in Figure 4.15.
The slight curvature on the plotted curves is a result of the x1² term appearing in
equation (4.20). Since most of the data have x 1 from 50 to 62 and x2 from 17 to
24, the curves carry the message that over these ranges, changes in x 1 seem to
produce larger changes in stack loss than do changes in x 2 . This conclusion is
consistent with the discussion centered around Table 4.9.
[Figure 4.15: Fitted stack loss from equation (4.20) as a function of air flow x1, plotted for x2 = 20, 24, and 28]
Common residual plots in multiple regression
The plots of residuals used in Example 5 are typical. They are

1. normal plots of residuals,
2. plots of residuals against each of the individual x variables,
3. plots of residuals against fitted values ŷ, and
4. plots of residuals against time order of observation.
All of these can be used to help assess the appropriateness of surfaces fit to multivari-
ate data, and they all have the potential to tell an engineer something not previously
discovered about a set of data and the process that generated them.
Earlier in this section, there was a discussion of the fact that an “x term” in
the equations fitted via least squares can be a known function (e.g., a logarithm)
of a basic process variable. In fact, it is frequently helpful to allow an “x term” in
equation (4.17) (page 149) to be a known function of several basic process variables.
The next example illustrates this point.
Example 6 concerns part of a wind tunnel study of a three-surface aircraft configuration, in which the response of interest was a lift/drag ratio y, measured for nine combinations of

x1 = canard placement in inches above the plane defined by the main wing
x2 = tail placement in inches above the plane defined by the main wing

(The front-to-rear positions of the three surfaces were constant throughout the study.)
A straightforward least squares fitting of the equation

y ≈ β0 + β1x1 + β2x2

to these data produces R² of only .394. Even the addition of squared terms in both x1 and x2, i.e., the fitting of

y ≈ β0 + β1x1 + β2x2 + β3x1² + β4x2²

raises R² only to about .51. However, fitting the equation

y ≈ β0 + β1x1 + β2x2 + β3x1x2

produces

ŷ = 3.43 + .54x1 + .32x2 − .50x1x2    (4.21)

with R² = .641, as the accompanying printout shows.
Table 4.10
Lift/Drag Ratios for 9 Canard/Tail Position Combinations
x1 , x2 , y,
Canard Position Tail Position Lift/Drag Ratio
−1.2 −1.2 .858
−1.2 0.0 3.156
−1.2 1.2 3.644
0.0 −1.2 4.281
0.0 0.0 3.481
0.0 1.2 3.918
1.2 −1.2 4.136
1.2 0.0 3.364
1.2 1.2 4.018
Regression Analysis
Analysis of Variance
Source DF SS MS F P
Regression 3 5.4771 1.8257 2.97 0.136
Residual Error 5 3.0724 0.6145
Total 8 8.5495
(After reading x1 , x2 , and y values from Table 4.10 into columns of MINITAB’s
worksheet, x1 x2 products were created and y fitted to the three predictor variables
x1 , x2 , and x1 x2 in order to create this printout.)
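The fitting reported in this printout is easily replicated with any least squares routine. A sketch (Python/NumPy) using the data of Table 4.10:

import numpy as np

x1 = np.array([-1.2, -1.2, -1.2, 0.0, 0.0, 0.0, 1.2, 1.2, 1.2])
x2 = np.array([-1.2, 0.0, 1.2, -1.2, 0.0, 1.2, -1.2, 0.0, 1.2])
y = np.array([.858, 3.156, 3.644, 4.281, 3.481, 3.918, 4.136, 3.364, 4.018])

# Design matrix with an intercept, x1, x2, and the cross-product x1*x2
X = np.column_stack([np.ones(9), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

yhat = X @ b
R2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.round(b, 3), round(float(R2), 3))   # coefficients of equation (4.21) and R² of about .64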
Figure 4.16 shows the nature of the fitted surface (4.21). Raising the canard
(increasing x1 ) has noticeably different predicted impacts on y, depending on the
value of x2 (the tail position). (It appears that the canard and tail should not be
lined up—i.e., x1 should not be near x2 . For large predicted response, one wants
small x1 for large x2 and large x1 for small x2 .) It is the cross-product term x1 x2
in relationship (4.21) that allows the response curves to have different characters
for different x2 values. Without it, the slices of the fitted (x 1 , x2 , ŷ) surface would
be parallel for various x 2 , much like the situation in Figure 4.15.
[Figure 4.16: Fitted lift/drag ratio from equation (4.21) as a function of canard position x1, plotted for tail positions x2 = −1.2, 0, and 1.2]
Example 6 (continued)
Although the main new point of this example has by now been made, it probably should be mentioned that equation (4.21) is not the last word for fitting the data of Table 4.10. Figure 4.17 gives a plot of the residuals for relationship (4.21) versus canard position x1, and it shows a strong curvilinear pattern. In fact, adding an x1² term and fitting

y ≈ β0 + β1x1 + β2x2 + β3x1x2 + β4x1²

produces the fitted equation

ŷ = 3.89 + .54x1 + .32x2 − .50x1x2 − .48x1²    (4.22)

with R² = .754, a noticeable further improvement.
[Figure 4.17: Plot of residuals from equation (4.21) versus canard position x1]
[Figure 4.18: (x1, x2) locations of fictitious data points having 1 ≤ x1 ≤ 5 and 10 ≤ x2 ≤ 20; the point (3, 15), marked ×, is unlike any of the (x1, x2) pairs for the data]

Before leaving this section, some words of caution are in order. The first concerns extrapolation. The warning about extrapolation raised in the line-fitting context applies equally to surface fitting, with the added subtlety
that when several different x variables are involved, it is difficult to tell whether a
particular (x1 , x2 , . . . , xk ) vector is a large extrapolation. About all one can do is
check to see that it comes close to matching some single data point in the set on
each coordinate x1 , x2 , . . . , xk . It is not sufficient that there be some point with x 1
value near the one of interest, another point with x 2 value near the one of interest,
etc. For example, having data with 1 ≤ x1 ≤ 5 and 10 ≤ x2 ≤ 20 doesn't mean that the (x1, x2) pair (3, 15) is necessarily like any of the pairs in the data set. This fact is illustrated in Figure 4.18 for a fictitious set of (x1, x2) values.
The influence of outlying data vectors
Another potential pitfall is that the fitting of curves and surfaces via least squares can be strongly affected by a few outlying or extreme data points. One can try to identify such points by examining plots and comparing fits made with and without the suspicious point(s).
Example 5 (continued)
Figure 4.14 earlier called attention to the fact that the nitrogen plant data set contains one point with an extreme x1 value. Figure 4.19 is a scatterplot of
(x1 , x2 ) pairs for the data in Table 4.8 (page 150). It shows that by most qualitative
standards, observation 1 in Table 4.8 is unusual or outlying.
If the fitting of equation (4.20) is redone using only the last 16 data points in Table 4.8, a fitted equation (4.23) of the same form as equation (4.20) and R² = .942 are obtained. Using equation (4.23) as a description of stack loss and limiting attention to x1 in the range 50 to 62 could be considered. But it
and limiting attention to x 1 in the range 50 to 62 could be considered. But it
is possible to verify that though some of the coefficients (the b’s) in equations
(4.20) and (4.23) differ substantially, the two equations produce comparable ŷ
values for the 16 data points with x1 between 50 and 62. In fact, the largest
difference in fitted values is about .4.

[Figure 4.19: Plot of (x1, x2) pairs for the stack loss data; observation 1, with x1 = 80, lies far from the rest]

So, since point 1 in Table 4.8 doesn't
radically change predictions made using the fitted equation, it makes sense to
leave it in consideration, adopt equation (4.20), and use it to describe stack loss
for (x1 , x2 ) pairs interior to the pattern of scatter in Figure 4.19.
Replication and surface fitting
A third warning has to do with the notion of replication (first discussed in Section 2.3). It is the fact that the fly ash data of Example 3 has several y's for
each x that makes it so clear that even the quadratic and cubic curves sketched
in Figures 4.9 and 4.10 are inadequate descriptions of the relationship between
phosphate and strength. The fitted curves pass clearly outside the range of what look
like believable values of y for some values of x. Without such replication, what is
permissible variation about a fitted curve or surface can’t be known with confidence.
For example, the structure of the lift/drag data set in Example 6 is weak from this
viewpoint. There is no replication represented in Table 4.10, so an external value for
typical experimental precision would be needed in order to identify a fitted value as
obviously incompatible with an observed one.
The nitrogen plant data set of Example 5 was presumably derived from a
primarily observational study, where no conscious attempt was made to replicate
(x1 , x2 , x3 ) settings. However, points number 4 and 5 in Table 4.8 (page 150) do
represent the replication of a single (x1 , x2 , x3 ) combination and show a difference
in observed stack loss of 1. And this makes the residuals for equation (4.20) (which
range from −2.0 to 2.3) seem at least not obviously out of line.
Section 9.2 discusses more formal and precise ways of using data from studies
with some replication to judge whether or not a fitted curve or surface misses some
observed y’s too badly. For now, simply note that among replication’s many virtues
is the fact that it allows more reliable judgments about the appropriateness of a fitted
equation than are otherwise possible.
The possibility of overfitting
The fourth caution is that the notion of equation simplicity (parsimony) is important for reasons in addition to simplicity of interpretation and reduced expense involved in using the equation. It is also important from the point of view of typically giving smooth interpolation and not overfitting a data set. As a hypothetical example,
consider the artificial, generally linear (x, y) data plotted in Figure 4.20. It would be
possible to run a (wiggly) k = 10 version of the polynomial (4.12) through each of
these points. But in most physical problems, such a curve would do a much worse
job of predicting y at values of x not represented by a data point than would a simple
fitted line. A tenth-order polynomial would overfit the data in hand.
Empirical models and engineering
As a final point in this section, consider how the methods discussed here fit into the broad picture of using models for attacking engineering problems. It must
be said that physical theories of physics, chemistry, materials, etc. rarely produce
equations of the forms (4.12) or (4.17). Sometimes pertinent equations from those
theories can be rewritten in such forms, as was possible with Taylor’s equation for
tool life earlier in this section. But the majority of engineering applications of the
methods in this section are to the large number of problems where no commonly
known and simple physical theory is available, and a simple empirical description
of the situation would be helpful. In such cases, the tool of least squares fitting of
curves and surfaces can function as a kind of “mathematical French curve,” allowing
an engineer to develop approximate empirical descriptions of how a response y is
related to system inputs x1 , x2 , . . . , xk .
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to Exercise 3 of Section 4.1. Fit a quadratic relationship y ≈ β0 + β1x + β2x² to the data via least squares. By appropriately plotting residuals and examining R² values, determine the advisability of using a quadratic rather than a linear equation to describe the relationship between x and y. If a quadratic fitted equation is used, how does the predicted mean molecular weight at 200°C compare to that obtained in part (e) of the earlier exercise?

2. Here are some data taken from the article "Chemithermomechanical Pulp from Mixed High Density Hardwoods" by Miller, Shankar, and Peterson (Tappi Journal, 1988). Given are the percent NaOH used as a pretreatment chemical, x1, the pretreatment time in minutes, x2, and the resulting value of a specific surface area variable, y (with units of cm³/g), for nine batches of pulp produced from a mixture of hardwoods at a treatment temperature of 75°C in mechanical pulping.
4.3 Fitted Effects for Factorial Data

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

This section considers the summarization of data from complete factorial studies. In many such studies, fitted effects computed from the sample means furnish an equation that can adequately describe the data and have intuitively appealing and understandable interpretations. The use of simple plots and residuals will be discussed, as tools helpful in assessing whether such a simple structure holds.
The discussion begins with the 2-factor case, then considers three (or, by anal-
ogy, more) factors. Finally, the special case where each factor has only two levels is
discussed.
Notation for two-way factorial sample means
In an I × J complete factorial study, let

ȳij = the sample mean response when factor A is at level i and factor B is at level j

ȳi. = (1/J) Σj ȳij,    ȳ.j = (1/I) Σi ȳij,    and    ȳ.. = (1/IJ) Σi,j ȳij

The ȳi. and ȳ.j are row and column averages when one thinks of the ȳij laid out in a two-dimensional format, as shown in Figure 4.21.
Example 7  Joint Strengths for Three Different Joint Types in Three Different Woods
Kotlers, MacFarland, and Tomlinson studied the tensile strength of three different types of joints made on three different types of wood. Butt, lap, and beveled joints were made in nominal 1″ × 4″ × 12″ pine, oak, and walnut specimens using a resin glue. The original intention was to test two specimens of each Joint Type/Wood Type combination. But one operator error and one specimen failure not related to its joint removed two of the original data points from consideration
and gave the data in Table 4.11. These data have complete 3 × 3 factorial structure.

[Figure 4.21: A two-way layout of sample means ȳij, with rows corresponding to levels 1, 2, ..., I of Factor A and columns to levels 1, 2, ..., J of Factor B]

Table 4.12
Sample Means for Nine Wood/Joint Combinations

Interaction plot
Collecting y's for the nine different combinations into separate samples and calculating means, the ȳij's are as presented in tabular form in Table 4.12 and plotted in Figure 4.22. This figure is a so-called interaction plot of these means. The qualitative messages given by the plot are as follows:
1. Joint types ordered by strength are “beveled is stronger than lap, which
in turn is stronger than butt.”
[Figure 4.22: Interaction plot of the joint strength means — mean stress at failure, y (psi), versus Wood (pine, oak, walnut), with separate traces for beveled, lap, and butt joints]
The row and column average means ( ȳi· ’s and ȳ· j ’s, respectively) might be
taken as measures of average response behavior at different levels of the factors in
question. If so, it then makes sense to use the differences between these and the
grand average mean ȳ.. as measures of the effects of those levels on mean response.
This leads to Definition 5.
Definition 5
In a two-way complete factorial study with factors A and B, the fitted main effect of factor A at its ith level is

ai = ȳi. − ȳ..

and the fitted main effect of factor B at its jth level is

bj = ȳ.j − ȳ..
Example 7 (continued)
Simple arithmetic and the ȳ's in Table 4.12 yield the fitted main effects for the joint strength study of Kotlers, MacFarland, and Tomlinson, first for factor A (the Joint Type) and then for factor B (the Wood Type).
These fitted main effects quantify the first two qualitative messages carried by
the data and listed as (1) and (2) before Definition 5. For example,
a2 > a3 > a1
says that beveled joints are strongest and butt joints the weakest. Further, the fact
that the ai ’s and b j ’s are of roughly the same order of magnitude says that the
Joint Type and Wood Type factors are of comparable importance in determining
tensile strength.
A difference between fitted main effects for a factor amounts to a difference between corresponding row or column averages and quantifies how different response behavior is for those two levels. For example, in the joint strength study,

b1 − b2 = ȳ.1 − ȳ.2 ≈ −467 psi

which indicates that pine joint average strength is about 467 psi less than oak joint average strength.
In some two-factor factorial studies, the fitted main effects as defined in Definition 5 pretty much summarize the story told by the means ȳij, in the sense that

ȳij ≈ ȳ.. + ai + bj    (4.24)

Display (4.24) implies, for example, that the pattern of mean responses for level 1 of factor A is the same as for level 2 of A. That is, changing levels of factor B (from say j to j′) produces the same change in mean response for level 2 as for level 1 (namely, bj′ − bj). In fact, if relation (4.24) holds, there are parallel traces on an interaction plot of means.
Example 7 (continued)
To illustrate the meaning of expression (4.24), the fitted effects for the Joint Type/Wood Type data have been used to calculate 3 × 3 = 9 values of ȳ.. + ai + bj corresponding to the nine experimental combinations. These are given in Table 4.13.
For comparison purposes, the ȳ i j from Table 4.12 and the ȳ .. + ai + b j from
Table 4.13 are plotted on the same sets of axes in Figure 4.23. Notice the parallel
traces for the ȳ.. + ai + bj values for the three different joint types.

[Figure 4.23: Plot of the ȳij and the ȳ.. + ai + bj against Wood (pine, oak, walnut) for the three joint types, stress at failure (psi)]

Table 4.13
Values of ȳ.. + ai + bj for the Joint Strength Study

The traces for
the ȳ i j values for the three different joint types are not parallel (particularly when
walnut is considered), so there are apparently substantial differences between the
ȳ i j ’s and the ȳ .. + ai + b j ’s.
When relationship (4.24) fails to hold, the patterns in mean response across
levels of one factor depend on the levels of the second factor. In such cases, the
differences between the combination means ȳ i j and the values ȳ .. + ai + b j can
serve as useful measures of lack of parallelism on the plots of means, and this leads
to another definition.
Definition 6
In a two-way complete factorial study with factors A and B, the fitted interaction of factor A at its ith level and factor B at its jth level is

abij = ȳij − (ȳ.. + ai + bj)
Interpretation of interactions in a two-way factorial study
The fitted interactions in some sense measure how much pattern the combination means ȳij carry that is not explainable in terms of the factors A and B acting separately. Clearly, when relationship (4.24) holds, the fitted interactions abij are all small (nearly 0), and system behavior can be thought of as depending separately on the level of A and the level of B. In such cases, an important practical consequence is that it is possible to develop recommendations for levels of the two factors independently of each other. For example, one need not recommend one level of A if B is at its level 1 and another if B is at its level 2.
Consider a study of the effects of factors Tool Type and Turning Speed on the
metal removal rate for a lathe. If the fitted interactions are small, turning speed
recommendations that remain valid for all tool types can be made. However, if
the fitted interactions are important, turning speed recommendations might vary
according to tool type.
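Definitions 5 and 6 translate directly into a few lines of array arithmetic. As a sketch (Python/NumPy), fitted effects can be computed from any complete I × J table of sample means:

import numpy as np

def two_way_fitted_effects(ybar):
    # ybar is an I x J array of cell sample means (ybar[i, j] holds the mean
    # for level i+1 of factor A and level j+1 of factor B)
    grand = ybar.mean()              # the grand average mean
    a = ybar.mean(axis=1) - grand    # fitted A main effects (Definition 5)
    b = ybar.mean(axis=0) - grand    # fitted B main effects (Definition 5)
    ab = ybar - (grand + a[:, None] + b[None, :])   # fitted interactions (Definition 6)
    return grand, a, b, ab

Applied to the ȳij of Table 4.12, a routine like this reproduces the fitted effects of Example 7, and the returned effects automatically sum to zero in the ways noted later in this section.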
Example 7 (continued)
Again using the Joint Type/Wood Type data, consider calculating the fitted interactions. The raw material for these calculations already exists in Tables 4.12 and 4.13. Simply taking differences between entries in these tables cell-by-cell yields the fitted interactions given in Table 4.14.
It is interesting to compare these fitted interactions to themselves and to
the fitted main effects. The largest (in absolute value) fitted interaction (ab23 )
corresponds to beveled walnut joints. This is consistent with one visual message
in Figures 4.22 and 4.23: This Joint Type/Wood Type combination is in some
sense most responsible for destroying any nearly parallel structure that might
otherwise appear. The fact that (on the whole) the abi j ’s are not as large as the
ai ’s or b j ’s is consistent with a second visual message in Figures 4.22 and 4.23:
The lack of parallelism, while important, is not as important as differences in
Joint Types or Wood Types.
Table 4.14
Fitted Interactions for the Joint Strength Study
Example 7 has proceeded "by hand." But using a statistical package can make the calculations painless. For example, Printout 6 illustrates that most of the results of Example 7 are readily available in MINITAB's "General Linear Model" routine (found under the "Stat/ANOVA/General Linear Model" menu). Comparing this printout to the example does bring up one point regarding the fitted effects defined in Definitions 5 and 6. Note that the printout provides values of only two (of three) Joint main effects, two (of three) Wood main effects, and four (of nine) Joint × Wood interactions.

Fitted effects sum to zero
These are all that are needed, since it is a consequence of Definition 5 that fitted main effects for a given factor must total to 0, and it is a consequence of Definition 6 that fitted interactions must sum to zero across any row or down any column of the two-way table of factor combinations. The fitted effects not provided by the printout are easily deduced from the ones that are given.
Rearranging the defining equation in Definition 6 gives

ȳij = ȳ.. + ai + bj + abij    (4.25)

That is, ȳ.., the fitted main effects, and the fitted interactions provide a decomposition or breakdown of the combination sample means into interpretable pieces. These pieces correspond to an overall effect, the effects of factors acting separately, and the effects of factors acting jointly.
Taking a hint from the equation fitting done in the previous two sections, it makes sense to think of (4.25) as a fitted version of an approximate relationship,

y ≈ µ + αi + βj + αβij    (4.26)

where µ, the αi, the βj, and the αβij are parameters. When the interactions in a data set are small, a simplified, "main effects only" version of equation (4.26),

y ≈ µ + αi + βj

may be an adequate description of the data, and there are other simplified versions of equation (4.26) that also have appealing interpretations. For example, the simplified version of equation (4.26),

y ≈ µ + αi

says that only the level of factor A (and not that of B) has any real effect on the response. Fitted values ŷ corresponding to any of these relationships can be calculated from the appropriate fitted effects, and residuals

e = y − ŷ

can then be computed and plotted (and should look like noise if the simplified equation is an adequate description of the data set). Further, the fraction of raw variation in y accounted for in the fitting process is (as always)
Coefficient of determination

R² = ( Σ(y − ȳ)² − Σ(y − ŷ)² ) / Σ(y − ȳ)²    (4.27)
where the sums are over all observed y’s. (Summation notation is being abused even
further than usual, by not even subscripting the y’s and ŷ’s.)
Example 8 concerns a study by Gronberg, who on each of two different evenings measured the flight distances (in yards) of golf balls of two different compressions. His data are given in Table 4.15.

Table 4.15
Golf Ball Flight Distances for Four Compression/Evening Combinations
Evening (B)
1 2
180 192 196 180
193 190 192 195
80 197 182 191 197
189 192 194 192
187 179 186 193
Compression (A)
180 175 190 185
185 190 195 167
100 167 185 180 180
162 180 170 180
170 185 180 165
Example 8 (continued)
These data have complete two-way factorial structure. The factor Evening is not really of primary interest. Rather, it is a blocking factor, its levels creating homogeneous environments in which to compare 80 and 100 compression flight distances. Figure 4.24 is a graphic using boxplots to represent the four samples and emphasizing the factorial structure.
Calculating sample means corresponding to the four cells in Table 4.15 and then finding fitted effects is straightforward. Table 4.16 displays cell, row, column, and grand average means. And based on those values,

a1 = 189.85 − 184.20 = 5.65    and    a2 = 178.55 − 184.20 = −5.65

b1 = 183.00 − 184.20 = −1.20    and    b2 = 185.40 − 184.20 = 1.20

ab11 = ab22 = −.55    and    ab12 = ab21 = .55
[Figure 4.24: Boxplots of flight distance (yd) versus Evening for the 80 and 100 compression golf balls]
Table 4.16
Cell, Row, Column, and Grand Average Means for the Golf Ball Flight Data
Evening (B)
1 2
80 ȳ 11 = 188.1 ȳ 12 = 191.6 189.85
Compression (A)
100 ȳ 21 = 177.9 ȳ 22 = 179.2 178.55
183.00 185.40 184.20
[Figure 4.25: Interaction plot of the golf ball cell means — mean flight distance versus Evening, with traces for the 80 and 100 compression balls]
The fitted effects indicate that most of the differences in the cell means in Ta-
ble 4.16 are understandable in terms of differences between 80 and 100 compres-
sion balls. The effect of differences between evenings appears to be on the order
of one-fourth the size of the effect of differences between ball compressions.
Further, the pattern of flight distances across the two compressions changed rela-
tively little from evening to evening. These facts are portrayed graphically in the
interaction plot of Figure 4.25.
The story told by the fitted effects in this example probably agrees with most
readers’ intuition. There is little reason a priori to expect the relative behaviors of
80 and 100 compression flight distances to change much from evening to evening.
But there is slightly more reason to expect the distances to be longer overall on
some nights than on others.
It is worth investigating how well the data in Table 4.15 can be described by the simplest possible versions of relationship (4.26). To do so, fitted responses are first calculated corresponding to the three different possible relationships

y ≈ µ + αi    (4.28)

y ≈ µ + αi + βj    (4.29)

y ≈ µ + αi + βj + αβij    (4.30)
These fitted values are generated using the fitted effects. They are collected in Table 4.17 (not surprisingly, the first and third sets of fitted responses are, respectively, the row averages and the cell means).
Residuals e = y − ŷ for fitting the three equations (4.28), (4.29), and (4.30)
are obtained by subtracting the appropriate entries in, respectively, the third,
fourth, or fifth column of Table 4.17 from each of the data values listed in
Table 4.15. For example, 40 residuals for the fitting of the “A main effects only”
equation (4.28) would be obtained by subtracting 189.85 from every entry in the
upper left cell of Table 4.15, subtracting 178.55 from every entry in the lower
left cell, 189.85 from every entry in the upper right cell, and 178.55 from every
entry in the lower right cell.
Figure 4.26 provides normal plots of the residuals from the fitting of the three
equations (4.28), (4.29), and (4.30). None of the normal plots is especially linear,
but at the same time, none of them is grossly nonlinear either. In particular, the
first two, corresponding to simplified versions of relationship 4.26, are not signif-
icantly worse than the last one, which corresponds to the use of all fitted effects
(both main effects and interactions). From the limited viewpoint of producing
residuals with an approximately bell-shaped distribution, the fitting of any of the
three equations (4.28), (4.29), and (4.30) would appear approximately equally
effective.
The calculation of R² values for equations (4.28), (4.29), and (4.30) proceeds as follows. First, since the grand average of all 40 flight distances is ȳ = 184.2 yards (which in this case also turns out to be ȳ..), the total sum of squares

Σ(y − ȳ)² = (180 − 184.2)² + ··· + (179 − 184.2)²

can be computed.

[Figure 4.26: Normal plots of residuals from three different equations fitted to the golf data]

Then Σ(y − ŷ)²
values for the three equations are obtained as the sums of the squared residuals.
For example, using Tables 4.15 and 4.17, for equation (4.29),

Σ(y − ŷ)² = (180 − 188.65)² + ··· + (179 − 188.65)²
Finally, equation (4.27) is used. Table 4.18 gives the three values of R 2 .
The story told by the R² values is consistent with everything else that's been said in this example. None of the values is terribly big, which is consistent with the large within-sample variation in flight distances evident in Figure 4.24. But considering A (Compression) main effects does account for some of the observed variation in flight distance, and the addition of B (Evening) main effects adds slightly to the variation accounted for. Introducing interactions into consideration adds little additional accounting power.

Table 4.18
R² Values for Fitting Equations (4.28), (4.29), and (4.30) to Gronberg's Data

Equation                         R²
y ≈ µ + αi                      .366
y ≈ µ + αi + βj                 .382
y ≈ µ + αi + βj + αβij          .386
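In machine-readable form, the R² recipe may be sketched as follows (our code, with hypothetical responses and fitted values; the formula is the "fraction of raw variation accounted for" of equation (4.27)).

import numpy as np

def r_squared(y, yhat):
    """R^2 = (sum(y - ybar)^2 - sum(y - yhat)^2) / sum(y - ybar)^2."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    sst = np.sum((y - y.mean()) ** 2)    # total variation about the grand average
    sse = np.sum((y - yhat) ** 2)        # sum of squared residuals
    return (sst - sse) / sst

# hypothetical usage with stand-in responses and fitted values
print(r_squared([180.0, 193.0, 175.0, 181.0], [189.85, 189.85, 178.55, 178.55]))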
Notation for sample means and their averages (for three-way factorial data)

ȳijk = the sample mean response when factor A is at level i, factor B is at level j, and factor C is at level k

ȳ... = (1/IJK) Σi,j,k ȳijk

ȳij. = (1/K) Σk ȳijk

and similarly for the other averages (ȳi.., ȳ.j., ȳ..k, ȳi.k, and ȳ.jk), a dot replacing each subscript that has been averaged over.
[Diagram: the I × J × K cells of a complete three-way factorial, drawn as a rectangular lattice with axes Factor A level (1, 2, …, I), Factor B level (1, 2, …, J), and Factor C level (1, 2, …, K).]
Table 4.19
Levels of Three Process Variables in a 23 Study of Material Strength
Table 4.20
Sample Mean Strengths for 2³ Treatment Combinations

i, Factor A Level   j, Factor B Level   k, Factor C Level   ȳijk, Sample Mean Strength (psi)
1 1 1 1520
2 1 1 2450
1 2 1 2340
2 2 1 2900
1 1 2 1670
2 1 2 2540
1 2 2 2230
2 2 2 3230
[Diagram: the sample means of Table 4.20 displayed on a cube with axes Factor A level, Factor B level, and Factor C level; the Factor A level 1 (bottom) face shows ȳ111 = 1520, ȳ121 = 2340, ȳ112 = 1670, and ȳ122 = 2230.]
For example,

ȳ1.. = (1/(2 · 2))(1520 + 2340 + 1670 + 2230) = 1940 psi

is the average mean on the bottom face, while

ȳ11. = (1/2)(1520 + 1670) = 1595 psi

is the average mean on the lower left edge. For future reference, all of the average sample means (in psi) are collected here:

ȳ1.. = 1940      ȳ2.. = 2780
ȳ.1. = 2045      ȳ.2. = 2675
ȳ..1 = 2302.5    ȳ..2 = 2417.5
ȳ11. = 1595    ȳ12. = 2285    ȳ21. = 2495    ȳ22. = 3065
ȳ1.1 = 1930    ȳ1.2 = 1950    ȳ2.1 = 2675    ȳ2.2 = 2885
ȳ.11 = 1985    ȳ.12 = 2105    ȳ.21 = 2620    ȳ.22 = 2730
ȳ... = 2360
Definition 7  In a three-way complete factorial study with factors A, B, and C, the fitted main effect of factor A at its ith level is

ai = ȳi.. − ȳ...

the fitted main effect of factor B at its jth level is

bj = ȳ.j. − ȳ...

and the fitted main effect of factor C at its kth level is

ck = ȳ..k − ȳ...
Definition 8  In a three-way complete factorial study with factors A, B, and C, the fitted 2-factor interaction of factor A at its ith level and factor B at its jth level is

abij = ȳij. − (ȳ... + ai + bj)

the fitted 2-factor interaction of factor A at its ith level and factor C at its kth level is

acik = ȳi.k − (ȳ... + ai + ck)

and the fitted 2-factor interaction of factor B at its jth level and factor C at its kth level is

bcjk = ȳ.jk − (ȳ... + bj + ck)
Interpreting two-way interactions in a three-way study

These fitted 2-factor interactions can be thought of in two equivalent ways (a numerical check appears in the sketch after this list):

1. as what one gets as fitted interactions upon averaging across all levels of the factor that is not under consideration to obtain a single two-way table of (average) means and then calculating as per Definition 6 (page 169);

2. as what one gets as averages, across all levels of the factor not under consideration, of the fitted two-factor interactions calculated as per Definition 6, one level of the excluded factor at a time.
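Because the Definition 6 calculations are linear in the table of means, these two views must agree. The following sketch (ours, not part of the original text) verifies the equivalence numerically for the means of Table 4.20; the axis convention (axis 0 = A, axis 1 = B, axis 2 = C) is our own.

import numpy as np

ybar = np.array([[[1520., 1670.], [2340., 2230.]],
                 [[2450., 2540.], [2900., 3230.]]])

def two_way_interactions(table):
    """Fitted interactions of Definition 6 for a two-way table of (average) means."""
    g = table.mean()
    a = table.mean(axis=1) - g
    b = table.mean(axis=0) - g
    return table - (g + a[:, None] + b[None, :])

# view 1: average over the levels of C first, then compute interactions once
ab_view1 = two_way_interactions(ybar.mean(axis=2))

# view 2: compute interactions separately at each level of C, then average
ab_view2 = np.mean([two_way_interactions(ybar[:, :, k]) for k in range(2)], axis=0)

print(np.allclose(ab_view1, ab_view2))   # True; both give the +/-30 psi values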
Example 9 (continued)  To illustrate the meaning of Definitions 7 and 8, return to the composite material strength study. For example, the fitted A main effects are

a1 = ȳ1.. − ȳ... = 1940 − 2360 = −420 psi
a2 = ȳ2.. − ȳ... = 2780 − 2360 = 420 psi

The entire set of fitted effects for the means of Table 4.20 is as follows (all in psi):

ȳ... = 2360
a1 = −420      a2 = 420
b1 = −315      b2 = 315
c1 = −57.5     c2 = 57.5
ab11 = ab22 = −30       ab12 = ab21 = 30
ac11 = ac22 = 47.5      ac12 = ac21 = −47.5
bc11 = bc22 = −2.5      bc12 = bc21 = 2.5
Interpretation of three-way interactions

Remember equation (4.25) (page 171). It says that in 2-factor studies, the fitted grand mean, main effects, and two-factor interactions completely describe a factorial set of sample means. Such is not the case in three-factor studies. Instead, a new possibility arises: 3-factor interaction. Roughly speaking, the fitted three-factor interactions in a 3-factor study measure how much pattern the combination means carry that is not explainable in terms of the factors A, B, and C acting separately and in pairs.
Definition 9  In a three-way complete factorial study with factors A, B, and C, the fitted 3-factor interaction of A at its ith level, B at its jth level, and C at its kth level is

abcijk = ȳijk − (ȳ... + ai + bj + ck + abij + acik + bcjk)
Example 9 (continued)  To illustrate the meaning of Definition 9, consider again the composite material study. Using the previously calculated fitted main effects and 2-factor interactions,

abc111 = ȳ111 − (ȳ... + a1 + b1 + c1 + ab11 + ac11 + bc11)
       = 1520 − (2360 + (−420) + (−315) + (−57.5) + (−30) + 47.5 + (−2.5))
       = −62.5 psi

Similar calculations can be made to verify that the entire set of 3-factor interactions for the means of Table 4.20 is as follows:

abc111 = abc122 = abc212 = abc221 = −62.5 psi
abc112 = abc121 = abc211 = abc222 = 62.5 psi
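All of the fitted effects above can be reproduced by machine. Here is a minimal sketch (ours) of the Definition 7, 8, and 9 arithmetic applied to the means of Table 4.20 (axis 0 = A, axis 1 = B, axis 2 = C, as before).

import numpy as np

ybar = np.array([[[1520., 1670.], [2340., 2230.]],
                 [[2450., 2540.], [2900., 3230.]]])

g = ybar.mean()                               # ybar... = 2360
a = ybar.mean(axis=(1, 2)) - g                # A main effects: -420, 420
b = ybar.mean(axis=(0, 2)) - g                # B main effects: -315, 315
c = ybar.mean(axis=(0, 1)) - g                # C main effects: -57.5, 57.5
ab = ybar.mean(axis=2) - (g + a[:, None] + b[None, :])    # +/-30
ac = ybar.mean(axis=1) - (g + a[:, None] + c[None, :])    # +/-47.5
bc = ybar.mean(axis=0) - (g + b[:, None] + c[None, :])    # +/-2.5
abc = ybar - (g + a[:, None, None] + b[None, :, None] + c[None, None, :]
              + ab[:, :, None] + ac[:, None, :] + bc[None, :, :])   # +/-62.5
print(abc)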
A second interpretation of three-way interactions

Main effects and 2-factor interactions are more easily interpreted than 3-factor interactions. One insight into the meaning of 3-factor interactions was given immediately before Definition 9. Another is the following. If the fitted AB interactions are calculated at each level of (say) factor C, and these (the pattern of parallelism or nonparallelism) are essentially the same at all levels of C, then the 3-factor interactions are small (near 0). Otherwise, large 3-factor interactions allow the pattern of AB interaction to change from one level of C to another.
When beginning the analysis of three-way factorial data, one hopes to discover
a simplified version of equation (4.32) that is both interpretable and an adequate
description of the data. (Indeed, if it is not possible to do so, little is gained by
using the factorial breakdown rather than simply treating the data in question as IJK
unstructured samples.)
As was the case earlier with two-way factorial data, the process of fitting a simplified version of display (4.32) via least squares is, in general, unfortunately somewhat complicated. But when all sample sizes are equal (i.e., the data are balanced), fitted values are again obtained simply by adding the fitted effects corresponding to the terms retained in the simplified relationship.
Example 9 (continued)  Looking over the magnitudes of the fitted effects for Kinzer's composite material strength study, the A and B main effects clearly dwarf the others, suggesting the possibility that the relationship

y ≈ µ + αi + βj    (4.33)

is an adequate description of the sample means.
Figure 4.29 Fitted values ŷ from the "A and B main effects only" equation (4.33), displayed on a cube: ŷ = 1625 for A level 1/B level 1, 2255 for A level 1/B level 2, 2465 for A level 2/B level 1, and 3095 for A level 2/B level 2 (the same at both levels of Factor C)
Example 9 (continued)  All eight fitted values corresponding to equation (4.33) are shown geometrically in Figure 4.29. The fitted values given in the figure might be combined with product requirements and cost information to allow a process engineer to make sound decisions about autoclave temperature, autoclave time, and time span.
[Diagram: cube plot of mean flight distances (including the values 15.8, 18.4, 22.5, 21.6, 26.0, and 24.0) from a 2³ paper airplane study with axes Plane design, Plane size (Large vs. Small), and Paper weight (Heavy vs. Light).]
Table 4.21
Shorthand Names for the 2³ Factorial Treatment Combinations

Combination   Level of A   Level of B   Level of C
(1)           1            1            1
a             2            1            1
b             1            2            1
ab            2            2            1
c             1            1            2
ac            2            1            2
bc            1            2            2
abc           2            2            2

A first notational convenience special to two-level factorial structures is a shorthand in which each treatment combination is named by listing the lowercase letters of all factors appearing at their high levels (the name (1) is used when every factor is at its low level). For example, if level 2 of each of factors A, B, and C is designated the high level, shorthand names for the 2³ = 8 different ABC combinations are as given in Table 4.21. Using these names, for example, ȳa can stand for a sample mean where factor A is at its high (or second) level and all other factors are at their low (or first) levels.
Special relationship between 2^p effects of a given type

A second convenience special to two-level factorial data structures is the fact that all effects of a given type have the same absolute value. This has already been illustrated in Example 9; for example, looking back at the effects computed from the data of Table 4.20, the fitted AB interactions are all ±30 psi and the fitted AC interactions are all ±47.5 psi. This is always the case for fitted effects in 2^p factorials. In fact, if two fitted effects of the same type are such that an even number of 1 → 2 or 2 → 1 subscript changes are required to get the second from the first, the fitted effects are equal (e.g., bc22 = bc11). If an odd number are required, then the second fitted effect is −1 times the first (e.g., bc12 = −bc22). This fact is useful because one need only do the arithmetic necessary to find one fitted effect of each type and then choose appropriate signs to get all the others of that type.
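A tiny sketch (ours) of this sign rule: knowing one fitted effect of a type, all others of that type follow by counting subscript changes.

def effect(all_high_value, subscripts):
    """Sign-adjust the all-"2" fitted effect for arbitrary subscripts."""
    flips = sum(1 for s in subscripts if s == 1)   # number of 2 -> 1 changes
    return all_high_value * (-1) ** flips

bc22 = -2.5                       # a fitted effect from the Table 4.20 analysis
print(effect(bc22, (1, 1)))       # bc11 = -2.5 (two changes, sign unchanged)
print(effect(bc22, (1, 2)))       # bc12 = +2.5 (one change, sign flips)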
A statistician named Frank Yates is credited with discovering an efficient, mechanical way of generating one fitted effect of each type for a 2^p study. His method is easy to implement "by hand" and produces fitted effects with all "2" subscripts (i.e., corresponding to the "all factors at their high level" combination).

The Yates algorithm for computing fitted 2^p factorial effects

The Yates algorithm consists of the following steps.
Step 1  Write down the 2^p sample means in a column, in what is called Yates standard order. Standard order is easily remembered by beginning with the names (1) and a, and then multiplying the list so far by the letter of each additional factor in turn: multiplying by b extends the list to (1), a, b, ab; multiplying by c then extends it to (1), a, b, ab, c, ac, bc, abc; and so on.

Step 2  Make up a new column of values by first adding the entries of the previous column in pairs, and then subtracting them in pairs (second entry of a pair minus first entry).

Step 3  Repeat the step 2 additions and subtractions until p columns have been created in this way, and then divide each entry of the pth column by 2^p.

The last column (made via step 3) gives fitted effects (all factors at level 2), again in standard order.
Example 9 (continued)  Table 4.22 shows the use of the Yates algorithm to calculate fitted effects for the 2³ composite material study. The entries in the final column of this table are, of course, exactly as listed earlier, and the rest of the fitted effects are easily obtained via appropriate sign changes. This final column is an extremely concise summary of the fitted effects, which quickly reveals which types of fitted effects are larger than others.
Table 4.22
The Yates Algorithm Applied to the Means of Table 4.20

Combination   ȳ      Cycle 1   Cycle 2   Cycle 3   Cycle 3 ÷ 8
(1)           1520   3970      9210      18880     2360   = ȳ...
a             2450   5240      9670      3360      420    = a2
b             2340   4210      1490      2520      315    = b2
ab            2900   5460      1870      −240      −30    = ab22
c             1670   930       1270      460       57.5   = c2
ac            2540   560       1250      380       47.5   = ac22
bc            2230   870       −370      −20       −2.5   = bc22
abc           3230   1000      130       500       62.5   = abc222
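The algorithm is also easily mechanized. The following is a minimal sketch (ours, not part of the original text); the function name yates is our own, and the code simply carries out steps 1 through 3 above.

import numpy as np

def yates(means):
    """Apply p cycles of pairwise sums-then-differences, then divide by 2^p."""
    col = np.asarray(means, float)
    n = len(col)                      # n = 2^p means in standard order
    p = n.bit_length() - 1
    for _ in range(p):
        pairs = col.reshape(-1, 2)
        col = np.concatenate([pairs.sum(axis=1),             # sums first
                              pairs[:, 1] - pairs[:, 0]])    # then differences
    return col / n    # grand mean, then all-"2" fitted effects, in standard order

# the means of Table 4.20 in standard order: (1), a, b, ab, c, ac, bc, abc
print(yates([1520, 2450, 2340, 2900, 1670, 2540, 2230, 3230]))
# -> [2360. 420. 315. -30. 57.5 47.5 -2.5 62.5]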
The reverse Yates algorithm and easy computation of fitted responses

The Yates algorithm is useful beyond finding fitted effects. For balanced data sets, it is also possible to modify it slightly to find fitted responses, ŷ, corresponding to a simplified version of a relation like display (4.32). First, the desired (all factors at their high level) fitted effects (using 0's for those types not considered) are written down in reverse standard order. Then, by applying p cycles of the Yates additions and subtractions, the fitted values, ŷ, are obtained, listed in reverse standard order. (Note that no final division is required in this reverse Yates algorithm.)
Example 9 (continued)  Consider fitting the relationship (4.33) to the balanced data set that led to the means of Table 4.20 via the reverse Yates algorithm. Table 4.23 gives the details. The fitted values in the final column are exactly as shown earlier in Figure 4.29.
Table 4.23
The Reverse Yates Algorithm Applied to Fitting the "A and B Main Effects Only" Equation (4.33) to the Data of Table 4.20

Combination   Fitted Effect    Cycle 1   Cycle 2   Cycle 3 = ŷ
abc           abc222 → 0       0         0         3095
bc            bc22 → 0         0         3095      2255
ac            ac22 → 0         315       0         2465
c             c2 → 0           2780      2255      1625
ab            ab22 → 0         0         0         3095
b             b2 = 315         0         2465      2255
a             a2 = 420         315       0         2465
(1)           ȳ... = 2360      1940      1625      1625
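The reverse algorithm is the same arithmetic without the final division. A minimal sketch (ours):

import numpy as np

def reverse_yates(effects):
    """p cycles of Yates additions/subtractions; no final division."""
    col = np.asarray(effects, float)
    p = len(col).bit_length() - 1
    for _ in range(p):
        pairs = col.reshape(-1, 2)
        col = np.concatenate([pairs.sum(axis=1), pairs[:, 1] - pairs[:, 0]])
    return col    # fitted values yhat, in reverse standard order

# "A and B main effects only" fit to the Table 4.20 means; reverse standard
# order for p = 3 is abc, bc, ac, c, ab, b, a, (1)
print(reverse_yates([0, 0, 0, 0, 0, 315, 420, 2360]))
# -> [3095. 2255. 2465. 1625. 3095. 2255. 2465. 1625.]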
The importance of two-level factorials

The restriction to two-level factors that makes these notational and computational devices possible is not as specialized as it may at first seem. When an engineer wishes to study the effects of a large number of factors, even 2^p will be a large number of conditions to investigate. If more than two levels of the factors are considered, the sheer size of a complete factorial study quickly becomes unmanageable. Recognizing this, engineers often use two-level studies for screening, identifying a few (from many) process variables for subsequent study at more levels on the basis of their large perceived effects in the screening study. So this 2^p material is in fact quite important to the practice of engineering statistics.
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Since the data of Exercise 2 of Section 4.2 have complete factorial structure, it is possible (at least temporarily) to ignore the fact that the two experimental factors are basically quantitative and make a factorial analysis of the data.
(a) Compute all fitted factorial main effects and interactions for the data of Exercise 2 of Section 4.2. Interpret the relative sizes of these fitted effects, using an interaction plot like Figure 4.22 to facilitate your discussion.
(b) Compute nine fitted responses for the "main effects only" explanation of y, y ≈ µ + αi + βj. Plot these versus level of the NaOH variable, connecting fitted values having the same level of the Time variable with line segments, as in Figure 4.23. Discuss how this plot compares to the two plots of fitted y versus x1 made in Exercise 2 of Section 4.2.
(c) Use the fitted values computed in (b) and find a value of R² appropriate to the "main effects only" representation of y. How does it compare to the R² values from multiple regressions? Also use the fitted values to compute residuals for this "main effects only" representation. Plot these (versus level of NaOH, level of Time, and ŷ, and in normal plot form). What do they indicate about the present "no interaction" explanation of specific area?

2. Bachman, Herzberg, and Rich conducted a 2³ factorial study of fluid flow through thin tubes. They measured the time required for the liquid level in a fluid holding tank to drop from 4 in. to 2 in. for two drain tube diameters and two fluid types. Two different technicians did the measuring. Their data are as follows:

Technician   Diameter (in.)   Fluid             Time (sec)
1            .188             water             21.12, 21.11, 20.80
2            .188             water             21.82, 21.87, 21.78
1            .314             water             6.06, 6.04, 5.92
2            .314             water             6.09, 5.91, 6.01
1            .188             ethylene glycol   51.25, 46.03, 46.09
2            .188             ethylene glycol   45.61, 47.00, 50.71
1            .314             ethylene glycol   7.85, 7.91, 7.97
2            .314             ethylene glycol   7.73, 8.01, 8.32

(a) Compute (using the Yates algorithm or otherwise) the values of all the fitted main effects, two-way interactions, and three-way interactions for these data. Do any simple interpretations of these suggest themselves?
(b) The students actually had some physical theory suggesting that the log of the drain time might be a more convenient response variable than the raw time. Take the logs of the y's and recompute the factorial effects. Does an interpretation of this system in terms of only main effects seem more plausible on the log scale than on the original scale?
(c) Considering the logged drain times as the responses, find fitted values and residuals for a "Diameter and Fluid main effects only" explanation of these data. Compute R² appropriate to such a view and compare it to the R² that results from using all factorial effects to describe log drain time. Make and interpret appropriate residual plots.
(d) Based on the analysis from (c), what change in log drain time seems to accompany a change from .188 in. diameter to .314 in. diameter? What does this translate to in terms of raw drain time? Physical theory suggests that raw time is inversely proportional to the fourth power of drain tube radius. Does your answer here seem compatible with that theory? Why or why not?

3. When analyzing a full factorial data set where the factors involved are quantitative, either the surface-fitting technology of Section 4.2 or the factorial analysis material of Section 4.3 can be applied. What practical engineering advantage does the first offer over the second in such cases?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Figure 4.32 Normal plots for discovery times and log discovery times
Power transformations

g(y) = (y − γ)^α    (4.34)
Where several samples (and corresponding ȳ and s values) are involved, an empirical way of investigating whether (1) or (2) above might be useful is to plot ln(s) versus ln(ȳ) and see if there is approximate linearity. If so, a slope of roughly 1 makes (1) appropriate, while a slope of δ ≠ 1 signals which version of (2) might be helpful.
In addition to this empirical way of identifying a potentially variance-stabilizing
transformation, theoretical considerations can sometimes provide guidance. Stan-
dard theoretical distributions (like those introduced in Chapter 5) have their own
relationships between their (theoretical) means and variances, which can help pick
out an appropriate version of (1) or (2) above.
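A minimal sketch (ours, not part of the original text) of this empirical diagnostic follows; the four small samples are borrowed from the drain-time data of the Section 4.3 exercises purely for illustration.

import numpy as np

samples = [np.array([21.12, 21.11, 20.80]),
           np.array([6.06, 6.04, 5.92]),
           np.array([51.25, 46.03, 46.09]),
           np.array([7.85, 7.91, 7.97])]

log_mean = np.log([s.mean() for s in samples])
log_sd = np.log([s.std(ddof=1) for s in samples])

# least squares slope of ln(s) on ln(ybar)
delta = np.polyfit(log_mean, log_sd, 1)[0]
print(delta)   # a slope near 1 suggests the log transformation; other slopes
               # suggest a power transformation like (4.34)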
A power law

y ≈ α x1^β1 x2^β2 · · · xk^βk    (4.35)
ȳ .... = 6.1550
a2 = .4563 b2 = 1.6488 c2 = 3.2163 d2 = 1.1425
ab22 = .0750 ac22 = .2975 ad22 = .4213
bc22 = .7525 bd22 = .2213 cd22 = .7987
abc222 = .0838 abd222 = .2950 acd222 = .3775 bcd222 = .0900
abcd2222 = .2688
y ≈ µ + β j + γk + δl (4.37)
and
y ≈ µ + β j + γk + δl + βγ jk + γ δkl (4.38)
Table 4.24
Daniel’s 24 Drill Advance Rate Data
Combination y Combination y
Example 12 (continued)  …are suggested. (The five largest fitted effects are, in order of decreasing magnitude, the main effects of C, B, and D, and then the two-factor interactions of C with D and B with C.) Fitting equation (4.37) to the balanced data of Table 4.24 produces R² = .875, and fitting relationship (4.38) produces R² = .948. But upon closer examination, neither fitted equation turns out to be a very good description of these data.
Figure 4.33 shows a normal plot and a plot against ŷ for residuals from a fitted version of equation (4.37). It shows that the fitted version of equation (4.37) produces several disturbingly large residuals, and fitted values that are systematically too small for responses that are small and large but too large for moderate responses. Such a curved plot of residuals versus ŷ in general suggests that a nonlinear transformation of y may potentially be effective.

The reader is invited to verify that residual plots for equation (4.38) look even worse than those in Figure 4.33. In particular, it is the bigger responses that are most poorly described.
Figure 4.33 Normal plot of residuals and plot of residuals versus fitted response ŷ for equation (4.37)
[Corresponding normal plot of residuals and plot of residuals versus fitted ln(y), for the fit on the log scale.]
ȳ′.... = 1.5977
a2 = .0650 b2 = .2900 c2 = .5772 d2 = .1633
ab22 = −.0172 ac22 = .0052 ad22 = .0334
bc22 = −.0251 bd22 = −.0075 cd22 = .0491
abc222 = .0052 abd222 = .0261 acd222 = .0266 bcd222 = −.0173
abcd2222 = .0193
Example 12 (continued)  For the logged drill advance rates, the simple relationship

ln(y) ≈ µ + βj + γk + δl    (4.39)

is suggested.

…

y2 ≈ β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2
Table 4.25
Yields and Filtration Times in a 3² Factorial Chemical Process Study

x1, Condensation    x2, Amount    y1, Yield    y2, Filtration
Temperature (°C)    of B (cc)     (g)          Time (sec)
90 24.4 21.1 150
90 29.3 23.7 10
90 34.2 20.7 8
100 24.4 21.1 35
100 29.3 24.1 8
100 34.2 22.2 7
110 24.4 18.4 18
110 29.3 23.4 8
110 34.2 21.9 10
At first glance, the fitted equation (4.40) rates as a statistical engineering success story. But there is the embarrassing fact that upon substituting x1 = 103.2 and x2 = 30.88 into equation (4.40), one gets ŷ2 = −11 sec, hardly a possible filtration time.
Looking again at the data, it is not hard to see what has gone wrong. The
largest response is more than 20 times the smallest. So in order to come close to
fitting both the extremely large and more moderate responses, the fitted quadratic
surface needs to be very steep—so steep that it is forced to dip below the (x1 , x2 )-
plane and produce negative ŷ 2 values before it can “get turned around” and start
to climb again as it moves away from the point of minimum ŷ 2 toward larger x1
and x2 .
One cure for the problem of negative predicted filtration times is to use ln(y2 )
as a response variable. Values of ln(y2 ) are given in Table 4.26 to illustrate the
moderating effect the logarithm has on the factor of 20 disparity between the
largest and smallest filtration times.
Fitting the approximate quadratic relationship

ln(y2) ≈ β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2

to the data of Table 4.26 then leads to sensible (positive) predicted filtration times.

Table 4.26
Filtration Times and Log Filtration Times

y2, Filtration Time (sec)   ln(y2), Log Filtration Time (ln(sec))
150    5.0106
10     2.3026
8      2.0794
35     3.5553
8      2.0794
7      1.9459
18     2.8904
8      2.0794
10     2.3026
The taking of logs in this example had two beneficial effects. The first was to cut the ratio of largest response to smallest down to about 2.5 (from over 20), allowing a good fit (as measured by R²) for a quadratic in the two variables x1 and x2. The second was to ensure that the minimum predicted filtration time was positive.
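A minimal sketch (ours, not the original analysis) of such a fit, applying ordinary least squares to the nine (x1, x2, ln(y2)) points of Tables 4.25 and 4.26:

import numpy as np

x1 = np.array([90, 90, 90, 100, 100, 100, 110, 110, 110], dtype=float)
x2 = np.array([24.4, 29.3, 34.2] * 3)
y2 = np.array([150, 10, 8, 35, 8, 7, 18, 8, 10], dtype=float)

# regressors: 1, x1, x2, x1^2, x2^2, x1*x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, np.log(y2), rcond=None)

# predictions back on the original scale are exp(fitted ln(y2)) and so are
# guaranteed positive, unlike those of the raw-scale quadratic (4.40)
print(np.exp(X @ beta).min())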
Of course, other transformations besides the logarithmic one are also useful in
describing the structure of multifactor data sets. Sometimes they are applied to the
responses and sometimes to other system variables. As an example of a situation
where a power transformation like that specified by equation (4.34) is useful in
understanding the structure of a sample of bivariate data, consider the following.
Table 4.27
Average Grain Diameters and Yield Strengths for Copper Deposits
[Scatterplots of yield strength versus average grain diameter (µm), and versus reciprocal square root grain diameter, for the copper deposits of Table 4.27.]
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. What are benefits that can sometimes be derived from transforming data before applying standard statistical techniques?

2. Suppose that a response variable, y, obeys an approximate power law in at least two quantitative variables (say, x1 and x2). Will there be important interactions? If the log of y is analyzed instead, will there be important interactions? (In order to make this concrete, you may if you wish consider the relationship y ≈ k x1² x2⁻³. Plot, for at least two different values of x2, y as a function of x1. Then plot, for at least two different values of x2, ln(y) as a function of x1. What do these plots show in the way of parallelism?)
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Section 5 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Read again Section 1.4 and the present one. Then describe in your own words the difference between deterministic and stochastic/probabilistic models. Give an example of a deterministic model that is useful in your field.
Chapter 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Nicholson and Bartle studied the effect of the water/cement ratio on 14-day compressive strength for Portland cement concrete. The water/cement ratios (by volume) and compressive strengths of nine concrete specimens are given next.

Water/Cement Ratio, x   14-Day Compressive Strength, y (psi)
.45    2954, 2913, 2923
.50    2743, 2779, 2739
.55    2652, 2607, 2583

(a) Fit a line to the data here via least squares, showing the hand calculations.
(b) Compute the sample correlation between x and y by hand. Interpret this value.
(c) What fraction of the raw variability in y is accounted for in the fitting of a line to the data?
(d) Compute the residuals from your fitted line and make a normal plot of them. Interpret this plot.
(e) What compressive strength would you predict, based on your calculations from (a), for specimens made using a .48 water/cement ratio?
(f) Use a statistical package to find the least squares line, the sample correlation, R², and the residuals for this data set.

2. Griffith and Tesdall studied the elapsed time in 1/4-mile runs of a Camaro Z-28 fitted with different sizes of carburetor jetting. Their data from six runs of the car follow:

Jetting Size, x   Elapsed Time, y (sec)
66    14.90
68    14.67
70    14.50
72    14.53
74    14.79
76    15.02

(a) What is an obvious weakness in the students' data collection plan?
(b) Fit both a line and a quadratic equation (y ≈ β0 + β1x + β2x²) to these data via least squares. Plot both of these equations on a scatterplot of the data.
(c) What fractions of the raw variation in elapsed time are accounted for by the two different fitted equations?
(d) Use your fitted quadratic equation to predict an optimal jetting size (allowing fractional sizes).

3. The following are some data taken from "Kinetics of Grain Growth in Powder-formed IN-792: A Nickel-Base Super-alloy" by Huda and Ralph (Materials Characterization, September 1990). Three different Temperatures, x1 (°K), and three different Times, x2 (min), were used in the heat treating of specimens of a material, and the response

y = mean grain diameter (µm)

was measured.

Temperature, x1   Time, x2   Grain Size, y
1443    20     5
1443    120    6
1443    1320   9
1493    20     14
1493    120    17
1493    1320   25
1543    20     29
1543    120    38
1543    1320   60

(a) What type of data structure did the researchers employ? (Use the terminology of Section 1.2.) What was an obvious weakness in their data collection plan?
(b) Use a regression program to fit the following equations to these data:

y ≈ β0 + β1x1 + β2x2
y ≈ β0 + β1x1 + β2 ln(x2)
y ≈ β0 + β1x1 + β2 ln(x2) + β3x1 ln(x2)

What are the R² values for the three different fitted equations? Compare the three fitted equations in terms of complexity and apparent ability to predict y.
(c) Compute the residuals for the third fitted equation in (b). Plot them against x1, x2, and ŷ. Also normal-plot them. Do any of these plots suggest that the third fitted equation is inadequate as a summary of these data? What, if any, possible improvement over the third equation is suggested by these plots?
(d) As a means of understanding the nature of the third fitted equation in (b), make a scatterplot of y vs. x2 using a logarithmic scale for x2. On this plot, plot three lines representing ŷ as a function of x2 for the three different values of x1. Qualitatively, how would a similar plot for the second equation differ from this one?
(e) Using the third equation in (b), what mean grain diameter would you predict for x1 = 1500 and x2 = 500?
(f) It is possible to ignore the fact that the Temperature and Time factors are quantitative and make a factorial analysis of these data. Do so. Begin by making an interaction plot similar to Figure 4.22 for these data. Based on that plot, discuss the apparent relative sizes of the Time and Temperature main effects and the Time × Temperature interactions. Then compute the fitted factorial effects (the fitted main effects and interactions).

4. The article "Cyanoacetamide Accelerators for the Epoxide/Isocyanate Reaction" by Eldin and Renner (Journal of Applied Polymer Science, 1990) reports the results of a 2³ factorial experiment. Using cyanoacetamides as catalysts for an epoxy/isocyanate reaction, various mechanical properties of a resulting polymer were studied. One of these was

y = impact strength (kJ/mm²)

The three experimental factors employed and their corresponding experimental levels were as follows:

Factor A   Initial Epoxy/Isocyanate Ratio: 0.4 (−) vs. 1.2 (+)
Factor B   Flexibilizer Concentration: 10 mol % (−) vs. 40 mol % (+)
Factor C   Accelerator Concentration: 1/240 mol % (−) vs. 1/30 mol % (+)

(The flexibilizer and accelerator concentrations are relative to the amount of epoxy present initially.) The impact strength data obtained (one observation per combination of levels of the three factors) were as follows:

Combination   y      Combination   y
(1)           6.7    c             6.3
a             11.9   ac            15.1
b             8.5    bc            6.7
ab            16.5   abc           16.4

(a) What is an obvious weakness in the researchers' data collection plan?
(b) Use the Yates algorithm and compute fitted factorial effects corresponding to the "all high" treatment combination (i.e., compute ȳ..., a2, b2, etc.). Interpret these in the context of the original study. (Describe in words which factors and/or combinations of factors appear to have the largest effect(s) on impact strength and interpret the sign or signs.)
(c) Suppose only factor A is judged to be of importance in determining impact strength. What predicted/fitted impact strengths correspond to this judgment? (Find ŷ values using the reverse Yates algorithm or otherwise.) Use these eight values of ŷ and compute R² for the "A main effects only" description of impact strength. (The formula in Definition 3 works in this context as well as in regression.)
(d) Now recognize that the experimental factors here are quantitative, so methods of curve and surface fitting may be applicable. Fit the equation y ≈ β0 + β1(epoxy/isocyanate ratio) to the data. What eight values of ŷ and value of R² accompany this fit?

5. Timp and M-Sidek studied the strength of mechanical pencil lead. They taped pieces of lead to a desk, with various lengths protruding over the edge of the desk. After fitting a small piece of tape on the free end of a lead piece to act as a stop, they loaded it with paper clips until failure. In one part of their study, they tested leads of two different Diameters, used two different Lengths protruding over the edge of the desk, and tested two different lead Hardnesses. That is, they ran a 2³ factorial study. Their factors and levels were as follows:

Factor A   Diameter: .3 mm (−) vs. .7 mm (+)
Factor B   Length Protruding: 3 cm (−) vs. 4.5 cm (+)
Factor C   Hardness: B (−) vs. 2H (+)

and m = 2 trials were made at each of the 2³ = 8 different sets of conditions. The data the students obtained are given here.

Combination   Number of Clips
(1)    13, 13
a      74, 76
b      9, 10
ab     43, 42
c      16, 15
ac     89, 88
bc     10, 12
abc    54, 55

(a) It appears that analysis of these data in terms of the natural logarithms of the numbers of clips first causing failure is more straightforward than the analysis of the raw numbers of clips. So take natural logs and compute the fitted 2³ factorial effects. Interpret these. In particular, what (in quantitative terms) does the size of the fitted A main effect say about lead strength? Does lead hardness appear to play a dominant role in determining this kind of breaking strength?
(b) Suppose only the main effects of Diameter are judged to be of importance in determining lead strength. Find a predicted log breaking strength for .7 mm, 2H lead when the length protruding is 4.5 cm. Use this to predict the number of clips required to break such a piece of lead.
(c) What, if any, engineering reasons do you have for expecting the analysis of breaking strength to be more straightforward on the log scale than on the original scale?

6. Ceramic engineering researchers Leigh and Taylor, in their paper "Computer Generated Experimental Designs" (Ceramic Bulletin, 1990), studied the packing properties of crushed T-61 tabular alumina powder. The densities of batches of the material were measured under a total of eight different sets of conditions having a 2³ factorial structure. The following factors and levels were employed in the study:

Factor A   Mesh Size of Powder Particles: 6 mesh (−) vs. 60 mesh (+)
Factor B   Volume of Graduated Cylinder: 100 cc (−) vs. 500 cc (+)
Factor C   Vibration of Cylinder: no (−) vs. yes (+)

The mean densities (in g/cc) obtained in m = 5 determinations for each set of conditions were as follows:

ȳ(1) = 2.348    ȳa = 2.080
ȳb = 2.298      ȳab = 1.980
ȳc = 2.354      ȳac = 2.314
ȳbc = 2.404     ȳabc = 2.374

(a) Compute the fitted 2³ factorial effects (main effects, 2-factor interactions and 3-factor interactions) corresponding to the following set of conditions: 60 mesh, 500 cc, vibrated cylinder.
(b) If your arithmetic for part (a) is correct, you should have found that the largest of the fitted effects (in absolute value) are (respectively) the C main effect, the A main effect, and then the AC 2-factor interaction. (The next largest fitted effect is only about half of the smallest of these, the AC interaction.) Now, suppose you judge these three fitted effects to summarize the main features of the data set. Interpret this data summary (A and C main effects and AC interactions) in the context of this 3-factor study.
(c) Using your fitted effects from (a) and the data summary from (b) (A and C main effects and AC interactions), what fitted response would you have for these conditions: 60 mesh, 500 cc, vibrated cylinder?
(d) Using your fitted effects from (a), what average change in density would you say accompanies the vibration of the graduated cylinder before density determination?

7. The article "An Analysis of Transformations" by Box and Cox (Journal of the Royal Statistical Society, Series B, 1964) contains a classical unreplicated 3³ factorial data set originally taken from an unpublished technical report of Barella and Sust. These researchers studied the behavior of worsted yarns under repeated loading. The response variable was

y = the number of cycles till failure

for specimens tested with various values of

x1 = length (mm)
x2 = amplitude of the loading cycle (mm)
x3 = load (g)

The researchers' data are given in the accompanying table.

x1    x2   x3   y        x1    x2   x3   y
250   8    40   674      300   9    50   438
250   8    45   370      300   10   40   442
250   8    50   292      300   10   45   332
250   9    40   338      300   10   50   220
250   9    45   266      350   8    40   3,636
250   9    50   210      350   8    45   3,184
250   10   40   170      350   8    50   2,000
250   10   45   118      350   9    40   1,568
250   10   50   90       350   9    45   1,070
300   8    40   1,414    350   9    50   566
300   8    45   1,198    350   10   40   1,140
300   8    50   634      350   10   45   884
300   9    40   1,022    350   10   50   360
300   9    45   620

(a) To find an equation to represent these data, you might first try to fit multivariable polynomials. Use a regression program and fit a full quadratic equation to these data. That is, fit

y ≈ β0 + β1x1 + β2x2 + β3x3 + β4x1² + β5x2² + β6x3² + β7x1x2 + β8x1x3 + β9x2x3

to the data. What fraction of the observed variation in y does it account for? In terms of parsimony (or providing a simple data summary), how does this quadratic equation do as a data summary?
(b) Notice the huge range of values of the response variable. In cases like this, where the response varies over an order of magnitude, taking logarithms of the response often helps produce a simple fitted equation. Here, take (natural) logarithms of all of x1, x2, x3, and y, producing (say) x1′, x2′, x3′, and y′, and fit the equation

y′ ≈ β0 + β1x1′ + β2x2′ + β3x3′

to the data. What fraction of the observed variability in y′ = ln(y) does this equation account for? What change in y′ seems to accompany a unit (a 1 ln(g)) increase in x3′?
(c) To carry the analysis one step further, note that your fitted coefficients for x1′ and x2′ are nearly the negatives of each other. That suggests that y′ depends only on the difference between x1′ and x2′. To see how this works, fit the equation

y′ ≈ β0 + β1(x1′ − x2′) + β2x3′

to the data. Compute and plot residuals from this relationship (still on the log scale). How does this relationship appear to do as a data summary? What power law for y (on the original scale) in terms of x1, x2, and x3 (on their original scales) is implied by this last fitted equation? How does this equation compare to the one from (a) in terms of parsimony?
(d) Use your equation from (c) to predict the life of an additional specimen of length 300 mm, at an amplitude of 9 mm, under a load of 45 g. Do the same for an additional specimen of length 325 mm, at an amplitude of 9.5 mm, under a load of 47 g. Why would or wouldn't you be willing to make a similar projection for an additional specimen of length 375 mm, at an amplitude of 10.5 mm, under a load of 51 g?

8. Bauer, Dirks, Palkovic, and Wittmer fired tennis balls out of a "Polish cannon" inclined at an angle of 45°, using three different Propellants and two different Charge Sizes of propellant. They observed the distances traveled in the air by the tennis balls. Their data are given in the accompanying table. (Five trials were made for each Propellant/Charge Size combination and the values given are in feet.)
12. …set from the article follows. (The data in Section 4.1 are the x2 = .01725 data only.)

Cutting Speed, x1 (sfpm)   Feed, x2 (ipr)   Tool Life, y (min)
800    .01725   1.00, 0.90, 0.74, 0.66
700    .01725   1.00, 1.20, 1.50, 1.60
700    .01570   1.75, 1.85, 2.00, 2.20
600    .02200   1.20, 1.50, 1.60, 1.60
600    .01725   2.35, 2.65, 3.00, 3.60
500    .01725   6.40, 7.80, 9.80, 16.50
500    .01570   8.80, 11.00, 11.75, 19.00
450    .02200   4.00, 4.70, 5.30, 6.00
400    .01725   21.50, 24.50, 26.00, 33.00

(a) Taylor's expanded tool life equation is yx1^α1 x2^α2 = C. This relationship suggests that ln(y) may well be approximately linear in both ln(x1) and ln(x2). Use a multiple linear regression program to fit the relationship

ln(y) ≈ β0 + β1 ln(x1) + β2 ln(x2)

to these data. What fraction of the raw variability in ln(y) is accounted for in the fitting process? What estimates of the parameters α1, α2, and C follow from your fitted equation?
(b) Compute and plot residuals (continuing to work on log scales) for the equation you fit in part (a). Make at least plots of residuals versus fitted ln(y) and both ln(x1) and ln(x2), and make a normal plot of these residuals. Do these plots reveal any particular problems with the fitted equation?
(c) Use your fitted equation to predict first a log tool life and then a tool life, if in this machining application a cutting speed of 550 and a feed of .01650 is used.
(d) Plot the ordered pairs appearing in the data set in the (x1, x2)-plane. Outline a region in the plane where you would feel reasonably safe using the equation you fit in part (a) to predict tool life.

13. K. Casali conducted a gas mileage study on his well-used four-year-old economy car. He drove a 107-mile course a total of eight different times (in comparable weather conditions) at four different speeds, using two different types of gasoline, and ended up with an unreplicated 4 × 2 factorial study. His data are given in the table below.

Test   Speed (mph)   Gasoline Octane   Gallons Used   Mileage (mpg)
1      65            87                3.2            33.4
2      60            87                3.1            34.5
3      70            87                3.4            31.5
4      55            87                3.0            35.7
5      65            90                3.2            33.4
6      55            90                2.9            36.9
7      70            90                3.3            32.4
8      60            90                3.0            35.7

(a) Make a plot of the mileages that is useful for judging the size of Speed × Octane interactions. Does it look as if the interactions are large in comparison to the main effects?
(b) Compute the fitted main effects and interactions for the mileages, using the formulas of Section 4.3. Make a plot like Figure 4.23 for comparing the observed mileages to fitted mileages computed supposing that there are no Speed × Octane interactions.
(c) Now fit the equation

Mileage ≈ β0 + β1(Speed) + β2(Octane)

to the data and plot lines representing the predicted mileages versus Speed for both the 87 octane and the 90 octane gasolines on the same set of axes.
(d) Now fit the equation Mileage ≈ β0 + β1(Speed) separately, first to the 87 octane data and then to the 90 octane data. Plot the two different lines on the same set of axes.
(e) Discuss the different appearances of the plots you made in parts (a) through (d) of this exercise in terms of how well they fit the original data and the different natures of the assumptions involved in producing them.
(f) What was the fundamental weakness in Casali's data collection scheme? A weakness of secondary importance has to do with the fact that tests 1–4 were made ten days earlier than tests 5–8. Why is this a potential problem?

14. The article "Accelerated Testing of Solid Film Lubricants" by Hopkins and Lavik (Lubrication Engineering, 1972) contains a nice example of the engineering use of multiple regression. In the study, m = 3 sets of journal bearing tests were made on a Mil-L-8937 type film at each combination of three different Loads and three different Speeds. The wear lives of journal bearings, y, in hours, are given next for the tests run by the authors.

Speed, x1 (rpm)   Load, x2 (psi)   Wear Life, y (hr)
20     3,000     300.2, 310.8, 333.0
20     6,000     99.6, 136.2, 142.4
20     10,000    20.2, 28.2, 102.7
60     3,000     67.3, 77.9, 93.9
60     6,000     43.0, 44.5, 65.9
60     10,000    10.7, 34.1, 39.1
100    3,000     26.5, 22.3, 34.8
100    6,000     32.8, 25.6, 32.7
100    10,000    2.3, 4.4, 5.8

(a) The authors expected to be able to describe wear life as roughly following the relationship yx1x2 = C, but they did not find this relationship to be a completely satisfactory model. So instead, they tried using the more general relationship yx1^α1 x2^α2 = C. Use a multiple linear regression program to fit the relationship

ln(y) ≈ β0 + β1 ln(x1) + β2 ln(x2)

to these data. What fraction of the raw variability in ln(y) is accounted for in the fitting process? What estimates of the parameters α1, α2, and C follow from your fitted equation? Using your estimates of α1, α2, and C, plot on the same set of (x1, y) axes the functional relationships between x1 and y implied by your fitted equation for x2 equal to 3,000, 6,000, and then 10,000 psi, respectively.
(b) Compute and plot residuals (continuing to work on log scales) for the equation you fit in part (a). Make at least plots of residuals versus fitted ln(y) and both ln(x1) and ln(x2), and make a normal plot of these residuals. Do these plots reveal any particular problems with the fitted equation?
(c) Use your fitted equation to predict first a log wear life and then a wear life, if in this application a speed of 20 rpm and a load of 10,000 psi are used.
(d) (Accelerated life testing) As a means of trying to make intelligent data-based predictions of wear life at low stress levels (and correspondingly large lifetimes that would be impractical to observe directly), you might (fully recognizing the inherent dangers of the practice) try to extrapolate using the fitted equation. Use your fitted equation to predict first a log wear life and then a wear life if a speed of 15 rpm and load of 1,500 psi are used in this application.

15. The article "Statistical Methods for Controlling the Brown Oxide Process in Multilayer Board Processing" by S. Imadi (Plating and Surface Finishing, 1988) discusses an experiment conducted to help a circuit board manufacturer measure the concentration of important components in a chemical bath. Various combinations of levels of

x1 = % by volume of component A (a proprietary formulation, the major component of which is sodium chlorite)

and

x2 = % by volume of component B (a proprietary formulation, the major component of which is sodium hydroxide)

were set in the chemical bath, and the variables

y1 = ml of 1N H2SO4 used in the first phase of a titration

and

y2 = ml of 1N H2SO4 used in the second phase of a titration

were measured. Part of the original data collected (corresponding to bath conditions free of Na2CO3) follow:

x1   x2   y1    y2
15   25   3.3   .4
20   25   3.4   .4
20   30   4.1   .4
25   30   4.3   .3
25   35   5.0   .5
30   35   5.0   .3
30   40   5.7   .5
35   40   5.8   .4

…
(c) …regression program. Is this equation the same one you found in part (b)?
(d) If you were to compare the equations for x2 derived in (b) and (c) in terms of the sum of squared differences between the predicted and observed values of x2, which is guaranteed to be the winner? Why?

16. The article "Nonbloated Burned Clay Aggregate Concrete" by Martin, Ledbetter, Ahmad, and Britton (Journal of Materials, 1972) contains data on both composition and resulting physical property test results for a number of different batches of concrete made using burned clay aggregates. The accompanying data are compressive strength measurements, y (made according to ASTM C 39 and recorded in psi), and splitting tensile strength measurements, x (made according to ASTM C 496 and recorded in psi), for ten of the batches used in the study.

Batch   1      2      3      4      5
y       1420   1950   2230   3070   3060
x       207    233    254    328    325

…
(f) …batch of concrete of this type if you were to measure a splitting tensile strength of 245 psi?
(g) Compute the residuals from your fitted line. Plot the residuals against x and against ŷ. Then make a normal plot of the residuals. What do these plots indicate about the linearity of the relationship between splitting tensile strength and compressive strength?
(h) Use a statistical package to find the least squares line, the sample correlation, R², and the residuals for these data.
(i) Fit the quadratic relationship y ≈ β0 + β1x + β2x² to the data, using a statistical package. Sketch this fitted parabola on your scatterplot from part (a). Does this fitted quadratic appear to be an important improvement over the line you fit in (c) in terms of describing the relationship of y to x?
(j) How do the R² values from parts (h) and (i) compare? Does the increase in R² in part (i) speak strongly for the use of the quadratic (as opposed to linear) description of the relationship of y to x for concretes of this type?
(k) If you use the fitted relationship from part (i) to predict y for x = 245, how does the prediction compare to your answer for part (f)?
(l) What do the fitted relationships from parts (c) and (i) give for predicted compressive strengths when x = 400 psi? Do these compare to each other as well as your answers to parts (f) and (k)? Why would it be unwise to use either of these predictions without further data collection and analysis?

17. In the previous exercise, both x and y were really response variables. As such, they were not subject to direct manipulation by the experimenters. That made it difficult to get several (x, y) pairs with a single x value into the data set. In experimental situations where an engineer gets to choose values of an experimental variable x, why is it useful/important to get several y observations for at least some x's?

18. Chemical engineering graduate student S. Osoka studied the effects of an agitator speed, x1, and a polymer concentration, x2, on percent recoveries of pyrite, y1, and kaolin, y2, from a step of an ore refining process. (High pyrite recovery and low kaolin recovery rates were desirable.) Data from one set of n = 9 experimental runs are given here.

x1 (rpm)   x2 (ppm)   y1 (%)   y2 (%)
1350       80         77       67
950        80         83       54
600        80         91       70
1350       100        80       52
950        100        87       57
600        100        87       66
1350       120        67       54
950        120        80       52
600        120        81       44

(a) What type of data structure did the researcher use? (Use the terminology of Section 1.2.) What was an obvious weakness in his data collection plan?
(b) Use a regression program to fit the following equations to these data:

y1 ≈ β0 + β1x1
y1 ≈ β0 + β2x2
y1 ≈ β0 + β1x1 + β2x2

What are the R² values for the three different fitted equations? Compare the three fitted equations in terms of complexity and apparent ability to predict y1.
(c) Compute the residuals for the third fitted equation in part (b). Plot them against x1, x2, and ŷ1. Also normal-plot them. Do any of these plots suggest that the third fitted equation is inadequate as a summary of these data?
(d) As a means of understanding the nature of the third fitted equation from part (b), make a scatterplot of y1 vs. x2. On this plot, plot three lines representing ŷ1 as a function of x2 for the three different values of x1 represented in the data set.
(e) Using the third equation from part (b), what Factor A Plane Design
pyrite recovery rate would you predict for straight wing (−) vs. t wing (+)
x1 = 1000 rpm and x2 = 110 ppm?
Factor B Nose Weight
(f) Consider also a multivariable quadratic de-
none (−) vs. paper clip (+)
scription of the dependence of y1 on x1 and
x2 . That is, fit the equation Factor C Paper Type
notebook (−) vs. construction (+)
y1 ≈ β0 + β1 x1 + β2 x2 + β3 x12 Factor D Wing Tips
straight (−) vs. bent up (+)
+β4 x22 + β5 x1 x2
The mean flight distances, y (ft), recorded by Fel-
to the data. How does the R 2 value here com- lows for two launches of each plane were as shown
pare with the ones in part (b)? As a means of in the accompanying table.
understanding this fitted equation, plot on a (a) Use the Yates algorithm and compute the fit-
single set of axes the three different quadratic ted factorial effects corresponding to the “all
functions of x 2 obtained by holding x1 at one high” treatment combination.
of the values in the data set. (b) Interpret the results of your calculations from
(g) It is possible to ignore the fact that the speed (a) in the context of the study. (Describe in
and concentration factors are quantitative and words which factors and/or combinations of
to make a factorial analysis of these y1 data. factors appear to have the largest effect(s) on
Do so. Begin by making an interaction plot flight distance. What are the practical impli-
similar to Figure 4.22 for these data. Based cations of these effects?)
on that plot, discuss the apparent relative sizes
of the Speed and Concentration main effects Combination y Combination y
and the Speed × Concentration interactions.
Then compute the fitted factorial effects (the (1) 6.25 d 7.00
fitted main effects and interactions). a 15.50 ad 10.00
(h) If the third equation in part (b) governed y1 , b 7.00 bd 10.00
would it lead to Speed × Concentration inter- ab 16.50 abd 16.00
actions? What about the equation in part (f)? c 4.75 cd 4.50
Explain. ac 5.50 acd 6.00
19. The data given in the previous exercise concern bc 4.50 bcd 4.50
both responses y1 and y2 . The previous analysis abc 6.00 abcd 5.75
dealt with only y1 . Redo all parts of the problem,
replacing the response y1 with y2 throughout. (c) Suppose factors B and D are judged to be
20. K. Fellows conducted a 4-factor experiment, with inert as far as determining flight distance is
the response variable the flight distance of a pa- concerned. (The main effects of B and D and
per airplane when propelled from a launcher fab- all interactions involving them are negligi-
ricated specially for the study. This exercise con- ble.) What fitted/predicted values correspond
cerns part of the data he collected, constituting to this description of flight distance (A and
a complete 24 factorial. The experimental factors C main effects and AC interactions only)?
involved and levels used were as given here. Use these 16 values of ŷ to compute residu-
als, y − ŷ. Plot these against ŷ, levels of A,
levels of B, levels of C, and levels of D. Also
214 Chapter 4 Describing Relationships Between Variables
normal-plot these residuals. Comment on any (b) What is the correlation between x1 and y?
interpretable patterns in your plots. The correlation between x2 and y?
(d) Compute R 2 corresponding to the descrip- (c) Based on (a) and (b), describe how strongly
tion of flight distance used in part (c). (The Thickness and Hardness appear to affect bal-
formula in Definition 3 works in this context listic limit. Review the raw data and specu-
as well as in regression. So does the represen- late as to why the variable with the smaller
tation of R 2 as the squared sample correlation influence on y seems to be of only minor im-
between y and ŷ.) Does it seem that the grand portance in this data set (although logic says
mean, A and C main effects, and AC 2-factor that it must in general have a sizable influence
interactions provide an effective summary of on y).
flight distance? (d) Compute the residuals for the third fitted
21. The data in the accompanying table appear in the equation from (a). Plot them against x 1 , x2 ,
text Quality Control and Industrial Statistics by and ŷ. Also normal-plot them. Do any of
Duncan (and were from a paper of L. E. Simon). these plots suggest that the third fitted equa-
The data were collected in a study of the effec- tion is seriously deficient as a summary of
tiveness of armor plate. Armor-piercing bullets these data?
were fired at an angle of 40◦ against armor plate (e) Plot the (x 1 , x2 ) pairs represented in the data
of thickness x1 (in .001 in.) and Brinell hardness set. Why would it be unwise to use any of the
number x2 , and the resulting so-called ballistic fitted equations to predict y for x 1 = 265 and
limit, y (in ft/sec), was measured. x2 = 440?
22. Basgall, Dahl, and Warren experimented with
x1 x2 y x1 x2 y smooth and treaded bicycle tires of different
widths. Tires were mounted on the same wheel,
253 317 927 253 407 1393 placed on a bicycle wind trainer, and accelerated
258 321 978 252 426 1401 to a velocity of 25 miles per hour. Then pedaling
259 341 1028 246 432 1436 was stopped, and the time required for the wheel
247 350 906 250 469 1327 to stop rolling was recorded. The sample means,
256 352 1159 242 257 950 y, of five trials for each of six different tires were
246 363 1055 243 302 998 as follows:
257 365 1335 239 331 1144
262 375 1392 242 355 1080 Tire Width Tread Time to Stop, y (sec)
255 373 1362 244 385 1276 700/19c smooth 7.30
258 391 1374 234 426 1062 700/25c smooth 8.44
700/32c smooth 9.27
(a) Use a regression program to fit the following 700/19c treaded 6.63
equations to these data:
700/25c treaded 6.87
700/32c treaded 7.07
y ≈ β 0 + β1 x 1
y ≈ β 0 + β2 x 2 (a) Carefully make an interaction plot of times
required to stop, useful for investigating the
y ≈ β 0 + β1 x 1 + β2 x 2
sizes of Width and Tread main effects and
Width × Tread interactions here. Comment
What are the R 2 values for the three differ- briefly on what the plot shows about these
ent fitted equations? Compare the three fitted effects. Be sure to label the plot very clearly.
equations in terms of complexity and appar-
ent ability to predict y.
Chapter 4 Exercises 215
(b) Compute the fitted main effects of Width, 5.00 km/sec detonation velocity, what PETN
the fitted main effects of Tread, and the fit- density would you employ?
ted Width × Tread interactions from the y’s. (g) Compute the residuals from your fitted line.
Discuss how they quantify features that are Plot them against x and against ŷ. Then make
evident in your plot from (a). a normal plot of the residuals. What do these
23. Below are some data read from a graph in the ar- indicate about the linearity of the relationship
ticle “Chemical Explosives” by W. B. Sudweeks between y and x?
that appears as Chapter 30 in Riegel’s Handbook (h) Use a statistical package and compute the
of Industrial Chemistry. The x values are densities least squares line, the sample correlation, R 2 ,
(in g/cc) of pentaerythritol tetranitrate (PETN) and the residuals from the least squares line
samples and the y values are corresponding deto- for these data.
nation velocities (in km/sec). 24. Some data collected in a study intended to reduce
a thread stripping problem in an assembly process
x y x y x y follow. Studs screwed into a metal block were
stripping out of the block when a nut holding
.19 2.65 .50 3.95 .91 5.29 another part on the block was tightened. It was
.20 2.71 .50 3.87 .91 5.11 thought that the depth the stud was screwed into
.24 2.79 .50 3.57 .95 5.33 the block (the thread engagement) might affect
.24 3.19 .55 3.84 .95 5.27 the torque at which the stud stripped out. In the
.25 2.83 .75 4.70 .97 5.30 table below, x is the depth (in 10−3 inches above
.30 3.52 .77 4.19 1.00 5.52 .400) and y is the torque at failure (in lbs/in.).
.30 3.41 .80 4.75 1.00 5.46
.32 3.51 .80 4.38 1.00 5.30 x y x y x y x y
.43 3.38 .85 4.83 1.03 5.59
80 15 40 70 75 70 20 70
.45 3.13 .85 5.32 1.04 5.71
76 15 36 65 25 70 40 65
88 25 30 65 30 60 30 75
(a) Make a scatterplot of these data and comment
on the apparent linearity (or the lack thereof) 35 60 0 45 78 25 74 25
of the relationship between y and x. 75 35 44 50 60 45
(b) Compute the sample correlation between y
and x. Interpret this value. (a) Use a regression program and fit both a linear
(c) Show the “hand” calculations necessary to fit equation and a quadratic equation to these
a line to these data by least squares. Then plot data. Plot them on a scatterplot of the data.
your line on the graph from (a). What are the fractions of raw variability in y
(d) About what increase in detonation velocity accounted for by these two equations?
appears to accompany a unit (1 g/cc) increase (b) Redo part (a) after dropping the x = 0 and
in PETN density? What increase in detona- y = 45 data point from consideration. Do
tion velocity would then accompany a .1 g/cc your conclusions about how best to describe
increase in PETN density? the relationship between x and y change ap-
(e) What fraction of the raw variability in detona- preciably? What does this say about the ex-
tion velocity is “accounted for” by the fitted tent to which a single data point can affect a
line from part (c)? curve-fitting analysis?
(f) Based on your analysis, about what detona- (c) Use your quadratic equation from part (a) and
tion velocity would you predict for a PETN find a thread engagement that provides an op-
density of 0.65 g/cc? If it was your job to timal predicted failure torque. What would
produce a PETN explosive charge with a
216 Chapter 4 Describing Relationships Between Variables
you probably want to do before recommending this depth for use in this assembly process?
25. The textbook Introduction to Contemporary Statistical Methods by L. H. Koopmans contains a data set from the testing of automobile tires. A tire under study is mounted on a test trailer and pulled at a standard velocity. Using a braking mechanism, a standard amount of drag (measured in %) is applied to the tire and the force (in pounds) with which it grips the road is measured. The following data are from tests on 19 different tires of the same design made under the same set of road conditions. x = 0% indicates no braking and x = 100% indicates the brake is locked.

    Drag, x (%)   Grip Force, y (lb)
    10            550, 460, 610
    20            510, 410, 580
    30            470, 360, 480
    50            390, 310, 400
    70            300, 280, 340
    100           250, 200, 200, 200

(a) Make a scatterplot of these data and comment on "how linear" the relation between y and x appears to be.
In fact, physical theory can be called upon to predict that instead of being linear, the relationship between y and x is of the form y ≈ α exp(βx) for suitable α and β. Note that if natural logarithms are taken of both sides of this expression, ln(y) ≈ ln(α) + βx. Calling ln(α) by the name β₀ and β by the name β₁, one then has a linear relationship of the form used in Section 4.1.
(b) Make a scatterplot of y′ = ln(y) versus x. Does this plot look more linear than the one in (a)?
(c) Compute the sample correlation between y′ and x "by hand." Interpret this value.
(d) Fit a line to the drags and logged grip forces using the least squares principle. Show the necessary hand calculations. Sketch this line on your scatterplot from (b).
(e) About what increase in log grip force appears to accompany an increase in drag of 10% of the total possible? This corresponds to what kind of change in raw grip force?
(f) What fraction of the raw variability in log grip force is accounted for in the fitting of a line to the data in part (d)?
(g) Based on your answer to (d), what log grip force would you predict for a tire of this type under these conditions using 40% of the possible drag? What raw grip force?
(h) Compute the residuals from your fitted line. Plot the residuals against x and against ŷ. Then make a normal plot of the residuals. What do these plots indicate about the linearity of the relationship between drag and log grip force?
(i) Use a statistical package to find the least squares line, the sample correlation, R², and the residuals for these (x, y′) data.
26. The article "Laboratory Testing of Asphalt Concrete for Porous Pavements" by Woelfl, Wei, Faulstich, and Litwack (Journal of Testing and Evaluation, 1981) studied the effect of asphalt content on the permeability of open-graded asphalt concrete. Four specimens were tested for each of six different asphalt contents, with the following results:

    Asphalt Content,    Permeability,
    x (% by weight)     y (in./hr water loss)
    3                   1189, 840, 1020, 980
    4                   1440, 1227, 1022, 1293
    5                   1227, 1180, 980, 1210
    6                   707, 927, 1067, 822
    7                   835, 900, 733, 585
    8                   395, 270, 310, 208

(a) Make a scatterplot of these data and comment on how linear the relation between y and x appears to be. If you focus on asphalt contents between, say, 5% and 7%, does linearity seem to be an adequate description of the relationship between y and x?
Temporarily restrict your attention to the x = 5, 6, and 7 data.
(b) Compute the sample correlation between y and x "by hand." Interpret this value.
(c) Fit a line to the asphalt contents and permeabilities using the least squares principle. Show the necessary hand calculations. Sketch this fitted line on your scatterplot from (a).
(d) About what increase in permeability appears to accompany a 1% (by weight) increase in asphalt content?
(e) What fraction of the raw variability in permeability is "accounted for" in the fitting of a line to the x = 5, 6, and 7 data in part (c)?
(f) Based on your answer to (c), what measured permeability would you predict for a specimen of this material with an asphalt content of 5.5%?
(g) Compute the residuals from your fitted line. Plot the residuals against x and against ŷ. Then make a normal plot of the residuals. What do these plots indicate about the linearity of the relationship between asphalt content and permeability?
(h) Use a statistical package and values for x = 5, 6, and 7 to find the least squares line, the sample correlation, R², and the residuals for these data.
Now consider again the entire data set.
(i) Fit the quadratic relationship y ≈ β₀ + β₁x + β₂x² to the data using a statistical package. Sketch this fitted parabola on your second scatterplot from part (a). Does this fitted quadratic appear to be an important improvement over the line you fit in (c) in terms of describing the relationship over the range 3 ≤ x ≤ 8?
(j) Fit the linear relation y ≈ β₀ + β₁x to the entire data set. How do the R² values for this fit and the one in (i) compare? Does the larger R² in (i) speak strongly for the use of a quadratic (as opposed to a linear) description of the relationship of y to x in this situation?
(k) If one uses the fitted relationship from (i) to predict y for x = 5.5, how does the prediction compare to your answer for (f)?
(l) What do the fitted relationships from (c), (i) and (j) give for predicted permeabilities when x = 2%? Compare these to each other as well as your answers to (f) and (k). Why would it be unwise to use any of these predictions without further data collection?
27. Some data collected by Koh, Morden, and Ogbourne in a study of axial breaking strengths (y) for wooden dowel rods follow. The students tested m = 4 different dowels for each of nine combinations of three different diameters (x₁) and three different lengths (x₂).

    x₁ (in.)   x₂ (in.)   y (lb)
    .125        4         51.5, 37.4, 59.3, 58.5
    .125        8         5.2, 6.4, 9.0, 6.3
    .125       12         2.5, 3.3, 2.6, 1.9
    .1875       4         225.3, 233.9, 211.2, 212.8
    .1875       8         47.0, 79.2, 88.7, 70.2
    .1875      12         18.4, 22.4, 18.9, 16.6
    .250        4         358.8, 309.6, 343.5, 357.8
    .250        8         127.1, 158.0, 194.0, 133.0
    .250       12         68.9, 40.5, 50.3, 65.6

(a) Make a plot of the 3 × 3 means, ȳ, corresponding to the different combinations of diameter and length used in the study, plotting ȳ vs. x₂ and connecting the three means for a given diameter with line segments. What does this plot suggest about how successful an equation for y that is linear in x₂ for each fixed x₁ might be in explaining these data?
(b) Replace the strength values with their natural logarithms, y′ = ln(y), and redo the plotting of part (a). Does this second plot suggest that the logarithm of strength might be a linear function of length for fixed diameter?
(c) Fit the following three equations to the data via least squares:

    y′ ≈ β₀ + β₁x₁
    y′ ≈ β₀ + β₂x₂
    y′ ≈ β₀ + β₁x₁ + β₂x₂
What are the coefficients of determination for the three fitted equations? Compare the equations in terms of their complexity and their apparent ability to predict y′.
(d) Add three lines to your plot from part (b), showing predicted log strength (from your third fitted equation) as a function of x₂ for the three different values of x₁ included in the study. Use your third fitted equation to predict first a log strength and then a strength for a dowel of diameter .20 in. and length 10 in. Why shouldn't you be willing to use your equation to predict the strength of a rod with diameter .50 in. and length 24 in.?
(e) Compute and plot residuals for the third equation you fit in part (c). Make plots of residuals vs. fitted response and both x₁ and x₂, and normal-plot the residuals. Do these plots suggest any potential inadequacies of the third fitted equation? How might these be remedied?
(f) The students who did this study were strongly suspicious that the ratio x₃ = x₁²/x₂ is the principal determiner of dowel strength. In fact, it is possible to empirically discover the importance of this quantity as follows. Try fitting the equation

    y′ ≈ β₀ + β₁ ln x₁ + β₂ ln x₂

to these data and notice that the fitted coefficients of ln x₁ and ln x₂ are roughly in the ratio of 4 to −2, i.e., 2 to −1. (What does this fitted equation for ln(y) say about y?) Then plot y vs. x₃ and fit the linear equation y ≈ β₀ + β₁x₃.
(g) Do a factorial analysis of these breaking strength data. Looking again at your plot from (a), does it seem that the interactions of Diameter and Length will be important in describing the raw strengths, y? Compute the fitted factorial effects and comment on the relative sizes of the main effects and interactions.
(h) Redo part (g), referring to the graph from part (b) and working with the logarithms of dowel strength.
28. The paper "Design of a Metal-Cutting Drilling Experiment—A Discrete Two-Variable Problem" by E. Mielnik (Quality Engineering, 1993–1994) reports a drilling study run on an aluminum alloy (7075-T6). The thrust (or axial force), y₁, and torque, y₂, required to rotate drills of various diameters x₁ at various feeds (rates of drill penetration into the workpiece) x₂, were measured with the following results:

    Diameter,   Feed Rate,     Thrust,    Torque,
    x₁ (in.)    x₂ (in./rev)   y₁ (lb)    y₂ (ft-lb)
    .250        .006           230        1.0
    .406        .006           375        2.1
    .406        .013           570        3.8
    .250        .013           375        2.1
    .225        .009           280        1.0
    .318        .005           225        1.1
    .450        .009           580        3.8
    .318        .017           565        3.4
    .318        .009           400        2.2
    .318        .009           400        2.1
    .318        .009           380        2.1
    .318        .009           380        1.9

(Below, primes denote natural logarithms of the variables: x₁′ = ln x₁, x₂′ = ln x₂, and y₁′ = ln y₁.)
(a) Use a regression program to fit the following equations to these data:

    y₁′ ≈ β₀ + β₁x₁′
    y₁′ ≈ β₀ + β₂x₂′
    y₁′ ≈ β₀ + β₁x₁′ + β₂x₂′

What are the R² values for the three different fitted equations? Compare the three fitted equations in terms of complexity and apparent ability to predict y₁′.
(b) Compute and plot residuals (continuing to work on log scales) for the third equation you fit in part (a). Make plots of residuals vs. fitted y₁′ and both x₁′ and x₂′, and normal-plot these residuals. Do these plots reveal any particular problems with the fitted equation?
(c) Use your third equation from (a) to predict first a log thrust and then a thrust if a drill of diameter .360 in. and a feed of .011 in./rev are used. Why would it be unwise to make a similar prediction for x₁ = .450 and x₂ = .017? (Hint: Make a plot of the (x₁, x₂) pairs in the data set and locate this second set of conditions on that plot.)
(d) If the third equation fit in part (a) governed y₁, would it lead to Diameter × Feed interactions for y₁ measured on the log scale? To help you answer this question, plot ŷ₁′ vs. x₂ (or x₂′) for each of x₁ = .250, .318, and .406. Does this equation lead to Diameter × Feed interactions for raw y₁?
(e) The first four data points listed in the table constitute a very small complete factorial study (an unreplicated 2 × 2 factorial in the factors Diameter and Feed). Considering only these data points, do a "factorial" analysis of this part of the y₁ data. Begin by making an interaction plot similar to Figure 4.22 for these data. Based on that plot, discuss the apparent relative sizes of the Diameter and Feed main effects on thrust. Then carry out the arithmetic necessary to compute the fitted factorial effects (the main effects and interactions).
(f) Redo part (e), using y₁′ as the response variable.
(g) Do your answers to parts (e) and (f) complement those of part (d)? Explain.
29. The article "A Simple Method to Study Dispersion Effects From Non-Necessarily Replicated Data in Industrial Contexts" by Ferrer and Romero (Quality Engineering, 1995) describes an unreplicated 2⁴ experiment done to improve the adhesive force obtained when gluing on polyurethane sheets as the inner lining of some hollow metal parts. The factors studied were the amount of glue used (A), the predrying temperature (B), the tunnel temperature (C), and the pressure applied (D). The exact levels of the variables employed were not given in the article (presumably for reasons of corporate security). The response variable was the adhesive force, y, in Newtons, and the data reported in the article follow:

    Combination   y      Combination   y
    (1)           3.80   d             3.29
    a             4.34   ad            2.82
    b             3.54   bd            4.59
    ab            4.59   abd           4.68
    c             3.95   cd            2.73
    ac            4.83   acd           4.31
    bc            4.86   bcd           5.16
    abc           5.28   abcd          6.06

(a) Compute the fitted factorial effects corresponding to the "all high" treatment combination.
(b) Interpret the results of your calculations in the context of the study. Which factors and/or combinations of factors appear to have the largest effects on the adhesive force? Suppose that only the A, B, and C main effects and the B × D interactions were judged to be of importance here. Make a corresponding statement to your engineering manager about how the factors impact adhesive force.
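
Many parts of these exercises direct the reader to "use a statistical package." As one possibility, here is a minimal Python sketch of such computations, using the PETN data of Exercise 23 for illustration (only the first few pairs are typed in, and the arrays should be extended with the rest):

    import numpy as np

    # First few (x, y) pairs from Exercise 23; extend with the remaining data.
    x = np.array([.19, .20, .24, .24, .25, .30])
    y = np.array([2.65, 2.71, 2.79, 3.19, 2.83, 3.52])

    b1, b0 = np.polyfit(x, y, 1)        # least squares slope and intercept
    y_hat = b0 + b1 * x                 # fitted values
    e = y - y_hat                       # residuals
    r = np.corrcoef(x, y)[0, 1]         # sample correlation
    r2 = 1 - np.sum(e**2) / np.sum((y - y.mean())**2)   # coefficient R^2

    print(b0, b1, r, r2)

A quadratic fit (as called for in Exercise 24) can be had by replacing the 1 in np.polyfit with 2.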
5 Probability: The Mathematics of Randomness
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Variation is an unavoidable factor in statistical engineering studies, and the many small, unnameable causes that work to produce it are conveniently thought of as chance phenomena. In analytical contexts, changes in system conditions work to make measured responses vary, and this is most often attributed to chance. A quantity whose observed value is determined by such chance phenomena is called a random variable.
Following Definition 9 in Chapter 1, a distinction was made between discrete
and continuous data. That terminology carries over to the present context and inspires
two more definitions.
Definition 2 A discrete random variable is one that has isolated or separated possible
values (rather than a continuum of available outcomes).
Random variables that are basically count variables clearly fall under Defi-
nition 2 and are discrete. It could be argued that all measurement variables are
discrete—on the basis that all measurements are “to the nearest unit.” But it is often
mathematically convenient, and adequate for practical purposes, to treat them as
continuous.
A random variable is, to some extent, a priori unpredictable. Therefore, in
describing or modeling it, the important thing is to specify its set of potential values
and the likelihoods associated with those possible values.
Definition 4 To specify a probability distribution for a random variable is to give its set of possible values and (in one way or another) consistently assign numbers between 0 and 1 as probabilities of those possible values.
This text will use the notational convention that a capital P followed by an
expression or phrase enclosed by brackets will be read “the probability” of that
expression. In these terms, a probability function for X is a function f such that

    f(x) = P[X = x]

That is, “ f (x) is the probability that (the random variable) X takes the value x.”
Example 1 (continued)

    Z = the next measured torque for bolt 3 (recorded to the nearest integer)

If the same physical process that generated the data of Table 5.1 is used to produce the next bolt 3 torque, then it also makes sense to base a probability function for Z on the relative frequencies in Table 5.1. That is, the probability distribution specified in Table 5.2 might be used. (In going from the relative frequencies in Table 5.1 to proposed values for f(z) in Table 5.2, there has been some slightly arbitrary rounding. This has been done so that probability values are expressed to two decimal places and now total to exactly 1.00.)
Table 5.1 Relative Frequency Distribution for Measured Bolt 3 Torques
Table 5.2 A Probability Function for Z

    Torque, z   f(z)
    11          .03
    12          .03
    13          .03
    14          .06
    15          .26
    16          .09
    17          .12
    18          .20
    19          .15
    20          .03
Then the probability function in Table 5.2 is also approximately appropriate for Y .
This point is not so important in this specific example as it is in general: Where one value is to be selected at random from a population, an appropriate probability distribution is one that is equivalent to the population relative frequency distribution.
This text will usually express probabilities to two decimal places, as in Table 5.2. Computations may be carried to several more decimal places, but final probabilities will typically be reported only to two places. This is because numbers expressed to
more than two places tend to look too impressive and be taken too seriously by the
uninitiated. Consider for example the statement “There is a .097328 probability of
booster engine failure” at a certain missile launch. This may represent the results of
some very careful mathematical manipulations and be correct to six decimal places
in the context of the mathematical model used to obtain the value. But it is doubtful
that the model used is a good enough description of physical reality to warrant that
much apparent precision. Two-decimal precision is about what is warranted in most
engineering applications of simple probability.
The probability function shown in Table 5.2 has two properties that are necessary for the mathematical consistency of a discrete probability distribution. The f(z) values are each in the interval [0, 1] and they total to 1. Negative probabilities or ones larger than 1 would make no practical sense. A probability of 1 is taken as indicating certainty of occurrence and a probability of 0 as indicating certainty of nonoccurrence. Thus, according to the model specified in Table 5.2, since the values of f(z) sum to 1, the occurrence of one of the values 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20 ft lb is certain.
A probability function f (x) gives probabilities of occurrence for individual val-
ues. Adding the appropriate values gives probabilities associated with the occurrence
of one of a specified type of value for X .
Adding the f (z) entries corresponding to possible values larger than 17 ft lb,
P[Z > 17] = f (18) + f (19) + f (20) = .20 + .15 + .03 = .38
The likelihood of the next torque being more than 17 ft lb is about 38%.
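
This kind of computation is easily scripted. A minimal Python sketch, storing the probability function of Table 5.2 as a dictionary:

    # The probability function f(z) of Table 5.2
    f = {11: .03, 12: .03, 13: .03, 14: .06, 15: .26,
         16: .09, 17: .12, 18: .20, 19: .15, 20: .03}

    # P[Z > 17] by summing f(z) over the qualifying possible values
    print(sum(p for z, p in f.items() if z > 17))   # 0.38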
Example 1 (continued) If, for example, specifications for torques were 16 ft lb to 21 ft lb, then the likelihood that the next torque measured will be within specifications is

    P[16 ≤ Z ≤ 21] = f(16) + f(17) + f(18) + f(19) + f(20)
                   = .09 + .12 + .20 + .15 + .03 = .59
Suppose further that tool serial numbers begin with some code special to the
tool model and end with consecutively assigned numbers reflecting how many
tools of the particular model have been produced. The symmetry of this situation
suggests that each possible value of W (w = 0, 1, . . . , 9) is equally likely. That
is, a plausible probability function for W is given by the formula

    f(w) = .1   for w = 0, 1, 2, …, 9
         = 0    otherwise

The cumulative probability function for a random variable X is

    F(x) = P[X ≤ x]

For a discrete variable X, this is

    F(x) = Σ_{z ≤ x} f(z)

(The sum is over possible values less than or equal to x.) In this discrete case, the graph of F(x) will be a stair-step graph with jumps located at possible values and equal in size to the probabilities associated with those possible values.
Example 1 (continued) Values of both the probability function and the cumulative probability function for the torque variable Z are given in Table 5.3. Values of F(z) for other z are also easily obtained. For example, F(16.5) = P[Z ≤ 16.5] = P[Z ≤ 16] = .50.

Table 5.3 Values of the Probability Function and Cumulative Probability Function for Z

    z     f(z)   F(z)
    11    .03    .03
    12    .03    .06
    13    .03    .09
    14    .06    .15
    15    .26    .41
    16    .09    .50
    17    .12    .62
    18    .20    .82
    19    .15    .97
    20    .03   1.00

[Figure: the graph of F(z), a stair-step plot rising from 0 to 1.0 over z = 11, …, 20]
The mean or expected value of a discrete random variable X (sometimes called the mean of its probability distribution) is

    EX = Σ x f(x)        (5.1)

(the sum being over all possible values x). The expected value of X is often denoted µ as well as EX.

Figure 5.2 [Figure: probability histograms for Z and for W]
(Remember the warning in Section 3.3 that µ would stand for both the mean of a
population and the mean of a probability distribution.)
Example 1 (continued) Returning to the bolt torque example, the expected (or theoretical mean) value of the next torque is

    EZ = Σ z f(z) = 11(.03) + 12(.03) + 13(.03) + 14(.06) + 15(.26)
                  + 16(.09) + 17(.12) + 18(.20) + 19(.15) + 20(.03)
                  = 16.35 ft lb

This value is essentially the arithmetic mean of the bolt 3 torques listed in Table 3.4. (The slight disagreement in the third decimal place arises only because the relative frequencies in Table 5.1 were rounded slightly to produce Table 5.2.) This kind of agreement provides motivation for using the symbol µ, first seen in Section 3.3, as an alternative to EZ.
Example 2 (continued) Considering again the serial number example, and the second part of Figure 5.2, if a balance point interpretation of expected value is to hold, EW had better turn out to be 4.5. And indeed,

    EW = Σ w f(w) = (0 + 1 + 2 + ⋯ + 9)(.1) = 4.5
It was convenient to measure the spread of a data set (or its relative frequency
distribution) with the variance and standard deviation. It is similarly useful to have
notions of spread for a discrete probability distribution.
Definition 8 The variance of a discrete random variable X (or the variance of its distribution) is

    Var X = Σ (x − EX)² f(x) = Σ x² f(x) − (EX)²        (5.2)

The standard deviation of X is √Var X. Often the notation σ² is used in place of Var X, and σ is used in place of √Var X.
The variance of a random variable is its expected (or mean) squared distance
from the center of its probability distribution. The use of σ 2 to stand for both the
variance of a population and the variance of a probability distribution is motivated
on the same grounds as the double use of µ.
Example 1 (continued) The calculations necessary to produce the bolt torque standard deviation are organized in Table 5.4. So

    σ = √Var Z = √4.6275 = 2.15 ft lb
Except for a small difference due to round-off associated with the creation of
Table 5.2, this standard deviation of the random variable Z is numerically the
same as the population standard deviation associated with the bolt 3 torques in
Table 3.4. (Again, this is consistent with the equivalence between the population
relative frequency distribution and the probability distribution for Z .)
Table 5.4 Calculations for Var Z
Example 2 (continued) To illustrate the alternative for calculating a variance given in Definition 8, consider finding the variance and standard deviation of the serial number variable W. Table 5.5 shows the calculation of Σ w² f(w).

Table 5.5 Calculations for Σ w² f(w)
    w    f(w)   w² f(w)
    0    .1     0.0
    1    .1     .1
    2    .1     .4
    3    .1     .9
    4    .1     1.6
    5    .1     2.5
    6    .1     3.6
    7    .1     4.9
    8    .1     6.4
    9    .1     8.1
                28.5
Example 2 (continued) Then

    Var W = Σ w² f(w) − (EW)² = 28.5 − (4.5)² = 8.25

so that

    √Var W = 2.87

Comparing the two probability histograms in Figure 5.2, notice that the distribution of W appears to be more spread out than that of Z. Happily, this is reflected in the fact that

    √Var W = 2.87 > 2.15 = √Var Z
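
Formulas (5.1) and (5.2) translate directly into code. A short Python sketch, checked against the serial number variable W:

    f = {w: 0.1 for w in range(10)}     # f(w) = .1 for w = 0, 1, ..., 9

    mean = sum(w * p for w, p in f.items())               # EW = 4.5
    var = sum(w**2 * p for w, p in f.items()) - mean**2   # 28.5 - 20.25 = 8.25
    print(mean, var, var**0.5)                            # sd is about 2.87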
The binomial (n, p) distribution is a discrete probability distribution with probability function

    f(x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n − x)   for x = 0, 1, …, n
         = 0                                       otherwise        (5.3)

Equation (5.3) is completely plausible. In it there is one factor of p for each trial producing a go/success outcome and one factor of (1 − p) for each trial producing a no-go/failure outcome. And the n!/(x!(n − x)!) term is a count of the number of patterns in which it would be possible to see x go/success outcomes in n trials. The name binomial distribution derives from the fact that the values f(0), f(1), f(2), …, f(n) are the terms in the expansion of

    (p + (1 − p))^n
[Figure: probability histograms for several binomial distributions, including n = 10, p = .2]
For example, with U binomial (10, .2),

    P[U ≥ 2] = 1 − f(0) − f(1) = 1 − (.8)^10 − 10(.2)(.8)^9 = .62

(The trick employed here, to avoid plugging into the binomial probability function 9 times by recognizing that the f(u)'s have to sum up to 1, is a common and useful one.)
The .62 figure is only as good as the model assumptions that produced it.
If an independent, identical success-failure trials description of shaft production
fails to accurately portray physical reality, the .62 value is fine mathematics
but possibly a poor description of what will actually happen. For instance, say
that due to tool wear it is typical to see 40 shafts in specifications, then 10
reworkable shafts, a tool change, 40 shafts in specifications, and so on. In this
case, the binomial distribution would be a very poor description of U , and the
.62 figure largely irrelevant. (The independence-of-trials assumption would be
inappropriate in this situation.)
The binomial distribution and simple random sampling: There is one important circumstance where a model of independent, identical success-failure trials is not exactly appropriate, but a binomial distribution can still be adequate for practical purposes—that is, in describing the results of simple random sampling from a dichotomous population. Suppose a population of size N contains a fraction p of items of one type and a simple random sample of n of them is selected. When n is a small fraction of N, a binomial (n, p) distribution is a workable description of the number of items of that type appearing in the sample.
For instance, suppose that a simple random sample of n = 2 pellets is to be selected from a population of N = 100 pellets, of which 34 are nonconforming and 66 are conforming, and let V = the number of conforming pellets in the sample. Then

    f(0) = P[V = 0]
         = P[first pellet selected is nonconforming and
             subsequently the second pellet is also nonconforming]

    f(2) = P[V = 2]
         = P[first pellet selected is conforming and
             subsequently the second pellet selected is conforming]

    f(1) = 1 − (f(0) + f(2))

Then think, "In the long run, the first selection will yield a nonconforming pellet about 34 out of 100 times. Considering only cases where this occurs, in the long run the next selection will also yield a nonconforming pellet about 33 out of 99 times." That is, a sensible evaluation of f(0) is

    f(0) = (34/100)(33/99) = .1133
Example 4 (continued) Similarly,

    f(2) = (66/100)(65/99) = .4333

and thus

    f(1) = 1 − (.1133 + .4333) = .4534

These values are closely approximated by binomial probabilities based on n = 2 and p = .66:

    (2!/(0! 2!)) (.34)^2 (.66)^0 = .1156 ≈ f(0)
    (2!/(1! 1!)) (.34)^1 (.66)^1 = .4488 ≈ f(1)
    (2!/(2! 0!)) (.34)^0 (.66)^2 = .4356 ≈ f(2)
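
The exact simple random sampling probabilities here are hypergeometric, and the comparison with the binomial values can be reproduced with SciPy. A sketch (the .66 figure is the fraction of conforming pellets):

    from scipy.stats import binom, hypergeom

    V = hypergeom(100, 66, 2)   # population 100, 66 conforming, sample of 2
    B = binom(2, .66)           # the approximating binomial distribution

    for v in (0, 1, 2):
        print(v, round(V.pmf(v), 4), round(B.pmf(v), 4))
    # v = 0: .1133 vs .1156;  v = 1: .4533 vs .4488;  v = 2: .4333 vs .4356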
Calculation of the mean and variance for binomial random variables is greatly simplified by the fact that when the formulas (5.1) and (5.2) are used with the expression for binomial probabilities in equation (5.3), simple formulas result. For X a binomial (n, p) random variable,

Mean of the binomial (n, p) distribution:

    µ = EX = Σ_{x=0}^{n} x [n!/(x!(n − x)!)] p^x (1 − p)^(n − x) = np        (5.4)

Variance of the binomial (n, p) distribution:

    σ² = Var X = Σ_{x=0}^{n} (x − np)² [n!/(x!(n − x)!)] p^x (1 − p)^(n − x) = np(1 − p)        (5.5)
Example 3 (continued) Returning to the machining of steel shafts, suppose that a binomial distribution with n = 10 and p = .2 is appropriate as a model for U = the number of reworkable shafts among the next 10 produced. Then

    EU = (10)(.2) = 2 shafts
    √Var U = √(10(.2)(.8)) = 1.26 shafts
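
Binomial computations like these are built into standard libraries. A Python sketch for Example 3 (taking the earlier .62 figure to be P[U ≥ 2]):

    from scipy.stats import binom

    n, p = 10, .2
    print(binom.sf(1, n, p))    # P[U >= 2] = 1 - f(0) - f(1), about .62
    print(binom.mean(n, p))     # EU = np = 2.0
    print(binom.std(n, p))      # sqrt(np(1 - p)), about 1.26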
The geometric (p) distribution is a discrete probability distribution with probability function

    f(x) = p(1 − p)^(x − 1)   for x = 1, 2, …
         = 0                  otherwise        (5.6)
Formula (5.6) makes good intuitive sense. In order for X to take the value x, there must be x − 1 consecutive no-go/failure results followed by a go/success. In formula (5.6), there are x − 1 factors (1 − p) and one factor p. Another way to see that formula (5.6) is plausible is to reason that for X as above and x = 1, 2, …

    1 − F(x) = 1 − P[X ≤ x] = P[X > x] = P[x no-go/failure outcomes in x trials]

That is,

Simple relationship for the geometric (p) cumulative probability function:

    1 − F(x) = (1 − p)^x        (5.7)
[Figure: geometric probability histograms for p = .5 and p = .25]
The probability of x no-go/failure outcomes in x trials can be evaluated as (1 − p)^x by using the form of the binomial (x, p) probability function given in equation (5.3). Then for x = 2, 3, …, f(x) = F(x) − F(x − 1) = −(1 − F(x)) + (1 − F(x − 1)). This, combined with equation (5.7), gives equation (5.6).
The name geometric derives from the fact that the values f(1), f(2), f(3), … are terms in the geometric infinite series for

    p · 1/(1 − (1 − p)) = 1
Suppose that testing begins on a production run in this plant, and let T = the number of the test at which the first shorted cell is discovered. Modeling T as geometric with p = .01,

    P[the first or second cell tested has the first short] = P[T = 1 or T = 2]
        = f(1) + f(2) = (.01) + (.01)(1 − .01) = .02

    P[at least 50 cells are tested without finding a short] = P[T > 50]
        = (1 − .01)^50 = .61
Like the binomial distributions, the geometric distributions have means and variances that are simple functions of the parameter p. That is, if X is geometric (p),

Mean of the geometric (p) distribution:

    µ = EX = Σ_{x=1}^{∞} x p(1 − p)^(x − 1) = 1/p        (5.8)

and

Variance of the geometric (p) distribution:

    σ² = Var X = Σ_{x=1}^{∞} (x − 1/p)² p(1 − p)^(x − 1) = (1 − p)/p²        (5.9)
Example 5 Formula (5.8) is an intuitively appealing result. If there is only 1 chance in 100 of
(continued ) encountering a shorted battery at each test, it is sensible to expect to wait through
100 tests on average to encounter the first one.
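
The geometric computations of Example 5 might be scripted as follows (a sketch using SciPy's geometric distribution, which has exactly the probability function (5.6)):

    from scipy.stats import geom

    T = geom(.01)
    print(T.pmf(1) + T.pmf(2))   # P[T = 1 or T = 2], about .02
    print(T.sf(50))              # P[T > 50] = (.99)^50, about .61
    print(T.mean(), T.var())     # 1/p = 100 and (1 - p)/p^2 = 9900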
Definition 11 The Poisson (λ) distribution is a discrete probability distribution with probability function

    f(x) = e^(−λ) λ^x / x!   for x = 0, 1, 2, …
         = 0                 otherwise        (5.10)

for λ > 0.
The form of equation (5.10) may initially seem unappealing. But it is one that
has sensible mathematical origins, is manageable, and has proved itself empirically
useful in many different “rare events” circumstances. One way to arrive at equation
(5.10) is to think of a very large number of independent trials (opportunities for
occurrence), where the probability of success (occurrence) on any one is very small
and the product of the number of trials and the success probability is λ. One is
then led to the binomial (n, λ/n) distribution. In fact, for large n, the binomial (n, λ/n) probability function approximates the one specified in equation (5.10). So one
might think of the Poisson distribution for counts as arising through a mechanism
that would present many tiny similar opportunities for independent occurrence or
nonoccurrence throughout an interval of time or space.
The Poisson distributions are right-skewed distributions over the values x =
0, 1, 2, …, whose probability histograms peak near their respective λ's. Two different Poisson probability histograms are shown in Figure 5.5.

Figure 5.5 [Figure: Poisson probability histograms for λ = 1.5 and λ = 3.0]

λ is both the mean and the variance for the Poisson (λ) distribution. That is, if X has the Poisson (λ) distribution, then

Mean of the Poisson (λ) distribution:

    µ = EX = Σ_{x=0}^{∞} x e^(−λ) λ^x / x! = λ        (5.11)

and

Variance of the Poisson (λ) distribution:

    Var X = Σ_{x=0}^{∞} (x − λ)² e^(−λ) λ^x / x! = λ        (5.12)
Fact (5.11) is helpful in picking out which Poisson distribution might be useful in
describing a particular “rare events” situation.
For a count S whose mean is taken to be 3.87, for example, the appropriate probability function is then

    f(s) = e^(−3.87) (3.87)^s / s!   for s = 0, 1, 2, …
         = 0                         otherwise
Similarly, with

    M = the number of students entering the ISU library between 12:00 and
        12:01 next Tuesday

modeled as Poisson, the probability that between 10 and 15 students (inclusive) arrive at the library between 12:00 and 12:01 would be evaluated as

    P[10 ≤ M ≤ 15] = f(10) + f(11) + ⋯ + f(15)
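
Sums of Poisson probabilities like this are conveniently computed from the cumulative probability function. A sketch (the rate of 12 arrivals per minute used below is only an assumed, illustrative value, not one given in this example):

    from scipy.stats import poisson

    lam = 12.0    # assumed mean arrivals per minute (illustrative only)
    print(poisson.cdf(15, lam) - poisson.cdf(9, lam))   # P[10 <= M <= 15]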
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. A discrete random variable X can be described using the probability function

    x       2    3    4    5    6
    f(x)   .1   .2   .3   .3   .1

(a) Make a probability histogram for X. Also plot F(x), the cumulative probability function for X.
(b) Find the mean and standard deviation of X.
2. In an experiment to evaluate a new artificial sweetener, ten subjects are all asked to taste cola from three unmarked glasses, two of which contain regular cola while the third contains cola made with the new sweetener. The subjects are asked to identify the glass whose content is different from the other two. If there is no difference between the taste of sugar and the taste of the new sweetener, the subjects would be just guessing.
(a) Make a table for a probability function for X = the number of subjects correctly identifying the artificially sweetened cola under this hypothesis of no difference in taste.
(b) If seven of the ten subjects correctly identify the artificial sweetener, is this outcome strong evidence of a taste difference? Explain.
3. Suppose that a small population consists of the N = 6 values 2, 3, 4, 4, 5, and 6.
(a) Sketch a relative frequency histogram for this population and compute the population mean, µ, and standard deviation, σ.
(b) Now let X = the value of a single number selected at random from this population. Sketch a probability histogram for this variable X and compute EX and Var X.
(c) Now think of drawing a simple random sample of size n = 2 from this small population. Make tables giving the probability distributions of the random variables X̄ = the sample mean and S² = the sample variance. (There are 15 different possible unordered samples of 2 out of 6 items. Each of the 15 possible samples is equally likely to be chosen and has its own corresponding x̄ and s².) Use the tables and make probability histograms for these random variables. Compute EX̄ and Var X̄. How do these compare to µ and σ²?
4. Sketch probability histograms for the binomial distributions with n = 5 and p = .1, .3, .5, .7, and .9. On each histogram, mark the location of the mean and indicate the size of the standard deviation.
5. Suppose that an eddy current nondestructive evaluation technique for identifying cracks in critical metal parts has a probability of around .20 of detecting a single crack of length .003 in. in a certain material. Suppose further that n = 8 specimens of this material, each containing a single crack of length .003 in., are inspected using this technique. Let W be the number of these cracks that are detected. Use an appropriate probability model and evaluate the following:
(a) P[W = 3]  (b) P[W ≤ 2]  (c) EW  (d) Var W  (e) the standard deviation of W
6. In the situation described in Exercise 5, suppose that a series of specimens, each containing a single crack of length .003 in., are inspected. Let Y be the number of specimens inspected in order to obtain the first crack detection. Use an appropriate probability model and evaluate all of the following:
(a) P[Y = 5]  (b) P[Y ≤ 4]  (c) EY  (d) Var Y  (e) the standard deviation of Y
7. Sketch probability histograms for the Poisson distributions with means λ = .5, 1.0, 2.0, and 4.0. On each histogram, mark the location of the mean and indicate the size of the standard deviation.
8. A process for making plate glass produces an average of four seeds (small bubbles) per 100 square feet. Use Poisson distributions and assess probabilities that
(a) a particular piece of glass 5 ft × 10 ft will contain more than two seeds.
(b) a particular piece of glass 5 ft × 5 ft will contain no seeds.
9. Transmission line interruptions in a telecommunications network occur at an average rate of one per day.
(a) Use a Poisson distribution as a model for X = the number of interruptions in the next five-day work week and assess P[X = 0].
(b) Now consider the random variable Y = the number of weeks in the next four in which there are no interruptions. What is a reasonable probability model for Y? Assess P[Y = 2].
10. Distinguish clearly between the subjects of probability and statistics. Is one field a subfield of the other?
11. What is the difference between a relative frequency distribution and a probability distribution?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
5.2 Continuous Random Variables

A probability density function for a continuous random variable X is a nonnegative function f(x) with total integral 1, and such that for all a ≤ b, one is willing to assign P[a ≤ X ≤ b] according to

    P[a ≤ X ≤ b] = ∫_a^b f(x) dx        (5.14)
[Figure: a probability density f(x); the shaded area under it between 2 and 6 gives P[2 ≤ X ≤ 6]]
Example 8 The Random Time Until a First Arc in the Bob Drop Experiment
Consider once again the bob drop experiment first described in Section 1.4 and
revisited in Example 4 in Chapter 4. In any use of the apparatus, the bob is almost
certainly not released exactly “in sync” with the 60 cycle current that produces
the arcs and marks on the paper tape. One could think of a random variable
Y = the time elapsed (in seconds) from bob release until the first arc
as continuous with set of possible values (0, 1/60).
What is a plausible probability density function for Y? The symmetry of this situation suggests that probability density should be constant over the interval (0, 1/60) and 0 outside the interval. That is, for any two values y₁ and y₂ in (0, 1/60), the probability that Y takes a value within a small interval around y₁ of length dy (i.e., f(y₁) dy approximately) should be the same as the probability that Y takes a value within a small interval around y₂ of the same length dy (i.e., f(y₂) dy approximately). This forces f(y₁) = f(y₂), so there must be a constant probability density on (0, 1/60). Now if f(y) is to have the form

    f(y) = c   for 0 < y < 1/60
         = 0   otherwise

then, since the total area under the graph of f(y) must be 1, it must be that c = 60. That is,

    f(y) = 60   for 0 < y < 1/60
         = 0    otherwise        (5.15)

Figure 5.7 [Figure: the graph of f(y); the total area under it must be 1]
One point about continuous probability distributions that may at first seem counterintuitive concerns the probability associated with a continuous random variable assuming a particular prespecified value (say, a). Just as the mass a continuous mass distribution places at a single point is 0, so also is P[X = a] = 0 for a continuous random variable X. This follows from equation (5.14), because

    P[a ≤ X ≤ a] = ∫_a^a f(x) dx = 0

One consequence of this mathematical curiosity is that when working with continuous random variables, you don't need to worry about whether or not inequality signs you write are strict inequality signs. That is, if X is continuous,

    P[a ≤ X ≤ b] = P[a < X ≤ b] = P[a ≤ X < b] = P[a < X < b]

The cumulative probability function for a continuous variable X is

    F(x) = P[X ≤ x] = ∫_{−∞}^x f(t) dt        (5.16)

F(x) is obtained from f(x) by integration, and applying the fundamental theorem of calculus to equation (5.16) gives another relationship between F(x) and f(x):

    (d/dx) F(x) = f(x)        (5.17)
Example 8 (continued) The cumulative probability function for Y, the elapsed time from bob release until first arc, is easily obtained from equation (5.15). For y ≤ 0,

    F(y) = P[Y ≤ y] = ∫_{−∞}^y f(t) dt = ∫_{−∞}^y 0 dt = 0

For 0 < y ≤ 1/60,

    F(y) = ∫_0^y 60 dt = 60y

and for y > 1/60, F(y) = 1. That is,

    F(y) = 0     if y ≤ 0
         = 60y   if 0 < y ≤ 1/60
         = 1     if 1/60 < y
A plot of F(y) is given in Figure 5.8. Comparing Figure 5.8 to Figure 5.7 shows that indeed the graph of F(y) has slope 0 for y < 0 and y > 1/60 and slope 60 for 0 < y < 1/60. That is, f(y) is the derivative of F(y), as promised by equation (5.17).
Figure 5.8 [Figure: graph of F(y)]
The mean or expected value of a continuous random variable X (sometimes called the mean of its probability distribution) is

    EX = ∫_{−∞}^{∞} x f(x) dx        (5.18)
Formula (5.18) is perfectly plausible from at least two perspectives. First, the probability in a small interval around x of length dx is approximately f(x) dx. So multiplying this by x and summing as in Definition 7, one has Σ x f(x) dx, and formula (5.18) is exactly the limit of such sums as dx gets small. And second, in mechanics the center of mass of a continuous mass distribution is of the form given in equation (5.18) except for division by a total mass, which for a probability distribution is 1.
Example 8 (continued) Thinking of the probability density in Figure 5.7 as an idealized histogram and thinking of the balance point interpretation of the mean, it is clear that EY had better turn out to be 1/120 for the elapsed time variable. Happily, equations (5.18) and (5.15) give

    µ = EY = ∫_{−∞}^{∞} y f(y) dy
           = ∫_{−∞}^0 y · 0 dy + ∫_0^{1/60} y · 60 dy + ∫_{1/60}^{∞} y · 0 dy
           = 30y² |_0^{1/60} = 1/120 sec
Definition 14 The variance of a continuous random variable X (sometimes called the variance of its probability distribution) is

    Var X = ∫_{−∞}^{∞} (x − EX)² f(x) dx = ∫_{−∞}^{∞} x² f(x) dx − (EX)²        (5.19)

The standard deviation of X is √Var X. Often the notation σ² is used in place of Var X, and σ is used in place of √Var X.
Example 8 (continued) Return for a final time to the bob drop and the random variable Y. Using formula (5.19) and the form of Y's probability density,

    σ² = Var Y = ∫_{−∞}^0 (y − 1/120)² · 0 dy + ∫_0^{1/60} (y − 1/120)² · 60 dy
                 + ∫_{1/60}^{∞} (y − 1/120)² · 0 dy
               = 60 (y − 1/120)³/3 |_0^{1/60}
               = (1/3)(1/120)²

so that the standard deviation is σ = (1/120)/√3 ≈ .0048 sec.
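
Integrals like (5.18) and (5.19) can be checked numerically. A minimal Python sketch for the bob-drop density (5.15):

    from scipy.integrate import quad

    f = lambda y: 60.0 if 0 < y < 1/60 else 0.0

    mean, _ = quad(lambda y: y * f(y), 0, 1/60)
    var, _ = quad(lambda y: (y - mean)**2 * f(y), 0, 1/60)
    print(mean, 1/120)                  # both are .008333...
    print(var, (1/3) * (1/120)**2)      # numerical and exact variance agree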
Definition 15 The normal (µ, σ²) distribution is a continuous probability distribution with probability density

    f(x) = (1/√(2πσ²)) e^(−(x − µ)²/2σ²)   for all x        (5.20)

for σ > 0.
It is not necessarily obvious, but formula (5.20) does yield a legitimate probability density, in that the total area under the curve y = f(x) is 1. Further, it is also the case that

Normal distribution mean and variance:

    EX = ∫_{−∞}^{∞} x (1/√(2πσ²)) e^(−(x − µ)²/2σ²) dx = µ

and

    Var X = ∫_{−∞}^{∞} (x − µ)² (1/√(2πσ²)) e^(−(x − µ)²/2σ²) dx = σ²

That is, the parameters µ and σ² used in Definition 15 are indeed, respectively, the mean and variance (as defined in Definitions 13 and 14) of the distribution.
Figure 5.9 is a graph of the probability density specified by formula (5.20). The
bell-shaped curve shown there is symmetric about x = µ and has inflection points
at µ − σ and µ + σ . The exact form of formula (5.20) has a number of theoretical
origins. It is also a form that turns out to be empirically useful in a great variety of
applications.
In theory, probabilities for the normal distributions can be found directly by
integration using formula (5.20). Indeed, readers with pocket calculators that are
preprogrammed to do numerical integration may find it instructive to check some
of the calculations in the examples that follow, by straightforward use of formulas
(5.14) and (5.20). But the freshman calculus methods of evaluating integrals via
antidifferentiation will fail when it comes to the normal densities. They do not have
antiderivatives that are expressible in terms of elementary functions. Instead, special
normal probability tables are typically used.
Figure 5.9 [Figure: graph of the normal probability density]
The use of tables for evaluating normal probabilities depends on the following relationship. If X is normally distributed with mean µ and variance σ²,

    P[a ≤ X ≤ b] = ∫_a^b (1/√(2πσ²)) e^(−(x − µ)²/2σ²) dx
                 = ∫_{(a−µ)/σ}^{(b−µ)/σ} (1/√(2π)) e^(−z²/2) dz        (5.21)

where the second equality follows from the change of variable or substitution z = (x − µ)/σ. Equation (5.21) involves an integral of the normal density with µ = 0 and σ = 1. It says that evaluation of all normal probabilities can be reduced to the evaluation of normal probabilities for that special case.
Definition 16 The normal distribution with µ = 0 and σ = 1 is called the standard normal
distribution.
The relationship between normal (µ, σ²) and standard normal probabilities is illustrated in Figure 5.10. Once one realizes that probabilities for all normal distributions can be had by tabulating probabilities for only the standard normal distribution, it is a relatively simple matter to use techniques of numerical integration to produce a standard normal table. The one that will be used in this text (other forms are possible) is given in Table B.3. It is a table of the standard normal cumulative probability function. That is, for values z located on the table's margins, the entries in the table body are

    Φ(z) = F(z) = ∫_{−∞}^z (1/√(2π)) e^(−t²/2) dt

(Φ is routinely used to stand for the standard normal cumulative probability function, instead of the more generic F.)
Figure 5.10 [Figure: relationship between normal (µ, σ²) and standard normal probabilities; the areas corresponding to P[a ≤ X ≤ b] and P[(a − µ)/σ ≤ Z ≤ (b − µ)/σ] are equal]
For example, a single table look-up gives P[Z < 1.76] = Φ(1.76) = .96. (The tabled value is .9608, but in keeping with the earlier promise to state final probabilities to only two decimal places, the tabled value was rounded to get .96.) After two table look-ups and a subtraction, an interval probability can be evaluated as P[a < Z < b] = Φ(b) − Φ(a). And a single table look-up and a subtraction yield a right-tail probability like P[Z > a] = 1 − Φ(a).
As the table was used in these examples, probabilities for values z located on the table's margins were found in the table's body. The process can be run in reverse. Probabilities located in the table's body can be used to specify values z on the margins. For example, consider locating a value z such that P[−z < Z < z] = .95.
The last part of Example 9 amounts to finding the .975 quantile for the standard normal distribution. In fact, the reader is now in a position to understand the origin of Table 3.10 (see page 89). The standard normal quantiles there were found by looking in the body of Table B.3 for the relevant probabilities and then locating corresponding z's on the margins.
In mathematical symbols, for Φ(z), the standard normal cumulative probability function, and Q_z(p), the standard normal quantile function,

    Φ(Q_z(p)) = p   and   Q_z(Φ(z)) = z        (5.22)

Relationships (5.22) mean that Q_z and Φ are inverse functions. (In fact, the relationship Q = F⁻¹ is not just a standard normal phenomenon but is true in general for continuous distributions.)
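
In software, Φ and Q_z are typically available directly, making Table B.3 look-ups unnecessary. A Python sketch of relationships (5.22):

    from scipy.stats import norm

    print(norm.cdf(1.96))           # Phi(1.96), about .975
    print(norm.ppf(.975))           # Q_z(.975), about 1.96
    print(norm.ppf(norm.cdf(.5)))   # the inverse relationship returns .5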
Relationship (5.21) shows how to use the standard normal cumulative probability function to find general normal probabilities. For X normal (µ, σ²) and a value x associated with X, one converts to units of standard deviations above the mean via

    z = (x − µ)/σ        (5.23)
For example, suppose that the fill weight W (in grams) delivered by a jar-filling process can be described with a normal distribution with µ = 137.2 and σ = 1.6. And further suppose the probability that the next jar filled is below declared weight (i.e., P[W < 135.0]) is of interest. Using formula (5.23), w = 135.0 is converted to units of standard deviations above µ (converted to a z-value) as

    z = (135.0 − 137.2)/1.6 = −1.38

so that P[W < 135.0] = P[Z < −1.38] = Φ(−1.38) = .08. This model puts the chance of obtaining a below-nominal fill level at about 8%.
As a second example, consider the probability that W is within 1 gram of nominal (i.e., P[134.0 < W < 136.0]). Using formula (5.23), both w₁ = 134.0 and w₂ = 136.0 are converted to z-values or units of standard deviations above the mean as

    z₁ = (134.0 − 137.2)/1.6 = −2.00
    z₂ = (136.0 − 137.2)/1.6 = −.75

So then

    P[134.0 < W < 136.0] = P[−2.00 < Z < −.75] = Φ(−.75) − Φ(−2.00)
                         = .2266 − .0228 ≈ .20

Figure 5.12 [Figure: the densities of W and Z, with P[134.0 < W < 136.0] = .20 and P[−2.0 < Z < −.75] = .20 shaded]
The preceding two probabilities and their standard normal counterparts are shown
in Figure 5.12.
The calculations for this example have consisted of starting with all of the
quantities on the right of formula (5.23) and going from the margin of Table B.3
to its body to find probabilities for W . An important variant on this process is to
instead go from the body of the table to its margins to obtain z, and then—given
only two of the three quantities on the right of formula (5.23)—to solve for the
third.
For example, suppose that it is easy to adjust the aim of the filling process
(i.e., the mean µ of W ) and one wants to decrease the probability that the next
jar is below the declared weight of 135.0 to .01 by increasing µ. What is the
minimum µ that will achieve this (assuming that σ remains at 1.6 g)?
Figure 5.13 shows what to do. µ must be chosen in such a way that w =
135.0 becomes the .01 quantile of the normal distribution with mean µ and
standard deviation σ = 1.6. Consulting either Table 3.10 or Table B.3, it is easy
to determine that the .01 quantile of the standard normal distribution is
z = Q z (.01) = −2.33
Figure 5.13 [Figure: a normal density with standard deviation 1.6 and aim µ chosen so that w = 135.0 is its .01 quantile]

Then it must be the case that

    −2.33 = (135.0 − µ)/1.6

i.e.,

    µ = 138.7 g
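
All three of the preceding jar-filling calculations can be reproduced without tables. A sketch:

    from scipy.stats import norm

    mu, sigma = 137.2, 1.6
    print(norm.cdf(135.0, mu, sigma))    # P[W < 135.0], about .08
    print(norm.cdf(136.0, mu, sigma)
          - norm.cdf(134.0, mu, sigma))  # P[134.0 < W < 136.0], about .20

    # smallest aim giving P[W < 135.0] = .01 with sigma held at 1.6:
    print(135.0 - norm.ppf(.01) * 1.6)   # about 138.7 g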
Definition 17 The exponential (α) distribution is a continuous probability distribution with probability density

    f(x) = (1/α) e^(−x/α)   for x > 0
         = 0                otherwise        (5.24)

for α > 0.
Figure 5.14 shows plots of f(x) for three different values of α. Expression (5.24) is extremely convenient, and it is not at all difficult to show that α is both the mean and the standard deviation of the exponential (α) distribution. That is,

Mean of the exponential (α) distribution:

    µ = EX = ∫_0^∞ x (1/α) e^(−x/α) dx = α

and

Variance of the exponential (α) distribution:

    σ² = Var X = ∫_0^∞ (x − α)² (1/α) e^(−x/α) dx = α²
Figure 5.14 [Figure: exponential probability densities for α = .5, 1.0, and 2.0]
    T = the waiting time (in minutes) until the first student passes through the door

A possible model for T is the exponential distribution with α = .08. Using it, and the fact that the exponential cumulative probability function is F(t) = 1 − e^(−t/α) for t > 0, the probability of waiting more than 10 seconds (1/6 min) for the first arrival is

    P[T > 1/6] = 1 − F(1/6) = 1 − (1 − e^(−(1/6)/.08)) = e^(−(1/6)/.08) = .12
[Figure: the exponential density for T, with the area P[T > 1/6] = .12 shaded]
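
Exponential probabilities are equally simple in software. A sketch of the waiting-time computation (SciPy parameterizes the exponential distribution by its scale α):

    from scipy.stats import expon

    T = expon(scale=.08)
    print(T.sf(1/6))            # P[T > 1/6 min], about .12
    print(T.mean(), T.std())    # both equal alpha = .08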
Geometric and exponential distributions: The exponential distribution is the continuous analog of the geometric distribution in several respects. For one thing, both the geometric probability function and the exponential probability density decline exponentially in their arguments x. For another, they both possess a kind of memoryless property. If the first success in a series of independent identical success-failure trials is known not to have occurred through trial t₀, then the additional number of trials (beyond t₀) needed to produce the first success is a geometric (p) random variable (as was the total number of trials required from the beginning). Similarly, if an exponential (α) waiting time is known not to have been completed by time t₀, then the additional waiting time to completion (beyond t₀) is again exponential (α).
The Weibull (α, β) distribution is a continuous probability distribution with cumulative probability function

    F(x) = 0                    if x < 0
         = 1 − e^(−(x/α)^β)     if x ≥ 0        (5.26)

probability density

    f(x) = 0                                   if x < 0
         = (β x^(β−1)/α^β) e^(−(x/α)^β)        if x > 0        (5.27)

mean

    µ = EX = α Γ(1 + 1/β)        (5.28)

and variance

    σ² = Var X = α² (Γ(1 + 2/β) − (Γ(1 + 1/β))²)        (5.29)

for α, β > 0.
Figure 5.16 [Figure: Weibull probability densities for β = .5, 1, and 4, each with α = .5, 1.0, and 4.0]
where Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt is the gamma function of advanced calculus. (For integer values n, Γ(n) = (n − 1)!.) These formulas for f(x), µ, and σ² are not particularly illuminating. So it is probably most helpful to simply realize that β controls the shape of the Weibull distribution and that α controls the scale. Figure 5.16 shows plots of f(x) for several (α, β) pairs.
Note that β = 1 gives the special case of the exponential distributions. For
small β, the distributions are decidedly right-skewed, but for β larger than about
3.6, they actually become left-skewed. Regarding distribution location, the form of
the distribution mean given in equation (5.28) is not terribly revealing. It is perhaps
more helpful that the median for the Weibull (α, β) distribution is
Weibull (α, β)
median Q(.5) = αe−(.3665/β) (5.30)
So, for example, for large shape parameter β the Weibull median is essentially α. And formulas (5.28) through (5.30) show that for fixed β the Weibull mean, median, and standard deviation are all proportional to the scale parameter α.
Under the assumption that S can be modeled using a Weibull distribution with the suggested characteristics (a median of 428 and shape parameter β = 8.8), suppose that P[S ≤ 400] is needed. Using equation (5.30),

    428 = α e^(−.3665/8.8)

so that

    α = 446

and then, from equation (5.26), P[S ≤ 400] = 1 − e^(−(400/446)^8.8) ≈ .32.
[Figure: the Weibull density with β = 8.8 and α = 446]
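
These Weibull computations can be checked with SciPy's weibull_min distribution (shape parameter c = β, scale = α). A sketch:

    from scipy.stats import weibull_min

    S = weibull_min(c=8.8, scale=446)
    print(S.median())    # about 428, agreeing with equation (5.30)
    print(S.cdf(400))    # P[S <= 400], about .32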
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. The random number generator supplied on a calculator is not terribly well chosen, in that values it generates are not adequately described by a distribution uniform on the interval (0, 1). Suppose instead that a probability density

    f(x) = k(5 − x)   for 0 < x < 1
         = 0          otherwise

is a more appropriate model for X = the next value produced by this random number generator.
(a) Find the value of k.
(b) Sketch the probability density involved here.
(c) Evaluate P[.25 < X < .75].
(d) Compute and graph the cumulative probability function for X, F(x).
(e) Calculate EX and the standard deviation of X.
2. Suppose that Z is a standard normal random variable. Evaluate the following probabilities involving Z:
(a) P[Z < −.62]  (b) P[Z > 1.06]
(c) P[−.37 < Z < .51]  (d) P[|Z| ≤ .47]
(e) P[|Z| > .93]  (f) P[−3.0 < Z < 3.0]
Now find numbers # such that the following statements involving Z are true:
(g) P[Z ≤ #] = .90  (h) P[|Z| < #] = .90  (i) P[|Z| > #] = .03
3. Suppose that X is a normal random variable with mean 43.0 and standard deviation 3.6. Evaluate the following probabilities involving X:
(a) P[X < 45.2]  (b) P[X ≤ 41.7]
(c) P[43.8 < X ≤ 47.0]  (d) P[|X − 43.0| ≤ 2.0]
(e) P[|X − 43.0| > 1.7]
Now find numbers # such that the following statements involving X are true:
(f) P[X < #] = .95  (g) P[X ≥ #] = .30  (h) P[|X − 43.0| > #] = .05
4. The diameters of bearing journals ground on a particular grinder can be described as normally distributed with mean 2.0005 in. and standard deviation .0004 in.
(a) If engineering specifications on these diameters are 2.0000 in. ± .0005 in., what fraction of these journals are in specifications?
(b) What adjustment to the grinding process (holding the process standard deviation constant) would increase the fraction of journal diameters that will be in specifications? What appears to be the best possible fraction of journal diameters inside ± .0005 in. specifications, given the σ = .0004 in. apparent precision of the grinder?
(c) Suppose consideration was being given to purchasing a more expensive/newer grinder, capable of holding tighter tolerances on the parts it produces. What σ would have to be associated with the new machine in order to guarantee that (when perfectly adjusted so that µ = 2.0000) the grinder would produce diameters with at least 95% meeting 2.0000 in. ± .0005 in. specifications?
5. The mileage to first failure for a model of military personnel carrier can be modeled as exponential with mean 1,000 miles.
(a) Evaluate the probability that a vehicle of this type gives less than 500 miles of service before first failure. Evaluate the probability that it gives at least 2,000 miles of service before first failure.
(b) Find the .05 quantile of the distribution of mileage to first failure. Then find the .90 quantile of the distribution.
6. Some data analysis shows that lifetimes, x (in 10⁶ revolutions before failure), of certain ball bearings can be modeled as Weibull with β = 2.3 and α = 80.
(a) Make a plot of the Weibull density (5.27) for this situation. (Plot for x between 0 and 200. Standard statistical software packages like MINITAB will have routines for evaluating this density. In MINITAB look under the "Calc/Probability Distributions/Weibull" menu.)
(b) What is the median bearing life?
(c) Find the .05 and .95 quantiles of bearing life.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
5.3 Probability Plotting

This section will further discuss the importance of probability plotting. First, some
additional points about probability plotting are made in the familiar context where
f (x) is the standard normal density (i.e., in the context of normal plotting). Then
the general applicability of the idea is illustrated by using it in assessing the appro-
priateness of exponential and Weibull models. In the course of the discussion, the
importance of probability plotting to process capability studies and life data analysis
will be indicated.
A way of determining whether or not the students’ data support the use of
a normal model for U is to make a normal probability plot. Table 5.6 presents
the data collected by Ash, Davison, and Miyagawa. Table 5.7 shows some of the
calculations used to produce the normal probability plot in Figure 5.18.
Table 5.6 Weights of 100 U.S. Nickels

    Weight (g)   Frequency     Weight (g)   Frequency
    4.81         1             5.00         12
    4.86         1             5.01         10
    4.88         1             5.02         7
    4.89         1             5.03         7
    4.91         2             5.04         5
    4.92         2             5.05         4
    4.93         3             5.06         4
    4.94         2             5.07         3
    4.95         6             5.08         2
    4.96         4             5.09         3
    4.97         5             5.10         2
    4.98         4             5.11         1
    4.99         7             5.13         1
Table 5.7 Example Calculations for a Normal Plot of Nickel Weights
(columns: i, xᵢ, (i − .5)/100, and Q_z((i − .5)/100))
Figure 5.18 [Figure: normal plot of the weights of 100 U.S. nickels, standard normal quantile vs. weight]
At least up to the resolution provided by the graphics in Figure 5.18, the plot
is pretty linear for weights above, say, 4.90 g. However, there is some indication
that the shape of the lower end of the weight distribution differs from that of a
normal distribution. Real nickels seem to be more likely to be light than a normal
model would predict. Interestingly enough, the four nickels with weights under
4.90 g were all minted in 1970 or before (these data were collected in 1988). This
suggests the possibility that the shape of the lower end of the weight distribution
is related to wear patterns and unusual damage (particularly the extreme lower
tail represented by the single 1964 coin with weight 4.81 g).
But whatever the origin of the shape in Figure 5.18, its message is clear. For
most practical purposes, a normal model for the random variable U will suffice. Bear in mind, though, that such a distribution will tend to slightly
overstate probabilities associated with larger weights and understate probabilities
associated with smaller weights.
Much was made in Section 3.2 of the fact that linearity on a Q-Q plot indicates
equality of distribution shape. But to this point, no use has been made of the fact
that when there is near-linearity on a Q-Q plot, the nature of the linear relationship
gives information regarding the relative location and spread of the two distributions
involved. This can sometimes provide a way to choose sensible parameters of a
theoretical distribution for describing the data set.
For example, a normal probability plot can be used not only to determine whether
some normal distribution might describe a random variable but also to graphically
pick out which one might be used. For a roughly linear normal plot, the data value at which an approximating line crosses standard normal quantile 0 indicates an appropriate mean, and the change in data value accompanying a unit increase in standard normal quantile indicates an appropriate standard deviation.
The line eye-fit to the plot further suggests appropriate values for the mean and
standard deviation: µ ≈ 10.8 and σ ≈ 2.1. (Direct calculation with the data in
Table 5.8 gives a sample mean and standard deviation of, respectively, l̄ ≈ 10.9
and s ≈ 1.9.)
Table 5.8
Measured Thread Lengths for 25 U-Bolts

Thread Length (.001 in. over Nominal)   Frequency
 6                                      1
 7                                      0
 8                                      3
 9                                      0
10                                      4
11                                      10
12                                      0
13                                      6
14                                      1
Example 14 (continued)

[Figure 5.19: Normal plot of the thread length data — standard normal quantile (−3.0 to 3.0) versus thread length (.001 in. above nominal, 6 to 16). A line eye-fit to the plot has horizontal intercept ≈ 10.8, and the difference in thread length corresponding to one unit of standard normal quantile is ≈ 2.1.]
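The graphical reading of µ and σ can be mimicked numerically. Here is a sketch (an illustration, not the authors' method) that least-squares fits the standard normal quantiles to the Table 5.8 thread lengths and converts the fitted line to approximate normal parameters.

```python
import numpy as np
from scipy.stats import norm

# Table 5.8: thread length (.001 in. over nominal) -> frequency
freqs = {6: 1, 7: 0, 8: 3, 9: 0, 10: 4, 11: 10, 12: 0, 13: 6, 14: 1}
x = sorted(v for v, c in freqs.items() for _ in range(c))
n = len(x)  # 25 U-bolts

z = norm.ppf([(i - .5) / n for i in range(1, n + 1)])

# For a normal variable, z ~ (x - mu)/sigma; a least squares line z = a + b*x
# therefore gives sigma ~ 1/b and mu ~ -a/b.
b, a = np.polyfit(x, z, 1)
print("mu    ~", -a / b)   # close to the 10.8 read from the eye-fit line
print("sigma ~", 1 / b)    # close to the 2.1 read from the eye-fit line
```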
[Figure 5.20: Thread length data plotted on a capability analysis form (used with permission of Reynolds Metals Company). The form pairs a cumulative-percentage scale (0.2% to 99.8%) with the standard normal quantiles −3σ through +2σ and their tail percentages (0.135%, 2.3%, 15.9%), and provides rows for recording each value and its frequency.]
Figure 5.21 Stem-and-leaf display of the observed service times (sec)

0 |
0 | 8 8 8 9 9
1 | 0 0 0 0 0 2 2 2 2 3 3 4 4 4 4 4 4
1 | 5 6 7 7 7 7 8 8 9 9
2 | 0 1 2 2 2 2 3 4
2 | 6 7 8 9 9 9
3 | 0 2 2 2 4 4
3 | 6 6 7 7
4 | 2 3
4 | 5 6 7 8 8
5 |
5 |
6 |
6 |
7 | 0
7 |
8 |
8 | 7
For the exponential distribution with α = 1, setting p = F(x) = 1 − e^(−x) and solving gives

x = −ln(1 − p)

That is, −ln(1 − p) = Q(p), the p quantile of this distribution. Thus, for data x₁ ≤ x₂ ≤ · · · ≤ xₙ, an exponential probability plot can be made by plotting the ordered pairs

Points to plot for an exponential probability plot:

(x_i, −ln(1 − (i − .5)/n))    (5.31)
Figure 5.22 is a plot of the points in display (5.31) for the service time data. It
shows remarkable linearity. Except for the fact that the third- and fourth-largest
service times (both 48 seconds) appear to be somewhat smaller than might be
predicted based on the shape of the exponential distribution, the empirical service
time distribution corresponds quite closely to the exponential distribution shape.
[Figure 5.22: Exponential plot of the service time data — exponential quantile (0 to 5) versus data quantile (0 to 90 sec). A line sketched through the points crosses the horizontal axis at about 7.5 sec and reaches exponential quantile 1 at about 24 sec.]
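A sketch (not from the text) of display (5.31) applied in Python; the service time values below are read off the stem-and-leaf display of Figure 5.21.

```python
import math

# Service times (sec) from the stem-and-leaf display (Figure 5.21)
times = [8, 8, 8, 9, 9,
         10, 10, 10, 10, 10, 12, 12, 12, 12, 13, 13, 14, 14, 14, 14, 14, 14,
         15, 16, 17, 17, 17, 17, 18, 18, 19, 19,
         20, 21, 22, 22, 22, 22, 23, 24,
         26, 27, 28, 29, 29, 29,
         30, 32, 32, 32, 34, 34,
         36, 36, 37, 37,
         42, 43,
         45, 46, 47, 48, 48,
         70, 87]
x = sorted(times)
n = len(x)

# Display (5.31): pair the i-th smallest time with -ln(1 - (i - .5)/n)
points = [(x[i - 1], -math.log(1 - (i - .5) / n)) for i in range(1, n + 1)]
```

Plotting these pairs reproduces the linearity (and the roughly 7.5 sec horizontal intercept) discussed above.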
Example 15 (continued)

As was the case in normal-plotting, the character of the linearity in Figure 5.22 also carries some valuable information that can be applied to the modeling of the random variable T. The positioning of the line sketched onto the plot indicates the appropriate location of an exponentially shaped distribution for T, and the slope of the line indicates the appropriate spread for that distribution.
As introduced in Definition 17, the exponential distributions have positive density f(x) for positive x. One might term 0 a threshold value for the distributions defined there. In Figure 5.22 the threshold value (0 = Q(0)) for the exponential distribution with α = 1 corresponds to a service time of roughly 7.5 seconds. This means that to model a variable related to T with a distribution exactly of the form given in Definition 17, it is natural to consider the variable

S = T − 7.5

Further, the line sketched onto the plot rises one unit of exponential quantile for each 24 − 7.5 = 16.5 sec of service time. That is, an exponential model for S ought to have an associated spread that is 16.5 times that of the exponential distribution with α = 1.
So ultimately, the data in Figure 5.21 lead via exponential probability plotting to the suggestion that

S = T − 7.5
  = the excess of the next time required to complete a postage stamp sale over a threshold value of 7.5 seconds

be described with the density

f(s) = (1/16.5) e^(−s/16.5)   for s > 0
       0                      otherwise        (5.32)
[Figure 5.23: Probability densities for T and for S = T − 7.5 — probability density (0 to .06) versus service time (10 to 50 sec)]
Points to plot for a fixed-β Weibull plot:

(x_i, [−ln(1 − (i − .5)/n)]^(1/β))    (5.33)

One difficulty with display (5.33) is that it requires that β be specified in advance. A way around this difficulty is to work instead with Y = ln(X). If X is Weibull (α, β), then

P[Y ≤ y] = P[X ≤ e^y] = 1 − e^(−(e^y/α)^β)

Setting p equal to this probability and solving, the p quantile of Y is Q_Y(p) = ln(α) + (1/β) ln(−ln(1 − p)), so that

ln(−ln(1 − p)) = β (Q_Y(p) − ln(α))

This suggests plotting, for data x₁ ≤ x₂ ≤ · · · ≤ xₙ, the ordered pairs

Points to plot for a 0-threshold Weibull plot:

(ln(x_i), ln(−ln(1 − (i − .5)/n)))    (5.35)

Reading α and β from a 0-threshold Weibull plot: If data in hand are consistent with a (0-threshold) Weibull (α, β) model, a reasonably linear plot with

1. slope β and
2. horizontal axis intercept equal to ln(α)

may be expected.
3 |
3 | 9.4
4 | 5.3
4 | 9.2, 9.4
5 | 1.3, 2.0, 3.2, 3.2, 4.9
5 | 5.5, 7.1, 7.2, 7.5, 9.2
6 | 1.0, 2.4, 3.8, 4.3
6 | 7.3, 7.7
stress. They were taken from Statistical Models and Methods for Lifetime Data by J. F. Lawless.

Consider the Weibull modeling of

R = the fatigue strength of such a specimen
Table 5.9 shows some of the calculations needed to use display (5.35) to produce
Figure 5.25. The near-linearity of the plot in Figure 5.25 suggests that a (0-
threshold) Weibull distribution might indeed be used to describe R. A Weibull
shape parameter of roughly
β ≈ slope of the fitted line ≈ (1 − (−4))/(4.19 − 3.67) ≈ 9.6

and thus (since the horizontal intercept of the fitted line is ln(α) ≈ 4.08)

α ≈ e^4.08 ≈ 59

appears appropriate.
[Figure 5.25: Weibull plot of the fatigue strength data — ln(−ln(1 − p)) (−4.0 to 2.0) versus ln(data quantile). A line eye-fit to the plot passes through vertical values −4 and 1 at horizontal values of about 3.67 and 4.19, and has horizontal intercept of about 4.08.]
[Figure: Weibull probability paper, with a cumulative probability scale (.01 to .99) on the vertical axis and a logarithmic horizontal scale running from 1 to 10,000]
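As an illustration of display (5.35), the following sketch (not part of the original text) applies least squares to the Weibull plot coordinates of the fatigue strength data listed above, recovering parameter values close to those read from the eye-fit line.

```python
import numpy as np

# Fatigue strengths from the stem-and-leaf display above
strength = [39.4, 45.3, 49.2, 49.4, 51.3, 52.0, 53.2, 53.2, 54.9, 55.5,
            57.1, 57.2, 57.5, 59.2, 61.0, 62.4, 63.8, 64.3, 67.3, 67.7]
x = np.sort(strength)
n = len(x)
p = (np.arange(1, n + 1) - .5) / n

# Display (5.35): plot ln(x_i) against ln(-ln(1 - p_i)); for a 0-threshold
# Weibull model the plot is roughly linear with slope beta and horizontal
# intercept ln(alpha).
h = np.log(x)
v = np.log(-np.log(1 - p))
beta, c = np.polyfit(h, v, 1)
alpha = np.exp(-c / beta)   # horizontal intercept: ln(alpha) = -c/beta
print(beta, alpha)          # roughly the beta ~ 9.6 and alpha ~ 59 read above
```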
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. What is the practical usefulness of the technique of probability plotting?

2. Explain how an approximate mean µ and standard deviation σ can be read off a plot of standard normal quantiles versus data quantiles.
3. Exercise 3 of Section 3.2 refers to the chemical process yield data of J. S. Hunter given in Exercise 1 of Section 3.1. There you were asked to make a normal plot of those data.
(a) If you have not already done so, use a computer package to make a version of the normal plot.
(b) Use your plot to derive an approximate mean and a standard deviation for the chemical process yields.

4. The article "Statistical Investigation of the Fatigue Life of Deep Groove Ball Bearings" by J. Leiblein and M. Zelen (Journal of Research of the National Bureau of Standards, 1956) contains the data given below on the lifetimes of 23 ball bearings. The units are 10⁶ revolutions before failure.

17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.40, 51.84, 51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40

(a) Use a normal plot to assess how well a normal distribution fits these data. Then determine if bearing load life can be better represented by a normal distribution if life is expressed on the log scale. (Take the natural logarithms of these data and make a normal plot.) What mean and standard deviation would you use in a normal description of log load life? For these parameters, what are the .05 quantiles of ln(life) and of life?
(b) Use the method of display (5.35) and investigate whether the Weibull distribution might be used to describe bearing load life. If a Weibull description is sensible, read appropriate parameter values from the plot. Then use the form of the Weibull cumulative probability function given in Section 5.2 to find the .05 quantile of the bearing load life distribution.

5. The data here are from the article "Fiducial Bounds on Reliability for the Two-Parameter Negative Exponential Distribution," by F. Grubbs (Technometrics, 1971). They are the mileages at first failure for 19 military personnel carriers.

162, 200, 271, 320, 393, 508, 539, 629, 706, 777, 884, 1008, 1101, 1182, 1462, 1603, 1984, 2355, 2880

(a) Make a histogram of these data. How would you describe its shape?
(b) Plot points (5.31) and make an exponential probability plot for these data. Does it appear that the exponential distribution can be used to model the mileage to failure of this kind of vehicle? In Example 15, a threshold service time of 7.5 seconds was suggested by a similar exponential probability plot. Does the present plot give a strong indication of the need for a threshold mileage larger than 0 if an exponential distribution is to be used here?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Example 17 (continued)

the data displayed in Table 3.4 (see page 74) and Figure 3.9 suggest, for example, that a sensible value for P[X = 18 and Y = 18] might be 1/34, the relative frequency of this pair in the data set. Similarly, the assignments

P[X = 18 and Y = 17] = 2/34
P[X = 14 and Y = 9] = 0

seem reasonable. Table 5.10 gives a joint probability function for X and Y built from the relative frequencies.
Table 5.10
f(x, y) for the Bolt Torque Problem (blank cells have probability 0)

y\x   11    12    13    14    15    16    17    18    19    20
20                                              2/34  2/34  1/34
19                                        2/34
18                1/34  1/34              1/34  1/34  1/34
17                            2/34  1/34  1/34  2/34
16                      1/34  2/34  2/34              2/34
15    1/34  1/34              3/34
14                            1/34              2/34
13                            1/34
Properties of a joint probability function

The probability function given in tabular form in Table 5.10 has two properties that are necessary for mathematical consistency. These are that the f(x, y) values are each in the interval [0, 1] and that they total to 1. By summing up just some of the f(x, y) values, probabilities associated with X and Y being configured in patterns of interest are obtained.
Example 17 (continued)

Consider using the joint distribution given in Table 5.10 to evaluate

P[X ≥ Y],   P[|X − Y| ≤ 1],   and   P[X = 17]
Take first P[X ≥ Y ], the probability that the measured bolt 3 torque is at least
as big as the measured bolt 4 torque. Figure 5.27 indicates with asterisks which
possible combinations of x and y lead to bolt 3 torque at least as large as the
bolt 4 torque. Referring to Table 5.10 and adding up those entries corresponding to the cells that contain asterisks,

P[X ≥ Y] = f(15, 13) + f(15, 14) + f(15, 15) + f(16, 16)
         + f(17, 17) + f(18, 14) + f(18, 17) + f(18, 18)
         + f(19, 16) + f(19, 18) + f(20, 20)
         = 1/34 + 1/34 + 3/34 + 2/34 + · · · + 1/34
         = 17/34
Similar reasoning allows evaluation of P[|X − Y| ≤ 1]—the probability that the bolt 3 and 4 torques are within 1 ft lb of each other. Figure 5.28 shows combinations of x and y with an absolute difference of 0 or 1. Then, adding probabilities corresponding to these combinations,

P[|X − Y| ≤ 1] = f(15, 14) + f(15, 15) + f(15, 16) + f(16, 16)
               + f(16, 17) + f(17, 17) + f(17, 18) + f(18, 17)
               + f(18, 18) + f(19, 18) + f(19, 20) + f(20, 20)
               = 18/34
x     11 12 13 14 15 16 17 18 19 20
y
20                             *
19                          *  *
18                       *  *  *
17                    *  *  *  *
16                 *  *  *  *  *
15              *  *  *  *  *  *
14           *  *  *  *  *  *  *
13        *  *  *  *  *  *  *  *

Figure 5.27 Combinations of bolt 3 and bolt 4 torques with x ≥ y
x     11 12 13 14 15 16 17 18 19 20
y
20                          *  *
19                       *  *  *
18                    *  *  *
17                 *  *  *
16              *  *  *
15           *  *  *
14        *  *  *
13     *  *  *

Figure 5.28 Combinations of bolt 3 and bolt 4 torques with |x − y| ≤ 1
Example 17 (continued)

Finally, P[X = 17], the probability that the measured bolt 3 torque is 17 ft lb, is obtained by adding down the x = 17 column in Table 5.10. That is,

P[X = 17] = f(17, 17) + f(17, 18) + f(17, 19) = 1/34 + 1/34 + 2/34 = 4/34
Finding marginal probability functions using a bivariate joint probability function

In bivariate problems like the present one, one can add down columns in a two-way table giving f(x, y) to get values for the probability function of X, f_X(x). And one can add across rows in the same table to get values for the probability function of Y, f_Y(y). One can then write these sums in the margins of the two-way table. So it should not be surprising that probability distributions for individual random variables obtained from their joint distribution are called marginal distributions. A formal statement of this terminology in the case of two discrete variables is next.
Definition 20  The individual probability functions for discrete random variables X and Y with joint probability function f(x, y) are called marginal probability functions. They are obtained by summing f(x, y) values over all possible values of the other variable. In symbols, the marginal probability function for X is

f_X(x) = Σ_y f(x, y)

and the marginal probability function for Y is

f_Y(y) = Σ_x f(x, y)
Example 17 Table 5.11 is a copy of Table 5.10, augmented by the addition of marginal
(continued ) probabilities for X and Y . Separating off the margins from the two-way table
produces tables of marginal probabilities in the familiar format of Section 5.1.
For example, the marginal probability function of Y is given separately in Table
5.12.
Table 5.11
Joint and Marginal Probabilities for X and Y

y\x     11    12    13    14    15    16    17    18    19    20   | f_Y(y)
20                                                2/34  2/34  1/34 | 5/34
19                                          2/34                   | 2/34
18                  1/34  1/34              1/34  1/34  1/34       | 5/34
17                              2/34  1/34  1/34  2/34             | 6/34
16                        1/34  2/34  2/34              2/34       | 7/34
15      1/34  1/34              3/34                               | 5/34
14                              1/34              2/34             | 3/34
13                              1/34                               | 1/34
f_X(x)  1/34  1/34  1/34  2/34  9/34  3/34  4/34  7/34  5/34  1/34
Table 5.12
Marginal
Probability
Function for Y
y f Y (y)
13 1/34
14 3/34
15 5/34
16 7/34
17 6/34
18 5/34
19 2/34
20 5/34
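Definition 20 translates directly into code. The sketch below (not from the text) stores the joint probability function as a dictionary and sums over the other variable; note that the placement of the 1/34 cells in columns 11–13 follows the reconstruction of Table 5.10 above.

```python
from fractions import Fraction

# Joint probability function of Tables 5.10/5.11, keyed by (x, y);
# values are the numerators over 34
f = {(15, 13): 1, (15, 14): 1, (18, 14): 2, (11, 15): 1, (12, 15): 1,
     (15, 15): 3, (14, 16): 1, (15, 16): 2, (16, 16): 2, (19, 16): 2,
     (15, 17): 2, (16, 17): 1, (17, 17): 1, (18, 17): 2, (13, 18): 1,
     (14, 18): 1, (17, 18): 1, (18, 18): 1, (19, 18): 1, (17, 19): 2,
     (18, 20): 2, (19, 20): 2, (20, 20): 1}
f = {xy: Fraction(num, 34) for xy, num in f.items()}

# Marginals per Definition 20: sum f(x, y) over the other variable
fX, fY = {}, {}
for (x, y), p in f.items():
    fX[x] = fX.get(x, 0) + p
    fY[y] = fY.get(y, 0) + p

print(fX[17])  # 2/17, i.e., the 4/34 of Table 5.11
print(fY[20])  # 5/34
```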
Figure 5.29 Two joint distributions with the same marginal distributions

Distribution 1
y\x     1    2    3   | f_Y(y)
3       .4   0    0   | .4
2       0    .4   0   | .4
1       0    0    .2  | .2
f_X(x)  .4   .4   .2

Distribution 2
y\x     1    2    3   | f_Y(y)
3       .16  .16  .08 | .4
2       .16  .16  .08 | .4
1       .08  .08  .04 | .2
f_X(x)  .4   .4   .2
example, in the bolt torque situation, a technician who has just loosened bolt 3 and measured its torque (X) as 15 ft lb ought to have expectations for bolt 4 torque (Y) somewhat different from those described by the marginal distribution in Table 5.12. After all, returning to the data in Table 3.4 that led to Table 5.10, the relative frequency distribution of bolt 4 torques for those components with bolt 3 torque of 15 ft lb is as in Table 5.13. Somehow, knowing that X = 15 ought to make a probability distribution for Y like the relative frequency distribution in Table 5.13 more relevant than the marginal distribution given in Table 5.12.
Table 5.13
Relative Frequency Distribution for Bolt 4 Torques When Bolt 3 Torque Is 15 ft lb

Bolt 4 Torque (ft lb)   Relative Frequency
13                      1/9
14                      1/9
15                      3/9
16                      2/9
17                      2/9
Definition 21  For discrete random variables X and Y with joint probability function f(x, y), the conditional probability function of X given Y = y is the function of x

f_{X|Y}(x | y) = f(x, y) / Σ_x f(x, y)

and the conditional probability function of Y given X = x is the function of y

f_{Y|X}(y | x) = f(x, y) / Σ_y f(x, y)

Since Σ_x f(x, y) = f_Y(y) and Σ_y f(x, y) = f_X(x), these can be rewritten as

The conditional probability function for X given Y = y:
f_{X|Y}(x | y) = f(x, y)/f_Y(y)    (5.36)

and

The conditional probability function for Y given X = x:
f_{Y|X}(y | x) = f(x, y)/f_X(x)    (5.37)
Finding conditional distributions from a joint probability function

And formulas (5.36) and (5.37) are perfectly sensible. Equation (5.36) says that starting from f(x, y) given in a two-way table and looking only at the row specified by Y = y, the appropriate (conditional) distribution for X is given by the probabilities in that row (the f(x, y) values) divided by their sum (f_Y(y) = Σ_x f(x, y)), so that they are renormalized to total to 1. Similarly, equation (5.37) says that looking only at the column specified by X = x, the appropriate conditional distribution for Y is given by the probabilities in that column divided by their sum.
Example 17 (continued)

To illustrate the use of equations (5.36) and (5.37), consider several of the conditional distributions associated with the joint distribution for the bolt 3 and bolt 4 torques, beginning with the conditional distribution for Y given that X = 15. From equation (5.37),

f_{Y|X}(y | 15) = f(15, y)/f_X(15)

Consulting Table 5.11 and dividing the x = 15 column entries by f_X(15) = 9/34 produces the conditional distribution for Y given in Table 5.14. Comparing this to Table 5.13, indeed formula (5.37) produces a conditional distribution that agrees with intuition.
Table 5.14
The Conditional Probability Function for Y Given X = 15

y    f_{Y|X}(y | 15)
13   (1/34) ÷ (9/34) = 1/9
14   (1/34) ÷ (9/34) = 1/9
15   (3/34) ÷ (9/34) = 3/9
16   (2/34) ÷ (9/34) = 2/9
17   (2/34) ÷ (9/34) = 2/9
Next consider the conditional distribution for Y given that X = 18. From equation (5.37),

f_{Y|X}(y | 18) = f(18, y)/f_X(18)

Consulting Table 5.11 again leads to the conditional distribution for Y given that X = 18, shown in Table 5.15. Tables 5.14 and 5.15 confirm that the conditional distributions of Y given X = 15 and given X = 18 are quite different. For example, knowing that X = 18 would on the whole make one expect Y to be larger than when X = 15.
Table 5.15
The Conditional
Probability Function for
Y Given X = 18
y f Y |X (y | 18)
14 2/7
17 2/7
18 1/7
20 2/7
To make sure that the meaning of equation (5.36) is also clear, consider the conditional distribution of the bolt 3 torque (X) given that the bolt 4 torque (Y) is 20 ft lb. From equation (5.36),

f_{X|Y}(x | 20) = f(x, 20)/f_Y(20)

and consulting Table 5.11 once more produces the conditional distribution shown in Table 5.16.
Table 5.16
The Conditional Probability Function for X Given Y = 20

x    f_{X|Y}(x | 20)
18   (2/34) ÷ (5/34) = 2/5
19   (2/34) ÷ (5/34) = 2/5
20   (1/34) ÷ (5/34) = 1/5
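Continuing the sketch that follows Table 5.12, formulas (5.36) and (5.37) amount to one dictionary lookup and one division:

```python
# Conditional probability functions per formulas (5.36) and (5.37),
# reusing the joint dictionary f and the marginals fX, fY computed above
def f_Y_given_X(y, x):
    # f_{Y|X}(y | x) = f(x, y) / f_X(x)
    return f.get((x, y), 0) / fX[x]

def f_X_given_Y(x, y):
    # f_{X|Y}(x | y) = f(x, y) / f_Y(y)
    return f.get((x, y), 0) / fY[y]

print(f_Y_given_X(15, 15))  # 1/3 (= 3/9), as in Table 5.14
print(f_X_given_Y(18, 20))  # 2/5, as in Table 5.16
```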
The bolt torque example has the feature that the conditional distributions for Y
given various possible values for X differ. Further, these are not generally the same
as the marginal distribution for Y . X provides some information about Y , in that
depending upon its value there are differing probability assessments for Y . Contrast
this with the following example.
Example 18 Intuition dictates that (in contrast to the situation of X and Y in Example 17) the
(continued ) variables U and V don’t furnish any information about each other. Regardless of
what value U takes, the relative frequency distribution of bolt 4 torques in the hat
is appropriate as the (conditional) probability distribution for V , and vice versa.
That is, not only do U and V share the common marginal distribution given in
Table 5.17 but it is also the case that for all u and v, both
fU |V (u | v) = fU (u) (5.38)
and
f V |U (v | u) = f V (v) (5.39)
Equations (5.38) and (5.39) say that the marginal probabilities in Table 5.17 also serve as conditional probabilities. They also specify how joint probabilities for U and V must be structured. That is, rewriting the left-hand side of equation (5.38) using expression (5.36),

f(u, v)/f_V(v) = f_U(u)
That is,

f(u, v) = f_U(u) f_V(v)    (5.40)
(The same logic applied to equation (5.39) also leads to equation (5.40).) Ex-
pression (5.40) says that joint probability values for U and V are obtained by
multiplying corresponding marginal probabilities. Table 5.18 gives the joint prob-
ability function for U and V .
Table 5.17
The Common Marginal
Probability Function for U
and V
u or v fU (u) or f V (v)
13 1/34
14 3/34
15 5/34
16 7/34
17 6/34
18 5/34
19 2/34
20 5/34
Table 5.18
Joint Probabilities for U and V (each entry is the indicated numerator divided by (34)²)

v\u   13   14   15   16   17   18   19   20  | f_V(v)
20     5   15   25   35   30   25   10   25  | 5/34
19     2    6   10   14   12   10    4   10  | 2/34
18     5   15   25   35   30   25   10   25  | 5/34
17     6   18   30   42   36   30   12   30  | 6/34
16     7   21   35   49   42   35   14   35  | 7/34
15     5   15   25   35   30   25   10   25  | 5/34
14     3    9   15   21   18   15    6   15  | 3/34
13     1    3    5    7    6    5    2    5  | 1/34
Example 18 suggests that the intuitive notion that several random variables are
unrelated might be formalized in terms of all conditional distributions being equal to
their corresponding marginal distributions. Equivalently, it might be phrased in terms
of joint probabilities being the products of corresponding marginal probabilities. The
formal mathematical terminology is that of independence of the random variables.
The definition for the two-variable case is next.
Definition 22  Discrete random variables X and Y are called independent if their joint probability function f(x, y) is the product of their respective marginal probability functions. That is, independence means that

f(x, y) = f_X(x) f_Y(y)   for all x, y    (5.41)

If formula (5.41) does not hold, the variables X and Y are called dependent. (Formula (5.41) does imply that conditional distributions are all equal to their corresponding marginals, so that the definition does fit its "unrelatedness" motivation.)
U and V in Example 18 are independent, whereas X and Y in Example 17 are dependent. Further, the two joint distributions depicted in Figure 5.29 give an example of a highly dependent joint distribution (the first) and one of independence (the second) that have the same marginals.

Independence of observations in statistical studies

The notion of independence is a fundamental one. When it is sensible to model random variables as independent, great mathematical simplicity results. Where engineering data are being collected in an analytical context, and care is taken to make sure that all obvious physical causes of carryover effects that might influence successive observations are minimal, an assumption of independence between observations is often appropriate. And in enumerative contexts, relatively small (compared to the population size) simple random samples yield observations that can typically be considered as at least approximately independent.
Example 18 (continued)

Again consider putting bolt torques on slips of paper in a hat. The method of torque selection described earlier for producing U and V is not simple random sampling. Simple random sampling as defined in Section 2.2 is without-replacement sampling, not the with-replacement sampling method used to produce U and V. Indeed, if the first slip is not replaced before the second is selected, the probabilities in Table 5.18 are not appropriate for describing U and V. For example, if no replacement is done, since only one slip is labeled 13 ft lb, one clearly wants

f_{V|U}(13 | 13) = 0

rather than

f_{V|U}(13 | 13) = f_V(13) = 1/34

Consider, however, a hypothetical situation in which each torque value is written on 100 slips, so that the hat contains N = 3,400 slips (100 of them labeled 13 ft lb), and two slips are drawn without replacement. Then

f_{V|U}(13 | 13) = 99/3,399

In general,

f_{V|U}(v | u) = f(u, v)/f_U(u)

so that

f(u, v) = f_{V|U}(v | u) f_U(u)

In particular,

f(13, 13) = (99/3,399) · (1/34)

and since 99/3,399 ≈ 1/34,

f(13, 13) ≈ (1/34) · (1/34)

For this hypothetical situation where the population size N = 3,400 is much larger than the sample size n = 2, independence is a suitable approximate description of observations obtained using simple random sampling.
Where several variables are both independent and have the same marginal distributions, some additional jargon is used: such random variables are said to be independent and identically distributed (iid). For example, the joint distribution of U and V given in Table 5.18 shows U and V to be iid random variables.

When can observations be modeled as iid?

The standard statistical examples of iid random variables are successive measurements taken from a stable process and the results of random sampling with replacement from a single population. The question of whether an iid model is appropriate in a statistical application thus depends on whether or not the data-generating mechanism being studied can be thought of as conceptually equivalent to these.
P[(X, Y) ∈ R] = ∬_R f(x, y) dx dy    (5.42)

Imagine that the true value of S will be measured with a (very imprecise) analog stopwatch, producing the random variable

R = the measured (excess) service time

[Figure 5.30: The joint probability density f(s, r), plotted as a surface over the (s, r)-plane (s from −10 to 30, r up to 10); the surface has bell-shaped cross sections in r centered along the line r = s]
Example 19 mean s and standard deviation .5.) Thus, equation (5.43) specifies a mathemati-
(continued ) cally legitimate joint probability density.
To illustrate the use of a joint probability density in finding probabilities, first
consider evaluating P[R > S]. Figure 5.31 shows the region in the (s, r )-plane
where f (s, r ) > 0 and r > s. It is over this region that one must integrate in
order to evaluate P[R > S]. Then,
P[R > S] = ∬_{r>s} f(s, r) ds dr
         = ∫₀^∞ ∫_s^∞ f(s, r) dr ds
         = ∫₀^∞ (1/16.5) e^(−s/16.5) { ∫_s^∞ (1/√(2π(.25))) e^(−(r−s)²/2(.25)) dr } ds
         = ∫₀^∞ (1/16.5) e^(−s/16.5) (1/2) ds
         = 1/2
(once again using the fact that the integral in braces is a normal (mean s and
standard deviation .5) probability).
As a second example, consider the problem of evaluating P[S > 20]. Figure
5.32 shows the region over which f (s, r ) must be integrated in order to evaluate
P[S > 20]. Then,
P[S > 20] = ∬_{s>20} f(s, r) ds dr
          = ∫₂₀^∞ ∫_{−∞}^∞ f(s, r) dr ds
          = ∫₂₀^∞ (1/16.5) e^(−s/16.5) { ∫_{−∞}^∞ (1/√(2π(.25))) e^(−(r−s)²/2(.25)) dr } ds
          = ∫₂₀^∞ (1/16.5) e^(−s/16.5) ds
          = e^(−20/16.5)
          ≈ .30
[Figure 5.31: The region in the (s, r)-plane where f(s, r) > 0 and r > s]

[Figure 5.32: The region in the (s, r)-plane where f(s, r) > 0 and s > 20]
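Both integrals can be checked by simulation. The sketch below (not from the text) draws from the joint model of equation (5.43) — S exponential with mean 16.5, and, given S = s, R normal with mean s and standard deviation .5 — and estimates the two probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1_000_000
s = rng.exponential(scale=16.5, size=m)   # true excess service times
r = rng.normal(loc=s, scale=.5)           # imprecise stopwatch readings

print(np.mean(r > s))    # should be near 1/2
print(np.mean(s > 20))   # should be near exp(-20/16.5) ~ .30
```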
The last part of the example essentially illustrates the fact that for X and Y with joint density f(x, y),

F(x) = P[X ≤ x] = ∫_{−∞}^x ∫_{−∞}^∞ f(t, y) dy dt
Definition 25  The individual probability densities for continuous random variables X and Y with joint probability density f(x, y) are called marginal probability densities. They are obtained by integrating f(x, y) over all possible values of the other variable. In symbols, the marginal probability density function for X is

f_X(x) = ∫_{−∞}^∞ f(x, y) dy    (5.44)

and the marginal probability density function for Y is

f_Y(y) = ∫_{−∞}^∞ f(x, y) dx    (5.45)
Compare Definitions 20 and 25 (page 282). The same kind of thing is done
for jointly continuous variables to find marginal distributions as for jointly discrete
variables, except that integration is substituted for summation.
Example 19 (continued)

Starting with the joint density specified by equation (5.43), it is possible to arrive at reasonably explicit expressions for the marginal densities for S and R. First considering the density of S, Definition 25 declares that for s > 0,

f_S(s) = ∫_{−∞}^∞ (1/16.5) e^(−s/16.5) (1/√(2π(.25))) e^(−(r−s)²/2(.25)) dr
       = (1/16.5) e^(−s/16.5)
That is, the form of f (s, r ) was chosen so that (as suggested by Example 15)
S has an exponential distribution with mean α = 16.5.
The determination of f_R(r) is conceptually no different than the determination of f_S(s), but the details are more complicated. Some work (involving completion of a square in the argument of the exponential function and recognition of an integral as a normal probability) will show the determined reader that for any r,

f_R(r) = ∫₀^∞ (1/(16.5 √(2π(.25)))) e^(−(s/16.5) − ((r−s)²/2(.25))) ds
       = (1/16.5) [1 − Φ((1/33) − 2r)] exp((1/2,178) − (r/16.5))    (5.46)

where Φ denotes the standard normal cumulative probability function.
[Figure 5.33: Graph of the marginal probability density f_R(r) for r between −5 and 30]
The marginal density for R derived from equation (5.43) does not belong to
any standard family of distributions. Indeed, there is generally no guarantee that the
process of finding marginal densities from a joint density will produce expressions
for the densities even as explicit as that in display (5.46).
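Even when a closed form like display (5.46) is out of reach, a marginal density can be evaluated numerically. The sketch below (not from the text) integrates the joint density in s for a few values of r and compares the result with (5.46).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def joint_integrand(s, r):
    # f(s, r) of equation (5.43): exponential in s times normal(s, .5) in r
    return (1 / 16.5) * np.exp(-s / 16.5) * norm.pdf(r, loc=s, scale=.5)

def f_R_closed_form(r):
    # Display (5.46)
    return (1 / 16.5) * (1 - norm.cdf(1 / 33 - 2 * r)) * np.exp(1 / 2178 - r / 16.5)

for r in (0.0, 5.0, 15.0):
    numeric, _ = quad(joint_integrand, 0, np.inf, args=(r,))
    print(r, numeric, f_R_closed_form(r))   # the two values agree
```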
Definition 26  For continuous random variables X and Y with joint probability density f(x, y), the conditional probability density function of X given Y = y is the function of x

f_{X|Y}(x | y) = f(x, y) / ∫_{−∞}^∞ f(x, y) dx    (5.47)

and the conditional probability density function of Y given X = x is the function of y

f_{Y|X}(y | x) = f(x, y) / ∫_{−∞}^∞ f(x, y) dy    (5.48)

[Figure 5.34: A joint density surface f(x, y); the conditional density of X given Y = y takes its shape from the cut of the surface made by the Y = y plane]
Expressions (5.47) and (5.48) are formally identical to the expressions (5.36) and
Geometry of (5.37) relevant for discrete variables. The geometry indicated by equation (5.47) is
conditional that the shape of f X |Y (x | y) as a function of x is determined by cutting the f (x, y)
densities surface in a graph like that in Figure 5.34 with the Y = y-plane. In Figure 5.34,
the divisor in equation (5.47) is the area of the shaded figure above the (x, y)-plane
below the f (x, y) surface on the Y = y plane. That division serves to produce a
function of x that will integrate to 1. (Of course, there is a corresponding geometric
story told for the conditional distribution of Y given X = x in expression (5.48)).
Example 19 (continued)

In the service time example, it is fairly easy to recognize the conditional distribution of R given S = s as having a familiar form. For s > 0, applying expression (5.48),

f_{R|S}(r | s) = f(s, r)/f_S(s) = f(s, r) ÷ (1/16.5) e^(−s/16.5)

so that

f_{R|S}(r | s) = (1/√(2π(.25))) e^(−(r−s)²/2(.25))    (5.49)

That is, given that S = s, the conditional distribution of R is normal with mean s and standard deviation .5.

This realization is consistent with the bell-shaped cross sections of f(s, r) shown in Figure 5.30. The form of f_{R|S}(r | s) given in equation (5.49) says that the measured excess service time is the true excess service time plus a normally distributed measurement error that has mean 0 and standard deviation .5.
It is evident from expression (5.49) (or from the way the positions of the bell-
shaped contours on Figure 5.30 vary with s) that the variables S and R ought to be
called dependent. After all, knowing that S = s gives the value of R except for a
normal error of measurement with mean 0 and standard deviation .5. On the other
hand, had it been the case that all conditional distributions of R given S = s were
the same (and equal to the marginal distribution of R), S and R should be called
independent. The notion of unchanging conditional distributions, all equal to their
corresponding marginal, is equivalently and more conveniently expressed in terms
of the joint probability density factoring into a product of marginals. The formal
version of this for two variables is next.
Definition 27  Continuous random variables X and Y are called independent if their joint probability density function f(x, y) is the product of their respective marginal probability densities. That is, independence means that

f(x, y) = f_X(x) f_Y(y)   for all x, y    (5.50)

If expression (5.50) does not hold, the variables X and Y are called dependent.
Example 20  Residence Hall Depot Counter Service Times and iid Variables (Example 15 revisited)

Returning once more to the service time example of Jenkins, Milbrath, and Worth, consider the next two excess service times encountered, say

S₁ = the first additional excess service time  and  S₂ = the second additional excess service time

To the extent that the service process is physically stable (i.e., excess service times can be thought of in terms of sampling with replacement from a single population), an iid model seems appropriate for S₁ and S₂. Treating excess service times as marginally exponential with mean α = 16.5 thus leads to the joint density for S₁ and S₂:

f(s₁, s₂) = (1/(16.5)²) e^(−(s₁+s₂)/16.5)   if s₁ > 0 and s₂ > 0
            0                               otherwise
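Because the joint density factors, probabilities about (S₁, S₂) factor as well. A small sketch (the 20 sec cutoff is chosen here purely for illustration):

```python
import math

alpha = 16.5
p_each = math.exp(-20 / alpha)   # P[S1 > 20] for an exponential with mean 16.5
p_both = p_each ** 2             # P[S1 > 20 and S2 > 20], by independence
print(p_each, p_both)            # ~.30 and ~.09
```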
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Explain in qualitative terms what it means for two random variables X and Y to be independent. What advantage is there when X and Y can be described as independent?

2. Quality audit records are kept on numbers of major and minor failures of circuit packs during burn-in of large electronic switching devices. They indicate that for a device of this type, the random variables

X = the number of major failures

and

Y = the number of minor failures

can be described at least approximately by the accompanying joint distribution.

y\x   0     1     2
0     .15   .05   .01
1     .10   .08   .01
2     .10   .14   .02
3     .10   .08   .03
4     .05   .05   .03

(a) Find the marginal probability functions for both X and Y — f_X(x) and f_Y(y).
(b) Are X and Y independent? Explain.
(c) Find the mean and variance of X — EX and Var X.
(d) Find the mean and variance of Y — EY and Var Y.
(e) Find the conditional probability function for Y, given that X = 0 — i.e., that there are no major circuit pack failures. (That is, find f_{Y|X}(y | 0).) What is the mean of this conditional distribution?

3. A laboratory receives four specimens having identical appearances. However, it is possible that (a single unknown) one of the specimens is contaminated with a toxic material. The lab must test the specimens to find the toxic specimen (if in fact one is contaminated). The testing plan first put forth by the laboratory staff is to test the specimens one at a time, stopping when (and if) a contaminated specimen is found. Define two random variables

X = the number of contaminated specimens

and

Y = the number of specimens tested

Let p = P[X = 0] and therefore P[X = 1] = 1 − p.
(a) Give the conditional distributions of Y given X = 0 and X = 1 for the staff's initial testing plan. Then use them to determine the joint probability function of X and Y. (Your joint distribution will involve p, and you may simply fill out tables like the accompanying ones.)

y    f_{Y|X}(y | 0)        y    f_{Y|X}(y | 1)
1    ____                  1    ____
2    ____                  2    ____
3    ____                  3    ____
4    ____                  4    ____
the cumulative probability function for T is

F(t) = 1 − P[X > t and Y > t]

so that your answer to (b) can be used to find the distribution for T. Use your answer to (b) and some differentiation to find the probability density for T. What kind of distribution does T have? What is its mean?
Suppose now that the system is a parallel system (i.e., one that fails only when both subsystems fail).
(d) The probability that the system has failed by time t is

P[X ≤ t and Y ≤ t]

Find this probability using your answer to part (a).
(e) Now, as before, let T be the time until the system fails. Use your answer to (d) and some differentiation to find the probability density for T. Then calculate the mean of T.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
U = g(X, Y, . . . , Z ) (5.51)
In some special simple cases, it is possible to figure out exactly what distribution U
inherits from X, Y, . . . , Z .
Table 5.19
Relative Frequency Distribution of Plate Thicknesses

x, Thickness (in.)   Relative Frequency
.148                 .4
.149                 .3
.150                 .3

Table 5.20
Relative Frequency Distribution of Slot Widths

y, Width (in.)   Relative Frequency
.153             .2
.154             .2
.155             .4
.156             .2

The plate thicknesses measured on a lot of plates have the relative frequency distribution in Table 5.19; a relative frequency distribution for the slot widths measured on a lot of machined blocks is given in Table 5.20. If a plate is randomly selected and a block is separately randomly selected, a natural joint distribution for the random variables

X = the plate thickness  and  Y = the slot width

is one of independence, where f(x, y) = f_X(x) f_Y(y) as shown in Table 5.21. Consider the clearance involved in placing the plate in the slot,

U = Y − X
Notice that taking the extremes represented in Tables 5.19 and 5.20, U is guaran-
teed to be at least .153 − .150 = .003 in. but no more than .156 − .148 = .008 in.
In fact, much more than this can be said. Looking at Table 5.21, one can see that
the diagonals of entries (lower left to upper right) all correspond to the same value
of Y − X. Adding probabilities on those diagonals produces the distribution of
U given in Table 5.22.
Table 5.21
Joint Probabilities for X and Y

y\x      .148   .149   .150  | f_Y(y)
.156     .08    .06    .06   | .2
.155     .16    .12    .12   | .4
.154     .08    .06    .06   | .2
.153     .08    .06    .06   | .2
f_X(x)   .4     .3     .3
Table 5.22
The Probability Function for the
Clearance U = Y − X
u f (u)
.003 .06
.004 .12 = .06 + .06
.005 .26 = .08 + .06 + .12
.006 .26 = .08 + .12 + .06
.007 .22 = .16 + .06
.008 .08
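The diagonal bookkeeping behind Table 5.22 is easy to automate. The sketch below (not from the text) enumerates the independent joint distribution and accumulates probability by value of u = y − x; the marginal probabilities follow Tables 5.19 and 5.20 as reconstructed above.

```python
from fractions import Fraction
from collections import defaultdict

fX = {Fraction(148, 1000): Fraction(4, 10), Fraction(149, 1000): Fraction(3, 10),
      Fraction(150, 1000): Fraction(3, 10)}
fY = {Fraction(153, 1000): Fraction(2, 10), Fraction(154, 1000): Fraction(2, 10),
      Fraction(155, 1000): Fraction(4, 10), Fraction(156, 1000): Fraction(2, 10)}

fU = defaultdict(Fraction)
for x, px in fX.items():
    for y, py in fY.items():
        fU[y - x] += px * py   # independence: f(x, y) = fX(x) fY(y)

for u in sorted(fU):
    print(f"u = {float(u):.3f}   f(u) = {float(fU[u]):.2f}")  # matches Table 5.22
```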
Example 21 involves a very simple discrete joint distribution and a very simple
function g—namely, g(x, y) = y − x. In general, exact complete solution of the
problem of finding the distribution of U = g(X, Y, . . . , Z ) is not practically possi-
ble. Happily, for many engineering applications of probability, approximate and/or
partial solutions suffice to answer the questions of practical interest. The balance
of this section studies methods of producing these approximate and/or partial de-
scriptions of the distribution of U , beginning with a brief look at simulation-based
methods.
The authors further give some uncertainty values associated with each of the terms
appearing on the right side of equation (5.52) for an example set of measured
values of the variables. These are given in Table 5.23.
Table 5.23
Reported Uncertainties in the Measured Inputs
to Collector Efficiency
Example 22 Plugging the measured values from Table 5.23 into formula (5.52) produces
(continued ) a measured efficiency of about .44. But how good is the .44 value? That is, how
do the uncertainties associated with the measured values affect the reliability of
the .44 figure? Should you think of the calculated solar collector efficiency as .44
plus or minus .001, or plus or minus .1, or what?
One way of approaching this is to ask the related question, “What would
be the standard deviation of Efficiency if all of C through To were independent
random variables with means approximately equal to the measured values and
standard deviations related to the uncertainties as, say, half of the uncertainty
values?” (This “two sigma” interpretation of uncertainty appears to be at least
close to the intention in the original article.)
Printout 1 is from a MINITAB session in which 100 normally distributed
realizations of variables C through To were generated (using means equal to
measured values and standard deviations equal to half of the corresponding
uncertainties) and the resulting efficiencies calculated. (The routine under the
“Calc/Random Data/Normal” menu was used to generate the realizations of
C through To . The “Calc/Calculator” menu was used to combine these val-
ues according to equation (5.52). Then routines under the “Stat/Basic Statis-
tics/Describe” and “Graph/Character Graphs/Stem-and-Leaf” menus were used
to produce the summaries of the simulated efficiencies.) The simulation produced
a roughly bell-shaped distribution of calculated efficiencies, possessing a mean
value of approximately .437 and standard deviation of about .009. Evidently,
if one continues with the understanding that uncertainty means something like
“2 standard deviations,” an uncertainty of about .02 is appropriate for the nominal
efficiency figure of .44.
Printout 1  Stem-and-leaf display of the 100 simulated efficiencies (MINITAB)

  5   41  58899
 10   42  22334
 24   42  66666777788999
 39   43  001112233333444
(21)  43  555556666777889999999
 40   44  00000011122333444
 23   44  555556667788889
  8   45  023344
  2   45  7
  1   46  0
The beauty of Example 22 is the ease with which a simulation can be employed
to approximate the distribution of U . But the method is so powerful and easy to use
that some cautions need to be given about the application of this whole topic before
going any further.
Practical cautions

Be careful not to expect more than is sensible from a derived probability distribution ("exact" or approximate) for

U = g(X, Y, . . . , Z)
The output distribution can be no more realistic than are the assumptions used
to produce it (i.e., the form of the joint distribution and the form of the function
g(x, y, . . . , z)). It is all too common for people to apply the methods of this section
using a g representing some approximate physical law and U some measurable
physical quantity, only to be surprised that the variation in U observed in the real
world is substantially larger than that predicted by methods of this section. The fault
lies not with the methods, but with the naivete of the user. Approximate physical
laws are just that, often involving so-called constants that aren’t constant, using
functional forms that are too simple, and ignoring the influence of variables that
aren’t obvious or easily measured. Further, although independence of X, Y, . . . , Z
is a very convenient mathematical property, its use is not always justified. When
it is inappropriately used as a model assumption, it can produce an inappropriate
distribution for U . For these reasons, think of the methods of this section as useful
but likely to provide only a best-case picture of the variation you should expect
to see.
Proposition 1 states that for independent random variables X, Y, . . . , Z and constants a₀, a₁, . . . , aₙ, the variable U = a₀ + a₁X + a₂Y + · · · + aₙZ has mean

EU = a₀ + a₁EX + a₂EY + · · · + aₙEZ    (5.53)

and variance

Var U = a₁² Var X + a₂² Var Y + · · · + aₙ² Var Z    (5.54)
Example 21 (continued)

Consider again the situation of the clearance involved in placing a steel plate in a machined slot on a steel block. With X, Y, and U being (respectively) the plate thickness, slot width, and clearance, means and variances for these variables can be calculated from Tables 5.19, 5.20, and 5.22, respectively. The reader is encouraged to verify that

EX ≈ .1489 in.,  Var X ≈ 6.9 × 10⁻⁷ in.²,  EY ≈ .1546 in.,  Var Y ≈ 1.04 × 10⁻⁶ in.²

Now, since

U = Y − X = (−1)X + 1Y

Proposition 1 gives

EU = (−1)EX + (1)EY ≈ .0057 in.
Var U = (−1)² Var X + (1)² Var Y ≈ 1.7 × 10⁻⁶ in.²

so that

√Var U ≈ .0013 in.

It is worth the effort to verify that the mean and standard deviation of the clearance produced using Proposition 1 agree with those obtained using the distribution of U given in Table 5.22 and the formulas for the mean and variance given in Section 5.1. The advantage of using Proposition 1 is that if all that is needed are EU and Var U, there is no need to go through the intermediate step of deriving the distribution of U.
One particularly important application of Proposition 1 is to the sample mean of n iid random variables X₁, X₂, . . . , Xₙ, each with mean µ and variance σ². The sample mean

X̄ = (1/n)X₁ + (1/n)X₂ + · · · + (1/n)Xₙ

is a linear combination of independent variables with a₀ = 0 and a₁ = a₂ = · · · = aₙ = 1/n, so Proposition 1 gives

The mean of an average of n iid random variables:
E X̄ = (1/n)EX₁ + (1/n)EX₂ + · · · + (1/n)EXₙ = n((1/n)µ) = µ    (5.55)

and

The variance of an average of n iid random variables:
Var X̄ = (1/n)² Var X₁ + (1/n)² Var X₂ + · · · + (1/n)² Var Xₙ = n((1/n)²σ²) = σ²/n    (5.56)
Since σ²/n is decreasing in n, equations (5.55) and (5.56) give the reassuring picture of X̄ having a probability distribution centered at the population mean µ, with spread that decreases as the sample size increases.
Example 23 (continued)

Consider what means and standard deviations are associated with the probability distributions of the sample average, S̄, of first the next 4 and then the next 100 excess service times.

S₁, S₂, . . . , S₁₀₀ are, to the extent that the service process is physically stable, reasonably modeled as independent, identically distributed, exponential random variables with mean α = 16.5. The exponential distribution with mean α = 16.5 has variance equal to α² = (16.5)². So, using formulas (5.55) and (5.56), for the first 4 additional service times,

E S̄ = α = 16.5 sec  and  √Var S̄ = √(α²/4) = 8.25 sec

while for the full 100 additional service times,

E S̄ = α = 16.5 sec  and  √Var S̄ = √(α²/100) = 1.65 sec
Relationships (5.55) and (5.56), which perfectly describe the random behavior of X̄ under random sampling with replacement, are also approximate descriptions of the behavior of X̄ under simple random sampling in enumerative contexts. (Recall Example 18 and the discussion about the approximate independence of observations resulting from simple random sampling of large populations.)
For U = g(X, Y, . . . , Z) where g is not exactly linear, one can appeal to a first-order Taylor approximation of g about the point of means,

g(x, y, . . . , z) ≈ g(EX, EY, . . . , EZ) + (∂g/∂x)(x − EX) + (∂g/∂y)(y − EY) + · · · + (∂g/∂z)(z − EZ)    (5.57)

where the partial derivatives are evaluated at (x, y, . . . , z) = (EX, EY, . . . , EZ). Now the right side of approximation (5.57) is linear in x, y, . . . , z. Thus, if the variances of X, Y, . . . , Z are small enough so that with high probability, X, Y, . . . , Z are such that approximation (5.57) is effective, one might think of plugging X, Y, . . . , Z into expression (5.57) and applying Proposition 1, thus winding up with approximations for the mean and variance of U = g(X, Y, . . . , Z). For independent X, Y, . . . , Z, this reasoning leads to

EU ≈ g(EX, EY, . . . , EZ)    (5.58)

and

Var U ≈ (∂g/∂x)² Var X + (∂g/∂y)² Var Y + · · · + (∂g/∂z)² Var Z    (5.59)
Formulas (5.58) and (5.59) are often called the propagation of error or transmis-
sion of variance formulas. They describe how variability or error is propagated or
transmitted through an exact mathematical function.
Comparison of Propositions 1 and 2 shows that when g is exactly linear, ex-
pressions (5.58) and (5.59) reduce to expressions (5.53) and (5.54), respectively.
(a1 through an are the partial derivatives of g in the case where g(x, y, . . . , z) =
a0 + a1 x + a2 y + · · · + an z.) Proposition 2 is purposely vague about when the
approximations (5.58) and (5.59) will be adequate for engineering purposes. Mathe-
matically inclined readers will not have much trouble constructing examples where
the approximations are quite poor. But often in engineering applications, expres-
sions (5.58) and (5.59) are at least of the right order of magnitude and certainly
better than not having any usable approximations.
The resistance of the assembly is

R = g(R₁, R₂, R₃) = R₁ + R₂R₃/(R₂ + R₃)    (5.60)

so the relevant partial derivatives are

∂g/∂r₁ = 1

∂g/∂r₂ = [(r₂ + r₃)r₃ − r₂r₃]/(r₂ + r₃)² = r₃²/(r₂ + r₃)²

∂g/∂r₃ = [(r₂ + r₃)r₂ − r₂r₃]/(r₂ + r₃)² = r₂²/(r₂ + r₃)²

Also, R₁, R₂, and R₃ are approximately independent with means 100 and standard deviations 2. Then formulas (5.58) and (5.59) suggest that the probability distribution inherited by R has mean

E R ≈ g(100, 100, 100) = 100 + (100)(100)/(100 + 100) = 150

and variance

Var R ≈ (1)²(2)² + ((100)²/(100 + 100)²)²(2)² + ((100)²/(100 + 100)²)²(2)² = 4.5
As something of a check on how good the 150 and 2.12 values are, 1,000 sets of normally distributed R₁, R₂, and R₃ values with the specified population mean and standard deviation were simulated and resulting values of R calculated via formula (5.60). These simulated assembly resistances had mean R̄ = 149.80 and a sample standard deviation of 2.14. A histogram of these values is given in Figure 5.36.
5.5 Functions of Several Random Variables 313
Resistor 2
Resistor 1
Resistor 3
200
Frequency
100
0
145 150 155
Simulated value of R
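The propagation of error calculation and its simulation check can be reproduced in a few lines. This sketch (not the authors' MINITAB session) uses the partial derivatives computed above, evaluated at the means.

```python
import numpy as np

mu, sd = 100.0, 2.0

# Partial derivatives of g(r1, r2, r3) = r1 + r2*r3/(r2 + r3) at the means
d1 = 1.0
d2 = mu**2 / (mu + mu)**2   # r3^2/(r2 + r3)^2
d3 = mu**2 / (mu + mu)**2   # r2^2/(r2 + r3)^2

ER = mu + mu * mu / (mu + mu)                          # (5.58): ~150
VarR = d1**2 * sd**2 + d2**2 * sd**2 + d3**2 * sd**2   # (5.59): ~4.5
print(ER, VarR**.5)   # ~150 and ~2.12

# Monte Carlo check
rng = np.random.default_rng(0)
r1, r2, r3 = (rng.normal(mu, sd, 100_000) for _ in range(3))
R = r1 + r2 * r3 / (r2 + r3)
print(R.mean(), R.std())   # close to the propagation-of-error values
```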
[Figure 5.37: Illustration of the effect of ∂g/∂x on Var U for U = g(X) — where ∂g/∂x is relatively small, a given spread in X produces a small spread in U; where ∂g/∂x is relatively large, the same spread in X produces a much larger spread in U]
The effects of the partial derivatives of g on Var U

Consider first the effect that g's partial derivatives have on Var U. Formula (5.59) implies that depending on the size of ∂g/∂x, the variance of X is either inflated or deflated before becoming an ingredient of Var U. And even though formula (5.59) may not be an exact expression, it provides correct intuition. If a given change in x produces a big change in g(x, y, . . . , z), the impact Var X has on Var U will be greater than if the change in x produces a small change in g(x, y, . . . , z). Figure 5.37 is a rough illustration of this point. In the case that U = g(X), two different approximately normal distributions for X with different means but a common variance produce radically different spreads in the distribution of U, due to differing rates of change of g (different derivatives).
Partitioning the variance of U

Then, consider the possibility of partitioning the variance of U into interpretable pieces. Formula (5.59) suggests thinking of (for example)

(∂g/∂x)² Var X

as the part of the variance of U that is attributable to the variation in X.
Example 22 (continued)

Return to the solar collector example. For means of C through To taken to be the measured values in Table 5.23 (page 305), and standard deviations of C through To equal to half of the uncertainties listed in the same table, formula (5.59) might well be applied to the calculated efficiency given in formula (5.52). The squared partial derivatives of Efficiency with respect to each of the inputs, times the variances of those inputs, are as given in Table 5.24. Thus, the approximate standard deviation for the efficiency variable provided by formula (5.59) is

√(8.28 × 10⁻⁵) ≈ .009

which agrees quite well with the value obtained earlier via simulation.
What’s given in Table 5.24 that doesn’t come out of a simulation is some
understanding of the biggest contributors to the uncertainty. The largest contri-
bution listed in Table 5.24 corresponds to variable G, followed in order by those
corresponding to variables Mo , To , and Ti . At least for the values of the means
used in this example, it is the uncertainties in those variables that principally
produce the uncertainty in Efficiency. Knowing this gives direction to efforts to
improve measurement methods. Subject to considerations of feasibility and cost,
measurement of the variable G deserves first attention, followed by measurement
of the variables Mo , To , and Ti .
Notice, however, that reduction of the uncertainty in G alone to essentially 0 would still leave a total in Table 5.24 of about 4.01 × 10⁻⁵ and thus an approximate standard deviation for Efficiency of about √(4.01 × 10⁻⁵) ≈ .006. Calculations of this kind emphasize the need for reductions in the uncertainties of Mo, To, and Ti as well, if dramatic (order of magnitude) improvements in overall uncertainty are to be realized.
Table 5.24
Contributions to the Output Variation in Collector Efficiency

Input Variable   (∂g/∂x)² Var(Input)
C                4.73 × 10⁻⁸
G                4.27 × 10⁻⁵
A                4.76 × 10⁻⁷
Mi               5.01 × 10⁻⁷
Mo               1.58 × 10⁻⁵
Ta               3.39 × 10⁻⁸
Ti               1.10 × 10⁻⁵
To               1.22 × 10⁻⁵
Proposition 3 (the central limit effect)  If X₁, X₂, . . . , Xₙ are iid random variables with mean µ and variance σ², then for large n, the sample mean X̄ is approximately normally distributed with mean µ and variance σ²/n.

A proof of Proposition 3 is outside the purposes of this text. But intuition about the effect is fairly easy to develop through an example.
Example 25 The Central Limit Effect and the Sample Mean of Tool Serial Numbers
(Example 2 revisited )
Consider again the example from Section 5.1 involving the last digit of essentially
randomly selected serial numbers of pneumatic tools. Suppose now that
W1 = the last digit of the serial number observed next Monday at 9 A.M.
W2 = the last digit of the serial number observed the following Monday at 9 A.M.
A plausible model for the pair of random variables W1 , W2 is that they are
independent, each with the marginal probability function
f(w) = .1   if w = 0, 1, 2, . . . , 9
       0    otherwise                   (5.61)
[Figure 5.38: The probability function f(w) — f(w) = .1 for each of w = 0, 1, . . . , 9; EW = 4.5 and Var W = 8.25]

[Figure 5.39: The probability function of W̄ for n = 2; E W̄ = 4.5 and Var W̄ = 8.25/2 = 4.125]
Table 5.25
The Probability Function for W̄ for n = 2

w̄     f(w̄)    w̄     f(w̄)    w̄     f(w̄)    w̄     f(w̄)    w̄     f(w̄)
0.0   .01     2.0   .05     4.0   .09     6.0   .07     8.0   .03
0.5   .02     2.5   .06     4.5   .10     6.5   .06     8.5   .02
1.0   .03     3.0   .07     5.0   .09     7.0   .05     9.0   .01
1.5   .04     3.5   .08     5.5   .08     7.5   .04
Comparing Figures 5.38 and 5.39, it is clear that even for a completely flat/uniform underlying distribution of W and the small sample size of n = 2, the probability distribution of W̄ looks far more bell-shaped than the underlying distribution. It is clear why this is so. As you move away from the mean or central value of W̄, there are relatively fewer and fewer combinations of w₁ and w₂ that can produce a given value of w̄. For example, to observe W̄ = 0, you must have W₁ = 0 and W₂ = 0—that is, you must observe not one but two extreme values. On the other hand, there are ten different combinations of w₁ and w₂ that lead to W̄ = 4.5.

It is possible to use the same kind of logic leading to Table 5.25 to produce exact probability distributions for W̄ based on larger sample sizes n. But such
Example 25 (continued)

work is tedious, and for the purpose of indicating roughly how the central limit effect takes over as n gets larger, it is sufficient to approximate the distribution of W̄ via simulation for a larger sample size. To this end, 1,000 sets of values for iid variables W₁, W₂, . . . , W₈ (with marginal distribution (5.61)) were simulated and each set averaged to produce 1,000 simulated values of W̄ based on n = 8. Figure 5.40 is a histogram of these 1,000 values. Notice the bell-shaped character of the plot. (The simulated mean of W̄ was 4.508 ≈ 4.5 = E W̄ = EW, while the variance of W̄ was 1.025 ≈ 1.031 = Var W̄ = 8.25/8, in close agreement with formulas (5.55) and (5.56).)
[Figure 5.40: Histogram of the 1,000 simulated values of W̄ based on n = 8 — frequency versus mean of n = 8 W's (0 to 9), showing a bell-shaped distribution]
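A simulation in the spirit of Figure 5.40 (a sketch, not the authors' code):

```python
import numpy as np

# 1,000 realizations of the average of n = 8 iid digits uniform on {0, ..., 9}
rng = np.random.default_rng(0)
w_bar = rng.integers(0, 10, size=(1000, 8)).mean(axis=1)

print(w_bar.mean())   # near E(W-bar) = 4.5
print(w_bar.var())    # near Var(W-bar) = 8.25/8 ~ 1.03
```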
Sample size and the central limit effect

What constitutes "large n" in Proposition 3 isn't obvious. The truth of the matter is that what sample size is required before X̄ can be treated as essentially normal depends on the shape of the underlying distribution of a single observation. Underlying distributions with decidedly nonnormal shapes require somewhat bigger values of n. But for most engineering purposes, n ≥ 25 or so is adequate to make X̄ essentially normal for the majority of data-generating mechanisms met in practice. (The exceptions are those subject to the occasional production of wildly outlying values.) Indeed, as Example 25 suggests, in many cases X̄ is essentially normal for sample sizes much smaller than 25.

The practical usefulness of Proposition 3 is that in many circumstances, only a normal table is needed to evaluate probabilities for sample averages.
Example 23 (continued)

Return one more time to the stamp sale time requirements problem and consider observing and averaging the next n = 100 excess service times to produce the sample mean S̄. As before,

E S̄ = α = 16.5 sec  and  √Var S̄ = √(α²/100) = 1.65 sec

are appropriate for S̄, via formulas (5.55) and (5.56). Further, in view of the fact that n = 100 is large, the normal probability table may be used to find approximate probabilities for S̄. Figure 5.41 shows an approximate distribution for S̄ and the area corresponding to P[S̄ > 17].

[Figure 5.41: Approximate (normal) probability distribution for S̄, centered at 16.5, with the area to the right of 17 shaded]

The z-value corresponding to s̄ = 17 is

z = (17 − 16.5)/1.65 = .30

so that P[S̄ > 17] ≈ P[Z > .30] = 1 − Φ(.30) ≈ .38
In general, it is the quantity

z-value for a sample mean:
z = (x̄ − E X̄)/√Var X̄ = (x̄ − µ)/(σ/√n)    (5.62)

that is
appropriate when using the central limit theorem to find approximate probabilities for a sample mean. Formula (5.62) is relevant because by Proposition 3, X̄ is approximately normal for large n, and formulas (5.55) and (5.56) give its mean and standard deviation.
The final example in this section illustrates how the central limit theorem and
some idea of a process or population standard deviation can help guide the choice
of sample size in statistical applications.
µ − .3 < V < µ + .3

Figure 5.42 pictures the situation. The .90 quantile of the standard normal distribution is roughly 1.28—that is, P[−1.28 < Z < 1.28] = .8. So evidently Figure 5.42 indicates that µ + .3 should have z-value 1.28. That is, you want

1.28 = ((µ + .3) − µ) / (1.6/√n)

or

.3 = 1.28 (1.6/√n)

So, solving for n, a sample size of n ≈ 47 would be required to provide the kind of precision of measurement desired.

[Figure 5.42: Approximate (normal) probability distribution of V, centered at µ, with .80 probability between µ − .3 and µ + .3]
Section 5 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. A type of nominal 3/4 inch plywood is made of five layers. These layers can be thought of as having thicknesses roughly describable as independent random variables with means and standard deviations as follows:

Layer   Mean (in.)   Standard Deviation (in.)
1       .094         .001
2       .156         .002
3       .234         .002
4       .172         .002
5       .094         .001

Find the mean and standard deviation of total thickness associated with the combination of these individual values.

2. The coefficient of linear expansion of brass is to be obtained as a laboratory exercise. For a brass bar that is L₁ meters long at T₁°C and L₂ meters long at T₂°C, this coefficient is

α = (L₂ − L₁) / (L₁(T₂ − T₁))

Suppose that the equipment to be used in the laboratory is thought to have a standard deviation for repeated length measurements of about .00005 m
and a standard deviation for repeated temperature measurements of about .1°C.
(a) If using T₁ ≈ 50°C and T₂ ≈ 100°C, L₁ ≈ 1.00000 m and L₂ ≈ 1.00095 m are obtained, and it is desired to attach an approximate standard deviation to the derived value of α, find such an approximate standard deviation two different ways. First, use simulation as was done in Printout 1. Then use the propagation of error formula. How well do your two values agree?
(b) In this particular lab exercise, the precision of which measurements (the lengths or the temperatures) is the primary limiting factor in the precision of the derived coefficient of linear expansion? Explain.
(c) Within limits, the larger T₂ − T₁, the better the value for α. What (in qualitative terms) is the physical origin of those limits?

3. Consider again the random number generator discussed in Exercise 1 of Section 5.2. Suppose that it is used to generate 25 random numbers and that these may reasonably be thought of as independent random variables with common individual (marginal) distribution as given in Exercise 1 of Section 5.2. Let X̄ be the sample mean of these 25 values.
(a) What are the mean and standard deviation of the random variable X̄?
(b) What is the approximate probability distribution of X̄?
(c) Approximate the probability that X̄ exceeds .5.
(d) Approximate the probability that X̄ takes a value within .02 of its mean.
(e) Redo parts (a) through (d) using a sample size of 100 instead of 25.

4. Passing a large production run of piston rings through a grinding operation produces edge widths possessing a standard deviation of .0004 in. A simple random sample of rings is to be taken and their edge widths measured, with the intention of using X̄ as an estimate of the population mean thickness µ. Approximate the probabilities that X̄ is within .0001 in. of µ for samples of size n = 25, 100, and 400.

5. A pendulum swinging through small angles approximates simple harmonic motion. The period of the pendulum, τ, is (approximately) given by

τ = 2π √(L/g)

where L is the length of the pendulum and g is the acceleration due to gravity. This fact can be used to derive an experimental value for g. Suppose that the length L of about 5 ft can be measured with a standard deviation of about .25 in. (about .0208 foot), and the period τ of about 2.48 sec can be measured with standard deviation of about .1 sec. What is a reasonable standard deviation to attach to a value of g derived using this equipment? Is the precision of the length measurement or the precision of the period measurement the principal limitation on the precision of the derived g?
Chapter 5 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Suppose 90% of all students taking a beginning programming course fail to get their first program to run on first submission. Use a binomial distribution and assign probabilities to the possibilities that among a group of six such students,
(a) all fail on their first submissions
(b) at least four fail on their first submissions
(c) less than four fail on their first submissions
Continuing to use this binomial model,
(d) what is the mean number who will fail?
(e) what are the variance and standard deviation of the number who will fail?

2. Suppose that for single launches of a space shuttle, there is a constant probability of O-ring failure (say, .15).
Consider ten future launches, and let X be the number of those involving an O-ring failure. Use an appropriate probability model and evaluate all of the following:
(a) P[X = 2]  (b) P[X ≥ 1]
(c) EX  (d) Var X
(e) the standard deviation of X

3. An injection molding process for making auto bumpers leaves an average of 1.3 visual defects per bumper prior to painting. Let Y and Z be the numbers of visual defects on (respectively) the next two bumpers produced. Use an appropriate probability distribution and evaluate the following:
(a) P[Y = 2]  (b) P[Y ≥ 1]
(c) √Var Y
(d) P[Y + Z ≥ 2] (Hint: What is a sensible distribution for Y + Z, the number of blemishes on two bumpers?)

4. Suppose that the random number generator supplied in a pocket calculator actually generates values in such a way that if X is the next value generated, X can be adequately described using a probability density of the form

f(x) = k((x − .5)² + 1)   for 0 < x < 1
       0                  otherwise

(a) Evaluate k and sketch a graph of f(x).
(b) Evaluate P[X ≥ .5], P[X > .5], P[.75 > X ≥ .5], and P[|X − .5| ≥ .2].
(c) Compute EX and Var X.
(d) Compute and graph F(x), the cumulative probability function for X. Read from your graph the .8 quantile of the distribution of X.

5. Suppose that Z is a standard normal random variable. Evaluate the following probabilities involving Z:
(a) P[Z ≤ 1.13]  (b) P[Z > −.54]
(c) P[−1.02 < Z < .06]  (d) P[|Z| ≤ .25]
(e) P[|Z| > 1.51]  (f) P[−3.0 < Z < 3.0]
Find numbers # such that the following statements about Z are true:
(g) P[|Z| < #] = .80  (h) P[Z < #] = .80
(i) P[|Z| > #] = .04

6. Suppose that X is a normal random variable with mean µ = 10.2 and standard deviation σ = .7. Evaluate the following probabilities involving X:
(a) P[X ≤ 10.1]  (b) P[X > 10.5]
(c) P[9.0 < X < 10.3]  (d) P[|X − 10.2| ≤ .25]
(e) P[|X − 10.2| > 1.51]
Find numbers # such that the following statements about X are true:
(f) P[|X − 10.2| < #] = .80
(g) P[X < #] = .80
(h) P[|X − 10.2| > #] = .04

7. In a grinding operation, there is an upper specification of 3.150 in. on a dimension of a certain part after grinding. Suppose that the standard deviation of this normally distributed dimension for parts of this type ground to any particular mean dimension µ is σ = .002 in. Suppose further that you desire to have no more than 3% of the parts fail to meet specifications. What is the maximum (minimum machining cost) µ that can be used if this 3% requirement is to be met?

8. A 10 ft cable is made of 50 strands. Suppose that, individually, 10 ft strands have breaking strengths with mean 45 lb and standard deviation 4 lb. Suppose further that the breaking strength of a cable is roughly the sum of the strengths of the strands that make it up.
(a) Find a plausible mean and standard deviation for the breaking strengths of such 10 ft cables.
(b) Evaluate the probability that a 10 ft cable of this type will support a load of 2230 lb. (Hint: If X̄ is the mean breaking strength of the strands, Σ(Strengths) ≥ 2230 is the same as X̄ ≥ 2230/50. Now use the central limit theorem.)

9. The electrical resistivity, ρ, of a piece of wire is a property of the material involved and the temperature at which it is measured. At a given temperature, if a cylindrical piece of wire of length L and cross-sectional area A has resistance R, the material's resistivity is calculated using the formula ρ = RA/L. Thus, if a wire's cross section is assumed
324 Chapter 5 Probability: The Mathematics of Randomness
14. Find EX and Var X for a continuous distribution with probability density

f(x) = .3 if 0 < x < 1, .7 if 1 < x < 2, and 0 otherwise

15. Suppose that it is adequate to describe the 14-day compressive strengths of test specimens of a certain concrete mixture as normally distributed with mean µ = 2,930 psi and standard deviation σ = 20 psi.
(a) Assess the probability that the next specimen of this type tested for compressive strength will have strength above 2,945 psi.
(b) Use your answer to part (a) and assess the probability that in the next four specimens tested, at least one has compressive strength above 2,945 psi.
(c) Assess the probability that the next 25 specimens tested have a sample mean compressive strength within 5 psi of µ = 2,930 psi.
(d) Suppose that although the particular concrete formula under consideration in this problem is relatively strong, it is difficult to pour in large quantities without serious air pockets developing (which can have important implications for structural integrity). In fact, suppose that using standard methods of pouring, serious air pockets form at an average rate of 1 per 50 cubic yards of poured concrete. Use an appropriate probability distribution and assess the probability that two or more serious air pockets will appear in a 150 cubic yard pour to be made tomorrow.

16. For X with a continuous distribution specified by the probability density

f(x) = .5x for 0 < x < 2, and f(x) = 0 otherwise

find P[X < 1.0] and find the mean, EX.

17. The viscosity of a liquid may be measured by placing it in a cylindrical container and determining the force needed to turn a cylindrical rotor (of nearly the same diameter as the container) at a given velocity in the liquid. The relationship between the viscosity η, force F, area A of the side of the rotor in contact with the liquid, the size L of the gap between the rotor and the inside of the container, and the velocity v at which the rotor surface moves is

η = FL/vA

Suppose that students are to determine an experimental viscosity for SAE no. 10 oil as a laboratory exercise and that appropriate means and standard deviations for the measured variables F, L, v, and A in this laboratory are as follows:

µF = 151 N, σF = .05 N
µA = 1257 cm², σA = .2 cm²
µL = .5 cm, σL = .05 cm
µv = 30 cm/sec, σv = 1 cm/sec

(a) Use the propagation of error formulas and find an approximate standard deviation that might serve as a measure of precision for an experimentally derived value of η from this laboratory.
(b) Explain why, if experimental values of η obtained for SAE no. 10 oil in similar laboratory exercises conducted over a number of years at a number of different universities were compared, the approximate standard deviation derived in (a) would be likely to understate the variability actually observed in those values.

18. The heat conductivity, λ, of a cylindrical bar of diameter D and length L, connected between two constant temperature devices of temperatures T1 and T2 (respectively), that conducts Q calories in t seconds is

λ = 4QL/(π(T1 − T2)tD²)
In a materials laboratory exercise to determine λ for brass, the following means and standard deviations for the variables D, L, T1, T2, Q, and t are appropriate, as are the partial derivatives of λ with respect to the various variables (evaluated at the means of the variables):

            D         L         T1
µ           1.6 cm    100 cm    100°C
σ           .1 cm     .1 cm     1°C
partial     −.249     .199      −.00199

            T2        Q         t

21. Students are going to measure Young's Modulus for copper by measuring the elongation of a piece of copper wire under a tensile force. For a cylindrical wire of diameter D subjected to a tensile force F, if the initial length (length before applying the force) is L0 and final length is L1, Young's Modulus for the material in question is

Y = 4FL0/(πD²(L1 − L0))

The test and measuring equipment used in a particular lab are characterized by the standard deviations
measure ΔL for a factorial arrangement of levels of F and D. Does the equation predict that F and D will or will not have important interactions? Explain.

22. Exercise 6 of Chapter 3 concerns the lifetimes (in numbers of 24 mm deep holes drilled in 1045 steel before failure) of 12 D952-II (8 mm) drills.
(a) Make a normal plot of the data given in Exercise 6 of Chapter 3. In what specific way does the shape of the data distribution appear to depart from a Gaussian shape?
(b) The 12 lifetimes have mean ȳ = 117.75 and standard deviation s ≈ 51.1. Simply using these in place of µ and σ for the underlying drill life distribution, use the normal table to find an approximate fraction of drill lives below 40 holes.
(c) Based on your answer to (a), if your answer to (b) is seriously different from the real fraction of drill lives below 40, is it most likely high or low? Explain.

23. Metal fatigue causes cracks to appear on the skin of older aircraft. Assume that it is reasonable to model the number of cracks appearing on a 1 m² surface of planes of a certain model and vintage as Poisson with mean λ = .03.
(a) If 1 m² is inspected, assess the probability that at least one crack is present on that surface.
(b) If 10 m² are inspected, assess the probability that at least one crack (total) is present.
(c) If ten areas, each of size 1 m², are inspected, assess the probability that exactly one of these has cracks.

24. If a dimension on a mechanical part is normally distributed, how small must the standard deviation be if 95% of such parts are to be within specifications of 2 cm ± .002 cm when the mean dimension is ideal (µ = 2 cm)?

25. The fact that the "exact" calculation of normal probabilities requires either numerical integration or the use of tables (ultimately generated using numerical integration) has inspired many people to develop approximations to the standard normal cumulative distribution function. Several of the simpler of these approximations are discussed in the articles "A Simpler Approximation for Areas Under the Standard Normal Curve," by A. Shah (The American Statistician, 1985), "Pocket-Calculator Approximation for Areas under the Standard Normal Curve," by R. Norton (The American Statistician, 1989), and "Approximations for Hand Calculators Using Small Integer Coefficients," by S. Derenzo (Mathematics of Computation, 1977). For z > 0, consider the approximations offered in these articles:

Φ(z) ≈ gS(z) = .5 + z(4.4 − z)/10 for 0 ≤ z ≤ 2.2, = .99 for 2.2 < z < 2.6, and = 1.00 for 2.6 ≤ z

Φ(z) ≈ gN(z) = 1 − (1/2) exp(−(z² + 1.2z^.8)/2)

Φ(z) ≈ gD(z) = 1 − (1/2) exp(−((83z + 351)z + 562)/(703/z + 165))

Evaluate gS(z), gN(z), and gD(z) for z = .5, 1.0, 1.5, 2.0, and 2.5. How do these values compare to the corresponding entries in Table B.3?

26. Exercise 25 concerned approximations for normal probabilities. People have also invested a fair amount of effort in finding useful formulas approximating standard normal quantiles. One such approximation was given in formula (3.3). A more complicated one, again taken from the article by S. Derenzo mentioned in Exercise 25, is as follows. For p > .50, let y = −ln(2(1 − p)) and

Qz(p) ≈ √( ((4y + 100)y + 205)y² / (((2y + 56)y + 192)y + 131) )

For p < .50, let y = −ln(2p) and

Qz(p) ≈ −√( ((4y + 100)y + 205)y² / (((2y + 56)y + 192)y + 131) )
Use these formulas to approximate Qz(p) for p = .01, .05, .1, .3, .7, .9, .95, and .99. How do the values you obtain compare with the corresponding entries in Table 3.10 and the results of using formula (3.3)?

27. The article "Statistical Strength Evaluation of Hot-pressed Si3N4" by R. Govila (Ceramic Bulletin, 1983) contains summary statistics from an extensive study of the flexural strengths of two high-strength hot-pressed silicon nitrides in 1/4 point, 4 point bending. The values below are fracture strengths of 30 specimens of one of the materials tested at 20°C. (The units are MPa, and the data were read from a graph in the paper and may therefore individually differ by perhaps as much as 10 MPa from the actual measured values.)

514, 533, 543, 547, 584, 619, 653, 684, 689, 695, 700, 705, 709, 729, 729, 753, 763, 800, 805, 805, 814, 819, 819, 839, 839, 849, 879, 900, 919, 979

(a) The materials researcher who collected the original data believed the Weibull distribution to be an adequate model for flexural strength of this material. Make a Weibull probability plot using the method of display (5.35) of Section 5.3 and investigate this possibility. Does a Weibull model fit these data?
(b) Eye-fit a line through your plot from part (a). Use it to help you determine an appropriate shape parameter, β, and an appropriate scale parameter, α, for a Weibull distribution used to describe flexural strength of this material at 20°C. For a Weibull distribution with your fitted values of α and β, what is the median strength? What is a strength exceeded by 80% of such Si3N4 specimens? By 90% of such specimens? By 99% of such specimens?
(c) Make normal plots of the raw data and of the logarithms of the raw data. Comparing the three probability plots made in this exercise, is there strong reason to prefer a Weibull model, a normal model, or a lognormal model over the other two possibilities as a description of the flexural strength?
(d) Eye-fit lines to your plots from part (c). Use them to help you determine appropriate means and standard deviations for normal distributions used to describe flexural strength and the logarithm of flexural strength. Compare the .01, .10, .20, and .50 quantiles of the fitted normal and lognormal distributions for strength to the quantiles you computed in part (b).

28. The article "Using Statistical Thinking to Solve Maintenance Problems" by Brick, Michael, and Morganstein (Quality Progress, 1989) contains the following data on lifetimes of sinker rollers. Given are the numbers of 8-hour shifts that 17 sinker rollers (at the bottom of a galvanizing pot and used to direct steel sheet through a coating operation) lasted before failing and requiring replacement.

10, 12, 15, 17, 18, 18, 20, 20, 21, 21, 23, 25, 27, 29, 29, 30, 35

(a) The authors of the article considered a Weibull distribution to be a likely model for the lifetimes of such rollers. Make a zero-threshold Weibull probability plot for use in assessing the reasonableness of such a description of roller life.
(b) Eye-fit a line to your plot in (a) and use it to estimate parameters for a Weibull distribution for describing roller life.
(c) Use your estimated parameters from (a) and the form of the Weibull cumulative distribution function given in Section 5.2 to estimate the .10 quantile of the roller life distribution.

29. The article "Elementary Probability Plotting for Statistical Data Analysis" by J. King (Quality Progress, 1988) contains 24 measurements of deviations from nominal of a distance between two
holes drilled in a steel plate. These are reproduced here. The units are mm.

−2, −2, 7, −10, 4, −3, 0, 8, −5, 5, −6, 0, 2, −2, 1, 3, 3, −4, −6, −13, −7, −2, 2, 2

(a) Make a dot diagram for these data and compute x̄ and s.
(b) Make a normal plot for these data. Eye-fit a line on the plot and use it to find graphical estimates of a process mean and standard deviation for this deviation from nominal. Compare these graphical estimates with the values you calculated in (a).
(c) Engineering specifications on this deviation from nominal were ±10 mm. Suppose that x̄ and s from (a) are adequate approximations of the process mean and standard deviation for this variable. Use the normal distribution with those parameters and compute a fraction of deviations that fall outside specifications. Does it appear from this exercise that the drilling operation is capable (i.e., precise) enough to produce essentially all measured deviations in specifications, at least if properly aimed? Explain.

30. An engineer is responsible for setting up a monitoring system for a critical diameter on a turned metal part produced in his plant. Engineering specifications for the diameter are 1.180 in. ± .004 in. For ease of communication, the engineer sets up the following nomenclature for measured diameters on these parts:

Green Zone Diameters: 1.178 in. ≤ Diameter ≤ 1.182 in.
Red Zone Diameters: Diameter ≤ 1.176 in. or Diameter ≥ 1.184 in.
Yellow Zone Diameters: any other Diameter

Suppose that in fact the diameters of parts coming off the lathe in question can be thought of as independent normal random variables with mean µ = 1.181 in. and standard deviation σ = .002 in.
(a) Find the probabilities that a given diameter falls into each of the three zones.
(b) Suppose that a technician simply begins measuring diameters on consecutive parts and continues until a Red Zone measurement is found. Assess the probability that more than ten parts must be measured. Also, give the expected number of measurements that must be made.
The engineer decides to use the Green/Yellow/Red gauging system in the following way. Every hour, parts coming off the lathe will be checked. First, a single part will be measured. If it is in the Green Zone, no further action is needed that hour. If the initial part is in the Red Zone, the lathe will be stopped and a supervisor alerted. If the first part is in the Yellow Zone, a second part is measured. If this second measurement is in the Green Zone, no further action is required, but if it is in the Yellow or the Red Zone, the lathe is stopped and a supervisor alerted. It is possible to argue that under this scheme (continuing to suppose that measurements are independent normal variables with mean 1.181 in. and standard deviation .002 in.), the probability that the lathe is stopped in any given hour is .1865.
(c) Use the preceding fact and evaluate the probability that the lathe is stopped exactly twice in 8 consecutive hours. Also, what is the expected number of times the lathe will be stopped in 8 time periods?

31. A random variable X has a cumulative distribution function

F(x) = 0 for x ≤ 0, sin(x) for 0 < x ≤ π/2, and 1 for π/2 < x

(a) Find P[X ≤ .32].
(b) Give the probability density for X, f(x).
(c) Evaluate EX and Var X.

32. Return to the situation of Exercise 2 of Section 5.4.
Suppose that demerits are assigned to devices of the type considered there according to the formula D = 5X + Y.
(a) Find the mean value of D, ED. (Use your answers to (c) and (d) of Exercise 2 of Section 5.4 and formula (5.53) of Section 5.5. Formula (5.53) holds whether or not X and Y are independent.)
(b) Find the probability a device of this type scores 7 or less demerits. That is, find P[D ≤ 7].
(c) On average, how many of these devices will have to be inspected in order to find one that scores 7 or less demerits? (Use your answer to (b).)

33. Consider jointly continuous random variables X and Y with density

f(x, y) = x + y for 0 < x < 1 and 0 < y < 1, and f(x, y) = 0 otherwise

(a) Find the probability that the product of X and Y is at least 1/4.
(b) Find the marginal probability density for X. (Notice that Y's is similar.) Use this to find the expected value and standard deviation of X.
(c) Are X and Y independent? Explain.
(d) Compute the mean of X + Y. Why can't formula (5.54) of Section 5.5 be used to find the variance of X + Y?

34. Return to the situation of Exercise 4 of Section 5.4.
(a) Find EX, Var X, EY, and Var Y using the marginal densities for X and Y.
(b) Use your answer to (a) and Proposition 1 to find the mean and variance of Y − X.

35. Visual inspection of integrated circuit chips, even under high magnification, is often less than perfect. Suppose that an inspector has an 80% chance of detecting any given flaw. We will suppose that the inspector never "cries wolf"—that is, sees a flaw where none exists. Then consider the random variables

X = the true number of flaws on a chip
Y = the number of flaws identified by the inspector

(a) What is a sensible conditional distribution for Y given that X = 5? Given that X = 5, find the (conditional) probability that Y = 3.
In general, a sensible conditional probability function for Y given X = x is the binomial probability function with number of trials x and success probability .8. That is, one could use

fY|X(y | x) = (x choose y)(.8)^y(.2)^(x−y) for y = 0, 1, 2, . . . , x, and 0 otherwise

Now suppose that X is modeled as Poisson with mean λ = 3—i.e.,

fX(x) = e^(−3) 3^x / x! for x = 0, 1, 2, 3, . . . , and 0 otherwise

Multiplication of the two formulas gives a joint probability function for X and Y.
(b) Find the (marginal) probability that Y = 0. (Note that this is obtained by summing f(x, 0) over all possible values of x.)
(c) Find fY(y) in general. What (marginal) distribution does Y have?

36. Suppose that cans to be filled with a liquid are circular cylinders. The radii of these cans have mean µr = 1.00 in. and standard deviation σr = .02 in. The volumes of liquid dispensed into these cans have mean µv = 15.10 in.³ and standard deviation σv = .05 in.³.
(a) If the volumes dispensed into the cans are approximately normally distributed, about what fraction will exceed 15.07 in.³?
(b) Approximate the probability that the total volume dispensed into the next 100 cans exceeds 1510.5 in.³ (if the total exceeds 1510.5, X̄ exceeds 15.105).
(c) Approximate the mean µh and standard deviation σh of the heights of the liquid in the
filled cans. (Recall that the volume of a circular cylinder is v = πr²h, where h is the height of the cylinder.)
(d) Does the variation in bottle radius or the variation in volume of liquid dispensed into the bottles have the biggest impact on the variation in liquid height? Explain.

37. Suppose that a pair of random variables have the joint probability density

f(x, y) = exp(x − y) if 0 ≤ x ≤ 1 and x ≤ y, and f(x, y) = 0 otherwise

(a) Evaluate P[Y ≤ 1.5].
(b) Find the marginal probability densities for X and Y.
(c) Are X and Y independent? Explain.
(d) Find the conditional probability density for Y given X = .25, fY|X(y | .25). Given that X = .25, what is the mean of Y? (Hint: Use fY|X(y | .25).)

38. (Defects per Unit Acceptance Sampling) Suppose that in the inspection of an incoming product, nonconformities on an inspection unit are counted. If too many are seen, the incoming lot is rejected and returned to the manufacturer. (For concreteness, you might think of blemishes on rolled paper or wire, where an inspection unit consists of a certain length of material from the roll.) Suppose further that the number of nonconformities on a piece of product of any particular size can be modeled as Poisson with an appropriate mean.
(a) Suppose that this rule is followed: "Accept the lot if on a standard size inspection unit, 1 or fewer nonconformities are seen." The operating characteristic curve of this acceptance sampling plan is a plot of the probability that the lot is accepted as a function of λ = the mean defects per inspection unit. (For X = the number of nonconformities seen, X has Poisson distribution with mean λ and OC(λ) = P[X ≤ 1].) Make a plot of the operating characteristic curve. List values of the operating characteristic for λ = .25, .5, and 1.0.
(b) Suppose that instead of the rule in (a), this rule is followed: "Accept the lot if on 2 standard size inspection units, 2 or fewer total nonconformities are seen." Make a plot of the operating characteristic curve for this second plan and compare it with the plot from part (a). (Note that here, for X = the total number of nonconformities seen, X has a Poisson distribution with mean 2λ and OC(λ) = P[X ≤ 2].) List values of the operating characteristic for λ = .25, .5, and 1.0.

39. A discrete random variable X can be described using the following probability function:

x        1     2     3     4     5
f(x)    .61   .24   .10   .04   .01

(a) Make a probability histogram for X. Also plot F(x), the cumulative probability function for X.
(b) Find the mean and standard deviation for the random variable X.
(c) Evaluate P[X ≥ 3] and then find P[X < 3].

40. A classical data set of Rutherford and Geiger (referred to in Example 6) suggests that for a particular experimental setup involving a small bar of polonium, the number of collisions of α particles with a small screen placed near the bar during an 8-minute period can be modeled as a Poisson variable with mean λ = 3.87. Consider an experimental setup of this type, and let X and Y be (respectively) the numbers of collisions in the next two 8-minute periods. Evaluate the following:
(a) P[X ≥ 2]  (b) √Var X
(c) P[X + Y = 6]  (d) P[X + Y ≥ 3]
(Hint for parts (c) and (d): What is a sensible probability distribution for X + Y, the number of collisions in a 16-minute period?)
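The operating characteristic computations described in Exercise 38 reduce to Poisson cumulative probabilities, which are easy to tabulate with software. Here is a minimal sketch, with Python and the scipy library as illustrative, assumed tools:

    from scipy.stats import poisson

    # Exercise 38: OC(lambda) for the two defects-per-unit sampling plans.
    for lam in (0.25, 0.5, 1.0):
        oc_a = poisson.cdf(1, lam)      # plan (a): accept if <= 1 nonconformity on one unit
        oc_b = poisson.cdf(2, 2 * lam)  # plan (b): accept if <= 2 total on two units
        print(lam, round(oc_a, 3), round(oc_b, 3))

Plotting OC(λ) against a fine grid of λ values with the same two calls produces the operating characteristic curves the exercise asks for.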
41. Suppose that X is a continuous random variable with probability density of the form

f(x) = kx²(1 − x) for 0 < x < 1, and f(x) = 0 otherwise

(a) Evaluate k and sketch a graph of f(x).
(b) Evaluate P[X ≤ .25], P[X ≤ .75], P[.25 < X ≤ .75], and P[|X − .5| > .1].
(c) Compute EX and √Var X.
(d) Compute and graph F(x), the cumulative distribution function for X. Read from your graph the .6 quantile of the distribution of X.

42. Suppose that engineering specifications on the shelf depth of a certain slug to be turned on a CNC lathe are from .0275 in. to .0278 in. and that values of this dimension produced on the lathe can be described using a normal distribution with mean µ and standard deviation σ.
(a) If µ = .0276 and σ = .0001, about what fraction of shelf depths are in specifications?
(b) What machine precision (as measured by σ) would be required in order to produce about 98% of shelf depths within engineering specifications (assuming that µ is at the midpoint of the specifications)?

43. The resistance of an assembly of several resistors connected in series is the sum of the resistances of the individual resistors. Suppose that a large lot of nominal 10 Ω resistors has mean resistance µ = 9.91 Ω and standard deviation of resistances σ = .08 Ω. Suppose that 30 resistors are randomly selected from this lot and connected in series.
(a) Find a plausible mean and variance for the resistance of the assembly.
(b) Evaluate the probability that the resistance of the assembly exceeds 298.2 Ω. (Hint: If X̄ is the mean resistance of the 30 resistors involved, the resistance of the assembly exceeding 298.2 Ω is the same as X̄ exceeding 9.94 Ω. Now apply the central limit theorem.)

44. At a small metal fabrication company, steel rods of a particular type cut to length have lengths with standard deviation .005 in.
(a) If lengths are normally distributed about a mean µ (which can be changed by altering the setup of a jig) and specifications on this length are 33.69 in. ± .01 in., what appears to be the best possible fraction of the lengths in specifications? What does µ need to be in order to achieve this fraction?
(b) Suppose now that in an effort to determine the mean length produced using the current setup of the jig, a sample of rods is to be taken and their lengths measured, with the intention of using the value of X̄ as an estimate of µ. Approximate the probabilities that X̄ is within .0005 in. of µ for samples of size n = 25, 100, and 400. Do your calculations for this part of the question depend for their validity on the length distribution being normal? Explain.

45. Suppose that the measurement of the diameters of #10 machine screws produced on a particular machine yields values that are normally distributed with mean µ and standard deviation σ = .03 mm.
(a) If µ = 4.68 mm, about what fraction of all measured diameters will fall in the range from 4.65 mm to 4.70 mm?
(b) Use your value from (a) and an appropriate discrete probability distribution to evaluate the probability (assuming µ = 4.68) that among the next five measurements made, exactly four will fall in the range from 4.65 mm to 4.70 mm.
(c) Use your value from (a) and an appropriate discrete probability distribution to evaluate the probability (assuming that µ = 4.68) that if one begins sampling and measuring these screws, the first diameter in the range from 4.65 mm to 4.70 mm will be found on the second, third, or fourth screw measured.
(d) Now suppose that µ is unknown but is to be estimated by X̄ obtained from measuring a sample of n = 25 screws. Evaluate the probability that the sample mean, X̄, takes a value within .01 mm of the long-run (population) mean µ.
(e) What sample size, n, would be required in order to a priori be 90% sure that X̄ from n measurements will fall within .005 mm of µ?

46. The random variable X = the number of hours till failure of a disk drive is described using an exponential distribution with mean 15,000 hours.
(a) Evaluate the probability that a given drive lasts at least 20,000 hours.
(b) A new computer network has ten of these drives installed on computers in the network. Use your answer to (a) and an assumption of independence of the ten drive lifetimes and evaluate the probability that at least nine of these drives are failure-free through 20,000 hours.

47. Miles, Baumhover, and Miller worked with a company on a packaging problem. Cardboard boxes, nominally 9.5 in. in length, were supposed to hold four units of product stacked side by side. They did some measuring and found that in fact the individual product units had widths with mean approximately 2.577 in. and standard deviation approximately .061 in. Further, the boxes had (inside) lengths with mean approximately 9.566 in. and standard deviation approximately .053 in.
(a) If X1, X2, X3, and X4 are the actual widths of four of the product units and Y is the actual inside length of a box into which they are to be packed, then the "head space" in the box is U = Y − (X1 + X2 + X3 + X4). What are a sensible mean and standard deviation for U?
(b) If X1, X2, X3, X4, and Y are normally distributed and independent, it turns out that U is also normal. Suppose this is the case. About what fraction of the time should the company expect to experience difficulty packing a box? (What is the probability that the head space as calculated in (a) is negative?)
(c) If it is your job to recommend a new mean inside length of the boxes and the company wishes to have packing problems in only .5% of the attempts to load four units of product into a box, what is the minimum mean inside length you would recommend? (Assume that standard deviations will remain unchanged.)
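Exercise 46 combines an exponential lifetime calculation with a binomial count, a pattern worth seeing once in software. A minimal sketch, with Python and scipy as illustrative, assumed tools:

    import math
    from scipy.stats import binom

    # Exercise 46: exponential lifetimes with mean 15,000 hours.
    p = math.exp(-20000 / 15000)   # (a) P[a single drive lasts >= 20,000 hours]
    print(round(p, 4))

    # (b) P[at least 9 of 10 independent drives are failure-free through 20,000 hours]
    print(binom.sf(8, 10, p))      # sf(8, ...) = P[count >= 9]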
6
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Introduction to Formal Statistical Inference
For example, a piece of equipment that dispenses baby food into jars might
produce an unknown mean fill level, µ. Determining a data-based interval likely to
6.1 Large-Sample Confidence Intervals for a Mean
Definition 1 A confidence interval for a parameter (or function of one or more parameters)
is a data-based interval of numbers thought likely to contain the parameter (or
function of one or more parameters) possessing a stated probability-based
confidence or reliability.
This section discusses how basic probability facts lead to simple large-sample
formulas for confidence intervals for a mean, µ. The unusual case where the standard
deviation σ is known is treated first. Then parallel reasoning produces a formula for
the much more common situation where σ is not known. The section closes with
discussions of three practical issues in the application of confidence intervals.
x̄ = the sample mean net fill weight of 47 jars filled by the process (g)
Figure 6.1 (horizontal axis marked µ − .3, µ, and µ + .3)
to determine from context whether a random variable or its observed value is being
discussed.
The most common way of thinking about a graphic like Figure 6.1 is to think
of the possibility that
which shifts attention to this second way of thinking. The fact that expression (6.2) has about an 80% chance of holding true anytime a sample of 47 fill weights is taken suggests that the interval

138.2 g ± .3 g    (6.3)

(i.e., the interval from 137.9 g to 138.5 g) be used as an 80% confidence interval for the process mean fill weight.
It is not hard to generalize the logic that led to expression (6.3). Anytime an iid model is appropriate for the elements of a large sample, the central limit theorem implies that the sample mean x̄ is approximately normal with mean µ and standard deviation σ/√n. Then, if for p > .5, z is the p quantile of the standard normal distribution, the probability that

µ − z (σ/√n) < x̄ < µ + z (σ/√n)    (6.4)

is approximately 2p − 1. The eventuality (6.4) is algebraically equivalent to

x̄ − z (σ/√n) < µ < x̄ + z (σ/√n)    (6.5)

and can thus be thought of as the eventuality that the random interval with endpoints

Large-sample known-σ confidence limits for µ:

x̄ ± z (σ/√n)    (6.6)

brackets µ.
Table 6.1  z's for Use in Two-sided Large-n Intervals for µ

Desired Confidence     z
80%                    1.28
90%                    1.645
95%                    1.96
98%                    2.33
99%                    2.58
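The entries of Table 6.1 are simply standard normal quantiles: for two-sided confidence C, z is the (1 + C)/2 quantile. A minimal check in software (Python and scipy are illustrative, assumed tools):

    from scipy.stats import norm

    # Reproducing the z multipliers in Table 6.1.
    for C in (0.80, 0.90, 0.95, 0.98, 0.99):
        print(C, round(norm.ppf((1 + C) / 2), 3))
    # prints 1.282, 1.645, 1.96, 2.326, 2.576; the table rounds
    # the last two of these to 2.33 and 2.58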
−.16 × 10⁻⁴ ± 1.96 (.7 × 10⁻⁴/√32)
a sample. The argument leading to formula (6.6) depends on the fact that for large n, x̄ is approximately normal with mean µ and standard deviation σ/√n—i.e., that

Z = (x̄ − µ)/(σ/√n)    (6.7)

is approximately standard normal. For large n, the sample standard deviation s will typically approximate σ closely enough that

Z = (x̄ − µ)/(s/√n)    (6.8)

is also approximately standard normal. And the variable (6.8) doesn't involve σ.
Beginning with the fact that (when an iid model for observations is appropriate and n is large) the variable (6.8) is approximately standard normal, the reasoning is much as before. For a positive z,

−z < (x̄ − µ)/(s/√n) < z

is equivalent to

µ − z (s/√n) < x̄ < µ + z (s/√n)

which in turn is equivalent to

x̄ − z (s/√n) < µ < x̄ + z (s/√n)

Thus, the interval with random center x̄ and random length 2zs/√n—i.e., with random endpoints

Large-sample confidence limits for µ:

x̄ ± z (s/√n)    (6.9)
Figure 6.2 (stem-and-leaf plot of the n = 26 breakaway torques, in. oz; stems are tens, leaves are units):

0 | 0 2 3
0 | 7 8 8 9 9
1 | 0 0 0 1 1 2 2 2 3
1 | 5 5 6 6 7 7 7 9
2 | 0
2 |
If the disk drives that produced the data in Figure 6.2 are thought of as
representing the population of drives subject to blink code A failure, it seems
reasonable to use an iid model and formula (6.9) to estimate the population mean
breakaway torque. Choosing to make a 90% confidence interval for µ, z = 1.645
is indicated in Table 6.1. And using formula (6.9), endpoints
11.5 ± 1.645 (5.1/√26)
(i.e., endpoints 9.9 in. oz and 13.1 in. oz) are indicated.
The interval shows that the mean breakaway torque for drives with blink
code A failure was substantially below the factory’s 33.5 in. oz target value.
Recognizing this turned out to be key in finding and eliminating a design flaw in
the drives.
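The interval just computed is a direct application of formula (6.9). Here is a minimal sketch reproducing it in software (Python and scipy are illustrative, assumed tools):

    import math
    from scipy.stats import norm

    # 90% two-sided interval from formula (6.9) for the breakaway torque data.
    xbar, s, n = 11.5, 5.1, 26
    z = norm.ppf(0.95)                  # the 1.645 of Table 6.1
    half = z * s / math.sqrt(n)
    print(round(xbar - half, 1), round(xbar + half, 1))   # about (9.9, 13.1) in. oz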
strength for specimens of this type of concrete is at least 4188 psi.” That is, practical
engineering problems are sometimes best addressed using one-sided confidence
intervals.
Making one-sided intervals

There is no real problem in coming up with formulas for one-sided confidence intervals. If you have a workable two-sided formula, all that must be done is to

1. replace the lower limit with −∞ or the upper limit with +∞, and
2. adjust the stated confidence level appropriately upward (this usually means dividing the "unconfidence level" by 2).
This prescription works not only with formulas (6.6) and (6.9) but also with the rest
of the two-sided confidence intervals introduced in this chapter.
Example 3 (continued)

For the mean breakaway torque for defective disk drives, consider making a one-sided 90% confidence interval for µ of the form (−∞, #), for # an appropriate number. Put slightly differently, consider finding a 90% upper confidence bound for µ (say, #).

Beginning with a two-sided 80% confidence interval for µ, the lower limit can be replaced with −∞ and a one-sided 90% confidence interval determined. That is, using formula (6.9), a 90% upper confidence bound for the mean breakaway torque is
x̄ + 1.28 (s/√n) = 11.5 + 1.28 (5.1/√26) = 12.8 in. oz
But how to think about a confidence level after sample selection? This is an entirely
different matter. Once numbers have been plugged into a formula like (6.6) or (6.9),
the die has already been cast, and the numerical interval is either right or wrong.
The practical difficulty is that while which is the case can’t be determined, it no
longer makes logical sense to attach a probability to the correctness of the interval.
For example, it would make no sense to look again at the two-sided interval found
in Example 3 and try to say something like “there is a 90% probability that µ
is between 9.9 in. oz and 13.1 in. oz.” µ is not a random variable. It is a fixed
(although unknown) quantity that either is or is not between 9.9 and 13.1. There is
no probability left in the situation to be discussed.
So what does it mean that (9.9, 13.1) is a 90% confidence interval for µ? Like
it or not, the phrase “90% confidence” refers more to the method used to obtain
the interval (9.9, 13.1) than to the interval itself. In coming up with the interval,
methodology has been used that would produce numerical intervals bracketing µ in
about 90% of repeated applications. But the effectiveness of the particular interval
in this application is unknown, and it is not quantifiable in terms of a probability. A
person who (in the course of a lifetime) makes many 90% confidence intervals can
expect to have a “lifetime success rate” of about 90%. But the effectiveness of any
particular application will typically be unknown.
A short statement summarizing this discussion as “the authorized interpretation
of confidence” will be useful.
Definition 2 (Interpretation of a Confidence Interval)

To say that a numerical interval (a, b) is (for example) a 90% confidence interval for a parameter is to say that in obtaining it, one has applied methods of data collection and calculation that would produce intervals bracketing the parameter in about 90% of repeated applications. Whether or not the particular interval (a, b) brackets the parameter is unknown and not describable in terms of a probability.
The reader may feel that the statement in Definition 2 is a rather weak meaning
for the reliability figure associated with a confidence interval. Nevertheless, the
statement in Definition 2 is the correct interpretation and is all that can be rationally
expected. And despite the fact that the correct interpretation may initially seem
somewhat unappealing, confidence interval methods have proved themselves to be
of great practical use.
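The "lifetime success rate" reading of Definition 2 can be illustrated by simulation. The sketch below builds many nominally 90% intervals via formula (6.9) and records how often they bracket µ; Python with numpy is an illustrative, assumed tool, and the values of µ, σ, and n are arbitrary choices, not taken from any example in the text.

    import numpy as np

    # Simulated coverage of nominally 90% intervals from formula (6.9).
    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 10.0, 2.0, 50, 10_000   # illustrative values only

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)
    half = 1.645 * s / np.sqrt(n)
    print(np.mean((xbar - half < mu) & (mu < xbar + half)))  # close to .90

No single interval produced this way is "90% likely" to be right; it is the long-run fraction of correct intervals that is about 90%, exactly as Definition 2 says.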
As a final consideration in this introduction to confidence intervals, note that formulas like (6.6) and (6.9) can give some crude quantitative answers to the question, "How big must n be?"

Sample sizes for estimating µ

Using formula (6.9), for example, if you have in mind (1) a desired confidence level, (2) a worst-case expectation for the sample standard deviation, and (3) a desired precision of estimation for µ, it is a simple matter to solve for a corresponding sample size. That is, suppose that the desired confidence level dictates the use of the value z in formula (6.9), s is some likely worst-case value for the sample standard deviation, and you want to have confidence limits (or a limit) of the form x̄ ± Δ. Setting

Δ = z (s/√n)

and solving for n gives the needed sample size. For instance, 95% confidence (so z = 1.96), a worst-case s of 5.1, and a desired precision of Δ = 1 lead to

1 = 1.96 (5.1/√n)

so that n ≈ 100 is required.
For two reasons, the kind of calculations in the previous example give somewhat
less than an ironclad answer to the question of sample size. The first is that they
are only as good as the prediction of the sample standard deviation, s. If s is
underpredicted, an n that is not really large enough will result. (By the same token,
if one is excessively conservative and overpredicts s, an unnecessarily large sample
size will result.) The second issue is that expression (6.9) remains a large-sample
formula. If calculations like the preceding ones produce n smaller than, say, 25 or 30,
the value should be increased enough to guarantee that formula (6.9) can be applied.
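In software, the sample-size arithmetic is a single line: solve Δ = z s/√n for n and round up. A minimal Python sketch using the numbers of the example above:

    import math

    # n needed so that a 95% interval (z = 1.96) with worst-case s = 5.1
    # has half-width Delta = 1.
    z, s, delta = 1.96, 5.1, 1.0
    n = math.ceil((z * s / delta) ** 2)
    print(n)   # 100, comfortably in large-sample territory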
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Interpret the statement, "The interval from 6.3 to 7.9 is a 95% confidence interval for the mean µ."

2. In Chapter Exercise 2 of Chapter 3, there is a data set consisting of the aluminum contents of 26 bihourly samples of recycled PET plastic from a recycling facility. Those 26 measurements have ȳ = 142.7 ppm and s ≈ 98.2 ppm. Use these facts to respond to the following. (Assume that n = 26 is large enough to permit the use of large-sample formulas in this case.)
(a) Make a 90% two-sided confidence interval for the mean aluminum content of such specimens over the 52-hour study period.
(b) Make a 95% two-sided confidence interval for the mean aluminum content of such specimens over the 52-hour study period. How does this compare to your answer to part (a)?
(c) Make a 90% upper confidence bound for the mean aluminum content of such samples over the 52-hour study period. (Find # such that (−∞, #) is a 90% confidence interval.) How does this value compare to the upper endpoint of your interval from part (a)?
(d) Make a 95% upper confidence bound for the mean aluminum content of such samples over the 52-hour study period. How does this value compare to your answer to part (c)?
(e) Interpret your interval from (a) for someone with little statistical background. (Speak in the context of the recycling study and use Definition 2 as your guide.)

3. Return to the context of Exercise 2. Suppose that in order to monitor for possible process changes, future samples of PET will be taken. If it is desirable to estimate the mean aluminum content with ±20 ppm precision and 90% confidence, what future sample size do you recommend?

4. DuToit, Hansen, and Osborne measured the diameters of some no. 10 machine screws with two different calipers (digital and vernier scale). Part of their data are recorded here. Given in the small frequency table are the measurements obtained on 50 screws by one of the students using the digital calipers.

Diameter (mm)    Frequency
4.52             1
4.66             4
4.67             7
4.68             7
4.69             14
4.70             9
4.71             4
4.72             4

(a) Compute the sample mean and standard deviation for these data.
(b) Use your sample values from (a) and make a 98% two-sided confidence interval for the mean diameter of such screws as measured by this student with these calipers.
(c) Repeat part (b) using 99% confidence. How does this interval compare with the one from (b)?
(d) Use your values from (a) and find a 98% lower confidence bound for the mean diameter. (Find a number # such that (#, ∞) is a 98% confidence interval.) How does this value compare to the lower endpoint of your interval from (b)?
(e) Repeat (d) using 99% confidence. How does the value computed here compare to your answer to (d)?
(f) Interpret your interval from (b) for someone with little statistical background. (Speak in the context of the diameter measurement study and use Definition 2 as your guide.)
6.2 Large-Sample Significance Tests for a Mean
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Z = (x̄ − 139.8)/(σ/√n) = (x̄ − 139.8)/.32    (6.10)
of all samples would produce a value of x̄ (or Z ) as extreme as the one actually
observed. Put in those terms, the data seem to speak rather convincingly against the
process being on target.
The argument that has just been made is an application of typical significance-
testing logic. In order to make the pattern of thought obvious, it is useful to isolate
some elements of it in definition form. This is done next, beginning with a formal
restatement of the overall purpose.
Definition 3 Statistical significance testing is the use of data in the quantitative assessment
of the plausibility of some trial value for a parameter (or function of one or
more parameters).
Logically, significance testing begins with the specification of the trial or hy-
pothesized value. Special jargon and notation exist for the statement of this value.
Parameter = #
or
Function of parameters = #
H0 : µ = 139.8 (6.11)
meaning that there is no difference between µ and the target value of 139.8 g.
After formulating a null hypothesis, what kinds of departures from it are of interest must be specified.

same form as the corresponding null hypothesis, except that the equality sign is replaced by ≠, >, or <.

H0: µ = #     H0: µ = #     H0: µ = #
Ha: µ > #     Ha: µ < #     Ha: µ ≠ #

In the example of the filling operation, there is a need to detect both the possibility of consistently underfilled (µ < 139.8 g) and the possibility of consistently overfilled (µ > 139.8 g) jars. Thus, an appropriate alternative hypothesis is

Ha: µ ≠ 139.8    (6.12)
Definition 6 A test statistic is the particular form of numerical data summarization used
in a significance test. The formula for the test statistic typically involves the
number appearing in the null hypothesis.
Definition 7 A reference (or null) distribution for a test statistic is the probability dis-
tribution describing the test statistic, provided the null hypothesis is in fact
true.
The values of the test statistic considered to cast doubt on the validity of the
null hypothesis are specified after looking at the form of the alternative hypothesis.
Roughly speaking, values are identified that are more likely to occur if the alternative
hypothesis is true than if the null hypothesis holds.
The discussion of the filling process scenario has vacillated between using x̄ and its standardized version Z given in equation (6.10) for a test statistic. Equation (6.10) is a specialized form of the general (large-n, known σ) test statistic for µ,

Large-sample known-σ test statistic for µ:

Z = (x̄ − #)/(σ/√n)    (6.13)

for the present scenario, where the hypothesized value of µ is 139.8, n = 25, and σ = 1.6. It is most convenient to think of the test statistic for this kind of problem in the standardized form shown in equation (6.13) rather than as x̄ itself. Using form (6.13), the reference distribution will always be the same—namely, standard normal.

Continuing with the filling example, note that if instead of the null hypothesis (6.11), the alternative hypothesis (6.12) is operating, observed x̄'s much larger or much smaller than 139.8 will tend to result. Such x̄'s will then, via equation (6.13), translate respectively to large or small (that is, large negative numbers in this case) observed values of Z—i.e., to large values of |z|. Such observed values render the null hypothesis implausible.
Having specified how data will be used to judge the plausibility of the null
hypothesis, it remains to collect them, plug them into the formula for the test
statistic, and (using the calculated value and the reference distribution) arrive at a
quantitative assessment of the plausibility of H0 . There is jargon for the form this
will take.
Small p-values are evidence against H0

The smaller the observed level of significance, the stronger the evidence against the validity of the null hypothesis. In the context of the filling operation, with an observed value of the test statistic of

z = −2.5

the observed level of significance is P[|Z| ≥ 2.5] ≈ .01, which gives fairly strong evidence against the possibility that the process mean is on target.
1. H0: µ = 139.8.
2. Ha: µ ≠ 139.8.
3. The test statistic is

Z = (x̄ − 139.8)/(σ/√n)

This is reasonably strong evidence that the process mean fill level is not on target.
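For the two-sided alternative here, the observed level of significance is P[|Z| ≥ |z|] under the standard normal reference distribution. A minimal check in software (Python and scipy assumed as illustrative tools):

    from scipy.stats import norm

    # Two-sided p-value for the filling-process test with observed z = -2.5.
    z = -2.5
    print(round(2 * norm.cdf(-abs(z)), 4))   # about .012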
Z = (x̄ − µ)/(s/√n)
H0: µ = #

a widely applicable method will simply be to use the logic already introduced but with the statistic

Large-sample test statistic for µ:

Z = (x̄ − #)/(s/√n)    (6.14)
Example 5 (continued)

have mean breakaway torque equal to the factory-set mean value of 33.5 in. oz. The five-step significance-testing format can be used.

1. H0: µ = 33.5.
2. Ha: µ < 33.5.
(Here the alternative hypothesis is directional, amounting to a research hypothesis based on the engineer's suspicions about the relationship between drive failure and breakaway torque.)
3. The test statistic is

Z = (x̄ − 33.5)/(s/√n)

z = (11.5 − 33.5)/(5.1/√26) = −22.0
Figure 6.5 (the possible outcomes of a significance test: a type I error is committed when the true state of affairs is described by H0 but the decision favors Ha; a type II error is committed when the true state of affairs is described by Ha but the decision favors H0)
It is standard practice to use small numbers, like .1, .05, or even .01, for α. This
puts some inertia in favor of H0 into the decision-making process. (Such a practice
guarantees that type I errors won’t be made very often. But at the same time, it
creates an asymmetry in the treatment of H0 and Ha that is not always justified.)
Definition 10 and Figure 6.5 make it clear that type I errors are not the only
undesirable possibility. The possibility of type II errors must also be considered.
For most of the testing methods studied in this book, calculation of β's is beyond what the limited introduction to probability given in Chapter 5 will support. But the
job can be handled for the simple known-σ situation that was used to introduce the
topic of significance testing. And making a few such calculations will provide some
intuition consistent with what, qualitatively at least, holds in general.
Example 4 (continued)

Again consider the filling process and testing H0: µ = 139.8 vs. Ha: µ ≠ 139.8. This time suppose that significance testing based on n = 25 will be used tomorrow to decide whether or not to adjust the process. Type II error probabilities, calculated supposing µ = 139.5 and µ = 139.2 for tests using α = .05 and α = .2, will be compared.

First consider α = .05. The decision will be made in favor of H0 if the p-value exceeds .05. That is, the decision will be in favor of the null hypothesis if the observed value of Z given in equation (6.10) (generalized in formula (6.13)) is such that

−1.96 < z < 1.96

i.e., if

−1.96 < (x̄ − 139.8)/.32 < 1.96

i.e., if

139.17 g < x̄ < 140.43 g

A type II error probability is then the probability of this eventuality, computed supposing µ to have a value other than 139.8. Calculations of this kind, for the means and α's under consideration, produce values such as

β ≈ .50

β ≈ .61

β ≈ .27
Table 6.2  n = 25 type II error probabilities (β), for µ = 139.2 and µ = 139.5
The story told by Table 6.2 applies in qualitative terms to all uses of significance
testing in decision-making contexts. The further H0 is from being true, the smaller
the corresponding β. And small α’s imply large β’s and vice versa.
The effect of sample size on β's

There is one other element of this general picture that plays an important role in the determination of error probabilities. That is the matter of sample size. If a sample size can be increased, for a given α, the corresponding β's can be reduced. Redo the calculations of the previous example, this time supposing that n = 100 rather than 25. Table 6.3 shows the type II error probabilities that should result, and comparison with Table 6.2 serves to indicate the sample-size effect in the filling-process example.
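Calculations like those behind Tables 6.2 and 6.3 are mechanical enough to script. The sketch below (Python and scipy as illustrative, assumed tools) reproduces, for example, the α = .2, n = 25 values quoted above (β ≈ .61 and β ≈ .27); exact agreement with the tables' remaining entries depends on rounding conventions.

    import math
    from scipy.stats import norm

    def beta(mu_true, alpha, n, mu0=139.8, sigma=1.6):
        """P[deciding in favor of H0] when the true mean is mu_true
        (two-sided known-sigma test of H0: mu = mu0)."""
        se = sigma / math.sqrt(n)
        z = norm.ppf(1 - alpha / 2)
        lo, hi = mu0 - z * se, mu0 + z * se   # xbar values favoring H0
        return norm.cdf(hi, mu_true, se) - norm.cdf(lo, mu_true, se)

    for n in (25, 100):
        for alpha in (0.05, 0.2):
            for mu in (139.5, 139.2):
                print(n, alpha, mu, round(beta(mu, alpha, n), 2))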
Analogy between testing and a criminal trial

An analogy helpful in understanding the standard logic applied when significance testing is employed in decision-making involves thinking of the process of coming to a decision as a sort of legal proceeding, like a criminal trial. In a criminal trial, there are two opposing hypotheses, namely

H0: The defendant is innocent
Ha: The defendant is guilty
Evidence, playing a role similar to the data used in testing, is gathered and used to
decide between the two hypotheses. Two types of potential error exist in a criminal
trial: the possibility of convicting an innocent person (parallel to the type I error)
and the possibility of acquitting a guilty person (similar to the type II error). A
criminal trial is a situation where the two types of error are definitely thought of as
having differing consequences, and the two hypotheses are treated asymmetrically.
The a priori presumption in a criminal trial is in favor of H0 , the defendant’s
innocence. In order to keep the chance of a false conviction small (i.e., keep α
small), overwhelming evidence is required for conviction, in much the same way
that if small α is used in testing, extreme values of the test statistic are needed in
order to indicate rejection of H0 . One consequence of this method of operation in
criminal trials is that there is a substantial chance that a guilty individual will be
acquitted, in the same way that small α’s produce big β’s in testing contexts.
This significance testing/criminal trial parallel is useful, but do not make more
of it than is justified. Not all significance-testing applications are properly thought
of in this light. And few engineering scenarios are simple enough to reduce to a
“decide between H0 and Ha ” choice. Sensible applications of significance testing are
Table 6.3  n = 100 type II error probabilities (β), for µ = 139.2 and µ = 139.5
WASHINGTON (AP) —A gadget that cuts off a car’s air conditioner when the
vehicle accelerates has become the first product aimed at cutting gasoline
consumption to win government endorsement.
The device, marketed under the name “Pass Master,” can provide a
“small but real fuel economy benefit,” the Environmental Protection Agency
said Wednesday.
Motorists could realize up to 4 percent fuel reduction while using their air
conditioners on cars equipped with the device, the agency said. That would
translate into .8-miles-per-gallon improvement for a car that normally gets 20
miles to the gallon with the air conditioner on.
The agency cautioned that the 4 percent figure was a maximum amount
and could be less depending on a motorist’s driving habits, the type of car and
the type of air conditioner.
But still the Pass Master, which sells for less than $15, is the first of 40
products to pass the EPA’s tests as making any “statistically significant”
improvement in a car’s mileage.
Figure 6.8 Article from The Lafayette Journal and Courier, Page D-3, August 28, 1980.
Reprinted by permission of the Associated Press.
© 1980 the Associated Press.
result. And an engineer equipped with a confidence interval for the mean mileage
improvement is in a better position to judge this than is one who knows only that
the p-value was less than .05.
Example 5 (continued)

To illustrate the effect that sample size has on observed level of significance, return to the breakaway torque problem and consider two hypothetical samples, one based on n = 25 and the other on n = 100, but both giving x̄ = 32.5 in. oz and s = 5.1 in. oz.

For testing H0: µ = 33.5 with Ha: µ < 33.5, the first hypothetical sample gives

z = (32.5 − 33.5)/(5.1/√25) = −.98

with associated observed level of significance

Φ(−.98) = .16

The second hypothetical sample gives

z = (32.5 − 33.5)/(5.1/√100) = −1.96

with associated observed level of significance Φ(−1.96) = .025.
Because the second sample size is larger, the second sample gives stronger
evidence that the mean breakaway torque is below 33.5 in. oz. But the best data-
based guess at the difference between µ and 33.5 is x̄ − 33.5 = −1.0 in. oz in
both cases. And it is the size of the difference between µ and 33.5 that is of
primary engineering importance.
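The two hypothetical samples can be compared in a few lines of software—same x̄ and s, different n, hence different p-values (Python and scipy assumed as illustrative tools):

    import math
    from scipy.stats import norm

    # Example 5: effect of n on the observed level of significance.
    xbar, s, mu0 = 32.5, 5.1, 33.5
    for n in (25, 100):
        z = (xbar - mu0) / (s / math.sqrt(n))
        print(n, round(z, 2), round(norm.cdf(z), 3))  # one-sided p for Ha: mu < 33.5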
It is further useful to realize that in addition to doing its primary job of providing
an interval of plausible values for a parameter, a confidence interval itself also pro-
vides some significance-testing information. For example, a 95% confidence interval
for a parameter contains all those values of the parameter for which significance
tests using the data in hand would produce p-values bigger than 5%. (Those values
not covered by the interval would have associated p-values smaller than 5%.)
Example 5 (continued)

Recall from Section 6.1 that a 90% one-sided confidence interval for the mean breakaway torque for failed drives is (−∞, 12.8). This means that for any value, #, larger than 12.8 in. oz, a significance test of H0: µ = # with Ha: µ < # would produce a p-value less than .1. So clearly, the observed level of significance corresponding to the null hypothesis H0: µ = 33.5 is less than .1. (In fact, as was seen earlier in this section, the p-value is 0 to two decimal places.) Put more loosely, the interval (−∞, 12.8) is a long way from containing 33.5 in. oz and therefore makes such a value of µ quite implausible.
The discussion here could well raise the question “What practical role remains
for significance testing?” Some legitimate answers to this question are
1. In an almost negative way, p-values can help an engineer gauge the extent to
which data in hand are inconclusive. When observed levels of significance
are large, more information is needed in order to arrive at any definitive
judgment.
2. Sometimes legal requirements force the use of significance testing in a
compliance or effectiveness demonstration. (This was the case in Figure 6.8,
where before the Pass Master could be marketed, some mileage improvement
had to be legally demonstrated.)
3. There are cases where the use of significance testing in a decision-making
framework is necessary and appropriate. (An example is acceptance sam-
pling: Based on information from a sample of items from a large lot, one
must determine whether or not to receive shipment of the lot.)
So, properly understood and handled, significance testing does have its place in
engineering practice. Thus, although the rest of this book features estimation over
significance testing, methods of significance testing will not be completely ignored.
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. In the aluminum contamination study discussed in Exercise 2 of Section 6.1 and in Chapter Exercise 2 of Chapter 3, it was desirable to have mean aluminum content for samples of recycled plastic below 200 ppm. Use the five-step significance-testing format and determine the strength of the evidence in the data that in fact this contamination goal has been violated. (You will want to begin with H0: µ = 200 ppm and use Ha: µ > 200 ppm.)

2. Heyde, Kuebrick, and Swanson measured the heights of 405 steel punches of a particular type. These were all from a single manufacturer and were supposed to have heights of .500 in. (The stamping machine in which these are used is designed to use .500 in. punches.) The students' measurements had x̄ = .5002 in. and s = .0026 in. (The raw data are given in Chapter Exercise 9 of Chapter 3.)
(a) Use the five-step format and test the hypothesis that the mean height of such punches is "on spec" (i.e., is .500 in.).
(b) Make a 98% two-sided confidence interval for the mean height of such punches produced by this manufacturer under conditions similar to those existing when the students' punches were manufactured. Is your interval consistent with the outcome of the test in part (a)? Explain.
(c) In the students' application, the mean height of the punches did not tell the whole story about how they worked in the stamping machine. Several of these punches had to be placed side by side and used to stamp the same piece of material. In this context, what other feature of the height distribution is almost certainly of practical importance?

3. Discuss, in the context of Exercise 2, part (a), the potential difference between statistical significance and practical importance.

4. In the context of the machine screw diameter study of Exercise 4 of Section 6.1, suppose that the nominal diameter of such screws is 4.70 mm. Use the five-step significance-testing format and assess the strength of the evidence provided by the data that the long-run mean measured diameter differs from nominal. (You will want to begin with H0: µ = 4.70 mm and use Ha: µ ≠ 4.70 mm.)

5. Discuss, in the context of Exercise 4, the potential difference between statistical significance and practical importance.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
(x̄ − µ)/(s/√n)    (6.16)
is approximately standard normal. So if, for example, one mechanically uses the
large-n confidence interval formula
x̄ ± z (s/√n)    (6.17)
with a small sample, there is no way of assessing what actual level of confidence
should be declared. That is, for small n, using z = 1.96 in formula (6.17) generally
doesn’t produce 95% confidence intervals. And without a further condition, there is
neither any way to tell what confidence might be associated with z = 1.96 nor any
way to tell how to choose z in order to produce a 95% confidence level.
There is one important special circumstance in which it is possible to reason in
a way parallel to the work in Sections 6.1 and 6.2 and arrive at inference methods
for means based on small sample sizes. That is the situation where it is sensible to
model the observations as iid normal random variables. The normal observations
case is convenient because although the variable (6.16) is not standard normal, it
does have a recognized, tabled distribution. This is the Student t distribution.
[Figure 6.9: t probability densities for ν = 1, 2, 5, and 11, together with the standard normal density]
The word Student in Definition 13 recalls the pen name of the statistician who first came upon formula (6.18). Expression (6.18) is rather formidable looking. No direct
computations with it will actually be required in this book. But, it is useful to have
expression (6.18) available in order to sketch several t probability densities, to get a
feel for their shape. Figure 6.9 pictures the t densities for degrees of freedom ν = 1,
2, 5, and 11, along with the standard normal density.
The message carried by Figure 6.9 is that the t probability densities are bell
shaped and symmetric about 0. They are flatter than the standard normal density but
are increasingly like it as ν gets larger. In fact, for most practical purposes, for ν larger than about 30, the t distribution with ν degrees of freedom and the standard normal distribution are indistinguishable.
Probabilities for the t distributions are not typically found using the density in
expression (6.18), as no simple antiderivative for f (t) exists. Instead, it is common
to use tables (or statistical software) to evaluate common t distribution quantiles
and to get at least crude bounds on the types of probabilities needed in significance
testing. Table B.4 is a typical table of t quantiles. Across the top of the table
are several cumulative probabilities. Down the left side are values of the degrees
of freedom parameter, ν. In the body of the table are corresponding quantiles.
Notice also that the last line of the table is a “ν = ∞” (i.e., standard normal)
line.
Example 7 (continued)

First, looking at the ν = 5 row of Table B.4 under the cumulative probability .95, 2.015 is found in the body of the table. That is, Q(.95) = 2.015 or (equivalently) P[T ≤ 2.015] = .95.

Then note that by symmetry,

P[|T| > 1.9] = P[T < −1.9] + P[T > 1.9] = 2P[T > 1.9] = 2(1 − P[T ≤ 1.9])

Looking at the ν = 5 row of Table B.4, 1.9 is between the .90 and .95 quantiles of the t5 distribution. That is,

.90 < P[T ≤ 1.9] < .95

so finally

.10 < P[|T| > 1.9] < .20

Similarly,

P[|T| > 2.3] = P[T < −2.3] + P[T > 2.3] = 2P[T > 2.3] = 2(1 − P[T ≤ 2.3])

Then, from the ν = 5 row of Table B.4, 2.3 is seen to be between the .95 and .975 quantiles of the t5 distribution. That is,

.95 < P[T ≤ 2.3] < .975

so

.05 < P[|T| > 2.3] < .10

[Figure: the t5 density, with Q(.9) = 1.476, Q(.95) = 2.015, and Q(.975) = 2.571 and the values ±1.9 and 2.3 located on it]
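The Table B.4 lookups of Example 7 can also be reproduced with statistical software. The following is a minimal sketch (ours, not the book's) using Python's scipy library:

from scipy import stats

t5 = stats.t(df=5)                # the t distribution with nu = 5

print(t5.ppf(0.95))               # quantile Q(.95) = 2.015
print(t5.cdf(2.015))              # P[T <= 2.015] = .95
print(2 * (1 - t5.cdf(1.9)))      # P[|T| > 1.9], about .12 (between .10 and .20)
print(2 * (1 - t5.cdf(2.3)))      # P[|T| > 2.3], about .07 (between .05 and .10)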
The connection between expressions (6.18) and (6.16) that allows the develop-
ment of small-n inference methods for normal observations is that if an iid normal
model is appropriate,
T = (x̄ − µ)/(s/√n)    (6.19)

has the t distribution with ν = n − 1 degrees of freedom. Thus, when an iid normal model is appropriate, a two-sided confidence interval for µ has endpoints

x̄ ± t (s/√n)    (6.20)
where t is chosen such that the tn−1 distribution assigns probability corresponding
to the desired confidence level to the interval between −t and t. Further, the null
hypothesis
H0 : µ = #

can be tested using the test statistic

T = (x̄ − #)/(s/√n)    (6.21)

and a tn−1 reference distribution.
Table 6.4
Cycles to Failure of Ten Springs under 950 N/mm² Stress (10³ cycles)

Spring Lifetimes: 225, 171, 198, 189, 189, 135, 162, 135, 117, 162
[Figure: normal plot of the spring lifetimes of Table 6.4]

Using formula (6.20), a two-sided 90% confidence interval for the mean spring lifetime has endpoints

168.3 ± 1.833 (33.1/√10)

i.e.,

168.3 ± 19.2

i.e.,

149.1 × 10³ cycles and 187.5 × 10³ cycles
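As a check on the arithmetic above, this minimal Python sketch (ours, not the book's) recomputes the 90% two-sided interval for µ directly from the Table 6.4 data:

import numpy as np
from scipy import stats

x = np.array([225, 171, 198, 189, 189, 135, 162, 135, 117, 162])  # 10^3 cycles
n, xbar, s = len(x), x.mean(), x.std(ddof=1)   # xbar = 168.3, s = 33.1

t = stats.t.ppf(0.95, df=n - 1)   # 1.833, for 90% two-sided confidence
half = t * s / np.sqrt(n)         # about 19.2
print(xbar - half, xbar + half)   # about 149.1 and 187.5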
[Figure 6.12: normal plots of several simulated samples of size n = 10 from a single normal distribution]
Table 6.5
Leading-Edge and Trailing-Edge Dimensions for Five Workpieces

Piece   Leading-Edge Measurement (in.)   Trailing-Edge Measurement (in.)
1       .168                             .169
2       .170                             .168
3       .165                             .168
4       .165                             .168
5       .170                             .169
Large-sample confidence limits for µd:

d̄ ± z (sd/√n)    (6.22)
And the null hypothesis

H0 : µd = #    (6.23)

can be tested using the large-sample test statistic

Z = (d̄ − #)/(sd/√n)    (6.24)

If the differences can be modeled as iid normal random variables, small-sample confidence limits for µd are

d̄ ± t (sd/√n)    (6.25)

(t being chosen from the tn−1 distribution as before), and the null hypothesis (6.23) can be tested using the test statistic

T = (d̄ − #)/(sd/√n)    (6.26)
Example 9 (continued)

To illustrate this method of paired differences, consider testing the null hypothesis H0 : µd = 0 and making a 95% confidence interval for any consistent difference
between leading- and trailing-edge dimensions, µd , based on the data in Table
6.5.
Begin by reducing the n = 5 paired observations in Table 6.5 to differences d = (leading-edge dimension) − (trailing-edge dimension):

−.001, .002, −.003, −.003, .001

[Figure: normal plot of the five differences]
1. H0 : µd = 0.
2. Ha : µd 6= 0.
(There is a priori no reason to adopt a one-sided alternative hypothesis.)
3. The test statistic is

T = (d̄ − 0)/(sd/√n)

and the reference distribution is t with n − 1 = 4 degrees of freedom.
4. The samples give

t = −.0008/(.0023/√5) = −.78
Consulting Table B.4 for the .975 quantile of the t4 distribution, t = 2.776
is the appropriate multiplier for use in expression (6.25) for 95% confidence.
That is, a two-sided 95% confidence interval for the mean difference between the
leading- and trailing-edge dimensions has endpoints
−.0008 ± 2.776 (.0023/√5)

i.e.,

−.0008 ± .0029

i.e.,

−.0037 in. and .0021 in.
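The whole paired-difference analysis of Example 9 takes only a few lines of code. The following is a minimal sketch (ours, not the book's), starting from the Table 6.5 measurements:

import numpy as np
from scipy import stats

lead = np.array([.168, .170, .165, .165, .170])
trail = np.array([.169, .168, .168, .168, .169])
d = lead - trail                      # the n = 5 differences

n, dbar, sd = len(d), d.mean(), d.std(ddof=1)   # dbar = -.0008, sd = .0023
t_obs = dbar / (sd / np.sqrt(n))                # observed t, about -.78

t = stats.t.ppf(0.975, df=n - 1)                # 2.776, for 95% confidence
half = t * sd / np.sqrt(n)                      # about .0029
print(t_obs, dbar - half, dbar + half)          # about -.0037 and .0021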
In situations like Example 10, it is useful to adopt subscript notation for both the
parameters and the statistics—for example, letting µ1 and µ2 stand for underlying
distributional means corresponding to the first and second conditions and x̄ 1 and x̄ 2
stand for corresponding sample means. Now if the two data-generating mechanisms
are conceptually essentially equivalent to sampling with replacement from two
distributions, Section 5.5 says that x̄ 1 has mean µ1 and variance σ12 /n 1 , and x̄ 2 has
mean µ2 and variance σ22 /n 2 .
The difference in sample means x̄ 1 − x̄ 2 is a natural statistic to use in comparing
µ1 and µ2 . Proposition 1 in Chapter 5 (see page 307) implies that if it is reasonable
Figure 6.14 Back-to-back stem-and-leaf plots of piece weights (g) for molded and crushed pieces (stems in units of 10 g)

                           Molded | Stem | Crushed
                              7.9 |  11  |
                    4.5, 3.6, 1.2 |  12  |
9.8, 8.9, 7.9, 7.1, 6.1, 5.7, 5.1 |  12  |
                    2.3, 1.3, 0.0 |  13  |
          8.0, 7.0, 6.5, 6.3, 6.2 |  13  |
                         2.2, 0.1 |  14  |
                                  |  14  |
                    2.1, 1.2, 0.2 |  15  |
                                  |  15  |
                                  |  16  | 1.8
                                  |  16  | 5.8, 9.6
                                  |  17  | 1.3, 2.0, 2.4, 3.3, 3.4, 3.7
                                  |  17  | 6.6, 9.8
                                  |  18  | 0.2, 0.9, 3.3, 3.8, 4.9
                                  |  18  | 5.5, 6.5, 7.1, 7.3, 9.1, 9.8
                                  |  19  | 0.0, 1.0
                                  |  19  |
to think of the two samples as independently chosen, then x̄1 − x̄2 has

E(x̄1 − x̄2) = µ1 − µ2

and

Var(x̄1 − x̄2) = σ1²/n1 + σ2²/n2

If, in addition, n1 and n2 are large (so that x̄1 and x̄2 are each approximately normal), x̄1 − x̄2 is approximately normal—i.e.,

Z = (x̄1 − x̄2 − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)    (6.28)

is approximately standard normal.
It is possible to begin with the fact that the variable (6.28) is approximately
standard normal and end up with confidence interval and significance-testing meth-
ods for µ1 − µ2 by using logic exactly parallel to that in the “known-σ ” parts of
Sections 6.1 and 6.2. But practically, it is far more useful to begin instead with an
expression that is free of the parameters σ1 and σ2 . Happily, for large n 1 and n 2 , not
only is the variable (6.28) approximately standard normal but so is
Z = (x̄1 − x̄2 − (µ1 − µ2)) / √(s1²/n1 + s2²/n2)    (6.29)
Then the standard logic of Section 6.1 shows that a two-sided large-sample confi-
dence interval for the difference µ1 − µ2 based on two independent samples has
endpoints
Large-sample confidence limits for µ1 − µ2:

x̄1 − x̄2 ± z √(s1²/n1 + s2²/n2)    (6.30)
where z is chosen such that the probability that the standard normal distribution
assigns to the interval between −z and z corresponds to the desired confidence. And
the logic of Section 6.2 shows that under the same conditions,
H0 : µ1 − µ2 = #

can be tested using the large-sample test statistic

Z = (x̄1 − x̄2 − #) / √(s1²/n1 + s2²/n2)    (6.31)
Example 10 (continued)

In the molding problem, the crushed pieces were a priori expected to pack better than the molded pieces (which for other purposes are more convenient). Consider
testing the statistical significance of the difference in mean weights and also
making a 95% one-sided confidence interval for the difference (declaring that the
crushed mean weight minus the molded mean weight is at least some number).
The sample sizes here (n 1 = n 2 = 24) are borderline for being called large.
It would be preferable to have a few more observations of each type. Lacking
them, we will go ahead and use the methods of expressions (6.30) and (6.31) but
remain properly cautious of the results should they in any way produce a “close
call” in engineering or business terms.
Arbitrarily labeling “crushed” condition 1 and “molded” condition 2 and
calculating from the data in Figure 6.14 that x̄ 1 = 179.55 g, s1 = 8.34 g, x̄ 2 =
132.97 g, and s2 = 9.31 g, the five-step testing format produces the following
summary:
1. H0 : µ1 − µ2 = 0.
2. Ha : µ1 − µ2 > 0.
(The research hypothesis here is that the crushed mean exceeds the molded
mean so that the difference, taken in this order, is positive.)
3. The test statistic is

Z = (x̄1 − x̄2 − 0) / √(s1²/n1 + s2²/n2)

The reference distribution is standard normal, and large observed values of z will count as evidence against H0.
4. The samples give

z = (179.55 − 132.97 − 0) / √((8.34)²/24 + (9.31)²/24) = 18.3

5. The observed level of significance is P[a standard normal variable ≥ 18.3] ≈ 0, overwhelming evidence that the crushed mean weight exceeds the molded mean weight.

Then, using the one-sided version of formula (6.30), a 95% lower confidence bound for µ1 − µ2 is

179.55 − 132.97 − 1.645 √((8.34)²/24 + (9.31)²/24)

i.e., 42.38 g. That is, with 95% confidence, the crushed mean weight minus the molded mean weight exceeds 42.38 g.
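A minimal Python sketch (ours, not the book's) of the large-sample calculations above, working from the quoted summary statistics:

import math
from scipy.stats import norm

x1, s1, n1 = 179.55, 8.34, 24     # condition 1: crushed
x2, s2, n2 = 132.97, 9.31, 24     # condition 2: molded

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2 - 0) / se                     # about 18.3
p_value = 1 - norm.cdf(z)                  # essentially 0 for Ha: mu1 - mu2 > 0
lower = (x1 - x2) - norm.ppf(0.95) * se    # 95% lower confidence bound, about 42.4
print(z, p_value, lower)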
Students are sometimes uneasy about the arbitrary choice involved in labeling
the two conditions in a two-sample study. The fact is that either one can be used. As
long as a given choice is followed through consistently, the real-world conclusions
reached will be completely unaffected by the choice. In Example 10, if the molded
condition is labeled number 1 and the crushed condition number 2, an appropriate
one-sided confidence for the molded mean minus the crushed mean is
(−∞, −42.38)
This has the same meaning in practical terms as the interval in the example.
The present methods apply where single measurements are made on each ele-
ment of two different samples. This stands in contrast to problems of paired data
(where there are bivariate observations on a single sample). In the woodworking
case of Example 9, the data were paired because both leading-edge and trailing-edge
measurements were made on each piece. If leading-edge measurements were taken
from one group of items and trailing-edge measurements from another, a two-sample
(not a paired difference) analysis would be in order.
Example 8 (continued)

The data of W. Armstrong on spring lifetimes (appearing in the book by Cox and Oakes) not only concern spring longevity at a 950 N/mm² stress level but
also longevity at a 900 N/mm2 stress level. Table 6.7 repeats the 950 N/mm2 data
from before and gives the lifetimes of ten springs at the 900 N/mm2 stress level
as well.
Table 6.7
Spring Lifetimes under Two Different Levels of Stress (10³ cycles)

950 N/mm² Stress: 225, 171, 198, 189, 189, 135, 162, 135, 117, 162
900 N/mm² Stress: 216, 162, 153, 216, 225, 216, 306, 225, 243, 189

[Figure 6.15: normal plots of spring lifetimes under the two stress levels]
Figure 6.15 consists of normal plots for the two samples made on a single
set of axes. In light of the kind of variation in linearity and slope exhibited in
Figure 6.12 by the normal plots for samples of this size (n = 10) from a single
normal distribution, there is certainly no strong evidence in Figure 6.15 against
the appropriateness of an “equal variances, normal distributions” model for spring
lifetimes.
If the assumption that σ1 = σ2 is used, then the common value is called σ, and it makes sense that both s1 and s2 will approximate σ. That suggests that they should somehow be combined into a single estimate of the basic, baseline variation. As it turns out, mathematical convenience dictates a particular method of combining or pooling the individual s's to arrive at a single estimate of σ: the pooled sample variance

sP² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)    (6.32)
Example 8 (continued)

In the spring-life case, making the arbitrary choice to call the 900 N/mm² stress level condition 1 and the 950 N/mm² stress level condition 2, s1 = 42.9 (10³ cycles) and s2 = 33.1 (10³ cycles). So pooling the two sample variances via formula (6.32) produces

sP² = (9(42.9)² + 9(33.1)²) / (10 + 10 − 2) = 1468.0 (10³ cycles)²

and thus

sP = √1468.0 = 38.3 (10³ cycles)
For independent samples from distributions with a common standard deviation σ, the variable

Z = (x̄1 − x̄2 − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)

of expression (6.28) becomes

Z = (x̄1 − x̄2 − (µ1 − µ2)) / (σ √(1/n1 + 1/n2))    (6.33)
One could use the fact that expression (6.33) is standard normal to produce methods
for confidence interval estimation and significance testing. But for use, these would
require the input of the parameter σ . So instead of beginning with expression (6.28)
or (6.33), it is standard to replace σ in expression (6.33) with sP and begin with the
quantity
T = ((x̄1 − x̄2) − (µ1 − µ2)) / (sP √(1/n1 + 1/n2))    (6.34)
Expression (6.34) is crafted exactly so that under the present model assumptions,
the variable (6.34) has a well-known, tabled probability distribution: the t distribu-
tion with ν = (n 1 − 1) + (n 2 − 1) = n 1 + n 2 − 2 degrees of freedom. (Notice that
the n 1 − 1 degrees of freedom associated with the first sample add together with
the n 2 − 1 degrees of freedom associated with the second to produce n 1 + n 2 − 2
overall.) This probability fact, again via the kind of reasoning developed in Sec-
tions 6.1 and 6.2, produces inference methods for µ1 − µ2 . That is, a two-sided
confidence interval for the difference µ1 − µ2 , based on independent samples from
normal distributions with a common variance, has endpoints
Normal distributions (σ1 = σ2) confidence limits for µ1 − µ2:

x̄1 − x̄2 ± t sP √(1/n1 + 1/n2)    (6.35)

where t is chosen such that the probability that the tn1+n2−2 distribution assigns to the interval between −t and t corresponds to the desired confidence. And under the same conditions,
H0 : µ1 − µ2 = #

can be tested using the test statistic

T = (x̄1 − x̄2 − #) / (sP √(1/n1 + 1/n2))    (6.36)
Example 8 (continued)

We return to the spring-life case to illustrate small-sample inference for two means. First consider testing the hypothesis of equal mean lifetimes with an alternative of increased lifetime accompanying a reduction in stress level. Then consider making a two-sided 95% confidence interval for the difference in mean lifetimes.
Continuing to call the 900 N/mm2 stress level condition 1 and the 950 N/mm2
stress level condition 2, from Table 6.7 x̄ 1 = 215.1 and x̄ 2 = 168.3, while (from
before) sP = 38.3. The five-step significance-testing format then gives the fol-
lowing:
1. H0 : µ1 − µ2 = 0.
2. Ha : µ1 − µ2 > 0.
(The engineering expectation is that condition 1 produces the larger life-
times.)
3. The test statistic is

T = (x̄1 − x̄2 − 0) / (sP √(1/n1 + 1/n2))
The reference distribution is t with 10 + 10 − 2 = 18 degrees of freedom,
and large observed t will count as evidence against H0 .
4. The samples give

t = (215.1 − 168.3 − 0) / (38.3 √(1/10 + 1/10)) = 2.7

5. The observed level of significance is P[a t18 random variable ≥ 2.7], which (from Table B.4) is between .005 and .01. This is strong evidence that the lower stress level produces the larger mean spring lifetime.

Then, since the .975 quantile of the t18 distribution is 2.101, a two-sided 95% confidence interval for µ1 − µ2 has endpoints

215.1 − 168.3 ± 2.101 (38.3) √(1/10 + 1/10)

i.e.,

46.8 ± 36.0

i.e.,

10.8 × 10³ cycles and 82.8 × 10³ cycles
The data in Table 6.7 provide enough information to establish convincingly that
increased stress is associated with reduced mean spring life. But although the
apparent size of that reduction when moving from the 900 N/mm2 level (condition
1) to the 950 N/mm2 level (condition 2) is 46.8 × 103 cycles, the variability
present in the data is large enough (and the sample sizes small enough) that only
a precision of ±36.0 × 103 cycles can be attached to the figure 46.8 × 103 cycles.
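The pooled-variance analysis of Example 8 can be verified with the following minimal Python sketch (ours, not the book's), based on the two samples' summary statistics:

import math
from scipy import stats

x1, s1, n1 = 215.1, 42.9, 10      # 900 N/mm^2 sample
x2, s2, n2 = 168.3, 33.1, 10      # 950 N/mm^2 sample

sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # 38.3
se = sp * math.sqrt(1 / n1 + 1 / n2)
t_obs = (x1 - x2) / se                               # about 2.7
p_value = 1 - stats.t.cdf(t_obs, df=n1 + n2 - 2)     # about .007

t = stats.t.ppf(0.975, df=n1 + n2 - 2)               # 2.101, for 95% confidence
print(t_obs, p_value, (x1 - x2) - t * se, (x1 - x2) + t * se)  # 10.8 to 82.8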
In the case that σ1 and σ2 cannot be assumed equal, one widely used approximate method employs Satterthwaite's "estimated degrees of freedom"

ν̂ = (s1²/n1 + s2²/n2)² / ( s1⁴/((n1 − 1)n1²) + s2⁴/((n2 − 1)n2²) )    (6.37)
and for a desired confidence level, suppose that t̂ is such that the t distribution with
ν̂ degrees of freedom assigns that probability to the interval between −t̂ and t̂. Then
the two endpoints
Satterthwaite (approximate) normal distribution confidence limits for µ1 − µ2:

x̄1 − x̄2 ± t̂ √(s1²/n1 + s2²/n2)    (6.38)
Example 8 (continued)

Armstrong collected spring lifetime data at stress levels besides the 900 and 950 N/mm² levels used thus far in this example. Ten springs tested at 850 N/mm²
had lifetimes with x̄ = 348.1 and s = 57.9 (both in 103 cycles) and a reasonably
linear normal plot. But taking the 850, 900, and 950 N/mm2 data together, there
is a clear trend to smaller and more consistent lifetimes as stress is increased. In
light of this fact, should mean lifetimes at the 850 and 950 N/mm2 stress levels
be compared, use of a constant variance assumption seems questionable.
Example 8 (continued)

Consider then what the Satterthwaite method (6.38) gives for two-sided approximate 95% confidence limits for the difference in 850 and 950 N/mm² mean lifetimes. Equation (6.37) gives

ν̂ = ( (57.9)²/10 + (33.1)²/10 )² / ( (57.9)⁴/(9(10)²) + (33.1)⁴/(9(10)²) ) = 14.3
and (rounding “degrees of freedom” down) the .975 quantile of the t14 distribution
is 2.145. So the 95% limits (6.38) for the (850 N/mm2 minus 950 N/mm2 )
difference in mean lifetimes (µ850 − µ950 ) are
348.1 − 168.3 ± 2.145 √((57.9)²/10 + (33.1)²/10)

i.e.,

179.8 ± 45.2

i.e.,

134.6 × 10³ cycles and 225.0 × 10³ cycles
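The Satterthwaite computation can likewise be checked in a few lines. This minimal Python sketch (ours, not the book's) reproduces ν̂ and the approximate 95% limits:

import math
from scipy import stats

x1, s1, n1 = 348.1, 57.9, 10      # 850 N/mm^2 sample
x2, s2, n2 = 168.3, 33.1, 10      # 950 N/mm^2 sample

v1, v2 = s1**2 / n1, s2**2 / n2
nu_hat = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))   # about 14.3

t_hat = stats.t.ppf(0.975, df=math.floor(nu_hat))   # 2.145, rounding df down
half = t_hat * math.sqrt(v1 + v2)                   # about 45.2
print(nu_hat, (x1 - x2) - half, (x1 - x2) + half)   # about 134.6 and 225.0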
The inference methods represented by displays (6.35), (6.36), and (6.38) are
the last of the standard one- and two-sample methods for means. In the next two
sections, parallel methods for variances and proportions are considered. But before
leaving this section to consider those methods, a final comment is appropriate about
the small-sample methods.
This discussion has emphasized that, strictly speaking, the nominal properties
(in terms of coverage probabilities for confidence intervals and relevant p-value
declarations for significance tests) of the small-sample methods depend on the
appropriateness of exactly normal underlying distributions and (in the cases of the
methods (6.35) and (6.36)) exactly equal variances. On the other hand, when actually
applying the methods, rather crude probability-plotting checks have been used for
verifying (only) that the models are roughly plausible. According to conventional
statistical wisdom, the small-sample methods presented here are remarkably robust
to all but gross departures from the model assumptions. That is, as long as the model
assumptions are at least roughly a description of reality, the nominal confidence
levels and p-values will not be ridiculously incorrect. (For example, a nominally
90% confidence interval method might in reality be only an 80% method, but it will
not be only a 20% confidence interval method.) So the kind of plotting that has been
illustrated here is often taken as adequate precaution against unjustified application
of the small-sample inference methods for means.
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. What is the practical consequence of using a "normal distribution" confidence interval formula when in fact the underlying data-generating mechanism cannot be adequately described using a normal distribution? Say something more specific/informative than "an error might be made," or "the interval might not be valid." (What, for example, can be said about the real confidence level that ought to be associated with a nominally 90% confidence interval in such a situation?)

2. Consider again the situation of Exercise 3 of Section 3.1. (It concerns the torques required to loosen two particular bolts holding an assembly on a piece of machinery.)
(a) What model assumptions are needed in order to do inference for the mean top-bolt torque here? Make a plot to investigate the necessary distributional assumption.
(b) Assess the strength of the evidence in the data that the mean top-bolt torque differs from a target value of 100 ft lb.
(c) Make a two-sided 98% confidence interval for the mean top-bolt torque.
(d) What model assumptions are needed in order to compare top-bolt and bottom-bolt torques here? Make a plot for investigating the necessary distributional assumption.
(e) Assess the strength of the evidence that there is a mean increase in required torque as one moves from the top to the bottom bolts.
(f) Give a 98% two-sided confidence interval for the mean difference in torques between the top and bottom bolts.

3. The machine screw measurement study of DuToit, Hansen, and Osborne referred to in Exercise 4 of Section 6.1 involved measurement of diameters of each of 50 screws with both digital and vernier-scale calipers. For the student referred to in that exercise, the differences in measured diameters (digital minus vernier, with units of mm) had the following frequency distribution:
(a) Make a 90% two-sided confidence interval for the mean difference in digital and vernier readings for this student.
(b) Assess the strength of the evidence provided by these differences to the effect that there is a systematic difference in the readings produced by the two calipers (at least when employed by this student).
(c) Briefly discuss why your answers to parts (a) and (b) of this exercise are compatible. (Discuss how the outcome of part (b) could easily have been anticipated from the outcome of part (a).)

4. B. Choi tested the stopping properties of various bike tires on various surfaces. For one thing, he tested both treaded and smooth tires on dry concrete. The lengths of skid marks produced in his study under these two conditions were as follows (in cm).

Treaded            Smooth
365, 374, 376      341, 348, 349
391, 401, 402      355, 375, 391

(a) In order to make formal inferences about µTreaded − µSmooth based on these data, what must you be willing to use for model assumptions? Make a plot to investigate the reasonableness of those assumptions.
(b) Proceed under the necessary model assumptions to assess the strength of Choi's evidence of a difference in mean skid lengths.
(c) Make a 95% two-sided confidence interval for µTreaded − µSmooth assuming that treaded and smooth skid marks have the same variability.
(d) Use the Satterthwaite method and make an approximate 95% two-sided confidence interval for µTreaded − µSmooth assuming only that skid mark lengths for both types of tires are normally distributed.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Form (6.39) is not terribly inviting, but neither is it unmanageable. For instance, it is easy enough to use it to make the kind of plots in Figure 6.16 for comparing the shapes of the χ²ν distributions for various choices of ν.

The χ²ν distribution has mean ν and variance 2ν. For ν = 2, it is exactly the exponential distribution with mean 2. For large ν, the χ²ν distributions look increasingly bell-shaped (and can in fact be approximated by normal distributions with matching means and variances). Rather than using form (6.39) to find χ² probabilities, it is more common to use tables of χ² quantiles. Table B.5 is one such table. Across the top of the table are several cumulative probabilities. Down the left side of the table are values of the degrees of freedom parameter, ν. In the body of the table are corresponding quantiles.
[Figure 6.16: χ² probability densities for ν = 1, 2, 3, 5, and 8]

Finally, since 10.0 lies between the (ν = 3 line) entries of the table corresponding to cumulative probabilities .975 and .99 (i.e., the .975 and .99 quantiles of the χ²3 distribution), one may reason that, for X a χ²3 random variable,

.975 < P[X ≤ 10.0] < .99

so that

.01 < P[X > 10.0] < .025
The key fact enabling inference for σ is that if x1, x2, . . . , xn are iid normal random variables, the variable

X² = (n − 1)s²/σ²    (6.40)

has a χ²n−1 distribution. This fact is what is needed to identify inference methods for σ.
That is, given a desired confidence level concerning σ, one can choose χ² quantiles (say, L and U) such that the probability that a χ²n−1 random variable will take a value between L and U corresponds to that confidence level. (Typically, L and U are chosen to split the "unconfidence" between the upper and lower χ²n−1 tails.) Then the eventuality that

L < (n − 1)s²/σ² < U    (6.41)

corresponds to the desired confidence level. But expression (6.41) is algebraically equivalent to the eventuality that

(n − 1)s²/U < σ² < (n − 1)s²/L
This then means that when an engineering data-generating mechanism can be
thought of as essentially equivalent to random sampling from a normal distribu-
tion, a two-sided confidence interval for σ 2 has endpoints
Normal distribution confidence limits for σ²:

(n − 1)s²/U and (n − 1)s²/L    (6.42)
Similarly, the hypothesis

H0 : σ² = #

can be tested using the test statistic

X² = (n − 1)s²/#    (6.43)

and a χ²n−1 reference distribution.
One feature of the testing methodology that needs comment concerns the computing of p-values in the case that the alternative hypothesis is of the form Ha : σ² ≠ #. (p-values for the one-sided alternative hypotheses Ha : σ² < # and Ha : σ² > # are, respectively, the left and right χ²n−1 tail areas beyond the observed value of X².) The fact that the χ² distributions have no point of symmetry leaves some doubt for two-sided significance testing as to how an observed value of X² should be translated into a (two-sided) p-value. The convention that will be used here is as follows: If the observed value is larger than the χ²n−1 median, the (two-sided) p-value will be twice the χ²n−1 probability to the right of the observed value. If the observed value of X² is smaller than the χ²n−1 median, the (two-sided) p-value will be twice the χ²n−1 probability to the left of the observed value.
Knowing that display (6.42) gives endpoints for a confidence interval for σ² also leads to confidence intervals for functions of σ². The square roots of the values in display (6.42) give endpoints for a confidence interval for the standard deviation, σ. And six times the square roots of the values in display (6.42) could be used as endpoints of a confidence interval for the "6σ" capability of a process.
Table 6.8
Measurements of a Dimension on 20 Parts Machined on a CNC Lathe

Measured Dimension (.0001 in. over nominal)   Frequency
 8                                             1
 9                                             1
10                                            10
11                                             4
12                                             3
13                                             1
Example 12 (continued)

[Figure: normal plot of the measured dimensions of Table 6.8]
1. H0 : σ = .0007.
2. Ha : σ > .0007.
(The most practical concern is the possibility that the machine is not
capable of holding to the stated tolerances, and this is described in terms
of σ larger than standard.)
3. The test statistic is

X² = (n − 1)s²/(.0007)²
The reference distribution is χ²19, and large observed values of X² will count as evidence against H0.
4. The sample gives

x² = (20 − 1)(.00011)²/(.0007)² = .5

5. The observed level of significance is

P[a χ²19 random variable > .5]

which exceeds .995. There is nothing in the data in hand to indicate that the machine is incapable of holding to the given tolerances.
When this is compared to the ±20 × 10−4 in. engineering requirement, it shows
that the lathe in question is clearly capable of producing the kind of precision
specified for the given dimension.
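A minimal Python sketch (ours, not the book's) of the χ² calculations of Example 12, together with a two-sided 95% interval for σ based on display (6.42):

import math
from scipy import stats

n, s, sigma0 = 20, .00011, .0007
x2 = (n - 1) * s**2 / sigma0**2                  # about .5
p_value = 1 - stats.chi2.cdf(x2, df=n - 1)       # exceeds .995 for Ha: sigma > .0007

L = stats.chi2.ppf(0.025, df=n - 1)              # lower chi-square point
U = stats.chi2.ppf(0.975, df=n - 1)              # upper chi-square point
lo = math.sqrt((n - 1) * s**2 / U)               # lower limit for sigma
hi = math.sqrt((n - 1) * s**2 / L)               # upper limit for sigma
print(x2, p_value, lo, hi)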
Relationship between Fν1,ν2 and Fν2,ν1 quantiles:

Qν1,ν2(p) = 1 / Qν2,ν1(1 − p)    (6.45)
[Figure: F probability densities for (ν1, ν2) = (10, 100), (10, 10), (10, 4), and (4, 4)]
Fact (6.45) means that a small lower percentage point of an F distribution may be
obtained by taking the reciprocal of a corresponding small upper percentage point
of the F distribution with degrees of freedom reversed.
Q3,5(.01) = 1 / Q5,3(.99)

so that using the ν1 = 5 column and ν2 = 3 row of the table of F .99 quantiles, one has

Q3,5(.01) = 1/28.24 = .04
Next, considering P[V > 4.0], one finds (using the ν1 = 3 columns and
ν2 = 5 rows of Tables B.6) that 4.0 lies between the .90 and .95 quantiles of the
F3,5 distribution. That is,

.90 < P[V ≤ 4.0] < .95

so that

.05 < P[V > 4.0] < .10
Finally, considering P[V < .3], note that none of the entries in Tables B.6 is
less than 1.00. So to place the value .3 in the F3,5 distribution, one must locate its
reciprocal, 3.33(= 1/.3), in the F5,3 distribution and then make use of expression
(6.45). Using the ν1 = 5 columns and ν2 = 3 rows of Tables B.6, one finds that
3.33 is between the .75 and .90 quantiles of the F5,3 distribution. So by expression
(6.45), .3 is between the .1 and .25 quantiles of the F3,5 distribution, and

.10 < P[V < .3] < .25
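The F distribution lookups above can be reproduced with software in place of Tables B.6, as in this minimal Python sketch (ours, not the book's):

from scipy import stats

f35 = stats.f(dfn=3, dfd=5)       # the F distribution with nu1 = 3, nu2 = 5

print(f35.ppf(0.01))              # Q_{3,5}(.01) = 1/Q_{5,3}(.99), about .04
print(1 - f35.cdf(4.0))           # P[V > 4.0], between .05 and .10
print(f35.cdf(0.3))               # P[V < .3], between .10 and .25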
If s1² and s2² come from independent samples of sizes n1 and n2 from normal distributions, the variable

F = (s1²/σ1²) · (σ2²/s2²)    (6.46)

has an Fn1−1,n2−1 distribution. So choosing L and U such that the Fn1−1,n2−1 probability assigned to the interval (L, U) corresponds to the desired confidence, the eventuality that

L < (s1²/σ1²) · (σ2²/s2²) < U

is algebraically equivalent to

(1/U)(s1²/s2²) < σ1²/σ2² < (1/L)(s1²/s2²)
This means that a two-sided confidence interval for σ1²/σ2² has endpoints

s1²/(U s2²) and s1²/(L s2²)    (6.47)

where L and U are Fn1−1,n2−1 quantiles such that the Fn1−1,n2−1 probability assigned to the interval (L, U) corresponds to the desired confidence.
And the null hypothesis

H0 : σ1²/σ2² = #    (6.48)

can be tested using the test statistic

F = s1²/(# · s2²)    (6.49)

and an Fn1−1,n2−1 reference distribution.
Example 14 Comparing Uniformity of Hardness Test Results for Two Types of Steel
Condon, Smith, and Woodford did some hardness testing on specimens of 4%
carbon steel. Part of their data are given in Table 6.9, where Rockwell hardness
measurements for ten specimens from a lot of heat-treated steel specimens and
five specimens from a lot of cold-rolled steel specimens are represented.
Consider comparing measured hardness uniformity for these two steel types
(rather than mean hardness, as might have been done in Section 6.3). Figure 6.19
shows side-by-side dot diagrams for the two samples and suggests that there
is a larger variability associated with the heat-treated specimens than with the
cold-rolled specimens. The two normal plots in Figure 6.20 indicate no obvious
problems with a model assumption of normal underlying distributions.
Table 6.9
Rockwell Hardness Measurements for Steel Specimens of Two Types

Heat-Treated: 32.8, 44.9, 34.4, 37.0, 23.6, 29.1, 39.5, 30.1, 29.2, 19.2
Cold-Rolled: 21.0, 24.5, 19.9, 14.8, 18.8
Example 14 (continued)

[Figure 6.19: side-by-side dot diagrams of Rockwell hardness for heat-treated and cold-rolled specimens]

[Figure 6.20: normal plots of Rockwell hardness for the two samples]
1. H0 : σ1²/σ2² = 1.
2. Ha : σ1²/σ2² ≠ 1.
(If there is any materials-related reason to pick a one-sided alternative hypothesis here, the authors don't know it.)
3. The test statistic is

F = s1²/s2²
The reference distribution is the F9,4 distribution, and both large observed
f and small observed f will constitute evidence against H0 .
4. The samples give

f = (7.52)²/(3.52)² = 4.6
5. Since the observed f is larger than 1, for the two-sided alternative, the p-value is

2P[an F9,4 random variable > 4.6]

From Tables B.6, 4.6 is between the F9,4 distribution .9 and .95 quantiles, so the observed level of significance is between .1 and .2. This makes it moderately (but not completely) implausible that the heat-treated and cold-rolled variabilities are the same.
In an effort to pin down the relative sizes of the heat-treated and cold-rolled hardness variabilities, the square roots of the expressions in display (6.47) may be used to give a 90% two-sided confidence interval for σ1/σ2. Now the .95 quantile of the F9,4 distribution is 6.0, while the .95 quantile of the F4,9 distribution is 3.63, implying that the .05 quantile of the F9,4 distribution is 1/3.63. Thus, a 90% confidence interval for the ratio of standard deviations σ1/σ2 has endpoints

√( (7.52)² / (6.0 (3.52)²) ) and √( (7.52)² / ((1/3.63)(3.52)²) )

That is, the interval is

(.87, 4.07)
Example 14 (continued)

The fact that the interval (.87, 4.07) covers values both smaller and larger than 1 indicates that the data in hand do not provide definitive evidence even as to which
of the two variabilities in material hardness is larger.
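A minimal Python sketch (ours, not the book's) reproducing the Example 14 test statistic and the 90% two-sided interval for σ1/σ2:

import math
from scipy import stats

s1, n1 = 7.52, 10                 # heat-treated sample
s2, n2 = 3.52, 5                  # cold-rolled sample

f_obs = s1**2 / s2**2                              # about 4.6
U = stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1)      # about 6.0
L = stats.f.ppf(0.05, dfn=n1 - 1, dfd=n2 - 1)      # about 1/3.63

lo = math.sqrt(s1**2 / (U * s2**2))                # about .87
hi = math.sqrt(s1**2 / (L * s2**2))                # about 4.07
print(f_obs, lo, hi)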
The methods of this section are, strictly speaking, normal distribution methods.
It is worthwhile to ask, “How essential is this normal distribution restriction to the
predictable behavior of these inference methods for one and two variances?” There
is a remark at the end of Section 6.3 to the effect that the methods presented there for
means are fairly robust to moderate violation of the section’s model assumptions.
Unfortunately, such is not the case for the methods for variances presented here.
These are methods whose nominal confidence levels and p-values can be fairly badly misleading unless the normal models are good ones. This makes the kind of careful data scrutiny that has been implemented in the examples (in the form of normal-plotting) essential to the responsible use of the methods of this section. And it suggests that since normal-plotting itself isn't typically terribly revealing unless the sample size involved is moderate to large, formal inferences for variances will be most safely made on the basis of moderate to large normal-looking samples.
The importance of the “normal distribution(s)” restriction to the predictable
operation of the methods of this section is not the only reason to prefer large sample
sizes for inferences on variances. A little experience with the formulas in this section
will convince the reader that (even granting the appropriateness of normal models)
small samples often do not prove adequate to answer practical questions about
variances. χ 2 and F confidence intervals for variances and variance ratios based on
small samples can be so big as to be of little practical value, and the engineer will
typically be driven to large sample sizes in order to solve variance-related real-world
problems. This is not in any way a failing of the present methods. It is simply a
warning and quantification of the fact that learning about variances requires more
data than (for example) learning about means.
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to data on Choi's bicycle stopping distance given in Exercise 4 of Section 6.3.
(a) Operating under the assumption that treaded tires produce normally distributed stopping distances, give a two-sided 95% confidence interval for the standard deviation of treaded tire stopping distances.
(b) Operating under the assumption that smooth tires produce normally distributed stopping distances, give a 99% upper confidence bound for the standard deviation of smooth tire stopping distances.
(c) Operating under the assumption that both treaded and smooth tires produce normally distributed stopping distances, assess the strength of Choi's evidence that treaded and smooth stopping distances differ in their variability. (Use H0 : σTreaded = σSmooth and Ha : σTreaded ≠ σSmooth and show the whole five-step format.)
(d) Operating under the assumption that both treaded and smooth tires produce normally distributed stopping distances, give a 90% two-sided confidence interval for the ratio σTreaded/σSmooth.

2. Consider again the situation of Exercise 3 of Section 3.1 and Exercise 2 of Section 6.3. (It concerns the torques required to loosen two particular bolts holding an assembly on a piece of machinery.)
(a) Operating under the assumption that top-bolt torques are normally distributed, give a 95% lower confidence bound for the standard deviation of the top-bolt torques.
(b) Translate your answer to part (a) into a 95% lower confidence bound on the "6σ process capability" of the top-bolt tightening process.
(c) It is not appropriate to use the methods (6.47) through (6.49) and the data given in Exercise 3 of Section 3.1 to compare the consistency of top-bolt and bottom-bolt torques. Why?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
has the binomial (n, p) distribution. The sample fraction p̂ is just a scale change
away from X = n p̂, so facts about the distribution of X have immediate counterparts
regarding the distribution of p̂. For example, Section 5.1.4 stated that the mean and
variance for the binomial (n, p) distribution are (respectively) np and np(1 − p).
This (together with Proposition 1 in Chapter 5) implies that p̂ has
Mean of the sample proportion:

E p̂ = E(X/n) = (1/n) E X = (1/n) · np = p    (6.50)

and

Variance of the sample proportion:

Var p̂ = Var(X/n) = (1/n²) Var X = np(1 − p)/n² = p(1 − p)/n    (6.51)
Equations (6.50) and (6.51) provide a reassuring picture of the behavior of the statis-
tic p̂. They show that the probability distribution of p̂ is centered at the underlying
parameter p, with a variability that decreases as n increases.
In the shaft-turning example (with p = .2), for a sample of n = 4, p̂ has

E p̂ = p = .2

and

√Var p̂ = √(p(1 − p)/n) = √((.2)(.8)/4) = .2

while for a sample of n = 100, p̂ has

E p̂ = p = .2

and

√Var p̂ = √((.2)(.8)/100) = .04

Comparing the two standard deviations, it is clear that the effect of a change in sample size from n = 4 to n = 100 is to produce a factor of 5 (= √(100/4)) decrease in the standard deviation of p̂, while the distribution of p̂ is centered at p for both sample sizes.
The basic new insight needed to provide large-sample inference methods based on p̂ is the fact that for large n, the binomial (n, p) distribution (and therefore also the distribution of p̂) is approximately normal. That is, for large n, approximate probabilities for X = n p̂ (or p̂) can be found using the normal distribution with mean µ = np (or µ = p) and variance σ² = np(1 − p) (or σ² = p(1 − p)/n).
Example 16 (continued)

In the shaft-turning example, consider the probability that for a sample of n = 100 shafts, p̂ ≥ .25. Notice that p̂ ≥ .25 is equivalent here to the eventuality that
n p̂ ≥ 25. So in theory the form of the binomial probability function given in
Definition 9 of Chapter 5 could be used and the desired probability could be
evaluated exactly as
a sum of binomial probabilities for x = 25 through 100. Far more conveniently, the normal approximation gives

z = (.25 − E p̂)/√Var p̂ = (.25 − .2)/.04 = 1.25

so

P[p̂ ≥ .25] ≈ 1 − Φ(1.25) = .11

[Figure: the normal approximation to the probability that p̂ ≥ .25]
The exact value of P[ p̂ ≥ .25] (calculated to four decimal places using the
binomial probability function) is .1314. (This can, for example, be obtained
using the MINITAB routine under the “Calc/Probability Distributions/Binomial”
menu.)
The statement that for large n, the random variable p̂ is approximately normal
is actually a version of the central limit theorem. For a given n, the approximation
is best for moderate p (i.e., p near .5), and a common rule of thumb is to require
that both the expected number of successes and the expected number of failures
be at least 5 before making use of a normal approximation to the binomial (n, p)
distribution. This is a requirement that
np ≥ 5 and n(1 − p) ≥ 5
Z = (p̂ − p) / √(p(1 − p)/n)    (6.54)

is approximately standard normal. This and the reasoning of Section 6.2 then imply that the null hypothesis

H0 : p = #

can be tested using the large-sample test statistic

Z = (p̂ − #) / √(#(1 − #)/n)    (6.55)
[Figure: plot of the function p(1 − p), which has maximum value .25 at p = .5]
Thus, modifying the endpoints in formula (6.56) by replacing the plus-or-minus part with ±z/(2√n) produces an interval that is guaranteed to be as wide as necessary to give the desired approximate confidence level. That is, the interval with endpoints

p̂ ± z (1/(2√n))    (6.57)

where z is chosen such that the standard normal probability between −z and z corresponds to a desired confidence, is a practically usable large-n, two-sided, conservative confidence interval for p. (Appropriate use of only one of the endpoints in display (6.57) gives a one-sided confidence interval.)
The other common method of dealing with the fact that the endpoints in formula
(6.56) are of no practical use is to begin the search for a formula from a point other
than the approximate standard normal distribution of the variable (6.54). For large
n, not only is the variable (6.54) approximately standard normal, but so is
Z = (p̂ − p) / √(p̂(1 − p̂)/n)    (6.58)
And the denominator of the quantity (6.58) (which amounts to an estimated standard
deviation for p̂) is free of the parameter p. So when manipulations parallel to those
in Section 6.1 are applied to expression (6.58), the conclusion is that the interval
with endpoints
Large-sample r
p̂(1 − p̂)
confidence limits p̂ ± z (6.59)
for p n
can be used as a two-sided, large-n confidence interval for p with confidence level
corresponding to the standard normal probability assigned to the interval between
−z and z. (One-sided confidence limits are obtained in the usual way, using only
one of the endpoints in display (6.59) and appropriately adjusting the confidence
level.)
Example 17 Inference for the Fraction of Dry Cells with Internal Shorts
The article “A Case Study of the Use of an Experimental Design in Preventing
Shorts in Nickel-Cadmium Cells” by Ophir, El-Gad, and Snyder (Journal of
Quality Technology, 1988) describes a series of experiments conducted to find
how to reduce the proportion of cells scrapped by a battery plant because of
internal shorts. At the beginning of the study, about 6% of the cells produced
were being scrapped because of internal shorts.
Among a sample of 235 cells made under a particular trial set of plant
operating conditions, 9 cells had shorts. Consider what formal inferences can be
drawn about the set of operating conditions based on such data. Here p̂ = 9/235 = .038, so two-sided 95% confidence limits for p are, by expression (6.59),

.038 ± 1.96 √((.038)(1 − .038)/235)

i.e.,

.038 ± .025

i.e., the interval

(.013, .063)    (6.60)
Notice that according to display (6.60), although p̂ = .038 < .06 (and thus indi-
cates that the trial conditions were an improvement over the standard ones), the
case for this is not airtight. The data in hand allow some possibility that p for the
trial conditions even exceeds .06. And the ambiguity is further emphasized if the
conservative formula (6.57) is used in place of expression (6.59). Instead of 95%
confidence endpoints of .038 ± .025, formula (6.57) gives endpoints .038 ± .064.
To illustrate the significance-testing method represented by expression (6.55),
consider testing with an alternative hypothesis that the trial plant conditions are
an improvement over the standard ones. One then has the following summary:
1. H0 : p = .06.
2. Ha : p < .06.
3. The test statistic is

Z = (p̂ − .06) / √((.06)(1 − .06)/n)

The reference distribution is standard normal, and small observed values of z will count as evidence against H0.
4. The sample gives

z = (.038 − .06) / √((.06)(1 − .06)/235) = −1.42

5. The observed level of significance is

Φ(−1.42) = .08
This is strong but not overwhelming evidence that the trial plant conditions
are an improvement on the standard ones.
It needs to be emphasized again that these inferences depend for their practi-
cal relevance on the appropriateness of the “stable process/independent, identical
trials” model for the battery-making process and extend only as far as that de-
scription continues to make sense. It is important that the experience reported in
the article was gained under (presumably physically stable) regular production,
so there is reason to hope that a single “independent, identical trials” model can
describe both experimental and future process behavior.
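A minimal Python sketch (ours, not the article's) of the Example 17 calculations:

import math
from scipy.stats import norm

n, x = 235, 9
p_hat = x / n                                       # .038

z = norm.ppf(0.975)                                 # 1.96, for 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - z * se, p_hat + z * se)               # about .013 and .063

z_obs = (p_hat - .06) / math.sqrt(.06 * (1 - .06) / n)   # about -1.42
print(norm.cdf(z_obs))                                   # p-value, about .08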
Section 6.1 illustrated the fact that the form of the large-n confidence interval for a mean can be used to guide sample-size choices for estimating µ. The same is true regarding the estimation of p. If one (1) has in mind a desired confidence level, (2) plans to use expression (6.57) or has in mind a worst-case (largest) expectation for p̂(1 − p̂) in expression (6.59), and (3) has a desired precision of estimation of p, it is a simple matter to solve for a corresponding sample size. That is, suppose that the desired confidence level dictates the use of the value z in formula (6.57) and one wants to have confidence limits (or a limit) of the form p̂ ± ∆. Setting

∆ = z (1/(2√n))

and solving for n, one has

n = (z/(2∆))²
Example 17 (continued)

Return to the nicad battery case and suppose that for some reason a better fix on the implications of the new operating conditions was desired. In fact, suppose
that p is to be estimated with a two-sided conservative 95% confidence interval,
and ±.01 (fraction defective) precision of estimation is desired. Then, using the
sample-size relationship just derived, one must set

.01 = 1.96 (1/(2√n))

so that

n ≈ 9,604

is required.
In most engineering contexts this sample size is impractically large. Rethink-
ing the calculation by planning the use of expression (6.59) and adopting the point
of view that, say, 10% is a worst-case expectation for p̂ (and thus .1(1 − .1) = .09
is a worst-case expectation for p̂(1 − p̂)), one might be led instead to set
.01 = 1.96 √((.1)(1 − .1)/n)

Solving for n, one has

n ≈ 3,458
The sample-size conclusions just illustrated are typical, and they justify two important points about the use of qualitative data. First, qualitative data carry less information than corresponding numbers of quantitative data (and therefore usually require very large samples to produce definitive inferences). This makes measurements generally preferable to qualitative observations in engineering applications. Second, if inferences about p based on even large values of n are often disappointing in their precision or reliability, there is little practical motivation to consider small-sample inference for p in a beginning text like this.
For independent large samples, inference for a difference in proportions, p1 − p2, can be based on the difference in sample proportions, p̂1 − p̂2, which has

E(p̂1 − p̂2) = p1 − p2    (6.61)

and

Var(p̂1 − p̂2) = (1)² Var p̂1 + (−1)² Var p̂2 = p1(1 − p1)/n1 + p2(1 − p2)/n2    (6.62)

Then the approximate normality of p̂1 and p̂2 for large sample sizes turns out to imply the approximate normality of the difference p̂1 − p̂2.
Example 16 (continued)

Consider again the turning of steel shafts, and imagine that two different, physically stable lathes produce reworkable shafts at respective rates of 20 and 25%.
Then suppose that samples of (respectively) n 1 = 50 and n 2 = 50 shafts pro-
duced by the machines are taken, and the reworkable sample fractions p̂ 1 and
p̂ 2 are found. Consider approximating the probability that p̂ 1 ≥ p̂ 2 (i.e., that
p̂ 1 − p̂ 2 ≥ 0).
Using expressions (6.61) and (6.62), the variable p̂1 − p̂2 has

E(p̂1 − p̂2) = .20 − .25 = −.05

and

√Var(p̂1 − p̂2) = √((.2)(.8)/50 + (.25)(.75)/50) = .083

So, using the normal approximation,

z = (0 − E(p̂1 − p̂2)) / √Var(p̂1 − p̂2) = (0 − (−.05))/.083 = .60

so that

P[p̂1 − p̂2 ≥ 0] ≈ 1 − Φ(.60) = .27
[Figure: the approximate probability that p̂1 ≥ p̂2]

For large n1 and n2, the variable

Z = (p̂1 − p̂2 − (p1 − p2)) / √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )    (6.63)
is approximately standard normal, and this observation forms the basis for inference concerning p1 − p2. First consider confidence interval estimation for p1 − p2. The familiar argument of Section 6.1 (beginning with the quantity (6.63)) shows that the interval with endpoints

p̂1 − p̂2 ± z √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )    (6.64)

is a large-sample confidence interval for p1 − p2. But since the endpoints (6.64) involve the unknown parameters p1 and p2, they are of no direct practical use. As in the one-sample case, one remedy is to observe that p(1 − p) ≤ 1/4 for any p, so that replacing each pi(1 − pi) by 1/4 produces the conservative limits

p̂1 − p̂2 ± z · (1/2) √(1/n1 + 1/n2)    (6.65)
In addition, in by now familiar fashion, beginning with the fact that for large sample sizes, the modification of the variable (6.63),

Z = (p̂1 − p̂2 − (p1 − p2)) / √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )    (6.66)

is approximately standard normal leads to the conclusion that the interval with endpoints

p̂1 − p̂2 ± z √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )    (6.67)

can be used as a large-sample confidence interval for p1 − p2.
i.e.,

.09 ± .109

i.e., the interval

(−.019, .199)
Significance testing for p1 − p2 typically focuses on the null hypothesis

H0 : p1 − p2 = 0    (6.69)

i.e., the hypothesis that the parameters p1 and p2 are equal. Notice that if p1 = p2 and the common value is denoted as p, expression (6.63) can be rewritten as

Z = (p̂1 − p̂2) / √( p(1 − p)(1/n1 + 1/n2) )    (6.70)
The variable (6.70) cannot serve as a test statistic for the null hypothesis (6.69),
since it involves the unknown hypothesized common value of p1 and p2 . What is
done to modify the variable (6.70) to arrive at a usable test statistic, is to replace p
with a sample-based estimate, obtained by pooling together the two samples. That
is, let
Pooled estimator of a common p:

p̂ = (n1 p̂1 + n2 p̂2) / (n1 + n2)    (6.71)

(p̂ is the total number of items in the two samples with the characteristic of interest divided by the total number of items in the two samples). Then a significance test of hypothesis (6.69) can be carried out using the test statistic

Z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) )    (6.72)
Example 18 (continued)

As further confirmation of the fact that in the pelletizing problem sample fractions of p̂1 = .38 and p̂2 = .29 based on samples of size n1 = n2 = 100 are not com-
pletely convincing evidence of a real difference in process performance for small
and large shot sizes, consider testing H0 : p1 − p2 = 0 with Ha : p1 − p2 6= 0. As
a preliminary step, from expression (6.71),
p̂ = (100(.38) + 100(.29)) / (100 + 100) = 67/200 = .335
Then the five-step summary gives the following:
1. H0 : p1 − p2 = 0.
2. Ha : p1 − p2 6= 0.
3. The test statistic is

Z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) )

The reference distribution is standard normal, and both large and small observed values of z will count as evidence against H0.
4. The samples give

z = (.38 − .29) / √( (.335)(1 − .335)(1/100 + 1/100) ) = 1.35
5. The p-value is P[|a standard normal variable| ≥ 1.35]. That is, the p-value is

Φ(−1.35) + (1 − Φ(1.35)) = .18
The data furnish only fairly weak evidence of a real difference in long-run
fractions of conforming pellets for the two shot sizes.
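A minimal Python sketch (ours, not the book's) of the pooled large-sample test of Example 18:

import math
from scipy.stats import norm

p1, n1 = .38, 100                 # condition 1 sample fraction and size
p2, n2 = .29, 100                 # condition 2 sample fraction and size

p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)                   # .335
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                                         # about 1.35
p_value = 2 * (1 - norm.cdf(abs(z)))                       # about .18
print(z, p_value)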
The kind of results seen in Example 18 may take some getting used to. Even
with sample sizes as large as 100, sample fractions differing by nearly .1 are still
not necessarily conclusive evidence of a difference in p1 and p2 . But this is just
another manifestation of the point that individual qualitative observations carry
disappointingly little information.
A final reminder of the large-sample nature of the methods presented here is in
order. The methods here all rely (for the agreement of nominal and actual confidence levels and the relevance of computed p-values) on the adequacy of large-sample normal approximations, and thus on n1 and n2 being large.
Section 5 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
x̄ − xn+1    (6.73)
that leads to an answer to this question. That is, the random variable in expression
(6.73) has, by the methods of Section 5.5 (Proposition 1 in particular),
E(x̄ − xn+1) = µ − µ = 0    (6.74)

and

Var(x̄ − xn+1) = (1)² Var x̄ + (−1)² Var xn+1 = σ²/n + σ² = σ²(1 + 1/n)    (6.75)
Further, it turns out that the difference (6.73) is normally distributed, so the variable
Z = ((x̄ − xn+1) − 0) / (σ √(1 + 1/n))    (6.76)
is standard normal. And taking one more step, if s 2 is the usual sample variance of
x 1 , x2 , . . . , xn , substituting s for σ in expression (6.76) produces a variable
T = ((x̄ − xn+1) − 0) / (s √(1 + 1/n))    (6.77)

that has a tn−1 distribution. This in turn implies that the interval with endpoints
Normal distribution prediction limits for a single additional observation:

x̄ ± t s √(1 + 1/n)    (6.78)
can be used as a two-sided interval to predict x n+1 and that the probability-based
reliability figure attached to the interval should be the tn−1 probability assigned to
the interval from −t to t. The interval (6.78) is called a prediction interval with
associated confidence the tn−1 probability assigned to the interval from −t to t. In
general, the language indicated in Definition 17 will be used.
It is the fact that a finite sample gives only a somewhat clouded picture of a
distribution that prevents the making of a normal distribution prediction interval
from being a trivial matter of probability calculations like those in Section 5.2. That
is, suppose there were enough data to “know” the mean, µ, and variance, σ 2 , of
a normal distribution. Then, since 1.96 is the .975 standard normal quantile, the
interval with endpoints

µ − 1.96σ and µ + 1.96σ    (6.79)

has a 95% chance of bracketing the next value generated by the distribution. The fact
that (when based only on small samples), the knowledge of µ and σ is noisy forces
expression (6.79) to be abandoned for an interval like (6.78). It is thus comforting that
for large n and 95% confidence, formula (6.78) produces an interval with endpoints
approximating
√ those in display (6.79). That is, for large n and 95% confidence,
t ≈ 1.96, 1 + (1/n) ≈ 1, and one expects that typically x̄ ≈ µ and s ≈ σ , so that
expressions (6.78) and (6.79) will essentially agree. The beauty of expression (6.78)
is that it allows in a rational fashion for the uncertainties involved in the µ ≈ x̄ and
σ ≈ s approximations.
i.e., (for 90% confidence, using the t9 quantile 1.833 as before)

104.7 × 10³ cycles and 231.9 × 10³ cycles    (6.80)

The interval indicated by display (6.80) is not at all the same as the confidence interval for µ found in Example 8. The limits of 149.1 × 10³ cycles and 187.5 × 10³ cycles found on page 367 apply to the mean spring lifetime, µ, not to an additional observation x11 as the ones in display (6.80) do.
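A minimal Python sketch (ours, not the book's) of 90% two-sided prediction limits for a single additional spring lifetime, from the Table 6.4 summary statistics:

import math
from scipy import stats

n, xbar, s = 10, 168.3, 33.1
t = stats.t.ppf(0.95, df=n - 1)              # 1.833, for 90% two-sided limits
half = t * s * math.sqrt(1 + 1 / n)          # about 63.6
print(xbar - half, xbar + half)              # about 104.7 and 231.9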
i.e.,

3.164 g    (6.81)
Table 6.10
Weights of 100 Newly Minted U.S. Pennies

Weight (g)   Frequency     Weight (g)   Frequency
2.99          1            3.11         24
3.01          4            3.13         17
3.03          4            3.15         13
3.05          4            3.17          6
3.07          7            3.19          2
3.09         17            3.21          1
Example 20 (continued)

[Figure: normal plot of the penny weights of Table 6.10]
This example illustrates at least two important points. First, the two-sided
prediction limits in display (6.78) can be modified to get a one-sided limit exactly
as two-sided confidence limits can be modified to get a one-sided limit. Second,
the calculation represented by the result (6.81) is, because n = 100 is a fairly
large sample size, only marginally different from what one would get assuming
µ = 3.108 g exactly and σ = .043 g exactly. That is, since the .9 normal quantile
is 1.282, "knowing" µ and σ leads to an upper prediction limit of

3.108 + 1.282(.043)

i.e.,

3.163 g    (6.82)
The fact that the result (6.81) is slightly larger than the final result in display
(6.82) reflects the small uncertainty involved in the use of x̄ in place of µ and s
in place of σ .
The name "prediction interval" probably has some suggested meanings that should be dismissed before going any further. Prediction suggests the future and
thus potentially different conditions. But no such meaning should be associated
with statistical prediction intervals. The assumption behind formula (6.78) is that
x1, x2, . . . , xn and xn+1 are all generated according to the same underlying distribu-
tion. If (for example, because of potential physical changes in a system during a time
lapse between the generation of x1, x2, . . . , xn and the generation of xn+1) no single
stable process model for the generation of all n + 1 observations is appropriate, then
neither is formula (6.78). Statistical inference is not a crystal ball for foretelling an
erratic and patternless future. It is rather a methodology for quantifying the extent
of knowledge about a pattern of variation existing in a consistent present. It has
implications in other times and at other places only if that same pattern of variation
can be expected to repeat itself in those conditions.
However, there is no guarantee on this probability nor any way to determine it. In
particular, it is not necessarily .9 (the confidence level associated with the prediction
interval). That is, there is no practical way to employ probability to describe the
likely effectiveness of a numerical prediction interval. One is thus left with the
interpretation of confidence of prediction given in Definition 18.
Definition 18 (Interpretation of a Prediction Interval)

To say that a numerical interval (a, b) is (for example) a 90% prediction interval
for an additional observation xn+1 is to say that in obtaining it, methods of
data collection and calculation have been applied that would produce intervals
bracketing an (n + 1)th observation in about 90% of repeated applications of
the entire process of (1) selecting the sample x1, . . . , xn, (2) calculating an
interval, and (3) generating a single additional observation xn+1. Whether or
not xn+1 will fall into the numerical interval (a, b) is not known, and although
there is some probability associated with that eventuality, it is not possible to
evaluate it. And in particular, it need not be 90%.
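The “repeated applications” language of Definition 18 can be made concrete by simulation. The sketch below (the sample size, confidence level, and normal population are illustrative choices, loosely patterned on the spring-lifetime numbers) repeatedly makes the interval (6.78) and checks how often an (n + 1)th draw is bracketed; the long-run relative frequency should be close to the nominal 90%:

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n, conf, trials = 10, 0.90, 50_000
t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)

hits = 0
for _ in range(trials):
    draws = rng.normal(loc=168.3, scale=33.1, size=n + 1)
    sample, x_new = draws[:n], draws[n]          # x_1..x_n and the (n+1)th value
    half = t * sample.std(ddof=1) * np.sqrt(1 + 1 / n)
    hits += abs(x_new - sample.mean()) <= half   # did the interval bracket x_new?

print(hits / trials)   # relative frequency of bracketing; close to .90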
and

Another one-sided normal tolerance interval

(x̄ − τ1s, ∞)    (6.85)
Example 19 (continued)

Consider making a two-sided 95% tolerance interval for 90% of additional spring
lifetimes based on the data of Table 6.4. As earlier, for these data, x̄ = 168.3
(×10³ cycles) and s = 33.1 (×10³ cycles). Then consulting Table B.7A, since
n = 10, τ2 = 2.856 is appropriate for use in expression (6.83). That is, two-sided
95% tolerance limits for 90% of additional spring lifetimes are
168.3 ± 2.856(33.1)

i.e.,

73.8 × 10³ cycles and 262.8 × 10³ cycles    (6.86)
It is obvious from comparing displays (6.80) and (6.86) that the effect of moving
from the prediction of a single additional spring lifetime to attempting to bracket
most of a large number of additional lifetimes is to increase the size of the
declared interval.
Example 20 (continued)

Consider again the new penny weights given in Table 6.10 and now the problem of
making a one-sided 95% tolerance interval of the form (−∞, #) for the weights of
90% of additional pennies. Remembering that for the penny weights, x̄ = 3.108 g
and s = .043 g, and using Table B.7B for n = 100, the desired upper tolerance
bound for 90% of the penny weights is

3.108 + 1.527(.043) = 3.174 g
As expected, this is larger (more conservative) than the value of 3.164 g given in
display (6.81) as a one-sided 90% prediction limit for a single additional penny
weight.
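Assuming the tabled factors are the usual normal-theory ones, the one-sided entries of Table B.7B can be reproduced from the noncentral t distribution. A minimal sketch (scipy's nct is the assumed tool here):

import numpy as np
from scipy import stats

def one_sided_tolerance_factor(n, p, conf):
    """Factor tau_1 so that x_bar + tau_1 * s is a conf-level upper
    tolerance bound for a fraction p of a normal distribution."""
    zp = stats.norm.ppf(p)   # standard normal p quantile
    return stats.nct.ppf(conf, df=n - 1, nc=zp * np.sqrt(n)) / np.sqrt(n)

tau1 = one_sided_tolerance_factor(n=100, p=0.90, conf=0.95)
print(3.108 + tau1 * 0.043)   # compare with the penny-weight bound computed above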
The correct interpretation of the confidence level for a tolerance interval should
be fairly easy to grasp. Prior to the generation of x1, x2, . . . , xn, planned use of
expression (6.83), (6.84), or (6.85) gives a guaranteed probability of success in
bracketing a fraction of at least p of the underlying distribution. But after observing
x1, . . . , xn and making a numerical interval, it is impossible to know whether the
attempt has or has not been successful. Thus the following interpretation:
Definition 20 (Interpretation of a Tolerance Interval)

To say that a numerical interval (a, b) is (for example) a 90% tolerance in-
terval for a fraction p of an underlying distribution is to say that in obtaining
it, methods of data collection and calculation have been applied that would
produce intervals bracketing a fraction of at least p of the underlying distri-
bution in about 90% of repeated applications (of generation of x1, . . . , xn and
subsequent calculation). Whether or not the numerical interval (a, b) actually
contains at least a fraction p is unknown and not describable in terms of a
probability.
And using expression (6.83) to make, for example, a 95% tolerance interval for
99% of additional log discovery times produces endpoints
2.46 ± 3.355(.68)

i.e.,

.18 and 4.74    (6.88)
Then the intervals specified in displays (6.87) and (6.88) for log discovery times
have, via exponentiation, their counterparts for raw discovery times. That is,
exponentiation of the values in display (6.87) gives a 99% prediction interval for
another discovery time, and exponentiation of the values in display (6.88) gives
a 95% tolerance interval for 99% of additional discovery times, from e^.18 ≈ 1.2
to e^4.74 ≈ 114.6.
Interval based on the sample maximum

(−∞, max(x1, . . . , xn))    (6.89)

and

Interval based on the sample minimum

(min(x1, . . . , xn), ∞)    (6.90)

and

Interval based on the sample minimum and maximum

(min(x1, . . . , xn), max(x1, . . . , xn))    (6.91)

When interval (6.89) or (6.90) is used as a one-sided prediction interval for a single additional observation, the associated confidence level is

Prediction confidence for a one-sided interval

One-sided prediction confidence level = n/(n + 1)    (6.92)

and when interval (6.91) is used as a two-sided prediction interval for a single additional observation, the associated confidence level is

Prediction confidence for a two-sided interval

Two-sided prediction confidence level = (n − 1)/(n + 1)    (6.93)
The confidence levels for intervals (6.89), (6.90), and (6.91) as tolerance in-
tervals must of necessity involve p, the fraction of the underlying distribution one
hopes to bracket. The fact is that using interval (6.89) or (6.90) as a one-sided toler-
ance interval for a fraction p of an underlying distribution, the associated confidence
level is

Confidence level for a one-sided tolerance interval

One-sided confidence level = 1 − p^n    (6.94)

and using interval (6.91) as a two-sided tolerance interval for a fraction p of an underlying distribution, the associated confidence level is

Confidence level for a two-sided tolerance interval

Two-sided confidence level = 1 − np^(n−1) + (n − 1)p^n    (6.95)
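The confidence levels (6.92) through (6.95) involve only simple arithmetic, so a short sketch suffices (the function names are illustrative; (6.95) is used as reconstructed above):

def pred_conf_one_sided(n):        # display (6.92)
    return n / (n + 1)

def pred_conf_two_sided(n):        # display (6.93)
    return (n - 1) / (n + 1)

def tol_conf_one_sided(n, p):      # display (6.94)
    return 1 - p ** n

def tol_conf_two_sided(n, p):      # display (6.95), as reconstructed above
    return 1 - n * p ** (n - 1) + (n - 1) * p ** n

print(pred_conf_two_sided(10))        # spring lifetimes: 9/11, about .82
print(tol_conf_one_sided(100, .99))   # penny weights: about .63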
Example 19 (continued)

Return one more time to the spring-life scenario, and consider the use of interval
(6.91) as first a prediction interval and then a tolerance interval for 90% of
additional spring lifetimes. Notice in Table 6.4 (page 366) that the smallest and
largest of the observed spring lifetimes are, respectively,

min(x1, . . . , x10) = 117 (×10³ cycles)

and

max(x1, . . . , x10) = 225 (×10³ cycles)

so the numerical interval under consideration is the one with endpoints 117
(×10³ cycles) and 225 (×10³ cycles).
Then expression (6.93) means that this interval can be used as a prediction
interval with

Prediction confidence = (10 − 1)/(10 + 1) = 9/11 ≈ 82%

And expression (6.95) shows that as a tolerance interval for 90% of additional
spring lifetimes, the same interval carries

Tolerance confidence = 1 − 10(.9)^9 + 9(.9)^10 ≈ .26, i.e., about 26%
Example 20 (continued)

Looking for a final time at the penny weight data in Table 6.10, consider the use
of interval (6.89) as first a prediction interval and then a tolerance interval for
99% of additional penny weights. Notice that in Table 6.10, the largest of the
n = 100 weights is 3.21 g, so

max(x1, . . . , x100) = 3.21 g

Then expression (6.92) says that when used as an upper prediction limit for a
single additional penny weight, the prediction confidence associated with 3.21 g is

Prediction confidence = 100/(100 + 1) ≈ 99%

And expression (6.94) shows that as a tolerance interval for 99% of many addi-
tional penny weights, the interval (−∞, 3.21) has associated confidence

1 − (.99)^100 ≈ .63, i.e., 63%
A little experience with formulas (6.92), (6.93), (6.94), and (6.95) will convince
the reader that the intervals (6.89), (6.90), and (6.91) often carry disappointingly
small confidence coefficients. Usually (but not always), you can do better in terms
of high confidence and short intervals if (possibly after transformation) the normal
distribution methods discussed earlier can be applied. But the beauty of intervals
(6.89), (6.90), and (6.91) is that they are both widely applicable (in even nonnormal
contexts) and extremely simple.
Prediction and tolerance interval methods are very useful engineering tools.
Historically, they probably haven’t been used as much as they should be for lack of
accessible textbook material on the methods. We hope the reader is now aware of the
existence of the methods as the appropriate form of formal inference when the focus
is on individual values generated by a process rather than on process parameters.
When the few particular methods discussed here don’t prove adequate for practical
purposes, the reader should look into the topic further, beginning with the book by
Hahn and Meeker mentioned earlier.
Section 6 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Confidence, prediction, and tolerance intervals are all intended to do different jobs. What are these jobs? Consider the differing situations of an official of the EPA, a consumer about to purchase a single car, and a design engineer trying to equip a certain model with a gas tank large enough that most cars produced will have highway cruising ranges of at least 350 miles. Argue that depending on the point of view adopted, a lower confidence bound for a mean mileage, a lower prediction bound for an individual mileage, or a lower tolerance bound for most mileages would be of interest.

2. The 900 N/mm² stress spring lifetime data in Table 6.7 used in Example 8 have a fairly linear normal plot.
(a) Make a two-sided 90% prediction interval for an additional spring lifetime under this stress.
(b) Make a two-sided 95% tolerance interval for 90% of all spring lifetimes under this stress.
(c) How do the intervals from (a) and (b) compare? (Consider both size and interpretation.)
(d) There is a two-sided 90% confidence interval for the mean spring lifetime under this stress given in Example 8. How do your intervals from (a) and (b) compare to the interval in Example 8? (Consider both size and interpretation.)
(e) Make a 90% lower prediction bound for an additional spring lifetime under this stress.
(f) Make a 95% lower tolerance bound for 90% of all spring lifetimes under this stress.

3. The natural logarithms of the aluminum contents discussed in Exercise 2 of Chapter 3 have a reasonably bell-shaped relative frequency distribution. Further, these 26 log aluminum contents have sample mean 4.9 and sample standard deviation .59. Use this information to respond to the following:
(a) Give a two-sided 99% tolerance interval for 90% of additional log aluminum contents at the Rutgers recycling facility. Then translate this interval into a 99% tolerance interval for 90% of additional raw aluminum contents.
(b) Make a 90% prediction interval for one additional log aluminum content and translate it into a prediction interval for a single additional aluminum content.
(c) How do the intervals from (a) and (b) compare?

4. Again in the context of Chapter Exercise 2 of Chapter 3, if the interval from 30 ppm to 511 ppm is used as a prediction interval for a single additional aluminum content measurement from the study period, what associated prediction confidence level can be stated? What confidence can be associated with this interval as a tolerance interval for 90% of all such aluminum content measurements?
Chapter 6 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Consider the breaking strength data of Table 3.6. Notice that the normal plot of these data given as Figure 3.18 is reasonably linear. It may thus be sensible to suppose that breaking strengths for generic towel of this type (as measured by the students) are adequately modeled as normal. Under this assumption,
(a) Make and interpret 95% two-sided and one-sided confidence intervals for the mean breaking strength of generic towels (make a one-sided interval of the form (#, ∞)).
(b) Make and interpret 95% two-sided and one-sided prediction intervals for a single additional generic towel breaking strength (for the one-sided interval, give the lower prediction bound).
(c) Make and interpret 95% two-sided and one-sided tolerance intervals for 99% of generic towel breaking strengths (for the one-sided interval, give the lower tolerance bound).
(d) Make and interpret 95% two-sided and one-sided confidence intervals for σ, the standard deviation of generic towel breaking strengths.
(e) Put yourself in the position of a quality control inspector, concerned that the mean breaking strength not fall under 9,500 g. Assess the strength of the evidence in the data that the mean generic towel strength is in fact below the 9,500 g target. (Show the whole five-step significance-testing format.)
(f) Now put yourself in the place of a quality control inspector concerned that the breaking strength be reasonably consistent—i.e., that σ be small. Suppose in fact it is desirable that σ be no more than 400 g. Use the significance-testing format and assess the strength of the evidence given in the data that in fact σ exceeds the target standard deviation.

2. Consider the situation of Example 1 in Chapter 1.
(a) Use the five-step significance-testing format to assess the strength of the evidence collected in this study to the effect that the laying method is superior to the hanging method in terms of mean runouts produced.
(b) Make and interpret 90% two-sided and one-sided confidence intervals for the improvement in mean runout produced by the laying method over the hanging method (for the one-sided interval, give a lower bound for µhung − µlaid).
(c) Make and interpret a 90% two-sided confidence interval for the mean runout for laid gears.
(d) What is it about Figure 1.1 that makes it questionable whether “normal distribution” prediction and tolerance interval formulas ought to be used to describe runouts for laid gears? Suppose instead that you used the methods of Section 6.6.3 to make prediction and tolerance intervals for laid gear runouts. What confidence could be associated with the largest observed laid runout as an upper prediction bound for a single additional laid runout? What confidence could be associated with the largest observed laid runout as an upper tolerance bound for 95% of additional laid gear runouts?

3. Consider the situation of Example 1 in Chapter 4. In particular, limit attention to those densities obtained under the 2,000 and 4,000 psi pressures. (One can view the six corresponding densities as two samples of size n1 = n2 = 3.)
(a) Assess the strength of the evidence that increasing pressure increases the mean density of the resulting cylinders. Use the five-step significance-testing format.
(b) Give a 99% lower confidence bound for the increase in mean density associated with the change from 2,000 to 4,000 psi conditions.
(c) Assess the strength of the evidence (in the six density values) that the variability in density differs for the 2,000 and 4,000 psi conditions (i.e., that σ2,000 ≠ σ4,000).
(d) Give a 90% two-sided confidence interval for the ratio of density standard deviations for the two pressures.
(e) What model assumptions stand behind the formal inferences you made in parts (a) through (d) above?

4. Simple counting with the data of Chapter Exercise 2 in Chapter 3 shows that 18 out of the 26 PET samples had aluminum contents above 100 ppm. Give a two-sided approximate 95% confidence interval for the fraction of all such samples with aluminum contents above 100 ppm.

5. Losen, Cahoy, and Lewis measured the lengths of some spanner bushings of a particular type purchased from a local machine supply shop. The lengths obtained by one of the students were as follows (the units are inches):

1.1375, 1.1390, 1.1420, 1.1430, 1.1410, 1.1360,
1.1395, 1.1380, 1.1350, 1.1370, 1.1345, 1.1340,
1.1405, 1.1340, 1.1380, 1.1355

(a) If you were to, for example, make a confidence interval for the population mean measured length of these bushings via the formulas in Section 6.3, what model assumption must you employ? Make a probability plot to assess the reasonableness of the assumption.
(b) Make a 90% two-sided confidence interval for the mean measured length for bushings of this type measured by this student.
(c) Give an upper bound for the mean length with 90% associated confidence.
(d) Make a 90% two-sided prediction interval for a single additional measured bushing length.
(e) Make a 95% two-sided tolerance interval for 99% of additional measured bushing lengths.
(f) Consider the statistical interval derived from the minimum and maximum sample values—namely, (1.1340, 1.1430). What confidence level should be associated with this interval as a prediction interval for a single additional bushing length? What confidence level should be associated with this interval as a tolerance interval for 99% of additional bushing lengths?

6. The study mentioned in Exercise 5 also included measurement of the outside diameters of the 16 bushings. Two of the students measured each of the bushings, with the results given here.

Bushing      1      2      3      4      5      6      7      8
Student A  .3690  .3690  .3690  .3700  .3695  .3700  .3695  .3690
Student B  .3690  .3695  .3695  .3695  .3695  .3700  .3700  .3690
mean and for the standard deviation of the Brand B stretch distribution.
(d) Compare the Brand B and Brand D standard deviations of stretch using an appropriate 90% two-sided confidence interval.
(e) Compare the Brand B and Brand D mean stretch values using an appropriate 90% two-sided confidence interval. Does this interval give clear indication of a difference in mean stretch values for the two brands?
(f) Carry out a formal significance test of the hypothesis that the two brands have the same mean stretch values (use a two-sided alternative hypothesis). Does the conclusion you reach here agree with your answer to part (e)?

11. The accompanying data are n = 10 daily measurements of the purity (in percent) of oxygen being delivered by a certain industrial air products supplier. (These data are similar to some given in a November 1990 article in Chemical Engineering Progress and used in Chapter Exercise 10 of Chapter 3.)

99.77  99.66  99.61  99.59  99.55
99.64  99.53  99.68  99.49  99.58

(a) Make a normal plot of these data. What does the normal plot reveal about the shape of the purity distribution? (“It is not bell-shaped” is not an adequate answer. Say how its shape departs from the normal shape.)
(b) What statistical “problems” are caused by lack of a normal distribution shape for data such as these?
As a way to deal with problems like those from part (b), you might try transforming the original data. Next are values of y′ = ln(y − 99.3) corresponding to each of the original data values y, and some summary statistics for the transformed values.

−.76  −1.02  −1.17  −1.24  −1.39
−1.08  −1.47  −.97  −1.66  −1.27

ȳ′ = −1.203 and sy′ = .263

(c) Make a normal plot of the transformed values and verify that it is very linear.
(d) Make a 95% two-sided prediction interval for the next transformed purity delivered by this supplier. What does this “untransform” to in terms of raw purity?
(e) Make a 99% two-sided tolerance interval for 95% of additional transformed purities from this supplier. What does this “untransform” to in terms of raw purity?
(f) Suppose that the air products supplier advertises a median purity of at least 99.5%. This corresponds to a median (and therefore mean) transformed value of at least −1.61. Test the supplier’s claim (H0: µy′ = −1.61) against the possibility that the purity is substandard. Show and carefully label all five steps.

12. Chapter Exercise 6 of Chapter 3 contains a data set on the lifetimes (in numbers of 24 mm deep holes drilled in 1045 steel before tool failure) of 12 D952-II (8 mm) drills. The data there have mean ȳ = 117.75 and s = 51.1 holes drilled. Suppose that a normal distribution can be used to roughly describe drill lifetimes.
(a) Give a 90% lower confidence bound for the mean lifetime of drills of this type in this kind of industrial application.
(b) Based on your answer to (a), do you think a hypothesis test of H0: µ = 100 versus Ha: µ > 100 would have a large p-value or a small p-value? Explain.
(c) Give a 90% lower prediction bound for the next life length of a drill of this type in this kind of industrial application.
(d) Give two-sided tolerance limits with 95% confidence for 90% of all life lengths for drills of this type in this kind of industrial application.
(e) Give two-sided 90% confidence limits for the standard deviation of life lengths for drills of this type in this kind of industrial application.

13. M. Murphy recorded the mileages he obtained while commuting to school in his nine-year-old economy car. He kept track of the mileage for ten
different tankfuls of fuel, involving gasoline of two different octanes. His data follow.

87 Octane: 26.43, 27.61, 28.71, 28.94, 29.30
90 Octane: 30.57, 30.91, 31.21, 31.77, 32.86

(a) Make normal plots for these two samples of size 5 on the same set of axes. Does the “equal variances, normal distributions” model appear reasonable for describing this situation?
(b) Find sP for these data. What is this quantity measuring in the present context?
(c) Give a 95% two-sided confidence interval for the difference in mean mileages obtainable under these circumstances using the fuels of the two different octanes. From the nature of this confidence interval, would you expect to find a large p-value or a small p-value when testing H0: µ87 = µ90 versus Ha: µ87 ≠ µ90?
(d) Conduct a significance test of H0: µ87 = µ90 against the alternative that the higher-octane gasoline provides a higher mean mileage.
(e) Give 95% lower prediction bounds for the next mileages experienced, using first 87 octane fuel and then 90 octane fuel.
(f) Give 95% lower tolerance bounds for 95% of additional mileages experienced, using first 87 octane fuel and then 90 octane fuel.

14. Eastman, Frye, and Schnepf worked with a company that mass-produces plastic bags. They focused on start-up problems of a particular machine that could be operated at either a high speed or a low speed. One part of the data they collected consisted of counts of faulty bags produced in the first 250 manufactured after changing a roll of plastic feedstock. The counts they obtained for both low- and high-speed operation of the machine were 147 faulty (p̂H = 147/250) under high-speed operation and 12 faulty under low-speed operation (p̂L = 12/250). Suppose that it is sensible to think of the machine as operating in a physically stable fashion during the production of the first 250 bags after changing a roll of plastic, with a constant probability (pH or pL) of any particular bag produced being faulty.
(a) Give a 95% upper confidence bound for pH.
(b) Give a 95% upper confidence bound for pL.
(c) Compare pH and pL using an appropriate two-sided 95% confidence interval. Does this interval provide a clear indication of a difference in the effectiveness of the machine at start-up when run at the two speeds? What kind of a p-value (big or small) would you expect to find in a test of H0: pH = pL versus Ha: pH ≠ pL?
(d) Use the five-step format and test H0: pH = pL versus Ha: pH ≠ pL.

15. Hamilton, Seavey, and Stucker measured resistances, diameters, and lengths for seven copper wires at two different temperatures and used these to compute experimental resistivities for copper at these two temperatures. Their data follow. The units are 10⁻⁸ Ω·m.

Wire   0.0°C   21.8°C
1      1.52    1.72
2      1.44    1.56
3      1.52    1.68
4      1.52    1.64
5      1.56    1.69
6      1.49    1.71
7      1.56    1.72

(a) Suppose that primary interest here centers on the difference between resistivities at the two different temperatures. Make a normal plot of the seven observed differences. Does it appear that a normal distribution description of the observed difference in resistivities at these two temperatures is plausible?
(b) Give a 90% two-sided confidence interval for the mean difference in resistivity measurements for copper wire of this type at 21.8°C and 0.0°C.
(c) Give a 90% two-sided prediction interval for an additional difference in resistivity measurements for copper wire of this type at 21.8°C and 0.0°C.

16. The students referred to in Exercise 15 also measured the resistivities for seven aluminum wires at the same temperatures. The 21.8°C measurements that they obtained follow:

2.65, 2.83, 2.69, 2.73, 2.53, 2.65, 2.69

(a) Give a 99% two-sided confidence interval for the mean resistivity value derived from such experimental determinations.
(b) Give a 95% two-sided prediction interval for the next resistivity value that would be derived from such an experimental determination.
(c) Give a 95% two-sided tolerance interval for 99% of resistivity values derived from such experimental determinations.
(d) Give a 95% two-sided confidence interval for the standard deviation of resistivity values derived from such experimental determinations.
(e) How strong is the evidence that there is a real difference in the precisions with which the aluminum resistivities and the copper resistivities can be measured at 21.8°C? (Carry out a significance test of H0: σcopper = σaluminum versus Ha: σcopper ≠ σaluminum using the data of this problem and the 21.8°C data of Exercise 15.)
(f) Again using the data of this exercise and Exercise 15, give a 90% two-sided confidence interval for the ratio σcopper/σaluminum.

17. (The Stein Two-Stage Estimation Procedure) One of the most common of all questions faced by engineers planning a data-based study is how much data to collect. The last part of Example 3 illustrates a rather crude method of producing an answer to the sample-size question when estimation of a single mean is involved. In fact, in such circumstances, a more careful two-stage procedure due to Charles Stein can sometimes be used to find appropriate sample sizes.
Suppose that one wishes to use an interval of the form x̄ ± Δ with a particular confidence coefficient to estimate the mean µ of a normal distribution. If it is desirable to have Δ ≤ # for some number # and one can collect data in two stages, it is possible to choose an overall sample size to satisfy these criteria as follows. After taking a small or moderate initial sample of size n1 (n1 must be at least 2 and is typically at least 4 or 5), one computes the sample standard deviation of the initial data—say, s1. Then if t is the appropriate tn1−1 distribution quantile for producing the desired (one- or two-sided) confidence, it is necessary to find the smallest integer n such that

n ≥ (ts1/#)²

If this integer is larger than n1, then n2 = n − n1 additional observations are taken. (Otherwise, n2 = 0.) Finally, with x̄ the sample mean of all the observations (from both the initial and any subsequent sample), the formula x̄ ± ts1/√(n1 + n2) (with t still based on n1 − 1 degrees of freedom) is used to estimate µ. (A minimal computational sketch of this two-stage procedure appears after Exercise 18.)
Suppose that in estimating the mean resistance of a production run of resistors, it is desirable to have the two-sided confidence level be 95% and the “± part” of the interval no longer than .5 Ω.
(a) If an initial sample of n1 = 5 resistors produces a sample standard deviation of 1.27 Ω, how many (if any) additional resistors should be sampled in order to meet the stated goals?
(b) If all of the n1 + n2 resistors taken together produce the sample mean x̄ = 102.8 Ω, what confidence interval for µ should be declared?

18. Example 15 of Chapter 5 concerns some data on service times at a residence hall depot counter. The data portrayed in Figure 5.21 are decidedly nonnormal-looking, so prediction and tolerance interval formulas based on normal distributions are not appropriate for use with these data. However, the largest of the n = 65 observed service times in that figure is 87 sec.
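A minimal sketch of the two-stage arithmetic described in Exercise 17 (the function name is illustrative, and scipy supplies the t quantile):

import math
from scipy import stats

def stein_additional_n(n1, s1, target, conf=0.95, two_sided=True):
    """Smallest integer n with n >= (t * s1 / target)^2, with t based on
    n1 - 1 degrees of freedom; returns n2 = max(0, n - n1)."""
    q = 1 - (1 - conf) / 2 if two_sided else conf
    t = stats.t.ppf(q, df=n1 - 1)
    n = math.ceil((t * s1 / target) ** 2)
    return max(0, n - n1)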
(iii) Give a 95% two-sided tolerance interval for 90% of additional flatness distortion values.
(iv) Give a 90% two-sided confidence interval for the standard deviation of flatness distortion values for gears of this type.
(d) Repeat parts (b) and (c) using the improved settings’ concentricity values, y2, instead of flatness.
(e) Explain why it is not possible to base formal inferences (tests and confidence intervals), for comparing the standard deviations of the y1 and y2 distributions for the improved process settings, on the sample standard deviations of the y1 and y2 measurements from gears 1A through 10A.
(f) What assumptions are necessary in order to make comparisons between parameters of the y1 (or y2) distributions for the original and improved settings of the process variables?
(g) Make normal plots of the y1 data for the original settings and for the improved settings on the same set of axes. Does an “equal variances, normal distributions” model appear tenable here? Explain.
(h) Supposing that the flatness distortion distributions for the original and improved process settings are adequately described as normal with a common standard deviation, do the following.
(i) Use an appropriate significance test to assess the strength of the evidence in the data to the effect that the improved settings produce a reduction in mean flatness distortion.
(ii) Give a 90% lower confidence bound on the reduction in mean flatness distortion provided by the improved process settings.
(i) Repeat parts (g) and (h) using the y2 values and concentricity instead of flatness.

21. R. Behne measured air pressure in car tires in a student parking lot. Shown here is one summary of the data he reported. Any tire with pressure reading more than 3 psi below its recommended value was considered underinflated, while any tire with pressure reading more than 3 psi above its recommended value was considered overinflated. The counts in the accompanying table are the numbers of cars (out of 25 checked) falling into the four possible categories.

                                  Underinflated tires
                                  None    At Least One Tire
Overinflated   None                 6             5
tires          At Least One Tire   10             4

(a) Behne’s sample was in all likelihood a convenience sample (as opposed to a genuinely simple random sample) of the cars in the large lot. Does it make sense to argue in this case that the data can be treated as if the sample were a simple random sample? On what basis? Explain.
(b) Give a two-sided 90% confidence interval for the fraction of all cars in the lot with at least one underinflated tire.
(c) Give a two-sided 90% confidence interval for the fraction of the cars in the lot with at least one overinflated tire.
(d) Give a 90% lower confidence bound on the fraction of cars in the lot with at least one misinflated tire.
(e) Why can’t the data here be used with formula (6.67) of Section 6.5 to make a confidence interval for the difference in the fraction of cars with at least one underinflated tire and the fraction with at least one overinflated tire?

22. The article “A Recursive Partitioning Method for the Selection of Quality Assurance Tests” by Raz and Bousum (Quality Engineering, 1990) contains some data on the fractions of torque converters manufactured in a particular facility failing a final inspection (and thus requiring some rework). For a particular family of four-element converters, about 39% of 442 converters tested were out of specifications on a high-speed operation inlet flow test.
(a) If plant conditions tomorrow are like those under which the 442 converters were manufactured, give a two-sided 98% confidence interval for the probability that a given converter manufactured will fail the high-speed inlet flow test.
(b) Suppose that a process change is instituted in an effort to reduce the fraction of converters failing the high-speed inlet flow test. If only 32 out of the first 100 converters manufactured fail the high-speed inlet flow test, is this convincing evidence that a real process improvement has been accomplished? (Give and interpret a 90% two-sided confidence interval for the change in test failure probability.)

23. Return to the situation of Chapter Exercise 1 in Chapter 3 and the measured gains of 120 amplifiers. The nominal/design value of the gain was 10.0 dB; 16 of the 120 amplifiers measured had gains above nominal. Give a 95% two-sided confidence interval for the fraction of all such amplifiers with above-nominal gains.

24. The article “Multi-functional Pneumatic Gripper Operating Under Constant Input Actuation Air Pressure” by J. Przybyl (Journal of Engineering Technology, 1988) discusses the performance of a 6-digit pneumatic robotic gripper. One part of the article concerns the gripping pressure (measured by manometers) delivered to objects of different shapes for fixed input air pressures. The data given here are the measurements (in psi) reported for an actuation pressure of 40 psi for (respectively) a 1.7 in. × 1.5 in. × 3.5 in. rectangular bar and a circular bar of radius 1.0 in. and length 3.5 in.

Rectangular Bar   Circular Bar
76                84
82                87
85                94
88                80
82                92

(a) Compare the variabilities of the gripping pressures delivered to the two different objects using an appropriate 98% two-sided confidence interval. Does there appear to be much evidence in the data of a difference between these? Explain.
(b) Supposing that the variabilities of gripping pressure delivered by the gripper to the two different objects are comparable, give a 95% two-sided confidence interval for the difference in mean gripping pressures delivered.
(c) The data here came from the operation of a single prototype gripper. Why would you expect to see more variation in measured gripping pressures than that represented here if each measurement in a sample were made on a different gripper? Strictly speaking, to what do the inferences in (a) and (b) apply? To the single prototype gripper or to all grippers of this design? Discuss this issue.

25. A sample of 95 U-bolts produced by a small company has thread lengths with a mean of x̄ = 10.1 (.001 in. above nominal) and s = 3.2 (.001 in.).
(a) Give a 95% two-sided confidence interval for the mean thread length (measured in .001 in. above nominal). Judging from this interval, would you expect a small or a large p-value when testing H0: µ = 0 versus Ha: µ ≠ 0? Explain.
(b) Use the five-step format of Section 6.2 and assess the strength of the evidence provided by the data to the effect that the population mean thread length exceeds nominal.

26. D. Kim did some crude tensile strength testing on pieces of some nominally .012 in. diameter wire of various lengths. Below are Kim’s measured strengths (kg) for pieces of wire of lengths 25 cm and 30 cm.

25 cm Lengths             30 cm Lengths
4.00, 4.65, 4.70, 4.50    4.10, 4.50, 3.80, 4.60
4.40, 4.50, 4.50, 4.20    4.20, 4.60, 4.60, 3.90
(a) If one is to make a confidence interval for the mean measured strength of 25 cm pieces of this wire using the methods of Section 6.3, what model assumption must be employed? Make a probability plot useful in assessing the reasonableness of the assumption.
(b) Make a 95% two-sided confidence interval for the mean measured strength of 25 cm pieces of this wire.
(c) Give a 95% lower confidence bound for the mean measured strength of 25 cm pieces.
(d) Make a 95% two-sided prediction interval for a single additional measured strength for a 25 cm piece of wire.
(e) Make a 99% two-sided tolerance interval for 95% of additional measured strengths of 25 cm pieces of this wire.
(f) Consider the statistical interval derived from the minimum and maximum sample values for the 25 cm lengths—namely, (4.00, 4.70). What confidence should be associated with this interval as a prediction interval for a single additional measured strength? What confidence should be associated with this interval as a tolerance interval for 95% of additional measured strengths for 25 cm pieces of this wire?
(g) In order to make formal inferences about µ25 − µ30 based on these data, what must you be willing to use for model assumptions? Make a plot useful for investigating the reasonableness of those assumptions.
(h) Proceed under the assumptions discussed in part (g) and assess the strength of the evidence provided by Kim’s data to the effect that an increase in specimen length produces a decrease in measured strength.
(i) Proceed under the necessary model assumptions to give a 98% two-sided confidence interval for µ25 − µ30.

27. The article “Influence of Final Recrystallization Heat Treatment on Zircaloy-4 Strip Corrosion” by Foster, Dougherty, Burke, Bates, and Worcester (Journal of Nuclear Materials, 1990) reported some summary statistics from the measurement of the diameters of 821 particles observed in a bright field TEM micrograph of a Zircaloy-4 specimen. The sample mean diameter was x̄ = .055 µm, and the sample standard deviation of the diameters was s = .028 µm.
(a) The engineering researchers wished to establish from their observation of this single specimen the impact of a certain combination of specimen lot and heat-treating regimen on particle size. Briefly discuss why data such as the ones summarized have serious limitations for this purpose. (Hints: The apparent “sample size” here is huge. But of what is there a sample? How widely do the researchers want their results to apply? Given this desire, is the “real” sample size really so large?)
(b) Use the sample information and give a 98% two-sided confidence interval for the mean diameter of particles in this particular Zircaloy-4 specimen.
(c) Suppose that a standard method of heat treating for such specimens is believed to produce a mean particle diameter of .057 µm. Assess the strength of the evidence contained in the sample of diameter measurements to the effect that the specimen’s mean particle diameter is different from the standard. Show the whole five-step format.
(d) Discuss, in the context of part (c), the potential difference between the mean diameter being statistically different from .057 µm and there being a difference between µ and .057 that is of practical importance.

28. Return to Kim’s tensile strength data given in Exercise 26.
(a) Operating under the assumption that measured tensile strengths of 25 cm lengths of the wire studied are normally distributed, give a two-sided 98% confidence interval for the standard deviation of measured strengths.
(b) Operating under the assumption that measured tensile strengths of 30 cm lengths of the wire studied are normally distributed, give a 95% upper confidence bound for the standard deviation of measured strengths.
(c) Operating under the assumption that both 25 and 30 cm lengths of the wire have normally distributed measured tensile strengths, assess the strength of Kim’s evidence that 25 and 30 cm lengths differ in variability of their measured tensile strengths. (Use H0: σ25 = σ30 and Ha: σ25 ≠ σ30 and show the whole five-step format.)
(d) Operating under the assumption that both 25 and 30 cm lengths produce normally distributed tensile strengths, give a 98% two-sided confidence interval for the ratio σ25/σ30.

29. Find the following quantiles:
(a) the .99 quantile of the χ² distribution with 4 degrees of freedom
(b) the .025 quantile of the χ² distribution with 4 degrees of freedom
(c) the .99 quantile of the F distribution with numerator degrees of freedom 3 and denominator degrees of freedom 15
(d) the .25 quantile of the F distribution with numerator degrees of freedom 3 and denominator degrees of freedom 15

30. The digital and vernier caliper measurements of no. 10 machine screw diameters summarized in Exercise 3 of Section 6.3 are such that for 19 out of 50 of the screws, there was no difference in the measurements. Based on these results, give a 95% confidence interval for the long-run fraction of such measurements by the student technician that would produce agreement between the digital and vernier caliper measurements.

31. Duren, Leng, and Patterson studied the drilling of holes in a miniature metal part using electrical discharge machining. Blueprint specifications on a certain hole called for diameters of .0210 ± .0003 in. The diameters of this hole were measured on 50 parts with plug gauges and produced x̄ = .02046 and s = .00178. Assume that the holes the students measured were representative of the output of a physically stable drilling process.
(a) Give a 95% two-sided confidence interval for the mean diameter of holes drilled by this process.
(b) Give a 95% lower confidence bound for the mean diameter of the holes drilled by this process. (Find a number, #, so that (#, ∞) is a 95% confidence interval.) How does this number compare to the lower end point of your interval from (a)?
(c) Repeat (a) using 90% confidence. How does this interval compare with the one from (a)?
(d) Repeat (b) using 90% confidence. How does this bound compare to the one found in (b)?
(e) Interpret your interval from (a) for someone with little statistical background. (Speak in the context of the drilling study and use the “authorized interpretation” of confidence as your guide.)
(f) Based on your confidence intervals, would you expect the p-value in a test of H0: µ = .0210 versus Ha: µ ≠ .0210 to be small? Explain.
(g) Based on your confidence intervals, would you expect the p-value in a test of H0: µ = .0210 versus Ha: µ > .0210 to be small? Explain.
(h) Consider again your answer to part (a). A colleague sees your calculations and says, “Oh, so 95% of the measured diameters would be in that range?” What do you say to this person?
(i) Use the five-step significance-testing format of Section 6.2 and assess the strength of the evidence provided by the data to the effect that the process mean diameter differs from the mid-specification of .0210. (Begin with H0: µ = .0210 and use Ha: µ ≠ .0210.)
(j) Thus far in this exercise, inference for the mean hole diameter has been of interest. Explain why in practice the variability of diameters is also important. The methods of Section 6.1 are not designed for analyzing distributional spread. Where in Chapter 6 can you find inference methods for this feature?

32. Return to Babcock’s fatigue life testing data in Chapter Exercise 18 of Chapter 3 and for now focus on the fatigue life data for heat 1.
(a) In order to do inference based on this small sample, what model assumptions must you employ? What does a normal plot say about the appropriateness of these assumptions?
(b) Give a 90% two-sided confidence interval for the mean fatigue life of such specimens from this heat.
(c) Give a 90% lower confidence bound for the mean fatigue life of such specimens from this heat.
(d) If you are interested in quantifying the variability in fatigue lives produced by this heat of steel, inference for σ becomes relevant. Give a 95% two-sided confidence interval for σ based on display (6.42) of the text.
(e) Make a 90% two-sided prediction interval for a single additional fatigue life for a specimen from this heat.
(f) Make a 95% two-sided tolerance interval for 90% of additional fatigue lives for specimens from this heat. How does this interval compare to your interval from (e)?
(g) Now consider the statistical interval derived from the minimum and maximum sample values from heat 1, namely (11, 548). What confidence should be associated with this interval as a prediction interval for a single additional fatigue life from this heat? What confidence should be associated with the interval as a tolerance interval for 90% of additional fatigue lives?
Now consider both the data for heat 1 and the data for heat 3.
(h) In order to make formal inferences about µ1 − µ3 based on these data, what must be assumed about fatigue lives for specimens from these two heats? Make a plot useful for investigating the reasonableness of these assumptions.
(i) Under the appropriate assumptions (state them), give a 95% two-sided confidence interval for µ1 − µ3.

33. Consider the Notch/Dial Bore and Notch/Air Spindler measurements on ten servo sleeves recorded in Chapter Exercise 19 in Chapter 3.
(a) If one wishes to compare the dial bore gauge and the air spindler gauge measurements, the methods of formulas (6.35), (6.36), and (6.38) are not appropriate. Why?
(b) What assumption must you make in order to do formal inference on the mean difference in dial bore and air spindler gauge measurements? Make a plot useful for assessing the reasonableness of this assumption. Comment on what it indicates in this problem.
(c) Make the necessary assumptions about the dial bore and air spindler measurements and assess the strength of the evidence in the data of a systematic difference between the two gauges.
(d) Make a 95% two-sided confidence interval for the mean difference in dial bore and air spindler measurements.
(e) Briefly discuss how your answers for parts (c) and (d) of this problem are consistent.

34. Chapter Exercise 20 in Chapter 3 concerned the drilling of holes in miniature metal parts using laser drilling and electrical discharge machining. Return to that problem and consider first only the EDM values.
(a) In order to use the methods of inference of Section 6.3 with these data, what model assumptions must be made? Make a plot useful for investigating the appropriateness of those assumptions. Comment on the shape of that plot and what it says about the appropriateness of the model assumptions.
(b) Give a 99% two-sided confidence interval for the mean angle produced by the EDM drilling of this hole.
(c) Give a 99% upper confidence bound for the mean angle produced by the EDM drilling of this hole.
(d) Give a 95% two-sided confidence interval for the standard deviation of angles produced by the EDM drilling of this hole.
(e) Make a 99% two-sided prediction interval for the next measured angle produced by the EDM drilling of this hole.
(f) Make a 95% two-sided tolerance interval for 99% of angles produced by the EDM drilling of this hole.
(g) Consider the statistical interval derived from the minimum and maximum sample EDM values, namely (43.2, 46.1). What confidence should be associated with this interval as a prediction interval for a single additional measured angle? What confidence should be associated with this interval as a tolerance interval for 99% of additional measured angles?
Now consider both the EDM and initial set of Laser values in Chapter Exercise 20 of Chapter 3 (two sets of 13 parts).
(h) In order to make formal inferences about µLaser − µEDM based on these data, what must you be willing to use for model assumptions? Make a plot useful for investigating the reasonableness of those assumptions.
(i) Proceed under appropriate assumptions to assess the strength of the evidence provided by the data that there is a difference in the mean angles produced by the two drilling methods.
(j) Give a 95% two-sided confidence interval for µLaser − µEDM.
(k) Give a 90% two-sided confidence interval for comparing the standard deviations of angles produced by Laser and EDM drilling of this hole.
Now consider both sets of Laser measurements given in Chapter Exercise 20 of Chapter 3. (Holes A and B are on the same 13 parts.)
(l) If you wished to compare the mean angle measurements for the two holes, the formulas used in (i) and (j) are not appropriate. Why?
(m) Make a 90% two-sided confidence interval for the mean difference in angles for the two holes made with the laser equipment.
(n) Assess the strength of the evidence provided by these data that there is a systematic difference in the angles of the holes made with the laser equipment.
(o) Briefly discuss why your answers to parts (m) and (n) of this exercise are compatible. (Discuss how the outcome of part (n) could have been anticipated from the outcome of part (m).)

35. A so-called “tilttable” test was run in order to determine the angles at which certain vehicles experience lift-off of one set of wheels and begin to roll over on their sides. “Tilttable ratios” (which are the tangents of the angles at which lift-off occurred) were measured for two minivans of different makes four times each with the following results.

Van 1            Van 2
1.096, 1.093,    .962, .970,
1.090, 1.093     .967, .966

(a) If you were to make a confidence interval for the long-run mean measured tilttable ratio for Van 1 (under conditions like those experienced during the testing) using the methods of Section 6.3, what model assumption must be made?
(b) Make a 95% two-sided confidence interval for the mean measured tilttable ratio for Van 1 under conditions like those experienced during the testing.
(c) Give a 95% lower confidence bound for the mean measured tilttable ratio for Van 1.
(d) Give a 95% lower confidence bound for the standard deviation of tilttable ratios for Van 1.
(e) Make a 95% two-sided prediction interval for a single additional measured tilttable ratio for Van 1 under conditions such as those experienced during testing.
(f) Make a 99% two-sided tolerance interval for 95% of additional measured tilttable ratios for Van 1.
(g) Consider the statistical interval derived from the minimum and maximum sample values for Van 1, namely (1.090, 1.096). What confidence should be associated with this interval as a prediction interval for a single additional measured tilttable ratio? What confidence should be associated with this interval as a tolerance interval for 95% of additional tilttable test results for Van 1?
Now consider the data for both vans.
(h) In order to make formal inferences about µ1 − µ2 based on these data, what must you be willing to use for model assumptions?
(i) Proceed under the necessary assumptions to assess the strength of the evidence provided by the data that there is a difference in mean measured tilttable ratios for the two vans.
(j) Proceed under the necessary model assumptions to give a 90% two-sided confidence interval for µ1 − µ2.
(k) Proceed under the necessary model assumptions to give a 90% two-sided confidence interval for σ1/σ2.
Chapter 6 Summary Tables ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Table 1
Inference Methods for Individual Values

Assumptions            Interval               Section
observations normal    x̄ ± τ2s                6.6
                       or (x̄ − τ1s, ∞)
                       or (−∞, x̄ + τ1s)
Table 2
Inference Methods for One and Two Means

µ (one mean), large n — H0: µ = #; test statistic Z = (x̄ − #)/(s/√n), standard normal; reference interval x̄ ± z s/√n. (Sections 6.1, 6.2)

µ (one mean), small n, observations normal — H0: µ = #; test statistic T = (x̄ − #)/(s/√n), t with ν = n − 1; reference interval x̄ ± t s/√n. (Section 6.3)

µ1 − µ2 (difference in means), large n1 and n2, independent samples — H0: µ1 − µ2 = #; test statistic Z = (x̄1 − x̄2 − #)/√(s1²/n1 + s2²/n2), standard normal; reference interval x̄1 − x̄2 ± z√(s1²/n1 + s2²/n2). (Section 6.3)

µ1 − µ2, small n1 or n2, independent normal samples, σ1 = σ2 — H0: µ1 − µ2 = #; test statistic T = (x̄1 − x̄2 − #)/(sP√(1/n1 + 1/n2)), t with ν = n1 + n2 − 2; reference interval x̄1 − x̄2 ± t sP√(1/n1 + 1/n2). (Section 6.3)

µ1 − µ2, small n1 or n2, independent normal samples, possibly σ1 ≠ σ2 — reference interval x̄1 − x̄2 ± t̂√(s1²/n1 + s2²/n2), using the random degrees of freedom ν̂ given in (6.37). (Section 6.3)

µd (mean difference), large n, paired data — H0: µd = #; test statistic Z = (d̄ − #)/(sd/√n), standard normal; reference interval d̄ ± z sd/√n. (Section 6.3)

µd (mean difference), small n, paired data, normal differences — H0: µd = #; test statistic T = (d̄ − #)/(sd/√n), t with ν = n − 1; reference interval d̄ ± t sd/√n. (Section 6.3)
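Display (6.37) is not reproduced in this summary; assuming it is the usual Satterthwaite approximate degrees of freedom, the “possibly σ1 ≠ σ2” entry of Table 2 can be sketched as follows (the function name is illustrative):

import numpy as np
from scipy import stats

def approx_two_sample_interval(x1, x2, conf=0.95):
    """Interval for mu_1 - mu_2 without assuming sigma_1 = sigma_2, using the
    Satterthwaite approximate degrees of freedom (assumed to match (6.37))."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = x1.size, x2.size
    v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2   # s_i^2 / n_i
    nu_hat = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t_hat = stats.t.ppf(1 - (1 - conf) / 2, df=nu_hat)
    diff = x1.mean() - x2.mean()
    half = t_hat * np.sqrt(v1 + v2)
    return diff - half, diff + half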
Table 3
Inference Methods for Variances

σ1²/σ2² (variance ratio), observations normal, independent samples — H0: σ1²/σ2² = #
Table 4
Inference Methods for Proportions
7 Inference for Unstructured Multisample Studies
Chapter 6 introduced the basics of formal statistical inference in one- and two-
sample studies. This chapter begins to consider formal inference for multisample
studies, with a look at methods that make no explicit use of structure relating the
samples (beyond time order of data collection). That is, the study of inference
methods specifically crafted for use in factorial and fractional factorial studies and
in curve- and surface-fitting analyses will be delayed until subsequent chapters.
The chapter opens with a discussion of the standard one-way model typically
used in the analysis of measurement data from multisample studies and of the role
of residuals in judging its appropriateness. The making of confidence intervals in
multisample contexts is then considered, including both individual and simultane-
ous confidence interval methods. The one-way analysis of variance (ANOVA) test
for the hypothesis of equality of several means and a related method of estimating
variance components are introduced next. The chapter then covers the basics of
Shewhart control (or process monitoring) charts. The x̄, R, and s control charts for
measurement data are studied. The chapter then closes with a section on p charts
and u charts for attributes data.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
7.1 The One-Way Normal Model

This section begins to provide such methods. First the reader is reminded of the
usefulness of some of the simple graphical tools of Chapter 3 for making informal
comparisons in multisample studies. Next the “equal variances, normal distribu-
tions” model is introduced. The role of residuals in evaluating the reasonableness
of that model in an application is explained and emphasized. The section then pro-
ceeds to introduce the notion of combining several sample variances to produce a
single pooled estimate of baseline variation. Finally, there is a discussion of how
standardized residuals can be helpful when sample sizes vary considerably.
[Figure 7.1: plot of compressive strength (psi, 1,000 to 6,000) against concrete formula (1 to 8) for the data of Table 7.1.]
Table 7.1
Compressive Strengths for 24 Concrete Specimens
Table 7.2
Empirical Spring Constants
[Figure: dot diagrams of empirical spring constants for Type 1 and Type 3 springs.]
Methods of formal statistical inference are meant to sharpen and quantify the
impressions that one gets when making a descriptive analysis of data. But an intel-
ligent graphical look at data and a correct application of formal inference methods
rarely tell completely different stories. Indeed, the methods of formal inference of-
fered here for simple, unstructured multisample studies are confirmatory—in cases
like Examples 1 and 2, they should confirm what is clear from a descriptive or
exploratory look at the data.
[Figure 7.4: r normal distributions with a common standard deviation and respective means µ1, µ2, µ3, . . . , µr.]
yij = µi + εij    (7.1)

where µi is the ith underlying mean and the quantities ε11, ε12, . . . , ε1n1, ε21, ε22, . . . ,
ε2n2, . . . , εr1, εr2, . . . , εrnr are independent normal random variables with mean 0
and variance σ². (In this statement, the means µ1, µ2, . . . , µr and the variance σ²
are typically unknown parameters.)
Equation (7.1) says exactly what is conveyed by Figure 7.4 and the statement
of the one-way assumptions in words. But it says it in a way that is suggestive of
another useful pattern of thinking, reminiscent of the “residual” notion that was
used extensively in Sections 4.1, 4.2, and 4.3. That is, equation (7.1) says that an
observation in sample i is made up of the corresponding underlying mean plus some
random noise, namely
εij = yij − µi
ith sample mean

ȳi = (1/ni) Σ_{j=1}^{ni} yij
That is,

Fitted values for the one-way model

ŷij = ȳi    (7.2)
(This is not only intuitively plausible but also consistent with what was done in
Sections 4.1 and 4.2. If one fits the approximate relationship yij ≈ µi to the data via
least squares—i.e., by minimizing Σij (yij − µi)² over choices of µ1, µ2, . . . , µr—
each minimizing value of µi is ȳi.)
Taking equation (7.2) to specify fitted values for an r-sample study, the pattern
established in Chapter 4 (specifically, Definition 4, page 132) then says that residuals
are differences between observed values and sample means. That is, with ŷij = ȳi,
one has

Residuals for the one-way model

eij = yij − ŷij = yij − ȳi    (7.3)
yij = ŷij + eij = ȳi + eij    (7.4)
yij = µi + εij = ȳi + eij    (7.5)
This is a specific instance of a pattern of thinking that runs through all of the common
normal-distribution-based methods of analysis for multisample studies. In words,
equation (7.5) says

observation = deterministic response + noise = fitted value + residual    (7.6)

and display (7.6) is a paradigm that provides a unified way of approaching the
majority of the analysis methods presented in the rest of this book.
The decompositions (7.5) and (7.6) suggest that

eij ≈ εij
The fact that the εij in equation (7.1) are assumed to be iid normal (0, σ²) random
variables then suggests that the eij ought to look at least approximately like a random
sample from a normal distribution.
So the normal-plotting of an entire set of residuals (as in Chapter 4) is a way of checking on the reasonableness of the one-way model. Plots of residuals against (1) fitted values, (2) time order of observation, or (3) any other potentially relevant variable—made hoping (as in Chapter 4) to see only random scatter—are other ways of investigating the appropriateness of the model assumptions.
These kinds of plots, which combine residuals from all r samples, are often especially useful in practice. When r is at all large, budget constraints on total data collection costs often force the individual sample sizes n1, n2, ..., nr to be fairly small. This makes it fruitless to investigate "single variance, normal distributions" model assumptions using (for example) sample-by-sample normal plots. (Of course, where all of n1, n2, ..., nr are of a decent size, a sample-by-sample approach can be effective.)
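Although the analyses in this book were done with a statistical package, the residual computations themselves are elementary to script. The following minimal Python sketch (our illustration, not part of the original exposition; it assumes the numpy and scipy libraries) computes fitted values (7.2), residuals (7.3), and normal-plot coordinates for two of the concrete samples of Table 7.4.

    import numpy as np
    from scipy import stats

    # Observations y_ij for two of the r = 8 samples (concrete formulas
    # 1 and 8 from Table 7.4); a full analysis would list all 8 samples.
    samples = [np.array([5800.0, 4598.0, 6508.0]),
               np.array([2051.0, 2631.0, 2490.0])]

    # Fitted values are the sample means (equation (7.2)), and residuals
    # are observations minus sample means (equation (7.3)).
    fitted = np.concatenate([np.full(y.size, y.mean()) for y in samples])
    resid = np.concatenate([y - y.mean() for y in samples])

    # Residuals versus fitted values (the kind of plot in Figure 7.5):
    for f_val, e in zip(fitted, resid):
        print(f"fitted {f_val:7.1f}   residual {e:8.1f}")

    # Coordinates for a normal plot of all residuals (as in Figure 7.6):
    # sorted residuals paired with standard normal quantiles.
    p = (np.arange(1, resid.size + 1) - 0.5) / resid.size
    for e, z in zip(np.sort(resid), stats.norm.ppf(p)):
        print(f"residual {e:8.1f}   standard normal quantile {z:5.2f}")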
Example 1 (continued)  Returning again to the concrete strength study, consider investigating the reasonableness of model (7.1) for this case. Figure 7.1 is a first step in this investigation.
As remarked earlier, it conveys the visual impression that at least the “equal
variances” part of the one-way model assumptions is plausible. Next, it makes
sense to compute some summary statistics and examine them, particularly the
sample standard deviations. Table 7.3 gives sample sizes, sample means, and
sample standard deviations for the data in Table 7.1.
At first glance, it might seem worrisome that in this table s1 is more than three
times the size of s8 . But the sample sizes here are so small that a largest ratio of
Table 7.3
Summary Statistics for the Concrete Strength Study

Concrete Formula, i   Sample Size, ni   Sample Mean, ȳi (psi)   Sample Standard Deviation, si (psi)
1                     3                 5,635.3                 965.6
2                     3                 5,753.3                 432.3
3                     3                 4,527.3                 509.9
4                     3                 3,442.3                 356.4
5                     3                 2,923.7                 852.9
6                     3                 3,324.7                 353.5
7                     3                 1,551.3                 505.5
8                     3                 2,390.7                 302.5
Table 7.4
Example Computations of Residuals for the Concrete Strength Study

Specimen   Concrete Formula, i   Compressive Strength, yij (psi)   Fitted Value, ŷij = ȳi   Residual, eij
1          1                     5,800                             5,635.3                  164.7
2          1                     4,598                             5,635.3                  −1,037.3
3          1                     6,508                             5,635.3                  872.7
4          2                     5,659                             5,753.3                  −94.3
5          2                     6,225                             5,753.3                  471.7
⋮          ⋮                     ⋮                                 ⋮                        ⋮
22         8                     2,051                             2,390.7                  −339.7
23         8                     2,631                             2,390.7                  240.3
24         8                     2,490                             2,390.7                  99.3
sample standard deviations on the order of 3.2 is hardly unusual (for r = 8 samples of size 3 from a normal distribution). Note from the F tables (Tables B.6) that for samples of size 3, even if only 2 (rather than 8) sample standard deviations were involved, a ratio of sample variances of (965.6/302.5)² ≈ 10.2 would yield a p-value between .10 and .20 for testing the null hypothesis of equal variances with a two-sided alternative. The sample standard deviations in Table 7.3 really carry no strong indication that the one-way model is inappropriate.
Since the individual sample sizes are so small, trying to see anything useful
in eight separate normal plots of the samples is hopeless. But some insight can
be gained by calculating and plotting all 8 × 3 = 24 residuals. Some of the
calculations necessary to compute residuals for the data in Table 7.1 (using the
fitted values appearing as sample means in Table 7.3) are shown in Table 7.4.
Figures 7.5 and 7.6 are, respectively, a plot of residuals versus fitted y (eij versus ŷij) and a normal plot of all 24 residuals.
Figure 7.5 Plot of residuals (eij) versus fitted responses for the concrete strength study
Figure 7.6 Normal plot of all 24 concrete strength residuals
Example 2 (continued)  The spring testing data can also be examined with the potential use of the one-way normal model (7.1) in mind. Figures 7.2 and 7.3 indicate reasonably comparable variabilities of experimental spring constants for the r = 3 different spring types.
variabilities of experimental spring constants for the r = 3 different spring types.
The single very large value (for spring type 1) causes some doubt both in terms of
this judgment and also (by virtue of its position on its boxplot as an outlying value)
regarding a “normal distribution” description of type 1 experimental constants.
Summary statistics for these samples are given in Table 7.5.
Table 7.5
Summary Statistics for the Empirical Spring Constants

Spring Type, i   ni   ȳi      si
1                7    2.030   .134
2                6    2.750   .074
3                6    2.035   .064
Without the single extreme value of 2.30, the first sample standard deviation would be .068, completely in line with those of the second and third samples. But even the observed ratio of largest to smallest sample variance (namely, (.134/.064)² = 4.38) is not a compelling reason to abandon a one-way model description of the spring constants. (A look at the F tables with ν1 = 6 and ν2 = 5 shows that 4.38 is between the .9 and .95 quantiles of the F6,5 distribution. So even if there were only two rather than three samples involved, a variance ratio of 4.38 would yield a p-value between .1 and .2 for (two-sided) testing of equality of variances.) Before letting the single type 1 empirical spring constant of 2.30 force abandonment of the highly tractable model (7.1), some additional investigation is warranted.
Sample sizes n 1 = 7 and n 2 = n 3 = 6 are large enough that it makes sense
to look at sample-by-sample normal plots of the spring constant data. Such plots,
drawn on the same set of axes, are shown in Figure 7.7. Further, use of the fitted
values ( ȳ i ) listed in Table 7.5 with the original data given in Table 7.2 produces
Figure 7.7 Normal plots of empirical spring constants for springs of types 1, 2, and 3 (drawn on the same set of axes)
Table 7.6
Example Computations of Residuals for the Spring Constant Study

Spring Type, i   Observation Number, j   Spring Constant, yij   Sample Mean, ŷij = ȳi   Residual, eij
1                1                       1.99                   2.030                   −.040
⋮                ⋮                       ⋮                      ⋮                       ⋮
1                7                       2.30                   2.030                   .270
2                1                       2.85                   2.750                   .100
⋮                ⋮                       ⋮                      ⋮                       ⋮
2                6                       2.80                   2.750                   .050
3                1                       2.10                   2.035                   .065
⋮                ⋮                       ⋮                      ⋮                       ⋮
3                6                       2.05                   2.035                   .015
19 residuals, as partially illustrated in Table 7.6. Then Figures 7.8 and 7.9, re-
spectively, show a plot of residuals versus fitted responses and a normal plot of
all 19 residuals.
Figure 7.8 Plot of residuals versus fitted responses for the spring constant study
Figure 7.9 Normal plot of all 19 spring constant residuals
But Figures 7.8 and 7.9 again draw attention to the largest type 1 empirical
spring constant. Compared to the other measured values, 2.30 is simply too large
(and thus produces a residual that is too large compared to all the rest) to permit
serious use of model (7.1) with the spring constant data. Barring the possibility
that checking of original data sheets would show the 2.30 value to be an arithmetic
blunder or gross error of measurement (which could be corrected or legitimately
force elimination of the 2.30 value from consideration), it appears that the use of
model (7.1) with the r = 3 spring types could produce inferences with true (and
unknown) properties quite different from their nominal properties.
One might, of course, limit attention to spring types 2 and 3. There is nothing
in the second or third samples to render the “equal variances, normal distributions”
model untenable for those two spring types. But the pattern of variation for
springs of type 1 appears to be detectably different from that for springs of types
2 and 3, and the one-way model is not appropriate when all three types are
considered.
Definition 1  If r numerical samples of respective sizes n1, n2, ..., nr produce sample variances s1², s2², ..., sr², the pooled sample variance, sP², is the weighted average of the sample variances, where the weights are the sample sizes minus 1. That is,

sP² = [ (n1 − 1)s1² + (n2 − 1)s2² + · · · + (nr − 1)sr² ] / [ (n1 − 1) + (n2 − 1) + · · · + (nr − 1) ]    (7.7)

Definition 1 is just Definition 14 in Chapter 6 restated for the case of more than two samples. As was the case for sP based on two samples, sP is guaranteed to lie between the largest and smallest of the si and is a mathematically convenient form of compromise value.
Equation (7.7) can be rewritten in a number of equivalent forms. For one thing, since

Σ_{i=1}^{r} (ni − 1) = (Σ_{i=1}^{r} ni) − r = n − r

the denominator on the right of equation (7.7) is just n − r. Further, using the fact that

si² = (1/(ni − 1)) Σ_{j=1}^{ni} (yij − ȳi)²

the numerator on the right of equation (7.7) is

Σ_{i=1}^{r} (ni − 1)si² = Σ_{i=1}^{r} Σ_{j=1}^{ni} (yij − ȳi)²    (7.8)

= Σ_{i=1}^{r} Σ_{j=1}^{ni} eij²    (7.9)
So one can define sP² in terms of the right-hand side of equation (7.8) or (7.9) divided by n − r; these are alternative formulas for sP².
Example 1 (continued)  For the compressive strength data, each of n1, n2, ..., n8 is 3, and s1 through s8 are given in Table 7.3. So using equation (7.7),

sP² = [ (3 − 1)(965.6)² + (3 − 1)(432.3)² + · · · + (3 − 1)(302.5)² ] / [ (3 − 1) + (3 − 1) + · · · + (3 − 1) ] = 338,213 (psi)²

and thus

sP = √338,213 = 581.6 psi
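For readers scripting such summaries, the arithmetic of equation (7.7) is a one-liner. A small Python sketch (our illustration; numpy assumed) reproducing the pooled value from the Table 7.3 summaries:

    import numpy as np

    # Sample sizes and sample standard deviations from Table 7.3.
    n = np.array([3, 3, 3, 3, 3, 3, 3, 3])
    s = np.array([965.6, 432.3, 509.9, 356.4, 852.9, 353.5, 505.5, 302.5])

    # Pooled sample variance, equation (7.7): a weighted average of the
    # sample variances with weights (n_i - 1).
    sp2 = np.sum((n - 1) * s**2) / np.sum(n - 1)
    print(round(sp2), round(np.sqrt(sp2), 1))   # about 338,213 (psi)^2 and 581.6 psi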
Further, under the one-way model assumptions, the variable

(n − r)sP² / σ²

has a χ²_{n−r} distribution. Thus, in a manner exactly parallel to the derivation in Section 6.4, a two-sided confidence interval for σ² has endpoints

Confidence limits for σ² based on the one-way model:

(n − r)sP²/U  and  (n − r)sP²/L    (7.10)

where L and U are such that the χ²_{n−r} probability assigned to the interval (L, U) is the desired confidence.
Example 1 (continued)  In the concrete compressive strength case, consider the use of display (7.10) in making a two-sided 90% confidence interval for σ. Since n − r = 16 degrees of freedom are associated with sP², one consults Table B.5 for the .05 and .95 quantiles of the χ²16 distribution. These are 7.962 and 26.296, respectively. Thus, from display (7.10), a confidence interval for σ² has endpoints
16(581.6)²/26.296  and  16(581.6)²/7.962
So a two-sided 90% confidence interval for σ has endpoints
√(16(581.6)²/26.296)  and  √(16(581.6)²/7.962)
that is,

453.7 psi and 824.5 psi
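The chi-squared quantile look-ups and limits in display (7.10) are also easy to script; for instance (a sketch only, assuming scipy):

    from scipy import stats

    df = 16                        # n - r for the concrete study
    sp2 = 581.6**2                 # pooled variance estimate

    L = stats.chi2.ppf(0.05, df)   # about 7.962
    U = stats.chi2.ppf(0.95, df)   # about 26.296

    # Two-sided 90% limits for sigma: square roots of the limits for sigma^2.
    print((df * sp2 / U) ** 0.5, (df * sp2 / L) ** 0.5)   # about 453.7 and 824.5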
It was argued earlier in this section that since the one-way model assumptions say that the εij are iid normal variables, the eij ought to look approximately like iid normal variables. This is sensible rough-and-ready reasoning, adequate for many circumstances. But strictly speaking, the eij are neither independent nor identically distributed, and it can be important to recognize this.
As an extreme example of the dependence of the residuals for a given sample i,
consider a case where n i = 2. Since
eij = yij − ȳi
one immediately knows that ei1 = −ei2 . So ei1 and ei2 are clearly dependent.
One can further apply Proposition 1 of Chapter 5 to show that if the sample sizes ni vary, the residuals don't have the same variance (and therefore can't be identically distributed). That is, writing

eij = yij − ȳi = ((ni − 1)/ni) yij − (1/ni) Σ_{j′≠j} yij′

Proposition 1 shows that

Var eij = ((ni − 1)/ni)² σ² + (ni − 1)(1/ni)² σ² = ((ni − 1)/ni) σ²    (7.11)
So, for example, residuals from a sample of size n i = 2 have variance σ 2 /2, while
those from a sample of size n i = 100 have variance 99σ 2 /100, and one ought to
expect residuals from larger samples to be somewhat bigger in magnitude than those
from small samples.
A way of addressing at least the issue that residuals need not have a common
variance is through the use of standardized residuals.
Definition 2  If a residual e has variance a · σ² for some positive constant a, and s is some estimate of σ, the standardized residual corresponding to e is

e* = e / (s√a)    (7.12)
The division by s√a in equation (7.12) is a division by an estimated standard deviation of e. It serves, so to speak, to put all of the residuals on the same scale.
For the one-way model, the plotting of the standardized residuals

Standardized residuals for the one-way model:

e*ij = eij / ( sP √((ni − 1)/ni) )    (7.13)
is a somewhat more refined way of judging the adequacy of the one-way model
than the plotting of raw residuals ei j illustrated in Examples 1 and 2. When all n i
are the same, as in Example 1, the plotting of the standardized residuals in equation
(7.13) is completely equivalent to plotting with the raw residuals. And as a practical
matter, unless some n i are very small and others are very large, the standardization
used in equation (7.13) typically doesn’t have much effect on the appearance of
residual plots.
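In code, the standardization (7.13) amounts to a single vectorized division; a sketch (ours, under the same assumptions):

    import numpy as np

    def standardized_residuals(resid, n_i, s_p):
        """Equation (7.13): divide each residual from a sample of size n_i
        by s_p * sqrt((n_i - 1) / n_i), its estimated standard deviation."""
        return np.asarray(resid) / (s_p * np.sqrt((n_i - 1) / n_i))

    # For instance, the large type 1 spring residual .270 from Table 7.6,
    # with n_1 = 7 and s_p = .099 as computed in Example 2 below:
    print(standardized_residuals([0.270], 7, 0.099))   # about 2.95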
Example 2 (continued)  In the spring constant study, allowing for the fact that sample 1 is larger than the other two (and thus according to the model (7.1) should produce larger residuals) doesn't materially change the outcome of the residual analysis. To see this, note that using the summary statistics in Table 7.5,

sP² = [ (7 − 1)(.134)² + (6 − 1)(.074)² + (6 − 1)(.064)² ] / [ (7 − 1) + (6 − 1) + (6 − 1) ] = .0097

so that

sP = √.0097 = .099
Then using equation (7.13), each residual from sample 1 should be divided by

.099 √((7 − 1)/7) = .0913

to get standardized residuals, while each residual from the second and third samples should be divided by

.099 √((6 − 1)/6) = .0900
Clearly, .0913 and .0900 are not much different, and the division before plotting
has little effect on the appearance of residual plots. By way of example, a normal
plot of all 19 standardized residuals is given in Figure 7.10. Verify its similarity
to the normal plot of all 19 raw residuals given in Figure 7.9 on page 454.
Figure 7.10 Normal plot of the 19 standardized residuals for the spring constant study
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
7.2 Simple Confidence Intervals in Multisample Studies

Under the one-way model assumptions, for any particular i the variable

T = (ȳi − µi) / (sP/√ni)

has a tn−r distribution. Hence, a two-sided confidence interval for the ith mean, µi, has endpoints

Confidence limits for µi based on the one-way model:

ȳi ± t (sP/√ni)    (7.14)

where the associated confidence is the probability assigned to the interval from −t to t by the tn−r distribution. This is exactly formula (6.20) from Section 6.3, except that sP has replaced si and the degrees of freedom have been adjusted from ni − 1 to n − r.
A parallel argument shows that under the one-way model, a two-sided confidence interval for a difference in underlying means has endpoints

Confidence limits for µi − µi′ based on the one-way model:

ȳi − ȳi′ ± t sP √(1/ni + 1/ni′)    (7.15)

where the associated confidence is the probability assigned to the interval from −t to t by the tn−r distribution. Display (7.15) is essentially formula (6.35) of Section 6.3, except that sP is calculated based on r samples instead of two and the degrees of freedom are n − r instead of ni + ni′ − 2.
Of course, use of only one endpoint from formula (7.14) or (7.15) produces a
one-sided confidence interval with associated confidence corresponding to the tn−r
probability assigned to the interval (−∞, t) (for t > 0). The virtues of formulas
(7.14) and (7.15) (in comparison to the corresponding formulas from Section 6.3)
are that (when appropriate) for a given confidence, they will tend to produce shorter
intervals than their Chapter 6 counterparts.
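Formulas (7.14) and (7.15) are simple enough to wrap in a single helper. The following Python sketch (our illustration; scipy assumed) reproduces the concrete strength computations that follow:

    import numpy as np
    from scipy import stats

    def one_way_limits(df, s_p, conf, ybar_i, n_i, ybar_ip=None, n_ip=None):
        """Two-sided limits (7.14) for a single mean mu_i, or (7.15) for a
        difference mu_i - mu_i' when ybar_ip and n_ip are also given."""
        t = stats.t.ppf(0.5 + conf / 2, df)
        if ybar_ip is None:
            center, half = ybar_i, t * s_p / np.sqrt(n_i)
        else:
            center = ybar_i - ybar_ip
            half = t * s_p * np.sqrt(1 / n_i + 1 / n_ip)
        return center - half, center + half

    # 90% intervals for mu_3 and for mu_3 - mu_7 in the concrete study:
    print(one_way_limits(16, 581.6, 0.90, 4527.3, 3))             # (3941.0, 5113.6)
    print(one_way_limits(16, 581.6, 0.90, 4527.3, 3, 1551.3, 3))  # (2146.9, 3805.1)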
In the concrete strength study, n − r = 24 − 8 = 16, so 90% two-sided intervals based on formula (7.14) require t = 1.746, the .95 quantile of the t16 distribution, and have plus-or-minus part

t (sP/√ni) = 1.746 (581.6/√3) = 586.3 psi
So ±586.3 psi precision could be attached to any one of the sample means in Table 7.7 as an estimate of the corresponding formula's mean strength. For example, since ȳ3 = 4,527.3 psi, a 90% two-sided confidence interval for µ3 has endpoints

4,527.3 ± 586.3

that is,

3,941.0 psi and 5,113.6 psi

Similarly, formula (7.15) shows that a 90% two-sided confidence interval for any particular difference in formula mean strengths has plus-or-minus part

1.746 (581.6) √(1/3 + 1/3) = 829.1 psi

Thus, ±829.1 psi precision could be attached to any difference between sample means in Table 7.7 as an estimate of the corresponding difference in formula mean strengths. For instance, since ȳ3 = 4,527.3 psi and ȳ7 = 1,551.3 psi, a 90% two-sided confidence interval for µ3 − µ7 has endpoints

2,976.0 ± 829.1

that is,

2,146.9 psi and 3,805.1 psi
Table 7.7
Concrete Formula Sample Mean Strengths
It often happens that a linear combination of the r underlying means,

A linear combination of population means:

L = c1µ1 + c2µ2 + · · · + crµr    (7.16)

is of engineering interest. (Note that, for example, if all ci's except c3 are 0 and c3 = 1, L = µ3, the mean response from condition 3. Similarly, if c3 = 1, c5 = −1, and all other ci's are 0, L = µ3 − µ5, the difference in mean responses from conditions
3 and 5.) A natural data-based way to approximate L is to replace the theoretical
or underlying means, µi , with empirical or sample means, ȳ i . That is, define an
estimator of L by
A linear combination of sample means:

L̂ = c1ȳ1 + c2ȳ2 + · · · + crȳr    (7.17)
Straightforward calculation then shows that

E L̂ = c1E ȳ1 + c2E ȳ2 + · · · + crE ȳr = c1µ1 + c2µ2 + · · · + crµr = L

and

Var L̂ = c1² Var ȳ1 + c2² Var ȳ2 + · · · + cr² Var ȳr = σ² (c1²/n1 + c2²/n2 + · · · + cr²/nr)
The one-way model restrictions imply that the ȳ i are independent and normal and,
in turn, that L̂ is normal. So the standardized version of L̂,
Z = (L̂ − E L̂)/√(Var L̂) = (L̂ − L) / ( σ √(c1²/n1 + c2²/n2 + · · · + cr²/nr) )    (7.18)
is standard normal. The usual manipulations beginning with this fact would produce
an unusable confidence interval for L involving the unknown parameter σ . A way to
reason to something of practical importance is to begin not with the variable (7.18),
but with
L̂ − L
T = s (7.19)
c12 c2 c2
sP + 2 + ··· + r
n1 n2 nr
instead. The fact is that under the current assumptions, the variable (7.19) has a tn−r
distribution. And this leads in the standard way to the fact that the interval with
endpoints
Confidence limits for a linear combination of means:

L̂ ± t sP √(c1²/n1 + c2²/n2 + · · · + cr²/nr)    (7.20)

can be used as a two-sided confidence interval for L, the associated confidence being the probability assigned to the interval from −t to t by the tn−r distribution.
Example 4 (continued)  In each test, the amount of liquid left in a graduated cylinder was recorded. Some summary statistics for the tests on the three brands are given in Table 7.8. Plots (not shown here) of the raw absorbency values and residuals indicate no problems with the use of the one-way model in the analysis of the absorbency data.
One question of practical interest is “On average, do the national brands
absorb more than the generic brand?” A way of quantifying this is to ask for a
two-sided 95% confidence interval for
L = µ1 − ½(µ2 + µ3)    (7.21)
the difference between the average liquid left by the generic brand and the
arithmetic mean of the national brand averages.
With L as in equation (7.21), formula (7.17) shows that
L̂ = 93.2 − ½(81.0) − ½(83.8) = 10.8 ml
is an estimate of the increased absorbency offered by the national brands. Using the standard deviations given in Table 7.8,

sP² = [ (5 − 1)(.8)² + (5 − 1)(.7)² + (5 − 1)(.8)² ] / [ (5 − 1) + (5 − 1) + (5 − 1) ] = .59

and thus

sP = √.59 = .77 ml
Table 7.8
Summary Statistics for Absorbencies of Three Brands of Paper Towels

Brand        i   ni   ȳi        si
Generic      1   5    93.2 ml   .8 ml
National B   2   5    81.0 ml   .7 ml
National V   3   5    83.8 ml   .8 ml
So, since n − r = 15 − 3 = 12, using the .975 quantile of the t12 distribution in formula (7.20), a two-sided 95% confidence interval for L has endpoints

10.8 ± 2.179(.77)√( (1)²/5 + (−½)²/5 + (−½)²/5 )

that is,

10.8 ± 2.179(.77)(.55)

i.e.,

10.8 ± .9 ml, or 9.9 ml to 11.7 ml    (7.22)
The interval indicated in display (7.22) shows definitively the substantial advan-
tage in absorbency held by the national brands over the generic, particularly in
view of the fact that the amount actually absorbed by the generic brand appears
to average only about 6.8 ml (= 100 ml − 93.2 ml).
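The interval (7.22) can be checked with a few lines of code. A minimal Python sketch (ours; scipy assumed) implementing display (7.20):

    import numpy as np
    from scipy import stats

    def linear_combination_limits(c, ybar, n, s_p, df, conf=0.95):
        """Two-sided confidence limits (7.20) for L = c1*mu_1 + ... + cr*mu_r."""
        c, ybar, n = map(np.asarray, (c, ybar, n))
        L_hat = np.sum(c * ybar)                                  # estimator (7.17)
        half = stats.t.ppf(0.5 + conf / 2, df) * s_p * np.sqrt(np.sum(c**2 / n))
        return L_hat - half, L_hat + half

    # Example 4: L = mu_1 - (mu_2 + mu_3)/2 for the paper towel study.
    print(linear_combination_limits([1.0, -0.5, -0.5],
                                    [93.2, 81.0, 83.8],
                                    [5, 5, 5], s_p=0.77, df=12))  # about (9.9, 11.7)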
Table 7.9
Modulus of Rupture Measurements for Brick Bars in a 2 × 2 Factorial Study

Bar Type, i   % Water in Mix   Heat-Treating Regimen   MOR (psi)
1             17               slow cool               4911, 5998, 5676
2             19               slow cool               4387, 5388, 5007
3             17               fast cool               3824, 3140, 3502
4             19               fast cool               4768, 3672, 3242
Notice that the data represented in Table 7.9 have a 2 × 2 complete factorial
structure. Indeed, returning to Section 4.3 (in particular, to Definition 5, page 166),
Example 5 (continued)  it becomes clear that the fitted main effect of the factor Heat-Treating Regimen at its slow cool level is

½(ȳ1 + ȳ2) − ¼(ȳ1 + ȳ2 + ȳ3 + ȳ4)    (7.23)
But the variable (7.23) is the L̂ for the linear combination of mean strengths µ1, µ2, µ3, and µ4 given by

L = ¼µ1 + ¼µ2 − ¼µ3 − ¼µ4    (7.24)
Then, using the sample means given in Table 7.10,

L̂ = ½(ȳ1 + ȳ2) − ¼(ȳ1 + ȳ2 + ȳ3 + ȳ4)
  = ¼ȳ1 + ¼ȳ2 − ¼ȳ3 − ¼ȳ4
  = ¼(5,528.3 + 4,927.3 − 3,488.7 − 3,894.0)
  = 768.2 psi
and

sP = √( [ (3 − 1)(558.3)² + (3 − 1)(505.2)² + (3 − 1)(342.2)² + (3 − 1)(786.8)² ] / [ (3 − 1) + (3 − 1) + (3 − 1) + (3 − 1) ] )
  = 570.8 psi
Table 7.10
Summary Statistics for the Modulus of Rupture Measurements

Bar Type, i   ȳi        si
1             5,528.3   558.3
2             4,927.3   505.2
3             3,488.7   342.2
4             3,894.0   786.8
So, using the .99 quantile of the t8 distribution (2.896, since n − r = 12 − 4 = 8 degrees of freedom are associated with sP), two-sided 98% confidence limits for L are

768.2 ± 2.896(570.8)(.2887)

that is,

768.2 ± 477.2 psi, i.e., 291.0 psi to 1,245.4 psi

Multiplying by 2, the relationship

2L = ½(µ1 + µ2) − ½(µ3 + µ4)

shows that (when averaged over 17% and 19% water mixtures) the slow cool regimen seems to offer an increase in MOR in the range from roughly 582 psi to roughly 2,491 psi.

In this section, individual confidence intervals have been made for quantities like

µ1, µ2, µ3, µ1 − µ2, µ1 − µ3, µ2 − µ3, and µ1 − ½(µ2 + µ3)
Since many confidence statements are often made in multisample studies, it is
important to reflect on the meaning of a confidence level and realize that it is
attached to one interval at a time. If many 90% confidence intervals are made,
the 90% figure applies to the intervals individually. One is “90% sure” of the
first interval, separately “90% sure” of the second, separately “90% sure” of the
third, and so on. It is not at all clear how to arrive at a reliability figure for the
intervals jointly or simultaneously (i.e., an a priori probability that all the intervals
are effective). But it is fairly obvious that it must be less than 90%. That is, the
simultaneous or joint confidence (the overall reliability figure) to be associated
with a group of intervals is generally not easy to determine, but it is typically less
(and sometimes much less) than the individual confidence level(s) associated with
the intervals one at a time.
There are at least three different approaches to be taken once the difference
between simultaneous and individual confidence levels is recognized. The most
obvious option is to make individual confidence intervals and be careful to interpret
them as such (being careful to recognize that as the number of intervals one makes
increases, so does the likelihood that among them are one or more intervals that fail
to cover the quantities they are meant to locate).
A second way of handling the issue of simultaneous versus individual confidence
is to use very large individual confidence levels for the separate intervals and then
employ a somewhat crude inequality to find at least a minimum value for the
simultaneous confidence associated with an entire group of intervals. That is, if
k confidence intervals have associated confidences γ1 , γ2 , . . . , γk , the Bonferroni
inequality says that the simultaneous or joint confidence that all k intervals are
effective (say, γ ) satisfies
The Bonferroni inequality:

γ ≥ 1 − ( (1 − γ1) + (1 − γ2) + · · · + (1 − γk) )    (7.26)
(Basically, this statement says that the joint “unconfidence” associated with k inter-
vals (1 − γ ) is no larger than the sum of the k individual unconfidences. For example,
five intervals with individual 99% confidence levels have a joint or simultaneous
confidence level of at least 95%.)
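In code, the Bonferroni bound is a single line; a tiny sketch (ours):

    def bonferroni_joint_confidence(confidences):
        """Lower bound (7.26) on the simultaneous confidence of k intervals."""
        return 1 - sum(1 - g for g in confidences)

    print(bonferroni_joint_confidence([0.99] * 5))   # 0.95, as in the text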
The third way of approaching the issue of simultaneous confidence is to develop
and employ methods that for some specific, useful set of unknown quantities provide
intervals with a known level of simultaneous confidence. There are whole books
full of such simultaneous inference methods. In the next section, two of the better
known and simplest of these are discussed.
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation of Exercise 1 of Section 7.1 (and the pressure/density data of Example 1 in Chapter 4).
(a) Individual two-sided confidence intervals for the five different means here would be of the form ȳi ± Δ for a number Δ. If 95% individual confidence is desired, what is Δ? If all five of these intervals are made, what does the Bonferroni inequality guarantee for a minimum joint or simultaneous confidence?
(b) Individual two-sided confidence intervals for the differences in the five different means would be of the form ȳi − ȳi′ ± Δ for a number Δ. If 95% individual confidence is desired, what is Δ?
(c) Note that if mean density is a linear function of pressure over the range of pressures from 2,000 to 6,000 psi, then µ4000 − µ2000 = µ6000 − µ4000, that is, L = µ6000 − 2µ4000 + µ2000 has the value 0. Give 95% two-sided confidence limits for this L. What does your interval indicate about the linearity of the pressure/density relationship?
2. Return to the tilttable testing problem of Exercise 2 of Section 7.1.
(a) Make (individual) 99% two-sided confidence intervals for the four different mean tilttable ratios for the four vans, µ1, µ2, µ3, and µ4. What does the Bonferroni inequality guarantee for a minimum joint or simultaneous confidence for these four intervals?
(b) Individual confidence intervals for the differences between particular pairs of mean tilttable ratios are of the form ȳi − ȳi′ ± Δ for appropriate values of Δ. Find values of Δ if individual 99% two-sided intervals are desired, first for pairs of means with samples of size 4 and then for pairs of means where one sample size is 4 and the other is 5.
(c) It might be of interest to compare the average of the tilttable ratios for the minivans to that of the full-size vans. Give a 99% two-sided confidence interval for the quantity ½(µ1 + µ2) − ½(µ3 + µ4).
3. Explain the difference between several intervals having associated 95% individual confidences and having associated 95% simultaneous confidence.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
7.3 Two Simultaneous Confidence Interval Methods

Section 7.2 shows that individual confidence intervals for r underlying means µi can be made using limits of the form

ȳi ± t (sP/√ni)    (7.27)

But when formula (7.27) is applied r times to estimate r different means, the only handle one has on the corresponding simultaneous confidence is given by the Bonferroni inequality (7.26). This fairly crude tool says that if r = 8 and one wants 95% simultaneous confidence, individual "unconfidences" of .05/8 = .00625 (i.e., individual confidences of 99.375%) for the eight different intervals will suffice to produce the desired simultaneous confidence.
Another approach to the setting of simultaneous confidence limits on all of µ1, µ2, ..., µr is to replace t in formula (7.27) with a multiplier derived specifically for the purpose of providing an exact, stated, simultaneous confidence in the estimation of all the means. Such multipliers were derived by Pillai and Ramachandran, where either all of the intervals for the r means are two-sided or all are one-sided. That is, Table B.8A gives values of constants k2* such that the r two-sided intervals with respective endpoints

P-R two-sided simultaneous 95% confidence limits for r means:

ȳi ± k2* (sP/√ni)    (7.28)

have simultaneous 95% confidence. And Table B.8B gives values of constants k1* such that the r one-sided intervals of the form

P-R one-sided simultaneous 95% confidence intervals for r means:

(−∞, ȳi + k1* (sP/√ni))    (7.29)

or of the form

(ȳi − k1* (sP/√ni), ∞)    (7.30)

have simultaneous 95% confidence.
Example 6 (Example 1 revisited)  Return to the concrete strength study. With n − r = 16, individual 95% two-sided confidence intervals for the eight formula mean strengths are (via formula (7.27) and the .975 quantile of the t16 distribution) of the form

ȳi ± 2.120 (581.6/√3)

that is,

ȳi ± 711.9 psi    (7.31)

On the other hand, for r = 8 and ν = 16, Table B.8A gives k2* = 3.099, so P-R simultaneous 95% two-sided intervals for the eight means are of the form

ȳi ± 3.099 (581.6/√3)

that is,

ȳi ± 1,040.6 psi    (7.32)
Expressions (7.31) and (7.32) provide two-sided intervals for the eight mean compressive strengths. If one-sided intervals of the form (#, ∞) were desired instead, consulting the t table for the .95 quantile of the t16 distribution and use of formula (7.27) shows that the values

ȳi − 1.746 (581.6/√3)

that is,

ȳi − 586.3 psi    (7.33)

are individual 95% lower confidence bounds for the formula mean compressive strengths, µi. At the same time, consulting Table B.8B shows that for simultaneous 95% confidence, use of k1* = 2.779 in formula (7.30) is appropriate, and the values

ȳi − 2.779 (581.6/√3)

that is,

ȳi − 933.2 psi    (7.34)
are simultaneous 95% lower confidence bounds for the formula mean compressive
strengths, µi .
Comparing intervals (7.31) with intervals (7.32) and bounds (7.33) with bounds
(7.34) shows clearly the impact of requiring simultaneous rather than individual
confidence. For a given nominal confidence level, the simultaneous intervals must
be bigger (more conservative) than the corresponding individual intervals.
It is common practice to summarize the information about mean responses
gained in a multisample study in a plot of sample means versus sample numbers,
enhanced with “error bars” around the sample means to indicate the uncertainty
associated with locating the means. There are various conventions for the making
of these bars. When looking at such a plot, one typically forms an overall visual
impression. Therefore, it is our opinion that error bars derived from the P-R simul-
taneous confidence limits of display (7.28) are the most sensible representation of
what is known about a group of r means. For example, Figure 7.11 is a graphical
representation of the eight formula sample mean strengths given in Table 7.7 with
±1,040.6 psi error bars, as indicated by expression (7.32).
When looking at a display like Figure 7.11, it is important to remember that
what is represented is the precision of knowledge about the mean strengths, rather
than any kind of predictions for individual compressive strengths. In this regard,
the similarity of the spread of the samples on the side-by-side dot diagram given
as Figure 7.1 and the size of the error bars here is coincidental. As sample sizes
increase, spreads on displays of individual measurements like Figure 7.1 will tend to
stabilize (representing the spreads of the underlying distributions), while the lengths
of error bars associated with means will shrink to 0 as increased information gives
sharper and sharper evidence about the underlying means.
In any case, Figure 7.11 shows clearly that the information in the data is quite
adequate to establish the existence of differences in formula mean compressive
strengths.
Figure 7.11 The eight concrete formula sample mean strengths of Table 7.7, with ±1,040.6 psi error bars (mean compressive strength (psi) versus concrete formula)
Consider next the problem of simultaneous confidence intervals for all differences among r means. Formula (7.15) of Section 7.2 gives two-sided intervals for a single difference µi − µi′ with endpoints

ȳi − ȳi′ ± t sP √(1/ni + 1/ni′)    (7.35)
where the associated confidence level is an individual one. But if, for example,
r = 8, there are 28 different two-at-a-time comparisons of underlying means to be
considered (µ1 versus µ2 , µ1 versus µ3 , . . . , µ1 versus µ8 , µ2 versus µ3 , . . . , and
µ7 versus µ8 ). If one wishes to guarantee a reasonable simultaneous confidence
level for all these comparisons via the crude Bonferroni idea, a huge individual
confidence level is required for the intervals (7.35). For example, the Bonferroni in-
equality requires 99.82% individual confidence for 28 intervals in order to guarantee
simultaneous 95% confidence.
A better approach to the setting of simultaneous confidence limits on all of
the differences µi − µi 0 is to replace t in formula (7.35) with a multiplier derived
specifically for the purpose of providing exact, stated, simultaneous confidence in
the estimation of all such differences. J. Tukey first pointed out that it is possible
to provide such multipliers using quantiles of the Studentized range distributions.
Tables B.9A and B.9B give values of constants q* such that the set of two-sided intervals with endpoints

Tukey's two-sided simultaneous confidence limits for all differences in r means:

ȳi − ȳi′ ± (q*/√2) sP √(1/ni + 1/ni′)    (7.36)
has simultaneous confidence at least 95% or 99% (depending on whether Q(.95)
is read from Table B.9A or Q(.99) is read from Table B.9B) in the estimation of
all differences µi − µi 0 . If all the sample sizes n 1 , n 2 , . . . , n r are equal, the 95% or
99% nominal simultaneous confidence figure is exact, while if the sample sizes are
not all equal, the true value is at least as big as the nominal value.
In order to apply Tukey's method, one must find (using interpolation as needed) the column in Tables B.9 corresponding to the number of samples/means to be compared and the row corresponding to the degrees of freedom associated with sP (namely, ν = n − r).
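Where software is preferred to Tables B.9, the constants q* are quantiles of the Studentized range distribution; a sketch (ours, assuming scipy version 1.7 or later, which provides that distribution) of the plus-or-minus part of display (7.36):

    import numpy as np
    from scipy import stats

    def tukey_half_width(r, df, s_p, n_i, n_ip, conf=0.95):
        """Plus-or-minus part of Tukey's simultaneous limits (7.36)."""
        q_star = stats.studentized_range.ppf(conf, r, df)   # as in Tables B.9
        return (q_star / np.sqrt(2)) * s_p * np.sqrt(1 / n_i + 1 / n_ip)

    # Concrete study: r = 8 means, nu = 16 degrees of freedom, sP = 581.6.
    print(tukey_half_width(8, 16, 581.6, 3, 3))   # about 1,645 psi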
Example 6 (continued)  Consider the making of confidence intervals for differences in formula mean compressive strengths. If a 95% two-sided individual confidence interval is desired for a specific difference µi − µi′, formula (7.35) shows that appropriate endpoints are

ȳi − ȳi′ ± 2.120(581.6)√(1/3 + 1/3)

that is,

ȳi − ȳi′ ± 1,006.8 psi    (7.37)
On the other hand, if one plans to estimate all differences in mean compressive strengths with simultaneous 95% confidence, by formula (7.36) Tukey two-sided intervals with endpoints

ȳi − ȳi′ ± (4.90/√2)(581.6)√(1/3 + 1/3)

that is,

ȳi − ȳi′ ± 1,645.4 psi    (7.38)

are in order. (4.90 is the value in the r = 8 column and ν = 16 row of Table B.9A.)
In keeping with the fact that the confidence level associated with the intervals (7.38) is a simultaneous one, the Tukey intervals are wider than those indicated in formula (7.37).
Notice, however, that the plus-or-minus part of display (7.38) is not as big as twice the plus-or-minus part of expression (7.32). Thus, when looking at Figure 7.11, it is not necessary that the error bars around two means fail to overlap before it is safe to judge the corresponding underlying means to be detectably different. Rather, it is only necessary that the two sample means differ by more than the plus-or-minus part of formula (7.36)—1,645.4 psi in the present situation.
This section has mentioned only two of many existing methods of simultane-
ous confidence interval estimation for multisample studies. These should serve to
indicate the general character of such methods and illustrate the implications of a
simultaneous (as opposed to individual) confidence guarantee.
One final word of caution has to do with the theoretical justification of all of
the methods found in this section. It is the “equal variances, normal distributions”
model that supports these engineering tools. If any real faith is to be put in the
nominal confidence levels attached to the P-R and Tukey methods presented here,
that faith should be based on evidence (typically gathered, at least to some extent,
as illustrated in Section 7.1) that the standard one-way normal model is a sensible
description of a physical situation.
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation of Exercises 1 of Sections 7.1 and 7.2 (and the pressure/density data of Example 1 in Chapter 4).
(a) Using the P-R method, what Δ can be employed to make two-sided intervals of the form ȳi ± Δ for all five mean densities, possessing simultaneous 95% confidence? How does this Δ compare to the one computed in part (a) of Exercise 1 of Section 7.2?
(b) Using the Tukey method, what Δ can be employed to make two-sided intervals of the form ȳi − ȳi′ ± Δ for all differences in the five mean densities, possessing simultaneous 95% confidence? How does this Δ compare to the one computed in part (b) of Exercise 1 of Section 7.2?
2. Return to the tilttable study of Exercises 2 of Sections 7.1 and 7.2.
(a) Use the P-R method of simultaneous confidence intervals and make simultaneous 95% two-sided confidence intervals for the four mean tilttable ratios.
(b) Simultaneous confidence intervals for the differences in all pairs of mean tilttable ratios are of the form ȳi − ȳi′ ± Δ. Find appropriate values of Δ if simultaneous 99% two-sided intervals are desired, first for pairs of means with samples of size 4 and then for pairs of means where one sample size is 4 and the other is 5. How do these compare to the intervals you found in part (b) of Exercise 2 of Section 7.2? Why is it reasonable that the Δ's should be related in this way?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
7.4 One-Way Analysis of Variance (ANOVA)

In multisample studies, hypotheses like

H0: µ3 = 7    (7.39)

H0: µ3 − µ7 = 0    (7.40)

and

H0: µ1 − ½(µ2 + µ3) = 0    (7.41)

can be of engineering interest. The confidence interval methods discussed in Section 7.2 have their significance-testing analogs for treating hypotheses that, like all three of these, involve linear combinations of the means µ1, µ2, ..., µr.
In general (under the standard one-way model), if

L = c1µ1 + c2µ2 + · · · + crµr

the hypothesis

H0: L = #    (7.42)

can be tested using the statistic

T = (L̂ − #) / ( sP √(c1²/n1 + c2²/n2 + · · · + cr²/nr) )    (7.43)

and a tn−r reference distribution. This fact specializes to cover hypotheses of types (7.39) through (7.41) by appropriate choice of the ci and #.
But the significance-testing method most often associated with the one-way
normal model is not for hypotheses of the type (7.42). Instead, the most common
method concerns the hypothesis that all r underlying means have the same value. In
symbols, this is
H0 : µ1 = µ2 = · · · = µr (7.44)
Given that one is working under the assumptions of the one-way model to begin
with, hypothesis (7.44) amounts to a statement that all r underlying distributions are
essentially the same—or “There are no differences between treatments.”
Hypothesis (7.44) can be thought of in terms of the simultaneous equality of r(r − 1)/2 pairs of means—that is, as equivalent to the statement that simultaneously

µ1 − µ2 = 0, µ1 − µ3 = 0, ..., µ1 − µr = 0, µ2 − µ3 = 0, ..., and µr−1 − µr = 0
And this fact should remind the reader of the ideas about simultaneous confidence
intervals from the previous section (specifically, Tukey’s method). In fact, one way of
judging the statistical significance of an r -sample data set in reference to hypothesis
(7.44) is to apply Tukey’s method of simultaneous interval estimation and note
whether or not all the intervals for differences in means include 0. If they all do,
the associated p-value is larger than 1 minus the simultaneous confidence level. If
not all of the intervals include 0, the associated p-value is smaller than 1 minus
the simultaneous confidence level. (If simultaneous 95% intervals all include 0,
no differences between means are definitively established, and the corresponding
p-value exceeds .05.)
We admit a bias toward estimation over testing per se. A consequence of this
bias is a fondness for deriving a rough idea of a p-value for hypothesis (7.44) as a
byproduct of Tukey’s method. But a most famous significance-testing method for
hypothesis (7.44) also deserves discussion: the one-way analysis of variance test.
(At this point it may seem strange that a test about means has a name apparently
emphasizing variance. The motivation for this jargon is that the test is associated
with a very helpful way of thinking about partitioning the overall variability that is
encountered in a response variable.)
The one-way ANOVA test concerns the hypotheses

H0: µ1 = µ2 = · · · = µr
Ha: not H0    (7.45)
Definition 3 (A Notational Convention for Multisample Studies)  In multisample studies, symbols for sample sizes and sample statistics appearing without subscript indices or dots will be understood to be calculated from all responses in hand, obtained by combining all samples.
So n will stand for the total number of data points (even in an r -sample study),
ȳ for the grand sample average of response y, and s 2 for a grand sample variance
calculated completely ignoring sample boundaries.
For present purposes (of writing down a test statistic for testing hypothesis
(7.44)), one needs to make use of ȳ, the grand sample average. It is important to
recognize that ȳ and
The (unweighted) average of r sample means:

ȳ. = (1/r) Σ_{i=1}^{r} ȳi    (7.46)
are not necessarily the same unless all sample sizes are equal. That is, when sample
sizes vary, ȳ is the (unweighted) arithmetic average of the raw data values yi j but is a
weighted average of the sample means ȳ i . On the other hand, ȳ . is the (unweighted)
arithmetic mean of the sample means ȳ i but is a weighted average of the raw data
values yi j . For example, in the simple case that r = 2, n 1 = 2, and n 2 = 3,
ȳ = (1/5)(y11 + y12 + y21 + y22 + y23) = (2/5)ȳ1 + (3/5)ȳ2

while

ȳ. = ½(ȳ1 + ȳ2) = ¼y11 + ¼y12 + (1/6)y21 + (1/6)y22 + (1/6)y23
and, in general, ȳ and ȳ . will not be the same.
If H0: µ1 = µ2 = · · · = µr were true, one would expect the sample means ȳi to cluster tightly about the grand sample average ȳ. A measure of how much they actually vary is

Σ_{i=1}^{r} ni(ȳi − ȳ)²    (7.47)

One can think of statistic (7.47) either as a weighted sum of the quantities (ȳi − ȳ)² or as an unweighted sum, where there is a term in the sum for each raw data point and therefore ni of the type (ȳi − ȳ)². The quantity (7.47) is a measure of the between-sample variation in the data. For a given set of sample sizes, the larger it is, the more variation there is between the sample means ȳi.
In order to produce a test statistic for hypothesis (7.44), one simply divides the measure (7.47) by (r − 1)sP², giving

One-way ANOVA test statistic for equality of r means:

F = [ (1/(r − 1)) Σ_{i=1}^{r} ni(ȳi − ȳ)² ] / sP²    (7.48)

Large observed values of F count as evidence against H0, and when H0 is true, the statistic (7.48) has an Fr−1, n−r reference distribution.
Example 7 (Example 1 revisited)  Returning again to the concrete compressive strength study of Armstrong, Babb, and Campen, ȳ = 3,693.6 and the 8 sample means ȳi have differences from this value given in Table 7.11. Then, since each ni = 3, in this situation

Σ_{i=1}^{r} ni(ȳi − ȳ)² = 3(1,941.7)² + 3(2,059.7)² + · · · + 3(−2,142.3)² + 3(−1,302.9)²
                        = 47,360,780 (psi)²
Table 7.11
Concrete Formula Sample Means and Their Deviations from ȳ = 3,693.6

Formula, i   ȳi        ȳi − ȳ
1            5,635.3   1,941.7
2            5,753.3   2,059.7
3            4,527.3   833.7
4            3,442.3   −251.3
5            2,923.7   −769.9
6            3,324.7   −368.9
7            1,551.3   −2,142.3
8            2,390.7   −1,302.9
In order to use this figure to judge statistical significance, one standardizes via equation (7.48) to arrive at the observed value of the test statistic

f = (1/(8 − 1))(47,360,780) / (581.6)² = 20.0

It is easy to verify from Tables B.6 that 20.0 is larger than the .999 quantile of the F7,16 distribution. So

p-value < .001

That is, the data provide overwhelming evidence that µ1, µ2, ..., µ8 are not all equal.
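Since the statistic (7.48) depends on the data only through the sample sizes, sample means, and sP², the whole test can be scripted from summary statistics. A sketch (ours; scipy assumed):

    import numpy as np
    from scipy import stats

    # Summary statistics for the concrete study (Tables 7.3 and 7.11).
    n_i = np.full(8, 3)
    ybar = np.array([5635.3, 5753.3, 4527.3, 3442.3,
                     2923.7, 3324.7, 1551.3, 2390.7])
    sp2 = 581.6**2                                   # pooled variance, 16 df

    ybar_grand = np.sum(n_i * ybar) / np.sum(n_i)    # about 3,693.6
    between = np.sum(n_i * (ybar - ybar_grand)**2)   # about 47,360,780
    f = (between / (len(ybar) - 1)) / sp2            # about 20.0

    # p-value: right-tail F probability with r - 1 and n - r df.
    print(f, stats.f.sf(f, len(ybar) - 1, np.sum(n_i) - len(ybar)))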
For pedagogical reasons, the one-way ANOVA test has been presented after
discussing interval-oriented methods of inference for r -sample studies. But if it is to
be used in applications, the testing method typically belongs chronologically before
estimation. That is, the ANOVA test can serve as a screening device to determine
whether the data in hand are adequate to differentiate conclusively between the
means, or whether more data are needed.
Proposition 1 (The One-Way ANOVA Identity)  For any r-sample data set consisting of n observations yij,

One-way ANOVA identity:

(n − 1)s² = Σ_{i=1}^{r} ni(ȳi − ȳ)² + (n − r)sP²    (7.49)

or in other symbols,

Σ_{i,j} (yij − ȳ)² = Σ_{i=1}^{r} ni(ȳi − ȳ)² + Σ_{i=1}^{r} Σ_{j=1}^{ni} (yij − ȳi)²    (7.50)
Proposition 1 should begin to shed some light on the phrase "analysis of variance." It says that an overall measure of variability in the response y, namely

(n − 1)s² = Σ_{i,j} (yij − ȳ)²

can be partitioned into two parts. The first,

Σ_{i=1}^{r} ni(ȳi − ȳ)²

measures variation between the samples, and the second,

(n − r)sP² = Σ_{i=1}^{r} Σ_{j=1}^{ni} (yij − ȳi)²

measures variation within the samples (and in fact consists of the sum of the squared residuals).
residuals). The F statistic (7.48), developed for testing H0 : µ1 = µ2 = · · · = µr , has
a numerator related to the first of these and a denominator related to the second. So
using the ANOVA F statistic amounts to a kind of analyzing of the raw variability
in y.
In recognition of their prominence in the calculation of the one-way ANOVA
F statistic and their usefulness as descriptive statistics in their own right, the three
sums (of squares) appearing in formulas (7.49) and (7.50) are usually given special
names and shorthand. These are stated here in definition form.
Definition 4 In a multisample study, (n − 1)s 2 , the sum of squared differences between the
raw data values and the grand sample mean, will be called the total sum of
squares and denoted as SSTot.
Definition 5  In an unstructured multisample study, Σ ni(ȳi − ȳ)² will be called the treatment sum of squares and denoted as SSTr.

Definition 6  In a multisample study, the sum of squared residuals, Σ(y − ŷ)² (which is (n − r)sP² in the unstructured situation), will be called the error sum of squares and denoted as SSE.
In the new notation, the one-way ANOVA identity of Proposition 1 is

SSTot = SSTr + SSE    (7.51)

and it is standard practice to organize these quantities into a one-way ANOVA table of the general form shown in Table 7.12.

Table 7.12
General Form of the One-Way ANOVA Table

Source       SS      df      MS             F
Treatments   SSTr    r − 1   SSTr/(r − 1)   MSTr/MSE
Error        SSE     n − r   SSE/(n − r)
Total        SSTot   n − 1

The name Error is sometimes replaced by Within (Samples) or Residual. The first two entries in the SS column must sum to the third, as indicated in equation (7.51).
Similarly, the Treatments and Error degrees of freedom add to the Total degrees of
freedom, (n − 1). Notice that the entries in the df column are those attached to the
numerator and denominator, respectively, of the test statistic in equation (7.48). The
ratios of sums of squares to degrees of freedom are called mean squares, here the
mean square for treatments (MSTr) and the mean square for error (MSE). Verify that
in the present context, MSE = sP2 and MSTr is the numerator of the F statistic given
in equation (7.48). So the single ratio appearing in the F column is the observed
value of F for testing H0 : µ1 = µ2 = · · · = µr .
Example 7 (continued)  Consider once more the concrete strength study. It is possible to return to the raw data given in Table 7.1 and find that ȳ = 3,693.6, so

SSTot = (n − 1)s²
     = (5,800 − 3,693.6)² + (4,598 − 3,693.6)² + (6,508 − 3,693.6)² + · · · + (2,631 − 3,693.6)² + (2,490 − 3,693.6)²
     = 52,772,190 (psi)²
Further, as computed earlier in this example,

SSTr = Σ_{i=1}^{r} ni(ȳi − ȳ)² = 47,360,780 (psi)²

so that (via the one-way ANOVA identity)

SSE = SSTot − SSTr = 52,772,190 − 47,360,780 = 5,411,410 (psi)²
Then, plugging these and appropriate degrees of freedom values into the general form of the one-way ANOVA table produces the table for the concrete compressive strength study, presented here as Table 7.13.

Table 7.13
One-Way ANOVA Table for the Concrete Strength Study

Source       SS           df   MS          F
Treatments   47,360,780    7   6,765,826   20.0
Error         5,411,410   16     338,213
Total        52,772,190   23
Example 7 (continued)  Notice that, as promised by the one-way ANOVA identity, the sum of the treatment and error sums of squares is the total sum of squares. Also, Table 7.13 serves as a helpful summary of the testing process, showing at a glance the observed value of F, the appropriate degrees of freedom, and sP² = MSE.
The computations here are by no means impossible to do "by hand." But the most sensible way to handle them is to employ a statistical package. Printout 1 shows the results of using MINITAB to create an ANOVA table. (The routine under MINITAB's "Stat/ANOVA/One-way" menu was used.)
You may recall having used a breakdown of a “raw variation in the data” earlier
in this text (namely, in Chapter 4). In fact, there is a direct connection between the
present discussion and the discussion and use of R 2 in Sections 4.1, 4.2, and 4.3.
(See Definition 3 in Chapter 4 and its use throughout those three sections.) In the
present notation, the coefficient of determination defined as a descriptive measure
in Section 4.1 is
The coefficient of determination in general sums of squares notation:

R² = (SSTot − SSE) / SSTot    (7.52)
(Fitted values for the present situation are the sample means and SSE is the sum
of squared residuals here, just as it was earlier.) Expression (7.52) is a perfectly
general recasting of the definition of R 2 into “SS” notation. In the present one-way
context, the one-way identity (7.51) makes it possible to rewrite the numerator of display (7.52), SSTot − SSE, as SSTr. That is, for a one-way analysis

The coefficient of determination in a one-way analysis:

R² = SSTr / SSTot    (7.53)
That is, the first entry in the SS column of the ANOVA table divided by the total entry
of that column can be taken as “the fraction of the raw variability in y accounted for
in the process of fitting the equation yi j ≈ µi to the data.”
Example 7 (continued)  In the concrete compressive strength study, a look at Table 7.13 and equation (7.53) shows that

R² = SSTr/SSTot = 47,360,780/52,772,190 = .897
That is, another way to describe these data is to say that differences between
concrete formulas account for nearly 90% of the raw variability observed in
compressive strength.
The final topic of this section is a model appropriate when the r samples in hand are not of intrinsic interest in themselves but instead represent some larger population of conditions of interest. It is a variation on the one-way model of this chapter called the one-way random effects model. It is built on the usual one-way assumption that

yij = µi + εij    (7.54)

where the εij are iid normal (0, σ²) random variables. But it doesn't treat the means µi as parameters/unknown constants. Instead, under the random effects model assumptions, the means µ1, µ2, ..., µr are treated as (unobservable) random variables independent of the εij's and themselves iid according to some normal distribution with an unknown mean µ and unknown variance στ². The random variables µi are now called random (treatment) effects, and the variances σ² and στ² are called variance components. The objects of formal inference become µ (the mean of the random effects) and the two variance components σ² and στ².
Table 7.14
Measured Magnesium Contents for Five Alloy Specimens
will also be ignored for present purposes.) The units of measurement in Table
7.14 are .001% magnesium.
In this example, on the order of 8,300 test specimens could be cut from the
100 m rod. The purpose of creating the rod was to provide secondary standards for
field calibration of chemical analysis instruments. That is, laboratories purchasing
pieces of this rod could use them as being of “known” magnesium content to
calibrate their instruments. As such, the practical issues at stake here are not
primarily how the r = 5 particular test specimens analyzed compare. Rather, the
issues are what the overall magnesium content is and whether or not the rod is
consistent enough in content along its length to be of any use as a calibration tool.
A random effects model and inference for the mean effect µ and the variance
components are quite natural in this situation. Here, στ2 represents the variation
in magnesium content among the potentially 8,300 different test specimens, and
σ 2 represents measurement error plus variation in magnesium content within the
1.2 cm thick specimens, test location to test location.
When all of the r sample sizes n i are the same (say, equal to m), it turns out to
be quite easy to do some diagnostic checking of the aptness of the normal random
effects model (7.54) and make subsequent inferences about µ, σ 2 , and στ2 . So this
discussion will be limited to cases of equal sample sizes.
As far as investigation of the reasonableness of the model restrictions on the
distribution of the µi and inference for µ are concerned, a key observation is that
ȳi = (1/m) Σ_{j=1}^{m} (µi + εij) = µi + ε̄i

(where, of course, ε̄i is the sample mean of εi1, ..., εim). Under the random effects model (7.54), these ȳi = µi + ε̄i are iid normal variables with mean µ and variance στ² + σ²/m. So normal-plotting the ȳi is a sensible method of at least indirectly
investigating the appropriateness of the normal distribution assumption for the µi .
In addition, the fact that the model says the ȳ i are independent normal variables with
mean µ and a common variance suggests that the small-sample inference methods
from Section 6.3 should simply be applied to the sample means ȳ i in order to do infer-
ence for µ. In doing so, the “sample size” involved is the number of ȳ i ’s—namely, r .
Example 8 (continued)  For the magnesium alloy rod, the r = 5 sample means are in Table 7.14. Figure 7.12 gives a normal plot of those five values, showing no obvious problems with a normal random effects model for specimen magnesium contents.
To find a 95% two-sided confidence interval for µ, we calculate as follows (treating the five values ȳi as "observations"). The sample mean (of ȳi's) is

ȳ. = (1/5) Σ_{i=1}^{5} ȳi = 68.86
Figure 7.12 Normal plot of the five specimen sample mean magnesium contents
and the sample variance (of the ȳi's) is

(1/(5 − 1)) Σ_{i=1}^{5} (ȳi − ȳ.)² = .76
Applying the small-sample confidence interval formula for a single mean from Section 6.3 (since r − 1 = 4 degrees of freedom are appropriate), a two-sided 95% confidence interval for µ has endpoints

68.86 ± 2.776 (.87/√5)

that is,

67.78 × 10⁻³ % and 69.94 × 10⁻³ % magnesium
These limits provide a notion of precision appropriate for the number 68.86 ×
10−3 % as an estimate of the rod’s mean magnesium content.
It is useful to write out in symbols what was just done to get a confidence interval for µ. That is, a sample variance of ȳi's was used, and

(1/(r − 1)) Σ_{i=1}^{r} (ȳi − ȳ.)² = (1/(m(r − 1))) Σ_{i=1}^{r} m(ȳi − ȳ)² = SSTr/(m(r − 1)) = MSTr/m
because all n i are m and ȳ . = ȳ in this case. But this means that under the assumptions
of the one-way normal random effects model, a two-sided confidence interval for µ has endpoints

Balanced data confidence limits for the overall mean in the one-way random effects model:

ȳ. ± t √(MSTr/(mr))    (7.55)

where t is such that the probability the tr−1 distribution assigns to the interval between −t and t is the desired confidence. One-sided intervals are obtained in the usual way, by employing only one of the endpoints in display (7.55).
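In the balanced case, display (7.55) amounts to treating the r sample means as a single small sample. A sketch (ours; scipy assumed) reproducing the limits of Example 8 below:

    import numpy as np
    from scipy import stats

    # Example 8 summaries: r = 5 specimen means with mean 68.86 and
    # standard deviation .87, so that MSTr/(m r) = .87**2 / 5.
    r, ybar_dot, sd_means = 5, 68.86, 0.87
    t = stats.t.ppf(0.975, r - 1)                    # 2.776
    half = t * sd_means / np.sqrt(r)
    print(ybar_dot - half, ybar_dot + half)          # about 67.78 and 69.94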
Consider next the problem of testing the null hypothesis

H0: στ² = 0    (7.56)

the hypothesis that the random effects µi in fact do not vary. It turns out that the statistic

F = MSTr/MSE    (7.57)

has an Fr−1, n−r distribution under the assumptions of the random effects model (7.54) when the null hypothesis (7.56) holds. Thus, the same one-way ANOVA F test used to test H0: µ1 = µ2 = · · · = µr when the means µi are considered fixed parameters can also be used to test H0: στ² = 0 under the assumptions of the random effects model.
As far as estimation goes, it doesn’t turn out to be possible to give a simple
confidence interval formula for στ2 directly. But what can be done in a straightforward
fashion is to give both a natural ANOVA-based single-number estimate of στ2 and
a confidence interval for the ratio στ2 /σ 2 . To accomplish the first of these, consider
the mean values of random variables MSTr and MSE (= sP2 ) under the assumptions
of the random effects model. Not too surprisingly,

E(MSE) = σ²    (7.58)

(After all, sP² has been used to approximate σ². That the "center" of the probability distribution of sP² is σ² should therefore seem only reassuring.) And further,

E(MSTr) = σ² + mστ²    (7.59)

so that

(1/m)(E(MSTr) − E(MSE)) = στ²

or

E( (1/m)(MSTr − MSE) ) = στ²    (7.60)
So equation (7.60) suggests that the random variable

(1/m)(MSTr − MSE)    (7.61)

is one whose distribution is centered about the variance component στ² and thus is a natural ANOVA-based estimator of στ². The variable in display (7.61) is potentially negative. When that occurs, common practice is to estimate στ² by 0. So the variable actually used to estimate στ² is

An ANOVA-based estimator of the treatment variance:

σ̂τ² = max( 0, (1/m)(MSTr − MSE) )    (7.62)
Facts (7.58) and (7.60), which motivate this method of estimating στ2 , are important
enough that they are often included as entries in an Expected Mean Square column
added to the one-way ANOVA table when testing H0 : στ2 = 0.
Finally, consider interval estimation for the ratio στ²/σ². Under the random effects model assumptions, the variable

F = [ MSTr/(σ² + mστ²) ] / [ MSE/σ² ]
has an Fr −1, n−r distribution. Some algebraic manipulations beginning from this fact
show that the interval with endpoints
Confidence limits for στ²/σ² in the one-way random effects model:

(1/m)( MSTr/(U · MSE) − 1 )  and  (1/m)( MSTr/(L · MSE) − 1 )    (7.63)
can be used as a two-sided confidence interval for στ2 /σ 2 , where the associated
confidence is the probability the Fr −1, n−r distribution assigns to the interval (L , U ).
One-sided intervals for στ2 /σ 2 can be had by using only one of the endpoints and
choosing L or U such that the probability assigned by the Fr −1, n−r distribution to
(L , ∞) or (0, U ) is the desired confidence.
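The estimator (7.62) and the limits (7.63) are likewise easy to script from an ANOVA table; a sketch (ours; scipy assumed), using the magnesium study values worked out in Example 8 below:

    from scipy import stats

    # Magnesium study summaries (Table 7.15): r = 5 specimens, m = 10
    # measurements each, MSTr = 7.58, MSE = 6.88, n - r = 45.
    r, m, mstr, mse = 5, 10, 7.58, 6.88

    sigma_tau2_hat = max(0.0, (mstr - mse) / m)            # estimator (7.62): .07

    # One-sided 90% upper bound for sigma_tau/sigma from display (7.63):
    L = stats.f.ppf(0.10, r - 1, m * r - r)                # about 1/3.80
    print(sigma_tau2_hat, ((mstr / (L * mse) - 1) / m) ** 0.5)   # .07 and about .56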
Example 8 (continued)  Consider again the measured magnesium contents for specimens cut from the 100 m alloy rod. Some normal plotting shows the "single variance, normal εij" part of the model assumptions (7.54) to be at least not obviously flawed. Sample-by-sample normal plots show fair linearity (at least after allowing for the discreteness
introduced in the data by the measurement scale used), except perhaps for sample
4, with its five identical values. The five sample standard deviations are roughly
of the same order of magnitude, and the normal plot of residuals in Figure 7.13
is pleasantly linear. So it is sensible to consider formal inference for σ 2 and στ2
based on the normal theory model.
Table 7.15 is an ANOVA table for the data of Table 7.14. From Table 7.15,
the p-value for testing H0 : στ2 = 0 is the F4,45 probability to the right of 1.10.
According to Tables B.6, this is larger than .25, giving very weak evidence of
detectable variation between specimen mean magnesium contents.
The EMS column in Table 7.15 is based on relationships (7.58) and (7.59) and is a reminder first that MSE = sP² = 6.88 serves as an estimate of σ². So multiple magnesium determinations on a given specimen would be estimated to have a standard deviation on the order of √6.88 = 2.6 × 10⁻³ %. Then the expected mean squares further suggest that στ² be estimated by

σ̂τ² = (1/10)(MSTr − MSE) = (1/10)(7.58 − 6.88) = .07
Figure 7.13 Normal plot of the residuals for the magnesium content study
Table 7.15
ANOVA Table for the Magnesium Content Study

Source      SS       df   MS     EMS           F
Specimens   30.32     4   7.58   σ² + 10στ²    1.10
Error       309.60   45   6.88   σ²
Total       339.92   49
That is, the standard deviation of specimen mean magnesium contents is estimated to be on the order of 1/10 of the standard deviation associated with multiple measurements on a single specimen.
A confidence interval for σ² could be made using formula (7.10) of Section 7.1. That will not be done here, but formula (7.63) will be used to make a one-sided 90% confidence interval of the form (0, #) for στ/σ. The .90 quantile of the F45,4 distribution is about 3.80, so the .10 quantile of the F4,45 distribution is about 1/3.80. Then, taking the square root of the second endpoint given in display (7.63), a 90% upper confidence bound for στ/σ is

√( (1/10)( 7.58 / ((1/3.80)(6.88)) − 1 ) ) = .56
The bottom line here is that στ is small compared to σ and is not even clearly
other than 0. Most of the variation in the data of Table 7.14 is associated with the
making of multiple measurements on a single specimen. Of course, this is good
news if the rod is to be cut up and distributed as pieces having known magnesium
contents and thus useful for measurement instrument calibration.
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation in Exercises 1 of Sections 7.1 through 7.3 (and the pressure/density data of Example 1 in Chapter 4).
(a) In part (b) of Exercise 1 of Section 7.3, you were asked to make simultaneous confidence intervals for all differences in the r = 5 mean densities. From your intervals, what kind of a p-value (small or large) do you expect to find when testing the equality of these means? Explain.
(b) Make an ANOVA table (in the form of Table 7.12) for the data of Example 1 in Chapter 4. You should do the calculations by hand first and then check your arithmetic using a statistical computer package. Then use the calculations to find both R² for the one-way model and also the observed level of significance for an F test of the null hypothesis that all five pressures produce the same mean density.
2. Return to the tilttable study of Exercises 2 of Sections 7.1 through 7.3.
(a) In part (b) of Exercise 2 of Section 7.3, you were asked to make simultaneous confidence intervals for all differences in the r = 4 mean tilttable ratios. From your intervals, what kind of a p-value (small or large) do you expect to find when testing the equality of these means? Explain.
(b) Make an ANOVA table (in the form of Table 7.12) for the data of Exercise 2 of Section 7.1. Then find both R² for the one-way model and also the observed level of significance for an F test of the null hypothesis that all four vans have the same mean tilttable ratio.
3. The following data are taken from the paper "Zero-Force Travel-Time Parameters for Ultrasonic Head-Waves in Railroad Rail" by Bray and Leon-Salamanca (Materials Evaluation, 1985). Given are measurements in nanoseconds of the travel time (in excess of 36.1 µs) of a certain type of mechanical wave induced by mechanical stress in railroad rails. Three measurements were made on each of six different rails.

Rail   Travel Time (nanoseconds above 36.1 µs)
1      55, 53, 54
2      26, 37, 32
3      78, 91, 85
4      92, 100, 96
5      49, 51, 50
6      80, 85, 83

(a) Make plots to check the appropriateness of a one-way random effects analysis of these data. What do these suggest?
(b) Ignoring any possible problems with the standard assumptions of the random effects model revealed in (a), make an ANOVA table for these data (like Table 7.15) and find estimates of σ and στ. What, in the context of this problem, do these two estimates measure?
(c) Find and interpret a two-sided 90% confidence interval for the ratio στ/σ.
4. The following are some general questions about random effects analyses:
(a) Explain in general terms when a random effects analysis is appropriate for use with multisample data.
(b) Consider a scenario where r = 5 different technicians employed by a company each make m = 2 measurements of the diameter of a particular widget using a particular gauge in a study of how technician differences show up in diameter data the company collects. Under what circumstances would a random effects analysis of the resulting data be appropriate?
(c) Suppose that the following ANOVA table was made in a random effects analysis of data like those described in part (b). Give estimates of the standard deviation associated with repeat diameter measurements for a given technician (σ) and then for the standard deviation of long-run mean measurements for various technicians (στ). The sums of squares are in units of square inches.

ANOVA Table

Source       SS         df   MS          F
Technician   .0000136    4   .0000034    1.42
Error        .0000120    5   .0000024
Total        .0000256    9
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Table 7.16
Lengths of 22 Samples of Five Sheets Cut on a Ream Cutter
[table values not reproduced here]
Data (like those in Table 7.16) collected for purposes of assessing process
stability will often be r samples of some fixed sample size m, lacking any structure
except for the fact that they were taken in a particular time order. So Shewhart control
charting is at home in this chapter that treats inference methods for unstructured
multisample studies.
Shewhart's fundamental qualitative insight regarding variation seen in process data over time is that

overall variation = baseline variation + variation that can be eliminated    (7.64)
Shewhart conceived of baseline variation as that which will remain even under the
most careful process monitoring and appropriate physical interventions—an inherent
property of a particular system configuration, which cannot be reduced without basic
changes in the physical process or how it is run. This is variation due to common
(universal) causes or system causes. Other terms used for it are random variation
and short-term variation. In the context of the cutting operation of Example 9, this
kind of variation might be seen in consecutive sheet lengths cut on a single ream
cutter, from a single roll of material, without any intervening operator adjustments,
following a particular plant standard method of machine operation, etc. It is variation
that comes from hundreds of small unnameable, unidentifiable physical causes.
When only this kind of variation is acting, it is reasonable to call a process “stable.”
The second component of overall process variation is variation that can poten-
tially be eliminated by appropriate physical intervention. This kind of variation has
been called variation due to special or assignable causes, nonrandom variation,
and long-term variation. In the sheet-cutting example, this might be variation in
sheet length brought about by undesirable changes in tension on the material being
cut, roller slippage on the cutter, unwarranted operator adjustments to the machine,
eccentricities associated with how a particular incoming roll of material was wound,
etc. Shewhart reasoned that being able to separate the two kinds of variation is a
prerequisite to ensuring good process performance. It provides a basis for knowing
when to intervene and find and eliminate the cause of any assignable variation,
thereby producing process stability.
Shewhart's method for separating the two components of overall variation in equation (7.64) is graphical and based on the following logic. First, periodically
taken samples are reduced to appropriate summary statistics, and the summary
statistics are plotted against time order of observation. To this simple time-plotting
of summary statistics, Shewhart added the notion that lines be drawn on the chart to
separate values that are consistent with a “baseline variation only” view of process
performance from those that are not. Shewhart called these lines of demarcation
control limits. When all plotted points fall within the control limits, the process is
judged to be stable, subject only to chance causes. But when a point falls outside
the limits, physical investigation and intervention is called for, to eliminate any
assignable cause of variation. Figure 7.14 is a plot of a generic control chart for a
summary statistic, w. It shows upper and lower control limits (UCL and LCL), some
plotted values, and one “out of control” point.
There are any number of charts that fit the general pattern of Figure 7.14.
For example, common possibilities relevant in the sheet-cutting case of Example 9
include control charts for the sample mean, sample range, and sample standard
[Figure 7.14: A generic Shewhart control chart for a statistic w, showing a center line, upper and lower control limits, points plotted against time, and one "out of control" point]
deviation of sheet lengths. These will presently be discussed in detail. But first,
some additional generalities still need to be considered.
For one thing, there remains the matter of how to set the position of the control limits. Shewhart argued that probability theory can be applied and appropriate stable-process/iid-observations distributions developed for the plotted statistics. Then small upper and lower percentage points for these can be used to establish control limits. As an example, the central limit material in Section 5.5 should have conditioned the reader to think of sample means as approximately normal with mean µ and standard deviation σ/√m, where µ and σ describe individual observations and m is the sample size. So for plotting sample means, the upper and lower control limits might be set at small upper and lower percentage points of the normal distribution with mean µ and standard deviation σ/√m, where µ and σ are a process mean and short-term standard deviation, respectively.
Two different circumstances are possible regarding the origin of values for process parameters used to produce control limits. In some applications, values of process parameters (and therefore, parameters for the "stable process" distribution of the plotted statistic) and thus control limits are provided from outside the data producing the charted values. Such circumstances will be called "standards given" situations. For emphasis, the meaning of this term is stated here in definition form.
Definition 7 When control limits are derived from data, requirements, or knowledge of the
behavior of a process that are outside the information contained in the samples
whose summary statistics are to be plotted, the charting is said to be done with
standards given.
In a "standards given" context, charting amounts to an ongoing graphical check on the hypothesis

H0: Process parameters are at their standard values    (7.65)

When a plotted point lies inside control limits, one is directed to a decision in favor of hypothesis (7.65) for the time period in question. A point plotting outside limits makes hypothesis (7.65) untenable at the time represented by the sample.
In contrast to "standards given" applications, there are situations in which no external values for process parameters are used. Instead, a single set of samples taken from the process is used both to develop a plausible set of parameters for the process and to judge the stability of the process over the period represented by the data. The terms retrospective or "as past data" will be used in this text for such control charting applications.
Definition 8 When control limits are derived from the same samples whose summary
statistics are plotted, the charting is said to be done retrospectively or “as
past data.”
In the context of Example 9, control limits derived from the data in Table 7.16
and applied to summary statistics for those same data would be “as past data” control
limits for assessing the cutting process stability over the period from 12:40 through
1:22 on the day the data were taken.
A way of thinking about a retrospective control chart is as a graphical means of testing the hypothesis

H0: A single set of process parameters was acting throughout the time period studied    (7.66)
When a point or points plot outside of control limits derived from the whole data
set, the hypothesis (7.66) of process stability over the period represented by the data
becomes untenable.
Shewhart control charts are conventionally named by the symbols used for the plotted statistics. So the following discussion concerns Shewhart x̄ charts. In using this terminology (and other notation from the statistical quality control field), this text must choose a path through notational conflicts that exist between the most common usages in control charting and those for other multisample analyses. The options that will be exercised here must be explained.
In the first place, to this point in Chapter 7 (also in Chapter 4, for that matter) the symbol y has been used for the basic response variable in a multisample statistical engineering study, ȳᵢ for a sample mean, and ȳ· and ȳ for unweighted and weighted averages of the ȳᵢ, respectively. In contrast, in Chapters 3 and 6, where the discussion centered primarily on one- and two-sample studies, x was used as the basic response variable and x̄ (or x̄ᵢ in the case of two-sample studies) to stand for a sample mean. Standard usage in Shewhart control charting is to use the x and x̄ (x̄ᵢ) convention, and the precedent is so strong that this section will adopt it as well. In addition, historical momentum in control charting dictates that rather than using x̄· notation, the average sample mean (in quality control notation)

x̿ = (1/r) Σᵢ₌₁ʳ x̄ᵢ    (7.67)

is used for the average of sample means. But this "bar bar" or "double bar" notation is used in this book only in this section.
Something must also be said about notation for sample sizes. It is universal to use the notation nᵢ for an individual sample size. But there is some conflict when all sample sizes nᵢ have a common value. The convention in this chapter has been to use m for such a common value and n for Σnᵢ. Standard quality control notation is to instead use n for a common sample size. In this matter, we will continue to use the conventions established thus far in Chapter 7, believing that to do otherwise invites too much confusion. But the reader is hereby alerted to the fact that the m used here is usually going to appear as n in other treatments of control charting.
Having dealt with the notational problems, we turn to the making of a "standards given" Shewhart x̄ chart based on samples of size m. An iid model for observations from a process with mean µ and standard deviation σ produces

E x̄ = µ    (7.68)

and

√(Var x̄) = σ/√m    (7.69)

and often an approximately normal distribution for x̄. The fact that essentially all of the probability of a normal distribution is within 3 standard deviations of its mean led Shewhart to suggest that given process standards µ and σ, "standards given" x̄ chart control limits could be set at

LCLx̄ = µ − 3σ/√m  and  UCLx̄ = µ + 3σ/√m    (7.70)
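The limits (7.70) are simple enough to compute by hand, but a small sketch may still be useful as a template. This code is an illustrative addition; the function name is ours:

```python
# Illustrative sketch (not from the text): "standards given" x-bar chart
# limits, per display (7.70).
import math

def xbar_limits(mu, sigma, m):
    """3-sigma control limits for means of samples of size m."""
    half_width = 3.0 * sigma / math.sqrt(m)
    return mu - half_width, mu + half_width

# Standards used in Example 9: mu = 10, sigma = 1.9 (1/64 in.), m = 5
print(xbar_limits(10.0, 1.9, 5))   # about (7.45, 12.55)
```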
Example 9 (continued)
Consider the use of process standards µ = 10 and σ = 1.9 in x̄ charting based on the data given in Table 7.16 (recall the values there are in units of 1/64 in. over a reference length). With these standard values for µ and σ, since the r = 22 samples each have size m = 5, the limits (7.70) are

UCLx̄ = 10 + 3(1.9/√5) = 12.55  and  LCLx̄ = 10 − 3(1.9/√5) = 7.45

along with a center line drawn at µ = 10. Table 7.17 gives some sample-by-sample summary statistics for the data of Table 7.16, including the sample means x̄ᵢ. Figure 7.15 is a "standards given" Shewhart x̄ chart for the same data.

[Figure 7.15: "Standards given" Shewhart x̄ chart for the sheet lengths; sample mean length x̄ (1/64 in. above reference) vs. sample number i, with center line at 10, UCL = 12.55, and LCL = 7.45]
Figure 7.15 shows two points plotting below the lower control limit: the
means for samples 5 and 11. But it is perfectly obvious from the plot what was
going on in the data of Table 7.16 to produce the “out of control” points and
corresponding debunking of hypothesis (7.65). Not one of the r = 22 plotted
Table 7.17
Sample-by-Sample Summary Statistics
for 22 Samples of Sheet Lengths
Sample (i)    x̄ᵢ     sᵢ      Rᵢ
1 8.8 1.30 3
2 8.4 1.67 4
3 9.2 2.49 6
4 8.6 1.14 3
5 7.4 2.61 6
6 8.8 1.10 3
7 8.6 1.95 5
8 8.4 1.14 3
9 9.0 2.55 7
10 8.4 1.67 4
11 7.4 2.19 6
12 8.6 1.67 4
13 8.2 1.79 4
14 8.0 1.41 4
15 9.4 3.51 8
16 8.2 2.49 6
17 7.6 .89 2
18 7.6 1.34 3
19 7.6 2.51 5
20 8.6 1.82 5
21 7.6 2.70 6
22 9.0 4.47 12
Σx̄ᵢ = 183.4    Σsᵢ = 44.41    ΣRᵢ = 109
sample means lies at or above 10. If an average sheet length of µ = 10 was truly desired, a simple adjustment was needed, to increase sheet lengths by roughly

10 − x̿ = 10 − 8.3 = 1.7 (1/64 in.)

The true process mean operating to produce the data was clearly below the standard mean.
For samples of size m from a normal distribution with standard deviation σ, the mean of the sample range R is proportional to σ. The constant of proportionality is typically called d₂, and in symbols,

E R = d₂σ    (7.71)

or equivalently,

σ = E R / d₂    (7.72)
Values of d2 for various m are given in Table B.2. (Return to the comments preceding
Proposition 1 in Section 3.3 and recognize that what was cryptic there should now
make sense.)
Statements (7.71) and (7.72) are theoretical. The way they find practical rel-
evance is to think that under the hypothesis that the process standard deviation is
constant, the average sample range

R̄ = (1/r) Σᵢ₌₁ʳ Rᵢ    (7.73)

can be expected to approximate the theoretical mean range, E R. That is, from statement (7.72), it seems that a range-based estimator of σ is

σ̂ = R̄/d₂    (7.74)
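A computational sketch of the estimator (7.74) follows. This is an illustrative addition; the few d₂ values hard-coded below are the standard tabled control chart constants (as in Table B.2), and the function name is ours:

```python
# Illustrative sketch (not from the text): range-based estimation of sigma,
# per displays (7.73) and (7.74). Only a few sample sizes are included here.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}

def sigma_hat_from_ranges(ranges, m):
    """Estimate sigma as R-bar / d2 for samples of common size m."""
    r_bar = sum(ranges) / len(ranges)
    return r_bar / D2[m]

# With the 22 ranges of Table 7.17 (R-bar = 109/22 = 4.95) and m = 5,
# this returns about 2.13, as computed later in Example 9.
```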
A probability density for the sample standard deviation s (of a sample from a normal distribution) has been used extensively (beginning in Section 6.4) in this text. That density can in turn be used to find a theoretical mean for s. As it turns out, although E s² = σ², the theoretical mean of s is not quite σ, but rather a multiple of σ (for a given sample size m). The constant of proportionality is typically called c₄, and in symbols,

E s = c₄σ    (7.75)

or equivalently,

σ = E s / c₄    (7.76)
Values of c4 for various m are given in Table B.2. From that table, it is easy to see
that as a function of m, c4 increases from about .8 when m = 2 to essentially 1 for
large m.
The practical use made of the theoretical statements (7.75) and (7.76) is to think that the average sample standard deviation

s̄ = (1/r) Σᵢ₌₁ʳ sᵢ    (7.77)

can be expected to approximate E s = c₄σ, so that a standard deviation-based estimator of σ is

σ̂ = s̄/c₄    (7.78)
(It is worth remarking that s̄ is not the same as sP , even when all sample sizes are
the same. sP is derived by averaging sample variances and then taking a square root.
s̄ comes from taking the square roots of the sample variances and then averaging.
In general, these two orders of operation do not produce the same results.)
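A quick numerical illustration of this remark (an added sketch, with hypothetical numbers) shows the two orders of operation disagreeing:

```python
# Added numerical illustration: averaging standard deviations (s-bar) is not
# the same as pooling variances (sP). Two hypothetical equal-sized samples:
import math

sds = [1.0, 3.0]                                          # two sample standard deviations
s_bar = sum(sds) / len(sds)                               # (1 + 3)/2 = 2.0
s_pooled = math.sqrt(sum(s * s for s in sds) / len(sds))  # sqrt((1 + 9)/2) = 2.24
print(s_bar, s_pooled)   # s_bar is smaller whenever the s's are unequal
```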
In any case, commonly used retrospective control limits for x̄ are obtained by
substituting x̄¯ given in formula (7.67) for µ and either of the estimates of σ given
in displays (7.74) or (7.78) for σ in the formulas (7.70). Further, an "as past data" center line for an x̄ chart is typically set at x̿.
Example 9 (continued)
Consider retrospective x̄ control charting for the ream cutter data. Using the column totals given in Table 7.17, one finds from formulas (7.67), (7.73), and (7.77) that

x̿ = 183.4/22 = 8.3
R̄ = 109/22 = 4.95
s̄ = 44.41/22 = 2.019

Then, since Table B.2 shows that for a sample size of m = 5, d₂ = 2.326, an estimate of σ based on R̄ is (from expression (7.74))

R̄/d₂ = 4.95/2.326 = 2.13
Also, Table B.2 shows that for a sample size of m = 5, c₄ = .9400, so an estimate of σ based on s̄ is (from expression (7.78))

s̄/c₄ = 2.019/.94 = 2.15

(Note that beginning from the standard deviations in Table 7.17, sP = 2.19, and clearly sP ≠ s̄.)
Using (for example) statistic (7.74), one is thus led to substitute 8.3 for µ
and 2.13 for σ in “standards given” formulas (7.70) to obtain the retrospective
limits
LCLx̄ = 8.3 − 3(2.13/√5) = 5.44  and  UCLx̄ = 8.3 + 3(2.13/√5) = 11.16
Figure 7.16 shows an “as past data” Shewhart x̄ control chart for the ream cutter
data, using limits based on R̄.
Notice the contrast between the pictures of the ream cutter performance given
in Figures 7.15 and 7.16. Figure 7.15 shows clearly that process parameters are
not at their standard values, but Figure 7.16 shows that it is perhaps plausible
to think of the data in Table 7.16 as coming from some stable data-generating
mechanism. The observed x̄’s hover nicely (indeed—as will be argued at the end
of the next section—perhaps too nicely) about a central value, showing no “out of
control” points or obvious trends. That hypothesis (7.66) is at least approximately
true is believable on the basis of Figure 7.16.
[Figure 7.16: Retrospective ("as past data") Shewhart x̄ chart for the ream cutter data; sample mean length x̄ (1/64 in. above reference) vs. sample number i]
Range-based retrospective control limits for x̄ are thus

LCLx̄ = x̿ − 3R̄/(d₂√m)  and  UCLx̄ = x̿ + 3R̄/(d₂√m)    (7.79)

It is standard to abbreviate, letting

A₂ = 3/(d₂√m)

so that the limits (7.79) become

LCLx̄ = x̿ − A₂R̄  and  UCLx̄ = x̿ + A₂R̄    (7.80)

Similarly, standard deviation-based retrospective control limits for x̄ are

LCLx̄ = x̿ − 3s̄/(c₄√m)  and  UCLx̄ = x̿ + 3s̄/(c₄√m)    (7.81)

and with

A₃ = 3/(c₄√m)

these become

LCLx̄ = x̿ − A₃s̄  and  UCLx̄ = x̿ + A₃s̄    (7.82)

Values of A₂ and A₃ are given in Table B.2.
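The following sketch collects the retrospective x̄ chart arithmetic (an illustrative addition; names are ours). It reproduces the ream cutter limits computed above:

```python
# Illustrative sketch (not from the text): retrospective x-bar limits based
# on R-bar, per displays (7.79) and (7.80).
import math

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}

def retrospective_xbar_limits(x_double_bar, r_bar, m):
    a2 = 3.0 / (D2[m] * math.sqrt(m))   # the constant A2
    return x_double_bar - a2 * r_bar, x_double_bar + a2 * r_bar

# Ream cutter figures: x-double-bar = 8.3, R-bar = 4.95, m = 5
print(retrospective_xbar_limits(8.3, 4.95, 5))   # about (5.44, 11.16)
```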
Where d₃ is another standard control chart constant (such that the standard deviation of the range of a sample of size m from a normal distribution with standard deviation σ is d₃σ), 3-sigma "standards given" control limits for R are

LCLR = d₂σ − 3d₃σ    (7.84)

and

UCLR = d₂σ + 3d₃σ    (7.85)

The lower limit indicated in formula (7.84) turns out to be negative for m ≤ 6. For those sample sizes, since ranges are nonnegative, no lower control limit is used. Formulas (7.84) and (7.85) are typically simplified by the introduction of yet more notation. That is, standard quality control usage is to let

D₁ = d₂ − 3d₃  and  D₂ = d₂ + 3d₃

so that "standards given" control limits for R become

LCLR = D₁σ  and  UCLR = D₂σ    (7.86)

Like the other control chart constants, D₁ and D₂ appear in Table B.2. Note that for m ≤ 6, there is no tabled value for D₁, as no lower limit is in order.
Example 9 (continued)
Consider a "standards given" control chart analysis for the sheet length ranges given in Table 7.17, using a standard σ = 1.9 (1/64 in.). Since samples of size m = 5 are involved, Table B.2 shows that d₂ = 2.326 and D₂ = 4.918 are appropriate for establishing a "standards given" control chart for R. The center line should be drawn at

d₂σ = 2.326(1.9) = 4.4

and the upper control limit set at

D₂σ = 4.918(1.9) = 9.3

(Since m ≤ 6, no lower control limit will be used.) Figure 7.17 shows a "standards given" control chart for ranges of the sheet lengths.

[Figure 7.17: "Standards given" control chart for ranges of the sheet lengths; sample range R (1/64 in.) vs. sample number i, with center line at 4.4 and UCL at 9.3]

It is clear from the figure that, for the most part, a constant process standard deviation of σ = 1.9 is plausible,
except for the clear indication to the contrary at sample 22. The 22nd observed
range, R = 12, is simply larger than expected based on a sample of size m = 5
from a normal distribution with σ = 1.9. In practice, it would be appropriate
to undertake a physical search for the cause of the apparent increase in process
variability associated with the last sample taken.
As was the case for x̄ charts, combination of formulas for the estimation of
(supposedly constant) process parameters with the “standards given” limits (7.86)
produces retrospective control limits for R charts. For example, basing an estimate
of σ on R̄ as in display (7.74) leads (not too surprisingly) to a retrospective center line for R at d₂(R̄/d₂) = R̄ and retrospective control limits

LCLR = D₁R̄/d₂  and  UCLR = D₂R̄/d₂    (7.87)

The abbreviations

D₃ = D₁/d₂  and  D₄ = D₂/d₂

are standard, so that retrospective control limits for R are

LCLR = D₃R̄  and  UCLR = D₄R̄    (7.88)
Example 9 (continued)
In the ream cutter case, the retrospective center line for R is R̄ = 4.95, and (since Table B.2 gives D₄ = 2.114 for m = 5) the retrospective upper control limit is D₄R̄ = 2.114(4.95) ≈ 10.5. (Again, since m ≤ 6, no lower control limit is used.)
Look again at Figure 7.17 and note that the use of these retrospective limits
(instead of the σ = 1.9 “standards given” limits of Figure 7.17) does not materi-
ally alter the appearance of the plot. The range for sample 22 still plots above the
upper control limit. It is not plausible that a single σ stands behind all of the 22 plotted ranges (not even σ ≈ R̄/d₂ = 2.13). It is pretty clear that a different physical mechanism must have been acting at sample 22 than was operative earlier.
For pedagogical reasons, x̄ charts were considered first before turning to charts
aimed at monitoring σ . In terms of order of attention in an application, however, R
(or s) charts are traditionally (and correctly) given first priority. They deal directly
with the baseline component of process variation. Thus (so conventional wisdom
goes), if they show lack of stability, there is little reason to go on to considering
the behavior of means (which deals primarily with the long-term component of
process variation) until appropriate physical changes bring the ranges (or standard
deviations) to the place of repeatability.
It turns out that for samples of size m from a normal distribution with standard deviation σ,

√(Var s) = √(1 − c₄²) σ    (7.89)

Then formulas (7.75) and (7.89) taken together yield "standards given" 3-sigma control limits for s. That is, with a center line at c₄σ, one employs the limits

LCLs = c₄σ − 3√(1 − c₄²) σ = (c₄ − 3√(1 − c₄²))σ
UCLs = c₄σ + 3√(1 − c₄²) σ = (c₄ + 3√(1 − c₄²))σ    (7.90)

In standard quality control notation the multipliers of σ appearing here are abbreviated as B₅ = c₄ − 3√(1 − c₄²) and B₆ = c₄ + 3√(1 − c₄²), so that the limits (7.90) are B₅σ and B₆σ. When B₅ turns out negative (as it does for small m), no lower control limit is used.
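Rather than reading c₄ from a table, one can compute it directly from the standard closed form c₄ = √(2/(m−1)) Γ(m/2)/Γ((m−1)/2) (a fact this text tabulates rather than derives). The sketch below, an illustrative addition, uses that form to produce "standards given" s chart limits:

```python
# Illustrative sketch (not from the text): computing c4 from its closed form
# and then the "standards given" s chart limits of display (7.90).
import math

def c4(m):
    return math.sqrt(2.0 / (m - 1)) * math.gamma(m / 2.0) / math.gamma((m - 1) / 2.0)

def s_chart_limits(sigma, m):
    """Return (LCL, UCL); an LCL of None means no lower limit is used."""
    c = c4(m)
    spread = 3.0 * math.sqrt(1.0 - c * c)
    lcl = (c - spread) * sigma
    return (lcl if lcl > 0 else None), (c + spread) * sigma

print(round(c4(5), 4))          # 0.94, agreeing with Table B.2
print(s_chart_limits(1.9, 5))   # UCL near 3.73; no LCL for m = 5
```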
Example 9 (continued)
Returning once more to the ream cutter example of Shervheim and Snider, consider the monitoring of σ through the use of sample standard deviations rather than ranges, based on a standard of σ = 1.9 (1/64 in.). Table B.2 with sample size m = 5 once again gives c₄ = .9400 and also shows that B₆ = 1.964. So an s chart for the data of Table 7.16 has a center line at

c₄σ = (.94)(1.9) = 1.79

and an upper control limit at B₆σ = 1.964(1.9) = 3.73. (Since B₅ < 0 for m = 5, no lower control limit is used.)
[Figure 7.18: "Standards given" s chart for the sheet lengths; sample standard deviation s (1/64 in.) vs. sample number i, with center line at 1.79]
As was the case for x̄ and R charts, retrospective control limits for s can
be had by replacing the parameter σ in the “standards given” limits (7.90) with
any appropriate estimate. The most common way of proceeding is to employ the
estimator s̄/c4 and thus end up with a retrospective center line for an s chart at
c4 (s̄/c4 ) = s̄ and retrospective control limits
LCLs = B₅s̄/c₄  and  UCLs = B₆s̄/c₄    (7.91)
The abbreviations

B₃ = B₅/c₄  and  B₄ = B₆/c₄

are standard, so that retrospective control limits for s are

LCLs = B₃s̄  and  UCLs = B₄s̄    (7.92)

In the ream cutter case, s̄ = 2.02, so a retrospective s chart has a center line at 2.02 and (since Table B.2 gives B₄ = 2.089 for m = 5) a retrospective upper control limit B₄s̄ = 2.089(2.02) ≈ 4.2. (No lower control limit is used for m = 5.)
Look again at Figure 7.18 and verify that the use of these retrospective limits (in-
stead of the σ = 1.9 “standards given” limits) wouldn’t much change the appear-
ance of the plot. As was the case for the retrospective R chart analysis, these retro-
spective s chart limits still put sample 22 in a class by itself, suggesting that a dif-
ferent physical mechanism produced it than that which led to the other 21 samples.
Ranges are easier to calculate “by hand” than standard deviations and are
easier to explain as well. As a result, R charts are more popular than s charts. In
fact, R charts are so common that the phrase “x̄ and R charts” is often spoken in
quality control circles in such a way that the x̄/R pair is almost implied to be a
single inseparable entity. However, when computational problems and conceptual
understanding are not issues, s charts are preferable to R charts because of their
superior sensitivity to changes in σ .
A useful final observation about the s chart idea is that for r-sample statistical en-
gineering studies where all sample sizes are the same, the “as past data” control limits
in display (7.92) can provide some rough help in the model-checking activities of
Section 7.1 (in reference to the “single variance” assumption of the one-way model).
B3 s̄ and B4 s̄ can be treated as rough limits on the variation in sample standard devia-
tions deemed to be consistent with the one-way model’s single variance assumption.
The largest of the eight values si in Table 7.3 is 965.6, and there are thus no “out
of control” standard deviations. So as in Section 7.1, no strong evidence against
the relevance of the “single variance” model assumption is discovered here.
Section 5 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. The following are some data taken from a larger set in Statistical Quality Control by Grant and Leavenworth, giving the drained weights (in ounces) of contents of size No. 2½ cans of standard grade tomatoes in puree. Twenty samples of three cans taken from a canning process at regular intervals are represented.

Sample   x1     x2     x3        Sample   x1     x2     x3
1        22.0   22.5   22.5      11       20.0   19.5   21.0
2        20.5   22.5   22.5      12       19.0   21.0   21.0
3        20.0   20.5   23.0      13       19.5   20.5   21.0
4        21.0   22.0   22.0      14       20.0   21.5   24.0
5        22.5   19.5   22.5      15       22.5   19.5   21.0
6        23.0   23.5   21.0      16       21.5   20.5   22.0
7        19.0   20.0   22.0      17       19.0   21.5   23.0
8        21.5   20.5   19.0      18       21.0   20.5   19.5
9        21.0   22.5   20.0      19       20.0   23.5   24.0
10       21.5   23.0   22.0      20       22.0   20.5   21.0

(a) Suppose that standard values for the process mean and standard deviation of drained weights (µ and σ) in this canning plant are 21.0 oz and 1.0 oz, respectively. Make and interpret "standards given" x̄ and R charts based on these samples. What do these charts indicate about the behavior of the filling process over the time period represented by these data?
(b) As an alternative to the "standards given" range chart made in part (a), make a "standards given" s chart based on the 20 samples. How does its appearance compare to that of the R chart?
Now suppose that no standard values for µ and σ have been provided.
(c) Find one estimate of σ for the filling process based on the average of the 20 sample ranges, R̄, and another based on the average of 20 sample standard deviations, s̄. How do these compare to the pooled sample standard deviation (of Section 7.1), sP, here?
(d) Use x̿ and your estimate of σ based on R̄ and make retrospective control charts for x̄ and R. What do these indicate about the stability of the filling process over the time period represented by these data?
(e) Use x̿ and your estimate of σ based on s̄ and make retrospective control charts for x̄ and s. How do these compare in appearance to the retrospective charts for process mean and variability made in part (d)?
2. A manufacturer of U-bolts collects data on the thread lengths of the bolts that it produces. Nineteen samples of five consecutive bolts gave the thread lengths indicated in the accompanying table (in .001 in. above nominal).

Sample   Thread Lengths         x̄      R    s
1        11, 14, 14, 10, 8      11.4    6    2.61
2        14, 10, 11, 10, 11     11.2    4    1.64
3        8, 13, 14, 13, 10      11.6    6    2.51
4        11, 8, 13, 11, 13      11.2    5    2.05
5        13, 10, 11, 11, 11     11.2    3    1.10
6        11, 10, 10, 11, 13     11.0    3    1.22
7        8, 6, 11, 11, 11       9.4     5    2.30
8        10, 11, 10, 14, 10     11.0    4    1.73
9        11, 8, 11, 8, 10       9.6     3    1.52
10       6, 6, 11, 13, 11       9.4     7    3.21
11       11, 14, 13, 8, 11      11.4    6    2.30
12       8, 11, 10, 11, 14      10.8    6    2.17
13       11, 11, 13, 8, 13      11.2    5    2.05
14       11, 8, 11, 11, 11      10.4    3    1.34
15       11, 11, 13, 11, 11     11.4    2    .89
16       14, 13, 13, 13, 14     13.4    1    .55
17       14, 13, 14, 13, 11     13.0    3    1.22
18       13, 11, 11, 11, 13     11.8    2    1.10
19       14, 11, 11, 11, 13     12.0    3    1.41

Σx̄ᵢ = 212.4    ΣRᵢ = 77    Σsᵢ = 32.92

(a) Compute two different estimates of the process short-term standard deviation of thread length, one based on the sample ranges and one based on the sample standard deviations.
(b) Use your estimate from (a) based on sample standard deviations and compute control limits for the sample ranges R, and then compute control limits for the sample standard deviations s. Applying these to the R and s values, what is suggested about the threading process?
(c) Using a center line at x̿ and your estimate of σ based on the sample standard deviations, compute control limits for the sample means x̄. Applying these to the x̄ values here, what is suggested about the threading process?
(d) A check of the control chart form from which these data were taken shows that the coil of the heavy wire from which these bolts are made was changed just before samples 1, 9, and 16 were taken. What insight, if any, does this information provide into the possible origins of any patterns you see in the data?
(e) Suppose that a customer will purchase bolts of the type represented in the data only if essentially all bolts received can be guaranteed to have thread lengths within .01 in. of nominal. Does it appear that with proper process monitoring and adjustment, the equipment and manufacturing practices in use at this company will be able to produce only bolts meeting these standards? Explain in quantitative terms. If the equipment was not adequate to meet such requirements, name two options that might be taken and their practical pros and cons.
3. State briefly the practical goals of control charting and action on "out of control" signals produced by the charts.
4. Why might it well be argued that the name control chart invites confusion?
5. What must an engineering application of control charting involve beyond the simple naming of points plotting out of control if it is to be practically effective?
6. Explain briefly how a Shewhart x̄ chart can help reduce variation in, say, a widget diameter, first by signaling the need for process intervention/adjustment and then also by preventing adjustments when no "out of control" signal is given.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
7.6 Shewhart Control Charts for Qualitative and Count Data

7.6.1 p Charts
This text has consistently indicated that measurements are generally preferable to
attributes data. But in some situations, the only available information on the stability
of a process takes the form of qualitative or count data. Consideration of the topic
of control charting in such situations will begin here with p charts for cases where
what is available for plotting are sample fractions, p̂i . The most common use of this
is where p̂i is the fraction of a sample of n i items that is nonconforming according
to some engineering standard or specification. So this section will use the “fraction
nonconforming” language, in spite of the fact that p̂i can be the sample fraction
having any attribute of interest (desirable, undesirable, or indifferent).
The probability facts supporting control charting for the fraction nonconform-
ing are exactly those used in Section 6.5 to develop inference methods based on p̂.
That is, if a process is stable over time, each n i p̂i is usefully modeled as binomial
(n i , p), where p is a constant likelihood that any sampled item is nonconform-
ing. (This section will explicitly allow for sample sizes n i varying in time. Charts
for measurements are almost always based on fairly small but constant sample
sizes. But charts for attributes data typically involve larger sample sizes that some-
times vary.)
As in Section 6.5, a binomial model for n i p̂i leads immediately to
E p̂i = p (7.93)
and
√(Var p̂ᵢ) = √(p(1 − p)/nᵢ)    (7.94)
But then formulas (7.93) and (7.94) suggest obvious “standards given” 3-sigma
control limits for the sample “fraction nonconforming” p̂i . That is, if p is a standard
likelihood that any single item is nonconforming, then a “standards given” p chart
has a center line at p and control limits
LCLp̂ᵢ = p − 3√(p(1 − p)/nᵢ)    (7.95)

UCLp̂ᵢ = p + 3√(p(1 − p)/nᵢ)    (7.96)
In the event that formula (7.95) produces a negative value, no lower control limit
is used.
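A computational sketch of the limits (7.95) and (7.96) follows (an illustrative addition; the function name is ours). Note that it accommodates sample sizes nᵢ that vary in time:

```python
# Illustrative sketch (not from the text): "standards given" p chart limits,
# per displays (7.95) and (7.96), allowing the sample size to vary in time.
import math

def p_chart_limits(p, n_i):
    """Return (LCL, UCL) for a sample of size n_i; LCL of None means none used."""
    half_width = 3.0 * math.sqrt(p * (1.0 - p) / n_i)
    lcl = p - half_width
    return (lcl if lcl > 0 else None), p + half_width
```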
Table 7.18

Sample (i)    Number Nonconforming (nᵢp̂ᵢ)    p̂ᵢ
1 13 .43
2 12 .40
3 9 .30
4 15 .50
5 17 .57
6 13 .43
7 20 .67
8 18 .60
9 18 .60
10 16 .53
11 15 .50
12 17 .57
13 15 .50
14 20 .67
15 10 .33
16 12 .40
17 17 .57
18 14 .47
19 16 .53
20 10 .33
21 14 .47
22 13 .43
23 17 .57
24 10 .33
25 12 .40
Σnᵢp̂ᵢ = 363
[Figure 7.19: "Standards given" p chart for the pelletizing process; sample fraction nonconforming p̂ vs. sample number i]
When no standard value of p is available, a pooled or average sample fraction nonconforming

p̂ = (n₁p̂₁ + n₂p̂₂ + · · · + nᵣp̂ᵣ)/(n₁ + n₂ + · · · + nᵣ)    (7.97)

is used in its place. (p̂ is the total number nonconforming divided by the total number inspected. When sample sizes vary, it is a weighted average of the p̂ᵢ.)
With p̂ as in formula (7.97), an “as past data” Shewhart p chart has a center
line at p̂ and
LCLp̂ᵢ = p̂ − 3√(p̂(1 − p̂)/nᵢ)    (7.98)

UCLp̂ᵢ = p̂ + 3√(p̂(1 − p̂)/nᵢ)    (7.99)
As in the “standards given” context, when formula (7.98) produces a negative value,
no lower control limit is used for p̂i .
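As an added numerical check of this retrospective recipe, using the pelletizing figures that appear in the continuation of the example below (363 nonconforming pellets among 750 inspected, in samples of size 30):

```python
# Added numerical check of the retrospective p chart recipe.
import math

p_hat = 363 / 750                                      # pooled fraction, = .484
half_width = 3.0 * math.sqrt(p_hat * (1 - p_hat) / 30)
print(round(p_hat - half_width, 2), round(p_hat + half_width, 2))   # .21 and .76
```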
Example 11 (continued)
In the pelletizing case, the total number nonconforming in the samples was Σnᵢp̂ᵢ = 363. Then, since mr = 30(25) = 750 pellets were actually inspected, formula (7.97) gives

p̂ = 363/750 = .484

and the retrospective limits (7.98) and (7.99) for the p̂ᵢ are

.484 ± 3√(.484(.516)/30),  that is,  .21 and .76
[Figure 7.20: Retrospective p chart for the pelletizing process; sample fraction nonconforming p̂ vs. sample number i]
This text will not develop formal inference for fractions nonconforming beyond what is in Section 6.5. For example, formal significance tests of equality of r
proportions, parallel to the tests of equality of r means presented in Section 7.4, won’t
be discussed. However, the retrospective p chart can be interpreted as a rough graph-
ical tool for judging how sensible the hypothesis H0 : p1 = p2 = · · · = pr appears.
7.6.2 u Charts
Section 3.4 introduced the notation û for the ratio of the number of occurrences of
a phenomenon of interest to the total number of inspection units or items sampled
in contexts where there may be multiple occurrences on a given item or inspection
unit. The most common application of u charts based on such ratios is that of
nonconformance to some engineering standard or specification. This section will
use the terminology of “nonconformances per unit” in spite of the fact that û can be
the sample occurrence rate for any type of phenomenon (desirable, undesirable, or
indifferent).
The theoretical basis for control charting based on nonconformances per unit is found in the Poisson distributions of Section 5.1. That is, suppose that for some specified inspection unit or unit of process output of a given size, a physically stable process has an associated mean nonconformances per unit of λ, and let Xᵢ stand for the total number of nonconformances observed on the kᵢ inspection units examined at period i. Then a reasonable model for Xᵢ is often the Poisson distribution with mean kᵢλ. The material in Section 5.1 then says that both E Xᵢ = kᵢλ and Var Xᵢ = kᵢλ.
But notice that if ûᵢ = Xᵢ/kᵢ is the sample nonconformances per unit observed at period i, then

E ûᵢ = E(Xᵢ/kᵢ) = (1/kᵢ) E Xᵢ = (1/kᵢ)(kᵢλ) = λ
Var ûᵢ = Var(Xᵢ/kᵢ) = (1/kᵢ²) Var Xᵢ = (1/kᵢ²)(kᵢλ) = λ/kᵢ    (7.100)

so

√(Var ûᵢ) = √(λ/kᵢ)    (7.101)
The relationships (7.100) and (7.101) then motivate “standards given” 3-sigma
control limits for û i . That is, if λ is a standard mean nonconformances per unit, then
a “standards given” u chart has a center line at λ and
LCLûᵢ = λ − 3√(λ/kᵢ)    (7.102)

UCLûᵢ = λ + 3√(λ/kᵢ)    (7.103)
The difference in formula (7.102) can turn out negative. When it does, no lower
control limit is used.
Another matter of notation must be discussed at this point. λ is the symbol
commonly used (as in Section 5.1) for a Poisson mean, and this fact is the basis for
the usage here. However, it is more common in statistical quality control circles to
use c or even c0 for a standard mean nonconformances per unit. In fact, the case of
the u chart where all ki are 1 is usually referred to as a c chart. The λ notation used
here represents the path of least confusion through this notational conflict and thus
c or c0 will not be used in this text. However, be aware that at least in the quality
control world, there is a more popular alternative to the present λ convention.
When the limits (7.102) and (7.103) are used with nonconformances per unit
data, one is essentially checking whether the prespecified λ is a plausible description
of a physical process at each time period covered by the data. Often, however, there
is no obvious standard occurrence rate λ, and u charting is to be done retrospectively.
The question is then whether or not it is plausible that some (single) λ describes
the process over all time periods covered by the data. What is needed in order to
produce retrospective control limits for such cases is a way to use the û i to make a
single estimate of a supposedly constant λ. This text’s approach to this problem is to
make an estimate exactly analogous to the pooled estimate of p in formula (7.97).
That is, let

λ̂ = (k₁û₁ + k₂û₂ + · · · + kᵣûᵣ)/(k₁ + k₂ + · · · + kᵣ)    (7.104)

be a pooled estimator of a common λ. Then retrospective u chart control limits are

LCLûᵢ = λ̂ − 3√(λ̂/kᵢ)    (7.105)

UCLûᵢ = λ̂ + 3√(λ̂/kᵢ)    (7.106)
As the reader might by now expect, when formula (7.105) gives a negative value,
no lower control limit is employed.
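The retrospective u chart computations can be sketched compactly (an illustrative addition; names are ours). Note how each period gets its own limits when the kᵢ differ:

```python
# Illustrative sketch (not from the text): retrospective u chart limits,
# per displays (7.104) through (7.106), with possibly varying k_i.
import math

def u_chart(counts, units):
    """counts[i] nonconformances were seen on units[i] inspection units."""
    lam_hat = sum(counts) / sum(units)   # pooled rate, display (7.104)
    limits = []
    for k in units:
        half_width = 3.0 * math.sqrt(lam_hat / k)
        lcl = lam_hat - half_width
        limits.append((lcl if lcl > 0 else None, lam_hat + half_width))
    return lam_hat, limits

# With Burr's totals (6,078 errors on 3,445 trucks), lam_hat = 1.764, and a
# 95-truck day gets limits near (1.355, 2.173), as in Example 12 below.
```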
Example 12 — u Chart Monitoring of the Defects per Truck Found at Final Assembly (Example 13, Chapter 3, revisited; see page 110)

In his book Statistical Quality Control Methods, I. W. Burr discusses the use of u charts to monitor the performance of an assembly process at a station in a truck assembly plant. Part of Burr's data were given earlier in Table 3.19. Table 7.19 gives a (partially overlapping) r = 30 production days' worth of Burr's data. (The values were extrapolated from Burr's figures and the fact that truck production through sample 13 was 95 trucks/day and was 130 trucks/day thereafter. Burr gives only ûᵢ values, production rates, and the fact that all trucks produced were inspected.)
Consider the problem of control charting for these data. Since Burr gave no
figure λ for the plant’s standard errors per truck, this problem will be approached
as one of making a retrospective u chart. Using formula (7.104), and the column
totals from Table 7.19,
λ̂ = ΣXᵢ/Σkᵢ = 6,078/3,445 = 1.764

So an "as past data" u chart will have a center line at 1.764 errors/truck. From formulas (7.105) and (7.106), for the first 13 days (where each kᵢ was 95),

LCLûᵢ = 1.764 − 3√(1.764/95) = 1.355 errors/truck
UCLûᵢ = 1.764 + 3√(1.764/95) = 2.173 errors/truck

On the other hand, for the last 17 days (during which 130 trucks were produced each day),

LCLûᵢ = 1.764 − 3√(1.764/130) = 1.415 errors/truck
UCLûᵢ = 1.764 + 3√(1.764/130) = 2.113 errors/truck
Table 7.19

Sample (i)    Date    Trucks Produced (kᵢ)    Errors Found (Xᵢ = kᵢûᵢ)    Errors/Truck (ûᵢ)
1 11/4 95 114 1.20
2 11/5 95 142 1.50
3 11/6 95 146 1.54
4 11/7 95 257 2.70
5 11/8 95 185 1.95
6 11/11 95 228 2.40
7 11/12 95 327 3.44
8 11/13 95 269 2.83
9 11/14 95 167 1.76
10 11/15 95 190 2.00
11 11/18 95 199 2.09
12 11/19 95 180 1.89
13 11/20 95 171 1.80
14 11/21 130 163 1.25
15 11/22 130 205 1.58
16 11/25 130 292 2.25
17 11/26 130 325 2.50
18 11/27 130 267 2.05
19 11/29 130 190 1.46
20 12/2 130 200 1.54
21 12/3 130 185 1.42
22 12/4 130 204 1.57
23 12/5 130 182 1.40
24 12/6 130 196 1.51
25 12/9 130 140 1.08
26 12/10 130 165 1.27
27 12/11 130 153 1.18
28 12/12 130 181 1.39
29 12/13 130 185 1.42
30 12/16 130 270 2.08
Σkᵢ = 3,445    ΣXᵢ = 6,078
Notice that since ki appears in the denominator of the plus-or-minus part of control
limit formulas (7.102), (7.103), (7.105), and (7.106), the larger the inspection
effort at a given time period, the tighter the corresponding control limits. This
is perfectly logical. A bigger “sample size” at a given period ought to make the
plotted ûᵢ a less variable indicator of process performance, so that correspondingly tighter limits are appropriate.

[Figure 7.21: Retrospective u chart for the truck assembly data; errors per truck û vs. day i, with limits that tighten when daily production kᵢ increases]
This book has had little to say about formal inference from data with an under-
lying Poisson distribution. But retrospective u charts like the one in Example 12 can
be thought of as rough graphical tests of the hypothesis H0 : λ1 = λ2 = · · · = λr for
Poisson-distributed X i = ki û i .
Comments about interpreting patterns in plots of process data against time were made earlier in this text; it is appropriate to amplify and extend those comments somewhat, in light of the extra element provided by the control limits.
Before discussing interesting possible departures from the norm, it should probably be explicitly stated how a 3-sigma control chart is expected to look if a process is physically stable. One expects (tacitly assuming the distribution of the plotted statistic to be mound-shaped) that

1. most plotted points will lie in the middle (say, the middle 2/3) of the region
delineated by the control limits around the center line,
2. a few (say, on the order of 1 in 20) points will lie outside this region but
inside the control limits,
3. essentially no points will lie outside the control limits, and
4. there will be no obvious trends in time for any sizable part of the chart.
That is, one expects to see a random-scatter/white-noise plot that fills, but essentially
remains within, the region bounded by the control limits. When something else is
seen, even if no points plot outside the control limits, there is reason to consider
the possibility that something in addition to chance causes is active in the data-
generating mechanism.
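A small simulation (an added illustration, not from the text) makes these expectations concrete for x̄'s from a stable process:

```python
# Added simulation: what a 3-sigma x-bar chart "should" look like for a
# stable (iid normal) process: most points in the middle 2/3 of the band
# between the control limits, roughly 1 in 20 outside that middle region but
# within the limits, and essentially none beyond the limits.
import math
import random

random.seed(1)
mu, sigma, m, n_samples = 10.0, 1.9, 5, 10_000
se = sigma / math.sqrt(m)
xbars = [sum(random.gauss(mu, sigma) for _ in range(m)) / m
         for _ in range(n_samples)]
middle = sum(abs(x - mu) <= 2 * se for x in xbars) / n_samples           # ~ .95
fringe = sum(2 * se < abs(x - mu) <= 3 * se for x in xbars) / n_samples  # ~ .04
beyond = sum(abs(x - mu) > 3 * se for x in xbars) / n_samples            # ~ .003
print(middle, fringe, beyond)
```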
Cyclical (repeated "up, then back down again") patterns sometimes show up on Shewhart control charts.
a stable-process data-generating mechanism. When it occurs, the alert engineer will
look for identifiable physical causes of variation whose effects would come and go on
about the same schedule as the ups and downs seen on the chart. Sometimes cyclical
patterns are associated with daily or seasonal variables like ambient temperature
effects, which may be largely beyond a user’s control. But at other times, they have
to do with things like different (rotating) operators’ slightly different methods of
machine operation, which can be mostly eliminated via standardization, training,
and awareness.
Again, the expectation is that points plotted on a Shewhart control chart should (over time) pretty much fill up, but rarely plot outside, the region delineated by
control limits. This can be violated in two different ways, both of which suggest the
need for engineering attention. In the first place, more variation than expected (like
that evident on Figure 7.21), which produces multiple points outside the control
limits, is often termed instability. And (after eliminating the possibility of a blunder
in calculations) it is nearly airtight evidence of one or more unregulated process
variables having effects so large that they must be regulated. Such erratic behavior
can sometimes be traced to material or components from several different suppliers
having somewhat different physical properties and entering a production line in a
mixed or haphazard order. Also, ill-advised operators may overadjust equipment
(without any basis in control charting). This can take a fairly stable process and
make it unstable.
Less variation than expected on a Shewhart chart presents an interesting puzzle. Look again at Figure 7.16 and reflect on the fact that the plotted x̄'s
on that chart hug the center line. They don’t come close to filling up the region
between the control limits. The reader’s first reaction to this might well be, “So
what? Isn’t small variation good?” Small variation is indeed a virtue, but when
points on a control chart hug the center line, what one has is unbelievably small
variation, which may conceal a blunder in calculation or (almost paradoxically)
unnecessarily large but nonrandom variation.
In the first place, the simplest possible explanation of a plot like Figure 7.16
is that the process short-term variation, σ , has been overestimated—either because
a standard σ is not applicable or because of some blunder in calculation or logic.
Notice that using a value for σ that is bigger than what is really called for when
making the limits
σ σ
LCLx̄ = µ − 3 √ and UCLx̄ = µ + 3 √
m m
will spread the control limits too wide and produce an x̄ chart that is insensitive to
changes in µ. So this possibility should not be taken lightly.
A more subtle possible source of unbelievably small variation on a Shewhart chart (often called stratification) has to do with the (usually unwitting) mixing of several consistently different streams of observations in the calculation of a single statistic that is naively thought to be representing only one stream of observations.
being taken from a production stream where multiple heads or cavities on a machine
(or various channels of another type of multiple-channel process) are represented in
a regular order in the stream. For example, items machined on heads 1, 2, and 3 of
a machine might show up downstream in a production process in the order 1, 2, 3,
1, 2, 3, 1, 2, 3, etc. Then, if there is more difference between the different types of
observations than there is within a given type, values of a single statistic calculated
using observations of several types can be remarkably (excessively) consistent.
Consider, for example, the possibility that a five-head machine has heads that
are detectably/consistently different. Suppose four of the five are perfectly adjusted
and always produce conforming items and the fifth is severely misadjusted and
always produces nonconforming items. Although 20% of the items produced are
nonconforming, a binomial distribution model with p = .2 will typically overpredict
the variation that will be seen in n i p̂i for samples of items from this process. Indeed,
samples of size m = 5 of consecutive items coming off this machine will have
p̂i = .2, always. Clearly, no p̂i ’s would approach p chart control limits.
Or in a measurement data context, with the same hypothetical five-head ma-
chine, consider the possibility that four of the five heads always produce a part
dimension at the target of 8 in. (plus or minus, say, .01 in.), whereas the fifth head is
grossly misadjusted, always producing the dimension at 9 in. (plus or minus .01 in.).
Then, in this exaggerated example, naive mixing together of the output of all five
heads will produce ranges unbelievably stable at about 1 in. and sample means (of
five consecutive pieces) unbelievably stable at about 8.2 in. But the super-stability
is not a cause for rejoicing. Rather it is a cause for thought and investigation that
could well lead to the physical elimination of the differences between the various
mechanisms producing the data—in this case, the fixing of the faulty head.
[Figure: machine schematic showing cutter blades and feeder rollers]
[Table 7.20: Western Electric Alarm Rules (from the AT&T Quality Control Handbook); rules not reproduced here]

[Table 7.21: Alarm Rules of L. S. Nelson (from the Journal of Quality Technology); rules not reproduced here]
Section 6 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
kᵢ = 1 and on kᵢ = 2 in terms of the probabilities that a given sample produces an "out of control" signal if
(i) the actual defect rate is standard.
(ii) the actual defect rate is twice standard.
4. Successive samples of carriage bolts are checked for length using a "go–no go" gauge. The results from ten successive samples are as follows:

Sample          1    2    3    4    5    6    7    8    9    10
Sample Size     30   20   40   30   20   20   30   20   20   20
Nonconforming   2    1    5    1    2    1    3    0    1    2

What do these values indicate about the stability of the bolt cutting process?
5. Why is it essential to have an operational definition of a nonconformance to make effective practical use of a Shewhart c chart?
6. Explain why too little variation appearing on a Shewhart control chart need not be a good sign.
Chapter 7 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Hoffman, Jabaay, and Leuer did a study of pencil lead strength. They loaded pieces of lead of the same diameter (supported on two ends) in their centers and recorded the forces at which they failed. Part of their data are given here (in grams of load applied at failure).

4H lead: 56.7, 63.8, 56.7, 63.8, 49.6
H lead: 99.2, 99.2, 92.1, 106.0, 99.2
B lead: 56.7, 63.8, 70.9, 63.8, 70.9

(a) In applying the methods of this chapter in the analysis of these data, what model assumptions must be made? Make three normal plots of these samples on the same set of axes and also make a normal plot of residuals for the one-way model as means of investigating the reasonableness of these assumptions. Comment on the plots.
(b) Compute a pooled estimate of variance based on these three samples. What is the corresponding value of sP?
(c) Use the value of sP that you calculated in (b) and make (individual) 95% two-sided confidence intervals for each of the three mean lead strengths, µ4H, µH, and µB.
(d) Use sP and make (individual) 95% two-sided confidence intervals for each of the three differences in mean lead strengths, µ4H − µH, µ4H − µB, and µH − µB.
(e) Suppose that for some reason it is desirable to compare the mean strength of B lead to the average of the mean strengths of 4H and H leads. Give a 95% two-sided confidence interval for the quantity (1/2)(µ4H + µH) − µB.
(f) Use the P-R method of simultaneous confidence intervals and make simultaneous 95% two-sided confidence intervals for the three mean strengths, µ4H, µH, and µB. How do the lengths of these intervals compare to the lengths of the intervals you found in part (c)? Why is it sensible that the lengths should be related in this way?
(g) Use the Tukey method of simultaneous confidence intervals and make simultaneous 95% two-sided confidence intervals for the three differences in mean lead strengths, µ4H − µH, µ4H − µB, and µH − µB. How do the lengths of these intervals compare to the lengths of the intervals you found in part (d)?
(h) Use the one-way ANOVA test statistic and assess the strength of the evidence against H0: µ4H = µH = µB in favor of Ha: not H0. Show the whole five-step format.
(i) Make the ANOVA table corresponding to the significance test you carried out in part (h).
(j) As a means of checking your work for parts (h) and (i) of this problem, use a statistical package to produce the required ANOVA table, F statistic, and p-value.
2. Allan, Robbins, and Wyckoff worked with a machine shop that employs a CNC (computer numerically controlled) lathe in the manufacture of a part for a heavy equipment maker. Some summary statistics for measurements of a particular diameter on the part for 20 hourly samples of m = 4 parts turned on the lathe are given here. (The means are in 10⁻⁴ in. above 1.1800 in. and the ranges are in 10⁻⁴ in.)

Sample   1      2      3      4      5
x̄        9.25   8.50   9.50   6.25   5.25
R        1      2      2      8      7

Sample   6      7      8      9      10
x̄        5.25   5.75   19.50  10.0   9.50
R        5      5      1      3      1

Sample   11     12     13     14     15
x̄        9.50   9.75   12.25  12.75  14.50
R        6      1      9      2      7

Sample   16     17     18     19     20
x̄        8.00   10.0   10.25  8.75   10.0
R        3      0      1      3      0

(a) The midspecification for the diameter in question was 1.1809 in. Suppose that a standard σ for diameters turned on this machine is 2.5 × 10⁻⁴ in. Use these two values and find "standards given" control limits for x̄ and R. Make both x̄ and R charts using these and comment on what the charts indicate about the turning process.
(b) In contrast to part (a) where standards were furnished, compute retrospective or "as past data" control limits for both x̄ and R. Make both x̄ and R charts using these and comment on what the charts indicate about the turning process.
(c) If you were to judge the sample ranges to be stable, it would then make sense to use R̄ to develop an estimate of the turning process short-term standard deviation σ. Find such an estimate.
(d) The engineering specifications for the turned diameter are (still in .0001 in. above 1.1800 in.) from 4 to 14. Supposing that the average diameter could be kept on target (at the mid-specification), does your estimate of σ from part (c) suggest that the turning process would then be capable of producing most diameters in these specifications? Explain.
3. Becker, Francis, and Nazarudin conducted a study of the effectiveness of commercial clothes dryers in removing water from different types of fabric. The following are some summary statistics from a part of their study, where a garment made of one of r = 3 different blends was wetted and dried for 10 minutes in a particular dryer and the (water) weight loss (in grams) measured. Each of the three different garments was tested three times.

100% Cotton        Cotton/Polyester     Cotton/Acrylic
n₁ = 3             n₂ = 3               n₃ = 3
ȳ₁ = 85.0 g        ȳ₂ = 348.3 g         ȳ₃ = 258.3 g
s₁ = 25.0 g        s₂ = 88.1 g          s₃ = 63.3 g

(a) What restrictions/model assumptions are required in order to do formal inference based on the data summarized here (if information on the baseline variability involved is pooled and the formulas of this chapter are used)? Assume that those model assumptions are a sensible description of this situation.
(b) Find sP and the associated degrees of freedom.
(c) What does sP measure?
(d) Give a 90% lower confidence bound for the mean amount of water that can be removed from the cotton garment by this dryer in a 10-minute period.
(e) Give a 90% two-sided confidence interval for (d) What model assumptions stand behind the
comparing the means for the two blended gar- formulas you used in parts (a) and (b)? In
ments. part (c)?
(f) Suppose that all pairs of fabric means are For the following questions, consider test results
to be compared using intervals of the form from all eight glues when making your analyses.
ȳ i − ȳ i 0 ± 1 and that simultaneous 95% con- (e) Find a pooled sample standard deviation and
fidence is desired. Find 1. give its degrees of freedom.
(g) A partially completed ANOVA table for test- (f) Repeat parts (a) and (b) using the pooled stan-
ing H0 : µ1 = µ2 = µ3 follows. Finish filling dard deviation instead of only s1 . What extra
in the table then find a p-value for a signifi- model assumption is required to do this (be-
cance test of this hypothesis. yond what was used in parts (a) and (b))?
(g) Find the value of an F statistic for testing
ANOVA Table H0 : µ1 = µ2 = · · · = µ8 and give its degrees
Source SS df MS F of freedom. (Hint: These data are balanced.
You ought to be able to use the ȳ’s and the
24,787 sample variance routine on your calculator to
help get the numerator for this statistic.)
132,247
(h) Simultaneous 95% two-sided confidence lim-
its for the mean strengths for the eight glues
4. The article "Behavior of Rubber-Based Elastomeric Construction Adhesive in Wood Joints" by P. Pellicane (Journal of Testing and Evaluation, 1990) compared the performance of r = 8 different commercially available construction adhesives. m = 8 joints glued with each glue were tested for strength, giving results summarized as follows (the units are kN):

Glue (i)  1     2     3     4    5     6     7     8
ȳᵢ       1821  1968  1439  616  1354  1424  1694  1669
sᵢ        214   435   243   205  135   191   225   551

(a) Temporarily considering only the test results for glue 1, give a 95% lower tolerance bound for the strengths of 99% of joints made with glue 1.
(b) Still considering only the test results for glue 1, give a 95% lower confidence bound for the mean strength of joints made with glue 1.
(c) Now considering only the test results for glues 1 and 2, assess the strength of the evidence against the possibility that glues 1 and 2 produce joints with the same mean strength. Show the whole five-step significance-testing format.
(d) What model assumptions stand behind the formulas you used in parts (a) and (b)? In part (c)?
For the following questions, consider test results from all eight glues when making your analyses.
(e) Find a pooled sample standard deviation and give its degrees of freedom.
(f) Repeat parts (a) and (b) using the pooled standard deviation instead of only s₁. What extra model assumption is required to do this (beyond what was used in parts (a) and (b))?
(g) Find the value of an F statistic for testing H₀: µ₁ = µ₂ = ··· = µ₈ and give its degrees of freedom. (Hint: These data are balanced. You ought to be able to use the ȳ's and the sample variance routine on your calculator to help get the numerator for this statistic.)
(h) Simultaneous 95% two-sided confidence limits for the mean strengths for the eight glues are of the form ȳᵢ ± Δ for an appropriate number Δ. Find Δ.
(i) Simultaneous 95% two-sided confidence limits for all differences in mean strengths for the eight glues are of the form ȳᵢ − ȳᵢ′ ± Δ for a number Δ. Find Δ.
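For parts (a) and (b), one-sided bounds can be computed from glue 1's summaries alone. The sketch below gets the tolerance factor from the noncentral t distribution rather than from a printed table; under the normal model the two approaches should agree:

```python
# Sketch: 95% lower tolerance bound for 99% of strengths, and a 95% lower
# confidence bound for the mean, from glue 1's summary statistics.
from scipy import stats

n, ybar, s = 8, 1821.0, 214.0

# factor k such that ybar - k*s is a 95%/99% lower tolerance bound
z99 = stats.norm.ppf(0.99)
k = stats.nct.ppf(0.95, df=n - 1, nc=z99 * n ** 0.5) / n ** 0.5
tol_bound = ybar - k * s

# 95% lower confidence bound for the mean strength
conf_bound = ybar - stats.t.ppf(0.95, n - 1) * s / n ** 0.5
print(tol_bound, conf_bound)
```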
5. Example 7 in Chapter 4 treats some data collected by Kotlers, MacFarland, and Tomlinson while studying strength properties of wood joints. Part of those data (stress at failure values in units of psi for four out of the original nine wood/joint type combinations) are reproduced here, along with ȳ and s for each of the four samples represented:

                        Wood Type
                 Pine              Oak
Joint   Butt     829, 596          1169
Type             ȳ = 712.5         ȳ = 1169
                 s = 164.8
        Lap      1000, 859         1295, 1561
                 ȳ = 929.5         ȳ = 1428.0
                 s = 99.7          s = 188.1
(a) Treating pine/butt joints alone, give a 95% two-sided confidence interval for mean strength for such joints. (Here, base your interval on only the pine/butt data.)
(b) Treating only lap joints, how strong is the evidence shown here of a difference in mean joint strength between pine and oak woods? (Here use only the pine/lap and oak/lap data.) Use the five-step format.
(c) Give a 90% two-sided confidence interval for comparing the strength standard deviations for pine/lap and oak/lap joints.
Consider all four samples in the following questions.
(d) Assuming that all four wood type/joint type conditions are thought to have approximately the same associated variability in joint strength, give an estimate of this supposedly common standard deviation.
(e) It is possible to compute simultaneous 95% lower (one-sided) confidence limits for mean joint strengths for all four wood type/joint type combinations. Give these (based on the P-R method).
(f) Suppose that you want to compare butt joint strength to lap joint strength and in fact want a 95% two-sided confidence interval for

½(µpine/butt + µoak/butt) − ½(µpine/lap + µoak/lap)

Give such a confidence interval, again making use of your answer to (d).

6. In an industrial application of Shewhart x̄ and R control charts, 20 successive hourly samples of m = 2 high-precision metal parts were taken, and a particular diameter on the parts was measured. x̄ and R values were calculated for each of the 20 samples, and these had

x̄̄ = .35080 in.   and   R̄ = .00019 in.

(a) Give retrospective control limits that you would use in an analysis of the x̄ and R values.
(b) The engineering specifications for the diameter being measured were .3500 in. ± .0020 in. Unfortunately, even practicing engineers sometimes have difficulty distinguishing in their thinking and speech between specifications and control limits. Briefly (but carefully) discuss the difference in meaning between the control limits for x̄ found in part (a) and these engineering specifications. (To what quantities do the two apply? What are the different purposes for the two? Where do the two come from? And so on.)

7. Here are some summary statistics produced by Davies and Sehili for ten samples of m = 4 pin head diameters formed on a type of electrical component. The sampled components were groups of consecutive items taken from the output of a machine approximately once every ten minutes. The units are .001 in.

Sample  x̄      R  s      Sample  x̄      R  s
1       31.50  3  1.29   6       33.00  3  1.41
2       30.75  2  .96    7       33.00  2  .82
3       29.75  3  1.26   8       33.00  4  1.63
4       30.50  3  1.29   9       34.00  2  .82
5       32.00  0  0      10      26.00  0  0

Some summaries for the statistics are

Σx̄ = 313.5,   ΣR = 22,   and   Σs = 9.48

(a) Assuming that the basic short-term variability of the mechanism producing pin head diameters is constant, it makes sense to try to quantify it in terms of a standard deviation σ. Various estimates of that σ are possible. Give three such possible estimates based on R̄, s̄, and sP.
(b) Using each of your estimates from (a), give control limits for both x̄ and R.
(c) Compare the x̄'s and R's given above to your control limits from (b) based on R̄. Are there any points that would plot outside control limits on a Shewhart x̄ chart? On a Shewhart R chart?
(d) For the company manufacturing these parts, what are the practical implications of your analysis in parts (b) and (c)?
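The three estimates asked for in part (a) of exercise 7 can be checked with a few lines of code. This sketch assumes the standard constants d₂ = 2.059 and c₄ = .9213 for samples of size 4:

```python
# Sketch: three estimates of the process short-term sigma from the ten
# samples of size m = 4 (units of .001 in.): Rbar/d2, sbar/c4, and s_P.
R = [3, 2, 3, 3, 0, 3, 2, 4, 2, 0]
s = [1.29, .96, 1.26, 1.29, 0, 1.41, .82, 1.63, .82, 0]
d2, c4 = 2.059, 0.9213            # constants for subgroup size 4

Rbar = sum(R) / len(R)
sbar = sum(s) / len(s)
sP = (sum(si ** 2 for si in s) / len(s)) ** 0.5   # balanced: root mean of s_i^2

print(Rbar / d2, sbar / c4, sP)
```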
8. Dunnwald, Post, and Kilcoin studied the viscosities of various weights of various brands of motor oil. Some summary statistics for part of their data are given here. Summarized are m = 10 measurements of the viscosities of each of r = 4 different weights of Brand M motor oil at room temperature. Units are seconds required for a ball to drop a particular distance through the oil.

10W30        SAE 30       10W40        20W50
ȳ₁ = 1.385   ȳ₂ = 2.066   ȳ₃ = 1.414   ȳ₄ = 4.498
s₁ = .091    s₂ = .097    s₃ = .150    s₄ = .204

(a) Find the pooled sample standard deviation here. What are the associated degrees of freedom?
(b) If the P-R method is used to find simultaneous 95% two-sided confidence intervals for all four mean viscosities, the intervals produced are of the form ȳᵢ ± Δ, for Δ an appropriate number. Find Δ.
(c) If the Tukey method is used to find simultaneous 95% two-sided confidence intervals for all differences in mean viscosities, the intervals produced are of the form ȳᵢ − ȳᵢ′ ± Δ, for Δ an appropriate number. Find Δ.
(d) Carry out an ANOVA test of the hypothesis that the four oil weights have the same mean viscosity.
9. Because of modern business pressures, it is not uncommon for standards for fractions nonconforming to be in the range of 10⁻⁴ to 10⁻⁶.
(a) What are "standards given" 3-sigma control limits for a p chart with standard fraction nonconforming 10⁻⁴ and sample size 100?
(b) If p becomes twice the standard value (of 10⁻⁴), what is the probability that the scheme from (a) detects this state of affairs at the first subsequent sample? (Use your answer to (a) and the binomial distribution for n = 100 and p = 2 × 10⁻⁴.)
(c) What does (b) suggest about the feasibility of doing process monitoring for very small fractions defective based on attributes data?
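The arithmetic for parts (a) and (b) of exercise 9, and the difficulty alluded to in part (c), can be made concrete with the following sketch (assuming SciPy):

```python
# Sketch: "standards given" p-chart limits for p = 1e-4, n = 100, and the
# chance the first sample after p doubles produces an out-of-control signal.
from scipy import stats

p0, n = 1e-4, 100
sd = (p0 * (1 - p0) / n) ** 0.5
ucl = p0 + 3 * sd                 # the LCL is negative, so it is taken as 0
print("UCL for p-hat:", ucl)

# the chart signals only if the count X = n * p-hat exceeds n * UCL
x_signal = int(n * ucl) + 1       # smallest count plotting above the UCL
detect = stats.binom.sf(x_signal - 1, n, 2 * p0)
print("P(signal at first sample | p = 2e-4):", detect)
```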
10. Suppose that a company standard for the mean number of visual imperfections on a square foot of plastic sheet is λ = .04.
(a) Give upper control limits for the number of imperfections found on pieces of material .5 ft × .5 ft and then 5 ft × 5 ft.
(b) What would you tell a worker who, instead of inspecting a 10 ft × 10 ft specimen of the plastic (counting total imperfections on the whole), wants to inspect only a 1 ft × 1 ft specimen and multiply the observed count of imperfections by 100?

11. Bailey, Goodman, and Scott worked on a process for attaching metal connectors to the ends of hydraulic hoses. One part of that process involved grinding rubber off the ends of the hoses. The amount of rubber removed is termed the skive length. The values in the accompanying table are skive length means and standard deviations for 20 samples of five consecutive hoses ground on one grinder. Skive length is expressed in .001 in. above the target length.

Sample  x̄     s      Sample  x̄     s
1       −.4   5.27   11      −2.2  5.50
2       0.0   4.47   12      −5.2  2.86
3       −1.4  3.29   13      −.8   1.30
4       1.8   2.28   14      .8    2.68
5       1.4   1.14   15      −2.0  2.92
6       0.0   4.24   16      −.2   1.30
7       −.4   4.39   17      −6.6  2.30
8       1.4   4.51   18      −1.0  4.21
9       .2    4.32   19      −3.2  5.76
10      −3.2  2.05   20      −2.4  4.28

Totals:       Σx̄ = −23.4   Σs = 69.07

(a) What do these values indicate about the stability of the skiving process? Show appropriate work and explain fully.
(b) Give an estimate of the process short-term standard deviation based on the given values.
(c) If specifications on the skive length are ±.006 in. and, over short periods, skive length can be thought of as normally distributed, what does your answer to (b) indicate about the
best possible fraction (for perfectly adjusted grinders) of skives in specifications? Give a number.
(d) Based on your answer to (b), give control limits for future control of skive length means and ranges for samples of size m = 3.
(e) Suppose that hoses from all grinders used during a given shift are all dumped into a common bin. If upon sampling, say, 20 hoses from this bin at the end of a shift, the 20 measured skive lengths have a standard deviation twice the size of your answer to (b), what possible explanations come to mind for this?
(f) Suppose current policy is to sample five consecutive hoses once an hour for each grinder. An alternative possibility is to sample one hose every 12 minutes for each grinder.
(i) Briefly discuss practical trade-offs that you see between the two possible sampling methods.
(ii) If in fact the new sampling scheme were adopted, would you recommend treating the five hoses from each hour as a sample of size 5 and doing x̄ and R charting with m = 5? Explain.

12. Two different types of nonconformance can appear on widgets manufactured by Company V. Counts of these on ten widgets produced one per hour are given here.

Widget          1  2  3  4  5  6  7  8  9  10
Type A Defects  4  2  1  2  2  2  0  2  1  0
Type B Defects  0  2  2  4  2  4  3  3  7  2
Total Defects   4  4  3  6  4  6  3  5  8  2

(a) Considering first total nonconformances, is there evidence here of process instability? Show appropriate work.
(b) What statistical indicators might you expect to observe in data like these if in fact type A and B defects have a common cause mechanism?
(c) (Charts for Demerits) For the sake of example, suppose that type A defects are judged twice as important as type B defects. One might then consider charting

X = demerits = 2(number of A defects) + (number of B defects)

If one can model (number of A defects) and (number of B defects) as independent Poisson random variables, it is relatively easy to come up with sensible control limits. (Remember that the variance of a sum of independent random variables is the sum of the variances.)
(i) If the mean number of A defects per widget is λ₁ and the mean number of B defects per widget is λ₂, what are the mean and variance for X? Use your answers to give "standards given" control limits for X.
(ii) In light of your answer to (i), what numerical limits for X would you use to analyze these values "as past data"?

13. (Variables Versus Attributes Control Charting) Suppose that a dimension of parts produced on a certain machine over a short period can be thought of as normally distributed with some mean µ and standard deviation σ = .005 in. Suppose further that values of this dimension more than .0098 in. from the 1.000 in. nominal value are considered nonconforming. Finally, suppose that hourly samples of ten of these parts are to be taken.
(a) If µ is exactly on target (i.e., µ = 1.000 in.), about what fraction of parts will be nonconforming? Is it possible for the fraction nonconforming ever to be any less than this figure?
(b) One could use a p chart based on m = 10 to monitor process performance in this situation. What would be "standards given" 3-sigma control limits for the p chart, using your answer from part (a) as the standard value of p?
(c) What is the probability that a particular sample of m = 10 parts will produce an "out of control" signal on the chart from (b) if µ remains at its standard value of µ = 1.000 in.? How does this compare to the same probability
for a 3-sigma x̄ chart for m = 10 set up with a center line at 1.000? (For the p chart, use a binomial probability calculation. For the x̄ chart, use the facts that µx̄ = µ and σx̄ = σ/√m.)
(d) Compare the probability that a particular sample of m = 10 parts will produce an "out of control" signal on the p chart from (b) to the probability that the sample will produce an "out of control" signal on the (m = 10) 3-sigma x̄ chart first mentioned in (c), supposing that in fact µ = 1.005 in. What moral is told by your calculations here and in part (c)?
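The probabilities called for in parts (c) and (d) can be organized as in the following Python sketch (assuming SciPy; the helper names are illustrative, not the book's):

```python
# Sketch: comparing the detection power of a p chart and an x-bar chart for
# m = 10, sigma = .005 in., nonconforming beyond 1.000 +/- .0098 in.
from scipy import stats

m, sigma, half_spec = 10, 0.005, 0.0098

def frac_nonconforming(mu):
    return (stats.norm.cdf(1 - half_spec, mu, sigma)
            + stats.norm.sf(1 + half_spec, mu, sigma))

def p_chart_signal(mu):
    p0 = frac_nonconforming(1.000)          # standard value of p, part (a)
    ucl = p0 + 3 * (p0 * (1 - p0) / m) ** 0.5
    x_signal = int(m * ucl) + 1             # smallest count above the UCL
    return stats.binom.sf(x_signal - 1, m, frac_nonconforming(mu))

def xbar_chart_signal(mu):
    se = sigma / m ** 0.5
    return (stats.norm.cdf(1 - 3 * se, mu, se)
            + stats.norm.sf(1 + 3 * se, mu, se))

for mu in (1.000, 1.005):
    print(mu, p_chart_signal(mu), xbar_chart_signal(mu))
```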
14. The article "How to Use Statistics Effectively in a Pseudo-Job Shop" by G. Fellers (Quality Engineering, 1990) discusses some applications of statistical methods in the manufacture of corrugated cardboard boxes. One part of the article concerns the analysis of a variable called box "skew," which quantifies how far from being perfectly square boxes are. This response variable, which will here be called y, is measured in units of 1/32 in. r = 24 customer orders (each requiring a different machine setup) were studied, and from each, the skews, y, of five randomly selected boxes were measured. A partial ANOVA table made in summary of the data follows.

ANOVA Table
Source         SS       df   MS   F
Order (setup)  1052.39
Error
Total          1405.59  119

(a) Complete the ANOVA table.
(b) In a given day, hundreds of different orders are run in this plant. This situation is one in which a random effects analysis is most natural. Explain why.
(c) Find estimates of σ and στ. What, in the context of this situation, do these two estimates measure?
(d) Find and interpret a two-sided 90% confidence interval for σ and then the ratio στ/σ.
(e) If there is variability in skew, customers must continually adjust automatic folding and packaging equipment in order to prevent machine jam-ups. Such variability is therefore highly undesirable for the box manufacturer, who wishes to please customers. What does your analysis from (c) and (d) indicate about how the manufacturer should proceed in any attempts to reduce variability in skew? (What is the big component of variance, and what kind of actions might be taken to reduce it? For example, is there a need for the immediate purchase of new high-precision manufacturing equipment?)
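Parts (a) and (c) use the balanced-data random effects relations E(MSE) = σ² and E(MSOrder) = σ² + mστ². A sketch of the computations:

```python
# Sketch: completing the random-effects ANOVA for r = 24 orders, m = 5
# boxes per order (n = 120), and estimating sigma and sigma_tau.
ss_order, ss_total = 1052.39, 1405.59
r, m = 24, 5
n = r * m

ss_error = ss_total - ss_order
ms_order = ss_order / (r - 1)
ms_error = ss_error / (n - r)

sigma_hat = ms_error ** 0.5
# method-of-moments estimate, truncated at 0 if MS_order < MSE
sigma_tau_hat = max(0.0, (ms_order - ms_error) / m) ** 0.5
print(ms_order, ms_error, sigma_hat, sigma_tau_hat)
```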
15. The article "High Tech, High Touch" by J. Ryan (Quality Progress, 1987) discusses the quality enhancement processes used by Martin Marietta in the production of the space shuttle external (liquid oxygen) fuel tanks. It includes a graph giving counts of major hardware nonconformances for each of 41 tanks produced. The accompanying data (see next page) are approximate counts read from that graph for the last 35 tanks. (The first 6 tanks were of a different design than the others and are therefore not included here.)
(a) Make a retrospective c chart for these data. Is there evidence of real quality improvement in this series of counts of nonconformances? Explain.
(b) Consider only the last 17 tanks. Does it appear that quality was stable over the production period represented by these tanks? (Make another retrospective c chart.)
(c) It is possible that some of the figures read from the graph in the original article may differ from the real figures by as much as, say, 15 nonconformances. Would this measurement error account for the apparent lack of stability you found in (a) or (b) above? Explain.
17. Eastman, Frye, and Schnepf counted defective plastic bags in 15 consecutive groups of 250 coming off a converting machine immediately after a changeover to a new roll of plastic. Their counts are as follows:

Sample  Nonconforming   Sample  Nonconforming
1       147             9       0
2       93              10      0
3       41              11      0
4       0               12      0
5       18              13      0
6       0               14      0
7       31              15      0
8       22

Is it plausible that these data came from a physically stable process, or is it clear that there is some kind of start-up phenomenon involved here? Make and interpret an appropriate control chart to support your answer.
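A quick retrospective p chart computation for these counts might look like the following sketch (the limits are the usual 3-sigma ones based on the pooled fraction nonconforming):

```python
# Sketch: retrospective p chart for 15 samples of n = 250 bags.
counts = [147, 93, 41, 0, 18, 0, 31, 22, 0, 0, 0, 0, 0, 0, 0]
n = 250

pbar = sum(counts) / (len(counts) * n)
sd = (pbar * (1 - pbar) / n) ** 0.5
lcl, ucl = max(0.0, pbar - 3 * sd), pbar + 3 * sd

for i, x in enumerate(counts, start=1):
    phat = x / n
    flag = "out" if not (lcl <= phat <= ucl) else ""
    print(i, phat, flag)
```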
18. Sinnott, Thomas, and White compared several properties of five different brands of 10W30 motor oil. In one part of their study, they measured the boiling points of the oils. m = 3 measurements for each of the r = 5 oils follow. (Units are degrees F.)

Brand C   Brand H   Brand W   Brand Q   Brand P
378       357       321       353       390
386       365       303       349       378
388       361       306       353       381

(a) Compute and make a normal plot for the residuals for the one-way model. What does the plot indicate about the appropriateness of the one-way model assumptions?
(b) Using the five samples, find sP, the pooled estimate of σ. What does this value measure? Give a two-sided 90% confidence interval for σ based on sP.
(c) Individual two-sided confidence intervals for the five different means here would be of the form ȳᵢ ± Δ, for an appropriate number Δ. If 90% individual confidence is desired, what value of Δ should be used?
(d) Individual two-sided confidence intervals for the differences in the five different means would be of the form ȳᵢ − ȳᵢ′ ± Δ, for a number Δ. If 90% individual confidence is desired, what value of Δ should be used here?
(e) Using the P-R method, what Δ would be used to make two-sided intervals of the form ȳᵢ ± Δ for all five mean boiling points, possessing simultaneous 95% confidence?
(f) Using the Tukey method, what Δ would be used to make two-sided intervals of the form ȳᵢ − ȳᵢ′ ± Δ for all differences in the five mean boiling points, possessing simultaneous 99% confidence?
(g) Make an ANOVA table for these data. Then use the calculations to find both R² for the one-way model and also the observed level of significance for an F test of the null hypothesis that all five oils have the same mean boiling point.
(h) It is likely that the measurements represented here were all made on a single can of each brand of oil. (The students' report was not explicit about this point.) If so, the formal inferences made here are really most honestly thought of as applying to the five particular cans used in the study. Discuss why the inferences would not necessarily extend to all cans of the brands included in the study and describe the conditions under which you might be willing to make such an extension. Is the situation different if, for example, each of the measurements comes from a different can of oil, taken from different shipping lots? Explain.

19. Baik, Johnson, and Umthun worked with a small metal fabrication company on monitoring the performance of a process for cutting metal rods. Specifications for the lengths of these rods were 33.69 in. ± .03 in. Measured lengths of rods in 15 samples of m = 4 rods, made over a period of two
days, are shown in the accompanying table. (The data are recorded in inches above the target value of 33.69, and the first five samples were made on day 1, while the remainder were made on day 2.)

Sample  Rod Lengths                       x̄        R      s
1       .0075, .0100, .0135, .0135       .01113   .0060  .00293
2       −.0085, .0035, −.0180, .0010     −.00550  .0215  .00981
3       .0085, .0000, .0100, .0020       .00513   .0100  .00487
4       .0005, −.0005, .0145, .0170      .00788   .0175  .00916
5       .0130, .0035, .0120, .0070       .00888   .0095  .00444
6       −.0115, −.0110, −.0085, −.0105   −.01038  .0030  .00131
7       −.0080, −.0070, −.0060, −.0045   −.00638  .0035  .00149
8       −.0095, −.0100, −.0130, −.0165   −.01225  .0070  .00323
9       .0090, .0125, .0125, .0080       .01050   .0045  .00235
10      −.0105, −.0100, −.0150, −.0075   −.01075  .0075  .00312
11      .0115, .0150, .0175, .0180       .01550   .0065  .00297
12      .0020, .0005, .0010, .0010       .00113   .0015  .00063
13      −.0010, −.0025, −.0020, −.0030   −.00213  .0020  .00085
14      −.0020, .0015, .0025, .0025      .00113   .0045  .00214
15      −.0010, −.0015, −.0020, −.0045   −.00225  .0035  .00155

x̄̄ = .00078   R̄ = .0072   s̄ = .00339

(a) Find a retrospective center line and control limits for all 15 sample ranges. Apply them to the ranges and say what is indicated about the rod cutting process.
(b) Repeat part (a) for the sample standard deviations rather than ranges.
The initial five samples were taken while the operators were first learning to cut these particular rods. Suppose that it therefore makes sense to look separately at the last ten samples. These samples have x̄̄ = −.00159, R̄ = .00435, and s̄ = .001964.
(c) Both the ranges and standard deviations of the last ten samples look reasonably stable. What about the last ten x̄'s? (Compute control limits for the last ten x̄'s, based on either R̄ or s̄, and say what is indicated about the rod cutting process.)
As a matter of fact, the cutting process worked as follows. Rods were welded together at one end in bundles of 80, and the whole bundle cut at once. The four measurements in each sample came from a single bundle. (There are 15 bundles represented.)
(d) How does this explanation help you understand the origin of patterns discovered in the data in parts (a) through (c)?
(e) Find an estimate of the "process short-term σ" for the last ten samples. What is it really measuring in the present context?
(f) Use your estimate from (e) and, assuming that lengths of rods from a single bundle are approximately normally distributed, compute an estimate of the fraction of lengths in a bundle that are in specifications, if in fact µ = 33.69 in.
(g) Simply pooling together the last ten samples (making a single sample of size 40) and computing the sample standard deviation gives the value s = .00898. This is much larger than any s recorded for one of the samples and should be much larger than your value from (e). What is the origin of this difference in magnitude?
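For parts (e) and (f), the short-term σ can be estimated from the last ten samples' average range, and the in-specification fraction follows from a normal probability calculation (a sketch, assuming SciPy and d₂ = 2.059 for subgroups of size 4):

```python
# Sketch: short-term sigma from Rbar/d2, then the fraction of lengths in
# specifications when mu is exactly on target.
from scipy import stats

Rbar, d2 = 0.00435, 2.059        # last ten samples, subgroups of m = 4
sigma_hat = Rbar / d2

half_spec = 0.03                 # specs are 33.69 +/- .03 in.
z = half_spec / sigma_hat
frac_in = stats.norm.cdf(z) - stats.norm.cdf(-z)
print(sigma_hat, frac_in)
```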
20. Consider the last ten samples from Exercise 19. Upon considering the physical circumstances that produced the data, it becomes sensible to replace the control chart analysis done there with a random effects analysis simply meant to quantify
the within- and between-bundle variance components.
(a) Make an ANOVA table for these ten samples of size 4. Based on the mean squares, find estimates of σ, the standard deviation of lengths for a given bundle, and στ, the standard deviation of bundle mean lengths.
(b) Find and interpret a two-sided 90% confidence interval for the ratio στ/σ.
(c) What is the principal origin of variability in the lengths of rods produced by this cutting method? (Is it variability of lengths within bundles or differences between bundles?)
21. The following data appear in the text Quality Control and Industrial Statistics by A. J. Duncan. They represent the numbers of disabling injuries suffered and millions of man-hours worked at a large corporation in 12 consecutive months.

Month        1     2     3     4     5     6
Injuries     11    4     5     8     4     4
10⁶ man-hr   .175  .178  .175  .180  .183  .198

Month        7     8     9     10    11    12
Injuries     9     12    2     6     6     7
10⁶ man-hr   .210  .212  .210  .211  .195  .200

(a) Temporarily assuming the injury rate per man-hour to be stable over the period studied, find a sensible estimate of the mean injuries per 10⁶ man-hours.
(b) Based on your figure from (a), find "control limits" for the observed rates in each of the 12 months. Do these data appear to be consistent with a "stable system" view of the corporation's injury production mechanisms? Or are there months that are clearly distinguishable from the others in terms of accident rates?
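One sensible version of the part (b) computation treats the monthly counts as Poisson with month-specific exposures, giving rate-chart ("u chart") limits. A sketch:

```python
# Sketch: rate-based control limits for monthly injury rates per 10^6 man-hr.
injuries = [11, 4, 5, 8, 4, 4, 9, 12, 2, 6, 6, 7]
manhours = [.175, .178, .175, .180, .183, .198,
            .210, .212, .210, .211, .195, .200]   # in 10^6 man-hours

lam = sum(injuries) / sum(manhours)               # pooled rate, part (a)
for x, k in zip(injuries, manhours):
    lcl = max(0.0, lam - 3 * (lam / k) ** 0.5)
    ucl = lam + 3 * (lam / k) ** 0.5
    rate = x / k
    print(round(rate, 1), "outside" if not (lcl <= rate <= ucl) else "inside")
```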
22. Eder, Williams, and Bruster studied the force (applied to the cutting arm handle) required to cut various types of paper in a standard paper trimmer. The students used stacks of five sheets of four different types of paper and recorded the forces needed to move the cutter arm (and thus cut the paper). The data that follow (the units are ounces) are for m = 3 trials with each of the four paper types and also for a "baseline" condition where no paper was loaded into the trimmer.

No Paper    Newsprint   Construction  Computer    Magazine
24, 25, 31  61, 51, 52  72, 70, 77    59, 59, 70  54, 59, 61

(a) If the methods of this chapter are applied in the analysis of these data, what model assumptions must be made? With small sample sizes such as those here, only fairly crude checks on the appropriateness of the assumptions are possible. One possibility is to compute residuals and normal-plot them. Do this and comment on the appearance of the plot.
(b) Compute a pooled estimate of the standard deviation based on these five samples. What is sP supposed to be measuring in the present situation?
(c) Use the value of sP and make (individual) 95% two-sided confidence intervals for each of the five mean force requirements µNo paper, µNewsprint, µConstruction, µComputer, and µMagazine.
(d) Individual confidence intervals for the differences between particular pairs of mean force requirements are of the form ȳᵢ − ȳᵢ′ ± Δ, for an appropriate value of Δ. Use sP and find Δ if individual 95% two-sided intervals are desired.
(e) Suppose that it is desirable to compare the "no paper" force requirement to the average of the force requirements for the various papers. Give a 95% two-sided confidence interval for the quantity µNo paper − ¼(µNewsprint + µConstruction + µComputer + µMagazine).
(f) Use the P-R method of simultaneous confidence intervals and make simultaneous 95% two-sided confidence intervals for the five mean force requirements. How do the lengths of these intervals compare to the lengths of the intervals you found in part (c)? Why is it sensible that the lengths should be related in this way?
… Find Δ if individual 95% two-sided intervals are desired.
(e) Suppose that it is desirable to compare the per pulse change in average depth of cut between 100 pulses and 500 pulses to the per pulse change in average depth of cut between 500 pulses and 1,000 pulses. Give a 90% two-sided confidence interval for the quantity …

… statistics for the tests on four particular designs are given next. (The units are seconds.)

Design #1    Design #2    Design #3    Design #4
n₁ = 4       n₂ = 4       n₃ = 4       n₄ = 4
ȳ₁ = 1.640   ȳ₂ = 2.545   ȳ₃ = 1.510   ȳ₄ = 2.600
s₁ = .096    s₂ = .426    s₃ = .174    s₄ = .168
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
8.1 Basic Inference in Two-Way Factorials with Some Replication
Example 1 (Example 7, Chapter 4, revisited—page 163)
Joint Strengths for Three Different Joint Types in Three Different Woods

Consider again the wood joint strength study of Kotlers, MacFarland, and Tomlinson. Table 8.1 reorganizes the data given earlier in Table 4.11 into a 3 × 3 table
showing the nine different samples of one or two joint strengths for all combina-
tions of three woods and three joint types. The data in Table 8.1 have complete
two-way factorial structure, and seven of the nine combinations represented in
the table provide some replication.
Table 8.1
Joint Strengths for 3 × 3 Combinations of Joint Type and Wood

                        Wood
                 1 (Pine)      2 (Oak)       3 (Walnut)
Joint 1 (Butt)    829, 596      1169          1263, 1029
Type  2 (Beveled) 1348, 1207    1518, 1927    2571, 2443
      3 (Lap)     1000, 859     1295, 1561    1489
Example 1 (continued)

The place to begin a formal analysis of the wood joint strength data is with consideration of the appropriateness of the one-way (normal distributions with a common variance) model for joint strength. Table 8.2 gives some summary statistics for the data of Table 8.1.
Residuals for the joint strength data are obtained by subtracting the sample
means in Table 8.2 from the corresponding observations in Table 8.1. In this
data set, the sample sizes are so small that the residuals will obviously be highly
dependent. Those from samples of size 2 will be plus-and-minus a single number corresponding to that sample. Those from samples of size 1 will be zero. So there
is reason to expect residual plots to show some effects of this dependence. Figure
8.1 is a normal plot of the 16 residuals, and its complete symmetry (with respect
to the positive and negative residuals) is caused by this dependence.
Of course, the sample standard deviations in Table 8.2 vary somewhat, but
the ratio between the largest and smallest (a factor of about 3) is in no way
surprising based on these sample sizes of 2. (Even if only 2 rather than 7 sample
variances were involved, since 9 (= 3²) is between the .75 and .9 quantiles of the F₁,₁ distribution, the observed level of significance for testing the equality of the
two underlying variances would exceed .2 = 2(1 − .9).) And Figure 8.2, which
is a plot of residuals versus sample means, suggests no trend in σ as a function
of mean response, µ.
[Figure 8.1: a normal plot of the 16 residuals (standard normal quantile versus residual, psi). Figure 8.2: a plot of residuals (psi) versus sample means.]

In sum, the very small sample sizes represented in Table 8.1 make definitive investigation of the appropriateness of the one-way normal model assumptions impossible. But the limited checks that are possible provide no indication of serious problems with operating under those restrictions.
Notice that for these data,
sP² = [(2 − 1)s₁₁² + (2 − 1)s₁₃² + (2 − 1)s₂₁² + ··· + (2 − 1)s₃₂²] / [(2 − 1) + (2 − 1) + (2 − 1) + ··· + (2 − 1)]
    = (1/7)[(164.8)² + (165.5)² + ··· + (188.1)²]
    = 28,805 (psi)²

So

sP = √28,805 = 169.7 psi

Then individual two-sided 99% confidence limits for each mean µᵢⱼ are of the form

ȳᵢⱼ ± 3.499(169.7)√(1/nᵢⱼ)

that is,

ȳᵢⱼ ± 593.9    (8.1)

for combinations with nᵢⱼ = 1, and

ȳᵢⱼ ± 419.9    (8.2)

for combinations with nᵢⱼ = 2.
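The pooled-variance arithmetic above is easy to reproduce from the raw data of Table 8.1. A minimal Python sketch (assuming SciPy for the t quantile):

```python
# Sketch: s_P and the 99% half-widths (8.1) and (8.2) for the joint data.
from statistics import variance
from scipy import stats

samples = {  # (joint, wood) -> strengths in psi, from Table 8.1
    (1, 1): [829, 596], (1, 2): [1169], (1, 3): [1263, 1029],
    (2, 1): [1348, 1207], (2, 2): [1518, 1927], (2, 3): [2571, 2443],
    (3, 1): [1000, 859], (3, 2): [1295, 1561], (3, 3): [1489],
}

num = sum((len(y) - 1) * variance(y) for y in samples.values() if len(y) > 1)
df = sum(len(y) - 1 for y in samples.values())          # 7 here
sp = (num / df) ** 0.5

t = stats.t.ppf(0.995, df)                              # 3.499
for n in (1, 2):
    print(n, t * sp / n ** 0.5)   # 593.9 for n = 1, 419.9 for n = 2
```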
Example 1 (continued)

Figure 8.3 is an interaction plot (like Figure 4.22) enhanced with error bars made using limits (8.1) and (8.2). Notice, by the way, that the Bonferroni inequality puts
the simultaneous confidence associated with all nine of the indicated intervals at
a minimum of 91% (.91 = 1 − 9(1 − .99)).
The important message carried by Figure 8.3, not already present in Figure
4.22, is the relatively large imprecision associated with the sample means as esti-
mates of long-run mean strengths. And that imprecision has implications regard-
ing the statistical detectability of factorial effects. For example, by moving near
the extremes on some error bars in Figure 8.3, one might find nine means within
the indicated intervals such that their connecting line segments would exhibit par-
allelism. That is, the plot already suggests that the empirical interactions between
Wood Type and Joint Type seen in these data may not be large enough to distin-
guish from background noise. Or if they are detectable, they may be only barely so.
The issues of whether the empirical differences between woods and between
joint types are distinguishable from experimental variation are perhaps somewhat
easier to call. There is consistency in the patterns “Walnut is stronger than oak is
stronger than pine” and “Beveled is stronger than lap is stronger than butt.” This,
combined with differences at least approaching the size of indicated imprecisions,
[Figure 8.3: plot of sample mean stress at failure (psi, roughly 500 to 2500) versus wood type for butt, lap, and beveled joints, with error bars from limits (8.1) and (8.2).]
suggests that firm statements about the main effects of Wood Type and Joint Type
are likely possible.
The kind of analysis made thus far on the joint strength data is extremely impor-
tant and illuminating. Our discussion will proceed to more complicated statistical
methods for such problems. But these often amount primarily to a further refinement
and quantification of the two-way factorial story already told graphically by a plot
like Figure 8.3.
Use the notations ȳᵢⱼ, ȳᵢ., and ȳ.ⱼ introduced in Section 4.3, and in the obvious way (actually already used in Example 1), let µᵢⱼ stand for the underlying mean response at level i of the first factor (A) and level j of the second factor (B). The model assumptions that the I · J samples are roughly describable as independent samples from normal distributions with a common variance σ² can be written as

Two-way model statement:

yᵢⱼₖ = µᵢⱼ + εᵢⱼₖ    (8.3)
µᵢ. = (1/J) Σⱼ µᵢⱼ,   µ.ⱼ = (1/I) Σᵢ µᵢⱼ,   and   µ.. = (1/IJ) Σᵢ Σⱼ µᵢⱼ
Figure 8.4 shows these as row, column, and grand averages of the µi j . (This is the
theoretical counterpart of Figure 4.21.)
Then, following the pattern established in Definitions 5 and 6 in Chapter 4 for
sample quantities, there are the following two definitions for theoretical quantities.
Definition 1 In a two-way complete factorial study with factors A and B, the main effect
of factor A at its ith level is
αᵢ = µᵢ. − µ..

and the main effect of factor B at its jth level is

βⱼ = µ.ⱼ − µ..
These main effects are measures of how (theoretical) mean responses change
from row to row or from column to column in Figure 8.4. The fitted main effects
of Section 4.3 can be thought of as empirical approximations to them.

Figure 8.4 Row, column, and grand averages of the means µᵢⱼ:

            Factor B
            Level 1   Level 2   ···   Level J
Factor A
Level 1     µ11       µ12       ···   µ1J   | µ1.
Level 2     µ21       µ22       ···   µ2J   | µ2.
  ⋮
Level I     µI1       µI2       ···   µIJ   | µI.
            µ.1       µ.2       ···   µ.J   | µ..

It is a consequence of the form of Definition 1 that (like their empirical counterparts) main
effects of a given factor sum to 0 over levels of that factor. That is, simple algebra
shows that
Σᵢ αᵢ = 0   and   Σⱼ βⱼ = 0
Definition 2 In a two-way complete factorial study with factors A and B, the interaction
of factor A at its ith level and factor B at its jth level is
αβi j = µi j − (µ.. + αi + β j )
One simple consequence of the form of Definition 2 is that (like their empirical counterparts) the interactions sum to 0 over levels of either factor:

Σᵢ αβᵢⱼ = Σⱼ αβᵢⱼ = 0
Another simple consequence is that upon adding (µ.. + αi + β j ) to both sides of the
equation defining αβi j , one obtains a decomposition of each µi j into a grand mean
plus an A main effect plus a B main effect plus an AB interaction:

µᵢⱼ = µ.. + αᵢ + βⱼ + αβᵢⱼ    (8.4)
The identity (8.4) is sometimes combined with the two-way model equation (8.3)
to obtain the equivalent model equation
A second statement of the two-way model:

yᵢⱼₖ = µ.. + αᵢ + βⱼ + αβᵢⱼ + εᵢⱼₖ    (8.5)
Here the factorial effects appear explicitly as going into the makeup of the observa-
tions. Although there are circumstances where representation (8.5) is essential, in
most cases it is best to think of the two-way model assumptions in form (8.3) and
just remember that the αi , β j , and αβi j are simple functions of the I · J means µi j .
Analysis under the two-way model then typically involves

1. the drawing of inferences concerning the interactions and main effects, with
2. the possibility of finding A, B, or A and B “main effects only” models
adequate to describe responses, and subsequently using such simplified de-
scriptions in making predictions about system behavior.
Factorial effects are L's, fitted effects are corresponding L̂'s

The basis of inference for the αᵢ, βⱼ, and αβᵢⱼ is that they are linear combinations of the means µᵢⱼ. (That is, for properly chosen "c's," the factorial effects are "L's" from Section 7.2.) And the fitted effects defined in Chapter 4's Definitions 5 and 6 are the corresponding linear combinations of the sample means ȳᵢⱼ. (That is, the fitted factorial effects are the corresponding "L̂'s.")
Example 1 (continued)

To illustrate that the effects defined in Definitions 1 and 2 are linear combinations of the underlying means µᵢⱼ, consider α₁ and αβ₂₃ in the wood joint strength study. First,

α₁ = µ₁. − µ..
   = (1/3)(µ₁₁ + µ₁₂ + µ₁₃) − (1/9)(µ₁₁ + µ₁₂ + ··· + µ₃₂ + µ₃₃)
   = (2/9)µ₁₁ + (2/9)µ₁₂ + (2/9)µ₁₃ − (1/9)µ₂₁ − (1/9)µ₂₂ − (1/9)µ₂₃ − (1/9)µ₃₁ − (1/9)µ₃₂ − (1/9)µ₃₃
Once one realizes that the factorial effects are simple linear combinations of the µᵢⱼ, it is a small step to recognize that formula (7.20) of Section 7.2 can be applied to make confidence intervals for them. For example, the question of whether the lack of parallelism evident in Figure 8.3 is large enough to be statistically detectable can be approached by looking at confidence intervals for the αβᵢⱼ. And quantitative comparisons between joint types can be based on confidence intervals for differences between the A main effects, αᵢ − αᵢ′ = µᵢ. − µᵢ′.. And quantitative comparisons between woods can be based on differences between the B main effects, βⱼ − βⱼ′ = µ.ⱼ − µ.ⱼ′.

The only obstacle to applying formula (7.20) of Section 7.2 to do inference for factorial effects is determining how the "Σcᵢ²/nᵢ" term appearing in the formula should look for quantities of interest. In the preceding example, a number of rather odd-looking coefficients cᵢⱼ appeared when writing out expressions for α₁ and αβ₂₃ in terms of the basic means µᵢⱼ. However, it is possible to discover and write down general formulas for the sum Σcᵢⱼ²/nᵢⱼ for some important functions of the factorial effects. Table 8.3 gives the relatively simple formulas for the balanced data case where all nᵢⱼ = m. The less pleasant general versions of the formulas are given in Table 8.4.
Table 8.3
Balanced Data Formulas to Use with Limits (8.6)

L            L̂            Σᵢⱼ cᵢⱼ²/nᵢⱼ
αβᵢⱼ         abᵢⱼ         (I − 1)(J − 1)/(mIJ)
αᵢ           aᵢ           (I − 1)/(mIJ)
αᵢ − αᵢ′     aᵢ − aᵢ′     2/(mJ)
βⱼ           bⱼ           (J − 1)/(mIJ)
βⱼ − βⱼ′     bⱼ − bⱼ′     2/(mI)
Armed with Tables 8.3 and 8.4, the form of individual confidence intervals for
any of the quantities L = αβᵢⱼ, αᵢ, βⱼ, αᵢ − αᵢ′, or βⱼ − βⱼ′ is obvious. In the formula for confidence interval endpoints

Confidence limits for a linear combination of two-way factorial means:

L̂ ± t sP √( Σᵢⱼ cᵢⱼ²/nᵢⱼ )    (8.6)

2. the fitted effects from Section 4.3 are used to find L̂,
3. an appropriate formula from Table 8.3 or 8.4 is chosen to give the quantity under the radical, and …
Table 8.4
General Formulas to Use with Limits (8.6)

For L = αβᵢⱼ (L̂ = abᵢⱼ):
Σᵢⱼ cᵢⱼ²/nᵢⱼ = (1/(IJ))² [ (I − 1)²(J − 1)²/nᵢⱼ + (I − 1)² Σ_{j′≠j} 1/nᵢⱼ′ + (J − 1)² Σ_{i′≠i} 1/nᵢ′ⱼ + Σ_{i′≠i, j′≠j} 1/nᵢ′ⱼ′ ]

For L = αᵢ (L̂ = aᵢ):
Σᵢⱼ cᵢⱼ²/nᵢⱼ = (1/(IJ))² [ (I − 1)² Σⱼ 1/nᵢⱼ + Σ_{i′≠i, j} 1/nᵢ′ⱼ ]

For L = αᵢ − αᵢ′ (L̂ = aᵢ − aᵢ′):
Σᵢⱼ cᵢⱼ²/nᵢⱼ = (1/J²) [ Σⱼ 1/nᵢⱼ + Σⱼ 1/nᵢ′ⱼ ]

For L = βⱼ (L̂ = bⱼ):
Σᵢⱼ cᵢⱼ²/nᵢⱼ = (1/(IJ))² [ (J − 1)² Σᵢ 1/nᵢⱼ + Σ_{i, j′≠j} 1/nᵢⱼ′ ]

For L = βⱼ − βⱼ′ (L̂ = bⱼ − bⱼ′):
Σᵢⱼ cᵢⱼ²/nᵢⱼ = (1/I²) [ Σᵢ 1/nᵢⱼ + Σᵢ 1/nᵢⱼ′ ]
Example 2 (continued)

Then choosing t (as a quantile of the t₉ distribution) to produce the desired confidence level, equation (8.6) shows appropriate confidence limits to be …

Then again choosing t to produce the desired confidence level, equation (8.6) shows appropriate confidence limits to be

bⱼ − bⱼ′ ± t sP (.5774)

that is,

ȳ.ⱼ − ȳ.ⱼ′ ± t sP (.5774)
Example 1 (continued)

Consider making formal inferences for the factorial effects in the (unbalanced) wood joint strength study. Suppose that inferences are to be phrased in terms of two-sided 99% individual confidence intervals and begin by considering the interactions αβᵢⱼ.
Despite the students’ best efforts to the contrary, the sample sizes in Table
8.1 are not all the same. So one is forced to use formulas in Table 8.4 instead of
the simpler ones in Table 8.3. Table 8.5 collects the sums of reciprocal sample
sizes appearing in the first row of Table 8.4 for each of the nine combinations of
i = 1, 2, 3 and j = 1, 2, 3.
For example, for the combination i = 1 and j = 1,

1/n₁₁ = 1/2 = .5
1/n₁₂ + 1/n₁₃ = 1/1 + 1/2 = 1.5
1/n₂₁ + 1/n₃₁ = 1/2 + 1/2 = 1.0
1/n₂₂ + 1/n₂₃ + 1/n₃₂ + 1/n₃₃ = 1/2 + 1/2 + 1/2 + 1/1 = 2.5
Table 8.5
Sums of Reciprocal Sample Sizes Needed in Making Confidence Intervals for Joint/Wood Interactions
(columns: i, j, 1/nᵢⱼ, Σ_{j′≠j} 1/nᵢⱼ′, Σ_{i′≠i} 1/nᵢ′ⱼ, and Σ_{i′≠i, j′≠j} 1/nᵢ′ⱼ′)
The entries in Table 8.5 lead to values for Σcᵢⱼ²/nᵢⱼ via the formula on the first row of Table 8.4. Then, since (from before) sP = 169.7 psi with 7 associated degrees of freedom, and since the .995 quantile of the t₇ distribution is 3.499, it is possible to calculate the plus-or-minus part of formula (8.6) in order to get two-sided 99% confidence intervals for the αβᵢⱼ. In addition, remember that all nine fitted interactions were calculated in Section 4.3 and collected in Table 4.14 (page 170). Table 8.6 gives the √(Σcᵢⱼ²/nᵢⱼ) values, the fitted interactions abᵢⱼ, and the plus-or-minus part of two-sided 99% individual confidence intervals for the interactions αβᵢⱼ.

To illustrate the calculations summarized in the third column of Table 8.6, consider the combination with i = 1 (butt joints) and j = 1 (pine wood). Since I = 3 and J = 3, the first row of Table 8.4 shows that for L = αβ₁₁

Σᵢⱼ cᵢⱼ²/nᵢⱼ = (1/(3·3))² [ (2² · 2²)/2 + 2²(1.5) + 2²(1.0) + 2.5 ] = .2531

from which

√( Σᵢⱼ cᵢⱼ²/nᵢⱼ ) = √.2531 = .5031
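The first-row Table 8.4 formula is tedious by hand but short as code. Here is a sketch (the function name is ours, not the book's), checked against the value √.2531 = .5031 just computed:

```python
# Sketch: sum of c^2/n for the interaction alpha-beta_ij in an unbalanced
# two-way factorial, with n a dict of sample sizes keyed by (i, j).
def interaction_sum(n, i, j):
    I = max(a for a, _ in n)
    J = max(b for _, b in n)
    total = (I - 1) ** 2 * (J - 1) ** 2 / n[i, j]
    total += (I - 1) ** 2 * sum(1 / n[i, jp] for jp in range(1, J + 1) if jp != j)
    total += (J - 1) ** 2 * sum(1 / n[ip, j] for ip in range(1, I + 1) if ip != i)
    total += sum(1 / n[ip, jp] for ip in range(1, I + 1) for jp in range(1, J + 1)
                 if ip != i and jp != j)
    return total / (I * J) ** 2

n = {(1, 1): 2, (1, 2): 1, (1, 3): 2,
     (2, 1): 2, (2, 2): 2, (2, 3): 2,
     (3, 1): 2, (3, 2): 2, (3, 3): 1}
print(interaction_sum(n, 1, 1) ** 0.5)   # .5031, as in the text
```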
Only one of the nine interactions has its associated confidence interval entirely to one side of 0. That is, most of the lack of parallelism seen in Figure 8.3 is potentially attributable to experimental variation.
But that associated with beveled joints and walnut wood can be differentiated
from background noise. This suggests that if mean joint strength differences on
the order of 333 ± 299 psi are of engineering importance, it is not adequate to
think of the factors Joint Type and Wood Type as operating separately on joint
strength across all three levels of each factor. On the other hand, if attention was
restricted to either butt and lap joints or to pine and oak woods, a “no detectable
interactions” description of joint strength would perhaps be tenable.
To illustrate the use of formula (8.6) in making inferences about main effects
on joint strength, consider comparing joint strengths for pine and oak woods.
The rather extended analysis of interactions here and the character of Figure 8.3
suggest that the strength profiles of pine and oak across the three joint types are
comparable. If this is so, estimation of β1 − β2 = µ.1 − µ.2 amounts to more than
the estimation of the difference in average (across joint types) mean strengths
of pine and oak joints (pine minus oak). β1 − β2 is also the difference in mean
strengths of pine and oak joints for any of the three joint types individually. It is
thus a quantity of real interest.
Once again, since the data in Table 8.1 are not balanced, it is necessary to
use the more complicated formula in Table 8.4 rather than the formula in Table
8.3 in making a confidence interval for β1 − β2 . For L = β1 − β2 , the last row
of Table 8.4 gives
Σᵢⱼ cᵢⱼ²/nᵢⱼ = (1/3²) [ 1/2 + 1/2 + 1/2 + 1/1 + 1/2 + 1/2 ] = .3889
Then, since b₁ − b₂ = −402.5 − 64.17, formula (8.6) shows that endpoints of a two-sided 99% confidence interval for L = β₁ − β₂ are

(−402.5 − 64.17) ± 3.499(169.7)√.3889

that is,

−466.67 ± 370.29

that is,

−836.96 psi and −96.38 psi
stronger than comparable pine joints. This may seem a rather weak conclusion,
given the apparent strong increase in sample mean strengths as one moves from
pine to oak in Figure 8.3. But it is as strong a statement as is justified in the light of
the large confidence requirement (99%) and the substantial imprecision in the stu-
dents’ data (related to the small sample sizes and a large pooled standard deviation,
sP = 169.7 psi). If ±370 psi precision for comparing pine and oak joint strength is
not adequate for engineering purposes and large confidence is still desired, these
calculations point to the need for more data in order to sharpen that comparison.
… from formula (8.6) needed to make confidence limits for main effects and interactions. (The MINITAB printout lists this information for only (I − 1) factor A main effects, (J − 1) factor B main effects, and (I − 1)(J − 1) A×B interactions. Renaming levels of the factors to change their alphabetical order will produce a different printout giving this information for the remaining main effects and interactions.)
Corresponding to formulas (8.7) and (8.8) are formulas for simultaneous two-sided confidence limits for all possible differences in B main effects βⱼ − βⱼ′ = µ.ⱼ − µ.ⱼ′—namely,

Tukey simultaneous confidence limits for all differences in B main effects:

ȳ.ⱼ − ȳ.ⱼ′ ± (q*/√2) sP (1/I) √( Σᵢ 1/nᵢⱼ + Σᵢ 1/nᵢⱼ′ )    (8.9)

and

Balanced data Tukey simultaneous confidence limits for all differences in B main effects:

ȳ.ⱼ − ȳ.ⱼ′ ± q* sP / √(Im)    (8.10)

where q* is taken from Tables B.9 using ν = n − IJ degrees of freedom and number of means to be compared J.
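Where the book reads q* from Tables B.9, the same quantile is available from SciPy's studentized range distribution (SciPy 1.7 and later). A sketch of the balanced-data limits (8.10):

```python
# Sketch: Tukey half-width for all differences ybar_.j - ybar_.j' in a
# balanced I x J factorial with m observations per cell.
from scipy.stats import studentized_range

def tukey_halfwidth(sp, I, m, J, nu, conf=0.95):
    """J = number of means compared; nu = n - I*J error degrees of freedom."""
    q = studentized_range.ppf(conf, J, nu)
    return q * sp / (I * m) ** 0.5

# relabeling factors lets the same function serve formula (8.8); this call
# reproduces the +/- 164 lb half-width of the example that follows
print(tukey_halfwidth(sp=106.7, I=2, m=3, J=3, nu=12))
```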
… sP = 106.7 lb

and simultaneous two-sided confidence limits for all six means µᵢⱼ are

ȳᵢⱼ ± 3.095 (106.7/√3)

for estimating each µᵢⱼ. (k₂* = 3.095 was obtained from Table B.8A.) This is approximately

ȳᵢⱼ ± 191

The fitted effects here are

a₁ = 88.9     b₁ = −63.1
a₂ = −24.4    b₂ = 63.1
a₃ = −64.4

ab₁₁ = 15.6    ab₁₂ = −15.6
ab₂₁ = −5.4    ab₂₂ = 5.4
ab₃₁ = −10.1   ab₃₂ = 10.1

[Figure 8.5: plot of sample mean strengths (roughly 5300 to 5900 lb) versus hole placement, 1 (.5 in. from edge) and 2 (1.0 in. from edge).]
Then, since the data are balanced, one may use the formulas of Table 8.3 together
with formula (8.6). So individual confidence intervals for the interactions αβi j
are of the form
abᵢⱼ ± t(106.7)√( (3 − 1)(2 − 1)/(3·3·2) )

that is,

abᵢⱼ ± t(35.6)
Clearly, for any sensible confidence level (producing t of at least 1), such intervals
all cover 0. This confirms the lack of statistical detectability of the interactions
already represented in Figure 8.5.
It thus seems sensible to proceed to consideration of the main effects in this
tensile strength study. To illustrate the application of Tukey’s method to factorial
main effects, consider first simultaneous 95% two-sided confidence intervals
for the three differences α₁ − α₂, α₁ − α₃, and α₂ − α₃. Applying formula (8.8), intervals

ȳᵢ. − ȳᵢ′. ± (3.77)(106.7)/√(2·3)

that is,

ȳᵢ. − ȳᵢ′. ± 164 lb
are in order. No difference between the ai ’s exceeds 164 lb. That is, if simultaneous
95% confidence is desired in the comparison of the hole size main effects, one
must judge the students’ data to be interesting—perhaps even suggestive of
a decrease in strength with increased diameter—but nevertheless statistically
inconclusive. To really pin down the impact of hole size on tensile strength,
larger samples are needed.
To see that the Clubb and Goedken data do tell at least some story in a
reasonably conclusive manner, finally consider the use of the last row of Table
8.3 with formula (8.6) to make a two-sided 95% confidence interval for β2 − β1 ,
the difference in mean strengths for strips with centered holes as compared to
ones with holes .5 in. from the strip edge. The desired interval has endpoints
b₂ − b₁ ± t sP √(2/(mI))

that is,

63.1 − (−63.1) ± 2.179(106.7)√(2/(3·3))

that is,

126.2 ± 109.6

that is,

16.6 lb and 235.8 lb
Thus, although the students’ data don’t provide much precision, they are adequate
to establish clearly the existence of some decrease in tensile strength as a hole is
moved from the center of the strip towards its edge.
(Under model (8.3), formulas (8.7) and (8.9) provide an actual simultaneous confidence at
least as big as the nominal one, and when all n i j = m, formulas (8.8) and (8.10)
provide actual simultaneous confidence equal to the nominal one.) But in practical
terms, the inferences they provide (and indeed the ones provided by formula (8.6) for
individual differences in main effects) are not of much interest unless the interactions
αβi j have been judged to be negligible.
Nonnegligible interactions constitute a warning that the patterns of change in
mean response, as one moves between levels of one factor, (say, B) are different
for various levels of the other factor (say, A). That is, the pattern in the µi j is not
a simple one generally describable in terms of the two factors acting separately.
Rather than trying to understand the pattern in terms of main effects, something else
must be done.
What if interactions are not negligible?

As discussed in Section 4.4, sometimes a transformation can produce a response variable describable in terms of main effects only. At other times, restriction of
attention to part of a factorial produces a study (of reduced scope) where it makes
sense to think in terms of main effects. (In Example 1, consideration of only butt
and lap joints gives an arena where “negligible interactions” may be a sensible
description of joint strength.) Or it may be most natural to mentally separate an
I × J factorial into I (J ) different J (I ) level studies on the effects of factor B(A) at
different levels of A(B). (The 3 × 3 wood joint strength study in Example 1 might
be thought of as three different studies, one for each joint type, of the effects of wood
type on strength.) Or if none of these approaches to analyzing two-way factorial data
with important interactions is attractive, it is always possible to ignore the two-way
structure completely and treat the I · J samples as arising from simply r = I · J
unstructured different conditions.
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. The accompanying table shows part of the data of Dimond and Dix, referred to in Examples 6 (Chapter 1) and 9 (Chapter 3). The values are the shear strengths (in lb) for m = 3 tests on joints of various combinations of Wood Type and Glue Type.

Wood   Glue          Joint Shear Strengths
pine   white         130, 127, 138
pine   carpenter's   195, 194, 189
pine   cascamite     195, 202, 207
fir    white         95, 119, 62
fir    carpenter's   137, 157, 145
fir    cascamite     152, 163, 155

(a) Make an interaction plot of the six combination means and enhance it with error bars derived using the P-R method of making 95% simultaneous two-sided confidence intervals. (Plot mean strength versus glue type.)
(b) Compute the fitted main effects and interactions from the six combination sample means. Use these to make individual 95% confidence intervals for all of the main effects and interactions in this 2 × 3 factorial study. What do these indicate about the detectability of the various effects?
(c) Use Tukey's method for simultaneous comparison of main effects and give simultaneous 95% confidence intervals for all differences in Glue Type main effects.
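The cell means and fitted effects needed in part (b) follow directly from the definitions of Section 4.3. A sketch for this 2 × 3 data set:

```python
# Sketch: cell means and fitted factorial effects for the Dimond and Dix
# 2 x 3 (Wood x Glue) data.
data = {
    ("pine", "white"): [130, 127, 138],
    ("pine", "carpenter's"): [195, 194, 189],
    ("pine", "cascamite"): [195, 202, 207],
    ("fir", "white"): [95, 119, 62],
    ("fir", "carpenter's"): [137, 157, 145],
    ("fir", "cascamite"): [152, 163, 155],
}
woods = ["pine", "fir"]
glues = ["white", "carpenter's", "cascamite"]

ybar = {k: sum(v) / len(v) for k, v in data.items()}
grand = sum(ybar.values()) / len(ybar)
a = {w: sum(ybar[w, g] for g in glues) / len(glues) - grand for w in woods}
b = {g: sum(ybar[w, g] for w in woods) / len(woods) - grand for g in glues}
ab = {(w, g): ybar[w, g] - (grand + a[w] + b[g]) for w in woods for g in glues}
print(grand, a, b, ab)
```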
2. B. Choi conducted a replicated full factorial study of the stopping properties of various types of bicycle tires on various riding surfaces. Three different Types of Tires were used on the bike, and
three different Pavement Conditions were used. For each Tire Type/Pavement Condition combination, m = 6 skid mark lengths were measured. The accompanying table shows some summary statistics for the study. (The units are cm.)

                Dry Concrete   Wet Concrete   Dirt
Smooth Tires    ȳ₁₁ = 359.8    ȳ₁₂ = 366.5    ȳ₁₃ = 393.0
                s₁₁ = 19.2     s₁₂ = 26.4     s₁₃ = 25.4
Reverse Tread   ȳ₂₁ = 343.0    ȳ₂₂ = 356.7    ȳ₂₃ = 375.7
                s₂₁ = 15.5     s₂₂ = 37.4     s₂₃ = 39.9
Treaded Tires   ȳ₃₁ = 384.8    ȳ₃₂ = 400.8    ȳ₃₃ = 402.5
                s₃₁ = 15.4     s₃₂ = 60.8     s₃₃ = 32.8

(a) Compute sP for Choi's data set. What is this supposed to be measuring?
(b) Make an interaction plot of the sample means similar to Figure 8.3. Use error bars for the means calculated from individual 95% two-sided confidence limits for the means. (Make use of your value of sP from (a).)
(c) Based on your plot from (b), which factorial effects appear to be distinguishable from background noise? (Tire Type main effects? Pavement Condition main effects? Tire × Pavement interactions?)
(d) Compute all of the fitted factorial effects for Choi's data. (Find the aᵢ's, bⱼ's, and abᵢⱼ's defined in Section 4.3.)
(e) If one wishes to make individual 95% two-sided confidence intervals for the interactions αβᵢⱼ, intervals of the form abᵢⱼ ± Δ are appropriate. Find Δ. Based on this value, are there statistically detectable interactions here? How does this conclusion compare with your more qualitative answer to part (c)?
(f) If one wishes to compare Tire Type main effects, confidence intervals for the differences αᵢ − αᵢ′ are in order. Find individual 95% two-sided confidence intervals for α₁ − α₂, α₁ − α₃, and α₂ − α₃. Based on these, are there statistically detectable differences in Tire Type main effects here? How does this conclusion compare with your answer to part (c)?
(g) Redo part (f), this time using (Tukey) simultaneous 95% two-sided confidence intervals.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
8.2 p-Factor Studies with Two Levels for Each Factor

… on the power required to make a cut on a lathe at a particular depth of cut, feed rate, and spindle speed. The response variable was the vertical deflection (in mm) of the indicator needle on a dynamometer (a measurement proportional to the horsepower required to make the particular cut). Miller's data are given in Table 8.8.

[Table 8.8: Dynamometer Readings for 2³ Treatment Combinations in a Metal Cutting Study]

The most elementary view possible of the power requirement data in Table 8.8 is as r = 8 samples of size m = 4. Simple summary statistics for these 2³ = 8 samples are given in Table 8.9.

To the extent that the one-way normal model is an adequate description of this study, the methods of Chapter 7 are available for use in analyzing the data of Table 8.8. The reader is encouraged to verify that plotting of residuals (obtained by subtracting the ȳ values in Table 8.9 from the corresponding raw data values of Table 8.8) reveals only one slightly unpleasant feature of the power requirement
data relative to the potential use of standard methods of inference. When plotted
against levels of the Type of Cut variable, the residuals for interrupted cuts are
shown to be on the whole somewhat smaller than those for continuous cuts. (This
phenomenon is also obvious in retrospect from the sample standard deviations in
Table 8.9. These are smaller for the second four samples than for the first four.)
But the disparity in the sizes of the residuals is not huge. So although there may
be some basis for suspecting improvement in power requirement consistency for
interrupted cuts as opposed to continuous ones, the tractability of the one-way
model and the kind of robustness arguments put forth at the end of Section 6.3
once again suggest that the standard model and methods be used. This is sensible,
provided the resulting inferences are then treated as approximate and real-world
“close calls” are not based on them.
The pooled sample variance here is sP² = 2.23 (mm)², so

sP = 1.492 mm

and simultaneous two-sided confidence limits for the eight means µᵢⱼₖ are of the form

ȳᵢⱼₖ ± 2.969 (1.492/√4)

that is,

ȳᵢⱼₖ ± 2.21 mm

(There is enough precision provided by the data to think of the sample means in Table 8.9 as roughly "all good to within 2.21 mm.") And the other methods of Sections 7.1 through 7.4 based on sP might be used as well.
… and further continue the dot notations used in Section 4.3 for unweighted averages
of the ȳ i jk . In comparison to the notation of Chapter 7, this amounts to adding two
subscripts in order to acknowledge the three-way structure in the samples.
The use of additional subscripts is helpful not only for naming empirical quan-
tities but also for naming theoretical quantities. That is, with µᵢⱼₖ standing for the underlying mean response at level i of factor A, level j of factor B, and level k of factor C, one can write

yᵢⱼₖₗ = µᵢⱼₖ + εᵢⱼₖₗ    (8.11)

where the εᵢⱼₖₗ terms are iid normal random variables with mean 0 and variance σ².
Formula (8.11) could be called the three-way (normal) model equation because it
recognizes the special organization of the I · J · K samples according to combina-
tions of levels of the three factors. But beyond this, it says no more or less than the
one-way model equation from Section 7.1.
The initial objects of inference in three-way factorial analyses are linear com-
binations of theoretical means µi jk , analogous to the fitted effects of Section 4.3.
Thus, it is necessary to carefully define the theoretical or underlying main effects,
2-factor interactions, and 3-factor interactions for a three-way factorial study. In the
definitions that follow, a dot appearing as a subscript will (as usual) be understood to
indicate that an average has been taken over all levels of the factor corresponding to
the dotted subscript. Consider first main effects. Parallel to Definition 7 in Chapter 4
(page 182) for fitted main effects is a definition of theoretical main effects.
Definition 3 In a three-way complete factorial study with factors A, B, and C, the main
effect of factor A at its ith level is
αᵢ = µᵢ.. − µ...

the main effect of factor B at its jth level is

βⱼ = µ.ⱼ. − µ...

and the main effect of factor C at its kth level is

γₖ = µ..ₖ − µ...
These main effects measure how (when averaged over all combinations of levels
of the other factors) underlying mean responses change from level to level of the
factor in question. Definition 3 has the algebraic consequences that
Σᵢ αᵢ = 0,   Σⱼ βⱼ = 0,   and   Σₖ γₖ = 0
Definition 4 In a three-way complete factorial study with factors A, B, and C, the 2-factor
interaction of factor A at its ith level and factor B at its jth level is
αβi j = µi j. − (µ... + αi + β j )
the 2-factor interaction of A at its ith level and C at its kth level is

αγᵢₖ = µᵢ.ₖ − (µ... + αᵢ + γₖ)

and the 2-factor interaction of B at its jth level and C at its kth level is

βγⱼₖ = µ.ⱼₖ − (µ... + βⱼ + γₖ)
Like their empirical counterparts defined in Section 4.3, the 2-factor interactions
in a three-way study are measures of lack of parallelism on two-way plots of means
obtained by averaging out over levels of the “other” factor. And it is an algebraic
consequence of the form of Definition 4 that
∑(i=1 to I) αβij = ∑(j=1 to J) αβij = 0,   ∑(i=1 to I) αγik = ∑(k=1 to K) αγik = 0

and

∑(j=1 to J) βγjk = ∑(k=1 to K) βγjk = 0
Definition 5  In a three-way complete factorial study with factors A, B, and C, the 3-factor interaction of factor A at its ith level, factor B at its jth level, and factor C at its kth level is

αβγijk = μijk − (μ... + αi + βj + γk + αβij + αγik + βγjk)
Like their fitted counterparts, the (theoretical) 3-factor interactions are measures of
patterns in the µi jk not describable in terms of the factors acting separately or in pairs.
Or differently put, they measure how much what one would call the AB interactions
at a single level of C change from level to level of C. And, like the fitted 3-factor
interactions defined in Section 4.3, the theoretical 3-factor interactions defined here
sum to 0 over levels of any one of the factors. That is,
∑(i=1 to I) αβγijk = ∑(j=1 to J) αβγijk = ∑(k=1 to K) αβγijk = 0
Factorial effects are L's, fitted effects are the corresponding L̂'s
The fundamental fact that makes inference for the factorial effects defined in Definitions 3, 4, and 5 possible is that they are particular linear combinations of the means μijk (L's from Section 7.2). And the fitted effects from Section 4.3 are the corresponding linear combinations of the sample means ȳijk (L̂'s from Section 7.2).
So at least in theory, to make confidence intervals for the factorial effects, one needs
only to figure out exactly what coefficients are applied to each of the means and use
formula (7.20) of Section 7.2.
For example, for the effect αγ23 (in a study where I = 2, J = 2, and K = 3) one gets

∑ c²ijk/nijk = (1/6)² (1/n213 + 1/n223 + 1/n113 + 1/n123)
             + (1/12)² (1/n211 + 1/n221 + 1/n212 + 1/n222 + 1/n111 + 1/n121 + 1/n112 + 1/n122)
and using this expression, endpoints for a confidence interval for αγ23 are
ac23 ± t sP √(∑ c²ijk/nijk)
It is possible to work out (unpleasant) general formulas for the "∑ c²i/ni" terms for factorial effects in arbitrary p-way factorials and implement them in computer software. It is not consistent with the purposes of this book to lay those out here. However, in the special case of 2^p factorials, there is no difficulty in describing how to make confidence intervals for the effects or in carrying out a fairly complete analysis of all of these "by hand" for p as large as even 4 or 5. This is because the 2^p case of the general p-way factorial structure allows three important simplifications.
Coefficients applied to means to produce 2^p factorial effects are all ±1/2^p!
First, for any factorial effect in a 2^p factorial, the coefficients "ci" applied to the means to produce the effect are all ±1/2^p. So the "∑ c²i/ni" term needed to make a confidence interval for any effect in a 2^p factorial is

(1/2^p)² (1/n(1) + 1/na + 1/nb + 1/nab + ···)
where the subscripts (1), a, b, ab, etc. refer to the combination-naming convention for 2^p factorials introduced in Section 4.3.
So let E stand for a generic effect in a 2^p factorial (a particular kind of L from Section 7.2) and Ê be the corresponding fitted effect (the corresponding L̂ from Section 7.2). Then endpoints of an individual two-sided confidence interval for E are

Individual confidence limits for an effect in a 2^p factorial

Ê ± t sP (1/2^p) √(1/n(1) + 1/na + 1/nb + 1/nab + ···)    (8.12)

where the associated confidence is the probability that the t distribution with ν = n − r = n − 2^p degrees of freedom assigns to the interval between −t and t. The usual device of using only one endpoint from formula (8.12) and halving the unconfidence produces a one-sided confidence interval for the effect. And in balanced-data situations where all sample sizes are equal to m, formula (8.12) can be written even more simply as

Balanced data confidence limits for an effect in a 2^p factorial

Ê ± t sP/√(m 2^p)    (8.13)
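In code, formula (8.13) is a one-line computation once a t quantile is available. The following is a minimal sketch, assuming SciPy is installed; the function name effect_interval and its argument names are illustrative, not from the text.

import math
from scipy.stats import t as t_dist

def effect_interval(e_hat, s_pooled, m, p, conf=0.90):
    """Two-sided limits (8.13) for one effect in a balanced 2^p factorial."""
    df = m * 2**p - 2**p                       # nu = n - 2^p for balanced data
    t_val = t_dist.ppf(1 - (1 - conf) / 2, df)
    half_width = t_val * s_pooled / math.sqrt(m * 2**p)
    return e_hat - half_width, e_hat + half_width

With sP = 1.492, m = 4, p = 3, and 90% confidence (the figures used in the example below), the half-width works out to about .45.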
Estimating one 2^p effect of a given type is enough
There is a second simplification of the general p-way factorial situation afforded in the 2^p case. Because of the way factorial effects sum to 0 over levels of any factor, when every factor has only two levels each effect of a given type determines all the others of that type (for example, α1 = −α2 and αβ11 = αβ22 = −αβ12 = −αβ21). So it suffices to estimate only the effects corresponding to the all-high-levels combination of the factors involved.
Example 4 (continued)
Consider again the metal working power requirement study. Agreeing to (arbitrarily) name tool type 2, the 30° tool bevel angle, and the interrupted cut type as the "high" levels of (respectively) factors A, B, and C, the eight combinations of the three factors are listed in Table 8.9 in Yates standard order. Taking the sample means from that table in the order listed, the Yates algorithm can be applied to produce the fitted effects for the high levels of all factors, as in Table 8.10.
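Mechanically, the Yates algorithm is a short computation. The following minimal Python sketch (the function name yates is ours, not the text's) carries out the p cycles of sums and differences followed by the final division by 2^p.

def yates(means):
    """Yates algorithm for a 2^p factorial.

    means : list of 2^p sample means in Yates standard order
            ((1), a, b, ab, c, ac, ...).
    Returns the grand mean followed by the fitted effects
    (a2, b2, ab22, c2, ...) in Yates order.
    """
    n = len(means)
    p = n.bit_length() - 1                  # n = 2^p
    col = list(means)
    for _ in range(p):                      # p cycles
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs
    return [v / n for v in col]             # final division by 2^p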
Recall that for the data of Table 8.8, m = 4 and sP = 1.492 mm with 24 (= 32 − 2^3) associated degrees of freedom. So one has (from formula (8.13)) that for (say) individual 90% confidence, the factorial effects in this example can be estimated with two-sided intervals having endpoints

Ê ± 1.711 (1.492/√(4 · 2^3))
Table 8.10
The Yates Algorithm Applied to the Means in Table 8.9
that is,

Ê ± .45
Then, comparing the fitted effects in the last column of Table 8.10 to the ±.45
value, note that only the main effects of Tool Bevel Angle (factor B) and Type
of Cut (factor C) are statistically detectable. And for example, it appears that
running the machining process at the high level of factor B (the 30◦ bevel angle)
produces a dynamometer reading that is on average between approximately
(For simplicity, this presentation will use the full normal plot modification of Daniel's method. The idea of half normal plotting is considered further in Chapter Exercise 9.)
In Chapter 4, the effects of the factors

Factor A  Load
Factor B  Flow Rate
Factor C  Rotational Speed
Factor D  Type of Mud

on the logarithm of the advance rate of a small stone drill were considered. (The raw data are in Table 4.24.) The Yates algorithm applied to the 16 = 2^4 observed log advance rates produced the following fitted effects:
ȳ.... = 1.5977
a2 = .0650 b2 = .2900 c2 = .5772 d2 = .1633
ab22 = −.0172 ac22 = .0052 ad22 = .0334
bc22 = −.0251 bd22 = −.0075 cd22 = .0491
abc222 = .0052 abd222 = .0261 acd222 = .0266
bcd222 = −.0173 abcd2222 = .0193
[Figure 8.6: Normal plot of the fitted effects for Daniel's drill advance rate study. Horizontal axis: fitted effect quantile (−.02 to .58); vertical axis: standard normal quantile. The points for c2, b2, and d2 fall well off the line established by the other fitted effects.]
Interpreting a normal plot of fitted effects
Example 6 is one in which the normal plotting clearly identifies a few effects as larger than the others. However, a normal plot of fitted effects sometimes has a fairly straight-line appearance. When this happens, the message is that the fitted
effects are potentially explainable as resulting from background variation. And it is
risky to make real-world engineering decisions based on fitted effects that haven’t
been definitively established as representing consistent system reactions to changes
in level of the corresponding factors. A linear normal plot of fitted effects from an
unreplicated 2 p study says that more data are needed.
This normal-plotting device has been introduced primarily as a tool for analyzing
data lacking any replication. However, the method is useful even in cases where there
is some replication and sP can therefore be calculated and formula (8.12) or (8.13)
used to judge the detectability of the various factorial effects. Some practice making
and using such plots will show that the process often amounts to a helpful kind of
“data fondling.” Many times, a bit of thought makes it possible to trace an unusual
pattern on such a plot back to a previously unnoticed peculiarity in the data.
As an example, consider what a normal plot of fitted effects would point out
about the following eight hypothetical sample means.
ȳ (1) = 95 ȳ c = 145
ȳ a = 101 ȳ ac = 103
ȳ b = 106 ȳ bc = 107
ȳ ab = 106 ȳ abc = 97
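To make the plotting recipe concrete, here is a minimal sketch (assuming matplotlib and SciPy are available, and reusing the hypothetical yates() helper sketched earlier) that computes and normal-plots the last seven fitted effects from these eight means.

from scipy.stats import norm
import matplotlib.pyplot as plt

means = [95, 101, 106, 106, 145, 103, 107, 97]   # (1), a, b, ab, c, ac, bc, abc
effects = sorted(yates(means)[1:])               # drop the grand mean, order the rest
k = len(effects)
normal_quantiles = norm.ppf([(i - 0.5) / k for i in range(1, k + 1)])

plt.scatter(effects, normal_quantiles)
plt.xlabel("Fitted effect quantile")
plt.ylabel("Standard normal quantile")
plt.show()

Because the single unusual mean ȳc = 145 enters every contrast with coefficient ±1/8, it displaces the whole pattern of plotted points rather than producing one stray point; that is exactly the sort of peculiarity such a plot can reveal.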
[Figure 8.7: Normal plot of the fitted effects computed from the eight hypothetical means. Horizontal axis: fitted effect quantile (−5 to 5); vertical axis: standard normal quantile.]
Tentative identification of a few dominant effects leads naturally to

1. the fitting and checking (residual analysis) of the simplified model, and even to
2. the making of formal inferences under the restricted/simplified model assumptions.
When a 2^p factorial data set is balanced, the model fitting, checking, and subsequent interval-oriented inference is straightforward.
With balanced 2^p factorial data, producing least squares fitted values is no more difficult than adding together (with appropriate signs) desired fitted effects and the grand sample mean. Or equivalently and more efficiently, the reverse Yates algorithm can be used.
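A minimal sketch of the reverse Yates algorithm (the function name is ours): write the effects column in reverse order, apply the usual cycles of sums and differences, and omit the final division.

def reverse_yates(effects):
    """Reverse Yates algorithm for balanced 2^p data.

    effects : list of length 2^p in Yates order, starting with the grand
              mean; entries for effects dropped from a reduced model are 0.
    Returns fitted values for the 2^p combinations in reverse Yates
    standard order (abc, bc, ac, c, ab, b, a, (1) when p = 3).
    """
    n = len(effects)
    p = n.bit_length() - 1
    col = list(reversed(effects))           # effects column, bottom to top
    for _ in range(p):
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs
    return col                              # no division in the reverse version

For example, reverse_yates([27.7969, 0, .7969, 0, -.9844, 0, 0, 0]) (the grand mean and the b2 and c2 fitted effects of the example below, with all other effects zeroed) returns the fitted values of Table 8.11, listed in reverse Yates order.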
Example 4 (continued)
In the power requirement study and the data of Table 8.8, only the B and C main effects seem detectably nonzero. So it is reasonable to think of the simplified version of model (8.11),

yijkl = μ... + βj + γk + εijkl    (8.14)

for possible use in describing dynamometer readings. From Table 8.10, the fitted
version of µ... is ȳ ... = 27.7969, the fitted version of β2 is b2 = .7969, and the
fitted version of γ2 is c2 = −.9844. Then, simply adding together appropriate
signed versions of the fitted effects, for the four possible combinations of j and
k, produces the corresponding fitted responses in Table 8.11. So for example, as
long as the 15◦ bevel angle (low level of B) and a continuous cut (low level of C)
are being considered, a fitted dynamometer reading of about 27.98 is appropriate
under the simplified model (8.14).
Table 8.11
Fitted Responses for a “B and C Main Effects Only”
Description of Power Requirement
j   k   bj        ck        ŷ = ȳ... + bj + ck
1 1 −.7969 .9844 27.9844
2 1 .7969 .9844 29.5782
1 2 −.7969 −.9844 26.0156
2 2 .7969 −.9844 27.6094
Example 6 (continued)
Having identified the C, B, and D main effects as detectably larger than the A main effect or any of the interactions in the drill advance rate study, it is natural to consider fitting the model

yijkl = μ.... + βj + γk + δl + εijkl    (8.15)

to the logarithms of the unreplicated 2^4 factorial data of Table 4.24. (Note that even though p = 4 factors are involved here, five subscripts are not required, since a subscript is not needed to differentiate between multiple members of the 2^4 different samples in this unreplicated context. yijkl is the single observation at the ith level of A, jth level of B, kth level of C, and lth level of D.) Since
the drill advance rate data are balanced (all sample sizes are m = 1), the fitted
effects given earlier (calculated without reference to the simplified model) serve
as fitted effects under model (8.15). And fitted responses under model (8.15) are
obtainable by simple addition and subtraction using those.
Since there are eight different combinations of j, k, and l, eight different linear combinations of ȳ...., b2, c2, and d2 are required. While these could be treated one at a time, it is more efficient to generate them all at once using the
reverse Yates algorithm (from Section 4.3) as in Table 8.12. From Table 8.12 it
is evident, for example, that the fitted mean responses for combinations bcd and
abcd ( ŷ bcd and ŷ abcd ) are both 2.6282.
Fitted means derived as in these examples lead in the usual way to residuals, R² values, and plots for checking on the reasonableness of simplified versions of the general 2^p version of model (8.11). In addition, corresponding to simplified or reduced models like (8.14) or (8.15), there are what will here be called few-effects s² values. When m > 1, these can be compared to sP² as another means of investigating the reasonableness of the corresponding models. For a reduced model involving u of the possible parameters (the grand mean plus u − 1 of the effects),

sFE² = (1/(m·2^p − u)) ∑(y − ŷ)²    (8.16)

or equivalently

sFE² = (SSTot − m·2^p ∑ Ê²)/(m·2^p − u)    (8.17)

where the sum in formula (8.17) is over the squares of the u − 1 fitted effects corresponding to those main effects and interactions appearing in the reduced model equation, and (as always) SSTot = ∑(y − ȳ)² = (n − 1)s².
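As a small computational sketch of formula (8.17) for balanced data (the function name few_effects_s2 is illustrative, not from the text):

def few_effects_s2(ss_tot, kept_effects, m, p):
    """Few-effects variance (8.17) for balanced 2^p data.

    kept_effects : the fitted effects (grand mean excluded) retained in
                   the reduced model, so u = len(kept_effects) + 1.
    """
    u = len(kept_effects) + 1
    sse = ss_tot - m * 2**p * sum(e**2 for e in kept_effects)
    return sse / (m * 2**p - u)

few_effects_s2(108.93, [.7969, -.9844], 4, 3) returns 1.986, agreeing with the power requirement calculation below.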
Example 4 (continued)
Residuals for the power requirement data based on the full model (8.11) are obtained by subtracting sample means in Table 8.9 from observations in Table 8.8. Under the reduced model (8.14), however, the fitted values in Table 8.11 are appropriate for producing residuals. The fitted means and residuals for a "B and C main effects only" description of this 2^3 data set are given in Table 8.13. Figure 8.8 is a normal plot of these residuals, and Figure 8.9 is a plot of the residuals against the fitted values.
If there is anything remarkable in these plots, it is that Figure 8.9 contains a
hint that smaller mean response has associated with it smaller response variability.
In fact, looking back at Table 8.13, it is easy to see that the two smallest fitted
means correspond to the high level of C (i.e., interrupted cuts). That is, the hint of
change in response variation shown in Figure 8.9 is the same phenomenon related
Table 8.13
Residuals for the "B and C Main Effects Only" Model of Power Requirement

[Figure 8.8: Normal plot of the residuals in Table 8.13 (standard normal quantile vs. residual quantile). Figure 8.9: Plot of the residuals against the corresponding fitted values. In both plots, plotted digits (2, 3) indicate numbers of coincident points.]
to cut type that was discussed when these data were first introduced. It appears
that power requirements for interrupted cuts may be slightly more consistent than
for continuous cuts. But on the whole, there is little in the two figures to invalidate
model (8.14) as at least a rough-and-ready description of the mechanism behind
the data of Table 8.8.
For the power requirement data, SSTot = ∑(y − ȳ)² = 108.93. Then, since sP² = 2.226 with 24 associated degrees of freedom (so that SSE = 24(2.226) = 53.42 and SSTr = 108.93 − 53.42 = 55.51), the one-way ANOVA identity (7.49, 7.50, or 7.51) of Section 7.4 says that

R² = SSTr/SSTot = 55.51/108.93 = .51
On the other hand, it is possible to verify that for the simplified model (8.14),
squaring and summing the residuals in Table 8.13 gives
SSE = ∑(y − ŷ)² = 57.60

(Recall Definition 6 in Chapter 7 for SSE.) So for the "B and C main effects only" description of dynamometer readings,

R² = (108.93 − 57.60)/108.93 = .47
Thus, although at best only about 51% of the raw variation in dynamometer
readings will be accounted for, fitting the simple model (8.14) will account for
nearly all of that potentially assignable variation. So from this point of view as
well, model (8.14) seems attractive as a description of power requirement.
Note that formulas (8.16) and (8.17) imply that for balanced 2^p factorial data, fitting reduced models gives

∑(y − ŷ)² = SSTot − m·2^p ∑ Ê²

So it is not surprising that using the b2 = .7969 and c2 = −.9844 figures from before,

SSTot − m·2^p ∑ Ê² = 108.93 − 4 · 2^3 · ((.7969)² + (−.9844)²)
                   = 108.93 − 51.33
                   = 57.60

which is the value of ∑(y − ŷ)² just used in finding R² for the reduced model.
From formula (8.16) or (8.17), it is then clear that (corresponding to reduced model (8.14))

sFE² = (1/(4 · 2^3 − 3))(57.60) = 1.986

so

sFE = √1.986 = 1.409 mm
which agrees closely with sP = 1.492. Once again on this account, description (8.14) seems quite workable.
Example 6 (continued)
Table 8.14 contains the log advance rates, fitted values, and residuals for Daniel's unreplicated 2^4 example. (The raw data were given in Table 4.24, and it is the few-effects model (8.15) that is under consideration.)
The reader can verify by plotting that the residuals in Table 8.14 are not in any way remarkable. Further, it is possible to check that

SSTot = ∑(y − ȳ)² = 7.2774

and

SSE = ∑(y − ŷ)² = .1736

So (as indicated earlier in Example 12 in Chapter 4) for the use of model (8.15),

R² = (7.2774 − .1736)/7.2774 = .976
Table 8.14
Responses, Fitted Values, and Residuals for the “B, C, and D
Main Effects” Model and Daniel’s Drill Advance Rate Data
Since there is no replication in this data set, fitting the 4-factor version of the general model (8.11) would give a perfect fit, R² equal to 1.000, all residuals equal to 0, and no value of sP². Thus, there is really nothing to judge R² = .976 against in relative terms. But even in absolute terms it appears that the "B, C, and D main effects only" model for log advance rate fits the data well.
An estimate of the variability of log advance rates for a fixed combination of factor levels, derived under the assumptions of model (8.15), is (from formula (8.16))

sFE = √((1/(1 · 2^4 − 4))(.1736)) = .120
As noted, there’s no sP to compare this to, but it is at least consistent with the kind
of variation in y seen in Table 8.14 when responses are compared for pairs of
combinations that (like combinations b and ab) differ only in level of the factor A.
It can further be argued that for balanced 2^p data and a reduced model involving u parameters, each fitted mean ŷ has

Var ŷ = u σ²/(m·2^p)

so that

ŷ ± t sFE √(u/(m·2^p))    (8.18)

may be used as an individual confidence interval for the corresponding mean response. The associated confidence is the probability that the t distribution with ν = m·2^p − u degrees of freedom assigns to the interval between −t and t. And a one-sided confidence interval for the mean response can be obtained in the usual way, by employing only one of the endpoints indicated in formula (8.18) and appropriately adjusting the confidence level.
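A minimal sketch of interval (8.18), again assuming SciPy and using names of our own choosing:

import math
from scipy.stats import t as t_dist

def mean_response_interval(y_hat, s_fe, u, m, p, conf=0.95):
    """Two-sided limits (8.18) for a mean response under a u-parameter
    reduced model fit to balanced 2^p factorial data."""
    df = m * 2**p - u
    t_val = t_dist.ppf(1 - (1 - conf) / 2, df)
    half_width = t_val * s_fe * math.sqrt(u / (m * 2**p))
    return y_hat - half_width, y_hat + half_width

mean_response_interval(26.02, 1.409, 3, 4, 3) reproduces the 26.02 ± 2.045(1.409)√(3/32) computation in the example that follows.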
Example 4 (continued)
Consider estimating the mean dynamometer reading corresponding to a 15° bevel angle and interrupted cut using the "B and C main effects only" description of Miller's power requirement study. (These are the conditions that appear to produce the smallest mean power requirement.) Using (for example) 95% confidence, a fitted value of 26.02 from Table 8.11, and sFE = 1.409 mm possessing ν = 4 · 2^3 − 3 = 29 associated degrees of freedom in formula (8.18), leads to a two-sided interval with endpoints

26.02 ± 2.045(1.409)√(3/(4 · 2^3))

that is, endpoints

26.02 ± .88 mm    (8.19)

that is,

25.14 mm and 26.90 mm
In contrast to this interval, consider what the method of Section 7.2 provides
for a 95% confidence interval for the mean reading for tool type 1, a 15◦ bevel
angle, and interrupted cuts. Since sP = 1.492 with ν = 24 associated degrees of
freedom, and (from Table 8.9) ȳ c = 26.50, formula (7.14) of Section 7.2 produces
a two-sided confidence interval for µc with endpoints
26.50 ± 2.064(1.492)(1/√4)

that is, endpoints

26.50 ± 1.54 mm    (8.20)

that is,

24.96 mm and 28.04 mm
A major practical difference between intervals (8.19) and (8.20) is the apparent increase in precision provided by interval (8.19), due in numerical terms primarily to the "extra" √(3/8) factor present in the first plus-or-minus calculation but not in the second. However, it must be remembered that the extra precision is bought at the price of the use of model (8.14) and the consequent use of all observations in the generation of ŷc (rather than only the observations from the single sample corresponding to combination c).
In the same way, formula (8.13) can be modified by replacing sP with sFE (and using ν = m·2^p − u degrees of freedom), so that

Ê ± t sFE/√(m·2^p)    (8.21)

serves as an individual two-sided confidence interval for a factorial effect appearing in the reduced model.
Example 6 (continued)
Consider again Daniel's drill advance rate study and, for example, the effect of the high level of rotational speed on the natural logarithm of advance rate. Under the "B, C, and D main effects only" description of log advance rate, sFE = .120 with ν = 1 · 2^4 − 4 = 12 associated degrees of freedom. Also, c2 = .5772. Then (for example) using a 95% confidence level, from formula (8.21), a two-sided interval for γ2 under the simplified model has endpoints

.5772 ± 2.179 (.120/√(1 · 2^4))

that is,

.5772 ± .0654

that is,

.5118 and .6426

Since γ1 = −γ2, these limits correspond to an increase of between 2(.5118) ≈ 1.02 and 2(.6426) ≈ 1.29 in average log advance rate as one moves from the low level of rotational speed to the high level. And upon exponentiation, a multiplication of median advance rate by a factor between e^1.02 ≈ 2.8 and e^1.29 ≈ 3.6 is indicated.
There are other ways to use the reduced model ideas discussed here. For exam-
ple, a simplified model for responses can be used to produce prediction and tolerance
intervals for individuals. Section 8.3 of Vardeman’s Statistics for Engineering Prob-
lem Solving is one place to find an exposition of these additional methods.
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Consider again the situation of Exercise 2 of Section 4.3.
(a) For the logged responses, make individual 95% confidence intervals for the effects corresponding to the high levels of all three factors. Which effects are statistically detectable?
(b) Fit an appropriate few-effects model suggested by your work in (a) to these data. Compare the corresponding value of sFE to the value of sP.
(c) Compare a two-sided individual 95% confidence interval for the mean (logged) response for combination (1) made using the fitted few-effects model to one based on the methods of Section 7.2.
2. Chapter Exercise 9 in Chapter 4 concerns the making of Dual In-line Packages and the number of pullouts produced on such devices under 2^4 different combinations of manufacturing conditions. Return to that exercise, and if you have not already done so, use the Yates algorithm and compute fitted 2^4 factorial effects for the data set.
(a) Use normal-plotting to identify statistically detectable effects here.
(b) Based on your analysis from (a), postulate a possible few-effects model for this situation. Use the reverse Yates algorithm to fit such a model to these data. Use the fitted values to compute residuals. Normal-plot these and plot them against levels of each of the four factors, looking for obvious problems with the model.
(c) Based on your few-effects model, make a recommendation for the future making of these devices. Give a 95% two-sided confidence interval (based on the few-effects model) for the mean pullouts you expect to experience if your advice is followed.
3. A classic unreplicated 2^4 factorial study, used as an example in Experimental Statistics (NBS Handbook
#91) by M. G. Natrella, concerns flame tests of fire-retardant treatments for cloth. The factors and levels used in the study were

A Fabric Tested          sateen (−) vs. monk's cloth (+)
B Treatment              X (−) vs. Y (+)
C Laundering Condition   before (−) vs. after (+)
D Direction of Test      warp (−) vs. fill (+)

The response variable, y, is the inches burned on a standard-size sample in the flame test. The data reported by Natrella follow:

Combination   y     Combination   y
(1)           4.2   d             4.0
a             3.1   ad            3.0
b             4.5   bd            5.0
ab            2.9   abd           2.5
c             3.9   cd            4.0
ac            2.8   acd           2.5
bc            4.6   bcd           5.0
abc           3.2   abcd          2.3

(a) Use the (four-cycle) Yates algorithm and compute the fitted 2^4 factorial effects for the study.
(b) Make either a normal plot or a half normal plot using the fitted effects from part (a). What subject-matter interpretation of the data is suggested by the plot? (See Chapter Exercise 9 regarding half normal-plotting.)
(c) Natrella's original analysis of these data produced the conclusion that both the A main effects and the AB two-factor interactions are statistically detectable and of practical importance. We (based on a plot like the one asked for in (b)) are inclined to doubt that the data are really adequate to detect the AB interaction. But for the sake of example, temporarily accept the conclusion of Natrella's analysis. What does it say in practical terms about the fire-retardant treating of cloth? (How would you explain the results to a clothing manufacturer?)
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
8.3 Standard Fractions of Two-Level Factorials, Part I: 1/2 Fractions

This section begins with some general qualitative remarks about fractional factorial experimentation. The standard 1/2 fractions of 2^p studies (the 2^(p−1) fractional factorials) are then discussed in detail. The section covers in turn (1) the proper choice of such fractions, (2) the resultant aliasing or confounding patterns, and (3) corresponding methods of data analysis. The section closes with a few remarks about qualitative issues, addressed to the practical use of 2^(p−1) designs.
For concreteness, consider a situation where only two of the 2^2 = 4 possible combinations of levels of two two-level factors A and B,

(1), a, b, and ab

can be included in a study. A choice like

(1) and a

is no good, since in both cases the factor B is held at its low level. Therefore, no information at all would be obtained on B's impact on the response. Similarly, the possibility of studying the combinations

(1) and b

makes no sense, since factor A would then be held at its low level throughout. This leaves only the pair

(1) and ab

as a 1/2 fraction of the full 2^2 factorial that is at all sensible (if combination (1) is to be included). Similar reasoning eliminates all other pairs of combinations from potential use except the pair

a and b
But now notice that any experiment that includes only combinations
(1) and ab
or combinations
a and b
must inevitably produce somewhat ambiguous results. Since one moves from
combination (1) to combination ab (or from a to b) by changing levels of both
factors, if a large difference in response is observed, it will not be clear whether
the difference is due to A or due to B.
At least in qualitative terms, such is the nature of all fractional factorial stud-
ies. Although very poor choices of experimental combinations may be avoided,
some level of ambiguity must be accepted as the price for not conducting a full
factorial.
To make this concrete, suppose that (unknown to an investigator) a three-factor system has underlying effects

μ... = 10, α2 = 3, β2 = 1, γ2 = 2,
αβ22 = 2, αγ22 = 0, βγ22 = 0, αβγ222 = 0

Either through the use of the reverse Yates algorithm or otherwise, it is possible to verify that corresponding to these effects are then the eight combination means

μ(1) = 6, μa = 8, μb = 4, μab = 14, μc = 10, μac = 12, μbc = 8, μabc = 18
Now imagine that for some reason, only four of the eight combinations of
levels of A, B, and C will be included in a study of this system, namely the
combinations
a, b, c, and abc

so that the means for the four observed combinations are

μa = 8, μb = 4, μc = 10, μabc = 18
Figure 8.10 shows the complete set of eight combination means laid out on a
cube plot, with the four observed means circled.
As a sidelight, note the admirable symmetry possessed by the four circled
corners on Figure 8.10. Each face of the cube has two circled corners (both levels
of all factors appear twice in the choice of treatment combinations). Each edge
has one circled corner (each combination of all pairs of factors appears once).
And collapsing the cube in any one of the three possible directions (left to right,
top to bottom, or front to back) gives a full factorial set of four combinations.
(Ignoring the level of any one of A, B, or C in the four combinations a, b, c, and
abc gives a full factorial in the other two factors.)
[Figure 8.10: Cube plot of the eight combination means, with Factor A left to right, Factor B bottom to top, and Factor C front to back: μ(1) = 6, μa = 8, μb = 4, μab = 14, μc = 10, μac = 12, μbc = 8, μabc = 18. The four observed means μa, μb, μc, and μabc are circled.]
Now, by Definition 3,

α2 = μ2.. − μ...
   = (the average of all four mean responses where A is at its second or high level) − (the grand average of all eight mean responses)
which can be thought of as the right-face average minus the grand average for the
cube in Figure 8.10. Armed only with the four means µa , µb , µc , and µabc (the
four circled corners on Figure 8.10), it is not possible to compute α2 . But what
might be done is to make a calculation similar to the one that produces α2 using
only the available means. That is, one might compute

α2* = (μa + μabc)/2 − (μa + μb + μc + μabc)/4 = (8 + 18)/2 − (8 + 4 + 10 + 18)/4 = 13 − 10 = 3

which happily agrees with the value α2 = 3 that this example began with. But the corresponding calculation for the C main effect,

γ2* = (μc + μabc)/2 − (μa + μb + μc + μabc)/4 = (10 + 18)/2 − 10 = 14 − 10 = 4

turns out quite differently: γ2* = 4, while this hypothetical example began with γ2 = 2. Here, the 1/2 fraction calculation gives something quite different from the full factorial calculation.
The key to understanding how one can apparently get something for nothing
in the case of the A main effects in this example, but cannot do so in the case of
the C main effects, is to know that (in general) for this 1/2 fraction,
α2∗ = α2 + βγ22
and
γ2∗ = γ2 + αβ22
Since this numerical example began with βγ22 = 0, one is "fortunate"—it turns out numerically that α2* = α2. On the other hand, since αβ22 = 2 ≠ 0, one is "unfortunate"—it turns out numerically that γ2* = γ2 + 2 ≠ γ2.
Relationships like these for α2* and γ2* hold for all 1/2 fraction versions of the full factorial effects. These relationships detail the nature of the ambiguity inherent in the use of the 1/2 fraction of the full 2^3 factorial set of combinations.
Essentially, based on data from four out of eight possible combinations, one will
be unable to distinguish between certain pairs of effects, such as the A main effect
and BC 2-factor interaction pair here.
These questions will be answered in this section for the case of 1/2 fractions (2^(p−1) fractional factorials) and for general q in the next section.
Prescription for a best half fraction of a 2^p factorial
In order to arrive at what is in some sense a best possible choice of 1/2 of the 2^p combinations of levels of p factors, do the following. For the first p − 1 factors, write out all 2^(p−1) possible combinations of those factors. By multiplying plus and minus signs (thinking of multiplying plus and minus 1's) corresponding to levels of the first p − 1 factors, then arrive at a set of plus and minus signs that can be used to prescribe how to choose levels for the last factor (to be used in combination with the indicated levels of the first p − 1 factors).
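A minimal sketch of this prescription in Python (the function name half_fraction is illustrative): levels of the last factor are set to the product of the signs of the first p − 1 factors.

def half_fraction(p):
    """Sign rows (+1/-1 tuples) for a best 2^(p-1) fraction of p factors,
    listed in Yates standard order for the first p - 1 factors; the last
    factor's level is the product of the first p - 1 signs."""
    rows = []
    for i in range(2 ** (p - 1)):
        signs = [1 if (i >> j) & 1 else -1 for j in range(p - 1)]
        prod = 1
        for s in signs:
            prod *= s                 # multiply the plus and minus 1's
        rows.append(tuple(signs) + (prod,))
    return rows

half_fraction(5) reproduces the 16 rows of the sign table below, whose final column is the ABCD product.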
Table 8.15
Five Chemical Process Variables and Their Experimental Levels

Table 8.16
[Sign table: all 16 combinations of levels of factors A through D, with a final column of ABCD products of signs]

A B C D ABCD Product
− − − − +
+ − − − −
− + − − −
+ + − − +
− − + − −
+ − + − +
− + + − +
+ + + − −
− − − + −
+ − − + +
− + − + +
+ + − + −
− − + + +
+ − + + −
− + + + −
+ + + + +
In Snee’s study, the signs in the ABCD Product column were used without
modification to specify levels of E. The corresponding treatment combination
names (written in the same order as in Table 8.16) and the data reported by Snee
are given in Table 8.17. Notice that the 16 combinations listed in Table 8.17 are
1/2 of the 2^5 = 32 possible combinations of levels of these five factors. (They are those 16 that have an odd number of factors appearing at their high levels.)
Table 8.17
16 Combinations and Observed Color Indices in Snee's 2^(5−1) Study (Example 9)
Table 8.18
Five Peanut Processing Variables and Their Experimental Levels
(In this case, the chosen combinations are those 16 that have an even number of factors appearing at their high levels.) The 16 combinations studied and corresponding responses reported by Kilgo are given in Table 8.19 in the same order for factors A through D as in Table 8.16.
The difference between the combinations listed in Tables 8.17 and 8.19 deserves
some thought. As Kilgo named the factor levels, the two lists of combinations
are quite different. But verify that if she had made the slightly less natural but
nevertheless permissible choice to call the 4.05 mm level of factor E the low (−) level
Table 8.19
16 Combinations and Observed Yields in Kilgo's 2^(5−1) Study (Example 10)
and the 1.28 mm level the high (+) level, the names of the physical combinations
actually studied would be exactly those in Table 8.17 rather than those in Table 8.19.
Fractional factorials fully reveal system structure only for simple cases
The point here is that due to the rather arbitrary nature of how one chooses to name high and low levels of two factors, the names of different physical combinations are themselves to some extent arbitrary. In choosing fractional factorials, one chooses some particular naming convention and then has the freedom to choose levels of the last factor (or factors for q > 1 cases) by either using the product column(s) directly or after switching signs. The decision whether or not to switch signs does affect exactly which physical combinations will be run and thus how the data should be interpreted in the subject-matter context. But generally, the different possible choices (to switch or not switch signs) are a priori equally attractive. For systems that happen to have relatively simple structure, all possible results of these arbitrary choices typically lead to similar engineering conclusions. When systems turn out to have complicated structures, the whole notion of fractional factorial experimentation loses its appeal. Different arbitrary choices lead to different perceptions of system behavior, none of which (usually) correctly portrays the complicated real situation.
8.3.3 Aliasing in the Standard 1/2 Fractions
Once a 1/2 fraction of a 2^p study is chosen, the next issue is determining the nature of the ambiguities that must arise from its use. For 2^(p−1) data structures of the type described here, one can begin with a kind of statement of how the fractional factorial plan was derived and through a system of formal multiplication arrive at an understanding of which (full) factorial effects cannot be separated on the basis of the fractional factorial data. Some terminology is given next, in the form of a definition.
Definition 7 When it is only possible to estimate the sum (or difference) of two or more
(full) factorial effects on the basis of data from a fractional factorial, those
effects are said to be aliased or confounded and are sometimes called aliases.
In this text, the phrase alias structure of a fractional factorial plan will mean
a complete specification of all sets of aliased effects.
In the notation used here, the choice of a standard 2^(p−1) fractional factorial can be summarized as

the name of the last factor ↔ ± the product of the names of the first p − 1 factors    (8.22)

where the plus or minus sign is determined by whether the signs were left alone or switched in the specification of levels of the last factor. The double arrow in expression (8.22) will be read as "is aliased with." And since expression (8.22) really says how the fractional factorial under consideration was chosen, expression (8.22) will be called the plan's generator. The generator (8.22) for a 2^(p−1) plan says that the (high level) main effect of the last factor will be aliased with plus or minus the (all factors at their high levels) p − 1 factor interaction of the first p − 1 factors.
Example 9 (continued)
In Snee's study, the generator

E ↔ ABCD

was used. Therefore the (high level) E main effect is aliased with the (all high levels) ABCD 4-factor interaction. That is, only ε2 + αβγδ2222 can be estimated based on the 1/2 fraction data, not either of its summands individually.

Example 10 (continued)
In Kilgo's study, the generator

E ↔ −ABCD

was used. The (high level) E main effect is aliased with minus the (all high levels) ABCD 4-factor interaction. That is, only ε2 − αβγδ2222 can be estimated based on the 1/2 fraction data, not either of the terms individually.
Conventions for the system of formal multiplication
The entire alias structure for a 1/2 fraction follows from the generator (8.22) by multiplying both sides of the expression by various factor names, using two special conventions. These are that any letter multiplied by itself produces the symbol "I" and that any letter multiplied by "I" is that letter again. Applying the first of these conventions to expression (8.22), both sides of the expression may be multiplied by the name of the last factor to produce the relation

Defining relation for a standard half fraction of a 2^p factorial

I ↔ ± the product of names of all p factors    (8.23)

Expression (8.23) means that the grand mean is aliased with plus or minus the (all factors at their high level) p-factor interaction. There is further special terminology for an expression like that in display (8.23).
Definition 8  The list of all aliases of the grand mean for a 2^(p−q) fractional factorial is called the defining relation for the design.

By first translating a generator (or generators in the case of q > 1) into a defining relation and then multiplying through the defining relation by a product of letters corresponding to an effect of interest, one can identify all aliases of that effect.
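The formal multiplication itself is simple enough to automate. Here is a minimal sketch (function names are ours) in which an effect is a string of factor letters and repeated letters cancel:

def multiply(word1, word2):
    """Formal product of two effect names; a letter times itself is I,
    so the product is the symmetric difference of the letter sets."""
    return "".join(sorted(set(word1) ^ set(word2)))

def alias(effect, defining_word):
    """Alias of an effect under a defining relation I <-> defining_word
    (signs are not tracked in this simple sketch)."""
    return multiply(effect, defining_word)

For instance, alias("AC", "ABCDE") returns "BDE", agreeing with the calculation in the example below.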
For Snee's study, the defining relation is

I ↔ ABCDE    (8.24)

which indicates that the grand mean μ..... is aliased with the 5-factor interaction αβγδε22222. Then, for example, multiplying through defining relation (8.24) by the product AC produces the relationship
AC ↔ BDE
Thus, the AC 2-factor interaction is aliased with the BDE 3-factor interaction.
In fact, the entire alias structure for the Snee study can be summarized in terms
of the aliasing of 16 different pairs of effects. These are indicated in Table 8.20,
which was developed by using the defining relation (8.24) to find successively
(in Yates order) the aliases of all effects involving only factors A, B, C, and D.
Table 8.20 shows that main effects are confounded with 4-factor interactions and
2-factor interactions with 3-factor interactions. This degree of ambiguity is as mild as is possible in a 2^(5−1) study.
Table 8.20
The Complete Alias Structure for
Snee’s 25−1 Study
I ↔ ABCDE D ↔ ABCE
A ↔ BCDE AD ↔ BCE
B ↔ ACDE BD ↔ ACE
AB ↔ CDE ABD ↔ CE
C ↔ ABDE CD ↔ ABE
AC ↔ BDE ACD ↔ BE
BC ↔ ADE BCD ↔ AE
ABC ↔ DE ABCD ↔ E
Example 10 In Kilgo’s peanut oil extraction study, since the generator is E ↔ −ABCD, the
(continued ) defining relation is I ↔ −ABCDE, and the alias structure is that given in Table
8.20, except that a minus sign should be inserted on one side or the other of
every row of the table. So, for example, αβ22 − γ δ222 may be estimated based
on Kilgo’s data, but neither αβ22 nor γ δ222 separately.
The standard method of analysis for 2^(p−1) data structures is then to:

1. Temporarily ignore the last factor and compute the estimated or fitted "effects."
2. Somehow judge the statistical significance and apparent real importance of
the “effects” computed for the complete factorial in p − 1 two-level factors.
(Where some replication is available, the judging of statistical significance can be done through the use of confidence intervals. Where all 2^(p−1) samples are of size 1, the device of normal-plotting fitted "effects" is standard.)
3. Finally, seek a plausible simple interpretation of the important fitted “ef-
fects,” recognizing that they are estimates not of the effects in the first p − 1
factors alone, but of those effects plus their aliases.
Example 9 (continued)
Consider the analysis of Snee's data, listed in Table 8.17 in Yates standard order for factors A, B, C, and D (ignoring the existence of factor E). Then, according to the prescription for analysis just given, the first step is to use the Yates algorithm (for four factors) on the data. These calculations are summarized in Table 8.21.
Each entry in the final column of Table 8.21 gives the name of the effect
that the corresponding numerical value in the “Cycle 4 ÷ 16” column would be
estimating if factor E weren’t present, plus the alias of that effect. The numbers
in the next-to-last column must be interpreted in light of the fact that they are
estimating sums of 2^5 factorial effects.
Since there is no replication indicated in Table 8.17, only normal-plotting
fitted (sums of) effects is available to identify those that are distinguishable from
noise. Figure 8.11 is a normal plot of the last 15 entries of the Cycle 4 ÷ 16
column of Table 8.21. (Since in most contexts one is a priori willing to grant that
the overall mean response is other than 0, the estimate of it plus its alias(es) is
rarely included in such a plot.)
[Figure 8.11: Normal plot of the last 15 estimated (sums of) effects from Table 8.21. Vertical axis: standard normal quantile; horizontal axis: estimated sum of effects. The points labeled D + ABCE, A + BCDE, E + ABCD, and B + ACDE fall away from the line through the remaining estimates.]
Depending upon how the line is drawn through the small estimated (sums of)
effects in Figure 8.11, the estimates corresponding to D + ABCE, and possibly
B + ACDE, E + ABCD, and A + BCDE as well, are seen to be distinguishable
in magnitude from the others. (The line in Figure 8.11 has been drawn in keeping
with the view that there are four statistically detectable sums of effects, primarily
because a half normal plot of the absolute values of the estimates—not included
here—supports that view.) If one adopts the view that there are indeed four
detectable (sums of) effects indicated by Figure 8.11, it is clear that the simplest
possible interpretation of this outcome is that the four large estimates are each
reflecting primarily the corresponding main effects (and not the aliased 4-factor interactions).
Table 8.21
The Yates Algorithm for a 2^4 Factorial Applied to Snee's 2^(5−1) Data
Example 10 (continued)
Verify that for Kilgo's data in Table 8.19, use of the (four-cycle) Yates algorithm on the data as listed (in standard order for factors A, B, C, and D, ignoring factor E) produces the estimated (differences of) effects given in Table 8.22.
Normal-plotting the last 15 of these estimates shows that the ones corresponding to ABCD − E and B − ACDE are significantly larger than the other 13 estimates. The simplest possible interpretation of this outcome is that the two large estimates are each reflecting primarily the corresponding main effects (not the aliased 4-factor interactions). That is, a tentative description of the oil extraction process is that average particle size (factor E) and temperature (factor B), acting more or less separately, are the principal determinants of yield. This is an example where the ultimate engineering objective is to maximize response and the two large estimates are both positive. So, for best yield one would prefer the high level of B (95°C temperature) and the low level of E (1.28 mm particle size). (−ε2 is apparently positive, and since ε1 = −ε2, the superiority of the low level of E is indicated.)

[Figure: Normal plot of the estimated (differences of) effects for Kilgo's study; the points for ABCD − E and B − ACDE plot far above the line through the other estimates.]
Factor A reflects the left-to-right location on the fabric width from which a
tested sample is taken. Factor C reflects a count of yarns per inch inserted in the
cloth, top to bottom, during weaving. Factor D reflects the air pressure used to
propel the yarn across the fabric width during weaving.
Initially, a replicated 2^(4−1) study was done using the generator D ↔ ABC. m = 5 pieces of cloth were tested for each of the eight different factor-level combinations studied. The resulting mean fabric tenacities ȳ, expressed in terms of strength per unit linear density, are given in Table 8.24. Although it is not absolutely clear in the article, it also appears that pooling the eight s² values from the 1/2 fraction gave sP ≈ 1.16.
Apply the (three-cycle) Yates algorithm to the means listed in Table 8.24 (in
the order given) and verify that the estimated sums of effects corresponding to
the means in Table 8.24 are those given in Table 8.25.
Temporarily ignoring the existence of factor D, confidence intervals based on
these estimates can be made using the m = 5 and p = 3 version of formula (8.13)
from Section 8.2. That is, using 95% two-sided individual confidence intervals,
since ν = 8(5 − 1) = 32 degrees of freedom are associated with sP , a precision
of roughly
±(2.04)(1.16)/√(5 · 2^3) = ±.375
should be associated with each of the estimates in Table 8.25. By this standard, the estimates corresponding to the A + BCD, AB + CD, C + ABD, and BC + AD sums are statistically significant.

Table 8.24
Eight Sample Means from a 2^(4−1) Fabric Tenacity Experiment

Table 8.25
Estimated Sums of 2^4 Effects in a 2^(4−1) Fabric Weaving Experiment

Two reasonably plausible and equally simple tentative interpretations of this outcome are

1. There are detectable A and C main effects and detectable 2-factor interactions of A with B and D.
2. There are detectable A and C main effects and detectable 2-factor interactions of C with B and D.
(For that matter, there are others that you may well find as plausible as these two.)
In any case, the ambiguities left by the collection of the data summarized in Table 8.24 were unacceptable. To remedy the situation, the authors subsequently completed the 2^4 factorial study by collecting data from the other eight combinations defined by the generator D ↔ −ABC. The means they obtained are given in Table 8.26.
One should honestly consider (and hopefully eliminate) the possibility that there is a systematic difference between the values in Table 8.24 and in Table 8.26 as a result of some unknown factor or factors that changed in the time lapse between the collection of the first block of observations and the second block. If that possibility can be eliminated, it would make sense to put together the two data sets, treat them as a single full 2^4 factorial data set, and employ the methods of Section 8.2 in their analysis. (Some repetition of a combination or combinations included in the first study phase—e.g., the center point of the design—would have been advisable to allow at least a cursory check on the possibility of a systematic block effect.)

Table 8.26
Eight More Sample Means from a Second 2^(4−1) Fabric Tenacity Study

Combination   ȳ       Combination   ȳ
d             23.73   c             24.63
a             23.55   acd           25.78
b             25.98   bcd           24.10
abd           23.64   abc           23.93
Johnson, Clapp, and Baqai don't say explicitly what sample sizes were used to produce the ȳ's in Table 8.26. (Presumably, m = 5 was used.) Nor do they give a value for sP based on all 2^4 samples, so it is not possible to give a complete analysis of the full factorial data à la Section 8.2. But it is possible to note what results from the use of the Yates algorithm with the full factorial set of ȳ's. This is summarized in Table 8.27.
Table 8.27
Fitted Effects from the Full 2^4 Factorial Fabric Tenacity Study
The statistical significance of the entries of Table 8.27 will not be judged here.
But note that the picture of fabric tenacity given by the fitted effects in this table is
somewhat more complicated than either of the tentative descriptions derived from
the original 24−1 study. The fitted effects, listed in order of decreasing absolute
value, are
Although tentative description (2) (page 609) accounts for the first four of these, the A and C main effects indicated in Table 8.27 are not really as large as one might have guessed looking only at Table 8.25. Further, the AC 2-factor interaction appears from Table 8.27 to be nearly as large as the C main effect. This is obscured in the original 2^(4−1) fractional factorial because the AC 2-factor interaction is aliased with an apparently fairly large BD 2-factor interaction of opposite sign.
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
ν = 2 version of formula (8.13) of Section 8.2. What is the plus-or-minus value that comes from this program, for individual 95% two-sided confidence intervals? Using this value, which of the fitted sums of effects would you judge to be statistically detectable? Does this list suggest to you any particularly simple/intuitive description of how bond strength depends on the levels of the five factors?
(c) Based on your analysis from (b), if you had to guess what levels of the factors A, C, and D should be used for high bond strength, what would you recommend? If the CE + ABD fitted sum reflects primarily the CE 2-factor interaction, what level of E then seems best? Which of the combinations actually observed had these levels of factors A, C, D, and E? How does its response compare to the others?
3. Return to the fire retardant flame test study of Exercise 3 of Section 8.2. The original study, summarized in that exercise, was a full 2^4 factorial study.
(a) If you have not done so previously, use the (four-cycle) Yates algorithm and compute the fitted 2^4 factorial effects for the study. Normal-plot these. What subject-matter interpretation of the data is suggested by the normal plot?
Now suppose that instead of a full factorial study, only the 1/2 fraction with generator D ↔ ABC had been conducted.
(b) Which 8 of the 16 treatment combinations would have been run? List these combinations in Yates standard order as regards factors A, B, and C and use the (three-cycle) Yates algorithm to compute the 8 estimated sums of effects that it is possible to derive from these 8 treatment combinations. Verify that each of these 8 estimates is the sum of two of your fitted effects from part (a). (For example, you should find that the first estimated sum here is ȳ.... + abcd2222 from part (a).)
(c) Normal-plot the last 7 of the estimated sums from (b). Interpret this plot. If you had only the data from this 2^(4−1) fractional factorial, would your subject-matter conclusions be the same as those reached in part (a), based on the full 2^4 data set?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
8.4 Standard Fractions of Two-Level Factorials, Part II: General 2^(p−q) Studies

The key to extending what has been said about 1/2 fractions to 1/4, 1/8, 1/16, etc. fractions of 2^p studies is to realize that there are several possible similar columns that could be developed using only some of the first p − 1 factors. When moving from 1/2 fractions to 1/2^q fractions of 2^p factorials, one makes use of such columns in assigning levels of the last q factors and then develops and uses an alias structure consistent with the choice of columns.
For example, first consider the situation for cases where p − q = 3—that is, where 2^3 = 8 different combinations of levels of p two-level factors are going to be included in a study. A table of signs specifying all eight possible combinations of levels of the first three factors A, B, and C, with four additional columns made up as the possible products of the first three columns, is given in Table 8.28.
Choosing a 2^(p−q) fractional factorial with p − q = 3
The final column of Table 8.28 can be used to choose levels of factor D for a best possible 2^(4−1) fractional factorial study. But it is also true that two or more of the product columns in Table 8.28 can be used to choose levels of several additional factors (beyond the first three). If this is done, one winds up with a fractional factorial that can be understood in the same ways it is possible to make sense of the standard 2^(p−1) data structures discussed in Section 8.3.
Table 8.28
Signs for Specifying all Eight Combinations of Three Two-Level Factors
and Four Sets of Products of Those Signs
Table 8.29
Combinations Included in the 2^(6−3) Propellant Slurry Study
A B C F E D Combination Name
− − − − − − (1)
+ − − + − + adf
− + − − + + bde
+ + − + + − abef
− − + + + + cdef
+ − + − + − ace
− + + + − − bcf
+ + + − − + abcd
The development of 2^(p−q) fractional factorials has been illustrated with eight-combination (i.e., p − q = 3) plans. But it should be obvious that there are 16-row, 32-row, 64-row, etc. versions of Table 8.28. Using any of these, one can assign levels for the last q factors according to signs in product columns and end up with a 1/2^q fraction of a full 2^p factorial plan. When this is done, the 2^p factorial effects are aliased in 2^(p−q) groups of 2^q effects each.

Determining the alias structure of a 2^(p−q) factorial
The determination of this alias structure can be made by using q generators to develop a defining relation for the fractional factorial. A general definition of the notion of generators for a 2^(p−q) fractional factorial is next.
Definition 9  When a 2^(p−q) fractional factorial comes about by assigning levels of each of the "last" q factors based on a different column of products of signs for the "first" p − q factors, the q different relationships

the name of an additional factor ↔ ± a product of names of some of the first p − q factors

are called the generators of the fractional factorial.
Each generator can be translated into a statement with I on the left side and
then taken individually, multiplied in pairs, multiplied in triples, and so on until the
whole defining relation is developed. (See again Definition 8, page 602, for the
meaning of this term.) In doing so, use can be made of the convention that minus
any letter times minus that letter is I.
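The development of a whole defining relation can also be mechanized. The following minimal sketch (names are ours) represents each generator as a sign and a word, and multiplies the generators singly, in pairs, in triples, and so on:

from itertools import combinations

def word_product(signed_words):
    """Formal product of (sign, word) pairs; repeated letters cancel (A*A = I)
    and minus times minus is plus."""
    sign, letters = 1, set()
    for s, w in signed_words:
        sign *= s
        letters ^= set(w)           # symmetric difference of letter sets
    return sign, "".join(sorted(letters))

def defining_relation(generators):
    """generators : list of (sign, word) pairs, one per generator written
    with I on the left, e.g. [(1, "ABCD"), (-1, "BCE"), (-1, "ACF")].
    Returns the 2^q - 1 signed words aliased with the grand mean."""
    relation = []
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            relation.append(word_product(combo))
    return relation

With the propellant study generators of the example below, defining_relation([(1, "ABCD"), (-1, "BCE"), (-1, "ACF")]) returns ABCD, −BCE, −ACF, −ADE, −BDF, ABEF, and CDEF, matching the hand development.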
Example 12 (continued)
In the Army propellant example, the q = 3 generators that led to the combinations in Table 8.29 were
D ↔ ABC
E ↔ −BC
F ↔ −AC
Multiplying through by the left sides of these, one obtains the three relationships
I ↔ ABCD (8.25)
I ↔ −BCE (8.26)
I ↔ −ACF (8.27)
Then, multiplying pairs of these relationships together, one has

I ↔ (ABCD) · (−BCE)

that is,

I ↔ −ADE

and similarly

I ↔ −BDF

I ↔ ABEF
and finally, using all three relationships (8.25), (8.26), and (8.27), one has

I ↔ CDEF

Combining all of this, the complete defining relation for this 2^(6−3) study is

I ↔ ABCD ↔ −BCE ↔ −ACF ↔ −ADE ↔ −BDF ↔ ABEF ↔ CDEF    (8.28)
Defining relation (8.28) is rather formidable, but it tells the whole truth about what can be learned based on the 8 of 64 possible combinations of six two-level factors. Relation (8.28) specifies all effects that will be aliased with the grand mean. Appropriately multiplying through expression (8.28) gives all aliases of any effect of interest. For example, multiplying through relation (8.28) by A gives

A ↔ BCD ↔ −ABCE ↔ −CF ↔ −DE ↔ −ABDF ↔ BEF ↔ ACDEF

and for example, the (high level) A main effect will be indistinguishable from minus the (all high levels) CF 2-factor interaction.
Data analysis for a 2^(p−q) study
With a 2^(p−q) fractional factorial's defining relation in hand, the analysis of data proceeds exactly as indicated earlier for 1/2 fractions. It is necessary to

1. compute estimates of (sums and differences of) effects ignoring the last q factors,
2. judge the statistical significance and apparent practical importance of those estimates, and
3. seek a plausible simple interpretation of the important ones, remembering that each estimates not a single 2^p factorial effect but that effect plus or minus its aliases.
Example 12 (continued)
In the Army propellant study, m = 2 trials for each of the 2^(6−3) combinations listed in Table 8.29 gave sP² = .02005 and the sample averages listed in Table 8.30.
Temporarily ignoring all but the (“first”) three factors A, B, and C (since
the levels of D, E and F were derived or generated from the levels of A, B
and C), the (three-cycle) Yates algorithm can be used on the sample means, as
shown in Table 8.31. Remember that the estimates in the next-to-last column
of Table 8.31 must be interpreted in light of the alias structure for the original
experimental plan. So for example, since (both from the original generators and
Table 8.30
Eight Sample Means from the 2^(6−3) Propellant Slurry Study
Combination ȳ Combination ȳ
from relation (8.28)) one knows that D ↔ ABC, the −.0650 value on the last line of Table 8.31 is estimating a sum involving the D main effect δ2, the ABC 3-factor interaction αβγ222, and the other effects aliased with them. So if one were expecting a large main effect of factor D, one would expect it to be evident in the −.0650 value.
Since a value of sP is available here, there is no need to resort to normal-
plotting to judge the statistical detectability of the values coming out of the
Yates algorithm. Instead (still temporarily calculating as if only the first three
factors were present) one can make confidence intervals based on the estimates,
by employing the ν = 8 = 16 − 8, m = 2, and p = 3 version of formula (8.13)
from Section 8.2. That is, using 95% two-sided individual confidence intervals,
a precision of
±2.306 (√.02005/√(2 · 2^3)) = ±.0817
should be attached to each of the estimates in Table 8.31. By this standard, none of the estimates from the propellant study are clearly different from 0. For engineering purposes, the bottom line is that more data are needed before even the most tentative conclusions about system behavior should be made.

Table 8.31
The Yates Algorithm for a 2^3 Factorial Applied to the 2^(6−3) Propellant Data
where the fact that the ADE 3-factor interaction is aliased with the grand mean can
be seen by multiplying together ABCD and BCE, which (from the generators)
themselves represent effects aliased with the grand mean. Here one sees that
effects will be aliased together in eight groups of four.
The data reported by Hansen and Best, and some corresponding summary
statistics, are given in Table 8.33. The pooled sample variance derived from the
values in Table 8.33 is
sP² = [(3 − 1)(2.543) + (2 − 1)(2.163) + (2 − 1)(.238)] / [(3 − 1) + (2 − 1) + (2 − 1)] = 1.872
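The pooled variance is just a degrees-of-freedom-weighted average of the sample variances; a minimal sketch (the function name is ours):

def pooled_variance(sizes, variances):
    """s_P^2 = sum((n_i - 1) * s_i^2) / sum(n_i - 1)."""
    numerator = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    denominator = sum(n - 1 for n in sizes)
    return numerator / denominator

pooled_variance([3, 2, 2], [2.543, 2.163, .238]) returns 1.872 (to three decimals), as above.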
Table 8.32
Five Catalysis Variables and Their Experimental Levels
Table 8.33
Data from a 2^(5−2) Catalyst Study and Corresponding Sample Means and Variances
that is,

±1.195% water
The simplest possible tentative interpretation of the first two of these results is that
the A and E main effects are large enough to see above the background variation.
What to make of the third, given the first two, is not so clear. The large 3.682
estimate can equally simply be tentatively attributed to a D main effect or to an
AE 2-factor interaction. (Interestingly, Hansen and Best reported that subsequent
experimentation was done with the purpose of determining the importance of the
D main effect, and indeed, the importance of this factor in determining y was
established.)
Exactly what to make of the fourth statistically significant estimate is even
less clear. It is therefore comforting that, although big enough to be detectable,
it is less than half the size of the third largest estimate. In the particular real
situation, the authors seem to have found an “A, E, and D main effects only”
description of y useful in subsequent work with the chemical system.
The reader may have noticed that the possibilities discussed in the previous
example do not even exhaust the plausible interpretations of the fact that three
estimated sums of effects are especially large. For example, “large DE 2-factor
interactions and large D and E main effects” is yet another alternative possibility.
This ambiguity serves to again emphasize the tentative nature of conclusions that
can be drawn on the basis of small fractions of full factorials. And it also underlines
the absolute necessity of subject-matter expertise and follow-up study in sorting out
the possibilities in a real problem. There is simply no synthetic way to tell which of
various simple alternative explanations suggested by a fractional factorial analysis
is the right one.
Good choice of a fractional factorial

One would like to have the simplest alias structure possible when it comes time to interpret the results of a fractional factorial study. The object is to have low-order effects (like main effects and 2-factor interactions) aliased not with other low-order effects, but rather only with high-order effects (many-factor interactions). It is the defining relation that governs how the 2^p factorial effects are divided up into groups of aliases. If there are only long products of factor names appearing in the defining relation, low-order effects are aliased only with high-order effects. On the other hand, if there are short products of factor names appearing, there will be low-order effects aliased with other low-order effects. As a kind of measure of quality of a 2^(p−q) plan, it is thus common to adopt the following notion of design resolution.
Definition 10  The resolution of a 2^(p−q) fractional factorial plan is the number of letters in the shortest product appearing in its defining relation.

In general, when contemplating the use of a 2^(p−q) design, one wants the largest resolution possible for a given investment in 2^(p−q) combinations. Not all choices of generators give the same resolution. In Section 8.3, the prescription given for the ½ fractions was intended to give 2^(p−1) fractional factorials of resolution p (the largest resolution possible). For general 2^(p−q) studies, one must be a bit careful in choosing generators. What seems like the most obvious choice need not be the best in terms of resolution.
E ↔ ABCD
F ↔ ABC

The resulting design (with defining relation I ↔ ABCDE ↔ ABCF ↔ DEF) is of resolution 3, and there are some main effects aliased with (only) 2-factor interactions. On the other hand, the perhaps slightly less natural choice of generators

E ↔ BCD
F ↔ ABC

has defining relation I ↔ BCDE ↔ ABCF ↔ ADEF and is of resolution 4. No main effect is aliased with any interaction of order less than 3. This second choice is better than the first in terms of resolution.
Table 8.35 indicates what is possible in terms of resolution for various numbers of factors and combinations for a 2^(p−q) fractional factorial. The table was derived from a more detailed one on page 410 of Statistics for Experimenters by Box, Hunter, and Hunter, which gives not only the best resolutions possible but also generators for designs achieving those resolutions. The more limited information in Table 8.35 is sufficient for most purposes. Once one is sure what is possible, it is usually relatively painless to do the trial-and-error work needed to produce a plan of highest possible resolution. And it is probably worth doing as an exercise, to help one consider the pros and cons of various choices of generators for a given set of real factors.
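That trial-and-error work is easy to mechanize. The sketch below is ours, not the text's: it represents each "word" of a defining relation as a set of factor letters (so that multiplication is symmetric difference), computes the resolution implied by a choice of generators, and brute-forces a best pair of generators for a 2^(6−2) plan.

```python
from itertools import combinations, product

def relation_words(generators):
    """generators: dict mapping a new factor letter to the product of
    existing letters it is set equal to, e.g. {'E': 'BCD', 'F': 'ABC'}.
    Returns the nonidentity words of the implied defining relation."""
    gen_words = [frozenset(new + word) for new, word in generators.items()]
    words = set()
    for r in range(1, len(gen_words) + 1):
        for subset in combinations(gen_words, r):
            w = frozenset()
            for g in subset:
                w = w ^ g              # symmetric difference = mod-2 product
            words.add(w)
    words.discard(frozenset())
    return words

def resolution(generators):
    return min(len(w) for w in relation_words(generators))

print(resolution({'E': 'ABCD', 'F': 'ABC'}))  # 3 -- the 'obvious' choice above
print(resolution({'E': 'BCD', 'F': 'ABC'}))   # 4 -- the better choice above

# brute-force search over all pairs of generators for a 2^(6-2) study
candidates = ["".join(c) for r in range(2, 5) for c in combinations("ABCD", r)]
best = max(product(candidates, repeat=2),
           key=lambda ef: resolution({'E': ef[0], 'F': ef[1]}))
print(best, resolution({'E': best[0], 'F': best[1]}))  # a resolution-4 pair
```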
Table 8.35 has no entries in the "8 combinations" row for more than 7 factors. If the table were extended beyond 11 factors, there would be no entries in the "16 combinations" row beyond 15 factors, no entries in the "32 combinations" row beyond 31 factors, etc. The reason for this should be obvious. For 8 combinations, there are only 7 columns total to use in Table 8.28. Corresponding tables for 16 combinations would have only 15 columns total, for 32 combinations only 31 columns total, etc. As they have been described here, 2^(p−q) fractional factorials can be used to study at most 2^t − 1 factors in 2^t combinations. The cases of 7 factors in 8 combinations, 15 factors in 16 combinations, 31 factors in 32 combinations, etc. represent a kind of extreme situation where a maximum number of factors is studied (at the price of creating a worst possible alias structure) in a given number of combinations. For the case of p = 7 factors in 8 combinations, effects are aliased in 2^(7−4) = 8 groups of 2^4 = 16; for the case of p = 15 factors in 16 combinations, the effects are aliased in 2^(15−11) = 16 groups of 2^11 = 2,048; etc. These extreme cases of 2^t − 1 factors in 2^t combinations are sometimes called saturated fractional factorials. They have very complicated alias structures and can support only the most tentative of conclusions.
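The column-counting argument is easy to verify directly. In this small sketch (ours), the distinct nonidentity products of the signs of A, B, and C in 2^3 = 8 runs give exactly 2^3 − 1 = 7 columns, so at most 7 two-level factors can be accommodated:

```python
from itertools import combinations, product
from math import prod

runs = list(product((-1, 1), repeat=3))   # the 2^3 = 8 runs in A, B, C

columns = {}
for r in range(1, 4):                     # nonempty subsets of {A, B, C}
    for subset in combinations(range(3), r):
        name = "".join("ABC"[i] for i in subset)
        columns[name] = tuple(prod(run[i] for i in subset) for run in runs)

print(sorted(columns))             # ['A', 'AB', 'ABC', 'AC', 'B', 'BC', 'C']
print(len(set(columns.values())))  # 7 -- all seven sign columns are distinct
```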
Table 8.35
Best Resolutions Possible for Various Numbers of Combinations in a 2^(p−q) Study

                                   Number of Factors (p)
                             4    5    6    7    8    9   10   11
Number of            8       4    3    3    3    —    —    —    —
Combinations        16            5    4    4    4    3    3    3
(2^(p−q))           32                 6    4    4    4    4    4
                    64                      7    5    4    4    4
                   128                           8    6    5    5
Table 8.36
15 Process Variables and Their Experimental Levels
The combinations actually run and the cold crack resistances observed are given
in Table 8.37.
Ignoring all factors but A, B, C, and D, the combinations listed in Table 8.37
are in Yates standard order and are therefore ready for use in finding estimates
of sums of effects. Table 8.38 shows the results of using the (four-cycle) Yates
algorithm on the 16 observations listed in Table 8.37. A normal plot of the last
15 of these estimates is shown in Figure 8.13. It is clear from the figure that the
two corresponding to B + aliases and F + aliases are detectably larger than
the rest.
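The Yates computations referred to throughout this section are mechanical enough to code in a few lines. Here is a sketch of our own (in Python) of the p-cycle algorithm, ending with the division by 2^p used in this book so that the first output is the fitted grand mean:

```python
def yates(ybars):
    """Yates algorithm: 2^p sample means in Yates standard order in,
    2^p fitted (sums of) effects out."""
    n = len(ybars)
    p = n.bit_length() - 1
    assert n == 2 ** p, "the number of values must be a power of 2"
    vals = list(ybars)
    for _ in range(p):                    # one cycle per factor
        pairs = list(zip(vals[0::2], vals[1::2]))
        vals = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return [v / n for v in vals]

# e.g., the eight sample means of Chapter Exercise 17 later in this chapter:
print(yates([70, 61, 72, 59, 68, 64, 69, 69]))  # first value 66.5 = grand mean
```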
Table 8.37
Combinations Run and Observed Cold Crack Resistances (Combination, y)

Figure 8.13
Normal plot of the last 15 estimated sums of effects, with the "B + Aliases" and "F + Aliases" points standing apart from the rest
It is not feasible to write out the whole defining relation for this 2^(15−11) study. Effects are aliased in 2^(p−q) = 2^(15−11) = 16 groups of 2^q = 2^11 = 2,048. In particular (though it would certainly be convenient if the 2.87 estimate in Table 8.38 could be thought of as essentially representing β₂), β₂ has 2,047 aliases, some of them as simple as 2-factor interactions. By the same token, it would certainly be convenient if the small estimates in Table 8.38 were indicating that all summands of the sums of effects they represent were small. But the possibility of cancellation in the summation must not be overlooked.
The point is that only the most tentative description of this system should
be drawn from even this very simple “two large estimates” outcome. The data
in Table 8.37 hint at the primary importance of factors B and F in determining
cold crack resistance, but the case is hardly airtight. There is a suggestion of
a direction for further experimentation and discussion with process experts but
certainly no detailed map of the countryside where one is going.
Table 8.38
Estimates of Sums of Effects for the 2^(15−11) Process Development Study
One thing that can be said fairly conclusively on the basis of this study is
that the analysis points out what is in retrospect obvious in Table 8.37. Consistent
with the “B + aliases and F + aliases sums are positive and large” story told
in Figure 8.13, the largest four values of y listed in Table 8.37 correspond to
combinations where both B and F are at their high levels.
I ↔ ABCD (8.29)
I ↔ −ABCD (8.30)
define the two sets of 8 out of 16 ABCD combinations actually run. These result
from a formal expression like
I ↔ ABCDE (8.31)
where E can be thought of as contributing either the plus or the minus signs in
expressions (8.29) and (8.30). If one calls block 1 (the first set of 8 samples) the high level of E, expression (8.31) leads to exactly the I ↔ ABCD ½ fraction of 2^4 combinations of A, B, C, and D for use as block 1, and to the I ↔ −ABCD ½ fraction for use as block 2. This can be seen in Table 8.39.
With factor E designating block number, the two columns of Table 8.39 taken together designate the I ↔ ABCDE ½ fraction of 2^5 A, B, C, D, and E combinations. And (ignoring the e) the first column of Table 8.39 designates the I ↔ ABCD ½ fraction of 2^4 A, B, C, and D combinations, while the second designates the I ↔ −ABCD ½ fraction of 2^4 A, B, C, and D combinations.
Once it is clear that the Johnson, Clapp, and Baqai study can be thought of in terms of expression (8.31) with the two-level blocking factor E, it is also clear how any block effects will show up during data analysis. One temporarily ignores the blocks and uses the Yates algorithm to compute fitted 2^4 factorial effects. It is then necessary to remember, for example, that the fitted ABCD 4-factor interaction reflects not only αβγδ₂₂₂₂ but any block main effects as well.
Table 8.39
A 2^(5−1) Fractional Factorial or a 2^4 Factorial in Two Blocks

Block 1    Block 2
e          a
abe        b
ace        c
bce        abc
ade        d
bde        abd
cde        acd
abcde      bcd
And for example, any 2-factor interaction of A and blocks will be reflected in the fitted BCD 3-factor interaction. Of course, if all interactions with blocks are negligible, all fitted effects except that for the ABCD 4-factor interaction would indeed represent the appropriate 2^4 factorial effects.
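The bookkeeping above amounts to sorting the 2^4 combinations by the sign of the ABCD product. A sketch (naming conventions as in the text; block 1 is written as the high level of E):

```python
from itertools import product

block1, block2 = [], []
for levels in product((-1, 1), repeat=4):      # D slowest ... A fastest
    a, b, c, d = levels[::-1]
    base = "".join(f for f, l in zip("abcd", (a, b, c, d)) if l == 1)
    if a * b * c * d == 1:                     # the I <-> ABCD half fraction
        block1.append(base + "e")
    else:                                      # the I <-> -ABCD half fraction
        block2.append(base or "(1)")

print(block1)  # ['e', 'abe', 'ace', 'bce', 'ade', 'bde', 'cde', 'abcde']
print(block2)  # ['a', 'b', 'c', 'abc', 'd', 'abd', 'acd', 'bcd']
```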
E ↔ BCD (8.32)
F ↔ ABC (8.33)
will be used here. Table 8.40 indicates the 16 combinations of levels of factors A
through F prescribed by the generators (8.32) and (8.33).
Example 17 (continued)
The four different combinations of levels of E and F ((1), e, f, and ef) can be thought of as designating in which block a given ABCD combination should appear. So generators (8.32) and (8.33) prescribe the division of the full 2^4 factorial (in the factors A through D) into the blocks indicated in Table 8.40 and Table 8.41. As always, the defining relation (given here in display (8.34)) describes how effects are aliased. Table 8.42 indicates the aliases of each of the 2^4 factorial effects, obtained by multiplying through relation (8.34) by the various combinations of the letters A, B, C, and D. Notice from Table 8.42 that the BCD and ABC 3-factor interactions are aliased with block main effects. So is the AD 2-factor interaction, since one of its aliases is EF, which involves only the two-level extra factors E and F used to represent the four-level factor Blocks. On the other hand, if interactions with Blocks are negligible, it is only these three of the 2^4 factorial effects that are aliased with other possibly nonnegligible effects. (For any other of the 2^4 factorial effects, each alias involves letters both from the group A, B, C, and D and also from the group E and F—and is therefore some kind of Block × Treatment interaction.)

Table 8.40
16 Combinations of Levels of A through F

A  B  C  D  E  F  Block
−  −  −  −  −  −  1
+  −  −  −  −  +  3
−  +  −  −  +  +  4
+  +  −  −  +  −  2
−  −  +  −  +  +  4
+  −  +  −  +  −  2
−  +  +  −  −  −  1
+  +  +  −  −  +  3
−  −  −  +  +  −  2
+  −  −  +  +  +  4
−  +  −  +  −  +  3
+  +  −  +  −  −  1
−  −  +  +  −  +  3
+  −  +  +  −  −  1
−  +  +  +  +  −  2
+  +  +  +  +  +  4

Table 8.41
A 2^4 Factorial in Four Blocks (from a 2^(6−2) Fractional Factorial)

Table 8.42
Aliases of the 2^4 Factorial Effects When Run in Four Blocks Prescribed by Generators (8.32) and (8.33)

Analysis of data from a plan like that in Table 8.41 would proceed as indicated repeatedly in this chapter. The Yates algorithm applied to sample means listed in Yates standard order for factors A, B, C, and D produces estimates that are interpreted in light of the alias structure laid out in Table 8.42.
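Table 8.42's alias bookkeeping can be reproduced mechanically. Under generators (8.32) and (8.33) the defining relation is I ↔ BCDE ↔ ABCF ↔ ADEF, and multiplying any 2^4 factorial effect through it (symmetric difference of letter sets) gives its three aliases. A minimal sketch (ours):

```python
from itertools import combinations

# the nonidentity words of the defining relation I <-> BCDE <-> ABCF <-> ADEF
words = [frozenset("BCDE"), frozenset("ABCF"), frozenset("ADEF")]

def aliases(effect):
    e = frozenset(effect)
    return ["".join(sorted(e ^ w)) or "I" for w in words]

for r in range(5):                    # all 16 effects of the 2^4 factorial
    for eff in combinations("ABCD", r):
        s = "".join(eff)
        print(s or "I", "<->", ", ".join(aliases(s)))
```

For example, the BCD row prints E (a block main effect) among its aliases, and the AD row prints EF, matching the discussion above.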
Example 18 (continued)
Only eight combinations are to be chosen. In doing so, one needs to account for the four experimental factors A, B, C, and D and two extras E and F, which can be used to represent the four-level factor Blocks. Starting with the first three experimental factors A, B, and C (three of them because 2^3 = 8), one needs to choose three generators. The original 2^(4−1) study had generator

D ↔ ABC

so it is natural to begin there. For the sake of example, consider also the generators

E ↔ BC
F ↔ AC

and the prescribed set of combinations listed in Table 8.43. (The four different combinations of levels of E and F ((1), e, f, and ef) designate in which block a given ABCD combination from the ½ fraction should appear.)
Table 8.43
A 2^(6−3) Fractional Factorial or a 2^(4−1) Fractional Factorial in Four Blocks
Some experimenting with relation (8.35) will show that all 2-factor inter-
actions of the four original experimental factors A, B, C, and D are aliased not
only with other 2-factor interactions of experimental factors but also with Block
main effects. Thus, any systematic block-to-block changes would further confuse
one’s perception of 2-factor interactions of the experimental factors. But at least
the main effects of A, B, C, and D are not aliased with Block main effects.
figuring out, for example, how to analyze a full 2^4 factorial that is run completely once in each of two blocks, or even how to analyze a standard 2^(4−1) fractional factorial that is run completely once in each of four blocks.
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. What are the advantages and disadvantages of fractional factorial experimentation in comparison to factorial experimentation?
2. Under what circumstances can one hope to be successful experimenting with (say) 12 factors in (say) 16 experimental runs (i.e., based on 16 data points)?
3. What is the principle of "sparsity of effects" and how can it be used in the analysis of unreplicated 2^p and 2^(p−q) experiments?
4. In a 7-factor study, only 32 different combinations of levels of (two-level factors) A, B, C, D, E, F, and G will be included, at least initially. The generators F ↔ ABCD and G ↔ ABCE will be used to choose the 32 combinations to include in the study.
(a) Write out the whole defining relation for the experiment that is contemplated here.
(b) Based on your answer to part (a), what effects will be aliased with the C main effect in the experiment that is being planned?
(c) When running the experiment, what levels of factors F and G are used when all of A, B, C, D, and E are at their low levels? What levels of factors F and G are used when A, B, and C are at their high levels and D and E are at their low levels?
(d) Suppose that after listing the data (observed y's) in Yates standard order as regards factors A, B, C, D, and E, you use the Yates algorithm to compute 32 fitted sums of effects. Suppose further that the fitted values appearing on the A + aliases, ABCD + aliases, and BCD + aliases rows of the Yates computations are the only ones judged to be of both statistical significance and practical importance. What is the simplest possible interpretation of this result?
5. In a 2^(5−2) study, where four sample sizes are 1 and four sample sizes are 2, sP = 5. If 90% two-sided confidence limits are going to be used to judge the statistical detectability of sums of effects, what plus-or-minus value will be used?
6. Consider planning, executing, and analyzing the results of a 2^(6−2) fractional factorial experiment based on the two generators E ↔ ABC and F ↔ BCD.
(a) Write out the defining relation (i.e., the whole list of aliases of the grand mean) for such a plan.
(b) When running the experiment, what levels of factors E and F are used when all of A, B, C, and D are at their low levels? When A is at its high level but B, C, and D are at their low levels?
(c) Suppose that m = 3 data points from each of the 16 combinations of levels of factors (specified by the generators) give a value of sP ≈ 2.00. If individual 90% two-sided confidence intervals are to be made to judge the statistical significance of the estimated (sums of) effects, what is the value of the plus-or-minus part of each of those intervals?
Chapter 8 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation of Chapter Exercise 4 of Chapter 4. That exercise concerns some unreplicated 2^3 factorial data taken from a study of the mechanical properties of a polymer. If you have not already done so, use the Yates algorithm to compute fitted 2^3 factorial effects for the data given in that exercise. Then make a normal plot of the seven fitted effects a₂, b₂, ..., abc₂₂₂ as a means of judging the statistical detectability of the various effects on impact strength. Interpret this plot.
2. Chapter Exercise 5 in Chapter 4 concerns a 2^3 study of mechanical pencil lead strength done by Timp and M-Sidek. Return to that exercise, and if you have not already done so, use the Yates algorithm to compute fitted 2^3 effects for the logged data.
(a) Compute sP for the logged data. Individual confidence intervals for the theoretical 2^3 effects are of the form Ê ± Δ. Find Δ if 95% individual two-sided intervals are of interest.
(b) Based on your value from part (a), which of the factorial effects are statistically detectable? Considering only those effects that are both statistically detectable and large enough to have a material impact on the breaking strength, interpret the results of the students' experiment. (For example, if the A main effect is judged to be both detectable and of practical importance, what does moving from the .3 diameter to the .7 diameter do to the breaking strength? Remember to translate back from the log scale when making these interpretations.)
(c) Use the reverse Yates algorithm to produce fitted ln(y) values for a few-effects model corresponding to your answer to (b). Use the fitted values to compute residuals (still on the log scale). Normal-plot these and plot them against levels of each of the three factors and against the fitted values, looking for obvious problems with the few-effects model.
(d) Based on your few-effects model, give a 95% two-sided confidence interval for the mean ln(y) that would be produced by the abc treatment combination. By exponentiating the endpoints of this interval, give a 95% two-sided confidence interval for the median number of clips required to break a piece of lead under this set of conditions.
3. The following are the weights recorded by I = 3 different students when weighing the same nominally 5 g mass with J = 2 different scales m = 2 times apiece. (They are part of the much larger data set given in Chapter Exercise 5 of Chapter 3.)

           Scale 1       Scale 2
Student 1  5.03, 5.02    5.07, 5.09
Student 2  5.03, 5.01    5.02, 5.07
Student 3  5.06, 5.00    5.10, 5.08

Corresponding fitted factorial effects are: a₁ = .00417, a₂ = −.01583, a₃ = .01167, b₁ = −.02333, b₂ = .02333, ab₁₁ = −.00417, ab₁₂ = .00417, ab₂₁ = .01083, ab₂₂ = −.01083, ab₃₁ = −.00667, and ab₃₂ = .00667. Further, a pooled standard deviation is sP = .02483.
(a) To enhance an interaction plot of sample means with error bars derived from 95% two-sided individual confidence limits for the mean weights, what plus-or-minus value would be used to make those error bars? Make such a plot and discuss the likely statistical detectability of the interactions.
(b) Individual 95% two-sided confidence limits for the interactions αβᵢⱼ are of the form abᵢⱼ ± Δ. Find Δ here. Based on this, are the interactions statistically detectable?
(c) Compare the Student main effects using individual 95% two-sided confidence intervals.
(d) Compare the Student main effects using simultaneous 95% two-sided confidence intervals.
4. The oil viscosity study of Dunnwald, Post, and Kilcoin (referred to in Chapter Exercise 8 of Chapter 7) was actually a 3 × 4 full factorial study. Some summary statistics for the entire data set are recorded in the accompanying tables. Summarized are m = 10 measurements of the viscosities of each of four different weights of three different brands of motor oil at room temperature. Units are seconds required for a ball to drop a particular distance through the oil.

           10W30              SAE 30
Brand M    ȳ₁₁ = 1.385        ȳ₁₂ = 2.066
           s₁₁ = .091         s₁₂ = .097
Brand C    ȳ₂₁ = 1.319        ȳ₂₂ = 2.002
           s₂₁ = .088         s₂₂ = .089
Brand H    ȳ₃₁ = 1.344        ȳ₃₂ = 2.049
           s₃₁ = .066         s₃₂ = .089

           10W40              20W50
Brand M    ȳ₁₃ = 1.414        ȳ₁₄ = 4.498
           s₁₃ = .150         s₁₄ = .204
Brand C    ȳ₂₃ = 1.415        ȳ₂₄ = 4.662
           s₂₃ = .115         s₂₄ = .151
Brand H    ȳ₃₃ = 1.544        ȳ₃₄ = 4.549
           s₃₃ = .068         s₃₄ = .171

(a) Find the pooled sample standard deviation here. What are the associated degrees of freedom?
(b) Make an interaction plot of sample means. Enhance this plot by adding error bars derived from 99% individual confidence intervals for the cell means. Does it appear that there are important and statistically detectable interactions here?
(a) Compute the fitted 2^3 factorial effects corresponding to the "all high treatment" combination.
(b) Compute the pooled sample standard deviation, sP.
(c) Use your value of sP from (b) and find the plus-or-minus part of 90% individual two-sided confidence limits for the 2^3 factorial effects.
(d) Based on your calculation in (c), which of the effects do you judge to be detectable in this 2^3 study?
(e) Write a paragraph or two for your engineering manager, summarizing the results of this experiment and making recommendations for the future running of this process. (Remember that you want low y and, all else being equal, low production cost.)
11. The article "Use of Factorial Designs in the Development of Lighting Products" by J. Scheesley (Experiments in Industry: Design, Analysis and Interpretation of Results, American Society for Quality Control, 1985) discusses a large industrial experiment intended to compare the use of two different types of lead wire in the manufacture of incandescent light bulbs under a variety of plant circumstances. The primary response variable in the study was

y = average number of leads missed per hour (because of misfeeds into automatic assembly equipment)

which was measured and recorded on the basis of eight-hour shifts. Consider here only part of the original data, which may be thought of as having replicated 2^4 factorial structure. That is, consider the following factors and levels:

A Lead Type        standard (−) vs. new (+)
B Plant            1 (−) vs. 2 (+)
C Machine Type     standard (−) vs. high speed (+)
D Shift            1st (−) vs. 2nd (+)

m = 4 values of y (each requiring an eight-hour shift to produce) for each combination of levels of factors A, B, C, and D gave the accompanying ȳ and s² values.

Combination  ȳ     s²      Combination  ȳ     s²
(1)          28.4  97.6    d            36.8  146.4
a            21.9  15.1    ad           19.2  24.8
b            20.2  5.1     bd           19.9  5.7
ab           14.3  61.1    abd          22.5  22.5
c            30.4  43.5    cd           25.5  53.4
ac           25.1  96.2    acd          21.5  56.6
bc           38.2  100.8   bcd          22.0  10.4
abc          12.8  23.6    abcd         22.5  123.8

(a) Compute the pooled sample standard deviation. What does it measure in the present context? (Variability in hour-to-hour missed lead counts? Variability in shift-to-shift missed lead per hour figures?)
(b) Use the Yates algorithm and compute the fitted 2^4 factorial effects.
(c) Which of the effects are statistically detectable here? (Use individual two-sided 98% confidence limits for the effects to make this determination.) Is there a simple interpretation of this set of effects?
(d) Would you be willing to say, on the basis of your analysis in (a) through (c), that the new lead type will provide an overall reduction in the number of missed leads? Explain.
(e) Would you be willing to say, on the basis of your analysis in (a) through (c), that a switch to the new lead type will provide a reduction in missed leads for every set of plant circumstances? Explain.
12. DeBlieck, Rohach, Topf, and Wilcox conducted a replicated 3 × 3 factorial study of the uniaxial force required to buckle household cans. A single brand of cola cans, a single brand of beer cans, and a single brand of soup cans were used in the study. The cans were prepared by bringing them to 0°C, 22°C, or 200°C before testing. The forces required to buckle each of m = 3 cans for the nine different Can Type/Temperature combinations follow.
abᵢⱼ ± Δ are appropriate. Find Δ. Based on this value, are there statistically detectable interactions here? How does this conclusion compare with your more qualitative answer to part (c)?
(f) To compare Width main effects, confidence intervals for the differences βⱼ − βⱼ′ are in order. Find individual 95% two-sided confidence intervals for β₁ − β₂, β₁ − β₃, and β₂ − β₃. Based on these, are there statistically detectable Width main effects here? How does this compare with your answer to part (c)?
(g) Redo part (f), this time using simultaneous 95% two-sided confidence intervals.
15. In Section 8.3, you were advised to choose ½ fractions of 2^p factorials by using the generator

last factor ↔ product of all other factors

For example, this means that in choosing ½ of the 2^4 possible combinations of levels of factors A, B, C, and D, you were advised to use the generator D ↔ ABC. There are other possibilities. For example, you could use the generator D ↔ AB.
(a) Using this alternative plan (specified by D ↔ AB), what eight different combinations of factor levels would be run? (Use the standard naming convention, listing for each of the eight sets of experimental conditions to be run those factors appearing at their high levels.)
(b) For the alternative plan specified by D ↔ AB, list all eight pairs of effects of factors A, B, C, and D that would be aliased. (You may, if you wish, list eight sums of the effects µ...., α₂, β₂, αβ₂₂, γ₂, ... etc. that can be estimated.)
(c) Suppose that in an analysis of data from an experiment run according to the alternative plan (with D ↔ AB), the Yates algorithm is used with ȳ's listed according to Yates standard order for factors A, B, and C. Give four equally plausible interpretations of the eventuality that the first four lines of the Yates calculations produce large estimated sums of effects (in comparison to the other four, for example).
(d) Why might it be well argued that the choice D ↔ ABC is superior to the choice D ↔ AB?
16. p = 5 factors A, B, C, D, and E are to be studied in a 2^(5−2) fractional factorial study. The two generators D ↔ AB and E ↔ AC are to be used in choosing the eight ABCDE combinations to be included in the study.
(a) Give the list of eight different combinations of levels of the factors that will be included in the study. (Use the convention of naming, for each sample, those factors that should be set at their high levels.)
(b) Give the list of all effects aliased with the A main effect if this experimental plan is adopted.
17. The following are eight sample means listed in Yates standard order (left to right), considering levels of three two-level factors A, B, and C:

70, 61, 72, 59, 68, 64, 69, 69

(a) Use the Yates algorithm here to compute eight estimates of effects from the sample means.
(b) Temporarily suppose that no value for sP is available. Make a plot appropriate to identifying those estimates from (a) that are likely to represent something more than background noise. Based on the appearance of your plot, which if any of the estimated effects are clearly representing something more than background noise?
(c) As it turned out, sP = .9, based on m = 2 observations at each of the eight different sets of conditions. Based on 95% individual two-sided confidence intervals for the underlying effects estimated from the eight ȳ's, which estimated effects are clearly representing something other than background noise? (If confidence intervals Ê ± Δ were to be made, show the calculation of Δ and state which estimated effects are clearly representing more than noise.)
Still considering the eight sample means, henceforth suppose that by some criteria, only the estimates ending up on the first, second, and sixth lines of the Yates calculations are considered to be both statistically detectable and of practical importance.
(d) If in fact the eight ȳ's came from a (4-factor) 2^(4−1) experiment with generator D ↔ ABC, how would one typically interpret the result that the first, second, and sixth lines of the Yates calculations (for means in standard order for factors A, B, and C) give statistically detectable and practically important values?
(e) If in fact the eight ȳ's came from a (5-factor) 2^(5−2) experiment with generators D ↔ ABC and E ↔ AC, how would one typically interpret the result that the first, second, and sixth lines of the Yates calculations (for means in standard order for factors A, B, and C) give statistically detectable and practically important values?
18. A production engineer who wishes to study six two-level factors in eight experimental runs decides to use the generators D ↔ AB, E ↔ AC, and F ↔ BC in planning a 2^(6−3) fractional factorial experiment.
(a) What eight combinations of levels of the six factors will be run? (Name them using the usual convention of prescribing for each run which of the factors will appear at their high levels.)
(b) What seven other effects will be aliased with the A main effect in the engineer's study?
19. The article "Going Beyond Main-Effect Plots" by Kenett and Vogel (Quality Progress, 1991) outlines the results of a 2^(5−1) fractional factorial industrial experiment concerned with the improvement of the operation of a wave soldering machine. The effects of the five factors Conveyor Speed (A), Preheat Temperature (B), Solder Temperature (C), Conveyor Angle (D), and Flux Concentration (E) on the variable

y = the number of faults per 100 solder joints (computed from inspection of 12 circuit boards)

were studied. (The actual levels of the factors employed were not given in the article.) The combinations studied and the values of y that resulted are given next.

Combination  y       Combination  y
(1)          .037    de           .351
a            .040    ade          .360
b            .014    bde          .329
ab           .042    abde         .173
ce           .063    cd           .372
ace          .100    acd          .184
bce          .067    bcd          .158
abce         .026    abcd         .131

Kenett and Vogel were apparently called in after the fact of experimentation to help analyze this nonstandard ½ fraction of the full 2^5 factorial. The recommendations of Section 8.3 were not followed in choosing which 16 of the 32 possible combinations of levels of factors A through E to include in the wave soldering study. In fact, the generator E ↔ −CD was apparently employed.
(a) Verify that the combinations listed above are in fact those prescribed by the relationship E ↔ −CD. (For example, with all of A through D at their low levels, note that the low level of E is indicated by multiplying minus signs for C and D by another minus sign. Thus, combination (1) is one of the 16 prescribed by the generator.)
(b) Write the defining relation for the experiment. What is the resolution of the design chosen by the authors? What resolution does the standard choice of ½ fraction provide? Unless there were some unspecified extenuating circumstances that dictated the choice of ½ fraction, why does it seem to be an unwise one?
(c) Write out the 16 different differences of effects that can be estimated based on the data given. (For example, one of these is µ..... − γδε₂₂₂, another is α₂ − αγδε₂₂₂₂, etc.)
(d) Notice that the combinations listed here are in Yates standard order as regards levels of factors A through D. Use the four-cycle Yates
algorithm and find the fitted differences of effects. Normal-plot these and identify any statistically detectable differences. Notice that by virtue of the choice of ½ fraction made by the engineers, the most obviously statistically significant difference is that of a main effect and a 2-factor interaction.
20. The article "Robust Design: A Cost-Effective Method for Improving Manufacturing Processes" by Kacker and Shoemaker (AT&T Technical Journal, 1986) discusses the use of a 2^(8−4) fractional factorial experiment in the improvement of the performance of a step in an integrated circuit fabrication process. The initial step in fabricating silicon wafers for IC devices is to grow an epitaxial layer of sufficient (and, ideally, uniform) thickness on polished wafers. The engineers involved in running this part of the production process considered the effects of eight factors (listed in the accompanying table) on the properties of the deposited epitaxial layer.

Factor A  Arsenic Flow Rate        55% (−) vs. 59% (+)
Factor B  Deposition Temperature   1210°C (−) vs. 1220°C (+)
Factor C  Code of Wafers           668G4 (−) vs. 678G4 (+)
Factor D  Susceptor Rotation       continuous (−) vs. oscillating (+)
Factor E  Deposition Time          high (−) vs. low (+)
Factor F  HCl Etch Temperature     1180°C (−) vs. 1215°C (+)
Factor G  HCl Flow Rate            10% (−) vs. 14% (+)
Factor H  Nozzle Position          2 (−) vs. 6 (+)

A batch of 14 wafers is processed at one time, and the experimenters measured thickness at five locations on each of the wafers processed during one experimental run. These 14 × 5 = 70 measurements from each run of the process were then reduced to two response variables:

y1 = the mean of the 70 thickness measurements
y2 = the logarithm of the variance of the 70 thickness measurements

y2 is a measure of uniformity of the epitaxial thickness, and y1 is (clearly) a measure of the magnitude of the thickness. The authors reported results from the experiment as shown in the accompanying table.

Combination  y1 (µm)  y2
(1)          14.821   −.4425
afgh         14.888   −1.1989
begh         14.037   −1.4307
abef         13.880   −.6505
cefh         14.165   −1.4230
aceg         13.860   −.4969
bcfg         14.757   −.3267
abch         14.921   −.6270
defg         13.972   −.3467
adeh         14.032   −.8563
bdfh         14.843   −.4369
abdg         14.415   −.3131
cdgh         14.878   −.6154
acdf         14.932   −.2292
bcde         13.907   −.1190
abcdefgh     13.914   −.8625

It is possible to verify that the combinations listed here come from the use of the four generators E ↔ BCD, F ↔ ACD, G ↔ ABD, and H ↔ ABC.
(a) Write out the whole defining relation for this experiment. (The grand mean will have 15 aliases.) What is the resolution of the design?
(b) Consider first the response y2, the measure of uniformity of the epitaxial layer. Use the Yates algorithm and normal- and/or half normal-plotting (see Exercise 9) to identify statistically detectable fitted sums of effects. Suppose that only the two largest (in magnitude) of these are judged to be both statistically significant and of practical importance. What is suggested about how levels of the factors might henceforth be set in order to minimize y2? From the limited description of the process above, does it appear that these settings require any extra manufacturing expense?
(c) Turn now to the response y1. Again use the Yates algorithm and normal- and/or half normal-plotting to identify statistically detectable sums of effects. Which of the factors seems to be most important in determining the average epitaxial thickness? In fact, the target thickness for this deposition process was 14.5 µm. Does it appear that by appropriately choosing a level of this variable it may be possible to get the mean thickness on target? Explain. (As it turns out, the thought process outlined here allowed the engineers to significantly reduce the variability in epitaxial thickness while getting the mean on target, improving on previously standard process operating methods.)
21. Arndt, Cahill, and Hovey worked with a plastics manufacturer and experimented on an extrusion process. They conducted a 2^(6−2) fractional factorial study with some partial "replication" (the reason for the quote marks will be discussed later). The experimental factors in their study were as follows:

Factor A  Bulk Density, a measure of the weight per unit volume of the raw material used
Factor B  Moisture, the amount of water added to the raw material mix
Factor C  Crammer Current, the amperage supplied to the crammer-auger
Factor D  Extruder Screw Speed
Factor E  Front-End Temperature, a temperature controlled by heaters on the front end of the extruder
Factor F  Back-End Temperature, a temperature controlled by heaters on the back end of the extruder

Physically low and high levels of these factors were identified. Using the two generators E ↔ AB and F ↔ AC, 16 different combinations of levels of the factors were chosen for inclusion in a plant experiment, where the response of primary interest was the output of the extrusion process in terms of pounds of useful product per hour. A coded version of the data the students obtained is given in the accompanying table. (The data have been rescaled by subtracting a particular value and dividing by another so as to disguise the original responses without destroying their basic structure. You may think of these values as output measured in numbers of some undisclosed units above an undisclosed baseline value.)

Combination  y
ef           13.99
a            6.76
bf           20.71
abe          11.11, 11.13
ce           19.61
acf          15.73
bc           23.45
abcef        20.00
def          24.94
ad           24.03, 25.03
bdf          24.97
abde         24.29
cde          24.94, 25.21
acdf         24.32, 24.48
bcd          30.00
abcdef       33.08

(a) The students who planned this experiment hadn't been exposed to the concept of design resolution. What does Table 8.35 indicate is the best possible resolution for a 2^(6−2) fractional factorial experiment? What is the resolution of the one that the students planned? Why would they have been better off with a different plan than the one specified by the generators E ↔ AB and F ↔ AC?
(b) Find a choice of generators E ↔ (some product of letters A through D) and F ↔ (some other product of letters A through D) that provides maximum resolution for a 2^(6−2) experiment.
(c) The combinations here are listed in Yates standard order as regards factors A through D. Compute ȳ's and then use the (four-cycle) Yates algorithm and compute 16 estimated sums of 2^6 factorial effects.
(d) When the extrusion process is operating, many pieces of product can be produced in an hour, but the entire data collection process leading to the data here took over eight hours. (Note, for example, that changing temperatures on industrial equipment requires time for parts to heat up or cool down, changing formulas of raw material means that one must let one batch clear the system, etc.) The repeat observations above were obtained from two consecutive pieces of product, made minutes apart, without any change in the extruder setup in between their manufacture. With this in mind, discuss why a pooled standard deviation based on these four "samples of size 2" is quite likely to underrepresent the level of "baseline" variability in the output of this process under a fixed combination of levels of factors A through F. Argue that it would have been extremely valuable to have (for example) rerun one or more of the combinations tested early in the study again late in the study.
(e) Use the pooled sample standard deviation from the repeat observations and compute (using the p = 4 version of formula (8.12) in Section 8.2) the plus-or-minus part of 90% two-sided confidence limits for the 16 sums of effects estimated in part (c), acting as if the value of sP were a legitimate estimate of background variability. Which sums of effects are statistically detectable by this standard? How do you interpret this in light of the information in part (d)?
(f) As an alternative to the analysis in part (e), make a normal plot of the last 15 of the 16 estimated sums of effects you computed in part (c). Which sums of effects appear to be statistically detectable? What is the simplest interpretation of your findings in the context of the industrial problem? (What has been learned about how to run the extruding process?)
(g) Briefly discuss where to go from here if it is your job to optimize the extrusion process (maximize y). What data would you collect next, and what would you be planning to do with them?
22. The article "The Successful Use of the Taguchi Method to Increase Manufacturing Process Capability" by S. Shina (Quality Engineering, 1991) discusses the use of a 2^(8−3) fractional factorial experiment to improve the operation of a wave soldering process for through-hole printed circuit boards. The experimental factors and levels studied were as shown in the accompanying table.

Factor A  Preheat Temperature   180° (−) vs. 220° (+)
Factor B  Solder Wave Height    .250 (−) vs. .400 (+)
Factor C  Wave Temperature      490° (−) vs. 510° (+)
Factor D  Conveyor Angle        5.0 (−) vs. 6.1 (+)
Factor E  Flux Type             A857 (−) vs. K192 (+)
Factor F  Direction of Boards   0 (−) vs. 90 (+)
Factor G  Wave Width            2.25 (−) vs. 3.00 (+)
Factor H  Conveyor Speed        3.5 (−) vs. 6.0 (+)

The generators F ↔ −CD, G ↔ −AD, and H ↔ −ABCD were used to pick 32 different combinations of levels of these factors to run. For each combination, four special test printed circuit boards were soldered, and the lead shorts per board, y1, and touch shorts per board, y2, were counted, giving the accompanying data. (The data here and on page 644 are exactly as given in the article, and we have no explanation for the fact that some of the numbers do not seem to have come from division of a raw count by 4.)

Combination  y1     y2
(1)          6.00   13.00
agh          10.00  26.00
bh           10.00  12.00
abg          8.50   14.00
cfh          1.50   18.75
acfg         .25    16.25
bcf          1.75   25.75
abcfgh       4.25   18.50
dfgh         6.50   6.50
(continued)
(a) In light of the material in Chapter 2 on experiment planning and the formal notion of confounding, what risk of a serious logical flaw did the engineers run in the execution of their experiment? (How would possible shift-to-shift differences show up in the data from an experiment run like this? One of the main things learned from the experiment was that factor E was very important. Did the engineers run the risk of clouding their view of this important fact?) Explain.
(b) Devise an alternative plan that could have been used to collect data in the situation of Exercise 22 without completely confounding the effects of Flux and Shift. Continue to use the 32 combinations of the original factors listed in Exercise 22, but give a better assignment of 16 of them to each shift. (Hint: Think of Shift as a ninth factor, pick a sensible generator, and use it to put half of the 32 combinations in each shift. There are a variety of possibilities here.)
(c) Discuss in qualitative terms how you would do data analysis if your suggestion in (b) were to be followed.
24. The article "Computer Control of a Butane Hydrogenolysis Reactor" by Tremblay and Wright (The Canadian Journal of Chemical Engineering, 1974) contains an interesting data set concerned with the effects of p = 3 process variables on the performance of a chemical reactor. The factors and their levels were as follows:

Factor A  Total Feed Flow (cc/sec at STP)   50 (−) vs. 180 (+)
Factor B  Reactor Wall Temperature (°F)     470 (−) vs. 520 (+)
Factor C  Feed Ratio (Hydrogen/Butane)      4 (−) vs. 8 (+)

The data had to be collected over a four-day period, and two combinations of the levels of factors A, B, and C above were run each day along with a center point—a data point with Total Feed Flow 115, Reactor Wall Temperature 495, and Feed Ratio 6. The response variable was

y = percent conversion of butane

and the data in the accompanying table were collected.

Day  Feed Flow  Wall Temp.  Feed Ratio  Combination  y
1    115        495         6           —            78
1    50         470         4           (1)          99
1    180        520         8           abc          87
2    50         520         4           b            98
2    180        470         8           ac           18
2    115        495         6           —            87
3    50         520         8           bc           95
3    180        470         4           a            59
3    115        495         6           —            90
4    50         470         8           c            76
4    180        520         4           ab           92
4    115        495         6           —            89

(a) Suppose that to begin with, you ignore the fact that these data were collected over a period of four days and simply treat the data as a complete 2^3 factorial augmented with a repeated center point. Analyze these data using the methods of this chapter. (Compute sP from the four center points. Use the Yates algorithm and the eight corner points to compute fitted 2^3 factorial effects. Then judge the statistical significance of these using appropriate 95% two-sided confidence limits based on sP.) Is any simple interpretation of the experimental results in terms of factorial effects obvious?

According to the authors, there was the possibility of "process drift" during the period of experimentation. The one-per-day center points were added to the 2^3 factorial at least in part to provide some check on that possibility, and the allocation of two ABC combinations to each day was very carefully done in order to try to minimize the possible confounding introduced by any Day/Block effects. The rest of this problem considers analyses that
might be performed on the experimenters' data in recognition of the possibility of process drift.
(b) Plot the four center points against the number of the day on which they were collected. What possibility is at least suggested by your plot? Would the plot be particularly troubling if your experience with this reactor told you that a standard deviation of around 5(%) was to be expected for values of y from consecutive runs of the reactor under fixed operating conditions on a given day? Would the plot be troubling if your experience with this reactor told you that a standard deviation of around 1(%) was to be expected for values of y from consecutive runs of the reactor under fixed operating conditions on a given day?
(c) The four-level factor Day can be formally thought of in terms of two extra two-level factors—say, D and E. Consider the choice of generators D ↔ AB and E ↔ BC for a 2^(5−2) fractional factorial. Verify that the eight combinations of levels of A through E prescribed by these generators divide the eight possible combinations of levels of A through C up into the four groups of two corresponding to the four days of experimentation. (To begin with, note that both A low, B low, C low and A high, B high, C high correspond to D high and E high. That is, the first level of Day can be thought of as the D high and E high combination.)
(d) The choice of generators in (c) produces the defining relation I ↔ ABD ↔ BCE ↔ ACDE. Write out, on the basis of this defining relation, the list of eight groups of aliased 2^5 factorial effects. Any effect involving factors A, B, or C with either of the letters D (δ) or E (ε) in its name represents some kind of interaction with Days. Explain what it means for there to be no interactions with Days. Make out a list of eight smaller groups of aliased effects that are appropriate supposing that there are no interactions with Days.
(e) Allowing for the possibility of Day (Block) effects, it does not make sense to use the center points to compute sP. However, one might normal-plot (or half normal-plot) the fitted effects from (a). Do so. Interpret your plot, supposing that there were no interactions with Days in the reactor study. How do your conclusions differ (if at all) from those in (a)?
(f) One possible way of dealing with the possibility of Day effects in this particular study is to use the center point on each day as a sort of baseline and express each other response as a deviation from that baseline. (If on day i there is a Day effect γi, and on day i the mean response for any combination of levels of factors A through C is µcomb + γi, the mean of the difference ycomb − ycenter is µcomb − µcenter; one can therefore hope to see 2^3 factorial effects uncontaminated by additive Day effects using such differences in place of the original responses.) For each of the four days, subtract the response at the center point from the other two responses and apply the Yates algorithm to the eight differences. Normal-plot the fitted effects on the (difference from the center point mean) response. Is there any substantial difference between the result of this analysis and that for the others suggested in this problem?
25. The article "Including Residual Analysis in Designed Experiments: Case Studies" by W. H. Collins and C. B. Collins (Quality Engineering, 1994) contains discussions of several machining experiments concerned with surface finish. Given here are the factors and levels studied in (part of) one of those experiments on a particular lathe.

Factor             Levels
A Speed            2500 RPM (−) vs. 4500 RPM (+)
B Feed             .003 in/rev (−) vs. .009 in/rev (+)
C Tool Condition   New (−) vs. Used (after 250 parts) (+)

m = 2 parts were turned on the lathe for each of the 2^3 different combinations of levels of the 3
factors, and surface finish measurements, y, were made on these. (y is a measurement of the vertical distance traveled by a probe as it moves horizontally across a particular 1 inch section of the part.) Next are some summary statistics from the experiment.

Combination  ȳ      s     Combination  ȳ      s
(1)          33.0   0.0   c            35.5   6.4
a            45.5   7.8   ac           44.0   7.1
b            222.5  4.9   bc           216.5  6.4
ab           241.5  4.9   abc          216.5  0.7

(a) Find sP and its degrees of freedom. What does this quantity intend to measure?
(b) 95% individual two-sided confidence limits for the mean surface finish measurement for a part turned under a given set of conditions are of the form ȳᵢⱼₖ ± Δ. Based on the value of sP found above, find Δ.
(c) Would you say that the mean surface finish measurements for parts of types "(1)" and "a" are detectably different? Why or why not? (Show appropriate calculations.)
(d) 95% individual two-sided confidence limits for the 2^3 factorial effects in this study are of the form Ê ± Δ. Find Δ.
(e) Compute the 2^3 factorial fitted effects for the "all high" combination (abc).
(f) Based on your answers to parts (d) and (e), which of the main effects and/or interactions do you judge to be statistically detectable? Explain.
(g) Give the practical implications of your answer to part (f). (How do you suggest running the lathe if small y and minimum machining cost are desirable?)
(h) Suppose you were to judge only the B main effect to be both statistically detectable and of practical importance in this study. What surface finish value would you then predict for a part made at a 2500 RPM speed and a .009 in/rev feed rate using a new tool?
26. Below are 2^4 factorial data for two response variables taken from the article "Chemical Vapor Deposition of Tungsten Step Coverage and Thickness Uniformity Experiments" by J. Chang (Thin Solid Films, 1992). The experiment concerned the blanket chemical vapor deposition of tungsten in the manufacture of integrated circuit chips. The factors studied were as follows:

A Chamber Pressure   8 (−) vs. 9 (+)
B H2 Flow            500 (−) vs. 1000 (+)
C SiH4 Flow          15 (−) vs. 25 (+)
D WF6 Flow           50 (−) vs. 60 (+)

The pressure is measured in Torr and the flows are measured in standard cm³/min. The response variable y1 is the "percent step coverage," 100 times the ratio of tungsten film thickness at the top of the side wall to the bottom of the side wall (large is good). The response variable y2 is an "average sheet resistance" (measured in mΩ).

Combination  y1   y2    Combination  y1   y2
(1)          73   646   d            83   666
a            60   623   ad           80   597
b            77   714   bd           100  718
ab           90   643   abd          85   661
c            67   360   cd           77   304
ac           78   359   acd          90   309
bc           100  335   bcd          70   360
abc          77   318   abcd         75   318

(a) Make a normal plot of the 15 fitted effects a₂, b₂, ..., abcd₂₂₂₂ as a means of judging the statistical detectability of the effects on the response, y1. Interpret this plot and say what is indicated about producing good "percent step coverage."
(b) Repeat part (a) for the response variable y2.

Now suppose that instead of a full factorial study, only the half fraction with defining relation D ↔ ABC had been conducted.
(c) Which 8 of the 16 treatment combinations would have been run? List these combinations in Yates standard order as regards factors A, B, and C and use the (3-cycle) Yates algorithm to compute the 8 estimated sums of effects that it is possible to derive from these 8 treatment combinations for response y2. Verify that each of these 8 estimates is the sum of two of your fitted effects from part (b). (For example, you should find that the first estimated sum here is ȳ.... + abcd₂₂₂₂ from part (b).)
(d) Normal-plot the last 7 of the estimated sums from (c). Interpret this plot. If you had only the data from this 2^(4−1) fractional factorial, would your subject-matter conclusions be the same as those reached in part (b), based on the full 2^4 data set?
27. An engineer wishes to study seven experimental factors, A, B, C, D, E, F and G, each at 2 levels, using only 16 combinations of factor levels. He plans initially to use generators E ↔ ABCD, F ↔ ABC, and G ↔ BCD.
(a) With this initial choice of generators, what 16 combinations of levels of the seven factors will be run?
(b) In a 2^(7−3) fractional factorial, each effect is aliased with 7 other effects. Starting from the engineer's choice of generators, find the defining relation for his study. (You will need not only to consider products of pairs but also a product of a triple.)
(c) An alternative choice of generators is E ↔ ABC, F ↔ BCD, G ↔ ABD. This choice yields the defining relation

I ↔ ABCE ↔ BCDF ↔ ABDG ↔ ADEF ↔ CDEG ↔ ACFG ↔ BEFG

Which is preferable, the defining relation in part (b), or the one here? Why?
28. The article "Establishing Optimum Process Levels of Suspending Agents for a Suspension Product" by A. Gupta (Quality Engineering, 1997–1998) discussed an unreplicated fractional factorial experiment. The experimental factors and their levels in the study were:

A Method of Preparation   Usual (−) vs. Modified (+)
B Sugar Content           50% (−) vs. 60% (+)
C Antibiotic Level        8% (−) vs. 16% (+)
D Aerosol                 .4% (−) vs. .6% (+)
E CMC                     .2% (−) vs. .4% (+)

The response variable was

y = separated clear volume (%) for a suspension of antibiotic after 45 days

and the manufacturer hoped to find a way to make y small. The experimenters failed to follow the recommendation in Section 8.3 for choosing a best half fraction of the factorial and used the generator E ↔ ABC (instead of the better one E ↔ ABCD).
(a) In what sense was the experimental plan used in the study inferior to the one prescribed in Section 8.3? (How is the one from Section 8.3 "better"?)

The Yates algorithm applied to the 16 responses given in the paper produced the 16 fitted sums of effects:

mean + alias = 37.563    D + alias = −7.437
A + alias = .187         AD + alias = .937
B + alias = 2.437        BD + alias = .678
AB + alias = .312        ABD + alias = .812
C + alias = −1.062       CD + alias = 1.438
AC + alias = .312        ACD + alias = .062
BC + alias = −1.187      BCD + alias = .062
ABC + alias = −2.063     ABCD + alias = −.062

(b) Make a normal plot of the last 15 of these fitted sums.
(c) Based on the normal plot in (b), which sums of effects do you judge to be statistically detectable? Explain.
(d) Based on your answers to (b) and (c), how do you suggest that suspensions of this antibiotic be made in order to produce small y? What mean y do you predict if your recommendations are followed?
(e) Actually, the company that ran this study planned to make suspensions using both high and low levels of antibiotic (factor C). Does your answer to (d) suggest that the company needs to use different product formulations for the two levels of antibiotic? Explain.
29. The paper "Achieving a Target Value for a Manufacturing Process," by Eibl, Kess, and Pukelsheim (Journal of Quality Technology, 1992) describes a series of experiments intended to guide the adjustment of a paint coating process. The first of these was a 2^(6−3) fractional factorial study. The experimental factors studied were as follows (exact levels of these factors are not given in the paper, presumably due to corporate security considerations):

A Tube Height            low (−) vs. high (+)
B Tube Width             low (−) vs. high (+)
C Paint Viscosity        low (−) vs. high (+)
D Belt Speed             low (−) vs. high (+)
E Pump Pressure          low (−) vs. high (+)
F Heating Temperature    low (−) vs. high (+)

The response variable was a paint coating thickness measurement, y, whose units are mm. m = 4 workpieces were painted and measured for each of the r = 8 combinations of levels of the factors studied. The r = 8 samples of size m = 4 produced a value of sP = .118 mm.
(a) Suppose that you wish to attach a precision to one of the r = 8 sample means obtained in this study. This can be done using 95% two-sided confidence limits of the form ȳ ± Δ. Find Δ.
(b) Following are the mean thicknesses measured for the combinations studied, listed in Yates standard order as regards levels of factors A, B, and C. Use the Yates algorithm and find eight estimated (sums of) effects.

A  B  C  ȳ
−  −  −  .98
+  −  −  1.58
−  +  −  1.13
+  +  −  1.74
−  −  +  1.49
+  −  +  .84
−  +  +  2.18
+  +  +  1.45

(c) Two-sided confidence limits based on the estimated (sums of) effects calculated in part (b) are of the form Ê ± Δ. Find Δ if (individual) 95% confidence is desired.
(d) Based on your answer to (c), list those estimates from part (b) that represent statistically detectable (sums of) effects.

In fact, the experimental plan used by the investigators had generators D ↔ AC, E ↔ BC, and F ↔ ABC.
(e) Specify the combinations (of levels of the experimental factors A, B, C, D, E and F) that were included in the experiment.
(f) Write out the whole defining relation for this study. (You will need to consider here not only products of pairs but a product of a triple as well. The grand mean is aliased with seven other effects.)
(g) In light of your answers to part (d) and the aliasing pattern here, what is the simplest possible potential interpretation of the results of this experiment?
9
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Regression Analysis—Inference for Curve- and Surface-Fitting

9.1 Inference Methods Related to the Least Squares Fitting of a Line (Simple Linear Regression)
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
yᵢⱼ = µᵢ + εᵢⱼ   (9.1)

yᵢⱼₖ = µ.. + αᵢ + βⱼ + εᵢⱼₖ   (9.3)

Model (9.3) really differs from model (9.2) or (9.1) only in the fact that it postulates a special form or restriction for the means µᵢⱼ. Expression (9.3) says that the means must satisfy a parallelism relationship.

Turning now to the matter of inference based on data pairs (x₁, y₁), (x₂, y₂), . . . , (xₙ, yₙ) exhibiting an approximately linear scatterplot, one once again proceeds by imposing a restriction on the one-way model (9.1). In words, the model assumptions will be that there are underlying normal distributions for the response y with a common variance σ² and means that change linearly in x. In symbols, for i = 1, 2, . . . , n,

yᵢ = β₀ + β₁xᵢ + εᵢ   (9.4)

where the εᵢ are (unobservable) iid normal (0, σ²) random variables, the xᵢ are known constants, and β₀, β₁, and σ² are unknown model parameters (fixed constants). Model (9.4) is commonly known as the (normal) simple linear regression model. If one thinks of the different values of x in an (x, y) data set as separating it into various samples of y's, expression (9.4) is the specialization of model (9.1) where the (previously unrestricted) means of y satisfy the linear relationship µ_y|x = β₀ + β₁x. Figure 9.1 is a pictorial representation of the "constant variance, normal, linear (in x) mean" model.
Inferences about quantities involving those x values represented in the data (like
the mean response at a single x or the difference between mean responses at two
different values of x) will typically be sharper when methods based on model (9.4)
can be used in place of the general methods of Chapter 7. And to the extent that model
(9.4) describes system behavior for values of x not included in the data, a model
like (9.4) provides for inferences involving limited interpolation and extrapolation
on x.
Section 4.1 contains an extensive discussion of the use of least squares in the
fitting of the approximately linear relation
y ≈ β₀ + β₁x   (9.5)
to a set of (x, y) data. Rather than redoing that discussion, it is most sensible simply
to observe that Section 4.1 can be thought of as an exposition of fitting and the
use of residuals in model checking for the simple linear regression model (9.4). In
Figure 9.1 A pictorial representation of the simple linear regression model: distributions of y for various x, centered on the line µ_y|x = β₀ + β₁x
particular, associated with the simple linear regression model are the estimates of β₁ and β₀

Estimator of β₁, the slope:
b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²   (9.6)

and

Estimator of β₀, the intercept:
b₀ = ȳ − b₁x̄   (9.7)
Definition 1   For a set of data pairs (x₁, y₁), (x₂, y₂), . . . , (xₙ, yₙ) where least squares fitting of a line produces fitted values (9.8) and residuals (9.9),

s²LF = (1/(n − 2)) Σ(y − ŷ)² = (1/(n − 2)) Σe²   (9.10)

s²LF estimates the level of basic background variation, σ², whenever the model (9.4) is an adequate description of the system under study. When it is not, sLF will tend to overestimate σ. So comparing sLF to sP is another way of investigating the appropriateness of model (9.4). (sLF much larger than sP suggests the linear regression model is a poor one.)
Example 1   Recall from Section 4.1 the students' study of the pressures used in the dry pressing of a ceramic compound into cylinders and the resulting cylinder densities. Table 9.1 gives their n = 15 data pairs, and Figure 9.2 is a scatterplot of the data.
Recall further from the calculation of R² in Example 1 of Chapter 4 that the data of Table 4.1 produce fitted values in Table 4.2 and then

Σ(y − ŷ)² = .005153

So for the pressure/density data, one has (via formula (9.10)) that

s²LF = (1/(15 − 2))(.005153) = .000396 (g/cc)²

so

sLF = √.000396 = .0199 g/cc
If one accepts the appropriateness of model (9.4) in this powder pressing example,
for any fixed pressure the standard deviation of densities associated with many
cylinders made at that pressure would be approximately .02 g/cc.
Table 9.1
Pressing Pressures and Resultant Specimen Densities
(x, Pressure (psi); y, Density (g/cc) — three specimens at each of the five pressures; individual values not reproduced here)

Figure 9.2 Scatterplot of specimen density versus pressing pressure
Table 9.2
Sample Means and Standard Deviations of Densities for Five Different Pressing Pressures

x, Pressure (psi)   ȳ, Sample Mean   s, Sample Standard Deviation
 2,000              2.479            .0070
 4,000              2.569            .0110
 6,000              2.652            .0056
 8,000              2.769            .0423
10,000              2.866            .0114
Comparing sLF = .0199 to the pooled value sP = .0206 that can be computed from the five sample standard deviations in Table 9.2, there is no indication of poor fit carried by these values.
Section 4.1 includes some plotting of the residuals (9.9) for the pressure/density
data (in particular, a normal plot that appears as Figure 4.7). Although the (raw)
residuals (9.9) are most easily calculated, most commercially available regression
programs provide standardized residuals as well as, or even in preference to, the
raw residuals. (At this point, the reader should review the discussion concerning
standardized residuals surrounding Definition 2 of Chapter 7.) In curve- and surface-
fitting analyses, the variances of the residuals depend on the corresponding x’s.
Standardizing before plotting is a way to prevent mistaking a pattern on a residual
plot that is explainable on the basis of these different variances for one that is
indicative of problems with the basic model. Under model (9.4), for a given x with corresponding response y,

Var(y − ŷ) = σ²(1 − 1/n − (x − x̄)²/Σ(x − x̄)²)   (9.11)

So using formula (9.11) and Definition 7.2, corresponding to the data pair (xᵢ, yᵢ) is the standardized residual for simple linear regression

Standardized residuals for simple linear regression:
eᵢ* = eᵢ / (sLF √(1 − 1/n − (xᵢ − x̄)²/Σ(x − x̄)²))   (9.12)
The more sophisticated method of examining residuals under model (9.4) is thus to
make plots of the values (9.12) instead of plotting the raw residuals (9.9).
Example 1 (continued)   Consider how the standardized residuals for the pressure/density data set are related to the raw residuals. Recalling that

Σ(x − x̄)² = 120,000,000

and that the xᵢ values in the original data included only the pressures 2,000 psi, 4,000 psi, 6,000 psi, 8,000 psi, and 10,000 psi, it is easy to obtain the necessary values of the radical in the denominator of expression (9.12). These are collected in Table 9.3.
Table 9.3
Calculations for Standardized Residuals in the Pressure/Density Study

x         √(1 − 1/15 − (x − 6,000)²/120,000,000)
 2,000    .894
 4,000    .949
 6,000    .966
 8,000    .949
10,000    .894
The entries in Table 9.3 show, for example, that one should expect residuals
corresponding to x = 6,000 psi to be (on average) about .966/.894 = 1.08 times
as large as residuals corresponding to x = 10,000 psi. Division of raw residuals
by sLF times the appropriate entry of the second column of Table 9.3 then puts
them all on equal footing, so to speak. Table 9.4 shows both the raw residuals
(taken from Table 4.5) and their standardized counterparts.
In the present case, since the values .894, .949, and .966 are roughly com-
parable, standardization via formula (9.12) doesn’t materially affect conclusions
about model adequacy. For example, Figures 9.3 and 9.4 are normal plots of (re-
spectively) raw residuals and standardized residuals. For all intents and purposes,
they are identical. So any conclusions (like those made in Section 4.1 based on
Figure 4.7) about model adequacy supported by Figure 9.3 are equally supported
by Figure 9.4, and vice versa.
In other situations, however (especially those where a data set contains a
few very extreme x values), standardization can involve more widely varying
denominators for formula (9.12) than those implied by Table 9.3 and thereby
affect the results of a residual analysis.
Table 9.4
Residuals and Standardized Residuals for the Pressure/Density Study
x e Standardized Residual
2,000 .0137, .0067, −.0003 .77, .38, −.02
4,000 −.0117, .0003, .0103 −.62, .02, .55
6,000 −.0210, −.0100, −.0140 −1.09, −.52, −.73
8,000 −.0403, .0097, .0437 −2.13, .51, 2.31
10,000 −.0007, .0173, −.0037 −.04, .97, −.21
Figure 9.3 Normal plot of raw residuals for the pressure/density data

Figure 9.4 Normal plot of standardized residuals for the pressure/density data
Inference for the slope β₁ begins with the observation that under model (9.4), the least squares slope b₁ has a normal distribution with

E b₁ = β₁

and

Var b₁ = σ²/Σ(x − x̄)²   (9.13)
Standardizing, the variable

Z = (b₁ − β₁) / (σ/√Σ(x − x̄)²)

has a standard normal distribution, and

T = (b₁ − β₁) / (sLF/√Σ(x − x̄)²)   (9.14)

has a tₙ₋₂ distribution. So the hypothesis

H₀: β₁ = #   (9.15)

can be tested using the

Test statistic for H₀: β₁ = #:
T = (b₁ − #) / (sLF/√Σ(x − x̄)²)   (9.16)

and a tₙ₋₂ reference distribution. More importantly, under the simple linear regression model (9.4), a two-sided confidence interval for β₁ can be made using endpoints

Confidence limits for the slope β₁:
b₁ ± t·sLF/√Σ(x − x̄)²   (9.17)

where the associated confidence is the probability assigned to the interval between −t and t by the tₙ₋₂ distribution. A one-sided interval is made in the usual way, based on one endpoint from formula (9.17).
Example 1 (continued)   In the context of the powder pressing study, Section 4.1 showed that the slope of the least squares line through the pressure/density data is

b₁ = .0000486̄ (g/cc)/psi

Then, for example, a 95% two-sided confidence interval for β₁ can be made using the .975 quantile of the t₁₃ distribution in formula (9.17). That is, one can use endpoints

.0000486̄ ± 2.160 (.0199/√120,000,000)

that is,

.0000486̄ ± .0000039

that is,

.0000448 (g/cc)/psi and .0000526 (g/cc)/psi
A confidence interval like this one for β₁ can be translated into a confidence interval for a difference in mean responses for two different values of x. According to model (9.4), two different values of x differing by Δx have mean responses differing by β₁Δx. One then simply multiplies endpoints of a confidence interval for β₁ by Δx to obtain a confidence interval for the difference in mean responses. For example, since 8,000 − 6,000 = 2,000, the difference between mean densities at the 8,000 psi and 6,000 psi levels has a 95% confidence interval with endpoints

2,000(.0000486̄) ± 2,000(.0000039)

that is,

.0895 g/cc and .1051 g/cc
Considerations in the selection of x values   Formula (9.17) allows a kind of precision to be attached to the slope of the least squares line. It is useful to consider how that precision is related to study characteristics that are potentially under an investigator's control. Notice that both formulas (9.13) and (9.17) indicate that the larger Σ(x − x̄)² is (i.e., the more spread out the xᵢ values are), the more precision b₁ offers as an estimator of the underlying slope β₁. Thus, as far as the estimation of β₁ is concerned, in studies where x represents the value of a system variable under the control of an experimenter, he or she should choose settings of x with the largest possible sample variance. (In fact, if one has n observations to spend and can choose values of x anywhere in some interval [a, b], taking n/2 of them at x = a and n/2 at x = b produces the best possible precision for estimating the slope β₁.)
However, this advice (to spread the xᵢ's out) must be taken with a grain of salt. The approximately linear relationship (9.4) may hold over only a limited range of possible x values. Choosing experimental values of x beyond the limits where it is reasonable to expect formula (9.4) to hold, hoping thereby to obtain a good estimate of the slope, risks building inferences on a model that does not describe the system.

Consider next the estimation of the mean response at a given value of x,

µ_y|x = β₀ + β₁x   (9.18)
The natural data-based approximation of the mean in formula (9.18) is the corresponding y value taken from the least squares line. The notation

Estimator of µ_y|x = β₀ + β₁x:
ŷ = b₀ + b₁x   (9.19)

will be used for this value on the least squares line. (This is in spite of the fact that the value in formula (9.19) may not be a fitted value in the sense that the phrase has most often been used to this point. x need not be equal to any of x₁, x₂, . . . , xₙ for both expressions (9.18) and (9.19) to make sense.) The simple linear regression model (9.4) leads to simple distributional properties for ŷ that then produce inference methods for µ_y|x.
Under model (9.4), ŷ has a normal distribution with

E ŷ = µ_y|x = β₀ + β₁x

and

Var ŷ = σ²(1/n + (x − x̄)²/Σ(x − x̄)²)   (9.20)

Thus

Z = (ŷ − µ_y|x) / (σ√(1/n + (x − x̄)²/Σ(x − x̄)²))

has a standard normal distribution. This in turn motivates the fact that

T = (ŷ − µ_y|x) / (sLF√(1/n + (x − x̄)²/Σ(x − x̄)²))   (9.21)

has a tₙ₋₂ distribution, so that the hypothesis

H₀: µ_y|x = #   (9.22)

can be tested using the

Test statistic for H₀: µ_y|x = #:
T = (ŷ − #) / (sLF√(1/n + (x − x̄)²/Σ(x − x̄)²))   (9.23)

and a tₙ₋₂ reference distribution. Further, under the simple linear regression model (9.4), a two-sided individual confidence interval for µ_y|x can be made using endpoints

Confidence limits for the mean response µ_y|x = β₀ + β₁x:
ŷ ± t·sLF√(1/n + (x − x̄)²/Σ(x − x̄)²)   (9.24)

where the associated confidence is the probability assigned to the interval between −t and t by the tₙ₋₂ distribution. A one-sided interval is made in the usual way based on one endpoint from formula (9.24).
Example 1 (continued)   Returning again to the pressure/density study, consider making individual 95% confidence intervals for the mean densities of cylinders produced first at 4,000 psi and then at 5,000 psi.

Treating first the 4,000 psi condition, the corresponding estimate of mean density is

ŷ = 2.5697 g/cc

Further, from formula (9.24) and the fact that the .975 quantile of the t₁₃ distribution is 2.160, a precision of plus-or-minus

2.160(.0199)√(1/15 + (4,000 − 6,000)²/120,000,000) = .0136 g/cc

can be attached to the 2.5697 g/cc figure. That is, endpoints of a two-sided 95% confidence interval for the mean density under the 4,000 psi condition are

2.5561 g/cc and 2.5833 g/cc

Similarly, under the 5,000 psi condition the estimated mean density is ŷ = 2.6183 g/cc, and a precision of plus-or-minus

2.160(.0199)√(1/15 + (5,000 − 6,000)²/120,000,000) = .0118 g/cc

can be attached to the 2.6183 g/cc figure. That is, endpoints of a two-sided 95% confidence interval for the mean density under the 5,000 psi condition are

2.6065 g/cc and 2.6301 g/cc

The reader should compare the plus-or-minus parts of the two confidence intervals found here. The interval for x = 5,000 psi is shorter and therefore more informative than the interval for x = 4,000 psi. The origin of this discrepancy should be clear, at least upon scrutiny of formula (9.24). For the students' data, x̄ = 6,000 psi; x = 5,000 psi is closer to x̄ than is x = 4,000 psi, so the (x − x̄)² term (and thus the interval length) is smaller for x = 5,000 psi than for x = 4,000 psi.
The precision of ŷ as an estimator of µ_y|x is thus again related to study characteristics that are potentially under the investigator's control. If there is an interval of values of x over which one wants good precision in estimating mean responses, it is only sensible to center one's data collection efforts in that interval.
Inference for the intercept, β₀   Proper use of displays (9.22), (9.23), and (9.24) gives inference methods for the parameter β₀ in model (9.4). β₀ is the y intercept of the linear relationship (9.18). So by setting x = 0 in displays (9.22), (9.23), and (9.24), tests and confidence intervals for β₀ are obtained. However, unless x = 0 is a feasible value for the input variable and the region where the linear relationship (9.18) is a sensible description of physical reality includes x = 0, inference for β₀ alone is rarely of practical interest.
The confidence intervals represented by formula (9.24) carry individual associated confidence levels. Section 7.3 showed that it is possible (using the P-R method) to give simultaneous confidence intervals for r possibly different means, µᵢ. This comes about essentially by appropriately increasing the t multiplier used in the plus-or-minus part of the formula for individual confidence limits. Here it is possible, by replacing t in formula (9.24) with a larger value, to give simultaneous confidence intervals for all means µ_y|x. That is, under model (9.4), simultaneous two-sided confidence intervals for all mean responses µ_y|x can be made using respective endpoints

Simultaneous two-sided confidence limits for all means µ_y|x:
(b₀ + b₁x) ± √(2f)·sLF√(1/n + (x − x̄)²/Σ(x − x̄)²)   (9.25)

where for positive f, the associated simultaneous confidence is the F₂,ₙ₋₂ probability assigned to the interval (0, f).
Of course, the practical meaning of the phrase “for all means µ y|x ” is more
like “for all mean responses in an interval where the simple linear regression model
(9.4) is a workable description of the relationship between x and y.” As is always
the case in curve- and surface-fitting situations, extrapolation outside of the range
of x values where one has data (and even to some extent interpolation inside that
range) is risky business. When it is done, it should be supported by subject-matter
expertise to the effect that it is justifiable.
It may be somewhat difficult to grasp the meaning of a simultaneous confidence
figure applicable to all possible intervals of the form (9.25). To this point, the
confidence levels considered have been for finite sets of intervals. Probably the
best way to understand the theoretically infinite set of intervals given by formula
(9.25) is as defining a region in the (x, y)-plane thought likely to contain the line
µ y|x = β0 + β1 x. Figure 9.5 is a sketch of a typical confidence region represented
by formula (9.25). There is a region indicated about the least squares line whose
vertical extent increases with distance from x̄ and which has the stated confidence
in covering the line describing the relationship between x and µ y|x .
Figure 9.5 A simultaneous confidence region for all mean responses: a band around the least squares line whose vertical extent increases with distance from x̄
Example 1 (continued)   It is instructive to compare what the P-R method of Section 7.3 and formula (9.25) give for simultaneous 95% confidence intervals for mean cylinder densities produced under the five conditions actually used by the students in their study.

First, formula (7.28) of Section 7.3 shows that with n − r = 15 − 5 = 10 degrees of freedom for sP and r = 5 conditions under study, 95% simultaneous two-sided confidence limits for all five mean densities are of the form

ȳᵢ ± 3.103 (sP/√nᵢ)

which in the present context is

ȳᵢ ± 3.103 (.0206/√3)

that is,

ȳᵢ ± .0369 g/cc

Table 9.5 shows the five intervals that result from the use of each of the two simultaneous confidence methods, together with individual intervals (9.24).
Two points are evident from Table 9.5. First, the intervals that result from
formula (9.25) are somewhat wider than the corresponding individual intervals
given by formula (9.24). But it is also clear that the use of the simple linear
regression model assumptions in preference to the more general one-way as-
sumptions of Chapter 7 can lead to shorter simultaneous confidence intervals and
correspondingly sharper real-world engineering inferences.
Next consider the problem of predicting an additional observation yₙ₊₁ at a given value of x. Under model (9.4), the variable

T = (yₙ₊₁ − ŷ) / (sLF√(1 + 1/n + (x − x̄)²/Σ(x − x̄)²))

has a tₙ₋₂ distribution. This fact leads in the usual way to the conclusion that under model (9.4) the two-sided interval with endpoints

Simple linear regression prediction limits for an additional y at a given x:
ŷ ± t·sLF√(1 + 1/n + (x − x̄)²/Σ(x − x̄)²)   (9.26)

can be used as a prediction interval for an additional observation at that x.

One-sided tolerance intervals for the y distribution at a given x are also available. These are of the form

A one-sided tolerance interval for the y distribution at x:
(ŷ − τsLF, ∞)   (9.27)

and

Another one-sided tolerance interval for the y distribution at x:
(−∞, ŷ + τsLF)   (9.28)

for an appropriate multiplier τ. In describing τ, the notation

The ratio of √(Var ŷ) to σ for simple linear regression:
A = √(1/n + (x − x̄)²/Σ(x − x̄)²)   (9.29)

will be adopted for the multiplier that is used (e.g., in formula (9.24)) to go from an estimate of σ to an estimate of the standard deviation of ŷ. Then, for approximate confidence level γ in locating a fraction p of the y distribution at x, the multiplier to use is

Multiplier to use in interval (9.27) or (9.28):
τ = [Q_z(p) + A·Q_z(γ)·√(1 + (1/(2(n − 2)))(Q_z(p)²/A² − Q_z(γ)²))] / [1 − Q_z(γ)²/(2(n − 2))]   (9.30)
Example 1 (continued)   To illustrate the use of prediction and tolerance interval formulas in the simple linear regression context, consider a 90% lower prediction bound for a single additional density in powder pressing, if a pressure of 4,000 psi is employed. Then, additionally consider finding a 95% lower tolerance bound for 90% of many additional cylinder densities if that pressure is used.

Treating first the prediction problem, formula (9.26) shows that an appropriate prediction bound is

2.5697 − 1.350(.0199)√(1 + 1/15 + (4,000 − 6,000)²/120,000,000) = 2.5697 − .0282

that is,

2.5415 g/cc

If, rather than predicting a single additional density for x = 4,000 psi, it is of interest to locate 90% of additional densities corresponding to a 4,000 psi pressure, a tolerance bound is in order. First use formula (9.29) and find that

A = √(1/15 + (4,000 − 6,000)²/120,000,000) = .3162

so that, from formula (9.30),

τ = [1.282 + (.3162)(1.645)√(1 + (1/(2(15 − 2)))((1.282)²/(.3162)² − (1.645)²))] / [1 − (1.645)²/(2(15 − 2))] = 2.149

Then the desired 95% lower tolerance bound for 90% of additional densities at 4,000 psi is

2.5697 − 2.149(.0199)

that is,

2.5269 g/cc
Cautions about prediction and tolerance intervals in regression   The fact that curve-fitting facilitates interpolation and extrapolation makes it imperative that care be taken in the interpretation of prediction and tolerance intervals. All of the warnings regarding the interpretation of prediction and tolerance intervals raised in Section 6.6 apply equally to the present situation. But the new element here (that formally, the intervals can be made for values of x where one has absolutely no data) requires additional caution. If one is to use formulas (9.26), (9.27), and (9.28) at a value of x not represented among x₁, x₂, . . . , xₙ, it must be plausible that model (9.4) not only describes system behavior at those x values where one has data, but at the additional value of x as well. And even when this is "plausible," the application of formulas (9.26), (9.27), and (9.28) to new values of x should be treated with a good dose of care. Should one's (unverified) judgment prove wrong, the nominal confidence level has unknown practical relevance.
It is not obvious, but the difference referred to in Definition 2 in general has the form of a sum of squares of appropriate quantities. In the present context of fitting a line by least squares,

SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²
Without using the particular terminology of Definition 2, this text has already made fairly extensive use of SSR = SSTot − SSE. A review of Definition 3 in Chapter 4 (page 130), and Definitions 4 and 6 in Chapter 7 (page 484) will show that in curve- and surface-fitting contexts,

The coefficient of determination for simple linear regression in sum of squares notation:
R² = SSR/SSTot   (9.31)

That is, SSR is the numerator of the coefficient of determination defined first in Definition 3 (Chapter 4). It is commonly thought of as the part of the raw variability in y that is accounted for in the curve- or surface-fitting process.
SSR and SSE not only provide an appealing partition of SSTot but also form the raw material for an F test of

H₀: β₁ = 0   (9.32)

versus

Hₐ: β₁ ≠ 0   (9.33)

Under model (9.4), hypothesis (9.32) can be tested using the statistic

F = (SSR/1) / (SSE/(n − 2))   (9.34)

and an F₁,ₙ₋₂ reference distribution, where large observed values of the test statistic constitute evidence against H₀.
Earlier in this section, the general null hypothesis H0 : β1 = # was tested using
the t statistic (9.16). It is thus reasonable to consider the relationship of the F
test indicated in displays (9.32), (9.33), and (9.34) to the earlier t test. The null
hypothesis H0 : β1 = 0 is a special form of hypothesis (9.15), H0 : β1 = #. It is the
most frequently tested version of hypothesis (9.15) because it can (within limits)
be interpreted as the null hypothesis that mean response doesn’t depend on x.
This is because when hypothesis (9.32) is true within the simple linear regression
model (9.4), µ_y|x = β₀ + 0·x = β₀, which doesn't depend on x. (Actually, a better
interpretation of a test of hypothesis (9.32) is as a test of whether a linear term in
x adds significantly to one’s ability to model the response y after accounting for an
overall mean response.)
If one then considers testing hypotheses (9.32) and (9.33), it might appear that
the # = 0 version of formula (9.16) and formula (9.34) represent two different testing
methods. But they are equivalent. The statistic (9.34) turns out to be the square of
the # = 0 version of statistic (9.16), and (two-sided) observed significance levels
based on statistic (9.16) and the tn−2 distribution turn out to be the same as observed
significance levels based on statistic (9.34) and the F1,n−2 distribution. So, from one
point of view, the F test specified here is redundant, given the earlier discussion. But
it is introduced here because of its relationship to the ANOVA ideas of Section 7.4,
and because it has an important natural generalization to more complex curve- and
surface-fitting contexts. (This generalization is discussed in Section 9.2 and cannot
be made equivalent to a t test.)
The partition of SSTot into its parts, SSR and SSE, and the calculation of the
statistic (9.34) can be organized in ANOVA table format. Table 9.6 shows the general
format that this book will use in the simple linear regression context.
Table 9.6
General Form of the ANOVA Table for Simple Linear Regression

Source       SS      df       MS                  F
Regression   SSR     1        MSR = SSR/1         MSR/MSE
Error        SSE     n − 2    MSE = SSE/(n − 2)
Total        SSTot   n − 1
Example 1 (continued)   Recall again from the discussion of the pressure/density example in Section 4.1 that

SSTot = Σ(y − ȳ)² = .289366

Thus,

SSR = SSTot − SSE = .289366 − .005153 = .284213

and the specific version of Table 9.6 for the present example is given as Table 9.7.
The observed value of F in Table 9.7, 717.06, is far beyond the upper percentage points of the F₁,₁₃ distribution, and one has very strong evidence against the possibility that β₁ = 0. A linear term in Pressure is an important contributor to one's ability to describe the behavior of Cylinder Density. This is, of course, completely consistent with the earlier interval-oriented analysis that produced 95% confidence limits for β₁ of .0000448 (g/cc)/psi and .0000526 (g/cc)/psi.
Table 9.7
ANOVA Table for the Pressure/Density Data
Source DF SS MS F P
Regression 1 0.28421 0.28421 717.06 0.000
Residual Error 13 0.00515 0.00040
Total 14 0.28937
Among the quantities MINITAB reports under "Predicted Values" is an estimated standard deviation of ŷ, which in the simple linear regression context is

"StDev Fit" = sLF√(1/n + (x − x̄)²/Σ(x − x̄)²)
Printout 1 (not reproduced in full here) lists the fitted value, residual, and standardized residual corresponding to the n data points. MINITAB's regression program has an option that allows one to request fitted values, confidence intervals for µ_y|x, and prediction intervals for x values of interest, and Printout 1 finishes with this information for the value x = 5,000.

The reader is encouraged to compare the information on Printout 1 with the various results obtained in Example 1 and verify that everything on the printout (except the "adjusted R²" value) is indeed familiar.
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation of Exercise 3 of Section 4.1 and the polymer molecular weight study of R. Harris.
(a) Find sLF for these data. What does this intend to measure in the context of the engineering problem?
(b) Plot both residuals versus x and the standardized residuals versus x. How much difference is there in the appearance of these two plots?
(c) Give a 90% two-sided confidence interval for the increase in mean average molecular weight that accompanies a 1°C increase in temperature here.
(d) Give individual 90% two-sided confidence intervals for the mean average molecular weight at 212°C and also at 250°C.
(e) Give simultaneous 90% two-sided confidence intervals for the two means indicated in part (d).
(f) Give 90% lower prediction bounds for the next average molecular weight, first at 212°C and then at 250°C.
(g) Give approximately 95% lower tolerance bounds for 90% of average molecular weights, first at 212°C and then at 250°C.
(h) Make an ANOVA table for testing H₀: β₁ = 0 in the simple linear regression model. What is the p-value here for a two-sided test of this hypothesis?

2. Return to the situation of Chapter Exercise 1 of Chapter 4 and the concrete strength study of Nicholson and Bartle.
(a) Find estimates of the parameters β₀, β₁, and σ in the simple linear regression model y = β₀ + β₁x + ε. How does your estimate of σ based on the simple linear regression model compare with the pooled sample standard deviation, sP?
(b) Compute residuals and standardized residuals. Plot both against x and ŷ and normal-plot them. How much do the appearances of the plots of the standardized residuals differ from those of the raw residuals?
(c) Make a 90% two-sided confidence interval for the increase in mean compressive strength that accompanies a .1 increase in the water/cement ratio. (This is .1β₁.)
(d) Test the hypothesis that the mean compressive strength doesn't depend on the water/cement ratio. What is the p-value?
(e) Make a 95% two-sided confidence interval for the mean strength of specimens with the water/cement ratio .5 (based on the simple linear regression model).
(f) Make a 95% two-sided prediction interval for the strength of an additional specimen with the water/cement ratio .5 (based on the simple linear regression model).
(g) Make an approximately 95% lower tolerance bound for the strengths of 90% of additional specimens with the water/cement ratio .5 (based on the simple linear regression model).
9.2 Inference Methods for General Least Squares Curve- and Surface-Fitting (Multiple Linear Regression)
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Suppose now that an approximate relationship

y ≈ β₀ + β₁x₁ + β₂x₂ + · · · + βₖxₖ   (9.35)

holds. As in Section 4.2, the form (9.35) not only covers those circumstances where x₁, x₂, . . . , xₖ all represent physically different variables but also describes contexts where some of the variables are functions of others. For example, the relationship

y ≈ β₀ + β₁x₁ + β₂x₁²

is the k = 2 version of form (9.35) in which x₂ = x₁². As in Section 9.1, the model assumptions will be that there are underlying normal distributions for the response y with a common variance σ² but means µ_y|x₁,x₂,...,xₖ that change linearly with each of x₁, x₂, . . . , xₖ. In symbols, it is typical to write that for i = 1, 2, . . . , n,

The (normal) multiple linear regression model:
yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + · · · + βₖxₖᵢ + εᵢ   (9.36)

where the εᵢ are (unobservable) iid normal (0, σ²) random variables, the x₁ᵢ, x₂ᵢ, . . . , xₖᵢ are known constants, and β₀, β₁, β₂, . . . , βₖ and σ² are unknown model parameters (fixed constants). This is the specialization of the general one-way model

yᵢⱼ = µᵢ + εᵢⱼ

in which the (previously unrestricted) means satisfy

µ_y|x₁,x₂,...,xₖ = β₀ + β₁x₁ + β₂x₂ + · · · + βₖxₖ   (9.37)
Figure 9.6 A pictorial representation of the multiple linear regression model for k = 2: the surface defined by µ_y|x₁,x₂ = β₀ + β₁x₁ + β₂x₂, with distributions of y for 2 different (x₁, x₂) pairs
in the data, it provides the basis for inferences involving limited interpolation and extrapolation on the system variables x₁, x₂, . . . , xₖ.

Section 4.2 contains a discussion of using statistical software in the least squares fitting of the approximate relationship (9.35) to a set of (x₁, x₂, . . . , xₖ, y) data. That discussion can be thought of as covering the fitting and use of residuals in model checking for the multiple linear regression model (9.36). Section 4.2 did not produce explicit formulas for b₀, b₁, b₂, . . . , bₖ, the (least squares) estimates of β₀, β₁, β₂, . . . , βₖ. Instead it relied on the software to produce those estimates. Of course, once one has estimates of the β's, corresponding fitted values immediately become

Fitted values for the multiple linear regression model:
ŷᵢ = b₀ + b₁x₁ᵢ + b₂x₂ᵢ + · · · + bₖxₖᵢ   (9.38)

with residuals

Residuals for the multiple linear regression model:
eᵢ = yᵢ − ŷᵢ   (9.39)
Definition 3   For a set of n data vectors (x₁₁, x₂₁, . . . , xₖ₁, y₁), (x₁₂, x₂₂, . . . , xₖ₂, y₂), . . . , (x₁ₙ, x₂ₙ, . . . , xₖₙ, yₙ) where least squares fitting produces fitted values given by formula (9.38) and residuals (9.39),

s²SF = (1/(n − k − 1)) Σ(y − ŷ)² = (1/(n − k − 1)) Σe²   (9.40)
Then, for the stack loss data of Example 5 of Chapter 4 (fitted with the equation of model (9.41)), formula (9.40) gives

s²SF = (1/(17 − 3 − 1))((.053)² + (−.125)² + · · · + (.265)² + (2.343)²) = 1.26

so a corresponding estimate of σ is

sSF = √1.26 = 1.125
(The units of y—and therefore sSF —are .1% of incoming ammonia escaping
unabsorbed.)
In routine practice it is a waste to do even these calculations, since multiple
regression programs typically output sSF as part of their analysis. The reader
should take time to locate the value sSF = 1.125 on Printout 2. If one accepts
the relevance of model (9.41), for fixed values of airflow and inlet temperature
(and therefore airflow squared), the standard deviation associated with many
days’ stack losses produced under those conditions would then be expected to be
approximately .1125%.
Printout 2 Multiple Linear Regression for the Stack Loss Data (Example 2)
Regression Analysis
Analysis of Variance
Source DF SS MS F P
Regression 3 799.80 266.60 210.81 0.000
Residual Error 13 16.44 1.26
Total 16 816.24
Source DF Seq SS
x1 1 775.48
x2 1 18.49
x1**2 1 5.82
Predicted Values
Example 2 (continued)   Among the 17 data points in Table 4.8, there are only 12 different airflow/inlet temperature combinations (and therefore 12 different (x₁, x₂, x₁²) vectors). The original data can be thought of as organized into r = 12 separate samples, one for each different (x₁, x₂, x₁²) vector, and there is thus an estimate of σ that doesn't depend for its validity on the appropriateness of the assumption that µ_y|x₁,x₂ = β₀ + β₁x₁ + β₂x₂ + β₃x₁². That is, sP can be computed and compared to sSF as a check on the appropriateness of model (9.41). Table 9.8 organizes the calculation of that pooled estimate of σ.
Table 9.8
Twelve Sample Means and Four Sample Variances for the Stack Loss Data

x₁, Air Flow   x₂, Inlet Temperature   y, Stack Loss   ȳ      s²
50             18                      8, 7            7.5    .5
50             19                      8, 8            8.0    0.0
50             20                      9               9.0    —
56             20                      15              15.0   —
58             17                      13              13.0   —
58             18                      14, 14, 11      13.0   3.0
58             19                      12              12.0   —
58             23                      15              15.0   —
62             22                      18              18.0   —
62             23                      18              18.0   —
62             24                      19, 20          19.5   .5
80             27                      37              37.0   —
Then

s²P = (1/(17 − 12))((2 − 1)(.5) + (2 − 1)(0.0) + (3 − 1)(3.0) + (2 − 1)(.5)) = 1.40

so

sP = √1.40 = 1.183
The fact that sSF = 1.125 and sP = 1.183 are in substantial agreement is consistent with the work in Example 5 of Chapter 4, which found the fitted equation to provide an effective summary of these data.
sSF is basic to all of formal statistical inference based on the multiple linear regression model. But before using it to make statistical intervals and do significance testing, note also that it is useful for producing standardized residuals for the multiple linear regression model. That is, it is possible to find positive constants a₁, a₂, . . . , aₙ (which are each complicated functions of all of x₁₁, x₂₁, . . . , xₖ₁, x₁₂, x₂₂, . . . , xₖ₂, . . . , x₁ₙ, x₂ₙ, . . . , xₖₙ) such that the ith residual eᵢ = yᵢ − ŷᵢ has

Var(yᵢ − ŷᵢ) = aᵢσ²

Then, recalling Definition 2 in Chapter 7 (page 458), corresponding to the data point (x₁ᵢ, x₂ᵢ, . . . , xₖᵢ, yᵢ) is the standardized residual for multiple linear regression

Standardized residuals for multiple linear regression:
eᵢ* = eᵢ / (sSF√aᵢ)   (9.42)

It is not possible to include here a simple formula for the aᵢ that are needed to compute standardized residuals. (They are of interest only as building blocks in formula (9.42) anyway.) But it is easy to read the standardized residuals (9.42) off a typical multiple regression printout and to plot them in the usual ways as means of checking the apparent appropriateness of a candidate version of model (9.36) fit to a set of n data points (x₁, x₂, . . . , xₖ, y).
Figure 9.7 Normal plots of residuals and standardized residuals for the stack loss data (Example 2)
Consider next inference for the individual coefficients βₗ in the relationship

µ_y|x₁,x₂,...,xₖ = β₀ + β₁x₁ + β₂x₂ + · · · + βₖxₖ

Under model (9.36), each least squares estimator bₗ has a normal distribution with

E bₗ = βₗ

and

Var bₗ = dₗσ²

for a positive constant dₗ depending on the x values in the data set.
Estimated standard deviation of bₗ:
sSF√dₗ   (9.43)

The variable

T = (bₗ − βₗ) / (sSF√dₗ)   (9.44)

then has a tₙ₋ₖ₋₁ distribution, so the hypothesis

H₀: βₗ = #   (9.45)

can be tested using the

Test statistic for H₀: βₗ = #:
T = (bₗ − #) / (sSF√dₗ)   (9.46)

and a tₙ₋ₖ₋₁ reference distribution. More importantly, under the multiple linear regression model (9.36), a two-sided individual confidence interval for βₗ can be made using endpoints

Confidence limits for βₗ:
bₗ ± t·sSF√dₗ   (9.47)

where the associated confidence is the probability assigned to the interval between −t and t by the tₙ₋ₖ₋₁ distribution. Appropriate use of only one of the endpoints (9.47) gives a one-sided interval for βₗ.
Example 2 (continued)   Looking again at Printout 2 (see page 679), note that MINITAB's multiple regression output includes a table of estimated coefficients (bₗ) and (estimated) standard deviations (sSF√dₗ). These are collected in Table 9.9.
Then since the upper .05 point of the t₁₃ distribution is 1.771, from formula (9.47) a two-sided 90% confidence interval for β₂ in model (9.41) has endpoints

.5278 ± 1.771(.1501)

that is,

.262 and .794

This interval establishes that there is an increase in mean stack loss y with increased inlet temperature x₂ (the interval contains only positive values). It further gives a way of assessing the likely impact on y of various changes in x₂. For example, if x₁ (and therefore x₃ = x₁²) is held constant but x₂ is increased by 2°, one can anticipate an increase in mean stack loss of between

2(.262) = .52 and 2(.794) = 1.59
As a second example of the use of formula (9.47), note that a 90% two-sided confidence interval for β₃ has endpoints

.006818 ± 1.771(.003178)

that is,

.00119 and .01245

β₃ controls the amount and direction of curvature (in the variable x₁) possessed by the surface specified by µ_y|x₁,x₂ = β₀ + β₁x₁ + β₂x₂ + β₃x₁². Since the interval contains only positive values, it shows that at the 90% confidence level, there is some important concave-up curvature in the airflow variable needed to describe the stack loss variable. This is consistent with the picture of fitted mean response given previously in Figure 4.15 (see page 155).
However, check that if 95% confidence is used in the calculation of the two-
sided interval for β3 , the resulting confidence interval contains values on both
sides of 0. If this higher level of confidence is needed, the data in hand are not
adequate to establish definitively the nature of any curvature in mean stack loss
as a function of airflow. Any real curvature appears weak enough in comparison
to the basic background variation that more data are needed to decide whether
the surface is concave up, linear, or concave down in the variable x 1 .
Very often, multiple regression programs output not only the estimated standard deviations of fitted coefficients (9.43) but also the ratios

t = bₗ / (sSF√dₗ)

and corresponding (two-sided) p-values for testing the hypotheses

H₀: βₗ = 0

Review Printout 2 and note that, for example, the two-sided p-value for testing H₀: β₃ = 0 in model (9.41) is slightly larger than .05. This is completely consistent with the preceding discussion regarding the interpretation of interval estimates of β₃.
The notation

Estimator of µ_y|x₁,x₂,...,xₖ:
ŷ = b₀ + b₁x₁ + b₂x₂ + · · · + bₖxₖ   (9.48)

will here be used for the value produced by the least squares equation when a particular set of numbers x₁, x₂, . . . , xₖ is plugged into it. (ŷ may not be a fitted value in the strict sense of the phrase, as the vector (x₁, x₂, . . . , xₖ) may not match any data vector (x₁ᵢ, x₂ᵢ, . . . , xₖᵢ) used to produce the least squares coefficients b₀, b₁, . . . , bₖ.) As it turns out, the multiple linear regression model (9.36) leads to simple distributional properties for ŷ, which then produce inference methods for µ_y|x₁,x₂,...,xₖ.
Under model (9.36), ŷ has a normal distribution with

E ŷ = µ_y|x₁,x₂,...,xₖ = β₀ + β₁x₁ + · · · + βₖxₖ

and

Var ŷ = σ²A²   (9.49)

where A = √(Var ŷ)/σ is a positive constant depending on the x's in the data set and on the particular vector (x₁, x₂, . . . , xₖ) of interest. The variable

T = (ŷ − µ_y|x₁,x₂,...,xₖ) / (sSF·A)

has a tₙ₋ₖ₋₁ distribution, so the hypothesis

H₀: µ_y|x₁,x₂,...,xₖ = #   (9.51)

can be tested using the

Test statistic for H₀: µ_y|x₁,x₂,...,xₖ = #:
T = (ŷ − #) / (sSF·A)   (9.52)

and a tₙ₋ₖ₋₁ reference distribution. Further, under the multiple linear regression model (9.36), a two-sided confidence interval for µ_y|x₁,x₂,...,xₖ can be made using endpoints

Confidence limits for the mean response µ_y|x₁,x₂,...,xₖ:
ŷ ± t·sSF·A   (9.53)

where the associated confidence is the probability assigned to the interval between −t and t by the tₙ₋ₖ₋₁ distribution. One-sided intervals based on formula (9.53) are made in the usual way.
Finding the factor A   The practical obstacle to be overcome in the use of these methods is the computation of A. Although it is not possible to give a simple formula for A, most multiple regression programs provide A for (x₁, x₂, . . . , xₖ) vectors of interest. MINITAB, for example, will fairly automatically produce values of sSF·A corresponding to each data point (x₁ᵢ, x₂ᵢ, . . . , xₖᵢ, yᵢ), labeled as (the estimated) standard deviation (of the) fit. And an option makes it possible to obtain similar information for any user-specified choice of (x₁, x₂, . . . , xₖ). (Division of this by sSF then produces A.)
Example 2 (continued)   Consider the problem of estimating the mean stack loss if the nitrogen plant of Example 5 in Chapter 4 is operated consistently with x₁ = 58 and x₂ = 19. (Notice that this means that x₃ = x₁² = 3,364 is involved.) Now the conditions x₁ = 58, x₂ = 19, and x₃ = 3,364 match perfectly those of data point number 11 on Printout 2 (see page 679). Thus, ŷ and sSF·A for these conditions may be read directly from the printout as 13.546 and .378, respectively. Then, for example, from formula (9.53), a 90% two-sided confidence interval for the mean stack loss corresponding to an airflow of 58 and water inlet temperature of 19 has endpoints

13.546 ± 1.771(.378)

that is,

12.877 (.1% loss) and 14.215 (.1% loss)

Similarly, for the conditions x₁ = 60 and x₂ = 20 (which match no data point in the original set), MINITAB produces ŷ = 15.544 and sSF·A = .383, so that a 90% two-sided confidence interval for the corresponding mean stack loss has endpoints

15.544 ± 1.771(.383)

that is,

14.866 (.1% loss) and 16.222 (.1% loss)

(Of course, endpoints of a 95% interval can be read directly from the printout.)
Example 2 (continued)   It is impossible to overemphasize the fact that the preceding two intervals are dependent for their practical relevance on that of model (9.41) for not only those (x₁, x₂) pairs in the original data but (in the second case) also for the x₁ = 60 and x₂ = 20 set of conditions. Formulas like (9.53) always allow for imprecision due
to statistical fluctuations/background noise in the data. They do not, however,
allow for discrepancies related to the application of a model in a regime over
which it is not appropriate. Formula (9.53) is an important and useful formula.
But it should be used thoughtfully, with no expectation that it will magically do
more than help quantify the precision provided by the data in the context of a
particular set of model assumptions.
It is also possible to make simultaneous confidence intervals for all mean responses. Under model (9.36), these can be made using respective endpoints

Simultaneous two-sided confidence limits for all means µ_y|x₁,x₂,...,xₖ:
(b₀ + b₁x₁ + · · · + bₖxₖ) ± √((k + 1)f)·sSF·A   (9.54)

where for positive f, the associated confidence is the Fₖ₊₁,ₙ₋ₖ₋₁ probability assigned to the interval (0, f). Formula (9.54) is related to formula (9.53) through the replacement of the multiplier t by the (larger for a given nominal confidence) multiplier √((k + 1)f). When it is applied only to (x₁, x₂, . . . , xₖ) vectors found in the original n data points, formula (9.54) is an alternative to the P-R method of simultaneous intervals for means, appropriate to surface-fitting problems. When the multiple linear regression model is indeed appropriate, formula (9.54) will usually give shorter simultaneous intervals than the P-R method.
Example 2 (continued)   For making simultaneous 90% confidence intervals for the mean stack losses at the 12 different sets of plant conditions represented in the original data set, one can use formula (9.54) with k = 3, f = 2.43 (the .9 quantile of the F₄,₁₃ distribution) and the ŷ and corresponding sSF·A values appearing on Printout 2 (see page 679). For example, considering the x₁ = 80 and x₂ = 27 conditions of observation 1 on the printout, sSF·A = 1.121 and one of the simultaneous 90% confidence intervals associated with these conditions has endpoints

36.947 ± √((3 + 1)(2.43)) (1.121)

or

33.452 (.1% nitrogen loss) and 40.442 (.1% nitrogen loss)
Prediction and tolerance intervals are also available. Analogous to the simple linear regression case, under model (9.36) two-sided prediction limits for a single additional y at a given set of conditions (x₁, x₂, . . . , xₖ) are ŷ ± t·sSF√(1 + A²), which may equivalently be written as

An alternative formula for prediction limits:
ŷ ± t√(s²SF + (sSF·A)²)   (9.56)

where the associated confidence is the tₙ₋ₖ₋₁ probability assigned to the interval between −t and t. One-sided tolerance bounds for the y distribution at a given set of conditions employ the

Multiplier to use in making tolerance intervals in multiple regression:
τ = [Q_z(p) + A·Q_z(γ)·√(1 + (1/(2(n − k − 1)))(Q_z(p)²/A² − Q_z(γ)²))] / [1 − Q_z(γ)²/(2(n − k − 1))]   (9.57)

giving, for example, a lower tolerance bound for a fraction p of additional y's of the form

(ŷ − τsSF, ∞)   (9.58)
Example 2 (continued)   Returning to the nitrogen plant example, consider first the calculation of a 90% lower prediction bound for a single additional stack loss y, if airflow of x₁ = 58 and water inlet temperature of x₂ = 19 are used. Then consider also a 95% lower tolerance bound for 90% of many additional stack loss values if the plant is run under those conditions.

Treating the prediction interval problem, recall that for x₁ = 58 and x₂ = 19, ŷ = 13.546 and sSF·A = .378. Since sSF = 1.125 and the .9 quantile of the t₁₃ distribution is 1.350, formula (9.56) shows that the desired 90% lower prediction bound for an additional stack loss under such plant operating conditions is

13.546 − 1.350√((1.125)² + (.378)²) = 13.546 − 1.602

that is,

11.944 (.1% loss)

To locate 90% of many additional stack losses with 95% confidence, rather than predict a single additional stack loss, expression (9.57) is the place to begin. Note that for x₁ = 58 and x₂ = 19,

A = .378/1.125 = .336

so that, using Q_z(.9) = 1.282 and Q_z(.95) = 1.645 in formula (9.57), τ = 2.175. So finally, a 95% lower tolerance bound for 90% of stack losses produced under operating conditions of x₁ = 58 and x₂ = 19 is, via display (9.58),

13.546 − 2.175(1.125)

that is,

approximately 11.1 (.1% loss)
The warnings raised in the previous section concerning prediction and tolerance
intervals in simple regression all apply equally to the present case of multiple
regression. So do points similar to those made in Example 2 (page 688) in reference
to confidence intervals for the mean system response. Although they are extremely
useful engineering tools, statistical intervals are never any better than the models on
which they are based.
Further, under model (9.36), these sums of squares (SSTot, SSE, and SSR) form the basis of an F test of the hypothesis

H₀: β₁ = β₂ = · · · = βₖ = 0   (9.60)

versus

Hₐ: not H₀   (9.61)

using the statistic

F = (SSR/k) / (SSE/(n − k − 1))   (9.62)

and an Fₖ,ₙ₋ₖ₋₁ reference distribution, where large observed values of the test statistic constitute evidence against H₀. (The denominator of statistic (9.62) is another way of writing s²SF.)

Hypothesis (9.60) in the context of the multiple linear regression model implies that the mean response doesn't depend on any of the process variables x₁, x₂, . . . , xₖ. That is, if all of β₁ through βₖ are 0, model statement (9.36) reduces to

yᵢ = β₀ + εᵢ
Interpreting a test of H₀: β₁ = β₂ = · · · = βₖ = 0   So a test of hypothesis (9.60) is often interpreted as a test of whether the mean response is related to any of the input variables under consideration. The calculations leading to statistic (9.62) are most often organized in a table quite similar to the one discussed in Section 9.1 for testing H₀: β₁ = 0 in simple linear regression. The general form of that table is given as Table 9.10.

Table 9.10
General Form of the ANOVA Table for Testing H₀: β₁ = β₂ = · · · = βₖ = 0 in Multiple Regression

Source       SS      df           MS                      F
Regression   SSR     k            MSR = SSR/k             MSR/MSE
Error        SSE     n − k − 1    MSE = SSE/(n − k − 1)
Total        SSTot   n − 1
Example 2 (continued)   Once again turning to the analysis of the nitrogen plant data under the model yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₁ᵢ² + εᵢ, consider testing H₀: β₁ = β₂ = β₃ = 0—that is, mean stack loss doesn't depend on airflow (or its square) or water inlet temperature. Printout 2 (see page 679) includes an ANOVA table for testing this hypothesis, which is essentially reproduced here as Table 9.11.

From Table 9.11, the observed value of the F statistic is 210.81, which is to be compared to F₃,₁₃ quantiles in order to produce an observed level of significance. As indicated in Printout 2, the F₃,₁₃ probability to the right of the value 210.81 is 0 (to three decimal places). This is definitive evidence that not all of β₁, β₂, and β₃ can be 0. Taken as a group, the variables x₁, x₂, and x₃ = x₁² definitely enhance one's ability to predict stack loss.
Table 9.11
ANOVA Table for Testing H₀: β₁ = β₂ = β₃ = 0 for the Stack Loss Data

Source       SS       df    MS       F
Regression   799.80    3    266.60   210.81
Error         16.44   13      1.26
Total        816.24   16
Note also that the value of the coefficient of determination here can be calculated using sums of squares given in Table 9.11 as

R² = SSR/SSTot = 799.80/816.24 = .980

This is the value for R² advertised long ago in Example 5 in Chapter 4. Also, the error mean square, MSE = 1.26, is (as expected) exactly the value of s²SF found earlier.
It is a matter of simple algebra to verify that R² and the F statistic (9.62) are equivalent in the sense that

An expression for the F statistic (9.62) in terms of R²:
F = (R²/k) / ((1 − R²)/(n − k − 1))   (9.63)
Suppose that there are two different regression models for describing a data set—the first of the usual form (9.36) for k input variables x₁, x₂, . . . , xₖ, and the second being a specialization of the first where some p of the coefficients β (say, βₗ₁, βₗ₂, . . . , βₗₚ) are all 0 (i.e., a specialization not involving input variables xₗ₁, xₗ₂, . . . , xₗₚ). The first of these models will be called the full regression model and the second a reduced regression model. When one informally compares R² values for two such models, the comparison is essentially between SSR values, since the two R² values share the same denominator, SSTot. The two SSR values can be used to produce an observed level of significance for the comparison.
Under the full model (9.36), the hypothesis

H₀: βₗ₁ = βₗ₂ = · · · = βₗₚ = 0   (9.64)

can be tested against

Hₐ: not H₀   (9.65)

using the statistic

F = ((SSRf − SSRr)/p) / (SSEf/(n − k − 1))   (9.66)

and an Fₚ,ₙ₋ₖ₋₁ reference distribution, where large observed values of the test statistic constitute evidence against H₀ in favor of Hₐ. In expression (9.66), the "f" and "r" subscripts refer to the full and reduced regressions. The calculation of statistic (9.66) can be facilitated by expanding the basic ANOVA table for the full model (Table 9.10). Table 9.12 shows one form this can take.
Table 9.12
Expanded ANOVA Table for Testing H₀: βₗ₁ = βₗ₂ = · · · = βₗₚ = 0 in Multiple Regression
Example 2 (continued)   In the nitrogen plant example, consider the comparison of the two possible descriptions of stack loss

y ≈ β₀ + β₁x₁   (9.67)

and

y ≈ β₀ + β₁x₁ + β₂x₂ + β₃x₁²   (9.68)

(the description of stack loss that has been used throughout this section). Although a printout won't be included here to show it, it is a simple matter to verify that the fitting of expression (9.67) to the nitrogen plant data produces SSR = 775.48 and therefore R² = .950. Fitting expression (9.68), on the other hand, gives SSR = 799.80 and R² = .980. Since expression (9.67) is the specialization/reduction of expression (9.68) obtained by dropping the last p = 2 terms, the comparison of these two SSR (or R²) values can be formalized with a p-value. A test of

H₀: β₂ = β₃ = 0

can be made in the (full) model (9.68). Table 9.13 organizes the calculation of the observed value of the statistic (9.66) for this problem. That is,

f = ((799.80 − 775.48)/2) / (16.44/13) = 9.7

When compared with tabled F₂,₁₃ percentage points, the observed value of 9.7 is seen to produce a p-value between .01 and .001. There is strong evidence in the nitrogen plant data that an explanation of mean response in terms of expression (9.68) (pictured, for example, in Figure 4.15) is superior to one in terms of expression (9.67) (which could be pictured as a single linear mean response in x₁ for all x₂).
Table 9.13
ANOVA Table for Testing H0 : β2 = β3 = 0 in Model (9.68)
for the Stack Loss Data
Interpreting full and reduced R²'s and the F test   Statistic (9.66) can be rewritten in terms of the full and reduced model coefficients of determination as

F = ((R²f − R²r)/p) / ((1 − R²f)/(n − k − 1))   (9.69)

so that the test of hypothesis (9.64) is indeed a way of attaching a p-value to the comparison of two R²'s. However, just as was remarked earlier concerning the test of hypothesis (9.60), it is the R²'s themselves that indicate how much additional variation a full model accounts for over a reduced model. The observed F value or associated p-value measures the extent to which that increase is distinguishable from background noise.
p tests that single coefficients are 0 versus a test that p coefficients are all 0   To conclude this section, something needs to be said about the relationship between the tests of hypotheses (9.45) (with # = 0), mentioned earlier, and the tests of hypothesis (9.64) based on the F statistic (9.66). When p = 1 (the full model contains only one more term than the reduced model), observed levels of significance based on statistic (9.66) are in fact equal to two-sided observed levels of significance based on # = 0 versions of statistic (9.46). But for cases where p ≥ 2, the tests of the hypotheses that individual β's are 0 (one at a time) are not an adequate substitute for the tests of hypothesis (9.64). For example, in the full model

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε   (9.70)

testing

H₀: β₂ = 0   (9.71)

and testing

H₀: β₃ = 0   (9.72)

one at a time is not equivalent to testing

H₀: β₂ = β₃ = 0   (9.73)
This fact may at first seem paradoxical. But should the variables x₂ and x₃ be reasonably highly correlated in the data set, it is possible to get large p-values for tests of both hypotheses (9.71) and (9.72) and yet a tiny p-value for a test of hypothesis (9.73). The message carried by such an outcome is that (due to the fact that the variables x₂ and x₃ appear in the data set to be more or less equivalent) in the presence of x₁ and x₂, x₃ is not needed to model y. And in the presence of x₁ and x₃, x₂ is not needed to model y. But one or the other of the two variables x₂ and x₃ is needed to help model y even in the presence of x₁. So, the F test of hypothesis (9.64) is more than just a fancy version of several tests of hypotheses H₀: βₗ = 0. It is an important addition to an engineer's curve- and surface-fitting tool kit.
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation of Chapter Exercise 2 of Chapter 4 and the carburetion study of Griffith and Tesdall. Consider an analysis of these data based on the model y = β₀ + β₁x + β₂x² + ε.
(a) Find sSF for these data. What does this intend to measure in the context of the engineering problem?
(b) Plot both residuals versus x and the standardized residuals versus x. How much difference is there in the appearance of these two plots?
(c) Give 90% individual two-sided confidence intervals for each of β₀, β₁, and β₂.
(d) Give individual 90% two-sided confidence intervals for the mean elapsed time with a carburetor jetting size of 70 and then with a jetting size of 76.
(e) Give simultaneous 90% two-sided confidence intervals for the two means indicated in part (d).
(f) Give 90% lower prediction bounds for an additional elapsed time with a carburetor jetting size of 70 and also with a jetting size of 76.
(g) Give approximate 95% lower tolerance bounds for 90% of additional elapsed times, first with a carburetor jetting size of 70 and then with a jetting size of 76.
(h) Make an ANOVA table for testing H₀: β₁ = β₂ = 0 in the model y = β₀ + β₁x + β₂x² + ε. What is the meaning of this hypothesis in the context of the study and the quadratic model? What is the p-value?
(i) Use a t statistic and test the null hypothesis H₀: β₂ = 0. What is the meaning of this hypothesis in the context of the study and the quadratic model?

2. Return to the situation of Exercise 2 of Section 4.2, and the chemithermomechanical pulp study of Miller, Shankar, and Peterson. Consider an analysis of the data there based on the model y = β₀ + β₁x₁ + β₂x₂ + ε.
(a) Find sSF. What does this intend to measure in the context of the engineering problem?
(b) Plot both residuals and standardized residuals versus x₁, x₂, and ŷ. How much difference is there in the appearance of these pairs of plots?
(c) Give 90% individual two-sided confidence intervals for all of β₀, β₁, and β₂.
(d) Give individual 90% two-sided confidence intervals for the mean specific surface area, first when x₁ = 9.0 and x₂ = 60 and then when x₁ = 10.0 and x₂ = 70.
(e) Give simultaneous 90% two-sided confidence intervals for the two means indicated in part (d).
(f) Give 90% lower prediction bounds for the next specific surface area, first when x₁ = 9.0 and x₂ = 60 and then when x₁ = 10.0 and x₂ = 70.
(g) Give approximate 95% lower tolerance bounds for 90% of specific surface areas, first when x₁ = 9.0 and x₂ = 60 and then when x₁ = 10.0 and x₂ = 70.
(h) Make an ANOVA table for testing H₀: β₁ = β₂ = 0 in the model y = β₀ + β₁x₁ + β₂x₂ + ε. What is the p-value?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
9.3 Application of Multiple Regression in Response Surface Problems and Factorial Analyses

y ≈ β0 + β1x1 + β2x2 + · · · + βkxk    (9.74)
Response surfaces specified by equation (9.74) are “planar” (see again Figure 9.6
in this regard). When such surfaces fail to capture the nature of dependence of
y on x1 , x2 , . . . , xk because of their “lack of curvature,” quadratic approximate
relationships often prove effective. The general version of a quadratic equation for
y in k variables x has k linear terms, k quadratic terms, and cross product terms
for all pairs of x variables. For example, the general 3-variable quadratic response surface is specified by

y ≈ β0 + β1x1 + β2x2 + β3x3 + β4x1² + β5x2² + β6x3² + β7x1x2 + β8x1x3 + β9x2x3    (9.75)
Gathering adequate data

One issue in using the k-variable version of quadratic function (9.75) is that of collecting adequate data to support the enterprise. 2^k factorial data are not sufficient. This is easy to see by considering the k = 1 case. Having data for only two different values of x1, say x1 = 0 and x1 = 1, would not be adequate to support the fitting of

y ≈ β0 + β1x1 + β2x1²    (9.76)

There are, as an arbitrary example, many different versions of equation (9.76) with y = 5 for x1 = 0 and y = 7 for x1 = 1, including

y ≈ 5 + 2x1 + 0x1²
y ≈ 5 − 8x1 + 10x1²
y ≈ 5 + 10x1 − 8x1²
Figure 9.8 Plots of three quadratic equations through the points (0, 5) and (1, 7): y = 5 + 2x1, y = 5 − 8x1 + 10x1², and y = 5 + 10x1 − 8x1²
These three equations have plots with quite different shapes. The first is linear, the
second is concave up with a minimum at x1 = .4, and the third is concave down
with a maximum at x1 = .625. This is illustrated in Figure 9.8. The point is that
data from at least three different x 1 values are needed in order to fit a one-variable
quadratic equation.
What would happen if a regression program were used to fit equation (9.76)
to a set of (x1 , y) data having only two different x1 values in it? The program
will typically refuse the user’s request, perhaps fitting instead the simpler equation
y ≈ β0 + β1x1.
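The rank deficiency behind such a refusal is easy to exhibit numerically. The following short sketch (written in Python with numpy, which are assumptions of this illustration rather than tools of the text, and using made-up y values) shows that with only two distinct x1 values, the three columns of the design matrix for equation (9.76) carry only two dimensions of information:

import numpy as np

# Only two distinct x1 values (0 and 1), with replication
x1 = np.array([0., 0., 1., 1.])
y  = np.array([5.1, 4.9, 7.2, 6.8])      # hypothetical responses

# Design matrix for y ~ b0 + b1*x1 + b2*x1^2
X = np.column_stack([np.ones_like(x1), x1, x1**2])

# Because x1 takes only the values 0 and 1, the x1 and x1^2 columns are
# identical, so the matrix has rank 2, not 3: the three coefficients
# of the quadratic are not separately estimable
print(np.linalg.matrix_rank(X))          # prints 2

# lstsq still returns *a* least squares solution, but it is only one of
# infinitely many equally good ones
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)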
Exactly what is needed in the way of data in order to fit a k-variable quadratic
equation is not easy to describe in elementary terms. 3^k factorial data are sufficient
but for large k are really much more than are absolutely necessary. Statisticians have
invested substantial effort in identifying patterns of (x 1 , x2 , . . . , xk ) combinations
that are both small (in terms of number of different combinations) and effective (in
terms of facilitating precise estimation of the coefficients in a quadratic response
function). See, for example, Section 7.2.2 of Statistical Quality Assurance Methods
for Engineers by Vardeman and Jobe for a discussion of “central composite” plans
often employed to gather data adequate to fit a quadratic. An early successful
application of such a plan is described next.
Example 3 A Central Composite Study for Optimizing Bread Wrapper Seal Strength
The article “Sealing Strength of Wax-Polyethylene Blends” by Brown, Turner,
and Smith (Tappi, 1958) contains an interesting central composite data set. The
effects of the three process variables Seal Temperature, Cooling Bar Temperature,
and % Polyethylene Additive on the seal strength y of a bread wrapper stock were
studied. With the coding of the process variables indicated in Table 9.14, the data
700 Chapter 9 Regression Analysis—Inference for Curve- and Surface-Fitting
Table 9.14
Coding of the Process Variables

Factor A, Seal Temperature:         x1 = (t1 − 255)/30, where t1 is in °F
Factor B, Cooling Bar Temperature:  x2 = (t2 − 55)/9, where t2 is in °F
Factor C, Polyethylene Content:     x3 = (c − 1.1)/.6, where c is in %
Table 9.15
Seal Strengths Produced under 15 Different Sets
of Process Conditions
Seal Strength,
x1 x2 x3 y (g/in.)
−1 −1 −1 6.6
1 −1 −1 6.9
−1 1 −1 7.9
1 1 −1 6.1
−1 −1 1 9.2
1 −1 1 6.8
−1 1 1 10.4
1 1 1 7.3
0 0 0 10.1
0 0 0 9.9
0 0 0 12.2
0 0 0 9.7
0 0 0 9.7
0 0 0 9.6
−1.682 0 0 9.8
1.682 0 0 5.0
0 −1.682 0 6.9
0 1.682 0 6.3
0 0 −1.682 4.0
0 0 1.682 8.6
in Table 9.15 were obtained. Notice that there are fewer than 3³ = 27 different
(x1 , x2 , x3 ) vectors in these data. (The central composite plan involves only 15
different combinations.)
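For readers who wish to verify that count, here is a minimal sketch (in Python, an assumption of this illustration) that assembles the 2³ cube points, the six axial points at ±1.682, and the center point making up the plan of Table 9.15:

from itertools import product
import numpy as np

# 2^3 "cube" points, 2*3 axial (star) points, and one center point:
# 8 + 6 + 1 = 15 distinct combinations of the central composite plan
cube  = np.array(list(product([-1, 1], repeat=3)), dtype=float)
alpha = 1.682                        # axial distance used in Table 9.15
axial = np.zeros((6, 3))
for i in range(3):
    axial[2 * i, i]     = -alpha
    axial[2 * i + 1, i] =  alpha
center = np.zeros((1, 3))

design = np.vstack([cube, axial, center])
print(len(design))                   # 15 distinct (x1, x2, x3) combinations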
If one fits a first-order (linear) model

y = β0 + β1x1 + β2x2 + β3x3 + ε    (9.77)
Plots and interpreting a fitted quadratic

For small values of k, the interpretation of a fitted quadratic response function can be facilitated through the use of various plots. One possibility is to plot ŷ versus a particular system variable x, with values of any other system variables held fixed. This was the method used in Figure 4.15 for the nitrogen plant data, in Figure 4.16 (see page 158) for the lift/drag ratio data of Burris, and in Figure 9.8 of this section for the hypothetical one-variable quadratics. (It is also worth noting that in light of the inference material presented in Section 9.2, one can enhance such plots of ŷ by adding error bars based on confidence limits for the means µ_{y|x1, x2, ..., xk}.)
A second kind of plot that can help in understanding a fitted quadratic function is
the contour plot. A contour plot is essentially a topographic map. For a given pair of
system variables (say x 1 and x2 ) one can, for fixed values of all other input variables,
sketch out the loci of points in the (x1 , x2 )-plane that produce several particular
values of ŷ. Most statistical packages and engineering mathematics packages will
make contour plots.
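As a rough illustration of what such a package does, the sketch below draws a contour plot for a hypothetical fitted quadratic in x1 and x2 with x3 held fixed. The coefficients in y_hat are placeholders standing in for a real fitted equation such as (9.78), and Python with numpy and matplotlib is assumed:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical fitted quadratic in x1, x2, x3 (placeholder coefficients;
# in practice these would come from the fitted equation)
def y_hat(x1, x2, x3=0.0):
    return (10.0 - 1.0 * x1 + 0.5 * x2 + 0.6 * x3
            - 0.8 * x1**2 - 0.7 * x2**2 - 0.2 * x1 * x2)

x1, x2 = np.meshgrid(np.linspace(-2, 2, 101), np.linspace(-2, 2, 101))
cs = plt.contour(x1, x2, y_hat(x1, x2, x3=0.0), levels=8)
plt.clabel(cs)                 # label the contours with their y-hat values
plt.xlabel("x1"); plt.ylabel("x2")
plt.title("Contours of fitted y-hat for x3 = 0")
plt.show()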
Example 3 (continued)

Figure 9.9 shows a series of five contour plots made using the fitted equation (9.78) for seal strength. These correspond to x3 = −2, −1, 0, 1, and 2. The figure suggests that optimum predicted seal strength may be achievable for x3 between 0 and 1, with x1 between −2 and −1, and x2 between 0 and 1.
Figure 9.9 Contour plots of fitted seal strength ŷ in the (x1, x2)-plane for x3 = −2, −1, 0, 1, and 2
Analytic interpretation of a fitted quadratic

Plotting is helpful in understanding a fitted quadratic primarily for small k. So it is important that there are also analytical tools that can be employed. To illustrate their character, consider the simple case of k = 1. The basic nature of the quadratic equation

ŷ = b0 + b1x1 + b2x1²

is familiar. The value

x1 = −b1/(2b2)

produces the minimum (b2 > 0) or maximum (b2 < 0) value of ŷ. Something like this story is also true for k > 1.
It is necessary to use some matrix notation to say what happens for k > 1. Temporarily modify the way the b's are subscripted as follows. The meaning of b0 will remain unchanged. b1 through bk will be the coefficients for the k system variables x1 through xk. b11 through bkk will be the coefficients for the k squares x1² through xk². And for each i ≠ j, bij will be the coefficient of the xixj cross product. One can define a k × 1 vector b and a k × k matrix B as

Vector of linear coefficients and matrix of quadratic coefficients

b = (b1, b2, ..., bk)′

and B the symmetric matrix with diagonal entries b11, b22, ..., bkk and off-diagonal (i, j) entries (1/2)bij, that is,

B = [  b11        (1/2)b12   · · ·   (1/2)b1k  ]
    [  (1/2)b12   b22        · · ·   (1/2)b2k  ]
    [  ...        ...        ...     ...       ]
    [  (1/2)b1k   (1/2)b2k   · · ·   bkk       ]

With

x = (x1, x2, ..., xk)′
When all solutions to equation (9.80) are negative, a fitted quadratic is bowl-shaped down and has a maximum at the point (9.79). When some solutions to equation (9.80) are positive and some are negative, the fitted quadratic surface has neither a maximum nor a minimum (unless one restricts attention to some bounded region of x vectors).
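In computational terms, this amounts to solving 2Bx + b = 0 for the stationary point and inspecting the signs of the eigenvalues of B. The sketch below (Python/numpy assumed; the text itself uses MINITAB, and the b and B entries here are placeholders rather than the actual coefficients of equation (9.78)) mirrors what Printout 3 does with MINITAB's matrix facilities:

import numpy as np

# Linear coefficients b and quadratic-coefficient matrix B assembled as in
# the definitions above (placeholder values; the real entries come from a
# fitted quadratic such as equation (9.78))
b = np.array([-1.1, 0.2, 1.3])
B = np.array([[-0.9,  0.1,  0.05],
              [ 0.1, -0.7,  0.2 ],
              [ 0.05, 0.2, -0.6 ]])

# The stationary point of y-hat = b0 + x'b + x'Bx solves 2Bx + b = 0
x_star = np.linalg.solve(-2.0 * B, b)

# Eigenvalues of B: all negative -> maximum, all positive -> minimum,
# mixed signs -> saddle (no maximum or minimum)
eigenvalues = np.linalg.eigvalsh(B)
print(x_star, eigenvalues)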
Printout 3 Analysis of the Fitted Quadratic for the Bread Wrapper Data
(Example 3)
Data Display
C1
-1.27090 -1.11680 -0.56190
Data Display
Matrix M5
-1.01104
0.26069
0.68146
Example 3 (continued)

Printout 3 illustrates the use of MINITAB in the analytic investigation of the nature of the fitted surface (9.78) in the bread wrapper seal strength study. The printout shows the three eigenvalues of B to be negative. The fitted seal strength therefore has a maximum. This maximum is predicted to occur at the combination of values x1 = −1.01, x2 = .26, and x3 = .68. (The MINITAB matrix functions used to make the printout are under the "Calc/Matrices" menu, and the display routine is under the "Manip/Display Data" menu.)
looks deceptively simple. With proper choice of the inputs x, versions of it can be used in a wide variety of contexts, including factorial analyses. For purposes of illustration, consider the case of a complete two-way factorial study with I = 3 levels of factor A and J = 3 levels of factor B. In the usual two-way factorial notation introduced in Definitions 1 and 2 of Chapter 8, the basic constraints on the main effects and two-factor interactions are Σi αi = 0, Σj βj = 0, and Σi αβij = Σj αβij = 0. These imply that the I · J = 3 · 3 = 9 different mean responses in such a study can be written as in Table 9.16.
Table 9.16
Mean Responses in a 32 Factorial Study
i, j,
Level of A Level of B Mean Response
1 1 µ.. + α1 + β1 + αβ11
1 2 µ.. + α1 + β2 + αβ12
1 3 µ.. + α1 − β1 − β2 − αβ11 − αβ12
2 1 µ.. + α2 + β1 + αβ21
2 2 µ.. + α2 + β2 + αβ22
2 3 µ.. + α2 − β1 − β2 − αβ21 − αβ22
3 1 µ.. − α1 − α2 + β1 − αβ11 − αβ21
3 2 µ.. − α1 − α2 + β2 − αβ12 − αβ22
3 3 µ.. − α1 − α2 − β1 − β2 + αβ11 + αβ12 + αβ21 + αβ22
Notice that in Table 9.16, the first two A main effects, α1 and α2, appear with positive signs when (respectively) i = 1 or 2 but with negative signs when i = 3 (= I). In a similar manner, the first two B main effects, β1 and β2, appear with positive signs when (respectively) j = 1 or 2 but with negative signs when j = 3 (= J). If one thinks of the four A and B main effects used in Table 9.16 in terms of coefficients β in a regression model, it soon becomes clear how to invent "system variables" x to make the regression coefficients β appear with correct signs in the expressions for means µij. That is, define four dummy variables
define four dummy variables
1 if the response y is from level 1 of A
x1A = −1 if the response y is from level 3 of A
0 otherwise
1 if the response y is from level 2 of A
x2A = −1 if the response y is from level 3 of A
0 otherwise
1 if the response y is from level 1 of B
x1B = −1 if the response y is from level 3 of B
0 otherwise
1 if the response y is from level 2 of B
x2B = −1 if the response y is from level 3 of B
0 otherwise
Table 9.17
Correspondences between Regression Coefficients and the Grand
Mean and Main Effects in a 32 Factorial Study
What is more, since the x’s used here take only the values −1, 0, and 1, so
also do their products. And taken in pairs (one x A variable with one x B variable),
their products produce the correct (−1, 0, or 1) multipliers for the 2-factor inter-
actions αβ11 , αβ12 , αβ21 , and αβ22 appearing in Table 9.16. That is, if one thinks
of the interactions αβi j in terms of regression coefficients β, with the additional
correspondences listed in Table 9.18, the entire expression (9.82) can be written in
regression notation as
Table 9.18
Correspondence between Regression Coefficients and Interactions
in a 32 Factorial Study
The general I × J two-way factorial version of this story is similar. One defines the dummy variables as follows.

I − 1 dummy variables for factor A:

xiA = 1 if the response y is from level i of A, −1 if the response y is from level I of A, and 0 otherwise    (9.84)

J − 1 dummy variables for factor B:

xjB = 1 if the response y is from level j of B, −1 if the response y is from level J of B, and 0 otherwise    (9.85)
and uses a regression program to do the computations.

Multiple regression and two-way factorial analyses

Estimated regression coefficients of xiA or xjB variables alone are estimated main effects, while those for xiA · xjB cross products are estimated 2-factor interactions.
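A minimal sketch of the coding in displays (9.84) and (9.85), written in Python for concreteness (the function name effects_code is ours, not the text's):

def effects_code(level, n_levels):
    """Return the I-1 dummy variables of (9.84)/(9.85) for one observation:
    entry i is 1 if the observation is from level i+1, every entry is -1
    if it is from the last level, and 0 otherwise."""
    if level == n_levels:                  # last level -> all -1's
        return [-1] * (n_levels - 1)
    return [1 if i == level else 0 for i in range(1, n_levels)]

# For a 3x3 factorial: x1A, x2A for factor A and x1B, x2B for factor B
print(effects_code(1, 3))   # [1, 0]
print(effects_code(2, 3))   # [0, 1]
print(effects_code(3, 3))   # [-1, -1]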
Table 9.19
Strengths of 11 Wood Joints (by A Joint Type and B Wood Type, 1 = Pine, 2 = Oak)
Table 9.20
Joint Strength Data Prepared for a Factorial Analysis Using
a Regression Program
i, j,
Joint Type Wood Type x1A x2A x1B y
1 1 1 0 1 829, 596
1 2 1 0 −1 1169
2 1 0 1 1 1348, 1207
2 2 0 1 −1 1518, 1927
3 1 −1 −1 1 1000, 859
3 2 −1 −1 −1 1295, 1561
Notice that because these data are unbalanced (due to the unfortunate loss
of one butt/oak response), it is not possible to fit a no-interaction model to these
data by simply adding together fitted effects (defined in Section 4.3) or to use
anything said in Chapter 8 to make inferences based on such a model. But it is
possible to use the dummy variable regression approach based on formulas (9.84)
and (9.85) to do so.
Consider the regression-data-set version of Table 9.19 given in Table 9.20.
Printouts 4 and 5 show the results of fitting the two regression models
to the data of Table 9.20. Printout 4 corresponding to model (9.86) is the full model
or µi j = µ.. + αi + β j + αβi j description of the data. For that regression run, the
reader should verify the correspondences between fitted regression coefficients
b and fitted effects (defined in Section 4.3), listed in Table 9.21. (For example,
Table 9.21
Correspondence between Fitted Regression Coefficients and Fitted Factorial
Effects for the Wood Joint Strength Data
For instance, the estimated standard deviation of the full-model fitted butt/pine mean is

128.9 = sSF · A = sP/√n11 = 182.2/√2

Under the no-interaction model, a 95% two-sided confidence interval for the mean butt/pine joint strength has endpoints

708.7 ± 2.365(94.8)

that is, 484.5 psi and 932.9 psi. Similarly, using formula (9.56) on page 689, a 90% lower prediction limit for a single additional butt/pine joint strength is

708.7 − 1.415 √((154.7)² + (94.8)²) = 452.0 psi
From these two calculations, it should be clear that other methods from
Section 9.2 could be used here as well. The reader should have no trouble finding
and using residuals and standardized residuals for the no-interaction model based
on formulas (9.39) and (9.42), giving simultaneous confidence intervals for all
six mean responses under the no-interaction model using formula (9.54) or giving
one-sided tolerance bounds for certain joint/wood combinations under the no-
interaction model using formula (9.58) or (9.59).
Data Display
1 1 0 1 829
2 1 0 1 596
3 1 0 -1 1169
4 0 1 1 1348
5 0 1 1 1207
6 0 1 -1 1518
7 0 1 -1 1927
8 -1 -1 1 1000
9 -1 -1 1 859
10 -1 -1 -1 1295
11 -1 -1 -1 1561
Regression Analysis
Analysis of Variance
Source DF SS MS F P
Regression 5 1283527 256705 7.73 0.021
Residual Error 5 166044 33209
Total 10 1449571
Source DF Seq SS
xa1 1 120144
xa2 1 577927
xb1 1 583908
xa1*xb1 1 897
xa2*xb1 1 650
Regression Analysis
Source DF SS MS F P
Regression 3 1281980 427327 17.85 0.001
Residual Error 7 167591 23942
Total 10 1449571
Source DF Seq SS
xa1 1 120144
xa2 1 577927
xb1 1 583908
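The no-interaction fit of Printout 5 can be reproduced with any least squares routine. Here is a sketch in Python/numpy (an assumption of this illustration; the text itself uses MINITAB) applied to the dummy-variable data of Table 9.20:

import numpy as np

# Dummy-variable data from Table 9.20 (columns xa1, xa2, xb1, y)
data = np.array([
    [ 1,  0,  1,  829], [ 1,  0,  1,  596], [ 1,  0, -1, 1169],
    [ 0,  1,  1, 1348], [ 0,  1,  1, 1207], [ 0,  1, -1, 1518],
    [ 0,  1, -1, 1927], [-1, -1,  1, 1000], [-1, -1,  1,  859],
    [-1, -1, -1, 1295], [-1, -1, -1, 1561]], dtype=float)

X = np.column_stack([np.ones(11), data[:, :3]])   # no-interaction model
y = data[:, 3]
b, sse, *_ = np.linalg.lstsq(X, y, rcond=None)

n, k = X.shape                            # n - k = 7 residual df, as in Printout 5
s_sf = np.sqrt(sse[0] / (n - k))          # should match the 154.7 of the text
y_hat_butt_pine = X[0] @ b                # x = (1, 1, 0, 1); about 708.7 per the text
print(b, s_sf, y_hat_butt_pine)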
The pattern of analysis set out for two-way factorials carries over quite naturally to three-way and higher factorials. To use a multiple regression program to fit and make inferences based on simplified versions of the p-way factorial model, proceed as follows.

Dummy variables for regression analysis of p-way factorials

I − 1 dummy variables x1A, x2A, ..., x(I−1)A are defined (as before) to carry information about I levels of factor A, J − 1 dummy variables x1B, x2B, ..., x(J−1)B are defined (as before) to carry information about J levels of factor B, K − 1 dummy variables x1C, x2C, ..., x(K−1)C are defined to carry information about K levels of factor C, ..., etc. Products of pairs of these, one each from the groups representing two different factors, carry information about 2-factor interactions of the factors. Products of triples of these, one each from the groups representing three different factors, carry information about 3-factor interactions of the factors. And so on.
When something short of the largest possible regression model is fitted to
an unbalanced factorial data set, the estimated coefficients b that result are the
least squares estimates of the underlying factorial effects in the few-effects model.
(Usually, these differ somewhat from the (full-model) fitted effects defined in Section
4.3.) All of the regression machinery of Section 9.2 can be applied to create fitted
values, residuals, and standardized residuals; to plot these to do model checking; to
make confidence intervals for mean responses; and to create prediction and tolerance
intervals.
When the regression with dummy variables approach is used as just described, the fitted coefficients b correspond to fitted effects for the levels 1 through I − 1, J − 1, K − 1, etc. of the factors. For two-level factorials, this means that the fitted coefficients are estimated factorial effects for the "all low" treatment combination. However, because of extensive use of the Yates algorithm in this text, you will probably think first in terms of the 2^p factorial effects for the "all high" treatment combination.

Alternative choice of x variables for regression analysis of 2^p factorials

Two sensible courses of action then suggest themselves for the analysis of unbalanced 2^p factorial data. You can proceed exactly as just indicated, using dummy variables x1A, x1B, x1C, etc. and various products of the same, taking care to remember to interpret b's as "all low" fitted effects and subsequently to switch signs as appropriate to get "all high" fitted effects. The other possibility is to depart slightly from the program laid out for general p-way factorials in 2^p cases: Instead
of using the variables x1A, x1B, x1C, etc. and their products when doing regression, one may use the variables

x2A = −x1A = 1 if the response y is from the high level of A, −1 if the response y is from the low level of A

x2B = −x1B = 1 if the response y is from the high level of B, −1 if the response y is from the low level of B

x2C = −x1C = 1 if the response y is from the high level of C, −1 if the response y is from the low level of C

etc. and their products when doing regression. When the variables x2A, x2B, x2C, etc. are used, the fitted b's are the estimated "all high" 2^p factorial effects.
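A short sketch (Python assumed, an illustration of ours rather than the text's) of building the x2A, x2B, x2C regressors and their products for all eight treatment combinations of a 2³ study:

from itertools import combinations, product

# x2A, x2B, x2C coded +1 at the high level and -1 at the low level
rows = []
for xa, xb, xc in product([-1, 1], repeat=3):
    x = {"A": xa, "B": xb, "C": xc}
    # products of pairs carry the 2-factor interactions
    for pair in combinations("ABC", 2):
        x[pair[0] + pair[1]] = x[pair[0]] * x[pair[1]]
    # the triple product carries the 3-factor interaction
    x["ABC"] = xa * xb * xc
    rows.append(x)

# Each row holds the regressors x2A, x2B, x2C, x2A*x2B, ..., x2A*x2B*x2C
# for one treatment combination; fitted b's are "all high" factorial effects
print(rows[0])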
Table 9.22
Dynamometer Readings for 23 Treatment Combinations
For this slightly altered data set, the Yates algorithm produces the fitted effects. Printout 6 then shows a regression of the response on

x2A, x2B, x2C, x2Ax2B, x2Ax2C, x2Bx2C, and x2Ax2Bx2C

(i.e., using the full model in regression terminology and the unrestricted 2³ factorial model in the terminology of Section 8.2). On Printout 6, one can identify the fitted regression coefficients b with the fitted factorial effects in the pairs indicated in Table 9.23.
Table 9.23
Correspondence Between Fitted Regression Coefficients
and Fitted Factorial Effects for the Regression Run
of Printout 6
Example 5 (continued)

Analysis of the data of Table 9.22 based on a full factorial model

yijkl = µ... + αi + βj + γk + αβij + αγik + βγjk + αβγijk + εijkl

that is, based on a regression of y on x2A, x2B, x2C, and all of their cross products, is a logical first step. Based on that step, it seems desirable to fit and draw inferences based on a "B and C main effects only" description of y. Since the data in Table 9.22 are unbalanced, the naive use of the reverse Yates algorithm with the (full-model) fitted effects will not produce appropriate fitted values. ȳ..., b2, and c2 are simply not the least squares estimates of µ..., β2, and γ2 for the "B and C main effects only" model in this unbalanced data situation.
However, what can be done is to fit the reduced regression model

yi = β0 + β2x2iB + β3x2iC + εi
to the data. Printout 7 represents the use of this technique. Locate on that printout
the (reduced-model) estimates of the factorial effects µ... , β2 , and γ2 and note
that they differ somewhat from ȳ ... , b2 , and c2 as defined in Section 4.3 and
displayed on Printout 6. Note also that the four different possible fitted mean
responses, along with their estimated standard deviations, are as given in Table
9.24.
The values in Table 9.24 can be used in the formulas of Section 9.2 to produce
confidence intervals for the four mean responses, prediction intervals, tolerance
intervals, and so on based on the “B and C main effects only” model. All of this
can be done despite the fact that the data of Table 9.22 are unbalanced.
Table 9.24
Fitted Values and Their Estimated Standard Deviations for a “B
and C Main Effects Only” Analysis of the Unbalanced Power
Requirement Data
Regression Analysis
Source DF SS MS F P
Regression 7 54.748 7.821 3.41 0.012
Residual Error 23 52.687 2.291
Total 30 107.435
Source DF Seq SS
xa2 1 2.202
xb2 1 22.645
xc2 1 28.398
xa*xb 1 0.091
xa*xc 1 0.051
xb*xc 1 1.293
xa*xb*xc 1 0.068
Regression Analysis
Source DF SS MS F P
Regression 2 50.972 25.486 12.64 0.000
Residual Error 28 56.463 2.017
Total 30 107.435
Source DF Seq SS
xb2 1 23.093
xc2 1 27.879
Example 5 has been treated as if the lack of balance in the data came about
by misfortune. And the lack of balance in Example 4 did come about in such a
way. But lack of balance in p-way factorial data can also be the result of careful
planning. Consider, for example, a 2^4 factorial situation where the budget can
support collection of 20 observations but not as many as 32. In such a case, complete
replication of the 16 combinations of two levels of four factors in order to achieve
balance is not possible. But it makes far more sense to replicate four of the 16
combinations (and thus be able to calculate sP and honestly assess the size of
background variation) than to achieve balance by using no replication. By now
it should be obvious how to subsequently go about the analysis of the resulting
partially replicated (and thus unbalanced) factorial data.
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
at any fixed combination of time and temperature? How does this estimate compare with sP? Does there appear to be enough difference between the two values to cast serious doubt on the appropriateness of the regression model?
(b) There was some concern on the project group's part that the 5-minute time was completely unlike the other times and should not be considered in the same analysis as the longer times. Temporarily delete the 12 slugs treated only 5 minutes from consideration, refit the quadratic model, and compare fitted values for the 36 slugs tempered longer than 5 minutes for this regression to those from part (a). How different are these two sets of values?
Henceforth consider the quadratic model fitted to all 48 data points.
(c) Make a contour plot showing how y varies with ln(x1) and x2. In particular, use it to identify the region of ln(x1) and x2 values where the tempering seems to provide an increase in hardness. Sketch the corresponding region in the (x1, x2)-plane.
(d) For the x1 = 50 and x2 = 800 set of conditions,
(i) give a 95% two-sided confidence interval for the mean increase in hardness provided by tempering.
(ii) give a 95% two-sided prediction interval for the increase in hardness produced by tempering an additional slug.
(iii) give an approximate 95% lower tolerance bound for the hardness increases of 90% of such slugs undergoing tempering.
2. Return to the situation of Chapter Exercise 10 of Chapter 8 and the chemical product impurity study. The analysis suggested in that exercise leads to the conclusion that only the A and B main effects are detectably nonzero. The data are unbalanced, so it is not possible to use the reverse Yates algorithm to fit the "A and B main effects only" model to the data.
(a) Use the dummy variable regression techniques to fit the "A and B main effects only" model. (You should be able to pattern what you do after Example 5.) How do A and B main effects estimated on the basis of this few-effects/simplified description of the pattern of response compare with what you obtained for fitted effects using the Yates algorithm?
(b) Compute and plot standardized residuals for the few-effects model. (Plot against levels of A, B, and C, against ŷ, and normal-plot them.) Do any of these plots indicate any problems with the few-effects model?
(c) How does sFE (which you can read directly off your printout as sSF) compare with sP in this situation? Do the two values carry any strong suggestion of lack of fit?
Chapter 9 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation of Chapter Exercise 3 of Chapter 4 and the grain growth study of Huda and Ralph. Consider an analysis of the researchers' data based on the model

y = β0 + β1x1 + β2 ln(x2) + β3x1 ln(x2) + ε

(a) Fit this model to the data given in Chapter 4. Based on this fit, what is your estimate of the standard deviation of grain size, y, associated with different specimens treated using a fixed temperature and time?
(b) Make a plot of the observed y's versus the corresponding ln(x2)'s. On this plot, sketch the linear fitted response functions (ŷ versus ln(x2)) for x1 = 1443, 1493, and 1543. Notice that the fit to the researchers' data is excellent. However, notice also that the model has four β's and was fit based on only nine data points. What possibility therefore needs to be kept in mind when making predictions based on this model?
distinguish between various possible quadratic response surfaces in four variables.)
(d) In light of the difficulty experienced in (c), a natural thing to do might be to try to fit quadratic surfaces involving only some of all possible second-order terms. Fit the two models for y2 including (i) x1, x2, x3, x4, x1², x2², x3², and x4² terms, and (ii) x1, x2, x4, x1², x2², x4², x1x2, and x2x4 terms. How do these two fitted equations compare in terms of ŷ2 values for (x1, x2, x3, x4) combinations in the data set? How do ŷ2 values compare for the two fitted equations when x1 = 325, x2 = 550, x3 = 1.2, and x4 = 200? (Notice that although this last combination is not in the data set, there are values of the individual variables in the data set matching these.) What is the practical engineering difficulty faced in a situation like this, where there is not enough data available to fit a full quadratic model but it doesn't seem that a model linear in the variables is an adequate description of the response?
Henceforth, confine attention to y3 and consider an analysis based on a model linear in all of x1, x2, x3, and x4.
(e) Give a 90% two-sided individual confidence interval for the increase in mean selectivity ratio that accompanies a 1 watt increase in power.
(f) What appear to be the optimal (large y3) settings of the variables x1, x2, x3, and x4 (within their respective ranges of experimentation)? Refer to the coefficients of your fitted equation from (b).
(g) Give a 90% two-sided confidence interval for the mean selectivity ratio at the combination of settings that you identified in (f). What cautions would you include in a report in which this interval is to appear? (Under what conditions is your calculated interval going to have real-world meaning?)
3. The article "How to Optimize and Control the Wire Bonding Process: Part II" by Scheaffer and Levine (Solid State Technology, 1991) discusses the use of a k = 4 factor central composite design in the improvement of the operation of the K&S 1484XQ bonder. The effects of the variables Force, Ultrasonic Power, Temperature, and Time on the final ball bond shear strength were studied. The accompanying table gives data like those collected by the authors. (The original data were not given in the paper, but enough information was given to produce these simulated values that have structure like the original data.)

Force,    Power,    Temp.,    Time,    Strength,
x1 (gm)   x2 (mw)   x3 (°C)   x4 (ms)  y (gm)
30        60        175       15       26.2
40        60        175       15       26.3
30        90        175       15       39.8
40        90        175       15       39.7
30        60        225       15       38.6
40        60        225       15       35.5
30        90        225       15       48.8
40        90        225       15       37.8
30        60        175       25       26.6
40        60        175       25       23.4
30        90        175       25       38.6
40        90        175       25       52.1
30        60        225       25       39.5
40        60        225       25       32.3
30        90        225       25       43.0
40        90        225       25       56.0
25        75        200       20       35.2
45        75        200       20       46.9
35        45        200       20       22.7
35        105       200       20       58.7
35        75        150       20       34.5
35        75        250       20       44.0
35        75        200       10       35.7
35        75        200       30       41.8
35        75        200       20       36.5
35        75        200       20       37.6
35        75        200       20       40.3
35        75        200       20       46.0
35        75        200       20       27.8
35        75        200       20       40.3
(a) Fit both the full quadratic response surface and the simpler linear response surface to these data. On the basis of simple examination of the R² values, does it appear that the quadratic surface is enough better as a data summary to make it worthwhile to suffer the increased complexity that it brings with it? How do the sSF values for the two fitted models compare to sP computed from the final six data points listed here?
(b) Conduct a formal test (in the full quadratic model) of the hypothesis that the linear model y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε is an adequate description of the response. Does your p-value support your qualitative judgment from part (a)?
(c) In the linear model y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε, give a 90% confidence interval for β2. Interpret this interval in the context of the original engineering problem. (What is β2 supposed to measure?) Would you expect the p-value from a test of H0: β2 = 0 to be large or to be small?
(d) Use the linear model and find an approximate 95% lower tolerance bound for 98% of bond shear strengths at the center point x1 = 35, x2 = 75, x3 = 200, and x4 = 20.
4. (Testing for "Lack of Fit" to a Regression Model) In curve- and surface-fitting problems where there is some replication, this text has used the informal comparison of sSF (or sLF) to sP as a means of detecting poor fit of a regression model. It is actually possible to use these values to conduct a formal significance test for lack of fit. That is, under the one-way normal model of Chapter 7, it is possible to test

H0: µ_{y|x1, x2, ..., xk} = β0 + β1x1 + β2x2 + · · · + βkxk

using the test statistic

F = [((n − k − 1)sSF² − (n − r)sP²)/(r − k − 1)] / sP²

and an F_{r−k−1, n−r} reference distribution, where large values of F count as evidence against H0. (If sSF is much larger than sP, the difference in the numerator of F will be large, producing a large sample value and a small observed level of significance.)
(a) It is not possible to use the lack of fit test in any of Exercise 3 of Section 4.1, Exercise 2 of Section 4.2, or Chapter Exercises 2 or 3 of Chapter 4. Why?
(b) For the situation of Exercise 2 of Section 9.1, conduct a formal test of lack of fit of the linear relationship µ_{y|x} = β0 + β1x to the concrete strength data.
(c) For the situation of Exercise 1 of Section 9.3, conduct a formal test of lack of fit of the full quadratic relationship

µ_{y|x1, x2} = β0 + β1 ln(x1) + β2x2 + β3(ln(x1))² + β4x2² + β5x2 ln(x1)

to the hardness increase data.
(d) For the situation of Chapter Exercise 3, conduct a formal test of lack of fit of the linear relationship

µ_{y|x1, x2, x3, x4} = β0 + β1x1 + β2x2 + β3x3 + β4x4

to the ball bond shear strength data.
5. Return to the situation of Chapter Exercises 18 and 19 of Chapter 4 and the ore refining study of S. Osoka. In that study, the object was to discover settings of the process variables x1 and x2 that would simultaneously maximize y1 and minimize y2.
(a) Fit full quadratic response functions for y1 and y2 to the data given in Chapter 4. Compute and plot standardized residuals for these two fitted equations. Comment on the appearance of these plots and what they indicate about the appropriateness of the fitted response surfaces.
(b) One useful rule of thumb in response surface studies (suggested by Box, Hunter, and Hunter in their book Statistics for Experimenters) is to
check that for a fitted surface involving a total of l coefficients b (including b0),

max ŷ − min ŷ > 4 √(l · sSF²/n)

before trying to make decisions based on its nature (bowl-shape up or down, saddle, etc.) or do even limited interpolation or extrapolation. This criterion is a comparison of the movement of the fitted surface across those n data points in hand, to four times an estimate of the root of the average variance associated with the n fitted values ŷ. If the criterion is not satisfied, the interpretation is that the fitted surface is so flat (relative to the precision with which it is determined) as to make it impossible to tell with any certainty the true nature of how mean response varies as a function of the system variables.
Judge the usefulness of the surfaces fitted in part (a) against this criterion. Do the response surfaces appear to be determined adequately to support further analysis (involving optimization, for example)?
(c) Use the analytic method discussed in Section 9.3 to investigate the nature of the response surfaces fitted in part (a). According to the signs of the eigenvalues, what kinds of surfaces were fitted to y1 and y2, respectively?
(d) Make contour plots of the fitted y1 and y2 response surfaces from (a) on a single set of (x1, x2)-axes. Use these to help locate (at least approximately) a point (x1, x2) with maximum predicted y1, subject to a constraint that predicted y2 be no larger than 55.
(e) For the point identified in part (d), give 90% two-sided prediction intervals for the next values of y1 and y2 that would be produced by this refining process. Also give an approximate 95% lower tolerance bound for 90% of additional pyrite recoveries and an approximate 95% upper tolerance bound for 90% of additional kaolin recoveries at this combination of x1 and x2 settings.
6. Return to the concrete strength testing situation of Chapter Exercise 16 of Chapter 4.
(a) Find estimates of the parameters β0, β1, and σ in the simple linear regression model y = β0 + β1x + ε.
(b) Compute standardized residuals and plot them in the same ways that you were asked to plot the ordinary residuals in part (g) of the problem in Chapter 4. How much do the appearances of the new plots differ from the earlier ones?
(c) Make a 95% two-sided confidence interval for the increase in mean compressive strength that accompanies a 5 psi increase in splitting tensile strength. (Note: This is 5β1.)
(d) Make a 90% two-sided confidence interval for the mean strength of specimens with splitting tensile strength 300 psi (based on the simple linear regression model).
(e) Make a 90% two-sided prediction interval for the strength of an additional specimen with splitting tensile strength 300 psi (based on the simple linear regression model).
(f) Find an approximate 95% lower tolerance bound for the strengths of 90% of additional specimens with splitting tensile strength 300 psi (based on the simple linear regression model).
7. Wiltse, Blandin, and Schiesel experimented with a grain thresher built for an agricultural engineering design project. They ran efficiency tests on the cleaning chamber of the machine. This part of the machine sucks air through threshed material, drawing light (nonseed) material out an exhaust port, while the heavier seeds fall into a collection tray. Airflow is governed by the spacing of an air relief door. The following are the weights, y (in grams), of the portions of 14 gram samples of pure oat seeds run through the cleaning chamber that ended up in the collection tray. Four different door spacings x were used, and 20 trials were made at each door spacing.
(i) What does the hypothesis H0: β1 = 0 mean in the context of this study and the model being used in this exercise? Find the p-value associated with a two-sided t test of this hypothesis.
9. Return to the PETN density/detonation velocity data of Chapter Exercise 23 of Chapter 4.
(a) Find estimates of the parameters β0, β1, and σ in the simple linear regression model y = β0 + β1x + ε. How does your estimate of σ compare to sP? What does this comparison suggest about the reasonableness of the regression model for the data in hand?
(b) Compute standardized residuals and plot them in the same ways that you plotted the residuals in part (g) of Chapter Exercise 23 of Chapter 4. How much do the appearances of the new plots differ from the earlier ones?
(c) Make a 90% two-sided confidence interval for the increase in mean detonation velocity that accompanies a 1 g/cc increase in PETN density.
(d) Make a 90% two-sided confidence interval for the mean detonation velocity of charges with PETN density 0.65 g/cc.
(e) Make a 90% two-sided prediction interval for the next detonation velocity of a charge with PETN density 0.65 g/cc.
(f) Make an approximate 99% lower tolerance bound for the detonation velocities of 95% of charges having a PETN density of 0.65 g/cc.
10. Return to the thread stripping problem of Chapter Exercise 24 of Chapter 4.
(a) Find estimates of the parameters β0, β1, β2, and σ in the model y = β0 + β1x + β2x² + ε. How does your estimate of σ compare to sP? What does this comparison suggest about the reasonableness of the quadratic model for the data in hand? What is your estimate of σ supposed to be measuring?
(b) Use an F statistic and test the null hypothesis H0: β1 = β2 = 0 for the quadratic model. (You may take values off a printout to help you do this but show the whole five-step significance testing format.) What is the meaning of this hypothesis in the present context?
(c) Use a t statistic and test the hypothesis H0: β2 = 0 in the quadratic model. (Again, show the whole five-step significance testing format.) What is the meaning of this hypothesis in the present context?
(d) Give a 95% two-sided confidence interval for the mean torque at failure for a thread engagement of 40 (in the units of the problem) using the quadratic model.
(e) Give a 95% two-sided prediction interval for an additional torque at failure for a thread engagement of 40 using the quadratic model.
(f) Give an approximate 99% lower tolerance bound for 95% of torques at failure for studs having thread engagements of 40 using the quadratic model.
11. Return to the situation of Chapter Exercise 28 of Chapter 4 and the metal cutting experiment of Mielnick. Consider an analysis of the torque data based on the model y′ = β0 + β1x1′ + β2x2′ + ε.
(a) Make a 90% two-sided confidence interval for the coefficient β1.
(b) Make a 90% two-sided confidence interval for the mean log torque when a .318 in. drill and a feed rate of .005 in./rev are used.
(c) Make a 95% two-sided prediction interval for an additional log torque when a .318 in. drill and a feed rate of .005 in./rev are used. Exponentiate the endpoints of this interval to get a prediction interval for a raw torque under these conditions.
(d) Find a 95% two-sided confidence interval for the mean log torque for x1 = .300 in. and x2 = .010 in./rev.
12. Return to Chapter Exercise 25 of Chapter 4 and the tire grip force study.
(a) Find estimates of the parameters β0, β1, and σ in the simple linear regression model ln(y) = β0 + β1x + ε.
(b) Compute standardized residuals and plot them in the same ways you plotted the residuals in part (h) of Chapter Exercise 25 of
Chapter 4. How much do the appearances of the new plots differ from the earlier ones?
(c) Make a 90% two-sided confidence interval for the increase in mean log grip force that accompanies an increase in drag of 10% (e.g., from 30% drag to 40% drag). Note that this is 10β1.
(d) Make a 95% two-sided confidence interval for the mean log grip force of a tire of this type under 30% drag (based on the simple linear regression model).
(e) Make a 95% two-sided prediction interval for the raw grip force of another tire of this design under 30% drag. (Hint: Begin by making an interval for log grip force of such a tire.)
(f) Find an approximate 95% lower tolerance bound for the grip forces of 90% of tires of this design under 30% drag (based on the simple linear regression model for ln(y)).
13. Consider again the asphalt permeability data of Woelfl, Wei, Faulstich, and Litwack given in Chapter Exercise 26 of Chapter 4. Use the quadratic model y = β0 + β1x + β2x² + ε and do the following:
(a) Find an estimate of σ in the quadratic model. What is this supposed to measure? How does your estimate compare to sP here? What does this comparison suggest to you?
(b) Use an F statistic and test the null hypothesis H0: β1 = β2 = 0 for the quadratic model. (You may take values off a printout to help you do this, but show the whole five-step significance testing format.) What is the meaning of this hypothesis in the present context?
(c) Use a t statistic and test the null hypothesis H0: β2 = 0 in the quadratic model. Again, show the whole five-step significance testing format. What is the meaning of this hypothesis in the present context?
(d) Give a 90% two-sided confidence interval for the mean permeability of specimens of this type with a 6.5% asphalt content.
(e) Give a 90% two-sided prediction interval for the next permeability measured on a specimen of this type having a 6.5% asphalt content.
(f) Find an approximate 95% lower tolerance bound for the permeability of 90% of the specimens of this type having a 6.5% asphalt content.
14. Consider again the axial breaking strength data of Koh, Morden, and Ogbourne given in Chapter Exercise 27 of Chapter 4. At one point in that exercise, it is argued that perhaps the variable x3 = x1²/x2 is the principal determiner of axial breaking strength, y.
(a) Plot the 36 pairs (x3, y) corresponding to the data given in Chapter 4. Note that a constant σ assumption is probably not a good one over the whole range of x3's in the students' data.
In light of the point raised in part (a), for purposes of simple linear regression analysis, henceforth restrict attention to those 27 data pairs with x3 > .004.
(b) Find estimates of the parameters β0, β1, and σ in the simple linear regression model y = β0 + β1x3 + ε. How does your estimate of σ based on the simple linear regression model compare to sP? What does this comparison suggest about the reasonableness of the regression model for the data in hand?
(c) Make a 98% two-sided confidence interval for the mean axial breaking strength of .250 in. dowels 8 in. in length based on the regression analysis. How does this interval compare with the use of formula (6.20) and the four measurements on dowels of this type contained in the data set?
(d) Make a 98% two-sided prediction interval for the axial breaking strength of a single additional .250 in. dowel 8 in. in length. Do the same if the dowel is only 6 in. in length.
(e) Make an approximate 95% lower tolerance bound for the breaking strengths of 98% of .250 in. dowels 8 in. in length.
APPENDIX
A
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
More on Probability
and Model Fitting
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Mathematically, outcomes are points in a universal set that is the sample space.
And notions of simple set theory become relevant. For one thing, subsets of S
containing more than one outcome can be of interest.
Once one has defined events, the standard set-theoretic operations of complementation, union, and intersection can be applied to them. However, rather than using the typical superscript "c," "∪," and "∩" mathematical notation for these operations, it is common in probability theory to substitute the use of the words not, or, and and, respectively.
Example 1 (continued)

If a metal specimen is to be tested using this two-detector system, a potential sample space consists of four outcomes corresponding to the possible combinations of what can happen at each detector. That is, a possible sample space is specified in a kind of set notation as

S = {(FPI signal and ECI signal), (no FPI signal and ECI signal), (FPI signal and no ECI signal), (no FPI signal and no ECI signal)}    (A.1)
and in tabular and pictorial forms as in Table A.1 and Figure A.1. Notice that
Figure A.1 can be treated as a kind of Venn diagram—the big square standing for
S and the four smaller squares making up S standing for events that each consist
of one of the four different possible outcomes.
Using this four-outcome sample space to describe experience with a metal
specimen, one can define several events of potential interest and illustrate the use
of the notation described in Definition 3. That is, let
A = {(FPI signal and ECI signal), (FPI signal and no ECI signal)} (A.2)
B = {(FPI signal and ECI signal), (no FPI signal and ECI signal)} (A.3)
Table A.1
A List of the Possible Outcomes for Two Inspections

                               ECI signal: Yes                   ECI signal: No
FPI signal: Yes     (FPI signal and ECI signal)       (FPI signal and no ECI signal)
FPI signal: No      (no FPI signal and ECI signal)    (no FPI signal and no ECI signal)
Then in words, A is the event that the FPI detector signals and B is the event that the ECI detector signals.
Part 1 of Definition 3 means, for example, that using notations (A.1) and (A.2),

not A = {(no FPI signal and ECI signal), (no FPI signal and no ECI signal)} = the FPI detector doesn't signal

Part 2 of Definition 3 means, for example, that using notations (A.2) and (A.3),

A or B = {(FPI signal and ECI signal), (FPI signal and no ECI signal), (no FPI signal and ECI signal)} = at least one of the two detectors signals

And Part 3 of Definition 3 means that, again using (A.2) and (A.3), one has

A and B = {(FPI signal and ECI signal)} = both of the detectors signal

not A, A or B, and A and B are shown in Venn diagram fashion in Figure A.2.
Elementary set theory allows the possibility that a set can be empty—that is,
have no elements. Such a concept is also needed in probability theory.
Definition 4 The empty event is an event containing no outcomes. The symbol ∅ is typically
used to stand for the empty event.
∅ has the interpretation that none of the possible outcomes of a chance situation occur.
The way in which ∅ is most useful in probability is in describing the relationship
between two events that have no outcomes in common, and thus cannot both occur.
There is special terminology for this eventuality (that AandB = ∅).
Definition 5 If event A and event B have no outcomes in common (i.e., AandB = ∅), then
the two events are called disjoint or mutually exclusive.
Figure A.2 Venn diagrams for the events not A, A or B, and A and B
Example 1 (continued)

From Figure A.2 it is quite clear that, for example, the event A and the event not A are disjoint. And the event A and B and the event not(A or B), for example, are also mutually exclusive events.
The relationships (1), (2), and (3) are the axioms of probability theory.
A1 = {X = 2}
A2 = {X = 3}
A3 = {X = 4}
A4 = {X = 5}
It is only in very simple situations that one would ever try to make use of
Definition 6 by checking that an entire candidate set of probabilities satisfies the
axioms of probability. It is more common to assign probabilities (totaling to 1) to
individual outcomes and then simply declare that the third axiom of Definition 6
734 Appendix A More on Probability and Model Fitting
will be followed in making up any other probabilities. (This strategy guarantees that
subsequent probability assignments will be logically consistent.)
Event                    Probability
S                        P[S] = 1
{crack signaled}         P[crack signaled] = .3
{no crack signaled}      P[no crack signaled] = .7
∅                        P[∅] = 0
Example 1 (continued)

Returning to the situation of redundant inspection of metal parts using both fluoride penetrant and eddy current technologies, suppose that via extensive testing it is possible to verify that for cracks of depth .005 in., the following four values are sensible:

And further,

It is clear that to find the two values, one simply adds the numbers that appear in Figure A.3 in the regions that are shaded in Figure A.2 delimiting the events in question.
Figure A.3 Probabilities for the four possible outcomes (the "FPI signal: No" row shows .32 under "ECI signal: Yes" and .18 under "ECI signal: No")
P[not A] = 1 − P[A]
This fact is again one that was used freely in Chapter 5 without explicit reference.
For example, in the context of independent, identical success-failure trials, the
fact that the probability of at least one success (i.e., P[X ≥ 1] for a binomial
random variable X) is 1 minus the probability of 0 successes (i.e., 1 − P[X = 0] =
1 − f (0)) is really a consequence of Proposition 1.
Example 1 (continued)

Upon learning, via the addition of probabilities for individual outcomes given in displays (A.4) through (A.7), that the assignment P[A] = .5 is appropriate, one knows from Proposition 1 that the assignment

P[not A] = 1 − .5 = .5

is also appropriate. (Of course, if the point here weren't to illustrate the use of Proposition 1, this value could just as well have been gotten by adding .32 and .18.)
Note that when dealing with mutually exclusive events, the last term in equation (A.8)
is P[∅] = 0. Therefore, equation (A.8) simplifies to a two-event version of part (3)
of Definition 6. When the event A and the event B are not mutually exclusive,
the simple addition P[ A] + P[B] (so to speak) counts P[ AandB] twice, and the
subtraction in equation (A.8) corrects for this in the computing of P[AorB].
The practical usefulness of an equation like (A.8) is that when furnished with
any three of the four terms appearing in it, the fourth can be gotten by using simple
arithmetic.
P[at least one inspector detects the crack] = P[inspector 1 detects the crack] + P[inspector 2 detects the crack] − P[both inspectors detect the crack]

Thus,

so

Example 3 (continued)

Of course, the .40 value is only as good as the three others used to produce it. But it is at least logically consistent with the given probabilities, and if they have practical relevance, so does the .40 value.
A third simple theorem of probability concerns cases where the basic outcomes
in a sample space are judged to be equally likely.
Proposition 3
If the outcomes in a finite sample space S all have the same probability, then for any event A,

P[A] = (the number of outcomes in A)/(the number of outcomes in S)

S = {(G1, G2), (G1, G3), (G1, D), (G2, G1), (G2, G3), (G2, D), (G3, G1), (G3, G2), (G3, D), (D, G1), (D, G2), (D, G3)}

Figure: the 12 possible outcomes, organized by the first chip selected (G1, G2, G3, or D)
Then, noting that the 12 outcomes in this sample space are reasonably thought of as equally likely and that 6 of them do not have D listed either first or second, Proposition 3 suggests the assessment

P[two good chips] = 6/12 = .50
Definition 7
For event A and event B, provided event B has nonzero probability, the conditional probability of A given B is

P[A | B] = P[A and B]/P[B]    (A.9)

The ratio (A.9) ought to make reasonable intuitive sense. If, for example, P[A and B] = .3 and P[B] = .5, one might reason that "B occurs only 50% of the time, but of those times B occurs, A also occurs .3/.5 = 60% of the time. So .6 is a sensible assessment of the likelihood of A knowing that indeed B occurs."
Example 4 (continued)

Return to the situation of selecting two integrated circuit chips at random from four residing in a storeroom, one of which is defective. Consider using expression (A.9) and evaluating

P[the second chip selected is defective | the first chip selected is good]

P[the first chip selected is good] = 9/12 = .75

P[first chip selected is good and second is defective] = 3/12 = .25

So using Definition 7,

P[the second chip selected is defective | the first selected is good] = (3/12)/(9/12) = 3/9 = 1/3

Of the 9 equally likely outcomes in S for which the first chip selected is good, there are 3 for which the second chip selected is defective. If one thinks of the 9 outcomes for which the first chip selected is good as a kind of reduced sample space (brought about by the partial restriction that the first chip selected is good), then the 3/9 figure above is a perfectly plausible value for the likelihood that the second chip is defective. Indeed, one can argue directly that

P[the second chip selected is defective | the first selected is good] = 1/3

because if the first is good, when the second is to be selected, the storeroom will contain three chips, one of which is defective.
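The reduced-sample-space reasoning is easy to check by simulation. A sketch in Python (an illustration of ours, not part of the original example):

import random

# Four chips, one defective ("D"); draw two at random without replacement
trials, hits, firsts_good = 100000, 0, 0
for _ in range(trials):
    first, second = random.sample(["G1", "G2", "G3", "D"], 2)
    if first != "D":
        firsts_good += 1
        if second == "D":
            hits += 1

# Relative frequency estimate of P[second defective | first good]; about 1/3
print(hits / firsts_good)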
When one does have a natural value for P[A | B], the relationship between this
and the probabilities P[AandB] and P[B] can sometimes be exploited to evaluate
one or the other of them. This notion is important enough that the relationship (A.9)
is often rewritten by multiplying both sides by the quantity P[B] and calling the
result the multiplication rule of probability.
Combining these two values according to rule (A.10), one then sees that the
authors’ assessment of the failure probability for each primary O-ring was
Typically, the numerical values of P[ A | B] and P[A] are different. The dif-
ference can be thought of as reflecting the change in one’s assessed likelihood of
occurrence of A brought about by knowing that B’s occurrence is certain. In cases
where there is no difference, the terminology of independence is used.
P[A | B] = P[A]
Example 1 (continued)

Consider again the example of redundant fatigue crack inspection with probabilities given in Figure A.3. Since

                    ECI signal: Yes    ECI signal: No
FPI signal: Yes          .4                 .1
FPI signal: No           .4                 .1

P[ECI signal] = .4 + .4 = .8

and

P[ECI signal | FPI signal] = .4/(.4 + .1) = .8

the event {ECI signal} is independent of the event {FPI signal}.
The multiplication rule when A and B are independent

Independence is the mathematical formalization of the qualitative notion of unrelatedness. One way in which it is used in engineering applications is in conjunction with the multiplication rule. If one has values for P[A] and P[B] and judges the event A and the event B to be unrelated, then independence allows one to replace P[A | B] with P[A] in formula (A.10) and evaluate P[A and B] as P[A] · P[B]. (This fact was behind the scenes in Section 5.1 when sequences of independent identical success-failure trials and the binomial and geometric distributions were discussed.)
Example 5 (continued)

In their probabilistic risk assessment of the pre-Challenger space shuttle solid rocket motor field joints, Dalal, Fowlkes, and Hoadley arrived at the figure

P[failure] = .023

for a single field joint in a shuttle launch at 31°F. A shuttle's two solid rocket motors have a total of six such field joints, and it is perhaps plausible to think of their failures as independent events.
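The example breaks off here in this excerpt, but under the stated independence assumption, Proposition 1 together with the multiplication rule gives the probability that at least one of the six field joints fails as 1 − (1 − .023)⁶. A one-line check of that arithmetic in Python (our completion of the computation, not the authors' text):

p_joint = 0.023
p_at_least_one = 1 - (1 - p_joint) ** 6
print(round(p_at_least_one, 3))   # about 0.13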
Section 1 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. Return to the situation of Chapter Exercise 30 of Chapter 5, where measured diameters of a turned metal part were coded as Green, Yellow, or Red, depending upon how close they were to a mid-specification. Suppose that the probabilities that a given diameter falls into the various zones are .6247 for the Green Zone, .3023 for the Yellow Zone, and .0730 for the Red Zone. Suppose further (as in the problem in Chapter 5) that the lathe turning the parts is checked once per hour according to the following rules: One diameter is measured, and if it is in the Green Zone, no further action is needed that hour. If it is in the Red Zone, the process is immediately stopped. If it is in the Yellow Zone, a second diameter is measured. If the second diameter is in the Green Zone, no further action is necessary, but if it is not, the process is stopped immediately. Suppose further that the lathe is physically stable, so that it makes sense to think of successive color codes as independent.
(a) Show that the probability that the process is stopped in a given hour is .1865.
(b) Given that the process is stopped, what is the conditional probability that the first diameter was in the Yellow Zone?
2. A bin of nuts is mixed, containing 30% 1/2 in. nuts and 70% 9/16 in. nuts. A bin of bolts has 40% 1/2 in. bolts and 60% 9/16 in. bolts. Suppose that one bolt and one nut are selected (independently and at random) from the two bins.
(a) What is the probability that the nut and bolt match?
(b) What is the conditional probability that the nut is a 9/16 in. nut, given that the nut and bolt match?
3. A physics student is presented with six unmarked specimens of radioactive material. She knows that two are of substance A and four are of substance B. Further, she knows that when tested with a Geiger counter, substance A will produce an average of three counts per second, while substance B will produce an average of four counts per second. (Use Poisson models for the counts per time period.)
(a) Suppose the student selects a sample at random and makes a one-second check of radioactivity. If one count is observed, how should the student assess the (conditional) probability that the specimen is of substance A?
(b) Suppose the student selects a sample at random and makes a ten-second check of radioactivity.
744 Appendix A More on Probability and Model Fitting
If ten counts are observed, how should the stu- (e) Describe any two mutually exclusive events in
dent assess the (conditional) probability that this situation.
the specimen is of substance A? 6. A lot of machine parts is checked piece by piece
(c) Are your answers to (a) and (b) the same? How for Brinell hardness and diameter, with the result-
should this be understood? ing counts as shown in the accompanying table. A
4. At final inspection of certain integrated circuit single part is selected at random from this lot.
chips, 20% of the chips are in fact defective. An (a) What is the probability that it is more than
automatic testing device does the final inspection. 1.005 in. in diameter?
Its characteristics are such that 95% of good chips (b) What is the probability that it is more than
test as good. Also, 10% of the defective chips test 1.005 in. in diameter and has Brinell hardness
as good. of more than 210?
(a) What is the probability that the next chip is
good and tests as good? Diameter
(b) What is the probability that the next chip tests
as good? 1.000 to
(c) What is the (conditional) probability that the < 1.000 in. 1.005 in. > 1.005 in.
next chip that tests as good is in fact good?
< 190 154 98 48
5. In the process of producing piston rings, the rings Brinell 190–210 94 307 99
are subjected to a first grind. Those rings whose Hardness
> 210 33 72 95
thicknesses remain above an upper specification
are reground. The history of the grinding process
(c) What is the probability that it is more than
has been that on the first grind,
1.005 in. in diameter or has Brinell hardness of
more than 210?
60% of the rings meet specifications (and are
(d) What is the conditional probability that it has
done processing)
a diameter over 1.005 in., given that its Brinell
25% of the rings are above the upper specifi- hardness is over 210?
cation (and are reground) (e) Are the events {Brinell hardness over 210} and
15% of the rings are below the lower specifi- {diameter over 1.005 in.} independent? Ex-
cation (and are scrapped) plain.
(f) Name any two mutually exclusive events in this
The history has been that after the second grind, situation.
7. Widgets produced in a factory can be classified as
80% of the reground rings meet specifications defective, marginal, or good. At present, a machine
20% of the reground rings are below the lower is producing about 5% defective, 15% marginal,
specification and 80% good widgets. An engineer plans the fol-
lowing method of checking on the machine’s ad-
A ring enters the grinding process today. justment: Two widgets will be sampled initially,
(a) Evaluate P[the ring is ground only once]. and if either is defective, the machine will be im-
(b) Evaluate P[the ring meets specifications]. mediately adjusted. If both are good, testing will
(c) Evaluate P[the ring is ground only once | the cease without adjustment. If neither of these first
ring meets specifications]. two possibilities occurs, an additional three wid-
(d) Are the events {the ring is ground only once} gets will be sampled. If all three of these are good,
and {the ring meets specifications} indepen- or two are good and one is marginal, testing will
dent events? Explain.
A.1 More Elementary Probability 745
cease without machine adjustment. Otherwise, the Suppose first that the single pair {00} is transmitted.
machine will be adjusted. (a) Find the probability that the pair is correctly
(a) Evaluate P[only two widgets are sampled and received.
no adjustment is made]. (b) Find the probability that what is received has
(b) Evaluate P[only two widgets are sampled]. obviously been corrupted.
(c) Evaluate P[no adjustment is made]. (c) Find the conditional probability that the pair
(d) Evaluate P[no adjustment is made | only two is correctly received given that it is not obvi-
widgets are sampled]. ously corrupted.
(e) Are the events {only two widgets are sam- Suppose now that the “doubled string” {00 00 11 11} is
pled} and {no adjustment is made} indepen- transmitted and that the string received is not obviously
dent events? Explain. corrupted.
(f) Describe any two mutually exclusive events in (d) What is then a reasonable assignment of the
this situation. “chance” that the correct message string
8. Glass vials of a certain type are conforming, blem- (namely {0 0 1 1}) is received? (Hint: Use
ished (but usable), or defective. Two large lots of your answer to part c).)
these vials have the following compositions. 10. Figure A.6 is a Venn diagram with some proba-
bilities of events marked on it. In addition to the
Lot 1: 70% conforming, 20% blemished, and values marked on the diagram, it is the case that,
10% defective P[B] = .4 and P[C | A] = .8.
Lot 2: 80% conforming, 10% blemished, and
10% defective
A B
Lot 1 is three times the size of Lot 2 and these two .1
lots have been mixed in a storeroom. Suppose that
a vial from the storeroom is selected at random to .1
use in a chemical analysis. .3 .2
(a) What is the probability that the vial is from Lot
1 and not defective? .1
C
(b) What is the probability that the vial is blem-
ished?
(c) What is the conditional probability that the vial Figure A.6 Figure for Exercise 10
is from Lot 1 given that it is blemished?
9. A digital communications system transmits infor- (a) Finish filling in the probabilities on the di-
mation encoded as strings of 0’s and 1’s. As a means agram. That is, evaluate the three probabili-
of reducing transmission errors, each digit in a mes- ties P[ AandB and notC], P[Aand notB and
sage string is repeated twice. Hence the message notC] and P[not( AorBorC)] = P[not A and
string {0 1 1 0} would (ideally) be transmitted as notB and notC].
{00 11 11 00} and if digits received in a given pair (b) Use the probabilities on the diagram (and your
don’t match, one can be sure that the pair has been answers to (a)) and evaluate P[AandB].
corrupted in transmission. When each individual (c) Use the probabilities on the diagram and eval-
digit in a “doubled string” like {00 11 11 00} is uate P[B | C].
transmitted, there is a probability p of transmis- (d) Based on the information provided here, are
sion error. Further, whether or not a particular digit the events B, C independent events? Explain.
is correctly transferred is independent of whether
any other one is correctly transferred.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

[Figure: a series system of components C1, C2, and C3 connected between terminals 1 and 2]
where the last step depends on the independence assumption. And in general, if the
reliability of component C_i (i.e., P[C_i functions]) is r_i, then assuming that the k
components in a series system behave independently, the (series) system reliability
(say, R_S) becomes

Series system reliability for independent components

    R_S = r_1 · r_2 · r_3 · · · · · r_k    (A.11)
Example 6 Space Shuttle Solid Rocket Motor Field Joints as a Series System
(Example 5 revisited )
The probabilistic risk assessment of Dalal, Fowlkes, and Hoadley put the relia-
bility (at 31◦ F) of pre-Challenger solid rocket motor field joints at .977 apiece.
Since the proper functioning of six such joints is necessary for the safe operation
of the solid rocket motors, assuming independence of the joints, the reliability of
the system of joints is then
RS = (.977)(.977)(.977)(.977)(.977)(.977) = .87
as in Example 5. (The .87 figure might well be considered optimistic with regard
to the entire solid rocket motor system, as it doesn’t take into account any potential
problems other than those involving field joints.)
Since typically each ri is less than 1.0, formula (A.11) shows (as intuitively it
should) that system reliability decreases as components are added to a series system.
And system reliability is no better (larger) than the worst (smallest) component
reliability.
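Formula (A.11) is a one-line computation in software. The following minimal Python sketch (the function name is ours, chosen only for this illustration) reproduces the Example 6 figure.

    import math

    def series_reliability(r):
        """Reliability of a series system of independent components
        with individual reliabilities r1, ..., rk (formula (A.11))."""
        return math.prod(r)

    # Example 6: six field joints, each with reliability .977 at 31 degrees F
    print(round(series_reliability([.977] * 6), 2))  # 0.87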
[Figure: a parallel system of components C1, C2, and C3 connected between terminals 1 and 2]
The fact that made it easy to develop formula (A.11) for the reliability of a
series system is that for a series system to function, all components must function.
The corresponding fact for a parallel system is that for a parallel system to fail, all
components must fail. So if it is sensible to model the functioning of the individual
components in a parallel system as independent, if r_i is the reliability of component
i, and if R_P is the (parallel) system reliability,

    R_P = 1 − (1 − r_1)(1 − r_2) · · · (1 − r_k)    (A.12)

In the special case where each of the k components has the same individual reliability r, this reduces to R_P = 1 − (1 − r)^k, and this can be solved for an approximate number of components required, giving

Approximate number of components with individual reliability r needed to produce parallel system reliability R_P

    k ≈ ln(1 − R_P) / ln(1 − r)    (A.13)
Using (for the sake of example) the values r = .80 and RP = .98, expression (A.13)
gives k ≈ 2.4, so rounding up to an integer, 3 components of individual 80% relia-
bility will be required to give a parallel system reliability of at least 98%.
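The same small computation, done by machine, looks as follows (a minimal Python sketch; the function names are ours).

    import math

    def parallel_reliability(r, k):
        """Reliability of k independent components of reliability r in parallel."""
        return 1 - (1 - r) ** k

    def components_needed(r, R_P):
        """Approximate k from formula (A.13), rounded up to an integer."""
        return math.ceil(math.log(1 - R_P) / math.log(1 - r))

    k = components_needed(.80, .98)
    print(k, parallel_reliability(.80, k))  # 3 components give reliability .992 >= .98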
Example 8 (continued)

[Figure: the communications system of Example 8, showing components CA and CB and a parallel subsystem of components C1, C2, and C3 between terminals 1 and 2, with a switching subsystem indicated]

It is clear that the weak link in this communications system is at site A, rather
than at B or at the communications hub.
Section 2 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. A series system is to consist of k = 5 independent components with comparable individual reliabilities. How reliable must each be if the system reliability is to be at least .999? Suppose that it is your job to guarantee components have this kind of individual reliability. Do you see any difficulty in empirically demonstrating this level of component performance? Explain.

2. A parallel system is to consist of k identical independent components. Design requirements are that system reliability be at least .99. Individual component reliability is thought to be at least .90. How large must k be?

3. A combination series-parallel system is to consist of k = 3 parallel subsystems, themselves in series. Engineering design requirements are that the entire system have overall reliability at least .99. Two kinds of components are available. Type A components cost $8 apiece and have reliability .98. Type B components cost $5 apiece and have reliability .90.
(a) If only type A components are used, what will be the minimum system cost? If only type B components are used, what will be the minimum system cost?
(b) Find a system design meeting engineering requirements that uses some components of each type and is cheaper than the best option in part (a).
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
A.3 Counting
Proposition 3 and Example 4 illustrate that using a model for a chance situation
that consists of a finite sample space S with outcomes judged to be equally likely,
the computation of probabilities for events of interest is conceptually a very simple
matter. The number of outcomes in the event is simply counted up and divided by
the total number of outcomes in the whole sample space. However, in most realistic
applications of this simple idea, the process of writing down all outcomes in S and
doing the counting involved would be most tedious indeed, and often completely
impractical. Fortunately, there are some simple principles of counting that can often
be applied to shortcut the process, allowing outcomes to be counted mentally. The
purpose of this section is to present those counting techniques.
This section presents a multiplication principle of counting, the notion of per-
mutations and how to count them, and the idea of combinations and how to count
them, along with a few examples. This material is on the very fringe of what is
appropriate for inclusion in this book. It is not statistics, nor even really probability,
but rather a piece of discrete mathematics that has some engineering implications.
It is included here for two reasons. First is the matter of tradition. Counting has tra-
ditionally been part of most elementary expositions of probability, because games
of chance (cards, coins, and dice) are often assumed to be fair and thus describable
in terms of sample spaces with equally likely outcomes. And for better or worse,
games of chance have been a principal source of examples in elementary probability.
A second and perhaps more appealing reason for including the material is that it
does have engineering applications (regardless of whether they are central to the
particular mission of this text). Ultimately, the reader should take this short section
for what it is: a digression from the book’s main story that can on occasion be quite
helpful in engineering problems.
    n = n_1 · n_2 · · · · · n_r
In graphical terms, this proposition is just a statement that a tree diagram that
has n_1 first-level nodes, each of which leads to n_2 second-level nodes, and so on,
must end up having a total of n_1 · n_2 · · · · · n_r rth-level nodes.
    n_1 · n_2 · n_3 = 3 · 4 · 2 = 24
Proposition 5 shows that the number of different ways in which this placement can
be accomplished is

    100 · 99 · 98 · 97 = 94,109,400

since at each stage of sequentially placing objects into positions, there is one less
object available for placement. The special terminology and notation for this are
next.
In the notation of Definition 11, one has (from expression (A.14)) that

    P_{n,r} = n · (n − 1) · (n − 2) · · · · · (n − r + 1)

that is,

Formula for the number of permutations of n things r at a time

    P_{n,r} = n! / (n − r)!    (A.15)
Example 10 (continued)
In the special permutation notation, the number of different ways of installing
the four pistons is

    P_{100,4} = 100! / 96!
Example 11 (continued)
front face of the hub (and one therefore thinks of the blade positions as completely
distinguishable), there are

    P_{12,12} = 12 · 11 · 10 · · · · · 2 · 1

    P_{11,11} = 11 · 10 · 9 · · · · · 2 · 1
combinations possible will be symbolized as \binom{n}{r}, read "the number of combinations of n things r at a time."
There is in Definition 12 a slight conflict in terminology with other usage in this text.
The “combination” in Definition 12 is not the same as the “treatment combination”
terminology used in connection with multifactor statistical studies to describe a
set of conditions under which a sample is taken. (The “treatment combination”
terminology has been used in this very section in Example 9.) But this conflict
rarely causes problems, since the intended meaning of the word combination is
essentially always clear from context.
Appropriate use of Proposition 5 and formula (A.15) makes it possible to
develop a formula for \binom{n}{r} as follows. A permutation of r out of n distinguishable
objects can be created through a two-step process. First a combination of r out of
the n objects is selected and then those selected objects are placed in an order. This
thinking suggests that P_{n,r} can be written as

    P_{n,r} = \binom{n}{r} · P_{r,r}

that is,

Formula for the number of combinations of n things r at a time

    \binom{n}{r} = n! / (r! (n − r)!)    (A.16)
The ratio in equation (A.16) ought to look familiar to readers who have studied
Section 5.1. The multiplier of p^x (1 − p)^{n−x} in the binomial probability function is
of the form \binom{n}{x}, counting up the number of ways of placing x successes in a series
of n trials.
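Modern languages compute these counts directly. A minimal Python sketch (using the standard library functions math.perm and math.comb, which implement (A.15) and (A.16)) verifies the piston count of Example 10 and the two-step identity above.

    import math

    # Permutations: P(n, r) = n!/(n - r)!  (formula (A.15))
    print(math.perm(100, 4))   # 94109400, the piston-installation count of Example 10

    # Combinations: C(n, r) = n!/(r!(n - r)!)  (formula (A.16))
    print(math.comb(5, 2))     # 10

    # The identity P(n, r) = C(n, r) * P(r, r) used to derive (A.16)
    print(math.perm(5, 2) == math.comb(5, 2) * math.perm(2, 2))  # True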
Example 12 (continued)
Then notice that directly from expression (A.16), there are in fact

    \binom{3000}{100} = 3000! / (100! 2900!)

different possible samples of 100 of the 3,000 connectors, so that (with 2,985 of
the connectors initially judged to be defect-free) the ratio

    \binom{2985}{100} / \binom{3000}{100}

would then be a sensible figure to use for the probability that the sample is
composed entirely of connectors initially judged to be defect-free.
It is instructive to take this example one step further and combine the use
of Definition 12 and Proposition 5. So consider the problem of counting up the
number of different samples containing 96 connectors initially judged defect-free,
1 judged to have only minor defects, and 3 judged to have moderately serious,
serious, or very serious defects. To solve this problem, the creation of such a
sample can be considered as a three-step process. In the first, 96 nominally defect-
free connectors are chosen from 2,985. In the second, 1 connector nominally
having minor defects only is chosen from 1. And finally, 3 connectors are chosen
from the remaining 14. There are thus

    \binom{2985}{96} · \binom{1}{1} · \binom{14}{3}

samples of this type.
that 3 of the seemingly identical devices are defective. Consider the probability
that each lab receives 1 defective device, if the assignment of devices to labs is
done at random.
The total number of possible assignments of devices to labs can be computed
by thinking first of choosing 5 of 15 to send to Lab A, then 5 of the remaining 10
to send to Lab B, then sending the remaining 5 to Lab C. There are thus

    \binom{15}{5} · \binom{10}{5} · \binom{5}{5}

possible assignments.
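The remainder of this computation is easy to carry out numerically. The Python sketch below (our own illustration, recomputing the favorable count from scratch under the setup just described: 15 devices, 3 of them defective, split 5 apiece among Labs A, B, and C) finishes the probability that each lab receives exactly 1 defective device.

    from math import comb

    # Total ways to split 15 devices into labeled groups of 5, 5, and 5
    total = comb(15, 5) * comb(10, 5) * comb(5, 5)

    # Favorable: each lab gets exactly 1 of the 3 defectives and 4 of the 12 good ones
    favorable = (comb(3, 1) * comb(12, 4)) \
              * (comb(2, 1) * comb(8, 4)) \
              * (comb(1, 1) * comb(4, 4))

    print(total, favorable, favorable / total)  # 756756, 207900, about .27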
Section 3 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. A lot of 100 machine parts contains 10 with diameters that are out of specifications on the low side, 20 with diameters that are out of specifications on the high side, and 70 that are in specifications.
(a) How many different possible samples of n = 10 of these parts are there?
(b) How many different possible samples of size n = 10 are there that each contain exactly 1 part with diameter out of specifications on the low side, 2 parts with diameters out of specifications on the high side, and 7 parts with diameters that are in specifications?
(c) Based on your answers to (a) and (b), what is the probability that a simple random sample of n = 10 of these contains exactly 1 part with diameter out of specifications on the low side, 2 parts with diameters out of specifications on the high side, and 7 parts with diameters that are in specifications?

2. The lengths of bolts produced in a factory are checked with two "go–no go" gauges and the bolts sorted into piles of short, OK, and long bolts. Suppose that of the bolts produced, about 20% are short, 30% are long, and 50% are OK.
(a) Find the probability that among the next ten bolts checked, the first three are too short, the next three are OK, and the last four are too long.
(b) Find the probability that among the next ten bolts checked, there are three that are too short, three that are OK, and four that are too long. (Hint: In how many ways is it possible to choose three of the group to be short, three to be OK, and four to be long? Then use your answer to (a).)

3. User names on a computer system consist of three letters A through Z, followed by two digits 0 through 9. (Letters and digits may appear more than once in a name.)
(a) How many user names of this type are there?
(b) Suppose that Joe has user name TPK66, but unfortunately he's forgotten it. Joe remembers only the format of the user names and that the letters K, P, and T appear in his name. If he picks a name at random from those consistent with his memory, what's the probability that he selects his own?
(c) If Joe in part (b) also remembers that his digits match, what's the probability that he selects his own user name?

4. A lot contains ten pH meters, three of which are miscalibrated. A technician selects these meters one at a time, at random without replacement, and checks their calibration.
(a) What is the probability that among the first four meters selected, exactly one is miscalibrated?
(b) What is the probability that the technician discovers his second miscalibrated meter when checking his fifth one?

5. A student decides to use the random digit function on her calculator to select a three-digit PIN number for use with her new ATM card. (Assume that all numbers 000 through 999 are then equally likely to be chosen.)
(a) What is the probability that her number uses only odd digits?
(b) What is the probability that all three digits in her number are different?
(c) What is the probability that her number uses three different digits and lists them in either ascending or descending order?

6. When ready to configure a PC order, a consumer must choose a Processor Chip, a MotherBoard, a Drive Controller, and a Hard Drive. The choices are:

    Processor Chip          MotherBoard   Drive Controller   Hard Drive
    Fast New Generation     Premium       Premium            Premium
    Slow New Generation     Standard      Standard           Standard
    Fast Old Generation     Economy       Economy
    Slow Old Generation

(a) Suppose initially that all components are compatible with all components. How many different configurations are possible?
Suppose henceforth that:
(i) a Premium MotherBoard is needed to run a New Generation Processor,
(ii) a Premium MotherBoard is needed to run a Premium Drive Controller, and
(iii) a Premium Drive Controller is needed to run a Premium Hard Drive.
(b) How many permissible configurations are there with a Standard MotherBoard?
(c) How many permissible configurations are there total? Explain carefully.
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

A.4 Probabilistic Concepts Useful in Survival Analysis
Definition 13 The survivorship function for a nonnegative random variable T is the function

    S(t) = P[T > t] = 1 − F(t)

The force of mortality for T (with probability density f(t)) is the function

    h(t) = f(t) / S(t)
h(t) is sometimes called the hazard function for T, but such usage tends to perpetu-
ate unfortunate confusion with the entirely different concept of “hazard rate” for re-
pairable systems. (The important difference between the two concepts is admirably
explained in the paper “On the Foundations of Reliability” by W. A. Thompson
(Technometrics, 1981) and in the book Repairable Systems Reliability by Ascher
and Feingold.) This book will thus stick to the term force of mortality.
The force-of-mortality function can be thought of heuristically as specifying, for
small Δt, the conditional probability of failure in the next interval of length Δt given
survival to time t:

    h(t) Δt ≈ P[t < T ≤ t + Δt | T > t]
Example 14 (continued)
The force-of-mortality function for the diesel engine fan example is, for t > 0,

    h(t) = f(t) / S(t) = ( (1/27,800) e^{−t/27,800} ) / e^{−t/27,800} = 1/27,800

The exponential (mean α = 27,800) model for fan life implies a constant 1/27,800
force of mortality.
Constant force of mortality is equivalent to exponential distribution

The property of the fan-life model shown in the previous example is characteristic
of exponential distributions. That is, a distribution has constant force of mortality
exactly when that distribution is exponential. So having a constant force of mortality
is equivalent to possessing the memoryless property of the exponential distributions
discussed in Section 5.2. If the lifetime of an engineering component is described
using a constant force of mortality, there is no (mathematical) reason to replace such
a component before it fails. The distribution of its remaining life from any point in
time is the same as the distribution of the time till failure of a new component of the
same type.
Potential probability models for lifetime random variables are often classified
according to the nature of their force-of-mortality functions, and these classifi-
cations are taken into account when selecting models for reliability engineering
applications. If h(t) is increasing in t, the corresponding distribution is called
an increasing force-of-mortality (IFM) distribution, and if h(t) is decreasing
in t, the corresponding distribution is called a decreasing force-of-mortality
(DFM) distribution. The reliability engineering implications of an IFM distri-
bution being appropriate for modeling the lifetimes of a particular type of com-
ponent are often that (as a form of preventative maintenance) such components
are retired from service once they reach a particular age, even if they have not
failed.
For the Weibull distributions of Section 5.2 (with parameters α and β),

    h(t) = f(t) / S(t) = f(t) / (1 − F(t)) = ( (β/α^β) t^{β−1} e^{−(t/α)^β} ) / e^{−(t/α)^β} = β t^{β−1} / α^β
For β = 1 (the exponential distribution case) this is constant. For β < 1, this is
decreasing in t, and the Weibull distributions with β < 1 are DFM distributions.
For β > 1, this is increasing in t, and the Weibull distributions with β > 1 are
IFM distributions.
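A few numerical evaluations make the three regimes concrete. The Python sketch below (our own illustration of the Weibull h(t) just derived, with α set to 1.0 for simplicity) prints h(t) at a few times for β = .5, 1, and 2.

    def weibull_hazard(t, alpha, beta):
        """Weibull force of mortality: h(t) = beta * t**(beta - 1) / alpha**beta."""
        return beta * t ** (beta - 1) / alpha ** beta

    for beta in (0.5, 1.0, 2.0):
        print(beta, [round(weibull_hazard(t, 1.0, beta), 3) for t in (0.5, 1.0, 2.0)])
    # beta < 1: h decreasing (DFM); beta = 1: h constant; beta > 1: h increasing (IFM)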
[Figure A.11: plots of f(t) and h(t) for the uniform distribution on (0, 1)]
Figure A.11 shows plots of both f(t) and h(t) for the uniform model. (Here
f(t) = 1 and S(t) = 1 − t for 0 < t < 1, so that h(t) = 1/(1 − t).) h(t) is
clearly increasing for 0 < t < 1 (quite drastically so, in fact, as one approaches
t = 1). And well it should be. Knowing that (according to the uniform model) life
will certainly end by t = 1, nervousness about impending death should skyrocket
as one nears t = 1.
[Figure A.12: a bathtub-shaped force-of-mortality function h(t)]
The shape in Figure A.12 is often referred to as the bathtub curve shape. It
includes an early region of decreasing force of mortality, a long central period of
relatively constant force of mortality, and a late period of rapidly increasing force
of mortality. Devices with lifetimes describable as in Figure A.12 are sometimes
subjected to a burn-in period to eliminate the devices that will fail in the early
period of decreasing force of mortality, and then sold with the recommendation
that they be replaced before the onset of the late period of increasing force of
mortality or wear-out. Although this story is intuitively appealing, the most tractable
models for life length do not, in fact, have force-of-mortality functions with shapes
like that in Figure A.12. For a further discussion of this matter and references to
papers presenting models with bathtub-shaped force-of-mortality functions, refer to
Chapter 2 of Nelson’s Applied Life Data Analysis.
The functions f (t), F(t), S(t), and h(t) all carry the same information about
a life distribution. They simply express it in different terms. Given one of them,
the derivation of the others is (at least in theory) straightforward. Some of the
relationships that exist among the four different characterizations are collected here
for the reader’s convenience. For t > 0,
Relationships between F(t), f(t), S(t), and h(t)

    F(t) = ∫₀ᵗ f(x) dx

    f(t) = (d/dt) F(t)

    S(t) = 1 − F(t)

    h(t) = f(t) / S(t)

    S(t) = exp( − ∫₀ᵗ h(x) dx )

    f(t) = h(t) exp( − ∫₀ᵗ h(x) dx )
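These relationships are easy to verify numerically for any particular model. A short Python sketch (using scipy for the integration; the names are ours) checks S(t) = exp(−∫₀ᵗ h(x) dx) for the constant-hazard fan-life model of Example 14.

    import math
    from scipy.integrate import quad

    alpha = 27_800.0                        # mean of the exponential fan-life model
    h = lambda x: 1 / alpha                 # constant force of mortality
    S = lambda t: math.exp(-quad(h, 0, t)[0])

    t = 10_000.0
    print(S(t), math.exp(-t / alpha))       # both about 0.70: S(t) = exp(-integral of h)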
Section 4 Exercises ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1. An engineer begins a series of presentations to his corporate management with a working bulb in his slide projector and (an inferior-quality) Brand W replacement bulb in his briefcase. Suppose that the random variables

    X = the number of hours of service given by the bulb in the projector
    Y = the number of hours of service given by the spare bulb

may be modeled as independent exponential random variables with respective means 15 and 5. The number of hours that the engineer may operate without disaster is X + Y.
(a) Find the mean and standard deviation of X + Y using Proposition 1 in Chapter 5.
(b) Find, for t > 0, P[X + Y ≤ t].
(c) Use your answer to (b) and find the probability density for T = X + Y.
(d) Find the survivorship and force-of-mortality functions for T. What is the nature of the force-of-mortality function? Is it constant like those of X and Y?

2. A common modeling device in reliability applications is to assume that the (natural) logarithm of a lifetime variable, T, has a normal distribution. That is, one might suppose that for some parameters µ and σ, if t > 0

    F(t) = P[T ≤ t] = Φ( (ln t − µ) / σ )

Consider the µ = 0 and σ = 1 version of this.
(a) Plot F(t) versus t.
(b) Plot S(t) versus t.
(c) Compute and plot f(t) versus t.
(d) Compute and plot h(t) versus t.
(e) Is this distribution for T an IFM distribution, a DFM distribution, or neither? What implication does your answer have for in-service replacement of devices possessing this lifetime distribution?
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

A.5 Maximum Likelihood Fitting of Probability Models and Related Inference Methods
A discrete data likelihood function

    f_Θ(y)    (A.17)

A discrete data log likelihood function

    L(Θ) = ln f_Θ(y)    (A.18)
might well be modeled using a binomial distribution with n = 100 and p some
unknown parameter. The corresponding probability function is thus

    f(x) = { (100! / (x! (100 − x)!)) p^x (1 − p)^{100−x}   for x = 0, 1, . . . , 100
           { 0                                              otherwise
    p̂ = 66/100 = .66

That is, p = .66 makes the chance of observing the particular data in hand
(X = 66) as large as possible.
[Figure: plot of the binomial log likelihood L(p) for .4 ≤ p ≤ .8, with its maximum at p = .66]
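A brute-force numerical maximization gives the same answer as the calculus. The Python sketch below (our own illustration; it evaluates the binomial log likelihood over a fine grid of p values) recovers p̂ = .66.

    import math

    x, n = 66, 100

    def log_lik(p):
        # log of C(100, x) p^x (1 - p)^(100 - x), using lgamma for the constant
        return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                + x * math.log(p) + (n - x) * math.log(1 - p))

    grid = [i / 1000 for i in range(1, 1000)]
    print(max(grid, key=log_lik))  # 0.66, matching the closed-form x/n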
    L(λ) = −λ Σ_{i=1}^{n} k_i + Σ_{i=1}^{n} x_i ln(k_i) + Σ_{i=1}^{n} x_i ln(λ) − Σ_{i=1}^{n} ln(x_i!)    (A.20)
Table A.2
Pre-Challenger Field Joint Primary O-Ring Failure Data

Flight Date    x, Number of Field Joint Primary O-Ring Incidents    t, Temperature at Launch (°F)
4/12/81 0 66
11/12/81 1 70
3/22/82 0 69
11/11/82 0 68
4/4/83 0 67
6/18/83 0 72
8/30/83 0 73
11/28/83 0 70
2/3/84 1 57
4/6/84 1 63
8/30/84 1 70
10/5/84 0 78
11/8/84 0 67
1/24/85 2 53
4/12/85 0 67
4/29/85 0 75
6/17/85 0 70
7/29/85 0 81
8/27/85 0 76
10/3/85 0 79
10/30/85 2 75
11/26/85 0 76
1/12/86 1 58
    y_i = { 1 if x_i ≥ 1
          { 0 if x_i = 0
that indicate which flights experienced primary O-ring incidents. (They also
considered a likelihood approach based on the counts xi themselves. But here
only the slightly simpler analysis based on the yi ’s will be discussed.) The
authors modeled Y1 , Y2 , . . . , Y23 as a priori independent variables and treated the
probability of at least one O-ring incident on flight i,

    p_i = P[Y_i = 1] = P[X_i ≥ 1]

as depending on the launch temperature t through the relationship

    ln( p / (1 − p) ) = α + βt    (A.21)
for α and β some unknown parameters. Equation (A.21) can be solved for p to
produce the function of t
    p(t) = 1 / (1 + e^{−(α+βt)})    (A.22)
From either equation (A.21) or (A.22), it is possible to see that if β > 0, the
probability of at least one O-ring incident is increasing in t (low-temperature
launches are best). On the other hand, if β < 0, p is decreasing in t (high-
temperature launches are best).
The joint probability function for Y_1, Y_2, . . . , Y_23 employed by Dalal,
Fowlkes, and Hoadley was then

    f(y_1, y_2, . . . , y_23) = { Π_{i=1}^{23} p(t_i)^{y_i} (1 − p(t_i))^{1−y_i}   for each y_i = 0 or 1
                               { 0                                                otherwise
Example 19 (continued)
The log likelihood function is then (using equations (A.21) and (A.22))

    L(α, β) = Σ_{i=1}^{23} y_i ln( p(t_i) / (1 − p(t_i)) ) + Σ_{i=1}^{23} ln( 1 − p(t_i) )

            = Σ_{i=1}^{23} y_i (α + βt_i) + Σ_{i=1}^{23} ln( e^{−(α+βt_i)} / (1 + e^{−(α+βt_i)}) )

            = 7α + β(70 + 57 + 63 + 70 + 53 + 75 + 58)
              + ln( e^{−(α+66β)} / (1 + e^{−(α+66β)}) ) + ln( e^{−(α+70β)} / (1 + e^{−(α+70β)}) )
              + · · · + ln( e^{−(α+58β)} / (1 + e^{−(α+58β)}) )    (A.23)
where the sum abbreviated in equation (A.23) is over all 23 ti ’s. Figure A.14 is a
contour plot of L(α, β) given in equation (A.23).
It is interesting (and sadly, of great engineering importance) that the region
of (α, β) pairs making the data of Table A.2 most likely is in the β < 0 part of
the (α, β)-plane—that is, where p(t) is decreasing in t (i.e., increases as t falls).
(Remember that the tragic Challenger launch was made at t = 31◦ .)
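The maximization of (A.23) is easily done by machine. The Python sketch below (our own illustration, using scipy's general-purpose optimizer rather than whatever software the original analysis used) encodes the y_i's and t_i's of Table A.2 and maximizes the log likelihood; the search should land near the maximizing (α, β) pair reported later in this example.

    import numpy as np
    from scipy.optimize import minimize

    # From Table A.2: launch temperature t_i, and y_i = 1 exactly when x_i >= 1
    t = np.array([66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
                  67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
    y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
                  0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1], dtype=float)

    def neg_log_lik(theta):
        a, b = theta
        eta = a + b * t
        # (A.23): sum of y_i * eta_i plus ln(1 - p(t_i)) = -ln(1 + exp(eta_i))
        return -(np.sum(y * eta) - np.sum(np.log1p(np.exp(eta))))

    fit = minimize(neg_log_lik, x0=np.array([10.0, -0.2]), method="Nelder-Mead")
    print(fit.x)  # approximately (15.04, -0.232)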
variables. For example, consider a life test of some electrical components, where
a technician begins a test by connecting 50 devices to a power source, goes away,
and then returns every ten hours to note which devices are still functioning. The
details of data collection produce only discrete data (which ten-hour period produces
failure) from the intrinsically continuous life lengths of the 50 devices. The next
example shows how the likelihood idea might be used in another situation where
the underlying phenomenon is continuous.
Table A.3
Measurements of a Critical Dimension on Five Metal Parts Produced on a CNC Lathe
Example 20 (continued)

[Figure A.15: contour plot of L(µ, σ), with the point of maximum marked]

    f(y_1, y_2, . . . , y_5) = Π_{i=1}^{5} [ Φ( (y_i + .5 − µ)/σ ) − Φ( (y_i − .5 − µ)/σ ) ]
Example 18 (continued)
Differentiating the log likelihood (A.20) with respect to λ, one obtains

    (d/dλ) L(λ) = − Σ_{i=1}^{n} k_i + (1/λ) Σ_{i=1}^{n} x_i

Setting this derivative equal to 0 and solving gives

    λ̂ = ( Σ_{i=1}^{n} x_i ) / ( Σ_{i=1}^{n} k_i )

which is the total number of defects observed divided by the total number of units
inspected. Since the second derivative of L(λ) is easily seen to be negative for all
λ, λ̂ is the unique maximizer of L(λ)—that is, the maximum likelihood estimate
of λ.
Example 19 (continued)
Careful examination of contour plots like Figure A.14, or use of a numerical
search method for the (α, β) pair maximizing L(α, β), produces maximum like-
lihood estimates

    α̂ = 15.043
    β̂ = −.2322
based on the pre-Challenger data. Figure A.16 is a plot of p(t) given in display
(A.22) for these values of α and β. Notice the disconcerting fact that the cor-
responding estimate of p(31) (the probability of at least one O-ring failure in a
31◦ launch) exceeds .99. (t = 31 is clearly a huge extrapolation away from any t
values in Table A.2, but even so, this kind of analysis conducted before the Chal-
lenger launch could well have helped cast legitimate doubt on the advisability of
a low-temperature launch.)
Example 19 (continued)

[Figure A.16: plot of the maximum likelihood estimate of p(t) from display (A.22), for t between 30 and 80 °F]
Example 20 (continued)
Examination of the contour plot in Figure A.15 shows maximum likelihood
estimates of µ and σ based on the rounded normal data model and the data in
Table A.3 to be approximately

    µ̂ = 3.0
    σ̂ = .55

It is worth noting that for these data, s = .71, which is noticeably larger than
σ̂. This illustrates a well-established piece of statistical folklore: ignoring the
rounding of intrinsically continuous data typically inflates the apparent spread of
the underlying distribution.
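The rounded-data likelihood above is also straightforward to maximize numerically. Since Table A.3's individual values are not reproduced here, the Python sketch below uses hypothetical stand-in integers chosen to be consistent with the summary statistics quoted in this example (mean 3.0 and s = .71); it is an illustration of the method, not a reproduction of the original computation.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    # Hypothetical stand-in data (rounded measurements), with mean 3.0 and s = .71
    y = np.array([2.0, 3.0, 3.0, 3.0, 4.0])

    def neg_log_lik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)  # keeps sigma positive during the search
        # each rounded value y_i stands for the interval (y_i - .5, y_i + .5]
        cell = norm.cdf((y + .5 - mu) / sigma) - norm.cdf((y - .5 - mu) / sigma)
        return -np.sum(np.log(cell))

    fit = minimize(neg_log_lik, x0=np.array([3.0, np.log(.7)]), method="Nelder-Mead")
    mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
    print(mu_hat, sigma_hat, y.std(ddof=1))  # sigma_hat comes out below s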
    P[each Y_i is within Δ/2 of y_i] ≈ f_Θ(y) Δⁿ    (A.26)

But in expression (A.26), Δⁿ doesn't depend on Θ—that is, the approximate prob-
ability is proportional to the function of Θ, f_Θ(y). It is therefore plausible to use
the joint density with data plugged in,

A continuous data likelihood function

    f_Θ(y)    (A.27)

A continuous data log likelihood function

    L(Θ) = ln( f_Θ(y) )    (A.28)
Example 21 (continued)

[Figure A.17: plot of L(α) for 20 ≤ α ≤ 45, with its maximum at α = 30.75]
So with the data of display (A.29) in hand, the log likelihood function becomes

    L(α) = −4 ln(α) − (1/α)(75.4 + 39.4 + 3.7 + 4.5)    (A.30)

It is easy to verify (using calculus and/or simply looking at the plot of L(α) in
Figure A.17) that L(α) is maximized for

    α̂ = (75.4 + 39.4 + 3.7 + 4.5) / 4 = 30.75
Example 21 is fairly simple, in that only one parameter is involved and calculus
can be used to find an explicit formula for the maximum likelihood estimator. The
reader might be interested in working through the somewhat more complicated
(two-parameter) situation involving n iid normal random variables with mean µ
and standard deviation σ. Two-variable calculus can be used to show that maximum
likelihood estimates are

Maximum likelihood and normal observations

    µ̂ = x̄

    σ̂ = √( (n − 1)/n ) · s
    f(y) = { Π_{i=1}^{10} (β/α^β) y_i^{β−1} e^{−(y_i/α)^β}   for each y_i > 0
           { 0                                               otherwise
is indicated. Figure A.19 shows a contour plot of L(α, β) and indicates that
maximum likelihood estimates of α and β are indeed in the vicinity of β̂ = 2.0
and α̂ = .26.
    .073, .098, .117, .135, .175, .262, .270, .350, .386, .456

[Figure: Weibull probability plot of the data (ln(−ln(1 − p)) against ln(y)), with slope ≈ 2 and intercept on the ln(y) axis of about −1.4 (e^{−1.4} ≈ .25)]

[Figure A.19: contour plot of L(α, β), with contours at L(α, β) = 4.2, 5.2, 6.2, and 7.2, for α roughly between .1 and .5]
Analytical attempts to locate the maximum likelihood estimates for this kind
of iid Weibull data situation are only partially fruitful. Setting partial derivatives
of L(α, β) equal to 0, followed by some algebra, does lead to the two equations
    β = ( Σ y_i^β ln(y_i) / Σ y_i^β − Σ ln(y_i) / n )^{−1}

    α = ( Σ y_i^β / n )^{1/β}
which maximum likelihood estimates must satisfy, but these must be solved
numerically.
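One way to carry out that numerical solution is sketched below in Python (our own illustration; it solves the β equation by root finding on the fixed-point form and then substitutes into the α equation, using the ten data values listed above).

    import numpy as np
    from scipy.optimize import brentq

    # The ten lifetimes used in the Weibull example above
    y = np.array([.073, .098, .117, .135, .175, .262, .270, .350, .386, .456])

    def g(beta):
        """Right-hand side of the beta equation above."""
        yb = y ** beta
        return 1.0 / (np.sum(yb * np.log(y)) / np.sum(yb) - np.mean(np.log(y)))

    beta_hat = brentq(lambda b: g(b) - b, 0.5, 10.0)       # solve beta = g(beta)
    alpha_hat = np.mean(y ** beta_hat) ** (1 / beta_hat)   # then substitute for alpha
    print(beta_hat, alpha_hat)  # near 2.0 and .26, as read from Figure A.19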
Discrete and continuous likelihood methods have thus far been discussed sep-
arately. However, particularly in life-data analysis contexts, statistical engineering
studies occasionally yield data that are mixed—in the sense that some parts are
discrete, while other parts are continuous. If it is sensible to think of the two parts
as independent, a combination of things already said here can lead to an appropri-
ate likelihood function and then, for example, to maximum likelihood parameter
estimates.
That is, suppose that one has available discrete data, Y_1 = y_1, and continuous
data, Y_2 = y_2, which can be thought of as independently generated—Y_1 from a
discrete joint distribution with joint probability function f_Θ^{(1)}(y_1) and Y_2 from a
continuous joint distribution with joint probability density f_Θ^{(2)}(y_2). In this case,

A mixed-data likelihood function

    f_Θ^{(1)}(y_1) · f_Θ^{(2)}(y_2)    (A.31)

A mixed-data log likelihood function

    L(Θ) = ln f_Θ^{(1)}(y_1) + ln f_Θ^{(2)}(y_2)    (A.32)
Then considering the continuous part of the likelihood, the joint density of ten
independent exponential variables with mean α is

    f_α^{(2)}(y_2) = { (1/α^{10}) e^{−Σ y_i / α}   for each y_i > 0
                    { 0                           otherwise
Putting these two pieces together via equation (A.32), the log likelihood function
appropriate here is

    L(α) = −10 ln(α) − (1/α)(50 + 134 + 187 + · · · + 15,800 + 29,200 + 86,100)

         = −10 ln α − (1/α)(144,673)    (A.33)
Table A.5
12 Insulating Fluid Breakdown Times
50, 134, 187, 882, 1450, 1470, 2290, 2930, 4180, 15800, > 29200, > 86100
    α̂ = 144,673 / 10 = 14,467.3 sec
which has the intuitively appealing interpretation of the ratio of the total time on
test to the number of failures observed during testing.
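That closed-form estimate takes only a few lines to compute. A minimal Python sketch (our own, with the censored observations of Table A.5 entered as the times at which testing stopped for the two surviving units):

    # Breakdown times from Table A.5; the last two units were still surviving
    # when observation ended (right censored at the times shown)
    failures = [50, 134, 187, 882, 1450, 1470, 2290, 2930, 4180, 15800]
    censored = [29200, 86100]

    total_time_on_test = sum(failures) + sum(censored)   # 144,673 sec
    alpha_hat = total_time_on_test / len(failures)       # 14,467.3 sec
    print(alpha_hat)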
The maximum of the log likelihood function

    L* = max_Θ L(Θ)

that is, L* is the largest possible value of the log likelihood. (If Θ̂ is a maximum
likelihood estimate of Θ, then L* = L(Θ̂).) An intuitively appealing way to make
a confidence set for the parameter vector Θ is to use the set of all Θ's with log
likelihood not too far below L*,

A likelihood-based confidence set for Θ

    { Θ | L(Θ) > L* − c }    (A.34)
for an appropriate number c. And a plausible way of deriving a p-value for testing

    H0: Θ = Θ0    (A.35)

is to consider the probability distribution of

    L* − L(Θ0)    (A.36)

when H0 holds, and to use the upper-tail probability beyond an observed value of
variable (A.36) as a p-value.
The practical gaps in this thinking are two: how to choose c in display (A.34)
to get a desired confidence level and what kind of distribution to use to describe
variable (A.36) under hypothesis (A.35). There are no general exact answers to these
questions, but statistical theory does provide at least some indication of approximate
answers that are often adequate for practical purposes when large samples are
involved. That is, statistical theory suggests that in many large-sample situations, if
Θ is of dimension k, choosing

Constant producing (large sample) approximate γ level confidence for { Θ | L(Θ) > L* − c }

    c = (1/2) U    (A.37)

for U the γ quantile of the χ²_k distribution, produces a confidence set (A.34) of
confidence level roughly γ. And similar reasoning suggests that in many large-
sample situations, if Θ is of dimension k, the hypothesis (A.35) can be tested using
the test statistic

A test statistic for H0: Θ = Θ0 with an approximately χ²_k reference distribution

    2( L* − L(Θ0) )    (A.38)

and a χ²_k approximate reference distribution, where large values of the test statistic
(A.38) count as evidence against H0.
Example 23 (continued)
Consider the problem of setting confidence limits on the mean time till break-
down of Nelson's insulating fluid tested at 30 kV. In this problem, Θ is k = 1-
dimensional. So, for example, making use of the facts that the .9 quantile of
the χ²₁ distribution is 2.706 and that the maximum likelihood estimate of α is
14,467.3, displays (A.33), (A.34), and (A.37) suggest that those α with

    L(α) > −10 ln(14,467.3) − (1/14,467.3)(144,673) − (1/2)(2.706)

that is,

    −10 ln(α) − (1/α)(144,673) > −107.15

form an approximate 90% confidence set for α. Figure A.20 shows a plot of the
log likelihood (A.33) cut at the level −107.15 and the corresponding interval of
α's. Numerical solution of the equation

    −10 ln(α) − (1/α)(144,673) = −107.15

shows the interval for mean time till breakdown to extend from 8,963 sec to
25,572 sec.
[Figure A.20: Plot of the log likelihood for Nelson's insulating fluid breakdown time data, cut at the level −107.15, with the corresponding approximate 90% confidence interval for α marked]
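The numerical solution mentioned above can be done with any root finder. Here is a minimal Python sketch (our own illustration, using scipy's brentq) that recovers the two endpoints.

    import math
    from scipy.optimize import brentq

    total, r = 144_673.0, 10                   # total time on test, number of failures
    L = lambda a: -r * math.log(a) - total / a
    cutoff = L(total / r) - 2.706 / 2          # L* minus half the chi-square(1) .9 quantile

    lower = brentq(lambda a: L(a) - cutoff, 1_000, total / r)
    upper = brentq(lambda a: L(a) - cutoff, total / r, 200_000)
    print(round(lower), round(upper))          # about 8963 and 25572 sec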
Cautions concerning the large-sample likelihood-based inference methods

It is not a trivial matter to verify that the χ²_k approximations suggested here
are adequate for a particular nonstandard probability model. In engineering sit-
uations where fairly exact confidence levels and/or p-values are critical, readers
should seek genuinely expert statistical advice before placing too much faith in the
χ²_k approximations. But for purposes of engineering problem solving requiring a
rough, working quantification of uncertainty associated with parameter estimates,
the use of the χ²_k approximation is certainly preferable to operating without any such
quantification.
The insulating fluid example involved only a single parameter. As an example of
a k = 2-parameter application, consider once again the space shuttle O-ring failure
example.
Example 19 (continued)
Again use the log likelihood (A.23) and the fact that maximum likelihood esti-
mates of α and β in equation (A.21) or (A.22) are α̂ = 15.043 and β̂ = −.2322.
These produce corresponding log likelihood −10.158. This, together with the
fact that the .9 quantile of the χ²₂ distribution is 4.605, gives (from displays
(A.34) and (A.37)) that the set of (α, β) pairs with

    L(α, β) > −10.158 − (1/2)(4.605)

that is,

    L(α, β) > −12.46

constitutes an approximate 90% confidence region for (α, β). This set of possible
parameter vectors is shown in the plot in Figure A.21. Notice that one message
conveyed by the contour plot is that β is pretty clearly negative. Low-temperature
launches are more prone to O-ring failure than moderate- to high-temperature
launches.

[Figure A.21: the approximate 90% confidence region for (α, β)]
B
● ● ● ● ● ● ● ● ● ● ● ● ●
Tables
Table B.1
Random Digits
12159 66144 05091 13446 45653 13684 66024 91410 51351 22772
30156 90519 95785 47544 66735 35754 11088 67310 19720 08379
59069 01722 53338 41942 65118 71236 01932 70343 25812 62275
54107 58081 82470 59407 13475 95872 16268 78436 39251 64247
99681 81295 06315 28212 45029 57701 96327 85436 33614 29070
27252 37875 53679 01889 35714 63534 63791 76342 47717 73684
93259 74585 11863 78985 03881 46567 93696 93521 54970 37601
84068 43759 75814 32261 12728 09636 22336 75629 01017 45503
68582 97054 28251 63787 57285 18854 35006 16343 51867 67979
60646 11298 19680 10087 66391 70853 24423 73007 74958 29020
97437 52922 80739 59178 50628 61017 51652 40915 94696 67843
58009 20681 98823 50979 01237 70152 13711 73916 87902 84759
77211 70110 93803 60135 22881 13423 30999 07104 27400 25414
54256 84591 65302 99257 92970 28924 36632 54044 91798 78018
36493 69330 94069 39544 14050 03476 25804 49350 92525 87941
87569 22661 55970 52623 35419 76660 42394 63210 62626 00581
22896 62237 39635 63725 10463 87944 92075 90914 30599 35671
02697 33230 64527 97210 41359 79399 13941 88378 68503 33609
20080 15652 37216 00679 02088 34138 13953 68939 05630 27653
20550 95151 60557 57449 77115 87372 02574 07851 22128 39189
72771 11672 67492 42904 64647 94354 45994 42538 54885 15983
38472 43379 76295 69406 96510 16529 83500 28590 49787 29822
24511 56510 72654 13277 45031 42235 96502 25567 23653 36707
01054 06674 58283 82831 97048 42983 06471 12350 49990 04809
94437 94907 95274 26487 60496 78222 43032 04276 70800 17378
(continued )
Table B.1
Random Digits (continued )
97842 69095 25982 03484 25173 05982 14624 31653 17170 92785
53047 13486 69712 33567 82313 87631 03197 02438 12374 40329
40770 47013 63306 48154 80970 87976 04939 21233 20572 31013
52733 66251 69661 58387 72096 21355 51659 19003 75556 33095
41749 46502 18378 83141 63920 85516 75743 66317 45428 45940
10271 85184 46468 38860 24039 80949 51211 35411 40470 16070
98791 48848 68129 51024 53044 55039 71290 26484 70682 56255
30196 09295 47685 56768 29285 06272 98789 47188 35063 24158
99373 64343 92433 06388 65713 35386 43370 19254 55014 98621
27768 27552 42156 23239 46823 91077 06306 17756 84459 92513
67791 35910 56921 51976 78475 15336 92544 82601 17996 72268
64018 44004 08136 56129 77024 82650 18163 29158 33935 94262
79715 33859 10835 94936 02857 87486 70613 41909 80667 52176
20190 40737 82688 07099 65255 52767 65930 45861 32575 93731
82421 01208 49762 66360 00231 87540 88302 62686 38456 25872
Reprinted from A Million Random Digits with 100,000 Normal Deviates, RAND (New York: The Free Press, 1955).
Copyright © 1955 and 1983 by RAND. Used by permission.
Table B.2
Control Chart Constants
m d2 d3 c4 A2 A3 B3 B4 B5 B6 D1 D2 D3 D4
2 1.128 0.853 0.7979 1.880 2.659 3.267 2.606 3.686 3.267
3 1.693 0.888 0.8862 1.023 1.954 2.568 2.276 4.358 2.575
4 2.059 0.880 0.9213 0.729 1.628 2.266 2.088 4.698 2.282
5 2.326 0.864 0.9400 0.577 1.427 2.089 1.964 4.918 2.114
6 2.534 0.848 0.9515 0.483 1.287 0.030 1.970 0.029 1.874 5.079 2.004
7 2.704 0.833 0.9594 0.419 1.182 0.118 1.882 0.113 1.806 0.205 5.204 0.076 1.924
8 2.847 0.820 0.9650 0.373 1.099 0.185 1.815 0.179 1.751 0.388 5.307 0.136 1.864
9 2.970 0.808 0.9693 0.337 1.032 0.239 1.761 0.232 1.707 0.547 5.394 0.184 1.816
10 3.078 0.797 0.9727 0.308 0.975 0.284 1.716 0.276 1.669 0.686 5.469 0.223 1.777
11 3.173 0.787 0.9754 0.285 0.927 0.321 1.679 0.313 1.637 0.811 5.535 0.256 1.744
12 3.258 0.778 0.9776 0.266 0.886 0.354 1.646 0.346 1.610 0.923 5.594 0.283 1.717
13 3.336 0.770 0.9794 0.249 0.850 0.382 1.618 0.374 1.585 1.025 5.647 0.307 1.693
14 3.407 0.763 0.9810 0.235 0.817 0.406 1.594 0.399 1.563 1.118 5.696 0.328 1.672
15 3.472 0.756 0.9823 0.223 0.789 0.428 1.572 0.421 1.544 1.203 5.740 0.347 1.653
20 3.735 0.729 0.9869 0.180 0.680 0.510 1.490 0.504 1.470 1.549 5.921 0.415 1.585
25 3.931 0.708 0.9896 0.153 0.606 0.565 1.435 0.559 1.420 1.806 6.056 0.459 1.541
This table was computed using Mathcad.
Table B.3
Standard Normal Cumulative Probabilities
    Φ(z) = ∫_{−∞}^{z} (1/√(2π)) exp( −t²/2 ) dt
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
−3.4 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0002
−3.3 .0005 .0005 .0005 .0004 .0004 .0004 .0004 .0004 .0004 .0003
−3.2 .0007 .0007 .0006 .0006 .0006 .0006 .0006 .0005 .0005 .0005
−3.1 .0010 .0009 .0009 .0009 .0008 .0008 .0008 .0008 .0007 .0007
−3.0 .0013 .0013 .0013 .0012 .0012 .0011 .0011 .0011 .0010 .0010
−2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014
−2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019
−2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026
−2.6 .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036
−2.5 .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048
−2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064
−2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084
−2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110
−2.1 .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143
−2.0 .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183
−1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233
−1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294
−1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367
−1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455
−1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559
−1.4 .0808 .0793 .0778 .0764 .0749 .0735 .0721 .0708 .0694 .0681
−1.3 .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823
−1.2 .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985
−1.1 .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170
−1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379
−0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
−0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
−0.7 .2420 .2389 .2358 .2327 .2297 .2266 .2236 .2206 .2177 .2148
−0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
−0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
−0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
−0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
−0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
−0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
−0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
Table B.3
Standard Normal Cumulative Probabilities (continued)
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9773 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
2.9 .9981 .9982 .9983 .9983 .9984 .9984 .9985 .9985 .9986 .9986
3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993
3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995
3.3 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9996 .9997
3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998
This table was generated using MINITAB.
Table B.4
t Distribution Quantiles
Table B.5
Chi-Square Distribution Quantiles
ν Q(.005) Q(.01) Q(.025) Q(.05) Q(.1) Q(.9) Q(.95) Q(.975) Q(.99) Q(.995)
1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.143 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.290 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.653 40.647 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.994
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
31 14.458 15.655 17.539 19.281 21.434 41.422 44.985 48.232 52.192 55.003
32 15.134 16.362 18.291 20.072 22.271 42.585 46.194 49.480 53.486 56.328
33 15.815 17.074 19.047 20.867 23.110 43.745 47.400 50.725 54.775 57.648
34 16.501 17.789 19.806 21.664 23.952 44.903 48.602 51.966 56.061 58.964
35 17.192 18.509 20.569 22.465 24.797 46.059 49.802 53.204 57.342 60.275
36 17.887 19.233 21.336 23.269 25.643 47.212 50.998 54.437 58.619 61.581
37 18.586 19.960 22.106 24.075 26.492 48.364 52.192 55.668 59.893 62.885
38 19.289 20.691 22.878 24.884 27.343 49.513 53.384 56.896 61.163 64.183
39 19.996 21.426 23.654 25.695 28.196 50.660 54.572 58.120 62.429 65.477
40 20.707 22.164 24.433 26.509 29.051 51.805 55.759 59.342 63.691 66.767
This table was generated using MINITAB.
For ν > 40, the approximation Q(p) ≈ ν ( 1 − 2/(9ν) + Q_z(p) √( 2/(9ν) ) )³ can be used.
Table B.6A
F Distribution .75 Quantiles
ν2
(Denominator
Degrees of ν1 (Numerator Degrees of Freedom)
Freedom) 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 5.83 7.50 8.20 8.58 8.82 8.98 9.10 9.19 9.26 9.32 9.41 9.49 9.58 9.63 9.67 9.71 9.76 9.80 9.85
2 2.57 3.00 3.15 3.23 3.28 3.31 3.34 3.35 3.37 3.38 3.39 3.41 3.43 3.43 3.44 3.45 3.46 3.47 3.48
3 2.02 2.28 2.36 2.39 2.41 2.42 2.43 2.44 2.44 2.44 2.45 2.46 2.46 2.46 2.47 2.47 2.47 2.47 2.47
4 1.81 2.00 2.05 2.06 2.07 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08
5 1.69 1.85 1.88 1.89 1.89 1.89 1.89 1.89 1.89 1.89 1.89 1.89 1.88 1.88 1.88 1.88 1.87 1.87 1.87
6 1.62 1.76 1.78 1.79 1.79 1.78 1.78 1.78 1.77 1.77 1.77 1.76 1.76 1.75 1.75 1.75 1.74 1.74 1.74
7 1.57 1.70 1.72 1.72 1.71 1.71 1.70 1.70 1.69 1.69 1.68 1.68 1.67 1.67 1.66 1.66 1.65 1.65 1.65
8 1.54 1.66 1.67 1.66 1.66 1.65 1.64 1.64 1.64 1.63 1.62 1.62 1.61 1.60 1.60 1.59 1.59 1.58 1.58
9 1.51 1.62 1.63 1.63 1.62 1.61 1.60 1.60 1.59 1.59 1.58 1.57 1.56 1.56 1.55 1.54 1.54 1.53 1.53
10 1.49 1.60 1.60 1.59 1.59 1.58 1.57 1.56 1.56 1.55 1.54 1.53 1.52 1.52 1.51 1.51 1.50 1.49 1.48
11 1.47 1.58 1.58 1.57 1.56 1.55 1.54 1.53 1.53 1.52 1.51 1.50 1.49 1.49 1.48 1.47 1.47 1.46 1.45
12 1.46 1.56 1.56 1.55 1.54 1.53 1.52 1.51 1.51 1.50 1.49 1.48 1.47 1.46 1.45 1.45 1.44 1.43 1.42
13 1.45 1.55 1.55 1.53 1.52 1.51 1.50 1.49 1.49 1.48 1.47 1.46 1.45 1.44 1.43 1.42 1.42 1.41 1.40
14 1.44 1.53 1.53 1.52 1.51 1.50 1.49 1.48 1.47 1.46 1.45 1.44 1.43 1.42 1.41 1.41 1.40 1.39 1.38
15 1.43 1.52 1.52 1.51 1.49 1.48 1.47 1.46 1.46 1.45 1.44 1.43 1.41 1.41 1.40 1.39 1.38 1.37 1.36
16 1.42 1.51 1.51 1.50 1.48 1.47 1.46 1.45 1.44 1.44 1.43 1.41 1.40 1.39 1.38 1.37 1.36 1.35 1.34
17 1.42 1.51 1.50 1.49 1.47 1.46 1.45 1.44 1.43 1.43 1.41 1.40 1.39 1.38 1.37 1.36 1.35 1.34 1.33
18 1.41 1.50 1.49 1.48 1.46 1.45 1.44 1.43 1.42 1.42 1.40 1.39 1.38 1.37 1.36 1.35 1.34 1.33 1.32
19 1.41 1.49 1.49 1.47 1.46 1.44 1.43 1.42 1.41 1.41 1.40 1.38 1.37 1.36 1.35 1.34 1.33 1.32 1.30
20 1.40 1.49 1.48 1.47 1.45 1.44 1.43 1.42 1.41 1.40 1.39 1.37 1.36 1.35 1.34 1.33 1.32 1.31 1.29
21 1.40 1.48 1.48 1.46 1.44 1.43 1.42 1.41 1.40 1.39 1.38 1.37 1.35 1.34 1.33 1.32 1.31 1.30 1.28
22 1.40 1.48 1.47 1.45 1.44 1.42 1.41 1.40 1.39 1.39 1.37 1.36 1.34 1.33 1.32 1.31 1.30 1.29 1.28
23 1.39 1.47 1.47 1.45 1.43 1.42 1.41 1.40 1.39 1.38 1.37 1.35 1.34 1.33 1.32 1.31 1.30 1.28 1.27
24 1.39 1.47 1.46 1.44 1.43 1.41 1.40 1.39 1.38 1.38 1.36 1.35 1.33 1.32 1.31 1.30 1.29 1.28 1.26
25 1.39 1.47 1.46 1.44 1.42 1.41 1.40 1.39 1.38 1.37 1.36 1.34 1.33 1.32 1.31 1.29 1.28 1.27 1.25
26 1.38 1.46 1.45 1.44 1.42 1.41 1.39 1.38 1.37 1.37 1.35 1.34 1.32 1.31 1.30 1.29 1.28 1.26 1.25
27 1.38 1.46 1.45 1.43 1.42 1.40 1.39 1.38 1.37 1.36 1.35 1.33 1.32 1.31 1.30 1.28 1.27 1.26 1.24
28 1.38 1.46 1.45 1.43 1.41 1.40 1.39 1.38 1.37 1.36 1.34 1.33 1.31 1.30 1.29 1.28 1.27 1.25 1.24
29 1.38 1.45 1.45 1.43 1.41 1.40 1.38 1.37 1.36 1.35 1.34 1.32 1.31 1.30 1.29 1.27 1.26 1.25 1.23
30 1.38 1.45 1.44 1.42 1.41 1.39 1.38 1.37 1.36 1.35 1.34 1.32 1.30 1.29 1.28 1.27 1.26 1.24 1.23
40 1.36 1.44 1.42 1.40 1.39 1.37 1.36 1.35 1.34 1.33 1.31 1.30 1.28 1.26 1.25 1.24 1.22 1.21 1.19
60 1.35 1.42 1.41 1.38 1.37 1.35 1.33 1.32 1.31 1.30 1.29 1.27 1.25 1.24 1.22 1.21 1.19 1.17 1.15
120 1.34 1.40 1.39 1.37 1.35 1.33 1.31 1.30 1.29 1.28 1.26 1.24 1.22 1.21 1.19 1.18 1.16 1.13 1.10
∞ 1.32 1.39 1.37 1.35 1.33 1.31 1.29 1.28 1.27 1.25 1.24 1.22 1.19 1.18 1.16 1.14 1.12 1.08 1.00
This table was generated using MINITAB.
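The F tables in this appendix were generated with MINITAB, but any modern statistical library reproduces them to tabled accuracy. A sketch of ours using SciPy (assuming SciPy is available):

```python
from scipy.stats import f

# Spot check Table B.6A: the .75 quantile of F with nu1 = 3 numerator
# and nu2 = 5 denominator degrees of freedom is tabled as 1.88.
print(round(f.ppf(0.75, dfn=3, dfd=5), 2))  # 1.88

# Regenerate the nu2 = 5 row of the table for the first few nu1:
print([round(f.ppf(0.75, dfn, 5), 2) for dfn in (1, 2, 3, 4, 5)])
# [1.69, 1.85, 1.88, 1.89, 1.89]
```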
Table B.6B
F Distribution .90 Quantiles
ν2 (Denominator
Degrees of            ν1 (Numerator Degrees of Freedom)
Freedom)     1     2     3     4     5     6     7     8     9     10    12    15    20    24    30    40    60    120    ∞
1 39.86 49.50 53.59 55.84 57.24 58.20 58.90 59.44 59.85 60.20 60.70 61.22 61.74 62.00 62.27 62.53 62.79 63.05 63.33
2 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39 9.41 9.42 9.44 9.45 9.46 9.47 9.47 9.48 9.49
3 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23 5.22 5.20 5.18 5.18 5.17 5.16 5.15 5.14 5.13
4 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92 3.90 3.87 3.84 3.83 3.82 3.80 3.79 3.78 3.76
5 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 3.30 3.27 3.24 3.21 3.19 3.17 3.16 3.14 3.12 3.10
6 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 2.94 2.90 2.87 2.84 2.82 2.80 2.78 2.76 2.74 2.72
7 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 2.70 2.67 2.63 2.59 2.58 2.56 2.54 2.51 2.49 2.47
8 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 2.54 2.50 2.46 2.42 2.40 2.38 2.36 2.34 2.32 2.29
9 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42 2.38 2.34 2.30 2.28 2.25 2.23 2.21 2.18 2.16
10 3.28 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 2.32 2.28 2.24 2.20 2.18 2.16 2.13 2.11 2.08 2.06
11 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25 2.21 2.17 2.12 2.10 2.08 2.05 2.03 2.00 1.97
12 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19 2.15 2.10 2.06 2.04 2.01 1.99 1.96 1.93 1.90
13 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14 2.10 2.05 2.01 1.98 1.96 1.93 1.90 1.88 1.85
14 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.10 2.05 2.01 1.96 1.94 1.91 1.89 1.86 1.83 1.80
15 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06 2.02 1.97 1.92 1.90 1.87 1.85 1.82 1.79 1.76
16 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03 1.99 1.94 1.89 1.87 1.84 1.81 1.78 1.75 1.72
17 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 2.00 1.96 1.91 1.86 1.84 1.81 1.78 1.75 1.72 1.69
18 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00 1.98 1.93 1.89 1.84 1.81 1.78 1.75 1.72 1.69 1.66
19 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 1.96 1.91 1.86 1.81 1.79 1.76 1.73 1.70 1.67 1.63
20 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 1.96 1.94 1.89 1.84 1.79 1.77 1.74 1.71 1.68 1.64 1.61
21 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 1.92 1.87 1.83 1.78 1.75 1.72 1.69 1.66 1.62 1.59
22 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 1.90 1.86 1.81 1.76 1.73 1.70 1.67 1.64 1.60 1.57
23 2.94 2.55 2.34 2.21 2.11 2.05 1.99 1.95 1.92 1.89 1.84 1.80 1.74 1.72 1.69 1.66 1.62 1.59 1.55
24 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 1.91 1.88 1.83 1.78 1.73 1.70 1.67 1.64 1.61 1.57 1.53
25 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 1.87 1.82 1.77 1.72 1.69 1.66 1.63 1.59 1.56 1.52
26 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 1.86 1.81 1.76 1.71 1.68 1.65 1.61 1.58 1.54 1.50
27 2.90 2.51 2.30 2.17 2.07 2.00 1.95 1.91 1.87 1.85 1.80 1.75 1.70 1.67 1.64 1.60 1.57 1.53 1.49
28 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 1.84 1.79 1.74 1.69 1.66 1.63 1.59 1.56 1.52 1.48
29 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86 1.83 1.78 1.73 1.68 1.65 1.62 1.58 1.55 1.51 1.47
30 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85 1.82 1.77 1.72 1.67 1.64 1.61 1.57 1.54 1.50 1.46
40 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 1.76 1.71 1.66 1.61 1.57 1.54 1.51 1.47 1.42 1.38
60 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 1.71 1.66 1.60 1.54 1.51 1.48 1.44 1.40 1.35 1.29
120 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68 1.65 1.60 1.55 1.48 1.45 1.41 1.37 1.32 1.26 1.19
∞ 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63 1.60 1.55 1.49 1.42 1.38 1.34 1.30 1.24 1.17 1.00
Table B.6C
F Distribution .95 Quantiles
ν2 (Denominator
Degrees of            ν1 (Numerator Degrees of Freedom)
Freedom)     1     2     3     4     5     6     7     8     9     10
1 161.44 199.50 215.69 224.57 230.16 233.98 236.78 238.89 240.55 241.89
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.39 19.40
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99
120 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83
Table B.6C
F Distribution .95 Quantiles (continued)
ν2 (Denominator
Degrees of            ν1 (Numerator Degrees of Freedom)
Freedom)     12    15    20    24    30    40    60    120    ∞
1 243.91 245.97 248.02 249.04 250.07 251.13 252.18 253.27 254.31
2 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50
3 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63
5 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36
6 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93
9 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71
10 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54
11 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40
12 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13
15 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07
16 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01
17 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96
18 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88
20 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84
21 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81
22 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78
23 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76
24 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73
25 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71
26 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69
27 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67
28 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65
29 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64
30 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62
40 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51
60 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39
120 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25
∞ 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00
This table was generated using MINITAB.
Table B.6D
F Distribution .99 Quantiles
ν2 (Denominator
Degrees of            ν1 (Numerator Degrees of Freedom)
Freedom)     1     2     3     4     5     6     7     8     9     10
1 4052 4999 5403 5625 5764 5859 5929 5981 6023 6055
2 98.51 99.00 99.17 99.25 99.30 99.33 99.35 99.38 99.39 99.40
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51
19 8.19 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17
25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13
26 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09
27 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 3.06
28 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03
29 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 3.00
30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47
∞ 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32
Table B.6D
F Distribution .99 Quantiles (continued)
ν2 (Denominator
Degrees of            ν1 (Numerator Degrees of Freedom)
Freedom)     12    15    20    24    30    40    60    120    ∞
1 6107 6157 6209 6235 6260 6287 6312 6339 6366
2 99.41 99.43 99.44 99.45 99.47 99.47 99.48 99.49 99.50
3 27.05 26.87 26.69 26.60 26.51 26.41 26.32 26.22 26.13
4 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.46
5 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02
6 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.88
7 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65
8 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.86
9 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31
10 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.91
11 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.60
12 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36
13 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.17
14 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.00
15 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.87
16 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.75
17 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65
18 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.57
19 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.49
20 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.42
21 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.36
22 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.31
23 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.26
24 3.03 2.89 2.74 2.66 2.58 2.49 2.40 2.31 2.21
25 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.17
26 2.96 2.81 2.66 2.58 2.50 2.42 2.33 2.23 2.13
27 2.93 2.78 2.63 2.55 2.47 2.38 2.29 2.20 2.10
28 2.90 2.75 2.60 2.52 2.44 2.35 2.26 2.17 2.06
29 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.03
30 2.84 2.70 2.55 2.47 2.39 2.30 2.21 2.11 2.01
40 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.80
60 2.50 2.35 2.20 2.12 2.03 1.94 1.84 1.73 1.60
120 2.34 2.19 2.03 1.95 1.86 1.76 1.66 1.53 1.38
∞ 2.18 2.04 1.88 1.79 1.70 1.59 1.47 1.32 1.00
This table was generated using MINITAB.
Table B.6E
F Distribution .999 Quantiles
ν2 (Denominator
Degrees of            ν1 (Numerator Degrees of Freedom)
Freedom)     1     2     3     4     5     6     7     8     9     10
1 405261 499996 540349 562463 576409 585904 592890 598185 602359 605671
2 998.55 999.01 999.23 999.26 999.29 999.38 999.40 999.35 999.45 999.41
3 167.03 148.50 141.11 137.10 134.58 132.85 131.58 130.62 129.86 129.25
4 74.14 61.25 56.18 53.44 51.71 50.53 49.66 49.00 48.48 48.05
5 47.18 37.12 33.20 31.08 29.75 28.83 28.16 27.65 27.24 26.92
6 35.51 27.00 23.70 21.92 20.80 20.03 19.46 19.03 18.69 18.41
7 29.24 21.69 18.77 17.20 16.21 15.52 15.02 14.63 14.33 14.08
8 25.41 18.49 15.83 14.39 13.48 12.86 12.40 12.05 11.77 11.54
9 22.86 16.39 13.90 12.56 11.71 11.13 10.70 10.37 10.11 9.89
10 21.04 14.91 12.55 11.28 10.48 9.93 9.52 9.20 8.96 8.75
11 19.69 13.81 11.56 10.35 9.58 9.05 8.66 8.35 8.12 7.92
12 18.64 12.97 10.80 9.63 8.89 8.38 8.00 7.71 7.48 7.29
13 17.82 12.31 10.21 9.07 8.35 7.86 7.49 7.21 6.98 6.80
14 17.14 11.78 9.73 8.62 7.92 7.44 7.08 6.80 6.58 6.40
15 16.59 11.34 9.34 8.25 7.57 7.09 6.74 6.47 6.26 6.08
16 16.12 10.97 9.01 7.94 7.27 6.80 6.46 6.19 5.98 5.81
17 15.72 10.66 8.73 7.68 7.02 6.56 6.22 5.96 5.75 5.58
18 15.38 10.39 8.49 7.46 6.81 6.35 6.02 5.76 5.56 5.39
19 15.08 10.16 8.28 7.27 6.62 6.18 5.85 5.59 5.39 5.22
20 14.82 9.95 8.10 7.10 6.46 6.02 5.69 5.44 5.24 5.08
21 14.59 9.77 7.94 6.95 6.32 5.88 5.56 5.31 5.11 4.95
22 14.38 9.61 7.80 6.81 6.19 5.76 5.44 5.19 4.99 4.83
23 14.20 9.47 7.67 6.70 6.08 5.65 5.33 5.09 4.89 4.73
24 14.03 9.34 7.55 6.59 5.98 5.55 5.23 4.99 4.80 4.64
25 13.88 9.22 7.45 6.49 5.89 5.46 5.15 4.91 4.71 4.56
26 13.74 9.12 7.36 6.41 5.80 5.38 5.07 4.83 4.64 4.48
27 13.61 9.02 7.27 6.33 5.73 5.31 5.00 4.76 4.57 4.41
28 13.50 8.93 7.19 6.25 5.66 5.24 4.93 4.69 4.50 4.35
29 13.39 8.85 7.12 6.19 5.59 5.18 4.87 4.64 4.45 4.29
30 13.29 8.77 7.05 6.12 5.53 5.12 4.82 4.58 4.39 4.24
40 12.61 8.25 6.59 5.70 5.13 4.73 4.44 4.21 4.02 3.87
60 11.97 7.77 6.17 5.31 4.76 4.37 4.09 3.86 3.69 3.54
120 11.38 7.32 5.78 4.95 4.42 4.04 3.77 3.55 3.38 3.24
∞ 10.83 6.91 5.42 4.62 4.10 3.74 3.47 3.27 3.10 2.96
Table B.6E
F Distribution .999 Quantiles (continued)
ν2 (Denominator
Degrees of            ν1 (Numerator Degrees of Freedom)
Freedom)     12    15    20    24    30    40    60    120    ∞
1 610644 615766 620884 623544 626117 628724 631381 634002 636619
2 999.46 999.40 999.44 999.45 999.47 999.49 999.50 999.52 999.50
3 128.32 127.37 126.42 125.94 125.45 124.96 124.47 123.97 123.47
4 47.41 46.76 46.10 45.77 45.43 45.09 44.75 44.40 44.05
5 26.42 25.91 25.40 25.13 24.87 24.60 24.33 24.06 23.79
6 17.99 17.56 17.12 16.90 16.67 16.44 16.21 15.98 15.75
7 13.71 13.32 12.93 12.73 12.53 12.33 12.12 11.91 11.70
8 11.19 10.84 10.48 10.30 10.11 9.92 9.73 9.53 9.33
9 9.57 9.24 8.90 8.72 8.55 8.37 8.19 8.00 7.81
10 8.45 8.13 7.80 7.64 7.47 7.30 7.12 6.94 6.76
11 7.63 7.32 7.01 6.85 6.68 6.52 6.35 6.18 6.00
12 7.00 6.71 6.40 6.25 6.09 5.93 5.76 5.59 5.42
13 6.52 6.23 5.93 5.78 5.63 5.47 5.30 5.14 4.97
14 6.13 5.85 5.56 5.41 5.25 5.10 4.94 4.77 4.60
15 5.81 5.54 5.25 5.10 4.95 4.80 4.64 4.47 4.31
16 5.55 5.27 4.99 4.85 4.70 4.54 4.39 4.23 4.06
17 5.32 5.05 4.78 4.63 4.48 4.33 4.18 4.02 3.85
18 5.13 4.87 4.59 4.45 4.30 4.15 4.00 3.84 3.67
19 4.97 4.70 4.43 4.29 4.14 3.99 3.84 3.68 3.51
20 4.82 4.56 4.29 4.15 4.01 3.86 3.70 3.54 3.38
21 4.70 4.44 4.17 4.03 3.88 3.74 3.58 3.42 3.26
22 4.58 4.33 4.06 3.92 3.78 3.63 3.48 3.32 3.15
23 4.48 4.23 3.96 3.82 3.68 3.53 3.38 3.22 3.05
24 4.39 4.14 3.87 3.74 3.59 3.45 3.29 3.14 2.97
25 4.31 4.06 3.79 3.66 3.52 3.37 3.22 3.06 2.89
26 4.24 3.99 3.72 3.59 3.44 3.30 3.15 2.99 2.82
27 4.17 3.92 3.66 3.52 3.38 3.23 3.08 2.92 2.75
28 4.11 3.86 3.60 3.46 3.32 3.18 3.02 2.86 2.69
29 4.05 3.80 3.54 3.41 3.27 3.12 2.97 2.81 2.64
30 4.00 3.75 3.49 3.36 3.22 3.07 2.92 2.76 2.59
40 3.64 3.40 3.14 3.01 2.87 2.73 2.57 2.41 2.23
60 3.32 3.08 2.83 2.69 2.55 2.41 2.25 2.08 1.89
120 3.02 2.78 2.53 2.40 2.26 2.11 1.95 1.77 1.54
∞ 2.74 2.51 2.27 2.13 1.99 1.84 1.66 1.45 1.00
This table was generated using MINITAB.
Table B.7A
Factors for Two-Sided Tolerance Intervals for Normal Distributions
Table B.7B
Factors for One-Sided Tolerance Intervals for Normal Distributions
Table B.8A
Factors for Simultaneous 95% Two-Sided Confidence Limits for Several Means
Number of Means
ν 1 2 3 4 5 6 7 8 9 10 12 14 16 32
2 4.303 5.571 6.340 6.886 7.306 7.645 7.929 8.172 8.385 8.573 8.894 9.162 9.390 10.529
3 3.182 3.960 4.430 4.764 5.023 5.233 5.410 5.562 5.694 5.812 6.015 6.184 6.328 7.055
4 2.776 3.382 3.745 4.003 4.203 4.366 4.503 4.621 4.725 4.817 4.975 5.107 5.221 5.794
5 2.571 3.091 3.399 3.619 3.789 3.928 4.044 4.145 4.233 4.312 4.447 4.560 4.657 5.150
6 2.447 2.916 3.193 3.389 3.541 3.664 3.769 3.858 3.937 4.008 4.129 4.230 4.317 4.760
7 2.365 2.800 3.055 3.236 3.376 3.489 3.585 3.668 3.740 3.805 3.916 4.009 4.090 4.498
8 2.306 2.718 2.958 3.127 3.258 3.365 3.454 3.532 3.600 3.660 3.764 3.852 3.927 4.310
9 2.262 2.657 2.885 3.046 3.171 3.272 3.357 3.430 3.494 3.552 3.650 3.733 3.805 4.169
10 2.228 2.609 2.829 2.983 3.103 3.199 3.281 3.351 3.412 3.467 3.562 3.641 3.710 4.058
11 2.201 2.571 2.784 2.933 3.048 3.142 3.220 3.288 3.347 3.400 3.491 3.568 3.634 3.969
12 2.179 2.540 2.747 2.892 3.004 3.095 3.171 3.236 3.294 3.345 3.433 3.507 3.571 3.897
13 2.160 2.514 2.717 2.858 2.967 3.055 3.129 3.193 3.249 3.299 3.385 3.457 3.519 3.836
14 2.145 2.493 2.691 2.830 2.936 3.022 3.095 3.157 3.212 3.260 3.344 3.415 3.475 3.784
15 2.131 2.474 2.669 2.805 2.909 2.994 3.065 3.126 3.180 3.227 3.309 3.378 3.438 3.740
16 2.120 2.458 2.650 2.784 2.886 2.969 3.039 3.099 3.152 3.199 3.279 3.347 3.405 3.701
17 2.110 2.444 2.633 2.765 2.866 2.948 3.017 3.076 3.127 3.173 3.253 3.319 3.376 3.668
18 2.101 2.432 2.619 2.749 2.849 2.929 2.997 3.055 3.106 3.151 3.229 3.295 3.351 3.638
19 2.093 2.421 2.606 2.734 2.833 2.912 2.979 3.037 3.087 3.132 3.209 3.273 3.329 3.611
20 2.086 2.411 2.594 2.721 2.819 2.897 2.963 3.020 3.070 3.114 3.190 3.254 3.308 3.587
24 2.064 2.380 2.558 2.681 2.775 2.851 2.914 2.969 3.016 3.059 3.132 3.193 3.246 3.513
30 2.042 2.350 2.522 2.641 2.732 2.805 2.866 2.918 2.964 3.005 3.075 3.133 3.184 3.439
36 2.028 2.331 2.499 2.615 2.704 2.775 2.834 2.885 2.930 2.970 3.038 3.094 3.143 3.391
40 2.021 2.321 2.488 2.602 2.690 2.760 2.819 2.869 2.913 2.952 3.019 3.075 3.123 3.367
60 2.000 2.292 2.454 2.564 2.649 2.716 2.772 2.821 2.863 2.900 2.964 3.018 3.064 3.295
120 1.980 2.264 2.420 2.527 2.608 2.673 2.727 2.773 2.814 2.849 2.910 2.961 3.005 3.225
144 1.977 2.259 2.415 2.521 2.602 2.666 2.720 2.766 2.806 2.841 2.902 2.952 2.996 3.214
∞ 1.960 2.237 2.388 2.491 2.569 2.631 2.683 2.727 2.766 2.800 2.858 2.906 2.948 3.156
This table was prepared using a program written by Daniel L. Rose.
Table B.8B
Factors for Simultaneous 95% One-Sided Confidence Limits for Several Means
Number of Means
ν 1 2 3 4 5 6 7 8 9 10 12 14 16 32
2 2.920 4.075 4.834 5.397 5.842 6.208 6.516 6.781 7.014 7.220 7.573 7.867 8.118 9.364
3 2.353 3.090 3.551 3.888 4.154 4.372 4.557 4.717 4.858 4.983 5.199 5.380 5.535 6.315
4 2.132 2.722 3.080 3.340 3.544 3.711 3.852 3.974 4.082 4.179 4.345 4.484 4.604 5.212
5 2.015 2.532 2.840 3.062 3.234 3.376 3.495 3.599 3.690 3.772 3.912 4.031 4.132 4.650
6 1.943 2.417 2.696 2.894 3.049 3.175 3.282 3.374 3.455 3.528 3.653 3.758 3.849 4.312
7 1.895 2.340 2.599 2.783 2.925 3.041 3.139 3.224 3.299 3.365 3.480 3.577 3.660 4.085
8 1.860 2.285 2.530 2.703 2.837 2.946 3.038 3.117 3.187 3.250 3.357 3.447 3.525 3.923
9 1.833 2.243 2.479 2.644 2.772 2.875 2.962 3.038 3.104 3.163 3.265 3.351 3.424 3.801
10 1.812 2.211 2.439 2.598 2.720 2.820 2.904 2.976 3.039 3.096 3.193 3.275 3.346 3.707
11 1.796 2.186 2.407 2.561 2.680 2.776 2.857 2.927 2.988 3.042 3.136 3.215 3.283 3.631
12 1.782 2.164 2.380 2.531 2.647 2.740 2.819 2.886 2.946 2.999 3.090 3.166 3.232 3.569
13 1.771 2.147 2.359 2.506 2.619 2.710 2.787 2.853 2.911 2.962 3.051 3.126 3.190 3.517
14 1.761 2.132 2.340 2.485 2.596 2.685 2.760 2.825 2.881 2.932 3.018 3.091 3.154 3.473
15 1.753 2.119 2.324 2.467 2.576 2.663 2.737 2.800 2.856 2.905 2.990 3.062 3.123 3.436
16 1.746 2.108 2.311 2.451 2.558 2.645 2.717 2.779 2.834 2.883 2.966 3.036 3.096 3.403
17 1.740 2.099 2.299 2.437 2.543 2.628 2.700 2.761 2.815 2.863 2.945 3.014 3.073 3.375
18 1.734 2.090 2.288 2.425 2.530 2.614 2.684 2.745 2.798 2.845 2.926 2.994 3.052 3.349
19 1.729 2.083 2.279 2.415 2.518 2.601 2.671 2.731 2.783 2.830 2.910 2.977 3.034 3.327
20 1.725 2.076 2.271 2.405 2.507 2.590 2.659 2.718 2.770 2.816 2.895 2.961 3.018 3.307
24 1.711 2.055 2.245 2.375 2.474 2.554 2.621 2.678 2.728 2.772 2.848 2.912 2.967 3.244
30 1.697 2.034 2.219 2.346 2.442 2.519 2.584 2.639 2.687 2.730 2.803 2.864 2.917 3.183
36 1.688 2.020 2.202 2.327 2.421 2.496 2.559 2.613 2.660 2.702 2.773 2.833 2.884 3.142
40 1.684 2.014 2.194 2.317 2.410 2.485 2.547 2.600 2.647 2.688 2.758 2.817 2.868 3.122
60 1.671 1.993 2.169 2.289 2.379 2.451 2.511 2.563 2.607 2.647 2.715 2.771 2.820 3.063
120 1.658 1.974 2.145 2.261 2.349 2.418 2.476 2.526 2.569 2.607 2.672 2.726 2.773 3.005
144 1.656 1.971 2.141 2.257 2.344 2.413 2.471 2.520 2.563 2.601 2.665 2.719 2.765 2.995
∞ 1.645 1.955 2.121 2.234 2.319 2.386 2.442 2.490 2.531 2.568 2.630 2.682 2.727 2.948
This table was prepared using a program written by Daniel L. Rose.
Table B.9A
.95 Quantiles of the Studentized Range Distribution
Table B.9B
.99 Quantiles of the Studentized Range Distribution
Answers to Section Exercises
Chapter 1
Section 1

1. Designing and improving complex products and systems often leads to situations where there is no known theory that can guide decisions. Engineers are then forced to experiment and collect data to find out how a system works, usually under time and monetary constraints. Engineers also collect data in order to monitor the quality of products and services. Statistical principles and methods can be used to find effective and efficient ways to collect and analyze such data.

2. The physical world is filled with variability. It comes from differences in raw materials, machinery, operators, environment, measuring devices, and other uncontrollable variables that change over time. This produces variability in engineering data, at least some of which is impossible to completely eliminate. Statistics must therefore address the reality of variability in data.

3. Descriptive statistics provides a way of summarizing patterns and major features of data. Inferential statistics uses a probability model to describe the process from which the data were obtained; data are then used to draw conclusions about the process by estimating parameters in the model and making predictions based on the model.

Section 2

1. Observational study—you might be interested in assessing the job satisfaction of a large number of manufacturing workers; you could administer a survey to measure various dimensions of job satisfaction. Experimental study—you might want to compare several different job routing schemes to see which one achieves the greatest throughput in a job shop.

2. Qualitative data—rating the quality of batches of ice cream as either poor, fair, good, or exceptional. Quantitative data—measuring the time (in hours) it takes for each of 1,000 integrated circuit chips to fail in a high-stress environment.

3. Any relationships between the variables x and y can only be derived from a bivariate sample.

4. You might want to compare two laboratories in their ability to determine percent impurities in rare metal specimens. Each specimen could be divided in two, with each half going to a different lab. Since each specimen is being measured twice for percent impurity, the data would be paired (according to specimen).
5. Full factorial data structure—tests are performed for all factor-level combinations:

   Design   Paper          Loading Condition
   delta    construction   with clip
   t-wing   construction   with clip
   delta    typing         with clip
   t-wing   typing         with clip
   delta    construction   without clip
   t-wing   construction   without clip
   delta    typing         without clip
   t-wing   typing         without clip

   Fractional factorial data structure—tests are performed for only some of the possible factor-level combinations. One possibility is to choose the following "half fraction":

   Design   Paper          Loading Condition
   delta    construction   without clip
   t-wing   construction   with clip
   delta    typing         with clip
   t-wing   typing         without clip

6. Variables can be manipulated in an experiment. If changes in the response coincide with changes in factor levels, it is usually safe to infer that the changes in the factor caused the changes in the response (as long as other factors have been controlled and there is no source of bias). There is no control or manipulation in an observational study. Changes in the response may coincide with changes in another variable, but there is always the possibility that a third variable is causing the correlation. It is therefore risky to infer a cause-and-effect relationship between any variable and the response in an observational study.

Section 3

1. Even if a measurement system is accurate and precise, if it is not truly measuring the desired dimension or characteristic, then the measurements are useless. If a measurement system is valid and accurate, but imprecise, it may be useless because it produces too much variability (and this cannot be corrected by calibration). If a measurement system is valid and precise, but inaccurate, it might be easy to make it accurate (and thus useful) by calibrating it to a standard.

2. If the measurement system is not valid, then taking an average will still produce a measurement that is invalid. If the individual measurements are inaccurate, then the average will be inaccurate. Averaging many measurements only improves precision. Suppose that the long-run average yield of the process is stable over time. Imagine making 5 yield measurements every hour, for 24 hours. This produces 120 individual measurements, and 24 averages. Since the averages are "pulled" to the center, there will be less variability in the 24 averages than in the 120 individual measurements, so averaging improves precision.

3. Unstable measurement systems (e.g., instrument drift, multiple inconsistent devices) can lead to differences or changes in validity, precision, and accuracy. In a statistical engineering study, it is important to obtain valid, precise, and accurate measurements throughout the study. Changes or differences may create excessive variability, making it hard to draw conclusions. Changes or differences can also bias results by causing patterns in data that might incorrectly be attributed to factors in the experiment.

Section 4

1. Mathematical models can help engineers describe (in a relatively simple and concise way) how physical systems behave, or will behave. They are an integral part of designing and improving products and processes.

Chapter 2

Section 1

1. Flight distance might be defined as the horizontal distance that a plane travels after being launched from a mechanical slingshot. Specifically, the horizontal distance might be measured from the
point on the floor directly below the slingshot to the point on the floor where any part of the plane first touches.

2. If all operators are trained to use measuring equipment in the same consistent way, this will result in better repeatability and reproducibility of measurements. The measurements will be more repeatable because individual operators will use the same technique from measurement to measurement, resulting in small variability among measurements of the same item by the same operator. The measurements will be more reproducible because all operators will be trained to use the same technique, resulting in small variability among measurements made by different operators.

3. This scheme will tend to "over-sample" larger lots and "under-sample" smaller lots, since the amount of information obtained about a large population from a particular sample size does not depend on the size of the population. To obtain the same amount of information from each lot, you should use an absolute (fixed) sample size instead of a relative one.

4. If the response variable is poorly defined, the data collected may not properly describe the characteristic of interest. Even if they do, operators may not be consistent in the way that they measure the response, resulting in more variation.

Section 2

1. Label the 38 runout values consecutively, 1–38, in the order given in Table 1.1 (smallest to largest). First sample labels: {12, 15, 5, 9, 11}; first sample runout values: {11, 11, 9, 10, 11}. Second sample labels: {34, 31, 36, 2, 14}; second sample runout values: {17, 15, 18, 8, 11}. Third sample labels: {10, 35, 12, 27, 30}; third sample runout values: {10, 17, 11, 14, 15}. Fourth sample labels: {15, 5, 19, 11, 8}; fourth sample runout values: {11, 9, 12, 11, 10}. The samples are not identical. Note: the population mean is 12.63; the sample means are 10.4, 13.8, 13.4, and 10.6.

3. A simple random sample is not guaranteed to be representative of the population from which it is drawn. It gives every set of n items an equal chance of being selected, so there is always a chance that the n items chosen will be "extreme" members of the population.

Section 3

1. Possible controlled variables: operator, launch angle, launch force, paper clip size, paper manufacturer, plane constructor, distance measurer, and wind. The response is Flight Distance and the experimental variables are Design, Paper Type, and Loading Condition. Concomitant variables might be wind speed and direction (if these cannot be controlled), ambient temperature, humidity, and atmospheric pressure.

2. Advantage: may reduce baseline variation (background noise) in the response, making it easier to see the effects of factors. Disadvantage: the variable may fluctuate in the real world, so controlling it makes the experiment more artificial—it will be harder to generalize conclusions from the experiment to the real world.

3. Treat "distance measurer" as an experimental (blocking) variable with 2 levels. For each level (team member), perform a full factorial experiment using the 3 primary factors. If there are differences in the way team members measure distance, then it will still be possible to unambiguously assess the effects of the primary factors within each "sub-experiment" (block).

4. List the tests for Mary in the same order given for Exercise 5 of Section 1.2. Then list the tests for Tom after Mary, again in the same order. Label the tests consecutively 1–16, in the order listed. Let the digits 01–05 refer to test 1, 06–10 to test 2, . . . , and 76–80 to test 16. Move through Table B.1 choosing two digits at a time. Ignore previously chosen test labels or numbers between 81 and 00. Order the tests in the same order that their corresponding two-digit numbers are chosen from the table. Using this method (and starting from the upper-left of the table), the test labeled 3 (Mary, delta, typing, with clip) would be first, followed by the tests labeled 13, 9, 1, 2, 7, 10, 8, 14, 11, 6, 15, 4, 16, 12, and 5.
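The two-digit selection scheme in answer 4 above is easy to emulate in code. The sketch below is ours; it substitutes Python's pseudorandom generator (with an arbitrary seed) for the printed random digit table B.1, so it will not reproduce the specific run order quoted above, but it applies the same mapping and skipping rules:

```python
import random

rng = random.Random(2001)  # fixed (arbitrary) seed so runs are reproducible

# Map two-digit draws to test labels: 01-05 -> test 1, 06-10 -> test 2,
# ..., 76-80 -> test 16; ignore 81-99 and 00, and ignore repeats.
order, seen = [], set()
while len(order) < 16:
    pair = rng.randint(0, 99)          # one "two-digit" draw, 00-99
    if not (1 <= pair <= 80):
        continue                        # skip 81-99 and 00
    test = (pair - 1) // 5 + 1          # five two-digit numbers per test
    if test not in seen:
        seen.add(test)
        order.append(test)

print(order)  # one random run order for tests 1-16
```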
5. For the delta/construction/with clip condition (for example), flying the same plane twice would provide information about flight-to-flight variability for that particular plane. This would be useful if you are only interested in making conclusions about that particular plane. If you are interested in generalizing your conclusions to all delta design planes made with construction paper and loaded with a paper clip, then reflying the same airplane does not provide much more information. But making and flying two planes for this condition would give you some idea of variability among different planes of this type, and would therefore validate any general conclusions made from the study. This argument would be true for all 8 conditions, and would also apply to comparisons made among the 8 conditions.

6. Random sampling is used in enumerative studies. Its purpose is to choose a representative sample from some population of items. Randomization is used in analytical/experimental studies. Its purpose is to assign units to experimental conditions in an unbiased way, and to order procedures to prevent bias from unsupervised variables that may change over time.

7. Blocking is a way of controlling an extraneous variable: within each block, there may be less baseline variation (background noise) in the response than there would be if the variable were not controlled. This makes it easier to see the effects of the factors of interest within each block. Any effects of the extraneous variable can be isolated and distinguished from the effects of the factors of interest. Compared to holding the variable constant throughout the experiment, blocking also results in a more realistic experiment.

8. Replication is used to estimate the magnitude of baseline variation (background noise, experimental error) in the response, and thus helps sharpen and validate conclusions drawn from data. It provides verification that results are repeatable and establishes the limits of that repeatability.

9. It is not necessary to know exactly how the entire budget will be spent. Experimentation in engineering is usually sequential, and this requires some decisions to be made in the middle of the study. Although some may think that this is improper from a scientific/statistical point of view, it is only practical to base the design of later stages on the results of earlier stages.

Section 4

1. If you regard student as a blocking variable, then this would be a randomized complete block experiment. Otherwise, it would just be a completely randomized experiment (with a full factorial structure).

2. (a) Label the 24 runs as follows:

   Labels        Level of A   Level of B   Level of C
   1, 2, 3           1            1            1
   4, 5, 6           2            1            1
   7, 8, 9           1            2            1
   10, 11, 12        2            2            1
   13, 14, 15        1            1            2
   16, 17, 18        2            1            2
   19, 20, 21        1            2            2
   22, 23, 24        2            2            2

   Use the following coding for the test labels: table number 01–04 for test label 1, table number 05–08 for test label 2, . . . , table number 93–96 for test label 24. Move through Table B.1 choosing two digits at a time, ignoring numbers between 97 and 00 and those corresponding to test labels that have already been picked. Order the tests in the same order that their corresponding two-digit numbers are picked from the table. Using this method, and starting from the upper-left corner of the table, the order would be 3, 4, 24, 16, 11, 2, 9, 12, 17, 8, 21, 1, 13, 7, 18, 5, 20, 14, 19, 15, 22, 23, 6, 10. (b) Treat day as a blocking variable, and run each of the 8 factor-level combinations once on each day. Blocking allows comparisons among the factor-level combinations to be made within each day. If blocking were not used, differences among days might cause variation in the response which would cloud comparisons among the factor-level combinations.
…Loading combination once, and each Paper/Loading combination once.

5. This is an incomplete block experiment.

Section 5

1. A cause-and-effect diagram may be useful for representing a complex system in a relatively simple and visual way. It enables people to see how the components of the system interact, and may help identify areas which need the most attention/improvement.

Chapter 3

Section 1

1. One choice of intervals for the frequency table and histogram is 65.5–66.4, 66.5–67.4, . . . , 73.5–74.4. For this choice, the frequencies are 3, 2, 9, 5, 8, 6, 2, 3, 2; the relative frequencies are .075, .05, .225, .125, .2, .15, .05, .075, .05; the cumulative relative frequencies are .075, .125, .35, .475, .675, .825, .875, .95, 1. The plots reveal a fairly symmetric, bell-shaped distribution.

2. The plots show that the depths for 200 grain bullets are larger and have less variability than those for the 230 grain bullets.

3. (a) There are no obvious patterns. (b) The differences are −15, 0, −20, 0, −5, 0, −5, 0, −5, 20, −25, −5, −10, −20, and 0. The dot diagram shows that most of the differences are negative and "truncated" at zero. The exception is the tenth piece of equipment, with a difference of 20. This point does not fit in with the shape of the rest of the differences, so it is an outlier. Since most of the differences are negative, the bottom bolt generally required more torque than the top bolt.

Section 2

1. (a) For the lengthwise sample: Median = .895, Q(.25) = .870, Q(.75) = .930, Q(.37) = .880. For the crosswise sample: Median = .775, Q(.25) = .690, Q(.75) = .800, Q(.37) = .738. (b) On the whole, the impact strengths are larger and more consistent for lengthwise cuts. Each method also produced an unusual impact strength value (outlier). (c) The nonlinearity of the Q–Q plot indicates that the overall shapes of these two data sets are not the same. The lengthwise cuts had an unusually large data point ("long right tail"), whereas the crosswise cuts had an unusually small data point ("long left tail"). Without these two outliers, the data sets would have similar shapes, since the rest of the Q–Q plot is fairly linear.

2. Use the (i − .5)/n quantiles for the smaller data set. The plot coordinates are: (.370, .907), (.520, 1.22), (.650, 1.47), (.920, 1.70), (2.89, 2.45), (3.62, 5.89). (See the code sketch following the Section 3 answers below.)

3. The first 3 plot coordinates are: (65.6, −2.33), (65.6, −1.75), (66.2, −1.55). The normal plot is quite linear, indicating that the data are very bell-shaped.

4. Theoretical Q–Q plotting allows you to roughly check to see if a data set has a shape that is similar to some theoretical distribution. This can be useful in identifying a theoretical (probability) model to represent how the process is generating data. Such a model can then be used to make inferences (conclusions) about the process.

Section 3

1. For the lengthwise cuts: x̄ = .919, Median = .895, R = .310, IQR = .060, s = .088. For the crosswise cuts: x̄ = .743, Median = .775, R = .430, IQR = .110, s = .120. The sample means and medians show that the center of the distribution for lengthwise cuts is higher than the center for crosswise cuts. The sample ranges, interquartile ranges, and sample standard deviations show that there is less spread in the lengthwise data than in the crosswise data.

2. These values are statistics. They are summarizations of two samples of data, and do not represent exact summarizations of larger populations or theoretical (long-run) distributions.

4. In the first case, the sample mean and median increase by 1.3, but none of the measures of spread change; in the second case, all of the measures double.
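The quantile convention used in the Section 2 and Section 3 answers sets Q((i − .5)/n) equal to the i-th smallest data value, with linear interpolation in between. A small sketch of ours (the sample values below are made up and merely stand in for the exercise's data):

```python
def Q(data, p):
    """The text's quantile convention: Q((i - .5)/n) is the i-th
    smallest value; linear interpolation is used in between."""
    xs = sorted(data)
    n = len(xs)
    if p <= 0.5 / n:
        return xs[0]
    if p >= (n - 0.5) / n:
        return xs[-1]
    h = n * p + 0.5                 # fractional (1-based) index
    i = int(h)
    return xs[i - 1] + (h - i) * (xs[i] - xs[i - 1])

# Hypothetical impact-strength-like sample:
sample = [0.68, 0.70, 0.74, 0.77, 0.78, 0.80, 0.80, 0.81, 0.83, 1.11]
print(Q(sample, 0.25), Q(sample, 0.50), Q(sample, 0.75))

# Q-Q plot coordinates against a second, smaller sample, using the
# (i - .5)/n quantiles of the smaller set as in answer 2 of Section 2:
small = [0.87, 0.88, 0.90, 0.93, 0.95, 1.02]
n = len(small)
coords = [(Q(sample, (i - 0.5) / n), v)
          for i, v in enumerate(sorted(small), start=1)]
print(coords)
```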
…ŷ = 5.0317 + .14167x1 with a corresponding R² of .594. The straight-line least squares equation for x2 is ŷ = 7.3233 − .01694x2 with a corresponding R² of .212. The slopes in these one-variable linear equations are the same as the corresponding slopes in the two-variable equation from (a). The R² value in (a) is the sum of the R² values from the two one-variable linear equations.

Section 3

1. (a) Labeling x1 as A and x2 as B, a1 = −.643, a2 = −.413, a3 = 1.057, b1 = .537, b2 = −.057, b3 = −.480, ab11 = −.250, ab12 = −.007, ab13 = .257, ab21 = −.210, ab22 = .013, ab23 = .197, ab31 = .460, ab32 = −.007, ab33 = −.453. The fitted interactions ab31 and ab33 are large (relative to fitted main effects), indicating that the effect on y of changing NaOH from 9% to 15% depends on the Time (non-parallelism in the plot). It would not be wise to use the fitted main effects alone to summarize the data, since there may be an importantly large interaction. (b) ŷ11 = 6.20, ŷ12 = 5.61, ŷ13 = 5.18, ŷ21 = 6.43, ŷ22 = 5.84, ŷ23 = 5.41, ŷ31 = 7.90, ŷ32 = 7.31, ŷ33 = 6.88. Like the plot in part (c) and unlike the plot in (f) of Exercise 2 in Section 4.2, the fitted values for each level of B (x2) must produce parallel plots; no interactions are allowed. However, unlike parts (c) and (f) of that exercise, the current model allows these fitted values to be nonlinear in x1 (factorial models are generally more flexible than lines, curves, and surfaces). (c) R² = .914. The plots of residuals versus Time and residuals versus ŷ both have patterns; these show that the "main effects only" model is not accounting for the apparent interaction between the two factors. Even though R² is higher than both of the models in Exercise 2 of Section 4.2, this model does not seem to be adequate.

2. (a) ȳ··· = 20.792, a2 = .113, b2 = −13.807, ab22 = −.086, c2 = 7.081, ac22 = −.090, bc22 = −6.101, abc222 = .118. Other fitted effects can be obtained by appropriately changing the signs of the above. The simplest possible interpretation is that Diameter, Fluid, and their interaction are the only effects on Time. (b) ȳ··· = 2.699, a2 = .006, b2 = −.766, ab22 = −.003, c2 = .271, ac22 = −.003, bc22 = −.130, abc222 = .007. Yes, but the Diameter × Fluid interaction still seems to be important. (c) In standard order, the fitted values are 3.19, 3.19, 1.66, 1.66, 3.74, 3.74, 2.20, 2.20, and R² = .974. For a model with all factorial effects fit to ln y, R² = .995. (d) b1 − b2 = 1.532 ln(sec) decrease; divide the .188 raw drain time by e^1.532 to get the .314 drain time. This suggests that (.188 drain time)/(.314 drain time) = e^1.532 = 4.63; the theory predicts this ratio to be 7.78. (See the code sketch following the Section 4 answers below.)

3. Interpolation, and possibly some cautious extrapolation, is only possible using surface-fitting methods. In many engineering situations, an "optimal" setting of quantitative factors is sought. This can be facilitated by interpolation (or extrapolation) using a surface-fitting model.

Section 4

1. Transforming data can sometimes make relationships among variables simpler. Sometimes nonlinear relationships can be made linear, or factors and response can be transformed so that there are no interactions among the factors. Transformations can also potentially make the shape of a distribution simpler, allowing the use of statistical models that assume a particular distributional shape (such as the bell-shaped normal distribution).

2. In terms of the raw response, there will be interactions, since x1 and x2 are multiplied together in the power law. The suggested plot of raw y versus x1 will have different slopes for different values of x2. This means that the effect of changing x1 depends on the setting of x2, which is one way to define an interaction. In terms of the log of y, there will not be interactions, since x1 and x2 appear additively in the equation for ln y. Therefore, the suggested plot of ln y versus x1 will have the same slope for all values of x2. This means that the effect of changing x1 does not depend on the setting of x2 (there are no interactions).
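The fitted effects quoted in answer 2 of Section 3 are simple signed averages of the 2×2×2 cell means. A sketch of that arithmetic (ours, with hypothetical cell means standing in for the exercise's data):

```python
from itertools import product

def factorial_effects(cell_means):
    """Fitted 2^3 factorial effects in the book's notation: the grand
    mean ybar and a2, b2, ab22, c2, ac22, bc22, abc222.  cell_means is
    indexed [a][b][c] with index 0 = level 1 and index 1 = level 2."""
    def contrast(which):
        total = 0.0
        for a, b, c in product((0, 1), repeat=3):
            sign = 1
            for level, used in zip((a, b, c), which):
                if used:                       # factor enters the effect
                    sign *= 1 if level == 1 else -1
            total += sign * cell_means[a][b][c]
        return total / 8.0

    return {
        "grand mean": contrast((0, 0, 0)),
        "a2": contrast((1, 0, 0)), "b2": contrast((0, 1, 0)),
        "ab22": contrast((1, 1, 0)), "c2": contrast((0, 0, 1)),
        "ac22": contrast((1, 0, 1)), "bc22": contrast((0, 1, 1)),
        "abc222": contrast((1, 1, 1)),
    }

# Hypothetical cell means y[a][b][c]:
y = [[[3.2, 3.7], [1.7, 2.2]], [[3.2, 3.8], [1.6, 2.2]]]
print(factorial_effects(y))
```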
Section 5

1. A deterministic model is used to describe a situation where the outcome can be almost exactly predicted if certain variables are known. A stochastic/probabilistic model is used in situations where it is not possible to predict the exact outcome. This may happen when important variables are unknown, or when no known deterministic theory can describe the situation. An example of a deterministic model is the classical Economic Order Quantity (EOQ) model for inventory control. Given constant rate of demand R, order quantity X, ordering cost P, and per-unit holding cost C, the total cost per time period is Y = P(R/X) + C(X/2).

Chapter 5

Section 1

1. (b) 4.1; 1.136.

2. (a) X has a binomial distribution with n = 10 and p = 1/3. Use equation (5.3) with n = 10 and p = 1/3. f(0)–f(10) are .0173, .0867, .1951, .2601, .2276, …

   First Item   Second Item    x̄     s²    Probability
   2            3              2.5    .5      1/15
   2            4₁             3.0   2.0      1/15
   2            4₂             3.0   2.0      1/15
   2            5              3.5   4.5      1/15
   2            6              4.0   8.0      1/15
   3            4₁             3.5    .5      1/15
   3            4₂             3.5    .5      1/15
   3            5              4.0   2.0      1/15
   3            6              4.5   4.5      1/15
   4₁           4₂             4.0    0       1/15
   4₁           5              4.5    .5      1/15
   4₁           6              5.0   2.0      1/15
   4₂           5              4.5    .5      1/15
   4₂           6              5.0   2.0      1/15
   5            6              5.5    .5      1/15

   x            2     3     4     5     6
   P(X = x)    1/6   1/6   2/6   1/6   1/6

   s²            0     .5    2     4.5   8
   P(S² = s²)   1/15  6/15  5/15  2/15  1/15
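The enumeration in this table is mechanical to reproduce. In the sketch below (ours), the population {2, 3, 4, 4, 5, 6} is inferred from the table's rows, and the two 4s are treated as distinct items exactly as the 4₁/4₂ labels do:

```python
from itertools import combinations
from fractions import Fraction
from statistics import mean, variance
from collections import Counter

population = [2, 3, 4, 4, 5, 6]   # C(6, 2) = 15 equally likely pairs

dist_xbar, dist_s2 = Counter(), Counter()
for pair in combinations(population, 2):
    dist_xbar[mean(pair)] += Fraction(1, 15)
    dist_s2[variance(pair)] += Fraction(1, 15)   # sample variance s^2

for s2, prob in sorted(dist_s2.items()):
    print(f"s2 = {float(s2)}: {prob}")
# Probabilities print in lowest terms: 1/15, 2/5 (= 6/15),
# 1/3 (= 5/15), 2/15, 1/15 -- matching the marginal table above.
```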
…For p = .7, f(0)–f(5) are .00, .03, .13, .31, .36, .17; µ = 3.5; σ = 1.02. For p = .9, f(0)–f(5) are .00, .00, .01, .07, .33, .59; µ = 4.5; σ = .67.

5. Binomial distribution: n = 8, p = .20. (a) .147 (b) .797 (c) np = 1.6 (d) np(1 − p) = 1.28 (e) 1.13

6. Geometric distribution: p = .20. (a) .08 (b) .59 (c) 1/p = 5 (d) (1 − p)/p² = 20 (e) 4.47

7. For λ = .5, f(0), f(1), . . . are .61, .30, .08, .01, .00, .00, . . . ; µ = λ = .5; σ = √λ = .71. For λ = 1.0, f(0), f(1), . . . are .37, .37, .18, .06, .02, .00, .00, . . . ; µ = 1.0; σ = 1.0. For λ = 2.0, f(0), f(1), . . . are .14, .27, .27, .18, .09, .04, .01, .00, .00, . . . ; µ = 2.0; σ = 1.41. For λ = 4.0, f(0), f(1), . . . are .02, .07, .15, .20, .20, .16, .10, .06, .03, .01, .00, .00, . . . ; µ = 4.0; σ = 2.0. (Answers 5–7 are spot-checked in the code sketch below.)

8. (a) .323 (b) .368

9. (a) .0067 (b) Y ∼ Binomial(n = 4, p = .0067); .00027

10. Probability is a mathematical system used to describe random phenomena. It is based on a set of axioms, and all the theory is deduced from the axioms. Once a model is specified, probability provides a deductive process that enables predictions to be made based on the theoretical model. Statistics uses probability theory to describe the source of variation seen in data. Statistics tries to create realistic probability models that have (unknown) parameters with meaningful interpretations. Then, based on observed data, statistical methods try to estimate the unknown parameters as accurately and precisely as possible. This means that statistics is inductive, using data to draw conclusions about the process or population from which the data came. Neither is a subfield of the other. Just as engineering uses calculus and differential equations to model physical systems, statistics uses probability to model variation in data. In each case the mathematics can stand alone as theory, so calculus is not a subfield of engineering and probability is not a subfield of statistics. Conversely, statistics is not a subfield of probability just as engineering is not a subfield of calculus; many simple statistical methods do not require the use of probability, and many engineering techniques do not require calculus.

11. A relative frequency distribution is based on data. A probability distribution is based on a theoretical model for probabilities. Since probability can be interpreted as long-run relative frequency, a relative frequency distribution approximates the underlying probability distribution, with the approximation getting better as the amount of data increases.

Section 2

1. (a) 2/9 (c) .5 (d) F(x) = 0 for x ≤ 0, F(x) = (10x − x²)/9 for 0 < x < 1, and F(x) = 1 for x ≥ 1 (e) 13/27; .288

2. (a) .2676 (b) .1446 (c) .3393 (d) .3616 (e) .3524 (f) .9974 (g) 1.28 (h) 1.645 (i) 2.17

3. (a) .7291 (b) .3594 (c) .2794 (d) .4246 (e) .6384 (f) 48.922 (g) 44.872 (h) 7.056

4. (a) .4938 (b) Set µ to the midpoint of the specifications: µ = 2.0000; .7888 (c) .0002551

5. (a) P(X < 500) = .3934; P(X > 2000) = .1353 (b) Q(.05) = 51.29; Q(.90) = 2,302.58

6. (b) Median = 68.21 × 10⁶ (c) Q(.05) = 21.99 × 10⁶; Q(.95) = 128.9 × 10⁶

Section 3

1. Data that are being generated from a particular distribution will have roughly the same shape as the density of the distribution, and this is more true for larger samples. Probability plotting provides a sensitive graphical way of deciding if the data have the same shape as a theoretical probability …
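Several of the Section 1 facts above are one-liners to verify. A sketch of ours using only the standard library (the binomial pmf below is the text's equation (5.3)):

```python
from math import comb, exp, factorial, sqrt

def binom_pmf(x, n, p):
    """Binomial pmf, the text's equation (5.3)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

# Answer 2 of Section 1: n = 10, p = 1/3 gives .0173, .0867, .1951, ...
print([round(binom_pmf(x, 10, 1 / 3), 4) for x in range(5)])

# Answer 5: binomial n = 8, p = .20 has mean np = 1.6 and standard
# deviation sqrt(np(1 - p)) = sqrt(1.28) = 1.13.
print(8 * 0.2, round(sqrt(8 * 0.2 * 0.8), 2))

# Answer 7, lambda = 2.0: f(0), f(1), ... = .14, .27, .27, .18, .09, .04
print([round(poisson_pmf(x, 2.0), 2) for x in range(6)])

# Answer 6: a geometric distribution with p = .20 has mean 1/p = 5
# (checked by brute-force summation of x * f(x)).
p = 0.20
print(round(sum(x * (1 - p) ** (x - 1) * p for x in range(1, 1000)), 2))
```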
…(e) f_T(t) = 2e^−t(1 − e^−t) for t ≥ 0 and f_T(t) = 0 otherwise; E(T) = 1.5.

Section 5

1. mean = .75 in.; standard deviation = .0037.

2. (a) The propagation of error formula gives 1.4159 × 10^−6. (b) The lengths.

3. (a) 13/27; .0576 (b) X̄ ∼ Normal with mean 13/27 and standard deviation .0576. (c) .3745 (d) .2736 (e) 13/27, .0288; X̄ ∼ Normal with mean 13/27 and standard deviation .0288; .2611; .5098.

4. .7888, .9876, 1.0000

5. Rearrange the relationship in terms of g to get g = 4π²L/τ². Take the given length and period to be approximately equal to the means of these input random variables. To use the propagation of error formula, the partial derivatives need to be evaluated at the means of the input random variables: ∂g/∂L = 4π²/τ² = 6.418837 and ∂g/∂τ = −8π²L/τ³ = −25.8824089. Then applying equation (5.59), Var(g) ≈ (6.418837)²(.0208)² + (−25.8824089)²(.1)² = 6.7168 ft²/sec⁴, so the approximate standard deviation of g is √6.7168 = 2.592 ft/sec². The precision in the period measurement is the principal limitation on the precision of the derived g, because its term (variance × squared partial derivative) contributes much more to the propagation of error formula than the length's term. (A numerical sketch of this calculation, and of the Chapter 6 interval formula, follows answer 1 of Section 2 below.)

Chapter 6

Section 1

1. [6.3, 7.9] ppm is a set of plausible values for the mean. The method used to construct this interval correctly contains the true mean in 95% of repeated applications. This particular interval either contains the mean or it doesn't (there is no probability involved). However, because the method is correct 95% of the time, we might say that we have 95% confidence that it was correct this time.

2. (a) [111.0, 174.4] (b) [105.0, 180.4] (c) 167.4 (d) 174.4 (e) [111.0, 174.4] ppm is a set of plausible values for the mean aluminum content of samples of recycled PET plastic from the recycling pilot plant at Rutgers University. The method used to construct this interval correctly contains means in 90% of repeated applications. This particular interval either contains the mean or it doesn't (there is no probability involved). However, because the method is correct 90% of the time, we might say that we have 90% confidence that it was correct this time.

3. n = 66

4. (a) x̄ = 4.6858 and s = .02900317 (b) [4.676, 4.695] mm (c) [4.675, 4.696] mm. This interval is wider than the one in (b). To increase the confidence that µ is in the interval, you need to make the interval wider. (d) The lower bound is 4.677 mm. This is larger than the lower endpoint of the interval in (b). Since the upper endpoint here is set to ∞, the lower endpoint must be increased to keep the confidence level the same. (e) To make a 99% one-sided interval, construct a 98% two-sided interval and use the lower endpoint. This was done in part (a), and the resulting lower bound is 4.676. This is smaller than the value in (d); to increase the confidence, the interval must be made "wider." (f) [4.676, 4.695] mm is a set of plausible values for the mean diameter of this type of screw as measured by this student with these calipers. The method used to construct this interval correctly contains means in 98% of repeated applications. This particular interval either contains the mean or it doesn't (there is no probability involved). However, because the method is correct 98% of the time, we might say that we have 98% confidence that it was correct this time.

Section 2

1. H0: µ = 200; Ha: µ > 200; z = −2.98; p-value ≈ .9986. There is no evidence that the mean aluminum content for samples of recycled plastic is greater than 200 ppm.
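Answer 5 of Chapter 5, Section 5 above walks through a propagation-of-error calculation for g = 4π²L/τ². A numerical sketch of ours is below; the mean length and period (5.0 ft and 2.48 sec) are back-solved from the quoted partial derivatives, so treat those two numbers as assumptions:

```python
from math import pi, sqrt

# Propagation of error for g = 4*pi^2*L/tau^2.
L, tau = 5.0, 2.48            # assumed means, in ft and sec
sd_L, sd_tau = 0.0208, 0.1    # input standard deviations

dg_dL = 4 * pi**2 / tau**2            # about 6.42
dg_dtau = -8 * pi**2 * L / tau**3     # about -25.9

var_g = (dg_dL * sd_L) ** 2 + (dg_dtau * sd_tau) ** 2
print(round(var_g, 2), round(sqrt(var_g), 2))   # about 6.72 and 2.59

# The period term dominates the variance, as the answer notes:
print(round((dg_dL * sd_L) ** 2, 4), round((dg_dtau * sd_tau) ** 2, 2))
```

Likewise, the Chapter 6, Section 1 intervals all have the form x̄ ± (quantile)·s/√n. The sketch below (ours) reproduces an interval like answer 4(b) from the quoted summary statistics; the sample size n = 50 is hypothetical, since the answer reports only x̄ and s:

```python
from math import sqrt
from statistics import NormalDist

def two_sided_limits(xbar, s, n, conf):
    """xbar +/- z * s / sqrt(n), the large-sample two-sided interval."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half = z * s / sqrt(n)
    return round(xbar - half, 3), round(xbar + half, 3)

# With answer 4(a)'s summary statistics and a hypothetical n = 50, a
# 98% interval reproduces the [4.676, 4.695] mm quoted in answer 4(b):
print(two_sided_limits(4.6858, 0.02900317, 50, 0.98))
```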
2. (a) H0: µ = .500; Ha: µ ≠ .500; z = 1.55; p-value ≈ .1212. There is some (weak) evidence that the mean punch height is not .500 in. (The rounded x̄ and s given produce a z that is quite a bit different from what the exact values produce. x̄ = .005002395 and s = .002604151, computed from the raw data, produce z = 1.85, and a p-value of 2(.0322) = .0644.) (b) [.49990, .50050] (c) If uniformity of stamps on the same piece of material is important, then the standard deviation (spread) of the distribution of punch heights will be important (in addition to the mean).

3. The mean of the punch heights is almost certainly not exactly equal to .50000000 inches. Given enough data, a hypothesis test would detect this as a "statistically significant" difference (and produce a small p-value). What is practically important is whether the mean is "close enough" to .500 inches. The confidence interval in part (b) answers this more practical question.

4. H0: µ = 4.70; Ha: µ ≠ 4.70; z = −3.46; p-value ≈ .0006. There is very strong evidence that the mean measured diameter differs from nominal.

5. Although there is evidence that the mean is not equal to nominal, the test does not say anything about how far the mean is from nominal. It may be "significantly" different from nominal, but the difference may be practically unimportant. A confidence interval is what is needed for determining how far the mean is from nominal.

Section 3

1. The normal distribution is bell-shaped and symmetric, with fairly "short" tails. The confidence interval methods depend on this regularity. If the distribution is skewed or prone to outliers/extreme observations, the normal-theory methods will not properly take this into account. The result is an interval whose real confidence level is different from the nominal value (and often lower than the nominal value).

2. (a) Independence among assemblies; normal distribution for top-bolt torques. (b) H0: µ = 100; Ha: µ ≠ 100; t = 4.4; p-value ≈ .001. There is strong evidence that the mean torque is not 100 ft lb. (c) [104.45, 117.55] (d) Independence among assemblies; normal distribution for differences. (e) H0: µd = 0; Ha: µd < 0 (where differences are Top − Bottom); t = −2.10 on 14 df; .025 < p-value < .05. (f) [−13.49, 1.49]

3. (a) [−0.0023, .0031] mm (b) H0: µd = 0; Ha: µd ≠ 0; z = .24; p-value = .8104. There is no evidence of a systematic difference between calipers. (c) The confidence interval in part (a) contains zero; in fact, zero is near the middle of the interval. This means that zero is a very plausible value for the mean difference—there is no evidence that the mean is not equal to zero. This is reflected by the large p-value in part (b).

4. (a) The data within each sample must be iid normal, and the two distributions must have the same variance σ². One way to check these assumptions is to normal plot both data sets on the same axes. For such small sample sizes, it is difficult to definitively verify the assumptions. But the plots are roughly linear with no outliers, indicating that the normal part of the assumption may be reasonable. The slopes are similar, indicating that the common variance assumption may be reasonable. (b) Label the Treaded data Sample 1 and the Smooth data Sample 2. H0: µ1 − µ2 = 0; Ha: µ1 − µ2 ≠ 0; t = 2.49; p-value is between .02 and .05. This is strong evidence of a difference in mean skid lengths. (c) [2.65, 47.35] (d) [2.3, 47.7]

Section 4

1. (a) [9.60, 37.73] (b) 57.58 (c) H0: σT²/σS² = 1; Ha: σT²/σS² ≠ 1; f = .64 on 5,5 df; p-value > .50 (d) [.36, 1.80] (The p-value bracket here, and the one in answer 4(b) of Section 3, are checked in the code sketch below.)

2. (a) [7.437, ∞) (b) [44.662, ∞) (c) Top and bottom bolt torques for a given piece are probably not sensibly modeled as independent.

Section 5

1. (a) Conservative method: [.562, .758]; .578. Other method: [.567, .753]; .582. (b) H0: p = .55; Ha: …
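The bracketed p-values quoted above can be pinned down with an F or t cdf. A sketch of ours using SciPy (assuming it is available); the 12 degrees of freedom in the second check are an assumption, since answer 4(b) of Section 3 quotes only the bracket:

```python
from scipy.stats import f, t

# Section 4, answer 1(c): observed F = .64 on (5, 5) degrees of freedom.
# A two-sided p-value doubles the smaller tail probability:
p_F = 2 * min(f.cdf(0.64, 5, 5), f.sf(0.64, 5, 5))
print(round(p_F, 2))              # comfortably above .50, as stated

# Section 3, answer 4(b): t = 2.49 with an assumed 12 df:
print(round(2 * t.sf(2.49, 12), 3))   # about .028, inside (.02, .05)
```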
Chapter 7

is .05, the error probability associated with the individual intervals must be made smaller than .05. This is equivalent to increasing the individual confidences (above 95%), which makes the intervals wider.

Section 3

1. (a) .03682; it is larger. (b) .05522; it is larger.

2. (a) k2* = 2.88, so the intervals in numerical order of the four vans are: [1.0878, 1.0982], [.9610, .9714], [1.0147, 1.0240], [.9970, 1.0074]. (b) Δ = .0097; Δ = .0092. These are larger than the earlier Δ's. The confidence level here is a simultaneous one, while the earlier level was an individual one. The intervals here are doing a more ambitious job and must therefore be wider.

Section 4

1. (a) Small, since some means differ by more than the Δ there. (b) SSTr = .285135, MSTr = .071284, df = 4; SSE = .00423, MSE = .000423, df = 10; SSTot = .289365, df = 14; f = 168.52 on 4,10 df; p-value < .001. R² = .985.

2. (a) Small, since some sample means differ by more than the Δ's there. (b) SSTr = .034134, MSTr = .011378, df = 3; SSE = .000175, MSE = .000013, df = 13; SSTot = .034308, df = 16; f = 847 on 3,13 df; p-value < .001.
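Summaries like 1(b) can be rebuilt from the sums of squares alone. The short sketch below is illustrative (it is not part of the original answer key) and uses only the numbers quoted in 1(b).

```python
# Illustrative sketch only: rebuilding the one-way ANOVA summary of 1(b)
# from the sums of squares and degrees of freedom quoted above.
SSTr, df_tr = 0.285135, 4
SSE, df_e = 0.00423, 10

MSTr = SSTr / df_tr                # .071284
MSE = SSE / df_e                   # .000423
f = MSTr / MSE                     # 168.52
SSTot = SSTr + SSE                 # ANOVA identity: SSTot = SSTr + SSE
R2 = SSTr / SSTot                  # coefficient of determination, .985

print(f"MSTr = {MSTr:.6f}, MSE = {MSE:.6f}")
print(f"f = {f:.2f} on {df_tr},{df_e} df, R^2 = {R2:.3f}")
```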
3. (a) To check that the µi's are normal, make a normal plot of the ȳi's. To check that the εi's are normal, make a normal plot of the residuals. (Normal plotting each sample individually will not be very helpful because the sample sizes are so small.) Both plots are roughly linear, giving no evidence that the one-way random effects model assumptions are unreasonable. (b) SSTr = 9310.5, MSTr = 1862.1, df = 5; SSE = 194.0, MSE = 16.2, df = 12; SSTot = 9504.5, df = 17; f = 115.18 on 5,12 df; p-value < .001. σ̂ = 4.025 measures variation in y from repeated measurements of the same rail; σ̂τ = 24.805 measures the variation in y from differences among rails. (c) [3.46, 13.38]

4. (a) Unstructured multisample data could also be thought of as data from one factor with r levels. In many situations, the specific levels of the factor included in the study are the levels of interest. For example, in comparing three drugs, the factor might be called "Treatment." It might have four levels: Drug 1, Drug 2, Drug 3, and Control. The experimenter is interested in comparing the specific drugs used in the study to each other and to the control. Sometimes the specific levels of the factor are not of interest in and of themselves, but only because they may represent (perhaps they are a random sample of) many different possible levels that could have been used in the study. A random effects analysis is appropriate in this situation. For an example, see part (b). (b) If there are many technicians, and five of these were randomly chosen to be in the study, then interest is in the variation among all technicians, not just the five chosen for the study. (c) σ̂ = .00155 in.; σ̂τ = .00071 in.
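The variance component estimates in 3(b) follow from the mean squares via σ̂ = √MSE and σ̂τ = √((MSTr − MSE)/m). The sketch below is illustrative only; m = 3 measurements per rail is an inferred value, consistent with the 5 and 12 degrees of freedom quoted above.

```python
# Illustrative sketch only. m = 3 measurements per rail is an inferred value,
# consistent with df = 5 (6 rails) and df = 12 = 6(m - 1) quoted in 3(b).
from math import sqrt

MSTr, MSE, m = 1862.1, 16.2, 3

sigma_hat = sqrt(MSE)                    # repeatability: about 4.025
sigma_tau_hat = sqrt((MSTr - MSE) / m)   # rail-to-rail: about 24.805

print(f"sigma_hat     = {sigma_hat:.3f}")
print(f"sigma_tau_hat = {sigma_tau_hat:.3f}")
```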
Section 5

1. (a) Center line x̄ = 21.0, UCLx̄ = 22.73, LCLx̄ = 19.27. Center line R = 1.693, UCLR = 4.358, no LCLR. (b) Center line s = .8862, UCLs = 2.276, no LCLs. (c) 1.3585; 1.3654; sp = 1.32. (d) Center line x̄ = 21.26, UCLx̄ = 23.61, LCLx̄ = 18.91. Center line R = 2.3, UCLR = 5.9202, no LCLR. (e) Center line x̄ = 21.26, UCLx̄ = 23.62, LCLx̄ = 18.90. Center line s = 1.21, UCLs = 3.10728, no LCLs.
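The limits in 1(a) can be rebuilt from the grand average and R̄ using standard control chart constants. In this illustrative sketch (not from the text), a subgroup size of n = 3 is assumed; it is consistent with the constants implied by the limits quoted above.

```python
# Illustrative sketch only. The subgroup size n = 3 (and hence A2 and D4)
# is an assumption consistent with the limits quoted in 1(a).
xbarbar, Rbar = 21.0, 1.693
A2, D4 = 1.023, 2.574     # standard control chart constants for n = 3

print(f"xbar chart: UCL = {xbarbar + A2 * Rbar:.2f}, "
      f"LCL = {xbarbar - A2 * Rbar:.2f}")            # 22.73 and 19.27
print(f"R chart:    UCL = {D4 * Rbar:.3f}, no LCL")  # 4.358; no D3 for n = 3
```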
2. (a) R̄/d2 = 4.052632/2.326 = 1.742318 × .001 in.; s̄/c4 = 1.732632/.9400 = 1.843226 × .001 in. (b) For the R chart, Center line R = 2.326(1.843226) = 4.287344 × .001 in., UCLR = 4.918(1.843226) = 9.064985 × .001 in., and there is no lower control limit. For the s chart, Center line s = 1.732632 × .001 in., UCLs = 2.089(1.732632) = 3.619468 × .001 in., and there is again no lower control limit. Neither chart indicates that the short-term variability of the process (as measured by σ) was unstable. (c) Use Center line x̄ = 11.17895 × .001 in. above nominal, LCLx̄ = 11.17895 − 3(1.843226)/√5 = 8.706 × .001
in. above nominal and UCLx̄ = 11.17895 + 3(1.843226)/√5 = 13.65189 × .001 in. above nominal. x̄ from sample 16 comes close to the upper control limit, but overall the process mean seems to have been stable over the time period. (d) The x̄'s from samples 9 and 16 seem to have "jumped" from the previous x̄. The coil change may be causing this jump, but it could also be explained by common cause variation. It may be something worth investigating. (e) Assuming that the mean could be adjusted (down), you need to look at one of the estimates of σ to answer this question about individual thread lengths. (You should not use control limits to answer this question!) If µ could be made equal to zero, then (assuming normally distributed thread lengths), almost all of the thread lengths would fall in the interval ±3σ. Using the estimate of σ based on s̄ from part (a), this can be approximated by 3(1.843226) = 5.53 × .001 in. It does seem that the equipment is capable of producing thread lengths within .01 in. of nominal. If the equipment were not capable of meeting the given requirements, the company could invest in better equipment. This would "permanently" solve the problem, but it might not be feasible from a financial standpoint. A second option is to inspect the bolts and remove the ones that are not within .01 in. of nominal. This might be cheaper than investing in new equipment, but it will do nothing to improve the quality of the process in the long run. A third option is to study the process (through experimentation) to see if there might be some way of reducing the variability without making a large capital investment.
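The capability reasoning in part (e) amounts to comparing 3σ̂ with the .010 in. requirement on individual thread lengths. A small illustrative sketch (not from the text; units of .001 in., with the standard n = 5 constant c4 used in part (a)):

```python
# Illustrative sketch only; units are .001 in. throughout, and c4 = .9400 is
# the standard n = 5 constant used in part (a).
sbar, c4 = 1.732632, 0.9400

sigma_hat = sbar / c4      # 1.843226, the s-based estimate from part (a)
spread = 3 * sigma_hat     # half-width of the "almost all" range, about 5.53

print(f"3 * sigma_hat = {spread:.2f} (x .001 in.)")
print("within .010 in. of nominal?", spread < 10.0)
```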
3. Control charting is used to monitor a process and to detect changes (lack of stability) in that process. The focus is on detecting changes in a meaningful parameter such as µ, σ, p, or λ. Points that plot out of control are a signal that the process is not stable at the standard parameter value (for a standards given chart) or was not stable at any parameter value (for a retrospective chart). Improvement comes from identifying assignable causes and taking action to eliminate them. Reducing variability increases the quality of the process output.

4. Shewhart control charts do not physically control a process in the sense of guiding or adjusting it. They only monitor the process, trying to detect process instability. There is an entirely different field dedicated to "engineering control"; this field uses feedback techniques that manipulate process variables to guide some response. Shewhart control charts simply monitor a response, and are not intended to be used to make "real time" adjustments.

5. Out-of-control points should be investigated. If the causes of such points can be determined and eliminated, this will reduce long-term variation from the process. There must be an active effort among those involved with the process to improve the quality; otherwise, control charts will do nothing to improve the process.

6. Control limits for an x̄ chart are set so that, under the assumption that the process is stable, it would be very unusual for an x̄ to plot outside the control limits. The chart recognizes that there will be some variation in the x̄'s even if the process is stable, and prevents overadjustment by allowing the x̄'s to vary "randomly" within the control limits. If the process mean or standard deviation changes, x̄'s will be more likely to plot outside of the control limits, and sooner or later the alarm will sound. This provides an opportunity to investigate the cause of the change, and hopefully take steps to prevent it from happening again. In the long run, such troubleshooting may improve the process by making it less variable.

Section 6

1. (a) Center line p̂ = .02, UCLp̂ = .0438, no LCLp̂. (b) Center line p̂ = .0234, UCLp̂ = .0491, no LCLp̂.

2. Center line û = .138 for all i; UCLûi = .138 + 3√(.138/ki)

LCLp̂ = .072 − 3√(.072(1 − .072)/40) = −.05061158 < 0, so there is no lower control limit, while UCLp̂ = .072 + 3√(.072(1 − .072)/40) = .1946116. There is no evidence that the process fraction nonconforming was unstable (changing) over the time period studied.
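The p̂ chart limits just quoted are simple to verify. The following sketch is illustrative only (not part of the original answers) and uses the pooled value p = .072 and sample size n = 40 given above.

```python
# Illustrative sketch only, using the pooled fraction nonconforming p = .072
# and sample size n = 40 quoted above.
from math import sqrt

p, n = 0.072, 40
delta = 3 * sqrt(p * (1 - p) / n)

print(f"UCL = {p + delta:.7f}")                          # about .1946116
print(f"p - delta = {p - delta:.8f} < 0, so there is no LCL")
```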
5. If different data collectors have different ideas of exactly what a "nonconformance" is, then the data collected will not be consistent. A stable process may look unstable (according to the c chart) because of these inconsistencies.

6. It may indicate that the chart was not applied properly. For example, if hourly samples of size m = 4 are collected, it may or may not be reasonable to use a retrospective x̄ chart with m = 4. If the 4 items sampled are from 4 different machines, 3 of which are stable at some mean and the 4th stable at a different mean, then the sample ranges and standard deviations will be inflated. This will make the control limits on the x̄ chart too wide. Also, the x̄'s will show very little variation about a center line somewhere between the two means. This is all a result of the fact that each sample is really coming from four different processes. Four different control charts should be used.

Chapter 8

Section 1

14.53, ab11 = .033, ab12 = −5.40, ab13 = 5.37, ab21 = −2.13, ab22 = −.567, ab23 = 2.70, ab31 = 2.104, ab32 = 5.97, ab33 = −8.07. (e) 18.24. No. (f) Use (ai − ai′) ± 22.35. (g) Use (ai − ai′) ± 26.88.

Section 2

1. (a) Ê ± .014. B and C main effects, BC interaction. (b) sFE = .0314 with 20 df; close to sp = .0329. (c) Using few effects model: [3.037, 3.091]. Using general method: [3.005, 3.085].

2. (a) Only the main effect for A plots "off the line." (b) Since the D main effect is almost as big (in absolute value) as the main effect for A, you might choose to include it. For this model, the fitted values are (in standard order): 16.375, 39.375, 16.375, 39.375, 16.375, 39.375, 16.375, 39.375, −4.125, 18.875, −4.125, 18.875, −4.125, 18.875, −4.125, 18.875. (c) Set A low (unglazed) and D high (no clean). [0, 9.09].
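The sixteen fitted values in 2(b) come from a grand mean plus A and D main effects only. In the sketch below (illustrative only; the grand mean 17.625 and fitted effects +11.5 and −10.25 are hypothetical values inferred to be consistent with the list above), the values are generated in Yates standard order.

```python
# Illustrative sketch only. The grand mean 17.625 and fitted effects +11.5
# (A at its high level) and -10.25 (D at its high level) are hypothetical
# values inferred to be consistent with the sixteen fitted values above.
ybar, a_eff, d_eff = 17.625, 11.5, -10.25

fitted = []
for i in range(16):                 # 2^4 combinations in Yates standard order
    sA = 1 if i % 2 == 1 else -1    # A alternates fastest: -, +, -, + ...
    sD = 1 if i >= 8 else -1        # D changes slowest: eight -, then eight +
    fitted.append(ybar + a_eff * sA + d_eff * sD)

print(fitted)   # 16.375, 39.375, ... then -4.125, 18.875, ... as listed above
```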
3. (a) ȳ···· = 3.594, a2 = −.806, b2 = .156, ab22 = −.219, c2 = −.056, ac22 = −.031, bc22 = .081, abc222 = .031, d2 = −.056, ad22 = −.156, bd22 = .006, abd222 = −.119, cd22 = −.031, acd222 = −.056, bcd222 = −.044, abcd2222 = .006. (b) It appears that only the main effect for A is detectably larger than the rest of the effects, since the point for a2 is far away from the rest of the fitted effects. (c) To minimize y, use A(+) (monks cloth) and B(+) (treatment Y).

Section 3

1. Since A ↔ BCDE, if both are large but opposite in sign, their estimated sum will be small.

2. (a) 8.23, .369, .256, −.056, .344, −.069, −.081, −.093, −.406, .181, .269, −.344, −.094, −.156, −.069, .019. (b) .312. The sums α2 + βγδε2222, γ2 + αβδε2222, δ2 + αβγε2222, and αβδ222 + γε22 are detectable. Simplest explanation: A, C, D main effects and CE interaction are responsible for these large sums. (c) A (+), C (+), D (−), and E (−). The abc combination, which did have the largest observed bond strength.

3. (b) (1), ad, bd, ab, cd, ac, bc, abcd. Estimated sums of effects: 3.600, −.850, .100, −.250, −.175, −.025, −.075, −.025. (c) The estimate of α2 + βγδ222 plots off the line. Still, one might conclude that this is due to the main effect for A, but the conclusion here would be a little more tentative.
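The estimated (sums of) effects quoted in these answers come from the Yates algorithm: repeated passes of pairwise sums and differences over responses listed in standard order. The function below is an illustrative sketch (not the book's code); the final division by 2^p follows the convention that yields the grand mean followed by the fitted effects, and the 2² demonstration data are hypothetical.

```python
# Illustrative sketch of the Yates algorithm: p passes of pairwise sums then
# pairwise differences over responses in standard order, with a final
# division by 2^p. The demonstration data for a 2^2 below are hypothetical.
def yates(y):
    n = len(y)                  # must be a power of 2
    p = n.bit_length() - 1
    for _ in range(p):
        y = ([y[i] + y[i + 1] for i in range(0, n, 2)]
             + [y[i + 1] - y[i] for i in range(0, n, 2)])
    return [v / n for v in y]

print(yates([10.0, 14.0, 12.0, 20.0]))   # responses for (1), a, b, ab
# -> [14.0, 3.0, 2.0, 1.0]: grand mean, A effect, B effect, AB interaction
```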
Section 4

1. The advantage of fractional factorial experiments is that the same number of factors can be studied using fewer experimental runs. This is important when there are a large number of factors and/or experimental runs are expensive. The disadvantage is that there will be ambiguity in the results; only sums of effects can be estimated. The advantage of using a complete factorial experiment is that all means can be estimated, so all effects can be estimated.

2. It will be impossible to separate main effects from two-factor interactions. You would hope that any interactions are small compared to main effects; the results of the experiment can then be (tentatively) summarized in terms of main effects. (If all interactions are really zero, then it is possible to estimate all of the main effects.) Looking at Table 8.35, the best possible resolution is 3 (at most).

3. Those effects (or sums of effects) that are nearly zero will have corresponding estimates that are "randomly" scattered about zero. If all of the effects are nearly zero, then one might expect the estimates from the Yates algorithm (excluding the one that includes the grand mean) to be bell-shaped around zero. A normal plot of these estimates would then be roughly linear. However, if there are effects (or sums of effects) that are relatively far from zero, the corresponding estimates will plot away from the rest (off the line), and may be considered more than just random noise. The principle of "sparsity of effects" says that in most situations, only a few of the many effects in a factorial experiment are dominant, and their estimates will then plot off the line on a normal plot.
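Answer 3 describes normal-plotting the Yates estimates. As an illustrative sketch (not the book's method verbatim), the coordinates of such a plot can be computed for the 15 non-grand-mean estimates from Section 3, exercise 2(a).

```python
# Illustrative sketch only: coordinates for a normal plot of the 15 estimates
# from Section 3, exercise 2(a), with the grand-mean estimate (8.23) dropped.
from scipy.stats import norm

est = [.369, .256, -.056, .344, -.069, -.081, -.093,
       -.406, .181, .269, -.344, -.094, -.156, -.069, .019]

for i, e in enumerate(sorted(est), start=1):
    q = norm.ppf((i - 0.5) / len(est))   # standard normal quantile
    print(f"{q:+.2f}  {e:+.3f}")         # plot estimate (vertical) vs. q
```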
4. (a) I ↔ ABCDF ↔ ABCEG ↔ DEFG (b) ABDF, ABEG, CDEFG (c) +, +; −, − (d) That only A, F, and their interaction are important in describing y.

5. 3.264

6. (a) I ↔ ABCE ↔ BCDF ↔ ADEF (b) −, −; +, − (c) .489
estimate all of the main effects.) Looking at Ta-
−14.47 on 7 df, p-value < .001; or f = 209.24
ble 8.35, the best possible resolution is 3 (at most).
on 1,7 df, p-value < .001. (e) [2744.8, 2787.0]
3. Those effects (or sums of effects) that are nearly (f) [2699.2, 2832.6] (g) 2698.5
zero will have corresponding estimates that are
“randomly” scattered about zero. If all of the ef-
fects are nearly zero, then one might expect the
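The regression ANOVA figures in 1(h) follow from SSR and SSE alone. A short illustrative sketch (not part of the original answer key):

```python
# Illustrative sketch only: the simple linear regression ANOVA quantities of
# 1(h), rebuilt from SSR and SSE.
SSR, df_r = 4676798, 1
SSE, df_e = 26941, 6

MSE = SSE / df_e               # about 4490
SSTot = SSR + SSE              # 4,703,739 by the identity SSTot = SSR + SSE
f = (SSR / df_r) / MSE         # about 1041.6 (quoted above as 1041.58)

print(f"MSE = {MSE:.0f}, SSTot = {SSTot}, f = {f:.2f} on {df_r},{df_e} df")
```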
Index
Note: boldface page numbers indicate definitions.
2-factor interaction of factors, 553, 573
2^p factorial studies
  balanced, 580–587
  confidence intervals for, 587–590
  special devices for, 187–190
  without replication, 577–580
2^(p−1) fractional factorials
  choosing, 596–597
  data analysis for, 603–607
  determining the alias structure of, 600–601
2^(p−q) fractional factorials
  choosing, 613–614
  data analysis for, 618–620
  determining the alias structure of, 614
3-factor interaction, 573
Accelerated life test, 62
Accelerated life testing, 210
Accompanying variable, 39
Accurate measurement, 17
Alias structure, 601
Allocation of resources, 46
Alternative hypothesis, 347
Analysis of variance (ANOVA)
  multiple linear regression, 691–696
  one-way, 478
    F test, 479–482
    identity and table, 482–487
  random effects models and analyses, 487–491
    estimator of the treatment variance, 492
    inference for variance components, 491–495
  simple linear regression, 669–672, 673
Analytical study, 6
ANOVA, see Analysis of variance (ANOVA)
Arithmetic mean, 93
"As past data" Shewhart control charts, 500
Assignable causes, 498
Attributes data
  bar charts and plots for, 107–112
  numerical summarization of, 104–107
Axioms, 728
Axioms of probability theory, 729–735, 733
Balanced 2^p factorial studies
  confidence intervals for, 587–590
  fitting and checking simplified models, 580–587
Balanced data, 172
Balanced data confidence limits
  for a 2^p effect, 587
  for one-way random effects model, 491, 493
Baseline variation, 44, 498
Bathtub curve, 763
Bell-shaped histogram, 73
Bimodal histogram, 72
Binomial distribution, 233–236
  mean of, 236
  variance of, 236
Bivariate data, 11
  check sheet, 30–31
Block of experimental units, 41
Blocking variables, 40
Bonferroni inequality, 470, 471–472
Book paper thickness, measurements, 16
Boxplots, 81
Brittleness, measuring, 14
Brownlee's stack loss data, 150
Bunching, 530
Burn-in period, 763
Calibration, 17
Capability, 95
Capability of a process, 389
Carryover effects, 290
Categorical data, 8–9 (see also qualitative data)
Causality, 5
Cause-and-effect diagram, 60
Census, 33
Center of mass, 93
Central limit effect, 316–321
Central Limit Theorem, 316
Changes in level, 530
Charts for demerits, 538
Chebyschev's Theorem, 97
Chi-squared distribution, 386
Coefficient of determination, 130–132, 143, 173, 486–487
  simple linear regression, sum of squares, 670
Coefficients for a quadratic
  matrix of quadratic, 703
  vector of linear, 703
Combination, 754–757
Comparative study, 43–44, 374
Complete block plans, 630–631
Complete randomization, 48–49
Completely randomized experiments, 47–50
Conceptual population, 8
Concomitant variable, 39
Conditional densities, geometry of, 298
Conditional distributions, for continuous random variables, 297–300
Conditional probability, 739–743
Conditional probability density function, for continuous random variables, 297
Conditional probability function, for discrete random variables, 284–285
Confidence intervals, 335
  factorial effects, 554–562
  interpretation of, 342
  large-sample, 335–344
  P-R method of simultaneous, 472
  Tukey method, 474–477
Confidence intervals for means, 461–464
Confidence levels
  individual and simultaneous, 469–470
  interpretation, 341–342
  of prediction intervals, 424
  of tolerance intervals, 424–425
Confidence limits
  effects in a 2^p factorial, 575
  mean system response, 662, 686
  one-way model, 461–462
  one-way random effects model, 491, 493
  simultaneous (in regression), 664, 688
  slope parameter, 659
  Tukey simultaneous for main effects, 562–563
  variance of one-way model, 457
Continuous data
  likelihood function, 774–781
  log likelihood function, 775
Continuous distributions, means and variances for, 249–250
Continuous random variable, 222, 244–263, 292–300
  conditional distributions, 297–300
  conditional probability density function, 297
  independent, 299
  joint probability density, 292
  marginal probability density, 295
  mean or expected value of, 249
  standard deviation of, 250
  variance of, 250
Continuous variables, 9
Contour plot, 701–702
Control chart patterns, 527–531 (see also Shewhart control charts)
Control charts, see Shewhart control charts
Control limits, 498
  setting, 499
Controlled variables, 38, 40
Correlation vs. causation, 137
Correlations
  sample, 129–130, 137
  squared, 131
Count data, descriptive statistics for, 104–112 (see also attributes data)
Count variables, 27
Cumulative probability functions, 226–228, 247–249
Curve fitting by least squares, 149–158
Cyclical patterns, 147, 528
Daniel, Cuthbert, 577–578
Data
  attributes
    bar charts and plots for, 107–112
    numerical summarization of, 104–107
  analysis for 2^(p−1) fractional studies, 603–607
  balanced, 172
  bivariate, 11, 123
  Brownlee's stack loss, 150
  categorical, 8
  continuous
    likelihood function, 774–781
    log likelihood function, 775
  count, 104–112 (see also attributes data)
  discrete
    likelihood function, 765–774
    log likelihood function, 776
  engineering
    collection of, 26–32
    preparing to collect, 56–64
  measurement, 104
  mixed
    likelihood function, 779
    log likelihood function, 779
  multivariate, 11
  numerical, 9
  overfitting of, 160
  paired, 11
  qualitative, 8–9, 104–112 (see also attributes data)
  quantitative, 9
  repeated measures, 11
  types of, 8–11
  univariate, 11
  variables, 104
Data analysis, 19–23
Data collection
  physical preparation, 63
  problem definition, 57–60
  recording, 30–32
  sampling, 28–30
  study definition, 60–63
Data structures, types of, 11–14
Data vectors, influence of in regression, 159
Decreasing force-of-mortality (DFM) distribution, 761
Defining relation, 602, 615
Definition of effects, 552–554, 572–575
Descriptive statistics, 104–112
Design resolution, 620–625, 621
Deterministic models, 202–203
Diagram, cause-and-effect, 60
Diagrams
  dot, 66–68, 74, 76
  Ishikawa, 61
  Pareto, 58
Direct measure, 26
Discrete data
  likelihood function, 765–774
  log likelihood function, 766
Discrete probability distributions, 228–232
  binomial, 232–237
  geometric, 237–240
Discrete probability functions, 223–226
Discrete probability models, 221
Discrete random variable, 222
  conditional distributions of, 283–284
  conditional probability function, 284–285
  expected value of, 228
  independence of, 289
  mean of, 228
  standard deviation of, 230
  variance of, 230
Disjoint events, 731
Distribution
  center of mass, 93
  first moment, 93
Distributional shapes
  engineering interpretations of, 72–73
  terminology for, 73
Distributions
  binomial, 233–236
  chi-squared, 386
  decreasing force-of-mortality (DFM), 761
  exponential, 257–260
  Gaussian, 251
  geometric, 237–239
  increasing force-of-mortality (IFM), 761
  joint, 279
  marginal, 282
  normal, 251
  null, 348
  Poisson, 240–243
  probability, 222, 251–257
  reference, 348
  Snedecor F, 391
  standard normal, 88
  Studentized extreme deviate, 472
  Studentized range, 475
  Weibull, 260–263, 761
Documentation, 31–32
Dot diagram, 66–68, 74, 76, 81, 94
Dummy variables, 706
  regression analysis, 713
Effect sparsity, 577
Effective experimentation, principles for, 38–47
Eigenvalues, 703
Empirical models, 161
Empty event, 731
Engineering data
  collection of, 26–32
  preparing to collect, 56–64
Engineering data-generating process, stability of, 496
Engineering statistics, 2
Enumerative studies
  judgment-based method, 33
  sampling, 33–37
  systematic method, 33
Enumerative study, 6
Equal variances, 651
Equations
  choice and interpretation of appropriate, 151
  normal, 126, 141
  polynomial, 141
Error sum of squares, 484
Estimation of all r individual mean responses, 471
Events, 729
  dependent, 741
  disjoint, 731
  empty, 731
  independence of, 741–743
  mutually exclusive, 731
Experimental study, 5
Experimental variables, 38
Experiments, completely randomized, 47–50
Exponential distributions, 257–260, 258
Extraneous variables, 40
  blocking, 40
  control of, 40
  randomization, 40
Extrapolation, caution concerning, 158–159
Factorial effects, individual confidence intervals for, 554–562
Factorial inference methods, 705
Factorial interactions, interpretation of, 183–184
Factorial notation, special 2^p, 187–188
Factorial study
  2^p
    confidence intervals for, 587–590
    special devices for, 187–190
    without replication, 577–580
  balanced 2^p factorial studies
    confidence intervals for, 587–590
    fitting and checking simplified models, 580–587
  complete, 12
  fractional, 13
Factorials, importance of two-level, 190
Factors
  2-factor interaction of in three-way factorial studies, 573
  3-factor interaction of in three-way factorial studies, 573
  fitted interaction of, 169, 182
  fitted main effect of, 166, 182
  interaction of in p-way factorials, 553
  levels, 12
  main effect of in p-way factorials, 552
  main effect of in three-way factorial studies, 572
Few-effects model, confidence intervals for balanced 2^p studies, 587–590
Few-effects sample variance, 582
  alternative formula for, 583
First (or lower) quartile, 80
First moment, 93
Fishbone diagram, see cause-and-effect and Ishikawa diagrams
Fitted effects, normal-plotting of, 577–580
Fitted factorial effects, 162–190
  2-factor studies, 163–171
  three-way and higher factorials, 178–184
Fitted interaction of factors, 169, 182, 183
Fitted main effect of factors, 166, 182
Fitted quadratics, interpreting, 701–702
Fitted value, 129
Flowcharts, 58
Force-of-mortality function, 760–764
Formal inference, methods of, 361
Fractional factorial experimentation, 591
Fractional factorial studies
  aliases, 601
  blocks, 625–631
  complete block plans, 630–631
  design resolution, 620–625
  experiment size, 631
  fundamental issues, 596–597
  generator, 601, 614
  observations about, 592–596
Fractional factorials, saturated, 622
Frequency histogram, 72
Frequency table, 70–71, 74
Functions
  conditional probability, 284–285
  conditional probability density, 297
  cumulative probability, 226–228, 247–249
  discrete probability, 223–226
  force-of-mortality, 760–764
  geometric cumulative probability relationship for, 237
  hazard (see force-of-mortality)
  joint probability, 279
  likelihood
    continuous and mixed data, 774–781
    discrete data, 765–774
    mixed, 779
  linear, 698
  log likelihood
    continuous data, 775
    discrete data, 766
    mixed, 779
  marginal probability, 282
  probability density, 245–247
    conditional, 297
  probability, 223–228
    conditional, 284–285
    cumulative, 226–228, 247–249
    discrete, 223–226
    geometric cumulative, 237
    joint, 279
    marginal, 282
    mathematically valid, 225
    standard normal cumulative, 252
Mean
  discrete random variables, 228
  general linear combinations
    intervals for, 464–469
  geometric distribution, 239
  inference methods for, 441
  linear combinations, 464
    confidence limits for, 465
    confidence limits for 2-way factorial, 556
  paired differences
    inference for, 368–374
  Poisson distributions, 241
  population, 98
  process, 101
  random variables
    linear combinations of, 307–310
  sample, 163, 178
    linear combination of, 464
    simultaneous two-sided confidence limits for, 664, 688
  Weibull, 260
Mean system response
  confidence limits for, 662, 686
  estimate of all r individual, 471
  inference for, 661–666, 685–689
Measurement
  accuracy, 15, 17
  blind, 29
  calibration of a system, 17
  methods of, 14–19
  precision, 15, 16–17
  unbiased, 17
  validity, 15
  variation/error, 15
Measurement data, 104 (see also variables data)
Measures of location, 92–95
Measures of spread, 95–98
Median, 80
Memoryless property, 259
Methods of formal inference, 361
MINITAB, 102–103, 138–139, 142–143, 150–151, 156, 170, 306, 402, 486, 561, 672–674, 704
Mixed data
  likelihood function, 779
  log likelihood function, 779
Multimodal histogram, 72
Multiple linear regression
  ANOVA, 691–696
  prediction intervals, 689–691
  prediction limits, alternative formula for, 689
  tolerance intervals, 689–691
Multiple linear regression model, 675–682
  fitted values for, 677
  residuals for, 677
  standardized residuals for, 682
Multiple linear regression program, 141
Multiple regression
  common residual plots in, 155
  goal of, 152
  interpreting fitted coefficients from, 151
Multiple regression methods, 650
Multiple regression model, factorial analyses, 705–719
Multiplication principle, 751–752
Multiplication rule of probability, 740, 742
Multisample studies, 478–479
  notational convention for, 480
  pooled estimate of variance for, 455–457
Multivariate data, 11
Mutually exclusive events, 731
Nonrandom variation, 498
Normal distribution
  inference for the variance of, 386–391
  prediction intervals, 414–419
Normal distributions, 651
  with a common variance, 378
Normal equations, 126
Normal plot, 88 (see also probability plot)
Normal probability distributions, 251–257
Normal probability paper, 90
Normal probability plots, 264–269
Normal-plotting of fitted effects, 577–580
  interpreting, 579
Null distribution, 348 (see also reference distribution)
Null hypothesis, 347
Numerical data, 9 (see also quantitative data)
  discrete, 9
Observational study, 5
Observed level of significance, 349
One-way methods in p-way factorials, 569–571
One-way methods in two-way factorials, 547–551 (see also inference methods)
One-way model, 447
  assumptions, 447
  confidence limits for, 461–462
  confidence limits for variance of, 457
  fitted values for, 448
  residuals for, 449
  standardized residuals for, 459
  statement in symbols, 447
One-way random effects model, 488
  balanced data confidence limits, 491, 493
Operating characteristic curve, 331
Operational definitions, 27
Outcomes, 729
p charts, 518–523 (see also Shewhart control charts)
Paired data, 11
Paired differences, inference for the mean of, 368–374
Paired distortion measurements, 11
Parallel systems, 747–749
Parallel traces, 168
Parameters, 98
  fitting or estimating, 20
Pareto diagram, 58
Permutations, 753–754
Peterson, Dr. Frank, 20
Physical preparation, 63
Pillai-Ramachandran method, 471–474 (see also P-R method)
Pilot plants, 62
Plots
  attributes data, 107–112
  boxplots, 81–85
  common residual, 155
  contour, 701–702
  cube, 180–181
  cycles in, 101
  cyclical pattern of, 147
  exponential probability, 270–273
  half normal, 577
  interaction, 165
  interpreting fitted quadratic, 701–702
  multiple regression
    common residual, 155
  normal, 88
  normal probability, 264–269
  probability, 88
  Q-Q, 85–92, 86
  quantile, 80–81
  residual, 135–136
  scatterplots, 74–75
  stem-and-leaf, 68–70, 74
  summary statistics, 99–101
  theoretical Q-Q, 88
Total sum of squares, 484
Transformation
  logarithmic, 193
  power, 193
Transformations
  multifactor studies, 194–202
  multiple samples, 193–194
  single sample, 192–193
Transmission of variance formulas, 311 (see also propagation of error)
Treatment sum of squares, 484
Trend charts, 75–77 (see also run charts)
Truncated histogram, 73
Tukey's method, 474–477, 479
  comparing main effects, 562–567
Two proportions, inference for the difference between, 407–413
Two-level factorials, standard fractions of, 591–611
Two-way factorial notation, 551–554
Type I error, 353
  probability, 354
Type II error, 353
  probability, 354
u charts, 523–527 (see also Shewhart control charts)
Uniform histogram, 73
Univariate data, 11
Variables
  accompanying, 39
  behavior of, 75
  blocking, 40
  concomitant, 39
  continuous, 9
  controlled, 38, 40
  count, 27
  dummy, 706
    for regression analysis, 713
  experimental, 38
  extraneous, 40
    handling extraneous, 40–43
  factors, 12
  iid random, 291
  independent continuous random, 299
  independent discrete random, 289
  jointly continuous random, 292–297
  jointly discrete random, 279–283
  linear combinations of random, 307–310
  lurking, 5
  managed, 38
  plots against process, 101–102
  qualitative, 27
  random, 221–223, 222
  response, 38
  supervised, 38
  taxonomy of, 38–39
Variables vs. attributes control charting, 538
Variables data, 104
Variance, 95
  population, 99
  sample, 96
  transforming to stabilize, 194
Variances
  equal, 651
  estimate for multiple linear regression model, 675–682
  estimate for simple linear regression model, 651–658
Variation, 498
Wear-out, 763
Weibull distributions, 260–263, 761
  mean of, 260
  median of, 261
  variance of, 260
Weibull paper, 276–277
Weibull probability density, 260
Whiskers, 82
Wood joint strength, measuring, 15
Yates algorithm, 188–189
  reverse, 189
Yates standard order, 188
IMPORTANT
If the CD-ROM packaging has been opened,
the purchaser cannot return the book for a refund!
The CD-ROM is subject to this agreement!
Notice to Users: Do not install or use the CD-ROM until you have read and agreed to
this agreement. You will be bound by the terms of this agreement if you install or use the
CD-ROM or otherwise signify acceptance of this agreement. If you do not agree to the
terms contained in this agreement, do not install or use any portion of this CD-ROM.
License: The material in the CD-ROM (the “Software”) is copyrighted and is pro-
tected by United States copyright laws and international treaty provisions. All rights
are reserved to the respective copyright holders. No part of the Software may be re-
produced, stored in a retrieval system, distributed (including but not limited to over
the www/Internet), decompiled, reverse engineered, reconfigured, transmitted, or tran-
scribed, in any form or by any means—electronic, mechanical, photocopying, record-
ing, or otherwise—without the prior written permission of Duxbury Press, an imprint
of Brooks/Cole (the “Publisher”). Adopters of Vardeman and Jobe’s Basic Engineering
Data Collection and Analysis may place the Software on the adopting school’s network
during the specific period of adoption for classroom purposes only in support of that
text. The Software may not, under any circumstances, be reproduced and/or downloaded
for sale. For further permission and information, contact Brooks/Cole, 511 Forest Lodge
Road, Pacific Grove, CA 93950.
U.S. Government Restricted Rights: The enclosed Software and associated documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 for DoD contracts, paragraphs (c)(1) and (2) of the Commercial Computer Software-Restricted Rights clause in the FAR (48 CFR 52.227-19) for civilian agencies, or in other comparable agency clauses. The proprietor of the enclosed software and associated documentation is Brooks/Cole, 511 Forest Lodge Road, Pacific Grove, CA 93950.
Limited Warranty: The warranty for the media on which the Software is provided
is for ninety (90) days from the original purchase and valid only if the packaging for
the Software was purchased unopened. If, during that time, you find defects in the
workmanship or material, the Publisher will replace the defective media. The Publisher
provides no other warranties, expressed or implied, including the implied warranties
of merchantability or fitness for a particular purpose, and shall not be liable for any
damages, including direct, special, indirect, incidental, consequential, or otherwise.
For Technical Support:
Voice: 1-800-423-0563 E-mail: [email protected].