Introduction to Engineering Statistics and Six Sigma
Statistical Quality Control and Design of Experiments and Systems
Theodore T. Allen
Theodore T. Allen, PhD
Department of Industrial Welding and Systems Engineering
The Ohio State University
210 Baker Systems
1971 Neil Avenue
Columbus, OH 43210-1271
USA
There are four main reasons why I wrote this book. First, six sigma consultants
have taught us that people do not need to be statistical experts to gain benefits from
applying methods under such headings as “statistical quality control” (SQC) and
“design of experiments” (DOE). Some college-level books intertwine the methods
and the theory, potentially giving the mistaken impression that all the theory has to
be understood to use the methods. As far as possible, I have attempted to separate
the information necessary for competent application from the theory needed to
understand and evaluate the methods.
Second, many books teach methods without sufficiently clarifying the context
in which the method could help to solve a real-world problem. Six sigma, statistics
and operations-research experts have little trouble making the connections with
practice. However, many other people do have this difficulty. Therefore, I wanted
to clarify better the roles of the methods in solving problems. To this end, I have
re-organized the presentation of the techniques and included several complete case
studies conducted by myself and former students.
Third, I feel that much of the “theory” in standard textbooks is rarely presented
in a manner to answer directly the most pertinent questions, such as:
Should I use this specific method or an alternative method?
How do I use the results when making a decision?
How much can I trust the results?
Admittedly, standard theory (e.g., analysis of variance decomposition,
confidence intervals, and defining relations) does have a bearing on these
questions. Yet the widely accepted view that the choice to apply a method is
equivalent to purchasing a risky stock investment has not been sufficiently
clarified. The theory in this book is mainly used to evaluate in advance the risks
associated with specific methods and to address these three questions.
Fourth, there is an increasing emphasis on service sector and bioengineering
applications of quality technology, which is not fully reflected in some of the
alternative books. Therefore, this book constitutes an attempt to include more
examples pertinent to service-sector jobs in accounting, education, call centers,
health care, and software companies.
In addition, this book can be viewed as an attempt to build on and refocus material
in other books and research articles, including: Harry and Schroeder (1999) and
Pande et al. (2000), which comprehensively cover six sigma; Montgomery (2001) and
Besterfield (2001), which focus on statistical quality control; and Box and Draper
(1987), Dean and Voss (1999), Fedorov and Hackl (1997), Montgomery (2000),
Myers and Montgomery (2001), Taguchi (1993), and Wu and Hamada (2000),
which focus on design of experiments.
At least 50 books related to the “six sigma movement” are written each year,
and these books (among other things) encourage people to use SQC and DOE techniques.
Most of these books are intended for a general business audience; few provide
advanced readers the tools to understand modern statistical method development.
Equally rare are precise descriptions of the many methods related to six sigma as
well as detailed examples of applications that yielded large-scale returns to the
businesses that employed them.
Unlike many popular books on “six sigma methods,” this material is aimed at
the college- or graduate-level student rather than at the casual reader, and includes
more derivations and analysis of the related methods. As such, an important
motivation of this text is to fill a need for an integrated, principled, technical
description of six sigma techniques and concepts that can provide a practical guide
both in making choices among available methods and applying them to real-world
problems. Professionals who have earned “black belt” and “master black belt”
titles may find material more complete and intensive here than in other sources.
Rather than teaching methods as “correct” and fixed, later chapters build the
optimization and simulation skills needed for the advanced reader to develop new
methods with sophistication, drawing on modern computing power. Design of
experiments (DOE) methods provide a particularly useful area for the development
of new methods. DOE is sometimes called the most powerful six sigma tool.
However, the relationship between the mathematical properties of the associated
matrices and bottom-line profits has been only partially explored. As a result, users
of these methods too often must base their decisions and associated investments on
faith. An intended unique contribution of this book is to teach DOE in a new way,
as a set of fallible methods with understandable properties that can be improved,
while providing new information to support decisions about using these methods.
Two recent trends assist in the development of statistical methods. First,
dramatic improvements have occurred in the ability to solve hard simulation and
optimization problems, largely because of advances in computing speeds. It is now
far easier to “simulate” the application of a chosen method to test likely outcomes
of its application to a particular problem. Second, an increased interest in six sigma
methods and other formal approaches to making businesses more competitive has
increased the time and resources invested in developing and applying new
statistical methods.
This latter development can be credited to consultants such as Harry and
Schroeder (1999), Pande et al. (2000), and Taguchi (1993), visionary business
leaders such as General Electric’s Jack Welch, as well as to statistical software that
permits non-experts to make use of the related technologies. In addition, there is a
push towards closer integration of optimization, marketing, and statistical methods
into “improvement systems” that structure product-design projects from beginning
to end.
Statistical methods are relevant to virtually everyone. Calculus and linear
algebra are helpful, but not necessary, for their use. The approach taken here is to
minimize explanations requiring knowledge of these subjects, as far as possible.
This book is organized into three parts. For a single introductory course, the first
few chapters in Parts One and Two could be used. More advanced courses could be
built upon the remaining chapters. At The Ohio State University, I use each part for
a different 11-week course.
References
Box GEP, Draper NR (1987) Empirical Model-Building and Response Surfaces.
Wiley, New York
Besterfield D (2001) Quality Control. Prentice Hall, Columbus, OH
Breyfogle FW (2003) Implementing Six Sigma: Smarter Solutions® Using
Statistical Methods, 2nd edn. Wiley, New York
Dean A, Voss DT (1999) Design and Analysis of Experiments. Springer, Berlin
Heidelberg New York
Fedorov V, Hackl P (1997) Model-Oriented Design of Experiments. Springer,
Berlin Heidelberg New York
Harry MJ, Schroeder R (1999) Six Sigma, The Breakthrough Management
Strategy Revolutionizing The World’s Top Corporations. Bantam
Doubleday Dell, New York
Montgomery DC (2000) Design and Analysis of Experiments, 5th edn. John Wiley
& Sons, Inc., Hoboken, NJ
Montgomery DC (2001) Statistical Quality Control, 4th edn. John Wiley & Sons,
Inc., Hoboken, NJ
Myers RH, Montgomery DC (2001) Response Surface Methodology, 5th edn. John
Wiley & Sons, Inc., Hoboken, NJ
Pande PS, Neuman RP, Cavanagh R (2000) The Six Sigma Way: How GE,
Motorola, and Other Top Companies are Honing Their Performance.
McGraw-Hill, New York
Taguchi G (1993) Taguchi Methods: Research and Development. In Konishi S
(ed.) Quality Engineering Series, vol 1. The American Supplier Institute,
Livonia, MI
Wu CFJ, Hamada M (2000) Experiments: Planning, Analysis, and Parameter
Design Optimization. Wiley, New York
Acknowledgments
I thank my wife, Emily, for being wonderful. I thank my son, Andrew, for being
extremely cute. I also thank my parents, George and Jodie, for being exceptionally
good parents. Both Emily and Jodie provided important editing and conceptual
help. In addition, Sonya Humes and editors at Springer Verlag including Kate
Brown and Anthony Doyle provided valuable editing and comments.
Gary Herrin, my advisor, provided valuable perspective and encouragement.
Also, my former Ph.D. students deserve high praise for helping to develop the
conceptual framework and components for this book. In particular, I thank Liyang
Yu for proving by direct test that modern computers are able to optimize
experiments evaluated using simulation, which is relevant to the last four chapters
of this book, and for much hard work and clear thinking. Also, I thank Mikhail
Bernshteyn for his many contributions, including deeply involving my research
group in simulation optimization, sharing in some potentially important
innovations in multiple areas, and bringing technology in Part II of this book to the
marketplace through Sagata Ltd., in which we are partners. I thank Charlie Ribardo
for teaching me many things about engineering and helping to develop many of the
welding-related case studies in this book. Waraphorn Ittiwattana helped to develop
approaches for optimization and robust engineering in Chapter 14. Navara
Chantarat played an important role in the design of experiments discoveries in
Chapter 18. I thank Deng Huang for playing the leading role in our exploration of
variable fidelity approaches to experimentation and optimization. I am grateful to
James Brady for developing many of the real case studies and for playing the
leading role in our related writing and concept development associated with six
sigma, relevant throughout this book.
Also, I would like to thank my former M.S. students, including Chaitanya
Joshi, for helping me to research the topic of six sigma. Chetan Chivate also
assisted in the development of text on advanced modeling techniques (Chapter 16).
Also, Gavin Richards and many other students at The Ohio State University played
key roles in providing feedback, editing, refining, and developing the examples and
problems. In particular, Mike Fujka and Ryan McDorman provided the student
project examples.
In addition, I would like to thank all of the individuals who have supported this
research over the last several years. These have included first and foremost Allen
Miller, who has been a good boss and mentor, and also Richard Richardson and
David Farson who have made the welding world accessible; it has been a pleasure
to collaborate with them. Jose Castro, John Lippold, William Marras, Gary Maul,
Clark Mount-Campbell, Philip Smith, David Woods, and many others contributed
by believing that experimental planning is important and that I would some day
manage to contribute to its study.
Also, I would like to thank Dennis Harwig, David Yapp, and Larry Brown both
for contributing financially and for sharing their visions for related research.
Multiple people from Visteon assisted, including John Barkley, Frank Fusco, Peter
Gilliam, and David Reese. Jane Fraser, Robert Gustafson, and the Industrial and
Systems Engineering students at The Ohio State University helped me to improve
the book. Bruce Ankenman, Angela Dean, William Notz, Jason Hsu, and Tom
Santner all contributed.
Also, editors and reviewers played an important role in the development of this
book and publication of related research. First and foremost of these is Adrian
Bowman of the Journal of the Royal Statistical Society Series C: Applied
Statistics, who quickly recognized the value of the EIMSE optimal designs (see
Chapter 13). Douglas Montgomery of Quality and Reliability Engineering
International and an expert on engineering statistics provided key encouragement
in multiple instances. In addition, the anonymous reviewers of this book provided
much direct and constructive assistance including forcing the improvement of the
examples and mitigation of the predominantly myopic, US-centered focus.
Finally, I would like to thank six people who inspired me, perhaps
unintentionally: Richard DeVeaux and Jeff Wu, both of whom taught me design of
experiments according to their vision, Max Morris, who forced me to become
smarter, George Hazelrigg, who wants the big picture to make sense, George Box,
for his many contributions, and Khalil Kabiri-Bamoradian, who taught and teaches
me many things.
Contents
1 Introduction ............................................................................................. 1
1.1 Purpose of this Book ...................................................................... 1
1.2 Systems and Key Input Variables................................................... 2
1.3 Problem-solving Methods .............................................................. 6
1.3.1 What Is “Six Sigma”? ....................................................... 7
1.4 History of “Quality” and Six Sigma ............................................. 10
1.4.1 History of Management and Quality............................... 10
1.4.2 History of Documentation and Quality ........................... 14
1.4.3 History of Statistics and Quality ..................................... 14
1.4.4 The Six Sigma Movement .............................................. 17
1.5 The Culture of Discipline ............................................................. 18
1.6 Real Success Stories..................................................................... 20
1.7 Overview of this Book ................................................................. 21
1.8 References .................................................................................... 22
1.9 Problems....................................................................................... 22
Introduction
In Section 1.2, several terms are defined in relation to generic systems. These
definitions emphasize the diversity of the possible application areas. People in all
sectors of the world economy are applying the methods in this book and similar
books. These sectors include health care, finance, education, and manufacturing.
Next, in Section 1.3, problem-solving methods are defined, and the definition of six
sigma is given in Section 1.3.1 in terms of a method and a few specific
principles. The related history is reviewed in Section 1.4. Finally, an overview
of the entire book is presented, building on the associated definitions and concepts.
Figure 1.1. Diagram of a generic system with inputs x1, x2, ..., xm and outputs y1, y2, ..., yq
Assume that every system of interest is associated with at least one output
variable of prime interest to you or your team in relation to the effects of input
variable changes. We will call this variable a “key output variable” (KOV).
Often, this will be the monetary contribution of the system to some entity’s profits.
Other KOVs are variables that are believed to have a reasonably strong predictive
relationship with at least one other already established KOV. For example, the
most important KOV could be an average of other KOVs.
“Key input variables” (KIVs) are directly controllable by team members, and
when they are changed, these changes will likely affect at least one key output
variable. Note that some other books use the terms “key process input variables”
(KPIVs) instead of key input variables (KIVs) and “key process output variables”
(KPOVs) instead of key output variables (KOVs). We omit the word “process”
because sometimes the system of interest is a product design and not a process.
Therefore, the term “process” can be misleading.
A main purpose of these generic-seeming definitions is to emphasize the
diversity of problems that the material in this book can address. Understandably,
students usually do not expect to study material applicable to all of the following:
(1) reducing errors in administering medications to hospital patients, (2) improving
the welds generated by a robotic welding cell, (3) reducing the number of errors in
accounting statements, (4) improving the taste of food, and (5) helping to increase
the effectiveness of pharmaceutical medications. Yet, the methods in this book are
currently being usefully applied in all these types of situations around the world.
Another purpose of the above definitions is to clarify this book’s focus on
choices about the settings of factors that we can control, i.e., key input variables
(KIVs). While it makes common sense to focus on controllable factors, students
often have difficulty clarifying what variables they might reasonably be able to
control directly in relation to a given system. Commonly, there is confusion
between inputs and outputs because, in part, system inputs can be regarded as
outputs. The opposite is generally not true.
The examples that follow further illustrate the diversity of relevant application
systems and job descriptions. These examples also clarify the potential difficulty
associated with identifying KIVs and KOVs. Figure 1.2 depicts objects associated
with the examples, related to the medical, manufacturing, and accounting sectors of
the economy.
(Figure 1.2: excerpt of an accounting ledger with dated entries for travel, meals, and copier repair charged to account numbers, one entry flagged “wrong account!”, annotated with x1 and x2.)
Answer: Possible KIVs and KOVs are listed in Table 1.1. Note also that the table
is written implying that there is only one type of drug being administered. If there
were a need to check the administration of multiple drugs, more output variables
would be measured and documented. Then, it might be reasonable to assign a KOV
as a weighted sum of the mistake amounts associated with different drugs.
In the above example, there was an effort made to define KOVs specifically
associated with episodes and input combinations. In this case, it would also be
standard to say that there is only one output variable “mistake amount” that is
potentially influenced by bar-coding, the specific patient, and administration time.
In general, it is desirable to be explicit so that it is clear what KOVs are and how to
measure them. The purpose of the next example is to show that different people
can see the same problem and identify essentially different systems. With more
resources and greater confidence in the methods, people tend to consider more
adjustable inputs simultaneously.
Table 1.1 Key input and output variables for the first bar-code investigation

KIV   Description            KOV    Description
x1    Bar-coding (Y or N)    y1     Mistake amount patient #1 with x1=N
                             y2     Mistake amount patient #2 with x1=N
                             ...    ...
                             y501   Average amount with bar-coding
                             y502   Average amount without bar-coding
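To make the derived KOVs in Table 1.1 concrete, the short sketch below (not from the book) computes the average mistake amount with and without bar-coding from individual episode measurements, which is how a KOV such as y501 or y502 can be an average of other KOVs. The mistake amounts and the Python representation are hypothetical, chosen only for illustration.

```python
# Hypothetical mistake amounts (cc) keyed by (episode, bar-coding used)
mistake_cc = {
    (1, False): 0.2, (2, False): 0.0, (3, False): 1.1,
    (1, True): 0.0,  (2, True): 0.1,  (3, True): 0.0,
}

def average_mistake(with_bar_coding: bool) -> float:
    """Average the episode-level KOVs for one bar-coding condition."""
    amounts = [v for (_, coded), v in mistake_cc.items() if coded == with_bar_coding]
    return sum(amounts) / len(amounts)

y501 = average_mistake(True)    # average amount with bar-coding
y502 = average_mistake(False)   # average amount without bar-coding
print(round(y501, 3), round(y502, 3))
```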
Answer: Possible KIVs and KOVs are listed in Table 1.2. Patient satisfaction
ratings are not included as KOVs. This follows despite the fact that all involved
believe they are important. However, according to the definition here, key output
variables must be likely to be affected by changes in the inputs being considered or
believed to have a strong predictive relationship with other KOVs. Also, note that
the team cannot control exactly how much time nurses spend with patients.
However, the team could write a policy such that nurses could tell patients, “I
cannot spend more than X minutes with you according to policy.”
Table 1.2 The list of inputs and outputs for the more thorough investigation

KIV   Description                        KOV     Description
x1    Bar-coding (Y or N)                y1      Mistake amount patient-combo. #1 (cc)
x2    Spacing on tray (millimeters)      y2      Mistake amount patient-combo. #2 (cc)
x3    Number of patients (#)             ...     ...
x4    Nurse-patient time (minutes) a     y1000   Mistake amount patient-combo. #1000 (cc)
x5    Shift length (hours)               y1002   Nurse #1 rating for input combo. #1
                                         ...     ...
                                         y1150   Nurse #15 rating for input combo. #20
a Stated policy is less than X
Question: The shape of welds strongly relates to profits, in part because operators
commonly spend time fixing or reworking welds with unacceptable shapes. Your
team is investigating robot settings that likely affect weld shape, including weld
speed, voltage, wire feed speed, time in acid bath, weld torch contact tip-to-work
distance, and the current frequency. Define and list KIVs and KOVs and their
units.
Answer: Possible KIVs and KOVs are listed in Table 1.3. Weld speed can be
precisely controlled and likely affects bead shape and therefore profits. Yet, since
the number of parts made per minute likely relates to revenues per minute (i.e.,
throughput), it is also a KOV.
Table 1.3 Key input and output variables for the welding process design problem

KIV   Description        KOV   Description
...   ...                ...   ...
x3    Voltage (Volts)    ...   ...
Table 1.4 Key input and output variables for the accounting systems design problem

KIV   Description              KOV    Description
x1    New software (Y or N)    y1     Number mistakes report #1
x2    Change (Y or N)          y2     Number mistakes report #2
                               ...    ...
                               y501   Average number mistakes x1=Y, x2=Y
                               y502   Average number mistakes x1=N, x2=Y
                               y503   Average number mistakes x1=Y, x2=N
                               y504   Average number mistakes x1=N, x2=N
key output variable (KOV) values. It is standard to refer to activities that result in
recommended inputs and other related knowledge as “problem-solving methods.”
Imagine that you had the ability to command a “system genie” with specific
types of powers. The system genie would appear and provide ideal input settings
for any system of interest and answer all related questions. Figure 1.3 illustrates a
genie-based problem-solving method. Note that, even with a trustworthy genie,
steps 3 and 4 probably would be of interest. This follows because people are
generally interested in more than just the recommended settings. They would also
desire predictions of the impacts on all KOVs as a result of changing to these
settings and an educated discussion about alternatives.
In some sense, the purpose of this book is to help you and your team efficiently
transform yourselves into genies for the specific systems of interest to you.
Unfortunately, the transformation involves more complicated problem-solving
methods than simply asking an existing system genie as implied by Figure 1.3. The
methods in this book involve potentially all of the following: collecting data,
performing analysis and formal optimization tasks, and using human judgement
and subject-matter knowledge.
The definition of the phrase “six sigma” is somewhat obscure. People and
organizations that have played key roles in encouraging others to use the phrase
include the authors Harry and Schroeder (1999), Pande et al. (2000), and the
American Society for Quality. These groups have clarified that “six sigma”
pertains to the attainment of desirable situations in which the fraction of
unacceptable products produced by a system is less than 3.4 per million
opportunities (PMO). In Part I of this book, the exact derivation of this number will
be explained. The main point here is that a key output variable (KOV) is often
the fraction of manufactured units that fail to perform up to expectations.
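As a rough preview of the arithmetic behind the 3.4 PMO figure (which the book derives properly in Part I), the sketch below assumes the common six sigma convention of a 1.5 standard deviation shift in the process mean, so that a specification limit placed six standard deviations from the original mean lies only 4.5 standard deviations from the shifted mean. The code is a minimal illustration under that assumption, not the book's derivation.

```python
from scipy.stats import norm

# Limit at 6 sigma from the nominal mean, with the mean shifted by 1.5 sigma
shifted_distance = 6.0 - 1.5                 # standard deviations to the limit
fraction_beyond = norm.sf(shifted_distance)  # upper-tail probability beyond the limit
print(round(fraction_beyond * 1e6, 1))       # approximately 3.4 per million opportunities
```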
Here, the definition of six sigma is built on the one offered in Linderman et al.
(2003, p. 195). Writing in the prestigious Journal of Operations Management,
those authors emphasized the need for a common definition of six sigma and
proposed a definition paraphrased below:
The authors further described that while “the name Six Sigma suggests a goal”
of less than 3.4 unacceptable units PMO, they purposely did not include this
principle in the definition. This followed because six sigma “advocates establishing
goals based on customer requirements.” It is likely true that sufficient consensus
exists to warrant the following additional specificity about the six sigma method:
The six sigma method for completed projects includes as its phases either
Define, Measure, Analyze, Improve, and Control (DMAIC) for system
improvement or Define, Measure, Analyze, Design, and Verify (DMADV) for new
system development.
Note that some authors use the term Design For Six Sigma (DFSS) to refer to
the application of six sigma to design new systems and emphasize the differences
compared with system improvement activities.
Further, it is also probably true that sufficient consensus exists to include in the
definition of six sigma the following two principles:
Principle 1: The six sigma method only fully commences a project after
establishing adequate monetary justification.
Principle 2: Practitioners applying six sigma can and should benefit from
applying statistical methods without the aid of statistical experts.
Question: What aspects of six sigma suggest that it might not be another passing
management fad?
Answer: Admittedly, six sigma does share the characteristic of many fads in that
its associated methods and principles do not derive from any clear, rigorous
foundation or mathematical axioms. Properties of six sigma that suggest that it
might be relevant for a long time include: (1) the method is relatively specific and
therefore easy to implement, and (2) six sigma incorporates the principle of budget
justification for each project. Therefore, participants appreciate its lack of
ambiguity, and management appreciates the emphasis on the bottom line.
1. Instruction is “case-based” such that all people being trained are directly
applying what they are learning.
The following history is intended to establish a context for the current quality and
six sigma movements. This explanation of the history of management and quality
is influenced by Womack and Jones (1996) related to so-called “Lean Thinking”
and “value stream mapping” and other terms in the Toyota production system.
In the renaissance era in Europe, fine objects including clocks and guns were
developed using “craft” production. In craft production, a single skilled individual
is responsible for designing, building, selling, and servicing each item. Often, a
craftsperson’s skills are certified and maintained by organizations called “guilds”
and professional societies.
During the 1600s and 1700s, an increasing number of goods and services were
produced by machines, particularly in agriculture. Selected events and the people
responsible are listed in Figure 1.4.
It was not until the early 1900s that a coherent alternative to craft production of
fine objects reached maturity. In 1914, Ford developed “Model T” cars using an
“assembly line” in which many unskilled workers each provided only a small
contribution to the manufacturing process. The term “mass production” refers to a
set of management policies inspired by assembly lines. Ford used assembly lines to
make large numbers of nearly identical cars. His company produced component
parts that were “interchangeable” to an impressive degree. A car part could be
taken from one car, put on another car, and still yield acceptable performance.
As the name would imply, another trait of mass production plants is that they
turn out units in large batches. For example, one plant might make 1000 parts of
one type using a press and then change or “set up” new dies to make 1000 parts of
a different type. This approach has the benefit of avoiding the costs associated with
large numbers of change-overs.
i.e., creating a single unit of one type, then switching over to a single unit of another
type, and so on. This approach requires frequent equipment set-ups. To
compensate, the workers put much effort into reducing set-up costs, including the
time required for set-ups. Previously, many enterprises had never put effort into
reducing set-ups because they did not fully appreciate the importance.
Also, the total inventory at each stage in the process is generally regulated
using kanban cards. When the cards for a station are all used up, the process shuts
down the upstream station, which can result in shutting down entire supply chains.
The benefit is increased attention to the problems causing stoppage and (hopefully)
permanent resolution. Finally, lean production generally includes an extensive
debugging process; when a plant starts up with several stoppages, many people
focus on and eliminate the problems. With small batch sizes, “U” shaped cells, and
reduced WIP, process problems are quickly discovered before nonconforming units
accumulate.
Question: Assume that you and another person are tasked with making a large
number of paper airplanes. Each unit requires three operations: (1) marking, (2)
cutting, and (3) folding. Describe the mass and lean ways to deploy your resources.
Which might generate airplanes with higher quality?
Answer: A mass production method would be to have one person doing all the
marking and cutting and the other person doing all the folding. The lean way
would have both people doing marking, cutting, and folding to make complete
airplanes. The lean way would probably produce higher quality because, during
folding, people might detect issues in marking and cutting. That information would
be used the next time to improve marking and cutting with no possible loss
associated with communication. (Mass production might produce units more
quickly, however.)
Even today, in probably all automotive companies around the world, many
engineers are in “reactive mode,” constantly responding to unforeseen problems.
The term “fire-fighting” refers to reacting to these unexpected occurrences. The
need to fire-fight is, to a large extent, unavoidable. Yet the cost per design change
plot in Figure 1.5 is meant to emphasize the importance of avoiding problems
rather than fire-fighting. Costs increase because more and more tooling and other
coordination efforts are committed based on the current design as time progresses.
Formal techniques taught in this book can play a useful role in avoiding or
reducing the number of changes needed after Job 1, and achieving benefits
including reduced tooling and coordination costs and decreased need to fire-fight.
Figure 1.5. Cost per design change increases over time up to Job 1; formal planning can reduce costs and increase agility
Question: With respect to manufacturing, how can freezing designs help quality?
Answer: Often the quality problem is associated with only a small fraction of units
that are not performing as expected. Therefore, the problem must relate to
something different that happened to those units, i.e., some variation in the
production system. Historically, engineers “tweaking” designs has proved to be a
major source of variation and thus a cause of quality problems.
to the invention of calculus in the 1700s. Least squares regression estimation was
one of the first optimization problems addressed in the calculus/optimization
literature. In the early 1900s, statistical methods played a major role in improving
agricultural production in the U.K. and the U.S. These developments also led to
new methods, including fractional factorials and analysis of variance (ANOVA)
developed by Sir Ronald Fisher (Fisher 1925).
Figure 1.6. Timeline of selected statistical methods, 1700-2000: calculus (Newton); least squares (Laplace); Galton regression; modern ANOVA and DOE, 1924 (Fisher); statistical charting (Shewhart); Deming popularization; the Food and Drug Administration generating a confirmation need for statisticians; DOE applied in manufacturing in Japan (Box and Taguchi); formal methods becoming widespread for all decision-making (“six sigma”, Harry); and service-sector applications of statistics (Hoerl)
statistical methods useful for helping to improve profits from methods useful for
such purposes as verifying the safety of foods and drugs (“standard statistics”), or
the assessment of threats from environmental contaminants.
W. Edwards Deming is credited with playing a major role in developing so-called
“Total Quality Management” (TQM). Total quality management emphasized the
ideas of Shewhart and the role of data in management decision-making. TQM
continues to increase awareness in industry of the value of quality techniques
including design of experiments (DOE) and statistical process control (SPC). It
has, however, been criticized for leaving workers with only a vague understanding
of the exact circumstances under which the methods should be applied and of the
bottom line impacts.
Because Deming’s ideas were probably taken more seriously in Japan for much
of his career, TQM has been associated with technology transfer from the U.S. to
Japan and back to the U.S. and the rest of the world. Yet in general, TQM has little
to do with Toyota’s lean production, which was also technology transfer from
Japan to the rest of the world. Some credible evidence has been presented
indicating that TQM programs around the world have resulted in increased profits
and stock prices (Kaynak 2003). However, a perception developed in the 1980s
and 1990s that these programs were associated with “anti-business attitudes” and
“muddled thinking.”
This occurred in part because some of the TQM practices such as “quality
circles” have been perceived as time-consuming and slow to pay off. Furthermore,
the perception persists to this day that the roles of statistical methods and their use
in TQM are unclear enough to require the assistance of a statistical expert in order
to gain a positive outcome. Also, Deming placed a major emphasis on his “14
points,” which included #8, “Drive out fear” from the workplace. Some managers
and employees honestly feel that some fear is helpful. It was against this backdrop
that six sigma developed.
Question: Drawing on information from this chapter and other sources, briefly
describe three quality technologies transferred from Japan to the rest of the world.
Answer: First, lean production was developed at Toyota which has its headquarters
in Japan. Lean production includes two properties, among others: inventory at each
machine center is limited using kanban cards, and U-shaped cells are used in which
workers follow parts for many operations which instills worker accountability.
However, lean production might or might not relate to the best way to run a
specific operation. Second, quality circles constitute a specific format for sharing
quality-related information and ideas. Third, a Japanese consultant named Genichi
Taguchi developed some specific DOE methods with some advantages that will be
discussed briefly in Part II of this book. He also emphasized the idea of using
formal methods to help bring awareness of production problems earlier in the
design process. He argued that this can reduce the need for expensive design
changes after Job 1.
The six sigma movement began in 1979 at Motorola when an executive declared
that “the real problem [is]…quality stinks.” With millions of critical characteristics
per integrated circuit unit, the percentage of acceptable units produced was low
enough that these quality problems obviously affected the company’s profits.
In the early 1980s, Motorola developed methods for problem-solving that
combined formal techniques, particularly relating to measurement, to achieve
measurable savings in the millions of dollars. In the mid-1980s, Motorola spun off
a consulting and training company called the “Six Sigma Academy” (SSA). SSA
president Mikel Harry led that company in providing innovative case-based
instruction, “black belt” accreditations, and consulting. In 1992, Allied Signal
based its companywide instruction on Six Sigma Academy techniques and began
to create job positions in line with Six Sigma training levels. Several other
companies soon adopted Six Sigma Academy training methods, including Texas
Instruments and ABB.
Also during the mid-1990s, multiple formal methodologies to structure product
and process improvement were published. These methodologies have included
Total Quality Development (e.g., see Clausing 1994), Taguchi Methods (e.g., see
Taguchi 1993), the decision analysis-based framework (e.g., Hazelrigg 1996), and
the so-called “six sigma” methodology (Harry and Schroeder 1999). All these
published methods developments aim to allow people involved with system
improvement to use the methods to structure their activities even if they do not
fully understand the motivations behind them.
In 1995, General Electric (GE) contracted with the “Six Sigma Academy” for
help in improving its training program. This was of particular importance for
popularizing six sigma because GE is one of the world’s most admired companies.
The Chief Executive Officer, Jack Welch, forced employees at all levels to
participate in six sigma training and problem-solving approaches. GE’s approach
was to select carefully employees for Black Belt instruction, drawing from
employees believed to be future leaders. One benefit of this approach was that
employees at all ranks associated six sigma with “winners” and financial success.
In 1999, GE began to compete with Six Sigma Academy by offering six sigma
training to suppliers and others. In 2000, the American Society for Quality initiated
its “black belt” accreditation, requiring a classroom exam and signed affidavits that
six sigma projects had been successfully completed.
Montgomery (2001) and Hahn et al. (1999) have commented that six sigma
training has become more popular than other training in part because it ties
standard statistical techniques such as control charts to outcomes measured in
monetary and/or physical terms. No doubt the popularity of six sigma training also
derives in part from the fact that it teaches an assemblage of techniques already
taught at universities in classes on applied statistics, such as gauge repeatability
and reproducibility (R&R), statistical process control (SPC), design of experiments
(DOE), failure modes and effects analysis (FMEA), and cause and effect matrices
(C&E).
All of the component techniques such as SPC and DOE are discussed in Pande
et al. (2000) and defined here. The techniques are utilized and placed in the context
Answer: Six sigma is a generic method for improving systems or designing new
products, while lean manufacturing has a greater emphasis on the best structure, in
Toyota’s view, of a production system. Therefore, six sigma focuses more on how
to implement improvements or new designs using statistics and optimization
methods in a structured manner. Lean manufacturing focuses on what form
production systems should take, including specific high-level decisions
relating to inventory management, purchasing, and scheduling of operations, with
the goal of emulating the Toyota Production System. That being said, there are
“kaizen events” and “value stream mapping” activities in lean production. Still, the
overlap is small enough that many companies have combined six sigma and lean
manufacturing efforts under the heading “lean sigma.”
Why might formal methods be more likely than trial and error to achieve these
extreme quality levels? Here, we will use the phrase “One-Factor-at-a-Time”
(OFAT) to refer to trial-and-error experimentation, following the discussion in
Czitrom (1999). Intuitively, one performs experimentation because one is uncertain
which alternatives will give desirable system outputs. Assume that each alternative
tested thoroughly offers a roughly equal probability of achieving process goals.
Then the method that can effectively thoroughly test more alternatives is more
likely to result in better outcomes.
Formal methods (1) spread tests out inside the region of interest where good
solutions are expected to be and (2) provide a thorough check of whether changes
help. For example, by using interpolation models, e.g., linear regressions or neural
nets, one can effectively thoroughly test all the solutions throughout the region
spanned by these experimental runs.
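The hypothetical sketch below illustrates this idea: a two-level, two-factor experimental plan is fit with a first-order regression (interpolation) model, which can then be used to predict outcomes at any setting in the region spanned by the runs. The yield function and all numbers are invented for illustration and are not from the book.

```python
import numpy as np

def true_yield(x1, x2):
    # Hypothetical process yield surface, unknown to the experimenter
    return 90.0 + 4.0 * x1 + 3.0 * x2 - 2.0 * x1 * x2

# Formal plan: a 2x2 full factorial over the coded region of interest [-1, 1]^2
design = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([true_yield(a, b) for a, b in design])

# Fit a first-order interpolation model y ~ b0 + b1*x1 + b2*x2 by least squares
X = np.column_stack([np.ones(len(design)), design])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted model lets the team "test" any setting spanned by the runs,
# e.g., predict yield over a grid covering the whole region of interest.
grid = np.array([[a, b] for a in np.linspace(-1, 1, 5)
                        for b in np.linspace(-1, 1, 5)])
pred = np.column_stack([np.ones(len(grid)), grid]) @ coef
best = grid[np.argmax(pred)]
print("fitted coefficients:", np.round(coef, 2))
print("predicted best setting in region:", best)
```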
OFAT procedures have the advantages of being relatively simple and
permitting opportunistic decision-making. Yet, for a given number of experimental
runs, these procedures effectively test far fewer solutions, as indicated by the
regions in Figure 1.7 below. Imagine the dashed lines indicate contours of yield as
a function of two control factors, x1 and x2. The chance that the OFAT search area
contains the high yield required to be competitive is far less than the formal
method search area.
Figure 1.7. Formal procedures search much larger spaces for comparable costs (yield contours shown over control factors x1 and x2)
A good engineer can design products that work well under ideal circumstances.
It is far more difficult, however, to design a product that works well for a range of
conditions, i.e., noise factor settings as defined originally by Taguchi. This reason
is effectively a restatement of the first reason because it is intuitively clear that it is
noise factor variation that causes the yields to be less than 100.00000%. Something
must be changing in the process and/or the environment. Therefore, the designers’
challenge, clarified by Taguchi, is to design a product that gives performance
robust to noise factor variation. To do this, the experimenter must consider an
expanded list of factors including both control and noise factors. This tends to
favor formal methods because typically the marginal cost of adding factors to the
experimental plan in the context of formal methods (while achieving comparable
Answer: Specific evidence that competitor companies are saving money is most
likely to make management excited about formal techniques. Also, many people at
all levels are impressed by success stories. The theory that discipline might
substitute for red tape might also be compelling.
1.8 References
Bloom BS (ed.) (1956) Taxonomy of Educational Objectives. (Handbook I:
Cognitive Domain). Longmans, Green and Co., New York
Clausing D (1994) Total Quality Development: A Step-By-Step Guide to World-
Class Concurrent Engineering. ASME Press, New York
Collins J (2001) Good to Great: Why Some Companies Make the Leap... and
Others Don’t. Harper-Business, New York
Czitrom V (1999) One-Factor-at-a-Time Versus Designed Experiments. The
American Statistician 53 (2):126-131
Fisher RA (1925) Statistical Methods for Research Workers. Oliver and Boyd,
London
Hahn GJ, Hill WJ, Hoerl RW, Zinkgraf SA (1999) The Impact of Six Sigma
Improvement - A Glimpse into the Future of Statistics. The American
Statistician 53(3):208-215
Harry MJ, Schroeder R (1999) Six Sigma, The Breakthrough Management
Strategy Revolutionizing The World’s Top Corporations. Bantam
Doubleday Dell, New York
Hazelrigg G (1996) System Engineering: An Approach to Information-Based
Design. Prentice Hall, Upper Saddle River, NJ
Kaynak H (2003) The relationship between total quality management practices and
their effects on firm performance. The Journal of Operations Management
21:405-435
Linderman K, Schroeder RG, Zaheer S, Choo AS (2003) Six Sigma: a goal-
theoretic perspective. The Journal of Operations Management 21:193-203
Pande PS, Neuman RP, Cavanagh R (2000) The Six Sigma Way: How GE,
Motorola, and Other Top Companies are Honing Their Performance.
McGraw-Hill, New York
Taguchi G (1993) Taguchi Methods: Research and Development. In: Konishi S
(ed.) Quality Engineering Series, vol 1. The American Supplier Institute,
Livonia, MI
Welch J, Welch S (2005) Winning. HarperBusiness, New York
Womack JP, Jones DT (1996) Lean Thinking. Simon & Schuster, New York
Womack JP, Jones DT, Roos D (1991) The Machine that Changed the World: The
Story of Lean Production. Harper-Business, New York
1.9 Problems
In general, pick the correct answer that is most complete.
1. Consider the toy system of paper airplanes. Which of the following constitute
possible design KIVs and KOVs?
a. KIVs include time unit flies dropped from 2 meters and KOVs include
wing fold angle.
b. KIVs include wing fold angle in design and KOVs include type of
paper in design.
c. KIVs include wing fold angle and KOVs include time unit flies
assuming a 2 meters drop.
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
3. Assume that you are paid to aid with decision-making about settings for a die
casting process in manufacturing. Engineers are frustrated by the amount of
flash or spill-out they must clean off the finished parts and the deviations of
the part dimensions from the nominal blueprint dimensions. They suggest that
the preheat temperature and injection time might be changeable. They would
like to improve the surface finish rating (1-10) but strongly doubt whether any
factors would affect this. Which of the following constitute KIVs and KOVs?
a. KIVs include deviation of part dimensions from nominal and KOVs
include surface finish rating.
b. KIVs include preheat temperature and KOVs include deviation of
part dimensions from nominal.
c. KIVs include surface finish rating and KOVs include deviation of
part dimensions from nominal.
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
9. How does six sigma training differ from typical university instruction?
Explain in two sentences.
10. List two perceived problems associated with TQM that motivated the
development of six sigma.
11. Which of the following is the lean production way to making three
sandwiches?
a. Lay out six pieces of bread, add tuna fish to each, add mustard, fold
all, and cut.
b. Lay out two pieces of bread, add tuna fish, mustard, fold, and cut.
Repeat.
c. Lay out the tuna and mustard, order out deep-fat fried bread and wait.
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
12. Which of the following were innovations associated with mass production?
a. Workers did not need much training since they had simple, small
tasks.
b. Guild certification built up expertise among skilled tradesmen.
c. Interchangeability of parts permitted many operations to be
performed usefully at one time without changing over equipment.
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
13. In two sentences, explain the relationship between mass production and lost
accountability.
15. In two sentences, summarize the relationship between lean production and
quality.
16. Give an example of a specific engineered system and improvement system that
might be relevant in your work life.
17. Provide one modern example of craft production and one modern example of
mass production. Your examples do not need to be in traditional
manufacturing and could be based on a task in your home.
18. Which of the following are benefits of freezing a design long before Job 1?
a. Your design function can react to data after Job 1.
b. Tooling costs more because it becomes too easy to do it correctly.
c. It prevents reactive design tinkering and therefore reduces tooling
costs.
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
19. Which of the following are benefits of freezing a design long before Job 1?
a. It encourages people to be systematic in attempts to avoid problems.
b. Design changes cost little since tooling has not been committed.
c. Fire-fighting occurs more often.
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
20. Which of the following are perceived benefits of being ISO certified?
a. Employees must share information and agree on which practices are
best.
b. Inventory is reduced because there are smaller batch sizes.
c. Training costs are reduced since the processes are well documented.
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
21. Which of the following are problems associated with gaining ISO
accreditation?
a. Resources must be devoted to something not on the value stream.
b. Managers may be accused of “tree hugging” because fear can be
useful.
c. Employees rarely feel stifled because bureaucratic hurdles are
eliminated.
d. Answers in parts “a” and “b” are both correct.
24. Suppose one defines two basic levels of understanding of the material in this
book to correspond to “green belt” (lower) and “black belt” (higher).
Considering Bloom’s Taxonomy, and inspecting this book’s table of contents,
what types of knowledge and abilities would a green belt have and what types
of knowledge would a black belt have?
25. Suppose you were going to teach a fifteen year old about your specific major
and its usefulness in life. Provide one example of knowledge for each level in
Bloom’s Taxonomy.
2.1 Introduction
The phrase “statistical quality control” (SQC) refers to the application of
statistical methods to monitor and evaluate systems and to determine whether
changing key input variable (KIV) settings is appropriate. Specifically, SQC is
associated with Shewhart’s statistical process charting (SPC) methods. These SPC
methods include several charting procedures for visually evaluating the
consistency of key output variables (KOVs) and identifying unusual circumstances
that might merit attention.
In common usage, however, SQC refers to many problem-solving methods.
Some of these methods do not relate to monitoring or controlling processes and do
not involve complicated statistical theory. In many places, SQC has become
associated with all of the statistics and optimization methods that professionals use
in quality improvement projects and in their other job functions. This includes
methods for design of experiments (DOE) and optimization. In this book, DOE and
optimization methods have been separated out mainly because they are the most
complicated quality methods to apply and understand.
In Section 2.2, we preview some of the SQC methods described more fully later
in this book. Section 2.3 relates these techniques to possible job descriptions and
functions in a highly formalized organization. Next, Section 2.4 discusses the
possible roles the different methods can play in the six sigma problem-solving
method.
The discussion of organizational roles leads into the operative definition of
quality, which we will define as conformance to design engineering’s
specifications. Section 2.5 explores related issues including the potential difference
between nonconforming and defective units. Section 2.6 concludes the chapter by
describing how standard operating procedures capture the best practices derived
from improvement or design projects.
In the chapters that follow, these and many other techniques are described in
detail, along with examples of how they have been used in real-world projects to
facilitate substantial monetary savings.
1. Define terminates when specific goals for the system outputs are clarified
and the main project participants are identified and committed to project
success.
2. Measure involves establishing the capability of the technology for
measuring system outputs and using the approved techniques to evaluate
the state of the system before it is changed.
3. Analyze is associated with developing a qualitative and/or quantitative
evaluation of how changes to system inputs affect system outputs.
4. Improve involves using the information from the analyze phase to
develop recommended system design inputs.
5. Control is the last phase in which any savings from using the newly
recommended inputs are confirmed, lessons learned are documented, and
plans are made and implemented to help guarantee that any benefits are
truly realized.
Often, six sigma improvement projects last three months, and each phase
requires only a few weeks. Note that for new system design projects, the design
and verify phases play somewhat similar roles to the improve and control phases in
improvement projects. Also, the other phases adjust in intuitive ways to address the
reality that in designing a new system, potential customer needs cannot be
measured by any current system.
While it is true that experts might successfully use any technique in any phase,
novices sometimes find it helpful to have more specific guidance about which
techniques should be used in which phase. Table 2.1 is intended to summarize the
associations of methods with major project phases most commonly mentioned in
the six sigma literature.
Table 2.1. Abbreviated list of methods and their role in improvement projects

Method                                    Phases
Acceptance Sampling                       Define, Measure, Control
Benchmarking                              Define, Measure, Analyze
Control Planning                          Control, Verify
Design of Experiments                     Analyze, Design, Improve
Failure Mode & Effects Analysis (FMEA)    Analyze, Control, Verify
Formal Optimization                       Improve, Design
Gauge R&R                                 Measure, Control
Process Mapping                           Define, Analyze
Quality Function Deployment (QFD)         Measure, Analyze, Improve
Regression                                Define, Analyze, Design, Improve
SPC Charting                              Measure, Control
Question: A team is trying to evaluate the current system inputs and measurement
system. List three methods that might naturally be associated with this phase.
Answer: From the above definitions, the question pertains to the “measure” phase.
Therefore, according to Table 2.1, relevant methods include Gauge R&R, SPC
charting, and QFD.
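The sketch below (hypothetical, not part of the six sigma literature) simply encodes Table 2.1 as a lookup so that a novice could list all the methods the table associates with any given phase; the answer above names three of the five measure-phase entries.

```python
# Table 2.1 encoded as method -> set of phases (names abbreviated as in the table)
methods_to_phases = {
    "Acceptance Sampling": {"Define", "Measure", "Control"},
    "Benchmarking": {"Define", "Measure", "Analyze"},
    "Control Planning": {"Control", "Verify"},
    "Design of Experiments": {"Analyze", "Design", "Improve"},
    "FMEA": {"Analyze", "Control", "Verify"},
    "Formal Optimization": {"Improve", "Design"},
    "Gauge R&R": {"Measure", "Control"},
    "Process Mapping": {"Define", "Analyze"},
    "QFD": {"Measure", "Analyze", "Improve"},
    "Regression": {"Define", "Analyze", "Design", "Improve"},
    "SPC Charting": {"Measure", "Control"},
}

def methods_for(phase: str):
    """Return the methods Table 2.1 associates with the given phase."""
    return sorted(m for m, phases in methods_to_phases.items() if phase in phases)

# Acceptance Sampling, Benchmarking, Gauge R&R, QFD, SPC Charting
print(methods_for("Measure"))
```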
Question: In addition to the associations in Figure 2.1, list one other department
that might use acceptance sampling. Explain in one sentence.
Answer: Production might use acceptance sampling. When the raw materials or
other input parts show up in lots (selected by procurement), production might use
acceptance sampling to decide whether to reject these lots.
Figure 2.1. Methods which might most likely be used by each department group
Figure 2.2. Part of blueprint for custom designed screw with two KIVs (callouts: x1 = 5.20 ± 0.20 millimeters; x2 = 10.0% carbon, must be < 10.5%; x3 = 80.5 ± 0.2º)
The minimum value allowed on a blueprint for a characteristic is called the lower
specification limit (LSL), and the maximum value allowed is called the upper
specification limit (USL). For example, the LSL for x1 is 5.00 millimeters for the
blueprint in Figure 2.2 and the USL for x3 is 80.7º. For certain characteristics, there
might be only an LSL or a USL but not both. For example, the characteristic x2 in
Figure 2.2 has USL = 10.5% and no LSL.
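As a small illustration (not from the book), the sketch below derives specification limits from the nominal-and-tolerance callouts in Figure 2.2 and flags nonconformities for a single measured unit. The variable names and the measured values are assumptions made only for this example.

```python
specs = {
    # characteristic: (LSL, USL); None means no limit on that side
    "x1_mm":     (5.00, 5.40),   # 5.20 +/- 0.20 millimeters
    "x2_carbon": (None, 10.5),   # must be < 10.5% carbon, no LSL
    "x3_deg":    (80.3, 80.7),   # 80.5 +/- 0.2 degrees
}

def nonconformities(measured: dict) -> list:
    """Return the characteristics whose measured values fall outside their limits."""
    out = []
    for name, value in measured.items():
        lsl, usl = specs[name]
        if (lsl is not None and value < lsl) or (usl is not None and value > usl):
            out.append(name)
    return out

unit = {"x1_mm": 5.35, "x2_carbon": 10.0, "x3_deg": 80.9}  # hypothetical measurements
print(nonconformities(unit))  # ['x3_deg'] -> this unit is nonconforming
```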
Note that nominal settings of quality characteristics are inputs, in the sense that
the design engineer can directly control them by changing numbers, usually in an
electronic file. However, in manufacturing, the actual corresponding values that
can be measured are uncontrollable KOVs. Therefore, quality characteristics are
associated with nominals that are KIVs (xs) and actual values that are KOVs (ys).
In many real-world situations, the LSL and USL define quality. Sometimes
these values are written by procurement into contracts. A “conforming” part or
product has all quality characteristic values within the relevant specification limits.
Other parts or products are called “nonconforming,” since at least one
characteristic fails to conform to specifications. Manufacturers use the term
“nonconformity” to describe each instance in which a part or product’s
characteristic value falls outside its associated specification limit. Therefore, a
given part or unit might have many nonconformities. A “defective” part or product
yields performance sufficiently below expectations such that its safe or effective
usage is prevented. Manufacturers use the term “defect” to describe each instance
in which a part or product’s characteristic value causes substantially reduced
product performance. Clearly, a defective unit is not necessarily nonconforming
and vice versa. This follows because designers can make specifications without full
knowledge of the associated effects on performance.
Table 2.2 shows the four possibilities for any given characteristic of a part or
product. The main purpose of Table 2.2 is to call attention to the potential
fallibility of specifications and the associated losses. The arguably most serious
case occurs when a part or product’s characteristic value causes a defect but meets
specifications. In this case, a situation could conceivably occur in which the
supplier is not contractually obligated to provide an effective part or product.
Worse still, this case likely offers the highest chance that the defect might not be
detected. The defect could then cause problems for customers.
Table 2.2. Possibilities associated with any given quality characteristic value
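Since Table 2.2 amounts to a two-by-two classification, the hypothetical sketch below enumerates the four cases in code; the wording of each case is paraphrased from the surrounding discussion, not quoted from the table itself.

```python
def classify(conforming: bool, defective: bool) -> str:
    """Paraphrased description of each of the four Table 2.2 possibilities."""
    if conforming and not defective:
        return "conforming and not defective: the desired case"
    if not conforming and not defective:
        return "nonconforming but not defective: specifications may be unnecessarily harsh"
    if not conforming and defective:
        return "nonconforming and defective: the specifications correctly flag the problem"
    return ("conforming yet defective: arguably the most serious case, since the "
            "supplier may not be contractually obligated to fix it")

for conforming in (True, False):
    for defective in (False, True):
        print(classify(conforming, defective))
```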
Another kind of loss occurs when production and/or outside suppliers are
forced to meet unnecessarily harsh specifications. In these cases, a product
characteristic can be nonconforming, but the product is not defective. This can
cause unnecessary expense because efforts to make products consistently conform
to specifications can require additional tooling and personnel expenses. This type
of waste, however, is to a great extent unavoidable.
Note that a key input variable (KIV) in the eyes of engineering design can be a
key output variable (KOV) for production, because engineering design is
attempting to meet customer expectations for designed products or services. To
meet these expectations, design engineering directly controls the ideal nominal
quality characteristic values and specifications. Production tries to manipulate
process settings so that the parts produced meet the expectations of design
engineering in terms of the quality characteristic values. Therefore, for production,
the controllable inputs are settings on the machines, and the characteristics of units
that are generated are KOVs. Therefore, we refer to “quality characteristics”
instead of KIVs or KOVs.
Answer: Figure 2.3 shows the added characteristic x4. The LSL is 81.3º and the
USL is 81.7º. If x4 equalled 95.0º, that would constitute both a nonconformity,
because 95.0º > 81.7º, and a defect, because the customer would have difficulty
inserting the screw.
Figure 2.3. Blueprint from Figure 2.2 with the added characteristic x4 (annotations: x1 = 5.20 ± 0.20 millimeters; x2 = 10.0 % carbon, must be < 10.5%; x3 = 80.5 ± 0.2º; x4 = 81.5 ± 0.2º)
Answer: Table 2.3 below contains a SOP for paper helicopter manufacturing.
Figure 2.4. Helicopter cut (__) and fold (--) lines (not to scale, grid spacing = 1 cm)
Note that not all information in a blueprint, including specification limits, will
necessarily be included in a manufacturing SOP. Still, the goal of the SOP is, in an
important sense, to make products that consistently conform to specifications.
The fact that there are multiple possible SOPs for similar purposes is one of the
central concepts of this book. The details of the SOPs could be input parameters
for a system design problem. For example, the distances 23 centimeters and 5
centimeters in the above paper helicopter example could form input parameters x1
and x2 in a system design improvement project. It is also true that there are multiple
ways to document what is essentially the same SOP. The example below is
intended to offer an alternative SOP to make identical helicopters.
Answer: Table 2.4 below contains a concise SOP for paper helicopter
manufacturing.
Figure 2.5. (a) Paper with cut and fold lines (grid spacing is 1 cm); (b) desired result
With multiple ways to document the same operations, the question arises: what
makes a good SOP? Many criteria can be proposed to evaluate SOPs, including
cost of preparation, execution, and subjective level of professionalism. Perhaps the
most important criteria in a manufacturing context relate to the performance that a
given SOP fosters in the field. In particular, if this SOP is implemented in the
company divisions, how desirable are the quality outcomes? Readability,
conciseness, and level of detail may affect the outcomes in unexpected ways. The
next chapters describe how statistical process control (SPC) charting methods
provide thorough ways to quantitatively evaluate the quality associated with
manufacturing SOPs.
Quite often, SOPs are written to regulate a process for measuring a key output
variable (KOV) of interest. For example, a legally relevant SOP might be used by a
chemical company to measure the pH in fluid flows to septic systems. In this book,
the term “measurement SOPs” refers to SOPs where the associated output is a
number or measurement. This differs from “production SOPs” where the output is
a product or service. An example of a measurement SOP is given below. The next
chapters describe how gauge R&R methods provide quantitative ways to evaluate
the quality of measurement SOPs.
Answer: Table 2.5 describes a measurement SOP for timing paper helicopters.
2.7 References
Harry, MJ, Schroeder R (1999) Six Sigma, The Breakthrough Management
Strategy Revolutionizing The World’s Top Corporations. Bantam
Doubleday Dell, New York
Pande PS, Neuman RP, Cavanagh R (2000) The Six Sigma Way: How GE,
Motorola, and Other Top Companies are Honing Their Performance.
McGraw-Hill, New York
2.8 Problems
In general, pick the correct answer that is most complete or inclusive.
8. A large number of lots have shown up on a shipping dock, and their quality
has not been ascertained. Which method(s) would be obviously helpful?
a. Acceptance sampling
b. DOE
c. Formal optimization
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
9. Based on Table 2.1, which methods are useful in the first phase of a project?
10. Based on Table 2.1, which methods are useful in the last phase of a project?
13. According to Chapter 2, which would most likely use acceptance sampling?
a. Sales and logistics
b. Design engineering
c. Procurement
d. Answers in parts “a” and “b” are both correct.
e. Answers in parts “a” and “c” are both correct.
14. According to the chapter, which would most likely use formal optimization?
a. Design engineering
b. Production engineering
c. Process engineering
d. All of the above are correct.
17. Create a blueprint of an object you design including two quality characteristics
and associated specification limits.
18. Propose an additional quality characteristic for the screw design in Figure 2.3
and give associated specification limits.
25. In two sentences, critique the SOP in Table 2.3. What might be unclear to an
operator trying to follow it?
26. In two sentences, critique the SOP in Table 2.5. What might be unclear to an
operator trying to follow it?
3 Define Phase and Strategy
3.1 Introduction
This chapter focuses on the definition of a project, including the designation of
who is responsible for what progress by when. By definition, those applying six
sigma methods must answer some or all of these questions in the first phase of
their system improvement or new system design projects. Also, according to what
may be regarded as a defining principle of six sigma, projects must be cost-
justified or they should not be completed. Often in practice, the needed cost
justification must be established by the end of the “define” phase.
A central theme in this chapter is that the most relevant strategies associated
with answering these questions relate to identifying so-called “subsystems” and
their associated key input variables (KIVs) and key output variables (KOVs).
Therefore, the chapter begins with an explanation of the concept of systems and
subsystems. Then, the format for documenting the conclusions of the define phase
is discussed, and strategies are briefly defined to help in the identification of
subsystems and associated goals for KOVs.
Next, specific methods are described to facilitate the development of a project
charter, including benchmarking, meeting rules, and Pareto charting. Finally, one
reasonably simple method for documenting significant figures is presented.
Significant figures and the implied uncertainty associated with numbers can be
important in the documentation of goals and for decision-making.
As a preliminary, consider that a first step in important projects involves
searching the available literature. Search engines such as Google and Yahoo are
relevant. Also, technical indexes such as the Science Citation Index and Compendex
are relevant. Finally, consider using governmental resources such as the National
Institute of Standards and Technology (NIST) and the United States Patent Office web sites.
(Figure: a system composed of Subsystems #1, #2, and #3, with inputs x1 through x5, intermediate variables, and outputs y1 through y101)
Question: Consider a system in which children make and sell lemonade. Define
two subsystems, each with two inputs and outputs and one intermediate variable.
Answer: Figure 3.2 shows the two subsystems: (1) Product Design and (2) Sales &
Marketing. Inputs to the product design subsystem are: x1, percentage of sugar in
(Figure 3.2. Lemonade stand system with Product Design and Sales & Marketing subsystems, inputs x1 through x3, an intermediate variable, and outputs y1 and y2)
(Figure: elements of a project charter. Who? subsystem personnel on the team; What? starting KOVs, target values, and deliverables; How Much? expected profit from the project; When? target date for project deliverables)
Answer: The younger sibling seeks to deliver a recipe specifying what percentage
of sweetener to use (x1) with a target average taste rating (ỹ1) increase greater than
1.5 units as measured by three family members on a 1-10 scale. It is believed that
taste ratings will drive sales, which will in turn drive profits. In the approved view
of the younger sibling, it is not necessary that the older sibling will personally
prefer the taste of the new lemonade recipe.
In defining who is on the project team, common sense dictates that the
personnel included should be representative of people who might be affected by
the project results. This follows in part because affected people are likely to have
the most relevant knowledge, giving the project the best chance to succeed. The
phrase “not-invented-here syndrome” (NIHS) refers to the powerful human
tendency to resist recommendations by outside groups. This does not include the
tendency to resist orders from superiors, which constitutes insubordination, not
NIHS. NIHS implies resistance to fully plausible ideas purely because of their
external source. By including on the team people who will be affected, we can
sometimes develop the "buy-in" needed to reduce the effects of the
not-invented-here syndrome. Scope creep can be avoided by including all of these
people on the team.
In defining when a project should be completed, an important concern is to
complete the project soon enough so that the deliverables are still relevant to the
larger system needs. Many six sigma experts have suggested project timeframes
between two and six months. For projects on the longer side of this range, charters
often include a schedule for deliverables rendered before the final project
completion. In general, the project timeframe limits imply that discipline is
necessary when selecting achievable scopes.
There is no universally used format for writing project charters. The following
example, based loosely on a funded industrial grant proposal, illustrates one
possible format. One desirable feature of this format is its brevity. In many cases, a
three-month timeframe permits an effective one-page charter. The next subsection
focuses on a simple model for estimating expected profits from projects.
Answer:
Often projects focus only on a small subsystem that is not really autonomous inside
a company. Therefore, it is difficult to evaluate the financial impact of the project
on the company bottom line. Yet an effort to establish this linkage is generally
project. The main deliverable was a fastener design in 3D computer aided design
(CAD) format. The result achieved a 50% increase in pull-apart strength by
manipulating five KIVs in the design. The new design is saving $300K/year by
reducing assembly costs for two product lines (not including project expense). A
similar product line uses a different material. Estimate the expected profit from this
project, assuming a two-year horizon.
Answer: As savings do not derive from rework and scrap reductions, we cannot
use Equation (3.1). However, since $300K/year was saved on two product lines in
similar circumstances, it is likely that $150K/year in costs could be reduced
through application to a single new product line. Therefore, expected savings over
a 2-year horizon would be 2.0 years × $150K/year = $300K. With three engineers
working 25% time for 0.25 year, the person-years of project expense should be 3 ×
0.25 × 0.25 = 0.1875. Therefore, the expected profits from the model in Equation
(3.2) would be $300K – $18.75K = $281K.
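The arithmetic above can be scripted. Since Equation (3.2) itself is not reproduced in this excerpt, the sketch below assumes it has the form expected profit = expected savings – (person-years of expense × cost per person-year), with an assumed rate of $100K per person-year chosen to match the $18.75K expense in the example; the function name is illustrative only.

def expected_profit(savings_per_year, horizon_years, n_engineers,
                    fraction_time, project_years, cost_per_person_year=100.0):
    """All dollar amounts in $K. Returns (savings, expense, profit)."""
    savings = savings_per_year * horizon_years
    person_years = n_engineers * fraction_time * project_years
    expense = person_years * cost_per_person_year
    return savings, expense, savings - expense

savings, expense, profit = expected_profit(
    savings_per_year=150.0,   # $150K/year expected on the new product line
    horizon_years=2.0,
    n_engineers=3,
    fraction_time=0.25,
    project_years=0.25,
)
print(savings, expense, profit)  # 300.0 18.75 281.25 (in $K)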
In their influential book The Goal, Goldratt and Cox (1992) offer ideas relevant to
subsystem selection. It is perhaps fair to rephrase their central thesis as follows:
however. After other people improve the bottleneck system or “alleviate the
bottleneck,” there is a chance that the subsystem under consideration will become a
bottleneck.
The term “categorical factor” refers to inputs that take on only qualitatively
different settings. The term “design concept” is often used to refer to one level
setting for a categorical factor. For example, one design concept for a car could be
a rear engine, which is one setting of the categorical factor of engine type. In the
development of systems and subsystems, only a finite number of design concepts
can be considered at any one time due to resource limitations. The phrase “go-no-
go decisions” refers to the possible exclusion from consideration of one or more
design concepts or projects. For example, an expensive way to arc weld aluminum
might be abandoned in favor of cheaper methods because of a go-no-go decision.
The benefits of go-no-go decisions are similar to the benefits of freezing designs
described in Chapter 1.
One relevant goal of improvement or design projects is to make go-no-go
decisions decisively. For example, the design concept snap tabs might be
competing with the design concept screws for an automotive joining design
problem. The team might explore the strength of snap tabs to decide which concept
should be used.
A related issue is the possible existence of subsystems that are nearly identical.
For example, many product lines could benefit potentially from changing their
joining method to snap tabs. This creates a situation in which one subsystem may
be tested, and multiple go-no-go decisions might result. The term “worst-case
analysis” refers to the situation in which engineers experiment with the subsystem
that is considered the most challenging. Then they make go-no-go decisions for all
the other nearly duplicate systems.
Question: Children are selling pink and yellow lemonade on a busy street with
many possible customers. The fraction of sugar is the same in pink and yellow
lemonade, and the word-of-mouth is that the lemonades are both too sweet,
particularly the pink type, which results in lost sales. Materials are available at
negligible cost. Making reference to TOC and worst-case analysis, suggest a
subsystem for improvement with 1 KIV and 1 KOV.
Answer: TOC suggests focusing on the apparent bottlenecks, which are the
product design subsystems, as shown in Figure 3.4. This follows because
manufacturing costs are negligible and the potential customers are aware of the
products. A worst-case analysis strategy suggests further focusing on the pink
lemonade design subsystem. This follows because if the appropriate setting for the
fraction of sugar input factor, x1, is found for that product, the design setting would
likely improve sales of both pink and yellow lemonade. A reasonable intermediate
variable to focus on would be the average taste rating, ỹ1.
(Figure 3.4. Lemonade stand with Pink Lemonade Design and Yellow Lemonade Design subsystems, inputs x1 and x2, intermediate variables ỹ1 and ỹ2, and outputs y1 and y2)
Step 1. List the types of nonconformities or causes associated with failing units.
Step 2. Count the number of nonconformities of each type or cause.
Step 3. Sort the nonconformity types or causes in descending order by the counts.
Step 4. Create a category called “other,” containing all counts associated with
nonconformity or cause counts subjectively considered to be few in
number.
Step 5. Bar-chart the counts using the type of nonconformity or causal labels.
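As an illustration, the following Python sketch implements Steps 1 through 5 with a hypothetical list of nonconformity records and a text-based bar chart; the cutoff defining the "other" category is a subjective choice, as in Step 4.

from collections import Counter

records = ["battery life", "battery life", "discomfort", "battery life",
           "lethargy", "battery life", "shielding", "battery life"]

counts = Counter(records)                      # Steps 1-2: list and count types
ranked = counts.most_common()                  # Step 3: sort in descending order

few_cutoff = 1                                 # Step 4: subjectively group rare causes
other = sum(c for _, c in ranked if c <= few_cutoff)
ranked = [(t, c) for t, c in ranked if c > few_cutoff]
if other:
    ranked.append(("other", other))

for cause, count in ranked:                    # Step 5: bar chart (text version here)
    print(f"{cause:<14} {'#' * count} ({count})")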
The “Pareto rule” for cost Pareto charts is that often 20% of the causes are
associated with greater than 80% of the costs. The implications for system design
of cost Pareto charts are similar to those of ordinary Pareto charts.
Answer: Table 3.1 shows the results of Steps 1-3 for both charting procedures.
Note that there are probably not enough nonconformity types to make it desirable
to create “other” categories. Figure 3.5 shows the two types of Pareto charts. The
ordinary chart shows that focusing the project scope on the KOV battery life and
the associated subsystem will probably affect the most people. The second chart
suggests that shielding issues, while rare, might also be prioritized highly for
attention.
Figure 3.5. (a) Pareto chart and (b) cost Pareto chart of hypothetical nonconformities (vertical axes: count of nonconformities and cost; categories: battery life, electro. shielding, irreg. heart beat, discomfort, lethargy)
Note that the Pareto rule applies in the above example, since 80% of the
nonconformities are associated with one type of nonconformity or cause, battery
life. Also, this hypothetical example involves ethical issues since serious
consequences for human patients are addressed. While quality techniques can be
associated with callousness in general, they often give perspective that facilitates
ethical judgements. In some cases, failing to apply methods can be regarded as
ethically irresponsible.
A “check sheet” is a tabular compilation of the data used for Pareto charting. In
addition to the total count vs nonconformity type or cause, there is also information
about the time in which the nonconformities occurred. This information can aid in
identifying trends and the possibility that a single cause might be generating
multiple nonconformities. A check sheet for the pacemaker example is shown in
Table 3.2. From the check sheet, it seems likely that battery issues from certain
months might have caused all other problems except for shielding. Also, these
issues might be getting worse in the summer.
Table 3.2. Check sheet for the pacemaker example (production date)
Nonconformity              Jan.  Feb.  Mar.  April  May  June  July    Total
Battery life 3 4 5 12
Irregular heart beat 2 2
Electromagnetic shielding 1 1
Discomfort 1 1
Lethargy 1 1
3.5.2 Benchmarking
The term “benchmarking” means setting a standard for system outputs that is
useful for evaluating design choices. Often, benchmarking standards come from
evaluations of competitor offerings. For this reason, companies routinely purchase
competitor products or services to study their performance. In school, studying
your fellow student’s homework solutions is usually considered cheating. In some
cases, benchmarking in business can also constitute illegal corporate espionage.
Often, however, benchmarking against competitor products is legal, ethical, and
wise. Consult with lawyers if you are unclear about the rules relevant to your
situation.
The version of benchmarking that we describe here, listed in Algorithm 3.3,
involves creating two different matrices following Clausing (1994) p. 66. These
matrices will fit into a larger “Quality Function Deployment” “House of
Quality” that will be described fully in Chapter 6. The goal of the exercise is to
create a visual display to inform project definition decision-making. Specifically, by
creating the two matrices, the user should have a better idea about which key input
variables and outputs should be included in the project scope.
Question: Study the following benchmarking tables and recommend two KIVs and
two intermediate variables for inclusion in project scope at ACME, Inc. Include
one target for an intermediate variable (INT). Explain in three sentences.
Table 3.3. Three customer issues (qc = 3) and average ratings from ten customers
Customer Issue                                  ACME, Inc.   Runner, Inc.   Coyote, Inc.
Structure is strong because of joint shape          4.7           9.0            4.0
Surface is smooth requiring little rework           5.0           8.6            5.3
Clean factory floor, little work in process         4.3           5.0            5.0
Answer: Table 3.3 shows that Runner, Inc. is dominant with respect to addressing
customer concerns. Table 3.4 suggests that Runner, Inc.’s success might be
attributable to travel speed, preheat factor settings, and an impressive control of the
fixture gap. These should likely be included in the study subsystem as inputs x1 and
x2 and output ỹ1, respectively, with output target ỹ1 < 0.2 mm.
Table 3.4. Benchmark key input variables (KIV), intermediate variables (INT), and key
output variables (KOVs). Recoverable column headings include KIV - Travel Speed (ipm),
KIV - Heating Pretreatment, KOV - Support Flatness (-), and an INT measured in mm.
ACME, Inc.:   35.0  8.0  15.0  2.0  N  1.1  0.9  1.1  3.5  1.1
Runner, Inc.: 42.0  9.2  15.0  2.0  Y  0.9  0.2  1.2  4.0  1.2
Coyote, Inc.: 36.0  9.5  15.0  2.5  N  0.9  0.9  1.0  1.5  1.0
Step 1. The facilitator suggests, amends, and documents the meeting rules
and agenda based on participant ideas and approval.
Step 2. The facilitator declares default actions or system inputs that will go
into effect unless they are revised in the remainder of the meeting.
If appropriate, these defaults come from the ranking management.
Step 3. The facilitator implements the agenda, which is the main body of
the meeting.
Step 4. The facilitator summarizes meeting results including (1) actions to
be taken and the people responsible, and (2) specific
recommendations generated, which usually relate to inputs to some
system.
Step 5. The facilitator solicits feedback about the meeting rules and agenda
to improve future meetings.
Step 6. Participants thank each other for attending the meeting and say
good-bye.
Answer: Default actions: Use the e-mailed list of KIVs, KOVs, and targets.
1. Review the e-mailed list of KIVs, KOVs, and targets
Question: Consider the two written numbers 2.38 and 50.21 ± 10.0. What are the
associated significant figures and digit locations?
Answer: The significant figures of 2.38 are 3. The digit location of 2.38 is –2 since
10⁻² = 0.01. The number of significant figures of 50.21 ± 1.0 is 1 since the first
digit in front of the decimal cannot be trusted. If it were ± 0.49, then the digit could
be trusted. The digit location of 50.21 ± 10.0 is 1 because 10¹ = 10.
Step 1. Develop ranges (low, high) for all inputs x1,…,xn using explicit
uncertainty if available or implied uncertainty if not otherwise specified.
Step 2. Perform calculations using all 2ⁿ combinations of range values.
Step 3. The ranges associated with the output number are the highest and lowest
numbers derived in Step 2.
Step 4. Write the product either using all available digits together with the
explicit range or including only significant digits.
If only significant digits are reported, then rounding should be used in the
formal derivation of significant figures method. Also, it is generally reasonable to
apply some degree of rounding in reporting the explicit ranges. Therefore, the most
explicit, correct representation is in terms of a range such as (12.03, 12.13) or
12.07 ± 0.05. Still, 12.1 is also acceptable, with the uncertainty being implied.
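A small Python sketch of Steps 1 through 4 is given below; it evaluates a supplied function at all 2ⁿ corner combinations of the input ranges (the function names are illustrative only). The usage line applies it to the ranges used in the sum example that follows, which correspond to 2.51 and 10.2 with their implied uncertainties.

from itertools import product

def formal_range(f, ranges):
    """f takes n arguments; ranges is a list of (low, high) pairs (Step 1).
    Returns the (low, high) range of the output (Steps 2-3)."""
    corners = [f(*combo) for combo in product(*ranges)]   # Step 2: all 2**n combinations
    return min(corners), max(corners)                     # Step 3

low, high = formal_range(lambda x1, x2: x1 + x2, [(2.505, 2.515), (9.7, 10.7)])
print(round(low, 3), round(high, 3))   # 12.205 13.215, reported as 12.71 with range (12.2, 13.2)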
Sum Answer: In Step 1, the range for x1 is (2.505, 2.515) and for x2 is (9.7, 10.7).
In Step 2, the 2² = 4 sums are: 2.505 + 9.7 = 12.205, 2.505 + 10.7 = 13.205, 2.515
+ 9.7 = 12.215, and 2.515 + 10.7 = 13.215. The ranges in Step 3 are (12.205,
13.215). Therefore, the sum can be written 12.71 with range (12.2, 13.2) with
rounding. This can also be written 12.71 ± 0.5.
Product Answer: In Step 1, the range for x1 is (2.505, 2.515) and for x2 is (9.7,
10.7). In Step 2, the 2² = 4 products are: 2.505 × 9.7 = 24.2985, 2.505 × 10.7 =
26.8035, 2.515 × 9.7 = 24.3955, and 2.515 × 10.7 = 26.9105. The ranges in Step
3 are (24.2985, 26.9105). Therefore, the product can be written 25.602 with
uncertainty range (24.3, 26.9) with rounding. This could be written 25.602 ± 1.3.
Whole Number Answer: In Step 1, the range for x1 is (4, 4) and for x2 is (2, 2)
since we are dealing with whole numbers. In Step 2, the 2² = 4 products are: 4 × 2
= 8, 4 × 2 = 8, 4 × 2 = 8, and 4 × 2 = 8. The ranges in Step 3 are (8, 8). Therefore,
the product can be written as 8 jobs with uncertainty range (8, 8). This could be
written 8 ± 0.000.
Note that in multiplication or product situations, the uncertainty range does not
usually split evenly on either side of the quoted result. Then, the notation (–,+) can
be used. One attractive feature of the “Formal Derivation of Significant Figures”
method proposed here is that it can be used in cases in which the operations are not
arithmetic in nature, which is the purpose of the next example.
Answer: In Step 1, the range for x1 is (2.45, 2.55), for x2 is (5.15, 5.25), and
for x3 is (2.05, 2.15). In Step 2, the 2³ = 8 results are: 2.45 × exp(5.15 × 2.05) =
94,238.9,…,2.55 × exp(5.25 × 2.15) = 203,535.0 (see Table 3.5). The ranges in
Step 3 are (94238.9, 203535.0). Therefore, the result can be written 132,649.9 with
range (94,238.9, 203,535.0) with rounding. This can also be written 132,649.9 (–
38,411.0, +70,885.1) or 148,886.95 ± 54,648.05.
In general, it is probably most helpful and explicit to give the
calculated value ignoring uncertainty followed by the (–,+) range, e.g.,
132,649.9 (–38,411.0, +70,885.1). Quoting the middle number in the range
followed by “±” is also acceptable and is relatively concise, e.g., 148,886.95 ±
54,648.05.
In some cases, it is not necessary to calculate all 2ⁿ products, since it is
predictable which combinations will give the minimum and maximum in Step 3.
For example, in all of the above examples, it could be deduced that the first
combination would give the lowest number and the last would give the highest
number. The rigorous proof of these facts is the subject of an advanced problem at
the end of this chapter.
The formal derivation of significant figures method proposed here does not
constitute a world standard. Mullis and Lee (1998) and Lee et al. (2000) propose a
coherent convention for addition, subtraction, multiplication, and division
operations. The desirable properties of the method in this book are: (1) it is
relatively simple conceptually, (2) it is applicable to all types of calculations, and
(3) it gives sensible results in some problems that certain methods in other books
do not. One limitation of the method proposed here is that it might be viewed as
exaggerating the uncertainty, since only the extreme lows and highs are reported.
Statistical tolerancing based on Monte Carlo simulation described in Parts II and
III of this book generally provides the most realistic and relevant information
possible. Statistical tolerancing can also be applied to all types of numerical
calculations.
Finally, many college students have ignored issues relating to the implied
uncertainty of numbers in prior course experiences. Perhaps the main point to
remember is that in business or research situations where thousands of dollars hang
in the balance, it is generally advisable to account for uncertainties in decision-
making. In the remainder of this book, the formal derivation of significant figures
method is not always applied. However, there is a consistent effort to write
numbers in a way that approximately indicates their implied uncertainty. For
example, 4.521 will not be written when what is meant is 4.5 ± 0.5.
Table 3.5. Calculation for the formal derivation of significant figures example
x1 x2 x3 x1 × exp(x2 × x3)
2.45 5.15 2.05 94238.9
2.55 5.15 2.05 98085.4
2.45 5.25 2.05 115680.6
2.55 5.25 2.05 120402.2
2.45 5.15 2.15 157721.8
2.55 5.15 2.15 164159.4
2.45 5.25 2.15 195553.3
2.55 5.25 2.15 203535.0
Question: A cellular relay tower manufacturer has a large order for model #1. The
company is considering spending $2.5M to double capacity to a reworked line or,
alternatively, investing in a project to reduce the fraction nonconforming of the
machine line feeding into the reworked line. Currently, 30% of units are
nonconforming and need to be reworked. Recommend a project scope, including
the key intermediate variable(s).
(Figure 3.6: diagram of the sales, manufacturing line, and rework subsystems)
Answer: The bottleneck is clearly not in sales, since a large order is in hand. The
rework capacity is a bottleneck. It is implied that the only way to increase that
capacity is through expending $2.5M, which the company would like to avoid.
Therefore, the manufacturing line is the relevant bottleneck subsystem, with the
key intermediate variable being the fraction nonconforming going into rework, ỹ1,
in Figure 3.6. Reducing this fraction to 15% or less should be roughly equivalent to
doubling the rework capacity.
Question: Suppose the team would like to put more specific information about
subsystem KIVs and KOVs into the project charter. Assume that much of the
information about KIVs is known only by hourly workers on the factory floor.
How could Pareto charts and formal meeting rules aid in collecting the desired
information?
3.9 References
Clausing D (1994) Total Quality Development: A Step-By-Step Guide to World-
Class Concurrent Engineering. ASME Press, New York
Goldratt, EM, Cox J (2004) The Goal, 3rd edition. North River Press, Great
Barrington, MA
Lee W, Mulliss C, Chiu HC (2000) On the Standard Rounding Rule for Addition
and Subtraction. Chinese Journal of Physics 38:36-41
Martin P, Oddo F, Tate K (1997) The Project Management Memory Jogger: A
Pocket Guide for Project Teams. GOAL/QPC, Salem, NH
Mullis C, Lee W (1998) On the Standard Rounding Rule for Multiplication and
Division. Chinese Journal of Physics 36:479-487
Robert SC, Robert HM III, Robert GMH (2000) Robert’s Rules of Order, 10th edn.
Robert HM III, Evans WJ, Honemann DH, Balch TJ (eds). Perseus
Publishing, Cambridge, MA
Streibel BJ (2002) The Manager’s Guide to Effective Meetings. McGraw-Hill,
New York
3.10 Problems
In general, provide the correct and most complete answer.
1. According to the text, which of the following is true of six sigma projects?
a. Projects often proceed to the measure phase with no predicted
savings.
b. Before the define phase ends, the project’s participants are agreed
upon.
c. Project goals and target completion dates are generally part of project
charters.
d. All of the above are true.
e. Only the answers in parts “b” and “c” are correct.
4. Which of the following constitutes an ordered list of two input variables, two
intermediate variables, and an output variable?
a. Lemonade stand (% sugar, % lemons, taste rating, material cost, total
profit)
b. Chair manufacturing (wood type, saw type, stylistic appeal, waste,
profit)
c. Chip manufacturing (time in acid, % silicon, % dopant, % acceptable,
profit)
d. All of the above fit the definition.
e. Only the answers in parts “a” and “b” are correct.
5. A potential scope for the sales subsystem for a lemonade stand is:
a. Improve the taste of a different type of lemonade by adjusting the
recipe.
b. Increase profit through reducing raw material costs and optimizing over
the price.
c. Reduce “cycle time” between purchase of materials and final product
delivery.
d. All of the above fit the definition as used in the text.
e. Only the answers in parts “b” and “c” are scope objectives.
7. Which of the following are possible deliverables from a wood process project?
a. Ten finished chairs
b. Posters comparing relevant competitor chairs
c. Settings that minimize the amount of wasted wood
d. All of the above are tangible deliverables.
e. Only the answers in parts “a” and “c” are tangible deliverables.
12. Why might using rework and scrap costs to evaluate the cost of
nonconformities be inaccurate?
a. Rework generally does not require expense.
b. If there are many nonconforming units, some inevitably reach
customers.
c. Production defects increase lead times, resulting in lost sales.
d. All of the above are possible reasons.
e. Only the answers in parts “b” and “c” are correct.
Your team (two design engineers and one quality engineer, working for four
months, each at 25% time) works to achieve $250k total savings over three
different production lines (assuming a two-year payback period). A new project
requiring all three engineers is proposed for application on a fourth production line
with similar issues to the ones previously addressed.
13. Assuming the same rate as for the preceding projects, the total number of
person-years likely needed is approximately:
a. 0.083
b. 0.075
c. 0.006
d. -0.050
e. 0.125
14. According to the chapter, expected savings over two years is approximately:
a. $83.3K
b. $166.6K
c. $66.6K
d. -$0.7K, and the project should not be undertaken.
15. According to the chapter, the expected profit over two years is approximately:
a. $158.3K
b. $75K
c. $66.6K
d. -$16.7K, and the project should not be undertaken.
16. In three sentences or less, describe a system from your own life with a
bottleneck.
18. According to the text, which is true of the theory of constraints (TOC)?
a. Workers on non-bottleneck subsystems have zero effect on the
bottom line.
b. Identifying bottleneck subsystems can help in selecting project
KOVs.
c. Intermediate variables cannot relate to total system profits.
d. All of the above are true of the theory of constraints.
e. Only the answers in parts “b” and “c” are correct.
19. Which is a categorical factor? (Give the correct and most complete answer.)
a. Temperature used within an oven
b. The horizontal location of the logo on a web page
c. Type of tire used on a motorcycle
d. All of the above are categorical factors.
e. All of the above are correct except (a) and (d).
A hospital is losing business because of its reputation for long patient waits. It has
similar emergency and non-emergency patient processing tracks, with most
complaints coming from the emergency process. Patients in a hospital system
generally spend the longest time waiting for lab test results in both tracks. Data
entry, insurance, diagnosis, triage, and other activities are generally completed
soon after the lab results become available.
23. An engineer might use a Pareto chart to uncover what type of information?
a. Prioritization of nonconformity types to identify the relevant subsystem.
b. Pareto charts generally highlight the most recent problems discovered
on the line.
c. Pareto charting does not involve attribute data.
d. All of the above are correct.
e. Only the answers in parts “b” and “c” result from a Pareto chart.
Figure 3.7 is helpful for answering Questions 24-26. It shows the hypothetical
number of grades not in the “A” range by primary cause as assessed by a student.
(Figure 3.7: number of grades not in the "A" range by primary cause; categories: Insufficient Studying, Lack of Study gp., Personal Issue, Tough Grading, Grading Errors)
24. Which statement or statements summarize the results of the Pareto analysis?
a. The obvious interpretation is that laziness causes most grade
problems.
b. Avoiding courses with tough grading will likely not have much of an
effect on her GPA.
c. Personal issues and instructors' grading errors probably did not have much of
an effect on her GPA.
d. All of the above are supported by the analysis.
26. Which of the following are reasons why this analysis might be surprising?
a. She was already sure that studying was the most important problem.
b. She has little faith that studying hard will help.
c. Her most vivid memory is a professor with a troubling grading
policy.
d. All of the above could be reasons.
e. Only answers in parts (b) and (c) would explain the surprise.
The following hypothetical benchmarking data, in Tables 3.6 and 3.7, is helpful for
answering Questions 27-30. Note that the tables, which are incomplete and in a
nonstandard format, refer to three student airplane manufacturing companies. The
second table shows subjective customer ratings of products (1-10, with 10 being
top quality) from the three companies.
29. Based on customer ratings, which company has “best in class” quality?
a. FlyRite
b. Hercules
c. Reliability
d. Ripping
e. None dominates all others.
30. At FlyRite, which KIVs seem the most promising inputs for further study
(focusing on emulation of best in class practices)?
a. Scissor type, wing length, and arm release angle
b. Scissor type, wing length, and paper thickness
c. Paper thickness only
d. Aesthetics and crumpling
e. Scissor type and aesthetics
31. Formal meeting rules in agreement with those from the text include:
a. Facilitators should not enforce the agenda.
b. Each participant receives three minutes to speak at the start of the
meeting.
c. No one shall speak without possession of the conch shell.
d. All of the above are potential meeting rules.
e. All of the above are correct except (a) and (d).
32. In three sentences, describe a scenario in which Pareto charting could aid in
making ethical judgements.
37. What is the explicit uncertainty of 4.2 + 2.534 (permitting accurate rounding)?
a. 6.734
b. 6.734 ± 0.1
c. 6.734 ± 0.0051
d. 6.734 ± 0.0505
e. 6.734 ± 0.0055
39. What is the explicit uncertainty of 4.2 × 2.534 (permitting accurate rounding)?
a. 10.60 ± 0.129
b. 10.60 ± 0.10
c. 10.643 ± 0.129
d. 10.65 ± 0.10
e. 10.643 ± 0.100
41. What is the explicit uncertainty of y = 5.4 × exp (4.2 – 1.3) (permitting
accurate rounding)?
a. 98.14 ± 0.015
b. 98.14 (-10.17, +10.33)
c. 98.15 (-10.16, +10.33)
d. 98.14 (-10.16, +11.33)
e. 98.15 (-10.17, +11.33)
42. What is the explicit uncertainty of y = 50.4 × exp (2.2 – 1.3) (permitting
accurate rounding)?
a. 123.92 ± 11
b. 123.96 (-11.9,+13.17)
c. 123.92 (-11.9,+13.17)
d. 123.96 (-9.9,+13.17)
e. 123.92 (-9.9,+13.17)
4 Measure Phase and Statistical Charting
4.1 Introduction
In Chapter 2, it was suggested that projects are useful for developing
recommendations to change system key input variable (KIV) settings. The measure
phase in six sigma for improvement projects quantitatively evaluates the current or
default system KIVs, using thorough measurements of key output variables
(KOVs) before changes are made. This information aids in evaluating effects of
project-related changes and assuring that the project team is not harming the
system. In general, quantitative evaluation of performance and improvement is
critical for the acceptance of project recommendations. The more data, the less
disagreement.
Before evaluating the system directly, it is often helpful to evaluate the
equipment or methods used for measurements. The term “measurement systems”
refers to the methods for deriving KOV numbers from a system, which could be
anything from simple machines used by an untrained operator to complicated
accounting approaches applied by teams of highly trained experts. The terms
“gauge” and “gage,” alternate spellings of the same word, referred historically to
physical equipment for certain types of measurements. However, here gauge and
measurement systems are used synonymously, and these concepts can be relevant
for such diverse applications as measuring profits on financial statements and
visually inspecting weld quality.
Measurement systems generally have several types of errors that can be
evaluated and reduced. The phrase “gauge repeatability & reproducibility”
(R&R) methods refers to a set of methods for evaluating measurement systems.
This chapter describes several gauge R&R related methods with examples.
Thorough evaluation of system inputs generally begins after the acceptable
measurement systems have been identified. Evaluation of systems must include
sufficient measurements at each time interval to provide an accurate picture of
performance in that interval. The evaluation must also involve a study of the
system over sufficient time intervals to ensure that performance does not change
greatly over time. The phrase "statistical process control" (SPC) charting refers to a
set of charting methods for evaluating system performance over multiple time intervals.
Answer: Gauge R&R evaluates measurement systems. These evaluations can aid
in improving the accuracy of measurement systems. SPC charting uses
measurement systems to evaluate other systems. If the measurement systems
improve, SPC charting will likely give a more accurate picture of the other systems'
quality.
Table 4.1. Measurement evaluation methods by standard value availability and measurement type
Standard values    Non-destructive evaluation     Destructive testing
Available          Comparison with standards      Comparison with standards
Not available      Gauge R&R (crossed)            Not available
It is apparent from Table 4.1 that gauge R&R (crossed) should be used only
when standard values are not available. To understand this, a few definitions may
be helpful. First, “repeatability error”(εrepeatability) refers to the difference between
a given observation and the average a measurement system would obtain through
many repetitions. Second, “reproducibility error” (εreproducibility) is the difference
between an average obtained by a relevant measurement system and the average
obtained by all other similar systems (perhaps involving multiple people or
equipment of similar type). In general, we will call a specific measurement system
an “appraiser” although it might not be a person. Here, an appraiser could be a
consulting company, or a computer program, or anything else that assigns a
number to a system output.
Third, the phrase “systematic errors” (εsystematic) refers in this book to the
difference between the average measured by all similar systems for a unit and that
unit’s standard value. Note that in this book, reproducibility is not considered a
systematic error, although other books may present that interpretation. Writing the
measurement error as εmeasurement, the following equation follows directly from these
definitions:
    εmeasurement = εrepeatability + εreproducibility + εsystematic        (4.1)
measurement system. Since gauge (crossed) does not use standard values, it can be
regarded as a second-choice method. However, it is also usable in cases in which
standard values are not available.
Another method called “gauge R&R (nested)” is omitted here for the sake of
brevity. Gauge R&R (nested) is relevant for situations in which the same units
cannot be tested by multiple measurement systems, e.g., parts cannot be shipped to
different testers. Gauge R&R (nested) cannot evaluate either systematic errors or
the separate effects of repeatability and reproducibility errors. Therefore, it can be
regarded as a “third choice” method. Information about gauge R&R (nested) is
available in standard software packages such as Minitab® and in other references
(Montgomery and Runger 1994).
The phrase “standard unit” refers to any of the n units with standard values
available. The phrase “absolute error” means the absolute value of the
measurement errors for a given test run. As usual, the “sample average” (Yaverage)
of Y1, Y2, …, Yr is (Y1 + Y2 + … + Yr) ÷ r. The “sample standard deviation” (s) is
given by:
    s = sqrt{ [ (Y1 – Yaverage)² + (Y2 – Yaverage)² + … + (Yr – Yaverage)² ] ÷ (r – 1) }    (4.2)
Clearly, if destructive testing is used, each of the n standard units can only
appear in one combination in Step 1. Also, it is perhaps ideal that the appraisers
should not know which units they are measuring in Step 2. However, hiding
information is usually unnecessary, either because the appraisers have no incentive
to distort the values or the people involved are too ethical or professional to change
readings based on past values or other knowledge.
In the context of comparison with standards, the phrase “measurement system
capability” is defined as 6.0 × EEAE. In some cases, it may be necessary to tell
apart reliably system outputs that have true differences in standard values greater
than a user-specified value. Here we use “D” to refer to this user-specified value.
In this situation, the term “gauge capable” refers to the condition that the
measurement system capability is less than D, i.e., 6.0 × EEAE < D. In general,
acceptability can be determined subjectively through inspection of the EEAE,
which has the simple interpretation of being the error magnitude the measurement
system user can expect to encounter.
(Table fragment: runs 17 through 20 of the randomized measurement plan)
3. Place items on scale and record the weight, rounding to the nearest
pound.
Answer: Assuming that the dumbbell manufacturer controls its products far better
than the scale manufacturer, we have n = 3 standard values: 12.0 pounds, 12.0
pounds, and 24.0 pounds (when both weights are on the scale). With only one scale
and associated SOP, we have m = 1.
Step 2. Table 4.3 also shows the measured values. Note that the scale was picked up
between each measurement to permit the entire measurement SOP to be
evaluated.
    EEEAE = sqrt{ [ (2 – 1.55)² + (0 – 1.55)² + … + (2 – 1.55)² ] ÷ (20 – 1) ÷ 20 } = 0.135
Step 4. Since EEAE ÷ EEEAE » 5, one writes the expected absolute errors as 1.55 ±
0.135 pounds, and the method stops. Since 6.0 × 1.55 = 9.3 pounds is
greater than 5.0 pounds, we say that the SOP is not gauge capable.
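A hedged sketch of these calculations in Python follows; it treats the EEAE as the average absolute error over the test runs, the EEEAE as its estimated standard error (matching the reconstructed formula above), and compares 6.0 × EEAE with D. The function name and the scale readings are hypothetical.

import math

def eeae_summary(measured, standards, D):
    abs_errors = [abs(y - s) for y, s in zip(measured, standards)]
    r = len(abs_errors)
    eeae = sum(abs_errors) / r                               # average absolute error
    eeeae = math.sqrt(sum((e - eeae) ** 2 for e in abs_errors) / (r - 1) / r)
    capability = 6.0 * eeae                                  # measurement system capability
    return eeae, eeeae, capability, capability < D

measured = [11, 10, 23, 11, 10, 22]       # hypothetical scale readings (pounds)
standards = [12, 12, 24, 12, 12, 24]      # corresponding standard values
print(eeae_summary(measured, standards, D=5.0))
# EEAE = 1.5 and capability = 9.0 > D, so this hypothetical SOP is not gauge capable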
In the preceding example, significant figures are less critical than usual because
the method itself provides an estimate of its own errors. Note also that failure to
establish the capability of a measurement system does not automatically signal a
need for more expensive equipment. For example, in the home scale case, the
measurement SOP was not gauge capable, but simply changing the procedures in
the SOP would likely create a capable measurement system. With one exception,
all the measurements were below the standard values. Therefore, one could change
the third step in the SOP to read “Place items on the scale, note the weight to the
nearest pound, and record the noted weight plus 1 pound.”
In general, vagueness in the documentation of measurement SOPs contributes
substantially to capability problems. Sometimes simply making SOPs more
specific establishes capability. For example, the second step in the home scale SOP
might read “Carefully lean over the scale, then adjust the dial to set reading to
zero.”
Finally, advanced readers will notice that the EEAE is equivalent to a Monte
Carlo integration estimate of the expected absolute errors. These readers might also
apply pseudo random numbers in Step 1 of the method. Related material is covered
in Chapter 10 and in Part II of this book. This knowledge is not needed, however,
for competent application of the comparison with standards method.
In general, both gauge R&R (crossed) and gauge R&R (nested) are associated with
two alternative analysis methods: Xbar & R and analysis of variance (ANOVA)
analysis. The experimentation steps in both methods are the same. Many students
find Xbar & R methods intuitively simpler, yet ANOVA methods can offer
important advantages in accuracy. For example, if the entire procedure were to be
repeated, the numerical outputs from the ANOVA process would likely be more
similar than those from Xbar & R methods. Here, we focus somewhat arbitrarily
on the Xbar & R analysis methods. One benefit is that the proposed method derives
from the influential Automotive Industry Action Group (AIAG) Report (1994) and
can therefore be regarded as the industry-standard method.
Step 1a. Create a listing of all combinations of n units or system measurements and m
appraisers. Repeat this list r times, labeling the repeated trials 1,…,r.
Step 1b. Randomly reorder the list and leave one column blank to record the data.
Table 4.4 illustrates the results from Step 1 with n = 5 parts or units, m = 2
appraisers, and r = 3 trials.
Step 2. Appraisers perform the measurements that have not already been performed in
the order indicated in the table from Step 1. This data is referred to using the
notation Yi,j,k where i refers to the unit, j refers to the appraiser involved, and k
refers to the trial. For example, the measurement from Run 1 in Table 4.4 is
referred to as Y3,2,1.
Step 3. Calculate the following (i is for the part, j is for the appraiser, k is for the trial,
n is the number of parts, m is the number of appraisers, r is the number of
trials):
    Yaverage,i,j = (1/r) Σk=1,…,r Yi,j,k and
    Yrange,i,j = Max[Yi,j,1, …, Yi,j,r] – Min[Yi,j,1, …, Yi,j,r]
            for i = 1,…,n and j = 1,…,m,
    Yaverage parts,i = (1/m) Σj=1,…,m Yaverage,i,j    for i = 1,…,n
    Yinspector average,j = (1/n) Σi=1,…,n Yaverage,i,j    for j = 1,…,m
    Yaverage range = (1/(mn)) Σi=1,…,n Σj=1,…,m Yrange,i,j
    Yrange parts = Max[Yaverage parts,1, …, Yaverage parts,n]
            – Min[Yaverage parts,1, …, Yaverage parts,n]                    (4.4)
    Yrange inspect = Max[Yinspector average,1, …, Yinspector average,m]
            – Min[Yinspector average,1, …, Yinspector average,m]
    Repeatability = K1 × Yaverage range
    Reproducibility = sqrt{Max[(K2 × Yrange inspect)² – (1/(nr)) × Repeatability², 0]}
    R&R = sqrt[Repeatability² + Reproducibility²]
    Part = K3 × Yrange parts
    Total = sqrt[R&R² + Part²]
    %R&R = (100 × R&R) ÷ Total
where "sqrt" means square root and K1 = 4.56 for r = 2 trials and 3.05 for r = 3
trials, K2 = 3.65 for m = 2 machines or inspectors and 2.70 for m = 3 machines
or human appraisers, and K3 = 3.65, 2.70, 2.30, 2.08, 1.93, 1.82, 1.74, 1.67, and
1.62 for n = 2, 3, 4, 5, 6, 7, 8, 9, and 10 parts respectively.
Step 4. If %R&R < 10, then one declares that the measurement system is “gauge
capable,” and measurement error can generally be neglected. Depending upon
problem needs, one may declare the process to be marginally gauge capable if
10 ≤ %R&R < 30. Otherwise, more money and time should be invested to
improve the inspection quality.
Table 4.4. Example gauge R&R (crossed) results for (a) Step 1a and (b) Step 1b
(a) Unit, Appraiser, Trial    |    (b) Run, Unit (i), Appraiser (j), Trial (k), Yi,j,k (left blank)
1 1 1 1 3 2 1
2 1 1 2 2 1 1
3 1 1 3 3 1 2
4 1 1 4 5 1 2
5 1 1 5 2 2 1
1 2 1 6 2 2 2
2 2 1 7 4 2 2
3 2 1 8 5 2 2
4 2 1 9 3 1 1
5 2 1 10 2 1 3
1 1 2 11 1 1 1
2 1 2 12 5 1 1
3 1 2 13 3 2 3
4 1 2 14 5 2 3
5 1 2 15 4 2 1
1 2 2 16 1 1 2
2 2 2 17 1 2 2
3 2 2 18 4 1 2
4 2 2 19 1 2 1
5 2 2 20 4 1 1
1 1 3 21 1 1 3
2 1 3 22 5 1 3
3 1 3 23 3 2 2
4 1 3 24 4 2 3
5 1 3 25 5 2 1
1 2 3 26 2 1 2
2 2 3 27 3 1 3
3 2 3 28 4 1 3
4 2 3 29 3 2 1
5 2 3 30 2 1 1
The crossed method involves collecting data from all combinations of n units,
m appraisers, and r trials each. The method is only defined here for the cases
satisfying: 2 ≤ n ≤ 10, 2 ≤ m ≤ 3, and 2 ≤ r ≤ 3. In general it is desirable that the
total number of evaluations is greater than 20, i.e., n × m × r ≥ 20. As for
comparison with standards methods, it is perhaps ideal that appraisers do not know
which units they are measuring.
Note that the gauge R&R (crossed) methods with Xbar & R analysis methods
can conceivably generate undefined reproducibility values in Step 3. If this
happens, it is often reasonable to insert a zero value for the reproducibility.
In general, the relevance of the %R&R strongly depends on the degree to which
the units’ unknown standard values differ. If the units are extremely similar, no
inspection equipment at any cost could possibly be gauge capable. As with
comparison with standards, it might only be of interest to tell apart reliably units
that have true differences greater than a given number, D. This D value may be
much larger than the unknown standard value differences of the parts that
happened to be used in the gauge study. Therefore, an alternative criterion is
proposed here. In this nonstandard criterion, a system is gauge capable if 6.0 ×
R&R < D.
Question: Suppose R&R = 32.0 and Part = 89.0. Calculate and interpret the
%R&R.
Answer: Total = sqrt[R&R² + Part²] = sqrt[32.0² + 89.0²] = 94.6, so %R&R = (100 ×
32.0) ÷ 94.6 = 34%. Since %R&R ≥ 30, the measurement system would not be
considered gauge capable, and more money and time should be invested to improve
the inspection quality.
Table 4.5. Hypothetical undercut data for gauge study (run order shown in parentheses)

Software #1   Part 1      Part 2      Part 3      Part 4      Part 5
Trial 1       0.94 (2)    1.05 (11)   1.03 (13)   1.01 (5)    0.88 (15)
Trial 2       0.94 (7)    1.05 (8)    1.02 (20)   1.04 (18)   0.86 (27)
Trial 3       0.97 (10)   1.04 (19)   1.05 (22)   1.00 (24)   0.88 (30)

Software #2   Part 1      Part 2      Part 3      Part 4      Part 5
Trial 1       0.90 (6)    1.03 (1)    1.03 (12)   1.02 (9)    0.87 (3)
Trial 2       0.94 (7)    1.04 (14)   1.05 (23)   1.01 (17)   0.88 (16)
Trial 3       0.97 (10)   1.01 (26)   1.06 (28)   0.98 (29)   0.87 (25)
The following example shows a way to reorganize the data from Step 2 that can
make the calculations in Step 3 easier to interpret and to perform correctly. It is
illustrated in Table 4.6.
Table 4.6. Measured values and Step 3 calculations for the gauge R&R (crossed) example

Inspector 1      Part 1   Part 2   Part 3   Part 4   Part 5
Trial 1          0.94     1.05     1.03     1.01     0.88
Trial 2          0.94     1.05     1.02     1.04     0.86
Trial 3          0.97     1.04     1.05     1.00     0.88
Yaverage,i,1     0.950    1.047    1.033    1.017    0.873    Yinspector average,1 = 0.984
Yrange,i,1       0.030    0.010    0.030    0.040    0.020

Inspector 2      Part 1   Part 2   Part 3   Part 4   Part 5
Trial 1          0.90     1.03     1.03     1.02     0.87
Trial 2          0.92     1.04     1.05     1.01     0.88
Trial 3          0.91     1.01     1.06     0.98     0.87
Yaverage,i,2     0.910    1.027    1.047    1.003    0.873    Yinspector average,2 = 0.972
Yrange,i,2       0.020    0.030    0.030    0.040    0.010

Yaverage parts,i 0.930    1.037    1.040    1.010    0.873

Yaverage range = 0.026, Yrange parts = 0.167, Yrange inspect = 0.012,
Repeatability = 0.079, Reproducibility = 0.039, R&R = 0.088,
Part = 0.347, Total = 0.358, %R&R = 25%
Answer: Table 4.6 shows the calculations for Step 3. The process is marginally
capable, i.e., %R&R = 25%. This might be acceptable depending upon the goals.
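For readers who prefer to script the analysis, the following Python sketch reproduces the Step 3 (Xbar & R) calculations using the Inspector 1 and Inspector 2 readings from Table 4.6; the variable names mirror the quantities in Step 3, the K constants are those listed there, and the code itself is only an illustration.

import math

# Y[j][i][k]: appraiser j, part i, trial k (m = 2 appraisers, n = 5 parts, r = 3 trials)
Y = [
    [[0.94, 0.94, 0.97], [1.05, 1.05, 1.04], [1.03, 1.02, 1.05],
     [1.01, 1.04, 1.00], [0.88, 0.86, 0.88]],
    [[0.90, 0.92, 0.91], [1.03, 1.04, 1.01], [1.03, 1.05, 1.06],
     [1.02, 1.01, 0.98], [0.87, 0.88, 0.87]],
]
m, n, r = len(Y), len(Y[0]), len(Y[0][0])
K1 = {2: 4.56, 3: 3.05}[r]
K2 = {2: 3.65, 3: 2.70}[m]
K3 = {2: 3.65, 3: 2.70, 4: 2.30, 5: 2.08, 6: 1.93, 7: 1.82, 8: 1.74, 9: 1.67, 10: 1.62}[n]

y_avg = [[sum(Y[j][i]) / r for i in range(n)] for j in range(m)]          # Yaverage,i,j
y_rng = [[max(Y[j][i]) - min(Y[j][i]) for i in range(n)] for j in range(m)]  # Yrange,i,j
y_avg_parts = [sum(y_avg[j][i] for j in range(m)) / m for i in range(n)]
y_insp_avg = [sum(y_avg[j][i] for i in range(n)) / n for j in range(m)]
y_avg_range = sum(sum(row) for row in y_rng) / (m * n)
y_range_parts = max(y_avg_parts) - min(y_avg_parts)
y_range_inspect = max(y_insp_avg) - min(y_insp_avg)

repeatability = K1 * y_avg_range
reproducibility = math.sqrt(max((K2 * y_range_inspect) ** 2
                                - repeatability ** 2 / (n * r), 0.0))
rr = math.sqrt(repeatability ** 2 + reproducibility ** 2)
part = K3 * y_range_parts
total = math.sqrt(rr ** 2 + part ** 2)
print(round(100 * rr / total))  # %R&R, approximately 25 as in Table 4.6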
In 1931, Shewhart formally proposed the Xbar & R charting method he invented
while working at Bell Telephone Laboratories (see the re-published version in
Shewhart 1980). Shewhart had been influenced by the mass production system that
Henry Ford helped to create. In mass production, a small number of skilled
laborers were mixed with thousands of other workers on a large number of
assembly lines producing millions of products. Even with the advent of Toyota’s
lean production in the second half of the twentieth century and the increase of
service sector jobs such as education, health care, and retail, many of the problems
addressed by Shewhart’s method are relevant in today’s workplace.
Figure 4.1 illustrates Shewhart’s view of production systems. On the left-hand
side stands skilled labor such as technicians or engineers. These workers have
responsibilities that blind them to the day-to-day realities of the production
lines. They only see a sampling of possible output numbers generated from those
lines, as indicated by the spread-out quality characteristic numbers flowing over
the wall. Sometimes variation causes characteristic values to go outside the
specification limits, and units become nonconforming. Typically, most of the units
conform to specifications. Therefore, skilled labor generally views variation as an
“evil” or negative issue. Without it, one hundred percent of units would conform.
The phrase “common cause variation” refers to changes in the system outputs
or quality characteristic values under usual circumstances. The phrase “local
authority” refers to the people (not shown) working on the production lines and
local skilled labor. Most of the variation in the characteristic numbers occurs
because of the changing of factors that local authority cannot control. If the people
and systems could control the factors and make all the quality characteristics
constant, they would do so. Attempts to control the factors that produce common
cause variation generally waste time and add variation. The term “over-control”
refers to a foolish attempt to dampen common cause variation that actually
increases it. Only a large, management-supported improvement project can reduce
the magnitude of common cause variation.
On the other hand, sometimes unusual problems occur that skilled labor and
local authority can fix or make less harmful. This is indicated in Figure 4.1 by the
gremlin on the right-hand side. If properly alerted, skilled labor can walk around
the wall and scare away the gremlin. The phrase “assignable cause” refers to a
change in the system inputs that can be reset or resolved by local authority.
Examples of assignable causes include meddling engineers, training problems,
unusually bad batches of materials from suppliers, end of financial quarters, and
vacations. Using the vocabulary of common and assignable causes, it is easy to
express the primary objectives of the statistical process control charts:
Figure 4.1. Scarce resources, assignable causes, and data in statistical process control
Possible Answer: Lone customers and employees stealing small items from the
floor or warehouse contribute to common cause variation. A conspiracy of multiple
employees systematically stealing might be terminated by local management.
Question: Which charting procedure is most relevant for monitoring retail theft?
Also, provide two examples of rational subgroups.
Possible Answer: A natural key output variable (KOV) is the amount of money or
value of goods stolen. Since the amount is a single continuous variable, Xbar & R
charting is the most relevant of the methods in this book. The usual inventory
counts that are likely in place can be viewed as complete inspection with regard to
property theft. Because inventory counts might not be gauge capable, it might
make sense to institute random intense inspection of a subset of expensive items at
the stores.
There are no universal standard rules for selecting n and τ. This selection is
done in a pre-step before the method begins. Three considerations relevant to the
selection follow. First, a rule of thumb is that n should satisfy n × p0 > 5.0 and n ×
(1 – p0) > 5.0. This may not be helpful, however, since p0 is generally unknown
before the method begins. Advanced readers will recognize that this is the
approximate condition for p to be normally distributed. In general, this condition
can be expected to improve the performance of the charts. In many relevant
situations p0 is less than 0.05 and, therefore, n should probably be greater than 100.
Second, τ should be short enough such that assignable causes can be identified
and corrected before considerable financial losses occur. It is not uncommon for
the charting procedure to require a period of 2 × τ before signaling that an
assignable cause might be present. In general, larger n and smaller τ value shorten
response times. If τ is too long, the slow discovery of problems will cause
unacceptable pile-ups of nonconforming items and often trigger complaints.
Step 1. (Startup) Obtain the total fraction of nonconforming units or systems using
25 rational subgroups each of size n. This should require at least 25 ×
τ time. Tentatively, set p0 equal to this fraction.
Step 2. (Startup) Calculate the “trial” control limits using
UCLtrial = p0 + 3.0 × sqrt[p0 × (1 – p0) ÷ n],
CLtrial = p0, and (4.5)
LCLtrial = Maximum{p0 – 3.0 × sqrt[p0 × (1 – p0) ÷ n], 0.0},
where “Maximum” means take the largest of the numbers separated by
commas.
Step 3. (Startup) Identify all the periods for which p = fraction nonconforming in
that period and p < LCLtrial or p > UCLtrial. If the results from any of these
periods are believed to be not representative of future system operations,
e.g., because their assignable causes were fixed permanently, remove the
data from the l not representative periods from consideration.
Step 4. (Startup) Calculate the total fraction nonconforming based on the remaining
25 – l periods and (25 – l) × n data and set p0 equal to this number. The
quantity p0 is sometimes called the “process capability” in the context of p-
charting. Calculate the revised limits using the same formulas as in Step 2:
UCL = p0 + 3.0 × sqrt[p0 × (1 – p0) ÷ n],
CL = p0, and
LCL = Maximum{p0 – 3.0 × sqrt[p0 × (1 – p0) ÷ n], 0.0}.
Step 5. (Steady State) Plot the fraction nonconforming, pj, for each period j together
with the upper and lower control limits.
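The Step 2 and Step 4 limit calculations are easy to automate. The following Python sketch (the helper names p_chart_limits and p_chart_startup are illustrative, not from the text) computes the trial limits from the subgroup counts and then the revised limits after the periods judged not representative are removed:

import math

def p_chart_limits(p0, n):
    """Return (LCL, CL, UCL) for a p-chart with process capability p0 and subgroup size n."""
    half_width = 3.0 * math.sqrt(p0 * (1.0 - p0) / n)
    return max(p0 - half_width, 0.0), p0, p0 + half_width

def p_chart_startup(counts, n, drop=()):
    """Steps 1-4: counts holds the nonconforming count for each rational subgroup;
    drop lists the (0-indexed) periods removed in Step 3 as not representative."""
    p0_trial = sum(counts) / (len(counts) * n)                      # Step 1
    trial_limits = p_chart_limits(p0_trial, n)                      # Step 2
    kept = [c for j, c in enumerate(counts) if j not in set(drop)]  # Step 3
    p0_revised = sum(kept) / (len(kept) * n)                        # Step 4
    return trial_limits, p0_revised, p_chart_limits(p0_revised, n)

For example, with the restaurant counts tabulated below, p_chart_startup(counts, 200, drop=(8, 9)) reproduces the trial limits computed in that example and gives the revised limits used for steady state monitoring.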
Answer: It is implied that if a customer does not agree that everything is OK, then
everything is not OK. Then, also, the restaurant party associated with the
unsatisfied customer is effectively a nonconforming unit. Therefore, p-charting is
relevant since the given data is effectively the count of nonconforming units. Also,
the number to be inspected is constant so a fixed value of n = 200 is used in all
calculations.
First, the trial limit calculations are
p0 = (total number nonconforming) ÷ (total number inspected)
= 252/5000 = 0.050,
UCLtrial = 0.050 + 3.0 × sqrt [(0.050) × (1 – 0.050) ÷ 200] = 0.097,
CLtrial = 0.050, and
LCLtrial = Max {0.050 – 3.0 × sqrt [(0.050) × (1 – 0.050) ÷ 200], 0.0} = 0.004.
Figure 4.2 shows a p-chart of the startup period at the restaurant location.
Clearly, the p values in the week 9 and 10 subgroups constitute out-of-control signals.
These signals were likely caused by a rare assignable cause (construction) that
makes the associated data not representative of future usual conditions. Therefore,
the associated data are removed from consideration.
Table 4.7. (a) Startup period restaurant data and (b) data available after startup period

(a)                                        (b)
Week  Sum Not OK    Week  Sum Not OK       Week  Sum Not OK
1     8             14    10               1     8
2     7             15    9                2     8
3     10            16    5                3     11
4     6             17    8                4     2
5     8             18    9                5     7
6     9             19    11
7     8             20    8
8     11            21    9
9     30            22    9
10    25            23    10
11    10            24    6
12    9             25    9
13    8
Figure 4.2. Restaurant p-chart during the startup period (fraction nonconforming p by trial period week, with the UCL, CL, and LCL)
Question: Plot the data in Table 4.7(b) on the control chart derived in the previous
example. Are there any out-of-control signals?
Answer: Figure 4.3 shows an ongoing p-charting activity in its steady state. No
out-of-control signals are detected by the chart.
In the above example, the lower control limit (LCL) was zero. An often
reasonable convention for this case is to consider only zero fractions
nonconforming (p = 0) to be out-of-control signals if they occur repeatedly. In any
case, values of p below the lower control limit constitute positive assignable causes
and potential information with which to improve the process. After an
investigation, the local authority might choose to use information gained to rewrite
standard operating procedures.
Since many manufacturing systems must obtain fractions of nonconforming
products much less than 1%, p-charting the final outgoing product often requires
complete inspection. Then, n × p0 « 5.0, and the chart can be largely ineffective in
both evaluating quality and monitoring. Therefore, manufacturers often use the
charts upstream for units going into a rework operation. Then the fraction
nonconforming might be much higher. The next example illustrates this type of
application.
Figure 4.3. Restaurant p-chart during the steady state period (fraction nonconforming p by week, with the UCL, CL, and LCL)
Question: A process engineer decides to study the fraction of welds going into a
rework operation using p-charting. Suppose that 2500 welds are inspected over 25
days and 120 are found to require rework. Suppose one day had 42 nonconforming
welds which were caused by a known corrected problem and another subgroup had
12 nonconformities but no assignable cause could be found. What are your revised
limits and what is the process capability?
Answer: Assuming a constant sample size with 25 subgroups gives n = 100. The
trial limits are p0 = 120/2500 = 0.048, UCLtrial = 0.110, CLtrial = 0.048, and LCLtrial
= 0.000, so there is effectively no lower control limit. We remove only the
subgroup whose values are believed to be not representative of the future. The
revised numbers are p0 = 78/2400 = 0.0325, UCL = 0.0857, CL = the process
capability = 0.0325, and LCL = 0.000.
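As a quick arithmetic check on the revised limits just computed, a few lines of Python suffice (only the count from the single removed subgroup is needed):

import math

n = 100
p0_revised = (120 - 42) / (2500 - 100)   # drop the one non-representative subgroup
half_width = 3.0 * math.sqrt(p0_revised * (1.0 - p0_revised) / n)
print(round(p0_revised, 4), round(p0_revised + half_width, 4), max(p0_revised - half_width, 0.0))
# prints 0.0325 0.0857 0.0, i.e., the process capability, UCL, and LCL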
The weights are chosen subjectively by the people constructing the demerit chart. Here, we consider a single scheme in which
there are two classes of nonconformities: particularly serious nonconformities with
weight 5.0, and typical nonconformities with weight 1.0. The following symbols
are used in the description of demerit charting:
1. n is the number of samples in each rational subgroup. If the sample size
varies because of choice or necessity, it is written ni, where i refers to the
relevant sampling period. Then, n1 ≠ n2 might occur and/or n1 ≠ n3, etc.
2. cs is the number of particularly serious nonconformities in a subgroup.
3. ct is the number of typical nonconformities in a subgroup.
4. c is the weighted count of nonconformities in a subgroup or, equivalently,
the sum of the demerits. In terms of the proposed convention:
c = 5.0 × cs + 1.0 × ct (4.6)
5. u is the average number of demerits per item in a subgroup. Therefore,
u = c ÷ n (4.7)
6. u0 is the true average number of weighted nonconformities per item in all
subgroups under consideration.
Question: Table 4.8 summarizes the results from the satisfaction surveys written
by patients being discharged from a hospital wing. It is known that day 5 was a
major holiday and new medical interns arrived during day 10. Construct an SPC
chart appropriate for monitoring patient satisfaction.
Step 1. (Startup) Obtain the weighted sum of all nonconformities, C, and count
of units or systems, N = n1 + … + n25 from 25 time periods. Tentatively,
set u0 equal to C ÷ N.
Step 2. (Startup) Calculate the “trial” control limits using
UCLtrial = u0 + 3.0 × sqrt(u0 ÷ n),
CLtrial = u0, and (4.8)
LCLtrial = Maximum{u0 – 3.0 × sqrt(u0 ÷ n), 0.0}.
Step 3. (Startup) Define c as the number of weighted nonconformities in a given
period. Define u as the weighted count of nonconformities per item in
that period, i.e., u = c/n. Identify all periods for which u < LCLtrial or u >
UCLtrial. If the results from any of these periods are believed to be not
representative of future system operations, e.g., because problems were
fixed permanently, remove the data from the l not representative periods
from consideration.
Step 4. (Startup) Calculate the number of weighted nonconformities per unit
based on the remaining 25 – l periods and (25 – l) × n data and set this
equal to u0. The quantity u0 is sometimes called the “process capability”
in the context of demerit charting. Calculate the limits using the
formulas repeated from Step 2:
UCL = u0 + 3.0 × sqrt(u0 ÷ n),
CL = u0, and
LCL = Maximum{u0 – 3.0 × sqrt(u0 ÷ n), 0.0}.
Step 5. (Steady State) Plot the number of nonconformities per unit, u, for each
future period together with the revised upper and lower control limits.
An out-of-control signal is defined as a case in which the weighted
nonconformities per unit for a given time period, u, is below the lower control
limit (u < LCL) or above the upper control limit (u > UCL). From then
on, technicians and engineers are discouraged from making minor
process changes unless a signal occurs. If a signal does occur, designated
people should investigate to see if something unusual and fixable is
happening. If not, the signal is referred to as a false alarm.
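The startup calculations in Steps 1 and 2 can also be sketched in Python (the names demerit_chart_startup and WEIGHTS are illustrative); the function could be applied to the per-day serious complaint, typical complaint, and discharge counts in Table 4.8:

import math

WEIGHTS = {"serious": 5.0, "typical": 1.0}

def demerit_chart_startup(serious, typical, counts):
    """Startup Steps 1-2 for a demerit chart. serious, typical, and counts hold, for each
    period, the serious nonconformities, typical nonconformities, and number of units n_i."""
    c = [WEIGHTS["serious"] * cs + WEIGHTS["typical"] * ct
         for cs, ct in zip(serious, typical)]             # weighted counts, Equation (4.6)
    u = [ci / ni for ci, ni in zip(c, counts)]            # demerits per unit, Equation (4.7)
    u0 = sum(c) / sum(counts)                             # Step 1
    limits = []
    for ni in counts:                                     # Step 2; limits vary when n_i varies
        half_width = 3.0 * math.sqrt(u0 / ni)
        limits.append((max(u0 - half_width, 0.0), u0, u0 + half_width))
    return u, u0, limits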
The out-of-control signals occurred on Day 5 (an unusually positive statistic) and
Days 11 and 12. It might subjectively be considered reasonable to remove Day 5
since patients might have felt uncharacteristically positive due to the rare major
holiday. However, it is less clear whether removing data associated with days 11
and 12 would be fair. These patients were likely affected by the new medical
interns. Considering that new medical interns affect hospitals frequently and local
authority might have little control over them, their effects might reasonably be
considered part of common cause variation. Still, it would be wise to inform the
Table 4.8. Survey results from patients leaving a hypothetical hospital wing

Day  Discharges  Typical complaints  Serious complaints     Day  Discharges  Typical complaints  Serious complaints
1    40          22                  3                      14   45          22                  3
2    29          28                  3                      15   30          22                  3
3    55          33                  4                      16   30          33                  1
4    30          33                  2                      17   30          44                  1
5    22          3                   0                      18   35          27                  2
6    33          32                  1                      19   25          33                  1
7    40          23                  2                      20   40          34                  4
8    35          38                  2                      21   55          44                  1
9    34          23                  2                      22   55          33                  1
10   50          33                  1                      23   70          52                  2
11   22          32                  2                      24   34          24                  2
12   30          39                  4                      25   40          45                  2
13   21          23                  2
[Demerit chart for the hospital survey data: weighted nonconformities per discharge, u, by subgroup number, with the UCL, u-bar center line, and LCL.]
Step 1. (Startup) Measure the continuous characteristics, Xi,j, for i = 1,…,n units
for j = 1,…,25 periods. Each n units is carefully chosen to be representative
of all units in that period, i.e., a rational subgroup.
Step 2. (Startup) Calculate the sample averages Xbar,j = (X1,j +…+ Xn,j)/n and ranges
Rj = Max[X1,j,…, Xn,j] – Min[X1,j,…, Xn,j] for j = 1,…,25. Also, calculate the
average of all of the 25n numbers, Xbarbar, and the average of the 25 ranges
Rbar = (R1 +…+ R25)/25.
Step 3. (Startup) Calculate the trial control limits using σ0 = Rbar/d2, UCLXbar = Xbarbar + 3.0 × σ0 ÷ sqrt(n), LCLXbar = Xbarbar – 3.0 × σ0 ÷ sqrt(n), UCLR = D2 × σ0, and LCLR = D1 × σ0, where the constants d2, D1, and D2 are taken from Table 4.9 for the relevant sample size n.
Step 4. (Startup) Find all the periods for which either Xbar,j or Rj or both are not
inside their control limits, i.e., {Xbar,j < LCLXbar or Xbar,j > UCLXbar} and/or
{Rj < LCLR or Rj > UCLR}. If the results from any of these periods are
believed to be not representative of future system operations, e.g., because
problems were fixed permanently, remove the data from the l not
representative periods from consideration.
Step 5. (Startup) Re-calculate Xbarbar, Rbar, and σ0 = Rbar/d2 using only the data from the remaining 25 – l periods, and calculate the revised limits with the same formulas as in Step 3. The quantity 6σ0 is sometimes called the “process capability” in the context of Xbar & R charting.
Step 6. (Steady State, SS) Plot the sample average, Xbar,j, for each period j
together with the upper and lower control limits, LCLXbar and UCLXbar. The
resulting “Xbar chart” typically provides useful information to stakeholders
(engineers, technicians, and operators) and builds intuition about the
engineered system. Also, plot Rj for each period j together with the control
limits, LCLR and UCLR. The resulting chart is called an “R chart”.
Table 4.9. Constants d2, D1, and D2 relevant for Xbar & R charting

Sample size (n)   d2      D1      D2         Sample size (n)   d2      D1      D2
2                 1.128   0.000   3.686      8                 2.847   0.388   5.306
3                 1.693   0.000   4.358      9                 2.970   0.547   5.393
4                 2.059   0.000   4.698      10                3.078   0.687   5.469
5                 2.326   0.000   4.918      15                3.472   1.203   5.741
6                 2.534   0.000   5.078      20                3.737   1.549   5.921
7                 2.704   0.204   5.204
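Using these constants, the trial limit calculations can be sketched in a few lines of Python. The dictionary TABLE_4_9 and the function name are illustrative; the formulas σ0 = Rbar/d2, Xbarbar ± 3σ0/sqrt(n), D1 × σ0, and D2 × σ0 are the ones applied in the worked examples that follow.

TABLE_4_9 = {2: (1.128, 0.000, 3.686), 3: (1.693, 0.000, 4.358),
             4: (2.059, 0.000, 4.698), 5: (2.326, 0.000, 4.918),
             6: (2.534, 0.000, 5.078), 7: (2.704, 0.204, 5.204),
             8: (2.847, 0.388, 5.306), 9: (2.970, 0.547, 5.393),
             10: (3.078, 0.687, 5.469), 15: (3.472, 1.203, 5.741),
             20: (3.737, 1.549, 5.921)}     # (d2, D1, D2) by sample size n

def xbar_r_limits(xbarbar, rbar, n):
    """Xbar and R chart limits from the grand average, average range, and sample size."""
    d2, D1, D2 = TABLE_4_9[n]
    sigma0 = rbar / d2
    half_width = 3.0 * sigma0 / n ** 0.5
    return {"sigma0": sigma0,
            "Xbar": (xbarbar - half_width, xbarbar, xbarbar + half_width),  # (LCL, CL, UCL)
            "R": (D1 * sigma0, rbar, D2 * sigma0)}                          # (LCL, CL, UCL)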
Generally, n is small enough that people are not interested in variable sample
sizes. In the formulas below, quantities next to each other are implicitly multiplied
with the “×” omitted for brevity, and “/” is equivalent to “÷”. The numbers 3.0 and
0.0 in the formulas are assumed to have an infinite number of significant digits.
An out-of-control signal is defined as a case in which the sample average, Xbar,j,
or range, Rj, or both, are outside the control limits, i.e., {Xbar,j < LCLXbar or Xbar,j >
UCLXbar} and/or {Rj < LCLR or Rj > UCLR}. From then on, technicians and
engineers are discouraged from making minor process changes unless a signal
occurs. If a signal does occur, designated people should investigate to see if
something unusual and fixable is happening. If not, the signal is referred to as a
false alarm.
Note that all the charts in this chapter are designed such that, under usual
circumstances, false alarms occur on average one out of 370 periods. If they occur
more frequently, it is reasonable to investigate with extra vigor for assignable
causes. Also, as an example of linear interpolation, consider the estimated d2 for n
= 11. The approximate estimate for d2 is 3.078 + (1 ÷ 5) × (3.472 – 3.078) =
3.1568. Sometimes the quantity “Cpk” (spoken “see-pee-kay”) is used as a system
quality summary. The formula for Cpk is
Cpk = Min[USL – Xbarbar, Xbarbar – LSL]/(3σ0), (4.10)
where σ0 is based on the revised Rbar from an Xbar & R method application. Also,
USL is the upper specification limit and LSL is the lower specification limit. These
are calculated and used to summarize the state of the engineered system. The σ0
used is based on Step 4 of the above standard procedure.
Large Cpk and small values of 6σ0 are generally associated with high quality
processes. This follows because both these quantities measure the variation in the
system. We reason that variation is responsible for the majority of quality
problems because typically only a small fraction of the units fail to conform to
specifications. Therefore, some noise factor changing in the system causes those
units to fail to conform to specifications. The role of variation in causing problems
explains the phrase “variation is evil” and the need to eliminate sources of variation.
Question: A Korean shipyard wants to evaluate and monitor the gaps between
welded parts from manual fixturing. Workers measure 5 gaps every shift for 25
shifts over 10 days. The remaining steady state (SS) data are not supposed to be available
at the time this question is asked. Table 4.10 shows the resulting hypothetical data.
Chart this data and establish the process capability.
Table 4.10. Example gap data (in mm) to show Xbar & R charting (start-up & steady state)
Phase j X1,j X2,j X3,j X4,j X5,j Xbar,j Rj Phase j X1,j X2,j X3,j X4,j X5,j Xbar,j Rj
SU 1 0.85 0.71 0.94 1.09 1.08 0.93 0.38 SU 19 0.97 0.99 0.93 0.75 1.09 0.95 0.34
SU 2 1.16 0.57 0.86 1.06 0.74 0.88 0.59 SU 20 0.85 0.77 0.78 0.84 0.83 0.81 0.08
SU 3 0.80 0.65 0.62 0.75 0.78 0.72 0.18 SU 21 0.82 1.03 0.98 0.81 1.10 0.95 0.29
SU 4 0.58 0.81 0.84 0.92 0.85 0.80 0.34 SU 22 0.64 0.98 0.88 0.91 0.80 0.84 0.34
SU 5 0.85 0.84 1.10 0.89 0.87 0.91 0.26 SU 23 0.82 1.03 1.02 0.97 1.00 0.97 0.21
SU 6 0.82 1.20 1.03 1.26 0.80 1.02 0.46 SU 24 1.14 0.95 0.99 1.18 0.85 1.02 0.33
SU 7 1.15 0.66 0.98 1.04 1.19 1.00 0.53 SU 25 1.06 0.92 1.07 0.88 0.78 0.94 0.29
SU 8 0.89 0.82 1.00 0.84 1.01 0.91 0.19 SS 26 1.06 0.81 0.98 0.98 0.85 0.936 0.25
SU 9 0.68 0.77 0.67 0.85 0.90 0.77 0.23 SS 27 0.83 0.70 0.98 0.82 0.78 0.822 0.28
SU 10 0.90 0.85 1.23 0.64 0.79 0.88 0.59 SS 28 0.86 1.33 1.09 1.03 1.10 1.082 0.47
SU 11 0.51 1.12 0.71 0.80 1.01 0.83 0.61 SS 29 1.03 1.01 1.10 0.95 1.09 1.036 0.15
SU 12 0.97 1.03 0.99 0.69 0.73 0.88 0.34 SS 30 1.02 1.05 1.01 1.02 1.20 1.060 0.19
SU 13 1.00 0.95 0.76 0.86 0.92 0.90 0.24 SS 31 1.02 0.97 1.01 1.02 1.06 1.016 0.09
SU 14 0.98 0.92 0.76 1.18 0.97 0.96 0.42 SS 32 1.20 1.02 1.20 1.05 0.91 1.076 0.29
SU 15 0.91 1.02 1.03 0.80 0.76 0.90 0.27 SS 33 1.10 1.15 1.10 1.02 1.08 1.090 0.13
SU 16 1.07 0.72 0.67 1.01 1.00 0.89 0.40 SS 34 1.20 1.05 1.04 1.05 1.06 1.080 0.16
SU 17 1.23 1.12 1.10 0.92 0.90 1.05 0.33 SS 35 1.22 1.09 1.02 1.05 1.05 1.086 0.20
SU 18 0.97 0.90 0.74 0.63 1.02 0.85 0.39
Answer: From the description, n = 5 inspected gaps between fixtured parts prior to
welding, and the record of the measured gap for each in millimeters is Xi,j. The
inspection interval is a production shift, so roughly τ = 6 hours.
The calculated subgroup averages and ranges are also shown (Step 2) and
Xbarbar = 0.90, Rbar = 0.34, and σ0 = 0.148. In Step 3, the derived values were
UCLXbar = 1.103, LCLXbar = 0.705, UCLR = 0.729, and LCLR = 0.000. None of the
first 25 periods has an out-of-control signal. In Step 4, the process capability is
0.889. From then until major process changes occur (rarely), the same limits are
used to find out-of-control signals and alert designated personnel that process
attention is needed (Step 5). The chart, Figure 4.5, also prevents “over-control” of
the system by discouraging changes unless out-of-control signals occur.
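Plugging the rounded summary statistics into the same formulas nearly reproduces the quoted limits; the small differences (for example, 1.096 versus 1.103 for UCLXbar) appear to come only from rounding Xbarbar and Rbar to two decimal places before the calculation. A minimal check in Python:

d2, D1, D2 = 2.326, 0.000, 4.918      # Table 4.9 constants for n = 5
sigma0 = 0.34 / d2                    # about 0.146
print(0.90 + 3 * sigma0 / 5 ** 0.5)   # UCL for the Xbar chart, about 1.096
print(0.90 - 3 * sigma0 / 5 ** 0.5)   # LCL for the Xbar chart, about 0.704
print(D2 * sigma0)                    # UCL for the R chart, about 0.719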
Figure 4.5. Xbar & R charts for the gap data: the R chart (range of gap, in mm) and the Xbar chart (average gap, in mm) by subgroup, each with its UCL, CL, and LCL; the vertical line separates the startup and steady state periods.
The phrase “sigma level” (σL) is an increasingly popular alternative to Cpk. The
formula for sigma level is
σL = Min[USL – Xbarbar, Xbarbar – LSL]/σ0 = 3.0 × Cpk.
If the process is under control, the sigma level equals 6.0, and certain “normal assumptions” apply, then the fraction nonconforming is less than 1.0 nonconforming per billion opportunities. If
the mean shifts 1.5 σ0 to the closest specification limit, the fraction nonconforming
is less than 3.4 nonconforming per million opportunities. Details from the fraction
nonconforming calculations are documented in Chapter 10.
The goal implied by the phrase “six sigma” is to change system inputs so that
the σL derived from an Xbar & R charting evaluation is greater than 6.0.
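Taking the relation σL = 3.0 × Cpk given above, both quality summaries can be computed together. A minimal Python sketch (the function name is illustrative):

def cpk_and_sigma_level(xbarbar, sigma0, lsl, usl):
    """Cpk per Equation (4.10); the sigma level is taken here as 3.0 times Cpk."""
    cpk = min(usl - xbarbar, xbarbar - lsl) / (3.0 * sigma0)
    return cpk, 3.0 * cpk   # a sigma level above 6.0 corresponds to Cpk above 2.0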
In applying Xbar & R charting, one simultaneously creates two charts and uses
both for process monitoring. Therefore, the plotting effort is greater than for p
charting, which requires the creation of only a single chart. Also, as implied above,
there can be a choice between using one or more Xbar & R charts and a single p
chart. The p chart has the advantage of all nonconformity data summarized in a
single, interpretable chart. The important advantage of Xbar & R charts is that
generally many fewer runs are required for the chart to play a useful role in
detecting process shifts than if a p chart is used. A popular sample size for Xbar & R
charts is n = 5; for p charts, n = 200 is popular.
To review, an “assignable cause” is a change in the engineered system inputs
that occurs irregularly and that can be addressed by “local authority”, e.g., operators,
process engineers, or technicians. For example, an engineer dropping a wrench into
the conveyor apparatus is an assignable cause.
The phrase “common cause variation” refers to changes in the system outputs
or quality characteristic values under usual circumstances. This variation occurs
because of the changing of factors that are not tightly controlled during normal
system operation.
As implied above, common cause variation is responsible for the majority of
quality problems. Typically only a small fraction of the units fail to conform to
specifications, and this fraction is consistently not zero. In general, it takes a major
improvement effort involving robust engineering methods including possibly
RDPM from the last chapter to reduce common cause variation. The values 6σ0,
Cpk, and σL derived from Xbar & R charting can be useful for measuring the
magnitude of the common cause variation.
An important realization in total quality management and six sigma training is
that local authority should be discouraged from making changes to the engineered
system when there are no assignable causes. These changes could cause an “over-
controlled” situation in which energy is wasted and, potentially, common cause
variation increases.
The usefulness of both p-charts and Xbar & R charts partially depends upon a
coincidence. When quality characteristics change because the associated
engineered systems change, and this change is large enough to be detected over
process noise, then engineers, technicians, and operators would like to be notified.
There are, of course, some cases in which the sample size is sufficiently large (e.g.,
when complete inspection is used) that even small changes to the engineered
system inputs can be detected. In these cases, the engineers, technicians, and
operators might not want to be alerted. Then, ad hoc adjustment of the formulas for
the limits and/or the selection of the sample size, n, and interval, τ, might be
justified.
for 100 units over 25 periods (inspecting 4 units each period). The average of the
characteristic is 55.0 PSI. The average range is 2.5 PSI. Suppose that during the
trial period it was discovered that one of the subgroups with average 62.0 and
range 4.0 was influenced by a typographical error and the actual values for that
period are unknown. Also, another subgroup with average 45.0 and range 6.0 was
not associated with any assignable cause. Determine the revised limits and Cpk.
Interpret the Cpk.
Answer: All units are in PSI. The trial limit calculations are:
Xbarbar = 55.0, Rbar = 2.5, σ0 = 2.5/2.059 = 1.21
UCLXbar = 55.0 + 3(1.21)/sqrt(4) = 56.8
LCLXbar = 55.0 – 3(1.21)/sqrt(4) = 53.2
UCLR = (4.698)(1.21) = 5.68
LCLR = (0.0)(1.21) = 0.00
The subgroup average 62.0 from the Xbarbar calculation and 4.0 from the Rbar
calculation are removed because the associated assignable cause was found and
eliminated. The other point was left in because no permanent fix was implemented.
Therefore, the revised limits and Cpk are derived as follows:
Xbarbar = [(55.0)(25)(4) – (62.0)(4)]/[(24)(4)] = 54.7
Rbar = [(2.5)(25) – (4.0)]/(24) = 2.4375, σ0 = 2.4375/2.059 = 1.18
UCLXbar = 54.7 + 3(1.18)/sqrt(4) = 56.5
LCLXbar = 54.7 – 3(1.18)/sqrt(4) = 53.0
UCLR = (4.698)(1.18) = 5.56
LCLR = (0.0)(1.18) = 0.00
Cpk = Minimum{59.0 – 54.7, 54.7 – 46.0}/[(3)(1.18)] = 1.21
Therefore, the quality is high enough that complete inspection may not be needed
(Cpk > 1.0). The outputs very rarely vary by more than 7.1 PSI and are generally
close to the estimated mean of 54.7 PSI, i.e., values within 1 PSI of the mean are
common. However, if the mean shifts even a little, then a substantial fraction of
nonconforming units will be produced. Many six sigma experts would say that the
sigma level is indicative of a company that has not fully committed to quality
improvement.
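The removal arithmetic above can be checked directly; a short Python sketch using the same numbers:

# Remove the typographical-error subgroup (average 62.0, range 4.0) from 25 subgroups of n = 4
xbarbar = (55.0 * 25 * 4 - 62.0 * 4) / (24 * 4)
rbar = (2.5 * 25 - 4.0) / 24
sigma0 = rbar / 2.059                 # d2 for n = 4
cpk = min(59.0 - xbarbar, xbarbar - 46.0) / (3 * sigma0)
print(round(xbarbar, 1), round(rbar, 4), round(sigma0, 2), round(cpk, 2))
# prints 54.7 2.4375 1.18 1.21, matching the revised values above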
The term “run rules” refers to specific patterns of charted quantities that may
constitute an out-of-control signal. For example, some companies institute policies
in which after seven charted quantities in a row are above or below the center line
(CL), then the designated people should investigate to look for an assignable cause.
They would do this just as if an Xbar,j or Rj were outside the control limits. If this
run rule were implemented, the second-to-last subgroup in the fixture gap example
during steady state would generate an out-of-control signal. Run rules are
potentially relevant for all types of charts including p-charts, demerit charts, Xbar
charts, and R charts.
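A seven-in-a-row run rule is easy to automate next to the ordinary limit checks. The Python sketch below (the function name and the default run length of seven reflect the policy described above) returns True when the most recent charted values all fall on one side of the center line:

def run_rule_signal(values, center_line, run_length=7):
    """True if the last run_length charted values are all above or all below the center line."""
    if len(values) < run_length:
        return False
    tail = values[-run_length:]
    return all(v > center_line for v in tail) or all(v < center_line for v in tail)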
Also, many other kinds of control charts exist besides the ones described in this
chapter. In general, each offers some advantages related to the need to inspect
Question: Which technique would you recommend for establishing which inspector has the more appropriate inspection method?
Answer: Since the issue is systematic errors, the only relevant method is
comparison with standards. Also, this method is possible since standard values are
available.
Table 4.11. Summary of the SQC methods relevant to the measurement phase
In the context of six sigma projects, statistical process control charts offer
thorough evaluation of system performance. This is relevant both before and after
system inputs are adjusted in the measure and control or verify phases respectively.
In many relevant situations, the main goal of the six sigma analyze-and-improve
phases is to cause the charts established in the measure phase to generate “good”
out-of-control signals indicating the presence of a desirable assignable cause, i.e.,
the project team’s implemented changes.
For example, values of p, u, or R consistently below the lower control limits
after the improvement phase settings are implemented indicate success. Therefore,
it is often necessary to go through two start-up phases in a six sigma project during
the measure phase and during the control or verify phase. Hopefully, the
established process capabilities, sigma levels, and/or Cpk numbers will confirm
improvement and aid in evaluation of monetary benefits.
failed because team members tampered with the settings and actually increased the
fraction nonconforming. What next steps do you recommend?
Answer: Since there apparently is evidence that the process has become worse, it
might be advisable to return to the system inputs documented in the manufacturing
SOP prior to interventions by the previous team. Then, measurement systems
should be evaluated using the appropriate gauge R&R method unless experts are
confident that they should be trusted. If only the fraction nonconforming numbers
are available, p-charting should be implemented to thoroughly evaluate the system
both before and after changes to inputs.
4.9 References
Automotive Industry Task Force (AIAG) (1994) Measurement Systems Analysis
Reference Manual. Chrysler, Ford, General Motors Supplier Quality
Requirements Task Force
Montgomery DC, Runger GC (1994) Gauge Capability and Designed Experiments
(Basic Methods, Part I). Quality Engineering 6:115–135
Shewhart WA (1980) Economic Control of Quality of Manufactured Product, 2nd
edn. ASQ, Milwaukee, WI
4.10 Problems
In general, pick the correct answer that is most complete.
1. According to the text, what types of measurement errors are found in standard
values?
a. Unknown measurement errors are in all numbers, even standards.
b. None, they are supposed to be accurate within the uncertainty
implied.
c. Measurements are entirely noise. We can’t really know any values.
d. All of the above are true.
e. Only the answers in parts “b” and “c” are correct.
The following information and Table 4.12 are relevant for Questions 4-8. Three
personal laptops are repeatedly timed for how long they require to open Microsoft
Internet Explorer. Differences greater than 3.0 seconds should be reliably
differentiated.
1 3 1 15 2
2 2 1 19 1
3 2 1 18 2
4 1 1 11 2
5 3 1 19 2
6 3 1 20 3
7 1 1 10 3
8 2 1 17 3
9 3 1 14 2
10 2 1 21 1
11 1 1 15 2
12 2 1 22 2
13 2 1 18 2
14 1 1 11 2
15 3 1 13 3
16 2 1 23 3
17 1 1 15 1
18 2 1 22 2
19 1 1 14 2
20 3 1 19 3
c. 2.20
d. 2.05
e. Answers “a” and “c” are both valid answers.
9. Following the text formulas, solve for Yrange parts (within the implied
uncertainty).
a. 0.24
b. 0.12
c. 0.36
d. 0.1403
e. 0.09
f. None of the above is correct.
Part number
Appraiser A 1 2 3 4 5
Trial 1 0.24 0.35 0.29 0.31 0.24
Trial 2 0.29 0.39 0.32 0.34 0.25
Trial 3 0.27 0.34 0.28 0.27 0.26
Appraiser B
Trial 1 0.20 0.38 0.27 0.32 0.25
Trial 2 0.22 0.34 0.29 0.31 0.23
Trial 3 0.17 0.31 0.24 0.28 0.25
10. Following the text formulas, solve for R&R (within the implied uncertainty).
a. 0.140
b. 0.085
c. 0.164
d. 0.249
e. 0.200
f. None of the above is correct.
12. In three sentences or less, interpret the %R&R value obtained in Question 11.
14. Which of the following describes the relationship between common cause
variation and local authorities?
a. Local authority generally cannot reduce common cause variation on
their own.
The following data set in Table 4.14 will be used to answer Questions 16-20. This
data is taken from 25 shifts at a manufacturing plant where 200 ball bearings are
inspected per shift.
Table 4.14. Hypothetical trial ball bearing numbers of nonconforming (nc) units
16. Where will the center line of a p-chart be placed (within implied uncertainty)?
a. 0.015
b. 0.089
c. 0.146
d. 431
e. 0.027
f. None of the above is correct.
17. How many trial data points are outside the control limits?
a. 0
b. 1
c. 2
d. 3
e. 4
f. None of the above is correct.
18. Where will the revised p-chart UCL be placed (within implied uncertainty)?
a. 0.015
b. 0.089
c. 0.146
d. 0.130
e. 0.020
f. None of the above is correct.
19. Use software (e.g., Minitab® or Excel) to graph the revised p-chart, clearly
showing the p0, UCL, LCL, and percent nonconforming for each subgroup.
The call center data in Table 4.15 will be used for Questions 21-25.
Day Callers Time Value Security Day Callers Time Value Security
1 200 40 15 0 14 217 41 10 1
2 232 38 11 1 15 197 42 9 0
3 189 25 12 0 16 187 38 15 0
4 194 29 13 0 17 180 35 13 1
5 205 31 14 2 18 188 37 16 4
6 215 33 16 1 19 207 38 13 2
7 208 37 13 0 20 202 35 11 1
8 195 32 10 0 21 206 39 12 0
9 175 31 9 1 22 221 42 18 0
10 140 15 2 0 23 256 43 10 1
11 189 29 11 0 24 229 19 20 0
12 280 60 22 3 25 191 40 14 0
13 240 36 17 1 26 209 31 11 1
In Table 4.15, errors and their assigned weighting values are as follows: time took
too long (weight is 1), value of quote given incorrectly (weight is 3), and security
rules not enforced (weight is 10). Assume all the data are available when the chart
is being set up.
23. How many out-of-control signals are found in this data set?
a. 0
b. 1
c. 2
d. 3
e. 4
f. None of the above is correct.
24. Create the trial or startup demerit chart in Microsoft® Excel, clearly showing
the UCL, LCL, CL, and process capability for each subgroup.
25. In steady state, what actions should be taken for out-of-control signals?
a. Always immediately remove them from the data and recalculate
limits.
b. Do careful detective work to find causes before making
recommendations.
c. In some cases, it might be desirable to shut down the call center.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
Table 4.16 will be used in Questions 27-32. Paper airplanes are being tested, and
the critical characteristic is time aloft. Every plane is measured, and each subgroup
is composed of five successive planes.
No. X1 X2 X3 X4 X5 No. X1 X2 X3 X4 X5
1 2.1 1.8 2.3 2.6 2.6 14 2.4 2.2 1.9 2.8 2.3
2 2.7 1.5 2.1 2.5 1.9 15 2.2 2.4 2.5 2.9 1.5
3 2.0 1.7 1.6 1.9 2.0 16 2.5 1.8 1.7 2.4 2.4
4 1.6 2.0 2.1 2.2 2.1 17 2.9 2.6 2.6 2.2 2.2
5 2.1 2.1 2.6 2.2 2.1 18 3.1 2.8 3.1 3.4 3.3
6 2.0 2.8 2.5 2.9 2.0 19 3.5 3.2 2.9 3.5 3.4
7 2.7 1.7 2.4 2.5 2.8 20 3.3 3.0 3.1 2.7 2.8
8 2.2 2.0 2.4 2.1 2.4 21 2.8 3.6 3.3 3.3 3.2
9 1.8 1.9 1.7 2.1 2.2 22 3.1 3.4 3.4 3.1 3.4
10 2.2 2.1 2.9 1.7 2.0 23 2.0 2.5 2.4 2.3 2.4
11 1.4 2.6 1.8 2.0 2.4 24 2.7 2.3 2.4 2.8 2.1
12 2.3 2.5 2.4 1.8 1.9 25 2.5 2.2 2.5 2.2 2.0
13 2.4 2.3 1.9 2.1 2.2
27. What is the starting centerline for the Xbar chart (within the implied
uncertainty)?
a. 2.1
b. 2.3
c. 2.4
d. 3.3
e. 3.5
f. None of the above is correct.
28. How many subgroups generate out-of-control signals for the trial R chart?
a. 0
b. 1
c. 2
d. 3
e. 4
f. None of the above is correct.
29. How many subgroups generate out-of-control signals for the trial Xbar chart?
a. 0
b. 1
c. 2
d. 3
e. 4
f. None of the above is correct.
30. What is the UCL for the revised Xbar chart? (Assume that assignable causes
are found and eliminated for all of the out-of-control signals in the trial chart.)
a. 1.67
b. 1.77
c. 2.67
d. 2.90
e. 3.10
f. None of the above is correct.
31. What is the UCL for the revised R chart? (Assume that assignable causes are
found and eliminated for all of the out-of-control signals in the trial chart.)
a. 0.000
b. 0.736
c. 1.123
d. 1.556
e. 1.801
f. None of the above is correct.
32. Control chart the data and propose limits for steady state monitoring. (Assume
that assignable causes are found and eliminated for all of the out-of-control
signals in the trial chart.)
33. Which of the following relate run rules to six sigma goals?
a. Run rules generate additional false alarms improving chart
credibility.
b. Sometimes LCL = 0 so run rules offer a way to check project success.
c. Run rules signal assignable causes in start-up, improving capability
measures.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
36. It is often true that project objectives can be expressed in terms of the limits on
R charts before and after changes are implemented. The goal can be that the
limits established through an entirely new start-up and steady state process
should be narrower. Explain briefly why this goal might be relevant.
Analyze Phase
5.1 Introduction
In Chapter 3, the development and documentation of project goals was discussed.
Chapter 4 described the process of evaluating relevant systems, including
measurement systems, before any system changes are recommended by the project
team. The analyze phase involves establishing cause-and-effect relationships
between system inputs and outputs. Several relevant methods use different data
sources and generate different visual information for decision-makers. Methods
that can be relevant for system analysis include parts of the design of experiments
(DOE) methods covered in Part II of this book and previewed here. Also, QFD
Cause & Effect Matrices, process mapping, and value-stream mapping are
addressed. Note that DOE methods include steps for both analysis and
development of improvement recommendations. As usual, all methods could
conceivably be applied at any project phase or time, as the need arises.
Step 1. (Both methods) Create a box for each predefined operation. Note if the
operation does not have a standard operating procedure. Use a double-box
shape for automatic processes and storage. Create a diamond for each decision
point in the overall system. Use an oval for the terminal operation (if any).
Use arrows to connect the boxes with the relevant flows.
Step 2. (Both methods) Under each box, label any noise factors that may be
varying uncontrollably, causing variation in the system outputs, with a “z”
symbol. Also, label the controllable factors with an “x” symbol. It may also
be desirable to identify any gaps in standard operating procedure (SOP)
documentation.
Step 3. (VSM only) Identify which steps are “value added” or truly essential.
Also, note which steps do not have documented standard operating
procedures.
Step 4. (VSM only) Draw a map of an ideal future state of the process in which
certain steps have been simplified, shortened, combined with others, or
eliminated. This step requires significant process knowledge and practice
with VSM.
A natural next step is the transformation of the process to the ideal state. This
would likely be considered as part of an improvement phase.
Question: A Midwest manufacturer making steel vaults has a robotic arc welding
system which is clearly the manufacturing bottleneck. A substantial fraction of the
units require intensive manual inspection and rework. VSM the system and
describe the possible benefit of the results.
non-value-added tasks is possible. Figure 5.2 shows an ambitious ideal future state.
Another plan might also include necessary operations like transport.
[Figure 5.2: the ideal future state retains only the Fixture and Weld operations.]
The above example illustrates that process mapping can play an important role
in calling attention to non-value-added operations. Often, elimination of these
operations is not practically possible. Then documenting and standardizing the
non-value-added operation can be useful and is generally considered good practice.
In the Toyota vocabulary, the term “necessary” can be used to describe these
operations. Conveyor transportation in the above operations might be called
necessary.
To gain a clearer view of “ideal” systems, it is perhaps helpful to know more about
Toyota. The “Toyota Production System” used by Toyota to make cars has
evolved over the last 50-plus years, inspiring great admiration among competitors
and academics. Several management philosophies and catch phrases have been
derived from practices at Toyota, including “just in time” (JIT) manufacturing,
“lean production”, and “re-engineering”. Toyota uses several of these novel
policies in concert as described in Womack and Jones (1999). The prototypical
Toyota system includes:
• “U-shaped cells,” which have workers follow parts through many
operations. This approach appears to build back some of the “craft”
accountability lost by the Ford mass-production methods. Performing
operations downstream, workers can gain knowledge of mistakes made in
earlier operations.
• “Mixed production” implies that different types of products are made
alternately so that no batches greater than size one are involved. This results
in huge numbers of “set-ups” for making different types of units.
However, the practice forces the system to speed up the set-up times and
results in greatly reduced inventory and average cycle times (the time it
takes to turn raw materials into finished products). Therefore, it is often
more than possible to compensate for the added set-up costs by reducing
costs of carrying inventory and by increasing sales revenues through
offering reduced lead times (time between order and delivery).
• “Pull system” implies that production only occurs on orders that have
already been placed. No items are produced based on projected sales. The
pull system used in concert with the other elements of the Toyota
Production System appears to reduce the inventory in systems and reduce
the cycle times.
Analyze Phase 121
      5  4
A =   2  3      and     A′ =   5  2  –3
     –3  4                     4  3   4
When two matrices are multiplied, each entry in the result is a row in the first
matrix “product into” a column in the second matrix. In this product operation,
each element in the first row is multiplied by a corresponding element in the
second column and the results are added together. For example:
      5  4                     2  12
A =   2  3              B =    6   9
     –3  4
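The transpose and a matrix product can be checked with a few lines of Python using numpy; the matrices are the ones shown above, and each printed entry of the product is simply the row-into-column sum described in the text:

import numpy as np

A = np.array([[5, 4], [2, 3], [-3, 4]])
B = np.array([[2, 12], [6, 9]])

print(A.T)     # the transpose A', with rows (5, 2, -3) and (4, 3, 4)
print(A @ B)   # [[34, 96], [22, 51], [18, 0]]; e.g., the (1,1) entry is 5*2 + 4*6 = 34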
The C&E method uses input information from customers and engineers and
generally requires no building of expensive prototype units. In performing this
exercise, one fills out another “room” in the “House of Quality,” described further
in Chapter 6.
Answer: In discussions with software users, seven issues were identified (qc = 7).
Consensus ratings for the importance of each were developed. Through discussions
with software engineers, guesses were made about the correlations between the
customer issues and the m = 9 inputs. No outputs were considered (q = 0). Table 5.1
shows the results, including the factor ratings (F′) in the bottom row.
The results suggest that regression formula outputs and a wizard for first-timers
should receive top priority if the desires of customers are to be respected.
The above example shows how F′ values are often displayed in line with the
various system inputs and outputs. This example is based on real data from a
Midwest software company. The next example illustrates how different customer
needs can suggest other priorities. This could potentially cause two separate teams
to take the same product in different directions to satisfy two markets.
Table 5.1. Cause and effect matrix for software feature decision-making. The nine candidate inputs (columns) include importing text delimited data, including random factors, including logistic regression, including DOE optimization, regression formula output, improved examples, improved stability, and a wizard for first-timers.

Customer issue (importance): ratings for the nine inputs
Easy to use (5): 1, 1, 1, 1, 1, 5, 5, 7, 7
Helpful user environment (6): 1, 6, 1, 1, 1, 10, 5, 6, 6
Good examples (6): 1, 1, 1, 1, 1, 1, 9, 4, 4
Powerful enough (5): 8, 9, 5, 5, 5, 6, 2, 2, 1
Enough methods covered (8): 8, 4, 7, 7, 5, 1, 2, 1, 1
Good help and support (5): 1, 1, 1, 1, 1, 1, 6, 8, 9
Low price (8): 7, 5, 7, 7, 7, 4, 2, 5, 4
F′ = 182, 169, 159, 159, 143, 166, 181, 193, 185
Answer: The new factor rating vector is F′ = [193, 195, 156, 156, 142, 183, 186,
183, 172]. This implies that the highest priorities are text delimited data analysis
and logistic regression.
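Each factor rating F′ is the vector of customer importances multiplied into the corresponding column of correlation ratings. A short numpy sketch using the Table 5.1 numbers reproduces the bottom row (the variable names are illustrative):

import numpy as np

importance = np.array([5, 6, 6, 5, 8, 5, 8])   # customer issue importances from Table 5.1
ratings = np.array([                           # rows: customer issues; columns: the m = 9 inputs
    [1, 1, 1, 1, 1, 5, 5, 7, 7],
    [1, 6, 1, 1, 1, 10, 5, 6, 6],
    [1, 1, 1, 1, 1, 1, 9, 4, 4],
    [8, 9, 5, 5, 5, 6, 2, 2, 1],
    [8, 4, 7, 7, 5, 1, 2, 1, 1],
    [1, 1, 1, 1, 1, 1, 6, 8, 9],
    [7, 5, 7, 7, 7, 4, 2, 5, 4]])
print(importance @ ratings)   # [182 169 159 159 143 166 181 193 185], the F' row of Table 5.1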
random run ordering in DOE as essential for establishing “proof” in the
statistical sense.
Regression is also relevant when the choice of input combinations has not been
carefully planned. Then, the data is called “on-hand,” and statistical proof is not
possible in the purest sense.
DOE methods are classified into several types, which include screening using
fractional factorials, response surface methods (RSM), and robust design
procedures. Here, we focus on the following three types of methods:
• Screening methods using fractional factorials begin with a long list of
possibly influential factors; these methods output a list of factors (usually
fewer in number) believed to affect the average response, and an
approximate prediction model.
• Response surface methods (RSM) begin with factors believed to be
important. These methods generate relatively accurate prediction models
compared with screening methods, and also recommended engineering
input settings from optimization.
• Robust design based on process maximization (RDPM) methods begin
with the same inputs as RSM; they generate information about the
control-by-noise interactions. This information can be useful for making
the system outputs more desirable and consistent, even accounting for
variation in uncontrollable factors.
Figure 5.3. Predicted restaurant profits as a function of spaghetti meal price
Answer: This would be DOE if the runs were conducted in random order, but
constantly increasing price is hardly random. Without randomness, there is no
proof, only evidence. Also, several factor settings are usually varied
simultaneously in DOE methods.
Step 1. Create a list of the q customer issues or “failure modes” for which failure to
meet specifications might occur.
Step 2. Document the failure modes using a causal language. Also document the
potentially harmful effects of the failures and notes about the causes and
current controls.
Step 3. Collect from engineers the ratings (1-10, with 10 being high) on the
severity, Si, occurrence, Oi, and detection, Di, for all i = 1,…,q failure
modes.
Step 4. Calculate the risk priority numbers using RPNi = Si × Oi × Di
for i = 1,…,q.
Step 5. Use the risk priority numbers to prioritize the need for additional
investments of time and energy to improve the inspection controls.
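Steps 4 and 5 amount to computing the products S × O × D and ranking them. A minimal Python sketch (the function name is illustrative; the two example rows are taken from the welding FMEA tabulated later in this chapter):

def risk_priority_numbers(failure_modes):
    """failure_modes holds (name, severity, occurrence, detection) tuples, each rated 1-10.
    Returns (name, RPN) pairs sorted with the highest priority first."""
    ranked = [(name, s * o * d) for name, s, o, d in failure_modes]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

print(risk_priority_numbers([("Out-of-spec. distortion", 4, 10, 4),
                             ("Failures by undercut", 5, 9, 5)]))
# [('Failures by undercut', 225), ('Out-of-spec. distortion', 160)]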
Question: Perform FMEA to analyze the threats to a toddler at one time and in
one home.
Answer: In this case, any harm to the toddler is considered nonconformity. Table
5.2 shows an FMEA analysis of the perceived threats to a toddler. The results
suggest that the most relevant failure modes to address are TV watching, old milk,
and slamming doors. The resulting analysis suggests discussion with childcare
providers and efforts to limit total TV watching to no more than three episodes of
the program “Blue’s Clues” each day. Also, door stops should be used more often
and greater care should be taken to place old sippy cups out of reach. Periodic
reassessment is recommended.
The FMEA results for the welding process are summarized below; severity, occurrence, and detection are each rated 1-10, and RPN is their product.
Distortion (flatness): potential failure mode out-of-spec.; effects downstream $ & problems; severity 4; causes fixture gap, offset, voltage, others; occurrence 10; current control visual & informal; detection 4; RPN 160.
Fixture gap: potential failure mode failures by undercut; effects all of the above; severity 5; causes operator variability & part variability; occurrence 9; current control visual & informal; detection 5; RPN 225.
Initial flatness: potential failure mode cosmetic & yielding; effects all of the above plus no weld; severity 5; causes size of sheet, transportation; occurrence 4; current control visual & informal; detection 5; RPN 100.
Voltage variability: potential failure mode see undercut & penetration; effects see above; severity 5; causes power supply; occurrence 3; current control gauge & informal; detection 2; RPN 30.
5.7 References
Banks J, Carson JS, Nelson BL (2000) Discrete-Event System Simulation, 3rd edn.
Pearson International, Upper Saddle River, NJ
Irani SA, Zhang H, Zhou J, Huang H, Udai TK, Subramanian S (2000) Production
Flow Analysis and Simplification Toolkit (PFAST). International Journal of
Production Research 38:1855-1874
Law A, Kelton WD (2000) Simulation Modeling and Analysis, 3rd edn. McGraw-
Hill, New York
Liker J (ed) (1998) Becoming Lean: Inside Stories of U.S. Manufacturers.
Productivity Press, Portland, OR
Suzaki K (1987) The New Manufacturing Challenge. Simon & Schuster, New
York
Womack JP, Jones DT (1996) Lean Thinking. Simon & Schuster, New York
Womack JP, Jones DT (1999) Learning to See, Version 1.2. Lean Enterprise
Institute, Incorporated
5.8 Problems
In general, pick the correct answer that is most complete.
2. Apply process mapping steps (without the steps that are only for value stream
mapping) to a system that you might improve as a student project. This system
must involve at least five operations.
3. Perform Steps 3 and 4 to the system identified in solving the previous problem
to create a process map of an ideal future state. Assume that sufficient
resources are available to eliminate all non-value added operations.
b. Kanban cards can limit the total amount of inventory in a plant at any
time.
c. U-shaped cells cause workers to perform only a single specialized
task well.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
Table 5.5 contains hypothetical data on used motorcycles for questions 6-8.
Table 5.5. Cause and effect matrix for used motorcycle data (rows give racer customer issues and their customer importance, followed by ratings for six engineering characteristics, one of which is PSI capacity)
Handling feels sticky 7 2 4 7 3 6 2
Tires seem worn down 4 4 5 7 6 7 2
Handling feels stable 1 9 8 6 6 5 3
Good traction around turns 3 1 3 9 5 6 6
Cost is low 7 8 7 9 2 4 3
Installation is difficult 2 5 4 2 5 3 8
Performance is good 1 9 5 4 8 1 1
Wear-to-weight ratio is
1 9 1 7 1 5 2
good
Factor rating number (F') 231 274 141 137 203 142
6. What issue or issues do customers value most according to the C&E matrix?
a. The cost is low.
b. Performance is good.
c. Tires seem worn down.
d. Traction around turns is good.
e. Customers value answers “a” and “c” with equal importance.
Customer criteria and their importances, with ratings for the four inputs listed in Question 10:
Story is interesting: 8; 8, 5, 4, 7
It was funny: 9; 8, 5, 4, 7
It made me inspired: 4; 6, 2, 4, 5
It was a rush: 6; 6, 8, 6, 5
It was cool: 6; 3, 8, 4, 5
Factor Rating Number (F'): 232, 219, 198, 205
10. Suppose the target audience thought the only criterion was inspiration. Which
variable would be the most important to focus on?
a. Money spent on script writers
b. Young male focus group used
c. Stars used
d. Critic approval
e. All of the above would have equal factor rating numbers.
[FMEA table for Questions 13 and 14: columns are controlled factors and responses, potential failure modes, potential failure effects, severity, potential causes, occurrence, current control, detection, and RPN; data rows not shown.]
13. Which response or output probably merits the most attention for quality
assurance?
a. Freshness
b. Taste
c. Number of chips
d. Size
e. Burn level
14. How many failure modes are represented in this study?
a. 3
b. 4
c. 5
d. 6
e. 7
[FMEA table for Questions 16 and 17: columns are controlled factors and responses, potential failure modes, potential failure effects, severity, potential causes, occurrence, current control, detection, and RPN; data rows not shown.]
16. If the system were changed such that it would be nearly impossible for the
explosion failure mode to occur (occurrence = 1) and no other failure mode
was affected, the highest priority factor or response to focus on would be:
a. Rubber width
b. Bead thickness
c. PSI capacity
d. Tire height
17. If the system were changed such that detection of all issues was near perfect
and no other issues were affected (detection = 1 for all failure modes), the
lowest priority factor or response to focus on would be:
a. Rubber width
b. Bead thickness
c. PSI capacity
d. Tire height
18. Critique the toddler FMEA analysis, raising at least two issues of possible
concern.
Improve or Design Phase
6.1 Introduction
In Chapter 5, methods were described with goals that included clarifying the input-
output relationships of systems. The purpose of this chapter is to describe methods
for using the information from previous phases to tune the inputs and develop
tentative recommendations. The phrase “improvement phase” refers to the
situation in which an existing system is being improved. The phrase “design
phase” refers to the case in which a new product is being designed.
The recommendations derived from the improve or design phases are generally
considered tentative. This follows because usually the associated performance
improvements must be confirmed or verified before the associated standard
operating procedures (SOPs) or design guidelines are changed or written.
Here, the term “formality” refers to the level of emphasis placed on data and/or
computer assistance in decision-making. The methods for improvement or design
presented in this chapter are organized by their level of formality. In cases where a
substantial amount of data is available and there are a large number of potential
options, people sometimes use a high level of formality and computer assistance.
In other cases, less information is available and/or a high degree of subjectivity is
preferred. Then “informal” describes the relevant decision-making style. In
general, statistical methods and six sigma are associated with relatively high levels
of formality.
Note that the design of experiments (DOE) methods described in Chapter 5 and
in Part II of this book are often considered to have scope beyond merely clarifying
the input-output relationships. Therefore, other books and training materials
sometimes categorize them as improvement or design activities. The level of
formality associated with DOE-supported decision-making is generally considered
to be relatively high.
This section begins with a discussion of informal decision-making including
so-called “seat-of-the-pants” judgments. Next, moderately formal decision-
making is presented, supported by so-called “QFD House of Quality,” which
combines the results of benchmarking and C&E matrix method applications.
Answer: An engineer at the energy company might look at Table 6.1 and decide
to implement complete inspection. This tentative choice can be written x1 =
complete inspection. With this choice, fractures caused by porosity might still
cause problems but they would no longer constitute the highest priority for
improvement.
Table 6.1. Hypothetical FMEA for aluminum welding process in energy application
[Table 6.1 columns: failure mode, potential effect, potential cause, current control, severity, occurrence, detection, and RPN; data rows not shown.]
The next example illustrates a deliberate choice to use informal methods even
when a computer has assisted in identifying possible input settings. In this case, the
best price for maximizing expected profits is apparently known under certain
assumptions. Yet, the decision-maker subjectively recommends different inputs.
Question: Use the DOE and regression process described in the example in
Chapter 5 to support an informal decision process.
Answer: Rather than selecting a $12 menu price for a spaghetti dinner because
this choice formally maximizes profit for this item, the manager might select $13.
This could occur because the price change could fit into a general strategy of
becoming a higher “class” establishment. The predicted profits indicate that little
would be lost from this choice.
Answer: A reasonable set of choices in this case might be to implement all of the
known settings for Company 3. This would seem to meet the targets set by the
engineers. Then, tentative recommendations might be: x1 = 40, x2 = 9, x3 = 5, x4 =
15, x5 = 2.5, x6 = 8, x7 = 19, x8 = 0.8, x9 = 9.7, x10 = 1.0, x11 = 3.0, x12 = 23, x13 =
Yes, x14 = Yes, and x15 = 1.5, where the input vector is given in the order of the
inputs and outputs in Table 6.2. Admittedly, it is not clear that these choices
maximize the profits, even though these choices seem most promising in relation to
the targets set by company engineers and management.
In the preceding example, the choice was made to copy another company to the
extent possible. Some researchers have provided quantitative ways to select
settings from QFD activities; see Shin et al. (2002) for a recent reference. These
approaches could result in recommendations having settings that differ from all of
the benchmarked alternatives. Also, decision-makers can choose to use QFD
simply to aid in factor selection in order to perform followup design of
experiments method applications or to make sure that a formal decision process is
considering all relevant issues. One of the important benefits of applying QFD to
support decision-making is increasing confidence that the project team has
thoroughly accounted for many types of considerations. Even when applying QFD,
information can still be incomplete, making it inadvisable to copy best-in-class
competitors without testing and/or added data collection.
[Table 6.2, house of quality excerpt: customer criteria include the incidence of crevices (undercut) and the incidence of cracks; the table also lists factor rating numbers (F'), targets, and customer ratings of the three benchmarked companies. Customer ratings by criterion were: Company 1: 3.0, 8.3, 4.0, 8.0, 9.0, 6.0, 9.0, 6.0, 5.0, 3.3; Company 2: 4.0, 8.0, 4.7, 9.0, 9.7, 8.0, 9.0, 9.3, 5.0, 4.0; Company 3: 7.0, 7.7, 7.0, 9.0, 10.0, 9.0, 9.0, 8.3, 5.3, 8.0.]
where x1 ∈ [2,5] is called a “constraint” because it limits the possibilities for x1. It
can also be written 2 ≤ x1 ≤ 5.
The term “optimization formulation” is synonymous with optimization
program. The term “formulating” refers to the process of transforming a word
problem into a specific optimization program that a computer could solve. The
study of “operations research” focuses on the formulation and solutions of
optimization programs to tune systems for more desirable results. This is the study
of perhaps the most formalized decision processes possible.
Figure 6.1. Screen shot showing the application of the Excel solver
Figure 6.1 shows the application of the Excel solver to derive the solution of
this problem, which is x1 = 3.0. The number 3.0 appears in cell “A1” upon pressing
the “Solve” button. To access the solver, one may need to select “Tools”, then
“Add-Ins…”, then check the “Solver Add-In” and click OK. After the Solver is
added in, the “Solver…” option should appear on the “Tools” menu.
The term “optimal solution” refers to the settings generated by solvers when
there is high confidence that the best imaginable settings have been found. In the
problem shown in Figure 6.2, it is clear that x1 = 3.0 is the optimal solution since
the objective is a parabola reaching its highest value at 3.0.
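The same kind of small problem can be solved outside a spreadsheet. The sketch below assumes, for illustration only, an objective with the shape just described, a downward-opening parabola peaking at x1 = 3.0 (the exact objective from the example is not reproduced here), and maximizes it by minimizing its negative with scipy:

from scipy.optimize import minimize

def objective(x):
    # assumed illustrative objective, not the example's exact function
    return -(10.0 - (x[0] - 3.0) ** 2)   # negate so that minimizing maximizes the parabola

result = minimize(objective, x0=[2.0], bounds=[(2.0, 5.0)])
print(result.x)   # approximately [3.0], the optimal solution within 2 <= x1 <= 5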
Parts II and III of this book contain many examples of optimization
formulations. In addition, Part III contains computer code for solving a wide
variety of optimization problems. The next example illustrates a real-world
decision problem in which the prediction models come from an application of
design of experiments (DOE) response surface methods (RSM). This example is
described further in Part II of this book. It illustrates a case in which the settings
derived by the solver were recommended and put into production with largely
positive results.
Question: Suppose a design team is charged with evaluating whether plastic snap
tabs can withstand high enough pull-apart force to replace screws. Designers can
manipulate design inputs x1, x2, x3, and x4 over allowable ranges –1.0 to 1.0. These
inputs are dimensions of the design in scaled units. Also, a requirement is that inserting each
snap tab should require less than 12 pounds (53 N) of force. From RSM, the following
models are available for pull-apart force (yest,1) and insertion force (yest,2) in
pounds:
yest,1(x1, x2, x3, x4) = 72.06 + 8.98 x1 + 14.12 x2 + 13.41 x3 + 11.85 x4 + 8.52 x1² – 6.16 x2² + 0.86 x3² + 3.93 x1x2 – 0.44 x1x3 – 0.76 x2x3
yest,2(x1, x2, x3, x4) = 14.62 + 0.80 x1 + 1.50 x2 – 0.32 x3 – 3.68 x4 – 0.45 x1² – 1.66 x3² + 7.89 x4² – 2.24 x1x3 – 0.33 x1x4 + 1.35 x3x4
Formulate the relevant optimization problem and solve it.
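One reasonable formulation, assuming the 12-pound requirement applies to the insertion force, is to maximize yest,1 subject to yest,2 ≤ 12 and –1 ≤ xi ≤ 1. The following sketch illustrates solving that formulation numerically with a local solver and multiple random starts; it is not the book's worked answer.

```python
# A hedged sketch of one way to set up and solve the snap tab problem:
# maximize predicted pull-apart force subject to predicted insertion
# force <= 12 pounds and -1 <= xi <= 1.
import numpy as np
from scipy.optimize import minimize

def pull_apart(x):
    x1, x2, x3, x4 = x
    return (72.06 + 8.98*x1 + 14.12*x2 + 13.41*x3 + 11.85*x4
            + 8.52*x1**2 - 6.16*x2**2 + 0.86*x3**2
            + 3.93*x1*x2 - 0.44*x1*x3 - 0.76*x2*x3)

def insertion(x):
    x1, x2, x3, x4 = x
    return (14.62 + 0.80*x1 + 1.50*x2 - 0.32*x3 - 3.68*x4
            - 0.45*x1**2 - 1.66*x3**2 + 7.89*x4**2
            - 2.24*x1*x3 - 0.33*x1*x4 + 1.35*x3*x4)

bounds = [(-1.0, 1.0)] * 4
constraints = [{"type": "ineq", "fun": lambda x: 12.0 - insertion(x)}]

best = None
rng = np.random.default_rng(0)
for _ in range(20):                     # multi-start, since the models are not concave
    x0 = rng.uniform(-1.0, 1.0, size=4)
    res = minimize(lambda x: -pull_apart(x), x0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    if res.success and (best is None or -res.fun > -best.fun):
        best = res

if best is not None:
    print("settings:", np.round(best.x, 3))
    print("pull-apart:", round(pull_apart(best.x), 2),
          "insertion:", round(insertion(best.x), 2))
```

Because the response surfaces are quadratic but not concave, a local solver can stop at different points depending on the starting values, which is why several random starts are used.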
Note that plots of the objectives and constraints can aid in building human
appreciation of the solver results. People generally want more than just a
recommended solution or set of system inputs. They also want some appreciation
of how sensitive the objectives and constraints are to small changes in the
recommendations. In some cases, plots can reveal mistakes in the logic of the
problem formulation or in the way data was entered into the solver.
Figure 6.2 shows a plot of the objective contours and the insertion force
constraint for the snap tab optimization example. Note that dependence of
objectives and constraints can only be plotted as a function of two input factors in
this way. The plot shows that a conflict exists between the goals of increasing
pull-apart forces and decreasing insertion forces.
Figure 6.2. The insertion force constraint on pull force contours with x1=1 and x3=1
6.6 References
Brady JE (2005) Six Sigma and the University: Research, Teaching, and Meso-
Analysis. PhD dissertation, Industrial & Systems Engineering, The Ohio
State University, Columbus
Clausing D (1994) Total Quality Development: A Step-By-Step Guide to World-
Class Concurrent Engineering. ASME Press, New York
Shin JS, Kim KJ, Chandra MJ (2002) Consistency Check of a House of Quality
Chart. International Journal of Quality & Reliability Management 19:471-
484
6.7 Problems
In general, pick the correct answer that is most complete.
2. According to this chapter, the study of the most formal decision processes is
called:
a. Quality Function Deployment (QFD)
b. Optimization solvers
c. Operations Research (OR)
d. Theory of Constraints (TOC)
e. Design of Experiments (DOE)
6. Which of the setting changes for Company 1 seems most supported by the
HOQ?
a. Change the area used per page to 9.5
b. Change the arm height from 2 m to 1 m
c. Change to batched production (batched or not set to yes)
d. Change paper thickness to 1 mm
Customer criterion | Importance | Relationship ratings with nine engineering inputs (last input: batched or not) | Company 1 | Company 2 | Company 3
Paper failure at fold (ripping) | 4 | 2, 4, 8, 3, 1, 2, 1, 4, 1 | 3.33 | 4 | 8
Surface roughness (crumpling) | 2 | 4, 1, 6, 2, 1, 2, 1, 5, 2 | 5 | 5 | 5.33
Immediate flight failure (falls) | 8 | 5, 3, 4, 2, 2, 1, 2, 1, 1 | 8 | 7.66 | 8.33
Holes in wings (design flaw) | 6 | 7, 8, 6, 2, 2, 1, 1, 1, 1 | 6 | 8 | 9
Wings unfold (flopping) | 5 | 1, 7, 6, 5, 1, 1, 2, 1, 2 | 9 | 9.66 | 10
Ugly appearance (aesthetics) | 9 | 2, 9, 8, 2, 1, 1, 1, 1, 1 | 3 | 4 | 7
Factor rating number (F') | – | 121, 206, 214, 87, 48, 40, 47, 54, 41 | – | – | –
Company 1 settings | – | 35, 8, 15, 2, 2.00%, 15, 2, 2, No | – | – | –
Company 2 settings | – | 42, 9.2, 12, 2, 0.10%, 0, 1, 2, No | – | – | –
Company 3 settings | – | 42, 9.5, 18, 3, 8.00%, 10, 2, 2, Yes | – | – | –
8. List two benefits of applying QFD compared with using only formal
optimization.
9. In two sentences, explain how changing the targets could affect supplier
selection.
10. Create an HOQ with at least four customer criteria, two companies, and three
engineering inputs. Identify the reasonable recommended inputs.
14. Formulate and solve an optimization problem from your own life. State all
your assumptions in reasonable detail.
7.1 Introduction
If the project involves an improvement to existing systems, the term “control” is
used to refer to the final six sigma project phase in which tentative
recommendations are confirmed and institutionalized. This follows because
inspection controls are being put in place to confirm that the changes do initially
increase quality and that they continue to do so. If the associated project involves
new product or service design, this phase also involves confirmation. Since there is
less emphasis on evaluating a process on an on-going basis, the term “verify”
refers to evaluation on a one-time, off-line basis.
Clearly, there is a chance that the recommended changes will not be found to be
an improvement. In that case, it might make sense to return to the analyze and/or
improvement phases to generate new recommendations. Alternatively, it might be
time to terminate the project and ensure that no harm has been done. In general,
casual reversal of the DMAIC or DMADV ordering of activities might conflict
with the dogma of six sigma. Still, this can constitute the most advisable course of
action.
Chapter 6 presented methods and decision processes for developing
recommended settings. Those settings were called tentative because in general,
sufficient evidence was not available to assure acceptability. This chapter describes
two methods for thoroughly evaluating the acceptability of the recommended
system input settings.
The method of “control planning” refers to a coordinated effort to guarantee
that steady state charting activities will be sufficient to monitor processes and
provide some degree of safeguard on the quality of system outputs. Control
planning could itself involve the construction of gauge R&R method applications
and statistical process control charting procedures described in Chapter 4.
The method of “acceptance sampling” provides an economical way to
evaluate the acceptability of characteristics that might otherwise go uninspected.
Both acceptance sampling and control planning could therefore be a part of a
control or verification process.
Overall, the primary goal of the control or verify phase is to provide strong
evidence that the project targets from the charter have been achieved. Therefore,
the settings should be thoroughly tested through weeks of running in production, if
appropriate. Control planning and acceptance sampling can be useful in this
process. Ultimately, any type of strong evidence confirming the positive effects of
the project recommendations will likely be acceptable. With the considerable
expense associated with many six sigma projects, the achievement of measurable
benefits of new system inputs is likely. However, a conceivable, useful role of the
control or verify phases is to determine that no recommended changes are
beneficial and the associated system inputs should not be changed.
Finally, the documentation of any confirmed input setting changes in the
corporate standard operating procedures (SOPs) is generally required for
successful project completion. This chapter begins with descriptions of control
planning and acceptance sampling methods. It concludes with brief comments
about appropriate documentation of project results.
Note that after the control plan is created, it might make sense to consider
declaring characteristics with exceedingly high Cpk values not to be critical.
Nonconformities for these characteristics may be so rare that monitoring them
could be a waste of money.
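As a reminder of the computation behind such a judgment, the sketch below evaluates Cpk = min(USL – mean, mean – LSL)/(3σ) for hypothetical numbers; the sigma level shown uses the convention that the sigma level equals 3 × Cpk, and conventions that add a 1.5σ shift differ.

```python
# A minimal sketch of the kind of check behind "Cpk > 2.0": computing Cpk
# from an estimated mean and standard deviation and the specification
# limits.  All numbers below are hypothetical.
def cpk(mean, sigma, lsl, usl):
    return min(usl - mean, mean - lsl) / (3.0 * sigma)

mean, sigma = 5.00, 0.05          # hypothetical characteristic data (mm)
lsl, usl = 4.65, 5.40             # hypothetical specification limits (mm)
value = cpk(mean, sigma, lsl, usl)
print("Cpk =", round(value, 2), " sigma level =", round(3 * value, 1))
```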
The following example illustrates a situation involving a complicated control
plan with many quality characteristics. The example illustrates how the results in
control planning can be displayed in tabular format. In the example, the word
“quarantine” means to separate the affected units so that they are not used in
downstream processes until after they are reworked.
Question 1: If the data in Table 7.1 were real, what is the smallest number of
applications of gauge R&R (crossed) and statistical process control charting (SPC)
that must have been performed?
Answer 1: At least five applications of gauge R&R (crossed) and four applications
of Xbar & R charting must have been done. Additionally, it is possible that a p-
chart was set up to evaluate and monitor the fraction of nonconforming units
requiring rework, but the capability from that p-chart information is not included in
Table 7.1.
Question 2: Based on Table 7.1, does the penetration depth characteristic need continued monitoring, or could it be removed from the critical characteristic list?

Answer 2: Gauge R&R results confirm that penetration was well measured by the
associated X-ray inspection. However, considering Cpk > 2.0 and σL > 6.0,
inspection of that characteristic might no longer be necessary. Yes, removing it
from the critical list might be warranted.
Table 7.1. A hypothetical control plan for ship structure arc welding
Critical quality or issue | Measurement technique | Control method | %R&R | Cpk | Period (τ) | Sample size (n) | Reaction plan
Fixture maximum gaps | Caliper | Xbar & R charting | 12.5% | 1.0 | 1 shift | 6 | Adjust & check
Initial flatness (mm) | Photogrammetry | Xbar & R charting | 7.5% | 0.8 | 1 shift | 4 | Adjust & check
Spatter | Visual | 100% insp. | 9.3% | NA | NA | 100% | Adjust
Distortion (rms flatness) | Photogrammetry | Xbar & R charting | 7.5% | 0.7 | 1 shift | 4 | Quarantine and rework
Appearance | Visual go-no-go | p-charting & check sheet | Seen as not needed | NA | 100% | 100% | Notify shift supervisor
Penetration depth (mm) | X-ray inspection | Xbar & R charting | 9.2% | 2.1 | 1 shift | 4 | Notify shift supervisor
Question 3: Suppose the p-chart shown in Table 7.1 was set up to evaluate and
monitor rework. How could this chart be used to evaluate a six sigma project?
Clearly, the exercise of control planning often involves balancing the desire to
guarantee a high degree of safety against inspection costs and efforts. If the
control plan is too burdensome, it conceivably might not be followed. The effort
implied by the control plan in the above would be appropriate to a process
involving valuable raw materials and what might be regarded as “high” demands
on quality. Yet, in some truly safety critical applications in industries like
aerospace, inspection plans commonly are even more burdensome. In some cases,
complete or 100% inspection is performed multiple times.
Figure 7.1. Follow-up SPC chart on total fraction nonconforming
For these reasons, acceptance sampling can be used as part of virtually any
system, even those requiring high levels of quality.
The phrase “acceptance sampling policy” refers to a set of rules for
inspection, analysis, and action related to the possible return of units to a supplier
or upstream process. Many types of acceptance sampling policies have been
proposed in the applied statistics literature. These policies differ by their level of
complexity, cost, and risk trade-offs. In this book, only “single sampling” and
“double sampling” acceptance sampling policies are presented.
Step 1. Carefully select n units for inspection such that you are reasonably
confident that the quality of these units is representative of the quality of
the N units in the lot, i.e., they constitute a rational subgroup.
Step 2. Inspect the n units and determine the number d that do not conform to
specifications.
Step 3. If d > c, then the lot is rejected. Otherwise the lot is “accepted” and the d
units found nonconforming are reworked or scrapped.
Rejection of a lot generally means returning all units to the supplier or upstream
sub-system. This return of units often comes with a demand that the responsible
people should completely inspect all units and replace nonconforming units with
new or reworked units. Note that the same inspections for an acceptance sampling
policy might naturally fit into a control plan in the control phase of a six sigma
process. One might also chart the resulting data on a p-chart or demerit chart.
Question 2: Is there a risk that a lot with ten nonconforming units would pass
through this acceptance sampling control?
Answer 2: Yes, there is a chance. In Chapter 10, we show how to calculate the
probability under standard assumptions, which is approximately 0.7. An OC curve,
also described in Chapter 10, could be used to understand the risks better.
Step 1. Carefully select n1 units for inspection such that you are reasonably
confident that the quality of these units is representative of the quality of
the N units in the lot, i.e., inspect a rational subgroup.
Step 2. Inspect the n1 units and determine the number d1 that do not conform to
specifications.
Step 3. If d1 > r, then the lot is rejected and the process is stopped. If d1 ≤ c1, the lot is
said to be accepted and the process is stopped. Otherwise, go to Step 4.
Step 4. Carefully select an additional n2 units for inspection such that you are
reasonably confident that the quality of these units is representative of the
quality of the remaining N – n1 units in the lot, i.e., inspect another rational
subgroup.
Step 5. Inspect the additional n2 units and determine the number d2 that do not
conform to specifications.
Step 6. If d1 + d2 ≤ c2, the lot is said to be “accepted”. Otherwise, the lot is
“rejected”.
(Flowchart of double sampling: inspect n1 units and count d1 nonconforming. If d1 ≤ c1, “accept” the lot and rework or scrap the units found nonconforming, assuming the others conform. If d1 > r, “reject” the lot and return it upstream for 100% inspection and sorting. Otherwise inspect n2 more units, count d2, and accept if d1 + d2 ≤ c2; otherwise reject.)
In general, if lots are accepted, then all of the items found to be nonconforming
must be reworked or scrapped. It is common at that point to treat all the remaining
units in a similar way as if they had been inspected and passed.
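The accept/reject logic of the single and double sampling policies can be summarized in a few lines of code. The following is a sketch with hypothetical parameter values, not a complete inspection procedure.

```python
# A sketch of the accept/reject logic for the single and double sampling
# policies described above.  The counts d, d1, and d2 would come from actual
# inspections; the policy parameters in the example calls are hypothetical.
def single_sampling(d, c):
    """Accept the lot if the number nonconforming d is at most c."""
    return "accept" if d <= c else "reject"

def double_sampling(d1, c1, r, d2=None, c2=None):
    """First stage: accept if d1 <= c1, reject if d1 > r; otherwise a second
    sample is needed and the final decision compares d1 + d2 with c2."""
    if d1 <= c1:
        return "accept"
    if d1 > r:
        return "reject"
    if d2 is None or c2 is None:
        return "inspect second sample"
    return "accept" if d1 + d2 <= c2 else "reject"

print(single_sampling(d=2, c=3))                      # accept
print(double_sampling(d1=3, c1=2, r=6))               # inspect second sample
print(double_sampling(d1=3, c1=2, r=6, d2=2, c2=5))   # accept
```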
The selection of the parameters c1, c2, n1, n2, and r may be subjective or
based on military or corporate policies. Their values have implications for the
chance that the units delivered to the customer do not conform to specifications.
Also, their values have implications for bottom-line profits. These implications are
studied more thoroughly in Chapter 10 to inform the selection of specific
acceptance sampling methods.
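To give a feel for how these implications can be quantified ahead of Chapter 10, the sketch below evaluates the probability that a hypothetical single sampling plan accepts a lot, as a function of the lot fraction nonconforming, under a binomial model.

```python
# A hedged illustration of how parameter choices translate into risk: the
# probability that a single sampling plan accepts a lot versus the lot
# fraction nonconforming.  The plan (n = 50, c = 1) and the fractions below
# are hypothetical.
from scipy.stats import binom

n, c = 50, 1                          # hypothetical single sampling plan
for p in (0.01, 0.02, 0.05, 0.10):
    p_accept = binom.cdf(c, n, p)     # P(d <= c) under a binomial model
    print(f"fraction nonconforming {p:.2f}: P(accept) = {p_accept:.2f}")
```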
Question 3: If you were an operator, would you prefer this approach to complete
inspection? Explain.
7.6 References
Dodge HF, Romig HG (1959) Sampling Inspection Tables, Single and Double
Sampling, 2nd edn. Wiley, New York
7.7 Problems
In general, pick the correct answer that is most complete.
2. The technique most directly associated with guaranteeing that all measurement
equipment are capable and critical characteristics are being monitored is:
a. Process mapping
b. Benchmarking
c. Design of Experiments (DOE)
d. Control Planning
e. Acceptance Sampling
6. The text implies that FMEAs and control plans are related in which way?
a. FMEAs can help clarify whether characteristics should be declared
critical.
b. FMEAs determine the capability values to be included in the control
plan.
c. FMEAs determine the optimal reaction plans to be included in control
plans.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
Questions 7-9 derive from the paper airplane control plan in Table 7.2.
Table 7.2. Paper airplane control plan

Critical characteristic or issue | Measurement technique | Control method | %R&R | Cpk | Period (τ) | Sample size (n) | Reaction plan
Surface roughness (crumpling) | Laser | X-bar & R | 7.8 | 2.5 | 1 shift | 10 | Adjust & re-check
Unsightly appearance (aesthetics) | Visual | Check sheet, p-chart | 20.4 | 0.4 | 2 shifts | 100% | Quarantine and rework
8. The above control plan implies that how many applications of gauge R&R
have been applied?
a. 0
b. 1
c. 2
d. 3
e. None of the above
9. Which part of implementing a control plan requires the most on-going expense
during steady state?
11. When considering sampling policies, the risks associated with accepting an
undesirable lot grows with:
a. Larger rational subgroup size
b. Decreased tolerance of nonconformities in the rational subgroup (e.g.,
lower c)
c. Increased tolerance of nonconformities in the overall lot (e.g., higher
c)
d. Decreased overall lot size
e. None of the above
Each day, 1000 screws are produced and shipped in two truckloads to a car
manufacturing plant. The screws are not sorted by production time. To determine
lot quality, 150 are inspected by hand. If 15 or more are defective, the screws are
returned.
Each shift, 1000 2’× 2’ sheets of steel enter your factory. Your boss wants to be
confident that approximately 5% of the accepted incoming steel is nonconforming.
14. Which of the following is correct and most complete for single sampling?
a. Acceptance sampling is too risky for such a tight quality constraint.
b. Assuming inspection is perfect, n = 950 and c = 0 could ensure
success.
16. Design a double sampling plan that could be applied to this problem.
17. What is the maximum number of units inspected, assuming the lot is accepted?
19. Why do recorded voices on customer service voicemail systems say, “This call
may be monitored for quality purposes?”
8.1 Introduction
In the previous chapters several methods are described for achieving various
objectives. Each of these methods can be viewed as representative of many other
similar methods developed by researchers. Many of these methods are published in
such respected journals as the Journal of Quality Technology, Technometrics, and
The Bell System Technical Journal. In general, the other methods offer additional
features and advantages.
For example, the exponentially weighted moving average (EWMA) charting
methods described in this chapter provide a potentially important advantage
compared with Shewhart Xbar & R charts. This advantage is that there is generally
a higher chance that the user will detect assignable causes associated with only a
small shift in the continuous quality characteristic values that persists over time.
Also, the “multivariate charting” methods described here offer an ability to
monitor simultaneously multiple continuous quality characteristics. Compared with
multiple applications of Xbar & R charts, the multivariate methods (Hotelling's T2
chart) generally cause many fewer false alarms. Therefore, there are potential
savings in the investigative efforts of skilled personnel.
Yet the more basic methods described in previous chapters have “stood the test
of time” in the sense that no methods exist that completely dominate them in every
aspect. For example, both EWMA and Hotelling’s T2 charting are more
complicated to implement than Xbar & R charting. Also, neither provides direct
information about the range of values within a subgroup.
Many alternative versions of methods have been proposed for process mapping,
gauge R&R, SPC charting, design of experiments, failure mode & effects analysis
(FMEA), formal optimization, Quality Function Deployment (QFD), acceptance
sampling, and control planning. In this chapter, only two alternatives to Xbar & R
charting are selected for inclusion, somewhat arbitrarily: EWMA charting and
multivariate charting using Hotelling's T2 chart.
Generally, n is small enough that people are not interested in variable sample
sizes. In the formulas below, quantities next to each other are implicitly multiplied
with the “×” omitted for brevity. Also, “/” is equivalent to “÷”. The numbers in the
formulas 3.0 and 0.0 are assumed to have an infinite number of significant digits.
The phrase “EWMA chart” refers to the associated resulting chart. An out-of-
control signal is defined as a case in which Zj is outside the control limits. From
then on, technicians and engineers are discouraged from making minor process
changes unless a signal occurs. If a signal does occur, they should investigate to
see if something unusual and fixable is happening. If not, they should refer to the
signal as a false alarm.
Note that a reasonable alternative approach to the one above is to obtain Xbarbar
and σ0 from Xbar & R charting. Then, Zj and the control limits can be calculated
using Equations (8.3) and (8.4) in Algorithm 8.1.
Step 1. (Startup) Measure the continuous characteristics, Xi,j, for i = 1,…,n units for
j = 1,…,25 periods.
Step 2. (Startup) Calculate the sample averages Xbar,j = (X1,j +…+ Xn,j)/n. Also,
calculate the average of all of the 25n numbers, Xbarbar, and the sample
standard deviation of the 25n numbers, s. The usual formula is
    s = √{[(X1,1 – Xbarbar)² + (X1,2 – Xbarbar)² + … + (X25,n – Xbarbar)²]/(25n – 1)}.        (8.1)
Step 3. (Startup) Set σ0 = s tentatively and calculate the “trial” control limits using Equations (8.3) and (8.4).
Table 8.1. Example gap data in millimeters (SU = Start Up, SS = Steady State)
Figure 8.1. EWMA chart for the gap data (the vertical line separates startup and steady state)
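For concreteness, the sketch below computes the Zj values and control limits from Xbarbar and σ0 obtained as suggested above. It uses the standard EWMA recursion and asymptotic limit formulas, which may differ in detail from Equations (8.3) and (8.4), and the subgroup averages shown are hypothetical.

```python
# A sketch of EWMA charting given Xbarbar and sigma0 (e.g., from Xbar & R
# charting).  The recursion and limits below are the standard forms and are
# not copied from the book's Equations (8.3) and (8.4).
import math

def ewma_chart(xbars, xbarbar, sigma0, n, lam=0.2):
    """Return (Z values, LCL, UCL) for the subgroup averages in xbars."""
    half_width = 3.0 * sigma0 * math.sqrt(lam / ((2.0 - lam) * n))
    lcl, ucl = xbarbar - half_width, xbarbar + half_width
    z, zs = xbarbar, []
    for xbar in xbars:
        z = lam * xbar + (1.0 - lam) * z    # exponentially weighted update
        zs.append(z)
    return zs, lcl, ucl

# Hypothetical gap subgroup averages (mm) with Xbarbar = 0.85, sigma0 = 0.10, n = 5
zs, lcl, ucl = ewma_chart([0.84, 0.88, 0.93, 0.97], 0.85, 0.10, 5)
signals = [j + 1 for j, z in enumerate(zs) if z < lcl or z > ucl]
print([round(z, 3) for z in zs], round(lcl, 3), round(ucl, 3), signals)
```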
(Figure: schematic of Production Line #2 with measured characteristics x1 through x7.)
Answer: Table 8.2 shows real data collected over 50 weeks. The plot in Figure
8.3 shows the ellipse that characterizes usual behavior and two out-of-control
signals.
Table 8.2. Systolic (xi1k) and diastolic (xi2k) blood pressure and weight (xi3k) data
k | x11k x21k x31k | x12k x22k x32k | x13k x23k x33k | k | x11k x21k x31k | x12k x22k x32k | x13k x23k x33k
1 127 130 143 76 99 89 172 171 170 26 159 124 147 101 93 107 172 173 172
2 127 149 131 100 95 85 170 175 172 27 147 132 146 91 94 91 172 173 173
3 146 142 138 87 93 87 172 173 172 28 135 148 152 89 96 85 171 172 173
4 156 128 126 94 89 95 171 173 170 29 154 144 136 98 95 96 172 175 173
5 155 142 129 92 100 104 170 171 170 30 139 131 133 85 91 85 172 172 172
6 125 150 125 96 96 97 170 169 171 31 140 120 142 100 88 89 173 172 172
7 133 143 123 92 113 99 169 170 171 32 131 122 138 94 88 81 174 172 171
8 147 140 121 93 102 97 170 170 171 33 136 139 130 89 91 87 171 172 172
9 137 120 135 88 100 113 170 171 170 34 130 135 135 90 89 91 173 173 172
10 138 139 148 112 104 90 170 172 172 35 137 142 149 86 98 91 175 175 174
11 146 150 129 99 105 96 172 170 170 36 127 120 140 93 93 96 171 174 172
12 129 122 150 96 90 110 170 170 172 37 144 147 141 95 104 80 172 173 174
13 146 150 129 99 105 96 172 170 170 38 126 119 122 83 94 87 173 172 173
14 128 150 151 95 110 92 170 172 172 39 144 142 133 83 102 91 171 171 172
15 125 142 141 95 90 93 172 169 170 40 140 154 141 92 90 97 174 173 173
16 120 136 142 82 75 87 169 169 171 41 141 126 145 103 96 91 173 171 171
17 144 140 135 97 97 97 172 167 167 42 134 144 144 81 91 89 172 172 171
18 130 136 142 91 89 96 170 172 169 43 136 132 122 95 98 96 172 173 171
19 121 126 143 92 85 93 171 170 170 44 119 127 133 90 91 86 174 172 174
20 146 131 135 101 97 88 171 167 169 45 130 133 137 84 91 87 172 175 175
21 130 145 135 101 93 91 169 169 170 46 138 150 148 91 91 89 175 174 173
22 132 127 151 95 86 91 169 170 170 47 135 132 148 96 88 95 177 176 174
23 138 129 153 92 89 93 171 171 170 48 146 129 135 91 87 96 174 174 175
24 123 135 144 89 85 91 171 171 172 49 129 103 120 90 81 94 175 173 173
25 152 160 148 94 94 99 172 169 172 50 125 139 142 91 95 92 172 172 172
(Scatter plot showing the ellipsoid that characterizes usual behavior; axes: diastolic blood pressure (mm Hg) versus weight (lbs).)
Figure 8.3. Plot of average systolic and diastolic blood pressure
Question: Apply Hotelling’s T2 analysis to the data in Table 8.2. Describe any
insights gained.
Step 4 (Startup): Calculate

    T² = n(xbar – xbarbar)′S⁻¹(xbar – xbarbar)        (8.11)

and plot. If T² < LCL or T² > UCL, then investigate. Consider removing the
associated subgroups from consideration if assignable causes are found that
make it reasonable to conclude that these data are not representative of usual
conditions.
Step 5 (Startup): Calculate the revised limits using the remaining r* units:

    UCL = [q(r* + 1)(n – 1)/(r*n – r* – q + 1)] Fα,q,r*n–r*–q+1   and   LCL = 0,        (8.12)
where F comes from Table 8.3. Also, calculate the revised S matrix.
Step 6 (Steady state): Plot the T2 for new observations and have a designated person or
persons investigate out-of-control signals. If and only if assignable causes are
found, the designated local authority should take corrective action. Otherwise,
the process should be left alone.
Answer: The following steps were informed by the data and consultation with the
friend involved. The method offered evidence that extra support should be given to
the friend during challenging situations including holiday travel and finding
suitable childcare, as shown in Figure 8.4.
(T² chart for the trial period with the UCL shown; out-of-control and notable points are annotated “child care situation in limbo,” “holiday traveling,” “heard apparently good news,” and “cause unknown.”)
Figure 8.4. Trial period in the blood pressure and weight example
Table 8.3. Critical values of the F distribution with α=0.01, i.e., Fα=0.01,ν1,ν2
ν1
ν2 1 2 3 4 5 6 7 8 9 10
1 405284.1 499999.5 540379.2 562499.6 576404.6 585937.1 592873.3 598144.2 602284.0 605621.0
2 998.5 999.0 999.2 999.2 999.3 999.3 999.4 999.4 999.4 999.4
3 167.0 148.5 141.1 137.1 134.6 132.8 131.6 130.6 129.9 129.2
4 74.1 61.2 56.2 53.4 51.7 50.5 49.7 49.0 48.5 48.1
5 47.2 37.1 33.2 31.1 29.8 28.8 28.2 27.6 27.2 26.9
6 35.5 27.0 23.7 21.9 20.8 20.0 19.5 19.0 18.7 18.4
7 29.2 21.7 18.8 17.2 16.2 15.5 15.0 14.6 14.3 14.1
8 25.4 18.5 15.8 14.4 13.5 12.9 12.4 12.0 11.8 11.5
9 22.9 16.4 13.9 12.6 11.7 11.1 10.7 10.4 10.1 9.9
10 21.0 14.9 12.6 11.3 10.5 9.9 9.5 9.2 9.0 8.8
11 19.7 13.8 11.6 10.3 9.6 9.0 8.7 8.4 8.1 7.9
12 18.6 13.0 10.8 9.6 8.9 8.4 8.0 7.7 7.5 7.3
13 17.8 12.3 10.2 9.1 8.4 7.9 7.5 7.2 7.0 6.8
14 17.1 11.8 9.7 8.6 7.9 7.4 7.1 6.8 6.6 6.4
15 16.6 11.3 9.3 8.3 7.6 7.1 6.7 6.5 6.3 6.1
Step 1 (Startup): The data are shown in Table 8.2 for n = 3 samples per period (each
period being roughly one week), q = 3 characteristics (systolic and diastolic blood
pressure and weight), and r = 50 periods.
Step 2 (Startup): The trial calculations resulted in

        93.0  10.6  0.36
    S = 10.6  35.6  0.21
        0.36  0.21  1.3
Step 4 (Startup): The T2 statistics were calculated and charted (see Figure 8.4) using Equation (8.11).
Step 6 (Steady state): Monitoring continued using Equation (8.12) and the revised S.
Later data showed that new major life news caused a need to begin
medication about one year after the trial period finished.
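The sketch below shows how T² statistics and limits of the form in Equations (8.11) and (8.12) can be computed for data organized like Table 8.2. Here S is estimated as the average of the within-subgroup sample covariance matrices, which is one common convention and may differ slightly from the book's startup formulas; the demonstration data are simulated, not the Table 8.2 values.

```python
# A sketch of T^2 charting for data shaped like Table 8.2: r subgroups of
# n observation vectors on q characteristics.  The limit uses the
# Equation (8.12) form with r* = r (no subgroups removed).
import numpy as np
from scipy.stats import f

def t2_chart(data, alpha=0.01):
    """data has shape (r, n, q); returns the T^2 statistics and (LCL, UCL)."""
    r, n, q = data.shape
    xbar = data.mean(axis=1)                       # subgroup mean vectors
    xbarbar = xbar.mean(axis=0)                    # overall mean vector
    S = np.mean([np.cov(sub, rowvar=False) for sub in data], axis=0)
    Sinv = np.linalg.inv(S)
    t2 = np.array([n * (xj - xbarbar) @ Sinv @ (xj - xbarbar) for xj in xbar])
    ucl = (q * (r + 1) * (n - 1) / (r * n - r - q + 1)) \
          * f.ppf(1 - alpha, q, r * n - r - q + 1)
    return t2, 0.0, ucl

# Simulated demonstration data: 10 subgroups of 3 observations, 3 characteristics
rng = np.random.default_rng(1)
demo = rng.normal([135.0, 92.0, 172.0], [9.0, 6.0, 1.2], size=(10, 3, 3))
t2, lcl, ucl = t2_chart(demo)
print(np.round(t2, 2), "UCL =", round(ucl, 2))
```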
Plugging that data into the formulas in Steps 2-5 generates rules for admissions. If
a new student applies with characteristics yielding an out-of-control signal as
calculated using Equation (8.11), admission might not be granted. That student
might be expected to perform in an unusual manner and/or perform poorly if
admitted.
8.5 Summary
This chapter has described two advanced statistical process control (SPC) charting
methods. First, exponentially weighted moving average (EWMA) charting methods
are relevant when the detection of even small, persistent shifts in a single quality
characteristic is important. They also provide a visual summary of the smoothed mean. Second, Hotelling's T2
charts (also called multivariate control charts) permit the user to monitor a large
number of quality characteristics using a single chart. In addition to reducing the
burden of plotting multiple charts, the user can regulate the overall rate of false
alarms.
8.6 References
Hotelling H (1947) Multivariate Quality Control, Techniques of Statistical
Analysis. Eisenhard, Hastay, and Wallis, eds. McGraw-Hill, New York
Roberts SW (1959) Control Chart Tests Based on Geometric Moving Averages.
Technometrics 1: 236-250
8.7 Problems
In general, pick the correct answer that is most complete.
9.1 Introduction
This chapter contains two descriptions of real projects in which a student played a
major role in saving millions of dollars: the printed circuit board study and the wire
harness voids study. The objectives of this chapter include: (1) providing direct
evidence that the methods are widely used and associated with monetary savings
and (2) challenging the reader to identify situations in which specific methods
could help.
In both case studies, savings were achieved through the application of many
methods described in previous chapters. Even while both case studies achieved
considerable savings, the intent is not to suggest that the methods used were the
only appropriate ones. Method selection is still largely an art. Conceivably, through
more judicious selection of methods and additional engineering insights, greater
savings could have been achieved. It is also likely that luck played a role in the
successes.
The chapter also describes an exercise that readers can perform to develop
practical experience with the methods and concepts. The intent is to familiarize
participants with a disciplined approach to documenting, evaluating, and
improving product and manufacturing approaches.
boards where failures could occur continue to increase. Also, identifying the source
of a quality problem is becoming increasingly difficult.
As noted in Chapter 2, one says that a unit is “nonconforming” if at least one of
its associated “quality characteristics” is outside the “specification limits”. These
specification limits are numbers specified by design engineers. For example, if
voltage outputs of a specific circuit are greater than 12.5 V or less than 11.2 V we
might say that the unit is nonconforming. As usual, the company did not typically
use the terms “defective” or “defect” because the engineering specifications may or
may not correspond to what the customer actually needs. Also, somewhat
arbitrarily, the particular company in question preferred to discuss the “yield”
instead of the fraction nonconforming. If the “process capability” or standard
fraction nonconforming is p0, then 1 – p0 is called the standard yield.
Typical circuit board component process capabilities are in the region of 50
parts per million defective (ppm) for solder and component nonconformities.
However, since the average board contains over 2000 solder joints and 300
components, even 50 ppm defective generates far too many boards requiring
rework and a low overall capability.
In early 1998, an electronics manufacturing company with plants in the
Midwest introduced to the field a new advanced product that quickly captured 83%
of the market in North America, as described in Brady and Allen (2002). During
the initial production period, yields (the % of product requiring no touchup or
repair) had stabilized in the 70% range with production volume at 6000 units per
month. In early 1999, the product was selected for a major equipment expansion in
Asia. In order to meet the increased production demand, the company either
needed to purchase additional test and repair equipment at a cost of $2.5 million, or
the first test yield had to increase to above 90%. This follows because the rework
needed to fix the failing units involved substantial labor content and production
resources reducing throughput. The improvement to the yields was the preferred
situation due to the substantial savings in capital and production labor cost, and,
thus, the problem was how to increase the yield in a cost-effective manner.
Answer: Convening experts is often useful and could conceivably result in quick
resolution of problems without need for formalism. However, (a) is probably not
the best choice because: (1) if OFAT were all that was needed, the yield would
likely have already been improved by process engineers; and (2) a potential $2.5M
payoff could pay for as many as 25 person-years of effort. Therefore, the formalism of a six
sigma project could be cost justified. The answer (c) is more appropriate than (b)
from the definition of six sigma in Chapter 1. The problem involves improving an
existing system, not designing a new one.
This project was of major importance to the financial performance of the company.
Therefore a team of highly regarded engineers from electrical and mechanical
engineering disciplines was assembled from various design and manufacturing
areas throughout the company. Their task was to recommend ways to improve the
production yield based on their prior knowledge and experience with similar
products. None of these engineers from top universities knew much about, nor
intended to use, any formal experimental planning and analysis technologies.
Table 9.1 gives the weekly first test yield results for the 16 weeks prior to the
team’s activities based on a production volume of 1500 units per week.
Table 9.1. Yields achieved for 16 weeks prior to the initial team’s activities
Question: Which of the following could the first team most safely be accused of?
a. Stifling creativity by adopting an overly formal decision-making
approach
Table 9.2. The initial team’s predicted yield improvements by adjusting each factor
FACTOR YIELD
Replace vendor of main oscillator 5.3%
Add capacitance to base transistor 4.7%
Add RF absorption material to isolation shield 4.4%
New board layout on power feed 4.3%
Increase size of ground plane 3.8%
Lower residue flux 3.6%
Change bonding of board to heat sink 3.2%
Solder reflow in air vs N2 2.3%
Raise temperature of solder tips 1.7%
Based on their analysis of the circuit, the above experimental results and past
experience, the improvement team predicted that a yield improvement of 16.7%
would result from their proposed changes. All of their recommendations were
implemented at the end of Week 17. Table 9.3 gives the weekly first test yields
results for the six weeks of production after the revision.
It can be determined from the data in the tables that, instead of a yield
improvement, the yield actually dropped 29%. On Week 22 it was apparent to the
company that the proposed process changes were not achieving the desired
outcome. Management assigned to this project two additional engineers who had
been successful in the past with yield improvement activities. These two engineers
both had mechanical engineering backgrounds and had been exposed to “design of
experiments” and “statistical process control” tools through continued education at
local universities, including Ohio State University, and company-sponsored
seminars.
Question: Which is the most appropriate first action for the second team?
a. Perform design of experiments using a random run ordering
b. Apply quality function deployment to relate customer needs to
engineering inputs
c. Return the process inputs to their values in the company SOPs
d. Perform formal optimization to determine the optimal solutions
Answer: Design of experiments, quality function deployment, and formal
optimization all require more system knowledge than what is probably
immediately available. Generally speaking, returning the system inputs to those
documented in SOPs is a safe move unless there are objections from process
experts. Therefore, (c) is probably the most appropriate initial step.
The second team's first step was to construct a yield attribute control chart (a yield
chart, or “1 – p” chart) with knowledge of the process change date (Table 9.4).
From the chart, the two engineers were able to see that most of the fluctuations in
yield observed before the first team implemented its changes during Week 17 were,
as Deming calls it, common cause variation or random noise. From this, they
concluded that, since around 1000 rows of data were used in each point on the
chart, a large number of samples would be needed to resolve yield shifts of less
than 5% during a one-factor-at-a-time experiment. Control limits with p0 = 0.37
and n = 200 units have UCL – LCL ≈ 0.2, or 20%, such that the sample sizes in the
OFAT experiments were likely too small to spot significant differences.
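That limit width is easy to verify with the standard p-chart formula, as in the following quick check.

```python
# A quick check of the control-limit width quoted above: p-chart style limits
# p0 +/- 3*sqrt(p0*(1 - p0)/n) with p0 = 0.37 and n = 200.
import math

p0, n = 0.37, 200
half_width = 3.0 * math.sqrt(p0 * (1.0 - p0) / n)
print("UCL - LCL =", round(2.0 * half_width, 2))   # approximately 0.20, i.e., 20%
```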
The two engineers’ first decision was to revert back to the original, documented
process settings. This differed from the initial settings used by the first team
because tinkering had occurred previously. The old evidence that had supported
this anonymous tinkering was probably due to random noise within the process
(factors changing about which the people are not aware). Table 9.4 gives the
weekly test yields for the five weeks after this occurrence.
Table 9.4. Yields for the five weeks subsequent to the initial intervention
Question: If you were hired as a consultant to the first team, what specific
recommendations would you make?
This approach restored the process to its previous “in control” state with yields
around 75%. The increase in yield shown on the control chart (Figure 9.1 below)
during this time frame was discounted as a “Hawthorne effect” since no known
improvement was implemented. The phrase “Hawthorne effect” refers to a
difference caused simply by our attention to and study of the process. Next the
team tabulated the percent of failed products by relevant defect code shown in
Figure 9.2. It is generally more correct to say “nonconformity” instead of “defect”
but in this problem the engineers called these failures to meet specifications
“defects”. The upper control limit (UCL) and lower control limit (LCL) are shown
calculated in a manner similar to “p-charts” in standard textbooks on statistical
process control, e.g., Besterfield (2001), based on data before any of the teams’
interventions.
(Figure 9.1: yield control chart with UCL, CL, and LCL; annotations mark the timing of the Team #1 and Team #2 changes.)
The procedure of Pareto charting was then applied to help visualize the
problem shown in the figure below. The total fraction of units that were
nonconforming was 30%. The total fraction of unit that were nonconforming
associated with the ACP subsystem was 21.5%. Therefore, 70% of the total yield
loss (fraction nonconforming) was associated with the “ACP” defect code or
subsystem. The engineers then concentrated their efforts on this dominant defect
code. This information, coupled with process knowledge, educated their selection
of factors for the following study.
Figure 9.2. Pareto chart of the nonconforming units from 15 weeks of data
Table 9.5. Data from the screening experiment for the PCB case study
Run A B C D y1 – Yield
1 -1 1 -1 1 92.7
2 1 1 -1 -1 71.2
3 1 -1 -1 1 95.4
4 1 -1 1 -1 69.0
5 -1 1 1 -1 72.3
6 -1 -1 1 1 91.3
7 1 1 1 1 91.5
8 -1 -1 -1 -1 79.8
An analysis of this data based on first order linear regression and so-called
Lenth’s method (Lenth 1989) generated the statistics in the second column of
Table 9.6. Note that tLenth for factor D, 8.59, is larger than the “critical value”
tEER,0.05,8 = 4.87. Since the experimental runs were performed in an order
determined by a so-called “pseudo-random number generator” (See Chapters 3
and 5), we can say that “we have proven with α = 0.05 that factor D significantly
affects average yield”. For the other factors, we say that “we failed to find
significance” because their tLenth values are less than the critical value 4.87. The level of “proof” is somewhat
complicated by the particular choice of experimental plan. In Chapter 3,
experimental plans yielding higher levels of evidence will be described.
Intuitively, varying multiple factors simultaneously does make statements about
causality dependent upon assumptions about the joint effects of factors on the
response. However, the Lenth (1989) method is designed to give reliable “proof”
based on often realistic assumptions.
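For readers who want to reproduce part of Table 9.6, the sketch below recomputes the first-order (main effect) coefficients by least squares from the Table 9.5 data. The Lenth t statistics and the Bayesian posterior probabilities reported in Table 9.6 are not recomputed here.

```python
# A sketch reproducing the first-order coefficients in Table 9.6 by least
# squares regression on the Table 9.5 data.
import numpy as np

# Columns: A, B, C, D in coded -1/+1 units; response y1 = yield
X = np.array([[-1,  1, -1,  1],
              [ 1,  1, -1, -1],
              [ 1, -1, -1,  1],
              [ 1, -1,  1, -1],
              [-1,  1,  1, -1],
              [-1, -1,  1,  1],
              [ 1,  1,  1,  1],
              [-1, -1, -1, -1]], dtype=float)
y = np.array([92.7, 71.2, 95.4, 69.0, 72.3, 91.3, 91.5, 79.8])

design = np.column_stack([np.ones(len(y)), X])        # intercept plus A-D
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
for name, b in zip(["intercept", "A", "B", "C", "D"], coef):
    print(f"{name}: {b:7.3f}")   # A, B, C near -1 to -2; D near 9.825
```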
An alternative analysis, based on the calculation of Bayesian posterior
probabilities for each factor being important, yields the values shown in the last
column of Table 9.6. This analysis similarly indicates that the probability that the
heat sink type affects the yield is extremely high (96%). Further, it suggests that
the alternative heat sink is better than the current one (the heat sink factor
estimated coefficient is positive).
Based on this data (and a similar analysis), the two engineers recommended
that the process should be changed permanently to incorporate the new heat sink.
In the terminology needed in subsequent chapters, this corresponded to a
recommended setting x4 = D = the new heat sink. This was implemented during
Week 29. Table 9.7 gives the weekly yield results for the period of time after the
recommended change was implemented. Using the yield charting procedure, the
engineers were able to confirm that the newly designed process produced a stable
first test yield (no touch-up) in excess of 90%, thus avoiding the equipment
purchase and saving the company $2.5 million.
Table 9.6. Estimated coefficients, tLenth values, and posterior probabilities from the screening experiment

Factor | Estimated coefficient (βest) | tLenth | Estimated probability of being “important”
A | -1.125 | 0.98 | 0.13170
B | -0.975 | 0.85 | 0.02081
C | -1.875 | 1.64 | 0.03732
D | 9.825 | 8.59 | 0.96173
This case illustrates the benefits of our DOE technology. First, the screening
experiment technology used permitted the fourth factor to be varied with only eight
experimental runs. The importance of this factor was controversial because the
operators had suggested it and not the engineers. If the formal screening method
had not been used, then the additional costs associated with one-factor-at-a-time
(OFAT) experimentation and adding this factor would likely have caused the team
not to vary that factor. Then the subsequent discovery of its importance
would not have occurred. Second, in the experimental plan, which in this case is
the same as the standard “design of experiments” (DOE), multiple runs are
associated with the high level of each factor and multiple runs are associated with
the low level of each factor. For example, 1400 units were run with the current heat
sink and 1400 units were run with the new heat sink. The same is true for the other
factors. The reader should consider that this would not be possible using an OFAT
strategy to allocate the 2800 units in the test. Finally, the investigators varied only
factors that they could make decisions about. Therefore, when the analysis
indicated that the new heat sink was better, they could “dial it up”, i.e., implement
the change.
Note that the purpose of describing this study is not necessarily to advocate the
particular experimental plan used by the second team. The purpose is to point out
that the above “screening design” represents an important component of one
formal experimentation and analysis strategy. The reader would likely benefit by
having these methods in his or her set of alternatives when he or she is selecting a
methodology. (For certain objectives and under certain assumptions, this
experimental plan might be optimal.) The reader already has OFAT as an option.
Question: While evidence showed that the project's resulting system inputs helped
save money, which of the following is the safest criticism of the approach used?
a. The team could have applied design of experiments methods.
b. A cause & effect matrix could have clarified what was important to
customers.
c. Having a charter approved by management could have shortened the DOE
time.
d. Computer assisted optimization would improve decision-making in this
case.
Answer: The team did employ design of experiments methods, so answer (a) is
clearly wrong. It was already clear that all the stakeholders wanted was a higher yield.
Therefore, no further clarification of customer needs (b) would likely have helped. With
only a single system output or response (yield) and a first order model from the
DOE activity, optimization can be done in one’s head. Set factors at the high level
(low level) if the coefficient is positive (negative) and significant. Answer (c) is
correct, since much of the cost and time involved with the DOE related to
obtaining needed approvals and no formal charter had been cleared with
management in the define phase.
jet engine control cables. The case study involved a Guidance Communication
Cable used on a ballistic missile system.
Part of this communication cable is molded with a polyurethane compound to
provide mechanical strength and stress relief to the individual wires as they enter
the connector shell. This is a two-part polyurethane which is procured premixed
and frozen to prevent curing. The compound cures at room temperature or can be
accelerated with elevated heat. Any void or bubble larger than 0.04 inches that
appears in the polyurethane after curing constitutes a single nonconformity to
specifications. Whenever the void nonconformities are found, local rework on the
part is needed, requiring roughly 17 minutes of time per void. Complete inspection
is implemented, in part because the number of voids per part typically exceeds ten.
Also, units take long enough to process that a reasonable inspection interval
includes only one or two units.
The team counted the number of voids or nonconformities in 20 5-part runs and set
up a u-chart as shown in Figure 9.3. The u-charting start up period actually ran
into January so that the recommended changes from the improvement phase went
into effect immediately after the start up period finished. A u-chart was selected
instead of, e.g., a p-chart. As noted above, the number of units inspected per period
was small and almost all units had at least one void. Therefore, a p-chart would not
be informative since p0 would be nearly 1.0.
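For reference, u-chart limits take the standard form u0 ± 3√(u0/n). The sketch below applies these formulas with hypothetical values for u0 and n; the actual counts from the 20 runs are not reproduced here.

```python
# A sketch of the u-chart limit calculations relevant here, using hypothetical
# numbers: u0 voids per unit estimated from the startup period and n units
# inspected per period.
import math

def u_chart_limits(u0, n):
    half_width = 3.0 * math.sqrt(u0 / n)
    return max(0.0, u0 - half_width), u0 + half_width

u0, n = 7.0, 5          # hypothetical: about 7 voids per unit, 5-part runs
lcl, ucl = u_chart_limits(u0, n)
print("LCL =", round(lcl, 2), "UCL =", round(ucl, 2))
```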
Figure 9.3. u-Chart of the voids per unit used in measurement and control phases
At the end of the chart start-up period, an informal gauge R&R activity
investigated the results from two inspectors. The approach used was based on units
that had been inspected by the relevant subject matter expert so “standard values”
were available. The results showed that one operator identified an average of 17
voids per run while the second operator identified an average of 9 voids per run
based on the same parts. The specifications from the customer defined a void to be
any defect 0.040” in diameter or 0.040” in depth. An optical measuring device and
depth gage were provided to the inspectors to aid in their determination of voids.
Subsequent comparisons indicated both operators to average nine voids per run.
Question: Which likely explains why formal gauge R&R was not used?
a. The count of voids is attribute data, and it was not clear whether standard
methods were applicable.
b. The engineers were not aware of comparison with standards methods
since it is a relatively obscure method.
c. The project was not important enough for formal gauge R&R to be used.
d. Answers in parts (a) and (b) are both reasonable explanations.
Answer: Gauge R&R is generally far less expensive than DOE. Usually if
managers feel that DOE is cost justified, they will likely also approve gauge R&R.
The attribute data nature of count data often makes engineers wonder whether they
can apply standard methods. Yet, if n × u0 > 5, applying gauge R&R methods for
continuous data to count of nonconformity data is often reasonable. Also, even
though many companies use methods similar to gauge R&R (comparison with
standards) from Chapter 4, such methods are not widely known. Therefore, (d) is
correct.
The analysis phase began with the application of Pareto charting to understand
better the causes of voids and to build intuition. The resulting Pareto chart is shown
in Figure 9.4. Pareto charting was chosen because the nonconformity code
information was readily available and visual display often aids intuition. This
charting activity further called attention to the potential for inspectors to miss
counting voids in certain locations.
(Pareto categories, in decreasing order of void count: top face, edge, cover tape, center, side.)
Figure 9.4. Pareto chart of the void counts by location or nonconformity type
The chart aided subtly in selecting the factors for the two designs of
experiments (DOE) applications described below (with results omitted for brevity).
A first DOE was designed in a nonstandard way involving two factors, one of
which was qualitative with four levels. The response was not void count but
something easier to measure. The DOE was designed in a nonstandard way in part
because not all combinations of the two factors were economically feasible. The
results suggested a starting point for the next DOE.
A second application of DOE was performed to investigate systematically the
effect of seven factors on the void count using an eight run fractional factorial. The
results suggested that the following four factors had significant effects on the
number of voids: thaw time, thaw temperature, pressure, pot life.
The team recommended adjusting the process settings in the following manner.
For all factors that had significant effects in the second DOE, the settings were
selected that appeared to reduce the void count. Other factors were adjusted to
settings believed to be desirable, taking into account considerations other than void
count. Also, the improved measurement procedures were simultaneously
implemented as suggested by the informal application of gauge R&R in the
measurement phase.
The team spent ten weeks confirming that the recommended settings did in fact
reduce the void counts as indicated in Figure 9.3 above. Charting was terminated at
that point because it was felt that the information from charting would not be
helpful with such small counts, and the distribution of void nonconformities had
certainly changed. In other words, there was a long string of out-of-control signals
indicating that the adoption of the new settings had a positive and sustained
assignable cause effect on the system.
Four weeks were also spent documenting the new parameters into the
production work instructions for the production operators and into the mold design
rules for the tool engineers. At the same time, training and seminars were provided
on the results of the project. The plan was for a 17-week project, with actual
duration of 22 weeks. At the same time the project was projected to save $50,000
per year with actual calculated direct rework-related savings of $97,800 per year.
Total costs in materials and labor were calculated to be $31,125. Therefore, the
project was associated with approximately a four-month payback period. This
accounts only for direct rework-related savings, so the actual payback period was
likely even shorter.
Question: Which is the safest critique of methods used in the wire harness study?
a. A cost Pareto chart would have been better since cost reduction was the
goal.
b. QFD probably would have been more effective than DOE.
c. Charting of void count should not have been dropped since it always
helps.
d. An FMEA could have called attention to certain void locations being
missed.
e. Pareto charting must always be applied in the define phase.
Answer: It is likely that a cost Pareto chart would not have shown any different
information than an ordinary Pareto chart. This follows since all voids appeared to
be associated with the same cost of rework and there were no relevant performance
or failure issues mentioned. QFD is mainly relevant for clarifying, in one method,
customer needs and competitor strengths. The realities at this defense contractor
suggested that customer needs focused almost solely on cost reduction, and no
relevant competitors were mentioned. DOE was probably more relevant because
the relevant system inputs and outputs had been identified and a main goal was to
clarify the relevant relationships. With such low void counts, u-charting would
likely not have been effective since n × u0 < 5. In general, all methods can be
applied in all phases, if Table 2.1 in Chapter 2 is taken seriously. This is
particularly true for Pareto charting, which generally requires little expense.
FMEA would likely have cost little and might have focused inspector attention on
failures associated with specific void locations. Therefore, (d) is probably most
correct.
wings. This project is based on results from an actual student project with some
details changed to make the example more illustrative.
Define: The primary goal of this project is to improve the experience of
making, using, and disposing of paper air wings. The initial standard operating
procedure for making paper air wings is shown in Table 9.9. Performing the
process mapping method generated the flowchart of the manufacturing and usage
map in Figure 9.5. The process elicited the key input variables, x’s, and key output
variables, y’s, for study. Further, it was decided that the subsystem of interest
would not include the initial storage, so that initial flatness and humidity were out
of scope.
Table 9.9. Initial standard operating procedure for making paper air wings
Figure 9.5. Process map flowchart of air wing manufacturing and usage
Measure: The measurement SOP in Table 9.10 was developed to evaluate the
current manufacturing SOP and design. The initial manufacturing SOP was then
evaluated using the measurement SOP and an Xbar & R charting procedure. In all,
24 air wings were built and tested in batches of two. This generated the data in
Table 9.11 and the Xbar & R chart in Figure 9.6.
Table 9.11. Air wing times studying the initial system for short run Xbar & R chart
As there were no out-of-control signals, the initial and revised charts were the
same. The initial process capability was 0.8 seconds (6σ0) and the initial average
flight time was 1.7 seconds (Xbarbar).
Figure 9.6. Combined Xbar & R chart for initial system evaluation
Analyze: The main analysis method applied was benchmarking with a friend's air
wing material selection and manufacturing method. The friend was asked to make
air wings, and the process was observed and evaluated. This process generated the
benchmarking matrices shown in Table 9.12. The friend also served as the
customer, generating the ratings in the tables. It was observed that the air times
were roughly similar, but the appearance of the friend's air wings was judged
superior.
Table 9.12. Benchmarking matrix

Competitor | KIV – paper type | KIV – cutting method | KIV – fold placement method | Testing method | KOV – time in the air (average of two, in seconds)
Project leader | Notebook | Tearing | Regular | Dropping | 1.7
Friend | Magazine | Scissors | Press firmly | Dropping | 1.6
Improve: From the process mapping experience, the placement method was
identified as a key input variable. Common sense suggested that careful placement
might improve the appearance. Even though the air time was a little lower based on
a small amount of evidence, other benchmarking results suggested that the friend’s
approach likely represented best practices to be emulated. This resulted in the
revised standard operating procedure (SOP) in Table 9.13.
Table 9.13. Revised standard operating procedure for making paper air wings
Control: To verify that the air time was not made worse by the revised SOP, an
additional 24 air wings were constructed and tested and evaluated with Xbar & R
charting (see Table 9.14 and Figure 9.7).
Table 9.14. Air wing times studying the revised system for short run Xbar & R chart
Figure 9.7. Combined Xbar & R chart for revised system evaluation
The revised SOP was followed for the manufacturing, and the testing SOP was
applied to emulate usage. The improvement in appearance was subjectively
confirmed. The revised average was neither improved nor made worse (the new Xbarbar
equaled 1.7 seconds). At the same time the consistency improved as measured by
the process capability (6σ0 equaled 0.3 seconds) and the width of control limits.
9.7 References
Besterfield D (2001) Quality Control. Prentice Hall, Columbus, OH
Brady J, Allen T (2002) Case Study Based Instruction of SPC and DOE. The
American Statistician 56(4):1-4
Lenth RV (1989) Quick and Easy Analysis of Unreplicated Factorials.
Technometrics 31:469-473
9.8 Problems
In general, pick the correct answer that is most complete.
3. According to this book, which is the most appropriate first project action?
a. Quality function deployment
b. Design of experiments
c. Creating a project charter
d. Control planning
e. All of the following are equally relevant for the define phase.
5. In the voids project, which phase and method combinations occurred? Give
the answer that is correct and the most complete.
a. Analyze – Quality Function Deployment
b. Measure – gauge R&R (informal version)
c. Define – creating a charter
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
6. Suppose that top face voids were much more expensive to fix than other voids.
Which charting method would be most appropriate?
a. p-charting
b. u-charting
c. Xbar & R charting
d. Demerit charting
e. EWMA charting
7. Which of the following could be a key input variable for the void project
system?
a. The number of voids on the top face
b. The total number of voids
c. The preheat temperature of the mold
d. The final cost of an improvement project
e. None of the above
8. In the voids case study, what assistance would FMEA most likely provide?
a. It could have helped to identify the techniques used by competitors.
b. It could have helped to develop quantitative input-output
relationships.
c. It could have helped select specific inspections systems for
improvement.
d. It could help guarantee that design settings were optimal.
e. It could have helped achieve effective document control.
9. In the voids project, what change is most likely to invite scope creep?
a. Increasing the predicted savings to $55,000
b. Removing two engineers from the team
c. Changing the quantifiable project objective to read, “to be decided”
d. Shortening the schedule to complete in January
e. None of the above is relevant to scope creep
10. Which rows of the void project charter address the not-invented-here
syndrome?
a. The predicted savings section or row
b. The team members’ project objectives section or row
c. The quantifiable project objectives section or row
12. According to this text, which of the following is the most correct and
complete?
a. Pareto charting could never be used in the analyze phase.
b. Formal optimization must be applied.
c. A meeting, a charter, two control charting activities, a C&E matrix,
informal optimization, and SOPs might constitute a complete six
sigma project.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
13. Write a paragraph about a case study that includes at least one “safe criticism”
based on statements in this book. Find the case study in a refereed journal such
as Applied Statistics, Quality Engineering, or The American Statistician.
14. Identify at least one KOV and a target value for a system that you want to
improve in a student project. We are looking for a KOV associated with a
project that is measurable, of potential interest, and potentially improvable
without more than ten total hours of effort by one person.
15. This exercise involves starting with an initial standard operating procedure
(SOP) and concluding with a revised and confirmed SOP. Both SOPs must be
evaluated using at least one control chart. For training purposes, the sample
size can be only n = 2 and only 12 subgroups are needed for the startup
periods for each chart.
a. At least six “methods” or “activities” listed in Table 2.1 must be
applied. The creation of SOPs of various types can also be considered
as activities counted in the total of six. Document all results in four
pages including tables and figures. Since design of experiments
(DOE) may be unknown to the reader at this point, it might make
sense not to use these methods.
b. Perform the exercise described in part (a) with eight instead of six
methods or activities. Again, documenting SOPs can be counted in
the total of eight.
10
SQC Theory
10.1 Introduction
Some people view statistical material as a way to push students to sharpen their
minds, but as having little vocational or practical value. Furthermore, practitioners
of six sigma have demonstrated that it is possible to derive value from statistical
methods while having little or no knowledge of statistical theory. However,
understanding the implications of probability theory (assumptions to predictions)
and inference theory (data to informed assumptions) can be intellectually satisfying
and enhance the chances of successful implementations in at least some cases.
This chapter focuses attention on two of the most practically valuable roles that
theory can play in enhancing six sigma projects. First, there are many parameters
to be selected in applying acceptance sampling. In general, larger sample sizes and
lower acceptable limits reduce the chances of accepting bad lots. However, it can
be helpful to quantify these risks, particularly considering the need to balance the
risks vs costs of inspection.
Second, control charts also pose risks, even if they are applied correctly as
described in Chapter 4. These risks include the possibility that out-of-control
signals will occur even when only common causes are operating. Then,
investigators would waste their time and either conclude that a signal was a false
alarm or, worse, would attempt to over-control the system and introduce variation.
Also, there is a chance that charting will fail to identify an assignable cause. Then,
large numbers of nonconforming items could be shipped to the customer.
Evaluating formally these risks using probability can help in making decisions
about whether to apply Xbar & R charting (Chapter 4) and EWMA charting or
multivariate charting (Chapter 8). Also, some of the risks are a function of the
sample size. Therefore, quantifying dependencies can help in selecting sample
sizes.
In Section 10.2, the fundamental concepts of probability theory are defined,
including random variables, both discrete and continuous, and probability
distributions. Section 10.3 focuses in particular on continuous random variables
and normally distributed random variables. Section 10.4 describes discrete random
variables, including the geometric and hypergeometric distributions.
Question: What can be said about the unknown number of boats that will be sold
next month at a certain market?
a. It is not random because the planner knows it for certain in advance.
b. It is a continuous random variable.
c. It is a discrete random variable.
Answer: It is a random variable, assuming that the planner cannot confidently
predict the number in advance. Count of units is discrete. Therefore, the number of
boats is a discrete random variable (c).
Question: A planner has sold two boats out of two attempts last month at a market
and has been told those sales were lucky. What is true about the probability of
selling two more next month?
a. The probability is 1.0 since 2 ÷ 2 = 1 based on last month’s data.
b. The planner might reasonably feel that the probability is high, for example
0.7 or 70%.
c. Probabilities are essentially rationalizations and therefore have no value.
d. The answers in part (a) and (b) are both true.
Answer: Last month the planner sold two similar boats, which might suggest that the
probability is high, near 1. However, past data can rarely if ever be used to declare
that a probability is 1.0. While probabilities are rationalizations, they can have
value. For example, they can communicate feelings and cause participants in a
decision to share information. The planner can judge the probability to be any number
between 0 and 1, and 0.7 might seem particularly reasonable. Therefore, (b) is the
only true answer.
Question: The planner has enjoyed a positive experience with a single sampling
plan. A bad lot of 1200 parts was identified and no complaints were made about
accepted lots. A quality engineer states some reasonable-seeming assumptions and
declares the following: there is a 0.6 probability of cutting the inspection costs by
half and a 0.05 higher chance of detecting a bad lot using a double sampling
policy. Which answer is most complete and correct?
a. In business, never trust subjective theory. Single sampling was proven to
work consistently.
b. The evidence to switch may be considered trustworthy.
c. Single sampling is easier. Simplicity could compensate for other benefits.
d. Double sampling practically guarantees bad lots will not be accepted.
e. The answers in parts (b) and (c) are both correct.
Answer: Currently, many top managers feel the need to base the most important
business decisions on calculated probabilities. It can be technically correct not to
trust subjective theory. However, here the position is adopted that proof can only
come from an experiment using randomization (see Chapter 11) or from physics or
mathematical theory. In general, all acceptance sampling methods involve a risk of
accepting “bad” lots. Probabilistic information may be regarded as trustworthy
evidence if it is based on reasonable assumptions. Also, trading off intangibles
against probabilistic benefits is often reasonable. Therefore, (e) is the most
complete and correct answer.
The rigorous equations and mathematics in the next few sections should not
obscure the fact that probability theory is essentially subjective in nature and is the
servant of decision-makers. The main point is that even though probabilities are
subjective, probability theory can take reasonable assumptions and yield
surprisingly thorough comparisons of alternative methods or decision options.
These calculations can be viewed as mental experiments or simulations. While they
are not physical experiments, these calculations can often offer more convincing
verification than empirical tests. Similar views were articulated by Keynes (1937)
and Savage (1972).
\Pr(A) = \int_{x \in A} f(x)\, dx \qquad (10.1)
Question: Assume an engineer believes that the price of a boat will be between a
= $9,500 and b = $10,600, with c = $10,000 being the most likely price of a boat
he might buy next month. Develop and plot a probability density function that is
both reasonably consistent with these beliefs and easy to work with.
Answer: A triangular probability density function is both reasonably consistent
with these beliefs and easy to work with:

f(x) =
\begin{cases}
0 & \text{if } x \le a \text{ or } x \ge b \\
\dfrac{2(x - a)}{(b - a)(c - a)} & \text{if } a < x \le c \\
\dfrac{2(b - x)}{(b - a)(b - c)} & \text{if } c < x < b
\end{cases} \qquad (10.2)
[Plot of the triangular density f(x), rising from 0 at $9,500 to about 0.0018 at $10,000 and falling back to 0 at $10,600; the total area under the curve is 1.0, and the area below $10,000 is shaded]
Figure 10.3. Distribution for boat price (shaded refers to Example 10.3.2)
Note that the total area underneath all probability density functions is 1.0.
Therefore, if X is any continuous random variable and a is any number, Pr{X < a}
= 1 – Pr{X ≥ a}.
The next example shows that a probability can be calculated from a distribution
function. It is important to remember that the distribution functions are subjectively
chosen just like the probabilities. The calculus just shows how one set of subjective
assumptions implies other subjective assumptions.
\Pr(A) = \int_{-\infty}^{\$10{,}000} f(x)\, dx = 0 + \int_{\$9{,}500}^{\$10{,}000} \frac{2(x - \$9{,}500)}{(\$10{,}600 - \$9{,}500)(\$10{,}000 - \$9{,}500)}\, dx \qquad (10.3)

This integral corresponds to the shaded area in Figure 10.3. From our
introductory calculus course, we might remember that the anti-derivative of x^n is
(n + 1)^{-1} x^{n+1} + K, where K is a constant. (With computers, integrals can be done even
without antiderivatives, but in this case, one is available.) Applying the anti-
derivative, our expression becomes

\Pr(A) = \left.\frac{(x - \$9{,}500)^2}{(\$10{,}600 - \$9{,}500)(\$10{,}000 - \$9{,}500)}\right|_{\$9{,}500}^{\$10{,}000} = 0.45. \qquad (10.4)
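Since the density is available in closed form, the value in Equation (10.4) can also be checked by numerical integration. The short Python sketch below is a cross-check rather than part of the original example, and it assumes the values a = $9,500, c = $10,000, and b = $10,600 from the question.

```python
# Numerical cross-check of Equations (10.3)-(10.4), assuming the triangular
# density of Equation (10.2) with a = 9500, c = 10000, b = 10600.
from scipy.integrate import quad

a, c, b = 9500.0, 10000.0, 10600.0

def f(x):
    """Triangular probability density function from Equation (10.2)."""
    if a < x <= c:
        return 2 * (x - a) / ((b - a) * (c - a))
    if c < x < b:
        return 2 * (b - x) / ((b - a) * (b - c))
    return 0.0

prob, _ = quad(f, a, 10000.0)  # Pr{X < $10,000}
print(round(prob, 4))          # about 0.4545, consistent with the 0.45 above
```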
As implied above, distributions with widely known names like the triangular
distribution rarely if ever exactly correspond to the beliefs of the planner.
Choosing a named distribution is often done simply to make the calculations easy.
Yet, computers are making calculus manipulations easier all the time so that
custom distributions might grow in importance. In the future, planners will
increasingly turn to oddly shaped distribution functions, f(x), that still have an area
equal to 1.0 underneath them but which more closely correspond to their personal
beliefs.
Using calculus, the “mean” (µ) or “expected value” (E[X]) of a random
variable with probability density function, f(x), is defined as
E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx = \mu \qquad (10.5)

Similarly, the variance of X is defined as

\mathrm{Var}[X] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx = \sigma^2 \qquad (10.6)
The next example illustrates the calculation of a mean from a distribution
function.
Question: Using the density function from Equation (10.2) with a = $9,500,
c = $10,000, and b = $10,600, calculate the mean boat price.

Answer:

E[X] = 0 + \int_{\$9{,}500}^{\$10{,}000} x\, \frac{2(x - \$9{,}500)}{(\$10{,}600 - \$9{,}500)(\$10{,}000 - \$9{,}500)}\, dx + \int_{\$10{,}000}^{\$10{,}600} x\, \frac{2(\$10{,}600 - x)}{(\$10{,}600 - \$9{,}500)(\$10{,}600 - \$10{,}000)}\, dx + 0 = \$10{,}033.33 \qquad (10.7)
Looking at Figure 10.3, it seems reasonable that the mean is slightly to the right of
$10,000.
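As a cross-check, the same mean can be obtained from scipy's built-in triangular distribution; the parameterization below (the relative position of the mode) is scipy's convention, not notation from this chapter.

```python
# Cross-check of Equation (10.7) using scipy's triangular distribution with
# a = 9500, mode c = 10000, and b = 10600 (values assumed from the example).
from scipy.stats import triang

a, c, b = 9500.0, 10000.0, 10600.0
boat_price = triang(c=(c - a) / (b - a), loc=a, scale=b - a)

print(round(boat_price.mean(), 2))        # 10033.33, matching Equation (10.7)
print(round(boat_price.cdf(10000.0), 4))  # Pr{X <= $10,000}, about 0.4545
```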
The “uniform” probability distribution function has f(x) = 1 ÷ (b – a) for a ≤ x ≤
b and f(x) = 0 otherwise. In words, X is uniformly distributed if it is equally likely
to be anywhere between the numbers a and b with no chance of being outside the
[a,b] interval. The distribution function is plotted in Figure 10.4.
Figure 10.4. The generic uniform distribution function
The uniform distribution is probably the easiest to work with but also among
the least likely to exactly correspond to a planner’s subjective beliefs. Probabilities
and mean values of uniformly distributed random variables can be calculated using
plots and geometry since areas correspond to probabilities.
Question: Assume X is uniformly distributed with a = 85 and b = 95, so that
f(x) = 0.1 over [85, 95]. What is P(92 ≤ X ≤ 95)?
Answer: P(92 ≤ X ≤ 95) is given by the area under the distribution function over
the range [92,95], which equals 0.1 × 3 = 0.3 or 30%.
As noted earlier, there are only a small number of distribution shapes with
widely known, "famous" distribution function names, such as the triangular
distribution, the uniform distribution, and the normal distribution, which will be discussed next.
One can, of course, propose “custom distribution functions” specifically
designed to express beliefs in specific situations.
The “normal” probability density function, f(x), has a special role in statistics in
general and statistical quality control in particular. This follows because it is
relevant for describing the behavior of plotted quantities in control charts. The
reason for this relates to the central limit theorem (CLT). The goals of this section
are to clarify how to calculate probabilities associated with normally distributed
random variables and the practical importance of the central limit theorem.
The normal probability density function is
f(x) = \frac{0.398942}{\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (10.8)
where the parameters µ and σ also happen to be the mean and standard deviation of
the relevant normally distributed random variable.
The normal distribution is important enough that many quality engineers have
memorized the probabilities shown in Figure 10.6. The phrase “standard normal
distribution” refers to the case µ = 0 and σ = 1, which refers to both plots in
Figure 10.6.
[Two plots of the standard normal density f(x) over −4 to 4: in (a) the area under the curve between −1 and 1 is 0.68; in (b) the area under the curve between −3 and 3 is 0.9973]
Figure 10.6. Shows the fraction within (a) 1.0 × σ of the µ and (b) 3.0 × σ of µ
In general, for any random variable X and constants µ and σ with σ > 0:
The normal distribution has three special properties that aid in hand calculation
of relevant probabilities. First, the “location scale” property of normal probability
density function guarantees that, if X is normally distributed, then
Z = \frac{X - \mu}{\sigma} \qquad (10.10)
is also normally distributed, for any constants µ and σ. Note that, for many
distributions, shifting and scaling results in a random variable from a distribution
with a different name.
Second, if µ and σ are the mean and standard deviation of X respectively, then
Z derived from Equation (10.10) has mean 0.000 and standard deviation 1.000.
Then, we say that Z is distributed according to the “standard normal” distribution.
Third, the “symmetry property” of the normal distribution guarantees that
Pr{Z < a} = Pr{Z > – a}. One practical benefit of these properties is that
probabilities of events associated with normal probability density functions can be
calculated using Table 10.1. The table gives Pr{Z < a} where the first digit of a is
on the left-hand-side column and the last digit is on the top row. For example,
Pr{Z < –0.40} = 0.344578.
The examples that follow illustrate probability calculations that can be done
with paper and pencil and access to Table 10.1. They show the procedure of using
the equivalence of events to transform normal probability calculations to a form
where answers can be looked up using the table. In situations where Excel
spreadsheets can be accessed, similar results can be derived using built-in
functions. For example, "=NORMDIST(5,9,2,TRUE)" gives the value 0.02275,
where TRUE refers to the cumulative probability that X is less than a = 5 for a
normally distributed X with mean 9 and standard deviation 2. FALSE would give
the probability density function value at the point X = a.
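For readers working outside of Excel, a rough scipy analogue of the NORMDIST call is sketched below; norm.cdf plays the role of the TRUE (cumulative) option and norm.pdf of FALSE.

```python
# scipy analogue of "=NORMDIST(5,9,2,TRUE)" for X ~ N(mu = 9, sigma = 2).
from scipy.stats import norm

print(round(norm.cdf(5, loc=9, scale=2), 5))  # 0.02275, the cumulative value
print(norm.pdf(5, loc=9, scale=2))            # density value at the point X = 5
```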
Table 10.1. If Z ~ N[0,1], then the table gives P(Z < z). The first column gives the first three
digits of z, the top row gives the last digit.
Question 2: Assume X ~ N(µ = 20, σ = 5). What is the Pr{X > 22}?
Answer 2: Pr{X > 22} = Pr{Z > (22 – 20)/5}= Pr{Z > 0.4}, using the location scale
property. Also, Pr{Z > 0.4}= Pr{Z<–0.4}, because of the symmetry property of the
normal distribution. Pr{X > 22}= Pr{Z < –0.40} = 0.344578, from the table.
Question 3: Assume X ~ N(µ = 20, σ = 5). What is the Pr{12 < X < 23}?
Answer 3: Pr{12 < X < 23} = Pr{X < 23} – Pr{X < 12}, which follows directly
from the definition of probability as an integral in Figure 10.7. Next,
Pr{X < 12} = Pr{Z < (12 – 20)/5}
= Pr{Z < –1.60} = 0.054799 and
Pr{X < 23} = Pr{Z < (23 – 20)/5}
= Pr{Z < 0.60} = 1 – Pr{Z < –0.60}
= 1 – 0.274253 = 0.725747,
where the location scale and symmetry properties have been used. Therefore, the
answer is 0.725747 – 0.054799 = 0.671. (The implied uncertainty of the original
numbers is unclear, but quoting more than three digits for probabilities is often not
helpful because of their subjective nature.)
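The two answers above can be checked without the table; the following is a minimal scipy sketch assuming X ~ N(µ = 20, σ = 5).

```python
# Checking Questions 2 and 3 above, assuming X ~ N(20, 5).
from scipy.stats import norm

X = norm(loc=20, scale=5)
print(round(X.sf(22), 3))               # Pr{X > 22}, about 0.345
print(round(X.cdf(23) - X.cdf(12), 3))  # Pr{12 < X < 23}, about 0.671
```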
[Figure 10.7. Three plots of the N(20, 5) density over the range 0 to 40, illustrating Pr{12 < X < 23} as a difference of the areas below 23 and below 12]
[Plot of the uniform density f(x) = 0.1 over [85, 95]; the total area under the curve is 1.0]
Figure 10.8. The uniform distribution function example probability calculation
Answer 4: P(92 ≤ X ≤ 95) is given by the area under the distribution function over
the range [92,95] in Figure 10.8, which equals 0.1 × 3 = 0.3 or 30%.
The phrase “test for normality” refers to an evaluation of the extent to which a
decision-maker can feel comfortable believing that responses or averages are
normally distributed. In some sense, numbers of interest from the real world never
come from normal distributions. However, if the numbers are averages of many
other numbers, or historical data suggests approximate normality, then it can be of
interest to assume that future similar numbers come from normal distributions.
There are many formal approaches for evaluating the extent to which assuming
normality is reasonable, including evaluation of skew and kurtosis and normal
probability plotting the numbers as described in Chapter 15.
Assume that a unit produced by an engineered system has only one critical quality
characteristic, Y1(xc). For example, the critical characteristic of a bolt might be
inner diameter. If the value of this characteristic falls within values called the
“specification limits,” then the unit in question is generally considered acceptable,
otherwise not. Often critical characteristics have both “upper specification limits”
(USL) and “lower specification limits” (LSL) that define acceptability. For
example, the bolt diameter must be between LSL = 20.5 millimeters and USL =
22.0 millimeters for the associated nuts to fit the bolt.
Suppose further that the characteristic values of items produced vary
uncontrollably around an average or “mean” value, written “µ,” with typical
differences between repeated values equal to the “standard deviation,” written “σ”.
For example, the bolt inner diameter average might be 21.3 mm with standard
deviation, 0.2 mm, i.e., µ = 21.3 mm and σ = 0.2 mm.
With these definitions, one says that the "sigma level," σL, of the process is

\sigma_L = \frac{\min\{USL - \mu,\; \mu - LSL\}}{\sigma}. \qquad (10.11)
Note that σL = 3 × Cpk (from Chapter 4). If σL > 6, then one says that the
process has “six sigma quality.” For instance, the bolt process sigma level in the
example given is 3.5. This quality level is often considered “mediocre”.
With six sigma quality and assuming normally distributed quality characteristic
values under usual circumstances, the fraction of units produced with characteristic
values outside the specification limits is less than 1 part per billion (PPB). If the
process mean shifts 1.5σ toward the closest limit, then the fraction of
“nonconforming” units (with characteristic values that do not conform to
specifications) is less than 3.4 parts per million (PPM).
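The following short Python sketch ties these definitions together using the bolt numbers from the text; the helper name sigma_level is ours, and normally distributed characteristic values are assumed.

```python
# Sigma level for the bolt example and the nonconforming fractions quoted for
# six sigma quality (normality assumed; helper name is illustrative only).
from scipy.stats import norm

def sigma_level(LSL, USL, mu, sigma):
    """Distance from the mean to the nearest specification limit, in sigmas."""
    return min(USL - mu, mu - LSL) / sigma

print(sigma_level(20.5, 22.0, 21.3, 0.2))  # 3.5, the "mediocre" level

# Fraction nonconforming beyond a limit six sigma from the mean, with and
# without a 1.5 sigma mean shift toward that limit.
print(norm.sf(6.0))  # about 1e-9, i.e., less than 1 PPB
print(norm.sf(4.5))  # about 3.4e-6, i.e., roughly 3.4 PPM
```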
Figure 10.9 shows the probability density function associated with a process
having slightly better than six sigma quality. This figure implies assumptions
including that the upper specification limit is much closer to the mean than the
lower specification limit.
[Plot of relative frequency versus the quality characteristic Y1: a normal-shaped curve centered at µ, with the USL located 6σ above µ and less than 1 PPB nonconforming beyond it]
Figure 10.9. Shows the relative frequency of parts produced with six sigma quality
The central limit theorem (CLT) plays an important role in statistical quality
control largely because it helps to predict the performance of control charts. As
described in Chapter 4 and Chapter 8, control charts are used to avoid intervention
when no assignable causes are present and to encourage intervention when they are
present. The CLT helps to calculate the probabilities that charts will succeed in
these goals with surprising generality and accuracy. The CLT aids in probability
calculations regardless of the charting method (with exceptions including R
charting) or the system in question, e.g., from restaurant or hospital emergency
room to traditional manufacturing lines.
To understand how to benefit from the central limit theorem and to comprehend
the limits of its usefulness, it is helpful to define two concepts. First, the term
“independent” refers to the condition in which a second random variable’s
probability density function is the same regardless of the values taken by a set of
other random variables. For example, consider the two random variables: X1 is the
number of boats that will be sold next month and X2 is their sales prices as
determined by unknown boat sellers. Both are random variables because they are
unknown to the planner in question. The planner in question assumes that they are
independent if and only if the planner believes that potential buyers make purchasing
decisions with no regard to price within the likely ranges. Formally, if f(x1, x2) is
the "joint" probability density function, then independence implies that it can be
written f(x1, x2) = f(x1)f(x2).
Second, “identically distributed” means that all of the relevant variables are
assumed to come from exactly the same distribution. Clearly, the number of boats
and the price of boats cannot be identically distributed since one is discrete (the
number) and one is continuous (the price). However, the numbers of boats sold in
successive months could be identically distributed if (1) buyers were not
influenced by seasonal issues and (2) there was a large enough pool of potential
buyers. Then, higher or lower number of sales one month likely would not
influence prospects much in the next month.
In the context of control charts, making the combined assumption that system
outputs being charted are independent and identically distributed (IID) is relevant.
Departures of outputs from these assumptions are also relevant. Therefore, it is
important to interpret the meaning of IID in this context. System outputs could
include the count of demerits on individual hospital surveys or the gaps on
individual parts measured before welding.
To review: under usual circumstances, common causes force the system outputs
to vary with a typical pattern (randomly, with the same density function).
Rarely, however, assignable causes enter and change the system, thereby changing
the usual pattern of values (effectively shifting the probability density function).
Therefore, even under typical circumstances the units inspected will not be
associated with constant measurement values of system outputs. The common
cause factors affecting them will force the observations to vary up and down. If
measurements are made on only a small fraction of units produced at different
times by the system, then it can be reasonably assumed that the common causes
will effectively reset themselves. Then, the outputs will be IID to a good
approximation. However, even with only common causes operating, units made
immediately after one another might not be associated with independently
distributed system outputs. Time is often needed for the common causes to reset
enough that independence is reasonable. Table 10.2 summarizes reasons why IID
might or might not be a reasonable assumption in the context of control charting.
Table 10.2. Independent and identically distributed assumptions for control charting
Question: Assume the mood of the emergency room nurse, working at a small
hospital with typically ten patients per shift, affects patient satisfaction. The key
output variable is the sum of demerits. Consider the following statement: “The
nurse’s mood is a source of common cause variation, making it unreasonable to
assume that subsequent patients’ assigned demerits are independently distributed.”
Which answer is most complete and correct?
a. The statement is entirely reasonable.
b. Moods are always assignable causes because local people can always fix
them.
c. Moods fluctuate so quickly that successive demerit independence is
reasonable.
d. Satisfaction ratings are always independent since patients never talk
together.
e. The answers in parts (b) and (c) are both reasonable.
Answer: In most organizations, moods are uncontrollable factors. Since they are
often not fixable by local authority, they are not generally regarded as assignable
causes. Moods typically change at a time scale of one shift or one half shift.
Therefore, multiple patients would likely be affected by the same mood.
Therefore, assuming successive demerit independence is not reasonable. Satisfaction
ratings might not be independently distributed because the same common cause
factor fluctuation might affect multiple observations. Therefore, the answer is (a).
Question: An untrained welder is put on second shift and does not follow the
standard operating procedure for fixturing parts, dramatically increasing gaps.
Consider the following statement: “The operator’s lack of training constitutes an
assignable cause and could make it difficult to believe the same, identical
distribution applies to gaps before and after the untrained welder starts.” Which
answer is most complete and correct?
a. The statement is entirely reasonable.
b. Training issues are assignable causes because local authority can fix them.
c. It is usual for assignable causes to effectively shift the output density
function.
d. With only common causes operating, it is often reasonable to assume
outputs continually derive from the identical distribution function.
In the context of SQC, the central limit theorem (CLT) can be viewed as an
important fact that increases the generality of certain kinds of control charts. Also,
it can be helpful for calculating the small adjustment factors d2, D1, and D2 that are
commonly used in Xbar charting. Here, the CLT is presented with no proof using
the following symbols:
X1, X2, …, Xn are random variables assumed to be independent identically
distributed (IID). These could be quality characteristic values outputted from a
process with only common causes operating. They could also be a series of outputs
from some type of numerical simulation.
f(x) is the common density function of the identically distributed X1, X2, …, Xn.
Xbarn is the sample average of X1, X2, …, Xn. Xbarn is effectively the same as
Xbar from Xbar charts with the “n” added to call attention to the sample size.
σ is the standard deviation of the X1, X2, …, Xn, which do not need to be
normally distributed.
The CLT focuses on the properties of the sample averages, Xbarn.
If X1, X2, …, Xn are independent, identically distributed (IID) random variables
from a distribution function with any density function f(x) with finite mean and
standard deviation, then the following can be said about the average, Xbarn, of the
random variables. Defining
\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}
\quad \text{and} \quad
Z_n = \frac{\bar{X}_n - \int_{-\infty}^{\infty} u\, f(u)\, du}{\sigma/\sqrt{n}}, \qquad (10.12)

it follows that

\lim_{n \to \infty} \Pr(Z_n \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du. \qquad (10.13)
In words, averages of n random variables, Xbarn, are approximately
characterized by a normal probability density function. The approximation
improves as the number of quantities in the average increases. A reasonably
understandable proof of this theorem, i.e., that the above assumptions imply the
stated limiting result, is given in Grimmett and Stirzaker (2001), Chapter 5.
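A brief simulation can make the statement concrete. The sketch below averages IID uniform random variables (an arbitrary non-normal choice; the sample size, replication count, and seed are also arbitrary) and compares the simulated mean and standard deviation of Xbarn with the values the CLT predicts.

```python
# Monte Carlo illustration of the CLT: averages of IID uniform [0, 1] values.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 10000
xbar_n = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)

# Simulated mean and standard deviation of Xbar_n versus the CLT predictions
# mu = 0.5 and sigma / sqrt(n) = sqrt(1/12) / sqrt(20).
print(xbar_n.mean(), xbar_n.std(ddof=1))
print(0.5, np.sqrt(1.0 / 12.0) / np.sqrt(n))
```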
To review, the expected value of a random variable is:
E[X] = \int_{-\infty}^{\infty} u\, f(u)\, du \qquad (10.14)
Then, the CLT implies that the sample average, Xbarn, converges to the true mean
E[X] as the number of random variables averaged goes to infinity. Therefore, the
CLT can be effectively rewritten as

E[X] = Xbarn + eMC,    (10.15)

where eMC is normally distributed with mean 0.000 and standard deviation σ ÷
sqrt[n] for "large enough" n. We call Xbarn the "Monte Carlo estimate" of the
mean, E[X]. Therefore, with only common causes operating, the Xbar chart user is
charting Monte Carlo estimates of the mean. Since σ is often not known, it is
sometimes of interest to use the sample standard deviation, s:
s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n - 1}} \qquad (10.16)
Then, it is common to use:
σestimate = s ÷ c4 (10.17)
where c4 comes from Table 10.3. As noted in Chapter 6, the standard deviation can
also be estimated using the average range, Rbar, using:
σestimate = Rbar ÷ d2 (10.18)
However σ is estimated, σestimate ÷ sqrt[n] is called the “estimated error of the
Monte Carlo estimate” or a typical difference between Xbarn and E[X].
Table 10.3. Constants c4 and d2 relevant to Monte Carlo estimation and charting
Question: The time between the arrival of patients in an emergency room (ER)
and when they meet with doctors, X, can be a critical characteristic. Assume that
times are typically 20 minutes with standard deviation 10 minutes. Suppose that
the average of seven consecutive patient times was 35 minutes. Which is correct
and most complete?
a. A rough estimate for the probability that this would happen
without assignable causes is 0.00004.
b. This data constitutes a signal that something unusual is
happening.
c. It might be reasonable to assign additional resources to the ER.
d. It is possible that no assignable causes are present.
e. All of the above are correct.
Answer: It has not been established that the averages of seven consecutive times,
Xbar7, are normally distributed to a good approximation under usual
circumstances. Still, it is reasonable to assume this for rough predictions. Then, the
central limit theorem gives that Xbar7, under usual circumstances, has mean 20
minutes and standard deviation 10 ÷ sqrt[7] = 3.8 minutes. The chance that Xbar7
would be greater than 35 minutes is estimated to be Pr{Z > (35 – 20) ÷ 3.8} = Pr{Z
< –3.97}, which is roughly 0.00004 using Table 10.1. Such a small probability
constitutes a signal that something unusual is happening, and there could be
good reason to send in additional medical resources if they are available. The
answer is (e), all of the above are correct.
Answer: The pseudo-random numbers shown in Table 10.4 were generated using
Excel (Tools Menu → Data Analysis → Random Number Generation). The
distribution selected was normal with mean 0 and standard deviation σ0 = 1 with
random seed equal to 1 (without loss of generality). Defining R = Max{X1,…,Xn}
– Min{X1,…,Xn}, one has 1000 effectively random variables whose expected value
is d2 according to Equations (10.15) and (10.21). Averaging, we obtain 2.3338 as
our estimate for d2 with Monte Carlo estimated standard error 0.8767 ÷ sqrt[1000]
= 0.0278. This estimate is within one standard deviation of the true value from
Table 10.3 of 2.326. Note that Table 10.4 also permits an estimate for c4.
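The same Monte Carlo estimate can be scripted; the following is a minimal Python sketch of the procedure just described, with numpy's generator standing in for the Excel Random Number Generation tool (the seed is arbitrary, so the estimate will differ slightly from 2.3338).

```python
# Monte Carlo estimate of d2: 1000 subgroups of n = 5 standard normal values.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.standard_normal(size=(1000, 5))
ranges = samples.max(axis=1) - samples.min(axis=1)

d2_estimate = ranges.mean()                        # Monte Carlo estimate of d2
std_error = ranges.std(ddof=1) / np.sqrt(1000)     # estimated error of the estimate
print(round(d2_estimate, 4), round(std_error, 4))  # near 2.326 from Table 10.3
```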
Table 10.4. 1000 simulated subgroups each with five pseudo-random numbers
Subgroup   X1        X2       X3        X4       X5       R        s
1          -3.0230   0.1601   -0.8658   0.8733   0.2147   3.8963   1.5271
...        ...       ...      ...       ...      ...      ...      ...
\Pr(A) = \sum_{x_i \in A} \Pr\{X = x_i\}. \qquad (10.23)
In this book, we focus on cases in which the set of possible values that X can
assume are nonnegative integers 0, 1, 2,…(N – 1), e.g., the number of
nonconforming units in a lot of parts. An event of particular interest is the chance
that X is less than or equal to a constant, c. Then, the probability of this event can
be written:
Pr{X ≤ c} = Pr{X = 0} + Pr{X = 1} + … + Pr{X = c}. (10.24)
The following example illustrates the elicitation of a discrete distribution function
from a verbal description. It also shows that many “no-name” distribution
functions can be relevant in real world situations.
Question: An emergency room nurse tells you that there is about a 50% chance
that no accident victims will come any given hour. If there is at least one victim, it
is equally likely that any number up to 11 (the most ever observed) will come. Plot
a probability mass function consistent with these beliefs and estimate the
probability that greater than or equal to 10 will come.
Answer: Figure 10.10 plots a custom distribution for this problem. The relevant
sum is Pr{X = 10} + Pr{X = 11} = 0.10, giving 10% as the estimated probability.
[Plot of Pr{X = xi}: a mass of 0.50 at x = 0 and equal smaller masses at x = 1, 2, …, 11, with the event {X ≥ 10} indicated by dotted lines]
Figure 10.10. Distribution for number of victims with selected event (dotted lines)
Question: Using the distribution function from the previous example, calculate the
expected number of accident cases in any given hour.

Answer: E[X] = (0)(0.50) + (1 + 2 + … + 11)(0.50 ÷ 11) = 3.0 accident cases per hour.
Answer 1: This is the perfect case for the geometric distribution. The distribution
function is, therefore
\Pr\{X = x\} = p_0^{\,x-1}(1 - p_0) \quad \text{for } x = 1, \ldots, \infty
Advanced readers will realize that the definition of independence of events permits
the formula to be generated through the multiplication of x – 1 consecutive
successes followed by 1 failure.
An important message of the above example is that the geometric probability mass
function, while appearing to derive from elementary assumptions, is still
approximate and subjective when applied to real problems. For example, in a real
situation one might have several trials but yet not be entirely comfortable assuming
that results are independent and are associated with the same, constant success
probability, p0. Then, the geometric probability mass function might be applied for
convenience only, to gain approximate understanding.
The general formula for the expected value of a geometric random variable is:
E[X] = (1)\, p_0^{\,1-1}(1 - p_0) + (2)\, p_0^{\,2-1}(1 - p_0) + \cdots = \frac{1}{1 - p_0} \qquad (10.27)
The “hypergeometric” distribution also has a special role in SQC theory because
it helps in understanding the risks associated with acceptance sampling methods.
The hypergeometric probability mass function is
\Pr\{X = x\} = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}} \quad \text{for } x = 0, 1, \ldots
\quad \text{and} \quad \Pr\{X = x\} = 0 \text{ for all other } x \qquad (10.28)
where M, N, and n are parameters that must be nonnegative integers. The symbol
“( )” refers to the so-called “choose” operation given by
\binom{M}{x} = \text{``}M\text{ choose }x\text{''} = \frac{M!}{x!\,(M - x)!} = \frac{M \times (M - 1) \times \cdots \times 1}{[x \times (x - 1) \times \cdots \times 1] \times [(M - x) \times (M - x - 1) \times \cdots \times 1]}. \qquad (10.29)
The following example shows the assumptions that motivate many applications of
the hypergeometric distribution.
Question: Assume that one is considering a situation with n units selected from N
units where the total number of nonconforming units is M. Assume the selection is
random such that each of the N units has an equal chance of being selected because
a “rational subgroup” is used (see Chapter 4). Diagram this sampling situation and
provide a formula for the chance that exactly x nonconforming units will be
selected.
Answer: This is the perfect case for the hypergeometric distribution. The
distribution function is, therefore, given by Equation (10.28). Advanced readers
can calculate this formula from the assumptions by counting all cases in which x
units are selected, divided by the total number of possible selections. Figure 10.11
illustrates the selection situation.
[Figure 10.11. Diagram of a lot of N units containing M nonconforming units, from which a sample of n units is randomly selected]
When the lot size N is large relative to the sample size n, these probabilities can be
approximated using the Poisson formula:

\frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}} \approx \frac{e^{-nM/N}\left(\dfrac{nM}{N}\right)^{x}}{x!}. \qquad (10.30)
In Microsoft® Excel, the function “HYPGEOMDIST” generates probabilities, as
illustrated in the next example.
Question: Assume a lot of N = 150 units contains M = 10 nonconforming units and
that n = 15 units are randomly sampled. What is the probability that exactly x = 2
nonconforming units are found?

Answer: The assumed beliefs are consistent with the hypergeometric mass
function,

\Pr\{X = 2\} = \frac{\binom{10}{2}\binom{150 - 10}{15 - 2}}{\binom{150}{15}}
= \frac{\dfrac{10!}{2!\,8!} \times \dfrac{140!}{13!\,127!}}{\dfrac{150!}{15!\,135!}}
= \frac{\dfrac{10 \times 9}{2 \times 1} \times (15 \times 14) \times (135 \times 134 \times \cdots \times 128)}{150 \times 149 \times \cdots \times 141}
\approx 0.199. \qquad (10.31)
Note that the Poisson approximation is generally not considered accurate with n =
15. However, for reference the Poisson approximation gives 0.184 for the
probability, which might be acceptable depending on the needs.
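Outside of Excel, the same numbers can be reproduced with scipy; in the sketch below, note that scipy's hypergeom takes its parameters in the order (lot size, number nonconforming in the lot, sample size).

```python
# Equation (10.31) and its Poisson approximation, assuming N = 150, M = 10,
# n = 15, and x = 2 as in the example above.
from scipy.stats import hypergeom, poisson

N, M, n, x = 150, 10, 15, 2
print(round(hypergeom(N, M, n).pmf(x), 3))  # about 0.199, the exact hypergeometric value
print(round(poisson(n * M / N).pmf(x), 3))  # Poisson approximation, about 0.184
```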
Analysis of Xbar charting methods starts with the assumption that, with only
common causes operating, individual observations are independent, identically
distributed (IID) from some unknown distribution. Then, the central limit theorem
guarantees that, for large enough sample size n, Xbar will be approximately
normally distributed. Denoting the mean of individual observations µ and the
standard deviation σ0, the central limit theorem further guarantees that Xbar will
have mean equal to µ and standard deviation approximately equal to σ0 ÷ sqrt[n].
Figure 10.12 shows the approximate distributions of the charted Xbar values
for two cases. First, if only common causes are operating (the unknown
distribution of the quality characteristic stays fixed), the Xbar mean remains µ and
standard deviation approximately equals σ0 ÷ sqrt[n]. The event of a false alarm is
{Xbar > UCL or Xbar < LCL}. The probability of this event is approximately
\Pr\{\text{false alarm}\} = \Pr\left\{\bar{X} > \mu + \frac{3\sigma_0}{\sqrt{n}}\right\} + \Pr\left\{\bar{X} < \mu - \frac{3\sigma_0}{\sqrt{n}}\right\} \qquad (10.32)
= \Pr\{Z > 3\} + \Pr\{Z < -3\} = 2 \times \Pr\{Z < -3\} = 0.0027,
where the symmetry property of the normal distribution and Table 10.1 were
applied. The phrase “false alarm rate when the process is in-control” is often used
to refer to the above probability.
The second case considered here involved a shift of “∆” in the mean of the
distribution of the individual observations because of an assignable cause. This in
turn causes a shift of ∆ in the mean of Xbar as indicated by Figure 10.12 (b).
[Sketches of the Xbar density: (a) centered at µ with control limits at µ ± 3σ0/sqrt(n); (b) centered at µ + ∆ with the same control limits]
Figure 10.12. Approximate distribution of Xbar for (a) no shift and (b) shift = +∆
Answer: For this problem, we have ∆ = 1.0 grams, σ0 = 1.2 grams, and n = 5 or n
= 10. Applying the formula, the detection "rate" or probability is
\Pr\{\text{chart signal}\} \approx \Pr\left\{Z < -3 + \frac{1.0}{1.2/\sqrt{n}}\right\}, \qquad (10.34)
which gives 0.128 and 0.358 for n = 5 or n = 10 respectively. Going from roughly
one-tenth chance to one-third chance of detection could be important depending on
material, inspection, and other costs. With either inspection effort, there is a good
chance that the next charted quantity will fail to signal the assignable cause. It will
likely require several subgroups for the shift to be noticed.
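The detection probabilities just quoted follow directly from Equation (10.34); a minimal Python check is sketched below.

```python
# Detection probabilities from Equation (10.34) for delta = 1.0 gram,
# sigma0 = 1.2 grams, and n = 5 or n = 10.
from math import sqrt
from scipy.stats import norm

delta, sigma0 = 1.0, 1.2
for n in (5, 10):
    p_signal = norm.cdf(-3.0 + delta / (sigma0 / sqrt(n)))
    print(n, round(p_signal, 3))  # 0.128 for n = 5 and 0.358 for n = 10
```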
The chances of false alarms and detecting shifts associated with assignable causes
are helpful for decision-making about sample sizes in charting. Next, we
investigate the timing of false alarms and shift detections. Figure 10.13 shows one
possible Xbar chart and the occurrence of a false alarm on subgroup 372.
“Run length” (RL) is the number of subgroups inspected before an out-of-
control signal occurs. Therefore, run length is a discrete random variable because
when the chart is being set up, the actual run length is unknown but must be a
whole number. The expected value of the run length or “average run length”
(ARL) is often used for subjective evaluation of alternative sample sizes (n) and
different charting approaches (e.g., Xbar charting and EWMA charting from
Chapter 8).
If one is comfortable in assuming that the individual quality characteristics are
independent, identically distributed (IID), then these assumptions logically imply a
comfort with assuming that the run length is distributed according to a geometric
probability mass function. Under these assumptions, the expected value of a
geometric random variable is relevant, and the ARL is given as a function of the
shift ∆, the quality characteristic distribution, σ0, and the sample size n:
E[RL] = ARL = \frac{1}{\Pr\left\{Z < -3 + \dfrac{\Delta}{\sigma_0/\sqrt{n}}\right\}}. \qquad (10.35)
[Xbar chart sketch: plotted values staying between the UCL and LCL around the center line µ = CL for subgroups 1 through 371, with an out-of-control point at subgroup 372 marking the run length RL(∆ = 0)]
Figure 10.13. Random run length (RL) with only common causes operating
[Xbar chart sketch after a mean shift to µ = CL + ∆ with ∆ = +1σ0: control limits at CL ± 3σ0/sqrt(n), with the run length RL(∆ = +1σ0) ending at the first point beyond a limit]
Figure 10.14. Random run length (RL) after a mean shift upwards of ∆ = +1σ0
Table 10.5 shows the ARLs for Xbar charts with different sample sizes given in
units of τ. For example, if the period between sampling is every 2.0 hours (τ = 2.0
hours), then the ARL(∆ = 0) = 370 (periods) × 2.0 (hours/period) = 740 hours.
Therefore, false alarms will occur on average every 740 hours. In fact, a property of
all Xbar charts, regardless of sample size, is that the in-control ARL is 370.4. This
in-control ARL is typical of many kinds of charts.
Note that the chart "designer" or user could use the ARL formula to decide which
sample size to use. For example, if it is important to detect 1σ0 process mean shifts
within two periods with high probability such that ARL(∆ = 1σ0) < 2.0, then
sample sizes equal to or greater than 10 should be used. Note also that ARL does
not depend on the true mean, µ.
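Since entries like those in Table 10.5 are easy to regenerate, the sketch below shows how Equation (10.35) can guide sample size selection in a few lines of Python. The helper adds the far control-limit term so that the in-control (∆ = 0) case reproduces 370.4; for nonzero shifts that term is negligible.

```python
# Approximate average run lengths for an Xbar chart, in sampling periods.
from math import sqrt
from scipy.stats import norm

def arl(delta_in_sigma0, n):
    """ARL for a mean shift expressed in multiples of sigma0 (both limits)."""
    p_signal = (norm.cdf(-3.0 + delta_in_sigma0 * sqrt(n))
                + norm.cdf(-3.0 - delta_in_sigma0 * sqrt(n)))
    return 1.0 / p_signal

print(round(arl(0.0, 5), 1))         # in-control ARL, about 370.4 for any n
for n in (5, 10, 15):
    print(n, round(arl(1.0, n), 2))  # ARL for a 1 sigma0 shift; below 2 once n >= 10
```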
Question: Assume that one is applying Xbar charting with subgroup sampling
periods completing every 2.0 hours (τ = 2 hours) for a plant operating all shifts
every day of the week. How often do false alarms typically occur from a given
chart?
Answer: With false alarms occurring on average every 370.4 subgroups and 12
subgroups per day, alarms typically occur once per 30 days or 1 per month.
For single sampling, the event that the lot is accepted is simply defined in terms of
the number of units found nonconforming, X. If {X ≤ c}, the lot is accepted,
otherwise it is not accepted. Therefore, the probability of acceptance is
pA = Pr{X = 0} + Pr{X = 1} + … + Pr{X = c} (10.36)
For known lot size N, sample size n, and true number nonconforming, M, it is
often reasonable to assume that Pr{X = x} is given by the hypergeometric
distribution. Then, the probability Pr{X ≤ c} is given by the so-called
“cumulative” hypergeometric distribution.
Figure 10.15 shows the calculation of the entire OC Curve for single sampling
with N =1000, c = 2, and n = 100. The plotting proceeds, starting with values of M,
then deriving 100×p0 and 100×pA by calculation. Because of the careful use of
dollar signs, the formulas in cells B6 and C6 can be copied down to fill in the rest
of the curve. Looking at the chart, the decision-maker might decide that the 0.22
probability of accepting a lot with 4% nonconforming is unacceptable. Then,
increasing n and/or decreasing c might produce a more desirable set of risks.
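The spreadsheet OC-curve calculation described above can be mirrored in a few lines of Python; the grid of M values below is an arbitrary illustration, not the exact rows of Figure 10.15.

```python
# OC curve for single sampling with N = 1000, n = 100, c = 2 (cumulative
# hypergeometric acceptance probabilities, as in the spreadsheet above).
from scipy.stats import hypergeom

N, n, c = 1000, 100, 2
for M in range(0, 101, 10):         # assumed number nonconforming in the lot
    p0 = M / N                      # true fraction nonconforming
    pA = hypergeom(N, M, n).cdf(c)  # Pr{X <= c}
    print(f"{100 * p0:5.1f}% nonconforming -> acceptance probability {pA:.2f}")
# At 4% nonconforming the acceptance probability is about 0.22, as noted above.
```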
Answer 1: This question is based on a single sampling plan for N = 2000 units in a
lot. It has n = 150 units in a sample and the rejection limit is c = 3.
p0 = 0.01 then M = 20
pA = P(X = 0, N = 2000, n = 150, M = 20)
+ P(X = 1, N = 2000, n = 150, M = 20)
+ P(X = 2, N = 2000, n = 150, M = 20) (10.37)
+ P(X = 3, N = 2000, n = 150, M = 20)
= 0.94
p0 = 0.02 then M = 40 pA = 0.65.
The resulting OC curve is given in Figure 10.16. The plot shows that the single
sampling approach will effectively identify trainees yielding unacceptable
inspections greater than 5% of the time, and if the fraction nonconforming is kept
to less than 1%, there is almost zero chance of being found to need re-training.
[Two OC curves plotting percent accepted (100 pA) against assumed percent nonconforming (100 p0) from 0 to 5]
Figure 10.16. OC curves for plans with (a) n = 150 units, c = 3 and (b) n = 110, c = 1
Answer 2: The new policy is less risky in the sense that the probability of
acceptance is always smaller (within two decimal places). However, relatively
good teams are much more likely to be flagged for re-training, which might be
considered unnecessary.
Question: Consider a teaching hospital in which the N = 7500 patients are passed
through a training class of medical students in a probationary period. The attending
physician inspects patient interactions with n1 = 150 patients. If less than or equal
to c1 = 3 student-patient interactions are unacceptable, the class is passed. If the
number unacceptable is greater than r = 6, then the class must enter an intensive
program (lot is rejected). Otherwise, an additional n2 = 350 interactions are
inspected. If the total number unacceptable is less than or equal to c2 = 7, the class
is passed. Otherwise, intensive training is required. Develop an OC curve and
comment on how the benefit of double sampling is apparent.
Answer: Figure 10.17 shows the OC curve calculated using an Excel spreadsheet.
Generally speaking, a desirable OC curve is associated with a relatively steep drop
in the acceptance probability as a function of the true fraction nonconforming
(compared with single sampling with the same average sample number). In this
way, high quality classes of students (lots) are accepted with high probability and
low quality lots are rejected with high probability.
[Figure 10.17. OC curve for the double sampling plan: percent accepted (100 pA) versus assumed percent nonconforming (100 p0), from 0 to 8 percent]
OC curves can help quantify some of the benefits associated with double sampling
and other sampling methods compared with single sampling. Yet, it can be difficult
to evaluate the importance of costs associated with these benefits because the
number of units inspected is itself random. The "average sample number" (ASN) is
the expected number of units inspected under a given plan.
Question: Consider a lot with N = 2000, single sampling with n = 150, and c = 3,
and double sampling with n1 = 70, c1 = 1, r = 4, n2 = 190, and c2 = 4. These single
and double sampling plans have comparable OC curves. Compare the average
sample numbers (ASN) under the assumption that the true fraction nonconforming
is 3%.
Answer: Under the standard assumption that all units in the lot have an equal
chance of being selected, the hypergeometric mass function is reasonable for
predicting ASN. For single sampling, the ASN is 150. Assuming 3% are
nonconforming, M = 0.03 × 2000 = 60. For double sampling, the ASN = 70 +
(0.1141 + 0.2562) × 190 = 140.3.
10.8 References
Grimmett GR, Stirzaker DR (2001) Probability and Random Processes, 3rd edn.
Oxford University Press, Oxford
Keynes JM (1937) General Theory of Employment. Quarterly Journal of
Economics
Savage LJ (1972) The Foundations of Statistics, 2nd edn. Dover Publications, Inc.,
New York
10.9 Problems
In general, choose the correct answer that is most complete.
5. Suppose someone tells you that she believes that revenues for her product line
will be between $2.2M and $3.0M next year with the most likely value equal
to $2.7M. She says that $2.8M is much more likely than $2.3M. Define a
distribution function consistent with her beliefs.
b. The central limit theorem guarantees that all random variables are normally
distributed.
c. The central limit theorem does not apply to discrete random
variables.
d. All of the above are correct.
e. All of the above are correct except (c) and (d).
18. Which of the above two policies is more likely to do the following:
a. Accept lots with large fractions of nonconforming units
b. Accept lots with small fractions of nonconforming units
19. What is the shape of an ideal acceptance sampling curve? Explain briefly.
Part II: Design of Experiments (DOE) and Regression
11
DOE: The Jewel of Quality Engineering
11.1 Introduction
Design of experiments (DOE) methods are among the most complicated and useful
of statistical quality control techniques. DOE methods can be an important part of a
thorough system optimization, yielding definitive system design or redesign
recommendations. These methods all involve the activities of experimental
planning, conducting experiments, and fitting models to the outputs. An essential
ingredient in applying DOE methods is the use of a procedure called
“randomization” which is defined at the end of this chapter. To preview,
randomization involves making many experimental planning decisions using a
random or unpatterned approach.
The purpose of this chapter is to preview the various DOE methods described
in Part II of this book. All of these DOE methods involve changing key input
variable (KIV) settings which are directly controllable (called factors) using
carefully planned patterns, and then observing outputs (called responses). Also,
this chapter describes the “two-sample t-test” method which permits proof that
one level of a single factor results in a higher average response than another level
of one factor. Two-sample t-testing is also used to illustrate randomization and its
relationship with proof.
Section 2 provides an overview of the different types of DOE and related
methods. Section 3 describes two-sample t-testing with examples and a discussion
of randomization. Section 4 describes an activity called “randomization”, common
to all DOE methods and technically required for achieving proof. Section 5
summarizes the material covered. Note that most of the design of experiments
methods presented here are supported by standard software such as Minitab®,
DesignExpert®, and Sagata® DOEToolSet and Sagata® Regression. (The author of
this book is part owner of Sagata Ltd.; see www.sagata.com for more details.)
Answer: Yes, fractional factorials (FF) are often the last and only design of
experiments method used in many projects. Also, modeling the combined effects
of factors or “interactions” is possible using response surface methods (RSM).
Also, t-testing using randomization can generate proof. Therefore, the correct and
most complete answer is (d).
Roughly speaking, this method is useful for situations in which one is interested
in “proving” with a “high level of evidence” that one alternative is better in terms
of average response than another. Therefore, there is one factor of interest at two
levels. The screening procedure described subsequently can permit several factors
to be “proven” significant simultaneously with a comparable number of total tests.
However, a subjectively greater level of assumption-making is needed for those
screening methods such that the two-sample t-test offers a higher level of evidence.
Definition: The phrase “blocking factor” refers to system input variables that
are not of primary interest. For example, in a drug study, the names of the people
receiving the drug and the placebo are not of primary interest even though their
safety is critical.
Algorithm 11.1. Two-sample t-tests
Step 1. a. Develop an experimental table or “DOE array” that describes the levels of
all blocking factors and the factor of interest for each run. The ordering of the
factor levels should exhibit no pattern, i.e., an effort should be made to
allocate all blocking factor levels in an unpatterned way. Ideally,
experimentation is “blind” so that human participants do not know which
level they are testing. Unpatterned ordering can be accomplished by putting
n1 1s and n2 2s in one column on a spreadsheet and pseudo-random uniform
[0,1] numbers in the next column. Sorting, we have a “uniformly random”
ordering, e.g., 2-1-1-2-2-2-1…
b. Collect n1 + n2 data, where n1 of these data are run with factor A at level 1
and n2 are run with factor A at level 2 following the experimental table.
Step 2. Defining y1 as the average of the run responses with factor A at level 1 and
s1² as the sample variance of these responses, and making similar definitions
for level 2, one then calculates the quantities t0 and degrees of freedom (df)
using

t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
\qquad
df = \mathrm{round}\left[\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}\right] \qquad (11.1)
where “round” means round the number in brackets to the nearest integer.
Step 3. Find tcritical using the Excel formula “=TINV(2*0.05,df)” or using the critical
value from a t-table referenced by tα,df (see Table 11.2). If t0 > tcritical, then
claim that “it has been proven that level 1 of factor A results in a significantly
higher average or expected value of the response than level 2 of factor A with
alpha equal to 0.05”.
Step 4. (Optional) Construct two “box plots” of the response data at each of the two
level settings (see below). Often, these plots aid in building engineering
intuition.
Table 11.2. Critical values tα,df

df   α = 0.01   α = 0.05   α = 0.1      df   α = 0.01   α = 0.05   α = 0.1
1    31.82      6.31       3.08         7    3.00       1.89       1.41
2    6.96       2.92       1.89         8    2.90       1.86       1.40
3    4.54       2.35       1.64         9    2.82       1.83       1.38
4    3.75       2.13       1.53         10   2.76       1.81       1.37
5    3.36       2.02       1.48         20   2.53       1.72       1.33
6    3.14       1.94       1.44
If the number of data is even, then the 25% (Q1) and 75% (Q3) quartiles are the middle
values of the two halves of the data. Otherwise, they are the medians of the two
halves, with the overall median included in both halves.
Step 1: Draw horizontal lines at the median, Q1, and Q3.
Step 2: Connect with vertical lines the edges of the Q1 and Q3 lines to form a
rectangle or “box”.
Step 3: Then, draw a line from the top middle of the rectangle up to the highest data
below Q3 + 1.5(Q3 – Q1) and down from the bottom middle of the rectangle
to the smallest observation greater than Q1 – 1.5(Q3 – Q1).
Step 4: Any observations above the top of the upper line or below the bottom of the
lower line are called “outliers” and labeled with “*” symbols.
Note that, with only 3 data points, software generally does not follow the above
exactly. Instead, the ends of the boxes are often the top and bottom observations.
If we were trying to prove that level 1 results in a significantly lower average
response than level 2, in Step 3 of Algorithm 11.1, we would test –t0 > tcritical. In
general, if the sign of t0 does not make sense in terms of what we are trying to
prove, the above “one-sided” testing approach fails to find significance. The
phrase “1-tailed test” is a synonym for one-sided.
To prove there is any difference, either positive or negative, use α/2 instead of
α and the test becomes “two-sided” or “2-tailed”. A test is called “double blind”
if it is blind and the people in contact with the human testers also do not know
which level is being given to which participant. The effort to become double blind
generally increases the subjectively assessed level of evidence. Achieving
blindness can require substantial creativity and expense.
The phrase “Hawthorne effect” refers to a change in average output values
caused by the simple act of studying the system, e.g., if people work harder
because they are being watched. To address issues associated with Hawthorne
effects and generate a high level of evidence, it can be necessary to include the
current system settings as one level in the application of a t-test. The phrase
“control group” refers to any people in a study who receive the current level
settings and are used to generate response data.
Definition: If something is proven using any given α, it is also proven with all
higher levels of α. The “p-value” in any hypothesis test is the value of α such that
the test statistic, e.g., t0, equals the critical value, e.g., tα,df. The phrase
“significance level” is a synonym for p-value. For example, if the p-value is 0.05,
the result is proven with “alpha” equal to 0.05 and the significance level is 0.05.
Generally speaking, people trying to prove hypotheses with limited amounts of
data are hoping for small p-values.
Using t-testing is one of the ways of achieving evidence such that many people
trained in statistics will recognize a claim that you make as having been “proven”
with “objective evidence”. Note that if t0 is not greater than tcritical, then the
standard declaration is that "significance has not been established". Then,
presumably either the true average of level 1 is not higher than the true average of
level 2 or, alternatively, additional data is needed to establish significance.
The phrase “null hypothesis” refers to the belief that the factors being studied
have no effects, e.g., on the mean response value. Two-sample t-testing is not
associated with any clear claims about the factors not found to be significant, e.g.,
these factors are not proven to be “insignificant” under any widely used
conventional assumptions. Therefore, failing to find significance can be viewed as
accepting the null hypothesis, but it is not associated with proof.
In general, the testing procedures cannot be used to prove that the null
hypothesis is true. The Bayesian analysis can provide “posterior probabilities” or
chances that factors are associated with negligible average changes in responses
after Step 1 is performed. This nonstandard Bayesian analysis strategy can be used
to provide evidence of factors being unimportant.
Table 11.3. One approach to randomize the run order using pseudo-random numbers
Levels   Pseudo-random Uniform Nos.   Run   Level   Sorted Nos.   Response
1        0.583941                     1     1       0.210974      Y1,1 = 25
1        0.920469                     2     2       0.448561      Y2,1 = 20
1        0.210974                     3     1       0.583941      Y1,2 = 35
2        0.448561                     4     2       0.589953      Y2,2 = 23
2        0.692587                     5     2       0.692587      Y2,3 = 21
2        0.589953                     6     1       0.920469      Y1,3 = 34
Step 1. The engineer uses Table 11.3 to determine the run ordering. Pseudo-
random uniform numbers were generated and then used to sort the levels
for each run. Then, we first input level 1 (the new additive) into the
system and observed the response 25. Then, we input level 2 (the current
additive) and observed 20 and so on.
Step 2. Responses from welding tests are shown in the right-hand column of Table 11.3. The engineer calculated ȳ1 = 31.3, ȳ2 = 21.3, s1² = 30.3, s2² = 2.33, t0 = 3.03, and df = 2.
Step 3. The critical value given by Excel “=TINV(0.1,2)” was tcritical = 2.92. Since t0 was greater than tcritical, we declared, “We have proven that level 1 results in a significantly higher mean value than level 2 with alpha equal to 0.05.” The p-value is 0.047.
Step 4. A box plot from Minitab® software is shown below (Figure 11.1), which shows that level 1 results in a higher number of bodies welded on average. Note that with only 3 data points, Minitab® defines the lowest data point as Q1 and the highest data point as Q3.
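As a check on the hand calculations in Steps 2 and 3, the following minimal sketch reproduces t0 ≈ 3.03, df = 2, tcritical ≈ 2.92, and the p-value of about 0.047 from the Table 11.3 responses. It uses Python with NumPy and SciPy, which are not tools referenced in the text; only the six responses above are taken from the example.

```python
# Sketch: reproducing the two-sample (Welch) t-test from Table 11.3.
import numpy as np
from scipy import stats

y1 = np.array([25.0, 35.0, 34.0])   # level 1 (new additive)
y2 = np.array([20.0, 23.0, 21.0])   # level 2 (current additive)

n1, n2 = len(y1), len(y2)
s1sq, s2sq = y1.var(ddof=1), y2.var(ddof=1)                  # 30.33 and 2.33
t0 = (y1.mean() - y2.mean()) / np.sqrt(s1sq/n1 + s2sq/n2)    # about 3.03

# Welch-Satterthwaite degrees of freedom, rounded down as in the text's df = 2
df = (s1sq/n1 + s2sq/n2)**2 / ((s1sq/n1)**2/(n1-1) + (s2sq/n2)**2/(n2-1))
df = int(np.floor(df))

p_one_sided = 1.0 - stats.t.cdf(t0, df)     # about 0.047
t_critical = stats.t.ppf(1.0 - 0.05, df)    # about 2.92, matches Excel =TINV(0.1,2)
print(t0, df, p_one_sided, t_critical)
```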
A work colleague wants to “prove” that his or her software results in shorter times
to register the product over the internet on average than the current software.
Suppose six people are available for the study: Fred, Suzanne,…(see below).
Figure 11.1. Minitab® box and whisker plot (response vs level) for the autobody welding example
Question 1: How many factors, response variables, and levels are involved?
Answer 1: There are two correct answers: (1) one factor (software) at two levels (new and old) and one response (time), and (2) two factors (software and people) at two and six levels, respectively, and one response (time). If the same person tested more than one software, people would be a factor.
Question 2: What specific instructions (that a technician can understand) can you
give her to maximize the level of evidence that she can obtain?
Answer 2: Assume that we only want one person to test one software. Then, we
need to randomly assign people to levels of the factor. Take the names Fred,
Suzanne,… and match each to a pseudo-random number, e.g., Fred with 0.82,
Suzanne with 0.22,… Sort the names by the numbers and assign the top half to the
old and the bottom half to the new software. Then, repeat the process with a new
set of pseudo-random numbers to determine the run order. There are other
acceptable approaches, but both assignment to groups and run order must be
randomized.
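A minimal sketch of this randomization in Python follows. The four names beyond Fred and Suzanne are hypothetical placeholders (the original list is not given in full), and the seed appears only so the example is reproducible.

```python
# Sketch of Answer 2: assign testers to software levels and randomize run order
# using pseudo-random uniform numbers. Names after Suzanne are placeholders.
import random

random.seed(1)  # remove or change the seed in a real study
names = ["Fred", "Suzanne", "Tester3", "Tester4", "Tester5", "Tester6"]

# Match each name to a pseudo-random number and sort; top half gets the old software
assignment = sorted(names, key=lambda name: random.random())
old_group, new_group = assignment[:3], assignment[3:]

# Repeat with a fresh set of pseudo-random numbers to determine the run order
runs = [(name, "old" if name in old_group else "new") for name in names]
run_order = sorted(runs, key=lambda run: random.random())

for i, (name, level) in enumerate(run_order, start=1):
    print(f"Run {i}: {name} tests the {level} software")
```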
Question 3: Analyze the above data and draw conclusions that you think are appropriate.
Answer 3: The new software’s average time is lower (a better result), so the sign of tcritical makes sense and we can ignore it for the calculation. Since 3.33 > 2.92, we have proven that the new software reduces the average registration time with α = 0.05.
Question 4: How might the answer to the previous question support decision-
making?
Answer 4: The software significantly reduces average times, but that might not mean that the new software should be recommended. There might be other criteria of importance, such as reliability and cost.
Question 1: Assume that the experimental designer and all testers are watching all
trials related to Table 11.4. The goal of the new software is task time reduction.
Which is correct and most complete?
a. The data can be used to prove the new software helps with α = 0.05.
b. The theory that the people taking the test learned from watching others is
roughly equally plausible to the theory that the new software helps.
c. The theory that women are simply better at the tasks than men is roughly
equally plausible to the theory that the new software helps.
d. The tests would have been much more valuable if randomization had been
used.
Answer 1: The experimental plan has multiple problems. The run order is not
randomized so learning effects could be causing the observed variation. The
assignment of people to levels is not randomized so that gender issues might be
causing the variation. The test was run in an unblind fashion, so knowledge of the
participants could bias the results. Therefore, the correct answer is (e).
Answer 2: Often, in experimentation using t-testing, there are blocking factors that
should be considered in planning and yet the t-testing analysis is appropriate.
Also, the definition of blind is expressed in part (c). Therefore, the answer is (d).
These concepts are helpful for competent application and interpretation of results. They
are also helpful for appreciating the benefits associated with standard screening
using fractional factorials.
Table 11.5 defines Type I and Type II errors. The definition of these errors
involves the concepts of a “true” difference and absence of the true difference in
the natural system being studied. Typically, this difference relates to alternative
averages in response values corresponding to alternative levels of an input factor.
In real situations, the truth is generally unknown. Therefore, Type I and Type II
errors are defined in relation to a theoretical construct. In each hypothesis test, the
test either results in a declaration of significance or failure to find significance.
Table 11.5. Type I and Type II errors

                                     Nature or truth
 Declaration                 No difference exists     Difference exists
 Significance is found       Type I error             Success
 Failure to find             Semi-success             Type II error
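The definitions in Table 11.5 can be illustrated numerically. The sketch below (in Python; the means, standard deviation, and sample sizes are illustrative assumptions rather than values from the text) estimates the Type I error rate when no difference exists and the Type II error rate when a difference of a chosen size exists, for a two-sided two-sample t-test.

```python
# Sketch: Monte Carlo estimate of Type I and Type II error rates for a
# two-sample t-test. Means, sigma, and sample sizes are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma, alpha, trials = 5, 1.0, 0.05, 5000

def reject_rate(true_difference):
    rejections = 0
    for _ in range(trials):
        y1 = rng.normal(0.0, sigma, n)
        y2 = rng.normal(true_difference, sigma, n)
        _, p = stats.ttest_ind(y1, y2, equal_var=False)   # Welch, two-sided
        rejections += (p < alpha)
    return rejections / trials

print("Type I error rate (no difference exists):", reject_rate(0.0))   # near alpha
power = reject_rate(1.5)
print("Type II error rate (difference exists):", 1 - power)
```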
Question 1: Suppose that you are given a $4,000,000 grant over three years to
study retention of students in engineering colleges by the Ohio State Board of
Regents. The goal is to provide proven methods that can help increase retention,
i.e., cause a higher fraction of students who start as freshmen to graduate in
engineering. Describe one possible parameterization of your engineered system
including the names of relevant factors and responses.
Answer 1: The system boundaries only include the parts of the Ohio public
university network that the Board of Regents could directly influence. These
regents can control factors including: (1) the teaching load per faculty (3 to 7
courses per year), (2) the incentives for faculty promotion (weighted towards
teaching or research), (3) the class size (relatively small or large), (4) the
curriculum taught (standard or hands-on), (5) the level of student services (current
or supplemented), and (6) the incentives to honors students at each of the public
campuses (current or augmented). Responses of interest include total revenues per
college, retention rates of students at various levels, and student satisfaction ratings
as expressed through surveys.
Question 2: With regard to the student retention example, how might you spend
the money to develop the proof you need?
11.8 Problems
1. Consider applying DOE to improving your personal health. Which of the
following is correct and most complete?
a. Input factors might include weight, blood pressure, and happiness
score.
b. Output responses might include weight, blood pressure, and
happiness score.
c. Randomly selecting daily walking amount each week could generate
proof.
d. Walking two months 30 minutes daily followed by two months off
can yield proof.
e. Answers to parts (a) and (d) are both correct.
f. Answers to parts (b) and (c) are both correct.
Figure 11.2. Data and Minitab® box and whisker plot for the weight loss example
6. Calculate the degrees of freedom (df) using data from the above example.
Use the following design of experiments array and data to answer Questions 9 and
10. Consider the following in relation to proving that the new software reduces task
times.
11. Which is correct and most complete for t-testing or factor screening?
a. In general, adding more runs to the plan increases many types of error
rates.
b. Type I errors in t-testing include the possibility of missing important
factors.
c. Type II errors in t-testing focus on the possibility of missing
important factors.
d. Standard t-testing can be used to prove the insignificance of factors.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
12 DOE: Screening Using Fractional Factorials
12.1 Introduction
The methods presented in this chapter are primarily relevant when it is desired to
determine simultaneously which of many possible changes in system inputs cause
average outputs to change. “Factor screening” is the process of starting with a
long list of possibly influential factors and ending with a usually smaller list of
factors believed to affect the average response. More specifically, the methods
described in this section permit the simultaneous screening of several (m) factors
using a number of runs, n, comparable to but greater than the number of factors (n
~ m and n > m).
The methods described here are called “standard screening using fractional
factorials” because they are based on the widely used experimental plans proposed
by Fisher (1925), Plackett and Burman (1946), and Box et al. (1961a, b).
The term “prototype” refers to a combination of factor levels because each run
often involves building a new or prototype system. The experimental plans are
called fractional factorials because they are based on building only a fraction of the
prototypes that would constitute all combinations of levels for all factors of interest
(a full factorial). The analysis methods used were proposed in Lenth (1989) and Ye
et al. (2001).
Compared with multiple applications of two-sample t-tests, one for each factor,
the standard screening methods based on fractional factorials offer relatively
desirable Type I and Type II errors. This assumes that comparable total
experimental costs were incurred using the “one-factor-at-a-time” (OFAT) two-
sample t-test applications and the standard screening using fractional factorial
methods. It also requires additional assumptions that are described in the “decision
support” section below. Therefore, the guarantees associated with two-sample t-
tests require fewer and less complicated assumptions.
Pre-step. Define the factors and ranges, i.e., the highs, H, and lows, L, for all factors.
Step 1. Form your experimental array by selecting the first m columns of the array
(starting from the left-hand column) in the table below with the selected
number of runs n. The remaining n – m – 1 columns are unused.
Step 2. For each factor, if it is continuous, scale the experimental design using the
ranges selected by the experimenter. Dsi,j = Lj + 0.5(Hj – Lj)(Di,j + 1) for i =
1,…,n and j = 1,…,m. Otherwise, if it is categorical simply assign the two
levels, the one associated with “low” to –1 and the level with “high” to +1.
Step 3. Build and test the prototypes according to Ds. Record the test measurements
for the responses from the n runs in the n dimensional vector Y.
Step 4. Form the so-called “design” matrix by adding a column of 1s, 1, to the left-hand side of the entire n × (n – 1) selected design D, i.e., X = (1|D). Then, for each of the q responses calculate the regression coefficients βest = AY, where A is (X′X)–1X′ (see the tables below for pre-computed A). Always use the same A matrix regardless of the number of factors and the ranges.
Step 5. (Optional) Plot the prediction model, yest(x), for prototype system output
    yest(x) = βest,1 + βest,2 x1 + … + βest,m+1 xm    (12.1)
as a function of xj varied from –1 to 1 for j = 1, …, m, with the other factors held constant at zero. These are called “main effects plots” and can be generated by standard software such as Minitab® or Sagata® software. A high absolute value of the slope, βest,j+1, provides some evidence that the factor, j, has an important effect on the average response in question.
Step 6. Calculate s0 using
    s0 = median{|βest,2|, …, |βest,n|}    (12.2)
where the symbols “| |” stand for absolute values. Let S be the set of the non-negative numbers |βest,2|, …, |βest,n| with values less than 2.5s0 (repeating for each of the r = 1, …, q responses). Next, calculate
    PSE = 1.5 × median{numbers in S}    (12.3)
and
    tLenth,j = |βest,j+1|/PSE for j = 1, …, m.    (12.4)
Step 7. If tLenth,j > tLenth Critical,α,n given in Table 12.1, then declare that factor j has a
significant effect for response for j = 1, …, m. The critical values, tLenth
critical,α,n, were provided by Ye et al. (2001). The critical values are designed
to control the experimentwise error rate (EER) and the less conservative
individual error rate (IER).
Step 8. (Subjective system optimization) If one level has been shown to offer
significantly better average performance for at least one criterion of
interest, then use that information subjectively in your engineered system
optimization. Otherwise, consider adding more data and/or take the fact
that evidence does not exist that the level change helps into account in
system design.
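The following is a minimal Python sketch of Steps 4, 6, and 7 for the eight run array given later in this chapter; it is not software referenced in the text. The response vector Y is a hypothetical placeholder for the Step 3 measurements, and the resulting tLenth values must still be compared with the critical values in Table 12.1.

```python
# Sketch of Steps 4, 6, and 7 of standard screening using fractional factorials.
# D is the eight run regular array from this chapter; Y is hypothetical.
import numpy as np

D = np.array([
    [-1,  1, -1,  1,  1, -1, -1],
    [ 1,  1, -1, -1, -1, -1,  1],
    [ 1, -1, -1,  1, -1,  1, -1],
    [ 1, -1,  1, -1,  1, -1, -1],
    [-1,  1,  1, -1, -1,  1, -1],
    [-1, -1,  1,  1, -1, -1,  1],
    [ 1,  1,  1,  1,  1,  1,  1],
    [-1, -1, -1, -1,  1,  1,  1],
], dtype=float)

Y = np.array([25.0, 31.0, 27.0, 29.0, 24.0, 30.0, 28.0, 26.0])  # hypothetical responses

# Step 4: design matrix X = (1|D), A = (X'X)^-1 X', and beta_est = A Y
X = np.hstack([np.ones((len(Y), 1)), D])
A = np.linalg.inv(X.T @ X) @ X.T
beta_est = A @ Y

# Step 6: Lenth's pseudo standard error (PSE) and t statistics
effects = np.abs(beta_est[1:])        # drop the constant term
s0 = np.median(effects)
S = effects[effects < 2.5 * s0]
PSE = 1.5 * np.median(S)
t_lenth = effects / PSE

# Step 7: compare each t_lenth[j] with tLenth critical,alpha,n from Table 12.1
print(np.round(beta_est, 3))
print(round(PSE, 3), np.round(t_lenth, 2))
```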
In general, extreme care should be given to the pre-step, i.e., the
parameterization of the engineered system design problem. If the factors are varied
over ranges containing only poor prototype system designs, then the information
derived from the improvement system will likely be of little value. Also, it is
common for engineers to timidly select ranges with the settings too close together.
For example, varying wing length from 5 cm to 6 cm for a paper airplane would
likely be a mistake. If the ranges are too narrow, then the method will fail to find
significant differences. Also, the chance that good prototype designs are in the
experimental region increases as the factor ranges increase.
Further, special names are given for cases in which human subjects constitute
an integral part of the system generating the responses of interest. The phrase
“within subjects variable” refers to a factor in an experiment in which a single
subject or group is tested for all levels of that factor. For example, if all tests
are performed by one person, then all factors are within subjects variables.
The phrase “between subject variables” refers to factors for which a different
group of subjects is used for each level in the experimental plan. For example if
each test was performed by a different person, then all factors would be between
subject variables. A “within subjects design” is an experimental plan involving
only within subject variables and a “between subjects design” is a plan involving
only between subject variables. This terminology is often used in human factors
and biological experimentation and can be useful for looking up advanced analysis
procedures.
Figure 12.1 provides a practical worksheet following steps similar to the ones
in the above method. The worksheet emphasizes the associated system design
decision problem and de-emphasizes hypothesis testing. Considering that a single
application of fractional factorials can constitute an entire quality project in some
instances, it can make sense to write a problem statement or mini-project charter.
Also, clarifying with some detail what is meant by key responses and how they are
measured is generally good practice.
Also, small differences shown on main effects plots can provide useful
evidence about factors not declared significant. First, if the average differences are
small, adjusting the level settings based on other considerations besides the average
response might make sense, e.g., to save cost or reduce environmental impacts.
Further, recent research suggests that Type II errors may be extremely common
and that treating even small differences on main effects plots (i.e., small
“effects”) as effective “proof” might be advisable.
Selecting the ranges and the number of runs can be viewed as a major part of
the design of the “improvement system”. Then, Steps 2-8 are implementation of the
improvement system to develop recommended inputs for the engineered system.
Figure 12.1. Worksheet based on the eight run regular fractional factorial (the worksheet fields include numbered entries for recommendations (design) and confirmation (verify))
Table 12.1. Critical values for tLenth critical,α,n: (a) EER and (b) IER
(a) EER                                  (b) IER
 α      n = 8    n = 12   n = 16          α      n = 8    n = 12   n = 16
 0.01   9.715    7.412    6.446           0.01   5.069    4.077    3.629
 0.05   4.867    4.438    4.240           0.05   2.297    2.211    2.156
 0.10   3.689    3.564    3.507           0.10   1.710    1.710    1.701
Run x1 x2 x3 x4 x5 x6 x7
1 -1 1 -1 1 1 -1 -1
2 1 1 -1 -1 -1 -1 1
3 1 -1 -1 1 -1 1 -1
4 1 -1 1 -1 1 -1 -1
5 -1 1 1 -1 -1 1 -1
6 -1 -1 1 1 -1 -1 1
7 1 1 1 1 1 1 1
8 -1 -1 -1 -1 1 1 1
Table 12.3. (a) The design or X matrix and (b) A = (X′X)–1X′ for the eight run plan
(a)
1 -1 1 -1 1 1 -1 -1
1 1 1 -1 -1 -1 -1 1
1 1 -1 -1 1 -1 1 -1
1 1 -1 1 -1 1 -1 -1
1 -1 1 1 -1 -1 1 -1
1 -1 -1 1 1 -1 -1 1
1 1 1 1 1 1 1 1
1 -1 -1 -1 -1 1 1 1
(b)
0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
-0.125 0.125 0.125 0.125 -0.125 -0.125 0.125 -0.125
0.125 0.125 -0.125 -0.125 0.125 -0.125 0.125 -0.125
-0.125 -0.125 -0.125 0.125 0.125 0.125 0.125 -0.125
0.125 -0.125 0.125 -0.125 -0.125 0.125 0.125 -0.125
0.125 -0.125 -0.125 0.125 -0.125 -0.125 0.125 0.125
-0.125 -0.125 0.125 -0.125 0.125 -0.125 0.125 0.125
-0.125 0.125 -0.125 -0.125 -0.125 0.125 0.125 0.125
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
0.083 -0.083 0.083 0.083 -0.083 0.083 0.083 -0.083 -0.083 0.083 -0.083 -0.083
0.083 0.083 -0.083 0.083 0.083 -0.083 0.083 0.083 -0.083 -0.083 -0.083 -0.083
-0.083 0.083 0.083 0.083 -0.083 0.083 -0.083 0.083 -0.083 -0.083 -0.083 0.083
0.083 0.083 -0.083 -0.083 -0.083 0.083 0.083 -0.083 0.083 -0.083 -0.083 0.083
0.083 -0.083 -0.083 0.083 -0.083 -0.083 -0.083 0.083 0.083 0.083 -0.083 0.083
-0.083 0.083 -0.083 0.083 0.083 0.083 -0.083 -0.083 0.083 0.083 -0.083 -0.083
0.083 0.083 0.083 -0.083 0.083 -0.083 -0.083 -0.083 -0.083 0.083 -0.083 0.083
-0.083 -0.083 0.083 0.083 0.083 -0.083 0.083 -0.083 0.083 -0.083 -0.083 0.083
-0.083 0.083 0.083 -0.083 -0.083 -0.083 0.083 0.083 0.083 0.083 -0.083 -0.083
-0.083 -0.083 -0.083 -0.083 0.083 0.083 0.083 0.083 -0.083 0.083 -0.083 0.083
0.083 -0.083 0.083 -0.083 0.083 0.083 -0.083 0.083 0.083 -0.083 -0.083 -0.083
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
-1 1 1 1 -1 1 1 1 1 -1 -1 -1 1 -1 -1 -1
1 1 1 -1 -1 -1 1 1 -1 -1 1 -1 -1 1 -1 1
1 1 -1 1 -1 -1 -1 1 1 -1 1 1 -1 -1 1 -1
-1 1 -1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 1
1 1 1 -1 -1 1 -1 -1 1 1 -1 -1 -1 -1 1 1
-1 1 -1 -1 -1 1 -1 1 -1 -1 -1 1 1 1 1 1
A = 0.0625 1 1 -1 -1 1 -1 1 -1 1 -1 -1 -1 1 1 1 -1
1 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 1
-1 1 1 -1 1 1 -1 -1 1 -1 1 1 -1 1 -1 -1
-1 1 1 -1 1 -1 1 1 -1 1 -1 1 -1 -1 1 -1
-1 1 -1 1 1 -1 -1 1 1 1 -1 -1 -1 1 -1 1
1 1 -1 1 -1 1 1 -1 -1 1 -1 1 -1 1 -1 -1
1 1 -1 -1 1 1 -1 1 -1 1 1 -1 1 -1 -1 -1
-1 1 -1 -1 -1 -1 1 -1 1 1 1 1 1 -1 -1 1
-1 1 1 1 -1 -1 -1 -1 -1 1 1 -1 1 1 1 -1
Pre-step. Here, let us assume that the result of “thought experiments” based on
“entertained assumptions” was the informed choice of the n=8 run design including
m=4 factors used in the actual study. For ranges, we have L={low transistor output,
screwed, 0.5 turns, current sink}′ and H={high transistor output, soldered, 1.0
turns, alternative sink}′. The factors each came from different people with the last
factor coming from rework line operators. Without the ability to study all factors
with few runs, the fourth factor might have been dropped from consideration.
Step 1. The experimental plan, D, was created by selecting the first four columns
of the n=8 run experimental array above. See part (a) of Table 12.8 below.
The remaining three columns are unused.
Step 2. All the factors are categorical except for the third factor, screw position,
which is continuous. Assigning the factors produced the scaled design, Ds,
in Table 12.8 part (b).
Step 3. 350 units were made and tested based on each combination of process
inputs in the experimental plan (8 × 350 = 2800 units). The single
prototype system response values are shown in Table 12.8 part (c), which
are the fraction of the units that conformed to specifications. Note that, in
this study, the fidelity of the prototype system was extremely high because
perturbations of the engineered system created the prototype system.
Step 4. The relevant design matrix, X, and A = (X′X)–1X′ matrix are given in Table
12.3. The derived list of coefficients is (using βest = AY):
βest = {82.9, –1.125, –0.975, –1.875, 9.825, 0.35, 1.85, 0.55}′.
Step 5. (Optional) The prediction model, yest(x), for prototype system output is
yest(x) = 82.9 – 1.125x1 – 0.975x2 – 1.875x3 + 9.825x4.    (12.5)
The main effects plot is shown in Figure 12.2 below.
Step 6. We calculated s0 using
s0 = median{|βest,2|,…,|β est,8|} = 1.125. (12.6)
The set S is {1.125, 0.975, 1.875, 0.55, 0.35, 1.85}. Next, calculate
PSE = 1.5 × median{numbers in S}
= (1.5)(1.05) (12.7)
= 1.58
and
tLenth,j = |βest,j+1|/PSE (12.8)
= 0.71, 0.62, 1.19, 6.24 for j = 1, …,4 respectively.
Step 7. In this case, for many choices of IER vs EER and α, the conclusions about significance were the same. For example, with either tcritical = tIER,α=0.1,n=8 = 1.710 or tcritical = tEER,α=0.05,n=8 = 4.867, the fourth factor “heat sink” had a significant effect on average yield, even using the relatively conservative EER approach with α = 0.05. Also, for both choices, the team failed to find significance for the other factors. They might have changed the average response, but we could not detect it without more data.
Step 8. Subjectively, the team wanted to maximize the yield and heat sink had a
significant effect. It was clear from the main effects plot and the hypothesis
testing for heat sink that the high level (alternative heat sink) significantly
improved the quality compared with the current heat sink. The team
therefore suggested using the alternative heat sink because the material cost
increases were negligible compared with the savings associated with yield
increases. In fact, the first pass process yield increased to greater than 90%
consistently from around 70%. This permitted the company to meet new
demand without adding another rework line. The direct savings was
estimated to be $2.5 million.
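As a quick arithmetic check of Steps 6 and 7, the short Python sketch below recomputes s0, PSE, and the tLenth values directly from the coefficient list of Step 4; it is only a verification aid, not software used in the study.

```python
# Quick check of Steps 6 and 7 using the coefficient list beta_est from Step 4.
import numpy as np

effects = np.abs(np.array([-1.125, -0.975, -1.875, 9.825, 0.35, 1.85, 0.55]))
s0 = np.median(effects)                 # 1.125
S = effects[effects < 2.5 * s0]         # drops the 9.825 effect
pse = 1.5 * np.median(S)                # 1.575, i.e., about 1.58
t_lenth = np.abs(np.array([-1.125, -0.975, -1.875, 9.825])) / pse
print(s0, round(pse, 3), np.round(t_lenth, 2))   # 0.71, 0.62, 1.19, 6.24
```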
In evaluating the cost of poor quality, e.g., low yield, it was relevant to consider
costs in addition to the direct cost of rework. This followed in part because
production time variability from rework caused the need to quote high lead times
to customers, resulting in lost sales.
Figure 12.2. Main effects plot (% yield vs x1, x2, x3, and x4) for the printed circuit board example
Table 12.8. (a) Design, D, (b) Scaled design, Ds, and (c) Responses, % yield
(a) Design, D            (b) Scaled design, Ds                                              (c)
Run  x1  x2  x3  x4      x1                   x2        x3          x4                      % Yield
1    -1   1  -1   1      Low trans. output    Soldered  0.5 turns   Alternative sink        92.7
2     1   1  -1  -1      High trans. output   Soldered  0.5 turns   Current sink            71.2
3     1  -1  -1   1      High trans. output   Screwed   0.5 turns   Alternative sink        95.4
4     1  -1   1  -1      High trans. output   Screwed   1.0 turns   Current sink            69.0
5    -1   1   1  -1      Low trans. output    Soldered  1.0 turns   Current sink            72.3
6    -1  -1   1   1      Low trans. output    Screwed   1.0 turns   Alternative sink        91.3
7     1   1   1   1      High trans. output   Soldered  1.0 turns   Alternative sink        91.5
8    -1  -1  -1  -1      Low trans. output    Screwed   0.5 turns   Current sink            79.8
Question 1: Consider the example in Table 12.9. What are D, X, X′X, and A?
Answer 1: D is the entire matrix in Table 12.6 without the column for the runs.
Using X = (1|D) one has
1 -1 1 1 -1 1 -1 1 1 -1 -1 -1 1 1 -1 -1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 -1 -1 1 -1 -1 1 1 1 -1 -1 -1 -1 1
1 1 -1 1 1 -1 -1 -1 1 -1 -1 1 1 -1 -1 1
1 -1 -1 -1 1 -1 -1 1 1 1 1 1 -1 1 -1 -1
1 1 -1 -1 1 1 1 -1 -1 1 -1 -1 1 1 -1 -1
1 1 1 -1 1 -1 -1 1 -1 -1 1 -1 1 -1 1 -1
X= 1 1 1 1 -1 -1 1 -1 -1 -1 1 1 -1 1 -1 -1
1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 -1 1 -1
1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 1 1 1 1 1
1 -1 1 1 1 -1 -1 -1 -1 1 -1 -1 -1 1 1 1
1 -1 -1 1 -1 -1 1 -1 1 1 1 -1 1 -1 1 -1
1 1 -1 -1 -1 -1 1 1 1 -1 -1 -1 -1 1 1 1
1 -1 1 -1 -1 -1 1 1 -1 1 -1 1 1 -1 -1 1
1 -1 -1 1 1 1 1 1 -1 -1 1 -1 -1 -1 -1 1
1 -1 1 -1 1 1 1 -1 1 -1 -1 1 -1 -1 1 -1
Answer 2: Lenth’s method using the experimentwise error rate (EER) critical characteristic is a standard, conservative approach for analyzing fractional factorial data. The individual error rate (IER) is less conservative in that the Type I error rate is higher, but the Type II error rate is lower. The needed calculations are as follows:
s0 = median{|βest,2|, …, |βest,12|} = 0.92,
S1 is {0.58, 1.42, 0.08, 0.42, 0.92, 2.25, 0.08, 1.08, 0.75}, and (12.9)
PSE = 1.5 × median{numbers in S1} = (1.5) × (0.75) = 1.125.
tLenth,j = |βest,j+1|/PSE = 0.516, 2.293, 1.262, 24.222, and 0.071 for j = 1, …,5
respectively.
Whether α = 0.01 or α = 0.05, the conclusions are the same in this problem
because the critical values are 7.412 and 4.438 respectively. Tests based on both
identify that factor 4 has a significant effect and fail to find that the other factors
are significant. Factor into system design decision-making that factor 4 has a
significant effect on the average response. Therefore, it might be worthwhile to pay
more to adjust this factor.
Answer 3: The main effects plot shows the predictions of the regression model
when each factor is varied from the low to the high setting, with the other factors
held constant at zero. For example, the prediction when the factor x2 is at the low
level is 110.42 – 2.58 = 107.84. Figure 12.3 shows the plot. We can see that factor
x4 has a large negative effect and the other factors have small effects on the average
response. If the costs of changing the factors were negligible, the plot would
indicate which settings would likely increase or decrease the average response.
Figure 12.3. Main effects plot for the fictional example
Question 4: Suppose that the high level of each factor was associated with a
substantial per unit savings for the company, but that demand is assumed to be
directly proportional to the customer rating, which is the response. Use the above
information to make common-sense recommendations under the assumption that
the company will not pay for any more experiments.
Answer 4: Since the high setting of factor x4 is associated with a significant drop
in average response and thus demand, it might not make sense to use that setting to
stimulate demand. In the absence of additional information, however, the other
factors fail to show any significant effect on average response and demand.
Therefore, we tentatively recommend setting these factors at the high level to save
cost.
It is possible that many researchers from many places in the world independently
generated matrices similar to those used in standard screening using fractional
factorials. Here, the focus is on the school of research started by the U.K.
researcher Sir Ronald Fisher. In the associated terminology, “full factorials” are
arrays of numbers that include all possible combinations of factor settings for a
pre-specified number of levels. For example, a full factorial with three levels and
five factors consists of all 3⁵ = 243 possible combinations. Sir Ronald Fisher
generated certain fractional factorials by starting with full factorials and removing
portions to create half, quarter, eighth, and other fractions.
Box et al. (1961a, b) divided fractional factorials into “regular” and “irregular”
designs. “Regular fractional factorials” are experimental planning matrices that
are fractions of full factorials having the following property: all columns in
the matrix can be formed by multiplying other columns. Irregular designs are all
arrays without the above-mentioned multiplicative property. For example, in Table
12.10 (a), it can be verified that column A is equal to the product of columns B
times C. Regular designs are only available with numbers of runs given by n = 2ᵖ,
where p is a whole number. Therefore, possible n equal 4, 8, 16, 32,…
Consider the three factor full factorial and the regular fractions in Table 12.10.
The ordering of the runs in the experimental plan in Table 12.10 (a) suggests one
way to generate full factorials by alternating –1s and 1s at different rates for
different columns. Note that the experimental plan is not provided in randomized
order and should not be used for experimentation in the order given. The phrase
“standard order” (SO) refers to the not-randomized order presented in the tables.
A “generator” is a property of a specific regular fractional factorial array
showing how one or more columns may be obtained by multiplying together other
columns. For example, Table 12.10 (b) and (c) show selection of runs from the full
factorial with a specific property. The entry in column (c) is the product of the
entries in columns (a) and (b), giving the generator, (c) = (a)(b). The phrase
“defining relations” refers to a set of generators that are sufficiently complete as to
uniquely identify a regular fractional factorial in standard order.
Table 12.10. (a) Full factorial, (b) half fraction, and (c) quarter fraction
Table 12.11. A Plackett-Burman fractional factorial array not in randomized order
Answer: Plackett and Burman considered many possible sequences and picked the
one that achieved desirable properties such as orthogonality. For regular designs,
all columns can be achieved as products of other columns, and X′X is diagonal
under the assumptions in this chapter. Therefore, the correct answer is (a).
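Both properties can be checked numerically. The sketch below (Python, not software from the text) verifies that X′X is diagonal for the eight run regular array listed earlier in this chapter and searches for columns that equal products of other columns, i.e., generators.

```python
# Sketch: checking orthogonality (X'X diagonal) and generators for the
# eight run regular fractional factorial array shown earlier in this chapter.
import numpy as np
from itertools import combinations

D = np.array([
    [-1,  1, -1,  1,  1, -1, -1],
    [ 1,  1, -1, -1, -1, -1,  1],
    [ 1, -1, -1,  1, -1,  1, -1],
    [ 1, -1,  1, -1,  1, -1, -1],
    [-1,  1,  1, -1, -1,  1, -1],
    [-1, -1,  1,  1, -1, -1,  1],
    [ 1,  1,  1,  1,  1,  1,  1],
    [-1, -1, -1, -1,  1,  1,  1],
], dtype=float)

X = np.hstack([np.ones((8, 1)), D])
print(np.allclose(X.T @ X, 8 * np.eye(8)))   # True: X'X = nI, hence diagonal

# Search for generators: columns equal to the product of two other columns
for a, b in combinations(range(7), 2):
    product = D[:, a] * D[:, b]
    for c in range(7):
        if c > b and np.array_equal(product, D[:, c]):
            print(f"x{c+1} = x{a+1} * x{b+1}")   # prints relations such as x7 = x1 * x2
```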
Run x1 x2 x3 x4 x5 x6 x7 x8
1 2 3 1 3 2 3 1 2
2 1 1 2 2 2 2 2 2
3 1 3 1 2 1 3 2 3
4 2 3 3 2 1 2 3 1
5 1 2 2 2 3 3 1 1
6 1 2 3 3 1 1 2 2
7 2 2 2 3 1 2 1 3
8 1 3 2 3 2 1 3 1
9 2 1 1 3 3 2 2 1
10 2 1 3 2 2 1 1 3
11 1 3 3 1 3 2 1 2
12 2 1 2 1 1 3 3 2
13 1 1 3 3 3 3 3 3
14 1 1 1 1 1 1 1 1
15 2 2 1 2 3 1 3 2
16 1 2 1 1 2 2 3 3
17 2 2 3 1 2 3 2 1
18 2 3 2 1 3 1 2 3
approach are even lower than the probabilities in Table 18.3 in Chapter 18 with n1
= n2 = 3. In the next section, information about Type I and II errors suggests that
standard screening using fractional factorial methods offers reduced error rates of
both types.
Table 12.13. (a) Fractional factorial example, (b) low cost OFAT, (c) multiple t-tests
The hypothetical response data in Table 12.13 (a) was generated from the
equation Y = 19 + 6x1 + 9x1x3 with “+1” random noise added to the first response
only. Therefore, there is an effect of factor x1 in the “true” model. It can be checked
(see Problem 18 at the end of this chapter) that tLenth,1 = 27 so that the first factor
(x1) is proven using Lenth’s method with α = 0.01 and the EER convention to
affect the average response values significantly. Therefore, the combined effect or
“interaction” represented by 9x1x3 does not cause the procedure to fail to find that
x1 has a significant effect.
It would be inappropriate to apply two-sample t-testing analysis to the data in
Table 12.13 (a) focusing on factor x1. This follows because randomization was not
applied with regard to the other factors. Instead, a structured, formal experimental
plan was used for these. However, applying one-sided two-sample t-testing (see
Problem 19 below) results in t0 = 1.3, which is associated with a failure to find
significance with α = 0.05. This result provides anecdotal evidence that regular
fractional factorial screening based on Lenth’s method offers statistical power to
find factor significance and avoid Type II errors. Addressing interactions and using
all runs for each test evaluation helps in detecting even small effects.
Question: Consider the first “printed circuit board” case study in this chapter.
What advice could you provide for the team about errors?
Answer: The chance of false positives (Type I errors) is directly controlled by the
selection of the critical parameter in the methods. With only four factors and eight
runs, the chances of Type II errors are lower than those typically accepted by
method users. Still, only rather large actual differences will likely be found
significant unless a larger design of experiments matrix were used, e.g., n = 12 or
n = 16.
12.7 References
Allen TT, Bernshteyn M (2003) Supersaturated Designs that Maximize the
Probability of Finding the Active Factors. Technometrics 45: 1-8
Box GEP, Hunter JS (1961a) The 2ᵏ⁻ᵖ fractional factorial designs, part I. Technometrics 3: 311-351
Box GEP, Hunter JS (1961b) The 2ᵏ⁻ᵖ fractional factorial designs, part II. Technometrics 3: 449-458
Brady J, Allen T (2002) Case Study Based Instruction of SPC and DOE. The
American Statistician 56 (4):1-4
Daniel C (1959) Use of Half-Normal Plots in Interpreting Factorial Two-Level
Experiments. Technometrics 1: 311-341.
Fisher RA (1925) Statistical Methods for Research Workers. Oliver and Boyd,
London
Lenth RV (1989) Quick and Easy Analysis of Unreplicated Factorials.
Technometrics 31: 469-473
Plackett RL, Burman JP (1946) The Design of Optimum Multifactorial
Experiments. Biometrika 33: 303-325
Ye K, Hamada M, Wu CFJ (2001) A Step-Down Lenth Method for Analyzing
Unreplicated Factorial Designs. Journal of Quality Technology 33: 140-152
12.8 Problems
1. Which is correct and most complete?
a. Using FF, once an array is chosen, generally only the first m columns
are used.
b. Typically roughly half of the settings change from run to run in
applying FF.
c. Selecting the factors and levels is critical and should be done
carefully.
d. Main effects plots often clarify which factors matter and which do
not.
e. The approved approach for designing systems is to select the DOE
array settings that gave the best seeming responses.
f. All of the above are correct.
g. All of the above are correct except (e) and (f).
5. Which of the following is correct and most complete based on Table 12.14?
a. There are five factors, and the most standard, conservative analysis
uses EER.
b. Even if four factors had been used, the same A matrix would be
applied.
c. The matrix used is part of a matrix that can handle as many as 11
factors.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
Run (i)  xi,1  xi,2  xi,3  xi,4  xi,5  xi,6   Y1     βest (Coefficients)
1         -1     1    -1     1     1    -1    20     β1 (Constant)    35.0
2          1     1    -1    -1    -1    -1    50     β2 (factor x1)    1.0
3          1    -1    -1     1    -1     1    22     β3 (factor x2)   -0.5
4          1    -1     1    -1     1    -1    52     β4 (factor x3)   -0.5
5         -1     1     1    -1    -1     1    48     β5 (factor x4)  -15.0
6         -1    -1     1     1    -1    -1    18     β6 (factor x5)    0.5
7          1     1     1     1     1     1    20     β7 (factor x6)    0.0
8         -1    -1    -1    -1     1     1    50     β8               -0.5
13. Standard screening using regular FFs is used and all responses are close
together. Which is correct and most complete?
a. Likely users did not vary the factors over wide enough ranges.
b. The exercise could be useful because likely none of the factors
strongly affect the response. That information could be exploited.
c. You will possibly discover that none of the factors has a significant
effect even using IER and α = 0.1.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
15. Suppose that a person experiments using five factors and the eight run regular
array and βest = {82.9, –1.125, –0.975, –1.875, 9.825, 0.35, 1.85, 0.55}. Which is correct and most complete?
a. No factor has a significant effect on the average response with α =
0.05.
b. One fails to find that the 4th factor has a significant effect using IER and α = 0.05.
c. One fails to find that the 5th factor has a significant effect using IER and α = 0.05.
d. The PSE = 2.18.
e. All of the above are correct.
f. All of the above are correct except (a) and (d).
16. Assume n is the number of runs and m is the number of factors. Which is
correct and most complete?
a. Normal probability plots are often used instead of Lenth’s method for
analysis.
b. A reason not to use 3 level designs for screening might be that 2 level
designs give a relatively high chance of finding which factors matter
for the same n.
c. For fixed n, larger m generally means reduced chance of complete
correctness.
d. Adding more factors is often desirable because each factor is a
chance of finding a way to improve the system.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
18. Give one generator for the experimental design in Table 12.15.
13 DOE: Response Surface Methods
13.1 Introduction
Response surface methods (RSM) are primarily relevant when the decision-maker
desires (1) to create a relatively accurate prediction of engineered system input-
output relationships and (2) to “tune” or thoroughly optimize the system being
designed. Since these methods require more runs for a given number of factors
than screening using fractional factorials, they are generally reserved for cases in
which the importance of all factors is assumed, perhaps because of previous
experimentation.
The methods described here are called “standard response surface methods”
(RSM) because they are widely used and the prediction models generated by them
can yield 3D surface plots. The methods are based on three types of design of
experiments (DOE) matrices. First, “central composite designs” (CCDs) are
matrices corresponding to (at most) five level experimental plans from Box and
Wilson (1951). Second, “Box Behnken designs” (BBDs) are matrices
corresponding to three level experimental plans from Box and Behnken (1960). Third,
Allen et al. (2003) proposed methods based on so-called “expected integrated
mean squared error optimal” (EIMSE-optimal) designs. EIMSE-optimal designs
are one type of experimental plan that results from the solution of an optimization
problem.
We divide RSM into two classes: (1) “one-shot” methods conducted in one
batch and (2) “sequential” methods based on central composite designs from Box
and Wilson (1951). This chapter begins with “design matrices” which are used in
the model fitting part of response surface methods. Next, one-shot and sequential
response surface methods are defined, and examples are provided. Finally, a brief
explanation of the origin of the different types of DOE matrices used is given.
        f(x1)′
X =       ⋮            (13.3)
        f(xn)′
Question: For m = 3 factors and n = 11 runs, provide a full quadratic f(x) and an
example of D, and the associated X.
Note that the above form contains quadratic terms, e.g., f5(x) = x3². Therefore,
the associated linear model is called a “response surface model”. Terms involving
products, e.g., f7(x) = x1x3, are called interaction terms.
f(x) = (1, x1, x2, x3, x1², x2², x3², x1x2, x1x3, x2x3)′

      -1 -1 -1            1 -1 -1 -1  1  1  1  1  1  1
      -1  1 -1            1 -1  1 -1  1  1  1 -1  1 -1
       1  1 -1            1  1  1 -1  1  1  1  1 -1 -1
      -1  0  0            1 -1  0  0  1  0  0  0  0  0
D =    1 -1  1       X =  1  1 -1  1  1  1  1 -1  1 -1        (13.4)
       0  0  0            1  0  0  0  0  0  0  0  0  0
       1  0  0            1  1  0  0  1  0  0  0  0  0
      -1 -1  1            1 -1 -1  1  1  1  1  1 -1 -1
      -1  1  1            1 -1  1  1  1  1  1 -1 -1  1
       1  1  1            1  1  1  1  1  1  1  1  1  1
       1 -1 -1            1  1 -1 -1  1  1  1 -1 -1  1
Referring back to the first case study in Chapter 2 with the printed circuit
board, the relevant design, D, and X matrix were:
      -1  1 -1  1 -1  1 -1            1 -1  1 -1  1 -1  1 -1
       1  1 -1 -1  1 -1 -1            1  1  1 -1 -1  1 -1 -1
       1 -1 -1  1 -1 -1  1            1  1 -1 -1  1 -1 -1  1
D =    1 -1  1 -1 -1  1 -1       X =  1  1 -1  1 -1 -1  1 -1        (13.5)
      -1  1  1 -1 -1 -1  1            1 -1  1  1 -1 -1 -1  1
      -1 -1  1  1  1 -1 -1            1 -1 -1  1  1  1 -1 -1
       1  1  1  1  1  1  1            1  1  1  1  1  1  1  1
      -1 -1 -1 -1  1  1  1            1 -1 -1 -1 -1  1  1  1
so that the last row of the X matrix, f(x8)′, was given by:
f(x8)′ = (1  -1  -1  -1  -1  1  1  1)        (13.6)
The next example illustrates how design matrices can be constructed based on
different combinations of experimental plans and functional forms.
In one-shot RSM, the most relevant model form is a full quadratic. However, it is
possible that a model fitter might consider more than one functional form.
Consider the experimental plan and models in Table 13.1.
Answer 2: All of the answers in (a), (b), and (c) are correct. Therefore, the answer
is (d). Note that the fact that X′X is diagonal implies that this central composite
design is “orthogonal” with respect to first order functional forms.
In standard RSM approaches, all factors must be continuous, e.g., none of the system
inputs can be categorical variables such as the type of transistor or country in
which units are made. The development of related methods involving categorical
factors is an active area for research. Software to address combinations of
continuous and categorical factors is available from JMPTM and through
www.sagata.com.
As for screening methods, the experimental design, D, specifies indirectly
which prototype systems should be built. For RSM, these designs can be selected
from any of the ones provided in tables that follow both in this section and in the
appendix at the end of this chapter. The design, D, only indirectly specifies the
prototypes because it must be scaled or transformed using H and L to describe the
prototypes in an unambiguous fashion that people who are not familiar with DOE
methods can understand (see below).
Pre-step. Define the factors and ranges, i.e., the highs, H, and lows, L, for all factors.
Step 1. Select the experimental design from the Tables in 13.2 below to facilitate the
scaling in Step 2. Options include the Box Behnken (BBD), central composite
(CCD), or EIMSE-optimal designs. If a CCD design is used, then the
parameter αC can be adjusted as desired. If αC = 1, then only three levels are
used. The default setting is αC = sqrt{m}, where m is the number of factors.
Step 2. Scale the experimental design using the ranges selected by the experimenter.
Dsi,j = Lj + 0.5(Hj – Lj)(Di,j + 1) for i = 1,…,n and j = 1,…,m.
Step 3. Build and test the prototypes according to Ds. Record the test measurements
for the responses for the n runs in the n dimensional vector Y.
Step 4. Form the so-called “design” matrix, X, based on the scaled design, Ds,
following the rules in the above equations. Then, calculate the regression
coefficients βest = AY, where A is (X′X)–1X′. Reexamine the approach
used to generate the responses to see if any runs were not representative of
system responses of interest.
Step 5. (Optional) Plot the prediction model, yest(x), for prototype system output
The above method is given in terms of only a single response. Often, many
responses are measured in Step 3, derived from the same prototype systems. Then,
Step 4 and Step 5 can be iterated for each of the responses considered critical.
Then also, optimization in Step 6 would make use of the multiple prediction
models.
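A minimal Python sketch of Steps 2 and 4 follows; it is not the book’s software, and the design, ranges, and responses used are small hypothetical placeholders chosen only so the code runs end to end. The column ordering (constant, linear, pure quadratic, interaction terms) mirrors the X matrices shown in this chapter.

```python
# Sketch of Steps 2 and 4 of one-shot RSM: scale a coded design D using the
# ranges H and L, build a full quadratic design matrix X, and fit by least
# squares. D, L, H, and Y are hypothetical placeholders, not data from the text.
import numpy as np
from itertools import combinations

def scale_design(D, L, H):
    # Ds[i, j] = L[j] + 0.5 * (H[j] - L[j]) * (D[i, j] + 1)
    return L + 0.5 * (H - L) * (D + 1.0)

def quadratic_X(Ds):
    n, m = Ds.shape
    columns = [np.ones(n)]                                # constant term
    columns += [Ds[:, j] for j in range(m)]               # linear terms
    columns += [Ds[:, j] ** 2 for j in range(m)]          # pure quadratic terms
    columns += [Ds[:, i] * Ds[:, j]
                for i, j in combinations(range(m), 2)]    # interaction terms
    return np.column_stack(columns)

# Hypothetical 2-factor example: a 3x3 full factorial in coded units
D = np.array([[i, j] for i in (-1, 0, 1) for j in (-1, 0, 1)], dtype=float)
L = np.array([10.0, 1.0])     # hypothetical factor lows
H = np.array([20.0, 3.0])     # hypothetical factor highs
Y = np.array([5.1, 5.8, 6.0, 4.9, 5.5, 6.2, 4.8, 5.6, 6.4])   # made-up responses

Ds = scale_design(D, L, H)
X = quadratic_X(Ds)
beta_est, *_ = np.linalg.lstsq(X, Y, rcond=None)   # beta_est = (X'X)^-1 X'Y
print(np.round(beta_est, 4))
```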
Table 13.2. RSM designs: (a) BBD, (b) and (c) EIMSE-optimal, and (d) CCD
If EIMSE designs are used, the method is not usually referred to as “standard”
RSM. These designs and others are referred to as “optimal designs” or sometimes
“computer generated” designs. However, they function in much the same ways as
the standard designs and offer additional method options that might be useful.
With scaled units used in the calculation of the X matrices, care must be taken
to avoid truncation of the coefficients. In certain cases of possible interest, the
factor ranges, (Hj – Lj), may be such that even small coefficients, e.g., 10–6, can
greatly affect the predicted response. Therefore, it can be of interest to fit the
models based on inputs and design matrices in the original -1 and 1 coding.
Another benefit of using “coded” -1, 1 units is that the magnitude of the derived
coefficients can be compared to see which factors are more important in their
effects on response averages.
The majority of the experimental designs, D, associated with RSM have
repeated runs, i.e., repeated combinations of the same settings such as x1 = 0, x2 =
0, x3 = 0, x4 = 0. One benefit of having these repeated runs is that the experimenter
can use the sample standard deviation, s, of the associated responses as an
“assumption free” estimate of the standard deviation of the random error, σ
(“sigma”). This can establish the so-called “process capability” of the prototype
system and therefore aid in engineered system robust optimization (see below).
In Step 4, a reassessment is often made of each response generated in Step 3, to
see if any of the runs should be considered untrustworthy, i.e., not representative of
system performance of interest. Chapter 15 provides formulas useful for
calculating the so-called “adjusted R-squared”. In practice, this quantity is
usually calculated directly by software, e.g., Excel. Roughly speaking, the adjusted
R-squared gives the “fraction of the variation explained by changes in the factors”.
When one is analyzing data collected using EIMSE-optimal, Box Behnken, or
central composite designs, one expects adjusted R-squared values in excess of 0.50
or 50%. Otherwise, there is a concern that some or all of the responses are not
trustworthy and/or that the most important factors are unknown and uncontrolled
during the testing.
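As a rough sketch of this quantity, the following Python code computes adjusted R-squared using the usual textbook definition (an assumption here; Chapter 15 gives the book’s own formulas), with a tiny made-up data set.

```python
# Sketch: adjusted R-squared via the usual formula; data are made up.
import numpy as np

def adjusted_r_squared(Y, X, beta_est):
    n, k = X.shape                           # k model terms, including the constant
    sse = np.sum((Y - X @ beta_est) ** 2)    # error sum of squares
    sst = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
    r2 = 1.0 - sse / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Tiny hypothetical usage: a straight-line fit to four made-up points
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
Y = np.array([1.1, 1.9, 3.2, 3.9])
beta_est, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(round(adjusted_r_squared(Y, X, beta_est), 3))
```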
Advanced readers might be interested to know that, for certain assumptions,
Box Behnken designs are EIMSE designs. Also, an approximate formula to
estimate the number of runs, n, required by standard response surface methods
involving m factors is 0.75(m + 1)(m + 2).
Pre-step. The highs, H, and lows, L, for all factors are shown in Table 13.3.
Step 1. The Box Behnken design in Table 13.4 (a) was selected because it offered a reasonable balance between run economy and prediction accuracy.
Step 2. The scaled Ds array is shown in Table 13.4 (b).
Step 3. Planes were thrown from shoulder height and the time in air was measured using a stopwatch and recorded in the right-hand column of Table 13.4 (b).
Step 4. The fitted coefficients are written in the regression model:
Predicted Average Time =
– 295.14 + 30.34 Width + 38.12 Length + 0.000139 Angle
– 1.25 Width² – 1.48 Length² – 0.000127 Angle²
– 1.2 (Width) (Length) – 0.00778 (Width) (Angle) (13.8)
+ 0.00639 (Length) (Angle)
which has adjusted R2 of only 0.201. Inspection of the airplanes
reveals that the second prototype (Run 2) was not representative of the
system being studied because of an added fold. Removing this run did
little to change the qualitative shape of the surface but it did increase
the adjusted R2 to 0.55.
Step 5. The 3D surface plot with rudder fixed at 0° is in Figure 13.1. This plot
was generated using Sagata® Regression.
Step 6. The model indicates that the rudder angle did affect the time but that 0°
is approximately the best. Considering that using 0° effectively
removes a step in the manufacturing SOP, which is generally
desirable, that setting is recommended. Inspection of the surface plot
then indicates that the highest times are achieved with width A equal
to 7.4 inches and length equal to 9.9 inches.
Figure 13.1. 3D surface plot of predicted flight time (seconds) vs plane width and plane length (inches), with rudder angle fixed at 0°
Table 13.4. Example (a) coded DOE array, D and (b) scaled Ds and response values
(a) (b)
A   B   C      Run   A–Width   B–Length   C–Angle   Time
0 -1 1 1 7.5 9 90 3.3
0 0 0 2 7.5 10 0 3.1
1 0 1 3 8.5 10 90 2.2
0 -1 -1 4 7.5 9 -90 4.2
-1 -1 0 5 6.5 9 0 1.1
-1 0 1 6 6.5 10 90 5.3
1 1 0 7 8.5 11 0 1.3
1 0 -1 8 8.5 10 -90 1.8
-1 1 0 9 6.5 11 0 3.9
0 0 0 10 7.5 10 0 6.1
1 -1 0 11 8.5 9 0 3.3
0 1 -1 12 7.5 11 -90 0.8
0 1 1 13 7.5 11 90 2.2
0 0 0 14 7.5 10 0 6.2
-1 0 -1 15 6.5 10 -90 2.1
As a second example, consider that researchers at the Ohio State University Die
Casting Research Center have conducted a series of physical and computer
experiments designed to investigate the relationship of machine dimensions and
part distortion. This example is described in Choudhury (1997). Roughly speaking,
the objective was to minimize the size and, therefore, cost of the die machine while
maintaining acceptable part distortion by manipulating the inputs, x1, x2, and x3
shown in the figure below. These factors and ranges and the selected experimental
design are shown in Figure 13.2 and Table 13.5.
Step 1. The team prepared the experimental design, D, shown in the Table 13.6
for scaling. Note that this design is not included as a recommended option
for the reader because of what may be viewed as a mistake in the
experimental design generation process. This design does not even
approximately maximize the EIMSE, although it was designed with the
EIMSE in mind.
Step 2. The design in Table 13.6 used the above-mentioned ranges and the formula
Dsi,j = Lj + 0.5(Hj – Lj)(Di,j + 1) for i = 1,…,11 and j = 1,…,3 to produce the
experimental plan in Table 13.7 part (a).
Step 3. The prototypes were built according to Ds using a type of virtual reality
simulation process called finite element analysis (FEA). From these FEA
test runs, the measured distortion values Y1,…,Y8 are shown in the Table
13.7 (b). The numbers are maximum part distortion of the part derived
from the simulated process in inches.
Step 4. The analyst on the team calculated the so-called “design” matrix, X, and A
= (X′X)–1X′ given by Table 13.8 and Table 13.9. Then, for each of the 8
responses, the team calculated the regression coefficients shown in Table
13.10 using βest,r = AYr for r = 1, …, 8.
Step 6. In this study, the engineers chose to apply formal optimization. They chose to limit the maximum average part distortion to 0.075” while minimizing 2.0x1 + x2, which is roughly proportional to the machine cost. They included the experimental ranges as constraints, L ≤ x ≤ H, because they knew that prediction model errors usually become unacceptable outside the prediction ranges. The precise optimization formulation was:
    Minimize: 2.0x1 + x2
    Subject to:
    Maximum[yest,1(x), yest,2(x), …, yest,8(x)] ≤ 0.075”        (13.10)
    and L ≤ x ≤ H
which has the solution x1 = 6.3 inches, x2 = 9.0 inches, and x3 = 12.5 inches. This solution was calculated using the Excel solver.
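A sketch of the same formulation using SciPy instead of the Excel solver is shown below. The coefficient vectors of Table 13.10 are not reproduced in this excerpt, so beta_list is only a placeholder; the factor lows and highs are read off the scaled design in Table 13.7.

```python
# Sketch of the formulation in Equation (13.10) using SciPy rather than Excel.
# beta_list is a placeholder for the Table 13.10 coefficient vectors.
import numpy as np
from scipy.optimize import minimize

L = np.array([5.5, 9.0, 0.0])      # factor lows, read from the scaled design
H = np.array([8.0, 14.5, 12.5])    # factor highs

def f_quadratic(x):
    # Full quadratic model terms for m = 3 factors, as in the X matrix above
    x1, x2, x3 = x
    return np.array([1, x1, x2, x3, x1**2, x2**2, x3**2, x1*x2, x1*x3, x2*x3])

beta_list = [np.zeros(10) for _ in range(8)]   # placeholder coefficients

def cost(x):
    return 2.0 * x[0] + x[1]

constraints = [{"type": "ineq",
                "fun": lambda x, b=b: 0.075 - f_quadratic(x) @ b}   # yest_r(x) <= 0.075
               for b in beta_list]
bounds = list(zip(L, H))                        # L <= x <= H

result = minimize(cost, x0=(L + H) / 2, bounds=bounds, constraints=constraints)
print(result.x)   # with the real Table 13.10 coefficients, near (6.3, 9.0, 12.5)
```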
Figure 13.2. The factor explanation for the one-shot RSM casting example (die dimensions labeled A, B, and C)
Table 13.5. The factor and range table for the one-shot RSM casting example
Run X1 x2 x3
1 -1.0 1.0 1.0
2 1.0 -1.0 1.0
3 1.0 1.0 -1.0
4 1.0 1.0 1.0
5 1.0 0.0 0.0
6 0.0 1.0 0.0
7 0.0 0.0 1.0
8 0.0 0.0 0.0
9 0.5 -1.0 -1.0
10 -1.0 0.5 -1.0
11 -1.0 -1.0 0.5
Table 13.7. (a) Ds and (b) measured distortions at eight part locations in inches
(a) (b)
Run x1 x2 x3 Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8
1 5.50 14.50 12.5 0.0167 0.0185 0.0197 0.0143 0.0113 0.0177 0.0195 0.0153
2 8.00 9.00 12.5 0.006 0.0069 0.0069 0.004 0.0008 0.0078 0.0088 0.0057
3 8.00 14.50 0.00 0.0053 0.0038 0.002 0.0063 0.0069 0.0062 0.0047 0.0072
4 8.00 14.50 12.5 0.0056 0.0067 0.0074 0.0038 0.0016 0.0064 0.0075 0.0046
5 8.00 11.75 6.25 0.0067 0.0066 0.0037 0.0063 0.0051 0.0079 0.0078 0.0076
6 6.75 14.50 6.25 0.0109 0.0109 0.0104 0.0104 0.0093 0.0118 0.0118 0.0113
7 6.75 11.75 12.5 0.0095 0.011 0.0117 0.0072 0.0038 0.0109 0.0123 0.0086
8 6.75 11.75 6.25 0.0106 0.0105 0.0098 0.0100 0.0087 0.0118 0.0118 0.0113
9 7.38 9.00 0.00 0.007 0.0048 0.0013 0.008 0.008 0.0092 0.0069 0.0103
10 5.50 13.13 0.00 0.0163 0.0144 0.012 0.0177 0.0185 0.0175 0.0155 0.0188
11 5.50 9.00 9.38 0.0175 0.0183 0.018 0.0155 0.0122 0.0199 0.0205 0.0178
1.00 5.50 14.50 12.50 30.25 210.25 156.25 79.75 68.75 181.25
1.00 8.00 9.00 12.50 64.00 81.00 156.25 72.00 100.00 112.50
1.00 8.00 14.50 0.00 64.00 210.25 0.00 116.00 0.00 0.00
1.00 8.00 14.50 12.50 64.00 210.25 156.25 116.00 100.00 181.25
X= 1.00 8.00 11.75 6.25 64.00 138.06 39.06 94.00 50.00 73.44
1.00 6.75 14.50 6.25 45.56 210.25 39.06 97.88 42.19 90.63
1.00 6.75 11.75 12.50 45.56 138.06 156.25 79.31 84.38 146.88
1.00 6.75 11.75 6.25 45.56 138.06 39.06 79.31 42.19 73.44
1.00 7.38 9.00 0.00 54.39 81.00 0.00 66.38 0.00 0.00
1.00 5.50 13.13 0.00 30.25 172.27 0.00 72.19 0.00 0.00
1.00 5.50 9.00 9.38 30.25 81.00 87.89 49.50 51.56 84.38
0.629 -1.439 -0.243 10.565 3.107 -5.165 -14.781 -7.993 -0.296 3.213 13.400
-1.032 0.616 -0.169 -1.601 -3.260 3.332 2.808 1.512 1.318 -1.413 -2.111
0.379 -0.206 -0.101 -0.752 1.254 -1.086 0.934 0.542 -0.520 0.503 -0.947
0.136 0.082 0.245 -0.362 0.115 0.079 0.027 0.053 -0.212 -0.260 0.098
A= 0.138 0.010 0.010 0.052 0.280 -0.232 -0.232 -0.113 -0.085 0.086 0.086
0.002 0.028 0.002 0.011 -0.048 0.058 -0.048 -0.023 0.018 -0.018 0.018
0.000 0.000 0.006 0.002 -0.009 -0.009 0.011 -0.005 0.003 0.003 -0.003
-0.061 -0.061 0.026 0.065 -0.026 -0.026 0.032 0.001 -0.009 -0.009 0.069
-0.027 0.011 -0.027 0.028 -0.011 0.014 -0.011 0.000 -0.004 0.030 -0.004
0.005 -0.012 -0.012 0.013 0.006 -0.005 -0.005 0.000 0.014 -0.002 -0.002
Table 13.10. The prediction model coefficients for the eight responses
Figure 13.3. The predicted distortion (deflection in inches) for the casting example
Question 1: Assume that a food scientist is trying to improve the taste rating of an
ice cream sandwich product by varying factors including the pressure, temperature,
and amount of vanilla additive. What would be the advantage of using response
surface methods instead of screening using fractional factorials?
Question 2: The scientist is considering varying either three or four factors. What
are the advantages of using four factors?
Answer 2: Two representative standard design methods for three factors
require 15 and 16 runs. Two standard design methods for four factors require 27
and 30 runs. Therefore, using only three factors would save the costs associated
with ten or more runs. However, in general, each factor that is varied offers an
opportunity to improve the system. Thorough optimization over four factors
provably results in more desirable or equivalently desirable settings compared with
thorough optimization over three factors.
Question 3: The scientist experiments with four factors and develops a second
order regression model with an adjusted R-squared of 0.95. What does this
adjusted R-squared value imply?
Answer 3: A high R-squared value such as 0.95 implies that the factors varied
systematically in experimentation are probably the most influential factors
affecting the relevant engineered system average response. The effects of other
factors that are not considered most likely have relatively small effects on average
response values. The experimenter can feel reasonably confident that “what if”
analyses using the regression prediction model will lead to correct conclusions
about the engineered system.
are selected such that the formula in cell E8 can be copied to all cells in the range
E8:J20, producing correct predictions. Having generated all the predictions and
putting the desired axes values in cells, E7:J7 and D8:D20, then the entire region
D7:J20 is selected, and the “Chart” utility is called up through the “Insert” menu.
An easier way to create identical surface plots is to apply Sagata® software
(www.sagata.com), which also includes EIMSE designs (author is part owner).
Note that the method described here is not the fully sequential response surface
methods of Box and Wilson (1951) and described in textbooks on response surface
methods such as Box and Draper (1987). The general methods can be viewed as an
optimization under uncertainty method which is a competitor to approaches in Part
III of this text. The version described here might be viewed as “two-step”
experimentation in the sense that runs are performed in at most two batches. The
fully sequential response surface method could conceivably involve tens of batches
of experimental test runs.
Two-step RSM are characterized by (1) an “experimental design”, D (2)
vectors that specify the highs, H, and lows, L, of each factor, and (3) the α
parameter used in the lack of fit test in Step 5 based on the critical values in Table
13.11.
Definition: “Block” here refers to a batch of experimental runs that are performed
at one time. Time here is a blocking factor that we cannot randomize over. Rows of
experimental plans associated with blocks are not intended to structure
experimentation for usual factors or system inputs. If they are used for usual
factors, then prediction performance may degrade substantially.
Definition: A “center point” is an experimental run with all of the settings set at
their mid-value. For example, if a factor ranges from 15” to 20” in the region of
interest to the experiment, the “center point” would have a value of 17.5” for the
factor.
Definition: Let the symbol, nc, refer to the number of center points in a central
composite experimental design with the so-called block factor having a value of 1.
For example, for the n = 14 run central composite in part (a) of Table 13.12, nC = 3.
Let yaverage,c and yvariance,c be the sample average and sample variance of the rth
response for the nc center point runs in the first block, respectively. Let the symbol,
nf, refer to the number of other runs with the block factor having a value of 1. For
the same n = 14 central composite, nf = 4. Let yaverage,f be the average of the
response for the nf other runs in the first block.
An example application of central composite designs is given together with the
robust design example in Chapter 14. In that case the magnitude of the curvature
was large enough such that F0 > Fα,1,nC – 1, for both responses for any α between
0.05 and 0.25.
Note that when F0 > Fα,1,nC – 1, it is common to say that “the lack of fit test is
rejected and more runs are needed.” Also, the lack of fit test is a formal hypothesis
test like two-sample t-tests. Therefore, if q responses are tested simultaneously, the
overall probability of wrongly finding significance is greater than the α used for
each test. However, it is less than qα by the Bonferroni inequality. Yet, if accuracy
is critically important, a high value of α should be used because that increases the
chance that all the experimental runs will be used. With the full amount of runs and
the full quadratic model form, prediction accuracy will likely be higher than if
experimentation terminates at Step 5.
Step 1. Prepare the experimental design selected from the tables below to facilitate
the scaling in Step 2. The selected design must be one of the central
composite designs (CCDs), either immediately below or in the appendix at
the end of the chapter.
Step 2. Scale the experimental design using the ranges selected by the
experimenter. Dsi,j = Lj + 0.5(Hj – Lj)(Di,j + 1) for i = 1,…,n and j = 1,…,m.
Step 3. Build and test the prototypes according to only those runs in Ds that
correspond to the runs with the “block” having a setting of 1. Record the
test measurements for the responses for the n1 runs in the n dimensional
vector Y.
Step 4. Form the so-called “design” matrix, X, based on the scaled design, Ds,
based on the following model form, f(x):
f1(x) = 1,   fj(x) = xj–1 for j = 2, …, m + 1,   and (13.11)
fm+2(x) = x1x2,   fm+3(x) = x1x3,   …,   f(m+1)(m+2)/2–m(x) = xm–1xm.
(Note that the pure quadratic terms, e.g., x1², are missing.) Then, for each
of the q responses calculate the regression coefficients βest = AY, where
A = (X′X)–1X′.
Step 5. Calculate the “mean squared lack-of-fit” (MSLOF), yvariance,c, and the F-
statistic, F0, using the following:
MSLOF = nf nc (yaverage,f – yaverage,c)² / (nf + nc) and (13.12)
F0 = (MSLOF)/yvariance,c
where yvariance,c is the sample variance of the center point response values
for the rth response. If F0 < Fα,1,nc – 1 for all responses for which an accurate
model is critical, then go to Step 8. Otherwise continue with Step 6. The
values of Fα,1,nc – 1 are given in Table 13.11. The available prediction model
is f(x)′βest, based on the above model form with no pure quadratic terms.
(A sketch of these calculations in code follows Step 9.)
Step 6. Build and test the remaining prototypes according to Ds. Record the test
measurements for the responses for the n2 additional runs in the bottom of
the n1 + n2 dimensional vector Y.
Step 7. Form the so-called “design” matrix, X, based on the scaled design, Ds,
following the rules for full quadratic model forms, f(x), as for one-shot
methods. (The pure quadratic terms are included.) Then, calculate the
regression coefficients βest = AY, where A = (X′X)–1X′.
Step 8. (Optional) Plot the prediction model, yest(x) = f(x)′βest, for prototype system
output to gain intuition about system inputs and output relationships. The
example above shows how to make 3D plots using Excel and models of the
above form.
Step 9. Apply informal or formal optimization using the prediction models, yest,1(x),
…, yest,q(x), to develop recommended settings. Formal optimization is
described in detail in Chapter 6.
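The following is a minimal sketch in Python of the calculations in Steps 2, 4, and 5, applied to the first block of the two factor central composite design in Table 13.12 (a). The factor highs and lows and the seven response values are hypothetical, and Steps 6 through 9 are omitted.

# Minimal sketch of Steps 2, 4, and 5 for block 1 of the two factor CCD
import numpy as np

# Table 13.12 (a), block 1: runs 1-7 in coded units (x1, x2)
D = np.array([[ 0,  0], [ 1, -1], [ 1,  1], [-1,  1],
              [-1, -1], [ 0,  0], [ 0,  0]], dtype=float)
is_center = np.all(D == 0, axis=1)

# Step 2: scale to engineering units, Ds_ij = L_j + 0.5*(H_j - L_j)*(D_ij + 1)
L = np.array([10.0, 30.0])   # hypothetical factor lows
H = np.array([15.0, 40.0])   # hypothetical factor highs
Ds = L + 0.5 * (H - L) * (D + 1.0)

Y = np.array([10.2, 16.8, 18.1, 15.5, 17.9, 10.7, 10.6])  # hypothetical responses

# Step 4: design matrix with intercept, main effects, and the x1*x2 interaction
# (no pure quadratic terms), then beta_est = A Y with A = (X'X)^-1 X'
X = np.column_stack([np.ones(len(Ds)), Ds[:, 0], Ds[:, 1], Ds[:, 0] * Ds[:, 1]])
A = np.linalg.inv(X.T @ X) @ X.T
beta_est = A @ Y

# Step 5: lack of fit test comparing the center points with the other block 1 runs
yc, yf = Y[is_center], Y[~is_center]
nc, nf = len(yc), len(yf)
MSLOF = nf * nc * (yf.mean() - yc.mean())**2 / (nf + nc)
F0 = MSLOF / yc.var(ddof=1)
print(beta_est, F0)  # compare F0 with F_{alpha,1,nc-1} from Table 13.11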
Table 13.11. Critical values of the F distribution, Fα,ν1,ν2 (a) α = 0.05 and (b) α = 0.10
(a)
α=0.05 ν1
ν2 1 2 3 4 5 6 7 8 9 10
1 161.45 199.50 215.71 224.58 230.16 233.99 236.77 238.88 240.54 241.88
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54
(b)
α=0.10 ν1
ν2 1 2 3 4 5 6 7 8 9 10
1 39.86 49.50 53.59 55.83 57.24 58.20 58.91 59.44 59.86 60.19
2 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39
3 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23
4 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92
5 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 3.30
6 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 2.94
7 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 2.70
8 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 2.54
9 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42
10 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 2.32
11 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25
12 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19
13 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14
14 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.10
15 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06
Table 13.12. Central composite designs for (a) 2 factors, (b) 3 factors, and (c) 4 factors
(a) Two factors

Run Block x1 x2
1 1 0 0
2 1 1 -1
3 1 1 1
4 1 -1 1
5 1 -1 -1
6 1 0 0
7 1 0 0
8 2 0 -1.41
9 2 -1.41 0
10 2 0 0
11 2 0 1.41
12 2 0 0
13 2 0 0
14 2 1.41 0

(b) Three factors

Run Block x1 x2 x3
1 1 1 1 1
2 1 1 -1 1
3 1 0 0 0
4 1 0 0 0
5 1 -1 -1 -1
6 1 -1 1 -1
7 1 -1 -1 1
8 1 -1 1 1
9 1 0 0 0
10 1 1 -1 -1
11 1 0 0 0
12 1 1 1 -1
13 2 0 -αC 0
14 2 0 0 0
15 2 0 0 -αC
16 2 -αC 0 0
17 2 0 0 αC
18 2 0 αC 0
19 2 αC 0 0
20 2 0 0 0

(c) Four factors

Run Block x1 x2 x3 x4
1 1 -1 1 -1 -1
2 1 -1 1 -1 1
3 1 0 0 0 0
4 1 1 -1 -1 -1
5 1 1 -1 1 -1
6 1 -1 1 1 -1
7 1 -1 1 1 1
8 1 1 1 -1 1
9 1 1 1 1 -1
10 1 -1 -1 1 -1
11 1 -1 -1 -1 -1
12 1 1 -1 1 1
13 1 0 0 0 0
14 1 0 0 0 0
15 1 -1 -1 -1 1
16 1 0 0 0 0
17 1 1 1 1 1
18 1 1 -1 -1 1
19 1 1 1 -1 -1
20 1 -1 -1 1 1
21 2 0 0 0 0
22 2 αC 0 0 0
23 2 0 αC 0 0
24 2 0 0 0 αC
25 2 0 0 0 0
26 2 -αC 0 0 0
27 2 0 -αC 0 0
28 2 0 0 -αC 0
29 2 0 0 αC 0
30 2 0 0 0 -αC
Question: Suppose that you had performed the first seven runs of a central
composite design in two factors, and the average and standard deviation of the only
critical response for the three repeated center points are 10.5 and 2.1 respectively.
Further, suppose that the average response for the other four runs is 17.5. Perform
a lack of fit analysis to determine whether adding additional runs is needed. Note
that variance = (standard deviation)2.
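The following is a minimal sketch in Python of the Step 5 calculation applied to the numbers given in the question; the comparison value F0.05,1,2 = 18.5 is read from Table 13.11 (a).

# Minimal sketch: lack of fit calculation for the question above
n_c, n_f = 3, 4              # center point runs and other first block runs
y_avg_c, s_c = 10.5, 2.1     # center point average and standard deviation
y_avg_f = 17.5               # average of the other four runs
y_var_c = s_c**2             # variance = (standard deviation)^2 = 4.41

MSLOF = n_f * n_c * (y_avg_f - y_avg_c)**2 / (n_f + n_c)   # = 84.0
F0 = MSLOF / y_var_c                                       # about 19.0
print(MSLOF, F0)
# From Table 13.11 (a), F_{0.05,1,2} = 18.5, so F0 > F_{0.05,1,2} and the lack
# of fit test suggests that the additional (second block) runs are needed.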
In this chapter, three types of experimental arrays are presented. The first two
types, central composite designs (CCDs) and Box Behnken designs (BBDs), are
called standard response surface designs. The third type, EIMSE designs,
constitutes one kind of optimal experimental design. Many other types of response
surface method experimental arrays are described in Myers and Montgomery
(2001).
Box and Wilson (1951) generated CCD arrays by combining three components
as indicated by the example in Table 13.13. For clarity, Table 13.13 lists the design
in standard order (SO), which is not randomized. To achieve proof and avoid
problems, the matrix should not be used in this order. The run order should be
randomized.
The first CCD component consists of a two level matrix similar or identical to
the ones used for screening (Chapter 12). Specifically, this portion is either a full
factorial, as in Table 13.13, or a so-called “Resolution V” regular fractional
factorial. The phrase “Resolution V” refers to regular fractional factorials with the
property that no column can be derived without multiplying at least four other
columns together. For example, it can be checked that a 16 run regular fractional
factorial with five factors and the generator E = ABCD is Resolution V.
Resolution V implies that a model form with all two-factor interactions, e.g.,
β10x2x3, can be fitted with accuracy that is often acceptable.
The phrase “center points” refers to experimental runs with all settings set to
the midpoint of each factor's range. The second CCD component part
consists of nc center points. For example, if factor A ranges from 10 mm to 15 mm
and factor B ranges from 30 °C to 40 °C, the center point settings would be 12.5
mm and 35 °C. The CCD might have nc = 3 runs with these settings mixed in with
the remaining runs. One benefit of performing multiple tests at those central values
is that the magnitude of the experimental errors can be measured in a manner
similar to measuring process capability in Xbar & R charting (Chapter 4). One can
simply take the sample standard deviation, s, of the response values from the center
point runs.
Advanced readers may realize that the quantity s ÷ c4 is an “assumption-free”
estimate of the random error standard deviation, σ0. This estimate can be compared
with the one derivable from regression (Chapter 15), providing a useful way to
evaluate the lack of fit of the fitted model form in addition to the Adjusted R2.
This follows because the regression estimate of σ0 contains contributions from
model misspecification and the random error. The quantity s ÷ c4 only reflects
random or experimental errors and is not affected by the choice of fitted model form.
The phrase “star points” refers to experimental runs in which a single factor is
set to αC or –αC while the other factors are set at their mid-values. The last CCD
component part consists of two star points for every factor. One desirable feature
of CCDs is that the value of αC can be adjusted by the method user. The statistical
properties of the CCD based RSM method are often considered acceptable for 0.5
< αC < sqrt[m], where m is the number of factors.
Table 13.13. Two factor central composite design (CCD) in standard order
Standard Order A B
1 –1 –1
2 1 –1 ← regular fractional factorial part
3 –1 1
4 1 1
5 0 0
6 0 0 ← three “center points”
7 0 0
8 αC 0
9 0 αC ← “star” points
10 –αC 0
11 0 –αC
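The following is a minimal sketch in Python of this construction for two factors. It stacks the factorial portion, nc center points, and the star points in standard order; the ordering of the star points differs slightly from Table 13.13, and the run order should be randomized before experimentation.

# Minimal sketch: assemble a two factor CCD from its three components
import itertools
import numpy as np

m = 2            # number of factors
n_c = 3          # number of center points
alpha_c = 1.41   # star point distance, chosen by the method user

factorial = np.array(list(itertools.product([-1, 1], repeat=m)))   # 2^m runs
centers = np.zeros((n_c, m))                                       # n_c center points
stars = np.vstack([alpha_c * np.eye(m), -alpha_c * np.eye(m)])     # 2m star points

ccd = np.vstack([factorial, centers, stars])   # standard order, as in Table 13.13
print(ccd)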
Box and Behnken (1960) generated BBD arrays by combining two components
as shown in Table 13.14. The first component itself was the combination of two
level arrays and sub-columns of zeros. In all the examples in this book, the two
level arrays are two factor full factorials.
In some cases, the sub-columns of zeros were deployed such that each factor
was associated with one sub-column as shown in Table 13.14. Advanced readers
may be interested to learn that the general structure of the zero sub-columns itself
Table 13.14. Three factor Box Behnken design (BBD) in standard order
Standard Order A B C
1 –1 –1 0
2 1 –1 0 ← first repetition
3 –1 1 0
4 1 1 0
5 –1 0 –1
6 1 0 –1 ← second repetition
7 –1 0 1
8 1 0 1
9 0 –1 –1
10 0 1 –1 ← third repetition
11 0 –1 1
12 0 1 1
13 0 0 0
14 0 0 0 ← three center points
15 0 0 0
a fundamental limitation of the fitted model form in its ability to replicate the
twists and turns of the true system input-output relationships.
The formula developed by Allen et al. (2003) suggests that prediction errors are
undefined or infinite if the number of runs, n, is less than the number of terms in
the fitted model, k. This suggests a lower limit on the possible number of runs that
can be used. Fortunately, the number of runs is otherwise unconstrained. The
formula predicts that as the number of runs increases, the expected prediction
errors decrease. This flexibility in the number of runs that can be used may be
considered a major advantage of EIMSE designs over CCDs or BBDs. Advanced
readers may realize that BBDs are a subset of the EIMSE designs in the sense that,
for specific assumption choices, EIMSE designs also minimize the expected bias.
This section explores concepts from Allen et al. (2003) and, therefore, previews
material in Chapter 18. It is relevant to decisions about which experimental array
should be used to achieve the desired prediction accuracy. Response surface
methods (RSM) generate prediction models, yest(x) intended to predict accurately
the prototype system’s input-output relationships. Note that, in analyzing the
general method, it is probably not obvious which combinations of settings, x, will
require predictions in the subjective optimization in the last step.
The phrase “prediction point” refers to a combination of settings, x, at which
prediction is of potential interest. The phrase “region of interest” refers to a set of
prediction points, R. This name derives from the fact that possible settings define a
vector space and the settings of interest define a region in that space.
The prediction model, yest,r(x), with the extra subscript, is called an “empirical
model” since it is derived from data. If there is only one response, then the
subscript is omitted. Empirical models can derive from screening methods or
standard response surface methods or from many other procedures including those that
involve so-called “neural nets” (see Chapter 16).
The empirical model, yest(x), is intended to predict the average prototype system
response at the prediction point x. Ideally, it can predict the engineered system
response at x. Through the logical construct of a thought experiment, it is possible
to develop an expectation of the prediction errors that will result from performing
experiments, fitting a model, and using that model to make a prediction. This
expectation can be derived even before real testing in an application begins. In a
thought experiment, one can assume that one knows the actual average response that
the prototype or engineered system would give at the point x, ytrue(x).
The “true response” or ytrue(x) at the point x is the imagined actual value of the
average response at x. In the real world, we will likely never know ytrue(x), but it
can be a convenient construct for thought experiments and decision support for
RSM. The “prediction errors” at the point x, ε(x), are the difference between the
true average response and the empirical model prediction at x, i.e., ε(x) = ytrue(x) –
yest(x). Since ytrue(x) will likely never be known in real world problems, ε(x) will
likely also not be known. Still, it may be useful in thought experiments pertinent to
method selection to make assumptions about ε(x).
Clearly, the prediction errors for a given response will depend on our beliefs
about the true system being studied or, equivalently, about the properties of the
true model, ytrue(x). For example, if the true model is very nonlinear or “bumpy”,
there is no way that a second order polynomial can achieve low prediction errors.
Figure 13.5 below illustrates this concept.
[Figure 13.5: panels (a) and (b) each plot ytrue(x), yest(x), and the prediction error ε(x) over x from –1 to 1, with the response axis running from 0 to 30.]
Figure 13.5. Prediction errors for true models with “bumpiness” (a) low and (b) high
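The following is a minimal sketch in Python of this thought experiment. The “true” model is a hypothetical bumpy function, and the run settings and second order fit are chosen only to illustrate how ε(x) can be examined before any real testing.

# Minimal sketch: prediction errors when a quadratic is fitted to a bumpy true model
import numpy as np

def y_true(x):
    return 15.0 + 5.0 * x + 6.0 * np.sin(4.0 * x)   # hypothetical bumpy true model

x_runs = np.linspace(-1.0, 1.0, 7)                  # experimental input settings
X = np.column_stack([np.ones_like(x_runs), x_runs, x_runs**2])
beta_est = np.linalg.lstsq(X, y_true(x_runs), rcond=None)[0]

x_pred = np.linspace(-1.0, 1.0, 21)                 # prediction points in the region R
X_pred = np.column_stack([np.ones_like(x_pred), x_pred, x_pred**2])
eps = y_true(x_pred) - X_pred @ beta_est            # prediction errors eps(x)
print(np.max(np.abs(eps)))                          # large when the true model is bumpy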
Table 13.15. Decision support for RSM with three and four factors
Run Block x1 x2 x3 x4 x5
1 1 0 0 0 0 0
2 1 -1 -1 1 1 1
3 1 0 0 0 0 0
4 1 1 -1 1 1 -1
5 1 1 -1 -1 1 1
6 1 -1 -1 -1 -1 1
7 1 1 -1 -1 -1 -1
8 1 1 1 1 1 1
9 1 0 0 0 0 0
10 1 -1 -1 -1 1 -1
11 1 0 0 0 0 0
12 1 -1 1 -1 -1 -1
13 1 0 0 0 0 0
14 1 1 1 -1 1 -1
15 1 -1 -1 1 -1 -1
16 1 1 1 1 -1 -1
17 1 -1 1 1 -1 1
18 1 0 0 0 0 0
19 1 -1 1 1 1 -1
20 1 1 -1 1 -1 1
21 1 -1 1 -1 1 1
22 1 1 1 -1 -1 1
23 2 0 0 0 -αC 0
24 2 0 0 0 αC 0
25 2 -αC 0 0 0 0
26 2 0 0 -αC 0 0
27 2 0 0 0 0 0
28 2 0 0 0 0 -αC
29 2 0 αC 0 0 0
30 2 αC 0 0 0 0
31 2 0 0 0 0 αC
32 2 0 -αC 0 0 0
33 2 0 0 αC 0 0
R B x1 x2 x3 x4 x5 x6 R B x1 x2 x3 x4 x5 x6
1 1 -1 1 1 -1 -1 -1 28 1 1 -1 -1 1 -1 -1
2 1 -1 -1 1 1 1 1 29 1 0 0 0 0 0 0
3 1 -1 1 1 1 -1 1 30 1 1 1 1 1 -1 -1
4 1 0 0 0 0 0 0 31 1 -1 -1 1 -1 -1 1
5 1 -1 1 -1 -1 1 -1 32 1 0 0 0 0 0 0
6 1 1 1 -1 1 -1 1 33 1 0 0 0 0 0 0
7 1 1 -1 -1 -1 1 -1 34 1 1 1 -1 -1 1 1
8 1 1 -1 1 -1 -1 -1 35 1 1 -1 -1 1 1 1
9 1 -1 1 1 1 1 -1 36 1 1 1 1 1 1 1
10 1 0 0 0 0 0 0 37 1 0 0 0 0 0 0
11 1 -1 -1 -1 -1 -1 -1 38 1 1 -1 -1 -1 -1 1
12 1 -1 -1 -1 1 1 -1 39 1 1 1 1 -1 1 -1
13 1 1 -1 1 -1 1 1 40 1 -1 -1 -1 1 -1 1
14 1 -1 1 -1 -1 -1 1 41 2 0 αC 0 0 0 0
15 1 1 1 -1 -1 -1 -1 42 2 0 0 0 0 0 0
16 1 1 -1 1 1 1 -1 43 2 αC 0 0 0 0 0
17 1 1 1 -1 1 1 -1 44 2 0 0 0 0 0 -αC
18 1 -1 -1 1 -1 1 -1 45 2 0 0 0 αC 0 0
19 1 -1 1 1 -1 1 1 46 2 0 -αC 0 0 0 0
20 1 -1 1 -1 1 1 1 47 2 0 0 0 0 αC 0
21 1 1 1 1 -1 -1 1 48 2 0 0 0 0 0 0
22 1 0 0 0 0 0 0 49 2 0 0 0 0 -αC 0
23 1 0 0 0 0 0 0 50 2 0 0 0 -αC 0 0
24 1 -1 1 -1 1 -1 -1 51 2 0 0 -αC 0 0 0
25 1 -1 -1 1 1 -1 -1 52 2 -αC 0 0 0 0 0
26 1 1 -1 1 1 -1 1 53 2 0 0 αC 0 0 0
27 1 -1 -1 -1 -1 1 1 54 2 0 0 0 0 0 αC
Run x1 x2 x3 x4 x5 x6 Run x1 x2 x3 x4 x5 x6
1 -1 0 0 -1 -1 0 28 0 0 0 0 0 0
2 0 -1 0 0 -1 -1 29 1 0 0 -1 1 0
3 0 -1 1 0 1 0 30 0 -1 1 0 -1 0
4 1 0 0 -1 -1 0 31 1 1 0 1 0 0
5 0 0 -1 1 0 1 32 0 -1 0 0 1 1
6 -1 0 0 1 -1 0 33 1 0 1 0 0 1
7 0 0 -1 1 0 -1 34 -1 0 -1 0 0 -1
8 0 0 1 1 0 -1 35 1 -1 0 1 0 0
9 0 0 1 -1 0 -1 36 0 1 0 0 -1 1
10 1 1 0 -1 0 0 37 -1 1 0 1 0 0
11 0 0 0 0 0 0 38 0 0 0 0 0 0
12 -1 -1 0 1 0 0 39 0 1 -1 0 1 0
13 1 0 1 0 0 -1 40 -1 0 0 -1 1 0
14 0 1 -1 0 -1 0 41 -1 0 -1 0 0 1
15 1 0 -1 0 0 1 42 0 0 0 0 0 0
16 -1 0 1 0 0 -1 43 1 0 -1 0 0 -1
17 1 0 0 1 1 0 44 0 1 0 0 1 -1
18 0 0 -1 -1 0 -1 45 -1 -1 0 -1 0 0
19 0 0 0 0 0 0 46 -1 0 0 1 1 0
20 0 -1 0 0 -1 1 47 0 0 1 -1 0 1
21 0 -1 -1 0 -1 0 48 0 1 1 0 1 0
22 0 0 0 0 0 0 49 1 -1 0 -1 0 0
23 0 1 0 0 1 1 50 0 -1 0 0 1 -1
24 -1 1 0 -1 0 0 51 0 0 1 1 0 1
25 0 1 1 0 -1 0 52 0 1 0 0 -1 -1
26 1 0 0 1 -1 0 53 -1 0 1 0 0 1
27 0 0 -1 -1 0 1 54 0 -1 -1 0 1 0
Run x1 x2 x3 x4 x5 x6 x7 Run x1 x2 x3 x4 x5 x6 x7
1 -1 0 -1 0 -1 0 0 32 1 0 -1 0 -1 0 0
2 1 -1 0 1 0 0 0 33 -1 -1 0 -1 0 0 0
3 0 -1 1 0 0 1 0 34 1 0 0 0 0 -1 -1
4 0 0 1 -1 0 0 1 35 1 0 1 0 1 0 0
5 -1 1 0 -1 0 0 0 36 0 0 -1 -1 0 0 1
6 0 1 1 0 0 -1 0 37 0 0 0 1 1 -1 0
7 0 1 0 0 -1 0 1 38 -1 0 1 0 1 0 0
8 0 0 0 0 0 0 0 39 0 0 -1 1 0 0 1
9 1 1 0 1 0 0 0 40 1 1 0 -1 0 0 0
10 -1 0 0 0 0 -1 -1 41 1 0 1 0 -1 0 0
11 0 0 0 1 1 1 0 42 0 -1 1 0 0 -1 0
12 0 1 0 0 -1 0 -1 43 1 0 0 0 0 1 1
13 0 0 0 1 -1 1 0 44 0 0 1 1 0 0 -1
14 -1 0 -1 0 1 0 0 45 0 0 1 -1 0 0 -1
15 0 1 1 0 0 1 0 46 0 -1 -1 0 0 1 0
16 0 0 0 -1 -1 -1 0 47 0 -1 -1 0 0 -1 0
17 0 0 -1 -1 0 0 -1 48 0 -1 0 0 -1 0 1
18 1 0 -1 0 1 0 0 49 0 -1 0 0 1 0 1
19 0 0 0 0 0 0 0 50 1 0 0 0 0 1 -1
20 0 0 0 0 0 0 0 51 1 -1 0 -1 0 0 0
21 0 0 0 -1 1 1 0 52 0 0 0 -1 1 -1 0
22 0 0 0 1 -1 -1 0 53 0 0 0 0 0 0 0
23 0 1 0 0 1 0 -1 54 0 0 1 1 0 0 1
24 -1 0 0 0 0 -1 1 55 -1 -1 0 1 0 0 0
25 -1 0 1 0 -1 0 0 56 0 1 -1 0 0 1 0
26 0 0 0 0 0 0 0 57 -1 0 0 0 0 1 -1
27 0 1 -1 0 0 -1 0 58 -1 0 0 0 0 1 1
28 -1 1 0 1 0 0 0 59 0 -1 0 0 1 0 -1
29 1 0 0 0 0 -1 1 60 0 -1 0 0 -1 0 -1
30 0 1 0 0 1 0 1 61 0 0 -1 1 0 0 -1
31 0 0 0 -1 -1 1 0 62 0 0 0 0 0 0 0
13.10 References
Allen TT, Yu L, Schmitz J (2003) The Expected Integrated Mean Squared Error
Experimental Design Criterion Applied to Die Casting Machine Design.
Journal of the Royal Statistical Society, Series C: Applied Statistics 52:1-15
Box GEP, Behnken DW (1960) Some New Three-Level Designs for the Study of
Quantitative Variables. Technometrics 30:1-40
Box GEP, Draper NR (1987) Empirical Model-Building and Response Surfaces.
Wiley, New York
Box GEP, Wilson KB (1951) On the Experimental Attainment of Optimum
Conditions. Journal of the Royal Statistical Society, Series B 13:1-45
Choudhury AK (1997) Study of the Effect of Die Casting Machine Upon Die
Deflections. Master’s thesis, Industrial & Systems Engineering, The Ohio
State University, Columbus
Myers R, Montgomery D (2001) Response Surface Methodology, 5th edn. John
Wiley & Sons, Inc., Hoboken, NJ
13.11 Problems
1. Which is correct and most complete?
a. RSM is mainly relevant for finding which factor changes affect a
response.
b. Central composite designs have at most three distinct levels of each
factor.
c. Sequential response surface methods are based on central composite
designs.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
Table 13.19. (a) A two factor DOE, (b) - (d) model forms, and (e) ranges
(a)
Run A B
1 -1 -1
2 1 -1
3 -1 1
4 0 0
5 1 1
6 -1.4 0
7 0 0
8 0 1.4
9 0 -1.4

(b) y(x) = β1 + β2 A + β3 B
(c) y(x) = β1 + β2 A + β3 B + β4 A B
(d) y(x) = full quadratic polynomial in A and B

(e) Factor (-1) (+1)
A 10.0 N 14.0 N
B 2.5 mm 4.5 mm
7. How many factors and levels are involved in the paper airplane example?
c. Often, EIMSE designs are not available with fewer runs than CCDs
or BBDs.
d. Central composite designs include fractional factorial, star, and center
points.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
For problems 12 and 13, consider the array in Table 13.1 (a) and the responses 7, 5,
2, 6, 11, 4, 6, 6, 8, 6 for runs 1, 2, …, 10 respectively. The relevant model form is a
full quadratic polynomial.
12. Which is correct and most complete (within the implied uncertainty)?
a. A full quadratic polynomial cannot be fitted since (X′X)–1X′ is
undefined.
b. RSM fitted coefficients are 5.88, 1.83, 0.19, 0.25, 0.06, and 2.75.
c. RSM fitted coefficients are 5.88, 2.83, 0.19, 0.25, 1.06, and 2.75.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
13. Which is correct and most complete (within the implied uncertainty)?
a. Adjusted R2 calculated is 0.99, so a high fraction of the variation is
unexplained.
b. Adjusted R2 calculated is 0.99, so a high fraction of the variation is
explained.
For Question 14, suppose that you had performed the first seven runs of a central
composite design in two factors, and the average and standard deviation of the only
critical response for the three repeated center points are 10.5 and 2.1 respectively.
Further, suppose that the average response for the other four runs is 17.5.
14. Which is correct and most complete (within the implied uncertainty)?
a. F0 = 29 and lack of fit is detected.
b. F0.05,1,nC – 1 = 3.29.
c. F0 = 19.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
16. Which is correct and most complete based on how the designs are constructed?
a. Central composite designs do not, in general, contain center points.
b. A BBD design with seven factors contains the run -1, -1, -1, -1, -1, -1,
-1.
c. Central composite designs contain Resolution V fractional factorials.
d. CCDs and BBDs were generated originally using a computer.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
14
DOE: Robust Design
14.1 Introduction
In Chapter 4, it is claimed that perhaps the majority of quality problems are caused
by variation in quality characteristics. The evidence is that typically only a small
fraction of units fail to conform to specifications. If characteristic values were
consistent, then either 100% of units would conform or 0%. Robust design
methods seek to reduce the effects of input variation on a system’s outputs to
improve quality. Therefore, they are relevant when one is interested in designing a
system that gives consistent outputs despite the variation of uncontrollable factors.
Taguchi (1993) created several “Taguchi Methods” (TM) and concepts that
strongly influenced design of experiments (DOE) method development related to
robust design. He defined “noise factors” as system inputs, z, that are not
controllable by decision-makers during normal engineered system operation but
which are controllable during experimentation in the prototype system. For
example, variation in materials can be controlled during testing by buying
expensive materials that are not usually available for production. Let mn be the
number of noise factors so that z is an mn dimensional vector. Taguchi further
defined “control factors” as system inputs, xc, that are controllable both during
system operation and during experimentation. For example, the voltage setting on a
welding robot is fully controllable. Let mc be the number of control factors so xc is
an mc dimensional vector.
Consider that the rth quality characteristic can be written as yest,r(xc, z, ε) to
emphasize its dependence on control factors, noise factors, and other factors that
are completely uncontrollable, ε. Then, the goal of robust engineering is to adjust
the settings in xc so that the characteristic’s value is within its specification limits,
LSLr and USLr, and all other characteristics are within their limits consistently.
Figure 14.1 (a) shows a case in which there is only one noise factor, z, and the
control factor combination, x1, is being considered. For simplicity, it is also
assumed that there is only one quality characteristic whose subscript is omitted.
Also, sources of variation other than z do not exist, i.e., ε = 0, and the relationship
between the quality characteristic, yest(x1, z, 0), and z is as shown.
[Figure 14.1: (a) the quality characteristic y(x1, z) plotted against the noise factor z, with LSL, USL, and a particular value z1 marked; (b) y(x1, z) and y(x2, z) compared, showing the fraction nonconforming p(x2).]
Figure 14.1 (a) focuses on a particular value, z = z1, and the associated quality
characteristic value yest(x1, z1, 0), which is below the specification limit. Also,
Figure 14.1 (a) shows a distribution for the noise factor under ordinary operations
and how this distribution translates into a distribution of the quality characteristic.
It also shows the fraction nonconforming, p(x1), for this situation.
Figure 14.1 (b) shows how different choices of control factor combinations
could result in different quality levels. Because of the nature of the system being
studied, the choice x2 results in less sensitivity of characteristic values than if x1 is
used. As in control charting, sensitivity can be measured by the width of the
distribution of the quality characteristic, i.e., the standard deviation, σ, or the
process capability. It is also more directly measurable by the fraction
nonconforming. It can be said that x2 settings are more robust than x1 settings
because p(x2) < p(x1).
In this chapter, multiple methods are presented, each with the goal of deriving
robust system settings. First, methods are presented that are an extension of
response surface methods (RSM) and are therefore similar to techniques in Lucas
(1994) and Myers and Montgomery (2001). The first methods presented here are
also based on formal optimization and expected profit maximization, so we
refer to them as “robust design based on profit maximization” (RDPM). These
methods were first proposed in Allen et al. (2001). Next, commonly used “static”
Taguchi Methods are presented, which offer advantages in some cases.
Question: Write out a functional form which is a second order polynomial with
two control factors and two noise factors and calculate the related c vector and C
and D matrices assuming there is only one quality characteristic, so the index r is
dropped for the remainder of this chapter.
[Figure: the cumulative normal distribution function Φ(x, µ, σ), increasing from 0.0 to 1.0 as x runs from µ – 3σ to µ + 3σ.]
Step 1: If models of the pr(xc) for all quality characteristics are available, go to Step 6.
Otherwise continue.
Step 2: For each quality characteristic for which pr(xc) is not available, include the
associated response index in the set S1 if the response is a quality
characteristic. Include the response in the set S2 if the response is the fraction
nonconforming with respect to at least one type of nonconformity. Also,
identify the specification limits, LSLk and USLk, for the responses in the set S1.
Step 3: Apply a response surface method (all steps except the last, optimization step)
to obtain an empirical model of all quality characteristics including the
production rate, yest,r(xc, z), for r = 1,…,q.
Step 4: Estimate the expected value, µz,i, and standard deviation, σz,i, of all the noise
factors relevant under normal system operation for all i = 1, …, mn.
Step 5: Estimate the failure probabilities as a function of the control factors, pr(xc) for
all quality characteristics, r ∈ S1, using the formulas in Equation (14.3),
(14.4), and (14.5).
Step 6: Obtain cost information in the form of revenue per unit, w0, and rework and/or
scrap costs per defect or nonconformity of type r, wr, for r = 1,…,q.
Step 7: Maximize the profit, Profit(xc), in Equation (14.3) as a function of xc. (A code
sketch of the Step 5 through Step 7 structure follows this list.)
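The following is a minimal sketch in Python of the Step 5 through Step 7 structure for a single quality characteristic, one control factor, and one noise factor. The prediction model, production rate, noise distribution, and cost figures are hypothetical, and the failure probability is computed directly from the normal distribution rather than from the specific formulas in Equations (14.3) through (14.6).

# Minimal sketch: failure probability model and expected profit maximization
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

mu_z, sigma_z, sigma_eps = 0.0, 0.5, 0.5   # hypothetical noise and error parameters
LSL = 4.0                                  # lower specification limit
w0, w1 = 100.0, 250.0                      # revenue per unit and rework cost per defect

def p_fail(xc):
    # P(y < LSL) for a hypothetical model y(xc, z) = 5 + 2*xc - 1.5*z + epsilon,
    # assuming y is normally distributed under normal operation
    mean = 5.0 + 2.0 * xc - 1.5 * mu_z
    sd = np.sqrt((1.5 * sigma_z)**2 + sigma_eps**2)
    return norm.cdf(LSL, loc=mean, scale=sd)

def neg_profit(xc):
    rate = 0.5 + 0.25 * xc                 # hypothetical production rate (units/min)
    return -rate * (w0 - w1 * p_fail(xc))  # negated expected profit per minute

res = minimize_scalar(neg_profit, bounds=(-1.0, 1.0), method="bounded")
print(res.x, -res.fun)                     # recommended setting and expected profit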
In this section, the proposed methods are illustrated through their application to the
design of a robotic gas metal arc-welding (GMAW) cell. This case study is based
on a research study at the Ohio State University documented in Allen et al. (2001)
and Allen et al. (2002).
In that study, there were mn = 2 noise factors, z1 and z2, and m – mn = 4 control
factors, x1,…,x4. These factors are shown in Figure 14.3. We chose two-step
response surface methods because we were not sure that the factor ranges
contained the control and noise settings associated with desirable arc welding
systems, taking into account the particular power supply and type of material. The
two-step approach offered the potentially useful option of performing only 40 tests
and stopping with both screening related results and information about two factor
interactions. The central composite design shown with two blocks is given in Table
14.1.
In this study, there were three relevant responses. The rate of producing units
was directly proportional to the control factor x1. Therefore, before doing
experiments, we knew that yest,0(xc) = 0.025 x1 in millions of parts. The other two
relevant responses were quality characteristics of the parts produced by the system.
x2
x1
z2
x4 x3
z1
Figure 14.3. The control and noise factors for the arc welding example
Step 1: In this application, models of p1(xc) and p2(xc) were not readily available.
Therefore, it was necessary to go to Step 2.
Step 2: In this step, two relevant quality characteristics were identified corresponding to
the main ways the units failed inspection or “failure modes”. In order to save
inspection costs and create continuous criteria, the team developed a continuous
(1-10) rating system, based on visual inspection, for each of the types described in
the first table in Chapter 10. Also, the specification limits LSL1 = LSL2 = 8.0 were
assigned, together with upper limits USL1 and USL2. Therefore, higher ratings
corresponded to better welds.
Step 3: The first 40 experiments shown in Table 14.1 were performed using the central
composite design. After the first 40 runs, there were nc = 8 center points and nf = 32
fractional factorial runs. MSLOF1 = 19.6, yvariance,c,1 = 0.21, and F0,1 = 91.5 >>
F0.05,1,7 = 5.59, so the remainder of the runs in the table below were needed. For
thoroughness, we calculated MSLOF2 = 24.9, yvariance,c,2 = 0.21, and F0,2 = 116.3 >>
F0.05,1,7 = 5.59. Therefore, curvature is significant for both responses. Therefore,
also, we performed the remainder of the runs given below. After all of the runs
were performed, we estimated coefficients using βest,r = AYr for r = 1 and 2,
where A = (X′X)–1X′. These multiplications were performed using matrix functions
in Excel (“Ctrl-Shift-Enter” instead of OK is needed for assigning function
values to multiple cells), but the coefficients could have been derived using many
choices of popular statistical software. Then, we rearranged the coefficients into
the form listed in Equation (14.7) and, for the other response related to a quality
characteristic, in Equation (14.8) below.
Step 4: The expected value, µz,i, and the standard deviations, σz,i, of the noise factors
were based on verbal descriptions from the engineers on our team. Gaps larger
than 1.0 mm and offsets larger than ±1.0 of the wire diameters were considered
unlikely, where 1.0 WD corresponds to 1.143 mm. Therefore, it was assumed
that z1 was N(mean=0.25,standard deviation=0.25) distributed in mm and z2 was
N(mean=0,standard deviation=0.5) distributed in wire diameters with zero
correlation across runs and between the gaps and offsets. Note that these
assumptions gave rise to some negative values of gap, which were physically
impossible but were necessary for the analytical formula in Equation (14.6) to
apply. In addition, it was assumed that ε1 and ε2 were both N(mean=0.0,
standard deviation=0.5) based on the sample variances (both roughly equal to
0.25 rating units) of the repeated center points in our experimental design.
Step 5: Based on the “standard assumptions” the failure probability functions were
found to be as listed in Equation (14.9).
Step 6: The team selected (subjectively since there was no real engineered system), w0
= $100 revenue per part, w1 = $250 per unit and w2 = $100 per unit based on
rework costs. Burning through the unit was more than twice as expensive to
repair since additional metal needed to be added to the part structure as well as
the weld. The travel speed was related to the number of parts per minute by the
simple relation, yest,0(xc) = 0.025 x1 in millions of parts, where x1 was in
millimeters per minute.
Step 7: The formulation then became: maximize 0.025 x1 [$100 – $250 p1(xc) – $100
p2(xc)], where p1(xc) and p2(xc) were given in Equation (14.3). The following
additional constraints, listed in Equation (14.10) were added because the
prediction model was only accurate over the region covered by the experimental
design in Table 14.1. (continued)
Step 8: This problem was solved using the Excel solver, which uses GRG2 (Smith
and Lasdon 1992) and the solution was x1 = 1533.3 mm/min, x2 = 6.83, x3 =
3.18 mm, and x4 = 15.2 mm, which achieved an expected profit of $277.6/min.
The derived settings offer a compromise between making units at a high rate
and maintaining consistent quality characteristic values despite the variation
of the noise factors (gap and offset).
b0,1 = –179.9

C1 = [  0.00   0.00
       –0.50  –0.75
       –0.31   0.62
        0.25  –0.25 ]

D1 = [ –0.88  –1.00
        0.00  –0.88 ]

b0,2 = 88.04

C2 = [  0.00   0.00
        1.38   0.50
       –0.16   0.00
       –0.12  –0.50 ]

D2 = [ –3.78   0.00
        0.00  –0.78 ]

with J = [ 0.25  0
           0     0.25 ]
Question: What is the relationship between six sigma and RDPM methods?
Answer: RDPM uses RSM and specific formulas to model directly the standard
deviation or “sigma” of responses as a function of factors that can be controlled.
Then, it uses these models to derive settings and sigma levels that generate the
highest possible system profits. Applying RDPM could be a useful component in a six
sigma type improvement system.
Step 1. Plan the experiment using so-called “product” arrays. Product arrays are
based on all combinations of runs from an “inner array” and an “outer array”,
which are smaller arrays. Table 14.2 shows an example of a product array
based on inner and outer arrays which are four run regular fractional
factorials. Taguchi uses many combinations of inner and outer arrays.
Often the 18 run array in Table 12.12 is used for the inner array. Taguchi
also introduced a terminology such that the regular design in Table 14.2 is
called an “L4” design.
Table 14.2 shows the same experimental plan in two formats. In total,
there are 16 runs. The notation implies that there is a single response
variable with 16 response values. Taguchi assigns control factors to the inner
array, e.g., factors A, B, and C, and the noise factors to the outer array, e.g.,
factors D, E, and F. In this way, each row in the product format in Table
14.2 (a) describes the consistency and quality associated with a single
control factor combination. Note that writing out the experimental plan in
“combined array” format as in Table 14.2 (b) can be helpful for ensuring
that the runs are performed in a randomized order. The array in Table 14.2
(b) is not randomized here to clarify that the experimental plan is the same as the
one in Table 14.2 (a).
Step 2. Once the tests have been completed according to the experimental design,
Taguchi based the analysis on so-called “signal-to-noise ratios” (SNR) that
emphasize consistent performance regardless of the noise factor settings for
each control factor combination. Probably the three most commonly used
signal-to-noise ratios are “smaller-the-better” (SNRS), “larger-the-better”
(SNRL), and “nominal-is-best” (SNRN). These are appropriate for cases in
which low, high, and nominal values of the quality characteristic are most
desirable, respectively. Formulas for the characteristic values are:
SNRS = -10 Log10 [ mean of sum of squares of measured data ] (14.11)
SNRL = -10 Log10 [ mean of sum squares of reciprocal of measured data]
SNRN = -10 Log10 [ mean of sum of squares of {measured - ideal} ] .
For example, using the experimental plan in Table 14.2, the SNRS value for
the first inner array run would equal:
-10 Log10 [ (y1² + y2² + y3² + y4²) ÷ 4 ].
Similar calculations are then completed for each inner array combination
(see the code sketch following Table 14.2).
Step 3. Create so-called “marginal plots” by graphing the average SNR value for
each of the control factor settings. For example, the marginal plot for factor A
and the design in Table 14.2 would be based on the SNR average of the
first and the third control factor combination runs and the second and
fourth runs.
Step 4. Pick the factor settings that are most promising according to the marginal
plots. For factors that do not appear to strongly influence the SNR,
Taguchi suggests using other considerations. In particular, marginal plots
based on the average response are often used to break ties using subjective
decision-making.
Table 14.2. A Taguchi product array: (a) in product format and (b) in standard order
(a) Product format

Outer array      D:  -1   1  -1   1
                 E:  -1  -1   1   1
                 F:   1  -1  -1   1
Inner array
Run  A   B   C
1   -1  -1   1      y1   y2   y3   y4
2    1  -1  -1      y5   y6   y7   y8
3   -1   1  -1      y9   y10  y11  y12
4    1   1   1      y13  y14  y15  y16

(b) Combined array format (standard order)

Run  A   B   C   D   E   F   Y
1   -1  -1   1  -1  -1   1   y1
2    1  -1  -1  -1  -1   1   y2
3   -1   1  -1  -1  -1   1   y3
4    1   1   1  -1  -1   1   y4
5   -1  -1   1   1  -1  -1   y5
6    1  -1  -1   1  -1  -1   y6
7   -1   1  -1   1  -1  -1   y7
8    1   1   1   1  -1  -1   y8
9   -1  -1   1  -1   1  -1   y9
10   1  -1  -1  -1   1  -1   y10
11  -1   1  -1  -1   1  -1   y11
12   1   1   1  -1   1  -1   y12
13  -1  -1   1   1   1   1   y13
14   1  -1  -1   1   1   1   y14
15  -1   1  -1   1   1   1   y15
16   1   1   1   1   1   1   y16
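The following is a minimal sketch in Python of the Step 2 and Step 3 calculations for a product array with a four run inner array and a four run outer array, as in Table 14.2; the 16 response values are hypothetical.

# Minimal sketch: signal-to-noise ratios and a marginal comparison for factor A
import numpy as np

# Rows: inner array runs 1-4; columns: outer array (noise) runs 1-4
y = np.array([[12.1, 11.4, 12.8, 10.9],
              [ 9.6, 13.2,  8.7, 14.1],
              [11.8, 12.0, 11.5, 11.9],
              [ 7.4, 15.3,  6.8, 16.0]])

snr_smaller = -10 * np.log10(np.mean(y**2, axis=1))           # SNR_S per inner run
snr_larger  = -10 * np.log10(np.mean((1.0 / y)**2, axis=1))   # SNR_L per inner run
target = 12.0                                                 # ideal value for SNR_N
snr_nominal = -10 * np.log10(np.mean((y - target)**2, axis=1))

# Step 3: average SNR_N at each level of factor A (inner array column A = -1, 1, -1, 1)
A_levels = np.array([-1, 1, -1, 1])
for level in (-1, 1):
    print(level, snr_nominal[A_levels == level].mean())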
Question: Without performing any new tests, sketch what the application of
Taguchi Methods might look like for the problem used to illustrate RDPM.
Table 14.3. Data for the Taguchi experiment from RSM model predictions
(Each row gives the L9 inner array control factor settings: TS (mm/min), Arc L. Ratio, (mm), and CTTW (mm); then the predicted ratings for quality characteristics 1 and 2, each at the four L4 outer array (Gap, Offset) combinations (0.0, -0.5), (1.0, -0.5), (0.0, 0.5), and (1.0, 0.5); and finally the two signal-to-noise ratios SNRL,1 and SNRL,2.)
1270.0 6.0 3.2 14.0 9.7 9.3 9.6 8.2 8.3 4.8 7.2 3.6 19.3 14.2
1270.0 7.0 4.0 15.0 10.7 9.8 10.1 8.2 9.1 6.6 7.9 5.5 19.6 16.8
1270.0 8.0 4.8 16.0 9.2 7.8 8.1 5.7 9.3 7.9 8.1 6.8 17.4 17.9
1524.0 6.0 4.0 16.0 9.5 9.8 9.4 8.7 9.5 5.3 7.5 3.3 19.4 14.1
1524.0 7.0 4.8 15.0 10.0 9.0 9.5 7.9 7.5 4.8 7.0 4.3 19.1 14.8
1524.0 8.0 3.2 15.0 9.1 8.5 7.3 5.6 9.6 8.4 9.1 7.9 17.2 18.8
1778.0 6.0 4.8 15.0 8.9 9.3 9.6 8.9 7.6 3.1 6.2 1.7 19.2 9.2
1778.0 7.0 3.2 16.0 8.3 8.9 7.0 6.6 8.4 5.5 7.0 4.1 17.5 15.0
1778.0 8.0 4.0 14.0 5.5 4.8 4.4 2.7 7.6 6.1 7.7 6.2 11.8 16.6
Because of the way the data were generated, the assumptions of normality,
independence and constancy of variance were satisfied so that no transformation of
the data was needed to achieve these goals. Transformations to achieve separability
and additivity were not investigated because Song et al. (1995) state that their
method was not restricted by separability requirements, and the selection of the
transformation to achieve additivity involves significant subjective decision-
making with no guarantee that a feasible transformation was possible. Control
factor settings that were not dominated were identified by inspection of Figure
14.4. The fi,j(l) refers to the mean values of the jth characteristics’ average signal-to-
noise ratio at level l for factor i. For example, all combinations of settings having
arc length equal to 4.0 mm were dominated since at least one other choice of arc
length exists (arc length equal to 3.2 mm) for which both signal-to-noise ratio
averages are larger.
This first step left 12 combinations of control factors. Subsequently, the
formula in Song et al. (1995), which sums across signal-to-noise ratios for different
responses, was used to eliminate four additional combinations. The resulting eight
combinations included x1 = 1270.0 mm/min, x2 = 7.00, x3 = 3.18 mm, and x4 = 15.0
mm, which yielded the highest expected profit among the group equal to
$90.8/min. The combinations also included x1 = 1270.0 mm/min, x2 = 6.0, x3 = 4.80
mm, and x4 = 15.0 mm, with the lowest expected profit equal to $–48.7/min. The
user was expected to select the final process settings using engineering judgment
from the remaining setting combinations, although without the benefit of knowing
the expected profit.
Including the parts per minute Q(x), which is proportional to travel speed, as an
additional criterion increased by a factor of three the number of solutions that were
not dominated. This occurred because setting desirability with respect to the other
criteria consistently declined as travel speed increased. The revised process also
included several settings which were predicted to result in substantially negative
profits, i.e., situations in which expected rework costs would far outweigh sales
revenue.
[Figure 14.4: marginal plots of the average signal-to-noise ratios (vertical axis from 10.0 to 20.0) against the levels of the four control factors: 1270, 1524, and 1778; 6, 7, and 8; 3.2, 4.0, and 4.8; and 14, 15, and 16.]
Figure 14.4. Signal-to-noise ratio marginal plots for the two quality characteristics
Note that an important issue not yet mentioned can make the Taguchi product
array structure highly desirable. The phrase “easy-to-change factors” (ETC) refers
to system inputs with the property that if only their settings are changed, the
marginal cost of each additional experimental run is small. The phrase “hard-to-
change” (HTC) factors refers to system inputs with the property that if any of their
settings is changed, the marginal cost of each additional experimental run is large.
For cases in which factors divide into ETC and HTC factors, the experimental costs
are dominated by the number of distinct combinations of HTC factors, for
example, printing off ten plastic cups and then testing each in different
environments (ETC factors). Since most of the costs relate to making tooling for
distinct cup shapes (HTC factors), printing extra identical cups and testing them
differently is easy and costs little.
Taguchi has remarked that noise factors are often all ETC, and control factors
are often HTC. For cases in which these conditions hold, the product array
structure offers the advantage that the number of distinct HTC combinations in the
associated combined array is relatively small compared with the combinations
required by a typical application of response surface method arrays.
Finally, Lucas (1994) proposed a class of “mixed resolution” composite designs
that can be used in RDPM to save on experimentation costs. The mixed resolution
designs achieved lower numbers of runs by using a special class of fractional
factorials such that the terms in the matrices Dk for k = 1,…,r were not estimable.
Lucas argued that the terms in Dk are of less interest than other terms and are not
estimable with most Taguchi designs. For our case study, the mixed resolution
design (not shown) would have 43 instead of 54 runs. In general, using Lucas
mixed resolution composite designs can help make RSM based alternatives to
Taguchi Methods like RDPM cost competitive even when all noise factors are
ETC.
14.7 References
Allen TT, Ittiwattana W, Richardson RW, Maul G (2001) A Method for Robust
Process Design Based on Direct Minimization of Expected Loss Applied to
Arc Welding. The Journal of Manufacturing Systems 20:329-348
Allen TT, Richardson RW, Tagliabue D, and Maul G (2002) Statistical Process
Design for Robotic GMA Welding of Sheet Metal. The Welding Journal
81(5): 69s-77s
Chen LH, Chen YH (1995) A Computer-Simulation-Oriented Design Procedure
for a Robust and Feasible Job Shop Manufacturing System. Journal of
Manufacturing Systems 14: 1-10
Devor R, Chang T, et al. (1992) Statistical Quality Design and Control, pp 47-57.
Macmillan, New York
Johnson NL, Kotz S, et al. (1995) Continuous Univariate Distributions. John
Wiley, New York
Lucas JM (1994) How to Achieve a Robust Process Using Response Surface
Methodology. Journal of Quality Technology 26: 248-260
Myers R, Montgomery D (2001) Response Surface Methodology, 5th edn. John
Wiley & Sons, Inc., Hoboken, NJ
Nair VN, Pregibon D (1986) A Data Analysis Strategy for Quality Engineering
Experiments. AT&T Technical Journal: 74-84
Rodriguez JF, Renaud JE, et al. (1998) Trust Region Augmented Lagrangian
Methods for Sequential Response Surface Approximation and Optimization.
Transactions of the ASME Journal of Engineering for Industry 120: 58-66
Song AA, Mathur A, et al. (1995) Design of Process Parameters Using Robust
Design Techniques and Multiple Criteria Optimization. IEEE Transactions
on Systems, Man, and Cybernetics 24: 1437-1446
Smith S, Lasdon L (1992) Solving Large Sparse Nonlinear Programs Using
GRG. ORSA Journal on Computing 4(1):2-15
Taguchi G (1987) A System for Experimental Design. UNIPUB, Detroit
Taguchi G (1993) Taguchi Methods: Research and Development. In: Konishi S
(ed), Quality Engineering Series, vol 1. The American Supplier Institute,
Livonia, MI
14.8 Problems
In general, choose the answer that is correct and most complete.
9. Assume that z1 and z2 have means µ1 and µ2 and standard deviations σ1 and σ2
respectively. Also, assume their covariance is zero. What is
Var[(c′ + x′C)z], where c′ = (2, –1), x′ = (x1, x2), and
C = [ 5 2
      2 8 ],
in terms of µ1 and µ2, the standard deviations σ1 and σ2, and no matrices?
12. (Advanced) Extend RDPM to drop the assumption that Dr = 0 for all r.
15
Regression
15.1 Introduction
Regression is a family of curve-fitting methods for (1) predicting average response
performance for new combinations of factors and (2) understanding which factor
changes cause changes in average outputs. In this chapter, the uses of regression
for prediction and performing hypothesis tests are described. Regression methods
are perhaps the most widely used statistics or operations research techniques.
Also, even though some people think of regression as merely the “curve fitting
method” in Excel, the methods are surprisingly subtle with much potential for
misuse (and benefit).
Some might call virtually all curve fitting methods “regression” but, more
commonly, the term refers to a relatively small set of “linear regression” methods.
In linear regression predictions increase like a first order polynomial in the
coefficients. Models fit with terms like β32x1²x4 are still called “linear” because
the term is linear in β32, i.e., if the coefficient β32 increases, the predicted response
increases proportionally. See Chapter 16 for a relatively thorough discussion of
regression vs alternatives.
Note that standard screening using fractional factorials, response surface
methods (RSM), and robust design using profit maximization (RDPM) methods
are all based on regression analysis. Yet, regression modeling is relevant whether
the response data is collected using a randomized experiment or, alternatively, if it
is “on-hand” data from an observational study. In addressing on-hand data,
primary challenges relate to preparing the data for analysis and determining which
terms should be included in the model form.
Section 2 focuses on the simplest regression problem involving a single
response or system output and a single factor or system input and uses it to
illustrate the derivation of the least squares estimation formula. Section 3
describes the challenge of preparing on-hand data for regression analysis including
missing data. Section 4 discusses the generic task of evaluating regression models
and its relation to design of experiments (DOE) theory. Section 5 describes
analysis of variance (ANOVA) followed by multiple t-tests, which is the primary
4 6 200 166 34
5 7 190 198 -8
SSE = 3480
[Figure 15.1 (b): plot of the data and the fitted first order model over x1 from 2 to 8.]
Figure 15.1. Single factor example (a) data and (b) plot of data and 1st order model
The higher the residual, the more concerned one might be that important factors
unexplained by the model are influencing the observation in question. These
concerns could lead us to fit a different model form and/or to investigate whether
the data point in question constitutes an “outlier” that should be removed or changed
because it does not represent the system of interest.
The example in Algorithm 15.1 below illustrates the application of regression
modeling to predict future responses. The phrase “trend analysis” refers to the
application of regression to forecast future occurrences. Such regression modeling
constitutes one of the most popular approaches for predicting demand or revenues.
Answer: The best fit line is yest(x1) = 10 + 20 x1. This clearly minimizes almost any
measure of the summed residuals, since it passes through the average responses at
the two levels. The resulting residuals are –2, +2, +5, and –5. The forecast or
prediction for Month 3 is 10 + 20 × 3 = 70 units.
It is an interesting fact that the residuals for all observations can be written in
vector form as follows. Using the notation from Section 13.2, “y” is a column of
responses, “X” is the design matrix for fitted model based on the data, and
“Errorest” is a vector of the residuals. Then, in vector form, we have
Errorest = y – Xβest . (15.1)
The “sum of squares estimated errors” (SSE) is the sum of squared residual
values and can be written
SSE = (y – Xβest)′(y – Xβest) . (15.2)
For example, for the data in Figure 15.1 (a), βest,1 = –26, and βest,2 = 32, we have
X = [ 1 3        y = [ 70         y – Xβest = [ 0
      1 4              120                       18
      1 5               90                      –44
      1 6              200                       34
      1 7 ]            190 ]                     –8 ]
The example in Figure 15.1 (a) is simple enough that the coefficients βest,1= –26
and βest,2 = 32 can be derived informally by manually plotting the line and then
estimating the intercept and slope. A more general formal curve-fitting approach
X = [ 1 3       X′ = [ 1 1 1 1 1       X′X = [  5  25
      1 4              3 4 5 6 7 ]             25 135 ]
      1 5
      1 6
      1 7 ]      (X′X)–1X′ = [  1.2   0.7  0.2  –0.3  –0.8
                               –0.2  –0.1  0.0   0.1   0.2 ]
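The same calculation can be reproduced with a few lines of matrix code. The following minimal sketch in Python uses the data of Figure 15.1 (a) and returns the coefficients –26 and 32, the residuals listed above, and SSE = 3480.

# Minimal sketch: least squares fit for the single factor example
import numpy as np

x1 = np.array([3, 4, 5, 6, 7], dtype=float)
y = np.array([70, 120, 90, 200, 190], dtype=float)

X = np.column_stack([np.ones_like(x1), x1])   # first order model terms
A = np.linalg.inv(X.T @ X) @ X.T              # (X'X)^-1 X'
beta_est = A @ y                              # [-26.0, 32.0]

residuals = y - X @ beta_est                  # [0, 18, -44, 34, -8]
SSE = float(residuals @ residuals)            # 3480.0
print(beta_est, residuals, SSE)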
In general, removing the data points with missing entries can be the safest, most
conservative approach generating the highest standard of evidence possible.
However, in some cases other strategies are of interest and might even increase the
believability of results. For these cases, a common strategy is to include an average
value for the missing responses and then see how sensitive the final results are to
changes in these made-up values (Strategy 2). Reasons for adopting this second
strategy could be:
1. The missing entries constitute a sizable fraction of entries in the
database and many completed entries would be lost if the data points
were discarded.
2. The most relevant data to the questions being asked contain missing
entries.
Making up data should always be done with caution and clear labelling of what is
made up should be emphasized in any relevant documentation.
Question: Consider the data in Table 15.1 related to predicting the number of sales
in Month 24 with a first order model using month and interest rate as factors.
Evaluate Strategy 2 for addressing the missing data in Table 15.1 (a) and (b).
Table 15.1. Two cases involving missing data and regression modeling for forecasting
(a)
Point/Run  x1 (Month)  x2 (Interest Rate)  y (#sales)
1          17          3.5                 168
2          18          3.7                 140
3          19          3.5                 268
4          21          3.2                 245
5          22          (missing)           242
6          23          3.2                 248

(b)
Point/Run  x1 (Month)  x2 (Interest Rate)  y (#sales)
1          15          (missing)           120
2          16          3.3                 157
3          17          3.5                 168
4          18          3.5                 140
5          19          3.7                 268
6          20          3.5                 245
7          21          3.2                 242
8          22          3.3                 248
9          23          3.2                 268
Answer: It would be more tempting to include the average interest rate for the
case in Table 15.1 (a) than for the case in Table 15.1 (b). This follows in part
because the missing entry is closer in time to the month for which prediction is
needed in Table 15.1 (a) than in Table 15.1 (b). Also, there is less data overall in
Table 15.1 (a), so data is more precious. Added justification for making up data for
Table 15.1 (a) derives from the following sensitivity analysis. Consider forecasts
based on second order models and an assumed future interest rate of 3.2. With the
fifth point/run removed in Table 15.1 (a), the predicted or forecasted sales is 268.9
units. Inserting the average value of 3.42 for the missing entry, the forecast is 252.9
units. Inserting 3.2 for the missing entry, the forecast is 263.4 units. Therefore,
there seems to be some converging evidence in favor of lowering the forecast from
that predicted with the data removed. This evidence seems to derive from the
recent experience in Month 22 which is likely relevant. It can be checked that the
results based on Table 15.1 (b) are roughly the same regardless of the strategy
related to removing the data. Therefore, since removing the data is the simplest and
often least objectionable approach, it makes sense to remove point 1 in Table 15.1
(b) but not necessarily to remove point 5 in Table 15.1 (a).
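The following is a minimal sketch in Python of this kind of sensitivity analysis for Table 15.1 (a). The exact model form used above is not restated, so the sketch assumes a model that is second order in month and first order in interest rate; the point is the mechanics of comparing forecasts with the incomplete point dropped versus imputed, not reproducing the numbers quoted above.

# Minimal sketch: Strategy 2 sensitivity analysis for missing data
import numpy as np

month = np.array([17.0, 18.0, 19.0, 21.0, 22.0, 23.0])
rate = np.array([3.5, 3.7, 3.5, 3.2, np.nan, 3.2])    # point 5 has a missing entry
sales = np.array([168.0, 140.0, 268.0, 245.0, 242.0, 248.0])

def forecast(month_v, rate_v, sales_v, new_month=24.0, new_rate=3.2):
    # Fit by least squares and forecast sales for the new point
    X = np.column_stack([np.ones_like(month_v), month_v, month_v**2, rate_v])
    beta, *_ = np.linalg.lstsq(X, sales_v, rcond=None)
    return np.array([1.0, new_month, new_month**2, new_rate]) @ beta

keep = ~np.isnan(rate)
print(forecast(month[keep], rate[keep], sales[keep]))     # incomplete point removed
for fill in (np.nanmean(rate), 3.2):                      # Strategy 2 imputations
    rate_filled = np.where(np.isnan(rate), fill, rate)
    print(fill, forecast(month, rate_filled, sales))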
randomization establishes the cause and effect relationship between input changes
and response variation.
In an important sense, the main justifications for using design of experiments
relate to the creation of acceptable regression models. By using the special
experimental arrays and randomizing, much subjectivity is removed from the
analysis process. Also, there is the benefit that, if DOE methods are used, it may be
possible to properly use the word “proof” in drawing conclusions.
Table 15.2. Acceptability checks (“✓” guaranteed, “?” unclear, “✗” loss unavoidable)
This section concerns evaluation of whether a given set of data can be reliably
trusted to support fitting a model form of interest. The least squares estimation
formula reveals that coefficient estimates can be written as βest = Ay where A =
(X′X)-1X′ and A is the “alias” matrix. The alias matrix is a function of the model
form fitted and the input factor settings in the data. If the combination is poor, then
any random error, εi, influencing a response in y will be inflated and can greatly
change the coefficient estimates.
The term “input data quality” refers to the ability of the input pattern to
support accurate fitting of a model form of interest. We define the following in relation
to quantitative evaluation of input data quality:
1. Ds is the input pattern in the flat file.
2. H and L are the highs and lows respectively of the numbers in each
column of the input data, Ds.
3. D is the input data in coded units that range between –1 and 1.
4. X is the design matrix.
5. Xs is the scaled design matrix (potentially the result of two scalings).
6. n is the number of data points or rows in the flat file.
7. m is the number of factors in the regression model being fitted.
8. k is the number of terms in the regression model being fitted.
The following procedure, in Algorithm 15.1, is useful for quantitative evaluation of
the extent to which errors are inflated and coefficient estimates are unstable.
Note that, in Step 4 the finding of a VIF greater than 10 or a ri,j greater than 0.5
does not imply that the model form does not describe nature. Rather, the
conclusions would be that the model form cannot be fitted accurately because of
the limitations of the pattern of the input data settings, i.e., the input data quality.
More and better quality data would be needed to fit that model.
Note also that most statistical software packages do not include the optional
Step 1 in their automatic calculations. Therefore, they perform only a single
scaling, and the interpretation of their output in Step 4 is less credible.
In general, the assessment of input data quality is an active area of research, and
the above procedure can sometimes prove misleading. In some cases, the
procedure might signal that the input data quality is poor while the model has
acceptable accuracy. Also, in some cases the procedure might suggest that the
input data quality is acceptable, but the model does not predict well and results in
poor inference.
Step 4. If any of the VIFs is greater than 10 or any of the ri,j is greater than 0.5, declare that the input data quality likely does not permit an accurate fit.
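As a rough illustration of the scaled design matrix and the Step 4 checks, the following sketch assumes Python with numpy and a small hypothetical single-factor data set (the numbers are placeholders, not the data of Table 15.3); it performs a single coding to [–1, 1], a second scaling to unit standard deviation, and then computes the correlation matrix and variance inflation factors.

```python
import numpy as np

# Hypothetical single-factor inputs (placeholders, not the data in Table 15.3)
x = np.array([25.0, 30.0, 35.0, 40.0, 45.0, 44.0])

# Code the inputs to the range [-1, 1] using the column high (H) and low (L)
H, L = x.max(), x.min()
d = (x - 0.5 * (H + L)) / (0.5 * (H - L))

# Design matrix for a quadratic model form: constant, d, d^2
X = np.column_stack([np.ones_like(d), d, d ** 2])

# Scale the non-constant columns to mean 0 and unit sample standard deviation
Xs = X.copy()
Xs[:, 1:] = (X[:, 1:] - X[:, 1:].mean(axis=0)) / X[:, 1:].std(axis=0, ddof=1)

# Correlation matrix of the scaled terms (constant column excluded)
R = np.corrcoef(Xs[:, 1:], rowvar=False)

# Variance inflation factors: diagonal entries of the inverse correlation matrix
VIF = np.diag(np.linalg.inv(R))

# Step 4 check: flag questionable input data quality
poor = (VIF > 10).any() or (np.abs(R - np.eye(len(VIF))) > 0.5).any()
print(R[0, 1], VIF, poor)
```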
Question 1: Consider the data in Table 15.3. Does the data support fitting a
quadratic model form?
Answer 1: Following the procedure, in Step 1, the D matrix in Table 15.3 was
calculated using H1 = 45 and L1 = 25. Since there is only a single factor, D is a
vector. In Step 2, the Xs matrix was calculated using Xbar,1 = –0.025, s1 = 1.127,
Xbar,2 = 0.953, and s2 = 0.095. In Step 3, the transpose, multiplication, and inverse
operations were applied using Excel resulting in the correlation matrix in Table
15.3. Step 4 results in the conclusion that the input data is likely not of high enough
quality to support fitting a quadratic model form since r12 = 0.782 > 0.5.
Question 2: Interpret visually why a second order model might be unreliable when fitted to the data in Table 15.3 (a).
Answer 2: Figure 15.2 (a) shows the initial fit of the second order model. Figure
15.2 (b) shows the second order fit when the last observation is shifted by 20
downward. The fact that such a small shift compared with the data range causes
such a large change in appearance indicates that the input data has low quality and
resulting models are unreliable.
Table 15.3. Example: (a) data and D, (b) Xs, and (c) the correlation matrix
Figure 15.2. (a) Initial second order model and (b) model from slightly changed data
The value can be obtained by searching Table 15.4 below for the argument and then reading over for the first two digits and reading up for the third digit. Note also that, if 0.5 < s < 1, then Φ–1[s] = –Φ–1[1 – s].
Step 2. Generate ysorted by sorting in ascending order the numbers in y.
Therefore, ysorted,1 is the smallest number among y1,…,yn (could be the
most negative number).
Step 3. Plot the set of ordered pairs {ysorted,1,Z1},…,{ysorted,n,Zn}.
Step 4. Examine the plot. If all numbers appear roughly on a single line then the
assumption that all the numbers y1,…,yn come from a single normal
distribution is reasonable. If the numbers with small absolute values line
up but a few with large absolute values are either to the far right-hand side or to the far left-hand side, off the (rough) line formed by the others,
then we say that the larger (absolute value) numbers probably did not
come from the same distribution as the smaller numbers. Probably some
factor caused these numbers to have a different origin than the others.
These numbers with large absolute values off the line are called
“outliers”.
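A minimal sketch of Steps 1-4 is given below, assuming Python with numpy, scipy, and matplotlib; the normal scores are computed with an inverse normal function in place of the Table 15.4 lookup, and the plotting positions (i – 0.5)/n are an assumption that is consistent with the Z values used in the example that follows.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Residuals from the example below
y = np.array([-3.6, -15.1, -1.8, 3.9, -1.4, 4.8, 2.0])
n = len(y)

# Step 1: normal scores Z_i = inverse normal of (i - 0.5)/n (via scipy, not Table 15.4)
Z = norm.ppf((np.arange(1, n + 1) - 0.5) / n)

# Step 2: sort the numbers in ascending order
y_sorted = np.sort(y)

# Step 3: plot the ordered pairs {y_sorted_i, Z_i}
plt.scatter(y_sorted, Z)
plt.xlabel("y (sorted values)")
plt.ylabel("Z (normal scores)")
plt.show()

# Step 4: inspect the plot; points far off the rough line are potential outliers
```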
Question: Assume that the residuals are: Errorest,1 = –3.6, Errorest,2 = –15.1,
Errorest,3 = –1.8, Errorest,4 = 3.9, Errorest,5 = –1.4, Errorest,6 = 4.8, and Errorest,7 = 2.0.
Use normal probability plotting to assess whether any are outliers.
Answer: Step 1 gives Z = {–1.47, –0.79, –0.37, 0.00, 0.37, 0.79, 1.47}. Step 2 gives ysorted = {–15.1, –3.6, –1.8, –1.4, 2.0, 3.9, 4.8}. The plot from Step 3 is shown in Figure 15.3. All numbers appear to line up, i.e., seem to come from the same normal distribution, except for –15.1, which is an outlier. It may be important to investigate the cause of the associated unusual response (run 2). For example, there could be something simple and fixable, such as a data entry error. If found and corrected, a mistake might greatly reduce prediction errors.
[Figure 15.3. Normal probability plot of the residuals: normal scores (Z) versus the sorted values (y).]
Table 15.4. If Z ~ N[0,1], then the table gives P(Z < z). The first column gives the first three digits of z, and the top row gives the last digit.
0.00 0.01 0.02 0.03 0.04
-6.0 9.90122E-10 1.05294E-09 1.11963E-09 1.19043E-09 1.26558E-09
-4.4 5.41695E-06 5.67209E-06 5.93868E-06 6.21720E-06 6.50816E-06
-3.5 0.00023 0.00024 0.00025 0.00026 0.00027
-3.4 0.00034 0.00035 0.00036 0.00038 0.00039
-3.3 0.00048 0.00050 0.00052 0.00054 0.00056
-3.2 0.00069 0.00071 0.00074 0.00076 0.00079
-3.1 0.00097 0.00100 0.00104 0.00107 0.00111
-3.0 0.00135 0.00139 0.00144 0.00149 0.00154
-2.9 0.00187 0.00193 0.00199 0.00205 0.00212
-2.8 0.00256 0.00264 0.00272 0.00280 0.00289
-2.7 0.00347 0.00357 0.00368 0.00379 0.00391
-2.6 0.00466 0.00480 0.00494 0.00508 0.00523
-2.5 0.00621 0.00639 0.00657 0.00676 0.00695
-2.4 0.00820 0.00842 0.00866 0.00889 0.00914
-2.3 0.01072 0.01101 0.01130 0.01160 0.01191
-2.2 0.01390 0.01426 0.01463 0.01500 0.01539
-2.1 0.01786 0.01831 0.01876 0.01923 0.01970
-2.0 0.02275 0.02330 0.02385 0.02442 0.02500
-1.9 0.02872 0.02938 0.03005 0.03074 0.03144
-1.8 0.03593 0.03673 0.03754 0.03836 0.03920
-1.7 0.04457 0.04551 0.04648 0.04746 0.04846
-1.6 0.05480 0.05592 0.05705 0.05821 0.05938
-1.5 0.06681 0.06811 0.06944 0.07078 0.07215
-1.4 0.08076 0.08226 0.08379 0.08534 0.08692
-1.3 0.09680 0.09853 0.10027 0.10204 0.10383
-1.2 0.11507 0.11702 0.11900 0.12100 0.12302
-1.1 0.13567 0.13786 0.14007 0.14231 0.14457
-1.0 0.15866 0.16109 0.16354 0.16602 0.16853
-0.9 0.18406 0.18673 0.18943 0.19215 0.19489
-0.8 0.21186 0.21476 0.21770 0.22065 0.22363
-0.7 0.24196 0.24510 0.24825 0.25143 0.25463
-0.6 0.27425 0.27760 0.28096 0.28434 0.28774
-0.5 0.30854 0.31207 0.31561 0.31918 0.32276
-0.4 0.34458 0.34827 0.35197 0.35569 0.35942
-0.3 0.38209 0.38591 0.38974 0.39358 0.39743
-0.2 0.42074 0.42465 0.42858 0.43251 0.43644
-0.1 0.46017 0.46414 0.46812 0.47210 0.47608
0.0 0.50000 0.50399 0.50798 0.51197 0.51595
SST = Y′Y – (1/n) Y′QY      (15.9)
Then, the adjusted R-squared (R2adj) is given by
R2 adjusted = 1 – [(n – 1)/(n – k)] (SSE*/SST)      (15.10)
where k is the number of terms in the fitted model and SSE* is the sum of squares
error defined in Equation (15.3). It is common to interpret R2adj as the “fraction of
the variation in the response data explained by the model”.
Question: Calculate and interpret R2 adjusted for the example in Figure 15.2(a).
Answer: In this case,
Errorest = (0, 18, –44, 34, –8)′,   Y = (70, 120, 90, 200, 190)′,   and Q is a 5 × 5 matrix with every entry equal to 1.      (15.11)
Therefore, with n = 5 data points, SST = 13720 and R2 adjusted = 0.662 so that
roughly 66% of the observed variation is explained by the first order model in x1. As in the first example in this chapter, since the system input (x) values do not follow the pattern of a planned experiment, one is skeptical about how much the 0.66 implies.
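The calculation can be checked with a few lines of code; the sketch below assumes Python with numpy and uses the residuals and responses from Equation (15.11) with n = 5 and k = 2 terms (the constant and x1).

```python
import numpy as np

# Residuals and responses from Equation (15.11)
err = np.array([0.0, 18.0, -44.0, 34.0, -8.0])
Y = np.array([70.0, 120.0, 90.0, 200.0, 190.0])
n, k = len(Y), 2  # two terms in the fitted model: constant and x1

SSE = np.sum(err ** 2)                     # 3480
SST = Y @ Y - (1.0 / n) * np.sum(Y) ** 2   # Y'Y - (1/n) Y'QY = 13720
R2_adj = 1.0 - ((n - 1) / (n - k)) * (SSE / SST)
print(SSE, SST, round(R2_adj, 3))          # 3480.0 13720.0 0.662
```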
The next two summary statistics are based on the concept that the SSE can
underestimate the errors of regression model predictions on new data. This follows
intuitively because the fit might effectively “cheat” by overfitting the data upon
which it was based and extrapolate poorly. The phrase “cross-validation” refers to
efforts to evaluate prediction errors by using some of the data points only for this
purpose, i.e., a set of data points only for testing.
Define yest(i, βest, x) as the prediction from the regression model fitted to a training set consisting of all runs except for the ith run. Define xi and yi as the inputs and response for the ith run respectively. Then, the PRESS statistic is
PRESS = Σi=1,…,n [yi – yest(i, βest, xi)]2      (15.12)
and
R2 prediction = 1 – [(n – 1)/(n – k)] (PRESS/SST) .      (15.13)
Question: Calculate and interpret the R2 prediction for the example in Figure
15.2(a).
Answer: Table 15.5 shows the model coefficients, predictions, and errors in the
PRESS sum. Squaring and summing the errors gives PRESS = 6445.41. Then, the
R2 prediction = 0.53. Therefore, the model explains only 53% of the variation and
cross validation indicates that some overfitting is occurring.
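A leave-one-out computation of PRESS can be sketched as follows, assuming Python with numpy and assuming the five runs implied by the fit yest(x1) = –26 + 32 x1 and the residuals of Equation (15.11), i.e., x1 = 3,…,7; it is an illustration of the cross-validation idea, not a reproduction of Table 15.5.

```python
import numpy as np

# Assumed five runs consistent with the first order fit yest = -26 + 32*x1
x1 = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([70.0, 120.0, 90.0, 200.0, 190.0])
n = len(y)

press = 0.0
for i in range(n):
    keep = np.arange(n) != i                 # training set: all runs except run i
    Xtrain = np.column_stack([np.ones(keep.sum()), x1[keep]])
    beta, *_ = np.linalg.lstsq(Xtrain, y[keep], rcond=None)
    y_pred = beta[0] + beta[1] * x1[i]       # prediction for the held-out run
    press += (y[i] - y_pred) ** 2

print(round(press, 2))  # approximately 6445.4
```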
In Chapter 4, the process capability in the context of the Xbar and R charts was
defined as 6σ0. The symbol “σ0” or “sigma” is the standard deviation of system
outputs when inputs are fixed. For establishing the value of σ0 using Xbar and R
charting, it is necessary to remove data associated with any of the 25 periods that are not representative of system performance under usual conditions. In regression, an analogous estimate of sigma is
σest = sqrt[ SSE* ÷ (n – k) ]
where SSE* is the sum of squares error for the least squares model, n is the number
of runs, and k is the number of terms in the fitted model form. Many software packages refer to their estimate of "σest" using "S," including Minitab® and Sagata® Regression.
The value of σest is useful for at least three reasons. First, it provides a typical
difference or error between the regression prediction and actual future values.
Differences will often be larger, partly because the regression model predictions are not perfectly accurate with respect to predicting average responses. Second, σest
can be used in robust system optimization, e.g., it can be used as an estimate of σr
for the formulas in Chapter 14.
Third, if the value of σest is greater by an amount considered subjectively large
compared with the standard deviation of repeated response values from the same
inputs, then evidence exists that the model form is a poor choice. This is
particularly easy to evaluate if repeated runs in the input pattern permit an
independent estimate of σest by taking the standard deviation of these responses.
Then, it might be desirable to include higher order terms such as x12 if there were
sufficient runs available for their estimation. This type of “lack of fit” can be
proven formally using hypothesis tests as in two-step response surface methods
after the first step in Section 13.6.
Question: Calculate and interpret the value of σest using the data in Figure 15.1(a).
Answer: First, the normal probability plot of residuals in Figure 15.4 finds no
obvious outliers. Therefore, there is no need to remove data and refit the model.
From previous problems, the SSE* is 3480 and σest = sqrt(3480 ÷ 3) = 34.1.
Without physical insight about the system of interest or responses from repeated
system inputs, there is little ability to assess lack of fit. Typically, outputs from the
same system would be within 34.1 units from the mean predicted by the regression
model yest(x1) = –26 + 32 x1.
[Figure 15.4. Normal probability plot of the residuals: normal scores versus residuals.]
Question: Calculate and interpret the results of the ANOVA method followed by
multiple t-tests based on the data in Figure 15.1 (a).
Answer: Table 15.7 (a) shows the ANOVA table and Table 15.7 (b) shows the
calculation of the t-statistic. Note that for a single factor example, the ANOVA p-
value is the same as the single factor coefficient p-value, i.e., the chance that the
data is all noise can be evaluated with either statistic. With so little data, the p-
value of 0.059 can be considered strong evidence that factor x1 affects the average
response values. Note that since it is not clear whether randomized experimentation
has been used, it is not proper to declare that the analysis provides proof.
The “Bonferroni inequality” establishes that if q tests are made each with an
α chance of giving a Type I error, the chance of no false alarm on any test is
greater than 1 – q × α. Even though additional mathematical results can improve this bound, with even a few tests (e.g., q = 4), approaches based on individual testing offer limited overall coverage unless the α values used are very small.
ANOVA followed by t-tests can offer the same guarantee while achieving lower
Type II error risks than any procedure based on the Bonferroni inequality.
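For example, with q = 4 tests each run with α = 0.05, the inequality only guarantees that the chance of no false alarms exceeds 1 – 4 × 0.05 = 0.80; to guarantee an overall level of 0.95, each individual test would need to use roughly α = 0.05/4 = 0.0125, with a corresponding increase in each test's Type II error risk.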
Table 15.7. Single factor (a) ANOVA table and (b) t-test and p-values
(a)
Source        df   SS            MS           F      p-value
Regression    1    SSR = 10240   10240        8.83   0.0590
Residuals     3    SSE = 3480    MSE = 1160
of these inputs. Starting with inputs scaled to –1 to 1 provides a natural basis for
assessing whether interactions underlie the system performance being studied.
[Figure 15.5. Regression analysis flowchart: fit the model, βest = (X′X)-1X′Y; check whether the model is acceptable; if not, add or remove terms and/or transform and refit, or stop if ready to give up.]
Often, the analysis process in Figure 15.5 can be completed within a single
hour after the flat file is created. A first order model using factors of intuitive
importance is a natural starting point. Patterns in the residuals or an intuitive desire
to explore additional interactions and curvatures generally provide motivation for
adding more terms. Often, adding terms such as x12 is as easy as clicking a button.
Therefore, the bottleneck is subjective interpretation of the acceptability of the
residual plots and of the model form.
Even though all results from on-hand data should be evaluated with caution,
regression analyses often provide a solid foundation for important business
decisions. These could include the adjustment of an engineering design factor such
as the width of seats on airplanes or the setting aside of additional money in a budget because of a regression forecast of the financial needs.
The following are reproduced with permission from a study by Dr. A. G. Fisher
and others and made available through the internet at http://lib.stat.cmu.edu.
Analyze the body fat data in Table 15.8 and make recommendations to the extent
possible for a person in training who is 35 years old, 190 pounds, 68 inches, with a
42 cm neck, and who wants to lose weight. A good analysis will typically include
one or two models, reasons for selecting that model, estimates of the errors of the
model, and interpretation for the layperson.
Question 1: What prediction model would you use to predict the body fat of
people not in the table such as the person in training and why?
Answer 1: Consider the terms in a full second order model including f1(x) = 1,
f2(x) = Age, …, f15(x) = Height × Neck. The combination of terms up to second order that minimizes the PRESS is Age, Age × Weight, and Age × Height. Fitting
a model with only these terms using least squares gives %Fat = 3.25×Age +
0.00699×Age×Weight – 0.0561×Age×Height. This model has an R2adj_prediction
approximately equal to 0.86 so that these few terms explain a high fraction of the
variation. The model is also simple and intuitive in that it correctly predicts that
older, heavier, and shorter people tend to have relatively high body fat percentages.
Figure 15.7 shows the model predictions as a function of height and weight.
Question 2: Does the normal probability plot of residuals support the assumption
that the residuals are IID normally distributed?
Answer 2: The normal probability plot provided limited, subjective support for the assumption that the residuals are IID normally distributed noise. There are no
obvious outliers, i.e., points to the far right or left off the line. Since the points do
not precisely line up, there could well be missing factors providing systematic
errors.
Question 3: What body fat percentage do you predict for the person in training
and what are the estimated errors for this prediction?
Answer 3: This model predicts that the average person with x = (35, 190, 68, 42)′
has 25.9% body fat with standard error of the mean {Variance[yest(βest,x)]}1/2 equal
to 2.5%. The estimated standard errors are 6.1%. Therefore, the actual body fat of
the person in training could easily be 6-8% higher or lower than 25.9%. This
follows because there are errors in predicting the average body fat for a person with x = (35, 190, 68, 42)′ (±2.5%), and the person in training is likely to be
not average (±6.1%). Presumably, factors not included in the data set such as head
size and muscle weight are causing these errors. These error estimates assume that
the person in training is similar, in some sense, to the 29 people whose data are in
the training set. The surface plot in Figure 15.7 shows that the prediction
model gives nonsensical predictions outside the region of the parameter space
occupied by the data, e.g., some average body fat percentages are predicted to be
negative.
Figure 15.6. Normal probability plot for the body fat prediction model
Question 4: What type of reduction in body fat percentage could the person in
training expect by losing 15 pounds?
Answer 4: The model predicts that the average person with specifications x = (35, 175, 68, 42)′ would have 22.2% body fat, with standard error of the mean {Variance[yest(βest,x)]}1/2 equal to 2.5%. Therefore, if the "Joe average" person
with the same specifications lost 15 pounds, then “Joe average” could expect to
lose roughly 4% body fat. It might be reasonable for the person in training to
expect losses of this magnitude also.
[Surface plot: % Fat versus Weight (lbs) and Height (inches).]
Figure 15.7. The average body fat percentages predicted for 35-year-old people and plotted
using Sagata® Regression Professional
Table 15.9. Example illustrating regression with a three level categorical factor
(a) Data
Run  Temp.  Supplier    x2  x3   y
1    20     Intel       1   0    22
2    80     Panasonic   0   1    33
3    20     RCA         0   0    21
4    80     Intel       1   0    3
5    50     Panasonic   0   1    1
6    80     RCA         0   0    23
7    50     Intel       1   0    21

(b) Design matrix X (columns: constant, x1, x2, x3, x12, x1x2, x1x3)
      1   20   1   0    400    20    0
      1   80   0   1   6400     0   80
X =   1   20   0   0    400     0    0
      1   80   1   0   6400    80    0
      1   50   0   1   2500     0   50
      1   80   0   0   6400     0    0
      1   50   1   0   2500    50    0
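The x2 and x3 dummy columns in Table 15.9 can be constructed programmatically; the sketch below assumes Python with numpy and uses the coding implied by the table (x2 = 1 for Intel, x3 = 1 for Panasonic, and RCA as the baseline with x2 = x3 = 0).

```python
import numpy as np

temp = np.array([20.0, 80.0, 20.0, 80.0, 50.0, 80.0, 50.0])
supplier = ["Intel", "Panasonic", "RCA", "Intel", "Panasonic", "RCA", "Intel"]

# Dummy (contrast) columns with RCA as the baseline level
x2 = np.array([1.0 if s == "Intel" else 0.0 for s in supplier])
x3 = np.array([1.0 if s == "Panasonic" else 0.0 for s in supplier])

# Design matrix with columns: constant, x1, x2, x3, x1^2, x1*x2, x1*x3
X = np.column_stack([np.ones_like(temp), temp, x2, x3,
                     temp ** 2, temp * x2, temp * x3])
print(X)
```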
Many methods have been proposed for planning response surface methods
experiments involving categorical factors. Chantarat et al. (2003) offered optimal
design of experiments methods with advantages in run economy and prediction
accuracy. In this section, we describe what is probably the simplest approach for
extending response surface methods, which is based on a product array approach in
which a standard response surface array is repeated for all combinations of
categorical factor levels. For example, Table 15.10 shows a product array for two
continuous factors and one categorical factor at two levels.
If the product array approach is used, then the fitted model includes: (1) all full
quadratic terms for the continuous factors, (2) main effects contrasts for the
categorical factors, and (3) interaction terms involving each categorical factor contrast and every one of the continuous factor terms. For example, with two
continuous factors and one categorical factor at two levels, the model form is:
y(x1, x2, x3) = β1 + β2x1 + β3x2 + β4x3 + β5x12 + β6x22 + β7x1x2 + β8x1x3
+ β9x2x3 + β10x12x3 + β11x22x3 + β12x1x2x3 . (15.15)
In general, none of the design of experiments and regression methods in this
and previous chapters are appropriate if the response is categorical, e.g.,
conforming or nonconforming to specifications. Logistic regression and neural nets
described in the next chapter are relevant when outputs are categorical.
However, if each experimental run is effectively a batch of “b” successes or
failures, then the fraction nonconforming can be treated as a continuous response.
Moreover, if the batch size and true fraction nonconforming satisfy the following,
then it is reasonable to expect that the residuals in regression will be normally
distributed:
b × p0 > 5 and b × (1 – p0) > 5. (15.16)
This is the condition such that binomial distributed random probabilities can be
approximated using the “normal approximation” or normal probability distribution
functions. As for selecting sample sizes in the context of p-charting (in Chapter 4),
a preliminary estimate of a typical fraction nonconforming, p0, is needed. For
example, in the printed circuit board (PCB) described in Chapter 11, batches of
size 350 were used and all estimated fractions nonconforming were between 0.05
and 0.95.
Table 15.10. Product design of a central composite and a two-level categorical factor
(a) (b)
SO A B C SO A B C Run A B C Run A B C
1 -1 -1 1 11 -1 -1 2 1 0 -1 2 11 -1 -1 2
2 1 -1 1 12 1 -1 2 2 0 0 1 12 1.4 0 1
3 -1 1 1 13 -1 1 2 3 -1 -1 1 13 0 0 1
4 1 1 1 14 1 1 2 4 1 1 1 14 0 1.4 1
5 0 0 1 15 0 0 2 5 0 0 2 15 1 -1 1
6 0 0 1 16 0 0 2 6 0 1.4 2 16 -1 1 1
7 -1.4 0 1 17 -1.4 0 2 7 -1 0 2 17 0 -1 1
8 1.4 0 1 18 1.4 0 2 8 1 1 2 18 1 -1 2
9 0 -1.4 1 19 0 -1.4 2 9 1.4 0 2 19 0 0 2
10 0 1.4 1 20 0 1.4 2 10 -1 0 1 20 -1 1 2
Σi=1,…,q xi = T  and  0 ≤ ai ≤ xi ≤ bi ≤ T  for i = 1, …, q      (15.17)
estimates would be undefined. For this reason, Scheffé (1958) proposed dropping selected terms from full polynomials of degree d, with the additional intent of preserving interpretability of the estimated coefficients.
Alternative model schemes developed by Scheffé and other authors have been
reviewed in Cornell (2002). An example of the model forms that Scheffé proposed
is the so-called “canonical second order” mixture model
y(x1,…,xq) = Σi=1,…,q βi xi + Σ Σi<j βij xi xj + ε .      (15.18)
The model is “canonical first order” if the interaction terms are omitted. Data
from Piepel and Cornell (1991) show that models of the form in Equation (15.18)
are by far the most popular in documented case studies. Relevant recent models
involving both mixture and process variables are described in Chantarat (2003).
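For example, with q = 3 mixture components, Equation (15.18) becomes
y(x1, x2, x3) = β1x1 + β2x2 + β3x3 + β12x1x2 + β13x1x3 + β23x2x3 + ε ,
which has six coefficients rather than the ten of a full quadratic polynomial in three variables.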
15.9 References
Chantarat N (2003) Modern Design of Experiments For Screening and
Experimentations With Mixture and Qualitative Variables. PhD
dissertation, Industrial & Systems Engineering, The Ohio State University,
Columbus
Chantarat N, Zheng N, Allen TT, Huang D (2003) Optimal Experimental Design
for Systems Involving Both Quantitative and Qualitative Factors.
Proceedings of the Winter Simulation Conference, ed RDM Ferrin and P
Sanchez
Cornell JA (2002) Experiments with Mixtures, 3rd Edition. Wiley, New York
Fisher RA (1925) Statistical Methods for Research Workers. Oliver and Boyd, London
Piepel GF, Cornell JA (1991) A Catalogue of Mixture Experiments. Proceedings of
the Joint Statistical Meetings (August 19), Atlanta
Scheffé H (1958) Experiments With Mixtures. Journal of the Royal Statistical Society, Series B 20 (2): 344-360
15.10 Problems
1. A new product is released in two medium-sized cities. The demands in Month
1 were 37 and 43 units and in Month 2 were 55 and 45 units. Which of the
following is correct and most complete?
a. A first order regression forecast for Period 3 is 70 units.
b. One of the residuals is -2 and another one is 5.
c. A first order regression forecast for Period 3 is 60 units.
d. Trend analysis cannot involve regression analysis.
e. All of the above are correct except (a).
3. Consider the data in Table 15.1, which is correct and most complete?
a. Analysts are almost always given data in a format that make analysis
easy.
b. Making up data can never increase the believability of analysis
results.
c. The missing data point in Table 15.1 (a) could be worth saving to
produce relatively accurate forecasts.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
Table 15.11 is relevant to Questions 6-9. Assume that the model form being fitted
is a first order polynomial in factors x1 and x2 only, unless otherwise mentioned.
6. Which of the following is correct and most complete (for a model without x3)?
a. Including the optional scaling, VIF1 = VIF2 = 1.08.
b. One of the values on the normal probability plot of residuals is
–1.7, –2.4 .
c. In general, Φ–1[s] = – Φ–1[–s].
d. The input pattern is clearly not acceptable for fitting a first-order
model.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
7. Which of the following is correct and most complete (for a model without x3)?
a. The data derive from an application of standard response surface
methods.
b. No outliers appear in the upper right of the normal probability plot of
residuals.
c. The PRESS value for a first-order model in this case is 3280.
d. The ANOVA in this case clearly indicates the response data are all
noise.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
8. Which of the following is correct and most complete (for a model without x3)?
a. The regression estimate for sigma based on a first-order model is
0.944.
b. The SSR for a first order model is 6637.55.
c. The SSE for a first order model is 396.45.
d. There are no obvious outliers on a normal plot of residuals.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
9. Consider a first order regression using all of the factors including x3 with the
sales response. Which is correct and most complete?
a. Since “Service type” is a three level categorical factor, the coefficient
calculations could be aided by creating two “contrasts” or dummy
factors.
b. All of the factors are proven to have a significant effect on sales with
α = 0.05.
c. It is unlikely that distance from campus (Distance) really affects
sales.
d. It is impossible that population and service type interact in their effect
on sales.
e. All of the above are correct.
10. Which is correct and most complete with regard to the regression flowchart in
Figure 15.5?
a. Acceptability checks cannot include residuals plots.
b. Models of on-hand data must be used with caution.
13. Analyze the box office data in Table 15.13 and make recommendations to the
extent possible for a vice president at a major movie studio. A good analysis
will typically include one or two models, reasons for selecting that model,
estimates of the errors of the model, and interpretation for the layperson of
everything. (Note that this question is intentionally open-ended because that is
the way problems are on-the-job). Feel free to supplement with additional real
data if you think it helps support your points. (These are from Yahoo.com)
14. Analyze the real estate data in Table 15.14 and make recommendations to a
real estate developer about where to build and what type of house to build for
profitability. A good analysis will typically include one or two models, reasons
376 Introduction to Engineering Statistics and Six Sigma
for selecting that model, estimates of the errors of the model, and
interpretation of everything for the layperson. (Note that this question is
intentionally open-ended because that is the way problems are on-the-job).
Feel free to supplement with additional real data.
16.1 Introduction
Linear regression models are not the only curve-fitting methods in wide use. Also,
these methods are not useful for analyzing data for categorical responses. In this
chapter, so-called “kriging” models, “artificial neural nets” (ANNs), and logistic
regression methods are briefly described. ANNs and logistic regression methods
are relevant for categorical responses. Each of the modeling methods described
here offers advantages in specific contexts. However, all of these alternatives have
a practical disadvantage in that formal optimization must be used in their fitting
process.
Section 2 discusses generic curve fitting and the role of optimization. Section 3
briefly describes kriging models, which are considered particularly relevant for
analyzing deterministic computer experiments and in the context of global
optimization methods. In Section 4, one type of neural net is presented. Section 5 defines logistic regression models including so-called "discrete choice" models. In Section 6, examples illustrate logit and probit discrete choice models.
where predictions come from yest(βest,x), which is based on the so-called functional
form, h(βest,x), of the fitted model. This “functional form” constrains the
relationship between the predictions (yest), the inputs (x), and the model parameters
(βest). Generally, yest(βest,x) refers to the prediction that will be generated, given βest
for the mean value of the system response at the point x in the region of interest.
The quantity Rk refers to the k dimensional vector space of real numbers, i.e., βest,i
are real numbers i = 1,…,k. Note the symbol “∧” can be used interchangeably with
“est” to indicate that the quantity involved is estimated from the data.
For linear regression and many implementations of neural nets, the objective
function, g, is the negative of the sum of squares of the estimated errors (SSE). For
kriging models, the objective function, g, is the so-called “likelihood” function.
Yet, the curve-fitting objective function could conceivably directly account for the
expected utility of the decision-maker, instead of reflecting the SSE or likelihood.
In the context of regression models, the entries in βest are called coefficients. In the
context of neural nets, specific βest,i refer to so-called weights and numbers of
nodes and layers. In the context of kriging models, entries in β est are estimated
parameters.
Here we review a curve fitting problem that reveals the special properties of linear
regression curve fitting. Consider an example with a single factor involving the
five runs, i.e., input, (x1)i, and output, yi, combinations, given in Table 16.1. The linear regression optimization problem for estimating the coefficients is given by Equations (16.2) and (16.3).
Equation (16.3) clarifies that the functional form is a first order polynomial or,
in other words, a line. Because Equation (16.3) is linear in the coefficients and the
objective function in Equation (16.2) is a quadratic polynomial, there is a formula
giving the solution.
As described in Chapter 15, the solution is given by the formula βest = (X′X)-1X′y. The data and the model associated with the coefficients that minimize the sum of squares error are shown in Figure 16.1 (b). The coefficients giving the fit in Figure 16.1 (b) can also be derived using a solver. Using such a solver is necessary for all of the other curve fitting methods in this chapter.
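A minimal sketch of both approaches is given below, assuming Python with numpy and scipy and assuming, for illustration, five runs consistent with the least squares fit yest = –26 + 32 x1 referenced elsewhere in the book (Table 16.1 itself is not reproduced here).

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative single-factor data assumed consistent with the fit yest = -26 + 32*x1
x1 = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([70.0, 120.0, 90.0, 200.0, 190.0])

# Design matrix for the first order functional form: constant plus x1
X = np.column_stack([np.ones_like(x1), x1])

# Closed-form least squares coefficients from the normal equations (X'X) beta = X'y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # approximately [-26., 32.]

# The same coefficients from a general-purpose solver minimizing the SSE directly,
# as is necessary for the other curve fitting methods in this chapter
sse = lambda b: np.sum((y - (b[0] + b[1] * x1)) ** 2)
print(minimize(sse, x0=[0.0, 0.0]).x)
```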
Figure 16.1. The data and two models (a) sub-optimal and (b) least squares optimal
The following equation offers intuition about how kriging models work:
Y(x) = f(x) + Z(x), (16.5)
where f(x) is a regression model that is potentially the same as a linear regression
model and Z(x) is a function that models departures from the regression model. A
relevant concept is, therefore, an attempt to model the unexplained variation more aggressively than is possible with regression models alone.
Table 16.2. Example (a) latin hypercube and (b) space-filling design
(a) (b)
Run A B C Run A B C
1 0.375 0.042 0.875 1 0.040 0.929 0.788
2 0.625 0.458 0.542 2 0.000 0.313 1.000
3 0.042 0.542 0.292 3 1.000 0.010 0.000
4 0.792 0.792 0.125 4 0.253 0.000 0.000
5 0.458 0.375 0.708 5 0.909 0.434 0.475
6 0.208 0.708 0.375 6 0.000 0.586 0.020
7 0.292 0.125 0.458 7 1.000 1.000 0.768
8 0.542 0.292 0.625 8 0.414 1.000 0.273
9 0.125 0.875 0.042 9 1.000 0.909 0.040
10 0.958 0.208 0.792 10 0.980 0.000 0.980
11 0.875 0.625 0.958 11 0.576 0.586 1.000
12 0.708 0.958 0.208 12 0.424 0.000 0.636
Step 1. Develop a function giving the correlation matrix, R, between the responses
at the points, x1, …, xn, using the formula
Ri,j(θ,p) = ∏k=1,…,m exp[ –θk |xj,k – xi,k|^pk ]      (16.6)
where xi and xj are all pairs of points and R is a function of θ and p. This
matrix is used for calculating and optimizing the likelihood.
Step 2. Calculate βest as a function of the fitting parameters using
βest(θ,p) = (1′R–11)–1 1′R–1y (16.7)
where 1 is an n dimensional vector of 1s. This coefficient is used for
calculating and optimizing the likelihood.
Step 3. Calculate σest as a function of the fitting parameters using
σest2(θ,p) = n–1(y – 1βest) ′ R–1 (y – 1βest) . (16.8)
Step 4. Calculate ln L as a function of the fitting parameters using
ln L(θ,p) = – [ n ln σest2(θ,p) + ln det R(θ,p) ] / 2 .      (16.9)
Step 5. Estimate parameters by solving
Maximize: ln L(θ,p) (16.10)
subject to θi ≥ 0 and 0 ≤ pi ≤ 2 for i = 1,…,m .
Step 6. Generate predictions at any point of interest, x, using
yest(x) = βest + r′(x)R–1(y – 1βest). (16.11)
where r(x) = [R(x,x1),…,R(x,xn)]′ with
R(x,xi) = ∏k=1,…,m exp[ –θk |xk – xi,k|^pk ] .      (16.12)
The functions in Equations (16.6) and (16.11) represent one of several possible
functional forms of interest. With the response of the system viewed as random
variables, these equations express possible beliefs about how responses might
correlate. The equations imply that repeated experiments at the same points, e.g., x
= xi give the same outputs, because the correlations are 1.
Maximization of the likelihood function in Step 5 can be a difficult problem
because there might be multiple extrema in the θk and pk space. Welch et al. (1992)
propose a search technique for this purpose that is based on multiple line searches.
Commonly, pk = 2 for all k is assumed because this gives rise to often desirable
smoothness properties and reduces the difficulty in maximizing the likelihood
function.
Question: Fixing p1 = 2, use the data in Table 16.1 to estimate the optimal θ
parameter, R matrix, and a prediction for x1 = 5.5.
Answer: Equation (16.7) gives an estimate of β equal to 133.18. Next, one derives the log-likelihood as a function of θ1. Maximizing gives the estimated θ1 equal to 1.648. The resulting R matrix follows from Equation (16.6), and the prediction is βest + r′(x1 = 5.5)R–1(y – 1βest) = 141.9, which is somewhat close to the first order linear regression prediction of –26 + 32 × 5.5 = 150.
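Calculations like these can be scripted; the sketch below assumes Python with numpy, a single factor with p1 fixed at 2, and a crude grid search in place of a formal likelihood maximizer, and it is an illustration of Equations (16.6)-(16.11) rather than a validated reproduction of the numbers above.

```python
import numpy as np

# Single-factor data assumed consistent with the chapter's five-run example
x = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([70.0, 120.0, 90.0, 200.0, 190.0])
n = len(y)
one = np.ones(n)

def correlation_matrix(theta, p=2.0):
    # Equation (16.6) for a single factor: R_ij = exp(-theta * |x_i - x_j|^p)
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-theta * d ** p)

def log_likelihood(theta):
    R = correlation_matrix(theta)
    Rinv = np.linalg.inv(R)
    beta = (one @ Rinv @ y) / (one @ Rinv @ one)          # Equation (16.7)
    resid = y - one * beta
    sigma2 = (resid @ Rinv @ resid) / n                    # Equation (16.8)
    return -(n * np.log(sigma2) + np.log(np.linalg.det(R))) / 2.0  # Equation (16.9)

# Step 5: crude grid search for the theta that maximizes the log-likelihood
thetas = np.linspace(0.1, 5.0, 200)
theta_hat = thetas[np.argmax([log_likelihood(t) for t in thetas])]

# Step 6: prediction at a new point using Equation (16.11)
R = correlation_matrix(theta_hat)
Rinv = np.linalg.inv(R)
beta = (one @ Rinv @ y) / (one @ Rinv @ one)
x_new = 5.5
r = np.exp(-theta_hat * np.abs(x_new - x) ** 2)
y_pred = beta + r @ Rinv @ (y - one * beta)
print(theta_hat, beta, y_pred)
```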
Allen et al. (2003) compared the prediction accuracy of kriging models with linear models in the context of test functions and response surface methods. The tentative
conclusion reached is that kriging models do not offer obvious, substantial
prediction advantages despite their desirable property of passing through all points
in the data base. However, the cause of the prediction errors was shown to relate to
the choice of the likelihood fitting objective and not the bias inherent in the fit
model form. Therefore, as additional research generates alternative estimation
methods, kriging models will likely become a useful alternative to linear models
for cases in which prediction accuracy is important. Also, as noted earlier, kriging
models adapt easily to cases in which the number of runs exceeds that in standard
response surface methods.
The neural networks described here follow Ribardo (2000). The principal advantages of these networks relate to their pedagogical use, in that they can be completely described. Also, they can be
implemented with minimal training and without special software. Note that neural
networks are a particularly broad class of modeling techniques. No single
implementation could be representative of all of the possible methodologies that
have been proposed in the literature. Therefore, the disadvantage of this approach
is that other, superior, implementations for similar problems almost surely exist.
However, results in Ribardo (2000) with the proposed neural net probably justify a
few general comments about training and complexity associated with the existing
methods.
Kohonen (1989) and Chambers (2000) review neural net modeling in the
context of predicting continuous responses such as undercut in millimeters. Neural
nets have also been proposed for classification problems involving discrete
responses.
Here also, an attempt is made to avoid using the neural net terminology as
much as possible to facilitate the comparison with the other methods. Basically,
neural nets are a curve-fitting procedure that, like regression, involves estimating
several coefficients (or “weights”) by solving an optimization problem to minimize
the sum of squares error. In linear regression, this minimization is relatively trivial
from the numerical standpoint, since formulas are readily available for the solution,
i.e., β est = (X′X)-1X′y. In neural net modeling, however, this minimization is
typically far less trivial and involves using a formal optimization solver. In the
implementation here, the Excel solver is used. In more standard treatments,
however, the solvers involve methods that are to some degree tailored to the
specific functional form (or “architecture”) of the model (or “net”) being fit. A
solver algorithm called “back-propagation” is the most commonly used method for
estimating the model parameters (or “training on the training set”) for the model
forms that were selected. This solver technique and its history are described in
Rumelhart and McClelland (1986) and Reed and Marks (1999).
There are many possible functional forms (“architectures”) and, unfortunately,
little consensus about which of these forms yield the lowest prediction errors for
which type of problems, e.g., see Chambers (2000). An architecture called “single
hidden layer, feed forward neural network based on sigmoidal transfer functions”
with five randomly chosen data in the test set and the simplest “training
termination criterion” was arbitrarily selected. One reason for selecting this
architecture type is that substantial literature exists on this type of network, e.g.,
see Chambers (2000) for a review. Also, it has been demonstrated rigorously that,
with a large enough number of nodes, this type of network can approximate any
continuous function (Cybenko 1989) to any desired degree of accuracy. This fact
may be misleading, however, because in practice the possible number of nodes is
limited by the amount of data (see below). Also, it may be possible to obtain a
relatively accurate net with fewer total nodes using a different type of architecture.
The choice of the number of nodes and the other specific architectural considerations are largely determined by the accepted compromise between explaining the observed variation (a high adjusted R2) and what is referred to as "over-fitting".
Figure 16.2 illustrates the concept of over-fitting. In the selected feed-forward
architecture, for each of the l nodes in the hidden layer (not including the constant
node, which always equals 1.0), the number of coefficients (or “weights”) equals
the number of factors, m, plus two. Therefore, the total number of weights is w =
l(m+2) + 2. The additional two weights derive from the weight multiplying the constant node in the final prediction node and from the (optional) overall scale factor, which can help
in achieving realistic weights.
Several rules of thumb for selecting w and l exist and are discussed in
Chambers (2000). If w equals or exceeds the number of data points n, then
provably the sum of squares error is zero and the neural net passes through all the
points as shown in Figure 16.2 below. If there are random errors in the data,
illustrated in Figure 16.2 by the εis, then prediction of the average response values
will be inaccurate since the net has been overly influenced by these “random
errors”.
The simple heuristic method for selecting the number of nodes described in
Ribardo (2000) will be adopted to address this over-fitting issue in the response
surface context. This approach involves choosing the number of nodes so that the
number of weights approximately equals the number of terms in a quadratic Taylor
series or, equivalently, a response surface model. The number of such terms is (m +
1)(m + 2)/2. In the case study, m = 5 and the number of terms in the RSM
polynomial is 21. This suggests using l = 3 nodes in the hidden layer so that the
number of weights is 3 × (5 + 2) + 2 = 23 weights.
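One plausible reading of this architecture can be sketched as follows, assuming Python with numpy and scipy, m = 5 factors, l = 3 hidden nodes, and random placeholder data; scipy's general-purpose minimizer stands in for back-propagation or the Excel solver, and the weight layout (m input weights, a bias, and an output weight per node, plus an output constant and an overall scale factor) is one assumption consistent with w = l(m + 2) + 2.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, l, n = 5, 3, 46                        # factors, hidden nodes, runs (placeholders)
X = rng.uniform(-1, 1, size=(n, m))       # hypothetical coded factor settings
y = rng.uniform(0, 1, size=n)             # hypothetical response values

def predict(w, X):
    # Unpack l*(m+2) node weights plus an output constant and an overall scale factor
    W = w[: l * (m + 2)].reshape(l, m + 2)
    node_in, node_bias, node_out = W[:, :m], W[:, m], W[:, m + 1]
    const, scale = w[-2], w[-1]
    hidden = 1.0 / (1.0 + np.exp(-(X @ node_in.T + node_bias)))  # sigmoidal transfer
    return scale * (hidden @ node_out + const)

def sse(w):
    return np.sum((y - predict(w, X)) ** 2)

w0 = rng.normal(scale=0.1, size=l * (m + 2) + 2)   # 3*(5 + 2) + 2 = 23 weights
fit = minimize(sse, w0, method="BFGS")
print(fit.fun, predict(fit.x, X[:3]))
```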
One additional complication is the number of runs selected for the so-called
“test set”. These runs are set aside and not used for estimating the weights in the
minimization of the sum of squares error. In the context of welding parameter
development from planned experiments, it seems reasonable to assume that the
number of runs is typically small by the standards discussed in the neural net
literature. Therefore, the ad hoc selection of five random runs for the test set was
proposed because this is perhaps the smallest number that could reasonably be
expected to provide an independent and reliable estimate of the prediction errors.
A final complication is the so-called “termination criterion” for the
minimization of the sum of squares error. In the hopes of avoiding over-fitting
inaccuracies illustrated in Figure 16.2, many neural net users do not attempt to
solve the sum of squares minimization problem for the coefficients (“weights”) to
global optimality. Instead they terminate the minimization algorithm before its
completion based on nontrivial rules deriving from inspection of the test set errors.
For simplicity, these complications were ignored, and the Excel solver was
permitted to attempt to select the weights that globally minimize the sum of
squares error.
[Figure 16.2. Illustration of over-fitting: data points with random errors εi, the true model, and an over-fit estimated model that passes through all the data points.]
Next the construction of neural nets is illustrated using a welding example from
Ribardo (2000). The response data is shown in Table 16.3 from a Box-Behnken
experimental design. This data was used to fit (“train”) the spreadsheet neural net
in Figure 16.3.
Figure 16.3. The Excel spreadsheet neural net for undercut response
Then 46 identical neural nets were created in Excel, designed to predict each of the experimental run responses based on a common set of weights. A random number generator
was used to select five of these runs as the “test set”. The Excel solver was used
next in order to minimize the training set sum of squares error by optimizing the
weights. This procedure resulted in the net shown in Figure 16.4 for undercut with
the weights at the bottom right and the weights for convexity shown in Table 16.4.
Figure 16.5 shows a plot of the neural net predictions for the welding example compared with those from the other methods considered in Ribardo (2000).
Table 16.3. Box Behnken design and data for the neural net from Ribardo (2000)
Table 16.4. The weights for the convexity response neural net
Factors/inputs Const
1 2 3 4 5 6
Node 1 7.069 -6.35 -7.12 7.086 -6.2 0.646
Node 2 2.134 2.048 2.899 -2.66 -2.25 -0.26
Node 3 54.01 0.518 37.03 19.98 -38.4 -35.5
FHL 5.26 5.487 5.224 -6.59 -0.28
Figure 16.4. The solver fields for estimating the net coefficients (“weights”)
[Plot of predicted undercut (mm) versus arc length.]
Figure 16.5. Predictions from the models derived from alternative methodologies
In many applications, the quantity of interest is the probability that the response will be any one of categorically different response levels (e.g., see Ben-Akiva and Lerman 1985 and Hosmer and Lemeshow 1989).
Logistic regression models are a widely used set of modeling procedures for
predicting these probabilities. It is particularly relevant for cases in which what
might be considered a large number of data points is available. Considering that
"data mining" is the analysis of very large flat files, logistic regression can be
considered an important data mining technique. Also, “discrete choice models” are
logistic regression models in which the levels of the categorical variables are
options a decision-maker might select. In these situations, the predicted probability is the market share a product might command when faced with a specified list of competitors.
Logistic regression models including discrete choice models are based on the
following concept. Each level of the categorical response is associated with a
continuous random variable, which we might call ui for the “utility” of response
level i. If the random variable associated with a given level is highest, that level is the response or choice. Figure 16.6 (a) shows a response with two levels, e.g., system
options a decision-maker might choose. System 1 random variables have a lower
average than system 2 random variables. However, by chance the realization for
system 1 (♦) has a higher value than for system 2 (♦). Then, the response would
be system 1 but, in general, system 2 would have a higher probability.
Figure 16.6. Utilities of (a) two systems each with fixed level and (b) two system functions
Figure 16.6 (b) shows how the distribution means of the two random variables
are functions of a controllable input factor x. By adjusting x, it could be possible to
tune each system to its optimum resulting in the highest chance that that level (or
system) will occur (or be chosen). Note that the input factor levels that tune one
system to its maximum can be different than those that tune another system to its
maximum. The goal of experimentation in logistic regression is, therefore, to
derive the underlying functions and then to use these functions to predict
probabilities.
The specific utilities, ui, for each level i are random variables. “Logit models”
are logistic regression models based on the assumption that the random utilities
follow a so-called “extreme value” distribution. “Probit models” are logistic
regression models based on the assumption that the utilities are normally
distributed random variables. Sources of randomness in the utilities can be
attributed to differences between the average system performance and the actual performance and/or differences between the individual decision-maker and the average
decision-maker.
Question: Develop an experimental plan with the following properties. Three
prototypes (short, medium, and tall) are required. Two prices ($10, $15) are tested.
Three people are involved in choosing (Frank, Neville, and Maria). People never
choose between more than two alternatives at a time.
Answer: There are many possible plans. One solution is shown in Table 16.5.
Note that, with the restriction on the choice set size, this becomes a discrete choice
problem.
For example, Sandor and Wedel (2001) used the so-called “Db-error criterion”
to generate lists of recommended prototypes and arrangements for their
presentation to representative samples of consumers. The Db-error criterion is
Logit models are probably the most widely used logistic regression and discrete
choice models, partly because the associated extreme value distribution makes
logit models easy to work with mathematically. The following notation is used in
fitting logit models:
1. c is the number of choice sets.
2. xj,s is an m vector of factor levels (attributes) of response level
(alternative system) j in choice set s. In ordinary logistic regression
cases, there is only one choice set (c = 1), and j is similar to the usual
run index in an experimental design array.
3. ms is the number of response levels (alternative systems) in choice
set s.
4. ns is the number of observations of selections from choice set s.
5. βest,j is the estimated coefficient reflecting the average utility of the
response level j as a function of the factor levels. Here, the focus is on
the assumption that β est,j = βest for all j.
6. fj(x) is the functional form of the response j (alternative system)
model. Here, the focus is on the assumption that fj(x) = f(x) for all j.
7. pj,s(x,βest) is the probability that the response j with attributes (x) will
be selected in the set s.
8. yj,s denotes the number of selections of the alternative j in the choice
set s.
9. ln L(βest) is the log-likelihood which is the fitting objective.
Equation (16.16) can be used to estimate chances that responses will take on
specific values. If the responses come from observing peoples’ choices, Equation
(16.16) could be used to estimate market shares that a new alternative or product
with input values x might achieve.
The form in Equations (16.14) through (16.16) is associated with potentially
restrictive “independence from irrelevant alternatives” (IIA) property. This
property is that a change in the attributes, x, of one alternative j necessarily results
in a change in all other choice probabilities, exactly preserving their relative
magnitudes. This property is generally considered undesirable and motivates alternatives to logit-based logistic regression models.
Note also, some of the attributes associated with specific choices in choice sets
could be associated with the decision-makers, e.g., their incomes. This might not
require changes to the above formulas as illustrated by the next example. In
Advanced Regression and Alternatives 395
general, many variations of the above approach are considered in the literature with
complications depending on relevant assumptions and the input pattern or design
of experiments array.
Step 1. Observe the ns selections for s = 1,…,c and document the choice counts yj,s
in the context of the input pattern, xj,s.
Step 2. Estimate parameters by maximizing the likelihood and solving
Maximize: ln L(βest) = Σs=1,…,c Σj=1,…,ms yj,s ln[ pj,s(xj,s,βest) ]      (16.14)
where
pj,s(xj,s,βest) = exp[ f(xj,s)′βest ] ÷ Σr=1,…,ms exp[ f(xr,s)′βest ] .      (16.15)
Step 3. The predicted probability that the response will be level l (i.e., that alternative l, which is associated with factor levels x, will be chosen) in a choice set with alternatives z1,…,zq is the following:
pl,s(x,βest) = exp[ f(x)′βest ] ÷ { exp[ f(x)′βest ] + Σr=1,…,q exp[ f(zr)′βest ] } .      (16.16)
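A sketch of Equations (16.14)-(16.16) for a single choice set is given below, assuming Python with numpy and scipy and hypothetical attribute data and counts (not the prototypes or choices of Table 16.7).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
ms, m = 4, 3                              # alternatives in the single choice set, attributes
F = rng.uniform(-1, 1, size=(ms, m))      # f(x_j,s) for each alternative (hypothetical)
counts = np.array([30, 10, 45, 15])       # y_j,s: hypothetical selection counts

def probs(beta, F):
    # Equation (16.15): multinomial logit choice probabilities within a choice set
    u = F @ beta
    e = np.exp(u - u.max())               # subtract the maximum for numerical stability
    return e / e.sum()

def neg_log_like(beta):
    # Negative of Equation (16.14) for a single choice set (c = 1)
    return -np.sum(counts * np.log(probs(beta, F)))

beta_est = minimize(neg_log_like, np.zeros(m)).x

# Equation (16.16): predicted share of a new alternative x against competitors z1, z2
x_new = np.array([0.5, -0.2, 0.1])
competitors = F[:2]
print(probs(beta_est, np.vstack([x_new, competitors]))[0])
```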
Table 16.7. (a) Prototypes shown to the people and (b) choices and utility calculations
(a) (b)
Choice x4 – Income Estimated
Response x1 x2 x3 Person yj,1 Choice Ln(prob)
# (×$100K)
Prob.
x1,1 0 0 0 1 1 11 0.3 5.00E-02 -1.301E+00
x2,1 -1 1 0 2 2 6 0.5 5.00E-02 -1.301E+00
x3,1 0 1 -1 3 3 3 0.4 5.00E-02 -1.301E+00
x4,1 -1 0 -1 4 4 10 0.2 5.00E-02 -1.301E+00
x5,1 1 0 1 5 5 9 0.7 5.01E-02 -1.300E+00
x6,1 -1 0 1 6 6 5 0.3 4.98E-02 -1.303E+00
x7,1 1 0 -1 7 7 1 0.4 5.00E-02 -1.301E+00
x8,1 1 -1 0 8 8 8 0.5 4.99E-02 -1.302E+00
x9,1 0 -1 -1 9 9 6 0.3 4.99E-02 -1.302E+00
x10,1 0 -1 1 10 10 9 0.3 5.00E-02 -1.301E+00
x11,1 0 1 1 11 11 9 0.5 5.00E-02 -1.301E+00
x12,1 -1 -1 0 12 12 6 0.6 5.01E-02 -1.300E+00
x13,1 1 1 0 13 13 1 0.2 5.00E-02 -1.301E+00
14 7 0.2 5.01E-02 -1.301E+00
15 7 0.3 4.99E-02 -1.302E+00
16 5 0.1 5.02E-02 -1.299E+00
17 8 0.2 5.01E-02 -1.300E+00
18 13 0.3 5.00E-02 -1.301E+00
19 5 0.25 4.99E-02 -1.302E+00
20 2 0.3 5.00E-02 -1.301E+00
Sum -2.602E+01
16.7 References
Allen T, Bernshteyn M, Kabiri K, Yu L (2003) A Comparison of Alternative
Methods for Constructing Meta-Models for Computer Experiments. The
Journal of Quality Technology, 35(2): 1-17
Ben-Akiva M, Lerman SR (1985) Discrete Choice Analysis. MIT Press, Cambridge, Mass.
Chambers M (2000) Queuing Network Construction Using Artificial Neural
Networks. Ph.D. Dissertation. The Ohio State University, Columbus.
Cybenko G (1989) Approximations by Superpositions of a Sigmoidal Function.
Mathematics of Control, Signals, and Systems. Springer–Verlag, New York
16.8 Problems
1. Which is correct and most complete based on the data in Table 16.1?
a. A kriging model prediction would be yest(x1=4.5) = 101.5.
b. Kriging model predictions could not pass through x1 = 3 and y1 = 70.
c. FEM experiments necessarily involve random errors.
d. Kriging models cannot be used for the same problems as regression.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
2. Which is correct and most complete based on the data in Table 16.1 with the
last response value changed to 200?
a. A kriging model prediction would be yest(x1=4.5) = 101.5.
b. The kriging models in the text are based on the assumption σ0 = 0.0.
c. The maximum log-likelihood value for θ1 is less than 1.5.
17.1 Introduction
In this chapter, two additional case studies illustrate design of experiments (DOE)
and regression being applied in real world manufacturing. The first study involved
the application of screening methods for identifying the cause of a major quality
problem and resolving that problem. The second derives from Allen et al. (2000)
and relates to the application of a type of response surface method. In this second
study, the design of an automotive part was tuned to greatly improve its
mechanical performance characteristics.
Note that Chapter 13 contains a student project description illustrating standard
response surface methods and what might realistically be achieved in the course of
a university project. Also, Chapter 14 reviews an application of sequential response
surface methods to improve the robustness and profitability of a manufacturing
process.
of the world market for the type of component produced. For confidentiality
reasons, we will refer to the part as a “bottle cap” which is the informal name
sometimes used within the company. In the years prior to the study, the company
had been highly successful in reducing production costs and improving profits
through the intelligent application of lean manufacturing including value stream
mapping (Chapter 5) and other industrial and quality engineering-related
techniques. Therefore, the management of the company was generally receptive to
the application of formal procedures for quality and process improvement.
In its desire to maintain momentum in cost-cutting and quality improvement,
the company decided to purchase two new machines for applying rubber to the
nickel-plated steel cap and hardening the rubber into place. The machines cost
between $250,000 and $500,000 each in direct costs. These new machines required
less labor content than the previous machines and promised to achieve the same
results more consistently. Unfortunately, soon after the single production line was
converted to using the new machine, the rubber stopped sticking on roughly 10%
of the bottle caps produced. Because this failure type required expensive rework as
well as 100% inspection and sorting, the company reverted to its old process.
[Figure 17.1. The "bottle cap" part.]
Because of the unexpected need to use the old process, the company was rapidly
losing money due to overtime and disruptions in the product flow through the
plant. Therefore, the problem was to adjust the process inputs (x) to make the
rubber stick onto the nickel plating consistently using the newer machines.
Unfortunately, the engineers and technicians had many theories about which
factors should be adjusted to which levels, with little convincing evidence
supporting the claims of each person (because of the application of OFAT). Seven
candidate input factors were identified whose possible adjustment could solve the
problem. Also, considering the volume of parts produced and the ease of
inspection, it was possible to entertain the use of reasonably large batch sizes, i.e.,
b = 500 was possible.
Question: Which of the following could the first team most safely be accused of?
a. Leaders stifled creativity by adopting an overly formal decision-
making approach.
b. The team forfeited the ability to achieve statistical proof by using a
nonrandom run order.
c. The team failed to apply engineering principles and relied too much
on statistical methods.
d. The team failed to devote substantial resources to solve the problem.
Answer: This answer is virtually identical to the one in the printed circuit-board
study. Compared with many of the methods described in this book, team one has
adopted a fairly "organic" or creative decision style. Also, while it is usually
possible to gain additional insights through recourse to engineering principles, it is
likely that these principles were consulted in selecting factors for OFAT
experimentation to a reasonable extent. In addition, the first team did provide
enough data to determine the usual yields prior to implementing recommendations.
Therefore, the criticisms in (a), (c), and (d) are probably not fair. According to
Chapter 11, random run ordering is essential to establishing statistical proof.
Therefore, (b) is correct.
The improvement team selected the eight run fractional factorial in Table 17.2
to structure experimentation. The resulting fractions nonconforming are also
described in the right-hand column. Interestingly, all fractions were lower than
expected perhaps because of a Hawthorne effect, i.e., the act of watching the
process carefully seems to have improved the quality.
One of the factors involved a policy decision about how long parts could wait
in queue in front of the rubber machine before they would need to be “reprimed”
using an upstream “priming” machine. This factor was called “floor delay”. If the
results had suggested that floor delay was important, the team would have issued
recommendations relating to the redesign of engineering policies about production
scheduling to the plant management. It was recognized that we probably could not
directly control the time parts waited. With 4000 parts involved in the experiment,
complete control of the times would have cost too much time.
Therefore, the team could only control decisions within its sphere of influence.
Implicitly, therefore, the “system boundaries” were defined to correspond to what
could be controlled, e.g., a maximum time of 15 minutes for parts to sit without being re-primed was included in the recommended guidelines. This was the control
factor. To simulate the impacts of possible decisions the team would make on this
issue, parts were either re-primed in the experiment if they waited longer than 15
minutes or they were constrained to wait at least 12 hours.
The main effects plot in Figure 17.2 and the results of applying Lenth’s
analysis method both indicated that Factor F likely affected the fraction
nonconforming. Because the statistic called “tLenth” for this factor is greater than
the “critical value” tIER,0.05,8 = 2.297 and the order of experimentation was
determined using randomization, many people would say that “this factor was
proven to be significant with α = 0.05 using the individual error rate.”
Table 17.1. Factors and the ranges decided by the engineering team
Table 17.2. The experimental design and the results from the rubber machine study
Run A B C D E F G Y1
1 1 -1 1 -1 1 -1 1 4.4
2 -1 -1 -1 1 1 1 1 0
3 1 1 -1 -1 1 1 -1 0
4 -1 1 1 1 1 -1 -1 3.8
5 -1 1 1 -1 -1 1 1 0
6 1 1 -1 1 -1 -1 1 0.6
7 -1 -1 -1 -1 -1 -1 -1 2.8
8 1 -1 1 1 -1 1 -1 0
Table 17.3. Analysis results for the rubber machine screening experiment
Factor   Estimated coefficient (βest)   tLenth
A   -0.2   0.48
B   -0.35   0.85
C   0.6   1.45
D   -0.35   0.85
E   0.6   1.45
F   -1.45   3.52
G   -0.2   0.48
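For readers who wish to check the analysis, the following short Python sketch reproduces the estimated coefficients in Table 17.3 from the array and responses in Table 17.2 by averaging contrasts. It is an illustration only, not the software used in the study.

# Hypothetical sketch: each coefficient is the contrast average (1/8) * sum(column * Y)
import numpy as np

design = np.array([  # columns A..G in coded (-1, +1) units, from Table 17.2
    [ 1, -1,  1, -1,  1, -1,  1],
    [-1, -1, -1,  1,  1,  1,  1],
    [ 1,  1, -1, -1,  1,  1, -1],
    [-1,  1,  1,  1,  1, -1, -1],
    [-1,  1,  1, -1, -1,  1,  1],
    [ 1,  1, -1,  1, -1, -1,  1],
    [-1, -1, -1, -1, -1, -1, -1],
    [ 1, -1,  1,  1, -1,  1, -1],
])
y = np.array([4.4, 0, 0, 3.8, 0, 0.6, 2.8, 0])  # % nonconforming, from Table 17.2

coefficients = design.T @ y / len(y)
for name, beta in zip("ABCDEFG", coefficients):
    print(name, round(beta, 2))   # e.g., F -> -1.45, matching Table 17.3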
Figure 17.2. Main effects plots derived from the fractional factorial experiment (vertical axis: % not sticking, from 0 to 3.5; horizontal axis: the low and high levels A-, A+ through G-, G+)
After the experiment and analysis, it was discovered that some of the
maintenance technicians in the plant had been adjusting the shot size intermittently
based on their intuition about how to correct another less serious problem that
related to the “leaker” cosmetic defect. This problem was less serious because the
rework operation needed to fix the parts for this defect involved only scraping off
the parts, instead of pulling off all the rubber, cleaning the part, and starting over.
The maintenance staff involved had documented their changes in a notebook, but
no one had thought to try to correlate the changes with the incidence of defective
parts.
A policy was instituted and documented in the standard operating procedures
(SOPs) that the shot size should never be changed without direct permission from
the engineering supervisor, and the “hydrolyzer” machine was removed from the
process. The non-sticking problem effectively disappeared, and production shifted
over to the new machines, saving roughly $15K/month in direct supplemental labor
costs. The change also effectively eliminated costs associated with production
disruption and having five engineers billing their time to an unproductive activity.
Another team was created to address the less important problem of eliminating the
fraction of parts exhibiting the cosmetic defect.
Table 17.4. Cause and effect (C&E) matrix used for “pre-screening” factors
Issue (manufacturing engineer rating)   Ratings for the eight candidate factors
Easy to assemble (4.5)   3.5, 9.0, 2.0, 2.5, 8.0, 8.0, 7.5, 4.0
Strong enough to replace screws (10.0)   4.5, 10.0, 3.5, 8.0, 7.5, 7.5, 1.0, 3.0
Factor rating number (F′)   60.8, 140.5, 44.0, 91.3, 111.0, 111.0, 43.8, 48.0
The candidate factors rated in the matrix included the ledge width (A), two loop dimensions (B and C), the loop radius, the entry angle, and the flat length.
Figure 17.3. The snap tab design concept optimized in our case study (the labeled dimensions include B and D)
The constraint on test runs followed from the fact that each test to evaluate
pull-apart and insertion forces required roughly three days of two people working
to create and analyze a finite element method simulation. Since management was
only willing to guarantee enough resources to perform 12 experimental runs,
application of the standard central composite design, which required at least 25
runs, was impossible. Even a two-level design that permits accurate estimation of interactions contains 16 runs, so just the first experiment in two-step RSM could not be applied. Even the small central composite design, which had at least 17 runs, was practically impossible (Myers and Montgomery 2001). Note that two similar optimization projects were actually performed using different materials. These necessities led to the use of non-standard response surface methods.
The majority of standard design of experiments (DOE) methods were presented
initially in the Journal of the Royal Statistical Society: Series B and Technometrics.
These and other journals including the Journal of Quality Technology, the Journal
of Royal Statistical Society: Series C, Quality & Reliability Engineering
International, and Quality Engineering contain many innovative DOE methods.
These methods can address nonstandard situations, such as those involving
categorical and mixture factors (Chapter 15), and/or potentially result in more
accurate predictions and declarations for cases in which standard methods can be
applied.
In this study, the team chose to apply so-called “low cost response surface
methods” (LCRSM) from Allen et al. (2000) and Allen and Yu (2002). Those
papers provide tabulated, general-purpose experimental designs for three, four, and
five factors each with roughly half the number of runs of the corresponding central
composite designs and comparable expected prediction errors. Table 17.5 shows
the design of experiments (DOE) arrays and model forms relevant to LCRSM.
Table 17.6 shows the actual DOE array used in the case study. Note that no
repeated tests were needed because finite element method (FEM) computer
experiments have little or no random error, as described in Chapter 16.
Table 17.5. LCRSM: (a) initial design (b) the model forms, and (c) the additional runs
Table 17.6. Experimental runs and the measured pull-apart and insertion forces
Run A B C D Y1 Y2
1 1.25 1.7 12.5 10.00 55.95 15.39
2 2.00 2.1 10.0 10.00 101.76 19.92
3 1.00 2.1 20.0 10.00 101.23 21.02
4 2.00 1.7 12.5 6.25 52.93 18.55
5 1.50 1.9 10.0 7.50 59.93 13.42
6 1.50 2.1 15.0 7.50 80.54 15.90
7 1.25 1.7 20.0 6.25 60.87 14.70
8 1.00 1.9 15.0 7.50 72.02 13.51
9 2.00 2.1 20.0 5.00 102.70 22.81
10 1.00 2.1 10.0 5.00 51.36 23.79
11 1.50 1.9 15.0 5.00 59.42 26.33
12 1.75 1.8 8.8 8.75 81.94 13.50
Step 1: Set up the experiment by taking the experimental design appropriate for the relevant number of factors from the appropriate table. Here only the four factor design in Table 17.5 (a) is given, which is given in scaled (–1,1) units. Scale to engineering units, e.g., see Table 17.6, perform the experiments, and record the responses.
Step 2: Create the regression model(s) of each response by fitting the appropriate set of candidate model forms from Allen and Yu (2002). For the design in Table 17.5 (a), this is the set in Table 17.5 (b). The model fitting uses least squares linear regression. Select the fitted model form with the lowest sum of squared errors.
Step 3: (The Least Squares Coefficient Based Diagnostic) Calculate
βq,est = [ q–1 Σi=1,…,q (βi,est)² ]1/2     (17.2)
where βi,est are the least squares estimates of the q second order coefficients in the model chosen in Step 2. Include coefficients of terms like A2 and BC, but not first order terms such as A and D. Estimate the maximum acceptable standard error of prediction or "plus or minus" accuracy goal, σprediction. If βq,est ≤ 1.0σprediction, refit the model form in the
engineering units. Stop. Otherwise, or if there is any special concern with
the accuracy, continue to Step 4. Special concerns might include mid-
experiment changes to the experimental design. The default assumption for
σprediction is that it equals 2.0 times the estimated standard error, because
then the achieved expected “plus or minus” accuracy approximately equals
the error that would be expected if the experimenter applied substantially
more expensive methods based on composite designs.
Step 4: Perform additional experiments specified in the appropriate if needed, e.g.,
Table 17.5 (c). After the experiment, fit a full quadratic polynomial
regression model as in ordinary response surface methods. The resulting
model is expected to have comparable prediction errors (within 0.2σ) as if
the full central composite with 27 runs had been applied.
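The following Python sketch illustrates the mechanics of Steps 2 and 3 only. The candidate model forms of Table 17.5 (b) are not reproduced in this text, so the two forms in the sketch are hypothetical stand-ins, and the cutoff of 2.0 × σprediction follows the modified Step 3 used in the case study rather than the default.

# Illustrative sketch of LCRSM Steps 2-3 (not the published code)
import numpy as np

def fit(terms, X, y):
    """Least squares fit of one candidate model form; terms maps labels to columns."""
    M = np.column_stack([f(X) for f in terms.values()])
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    sse = float(np.sum((y - M @ beta) ** 2))
    return dict(zip(terms, beta)), sse

# Two hypothetical candidate model forms in the four coded factors A, B, C, D
candidate_forms = [
    {"1": lambda X: np.ones(len(X)), "A": lambda X: X[:, 0], "B": lambda X: X[:, 1],
     "C": lambda X: X[:, 2], "D": lambda X: X[:, 3],
     "A^2": lambda X: X[:, 0] ** 2, "BC": lambda X: X[:, 1] * X[:, 2]},
    {"1": lambda X: np.ones(len(X)), "A": lambda X: X[:, 0], "B": lambda X: X[:, 1],
     "C": lambda X: X[:, 2], "D": lambda X: X[:, 3],
     "C^2": lambda X: X[:, 2] ** 2, "AD": lambda X: X[:, 0] * X[:, 3]},
]

def lcrsm_steps_2_and_3(X, y, sigma_prediction):
    # Step 2: keep the candidate form with the lowest sum of squared errors
    fits = [fit(terms, X, y) for terms in candidate_forms]
    coeffs, _ = min(fits, key=lambda f: f[1])
    # Step 3: diagnostic over the q second order coefficients (e.g., A^2 and BC)
    second_order = [b for name, b in coeffs.items() if "^2" in name or len(name) == 2]
    beta_q = float(np.sqrt(np.mean(np.square(second_order))))
    return coeffs, beta_q, beta_q <= 2.0 * sigma_prediction  # True means stop

# Example call with the 12 runs of Table 17.6 scaled to (-1, 1) units might look like:
# coeffs, beta_q, stop = lcrsm_steps_2_and_3(X_coded, y1, sigma_prediction=3.0)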
In the modified Step 3, the choice was made to set the desired accuracy to be
σprediction = 3.0 lb or ± 3 lb accuracy for the pull-apart force and σprediction = 3.0 lb for
the insertion force. The square roots of the sum of squares of the quadratic
coefficients divided by the number of quadratic coefficients, 6, for the two
responses were βq,est = 4.6 lb and 3.5 lb respectively. Since these were less than
their respective cutoffs, 2.0 × 3.0 lb = 6.0 lb for the pull force and 2.0 × 3.0 lb = 6.0
lb for the insertion force, we stopped. No more experiments were needed. The
expected average errors that resulted from this procedure were estimated to
roughly equal their desired values.
Compared with central composite designs using 25 distinct runs, there was a
savings of 13 runs, which was approximately half the experimental expense. The
expected average errors that resulted from this procedure were as small or smaller
than desired, i.e., within ±3 pounds for both pull apart and insertion forces
averaged over the region of interest. This prediction accuracy oriented experiment
was likely considerably more accurate than what could be obtained from a
screening experiment such as either of the first two case studies. Also, the project
was finished on time and within the budget.
The models obtained from the low cost response surface methods procedure
were then optimized to yield the recommended engineering design. The parameters
were constrained to the experimental region both because of size restrictions and to
assure good accuracy of the models. An additional constraint was that the insertion
force of the snap tab needed to be less than 12 lb to guarantee easy assembly. The
formal optimization program that we used was
Maximize: yest,1(A,B,C,D)
Subject to: yest,2(A,B,C,D) ≤ 12.0 lb
–1.0 ≤ A,B,C,D ≤ 1.0
where we expressed the variables in coded experimental units.
Using a standard spreadsheet solver, the optimal design was A = 1.0, B = 0.85, C = 1.0, and D = 0.33. Figure 17.6 shows the region of the parameter space near the optimal. The insertion force constraint is overlaid on the contours of the pull force. Forces are in pounds. In engineering units, the optimal engineering design was A = 2.0 mm, B = 2.07 mm, C = 20 mm, and D = 8.3 mm, with predicted pull-apart force equal to yest,1(A,B,C,D) = 118 lb.
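The same constrained maximization can be performed with a nonlinear programming routine instead of a spreadsheet solver. The Python sketch below is illustrative only: the quadratic models y1_est and y2_est use placeholder coefficients, because the fitted model coefficients from the case study are not reproduced in the text.

# Hedged sketch of the formal optimization step (placeholder fitted models)
import numpy as np
from scipy.optimize import minimize

def y1_est(x):   # placeholder pull-apart force model in coded units
    A, B, C, D = x
    return 80 + 20 * A + 15 * B + 10 * C + 5 * D - 4 * A * B - 3 * C ** 2

def y2_est(x):   # placeholder insertion force model in coded units
    A, B, C, D = x
    return 15 + 3 * A + 2 * B - 1 * C + 4 * D - 2 * D ** 2

result = minimize(
    lambda x: -y1_est(x),                      # maximize the pull-apart force
    x0=np.zeros(4),
    bounds=[(-1.0, 1.0)] * 4,                  # stay inside the experimental region
    constraints=[{"type": "ineq", "fun": lambda x: 12.0 - y2_est(x)}],  # y2 <= 12 lb
)
print(result.x, -result.fun)                   # recommended coded settings and force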
Note that all factors have at least one associated model term that is large in
either or both of the models derived from the model selection for the insertion and
pull-apart forces. If the team had used fewer factors to economize, then important
opportunities to improve the quality would likely have been lost because the effects
of the missing factors would not have been understood. These missing factors
would likely have been set to sub-optimal values.
Figure 17.5. Finite element analysis (FEA) simulation of the snap tab
The results of the snap fit case study are summarized in Figure 17.7. The “current” design was derived from existing standard operating procedures (SOPs) in the corporate design guide. Results associated with the “best guess” design, chosen
after run 1 was completed, and the final recommended design, are also shown in
Figure 17.7. Neither the best guess design nor the current model designs were
strong and small enough to replace screws. The size increase was deemed
acceptable by the engineers because the improved strength made replacing screws
feasible.
Note that there was a remarkable agreement between the predicted and the actual pull-apart forces (within 3%), which validates both the low cost response surface method predictions and our procedure for finite element simulation. The resulting optimized design was put into production and into the standard operating procedures. Some savings were achieved, but unanticipated issues caused the retention of screws on many product lines.
Figure 17.6. Insertion force constraint on pull force contours with A = 1 and C = 1 (axes B and D in coded units from -1.0 to 1.0; pull force contours from 70 to 120 lb; the 12 lb insertion force constraint and the optimal point X are marked)
Figure 17.7. Improvement of the snap fit achieved in the case study (relative sizes shown: 100%, 150%, and 176%)
(Figure: a system with control factor inputs x1, x2, x3, noise factors z1, z2, random errors ε1, ε2, …, εn, a true model with coefficients β1, …, βs, and response outputs y1, y2, …, ys)
Randomization increases our confidence that factors declared significant really do affect the average response. The following examples are designed to clarify the practical value of randomization.
Imagine that the rubber machine experiment had been performed in an order not
specified by pseudo-random numbers. Table 17.7 shows the same experimental
plan and data from the rubber machine study except the run order is given in an
order that displays some of the special properties of the experimental matrix. For
example, the columns corresponding to factors E, F, and G have an “elegant”
structure. This is a run order that Box and Hunter (1961) might have first generated
in their derivation of the matrix from combinatorial manipulations.
As in the real study, all of the runs with high fractions of nonconforming units
correspond to prototype systems in which the shot size was low. However, without
randomization, another simple explanation for the data confuses the issue of
whether shot size causes nonconforming units. The people performing the study
might simply have improved in their ability to operate the system, i.e., a “learning
effect”. Notice that only the first four runs are associated with poor results. The
absence of randomization in this imagined experiment would greatly diminish the
value of the collected data.
Table 17.7. The rubber machine experimental plan and data shown in a run order that was not randomized
Run A B C D E F G Y1
1 1 -1 1 -1 1 -1 1 4.4
2 -1 1 1 1 1 -1 -1 3.8
3 1 1 -1 1 -1 -1 1 0.6
4 -1 -1 -1 -1 -1 -1 -1 2.8
5 -1 -1 -1 1 1 1 1 0
6 1 1 -1 -1 1 1 -1 0
7 -1 1 1 -1 -1 1 1 0
8 1 -1 1 1 -1 1 -1 0
In a drug study that is not randomized, poor outcomes for the control group could easily be caused by smoking and not the absence of the drug. Using pseudo-random numbers makes this type of confusion or “confounding” extremely unlikely. For example, if there are 10 smokers in a group of 30, the chance that all 10 would be randomly assigned to a test group of 15 is C(20,5) ÷ C(30,15), which is less than 0.0001.
Because of the desirable characteristics from randomization, researchers in
multiple fields associate the word “proof” with the application of randomized
experimental plans. Generally, researchers draw an important distinction between
inferences drawn from “on-hand data”, i.e., data not from randomized
experimental plans, which they call observational studies, and the results from
randomized experimental plans. In language that I personally advocate, one can only claim a hypothesis is “proven” if one has a mathematical proof with stated assumptions or “axioms”, a derivation of the hypothesis from the standard model in physics, or evidence from hypothesis testing based on randomized experimental plans.
The issue of fidelity further complicates the use of the word proof. As noted
earlier, in all of the studies, the stakeholders were comfortable with the assumption
that the prototype systems used for experimentation were acceptable surrogates for
the engineered systems that people cared about, i.e., that made money for the
stakeholders. Still, it might be more proper to say that causality was proven in the
randomized experiments for the prototype systems and not necessarily for the
engineered systems. Conceivably, one could prove a claim pertinent to a low
fidelity prototype system in the laboratory but not be able to generalize that claim
to the important, highest fidelity, real-world system in production. Although
methods to address concerns associated with fidelity are a subject of ongoing
research, fidelity issues, while extremely important, continue to be largely outside
the scope of formal statistical methods.
Note that randomization benefits are associated with the effects of factors that
are not controlled. Since these factors are often overlooked, the experimenter may
not have the option of controlling and fixing them. Yet, it is also not clear that
controlling these factors would be desirable (even if it were possible) since their
variation might constitute an important feature of the engineered system.
Therefore, a tightly controlled prototype system might be a low fidelity surrogate
for the engineered system. This explains why proof is generally associated with
randomization and not control.
strength and insertion effort for snap tabs. Formal optimization of the resulting surface models permitted the doubling of the strength with a small increase in size.
17.11 References
Allen TT, Yu L (2002) Low Cost Response Surface Methods For and From
Simulation Optimization. Quality and Reliability Engineering International
18: 5-17
Allen TT, Yu L, Bernshteyn M (2000) Low Cost Response Surface Methods
Applied to the Design of Plastic Snap Fits. Quality Engineering 12: 583-591
Brady J, Allen T (2002) Case Study Based Instruction of SPC and DOE. The
American Statistician 56 (4):1-4
Myers RH, Montgomery DC (2001) Response Surface Methodology, 5th edn. John Wiley & Sons, Inc., Hoboken, NJ
17.12 Problems
Use the following information to answer Questions 1-3:
The above approach resulted in a disastrous drop in the yield (to 40%), and an IE
“DOE expert” was called in to plan new experiments. Someone other than an
electrical engineer then suggested an additional factor to consider.
The IE-led team found that the operator-suggested factor was critical, adjusted only it, and increased the yield to 95%.
8. Perform an experiment involving four factors and one or more responses using standard screening with fractional factorials or response surface methods. The experimental system studied should permit building and testing individual prototypes requiring less than $5 and 10 minutes of time.
18
DOE and Regression Theory
18.1 Introduction
As is the case for other six sigma-related methods, practitioners of six sigma have
demonstrated that it is possible to derive value from design of experiments (DOE)
and regression with little or no knowledge of statistical theory. However,
understanding the implications of probability theory can be intellectually
satisfying and enhance the chances of successful implementations.
Also, in some situations, theory can be practically necessary. For example, in
cases involving mixture or categorical variables (Chapter 15), it is necessary to go
beyond the standard methods and an understanding of theory is needed for
planning experiments and analyzing results. This chapter focuses attention on three
of the most valuable roles that theory can play in enhancing DOE and regression
applications. For a review of basic probability theory, refer to Chapter 10.
First, applying t-testing theory can aid in decision-making about the numbers of
samples and the α level to use in analysis. Associated choices have implications
about the chances that different types of errors will occur. Under potentially
relevant assumptions, the chance of wrongly declaring significance (a Type I error)
might not be the α level used. Also, if the number of runs is not large enough, a
lost opportunity for developing statistical evidence is likely (a Type II error).
Second, theory can aid in the many decisions associated with standard
screening using fractional factorials. Decisions include which DOE array to use,
which alpha level to use in analysis, and whether to use the individual error rate
(IER) or experimentwise error rate (EER) critical values. With multiple factors
being tested simultaneously, many Type I and Type II errors are possible in the
same experiment.
Third, in applying response surface methods (RSM) and regression in general,
the resulting prediction models will unavoidably result in some inaccuracy or
prediction errors. Theory can aid in predicting what those errors will be and aid in
the selection of the design of experiments (DOE) array. In general, DOE arrays can be selected from a pre-tabulated set or custom designed. “Optimal design of experiments” is the activity of using theory and optimization to generate custom arrays tailored to the problem at hand.
Table 18.1. Preview of the design of experiments criteria explored in this chapter
Method | Criterion objective | Assumptions | Relevance
T-testing | Type I and II error probabilities | Responses are normally distributed with selected means | Correct declarations during analysis
Standard screening | Type I error and Type II error probabilities | Hierarchical assumptions based on normality and unknown true models | Correct declarations during analysis
One-shot RSM | Expected squared prediction errors or the “EIMSE” | Random, independent true model coefficients, errors, and prediction points | Accuracy of predictions after experimentation
Figure 18.1. An example DOE design problem with one simulation run or scenario (the method user selects DOE points, observes response data, declares whether x1 is significant, and makes a prediction at a prediction point)
A uniform random variable takes values on an interval [a,b]. The notation that we will use is U ~ U[a,b]. Uniform random variables have the distribution function fu(x) = (b – a)–1 for a ≤ x ≤ b and fu(x) = 0 otherwise.
The initial starting point of most simulations is a stream of approximately independent identically distributed (IID) random numbers from a uniform distribution between a = 0 and b = 1, written U[0,1]. As noted in Chapter 10, “independent” means that one is comfortable with the assumption that the next random variable’s distribution is not a function of the value taken by any other random variables for which the independence is believed to apply.
For example, if a person is very forgetful, one might be comfortable assuming
that this person’s arrival times to class on two occasions are independent. Under
that assumption, even though the person might be late on one occasion (and feel
bad) the person would not modify his/her behavior and the chance of being late the
next time would be the same as always. Formally, if f(x 1 ,x 2 ) is the “joint
probability density function”, then independence implies that it can be written
f(x 1 , x 2 ) = f(x 1 )f (x 2 ) . Also, the phrase “identically distributed” means that all
of the relevant variables are assumed to come from exactly the same distribution.
Consider the sequences of numbers Q1, Q2, …, Qn and U1, U2, …, Un given by
Qi = mod(1664525 Qi–1 + 1013904223, 2^32) and Ui = Qi ÷ 2^32 for i = 1, …, ∞ with Q0 = 1     (18.1)
where the function “mod” returns the remainder of the first quantity in the brackets
when divided by the second quantity. For example 14 mod 3 is 14 – 4(3) = 2. The
phrase “random seed” refers to any of the numbers Q1,…,Qn, which starts a
sequence.
Then, starting with Q0 = 3, the first eight values i = 1,…,8 of the Qi sequence
are 1018897798, 2365144877, 3752335016, 3345418727, 1647017498,
3714889393, 2735194204, and 1668571147. Also, the associated Ui are
0.23723063, 0.550678204, 0.873658577, 0.778915995, 0.383476144,
0.864940088, 0.636837027, and 0.388494494. We know that these numbers are
not random since they follow the above sequence, and all values can be predicted
precisely at time of planning. In fact, the sequence repeats every 4,294,967,296 numbers so that there is necessarily a perfect correlation between each element and the element 4,294,967,296 after it (they are identical). Therefore, the numbers are not independent, even though they can appear random when only short strings are considered. Still,
considering the histogram of the first 5000 numbers in Figure 18.2, it might be of
interest to pretend that they are IID U[0,1].
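The following minimal Python sketch implements the generator in Equation (18.1); starting it with Q0 = 3 reproduces the values quoted above.

# Minimal linear congruential generator sketch for Equation (18.1)
def lcg(q0, n):
    q, us = q0, []
    for _ in range(n):
        q = (1664525 * q + 1013904223) % 2 ** 32
        us.append(q / 2 ** 32)
    return us

print(lcg(3, 8))
# [0.23723063..., 0.550678204..., 0.873658577..., 0.778915995..., ...]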
For the computations in subsequent chapters, numbers are used based on
different, more complicated sequences of pseudo-random numbers given by the
function “ran2” in Numerical Recipes on pp. 282-283. Yet, the concept is the same.
The sequence that will be used also repeats but only after 2.3 × 1018 numbers.
Therefore, when ran2 is used one can comfortably entertain the assumption that
these are perfect IID uniform random variables.
Figure 18.2. Histogram of the first 5000 pseudo-random numbers (frequency axis from 0 to 600; bins 0.0-0.1 through 0.9-1.0)
Generally, pseudo-random numbers for distributions other than uniform are created
starting with uniformly distributed pseudo-random numbers. The “univariate
transformation method” refers to one popular way to create these random numbers,
illustrated in Figure 18.3. An initial pseudo-random U[0,1] number U is
transformed to another number, X, using the inverse of the so-called “cumulative distribution” or F function associated with the distribution of interest.
Since U has a roughly equal chance of hitting anywhere along the vertical axis, the chance that X will lie in any interval on the horizontal axis is proportional to the slope of the curve at that point. One can write this slope (d/dx)F(x). From the “Leibniz rule” in calculus (see the Glossary), we can see that (d/dx)F(x) = f(x)
if and only if
F(x) = ∫–∞^x f(u) du     (18.2)
Figure 18.3. One way to derive a pseudo-random number, X: the value U = 0.705 on the F(x) axis is mapped to X = F–1(U) on the horizontal axis, which runs from $9,500 through $10,000 to $10,600
For example, the cumulative distribution function, F(x), for the triangular
distribution function with a = $9,500 and b = $10,600, with c = $10,000 is
F(x) = 0 if x ≤ a,
F(x) = (x – a)² ÷ [(b – a)(c – a)] if a < x ≤ c,     (18.3)
F(x) = 1 – (b – x)² ÷ [(b – a)(b – c)] if c < x < b, and
F(x) = 1 if x ≥ b.
The inverse cumulative distribution function for the triangular is F–1(u) = a + sqrt[u(b – a)(c – a)] if u < (c – a) ÷ (b – a) and F–1(u) = b – sqrt[(1 – u)(b – a)(b – c)] otherwise.
Suppose someone tells you that she believes that revenues for her product line will
be between $1.2M and $3.0M next year, with the most likely value equal to $2.7M.
She says that $2.8M is much more likely than $1.5M.
f(x)
1.0
Figure 18.4. A proper distribution function consistent with the stated beliefs
Question 2: Use your own distribution function from Question 1 to estimate the
probability, according to her beliefs, that revenue will be greater than $2.6M.
Answer 2:
P(X > 2.6) = the shaded area above = 1 – P(X ≤ 2.6),     (18.5)
where P(X ≤ 2.6) is the cumulative distribution function evaluated at 2.6.
Answer 3:
F–1(u) = 1.2 + sqrt[u(3.0 – 1.2)(2.7 – 1.2)] if u < (2.7 – 1.2) ÷ (3.0 – 1.2)     (18.6)
F–1(u) = 3.0 – sqrt[(1 – u)(3.0 – 1.2)(3.0 – 2.7)] otherwise
Plugging in and marking the units we obtain: $1.9M, $2.65M, and $2.37M.
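A short Python sketch of this transformation method for the triangular distribution, using the stated beliefs a = 1.2, c = 2.7, and b = 3.0 (revenues in $M), is given below; the particular u value shown is only an illustration of the mapping.

# Sketch of inverse-CDF sampling for the triangular distribution above
import math

def triangular_inverse_cdf(u, a=1.2, c=2.7, b=3.0):
    if u < (c - a) / (b - a):
        return a + math.sqrt(u * (b - a) * (c - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))

# Feeding pseudo-random U[0,1] numbers through F^-1 gives pseudo-random revenues,
# e.g., u = 0.778916 maps to roughly $2.65M.
print(round(triangular_inverse_cdf(0.778916), 2))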
Then, one would like our pseudo-random numbers to reflect these “correlations”
and, e.g., have similar sample correlations.
Suppose that we have F–1(u) available for a normal distribution with mean µ = 0 and standard deviation σ = 1. Then, one can generate Z1, Z2, Z3, approximately IID standard normal random variables. It is a fact verifiable by linear algebra and calculus that if we form the matrices V and T, then
[X1, X2, X3]′ = T [Z1, Z2, Z3]′ + [µ1, µ2, µ3]′     (18.9)
Because of the unusual properties of the normal distribution, one can also say
that the Xi calculated this way are approximately normally distributed. For a recent
reference on generating random variables from almost any distribution with many
possible assumptions about correlations, see Deler and Nelson (2001).
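The following Python sketch illustrates Equation (18.9). Because the definitions of V and T are not reproduced in this excerpt, the sketch assumes the common choice in which T is the Cholesky factor of an assumed covariance matrix V; the covariance and mean values below are placeholders.

# Hedged sketch of Equation (18.9) with T taken as a Cholesky factor of V
import numpy as np

V = np.array([[4.0, 1.0, 0.5],    # assumed covariance matrix (placeholder)
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.0]])
mu = np.array([10.0, 5.0, 0.0])   # assumed means (placeholder)

T = np.linalg.cholesky(V)         # T such that T @ T.T = V
rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 10000))   # approximately IID standard normals
X = T @ Z + mu[:, None]               # Equation (18.9) applied to each column
print(np.cov(X))                      # sample covariance is close to V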
Note that it is possible to generate approximately IID random variables from
many distributions that have no commonly used names by constructing them from
other random variables. For example, if Z1 and Z2 are IID normally distributed with
mean 0 and standard deviation 1, then X1 = sin(Z1) and X2 = sin(Z2) are also IID,
but their distribution has no special name.
Defining
Xbarn = (X1 + X2 + ... + Xn) ÷ n and Zn = (Xbarn – ∫–∞^∞ u f(u)du) ÷ (σ ÷ sqrt[n]),     (18.10)
it follows that
lim n→∞ Pr(Zn ≤ x) = ∫–∞^x (1 ÷ sqrt[2π]) e^(–u²/2) du.     (18.11)
In words, averages of n random variables, Xbarn, are approximately characterized
by a normal probability density function. The approximation improves as the
number of quantities in the average increases. A reasonably understandable proof of this theorem, i.e., that the above assumptions imply this limiting distribution, is given in Grimmett and Stirzaker (2001), Chapter 5.
To review, the expected value of a random variable is:
E[X] = ∫–∞^∞ u f(u) du     (18.12)
Then, the CLT implies that the sample average, Xbarn, converges to the true mean E[X] as the number of random variables averaged goes to infinity. Therefore, the CLT can be effectively rewritten as
E[X] = Xbarn + eMC,     (18.13)
where eMC is normally distributed with mean 0.000 and standard deviation σ ÷ sqrt[n] for “large enough” n. It is standard to refer to Xbarn as the “Monte Carlo simulation estimate” of the mean, E[X]. Therefore, with only common causes operating, the Xbar chart user is charting Monte Carlo estimates of the mean.
Since σ is often not known, it is sometimes of interest to use the sample
standard deviation, s:
s = sqrt[ Σi=1,…,n (Xi – Xbarn)² ÷ (n – 1) ]     (18.14)
Then, it is common to use
σestimate = s ÷ c4 (18.15)
where c4 comes from Table 10.3. Therefore, the central limit theorem provides us
with an estimate of the errors of Monte Carlo estimates.
Consider a function g(X) of a random variable X with
E[g(X)] = ∫–∞^∞ g(u) f(u) du,
where f(x) is the distribution function of the random variable X. Then, to calculate E[g(X)], we can generate IID g(X) using IID X from the distribution function f(x). In this way, Monte Carlo simulation can evaluate a wide variety of expected values. For example, if g(X) is an “indicator function” which is 1 if an event A occurs and 0 otherwise, then E[g(X)] = Pr{A}.
This law can be proven using some of the basic definitions associated with integrals. Intuitively, if the probability that {X = x} is proportional to f(x), the probability that {g(X) = g(x)} is also proportional to f(x).
Question: Estimate ∫1^3 x² e^(x²) dx.
Answer: Rewriting, we have
∫1^3 x² e^(x²) dx = ∫–∞^∞ [(3 – 1) x² e^(x²)] f(x) dx     (18.17)
= E[g(X)] where g(x) = 2x² e^(x²)
and where f(x) is the density function for a uniform distribution with a = 1 and b =
3. Also, X ~ U[1,3], i.e., X is uniformly distributed with a = 1 and b = 3.
Therefore, the pseudo-random U[0,1] numbers 0.23723063, 0.550678204, 0.873658577, 0.778915995, 0.383476144, 0.864940088, 0.636837027, 0.388494494, and 0.033923503 can be used to construct the pseudo-random sequence 1.474461, 2.101356, 2.747317, 2.557832, 1.766952, 2.729880, 2.273674, 1.776989, and 1.067847, which pretend to be IID U[1,3]. Using the inverse cumulative distribution function is equivalent to multiplying by (b – a) and then adding a.
From this sequence, one constructs the sequence 38.24, 730.71, 28628.23, 9081.29, 141.71, 25691.32, 1818.08, 148.51, and 7.13, which we pretend are IID samples of 2X² e^(X²). The average of these numbers is 7365.0 and the standard
deviation is 11607. Therefore, the Monte Carlo estimate for the original integral is
7365.0 with estimated error 11607/3 = 3869.0. Using 10,000 pseudo-random
numbers the estimate is 10949.51 with standard error 255.9. Therefore the true
integral value is very likely within 768 of 10949.5 (three standard deviations or 3.0
× σestimate).
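The following Python sketch reproduces this Monte Carlo estimate from the nine pseudo-random numbers above.

# Sketch of the Monte Carlo integration example: X ~ U[1,3], g(x) = 2 x^2 exp(x^2)
import numpy as np

u = np.array([0.23723063, 0.550678204, 0.873658577, 0.778915995, 0.383476144,
              0.864940088, 0.636837027, 0.388494494, 0.033923503])
x = 1 + 2 * u                        # inverse CDF for U[1,3]
g = 2 * x ** 2 * np.exp(x ** 2)      # 38.24, 730.71, ..., 7.13 as in the text

estimate = g.mean()                          # about 7365
error = g.std(ddof=1) / np.sqrt(len(g))      # about 3869
print(estimate, error)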
or makes things worse. For example, a salesman might be selling snake oil as
something that makes hair grow when it does not.
The admission associated with t-testing is that even if changing levels does
nothing and the procedure is applied correctly, there is some low probability
significance will be established. Therefore, a criterion that can be used to evaluate
the t-test strategy is the probability that the test will wrongly indicate significance,
i.e., a “Type I error” is made and the snake oil salesman fools us.
The following assumptions can be used to create and/or verify the t-critical values used in all standard t-test procedures:
1. When level 1 is inputted, responses are IID normally distributed with
mean, µ1, and standard deviation, σ1.
2. When level 2 is inputted, responses are IID normally distributed with
mean µ2 and standard deviation, σ2.
3. µ1 = µ2 + ∆ and ∆ = 0.0 if Type I errors are being simulated.
Under these assumptions and when α = 0.05, the probability of wrongly finding
significance is well known to be 0.05 independent of µ1, σ1, µ2, and σ2. This is the
defining property of the t-test strategy. As an example of evaluating a procedure
using Monte Carlo, we next show how this probability (0.05) can be estimated for the case in example 1 in the preceding section.
Here, Monte Carlo is trying to estimate the number 0.05, which is the exact Type I error associated with the test strategy described above under standard assumptions.
Table 18.2 illustrates results from applying a spreadsheet-based simulation to estimate the Type I error rate.
Table 18.2. Simulations used to estimate the probability of Type I error (α)
No. Y1,1 Y1,2 Y1,3 Y2,1 Y2,2 Y2,3 y1 y2 s12 s22 t0 df tcritical I(Y)
1 -1.501 -6.388 1.221 -0.818 0.661 -0.760 -2.223 -0.306 14.867 0.702 -0.842 2 2.920 0
2 6.382 5.992 8.666 0.179 -0.031 -0.116 7.013 0.011 2.086 0.023 8.352 2 2.920 1
3 -10.918 -1.171 5.475 -1.137 0.610 0.092 -2.205 -0.145 67.984 0.805 -0.430 2 2.920 0
4 -5.434 -3.451 -8.452 -0.425 0.285 -0.680 -5.779 -0.273 6.342 0.250 -3.714 2 2.920 0
5 -9.235 -4.888 -3.868 2.008 -0.617 -0.564 -5.997 0.276 8.123 2.251 -3.373 3 2.353 0
6 -10.590 -2.840 -2.020 -1.457 -0.985 -1.044 -5.150 -1.162 22.362 0.066 -1.459 2 2.920 0
7 0.674 -1.827 -1.635 -1.163 0.895 -0.973 -0.929 -0.414 1.938 1.293 -0.497 3 2.353 0
8 -1.851 6.713 -0.426 0.044 0.483 0.498 1.479 0.342 21.059 0.067 0.428 2 2.920 0
9 -0.931 -2.566 9.861 -1.296 -0.650 -0.867 2.121 -0.938 45.595 0.108 0.784 2 2.920 0
10 4.328 11.878 -3.275 1.904 1.218 1.097 4.311 1.406 57.402 0.189 0.663 2 2.920 0
# # # # # # # # # # # # # # #
104 3.806 1.534 3.202 -0.669 -1.538 1.773 2.847 -0.145 1.385 2.948 2.490 3 2.353 1
In practice, one does not need to use simulation since the critical values are already
tabulated to give a pre-specified Type I error probability. Still, it is interesting to
realize that the error rates can be reproduced. Similar methods can be used to
estimate Type I error rates based on assumptions other than normally distributed
responses. Also, simulation can also be used to evaluate other properties of this
strategy including Type II error as described in the next example.
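Before turning to that example, a Python sketch of the Type I error estimation just described is given below. It uses a standard two-sample t-test routine in place of the spreadsheet formulas, so it illustrates the idea under the stated assumptions rather than reproducing Table 18.2 exactly.

# Monte Carlo estimate of the Type I error rate for n1 = n2 = 3 and no true difference
import numpy as np
from scipy import stats

def estimate_type_I_error(n=3, alpha=0.05, n_sim=10_000, seed=1):
    rng = np.random.default_rng(seed)
    wrong = 0
    for _ in range(n_sim):
        y1 = rng.normal(0.0, 1.0, n)     # level 1 responses, mu1 = mu2
        y2 = rng.normal(0.0, 1.0, n)     # level 2 responses
        _, p = stats.ttest_ind(y1, y2, equal_var=False)
        wrong += p < alpha               # significance wrongly declared
    return wrong / n_sim

print(estimate_type_I_error())   # should be close to 0.05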
Suppose you are thinking about using a t-test to “analyze” experimental data in
which one factor was varied at two levels with two runs at each of the two levels.
Suppose that you are interested in entertaining the assumption that the true average
response at the two levels differs by ∆ = 0.5 seconds and that the random errors
always have standard deviations σ1 = σ2 = 0.3 seconds.
Question 1: What additional assumptions are needed for estimation of the power,
i.e., the probability that the t-test will correctly find significance?
Answer 1: Many acceptable answers could be given. The assumed mean difference
must be 0.5. For example, assume the level 1 values are IID N(µ1 = 0, σ1 = 0.3)
DOE and Regression Theory 435
and the level 2 values are IID N(µ2 = 0.5, σ2 = 0.3). Note that power equals 1 –
probability of Type II errors so that it is higher if we find significance more often.
Question 2: What would one Monte Carlo run for the estimation of the power
under your assumptions from Question 1 look like? Arbitrary random-seeming
numbers are acceptable for this purpose.
Answer 2: The responses were generated arbitrarily, being mindful that the
second level responses should be roughly higher by 0.5 than the first. Then, the
other numbers were calculated: Y1,1 = 0.60, Y1,2 = –1.20, Y2,1 = 2.10, Y2,2 = 0.10,
y1 = –0.30, y 2 = 1.10, s12 = 1.62, s22 = 2.00, t0 = –1.274, df = 2, tcritical = 2.92, I() =
0, because we failed to find significance in this simulation or thought experiment.
Question 3: How might the Type II error probability derived through averaging the indicator function values from many simulation runs influence your decision-making?
Answer 3: If one feels that the estimated Type II error probability for a given true
effect size is too high (subjectively), then we might re-plan the experiment to have
more runs. With more runs, we can generally expect the probabilities of Type I and
Type II errors to decrease and the probability to correctly detect effects of any
given size to increase.
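A hypothetical sketch of the corresponding power estimation, using the assumptions from Answer 1 (∆ = 0.5 seconds, σ1 = σ2 = 0.3 seconds, and two runs per level), follows; again a standard two-sample t-test routine stands in for the book's tabulated procedure.

# Monte Carlo power estimate under the Answer 1 assumptions
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sim, hits = 10_000, 0
for _ in range(n_sim):
    y1 = rng.normal(0.0, 0.3, 2)     # level 1 responses
    y2 = rng.normal(0.5, 0.3, 2)     # level 2 responses shifted by Delta = 0.5
    _, p = stats.ttest_ind(y1, y2, equal_var=False)
    hits += p < 0.05
print(hits / n_sim)   # estimated power; 1 minus this is the Type II error rate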
Next, the implications of simulation results are explored related to the choices of
the initial sample sizes n1 and n2. Table 18.3 provides information in support of
decisions about the method parameters n1, n2, and α. Table 18.3 shows the chance
that significance will be declared under the standard assumptions described in the
last section. If ∆ = 0, then the table probabilities are the Type I error rates.
“Power” (β) is often used to refer to the probability of finding significance when there is a true difference, i.e., ∆ ≠ 0. Therefore, the probability of a Type II error is
1 – β. Interpolating or extrapolating linearly to other sample sizes might give some
insights.
To use the decision support information in Table 18.3, it is necessary to entertain assumptions relating to the size of the true average response change in the prototype system that it is desirable to detect, ∆. Also, it is necessary to
estimate the typical difference, σ, between responses from prototype systems with
identical inputs. These numbers must be guessed, and then the implications of
various decisions about the methods to use can be explored as illustrated in the
following example. Note that the quantity, ∆ ÷ σ, is sometimes called the “signal to
noise ratio” even though it is not related to the “SN” ratio in Taguchi Methods
(from Chapter 15).
Table 18.3. The probability that significance will be declared for selected values of α, n1 = n2, and ∆ ÷ σ
α = 0.01, ∆÷σ:
n1 = n2   0.001   0.5   1   2   5
3   1.0%   2.3%   4.8%   15.6%   67.6%
6   1.0%   5.4%   19.4%   71.5%   100.0%
α = 0.05, ∆÷σ:
n1 = n2   0.001   0.5   1   2   5
3   5.0%   11.4%   22.1%   52.2%   98.1%
6   5.0%   11.9%   47.7%   93.8%   100.0%
Question 1: An auto racer is interested to know if a new oil additive reduces her race time by 10.0 seconds, i.e., ∆ = 10.0 seconds. Also, the racer may know that, with no changes in her vehicle or strategy, times typically vary ± 5.0 seconds. What is a reasonable estimate of ∆ ÷ σ?
Question 2: Assume that the cost of the fuel additive is not astronomically high. Therefore, the racer is willing to tolerate a 5% risk of wrongly concluding that the additive helps when it does not. What α level makes sense for this case?
Question 3: The racer is considering using n1 = n2 = 6 test runs. Would this offer a high chance of detecting the effect of interest?
Answer 3: Yes, Table 18.3 indicates that this approach would give greater than or
equal to 93.8% probability of finding average differences significant if the true
benefit of the additive is a reduction on average greater than 10.0 seconds. Under
standard assumptions, the Type I error rate would be 0.05 and the Type II error rate
would be 0.062. In other words, if the effect of the additive is strong, starting with
6 runs gives an excellent chance of proving statistically that the average difference
is nonzero.
Question 4: Flow chart a decision process resulting in the selections α = 0.05 and
n1 = n2 = 6 using criteria power (g1), Type I error rate (g2), and number of runs (g3).
Pick
Method x1={n1=3,n2=3,α=0.05} with
method
Pick γ = δ/σ = 2.0 g1(x1,2.0) = 0.52, g2(x1,2.0)=0.05, g3(x1,2.0)=6
x1
because interested in
finding differences #
twice as large as
typical experimental Build and test 3
Method x4={n1=6,n2=6,α=0.05} with
prototypes at level 1
errors. g1(x1,2.0) = 0.94, g2(x1,2.0)=0.05, g3(x1,2.0)=12
and 3 at level 2, test
results with α = 0.05
Figure 18.5. Example t-test method (initial sample size and α) selection
Answer 1: The team would likely have been happy with factors reducing the
fraction nonconforming by ∆ = 2% or more. Also, they would likely agree that
only p0 = 50% of the factors were important (but they did not know which ones, of
course). A reasonable estimate for σ based on 500 samples would be 0.02 or 2%.
Question 2: Use Table 18.4 to estimate the power and pCS. Interpret this
information.
Table 18.4. Probability of finding a given important factor significant (the power)
Factors (m)
Assumptions n 3 4 5 6 7 8 9
Liberal (∆ = 2.0σ, p0 = 0.25,
IER, α = 0.05) 8 0.95 0.90 0.82 0.74 0.73 - -
Conservative (∆ = 1.0σ, p0 =
0.5, EER, α = 0.10) 0.69 0.61 0.45 0.36 0.33 - -
Liberal (∆ = 2.0σ, p0 = 0.25,
IER, α = 0.05) 16 0.96 0.99 1.00 0.98 0.97 0.96 0.93
Conservative (∆ = 1.0σ, p0 =
0.5, EER, α = 0.10) 0.74 0.79 0.99 0.77 0.93 0.87 0.84
Answer 2: Under conservative assumptions, the power might have been as low as
0.33 and the pCS as low as 0.07. However, looking back on the results it seems that
p0 was actually 1 ÷ 7 = 0.14. This means that the hypothetical planners would have likely overestimated their own abilities to identify important factors in experimental planning. Further, such overestimation might have wrongly made
them believe that they needed more runs. With the sparsity present in the actual
system (small true p0), the chance of the method finding the important factor was
probably closer to 0.73.
Table 18.5. Probability of correct selection (pCS)
Factors (m)
Assumptions n 3 4 5 6 7 8 9
Liberal (∆ = 2.0σ, p0 = 0.25,
IER, α = 0.05) 8 0.79 0.73 0.57 0.44 0.36 - -
Conservative (∆ = 1.0σ, p0 =
0.5, EER, α = 0.10) 0.45 0.13 0.17 0.09 0.07 - -
Liberal (∆ = 2.0σ, p0 = 0.25,
IER, α = 0.05) 16 0.79 0.76 0.57 0.76 0.45 0.53 0.30
Conservative (∆ = 1.0σ, p0 =
0.5, EER, α = 0.10) 0.52 0.58 0.35 0.48 0.36 0.31 0.16
Simulation results in Tables 18.4 and 18.5 support the following general
insights about standard screening using fractional factorials. First, using the IER
increases the power (but also the chance of Type I errors) compared with using the
EER. Second, using more factors generally reduces the probabilities of correct
selection. This corresponds to common sense in part because, with more factors,
more opportunities for errors are possible. Also, more interactions in the true
model are possible that can reduce the effectiveness of the screening analysis.
Finally, the better a job engineers or other team members do in selecting factors, the higher the p0. Unfortunately, high values of p0 actually decrease the method performance. In technical jargon, the chance of correct selection shrinks because the methods are based on the assumption of “sparsity” or small p0.
Assumptions about the true model, ytrue(x), are critical to the theory of experimental design. Clearly, if one knew the exact true model before experimentation and the only goal was accurate prediction of the mean response, then no experimentation would be needed.
Figure 18.6. Shows how the Taylor series approximates a function over an interval: ytrue(x1) is decomposed into a constant (2.4), a linear term 5.1(x1 – 8), and errors, plotted for x1 from 6 to 10
(Figure: the method tester observes, through a one-way mirror, an imaginary user who selects DOE points, collects response data, declares x1 significant, and makes a prediction for the mean at a prediction point)
Next, two example simulation runs useful for estimating the EIMSE
quantitatively are illustrated. The first, a simulation run, starts with assumptions
and generates an n = 4 dimensional simulated data vector, Y. Assume that the
experimental plan allocates test units at the points x1 = –1.0 mm, x1 = 0.0 mm, x1 =
–0.5 mm, and x1 = 1.0 mm. The assumed true model form is β0 + β1 x1 + β2 x12 + β3
x13 + ε. One starts with the pseudo-random numbers 0.236455525, 0.369270674,
0.504242032, 0.704883264, 0.050543629, 0.369518354, 0.774762962,
0.556188571, 0.016493236. We use the first four to generate pseudo-random true
model coefficients from a N(0,γ2). Then, we use the next four numbers to generate
four random errors.
The Excel function “NORMINV” can be used to generate pseudo-random
normally distributed random numbers from pseudo-random uniformly distributed
random numbers. Note that this is not needed since Excel also has the ability to
generate normally distributed numbers directly; however, it is good practice to
generate all random numbers from the same sequence. Combining all this
X1 = [1 –1 1; 1 –0.5 0.25; 1 0 0; 1 1 1] so that βest = [0.004332, 0.552287, –0.73481]′     (18.20)
seed. The resulting εP = 0.16. Thus, the n = 2 Monte Carlo estimate is the sample
average 0.38 with estimated error stdev(0.61,0.16)/sqrt(2) = 0.25.
Figure 18.9. A second run with the only difference being the random seed
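The following Python sketch shows one such simulation run in full. Because the values of γ and σ and the distribution of the prediction point are not fixed in this excerpt, the sketch assumes γ = 1, σ = 1, and prediction points uniform on [–1, 1]; it is an illustration of the procedure, not a reproduction of the spreadsheet calculation.

# One hedged simulation run for estimating the EIMSE of a four-point design
import numpy as np

rng = np.random.default_rng(0)
design = np.array([-1.0, 0.0, -0.5, 1.0])    # experimental plan in coded units
gamma, sigma = 1.0, 1.0                      # assumed values

def one_run():
    beta_true = rng.normal(0.0, gamma, 4)                      # cubic true model
    y = np.polyval(beta_true[::-1], design) + rng.normal(0.0, sigma, 4)
    X1 = np.column_stack([np.ones(4), design, design ** 2])    # fitted quadratic
    beta_est, *_ = np.linalg.lstsq(X1, y, rcond=None)
    xp = rng.uniform(-1.0, 1.0)                                # prediction point
    y_true = np.polyval(beta_true[::-1], xp)
    y_est = beta_est @ np.array([1.0, xp, xp ** 2])
    return (y_true - y_est) ** 2                               # squared error

errors = [one_run() for _ in range(2)]       # two runs, as in the example above
print(np.mean(errors), np.std(errors, ddof=1) / np.sqrt(len(errors)))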
In this section, the formula for the expected integrated mean squared error
(EIMSE) criterion is described. As for the last section, the concepts are potentially
relevant for predicting the errors of any “empirical model” in the context of a
given input pattern or design of experiments (DOE) array. Also, this formula is
useful for comparing response surface method (RSM) designs and generating them
using optimization.
The parts of the name include the “mean squared error” which derives from
the fact that empirical models generally predict “mean” or average response values.
The term “integrated” was originally used by Box and Draper (1959) to refer to the fact that the experimenter is not interested in the prediction errors at one point only and would rather take an expected value or integration of these errors over all prediction points of interest. The term “expected” was added by Allen et al. (2003) who
derived the formula presented here. It was included to emphasize the additional
expectation taken over the unknown true system model.
Important advantages of the EIMSE compared with many other RSM design
criteria such as so-called “D-efficiency” include:
1. The sqrt(EIMSE) has the simple interpretation of being the expected plus
or minus prediction errors.
2. The EIMSE criterion offers a more accurate evaluation of performance because it addresses contributions from both random errors and “bias” or model mis-specification, i.e., the fact that the fitted model form is limited in its ability to mimic the true input-output performance of the system being studied.
An advantage of the EIMSE compared with some other criteria is that it does not
require simulation for its evaluation. The primary reason that simulation of the
EIMSE was described in the last section was to clarify related concepts.
The following quantities are used in the derivation of the EIMSE formula:
1. xp is the prediction point in the decision space where prediction is desired.
2. ρ(xp) is the distribution of the prediction points.
3. R is the region of interest which describes the area in which ρ(xp) is
nonzero.
4. βtrue is the vector of true model coefficients.
5. ε is a vector of random or repeatability errors.
6. σ is the standard deviation of the random or repeatability errors.
7. ytrue(xp,βtrue) is the true average system response at the point xp.
8. yest(xp,βtrue,ε,DOE) is the predicted average from the empirical model.
9. f1(x) is the model form to be fitted after the testing, e.g., a second order
polynomial.
10. f2(x) contains terms in the true model not in f1(x), e.g., all third order terms.
11. β1 is a k1 dimensional vector including the true coefficients corresponding
to those terms in f1(x) that the experimenter is planning to estimate.
12. β2 is a k2 dimensional vector including the true coefficients corresponding
to those terms in f2(x) that the experimenter is hoping equal 0 but might
not. These are the source of bias or model mis-specification related errors.
13. X1 is the design matrix made using f1(x) and the DOE array.
14. X2 is the design matrix made using f2(x) and the DOE array.
15. R is the “region of interest” or all points where prediction might be desired.
16. µ11, µ12, and µ22 are “moment matrices” which depend only on the
distribution of the prediction points and the model forms f1(x) and f2(x).
17. “E” indicates the statistical expectation operation which is here taken over
a large number of random variables, xp,βtrue,ε.
18. XN,1 is the design matrix made using f1(x) and all the points in the
candidate set.
19. XN,2 is the design matrix made using f2(x) and all the points in the
candidate set.
Answer: The assumptions σ = 2.0 mm and R is the cube defined by the ranges
2000 psi to 2300 psi, 3.0 mm to 6.0 mm, and 2 inches to 3 inches seem reasonable.
With the goal of tuning, the choices f1(x)′ = [1 x1 x2 x3 x12 x22 x32 x1x2 x1x3 x2x3] and, therefore, k1 = 10 seem appropriate, and these terms can easily be estimated with n = 20 runs.
With these definitions, the general formula for the expected integrated mean
squared error is:
EIMSE(DOE) = E {[ytrue(xp,βtrue) – yest(xp,βtrue,ε,DOE)]2} . (18.22)
xp,βtrue,ε
Note that this formula could conceivably apply to any type of empirical or
fitted model, e.g., linear models, kriging models, or neural nets. This section
focuses on linear models of the form
ytrue(xp,βtrue) = f1(x)β1 + f2(x)β2 . (18.23)
For properly constructed design matrices X1 and X2 based on the DOE and
model forms (see Chapter 13), the response vector, Y, describing all n experiments
is
Y = X1β1 + X2β2 + ε . (18.24)
It is perhaps remarkable that, for linear models, the above assumptions imply:
EIMSE(DOE) = σ² Tr[µ11(X1′X1)–1] + Tr[B2 ∆]     (18.25)
where “Tr[ ]” is the trace operator, i.e., gives the sum of the diagonal elements, and
B2 = Eβ2[β2β2′],  ∆ = A′µ11A – µ12′A – A′µ12 + µ22,  and  A = (X1′X1)–1X1′X2     (18.26)
and
µij = ∫R ρ(xp)fi(xp)fj(xp)′dxp for i = 1 or 2 and i ≤ j ≤ 2.     (18.27)
Note that we have assumed that the random variables xp, βtrue, and ε are
independently distributed. If this assumption is not believable, then the formulas
might not give relevant estimates of the expected squared prediction errors.
However, simulation-based approaches similar to those described in the last section can be applied directly to the definition in Equation (18.22). This was the approach taken in Allen et al. (2000) and Allen and Yu (2002).
Question 1: f1(x)′ = [1 x1 x1² x1³] and ρ(x1) = 0.5 for –1 ≤ x1 ≤ 1. What is µ11?
µ11 = ∫R ρ(xp)f1(xp)f1(xp)′dxp = ∫–1^1 0.5 [1 x1 x1² x1³; x1 x1² x1³ x1⁴; x1² x1³ x1⁴ x1⁵; x1³ x1⁴ x1⁵ x1⁶] dx1
= [1 0 1/3 0; 0 1/3 0 1/5; 1/3 0 1/5 0; 0 1/5 0 1/7]     (18.28)
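This moment matrix can also be checked numerically with the candidate-point approximation of Equation (18.29) below. The following Python sketch assumes candidate points drawn from ρ(x1) and recovers entries close to 1, 1/3, 1/5, and 1/7.

# Numerical check of mu11 using random candidate points drawn from rho(x1)
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
xc = rng.uniform(-1.0, 1.0, N)                                # candidate points
XN1 = np.column_stack([np.ones(N), xc, xc ** 2, xc ** 3])     # f1(x)' = [1 x x^2 x^3]

mu11_approx = XN1.T @ XN1 / N
print(np.round(mu11_approx, 3))   # close to [[1,0,1/3,0],[0,1/3,0,1/5],[1/3,0,1/5,0],[0,1/5,0,1/7]]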
array. For example, with m = 3 factors, the default region of interest would be a cube in the design space.
The design matrices associated with the candidate points are written XN,1 and XN,2. If the candidate points are a random sample from the region of interest and N is large enough, then
µij = N–1 × XN,i′XN,j + εMC for i = 1 or 2 and i ≤ j ≤ 2,     (18.29)
which is established by the central limit theorem, and εMC is the Monte Carlo error.
2. During the experimentation process, the experimenter will include only some
terms in the fitted model and thus effectively assume that the other terms equal
zero. Therefore, the experimenter hopes that the true system coefficients of the terms assumed to equal zero, β2, actually do equal zero. Yet, what to assume about these coefficients is unclear. Clearly, it is unwise to assume that all the β2 are all zero or B2 = E[β2β2′] = 0. This type of wishful thinking is embodied by
criteria such as the integrated variance and D-efficiency. These criteria lead to
optimistic views about the prediction accuracy and poor decision-making.
Here, two kinds of assumptions about B2 are considered. The first is B2 = γ² × I, where γ is an adjustable parameter that permits studying the sensitivity of a DOE and model form to bias errors. For example, γ = 0 represents the
assumption that β2 is zero and the EIMSE is the integrated variance. The
second derives from DuMouchel and Jones (1994). It can be shown that
assumptions in that paper imply
B2 = γ2 × C2 , (18.30)
where C is a diagonal matrix, i.e., the off-diagonal entries equal 0.0. The
diagonal entries equal the ranges of the columns, i.e., Max[ ] – Min[ ], of the
matrix α given by
α = XN,2 – XN,1(XN,1′XN,1) –1XN,1′XN,2 . (18.31)
The DuMouchel and Jones (1994) default assumption is γ = 1. Their choices
are also considered the default here because they can be subjectively more
reasonable for cases in which the region of interest has an unusual shape. For
example, in experiments involving mixture variables (see Chapter 15), certain
factors might have much more narrow ranges than other factors. Then, the
assumption B2 = γ2 × I could imply a belief that certain terms in f2(x)β2 have far
more impact on errors than other terms. Fortunately, for many regions of
interest, including many cases with cuboidal regions of interest, the two types
of assumptions are equivalent.
3. The EIMSE formula above is based on the assumptions that the random errors
in ε are independent of each other and have equal variance. Huang and Allen
(2005) proposed a formula for cases in which these assumptions do not apply.
Calculation of the EIMSE formula here requires an estimate of the standard
deviation of the random errors, σ. This is the same sigma described in Chapter
4, which characterizes the common cause variability of the system.
Question: Consider two experimental plans for a single variable problem: DOE1 =
[–1 0 1]′ and DOE2 = [–1 0.95 1]′. Assume that γ = 0.4 and the fitted model form
will be f(x)′ = [1 x1]. What type of model form is this? Use default assumptions to
estimate the expected prediction errors associated with the process of
experimenting with each experimental design, writing down the data, and fitting
the model form f(x).
Answer: The fitted model is a first order polynomial. One assumes that σ = 1, γ = 0.4 (a moderate level of bumpiness in the true response), f2(x) = [x1²], and that the x1 input is equally likely to be between –1 and 1 so x1 ~ U[–1,1]. First, using evenly spaced candidate points on the line [–1, 1], one derives C = 1 and B2 = γ²C² = 0.16. The
calculations are
X1 = [1 –1; 1 0; 1 1],  X2 = [1; 0; 1],  A = (X1′X1)–1X1′X2 = [0.6667; 0],
µ11 = ∫–1^1 ρ(x)f1(x)f1′(x)dx = [1 0; 0 0.33],  µ12 = ∫–1^1 ρ(x)f1(x)f2′(x)dx = [0.33; 0],  µ22 = ∫–1^1 ρ(x)f2(x)f2′(x)dx = [0.2],     (18.32)
∆ = A′µ11A – µ12′A – A′µ12 + µ22 = 0.2, and
EIMSE(DOE1) = σ²Tr[µ11(X1′X1)–1] + Tr[B2 ∆] = 0.1667 + 0.032 = 0.2.
For DOE2, the matrices µ11, µ12, and µ22 are the same and
X1 = [1 –1; 1 0.95; 1 1],  X2 = [1; 0.9025; 1],  A = [0.9750; –0.0237].     (18.33)
The higher EIMSE for the DOE2 correctly reflects the obvious fact that the
second design is undesirable. Using DOE2, experimenters can expect roughly 50%
higher prediction squared errors. It could be said that DOE2 causes higher errors.
It should be noted that the formula derivation originally required two steps.
First, in general
β2′∆β2 = Tr[(β2β2′)∆],     (18.34)
which can be proven by writing out the terms on both sides of the equality and
showing they are equal. Also, for constant matrix ∆ and matrix of random variables
(β2β2′), it is generally true that:
E{Tr[(β2β2′)∆]} = Tr[E[β2β2′]∆] = Tr[B2 ∆]. (18.35)
Finally, note that the above EIMSE criteria have limitations which offer
opportunities for future research. These include that the criterion has not been
usefully developed for sequential applications relative to linear models. After some
data is available, it seems reasonable that this data could be useful for updating
beliefs about the prediction errors after additional experimentation. In addition,
efficient ways to minimize the EIMSE to generate optimal experimental designs
have not been identified.
prediction errors or, equivalently, the expected integrated mean squared error
(EIMSE) criterion. The final section describes a formula that can more efficiently
evaluate the EIMSE under specific assumptions and the details of its calculation.
18.8 References
Allen TT, Bernshteyn M (2003) Supersaturated Designs that Maximize the
Probability of Finding the Active Factors. Technometrics 45: 1-8
Allen TT, Yu L, Schmitz J (2003) The Expected Integrated Mean Squared Error
Experimental Design Criterion Applied to Die Casting Machine Design.
Journal of the Royal Statistical Society, Series C: Applied Statistics 52:1-15
Allen TT, Yu L (2002) Low Cost Response Surface Methods For and From
Simulation Optimization. Quality and Reliability Engineering International
18: 5-17
Allen TT, Yu L, Bernshteyn M (2000) Low Cost Response Surface Methods
Applied to the Design of Plastic Snap Fits. Quality Engineering 12: 583-591
Box GEP, Draper NR (1959) A Basis for the Selection of a Response Surface Design. Journal of the American Statistical Association 54: 622-654
Box GEP, Draper NR (1987) Empirical Model-Building and Response Surfaces.
Wiley, New York
DuMouchel W, Jones B (1994) A Simple Bayesian Modification of D-Optimal Designs to Reduce Dependence on an Assumed Model. Technometrics 36:37-47
Huang D, Allen T (2005) Design and Analysis of Variable Fidelity
Experimentation Applied to Engine Valve Heat Treatment Process Design.
The Journal of the Royal Statistical Society (Series C) 54(2):1-21
Grimmett GR, Stirzaker DR (2001) Probability and Random Processes, 3rd
edn. Oxford University Press, Oxford
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1993) Numerical Recipes
in C: The Art of Scientific Computing, 2nd edn. Cambridge University
Press, New York (also available on-line through www.nr.com)
Simmons GF (1996) Calculus with Analytic Geometry, 2nd edn. McGraw Hill,
New York
18.9 Problems
1. Which of the following is correct and most complete?
a. Random variables are numbers whose values are known at time of
planning.
b. Probabilities can generally be written as expected values of indicator
functions.
c. Probability theory and simulation can generate information of interest
to people considering which methods or strategies to apply.
d. mod(9,7) = 2.
12. Use the assumption B2 = (1.25)²I and standard assumptions to calculate the
EIMSE for the following DOE array.
Run A B C
1 –0.5 –1.0 –0.5
2 –0.5 0.5 –1.0
3 0.5 1.0 –0.5
4 –1.0 0.0 0.0
5 0.5 –1.0 0.5
6 0.0 0.0 0.0
7 1.0 0.0 0.0
8 –0.5 –0.5 1.0
9 –0.5 1.0 0.5
10 0.5 0.5 1.0
11 0.5 –0.5 –1.0
Part III: Optimization and Strategy
19
Optimization and Strategy
19.1 Introduction
The selection of confirmed key input variable (KIV) settings is the main outcome of
a six sigma project. The term “optimization problem” refers to the selection of
settings to formally maximize or minimize a quantitative objective.
Chapter 6 described how formal optimization methods are sometimes applied in
the assumption phase of projects to develop recommended settings to be evaluated
in the control or verify phases.
Even if the decision-making approach used in practice is informal, it still can
be useful (particularly for theorists) to imagine a quantitative optimization problem
underlying the associated project. This imagined optimization problem could
conceivably offer the opportunity to quantitatively evaluate whether the project
results were the best possible or the project could be viewed as a lost opportunity
to push the system to its true potential. The phrase “project decision problem”
refers to the optimization problem underlying a given six sigma project.
In this part of the book, “strategy” refers to decision-making about a project
including the selection of methods to be used in the different phases. The strategic
question of whether to use the six sigma method or “adopt” six sigma on a
companywide basis is briefly discussed in Chapter 21, but the focus is on project
decision problems. Therefore, strategy here is qualitatively different than design of
systems that are not methods. A second optimization problem associated with six
sigma projects involves the selection of techniques to derive most efficiently the
solution of the underlying project decision problem. For example, in some cases
benchmarking can almost immediately result in settings that push a system to its
potential. Then, benchmarking could itself constitute a nearly optimal strategy
because it aided in the achievement of desirable settings with low cost.
In this chapter, optimization problems and formal methods for solving them are
described in greater detail. This discussion includes optimization problems taking
into account uncertainty. For example, in the robust design optimization described in
Chapter 14, uncontrollable “noise” factors constitute random variables that can
The numbers –11, 12, and –2 in Equation (19.2) might have been derived
from experimentation and regression, perhaps, but this first example is really a
“toy” problem for the purposes of illustration. This problem is not representative
of actual problems that decision-makers might encounter. This follows because
(probably) most formal optimization problems of interest involve considerably
larger decision spaces, M, i.e., more decision variables and/or ranges that contain
so-called local maxima.
Figure 19.1. Illustration of the optimization region, M, and the solution to (2), xoptimal = 3 (plot of g(x1) for 0 ≤ x1 ≤ 8)
Figure 19.2. A formulation with two local maxima and one global maximum (plot of g(x1) over M, with the global maximum over M and the local maxima indicated)
Question 2: Rewrite the snap tab formulation from Chapter 17 into the form in
Equation (19.1).
Answer 2: In the snap tab case study, a large number of factors were considered
and strategy limitations did not permit the creation of accurate models of the KOVs
as a function of all KIVs. For this reason, cause and effect matrices were used to
shorten the KIV list. Then, an innovative design of experiments (DOE) method
was used to quantify input-output relationships.
In the real snap tab study, the Excel solver was used with multiple starting
points to derive the recommended settings, x1=1.0, x2=0.85, x3=1.0, and x4=0.33.
An exercise at the end of the chapter involves using the Excel solver to derive
these settings by coding and solving Equation (19.3). To solve this problem one
needs to activate the “Solver” option under the “Tools” menu. It may be necessary
to make the solver option available in Excel because it might not have been
installed. Do this using the “Add-Ins” option, also under the “Tools” menu.
Question 1: Suppose a die casting engineer has the following prediction model for
average part distortion, y (in mm), as a function of gate position, x1 (in mm):
y(x1) = 5.2 – 4.1x1 + 1.5x1². Part distortion causes $0.6M per year in rework costs
per mm of average distortion, the current gate position is 9 mm, and any change
costs $0.2M. Formulate decision-making about gate position as an optimization problem.
Answer 1: One formulation is to minimize g(x1) = 0.6y(x1) + 0.2 over the gate
position x1, where the $0.2M is incurred only if the gate position is changed from
9 mm. Assuming the current setting is not optimal, d/dx1[g(x1)] = 0 = 0.6[–4.1
+ 2(1.5)x1,opt] gives x1,opt = 1.36 mm and g(x1,opt) ≈ $1.6M < $53.8M, so the assumption is
valid. Therefore, unless conditions other than gate position are more important, the
casting engineer should seriously consider moving the gate position to 1.36 mm.
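As a numerical cross-check, the following minimal sketch (not from the text) evaluates the cost over a grid of gate positions, assuming the $0.2M change cost applies whenever the gate is moved away from 9 mm; it reports x1,opt ≈ 1.37 mm and a cost near $1.6M versus the much larger cost of staying at 9 mm, matching the answer above up to rounding.

#include <stdio.h>

/* predicted average distortion (mm) as a function of gate position x1 (mm) */
static double y(double x1) { return 5.2 - 4.1 * x1 + 1.5 * x1 * x1; }

int main(void) {
    double current = 0.6 * y(9.0);          /* staying at 9 mm: rework cost only ($M) */
    double best_x = 9.0, best_g = current;
    for (double x = 0.0; x <= 9.0; x += 0.001) {
        double gx = 0.6 * y(x) + 0.2;       /* rework cost plus $0.2M change cost */
        if (gx < best_g) { best_g = gx; best_x = x; }
    }
    printf("x1,opt = %.2f mm, cost = $%.2fM (vs. $%.2fM at 9 mm)\n",
           best_x, best_g, current);
    return 0;
}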
parameters. For example, times are always less than a polynomial function of m
and q for some finite coefficients. See Papadimitriou (1994) for more information.
The ability to generate a global maximum efficiently cannot be guaranteed for some
quadratic programs, many types of integer programs, and for perhaps most
problems of interest. The class of other, more challenging problems is the “non-
polynomial time” or “NP-hard” problems.
“Heuristics” are procedures that are not guaranteed to find a global maximum in
polynomial time. By contrast, “rigorous algorithms” are procedures associated
with a mathematically proven claim about the objective values of the solutions
produced in polynomial time. Sometimes, the term “rigorous algorithms” is also used
to refer to methods that eventually converge to a global maximum.
Generally, rigorous algorithms are only available for polynomial time
optimization problems. Much, perhaps most, of the historical contributions to the
study of operations research relates to exploiting the properties of specific
formulations to produce methods that guarantee the attainment of global optimal
solutions in reasonable time periods. Yet, in many cases, the properties of the objective function,
g(x), only permit operations researchers and computer scientists to apply heuristic
solution methods. For example, the Excel solver has some difficulty solving
Equation (19.3) even though the optimization is over only four decision variables,
because the quadratic program does not have a positive semidefinite (PSD) B and is
therefore not convex.
g(x)                                        M            Size parameters                  Name                           Type
Linear in the xi                            Convex set   m and q                          Linear program                 Polynomial
Linear plus a term x′Bx with positive       Convex set   m and q                          Quadratic program              Polynomial
  semidefinite (PSD) B
Linear plus a term x′Bx with non-PSD B      Convex set   m and q                          Quadratic program              Non-polynomial
Nonlinear with integer constraints on x     Nonconvex    m, q, objective function size    General Integer Program (IP)   Non-polynomial
Subject to: x ∈ M
and where Z = [Z1, Z2, …, Zq]′ where the Zi are random variables with known
distribution functions.
The semantic distinction between problems of the form in Equation (19.5) and
those of the form in Equation (19.1) is blurred by the realization that every
problem of the form in Equation (19.1) could include a term +E[0] in the objective
where Z1 ~ N(µ = 100, σ = 25). Therefore, the first term in the expectation is the
revenue from sales and the second term is the upfront cost.
A person knowledgeable about statistics and calculus might recognize that the
objective function, g(x1), in Equation (19.6) can be expressed in terms of the mean
value of a truncated normal distribution for which interpolation functions might be
used. To that person, numerical integration would not be needed, probably
permitting more efficient code to be developed for the problem solution. Then, he
or she might not refer to Equation (19.6) as a stochastic optimization problem.
Instead, he or she might call this problem “deterministic” or not requiring
numerical integration.
Still, many people might use Monte Carlo to estimate g(x1) in Equation (19.6)
in the context of their optimization method. For them, Equation (19.6) would be a
stochastic optimization problem. Whichever way one solves Equation
(19.6), the solution is xoptimal = 106 newspapers. Also, because of this ambiguity, it is
acceptable for anyone to say that Equation (19.6) is a stochastic optimization problem.
Next, we define a general-purpose heuristic for solving stochastic
optimization problems, which could be used to derive xoptimal = 106 newspapers.
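The following minimal sketch (not the text's code) illustrates this Monte Carlo approach for a newsvendor-style version of the problem with demand Z1 ~ N(100, 25). Because Equation (19.6) is not reproduced here, the unit price of $1.00 and unit cost of $0.40 are assumptions chosen only for illustration (they happen to put the optimum near 106 newspapers); with Monte Carlo error, the reported maximizer can differ from 106 by a few newspapers.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* standard normal pseudo-random numbers via the Box-Muller transform */
static double std_normal(void) {
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

/* Monte Carlo estimate of g(x1) = E[price*min(x1, Z1)] - cost*x1 */
static double g_hat(double x1, long n, double price, double cost) {
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        double z = 100.0 + 25.0 * std_normal();     /* demand Z1 ~ N(100, 25) */
        if (z < 0.0) z = 0.0;
        double sales = (z < x1) ? z : x1;           /* cannot sell more than stocked */
        sum += price * sales - cost * x1;
    }
    return sum / n;
}

int main(void) {
    const double price = 1.00, cost = 0.40;         /* assumed values, not from the text */
    double best_x = 0.0, best_g = -1.0e30;
    srand(1);
    for (int x1 = 50; x1 <= 150; x1++) {            /* crude search over order quantities */
        double est = g_hat((double)x1, 100000, price, cost);
        if (est > best_g) { best_g = est; best_x = x1; }
    }
    printf("estimated x_optimal = %.0f newspapers, estimated g = %.2f\n", best_x, best_g);
    return 0;
}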
Genetic algorithms are motivated by the process of natural evolution and the
observation that nature is successful in creating fit organisms adapted to their
environments. GAs differ from the majority of optimization methods, which iterate
from a single solution to another single solution. Instead, GAs iterate between
whole sets of solutions or “populations”. The population being considered is
called the current generation. Individuals in this population are called
chromosomes and are each associated with one possible solution to the
mathematical program. Iteration is based on natural processes including
probabilistic mating, crossover-based reproduction, and mutation, based on fitness.
To understand how these natural processes relate to solving mathematical
programs, it is helpful to understand first how an individual chromosome can be
interpreted as a solution to a mathematical program. The chromosome is typically
stored in coded form, e.g., as a vector of real numbers between 0 and 1. Each of
these numbers is called a “gene” with reference to natural selection. This form
must be decoded to be interpreted as a system design option, x. Often, the number
Figure 19.3. (a) A sample chromosome, (b) the decoded solution, (c) the fitness
The following method is proposed for student use. It is coded in the Appendix to
this chapter, and it is called “toycoolga”. The initial population of N chromosomes
is generated using U[0,1] pseudo-random numbers. Then, the method is iterated for
a pre-specified number of generations. In each iteration, the chromosomes
associated with the top e estimated fitness values are copied from the current
generation to the next generation. If Monte Carlo (MC) is used for the fitness
evaluation, then n simulations are used for all evaluations. The putatively top e
solutions are called the elitist subset. The next c chromosomes in the new
generation are created through so-called “one-point crossovers” that mix two
solutions. The word “putatively” refers to the fact that we do not know for certain
which solutions have the highest mean values because of MC errors.
In one-point crossovers, two chromosomes or “parents” are selected from the
current generation. Each chromosome has an equal probability of being selected
for parenting. Then, a random integer, I, between zero and the number of genes, m,
is selected. The chromosome entered into the next population contains the same
first I genes as the first parent and the remaining m – I genes come from the second
parent. The remaining N – e – c chromosomes in the new generation are generated
using U[0,1] pseudo-random numbers. The term “immigrants” refers to these
Table 19.2. (a) The population at generation t; (b) the population at generation t + 1
The first problem is originally from De Jong (1975), but also incorporates the
modification of Aizawa and Wah (1994):
\[
\underset{x}{\text{Maximize:}}\quad g(x) = -E_{\omega}\!\left[\sum_{i=1}^{30} i \cdot x_i^4 + 64\,\omega\right] \qquad (19.9)
\]
where –1.28 ≤ xi ≤ 1.28 for all i = 1,…,30 and ω ~ N(µ = 0, σ = 1). This is the problem
coded into the fitness function in the Appendix.
The second problem is from Mühlenbein et al. (1991):
\[
\underset{x}{\text{Maximize:}}\quad g(x) = -E_{\omega}\!\left[20 + \sum_{i=1}^{20} \left(x_i^2 - \cos(2\pi x_i)\right) + \omega\right] \qquad (19.10)
\]
where –5.12 ≤ xi ≤ 5.12 for all i = 1,…,20 and ω ~ N(µ = 0, σ = 35). These problems
provide a nontrivial challenge to the proposed optimization method. Optimal
design of experiments (DOE) problems such as the simulation optimization ones in
\[
p_i = \frac{f_i}{\sum_{j=1,\ldots,N} f_j} \qquad (19.11)
\]
This selection scheme is vulnerable to converging to a local optimum since, once
found, the local optimum will be assigned a high probability of being selected and its
components will multiply in the next generation. If the search does not identify a
better solution soon enough, this local optimum will fill up the whole generation.
Once all solutions are alike, only mutation will be able to produce different
solutions, thus deteriorating the efficiency of the algorithm.
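A minimal sketch (not part of the appendix code) of this proportionate selection rule, Equation (19.11), assuming nonnegative fitness values where larger is better:

#include <stdlib.h>

/* returns the index of the chromosome selected with probability f[i] / (sum of all f) */
long roulette_select(const double f[], long N) {
    double total = 0.0;
    for (long i = 0; i < N; i++) total += f[i];
    /* spin the wheel: one uniform draw on [0, total) */
    double spin = total * ((double)rand() / ((double)RAND_MAX + 1.0));
    double running = 0.0;
    for (long i = 0; i < N; i++) {
        running += f[i];
        if (spin < running) return i;
    }
    return N - 1;   /* guard against floating point round-off */
}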
Ranking Selection
In ranking selection, the fi are ranked for i = 1,…,N. Then, the probabilities of
selection are functions of the ranks only. This approach prevents a much better, but
still local, solution from being excessively successful in the selection process and
eventually dominating whole generations.
Tournament Selection
To select a solution for mating, a group of size q ≥ 2 (tournament size) is drawn
from the generation with replacement. The solution of this group with the highest
objective function value passes to the mating. The process continues until enough
crossed over solutions are produced.
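A minimal sketch (again, not part of the appendix code) of tournament selection with tournament size q, assuming larger fitness values are better:

#include <stdlib.h>

/* draw q entrants with replacement and return the index of the fittest */
long tournament_select(const double fitness[], long N, long q) {
    long best = rand() % N;
    for (long k = 1; k < q; k++) {
        long challenger = rand() % N;          /* drawn with replacement */
        if (fitness[challenger] > fitness[best]) best = challenger;
    }
    return best;                               /* the winner passes to mating */
}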
Elitist Selection
Under previous schemes of selection not involving elitist subsets it is possible that
the best solution will not be passed to the next generation. A popular method to
ensure monotonicity of the objective value of the best candidate from generation to
generation is to copy a subset of e ≥ 1 of the putatively best solutions from
generation to generation. Elitist selection, in general, is an additional selection
feature that can be combined with any of the previously described schemes.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
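#define SIZE 30 /* chromosome length; the excerpt omits the original's global definitions, and 30 is assumed here to match the 30 decision variables of the test problem in Equation (19.9) */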
struct CANDIDATE {
double fitness;
double vector[SIZE];
double stdev;
long evals_used;};
output=fopen("outputfile.txt","w");
/* Convert the eliteFraction and randomFraction
from fractions to numbers */
nelite=(long)(sizeOfGeneration*eliteFraction);
nrandom=lmax(1,(long)(sizeOfGeneration*randomFraction));
/* STEP I */
/* Cloning (copying) of the top number of solutions
specified by nelite */
for (i=0; i<nelite; i++) newgeneration[i]=generation[i];
/* STEP II */
/* Filling the new generation with crossovers */
for (i=nelite; i < (sizeOfGeneration-nrandom); i++)
{
/* Randomly picking parents */
parent1=getRandomNumber(&seed,(long)(sizeOfGeneration/2));
/* picking 1st one from a better part */
parent2=getRandomNumber(&seed,sizeOfGeneration);
/* One-point crossover is implemented */
cut=1+getRandomNumber(&seed,SIZE-1); /* this
decides after which coordinate
to cut the solutions */
for (coord=0;coord<cut;coord++)
newgeneration[i].vector[coord]=generation[parent1].vector[coord];
for (coord=cut;coord<SIZE;coord++)
newgeneration[i].vector[coord]=generation[parent2].vector[coord];
} //Crossovers are in.
/* STEP III */
/* Filling the rest of the new generation with random solutions */
for (i=(sizeOfGeneration-nrandom);i<sizeOfGeneration;i++)
generateCandidate(&newgeneration[i]);
*result=currentbest;
bestsolution=fopen("bestsolution.txt","w");
int ind; for ( ind=0;ind<SIZE;ind++)
fprintf(bestsolution,"%5.2lf ",currentbest.vector[ind]);
fclose(output);
fclose(bestsolution);}
/****************/
void generateCandidate(struct CANDIDATE *newborn) {
long i; for (i=0;i<SIZE;i++) newborn->vector[i]=ranS(&seed);
newborn->fitness=666; /* just to set it to something */
newborn->evals_used=-1; /* i.e., no evaluations used yet */ }
/**********************************************/
long lmax(long a,long b) {return (a>b)?a:b;}
/***************************************/
int compare(const void *vp, const void *vq) {
const struct CANDIDATE *p;
const struct CANDIDATE *q;
p=(struct CANDIDATE *) vp;
q=(struct CANDIDATE *) vq;
if ((*p).fitness<(*q).fitness) return -1;else
if ((*p).fitness>(*q).fitness) return 1;
else return 0; }
/***************/
//The fitness function is the part that is tailored to each problem.
//The program will minimize the expected fitness value.
double fitness(struct CANDIDATE *x, long noise) {
int j; double s,vect[SIZE];
/* We'll have to decode the chromosome first */
for (j=0;j<SIZE;j++) vect[j]=-1.28+(x->vector[j])*1.28*2;
/* now calculate the fitness */
s=0;
for (j=0;j<SIZE;j++) s+=(j+1)*pow(vect[j],4.0);
if (noise) s+=64*gasdev(&seed);
return s; }
19.8 References
Aizawa AN, Wah BW (1994) Scheduling of Genetic Algorithms in a Noisy
Environment. Evolutionary Computation 2:97-122
Allen TT, Bernshteyn M (2003) Supersaturated Designs that Maximize the
Probability of Finding the Active Factors. Technometrics 45: 1-8
Andradóttir S (1998) A Review of Simulation Optimization Techniques.
Proceedings of the 1998 Winter Simulation Conference, Washington, DC
Bernshteyn M (2001) Simulation Optimization Methods That Combine Multiple
Comparisons and Genetic Algorithms with Applications in Design for
Computer and Supersaturated Experiments. Ph.D. Dissertation. The Ohio
State University, Columbus, Ohio.
Boesel J (1999) Search and selection for large-scale stochastic optimization. Ph.D.
Dissertation. Department of Industrial Engineering and Management
Sciences, Northwestern University, Evanston, Illinois
19.9 Problems
1. Which is correct and most complete?
a. Optimal strategy cannot involve method selection.
b. Tolerance design cannot involve optimal selection of components.
c. Constraints can be used to specify the feasible region, M.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
7. Which is correct and most complete with reference to the problem in Equation
(19.8)?
a. A global minimum problem has xi = 1 for all i.
b. The objective value in the problem can reach 25.0.
c. Efficient solution methods can use variable sample sizes as they
progress.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
20
Tolerance Design
20.1 Introduction
“Tolerance design” refers to the selection of specifications for individual
components using formal optimization. Specifications might relate to the
acceptable length of a shaft, for example, or the acceptable resistance of a specific
resistor in a printed circuit board. Choices about the specifications are important in
part because even individually conforming component parts can cause the entire engineered system
to fail to conform to specifications. Also, sometimes the specification limits may
be needlessly “tight,” requiring expensive manufacturing equipment that does not
benefit the customer.
“Statistical tolerancing” is the study of the properties of an ensemble of
components using assumed properties of the individual components. Monte Carlo
simulation from Chapter 18 is a powerful method in this study. Stochastic
optimization from Chapter 19 can also be used to select the optimal combination of
tolerances to achieve a variety of possible objectives.
“Stackup analysis” is statistical tolerancing when distances are associated with
ensemble properties. Such analysis of specifications might involve all the
complications of so-called geometric dimensioning and tolerancing (GD&T, e.g.,
see Krulikowski 1997). In some cases, Monte Carlo technology is built into the
computer aided design (CAD) software for stackup analyses.
Question: Two resistors are in series and the resistance of each is assumed to be a
normally distributed random variable with mean 10 ohms and standard deviation
0.5 ohms. What is the resistance distribution of the assembly for the two resistors
in series and the chance that the entire component will conform to specifications,
LSL = 17.1 ohms and USL = 22.1?
Answer: Monte Carlo simulation from Chapter 18 using Excel and Tools → Data
Analysis → Random Number Generation derives that the series resistance has
expected value equal to 20.0 ohms with standard deviation equal to 0.70. The
chance of conformance can be similarly estimated to equal 0.997.
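For readers without a spreadsheet handy, a minimal sketch (not the text's spreadsheet approach) of the same Monte Carlo estimate, which gives values close to those reported above:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* standard normal pseudo-random numbers via the Box-Muller transform */
static double std_normal(void) {
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

int main(void) {
    const long n = 1000000;
    double sum = 0.0, sumsq = 0.0;
    long conforming = 0;
    srand(1);
    for (long i = 0; i < n; i++) {
        /* two resistors in series, each ~ N(10 ohms, 0.5 ohms) */
        double r = (10.0 + 0.5 * std_normal()) + (10.0 + 0.5 * std_normal());
        sum += r;
        sumsq += r * r;
        if (r >= 17.1 && r <= 22.1) conforming++;   /* LSL and USL in ohms */
    }
    double mean = sum / n;
    double sd = sqrt((sumsq - n * mean * mean) / (n - 1));
    printf("mean = %.2f ohms, sd = %.2f ohms, Pr{conform} = %.3f\n",
           mean, sd, (double)conforming / n);
    return 0;
}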
Question: A “co-packer” company inserts shampoo into bottles and sells the
bottle to a well known brand for retail. The co-packer may have three potential
equipment choices, i = 1, 2, and 3. The equipment cost c1 = $90K, c2 = $110K, and
c3 = $115K. The volumes of materials inserted are random variables whose
distribution depends on the equipment choice and the nominal setting, µ. Past data
confirm that the volumes are normally distributed to a good approximation and that
the equipment is associated with standard deviations in ounces of σ1 = 0.35, σ2 =
0.15, and σ3 = 0.04, respectively. Further, assume that the co-packer makes 10
million bottles a year with a material cost equal to $0.01/ounce. Finally, assume
that any units found below the lower specification limit of 16.0 ounces cost the
company $1 in penalty. Assume that USL – LSL = 10σi and that the process mean is
centered, µ = (USL + LSL)/2. By selecting USL and LSL, you are selecting specific
equipment and the process mean. Which USL and LSL do you recommend?
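One possible way to compare the choices numerically is sketched below (this is not an answer supplied by the text): it assumes the process mean is centered at µi = 16.0 + 5σi, treats the equipment cost as a single up-front expense added to one year of material and penalty costs, and uses the stated $0.01/ounce material cost, $1 penalty, and 10 million bottles per year. The cost treatment and the centering are assumptions made only for illustration.

#include <stdio.h>
#include <math.h>

int main(void) {
    const double equip_cost[3] = {90e3, 110e3, 115e3};   /* c1, c2, c3 in dollars */
    const double sigma[3] = {0.35, 0.15, 0.04};          /* fill standard deviations, ounces */
    const double bottles = 10e6, material = 0.01, penalty = 1.0, LSL = 16.0;

    for (int i = 0; i < 3; i++) {
        double mu = LSL + 5.0 * sigma[i];                /* centered, with USL - LSL = 10*sigma */
        /* Pr{fill < LSL} for a normal fill distribution */
        double p_below = 0.5 * erfc((mu - LSL) / (sigma[i] * sqrt(2.0)));
        double cost = equip_cost[i]
                    + bottles * material * mu            /* material cost for one year */
                    + bottles * penalty * p_below;       /* expected penalty for one year */
        printf("equipment %d: mu = %.2f oz, USL = %.2f oz, total = $%.0f\n",
               i + 1, mu, LSL + 10.0 * sigma[i], cost);
    }
    return 0;
}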
20.3 References
Krulikowski A (1997) Geometric Dimensioning and Tolerancing. Thomson
Delmar Learning, Albany, NY
20.4 Problems
1. Consider two resistors in series each with resistance uniformly distributed with
mean 10 ohms and uniform limits a = 8.5 ohms and b = 11.5 ohms. Which is
correct and most complete?
a. Pr{series resistance is between 19.0 and 21.0 ohms} > 0.95.
b. The series resistance is not uniformly distributed.
c. The mean or average series resistance is 20.0 ohms.
d. All of the above are correct.
e. All of the above are correct except (a) and (d).
21
Six Sigma Project Design
21.1 Introduction
The purposes of this chapter are: (1) to describe six sigma strategy and (2) to
propose opportunities for additional research and evolution of six sigma. Part I of
this book describes several methods that can structure activities within a project.
Part II focuses on design of experiment (DOE) methods that can be used inside six
sigma projects. DOE methods are complicated to the extent that decision-making
about them might seem roughly comparable to decision-making about an entire
project.
The extension of DOE theory from Chapter 18 and optimization methods from
Chapter 19 to the design of projects constitutes perhaps the primary suggestion for
future research. In Chapter 19, “strategy” is defined as decision-making about
projects, focusing on the selection of methods to be used in the different phases.
Brady (2005) proposed the following definitions:
Micro – dealing with individual statistical methods.
Meso – supervisor level decision-making about method selection and timing.
Macro – related to overall quality programs and stock performance.
Brady (2005) argued that the primary contributions associated with six sigma
relate to the meso-level because the definition of six sigma relates to meso-level
issues. These meso-level contributions and possible future meso-level
analyses are explored in this chapter.
Section 2 reviews the academic literature on six sigma based on Brady (2005).
Section 3 explores the concept of “reverse engineering” six sigma, i.e., hypotheses
about why six sigma works. The further hypothesis is suggested that understanding
six sigma success is critical for additional contributions. Section 4 describes
possible relationships of the sub-methods to decision problems that underlie six
sigma projects. Section 6 describes meso-level analysis of projects at one company
reviewing results in Brady (2005). Section 7 concludes with suggestions for
possible future research.
little attention, in part because data relating to project actions and results has been
largely unavailable. Also, modeling the performance of a series of statistical
method applications is generally beyond the scope of statistical researchers.
Table 21.1. Definition of six sigma related to requirements for return on investments

                                                          Requirement
Definition or principle                                   Management support    Desirable settings            Convincing evidence
Six sigma is an organized … problem-solving method        –                     No premature closure,         Display of respect
… based on statistical methods …                                                based on data                 for the problem
Six sigma … includes as phases either … Control …         –                     –                             Guarantees thorough
or … Verify                                                                                                   confirmation
The six sigma method only fully commences … after         Six sigma programs    Motivation of                 –
establishing adequate monetary justification.             pay for themselves    quantitative feedback
Practitioners applying six sigma can and should           –                     Motivation of reduced         –
benefit … without … statistical experts.                                        credit sharing
Yet the principle that statistics and optimization experts are not needed implies
constraints on the methods used and substantial training costs. Clearly the methods
developed by operations researchers and statisticians for use in six sigma projects
must be “robust to user” to the extent that people with little training can benefit
from them. Also, without statistical experts, many (if not all) personnel in
companies must receive some level of training. Since many do not have the same
level of technical knowledge as engineers or scientists, the technical level of the
training is generally lower than that of this book.
Answer: The text implies that engineers and scientists might generally sacrifice
solution quality to avoid sharing credit with statisticians. Also, Table 21.1 implies
that none of the other aspects of the six sigma definition provides important
motivation for managers to adopt six sigma. The confirmation in the control or
verify phases aids in acceptance but not improvement. Therefore, the most
complete, correct answer is (f).
Note that many users of six sigma know little about formal optimization and
would find Equation (21.1) puzzling. This follows perhaps because the challenge
in uncovering the objective function and constraint sets can be much more difficult
than solving the problem once formulated. For example, once design of
experiments has been applied, apparently desirable settings can come from
inspection of main effects plots (Chapter 12), 3D surface plots (Chapter 13), or
marginal plots (Chapter 14). These practitioners are therefore attempting to solve
an optimization problem without being aware of the associated formal vocabulary.
Table 21.2 reviews selected methods from previous chapters and their possible
roles in solving a formulation of the form in Equation (21.1). Some of the methods
such as Pareto charting and cause and effects matrix construction result in subtle
adjustments in the perceptions of objective or beliefs about which factors have no
effects on the key output variables. Other methods such as regression, response
surface methods, and formal optimization can play direct roles in exposing and
solving the underlying formulation.
Researchers in optimal design of experiments (DOE) have long based their
theories on the view that, once the DOE array or strategy is decided, many
important consequences governing the prediction accuracy and decisions made later
inevitably follow.
described in Chapter 18. Extending such concepts to six sigma project and meso-
analysis implies that certain strategies are likely to foster good outcomes in certain
cases.
would result. Therefore, the recommendations, xop, are a random variable, and the
expected value of the associated performance, g(xop), can be used to judge any
given strategy.
Figure 21.1. “Method rollercoaster” is a strategy that starts with x0 and ends with xop,1 (the methods shown include make charter, Pareto charting, C&E matrix, preliminary regression, RSM, and formal optimization)
contributions to six sigma and future methods of similar scope. This section
describes several possible areas for future research.
Table 21.3 overviews an admittedly arbitrary sampling of proposed areas for
future research. The first two rows represent continuing on-going threads of
research likely to be received gratefully by practitioners in the short run. Continued
quantitative research on the value of six sigma programs will likely be of interest to
stock holders and management partly because past results are somewhat
inconclusive.
Research on new statistical micro-level methods for general uses can be highly
valuable. Further, advances in computational speed and optimization heuristics
provide unprecedented opportunities for new method development. Through
applying optimization, it is possible that many if not all criteria used to evaluate
methods can be used to generate new methods. For example, methods can be
designed tailored to specific problems to maximize the chance of getting correct
information or the expected profit from performing the study. However, studies of
historical successes in micro-level method development such as Xbar & R charting
from Shewhart (1931) suggest that practitioners may be slow to perceive value.
Six sigma explicitly creates different classes of practitioners interested in
methods: green belts, black belts, and master black belts. The reverse engineering
of six sigma suggested that all of these classes are interested in using these
practices without the aid of statistical experts. However, the requirement that
methods be robust to the user does not mean that the methods cannot have
complicated derivations and justifications. For example, many users of fractional
factorials understand the process described in Box and Hunter (1961) for the
derivation of the arrays used.
Possibilities for valuable new micro level-methods exist for cases in which
current similar methods are not in widespread use, e.g., supersaturated designs
from stochastic optimization in Allen and Bernshteyn (2003). Further, “data
mining” or analysis of very large data sets using novel regression type or other
methods continues to be an important area of investigation. Much of the related
technology has not reached the level of maturity in which the methods are robust to
user.
Also, new methods can potentially dominate many or all performance criteria
relative to time-tested methods. For example, the EIMSE optimal design of
experiments (DOE) arrays from Allen et al. (2003) offer methods with both fewer
runs and lower expected prediction errors than either central composite or Box
Behnken designs. Other possible areas under this category include new DOE arrays
for problems involving categorical variables, custom arrays associated with
reduced costs, and fully sequential response surface methods (RSM) deriving
better results using reduced experimental costs. Also, multi-variate analysis and
monitoring technologies tailored to specific problems can be developed.
In general, Bayesian decision analysis based methods (e.g., DeGroot 1970) have
not been fully exploited with respect to their abilities to generate practically useful
methods. Also, it is possible that many (if not all) of the quantitative methods in this
book could be improved, in their ability to foster desirable outcomes and user-robust
methods, using optimization and/or Bayesian analyses.
With its emphases on monetary justification and documentation, six sigma has
spawned the creation of many corporate databases containing the methods used in
projects and the associated financial results. While the majority of these databases
are confidential, practitioners at specific companies generally have access to their
own organization’s database. Brady (2005) proposed several approaches for
analyzing such databases and the possible benefits of related research.
Results and benefits were illustrated using an example database describing 39
projects. A sampling of that project database is shown in Table 21.4. “#” refers to
the project number. “M/I” indicates management (M) or individuals (I) identified
the preliminary project charter. “A/P” indicates whether team members were
assigned (A) or elected to participate (P). “#P” is the number of people on the
team. “EC” is the number of economic analyses performed. The actual table in
Brady (2005) also contained the number of SPC, DOE, and other quality methods
used. Profits were estimated assuming a two-year payback period that addresses
the fact that affected products typically have a two-year life cycle.
Brady (2005) showed how EWMA control charting (Chapter 8) of the profits
from individual six sigma projects can provide quantitative, statistical evidence of
the monetary benefits associated with training programs. Also, the analysis
indicated that design for six sigma (DFSS) can offer far higher profits than
improvement projects. The control charting resulted in the removal of one point,
corresponding to the DFSS project that the charting found to be not representative
(following the approach in Chapter 4). The remaining data were analyzed using
regression (Chapter 15) and Markov Decision Processes (e.g., see Puterman 1994).
Results included prescriptive recommendations about which sub-methods can be
expected (at the related company) to achieve the highest profits in which situations.
Table 21.4. Sample from an open source six sigma project database

#    M/I    A/P    #P    EC    …    Expected savings    Expected time    Cost    Savings    Profit
#    #      #      #     #     #    #                   #                #       #          #
Much of the information from the analyses in Brady (2005) based on the
databases could be viewed as commonsensical. However, possible uses of
information from such activities include feedback at the meso-level about needs for
six sigma program management adjustments, improvements in training methods,
and improvements to project budget planning. In general, modeling of effects of
method applications on profits can provide many intangible benefits to statistical
researchers including quantitative confirmation of specifics of the methods.
Additional research can consider larger databases and, possibly, achieve
stronger inferences with larger scope than a single manufacturing unit. Also, a
wider variety of possible analysis methods can be considered including neural nets,
logistic regression, and many other techniques associated with data mining. Each
of these methods might offer advantages in specific contexts and answer new types
of questions. Overall, it seems that analyses of project databases is largely
unexplored.
Finally, the same criteria and test beds associated with method evaluation and
comparison can be used for generating optimal strategies. For example, Brady
(2005) used an optimization method called stochastic dynamic programming to
solve for the optimal sub-methods that maximize profits pertinent to many realistic
situations. Yet the assumptions in Brady (2005) were not sufficiently realistic to
provide valuable feedback to engineers. Much more research is possible to increase
the scope and relevance of efforts to develop optimal improvement strategies.
21.6 References
Allen TT, Yu L, Schmitz J (2003) The Expected Integrated Mean Squared Error
Experimental Design Criterion Applied to Die Casting Machine Design.
Journal of the Royal Statistical Society, Series C: Applied Statistics 52:1-15
Allen TT, Bernshteyn M (2003) Supersaturated Designs that Maximize the
Probability of Finding the Active Factors. Technometrics 45: 1-8
Bisgaard S, Freiesleben J (2000) Quality Quandaries: Economics of Six Sigma
Program. Quality Engineering 13: 325-331
Box GEP, Hunter JS (1961) The 2^(k-p) Fractional Factorial Designs, Part I.
Technometrics 3: 311-351
Brady JE (2005) Six Sigma and the University: Research, Teaching, and Meso-
Analysis. PhD dissertation, Industrial & Systems Engineering, The Ohio
State University, Columbus
Brady JE, Allen TT (accepted) Six Sigma: A Literature Review and Suggestions
for Future Research. Quality and Reliability Engineering International
(special issue) Douglas Montgomery, ed.
DeGroot MH (1970) Optimal Statistical Decisions. McGraw-Hill, New York
Goh TN, Low PC, Tsui KL, Xie M (2003) Impact of Six Sigma implementation on
stock price performance. Total Quality Management & Business Excellence
14:753-763
Hazelrigg G (1996) Systems Engineering: An Approach to Information-Based
Design. Prentice Hall, Upper Saddle River, NJ
Hoerl RW (2001) Six Sigma Black Belts: What Do They Need to Know? The
Journal of Quality Technology 33:391-406
Montgomery D (2001) Editorial, Beyond Six Sigma. Quality and Reliability
Engineering International 17(4):iii-iv
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic
Programming. John Wiley & Sons, Inc., New York
Ribardo C, Allen TT (2003) An Alternative Desirability Function For Achieving
Six Sigma Quality. Quality and Reliability Engineering International 19:1-
14
Shewhart WA (1931) Economic Quality Control of Manufactured Product. ASQ,
Milwaukee, WI (reprinted 1980)
Welch J, Welch S (2005) Winning. HarperBusiness, New York
21.7 Problems
1. Which is correct and most complete according to the text?
a. Documenting a project case study is generally a micro-level
contribution.
b. Future opportunities for survey based contributions are generally not
possible.
c. Models predicting the value of specific strategies are entirely lacking.
d. The subject of what should be taught to black belts has been
addressed in the literature.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
3. Which is correct and most complete using the notation from the text?
a. In general, g(x) corresponds to the primary KOV.
b. Control charting likely results in improved recommendations for
settings, xop.
c. Regression models cannot be used to specify constraint sets.
d. Formal optimization is generally used by all teams in six sigma
projects.
e. All of the above are correct.
f. All of the above are correct except (a) and (e).
5. Adapt the genetic algorithm in Chapter 19 to select n1, n2, c1, r, and c2 to
minimize a weighted sum of the expected number of tests and the chance of
errors for a t-testing procedure of the type shown in Figure 21.2. Assume that
σ0 is known to equal 1.0. (Hypothetically, this procedure might be preferable
to certain well-known z-testing procedures.)
[Figure 21.2 (flow chart): test n1 at level 1; test n1 at level 2; is ȳ2 − ȳ1 ≤ σ0c1? (yes/no); test n2 at level 1; test n2 at level 2; is ȳt2 − ȳt1 ≤ σ0c2? (yes/no)]
Bayes formula – define Pr(B|Ai) as the probability that the event B will happen
given that the event Ai has happened. Also, let A1, A2,…,A∞ be an infinite sequence
of disjoint events that exhaust the sample space. Then, Bayes’ formula, provable
using Venn diagrams and the definition of conditional probability, states
\[
\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\Pr(A_i)}{\sum_{j=1}^{\infty} \Pr(B \mid A_j)\Pr(A_j)}.
\]
benchmarking – an activity in which people systematically inspect the relative
performance of their product or service compared with alternatives
categorical variable – system input that can assume only a finite number of levels
and these levels have no natural ordering
continuous factor – input variable that can assume, theoretically, any of an infinite
number of levels (with levels having a natural ordering)
control factor – a system input that can be controlled during normal system
operation and during experimentation
control group – people in a study who receive the current system level settings
and are used to generate response data (related to counteracting any Hawthorne
effects)
control limit – when the charted quantity is outside these numbers, we say that
evidence for a possible assignable cause is strong and investigation is warranted.
decision space – the set of solutions for system design that the decision-maker is
choosing from
double blind – attribute of an experimental plan such that test subjects and
organizers in contact with them do not know which input factor combination is
being tested
easy-to-change factors (ETC) – inputs with the property that if only their settings
are changed, the marginal cost of each additional experimental run is small
engineered system – an entity with controllable inputs (or “factors”) and outputs
(or “responses”) for which the payoff to stakeholders is direct
expected value – the theoretical mean of a random variable derivable from the
distribution or probability mass function, written E[ ]
flat file – database of entries that are formatted well enough to facilitate easy
analysis with standard software
fractional factorial (FF) designs – input patterns or arrays in which some possible
combinations of factors and levels are omitted (standard FFs have the property that
all columns can be obtained by multiplying together other columns)
functional form – the relationship that constrains and defines the fitted model
global
hard-to-change factors (HTC) – inputs with the property that if any of their
settings are changed, the marginal cost of each additional experimental run is large
Job 1 – the time in the design cycle in which the first product for mass market is
made
leverage in regression – the issue of input patterns causing the potentially strong
influence of a small number of observations on the fitted model attributes
local optimum – the best solution to a formal optimization problem in which the
optimization is constrained to a subspace not containing the global optimum
neural nets – a curve fitting approach useful for either classifying outputs or as an
alternative to linear regression sharing some characteristics with the human mind
noise factor – an input to the system that can be controlled during experimentation
but which cannot be controlled during normal system operation
null hypothesis – the belief that the factors being studied have no effects, e.g., on
the mean response value
p-value – the value of alpha in hypothesis testing such that the test statistic equals
the critical value. Small values of alpha can be used as evidence that the effect is
“statistically significant”. Under standard assumptions it is the probability that a
larger value of the test statistic would occur if the factor in question had zero effect
on the mean or variance in question.
random effects models – fitted curves to data and associated tests for cases in
which some of the factor levels are relevant only for their ability to be
representative of populations (e.g., people in a drug study or parts in
manufacturing)
random errors – the difference between the true average for a given set of outputs
and the actual values of those outputs (caused by uncontrolled factors varying)
region of interest – the set of solutions for system design that the decision-maker
feels is particularly likely to contain desirable solutions
regression model – an equation derived from curve fitting response data useful for
predicting mean outputs for a given set of inputs
residual – difference between the metamodel prediction and the actual data value
rigorous method (in the context of optimization) – an algorithm associated with a
rigorously proven claim about the objective function of the derived solution
response surface model - a fitted model with a model form involving quadratic
terms
specification limit – these are numbers that determine the acceptability of a part.
If the critical characteristic is inside the limits, the part is acceptable
Type I error – the event that a hypothesis testing procedure such as t-testing
results in the declaration that a factor or term is significant when, in the true
engineered system, that factor or term has no effect on system performance
Type II error – the event that a hypothesis testing procedure such as t-testing
results in a failure to declare that a factor or term is significant when, in the true
engineered system, that factor or term has a nonzero effect on system performance
within and between subject designs – experimental plans involving within and
between subject variables (relates to the allocation of levels or runs to subjects)
Problem Solutions
Chapter 1
1. c
2. e
3. b
4. d
5. KIVs - study time, number of study mates
KOVs - math GPA, english GPA
6. KIVs - origin account type, expediting fee
KOVs - time until funds are available, number of hours spent
7. d
8. b
9. Six sigma training is case based. It is also vocational and not theory based.
10. TQM might be too vague for workers to act on. It also might not be profit
focused enough to make many managers embrace it.
11. b
12. e
13. Having only a small part of a complex job in mass production, the workers
cannot easily perceive the relationship between their actions and quality.
14. Shewhart wanted skilled workers to not need to carefully monitor a large
number of processes. He also wanted a thorough evaluation of process
quality.
15. By causing workers to follow a part through many operations in lean
production, they can understand better how their actions affect results.
Also, with greatly reduced inventory and one piece flows, problems are
discovered downstream much faster.
16. A book being written is an engineered system. Applying benchmarking and
engaging proof readers together constitute part of an improvement system.
17. In grading exams, I would be a mass producer if I graded all problem 1s then
all problem 2s and so on. I would be a lean producer if I graded entire exams
one after another.
18. c
19. d
20. e
21. a
22. b
23. e
24. Green belts should know terminology and how to apply the methods
competently. Black belts should know what green belts know and have enough
understanding of theory to critique and suggest which methods should be used
in a project.
25. Knowledge: Physics refers to attempts to predict occurrences involving a few
entities with high accuracy.
Comprehension: One table in physics might show the time it takes for various
planets to orbit the sun.
Application: An example application of physics is predicting the time it takes
for the earth to orbit the sun.
Analysis: An analysis question relates to identifying whether a given type of
theory can achieve the desired level of accuracy.
Synthesis: An issue for synthesis is how to connect mathematical technology
with a need to forecast blast trajectories.
26. a
Chapter 2
1. b
2. e
3. a
4. d
5. a
6. c
7. c
8. a
9. Acceptance Sampling, Process Mapping, Regression
10. Acceptance Sampling, Control Planning, FMEA, Gauge R&R, SPC Charting
11. d
12. d
13. e
14. a
15. a
16. d
17. See the examples in the chapter.
18. An additional quality characteristic might be the groove width with USL =
0.60 millimeters and LSL = 0.50 millimeters.
19. b
20. a
21. e
22. c
23. See the example in Chapter 4.
24. See the examples in Chapter 9.
25. In Step 5 of the method, which order should the base be folded. Fold up first
from the bottom or from the sides?
26. It is a little difficult to tell which are cuts and which are folds. For example, is
there a cut on the sides or a fold only?
Chapter 3
1. e
2. b
3. e
4. d
5. b
6. a
7. d
8. e
9. a
10. b
11. See the examples in Chapter 9.
12. e
13. a
14. a
15. b
16. Writing this book slowed progress on many other projects including course
preparation, writing grant proposals, and writing research papers.
17. e
18. b
19. c
20. d
21. b
22. a
23. a
24. d
25. b
26. e
27. c
28. c
29. b
30. a
31. e
32. In decision-making, there is a tendency not to focus on the failures that affect
the most people in the most serious way. Instead, we often focus on problems
about which someone is bothering us the most. Pareto charting can bring
perspective that can facilitate alignment of project scopes with the most
important needs.
33. For many important issues such as voting methods, there is a tendency to
abandon discipline and make decisions based on what we personally like.
Through a careful benchmarking exercise, one can discover what makes
customers happy, drawing on inspiration from competitor systems. This can
be important ethically because it can help make more people happy.
34. b
35. e
36. e
37. d
38. b
39. c
40. a
41. d
42. b
Chapter 4
1. b
2. d
3. a
4. b
5. b
6. c
7. c
8. a
9. b
10. c
11. d
12. Under standard assumptions, the measurement system is not gauge capable.
The measurement system cannot reliably distinguish between parts whose
differences are comparable to the ones used in the study.
13. c
14. a
15. b
16. b
17. b
18. c
19. The following is from Minitab®:
Figure PS.1. P chart of numbernc (from Minitab®): UCL = 0.1455, center line P̄ = 0.0860, LCL = 0.0266, with one sample plotting above the UCL
20. e
21. d
22. b
23. d
24. The following is from Microsoft® Excel:
Figure PS.2. u-chart (from Microsoft® Excel) showing u with UCL and LCL over 26 subgroups
25. e
26. a
27. c
28. a
29. f
30. c
31. d
32. The following was generated using Minitab® with subgroups 18 through 22
removed:
[Xbar & R charts: Xbar chart with UCL = 2.668, center line 2.221, LCL = 1.774; R chart with UCL = 1.639, center line 0.775, LCL = 0]
33. e
34. Crossed gauge R&R could help quantify the gauge capability of the system
including both inspectors. Also, the inspectors could meet and document a
joint measurement SOP. Finally, it might be helpful to create standard units
using an external expert. Then, comparison with standards could be used
which would be most definitive.
35. Assignable causes might include non-regular, major job performance
evaluations or major life decisions. In general, the charting could aid in the
development of a relatively rational perspective and acceptance that life has
peaks and valleys. Also, the chart could help in rational approaches to
determine that a major life decision has been a success or a failure.
36. The majority of quality problems are caused by variability in the process.
Otherwise, all of the units would be conforming or nonconforming. The width
of the limits on R-charts quantifies the amount of variability in the process
with only common causes operating. Since major quality projects are often
designed to reduce the common cause variation, a pair of charting activities
before and after the project can quantify the effects.
37. Suppose that we are playing doubles tennis and coming up to the net at every
opportunity. Unfortunately, we have a string of pathetic volleying attempts
causing lost points. It still is probably advisable to continue coming to net
since the strategy still makes sense.
Chapter 5
1. d
2. See the example at the end of Chapter 9.
3. It would be ideal, perhaps, to eliminate all operations besides manufacturing,
usage, and storage.
4. e
5. b
6. a
7. b
8. c
9. b
10. a
11. C&E matrices help engineers communicate about their beliefs related to
causality. The process of creating these matrices forces engineers to think
about all of the customer issues. Also, C&E matrices are needed for the
construction of the house of quality which many people feel helps them make
system design decisions.
12. e
13. a
14. d
15. c
16. b
17. d
18. It is not clear how parents could fail to detect their child’s pinching their
fingers in doors. If the failures were not detected because the baby sitters
failed to report the problem, that needs to be clarified. Similarly, it is not clear
how parents could fail to detect too much TV watching unless they were
previously unaware of the issue or there were baby sitters involved.
19. e
Chapter 6
1. b
2. c
3. c
4. b
5. c
6. a
7. d
8. QFD might be better able to address more customer considerations and input
and output variable related issues. Also, QFD forces the decision-makers to
study at least some of their competitors which can provide invaluable insights.
9. Targets can function as effective customer ratings, changing the ranking of the
companies. For example, one company could dominate the customer ratings
but have key input variable or key output variable settings far from the targets.
Then, that company’s approach might not be worthy of emulation.
10. c
11. c
12. c
13. d
14. See the snap tab example in the chapter
15. d
Chapter 7
1. e
2. d
3. a
4. c
5. a
6. a
7. a
8. d
9. Monitoring using control charts requires the greatest on-going expense.
10. c
11. c
12. b
13. Acceptance sampling can be used when the sampling is destructive.
Acceptance sampling generally results in reduced inspection costs.
14. e
15. a
16. N = 1000, n1 = 100, n2 = 350, c1 = 3, c2 = 7, and r = 6.
17. 450
18. 100
19. Because the company is applying some form of sampling, perhaps control
charting or acceptance sampling. Acceptance sampling could be in place to
potentially make firing decisions for a company supplying services.
20. e
Chapter 8
1. e
2. d
3. a
4. d
5. Monitoring several major index funds simultaneously is possible.
Monitoring the results of customer satisfaction surveys for all questions
simultaneously is possible.
Chapter 9
1. d
2. b
3. c
4. I did in fact recommend performing design of experiments using even more
factors. It seemed that they were lucky to find the source of the majority of the
variation using only four factors.
5. e
6. d
7. c
8. c
9. c
10. b
11. d
12. c
13. Examples of criticisms are found in Chapter 9.
14. For the air wing project, one might propose a goal to increase the air flight
time to 2.0 seconds.
15. See the example in Chapter 9.
Chapter 10
1. d
2. e
3. a
4. 0.0013
5. Triangular[a = $2.2M, b = $3.0M, c = $2.7M] (others are possible)
6. 0.4
7. a
8. Yes
9. b
10. 0.81
11. a
12. d
13. c
14. a
15a. 0.94
15b. 0.94
16. The following is from Excel:
Figure PS.4. OC curve: 100·pA (percent of lots accepted) plotted against 100·p0 (percent nonconforming)
17. Figure PS.5. OC curve for the policy in problem 17: 100·pA plotted against 100·p0
18. The policy in problem 17 is always more likely to accept lots because the OC
curve is always above the preceding OC curve.
19. The ideal OC curve would look like Figure PS.6. This follows because the
policy would function like complete and perfect inspection but at potentially
reduced cost.
Figure PS.6. The ideal OC curve: 100·pA versus 100·p0, dropping from 100 to 0 at the desired cutoff
Chapter 11
1. f
2. b
3. a
4. d
5. c
6. df = 4 using the rounding formula.
7. f
8. b
9. f
10. a
11. c
Chapter 12
1. g
2. b
3. c
4. a
5. d
6. c
7. a
8. d
9. a
10. d
11. b
12. b
13. d
14. f
15. c
16. e
17. See the examples in Chapter 18.
18. Naming the factors A through E, E = AD.
19. b
20. e
21. c
Chapter 13
1. c
2. a
3. b
4. a
5. e
6. a
7. The number of factors is three and the number of levels is three.
8. e
9. d
10. a
11. f
12. b
13. b
14. c
15. b
16. c
17. e
Chapter 14
1. b
2. f
3. a
4. RDPM is designed to elevate the bottleneck subsystem identified using a TOC
approach. Dependencies are modeled, and settings are chosen to maximize
throughput subject to constraints that keep the fractions of nonconforming units
acceptable.
5. c
6. c
7. a
8. d
9. σ1²(5x1 + 2x2 + 2)² + σ2²(2x1 + 8x2 - 1)² (a sketch evaluating this expression follows this list)
10. (i) It derives the settings that directly maximize profits rather than an obscure
measure of quality. (ii) It builds upon standard RSM, which is taught in many
universities, and might result in improved recommendations.
11. (i) Generally, Taguchi Methods are easier for practitioners to use and
understand without the benefit of software. (ii) If all noise factors or all control
factors are ETC, the cost of experimentation might be far less using Taguchi
product arrays compared with standard RSM based experimentation.
12. See Allen et al. (2001).
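A minimal sketch that evaluates the variance expression from problem 9 at candidate settings; the noise-factor variances and the trial points in main() are hypothetical values used only for illustration:

#include <stdio.h>

/* Evaluate sigma1^2*(5x1 + 2x2 + 2)^2 + sigma2^2*(2x1 + 8x2 - 1)^2 at (x1, x2). */
double variance_estimate(double x1, double x2, double sigma1sq, double sigma2sq)
{
    double g1 = 5.0 * x1 + 2.0 * x2 + 2.0;
    double g2 = 2.0 * x1 + 8.0 * x2 - 1.0;
    return sigma1sq * g1 * g1 + sigma2sq * g2 * g2;
}

int main(void)
{
    /* hypothetical noise variances and candidate settings */
    printf("variance at (0.0,  0.0): %.3f\n", variance_estimate(0.0, 0.0, 1.0, 1.0));
    printf("variance at (1.0, -0.5): %.3f\n", variance_estimate(1.0, -0.5, 1.0, 1.0));
    return 0;
}

Minimizing such an expression over the feasible settings is the usual way to reduce the variation transmitted by the noise factors.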
Chapter 15
1. c
2. a
3. c
4. a
5. e
6. a
7. b
8. f
9. a
10. b
11. a. yest(x1, x2) = 0.443 + 0.0361 x1 + 0.00151 x2
b. yest(x1, x2) = 0.643 + 0.0352 x1 + 0.000564 x2 - 0.000231 x1² + 8.11066e-007
x2² + 1.12021e-005 x1x2
c. 0.778
d. 1.3
e. 0.678
f. 4.15 (a sketch for evaluating the fitted models in parts a and b appears at the end of this chapter's solutions)
12. f
13. It seems likely that genre plays an important role in movie profitability.
However, it is not clear that all of the levels and contrasts are needed to
quantify that effect. Therefore, the analysis here focuses only on the action
contrast. Starting with scaled inputs and a first-order fitted model, the following
table can be derived. It indicates that stars likely play a far greater role in
non-action movie profits than in action movie profits. Considering the
potentially canceling effects of video sales and marketing costs, it seems
reasonable that fifth-week revenues roughly correspond to profits. Then stars
are worth roughly $8M apiece on average, and that value depends greatly on
the type of movie. Note that this analysis effectively assumes that critic rating
is controllable through the hiring of additional writing talent.
(Note also that leaving stars out of the model might make sense, but doing so
would provide no help for setting their pay.)
14. Many possible models can be considered. However, models including
interactions between the city and the numbers of bedrooms and baths do not
make physical sense. After exploring models derived in coded –1 and 1 units,
the following model was fitted in the original units:
This model can be used to develop reasonable prices in line with other offering
prices in the market. This model gives an acceptable normal probability plot of
residuals shown in Figure PS.7, seems decent from the VIF and PRESS
standpoints, and makes intuitive sense.
Figure PS.7. Residual plot for a specific offering price regression model (vertical axis: normal scores; horizontal axis: residuals)
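Returning to problem 11, the sketch below evaluates the two fitted models from parts a and b at an arbitrary setting; the point used in main() is hypothetical and does not correspond to parts c through f:

#include <stdio.h>

/* First-order fitted model from part a. */
double yest_first_order(double x1, double x2)
{
    return 0.443 + 0.0361 * x1 + 0.00151 * x2;
}

/* Second-order fitted model from part b. */
double yest_second_order(double x1, double x2)
{
    return 0.643 + 0.0352 * x1 + 0.000564 * x2
         - 0.000231 * x1 * x1 + 8.11066e-7 * x2 * x2
         + 1.12021e-5 * x1 * x2;
}

int main(void)
{
    double x1 = 50.0, x2 = 200.0;   /* hypothetical setting, for illustration only */
    printf("first-order prediction:  %.3f\n", yest_first_order(x1, x2));
    printf("second-order prediction: %.3f\n", yest_second_order(x1, x2));
    return 0;
}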
Chapter 16
1. a
2. f
3. c
4. f
5. a
6. e
Chapter 17
1. b
2. f
3. e
4. c
5. f
6. d
7. b
8. See the first example in Section 13.4.
Chapter 18
1. f
2. a
3. d
4. b
5. b
6. b
7. a
8. e
9. c
10. d
11. a
12. 1.7
Chapter 19
1. c
2. b
3. e
4. c
5. d
6. a
7. c
8. a
9. /* This change de-codes the chromosome to address the new constraint. */
for (j = 0; j < SIZE; j++) vect[j] = 2.0 + (x->vector[j]) * 3.0;
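A self-contained version of the same decoding idea is sketched below; the chromosome structure, SIZE, and the gene values are illustrative stand-ins for the genetic algorithm code referenced in the chapter, with genes assumed scaled to [0, 1] so that each decoded variable lands in [2.0, 5.0]:

#include <stdio.h>

#define SIZE 4                                  /* illustrative number of decision variables */

struct chromosome { double vector[SIZE]; };     /* genes assumed scaled to [0, 1] */

/* De-code genes in [0, 1] to decision variables in [2.0, 5.0],
   mirroring vect[j] = 2.0 + vector[j]*3.0 from the solution above. */
void decode(const struct chromosome *x, double vect[SIZE])
{
    int j;
    for (j = 0; j < SIZE; j++)
        vect[j] = 2.0 + x->vector[j] * 3.0;
}

int main(void)
{
    struct chromosome x = { {0.0, 0.25, 0.5, 1.0} };   /* hypothetical genes */
    double vect[SIZE];
    int j;
    decode(&x, vect);
    for (j = 0; j < SIZE; j++)
        printf("gene %.2f -> variable %.2f\n", x.vector[j], vect[j]);
    return 0;
}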
Chapter 20
1. e
2. a
3. The solution is unchanged. The selected system is the most expensive.
Chapter 21
1. d
2. c
3. a
4. d
5. // ** translate solution vector to method design
c1crit = x->vector[0]*NLITTLE;
rcrit = c1crit + x->vector[1]*(NLITTLE-c1crit);
c2crit = x->vector[2]*NLITTLE;
Ntotal = (long) ((NBIG - MINRUNS) * (x->vector[3])) + MINRUNS;
n1 = (long) ((Ntotal) * (x->vector[4]));
n2 = Ntotal - n1;
// ** generate truth
temp = ranS(&seed); difference = 0.0;
if(temp < trueProb) {difference = diffParm;}
else {difference = 0.0;}
// ** generate data (the running averages are assumed initialized to 0.0 earlier in the routine)
for(i=0;i<n1;i++) //first round
{ y11i[i]=gasdev(&seed);
y12i[i]=gasdev(&seed)+difference; }
for(i=i;i<Ntotal;i++) //second round (might be wasted); i continues from n1
{ y21i[i]=gasdev(&seed);
y22i[i]=gasdev(&seed)+difference; }
// ** do test
for(i=0;i<n1;i++)
{ y11avg+=y11i[i]; y12avg+=y12i[i]; }
y11avg=y11avg/((double) n1); y12avg=y12avg/((double) n1);
for(i=i;i<Ntotal;i++) // i continues from n1
{ y21avg+=y21i[i]; y22avg+=y22i[i]; }
y21avg=y21avg/((double) n2); y22avg=y22avg/((double) n2);
if(y12avg - y11avg < c1crit)
{declareD = 0; nUsed=((double)n1);} //1st stage says no diff
else {if(y12avg - y11avg > rcrit)
{declareD = 1; nUsed=((double)n1);} //1st stage says diff
else { nUsed=((double)Ntotal);
temp = (y12avg-y11avg)*((double)n1)-(y22avg-y21avg)*((double)n2);
if(temp > c1crit * ((double)Ntotal) && n2 > 0)
{declareD = 1;} // assumed completion: the original listing is cut off here at a page break
else {declareD = 0;} // assumed completion: otherwise declare no difference after the second stage
}}
// ** evaluate results (errorP assumed initialized to 0.0 earlier in the routine)
if((difference>0.00000001) && (declareD < 1)) {errorP = 1.0;}
if((difference<0.00000001) && (declareD > 0)) {errorP = 1.0;}
return nUsed + weight*errorP; // objective: runs used plus a weighted error indicator
Index
A
absolute error 78
acceptance sampling 33, 42, 43, 147, 151, 152, 153, 155
activities 9
adjusted R-squared 291, 298, 356, 499
Analysis of Variance (ANOVA) 15, 81, 242-243, 359-362, 364
anecdotal information 136
appraiser 77, 79, 105, 106, 108
assignable cause 86, 87, 88, 90, 91, 94, 103, 104, 111, 148, 500
assignable cause variation 499
average run length 227

B
batches of size one 11-12, 120
Bayes formula 499
BBD 290, 305, 306
benchmarking 57, 135, 499
black belt 9, 17
block 300
blood pressure 103, 166
Bonferonni inequality 499
bottleneck 51, 53, 63-64, 68, 119, 326
box office data 375
box plot 248
buy in 49

C
categorical factor (or variable) 53, 367, 369, 415, 499
CCD xxi, 290, 304-305
center points 300
characteristic(s) 175, 326, 334-336
charter 45-49, 58-59, 64, 67, 106
check sheet 56
common cause variation 86-89, 98, 103, 110-111, 179, 499
complete inspection 88-89, 93, 98, 103-104, 136, 151
component methods 9
concurrent engineering 11
conforming 35, 50, 155, 479
confounding 418, 499
constraint 140-142, 370, 413, 414
continuous factor (or variable) 369, 371, 499
control 3, 4, 17, 19, 20, 24, 32, 37, 58, 76, 86, 88, 91, 93, 94, 95, 96, 98, 102, 104, 111, 113, 114, 115, 124, 127, 140, 147-153, 158, 165, 179-181, 227, 228, 253, 323-326, 334-336, 404, 415-418, 458, 499
control factor 181, 332, 404, 500
control limit 500
control planning 147-149
control-by-noise interaction 324
co-packer 480
craft 10, 11, 25, 120
lower control limit 88, 91-94, 180
lower specification limit 181-183

M
mass customization 13, 15, 18
mass production 10-15, 24-25, 86, 120
master black belt 9
measurement errors 76, 78, 107, 357
measurement systems 69, 75-78, 85, 105-107, 117, 148
meddling 86-87
metamodel 501-503
method vii, viii, 1, 2, 7, 8, 9, 18, 19, 20, 29, 30, 33, 37-91, 95, 98, 100-107, 117-125, 128, 135, 136-153, 182, 183, 243-262, 270-275, 289, 299-300, 307-310, 321, 335, 343, 353, 359-386, 387, 397, 405, 411, 414, 435, 437-459, 464-469, 479-503
miniaturization 13
mixed production 120
monitoring 87, 89, 95, 103
multicollinearity 499-502
multivariate charting 98, 105, 161, 166

N
neural nets 385, 386, 502
newsvendor 464
noise factor(s) 19-20, 101, 319, 321-338, 415, 420, 502
nominal 34
nonconforming 35-36, 53-56, 69, 94, 103-113, 125, 136, 149, 151-154, 168, 176, 181, 212-213, 229, 417
nonconformity 35-36, 53-56, 69, 94, 103, 126, 180, 323
non-destructive evaluation 77
normal approximation 370
normal distribution 98, 102-103, 207-210, 243, 273-274, 321-325, 351-353, 365-415, 429-430, 458, 464, 480, 502
normal probability plot 351
not-invented-here syndrome 48
np-hard 462, 463, 476, 502

O
objective function 458
observational study 249, 343, 418, 502
OC curve 229
OFAT 177, 184, 275-277, 401
one-factor-at-a-time (OFAT) xxii, 176, 177, 183, 257, 275-277, 401, 411, See OFAT
one-point crossover 467
one-sided t-test 245
operations research vii, 7, 15, 136, 140, 457-475
optimal tolerancing 478-481
optimization program 140, 379, 413, 458-463
optimization solver 140, 142, 143, 386, 466-474
optimum 392, 461-466, 469, 501-502
orthogonal array 275
out of scope 47
out-of-control signal 91, 100, 162
output variables 2-8, 30, 45-47, 51, 57-58, 63, 75, 87, 148, 504, See KOVs

P
parameterization 410, 502
parametric 360
p-charting 86-94, 105-107, 150
person-years 50-51, 67
posterior probability 183, 502
power 435
prediction 411-412
prediction errors 306-309
significant figures 45, 60-63, 72, 80
six sigma vii, viii, xi, 2, 7-10, 16-24, 29-32, 45, 49-51, 63-65, 75, 103-104, 123, 135, 136, 150, 152, 212-213, 332, 486, 503
six sigma quality 207, 212-213
smoothing parameter 162
SOP(s) 24, 36-40, 44, 79, 80, 88, 107, 119, 127, 135, 143, 189, 190-194, 197, 282, 419-420, 504
SPC xxii, 15, 16, 18, 29, 30, 31, 32, 33, 37, 39, 41, 42, 43, 75, 76, 85, 88, 91, 93, 95, 105, 110, 148, 149, 151, 161, 172, 195, 417, 504
specification limits 176, 212, 504
SQC viii, 1, 18, 20, 21, 29-234, 504, See statistical quality control
stackup analysis 479
standard operating procedures 36-40, 504, See SOPs
standard order 272
standard unit 78-79
standard values 76-80, 84, 106-108
startup phase 87-88, 93
statistical process control xxii, 15, 16, 18, 39, 75, 85-105, 147-149, 179-180
statistical quality control vii, 1, 21, 29, 29-234
stochastic optimization 463-465, 470, 474-475, 480, 492, 504
strength 407, 414
subsystems 14, 30, 45-55, 58, 63-69, 117, 119, 121, 125, 128, 181
summary 37, 100
supply chain 14, 33
system 2, 3, 5, 6, 7, 8, 9, 10, 11, 14, 17, 18, 19, 22, 23, 25, 30, 31, 32, 33, 37, 38, 41, 45, 46-57, 65, 67, 68, 69, 75, 76, 77, 78, 80, 84, 85, 86, 88, 89, 91, 98, 100-109, 113, 117-128, 135, 136, 140, 143, 147, 148, 152, 154, 242-262, 270-274, 286-300, 309, 321-326, 332, 356-360, 370, 380, 404, 415-417, 459, 466, 479, 500-504
systematic errors 77, 78, 106-108, 365

T
Taguchi methods 321, 332-335
test for normality 212, 504
theory vii, 21, 26, 29, 60, 68, 118, 431
theory of constraints (TOC) 51, 63
total quality management (TQM) 16-17
total variation 323
Toyota production system 120-121
trace of a matrix 449, 504
training qualifications 37
training set 386
true response 307
two sample t-test 242, 243-248, 432
two-sided t-test 243
type I error(s) 251-252, 432-439, 504
type II error(s) 243, 251-252, 259, 262, 274-275, 432-439, 504

U
upper control limit 88, 91, 180
U-shaped cells 120

V
value added 118
value stream 118
value stream mapping 10, 18, 53, 117-121, 128
variance inflation factors (VIFs) xxii, 348, 349, 504
verify 15, 32, 143, 147
W
within and between subject designs 504

Y
yield 10, 13, 19, 30, 176, 177, 178, 179, 180, 181, 182, 183, 268, 386, 413, 438