Sensory Evaluation in Quality Control
Alejandra M. Munoz
Sensory Spectrum, Inc.
Gail Vance Civille
Sensory Spectrum, Inc.
B. Thomas Carr
The NutraSweet Co. R&D
All rights reserved. No part of this work covered by the copyright hereon may be reproduced or used
in any form or by any means-graphic, electronic, or mechanical, including photocopying, recording,
taping, or information storage and retrieval systems-without written permission of the publisher.
PREFACE XV
CHAPTER 1 Introduction 1
THE CONCEPT OF QUALITY 1
The Importance of Quality in American Industry 1
Definitions of Quality 2
The Consumer Input in the Definition of Quality 3
The Multidimensional Nature of Quality 4
Quality Consistency 6
THE MAINTENANCE OF PRODUCT QUALITY
(CONSISTENCY) 6
The Value of Product Consistency 6
The Function and Establishment of Quality Control
Programs 7
The Steps in the Implementation of a QC Program 7
The QC Steps that Require the Measurement of Product
Characteristics 8
MEASUREMENT TECHNIQUES IN A QC
PROGRAM (INSTRUMENTAL AND
SENSORY TEST METHODS) 10
Instrumental Methods in a QC Operation 11
Advantages of Instrumental Methods in a QC
Program 11
Panelists 33
Management Support 34
Facilities 36
Environment 36
Sample Handling 37
Outside Resources 37
Collaborating Groups 38
Research and Development 38
Marketing and Marketing Research 39
Manufacturing 39
QA/QC 39
Data Analysis 94
Report System 97
Characteristics of the Ongoing Program 98
Routine Activities 98
Long-Term Activities 101
OTHER VERSIONS 103
Shorter Version Used By Itself 104
Shorter Version Across Several Similar Products 104
Shorter Version Combined with Other Methods 105
References 230
Index 236
Preface
This book addresses an important, but so far neglected, topic: the application of
sensory evaluation to quality control. Although several articles have been pub-
lished that have discussed concepts of quality control/sensory evaluation (QC/sen-
sory) programs, Sensory Evaluation in Quality Control is the first publication that
addresses this topic in a comprehensive and practical way. This book is com-
prehensive, in that it presents the sensory and statistical information that is needed
to design and implement several types of QC/sensory programs at the plant level.
The book is practical, in that it provides a step-by-step description of the complete
process to implement such programs, and it illustrates this process through real
examples encountered by various consumer products companies (e.g., foods,
personal care products, paper products). With this practical information, sensory
and quality professionals can design and implement sound QC/sensory programs
at the plant level.
This book was developed to provide the sensory and quality professional with
an overview and guide to apply, in a production facility, the unique techniques that
are used to measure sensory responses. Therefore, the book is intended for QC
and/or R&D personnel (e.g., sensory managers and analysts, and quality profes-
sionals) in charge of implementing an in-plant program, as well as for the plant
management and plant technical personnel (sensory coordinator and quality pro-
fessionals) who are ultimately responsible for the routine operation of the estab-
lished program. Technical personnel can use this book as a guide and benefit
mainly from the in-depth description of the sensory and statistical information.
QC, R&D, and plant management can use it to assess the benefits of a sensory
program at the plant level, the various alternative techniques that exist, the
advantages and limitations of each, and the resources and costs needed to imple-
ment any of the QC/sensory programs discussed.
This book is organized into four main parts. Chapter 1 is an overview of quality
control programs, of the test methods in a QC operation (instrumental and sen-
sory), and of the common sensory techniques that are published in the sensory
literature and used in industry, with a discussion of the advantages and disadvan-
tages of each. Chapter 2 is a discussion of the preliminary phases in the program
implementation, such as the planning phase and the assessment of resources and
needs. Chapters 3 to 6 present the in-depth description of four different QC/sensory
programs: comprehensive descriptive, quality ratings, "in/out," and difference
from control methods. Finally, the appendices contain statistical background infor-
mation on the procedures for summarizing and analyzing QC/sensory data.
Each program (chapters 3 to 6) is covered in four basic sections: 1) an abstract,
2) a description of the program in operation, 3) the steps and procedures for
implementing and operating the program, and 4) other versions of the program.
The abstract contains a brief summary of the method and describes the main
characteristics of the method, as well as its uses, advantages, and disadvantages.
The section on operation of the program describes the way the program functions.
It shows the type of data collected, the type of sensory specification used, and the
comparison of the collected data and the specifications for product decisions. This
section allows the reader to become familiar with the methodology and principles
of the program. The section on implementation of the program provides detailed
information on each step recommended for developing and implementing such a
program. This section is intended to provide the reader with information needed to
implement that QC/sensory program. Finally, the section on other versions of the
program presents modified programs of the main method. In many situations, due
to limitations on resources or philosophical differences, professionals might opt to
consider implementing modified or shorter versions instead of the complete pro-
gram discussed in each chapter.
Each method has unique characteristics and approaches that are discussed in
the corresponding chapter. There are, however, many common components and
tests among all methods. These common elements are discussed in detail only in
the first method presented-the comprehensive descriptive approach (Chapter 3).
The other chapters (4-6) make frequent reference to the common program aspects
discussed in Chapter 3.
Our special appreciation is extended to Barbara Pirmann and Andrea Senatore,
for preparing and proofing this manuscript. We wish to also thank Clare Dus, for
her assistance in the literature review, and Linda Brands, for her help in preparing
several tables and figures of the book.
Alejandra M. Muñoz
Gail Vance Civille
B. Thomas Carr
1
Introduction
complex and sophisticated products, but that this is insufficient. Efforts are needed
for the country to regain its world leadership in quality. W. Edwards Deming, who
taught Japan about quality, states that, "We in America will have to be more
protectionist or more competitive. The choice is very simple: if we are to become
more competitive, then we have to begin with our quality" (Halberstam 1984).
In light of this scenario, American companies have committed to establishing
programs for the development, maintenance and/or improvement of quality prod-
ucts. Companies know that the total cost of quality programs is outweighed by the
benefits they produce. These benefits include savings, in terms of materials, effort,
and time, as well as the enhanced business resulting from higher consumer
acceptance and greater competitiveness.
Currently, many companies consider the pursuit and maintenance of quality an
essential part of their operation. Experts consider that, without quality programs,
companies are doomed to fail. Cullen and Hollingum (1987) believe that, in the future,
there will be two types of companies: those that have implemented total quality and
those that will be out of business. At the same time, these experts consider the Japanese
quality perspective to be the model to follow for success (Hayes 1981; Halberstam
1984; Cullen and Hollingum 1987; Schrock and Lefevre 1988; Penzias 1989).
The problem and the challenge are very complex. Within a company, the pursuit
of quality must be the job and challenge of each employee, from top management
to line workers. Educational and operational quality programs and the concepts of
quality should be an essential component of the corporate culture. Many quality
control books cover, in detail, endeavors that many American consumer products
companies have undertaken: programs developed to establish and maintain their
products' quality. This book adds a new tool to quality control programs that is
critical for the consumer products industry: evaluating and controlling products,
based on their sensory properties.
Definitions of Quality
Sinha and Willborn (1985) present a collection of definitions of quality (Table 1-1).
Intrinsic in all these definitions is the concept that quality encompasses the
characteristics of a product or service that are designed to meet certain needs under
specified conditions. Although these are accurate definitions that are widely used
and sometimes sufficient for discussing general concepts, there are additional
aspects that need to be included, to better understand the total quality view. These
concepts are:
Definition                                                                  Source
"Conformance to requirements."                                                   3
"The degree to which a product or service is fit for the specified use."        7
1. ANSI/ASQC Standard, "Quality System Terminology," A3-1978, prepared jointly by the American National
Standards Institute (ANSI) and American Society for Quality Control (ASQC).
2. Juran, J. M., editor-in-chief, Quality Control Handbook, 3rd ed., McGraw-Hill Book Company, New York, 1974.
3. Crosby, P.B., Quality Is Free, McGraw-Hill Book Company, New York, 1979.
4. DIN 53350 of Deutsches Institut fuer Normung, Teil 11, Beuth-Verlag, Berlin.
5. EOQC, "Glossary of Terms Used in Quality Control," 5th ed., 1981, published by the European Organization for
Quality Control.
6. QS-Norm Draft of Swiss Standard Association, 1981.
7. Seghezzi, H.D., "What is Quality Conformance With Requirements or Fitness for the Intended Use," EOQC Journal
4, 1981, p. 3.
Source: Sinha and Willborn 1985.
Few quality control publications deal with the consumer dimension within an
in-plant program. In an early publication, MacNiece (1958) explained the import-
ance of studying the consumer requirements and integrating them into engineering
specifications. Many industrial and commercial enterprises believe that they know
what is best for consumers and what they need. However, companies need to
understand the difference between what consumers "want" and what they "need"
(MacNiece 1958). This book emphasizes the importance of consumer input in
quality programs and includes specific procedures, to incorporate consumer re-
sponses in establishing quality programs.
The critical quality dimensions of a product are not stagnant. Their importance
Quality Consistency
The two quality dimensions discussed earlier (the consumer input and the
multidimensional nature of the product) play an important role in the first of the
two stages of a product cycle, which is the development of the product and the
establishment of its inherent quality. In this first product stage, the inherent quality
characteristics are established and are meant to satisfy consumer needs and expec-
tations. These inherent characteristics are, among others, the dimensions, aesthet-
ics, performance, and cost of the product, and will determine the initial consumer
acceptance or liking of a product. In the second product stage, the consistency
aspect of quality plays an important role. The two product stages have to be
considered in order to determine the total product quality. A quality product has
inherent properties of high quality and will consistently deliver these quality
properties to consumers, over time.
This book does not deal with issues related to the first product stage, which
includes selecting the critical inherent quality parameters, establishing the inherent
product quality, and performing the technical and business efforts during the
product's development. This book's material deals with the quality issues that
follow the development and production of a product of a given quality, which is
the consistency aspect of quality. To achieve consistent high quality, a program for
maintaining the product's inherent quality is required.
Establishing the product's inherent quality is a very critical business and technical
product phase since it is the basis for the future success of the product or service.
Steven Jobs, of Apple Computer fame (Gendron and Burlingham 1989), has the
fundamental belief that time has to be sacrificed, in order to produce and launch a
good product the first time. This is easier than having to go back and fix it.
However, the success of a product (i.e., the consumer's continued acceptance) does
not depend solely on this first product stage, but on the product's ability to fulfill
a second need for consumer satisfaction: consistency. As stated by Lehr (1980), the
definition of quality should encompass "the consistent conformance to customers'
expectations." This implies ..doing it right every time instead of just doing it right
the ftrst time."
The efforts and the responsibilities to maintain and control the product's quality
shift within the company. While Product Development and Marketing/Market
Research are in charge of establishing the initial product quality, Quality Control
and Quality Assurance are responsible for establishing a system that assures the
product's consistency.
Definition                                                              Source

Quality Control
"An effective system for coordinating the quality maintenance
and quality improvement efforts of the various groups in an
organization so as to enable production at the most economical
levels which allow for full consumer satisfaction."

Quality Assurance
"All those planned or systematic actions necessary to provide adequate
confidence that a product or service will satisfy given needs."             3
1. Feigenbaum, A.V. 1951. Quality Control: Principles, practice and administration. New York: McGraw-Hill Book
Company.
2. Sinha, M.N. and W.O. Willborn. 1985. The management of quality assurance. New York: John Wiley and Sons.
3. ANSI/ASQC Standard, "Quality System Terminology," A3-1978. Prepared jointly by the American National
Standards Institute (ANSI) and American Society for Quality Control (ASQC).
4. Canadian Standards Association Standard, Z299.1-1978, "Quality Assurance Program Requirements."
• Setting standards/specifications;
• Inspecting during or after manufacture;
• Appraising conformance; and
• Planning for improvements.
MEASUREMENT TECHNIQUES
IN A QC PROGRAM (INSTRUMENTAL
AND SENSORY TEST METHODS)
The previous section outlined the steps in a QC program, in which product
measurements play an important role. A detailed discussion of QC measurements
is warranted.
Except for one of the five steps in a QC program (the actual manufacture
of the product), all other steps are either (a) measurement techniques them-
selves or (b) related to and dependent upon the product measurements obtained.
Ultimately, the decisions on product disposition (reject, accept, hold) are based
on the results of the product measurements themselves. This implies that the
effectiveness of a QC program depends, to a large extent, on the type of
measurement techniques used and on their validity, reliability, and reproducibility.
Therefore, one of the important functions of a QC department is to research,
establish, and maintain the most appropriate measurement techniques. Often,
measurement techniques are established and seldom reviewed or improved.
Levy (1983) indicates that quality control programs often are involved in
testing for the sake of testing, rather than understanding what the results
actually mean. There is also a tendency to regard a test as sacrosanct, not to be
changed or modified in any way.
The measurement techniques used at the plant level can be instrumental
and/or sensory methods. Heretofore, focus has been primarily on instrumental
methods. However, as consumer products companies realize that sensory test-
ing is a critical part of product development and quality control (Pangborn and
Dunkley 1964) and recognize that sensory evaluation is developing as a science
(Pangborn 1964), sensory methods occupy the same important role as instru-
mental tests.
• Simplicity;
• Expediency;
• Immediate or quick turn-around of data;
• Continued operation (no restriction on number of products/samples tested);
• Precision;
• Accuracy;
• Reproducibility;
• Cost efficiency; and
• Compatibility with other instruments and/or computers.
1968; Bourne 1977; Williams 1978; Bruhn 1979; Trant, Pangborn, and Little 1981;
Szczesniak 1987). Particularly relevant are:
• For some characteristics, the lack of instrumental techniques that measure
properties related to the attributes of interest;
• In some cases, the lack of a relationship between sensory and instrumental
measurements, as well as the inability of those instruments to relate and predict
a sensory response;
• The inability of instruments to measure all the components of the total sensory
response; and
• The lower sensitivity of some analytical methods or instruments (e.g., chroma-
tography), compared with human sensitivity.
• Time involved;
• Expenses incurred;
• Effect of environmental and emotional factors on people's responses;
• Effect of biases and physiological and psychological factors on the product's
results;
• Possible lack of accuracy and precision;
• Attrition; and
• Need to handle and resolve personal factors that affect the operation of the
program (e.g., sickness, layoffs, promotions, moves).
When sensory methods add a unique measurement to the set of QC measure-
ment techniques (e.g., mechanical, microbiological), the above disadvantages are
accepted in order to obtain a more complete picture of the product's quality.
The history of QC/sensory programs has undergone two distinct phases. The first
and oldest QC/sensory practices date back to the development of sensory evalua-
tion methods for product evaluation in industry. These programs relied on "ex-
perts," such as perfumers, brewmasters, or winemakers. Operating such a
"QC/sensory program" was the sole responsibility of such experts, specifically by
establishing the criteria to judge quality. The actual judgement of quality was done
on a subjective basis (Dove 1947; Hinreiner 1956). In time, this responsibility was
delegated to more "experts" or to a small panel of judges. The sensory practices
used at that time mainly included evaluating the product's quality (e.g., poor to
excellent) and, to some degree, evaluating product characteristics by ranking or
rating (Platt 1931; Plank 1948; Dawson and Harris 1951; Peryam and Shapiro
1955; Amerine, Roessler, and Filipello 1959; Tompkins and Pratt 1959; Ough and
Baker 1961).
An important turning point in quality and quality control judgements occurred
when attention was given to selecting and training judges, selecting a formal
sensory methodology (including using reference materials), and using formal data
analysis methods (Dawson and Harris 1951; Pangborn and Dunkley 1964).
QC/sensory practices continued to be developed, such that, in the 1960s and 1970s,
formal sensory evaluation programs were developed and integrated into corporate
quality control functions. Regrettably, the few companies involved have implemented
methodology that has yet to be published. It was not until the end of the 1970s and
beginning of the 1980s that some of the industrial QC/sensory practices or con-
cepts were discussed within the sensory community.
Two IFT (Institute of Food Technologists) Sensory Evaluation Division Sym-
posia, "The Role of Sensory Evaluation in Product Quality Assurance" (1979) and
"The Wide Scope of Sensory Evaluation" (1980), addressed the industrial prac-
tices of sensory evaluation in a quality control/quality assurance function, for the
first time. In these two symposia, four of the papers (Nakayama and Wessman
1979; Wolfe 1979; Reece 1979; Merolli 1980) discussed very relevant, applied
industrial practices of and issues in a QC/sensory program, such as:
These papers are very general in nature and do not present specific technical
information, discussion on methodology, or procedures in the program implementa-
tion. However, they are the first documentation of the philosophical, administrative,
and general technical issues involved in an in-plant sensory program.
In 1985, an IFT symposium entitled "In-Plant Sensory Evaluation" addressed
three areas in the operation of QC/sensory evaluation programs:
These papers are also general in nature, in that specific QC/sensory procedures
or product examples are not discussed. The value of this literature is that, in
contrast to the few early publications (1979-1980), some papers of this 1985
The problems discussed in selling this program to management are critical and
are faced by all sensory professionals who plan to get a QC/sensory program
started. Rutenbeck (1985) discusses what getting management support means, as
well as some activities needed to get this support. These activities include:
grams. This book is developed to provide the quality and sensory professionals
with a step-by-step description of the procedures for implementing one of several
QC/sensory programs that meet the product and company needs.
1. Overall difference tests (Kramer and Twigg 1970; Mastrian 1985; Dziezak
1987; Chambers 1989);
2. Difference from control (Aust et al. 1985);
3. Attribute or descriptive tests (Rutenbeck 1985; Chambers 1990);
4. In/out of specifications (Nakayama and Wessman 1979; Sidel, Stone, and
Bloomquist 1983);
5. Preference and other consumer tests (Kramer and Twigg 1970; Dziezak 1987);
6. Typical measurements (Steber 1985);
7. Qualitative description of typical production (Ruehrmund 1985); and
8. Quality gradings (Kramer and Twigg 1970; Bruhn 1979; Waltking 1982; Ke,
Burns, and Woyewoda 1984; Rothwell 1986; Bodyfelt, Tobias, and Trout
1988).
Difference Tests
Forced choice difference tests are too sensitive and too general for quality control
purposes. In an overall difference test, judges focus on any difference between
products, small or large. Unless each production batch is identical to the control or
Preference Tests
Preference and other consumer tests are the least recommended sensory methods
for quality control measurements. Using preference tests is considered inappropri-
ate because of the population (participants) used and the type of data collected. The
participation of a small group of company employees is considered unrepresenta-
tive of the true consumers and is therefore of little value. A production batch
preferred by employees may not be the one preferred by the actual product
consumers.
Like difference tests, preference tests provide no information as to how or to
what degree the production sample differs from the standard. Furthermore, a
product that is equally preferred may have distinctly different sensory properties
than the control. Also, decisions based on "significant preference" for the con-
trol/standard over production would mean rejecting a large percentage of daily
production.
Typical Measurements
Determining "typical/atypical'' production (Steber 1985) is one of the most popu-
lar QC product evaluation methods used in consumer products companies. Its
popularity is due to its simplicity and the "directness" of the results. Production is
classified into "typical" and "atypical." If production is "typical," it is shipped, and
if it is "atypical," it is held.
The two main disadvantages of the method are the uninformative nature of the
results and the subjectivity of the evaluations. These measurements are consid-
ered general and nonactionable when production is found "atypical." No information
on the nature or magnitude of the problem is obtained. In addition, panelists are subjective in
that they focus on different sensory attributes, to judge what represents typical and
atypical production, and have different tolerance criteria for variation.
show this type of evaluation for the quality control of coconut and ice cream,
respectively. The criteria to judge the smell of coconut (Ruehrmund 1985) are:
"clean, fresh-no stale, musty, or other foreign odors." The descriptions for
acceptable feel are: "dry and free flowing-no damp or greasy feeling." This
type of evaluation has two major problems. First, the descriptors (criteria for
judgement) are too vague and subjective for panelists to know and agree on the
characteristics being evaluated. This applies to the criteria of "clean and fresh,"
given by Ruehrmund, and "freshness/cleanness/distinctiveness," given by Spencer
(1977). The criteria of this method have been found to be too general and
"integrated" (i.e., encompass the evaluation of many attributes) to be useful for
management's decisions (Bauman and Taubert 1984). These terms are classified
under "consumer terms" and are appropriate for consumer tests, but not recom-
mended for any analytical sensory test at the R&D or plant level. The second
disadvantage of this method is the lack of information and guidance when the
product is assessed as "atypical."
Quality Ratings
This approach, together with the "in/out" and "typical/atypical" evaluations, has been
the most popular QC/sensory test used. Furthermore, the quality gradings have not
only been used in in-plant situations, but also in academia, research, and government
evaluations. Various trade associations and commodity organizations have developed
and published "standard" procedures that fall into this category. Of special interest are
the dairy and oil standard grading systems developed by the American Dairy Science
Association and the American Oil Chemists' Society (AOCS). These grading systems
or scorecards consist of any combination of these components:
Quality grading systems developed for various commodities are those for eggs
(Institute of American Poultry Industries 1962), oils (Waltking 1982), ice cream
(Spencer 1977), dairy products (Bodyfelt et al. 1988), and squid (Ke et al. 1984).
Despite their popularity and extensive use, these quality grading systems have
serious limitations and flaws. Among them are:
• Lacking attention to all the sensory attributes that affect the product's accep-
tance;
• Lacking equidistant quality grade points or categories;
• Nonuniform assignment of score weights across quality categories;
• Using qualitative and quantitative factors in a single category grade; and
• Inappropriate use of statistical analysis on data collected using scales with the
characteristics described above.
Several authors have evaluated these limited scoring systems and presented
their criticism (Sidel, Stone, and Bloomquist 1981; O'Mahony 1979).
Improved quality scoring systems are currently being developed. Among these
are the procedures documented by the ASTM task group E18.06.03 on edible oils
and the new scoring system developed and published by the American Oil Chemists'
Society (AOCS 1989).
2
Program Design and Initiation
PRELIMINARY DETERMINATION OF
PROGRAM PARAMETERS
In any sensory study, the objective of the sensory test or program is based on the
project situation or problem. In a QC program, defining the sensory project
objectives requires one to consider several aspects, such as product variability,
marketing objectives, and manufacturing conditions. These considerations have an
impact on important decisions of the QC/sensory program, such as selecting the
product category, points within the process at which to evaluate, the plants to be
included, the key sensory attributes to be evaluated, the product sampling, and the
test method to be used.
Companies cannot hope to understand consumer acceptance, consumer com-
plaints, or shifts in market share, without monitoring the product's sensory prop-
erties that drive these consumer responses. The key sensory test objective of a
QC/sensory program is to develop an efficient way to understand the key attributes
that affect consumer liking or acceptance, to assess which raw materials and
processing variables affect the final sensory properties, and to develop the most
efficient and timely system to measure and control these sensory attributes that are
so important to consumer acceptance.
One critical marketing decision that must be made within the company when
establishing a QC/sensory program is whether the selected product is expected to
be consistent across the geographic area in which it is marketed. This may be a
regional (southwest United States), national (all of the United States), multinational
(all of Europe), or global product. A key question to be determined by marketing
and upper management is whether the product is expected to be the same, in terms
of sensory properties, in all markets or different ones, depending on the market.
Market Position
The products that yield the largest profits for a company are usually those included
first in the system. Position in the marketplace implies high consumer acceptance
of the product's sensory attributes and requires assessment and control of these
products and their sensory attributes.
Critical Problems
A sensory defect in a product that is considered unacceptable by management or
the product's consumers (such as the presence of an off-flavor or off-color and its
persistence across several production days) is the most important reason for
including a product in a QC/sensory program. This decision is especially critical,
if the problem has already begun to affect consumer acceptance, demonstrated by
consumer complaints, loss in shares, or poor performance in consumer tests. The
negative consequences of shipping product with a sensory problem to loyal
consumers, because of an inability to detect and control such problems, strongly
justify measuring and controlling that product's sensory attributes. Sensory
measurement may be the only way to determine a product's critical problems, as
well as the level of variability in production.
Production Variability
Consumers want products not only high in quality and low in defects, but also with
a consistent quality. Therefore, products with large production variability should
be considered as candidates for the QC/sensory program because of the potential
negative impact on consumer acceptance. A program that can measure and control
production variability should include such products. With many products,
instrumental methods may not be able to measure
or track product variability.
Multiple Plants
Although a lesser consideration, companies with more than one production facility
may choose to include a product that has met one or more of the above criteria and
is produced at more than one plant. This permits study of quality and consistency
across plants.
Raw Ingredients/Materials
Once an inferior or defective raw material enters a process, the defect may be
magnified several times, as the defect is incorporated into and thus becomes
detectable in the finished product. One very deliberate and effective way to control
the quality of a product is to control the quality of the materials used in producing
that product. Therefore, evaluating and controlling the incoming raw materials'
sensory properties offer a significant opportunity to monitor and eliminate varia-
tion early in the process, thus saving time, money, and product. When possible,
every effort should be made to develop a QC/sensory effort to evaluate all or most
raw ingredients by 1) setting ingredient sensory specifications, 2) conducting
regular evaluations upon receipt at the plant, and 3) making decisions as to the
disposition or blending of inferior raw ingredients.
Selecting the raw ingredients to be monitored is determined by the finished
products to be controlled and by the critical sensory attributes of those finished
products known to impact on consumer acceptance. It is possible to incorporate the
assessment of many raw ingredients into a program, since the analysis of raw
ingredients generally consists of the evaluation of only one or two attributes as
opposed to five to ten attributes, in most finished products. One raw ingredient may
be incorporated into several finished products. Therefore, the control of only a few
raw ingredients may influence the control of several finished products. In contrast,
selecting in-process materials and finished products for a QC/sensory program is
more complex.
Understanding the relationship between raw ingredient sensory attributes and
the sensory properties of the resulting finished products allows for the eventual
shift from the more complex multi-attribute evaluation of several finished products
to the simpler, more efficient evaluation of a few key attributes of a few key raw
ingredients. This system insures a quality product through early control, rather than
through large scale disposition or rework of finished product.
In Process
For some processes, there are critical points at which sensory tests can be done to
measure and control the introduction of sensory attributes into the product. Just as
analytical chemical and physical tests are conducted at stages throughout a pro-
cess, QC/sensory monitoring is recommended at critical stages in production,
where interim product quality reflects the quality of finished product. When a
batched lot of one phase of a product, such as a dough premix for a cookie or a
fragrance blend with carrier for a household product, is introduced into the
process, this in-process phase or batch can be tested for key sensory attributes.
Often, the reaction of raw materials to partial processing reveals characteristics
that are not desirable in the finished product. As with raw materials, if defective or
variable in-process products can be identified and held before final processing,
conversion, or inclusion with other in-process materials, the result is the saving of
time, production costs, and product.
Finished Product
Since almost all consumer products are purchased for their sensory properties, as
well as other performance characteristics, it behooves managements of consumer
products companies to monitor the sensory properties of their products before they
are shipped to the marketplace. In spite of an effort to control the sensory quality
of a product by monitoring and controlling raw materials, it is, at least initially,
necessary to evaluate the sensory characteristics of finished product. The interac-
tion of raw materials and/or the effects of processing on the already cleared raw
materials requires monitoring. The appearance, fragrance, or flavor and texture
properties can be monitored for some time period, to learn the variability per se
and its relationship to any variability in the raw materials. Once management and
the QC/sensory team understand how raw materials affect finished product, it is
possible to reduce the frequency of evaluation of the finished product.
Quality control of consumer products often includes the instrumental analysis
of chemical and/or physical properties at different processing stages, but often does
not include monitoring appearance, aroma, flavor, or texture of these same prod-
ucts. In many cases, quality control sensory measurements early in the process can
also reduce the need for frequent sampling of finished product. This is especially
true when problems detected early in the process have been related to problems
in the finished product.
in one package, they may respond negatively only to the extremes, and may rate a
very wide range of color as highly acceptable. Consumers may also find a wide
range of fragrance levels acceptable in personal products. Conversely, however, a
low-level indication of staleness or rancidity in the cookies or potato chips may
yield dramatic loss of consumer acceptance. Likewise, a slight grayness in the
color of a shampoo may reduce overall acceptance, as well as acceptance for
appearance or cleaning ability.
The criteria for including sensory attributes in the QC/sensory program are 1)
those that induce negative reactions in consumers and loss of consumer acceptance
and 2) those that management feels vary over too broad a range to represent a clear
concept of the product.
Chapters 3 to 6 cover the different methods that can be used in a QC/sensory
program and include a discussion of attribute selection for a variety of consumer
products, as well as the criteria used in attribute selection.
Sampling
During the phase of identification of the parameters to be included in the QC/sen-
sory program, it is necessary to determine the amount of sampling that will have
to be done to understand the variability across products, attributes, and plants.
Preliminary information may be available that indicates certain plants vary more
than others because of the processing equipment or ingredient sources. Certain key
attributes may be a problem only on certain shifts or specific times of a start-up on
a production line.
Early assessment of the scope and frequency of product variability provides a
clue to the possible sampling schedule needed during the preliminary implementa-
tion phase. It is from careful evaluation of possible sources of variability that an
effective sampling plan can be developed, to cover variability across plants,
products, and attributes. Journals such as Technometrics and Journal of Quality
Technology present technical approaches for determining the best sampling plans
for a wide variety of product variability interactions.
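As a purely illustrative sketch of how such a preliminary sampling schedule might be enumerated, the Python snippet below crosses a few assumed sources of variability (plants, shifts, production days); the plant names, shifts, replicate counts, and the choice of factors are hypothetical, and a real plan would be built from the variability sources and the statistical references cited above.

```python
# Minimal sketch: enumerating a preliminary sampling schedule across assumed
# sources of variability. Plants, shifts, days, and replicate counts are hypothetical.

from itertools import product

plants = ["Plant A", "Plant B"]      # plants included in the program (assumed)
shifts = ["day", "night"]            # shifts suspected to differ (assumed)
days = ["Mon", "Wed", "Fri"]         # spread across the production week (assumed)
samples_per_cell = 2                 # replicate pulls per plant/shift/day (assumed)

schedule = [
    {"plant": p, "shift": s, "day": d, "replicate": r + 1}
    for p, s, d in product(plants, shifts, days)
    for r in range(samples_per_cell)
]

print(f"{len(schedule)} samples to pull; first three pulls:")
for entry in schedule[:3]:
    print(entry)
```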
IDENTIFYING RESOURCES
After the characteristics of the QC/sensory program are determined, the next step,
before the design and implementation phase, is assessing all possible resources that
are available to support the program. The areas to be studied include the method-
ology, the personnel, the physical facilities, management support, and the avail-
ability of outside resources.
Methodology
Recommended Methods
Within sensory evaluation, there are several sensory methods to meet project
objectives in basic research, product development and improvement, product
maintenance, and marketing/marketing research. Given the nature of manufactur-
ing and quality control requirements, the sensory evaluation methods that are best
suited to meet those QC needs are those that measure variability. These methods
include those that utilize ratings and those that assess overall conformance to a
product concept.
Rating Methods
Methods that involve ratings of 1) attribute intensities, 2) quality, or 3) difference
afford the sensory analyst the capability of determining a degree of variability from
some target control or specification limit. The three methods prescribed in Chap-
ters 3, 4, and 6 fall into this category. Each is described below with general
application, advantages, and disadvantages.
The Comprehensive Descriptive Method. This method involves the rating of
intensity of individual key attributes, for which specifications are set. This method
identifies the samples that are out of spec and to what degree for each attribute.
Application: For a company's major flagship brand, this method carefully assesses
product variability, in terms of both the key attributes and the direction in which they may
vary. Such in-depth information provides direction in making changes in in-
gredients or process to achieve greater product conformance.
Advantages: This method provides the most detailed information about "what"
is varying in a product, the size of that variability, and the direction (too high
or too low), relative to the product specifications. This method most closely
resembles the evaluation of other physical or chemical characteristics that are
integral parts of a broad scope QC program.
Disadvantages: This approach is costly and time consuming, mainly for two rea-
sons: the sensory tests required (e.g., descriptive and consumer research tests)
and the time and resources needed to establish and maintain the program
(e.g., extensive training samples and references needed).
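To make the mechanics of the comprehensive descriptive approach concrete, the sketch below compares panel mean intensity ratings against specification ranges for each key attribute and reports the direction of any deviation; the attributes, the intensity scale, the limits, and the ratings are invented for illustration and are not taken from the text.

```python
# Minimal sketch: checking panel mean attribute intensities against specification
# ranges. Attribute names, the intensity scale, limits, and ratings are hypothetical.

from statistics import mean

specs = {                      # (lower, upper) specification limits per attribute
    "sweetness": (7.0, 9.0),
    "crispness": (10.0, 12.5),
    "toasted":   (4.5, 6.5),
}

ratings = {                    # panelist intensity ratings for one production sample
    "sweetness": [8.2, 8.5, 7.9, 8.1, 8.4],
    "crispness": [9.4, 9.8, 9.1, 9.6, 9.3],   # this attribute runs low
    "toasted":   [5.2, 5.5, 5.0, 5.4, 5.1],
}

for attribute, (low, high) in specs.items():
    panel_mean = mean(ratings[attribute])
    if panel_mean < low:
        status = f"OUT of spec, low by {low - panel_mean:.1f}"
    elif panel_mean > high:
        status = f"OUT of spec, high by {panel_mean - high:.1f}"
    else:
        status = "within spec"
    print(f"{attribute:10s} mean {panel_mean:4.1f}  spec {low}-{high}  -> {status}")
```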
The Degree of Difference/Difference from Control Method. This method in-
volves rating each sample in terms of the size of the difference from a designated
control. A cut off point is designated to assess which products or ingredients are
too different from the control to be within specification limits.
Application: When product or ingredient variability follows a single continuum,
from the control to less desirable or "off" production samples, this method
provides a simple, one-step method for assessing out of spec samples.
30 Program Design and Initiation
Advantages: This method requires few resources and little time to implement.
Only one set of training references is needed, to establish a range of products
relative to the control (from not different to very different from the control). Panel-
ists need to learn to use only one continuum by which to judge sample confor-
mance.
Disadvantages: A difference from control rating only defines the size of the dif-
ference from the target control but fails to provide information on the reasons
for the difference. To establish the cutoff for "too different," some consumer
research is required.
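A decision rule of this kind reduces to comparing the panel's mean difference-from-control rating against the cutoff; the sketch below assumes a 0 to 10 difference scale, a cutoff of 3.5, and made-up ratings, none of which come from the text.

```python
# Minimal sketch of a difference-from-control decision rule.
# The 0-10 scale, the cutoff, and the ratings are hypothetical.

from statistics import mean

CUTOFF = 3.5  # "too different" threshold, assumed to come from consumer research


def dfc_decision(panel_ratings):
    """Return the mean difference-from-control rating and a disposition."""
    size_of_difference = mean(panel_ratings)
    disposition = "hold (out of spec)" if size_of_difference > CUTOFF else "ship (within spec)"
    return size_of_difference, disposition


print(dfc_decision([2.0, 3.0, 2.5, 1.5, 2.0]))   # close to the control
print(dfc_decision([4.5, 5.0, 4.0, 5.5, 4.5]))   # too different from the control
```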
The Quality Rating Method. This method involves rating each sample and/or
some set of key attributes for quality. A specification is established for each
attribute scale, to determine proper disposition of the tested sample or product.
Nominal Methods
Methods that allow trained panelists to decide if a given sample is within or outside
of a product concept are also applicable in a QC/sensory program. In a QC
application, the concept represents the range of attribute intensities considered
"acceptable" or "in-spec."
The "in/out" method requires the initial careful definition of the characteristics
of "in" or "acceptable" product. The subjects involved in product evaluation are
shown several examples of the sample (a product or raw ingredient), some that are
within the product limits or acceptable concept ("in") and some that are outside the
product limits or concept ("out"). During subsequent testing, these trained panel-
ists are expected to determine if the test products or raw materials fall in or out of
the designated "acceptable" product concept.
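Before the application notes, a minimal sketch of how the panel's "in/out" judgments might be tallied is given below; the two-thirds agreement rule and the votes are assumptions made for illustration, not a criterion stated in the text.

```python
# Minimal sketch: tallying trained panelists' "in/out" judgments for one sample.
# The two-thirds agreement criterion used here is hypothetical.


def in_out_decision(votes, required_agreement=2 / 3):
    """votes is a list of 'in'/'out' judgments from the trained panel."""
    out_fraction = votes.count("out") / len(votes)
    return "out of spec" if out_fraction >= required_agreement else "in spec"


panel_votes = ["in", "in", "out", "in", "out", "in"]
print(in_out_decision(panel_votes))   # only 2 of 6 judged it "out" -> "in spec"
```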
Application: The choice of the "in/out" method is appropriate for those situa-
tions in which 1) a simple yes/no answer provides sufficient information for
Methods to Avoid
Some tests, such as most Overall Difference tests, are not effective for QC sensory
work. Triangle, Duo-Trio, A/NOT A, and Simple Difference tests are very sensitive
to fairly small differences. Therefore, by yielding only "different" or "not differ-
ent" answers, they tend to lead to rejection of a large proportion of regular
production through the assessment of only statistically significant differences. The
amount of difference is often the key question in a QC environment, and difference
testing alone will not determine the size of the difference.
Affective responses, including preference and acceptance, are not appropriate
for routine production evaluation. The determination of the preferences or liking
of a small group of plant personnel does not represent the population that routinely
uses the product. However, it should be noted that some affective acceptance tests
using a large consumer base from the population of users should be utilized in
every QC program. The acceptance information, used in conjunction with other
sensory tests at the early stages of a QC program, help determine how variation in
certain attributes drive the acceptance of a product. This affective information is
used to set the sensory specifications.
Personnel
The success of a QC/sensory program relies heavily on two categories of person-
nel: the sensory professional(s), who are responsible for program design and
execution, and the sensory panelists, who are responsible for the routine evaluation
of products and/or raw materials.
Sensory Staff
Designing, administering, and implementing the QC/sensory program require the
expertise of different sensory professionals, depending on the size and scope of the
program. The personnel requirements for a given program might include a sensory
coordinator, a sensory professional at each plant, and one or more sensory techni-
cians working with the plant sensory professional and R&D sensory coordinator.
The R&D sensory coordinator may be a member of the current QC staff who
shows some interest in the sensory activity. In such a case, sensory training
regarding methodology, panelist management and training, and test controls is
critical. More often, the sensory coordinator comes from R&D and is given the
responsibility for the QC/sensory program across several plants. This individual
may have the sensory qualifications, but requires some grounding in plant and QC
practices and in product ingredients and processing.
In most companies, the R&D sensory coordinator has a reporting relationship
to both the R&D sensory manager and the QA/QC team or director.
The R&D sensory coordinator has the following responsibilities:
• Manage samples, references, and controls (for product and raw materials) in
terms of pickup, logging, and storage, in accordance with the program's pre-
scribed sampling plans.
• Supervise the preparation and presentation of the samples for each evaluation
session, using strict test protocols to minimize bias due to poor product handling.
• Coordinate panelists and the schedule of evaluation sessions. This includes
working with panel supervisors to assure full and timely attendance.
• Maintain the technical skills and motivation of panelists.
• Manage the maintenance of the sensory facility and supplies.
• Manage sessions, to insure compliance to the test evaluation protocols for each
panelist.
• Collect and compile data for each sample and report.
• Maintain interaction with R&D coordinator for technical issues and growth of
the program.
One or more QC/sensory technicians may be required to assist the R&D sensory
coordinator and each plant sensory analyst in the following responsibilities:
• Implement the routine sample handling.
• Conduct the routine sensory tests.
• Contact and schedule panelists and tests.
• Enter or record data from each test.
Panelists
As with any QC measurement, the "instrument" is a critical consideration. In
sensory evaluation, the panelists are the measurement tool. Issues to consider when
selecting, training, and maintaining panelists are as follows.
Selection
Panelists may be recruited from within the plant or R&D site where evaluations
are to take place or from the neighboring community. Plant employee panelists
may include line workers (when union and personnel limitations have been
considered) or office staff, if use of line workers is limited or untenable. A major
advantage of using plant personnel for QC/sensory evaluations is that a quality
focus is communicated throughout the plant by these plant employee sensory
panelists. However, using panelists from the neighboring community may be
necessary, if personnel from within the plant are unavailable for the necessary
regularly scheduled sample evaluations.
Sensory testing is labor intensive and using plant personnel requires adding person-
nel or extending work schedules to compensate for panelists' time for evaluations.
Accommodations in staffing may have to be made for employee panelists who are
absent from regular shift or office work for more than a 5 to 10 minute period.
Training
When panels are implemented across different plants, it is critical that the same
selection criteria and training program be used at each plant, to insure that
equivalent "sensory instruments" exist at each site.
The selected sensory program (see Chapters 3-6) defines the sensory method
for which the panelists are trained. The simpler in/out or difference from control
methods may require as little as 10 hours of training. The quality rating and
comprehensive descriptive programs, which may involve rating several attributes,
may require over 40 hours of training and practice before the panel is fully
functional.
Maintenance
Panel motivation is secured by developing a reward system for panelists early. The
size and frequency of the rewards are based on the skills and time provided by the
panelists, the rewards permitted under union or nonunion personnel rules, and the
commitment of the company and sensory analyst to recognizing a job well done.
Monitoring panelist performance is critical in all QC/sensory programs and will
depend on criteria derived from the selected methodological approach. Each plant
analyst and the R&D sensory coordinator must also develop a maintenance
program that involves regular use of references and controls, regular feedback on
performance, and remedial training programs for individual panelists or whole
panels.
Management Support
From the very inception of a QC/sensory program, all sectors of the company's
management need to understand, make contributions, and ultimately commit to the
proposed program. One major responsibility of the quality team, charged with
developing a QC/sensory program, is to determine both the commitment and
contribution to be made by each sector of management.
A preliminary proposal, which outlines the program and its costs and benefits,
should be presented to company management before any action is taken. Early
decisions about the scope of the program, encompassing things such as the
products or product lines, should be included in the proposal. Input from the
various directors or vice presidents is used to modify the program.
Once management has had input into the program design, a more detailed proposal,
which addresses the processing stages at which evaluation will be conducted, the
sensory attributes to be measured, and a tentative sampling schedule, can be presented
to management. In addition, preliminary recommendations for the methods,
facilities, personnel, and required outside resources are included. At this point,
the various directors and vice presidents commit to providing information and re-
sources, as described in the following paragraphs for each group.
Throughout the preliminary implementation stages, each director/VP commu-
nicates support within his or her group so that the necessary consumer, sampling,
processing, or physical data are available to the quality team. Once the consumer
data and preliminary product information are collected and compiled, the recom-
mendations, product attribute specifications, and program costs are presented to
management to get final approval to move the QC/sensory program forward to full
implementation.
Lack of full involvement of upper management and the resulting support from
each group creates potential problems both in implementing and operating a
QC/sensory program. Specifically:
• Without the input from the management of all groups, the key information on
product, process, or consumer may be missing from the preliminary stage.
• Without commitment and support from the management of all groups, the
necessary resources-time, personnel, facilities, product information, and so
on-may not be complete. The success of the program, no matter how well
planned and executed, may be at risk.
Facilities
Because the test controls for most sensory studies are rigid in terms of the
environment, as well as sample and data handling, it is necessary to provide
appropriate facilities for a QC/sensory program.
Environment
In order to permit panelists to concentrate on their evaluations, the facility in which
the tests are conducted should be:
in the plant. The "best" available location may not meet requirements for comfort
and/or proximity to panelists.
Once the current resources and the cost and potential for upgrading to a suitable
facility are assessed, the quality team needs to decide if implementation at the
plant(s) is tenable or if testing needs to be done at another site, possibly at R&D or
nearby the plant facility.
Sample Handling
To comply with test protocols for sample handling, the test facility should be
equipped with the kitchen appliances, laboratory equipment, and/or office equip-
ment necessary to prepare and store samples for evaluation. The area designated
for sample preparation must be separate from the evaluation site so that panelists
cannot view the preparation process. For each test method, a strict protocol for
sample handling must be written and adhered to by the QC/sensory analyst and
technicians at each plant.
Outside Resources
Throughout the design and implementation of a QC/sensory program, some tech-
nical capabilities may not be available within the company, the division, or the
plant at which the program is initiated. Therefore, outside expertise may be sought
to insure proper development and implementation of the program.
The following technical aspects of the program, which may need outside
support, are outlined with possible resources.
training or one plant training by the outside sensory support. Thereafter, if the
technical expertise has been transferred, further development of the program is
administered directly by the plant QC personnel.
3. The test facility design should be incorporated as part of the program and
methodology design. However, additional input by corporate or consulting
architects and engineers is recommended, to insure the best use of space at the
lowest cost to the company.
4. A system to input and analyze data can be developed and outlined as part of the
total program and methodology design (No. 1 in this list). Actual implementa-
tion requires support from statistical and systems professionals, a management
systems function in the company, or outside systems and/or statistical consul-
tants. Since the QC/sensory program is unlikely to develop full systems exper-
tise, ongoing support is likely to be required and should be an anticipated cost.
5. Correlating sensory and instrumental data requires professionals with an un-
derstanding of sensory data, instrumental data, and statistics. R&D statisticians
are often capable of providing this support. If the company has no inside
professionals, it is critical that the QC/sensory team seek an outside consultant
who has the experience described above and who can be available for addi-
tional support, as the program develops.
In each aspect of the program that requires technical support, the sensory
coordinator and plant sensory analysts must insist on support and training so that
the QC/sensory program can become as self-sufficient and efficient as possible.
Collaborating Groups
As discussed in several instances above, such as the section on outside resources, several
groups within a company are needed as partners in the QC/sensory program. From
each broad sector within the company, different support and collaboration are
required to best utilize the company's resources.
mation to the QC/sensory program regarding product behavior and other techni-
cal aspects of product tolerance and product sensory attributes.
• Analytical chemists and rheologists are primary resources for recommended
analytical methods that may correlate with sensory data and may be im-
plemented ultimately at the plant level.
• Systems and statistical support are critical for developing and implementing the
system to handle and analyze the data generated routinely in the QC/sensory tests.
Manufacturing
Manufacturing provides support with personnel and space.
• Plant and manufacturing management understand and commit to the
QC/sensory program by providing:
• Personnel for plant sensory coordinator and panelists (a major cost);
• QC time and personnel for picking up and storing samples for evaluation
and references; and
• Facility space and engineering to develop the appropriate testing site.
• Product line supervisors contribute to such a program as part of the overall
corporate commitment to quality. Since plant middle management is responsible
for many of the panelists used in testing and for the disposition of products that
have been tested, they are on the "front lines" of the QC/sensory program. Since
they routinely bear the largest costs of quality, they need to be informed
early and throughout the program as to its benefits.
• All plant personnel are involved in the QC/sensory program, since they produce
the products that are evaluated. When the objectives and results of the QC/sen-
sory program are communicated to plant personnel, the quality message is
carried to the culture of the plant.
QA/QC
QA/QC integrates the QC/sensory program into the overall QA/QC program.
• Whether the QA/QC and QC/sensory programs are part of manufacturing or part
of an independent quality team, the QA/QC personnel need to incorporate the
sensory program into the routine daily evaluations, the production trends, and
the overall final measure of product quality.
• QC at each plant is likely to have final responsibility for interpreting the sensory
results and disposing of the product as part of the regular QC function. Each
plant QC manager and the product line QC supervisors must understand how the
QC/sensory program works so that they use the results most effectively.
Importance
An in-plant sensory capability is implemented in response to management's
commitment to establishing quality and consistency in production, the lack of
consistency observed in a product's sensory characteristics, and/or the possible
sensory-related consumer complaints associated with that variability. It is the
objective of a QC/sensory program to measure and control the variability that is
not or cannot be measured by instrumental methods and, in turn, to reduce the
consumer complaints caused by the lack of consistent production. Consequently,
the basis of the QC/sensory program is to identify and document that sensory
variability.
Usually, some information on production variability is available during the
planning stage of a QC/sensory program. The information that companies have on
hand includes variability measurements that are documented in terms of instru-
mental parameters. This information is useful if the correlation between instrumen-
tal and sensory measurements has been investigated and found to exist. Otherwise,
a formal evaluation and documentation of the variability in sensory terms is
needed. The results of this sensory assessment will indicate 1) the sensory attri-
butes of a product that are constant throughout production days and schedules, 2)
the sensory attributes that vary over wide or narrow ranges, 3) the frequency of
occurrence of that variability, and 4) preliminary information on existing fluctuations.
Sampling of Products
The quality control professionals assist in collecting the samples required for the
evaluation. They supervise the product's sampling plan to assure that all ranges of
processing conditions are represented during the collection of products to be
surveyed. These might include different raw materials suppliers and/or batches,
processing equipment, processing parameters, and environmental conditions. The
available instrumental data base on production variability and current sampling
plans might be sources of information used to determine the sampling plan
required for the collection of production samples.
To study the variability of one or more products or brands across one or more
plants, QC professionals and the sensory coordinator need to decide on the scope
of the sampling. From earlier data, the QC department should have a sense of the
period of time (one day, one week, one month) across which the product tends to
manifest a normal range of variability. Once this time period is identified, samples
for the study are pulled from the broadest array of the sources of variability noted
above (e.g., raw material suppliers and batches, processing equipment, processing
parameters, and environmental conditions).
Decisions regarding the number and scope of these pickups determine the
potential for identifying the range and frequency of variability in a product's
production.
There are no set criteria as to the exact number of samples and the sampling plan
recommended for the collection of products, since the production variability varies
from product to product and manufacturer to manufacturer. Without considering
spot problems or unusual processing practices, it has been the authors' experience
that, on the average, the collection of products from three weeks production usually
provides an adequate and representative picture of the usual production variability.
Such sampling criteria are valid for products such as foods (baked goods, salad
dressings, ice cream, or confections) or personal care products (mouthwash,
deodorant, or shampoo) that are produced in a few hours or less. For products
requiring days or weeks to produce (wine, oven-dried fruits, pickled vegetables),
the sampling covers a wider time period than a few weeks. It is advisable that the
survey be conducted over two consecutive weeks and that the third week be
completed one month after the initial survey. The survey might be reduced
slightly when it includes sampling production from more than one shift.
Following this or other similarly effective sampling practices, approximately 150
to 250 products should be available for sensory screening.
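As an illustration of how such a sampling plan might be laid out, the following Python sketch enumerates pickups across hypothetical weeks, days, shifts, and lines; the specific factors and counts are assumptions for the example, not a prescription.

```python
from itertools import product

# Hypothetical sampling frame: three survey weeks (two consecutive, one about
# a month later), with pickups spread across days, shifts, and production lines.
weeks = ["week 1", "week 2", "week 6"]           # third week ~1 month later
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
shifts = ["1st", "2nd"]
lines = ["line A", "line B"]
pickups_per_cell = 2                             # samples per day/shift/line

plan = [
    {"week": w, "day": d, "shift": s, "line": l}
    for w, d, s, l in product(weeks, days, shifts, lines)
    for _ in range(pickups_per_cell)
]

print(len(plan))   # 3 * 5 * 2 * 2 * 2 = 240 samples, within the 150-250 target
```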
APPEARANCE: Color intensity, Color chroma, Surface gloss, Surface graininess, Oil separation
FLAVOR: Roasted peanutty, Raw beany, Dark roast, Sweet aromatics, Woody/hulls/skins, Cardboard, Painty, Fruity fermented, Sweet, Salty, Bitter, Sour, Astringent
TEXTURE: Surface (Graininess, Oiliness, Stickiness); First compression (Firmness, Cohesiveness, Denseness, Graininess, Stickiness); Breakdown (Moistness of mass, Graininess of mass, Uniformity of mass); Residual (Chalky film, Grainy particles, Oily film)
x = product mean; the range of production falls between the vertical slashes.
FIGURE 2.1. Peanut butter category survey. Products are rated on a 15-cm line scale, based on the Spectrum Descriptive Analysis Method. x = product mean.
descriptive Spectrum™ (Fig. 2.1) to proceed with the screening of the remaining
150 to 250 samples over a series of sessions and days. Trained panelists evaluate
these products (e.g., taste, feel, smell) by focusing on the attributes that differ in
intensity from the benchmark evaluation (Fig. 2.1).
Documentation of Variability
Figure 2.2 shows the final results of this screening process: 1) the list of the
variable sensory characteristics (representing a subset of all attributes that charac-
terize that product) and 2) the corresponding variability (intensity) ranges. The
evaluations differentiate those characteristics presenting wide variability ranges
(e.g., saltiness) (Fig. 2.2) from those with a narrow variability band (e.g., sticki-
ness). In addition, intensity histograms for each of the variable attributes are
developed (Fig. 2.3). These histograms summarize the percentage of samples for
each intensity in the variability range. This information is used in selecting a
control, as explained below.
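The intensity histograms of Figure 2.3 can be tabulated directly from the panel means of the surveyed samples. The following Python sketch, using invented saltiness means on the 0-15 scale, shows one way to compute and plot the percentage of samples at each intensity.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical panel means for one variable attribute (e.g., saltiness)
# across the surveyed production samples, on the 0-15 intensity scale.
saltiness = np.array([8.0, 9.5, 10.0, 8.5, 12.0, 7.5, 9.0, 10.5, 9.0, 8.0,
                      11.0, 9.5, 10.0, 8.5, 9.0, 13.0, 7.0, 9.5, 10.0, 9.0])

counts, edges = np.histogram(saltiness, bins=np.arange(0, 16))
percent = 100.0 * counts / counts.sum()          # % of samples per intensity bin

plt.bar(edges[:-1], percent, width=1.0, align="edge")
plt.xlabel("Intensity")
plt.ylabel("Percent of samples")
plt.title("Saltiness: distribution across production samples")
plt.show()
```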
Depending on the product type being evaluated, this process might take several
days to complete. The need for valid and reliable data on the identification of
variable attributes dictates that those evaluations be conducted under the strictest
test controls possible, including a close attention to sample presentation condi-
tions, number of samples evaluated per session, rinsing materials, and so on
(Meilgaard, Civille, and Carr 1987).
FIGURE 2.2. Variable sensory characteristics of peanut butter identified in the screening:
1. MOST VARIABLE: Saltiness, Sweetness, Color intensity, Astringency, Peanut flavor
2. LESS VARIABLE: Color chroma, Cardboard, Painty, Firmness
3. HARDLY VARIABLE: Stickiness, Surface gloss, Woody/hulls/skins, Chalky
The next step is the consumer research study, whose objective is to determine
the effect of these variable attributes on consumer acceptance. The consumer test
is designed to collect acceptance data, as well as diagnostic information on the
variable attributes (e.g., liking and consumer-perceived intensity of peanut flavor).
This consumer information is used to set the sensory specifications used for
decisions for the daily disposition of regular production.
The procedures to design and conduct the required consumer test and to analyze
the data and establish sensory specifications are covered in detail in Chapters 3 to 6,
for the four sensory methods recommended.
FIGURE 2.3. Intensity histograms showing the percentage of samples at each intensity for two of the variable attributes.
mines the most effective sensory method to use as part of the ongoing day-to-day
evaluation of samples.
The decision regarding which method is best requires some judgement based
on the following criteria:
• If the variable attributes can be or have been determined and if these are limited
to five to ten key attributes, the comprehensive method is feasible. Yet it requires
careful training, using good references and dedicated panelists.
• In cases where a specific product control can be identified and where product
variability is limited to a single continuum, such as amount of processing, but
where this variability is not easy to characterize with specific attributes, the
difference-from-control method is a likely method of choice. Complex products,
such as coffee, beer, yogurt, and so on, which involve raw material compounded
by processing variables may fall into this category.
• When the product variability is not easily defined by specific sensory attributes,
but can be more readily reflected in the broad sensory parameters (appearance,
flavor or fragrance, texture), the quality rating method is a likely method of
choice.
• In those cases, when variation cannot be specifically defined by attribute, when
no single control is identified, and when examples of unacceptable product cover
a multitude of sensory conditions, the in/out method is recommended.
There are, however, cases in which the QC/sensory professional has more than
one option or in which the choice of test method may not be clear cut.
Identification of a Control
For the implementation of a QC/sensory program, where judgements are made
relative to a "control'' or a "standard," identification and documentation of such
product is required during the preliminary phases of the program. This step is
usually completed in the evaluation process for identifying production variability,
described earlier.
A "control" used for quality purposes is referred to as a product that is used as
a representation of certain characteristics (not necessarily the "optimal") and a
product that can easily be obtained, maintained, or reproduced. The criteria for
choosing a "control" can be arbitrary or deliberate. For example, a "control'' might
be selected as 1) a product manufactured in plant X, 2) the product with the
characteristics produced "most frequently" (the product in Fig. 2.3 a), 3) the
product preferred by consumers, 4) the product with a given intensity of an
attribute, or even, 5) the pilot plant product.
A "standard" is a product considered to be the "optimum," "most preferred by
consumers," or the highest quality product a company can manufacture. In some
companies, this is called the "gold standard." The gold standard has value in a
company as an ultimate target for R&D and manufacturing to work toward
producing. Since it does not represent current normal production and since it may
not be possible to produce it using current materials and processes, the gold
standard should not be used as the control in a QC/sensory program.
It is clear with this distinction that the selection of a "standard" is more involved
and complex than the selection of a "control." A considerable amount of research
and the input of management are required to select and document a "standard"
product. As a result, few consumer products companies have identified a "stan-
dard" for judging the overall sensory quality of daily production. In many cases,
companies find that using such standards is frustrating to manufacturing and R&D,
since they can be reproduced only under the very best, and thus limited, circum-
stances. This discussion shall only focus on selecting and using a "control" for
quality control purposes.
Two philosophies exist among companies regarding who determines what is "in
spec" or "good." Most companies realize that it is the consumer who dictates which
products or product characteristics should be considered "good quality." Others
rely on upper management's input to set the criteria. With some products, consum-
ers have rather loose criteria for "goodness" and find the bulk of a company's
production acceptable. Management may choose to set stricter criteria as a com-
mitment to quality, in spite of the consumer's broad acceptance (i.e., quality =
consistency).
Independent of the criteria selected to establish a "control," a survey, as
described in the "Identification of Production Variablity" section, on page 40, is
needed to provide supportive information when selecting a "control." Specifically,
the information on product variability (Fig. 2.2 and 2.3) is needed to select the
product that is considered a "control." Management needs to know the position of
various potential "controls" in the production variability ranges and their fre-
quency of occurrence, to select a given product as a "control." For example, it is
unlikely that a company will select a product with an intensity of 6 in attribute 2
(Fig. 2.3) as a "control," even though it would be found to be the product most liked
by consumers. The frequency distribution graph shows that a product with this
intensity is produced very infrequently and would not represent an adequate
control. If this product is chosen as a "control," a large percentage of daily
production would be rejected, since product close to the "control" is produced very
infrequently. A company would most likely select the next highest quality product
as the "control" (e.g., a product with intensity of 8).
Summary
The decision to implement a QC/sensory program requires a considerable amount
of ~up front" work before the program is established. The question of where to start
requires study of
cation of the resources required to meet those needs. Time, money, and corporate
support are required to provide the test methods, personnel, test facilities, and
support services required to implement the program. As with all sensory testing,
the objective is to provide the most sensory information in a timely and cost-effec-
tive way. Balancing needs against resources early in the program is essential.
The first steps in the program are to determine the product variability in terms
of type, frequency, and size and to select the appropriate test method(s), to
effectively and efficiently track that product variability. Cooperation between QC
and sensory personnel is critical during this phase, since each group brings
different perspectives to the identification of program focus and program re-
sources.
3
Comprehensive Descriptive
Method
ABSTRACT
This approach is the most comprehensive in-plant sensory program. It represents
a program in which a well-trained sensory panel operates as any other analytical
instrument in the QC laboratory. This panel provides data on the intensity/level of
a small set of the product's sensory attributes (approximately 5 to 15). These
attributes are known to vary during production and have been found to affect
consumer acceptance. A specification is set for each of the variable sensory
characteristics.
Specifications represent the tolerable range of intensities for plant produc-
tion. Products whose intensity on any given attribute fall outside those set
specifications are considered unacceptable. The specifications are set prefera-
bly with the input from consumers and management and the consideration of
realistic production and cost limitations. Due to its comprehensive and com-
plex nature, this program is mainly geared to the evaluation and quality control
of finished products.
The two main advantages of this approach are 1) the absence of any subjectivity
in the evaluation (panelists act as sensory instruments, without incorporating
non-product information that might bias their judgements), and 2) the quality of
the data obtained. These data lend themselves to a variety of data analyses and
manipulations, such as those that show the relationship of the panel data (product's
sensory data) with either instrumental and/or consumer data. This program's main
disadvantage is the high cost and the time involved in its implementation and
operation.
PROGRAM IN OPERATION
The comprehensive descriptive approach consists of having at each of the manu-
facturing facilities a well-trained sensory panel that evaluates daily production
samples. All panels are trained similarly and provide comparable information for
any given sample. Each panel provides information on the intensity/level of the
critical sensory variable attributes of the product.
Table 3-1 shows the results of the evaluation of a production batch of potato
chips labelled 825, using this method. The results shown are the average value of
the panel scores for each attribute (e.g., 7.5 for hardness) and are interpreted as any
other analytical/instrumental information would be. They indicate the level at
which each of the product's attributes is perceived.
These data, provided to QC management, are used to make decisions regarding
the disposition of the evaluated production batch. These decisions are based on the
comparison made between the panel results and the specifications set for each of
the product attributes. The column on the right-hand side of Table 3-2 shows the
sensory specifications. The sensory specifications for each variable attribute were
added to this table for the purpose of this discussion. However, it should be clear
that those specifications are never included in the panelists' evaluation forms
(ballots) and should not be information provided to panelists. This information is
known only by management, to compare the panel results to the product's specifi-
cations. A product is considered unacceptable if it falls outside the specifications
Table 3-1. Average panel results for "batch 825" of variable potato chip attributes
APPEARANCE
Color intensity 4.7
Even color 4.8
Even size 4.1
FLAVOR
Fried potato 3.6
Cardboard 5.0
Painty 0.0
Salty 12.3
TEXTURE
Hardness 7.5
Crisp/crunch 13.1
Denseness 7.4
Table 3-2. Average panel results for "batch 825" and the sensory specifications of the variable potato chip attributes
Attribute            Panel mean    Specification
APPEARANCE
Color intensity      4.7           3.5-6.0
Even color           4.8           6.0-12.0
Even size            4.1           4.0-8.5
FLAVOR
Fried potato         3.6           3.0-5.0
Cardboard            5.0           0.0-1.5
Painty               0.0           0.0-1.0
Salty                12.3          8.0-12.5
TEXTURE
Hardness             7.5           6.0-9.5
Crisp/crunch         13.1          10.0-15.0
Denseness            7.4           7.0-10.0
("out-of-spec"). For example, Table 3-2 shows that the production sample 825
would be considered unacceptable or out-of-specifications. The intensities of
evenness of color (4.8) and cardboard (5.0) fall outside the tolerable intensity
ranges set (specifications: 6.0 to 12.0 for evenness of color and 0.0 to 1.5 for
cardboard). The analysis of these data helps management in their decision-making
process.
If an SPC program is being used (see Appendix 5), the data are plotted on
control charts and a determination of "in-control" production is made. The trend
toward an out-of-spec cardboard intensity, for example, may have been identified
on the control chart before product had to be discarded (see Fig. 3.1).
FIGURE 3.1. Control chart of cardboard flavor intensity of potato chips, showing the increasing trend before the out-of-spec batch 9 was produced. (Cardboard intensity is plotted by lot, with the upper specification limit and lower control limit marked.)
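The following Python sketch, with invented lot means, plots cardboard intensity against lot number together with the upper specification limit (0.0-1.5 from the potato chip specifications), in the spirit of Figure 3.1; it is an illustration only, not the SPC procedures of Appendix 5.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical mean cardboard intensities for 12 consecutive lots.
lots = np.arange(1, 13)
cardboard = np.array([0.4, 0.5, 0.6, 0.8, 0.9, 1.1, 1.2, 1.4, 1.7, 1.3, 1.0, 0.9])
upper_spec = 1.5                       # from the sensory specification (0.0-1.5)

plt.plot(lots, cardboard, marker="o")
plt.axhline(upper_spec, linestyle="--", label="Upper spec. limit")
plt.xlabel("Lot")
plt.ylabel("Cardboard intensity")
plt.legend()
plt.show()

# A steadily rising series flags the trend toward out-of-spec product
# before a lot actually exceeds the limit.
```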
Product Amount and Storage. The number and type of sensory tests deter-
mine the total amount of product needed. Although this is important in all sensory
test designs, it is critical in a quality control consumer research situation because
of the large amount of product handled and the amount of product storage needed
until all testing is completed. The large volume of product is a function of all the
tests to be conducted (product inspection for screening, descriptive, and consumer
tests) and all the products that are initially collected and surveyed.
Table 3-3 shows an example of the amount of product needed to establish
specifications for the potato chip example. Part I of the table corresponds to the
common calculation usually followed when planning sensory tests. Part II shows
the unique characteristics of the calculation of product needed for quality control
purposes. For a comprehensive descriptive program, the total amount in Part I of
Table 3-3 must be multiplied by the total number of production samples collected.
If the recommendations of Chapter 2 are followed and 100 to 150 product samples
are collected over one to four weeks of production, then the total amount of product
to be collected and stored is 100 to 150 times the amount calculated in Part I of
Table 3-3. For the potato chip example, the total amount of product with 100
production samples is 8,500 bags of chips.
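The arithmetic behind that total can be written out explicitly; the 85 bags per production sample is inferred from the quoted totals, since Table 3-3 is not reproduced here.

```python
# Table 3-3 is not reproduced here; 85 bags per production sample is inferred
# from the totals quoted in the text (8,500 bags for 100 samples).
bags_per_production_sample = 85      # Part I of Table 3-3 (screening, descriptive,
                                     # and consumer tests combined)
production_samples = 100             # samples collected over the survey period

total_bags = bags_per_production_sample * production_samples
print(total_bags)                    # 8,500 bags of chips to collect and store
```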
The large storage space is needed for a short time only. Once the final sample
selection for the consumer tests has been made, up to 70% of all the product can
be either placed back in the distribution cycle or disposed of.
Test Scheduling. The most important factor to consider when scheduling all
tests (e.g., product screening based on variability, consumer, and descriptive tests)
is the shelf life of the product being studied. When shelf life is not a limitation, all
tests can be planned and conducted leisurely. Among these cases are the testing of
paper products, fabrics, and some household and personal care products. However,
for other products-certain food and beverage products are extreme cases of this
category-a careful scheduling of activities is required, due to the product's short
shelf life. The schedule has to assure that the products to be tested undergo
minimal, if any, changes from the time they are produced or distributed until they
are evaluated. This problem is alleviated in those few cases where pilot plant
Potato Chips
• Management meeting
20 participants 2
• Consumer study
100 consumers per location (20)
300 consumers per location 60
produced samples are similar to production samples. Then, pilot plant samples can
be specifically produced for the consumer test, without any scheduling problems.
For products with relatively short shelf lives (e.g., up to six weeks) and no pilot
plant substitutes, the scheduling of all evaluation tests is difficult, if not impossible.
In this case, all the product evaluation tests are to be conducted in two phases.
Specifically, the first phase includes only the sample collection, screening, and
documentation of variability. The second phase consists of screening samples and
the consumer and descriptive tests.
administered. On the other hand, if other surveys have been completed, the
sampling needed is less involved and is geared to find the samples that span the
entire range of production variability.
Complete Survey. The following survey is scheduled for the potato chip
example. Product is collected from:
Descriptive Analysis
The descriptive analysis is completed to obtain a detailed characterization of the
selected subset of products. This information shows each sensory characteristic of
the product and the intensity/level at which it is perceived. The descriptive
information is used for three purposes:
1. The selection of the final set of products to be consumer tested (pg. 60);
2. The establishment of specifications (pg. 76); and
3. The selection of replacement product references in later stages of the program
operation.
The descriptive information is obtained from a trained panel. This test can be
completed through the R&D sensory panel, if there is one in the company.
Otherwise, a contract research panel can conduct these evaluations.
An experienced sensory professional designs and administers the descriptive
test. Consideration should be given to the experimental design, sample preparation
and evaluation procedures, and test controls (Meilgaard, Civille, and Carr 1987).
A complete characterization of each sample includes the evaluation of a total of
20 to 40 attributes per sample. Table 3-4 shows the list of all attributes evaluated
for potato chips and the results for one of the batches evaluated. (The scale is a
0-15 intensity scale where 0 = none and 15 = extreme).
Upon completing this test, a similar descriptive characterization is obtained for
each of the 25 potato chip samples screened. With this test, more precise informa-
tion on the product variability is obtained. Table 3-5 shows the summary statistics
and the variability ranges for each attribute, obtained through the evaluation of 25
samples. An inspection of these results gives an indication of which attributes
present small, medium, and large variability. For example, some of the attributes
showing the largest variability are evenness of color, evenness of size and shape,
saltiness, and crispness.
The descriptive characterization is used for the final sample selection described
below and for the data analysis used in setting specifications, described later in this
chapter (pg. 76).
The same procedures and principles that document production variability, as
described for the potato chip example, would apply to other food and
nonfood consumer products. For example, Figure 3.2 shows the production vari-
ability ranges for nail enamels. The line scales are 15-cm scales, where the distance
between the left end marked "0" and the first slash mark on the line represents the
lower range intensity value. For this product, the attributes that show high variabil-
ity (e.g., opacity, spread) can be identified from these results.
Sample Selection
Starting with a collection of samples that span the range of typical product
variability, it may not be clear as to which attributes best summarize the variability,
nor which subset of samples most economically span the variability in all of its
meaningful dimensions. It turns out that the two issues are intrinsically linked.
Identifying attributes that exhibit meaningful variability and selecting representa-
tive samples occur simultaneously.
There is no fixed analytical procedure to accomplish the task of identifying
variable attributes and representative samples. The approach is instead an explor-
atory process in which samples possessing unique (and not so unique) combina-
tions of attribute ratings are identified while simultaneously tracking the varying
and co-varying of the attribute ratings themselves. Sophisticated data analysis
Table 3-4. Descriptive attribute ratings for one batch of potato chips (0-15 intensity scale)
APPEARANCE
Color intensity 3.2
Evenness of color 6.0
Blotches 0.0
Translucency 9.7
Evenness of size 3.4
Evenness of shape 3.6
Thickness 4.5
Bubbles 5.2
Folds 7.3
FLAVOR
Potato 5.3
Raw 2.1
Cooked 2.0
Fried 3.3
Skins 0.7
Heated oil 1.5
Earthy 0.8
Cardboard 1.2
Painty 0.0
Salty 10.5
Sweet 4.3
Bitter 0.5
Astringent 5.0
Burn 4.0
TEXTURE
Surface bumpiness 5.2
Oily surface 4.1
Hardness 6.3
Crispiness/crunchiness 12.4
Denseness 8.9
Number of particles 11.5
Abrasiveness of particles 4.5
Persistence of crisp/crunch 6.1
Mixes with saliva 10.1
Cohesiveness of mass 4.1
Grainy mass 6.7
Toothpack 6.5
Oily film 4.3
Table 3-5. Summary statistics of the sensory attributes for 25 samples of potato chips
procedures are available to apply to the problem. However, only the minimum
level of computational complexity necessary to accomplish the task should be
used. Blind reliance on sophisticated techniques can yield misleading results.
Simple graphical techniques play a central role in the approach and may be all that
are required. The basic data analysis tools used for sample selection are data plots
(histograms, scatterplots, etc.), summary statistics (means, standard deviations,
correlation, etc.), and, if necessary, principal components analysis.
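As a sketch of these basic tools, the following Python fragment computes Table 3-5 style summary statistics and a correlation matrix for a hypothetical samples-by-attributes data set (the values are simulated, not the book's data).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical descriptive data: 25 production samples x 3 of the attributes.
data = pd.DataFrame({
    "salty":        rng.normal(10.0, 1.2, 25).round(1),
    "fried_potato": rng.normal(3.8, 0.5, 25).round(1),
    "cardboard":    rng.normal(0.8, 0.6, 25).clip(0).round(1),
})

summary = data.agg(["mean", "std", "min", "max"]).T
summary["range"] = summary["max"] - summary["min"]
print(summary)            # Table 3-5 style summary of variability per attribute

print(data.corr())        # pairwise correlations, a first look at co-variation
```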
FIGURE 3.2. Production variability ranges for nail enamels, rated on 0-15 line scales. Attributes: amount on brush, viscosity, cohesiveness/stringing; DELIVERY: spread, streaking; APPEARANCE: cover/opacity, gloss, color intensity, color evenness, evenness of surface; AFTER ABUSE: chipping, scuffing.
present at even very low levels. In the example, a total range of values of 0.5
units or less has been deemed to be too small to be worried about. Of the
original 36 attributes, 18 show little or no variability. The attributes remaining
after the initial screen are presented in Figure 3.3.
The absence of statistically significant differences among the samples should
not be used to decide what constitutes a trivially small range of responses. Failure
to find significant differences does not mean that the samples are the same. Rather,
large differences (e.g., 1 or 2 units on a 15-unit scale) may fail to be declared
significant for a variety of reasons, including small panel size or confusion about
the attribute scale. Familiarity with the product category and good sensory judg-
ment should be used with the statistical results to make this decision.
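A Python sketch of this first screen, applied to a few hypothetical attribute ranges, simply drops any attribute whose total range is 0.5 unit or less.

```python
import pandas as pd

# Hypothetical attribute ranges (max - min) across the 25 surveyed samples.
ranges = pd.Series({
    "even color": 4.9, "salty": 4.0, "crisp/crunch": 3.4,
    "painty": 3.8, "blotches": 0.2, "bitter": 0.4, "sweet": 0.5,
})

MIN_RANGE = 0.5          # ranges of 0.5 unit or less treated as trivially small
keep = ranges[ranges > MIN_RANGE].index.tolist()
drop = ranges[ranges <= MIN_RANGE].index.tolist()

print("retain for further analysis:", keep)
print("screened out in step 1:", drop)
```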
FIGURE 3.3. Potato chip attributes remaining after the initial screen, rated on 0-15 line scales.
APPEARANCE: color intensity, even color, translucency, even size, even shape, folds
FLAVOR: potato (fried, raw), cardboard, painty, salty, burn
TEXTURE: oily surface, hardness, crisp/crunch, denseness, toothpack
STEP 2: Screen the Samples for Extreme Ratings. Using only the attributes
that remain after step 1, attention now focuses on the samples. Histograms of the
attribute ratings, such as those in Figure 3.4, are used to identify samples at the
extremes of the distributions. The graphical approach will also reveal more about
the general distribution of the ratings than is obtained from the summary statistics
tabulated in step 1. For example, Figure 3.4a shows a regular, bell-shaped distri-
bution of ratings for even shape. No sample stands out as odd in the plot. Similarly,
in Figure 3.4b, no sample is outstanding for even size. However, the even size
histogram reveals a uniform (i.e., flat) distribution of ratings. Even though even
shape has a slightly larger range of variability (see Table 3-5), sample-to-sample
variability in even size is more apparent because of the greater likelihood that an
individual sample will fall at the extremes of the range.
Figures 3.4c and 3.4d illustrate two important situations involving "extreme"
samples. In Figure 3.4c, sample 4 has a very high saltiness rating and sample 7 has
a very low saltiness rating, relative to the rest of the samples. The remaining
samples exhibit a regular pattern, similar to Figure 3.4a. Figure 3.4d shows another
important case, where, for the attribute painty, sample 12 exhibits a moderate
response (3.8), while the remaining samples in the group all have ratings of 0. The
tabular summary of the data done in step 1 indicates a meaningfully large range of
values for painty, but all of the apparent variability is due to a single sample.
Based on Figures 3.4c and 3.4d, samples 4, 7, and 12 will be included in the
consumer test, to determine the effects of their extreme behavior. However, they
will not be included in any further steps of the current sample selection analyses
because doing so would reduce the sensitivity for identifying variable attributes
and representative samples.
In addition to looking for extreme samples, the distributions of the attribute ratings
should also be examined for any other atypical patterns, such as multi-modality (i.e.,
multiple 'peaks') or, in the extreme, groups of samples with completely nonoverlap-
ping ranges of ratings. This information should be noted. Results of parametric
analyses, such as correlations and principal components, must be interpreted in light
of any peculiar patterns that are known to exist. Further, the information can be used
directly in the final selection of samples for the consumer test.
Examining the histograms of the remaining attributes (not shown) reveals that
raw potato flavor and cardboard exhibit the same behavior as painty, with all but
one of the samples (either 4, 7, or 12) having the same, or nearly the same ratings.
Since samples 4, 7, and 12 are already selected for consumer testing, and there is
no additional meaningful variability in these three attributes, they can be elimi-
nated from further consideration in the sample selection process.
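One simple way to flag such extreme samples numerically is sketched below in Python, using invented saltiness ratings in which two samples sit far above and below the rest; the robust z-score rule is an illustrative choice, not a procedure prescribed in the text.

```python
import numpy as np

# Hypothetical saltiness ratings for 25 samples; samples 4 and 7 (1-based)
# sit far above and below the rest of the group.
salty = np.array([9.8, 10.2, 9.5, 14.0, 10.0, 9.7, 6.0, 10.4, 9.9, 10.1,
                  9.6, 10.3, 9.8, 10.0, 9.7, 10.2, 9.9, 10.1, 9.5, 10.0,
                  9.8, 10.3, 9.6, 10.1, 9.9])

# Robust z-scores: distance from the median, scaled by the median absolute deviation.
center = np.median(salty)
spread = np.median(np.abs(salty - center))
z = np.abs(salty - center) / (spread if spread > 0 else 1.0)

extreme = np.where(z > 5)[0] + 1     # 1-based sample numbers
print("extreme samples set aside for consumer testing:", extreme.tolist())
```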
STEP 3: Identify Samples that Depart from Interrelationships Among the
Attributes. Eliminating extreme samples and non-varying attributes from the
analysis permits a more sensitive study of the interrelationships among the remain-
FIGURE 3.4. Histograms of the distribution of the intensity ratings of 25 samples of potato chips for selected attributes: (A) even shape, (B) even size, (C) salty, (D) painty.
ing attributes. Such an analysis is important because samples that do not fall at the
extremes of any attribute ratings may still be unique in the sense that they fail to
follow the pattern exhibited by the majority of the samples when two or more
attributes are considered simultaneously. Such samples may be perceived as
having an "unbalanced" or "unblended" flavor and, as a result, be less acceptable
to consumers.
Scatterplots are good starting points for identifying relationships among the
attributes and for singling out unique samples. Samples that do not follow the
systematic pattern of the remaining samples should be set aside for further
study. (The possibility of data entry errors should not be overlooked at this
point, either.) The 153 scatterplots of the possible pairs of 18 attributes in the
potato chip example are too numerous to present here, but that number of plots
can be reviewed in a matter of minutes, to reveal the strength and nature of the
pair-wise relationships among the attributes and to determine if any samples
depart substantially from the general trends. In the example, samples 3 and 20
departed from otherwise linear relationships between saltiness and potato
complex (see Fig. 3.5). Samples 3 and 20 are selected for consumer testing
because of their unique behavior.
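The following Python sketch reproduces the idea with invented saltiness and potato complex means: a straight line is fit to all 20 samples and the two samples with the largest residuals (here samples 3 and 20) are singled out. The data and the "two largest residuals" rule are illustrative assumptions.

```python
import numpy as np

# Hypothetical panel means: for 18 of 20 samples, potato complex falls roughly
# linearly as saltiness rises; samples 3 and 20 (1-based) break the pattern.
salty  = np.array([ 8.0,  8.5,  9.0,  9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5,
                    8.2,  8.8,  9.3,  9.8, 10.3, 10.8, 11.3, 11.8, 12.3, 12.8])
potato = np.array([ 5.8,  5.6,  7.0,  5.2,  5.0,  4.8,  4.6,  4.4,  4.2,  4.0,
                    5.7,  5.5,  5.3,  5.1,  4.9,  4.7,  4.5,  4.3,  4.1,  2.3])

slope, intercept = np.polyfit(salty, potato, 1)       # straight-line trend
residuals = potato - (slope * salty + intercept)

flagged = np.argsort(-np.abs(residuals))[:2] + 1      # two largest departures
print("samples departing from the trend:", sorted(flagged.tolist()))
```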
To summarize, after the first three steps of the example sample selection
analysis, samples 4, 7, and 12 (extremes) and samples 3 and 20 (unique) have been
set aside for consumer testing, and 15 attributes and 20 samples remain in the
analysis.
STEP 4: Select Final Set of Samples for Consumer Testing. In situations where
only a small number of attributes are being considered, it is possible to make the
final selection of samples for consumer testing directly from plots and tables of the
descriptive attribute ratings. Samples should be selected to represent the low,
middle, and high ranges of variability in each attribute. When the number of
attributes is too large for direct selection of samples, principal components analysis
(PCA) can be applied, to reduce the number of dimensions of product variability
that need to be considered (see Appendix 4).
PCA applied to the 15 attributes that remain in the analysis after steps 1, 2, and
3 reveals that only two principal components (PCs) are required to explain 88
percent of the variability in the 20 remaining samples (i.e., less the 5 samples that
FIGURE 3.5. Scatterplot of the attribute ratings of 20 potato chip samples for saltiness vs. potato complex intensity. Of the 20 samples, 18 exhibit a strong linear trend between the two attributes; samples 3 and 20 depart from the trend.
were selected for testing in steps 2 and 3, earlier). The leveling out of the scree plot
in Figure 3.6 shows that no significant gain in explained variability occurs by
including additional components. The PC loadings of each attribute, in Table 3-6,
are interpreted in the same way as correlation coefficients. The loadings can be
studied to reveal the groupings of the attributes within each PC.
More important to our current purpose, the PC scores are computed for each
sample using, for example, PROC SCORE (SAS 1989). The two PC scores per
sample are much easier to examine than the original 15 attributes. Samples
representing the low, middle, and high ranges of product variability can then be
FIGURE 3.6. A scree plot from the PCA (see Appendix 4), based on the descriptive attribute data from the potato chip example.
Table 3-6. Principal component loadings of 15 sensory attributes for 20 samples of potato chips
selected from this smaller number of dimensions. The samples tentatively selected
for consumer testing are indicated in the plot of the two PC scores in Figure 3.7.
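A Python sketch of this step is given below using scikit-learn's PCA on simulated, standardized data (the book's example uses SAS PROC SCORE); because the data are invented, the 88 percent figure is not reproduced, but the mechanics of obtaining two PC scores per sample and picking samples that span the first dimension are the same.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Hypothetical descriptive data: 20 samples x 15 attributes, built so that
# two underlying dimensions drive most of the attribute-to-attribute variation.
latent = rng.normal(size=(20, 2))
loadings = rng.normal(size=(2, 15))
X = latent @ loadings + 0.3 * rng.normal(size=(20, 15))

pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

print("variance explained by PC1 and PC2:", pca.explained_variance_ratio_.round(2))

# Tentatively pick samples spanning the low, middle, and high range of PC1;
# in practice the PC1 x PC2 plot (Fig. 3.7) is inspected directly.
order = np.argsort(scores[:, 0])
tentative = sorted([order[0], order[len(order) // 2], order[-1]])
print("tentative picks (0-based sample indices):", tentative)
```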
The last step before finalizing the sample selections is to examine the distribu-
tion of the tentatively selected set, along with those extreme and unique samples
identified in steps 2 and 3, earlier, on their original attribute scales. Establish that
the range of variability in each attribute is adequately spanned by the selected
samples. If necessary, drop or add samples to achieve a uniform coverage of the
observed ranges. For instance, in the example, none of the samples selected thus
far had an extremely high rating in even color. Therefore, sample 8 is added to the
FIGURE 3.7. A scatterplot of the principal component scores of 20 samples of potato chips. The highlighted samples were selected for consumer testing.
set of selected samples for no other reason than because sample 8 had the highest
rating in even color.
Step 4 results in the selection of samples 1, 6, 8, 10, 15, 17, 19, 22, 23, and 25
for consumer testing. Add to these the extreme samples 4, 7, and 12 and the unique
samples 3 and 20, to form the complete set of 15 samples to be evaluated by
consumers.
• Only one test location is needed when both the production and the product
distribution areas are confined to one region.
• Several regions must be tested when there are manufacturing facilities in various
locations, and the regional products differ due to either formula and/or process-
ing differences.
The results from multiple locations can be used differently. Companies may
treat data and programs regionally (establishing different specifications and
QC/sensory programs in each region), or companies may try to standardize re-
gional production differences prior to establishing specifications and the QC/sen-
sory program.
Consumer Recruitment. Medium or heavy users of the product should be
selected. Interaction with the company's Market Research department is required
at the planning phase, to select the screening criteria needed for the study.
Experimental and Test Design. The consumer research needed in a QC/sen-
sory project involves the evaluation of many samples. The sample selection
process described above is geared toward reducing the total number of samples.
However, a considerably large number of products is still tested. For example, for
the potato chip example, a total of 15 products is to be consumer tested. The
sensory professional in charge of the study selects the most appropriate design,
based on the characteristics of the products and the total number of samples to be
evaluated.
Unless the products require an in-home placement, a central location test (CLT)
design is recommended. A CLT provides a better setup for evaluating a large
number of samples. Situations that require an in-home placement are more expen-
sive and time consuming than CLTs and, in some instances, may require that a less
than optimal test design be used, in order to complete the study.
In selecting the most appropriate experimental design, the researcher has to
decide on the use of a complete versus an incomplete block design (see Appendix
3). Typically, because of the large number of samples involved, an incomplete
block design is used. In an incomplete block design, each respondent evaluates
only a small number of the test samples. In a complete block design, each
respondent evaluates all of the test samples. Choosing a complete block design
involves either long evaluation sessions (e.g., two four-hour sessions) and/or
multiple sessions conducted over several days, to be able to test all samples. These
designs have advantages and disadvantages (Carr 1989) that should be considered
when planning the study.
In the potato chip example, 15 samples were selected for consumer testing. To
avoid respondent sensory fatigue and boredom, an incomplete block design was
selected. In this design, each consumer evaluates only 3 of the 15 samples. Table
3-7 shows the characteristics of the design, including the total number of consum-
ers needed to obtain the desired minimum of 70 replicate evaluations per sample.
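The balance of the design in Table 3-7 can be checked directly. The Python sketch below uses the 35 blocks as transcribed from the table and verifies that each sample appears 7 times per repetition and that every pair of samples appears together exactly once, which is why 10 repetitions give 350 respondents and 70 evaluations per sample.

```python
from itertools import combinations
from collections import Counter

# Blocks of Table 3-7: 35 groups of 3 samples drawn from the 15 test samples.
blocks = [
    (1, 2, 3), (4, 8, 12), (5, 10, 15), (6, 11, 13), (7, 9, 14),
    (1, 4, 5), (2, 8, 10), (3, 13, 14), (6, 9, 15), (7, 11, 12),
    (1, 6, 7), (2, 9, 11), (3, 12, 15), (4, 10, 14), (5, 8, 13),
    (1, 8, 9), (2, 13, 15), (3, 4, 7), (5, 11, 14), (6, 10, 12),
    (1, 10, 11), (2, 12, 14), (3, 5, 6), (4, 9, 13), (7, 8, 15),
    (1, 12, 13), (2, 5, 7), (3, 9, 10), (4, 11, 15), (6, 8, 14),
    (1, 14, 15), (2, 4, 6), (3, 8, 11), (5, 9, 12), (7, 10, 13),
]

# Each sample appears r = 7 times per repetition of the 35 blocks ...
appearances = Counter(s for blk in blocks for s in blk)
assert set(appearances.values()) == {7}

# ... and every pair of samples appears together exactly once (a balanced design).
pairs = Counter(p for blk in blocks for p in combinations(blk, 2))
assert set(pairs.values()) == {1}

# Repeating the design 10 times gives 35 x 10 = 350 respondents and
# 7 x 10 = 70 evaluations per sample.
print("respondents:", len(blocks) * 10, "evaluations per sample:", 7 * 10)
```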
Consumer Questionnaire Design. The consumer questionnaire used in this
study is designed to assure the collection of the type of data needed to relate the
consumer information to the descriptive panel (intensity) data (see pg. 76). Figure
3.8 shows an example of the type of questionnaire used in the potato chip study.
Other versions of this questionnaire could be used.
The questionnaire designed for this study should have two parts: acceptance and
attribute questions. Figure 3.8 shows that, for the potato chip example, overall
acceptance, appearance, flavor, and texture acceptance, and acceptance of specific
attributes (potato flavor, crispness) are included. In addition, intensity questions
for the same attributes are asked (e.g., potato flavor intensity, freshness intensity).
Other tests, such as descriptive and consumer qualitative tests (e.g., focus
groups), are very useful when designing consumer questionnaires. The descriptive
information is used to select the attributes to be included in the questionnaire.
These attributes are those found to be the most variable in the product. For
Table 3-7. Balanced incomplete block design used for the consumer study of potato chips (15
samples evaluated in groups of 3)
Repeat the design 10 times, using 350 respondents, to obtain 70 evaluations per sample.
Block    Samples
(1)    1   2   3
(2)    4   8   12
(3)    5   10  15
(4)    6   11  13
(5)    7   9   14
(6)    1   4   5
(7)    2   8   10
(8)    3   13  14
(9)    6   9   15
(10)   7   11  12
(11)   1   6   7
(12)   2   9   11
(13)   3   12  15
(14)   4   10  14
(15)   5   8   13
(16)   1   8   9
(17)   2   13  15
(18)   3   4   7
(19)   5   11  14
(20)   6   10  12
(21)   1   10  11
(22)   2   12  14
(23)   3   5   6
(24)   4   9   13
(25)   7   8   15
(26)   1   12  13
(27)   2   5   7
(28)   3   9   10
(29)   4   11  15
(30)   6   8   14
(31)   1   14  15
(32)   2   4   6
(33)   3   8   11
(34)   5   9   12
(35)   7   10  13
example, the potato chip questionnaire includes the attributes color, evenness of
color, potato flavor, saltiness, firmness, and crispness, because those are the most
variable characteristics (Fig. 3.3). Focus groups, a qualitative consumer test, are
useful in selecting the appropriate consumer terms to be included on the question-
naire. For example, "freshness" was selected with the information obtained in
FIGURE 3.8. Example consumer questionnaire used in the potato chip study. Acceptance (liking) questions on nine-point "dislike extremely" to "like extremely" scales, an open-ended question on what was particularly liked or disliked about the product, and the instruction to retaste the product as needed and mark a response for both questions (liking and intensity/level). Paired liking and intensity/level scales are given for: color (light-dark), evenness of color (uneven-even), potato flavor (none-strong), saltiness (not salty-very salty), freshness (not fresh-very fresh), firmness (not firm/soft-very firm), crispness (not crispy-very crispy), and oiliness (not oily-very oily).
focus groups conducted for the potato chip project. Consumer responses to samples with
a "cardboard" note (descriptive term) were "not fresh," "stale," and "old." The
consumer term "freshness" was included, to capture the consumer responses to
samples with a cardboard note.
Test Execution and Validation. The sensory professional oversees the execu-
tion of the test, to ensure adherence to the test protocols and controls. As in other
consumer tests, the actual execution of the test is a straightforward process,
provided that the study has been carefully designed.
It is recommended that a validation of completed consumer responses be done.
This practice involves contacting a subset of respondents by telephone, to confirm
their participation in the test.
FIGURE 3.9. Scatterplots of common relationships between acceptance responses (either overall or for attributes) and descriptive attribute ratings: (A) no relationship, (B) linear relationships, (C) curvilinear relationships.
tive ratings. These trends are readily apparent in plots and are accompanied by
large (positive or negative) correlation coefficients. A simple linear regression of
the form
Acceptance = b0 + b1(Attr)
where (Attr) is the descriptive attribute rating, can be used to obtain predictive
equations that relate the value of a descriptive attribute to the consumer acceptabil-
ity rating (see Fig. 3.9b). (In practice, an eyeball fit may be sufficient.)
Lastly, the plots of the descriptive ratings that exhibit curvilinear relationships
with acceptability (both overall and for attributes) should be examined. For well
behaved data an eyeball fit may be sufficient to develop a predictive relationship
between the descriptive and acceptability responses, but, typically, a quadratic
regression equation of the form Acceptance = b0 + b1(Attr) + b2(Attr)² is fit.
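As an illustration, the Python sketch below fits both forms to invented sample means for a curvilinear attribute such as saltiness; numpy.polyfit stands in for whatever regression tool is actually used.

```python
import numpy as np

# Hypothetical means for the 15 consumer-tested samples: descriptive saltiness
# vs. mean overall acceptance (9-point scale), showing a curvilinear pattern.
attr = np.array([5.0, 6.0, 7.0, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0,
                 11.5, 12.0, 12.5, 13.0, 14.0])
accept = np.array([5.2, 5.9, 6.5, 7.0, 7.2, 7.4, 7.5, 7.5, 7.4, 7.2,
                   7.0, 6.7, 6.4, 6.0, 5.1])

b1, b0 = np.polyfit(attr, accept, 1)          # Acceptance = b0 + b1*(Attr)
c2, c1, c0 = np.polyfit(attr, accept, 2)      # Acceptance = c0 + c1*(Attr) + c2*(Attr)^2

lin_pred = b0 + b1 * attr
quad_pred = c0 + c1 * attr + c2 * attr ** 2
sst = np.sum((accept - accept.mean()) ** 2)
print("linear R^2:   ", 1 - np.sum((accept - lin_pred) ** 2) / sst)
print("quadratic R^2:", 1 - np.sum((accept - quad_pred) ** 2) / sst)
```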
1. Translucency, even shape, folds, potato complex, raw potato flavor, burn, oily
surface, and toothpack exhibit no systematic relationship with any of the
consumer acceptability responses-neither overall nor any of the attributes
(i.e., all plots resemble Figure 3.9a).
2. Even size, fried potato flavor, and crisp/crunch exhibited positive linear rela-
tionships with overall acceptance. Cardboard and Painty exhibit negative
linear relationships with overall acceptance (as in Figure 3.9b). Denseness
exhibits no systematic relationship with overall acceptance, but has a negative
linear relationship with the consumer's airiness acceptability response (i.e., as
in Figure 3.10).
3. Color intensity, even color, and saltiness do not exhibit a systematic pattern
with overall acceptability. However, color intensity and even color exhibit a
curvilinear relationship with the consumers' color acceptability response re-
sembling the pattern presented in Figure 3.9c. In addition, saltiness exhibits a
similar curvilinear relationship with the consumers' saltiness acceptability
response. Only hardness exhibits a curvilinear relationship with overall ac-
ceptability.
FIGURE 3.10. Scatterplots of overall acceptance versus denseness intensity (as measured by the descriptive panel) and airiness acceptance versus denseness intensity for the 15 potato chip samples.
In order for the relationships revealed in the analysis just discussed to be used
to set QC specifications, management must select minimum product performance
criteria for the overall acceptability response and, if necessary, for each of the
attribute acceptability responses considered. No meaningful specifications can be
set without this information because the descriptive attributes by themselves do not
directly reflect acceptability. This issue is discussed in detail in the next section. In
preparing for a management review meeting, the relationships between the de-
scriptive and acceptability responses should be summarized using plots similar to
those presented in Figure 3.9. Only those plots that clarify the relationship between
the descriptive and consumer data should be presented.
1. Those variable attributes which did not show an effect on consumer acceptance
(Fig. 3.9a); and
2. Those variable attributes which did show an effect on consumer acceptance
(Fig. 3.9b and 3.9c).
The discussion and the decisions made are completed by examining one attri-
bute at a time. For each attribute, the following is reviewed:
• The definition;
• The demonstration of the variability by actual production samples (the lowest,
the highest, and one or two intermediate intensities are shown); and
• The plots displaying the data relationships between consumer responses and
panel attribute results.
Attributes with relationships like those in Figures 3.9b and 3.9c (e.g., saltiness) affected consumer acceptance and are used to set
specifications. The final specifications represent the limits of the variability range
tolerated for each attribute, expressed in terms of descriptive intensities.
In order to set the QC specifications on the attribute ratings, minimum product
performance criteria must be established for the overall acceptability response and
each of the critical attribute acceptability responses. No meaningful specifications can
be set without this information because the descriptive attributes by themselves do not
directly reflect acceptability. Management of the potato chip company has established
an overall acceptability of 6.5 (on a liking scale from 1-9) to be the minimum product
performance criterion. This cutoff point is used to set specifications.
Figure 3.11 illustrates this process. With a cutoff point of 6.5 in overall
acceptance, the specification set is 5.8-10.0. Products with an attribute intensity
lower than 5.8 or higher than 10.0 yield a consumer acceptance score lower than 6.5.
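The specification can be read off the fitted curve by solving for the intensities at which predicted acceptance equals the cutoff. The Python sketch below does this for a hypothetical quadratic fit whose coefficients were chosen so that the answer matches the 5.8-10.0 range of Figure 3.11.

```python
import numpy as np

# Hypothetical quadratic fit of overall acceptance on an attribute's intensity
# (coefficients chosen so the result matches the 5.8-10.0 range in Fig. 3.11).
c2, c1, c0 = -0.2268, 3.5834, -6.6544      # Acceptance = c0 + c1*x + c2*x^2
cutoff = 6.5                                # minimum acceptable overall liking

# Intensities at which predicted acceptance equals the cutoff:
lo, hi = sorted(np.roots([c2, c1, c0 - cutoff]).real)
print(f"consumer-based specification: {lo:.1f} to {hi:.1f}")   # about 5.8 to 10.0
```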
FIGURE 3.11. Overall acceptance plotted against attribute intensity, showing the specification based on consumers and the specification based on management input, including equipment limitations.
Attribute Score
APPEARANCE
Color intensity 3.5-6.0
Even color 6.0-12.0
Even size 4.0-8.5
FLAVOR
Fried potato 3.0-5.0
Cardboard 0.0-1.5
Painty 0.0-1.0
Salty 8.0-12.5
TEXTURE
Hardness 6.0-9.5
Crisp/crunch 10.0-15.0
Denseness 7.0-10.0
• The results of the product survey completed for the identification of variability;
• A subset of production samples that demonstrate the range of the most variable
attributes.
The results of the product survey can be summarized as in Table 3-5 and Figure
3.3. This figure shows the type of variability determined by the survey. Attention
is focused on the attributes with the largest variability (asterisks in Fig. 3.12) and
on off-notes (or other attributes where the variability range might be small but
important).
The subset of production samples shown to management are the products that
represent the extremes of the variable attributes and some intermediate points. The
3 to 4 samples shown to management for each variable attribute are marked with
circles in Figure 3.12. The inspection of samples is essential in this decision-mak-
ing process, since management has to understand both product attributes and their
intensities.
The sensory specification for each variable attribute is set by management, after
the inspection of the samples shown to demonstrate each variable attribute. Based
on their criteria only, management decides how wide or narrow each specification
(tolerable variability range) should be. For example, in setting specifications for
cardboard and denseness, management reviewed the corresponding samples (Fig.
3.12) and set the following specifications:
FIGURE 3.12. Production variability of potato chips and the products (circled) used for setting specifications through management criteria. Attributes are shown on 0-15 line scales, with asterisks marking the most variable attributes.
with the company's products, and the problems involved in handling legal
and/or confidential issues when using nonemployees. The advantages of hav-
ing company employees as panelists include their proximity to the evaluation
location and the use of their technical expertise and/or product experience in
the program. However, the main advantage in using employees in the QC/sen-
sory program is that the product quality message is disseminated throughout
the plant (Keeper 1985). The disadvantages of using employees on the panel
include the logistical problems of adjusting production schedules to allow for
the release of line workers, the lack of motivation and support encountered in
the long run by plant supervisors, and the resulting lower motivation from the
panelists themselves.
Training Steps
The panel is established through three phases: prescreening, screening, and training.
The description of these three phases given below corresponds to the process followed
in the establishment of a Spectrum panel (Meilgaard, Civille, and Carr 1987).
Prescreening
In the prescreening stage, potential panelists are contacted via a prescreening
questionnaire to determine their interest and availability for participation. If em-
ployees are being screened, an indication of management support and interest in
the program (e.g., a letter from the plant manager) is helpful in obtaining cooper-
ation at this stage. Optimally 100 to 120 people are initially contacted at the
prescreening phase, in order to select 50 to 60 people for the screening phase.
However, for small manufacturing facilities, the number of panelists initially
screened may be lower. In order to be selected for this phase, potential panelists
must meet the following criteria: no allergies, availability, limited traveling,
interest, ability to describe simple sensory characteristics, and the ability to use a
scaling system.
Screening
Participants who are potential candidates, as indicated by the prescreening results,
are scheduled to participate in a series of acuity tests and an interview (50 to 60
people). The standard tests and procedures used to screen QC/sensory panelists are
modifications of the procedures used to develop an R&D descriptive panel
(Meilgaard, Civille, and Carr 1987). These modifications include 1) the use of a
smaller set of samples/products for the descriptive tests (e.g., odorants to describe
aromatics) and 2) the use of actual production samples that show product differ-
ences in the detection tests. The purpose of these tests is to select the panelists who
have normal acuity, are able to detect differences in production samples, and show
some ability to describe sensory characteristics and to learn a scaling system.
Training
Optimally, a total of 20 to 30 participants are selected to participate in the training
program. This large pool is recommended to assure a sufficient number of panelists
per evaluation (e.g., 10 panelists). Plants that operate multiple shifts may conduct
at least two training sessions to include the personnel from all shifts. A smaller
number of panelists is selected in small plants, due to fewer staff members working
at the production facility. In that case, a more involved training program is required
to obtain a low within panel variability, which is achieved through additional
training and practice time.
Depending on the product category and the number of attributes, the training
program requires two to five 6-hour training sessions, for a total of 12 to 30 hours.
Additional sessions are scheduled thereafter for practice.
Prior to the initiation of the training program, a plant sensory coordinator must
be identified. The plant sensory coordinator gets involved in all the steps of the
training, practice sessions, and panel maintenance. This participation and his or her
interaction with the R&D sensory professional or consultant are required to assure
the coordinator's independent work in later stages of the program.
During training, it is important that the sensory coordinator participates as a
panelist trainee, not as a technician. The coordinator needs to have at least the same
panelist skills as the panel itself, so the sensory coordinator can monitor the panel
performance, understand panelists' and product problems, maintain the panel, and
operate the program.
The most important component of the training is the product training phase,
where the panel learns to evaluate the variable product characteristics for which
sensory specifications were set.
For the potato chip example, the plant panel is to be trained to evaluate the
variable characteristics shown in Figure 3.13, which were found to have an impact
on consumer acceptance. The "Training Program" section on page 88 presents the
details of the training program.
The production samples identified in earlier surveys and used in the consumer
tests may have changed by the time the training is scheduled. At this point of the
program, enough information and product experience has been gained to be able
to conduct a smaller yet effective survey. Knowing the attributes and attribute
intensities that need to be represented by products, the survey becomes simpler.
The process then consists of finding the production samples that represent the
various attributes and intensities needed for training. For products with a long shelf
life, the samples needed for training may be available from previous surveys, if
enough product has been stored.
Optimally, at least three intensity references per attribute are selected and
shown to panelists during training. Figure 3.13 shows the training samples identi-
fied for the potato chip program marked with circles. The most important samples
for training purposes are those that display the highest and lowest attribute inten-
sities. It should be noted that certain extreme intensities (low or high) found in
previous surveys and even tested in the consumer study might not be found in the
survey that is completed prior to the training. Figure 3.13 shows this for evenness
of size and cardboard. This might pose some difficulties in the panel training. The
panel could accurately rate higher or lower intensities than those demonstrated in
the training, because of the use of both internal references (e.g., specific product
references like potato chip references shown during training) and external refer-
ences (Munoz 1986; Meilgaard, Civille, and Carr 1987). However, it is recom-
mended that further production sample screening be done to find those products
FIGURE 3.13. Training samples (•) selected for the QC/sensory program of potato chips, marked
on 0-to-15 intensity scales for the appearance attributes (color intensity, evenness of color,
evenness of size), the flavor attributes (fried potato, cardboard, painty, salty), and the texture
attributes (hardness, crispness/crunchiness, denseness).
for the panel, even shortly after the panel has been trained. Alternatively, acceler-
ated shelf life or other product stress procedures or R&D-created samples might
be considered to provide those products needed for training. This might be required
to obtain the high intensity of cardboard (5.0) found in previous surveys (Fig.
3.13), but not in the one completed for training.
Training Program
The panel training program should be conducted by an experienced sensory
professional. The two main components of this training are the basic sensory phase
and the product training phase. The objective of the basic sensory program is to
familiarize panelists with the applications of sensory evaluation, its importance
within the company and in the measurement of quality, and the basic concepts of
physiology that play a role in the evaluation of the company's product character-
istics (e.g., appearance, flavor, skinfeel).
In the product training phase, each of the attributes, for which sensory specifi-
cations have been set, is discussed one at a time. The production samples, repre-
senting ranges for each attribute that were previously identified, are presented to
the panelists. The definition, evaluation procedures, and scoring of each attribute
are covered, using external and internal (production samples) references. Table 3-9
shows the process for the training of denseness for potato chips. The same
procedure is followed for all attributes.
Table 3-9. Example of training process for the descriptive evaluation of production samples
Denseness
Step 1
Definition: Compactness of the cross section of the sample after biting completely through
with the molars.
Low -----> High
(airy) (dense/compact)
Step 2 (optional)
Presentation of external (not product specific) intensity references*
Nougat (Three Musketeers-M&M Mars)      Denseness = 4.0
Malted Milk Balls (Whoppers)            Denseness = 6.0
Fruit Jellies (Chuckles-Nabisco)        Denseness = 13.0
Step 3
Presentation of internal references (potato chip production samples) (from Fig. 3.13)
Sample 1 - Denseness = 6.1
Sample 2 - Denseness = 8.5
Sample 3 - Denseness = 10.0
*Munoz (1986)
After this review, test production samples are presented for evaluation. These
test samples are coded only with three-digit codes, and panelists are asked to rate
the intensity of the attribute they just learned. Initially, the samples representing
the largest difference in an attribute are presented as test samples. For example, for
the saltiness exercise for potato chips, the most different samples available,
representing the lowest (8.0) and highest (14.5) intensity in saltiness are chosen for
the first panel exercise. Panelists rate these two samples for saltiness, and the
results are compared to the documented saltiness intensities of those two samples
(8.0 and 14.5).
In the two to five training sessions, all variable attributes are reviewed and
the product evaluation exercises are completed. Thereafter, practice sessions
are conducted to reinforce the concepts learned and give the panelists addi-
tional practice in the product evaluation. In each practice session, each panelist
is asked to evaluate selected production samples, presented as unknowns, on a
daily basis. Records of the evaluations are kept and used for panel feedback.
Production samples identified in the survey are used for the practice sessions.
Therefore, descriptive evaluations are available and are necessary for judging
panel performance. Table 3-10 shows the process for the potato chip training.
The information listed under "reference" corresponds to the descriptive char-
acterization of the production sample by the trained and experienced panel
(like an R&D or contract research panel). This information was collected prior
to the start-up of the training. The information listed under "plant panel" is the
average rating of the plant panel (n = 10) of the same production sample. Simple
inspection of these data, and similar information collected across all practice
days for a variety of products, indicates the performance of the panel. In
addition, the individual observations are assessed to judge each
panelist's performance. Table 3-10 shows that the panel rated the saltiness
intensity lower and the intensity of fried potato higher than it should have. If
this behavior is observed consistently over the practice sessions, these two
attributes should be reviewed with the panel. Product references and other
training material are used for this review.
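As a minimal illustration of this kind of practice-session review (not taken from the book), the Python sketch below compares plant-panel means against the reference descriptive values across several practice days and flags attributes whose bias is consistent and larger than an assumed review threshold. All data and the threshold are illustrative assumptions.

# Minimal sketch (not from the book): flag attributes where the plant panel
# shows a consistent bias relative to the reference descriptive values.
from statistics import mean

# Hypothetical practice-session data: (reference value, plant-panel mean)
# per attribute for each practice day (values here are illustrative only).
practice_days = [
    {"Salty": (13.6, 8.3), "Fried potato": (5.2, 7.1), "Hardness": (8.7, 8.3)},
    {"Salty": (13.4, 9.0), "Fried potato": (5.0, 6.8), "Hardness": (8.5, 8.6)},
    {"Salty": (13.8, 8.9), "Fried potato": (5.3, 7.0), "Hardness": (8.8, 8.4)},
]

BIAS_LIMIT = 1.0  # assumed review threshold on the 0-15 intensity scale

for attribute in practice_days[0]:
    diffs = [panel - ref for ref, panel in (day[attribute] for day in practice_days)]
    avg_bias = mean(diffs)
    consistent = all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
    if consistent and abs(avg_bias) > BIAS_LIMIT:
        print(f"Review {attribute!r} with the panel: average bias {avg_bias:+.1f}")

With the illustrative numbers above, saltiness and fried potato would be flagged for review, mirroring the pattern discussed for Table 3-10.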
Once the practice period is completed and problem areas have been reviewed
and resolved with the panel, the ongoing operation of the program can be started.
This would mean that the panel is qualified to accurately evaluate actual produc-
tion samples in terms of their variable sensory attributes. Furthermore, the panel's
data can then be handled with other QC data for product decisions.
Panel Maintenance
As with any other sensory program, a panel maintenance program needs to be
established. This maintenance program should address both the psychological and
physiological factors of the panel.
Table 3-10. Example of panel performance using descriptive results (potato chips)
                                        Intensities
Attribute                       Reference       Plant Panel
APPEARANCE
Color intensity                    5.4              5.0
Evenness of color                 12.1             12.7
Evenness of size                   7.3              8.5
FLAVOR
Fried potato                       5.2              7.1
Cardboard                          0.4              0.0
Painty                             0.0              0.0
Salty                             13.6              8.3
TEXTURE
Hardness                           8.7              8.3
Crispness/crunchiness             14.1             13.5
Denseness                          8.5              7.7
Psychological Factors
The psychological factors are, in the long run, the most important factors that affect
panel performance. A QC/sensory panel that frequently participates in routine
evaluation of production samples experiences boredom and lack of motivation
after a period of continuous panel participation. This problem might be further
aggravated by other factors, such as lack of support from immediate supervisors,
lack of support for relief from the line, plant layoffs, cancellation of panel sessions
because of plant crises or production schedules, union problems, boredom in the
evaluation process because products are too consistent, and so on.
The plant sensory coordinator should administer a program to maintain the
panel's interest, motivation, and, consequently, adequate performance. Some pos-
sible activities are:
tion should be shared not only with the panel, but also with the personnel of the
whole plant.
• Planning special group activities outside the working place. These group activ-
ities foster interaction and friendship among panel members.
• Giving small rewards to the panel on an occasional basis. Every company differs
in what kind of small rewards are given to the panel for their participation. Some
include trinkets, T-shirts, company products, other products, tokens, gift certifi-
cates, lunches, dinners, and so forth.
• Scheduling panel review sessions with the whole panel (or at least groups of ten
people) to review the technical aspects of the program (e.g., attributes, refer-
ences). The panel then realizes that the program is important in the company and
deserves attention and maintenance.
• Providing panel performance feedback. The sensory coordinator informs the
panel on a regular basis of the individual and whole panel performance. This is
possible by the use of blind controls presented in regular evaluations. Both the
adequate performance and problem areas are shared with the panel.
• Recognizing individual panelist's participation. Notable contributions of panel-
ists (attendance, correct identification of blind controls or off-notes) should
receive recognition by the sensory coordinator and the QC manager.
There are many unique ways by which a company can acknowledge the panel
participation. The philosophy and operation of each company determines the most
effective activities and aspects of this program. One factor to consider for in-plant
programs is the restrictions that plant unions can pose on these activities. In
extreme cases, no tangible rewards can be given to any of the panelists. In such
companies, the sensory coordinator has to rely more heavily on the psychological
rewards to the panel for maintaining their interest and motivation.
Physiological Factors
As with the psychological maintenance program, there are numerous activities that
can be administered by the sensory coordinator to maintain the panel's technical
performance. Among the most common and recommended ones are:
• Scheduling review sessions. The panel coordinator meets either with the whole
panel or groups of ten panelists. In these sessions, the technical aspects of the
program are reviewed: attribute definitions, references, and evaluation proce-
dures. Other aspects of the program are discussed, such as scheduling, environ-
mental effects on evaluations, testing controls, and other issues that might have
an effect on panel performance. These sessions should be scheduled regularly
(perhaps once a month). Their frequency is to be determined by the panel and
sensory coordinator, depending on their needs.
In addition, the difference between the plant panel's average result and the
reference value for each attribute can be plotted on an I-chart, as in Figure 3.14, to
track the consistency of the plant panel's ratings over time.
Calculations of the within and between panel variability per attribute (see
Appendix 5) are useful to monitor panelist-to-panelist and session-to-session
variability, respectively. In addition, I- and R-charts can be generated for panel
performance. For example, the range of the individual panelists' ratings can be
plotted on an R-chart (Fig. 3.15) to assess panelist-to-panelist consistency. I-charts
similar to Figure 3.14 can be generated for each panelist by plotting the difference
between the individual's rating and the panel mean and/or the reference value each
time the hidden reference sample is evaluated. Panelists with consistent positive or
negative biases or erratic, highly variable ratings are clearly evident.
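As a minimal sketch of the individuals-chart idea described above, the following snippet computes I-chart limits for the difference between the plant-panel mean and the reference value each time the hidden reference sample is evaluated. The data are illustrative, and the 2.66 moving-range constant is the standard SPC value assumed here rather than taken from the book.

# Minimal sketch: I-chart limits for the panel-minus-reference difference.
from statistics import mean

# Illustrative differences (panel mean minus reference) across evaluations.
diffs = [0.3, -0.2, 0.5, 0.1, -0.4, 0.2, 0.6, -0.1, 0.3, 0.0]

center = mean(diffs)
moving_ranges = [abs(b - a) for a, b in zip(diffs, diffs[1:])]
mr_bar = mean(moving_ranges)

# 2.66 = 3/d2 with d2 = 1.128 for a moving range of two (standard SPC constant).
ucl = center + 2.66 * mr_bar
lcl = center - 2.66 * mr_bar
print(f"center={center:.2f}  UCL={ucl:.2f}  LCL={lcl:.2f}")

out_of_control = [i for i, d in enumerate(diffs, start=1) if not lcl <= d <= ucl]
print("evaluations outside the limits:", out_of_control)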
FIGURE 3.14. I-chart of the plant panel's rating of a representative attribute for a hidden reference
sample presented routinely during the operation of the program.
FIGURE 3.15. R-chart of the plant panel's rating of a representative attribute for a hidden reference
sample presented routinely during the operation of the program.
products implies broadening the panel skills or improving the existing ones.
With the introduction of new products to the program, the panel reviews and
improves on the attributes already learned and learns new attributes.
Data Analysis
For each attribute, the individual panelist's ratings are recorded. Table 3-11 con-
tains the panelists' perceived saltiness ratings for the three samples of potato chips
collected during routine QC sampling. Note that in Table 3-11 the scores for the
three samples (early, middle, and late) are recorded separately, so that the
within-lot (sample-to-sample) variability can also be assessed.
Table 3-11. Panelists' saltiness ratings of three potato chip samples collected during routine
QC sampling
Attribute: Saltiness
Sample from Lot 011592A2
Panelist          Early      Middle      Late
FIGURE 3.16. X-chart and R-chart of the color intensity of potato chips showing that the latest
lot evaluated, although "in-spec," falls above the upper control limit (UCL), indicating that the
process is out of control.
Based on the initial product survey (Table 3-5), the long-term average color intensity is 3.2. The
control limits for color intensity are computed as
UCL = X̄ + 3σ/√n
    = 3.2 + 3(0.5)/√3
    = 4.1
and
LCL = X̄ - 3σ/√n
    = 3.2 - 3(0.5)/√3
    = 2.3
where σ is the long-term standard deviation of color intensity (from Table 3-5) and n
is the number of samples collected per lot during routine QC sampling (i.e., n = 3; early,
middle, and late in the lot). The X-chart shows that, although color intensity is in-spec
(Table 3-8), the value for the latest lot (#011592A2) lies above the upper control limit.
Over and above the out-of-spec conditions previously noted, this further indicates that
the process is out of control and that corrective actions need to be taken.
The control limits for the R-chart are established by multiplying the average
range of intensities of the three QC samples (e.g., R̄ = 0.85 for color intensity, based
on historical data) by the entries corresponding to n = 3 in Table A-10 at the end of
the book; these products give the 3σ "action limits" for color intensity.
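A minimal sketch of this control-limit arithmetic is given below. The X̄-chart numbers come from the text, while the R-chart factors D3 and D4 for n = 3 are the standard SPC table values, assumed here in place of the book's Table A-10.

# Minimal sketch reproducing the control-limit arithmetic described above.
import math

x_bar = 3.2    # long-term average color intensity (Table 3-5)
sigma = 0.5    # long-term standard deviation of color intensity (Table 3-5)
n = 3          # samples per lot (early, middle, late)
r_bar = 0.85   # average range of the three QC samples (historical data)

ucl_x = x_bar + 3 * sigma / math.sqrt(n)   # = 4.1
lcl_x = x_bar - 3 * sigma / math.sqrt(n)   # = 2.3

D3, D4 = 0.0, 2.574                        # assumed standard factors for n = 3
ucl_r = D4 * r_bar
lcl_r = D3 * r_bar

print(f"X-bar chart: UCL={ucl_x:.1f}, LCL={lcl_x:.1f}")
print(f"R chart:     UCL={ucl_r:.2f}, LCL={lcl_r:.2f}")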
Report System
The sensory coordinator should highlight any samples that fall out of specification,
using the company's standard procedures for such occurrences as presented, for
example, in Table 3-12. Additionally, if an SPC program is in effect, the control charts
for each of the attributes should be updated each time a new lot of product is evaluated as
FIGURE 3.17. X-chart of the hardness of potato chips showing that the latest lot evaluated is the
ninth consecutive lot to fall above the process mean (X̄), indicating that the process is out of control.
in Figures 3.16 and 3.17. The updated charts should then be forwarded to
appropriate personnel. The key factor to bear in mind is that the QC/sensory data
should be treated in exactly the same manner as any other analytical measure-
ments being collected for QC. All standard company formats, reporting proce-
dures, and action standards apply. Once in operation, a successful QC/sensory
function should be a fully integrated component of the overall QC program.
Routine Activities
Scheduling of Panel Sessions
At the start-up of the program, the sensory coordinator spends a considerable
amount of time scheduling and conducting the panel sessions for a given week.
At this time, many program components have yet to be established, such as the
sampling schedule (product collection), panelist's individual schedules, and
procedures for relieving line workers. Once all the program parameters have
been delineated, routine evaluations can start.
The routine schedule of panel sessions is another factor unique to each company.
Some companies might decide to schedule only one session per day to evaluate
production samples from the previous or same day. Others might decide to
schedule more than one session per day, since many more products are to be
evaluated. This decision has to be made in the preliminary and planning phase of
the QC/sensory program because of its impact on other characteristics of the
program.
Based on the product sampling plan chosen (which determines the total number
of products to be routinely evaluated), the product characteristics (which deter-
mine the maximum number of samples to be evaluated in a session), and the
maximum time a panelist can spend in a panel session, the following are determined: the number of panel sessions per day or week, the number of samples evaluated per session, and the individual panelists' schedules.
These aspects should be established and then become a routine in the operation
of a QC/sensory program.
• The schedule and/or its modifications must be communicated to both the panel-
ist and his or her supervisor.
• Early notification assures the panelists' availability (especially in the case of line
workers).
• Each panelist's schedule has to conform with the maximum number of sessions
allotted per panelist in a week and the maximum time allowed per session.
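As an illustration of how these session parameters fall out of the sampling plan, the short sketch below works through the arithmetic with purely illustrative numbers; the sample counts, session limits, and panel pool size are assumptions, not recommendations from the book.

# Minimal sketch: derive a weekly panel schedule from the sampling plan
# and the evaluation constraints (all numbers are illustrative).
import math

samples_per_week = 20          # from the product sampling plan
max_samples_per_session = 4    # limited by fatigue/product characteristics
max_sessions_per_panelist = 3  # plant policy for line-worker relief
panel_pool = 24                # trained panelists available
panelists_per_session = 10     # target panel size per evaluation

sessions_per_week = math.ceil(samples_per_week / max_samples_per_session)
seats_needed = sessions_per_week * panelists_per_session
seats_available = panel_pool * max_sessions_per_panelist

print(f"{sessions_per_week} sessions/week, "
      f"{seats_needed} panelist-sessions needed, {seats_available} available")
if seats_needed > seats_available:
    print("Sampling plan exceeds panel capacity; revisit the plan or pool size.")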
Administration of Tests
In administering each test, the sensory coordinator must give attention to:
The time spent in the administration of the evaluation sessions by the sensory
coordinator depends on the type of product evaluated, the evaluation procedures,
and the data collection system of the program. The most involved programs are
those requiring very controlled and complex product preparation procedures.
Examples include certain food products requiring cooking or heating. In such
cases, the program requires not only the complete supervision of the sensory
coordinator, but the help of one or more QC/sensory technicians. When complex
sample preparation is required, a more structured scheduling of sessions and
panelists is necessary to ensure the integrity of the samples evaluated.
Conversely, other programs require minimal supervision. An example is a
program for the evaluation of hard candy or paper products. As long as the facility
offers the required environmental controls, panelists can complete their evaluation
at the most convenient times for them. Products can be prepared in advance and
left in the evaluation room for each panelist. The sensory coordinator requires
minimal technician assistance and is needed only to answer special
questions or solve problems.
Panel Maintenance
The panel monitoring and maintenance should be an ongoing and important activity of
the sensory program. Several procedures have been discussed earlier. In addition, any
other procedure used for panel monitoring and found in the sensory literature should be
considered in QC/sensory programs (Amerine, Pangborn, and Roessler 1965).
Program Issues
The established sensory program can grow through the introduction of other
products to the program, the extension of the program to other production phases,
and the involvement in plant-based research projects.
Additional Products. The support, growth, and success of a QC/sensory pro-
gram is evident by the degree and frequency with which additional products are
incorporated into the program. As this occurs, the sensory program further develops in
several ways. New panels are trained, panelists' skills improve, the sensory staff
expands, the sensory facility might be remodeled or expanded, and the program
becomes an essential component of the plant's QC program and operation.
Evaluation of New Production Phases. As the QC/sensory program grows,
other points of evaluation in the process are incorporated into the program. Those
programs dealing exclusively with finished products are expanded to include the
evaluation of raw or in-process materials. Conversely, those programs established for the
evaluation of raw ingredients expand into the assessment of finished products.
The ultimate goal of an in-plant program is to cover all critical production facets
of all the important product categories. Since all production facets for a variety of
Both areas are geared to reduce the routine tasks and work load of the sensory
program. This in turn allows the use of those resources in other areas of production
and/or other products. Ultimately, a sensory program should only be evaluating
product attributes that 1) cannot be instrumentally measured and 2) are critical to
consumer acceptance. These modifications can only be made through the design
and execution of research projects.
Panelists' Development
The panelists' development should focus on product evaluation skills and their
knowledge in sensory evaluation. The importance of improving the panelists' skills
was discussed under panelists' motivation. Improving their product evaluation
skills through training in other products not only serves as a motivational factor,
but improves their technical expertise as panelists for products already evaluated,
new products, and special sensory problems that may develop at the plant.
In addition, the sensory coordinator should develop a program geared to expand
the panelists' sensory knowledge. This may be done by organizing presentations
of basic sensory topics and of R&D sensory projects, and by circulating basic
sensory technical literature.
OTHER VERSIONS
This chapter has presented the characteristics and the steps to implement the most
complete version of the comprehensive descriptive method. This version requires:
A. RAW INGREDIENTS
FRAGRANCE OF BABY POWDER
CHARACTERISTIC                                   SCORE
Total intensity of fragrance
• Sweet aromatic/vanilla
• Floral/rose
Carrier/base fragrance

CHARACTERISTICS                                  SCORE
Brightness of color
Fragrance intensity
Base odor intensity
Smoothness on skin
Residue retention
FIGURE 3.18. Evaluation form used for a modified comprehensive descriptive method
(shorter version).
the sensory characteristics are common to all varieties, except for the flavor type
and color.
MOUTHWASH

PEPPERMINT
0 ---------- 10
Clarity
Color intensity (blue)
Peppermint intensity
Cool
Burn

CINNAMON
0 ---------- 10
Clarity
Color intensity (red)
Cinnamon intensity
Sweet
Cool
Bite/heat
Burn

CITRUS
0 ---------- 10
Clarity
Color intensity (yellow/orange)
Orange/citrus intensity
Sweet
Sourness
Cool
Burn

FIGURE 3.19. Evaluation form used for a modified comprehensive descriptive method (shorter
version across similar products).
4
Quality Ratings Method
ABSTRACT
This sensory program consists of assessing the quality of daily production by a
panel, based on either the panelists' own perception of "quality" or established
quality criteria. Samples are rated using a quality scale (e.g., very poor to excel-
lent), and a product is rejected when the quality ratings are low. The quality score
or cutoff point that determines when a product is to be considered acceptable or
unacceptable is a management decision.
The main advantage of this method and the reason for its popularity is that it
provides a direct measure of the product's quality and may be an appropriate
selection for those companies where quality assessments are commonly under-
stood. Another advantage of this program is the motivational factor associated with
the type of product evaluation completed. Rating production's quality directly
makes panelists perceive a higher degree of involvement in this program than in
other programs.
This program has several disadvantages. In the absence of training, and without
common quality criteria, the main limitation of this program is the high panel
variability observed in the data due to differences in individual experiences and
familiarity with production. When quality criteria are established and taught to
panelists, the disadvantage is the cumbersome procedures that have to be followed
to establish this type of program. This may translate into a long training period,
frequent panel reviews, and practice sessions. Another disadvantage is that the
quality measurements may be integrated and therefore may not be very actionable
and useful for product documentation or guidance. For example, a "poor" pepper-
mint quality rating in chewing gum can mean that the peppermint is low or high in
intensity, has off notes, or differs in character, compared to the standard.
PROGRAM IN OPERATION
Table 4-1. Average panel quality ratings for production sample B2-0115 of variable paper
towel attributes
Attribute                           Quality rating*
Chroma                                   8.1
Specks                                   5.4
Emboss depth                             9.7
Grittiness                               4.1
Stiffness                                3.7
Tensile strength (D = dry)               7.8
Paper moistness                          9.5
Overall quality                          5.8
*Rated on a 0-to-10 scale (0 = very poor, 10 = excellent)
10 (0 = very poor and 10 = excellent). Specific attributes are rated because they were
found to impact consumer acceptance. In rating this and other production samples,
panelists use the quality criteria taught during their training. Quality references are
reviewed periodically to reduce the variability of the panel quality ratings. Without
common quality criteria, results vary widely from person to person and are
dependent on each panelist's previous product experience, personal preference,
knowledge of the process, and so on.
The data in Table 4-1 are provided to QC management to make decisions on the
disposition of the evaluated production batch. Decisions are based on the compar-
ison between the panel results and the quality specifications set for each of the
product attributes. The last column of Table 4-2 shows the quality specifications
set by management. The specifications were added to the table for the purpose of
this discussion. The quality specifications are never included in the panelists'
evaluation forms (ballots) and should not be information provided to panelists,
since they may bias their ratings. The information is only known by management
to compare the panel results to the product's specifications.
A product is considered unacceptable ("out-of-spec") if it falls outside the
quality specifications. For example, Table 4-2 shows that production sample
B2-0015 is unacceptable because the "overall" quality rating and the quality
ratings of the attributes specks, grittiness, and stiffness are lower than the set
minimum quality scores.
The process is the same as that described for the comprehensive descriptive
approach (Chapter 3), except for step 7. In this step, the sensory specifications are
set in terms of quality scores, and quality guidelines are developed. The guidelines
are established by management for panelists to rate quality and are demonstrated
Descriptive Analysis
The descriptive data are used for three purposes:
1. The selection of the final set of products to be consumer tested (pg. 114);
2. The establishment of specifications (pgs. 115 and 116); and
3. The selection of replacement product references in later stages of the program
operation.
The complete descriptive characterization is obtained for all samples (e.g., 20)
collected and screened in the previous steps. Table 4-3 shows the list of descriptive
terms evaluated for paper towels and the average panel results for one sample. (The
scale used is a 0 to 15 intensity scale, where O=none and 15=extreme.) The panel
results obtained in the evaluation of 20 samples are summarized, and the variability
ranges for each attribute are plotted in Figure 4.1. The line scales are 15 cm scales,
where the distance between the left end marked "0" and the first slash mark on the
line represents the lower range intensity value. Some of the attributes showing the
largest variability ranges are depth of embossing, grittiness, and paper moistness.
The descriptive data for the 20 paper towel samples are analyzed for sample
selection purposes.
Table 4-3. Descriptive attributes and average panel intensity ratings* for one paper towel
production sample
APPEARANCE
Color intensity                         1.2
Chroma                                 12.4
Specks                                  1.7
Emboss depth                            6.5
DRY TACTILE
Overall surface complex                11.9
Fuzzy                                   9.3
Gritty                                  2.4
Grainy                                  5.1
Thickness                               6.3
Fullness/body                           5.1
Force to gather                         4.1
Force to compress                       5.3
Stiffness                               6.5
Noise                                   3.1
Compression resilience intensity        6.7
Compression resilience amount          10.5
WET TACTILE
Paper moistness                         5.2
Tensile strength                        2.3
Water release                           2.7
*Scale: 0 = none, 15 = extreme
Sample Selection
The analyses used for sample selection are described in "Sample Selection" on
page 60 in Chapter 3. Following these steps for the paper towel example, a total
of 11 products were selected for the consumer test.
FIGURE 4.1. Attribute variability ranges observed across the 20 paper towel production samples,
plotted on 0-to-15 line scales for the appearance attributes (color intensity, chroma, specks,
emboss depth), the dry tactile attributes (surface complex, fuzzy, gritty, grainy, thickness,
fullness/body, force-to-gather, force-to-compress, stiffness, tensile strength, compression
resilience intensity, compression resilience amount), and the wet tactile attributes (paper
moistness, tensile strength, water release).
The corresponding section on page 71 in Chapter 3 describes in detail the important issues to consider and the
steps followed to complete the study.
FIGURE 4.2. Relationships between acceptance and descriptive attribute ratings found in paper
towels: (A) no relationship, (B) linear relationships, and (C) curvilinear relationships, with
acceptance plotted against attribute rating in each panel.
Table 4-4. Summary of the sensory attributes related to consumer acceptance of paper towels
Type of Relationship
Attribute              Positive (linear)     Negative (linear)     Curvilinear
Chroma X X
Specks X
Emboss depth X
Grittiness X
Stiffness X
Tensile strength X
Paper moistness X
The process is described for the paper towel example and shown in Figure 4.3.
For the paper towel example, it was found that both "overall" and "softness"
consumer acceptance decrease with high stiffness intensities (Fig. 4.3A and 4.3B).
Given an "overall" acceptance cut-off point of 6.0, the intensity sensory specifica-
tion for stiffness is set as an intensity of 9.0 or lower (Fig. 4.3A).
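As a minimal sketch of how such a cut-off can be translated into an intensity specification, the snippet below fits a simple straight-line relationship to illustrative survey data and solves for the stiffness intensity at which acceptance falls to 6.0. The data points and the choice of a linear fit are assumptions for illustration, not the book's analysis.

# Minimal sketch: derive an intensity specification from an acceptance cutoff.
import numpy as np

stiffness = np.array([4.0, 6.0, 8.0, 9.0, 10.0, 12.0])   # descriptive intensities
acceptance = np.array([7.8, 7.2, 6.5, 6.1, 5.6, 4.9])    # mean consumer acceptance

slope, intercept = np.polyfit(stiffness, acceptance, 1)  # negative linear relationship
cutoff = 6.0                                              # minimum acceptable acceptance
max_stiffness = (cutoff - intercept) / slope              # intensity where acceptance = cutoff
print(f"Stiffness specification: intensity <= {max_stiffness:.1f}")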
These preliminary intensity specifications are used to set quality specifications.
Figure 4.3C shows that the quality specification for stiffness, based on the intensity
specs, is set as a range from 6 to 10 in quality (0 = very poor, 10 = excellent).
Samples that are stiff are rated low in stiffness quality, with samples rated 6.0 or
lower being considered "out-of-specification."
The quality specification for other critical attributes are established, following
the same procedure. Table 4-5 lists the final quality specifications set for paper
towels.
Establishment of Quality Guidelines/Selection of Quality References.
Quality guidelines are established in the management meeting to provide criteria
to panelists for evaluating quality. The quality guidelines for a given product
attribute specify the relationship between the quality of the attribute and the
attribute intensity/level. For example, if the quality guideline for grittiness in paper
towels is an inverse relationship between the quality of grittiness and grittiness
intensity, then the quality of grittiness decreases as grittiness intensity increases.
The quality guidelines are best demonstrated through product references that
represent levels of quality. Therefore, the establishment of quality guidelines is
completed through the selection of products chosen as quality references for the
various critical attributes in the program.
The selection of products (i.e., quality references) to represent quality guide-
lines is completed in the management meeting. The products are then used for
training the panel. For perishable products (e.g., foods), fresh samples representing
FIGURE 4.3. Consumer acceptance and quality ratings plotted against stiffness intensity for
paper towels: (A) "overall" acceptance, (B) "softness" acceptance, and (C) stiffness quality
ratings.
Table 4-5. Final quality specifications set for paper towels
Attribute                 Specification (minimum acceptability)
Chroma                              5.0
Specks                              6.5
Emboss depth                        5.0
Grittiness                          6.0
Stiffness                           6.0
Tensile strength                    4.5
Paper moistness                     7.0
Overall quality                     6.0
Table 4-6 illustrates the process for two paper towel attributes: stiffness and
tensile strength. The quality ranges chosen, based on the variability intensity
ranges, are 2 to 10 for stiffness and 3 to 10 for tensile strength. Three quality
references are selected for each attribute. In addition, an "overall" quality score is
assigned to each product reference. The assessment of the "overall" quality scores
across variable attributes is intended to give panelists an indication of the impact of each
attribute on the total product quality. Table 4-6 shows that stiffness variability has
a greater impact on overall quality of paper towels than tensile strength variability.
During this product review, it is critical to discuss the concept of "excellence"
in quality control as compared to product excellence in research and development
and marketing. A product may be rated "excellent" in a quality control situation
Table 4-6. Establishment of quality guidelines and selection of quality references for stiffness
and tensile strength in paper towels

Stiffness: references selected and overall quality
Sample        Stiffness Quality      Overall Quality
425                  10                    9
979                   6                    7
614                   2                    4

Tensile Strength: references selected and overall quality
Sample        Tensile Strength Quality      Overall Quality
817                  10                    8.5
173                   4.5                  7.0
321                   3                    6.0
(i.e., relative quality score compared to all production samples produced at a plant),
and only "fair" in a marketing application (i.e., absolute quality score compared to
all products in the market place).
An illustration of this concept is presented in Figure 4.4. Case I shows the
differences in absolute sensory quality among five different commercial products
(brands A through E). The distribution shows that product E has a higher absolute
sensory quality compared to product B. However, from a quality control perspec-
tive using a quality ratings program (case II, Fig. 4.4) product B would obtain
higher "internal" quality scores more often than product E. This is a reflection that
there is less production variability in product B (range B'B*) than in product E
(range E'E*).
In the management meeting, where specifications and quality guidelines are set,
"excellent" samples in each attribute have to be selected (e.g., samples 425 and 817
FIGURE 4.4. Absolute sensory quality of five commercial products, brands A through E (Case I),
and the production fluctuations of products E and B (Case II). E'E* and B'B* are the quality
variability ranges observed for products E and B in their production, respectively.
with quality scores of 10 for the attributes stiffness and tensile strength, respec-
tively). The products represent the "best" attribute intensities that can realistically
be produced.
Evaluation Form. Based on the decision reached at the management meet-
ing, the final evaluation form for routine assessments is developed. Figure 4.5
shows the evaluation form to be used in the paper towel program, based on the
decisions made at the management meeting. Figure 4.6 shows a similar ballot for
the quality evaluation of another consumer product-orange juice from concen-
trate. As with the paper product, the quality of each of the critical attributes (i.e.,
those that impact consumer acceptance) is rated, as well as the "overall" product
quality.
Name
Product ID
INSTRUCTIONS
Please evaluate the quality of the attributes listed below, using
the scale
0 (very poor) ----------------------> 10 (excellent)
OVERALL QUALITY
FIGURE 4.5. Evaluation form used for quality ratings of paper towels.
Procedure
The previous section discussed how the consumer responses to production variability
were used as guidelines to set specifications by management. Without the consumer
information, management uses only their criteria for setting specifications.
To follow this approach, only the results from the product survey are required.
A management meeting is scheduled where the following aspects are presented:
Name
Product ID
INSTRUCTIONS
Please evaluate the quality of the attributes listed below, using
the scale
0 (very poor) ----------------------> 10 (excellent)
OVERALL QUALITY
FIGURE 4.6. Evaluation form for quality ratings of orange juice from concentrate.
For the paper towel example, the results of the product survey are shown (Fig.
4.1). In addition, the products that represent the extremes of the variable attributes
and some intermediate points are presented (Fig. 4.7, circles).
Each attribute is reviewed by demonstrating the products that display its
variability range. Upon this review, the following program parameters are estab-
lished:
FIGURE 4.7. Attribute variability ranges for paper towels (0-to-15 scales for the appearance, dry
tactile, and wet tactile attributes), with the products representing the extremes and some
intermediate points of each variable attribute marked (•).
(See page 118 in this chapter for the considerations and steps followed to complete each of
these steps.)
For the paper towel example, the attributes selected for the program are
specks, grittiness, stiffness, and tensile strength. The quality ranges, references,
and specifications selected for the attributes specks and grittiness are shown in
Table 4-7.
Descriptive Analysis
The quality references selected to represent quality scores (Table 4-7) need to be
evaluated by a descriptive panel. The references' sensory characteristics need to be
documented for future steps of the program. This documentation of original quality
Table 4-7. Example of program parameters established by management for specks and
grittiness in paper towels

Specks
Intensity range                  1-5.0       (0-15)*
Quality range                    10-5        (0-10)
Quality references               Sample 615 (quality = 10)
                                 Sample 313 (quality = 4)
Quality specification
(minimum acceptable)             6.5         (0-10)

Grittiness
Intensity range                  2.5-8.0     (0-15)
Quality range                    10-2        (0-10)
Quality references               Sample 212 (quality = 10)
                                 Sample 813 (quality = 2)
Quality specification
(minimum acceptable)             6.0         (0-10)

*0-15: Intensity scale (0 = none, 15 = extreme)
0-10: Quality scale (0 = very poor, 10 = excellent)
Training Steps
A prescreening and a screening phase are conducted to identify and select the panel
members ("Prescreening" and "Screening" sections on page 85 in Chapter 3). The
training is then scheduled for the 20 to 30 selected participants. Overall, the training
program for the quality rating approach consists of a basic sensory program and a
product training phase. In the product training phase, panelists are trained to
Implementation of the Program 127
evaluate the quality of those characteristics that either (1) were shown to
impact consumer acceptance or (2) were chosen by management as being
critical attributes. In addition, common quality criteria are established among
the panel members through the demonstration of quality references selected in
previous steps. Depending on the product category and the number of attri-
butes, the training program requires six training sessions of 2 to 5 hours each,
for a total of 12 to 30 hours. Additional sessions are scheduled thereafter for
practice.
For the paper towel example, the attributes taught to panelists are:
• Chroma
• Specks
• Depth of embossing
• Grittiness
• Stiffness
• Tensile strength (dry)
• Paper moistness
In addition, guidelines are given to rate the "overall" quality of daily production
samples.
Training Program
The training program consists of a basic sensory evaluation phase and a product
training phase. The objective of the basic sensory program is to familiarize
panelists with the applications of sensory evaluation, its importance within the
company and in the measurement of quality, and the basic concepts of physiology
that play a role in the evaluation of the company's product characteristics (e.g.,
appearance, flavor, skinfeel).
The product training phase for this approach has two components:
Panelists are usually trained in the attribute definitions and evaluation proce-
dures first (descriptive component), followed by the quality ratings component.
In the paper towel example, the definitions and evaluation procedures of the
critical attributes are taught first. Table 4-8 shows the evaluation procedures
taught to panelists for two of the critical paper towel attributes: stiffness and
tensile strength. Products that differ in these attributes are presented during
training. Although the differences between products are discussed, the training
does not focus on intensity ratings, since attribute intensity measurements are
not collected from the panel.
Once the panel has learned the sensory characteristics, they are trained to grade
the quality of the characteristics (quality program). For that purpose, the quality
product references representing extreme and intermediate quality scores are pre-
sented (Fig. 4.8).
Table 4-8. Example of definitions and evaluation procedures of some paper towel attributes
(descriptive training)

Stiffness
Definition:
Amount of pointed, ridged, or crooked edges; not rounded/pliable
(pliable/flexible ---> stiff)
Procedure:
Lay towel on flat surface. Gather towel with fingers and manipulate gently without completely
closing hand.

Tensile Strength
Definition:
The force required to break/tear paper
(no force ---> high force)
Procedure:
Grasp opposite edges of towel in hands and pull the towel until it breaks.
Step 3 - Learning of quality ratings through the use of product quality references
Sample C                            Sample F
low stiffness                       high stiffness
high quality                        low quality
stiffness quality score = 10        stiffness quality score = 6

Step 3 - Learning of quality ratings through the use of product quality references
Sample A                        Sample S                        Sample M
low chroma                      medium chroma                   high chroma
low quality                     high quality                    low quality
chroma quality score = 5.0      chroma quality score = 9.0      chroma quality score = 6.0
Training the panel to rate the quality of attributes that follow the pattern shown in
Table 4-10 requires the use of:
The process described in Tables 4-9 and 4-10 illustrates the difficulties involved
in this method. Therefore, companies that want to implement this approach need
to understand this complex process before choosing this method.
After each paper towel attribute is learned, test production samples are presented
for evaluation and the results are discussed. The test samples are coded with three digit
codes, and panelists are asked to rate the quality of the attribute they just learned.
Initially, the samples representing the largest difference in an attribute are presented as
test samples. For example, for depth of embossing, the most different samples avail-
able, representing the lowest (3.0) and highest (10.0) quality ratings in this attribute,
are chosen for the first panel exercise. After panelists have rated the two samples, the
results are compared to their known depth-of-embossing quality ratings and are
discussed with the panel. In the two to five training sessions, all critical attributes are
reviewed and their quality evaluation exercises completed.
Thereafter, practice sessions are conducted to reinforce the concepts learned
and to give the panelists additional practice in product evaluation. In each practice
session, panelists are asked to evaluate selected production samples, presented as
unknowns. Records of the evaluations are kept and used for panel feedback.
Production samples identified in the survey and quality references are used for the
practice sessions. The known quality ratings of the reference samples are used to
judge panel performance. Both individual panelist ratings and the panel mean
ratings should be compared to the known value of the reference sample to identify
problem areas, if any.
Once the practice period is completed and problem areas have been resolved
with the panel, the ongoing operation of the program is started. This means that the
panel is qualified to rate the quality of actual production samples. Furthermore, the
panel's data can then be handled with other QC data for product decisions.
Panel Maintenance
The quality ratings QC/sensory program also requires the establishment of a
maintenance program, which addresses psychological and physiological consider-
ations. Chapter 3 summarizes some of the activities involved in the maintenance
program. Among the psychological aspects to be addressed are recognition and
appreciation of the panelists' participation, the planning of special group activities,
rewards, panel performance feedback, and panel review sessions.
The physiological factors described in Chapter 3 are also applicable in the
quality ratings program. These are:
addressed are panel scheduling, conditions of the panel room, sample prepara-
tion, and so forth.
• The review of quality criteria. The review of quality references in this program
is critical. Panelists need to be exposed to the products representing the various
quality levels to assure adequate performance. This review can be done by
presenting a series of products that represent the low and high quality levels for
each evaluated attribute (as was done in the training) or by presenting one
product reference with all its quality scores identified. For some nonperishable
products, like paper towels, large quantities of products representative of differ-
ent quality levels should be obtained initially and stored under controlled
conditions for later use. For most products, however, replacement quality refer-
ences need to be produced/identified on a regular basis. To properly administer
the acquisition (through surveys or production of pilot plant samples) and use of
quality references, the sensory coordinator needs to use the descriptive data base
gathered at initial stages of the program (pg. 113 or 125) and a descriptive panel
that can characterize the sensory properties of the replacement quality refer-
ences. The review of quality references allows panelists to recall the criteria to
rate the quality of daily production. This practice also decreases within panelist
variability.
• The evaluation of blind references. The introduction of blind quality references in
the product sets is a practice by which the panelist performance is monitored. The
results obtained for these products are compared to the documented quality ratings
for that product. Individual panelist performance is monitored using control charts
(see Appendix 5) and summary statistics, as shown in Table 4-11. Examination of
Table 4-11 reveals that Panelist C rates the reference samples with consistently
higher quality ratings than the other panelists. Further examination reveals that
Panelist G is consistently more variable (i.e., larger standard deviations) than the
other panelists. The observations suggest that retraining is needed for both panelists.
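A minimal sketch of this kind of blind-reference screening is shown below. The ratings, the bias threshold, and the variability threshold are illustrative assumptions rather than values from the book.

# Minimal sketch: flag panelists whose blind-reference ratings show a
# consistent bias or unusually high variability.
from statistics import mean, stdev

# panelist -> ratings of the blind quality reference across sessions (illustrative)
ratings = {
    "A": [7.2, 7.0, 6.9, 7.1, 7.3, 6.8, 7.2],
    "C": [8.2, 8.3, 8.2, 8.1, 8.4, 8.2, 8.4],
    "G": [7.0, 6.9, 9.1, 5.0, 7.1, 9.0, 5.2],
}
reference_quality = 7.0   # documented quality rating of the reference sample

for panelist, scores in ratings.items():
    bias = mean(scores) - reference_quality
    spread = stdev(scores)
    flags = []
    if abs(bias) > 0.5:      # assumed bias threshold
        flags.append(f"bias {bias:+.1f}")
    if spread > 1.0:         # assumed variability threshold
        flags.append(f"s.d. {spread:.1f}")
    if flags:
        print(f"Panelist {panelist}: consider retraining ({', '.join(flags)})")

With these illustrative numbers, panelist C is flagged for a consistent positive bias and panelist G for high variability, mirroring the pattern described for Table 4-11.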
Data Collection
Samples to be evaluated by the plant panel are collected at the same time as the
samples that will undergo other QC testing. The frequency of sampling is deter-
mined using the same QC criteria that apply to all other tests.
Table 4-11. Panelist performance statistics on blind reference samples
Panelist   Mean  s.d.   Mean  s.d.   Mean  s.d.   Mean  s.d.   Mean  s.d.   Mean  s.d.   Mean  s.d.
A          7.2   0.8    7.0   0.7    6.9   0.6    7.1   0.8    7.3   0.7    6.8   0.7    7.2   0.8
B          6.9   0.7    7.1   0.6    7.2   0.7    7.1   0.8    7.2   0.6    7.1   0.8    7.1   0.8
C          8.2   0.6    8.3   0.7    8.2   0.8    8.1   0.9    8.4   0.8    8.2   0.7    8.4   0.9
D          6.8   0.8    6.9   0.8    7.3   1.0    7.0   0.6    6.9   0.9    6.9   0.9    7.2   1.0
E          7.1   0.8    6.7   0.9    6.8   1.0    6.9   0.6    7.0   1.0    7.2   1.0    7.3   1.1
F          7.0   0.9    7.3   0.6    6.7   0.9    6.8   0.7    6.9   1.0    7.2   0.9    7.0   0.9
G          7.0   1.8    6.9   2.3    7.1   1.7    7.0   2.2    7.1   2.4    7.0   1.9    7.1   2.3
H          6.8   0.6    7.2   1.0    6.9   0.7    7.0   0.9    6.8   0.8    6.9   0.8    7.5   0.8
I          7.1   0.7    6.8   0.8    6.9   0.8    6.9   1.0    6.8   0.8    6.8   0.7    7.3   1.0
J          6.9   0.7    7.0   0.7    7.1   0.8    6.9   0.7    7.0   1.0    7.1   0.7    7.1   0.8
The number of samples submitted to the panel per session affects the sample
presentation design used by the sensory coordinator. If the number of samples per
session is small, then a complete block presentation design is used. If, however, the
number of samples per session exceeds that which can be evaluated before sensory
fatigue sets in, a balanced incomplete block presentation design is used (see
Appendix 3).
For the paper towel example, the sensory coordinator knows that four samples
of paper towels are collected during each shift at approximately 2, 4, 6, and 8 hours.
A shift's production is considered to be a single lot of product. The quality ratings
panel of the following shift evaluates the previous shift's production. The four
samples are evaluated using a complete block design with the order of presentation
of the four samples being randomized separately for each of the ten panelists.
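A minimal sketch of this presentation scheme is given below; the sample labels and the fixed random seed are illustrative, and nothing beyond the complete block design described above is assumed.

# Minimal sketch: complete block design in which each of the ten panelists
# receives all four shift samples in an independently randomized order.
import random

samples = ["2 h", "4 h", "6 h", "8 h"]                       # previous shift's samples
panelists = [f"Panelist {chr(65 + i)}" for i in range(10)]   # A through J

random.seed(1)                                               # fixed seed for a reproducible worksheet
for panelist in panelists:
    order = random.sample(samples, k=len(samples))           # random permutation per panelist
    print(panelist, "->", ", ".join(order))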
Data Analysis
For each quality rating scale, the individual panelist's scores are recorded for each
sample, as shown in Table 4-12 for the chroma quality scale. The panel's average
quality ratings, which appear in the last row of Table 4-12, are the raw QC data
associated with each sample. For complete block presentation designs, as used for
the paper towels, arithmetic means are used. For incomplete block presentation
designs, adjusted (or least square) means must be computed (see Appendix 3).
The panel's average quality ratings should be recorded in a format consistent
with that used for other QC tests. Specifically, if multiple samples are collected
from each lot, as in the paper towel example, the lot average and range (i.e., the
average and range of the four panel means) are computed and recorded (see Table
4-13). If only a single sample per shift is collected, the panel mean is the shift mean
rating; no measure of within-lot variability is available.
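The arithmetic for the complete block case can be sketched as follows, using the chroma quality ratings of Table 4-12; the computed panel means, lot average, and lot range follow directly from those data.

# Minimal sketch: panel means per sample, then the lot average and range.
panel_ratings = [          # rows = panelists A-J, columns = samples 1-4
    [7, 8, 7, 8], [7, 6, 6, 7], [9, 10, 8, 9], [8, 7, 6, 8], [6, 6, 5, 7],
    [5, 8, 7, 6], [6, 8, 6, 7], [8, 8, 6, 7], [8, 7, 7, 6], [6, 7, 5, 6],
]

# Panel mean per sample (the raw QC datum for each sample).
sample_means = [sum(col) / len(col) for col in zip(*panel_ratings)]

# Lot statistics: average and range of the four panel means.
lot_average = sum(sample_means) / len(sample_means)
lot_range = max(sample_means) - min(sample_means)

print("panel means:", [round(m, 1) for m in sample_means])   # [7.0, 7.5, 6.3, 7.1]
print(f"lot average = {lot_average:.2f}, lot range = {lot_range:.2f}")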
Report System
The sensory coordinator should highlight any samples that fall out of specification,
using the company's standard procedures for such occurrences. For example, in
Table 4-13, the sensory coordinator has highlighted that the current lot of paper
towels (#27B-13) is below the minimum acceptable quality specification for paper
moistness (i.e., 7.0 from Table 4-5). The sensory coordinator forwards the infor-
mation to management, who decide on the final disposition of the lot.
If an SPC program is in effect, the control charts for each of the quality ratings
should also be updated and forwarded to appropriate personnel. If multiple sam-
ples per lot are collected, X-charts and R-charts are commonly used to track
production (see Appendix 5). If only a single sample is collected per lot, then
I-charts are used in place of X-charts. No analogue for R-charts is available when
only one sample per lot is collected.
Four production samples of paper towels are collected every shift (i.e., lot), so
Table 4-12. Panelists' quality ratings of the chroma of four production samples of paper
towels
Lot: 27B-13
Scale: Chroma
Panelist          Sample 1    Sample 2    Sample 3    Sample 4
A                    7           8           7           8
B                    7           6           6           7
C                    9          10           8           9
D                    8           7           6           8
E                    6           6           5           7
F                    5           8           7           6
G                    6           8           6           7
H                    8           8           6           7
I                    8           7           7           6
J                    6           7           5           6
Panel average        7.0         7.5         6.3         7.1
Table 4-13. Panel averages and ranges of quality ratings for a batch of paper towels
Lot: 27B-13
an X-chart and R-chart are maintained for each of the quality rating scales used by
the panel. Figure 4.9 illustrates the results for the chroma quality scale. Although
each of the most recent lots is within specification (i.e., X̄ > 5.0) and, in fact,
within the control limits for the process, the most recent lot is the ninth lot in a row
to fall below the process mean value. Based on the criteria for control charts
presented in Appendix 5, this occurrence signals that the process is out of control
and should be examined to determine what is causing the shift to lower chroma
quality ratings.
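A minimal sketch of this run rule is given below; the lot means are illustrative, and the nine-in-a-row criterion is the one cited in the text.

# Minimal sketch: detect nine consecutive lot means on the same side of the
# process mean, a standard out-of-control signal even when every point lies
# within the specification and control limits.
def nine_in_a_row(lot_means, process_mean, run_length=9):
    run = 0
    last_side = 0
    for i, value in enumerate(lot_means, start=1):
        side = 1 if value > process_mean else -1 if value < process_mean else 0
        run = run + 1 if side == last_side and side != 0 else (1 if side != 0 else 0)
        last_side = side
        if run >= run_length:
            return i          # index of the lot that completes the run
    return None

# Illustrative chroma quality means drifting below a process mean of 7.0.
lots = [7.2, 7.4, 6.9, 7.1, 7.3, 6.8, 6.9, 6.7, 6.9, 6.6, 6.8, 6.9, 6.7, 6.8, 6.6]
signal = nine_in_a_row(lots, process_mean=7.0)
print("out-of-control signal at lot:", signal)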
FIGURE 4.9. X-chart of the chroma quality ratings of paper towels showing a trend toward lower
quality ratings over the last nine product lots.
OTHER VERSIONS
There are other quality program versions that can be developed and implemented
at the plant level. The program described in detail in this chapter dealt with a
program that provides information on the "overall" quality of the product and the
quality of selected attributes. To establish such a program with the use of manage-
ment quality guidelines is a cumbersome and expensive process. In addition, these
quality results are not actionable and can fail to provide information that can be
used for problem solving.
Two other versions of a quality ratings program are given below. Although only
limited product quality information is obtained, these modified versions solve
some of the limitations of the complete program described in this chapter.
FIGURE 4.10. Evaluation form used for the overall quality evaluation of vanilla cookies (modified
quality ratings method).
0  1  2  3  4  5  6  7  8  9  10
none                              extreme
Gloss
Force to spread
Amount of residue (immediate)
Occlusion (5 min.)
Amount of residue (5 min.)
Other
0  1  2  3  4  5  6  7  8  9  10
very poor                         excellent
QUALITY RATING
OVERALL QUALITY
FIGURE 4.11. Evaluation form used for the overall quality and attribute evaluation of antiperspirant
sticks (modified quality ratings method).
5
"In/Out" Method
ABSTRACT
In this program, daily production is evaluated by a trained panel as being either
within ("in") or outside ("out") the sensory specifications (or the concept that
represents "normal" production). The result of this panel is the percentage of the
panelists who assess the product to be "in" specification. This method is mainly
used to identify and reject products that show gross deviations from "in/typical"
production, such as off-notes. It is a recommended method for the evaluation of
raw materials, relatively simple finished products, or more complex finished
products with very few variable sensory dimensions. In this program, panelists
have a more direct participation in the decision-making process of product dispo-
sition than in other methods. Ultimately, management makes the decision on
product disposition by setting an action standard around the panelists' results.
Currently, the "in/out" method may be the most popular method for QC/sensory
evaluations at the plant level. Its advantages are the simplicity of the assessments,
the short training and evaluation periods required, and the direct use of panel
results. Comparable to the quality ratings program, panelists are highly motivated
to participate in the program because of their greater involvement in the decision-
making process.
The main disadvantage of the program is its inability to provide descriptive
information and therefore its lack of direction and actionability to fix problems.
Consequently, this method is a "decision-making" tool, rather than a source of
product information. Other disadvantages include the use of panelists as judges of
quality and product disposition and not of sensory characteristics, the need for a
relatively large panel pool for data analysis and interpretation, and the inability to
relate these data to other data, such as instrumental measurements. In addition,
PROGRAM IN OPERATION
Comparable to the quality ratings program, there are two scenarios that describe the two
types of "in/out" programs currently in operation in the consumer products industry.
Scenario A: The panel consists of a small group of company employees (4 to
5), mainly from the management team. The panel evaluates a large amount of
production samples (up to 20 to 30) per session without standardized and con-
trolled protocols. Each product is discussed to determine if it is to be considered
"in" or "out" of specifications. In this program no defined specifications or
guidelines for product evaluation exist, and no training or product orientation was
held. As a result, each panelist makes decisions based on his or her individual
experience and familiarity with production, or based on the highest ranking person
on the panel. In this type of scenario, terms1 such as "acceptable/unacceptable,"
"good/bad," "typical/atypical" might be used by the evaluators.
Another version of scenario A might be when only one judge evaluates products
to decide if they are acceptable or unacceptable.
Scenario B: The panel consists of 25 or more panelists who evaluate products following standardized and controlled procedures and protocols (e.g., number of products evaluated in a session, product presentation). Panelists individually evaluate production samples and assess whether each product is "in" or "out" of specifications. The criteria used to make such decisions are standard guidelines established by management and taught to all panelists during training. In addition, panelists are familiarized with the characteristics that define "in-spec" products and the characteristics that define "out-of-spec" products. Therefore, all panelists work with the same criteria in the assessment of production samples. Panelists might also indicate the reason(s) why a product was assessed to be "out-of-spec." This qualitative information helps management define the condition of the "out-of-spec" product. Data are summarized as percentages of "in-spec" observations and are analyzed statistically.
These two examples represent the "worst" and the "best" scenarios found in
production facilities, respectively. The situation described in scenario A represents
the most frequently encountered program. This program is used by companies that
want an in-plant sensory program, but lack either the resources (i.e., panelists,
panelists' time, and management support) or the background in methodology to
implement a more sound program. However, because of the simplicity and the very
short time required to implement scenario A, companies often choose this form of
¹These terms are inappropriate for analytical product evaluation. No reference to these terms (acceptable/unacceptable, good/bad, typical/atypical, etc.) is made in this book, unless it refers to the type of program described in scenario A above.
the "in/out" method. The lack of common product rejection guidelines among panelists is a shortcoming of the scenario A program. Without clear guidelines, panelists judge products based on their own criteria and preferences. This situation leads to highly variable and subjective data. Therefore, this chapter focuses on establishing defined and standard decision-making criteria for an "in/out" program (scenario B). Companies using the simple approach of scenario A can then implement some of these recommendations to improve their in-plant sensory program.
The program described in this chapter has the characteristics described in
scenario B above. This program has distinct differences compared to other pro-
grams described in this book (see Table 5-1). These differences make the "in/out"
program unique in its implementation and operation.
A panel that operates in a situation like the one described in scenario B uses an
evaluation form such as the one shown in Figure 5.1. Products are evaluated by
each panelist who indicates if the product is "in" or "out" of specification based on
criteria set by management.
Figure 5.2 shows another version of a ballot for this program. This version includes a comment section, where panelists indicate the reasons why a product was considered "out-of-spec." This information is qualitative in nature. Unless panelists are well trained and use standardized terms to describe deviations, it does not provide insight into why a product was "out-of-spec."
In their evaluations, panelists use the "in/out" product guidelines taught during training. The "in" guidelines or concept represent the characteristics of "typical/normal" production and tolerable variations. The "out" guidelines or concept represent the characteristics and intensities of products considered unacceptable or
Table 5-1. Summary of main differences between the "in/out" method and other programs
(comprehensive, quality, and difference from control)
[Figure 5.1 shows a ballot with a brief set of instructions and, for each product, a pair of check boxes labeled "In" and "Out."]
FIGURE 5.1. Example of an evaluation form used for the "in/out" method.
1. Raw materials with one attribute variation - Simple raw materials with one or two distinctly different attributes that determine the product's acceptance/rejection are suitable for "in/out" judgements. Examples: off-notes in any food material or fragrance, color or chroma of a dye, main character of a product (e.g., main flavor characters of incoming flavors, total orange impact in orange juice, total sweetness impact in syrups).
2. Raw materials or finished products with attribute variations that are not related - Simple systems such as fruit juices, marshmallows, syrups, and fabrics fall into this category. The product's attributes vary without interaction and are perceived as distinctly different attributes. For example, the variable and independent attributes of marshmallows that would determine "out-of-spec"
• INSTRUCTIONS
Please evaluate the products presented to you and indicate in the space provided if they are "in" or "out" of specification. Use the "in/out" product guidelines taught during training to make your decision.
Use the space provided to indicate why you considered a product to be "out-of-specification."
Product ____________   In [ ]   Out [ ]
Reasons if "out": ____________
Product ____________   In [ ]   Out [ ]
Reasons if "out": ____________
Product ____________   In [ ]   Out [ ]
Reasons if "out": ____________
FIGURE 5.2. Example of evaluation form used for the "in/out" method (quantitative and qualitative).
Table 5-2. QC/sensory results on several production batches using the "in/out" method

Batch       Percent of Panel Rating the Batch "IN"
A3-0016     65.6
A3-0018     59.8
A3-0020     58.4
A3-0024     42.6*
A3-0026     44.8*
A3-0028     74.3

*Out-of-specification (Percent "IN" < 50%)
product could be vanillin intensity, size, and firmness. These attributes do not
interact with one another, and, therefore, the only reference materials needed
for training are products that display the levels of vanillin, firmness, and size,
at which the product would be considered "out of spec."
3. Complex raw materials or finished products with many slight variations,
resulting in a major negative attribute- This case includes products where the
variation of many attributes results in a new perceived attribute that is consid-
ered negative, such as a cooked food product that develops a "scorched" or
"burnt" note with excessive heat treatment.
4. Complex raw materials or finished products with large within variability-Ex-
amples include products that show a large batch-to-batch, container-to-con-
tainer, or package-to-package variation (e.g., canned soups, frozen dinners,
meat products). Attribute differences are perceived from package to package,
but they are not considered negative until these differences are extreme. In
their training, panelists are shown both a) variable products that are considered
"in-spec" and b) variable products with more extreme differences considered
"out of spec."
Descriptive Analysis
A complete sensory descriptive characterization of the screened samples is ob-
tained. Data are used for ensuing steps of the program, such as sample selection
and data analysis.
Sample selection
Descriptive data are analyzed to select the minimum number of samples that span
the typical production variability for consumer testing.
Figures 5.3 and 5.4 show these results for strawberry and vanilla yogurts. The
ranges are indicated on 15-point intensity scales (where 0 = none and 15 =
extreme). The distance between the left end marked "0" and first slash mark on the
line represents the lower range intensity value. The circles in Figure 5.3 represent
the strawberry yogurt samples representative of extreme and intermediate intensi-
ties of variable attributes.
[Figure 5.3 plots, on 0-to-15 intensity scales, the production variability ranges for the strawberry yogurt attributes watery surface, color intensity, fruit pieces integrity, strawberry flavor, fermented fruit, dairy complex, sourness, thickness of base, and lumpiness; circles mark samples at the extreme and intermediate intensities of the variable attributes.]
FIGURE 5.3. Summary of strawberry yogurt production variability and samples selected for the management meeting.
[Figure 5.4 plots the corresponding variability ranges for the vanilla yogurt attributes watery surface, color intensity, vanilla bean complex, dairy complex, unripe cultures, sourness, thickness, and lumpiness.]
1. Selection of the critical variable attributes (i.e., those variable attributes considered important by management);
2. Selection of "in/out" limits for each critical attribute (i.e., the minimum or maximum attribute intensity that defines an "in-spec" product);
3. Establishment of the "in/out" specification (i.e., the group of all attribute limits and qualitative characteristics that define "in-spec" product);
4. Identification of product references that represent "in-spec" and "out-of-spec" production; and
5. Identification of the action standard that will be used to determine product disposition based on the results of the "in/out" panel.
1. Critical attributes. After reviewing the results of the survey (Fig. 5.3), management defines the following to be the critical strawberry yogurt attributes:
a. Integrity of fruit pieces;
b. Strawberry flavor intensity (complex);
c. Fermented fruit; and
d. Lumpiness.
2. "In/out" limits. The tolerable limits of each critical attribute are set (Fig. 5.5).
[Figure 5.5 marks, on the 0-to-15 scales, the "in/out" limit for each critical attribute (fruit pieces integrity, strawberry flavor, fermented fruit, and lumpiness); circles mark the product references selected.]
FIGURE 5.5. Establishment of "in/out" specification and selection of product references.
Therefore, management may choose to set the action standard based on "in/out" evaluations as, "Accept any batch of product for which P ≥ 50% and reject any batch for which P < 50%." Alternatively, in recognition of the variability of the panel responses, management may choose to create a "gray zone" around the cut-off point with an action standard such as, "Accept any batch for which P > 60% and reject any batch for which P < 40%. Submit batches for which 40% ≤ P ≤ 60% for additional testing (e.g., descriptive analysis)." The selection of an action standard is a management decision. It is not the sole responsibility of the sensory coordinator. For the strawberry yogurt program, a 50-percent cut-off point for P was selected.
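Because the action standard is applied mechanically to every batch, it can help to record it as an explicit rule. The short Python sketch below encodes the two alternatives just described (a single 50-percent cut-off, and a 40/60-percent "gray zone" that routes borderline batches to additional testing); the function names and default thresholds are illustrative, not part of any published procedure.

def disposition_single_cutoff(p, cutoff=50.0):
    """Single cut-off action standard: accept if P >= cutoff, otherwise reject."""
    return "accept" if p >= cutoff else "reject"

def disposition_gray_zone(p, accept_above=60.0, reject_below=40.0):
    """Gray-zone action standard: accept if P > 60%, reject if P < 40%,
    otherwise submit the batch for additional testing (e.g., descriptive analysis)."""
    if p > accept_above:
        return "accept"
    if p < reject_below:
        return "reject"
    return "additional testing"

# Example batch results: 74.3 and 44.8 appear in Table 5-8; 55.0 is hypothetical.
for p in (74.3, 44.8, 55.0):
    print(p, disposition_single_cutoff(p), disposition_gray_zone(p))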
Descriptive Analysis
Several of the product references identified in the management meeting (Fig. 5.5, circles) are evaluated by a descriptive panel managed by the consulting firm. The products evaluated are those samples that represent the "in/out" limits of all critical sensory attributes. For example, to document the "in/out" limits for strawberry
[Figure 5.6, panels (a) and (b), plots the percentage of panelists evaluating a product as "in" (0 to 100 percent) against product position along the continuum from "extremely out," through the in/out boundary, to "extremely in."]
FIGURE 5.6. The ideal (a) and actual (b) response curves for "in/out" evaluations.
APPEARANCE
Watery surface 1.1 0.5
Color chroma 8.2 7.3
Color intensity 6.5 7.0
Fruit pieces/amt 5.2 6.0
Fruit pieces integrity 3.7 2.7
Lumpiness 7.9 6.1
FLAVOR
Strawberry complex 6.5 4.4
Fresh strawberry 2.5 0.5
Cooked strawberry 4.1 4.0
Fermented fruit 0 2.5
Dairy complex 5.8 4.5
Cultured 4.1 3.4
Milky 2.3 1.5
Butterfat 1.2 0.5
Unripe cultures 0 0
Sweet 8.5 8.0
Sour 6.1 5.0
Astringent 4.3 4.0
TEXTURE
Lumpiness 7.5 6.0
Firmness 6.7 5.5
Cohesiveness 4.1 3.5
Fruit awareness 4.5 4.0
Mixes with saliva 9.5 9.2
Cohesiveness of the mass 3.3 3.0
Lumpiness of the mass 4.7 3.5
Dairy film 4.2 4.0
Residual fruit pieces 2.6 2.0
are to match the baseline results. References have to be obtained periodically for
panel calibration.
Panel Selection
Plant employees are recruited to participate in the program. Prescreening and
screening phases are completed to select the panel members. Optimally, 25 or more
panelists are selected. However, smaller panels are usually formed in small plants.
The use of larger panels permits a sensitive analysis of the resulting data, while
smaller panels yield highly imprecise "in/out" measurements, such that the practical value of the approach comes into question (see "Data Analysis" on pg. 161 in this chapter).
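The effect of panel size on precision can be previewed with the standard error formula developed under "Data Analysis" later in this chapter, S = √[P(100 - P)/n]. A minimal sketch, assuming the worst case of P = 50 percent:

import math

def std_error(p, n):
    """Standard error (in percentage points) of the percent-'in' estimate
    from a panel of n respondents: S = sqrt(P(100 - P)/n)."""
    return math.sqrt(p * (100.0 - p) / n)

for n in (10, 15, 25, 50, 100):
    print(f"panel of {n:3d}: S = {std_error(50.0, n):.1f} percentage points")
# A 10-person panel gives roughly 15.8 points, 25 gives 10.0, and 100 gives 5.0.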
Training Program
The main advantages of the "in/out" training program are its simplicity and short
time to complete. The total time required depends on the type of product in the
program and varies between 5 and 10 hours.
As in other QC/sensory programs, the two training steps are the basic sensory
evaluation phase and the product training phase. In the basic sensory evaluation
phase the panelists are familiarized with the applications of sensory evaluation, its
importance within the company, and the basic concepts of physiology that play a
role in the evaluation of the company's products (e.g., appearance, flavor, texture).
The product training phase focuses on teaching the panelists the "in" and "out"
product concepts (i.e., the characteristics of "in" and "out" products).
While the comprehensive descriptive (Chapter 3) and quality ratings programs
(Chapter 4) require a detailed training in both the qualitative and quantitative
aspects of attributes, the "in/out" training program requires no specific attribute
training, since no information on product attributes is collected. Only a very
general discussion of product attributes is held to allow panelists to focus on those
attributes that determine "in-spec" and "out-of-spec" product. The five steps
followed in this training program are:
1. A discussion of the nature of "in/out" judgements (i.e., methodology to fol-
low);
2. The identification of the critical attributes that are to determine "in" or "out"
production;
3. The review of each "in/out" attribute limit;
4. The integration of all concepts into defining the "in" and "out" product
concept; and
5. The evaluation of references and regular production samples (practice period).
This process is illustrated for the strawberry yogurt example.
Table 5-4. List of all strawberry yogurt sensory attributes and critical attributes (*) chosen
by management
• The identification of the attribute level that defines the limit between "in" and "out" product (i.e., "in/out" specification); and
• The practice of "in/out-of-spec" judgements for that attribute using "in-spec" and "out-of-spec" products. This process is illustrated for one of the strawberry yogurt attributes, fermented fruit, in Table 5-5.
Practice Period
One to three practice sessions, as described above, are scheduled after the training is completed. In these practice sessions, samples are only identified with three-digit
Table 5-5. Approach followed in the training of "in/out" judgements per attribute
1. Definition: The aromatic associated with overripe and soured fruit resulting from fermentation.
3. "In/Out" limit
Code Intensity
128 2.0
Panel Maintenance
The "in/out" QC/sensory program also requires a maintenance program that
addresses psychological and physiological considerations. Chapter 3 summarizes
some of the activities involved in the maintenance program. Among the psycho-
logical aspects to be addressed are recognition and appreciation of the panelists' participation, planning special group activities, rewards, panel performance feed-
back, and panel review sessions.
For some nonperishable products, a large amount of the products that represent the
"in/out" limits can be obtained initially, stored under controlled conditions for a
long time, and used as needed. For most products, however, replacement product
references need to be produced/identified on a regular basis. To properly adminis-
ter the acquisition and use of "in/out" references, the sensory coordinator needs to
use the descriptive data base gathered at initial stages of the program (pg. 147 or
151), and a descriptive panel that can characterize the sensory properties of the
replacement products considered for "in/out" references.
The review of "in/out" references allows panelists to recall the product criteria
to judge if products are "in-specification" or "out-of-specification." This practice
also decreases within panelist variability.
• The evaluation of blind references. Products known to be "in spec" and others known to be "out of spec" (blind controls) are presented in the set to monitor that panelists mark the samples appropriately. For instance, Table 5-6 summarizes an early panelist monitoring study on the yogurt "in/out" panel. Panelists 5 and 22 are overly critical, marking many "in-spec" samples as "out"; conversely, panelists 21, 25, and 28 are too lax, marking many of the "out-of-spec" products as being "in"; and panelists 11 and 16 exhibit a low level of agreement for both "in-spec" and "out-of-spec" samples. These findings indicate the need to schedule sessions to review the "in/out-of-spec" limits. A simple version of this monitoring check is sketched below.
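A minimal sketch of such a check follows, assuming (as the discussion of Table 5-6 suggests) that each panelist is summarized by two numbers: the percentage of blind "in-spec" references called "in" and the percentage of blind "out-of-spec" references called "out." The 60-percent flagging threshold is an illustrative assumption, not a published criterion.

def classify_panelist(pct_in_correct, pct_out_correct, threshold=60.0):
    """Flag a panelist from percent agreement with blind reference samples.

    pct_in_correct  -- % of blind "in-spec" references the panelist called "in"
    pct_out_correct -- % of blind "out-of-spec" references the panelist called "out"
    The 60% threshold is illustrative; each program sets its own criterion.
    """
    low_in, low_out = pct_in_correct < threshold, pct_out_correct < threshold
    if low_in and low_out:
        return "low agreement on both - schedule review of the limits"
    if low_in:
        return "overly critical (marks 'in-spec' product as 'out')"
    if low_out:
        return "too lax (marks 'out-of-spec' product as 'in')"
    return "acceptable"

# Three panelists from Table 5-6: (panelist, % "in-spec" correct, % "out-of-spec" correct)
for pid, pct_in, pct_out in [(5, 47, 91), (21, 92, 48), (16, 41, 47)]:
    print(pid, classify_panelist(pct_in, pct_out))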
Data Collection
Samples to be evaluated by the plant panel are collected at the same time as the
samples that will undergo other QC testing. The frequency of sampling (i.e., the
Table 5-6. Panelist monitoring results on level of performance with hidden reference samples

Panelist    Percent agreement, blind "in-spec" references    Percent agreement, blind "out-of-spec" references
1           71                                               82
2           78                                               76
3           80                                               77
4           73                                               75
5           47                                               91
6           83                                               79
7           80                                               81
8           72                                               76
9           75                                               79
10          84                                               81
11          49                                               46
12          77                                               73
13          83                                               80
14          73                                               76
15          71                                               74
16          41                                               47
17          78                                               72
18          81                                               83
19          79                                               81
20          84                                               85
21          92                                               48
22          44                                               91
23          84                                               71
24          83                                               76
25          90                                               48
26          75                                               77
27          81                                               72
28          93                                               49
29          79                                               81
30          77                                               78
number of pulls per batch) is determined using the same QC criteria that apply to
all other tests. For example, the strawberry yogurt is sampled three times (early,
middle, and late) in the packaging step of each production batch.
Unlike other QC/sensory methods, there are situations in the "in/out" method
that may require that each panelist evaluate a given sample of product several
times. The number of times each panelist evaluates a single sample is determined
by the level of precision that the sensory coordinator requires of the data. The
issues influencing the sensory coordinator's decision are discussed in detail in the
following section on data analysis. For the strawberry yogurt, the sensory coordi-
nator has determined that each of the three production samples (early, middle, and
late) must be evaluated four times in order to achieve the necessary level of
precision. Four separately coded samples from each of the three production
samples are prepared and presented to the panelists, yielding a total of 12 samples
to be evaluated. A complete block presentation design is used. The results of the
respondents' "in/out" evaluations for batch A3-0028 of strawberry yogurt are
presented in Table 5-7. The table is arranged for convenient data analysis that takes
into account the multiple evaluations of each production sample.
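A complete block presentation can be generated with a few lines of code. The sketch below uses the twelve sample codes from Table 5-7 and simply randomizes the serving order independently for each panelist; a real program might additionally balance serving positions across panelists, which is not shown here.

import random

# Three production samples (early, middle, late), each served four times under
# different random three-digit codes -- the twelve coded samples of Table 5-7.
coded_samples = {
    "early":  [928, 532, 174, 825],
    "middle": [692, 773, 324, 418],
    "late":   [992, 586, 273, 401],
}

def complete_block_orders(coded_samples, n_panelists, seed=1):
    """Complete block design: every panelist receives every coded sample,
    each panelist in an independently randomized serving order."""
    rng = random.Random(seed)
    all_codes = [code for codes in coded_samples.values() for code in codes]
    orders = {}
    for panelist in range(1, n_panelists + 1):
        order = all_codes[:]
        rng.shuffle(order)
        orders[panelist] = order
    return orders

orders = complete_block_orders(coded_samples, n_panelists=25)
print(orders[1])   # serving order of the twelve coded samples for panelist 1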
Table 5-7. Example of a tally sheet for the "in/out" method for a batch of strawberry yogurt (I = "in," 0 = "out")
Batch: A3-0028
Sampling period:        Early                  Middle                 Late
Samples:          928  532  174  825     692  773  324  418     992  586  273  401
Respondent
1 I I I I 0 0 I I I I
2 0 I I I 0 I 0 I 0 I
3 0 I I 0 I I I I I I
4 I I I I I I I 0 I 0
5 I 0 I 0 I 0 0 .I I I
6 I I I 0 I I I I I 0 I I
7 I I I I I 0 0 0 I I 0 I
8 I 0 I 0 I 0 I I I 0 I I
9 I I I I I I 0 0 I I I 0
10 I I I I I 0 I 0 I I 0 0
11 I I I I 0 I I I 0 I I I
12 I I 0 I I 0 0 0 I I 0 I
13 0 I I I I I I I I I I I
14 I I I I 0 0 I 0 I I 0 I
15 I I I 0 I I 0 0 I I I I
16 I 0 I I 0 0 I 0 I I I I
17 I I I I I I I I I I I I
18 0 I I I I I I I I 0 I 0
19 I I I 0 I 0 0 0 I 0 I I
20 I I I I 0 I I I I I I I
21 I I I I I 0 0 0 I I I I
22 I I I 0 I I I I 0 I 0 I
23 0 I I I I I 0 0 I 0 0 I
24 I 0 I I 0 I I I I I I 0
25 I I I 0 I 0 I 0 0 I I I
Number"in" 22 20 23 19 18 14 16 12 21 20 18 20
SampleP 88 80 92 76 72 56 64 48 84 80 72 80
Period P 84 60 79
BatchP Average = 74.3
Range= 24.0
Data Analysis
The statistical treatment of the "in/out" responses is straightforward. The proportion of respondents who rate the sample as "in" (P) is used to estimate the proportion of all possible respondents who would find the product to be within its standard frame of identity. For a 25-person panel, the standard deviation of P is:

S = √[ P(100 - P) / 25 ]

which for P = 50% yields S = 10%. This is an extremely large measurement error for a QC method. The precision could be increased by increasing the size of the panel. However, doing so has limited practical value. The number of respondents would have to be increased to 100 to double the precision (i.e., S = 5%) versus a
25-person panel. Another way to obtain the same increase in precision is to have
each panelist evaluate four samples of product from the same batch (or if within-
batch sampling is being done, to evaluate four samples of product from the same
sampling period). P would be computed for each sample, and the average of the
four P values would be reported as the panel result. The sample standard deviation
of the average of four evaluations is half that of a single evaluation (see Appendix
1), since:
S4 = S/√4 = S/2

The percentage of panelists rating each coded sample "in" is computed for each evaluation (labeled "Sample P" in Table 5-7). The four P values for each production sample (i.e.,
early, middle, and late in the batch) are then averaged to yield a "Period P"
corresponding to the time period in which the production sample was obtained.
The "Period P" values in Table 5-7 are the raw QC/sensory data for each batch of strawberry yogurt; that is, the overall batch average P and the range of P values for the batch are computed from the three "Period P" values (see Table 5-7).
The P values should be recorded in a format consistent with that used for other
QC tests.
Specifically, if multiple samples are collected within each batch, the batch
average and range (i.e., the average and range of the Period P values) are computed
and recorded (see Table 5-8). If only a single sample per batch is collected, the
single P value (an average, if multiple evaluations are performed) is the batch
rating; no measure of within-batch variability is available.
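The arithmetic from tally sheet to batch rating is summarized in the sketch below, which starts from the "Number in" counts of Table 5-7 (starting from the raw I/0 tallies would only add a summation step). It reproduces the Sample P, Period P, batch average, and range values shown in that table.

# "Number in" counts (out of 25 panelists) for the twelve coded samples of
# Table 5-7, grouped by sampling period.
number_in = {
    "early":  [22, 20, 23, 19],
    "middle": [18, 14, 16, 12],
    "late":   [21, 20, 18, 20],
}
N_PANELISTS = 25

# "Sample P": percent of the panel rating each coded sample "in."
sample_p = {period: [100.0 * n / N_PANELISTS for n in counts]
            for period, counts in number_in.items()}
# "Period P": average of the four Sample P values for each sampling period.
period_p = {period: sum(values) / len(values)
            for period, values in sample_p.items()}
# Batch summary: average and range of the three Period P values.
batch_average = sum(period_p.values()) / len(period_p)
batch_range = max(period_p.values()) - min(period_p.values())

print(sample_p["early"])                      # [88.0, 80.0, 92.0, 76.0]
print(period_p)                               # early 84.0, middle 60.0, late 79.0
print(round(batch_average, 1), batch_range)   # 74.3 24.0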
Report System
The sensory coordinator should highlight any samples that fall "out-of-specifica-
tion," using the company's standard procedures for such occurrences. For example,
the management team for the yogurt manufacturer has chosen a 50-percent cut-off
point for P. Batches of product with an average P less than 50 percent are rejected.
The current batch of strawberry yogurt (A3-0028) has a batch average P = 74.3%
and, therefore, is acceptable from the QC/sensory perspective. However, several
recently produced batches had unacceptably low P values. The sensory coordina-
tor has highlighted those batches that fail to meet the action standard and submitted
the results for management review and action (see Table 5-8). The key factor to
bear in mind is that the P should be treated in exactly the same manner as any other
analytical measurements being collected for QC. All standard company formats,
reporting procedures, and action standards apply.
If an SPC program is in effect, control charts for the P response should be updated each time a new batch of product is evaluated.
Table 5-8. Results on several production batches of strawberry yogurt using the "in/out" method

             Percent "In"
Batch        Average    Range
A3-0016      65.6       20
A3-0018      59.8       19
A3-0020      58.4       18
A3-0024      42.6*      23
A3-0026      44.8*      17
A3-0028      74.3       24

*Out of specification (Percent "In" < 50%)
1. Program growth:
a. Introduction of new products;
b. Introduction of other points of evaluation in the process;
c. Introduction of new critical attributes to be included in the "in/out" guide-
lines;
d. Design and execution of research projects; and
e. Improvement of sensory methods (modification of the "in/out" method or
introduction of new sensory methods).
[Figure 5.7, X-chart: the batch-average Percent "In" plotted against observation (batch), with upper and lower control limits (UCL, LCL); R-chart: the within-batch range of Percent "In" plotted against observation (lot), with an upper action limit (UAL).]
FIGURE 5.7. The X-chart and R-chart for batches of strawberry yogurt evaluated using the "in/out" method. Note the six consecutive decreasing batches that indicate that the process was "out of control."
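One conventional way to compute the limits for charts like those in Figure 5.7 is with standard Shewhart X-bar/R formulas, treating the three Period P values of each batch as a rational subgroup. The sketch below uses the usual tabulated constants for subgroups of three (A2 = 1.023, D3 = 0, D4 = 2.574) and, purely for illustration, the six batches of Table 5-8; in practice the limits would be based on a longer in-control history.

# Shewhart X-bar / R chart limits for the batch-average P, using the three
# "Period P" values of each batch as a rational subgroup (subgroup size n = 3).
A2, D3, D4 = 1.023, 0.0, 2.574   # standard constants for subgroups of 3

def xbar_r_limits(batch_means, batch_ranges):
    """Return (X-chart centre, LCL, UCL) and (R-chart centre, LCL, UCL)."""
    xbarbar = sum(batch_means) / len(batch_means)
    rbar = sum(batch_ranges) / len(batch_ranges)
    x_limits = (xbarbar, xbarbar - A2 * rbar, xbarbar + A2 * rbar)
    r_limits = (rbar, D3 * rbar, D4 * rbar)
    return x_limits, r_limits

# Batch-average P and within-batch range of Period P (the six batches of Table 5-8)
means  = [65.6, 59.8, 58.4, 42.6, 44.8, 74.3]
ranges = [20.0, 19.0, 18.0, 23.0, 17.0, 24.0]
(x_centre, x_lcl, x_ucl), (r_centre, r_lcl, r_ucl) = xbar_r_limits(means, ranges)
print(f"X-chart: centre {x_centre:.1f}, LCL {x_lcl:.1f}, UCL {x_ucl:.1f}")
print(f"R-chart: centre {r_centre:.1f}, LCL {r_lcl:.1f}, UCL {r_ucl:.1f}")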
OTHER VERSIONS
"In/Out" Judgement with Descriptive Data
The concepts of a descriptive method (attribute evaluation) can be incorporated
into the "in/out" program. The rating of key attributes is incorporated to provide
information on specific attribute variations and on the reasons why a given product
was considered "out-of-spec" (see Fig. 5.8).
The advantage of this modified program is the collection of additional product
information, which is helpful in making decisions on product disposition. The
disadvantage is that the program becomes more complex, requires additional
training and resources, may increase the time to complete evaluations, and may
decrease the number of samples evaluated in a session. Specifically, the panel
needs to be trained in the detection and description of attributes, and attribute
references need to be collected and maintained.
The incorporation of descriptive concepts into the "in/out" method is not
recommended for those cases in which a simple and quick evaluation is needed
(e.g., simple raw ingredients). Rather, this modified program may be valuable
in the evaluation of more complex products, for which the additional descrip-
tive information is helpful in defining the way production varies. Some exam-
ples include scented products (e.g., laundry detergents), lotions and other
personal care products, and more complex raw ingredients or food products
(e.g., chocolate).
This modified "in/out" program tends to become a descriptive method in the
long term, especially as the panel becomes well trained and skilled. As panelists
improve their skills in focusing on and rating individual attributes, they may encounter difficulties in integrating all the attribute information into a single judgement ("in" or "out of spec").
Name ____________
Product code _____
• INSTRUCTIONS
Please look at and smell this product and rate the intensity of the attributes listed below using the scale
0 -----> 10
none     extreme
Attribute        Intensity
Chroma
Floral
Citrus
Animal/base
Plastic
• "IN/OUT" ASSESSMENT
Indicate if the product is "in" or "out" of specification based on your assessment of the individual characteristics.
Mark one.    "IN" [ ]    "OUT" [ ]
FIGURE 5.8. Evaluation form used for "in/out" judgement with attribute ratings (scented laundry detergent).
trained). However, this approach is advantageous and efficient once the program
is in operation.
In this approach, the "in/out" panel operates routinely as described in this
chapter Products are judged to be either "in-spec" or "out-of-spec." Products
found "out-of-spec" could be put on hold until the second evaluation is com-
pleted by the descriptive panel. This evaluation consists of a brief descriptive
characterization of the questionable products. For scented laundry detergent,
the attributes evaluated could be chroma, floral, citrus, animal/base, plastic, or
other attributes.
The advantages of this approach are:
• The two panels are used in the most efficient manner. The least expensive
sensory tool ("in/out" panel) is used routinely, while the more expensive sensory tool is used only on special occasions, when required.
• Descriptive information is obtained on rejected products, which helps explain
the reasons why that product is "out-of-specification" and provides guidance to
correct problems.
6
Difference-from-Control Method (Degree of Difference)
ABSTRACT
The difference-from-control method (also called the degree of difference method)
utilizes a standard overall difference method of rating to determine how much any
production sample varies from a control product. This method requires a consistent
easy-to-hold or easy-to-reproduce control. A panel of subjects is trained briefly to
recognize and rate samples that represent an array of product differences from the
control. In some cases, attributes can be rated in addition to the overall difference
from control. These attribute ratings can be in the form of intensity ratings or
difference from control for the attribute.
The main advantage of this method is the simplicity of a single overall rating
that allows for rapid decisions in regard to disposition of daily production. The
addition of attribute ratings provides additional information as to sources of the
differences.
This program's disadvantage is that the overall difference rating for a production sample does not provide enough information about the source of the differ-
ences to allow for the correction of raw materials or process. Even the inclusion of
a subset of attributes may be insufficient to track all variability sources.
PROGRAM IN OPERATION
When a difference-from-control method (also called degree of difference) is
implemented as a quality control tool, the intent is to track production at one or
more plants in terms of the amount that any finished product or raw material differs
from a control or standard for that product or material. This overall difference test
method is employed to measure the degree of difference and accept product that
168
falls below some critical difference value and is closer to the control, and to reject
product that falls above the critical difference value.
Panelists are trained to recognize the control, to recognize an array of products,
which differ along a continuum from the control, and to rate those differences using
a rating scale. Figure 6.1 shows different forms of difference from control rating
scales. Table 6-1 shows difference-from-control ratings for four samples. Each
score is based on the mean value for 25 panelists. One sample, a blind control, has
the lowest rating, which means it was perceived as closest to the control. This is a
check of internal validity of the panel and serves as a measure of the effect of
asking the difference question (placebo effect).
QC management uses these data to decide on product disposition, based on
previously set criteria for the product. For example, if a difference larger than 5.5
on a 10-point scale is considered out-of-spec, the product 819 is held for rework,
while products 476 and 623 are released for distribution (Table 6-1). At no time are
the panelists aware of the accept/reject criteria for any product.
[Figure 6.1 shows two forms of the difference-from-control rating scale: an unstructured line scale anchored "no difference" at the left and "extreme difference" at the right, and a category scale with the instruction "Using the scale below, indicate with any whole number the degree of difference": 0 = no difference, 20 = very slight difference, 40 = slight difference, 60 = moderate difference, 80 = large difference, 100 = extreme difference.]
[Table 6-1 columns: Sample Code, Sample Identification, Difference-from-Control Rating.]
In this chapter, two cases are described, for which the difference-from-control
method is recommended. In both cases, the variation from control tends to be along
a single underlying dimension. Case 1 involves a product (air freshener) with a
control that is consistent across each unit and can be held consistent over a long
period of time. Regular production tends to vary from the control through a loss of
fragrance intensity accompanied by an increase in the odor of the carrier or "base."
Case 2 involves a heterogeneous product (bran flakes with raisins), for which
the control is not uniform from box to box in amount and texture of raisins and
uniformity of the flake size.
In order to account for this box-to-box variability for both the control and for
any set of production samples, it is necessary to use 1) the equivalent of a blind
control (a sample from the same batch as the control), 2) a secondary control (a
sample from a different batch of control product), and 3) the production sample(s)
(Aust et al. 1985).
In some cases, the QC/sensory program management decides to include some
additional ratings on specific attributes in order to have more information on each
product tested.
These steps are similar to the approach followed for the comprehensive descrip-
tive method (Chapter 3), except for steps 3 and 7, which involve consideration of
the size of the difference from control rating as the major criterion for setting
specifications.
The two cases presented above (the air freshener and bran flakes with raisins)
are used as examples to illustrate the steps. The air freshener has a control, which
is very consistent from container to container and over time, and variation in
production that corresponds to one underlying dimension-loss of fragrance. The
bran flakes and raisin cereal has variability from box to box of the control and
production samples and a need for a measure of changes across a few attributes.
Modified Spectrum Method (Meilgaard et al. 1987), or rate the degree of differ-
ence for those attributes, as will be discussed in the bran flakes with raisins
example.
Descriptive Analysis
Although the focus of this method is an overall assessment of differences that can
be rated by a descriptive panel, descriptive characterization provides added infor-
mation in terms of:
Table 6-2. Attributes for characterization of air freshener and intensity ranges for production survey
[Columns: Intensity Ratings for Control; Range for Survey.]

Table 6-3. Descriptive analysis of control and intensity ranges for production survey

                              Intensity Ratings for Control    Range for Survey
Appearance
  Uniformity of flake size    8.6                              6.1-10.5*
  Color intensity             8.2                              8.0-10.2
  Chroma                      4.1                              2.3-4.8
  Amount of raisins           6.5                              5.0-7.8*
  Size of raisins             8.1                              8.0-8.4
Flavor
Texture-Raisin
  Firmness                    5.1                              4.0-7.1*
  Cohesiveness                10.4                             10.0-11.2
  Moistness                   12.3                             10.0-12.9
in Case 2 provides the documentation of the control and references for range of the
difference as in Case 1. In addition to this information, which describes the
differences in the major underlying dimension (effect of heat processing on the
flakes), the descriptive data provides information regarding the references that
describe differences in other specific attributes. These other attributes include
uniformity of flake size, amount of raisins, and raisin firmness. The asterisks in
Table 6-3 indicate that these are characteristics that vary more than three scale
points of the rating scale.
Sample Selection
The analyses used for sample selection are described in "Sample Selection" on
page 60 in Chapter 3. Some additional analyses can be done to account for the
importance of the difference from control ratings.
Following these steps, 6 products are selected for the air freshener consumer
test and 13 products are selected for the cereal consumer test.
[Figure 6.2: consumer overall acceptability (y-axis) plotted against the overall difference-from-control rating (x-axis, 0 to 10).]
the presentation of the relationships should proceed from the simple to the complex.
FIGURE 6.3. Difference-from-control for attributes vs. consumer acceptance for attributes.
sample (10). The relationship of these samples, their difference ratings, and their
effect on consumer responses are discussed.
Management may choose to include some attributes that it considers critical or essential to product integrity, even if they were not high in variability and/or did not have a dramatic effect on consumer responses. Alternatively, management may choose to eliminate certain attributes (uniformity of flake size in Case 2, the bran flake and raisin cereal) because it does not have the capacity to control the variable.
Setting Sensory Specifications For the difference-from-control method,
management's primary consideration is the relationship between consumer accep-
tance and overall difference from the control. In both cases (air freshener and bran
flake and raisin cereal), a critical value for consumer acceptance needs to be
selected, based on past consumer research data.
In the case of the air freshener, Figure 6.2 shows that a consumer acceptance rating of 6.5 is considered the critical value. This value corresponds to a difference rating of 5.0. Having reviewed the samples representing difference ratings of 2, 4, 6, 8, and 10, management decides to set 4.1 as the difference rating that is to be considered too large to ship, the reject point.
Discussions regarding which attributes to include are based on several
factors. For uniformity of flake size, the size of difference does not influence
liking, and that attribute is dropped. Since raisin counts are done regularly in
the QC lab, the "amount of raisin" attribute is eliminated from the sensory
evaluation. Specification is set for difference ratings on raisin firmness, since
it is a variable attribute and affects consumer responses. In addition, the
attribute of flake chroma is to be included, since the descriptive intensity data
is highly correlated with overall liking (Fig. 6.4). Difference from control
references can be developed from the samples in the consumer research study.
In addition to deciding on the critical difference and any additional attributes
to be evaluated, management should also confirm the array of samples that
represent the various degrees of difference from the control. These samples
then can be used as training references.
Final Evaluation Form After the management meeting, the training pro-
gram is scheduled and the evaluation form to be used in product evaluation at
the plants is developed. Figure 6.5 shows the evaluation form to be used to
determine degree of difference for the air fresheners. In Figure 6.6, the form used for overall difference and selected attribute differences is shown for the bran flake and raisin cereal.
[Figure 6.4: consumer overall acceptability (y-axis) plotted against flake chroma intensity (x-axis).]
samples that vary to different degrees from the control, but also, in some cases,
samples that represent variations in specific attributes. After evaluating this prod-
uct array and the corresponding difference ratings, management determines the
critical point (the degree of difference from the control) above which product is
considered out of specification.
Training Steps
The operation of a difference-from-control method requires the selection and
training of a panel to evaluate production or raw materials in relationship to the
sensory properties of a control.
The panel may be selected from among employees or from residents of the local
community. Advantages and disadvantages of each approach are discussed in
"Training and Operation of Plant Panels" on page 84 in Chapter 3.
Instructions:
Test samples in the order shown below.
Test the control first before each coded sample.
Uncap the container for 1 to 2 seconds and sniff lightly.
Rate the difference in overall fragrance from the control.
Use the scale shown below.
0 = no difference
1
2
3
4
5
6
7
8
9
10 = extreme difference
FIGURE 6.5. Final evaluation form, difference-from-control-air freshener.
Instructions:
Test samples in the order shown below.
Test the control first and before each coded sample.
Rate the attribute differences for one sample.
Then rate the overall difference from control for that sample.
Use the rating scale shown below.
Difference-from-Control
Sample Code    Flake Chroma    Raisin Firmness    Overall Diff.
0 = no difference
2 = very slight difference
4 = slight difference
6 = moderate difference
8 = large difference
10 = extreme difference
FIGURE 6.6. Final evaluation form, difference-from-control-bran flake with raisin cereal.
In the product training phase, panelists are trained to recognize and rate a
variety of differences from an identified control that is kept as constant as possible
throughout both the training and testing phases.
Training Program
The training program for the difference-from-control method involves two stages:
• A basic sensory phase that introduces the panelists to basic sensory evaluation definitions, physiology, and applications of sensory techniques within the company in general and with respect to quality measures at the plants.
• The difference from control method, which includes learning to detect and rate
the size of the differences from the control product overall, and, in some cases,
for specific attributes. This stage also includes training for any special evaluation
techniques (sniff, chew, manipulate) that will insure consistency across panel-
ists.
For the air freshener example, the product training phase involves familiarizing
the panel (30+) with the product control. One essential aspect of the training
process is to provide the panelists with enough exposure to the control, so it is
easily recognized. Therefore, the first and primary step is to introduce the control
product and provide some sensory cues to the panelists to improve on both
recognition and sensory memory for the control. Table 6-4 shows the information
given to the panel, along with the sample to aid in the learning process.
After the panelists have had an opportunity to become somewhat familiar with
the control and the evaluation procedure, difference from control references can be
introduced in pairs along with the control to demonstrate the types and size of
differences that panelists may be expected to encounter (see Table 6-5) in normal
Sample Control
Procedure for evaluation: Hold container 1 inch from nose; remove cap for 1 to 2 seconds only; sniff
contents of container using shallow sniffs. Use this procedure for all samples.
Difference-from-Control = 0
Total Fragrance moderate to strong
Lemon
Floral
Vanillin/sweet
Base Odor very slight
Waxy
Solvent
Difference-from-Control Scale
0 =no difference
2
4
6
8
10 = extreme difference
product evaluation. The descriptions are not intended to teach the panelists to be
descriptive but are intended to provide cues to detecting and remembering what
types of differences correspond to certain difference ratings.
After the panelists have evaluated several "known" references with identified
difference ratings and descriptive cues, they are given a series of unknowns to
compare to the control and rate for overall difference. Included in this series of
"test samples" should be a blind control that acts to measure the placebo effect in
the difference from control test. The placebo effect is a measure of the degree of
difference that the experimenter gets simply from asking the difference question.
The real test of difference is the ability of the panel to separate any reference
(different) sample from the blind control. (See "Establishment of a Data Collec-
tion, Analysis, and Report System" on pg. 189 in this chapter for a discussion of
data analysis for degree of difference.)
The training process involves providing enough known references and controls
to the panelists so that eventually they are able to feed back the same or almost the
same ratings for those references and the control itself, when they are presented as
coded samples to be evaluated.
The bran flake and raisin cereal requires more training because evaluation
includes difference-from-control ratings for two attributes [flake chroma and raisin
firmness], plus the overall difference rating, discussed earlier for the air freshener.
For the cereal, the overall difference ratings are based on differences that are
Difference-from-Control
0 = no difference
2
4
6
8
10 = extreme difference
related to the major underlying variables in the processing of the bran flakes. The
overall differences are manifested in increases in toasted flavor and color intensity
and occasionally some burnt or scorched flavor notes. Once the panelists are able
to provide accurate overall difference ratings for the cereal references, they are
introduced to the references and difference ratings for the flake chroma and raisin
firmness attributes. Tables 6-6 and 6-7 are examples of the training forms used to
introduce each control and two difference references. These include 1) the descrip-
tion intended to help the identification and learning and 2) the procedure used for
evaluation.
Procedures:
• Use a 75-watt incandescent bulb directly over the samples at a 3- to 4-foot distance.
• Sit or stand so the eyes are 2 feet from the sample horizontally and 1.5 to 2 feet above the table surface.
• Set the control bowl and test bowl side by side under the light source.
• Shut off the special bulb between sample chroma ratings.
Procedure
• Compress one whole raisin between molars partially.
• Repeat with another whole raisin and compress fully.
• Chew down raisin until disintegrated.
Control 0 Soft
Moist
Chewy
Slightly springy
As with overall difference, the reference and control are then repeated as coded
samples. Panelists are expected to rate each sample with the difference rating it was
assigned initially.
Note: At no time are panelists ever told the critical values of difference for the
sensory specifications.
Training for each aspect (overall difference or each attribute difference) can
require four hours of training (in addition to the basic sensory training of four hours). Another three to four weeks of practice sessions (one-half hour, three days a week), for a total of about five to six hours of practice, are necessary to insure the ability of each panelist to provide the appropriate rating for the coded (blind) control and reference samples. Regular feedback to the panelists, including tabular and graphical summaries of their performances against the known values of the references, should be provided throughout the training/practice period.
For the air freshener example, the average difference-from-control ratings
of the reference samples are displayed for ten of the trainees in Table 6-8. The
sensory coordinator should examine each panelist's performance versus the
known values of the reference samples in terms of both their average ratings
and their reproducibility (i.e., their standard deviations). Panelist performance
can also be monitored with graphical data summaries. For instance, Fig. 6.7 shows several possible outcomes of a set of panelist ratings of two training samples of the bran flake with raisin cereal. The significance of these patterns is assessed statistically by studying the judge-by-sample interaction effect in an analysis of variance performed on the trainees' data (see Appendix 3). In plot a, all of the panelists agree on the general trend, but differ slightly in their average difference-from-control ratings of the two samples. In plot b, all of the panelists exhibit the same directional trend, but panelists 3, 6, and 7 show a much larger difference between the two samples than the rest of the panel. In plot c, panelists 2 and 7 differ substantially in the pattern of their ratings compared to the other panelists, while panelist 9 shows no difference between the samples. Each of the plots indicates the need for continued training. Plot c is the most severe case, followed by plot b, with plot a representing a minor problem that additional calibration to reference samples should eliminate.
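The judge-by-sample interaction test mentioned above can be run with any standard ANOVA routine; the actual computations intended here are those of Appendix 3. The sketch below is illustrative only: it simulates replicate ratings for nine hypothetical judges (judge 7 deliberately reverses the two samples, the pattern of plot c) and fits a two-way model with interaction using the statsmodels package.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical training data: nine judges each rate the same two training
# samples twice (replicates are needed to test the judge-by-sample interaction).
rng = np.random.default_rng(0)
rows = []
for judge in range(1, 10):
    for sample, true_dod in (("A", 3.0), ("B", 8.0)):
        centre = true_dod if judge != 7 else 11.0 - true_dod  # judge 7 disagrees
        for _ in range(2):                                    # two replicate evaluations
            rows.append({"judge": judge, "sample": sample,
                         "rating": centre + rng.normal(0.0, 1.0)})
df = pd.DataFrame(rows)

# Two-way ANOVA with interaction; a significant judge-by-sample interaction
# points to panelists who rank or space the samples differently.
model = smf.ols("rating ~ C(judge) * C(sample)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))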
Panel Maintenance
In order to assure that the difference-from-control technique continues to properly
function over time, a maintenance program is necessary. Attention is given to both
the physiological and psychological factors discussed in Chapter 3. Psychological considerations, such as panel recognition, rewards, performance feedback, and
routine panel review sessions, address attitudinal aspects of the panel and panelists.
The physiological maintenance addresses performance aspects of the program
and focuses on providing the necessary calibration sessions for panelists to main-
tain their abilities to detect and rate differences in compliance with the training
references. Panelists are not told when a blind control or reference appears during regular sample evaluation.
Occasionally, a series of samples from recent production, which have been
carefully selected to represent two or three size differences from the control, plus
Table 6-8. Panelist performance summary table - average and standard deviation in degree of difference ratings on unidentified reference samples

Actual                                        Panelist
DOD       1          2          3          4          5          6          7          8          Panel
2       2.1±1.3    2.2±1.4    1.9±1.5    2.9±1.7    2.1±1.7    2.6±1.9    2.3±2.0    1.8±1.2    2.2±1.6
4       3.7±1.4    3.8±1.9    4.2±1.5    6.1±2.1    4.1±1.8    3.6±1.6    3.8±1.7    3.7±1.7    4.1±1.9
6       6.0±1.2    6.4±2.0    5.9±1.6    4.9±1.9    6.1±1.4    6.2±1.8    5.9±1.5    6.1±1.5    5.9±1.7
8       7.5±1.8    8.3±1.6    8.2±1.6    7.9±2.0    8.4±1.9    9.2±2.1    7.9±1.3    8.5±1.6    8.2±1.8
10      8.8±1.7    9.1±1.5    9.4±1.3    8.9±1.9    8.9±1.7    9.3±2.0    9.4±1.8    9.6±2.0    9.2±1.8

Entries are: Mean ± Standard Deviation
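The entries of Table 6-8 are obtained by averaging each trainee's repeated ratings of every unidentified reference and pairing the mean with its standard deviation. A minimal sketch, using hypothetical raw ratings (the table itself reports only the resulting summaries):

from statistics import mean, stdev

# Hypothetical repeated ratings by one trainee of hidden references whose
# actual degrees of difference (DOD) are 2, 4, 6, 8, and 10.
ratings = {
    2:  [2, 1, 3, 2, 4, 1],
    4:  [4, 3, 5, 4, 6, 3],
    6:  [6, 5, 7, 6, 8, 5],
    8:  [8, 9, 7, 8, 10, 7],
    10: [9, 10, 8, 10, 9, 10],
}

for actual, obs in ratings.items():
    x, s = mean(obs), stdev(obs)
    bias = x - actual
    print(f"actual DOD {actual:2d}: mean {x:4.1f} +/- {s:3.1f}  (bias {bias:+.1f})")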
[Figure 6.7 has three panels, (a), (b), and (c). In each panel, one line per judge connects that judge's difference ratings of the two training samples, so diverging or crossing lines expose judge-by-sample disagreement; the judge numbers discussed in the text are labeled on the lines.]
FIGURE 6.7. Judge-by-sample interaction plots to assess the level of agreement among panelists during training.
a blind control, are submitted for repeated evaluations. Data from the three or four
samples [blind control + two or three different rated references] are evaluated for
panelist and panel performance. Using the same tabular and graphical summaries
that are used during the training/practice period, the panel leader monitors the
panelists. These summaries are presented to panelists to inform them of their
performance.
In addition, I-charts (see Appendix 5) of each panelist's difference-from-control
ratings (both overall and, when appropriate, for individual attributes) on hidden
reference samples should be maintained. Biased ratings and/or excessive variabil-
ity are readily apparent in such charts.
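One common construction of such an individuals (I) chart places the limits at the panelist's mean rating plus or minus 2.66 times the average moving range; the exact procedure intended here is the one given in Appendix 5, so treat the sketch below, with its hypothetical rating history, as an assumption-laden illustration.

def i_chart_limits(values):
    """Individuals (I) chart limits from a panelist's ratings of a hidden
    reference, using the average moving range (constant 2.66 = 3/1.128)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return centre, centre - 2.66 * mr_bar, centre + 2.66 * mr_bar

# Hypothetical series of one panelist's overall difference-from-control ratings
# of a hidden reference whose assigned value is 4.
history = [4, 5, 3, 4, 6, 4, 3, 5, 4, 7]
centre, lcl, ucl = i_chart_limits(history)
print(f"centre {centre:.2f}, LCL {lcl:.2f}, UCL {ucl:.2f}")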
These data summaries serve as maintenance tools for panelists' attitudes as well
as daily performance. Management and the QC/sensory analyst and coordinator
also have a measure of the panel's and individual panelist's ability.
Data Collection
Samples to be evaluated by the difference-from-control panel are collected at the
same time as the samples that will undergo other QC testing. The frequency of
sampling is determined by using the same QC criteria that apply to all other tests.
For the air freshener, three production samples are pulled from each batch. For the
bran flakes with raisins cereal, only a single sample of product is collected for
evaluation from each batch.
The number of samples submitted to the panel in one session is affected by two
factors:
1. The number of QC samples that are pulled from each batch of production; and
2. The existence of a substantial amount of perceivable variability among control
products.
At each session, the panelists are presented with an identified control, along
with a set of unidentified "test" samples (i.e., samples labeled with random
three-digit codes). The "test" samples consist of any QC production samples
awaiting evaluation and at least one unidentified sample of control product. In the
air freshener example, the difference-from-control panel is presented with an
identified control and four test samples-that is, the three QC samples that were
pulled from the current production batch and one hidden control. There is no
substantial variability with control air freshener product. In the bran flake with
raisins cereal example, each panelist is presented with an identified control and
three test samples-that is, the one QC sample from the current batch and two
hidden controls. The hidden controls are obtained from two different batches of
control product so as to capture any within-control variability that exists (see Aust
et al. 1985). Typically, one of the hidden controls is obtained from the same batch
as the identified control; the other is obtained from a different batch of control.
If the number of samples per session is small, a complete block presentation
design is used. If, however, the number of samples per session exceeds that which
can be evaluated before sensory fatigue sets in, then a modified balanced incomplete block (BIB) presentation design is used. In the modified BIB, all hidden controls are presented in every set of samples (i.e., in each "block"), and the QC samples are presented according to a BIB design (see Gacula 1978). Sensory coordinators using this approach need to consider the number of blocks in the BIB portion of the design because of its impact on the number of panelists required.
Further, the sensory coordinators need to be aware that special data analysis
techniques (those available, for example, in SAS® PROC GLM) need to be used
to summarize the data.
Complete block presentations are used for both the air freshener and bran
flakes with raisins cereal examples. In both cases, each panelist evaluates the
test samples and rates each with a difference-from-control score corresponding
to the perceived difference from the identified control. Only overall difference
from control is evaluated on the air freshener samples. For the bran flake with raisin cereal, the ratings include both overall difference-from-control and difference-from-control on specific, critical attributes. A large pool of respon-
dents is used to evaluate the air freshener samples with the current panel
consisting of 25 panelists. Their difference-from-control ratings for the hidden
control and the three QC production samples are presented in Table 6-9. A
smaller, more highly trained group of respondents make up the QC/sensory
panel for the bran flake with raisin cereal. For the current batch of cereal, eight
panelists evaluate the two hidden controls and the one QC production sample
for overall difference from the identified control, as well as difference-from-
control on the attributes flake chroma and raisin firmness. The panelists' ratings
for the three test samples are presented in Table 6-10.
Data Analysis
The first step in the analysis of data from a difference-from-control panel is to
compute the average difference-from-control values for each of the test samples
(i.e., both QC production samples and any hidden controls) for both overall
difference and any attributes difference scales used. The four sample averages for
overall difference of the air freshener samples are presented at the bottom of the
panel data in Table 6-9. The sample averages for both overall and attribute
differences for the bran flakes with raisin cereal samples are presented similarly in
Table 6-10. For both of the examples, simple arithmetic means are computed
Table 6-9. Difference-from-Control panel data for the hidden control and three production samples of air freshener
2 5 4 4
2 3 3 5 3
3 2 2 4
4 1 5 3 2
5 0 4 2
6 2 4 4 3
7 0 3 3 1
8 1 2 3 2
9 2 3 5 4
10 2 2 1 2
11 3 3 4 5
12 1 3 3 3
13 2 4 3 3
14 1 2 2 4
15 1 2 4 2
16 2 4 3 4
17 0 2 2 2
18 1 5 4 3
19 2 3 2 2
20 3 3 3
21 2 3 4 3
22 2 4 4
23 3 2 2
24 2 3 3 2
25 2 2 3 4
Table 6-10. Difference-from-Control panel data for a production sample of bran flake with raisin cereal

[Eight respondents rated two hidden controls (C1, C2) and the QC production sample (QC) on three scales: overall difference-from-control, flake chroma, and raisin firmness; partial data and the column means are shown.]

                     Overall               Flake Chroma           Raisin Firmness
Respondent       C1    C2    QC        C1    C2    QC         C1    C2    QC
3                0     1     3         0     1     5          1     2     2
6                0     2     4         0     2     6          0     2     4
7                1     2     4         0     2     5          0     1     3
8                0     3     3         0     1     5          0     2     3
Sample Mean      0.3   1.5   3.1       0.1   1.6   4.8        0.1   1.5   3.0
Control Mean        0.9                   0.9                     0.8
For the air freshener, the placebo effect is removed by subtracting the average of the hidden control sample from the averages of each of the QC production samples, as shown toward the bottom of Table 6-9. In situations where within-control variability needs to be considered, such as in the case of the bran flake with raisin cereal, this is done by first computing the average of the two hidden control samples and then subtracting this average from the averages of each of the QC production samples evaluated in the session; that is:

Δ = (QC sample mean) - (mean of hidden control 1 + mean of hidden control 2)/2

as shown toward the bottom of Table 6-10 for each of the three scales used to
evaluate the cereal samples. In situations where multiple hidden controls are used,
the adjustment shown not only eliminates the placebo effect, it also adjusts for the
heterogeneity of the control product by, in effect, computing the difference be-
tween the QC production sample and an "average" control.
The Δ values are the raw QC data for each production sample evaluated using the difference-from-control method. If multiple samples per batch are collected, as is done in the air freshener example, the batch average and range (i.e., the average and range of the panel Δ values) are computed and recorded (see Table 6-11). If only a single sample per batch is collected, as in the case of the bran flakes with raisins cereal, then the panel Δ value is the rating for the batch; no measure of within-batch variability is available (see Table 6-12).
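The adjustment and the resulting batch values can be computed directly from the panel means. The sketch below uses the column means at the bottom of Table 6-10 and assumes, as those means indicate, that the first two test samples on each scale are the hidden controls and the third is the QC production sample; for the air freshener, the same function would be called with its single hidden-control mean.

def delta(qc_mean, hidden_control_means):
    """Difference-from-control adjusted for the placebo effect (and, with two
    hidden controls, for within-control variability): the QC sample mean minus
    the average of the hidden control means."""
    return qc_mean - sum(hidden_control_means) / len(hidden_control_means)

# Panel means from Table 6-10 (bran flake with raisin cereal): for each scale,
# the two hidden control means followed by the QC production sample mean.
panel_means = {
    "overall":         ((0.3, 1.5), 3.1),
    "flake chroma":    ((0.1, 1.6), 4.8),
    "raisin firmness": ((0.1, 1.5), 3.0),
}
for scale, (controls, qc) in panel_means.items():
    print(f"{scale:16s}  delta = {delta(qc, controls):.2f}")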
Table 6-11. Summary of Difference-from-Control ratings for the air freshener product
[Batch-by-batch averages and ranges of the panel Δ values are tabulated here.]

Table 6-12. Summary of Difference-from-Control ratings for bran flake with raisin cereal
[For each batch, the Difference-from-Control (Δ) ratings for overall difference, flake chroma, and raisin firmness are tabulated here.]
Report System
The sensory coordinator should highlight any samples that fall out of specification,
using the company's standard procedures for such occurrences. Additionally, if an
SPC program is in effect, the control charts for each of the Δ responses should be
updated each time a new batch of product is evaluated, as in Figures 6.8 and 6.9.
The updated charts should then be forwarded to appropriate personnel. The key
factor to bear in mind is that the QC/sensory data should be treated in exactly the
same manner as any other analytical measurements being collected for QC. All
standard company formats, reporting procedures, and action standards apply. Once
in operation, a successful QC/sensory function should be a fully integrated com-
ponent of the overall QC program.
FIGURE 6.8. X-chart and R-chart of the overall difference-from-control ratings for the air
freshener process.
FIGURE 6.9. I-charts of the overall, flake chroma, and raisin firmness difference-from-control
ratings for the bran flake with raisin cereal process.
Long-term activities of the ongoing program include:
1. Program growth:
a. Addition of new products;
b. Inclusion of other evaluation points in the process;
c. Design and execution of research projects; and
d. Improvement of sensory method.
2. Development of sensory coordinator.
3. Development of panelists.
OTHER VERSIONS
As with other sensory methods, some variations of the difference-from-control
method are possible by combining the difference-from-control rating with ratings
for attributes.
Appendix 1
Basic Data Analysis Methods
INTRODUCTION
The first step in the analysis of any set of data is to get a sense of the general
location and dispersion (or spread) of the measurements. Both graphical and
tabular methods are available, and both should be used. The basic data
summary methods should be used before any formal statistical data analyses
(e.g., confidence intervals, tests of hypotheses, model fitting) are per-
formed. Examination of the graphs and tables may reveal features of the
data that would be lost in the computation of test statistics and probabilities
(p-values). In fact, features revealed in the graphs and tables may indicate
that standard statistical analyses would be inappropriate for the data at
hand.
Many standard statistical procedures assume that the data are normally
distributed. The p-values used to assess the statistical significance of various
results are computed using this assumption. If the data are not normally
distributed, the p-values are inaccurate. The level of inaccuracy increases as the
measurements exhibit greater departures from normality. There is also a less
theoretical issue involved. Many standard statistical procedures rely on the
sample mean,

X̄ = (Σ Xᵢ)/n,

and the sample standard deviation,

S = √{ [Σ Xᵢ² − (Σ Xᵢ)²/n] / (n − 1) }   (sums over i = 1, …, n),
to summarize the measurements. The sample mean locates the center of the
measurements; the sample standard deviation is a measure of the dispersion of
the data about X̄. For highly skewed or multi-modal data, X̄ does not provide a
good measure of the center of the data. X̄ is sensitive to extreme values in the
data and, therefore, overreacts when the measurements are highly skewed.
Multi-modal behavior may indicate that the data arise from a mixture of several
distributions, each with its own underlying mean value. The single value of
X̄ is meaningless in such situations. Further, since S measures the spread
around X̄, it too is a poor summary measure for skewed or multi-modal data.
The single value of S implies that the data are dispersed symmetrically about
X̄ (not true for skewed data) and that only a single mean exists (not true for
multi-modal data).
GRAPHICAL SUMMARIES
Consider the sweetness intensity data (measured on a 15 cm line scale) of 50
consecutive batches of a powdered soft drink presented in Table A-1. Both the
histogram and the corresponding frequency distribution of these measurements
suggest that the data are relatively symmetric (see Fig. A-1). This observation is
confirmed by the box-and-whisker plot in Figure A-2.
A box-and-whisker plot is an informative graphical summary for medium to
large data sets (i.e., 25 or more observations). The plot is easy to construct because
it is based directly on the observed values and not on any complicated summary
statistics. For instance, the "box" in Figure A-2 is bounded by the first (Q1) and
third (Q3) quartiles of the data-that is, the points at which 25 percent of the data
and 75 percent of the data fall below, respectively. The plus sign in the box is the
Table A-1. Sweetness intensity (cm) of 50 consecutive batches of a powdered soft drink
8.1 7.7 8.0 7.3 7.3 8.7 8.0 8.8 8.7 7.6 8.7
8.5 8.9 7.2 8.4 7.3 9.0 7.6 8.0 8.1 8.1 7.6
8.4 7.7 8.5 7.6 8.0 7.4 7.3 8.1 8.6 7.5 8.1
8.2 7.7 8.7 7.4 6.8 7.5 8.2 7.5 8.0 8.4 8.3
7.5 8.9 9.2 7.4 8.3 8.3
FIGURE A-1. Histogram of the sweetness intensity data from Table A-1.
FIGURE A-2. Box-and-whisker plot of the sweetness intensity data from Table A-1.
second quartile (or median, M), the point at which 50 percent of the data fall below.
Q1 = 7.5, Q3 = 8.4, and M = 8.05 are presented with the summary statistics for the
sweetness intensity data in Table A-2. The "whiskers" that extend beyond the box
on the low and high sides are calculated as:
1. The maximum of L = M − 1.5(Q3 − Q1) and the minimum data value. For the
Table A-2. Summary statistics on the sweetness intensity data in Table A-1

Statistic                  Value
Count (n)                  50
Mean (X̄)                   8.02
Median (M)                 8.05
Standard Deviation (S)     0.56
Standard Error (SE)        0.08
Minimum                    6.8
Maximum                    9.2
First Quartile (Q1)        7.5
Third Quartile (Q3)        8.4
sweetness data, L = 8.05 − 1.5(8.4 − 7.5) = 6.7, and the minimum data value is
6.8 (from Table A-2), so the lower whisker of the box-plot extends downward
from 7.5 to 6.8.
2. The minimum of U = M + 1.5(Q3 − Q1) and the maximum data value. For the
sweetness data, U = 8.05 + 1.5(8.4 − 7.5) = 9.4, and the maximum data value is
9.2 (from Table A-2), so the upper whisker of the box-plot extends upward
from 8.4 to 9.2.
Data points that fall beyond the whiskers are plotted individually.
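A minimal sketch of these box-and-whisker calculations is given below; note that quartile conventions differ among software packages, so Q1 and Q3 may differ slightly from the values in Table A-2, and the whisker rule follows the text above.

import statistics

def box_whisker_stats(data):
    """Quartiles, median, and whisker endpoints as described in the text."""
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4)       # Q1, M, Q3
    lower_fence = med - 1.5 * (q3 - q1)               # L = M - 1.5(Q3 - Q1)
    upper_fence = med + 1.5 * (q3 - q1)               # U = M + 1.5(Q3 - Q1)
    low_whisker = max(lower_fence, xs[0])             # max of L and the minimum value
    high_whisker = min(upper_fence, xs[-1])           # min of U and the maximum value
    outliers = [x for x in xs if x < low_whisker or x > high_whisker]
    return {"Q1": q1, "M": med, "Q3": q3,
            "whiskers": (low_whisker, high_whisker), "outliers": outliers}

# Example with the first ten values of Table A-1
print(box_whisker_stats([8.1, 7.7, 8.0, 7.3, 7.3, 8.7, 8.0, 8.8, 8.7, 7.6]))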
The two "valleys" in the histogram (at midpoints 7.8 and 8.6) in Figure
A-1 hint at the possibility of multi-modal behavior. However, an additional
graphical technique leads to the conclusion that this is not a serious concern.
The normal probability plot in Figure A-3 displays the observed measure-
ments (on the vertical axis) versus the normal deviate scores of n=50
measurements (on the horizontal axis). To construct a normal probability
plot, the measurements are first ranked in increasing order, i = 1 to n. The
ith ranked observation is then plotted versus Pᵢ = Φ⁻¹[(3i − 1)/(3n + 1)], where
Φ⁻¹ is the inverse of the cumulative normal distribution function (see Table
A-3). Many statistical analysis packages provide a programmed function to
compute these values. If the observed data form a straight-line relationship
with the normal scores, as is the case in Figure A-3, it can be concluded that
the measurements arise from a normal distribution. A small number of
observations that fall far from an otherwise straight line may be outliers in
a set of normally distributed data. If the data form a curvilinear relationship
with the normal scores, then they arise from a non-normal distribution (see
D'Agostino, Belanger, and D'Agostino 1990).
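The normal scores can be generated directly. The sketch below assumes SciPy is available and uses its norm.ppf function as the inverse cumulative normal, Φ⁻¹; the correlation between the sorted data and the scores gives a rough numerical check on the straight-line relationship.

from scipy.stats import norm, pearsonr

def normal_scores(data):
    """Normal deviate scores paired with the sorted data, as described above."""
    xs = sorted(data)
    n = len(xs)
    probs = [(3 * i - 1) / (3 * n + 1) for i in range(1, n + 1)]
    return [norm.ppf(p) for p in probs], xs

# Example with the first ten values of Table A-1
scores, xs = normal_scores([8.1, 7.7, 8.0, 7.3, 7.3, 8.7, 8.0, 8.8, 8.7, 7.6])
r, _ = pearsonr(scores, xs)   # a correlation near 1.0 suggests an approximately straight line
print(round(r, 3))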
FIGURE A-3. Normal probability plot of the sweetness intensity data from Table A-1.
Table A-3. Sweetness data sorted in increasing order with the normal distribution values used in
the normal probability plot in Figure A-3

Rank (i)    Normal Score    Sweetness (cm)
1 -2.21895 6.8
2 -1.83690 7.2
3 -1.61662 7.3
4 -1.45491 7.3
5 -1.32422 7.3
6 -1.21291 7.3
7 -1.11488 7.4
8 -1.02654 7.4
9 -0.94556 7.4
10 -0.87036 7.5
11 -0.79978 7.5
12 -0.73297 7.5
13 -0.66929 7.5
14 -0.60821 7.6
15 -0.54933 7.6
16 -0.49229 7.6
17 -0.43681 7.6
18 -0.38264 7.7
19 -0.32957 7.7
20 -0.27742 7.7
21 -0.22601 8.0
22 -0.17519 8.0
23 -0.12482 8.0
24 -0.07477 8.0
25 -0.02490 8.0
26 0.02490 8.1
27 0.07477 8.1
28 0.12482 8.1
29 0.17519 8.1
30 0.22601 8.1
31 0.27742 8.2
32 0.32957 8.2
33 0.38264 8.3
34 0.43681 8.3
35 0.49229 8.3
36 0.54933 8.4
37 0.60821 8.4
38 0.66929 8.4
39 0.73297 8.5
40 0.79978 8.5
41 0.87036 8.6
42 0.94556 8.7
43 1.02654 8.7
44 1.11488 8.7
45 1.21291 8.7
46 1.32422 8.8
47 1.45491 8.9
48 1.61662 8.9
49 1.83690 9.0
50 2.21895 9.2
SUMMARY STATISTICS
Based on the earlier graphical inspection of the data, the standard descriptive
statistics are reliable summaries of the sweetness intensity measurements. Table
A-2 contains these statistics, as well as those required to generate the box-and-
whisker plot in Figure A-2. The estimate of the standard error of the mean (SE) is
also included. The standard error of the mean is the standard deviation of the
distribution of X's of a specified sample size-n. If the raw data have a standard
deviation cr, the standard error of the mean is crf/0. Note that as n increases,
cr/10 decreases, so for larger sample sizes, X is more likely to take on a value close
to its true average-fl. The standard error of the mean is estimated by SE = S/10
CONFIDENCE INTERVALS
The values of X̄ and SE in Table A-2 can be used to decide if X̄ is a sufficiently
precise estimate of μ to meet the needs of an investigation. This is done by using
confidence intervals. A confidence interval on a mean is a range of values within
which the true value of μ lies with a known probability. Confidence intervals on
the mean are calculated as:

X̄ ± t(α/2, n−1) SE

where t(α/2, n−1) is the critical value from Student's t distribution. The quantity α
measures the level of confidence. For instance, if α = 0.05, the confidence interval
is a 100(1 − α)% = 95% confidence interval. The quantity (n − 1) is a parameter of the
t distribution called degrees-of-freedom. Degrees-of-freedom measure how much
information is available to estimate the variability in a set of data. The value of t
depends on the value of α and the number of degrees-of-freedom (n − 1). Critical
values of t are presented in Table A-8 at the end of the appendices. For the
sweetness intensity data, a 95-percent confidence interval on the mean is:

8.02 ± t(0.025, 49)(0.08) = 8.02 ± 2.01(0.08) = 8.02 ± 0.16

So it can be concluded that with 95-percent confidence the true value of the
average sweetness intensity of the powdered soft drinks lies somewhere between
7.86 cm and 8.18 cm. If this range of values is not sufficiently narrow, more
observations need to be collected to further decrease SE and, as a result, the width
of the interval.
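The interval calculation is easily scripted. The following sketch assumes SciPy is available for the Student's t critical value; applied to the 50 sweetness values in Table A-1 it reproduces an interval of roughly 7.86 to 8.18 cm.

from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

def mean_confidence_interval(data, alpha=0.05):
    """X-bar +/- t(alpha/2, n-1) * SE, as described above."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / sqrt(n)
    margin = t.ppf(1 - alpha / 2, n - 1) * se
    return xbar - margin, xbar + margin

# mean_confidence_interval(sweetness_values)  ->  approximately (7.86, 8.18) for Table A-1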
Appendix 2
Statistical Hypothesis Testing
INTRODUCTION
Statistical hypothesis testing is a decision-making technique that quantifies the risks
associated with various decisions and, thereby, increases the comfort level in every
decision made. The first step in a statistical hypothesis test is to state two mutually
exclusive hypotheses about the true state of a system. The first of these hypotheses, the
null hypothesis, is the condition that is assumed to exist prior to running a study. The
value specified in the null hypothesis is used to calculate the test statistic (and resulting
p-value) in the hypothesis test. The second of these hypotheses, the alternative
hypothesis, is developed based on the prior interest of the investigator.
The alternative hypothesis is generally of greater interest to a researcher be-
cause, when true, it indicates that some action is called for. For example, if a
company is replacing one of the raw ingredients in its current product with a less
expensive one, the primary concern is that the product made with the less expen-
sive ingredient cannot be distinguished from the current product. The null hypoth-
esis and the alternative hypothesis for this investigation are:

H₀: μ₁ = μ₂   versus   Hₐ: μ₁ ≠ μ₂

where μᵢ is the mean value of any critical product characteristic of product i. Both
the null and the alternative hypotheses must be specified before the test is con-
ducted. Results are biased in favor of rejecting the null hypothesis too often if the
alternative hypothesis is formulated after reviewing the data.
EXAMPLE OF A STATISTICAL HYPOTHESIS TEST
For the sweetness intensity data, the test statistic is:

t = (X̄ − μ₀) / SE

where X̄ and SE are presented in Table A-2 and μ₀ is the assumed value of the mean
sweetness from the null hypothesis (or, as in this example, the "worst case"
situation from Hₐ). Plugging in the appropriate values yields:

t = (8.02 − 7.5)/0.079 = 6.58
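A minimal sketch of this one-sample t calculation is shown below. The one-sided p-value reflects an assumption about the direction of the alternative (higher sweetness being the case of interest); SciPy supplies the t distribution.

from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

def one_sample_t(data, mu0):
    """t = (X-bar - mu0) / SE and a one-sided p-value (assumed alternative: mu > mu0)."""
    n = len(data)
    se = stdev(data) / sqrt(n)
    t_stat = (mean(data) - mu0) / se
    p_value = 1 - t.cdf(t_stat, n - 1)
    return t_stat, p_value

# one_sample_t(sweetness_values, mu0=7.5)  ->  t of about 6.58 for the Table A-1 data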
Appendix 3
The Statistical Design of Sensory Panels
INTRODUCTION
The basic goal of the statistical analysis of a designed study is to obtain an accurate
and precise estimate of experimental error. All tests of hypotheses and confidence
statements are based on this. Experimental error is the unexplainable, natural
variability of the products or samples being studied. Typically, experimental error
is expressed quantitatively as the sample standard deviation, S, or as the sample
variance, S². In order to obtain a good estimate of experimental error, all sources
of "non-product" variability that are known to exist before a study is run should be
compensated for.
One important source of variability in sensory analysis that is known to exist
in every study is panelist-to-panelist variability. For a variety of reasons,
panelists may use different parts of the rating scale to express their perceptions
of the intensities of a product's attributes. Sensory panels need to be designed
to account for this behavior so as to obtain the most sensitive comparisons
among the samples being evaluated. The statistical technique known as "block-
ing" accomplishes this.
Sensory panels are composed of a large number of evaluations. The evalua-
tions are grouped together according to panelists (i.e., the blocking factor), in
recognition of the fact that they may use different parts of the rating scale to
express their perceptions. The samples must be independently applied at each
evaluation. This is accomplished through such techniques as randomized or-
ders of presentation, sequential monadic presentations, and wash-out periods
of sufficient duration to allow the panelist to return to some baseline level
(constant for all evaluations).
Blocks (Judges)                 Samples
                    1        2        …        t
1                   X11      X12      …        X1t
2                   X21      X22      …        X2t
⋮                    ⋮        ⋮                 ⋮
b                   Xb1      Xb2      …        Xbt
The ANOVA tests the null hypothesis that the mean ratings for all of the samples are equal (H₀: μᵢ = μⱼ for all samples i and j)
versus the alternative hypothesis that the mean ratings of at least two of the samples
are different (Hₐ: μᵢ ≠ μⱼ for some pair of distinct samples i and j). For t samples,
each evaluated by b panelists, if the value of the F-statistic calculated in Table A-5
exceeds the critical value of an F with (t-1) and (b-1)(t-1) degrees of freedom
(Table A-9 at the end of the appendices), the null hypothesis is rejected in favor of
the alternative hypothesis.
A significant F-statistic in Table A-5 raises the question of which of the samples
differ significantly. This question is answered by another statistical technique
called a multiple comparison procedure. To determine which samples have signif-
icantly different mean ratings, a Fisher's LSD for randomized (complete) block
designs is used, where:
LSD = t(α/2, dfE) √(2MSE/b)

where b is the number of blocks (typically judges) in the study, t(α/2, dfE) is the upper
α/2 critical value of the Student's t distribution with dfE degrees of freedom, and
MSE is the mean square for error from the ANOVA table. Any two sample means
that differ by more than LSD are significantly different at the a level.
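The following sketch illustrates one way to carry out the randomized (complete) block ANOVA and the LSD calculation, assuming the statsmodels and SciPy packages are available and that the panel data sit in a data frame with columns named rating, panelist, and sample (hypothetical names, not from the text).

import pandas as pd
from math import sqrt
from scipy.stats import t
import statsmodels.api as sm
from statsmodels.formula.api import ols

def rcb_anova_lsd(df, alpha=0.05):
    """Randomized complete block ANOVA (panelists as blocks) plus Fisher's LSD."""
    model = ols("rating ~ C(panelist) + C(sample)", data=df).fit()
    anova = sm.stats.anova_lm(model, typ=2)
    mse = anova.loc["Residual", "sum_sq"] / anova.loc["Residual", "df"]
    b = df["panelist"].nunique()                   # number of blocks (judges)
    dfe = int(anova.loc["Residual", "df"])
    lsd = t.ppf(1 - alpha / 2, dfe) * sqrt(2 * mse / b)
    return anova, lsd

# Tiny illustrative data set: 4 panelists x 3 samples
data = pd.DataFrame({
    "panelist": ["P1", "P1", "P1", "P2", "P2", "P2",
                 "P3", "P3", "P3", "P4", "P4", "P4"],
    "sample":   ["A", "B", "C"] * 4,
    "rating":   [5, 7, 9, 4, 6, 9, 5, 6, 8, 4, 7, 9],
})
anova_table, lsd = rcb_anova_lsd(data)
print(anova_table)
print("LSD =", round(lsd, 2))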
Block               Sample
           1    2    3    4    5    6    7
[Each of the 7 blocks (rows) contains X's marking the 3 of the 7 samples evaluated in that block; the assignment follows a balanced incomplete block plan.]
until each panelist has completed an entire repetition of the design. The order of
presentation of the blocks should be randomized separately for each panelist, as
should be the order of presentation of the samples within each block. Alternatively,
for BIB designs with large numbers of blocks, the normal practice is to call upon
a large number of panelists (pb in all) and to have each evaluate a single block of
samples. The block of samples that a particular panelist receives should be
assigned at random. The order of presentation of the samples within each block
should again be randomized in all cases.
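Randomizing the presentation orders is easy to automate. The sketch below is purely illustrative: it takes any list of blocks of sample codes (here, a standard cyclic balanced incomplete block arrangement of seven samples in seven blocks of three) and returns an independently randomized block order and within-block serving order for each panelist.

import random

def presentation_orders(blocks, panelists, seed=None):
    """Randomize the block order and the within-block sample order for each panelist."""
    rng = random.Random(seed)
    orders = {}
    for panelist in panelists:
        block_order = rng.sample(range(len(blocks)), k=len(blocks))
        orders[panelist] = [rng.sample(blocks[i], k=len(blocks[i]))
                            for i in block_order]
    return orders

# Seven samples (A-G) in seven blocks of three, served to two panelists
design = [["A", "B", "D"], ["B", "C", "E"], ["C", "D", "F"], ["D", "E", "G"],
          ["E", "F", "A"], ["F", "G", "B"], ["G", "A", "C"]]
print(presentation_orders(design, ["P1", "P2"], seed=1))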
ANOVA is used to analyze BIB data (see Table A-7). As in the case of a
randomized (complete) block design, the total variability is partitioned into the
separate effects of blocks, samples, and error. However, the formulas used to
calculate the sums of squares in a BIB analysis are more complicated than for a
randomized (complete) block analysis (see Kirk 1968). Regardless of the approach
used to run the BIB design, the ANOVA table presented in Table A-7 partitions the
sources of variability so that clean estimates of the sample effects and an uninflated
estimate of experimental error are obtained.
Appendix 4
Multivariate Methods
INTRODUCTION
Multivariate statistical methods are used to study groups of responses (i.e., multi-
ple variables) that are simultaneously collected on each unit in a sample. The
methods take into consideration the existence of groups of correlated variables-
that is, groups of variables whose values increase and decrease together (either in
direct or inverse relationship to one another). Multivariate methods are particu-
larly useful for analyzing consumer test data and descriptive data where several
consumer acceptability and descriptive intensity ratings are taken on each sample
evaluated. Multivariate methods can provide concise summaries of the total vari-
ability, using fewer measurements than originally collected, or they may identify
subsets of respondents who display different patterns of responses to the samples
than other respondents in the study.
The location of a multivariate population is summarized by the mean vector,

μ = (μ₁, μ₂, …, μₚ)′,
where each component of the vector is the univariate mean for each response. The
estimator of μ is the vector, X̄, of individual sample means, X̄ᵢ's. The dispersion of
a multivariate population is summarized in the variance-covariance matrix, Σ. The
estimator of Σ is the sample covariance matrix, S, of sample variances and sample
covariances, where

Sᵢⱼ = Σₕ₌₁ⁿ (Xₕᵢ − X̄ᵢ)(Xₕⱼ − X̄ⱼ)/(n − 1)

are the off-diagonal elements. The vector X̄ and the matrix S are used in multivariate tests of hypotheses
and confidence intervals, just as X̄ and S are used in univariate situations (see
Morrison 1976).
FIGURE A-4. Plot of the first two principal component scores for a set of 16 orange juice prod-
ucts, showing both the orientation of the original attributes and the distribution of the products.
(Copyright ASTM. Reprinted with permission.)
One use of the principal component scores in QC is to relate them to consumer acceptability, by
plotting the products' overall acceptability ratings versus each of the principal
component scores and determining if a range of values that relates to meaningful
acceptability limits exists (see Fig. A-5).
Each principal component, Yᵢ, is a linear combination of the original observa-
tions, Xⱼ, as in:

Yᵢ = aᵢ₁X₁ + aᵢ₂X₂ + … + aᵢₚXₚ
The coefficients, aᵢⱼ, lie between −1 and +1 and measure the importance of
the original variables on each principal component. A coefficient close to -1 or
+ 1 indicates that the corresponding variable has a large influence on the value
of the principal component; values close to zero indicate that the corresponding
variable has little influence on the principal component. In general, the original
variables tend to segregate themselves into nonoverlapping groups where each
group is associated predominantly with a specific principal component. These
groupings help in interpreting the principal components and in ascertaining which
of the original variables combine to yield some net effect on the product.
FIGURE A-5. Plot of the relationship between liking and the first principal component scores for
a set of ten products, showing a minimum acceptability limit on the liking axis and the corresponding
range of acceptable values for PC1.
The number of original variables studied should not be reduced based on the
results of a PCA/FA analysis. As can be seen in the equation for Yᵢ, each of the
original variables is included in the computation of each principal component.
Retaining only a small group of "representative" variables on a sensory ballot
ignores the multivariate nature of the effects of the original variables and, if done,
may lead to misleading results in future evaluations.
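For readers who want to experiment, the following sketch extracts principal component scores and coefficients from a products-by-attributes matrix of mean ratings using scikit-learn (an assumed, commonly available package); standardizing the attributes first is one common choice, not a requirement of the method.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def principal_components(ratings, n_components=2):
    """PC scores and coefficients for a products-by-attributes matrix of mean ratings."""
    z = StandardScaler().fit_transform(ratings)    # center and scale each attribute
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(z)                  # one row of PC scores per product
    return scores, pca.components_, pca.explained_variance_ratio_

# 16 products rated on 6 descriptive attributes (random numbers for illustration only)
ratings = np.random.default_rng(1).uniform(0, 15, size=(16, 6))
scores, coefficients, explained = principal_components(ratings)
print(scores.shape, coefficients.shape, explained)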
CLUSTER ANALYSIS
Another multivariate technique that those working in the area of QC/sensory
should be aware of is cluster (or segmentation) analysis. The details of the
computations involved in cluster analysis go beyond the scope of the present
discussion. In fact, the discussion of the relative merits of various clustering
methods continues (see Jacobsen and Gunderson 1986). The goal of cluster
analysis is to identify homogeneous subgroups (i.e., clusters) of individuals (either
respondents or products) based on their degree of similarity in certain characteris-
tics. Some work on clustering sensory attributes has also been done (see
Powers, Godwin, and Bargmann 1977). Although the statistical propriety of
such an analysis is questionable (because the attributes on a sensory ballot are
not a random sample from any population), the method still provides an
informative way to study the degree of similarity and groupings of attributes
used to measure product quality.
An application of cluster analysis particularly important in QC/sensory
is that of identifying segments of consumers that have different patterns of
liking across products. While some consumers may like a highly toasted
note in a cereal product, others may find this objectionable. The net effect
of merging these two groups would be to obscure their individual trends
and, thus, to fail to identify a meaningful criterion of product quality in the
eyes of consumers. In some ways, failing to recognize clusters of respon-
dents is equivalent to computing the mean of a multi-modal set of data. The
"middle" of such a set may well be the valley between two groups of data
where no individuals exist. Developing a product to suit this "average"
respondent may, in fact, please no one.
The application of cluster analysis to identify segments of respondents that have
different liking patterns proceeds through the following five steps:
FIGURE A-6. A tree diagram used in cluster analysis to identify groups of respondents with sim-
ilar patterns of liking across products.
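The kind of segmentation behind a tree diagram such as Figure A-6 can be sketched as follows. Hierarchical (Ward) clustering from SciPy is used here only as one reasonable choice, since the text does not prescribe a particular clustering algorithm, and the liking data are artificial.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def liking_segments(liking, n_segments=2):
    """Group respondents (rows) by the similarity of their liking ratings across products."""
    tree = linkage(liking, method="ward")          # hierarchical tree (the basis for a dendrogram)
    return fcluster(tree, t=n_segments, criterion="maxclust")

# Two artificial respondent segments with opposite liking patterns across five products
rng = np.random.default_rng(0)
segment_a = rng.normal([8, 7, 6, 3, 2], 0.5, size=(10, 5))
segment_b = rng.normal([2, 3, 6, 7, 8], 0.5, size=(10, 5))
print(liking_segments(np.vstack([segment_a, segment_b])))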
Appendix 5
Statistical Quality Control
INTRODUCTION
There will always be some variability in the output of a production process. This
variability can be broken down into two types-assignable causes and common (or
random) causes. Assignable causes of variability are those over which some
control can be exerted, such as sources of raw materials, process settings, or the
level of training of line operators. The common causes of variability are the more
complex sources inherent to the process over which no direct control is available.
The overall effects of common causes are generally small.
A production process is said to be "in control," statistically, when the product's
attributes are varying around their target levels in a manner influenced by only
common causes. If variability due to one or more assignable causes results in
excessive product variability and/or a shift away from the target levels, the process
is said to be "out of control," statistically.
CONTROL CHARTS
Control charts are simple graphical devices for monitoring the output of a process
to determine if it is in control. Statistical criteria, similar to hypothesis tests, allow
users of control charts to distinguish between random variability and assignable
causes. Thus, the number of unneeded process adjustments is minimized and,
simultaneously, early warnings of out-of-control conditions are obtained. The
warnings from control charts are often issued before the process is producing
product that is out-of-specification and, therefore, must be scrapped.
X-Charts
The most common control chart is the X-chart, which monitors the average value
of a product variable. To use an X-chart, a small, fixed number of samples
(typically, three to five) needs to be drawn at regular intervals (e.g., within a batch)
during production. The critical measures of quality are taken on each of the units
sampled. For each variable measured, the sample mean of the small group is
computed and plotted on the X-chart. The mean of each new group is plotted
immediately following the mean of the previous group. If the process is in control,
then the X̄'s should all vary about their target, μ, within a narrow range defined by
the common causes of variability. The range is defined by the historical variability
of the process and is summarized by the standard deviation σ. At least 25 groups
of samples (each determined to be in control) should be used to compute μ and σ
(see Fig. A-7).
Nelson (1984) proposed eight criteria for detecting assignable causes of vari-
ability in an X-chart. Failing to meet any one of these criteria would indicate that an
out-of-control condition existed. The criteria are:
F1GURE A-7. An X-chart of the average crispness of a product showing the latest 15 production
batches.
Nelson's first criterion is the standard "action limit" of any X̄ falling outside of
μ ± 3σ/√n. The ±3σ/√n limits are also called the upper control limit (UCL) and
lower control limit (LCL), respectively. When an individual X̄ exceeds the action
limit, steps are taken to bring the process back into control. The balance of the
criteria go beyond the standard "warning limits" of any X̄ falling beyond
μ ± 2σ/√n. An individual X̄ that exceeds the warning limits triggers a search for the
cause of the change, but no adjustments to the process are made.
R-Charts
The sample mean, X̄, measures how close the product is to its target level, μ, "on
the average." The range, R (= maximum − minimum values in the sample), is a
simple measure of the dispersion of the individual readings of product quality. An
extremely large or extremely small value of R is another indication of an out-of-
control situation. To monitor the variability of the product, an R-chart is con-
structed by plotting the ranges of the periodic samples (i.e., the same samples used
to compute the sample mean in the X-chart) during production (see Fig. A-8).
Statistical criteria exist for determining if a process is out of control based on
the value of the range. The criteria are equivalent to the ±2σ/√n and ±3σ/√n limits
used in the X-chart. The warning and action limits (i.e., the 95-percent and
99.9-percent confidence level limits, respectively) are computed by multiplying
the average range, R̄, by the appropriate value in Table A-10, at the end of the
appendices. An individual value of R that exceeds the warning limits has less than
a 5-percent chance of occurring by chance alone. Similarly, an individual value of
R that exceeds the action limits has less than one chance in 1,000 of occurring
while the process is in control. Either situation indicates an increased potential for
assignable causes to be affecting the process. However, adjustments are made to
the process only when an action limit has been exceeded.
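The limit calculations for both charts are simple enough to script. In the sketch below, the X-chart limits follow the μ ± 2σ/√n and μ ± 3σ/√n formulas above, while the R-chart factors are passed in by the user because they come from Table A-10 for the sample size actually used (no factor values are hard-coded here).

from math import sqrt

def xbar_limits(mu, sigma, n):
    """Warning (2-sigma) and action (3-sigma) limits for an X-bar chart."""
    se = sigma / sqrt(n)
    return {"warning": (mu - 2 * se, mu + 2 * se),
            "action": (mu - 3 * se, mu + 3 * se)}

def r_limits(r_bar, factors):
    """R-chart limits from the average range and the Table A-10 factors.

    'factors' is a dict with keys 'LAL', 'LWL', 'UWL', 'UAL' for the sample size used.
    """
    return {key: r_bar * value for key, value in factors.items()}

# Illustrative X-bar limits for a process with mu = 2.5, sigma = 0.6, samples of 3
print(xbar_limits(mu=2.5, sigma=0.6, n=3))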
I-Charts
If only a single unit of product is collected at each of the periodic samplings, an
I-chart can be constructed in the same way as an X-chart. The standard deviation
of the individual values can be calculated, and warning limits, action limits, and
F1GURE A-8. An R-chart of the within-batch range of crispness ratings of a product, showing the
latest 15 production batches.
Nelson's criteria can be applied (see Fig. A-9). It is not possible to construct an
R-chart when only a single unit is collected from each batch.
When sensory panel data are used in a statistical process control program, it
is important to distinguish between the panel mean for an individual sample of
product and the mean of a small group of production samples. If only a single
sample of product is collected during each of the periodic QC samplings, the
sensory panel yields only one piece of raw data (i.e., the panel mean) about the
state of the process at that point in time. This is true, regardless of the number
of individuals on the panel. The individual data values (i.e., the panel means)
from successive QC samplings are plotted on an I-chart. If several samples of
product are collected from each production batch, the samples should be
presented to the panel according to an experimental design (e.g., a randomized
complete block design) that will allow a panel mean to be calculated for each
of the samples. The panel means of the group of samples are then used as the
raw data to compute the X̄ and R that summarize the state of the process for that
batch. If only three samples are collected within a batch, then only three pieces
of raw data (i.e., the three panel means) are available to compute X̄ and R,
regardless of the number of panelists. The X̄'s and R's from successive batches
are plotted on X-charts and R-charts.
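A short sketch of this bookkeeping, using illustrative numbers only: each production sample's panel mean becomes one piece of raw data, and the batch X̄ and R are computed from those panel means.

from statistics import mean

def batch_point(panel_ratings):
    """panel_ratings: one list of individual panelist ratings per production sample.

    Returns the per-sample panel means and the batch X-bar and R computed from them.
    """
    panel_means = [mean(sample) for sample in panel_ratings]
    return panel_means, mean(panel_means), max(panel_means) - min(panel_means)

# Three samples from one batch, each rated by the same panel of five
ratings = [[7.0, 7.5, 8.0, 7.0, 7.5],
           [8.0, 8.5, 8.0, 7.5, 8.0],
           [7.5, 7.0, 7.5, 8.0, 7.0]]
print(batch_point(ratings))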
FIGURE A-9. An I-chart of the firmness ratings of a product, based on a single sample collected
from each of the last 50 production batches.
FIGURE A-10. An X-chart, including both control limits and specification limits, showing that
the process does not have to operate at the center of the specification range.
PANEL MAINTENANCE
Two sources of variability have been discussed thus far-common causes and
assignable causes. A third source of variability that is particularly important to
sensory data is measurement variability. The instruments of sensory evaluation are
panelists, who are known to be sensitive to a variety of factors that can influence
their evaluations. In order to serve as a useful analytical measure of product quality,
the output of a sensory panel must be sufficiently precise to be relied on. Data from
a sensory panel should be held up to the same quality standards as any other
analytical data. An analytical method is typically regarded as being "pretty good"
if it has a relative standard deviation (RSD = S/X̄) of 5 percent or less. Repeated
evaluations of the same, or standard, samples should be performed periodically to
insure that the panel's mean ratings are sufficiently precise.
There are two components of panel variability-within-session variability and
between-session variability. Within-session variability results from the panelist-to-
panelist differences. A measure of within-session variability is:

RSDw = SEw / X̄w

where SEw is the sample standard error of the mean from a given session and X̄w
is the sample mean attribute rating from that session. RSDw can be reduced by
increasing the number of panelists who participate in the sessions (assuming they
are all well trained and calibrated). Between-session variability arises from
changes in the testing environment (odors, lighting, etc.), general shifts in the
calibration of the panelists (motivation, fatigue, etc.), and so forth. Good analytical
test controls must be exercised to keep these sources of variability to a minimum.
Between-session variability is measured by:

RSDb = S_X̄ / X̄

where

S_X̄ = √[ Σᵢ₌₁ᵖ (X̄ᵢ − X̄)² / (p − 1) ],

X̄ᵢ is the sample mean from session i (i = 1 to p sessions), and X̄ is the grand sample
mean (i.e., the arithmetic average of the individual session means) based on
repeated evaluations of the same, or standard, sample. When necessary, RSDw and
RSDb can be compared to decide which source of variability needs attention.
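Both relative standard deviations can be computed directly from repeated evaluations of a control sample, as in the following sketch (illustrative numbers; whether to express the results as percentages is a reporting choice).

from math import sqrt
from statistics import mean, stdev

def rsd_within(session_ratings):
    """RSD_w = SE_w / X-bar_w for a single session's panelist ratings."""
    se = stdev(session_ratings) / sqrt(len(session_ratings))
    return se / mean(session_ratings)

def rsd_between(session_means):
    """RSD_b = S_xbar / grand mean, from the mean rating of each of p sessions."""
    return stdev(session_means) / mean(session_means)

# Two sessions in which the same control sample was rated by five panelists
sessions = [[7.8, 8.1, 8.0, 7.9, 8.2], [8.3, 8.0, 8.4, 8.1, 8.2]]
print([round(rsd_within(s), 3) for s in sessions])
print(round(rsd_between([mean(s) for s in sessions]), 3))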
Statistical Tables
Table A-9. Upper 5 percent critical values for the F distribution with df1 degrees of freedom in the numerator and df2 degrees of freedom in the denominator

df2\df1   1      2      3      4      5      6      7      8      9      10     20     30     40     50     100    ∞
1         161    200    216    225    230    234    237    239    241    242    248    250    251    252    253    254
2         18.51  19.00  19.16  19.25  19.30  19.33  19.36  19.37  19.38  19.39  19.44  19.46  19.47  19.47  19.49  19.50
3         10.13  9.55   9.28   9.12   9.01   8.94   8.88   8.84   8.81   8.78   8.66   8.62   8.60   8.58   8.56   8.53
4         7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.80   5.74   5.71   5.70   5.66   5.63
5         6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.78   4.74   4.56   4.50   4.46   4.44   4.40   4.36
6         5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   3.87   3.81   3.77   3.75   3.71   3.67
7         5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.63   3.44   3.38   3.34   3.32   3.28   3.23
8         5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.34   3.15   3.08   3.05   3.03   2.98   2.93
9         5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.13   2.93   2.86   2.82   2.80   2.76   2.71
10        4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.97   2.77   2.70   2.67   2.64   2.59   2.54
20        4.35   3.49   3.10   2.87   2.71   2.60   2.52   2.45   2.40   2.35   2.12   2.04   1.99   1.96   1.90   1.84
30        4.17   3.32   2.92   2.69   2.53   2.42   2.34   2.27   2.21   2.16   1.93   1.84   1.79   1.76   1.69   1.62
40        4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12   2.07   1.84   1.74   1.69   1.66   1.59   1.51
50        4.03   3.18   2.79   2.56   2.40   2.29   2.20   2.13   2.07   2.02   1.78   1.69   1.63   1.60   1.52   1.44
100       3.94   3.09   2.70   2.46   2.30   2.19   2.10   2.03   1.97   1.92   1.68   1.57   1.51   1.48   1.39   1.28
∞         3.84   2.99   2.60   2.37   2.21   2.09   2.01   1.94   1.88   1.83   1.57   1.46   1.40   1.35   1.24   1.00
Table A-10. Values for computing the lower action limit (LAL), the lower warning limit
(LWL), the upper warning limit (UWL), and the upper action limit (UAL) for
R-charts

Limits are computed by multiplying the average range, R̄, by the values given for the corresponding
sample size.

Sample Size    LAL    LWL    UWL    UAL
[The factor values are tabulated here for each sample size.]
American Oil Chemists' Society. 1989. AOCS recommended practice Cg 2-83. Champaign,
IL: AOCS.
American Society for Testing and Materials (ASTM). 1991. A guide for the sensory
evaluation function within a manufacturing quality assurance/quality control program,
ed. J. Yantis. Philadelphia: ASTM (in preparation).
American Society for Testing and Materials (ASTM). 1986. Physical requirement guide-
lines for sensory evaluation laboratories, ed. J. Eggert and K. Zook. Philadelphia:
ASTM.
Amerine, M.A., R. M. Pangborn, and E. B. Roessler. 1965. Principles of Sensory Evalua-
tion of Food. New York: Academic Press.
Amerine, M. A., E. B. Roessler, and F. Filipello. 1959. Modern sensory methods of
evaluating wines. Hilgardia 28:477-567.
Aust, L. B., M. C. Gacula, S. A. Beard, and R. W. Washam II. 1985. Degree of difference
test method in sensory evaluation of heterogeneous product types. J. Food Sci. 50:511-
513.
Baker, R. A. 1968. Limitation of present objective techniques in sensory characterization.
In Correlation of subjective-objective methods in the study of odors and taste. ASTM
STP 440. Philadelphia: American Society for Testing and Materials.
Barnes, D. L., S. J. Harper, F. W. Bodyfelt, and M. R. McDaniel. 1991. Prediction of
consumer acceptability of yogurt by sensory and analytical measures of sweetness and
sourness. J. Dairy Sci. (submitted)
Bauman, H. E., and C. Taubert. 1984. Why quality assurance is necessary and important to
plant management. Food Technol. 38:101-102.
Bodyfelt, F. W., J. Tobias, and G. M. Trout. 1988. The Sensory Evaluation of Dairy
Products. New York: AVI. Van Nostrand Reinhold.
Bourne, M. C. 1977. Limitations of rheology in food texture measurements. J. Texture Stud.
8:219-227.
Bruhn, W. 1979. Sensory testing and analysis in quality control. Dragaco Report
9/1979:207-218.
Caplen, R. 1969. A Practical Approach to QC. Brandon/Systems Press.
Carlton, D. K. 1985. Plant sensory evaluation within a multi-plant international organiza-
tion. Food Technol. 39(11):130.
Carr, B. T. 1989. An integrated system for consumer-guided product optimization. In
Product testing with consumers for research guidance. ASTM STP 1035, ed. L. S. Wu.
Philadelphia: ASTM.
Chambers, E., IV. 1990. Sensory analysis-dynamic research for today's products. Food
Technol. 44(1):92-94.
Cochran, W. G., and G. M. Cox. 1957. Experimental Designs. New York: John Wiley and
Sons, Inc.
Cullen, J., and J. Hollingum. 1987. Implementing Total Quality. UK: IFS (Publications)
Ltd.
D'Agostino, R. B., A. Belanger, and R. B. D'Agostino, Jr. 1990. A Suggestion for using
powerful and informative tests of normality. The American Statistician 44(4):316-21.
Dawson, E. H., and B. L. Harris. 1951. Sensory methods for measuring differences in food
quality. U.S. Dept. Agr., Agr. Infor. Bull. 34:1-134.
Dove, W. F. 1947. Food acceptability-its determination and evaluation. Food Technol.
1:39-50.
Dziezak, J. D. 1987. Quality assurance through commercial laboratories and consultants.
Food Technol. 41(12):110-127.
Feigenbaum, A. V. 1951. Quality Control: Principles, Practice and Administration. New
York: McGraw-Hill Book Company, Inc.
Finney, E. E. 1969. Objective measurements for texture in foods. J. Texture Stud. 1:19-37.
Foster, D. 1968. Limitations of subjective measurement of odors. In Correlation of Subjec-
tive-Objective Methods in the Study of Odors and Taste. AS1M STP 440. Philadelphia:
American Society for Testing and Materials.
Friedman, H. H., J. E. Whitney, and A. S. Szczesniak. 1963. The texturometer-a new
instrument for objective measurements. J. Food Sci. 28:390-396.
Gacula, M. C. 1978. Analysis of incomplete block designs with reference samples in every
block. J. Food Sci. 43:1461-1466.
Gallup Organization. 1985. Customer perceptions concerning the quality of American
products and services. ASQC/Gallup Survey. Princeton, NJ: The Gallup Organization,
Inc.
Garvin, D. A. 1984. What does product quality really mean? Sloan Management Review.
Fall.
Garvin, D. A. 1987. Competing on the eight dimensions of quality. Harvard Business
Review 65(6): 101.
Gendron, G., and B. Burlingham. 1989. The entrepreneur of the decade. An interview with
Steve Jobs. Inc. (April):114-128.
Gerigk, K., G. Hildebrandt, H. Stephen, and J. Wegener. 1986. Lebensmittelrechtliche
Beurteilung von Leberwurst. Fleischwirtsch. 66(3):310-314.
Halberstam, D. 1984. Yes we can! Parade Magazine (7):4-7.
Hayes, R. 1981. Why Japanese factories work. Harvard Business Review (7-8):57.
Herschdoerfer, S.M. (ed). 1986. Quality Control in the Food Industry. New York: Aca-
demic Press.
Hinreiner, E. H. 1956. Organoleptic evaluation by industry panels-the cutting bee. Food
Technol. 31(11):62-67.
Hubbard, M. R. 1990. Statistical Quality Control for the Food Industry. New York: Van
Nostrand Reinhold.
Hunter, R. S. 1987. Objective methods for food appearance assessment. In Objective
Methods in Food Quality Assessment, ed. J. G. Kapsalis. Boca Raton, FL: CRC Press.
Huntoon, R. B., and L. G. Scharpf, Jr. 1987. Quality control of flavoring materials. In
Objective Methods in Food Quality Assessment, ed. J. G. Kapsalis. Boca Raton, FL:
CRC Press.
Institute of American Poultry Industries. 1962. Chemical and bacteriological methods for
the examination of egg and egg products. Chicago: Mimeo. Methodology, Egg Prod.
Lab., Inst. Am. Poultry Ind.
Jacobsen, T., and R. W. Gunderson. 1986. Applied Cluster Analysis. In Statistical Proce-
dures in Food Research, ed. J. R. Piggott. New York: Elsevier Science PublishingCo.
Ke, P. J., B. G. Burns, and A. D. Woyewoda. 1984. Recommended procedures and
guidelines for quality evaluation of Atlantic short-fin squid (Illex illecebrosus).
Lebensmittel-Wissenschaft u. Technol. (17):276-281.
Kepper, R. E. 1985. Quality motivation. Food Technol. 39(9):51-52.
Kirk, R. E. 1968. Experimental Design: Procedures for the Behavioral Sciences. Belmont,
CA: Wadsworth Publishing Co.
Kraemer, H. C., and S. Thiemann. 1987. How Many Subjects? Newbury Park, CA: SAGE
Publications.
Kramer, A., and B. A. Twigg. 1970. Quality Control for the Food Industry. Westport, CT:
The Avi Publishing Co., Inc.
Langenau, E. E. 1968. Correlation of objective-subjective methods as applied to the
perfumery and cosmetics industries. In Correlation of objective-subjective methods in
the study of odors and taste. ASTM STP 440. Philadelphia: American Society for
Testing and Materials.
Larmond, E. 1976. Sensory methods-choices and limitations. In Correlating sensory objec-
tive measurements-new methods for answering old problems. ASTM STP 594. Philadel-
phia: American Society for Testing and Materials.
Levitt, D. J. 1974. The use of sensory and instrumental assessment of organoleptic charac-
teristics in multivariate statistical methods. J. Texture Stud. 5:183-200.
Levy, I. 1983. Quality assurance in retailing: a new professionalism. Quality Assurance
9(1):3-8.
Little, A. C. 1976. Physical measurements as predictors of visual appearance. Food Tech-
nol. 30(10):74, 76, 77, 80, 82.
Little, A. C., and G. MacKinney. 1969. The sample as a problem. Food Technol. 23(1):25-
28.
List, G. R. 1984. The role of the Northern Regional Research Center in the development of
quality control procedures for fats and oils. JAOCS 61(6):1017-1022.
MacKinney, G., A. C. Little, and L. Brinner. 1966. Visual appearance of foods. Food
Technol. 20(10):1300-1308.
Reece, R. N. 1979. A quality assurance perspective of sensory evaluation. Food Technol.
33(9):37.
Rothwell, J. 1986. The quality control of frozen desserts. In Quality Control in the Food
Industry, ed. S. M. Herschdoerfer. Vol. 3, Second edition. New York: Academic Press.
Ruehrmund, M. E. 1985. Coconut as an ingredient in baking foods. Food Technology in
New Zealand. (11):21-25.
Rutenbeck, S. K. 1985. Initiating an in-plant quality control/sensory evaluation program.
Food Technol. 39(11):124.
SAS Institute Inc. SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2. Cary, NC:
SAS Institute Inc.
Schrock, E. M., and H. L. Lefevre. 1988. The Good and the Bad News about Quality. New
York: Marcel Dekker, Inc.
Sidel, J. L., H. Stone, and J. Bloomquist. 1981. Use and misuse of sensory evaluation in
research and quality control. Dairy Sci. 64:2296-2302.
Sidel, J. L., H. Stone, and J. Bloomquist. 1983. Industrial approaches to defining quality. In
Sensory Quality in Foods and Beverages: Definitions, Measurement and Control, ed. A.
A. Williams and R. K. Atkin, pp. 48-57. Chichester, UK: Ellis Horwood Ltd.
Sinha, M. N., and W. Willborn. 1985. The Management of Quality Assurance. John Wiley
and Sons.
Sjostrom, L. B. 1968. Correlation of objective-subjective methods as applied in the
food field. In Correlation of objective-subjective methods in the study of odors
and taste. ASTM STP 440. Philadelphia: American Society for Testing
Materials.
Smith, G. L. 1988. Statistical analysis of sensory data. In Sensory Analysis of Foods,
Second Edition, ed. J. R. Piggott. New York: Elsevier Science Publishing Co.
Spencer, H. W. 1977. Proceedings of Joint IFST and Food Group Society of Chemical
Industry. Symposium on Sensory Quality Control.
Steber, H. 1985. Qualitatskontrolle bei Fruchtzubereitungen. Deutsche Molkerei-Zeitung
46:1547-1550.
Stevenson, S. G., M. Vaisey-Genser, and N. A. M. Eskin. 1984. Quality control in the use
of deep frying oils. JAOCS 61(6):1102-1108.
Stone, H., and J. Sidel. 1985. Sensory Evaluation Practices. New York: Academic Press.
Stouffer, J. C. 1985. Coordinating sensory evaluation in a multi plant operation. Food
Techno/. 39(11):134.
Szczesniak, A. S. 1973. Instrumental methods of texture measurements. Texture Measure-
ments of Foods, ed. A. Kramer and A. S. Szczesniak. Dordrecht, Holland: D. Reidel
Pub. Co.
Szczesniak, A. S. 1987. Correlating sensory with instrumental texture measurements. An
overview of recent developments. J. Texture Stud. 18: 1-15.
Tompkins, M. D., and G. B. Pratt. 1959. Comparison of flavor evaluation methods for
frozen citrus concentrates. Food Technol. 13:149-152.
Trant, A. S., R. M. Pangborn, and A. C. Little. 1981. Potential fallacy of correlating hedonic
responses with physical and chemical measurements. J. Food Sci. 46(2):583-588.
Waltking, A. E. 1982. Progress report of the AOCS flavor nomenclature and standards
committee. JAOCS. 59(2):116A-120A.
Wetherill, G. B. 1977. Sampling Inspection and Quality Control, Second Edition. New
York: Chapman and Hall.
Williams, A. A. 1978. Interpretation of the sensory significance of chemical data in flavor
research. IFFA (3/4):80-85.
Wolfe, K. A. 1979. Use of reference standards for sensory evaluation of product quality.
Food Techno/. 33:43-44.
Woollen, A. 1988. How Coca-Cola assures quality with QUACS. Soft Drinks Management
International (3):21-24.