Model-Oriented Design of Experiments
Second Edition

Valerii V. Fedorov
Independent Consultant
Newtown Square, PA, USA

Peter Hackl
Department of Statistics
Vienna University of Economics and Business
Vienna, Austria
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Science+Business Media, LLC,
part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface to the Second Edition
The first edition of this book was published in December 1996, almost 30 years ago.
edition, especially Dr. S. Leonov. We hope that the new edition will be well received
by the community of users of optimal designs of experiments and researchers in this
field.
Preface
These lecture notes are based on courses on the theory of experimental design
given by Valerii Fedorov at a number of places, most recently at the University
of Minnesota, the University of Vienna, and the Vienna University of Economics
and Business Administration.
It was Peter Hackl's idea to publish these lecture notes, and he took the lead in
preparing and developing the text. The work took longer than we expected, and we
realized that a distance of a few thousand miles remains a serious hurdle even in
the age of the Internet and electronic gadgets.
While we mainly target graduate students in statistics, the book demands only
a moderate background in calculus, matrix algebra and statistics. These are, to
our knowledge, provided by almost any school in business and economics, natural
sciences, or engineering. Therefore, we hope that the material may be easily
understood by a relatively broad readership.
The book does not try to teach recipes for the construction of experimental
designs. Rather, it aims to create some understanding of, and interest in, the
problems and basic ideas of the theory of experimental design. Over the years,
quite a number of books have been published on this subject, with varying degrees
of specialization. This book is organized in five chapters that lay out, in a rather
compact form, all ingredients of experimental design: models, optimization criteria,
algorithms, and constrained optimization. The last third of the volume covers topics
that are relatively new and rarely discussed in book form: designs for inference
in nonlinear models, in models with random parameters, in stochastic processes,
and in functional spaces; designs for model discrimination; and designs for
incorrectly specified (contaminated) models.
Data collected by performing an experiment are based on two elements: (i) a
clearly defined objective and (ii) a piece of the real world that generates the data
under the control of the experimenter. These elements have analogues in statistical
theory: (i) the optimality criterion must be chosen so that it appropriately reflects
the experimenter's objective, and (ii) the model must describe the data-generating
process with adequate accuracy.
When applying the theory of experimental design, it is perhaps more true than
for many other areas of applied statistics that the complexity of the real world
and of the processes involved can hardly be captured adequately by the concepts
and methods of statistical theory. This theory contains a set of strong and
beautiful results, but it only rarely permits closed-form solutions, and only in
special situations is it possible to construct unique, clear-cut designs for an
experiment. Rather, planning an experiment means working out several scenarios
that together yield insight into, and understanding of, the data-generating process,
thereby strengthening the experimenter's intuition. In that sense, a real-life
experiment is a compromise between the results of statistical theory and the a
priori knowledge and intuition of the experimenter.
We have kept the list of references as short as possible; it contains only easily
accessible material. We hope that the collection of monographs listed in the
References will be sufficient for readers interested in the origin and history of
particular results. A bibliography of more recent results can be found in the
papers by Cook and Fedorov (1995) and Fedorov (1996). Note that Volume 13
of the Handbook of Statistics, edited by Ghosh and Rao (1996), consists entirely of
survey-type papers related to experimental design.
We gratefully acknowledge the help and encouragement of friends and colleagues
during the preparation of the text. Debby Flanagan, Grace Montepiedra,
and Chris Nachtsheim participated in the development of some results from Chap. 5;
we are very grateful for their contributions. We are thankful to Agnes Herzberg,
Darryl Downing, Max Morris, Werner Müller, and Bob Wheeler for discussions
and critical reading of various parts of this book. Stelmo Poteet and Christa Hackl
helped us tremendously in preparing the text for publication.
Introduction
Collecting data requires effort in terms of time, financial means, or other
material resources. A proper design can make it possible to use these resources
in the most efficient way.
The history of publications on optimal design, and of the corresponding statistical
theory, goes back as far as 1918, when Smith (1918) published a paper presenting
optimal designs for univariate polynomials up to the sixth degree. However, the
need to optimize experiments under certain conditions was understood by many
even earlier. Stigler (1974) provides an interesting historical survey of this subject.
After some isolated earlier work, the core of the theory of optimal experimental
design was developed during the 1950s and 1960s. The main contributions of that
period are due to Jack Kiefer. A survey of Kiefer's contributions to the theory of
optimal design is contained in the paper by Wynn (1984), and Brown et al. (1985)
published Kiefer's collected papers. Other important names and papers from that
early period may be found in Karlin and Studden (1966), Fedorov and Malyutov
(1974), and Atkinson and Fedorov (1988). Box and coauthors discussed related
problems arising in actual applications; see, e.g., Box and Draper (1987). The work
of Russian statisticians, covering both mathematical theory and algorithms, is
surveyed by Nalimov et al. (1985).
The first comprehensive volume on the theory of optimal experimental design
was written by Fedorov (1972). The book by Silvey (1980) gives a very compact
description of the theory of optimal design for estimation in linear models. Other
systematic monographs were published by Bandemer et al. (1977), Ermakov (1983),
Pázman (1986), Pilz (1993), and Pukelsheim (1993). Helpful introductory textbooks
are Atkinson and Donev (1992) and López-Fidalgo (2023).
Models and Optimization Problems
In the description of experiments, we distinguish between
• variables that are the focus of our interest and represent the response to the
experimental situation, and
• variables that state the conditions or design under which the response is obtained.
The former variables are usually denoted by y, often indexed or otherwise
supplemented with information about the experimental conditions. Among the
latter, we distinguish between variables x that are controlled by the experimenter
and variables t, such as time or ambient temperature, that are beyond the
experimenter's control. In real-life experiments, y is often, and x and t are almost
always, vectors of variables. The theory of optimal design discussed in this book
relates mainly to linear regression, but it also covers various extensions, such as
• multi-response linear regression,
• nonlinear regression,
• regression models with random parameters,
• models that represent random processes,
and other generalizations of the regression model concept, including discrimination
between competing models.
The set of values at which it is possible and intended to observe the response is
called the design region X. In general, X is a finite set with dimension corresponding
to the number of design variables. The classical design theory was derived for
this case. However, in real-life problems we often encounter design restrictions. In
a time-series context, it is typically not possible to make multiple observations at
the same time point, so the design region consists of a grid in time (equidistant,
in the simplest case). Similar restrictions may arise from geographical conditions,
costs, and ethical constraints, as in clinical studies. The most common cause of
restrictions is cost: costs often depend on the design point; e.g., the investment
and maintenance costs of a sensor can be strongly determined by the accessibility
of its location.
The classical optimal design problem concerns the estimation of the model
parameters: the design is chosen so that a given design criterion is optimized. In
the case of a simple linear regression with $E\{y\} = \beta_0 + \beta_1 x$, the variance
of the estimate $\hat\beta_1$ is proportional to the reciprocal of the sum of squared
deviations of the design points $x_i$ from their mean $\bar{x}$:
$$\mathrm{Var}\{\hat\beta_1\} \propto \frac{1}{\sum_{i=1}^{N} (x_i - \bar{x})^2}.$$
Consequently, we can—
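To make this proportionality concrete, the following sketch computes $\mathrm{Var}\{\hat\beta_1\}$ for two designs with N = 6 observations on the interval [-1, 1], assuming uncorrelated errors with common variance σ² (set to 1 here). The helper names `slope_variance` and `slope_variance_matrix` are hypothetical; the second obtains the same quantity from the slope entry of σ²(FᵀF)⁻¹ with F = [1, x], as a cross-check.

```python
import numpy as np


def slope_variance(x, sigma2=1.0):
    """Var{beta1_hat} = sigma2 / sum_i (x_i - x_bar)^2 for E{y} = beta0 + beta1 * x,
    assuming uncorrelated errors with common variance sigma2."""
    x = np.asarray(x, dtype=float)
    ssx = np.sum((x - x.mean()) ** 2)  # sum of squared deviations from the mean
    return sigma2 / ssx


def slope_variance_matrix(x, sigma2=1.0):
    """Cross-check: the slope entry of sigma2 * (F^T F)^{-1}, where F = [1, x]."""
    x = np.asarray(x, dtype=float)
    F = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
    return sigma2 * np.linalg.inv(F.T @ F)[1, 1]


N = 6
uniform = np.linspace(-1.0, 1.0, N)                          # equally spaced points on [-1, 1]
endpoints = np.array([-1.0] * (N // 2) + [1.0] * (N // 2))   # half of the runs at each end

for name, design in [("uniform", uniform), ("endpoints", endpoints)]:
    print(name, slope_variance(design), slope_variance_matrix(design))
```

For the equally spaced design the sum of squared deviations is 2.8, giving a variance of about 0.357, whereas placing three observations at each end point gives a sum of 6 and a variance of 1/6. In this simple model, spreading the design points as far apart as the design region allows is what reduces the variance of the slope estimate.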
Illustrating Examples
In many practical cases, y, x, and t are vectors in Euclidean space. For instance,
y may be the yield of one or more crops, x the concentrations of fertilizers, and t
the weather conditions. Experimental data let us infer, to the desired degree of
accuracy and reliability, what dosage of a fertilizer is optimal under given
conditions. In many cases, it will help the reader to achieve a better understanding
of the general theoretical results (Chaps. 1–4) if she or he tries to relate them to
this or a similar situation. Recent developments (see Chap. 5) show that the main
ideas of optimal experimental design can be applied to a large number of problems
in which y, x, and t have more complicated structures. We sketch a few examples
that are typical of various situations where design considerations can be used to
economize the experimental effort in one way or another.
Air Pollution
The air pollution observed in a certain area is determined, among other factors,
by the time of observation, the location of the sensor, and the wind direction. The
wind direction determines which emission sources affect the location of
observation. Air pollution is measured in terms of the concentration y of one or
several (in general K) pollutants; the location of the observation station is
$x = (x_1, x_2)^T$, the wind direction is described by $v(t) = (v_1(t), v_2(t))^T$,
and time is denoted by t. In some cases, the K locations $\mathcal{X} = \{x_k^*\}_1^K$
of pollution emitters and the respective emission rates $E(t) = \{e_k(t)\}_1^K$ are
known; in others they have to be identified, which is the so-called inverse
problem. Typical sets of information are:
1. the vector function $y(x, v, \mathcal{X}, E, t)$, the concentration of pollutants for a given
location x at time t;
2. the scalar K-vector
$$\bar{y}(x) = T^{-1} \int_v \int_0^T y(x, v, \mathcal{X}, E, t)\, dt\, dv,$$
i.e., the mean air pollution at location x over the period [0, T].
Note that the vectors v and E usually depend on t.
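In practice a sensor reports readings at discrete times, so the time average in item 2 has to be approximated numerically. The sketch below assumes that the wind direction is a known function of time, so that only the integral over [0, T] remains, and approximates it with the trapezoidal rule. The function name `time_average` and the sinusoidal test signal are hypothetical; the readings may be a vector (one pollutant) or an array with one column per pollutant.

```python
import numpy as np


def time_average(times, readings):
    """Approximate T^{-1} * integral_0^T y(t) dt from readings y(t_0), ..., y(t_n)
    taken at times t_0 < ... < t_n, using the trapezoidal rule.
    `readings` may be 1-D (one pollutant) or 2-D with one column per pollutant."""
    times = np.asarray(times, dtype=float)
    readings = np.asarray(readings, dtype=float)
    dt = np.diff(times)
    if readings.ndim == 2:
        dt = dt[:, None]  # broadcast the step sizes across the pollutant columns
    integral = np.sum(0.5 * (readings[1:] + readings[:-1]) * dt, axis=0)
    return integral / (times[-1] - times[0])


# Hypothetical example: readings every 15 minutes over one day (T = 24 h) for two
# pollutants, one oscillating around 2 with a daily cycle and one constant at 5.
t = np.linspace(0.0, 24.0, 97)
y = np.column_stack([2.0 + np.sin(2.0 * np.pi * t / 24.0), np.full_like(t, 5.0)])
print(time_average(t, y))
```

Over a full daily cycle the oscillating component averages out, so the result is close to (2, 5).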
The design problem in this context could be: Where should we locate sensors
so that the result of a certain analysis, such as identifying the locations of
polluters, has maximal accuracy? Sections 2.6, 5.1, and 5.3 may help to answer such
questions.
Clinical Studies
and
$$\Pr\{y(x) = 1\} = \frac{e^{\eta(x,\theta)}}{1 + e^{\eta(x,\theta)}}, \quad x \in X,$$
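Because this probability depends nonlinearly on θ, the Fisher information of a design depends on the unknown parameters as well. As an illustration, the sketch below assumes the simple linear predictor $\eta(x,\theta) = \theta_0 + \theta_1 x$ (an assumption; the text leaves η general) and computes the normalized information matrix $M = \sum_i w_i p_i (1 - p_i) f(x_i) f(x_i)^T$ with $f(x) = (1, x)^T$, the standard form for a binary response with a logistic link. The function names and the parameter guess θ = (0, 1) are hypothetical.

```python
import numpy as np


def response_prob(eta):
    """Pr{y = 1} = exp(eta) / (1 + exp(eta)), written as 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + np.exp(-eta))


def fisher_information(points, weights, theta):
    """Normalized information matrix sum_i w_i p_i (1 - p_i) f(x_i) f(x_i)^T
    for the logistic model with the assumed predictor eta = theta0 + theta1 * x."""
    M = np.zeros((2, 2))
    for x, w in zip(points, weights):
        f = np.array([1.0, x])            # regressors f(x) = (1, x)^T
        p = response_prob(theta @ f)
        M += w * p * (1.0 - p) * np.outer(f, f)
    return M


theta_guess = np.array([0.0, 1.0])        # hypothetical prior guess for theta
M_near = fisher_information([-1.5, 1.5], [0.5, 0.5], theta_guess)
M_far = fisher_information([-4.0, 4.0], [0.5, 0.5], theta_guess)
print(np.linalg.det(M_near), np.linalg.det(M_far))
```

Moving the two design points far into the tails, where p(1 − p) is close to zero, lowers the determinant of the information matrix by roughly a factor of ten here, even though the points are more spread out; this contrasts with simple linear regression, where spreading the points always helps. Since M depends on θ, such comparisons hold only locally, for the guessed parameter value.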
Chemical Reactor
Spectroscopic Analysis
function $x_j(\nu)$, which has the value one for the frequency interval of interest and
zero elsewhere. The observed signal for the given window $x_j(\nu)$ is
$$y(x) = \sum_{i=1}^{m} \theta_i \int f_i(\nu)\, x_j(\nu)\, d\nu.$$
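Because the signal is linear in θ, each window $x_j$ acts like a design point whose regressors are the integrals $g_i(x_j) = \int f_i(\nu)\, x_j(\nu)\, d\nu$, so choosing windows becomes a linear design problem. The sketch below computes these integrals with the trapezoidal rule for an indicator window on [a, b] and ignores observation noise. The polynomial basis functions, the θ values, and the function names are hypothetical and chosen only so that the result can be checked by hand.

```python
import numpy as np


def window_regressors(basis, a, b, n_grid=2001):
    """g_i = integral of f_i(nu) * x_j(nu) d nu for the indicator window x_j,
    which equals 1 on [a, b] and 0 elsewhere, so only [a, b] contributes."""
    nu = np.linspace(a, b, n_grid)
    g = []
    for f in basis:
        vals = f(nu)
        g.append(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(nu)))  # trapezoidal rule
    return np.array(g)


def window_signal(theta, basis, a, b):
    """Noise-free signal y = sum_i theta_i * g_i, which is linear in theta."""
    return theta @ window_regressors(basis, a, b)


# Hypothetical spectrum components f_1 = 1, f_2 = nu, f_3 = nu^2 on a frequency scale [0, 1].
basis = [lambda nu: np.ones_like(nu), lambda nu: nu, lambda nu: nu ** 2]
theta = np.array([1.0, -0.5, 2.0])
print(window_signal(theta, basis, 0.0, 1.0))   # close to the exact value 1 - 0.25 + 2/3
print(window_signal(theta, basis, 0.2, 0.6))
```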
The first four chapters cover general material. In particular, Chap. 1 contains a
very short collection of facts from regression analysis. Chapter 2 is essential
for understanding the rest of the book and describes the basic ideas of convex
design theory. The subsequent chapters concern numerical methods (Chap. 3) and
a few theoretical extensions (Chap. 4). The reader may skip a detailed reading of
the sections on numerical methods. Some basic algorithms are already available
either in widely used statistical packages, e.g., SAS, JMP, and SYSTAT, or in more
specialized software such as ECHIP and STAT-EASE; see also the numerous R
packages, most of which are cited by Groemping and Morgan-Wall (2023). Chapter 5
is the largest chapter and describes applications of convex design theory to various
specific models. The appendix provides the elemental Fisher information for
popular distributions and contains, for the reader's convenience, a rather standard
collection of formulas, mainly from matrix algebra.
Contents