Optimal Design for Polynomial Regression
Abstract: This paper considers the problem of determining efficient designs for polynomial regression models when only an upper bound for the degree of the polynomial is known by the experimenter before the experiments are carried out. The optimality criterion maximizes a weighted p-mean of the relative D-efficiencies in the different models. The optimal (model robust) design is completely determined in terms of its canonical moments, which form the unique solution of a system of nonlinear equations. The efficiency of the optimal designs with respect to different criteria is investigated by several examples.
Key words and phrases: Canonical moments, D-efficiency, equivalence theorem, mixture of optimality criteria, polynomial regression.
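1. Introduction
Consider the polynomial regression model
$$g_\ell(x) = \sum_{j=0}^{\ell} a_j x^j .$$
In order to examine how a given design $\xi$ behaves in the model $g_\ell$ with respect to the D-optimality criterion one uses the D-efficiency
$$\mathrm{eff}_\ell(\xi) = \left( \frac{|M_\ell(\xi)|}{|M_\ell(\xi_\ell^D)|} \right)^{1/(\ell+1)}, \tag{1.1}$$
where $M_\ell(\xi)$ denotes the information matrix of the design $\xi$ in the model $g_\ell$ and $\xi_\ell^D$ denotes the D-optimal design for $g_\ell$ on $[-1,1]$, that is, the design maximizing $|M_\ell(\xi)|$.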
An obvious drawback of the D-optimal design $\xi_\ell^D$ is that it is not necessarily very efficient in polynomial regression models with degree different from $\ell$. As an example consider the D-optimal design $\xi_1^D$ for linear regression, which puts equal masses at the points $-1$ and $1$ and has efficiency 0 in the quadratic model. Conversely, the D-optimal design $\xi_2^D$ for the quadratic model has only 82% efficiency in the linear model. Because in many applications of polynomial regression models the degree of the polynomial is not known before the experiments are carried out, the D-optimal design $\xi_\ell^D$ is not used very often in practice.
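The two efficiencies quoted above can be checked numerically. The following is a minimal sketch, assuming the standard information matrix $M_\ell(\xi) = \sum_i w_i f(x_i) f(x_i)^T$ with $f(x) = (1, x, \dots, x^\ell)^T$ for a design with support points $x_i$ and weights $w_i$; the function names are illustrative and not taken from the paper.

```python
def info_matrix(points, weights, degree):
    """M_l(xi) = sum_i w_i f(x_i) f(x_i)^T with f(x) = (1, x, ..., x^l)."""
    size = degree + 1
    return [[sum(w * x ** (r + c) for x, w in zip(points, weights))
             for c in range(size)] for r in range(size)]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-14:
            return 0.0  # numerically singular information matrix
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return result


def d_efficiency(design, d_optimal, degree):
    """D-efficiency (1.1) of `design` relative to the D-optimal design."""
    num = max(det(info_matrix(*design, degree)), 0.0)
    den = det(info_matrix(*d_optimal, degree))
    return (num / den) ** (1.0 / (degree + 1))


# D-optimal designs for the linear and the quadratic model on [-1, 1]
xi1 = ([-1.0, 1.0], [0.5, 0.5])
xi2 = ([-1.0, 0.0, 1.0], [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0])

eff_xi1_quadratic = d_efficiency(xi1, xi2, 2)  # two points cannot identify a parabola
eff_xi2_linear = d_efficiency(xi2, xi1, 1)     # sqrt(2/3), roughly 82%
```

The second value equals $\sqrt{2/3} \approx 0.8165$, which is the 82% figure stated in the text.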
In this paper we consider the somewhat more realistic situation that the experimenter knows an upper bound for the degree of the polynomial regression, say $n \in \mathbb{N}$. In order to find a design which has good efficiencies in all polynomials up to degree $n$ we maximize a concave function of the efficiencies in (1.1). More precisely, we define
$$\Phi_{p,\beta}(\xi) = \left[ \sum_{\ell=1}^{n} \beta_\ell \, (\mathrm{eff}_\ell(\xi))^{p} \right]^{1/p}, \tag{1.2}$$
where $p \in [-\infty, 1]$ and $\beta = (\beta_1, \dots, \beta_n)$ is a prior distribution on the set $\{1, \dots, n\}$ with $\beta_n > 0$ which reflects the experimenter's belief about the adequacy of the different models. Here the cases $p = -\infty$ and $p = 0$ have to be understood as the corresponding limits, that is
$$\Phi_{-\infty}(\xi) = \min_{\ell=1,\dots,n} \{ \mathrm{eff}_\ell(\xi) \mid \beta_\ell > 0 \}, \qquad \Phi_{0,\beta}(\xi) = \prod_{\ell=1}^{n} (\mathrm{eff}_\ell(\xi))^{\beta_\ell}. \tag{1.3}$$
A design $\xi_{p,\beta}$ is called $\Phi_{p,\beta}$-optimal (with respect to the prior $\beta$) if it maximizes the function in (1.2) or (1.3). The case of the geometric mean $p = 0$ was introduced by Läuter (1974) and a solution of this problem in the case of polynomial regression models can be found in Dette (1990).
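The criterion in (1.2) and its two limiting cases in (1.3) can be sketched in a few lines. This is an illustration only: the function name `phi` and the example efficiencies are assumptions made here, not part of the paper.

```python
import math


def phi(effs, beta, p):
    """Weighted p-mean (1.2) of efficiencies, with the limits (1.3) for p = 0, -inf."""
    active = [(e, b) for e, b in zip(effs, beta) if b > 0]
    if p == -math.inf:
        return min(e for e, _ in active)
    if p <= 0 and any(e == 0 for e, _ in active):
        return 0.0  # a vanishing efficiency forces the mean to 0 for p <= 0
    if p == 0:
        return math.exp(sum(b * math.log(e) for e, b in active))
    return sum(b * e ** p for e, b in active) ** (1.0 / p)


# Efficiencies of the quadratic D-optimal design in the linear and quadratic model
effs = [math.sqrt(2.0 / 3.0), 1.0]
uniform = [0.5, 0.5]
values = [phi(effs, uniform, p) for p in (-math.inf, -1.0, 0.0, 1.0)]
```

For a fixed design the values are nondecreasing in $p$, which is the usual ordering of power means: the minimum, the harmonic-type mean, the geometric mean and the arithmetic mean.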
In this paper we present a complete solution of the $\Phi_{p,\beta}$-optimal design problem for all $p \in [-\infty, 1]$. The $\Phi_{p,\beta}$-optimal design with respect to the prior $\beta$ is determined as the design whose canonical moments form the unique solution of a system of $n-1$ nonlinear equations. These equations can be solved very easily by standard numerical methods such as the Newton-Raphson algorithm. The proofs are based on a combination of equivalence theorems for mixtures of information functions (see e.g. Pukelsheim (1993, pp. 283-293)), the theory of canonical moments
DESIGNS FOR POLYNOMIAL REGRESSION 461
(see e.g. Studden (1980, 1982) or Lau (1983)) and a one-to-one correspondence between the set of (symmetric) probability measures on $[-1,1]$ and the set of optimality criteria in (1.2) (see e.g. Dette (1991)). In Section 2 some preliminary results are given which will be needed throughout the paper. Section 3 deals with the case $p > -\infty$. The case $p = -\infty$ (for which the solution of the optimal design problem is more transparent) is treated in Section 4 and some examples are given in Section 5.
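2. Preliminaries
The general equivalence theory for mixtures of optimality criteria is described in Pukelsheim (1993, pp. 283-293). For the $\Phi_{p,\beta}$-optimality criterion we obtain from these results the following lemma.
Lemma 2.1. A design $\xi$ is $\Phi_{p,\beta}$-optimal (for some given $p > -\infty$) if and only if it is $\Phi_{0,\beta'}$-optimal with respect to the prior $\beta' = (\beta_1', \dots, \beta_n')$ where
$$\beta_l' = \frac{\beta_l \, (\mathrm{eff}_l(\xi))^{p}}{\sum_{j=1}^{n} \beta_j \, (\mathrm{eff}_j(\xi))^{p}} .$$
Let $N(\xi_{-\infty}) = \{ 1 \le j \le n \mid \beta_j > 0,\ \Phi_{-\infty}(\xi_{-\infty}) = \mathrm{eff}_j(\xi_{-\infty}) \}$, then a design $\xi_{-\infty}$ is $\Phi_{-\infty}$-optimal if and only if there exists a prior $\bar\beta = (\bar\beta_1, \dots, \bar\beta_n)$ with $\bar\beta_l = 0$ for all $l \notin N(\xi_{-\infty})$ such that $\xi_{-\infty}$ is $\Phi_{0,\bar\beta}$-optimal.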
Equivalence theorems provide a general method for investigating whether a given design is optimal and are the basis for many numerical algorithms (see e.g. Wynn (1972), Läuter (1974)). For the special case of polynomial regression the theory of canonical moments provides a very useful tool for the determination of optimal designs (see e.g. Studden (1980, 1982) or Lau (1983)). For a given probability measure $\xi$ on the interval $[-1,1]$ let $c_j = \int_{-1}^{1} x^j \, d\xi(x)$, $j = 0, 1, 2, \dots$, denote the ordinary moments. If $c_0, \dots, c_{i-1}$ is a given set of moments (of $\xi$), define $c_i^+$ as the maximum of the $i$th moment over the set of all probability measures $\mu$ with given moments $c_0, \dots, c_{i-1}$. Similarly let $c_i^-$ denote the corresponding minimum value. The canonical moments are defined by
$$p_i = \frac{c_i - c_i^-}{c_i^+ - c_i^-}, \qquad i = 1, 2, \dots$$
if $c_i^+ > c_i^-$ and are undefined whenever $c_i^+ = c_i^-$. A design $\xi$ on $[-1,1]$ is symmetric if and only if $p_{2i-1} = 1/2$ for all $i \in \mathbb{N}$ for which $p_{2i-1}$ is defined (see e.g. Lau (1983)). The determinants of the information matrices $M_\ell(\xi)$ can easily be expressed in terms of the canonical moments of $\xi$ (see Studden (1980) or Lau (1983)) and for a symmetric design on the interval $[-1,1]$ we have as a special case
$$|M_\ell(\xi)| = \prod_{j=1}^{\ell} (q_{2j-2} \, p_{2j})^{\ell+1-j} \qquad \text{if } \xi \text{ is symmetric}, \tag{2.1}$$
where $q_0 = 1$ and $q_j = 1 - p_j$ for $j \ge 1$.
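As a small sanity check of (2.1), consider a symmetric design with mass $w$ at each of $\pm 1$ and mass $1 - 2w$ at $0$. The sketch below assumes (this is our own elementary computation, not a statement from the paper) that such a design has $p_2 = c_2 = 2w$ and $p_4 = 1$, and compares the product formula, read with $q_0 = 1$ and $q_j = 1 - p_j$, with the determinants written out directly from the moments.

```python
def det_from_canonical(p_even, degree):
    """|M_l| from (2.1); p_even = [p_2, p_4, ...], with q_0 = 1."""
    value, q_prev = 1.0, 1.0
    for j in range(1, degree + 1):
        p = p_even[j - 1]
        value *= (q_prev * p) ** (degree + 1 - j)
        q_prev = 1.0 - p
    return value


def det_from_moments(w, degree):
    """|M_1| and |M_2| of the design with mass w at -1 and 1, 1 - 2w at 0."""
    c2 = c4 = 2.0 * w  # x**2 and x**4 both equal 1 on the points -1 and 1
    if degree == 1:
        return c2                    # det [[1, 0], [0, c2]]
    return c2 * (c4 - c2 ** 2)       # det [[1, 0, c2], [0, c2, 0], [c2, 0, c4]]


w = 0.3
canonical = [2.0 * w, 1.0]           # p_2 = 2w, p_4 = 1 (three support points)
checks = [(det_from_canonical(canonical, l), det_from_moments(w, l)) for l in (1, 2)]
```

Both pairs in `checks` agree up to rounding; for $w = 1/3$ the quadratic determinant is $4/27$, the value attained by the D-optimal design for the quadratic model.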
462 HOLGER DETTE AND WILLIAM J. STUDDEN
It is well known (see e.g. Lau (1983)) that $\xi$ has canonical moments $0 < p_j < 1$, $j = 1, \dots, 2n-1$, $p_{2n} = 1$ if and only if $\xi$ is supported at $n+1$ points including $-1$ and $1$ (which means that $\xi_\ell^D$ has $\ell - 1$ support points in the open interval $(-1,1)$). The following result shows that there exists an intimate relation between these probability measures and the solutions of the $\Phi_{0,\beta}$-optimal design problem; it is an immediate consequence of Theorem 2.3 in Dette (1991).
Theorem 2.2. Let $\Xi^{(n)}$ denote the class of all symmetric probability measures $\xi$ on $[-1,1]$ with $n+1$ support points including $-1$ and $1$ such that [...]
(here $p_j$ denote the canonical moments of $\xi$ and $q_{2n+2} = 0$). The mapping
$$\Psi : (\beta_1, \dots, \beta_n) \longmapsto \xi_{0,\beta} = \arg\max_{\xi} \Phi_{0,\beta}(\xi)$$
[...]
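Theorem 3.1. Let $p \in (-\infty, 1]$, then the $\Phi_{p,\beta}$-optimal design is uniquely determined by its canonical moments $(\tfrac12, p_2, \tfrac12, \dots, p_{2n-2}, \tfrac12, 1)$ where $(p_2, \dots, p_{2n-2})$ is the unique solution of the system of equations
$$\beta_{\ell+1} \left( 1 - 2\frac{q_{2\ell}}{p_{2\ell}} + \frac{q_{2\ell}\, q_{2\ell+2}}{p_{2\ell}\, p_{2\ell+2}} \right) \left( \prod_{j=1}^{\ell+1} (q_{2j-2}\, p_{2j})^{j} \right)^{p/((\ell+1)(\ell+2))} = \beta_\ell \, \frac{\ell+2}{\ell+1} \, \frac{q_{2\ell}}{p_{2\ell}} \left( 1 - 2\frac{q_{2\ell+2}}{p_{2\ell+2}} + \frac{q_{2\ell+2}\, q_{2\ell+4}}{p_{2\ell+2}\, p_{2\ell+4}} \right) C_\ell^{\,p}, \tag{3.1}$$
$\ell = 1, \dots, n-1$, where
$$C_\ell = \frac{|M_{\ell+1}(\xi_{\ell+1}^D)|^{1/(\ell+2)}}{|M_\ell(\xi_\ell^D)|^{1/(\ell+1)}} = \left[ \frac{\ell^{\ell^2} (\ell+1)^{(\ell+1)^2} (2\ell-1)^{\ell}}{(2\ell+1)^{(\ell+1)(2\ell+1)}} \prod_{j=2}^{\ell} \left( \frac{(\ell+1-j)^2}{(2(\ell-j)+1)(2(\ell-j)+3)} \right)^{-(\ell+1-j)} \right]^{1/((\ell+1)(\ell+2))}, \tag{3.2}$$
$\ell = 1, \dots, n-1$, and the $\ell$th equation in (3.1) has to be replaced by the equation
$$1 - 2\frac{q_{2\ell}}{p_{2\ell}} + \frac{q_{2\ell}\, q_{2\ell+2}}{p_{2\ell}\, p_{2\ell+2}} = 0 \tag{3.3}$$
whenever $\beta_\ell = 0$, $\ell = 1, \dots, n-1$.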
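The constant $C_\ell$ in (3.2) can be cross-checked against its definition as a ratio of determinants. The sketch below works with $C_\ell^{(\ell+1)(\ell+2)}$ in exact rational arithmetic. It relies on two assumptions of ours: the closed form is taken as we read (3.2), and the even canonical moments of the D-optimal design $\xi_\ell^D$ are taken to be $p_{2j} = (\ell-j+1)/(2(\ell-j)+1)$, which is what the $p = 0$ formula of Dette (1990) gives for a prior concentrated on degree $\ell$.

```python
from fractions import Fraction


def d_optimal_moments(l):
    """Even canonical moments p_2, ..., p_{2l} of the D-optimal design (assumed form)."""
    return [Fraction(l - j + 1, 2 * (l - j) + 1) for j in range(1, l + 1)]


def det_info(p_even, l):
    """|M_l| from (2.1) with q_0 = 1."""
    value, q_prev = Fraction(1), Fraction(1)
    for j in range(1, l + 1):
        p = p_even[j - 1]
        value *= (q_prev * p) ** (l + 1 - j)
        q_prev = 1 - p
    return value


def c_power_from_determinants(l):
    """C_l ** ((l+1)(l+2)) = |M_{l+1}(D-opt)| ** (l+1) / |M_l(D-opt)| ** (l+2)."""
    upper = det_info(d_optimal_moments(l + 1), l + 1)
    lower = det_info(d_optimal_moments(l), l)
    return upper ** (l + 1) / lower ** (l + 2)


def c_power_closed_form(l):
    """The bracket in (3.2), i.e. C_l ** ((l+1)(l+2))."""
    value = Fraction(l ** (l * l) * (l + 1) ** ((l + 1) ** 2) * (2 * l - 1) ** l,
                     (2 * l + 1) ** ((l + 1) * (2 * l + 1)))
    for j in range(2, l + 1):
        term = Fraction((l + 1 - j) ** 2, (2 * (l - j) + 1) * (2 * (l - j) + 3))
        value *= term ** (-(l + 1 - j))
    return value


agreement = [c_power_from_determinants(l) == c_power_closed_form(l) for l in range(1, 7)]
```

For $\ell = 1$ both expressions give $16/729$, the constant that appears in equation (5.1) of Section 5.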
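Proof. Let $p_2, \dots, p_{2n-2}$ denote the canonical moments (of even order) of the $\Phi_{p,\beta}$-optimal design $\xi_{p,\beta}$. From Lemma 2.1 it follows that $\xi_{p,\beta}$ is $\Phi_{0,\beta'}$-optimal where the prior distribution $\beta' = (\beta_1', \dots, \beta_n')$ is given by $\beta_\ell' = \beta_\ell (\mathrm{eff}_\ell(\xi_{p,\beta}))^p / \sum_{j=1}^{n} \beta_j (\mathrm{eff}_j(\xi_{p,\beta}))^p$. Because the map $\Psi$ in Theorem 2.2 is one to one we have
$$(\beta_1', \dots, \beta_n') = \Psi^{-1}(\xi_{p,\beta}), \tag{3.4}$$
where the components of $\Psi^{-1}(\xi_{p,\beta})$ are defined in (2.4), and consequently the canonical moments of $\xi_{p,\beta}$ satisfy (2.3). On the other hand, if $\beta_\ell \neq 0$, we obtain from (3.4) and (2.4)
$$\frac{\beta_{\ell+1}'}{\beta_\ell'} = \frac{(\ell+2) \prod_{j=1}^{\ell} \frac{q_{2j}}{p_{2j}} \left( 1 - 2\frac{q_{2\ell+2}}{p_{2\ell+2}} + \frac{q_{2\ell+2}\, q_{2\ell+4}}{p_{2\ell+2}\, p_{2\ell+4}} \right)}{(\ell+1) \prod_{j=1}^{\ell-1} \frac{q_{2j}}{p_{2j}} \left( 1 - 2\frac{q_{2\ell}}{p_{2\ell}} + \frac{q_{2\ell}\, q_{2\ell+2}}{p_{2\ell}\, p_{2\ell+2}} \right)},$$
which is equivalent to (3.1). If $\beta_\ell = 0$, (3.3) follows directly from (3.4) and (2.4). This shows that the canonical moments (of even order) of $\xi_{p,\beta}$ form a solution of the system of equations defined in Theorem 3.1.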
Finally, let $(p_2^*, \dots, p_{2n-2}^*)$ denote a second solution of the system of equations in Theorem 3.1 that satisfies (2.3) and let $\xi^* \in \Xi^{(n)}$ denote the corresponding design. By Theorem 2.2 it follows that $\xi^*$ is $\Phi_{0,\beta^*}$-optimal for the prior $\beta^* = (\beta_1^*, \dots, \beta_n^*)$ where ($p_{2n}^* = 1$, $q_{2n+2}^* = 0$)
$$\beta_\ell^* = \frac{\ell+1}{2 - q_2^*/p_2^*} \prod_{j=1}^{\ell-1} \frac{q_{2j}^*}{p_{2j}^*} \left( 1 - 2\frac{q_{2\ell}^*}{p_{2\ell}^*} + \frac{q_{2\ell}^*\, q_{2\ell+2}^*}{p_{2\ell}^*\, p_{2\ell+2}^*} \right), \qquad \ell = 1, \dots, n. \tag{3.5}$$
An application of Lemma 2.1 shows that $\xi^*$ is $\Phi_{p,\tilde\beta}$-optimal with respect to the prior $\tilde\beta = (\tilde\beta_1, \dots, \tilde\beta_n)$ where
$$\tilde\beta_\ell = \frac{\beta_\ell^* \, (\mathrm{eff}_\ell(\xi^*))^{-p}}{\sum_{j=1}^{n} \beta_j^* \, (\mathrm{eff}_j(\xi^*))^{-p}} = \beta_\ell, \qquad \ell = 1, \dots, n.$$
Here the last identity is a consequence of (2.2), (3.5) and the fact that $(p_2^*, \dots, p_{2n-2}^*)$ is a solution of the system of equations in Theorem 3.1. It follows from standard arguments of optimal design theory that the $\Phi_{p,\beta}$-optimal design is unique and consequently we conclude that $\xi^* = \xi_{p,\beta}$. But this is equivalent to the fact $p_{2j}^* = p_{2j}$, $j = 1, \dots, n$, and proves the assertion of the theorem.
Remark 3.2. It is worthwhile to mention that a more complicated proof of
Theorem 3.1 can be obtained from Theorem 3.3 in Dette (1994) by observing
that a p; -optimal design is also a cp; -optimal discriminating design (in the
sense of Dette (1994)) where ` is proportional to
jM`; ( )jp X
n p
j (e ( ))
jM`()jp j l j + 1 :
1
It should also be noted that, in general, every cp; -optimal discriminating design
is also 0; -optimal for an appropriate prior but not necessarily p; 0 -optimal
for p 6= 0 (in the case of negative weights l Lemma 2.1 is not applicable).
In general, the system of equations in Theorem 3.1 has to be solved numerically except in the case $p = 0$ where it can be shown that the solution of (3.1) and (3.2) is given by
$$p_{2j} = \frac{\sum_{\ell=j}^{n} \frac{\ell+1-j}{\ell+1} \beta_\ell}{\sum_{\ell=j}^{n} \frac{\ell+1-j}{\ell+1} \beta_\ell + \sum_{\ell=j+1}^{n} \frac{\ell-j}{\ell+1} \beta_\ell}, \qquad j = 1, \dots, n-1,$$
which is the result of Dette (1990, p. 1789). A further simplification occurs if $\beta_{n-1} = \dots = \beta_k = 0$, $k \le n-1$. In this case the canonical moments of the $\Phi_{p,\beta}$-optimal design have a similar behavior as in the $D_s$-optimal design problem (see [...]
In order to show the remaining case $\ell = 1$ we remark that it is easy to see that the canonical moments of $\xi_{-\infty}$ are all greater than $1/2$ (here we use (4.7), Lemma 4.1 and the assumption that $p_2$ is defined as the largest root of (4.5) in the interval $[0,1]$). By a procedure similar to the above (using (4.5) instead of (4.7)) we find that (2.3) for $\ell = 1$ is equivalent to the inequality
$$f(p_2) = 27 p_2^{5/2} - 54 p_2^{3/2} + 16 p_2 + 27 p_2^{1/2} - 12 \ge 0, \qquad p_2 > \tfrac12 .$$
It is easy to see that $f$ is an increasing function of $p_2 \in [1/2, 1]$ and consequently it follows that $f(p_2) > f(1/2) > 0$, which shows that the canonical moments of $\xi_{-\infty}$ satisfy (2.3).
Step 3 (Proof of Theorem 4.2). From Step 2 we have $\xi_{-\infty} \in \Xi^{(n)}$ and by Theorem 2.2 we find that $\xi_{-\infty}$ is $\Phi_{0,\beta}$-optimal, where $\beta = \Psi^{-1}(\xi_{-\infty}) = (\beta_1, \dots, \beta_n)$ is defined in (2.4). In Step 1 we showed that $N(\xi_{-\infty}) = \{1, \dots, n\}$ and consequently $\xi_{-\infty}$ satisfies the condition in the second part of Lemma 2.1 with this prior. This proves the $\Phi_{-\infty}$-optimality of $\xi_{-\infty}$.
In the following sections we see that the $\Phi_{-\infty}$-optimal design serves as an appropriate approximation for the $\Phi_{p,\beta}$-optimal design when $p$ is sufficiently small. For this reason we state some properties of the canonical moments of the $\Phi_{-\infty}$-optimal design in the following lemma. The proof is omitted for the sake of brevity.
Lemma 4.3. Let $p_{2j}^{(n)}$ denote the canonical moments (of even order) of the $\Phi_{-\infty}$-optimal design for polynomial regression models up to degree $n$. The following statements hold true:
(a) $p_{2j}^{(n)} \ge 1/2$ for all $j = 1, \dots, n$ and $n \in \mathbb{N}$.
(b) $p_{2j}^{(n)} < p_{2j}^{(n-1)}$ for all $n \in \mathbb{N}$.
(c) If $n > 2$, there exists an index $j_0$ such that
$$1 = p_{2n}^{(n)} > p_{2n-2}^{(n)} > \dots > p_{2j_0}^{(n)} < p_{2j_0-2}^{(n)} < \dots < p_4^{(n)} < p_2^{(n)} .$$
The following result gives the limit distribution of the $\Phi_{-\infty}$-optimal design as $n \to \infty$. It shows that the limit is not the arcsin-distribution, in contrast with the case $p = 0$ and the uniform prior (see Dette (1990, p. 1797)).
Theorem 4.4. If $n \to \infty$, then the $\Phi_{-\infty}$-optimal design converges weakly to a symmetric distribution with canonical moments (of even order) $p_2, p_4, \dots$ where for $\ell \ge 2$, $p_{2\ell}$ is given by the (infinite) continued fraction
$$p_{2\ell} = 1 - \cfrac{a_\ell}{1 - \cfrac{a_{\ell+1}}{1 - \cdots}}, \qquad \ell \ge 2 .$$
Proof. For fixed $n$ the canonical moments of the $\Phi_{-\infty}$-optimal design for polynomials up to degree $n$ are given in (4.4) and (4.5). By Lemma 4.1 the quantities $a_\ell$ in (4.4) satisfy $a_\ell < 1/4$, $\ell \ge 2$, and by Worpitzky's Theorem (see Wall (1948, p. 42)) the continued fraction in (4.4) converges. This proves the assertion.
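The role of the bound $a_\ell < 1/4$ can be illustrated numerically. The sketch below evaluates a truncated continued fraction of the form $1 - a_\ell/(1 - a_{\ell+1}/(1 - \cdots))$ backwards, using hypothetical constant coefficients because the definition (4.2) of the $a_\ell$ is not reproduced here. It also recovers the coefficients implied by the moments reported in Remark 4.5 through the recursion $p_{2\ell+2} = a_\ell / q_{2\ell}$ stated there. The helper name `tail_fraction` is ours.

```python
def tail_fraction(coeffs):
    """Evaluate 1 - a_0/(1 - a_1/(1 - ... - a_{m-1}/1)) from the innermost term."""
    value = 1.0
    for a in reversed(coeffs):
        value = 1.0 - a / value
    return value


# Hypothetical constant coefficients: for a < 1/4 the truncations settle quickly
# at the larger root of x = 1 - a/x, here 0.6 for a = 0.24.
stable = tail_fraction([0.24] * 200)

# Boundary case a = 1/4: the truncations (m + 2) / (2 (m + 1)) approach 1/2 slowly.
boundary = tail_fraction([0.25] * 1000)

# Coefficients implied by the values reported in Remark 4.5: a_l = p_{2l+2} * q_{2l}.
reported = {4: 0.56914133, 6: 0.5414, 8: 0.5296, 10: 0.5230}
implied_a = {l: reported[2 * l + 2] * (1.0 - reported[2 * l]) for l in (2, 3, 4)}
```

The implied coefficients all lie below $1/4$ and increase towards it, which is consistent with Lemma 4.1 and with the remark that $p_{2\ell} \to 1/2$.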
Remark 4.5. Numerical calculations yield for the first two canonical moments of the limiting design $p_2 = 0.68563939$, $p_4 = 0.56914133$, while the canonical moments of higher order can be calculated recursively from $p_{2\ell+2} = a_\ell / q_{2\ell}$, $\ell \ge 2$. For example we obtain $p_6 = 0.5414$, $p_8 = 0.5296$, $p_{10} = 0.5230, \dots$ (note that $\lim_{\ell \to \infty} p_{2\ell} = 1/2$). It is also worthwhile to mention that the sequence of the canonical moments of the limiting design is strictly decreasing, in contrast to the sequence of canonical moments of the $\Phi_{-\infty}$-optimal design for polynomials up to degree $n$. Figure 1 shows the density of the limiting distribution (solid line) together with the arc-sin density $1/(\pi\sqrt{1 - x^2})$ (dashed line). The arc-sin density is well known to be the limiting density of similar sequences of designs. For example, if $\xi_n^D$ denotes the D-optimal design for $n$th degree polynomial regression then $\xi_n^D$ converges weakly to the arc-sin law. Note that the limiting density has less mass near the center and more near the end points $\pm 1$ than the arc-sin law.
[Figure 1: density plot for $x$ between $-1$ and $1$, vertical axis from 0 to 2]
Figure 1. Solid line = density of the limiting distribution; dashed line = arc-sin density
If the minimum in the optimality criterion (4.1) is not taken over the full index set $\{1, \dots, n\}$ then the solution of the optimal design problem becomes more complicated. For the sake of brevity we restrict ourselves to the following two special cases, which can be proved by arguments similar to those presented in the proof of Theorem 4.2.
j 1
with al de ned in (4:2) (l = 2; : : : ; k ; 2),
2k; n;j n ; j + 1 n;j 3 +11; jMk; (D )j
Y n ; j
n k
ak; = 4 5
1 +1
k; 2 2
1
j =1
2( n ; j ) + 1 2( n ; j ) + 1 jMk; (k; )j +2+1;;
D
1 1
n
n
k
k
Ck(k+1)(k+2) Yk
(p2k+2 )(k+1) = j =1
(q2j;2 p2j )j (q2k )k+1
5. Examples
5.1. Optimal designs with respect to various $\Phi_{p,\beta}$-criteria
Consider the case $n = 2$ (linear or quadratic regression) and a uniform prior $\beta_1^u = \beta_2^u = 1/2$. In this case there is one equation for the determination of $p_2$ in Theorem 3.1, namely
$$\left( 1 - 2\frac{q_2}{p_2} \right) (p_2 q_2^2)^{p/6} = \frac{3 q_2}{2 p_2} \left( \frac{16}{729} \right)^{p/6}, \tag{5.1}$$
and the $\Phi_{p,\beta^u}$-optimal design has canonical moments $(1/2, p_2, 1/2, 1)$ where $p_2$ is the unique root of (5.1) such that (2.3) is satisfied, i.e. $p_2 \ge 2/3$. There is a considerable amount of literature concerning the relationship between the sequence of canonical moments and the corresponding design (see e.g. Lau (1983)). Throughout this section we use Lemma 4.4 in Lim and Studden (1988), which is applicable for polynomial regression up to degree 4. Table 5.1 gives the weights of the $\Phi_{p,\beta^u}$-optimal design and the D-efficiencies in the linear and quadratic model for different values of $p \in [-\infty, 1]$. The case $p = -\infty$ can be directly obtained from the equation (4.5) in Theorem 4.2, which can be interpreted as the limit of (5.1) when $p \to -\infty$. Note that all designs are supported at $-1, 0, 1$.
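Equation (5.1) has a single unknown, so a bracketing method is enough to solve it. The following is a minimal sketch, assuming the reading $(1 - 2q_2/p_2)(p_2 q_2^2)^{p/6} = \tfrac{3 q_2}{2 p_2}(16/729)^{p/6}$ of (5.1) and using plain bisection on $[2/3, 1)$ instead of any particular routine used by the authors. The weights follow from $c_2 = p_2$ for a symmetric design on $-1, 0, 1$; the function names are illustrative.

```python
import math

C1_POW6 = 16.0 / 729.0  # C_1 ** 6, from (3.2) with l = 1


def residual(p2, p):
    """Left minus right side of (5.1), both sides divided by C_1 ** p."""
    q2 = 1.0 - p2
    scaled = (p2 * q2 ** 2 / C1_POW6) ** (p / 6.0)
    return (1.0 - 2.0 * q2 / p2) * scaled - 1.5 * q2 / p2


def solve_p2(p, lo=2.0 / 3.0, hi=1.0 - 1e-12):
    """Bisection: the residual is negative at p2 = 2/3 and positive close to 1."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid, p) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def design_summary(p2):
    """Masses at -1, 0, 1 and the D-efficiencies in the linear and quadratic model."""
    q2 = 1.0 - p2
    eff1 = math.sqrt(p2)                                  # |M_1| = p2, maximum 1
    eff2 = (p2 ** 2 * q2 / (4.0 / 27.0)) ** (1.0 / 3.0)   # |M_2| = p2^2 q2, maximum 4/27
    return (p2 / 2.0, q2, p2 / 2.0), eff1, eff2


roots = {p: solve_p2(p) for p in (1.0, 0.0, -5.0)}
weights, eff_linear, eff_quadratic = design_summary(roots[0.0])
```

For $p = 0$ the root is $p_2 = 7/9$, in agreement with the closed form of Dette (1990) quoted in Section 3. The bracket assumes moderate values of $p$; very negative $p$ would need a rescaled residual to avoid overflow.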
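Table 5.1. Weights of the $\Phi_{p,\beta^u}$-optimal design for linear and quadratic regression using a uniform prior $\beta^u$
$p$ | $\xi_{p,\beta^u}(\{1\})$ | $\xi_{p,\beta^u}(\{0\})$ | $\mathrm{eff}_1(\xi_{p,\beta^u})$ | $\mathrm{eff}_2(\xi_{p,\beta^u})$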
The results in Table 5.1 demonstrate that there do not exist essential differences between the $\Phi_{p,\beta^u}$-optimal designs for polynomials up to degree 2, with [...]
$$\left( 1 - 2\frac{q_2}{p_2} + \frac{q_2 q_4}{p_2 p_4} \right) \left( p_2 (q_2 p_4)^2 \right)^{p/6} = \frac{3 q_2}{2 p_2} \left( 1 - 2\frac{q_4}{p_4} \right) C_1^{\,p},$$
$$\left( 1 - 2\frac{q_4}{p_4} \right) \left\{ p_2 (q_2 p_4)^2 q_4^3 \right\}^{p/12} = \frac{4 q_4}{3 p_4} \, C_2^{\,p} \tag{5.2}$$
and $p_2, p_4$ have to satisfy (2.3). The optimal design puts masses $\alpha$, $(1/2) - \alpha$, $(1/2) - \alpha$, $\alpha$ at the points $-1, -t, t$ and $1$ where $t = \sqrt{p_2 q_4}$ and $\alpha = p_2 p_4 / (2(q_2 + p_2 p_4))$ (see Lim and Studden (1988, p. 1233)). The solution of (5.2) was determined using the Newton-Raphson algorithm and the corresponding designs and efficiencies are given in Table 5.2. Again we observe some robustness of the design with respect to different optimality criteria $\Phi_{p,\beta^u}$.
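A Newton-Raphson iteration for the two equations in (5.2) can be sketched as follows. This is not the authors' implementation: the Jacobian is approximated by central differences, the starting point is an arbitrary guess of ours, and the constants $C_1$, $C_2$ are evaluated from the closed form (3.2) as we read it. The last helper converts the solution into the four-point design described above.

```python
import math

# C_1 and C_2 via C_l ** ((l + 1)(l + 2)) from (3.2)
C1 = (16.0 / 729.0) ** (1.0 / 6.0)
C2 = (16.0 * 3.0 ** 12 / 5.0 ** 15) ** (1.0 / 12.0)


def residuals(p2, p4, p):
    """Left minus right side of the two equations in (5.2)."""
    q2, q4 = 1.0 - p2, 1.0 - p4
    a1 = 1.0 - 2.0 * q2 / p2 + q2 * q4 / (p2 * p4)
    a2 = 1.0 - 2.0 * q4 / p4
    r1 = a1 * (p2 * (q2 * p4) ** 2) ** (p / 6.0) - 1.5 * (q2 / p2) * a2 * C1 ** p
    r2 = (a2 * (p2 * (q2 * p4) ** 2 * q4 ** 3) ** (p / 12.0)
          - (4.0 / 3.0) * (q4 / p4) * C2 ** p)
    return r1, r2


def newton_raphson(p, start=(0.7, 0.75), steps=50, h=1e-7):
    """Newton-Raphson in (p2, p4) with a central-difference Jacobian."""
    x, y = start
    for _ in range(steps):
        f1, f2 = residuals(x, y, p)
        fx_hi, fx_lo = residuals(x + h, y, p), residuals(x - h, y, p)
        fy_hi, fy_lo = residuals(x, y + h, p), residuals(x, y - h, p)
        a = (fx_hi[0] - fx_lo[0]) / (2.0 * h)  # d r1 / d p2
        b = (fy_hi[0] - fy_lo[0]) / (2.0 * h)  # d r1 / d p4
        c = (fx_hi[1] - fx_lo[1]) / (2.0 * h)  # d r2 / d p2
        d = (fy_hi[1] - fy_lo[1]) / (2.0 * h)  # d r2 / d p4
        det = a * d - b * c
        dx = (d * f1 - b * f2) / det
        dy = (a * f2 - c * f1) / det
        x, y = x - dx, y - dy
        if abs(dx) + abs(dy) < 1e-12:
            break
    return x, y


def four_point_design(p2, p4):
    """Interior support point t and end-point mass alpha of the design."""
    t = math.sqrt(p2 * (1.0 - p4))
    alpha = p2 * p4 / (2.0 * ((1.0 - p2) + p2 * p4))
    return t, alpha


p2, p4 = newton_raphson(0.0)
t, alpha = four_point_design(p2, p4)
```

For $p = 0$ the iteration returns $p_2 = 23/33$ and $p_4 = 10/13$, the values given by the closed form of Dette (1990) for the uniform prior with $n = 3$. For other values of $p$ a starting point closer to the solution, for example the $p = 0$ solution itself, may be needed.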
The results of Example 5.1 indicate that a given $\Phi_{p,\beta^u}$-optimal design for the uniform prior $\beta^u$ is quite robust with respect to different $\Phi_{p',\beta^u}$-criteria. Because the $\Phi_{0,\beta^u}$-optimal designs are very easy to calculate (Dette (1990)) it might be of interest how these designs behave with respect to the other $\Phi_{p,\beta^u}$-criteria. As a representative example we consider the case $n = 4$, $\beta_1^u = \dots = \beta_4^u = 1/4$. It follows from the results of Dette (1990) that the $\Phi_{0,\beta^u}$-optimal design puts masses 0.27167, 0.10354, 0.24958, 0.10354, 0.27167 at the points $-1$, $-0.60508$, $0$, $0.60508$ and $1$ respectively. The performance of the design $\xi_{0,\beta^u}$ with respect to the other [...]
$$R_p(\xi) = \frac{\Phi_{p,\beta^u}(\xi)}{\Phi_{p,\beta^u}(\xi_{p,\beta^u})}, \qquad p \in [-\infty, 1].$$
The results are illustrated in Table 5.4 and show a remarkable robustness of the $\Phi_{0,\beta^u}$-optimal design with respect to the other $\Phi_{p,\beta^u}$-criteria. For this reason, and because of the easy computation of the $\Phi_{0,\beta^u}$-optimal designs, we conclude with the statement that the design $\xi_{0,\beta^u}$ might be a good choice in polynomial regression models when only an upper bound on the degree of the polynomial is known and a uniform prior is used to reflect the experimenter's belief about the adequacy of the different models. It should also be mentioned again that this statement is not necessarily true for arbitrary prior distributions $\beta$.
Table 5.4. $\Phi_{p,\beta^u}$-efficiencies of the $\Phi_{0,\beta^u}$-optimal design for different values of $p \in [-\infty, 1]$
$p$                       | 1       | 0.6     | $-0.6$  | $-1$    | $-2$    | $-3$    | $-\infty$
$R_p(\xi_{0,\beta^u})$    | 0.99989 | 0.99995 | 0.99996 | 0.99989 | 0.99957 | 0.99906 | 0.93220
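The ease of computing $\Phi_{0,\beta}$-optimal designs mentioned above comes from the closed form of Dette (1990) quoted in Section 3. A minimal sketch of that formula in exact arithmetic is given below; the function name and the convention of appending $p_{2n} = 1$ are ours.

```python
from fractions import Fraction


def phi0_canonical_moments(beta):
    """Even canonical moments p_2, ..., p_{2n} of the Phi_0-optimal design.

    beta[l - 1] is the prior weight of the model of degree l; the last entry
    p_{2n} = 1 is appended because the design has n + 1 support points.
    """
    n = len(beta)

    def sigma(j):
        # sum over l = j, ..., n of (l + 1 - j) / (l + 1) * beta_l
        return sum(Fraction(l + 1 - j, l + 1) * beta[l - 1] for l in range(j, n + 1))

    moments = [sigma(j) / (sigma(j) + sigma(j + 1)) for j in range(1, n)]
    return moments + [Fraction(1)]


uniform_n2 = phi0_canonical_moments([Fraction(1, 2), Fraction(1, 2)])
d_optimal_n4 = phi0_canonical_moments([Fraction(0), Fraction(0), Fraction(0), Fraction(1)])
uniform_n4 = phi0_canonical_moments([Fraction(1, 4)] * 4)
```

With all prior mass on degree $n$ the formula reduces to the canonical moments of the D-optimal design, and for $n = 2$ with a uniform prior it gives $p_2 = 7/9$, the $p = 0$ root of (5.1).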
Acknowledgements
This paper was written while the first author was visiting Purdue University in the summer of 1993. He would like to thank the Department of Statistics for its hospitality and the Deutsche Forschungsgemeinschaft for its financial support that made this visit possible. The research of the second author was supported in part by NSF Grant DMS 9101730. We are also indebted to an unknown referee for his constructive comments on an earlier version of this paper.
References
Dette, H. (1990). A generalization of D- and $D_1$-optimal designs in polynomial regression. Ann. Statist. 18, 1784-1804.
Dette, H. (1991). A note on robust designs for polynomial regression. J. Statist. Plann.
Inference 28, 223-232.
Dette, H. (1994). Discrimination designs for polynomial regression on compact intervals. Ann.
Statist. 22, 890-903.
Hoel, P. G. (1958). Efficiency problems in polynomial estimation. Ann. Math. Statist. 29, 1134-1145.
Lau, T. S. (1983). Theory of canonical moments and its application in polynomial regression.
Technical Report 83-23, Department of Statistics, Purdue University.
Läuter, E. (1974). Experimental design in a class of models. Mathematische Operationsforschung und Statistik 5, 379-398.
Lim, Y. B. and Studden, W. J. (1988). Efficient $D_s$-optimal designs for multivariate polynomial regression on the q-cube. Ann. Statist. 16, 1225-1240.
Pukelsheim, F. (1993). Optimal Design of Experiments. John Wiley, New York.
Studden, W. J. (1980). $D_s$-optimal designs for polynomial regression using continued fractions. Ann. Statist. 8, 1132-1141.
Studden, W. J. (1982). Some robust type D-optimal designs in polynomial regression. J. Amer.
Statist. Assoc. 77, 916-921.
Wall, H. S. (1948). Analytic Theory of Continued Fractions. Van Nostrand, New York.
Wynn, H. P. (1972). Results in the theory and construction of D-optimum experimental designs. J. Roy. Statist. Soc. Ser. B 34, 133-147.
Institut für Mathematische Stochastik, Technische Universität Dresden, Mommsenstr. 13, 01062 Dresden, Germany.
Department of Statistics, Purdue University, West Lafayette, IN 47907-1399, U.S.A.
(Received October 1993; accepted August 1994)