
978-1-6654-7661-4/22/$31.00 ©2022 Ieee 37


Proceedings of the 2022 Winter Simulation Conference

B. Feng, G. Pedrielli, Y. Peng, S. Shashaani, E. Song, C.G. Corlu, L.H. Lee, E.P. Chew, T. Roeder, and
P. Lendermann, eds.

ROBUST SIMULATION DESIGN FOR GENERALIZED LINEAR MODELS IN CONDITIONS OF HETEROSCEDASTICITY OR CORRELATION

Andrew Gill
David J. Warne
Clare McGrory
James M. McGree

Joint and Operations Analysis Division
Defence Science and Technology Group
Third Avenue
Edinburgh, SA 5111, AUSTRALIA

School of Mathematical Sciences
Queensland University of Technology
2 George Street
Brisbane, QLD 4000, AUSTRALIA

Antony M. Overstall

Mathematical Sciences
University of Southampton
University Road
Southampton, SO17 1BJ, UK

ABSTRACT
A meta-model of the input-output data of a computationally expensive simulation is often employed for
prediction, optimization, or sensitivity analysis purposes. Fitting is enabled by a designed experiment, and
for computationally expensive simulations, the design efficiency is of importance. Heteroscedasticity in
simulation output is common, and it is potentially beneficial to induce dependence through the reuse of
pseudo-random number streams to reduce the variance of the meta-model parameter estimators. In this
paper, we develop a computational approach to robust design for computer experiments without the need
to assume independence or identical distribution of errors. Through explicit inclusion of the variance or
correlation structures into the meta-model distribution, either maximum likelihood estimation or generalized
estimating equations can be employed to obtain an appropriate Fisher information matrix. Robust designs
can then be computationally sought which maximize some relevant summary measure of this matrix,
averaged across a prior distribution of any unknown parameters.

1 INTRODUCTION
The fitting of a model to a sample of input-output data of a computationally expensive simulation is an
important task in simulation analytics (Santner et al. 2003). This model of a model (meta-model) can
then be efficiently employed for prediction, optimization, or sensitivity analysis purposes. Sensitivity
analyses are often well served by low-order polynomial (usually quadratic) meta-models, as they enable
the characterisation of impact (main effects), synergies (two-factor interactions) and diminishing returns
(squared terms) (Gill et al. 2018; Sanchez et al. 2012).



Gill, Warne, Overstall, McGrory, and McGree

Fitting of the meta-model is enabled by a designed experiment, and for computationally expensive
(often stochastic) simulations, the efficiency of the design is of importance (Fedorov 1972). For linear
meta-models, factorial-based designs (often fractional and supplemented with central and axial points if
fully quadratic) are typically prescribed (Montgomery 2012), as these are efficient under D-optimality if
the typically assumed condition of independent and identically distributed (iid) errors holds.
Kleijnen (2015) is perhaps the seminal text on experimental design for simulation, and discusses the
implications of departures from iid conditions for the analysis of linear meta-models, which Gill (2019)
illustrates. However, while independence can be assured in simulation by employing a unique
pseudo-random number (PRN) stream at each design point, doing so overlooks an important variance
reduction (design efficiency) opportunity. Schruben and Margolin (1978) were the first to devise
a design-efficient PRN assignment strategy for linear meta-models and (generally) factorial-based designs
(Gill (2021) illustrates with a simple example).
However, Kleijnen (2015) is relatively silent on the question of design when iid conditions do not hold
for linear meta-models (“the literature pays little attention to the derivation of alternative designs for cases
with heterogeneous output variances” and “the literature pays no attention to the derivation of alternative
designs for situations with common random numbers (CRN)”). Furthermore, simulation outputs are often
discrete and sometimes only binary, so the broader class of generalized linear (meta-)models (GLMs) is
typically required (e.g., linear, Poisson, and logistic) (Dunn and Smyth 2018). Woods et al. (2006) point
out that the design efficiency for GLMs depends on the regression parameters yet to be estimated, so that
robust designs are often sought by computational optimization.
In this paper, we seek to bring to the attention of the simulation analytics community literature that
addresses some of these design-related gaps. In particular, we illustrate in some detail design construction
for linear meta-models in the presence of heteroscedasticity (drawing on Atkinson and Cook (1995)) and
GLMs in the presence of correlation (Woods and van de Ven 2011), before concluding with a proof of
concept for the idea of jointly optimizing both the design and PRN assignment for linear meta-models.

2 ROBUST DESIGN CONSTRUCTION FOR GLM


2.1 GLM Designs
In the GLM framework, for each input of $q$ factors $\mathbf{x}_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,q}] \in \mathbb{R}^q$, $i = 1, \ldots, n$, we have a simulation response $Y_i$ with a probability mass/density function $p(y_i)$ assumed to come from the exponential family of distributions, and where there is an appropriate link function $g(\cdot)$ such that $g(\mathrm{E}_Y[Y_i \mid \mathbf{x}_i]) = \mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta}$.
Here $\boldsymbol{\beta} = [\beta_0, \beta_1, \ldots, \beta_{d-1}]^T$ is a column vector of $d$ ($q < d \le n$) unknown parameters and $\mathbf{f}^T(\mathbf{x}_i): \mathbb{R}^q \to \mathbb{R}^d$ is a row vector of $d$ terms that may include first-order terms and higher-order terms (such as interactions) of the $q$ input factors.
The goal is to choose the set of $n$ design points $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]^T \in \mathbb{R}^{n \times q}$ to efficiently estimate $\boldsymbol{\beta}$.
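Minimizing the approximate volume of the covariance ellipsoid of the maximum likelihood estimator of $\boldsymbol{\beta}$ is equivalent to maximizing the determinant of the Fisher information matrix (hence called D-optimal)

$$\mathbf{X}^* = \arg\max_{\mathbf{X}} |\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta})|, \qquad [\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta})]_{j,k} = -\mathrm{E}_{\mathbf{Y}}\left[\frac{\partial^2 \ell(\mathbf{Y}, \mathbf{F}, \boldsymbol{\beta})}{\partial \beta_j \, \partial \beta_k}\right] \tag{1}$$

where $\mathbf{F} = [\mathbf{f}^T(\mathbf{x}_1), \mathbf{f}^T(\mathbf{x}_2), \ldots, \mathbf{f}^T(\mathbf{x}_n)]^T$ is an $n \times d$ matrix and $\ell(\mathbf{y}, \mathbf{F}, \boldsymbol{\beta}) = \sum_{i=1}^{n} \log p(y_i \mid \mathbf{f}^T(\mathbf{x}_i), \boldsymbol{\beta})$ is the (assumed twice differentiable) log-likelihood for the observations $\mathbf{y} = [y_1, y_2, \ldots, y_n]^T$ at the design points $\mathbf{X}$ given the parameters $\boldsymbol{\beta}$.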
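Using second-order partial derivatives of $\log p(y_i \mid \mathbf{f}^T(\mathbf{x}_i), \boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$, then taking expectations with respect to $Y_i$, we obtain the following expression for the expected Fisher information matrix

$$\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta}) = \mathbf{F}^T \mathbf{P} \mathbf{F} \tag{2}$$

where $\mathbf{P} = \operatorname{diag}\left\{1 \big/ \left(g'(\mathrm{E}_Y[Y_i \mid \mathbf{x}_i])^2 \operatorname{Var}_Y[Y_i \mid \mathbf{x}_i]\right)\right\}$, which is a known function of $\mathbf{F}$ and $\boldsymbol{\beta}$ for the exponential family distribution used in the GLM.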
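As a worked check of (2), the standard GLM link-function calculus gives the following diagonal entries of $\mathbf{P}$ for three common choices (textbook results, shown here for illustration rather than taken from this paper):

```latex
\begin{aligned}
&\text{Bernoulli, logit link:} && g'(\mu) = \frac{1}{\mu(1-\mu)},\;
  \operatorname{Var}_Y[Y_i \mid \mathbf{x}_i] = p_i(1-p_i)
  &&\Rightarrow\; P_{ii} = \frac{1}{\big(p_i(1-p_i)\big)^{-2}\, p_i(1-p_i)} = p_i(1-p_i),\\
&\text{Poisson, log link:} && g'(\mu) = \frac{1}{\mu},\;
  \operatorname{Var}_Y[Y_i \mid \mathbf{x}_i] = \mu_i
  &&\Rightarrow\; P_{ii} = \mu_i = \exp\!\big(\mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta}\big),\\
&\text{Normal, identity link:} && g'(\mu) = 1,\;
  \operatorname{Var}_Y[Y_i \mid \mathbf{x}_i] = \sigma^2
  &&\Rightarrow\; P_{ii} = 1/\sigma^2.
\end{aligned}
```

The logistic entry is the form of $P_{ii}$ used with (5) below, and the normal entry with $\sigma^2$ replaced by $\sigma^2 v(\mathbf{x}_i)$ covers the heteroscedastic case of Section 3.1.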


Designs based on (1) and (2) assume a fixed number of design points $n$ (an exact design). If instead we ascribe to $\mathbf{x}_i$ a weight $0 \le w_i \le 1$ (with $\sum_{i=1}^{n} w_i = 1$, thus representing how sampling effort is distributed across design points) and relabel $\mathbf{x}_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,q}, w_i]$, so that $\mathbf{X} \in \mathbb{R}^{n \times (q+1)}$, then the approximate design problem is to find $\mathbf{X}^* = \arg\max_{\mathbf{X}} |\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta})|$ where $\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta}) = \mathbf{F}^T \mathbf{W} \mathbf{P} \mathbf{F}$ and $\mathbf{W}$ is diagonal with $W_{ii} = w_i$. From this approximate design, an exact design of a particular size can be generated by sampling according to the weights $w_i^*$.
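A minimal Python sketch of the approximate-design information $\mathbf{F}^T\mathbf{W}\mathbf{P}\mathbf{F}$, assuming a logistic GLM (so $P_{ii} = p_i(1-p_i)$) and plain nested lists rather than a linear-algebra library. The helper names (`det`, `logistic_info`, `sample_exact_design`) are hypothetical, and the exact design is drawn by simple multinomial sampling of the support points in proportion to their weights, which is one possible reading of the sampling step described above.

```python
import math
import random


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in A]
    n = len(M)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result


def logistic_info(F, weights, beta):
    """Approximate-design information F^T W P F for a logistic GLM,
    with P_ii = p_i (1 - p_i) and W_ii = w_i."""
    d = len(beta)
    info = [[0.0] * d for _ in range(d)]
    for row, w in zip(F, weights):
        eta = sum(f * b for f, b in zip(row, beta))  # linear predictor f^T(x_i) beta
        p = 1.0 / (1.0 + math.exp(-eta))
        s = w * p * (1.0 - p)
        for j in range(d):
            for k in range(d):
                info[j][k] += s * row[j] * row[k]
    return info


def sample_exact_design(points, weights, n_runs, seed=0):
    """Exact design of n_runs runs, sampling support points in proportion to their weights."""
    rng = random.Random(seed)
    return rng.choices(points, weights=weights, k=n_runs)


points = [-1.0, 1.0]
F = [[1.0, x] for x in points]  # rows f^T(x) = [1, x]
info = logistic_info(F, [0.5, 0.5], [0.0, 0.0])
print(det(info))  # 0.0625
print(sample_exact_design(points, [0.5, 0.5], n_runs=10))
```

For the two-point design $\{-1, 1\}$ with equal weights and $\boldsymbol{\beta} = \mathbf{0}$, every $p_i = 0.5$, so the information is $\operatorname{diag}(0.25, 0.25)$ and its determinant is $0.0625$.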

2.2 Robust Design


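Obviously, requiring knowledge of $\boldsymbol{\beta}$ a priori is problematic when the aim of the design is to estimate $\boldsymbol{\beta}$. A common approach to remove the dependency on $\boldsymbol{\beta}$ is to average (some monotonic function of) the optimality criterion across a prior distribution $\pi(\boldsymbol{\beta})$ of possible values of $\boldsymbol{\beta}$

$$\mathbf{X}^* = \arg\max_{\mathbf{X}} \int \log\left(|\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta})|\right) \pi(\boldsymbol{\beta}) \, d\boldsymbol{\beta} \tag{3}$$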

where the logarithm of the determinant is often used for numerical stability. We call this
pseudo-Bayesian approach (Chaloner and Verdinelli 1995; Englezou 2018) robust design, as it is robust to
misspecification of the parameters (though not to misspecification of the meta-model; see Section 6). The prior can be based on
previous investigations or subject matter expertise, or can be a non-informative probability distribution if required.
Often, the integral in (3) is not analytically tractable, so numerical integration is required. Quadrature
rules are possible but are more cumbersome in higher dimensions, so here we use a direct Monte Carlo
estimator, so that

$$\mathbf{X}^* \approx \arg\max_{\mathbf{X}} \frac{1}{M} \sum_{m=1}^{M} \log\left(|\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta}_m)|\right),$$

where $\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_M$ are iid draws from the prior $\pi(\boldsymbol{\beta})$.
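The Monte Carlo estimator can be sketched as follows, assuming (purely for illustration) a one-factor logistic model with $\mathbf{f}(x) = [1, x]^T$, so that each information matrix is $2 \times 2$ and its determinant has a closed form. The prior in the usage lines ($\beta_0 \sim \mathrm{N}(0, 1)$, $\beta_1 \sim \mathrm{Uniform}(0.5, 2)$) and the function names are hypothetical.

```python
import math
import random


def log_det_info(points, weights, beta):
    """log|F^T W P F| for a one-factor logistic GLM with f(x) = [1, x]."""
    a = b = c = 0.0  # entries of the symmetric 2x2 information matrix [[a, b], [b, c]]
    for x, w in zip(points, weights):
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        s = w * p * (1.0 - p)  # W_ii * P_ii
        a += s
        b += s * x
        c += s * x * x
    det = a * c - b * b
    return math.log(det) if det > 0 else -math.inf  # singular design -> -inf


def robust_criterion(points, weights, prior_draws):
    """Monte Carlo estimate (1/M) sum_m log|I_X(beta_m)| over iid prior draws."""
    return sum(log_det_info(points, weights, beta) for beta in prior_draws) / len(prior_draws)


# Hypothetical prior: beta0 ~ N(0, 1), beta1 ~ Uniform(0.5, 2)
rng = random.Random(1)
draws = [(rng.gauss(0.0, 1.0), rng.uniform(0.5, 2.0)) for _ in range(1000)]
print(robust_criterion([-1.0, 1.0], [0.5, 0.5], draws))
```

Because the estimate changes with the draws, its standard error shrinks as $O(1/\sqrt{M})$ (variance $O(1/M)$), which is the source of the noisy objective discussed next.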
Robust design using a Monte Carlo estimate of the expected Fisher information requires the maximization
of a random quantity with variance of O(1/M). Many standard non-linear optimization algorithms,
such as Levenberg-Marquardt (Levenberg 1944; Marquardt 1963), cannot handle random variables in the
function to be optimized. Instead, we apply simulated annealing, which is a probabilistic optimization
technique (Kirkpatrick et al. 1983). Other methods for stochastic optimization such as the Approximate
Coordinate Exchange algorithm (Overstall and Woods 2017) can be more efficient for more complex
functions, but simulated annealing is sufficient here.
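A minimal simulated-annealing sketch for the same hypothetical one-factor logistic example, optimizing the locations of two equally weighted design points in $[-1, 1]$. It is not the authors' implementation: the Gaussian random-walk proposal, the linear cooling schedule and the step size are assumptions, and the prior draws are held fixed across iterations (making the objective deterministic), whereas the text treats the Monte Carlo estimate as a noisy quantity.

```python
import math
import random


def log_det_info(points, weights, beta):
    """log|F^T W P F| for a one-factor logistic GLM with f(x) = [1, x]."""
    a = b = c = 0.0
    for x, w in zip(points, weights):
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        s = w * p * (1.0 - p)
        a, b, c = a + s, b + s * x, c + s * x * x
    det = a * c - b * b
    return math.log(det) if det > 0 else -math.inf


def robust_criterion(points, weights, draws):
    """Monte Carlo average of log|I_X(beta_m)| over the supplied prior draws."""
    return sum(log_det_info(points, weights, beta) for beta in draws) / len(draws)


def anneal(start, draws, n_iter=2000, step=0.2, temp0=1.0, seed=0):
    """Maximize robust_criterion over two equally weighted points in [-1, 1]."""
    rng = random.Random(seed)
    weights = [0.5, 0.5]
    current = list(start)
    cur_val = robust_criterion(current, weights, draws)
    best, best_val = current[:], cur_val
    for t in range(n_iter):
        temp = temp0 * (1.0 - t / n_iter) + 1e-9  # linear cooling schedule
        # Gaussian random-walk proposal, clipped to the design region [-1, 1]
        cand = [min(1.0, max(-1.0, x + rng.gauss(0.0, step))) for x in current]
        val = robust_criterion(cand, weights, draws)
        # Always accept improvements; accept worse moves with probability exp(delta / temp)
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / temp):
            current, cur_val = cand, val
            if cur_val > best_val:
                best, best_val = current[:], cur_val
    return best, best_val


rng = random.Random(1)
draws = [(rng.gauss(0.0, 0.5), rng.uniform(0.5, 2.0)) for _ in range(200)]
design, value = anneal([-0.2, 0.3], draws)
print(design, value)
```

Tracking the best state visited, rather than returning the final state, guards against the chain drifting away from a good design late in the run.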

3 ROBUST DESIGNS FOR DEPARTURES FROM IID CONDITIONS


3.1 Linear Meta-Model in the Presence of Heteroscedasticity
Consider a $q = 2$ design problem $\mathbf{x} = [x_1, x_2] \in [-1, 1]^2$ for the full second-order polynomial linear meta-model, so $\mathbf{f}^T(\mathbf{x}_i) = [1, x_{i,1}, x_{i,2}, x_{i,1}x_{i,2}, x_{i,1}^2, x_{i,2}^2]$ and $g(\cdot)$ is the identity function, but where $Y_i \sim \mathrm{N}(\mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta}, \sigma^2 v(\mathbf{x}_i))$ with $v(\mathbf{x}_i) = \exp(\mathbf{x}_i\boldsymbol{\alpha}[1 + 2\mathbf{x}_i\boldsymbol{\alpha}])$. Here $\boldsymbol{\alpha}$ is a column vector with the same dimension as $\mathbf{x}_i$ but otherwise unknown. Thus, the $Y_i$ are independent, but $\|\boldsymbol{\alpha}\|_2^2$ controls the degree of heteroscedasticity.
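To find robust designs, we need the expected information matrix for this meta-model. Since $Y_i$ is normally distributed, its log-likelihood at design point $\mathbf{x}_i$ is

$$\ell_i(y_i, \mathbf{f}^T(\mathbf{x}_i), \boldsymbol{\beta}, \boldsymbol{\alpha}) = -\frac{(y_i - \mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta})^2}{2\sigma^2 \exp(\mathbf{x}_i\boldsymbol{\alpha}[1 + 2\mathbf{x}_i\boldsymbol{\alpha}])} - \frac{\mathbf{x}_i\boldsymbol{\alpha}[1 + 2\mathbf{x}_i\boldsymbol{\alpha}]}{2} - \log(\sqrt{2\pi}\,\sigma)$$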

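and it is relatively easy to show that $\mathrm{E}_{Y_i}\left[\frac{\partial^2 \ell_i}{\partial \beta_j \, \partial \alpha_k}\right] = 0$ given $\mathrm{E}_Y[Y_i \mid \mathbf{x}_i] = \mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta}$, which means the Fisher information matrix in (1) will be block diagonal with two blocks: one for $\boldsymbol{\beta}$ and the other for $\boldsymbol{\alpha}$. For these blocks,

$$\frac{\partial^2 \ell_i}{\partial \beta_j \, \partial \beta_k} = \frac{-F_{i,j}F_{i,k}}{\sigma^2 \exp(\mathbf{x}_i\boldsymbol{\alpha}[1 + 2\mathbf{x}_i\boldsymbol{\alpha}])}$$

$$\frac{\partial^2 \ell_i}{\partial \alpha_j \, \partial \alpha_k} = -\frac{1}{2} X_{i,j}X_{i,k}\left[4 - \left(4 - (1 + 4\mathbf{x}_i\boldsymbol{\alpha})^2\right)\frac{(y_i - \mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta})^2}{\sigma^2 \exp(\mathbf{x}_i\boldsymbol{\alpha}[1 + 2\mathbf{x}_i\boldsymbol{\alpha}])}\right].$$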

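Clearly, $\mathcal{I}_{\mathbf{X}}(\boldsymbol{\beta})$ takes the form (2) with $P_{ii} = (\sigma^2 \exp(\mathbf{x}_i\boldsymbol{\alpha}[1 + 2\mathbf{x}_i\boldsymbol{\alpha}]))^{-1} = (\sigma^2 v(\mathbf{x}_i))^{-1}$, and given $\mathrm{E}_Y[(y_i - \mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta})^2 \mid \mathbf{x}_i] = \sigma^2 \exp(\mathbf{x}_i\boldsymbol{\alpha}[1 + 2\mathbf{x}_i\boldsymbol{\alpha}])$ we see that $\mathcal{I}_{\mathbf{X}}(\boldsymbol{\alpha}) = \mathbf{X}^T \mathbf{Q} \mathbf{X}$ with $Q_{ii} = \frac{1}{2}[1 + 4\mathbf{x}_i\boldsymbol{\alpha}]^2$. We note that for linear meta-models, $\mathbf{P}$ and $\mathbf{Q}$ do not depend on $\boldsymbol{\beta}$. This accords with Atkinson and Cook (1995) and their original derivation, which showed that the information expected to be obtained about $\boldsymbol{\beta}$ from the $i$-th design point is $\mathbf{f}(\mathbf{x}_i)\mathbf{f}^T(\mathbf{x}_i)/(\sigma^2 v(\mathbf{x}_i))$, while for $\boldsymbol{\alpha}$ it is $\mathbf{J}^T\mathbf{J}$, where $\mathbf{J} = (1 + 4\mathbf{x}_i\boldsymbol{\alpha})\mathbf{x}_i/\sqrt{2}$.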
As a means of comparison, the prior considered in Atkinson and Cook (1995) for $\boldsymbol{\alpha}$ placed equal mass
on the following five values: [1, 0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75] and [0, 1]. The motivation is that
these values span the directions in which the variance increases with $x_1$ and $x_2$, and that there is no prior
knowledge to suggest which direction is more likely than another. Figure 1 shows the locally D-optimal
designs for each unique value of $\boldsymbol{\alpha}$ along with the robust design, assuming the mean is known (thus focusing
on $\mathcal{I}_{\mathbf{X}}(\boldsymbol{\alpha})$).
Simulated annealing was employed to locate each design including the design weights. To do so, the
optimisation was initialised with a random selection of design points and design weights with a relatively large
value of n. Throughout the optimisation, if some weights approached zero, then the corresponding design
points were removed, which is why some optimal designs have different numbers of unique experimental
runs.
Notably, these designs are very similar to those presented in Figure 5 of Atkinson and Cook (1995)
(including the weights wi , not shown here). For the locally optimal designs, symmetry about α is observed.
This is expected given how α and x exist in the model. The robust design resembles a compromise between
the designs found for each value of α with the largest experimental effort being assigned to x = [1, 1].
Further, the points for x1 = 1 and x2 = 1 align with design points selected for different values of α . Lastly,
there is an inner point placed near x = [0, 0] which appears to be a compromise between the additional
design points found at the extreme values of α.

3.2 Logistic Meta-Model in the Presence of Correlation


3.2.1 Fisher Information Matrix via Generalized Estimating Equations
Now consider a Bernoulli response $Y_i$, so that $P(Y_i = 1) = p_i = \mathrm{E}_Y[Y_i \mid \mathbf{x}_i]$ and $g(\cdot)$ is the logit function, with $q = 3$ input factors and their pairwise interactions

$$\operatorname{logit}(p_i) = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,1}x_{i,2} + \beta_5 x_{i,1}x_{i,3} + \beta_6 x_{i,2}x_{i,3} + \varepsilon_i \tag{4}$$

but where we have added latent random variables $\boldsymbol{\varepsilon} \sim \mathrm{N}(\mathbf{0}, \mathbf{R})$ with a general $n \times n$ covariance matrix $R_{i,j} = R(\mathbf{x}_i, \mathbf{x}_j)$. Unlike the linear meta-model, the $\operatorname{logit}(\cdot)$ introduces non-linear terms into the log-likelihood, which make the integration over $\boldsymbol{\varepsilon}$ needed to obtain the marginal likelihood analytically intractable. Therefore, it is not possible to obtain an exact analytic expression for the expected Fisher information matrix for the model given in (4).
However, following Woods and van de Ven (2011), we can obtain an approximation using generalized
estimating equations (GEE) (see Liang and Zeger (1986) for details). For the logistic GLM with correlations,
the GEE leads to the following approximation (in the weighted design context)
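$$\mathcal{I}_{\mathbf{X},\mathbf{R}}(\boldsymbol{\beta}) \approx \mathbf{F}^T (\mathbf{W}\mathbf{P})^{1/2} \mathbf{R}^{-1} (\mathbf{W}\mathbf{P})^{1/2} \mathbf{F}, \tag{5}$$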


Figure 1: D-optimal designs for various values of α and the robust (Bayes) design.

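where the dependence on $\boldsymbol{\beta}$ is observed through $\mathbf{P}$ with $P_{ii} = p_i(1 - p_i) = \exp(\mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta})\left(1 + \exp(\mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta})\right)^{-2}$, and $W_{ii} = w_i$ with weights $0 \le w_i \le 1$ (with $\sum_{i=1}^{n} w_i = 1$, representing how sampling effort is distributed across design points), where, as in Section 2.1, each design point is relabelled $\mathbf{x}_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,q}, w_i]$ so that $\mathbf{X} \in \mathbb{R}^{n \times (q+1)}$.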
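A plain-Python sketch of the GEE approximation (5) for a logistic GLM, assuming $\mathbf{R}$ is supplied as a full $n \times n$ matrix and inverted by Gauss-Jordan elimination (adequate for small $n$ only). The function names are hypothetical.

```python
import math


def mat_inverse(A):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def gee_logistic_info(F, weights, beta, R):
    """GEE approximation F^T (WP)^{1/2} R^{-1} (WP)^{1/2} F for a logistic GLM."""
    n, d = len(F), len(F[0])
    sqrt_wp = []
    for row, w in zip(F, weights):
        p = 1.0 / (1.0 + math.exp(-sum(f * b for f, b in zip(row, beta))))
        sqrt_wp.append(math.sqrt(w * p * (1.0 - p)))  # (W_ii P_ii)^{1/2}
    R_inv = mat_inverse(R)
    G = [[s * f for f in row] for s, row in zip(sqrt_wp, F)]  # G = (WP)^{1/2} F
    # information = G^T R^{-1} G
    return [[sum(G[i][j] * R_inv[i][l] * G[l][k] for i in range(n) for l in range(n))
             for k in range(d)] for j in range(d)]


F = [[1.0, -1.0], [1.0, 1.0]]  # hypothetical two-run design with f(x) = [1, x]
print(gee_logistic_info(F, [0.5, 0.5], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]))
```

With $\mathbf{R} = \mathbf{I}$ the result coincides with $\mathbf{F}^T\mathbf{W}\mathbf{P}\mathbf{F}$; with a constant correlation of 0.5 between the two runs above, the information becomes $\operatorname{diag}(1/6, 1/2)$.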
For the purposes of this study, we assume constant (homoscedastic) variance $R(\mathbf{x}_i, \mathbf{x}_i) = \sigma^2$. The
covariance structures we consider are as follows (for $i \ne j$).

• Independent: The standard assumption in which (5) reduces to (2), i.e. $R(\mathbf{x}_i, \mathbf{x}_j) = 0$.
• Constant: All observations are equally correlated with each other, i.e. $R(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2\rho$.
• Auto-regressive: The observation index is treated as a time index, i.e. $R(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2\rho^{|i-j|}$.
• Distance-kernel: Isotropic spatial correlation between design points, i.e. $R(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2\rho\, e^{-\frac{1}{4}\|\mathbf{x}_i - \mathbf{x}_j\|_2^2}$.
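The four covariance structures can be assembled as in the sketch below, which assumes design points given as tuples of factor levels and uses the run order as the time index for the auto-regressive case. The function name and string labels are hypothetical.

```python
import math


def covariance_matrix(points, structure, sigma2=1.0, rho=0.5):
    """n x n matrix R(x_i, x_j): sigma2 on the diagonal (homoscedastic),
    off-diagonal entries given by the chosen correlation structure."""
    n = len(points)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                R[i][j] = sigma2
            elif structure == "independent":
                R[i][j] = 0.0
            elif structure == "constant":
                R[i][j] = sigma2 * rho
            elif structure == "autoregressive":
                R[i][j] = sigma2 * rho ** abs(i - j)  # run index treated as time
            elif structure == "distance":
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                R[i][j] = sigma2 * rho * math.exp(-0.25 * sq_dist)
            else:
                raise ValueError(f"unknown structure: {structure}")
    return R


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]  # hypothetical q = 3 points
print(covariance_matrix(pts, "distance", rho=0.5))
```

Note that under the distance kernel two replicates of the same point are correlated at $\sigma^2\rho$, not $\sigma^2$, because the kernel only scales the off-diagonal entries.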

Here ρ ∈ [0, 1] is a correlation parameter (for now considering only positive correlations). Each
correlation structure could, in principle, be valid for a specific computer simulation experiment. If this
structure is known a priori, then that structure should be used for the design. However, for most experiments,
the correlation structure is not known. Therefore, we seek to understand the efficiency of each correlation
assumption under misspecification.


3.2.2 Evaluating Design Efficiency Under Misspecification


To consider the question of design efficiency for the logistic GLM with correlations (4), we perform
a simulation study. For each of the above correlation structures we obtain a robust design using our
computational approach, and assess the efficiency under misspecification of that correlation. We define
some notation to express this comparison more formally. Let $\mathbf{X}^*(\mathbf{R})$ denote a robust design under the
D-optimality criterion (3) using the GEE approximation (5) with covariance matrix $\mathbf{R}$. Then define

$$J(\mathbf{X}, \mathbf{R}) = \int |\mathcal{I}_{\mathbf{X},\mathbf{R}}(\boldsymbol{\beta})|^{1/d} \pi(\boldsymbol{\beta}) \, d\boldsymbol{\beta} \approx \frac{1}{M} \sum_{m=1}^{M} |\mathcal{I}_{\mathbf{X},\mathbf{R}}(\boldsymbol{\beta}_m)|^{1/d}$$

where the $d$-th root is routinely used to allow fair comparisons between designs. The ratio $J[\mathbf{X}_1, \mathbf{R}]/J[\mathbf{X}_2, \mathbf{R}]$
gives the D-efficiency of a design $\mathbf{X}_1$ relative to a reference design $\mathbf{X}_2$ given a covariance matrix $\mathbf{R}$ (Woods
and van de Ven 2011). The D-efficiency can be interpreted in terms of the additional experimental effort
needed: for example, a D-efficiency of 0.5 means that the design would need to be run twice to obtain as much
information as the reference design.
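Now consider two covariance matrices $\mathbf{R}_1$ and $\mathbf{R}_2$; then $\mathbf{X}^*(\mathbf{R}_1)$ denotes the robust design assuming
$\mathbf{R}_1$, and similarly $\mathbf{X}^*(\mathbf{R}_2)$ is robust assuming $\mathbf{R}_2$. It follows that

$$\text{Misspecification D-Efficiency}(\mathbf{R}_1, \mathbf{R}_2) = \frac{J[\mathbf{X}^*(\mathbf{R}_1), \mathbf{R}_2]}{J[\mathbf{X}^*(\mathbf{R}_2), \mathbf{R}_2]} \tag{6}$$

represents the D-efficiency of a design using the misspecified $\mathbf{R}_1$ when $\mathbf{R}_2$ was the true covariance.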
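Given determinants $|\mathcal{I}_{\mathbf{X},\mathbf{R}}(\boldsymbol{\beta}_m)|$ already evaluated at a common set of prior draws, the Monte Carlo estimate of $J$ and the efficiency ratios reduce to a few lines. This sketch assumes the caller supplies those determinants; the function names are hypothetical. The misspecification efficiency (6) is the same ratio with $\mathbf{X}_1 = \mathbf{X}^*(\mathbf{R}_1)$ and $\mathbf{X}_2 = \mathbf{X}^*(\mathbf{R}_2)$, both evaluated under $\mathbf{R}_2$.

```python
def j_value(info_dets, d):
    """Monte Carlo estimate of J(X, R): mean over prior draws of |I_{X,R}(beta_m)|^(1/d)."""
    return sum(det ** (1.0 / d) for det in info_dets) / len(info_dets)


def relative_d_efficiency(dets_1, dets_2, d):
    """J[X1, R] / J[X2, R], with both determinant lists evaluated at the same prior draws."""
    return j_value(dets_1, d) / j_value(dets_2, d)


reference = [0.04, 0.09]                    # hypothetical determinants for X2, d = 2
doubled = [4.0 * v for v in reference]     # every information matrix scaled by 2
print(relative_d_efficiency(doubled, reference, 2))
```

Scaling every information matrix by $c$ multiplies $J$ by $c$, because $|c\,\mathcal{I}|^{1/d} = c\,|\mathcal{I}|^{1/d}$; this underlies the run-count interpretation above, in which information is taken to grow in proportion to the number of runs.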
We evaluate this misspecification efficiency (6) for each pair of covariance functions and do this for
a range of ρ ∈ [0.05, 0.95] to investigate how the efficiency depends on the correlation strength. For each
design simulation, we optimize n weighted design points for the q = 3 factor model using the robust design
expected utility estimated with M = 1,000 prior samples. When evaluating the final efficiency losses we
use a more precise Monte Carlo estimate with M = 20,000. The resulting efficiency as a function of
correlation strength is provided for each pair of covariance structures in Figure 2.

Figure 2: The efficiency of designs constructed under an assumed correlation structure (R1) when the true
correlation structure is R2, plotted against correlation strength.

Note that efficiencies > 1 suggest that the Monte Carlo sample size used during optimization (M = 1,000)
was too small. However, producing Figure 2 required 25 robust designs (8 values of ρ for each of the 3
covariance options that depend on ρ, plus the independent case), with each design run costing approximately
16 hours of CPU time on an Intel Xeon Gold 6140 processor (a total of approximately 400 CPU hours distributed
over 18 cores), which motivated the chosen value of M for this study.
Several important patterns are observed in Figure 2. Firstly, there is almost no penalty for assuming
a correlation structure when independence is valid. That is, the efficiency of constant, auto-regressive
or distance correlation relative to independence is > 0.9 (Figure 2, blue lines). A similar insensitivity
is apparent for misspecification relative to constant correlation. However, the situation becomes quite
different when considering misspecification relative to auto-regressive or distance-based correlation. In
both cases, the penalty of misspecification increases as the correlation strength, ρ, increases. Assuming
distance correlation when auto-regressive is true performs better overall than the converse.
However, when ρ > 0.7, constant correlation starts to be the better assumption under misspecification by
auto-regressive or distance correlation.
Thus, if sufficient knowledge is available to prescribe a correlation assumption with certainty, then
this will always be the best choice. Beyond this unrealistic case, one clear result is that the independent
assumption should only be used if the risk of misspecification is low. The same can mostly be said
for the constant assumption, unless the correlation strength is high, in which case the more specific
structural distinctions between auto-regressive and distance correlation become apparent. Choosing
auto-regressive or distance-based correlation assumptions does not substantially reduce the quality of the design
if independent or constant correlations would also have been valid choices. However, the choice between
auto-regressive and distance correlation is more complex and depends on the correlation strength. If one can
rule out auto-regressive correlation, which represents temporal correlation, then this causes few problems and
distance correlation should be used. However, if it is unclear whether distance or auto-regressive correlation applies,
then additional exploration is needed. In general we arrive at the following recommendations.

1. If R(xi , x j ) is known, use this in the design process.


2. If R(xi , x j ) is uncertain, but auto-regressive correlation can be excluded, then use distance-kernel
correlation.
3. If R(xi , x j ) is completely uncertain, some understanding of the range of ρ is required. If ρ ≤ 0.7
then distance-kernel correlation is more robust, otherwise constant correlation is more robust.
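The three recommendations can be written as a small decision rule. This is a hypothetical helper, and summarising "some understanding of the range of ρ" as a single plausible upper bound `rho_upper` is an assumption made here for illustration.

```python
def recommend_correlation(known_structure=None, autoregressive_excluded=False, rho_upper=None):
    """Return the correlation structure to assume when constructing the design."""
    if known_structure is not None:
        return known_structure  # 1. R(x_i, x_j) is known: use it
    if autoregressive_excluded:
        return "distance"  # 2. auto-regressive ruled out: distance kernel
    if rho_upper is None:
        raise ValueError("some understanding of the range of rho is required")
    # 3. completely uncertain: the choice depends on the plausible correlation strength
    return "distance" if rho_upper <= 0.7 else "constant"


print(recommend_correlation(rho_upper=0.9))
```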

We also investigated the qualitative differences in the design patterns for the various correlation structures
and values of correlation strength ρ. The example spatial patterns shown in Figure 3 correspond to a view
along the x1-axis (the views along the other axes are qualitatively very similar).
Given the efficiency results, it is not surprising that the design patterns look similar for different values
of ρ. However, for a fixed ρ we can observe some differences between designs under different correlation
assumptions. Both the independent and constant correlation cases are characterised by fewer points with
relatively constant weights, whereas the auto-regressive and distance correlation cases tend to have more points
with lower weights.

4 JOINT OPTIMIZATION OF DESIGN AND PRN ASSIGNMENT


Sections 2 and 3 describe design construction for GLMs where iid conditions may not be present. Unlike
physical experiments, where dependence or correlation may arise due to unavoidable constraints and
nuisance blocking effects need to be accounted for, simulation experiments can guarantee independence
by using different PRN streams for each design point. However, simulation experiments can conversely
induce correlations by the use of CRN. Schruben and Margolin (1978) were among the first to clearly
illustrate how doing so can improve the D-efficiency of a given design, and provided an assignment strategy
for mostly factorial-based designs. Of interest in this section is the merging of that idea with the design
construction approaches of the previous sections.


Figure 3: The weighted design points under different correlation assumptions. The weight of a design
point is represented by the circle radius. View is along the x1-axis.

4.1 Linear Meta-Model in the Presence of Correlation


Suppose there are $2g$ PRN streams given by $g$ streams (denoted $R_1, \ldots, R_g$) and their antitheses (denoted $\bar{R}_1, \ldots, \bar{R}_g$). Denote the $2g$ streams $R_1, \ldots, R_{2g}$ with $R_{g+j} = \bar{R}_j$, for $j = 1, \ldots, g$, and let $b(R_j)$ be the random block effect associated with PRN stream $R_j$. Suppose now that design point $\mathbf{x}_i$ is assigned PRN stream $R_{k(i)}$ for some assignment strategy $k(\cdot)$. Let $\mathbf{Z}$ be the $n \times 2g$ matrix with

$$Z_{ih} = \begin{cases} 1 & \text{if } R_h \text{ is used for experimental run } i; \\ 0 & \text{otherwise,} \end{cases}$$

and $\mathbf{b} = [b(R_1), \ldots, b(R_{2g})]^T$ be the $2g \times 1$ column vector of unique random block effects. Then $\boldsymbol{\gamma} = \mathbf{Z}\mathbf{b} = [b(R_{k(1)}), \ldots, b(R_{k(n)})]^T$ is the column vector of $n$ random block effects as assigned in the experiment.
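Now $\mathrm{E}(\boldsymbol{\gamma}) = \mathbf{0}_n$ and $\operatorname{Var}(\boldsymbol{\gamma}) = \sigma^2 \mathbf{Z}\mathbf{R}\mathbf{Z}^T$, where $\mathbf{R}$ is the $2g \times 2g$ matrix with $\sigma^2 R_{h,l} = \operatorname{Cov}(b(R_h), b(R_l))$, so that

$$\operatorname{Cov}\left(b(R_{k(i)}), b(R_{k(j)})\right) = \begin{cases} \sigma^2\rho_+ & \text{if } k(i) = k(j); \\ -\sigma^2\rho_- & \text{if } |k(i) - k(j)| = g; \\ 0 & \text{otherwise,} \end{cases}$$

where $\rho_- > 0$ and $\rho_+ > 0$ are unknown.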
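To make the role of $\mathbf{Z}$ concrete, the sketch below builds $\mathbf{Z}$ and the covariance of the assigned block effects for a given assignment. It assumes $\mathbf{R}$ is read as the $2g \times 2g$ stream-level matrix in units of $\sigma^2$ ($\rho_+$ on the diagonal, $-\rho_-$ between a stream and its antithesis), so that $\mathbf{Z}\mathbf{R}\mathbf{Z}^T$ is $n \times n$, and it indexes streams from 0, so stream $h + g$ is the antithesis of stream $h$. Function names are hypothetical.

```python
def assignment_matrix(k, g):
    """n x 2g incidence matrix Z: Z[i][h] = 1 if stream h is used for run i (0-based)."""
    return [[1.0 if k_i == h else 0.0 for h in range(2 * g)] for k_i in k]


def stream_correlation(g, rho_plus, rho_minus):
    """2g x 2g stream-level matrix R (units of sigma^2): rho_plus on the diagonal,
    -rho_minus between a stream and its antithesis, 0 otherwise."""
    size = 2 * g
    return [[rho_plus if h == l else (-rho_minus if abs(h - l) == g else 0.0)
             for l in range(size)] for h in range(size)]


def block_covariance(k, g, rho_plus, rho_minus):
    """Z R Z^T: covariance of the assigned block effects gamma, divided by sigma^2."""
    Z = assignment_matrix(k, g)
    R = stream_correlation(g, rho_plus, rho_minus)
    ZR = [[sum(z[h] * R[h][l] for h in range(2 * g)) for l in range(2 * g)] for z in Z]
    return [[sum(a * b for a, b in zip(ZR[i], Z[j])) for j in range(len(k))]
            for i in range(len(k))]


# g = 1: runs 0 and 1 share stream R_1, run 2 uses its antithesis
print(block_covariance([0, 0, 1], g=1, rho_plus=0.6, rho_minus=0.3))
```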


Here ρ+ and −ρ− are positive and negative correlations induced by using the same PRN stream or its
antithesis for experimental points i and j. Following Schruben and Margolin (1978), the model is
$$Y_i = \mathbf{f}^T(\mathbf{x}_i)\boldsymbol{\beta} + \gamma_i + \varepsilon_i \tag{7}$$


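where $\mathrm{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $\operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2(1 - \rho_+)\mathbf{I}$. For a linear meta-model, we can use the Ordinary Least Squares estimator $\hat{\boldsymbol{\beta}} = (\mathbf{F}^T\mathbf{F})^{-1}\mathbf{F}^T\mathbf{y}$, which can be shown to have variance (under (7))

$$\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{F}^T\mathbf{F})^{-1}\mathbf{F}^T\mathbf{V}\mathbf{F}(\mathbf{F}^T\mathbf{F})^{-1}, \quad \text{where } \mathbf{V} = (1 - \rho_+)\mathbf{I} + \mathbf{Z}\mathbf{R}\mathbf{Z}^T.$$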
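The design criterion implied by this variance can be evaluated directly, as in the following sketch. It assumes the stream-level reading of $\mathbf{R}$ (so $V_{ij} = 1$ on the diagonal, $\rho_+$ for runs sharing a stream, $-\rho_-$ for antithetic pairs and 0 otherwise) and 0-based stream indices, and it returns infinity when $\mathbf{F}^T\mathbf{V}\mathbf{F}$ is not positive definite, a guard added here rather than taken from the text. Function names are hypothetical.

```python
import math


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in A]
    n = len(M)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result


def v_matrix(k, g, rho_plus, rho_minus):
    """V = (1 - rho_plus) I + Z R Z^T, built entrywise from the stream assignment k."""
    n = len(k)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if k[i] == k[j]:
                V[i][j] = rho_plus  # same stream (including i == j)
            elif abs(k[i] - k[j]) == g:
                V[i][j] = -rho_minus  # stream and its antithesis
        V[i][i] += 1.0 - rho_plus
    return V


def log_var_criterion(F, k, g, rho_plus, rho_minus):
    """log|F^T V F| - 2 log|F^T F|, i.e. log|Var(beta_hat)| minus d log sigma^2."""
    n, d = len(F), len(F[0])
    V = v_matrix(k, g, rho_plus, rho_minus)
    FtF = [[sum(F[i][a] * F[i][b] for i in range(n)) for b in range(d)] for a in range(d)]
    FtVF = [[sum(F[i][a] * V[i][j] * F[j][b] for i in range(n) for j in range(n))
             for b in range(d)] for a in range(d)]
    det_v, det_f = det(FtVF), det(FtF)
    if det_v <= 0.0 or det_f <= 0.0:
        return math.inf  # not positive definite for these correlations
    return math.log(det_v) - 2.0 * math.log(det_f)


F = [[1.0, x] for x in (-1.0, -0.5, 0.5, 1.0)]  # hypothetical 4-run design, f(x) = [1, x]
print(log_var_criterion(F, [0, 0, 0, 0], g=1, rho_plus=0.5, rho_minus=0.3))  # CRN
print(log_var_criterion(F, [0, 1, 0, 1], g=1, rho_plus=0.5, rho_minus=0.3))  # stream / antithesis
```

With $\rho_+ = \rho_- = 0$ the criterion reduces to $-\log|\mathbf{F}^T\mathbf{F}|$, the standard D-optimality criterion for independent runs.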

4.2 Optimal Blocked Designs


Optimal design for blocked experiments has been considered previously (see, for example, Chapters 7 and
8 of Goos and Jones (2011) or Chapter 15 of Donev et al. (2007)). What is different here is that there is
positive correlation between elements of b whereas these are usually assumed to be independent. Design
specification here involves both the choice of X = [x1 , . . . , xn ]T and the PRN allocation k = [k(1), . . . , k(n)]
where k(i) ∈ {1, . . . , 2g}.
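While before we used the (determinant of the) Fisher information matrix, here, as the goal is to minimize
the volume of the covariance ellipsoid, we can directly and equivalently use the determinant of $\operatorname{Var}(\hat{\boldsymbol{\beta}})$ (on the log scale, for numerical stability). Since $\log|\operatorname{Var}(\hat{\boldsymbol{\beta}})| = d\log\sigma^2 + \log|\mathbf{F}^T\mathbf{V}\mathbf{F}| - 2\log|\mathbf{F}^T\mathbf{F}|$ and the first term does not depend on the design, this gives

$$(\mathbf{X}^*, \mathbf{k}^*) = \arg\min_{(\mathbf{X}, \mathbf{k})} \left\{\log|\mathbf{F}^T\mathbf{V}\mathbf{F}| - 2\log|\mathbf{F}^T\mathbf{F}|\right\}.$$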

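However, $\mathbf{V}$ depends on the unknowns $\rho_-$ and $\rho_+$ through $\mathbf{R}$. As before, we instead seek a robust design
under a joint prior distribution $\pi(\rho_-, \rho_+)$ with domain $[0, 1]^2$

$$(\mathbf{X}^*, \mathbf{k}^*) = \arg\min_{(\mathbf{X}, \mathbf{k})} \left\{\int_0^1\!\!\int_0^1 \log|\mathbf{F}^T\mathbf{V}\mathbf{F}| \, \pi(\rho_-, \rho_+) \, d\rho_- \, d\rho_+ - 2\log|\mathbf{F}^T\mathbf{F}|\right\} \tag{8}$$

and as the integral in (8) is not typically available in closed form, it is evaluated here using a 2-dimensional
Gauss-Legendre quadrature rule (see, for example, Weiser (2016))

$$(\mathbf{X}^*, \mathbf{k}^*) \approx \arg\min_{(\mathbf{X}, \mathbf{k})} \left(\sum_{m=1}^{M} \omega_m \log|\mathbf{F}^T\mathbf{V}_m\mathbf{F}| - 2\log|\mathbf{F}^T\mathbf{F}|\right)$$

where $\omega_1, \ldots, \omega_M$ are the quadrature weights and $\mathbf{V}_m$ is $\mathbf{V}$ evaluated at the corresponding quadrature nodes
$(\rho_-^{(m)}, \rho_+^{(m)})$. For illustrative purposes it suffices here to use a rudimentary joint optimization

$$(\mathbf{X}^*, \mathbf{k}^*) \approx \arg\min_{\mathbf{X}} \left(\min_{\mathbf{k}} \left(\sum_{m=1}^{M} \omega_m \log|\mathbf{F}^T\mathbf{V}_m\mathbf{F}| - 2\log|\mathbf{F}^T\mathbf{F}|\right)\right) \tag{9}$$

where the inner minimization is performed by enumerating over all possible $\mathbf{k}$ and the outer using a simple
coordinate exchange algorithm (Meyer and Nachtsheim 1995).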
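The inner minimization of (9) can be sketched as follows, assuming independent Uniform(0, 1) priors on $\rho_-$ and $\rho_+$ (as in the proof of concept below) and a 2-point Gauss-Legendre rule per dimension, i.e. $M = 4$ nodes; the number of nodes is an assumption. Assignments for which $\mathbf{F}^T\mathbf{V}_m\mathbf{F}$ is not positive definite at some node (which can happen when $\rho_-$ is large relative to $\rho_+$) are treated as infeasible, which is our own handling. The outer coordinate exchange over $\mathbf{X}$ is omitted, and all names are hypothetical.

```python
import itertools
import math


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in A]
    n = len(M)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result


def log_var_criterion(F, k, g, rho_plus, rho_minus):
    """log|F^T V F| - 2 log|F^T F| for stream assignment k (0-based stream indices)."""
    n, d = len(F), len(F[0])
    V = [[(rho_plus if k[i] == k[j] else -rho_minus if abs(k[i] - k[j]) == g else 0.0)
          + (1.0 - rho_plus if i == j else 0.0) for j in range(n)] for i in range(n)]
    FtF = [[sum(F[i][a] * F[i][b] for i in range(n)) for b in range(d)] for a in range(d)]
    FtVF = [[sum(F[i][a] * V[i][j] * F[j][b] for i in range(n) for j in range(n))
             for b in range(d)] for a in range(d)]
    det_v, det_f = det(FtVF), det(FtF)
    if det_v <= 0.0 or det_f <= 0.0:
        return math.inf  # infeasible assignment (our assumption)
    return math.log(det_v) - 2.0 * math.log(det_f)


# 2-point Gauss-Legendre rule on [-1, 1] (nodes +-1/sqrt(3), weights 1), mapped to [0, 1]
GL_NODES = [0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0)]
GL_WEIGHTS = [0.5, 0.5]


def quadrature_criterion(F, k, g):
    """sum_m omega_m log|F^T V_m F| - 2 log|F^T F| (the weights omega_m sum to 1)."""
    total = 0.0
    for (rho_plus, w_plus), (rho_minus, w_minus) in itertools.product(
            list(zip(GL_NODES, GL_WEIGHTS)), repeat=2):
        total += w_plus * w_minus * log_var_criterion(F, k, g, rho_plus, rho_minus)
    return total


def best_assignment(F, g):
    """Inner minimization of (9): enumerate all (2g)^n stream assignments."""
    best_k, best_val = None, math.inf
    for k in itertools.product(range(2 * g), repeat=len(F)):
        val = quadrature_criterion(F, k, g)
        if val < best_val:
            best_k, best_val = list(k), val
    return best_k, best_val


F = [[1.0, x] for x in (-1.0, -0.5, 0.5, 1.0)]  # hypothetical 4-run design, f(x) = [1, x]
print(best_assignment(F, g=1))
```

Enumeration costs $(2g)^n$ criterion evaluations, so it is only practical for small $n$ and $g$, such as the $n = 10$, $g = 1$ case below (1,024 assignments per candidate design).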

4.3 Proof of Concept


Suppose $n = 10$, and there are $q = 2$ inputs with design region $[-1, 1]^2$ and $\mathbf{f}^T(\mathbf{x}_i) = [1, x_{i,1}, x_{i,2}, x_{i,1}x_{i,2}, x_{i,1}^2, x_{i,2}^2]$, so that
$d = 6$, and there is $g = 1$ PRN stream (together with its antithesis). The discrete set of values in the coordinate exchange algorithm is
$\{-1, -0.9, -0.8, \ldots, 0.8, 0.9, 1\}$. A robust design (denoted $(\mathbf{X}_R, \mathbf{k}_R)$) is found via (9), where the prior joint
distribution for $\rho_-$ and $\rho_+$ consists of independent uniform distributions. This is compared to the design
(denoted $\mathbf{X}_C$) found by minimizing (9) but where the same PRN stream is used for all design points (i.e.,
CRN where $k(i) = 1$, $i = 1, \ldots, n$), and to the design (denoted $\mathbf{X}_I$) found by minimizing (9) but with a
different PRN stream for each design point (i.e., independent, which is equivalent to minimizing $-\log|\mathbf{F}^T\mathbf{F}|$
and is the standard D-optimal design). Note that these are the same comparisons as made by Schruben
and Margolin (1978).
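For the simplest comparison design $\mathbf{X}_I$ (independent streams, i.e. minimizing $-\log|\mathbf{F}^T\mathbf{F}|$), a coordinate exchange over the grid $\{-1, -0.9, \ldots, 1\}$ might look like the following. It is an illustrative sketch only: the random starting design, the improvement tolerance and the pass limit are assumptions, the names are hypothetical, and it is not intended to reproduce the values reported below.

```python
import itertools
import math
import random


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in A]
    n = len(M)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result


GRID = [round(-1.0 + 0.1 * i, 1) for i in range(21)]  # {-1, -0.9, ..., 0.9, 1}


def f_quad(x):
    """Full second-order model row f^T(x) = [1, x1, x2, x1*x2, x1^2, x2^2]."""
    x1, x2 = x
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]


def neg_log_det(design):
    """-log|F^T F|, the criterion minimized by the independent-streams design X_I."""
    F = [f_quad(x) for x in design]
    d = len(F[0])
    FtF = [[sum(row[a] * row[b] for row in F) for b in range(d)] for a in range(d)]
    value = det(FtF)
    return -math.log(value) if value > 0.0 else math.inf


def coordinate_exchange(n=10, seed=0, max_passes=20):
    """Try every grid value for every coordinate of every run, keeping any change that
    lowers the criterion; stop after a full pass with no improvement."""
    rng = random.Random(seed)
    design = [[rng.choice(GRID), rng.choice(GRID)] for _ in range(n)]  # random start
    current = neg_log_det(design)
    for _ in range(max_passes):
        improved = False
        for i, j in itertools.product(range(n), range(2)):
            for value in GRID:
                old = design[i][j]
                design[i][j] = value
                trial = neg_log_det(design)
                if trial < current - 1e-12:
                    current, improved = trial, True
                else:
                    design[i][j] = old  # revert: no improvement
        if not improved:
            break
    return design, current


design, value = coordinate_exchange(n=10, seed=1)
print(value)
```

For $q = 2$ the face-centered central composite design (with its center point) coincides with the $3^2$ factorial, which is a convenient sanity check because it supports the full quadratic model.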
Both the independent and CRN designs converged to the face-centered central composite design with
points at [−1, −1] and [+1, +1] and did not utilise the center point, while the robust design had a repeated
center point. The minimum values of $\log|\operatorname{Var}(\hat{\boldsymbol{\beta}})|$ are −9.1, −11.8 and −13.0 for the independent, CRN
and robust designs, respectively, thus demonstrating the benefit of inducing correlation over favouring
independence, and the benefit of using a combination of common and antithetic random number streams.
Finally, an interesting comparison is the performance of the independent and CRN designs under the
optimal allocation of the two PRN streams ($R_1$ and $\bar{R}_1$). It turns out that $\log|\operatorname{Var}(\hat{\boldsymbol{\beta}})| = -11.8 > -13.0$ in
both cases. This demonstrates the utility of jointly optimizing over the design points $\mathbf{X}$ and PRN assignment
$\mathbf{k}$: simply using a standard D-optimal design and then applying the optimal PRN assignment strategy
to that design (as originally performed by Schruben and Margolin (1978)) can be outperformed by joint
optimization.

5 SUMMARY
Kleijnen (2015) provides important guidance on how to analyse simulation experiments in the event of
departures from the ubiquitous independent and identically distributed assumptions. Heteroscedasticity
in simulation output is not uncommon, and it is potentially beneficial to induce dependence through the
reuse of pseudo-random number streams to reduce the generalized variance of the meta-model parameter
estimators.
In this paper, we focus on the experimental design (rather than analysis) aspects, and employ a computational
approach to robust design for expensive computer experiments without the need to assume independence
or identical distribution of errors in the meta-model to be developed. Through explicit modelling of
the variance component for linear meta-models, the Fisher information was obtained within a maximum
likelihood inference framework, while for generalized linear meta-models, explicit modelling of the correlation
structure together with generalized estimating equations can be employed to approximate the Fisher information
matrix. In both cases, robust designs can then be computationally sought which maximize some relevant
statistic of this matrix, averaged across a prior distribution of any unknown parameters.
Moving away from the assumption of independence implies that a correlation structure be introduced,
the misspecification of which could have a negative effect on the performance of the design. We built
upon Woods and van de Ven (2011) to begin investigation of robust designs for GLMs with correlations.
However, our work is distinct from Woods and van de Ven (2011) as we investigated the effect of covariance
matrix misspecification for a variety of correlation structures, in the context of a 3-factor logistic GLM with
pairwise interactions. While our results are not exhaustive, the cases of constant correlation, auto-regressive
correlation, and distance correlation represent major classes of correlation structures (uniform, temporal,
spatial) and are helpful to inform some recommendations.
As illustrated in Section 4, it may be effective to consider ρ as part of the vector of unknown parameters
and hence integrate over a joint prior probability density. The choice to look at the efficiencies of designs
as a function of ρ was primarily to identify any dependencies between the effect of misspecification and
the correlation strength. Since we mainly observe this dependency for the auto-regressive cases, robust
design over ρ may only be required if auto-regressive is a feasible correlation structure. It is also important
to note that this simulation approach is designed to obtain some heuristics for dealing with correlation
assumptions. In practice, the D-efficiency for a real problem will never be available. However, the results
provide some means to assist in the interpretation of confidence regions that are obtained for a design. That
is, one must assume some misspecification and therefore treat predicted parameter uncertainty estimates
as underestimates for the true uncertainty that could arise when the computer experiment is performed.
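One way to quantify the effect of misspecification discussed above: when a GEE estimator is fitted with working correlation $R_w$ but the responses actually have correlation $R_t$, its covariance is the sandwich $M_0^{-1} M_1 M_0^{-1}$ (Liang and Zeger 1986) rather than the model-based $M_0^{-1}$. The sketch below compares that sandwich with the covariance under the correct structure and reports a D-efficiency-style ratio. The efficiency definition, the main-effects model and the parameter values are assumptions and may not match the exact measure used in Section 4.

```python
import numpy as np

def sandwich_cov(F, beta, R_work, R_true):
    """GEE covariance M0^{-1} M1 M0^{-1} for a logistic GLM fitted with
    working correlation R_work when the true correlation is R_true."""
    mu = 1.0 / (1.0 + np.exp(-F @ beta))
    G = np.sqrt(mu * (1.0 - mu))[:, None] * F        # A^{1/2} F
    Rw_inv = np.linalg.inv(R_work)
    M0 = G.T @ Rw_inv @ G                            # model-based information
    M1 = G.T @ Rw_inv @ R_true @ Rw_inv @ G          # "meat" of the sandwich
    M0_inv = np.linalg.inv(M0)
    return M0_inv @ M1 @ M0_inv

def misspec_efficiency(F, beta, R_work, R_true):
    """(det Cov_correct / det Cov_misspecified)^(1/p); equals 1 when R_work == R_true."""
    p = F.shape[1]
    ld_ok = np.linalg.slogdet(sandwich_cov(F, beta, R_true, R_true))[1]
    ld_mis = np.linalg.slogdet(sandwich_cov(F, beta, R_work, R_true))[1]
    return float(np.exp((ld_ok - ld_mis) / p))

# Hypothetical example: independence assumed, AR(1) truth, varying rho.
rng = np.random.default_rng(2)
n = 12
F = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=(n, 3))])
beta = np.array([0.0, 1.0, -0.5, 0.5])
idx = np.arange(n)
for rho in (0.2, 0.5, 0.8):
    R_true = rho ** np.abs(idx[:, None] - idx[None, :])
    print(rho, misspec_efficiency(F, beta, np.eye(n), R_true))
```

By a generalized Gauss-Markov argument the ratio cannot exceed 1; plotting it against ρ is one way to look for the dependence on correlation strength described above.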
Finally, Schruben and Margolin (1978) pioneered the search for effective strategies for assigning pseudo-random numbers to design points, but did so with fixed (textbook) designs and for linear meta-models only. In this paper, we have provided a proof-of-concept example of jointly optimizing the design and the pseudo-random number assignment, and shown that gains in statistical efficiency can be made.
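The joint optimization can be illustrated, under strong simplifying assumptions, for a linear meta-model: each run i is assigned a PRN stream k(i) out of g streams, runs sharing a stream are given correlation ρ (a crude model of common random numbers), and runs on different streams are independent. The sketch below alternates coordinate-wise design changes over a grid with complete enumeration of all g^n assignments, scoring each candidate by the log-determinant of the GLS information $X^\top \Sigma^{-1} X$. The covariance model, the one-factor first-order model and all function names are hypothetical; this is not the paper's algorithm.

```python
import itertools
import numpy as np

def prn_covariance(assign, rho=0.5, sigma2=1.0):
    """Hypothetical covariance: correlation rho between runs on the same stream, 0 otherwise."""
    a = np.asarray(assign)
    same = (a[:, None] == a[None, :]).astype(float)
    return sigma2 * ((1.0 - rho) * np.eye(len(a)) + rho * same)

def model_matrix(x):
    """First-order model in one factor: intercept and slope (illustrative)."""
    return np.column_stack([np.ones(len(x)), x])

def gls_log_det(x, assign, rho):
    """log det of the GLS information X^T Sigma^{-1} X."""
    X = model_matrix(x)
    Sigma = prn_covariance(assign, rho)
    return np.linalg.slogdet(X.T @ np.linalg.solve(Sigma, X))[1]

def best_assignment(x, g, rho):
    """Complete enumeration of all g**n stream assignments k(.)."""
    best = max(itertools.product(range(g), repeat=len(x)),
               key=lambda k: gls_log_det(x, k, rho))
    return best, gls_log_det(x, best, rho)

def joint_exchange(x, g, rho, grid, sweeps=2):
    """Coordinate exchange on the design, re-optimizing k(.) for every trial point."""
    x = np.array(x, dtype=float)
    k, val = best_assignment(x, g, rho)
    for _ in range(sweeps):
        for i in range(len(x)):
            for level in grid:
                trial = x.copy()
                trial[i] = level
                k_t, v_t = best_assignment(trial, g, rho)
                if v_t > val + 1e-12:          # keep strict improvements only
                    x, k, val = trial, k_t, v_t
    return x, k, val

x0 = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 0.0])
x_opt, k_opt, v_opt = joint_exchange(x0, g=2, rho=0.5, grid=np.linspace(-1.0, 1.0, 5))
print(x_opt, k_opt, v_opt)
```

Every trial point triggers a full g^n enumeration, which is what makes this approach expensive as the design grows.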

6 FUTURE RESEARCH
This paper has assumed that the covariates, representing the simulator inputs and configuration settings, are continuous. However, discrete covariates must also be dealt with. Challenges arise in this case because the covariance structure can be more complex. Furthermore, stochastic optimization is substantially more challenging in the discrete-covariate case. While simulated annealing can handle discrete spaces (Kirkpatrick, Gelatt, and Vecchi 1983), it will be more computationally intensive. Unfortunately, methods such as Approximate Coordinate Exchange (Overstall and Woods 2017) can only handle continuous design spaces; however, other methods, such as the coordinate-exchange algorithm of Meyer and Nachtsheim (1995), may be able to handle a discrete design space. Further work is needed to determine the most effective computational scheme for this case.
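To make the discrete case concrete, the sketch below applies simulated annealing (Kirkpatrick, Gelatt, and Vecchi 1983) to a design whose factors each take values in a finite set of levels, maximizing $\log\det(X^\top X)$ for the 3-factor model with pairwise interactions under independent errors. The level sets, cooling schedule and function names are illustrative assumptions; a realistic version would substitute one of the robust criteria discussed earlier.

```python
import numpy as np

def model_matrix(design):
    """Intercept, main effects and pairwise interactions for 3 factors."""
    x1, x2, x3 = design.T
    return np.column_stack([np.ones(len(design)), x1, x2, x3,
                            x1 * x2, x1 * x3, x2 * x3])

def log_det_info(design):
    """log det(X^T X), or -inf if the information matrix is singular."""
    X = model_matrix(design)
    sign, val = np.linalg.slogdet(X.T @ X)
    return val if sign > 0 else -np.inf

def anneal_discrete(levels, n, iters=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over n-run designs where factor j takes values in levels[j]."""
    rng = np.random.default_rng(seed)
    design = np.column_stack([rng.choice(lv, size=n) for lv in levels]).astype(float)
    cur = log_det_info(design)
    best, best_val, t = design.copy(), cur, t0
    for _ in range(iters):
        i, j = rng.integers(n), rng.integers(len(levels))
        trial = design.copy()
        trial[i, j] = rng.choice(levels[j])   # move: change one coordinate to a random level
        val = log_det_info(trial)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(delta / t)
        if val >= cur or rng.random() < np.exp((val - cur) / t):
            design, cur = trial, val
            if cur > best_val:
                best, best_val = design.copy(), cur
        t *= cooling                          # geometric cooling schedule
    return best, best_val

levels = [[-1.0, 0.0, 1.0], [-1.0, 1.0], [-1.0, 0.0, 1.0]]   # assumed level sets
best_design, best_val = anneal_discrete(levels, n=12)
print(best_val)
```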
Model misspecification is a broad challenge in robust design for computer experiments. While accounting
for heteroscedasticity and correlations improves the situation substantially, there is still the potential for bias
in the design due to the meta-model being unable to replicate some behaviours of the complex computer
model. One approach to deal with this is the inclusion of an additional discrepancy term using Gaussian
processes (Englezou 2018; Kennedy and O’Hagan 2000).
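A minimal sketch of how a discrepancy term changes the design criterion, assuming a zero-mean Gaussian process discrepancy with a squared-exponential kernel added to a linear meta-model with independent errors: the responses then have covariance $\sigma^2 I + K_\delta$, and the GLS information becomes $X^\top(\sigma^2 I + K_\delta)^{-1}X$. This omits the calibration parameters of the Kennedy and O'Hagan (2000) framework; the kernel and hyperparameter values are assumptions.

```python
import numpy as np

def sq_exp_kernel(design, tau2=0.5, length=0.5):
    """Squared-exponential covariance tau2 * exp(-||x_i - x_j||^2 / (2 length^2))."""
    diff = design[:, None, :] - design[None, :, :]
    return tau2 * np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * length ** 2))

def info_with_discrepancy(X, design, sigma2=1.0, tau2=0.5, length=0.5):
    """GLS information X^T (sigma2 I + K_delta)^{-1} X for meta-model plus GP discrepancy."""
    Sigma = sigma2 * np.eye(len(design)) + sq_exp_kernel(design, tau2, length)
    return X.T @ np.linalg.solve(Sigma, X)

rng = np.random.default_rng(3)
design = rng.uniform(-1.0, 1.0, size=(10, 2))
X = np.column_stack([np.ones(10), design])   # first-order meta-model (assumed)
for tau2 in (0.0, 0.5, 2.0):
    print(tau2, np.linalg.slogdet(info_with_discrepancy(X, design, tau2=tau2))[1])
```

Because adding $K_\delta$ can only shrink the information in the Loewner order, ignoring the discrepancy makes a design look more informative than it is.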
For the joint optimization of the design and PRN assignment, the coordinate exchange algorithm we used relied on complete enumeration of all possible assignments k(·). The size of this set grows rapidly with the number of PRN streams g, so the approach would appear difficult to apply even for g > 1 and is therefore not particularly scalable. A more sophisticated approach will be required.
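To give a sense of how fast the enumeration grows, assume (hypothetically) that k(·) maps each of n design points to one of g streams. There are then g^n such maps or, if streams are interchangeable, $\sum_{j=1}^{g} S(n, j)$ distinct groupings, where $S(n, j)$ are Stirling numbers of the second kind. The sketch counts both; the assignment set used in the paper may be defined differently.

```python
def stirling2(n, k):
    """Stirling number of the second kind via S(i, j) = j S(i-1, j) + S(i-1, j-1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]

def labelled_assignments(n, g):
    """Number of maps from n design points to g distinguishable streams."""
    return g ** n

def unlabelled_assignments(n, g):
    """Number of assignments up to relabelling of interchangeable streams."""
    return sum(stirling2(n, j) for j in range(1, g + 1))

# Example with a hypothetical 16-run design.
for g in (1, 2, 3, 4):
    print(g, labelled_assignments(16, g), unlabelled_assignments(16, g))
```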
For non-linear meta-models we have focused on the logistic GLM that corresponds to a binary outcome
from a simulation. However, the computational approach we consider here for robust design would also be
applicable to other GLMs of interest, such as binomial and Poisson responses. For the joint optimization
problem, a generalized linear mixed model (GLMM) approach might be applicable (see Chapter 17 of Pawitan (2013)). The meta-model would then have the form $g(\mathrm{E}_Y[Y \mid X]) = F_\beta \beta + \gamma$, where $\gamma = Zb$ as in Section 4. The unknown parameters $\beta$ could be estimated via maximum likelihood, where $\mathrm{Var}(\hat{\beta}) \approx (F^\top M F)^{-1}$ and there are various forms for $M$ under different approximations (Pawitan 2013). This modelling approach is an example of a GLMM, for which optimal designs have been considered previously (see Xu and Singh (2021) and references therein).
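As a hedged illustration of the GLMM route, the sketch below uses one common approximation (penalized quasi-likelihood style, with weights evaluated at $b = 0$): for a logistic GLMM with $\gamma = Zb$ and $b \sim N(0, \sigma_b^2 I)$, take $M = (W^{-1} + \sigma_b^2 Z Z^\top)^{-1}$ with $W = \mathrm{diag}\{\mu_i(1-\mu_i)\}$, so that $\mathrm{Var}(\hat\beta) \approx (F^\top M F)^{-1}$. Here $Z$ is a hypothetical one-hot matrix recording which PRN stream drives each run; this is not necessarily the form of $M$ the authors have in mind.

```python
import numpy as np

def glmm_approx_cov(F, beta, Z, sigma_b2):
    """Approximate Var(beta_hat) = (F^T M F)^{-1} for a logistic GLMM with
    gamma = Z b, b ~ N(0, sigma_b2 I), using M = (W^{-1} + sigma_b2 Z Z^T)^{-1}
    and W = diag(mu (1 - mu)) evaluated at b = 0."""
    mu = 1.0 / (1.0 + np.exp(-F @ beta))
    w = mu * (1.0 - mu)
    V = np.diag(1.0 / w) + sigma_b2 * Z @ Z.T   # marginal working covariance
    M = np.linalg.inv(V)
    return np.linalg.inv(F.T @ M @ F)

n = 8
x = np.linspace(-1.0, 1.0, n)
F = np.column_stack([np.ones(n), x])
beta = np.array([0.0, 1.0])                     # hypothetical parameter values
k = np.array([0, 0, 1, 1, 0, 0, 1, 1])          # hypothetical PRN stream per run
Z = np.eye(2)[k]                                # one-hot stream incidence matrix
print(np.linalg.slogdet(glmm_approx_cov(F, beta, Z, sigma_b2=0.5))[1])
```

Setting `sigma_b2 = 0` recovers the usual independent-run logistic covariance $(F^\top W F)^{-1}$.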

REFERENCES
Atkinson, A. C., and R. D. Cook. 1995. “D-Optimum Designs for Heteroscedastic Linear Models”. Journal of the American
Statistical Association 90:204–212.
Chaloner, K., and I. Verdinelli. 1995. “Bayesian Experimental Design: A Review”. Statistical Science 10(3):273–304.
Donev, A., A. Atkinson, and R. Tobias. 2007. Optimum Experimental Designs, with SAS. Oxford Statistical Science Series.
United Kingdom: Oxford University Press.
Dunn, P., and G. Smyth. 2018. Generalized Linear Models with Examples in R. Springer Texts in Statistics. New York: Springer.
Englezou, Y. 2018, July. Bayesian Design for Calibration of Physical Models. Ph. D. thesis, University of Southampton.
https://eprints.soton.ac.uk/427145/, accessed 6th May 2021.
Fedorov, V. V. 1972. Theory of Optimal Experiments. Translated and edited by W. J. Studden and E. M. Klimko. New York: Academic Press.
Gill, A. 2019. “Two Common Pitfalls Applying Design of Experiments (and Hopefully How to Avoid Them!)”. In Proceedings
of MODSIM2019, 23rd International Conference on Modelling and Simulation, edited by S. Elsawah, 323–329. Canberra,
Australian Capital Territory, Australia: Modelling and Simulation Society of Australia and New Zealand, Inc.
Gill, A. 2021. “Heteroscedasticity and Correlation in Linear Regression”. In Proceedings of MODSIM2021, 24th International
Conference on Modelling and Simulation, edited by R. W. Vervoort, A. A. Voinov, J. P. Evans and L. Marshall, 834–840.
Canberra, Australian Capital Territory, Australia: Modelling and Simulation Society of Australia and New Zealand, Inc.
Gill, A., D. Grieger, M. Wong, and W. Chau. 2018. “Combat Simulation Analytics: Regression Analysis, Multiple Comparisons
and Ranking Sensitivity”. In Proceedings of the 2018 Winter Simulation Conference, edited by M. Rabe, A. A. Juan,
N. Mustafee, S. Jain and B. Johansson, 3789–3800. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers,
Inc.
Goos, P., and B. Jones. 2011. Optimal Design of Experiments: A Case Study Approach. Wiley.
Kennedy, M. C., and A. O’Hagan. 2000. “Bayesian Calibration of Computer Models”. Journal of the Royal Statistical Society,
Series B, Methodological 63:425–464.
Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi. 1983. “Optimization by Simulated Annealing”. Science 220(4598):671–680.
Kleijnen, J. 2015. Design and Analysis of Simulation Experiments. 2nd ed. New York, USA: Springer.
Levenberg, K. 1944. “A Method for the Solution of Certain Non-Linear Problems in Least Squares”. Quarterly of Applied
Mathematics 2:164–168.
Liang, K.-Y., and S. L. Zeger. 1986. “Longitudinal Data Analysis Using Generalized Linear Models”. Biometrika 73(1):13–22.
Marquardt, D. W. 1963. “An Algorithm for Least-Squares Estimation of Nonlinear Parameters”. SIAM Journal on Applied
Mathematics 11(2):431–441.
Meyer, R., and C. Nachtsheim. 1995. “The Coordinate-Exchange Algorithm for Constructing Exact Optimal Experimental
Designs”. Technometrics 37:60–69.
Montgomery, D. 2012. Design and Analysis of Experiments, 8th Edition. John Wiley & Sons, Incorporated.
Overstall, A. M., and D. C. Woods. 2017. “Bayesian Design of Experiments using Approximate Coordinate Exchange”.
Technometrics 59(4):458–470.
Pawitan, Y. 2013. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.
Sanchez, S. M., T. W. Lucas, P. J. Sanchez, C. J. Nannini, and H. Wan. 2012. Designs for Large-Scale Simulation Experiments,
with Applications to Defense and Homeland Security, Chapter 12, 413–441. John Wiley & Sons, Ltd.
Santner, T. J., B. J. Williams, and W. I. Notz. 2003. The Design and Analysis of Computer Experiments. New York, USA: Springer-Verlag.
Schruben, L. W., and B. H. Margolin. 1978. “Pseudorandom Number Assignment in Statistically Designed Simulation and
Distribution Sampling Experiments”. Journal of the American Statistical Association 73(363):504–520.
Weiser, C. 2016. mvQuad: Methods for Multivariate Quadrature. (R package version 1.0-6). https://cran.r-project.org/web/packages/mvQuad/index.html, accessed 13th April 2022.
Woods, D. C., S. M. Lewis, J. A. Eccleston, and K. G. Russell. 2006. “Designs for Generalized Linear Models with Several
Variables and Model Uncertainty”. Technometrics 48:284–292.
Woods, D. C., and P. van de Ven. 2011. “Blocked Designs for Experiments With Correlated Non-Normal Response”.
Technometrics 53(2):173–182.
Xu, X., and S. Singh. 2021. “Robust Designs for Generalized Linear Mixed Models with Possible Model Misspecification”.
Journal of Statistical Planning and Inference 210:20–41.

AUTHOR BIOGRAPHIES
ANDREW GILL is a senior operations analyst at the Defence Science and Technology Group in Australia. He holds a PhD
degree in Applied Mathematics from James Cook University and an MSc degree in Operations Research and Statistics from the
University of New South Wales. His research interest is the design and analysis of simulation experiments, and he is currently
conducting a fellowship in that area. His e-mail address is [email protected].

DAVID WARNE is a Lecturer in statistics in the School of Mathematical Sciences at the Queensland University of Technology.
He holds a Ph.D. degree in Mathematics, a BMath, and BInfoTech (Hons) from the Queensland University of Technology.
His research interests include stochastic modelling, mathematical biology and ecology, Bayesian computation, Monte Carlo
methods, and high performance computing (HPC). His email address is [email protected].

ANTONY OVERSTALL is an Associate Professor in Statistics in the School of Mathematical Sciences at the University of
Southampton. His research interests lie in the areas of optimal experimental design and the analysis of categorical data. His
email address is [email protected].

CLARE MCGRORY was a Postdoctoral Fellow in the School of Mathematical Sciences at the Queensland University of
Technology. She holds a PhD degree in Statistics, and an MSc in Mathematics and Statistics from the University of Glasgow.
Her research interests include computational statistics, approximate inference and Bayesian methods. Her email address is
[email protected].

JAMES MCGREE is a Professor in the School of Mathematical Sciences at the Queensland University of Technology. He
holds a PhD degree in Statistics and a BSc (Hons) from the University of Queensland. His research interests include design
of experiments, Bayesian design and Bayesian computational algorithms. His email address is [email protected].
