Variable selection in the context of computer models
MUMS Foundations of Model Uncertainty
Gonzalo Garcia-Donato¹ and Rui Paulo²
¹ Universidad de Castilla-La Mancha (Spain), ² Universidade de Lisboa (Portugal)
Duke University, BFF conference, April 30 2019
Screening
Computer model
Let y(·) denote the output of the computer model with generic input x ∈ S ⊂ ℝ^p.
Screening
The question we want to answer is:
Which inputs significantly impact the output? These are the so-called active inputs; the other inputs are called inert.
Gaussian process prior
We place a Gaussian process prior on y(·):
$$y(\cdot) \mid \theta, \sigma^2 \;\sim\; \mathrm{GP}\!\left(0, \sigma^2 c(\cdot, \cdot)\right)$$

with correlation function

$$c(x, x') = \prod_{k=1}^{p} c_k(x_k, x_k')\,.$$
We will assume that ck is a one-dimensional correlation function with fixed (known) roughness parameter and unknown range parameter γk > 0. We adopt the Matérn 5/2 correlation function: with dk = |xk − x′k|,

$$c_k(x_k, x_k') = c_k(d_k) = \left(1 + \frac{\sqrt{5}\, d_k}{\gamma_k} + \frac{5 d_k^2}{3 \gamma_k^2}\right) \exp\!\left(-\frac{\sqrt{5}\, d_k}{\gamma_k}\right)$$

so that θ = (γ1, . . . , γp).
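As a quick illustration (ours, not part of the talk), the Matérn 5/2 correlation above translates directly into R:

```r
# Matérn 5/2 correlation for a single input; d is |x_k - x_k'| (a scalar,
# vector, or matrix of distances), gamma the range parameter
matern52 <- function(d, gamma) {
  r <- sqrt(5) * d / gamma
  (1 + r + r^2 / 3) * exp(-r)
}
```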
Our approach to screening
The information to address the screening question comes in the
form of model data: a design D = {x1, . . . , xn} is selected, the
computer model is run at each of these configurations and we
define
$$y = \bigl(y(x_1), \ldots, y(x_n)\bigr)\,.$$
Consider all 2^p models for y that result from allowing only a subset of the p inputs to be active:
Let δ = (δ1, . . . , δp) ∈ {0, 1}^p identify each of the subsets
If δ ≠ 0, Mδ : y | σ², θ ∼ N(0, σ²Rδ) where

$$R_\delta = \left[\;\prod_{k:\, \delta_k = 1} c_k(x_{ki}, x_{kj})\right]_{i,j = 1, \ldots, n}$$

If δ = 0, Mδ : y | σ², θ ∼ N(0, σ²I)
Screening is now tantamount to assessing the support that y
lends to each of the Mδ, i.e. a model selection exercise
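A sketch (ours) of how Rδ can be assembled for a given δ, reusing the matern52 helper above:

```r
# Correlation matrix R_delta for model M_delta: elementwise product of the
# one-dimensional Matérn 5/2 correlations over the active inputs.
# X: n x p design matrix; delta: 0/1 vector; gamma: range parameters.
R_delta <- function(X, delta, gamma) {
  R <- matrix(1, nrow(X), nrow(X))
  for (k in which(delta == 1)) {
    dk <- abs(outer(X[, k], X[, k], "-"))
    R <- R * matern52(dk, gamma[k])
  }
  R
}
```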
Parametrizations
In multiple linear regression, the 2^p models are formally obtained by setting to zero subsets of the vector of regression coefficients of the full model
Here, it depends on the parametrization used for the
correlation function:
γk — range parameter:
xk is inert ⇔ γk → +∞
βk = 1/γk — inverse range parameter:
xk is inert ⇔ βk = 0
ξk = ln βk — log-inverse range parameter:
xk is inert ⇔ ξk → −∞
These parameterizations are free of the roughness parameter
(Gu et al. 2018)
Bayes factors and posterior inclusion probabilities
Our answer to the screening question is then obtained by
considering the marginal likelihoods
$$m(y \mid \delta) \;\equiv\; \int N(y \mid 0, \sigma^2 R_\delta)\, \pi(\theta, \sigma^2 \mid \delta)\, d\sigma^2\, d\theta$$
which allow us to compute Bayes factors and, paired with prior
model probabilities, posterior model probabilities, p(δ | y).
Marginal posterior inclusion probabilities are particularly
interesting in this context:
$$p(x_k \mid y) = \sum_{\delta:\, \delta_k = 1} p(\delta \mid y)$$
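Once all models are enumerated and their posterior probabilities computed, the inclusion probabilities are just sums; a sketch, assuming hypothetical objects delta_mat (one 0/1 row per model) and post_prob (the corresponding p(δ | y)):

```r
# Posterior inclusion probabilities by enumeration: delta_mat is the
# (2^p - 1) x p matrix of model indicators, post_prob the vector p(delta | y)
incl_prob <- drop(t(delta_mat) %*% post_prob)
names(incl_prob) <- paste0("x", seq_len(ncol(delta_mat)))
```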
Difficulties
Conceptual: selecting π(θ, σ² | δ)
Computational:
computing the integral in m(y | δ)
If p is large, enumeration of all models is not practical, so
obtaining p(xk | y) is problematic
Priors for Gaussian processes
Berger, De Oliveira and Sansó, 2001, JASA is a seminal paper
c(x, x′) = c(‖x − x′‖, θ) with θ a unidimensional range parameter
Focus on spatial statistics
Some of the commonly used priors give rise to improper
posteriors
Reference prior is derived and posterior propriety is proved
π(θ, σ²) = π(θ)/σ² with π(θ) proper as long as the mean of the GP has a constant term
Paulo, 2005, AoS
Focus on emulation of computer models
c(x, x′) = ∏k ck(|xk − x′k|, θk)
Reference prior is obtained and posterior propriety established
when D is a Cartesian product
Many extensions are considered, e.g., to include a nugget, to focus on spatial models, etc.
Gu, Wang and Berger, 2018, AoS focuses on robust
emulation
Gu 2018, BA, the jointly robust prior
Priors for Gaussian processes
The reference prior is proper for many correlation functions as
long as there is an unknown mean, but it changes very little if
the GP has zero mean
It is hence very appealing for variable selection; however,
the normalizing constant is unknown (and we need it)
it is computationally very costly
Gu (2018) jointly robust prior

$$\pi(\beta) = C_0 \left(\sum_{k=1}^{p} C_k \beta_k\right)^{a} \exp\!\left(-b \sum_{k=1}^{p} C_k \beta_k\right)$$

where C0 is known; this prior mimics the tail behavior of the reference prior
This talk is about using the jointly robust prior to answer the
screening problem
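A direct transcription of the log JR density (our sketch; the defaults for a, b and the Ck are those of Gu (2018), as implemented in RobustGaSP, so here they are left as arguments):

```r
# log of the jointly robust prior, up to the known constant log(C0);
# beta: inverse range parameters; Ck, a, b: hyperparameters
log_jr_prior <- function(beta, Ck, a, b) {
  s <- sum(Ck * beta)
  a * log(s) - b * s
}
```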
Robust emulation
Emulation: using y(·) | y in lieu of the actual computer model
Obtaining the posterior π(θ | y) via MCMC and using it to
get to y(·) | y is not always practical
The alternative is to compute an estimate of θ and plug it into the conditional posterior predictive of y(·)
Problems arise because often
a) R̂ ≈ I, so prediction reverts to the mean, becoming an impulse function close to y
b) R̂ ≈ 11ᵀ, and numerical instability increases
Gu et al. (2018) term this phenomenon lack of robustness; because of the product form of the correlation function, it happens when
a) γ̂k → 0 for at least one k
b) γ̂k → +∞ for all k
Robust emulation (cont.)
Gu et al. (2018) show that if the γ̂k are computed by maximizing

$$f(y \mid \gamma)\, \pi(\gamma)$$

where π(γ, σ², µ) ∝ π(γ)/σ² is the reference prior, then robustness is guaranteed
Remarks:
Since γk → +∞ means that xk is inert, the reference prior will not be able to detect, through the marginal posterior modes, the situation where all the inputs are inert
Nor will the reference prior detect the situation where some of the inputs are inert (again by looking at marginal posterior modes)
The mode is not invariant with respect to reparametrization,
hence the parametrization matters
Gu et al. (2018) discourage the use of βk
Laplace approximation
The jointly robust prior does not allow for a closed form
expression for m(y | δ)
There is a large literature on computing (ratios of) normalizing
constants, often relying on MCMC samples from the posterior
Our goal was to explore the possibility of using the Laplace approximation to compute m(y | δ), given that there is code available to obtain the β̂k, k = 1, . . . , p — the R package RobustGaSP [Gu, Palomo and Berger, 2018] — and these estimates possess nice properties
Laplace approximation (cont.)
For each model indexed by δ ≠ 0,

$$m(y) \propto \int \exp[h(\beta)]\, d\beta$$

where, with π(β) denoting the JR prior,

$$h(\beta) = \ln L(\beta \mid y) + \ln \pi(\beta)$$

and

$$L(\beta \mid y) \propto \left(y^T R^{-1} y\right)^{-n/2} |R|^{-1/2}$$
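For one model, h(β) can be evaluated with a Cholesky factorization; a sketch (ours, not the talk's code), reusing the R_delta and log_jr_prior helpers sketched earlier:

```r
# h(beta) = log integrated likelihood + log JR prior, up to constants
# common to all models; beta holds inverse ranges for the active inputs only
h_fun <- function(beta, X, y, delta, Ck, a, b) {
  gamma <- rep(1, length(delta))
  gamma[delta == 1] <- 1 / beta      # inert entries are never used
  R <- R_delta(X, delta, gamma)
  U <- chol(R)                       # R = U'U
  z <- backsolve(U, y, transpose = TRUE)
  quad <- sum(z^2)                   # y' R^{-1} y
  logdetR <- 2 * sum(log(diag(U)))   # log |R|
  -(length(y) / 2) * log(quad) - logdetR / 2 + log_jr_prior(beta, Ck, a, b)
}
```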
Laplace approximation (and BIC)
Having obtained $\hat\beta = \arg\max h(\beta)$, expand h(β) around that mode to obtain (up to a constant)

$$m(y) \approx (2\pi)^{k/2} \exp[h(\hat\beta)]\, |H|^{-1/2}, \qquad H = -\left.\frac{\partial^2 h}{\partial \beta\, \partial \beta^T}\right|_{\hat\beta}$$

where k = 1ᵀδ is the number of inputs in the model indexed by δ ≠ 0.
One can obtain explicit formulae for all these quantities (except for β̂!) as functions of

$$R^{-1}, \qquad \frac{\partial R}{\partial \beta_j}, \qquad \frac{\partial^2 R}{\partial \beta_i\, \partial \beta_j}$$

One also has all the quantities needed to compute BIC-based posterior marginals:

$$m(y) \approx n^{-k/2}\, L(\hat\beta)$$
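In place of the explicit formulae, the mode and Hessian can also be obtained numerically; a sketch (ours) using base R's optim and optimHess as a generic stand-in for the RobustGaSP optimizer:

```r
# Log Laplace and log BIC approximations to m(y) for one model;
# beta0 is a starting value for the active inverse range parameters
laplace_bic <- function(beta0, X, y, delta, Ck, a, b) {
  fit <- optim(beta0, h_fun, X = X, y = y, delta = delta, Ck = Ck,
               a = a, b = b, method = "L-BFGS-B", lower = 1e-8,
               control = list(fnscale = -1))   # fnscale = -1: maximize
  H <- -optimHess(fit$par, h_fun, X = X, y = y, delta = delta,
                  Ck = Ck, a = a, b = b)       # H = -d2h / dbeta dbeta'
  k <- sum(delta)
  log_laplace <- (k / 2) * log(2 * pi) + fit$value -
    0.5 * as.numeric(determinant(H)$modulus)
  log_bic <- -(k / 2) * log(length(y)) +
    fit$value - log_jr_prior(fit$par, Ck, a, b)
  c(laplace = log_laplace, bic = log_bic)
}
```

The lower bound keeps the search just off the β = 0 boundary, which foreshadows the boundary-mode issue discussed below.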
Testbeds
Linkletter et al. 2006, Technometrics deals with variable selection
in the context of computer models.
They consider several examples:
Sinusoidal: y(x1, . . . , x10) = sin(x1) + sin(5x2)
Simple linear model: y(x1, . . . , x10) = 0.2x1 + 0.2x2 + 0.2x3 + 0.2x4
Decreasing impact: $y(x_1, \ldots, x_{10}) = \sum_{i=1}^{8} (0.2/2^{i-1})\, x_i$
No signal: y(x1, . . . , x10) = 0
The model data are obtained with a 54-run maximin Latin hypercube design in [0, 1]¹⁰, and iid N(0, 0.05²) noise is added to the model runs.
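The data-generating step for, e.g., the sinusoidal testbed might look as follows (a sketch; the lhs package is one way to get a maximin Latin hypercube):

```r
library(lhs)

# 54-run maximin Latin hypercube in [0,1]^10, sinusoidal model plus
# iid N(0, 0.05^2) noise
set.seed(1)
n <- 54; p <- 10
X <- maximinLHS(n, p)
y <- sin(X[, 1]) + sin(5 * X[, 2]) + rnorm(n, sd = 0.05)
```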
Sinusoidal function — initial results
Upon trying to obtain the Laplace approximation for all 2^p − 1 = 1023 models,
We encountered models where the mode had entries that were
at the boundary: ˆβk = 0
Those modes were of two types: hk < 0 or hk > 0 but
non-zero gradient
This precludes the use of the usual Laplace approximation, as
the mode must be an interior point of the parameter space
Variable selection: two approaches
There are mainly two approaches to variable selection:
Estimation-based
The full model is assumed to be true and the prior (or penalty) encourages sparsity; a criterion is established for determining whether a variable is included or not
Model selection-based, in which all possible models are formally considered; this is what we are trying to accomplish here
The issue of multiple comparisons is present, but the priors for the parameters are, one might say, of a different nature
The jointly robust prior differs from the reference prior in that
it can detect the case where some of the inputs are inert,
whereas the reference prior cannot
This is what we are observing: the jointly robust prior encourages sparsity
The reference prior revisited...
The motivation for the JR prior is that it matches the “exponential and polynomial tail decaying rates of the reference prior”
Exponential decay when γ → 0 prevents R from being nearly diagonal
Polynomial decay when γ → +∞ allows the likelihood to come into play — large values usually produce better fit
Which polynomial?
If γi → +∞ for i ∈ E ⊆ {x1, . . . , xp} and the other γi are bounded,

$$\pi^R(\gamma) \propto \prod_{i \in E} \gamma_i^{-3} \qquad\text{or}\qquad \pi^R(\beta) \propto \prod_{i \in E} \beta_i$$

This motivates considering the modified prior

$$\tilde\pi^{JR}(\beta) \propto \left(\prod_{i=1}^{p} \beta_i\right) \pi^{JR}(\beta)$$
Sinusoidal example revisited...
This prior pushes the βk away from zero, so no modes at the boundary were found
Some of the modes were pushed away from zero by an order of magnitude, so some inputs were deemed more relevant
The marginal posteriors of the models whose modes had some components at zero did not seem to change much
The marginal posteriors of models with some support did not seem to change much either
Sinusoidal example results
Recall that only x1 and x2 are active; the prior on the model space
was constant.
With our modification of the jointly robust prior, these were the
results obtained:
            x1  x2  x3      x4      x5      x6      x7      x8      x9      x10
p(xk | y)   1   1   0.0808  0.0358  0.5720  0.1917  0.2891  0.7042  0.8249  0.0019
The true model was ranked in 44th place, with posterior model probability 0.00011.
δ     p(δ | y)   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
404   0.3235     1   1   0   0   1   0   0   1   1   0
452   0.2567     1   1   0   0   0   0   1   1   1   0
52    0.1442     1   1   0   0   1   1   0   0   0   0
408   0.0510     1   1   1   0   1   0   0   1   1   0
292   0.0422     1   1   0   0   0   1   0   0   1   0
260   0.03760    1   1   0   0   0   0   0   0   1   0
The modified prior
The culprit is not:
the inaccuracy of the Laplace approximation
correlations in the design
the prior or its hyperparameters
It's the model!
We need a nugget: for δ ≠ 0,

$$M_\delta: \; y \mid \sigma^2, \sigma_0^2, \theta \;\sim\; N(0, \sigma^2 R_\delta + \sigma_0^2 I)$$
The GP prior implies that the emulator is an interpolator at the observed data, and that can no longer happen when we remove inputs from the correlation matrix; without a nugget we are hence defining an inadequate model...
Adding the nugget
By introducing η = σ₀²/σ², the reference prior is now

$$\pi(\beta, \eta, \sigma^2) \propto \frac{\pi(\beta, \eta)}{\sigma^2}$$

and σ² can be integrated out
The jointly robust prior is

$$\pi(\beta, \eta) = C_0 \left(\sum_{k=1}^{p} C_k \beta_k + \eta\right)^{a} \exp\!\left[-b \left(\sum_{k=1}^{p} C_k \beta_k + \eta\right)\right]$$

And the tail of the reference prior: if γi → +∞ for i ∈ E ⊆ {x1, . . . , xp}, η → 0, and the other γi are bounded,

$$\pi^R(\gamma, \eta) \propto \prod_{i \in E} \gamma_i^{-3} \qquad\text{or}\qquad \pi^R(\beta, \eta) \propto \prod_{i \in E} \beta_i$$

i.e., there is no penalization in η
The modified prior
For robustness, Gu et al. (2018) recommend the parametrization in terms of (ln β, ln η) = (ξ, τ), so that we set

$$\pi(\beta, \eta) \propto \left(\prod_{k=1}^{p} \beta_k\right) \left(\sum_{k=1}^{p} C_k \beta_k + \eta\right)^{a} \exp\!\left[-b \left(\sum_{k=1}^{p} C_k \beta_k + \eta\right)\right]$$

but do all the calculations (mode and Laplace approximation) in the (ξ, τ) parametrization
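A sketch (ours) of the resulting log prior as it would be evaluated in the recommended parametrization; note the Jacobian term that the change of variables to (ξ, τ) introduces:

```r
# log of the modified JR prior with nugget, evaluated in
# (xi, tau) = (ln beta, ln eta); up to the normalizing constant
log_mod_prior <- function(xi, tau, Ck, a, b) {
  beta <- exp(xi); eta <- exp(tau)
  s <- sum(Ck * beta) + eta
  sum(xi) +                # extra prod(beta_k) factor of the modified prior
    a * log(s) - b * s +   # jointly robust part, now including the nugget
    sum(xi) + tau          # log Jacobian: prod(beta_k) * eta
}
```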
We next show the results for the four examples (M = 250 simulations):
the Laplace-based inclusion probabilities using a flat prior on
the model space
the Laplace-based inclusion probabilities using a Scott and
Berger (2010) prior on the model space
the BIC-based inclusion probabilities using a flat prior on the
model space
[Figure "Flat vs Scott and Berger": prior model probability P(M_p) plotted against the model size p, comparing the flat and the Scott and Berger priors on the model space.]
[Figure, simple linear model: posterior inclusion probabilities p(xk | y) for x1 through x10 over the M = 250 simulations; panels: "Laplace − flat prior", "Laplace − SB prior", "BIC − flat prior".]
[Figure, sinusoidal model: posterior inclusion probabilities p(xk | y) for x1 through x10 over the M = 250 simulations; panels: "Laplace − flat prior", "Laplace − SB prior", "BIC − flat prior".]
[Figure, decreasing impact: posterior inclusion probabilities p(xk | y) for x1 through x10 over the M = 250 simulations; panels: "Laplace − flat prior", "Laplace − SB prior", "BIC − flat prior".]
[Figure, no signal: posterior inclusion probabilities p(xk | y) for x1 through x10 over the M = 250 simulations; panels: "Laplace − flat prior", "Laplace − SB prior", "BIC − flat prior".]
Work in progress
In the simulations, we find models for which H has negative entries on the diagonal. Upon inspection, the marginals seem well behaved, so this is numerical instability — we need more numerically stable ways of computing H
Investigate the possibility of using this method for variable selection in the bias function
Produce an R package
Summary
Fully automatic, formal Bayesian variable selection approach to screening which is computationally very simple
Based on a correlation function that possesses attractive properties and on a prior which mimics the behavior of the reference prior
Need for a nugget when considering all 2^p models
BIC behaves quite well and does not require H
Thank you for your attention!