Bayesian Tomography Using Polynomial Chaos Expansion and Deep Generative Networks
Giovanni Angelo Meles,1 Macarena Amaya,1 Shiran Levy,1 Stefano Marelli2 and Niklas Linde1
1 Institute of Earth Sciences, Department of Applied and Environmental Geophysics, University of Lausanne, 1015 Lausanne, Switzerland. E-mail: [email protected]
2 Department of Civil, Environmental and Geomatic Engineering, ETH Zurich, Institute of Structural Engineering, 8093 Zurich, Switzerland
Accepted 2024 January 17. Received 2023 December 19; in original form 2023 October 10
C The Author(s) 2024. Published by Oxford University Press on behalf of The Royal Astronomical Society. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Principal component analysis (PCA) and related methods are the most common linear dimensionality reduction methods (Boutsidis et al. 2008; Jolliffe & Cadima 2016). Based on a set of prior model realizations and the eigenvalues/eigenvectors of the corresponding covariance matrix, PCA provides optimal M-dimensional representations in terms of uncorrelated variables that retain as much of the sample variance as possible. PCA has found widespread application in geophysics, using both deterministic and stochastic inversion algorithms, with recent advancements offering the potential to reconstruct even discontinuous structures (Reynolds et al. 1996; Giannakis et al. 2021; Thibaut et al. 2021; Meles et al. 2022). In the context of complex geological media with discrete interfaces (Strebelle 2002; Zahner et al. 2016; Laloy et al. 2018) investigated in this study, PCA-based methods are ineffective in encoding the prior knowledge. Promising alternatives are offered by deep generative models (DGMs) that learn the underlying input distribution. By using different parametrizations for the prior and for the surrogate, we preserve the advantageous prior representation capabilities of DGMs while simultaneously achieving precise PCA-based surrogate modelling, thereby significantly speeding up forward calculations.

2 METHODOLOGY

2.1 Bayesian inversion

Forward models are mathematical tools that quantitatively evaluate the outcome of physical experiments. We refer to the relationship between input parameters and output values as the 'forward problem':

F(u) = y + ε. (1)

Here, F, u, y and ε stand for the physical law or forward operator, the input (i.e. the quantity under investigation), the observed data and the noise, respectively.

Ideally, one would like to (i) map the prior distribution onto a Gaussian field on a low-dimensional manifold and (ii) learn an accurate surrogate to compute the forward problem. However, it is not generally granted that a parametrization can achieve both (i) and (ii). For representing the prior distribution, we can utilize manifold identification using DGMs. This involves utilizing a DGM to characterize a latent space using a set of coordinates (here indicated as z) with a statistical distribution defined prior to the training. The DGM allows mapping between this latent space and the physical space through a decoder/generator, denoted here as G_DGM. For a given random realization z, the decoder operation G_DGM(z) produces an output u in the physical space that adheres to the characteristics of the prior distribution. The use of this new set of coordinates z casts the inverse problem on the latent manifold as:

P(z | y) ∝ P(y | z) P(z). (4)

While formally identical to eq. (2), eq. (4) involves significant advantages. In this parametrization, eq. (1) becomes:

M_DGM(z) = y + ε, (5)

where M_DGM = F ∘ G_DGM, ∘ stands for function composition, and we assume no error induced by the DGM dimensionality reduction, which in turn implies that the corresponding likelihood is the same as that in eq. (3). The complexity and non-linearity of G_DGM imply that the forward operator M_DGM exhibits considerable irregularity, making it difficult to learn a surrogate model due to typical issues of neural networks, such as those associated with limited data availability and overfitting (Bejani & Ghatee 2021). Consequently, we investigate alternative approaches that avoid using the latent parametrization for surrogate modelling while retaining it for the prior representation. Building upon Meles et al. (2022), we explore surrogate modelling based on PCA inputs spanning the Global spatial extent of the input. Without any loss of generality, we consider a complete set of Global Principal Components (in the following, GPCs) for realizations of the DGM (implemented via G_GPC(x_GPC^full) = G_DGM(z) = u) and rewrite eq. (1) as:

M_GPC(x_GPC^full) = y + ε, (6)

where G_GPC and M_GPC are the physical distribution and the model associated with the GPCs, and therefore M_GPC = F ∘ G_GPC. We will show that the linear relationship G_GPC(x_GPC^full) = u helps in implementing a surrogate of M_GPC, provided that the input and the model can be faithfully represented as operating on an effective M-dimensional truncated subset x_GPC of the new coordinates x_GPC^full, that is:

G_GPC(x_GPC) ≈ u ⇒ M̂_GPC(x_GPC) = y + ε̂, (7)

where ε̂ is a term including both observational noise and modelling errors related to the projection on the subset represented by x_GPC. Due to the error introduced by the truncation, the likelihood associated with the GPC parametrization deviates from the one of eq. (3). However, when a substantial number of GPCs are utilized, the extent of such difference tends to be minimal.

The formulation given above relies on the weak hypothesis that Global proximity in the input domain u (i.e. the quantity under investigation) leads to proximity in the output domain. Critically, when the output functionally depends mainly on a subset L of the entire domain of u, proximity in the output domain can be attained by approximating the input within this Local region. Based on this strong physically-informed assumption, we can achieve this goal by means of a Local PCA decomposition restricted to L:

G_LPC(x_LPC)|_L ≈ u|_L ⇒ M̂_LPC(x_LPC) = y + ε̂, (8)

where LPC refers to Local Principal Components (in the following LPCs) and |_L restricts the validity of these relationships to the subset L. Because the area spanned by LPCs is smaller than that of GPCs, we expect to need fewer LPCs than GPCs to achieve a satisfactory approximation in the output domain. Note that this change of coordinates is not invertible even if a complete set of LPCs is considered.

Once available, surrogate models can be used for likelihood evaluation in MCMC inversions, with a modified covariance operator C_D = C_d + C_Tapp comprising the covariance matrices C_d and C_Tapp accounting for data uncertainty and modelling error, respectively. In such cases, the likelihood not only shows a mild dependence on the parametrization but also on the surrogate model. This relationship is expressed as follows:

L(M̃_GPC(x_GPC)) = (2π)^(−n/2) |C_D|^(−1/2) exp(−(1/2) (M̃_GPC(x_GPC) − y)^T C_D^(−1) (M̃_GPC(x_GPC) − y)), (10)

where |C_D| is the determinant of the covariance matrix C_D (Hansen et al. 2014). Similar formulas for the likelihood can be derived for any other parametrization of the input space, such as those associated with DGMs or LPCs. The substantial differences in likelihoods associated with the DGM, GPC and LPC parametrizations can potentially give rise to significantly different estimations of the full posterior distribution.

Surrogate models all adhere to a fundamental principle: the more complex the input–output relationship, the higher the computational demands needed to build the surrogate model (e.g. in terms of training set). Furthermore, the efficiency of constructing a surrogate is significantly affected by the number of input parameters, and it can become unfeasible when the input dimensionality exceeds several tens of parameters; thus surrogate modelling often relies on some kind of dimensionality reduction. The dimensionality reduction step does not necessarily need to be invertible, since what holds significance is the supervised performance, specifically the minimization of modelling error (Lataniotis et al. 2020). Surrogate models exhibit their peak potential when dealing with low-dimensional input spaces, provided that such simplicity does not entail a complex input–output relationship.

Based on these general considerations, we can qualitatively anticipate different surrogate performances when operating on different parametrizations.
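As an illustration of how a likelihood of the form of eq. (10) can be evaluated in practice, the following sketch computes the Gaussian log-likelihood of a residual under a combined covariance C_D = C_d + C_Tapp. The dimensions and covariance values are hypothetical placeholders, not those of the study, and a log-formulation based on `slogdet`/`solve` is used for numerical stability instead of the explicit determinant and inverse appearing in eq. (10).

```python
import numpy as np

def gaussian_log_likelihood(residual, C_D):
    """Log of the Gaussian likelihood of eq. (10) for a residual
    r = M_tilde(x) - y and combined covariance C_D = C_d + C_Tapp."""
    n = residual.size
    _, logdet = np.linalg.slogdet(C_D)           # stable log-determinant
    quad = residual @ np.linalg.solve(C_D, residual)  # r^T C_D^{-1} r
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Hypothetical setup: 144 traveltimes, 1 ns^2 data variance plus a
# diagonal model-error term (both placeholder values for illustration).
rng = np.random.default_rng(0)
n = 144
C_d = 1.0 * np.eye(n)        # data noise covariance
C_Tapp = 0.2 * np.eye(n)     # surrogate modelling-error covariance
C_D = C_d + C_Tapp
residual = rng.standard_normal(n)
print(gaussian_log_likelihood(residual, C_D))
```

In a real inversion, the residual would come from the PCE surrogate prediction minus the observed traveltimes; the log-likelihood can be passed directly to an MCMC sampler.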
3.2.2 Global-PCA-PCE strategy

The second strategy, in the following Global-PCA-PCE, uses inputs of the PCE modelling defined in terms of projections on prior-informed PCA components spanning the entire domain. More specifically, in the Global-PCA-PCE approach we randomly create a total of 1000 slowness realizations G_VAE(z) from the prior and compute the corresponding principal components (see Fig. 3). The inputs for PCE in the Global-PCA-PCE approach are then the projections of G_VAE(z) on a to-be-chosen number of such GPCs. Following Meles et al. (2022), all PCA processes are defined in terms of slowness.

The effective dimensionality of the input with respect to M_GPC, that is, the number of GPCs representing the input, is not a priori defined. Following a similar approach to Meles et al. (2022), the effective dimensionality is here assessed by comparison with the reference solution in the output domain with respect to the noise level. In Figs 4(a) and (e), two velocity distributions are shown next to the approximate representations (Figs 4b–d and f–h) obtained by projecting them on 30, 60 and 90 GPCs, respectively. As expected, the reconstruction quality improves as more principal components are included.

To quantify the faithfulness of the various reduced parametrizations in terms of the output, we consider 100 realizations of the generative model, and compute the resulting histograms of the traveltime residuals using the reference forward solver. The root-mean-square errors (in the following, rmse) of the misfit between the data associated with the original distribution and its projections on 30, 60 and 90 principal components, shown in Figs 4(i)–(k), are 1.60, 0.85 and 0.55 ns, respectively, which are to be compared to the expected level of GPR data noise of 1 ns for 100 MHz data (Arcone et al. 1998). The number of principal components (i.e. 90 PCs) required to approximate the forward solver below the expected noise level is larger than for the example considered by Meles et al. (2022) (i.e. 50 PCs). Building a PCE on such a large basis is challenging in terms of computational requirements and efficiency, and could lead to poor accuracy if a small training set is used. To address this, one approach is to either reduce the number of components, which introduces larger modelling errors, or explore alternative parametrizations that offer improved computational efficiency and accuracy. In this study, the Global-PCA-PCE approach utilizes 60 GPCs, while an alternative strategy is considered below that is based on physical considerations.
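The projection-based assessment described above can be sketched as follows. The ensemble size, grid size and the linear operator standing in for the eikonal solver are all hypothetical stand-ins; the point is only to show how GPCs are obtained from prior realizations and how the output misfit of truncated projections can be measured.

```python
import numpy as np

rng = np.random.default_rng(42)
n_real, n_cells = 1000, 40 * 20              # hypothetical ensemble and grid sizes
U = rng.standard_normal((n_real, n_cells))   # stand-in for G_VAE(z) slowness realizations

# Global Principal Components from the centred ensemble via SVD.
mean = U.mean(axis=0)
_, _, Vt = np.linalg.svd(U - mean, full_matrices=False)

def project(u, n_pc):
    """Project a realization onto the first n_pc GPCs."""
    coeff = (u - mean) @ Vt[:n_pc].T
    return mean + coeff @ Vt[:n_pc]

# Stand-in linear forward operator (the study uses an eikonal solver).
F = rng.standard_normal((144, n_cells)) / n_cells

u = U[0]
for n_pc in (30, 60, 90):
    res = F @ u - F @ project(u, n_pc)       # traveltime residual of the projection
    print(n_pc, np.sqrt(np.mean(res ** 2)))  # rmse in the output domain
```

The input reconstruction error is guaranteed to shrink as components are added; as the text notes, the output rmse is what should be compared against the expected data-noise level.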
Figure 3. (a)–(j) The first five GPCs in the input domain corresponding to entire slowness fields. Crosses and circles stand for sources and receivers, respectively.
3.2.3 Local-PCA-PCE strategy

As mentioned in Section 2.3, an improved parametrization for surrogate modelling can sometimes be found by considering the forward problem's specific characteristics. The GPCs in the Global-PCA-PCE approach refer to the input field in its entirety. However, the actual first-arrival time for a given source–receiver combination depends only on a subdomain of the entire slowness distribution. This leads us to suggest a Local approach, in the following referred to as Local-PCA-PCE. Instead of using principal components describing the entire slowness field, we aim to use output-specific LPCs that characterize only the sub-domains impacting the physics of the problem (Jensen et al. 2000; Husen & Kissling 2001). We then expect that fewer LPCs are needed than GPCs to achieve the desired input–output accuracy. In practice, our construction of LPCs involves utilizing fat-ray sensitivity kernels, which capture the sensitivity of arrival times to small perturbations in the subsurface model, thus providing valuable insights into the regions [corresponding to the L subset in eq. (8)] that have the most significant impact on the observed data (Husen & Kissling 2001). For a given source–receiver pair, the corresponding sensitivity kernel depends on the slowness field itself and its patterns can vary significantly (see Figs 5a–j). The sought Local decomposition needs to properly represent any possible slowness field within the prior, thus it is reasonable to define it based on a representative sample of input realizations. To reduce the overall PCE computational cost it is also convenient to limit the number of used output-driven decompositions. To achieve these goals, we assume that the prior model is stationary with respect to translation. Instead of focusing on each specific source–receiver pair (a total of 144), we can then consider each source–receiver angle (a total of 23). We then use a total of 1000 slowness realizations G_VAE(z) from the prior and build the corresponding fat-ray sensitivity kernels using the reference eikonal solver for each of the 23 possible angles (Hansen et al. 2013). For any given angle, we consider a cumulative kernel consisting of the sum of the absolute values of each kernel (green areas in Figs 5k–t). Such a cumulative kernel covers an area larger than each individual kernel but is still considerably smaller than the entire slowness field. For any possible additional input model, the corresponding sensitivity kernel is then very likely to be geometrically included in the area covered by the cumulative kernel (see Figs 5k–t). Based on this insight, we define principal components spanning only the area covered by such cumulative kernels or relevant parts thereof (e.g. a threshold can be taken into consideration to reduce the size of these subdomains). For the practical definition of the components, the cumulative kernels are either set to 0 or 1 depending on whether the corresponding value is below or larger than the threshold, respectively. We then multiply point by point the slowness distributions with the cumulative kernels, and consider the principal components of such products.

The first five LPCs are shown for given source–receiver pairs (Figs 6a–e and f–j). Note that the pattern variability is confined within the cumulative kernels, while in the complementary areas the values are 0. Note also that compared to the five principal components in Fig. 3, higher resolution details can be identified. Given the
Figure 5. (a)–(j) For the velocity fields in Fig. 2, the sensitivity-based kernels for two arbitrarily selected source–receiver pairs are shown superimposed on the
corresponding velocity distributions. (k)–(t) The same sensitivity kernels as in (a)–(j), but superimposed on the cumulative kernels (green shaded areas) used
to define the support of the sensitivity-based LPCs used in the Local-PCA-PCE approach.
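A minimal sketch of the masked-PCA construction described above: binarize the cumulative kernel with a threshold, multiply the slowness realizations point by point by the mask, and take the principal components of the products. The ensemble and the kernel here are random stand-ins, not the study's fat-ray kernels.

```python
import numpy as np

rng = np.random.default_rng(1)
n_real, n_cells = 1000, 40 * 20               # hypothetical ensemble and grid sizes
U = rng.standard_normal((n_real, n_cells))    # stand-in prior slowness realizations

# Hypothetical cumulative fat-ray kernel for one source-receiver angle.
kernel = np.abs(rng.standard_normal(n_cells))
threshold = 0.5
mask = (kernel >= threshold).astype(float)    # binarized: 1 inside, 0 outside

# Point-by-point product of realizations and mask, then PCA of the products.
masked = U * mask
mean = masked.mean(axis=0)
_, _, Vt = np.linalg.svd(masked - mean, full_matrices=False)
lpcs = Vt[:30]                                # first 30 Local Principal Components

# By construction the leading LPCs (numerically) vanish outside the kernel.
print(np.abs(lpcs[:, mask == 0]).max())
```

Because the masked-out cells are identically zero across the ensemble, the leading principal components carry no energy there, which is exactly the confinement visible in Fig. 6.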
same number of principal components, we can then expect the input to be better represented in the physically relevant subdomain when the Local-PCA-PCE rather than the Global-PCA-PCE approach is followed. For all source–receiver pairs corresponding to the same altitude angle, the same kernels and principal components are used, provided they are shifted to cover the appropriate geometry.

In Figs 7(a)–(g) the two slowness distributions from Fig. 4 are shown next to the representations obtained by projecting them on
Figure 7. (a) and (g) Velocity fields with (b)–(f) and (h)–(l) the corresponding representations used for surrogate modelling based on the first 30 LPCs used in
the Local-PCA-PCE approach. Different kernels are used for each source–receiver angle.
30 LPCs. In the areas complementary to the sensitivity kernels, the speed is set to 0.07 m ns−1. Input reconstructions are remarkably improved with respect to when using the same number of GPCs of the entire slowness field [compare Figs 7(a)–(g) and (h)–(l) to Figs 4(b) and (f)]. More importantly, the modelling error provided by using just 30 sensitivity-based LPCs is lower than what was previously provided by 90 standard components (i.e. rmse ≈ 0.45 ns). By incorporating these tailored LPCs, we can attain enhanced output fidelity when utilizing truncated representations of the input. This enhanced fidelity proves particularly advantageous for the implementation of PCE, allowing for more precise and efficient modelling. Consequently, this approach holds substantial promise in achieving superior accuracy and computational efficiency in PCE-based analyses.

In summary, we have introduced three different parametrizations to be used as input for PCE. We consider coordinates inherited by … corresponding likelihood to mildly differ from that involving no truncation in the GPC projection and exact modelling.

3.3.3 Local-PCA-PCE performance

In the Local-PCA-PCE approach, for each of the 23 angles considered, training involves randomly chosen source–receiver pair data associated with identical angles, while the final rmse is computed on the standard 144 traveltime gathers. For a more comprehensive evaluation of the Local-PCA-PCE scheme, we incorporate the results of FDTD data processing. In this analysis, training and validation rely on the FDTD data, while the PCE implementation remains consistent with that of the eikonal data. Results are similar and well below the noise, with an rmse of 0.65 ns (the corresponding histogram is displayed in Fig. 8d). All PCE results are unbiased and the model errors
Figure 8. (a)–(c) Histograms of the model error with respect to the PCE prediction when using the VAE (20 input parameters), Global (60 input parameters) and Local (30 input parameters per angle) parametrizations of the input in the PCE-based surrogate modelling, respectively, using the eikonal solver to compute the training set. (d) Histogram of the model error using Local parameters and FDTD reference data.
to the VAE-PCE scheme (i.e. 60 versus 20). Note that the PCE model evaluations are vectorized and, therefore, the cost is almost the same when applied to 35 input (≈0.57 s). Moreover, the computational cost of the Global-PCA-PCE method could be reduced by applying a PCA decomposition of the output, akin to what is proposed in Meles et al. (2022). Despite involving fewer variables than the Global-PCA-PCE approach, the Local-PCA-PCE method is slightly more computationally demanding, with costs of ≈0.64 s and ≈0.65 s when operating on one or 35 input, respectively. The increase in cost compared to the Global-PCA-PCE method depends on the fact that for each source–receiver angle (θ), the Local-PCA-PCE utilizes its own polynomial basis, indexed by θ in the following.

In comparison with the PCE methods, the cost of the reference eikonal solver is basically a linear function of the number of input distributions it operates on. A single run requires ≈0.05 s, while for 35 velocity distributions the cost increases up to ≈1.67 s. As such, its cost is either significantly smaller or slightly larger than what is required by the Global/Local-PCA-PCE approaches. Finally, the cost required by the reference FDTD code is ≈120 s and ≈4200 s if operating on either one or 35 velocity distributions, which is orders of magnitude longer than for the eikonal or PCE models. These results are summarized in Table 1, where we estimate the performance of an ideally-optimized Local-PCA-PCE method benefiting from (a) evaluating G_VAE(z) in its native environment and (b) using a single family of polynomials for all angles θ. In numerical results not presented herein, we find that choosing any one of the angle-specific polynomial families for all models provides nearly identical fidelity to what is achieved by using specifically tailored polynomials for each angle, at the considerably smaller cost of ≈0.06 s and 0.16 s when applied to either one or 35 input, respectively. While such a result cannot be generalized, it is always possible to test the corresponding PCEs' accuracy with a representative evaluation set. The option of relying on a single family of polynomials for the Local-PCA-PCE method is certainly to be taken into account when optimising the approach.

3.4 Inversion results

We now explore the performance of the different input parametrizations used for PCE-based surrogate modelling, namely VAE-PCE, Global-PCA-PCE and Local-PCA-PCE, when performing probabilistic MCMC inversion. The inversions were carried out using the UQLAB Matlab-based framework (Marelli & Sudret 2014; Wagner et al. 2021b). As reference model, consider the field shown in Fig. 10, which is used to generate a total of 144 traveltimes using the reference eikonal and FDTD solvers. Note that this field is not used to train the PCEs. Uncorrelated Gaussian noise characterized by σ^2 = 1 ns^2 was added to the data used for inversion.

We use a Metropolis–Hastings algorithm, and run 35 non-interacting Markov chains in parallel for 4 × 10^5 iterations per chain. During burn-in, determined according to the Geweke method, the scaling factor of the proposal distribution is tuned such that an acceptance rate of about 30 per cent is achieved for each experiment (Geweke 1992; Brunetti et al. 2019). Finally, outlier chains with respect to the interquartile range statistics discussed in Vrugt et al. (2009) are considered aberrant trajectories and are ignored in the analysis of the results.

We first present the results for training data generated by an eikonal solver. We compare VAE-PCE, Global-PCA-PCE and Local-PCA-PCE.
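A single-chain Metropolis–Hastings sketch in the spirit of the setup described above (the study itself runs 35 chains with UQLab and tunes the proposal scale during burn-in). The log-posterior below is a hypothetical stand-in combining a standard normal latent prior with a quadratic misfit; a real run would evaluate the PCE-based likelihood instead.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_post(z):
    """Stand-in log-posterior: standard normal prior on latent z plus a
    hypothetical quadratic data misfit (placeholder for the PCE likelihood)."""
    return -0.5 * np.sum(z ** 2) - 0.5 * np.sum((z[:2] - 1.0) ** 2)

dim, n_iter, scale = 20, 5000, 0.4     # scale would be tuned during burn-in
z = np.zeros(dim)
lp = log_post(z)
accepted = 0
samples = np.empty((n_iter, dim))
for i in range(n_iter):
    z_prop = z + scale * rng.standard_normal(dim)   # Gaussian random-walk proposal
    lp_prop = log_post(z_prop)
    if np.log(rng.uniform()) < lp_prop - lp:        # Metropolis acceptance rule
        z, lp = z_prop, lp_prop
        accepted += 1
    samples[i] = z

print("acceptance rate:", accepted / n_iter)
```

In practice the proposal scale is adapted until the acceptance rate approaches the target of about 30 per cent, after which the adapted chain is run for the production iterations.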
Table 1. Summary of the computational cost of the various steps for a single realization/batch of 35 input of the proposed algorithms. In addition to the strategies used in the MCMC examples discussed in the manuscript and summarized in the first four columns (i.e. the VAE-PCE, Global- and Local-PCA-PCE and eikonal schemes), we also consider the cost of the FDTD approach using the reference code (fifth column) and an optimized Local-PCA-PCE approach ideally benefiting from executing the VAE decoder in the same environment as the PCE model and based on a single polynomial family for all angles (sixth column).

           VAE-PCE        Global-PCA-PCE   Local-PCA-PCE   Eikonal        FDTD          Optimized Local-PCA-PCE
G_VAE(z)   0              ≈1.35/1.43 s     ≈1.35/1.43 s    ≈1.35/1.43 s   ≈1.35/1.43 s  ≈0.005/0.08 s
PCA        0              ≈0.002/0.05 s    ≈0.06/0.23 s    0              0             ≈0.06/0.23 s
Forward    ≈0.06/0.08 s   ≈0.52/0.57 s     ≈0.64/0.65 s    ≈0.05/1.67 s   ≈120/4200 s   ≈0.06/0.16 s
Figure 12. Gelman–Rubin statistics for the various inversion strategies using 35 chains after 4 × 10^5 iterations per chain.
Table 2. Assessment of the inversion results in the input and output domains for the various surrogate modelling strategies.

Model              Rmse mean velocity    SSIM mean velocity   Rmse mean output
VAE-PCE            8.01 × 10^6 m s−1     0.30                 3.49 ns
Global-PCA-PCE     5.38 × 10^6 m s−1     0.54                 1.49 ns
Local-PCA-PCE      2.67 × 10^6 m s−1     0.73                 1.15 ns
Eikonal            1.57 × 10^6 m s−1     0.87                 1.01 ns
FD Local-PCA-PCE   3.06 × 10^6 m s−1     0.71                 1.15 ns
PCE has become widespread in many disciplines. The massive decrease of the computational costs associated with PCE is achieved by approximating demanding computational forward models with simple and easy-to-evaluate functions. A key requirement is that the number of input variables describing the computational model is relatively small (i.e. up to a few tens) and that the target model can be approximated by truncated series of low-degree multivariate polynomials. The number of coefficients defining the PCE model grows polynomially in both the size of the input and the maximum degree of the expansion. When the reference full-physics model response is highly non-linear in its input parameters, the problem is typically non-tractable (Torre et al. 2019). Since the high-fidelity mapping of complex prior distributions provided by DGMs is based on highly non-linear relationships between latent variables and physical domains, replicating the performance of such networks and/or composite functions thereof (e.g. M_VAE = F ∘ G_VAE in eq. 5) using PCE is problematic. To circumvent this challenge, we have explored two PCA-based decompositions that facilitated the straightforward implementation of PCE. One decomposition was designed to encompass the entire input domain, while the other specifically focused on subdomains of particular physical relevance. Although this latter concept is investigated here in the context of traveltime tomography, the integration of problem-related PCs operating synergistically with DGM-defined latent parametrizations has the potential for broader applications.

Whatever the choice of input coordinates, for example, based on PCA or local properties of the input, the determining criterion for evaluating the quality of the corresponding PCE should always be performance on a representative validation set. In the case of PCA, the lower bound of prediction misfit rmse can be a priori estimated by comparing the accuracy of the reference model acting on the full and the compressed domains, that is, M_G/LPC(x^full) and M_G/LPC(x). In our case, such lower bound for the Global-PCA-PCE approach operating with 60 components is 0.85 ns. Using the Local-PCA-PCE scheme with 30 components only, the rmse drops to 0.55 ns. However, the accuracy of the corresponding Global/Local-PCA-PCE is worse, with rmse of 1.31 and 0.67 ns, respectively, mainly due to the small size of the training set. Note that while the lower bound of the PCA decomposition decreases when more PCs are taken into account, the corresponding accuracy of PCE is primarily limited by the size of the training set. Increasing the number of components can actually worsen PCE performance if the training set is insufficient to determine the polynomial coefficients, which, as mentioned, grow significantly with input size. In our case, using 90 components would imply an rmse of 1.39 ns, which is worse than what was obtained by the 60-component PCE.

Both the VAE-PCE and Global-PCA-PCE consist of 144 (i.e. the total number of source–receiver combinations) different PCE models operating, for each traveltime simulation, on an identical input, that is, the latent variables of the DGM or the 60 PCs characterizing the entire physical domain. On the other hand, the Local-PCA-PCE scheme consists of 23 (i.e. the total number of source–receiver altitude angles) different PCE models operating, for each traveltime simulation, on 144 different input, that is, the 30 LPCs characterizing each local subdomain. Since each of the 23 models operates on specific LPCs, the corresponding families of orthonormal polynomials differ with angle θ. This is in contrast with the Global-PCA-PCE method, for which each model operates via a single family of polynomials. Thus, the Local-PCA-PCE scheme is computationally more demanding than the Global-PCA-PCE (see Table 1). However, the use of a single family of polynomials can also be considered for the Local-PCA-PCE method, resulting in shorter run time. When considering computational performance, an optimal implementation of G_VAE(z) should also be sought.

In this study, to determine the minimum number of G/LPCs for constructing an accurate PCE, we assess the lower bound of output prediction misfit rmse as a function of the number of G/LPCs used. We project the input onto subsets of G/LPCs, typically ranging in the tens. This process generates non-binary images, which are then utilized to compute the output using the reference forward modelling. Alternatively, we could consider rebinarizing the reconstructed images as done in Thibaut et al. (2021). This approach would bring the projected input back into the prior, but this property is not necessarily relevant for the determination of PCE accuracy. Irrespective of the chosen reconstruction algorithm, the Local approach maintains a significant advantage over the Global method. When using
Figure 14. Random samples from the posterior obtained by using the (a) VAE-PCE, (b) Global-PCA-PCE, (c) Local-PCA-PCE, (d) eikonal and (e) FD
Local-PCA-PCE strategies. The results in (c)–(e) are visually similar.
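The growth of the PCE basis with input dimension and degree discussed in this section can be made concrete with a least-squares fit on a total-degree Legendre basis. This is a simplified stand-in for the PCE machinery the study builds with UQLab (it assumes uniform inputs on [-1, 1] and a plain least-squares fit); the forward model below is a hypothetical smooth function. The number of coefficients is C(d+p, p), which grows rapidly with the input size d.

```python
import numpy as np
from itertools import product
from math import comb

def legendre_design(X, degree):
    """Total-degree Legendre basis evaluated at inputs X in [-1, 1]^d."""
    n, d = X.shape
    # Univariate Legendre values up to `degree` for each input dimension.
    P = [np.polynomial.legendre.legvander(X[:, j], degree) for j in range(d)]
    cols = []
    for alpha in product(range(degree + 1), repeat=d):
        if sum(alpha) <= degree:          # keep only total degree <= `degree`
            col = np.ones(n)
            for j, a in enumerate(alpha):
                col *= P[j][:, a]
            cols.append(col)
    return np.column_stack(cols)

rng = np.random.default_rng(3)
d, degree, n_train = 4, 3, 200
X = rng.uniform(-1, 1, (n_train, d))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] ** 2   # hypothetical stand-in forward model

A = legendre_design(X, degree)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("basis size:", A.shape[1], "== C(d+p, p):", comb(d + degree, degree))
```

For d = 60 inputs at degree 3 the same count is C(63, 3) = 39 711 coefficients, which illustrates why a large training set (or a sparser basis) is needed when many G/LPCs are retained.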
an equal number of components, LPCs, in fact, consistently yield superior approximations of the relevant input compared to GPCs.

We have seen that once a DGM-based latent parametrization has been found to reduce the effective dimensionality of the input domain and, based on PCA decompositions, high-fidelity PCEs have been trained, MCMC inversion can be efficient. Relying on PCE rather than advanced deep learning methods for surrogate modelling can be advantageous in terms of ease of implementation, as potentially complex training of a neural network is not needed. Many effective sampling methods, such as Adaptive Metropolis, Hamiltonian Monte Carlo, or the Affine Invariant Ensemble Algorithm (Duane et al. 1987; Haario et al. 2001; Goodman & Weare
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57(1), 97–109.
Higdon, D., McDonnell, J.D., Schunck, N., Sarich, J. & Wild, S.M., 2015. A Bayesian approach for parameter estimation and prediction using a computationally intensive model, J. Phys. G: Nucl. Part. Phys., 42(3), doi:10.1088/0954-3899/42/3/034009.
Ho, J., Jain, A. & Abbeel, P., 2020. Denoising diffusion probabilistic models, Adv. Neural Inform. Process. Syst., 33, 6840–6851.
Husen, S. & Kissling, E., 2001. Local earthquake tomography between rays and waves: fat ray tomography, Phys. Earth planet. Inter., 123(2–4), 127–147.
Irving, J. & Knight, R., 2006. Numerical modeling of ground-penetrating radar in 2-D using MATLAB, Comput. Geosci., 32(9), 1247–1258.
Jensen, J.M., Jacobsen, B.H. & Christensen-Dalsgaard, J., 2000. Sensitivity kernels for time-distance inversion, Solar Phys., 192(1), 231–239.
Jetchev, N., Bergmann, U. & Vollgraf, R., 2016. Texture synthesis with spatial generative adversarial networks, preprint (arXiv:1611.08207).
Jolliffe, I.T. & Cadima, J., 2016. Principal component analysis: a review and recent developments, Phil. Trans. R. Soc. A, 374(2065), 20150202.
Métivier, D., Vuffray, M. & Misra, S., 2020. Efficient polynomial chaos expansion for uncertainty quantification in power systems, Electr. Power Syst. Res., 189, doi:10.1016/j.epsr.2020.106791.
Nagel, J.B., 2019. Bayesian Techniques for Inverse Uncertainty Quantification, pp. 504, IBK Bericht.
Rasmussen, C.E., 2003. Gaussian processes in machine learning, in Summer School on Machine Learning, pp. 63–71, Springer.
Reynolds, A.C., He, N., Chu, L. & Oliver, D.S., 1996. Reparameterization techniques for generating reservoir descriptions conditioned to variograms and well-test pressure data, SPE J., 1(04), 413–426.
Sacks, J., Schiller, S.B. & Welch, W.J., 1989. Designs for computer experiments, Technometrics, 31(1), 41–47.
Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics, Math. Geol., 34(1), 1–21.
Tarantola, A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM.
Thibaut, R., Laloy, E. & Hermans, T., 2021. A new framework for experimental design using Bayesian evidential learning: the case of wellhead protection area, J. Hydrol., 603, doi:10.1016/j.jhydrol.2021.126903.