
Computer Methods in Applied Mechanics and Engineering 418 (2024) 116535


GPLaSDI: Gaussian Process-based interpretable Latent Space Dynamics Identification through deep autoencoder
Christophe Bonneville a,∗, Youngsoo Choi b, Debojyoti Ghosh b, Jonathan L. Belof b
a Cornell University, Ithaca, NY 14850, United States
b Lawrence Livermore National Laboratory, Livermore, CA 94550, United States

ARTICLE INFO

Keywords:
Autoencoders
Gaussian processes
Partial differential equation
Reduced-order-model
Latent-space identification

ABSTRACT

Numerically solving partial differential equations (PDEs) can be challenging and computationally expensive. This has led to the development of reduced-order models (ROMs) that are accurate but faster than full order models (FOMs). Recently, machine learning advances have enabled the creation of non-linear projection methods, such as Latent Space Dynamics Identification (LaSDI). LaSDI maps full-order PDE solutions to a latent space using autoencoders and learns the system of ODEs governing the latent space dynamics. By interpolating and solving the ODE system in the reduced latent space, fast and accurate ROM predictions can be made by feeding the predicted latent space dynamics into the decoder. In this paper, we introduce GPLaSDI, a novel LaSDI-based framework that relies on Gaussian processes (GPs) for latent space ODE interpolations. Using GPs offers two significant advantages. First, it enables the quantification of uncertainty over the ROM predictions. Second, leveraging this prediction uncertainty allows for efficient adaptive training through a greedy selection of additional training data points. This approach does not require prior knowledge of the underlying PDEs. Consequently, GPLaSDI is inherently non-intrusive and can be applied to problems without a known PDE or its residual. We demonstrate the effectiveness of our approach on the Burgers equation, the Vlasov equation for plasma physics, and a rising thermal bubble problem. Our proposed method achieves between 200 and 100,000 times speed-up, with up to 7% relative error.

1. Introduction

Over the past few decades, the advancement of numerical simulation techniques for understanding physical phenomena
has been remarkable, leading to increased sophistication and accuracy. Simultaneously, computational hardware has undergone
significant improvements, becoming more powerful and affordable. As a result, numerical simulations are now extensively utilized
across various domains, including engineering design, digital twins, decision making [1–5], and diverse fields such as aerospace,
automotive, electronics, physics, biology [6–13].
In engineering and physics, computational simulations often involve solving partial differential equations (PDEs) through differ-
ent numerical techniques like finite difference/volume/element methods, particle methods, and more. While these methods have
demonstrated their accuracy and capability to provide high-fidelity simulations when applied correctly, they can be computationally
demanding. This is particularly evident in time-dependent multiscale problems that encompass complex physical phenomena
(e.g., turbulent fluid flows, fusion device plasma dynamics, astrodynamic flows) on highly refined grids or meshes. Consequently,

∗ Corresponding author.
E-mail address: [email protected] (C. Bonneville).

https://doi.org/10.1016/j.cma.2023.116535
Received 10 August 2023; Received in revised form 7 October 2023; Accepted 9 October 2023
Available online 14 October 2023
0045-7825/© 2023 Published by Elsevier B.V.

performing a large number of forward simulations using high-fidelity solvers can pose significant computational challenges,
especially in situations like uncertainty quantification [14–16], inverse problems [14,16–18], design optimization [19,20], and
optimal control [21].
The computational bottleneck associated with high-fidelity simulations has prompted the development of reduced-order models
(ROMs). The primary objective of ROMs is to simplify the computations involved in the full-order model (i.e., the high-fidelity
simulation) by reducing the order of the problem. Although ROM predictions are generally less accurate, they offer significantly
faster results, making them highly appealing when a slight decrease in accuracy is tolerable. Several ROM techniques rely on the
projection of snapshot data from the full-order model.
Linear projection methods such as the proper orthogonal decomposition (POD) [22], the reduced basis method [23], and the
balanced truncation method [24] have gained popularity in ROM applications. These linear-subspace methods have been successfully
applied to various equations such as the Burgers and Navier–Stokes equations [25–28], Lagrangian hydrodynamics [29,30],
advection–diffusion problems [31,32], and design optimization [33,34]. More recently, non-linear projection methods [35–38]
utilizing autoencoders [39,40] have emerged as an alternative and have shown superior performance in advection-dominated
problems [35,41,42].
Data-driven projection-based ROM methods can generally be categorized into two types: intrusive and non-intrusive. Intrusive
ROMs are physics-informed models that require knowledge of the governing equation [23,26,28,29,32,36–38,43]. This characteristic
enhances the robustness of predictions and often necessitates less data from the full-order model (FOM). However, intrusive ROMs
also demand access to the FOM solver and specific implementation details, such as the discretized residual of the PDE.
In contrast, non-intrusive ROMs are independent of the governing equation and rely solely on data-driven techniques. These
methods typically utilize interpolation techniques to map parameters to their corresponding ROM predictions [44–47]. However,
being purely black-box approaches, these methods lack interpretability and robustness. Sometimes, they struggle to generalize
accurately and may exhibit limitations in their performance.
To overcome these challenges, recent approaches have focused on combining projection methods with latent space dynamics
learning. These approaches view the latent space as a dynamical system governed by a set of ordinary differential equations (ODEs).
By accurately identifying these ODEs, the dynamics of the latent space can be predicted and projected back into the space of full-order
solutions.
Various methods have been proposed for identifying governing equations from data [48,49], including the widely used Sparse
Identification of Non-Linear Dynamics (SINDy) [50]. SINDy constructs a library of terms that could potentially be part of the governing
ODEs and estimates the associated coefficients using linear regression. This method has gained popularity and has been extended
to a broad range of SINDy-based identification algorithms [51–59].
Champion et al. [60] introduced an approach that combines an autoencoder with SINDy to identify sets of ODEs governing
the dynamics of the latent space. While promising, the identified ODEs are not parameterized based on simulation parameters,
limiting the method’s generalizability. Bai and Peng [61] proposed a similar approach but with a linear projection using proper
orthogonal decomposition (POD). They also introduced parameterization of the latent space ODEs, enabling ROM predictions for
any point in the parameter space. However, this method exhibits limitations when applied to advection-dominated problems due
to the constraints of POD.
Fries et al. [41] proposed a framework called the Latent Space Dynamic Identification (LaSDI), which combines autoencoders
with a parametric SINDy identification of the latent space. In LaSDI, a set of ODEs corresponding to each training data point from
the full-order model (FOM) is estimated. The coefficients of each ODE are then interpolated based on the FOM parameters using
radial basis functions (RBF). However, the sequential training of the autoencoder and SINDy identifications in LaSDI can sometimes
lead to a complex latent space with poorly conditioned and/or inaccurate sets of governing ODEs. Additionally, the training FOM
dataset in LaSDI is generated on a predefined grid in the parameter space, which may not be optimal. The uniform grid may result
in insufficient data in certain regions of the parameter space and excessive data in others, affecting the accuracy of the model.
To address these issues, He et al. [42] introduced the greedy-LaSDI (gLaSDI) framework. In gLaSDI, the autoencoder and SINDy
latent space identifications are trained simultaneously, and the FOM data points are sampled sequentially during training. The
training starts with only a few points, and at a fixed sampling rate, the model’s prediction accuracy is evaluated by plugging the
decoder output into the PDE residual. The parameter yielding the largest residual error is selected as the new sampling point, and
a FOM simulation is performed to obtain the solution, which is then added to the training dataset. This iterative process ensures
that the model focuses on regions of the parameter space where accuracy is needed. Although gLaSDI is robust and accurate, it
inherently requires knowledge of the PDE residual, making it an intrusive ROM method. Therefore, gLaSDI may not be suitable for
problems where the residual is unknown, difficult to implement, or computationally expensive to evaluate.
In this paper, we present the (greedy) Gaussian-Process Latent Space Identification (GPLaSDI), a non-intrusive extension of the
greedy LaSDI framework. One of the main sources of error in LaSDI and gLaSDI arises from potential inaccuracies in the interpolation
of the ODE coefficients. To address this issue, we propose replacing deterministic interpolation methods (such as RBF interpolation
in LaSDI [41] and 𝑘-NN convex interpolation in gLaSDI [42]) with a Bayesian interpolation technique called Gaussian Process
(GP) [62].
A GP can provide confidence intervals for its predictions, allowing us to quantify the uncertainty in the sets of ODEs governing the
latent space. This uncertainty can then be propagated to the decoder, enabling the generation of ROM prediction confidence intervals
for any test point in the parameter space. We adopt a sequential procedure for sampling additional FOM training data, similar to
gLaSDI. However, instead of relying on the PDE residual to select the next parameter for sampling, we choose the parameter that
yields the highest predictive uncertainty.


Fig. 1. GPLaSDI general framework. (1) Combined training of the autoencoder and the SINDy latent space identification. (2) Interpolation of the latent space
governing sets of ODEs using Gaussian Processes. (3) ROM Prediction methodology using GPLaSDI. (4) FOM training data greedy sampling algorithm, using (1),
(2) and (3).

This approach offers two significant advantages. First, it makes our framework fully independent of the specific PDE and its
residual, making it applicable to a wide range of problems. Second, our method can generate meaningful confidence intervals for
ROM predictions, providing valuable information for assessing the reliability of the simulations.
The details of the GPLaSDI framework are presented in Section 2, where we introduce the utilization of autoencoders in
Section 2.2, the application of SINDy in Section 2.3, the adoption of GP interpolation in Section 2.4, and the incorporation of
variance-based greedy sampling in Section 2.6. The overall structure of the GPLaSDI framework is summarized in Fig. 1, and the
algorithmic procedure is outlined in Algorithm 1. To demonstrate the effectiveness of GPLaSDI, we provide case studies in Section 3.
Specifically, we investigate the application of GPLaSDI to the 1D Burgers equation in Section 3.1, the 2D Burgers equation in
Section 3.2, the 1D1V Vlasov equation for plasma physics in Section 3.3, and the 2D rising thermal bubble problem in Section 3.4.
Through these examples, we highlight the performance and capabilities of the GPLaSDI framework. In conclusion, we summarize
the key findings and contributions of the paper in Section 4, providing a comprehensive overview of the work conducted in this
study.

2. GPLaSDI framework

2.1. Governing equation of physical systems

In the following sections, we consider physical phenomena described by governing PDEs of the following form:

\begin{cases}
\dfrac{\partial \mathbf{u}}{\partial t} = \mathbf{f}(\mathbf{u}, t, x \mid \boldsymbol{\mu}) & (t, x) \in [0, t_{\max}] \times \Omega \\
\mathbf{u}(t = 0, x \mid \boldsymbol{\mu}) = \mathbf{u}_0(x \mid \boldsymbol{\mu}) & \boldsymbol{\mu} \in \mathcal{D}
\end{cases}     (1)


Table 1
Table of notations.

Notation                         Description
𝐮_n^(i)                          FOM snapshot at time step n and parameter 𝝁^(i)
𝐮̂_n^(i)                          Reconstructed autoencoder FOM snapshot at time step n and parameter 𝝁^(i)
𝐮̃_n^(∗,d)                        Predicted ROM solution at time step n and test parameter 𝝁^(∗)
𝐳_n^(i)                          Latent space representation of snapshot 𝐮_n^(i)
𝐳̃_n^(∗,d)                        Predicted latent space dynamics at time step n, for test parameter 𝝁^(∗) and sample d
𝝁^(i) / 𝝁^(∗)                    Training parameter / Test parameter
𝜙_e / 𝜙_d                        Encoder / Decoder
𝜽_enc / 𝜽_dec                    Encoder weights / Decoder weights
𝜩^(i) = {ξ_{j,k}^(i)}            SINDy coefficients for the set of ODEs associated to parameter 𝝁^(i)
𝜽_gp = {𝜽_gp^{j,k}}              Sets of GP hyperparameters
𝜣(⋅)                             SINDy dictionary
𝒟 / 𝒟_h                          Parameter space / Discretized parameter space
[[1, N]]                         A set of integers, i.e., {1, …, N}
n_μ                              Dimension of 𝒟
N_u                              Number of degrees of freedom in the FOM solution
N_t                              Number of time steps in the FOM solution
N_μ                              Number of parameters in the training dataset
N_z                              Number of variables in the latent space
N_l                              Number of SINDy candidates
N_s                              Number of samples taken from each GP predictive distribution
N_epoch                          Number of training epochs
N_up                             Greedy sampling rate
β_1 / β_2 / β_3                  Loss hyperparameters
m_{j,k}^(∗) / s_{j,k}^(∗)        Predictive mean / standard deviation of each GP
c_{j,k}^(d)                      Sample d from 𝒩(m_{j,k}^(∗), s_{j,k}^(∗)2)
n                                Dummy variable for time step indexing
i                                Dummy variable for parameter indexing
j                                Dummy variable for latent variable indexing
k                                Dummy variable for SINDy candidate indexing
d                                Dummy variable for GP sampling indexing
h                                Dummy variable for epoch count

In Eq. (1), the solution 𝐮 may either represent a scalar or vector field, defined over the time–space domain [0, t_max] × Ω. The spatial domain Ω may be of any dimension, and the differential operator 𝐟 may contain linear and/or non-linear combinations of spatial derivatives and source terms. The governing equations and their initial/boundary conditions are parameterized by a parameter vector 𝝁. The parameter space is defined as 𝒟 ⊆ ℝ^{n_μ} and may take any dimension (in the following sections, we consider a 2D parameter space, i.e. n_μ = 2). For a given parameter vector 𝝁^(i) ∈ 𝒟, we assume in the following sections that we have access to the corresponding discretized solution of Eq. (1), denoted as 𝐔^(i). 𝐔^(i) is a matrix of concatenated snapshots 𝐮_n^(i) at each time step n, such that 𝐔^(i) = [𝐮_0^(i), …, 𝐮_{N_t}^(i)]^⊤ ∈ ℝ^{(N_t+1)×N_u}. This solution is obtained using either a full order model solver or an experiment. In the following sections, all FOM solutions are organized into a third-order tensor dataset 𝐔 ∈ ℝ^{N_μ×(N_t+1)×N_u} of the form:

\mathbf{U} = [\mathbf{U}^{(1)}, \dots, \mathbf{U}^{(N_\mu)}]_{N_\mu \times (N_t+1) \times N_u}     (2)

N_t and N_u are the number of time steps and degrees of freedom, respectively, and N_μ is the number of available FOM solutions. The notations employed in this paper are summarized in Table 1.

2.2. Autoencoders

An autoencoder [39,63] is a neural network designed specifically for compressing large datasets by reducing the dimensions of the
data through nonlinear transformation. It consists of two stacked neural networks: the encoder, denoted as 𝜙𝑒 and parameterized by
𝜽enc , and the decoder, denoted as 𝜙𝑑 and parameterized by 𝜽dec . The encoder takes an input data snapshot 𝐮𝑛(𝑖) ∈ R𝑁𝑢 and produces a
compressed representation 𝐳𝑛(𝑖) ∈ R𝑁𝑧 in a latent space. Here, 𝑁𝑧 represents the number of latent space variables, which is an arbitrary
design choice, but typically chosen such that N_z ≪ N_u. Similar to the matrix 𝐔^(i), we concatenate the latent representations at each time step into a matrix 𝐙^(i) = [𝐳_0^(i), …, 𝐳_{N_t}^(i)]^⊤ ∈ ℝ^{(N_t+1)×N_z}. The latent variables for each time step and each parameter 𝝁^(i) are stored in a third-order tensor 𝐙 ∈ ℝ^{N_μ×(N_t+1)×N_z}, analogous to the tensor 𝐔, which has the following form:

\mathbf{Z} = [\mathbf{Z}^{(1)}, \dots, \mathbf{Z}^{(N_\mu)}]_{N_\mu \times (N_t+1) \times N_z}     (3)


The decoder takes each 𝐳_n^(i) as input and generates a reconstructed version of 𝐮_n^(i), denoted as 𝐮̂_n^(i):

\mathbf{z}_n^{(i)} = \phi_e(\mathbf{u}_n^{(i)} \mid \boldsymbol{\theta}_{\mathrm{enc}}), \qquad \hat{\mathbf{u}}_n^{(i)} = \phi_d(\mathbf{z}_n^{(i)} \mid \boldsymbol{\theta}_{\mathrm{dec}})     (4)

The parameters of the autoencoder, denoted as 𝜽_enc and 𝜽_dec, are learned using a numerical optimization algorithm that minimizes the L_2 norm of the difference between the set of input solutions 𝐔 and the set of reconstructed solutions 𝐔̂. This is achieved by defining the reconstruction loss as follows:

\mathcal{L}_{\mathrm{AE}}(\boldsymbol{\theta}_{\mathrm{enc}}, \boldsymbol{\theta}_{\mathrm{dec}}) = \|\mathbf{U} - \hat{\mathbf{U}}\|_2^2 = \frac{1}{N_\mu}\sum_{i=1}^{N_\mu}\left(\frac{1}{N_t+1}\sum_{n=0}^{N_t}\left\|\mathbf{u}_n^{(i)} - \phi_d\!\left(\phi_e(\mathbf{u}_n^{(i)} \mid \boldsymbol{\theta}_{\mathrm{enc}}) \mid \boldsymbol{\theta}_{\mathrm{dec}}\right)\right\|_2^2\right)     (5)
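As an illustration, a minimal PyTorch sketch of the encoder/decoder pair of Eq. (4) and the reconstruction loss of Eq. (5) could look as follows. The fully-connected layers, layer widths, and sigmoid activation are illustrative placeholders; the architectures actually used in this paper are listed in Section 3.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    # Fully-connected autoencoder: N_u -> hidden -> N_z -> hidden -> N_u
    def __init__(self, N_u, hidden, N_z):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_u, hidden), nn.Sigmoid(),
                                     nn.Linear(hidden, N_z))
        self.decoder = nn.Sequential(nn.Linear(N_z, hidden), nn.Sigmoid(),
                                     nn.Linear(hidden, N_u))

    def forward(self, U):
        Z = self.encoder(U)        # latent snapshots, Eq. (4)
        U_hat = self.decoder(Z)    # reconstructed snapshots
        return Z, U_hat

def reconstruction_loss(U, U_hat):
    # Mean-squared reconstruction error over parameters and time steps, Eq. (5)
    return torch.mean((U - U_hat) ** 2)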

2.3. Identification of latent space dynamics

The encoder performs compression of the high-dimensional physical data, e.g., a solution of PDEs, defined over space and time,
into a reduced set of discrete latent variables that are defined solely over time. Consequently, the latent space can be regarded as
a dynamical system governed by sets of ordinary differential equations (ODEs). This characteristic forms a fundamental aspect of
LaSDI (Latent Space Dynamics Identification) algorithms, which enable the compression of dynamical systems governed by PDEs
into systems governed by ODEs [41]. At each time step, the dynamics of the latent variables in the latent space can be described
by an equation of the following form:

\frac{d\mathbf{z}_n^{(i)}}{dt} = \psi_{DI}(\mathbf{z}_n^{(i)} \mid \boldsymbol{\mu}^{(i)}), \qquad (i, n) \in [[1, N_\mu]] \times [[0, N_t]]     (6)
This equation can be unified across all time steps as:

𝐙̇ (𝑖) = 𝜓𝐷𝐼 (𝐙(𝑖) ∣ 𝝁(𝑖) ) (7)

The system of ODEs governing the dynamics of the latent space can be determined using a technique called SINDy (Sparse
Identification of Nonlinear Dynamics) [50]. In SINDy, a dictionary 𝜣(𝐙(𝑖) ) ∈ R(𝑁𝑡 +1)×𝑁𝑙 is constructed, consisting of 𝑁𝑙 linear and
nonlinear candidate terms that may be involved in the set of ODEs. The approach assumes that the time derivatives 𝐙̇ can be
expressed as a linear combination of these candidate terms. Hence, the right-hand side of Eq. (7) can be approximated as follows:

𝐙̇ (𝑖) ≈ 𝜣(𝐙(𝑖) ) ⋅ 𝜩 (𝑖)⊤ (8)


where 𝜩^(i) ∈ ℝ^{N_z×N_l} denotes a coefficient matrix. The selection of terms in 𝜣(⋅) is arbitrary. Including a broader range of SINDy terms may potentially capture the latent dynamics more accurately, but it can also result in sets of ODEs that are more challenging to solve numerically. In the subsequent sections, we will typically limit the dictionary to constant and linear terms. Interestingly, we have found that this restricted choice of terms is generally adequate to achieve satisfactory performance:

\boldsymbol{\Theta}(\mathbf{Z}^{(i)}) =
\begin{bmatrix} 1 & \mathbf{z}_0^{(i)} \\ \vdots & \vdots \\ 1 & \mathbf{z}_{N_t}^{(i)} \end{bmatrix}
=
\begin{bmatrix}
1 & z_{0,1}^{(i)} & \cdots & z_{0,j}^{(i)} & \cdots & z_{0,N_z}^{(i)} \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & z_{N_t,1}^{(i)} & \cdots & z_{N_t,j}^{(i)} & \cdots & z_{N_t,N_z}^{(i)}
\end{bmatrix}
\in \mathbb{R}^{(N_t+1)\times N_l}     (9)

In the system, z_{n,j}^(i) represents the jth latent variable for parameter 𝝁^(i) at time step n, such that 𝐳_n^(i) = [z_{n,1}^(i), …, z_{n,j}^(i), …, z_{n,N_z}^(i)]. Sparse linear regressions are conducted between 𝜣(𝐙^(i)) and dz_{:,j}^(i)/dt for each j ∈ [[1, N_z]]. The resulting coefficients associated with the SINDy terms are stored in a vector 𝝃_j^(i) = [ξ_{j,1}^(i), …, ξ_{j,N_l}^(i)] ∈ ℝ^{N_l}. More formally, the system of SINDy regressions is expressed as:

\frac{dz_{:,j}^{(i)}}{dt} = \begin{bmatrix} \dot{z}_{0,j}^{(i)} \\ \vdots \\ \dot{z}_{N_t,j}^{(i)} \end{bmatrix} = \boldsymbol{\Theta}(\mathbf{Z}^{(i)}) \cdot \boldsymbol{\xi}_j^{(i)\top}     (10)

Each 𝝃_j^(i) is concatenated into a coefficient matrix 𝜩^(i) = [𝝃_1^(i), …, 𝝃_{N_z}^(i)]^⊤ ∈ ℝ^{N_z×N_l}. In this study, the time derivatives ż_{n,j}^(i) are estimated using a first-order finite difference with a time step Δt that matches the time step used in the full order model (FOM) data. As there is a distinct set of ODEs governing the dynamics of the latent space for each parameter 𝝁^(i) ∈ 𝒟, multiple SINDy regressions are performed concurrently to obtain the corresponding sets of ODE coefficients 𝜩^(i), where i ∈ [[1, N_μ]]. The collection of ODE coefficient matrices 𝜩 = [𝜩^(1), …, 𝜩^(N_μ)] ∈ ℝ^{N_μ×N_z×N_l}, associated with each system of ODEs, is determined by minimizing the following mean-squared-error SINDy loss:

\mathcal{L}_{\mathrm{SINDy}}(\boldsymbol{\Xi}) = \|\dot{\mathbf{Z}} - \hat{\dot{\mathbf{Z}}}\|_2^2 = \frac{1}{N_\mu}\sum_{i=1}^{N_\mu}\left(\frac{1}{N_z}\sum_{j=1}^{N_z}\left\|\frac{dz_{:,j}^{(i)}}{dt} - \boldsymbol{\Theta}(\mathbf{Z}^{(i)}) \cdot \boldsymbol{\xi}_j^{(i)\top}\right\|_2^2\right)     (11)
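A minimal sketch of the constant-plus-linear dictionary of Eq. (9) and the SINDy residual of Eq. (11) for a single parameter 𝝁^(i), written with PyTorch tensors, is given below. The variable names are illustrative, and Ż is approximated by the first-order finite difference described above.

import torch

def sindy_dictionary(Z):
    # Z: (N_t + 1, N_z) latent trajectory -> Theta(Z): (N_t + 1, N_l) with N_l = N_z + 1
    ones = torch.ones(Z.shape[0], 1, device=Z.device)
    return torch.cat([ones, Z], dim=1)      # [1, z_1, ..., z_Nz], Eq. (9)

def sindy_loss(Z, Xi, dt):
    # Z: (N_t + 1, N_z), Xi: (N_z, N_l) coefficient matrix for one parameter mu^(i)
    Z_dot = (Z[1:] - Z[:-1]) / dt            # first-order finite-difference time derivative
    Theta = sindy_dictionary(Z[:-1])         # dictionary evaluated on the same time steps
    Z_dot_hat = Theta @ Xi.T                 # Theta(Z) . Xi^T, Eq. (8)
    return torch.mean((Z_dot - Z_dot_hat) ** 2)   # mean-squared SINDy residual, Eq. (11)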


In the LaSDI framework, the autoencoder and the SINDy sparse regressions are jointly trained using a single loss function. To prevent extreme values of the SINDy coefficients, which can result in ill-conditioned sets of ODEs, a penalty term is incorporated into the loss. Thus, the LaSDI loss function is defined as follows:

\mathcal{L}(\boldsymbol{\theta}_{\mathrm{enc}}, \boldsymbol{\theta}_{\mathrm{dec}}, \boldsymbol{\Xi}) = \beta_1 \mathcal{L}_{\mathrm{AE}}(\boldsymbol{\theta}_{\mathrm{enc}}, \boldsymbol{\theta}_{\mathrm{dec}}) + \beta_2 \mathcal{L}_{\mathrm{SINDy}}(\boldsymbol{\Xi}) + \beta_3 \|\boldsymbol{\Xi}\|_2^2,     (12)

where β_1, β_2, and β_3 are weighting hyperparameters. Note that another reconstruction term can be added to the loss function, namely the L_2 norm between the velocity 𝐔̇ and the reconstructed velocity 𝐔̂̇, for additional stability and accuracy. However, doing so requires access to 𝐔̇, and computing 𝐔̂̇ requires backpropagating through the autoencoder a second time [42]. This can be expensive, especially with deeper autoencoders. Consequently, in this paper, we do not use such a loss term and only rely on the penalty term to maintain stability of the SINDy coefficients. As shown in the example sections, this is sufficient to obtain satisfactory accuracy. The autoencoder and SINDy coefficients can be simultaneously trained by minimizing Eq. (12) using a gradient-descent-based optimizer. In the following sections, we utilize the Adam optimizer [64] with a learning rate of α.
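Putting the two terms together, one joint gradient step on Eq. (12) could look like the following sketch. It assumes `model`, `reconstruction_loss`, and `sindy_loss` from the previous snippets, a snapshot tensor `U` of shape (N_mu, N_t+1, N_u), and scalars `N_mu`, `N_z`, `N_l`, `N_epoch`, and `dt` are already defined; all names are illustrative rather than taken from the released code.

import torch

# Xi: (N_mu, N_z, N_l) trainable SINDy coefficients, one matrix per training parameter
Xi = torch.zeros(N_mu, N_z, N_l, requires_grad=True)
optimizer = torch.optim.Adam(list(model.parameters()) + [Xi], lr=1e-3)

beta1, beta2, beta3 = 1.0, 0.1, 1e-6
for epoch in range(N_epoch):
    Z, U_hat = model(U)
    loss = beta1 * reconstruction_loss(U, U_hat)
    loss = loss + beta2 * sum(sindy_loss(Z[i], Xi[i], dt) for i in range(N_mu)) / N_mu
    loss = loss + beta3 * torch.sum(Xi ** 2)      # coefficient penalty term of Eq. (12)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()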

2.4. Parameterization through coefficient interpolation

The autoencoder learns a mapping to and from the latent space and identifies the set of ODEs governing the latent dynamics, but it does so only for each parameter 𝝁^(i) ∈ 𝒟 associated with each training data point 𝐔^(i) (with i ∈ [[1, N_μ]]). To make a prediction for any new parameter 𝝁^(∗), the system of ODEs associated with this latter parameter value needs to be estimated. This can be done by finding a mapping f : 𝝁^(∗) ↦ 𝜩^(∗), where the (j, k)th element of 𝜩^(∗) is denoted as {ξ_{j,k}^(∗)}_{(j,k)∈[[1,N_z]]×[[1,N_l]]}. In LaSDI and gLaSDI, f is estimated through RBF interpolation [41] or k-NN convex interpolation [42] of each pair of data (𝝁^(i), {ξ_{j,k}^(i)})_{i∈[[1,N_μ]]}. In this paper, we introduce a replacement for these deterministic interpolation methods, utilizing GP regression [62]. The utilization of GPs provides three significant advantages, which will be elaborated in the subsequent subsections:
provides three significant advantages, which will be elaborated in the subsequent subsections:

• GPs have the inherent capability to automatically quantify interpolation uncertainties by generating confidence intervals for
their predictions.
• GPs are known for their robustness to noise, making them less prone to overfitting and mitigating the risk of incorporating
incorrect SINDy coefficients.
• Being part of the family of Bayesian methods, GPs allow us to incorporate prior knowledge about the latent space behavior,
particularly in terms of smoothness. By specifying a prior distribution, we can provide useful hints that enhance interpolation
accuracy within the GP framework.

The subsequent two subsections cover essential background information on Gaussian processes (GPs) and outline their specific
utilization within the context of this paper. In the first subsection, we provide an overview of GPs, while the second subsection
focuses on detailing the application of GPs in our approach.

2.4.1. Gaussian processes


Let us consider a dataset consisting of N_μ input–output pairs in the form of 𝒟 = (X, y). Here, X represents a set of input vectors, and y represents the corresponding continuous scalar output. We assume that the output y may be affected by Gaussian noise with a variance of σ². Our objective is to find a mapping function f such that:

y = f(X) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I_{N_\mu})     (13)


In the Gaussian process (GP) paradigm, the function f is assumed to be drawn from a Gaussian prior distribution (Eq. (14)) [62]. In the literature, it is common to set the mean of the prior to 0, while the covariance is determined by a kernel function k with parameters 𝜽_gp. The choice of kernel is arbitrary, and in this context, we adopt the radial basis function (RBF) kernel (Eq. (15)). This choice is particularly suitable because the RBF kernel can effectively approximate any 𝒞^∞ function. Furthermore, there is no a priori reason to assume that the space of ordinary differential equation (ODE) coefficients lacks smoothness.

p(f \mid X) = \mathcal{N}(f \mid 0, k(X, X \mid \boldsymbol{\theta}_{\mathrm{gp}}))     (14)

k(X, X \mid \boldsymbol{\theta}_{\mathrm{gp}}) = \gamma \exp\left(-\frac{\|X - X^\top\|_2^2}{2\lambda^2}\right), \qquad \boldsymbol{\theta}_{\mathrm{gp}} = \{\gamma, \lambda\}     (15)
The likelihood (Eq. (16)) is obtained as a direct consequence of Eq. (13) and by applying Bayes’ rule in combination with the prior
distribution. This allows us to derive the posterior distribution over 𝑓 (Eq. (17)).

p(y \mid f, X) = \mathcal{N}(y \mid f(X), \sigma^2 I_{N_\mu})     (16)

p(f \mid X, y) = \frac{p(y \mid f, X)\, p(f \mid X)}{p(y \mid X)}     (17)
The denominator in Eq. (17), often referred to as the marginal likelihood, plays a crucial role in Bayesian inference. Typically,
the hyperparameters 𝜽gp are selected to maximize this marginal likelihood. Alternatively, it is common to minimize the negative
log–marginal–likelihood in Eq. (19), which is equivalent but computationally more efficient in practice.

p(y \mid X) = \int p(y \mid f, X)\, p(f \mid X)\, df     (18)



\mathcal{L}_{\mathrm{GP}}(\boldsymbol{\theta}_{\mathrm{gp}}) = -\log\left(p(y \mid X)\right)     (19)

Unlike parametric Bayesian machine learning models such as Bayesian neural networks, where the posterior is defined over the
model parameter space, Gaussian process models define the posterior over the function space. Consequently, predicting outputs for
test input points is not as straightforward. To derive the distribution over the predictive output 𝑦(∗) for a test input 𝑥(∗) , we need to
apply both the sum rule and the product rule in combination:

p(y^{(*)} \mid X, y, x^{(*)}) = \int p(y^{(*)} \mid f, x^{(*)})\, p(f \mid X, y)\, df     (20)



In the present scenario, Bayesian inference proves to be highly tractable due to the Gaussian nature of the posterior distribution [62].
This implies that both the posterior distribution and Eq. (20) can be computed analytically, resulting in Gaussian distributions for
both:

p(y^{(*)} \mid X, y, x^{(*)}) = \mathcal{N}(y^{(*)} \mid m^{(*)}, s^{(*)2})     (21)

\begin{cases}
m^{(*)} = k(x^{(*)}, X \mid \boldsymbol{\theta}_{\mathrm{gp}})\left(k(X, X \mid \boldsymbol{\theta}_{\mathrm{gp}}) + \sigma^2 I_{N_\mu}\right)^{-1} y \\
s^{(*)2} = k(x^{(*)}, x^{(*)} \mid \boldsymbol{\theta}_{\mathrm{gp}}) - k(x^{(*)}, X \mid \boldsymbol{\theta}_{\mathrm{gp}})\left(k(X, X \mid \boldsymbol{\theta}_{\mathrm{gp}}) + \sigma^2 I_{N_\mu}\right)^{-1} k(X, x^{(*)} \mid \boldsymbol{\theta}_{\mathrm{gp}})
\end{cases}     (22)
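For reference, the closed-form predictive equations (22) translate into a few lines of NumPy. This is only a sketch; `gamma`, `lam`, and `sigma` denote the kernel amplitude, length scale, and noise standard deviation, and are illustrative names.

import numpy as np

def rbf_kernel(A, B, gamma, lam):
    # Pairwise RBF kernel between rows of A and B, Eq. (15)
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return gamma * np.exp(-d2 / (2.0 * lam ** 2))

def gp_predict(x_star, X, y, gamma, lam, sigma):
    # Predictive mean and variance at a test point x_star, Eq. (22)
    K = rbf_kernel(X, X, gamma, lam) + sigma ** 2 * np.eye(X.shape[0])
    k_star = rbf_kernel(x_star[None, :], X, gamma, lam)       # (1, N_mu)
    alpha = np.linalg.solve(K, y)                              # (K + sigma^2 I)^-1 y
    mean = k_star @ alpha
    var = rbf_kernel(x_star[None, :], x_star[None, :], gamma, lam) \
          - k_star @ np.linalg.solve(K, k_star.T)
    return mean.item(), var.item()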

2.4.2. GP interpolation of the SINDy coefficients


After training the autoencoder and obtaining the SINDy coefficients, we proceed to construct N_gp = N_z × N_l regression datasets, where each dataset corresponds to an ODE coefficient and consists of N_μ data points: (X_{j,k}, y_{j,k}) = (𝝁^(i), {ξ_{j,k}^(i)})_{i∈[[1,N_μ]]}. Subsequently, a Gaussian Process (GP) is trained for each dataset. For a given test parameter 𝝁^(∗), the predictive mean and standard deviation are denoted as m_{j,k}^(∗) and s_{j,k}^(∗), respectively. To illustrate, let us consider a LaSDI system that solely incorporates constant and linear candidate terms (i.e., N_l = N_z + 1). The system of ODEs, considering a 1-standard deviation uncertainty for the test parameter 𝝁^(∗), is as follows:

\begin{cases}
\dfrac{dz^{(*)}_{:,1}}{dt} = \left(m^{(*)}_{1,1} \pm s^{(*)}_{1,1}\right) z^{(*)}_{:,1} + \dots + \left(m^{(*)}_{1,N_z} \pm s^{(*)}_{1,N_z}\right) z^{(*)}_{:,N_z} + \left(m^{(*)}_{1,N_l} \pm s^{(*)}_{1,N_l}\right) \\
\qquad \vdots \\
\dfrac{dz^{(*)}_{:,N_z}}{dt} = \left(m^{(*)}_{N_z,1} \pm s^{(*)}_{N_z,1}\right) z^{(*)}_{:,1} + \dots + \left(m^{(*)}_{N_z,N_z} \pm s^{(*)}_{N_z,N_z}\right) z^{(*)}_{:,N_z} + \left(m^{(*)}_{N_z,N_l} \pm s^{(*)}_{N_z,N_l}\right)
\end{cases}     (23)

with:

\begin{cases}
m^{(*)}_{j,k} = k(\boldsymbol{\mu}^{(*)}, X_{j,k} \mid \boldsymbol{\theta}^{j,k}_{\mathrm{gp}})\left(k(X_{j,k}, X_{j,k} \mid \boldsymbol{\theta}^{j,k}_{\mathrm{gp}}) + \sigma^2 I_{N_\mu}\right)^{-1} y_{j,k} \\
s^{(*)2}_{j,k} = k(\boldsymbol{\mu}^{(*)}, \boldsymbol{\mu}^{(*)} \mid \boldsymbol{\theta}^{j,k}_{\mathrm{gp}}) - k(\boldsymbol{\mu}^{(*)}, X_{j,k} \mid \boldsymbol{\theta}^{j,k}_{\mathrm{gp}})\left(k(X_{j,k}, X_{j,k} \mid \boldsymbol{\theta}^{j,k}_{\mathrm{gp}}) + \sigma^2 I_{N_\mu}\right)^{-1} k(X_{j,k}, \boldsymbol{\mu}^{(*)} \mid \boldsymbol{\theta}^{j,k}_{\mathrm{gp}})
\end{cases}     (24)

GP training typically faces scalability challenges when dealing with large datasets due to the need to invert the N_μ × N_μ kernel matrix, which has a computational complexity of 𝒪(N_μ³). While the computational cost of training the GPs should only become an issue for larger values of N_μ (e.g. N_μ ≫ 500), GP scalability can be addressed in different ways. One approach is to partition the parameter space into multiple subdomains and construct a separate GP for each subdomain [65]. This strategy effectively mitigates the computational burden associated with GPs when dealing with large datasets. Another approach is to use approximate scalable GP methods such as kernel interpolation [66,67] or hyperparameter cross-validation [68].
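In practice, the GP component of our implementation relies on scikit-learn (see Section 3). A minimal sketch of fitting one GP per SINDy coefficient could look as follows; the kernel composition, hyperparameter defaults, and function names are illustrative assumptions rather than the exact choices of the released code.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

def fit_coefficient_gps(mu_train, Xi_train):
    # mu_train: (N_mu, n_mu) training parameters, Xi_train: (N_mu, N_z, N_l) SINDy coefficients
    N_z, N_l = Xi_train.shape[1], Xi_train.shape[2]
    gps = {}
    for j in range(N_z):
        for k in range(N_l):
            kernel = ConstantKernel() * RBF() + WhiteKernel()
            gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
            gp.fit(mu_train, Xi_train[:, j, k])   # one scalar regression per coefficient (j, k)
            gps[(j, k)] = gp
    return gps

def predict_coefficients(gps, mu_star, N_z, N_l):
    # Predictive means m[j, k] and standard deviations s[j, k] at a test parameter mu_star
    m, s = np.zeros((N_z, N_l)), np.zeros((N_z, N_l))
    for (j, k), gp in gps.items():
        mean, std = gp.predict(np.atleast_2d(mu_star), return_std=True)
        m[j, k], s[j, k] = mean[0], std[0]
    return m, s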

2.5. Predicting solutions

By employing the GP interpolation approach discussed in the preceding sub-section, we are able to make predictions for the governing equations of the latent space dynamics, along with associated uncertainty, for any test point 𝝁^(∗) ∈ 𝒟 within the parameter space. Consequently, we can now proceed with ease to generate Reduced Order Model (ROM) predictions using the following steps (a minimal sketch of these steps is given after the list):

• Compute the set of {m_{j,k}^(∗), s_{j,k}^(∗)} for each (j, k) ∈ [[1, N_z]] × [[1, N_l]] and determine the governing Ordinary Differential Equations (ODEs), utilizing Eq. (23) and Eq. (24), respectively.
• Convert the initial conditions 𝐮_0^(∗) into latent space initial conditions 𝐳_0^(∗) by performing a forward pass through the encoder network:

\mathbf{z}_0^{(*)} = \phi_{\mathrm{enc}}(\mathbf{u}_0^{(*)} \mid \boldsymbol{\theta}_{\mathrm{enc}})     (25)

• Perform the simulation of the latent space dynamics by numerically solving the system of ODEs using commonly used integration schemes such as backward Euler, Runge–Kutta, or other suitable methods. The resulting simulated latent space dynamics is denoted as 𝐙̃^(∗) = [𝐳_0^(∗), 𝐳̃_1^(∗), …, 𝐳̃_{N_t}^(∗)] ∈ ℝ^{N_z×(N_t+1)}, where N_t represents the number of time steps.
• Utilize the forward pass of 𝐙̃^(∗) through the decoder network to generate predictions of the Full Order Model (FOM) physics. The output of the decoder is denoted as 𝐔̃^(∗) = [𝐮̃_0^(∗), …, 𝐮̃_{N_t}^(∗)] ∈ ℝ^{N_u×(N_t+1)}.
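A minimal sketch of these steps for the constant-plus-linear dictionary (so that the latent dynamics reduce to dz/dt = Az + b) is shown below. It assumes the hypothetical `encoder`, `decoder`, and `predict_coefficients` helpers from the previous snippets, and uses SciPy's solve_ivp as a stand-in for whichever time integrator is preferred.

import numpy as np
import torch
from scipy.integrate import solve_ivp

def rom_predict_mean(u0, mu_star, encoder, decoder, gps, t_grid, N_z, N_l):
    # 1. Interpolate the ODE coefficients at the test parameter, Eq. (24)
    m, _ = predict_coefficients(gps, mu_star, N_z, N_l)
    b, A = m[:, 0], m[:, 1:]              # constant term first, following the dictionary ordering of Eq. (9)
    # 2. Encode the initial condition, Eq. (25)
    z0 = encoder(torch.as_tensor(u0, dtype=torch.float32)).detach().numpy()
    # 3. Integrate the latent ODE system
    sol = solve_ivp(lambda t, z: A @ z + b, (t_grid[0], t_grid[-1]), z0, t_eval=t_grid)
    Z_tilde = sol.y.T                     # (N_t + 1, N_z) latent trajectory
    # 4. Decode the latent trajectory back to the full-order space
    with torch.no_grad():
        U_tilde = decoder(torch.as_tensor(Z_tilde, dtype=torch.float32)).numpy()
    return U_tilde

The backward Euler or Runge–Kutta schemes mentioned above can be substituted for solve_ivp without changing the rest of the pipeline.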

The uncertainty associated with each ODE coefficient is effectively quantified through GP interpolation. While it is possible to utilize the predictive GP mean m_{j,k}^(∗) for each coefficient to obtain a mean prediction for the FOM, an alternative approach is to use random samples. This approach yields multiple sets of ODEs for a single test point 𝝁^(∗), resulting in multiple corresponding latent space dynamics, each with varying likelihood. For instance, considering linear and constant candidate terms, we can generate N_s sets of ODEs:

\begin{cases}
\dfrac{dz^{(*)}_{:,1}}{dt} = c^{(d)}_{1,1} z^{(*)}_{:,1} + \dots + c^{(d)}_{1,N_z} z^{(*)}_{:,N_z} + c^{(d)}_{1,N_l} \\
\qquad \vdots \\
\dfrac{dz^{(*)}_{:,N_z}}{dt} = c^{(d)}_{N_z,1} z^{(*)}_{:,1} + \dots + c^{(d)}_{N_z,N_z} z^{(*)}_{:,N_z} + c^{(d)}_{N_z,N_l}
\end{cases}     (26)

with:

c^{(d)}_{j,k} \sim \mathcal{N}(m^{(*)}_{j,k}, s^{(*)2}_{j,k})     (27)

Next, we proceed to solve the corresponding latent space dynamics 𝐙̃^(∗,d) for each sample d ∈ [[1, N_s]] (in the following sections, we use N_s = 20). By making forward passes through the decoder network, we can estimate the uncertainty over the Full Order Model (FOM) prediction:

\mathbb{E}[\tilde{\mathbf{u}}^{(*)}_n] = \frac{1}{N_s}\sum_{d=1}^{N_s} \phi_{\mathrm{dec}}(\tilde{\mathbf{z}}^{(*,d)}_n \mid \boldsymbol{\theta}_{\mathrm{dec}}), \qquad n \in [[0, N_t]]     (28)

\mathbb{V}[\tilde{\mathbf{u}}^{(*)}_n] = \frac{1}{N_s}\sum_{d=1}^{N_s} \left(\phi_{\mathrm{dec}}(\tilde{\mathbf{z}}^{(*,d)}_n \mid \boldsymbol{\theta}_{\mathrm{dec}}) - \mathbb{E}[\tilde{\mathbf{u}}^{(*)}_n]\right)^2, \qquad n \in [[0, N_t]]     (29)

The expected solution and variance of the ROM are denoted as E[𝐔̃^(∗)] = [E[𝐮̃_0^(∗)], …, E[𝐮̃_{N_t}^(∗)]] and V[𝐔̃^(∗)] = [V[𝐮̃_0^(∗)], …, V[𝐮̃_{N_t}^(∗)]], respectively.
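A sketch of this sampling-based uncertainty estimate (Eqs. (26)–(29)), reusing the same hypothetical helpers and dictionary ordering as above:

import numpy as np
import torch
from scipy.integrate import solve_ivp

def rom_predict_with_uncertainty(u0, mu_star, encoder, decoder, gps, t_grid,
                                 N_z, N_l, N_s=20, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    m, s = predict_coefficients(gps, mu_star, N_z, N_l)
    z0 = encoder(torch.as_tensor(u0, dtype=torch.float32)).detach().numpy()
    samples = []
    for _ in range(N_s):
        c = rng.normal(m, s)                  # c_{j,k} ~ N(m_{j,k}, s_{j,k}^2), Eq. (27)
        b, A = c[:, 0], c[:, 1:]              # constant term first, as in the dictionary of Eq. (9)
        sol = solve_ivp(lambda t, z: A @ z + b, (t_grid[0], t_grid[-1]), z0, t_eval=t_grid)
        with torch.no_grad():
            U_d = decoder(torch.as_tensor(sol.y.T, dtype=torch.float32)).numpy()
        samples.append(U_d)                   # one decoded trajectory per sampled ODE system
    samples = np.stack(samples)               # (N_s, N_t + 1, N_u)
    return samples.mean(axis=0), samples.var(axis=0)   # E[U_tilde], V[U_tilde], Eqs. (28)-(29)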

2.6. Variance-based greedy sampling

In most applications, the accuracy of machine learning models, including the one described in this paper, is expected to improve with the availability of additional training data [69]. Hence, selecting an appropriate parameter value, denoted as 𝝁^(N_μ+1), for running a FOM simulation and acquiring more training data is crucial. One intuitive approach is to choose the value of 𝝁^(N_μ+1) that most hinders the model's ability to provide satisfactory accuracy. To achieve this, it is necessary to accurately assess the model's performance for any given 𝝁^(∗). In the gLaSDI framework [42], this evaluation is accomplished by examining the decoder predictions for values of 𝝁 sampled from a discretized grid of the parameter space, denoted as 𝒟_h ⊆ 𝒟. These predictions are then incorporated into the PDE residual. However, this approach requires explicit knowledge of the PDE and can be computationally expensive and cumbersome to implement. In GPLaSDI, we propose an alternative method based on the uncertainty associated with the decoder predictions. Specifically, if the variance, denoted as V[𝐔̃^(∗)], is large compared to other points in the parameter space, then the expected value, denoted as E[𝐔̃^(∗)], is more likely to be inaccurate. Consequently, we select 𝝁^(N_μ+1) such that:

\boldsymbol{\mu}^{(N_\mu+1)} = \underset{\boldsymbol{\mu}^{(*)} \in \mathcal{D}_h}{\arg\max}\left[\max_{(t,x)} \mathbb{V}[\tilde{\mathbf{U}}^{(*)}]\right]     (30)

In this paper, we evaluate V[𝐔̃^(∗)] at each and every point 𝝁^(∗) of 𝒟_h. A faster but less accurate alternative is to use only a handful of random points within 𝒟_h [42]. As for the acquisition of new data, we adopt a consistent sampling rate (i.e., one new point every fixed N_up training iterations). The steps involved in GPLaSDI, as discussed in Section 2, are summarized in Algorithm 1.
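A sketch of the variance-based selection of Eq. (30); `rom_predict_with_uncertainty` is the hypothetical helper above, and `initial_condition(mu)` stands for a parameterized initial-condition generator such as Eq. (33) (both names are illustrative):

import numpy as np

def select_next_parameter(D_h, encoder, decoder, gps, t_grid, N_z, N_l, N_s=20):
    # Evaluate the maximum ROM prediction variance at every candidate parameter in D_h
    worst_mu, worst_var = None, -np.inf
    for mu_star in D_h:
        u0 = initial_condition(mu_star)        # parameterized initial condition, e.g. Eq. (33)
        _, V = rom_predict_with_uncertainty(u0, mu_star, encoder, decoder, gps,
                                            t_grid, N_z, N_l, N_s=N_s)
        if V.max() > worst_var:                # max over (t, x), then argmax over D_h, Eq. (30)
            worst_mu, worst_var = mu_star, V.max()
    return worst_mu                            # run the FOM here and append it to the training set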

3. Application

In the following sections, we demonstrate the effectiveness of our method on multiple examples. To quantify the accuracy of GPLaSDI, we use the maximum relative error, defined as:

e(\tilde{\mathbf{U}}^{(*)}, \mathbf{U}^{(*)}) = \max_{n}\left(\frac{\|\tilde{\mathbf{u}}_n^{(*)} - \mathbf{u}_n^{(*)}\|_2}{\|\mathbf{u}_n^{(*)}\|_2}\right)     (31)

where 𝐮̃_n^(∗) and 𝐮_n^(∗) are the decoder ROM prediction and the ground truth at each time step, respectively (with GPLaSDI, we take 𝐮̃_n^(∗) ≡ E[𝐮̃_n^(∗)]). Each example is trained and tested on a compute node of the Livermore Computing Lassen supercomputer at the Lawrence Livermore National Laboratory, using an NVIDIA V100 (Volta) 64Gb GDDR5 GPU. Our GPLaSDI model is implemented using the open-source libraries PyTorch [70] (for the autoencoder component) and sklearn (for the GP component) [71]. All the numerical examples shown in this paper can be regenerated by our open source code, GPLaSDI, which is available on GitHub at https://github.com/LLNL/GPLaSDI.
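For completeness, Eq. (31) translates directly into a few lines of NumPy (a sketch; both arguments are (N_t+1, N_u) arrays of ROM and FOM snapshots):

import numpy as np

def max_relative_error(U_rom, U_fom):
    # Maximum over time of the relative L2 error between ROM and FOM snapshots, Eq. (31)
    num = np.linalg.norm(U_rom - U_fom, axis=1)
    den = np.linalg.norm(U_fom, axis=1)
    return np.max(num / den)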


Algorithm 1 Autoencoder Training with Variance-based Greedy Sampling

Require: 𝐔 = [𝐔^(1), …, 𝐔^(N_μ)], N_μ, N_epoch, N_z, N_l, N_up, α, β_1, β_2, β_3, 𝜣(⋅), 𝒟_h, 𝜙_enc, 𝜙_dec
1:  Initialize 𝜽_enc, 𝜽_dec, 𝜩, 𝜽_gp^{j,k} randomly, and h = 0
2:  while h < N_epoch do
3:    Compute 𝐙 = 𝜙_enc(𝐔 | 𝜽_enc) and 𝐔̂ = 𝜙_dec(𝐙 | 𝜽_dec)  (See Eq. (4))
4:    Compute ℒ(𝜽_enc, 𝜽_dec, 𝜩)  (See Eq. (12))
5:    Update 𝜽_enc, 𝜽_dec, and 𝜩 using the Adam algorithm, α and ∇ℒ(𝜽_enc, 𝜽_dec, 𝜩)
6:    if h mod N_up ≡ 0 then
7:      for (j, k) ∈ [[1, N_z]] × [[1, N_l]] do
8:        Build dataset (X_{j,k}, y_{j,k}) = (𝝁^(i), {ξ_{j,k}^(i)})_{i∈[[1,N_μ]]}
9:        Find 𝜽_gp^{j,k} = arg min ℒ_GP(𝜽_gp)  (See Eq. (19))
10:     end for
11:     for 𝝁^(∗) ∈ 𝒟_h do
12:       for (j, k) ∈ [[1, N_z]] × [[1, N_l]] do
13:         Compute {m_{j,k}^(∗), s_{j,k}^(∗)}  (See Eq. (24))
14:         for d ∈ [[1, N_s]] do
15:           Sample c_{j,k}^(d) ∼ 𝒩(m_{j,k}^(∗), s_{j,k}^(∗)2)  (See Eq. (27))
16:         end for
17:       end for
18:       for d ∈ [[1, N_s]] do
19:         Build the system of ODEs using {c_{j,k}^(d)}_{(j,k)∈[[1,N_z]]×[[1,N_l]]}
20:         Solve for 𝐙̃^(∗,d) and evaluate 𝐔̃^(∗,d) = 𝜙_dec(𝐙̃^(∗,d) | 𝜽_dec)
21:       end for
22:       Compute max_{(t,x)} V[𝐔̃^(∗)]  (See Eq. (29))
23:     end for
24:     Find 𝝁^(N_μ+1) = arg max_{𝝁^(∗) ∈ 𝒟_h} [ max_{(t,x)} V[𝐔̃^(∗)] ]  (See Eq. (30))
25:     Collect 𝐔^(N_μ+1) by running the FOM
26:     Update 𝐔 = [𝐔^(1), …, 𝐔^(N_μ), 𝐔^(N_μ+1)] and N_μ = N_μ + 1
27:   end if
28:   Update h = h + 1
29: end while

3.1. 1D Burgers equation

We first consider the inviscid 1D Burgers equation, which was initially introduced in [41] and further discussed in [42]:

\begin{cases}
\dfrac{\partial u}{\partial t} + u \dfrac{\partial u}{\partial x} = 0 & (t, x) \in [0, 1] \times [-3, 3] \\
u(t, x = 3) = u(t, x = -3)
\end{cases}     (32)

The initial condition is parameterized by 𝝁 = {a, w} ∈ 𝒟, and the parameter space is defined as 𝒟 = [0.7, 0.9] × [0.9, 1.1]:

u(t = 0, x) = a \exp\left(-\frac{x^2}{2w^2}\right), \qquad \boldsymbol{\mu} = \{a, w\}     (33)

The FOM solver utilizes an implicit backward Euler time integration scheme and a backward finite difference discretization in space.
The spatial stepping is set to 𝛥𝑥 = 6 ⋅ 10−3 and the time stepping to 𝛥𝑡 = 10−3 . In this section and in Section 3.2, the discretization is
based on the one used in [41,42], and is chosen to ensure stability at any point of the parameter space. For the purpose of greedy
sampling, the parameter space is discretized into a square grid ℎ with a stepping of 𝛥𝑎 = 𝛥𝑤 = 0.01, resulting in a total of 441
grid points (21 values in each dimension). The initial training dataset consists of 𝑁𝜇 = 4 FOM simulations, corresponding to the
parameters located at each corner of ℎ . Specifically, the parameter values are 𝝁(1) = {0.7, 0.9}, 𝝁(2) = {0.9, 0.9}, 𝝁(3) = {0.7, 1.1} and
𝝁(4) = {0.9, 1.1}.
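A short sketch of this setup (spatial grid, 21 × 21 parameter grid 𝒟_h, corner training points, and the parameterized initial condition of Eq. (33)); all names are illustrative:

import numpy as np

# Spatial grid and 21 x 21 parameter grid D_h
x = np.linspace(-3.0, 3.0, 1001)                     # Delta_x = 6e-3
a_vals = np.linspace(0.7, 0.9, 21)                   # Delta_a = 0.01
w_vals = np.linspace(0.9, 1.1, 21)                   # Delta_w = 0.01
D_h = np.array([[a, w] for a in a_vals for w in w_vals])

def initial_condition(mu, x=x):
    # Parameterized Gaussian initial condition, Eq. (33)
    a, w = mu
    return a * np.exp(-x ** 2 / (2.0 * w ** 2))

# Initial training set: the four corners of the parameter space
corners = np.array([[0.7, 0.9], [0.9, 0.9], [0.7, 1.1], [0.9, 1.1]])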
The encoder architecture follows a 1001–100–5 structure, comprising one hidden layer with 100 hidden units and 𝑁𝑧 = 5 latent
variables. The decoder has a symmetric configuration to the encoder. It employs a sigmoid activation function. Note that in this
paper, we select the autoencoder architecture through a random search using only the initial four corner training datapoints, on
trainings that are run for at most N_up epochs (i.e. no sampling of additional FOM data). This is done to ensure that the training loss does not get stuck and that the learning rate is appropriate. The cost of running these searches is low since there are few training points and no FOM sampling. Throughout the examples, as a rule of thumb, we have found that decreasing the width of each hidden layer by an order of magnitude compared to the previous layer generally yields satisfactory performance. It is possible that other architectures may lead to equivalent, better, or worse performance (e.g. convolutional layers, etc.).

Fig. 2. 1D Burgers Equation — Predictive mean of each ODE coefficient (m_{j,k}^(∗)) given 𝝁^(∗) at the end of the training. The black marks represent each sampled data point.
To identify the latent space dynamics, the dictionary of possible terms 𝜣(⋅) includes only linear terms and a constant, resulting in N_l = 6 terms. The autoencoder is trained for N_epoch = 2.8·10⁴ iterations, with a learning rate α = 10⁻³. A new FOM data point is sampled every N_up = 2000 iterations (resulting in adding 13 data points during training, for a total of 17 training points). For estimating the prediction variance, 20 samples are used (i.e., N_s = 20). The loss hyperparameters are set as β₁ = 1, β₂ = 0.1, and β₃ = 10⁻⁶. The hyperparameters employed in this example are based on [42]. Additional details on the effects of hyperparameter selection can be found in [41,42].
For baseline comparison, a gLaSDI model [42] is also trained using identical hyperparameter settings, autoencoder architecture,
and GP interpolation of the latent space dynamics. The key difference lies in the data sampling strategy: gLaSDI employs PDE
residual error-based sampling, while GPLaSDI utilizes uncertainty-based sampling.
The system of governing ODEs for the latent space dynamics consists of 30 coefficients, each of which is interpolated by a GP.
Fig. 2 illustrates the predictive mean of each GP, while Fig. 3 displays the corresponding standard deviation. Fig. 4 showcases
the maximum relative error (in percentage) for each test point in 𝒟_h, obtained using GPLaSDI and gLaSDI [42]. Remarkably, our
GPLaSDI framework achieves outstanding performance, with the worst maximum relative error being below 5%, and the majority
of predictions exhibiting less than 3.5% error. Moreover, for this particular example, GPLaSDI slightly outperforms gLaSDI, where
the worst maximum relative error reaches 5%.


Fig. 3. 1D Burgers Equation — Predictive standard deviation of each ODE coefficient (s_{j,k}^(∗)) given 𝝁^(∗). The heatmaps are similar for each coefficient, which is expected because here the input point locations within 𝒟_h are all the same. Notice that the uncertainty is higher in regions with no training data points, as one might intuitively expect.

In Fig. 5, the maximum standard deviation max_{(t,x)} V[𝐔̃^(∗)]^{1/2} is depicted, representing the uncertainty in the decoder ROM
predictions. As expected, the standard deviation patterns closely match the predictive standard deviation of each GP (Fig. 3). There is
also a significant correlation between high standard deviation and high maximum relative error (Fig. 4). This correlation is valuable
as it indicates that the uncertainty in the ROM model can be reliably quantified. This observation may also explain why GPLaSDI
performs similarly to gLaSDI. A ROM that incorporates knowledge of the underlying physics is generally expected to outperform
purely data-driven models due to additional insights. However in this case, GPLaSDI, despite being agnostic to the PDE (unlike
gLaSDI), effectively captures the correlation between prediction uncertainty and ROM error. It therefore serves as a robust surrogate
for the PDE residual error.
Fig. 6 illustrates the model prediction for parameter 𝝁(∗) = {0.73, 0.92}, corresponding to the case with the largest relative error.
Despite the larger error, the model predictions remain reasonable, and there is a clear correlation between the predictive standard
deviation and the absolute error compared to the ground truth. The absolute error consistently falls within one standard deviation.
During 50 test runs, the FOM wall clock run-time averages 1.31 s using a single core. On the other hand, the ROM model requires an average of 6.36·10⁻³ seconds, resulting in an average speed-up of 206×. It is important to note that in this case, the system of ODEs is solved using the GP predictive means (m_{j,k}^(∗)), requiring only one integration of the system of ODEs and one forward pass through the decoder. This approach does not allow for predictive standard deviation estimations, since multiple sets of ODEs would need to be solved and an equivalent number of forward passes would be required. Therefore, if we aim to make a ROM prediction with uncertainty estimation using, for example, 10 samples, the speed-up would reduce to roughly 20×. Note that running the ROM predictions for multiple samples could also be done in an embarrassingly parallel way, which would limit the deterioration of speed-up performance.


Fig. 4. 1D Burgers Equation — Maximum relative error (%) using GPLaSDI (left) and gLaSDI (right). The values in a red square correspond to the original FOM
data at the beginning of the training (located at the four corners). The values in a black square correspond to parameters and FOM runs that were sampled
during training. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. 1D Burgers Equation — Maximum predictive standard deviation for GPLaSDI. The numbers inside each box are in scientific notation, scaled by the factor of 10 specified in the title. For example, the maximum value across the figure is 1.9·10⁻¹.

As in any neural-network-based algorithm, precisely understanding the source of errors can be challenging. There exist however
several ways of pinpointing and mitigating possible errors in GPLaSDI:

• Compute the maximum relative error for points belonging to the training set (i.e. reproductive case), where ground truth data is
available. Note that the training data comes from deterministic numerical solvers with unchanged simulation settings, and can
thus be considered as noiseless data (as opposed to experimental data for instance). Therefore, overfitting of the autoencoder
is not a primary concern and is not expected to be a major source of error. On the other hand, underfitting and/or using an
autoencoder that does not have an architecture capable of capturing all the complexity in the physics is always a risk. This
can be easily assessed by looking at the autoencoder prediction error on the training data (e.g. the mean-squared-error between 𝐔 and 𝐔̂).


̃ (∗) ] with a 95% confidence interval,


Fig. 6. 1D Burgers Equation — Predictions for 𝝁(∗) = {0.73, 0.92}. This plot shows the predicted latent space dynamics E[𝐙
the ROM mean prediction E[𝐔̃ (∗) ] (decoder output) and standard deviation, the ground truth, and the absolute error.

• Compare 𝐙̂ and 𝐙̃ through preliminary visual inspection and error metrics such as the mean-squared-error (see the sketch after this list). If the error between 𝐔 and 𝐔̂ is low, but the error between 𝐙̂ and 𝐙̃ is high, this indicates that the autoencoder is well trained, but that the model fails to reproduce the dynamics of 𝐙̂, and more emphasis needs to be put on the SINDy loss (i.e. increase β₂).
• In this paper, to demonstrate the performance of GPLaSDI, we have generated FOM testing data for every single point in 𝒟_h (to compute the maximum relative error). In practice, doing so may be very expensive and would most likely defeat the
purpose of a ROM. It may be possible however to generate only a handful of test FOM datapoints at random locations of
the parameter space. Another way of estimating the error could be to feed the ROM predictions into the PDE residual and
compute the residual error, if available. This would however defeat the purpose of a non-intrusive ROM, and in such case,
using an intrusive approach such as gLaSDI [42] may be desirable. Note that a hybrid algorithm, using both the PDE residual
(for physics information) and GPs to interpolate the sets of ODEs (for uncertainty quantification) is perfectly conceivable, but
left to future work.

3.2. 2D Burgers equation

We now consider the 2D Burgers equation with a viscous term, as introduced in [41,42]:

\begin{cases}
\dfrac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla\mathbf{u} = \dfrac{1}{Re}\,\Delta\mathbf{u} & (t, x, y) \in [0, 1] \times [-3, 3] \times [-3, 3] \\
\mathbf{u}(t, x = \pm 3, y = \pm 3) = \mathbf{0}
\end{cases}     (34)

The initial condition is analogous to the 1D case:

u(t = 0, x, y) = v(t = 0, x, y) = a \exp\left(-\frac{x^2 + y^2}{2w^2}\right), \qquad \boldsymbol{\mu} = \{a, w\}, \quad \mathbf{u} = \{u, v\}     (35)

We consider a Reynolds number of Re = 10⁵. The FOM solver employs an implicit backward Euler time integration scheme, a backward finite difference for the nonlinear term, and a central difference discretization for the diffusion term. The spatial resolution is set to Δx = Δy = 0.1, and the time stepping is set to Δt = 5·10⁻³. These settings remain consistent with the 1D case. However, in this scenario, we employ a neural network architecture of 7200–100–20–20–20–5 for the encoder, and a symmetric architecture for the decoder. The activation function used is softplus. The autoencoder is trained for N_epoch = 7.5·10⁵ iterations, with a sampling rate of N_up = 5·10⁴ (resulting in adding 14 data points during training, for a total of 18 training points).
Fig. 7 depicts the maximum relative error for each point in the parameter space, comparing the performance of GPLaSDI and
gLaSDI. GPLaSDI achieves consistently low maximum relative errors across most of the parameter space, typically ranging from 0.5%
to 1%, with a maximum error not exceeding 3.8%. On the other hand, gLaSDI exhibits a larger maximum relative error, reaching
up to 4.6%.
In Fig. 8, we observe the maximum standard deviation, which, similar to the 1D case, exhibits a clear correlation with the
maximum relative error.


Fig. 7. 2D Burgers Equation – Maximum relative error (%) using GPLaSDI (left) and gLaSDI (right).

Fig. 8. 2D Burgers Equation — Maximum predictive standard deviation for GPLaSDI.

Fig. 9 presents the dynamics of the latent space, the predicted and ground truth 𝐮 fields, the absolute error, and the fields’
predictive standard deviations for the least favorable case (𝝁(∗) = {0.73, 1.07}) at time points 𝑡 = 0.25 and 𝑡 = 0.75. The error
predominantly concentrates along the shock front, where the discontinuity forms. However, the error remains well within the
predictive standard deviation, affirming the capability of GPLaSDI to provide meaningful confidence intervals.
During 50 test runs, the FOM wall clock run-time averaged 63.5 s using a single core. In contrast, the ROM model achieved an average runtime of 9.13·10⁻³ seconds, resulting in an average speed-up of 6949×. Similarly to the 1D case, this speed-up is attained solely using the mean prediction. To incorporate uncertainty prediction, additional ODE integrations and decoder forward passes would be required, thereby slightly diminishing the speed-up gains.


Fig. 9. 2D Burgers Equation — Predictions for 𝝁^(∗) = {0.73, 1.07} at t = 0.25 (a) and t = 0.75 (b). These plots show the predicted latent space dynamics E[𝐙̃^(∗)] with a 95% confidence interval, the ROM mean prediction for 𝐮 = {u, v} ≡ {E[ũ^(∗)], E[ṽ^(∗)]} (decoder output) and standard deviation, the ground truth, and the absolute error.

3.3. Two–stream plasma instability

In this section, we consider the simplified 1D–1V Vlasov–Poisson equation:

\begin{cases}
\dfrac{\partial f}{\partial t} + \dfrac{\partial}{\partial x}(v f) + \dfrac{\partial}{\partial v}\!\left(\dfrac{d\phi}{dx}\, f\right) = 0, & (t, x, v) \in [0, 5] \times [0, 2\pi] \times [-7, 7], \\
\dfrac{d^2\phi}{dx^2} = \displaystyle\int_v f \, dv,
\end{cases}     (36)
where we consider a plasma distribution function denoted as 𝑓 ≡ 𝑓 (𝑥, 𝑣), which depends on the variables 𝑥 (physical coordinate)
and 𝑣 (velocity coordinate). The equation also involves the electrostatic potential 𝜙. This simplified model governs 1D collisionless
electrostatic plasma dynamics and is representative of complex models for plasma behavior in various applications, including
proposed fusion power plant designs. It is important to note that kinetic modeling of plasmas leads to high-dimensional PDEs;
although Eq. (36) describes one-dimensional dynamics, it is a two-dimensional PDE.
In this example, we focus on solving the two-stream instability problem that is a canonical problem used to simulate the excitation
of a plasma wave from counterstreaming plasmas. The initial solution is:
[ ][ ( ) ( )]
4 1 (𝑣 − 2)2 (𝑣 + 2)2
𝑓 (𝑡 = 0, 𝑥, 𝑣) = 1+ cos(𝑘𝜋𝑥) exp − + exp − , (37)
𝜋𝑇 10 2𝑇 2𝑇


Fig. 10. 1D1V Vlasov Equation — Maximum relative error (%) using GPLaSDI and a uniform training grid (non-greedy).

where 𝑇 represents the plasma temperature. The parameters involved are denoted as 𝝁 = {𝑇 , 𝑘}, where 𝑇 ranges from 0.9 to 1.1,
and k ranges from 1.0 to 1.2. We discretize the parameter space over a 21 × 21 grid 𝒟_h, with a step size of ΔT = Δk = 0.01. The
training process is initialized with 𝑁𝜇 = 4 training data points located at the four corners of the parameter space. The FOM data is
sampled using HyPar [72], a conservative finite difference PDE code. It utilizes the fifth order WENO discretization [73] in space
(𝛥𝑥 = 2𝜋∕128, 𝛥𝑣 = 7∕128) and the classical four-stage, fourth-order Runge–Kutta time integration scheme (𝛥𝑡 = 5 ⋅ 10−3 ). In this
section and in Section 3.4, the discretization is based on the showcasing examples of HyPar [72], and is chosen to ensure stability
at any point of the parameter space.
For the neural network architecture, we use a 16384–1000–200–50–50–50–5 configuration for the encoder (and a symmetric architecture for the decoder). The activation function employed is softplus. The latent space consists of N_z = 5 variables, and only linear terms are considered for the SINDy library. The loss hyperparameters are set as β₁ = 1, β₂ = 0.1, and β₃ = 10⁻⁵. To estimate the prediction variance, we utilize N_s = 20 samples. The training process involves N_epoch = 6.5·10⁵ epochs with a learning rate of α = 10⁻⁵ and a sampling rate of N_up = 5·10⁴ (resulting in adding 12 data points during training, for a total of 16 training points).
To provide a baseline for comparison, we evaluate the performance of GPLaSDI against an autoencoder trained with the same
settings and hyperparameters on a uniform parameter grid. The uniform grid consists of 16 data points arranged in a 4 × 4 grid.
In the baseline model, the interpolations of the latent space ODE coefficients are performed using GPs. Similar to GPLaSDI, the
training process for the baseline model also incorporates active learning. It begins with the initial four corner points, and at every
N_up = 5·10⁴ iterations, a new point is randomly selected from the uniform grid.
Fig. 10 presents the maximum relative error for each point in the parameter space obtained using GPLaSDI and the baseline
model. With GPLaSDI, the worst maximum relative error is 6.1%, and in most regions of the parameter space, the error remains
within the range of 1.5 − 3.5%. The highest errors are concentrated towards smaller values of 𝑘 (typically 𝑘 < 1.07). Compared to
uniform sampling, GPLaSDI outperforms the baseline model, which achieves a maximum relative error of 7.4%.
Fig. 11 illustrates the maximum standard deviation. Although it correlates with the relative error, the correlation is only partial
in this example. The standard deviation is low for parameters that correspond to a training point, indicating reproductive cases.
However, the relative error can still be somewhat high in these cases. For example, for 𝝁^(∗) = {0.96, 1.15}, the maximum standard deviation max_{(t,x,v)} V[f̃^(∗)]^{1/2} is 0.0, while the relative error e(f̃^(∗), f^(∗)) is 2.3%. The GPs interpolate the sets of ODEs governing the latent space dynamics, so the uncertainty quantification reflects the uncertainty in the latent space rather than the uncertainty in the training of the encoder and/or decoder. Therefore, it is possible that in some cases, the uncertainty quantification in GPLaSDI
may only provide a partial depiction of the model uncertainty. Using a Bayesian neural network (BNN) in place of the encoder and
decoder could provide a fuller picture of the model uncertainty, but training BNNs is notoriously difficult and expensive [74].
Fig. 12 displays the latent space dynamics, including the predicted and ground truth values of 𝑓 , the absolute error, and the
predictive standard deviation. The results correspond to the least favorable case (𝝁(∗) = {0.9, 1.04}) at two different time instances:
𝑡 = 1 and 𝑡 = 4. The standard deviation of the reduced-order model (ROM) exhibits qualitative similarity to the absolute error, and
the error generally falls within the range of 1 to 1.5 standard deviations.
In 20 separate test runs, the FOM requires an average wall-clock run-time of 22.5 s when utilizing four cores, and 57.9 s when
using a single core. In contrast, the ROM achieves an average run-time of 1.18 ⋅ 10⁻² seconds, resulting in an average
speed-up of 4906× (1906× when compared to the parallel FOM). It is important to note that, as in the Burgers equation
cases, this speed-up is obtained solely using the mean prediction and does not take advantage of the full predictive distribution.


Fig. 11. 1D1V Vlasov Equation — Maximum predictive standard deviation for GPLaSDI.

3.4. Rising thermal bubble

We explore a rising thermal bubble scenario, where an initially warm bubble is introduced into a cold ambient atmosphere [75].
As time progresses, the bubble rises and dissipates, forming a mushroom pattern. The governing equations for this problem are the
two-dimensional compressible Euler equations with gravitational source terms:

$$
\begin{cases}
\dfrac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0, & (t, x, y) \in [0, 300] \times [0, 1000] \times [0, 1000] \\[4pt]
\dfrac{\partial \rho \mathbf{u}}{\partial t} + \nabla \cdot (\rho \mathbf{u} \otimes \mathbf{u} + p) = -\rho \mathbf{g}, & \mathbf{g} = \{0, 9.8\} \\[4pt]
\dfrac{\partial e}{\partial t} + \nabla \cdot \big[(e + p)\,\mathbf{u}\big] = -\rho \mathbf{g} \cdot \mathbf{u},
\end{cases}
\tag{38}
$$
where 𝜌 represents the fluid density, 𝐮 = {𝑢, 𝑣} represents the velocity, 𝑝 denotes the pressure, and 𝐠 represents the gravitational
acceleration. The internal energy 𝑒 is given by:
$$
e = \frac{p}{\gamma - 1} + \frac{1}{2}\,\rho\,\mathbf{u} \cdot \mathbf{u}, \tag{39}
$$
where 𝛾 = 1.4 is the specific heat ratio. It is important to note that Eq. (38) is solved in its dimensional form. Slip-wall boundary
conditions are enforced on the velocity field 𝐮 at all boundaries. The ambient atmosphere is a hydrostatically-balanced stratified
air with a constant potential temperature 𝜃 = 300 and a reference pressure 𝑝0 = 10⁵. The potential temperature is defined as

$$
\theta = T \left( \frac{p_0}{p} \right)^{\frac{\gamma - 1}{\gamma}}.
$$

A warm bubble is introduced as a potential temperature perturbation:

$$
\theta\left(t = 0,\ r = \sqrt{x^2 + y^2}\right) =
\begin{cases}
\dfrac{1}{2}\,\theta_c \left(1 + \cos\left(\pi \dfrac{r}{R_c}\right)\right) & r < R_c \\[4pt]
0 & r > R_c
\end{cases}
\tag{40}
$$
The parameters of interest are denoted as 𝝁 = {𝜃𝑐 , 𝑅𝑐 }, representing the perturbation strength and bubble radius, respectively. The
parameter 𝜃𝑐 ranges from 0.5 to 0.6, while 𝑅𝑐 ranges from 150 to 160. The parameter space is discretized using a 21 × 21 grid
with step sizes 𝛥𝜃𝑐 = 0.005 and 𝛥𝑅𝑐 = 0.5. The training process begins with 𝑁𝜇 = 4 training data points located at the four corners of
the parameter space. Similarly to the Vlasov equation example, the full-order model (FOM) data is sampled using HyPar [72,75].
The FOM is solved using a fifth order WENO discretization [73] in space with grid spacings of 𝛥𝑥 = 𝛥𝑦 = 10, and a third order
strong-stability-preserving Runge–Kutta time integration scheme with a time step size of 𝛥𝑡 = 0.01.
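To make the initial condition concrete, a short NumPy sketch of the perturbation in Eq. (40) is given below; the radius 𝑟 is taken literally as √(𝑥² + 𝑦²), and the 101 × 100 node layout is an assumption chosen to match the 10100-dimensional autoencoder input mentioned below:

```python
# Illustrative sketch (not the paper's code) of the initial potential-temperature
# perturbation in Eq. (40) on a uniform grid over [0, 1000]^2 (dx = dy = 10).
import numpy as np

def bubble_perturbation(theta_c, R_c, nx=101, ny=100, L=1000.0):
    """Evaluate the warm-bubble perturbation of Eq. (40) on a [0, L]^2 grid."""
    x, y = np.meshgrid(np.linspace(0.0, L, nx), np.linspace(0.0, L, ny), indexing="ij")
    r = np.sqrt(x**2 + y**2)          # radius as written in Eq. (40)
    dtheta = 0.5 * theta_c * (1.0 + np.cos(np.pi * r / R_c))
    return np.where(r < R_c, dtheta, 0.0)

# Example: one corner of the parameter space, mu = {theta_c, R_c} = {0.5, 150}
dtheta = bubble_perturbation(theta_c=0.5, R_c=150.0)
print(dtheta.shape, float(dtheta.max()))
```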
For the neural network architecture, we utilize a 10100–1000–200–50–20–5 configuration for the encoder (and a symmetric
architecture for the decoder). The activation function employed is softplus. The latent space consists of 𝑁𝑧 = 5 variables, and only
linear terms are considered for the SINDy library. The loss hyperparameters are set to 𝛽1 = 1, 𝛽2 = 0.25, and 𝛽3 = 10⁻⁶. To estimate
the prediction variance, we use 𝑁𝑠 = 20 samples. The training process involves 𝑁𝑒𝑝𝑜𝑐ℎ = 6.8 ⋅ 10⁵ epochs with a learning rate of
𝛼 = 10⁻⁴ and a greedy sampling update every 𝑁𝑢𝑝 = 4 ⋅ 10⁴ epochs (adding 16 data points during training, for a total of 20 training points).


Fig. 12. 1D1V Vlasov Equation — Prediction for 𝝁(∗) = {0.9, 1.04} at (a) 𝑡 = 1.0 and (b) 𝑡 = 4.0. These plots illustrate the predicted latent space dynamics E[𝐙̃(∗)]
along with a 95% confidence interval. Additionally, the plots display the ROM mean prediction (decoder output) and standard deviation, the ground truth values,
and the absolute error.

Similar to the Vlasov equation example, we employ a baseline for comparison by training an autoencoder with the same settings
and hyperparameters on a uniform parameter grid. The uniform grid consists of 20 data points arranged in a 5 × 4 grid. Fig. 13
illustrates the maximum relative error for each point in the parameter space obtained using GPLaSDI and the baseline model. With
GPLaSDI, the worst maximum relative error is 6.2%, and the largest errors occur for parameter values located towards the bottom
right corner of the parameter space. GPLaSDI slightly outperforms the baseline, which exhibits higher errors for smaller values of
𝑅𝑐 (𝑅𝑐 < 153), with the worst maximum relative error reaching 8.3%.
Fig. 14 displays the maximum standard deviation. It generally correlates reasonably well with the relative error. However, in the
lower left corner of the parameter space, where large maximum relative errors are observed (𝜃𝑐 < 0.52 and 𝑅𝑐 > 158), the standard
deviation is unexpectedly low, erroneously indicating a high confidence in the model’s predictions.
Fig. 15 depicts the latent space dynamics, specifically the predicted and ground truth values of 𝜃, along with the absolute error
and the predictive standard deviation. The results are presented for the least favorable case (𝝁(∗) = {0.59, 159}) at two time instances:
𝑡 = 100 and 𝑡 = 300. The absolute error typically falls within one standard deviation, and its pattern closely matches that of the


Fig. 13. 2D Rising Thermal Bubble — Maximum relative error (%) using GPLaSDI and a uniform training grid (non-greedy).

Fig. 14. 2D Rising Thermal Bubble — Maximum predictive standard deviation for GPLaSDI.

standard deviation. This consistent observation aligns with the findings from the Burgers equation and Vlasov equation examples,
indicating that the confidence intervals provided by GPLaSDI are meaningful and closely correlated with the prediction error.
During 20 test runs, the FOM requires an average wall-clock run-time of 89.1 s when utilizing 16 cores, and 1246.8 s when using a
single core. The ROM, on the other hand, achieves an average run-time of 1.25 ⋅ 10⁻² seconds, resulting in an average
speed-up of 99744× (7128× when compared to the parallel FOM). As in the Burgers and Vlasov equation cases, this speed-up is
obtained solely using the mean prediction. It is also worth noting that across all examples, the run time of GPLaSDI remains
relatively consistent (on the order of 10⁻² seconds), even though the discretization of the high-fidelity problem varies widely
(and with it, the number of parameters in the autoencoder). This suggests that GPLaSDI prediction time can scale well to problems
requiring finer discretizations.

4. Conclusion

We have presented GPLaSDI, a non-intrusive greedy LaSDI framework that incorporates Gaussian process latent space inter-
polation. Our proposed framework offers several key advantages. First, GPLaSDI efficiently captures the latent space dynamics
and successfully interpolates the governing sets of ODEs while providing uncertainty quantification. This allows for meaningful
confidence intervals to be obtained over the reduced-order model (ROM) predictions. These confidence intervals play a crucial role
in identifying regions of high uncertainty in the parameter space. Furthermore, GPLaSDI intelligently selects additional training data
points in these uncertain regions, thereby maximizing prediction accuracy. Notably, GPLaSDI
accomplishes this without requiring any prior knowledge of the underlying partial differential equation (PDE) or its residual.


Fig. 15. 2D Rising Thermal Bubble — Predictions for 𝝁(∗) = {0.59, 159} at (a) 𝑡 = 100 and (b) 𝑡 = 300. These plots illustrate the predicted latent space dynamics
E[𝐙̃ (∗) ] along with a 95% confidence interval. Additionally, the plots display the ROM mean prediction (decoder output) and standard deviation, the ground
truth values, and the absolute error.


We have demonstrated the effectiveness of GPLaSDI through four numerical examples, showcasing its superiority over uniform
sampling baselines and its competitive performance compared to intrusive methods such as gLaSDI. GPLaSDI consistently achieved
maximum relative errors below 6–7%, with speed-ups ranging from several hundred to several tens of thousands of times.
Overall, GPLaSDI offers a powerful and efficient approach for capturing latent space dynamics, accurately interpolating ODEs,
and providing uncertainty quantification in the context of reduced-order modeling. Its ability to autonomously select training data
points and generate confidence intervals makes it a valuable tool for various scientific and engineering applications.
Currently, the number of training iterations and the sampling rate (and thus the number of points that will be sampled) are all
predetermined. In future work, an early-termination strategy triggered once satisfactory accuracy is reached could be designed. Additionally,
training the multiple GPs could become intractable for a large number of latent space variables (large 𝑁𝑧) and SINDy candidates
(large 𝑁𝑙), since the number of ODE coefficients grows as 𝒪(𝑁𝑧𝑁𝑙). In such cases, a parallel implementation of GP training
would likely be necessary.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared
to influence the work reported in this paper.

Data availability

The GitHub link to the code is provided in the manuscript.

Acknowledgments

This research was conducted at Lawrence Livermore National Laboratory and received support from the LDRD program under
project number 21-SI-006. Y. Choi also acknowledges support from the CHaRMNET Mathematical Multifaceted Integrated Capability
Center (MMICC). Lawrence Livermore National Laboratory is operated by Lawrence Livermore National Security, LLC, for the U.S.
Department of Energy, National Nuclear Security Administration under Contract DE-AC52-07NA27344 and LLNL-JRNL-852707.

References

[1] S. Raczynski, Modeling and simulation : The computer science of illusion / Stanislaw Raczynski, in: Modeling and Simulation : The Computer Science of
Illusion, in: RSP Series in Computer Simulation and Modeling, John Wiley & Sons, Ltd, Hertfordshire, England, 2006.
[2] D. Jones, C. Snider, A. Nassehi, J. Yon, B. Hicks, Characterising the digital twin: A systematic literature review, CIRP J. Manuf. Sci. Technol. 29 (2020)
36–52, http://dx.doi.org/10.1016/j.cirpj.2020.02.002, URL https://www.sciencedirect.com/science/article/pii/S1755581720300110.
[3] Review of digital twin about concepts, technologies, and industrial applications, J. Manuf. Syst. 58 (2020) 346–361, http://dx.doi.org/10.1016/J.JMSY.
2020.06.017.
[4] M. Calder, C. Craig, D. Culley, R. Cani, C. Donnelly, R. Douglas, B. Edmonds, J. Gascoigne, N. Gilbert, C. Hargrove, D. Hinds, D. Lane, D. Mitchell, G.
Pavey, D. Robertson, B. Rosewell, S. Sherwin, M. Walport, A. Wilson, Computational modelling for decision-making: Where, why, what, who and how, R.
Soc. Open Sci. 5 (2018) 172096, http://dx.doi.org/10.1098/rsos.172096.
[5] E. Winsberg, Computer Simulations in Science, in: E.N. Zalta, U. Nodelman (Eds.), The Stanford Encyclopedia of Philosophy, Winter 2022 ed., Metaphysics
Research Lab, Stanford University, 2022.
[6] R.M. Cummings, W.H. Mason, S.A. Morton, D.R. McDaniel, Applied Computational Aerodynamics: A Modern Engineering Approach, in: Cambridge Aerospace
Series, Cambridge University Press, 2015, http://dx.doi.org/10.1017/CBO9781107284166.
[7] D. Diston, Computational Modelling and Simulation of Aircraft and the Environment: Platform Kinematics and Synthetic Environment, first ed., in: Aerospace
Series, vol. 1, John Wiley & Sons Ltd, United Kingdom, 2009, http://dx.doi.org/10.1002/9780470744130.
[8] K. Kurec, M. Remer, J. Broniszewski, P. Bibik, S. Tudruj, J. Piechna, Advanced modeling and simulation of vehicle active aerodynamic safety, J. Adv.
Transp. 2019 (2019) 1–17, http://dx.doi.org/10.1155/2019/7308590.
[9] A. Muhammad, I.H. Shanono, Simulation of a car crash using ANSYS, in: 2019 15th International Conference on Electronics, Computer and Computation,
ICECCO, 2019, pp. 1–5, http://dx.doi.org/10.1109/ICECCO48375.2019.9043275.
[10] A. Peterson, S. Ray, R. Mittra, Computational Methods for Electromagnetics, Wiley, John & Sons, 1997.
[11] T. Rylander, P. Ingelström, A. Bondeson, Computational Electromagnetics, Springer, 2013.
[12] J. Thijssen, Computational Physics, second ed., Cambridge University Press, 2007, http://dx.doi.org/10.1017/CBO9781139171397.
[13] R. Schwartz, Biological Modeling and Simulation, MIT Press, 2008.
[14] L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, Y. Marzouk, L. Tenorio, B. van Bloemen Waanders, K. Willcox, Large-Scale Inverse
Problems and Quantification of Uncertainty, Wiley, 2010, http://dx.doi.org/10.1002/9780470685853, URL http://hdl.handle.net/10754/656260.
[15] R.C. Smith, Uncertainty quantification - theory, implementation, and applications, in: Computational Science and Engineering, 2013.
[16] R. Sternfels, C.J. Earls, Reduced-order model tracking and interpolation to solve PDE-based Bayesian inverse problems, Inverse Problems 29 (7) (2013)
075014, http://dx.doi.org/10.1088/0266-5611/29/7/075014.
[17] D. Galbally, K. Fidkowski, K. Willcox, O. Ghattas, Non-linear model reduction for uncertainty quantification in large-scale inverse problems, Internat.
J. Numer. Methods Engrg. 81 (12) (2010) 1581–1608, http://dx.doi.org/10.1002/nme.2746, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.
2746.
[18] V. Fountoulakis, C. Earls, Duct heights inferred from radar sea clutter using proper orthogonal bases, Radio Sci. 51 (10) (2016) 1614–1626, http:
//dx.doi.org/10.1002/2016RS005998, arXiv:https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1002/2016RS005998.
[19] S. Wang, E.d. Sturler, G.H. Paulino, Large-scale topology optimization using preconditioned Krylov subspace methods with recycling, Internat. J. Numer.
Methods Engrg. 69 (12) (2007) 2441–2468, http://dx.doi.org/10.1002/nme.1798, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.1798.
[20] D. White, Y. Choi, J. Kudo, A dual mesh method with adaptivity for stress-constrained topology optimization, Struct. Multidiscip. Optim. 61 (2020)
http://dx.doi.org/10.1007/s00158-019-02393-6.


[21] Y. Choi, C. Farhat, W. Murray, M. Saunders, A practical factorization of a Schur complement for PDE-constrained distributed optimal control, 2013,
http://dx.doi.org/10.48550/ARXIV.1312.5653, arXiv. URL https://arxiv.org/abs/1312.5653.
[22] G. Berkooz, P. Holmes, J.L. Lumley, The proper orthogonal decomposition in the analysis of turbulent flows, Annu. Rev. Fluid Mech. 25 (1) (1993)
539–575, http://dx.doi.org/10.1146/annurev.fl.25.010193.002543.
[23] G. Rozza, D. Huynh, A. Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential
equations, Arch. Comput. Methods Eng. 15 (2007) 1–47, http://dx.doi.org/10.1007/BF03024948.
[24] M.G. Safonov, R.Y. Chiang, A Schur method for balanced model reduction, in: 1988 American Control Conference, 1988, pp. 1036–1040.
[25] J.T. Lauzon, S.W. Cheung, Y. Shin, Y. Choi, D.M. Copeland, K. Huynh, S-OPT: A points selection algorithm for hyper-reduction in reduced order models,
2022, http://dx.doi.org/10.48550/ARXIV.2203.16494, arXiv. URL https://arxiv.org/abs/2203.16494.
[26] Y. Choi, D. Coombs, R. Anderson, SNS: A solution-based nonlinear subspace method for time-dependent model order reduction, SIAM J. Sci. Comput. 42
(2) (2020) A1116–A1146, http://dx.doi.org/10.1137/19M1242963.
[27] G. Stabile, G. Rozza, Finite volume POD-Galerkin stabilised reduced order methods for the parametrised incompressible Navier–Stokes equations, Comput.
& Fluids 173 (2018) 273–284, http://dx.doi.org/10.1016/j.compfluid.2018.01.035.
[28] T. Iliescu, Z. Wang, Variational multiscale proper orthogonal decomposition: Navier-stokes equations, Numer. Methods Partial Differential Equations 30
(2) (2014) 641–663, http://dx.doi.org/10.1002/num.21835, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/num.21835.
[29] D.M. Copeland, S.W. Cheung, K. Huynh, Y. Choi, Reduced order models for Lagrangian hydrodynamics, Comput. Methods Appl. Mech. Engrg. 388 (2022)
114259, http://dx.doi.org/10.1016/j.cma.2021.114259.
[30] S.W. Cheung, Y. Choi, D.M. Copeland, K. Huynh, Local Lagrangian reduced-order modeling for Rayleigh-Taylor instability by solution manifold
decomposition, 2022, http://dx.doi.org/10.48550/ARXIV.2201.07335, arXiv. URL https://arxiv.org/abs/2201.07335.
[31] B. McLaughlin, J. Peterson, M. Ye, Stabilized reduced order models for the advection–diffusion–reaction equation using operator splitting, Comput.
Math. Appl. 71 (11) (2016) 2407–2420, http://dx.doi.org/10.1016/j.camwa.2016.01.032, Proceedings of the conference on Advances in Scientific
Computing and Applied Mathematics. A special issue in honor of Max Gunzburger’s 70th birthday. URL https://www.sciencedirect.com/science/article/
pii/S0898122116300281.
[32] Y. Kim, K. Wang, Y. Choi, Efficient space–time reduced order model for linear dynamical systems in python using less than 120 lines of code, Mathematics
9 (14) (2021) http://dx.doi.org/10.3390/math9141690, URL https://www.mdpi.com/2227-7390/9/14/1690.
[33] Y. Choi, G. Oxberry, D. White, T. Kirchdoerfer, Accelerating design optimization using reduced order models, 2019, http://dx.doi.org/10.48550/ARXIV.
1909.11320, arXiv. URL https://arxiv.org/abs/1909.11320.
[34] S. McBane, Y. Choi, Component-wise reduced order model lattice-type structure design, Comput. Methods Appl. Mech. Engrg. 381 (2021) 113813,
http://dx.doi.org/10.1016/j.cma.2021.113813.
[35] Y. Kim, Y. Choi, D. Widemann, T. Zohdi, A fast and accurate physics-informed neural network reduced order model with shallow masked autoencoder,
2020, http://dx.doi.org/10.48550/ARXIV.2009.11990, arXiv. URL https://arxiv.org/abs/2009.11990.
[36] Y. Kim, Y. Choi, D. Widemann, T. Zohdi, Efficient nonlinear manifold reduced order model, 2020, http://dx.doi.org/10.48550/ARXIV.2011.07727, arXiv.
URL https://arxiv.org/abs/2011.07727.
[37] K. Lee, K.T. Carlberg, Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders, J. Comput. Phys. 404 (2020)
108973, http://dx.doi.org/10.1016/j.jcp.2019.108973, URL https://www.sciencedirect.com/science/article/pii/S0021999119306783.
[38] A.N. Diaz, Y. Choi, M. Heinkenschloss, A fast and accurate domain-decomposition nonlinear manifold reduced order model, 2023, arXiv:2305.15163.
[39] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507, http://dx.doi.org/10.
1126/science.1127647, arXiv:https://www.science.org/doi/pdf/10.1126/science.1127647.
[40] D. DeMers, G. Cottrell, Non-linear dimensionality reduction, in: S. Hanson, J. Cowan, C. Giles (Eds.), Advances in Neural Information Processing Systems.
Vol. 5, Morgan-Kaufmann, 1992, URL https://proceedings.neurips.cc/paper/1992/file/cdc0d6e63aa8e41c89689f54970bb35f-Paper.pdf.
[41] W.D. Fries, X. He, Y. Choi, LaSDI: Parametric latent space dynamics identification, Comput. Methods Appl. Mech. Engrg. 399 (2022) 115436, http:
//dx.doi.org/10.1016/j.cma.2022.115436.
[42] X. He, Y. Choi, W.D. Fries, J. Belof, J.-S. Chen, gLaSDI: Parametric physics-informed greedy latent space dynamics identification, J. Comput. Phys. 489
(2023) 112267, http://dx.doi.org/10.1016/j.jcp.2023.112267, URL https://www.sciencedirect.com/science/article/abs/pii/S0021999123003625.
[43] S. McBane, Y. Choi, K. Willcox, Stress-constrained topology optimization of lattice-like structures using component-wise reduced order models, Comput.
Methods Appl. Mech. Engrg. 400 (2022) 115525.
[44] G. Tapia, S.A. Khairallah, M.J. Matthews, W.E. King, A. Elwany, Gaussian process-based surrogate modeling framework for process planning in laser
powder-bed fusion additive manufacturing of 316L stainless steel, Int. J. Adv. Manuf. Technol. 94 (9–12) (2017) http://dx.doi.org/10.1007/s00170-017-
1045-z.
[45] D. Marjavaara, CFD driven optimization of hydraulic turbine draft tubes using surrogate models, 2006.
[46] K. Cheng, R. Zimmermann, Sliced gradient-enhanced Kriging for high-dimensional function approximation and aerodynamic modeling, 2022.
[47] J.N. Kutz, Deep learning in fluid dynamics, J. Fluid Mech. 814 (2017) 1–4, http://dx.doi.org/10.1017/jfm.2016.803.
[48] J.R. Koza, Genetic programming as a means for programming computers by natural selection, Stat. Comput. 4 (2) (1994) 87–112, http://dx.doi.org/10.
1007/BF00175355.
[49] M. Schmidt, H. Lipson, Distilling free-form natural laws from experimental data, Science 324 (5923) (2009) 81–85, http://dx.doi.org/10.1126/science.
1165893, arXiv:https://www.science.org/doi/pdf/10.1126/science.1165893.
[50] S.L. Brunton, J.L. Proctor, J.N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad.
Sci. 113 (15) (2016) 3932–3937, http://dx.doi.org/10.1073/pnas.1517384113, arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.1517384113.
[51] S.H. Rudy, S.L. Brunton, J.L. Proctor, J.N. Kutz, Data-driven discovery of partial differential equations, Sci. Adv. 3 (4) (2017) e1602614, http:
//dx.doi.org/10.1126/sciadv.1602614, arXiv:https://www.science.org/doi/pdf/10.1126/sciadv.1602614.
[52] L.M. Gao, J.N. Kutz, Bayesian autoencoders for data-driven discovery of coordinates, governing equations and fundamental constants, 2022, http:
//dx.doi.org/10.48550/ARXIV.2211.10575, arXiv. URL https://arxiv.org/abs/2211.10575.
[53] K. Owens, J.N. Kutz, Data-driven discovery of governing equations for coarse-grained heterogeneous network dynamics, 2022, http://dx.doi.org/10.48550/
ARXIV.2205.10965, arXiv. URL https://arxiv.org/abs/2205.10965.
[54] S.M. Hirsh, D.A. Barajas-Solano, J.N. Kutz, Sparsifying priors for Bayesian uncertainty quantification in model discovery, R. Soc. Open Sci. 9 (2) (2022)
211823, http://dx.doi.org/10.1098/rsos.211823, arXiv:https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.211823.
[55] D.A. Messenger, D.M. Bortz, Weak SINDy for partial differential equations, J. Comput. Phys. 443 (2021) 110525, http://dx.doi.org/10.1016/j.jcp.2021.
110525.
[56] Z. Chen, Y. Liu, H. Sun, Physics-informed learning of governing equations from scarce data, Nature Commun. 12 (1) (2021) http://dx.doi.org/10.1038/
s41467-021-26434-1.
[57] C. Bonneville, C. Earls, Bayesian deep learning for partial differential equation parameter discovery with sparse and noisy data, J. Comput. Phys.: X 16
(2022) 100115, http://dx.doi.org/10.1016/j.jcpx.2022.100115, URL https://www.sciencedirect.com/science/article/pii/S2590055222000117.
[58] R. Stephany, C. Earls, PDE-READ: Human-readable partial differential equation discovery using deep learning, Neural Netw. 154 (2022) 360–382,
http://dx.doi.org/10.1016/j.neunet.2022.07.008, URL https://www.sciencedirect.com/science/article/pii/S0893608022002660.


[59] R. Stephany, C. Earls, PDE-LEARN: Using deep learning to discover partial differential equations from noisy, limited data, 2022, http://dx.doi.org/10.
48550/ARXIV.2212.04971, arXiv. URL https://arxiv.org/abs/2212.04971.
[60] K. Champion, B. Lusch, J.N. Kutz, S.L. Brunton, Data-driven discovery of coordinates and governing equations, Proc. Natl. Acad. Sci. 116 (45) (2019)
22445–22451, http://dx.doi.org/10.1073/pnas.1906995116, arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.1906995116.
[61] Z. Bai, L. Peng, Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators, 2021, http://dx.doi.org/10.
48550/ARXIV.2106.09658, arXiv. URL https://arxiv.org/abs/2106.09658.
[62] C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning., in: Adaptive computation and machine learning, MIT Press, 2006, pp. I–XVIII,
1–248.
[63] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org.
[64] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, http://dx.doi.org/10.48550/ARXIV.1412.6980, arXiv. URL https://arxiv.org/abs/
1412.6980.
[65] J.N. Fuhg, M. Marino, N. Bouklas, Local approximate Gaussian process regression for data-driven constitutive models: Development and comparison with
neural networks, Comput. Methods Appl. Mech. Engrg. 388 (2022) 114217, http://dx.doi.org/10.1016/j.cma.2021.114217, URL https://www.sciencedirect.
com/science/article/pii/S004578252100548X.
[66] A.G. Wilson, H. Nickisch, Kernel interpolation for scalable structured Gaussian processes (KISS-GP), 2015, arXiv:1503.01057.
[67] A.G. Wilson, C. Dann, H. Nickisch, Thoughts on massively scalable Gaussian processes, 2015, arXiv:1511.01870.
[68] A. Muyskens, B. Priest, I. Goumiri, M. Schneider, MuyGPs: Scalable Gaussian process hyperparameter estimation using local cross-validation, 2021,
arXiv:2104.14581.
[69] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), first ed., Springer, 2007, URL http://www.amazon.
com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738%3FSubscriptionId%3D13CT5CVB80YFWJEPWS02%26tag%3Dws%26linkCode%
3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0387310738.
[70] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito,
M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in:
Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035, URL http://papers.neurips.cc/paper/9015-pytorch-
an-imperative-style-high-performance-deep-learning-library.pdf.
[71] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine
learning in Python, J. Mach. Learn. Res. 12 (Oct) (2011) 2825–2830.
[72] HyPar Repository, https://bitbucket.org/deboghosh/hypar.
[73] G.-S. Jiang, C.-W. Shu, Efficient implementation of weighted ENO schemes, J. Comput. Phys. 126 (1) (1996) 202–228, http://dx.doi.org/10.1006/jcph.
1996.0130.
[74] R.M. Neal, Bayesian learning for neural networks, 1995, URL https://api.semanticscholar.org/CorpusID:60809283.
[75] D. Ghosh, E.M. Constantinescu, Well-balanced, conservative finite-difference algorithm for atmospheric flows, AIAA J. 54 (4) (2016) 1370–1385,
http://dx.doi.org/10.2514/1.J054580.
