pected to learn the dynamics Eq. (1) and, consequently, to approximate u(x, t) in (Ttrain, T] accurately if trained properly. However, in our preliminary study with a one-dimensional viscous Burgers' equation, shown in Fig. 1, we observe that the accuracy of the approximate solution produced by PINN in the extrapolation setting is significantly degraded compared to that produced in the interpolation setting.

Figure 1: 1D viscous Burgers' equation examples ((a) Interpolation, (b) Extrapolation). We train the PINN model (Raissi, Perdikaris, and Karniadakis 2019) with Ttrain = 0.5 and report two solution snapshots of the reference solution (solid blue line) and the approximated solution (dashed red line) obtained by PINN at t = 0.03 (i.e., interpolation) and t = 0.98 (i.e., extrapolation).

Motivated by this observation, we analyze PINN in detail (Section 2), propose our method to improve the approximation accuracy in extrapolation (Section 3), and demonstrate the effectiveness of the proposed method on various benchmark problems (Section 4). On all benchmark problems, our proposed methods, denoted by PINN-D1 and PINN-D2, show the best accuracy under various evaluation metrics. In comparison with state-of-the-art methods, the errors of our proposed methods are up to 72% smaller.

2 Related Work and Preliminaries

We now formally introduce PINN. Essentially, PINN parameterizes both the solution u and the governing equation f. Let us denote a neural network approximation of the solution u(x, t) by ũ(x, t; Θ), where Θ denotes a set of network parameters. The governing equation f is then approximated by a neural network f̃(x, t, ũ; Θ) := ũ_t + N(ũ(x, t; Θ)), where the partial derivatives are obtained via automatic differentiation (or, more specifically, a back-propagation algorithm (Rumelhart, Hinton, and Williams 1986)). That is, the neural network f̃(x, t, ũ; Θ) shares the same network weights with ũ(x, t; Θ), but enforces physical laws by applying an extra problem-specific nonlinear activation defined by the PDE in Eq. (1) (i.e., ũ_t + N(ũ)), which leads to the name "physics-informed" neural network.¹

¹ We also note that there are other studies (e.g., Cranmer et al. 2020; Greydanus, Dzamba, and Yosinski 2019) using the idea of parameterizing the governing equations, where derivatives are also computed using automatic differentiation.

This construction suggests that these shared network weights can be learned by forming a loss function consisting of two terms, each of which is associated with the approximation errors in ũ and f̃, respectively. In the original formulation, a loss function consisting of two error terms is considered:

    L := α L_u + β L_f,    (2)

where α, β ∈ R are coefficients and L_u, L_f are defined below:

    L_u = (1 / N_u) Σ_{i=1}^{N_u} |u(x_u^i, t_u^i) − ũ(x_u^i, t_u^i; Θ)|²,    (3)

    L_f = (1 / N_f) Σ_{i=1}^{N_f} |f̃(x_f^i, t_f^i, ũ; Θ)|².    (4)

The first loss term, L_u, enforces initial and boundary conditions using a set of training data {((x_u^i, t_u^i), u(x_u^i, t_u^i))}_{i=1}^{N_u}, where the first element of each tuple is the input to the neural network ũ and the second element is the ground truth that the output of ũ attempts to match. These data can be easily collected from the specified initial and boundary conditions, which are known a priori (e.g., u(x, 0) = u_0(x) = −sin(πx) in a PDE we use for our experiments). The second loss term, L_f, minimizes the discrepancy between the governing equation f and the neural network approximation f̃ evaluated at collocation points, which form another training dataset {((x_f^i, t_f^i), f(x_f^i, t_f^i))}_{i=1}^{N_f}, where the ground truth {f(x_f^i, t_f^i)}_{i=1}^{N_f} consists of all zeros.

The advantages of this loss construction are that i) no costly evaluations of the solution u(x, t) at collocation points are required to collect training data, ii) initial and boundary conditions are enforced by the first loss term L_u, whose training dataset can be easily generated, and iii) the physical law described by the governing equation f in Eq. (1) can be enforced by minimizing the second loss term L_f. In (Raissi, Perdikaris, and Karniadakis 2019), both loss terms are considered equally important (i.e., α = β = 1), and the combined loss L is minimized.
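To make the loss construction above concrete, the following minimal sketch assembles ũ, f̃, and the two loss terms of Eqs. (2)–(4) for a viscous Burgers' benchmark, assuming the usual form ũ_t + ũ ũ_x − (0.01/π) ũ_xx = 0. The paper's implementation uses TensorFlow 1.14; this sketch uses PyTorch autograd for brevity, and all names are illustrative rather than the authors' code.

```python
import torch

class UTilde(torch.nn.Module):
    """Fully-connected tanh network approximating u~(x, t; Theta), as in the original PINN."""
    def __init__(self, width=20, depth=8):
        super().__init__()
        layers, dim_in = [], 2
        for _ in range(depth):
            layers += [torch.nn.Linear(dim_in, width), torch.nn.Tanh()]
            dim_in = width
        layers.append(torch.nn.Linear(width, 1))
        self.net = torch.nn.Sequential(*layers)

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

def f_tilde(model, x, t, nu=0.01 / torch.pi):
    """f~ = u~_t + N(u~), here with N(u~) = u~ u~_x - nu u~_xx (viscous Burgers', assumed viscosity)."""
    x = x.clone().detach().requires_grad_(True)
    t = t.clone().detach().requires_grad_(True)
    u = model(x, t)
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t + u * u_x - nu * u_xx

def pinn_losses(model, x_u, t_u, u_true, x_f, t_f):
    """L_u (Eq. 3) on initial/boundary data and L_f (Eq. 4) on collocation points."""
    loss_u = torch.mean((u_true - model(x_u, t_u)) ** 2)
    loss_f = torch.mean(f_tilde(model, x_f, t_f) ** 2)
    return loss_u, loss_f  # the combined loss of Eq. (2) is alpha*loss_u + beta*loss_f
```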
Motivations. If PINN can correctly learn a governing equation, its extrapolation should be as good as its interpolation. Successful extrapolation would enable the adoption of PINN in many PDE applications. With the loss formulation in Eq. (2), however, we empirically found that it is challenging to train PINN for extrapolation, as shown in Fig. 1.

Hence, we first investigate the training loss curves of L_u and L_f separately: Fig. 2 depicts the loss curves of L_u and L_f for a PINN trained on a 1D inviscid Burgers' equation. The figure shows that L_u converges very fast, whereas L_f starts to fluctuate after a certain epoch and does not decrease below a certain value. From this observation, we conclude that the initial and boundary conditions are successfully enforced, whereas the dynamics of the physical process may not be accurately enforced, which, consequently, could lead to significantly less accurate approximations in extrapolation, e.g., Fig. 1(b). Motivated by this observation, we propose a novel training method for PINN in the following section. In the experiments section, we demonstrate the performance of the proposed training method.
Figure 2: Example training curves of L_u and L_f of PINN for a 1D inviscid Burgers' equation in (a) and (b), respectively, and an example of updating Θ in (c).

3 Dynamic Pulling Method (DPM)

The issue with training PINN, which we have identified in our preliminary experiments, is that L_f fluctuates and does not decrease. To resolve this issue, we propose a novel training method that imposes a soft constraint L_f ≤ ε, where ε is a hyperparameter that can be set to an arbitrarily small value to ensure an accurate approximation of the governing equation, i.e., enforcing f̃(·) to be close to zero. The proposed training concept is to dynamically manipulate the gradients.

We dynamically manipulate the gradients of the loss terms on top of a gradient-based optimizer, including but not limited to the gradient descent method, i.e., Θ^(k+1) = Θ^(k) − γ g^(k), where γ is a learning rate and g^(k) is the gradient at the k-th epoch. We set the gradient g^(k) to one of the following vectors depending on the conditions:

    g^(k) =  g_{Lu}^(k),      if L_f ≤ ε,
             g_{L}^(k),       if L_f > ε ∧ g_{Lu}^(k) · g_{Lf}^(k) ≥ 0,    (5)
             v + g_{L}^(k),   otherwise,

where v ∈ R^{dim(Θ)} is a manipulation vector, which we will show how to calculate shortly; g_{Lu}^(k), g_{Lf}^(k), and g_{L}^(k) denote the gradients of L_u, L_f, and L, respectively.

Here, we care only about g_{Lu}^(k) when L_f is small enough, i.e., L_f ≤ ε, because L_f already satisfies the constraint. There are two possible cases when L_f > ε: i) g_{Lu}^(k) · g_{Lf}^(k) ≥ 0 and ii) g_{Lu}^(k) · g_{Lf}^(k) < 0. In the former case, where the two gradient terms g_{Lu}^(k) and g_{Lf}^(k) have the same direction (i.e., the angle between them is less than 90° and hence their dot product is positive), performing a gradient descent update with g_{L}^(k) guarantees a decrease in L_f. In Fig. 2(c), for instance, both L_f and L_u decrease if Θ^(k) is updated into the gray area.

When L_f > ε and g_{Lu}^(k) · g_{Lf}^(k) < 0, however, v carefully manipulates the gradient in such a way that L_f is guaranteed to decrease after a gradient descent update.

We now seek a solution v that results in (v + g_{L}^(k)) · g_{Lf}^(k) > 0 given g_{L}^(k) and g_{Lf}^(k). Because the dot product is distributive, it satisfies the following condition:

    (v + g_{L}^(k)) · g_{Lf}^(k) = v · g_{Lf}^(k) + g_{L}^(k) · g_{Lf}^(k) > 0.    (6)

To satisfy Eq. (6), we therefore require

    v · g_{Lf}^(k) = −g_{L}^(k) · g_{Lf}^(k) + δ,    (7)

where δ > 0 is to control how much we pull Θ^(k) toward the region where L_f decreases, e.g., the gray region of Fig. 2(c).

We note that Eq. (7) has many possible solutions. Among them, one solution, denoted v* = ((−g_{L}^(k) · g_{Lf}^(k) + δ) / ‖g_{Lf}^(k)‖₂²) g_{Lf}^(k), can be computed by using the pseudoinverse of g_{Lf}^(k), which is widely used to find such solutions, e.g., the analytic solution of linear least-squares problems arising in linear regression.

A good characteristic of the pseudoinverse is that it minimizes ‖v‖₂² (Ben-Israel and Greville 2006). By minimizing ‖v‖₂², we disturb the original updating process as little as possible. Therefore, we use the pseudoinverse-based solution in our method.

Despite its advantage, the gradient manipulation vector v* sometimes requires many iterations until L_f ≤ ε. To expedite the pulling procedure, we also dynamically control the additive pulling term δ as follows:

    Δ^(k) = L_f(Θ^(k)) − ε,    (8)

    δ^(k+1) =  w δ^(k),     if Δ^(k) > 0,
               δ^(k) / w,   if Δ^(k) ≤ 0,    (9)

where w > 1 is an inflation factor for increasing δ.
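A minimal sketch of one DPM update, in the same illustrative PyTorch style as before (the flattening helpers, names, and optimizer handling are our assumptions, not the authors' code): it selects the gradient according to Eq. (5), builds the pseudoinverse-based vector v* when needed, and adjusts δ following Eqs. (8)–(9).

```python
import torch

def flat_grad(loss, params):
    """Gradient of a scalar loss w.r.t. all parameters, flattened into a single vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def dpm_step(model, optimizer, loss_u, loss_f, eps, delta, w, alpha=1.0, beta=1.0):
    """One DPM update: gradient selection of Eq. (5), pseudoinverse pulling v*, and
    dynamic control of delta following Eqs. (8)-(9). Returns the updated delta."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_u = flat_grad(loss_u, params)
    g_f = flat_grad(loss_f, params)
    g_L = alpha * g_u + beta * g_f              # gradient of L = alpha*L_u + beta*L_f

    if loss_f.item() <= eps:                    # case 1: constraint already satisfied
        g = g_u
    elif torch.dot(g_u, g_f) >= 0:              # case 2: gradients point in the same direction
        g = g_L
    else:                                       # case 3: pull toward the region where L_f decreases
        v = (-torch.dot(g_L, g_f) + delta) / g_f.pow(2).sum() * g_f   # pseudoinverse solution v*
        g = v + g_L

    # write the manipulated gradient back into .grad and let the optimizer take the step
    optimizer.zero_grad()
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = g[offset:offset + n].view_as(p).clone()
        offset += n
    optimizer.step()

    # Eqs. (8)-(9): inflate delta while the constraint is violated, deflate otherwise
    Delta = loss_f.item() - eps
    return delta * w if Delta > 0 else delta / w
```

In a training loop, loss_u and loss_f would be recomputed every epoch (e.g., with the pinn_losses sketch above) and the returned delta fed back into the next call.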
4 Experiments

We describe our experimental environments and results with four benchmark time-dependent nonlinear PDEs and several different neural network designs. Our software and hardware environments are as follows: Ubuntu 18.04 LTS, Python 3.6.6, NumPy 1.18.5, SciPy 1.5, Matplotlib 3.3.1, TensorFlow-GPU 1.14, CUDA 10.0, NVIDIA Driver 417.22, an i9 CPU, and an NVIDIA RTX Titan.

4.1 Experimental Environments

PDEs. We consider the viscous and inviscid Burgers' equations, the nonlinear Schrödinger (NLS) equation, and the Allen–Cahn (AC) equation. We refer readers to the full version (Kim et al. 2020) for detailed descriptions of these equations.

For training/validating/testing, we divide the entire time domain [0, T] into three segments: [0, Ttrain], (Ttrain, Tval], and (Tval, Ttest], where T = Ttest > Tval > Ttrain > 0. In other words, our task is to predict the solution functions of the PDEs in a future time frame, i.e., extrapolation. We use Ttrain = T/2, Tval = 4T/5, and Ttest = T, i.e., extrapolating for the last 20% of the time domain, which is a widely used setting in many time-series prediction studies (Kim 2003; Kang et al. 2016).
                              PINN     PINN-R   PINN-D1  PINN-D2
L2-norm (↓)
  Vis. Burgers                0.329    0.333    0.106    0.092
  Inv. Burgers                0.131    0.095    0.083    0.090
  Allen–Cahn                  0.350    0.286    0.246    0.182
  Schrödinger                 0.239    0.212    0.314    0.141
Explained variance score (↑)
  Vis. Burgers                0.891    0.901    0.988    0.991
  Inv. Burgers                0.214    0.468    0.485    0.621
  Allen–Cahn                  0.090    0.919    0.939    0.967
  Schrödinger                -4.364   -3.902   -4.973   -3.257
Max error (↓)
  Vis. Burgers                0.657    1.081    0.545    0.333
  Inv. Burgers                3.088    2.589    1.534    2.036
  Allen–Cahn                  1.190    1.631    1.096    0.836
  Schrödinger                 4.656    4.222    4.945    3.829
Mean absolute error (↓)
  Vis. Burgers                0.085    0.108    0.026    0.021
  Inv. Burgers                0.431    0.299    0.277    0.315
  Allen–Cahn                  0.212    0.142    0.129    0.094
  Schrödinger                 0.954    0.894    0.868    0.896

Table 1: The extrapolation accuracy in terms of the relative error in the L2-norm, the explained variance score, the max error, and the mean absolute error on various PDEs. Large (resp. small) values are preferred for ↑ (resp. ↓).
Baselines. Our task definition is not to simply approximate a solution function u with a regression model but to let a neural network learn the physical dynamics without costly collections of training samples (see our broader impact statement to learn why it is important to train without costly collections of training samples). For this task, the state-of-the-art method is PINN. We compare our method with the following baselines: i) the original PINN, which uses a series of fully-connected and hyperbolic tangent layers, denoted by PINN, and ii) PINN improved with residual connections (He et al. 2016), denoted by PINN-R. We apply our DPM with (resp. without) controlling δ in Eq. (9) to train PINN-R, denoted by PINN-D2 (resp. PINN-D1).
PINN-D2 (resp. PINN-D1). 4.2 Experimental Results
Table 1 summarizes the overall performance for all bench-
Evaluation Metrics. For performance evaluation, we col- mark PDEs obtained by PINN, PINN-R, PINN-D1, and
lect predicted solutions at testing data instances to construct a PINN-D2. PINN-R shows smaller L2-norm errors than PINN.
solution vector ũ = [ũ(x1test , t1test ; Θ), ũ(x2test , t2test ; Θ), . . .]> , The proposed PINN-D2 significantly outperforms PINN and
where {(xitest , titest )} is a set of testing samples. xitest is sam- PINN-R in all four benchmark problems for all metrics. For
pled at a uniform spatial mesh grid in Ω and titest is on a uni- the viscous Burgers’ equation and the AC equation, PINN-D2
form temporal grid in (Tval , Ttest ]. See the full version (Kim demonstrates 72% and 48% (resp. 72% and 36%) improve-
et al. 2020) for how to build testing sets. For the comparison, ments over PINN (resp. PINN-R) in terms of the relative
we also collect the reference solution vector, denoted u, at the L2-norm, respectively.
same testing data instances by solving the same PDEs using
traditional numerical solvers. As evaluation metrics, we use Viscous Burgers’ equation. Fig. 3 shows the reference
the standard relative errors in L2-norm, i.e., kũ − uk2 /kuk2 , solution and predictions made by PINN and the PINN vari-
the explained variance score, the max error, and the mean ants of the viscous Burgers’ equation. In Figs. 3(b)–3(c),
absolute error, each of which shows a different aspect of per- both PINN and PINN-R fail to correctly learn the govern-
formance. Moreover, we report snapshots of the reference ing equation and their prediction accuracy is significantly
and approximate solutions at certain time indices. degraded as t increases. However, the proposed PINN-D2
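For reference, these four metrics can be computed from the stacked prediction vector ũ and the reference vector u with a few lines of NumPy; the helper below is hypothetical (equivalent routines also exist in scikit-learn).

```python
import numpy as np

def extrapolation_metrics(u_pred, u_ref):
    """Relative L2-norm error, explained variance score, max error, and mean absolute
    error between a predicted solution vector and the reference solution vector."""
    err = u_pred - u_ref
    return {
        "rel_l2": np.linalg.norm(err) / np.linalg.norm(u_ref),
        "explained_variance": 1.0 - np.var(err) / np.var(u_ref),
        "max_error": np.max(np.abs(err)),
        "mean_abs_error": np.mean(np.abs(err)),
    }
```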
Hyperparameters. For all methods, we test with the following hyperparameter configurations: the number of layers is {2, 3, 4, 5, 6, 7, 8} and the dimensionality of the hidden vectors is {20, 40, 50, 100, 150}. For PINN and PINN-R, we use α = {1, 10, 100, 1000} and β = {1, 10, 100, 1000}; we do not test the condition α = β, except for α = β = 1. Our DPM uses α = β = 1. The learning rate is {1e-3, 5e-3, 1e-4, 5e-5} with various standard optimizers such as Adam, SGD, etc. For the proposed DPM, we test with ε = {0.001, 0.005, 0.01, 0.0125}, δ = {0.01, 0.1, 1, 10}, and w = {1.001, 1.005, 1.01, 1.025}. We also use early stopping with the validation error as the criterion: if there is no improvement in the validation loss larger than 1e-5 for the past 50 epochs, we stop the training process. We choose the model that performs best on the validation set.

Train & Test Set Creation. To build the testing sets, x_test^i is sampled on a uniform spatial mesh grid in Ω and t_test^i is on a uniform temporal grid in (Tval, Ttest]. We use a temporal step size of 0.01, 0.0175, 0.01π/2, and 0.005 for the viscous Burgers' equation, the inviscid Burgers' equation, the NLS equation, and the AC equation, respectively. We divide Ω into a grid of 256, 512, 256, and 256 points for the viscous Burgers' equation, the inviscid Burgers' equation, the NLS equation, and the AC equation, respectively.

For creating our training sets, we use Nu = 100 initial and boundary tuples for all the benchmark equations. For Nf, we use 10K collocation points for the viscous and inviscid Burgers' equations, and 20K for the NLS equation and the AC equation.
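A small sketch of how such a test grid can be assembled for the viscous Burgers' case, assuming the usual spatial domain Ω = [−1, 1] and T = 1 (so Tval = 0.8); the step size and grid size are the ones quoted above.

```python
import numpy as np

# 256 uniformly spaced spatial points in Omega = [-1, 1] (viscous Burgers' case, assumed domain)
x_grid = np.linspace(-1.0, 1.0, 256)

# uniform temporal test grid in (T_val, T_test] with step size 0.01 (T = 1, T_val = 4T/5)
T_val, T_test, dt = 0.8, 1.0, 0.01
t_grid = np.arange(T_val + dt, T_test + 1e-9, dt)

# every (x, t) pair in the extrapolation window becomes one test instance
X, T = np.meshgrid(x_grid, t_grid, indexing="ij")
x_test, t_test = X.reshape(-1, 1), T.reshape(-1, 1)
```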
4.2 Experimental Results

Table 1 summarizes the overall performance on all benchmark PDEs obtained by PINN, PINN-R, PINN-D1, and PINN-D2. PINN-R shows smaller L2-norm errors than PINN. The proposed PINN-D2 significantly outperforms PINN and PINN-R in all four benchmark problems for all metrics. For the viscous Burgers' equation and the AC equation, PINN-D2 demonstrates 72% and 48% (resp. 72% and 36%) improvements over PINN (resp. PINN-R) in terms of the relative L2-norm, respectively.

Viscous Burgers' equation. Fig. 3 shows the reference solution and the predictions made by PINN and the PINN variants for the viscous Burgers' equation. In Figs. 3(b)–3(c), both PINN and PINN-R fail to correctly learn the governing equation and their prediction accuracy is significantly degraded as t increases. However, the proposed PINN-D2 shows much more accurate predictions even when t is close to the end of the time domain. These results indicate that learning a governing equation correctly helps accurate extrapolation. Although PINN and PINN-R are able to learn the initial and boundary conditions accurately, their extrapolation performance is poor because they fail to learn the governing equation accurately. Figs. 3(e)–3(j) report solution snapshots at t = {0.83, 0.98}, and we observe that the proposed PINN-D2 outperforms the other two PINN methods. Only PINN-D2 accurately enforces the prediction around x = 0 in Fig. 3(j). PINN-D1 is comparable to PINN-D2 on this equation according to Table 1.

Inviscid Burgers' equation. In this benchmark problem, we consider the inviscid Burgers' equation posed on a very long time domain [0, 35], which is much larger than those of the other benchmark problems and could make the extrapolation task even more challenging. Fig. 4 reports the results obtained by the PINN variants along with the reference solution.
All three methods, PINN-R, PINN-D1, and PINN-D2, are comparable in this benchmark problem. However, we can still observe that PINN-D2 produces slightly more accurate predictions than the other methods at x = 0, the boundary condition. The first condition of Eq. (5) accounts for this result: when Lf is sufficiently small, the update performed by DPM further decreases Lu to improve the predictions at the initial and boundary conditions.

Figure 3: Top two rows: the complete reference solution and predictions of the benchmark viscous Burgers' equation ((a) Reference Solution, (b) PINN, ...). The points marked with × indicate initial or boundary points. Bottom: the solution snapshots at t = {0.83, 0.98} obtained via extrapolation. In Fig. 3(a), the black vertical lines correspond to Ttrain and Tval, respectively, and in Figs. 3(b)–3(d), the white vertical lines correspond to the time indices at which we extract solution snapshots. We refer readers to the full version (Kim et al. 2020) for more snapshots. The meanings of the vertical lines remain the same in the following figures.

Figure 4: Top two rows: the complete reference solution and predictions of the benchmark inviscid Burgers' equation ((a) Reference Solution, (b) PINN, ...). The points marked with × indicate initial or boundary points. Bottom: the solution snapshots at t = {28.0875, 34.9125} obtained via extrapolation.

Allen–Cahn equation (AC). Fig. 5 reports the reference solutions of the AC equation and the predictions made by all the considered PINN variants. The solution snapshots shown in Figs. 5(e)–5(j) demonstrate that the proposed PINN-D2 produces the most accurate approximations to the reference solutions. In particular, the approximate solutions obtained by PINN-D2 match the reference solutions very closely, with the exception of the valley (around x = 0), where all three methods struggle to make accurate predictions. Moreover, the approximate solutions of PINN-D2 are almost symmetric w.r.t. x = 0, whereas the approximate solutions of the other two methods are significantly non-symmetric and their accuracy becomes even more degraded as t increases.

Nonlinear Schrödinger equation (NLS). Fig. 6 reports the reference solution of the NLS equation and the predictions made by all the considered PINN variants. Because the solution of the NLS equation is complex-valued, the magnitudes of the reference solution |u(x, t)| and the predictions |ũ(x, t)| are depicted. The solution snapshots produced by PINN and PINN-R exhibit errors around x = −1 and x = 1, whereas PINN-D2 is accurate around that region. In particular, the predictions made by PINN and PINN-R exhibit shapes that are very similar to the previous time steps' solution snapshots, which indicates that the dynamics of the system are not learned accurately. In contrast, PINN-D2 seems to enforce the dynamics much better and produces more accurate predictions.
Figure 5: Top two rows: the complete reference solution and predictions of the Allen–Cahn equation ((a) Reference Solution, (b) PINN, ...). The points marked with × indicate initial or boundary points. Bottom ((e) PINN, (f) PINN-R, (g) PINN-D2; (h) PINN, (i) PINN-R, (j) PINN-D2): the extrapolation solution snapshots at t = {0.815, 0.995}.

Figure 6: Top two rows: the complete reference solution and predictions of the nonlinear Schrödinger equation ((a) Reference Solution, (b) PINN, ...). The points marked with × indicate initial or boundary points. Bottom ((e) PINN, (f) PINN-R, (g) PINN-D2; (h) PINN, (i) PINN-R, (j) PINN-D2): the extrapolation solution snapshots at t = {1.2802, 1.5551}.

4.3 Ablation Study

To show the efficacy of controlling δ in Eq. (9), we compare PINN-D1 and PINN-D2. In Table 1, PINN-D2 outperforms PINN-D1 on three benchmark equations. The biggest improvement is made on the NLS equation, one of the most difficult equations to predict, i.e., 0.314 vs. 0.141 in the L2-norm metric. We note that without controlling δ, PINN-D1 shows worse predictions even than PINN and PINN-R on this equation.

4.4 Visualization of Training Process

Fig. 7 shows the curves of Lu and Lf with our method on the benchmark viscous Burgers' equation. For Lf, we set ε = 0.001, δ = 0.01, and w = 1.01, which produces the best extrapolation accuracy. With this setting, DPM immediately pulls Lf toward the threshold ε = 0.001 as soon as Lf > 0.001. Because our method uses the smallest manipulation vector, v*, Lu is also trained properly as training goes on.
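To make the reported setting concrete, a compact training loop that combines the pinn_losses and dpm_step sketches from earlier sections might look as follows; the toy data and the validation stand-in are ours, not the authors' pipeline.

```python
import torch

torch.manual_seed(0)
model = UTilde()                            # or UTildeResidual() for the PINN-R backbone
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
eps, delta, w = 0.001, 0.01, 1.01           # the setting reported above for viscous Burgers'

# toy data: initial condition u(x, 0) = -sin(pi x) and random collocation points in [0, T_train]
x_u = torch.rand(100, 1) * 2 - 1
t_u = torch.zeros(100, 1)
u_true = -torch.sin(torch.pi * x_u)
x_f = torch.rand(10_000, 1) * 2 - 1
t_f = torch.rand(10_000, 1) * 0.5

best_val, patience = float("inf"), 0
for epoch in range(100_000):
    loss_u, loss_f = pinn_losses(model, x_u, t_u, u_true, x_f, t_f)
    delta = dpm_step(model, optimizer, loss_u, loss_f, eps, delta, w)

    val_loss = loss_u.item() + loss_f.item()          # stand-in for the real validation error
    if val_loss < best_val - 1e-5:
        best_val, patience = val_loss, 0
    else:
        patience += 1
        if patience >= 50:                            # early stopping after 50 stagnant epochs
            break
```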
4.5 PINN vs. Regression

The task definition of PINN is different from that of simple regression learning of a solution function u, where reference solutions u(x, t) are collected not only for the initial and boundary conditions but also for other (x, t) pairs. In general, this approach requires non-trivial effort to run computer simulations and collect such reference solutions. Once we collect them, one advantage is that learning u becomes a simple regression task without involving Lf. However, a critical disadvantage is that governing equations cannot be explicitly imposed during the training process.

Although our task is not to fit a regression model to the reference solutions but to learn physical dynamics, we compare our proposed method with the regression-based approach to better understand our method. To train the regression model, we use Lu with an augmented training set {((x_u^i, t_u^i), u(x_u^i, t_u^i))}_{i=1}^{N_u} ∪ {((x_r^i, t_r^i), u(x_r^i, t_r^i))}_{i=1}^{N_r}, where the first set consists of initial and boundary training samples, (x_r^i, t_r^i) are sampled uniformly in Ω and [0, Ttrain], and we set Nr = Nf for fairness. We run external software to calculate u(x_r^i, t_r^i), which is not needed for u(x_u^i, t_u^i) because initial and boundary conditions are known a priori. We train two regression models: one based on a series of fully connected (FC) layers and the other based on residual connections. In Table 2, they are denoted by FC and FC-R, respectively. We note that the neural network architectures of FC (resp. FC-R) are the same as those of PINN (resp. PINN-R), but they are trained in the supervised manner described earlier. We use the same set of hyperparameters for the number of layers and the dimensionality of the hidden vectors.
                              FC       FC-R     PINN-D1  PINN-D2
L2-norm (↓)
  Vis. Burgers                0.352    0.301    0.112    0.092
  Inv. Burgers                0.114    0.133    0.083    0.090
  Allen–Cahn                  0.324    0.313    0.246    0.182
  Schrödinger                 0.375    0.235    0.314    0.141
Explained variance score (↑)
  Vis. Burgers                0.896    0.915    0.988    0.991
  Inv. Burgers                0.060   -0.181    0.454    0.621
  Allen–Cahn                  0.873    0.766    0.939    0.967
  Schrödinger                -3.438   -3.174   -4.973   -3.257
Max error (↓)
  Vis. Burgers                0.718    0.598    0.545    0.333
  Inv. Burgers                3.245    3.301    1.534    2.036
  Allen–Cahn                  1.512    1.190    1.096    0.8366
  Schrödinger                 4.078    4.3165   4.945    3.829
Mean absolute error (↓)
  Vis. Burgers                0.119    0.108    0.026    0.021
  Inv. Burgers                0.255    0.332    0.277    0.315
  Allen–Cahn                  0.207    0.336    0.129    0.094
  Schrödinger                 2.072    1.868    0.868    0.896

Table 2: The extrapolation accuracy of the regression baselines (FC, FC-R) and our methods in terms of the relative error in the L2-norm, the explained variance score, the max error, and the mean absolute error on various PDEs. Large (resp. small) values are preferred for ↑ (resp. ↓).
Figure 8: We visualize the results of the regression models. As shown, they are inferior to our PINN-D2 in Figs. 5 and 6. Top two rows: the regression extrapolation snapshots for the Allen–Cahn equation. Bottom: the regression extrapolation snapshots for the nonlinear Schrödinger equation.

5 Conclusions

In this work, we presented a novel training method, the dynamic pulling method (DPM), for obtaining better-performing physics-informed neural networks in extrapolation. The proposed DPM enables PINN to learn the dynamics of the governing equations accurately. In the numerical experiments, we first demonstrated that the original PINN performs poorly on extrapolation tasks and empirically analyzed PINN in detail. Then, we demonstrated that the proposed DPM significantly outperforms PINN and its residual-block variant (by up to 72% in comparison with PINN and PINN-R) in various metrics. As an ablation study, we compared PINN-D1 and PINN-D2, where PINN-D2 clearly outperforms PINN-D1 on three benchmark problems. Finally, we explained how DPM behaves by illustrating example training loss curves. All code and data will be released upon publication.
Acknowledgements

Noseong Park ([email protected]) is the corresponding author. This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)). This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, a wholly owned subsidiary of Honeywell International, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

References

Anderson, D.; Tannehill, J. C.; and Pletcher, R. H. 2016. Computational fluid mechanics and heat transfer. Taylor & Francis.

Anitescu, C.; Atroshchenko, E.; et al. 2019. Artificial Neural Network Methods for the Solution of Second Order Boundary Value Problems. Computers, Materials & Continua 59(1).

Ben-Israel, A.; and Greville, T. 2006. Generalized Inverses: Theory and Applications. CMS Books in Mathematics. Springer New York. ISBN 9780387216348.

Cranmer, M.; Greydanus, S.; Hoyer, S.; Battaglia, P.; Spergel, D.; and Ho, S. 2020. Lagrangian neural networks. arXiv preprint arXiv:2003.04630.

Doan, N. A. K.; Polifke, W.; and Magri, L. 2019. Physics-Informed Echo State Networks for Chaotic Systems Forecasting. In Rodrigues, J. M. F.; Cardoso, P. J. S.; Monteiro, J.; Lam, R.; Krzhizhanovskaya, V. V.; Lees, M. H.; Dongarra, J. J.; and Sloot, P. M., eds., Computational Science – ICCS 2019, 192–198. Cham: Springer International Publishing.

Erichson, N. B.; Muehlebach, M.; and Mahoney, M. W. 2019. Physics-informed autoencoders for Lyapunov-stable fluid flow prediction. arXiv preprint arXiv:1905.10866.

Fulton, L.; Modi, V.; Duvenaud, D.; Levin, D. I.; and Jacobson, A. 2019. Latent-space Dynamics for Reduced Deformable Simulation. In Computer Graphics Forum, volume 38, 379–391. Wiley Online Library.

Geist, M.; Petersen, P.; Raslan, M.; Schneider, R.; and Kutyniok, G. 2020. Numerical Solution of the Parametric Diffusion Equation by Deep Neural Networks. arXiv preprint arXiv:2004.12131.

Greydanus, S.; Dzamba, M.; and Yosinski, J. 2019. Hamiltonian neural networks. In Advances in Neural Information Processing Systems, 15353–15363.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep Residual Learning for Image Recognition. In CVPR.

Hirsch, C. 2007. Numerical computation of internal and external flows: The fundamentals of computational fluid dynamics. Elsevier.

Holl, P.; Thuerey, N.; and Koltun, V. 2020. Learning to Control PDEs with Differentiable Physics. In International Conference on Learning Representations.

Hornik, K.; Stinchcombe, M.; White, H.; et al. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2(5): 359–366.

Iserles, A. 2009. A first course in the numerical analysis of differential equations. 44. Cambridge University Press.

Kang, C.; Park, N.; Prakash, B. A.; Serra, E.; and Subrahmanian, V. S. 2016. Ensemble Models for Data-Driven Prediction of Malware Infections. In WSDM.

Khoo, Y.; Lu, J.; and Ying, L. 2017. Solving parametric PDE problems with artificial neural networks. arXiv preprint arXiv:1707.03351.

Kim, J.; Lee, K.; Lee, D.; Jin, S. Y.; and Park, N. 2020. DPM: A Novel Training Method for Physics-Informed Neural Networks in Extrapolation. arXiv preprint arXiv:2012.02681.

Kim, K. 2003. Financial time series forecasting using support vector machines. Neurocomputing 55(1): 307–319.

Lee, K.; and Carlberg, K. 2019. Deep Conservation: A latent dynamics model for exact satisfaction of physical conservation laws. arXiv preprint arXiv:1909.09754.

Lee, K.; and Carlberg, K. T. 2020. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 404: 108973.

LeVeque, R. J.; et al. 2002. Finite volume methods for hyperbolic problems, volume 31. Cambridge University Press.

Ling, J.; Kurzawski, A.; and Templeton, J. 2016. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics 807: 155–166.

Ling, J.; and Templeton, J. 2015. Evaluation of machine learning algorithms for prediction of regions of high Reynolds averaged Navier–Stokes uncertainty. Physics of Fluids 27(8): 085103.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378: 686–707.
Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986.
Learning representations by back-propagating errors. Nature
323(6088): 533–536.
Stoer, J.; and Bulirsch, R. 2013. Introduction to numerical
analysis, volume 12. Springer Science & Business Media.
Tripathy, R. K.; and Bilionis, I. 2018. Deep UQ: Learning
deep neural network surrogate models for high dimensional
uncertainty quantification. Journal of computational physics
375: 565–588.
Vlachas, P. R.; Byeon, W.; Wan, Z. Y.; Sapsis, T. P.; and
Koumoutsakos, P. 2018. Data-driven forecasting of high-
dimensional chaotic systems with long short-term memory
networks. Proceedings of the Royal Society A: Mathematical,
Physical and Engineering Sciences 474(2213): 20170844.
Wiewel, S.; Becher, M.; and Thuerey, N. 2019. Latent space
physics: Towards learning the temporal evolution of fluid
flow. In Computer Graphics Forum, volume 38, 71–82. Wiley
Online Library.
Yang, L.; Meng, X.; and Karniadakis, G. E. 2020. B-
PINNs: Bayesian Physics-Informed Neural Networks for
Forward and Inverse PDE Problems with Noisy Data. ArXiv
abs/2003.06097.
Zhang, D.; Lu, L.; Guo, L.; and Karniadakis, G. E. 2019.
Quantifying total uncertainty in physics-informed neural net-
works for solving forward and inverse stochastic problems.
Journal of Computational Physics 397: 108850. ISSN 0021-
9991.