PPINN: Parareal Physics-Informed Neural Network

∗ The first two authors contributed equally to this work.
† Corresponding Email: george [email protected]
Abstract
Physics-informed neural networks (PINNs) encode physical conservation laws and prior physical
knowledge into the neural networks, ensuring the correct physics is represented accurately while
alleviating the need for supervised learning to a great degree [1]. While effective for relatively
short-term time integration, when long time integration of the time-dependent PDEs is sought,
the time-space domain may become arbitrarily large and hence training of the neural network
may become prohibitively expensive. To this end, we develop a parareal physics-informed neural
network (PPINN), which decomposes a long-time problem into many independent short-time
problems supervised by an inexpensive/fast coarse-grained (CG) solver. In particular, the serial CG
solver is designed to provide approximate predictions of the solution at discrete times, while many
fine PINNs are initiated simultaneously to correct the solution iteratively. There is a two-fold benefit
to training PINNs on small data sets rather than on one large data set directly: training each individual
PINN on a small data set is much faster, and training the fine PINNs can be readily parallelized.
Consequently, compared to the original PINN approach, the proposed PPINN
approach may achieve a significant speedup for long-time integration of PDEs, assuming that the
CG solver is fast and can provide reasonable predictions of the solution, hence aiding the PPINN
solution to converge in just a few iterations. To investigate the PPINN performance on solving
time-dependent PDEs, we first apply the PPINN to solve the Burgers equation, and subsequently
we apply the PPINN to solve a two-dimensional nonlinear diffusion-reaction equation. Our
results demonstrate that PPINNs converge in a couple of iterations with significant speed-ups
proportional to the number of time-subdomains employed.
1 Introduction
At the cost of relatively expensive computation during the training process, deep neural networks (DNNs)
provide a powerful approach for exploring hidden correlations in massive data, a task that in many cases is
not feasible through manual human review [2]. In the past decade, the large computational
cost of training DNNs has been mitigated by a number of advances, including high-performance
computers [3], graphics processing units (GPUs) [4], tensor processing units (TPUs) [5], and fast large-
scale optimization schemes [6], e.g., adaptive moment estimation (Adam) [7] and the adaptive gradient
algorithm (AdaGrad) [8]. In many instances of modeling physical systems, physical invariants, e.g.,
momentum and energy conservation laws, can be built into the learning algorithms in the context
of DNNs and their variants [9–11]. This leads to a physics-informed neural network (PINN), where
physical conservation laws and prior physical knowledge are encoded into the neural networks [1,12].
Consequently, the PINN model relies partially on the data and partially on the physics described by
partial differential equations (PDEs).
Unlike traditional PDE solvers, a PINN does not need to discretize the PDE or employ complicated
numerical algorithms to solve the equations, even though the PDE is encoded in the neural network.
Instead, PINNs take advantage of the automatic differentiation employed in backward
propagation to represent all differential operators in a PDE, and of the training of a neural network to
perform a nonlinear mapping from input variables to output quantities by minimizing a loss function.
In this respect, the PINN model is a grid-free approach as no mesh is needed for solving the equations.
All the complexities of solving a physical problem are transferred into the optimization/training
stage of the neural network. Consequently, a PINN is able to unify the formulation and generalize
the procedure of solving physical problems governed by PDEs regardless of the structure of the
equations. Figure 1 graphically describes the structure of the PINN approach [1], where the loss
function of the PINN contains a mismatch in the given data on the state variables or boundary condition
(BC) and initial condition (IC), i.e., $\mathrm{MSE}_{\{u,BC,IC\}} = N_u^{-1}\sum \|u(x,t) - u^{\star}\|^2$ with $u^{\star}$ being the given data,
combined with the residual of the PDE computed on a set of random points in the time-space
domain, i.e., $\mathrm{MSE}_R = N_R^{-1}\sum \|R(x,t)\|^2$. Then, the PINN can be trained by minimizing the total loss
$\mathrm{MSE} = \mathrm{MSE}_{\{u,BC,IC\}} + \mathrm{MSE}_R$. For solving forward problems, the first term represents a mismatch of
the NN output $u(x,t)$ from the boundary and/or initial conditions, i.e., $\mathrm{MSE}_{BC,IC}$. For solving inverse
problems, the first term considers a mismatch of the NN output $u(x,t)$ from additional data sets, i.e.,
$\mathrm{MSE}_u$.
In general, solving a physical problem involving PDEs with a PINN consists of three steps (a minimal code sketch follows the list):
Step 1. Define the PINN architecture.
Step 2. Define the loss function $\mathrm{MSE} = \mathrm{MSE}_{\{u,BC,IC\}} + \mathrm{MSE}_R$.
Step 3. Train the PINN using an appropriate optimizer, e.g., Adam [7] or AdaGrad [8].
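To make these steps concrete, the following is a minimal sketch (in PyTorch, which is our choice here and not necessarily the framework used in [1]) of Steps 1 and 2 for a generic one-dimensional time-dependent problem; pde_residual is a hypothetical placeholder whose concrete form depends on the PDE being encoded.

import torch
import torch.nn as nn

# Step 1: define the PINN architecture, here a small fully-connected network
# mapping (x, t) -> u; width and depth are illustrative, not values from the paper.
class PINN(nn.Module):
    def __init__(self, layers=(2, 20, 20, 1)):
        super().__init__()
        blocks = []
        for n_in, n_out in zip(layers[:-1], layers[1:]):
            blocks += [nn.Linear(n_in, n_out), nn.Tanh()]
        self.net = nn.Sequential(*blocks[:-1])  # no activation on the output layer

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

def pde_residual(model, x, t):
    """Hypothetical residual R(x, t) obtained via automatic differentiation;
    shown here for an inviscid Burgers-type operator u_t + u*u_x as an example."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(x, t)
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    return u_t + u * u_x

# Step 2: total loss = data/BC/IC mismatch + PDE residual, as described in the text.
def pinn_loss(model, x_d, t_d, u_star, x_r, t_r):
    mse_data = torch.mean((model(x_d, t_d) - u_star) ** 2)    # MSE_{u,BC,IC}
    mse_res = torch.mean(pde_residual(model, x_r, t_r) ** 2)  # MSE_R
    return mse_data + mse_res

Step 3 then amounts to minimizing pinn_loss with an optimizer such as torch.optim.Adam.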
Recently, PINNs and their variants have been successfully applied to solve both forward and inverse
problems involving PDEs. Examples include the Navier-Stokes and the KdV equations [1], stochastic
PDEs [13–16], and fractional PDEs [17].
In problems involving long-time integration of PDEs, the large number of spatio-temporal degrees of
freedom leads to a large amount of training data for the PINN, and training a single PINN over the
entire time domain may become computationally prohibitive. To this
end, we propose a parareal physics-informed neural network (PPINN) to split one long-time problem
into many independent short-time problems supervised by an inexpensive/fast coarse-grained (CG)
solver, which is inspired by the original parareal algorithm [18] and a more recent supervised parallel-
in-time algorithm [19]. Because the computational cost of training a DNN increases rapidly with the
size of the data set [20], the PPINN framework maximizes the benefit of the high computational
efficiency of training neural networks on small data sets. More specifically, there is a two-fold
benefit to training PINNs on small data sets rather than on one large data set directly: training each
individual PINN on a small data set is much faster, and training the fine PINNs can be readily
parallelized on multiple GPU/CPU cores. Consequently, compared to the original PINN approach,
the proposed PPINN approach may achieve a significant speed-up for solving long-time physical
problems; this will be verified with benchmark tests for one-dimensional and two-dimensional
nonlinear problems. This favorable speed-up depends on a good supervisor, represented here by a
coarse-grained (CG) solver that is expected to provide predictions of reasonable accuracy at low cost.
Figure 1: Schematic of a physics-informed neural network (PINN), where the loss function of the PINN
contains a mismatch in the given data on the state variables or boundary and initial conditions, i.e.,
$\mathrm{MSE}_{\{u,BC,IC\}} = N_u^{-1}\sum \|u(x,t) - u^{\star}\|^2$, and the residual of the PDE on a set of random points in the
time-space domain, i.e., $\mathrm{MSE}_R = N_R^{-1}\sum \|R(x,t)\|^2$. The hyperparameters of the PINN can be learned by
minimizing the total loss $\mathrm{MSE} = \mathrm{MSE}_{\{u,BC,IC\}} + \mathrm{MSE}_R$.
The remainder of this paper is organized as follows. In Section 2 we describe the details of the
PPINN framework and its implementation. In Section 3 we first present a pedagogical example to
demonstrate accuracy and convergence for a one-dimensional time-dependent problem. Subse-
quently, we apply PPINN to solve a two-dimensional nonlinear time-dependent problem, where we
demonstrate the speed-up of PPINN. Finally, we end the paper with a brief summary and discussion
in Section 4.
2 Parareal PINN
2.1 Methodology
For a time-dependent problem involving long-time integration of PDEs for $t \in [0, T]$, instead of
solving this problem directly in one single time domain, PPINN splits $[0, T]$ into $N$ sub-domains of
equal length $\Delta T = T/N$. Then, PPINN employs two propagators, i.e., a serial CG solver represented
by $G(u_i^k)$ in Algorithm 1, and $N$ fine PINNs computed in parallel, represented by $F(u_i^k)$ in Algorithm 1.
Here, $u_i^k$ denotes the approximation to the exact solution at time $t_i$ in the $k$-th iteration. Because the CG
solver is serial in time while the fine PINNs run in parallel, to achieve optimal computational efficiency
we encode a simplified PDE (sPDE) into the CG solver as the prediction propagator, while the true PDE
is encoded in the fine PINNs as the correction propagator. Using this prediction-correction strategy, we
expect the PPINN solution to converge to the true solution after a few iterations.
The details of the PPINN approach are displayed in Algorithm 1 and Fig. 2, and are explained
step by step in the following. Firstly, we simplify the PDE to be solved by the CG solver. For example,
we can replace the nonlinear coefficient in the diffusion equation shown in Sec. 3.3 with a constant to
remove the variable/multiscale coefficients. We can use a CG PINN as the fast CG solver,
but we can also explore standard fast finite-difference solvers. Secondly, the CG PINN is employed
to solve the sPDE serially over the entire time domain to obtain an initial solution. Because we can
use fewer residual points as well as smaller neural networks to solve the sPDE rather than the original
PDE, the computational cost of the CG PINN can be significantly reduced. Thirdly, we decompose
the time domain into $N$ subdomains. We assume that $u_i^k$ is known for $t_k \le t_i \le t_N$
(including $k = 0$, i.e., the initial iteration), which is employed as the initial condition to run the $N$
fine PINNs in parallel. Once the fine solutions at all $t_i$ are obtained, we can compute the discrepancy
between the coarse and fine solutions at $t_i$, as shown in Step 3(b) of Algorithm 1. We then run the CG
PINN serially to update the solution $u$ at each interface between two adjacent subdomains, i.e., $u_{i+1}^{k+1}$
(Prediction and Refinement in Algorithm 1). Step 3 of Algorithm 1 is performed iteratively until the
following criterion is satisfied
following criterion is satisfied
q
PN−1 k+1
i=0 ||ui − uki ||2
E= q < Etol , (1)
PN−1 k+1 2
i=0 ||ui ||
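Since Algorithm 1 itself is not reproduced above, the following Python sketch illustrates the prediction-correction structure and the stopping test of Eq. (1) under simplifying assumptions: coarse and fine are hypothetical black-box propagators that advance a state across one subdomain (standing in for the CG solver and the fine PINNs, which in PPINN are trained rather than merely evaluated), the state is taken to be a scalar per interface, and the fine propagations are written as a serial loop although in PPINN they run in parallel.

import numpy as np

def ppinn_outline(u0, ts, coarse, fine, E_tol=1e-2, K_max=10):
    """Prediction-correction outline of Algorithm 1 for scalar interface states.
    coarse(u, t0, t1) and fine(u, t0, t1) are hypothetical propagators that advance
    an initial value u from t0 to t1 (the CG solver and a fine PINN, respectively)."""
    N = len(ts) - 1
    u = np.zeros(N + 1); u[0] = u0
    G_old = np.zeros(N + 1); G_old[0] = u0
    # Initialization: serial coarse sweep over all subdomains (k = 0).
    for i in range(N):
        G_old[i + 1] = coarse(G_old[i], ts[i], ts[i + 1])
        u[i + 1] = G_old[i + 1]
    for k in range(1, K_max + 1):
        # Correction: fine propagation from the current interface states
        # (run in parallel in PPINN; written serially here for clarity).
        F = np.zeros(N + 1)
        for i in range(N):
            F[i + 1] = fine(u[i], ts[i], ts[i + 1])
        # Prediction and refinement: serial coarse sweep plus parareal update.
        u_new = np.zeros(N + 1); u_new[0] = u0
        G_new = np.zeros(N + 1); G_new[0] = u0
        for i in range(N):
            G_new[i + 1] = coarse(u_new[i], ts[i], ts[i + 1])
            u_new[i + 1] = G_new[i + 1] + F[i + 1] - G_old[i + 1]
        # Relative change between successive iterations, as in Eq. (1).
        E = np.sqrt(np.sum((u_new - u) ** 2)) / np.sqrt(np.sum(u_new ** 2))
        u, G_old = u_new, G_new
        if E < E_tol:
            break
    return u

The update G_new + F - G_old is the standard parareal correction, so at convergence the interface values coincide with those of the fine propagator.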
To estimate the computational efficiency of the PPINN, let $K$ denote the total number of iterations; $T_c^0$ represents the walltime taken by the CG solver for the initialization, while $T_c^k$
and $T_f^k$ denote the walltimes taken by the coarse and fine propagators at the $k$-th iteration, respectively.
Let $\tau_c^k$ and $\tau_f^k$ be the walltimes used by the CG solver and a fine PINN for one subdomain at the $k$-th iteration;
since the coarse sweep is serial over the $N$ subdomains while the fine PINNs run in parallel, $T_c^k$ and $T_f^k$ can be expressed as $T_c^k = N\,\tau_c^k$ and $T_f^k = \tau_f^k$.
Figure 2: Overview of the parareal physics-informed neural network (PPINN) algorithm. Left:
Schematic of the PPINN, where a long-time problem (PINN with full-sized data) is split into many
independent short-time problems (PINN with small-sized data) guided by a fast coarse-grained (CG)
solver. Right: A graphical representation of the parallel-in-time algorithm used in PPINN, in which a
cheap serial CG solver is used to make an approximate prediction of the solution $G(u_i^k)$, while many
fine PINNs are run in parallel to obtain $F(u_i^k)$ and correct the solution iteratively. Here, $k$
represents the iteration index, and $i$ is the index of the time subdomain.
Furthermore, the walltime for the fine PINN to solve the same PDE in serial is expressed as $T_{PINN} = N \cdot T_f^1$. To this end, we can obtain the speed-up ratio of the PPINN as
$$S = \frac{T_{PINN}}{T_{PPINN}} = \frac{N \cdot T_f^1}{N \cdot \tau_c^0 + \sum_{k=1}^{K}\left(N \cdot \tau_c^k + \tau_f^k\right)}. \qquad (4)$$
In addition, considering that the solution in each subdomain for two adjacent iterations ($k \ge 2$) does
not change dramatically, the training process converges much faster for $k \ge 2$ than for $k = 1$ in
each fine PINN, i.e., $\tau_f^k \ll \tau_f^1$. Therefore, the lower bound for $S$ can be expressed as
$$S_{min} = \frac{N \cdot \tau_f^1}{N \cdot \tau_c^0 + N \cdot K \cdot \tau_c^k + K \cdot \tau_f^1}. \qquad (6)$$
This shows that $S$ increases with $N$ if $\tau_c^0 \ll \tau_f^1$, suggesting that the more subdomains we employ, the
larger the speed-up ratio for the PPINN.
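For quick estimates, Eq. (4) can be evaluated directly once the per-subdomain walltimes are known; the helper below is a small sketch with illustrative numbers (not timings from the paper), assuming the per-subdomain walltimes are identical across subdomains.

def ppinn_speedup(N, tau_c0, tau_c, tau_f, K):
    """Speed-up ratio S of Eq. (4).
    tau_c0       : per-subdomain coarse walltime of the initialization sweep (tau_c^0).
    tau_c, tau_f : lists of per-subdomain coarse / fine walltimes for iterations 1..K,
                   assumed equal across subdomains, so T_f^1 = tau_f[0]."""
    T_pinn = N * tau_f[0]  # serial fine PINN over all N subdomains
    T_ppinn = N * tau_c0 + sum(N * tau_c[k] + tau_f[k] for k in range(K))
    return T_pinn / T_ppinn

# Illustrative numbers only: 20 subdomains, cheap coarse sweeps, fine training
# that converges much faster after the first iteration (tau_f^2 << tau_f^1).
print(ppinn_speedup(N=20, tau_c0=0.1, tau_c=[0.1, 0.1], tau_f=[60.0, 5.0], K=2))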
3 Results
We first show two simple examples for a deterministic and a stochastic ordinary differential equation
(ODE) to explain the method in detail. We then present results for the Burgers equation and a
two-dimensional diffusion-reaction equation.
where $p$ denotes the index of the residual points. The PPINN solution after the first iteration presents
obvious deviations, as shown in Fig. 3(a1). In the second iteration, given the solutions of $G(u_i)|_{k=0}$
and $F(u_i)|_{k=0}$, we set the IC for the CG PINN as $G$. Then, we train the CG PINN again for 5,000 epochs
using a learning rate $\alpha = 0.001$ and obtain the solutions $G(u(t))|_{k=1}$ for chunks 2 to 10, as shown in
Fig. 3(a2). The IC for the fine PINN of the second chunk is given by $\tilde{u}(t = t_1) = F(u_1)|_{k=1}$, and the ICs of
the fine PINNs of chunks 3 to 10 are given by a combination of $G(u(t))|_{k=0}$, $F(u(t))|_{k=0}$ and $G(u(t))|_{k=1}$
at $t = t_{i=2,3,\dots,9}$, i.e., $\tilde{u}(t = t_i) = G(u_i)|_{k=1} + [F(u_i)|_{k=0} - G(u_i)|_{k=0}]$. With these ICs, we train the nine
individual fine PINNs (chunks 2 to 10) in parallel using the Adam algorithm with a learning rate of
$\alpha = 0.001$ until the loss function for each fine PINN drops below $10^{-6}$. We find that the fine PINNs in
the second iteration converge much faster than in the first iteration, with an average of 4,836 epochs
compared to 11,524 in the first iteration. The accelerated training benefits from the results of the
training performed in the previous iteration. The PPINN solutions for each chunk after the second
iteration are presented in Fig. 3(a2), showing a significant improvement of the solution with a relative
error $l_2 = 9.97\%$. Using the same method, we perform a third iteration, and the PPINN solutions
converge to the exact solution with a relative error $l_2 = 0.08\%$, as shown in Fig. 3(a3).
We proceed to investigate the computational efficiency of the PPINN. As mentioned in Sec. 2.2,
we may obtain a speed-up ratio, which grows linearly with the number of subdomains if the coarse
solver is efficient enough. We first test the speed-ups for the PPINN using a PINN as the coarse
solver. Here we test four different subdomain decompositions, i.e. 10, 20, 40, and 80. For the coarse
solver, we assign one PINN (CG PINN, [1] + [4] × 2 + [1]) for each subdomain, and 10 randomly
sampled residual points are used in each subdomain. Furthermore, we run all the CG PINNs serially
using one CPU (Intel Xeon E5-2670). For the fine solver, we again employ one PINN (fine PINN) for
each subdomain to solve Eq. (7), and each subdomain is assigned to one CPU (Intel Xeon E5-2670).
The total number of residual points is 400,000; they are randomly sampled in the entire time domain
and uniformly distributed among the subdomains. Meanwhile, the architecture of the fine PINN in each
subdomain is [1] + [20] × 2 + [1] for the first two cases (i.e., 10 and 20 subdomains), and is set to
[1] + [10] × 2 + [1] for the last two cases (i.e., 40 and 80 subdomains). The speed-ups are displayed in
Fig. 3(b) (magenta dashed line), where we observe that the speed-up first increases with $N$ for $N \le 20$
and then decreases with increasing $N$. We further look into the details of the computational time for
this case. As shown in Table 1, we find that the computational time taken by the CG PINNs increases
with the number of subdomains. In particular, more than 90% of the computational time is taken by
the coarse solver for $N \ge 40$, indicating the inefficiency of the CG PINN. To obtain satisfactory
speed-ups with the PPINN, a more efficient coarse solver should be utilized.

Figure 3: A pedagogical example to illustrate the procedure of using PPINN to solve a deterministic
ODE. Here a PINN is also employed for the CG solver. (a) The solutions of the CG PINN $G(u(t)|_k)$ and
the fine PINNs $F(u(t)|_k)$ in different iterations are plotted for (a1) $k = 0$, (a2) $k = 1$ and (a3) $k = 2$ when
solving the ODE $du/dt = 1 + \frac{\pi}{2}\cos(\frac{\pi}{2}t)$. (b) Speed-ups for the PPINN using different coarse solvers,
i.e., PINN (magenta dashed line with circles), FDM (blue dashed line with plus signs), and the analytic
solution (red dashed line with triangles). $N$ denotes the number of subdomains. The linear speed-up
ratio is calculated as $N/K$. Black dashed line: $K = 3$; cyan dashed line: $K = 2$.
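Throughout the paper, network architectures are written in a bracket notation; under our reading (an interpretation, with tanh activations assumed), "[1] + [20] × 2 + [1]" denotes a network with one input, two hidden layers of 20 neurons each, and one output, e.g.:

import torch.nn as nn

def fnn(arch=(1, 20, 20, 1)):
    """Fully-connected tanh network; (1, 20, 20, 1) corresponds to '[1] + [20] x 2 + [1]'."""
    layers = []
    for n_in, n_out in zip(arch[:-1], arch[1:]):
        layers += [nn.Linear(n_in, n_out), nn.Tanh()]
    return nn.Sequential(*layers[:-1])  # no activation after the output layer

fine_pinn = fnn((1, 20, 20, 1))  # fine PINN for the 10- and 20-subdomain cases
cg_pinn = fnn((1, 4, 4, 1))      # CG PINN '[1] + [4] x 2 + [1]'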
Subdomains (#) Iterations (#) NCG (#) TCG (s) Ttotal (s) S
1 - - - 1,793.0 -
10 2 10 62.4 407.4 4.4
20 2 5 189.2 275.1 6.5
40 3 4 617.8 685.2 2.6
80 3 4 2,424.6 2,453.0 0.73
Table 1: Speed-ups for the PPINN using a PINN as coarse solver to solve Eq. (7). NCG is the number
of residual points used for the CG PINN in each subdomain, TCG represents the computational time
taken by the coarse solver, and Ttotal denotes the total computational time taken by the PPINN.
Considering that the simplified ODE can be solved analytically, we can directly use the analytic
solution as the coarse solver, which incurs essentially no cost. In addition, all parameters in the fine
PINN are kept the same as the previously used ones. For validation, we again use 10 subdomains
in the PPINN to solve Eq. (7). The l2 relative error between the predicted and analytic solutions is
1.252×10−5 after two iterations, which confirms the accuracy of the PPINN. In addition, the speed-ups
for the four cases are displayed in Fig. 3(b) (red dashed line with triangle). It is interesting to find
that the PPINN can achieve a superlinear speed-up here. We further present the computational time
at each iteration in Table 2. We see that the computational time taken by the coarse solver is now
negligible compared to the total computational time. Since the fine PINN converges faster after the
first iteration, we can thus obtain a superlinear speed-up for the PPINN.
Instead of the analytic solution, we can also use other efficient numerical methods to serve as
the coarse solver. For demonstration purposes, we then present the results using the finite difference
method (FDM) as the coarse solver. The entire domain is discretized into 1,000 uniform elements,
which are then uniformly distributed to each subdomain. Similarly, we also use 10 subdomains for
validation. It also takes 2 iterations to converge, and the l2 relative error is 1.257 × 10−5 . In addition,
we can also obtain a superlinear speed-up which is quite similar to the case using the analytic solution
due to the efficiency of the FDM (Fig. 3(b) and Table 2).
$$\frac{du}{dt} = \beta\left[-u + \beta \sin\left(\frac{\pi}{2}t\right) + \frac{\pi}{2}\cos\left(\frac{\pi}{2}t\right)\right], \qquad t \in [0, T], \qquad (9)$$
where $T = 10$, $\beta = \beta_0 + \epsilon$, $\beta_0 = 0.1$, and $\epsilon$ is drawn from a normal distribution with zero mean and 0.05
standard deviation. In addition, the initial condition for Eq. (9) is $u(0) = 1$.

Subdomains (#) Iterations (#) NCG (#) TCG (s) Ttotal (s) S
PPINN (Analytic)
1 - - - 1,793.0 -
10 2 - < 0.05 295.8 6.1
20 2 - < 0.05 71.4 25.1
40 2 - < 0.05 38.6 46.4
80 2 - < 0.05 29.0 61.6
PPINN (FDM)
1 - - - 1,793.0 -
10 2 100 < 0.05 298.7 6.0
20 2 50 < 0.05 81.6 22.0
40 2 40 < 0.05 38.5 46.5
80 2 12 < 0.05 28.8 62.2
Table 2: Walltimes for using the PPINN with different coarse solvers to solve Eq. (7). NCG is the
number of elements used for the FDM in each subdomain, TCG represents the computational time
taken by the coarse solver, and Ttotal denotes the total computational time taken by the PPINN.
In the present PPINN, we employ a deterministic ODE for the coarse solver, i.e.,
$$\frac{du}{dt} = -\beta_0 u, \qquad t \in [t_0, T]. \qquad (10)$$
Given the initial condition $u(t_0) = u_0$, the analytic solution of Eq. (10) is $u = u_0 \exp(-\beta_0 (t - t_0))$.
For the fine solver, we draw 100 samples of $\beta$ using the quasi-Monte Carlo method, which are then
solved by the fine PINNs. As before, we utilize three different methods for the coarse solver, i.e., the
PINN, the FDM, and the analytic solution.
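The sampling and the coarse propagator for this example can be sketched as follows (SciPy's Sobol generator is an assumption for the quasi-Monte Carlo draws, which the text does not specify):

import numpy as np
from scipy.stats import norm, qmc

beta0, sigma = 0.1, 0.05
# 100 quasi-Monte Carlo samples of beta = beta0 + eps with eps ~ N(0, sigma^2),
# obtained by pushing a scrambled Sobol sequence through the normal inverse CDF.
sobol = qmc.Sobol(d=1, scramble=True, seed=0)
eps = norm.ppf(sobol.random(n=100).ravel(), loc=0.0, scale=sigma)
betas = beta0 + eps

def coarse_propagate(u0, t0, t1):
    """Analytic solution of the simplified ODE (10): du/dt = -beta0 * u."""
    return u0 * np.exp(-beta0 * (t1 - t0))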
For validation purposes, we first decompose the time-domain into 10 uniform subdomains. For
the FDM, we discretize the whole domain into 1,000 uniform elements. For the fine PINNs, we
employ 400,000 randomly sampled residual points for the entire time domain, which are uniformly
distributed to all the subdomains. We employ one fine PINN for each subdomain to solve the ODE,
which has an architecture of [1] + [20] × 2 + [1]. Finally, the simplified ODE in each subdomain for
the coarse solver is solved serially, while the exact ODE in each subdomain for the fine solver is
solved in parallel. We illustrate the comparison between the predicted and exact solutions at two
representative β, i.e. β = 0.108 and 0.090 in Fig. 4(a). We see that the predicted solutions converge
to the exact ones after two iterations, which confirms the effectiveness of the PPINN for solving
stochastic ODEs. The solutions from the PPINN with the other two different coarse solvers (i.e., the
PINN and analytic solution) also agree well with the reference solutions, but they are not presented
here. Furthermore, we also present the computational efficiency for the PPINN using four different
numbers of subdomains, i.e. 10, 20, 40 and 80 in Fig. 4(b). It is clear that the PPINN can still achieve
a superlinear speed-up if we use an efficient coarse solver such as FDM or analytic solution. The
speed-up for the PPINN with the PINN as coarse solver is similar to the results in Sec. 3.1.1, i.e., the
speed-up ratio first increases slightly with the number of subdomains for $N \le 20$ and then decreases
with increasing $N$. Finally, the speed-ups for the PPINN with the FDM coarse solver are almost the
same as those with the analytic-solution coarse solver, which is similar to the results in Sec. 3.1.1
and will not be discussed in detail here.
In summary, the PPINN works for both deterministic and stochastic ODEs. In addition, using
a PINN as the coarse solver can guarantee the accuracy of the ODE solution, but the speed-up
may decrease with the number of subdomains due to the inefficiency of the PINN. We can achieve
both high accuracy and good speed-up if we select a more efficient coarse solver, such as an analytic
solution or a finite difference method.

Figure 4: A pedagogical example of using the PPINN to solve a stochastic ODE. (a) The solutions of
the CG solver $G(u(t)|_k)$ and the fine PINNs $F(u(t)|_k)$ in different iterations are plotted for (a1) $\beta = 0.108$ and
(a2) $\beta = 0.090$. The reference solution is $u(t) = \exp(-\beta t) + \beta \sin(\frac{\pi}{2}t)$ for each $\beta$. (b) Speed-ups for the
PPINN using different coarse solvers, i.e., the PINN (magenta dashed line with circles), the FDM (blue
dashed line with plus signs), and the analytic solution (red dashed line with triangles). The linear speed-up
ratio is calculated as $N/K$. Black dashed line: $K = 4$; cyan dashed line: $K = 2$. For the CG PINN,
we use 1,000 randomly sampled residual points in the whole time domain, which are uniformly
distributed among the subdomains. The architecture of the CG PINN for each subdomain is the same, i.e.,
[1] + [10] × 2 + [1]. For the fine PINNs, the architecture is [1] + [20] × 2 + [1] for the cases with 10 and
20 subdomains, and [1] + [10] × 2 + [1] for the cases with 40 and 80 subdomains.
$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}, \qquad (11)$$
which is a mathematical model for viscous flow, gas dynamics, and traffic flow, with $u$ denoting
the speed of the fluid, $\nu$ the kinematic viscosity, $x$ the spatial coordinate, and $t$ the time. Given the
initial condition $u(x, 0) = -\sin(\pi x)$ in a domain $x \in [-1, 1]$ and the boundary condition $u(\pm 1, t) = 0$
for $t \in [0, 1]$, the PDE we would like to solve is Eq. (11) with a viscosity of $\nu = 0.03/\pi$.
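For this problem, the residual entering $\mathrm{MSE}_R$ is $R(x,t) = u_t + u\,u_x - \nu\,u_{xx}$; a short sketch of its evaluation by automatic differentiation (assuming a model that maps $(x, t)$ to $u$, as in the earlier sketch) is:

import torch

def burgers_residual(model, x, t, nu=0.03 / torch.pi):
    """Residual R(x, t) = u_t + u*u_x - nu*u_xx of Eq. (11), via automatic differentiation."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(x, t)
    def d(out, inp):
        # first derivative of `out` with respect to `inp`
        return torch.autograd.grad(out, inp, torch.ones_like(out), create_graph=True)[0]
    u_t, u_x = d(u, t), d(u, x)
    u_xx = d(u_x, x)
    return u_t + u * u_x - nu * u_xx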
In the PPINN, the temporal domain t ∈ [0, 1] is decomposed into 10 uniform subdomains. Each
subdomain has a time length ∆t = 0.1. The simplified PDE for the CG PINN is also a Burgers equation,
which uses the same initial and boundary conditions but with a larger viscosity νc = 0.05/π. It is well
known that the Burgers equation with a small viscosity develops steep gradients in its solution
as time evolves. The increased viscosity leads to a smoother solution, which can be captured
at a much lower computational cost. Here, we use the same NN for the CG and fine PINNs in
each subdomain, i.e., 3 hidden layers with 20 neurons per layer. The learning rates are also set to
be the same, i.e., $10^{-3}$. Instead of using a single optimizer as in the previous cases, here we use two
optimization stages: we first train the PINNs using the Adam optimizer (first-order)
until the loss is less than $10^{-3}$, and then employ the L-BFGS-B method to further train the
NNs. L-BFGS is a quasi-Newton approach with super-linear convergence that can
enhance the convergence of the training [1].
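A sketch of this two-stage training (using torch.optim.LBFGS as a stand-in for the L-BFGS-B optimizer used in the paper, and a hypothetical loss_fn returning the total MSE of one PINN) could look like:

import torch

def train_two_stage(model, loss_fn, adam_tol=1e-3, max_adam_epochs=200_000,
                    lr=1e-3, lbfgs_iters=500):
    # Stage 1: first-order Adam until the loss drops below the threshold.
    adam = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_adam_epochs):
        loss = loss_fn(model)
        if loss.item() < adam_tol:
            break
        adam.zero_grad(); loss.backward(); adam.step()
    # Stage 2: quasi-Newton refinement (L-BFGS; the paper uses L-BFGS-B).
    lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=lbfgs_iters)
    def closure():
        lbfgs.zero_grad()
        l = loss_fn(model)
        l.backward()
        return l
    lbfgs.step(closure)
    return model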
For the CG PINN in each subdomain, we use 300 randomly sampled residual points to compute
$\mathrm{MSE}_R$, while 1,500 random residual points are employed in the fine PINN for each subdomain. In
addition, 100 uniformly distributed points are employed to compute $\mathrm{MSE}_{IC}$ for each subdomain,
and 10 randomly sampled points are used to compute $\mathrm{MSE}_{BC}$ in both the CG and fine PINNs.
For this particular case, it takes only 2 iterations to meet the convergence criterion, i.e., Etol = 1%.
The distributions of $u$ at each iteration are plotted in Fig. 5. As shown in Fig. 5(a), the solution
from the fine PINNs ($F(u|_{k=0})$) differs from the exact one, but the discrepancy is observed to be
small. It is also observed that the solution from the CG PINN is smoother than that from the fine
PINNs, especially at large times, e.g., $t = 0.9$. In addition, the velocity at the interface
between two adjacent subdomains is not continuous, due to the inaccurate initial condition for each
subdomain at the first iteration. At the second iteration (Fig. 5(b)), the solution from the Refinement
step, i.e., $u|_{k=2}$, shows little difference from the exact one, which confirms the effectiveness of the PPINN.
Moreover, the discontinuity between two adjacent subdomains is significantly reduced. Finally, it is
also interesting that the number of training steps for each subdomain ranges from ten thousand to a
hundred thousand at the first iteration, but decreases to a few hundred at the second iteration. Similar
results are reported and analyzed in Sec. 3.1 and will not be repeated here.
To test the flexibility of the PPINN, we further employ two much larger viscosities in the coarse
solver, i.e. νc = 0.09/π and 0.12/π, which are 3× and 4× the exact viscosity, respectively. All the
parameters (e.g., the architecture of the PINN, the number of residual points, etc.) in these two cases
are kept the same as the case with νc = 0.05/π. As shown in Table 3, it is interesting to find that
the computational errors for these three cases are comparable, but the number of iterations increases
with the viscosity employed in the coarse solver. Hence, in order to obtain an optimum strategy in
selecting the CG solver we have to consider the trade-off between the accuracy that the CG solver
can obtain and the number of iterations required for convergence of the PPINN.
Table 3: The PPINN for solving the Burgers equation with different viscosities in the coarse solver.
νc represents the viscosity employed in the coarse solver.
Figure 5: Example of using the PPINN for solving the Burgers equation. $G(u(t)|_k)$ represents the
rough prediction generated by the CG PINN in the $(k+1)$-th iteration, while $F(u(t)|_k)$ is the solution
corrected in parallel by the fine PINNs. (a) Predictions after the first iteration ($k = 0$) at $t = 0.3$ and
$0.9$; (b) predictions after the second iteration ($k = 1$) at $t = 0.3$ and $0.9$.
where $l = 1$ is the length of the computational domain, $C$ is the solute concentration, $D$ is the diffusion
coefficient, and $A$ is a constant. Here $D$ depends on $C$ as follows:
$$D = D_0 \exp(RC), \qquad (15)$$
where $D_0 = 1 \times 10^{-2}$.
We first set $T = 1$, $A = 0.5511$, and $R = 1$ to validate the proposed algorithm for the two-
dimensional case. In the PPINN, we use PINNs for both the coarse and fine solvers. The time
domain is divided into 10 uniform subdomains with $\Delta t = 0.1$ for each subdomain. For the coarse
solver, the following simplified PDE, $\partial_t C = D_0 \nabla^2 C + 2A\sin(\pi x/l)\sin(\pi y/l)$, is solved with the same
initial and boundary conditions described in Eq. (12). The diffusion coefficient for the coarse solver is
about 1.7 times smaller than the maximum $D$ in the fine solver. The architecture of the NNs employed
in both the coarse and fine solvers for each subdomain is kept the same, i.e., 2 hidden layers with 20
neurons per layer. The learning rate as well as the optimization method are the same as
those applied in Sec. 3.2.
We employ 2,000 random residual points to compute the MSER in each subdomain for the coarse
solver. We also employ 1,000 random points to compute the MSEBC for each boundary, and 2,000
random points to compute the MSEIC . For each fine PINN, we use 5,000 random residual points
for MSER , while the numbers of training points for the boundary and initial conditions are kept the
same as those used in the coarse solver. It takes two iterations to meet the convergence criterion,
i.e. Etol = 1%. The PPINN solutions as well as the distribution of the diffusion coefficient for each
iteration at three representative times (i.e., t = 0.4, 0.8, and 1) are displayed in Figs. 6 and 7,
respectively. We see that the solution at t = 0.4 agrees well with the reference solution after the first
iteration, while the discrepancy between the PPINN and reference solutions increases with time,
which can be observed for t = 0.8 and 1 in the first row of Fig. 6. This is reasonable because the error
of the initial condition for each subdomain will increase with time. In addition, the solutions at the
three representative times agree quite well with the reference solutions after the second iteration, as
demonstrated in the second row of Fig. 6. All the above results again confirm the accuracy of the
present approach.

Figure 6: Example of using PPINN to solve the nonlinear diffusion-reaction equation. First row
(k = 0): first iteration; second row (k = 1): second iteration. Black solid line: reference solution,
obtained from the lattice Boltzmann method using a uniform grid of $x \times y \times t = 200 \times 200 \times 100$
[21]. Red dashed line: solution from the PPINN.

Figure 7: Distribution of the normalized diffusion coefficient ($D/D_0$). First row (k = 0): first iteration;
second row (k = 1): second iteration.
Table 4: Parameters used in the PPINN for modeling the two-dimensional diffusion-reaction system.
The NNs are first trained using the Adam optimizer with a learning rate 10−3 until the loss is less
than 10−3 , then we utilize the L-BFGS-B to further train the NNs [1].
Table 5: Speed-ups for using the PPINN with different coarse solvers to model the nonlinear
diffusion-reaction system. Ttotal denotes the total computational time taken by the PPINN, and S is
the speed-up ratio.
We further investigate the speed-up of the PPINN using the PINN as the coarse solver. Here $T = 10$,
$A = 0.1146$, and $R = 0.5$, and we use five different numbers of subdomains, i.e., 1, 10, 20, 40, and 50. The
parallel code is run on multiple CPUs (Intel Xeon E5-2670). The total number of residual points
in the coarse solver is 20,000, which are uniformly divided among the $N$ subdomains. The number of
training points for the boundary condition on each boundary is 10,000, which are also uniformly
assigned to the subdomains. In addition, 2,000 randomly sampled points are employed for the
initial condition. The architecture and other parameters (e.g., learning rate, optimizer, etc.) of the
CG PINN in each subdomain are the same as those of the fine PINN, and are displayed in Table 4. The
speed-up ratios for the representative cases are shown in Table 5. We notice that the speed-up does
not increase monotonically with the number of subdomains; on the contrary, it decreases
with the number of subdomains. This result suggests that the cost of the CG PINN is related not only
to the amount of training data but also to the number of hyperparameters. The total number of
hyperparameters in the CG PINNs increases with the number of subdomains, causing the
walltime of the PPINN to increase with the number of subdomains.
To validate the above hypothesis, we replace the PINN with the finite difference method [22]
(FDM; grid size $x \times y = 20 \times 20$, time step $\delta t = 0.05$) for the coarse solver, which is much more
efficient than the PINN. The walltime for the FDM in each subdomain is negligible compared to that of the
fine PINN. As shown in Table 5, the speed-ups are now proportional to the number of subdomains,
as expected.
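To make the cost difference concrete, a minimal explicit finite-difference step for the simplified PDE $\partial_t C = D_0 \nabla^2 C + 2A\sin(\pi x/l)\sin(\pi y/l)$ on the 20 × 20 grid with $\delta t = 0.05$ is sketched below; the explicit scheme and the homogeneous Dirichlet boundary values are assumptions for illustration only (the paper's FDM follows [22], whose exact scheme and boundary treatment are not reproduced here).

import numpy as np

def coarse_fdm_step(C, D0=1e-2, A=0.1146, l=1.0, dt=0.05):
    """One explicit step of dC/dt = D0*Lap(C) + 2A*sin(pi*x/l)*sin(pi*y/l) on the
    n x n grid implied by C; homogeneous Dirichlet boundary values are assumed."""
    n = C.shape[0]
    dx = l / (n - 1)
    x = np.linspace(0.0, l, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    source = 2.0 * A * np.sin(np.pi * X / l) * np.sin(np.pi * Y / l)
    lap = np.zeros_like(C)
    lap[1:-1, 1:-1] = (C[2:, 1:-1] + C[:-2, 1:-1] + C[1:-1, 2:] + C[1:-1, :-2]
                       - 4.0 * C[1:-1, 1:-1]) / dx**2
    C_new = C + dt * (D0 * lap + source)
    C_new[0, :] = C_new[-1, :] = C_new[:, 0] = C_new[:, -1] = 0.0  # assumed BCs
    return C_new

# e.g., advance one coarse subdomain of length 0.5 (10 steps of dt = 0.05)
C = np.zeros((20, 20))
for _ in range(10):
    C = coarse_fdm_step(C)

Each such step is a handful of array operations, which is why the coarse sweep becomes negligible relative to training the fine PINNs.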
with large spatial databases by drawing an analogy with multigrid/multiresolution methods [24], where
a cheap coarse-grained (CG) PINN would be constructed and used to supervise and connect the
PINN solutions of the sub-domains iteratively. Moreover, in this prediction-correction framework, the
CG PINN only provides a rough prediction of the solution, and the equations encoded in the CG
PINN can be different from the equations encoded in the fine PINNs. Therefore, PPINN is able
to tackle multi-fidelity modeling of inverse physical problems [25]. Both topics are interesting and
will be investigated in future work.
Acknowledgements
This work was supported by the DOE PhILMs project DE-SC0019453 and the DOE-BER grant DE-
SC0019434. This research was conducted using computational resources and services at the Center
for Computation and Visualization, Brown University. Z. Li would like to thank Dr. Zhiping Mao,
and Dr. Ansel L Blumers for helpful discussions.
References
[1] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learn-
ing framework for solving forward and inverse problems involving nonlinear partial differential
equations. J. Comput. Phys., 378:686–707, 2019.
[2] N. O. Hodas and P. Stinis. Doing the impossible: Why neural networks can be trained at all.
Front. Psychol., 9(1185), 2018.
[4] J. Lew, D. A. Shah, S. Pati, S. Cattell, M. Zhang, A. Sandhupatla, C. Ng, N. Goli, M. D. Sinclair,
T. G. Rogers, and T. M. Aamodt. Analyzing machine learning workloads using a detailed GPU
simulator. In 2019 IEEE International Symposium on Performance Analysis of Systems and Software
(ISPASS), pages 151–152, 2019.
[5] Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer. Fast deep neural network training on
distributed systems and cloud TPUs. IEEE Transactions on Parallel and Distributed Systems, pages
1–14, 2019.
[6] L. Bottou, F. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning.
SIAM Review, 60(2):223–311, 2018.
[7] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
[8] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and
stochastic optimization. J. Mach. Learn. Res., 12(Jul):2121–2159, 2011.
[9] C. Michoski, M. Milosavljevic, T. Oliver, and D. Hatch. Solving irregular and data-enriched
differential equations using deep neural networks. arXiv preprint arXiv:1905.04351, 2019.
[10] N. A. K. Doan, W. Polifke, and L. Magri. Physics-informed echo state networks for chaotic sys-
tems forecasting. In ICCS 2019 - International Conference on Computational Science, Faro, Portugal,
2019.
[14] D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis. Quantifying total uncertainty in physics-
informed neural networks for solving forward and inverse stochastic problems. arXiv preprint
arXiv:1809.08327, 2018.
[15] D. Zhang, L. Guo, and G. E. Karniadakis. Learning in modal space: Solving time-dependent
stochastic PDEs using physics-informed neural networks. arXiv preprint arXiv:1905.01205, 2019.
[17] G. Pang, L. Lu, and G. E. Karniadakis. fPINNs: Fractional physics-informed neural networks.
arXiv preprint arXiv:1811.08967, 2018.
[18] Y. Maday and G. Turinici. A parareal in time procedure for the control of partial differential
equations. C. R. Math., 335(4):387–392, 2002.
[19] A. Blumers, Z. Li, and G. E. Karniadakis. Supervised parallel-in-time algorithm for long-time
Lagrangian simulations of stochastic dynamics: Application to hydrodynamics. J. Comput. Phys.,
393:214–228, 2019.
[20] R. Livni, S. Shalev-Shwartz, and O. Shamir. On the computational efficiency of training neural
networks. In Advances in neural information processing systems, pages 855–863, 2014.
[21] X. Meng and Z. Guo. Localized lattice Boltzmann equation model for simulating miscible viscous
displacement in porous media. Int. J. Heat Mass Tran., 100:767–778, 2016.
[22] M. M. Meerschaert, H.-P. Scheffler, and C. Tadjeran. Finite difference methods for two-
dimensional fractional dispersion equation. J. Comp. Phys., 211(1):249–261, 2006.
[23] J. Sanders and E. Kandrot. CUDA by example: an introduction to general-purpose GPU programming.
Addison-Wesley Professional, 2010.
[24] G. Beylkin and N. Coult. A multiresolution strategy for reduction of elliptic PDEs and eigenvalue
problems. Appl. Comput. Harmon. Anal., 5(2):129–155, 1998.
[25] X. Meng and G. E. Karniadakis. A composite neural network that learns from multi-fidelity
data: Application to function approximation and inverse PDE problems. arXiv preprint
arXiv:1903.00104, 2019.