


PPINN: Parareal Physics-Informed Neural Network for
time-dependent PDEs

Xuhui Meng1∗ , Zhen Li2∗, Dongkun Zhang1 and George Em Karniadakis1†



1 Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
2 Department of Mechanical Engineering, Clemson University, Clemson, SC 29634, USA

∗ The first two authors contributed equally to this work.
† Corresponding Email: george [email protected]

September 24, 2019

Abstract
Physics-informed neural networks (PINNs) encode physical conservation laws and prior physical
knowledge into the neural networks, ensuring the correct physics is represented accurately while
alleviating the need for supervised learning to a great degree [1]. While effective for relatively
short-term time integration, when long time integration of the time-dependent PDEs is sought,
the time-space domain may become arbitrarily large and hence training of the neural network
may become prohibitively expensive. To this end, we develop a parareal physics-informed neural
network (PPINN), hence decomposing a long-time problem into many independent short-time
problems supervised by an inexpensive/fast coarse-grained (CG) solver. In particular, the serial CG
solver is designed to provide approximate predictions of the solution at discrete times, while many
fine PINNs are initiated simultaneously to correct the solution iteratively. There is a two-fold benefit
from training PINNs with small-data sets rather than working on a large-data set directly, i.e.,
training of individual PINNs with small-data is much faster, while training the fine PINNs can be
readily parallelized. Consequently, compared to the original PINN approach, the proposed PPINN
approach may achieve a significant speedup for long-time integration of PDEs, assuming that the
CG solver is fast and can provide reasonable predictions of the solution, hence aiding the PPINN
solution to converge in just a few iterations. To investigate the PPINN performance on solving
time-dependent PDEs, we first apply the PPINN to solve the Burgers equation, and subsequently
we apply the PPINN to solve a two-dimensional nonlinear diffusion-reaction equation. Our
results demonstrate that PPINNs converge in a couple of iterations with significant speed-ups
proportional to the number of time-subdomains employed.

1 Introduction
At the cost of a relatively expensive training process, deep neural networks (DNNs)
provide a powerful approach for exploring hidden correlations in massive data, correlations that in many
cases cannot be uncovered by manual human review [2]. In the past decade, the large computational
cost for training DNN has been mitigated by a number of advances, including high-performance
computers [3], graphics processing units (GPUs) [4], tensor processing units (TPUs) [5], and fast large-
scale optimization schemes [6], e.g., adaptive moment estimation (Adam) [7] and the adaptive gradient
algorithm (AdaGrad) [8]. In many instances of modeling physical systems, physical invariants, e.g.,
momentum and energy conservation laws, can be built into the learning algorithms in the context
of DNN and their variants [9–11]. This leads to a physics-informed neural network (PINN), where
physical conservation laws and prior physical knowledge are encoded into the neural networks [1,12].
Consequently, the PINN model relies partially on the data and partially on the physics described by
partial differential equations (PDEs).
Different from traditional PDE solvers, although a PDE is encoded in the neural network, the
PINN does not need to discretize the PDE or employ complicated numerical algorithms to solve the
equations. Instead, PINNs take advantage of the automatic differentiation employed in the backward
propagation to represent all differential operators in a PDE, and of the training of a neural network to
perform a nonlinear mapping from input variables to output quantities by minimizing a loss function.
In this respect, the PINN model is a grid-free approach as no mesh is needed for solving the equations.
All the complexities of solving a physical problem are transferred into the optimization/training
stage of the neural network. Consequently, a PINN is able to unify the formulation and generalize
the procedure of solving physical problems governed by PDEs regardless of the structure of the
equations. Figure 1 graphically describes the structure of the PINN approach [1], where the loss
function of the PINN contains a mismatch in the given data on the state variables or boundary condition
(BC) and initial condition (IC), i.e., $MSE_{\{u,BC,IC\}} = N_u^{-1}\sum \|u(x,t) - u^\star\|^2$ with $u^\star$ being the given data,
combined with the residual of the PDE computed on a set of random points in the time-space
domain, i.e., $MSE_R = N_R^{-1}\sum \|R(x,t)\|^2$. Then, the PINN can be trained by minimizing the total loss
$MSE = MSE_{\{u,BC,IC\}} + MSE_R$. For solving forward problems, the first term represents a mismatch of
the NN output $u(x,t)$ from the boundary and/or initial conditions, i.e., $MSE_{BC,IC}$. For solving inverse
problems, the first term considers a mismatch of the NN output $u(x,t)$ from additional data sets, i.e., $MSE_u$.
In general, solving a physical problem involving PDEs with a PINN consists of three steps:
Step 1. Define the PINN architecture.
Step 2. Define the loss function $MSE = MSE_{\{u,BC,IC\}} + MSE_R$.
Step 3. Train the PINN using an appropriate optimizer, e.g., Adam [7] or AdaGrad [8] (a minimal sketch of Steps 2 and 3 is given below).
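To make Steps 2 and 3 concrete, the following is a minimal sketch of how the composite loss can be assembled with automatic differentiation. It assumes PyTorch, an illustrative two-input network, and a placeholder residual operator; it is not the exact setup used in this paper.

```python
import torch

# Illustrative network u_theta(x, t); widths are placeholders, not those of the paper.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1))

def pde_residual(x, t):
    """R(x, t) for a placeholder PDE u_t + u u_x = 0, built via automatic differentiation."""
    u = net(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    return u_t + u * u_x

def total_loss(x_d, t_d, u_star, x_r, t_r):
    """Step 2: MSE_{u,BC,IC} on the given data plus MSE_R on the residual points."""
    mse_data = torch.mean((net(torch.cat([x_d, t_d], dim=1)) - u_star) ** 2)
    mse_res = torch.mean(pde_residual(x_r, t_r) ** 2)
    return mse_data + mse_res

# Step 3: minimize the total loss with Adam (residual points need requires_grad=True).
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
```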
Recently, PINNs and their variants have been successfully applied to solve both forward and inverse
problems involving PDEs. Examples include the Navier-Stokes and the KdV equations [1], stochastic
PDEs [13–16], and fractional PDEs [17].
In modeling problems involving long-time integration of PDEs, the large number of spatio-
temporal degrees of freedom leads to a large data set for training the PINN, so that training a single
PINN over the entire time domain may become computationally prohibitive. To this
end, we propose a parareal physics-informed neural network (PPINN) to split one long-time problem
into many independent short-time problems supervised by an inexpensive/fast coarse-grained (CG)
solver, which is inspired by the original parareal algorithm [18] and a more recent supervised parallel-
in-time algorithm [19]. Because the computational cost of training a DNN increases fast with the
size of data set [20], this PPINN framework is able to maximize the benefit of high computational
efficiency of training a neural network with small-data sets. More specifically, there is a two-fold
benefit from training PINNs with small-data sets rather than working on a large-data set directly,
i.e., training of individual PINNs with small-data is much faster, while training of fine PINNs
can be readily parallelized on multiple GPU-CPU cores. Consequently, compared to the original
PINN approach, the proposed PPINN approach may achieve a significant speed-up for solving
long-time physical problems; this will be verified with benchmark tests for one-dimensional and
two-dimensional nonlinear problems. This favorable speed-up will depend on a good supervisor
that will be represented by a coarse-grained (CG) solver expected to provide reasonable accuracy.

Figure 1: Schematic of a physics-informed neural network (PINN), where the loss function of the PINN
contains a mismatch in the given data on the state variables or boundary and initial conditions, i.e.,
$MSE_{\{u,BC,IC\}} = N_u^{-1}\sum \|u(x,t) - u^\star\|^2$, and the residual of the PDE on a set of random points in the
time-space domain, i.e., $MSE_R = N_R^{-1}\sum \|R(x,t)\|^2$. The hyperparameters of the PINN can be learned by
minimizing the total loss $MSE = MSE_{\{u,BC,IC\}} + MSE_R$.

The remainder of this paper is organized as follows. In Section 2 we describe the details of the
PPINN framework and its implementation. In Section 3 we first present a pedagogical example to
demonstrate accuracy and convergence for a one-dimensional time-dependent problem. Subse-
quently, we apply PPINN to solve a two-dimensional nonlinear time-dependent problem, where we
demonstrate the speed-up of PPINN. Finally, we end the paper with a brief summary and discussion
in Section 4.

2 Parareal PINN
2.1 Methodology
For a time-dependent problem involving long-time integration of PDEs for t ∈ [0, T], instead of
solving this problem directly in one single time domain, PPINN splits [0, T] into N sub-domains with
equal length $\Delta T = T/N$. Then, PPINN employs two propagators, i.e., a serial CG solver represented
by $\mathcal{G}(u_i^k)$ in Algorithm 1, and $N$ fine PINNs computed in parallel, represented by $\mathcal{F}(u_i^k)$ in Algorithm 1.
Here, $u_i^k$ denotes the approximation to the exact solution at time $t_i$ in the $k$-th iteration. Because the CG
solver is serial in time and fine PINNs run in parallel, to have the optimal computational efficiency,
we encode a simplified PDE (sPDE) into the CG solver as a prediction propagator while the true PDE
is encoded in fine PINNs as the correction propagator. Using this prediction-correction strategy, we
expect the PPINN solution to converge to the true solution after a few iterations.
The details of the PPINN approach are displayed in Algorithm 1 and Fig. 2 and are explained
step by step in the following. Firstly, we simplify the PDE to be solved by the CG solver; for example,
we can replace the nonlinear coefficient in the diffusion equation shown in Sec. 3.3 with a constant to
remove the variable/multiscale coefficients. We can use a CG PINN as the fast CG solver,
but we can also explore standard fast finite-difference solvers. Secondly, the CG PINN is employed
to solve the sPDE serially for the entire time domain to obtain an initial solution. Because we can
use fewer residual points as well as smaller neural networks to solve the sPDE rather
than the original PDE, the computational cost of the CG PINN can be significantly reduced. Thirdly,
we decompose the time domain into $N$ subdomains. We assume that $u_i^k$ is known for $t_k \le t_i \le t_N$
(including $k = 0$, i.e., the initial iteration), which is employed as the initial condition to run the $N$
fine PINNs in parallel. Once the fine solutions at all $t_i$ are obtained, we can compute the discrepancy
between the coarse and fine solutions at $t_i$ as shown in Step 3(b) of Algorithm 1. We then run the CG
PINN serially to update the solution $u$ at each interface between two adjacent subdomains, i.e., $u_{i+1}^{k+1}$
(Prediction and Refinement in Algorithm 1). Step 3 of Algorithm 1 is performed iteratively until the
following criterion is satisfied
$$E = \sqrt{\frac{\sum_{i=0}^{N-1} \|u_i^{k+1} - u_i^k\|^2}{\sum_{i=0}^{N-1} \|u_i^{k+1}\|^2}} < E_{tol}, \qquad (1)$$

where $E_{tol}$ is a user-defined tolerance, which is set as 1% in the present study.

Algorithm 1: PPINN algorithm

1: Model reduction: Specify a simplified PDE for the fast CG solver to supervise the fine PINNs.
2: Initialization: Solve the sPDE with the CG solver for the initial iteration: $u_{i+1}^0 = \mathcal{G}(u_i^0)$ for all $0 \le t_i \le t_N$.
3: Assume $\{u_i^k\}$ is known for $t_k \le t_i \le t_N$ and $k \ge 0$:
   Correction:
   (a) Advance with the fine PINNs in parallel: $\mathcal{F}(u_i^k)$ for all $t_k \le t_i \le t_{N-1}$.
   (b) Correction: $\delta_i^k = \mathcal{F}(u_i^k) - \mathcal{G}(u_i^k)$ for all $t_k \le t_i \le t_{N-1}$.
   Prediction: Advance with the fast CG solver in serial: $\mathcal{G}(u_i^{k+1})$ for all $t_{k+1} \le t_i \le t_{N-1}$.
   Refinement: Combine the correction and prediction terms: $u_{i+1}^{k+1} = \mathcal{G}(u_i^{k+1}) + \mathcal{F}(u_i^k) - \mathcal{G}(u_i^k)$.
4: Repeat Step 3 to compute $u_i^{k+2}$ for all $t_1 \le t_i \le t_N$ until the convergence criterion is met.
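The prediction-correction structure of Algorithm 1 can be summarized in a short driver loop. The sketch below is schematic: `coarse_solve` and `fine_solve_parallel` are hypothetical placeholders (not routines from this paper) standing in for one CG subdomain solve and the parallel training of the fine PINNs, and each subdomain solution is represented simply by its value at the interface.

```python
import numpy as np

def ppinn_iterations(u0, N, coarse_solve, fine_solve_parallel, tol=1e-2, max_iter=20):
    """Parareal-style prediction-correction loop (sketch of Algorithm 1).

    coarse_solve(u_start)    -> coarse solution at the end of one subdomain
    fine_solve_parallel(ics) -> list of fine solutions, one per subdomain,
                                each trained independently from its IC
    """
    # Step 2 (initialization): serial coarse sweep over all N subdomains.
    u = [u0]
    for i in range(N):
        u.append(coarse_solve(u[i]))

    for k in range(max_iter):
        # Step 3 (correction): fine PINNs in parallel, then discrepancies delta_i^k.
        F = fine_solve_parallel(u[:-1])
        delta = [F[i] - coarse_solve(u[i]) for i in range(N)]
        # Step 3 (prediction + refinement): serial coarse sweep plus correction.
        u_new = [u0]
        for i in range(N):
            u_new.append(coarse_solve(u_new[i]) + delta[i])
        # Convergence criterion, Eq. (1).
        num = sum(np.linalg.norm(u_new[i] - u[i]) ** 2 for i in range(N))
        den = sum(np.linalg.norm(u_new[i]) ** 2 for i in range(N))
        u = u_new
        if np.sqrt(num / den) < tol:
            break
    return u
```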

2.2 Speed-up analysis


The walltime for the PPINN can be expressed as $T_{PPINN} = T_c^0 + \sum_{k=1}^{K}\left(T_c^k + T_f^k\right)$, where $K$ is the total
number of iterations, $T_c^0$ represents the walltime taken by the CG solver for the initialization, while $T_c^k$
and $T_f^k$ denote the walltimes taken by the coarse and fine propagators at the $k$-th iteration, respectively.
Let $\tau_c^k$ and $\tau_f^k$ be the walltimes used by the CG solver and a fine PINN for one subdomain at the $k$-th iteration;
then $T_c^k$ and $T_f^k$ can be expressed as

$$T_c^k = N \cdot \tau_c^k, \qquad T_f^k = \tau_f^k, \qquad (2)$$

Figure 2: Overview of the parareal physics-informed neural network (PPINN) algorithm. Left:
Schematic of the PPINN, where a long-time problem (PINN with full-sized data) is split into many
independent short-time problems (PINN with small-sized data) guided by a fast coarse-grained (CG)
solver. Right: A graphical representation of the parallel-in-time algorithm used in the PPINN, in which a
cheap serial CG solver is used to make an approximate prediction of the solution $\mathcal{G}(u_i^k)$, while many
fine PINNs are run in parallel to obtain $\mathcal{F}(u_i^k)$ and correct the solution iteratively. Here, $k$
is the iteration index, and $i$ is the index of the time subdomain.

where $N$ is the number of subdomains. Therefore, the walltime for the PPINN is

$$T_{PPINN} = T_c^0 + \sum_{k=1}^{K}\left(N \cdot \tau_c^k + \tau_f^k\right) = N \cdot \tau_c^0 + \sum_{k=1}^{K}\left(N \cdot \tau_c^k + \tau_f^k\right). \qquad (3)$$

Furthermore, the walltime for a fine PINN to solve the same PDE in serial is expressed as $T_{PINN} = N \cdot T_f^1$.
To this end, we can obtain the speed-up ratio of the PPINN as

$$S = \frac{T_{PINN}}{T_{PPINN}} = \frac{N \cdot T_f^1}{N \cdot \tau_c^0 + \sum_{k=1}^{K}\left(N \cdot \tau_c^k + \tau_f^k\right)}, \qquad (4)$$

which can be rewritten as

$$S = \frac{N \cdot \tau_f^1}{N \cdot \tau_c^0 + \tau_f^1 + N \cdot K \cdot \tau_c^k + (K-1) \cdot \tau_f^k}. \qquad (5)$$

In addition, considering that the solution in each subdomain for two adjacent iterations ($k \ge 2$) does
not change dramatically, the training process converges much faster for $k \ge 2$ than for $k = 1$ in
each fine PINN, i.e., $\tau_f^k \ll \tau_f^1$. Therefore, the lower bound for $S$ can be expressed as

$$S_{min} = \frac{N \cdot \tau_f^1}{N \cdot \tau_c^0 + N \cdot K \cdot \tau_c^k + K \cdot \tau_f^1}. \qquad (6)$$

This shows that $S$ increases with $N$ if $\tau_c^0 \ll \tau_f^1$, suggesting that the more subdomains we employ, the
larger the speed-up ratio for the PPINN.
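For intuition, plugging made-up timings into Eq. (4) shows how cheap coarse sweeps and fast later iterations push the speed-up past the nominal $N/K$ line; the numbers below are illustrative, not measurements from this paper.

```python
# Illustrative (made-up) timings to evaluate Eq. (4); not measurements from the paper.
N, K = 40, 2                 # subdomains, PPINN iterations
tau_c = 0.1                  # coarse walltime per subdomain per iteration [s]
tau_f = [100.0, 10.0]        # fine-PINN walltime per iteration [s]; much cheaper after k = 1

T_pinn = N * tau_f[0]                                              # serial fine PINN
T_ppinn = N * tau_c + sum(N * tau_c + tau_f[k] for k in range(K))  # Eq. (3)
print(T_pinn / T_ppinn)      # ~32.8, exceeding the "linear" ratio N / K = 20
```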

3 Results
We first show two simple examples for a deterministic and a stochastic ordinary differential equation
(ODE) to explain the method in detail. We then present results for the Burgers equation and a
two-dimensional diffusion-reaction equation.

3.1 Pedagogical examples


We first present two pedagogical examples, i.e., a deterministic and a stochastic ODE, to demonstrate
the procedure of using PPINN to solve simple time-dependent problems; the main steps and key
features of PPINN can thus be easily understood.

3.1.1 Deterministic ODE


The deterministic ODE considered here reads as

$$\frac{du}{dt} = a + \omega\cos(\omega t), \qquad (7)$$

where the two parameters are $a = 1$ and $\omega = \pi/2$. Given an initial condition $u(0) = 0$, the exact
solution of this ODE is $u(t) = t + \sin(\pi t/2)$. The length of the time domain we are interested in is $T = 10$.
In the PPINN approach, we decompose the time domain $t \in [0, 10]$ into 10 subdomains, with
length $\Delta t = 1$ for each subdomain. We use a simplified ODE (sODE) $du/dt = a$ with IC $u(0) = 0$ for
the CG solver, and use the exact ODE $du/dt = a + \omega\cos(\omega t)$ with IC $u(0) = 0$ for the ten fine PINNs.
In particular, we first use a CG PINN to act as the fast solver. Let $[N_I] + [N_H] \times D + [N_O]$ represent the
architecture of a DNN, where $N_I$ is the width of the input layer, $N_H$ is the width of the hidden layers, $D$
is the depth of the hidden layers and $N_O$ is the width of the output layer. The CG PINN is constructed
as a $[1] + [4] \times 2 + [1]$ DNN encoded with the sODE $du/dt = a$ and IC $u(0) = 0$, while each fine PINN
is constructed as a $[1] + [16] \times 3 + [1]$ DNN encoded with the exact ODE $du/dt = a + \omega\cos(\omega t)$ and IC
$u(0) = 0$.
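As an illustration of what each fine PINN encodes for this example, the sketch below writes the residual of Eq. (7) together with the initial-condition mismatch for one subdomain, using the $[1] + [16] \times 3 + [1]$ architecture mentioned above; PyTorch is assumed and the training loop is omitted.

```python
# Sketch of one fine PINN for the ODE du/dt = a + w*cos(w*t) on a subdomain [t_i, t_{i+1}].
import math
import torch

a, w = 1.0, math.pi / 2
net = torch.nn.Sequential(
    torch.nn.Linear(1, 16), torch.nn.Tanh(),
    torch.nn.Linear(16, 16), torch.nn.Tanh(),
    torch.nn.Linear(16, 16), torch.nn.Tanh(),
    torch.nn.Linear(16, 1))

def loss(t_res, t_ic, u_ic):
    """MSE_R on the residual points plus MSE_IC on the subdomain's initial condition."""
    t = t_res.requires_grad_(True)
    u = net(t)
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    mse_r = torch.mean((u_t - (a + w * torch.cos(w * t))) ** 2)
    mse_ic = torch.mean((net(t_ic) - u_ic) ** 2)
    return mse_r + mse_ic
```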
In the first iteration, we train the CG PINN for 5,000 epochs using the Adam algorithm with
a learning rate of $\alpha = 0.001$, and obtain a rough prediction of the solution $\mathcal{G}(u(t))|_{k=0}$, as shown in
Fig. 3(a1). Let $t_i = i$ for $i = 0, 1, ..., 9$ be the starting time of each fine PINN in chunk $i+1$. To compute
the loss function of the CG PINN, 20 residual points uniformly distributed in each subdomain $[t_i, t_{i+1}]$
are used to compute the residual $MSE_R$. Subsequently, with the CG PINN solutions, the IC of each
fine PINN is given by the predicted value $\mathcal{G}(u(t))|_{k=0}$ at $t = t_i$, except for the first subdomain. Similar
to the training of the CG PINN, we train ten individual fine PINNs in parallel using the Adam algorithm
with a learning rate of $\alpha = 0.001$. We accept the PINN solution once the loss function of each fine PINN,
$loss = MSE_R + MSE_{IC}$, drops below $10^{-6}$, wherein 101 residual points uniformly distributed in $[t_i, t_{i+1}]$ are used
to compute the residual $MSE_R$. We can observe in Fig. 3(a1) that the fine PINN solutions correctly
capture the shape of the exact solution, but their magnitudes significantly deviate from the exact
solution because of the poor quality of the IC given by $\mathcal{G}(u(t))|_{k=0}$.
The deviation of the PPINN solution from the exact solution is quantified by the relative $l_2$ error norm

$$l_2 = \sqrt{\frac{\sum_p (u_p^k - \hat{u}_p)^2}{\sum_p \hat{u}_p^2}}, \qquad (8)$$

where $p$ denotes the index of the residual points. The PPINN solution after the first iteration presents
obvious deviations, as shown in Fig. 3(a1). In the second iteration, given the solutions of $\mathcal{G}(u_i)|_{k=0}$
and $\mathcal{F}(u_i)|_{k=0}$, we update the ICs for the CG PINN following Step 3 of Algorithm 1. Then, we train the CG PINN again for 5,000 epochs
using a learning rate $\alpha = 0.001$ and obtain the solutions $\mathcal{G}(u(t))|_{k=1}$ for chunks 2 to 10, as shown in
Fig. 3(a2). The IC for the fine PINN of the second chunk is given by $\tilde{u}(t = t_1) = \mathcal{F}(u_1)|_{k=1}$, and the ICs of the
fine PINNs of chunks 3 to 10 are given by a combination of $\mathcal{G}(u(t))|_{k=0}$, $\mathcal{F}(u(t))|_{k=0}$ and $\mathcal{G}(u(t))|_{k=1}$
at $t = t_{i=2,3,...,9}$, i.e., $\tilde{u}(t = t_i) = \mathcal{G}(u_i)|_{k=1} + [\mathcal{F}(u_i)|_{k=0} - \mathcal{G}(u_i)|_{k=0}]$. With these ICs, we train the nine
individual fine PINNs (chunks 2 to 10) in parallel using the Adam algorithm with a learning rate of
$\alpha = 0.001$ until the loss function for each fine PINN drops below $10^{-6}$. We find that the fine PINNs in
the second iteration converge much faster than in the first iteration, with an average of 4,836 epochs
compared to 11,524 in the first iteration. The accelerated training benefits from the results of the
training performed in the previous iteration. The PPINN solutions for each chunk after the second
iteration are presented in Fig. 3(a2), showing a significant improvement of the solution with a relative
error $l_2 = 9.97\%$. Using the same method, we perform a third iteration, and the PPINN solutions
converge to the exact solution with a relative error $l_2 = 0.08\%$, as shown in Fig. 3(a3).
We proceed to investigate the computational efficiency of the PPINN. As mentioned in Sec. 2.2,
we may obtain a speed-up ratio, which grows linearly with the number of subdomains if the coarse
solver is efficient enough. We first test the speed-ups for the PPINN using a PINN as the coarse
solver. Here we test four different subdomain decompositions, i.e. 10, 20, 40, and 80. For the coarse
solver, we assign one PINN (CG PINN, [1] + [4] × 2 + [1]) for each subdomain, and 10 randomly
sampled residual points are used in each subdomain. Furthermore, we run all the CG PINNs serially
using one CPU (Intel Xeon E5-2670). For the fine solver, we again employ one PINN (fine PINN) for
each subdomain to solve Eq. (7), and each subdomain is assigned to one CPU (Intel Xeon E5-2670).
The total number of residual points is 400,000; these are randomly sampled over the entire time domain and then uniformly distributed across the subdomains.


Figure 3: A pedagogical example to illustrate the procedure of using the PPINN to solve a deterministic
ODE. Here a PINN is also employed for the CG solver. (a) The solutions of the CG PINN $\mathcal{G}(u(t))|_k$ and
the fine PINNs $\mathcal{F}(u(t))|_k$ in different iterations, plotted for (a1) $k = 0$, (a2) $k = 1$ and (a3) $k = 2$, for
the ODE $du/dt = 1 + \pi/2 \cdot \cos(\pi t/2)$. (b) Speed-ups for the PPINN using different coarse solvers,
i.e., the PINN (magenta dashed line with circles), the FDM (blue dashed line with plus signs), and the analytic solution
(red dashed line with triangles). $N$ denotes the number of subdomains. The linear speed-up ratio
is calculated as $N/K$. Black dashed line: $K = 3$; cyan dashed line: $K = 2$.

Meanwhile, the architecture of the
fine PINN in each subdomain is [1] + [20] × 2 + [1] for the first two cases (i.e., 10 and 20 subdomains),
which is then set as [1]+[10]×2+[1] for the last two cases (i.e., 40 and 80 subdomains). The speed-ups
are displayed in Fig. 3(b) (magenta dashed line), where we observe that the speed-up first increases
with $N$ for $N \le 20$ and then decreases as $N$ grows further. We further look into the details of the
computational time for this case. As shown in Table 1, we find that the computational time taken
by the CG PINNs increases with the number of subdomains. In particular, more than 90% of
the computational time is taken by the coarse solver when $N \ge 40$, indicating the inefficiency of the CG
PINN. To obtain satisfactory speed-ups using the PPINN, a more efficient coarse solver should be
utilized.

Subdomains (#) Iterations (#) NCG (#) TCG (s) Ttotal (s) S
1 - - - 1,793.0 -
10 2 10 62.4 407.4 4.4
20 2 5 189.2 275.1 6.5
40 3 4 617.8 685.2 2.6
80 3 4 2,424.6 2,453.0 0.73

Table 1: Speed-ups for the PPINN using a PINN as coarse solver to solve Eq. (7). NCG is the number
of residual points used for the CG PINN in each subdomain, TCG represents the computational time
taken by the coarse solver, and Ttotal denotes the total computational time taken by the PPINN.

Considering that the simplified ODE can be solved analytically, we can directly use the
analytic solution as the coarse solver, which has essentially no cost. In addition, all parameters in the fine
PINN are kept the same as the previously used ones. For validation, we again use 10 subdomains
in the PPINN to solve Eq. (7). The l2 relative error between the predicted and analytic solutions is
1.252×10−5 after two iterations, which confirms the accuracy of the PPINN. In addition, the speed-ups
for the four cases are displayed in Fig. 3(b) (red dashed line with triangle). It is interesting to find
that the PPINN can achieve a superlinear speed-up here. We further present the computational time
at each iteration in Table 2. We see that the computational time taken by the coarse solver is now
negligible compared to the total computational time. Since the fine PINN converges faster after the
first iteration, we can thus obtain a superlinear speed-up for the PPINN.
Instead of the analytic solution, we can also use other efficient numerical methods to serve as
the coarse solver. For demonstration purpose, we then present the results using the finite difference
method (FDM) as the coarse solver. The entire domain is discretized into 1,000 uniform elements,
which are then uniformly distributed to each subdomain. Similarly, we also use 10 subdomains for
validation. It also takes 2 iterations to converge, and the l2 relative error is 1.257 × 10−5 . In addition,
we can also obtain a superlinear speed-up which is quite similar to the case using the analytic solution
due to the efficiency of the FDM (Fig. 3(b) and Table 2).

3.1.2 Stochastic ODE


In addition to the deterministic ODE, we can also apply this idea to solve stochastic ODEs. Here, we
consider the following stochastic ODE

$$\frac{du}{dt} = \beta\left(-u + \beta\sin\left(\frac{\pi}{2}t\right) + \frac{\pi}{2}\cos\left(\frac{\pi}{2}t\right)\right), \quad t \in [0, T], \qquad (9)$$

Subdomains (#) Iterations (#) NCG (#) TCG (s) Ttotal (s) S
1 - - - 1,793.0 -
10 2 - < 0.05 295.8 6.1
PPINN (Analytic) 20 2 - < 0.05 71.4 25.1
40 2 - < 0.05 38.6 46.4
80 2 - < 0.05 29.0 61.6
1 - - - 1,793.0 -
10 2 100 < 0.05 298.7 6.0
PPINN (FDM) 20 2 50 < 0.05 81.6 22.0
40 2 40 < 0.05 38.5 46.5
80 2 12 < 0.05 28.8 62.2

Table 2: Walltimes for using the PPINN with different coarse solvers to solve Eq. (7). NCG is the
number of elements used for the FDM in each subdomain, TCG represents the computational time
taken by the coarse solver, and Ttotal denotes the total computational time taken by the PPINN.

where $T = 10$, $\beta = \beta_0 + \epsilon$, $\beta_0 = 0.1$, and $\epsilon$ is drawn from a normal distribution with zero mean and 0.05
standard deviation. In addition, the initial condition for Eq. (9) is $u(0) = 1$.
In the present PPINN, we employ a deterministic ODE for the coarse solver, namely

$$\frac{du}{dt} = -\beta_0 u, \quad t \in [t_0, T]. \qquad (10)$$

Given the initial condition $u(t_0) = u_0$, we can obtain the analytic solution of Eq. (10) as $u = u_0\exp(-\beta_0(t - t_0))$.
For the fine solver, we draw 100 samples of $\beta$ using the quasi-Monte Carlo
method, which are then solved by the fine PINNs. Similarly, we utilize three different methods for
the coarse solver, i.e. the PINN, FDM, and analytic solution.
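As a sketch of the sampling step, low-discrepancy points can be mapped through the normal inverse CDF to obtain the 100 values of β. SciPy's `qmc` module and a Halton sequence are assumed here, since the paper does not specify which quasi-Monte Carlo sequence was used.

```python
# Sketch of the quasi-Monte Carlo sampling of beta = beta_0 + eps, eps ~ N(0, 0.05^2).
import numpy as np
from scipy.stats import norm, qmc

beta0, sigma, n_samples = 0.1, 0.05, 100
sampler = qmc.Halton(d=1, scramble=True, seed=0)
u01 = sampler.random(n_samples)                  # low-discrepancy points in (0, 1)
eps = norm.ppf(u01, loc=0.0, scale=sigma)        # map to the normal distribution
betas = beta0 + eps.ravel()                      # one stochastic-ODE realization per beta

# Each beta defines an independent instance of Eq. (9), solved by its own set of fine PINNs.
```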
For validation purposes, we first decompose the time-domain into 10 uniform subdomains. For
the FDM, we discretize the whole domain into 1,000 uniform elements. For the fine PINNs, we
employ 400,000 randomly sampled residual points for the entire time domain, which are uniformly
distributed to all the subdomains. We employ one fine PINN for each subdomain to solve the ODE,
which has an architecture of [1] + [20] × 2 + [1]. Finally, the simplified ODE in each subdomain for
the coarse solver is solved serially, while the exact ODE in each subdomain for the fine solver is
solved in parallel. We illustrate the comparison between the predicted and exact solutions for two
representative values of $\beta$, i.e., $\beta = 0.108$ and $0.090$, in Fig. 4(a). We see that the predicted solutions converge
to the exact ones after two iterations, which confirms the effectiveness of the PPINN for solving
stochastic ODEs. The solutions from the PPINN with the other two different coarse solvers (i.e., the
PINN and analytic solution) also agree well with the reference solutions, but they are not presented
here. Furthermore, we also present the computational efficiency for the PPINN using four different
numbers of subdomains, i.e. 10, 20, 40 and 80 in Fig. 4(b). It is clear that the PPINN can still achieve
a superlinear speed-up if we use an efficient coarse solver such as FDM or analytic solution. The
speed-up for the PPINN with the PINN as coarse solver is similar to the results in Sec. 3.1.1, i.e., the
speed-up ratio first increases slightly with the number of subdomains for $N \le 20$ and then decreases
as $N$ grows. Finally, the speed-ups for the PPINN with the FDM coarse solver are almost the
same as those with the analytic-solution coarse solver, which are similar to the results in Sec. 3.1.1
and will not be discussed in detail here.
In summary, the PPINN can work for both deterministic and stochastic ODEs. Using
the PINN as the coarse solver guarantees the accuracy of the ODE solution.


Figure 4: A pedagogical example of using the PPINN to solve a stochastic ODE. (a) The solutions of the
CG solver $\mathcal{G}(u(t))|_k$ and the fine PINNs $\mathcal{F}(u(t))|_k$ in different iterations, plotted for (a1) $\beta = 0.108$ and
(a2) $\beta = 0.090$. The reference solution is $u(t) = \exp(-\beta t) + \beta\sin(\frac{\pi}{2}t)$ for each $\beta$. (b) Speed-ups for the
PPINN using different coarse solvers, i.e., the PINN (magenta dashed line with circles), the FDM (blue
dashed line with plus signs), and the analytic solution (red dashed line with triangles). The linear speed-up
ratio is calculated as $N/K$. Black dashed line: $K = 4$; cyan dashed line: $K = 2$. For the CG PINN,
we use 1,000 randomly sampled residual points in the whole time domain, which are uniformly
distributed over the subdomains. The architecture of the CG PINN for each subdomain is the same, i.e.,
$[1] + [10] \times 2 + [1]$. For the fine PINN, the architecture is $[1] + [20] \times 2 + [1]$ for the cases with 10 and
20 subdomains, and $[1] + [10] \times 2 + [1]$ for the cases with 40 and 80 subdomains.

However, the speed-up may decrease with the number of subdomains due to the inefficiency of the PINN as a coarse solver.
We can achieve both high accuracy and good speed-up by selecting a more efficient coarse solver, such as an analytic
solution or a finite difference method.

3.2 Burgers equation


We now consider the viscous Burgers equation

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu\frac{\partial^2 u}{\partial x^2}, \qquad (11)$$

which is a mathematical model for viscous flow, gas dynamics, and traffic flow, with $u$ denoting
the speed of the fluid, $\nu$ the kinematic viscosity, $x$ the spatial coordinate and $t$ the time. Given the
initial condition $u(x, 0) = -\sin(\pi x)$ in a domain $x \in [-1, 1]$ and the boundary condition $u(\pm 1, t) = 0$
for $t \in [0, 1]$, the PDE we would like to solve is Eq. (11) with a viscosity of $\nu = 0.03/\pi$.
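For reference, a sketch of the Burgers residual as it would be encoded in each fine PINN is given below; PyTorch is assumed, and `net`, which maps (x, t) to u, is illustrative.

```python
# Sketch of the Burgers residual R = u_t + u * u_x - nu * u_xx via nested autodiff
# (PyTorch assumed; `net` maps (x, t) -> u and is illustrative).
import math
import torch

nu = 0.03 / math.pi

def burgers_residual(net, x, t):
    x, t = x.requires_grad_(True), t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t + u * u_x - nu * u_xx
```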
In the PPINN, the temporal domain $t \in [0, 1]$ is decomposed into 10 uniform subdomains, each with
a time length $\Delta t = 0.1$. The simplified PDE for the CG PINN is also a Burgers equation, with the same
initial and boundary conditions but with a larger viscosity $\nu_c = 0.05/\pi$. It is well known that the Burgers
equation with a small viscosity develops steep gradients in its solution as time evolves. The increased
viscosity leads to a smoother solution, which can be captured at much lower computational cost. Here,
we use the same NN architecture for the CG and fine PINNs in each subdomain, i.e., 3 hidden layers with
20 neurons per layer. The learning rates are also set to be the same, i.e., $10^{-3}$. Instead of using a single
optimizer as in the previous cases, here we use two optimizers in sequence: we first train the PINNs using
the Adam optimizer (first-order convergence rate) until the loss is less than $10^{-3}$, and then employ the
L-BFGS-B method to further train the NNs. L-BFGS is a quasi-Newton approach with second-order
convergence that can enhance the convergence of the training [1].
For the CG PINN in each subdomain, we use 300 randomly sampled residual points to compute
$MSE_R$, while 1,500 random residual points are employed in the fine PINN for each subdomain. In
addition, 100 uniformly distributed points are employed to compute $MSE_{IC}$ for each subdomain,
and 10 randomly sampled points are used to compute $MSE_{BC}$ in both the CG and fine PINNs.
For this particular case, it takes only 2 iterations to meet the convergence criterion, i.e., $E_{tol} = 1\%$.
The distributions of $u$ at each iteration are plotted in Fig. 5. As shown in Fig. 5(a), the solution
from the fine PINNs ($\mathcal{F}(u)|_{k=0}$) differs from the exact one, but the discrepancy is observed to be
small. It is also observed that the solution from the CG PINN is smoother than that from the fine
PINNs, especially at later times, e.g., $t = 0.9$. In addition, the velocity at the interface
between two adjacent subdomains is not continuous, due to the inaccurate initial condition for each
subdomain at the first iteration. At the second iteration (Fig. 5(b)), the solution from the Refinement
step, i.e., $u|_{k=2}$, shows little difference from the exact one, which confirms the effectiveness of the PPINN.
Moreover, the discontinuity between two adjacent subdomains is significantly reduced. Finally, it is
also interesting to find that the number of training steps for each subdomain at the first iteration
ranges from ten thousand to a hundred thousand, but it decreases to a few hundred at the second iteration. Similar
results were reported and analyzed in Sec. 3.1 and are not repeated here.

Figure 5: Example of using the PPINN for solving the Burgers equation. $\mathcal{G}(u(t))|_k$ represents the
rough prediction generated by the CG PINN in the $(k+1)$-th iteration, while $\mathcal{F}(u(t))|_k$ is the solution
corrected in parallel by the fine PINNs. (a) Predictions after the first iteration ($k = 0$) at $t = 0.3$ and
$0.9$. (b) Predictions after the second iteration ($k = 1$) at $t = 0.3$ and $0.9$.
To test the flexibility of the PPINN, we further employ two much larger viscosities in the coarse
solver, i.e. νc = 0.09/π and 0.12/π, which are 3× and 4× the exact viscosity, respectively. All the
parameters (e.g., the architecture of the PINN, the number of residual points, etc.) in these two cases
are kept the same as the case with νc = 0.05/π. As shown in Table 3, it is interesting to find that
the computational errors for these three cases are comparable, but the number of iterations increases
with the viscosity employed in the coarse solver. Hence, in order to obtain an optimum strategy in
selecting the CG solver we have to consider the trade-off between the accuracy that the CG solver
can obtain and the number of iterations required for convergence of the PPINN.

νc 0.05/π 0.09/π 0.12/π


Iterations (#) 2 3 4
l2 (%) 0.13 0.19 0.22

Table 3: The PPINN for solving the Burgers equation with different viscosities in the coarse solver.
νc represents the viscosity employed in the coarse solver.

3.3 Diffusion-reaction equation


We consider the following two-dimensional diffusion-reaction equation

$$\partial_t C = \nabla\cdot(D\nabla C) + 2A\sin\left(\frac{\pi x}{l}\right)\sin\left(\frac{\pi y}{l}\right), \quad x, y \in [0, l], \; t \in [0, T], \qquad (12)$$
$$C(x, y, t = 0) = 0, \qquad (13)$$
$$C(x = 0, y, t) = C(x = l, y, t) = C(x, y = 0, t) = C(x, y = l, t) = 0, \qquad (14)$$


where l = 1 is the length of the computational domain, C is the solute concentration, D is the diffusion
coefficient, and A is a constant. Here D depends on C as follows:

D = D0 exp(RC), (15)

where D0 = 1 × 10−2 .
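To illustrate how the nonlinear diffusion term can be encoded in the fine PINNs, the sketch below assembles the residual of Eq. (12) with $D = D_0\exp(RC)$ by differentiating the fluxes $D C_x$ and $D C_y$ once more; PyTorch is assumed, `net` (mapping (x, y, t) to C) is illustrative, and the constants follow the validation case described next.

```python
# Sketch of the residual of Eq. (12) with the nonlinear coefficient D = D0 * exp(R * C),
# i.e. R_pde = C_t - div(D grad C) - 2A sin(pi x / l) sin(pi y / l).
# PyTorch assumed; `net` maps (x, y, t) -> C and is illustrative.
import math
import torch

D0, R, A, l = 1e-2, 1.0, 0.5511, 1.0

def dr_residual(net, x, y, t):
    x, y, t = x.requires_grad_(True), y.requires_grad_(True), t.requires_grad_(True)
    C = net(torch.cat([x, y, t], dim=1))
    ones = torch.ones_like(C)
    C_t = torch.autograd.grad(C, t, ones, create_graph=True)[0]
    C_x = torch.autograd.grad(C, x, ones, create_graph=True)[0]
    C_y = torch.autograd.grad(C, y, ones, create_graph=True)[0]
    D = D0 * torch.exp(R * C)
    # div(D grad C) = d/dx (D C_x) + d/dy (D C_y), again via automatic differentiation.
    flux_x, flux_y = D * C_x, D * C_y
    div = (torch.autograd.grad(flux_x, x, torch.ones_like(flux_x), create_graph=True)[0]
           + torch.autograd.grad(flux_y, y, torch.ones_like(flux_y), create_graph=True)[0])
    source = 2 * A * torch.sin(math.pi * x / l) * torch.sin(math.pi * y / l)
    return C_t - div - source
```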
We first set T = 1, A = 0.5511, and R = 1 to validate the proposed algorithm for the two-
dimensional case. In the PPINN, we use the PINNs for both the coarse and fine solver. The time
domain is divided into 10 uniform subdomains with ∆t = 0.1 for each subdomain. For the coarse
solver, the following simplified PDE ∂t C = D0 ∇2 C + 2A sin(πx/l) sin(πy/l) is solved with the same
initial and boundary conditions, Eqs. (13) and (14). The constant diffusion coefficient used in the coarse solver is
about 1.7 times smaller than the maximum $D$ in the fine solver. The architecture of the NNs employed
in both the coarse and fine solvers for each subdomain is kept the same, i.e., 2 hidden layers with 20
neurons per layer. The employed learning rate as well as the optimization method are the same as
those applied in Sec. 3.2.
We employ 2,000 random residual points to compute the MSER in each subdomain for the coarse
solver. We also employ 1,000 random points to compute the MSEBC for each boundary, and 2,000
random points to compute the MSEIC . For each fine PINN, we use 5,000 random residual points
for MSER , while the numbers of training points for the boundary and initial conditions are kept the
same as those used in the coarse solver. It takes two iterations to meet the convergence criterion,
i.e. Etol = 1%. The PPINN solutions as well as the distribution of the diffusion coefficient for each
iteration at three representative times (i.e., t = 0.4, 0.8, and 1) are displayed in Figs. 6 and 7,
respectively. We see that the solution at t = 0.4 agrees well with the reference solution after the first
iteration, while the discrepancy between the PPINN and reference solutions increases with the time,
which can be observed for t = 0.8 and 1 in the first row of Fig. 6. This is reasonable because the error
of the initial condition for each subdomain will increase with time.

Figure 6: Example of using PPINN to solve the nonlinear diffusion-reaction equation. First row
(k = 0): First iteration, Second row (k = 1): Second iteration. Black solid line: Reference solution,
which is obtained from the lattice Boltzmann method using a uniform grid of $x \times y \times t = 200 \times 200 \times 100$
[21]. Red dashed line: Solution from the PPINN.

Figure 7: Distribution of the normalized diffusion coefficient (D/D0 ). First row (k = 0): First iteration,
Second row (k = 1): Second iteration.

In addition, the solutions at the three representative times agree quite well with the reference solutions after the second iteration, as
demonstrated in the second row in Fig. 6. All the above results again confirm the accuracy of the
present approach.

Subdomains (#) NNs MSER MSEBC MSEIC Optimizer Learning rate


1 [20] × 3 200,000 10, 000 × 4 2,000 Adam, L-BFGS 10−3
10 [20] × 2 20,000 1, 000 × 4 2,000 Adam, L-BFGS 10−3
20 [16] × 2 10,000 500 × 4 2,000 Adam, L-BFGS 10−3
40 [16] × 2 5,000 250 × 4 2,000 Adam, L-BFGS 10−3
50 [16] × 2 400 200 × 4 2,000 Adam, L-BFGS 10−3

Table 4: Parameters used in the PPINN for modeling the two-dimensional diffusion-reaction system.
The NNs are first trained using the Adam optimizer with a learning rate 10−3 until the loss is less
than 10−3 , then we utilize the L-BFGS-B to further train the NNs [1].

Subdomains (#) Iterations (#) Ttotal (s) S


1 - 27,627 -
10 2 6,230 4.3
PPINN (PINN) 20 3 7,620 3.6
40 3 20,748 1.3
50 3 > 27,627 -
1 - 27,627 -
10 2 5,421 5.1
PPINN (FDM) 20 2 2,472 11.2
40 3 1,535 18.0
50 3 1,620 17.1

Table 5: Speed-ups for using the PPINN with different coarse solvers to model the nonlinear
diffusion-reaction system. Ttotal denotes the total computational time taken by the PPINN, and S is
the speed-up ratio.

We further investigate the speed-up of the PPINN using the PINN as coarse solver. Here T = 10,
A = 0.1146, and R = 0.5, and we consider five different numbers of subdomains, i.e., 1, 10, 20, 40, and 50. The
parallel code is run on multiple CPUs (Intel Xeon E5-2670). The total number of the residual points
in the coarse solver is 20,000, which is uniformly divided into N subdomains. The number of
training points for the boundary condition on each boundary is 10,000, which is also uniformly
assigned to each subdomain. In addition, 2,000 randomly sampled points are employed for the
initial condition. The architecture and other parameters (e.g., learning rate, optimizer, etc) for the
CG PINN in each subdomain are the same as the fine PINN, which are displayed in Table 4. The
speed-up ratios for the representative cases are shown in Table 5. We notice that the speed-up does
not increase monotonically with the number of subdomains. On the contrary, the speed-up decreases
with the number of subdomains. This result suggests that the cost of the CG PINN is not only
related to the amount of training data but is also affected by the number of hyperparameters. The
total number of hyperparameters in the CG PINNs increases with the number of subdomains, causing the
walltime of the PPINN to grow with the number of subdomains.
To validate the above hypothesis, we replace the PINN with the finite difference method [22]
(FDM, grid size: x × y = 20 × 20, time step δt = 0.05) for the coarse solver, which is much more

efficient than the PINN. The walltime for the FDM in each subdomain is negligible compared to the
fine PINN. As shown in Table 5, the speed-ups are now proportional to the number of subdomains
as expected.
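For completeness, a minimal explicit finite-difference sketch of the simplified coarse PDE is shown below; the paper does not state which FD scheme was used, so the explicit Euler stepping and five-point Laplacian here are illustrative assumptions (with the 20 × 20 grid and δt = 0.05 quoted above).

```python
# Minimal explicit finite-difference sketch of the simplified coarse PDE
#   C_t = D0 * (C_xx + C_yy) + 2A sin(pi x / l) sin(pi y / l)
# on a 20 x 20 grid with dt = 0.05. The explicit Euler / five-point Laplacian choice is
# illustrative; the paper does not specify the finite-difference scheme employed.
import numpy as np

def coarse_fdm(C0, t_span, D0=1e-2, A=0.1146, l=1.0, n=20, dt=0.05):
    h = l / (n - 1)
    x = np.linspace(0.0, l, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    src = 2 * A * np.sin(np.pi * X / l) * np.sin(np.pi * Y / l)
    C = C0.copy()
    for _ in range(int(round(t_span / dt))):
        lap = (np.roll(C, 1, 0) + np.roll(C, -1, 0)
               + np.roll(C, 1, 1) + np.roll(C, -1, 1) - 4 * C) / h**2
        C = C + dt * (D0 * lap + src)
        C[0, :] = C[-1, :] = C[:, 0] = C[:, -1] = 0.0   # homogeneous Dirichlet boundaries
    return C
```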

4 Summary and Discussion


Our overarching goal is to develop data-driven simulation capabilities that go beyond the traditional
data assimilation methods by taking advantage of the recent advances in deep learning methods.
Physics-informed neural networks (PINNs) [1] play exactly that role but they become computationally
intractable for very long-time integration of time-dependent PDEs. Here, we address this issue for the
first time by proposing a parallel-in-time PINN (PPINN). In this approach, two different PDEs are
solved by the coarse-grained (CG) solver and the fine PINN, respectively. In particular, the CG solver
can be any efficient numerical method since the CG solver solves a surrogate PDE, which can be
viewed as either a simplified form of the exact PDE or it could be a reduced-order model or any other
surrogate model, including an offline pre-trained PINN although this was not pursued here. The
solutions of the CG solver are then serving as initial conditions for the fine PINN in each subdomain.
The fine PINNs can then run independently in parallel based on the provided initial conditions,
which significantly reduces their training cost. In addition, it is worth mentioning that the walltime
for the fine PINN in each subdomain at the $k$-th ($k > 1$) iteration is negligible compared to that of the first iteration,
which leads to a superlinear speed-up of PPINNs assuming an efficient CG solver
is employed.
The convergence of the PPINN algorithm is first validated for deterministic and stochastic ODE
problems. In both cases, we employ a simplified ODE in the coarse solver to supervise the fine
PINNs. The results demonstrate that the PPINN can converge in a few iterations as we decompose
the time-domain into 10 uniform subdomains. Furthermore, superlinear speed-ups are achieved for
both cases. Two PDE problems, i.e., the one-dimensional Burgers equation and the two-dimensional
diffusion-reaction system with nonlinear diffusion coefficient, are further tested to validate the present
algorithm. For the 1D case, the simplified PDE is also a Burgers equation but with an increased
viscosity, which yields much smoother solution and thus can be solved more efficiently. In the
diffusion-reaction system, we use a constant diffusion coefficient instead of the nonlinear one in the
CG solver, which is also much easier to solve compared to the exact PDE. The results showed that
the fine PINNs can converge in only a few iterations based on the initial conditions provided by the
CG solver in both cases. Finally, we also demonstrate that a superlinear speed-up can be obtained
for the two-dimensional case if the CG solver is efficient.
We would like to point out another advantage of the present PPINN. In general, GPUs have very
good computational efficiency in training DNNs with big-data sets. However, when one works on
training a DNN with small-data sets, CPUs may have comparable performance to GPUs. The reason
is that we need to transfer the dataset from the CPU memory to the GPU, and then transfer it back
to the CPU memory, which can be even more time consuming than the computation itself for a
small dataset [23]. For example, given a DNN with an architecture of $[2] + [20] \times 3 + [1]$, the walltime
of training the PINN (1D Burgers equation) for 10,000 steps is about 100 seconds on an Intel Xeon
CPU (E5-2643) and 172 seconds on a NVIDIA GPU (GTX Titan XP) for a training data set consisting
of 5,000 points. Because the proposed PPINN approach is able to break a big-data set into multiple
small-data sets for training, it enables us to train a PINN with big-data sets on CPU resources if one
does not have access to sufficient GPU resources.
The concept of PPINN could be readily extended to domain decomposition of physical problems

with large spatial databases by drawing an analogy with multigrid/multiresolution methods [24], where
a cheap coarse-grained (CG) PINN should be constructed and be used to supervise and connect the
PINN solutions of sub-domains iteratively. Moreover, in this prediction-correction framework, the
CG PINN only provides a rough prediction of the solution, and the equations encoded into the CG
PINN can be different from the equations encoded in the fine PINNs. Therefore, PPINN is able
to tackle multi-fidelity modeling of inverse physical problems [25]. Both topics are interesting and
will be investigated in future work.

Acknowledgements
This work was supported by the DOE PhILMs project DE-SC0019453 and the DOE-BER grant DE-
SC0019434. This research was conducted using computational resources and services at the Center
for Computation and Visualization, Brown University. Z. Li would like to thank Dr. Zhiping Mao,
and Dr. Ansel L Blumers for helpful discussions.

References
[1] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learn-
ing framework for solving forward and inverse problems involving nonlinear partial differential
equations. J. Comput. Phys., 378:686–707, 2019.

[2] N. O. Hodas and P. Stinis. Doing the impossible: Why neural networks can be trained at all.
Front. Psychol., 9(1185), 2018.

[3] T. Kurth, S. Treichler, J. Romero, M. Mudigonda, N. Luehr, E. Phillips, A. Mahesh, M. Matheson,
J. Deslippe, M. Fatica, Prabhat, and M. Houston. Exascale deep learning for climate analytics.
arXiv preprint arXiv:1810.01993, 2018.

[4] J. Lew, D. A. Shah, S. Pati, S. Cattell, M. Zhang, A. Sandhupatla, C. Ng, N. Goli, M. D. Sinclair,
T. G. Rogers, and T. M. Aamodt. Analyzing machine learning workloads using a detailed GPU
simulator. In 2019 IEEE International Symposium on Performance Analysis of Systems and Software
(ISPASS), pages 151–152, 2019.

[5] Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer. Fast deep neural network training on
distributed systems and cloud TPUs. IEEE Transactions on Parallel and Distributed Systems, pages
1–14, 2019.

[6] L. Bottou, F. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning.
SIAM Review, 60(2):223–311, 2018.

[7] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.

[8] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and
stochastic optimization. J. Mach. Learn. Res., 12(Jul):2121–2159, 2011.

[9] C. Michoski, M. Milosavljevic, T. Oliver, and D. Hatch. Solving irregular and data-enriched
differential equations using deep neural networks. arXiv preprint arXiv:1905.04351, 2019.

[10] N. A. K. Doan, W. Polifke, and L. Magri. Physics-informed echo state networks for chaotic sys-
tems forecasting. In ICCS 2019 - International Conference on Computational Science, Faro, Portugal,
2019.

[11] M. Mattheakis, P. Protopapas, D. Sondak, M. Di Giovanni, and E. Kaxiras. Physical symmetries
embedded in neural networks. arXiv preprint arXiv:1904.08991, 2019.

[12] M. W. M. G. Dissanayake and N. Phan-Thien. Neural-network-based approximations for solving
partial-differential equations. Comm. Numer. Meth. Eng., 10(3):195–201, 1994.

[13] L. Yang, D. Zhang, and G. E. Karniadakis. Physics-informed generative adversarial networks
for stochastic differential equations. arXiv preprint arXiv:1811.02033, 2018.

[14] D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis. Quantifying total uncertainty in physics-
informed neural networks for solving forward and inverse stochastic problems. arXiv preprint
arXiv:1809.08327, 2018.

[15] D. Zhang, L. Guo, and G. E. Karniadakis. Learning in modal space: Solving time-dependent
stochastic PDEs using physics-informed neural networks. arXiv preprint arXiv:1905.01205, 2019.

[16] Y. Yang and P. Perdikaris. Adversarial uncertainty quantification in physics-informed neural
networks. J. Comput. Phys., 394:136–152, 2019.

[17] G. Pang, L. Lu, and G. E. Karniadakis. fPINNs: Fractional physics-informed neural networks.
arXiv preprint arXiv:1811.08967, 2018.

[18] Y. Maday and G. Turinici. A parareal in time procedure for the control of partial differential
equations. C. R. Math., 335(4):387–392, 2002.

[19] A. Blumers, Z. Li, and G. E. Karniadakis. Supervised parallel-in-time algorithm for long-time
Lagrangian simulations of stochastic dynamics: Application to hydrodynamics. J. Comput. Phys.,
393:214–228, 2019.

[20] R. Livni, S. Shalev-Shwartz, and O. Shamir. On the computational efficiency of training neural
networks. In Advances in neural information processing systems, pages 855–863, 2014.

[21] X. Meng and Z. Guo. Localized lattice Boltzmann equation model for simulating miscible viscous
displacement in porous media. Int. J. Heat Mass Tran., 100:767–778, 2016.

[22] M. M. Meerschaert, H.-P. Scheffler, and C. Tadjeran. Finite difference methods for two-
dimensional fractional dispersion equation. J. Comp. Phys., 211(1):249–261, 2006.

[23] J. Sanders and E. Kandrot. CUDA by example: an introduction to general-purpose GPU programming.
Addison-Wesley Professional, 2010.

[24] G. Beylkin and N. Coult. A multiresolution strategy for reduction of elliptic PDEs and eigenvalue
problems. Appl. Comput. Harmon. Anal., 5(2):129–155, 1998.

[25] X. Meng and G. E. Karniadakis. A composite neural network that learns from multi-fidelity
data: Application to function approximation and inverse PDE problems. arXiv preprint
arXiv:1903.00104, 2019.
