Neural-Integrated Meshfree (NIM) Method: A differentiable programming-based hybrid solver for computational mechanics

Honghui Du^a, QiZhi He^{a,∗}

^a Department of Civil, Environmental, and Geo-Engineering, University of Minnesota, 500 Pillsbury Drive S.E., Minneapolis, MN 55455

arXiv:2311.12915v1 [cs.LG] 21 Nov 2023

Abstract
While deep learning and data-driven modeling approaches based on deep neu-
ral networks (DNNs) have recently attracted increasing attention for solving
partial differential equations, their practical application to real-world scientific
and engineering problems remains limited due to the relatively low accuracy
and high computational cost. In this study, we present the neural-integrated
meshfree (NIM) method, a differentiable programming-based hybrid meshfree
approach within the field of computational mechanics. NIM seamlessly inte-
grates traditional physics-based meshfree discretization techniques with deep
learning architectures. It employs a hybrid approximation scheme, NeuroPU, to
effectively represent the solution by combining continuous DNN representations
with partition of unity (PU) basis functions associated with the underlying
spatial discretization. This neural-numerical hybridization not only enhances
the solution representation through functional space decomposition but also
reduces both the size of the DNN model and the need for spatial gradient computa-
tions based on automatic differentiation, leading to a significant improvement in
training efficiency. Under the NIM framework, we propose two truly meshfree
solvers: the strong form-based NIM (S-NIM) and the local variational form-
based NIM (V-NIM). In the S-NIM solver, the strong-form governing equation is
directly considered in the loss function, while the V-NIM solver employs a local
Petrov-Galerkin approach that allows the construction of variational residuals
based on arbitrary overlapping subdomains. This ensures both the satisfaction
of the underlying physics and the preservation of the meshfree property. We perform
extensive numerical experiments on both stationary and transient benchmark
problems to assess the effectiveness of the proposed NIM methods in terms of
accuracy, scalability, generalizability, and convergence properties. Moreover,
comparative analysis with other physics-informed machine learning methods
demonstrates that NIM, especially V-NIM, significantly enhances both accuracy
and efficiency in end-to-end predictive capabilities.

∗ Corresponding author
Email address: [email protected] (QiZhi He)

Preprint submitted to Elsevier November 23, 2023


Keywords: Differentiable programming, Meshfree methods, Hybrid
approximation, Physics-informed learning, Variational formulation, Surrogate
model, Artificial intelligence

1. Introduction

1.1. Numerical methods for PDEs


Solving partial differential equations (PDEs) is crucial to various real-world
applications, particularly in material science, aerospace, civil, and mechanical
engineering. As it is intractable to obtain analytical solutions for most complex
PDEs, many canonical computational methods have been established in the past
80 years to numerically approximate the PDEs, including the finite element method
(FEM) [18, 57], finite volume method (FVM) [14], meshless methods [5, 56, 2, 15],
and isogeometric analysis (IGA) [41], just to name a few. The core idea of these
approaches is to approximate both differential operators and the solution of
the given PDE problem upon mesh-based or point-based spatial discretization,
resulting in a set of finite-dimensional algebraic equations, which can be efficiently
solved by modern sparse solvers. Many of these approaches are based on the
weak form, rendering the desirable convergence property of the approximate
solution. While the above-mentioned numerical methods have a demonstrated
theoretical foundation and have been widely adopted in many scientific and
engineering problems, numerous challenges persist in computational science
across disciplines, such as problems that are high-dimensional and
computationally expensive, or that involve unknown (or partially unknown)
physics.

1.2. Data-driven methods for physical simulation


In recent decades, owing to the advancements in deep learning (DL) algo-
rithms and increased computing power, the integration of data-driven techniques
into the realm of computational science and engineering has revolutionized the
way we approach complex physical problems [72, 63, 12]. Depending on the roles
that physics-based laws and data-driven techniques play in the application, this
new hybrid data-physics paradigm can be broadly categorized into three classes
of methods used to simulate the PDE-governing physical processes: data-fit,
data-free, and blending schemes. The data-fit scheme can be regarded as the
construction of a surrogate or reduced-order model [12], aiming to directly learn
the underlying physics from available data, for instance, via the recently emerging
deep operator learning techniques [71, 58, 55]. On the contrary, in
the data-free methods, the existing physical laws are fully encoded/informed
into the data-driven models to generate solutions by rigorously constraining the
PDEs through minimizing the loss formulation, which can be thought of as an
unsupervised strategy that does not require labeled data. This approach is also
often referred to as physics-informed machine learning (PIML) [44, 19]. The
third class refers to the approaches in which data-driven models and physical

models are employed individually to approximate different components of the
PDE-governing systems while they can be integrated seamlessly to perform
predictive simulation. In the area of computational mechanics, the exemplary
approaches such as model-free data-driven computing [48, 23, 34, 37] and the
coupled data-driven/numerical solvers [80, 35] have received extensive attention.

1.2.1. Physics-informed machine learning (PIML)


Of particular interest to this study is the second class of methods, PIML,
because it preserves the well-accepted physical laws whereas the prerequisite of
big data is significantly relaxed. In this respect, one of the most prominent studies
is physics-informed neural networks (PINNs) [65, 44], where deep neural networks
(DNNs) are used to approximate the solution of PDEs with respect to space and
time variables, and the strong form of the governing equations are penalized
in the loss function together with the initial/boundary conditions. While the
concept of solving differential equations through neural network approximations
could be traced back to at least the 1990’s [52, 60, 51], the resurgence of this
PINN approach is primarily attributed to the recent advancements in deep
learning infrastructure, e.g., TensorFlow, PyTorch, and JAX, which enables
efficient automatic differentiation (AD) operations [4] and optimization [28].
PINNs provide a unified framework for solving forward, inverse, and data
assimilation problems associated with PDEs [65, 44, 36, 19]. Given its flexibility,
the PINN method has been successfully applied to different engineering applica-
tions including fluid mechanics [66, 13], solid mechanics [31, 68, 80], subsurface
transport [76, 33, 36, 22], among others. Besides, various types of neural net-
works, such as fully connected neural networks (FCNN) [65, 6, 33], convolutional
neural networks (CNN) [24, 25], and recurrent neural networks (RNN) [82, 75]
have been explored to approximate the continuous solution within the PINN
framework.
Although the PINN-based methods may seem to be straightforward, they
usually incur difficulties in training the DNN approximation to satisfy all equation
residuals, resulting in slow convergence or reduced accuracy. These issues have
been attributed to unbalanced back-propagated gradients during model training
and the notorious spectral bias of DNNs, as discussed in the studies [77, 74].
Consequently, numerous treatments have been developed to mitigate the training
issues, including adaptive weight schemes [77, 59], domain decomposition [73,
46, 22], and variable/feature transformations [78, 22].
Nevertheless, the costly training can also be attributed to the involvement of
high-order derivatives in the strong-form PDEs [50]. Based on the weak (varia-
tional) formulation of the differential model, the variational PINNs (VPINNs)
have been developed for solving PDEs [45, 47, 7]. Particularly, linear or high-
order piecewise basis functions [47, 7] can be implemented as the test functions
in the variational framework. In addition, by constructing the loss function based
on the corresponding energy functional, the deep Ritz method [81] and the deep energy
method (DEM) [70] were proposed for computational mechanics. These varia-
tional form-based PINNs can be regarded as the Petrov-Galerkin method, since
the trial space approximated by DNNs is usually different from the space spanned

by the test functions. Recently, Kharazmi et al. extended the VPINN method
to hp-VPINN[46] by introducing domain decomposition, allowing a localized
network parameter optimization and improved training accuracy.

1.3. Differentiable numerical solvers


In addition to the continuous representation of differential operators in
the above-mentioned PIML approaches, leveraging physics in their discretized
forms derived from classical numerical techniques has been attracting increasing
attention [42, 49, 79]. The recent development includes finite difference [24, 17, 8],
FVM [67], hp approximation [53], FEM [79, 21], IGA [26], peridynamics [30],
and HiDeNN [69] that can reproduce different interpolation functions based on
DNN architectures. In particular, there is a growing interest to develop efficient
differentiable finite discretization solvers under JAX [11] such as JAX-CFD [49],
JAX-Fluids [8], JAX-FEM [79], and JAX-DISP [61]. It is noted that all these
discretization-based solvers are based on differentiable programming [4, 42],
which embeds the numerical linear algebra and gradient operations in the neural
network architectures, enabling the application of end-to-end differentiable
gradient-based optimization methods.
Despite the simplicity of implementing these differentiable numerical solvers
and their potential for significant acceleration on specialized hardware, this
field is still in its nascent stages and requires further development. Specifically,
two limitations should be highlighted. First, as pointed out in [43, 61], the
fundamental assumption that AD capabilities within current machine learning
frameworks can compute “exact” derivatives through complex architectural
neural network models may not always be valid. This can result in inaccuracies
in the spatial gradients of partial differential equations (PDEs) or discretized
differential operators during the training of numerical models, leading to sub-
optimal convergence and potentially unstable approximations [43, 50]. Second,
many of the aforementioned differentiable numerical methods rely on structured
or conforming grids to approximate gradients, which compromises the truly
meshfree property, a distinctive feature often associated with PIML methods.
This contradicts the applicability advantages offered by modern neural networks.

1.4. Differentiable meshfree solver


To overcome these limitations, we develop a differentiable-programming
hybrid method, which incorporates the conventional physics-based meshfree
discretization method in a deep learning architecture to achieve highly accu-
rate, efficient, and end-to-end training and predictive simulation. Furthermore,
motivated by the advantages of nodal shape functions employed in traditional
numerical methods [3, 5, 56, 2, 41], while simultaneously harnessing the uni-
versal approximation capabilities of DNNs, we introduce an auxiliary hybrid
approach for solution approximation. This hybrid scheme, termed the Neuro-
partition of unity (NeuroPU) approximation, interpolates the solution by
seamlessly integrating a set of partition of unity (PU) basis functions [3] defined
on the underlying meshfree discretization with DNN-represented nodal coefficient

functions. This new framework is coined the neural-integrated meshfree (NIM)
method.
In this work, without loss of generality, we chose to employ the reproducing
kernel (RK) meshfree shape functions [56, 16] in the NeuroPU approximation
since the RK shape functions, constructed based on spatially distributed nodes
over physical domain, offer the flexibility to design arbitrary order of accuracy,
smoothness, and compactness. The motivation of introducing NeuroPU for solu-
tion approximation is to regulate the solution representation by well-established
PU basis functions and mitigate the need for intricate computations of high-order
gradients that usually rely on AD, ultimately improving training efficiency. On
the other hand, NeuroPU leverages embedded neural networks to represent
the functional space related to problem-related parameters, enabling effective
surrogate modeling of parameterized or time-dependent problems.
In this study, we will present two NIM solvers for computational mechanics
modeling: strong form-based NIM (S-NIM) and local variational form-based
NIM (V-NIM). Notably, both of these solvers are genuinely meshfree, elim-
inating the need for high-cost conforming mesh generation. Like the PINN
methods [65, 44, 36, 19], the former one considers the strong-form governing
equations in the associated loss function. We will demonstrate the superior
performance of S-NIM over standard PINNs due to the incorporation of Neu-
roPU approximation, which substantially reduces the DNN solution space and
improves the training efficiency and accuracy. In order to further improve the
accuracy and stability, we propose the V-NIM solver, which is inspired by the
meshless local Petrov-Galerkin method (MLPG) [2, 1] that derives the consistent
weak formulation over local subdomains. Distinct from other variational PINN
methods [47, 46, 7], the proposed V-NIM allows for using arbitrary overlapping
subdomains to formulate the loss function, and thus, upholding the meshfree
property. We demonstrate the outstanding performance of the proposed NIM
methods, especially V-NIM, through extensive experiments on the benchmark
examples (e.g., Poisson equation, linear elasticity, time-dependent problem, and
a parameterized PDE), where the accuracy, convergence property, generaliz-
ability, and efficiency are compared against other baseline methods. To the
best of the authors’ knowledge, this study represents the first attempt to de-
velop differentiable programming-based meshfree solvers integrated with hybrid
neuro-numerical approximation for computational mechanics modeling.
The remainder of the paper is organized as follows: Section 2 provides a back-
ground review of numerical discretization and DNNs for function approximation.
In Section 3, we delve into the construction of hybrid NeuroPU approximation.
We present the methodology development of the NIM framework based on the
NeuroPU approximation and meshfree discretization in Section 4, followed by
the detailed solution procedures of the proposed S-NIM and V-NIM methods
provided in Section 5. Numerical tests on static and time-dependent bench-
mark problems are presented in Section 6 and Section 7, respectively. Section 8
concludes the paper by summarizing the main findings and contributions.

2. Preliminaries
In this section, we first present a background review of the two fundamental
methodologies of function approximation based on numerical discretization and
deep neural networks (DNNs), as well as their applications to solving PDEs in
computational mechanics.
For demonstration, a classical linear elastostatics problem is taken as the
model problem. Let us consider an elastic solid defined in a bounded domain
Ω ⊂ Rd , where d denotes the spatial dimension, and its boundary ∂Ω ⊂ Rd−1 is
split as the essential boundary condition (EBC) on Γg and the natural boundary
condition (NBC) on Γt , i.e., ∂Ω = Γg ∪ Γt with Γg ∩ Γt = ∅. The governing PDE
describing the static equilibrium is written as:

∇ · σ(u) + f = 0,    in Ω
n · σ(u) = t̄,        on Γ_t        (1)
u = ū,               on Γ_g

where u is the displacement vector, σ is the Cauchy stress tensor, f is the body
force, ū and t̄ are the displacement and traction values prescribed on Γ_g and Γ_t,
respectively, and n is the surface normal on Γt .
For solid mechanics problems, the constitutive law that relates stress and
strain, e.g., σ = σ(ε(u)), is required to solve the boundary value problem (BVP)
in (1), where the linear strain tensor is given by
ε := ∇^{sym} u = (1/2)(∇u + ∇u^T)        (2)
For linear elastic materials, the strain-stress relation becomes
σ = C : ε        (3)
where C is the elasticity tensor.
For many engineering applications, such as inverse design, uncertainty quan-
tification, and surrogate modeling, varying model parameters are considered in
the solid mechanics analysis (1). Here, µ ∈ Rc is used to denote the parameter
vector associated with the problem-specific variables, such as material properties,
loading, and boundary conditions. In this scenario, the displacement solution
is then generalized as a parameter-dependent field u(x, µ) : Ω × R^c ↦ R^d.

2.1. Numerical approximation via shape functions


In spatial discretization-based numerical methods for solid mechanics prob-
lems (e.g., FEM [40], meshfree methods [5, 56], and IGA [41]), the cornerstone
is to apply nodal shape functions to approximate the solution field in a finite-
dimensional manner.
Let the computing domain Ω be discretized by a set of nodal points {x_I}_{I=1}^{N_h};
the approximation of the solution, u^h, is defined as the linear combination of nodal
shape functions:

u(x) ≈ u^h(x) = Σ_{I=1}^{N_h} Ψ_I(x) d_I        (4)
where Nh represents the number of nodes, dI ∈ Rd is the nodal coefficient at
location xI , and ΨI is the shape function associated with the Ith node.
While the shape functions ΨI can be defined in many different forms, they
are required to satisfy specific conditions (completeness and continuity) to ensure
the convergence property of approximate solutions. One essential condition is
the so-called partition of unity (PU) [3], which requires that the ensemble of
the compact supports of the shape functions generates a covering for the domain Ω, i.e.,
Ω ⊂ ∪_{I=1}^{N_h} supp{Ψ_I}, and Σ_{I=1}^{N_h} Ψ_I = 1 with 0 ≤ Ψ_I(x) ≤ 1.
For x ∈ Ω, let Sx = {I | x ∈ supp (ΨI )} be an index set of the nodal shape
functions whose influence does not vanish at location x; the PU approximation (4)
can be rewritten as

u^h(x) = Σ_{I∈S_x} Ψ_I(x) d_I        (5)

In the following, we introduce a special type of meshfree shape functions, i.e.,


reproducing kernel (RK) approximation, which allows an arbitrary order of
accuracy while maintaining higher-order smoothness [56, 16, 15]. The RK shape
functions {Ψ_I}_{I=1}^{N_h} used for approximating the displacement take the following form:

Ψ_I(x) = p^{[p]T}(x_I − x) b(x) φ_a(x_I − x)        (6)

where the kernel function ϕa controls the smoothness of the RK approximation


function and defines the compact support with a size a. A widely used kernel
function is the cubic B-spline that preserves C² continuity:

φ_a(z) = 2/3 − 4z^2 + 4z^3,              0 ≤ z ≤ 1/2
       = 4/3 − 4z + 4z^2 − (4/3)z^3,     1/2 < z ≤ 1        (7)
       = 0,                              z > 1

with z = ∥x_I − x∥/a. In (6), p^{[p]}(x) is a vector of monomial basis functions up
to the pth order:

p^{[p]}(x) = {1, x_1, x_2, x_3, · · · , x_1^i x_2^j x_3^k, . . . , x_3^p}^T,    0 ≤ i + j + k ≤ p        (8)

and the parameter vector b(x) is determined by enforcing the following pth order
reproducing conditions:
Σ_{I∈S_x} Ψ_I(x) p^{[p]}(x_I) = p^{[p]}(x)        (9)

Substituting Eq. (9) into Eq. (6) yields

b(x) = A^{−1}(x) p^{[p]}(0)        (10)

where A(x) is the moment matrix

A(x) = Σ_{I∈S_x} p^{[p]}(x_I − x) p^{[p]T}(x_I − x) φ_a(x_I − x)        (11)

Invoking (10) into (6), we have the following expression for the RK shape
functions:
Ψ_I(x) = p^{[p]T}(0) A^{−1}(x) p^{[p]}(x_I − x) φ_a(x_I − x)        (12)
Subsequently, the nodal coefficients in Eq. (5) can be determined by solving the
Galerkin weak formulation [40] corresponding to the elasticity problem (1).
Specifically, the RK shape function with quadratic basis function (p = 2)
and the cubic B-spline function (7) is, by default, employed in the following
numerical study, which will be further discussed in Sections 3 and 6. Additionally,
we denote a normalized support size with the characteristic nodal distance h as
ā = a/h.
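To make the construction in Eqs. (6)-(12) concrete, the following is a minimal NumPy sketch of 1D RK shape functions built with the cubic B-spline kernel of Eq. (7); the function names, the linear-basis default, and the node layout are assumptions for illustration only.

```python
import numpy as np

def cubic_bspline(z):
    """Cubic B-spline kernel phi_a(z) of Eq. (7), with z = |x_I - x| / a."""
    z = np.abs(z)
    phi = np.zeros_like(z)
    m1 = z <= 0.5
    m2 = (z > 0.5) & (z <= 1.0)
    phi[m1] = 2.0/3.0 - 4.0*z[m1]**2 + 4.0*z[m1]**3
    phi[m2] = 4.0/3.0 - 4.0*z[m2] + 4.0*z[m2]**2 - (4.0/3.0)*z[m2]**3
    return phi

def rk_shape_functions(x, nodes, a, p=1):
    """1D reproducing kernel shape functions Psi_I(x), following Eqs. (6)-(12)."""
    dx = nodes - x                                 # x_I - x for every node
    phi = cubic_bspline(dx / a)                    # kernel values, Eq. (7)
    P = np.vstack([dx**k for k in range(p + 1)])   # monomial basis p(x_I - x), Eq. (8)
    A = (P * phi) @ P.T                            # moment matrix, Eq. (11)
    p0 = np.zeros(p + 1); p0[0] = 1.0              # p(0)
    b = np.linalg.solve(A, p0)                     # b(x), Eq. (10)
    return (b @ P) * phi                           # Psi_I(x), Eqs. (6)/(12)

# Partition-of-unity check on a uniform node set with normalized support 2.5h
nodes = np.linspace(0.0, 1.0, 11)
h = nodes[1] - nodes[0]
print(rk_shape_functions(0.37, nodes, a=2.5*h).sum())  # ~1.0 by the reproducing condition (9)
```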

2.2. Neural network-based approximation for solving PDEs


Thanks to the universal approximation property [38, 9], DNNs are capable of
approximating arbitrary continuous functions, which makes them an emerging
candidate for approximating solutions of PDEs. By taking the spatial coor-
dinates as inputs, denoted as z0 = x, we consider a feed-forward fully-connected
neural network (FFCN) which consists of the input layer, n hidden layers, and
the output layer, defined as

u(z0 ) ≈ û(z0 ; θ) = zn+1 (zn (· · · z2 (z1 (z0 )))) (13)

in which the symbol ˆ is used to indicate a variable that is parameterized by


DNNs, and θ represents the collection of trainable parameters associated with
the DNN model. The connection between the (l − 1)th layer and the lth layer is defined
as

z_l = z_l(z_{l−1}) = σ(b_l + W_l z_{l−1}),    for 1 ≤ l ≤ n
z_{n+1} = b_{n+1} + W_{n+1} z_n,              for l = n + 1        (14)

where the activation function σ(·) is selected as the hyperbolic tangent function,
z_{n+1} denotes the output vector, and W_l and b_l correspond to the weights and
bias associated with the lth layer, respectively.
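As a minimal illustration of the forward pass in Eqs. (13)-(14), the NumPy sketch below evaluates an FFCN with tanh hidden layers and a linear output layer; the layer sizes and names are illustrative only.

```python
import numpy as np

def ffcn_forward(z0, weights, biases):
    """Feed-forward fully connected network of Eqs. (13)-(14): tanh hidden layers
    followed by a linear output layer; weights/biases are lists, one entry per layer."""
    z = z0
    for W, b in zip(weights[:-1], biases[:-1]):
        z = np.tanh(W @ z + b)               # hidden layers, Eq. (14)
    return weights[-1] @ z + biases[-1]      # linear output layer

# Toy example: 2 inputs -> one hidden layer of 10 neurons -> 3 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((10, 2)), rng.standard_normal((3, 10))]
biases = [np.zeros(10), np.zeros(3)]
print(ffcn_forward(np.array([0.1, -0.4]), weights, biases).shape)  # (3,)
```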

2.2.1. PINNs
When using the PINN method [65, 44] for solving the elasticity problem in
Eqs. (1)-(3), a DNN model is used to approximate the displacement solution,
denoted as û(x), with the spatial coordinates x as the network inputs. The
differential operators, e.g., ∇û(x; θ), are computed by performing Automatic
Differentiation (AD) [4] with respect to the inputs. As a result, the mean
square errors (MSEs) of the governing partial differential equations (PDEs) are
subsequently incorporated into the loss function by feeding both the approximate
solution and its derivatives on the given set of collocation points. With slightly
abusing the notion, the corresponding loss of the elasticity problem in (1)-(3) is
given as

L(θ) = |û − ū|_{Γ_g} + |n · (C : ∇^{sym} û) − t̄|_{Γ_t} + |∇ · (C : ∇^{sym} û) + f|_Ω        (15)

The parameters θ of neural networks are optimized through the minimization
of the loss function. For a more formal formulation regarding the standard
PINN [31] or the mixed-form PINN [68] for linear elasticity, we refer readers to
the following studies [65, 31, 68, 36].
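The loss in Eq. (15) is a weighted sum of mean-squared residual terms. A minimal sketch of this aggregation is given below, assuming the residual arrays (PDE residual, NBC misfit, EBC misfit) have already been evaluated at their collocation points, e.g., via automatic differentiation of the DNN output; the function name and default weights are assumptions.

```python
import numpy as np

def pinn_style_loss(r_pde, r_nbc, r_ebc, alpha1=1.0, alpha2=1.0):
    """Weighted mean-squared loss in the spirit of Eq. (15). Each argument is an
    (N_points, d) residual array evaluated elsewhere (the PDE residual typically
    requires automatic differentiation of the network output)."""
    msq = lambda r: np.mean(np.sum(np.asarray(r)**2, axis=-1))
    return msq(r_pde) + alpha1 * msq(r_nbc) + alpha2 * msq(r_ebc)
```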
Aided by DNNs, the traditional PINN methods offer a straightforward way
to approximate the solution fields by incorporating the given physical laws.
Nevertheless, complex solution fields often necessitate relatively large
neural networks to ensure sufficient approximation capacity, inevitably
leading to an over-parameterized search space that poses challenges in training the
non-convex PDE-informed loss functions [33, 19, 17]. Consequently, a huge
number of collocation points as well as expensive optimizer iterations are usually
required to train the PINN model for complex high-dimensional PDE
problems [19, 50, 22]. Moreover, performing AD operations with regard to a large
number of data points over the spatiotemporal domain usually takes a considerable portion of
the computational time, especially when dealing with high-order PDEs.

3. Hybrid approach: Neuro-partition of unity (NeuroPU) approximation

Figure 1: Schematic of the neuro-partition of unity (NeuroPU) approximation (16), where Ψ_I is the
nodal shape function centered at x_I, I = 1, 2, ..., N_h, related to the numerical discretization
over the physical domain, and the coefficient functions d̂ ∈ R^{d×N_h} are approximated by using
neural networks with the system parameters µ ∈ R^c as inputs. The sizes of the input layer and
output layer of the neural network are decided by the dimensions of µ and d̂, respectively.

In order to enhance the training process and accuracy within the field of
PIML, we propose a hybrid approach called neural-partition of unity (NeuroPU)
approximation, which integrates numerical discretization generated by PU shape
functions (Section 2.1) over the physical domain and the neural network-based
approximation (Section 2.2) for nodal coefficient functions. This hybrid approx-
imation method is the building block of the proposed NIM framework, which
will be further elaborated in Section 4.

Given a set of predefined nodal shape functions, the NeuroPU approximation
of the parametric solution u(x, µ) is expressed as
û^h(x, µ) = Σ_{I∈S_x} Ψ_I(x) d̂_I(µ)        (16)

where Ψ_I is the PU shape function (specifically, the RK shape function defined
in (6) is adopted), and S_x is the set of nodes that contribute to the interpolation
at x (see Section 2.1). In Eq. (16), the nodal coefficient function d̂_I(µ) ∈ R^d is
the corresponding Ith output of a neural network N_θ which is parameterized by θ
as explained in Section 2.2. Let the collection of nodal coefficients be denoted as
d̂ := {d̂_I}_{I=1}^{N_h} ∈ R^{d×N_h}, which defines a mapping from inputs µ to the discrete
nodal variable d̂ through a multi-layer neural network, i.e.,

d̂ : µ ∈ R^c ↦ N_θ(µ)        (17)
Notably, the utilization of the DNN model allows the nodal coefficients to be
expressed as a function of problem-defined system parameters µ, e.g., tempo-
ral coordinates and material coefficients. Thus, the NeuroPU approximation
can be readily employed for a surrogate model for parameterized systems, as
demonstrated in Section 6.3.
Due to the compact nature of the support domain, only a few entries of Ψ_I
and d̂_I, i.e., those interacting within the support set S_x, will be active in the dot product
in Eq. (16). This results in a sparsity structure that can streamline matrix
calculations and save storage. Besides, the hybrid NeuroPU approximation
offers an efficient computation of all the space-dependent derivative terms. To
wit, since spatial coordinates are only involved via the shape functions in the
NeuroPU construction, the spatial gradient of ûh yields
∇û^h(x, µ) = Σ_{I∈S_x} ∇Ψ_I(x) d̂_I(µ)        (18)

In this setting, the derivatives of shape functions, ∇ΨI , can be pre-computed and
stored in advance for the subsequent derivation of ∇ûh when establishing the loss
function. This is essentially distinct from the PINN method where the differential
and gradient operators in governing equations are computed through automatic
differentiation during training. Furthermore, in contrast to the studies [53, 69],
where shape functions are implicitly encoded using architectural or hierarchical
neural networks, the NeuroPU approximation explicitly expresses these shape
functions, which preserves the necessary simplicity and compatibility for scalable
implementation, akin to classical numerical methods.
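To make these points concrete, the minimal NumPy sketch below evaluates the NeuroPU field of Eq. (16) and its spatial gradient of Eq. (18) from pre-computed shape-function values and derivatives; the array names and shapes are assumptions, and the nodal coefficients d_hat would in practice be the output of the network N_θ(µ).

```python
import numpy as np

def neuropu_eval(Psi, dPsi, d_hat):
    """NeuroPU approximation, Eqs. (16) and (18).
    Psi   : (Np, Nh)      pre-computed shape functions at Np evaluation points
    dPsi  : (Np, Nh, dim) pre-computed spatial gradients of the shape functions
    d_hat : (Nh, dout)    nodal coefficients from the network for one parameter mu
    Returns u_h (Np, dout) and grad_u_h (Np, dim, dout); no spatial AD is needed."""
    u_h = Psi @ d_hat                                   # Eq. (16)
    grad_u_h = np.einsum('pid,io->pdo', dPsi, d_hat)    # Eq. (18)
    return u_h, grad_u_h

# Toy shapes: 100 evaluation points, 25 nodes, 2D space, scalar field
rng = np.random.default_rng(0)
u_h, grad_u_h = neuropu_eval(rng.random((100, 25)), rng.random((100, 25, 2)), rng.random((25, 1)))
print(u_h.shape, grad_u_h.shape)  # (100, 1) (100, 2, 1)
```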
Remark 3.1. It is noted that while the reproducing kernel (RK) approximation
is adopted for the PU shape functions in Eq. (16), other types of PU shape functions,
e.g., Lagrange polynomial basis [18, 40], spectral series [64], and NURBS [41], are
also applicable to the proposed framework. However, to ensure truly meshfree
properties and avoid tedious mesh generation, we adopt meshfree-type shape
functions that are defined on the physical spatial domain and allow overlapping
compact supports.

Remark 3.2. The proposed NeuroPU approach offers the flexibility to utilize
various parameter inputs µ and neural network architectures for nodal coefficient
functions dˆI (µ), depending on the specific problem at hand. For example, when
dealing with temporal variable as input, fully connected neural networks (FCNN)
provide a continuous representation that captures temporal evolution, while
convolutional neural networks (CNN) can be employed to process the image
inputs, obtaining nodal coefficient functions with discrete representations.

4. Neural-Integrated Meshfree (NIM) Framework

This section aims to develop the neural-integrated meshfree (NIM) method,


a novel differentiable programming-based meshfree computational framework
for solving computational mechanics problems. Particularly, we will focus on
establishing a highly efficient neural-numerical platform by leveraging both the
hybrid NeuroPU approximation and meshfree formulation that is well designed
for the end-to-end neural network training procedure.
In the following subsections, we propose two NIM solvers that are formulated
based on the strong form and the local variational form of governing equations,
denoted as S-NIM and V-NIM, respectively.

4.1. Approach I: Strong form-based neural integrated meshfree solver (S-NIM)


With the employment of the NeuroPU approximation (16), we first develop
the strong form-based NIM solver (S-NIM), which is designed to directly encode
the governing equations and boundary conditions associated with the problem
in the network structure. Similar to the PINN method described in (15), the
loss function in S-NIM is formulated as the sum of mean squared residuals of
Eqs. (1)-(3):

L^S(θ) = (1/N_µ) Σ_{j=1}^{N_µ} [ (1/N_f) Σ_{i=1}^{N_f} |∇ · σ̂^h(x_i, µ_j) + f(x_i, µ_j)|²
        + (α_1/N_t) Σ_{i=1}^{N_t} |t̂^h(x_i, µ_j) − t̄(x_i, µ_j)|²
        + (α_2/N_g) Σ_{i=1}^{N_g} |û^h(x_i, µ_j) − ū(x_i, µ_j)|² ]        (19)
where the matrix forms of û^h, σ̂^h and t̂^h are provided in Appendix A. The
superscript h denotes the variables involving the NeuroPU approximation. N_f
represents the number of residual points S_f = {x_i}_{i=1}^{N_f} sampled over the computation
domain Ω (denoted by blue stars in Figure 2a), and N_t and N_g are the numbers
of sampling points S_t = {x_i}_{i=1}^{N_t} and S_g = {x_i}_{i=1}^{N_g} whereby the NBC on Γ_t and
the EBC on Γ_g are imposed, respectively. Here, we consider a mechanics problem
parameterized by the parameter vector µ, and N_µ denotes the number of sample
points in the parameter set, i.e., S_µ = {µ_i}_{i=1}^{N_µ}.
In Eq. (19), the weight coefficients α_1 and α_2 are used to penalize the loss
terms associated with the boundary conditions. It has been reported that these
weights are critical to the convergence rates of loss functions and the accuracy of the
approximate solution [36, 77]. While the loss function of S-NIM resembles the
standard PINN method for elasticity problems [31, 68], distinct approximation
functions are used. Our numerical studies in Section 6 will demonstrate that the
introduction of the NeuroPU approximation in S-NIM significantly boosts both
the training efficiency and accuracy, attributing to the reduced dimensionality
in approximation space and the efficient, high-order accurate spatial gradients
provided by RK shape functions.
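As an illustration of how the strong-form loss of Eq. (19) simplifies once the shape-function derivatives are pre-computed, the sketch below assembles it for the scalar Poisson problem considered later in Section 6.1 (no traction term, a single parameter sample); the array names and the penalty value are assumptions.

```python
import numpy as np

def s_nim_poisson_loss(lap_Psi, Psi_g, d_hat, f_vals, g_vals, alpha=100.0):
    """Strong-form NIM loss, Eq. (19), specialized to a scalar Poisson problem.
    lap_Psi : (Nf, Nh) pre-computed Laplacians of the shape functions at the residual points
    Psi_g   : (Ng, Nh) shape-function values at the essential-boundary points
    d_hat   : (Nh,)    nodal coefficients produced by the network
    f_vals, g_vals : prescribed source term and boundary values."""
    r_pde = lap_Psi @ d_hat - f_vals     # interior residual: Laplacian of u_h minus f
    r_ebc = Psi_g @ d_hat - g_vals       # essential-boundary misfit
    return np.mean(r_pde**2) + alpha * np.mean(r_ebc**2)
```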

Figure 2: Schematics of the S-NIM and V-NIM methods, where the nodes are represented
by black points, and the influence domains supp(xI ) of the trial shape functions (ΨI ) are
indicated by grey rectangular or square areas around nodes. (a) S-NIM: The sample points are
represented by blue stars; (b) V-NIM: The local subdomains, denoted by Ωs , are represented by
blue rectangles, with their centers marked as blue stars. The boundary of the local subdomain,
∂Ωs , is divided into two parts, Γs and Ls , where Γs represents the portion of ∂Ωs that lies
on the global boundary ∂Ω, whereas Ls corresponds to the portion of the boundary located
within the domain Ω.

4.2. Approach II: Local variational form-based neural integrated meshfree framework (V-NIM)
While formulating the loss function for the proposed S-NIM method is
straightforward due to the employment of strong-form governing PDEs, the
involvement of higher-order derivatives in loss functions could potentially impact
the efficiency and accuracy. Several variational physics-informed machine learn-
ing methods [47, 46, 7], as discussed in the introduction, have been developed to
alleviate these issues, where the corresponding loss function is constructed based

on the variational (weak) form of the governing equations derived by using the
weighted residual methods along with various testing functions. The use of weak
form decreases the required regularity of the approximate solution. Consequently,
a reduction in the highest order of derivatives and an improvement in training
accuracy can be achieved when compared to the strong form counterpart.
However, these variational approaches typically require a conforming dis-
cretization to construct the loss function described by an integral form on the
entire computation domain, which inevitably results in the loss of the truly
meshless characteristic. Motivated by this concern, we propose to introduce a
local weak formulation in the NIM method, namely V-NIM, which effectively
preserves the discretization/mesh-free property. This V-NIM approach admits
the local incorporation of underlying physics and avoids the significant costs
associated with mesh generation.

4.2.1. Local weak formulation for BVP


Inspired by the idea of local weak form used in the meshless local Petrov-
Galerkin (MLPG) approach [2, 1, 32], we let T be the set of (overlapping) local
subdomains such that their union covers the whole domain, i.e., Ω̄ ⊂ ∪_{s∈T} {Ω_s},
with N_T = |T| denoting the number of subdomains in T, as shown in Figure
2b. Under this setting, we can define an arbitrary local test function v on the
subdomain Ω_s, i.e., v(x) : x ∈ Ω_s ⊂ Ω ↦ R^d. Therefore, the local weak form of
the governing equation (1) over Ω_s is written as

∫_{Ω_s} v · (∇ · σ + f) dΩ = 0        (20)

Applying integration by parts and the divergence theorem to the above equation,
with introducing the traction/natural boundary condition, yields the following
local variational (weak) formulation
∫_{Ω_s} ∇v : σ dΩ − ∫_{L_s} v · t dΓ − ∫_{Γ_{sg}} v · t dΓ = ∫_{Ω_s} v · f dΩ + ∫_{Γ_{st}} v · t̄ dΓ        (21)

where t = σ ·n. In general, the boundary of the local domain Ωs is ∂Ωs = Γs ∪Ls ,
in which Γs denotes the portion of the local boundary ∂Ωs located on the global
boundary Γ = ∂Ω, i.e., Γs = ∂Ω ∩ Γ, whereas Ls represents the remaining
part of the local boundary that lies inside the domain, as shown in Figure 2b.
Specifically, we denote Γsg = Γs ∩ Γg and Γst = Γs ∩ Γt as the parts of the
local boundary Γs on which the EBC and NBC are specified, respectively. For a
subdomain located entirely within the global domain, there exists no Γs , and
thus, the boundary integrals over Γsg and Γst vanish and Ls = ∂Ωs . It should
be emphasized that the essential boundary conditions have not yet been imposed
in Eq. (21), and this will be addressed in Section 4.2.4.
Given a selected test function v defined on Ωs , the discretization of Eq. (21)
will only yield one linear algebraic equation. To ensure obtaining a sufficient

number of linearly independent equations for the displacement solution u ∈ R^d, we
can apply N_v (N_v ≥ d) independent sets of test functions {v^(k)}_{k=1}^{N_v} to Eq. (21).
As such, the local variational residual associated with the kth test function v^(k)
over Ω_s is defined as

R_s^(k) = ∫_{L_s} v^(k) · t dΓ + ∫_{Γ_{sg}} v^(k) · t dΓ + ∫_{Γ_{st}} v^(k) · t̄ dΓ
        − ∫_{Ω_s} ε_v^(k) : σ dΩ + ∫_{Ω_s} v^(k) · f dΩ        (22)

where ε_v^(k) = (1/2)(∇v^(k) + ∇v^(k)T) is the symmetric part of ∇v^(k) considering the
symmetry of Cauchy stress σ.
In practice, N_v = d is commonly considered in studies using the local variational
form [1]. In the following, we take a 2D solid problem as an example, i.e., d = 2.
The sets of test functions {v^(k)}_{k=1}^2 and {ε_v^(k)}_{k=1}^2 can be assembled in the matrices
w and ε_w, respectively, which are

w = [v_i^(j)] = [ v_1^(1)  v_1^(2) ;  v_2^(1)  v_2^(2) ],
ε_w = [ v_{1,1}^(1)  v_{2,2}^(1)  v_{1,2}^(1)+v_{2,1}^(1) ;  v_{1,1}^(2)  v_{2,2}^(2)  v_{1,2}^(2)+v_{2,1}^(2) ]^T        (23)

As a result, invoking the NeuroPU approximation (16) in Eq. (22), the
matrix form of the local variational residual at Ω_s is formulated as

R_s^h = ∫_{L_s} w^T t̂^h dΓ + ∫_{Γ_{sg}} w^T t̂^h dΓ + ∫_{Γ_{st}} w^T t̄ dΓ − ∫_{Ω_s} ε_w^T σ̂^h dΩ + ∫_{Ω_s} w^T f dΩ        (24)

Note that here Rhs consists of Nv = 2 independent equations, but it is straightfor-


ward to extend the formulation to the case of Nv > d by using more independent
test functions in constructing the local variational residuals.
Furthermore, we can simplify the set of test functions by designing an isotropic
w = [v_i^(j)] = v δ_{ij}, with v(x) being a chosen function defined on Ω_s. Therefore,
Eq. (23) is further reduced to

w = [ v  0 ;  0  v ],    ε_w = [ v_{,1}  0  v_{,2} ;  0  v_{,2}  v_{,1} ]^T        (25)

Unless otherwise stated, the setting of test functions in Eq. (25) is employed
in V-NIM across the present study. The matrix forms of other variables can be
referred to Appendix A.
Remark 4.1. It is noted that the trial shape functions ΨI (x) in the NeuroPU
approximation and the test function v(x) can be chosen from different function
spaces, leading to the Petrov-Galerkin method [39, 10, 2]. This flexibility allows
the test functions defined on overlapping subdomains, and the test functions

are not required to vanish on the boundary where EBCs are specified (more
discussion will be shown in Section 4.2.4).
Remark 4.2. We also note that the natural boundary condition (NBC) on Γ_st
is consistently imposed in V-NIM as its boundary integral terms are involved
in the local variational form (21) and the associated residual (24). Due to
this local consistency, we argue that V-NIM can provide a more accurate and
stable approximation compared to the weak imposition of the NBC via the penalty
method, as in PINN [65, 68] and S-NIM (19).
Remark 4.3. V-NIM allows the local weak form (21) (or the local residual
form (24)) to be constructed locally with Galerkin consistency. Given this unique
feature, the proposed V-NIM is a truly meshfree framework, distinct from other
global variational form-based methods [47, 46, 21, 70, 81], where the loss function
is constructed by using globally defined test functions or conforming background
integration cells over the entire domain.

4.2.2. Size of local subdomain


The selection of appropriate subdomains is critical as the local weak formula-
tion is built upon and computed over the set of subdomains Ωs . While these
subdomains can take on arbitrary shapes, in practice, common choices for a
local subdomain Ωs include a sphere (in the 3D case) or a circle (in the 2D case),
as well as a cube (in the 3D case) or a rectangle (in the 2D case) [2, 1, 32]. In
this study, we select the square shape with a side length of 2r for subdomains.
Here, we define r = r̄h, where h represents the characteristic nodal distance, and
r̄ serves as a normalized size constant. To ensure appropriate overlaps among
subdomains, we adopt 0.5 < r̄ < 1.5 for the subdomain size.

4.2.3. Design of test functions


As indicated before, the local weak form permits the flexibility of defining
arbitrary test functions over the local subdomains, including piece-wise poly-
nomials, Legendre polynomials, and radial basis functions. In practice, we can
customize special test functions on the subdomains to enable distinct properties
and simplify the form of local variational residuals (24). In the following, we
will introduce two special properties that can be embedded in the test functions
defined on a square subdomain (refer to Section 4.2.2).
1) Test functions vanishing on local boundary. In order to eliminate the
boundary integral terms on the part of local boundary Ls (see Figure 2b), we
define a test function v that vanishes on Ls , i.e., v(x) = 0 when x ∈ Ls . To this
end, a B-spline function or Gaussian function can be employed. By incorporating
these test functions into the local variational residual in Eq. (24), the boundary
integral terms over Ls will be eliminated, and Eq. (24) can be recast as
R_s^h = ∫_{Γ_{sg}} w^T t̂^h dΓ + ∫_{Γ_{st}} w^T t̄ dΓ − ∫_{Ω_s} ε_w^T σ̂^h dΩ + ∫_{Ω_s} w^T f dΩ        (26)

where w and εw are defined in Eq. (25).

2) Heaviside step function. The cumbersome computation of the domain
integral in Eq. (24) can be fully circumvented if we adopt the Heaviside step
function as the test function, namely,
v(x) = 0 for x ∉ (Ω_s ∪ L_s),    v(x) = 1 for x ∈ (Ω_s ∪ L_s)        (27)

Introducing Eq. (27) into Eqs. (24) and (25) leads to

R_s^h = ∫_{L_s} t̂^h dΓ + ∫_{Γ_{sg}} t̂^h dΓ + ∫_{Γ_{st}} t̄ dΓ + ∫_{Ω_s} f dΩ        (28)

where the domain integral term containing the derivative of v(x) is canceled due
to the property of Heaviside step function.
It is also worthwhile pointing out that if Dirac's delta function δ(x − x_s),
where x_s is the center of the subdomain Ω_s, is adopted for the test functions, the local
variational form (21) that V-NIM is based on will degenerate to the strong
formulation so that the S-NIM method is restored. In this case, the centers
of the subdomains {x_s}_{s=1}^{N_T} are considered as the sample points {x_i}_{i=1}^{N_f} in S-NIM;
refer to Figure 2.
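For reference, a minimal sketch of the two test-function choices on a square subdomain is given below: a tensor-product cubic B-spline "bubble" (built from the kernel in Eq. (7)) that vanishes on the local boundary L_s, and the Heaviside function of Eq. (27); the function names are illustrative.

```python
import numpy as np

def bspline_bubble(x, xs, r):
    """Tensor-product cubic B-spline test function v(x) on the square subdomain of
    half-width r centered at xs; it vanishes on the local boundary L_s (where |z| = 1)."""
    z = np.abs((np.asarray(x, float) - np.asarray(xs, float)) / r)
    w = np.where(z <= 0.5, 2/3 - 4*z**2 + 4*z**3,
         np.where(z <= 1.0, 4/3 - 4*z + 4*z**2 - (4/3)*z**3, 0.0))
    return float(np.prod(w))

def heaviside_test(x, xs, r):
    """Heaviside test function of Eq. (27): 1 inside the closed subdomain, 0 outside."""
    return float(np.all(np.abs(np.asarray(x, float) - np.asarray(xs, float)) <= r))

print(bspline_bubble([0.1, 0.0], xs=[0.0, 0.0], r=0.5))   # > 0 inside Omega_s
print(bspline_bubble([0.5, 0.0], xs=[0.0, 0.0], r=0.5))   # 0.0 on L_s
print(heaviside_test([0.4, -0.4], xs=[0.0, 0.0], r=0.5))  # 1.0
```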

4.2.4. Treatment of essential boundary conditions


In the field of physics-informed machine learning, it remains difficult to
exactly impose the boundary conditions including EBC and NBC. As suggested
in Remark 4.2, NBC is consistently considered in the local Petrov-Galerkin
formulation in V-NIM, so we are only concerned with the enforcement of EBC
in this subsection.
The penalty method has been widely used in traditional numerical methods;
we borrow this idea by adding an additional term to the local variational residual R_s^h:

R_s^{h,g} = R_s^h + α ∫_{Γ_{sg}} w^T (û^h − ū) dΓ        (29)

where α is a penalty parameter.


It is noted that since PU shape functions are adopted to construct the
approximation (16), various boundary enforcement methods proposed in the meshfree
community [15] can be readily applied to the V-NIM method, such as the Lagrange multiplier
method, the singular kernel method, or Nitsche's method. Here, we only consider the
penalty method for the sake of simplicity and consistency in comparison with
the S-NIM and PINN methods.

4.2.5. Loss function of V-NIM


Integrating the local variational residuals over the set of subdomains {Ω_s}_{s=1}^{N_T}
and considering the set of system parameters, the total loss of the V-NIM is
written as:

L^V(θ) = (1/(N_µ N_T)) Σ_{j=1}^{N_µ} Σ_{s=1}^{N_T} |R_s^{h,g}(µ_j)|²
       = (1/(N_µ N_T)) Σ_{j=1}^{N_µ} Σ_{s=1}^{N_T} |R_s^h + α ∫_{Γ_{sg}} w^T (û^h − ū) dΓ|²        (30)
where some examples of the local variational residual R_s^h are provided in Section 4.2.3.
Again, û^h(x, µ; θ) = Σ_{I∈S_x} Ψ_I(x) d̂_I(µ; θ) adopts the NeuroPU
approximation, where θ are the trainable parameters of the neural networks.
As reported in Remark 4.3, it can be seen that the loss LV for V-NIM simply
relies on the summation of local variational residuals instead of a full integral
form over the whole domain to enforce equilibrium [47, 70, 81, 7]. Thanks to this
collocation-like construction, each local residual can be minimized separately. On
the other hand, compared to other domain decomposition-based DNN methods,
such as hp-VPINN [46] or local extreme learning machines [20], the proposed
V-NIM does not require conforming subdomains to formulate the local residuals,
which offers great flexibility in constructing the loss function. Overall, the
local feature embedded in V-NIM enables the employment of an efficient and
scalable (mini-)batch training procedure. The enhanced computational efficiency
and accuracy will be highlighted in the following numerical experiments.
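A minimal sketch of assembling the total V-NIM loss of Eq. (30) from pre-computed local residuals, together with a random subdomain selector for the mini-batch training mentioned above, is given below; the array shapes and names are assumptions.

```python
import numpy as np

def v_nim_loss(local_residuals):
    """Total V-NIM loss, Eq. (30): mean of the squared local variational residuals
    R_s^{h,g} (each assumed to already include the penalty EBC term of Eq. (29)).
    local_residuals: array of shape (N_mu, N_T, d)."""
    R = np.asarray(local_residuals)
    return np.mean(np.sum(R**2, axis=-1))

def subdomain_batch(N_T, batch_size, rng=np.random.default_rng(0)):
    """Pick a random subset of subdomains for one training step; each local residual
    is an independent statement of equilibrium, so conforming subsets are not needed."""
    return rng.choice(N_T, size=batch_size, replace=False)

print(v_nim_loss(np.ones((3, 10, 2))))   # 2.0
print(subdomain_batch(2601, 256)[:5])
```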

5. Solution Procedures

A summary of the numerical procedures for the proposed S-NIM and V-NIM
methods is provided in this section.

5.1. Numerical procedures


Implementation of the S-NIM method
1. Define the following parameters.
   1.a. The set of meshfree nodes {x_I}_{I=1}^{N_h} and residual points S_f distributed over Ω. The sets of sample points S_t, S_g distributed on Γ_t and Γ_g, respectively.
   1.b. The parameters for the NeuroPU approximation (Figure 1), including the neural network architecture, the order of the basis function p, the support size a, and the type of shape functions.
2. Calculate and store the trial shape functions {Ψ_I}_{I=1}^{N_h} associated with the nodes {x_I}_{I=1}^{N_h}, and initialize the nodal coefficient network d̂(µ; θ) = N_θ(µ).
3. Construct the loss function based on Eq. (19).
4. Define the optimizer and minimize the loss function until convergence.
5. Output the trained network d̂(µ; θ*).

Implementation of the V-NIM method
1. Define the following parameters.
   1.a. The set of meshfree nodes {x_I}_{I=1}^{N_h} and the set of center points of the subdomains {x_s}_{s=1}^{N_T} distributed over Ω.
   1.b. The parameters for the NeuroPU approximation (Figure 1), including the neural network architecture, the order of the basis function p, the support size a, and the type of shape functions.
   1.c. The parameters for the local variational form, including the size of the subdomains r and the type of test functions.
2. Calculate and store the trial shape functions {Ψ_I}_{I=1}^{N_h} associated with the nodes {x_I}_{I=1}^{N_h}, and initialize the nodal coefficient network d̂(µ; θ) = N_θ(µ).
3. Construct the loss function based on Eq. (30) by using Gauss quadrature points.
   3.a. By introducing the quadrature rules, the discrete forms of Eq. (26) and Eq. (28) are, respectively, given as follows:

   R_s^h = ∫_{Γ_{sg}} w^T t̂^h dΓ + ∫_{Γ_{st}} w^T t̄ dΓ − ∫_{Ω_s} ε_w^T σ̂^h dΩ + ∫_{Ω_s} w^T f dΩ
         = Σ_{E=1}^{N_L} { J^E_{Γ_{sg}} Σ_{B=1}^{N_B} w^T(x_B) t̂^h(x_B) ω_B }
         + Σ_{E=1}^{N_L} { J^E_{Γ_{st}} Σ_{B=1}^{N_B} w^T(x_B) t̄(x_B) ω_B }
         − Σ_{E_x=1}^{N_E^x} Σ_{E_y=1}^{N_E^y} { J^{(E_x,E_y)}_{Ω_s} Σ_{G=1}^{N_G} ε_w^T(x_G) σ̂^h(x_G) ω_G }
         + Σ_{E_x=1}^{N_E^x} Σ_{E_y=1}^{N_E^y} { J^{(E_x,E_y)}_{Ω_s} Σ_{G=1}^{N_G} w^T(x_G) f(x_G) ω_G }        (31)

   and

   R_s^h = ∫_{L_s} t̂^h dΓ + ∫_{Γ_{sg}} t̂^h dΓ + ∫_{Γ_{st}} t̄ dΓ + ∫_{Ω_s} f dΩ
         = Σ_{E=1}^{N_L} Σ_{B=1}^{N_B} [ J^E_{L_s} t̂^h(x_B) ω_B ] + Σ_{E=1}^{N_L} Σ_{B=1}^{N_B} [ J^E_{Γ_{sg}} t̂^h(x_B) ω_B ]
         + Σ_{E=1}^{N_L} Σ_{B=1}^{N_B} [ J^E_{Γ_{st}} t̄(x_B) ω_B ] + Σ_{E_x=1}^{N_E^x} Σ_{E_y=1}^{N_E^y} Σ_{G=1}^{N_G} [ J^{(E_x,E_y)}_{Ω_s} f(x_G) ω_G ]        (32)

   where the x_B's and ω_B's are the locations and weights of the quadrature points for the boundary integrals, while the x_G's and ω_G's correspond to those for the domain integrals. J stands for the corresponding Jacobian. N_L represents the number of segments for the boundary integral. N_E^x and N_E^y are the numbers of segments for the domain integral along the x and y directions, respectively, when considering a 2D rectangular subdomain.
   3.b. The essential boundary terms in (30) are discretized in the similar way shown in Eqs. (31) and (32).
4. Define the optimizer and minimize the loss function until convergence.
5. Output the trained network d̂(µ; θ*).

Once the trained network d̂(µ; θ*) is given, the solution field is obtained by using the NeuroPU approximation: û^h(x, µ) = Σ_{I∈S_x} Ψ_I(x) d̂(µ; θ*).
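As a sketch of the quadrature in step 3.a, the snippet below builds a Gauss-Legendre rule on a segment and uses it to evaluate a domain integral over a square subdomain split into 4 × 4 segments with 5 points per direction (the setting later used in Section 6); the helper names are illustrative and the constant integrand is only a placeholder.

```python
import numpy as np

def gauss_segment(a, b, n=5):
    """n-point Gauss-Legendre rule mapped from [-1, 1] onto the segment [a, b];
    returns quadrature points x_B and weights including the Jacobian (b - a)/2."""
    xi, w = np.polynomial.legendre.leggauss(n)
    jac = 0.5 * (b - a)
    return 0.5 * (a + b) + jac * xi, jac * w

def integrate_subdomain(f, xs, r, n_seg=4, n_gauss=5):
    """Domain integral of f over the square subdomain of half-width r centered at xs,
    split into n_seg x n_seg segments, as in the last term of Eq. (32)."""
    xe = np.linspace(xs[0] - r, xs[0] + r, n_seg + 1)
    ye = np.linspace(xs[1] - r, xs[1] + r, n_seg + 1)
    total = 0.0
    for ex in range(n_seg):
        xg, wx = gauss_segment(xe[ex], xe[ex + 1], n_gauss)
        for ey in range(n_seg):
            yg, wy = gauss_segment(ye[ey], ye[ey + 1], n_gauss)
            X, Y = np.meshgrid(xg, yg, indexing='ij')
            total += np.sum(np.outer(wx, wy) * f(X, Y))
    return total

print(integrate_subdomain(lambda x, y: x * 0 + 1.0, xs=(0.0, 0.0), r=0.5))  # area = 1.0
```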

5.2. Summary: S-NIM & V-NIM


Table 1 provides an overview of the proposed S-NIM and V-NIM solvers.
While we only consider two different test functions, the framework can be easily
extended to other types of functions depending on applications of interest (see
Remark 4.1). We also refer to Table 1 in [46], which provides a summary of
test function and trial function for various variations and energy-based PINN
approaches. In the next section, we will compare the performance of these
proposed methods through a series of numerical examples.

Methods                 | S-NIM            | V-NIM
Trial function/solution | NeuroPU approximation (16)
Test function           | δ(x − x_s)       | Heaviside step | Cubic B-spline
Loss/Residual           | Strong form (19) | Local variational form (30)
Integration             | N/A              | N/A            | Gauss quadrature

Table 1: Summary of the two proposed NIM methods: S-NIM and V-NIM.

The proposed NIM framework is implemented based on TensorFlow 1.14 in


Python 3.9 for leveraging its built-in automatic differentiation capacity. The
computations are executed on a single NVIDIA A100 graphics processing unit
(GPU).

6. Numerical Results: Static Problem

In this section, we will examine the performance of the proposed S-NIM


and V-NIM methods on various numerical examples. The approximation and
convergence properties of the NeuroPU approach and the features of the local
formulation will be investigated using the static scalar (Poisson’s equation) and
vector (elasticity) problems. The superior performance of V-NIM based on the
local weak formulation will be highlighted. Moreover, we will demonstrate that
the NIM framework can be used for surrogate modeling of a parameterized PDE
with a desirable extrapolative capacity. In Section 7, the proposed method will be
further validated on a time-dependent problem, the advection-diffusion equation,

where time is considered as the model parameter. The solutions obtained by
the PINN and hp-VPINN methods [46] are also provided for comparison to
underscore the advantages of the NIM solver in both approximation accuracy
and computational efficiency.
Unless stated otherwise, the construction of the NeuroPU approximation
(16) in the NIM methods employs quadratic meshfree shape functions, i.e., p = 2
in (8), with the normalized support size ā = 2.5. To simplify notation, a neural
network with l hidden layers, each of which contains m neurons, is denoted as
l × [m]. The input size of the neural network in NeuroPU approximation is given
by the dimension of parameters µ ∈ Rc , and the output size is Nh .
As shown in Table 1, two different test functions are examined in V-NIM. To
distinguish the employment of the Heaviside function and cubic B-spline function
as the test functions, we denote the resultant V-NIM solvers as V-NIM/h and
V-NIM/c, respectively. As domain integrals are required in V-NIM, we let each
subdomain be uniformly divided into 4 × 4 segments in 2D case (or 4 segments
in 1D case), where 5 Gauss quadrature points per direction are used for the
segment integration.
The initial weights and biases of the neural networks in the NIM framework are
initialized using the Xavier scheme [27]. For the penalty parameters α adopted
for the S-NIM (19) and V-NIM (30) methods, we determine the optimal one
from the heuristic tests on the values [1, 10, 102 , 103 ]. In order to conduct a
fair comparison, we consider the same training scheme for different methods
to evaluate their training efficiency and accuracy. The Adam optimizer with a
learning rate of 0.001 is used by default unless stated otherwise.
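For completeness, a minimal NumPy sketch of the Xavier (Glorot) initialization and of a single Adam update with the stated learning rate is shown below; in this work these operations are in practice provided by the deep learning framework, and all names here are illustrative.

```python
import numpy as np

def xavier_init(n_in, n_out, rng=np.random.default_rng(0)):
    """Xavier (Glorot) uniform initialization for one weight matrix."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; grad is the gradient of the NIM loss w.r.t. theta at step t >= 1."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```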

6.1. 2D Poisson’s equation


To demonstrate the accuracy and convergence properties of the NIM solvers,
we first consider a two-dimensional Poisson’s equation:
∇²u(x, y) = f(x, y)    in Ω : [−1, 1] × [−1, 1]
u(x, y) = ū            on Γ_g : x = ±1 or y = ±1        (33)

where f (x, y) and ū are the prescribed force term and the essential boundary
condition (EBC) on Γg , respectively. Let the analytical solution [45, 46] be

uexact (x, y) = (0.1 sin(2πx) + tanh(10x)) × sin(2πy) (34)

The corresponding f (x, y) and ū can be obtained by substituting the exact


solution (34) in (33). In this static problem, since the parameter coefficient µ is
considered a constant, it is dropped in Eq. (33) for simplicity. As a result, a
shallow neural network (see Table 2) is selected for the NeuroPU approximation.

6.1.1. Effect of NeuroPU approximation


The effect of NeuroPU approximation with different orders of basis on the
NIM solutions is investigated in this subsection. As shown in Figure 3(left), a
uniform meshfree discretization of Nh = 1681 nodes is adopted for the NeuroPU

Figure 3: Left: The discretization for NIM, where the nodes are used to define the nodal
shape function and the center points represent the sample points for S-NIM and centers of
subdomains for V-NIM; Right: The reference solution for the 2D Poisson’s problem.

Methods           | PINN   | S-NIM  | V-NIM
Neural network    | 4×[40] | 1×[10] | 1×[10]
N_h               | N/A    | 1681   | 1681
Subdomain size r̄  | N/A    | N/A    | 1.5
N_f or N_T        | 10400  | 2601   | 2601
α                 | 100    | 1000   | 100

Table 2: Hyperparameters of PINN, S-NIM, and V-NIM for the 2D Poisson problem.

approximation (16), whereas a set of 2601 uniformly distributed residual points


(or subdomains) is used for S-NIM (or V-NIM) method. The comparison of
the V-NIM/h and V-NIM/c methods with the same normalized size r̄ = 1.5 of
subdomains is also provided. The network architecture of hidden layers for the
NIM methods is 1 × [10], and the output layer has Nh = 1681 neurons. We set
the number of Adam epochs to 50,000. All the hyperparameters are provided in
Table 2.
Figure 4 presents the point-wise error comparison among S-NIM, V-NIM/h
and V-NIM/c using different orders of basis (p = 1, 2, 3 with ā = 1.5, 2.5 and
3.5) for the NeuroPU shape functions. As the order of basis increases, the
approximation errors of all these NIM methods reduce significantly. For example,
the maximum point-wise error in S-NIM decreases from around 0.85 with a linear
basis to 6 × 10−3 with a cubic basis, whereas that of V-NIM/h from 1 × 10−2
to 2 × 10−4 . This highlights the flexibility of using high-order approximation
to achieve desirable accuracy in the NIM solution even with a relatively small
network, which is also evidenced by the convergence study (Figure 6). It is
important to note that the diminished accuracy in the S-NIM solution using
linear basis, as shown in Figure 4a, stems from the involvement of 2nd order
derivatives in the strong-form PDE residuals, which results in the linear RK
shape functions lacking sufficient approximation capacity.
Comparing the strong form and local weak form-based NIMs, it is observed
that both V-NIM/h and V-NIM/c exhibit higher accuracy than S-NIM, especially
in the case of V-NIM/c, which surpasses S-NIM by more than an error order

Figure 4: The comparisons of point-wise displacement errors obtained by S-NIM, V-NIM/h
and V-NIM/c using different orders of NeuroPU shape functions: (a) Linear; (b) Quadratic;
(c) Cubic. The reference solution is provided in the right panel of Figure 3.

of magnitude. This mainly attributes to the following properties provided by


the local variational formulation. First, the residuals of V-NIM (24) involve
lower order of derivatives than S-NIM (and PINN, refer to Section 6.1.2), which
reduces the NeuroPU approximation error related to derivatives; Second, the
proposed variational approach yields a consistent residual form to satisfy the
local equilibrium given its loss is adequately minimized; Third, due to the
compact support of test functions, the nonlocal information over a subdomain
can be incorporated in the residuals, whereas S-NIM only captures the misfit
information locally at the sample points.
On the other hand, we observe that V-NIM using the cubic B-spline test
function (V-NIM/c) yields superior performance compared to V-NIM with a
Heaviside step function (V-NIM/h). This is understandable because the test
functions with higher order continuity will improve the accuracy and stability in
the local weak form, at the expense of obtaining a more complicated residual
form (see Eqs. (26) and (28)).

6.1.2. NIM versus PINN


To highlight the superior accuracy and efficiency of the proposed hybrid
framework, we compare the solutions obtained by the standard PINN model and
the S-NIM and V-NIM model, in which the cubic trial shape function (p = 3)
with a normalized support size of ā = 3.5 is adopted for NeuroPU approximation.
The hyperparameters used for S-NIM and V-NIM remain the same as Table

2. To ensure sufficient accuracy in the PINN solution, we utilize Nf = 10, 400
uniformly distributed residual points, whereas only NT = 2, 601 subdomains
(or Nf = 2, 601 sample points) are considered in V-NIM (or S-NIM). A neural
network with hidden layers 4 × [40] is adopted for the PINN method.
Figure 5 shows the comparison of the approximate solutions obtained by dif-
ferent NIMs and PINN, and their corresponding absolute point-wise errors. The
maximum errors in solution approximation for all cases are less 0.01. Compared
to PINN, the results indicate that S-NIM generally provides a slightly improved
approximation on the edge regions, albeit with a slightly larger error over the
higher-gradient areas. Nevertheless, V-NIM yields the most accurate results,
indicating the preferable accuracy of the proposed V-NIM methods over the
PINN method.
Furthermore, the comparison of corresponding mean absolute errors (M AE)
and training costs are listed in Table 3. The M AE of S-NIM using 2601 residual
points is 1.11 × 10−3 , which showcases a slight degradation over the PINN
method with M AE = 6.32 × 10−4 . However, this is because S-NIM only utilizes
one fourth of sample points compared to PINN. In terms of training cost, PINN
takes approximately 10.65s for every 1000 epochs, approximately 8 times slower
than S-NIM, as illustrated in Table 3. Furthermore, in order to demonstrate the
superior ability of S-NIM over the PINN method, we adopt an increased number of
residual points (Nf = 10, 000) to train the S-NIM model, denoted as S-NIM2 in
Table 3. The corresponding M AE error is substantially reduced to 2.19 × 10−4
from 1.11 × 10−3 . The training time for S-NIM2 becomes 2.01s/1000 epochs,
which is 1.4 times longer than the S-NIM case with Nf = 2, 601. Overall,
this refined S-NIM solution yields 3 times higher accuracy and 5 times higher
efficiency compared with the PINN method, demonstrating the enhancement by
introducing the NeuroPU approximation in S-VIM.
The V-NIM methods further improve the performance, outperforming PINN
by approximately 1.5 times and 8 times in terms of accuracy when using V-NIM/h
and V-NIM/c, respectively. This remarkably higher accuracy is attributed to
the local variational form as we described in Section 4.2. Consistent with the
observation in Section 6.1.1, the employment of smooth test functions (V-NIM/c)
leads to remarkably higher accuracy in approximating the solution.
Because the shape functions in NIMs are pre-computed and stored, similar
to the approach in FEM, the additional computational cost incurred from using
higher-order approximation is, in fact, quite marginal during online training. It
is interesting to observe that, owing to the lower order derivatives involved in
the weak-form residuals, V-NIM even demonstrates superior training efficiency
to S-NIM. Specifically, the training of V-NIM for every 1000 epochs costs
approximately 1.20s, resulting in a 1.2x faster training rate than that of S-NIM.
The efficiency enhancement becomes more pronounced when compared to PINN,
which leads to a 10x speedup.

6.1.3. Convergence study


The preceding results show the outstanding performance of the proposed
V-NIM framework, resulting from the introduction of NeuroPU and local weak
formulation.

Methods                          PINN        S-NIM       S-NIM2      V-NIM/h     V-NIM/c
Epochs                           50000 (all methods)
Training time [s]/1000 epochs    10.65       1.46        2.01        1.21        1.20
MAE                              6.32e-04    1.11e-03    2.19e-04    4.21e-04    7.89e-05

Table 3: Training results of PINN, S-NIM, V-NIM/h and V-NIM/c for 2D Poisson's problem,
where the corresponding hyperparameters are referred to Table 2. For the demonstration of
the effect of sample points, we also provide the S-NIM2 solution that is obtained by using
Nf = 10, 000 sample points.

Figure 5: Comparison of the approximated displacement (upper row) and the absolute point-
wise errors (lower row) obtained by (a) PINN, (b) S-NIM, (c) V-NIM/h and (d) V-NIM/c.
The cubic NeuroPU shape function is adopted for all NIM methods.

In this subsection, we will illustrate the convergence property of


V-NIM in relation to meshfree discretization, exploring various combinations of
test and trial functions. We will consider four distinct discretizations, comprising
121, 441, 961 and 1681 uniformly distributed nodes, respectively. Note that we
have excluded the discussion of strong-form methods due to their lack of clearly
convergent behavior under the same training conditions.
In this test, we adopt 100,000 epochs with a decaying learning rate from 10−3
to 10−5 for the Adam optimizer and a fixed number of subdomains (NT = 2, 601)
to ensure a stable solution. The normalized sizes of the support domain are set
as ā = 2.5 for the quadratic shape function and ā = 3.5 for the cubic shape function,
respectively. The other unstated parameters follow Table 2. To properly
measure the solution errors, we define the errors in relative L2 norm and relative
semi-H 1 norm, namely, e0 and e1 , as follows:
e_0 = \frac{\sqrt{\int_\Omega (u - u^h)^2 \, d\Omega}}{\sqrt{\int_\Omega u^2 \, d\Omega}}, \qquad
e_1 = \frac{\sqrt{\int_\Omega \sum_{i=1}^{2} (u_{,i} - u^h_{,i})^2 \, d\Omega}}{\sqrt{\int_\Omega \sum_{i=1}^{2} (u_{,i})^2 \, d\Omega}}        (35)

where u and uh denote the reference and predicted solution, respectively.
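In practice, the integrals in Eq. (35) are evaluated numerically. The following sketch computes
e0 and e1 from reference and predicted values sampled at a set of integration points; the array
layout and the generic weight vector (which should absorb the quadrature and Jacobian factors)
are illustrative assumptions rather than the exact post-processing routine.

import numpy as np

def relative_errors(u, uh, grad_u, grad_uh, w):
    # u, uh           : (n,) reference and predicted solution values
    # grad_u, grad_uh : (n, d) reference and predicted first derivatives
    # w               : (n,) integration weights (quadrature and Jacobian factors)
    e0 = np.sqrt(np.sum(w * (u - uh)**2) / np.sum(w * u**2))
    e1 = np.sqrt(np.sum(w * np.sum((grad_u - grad_uh)**2, axis=1))
                 / np.sum(w * np.sum(grad_u**2, axis=1)))
    return e0, e1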

Figure 6 displays the convergence results of V-NIM/h (left) and V-NIM/c
(right) when employing different-order bases (p = 1, 2, 3) in the trial (NeuroPU)
shape functions. In the case of V-NIM/h, we do not consider the linear shape
function (p = 1) as the resultant weak form solution becomes unstable due to
the low-order continuity of the Heaviside step function. Overall, the results show
that both the e0 and e1 errors of the NIM methods progressively decrease as
refining the meshfree discretizations, where the rates of asymptotic convergence
are also provided in the figure. It is interesting to notice that these convergence
rates approximately follow the error estimates derived from the classical FEM
or Meshfree approximation [40, 15], i.e., e0 ∼ O(hp+1 ) and e1 ∼ O(hp ), where
h represents the characteristic nodal distance and p is the order of basis for
trial functions. For example, the convergence rates of e0 and e1 produced by
V-NIM/h, using quadratic (p = 2) NeuroPU shape functions, are approximately
re0 = 2.9 and re1 = 2.2, respectively. On the other hand, when employing
V-NIM/c with linear and quadratic NeuroPU shape functions, the convergence
rates for e0 are observed to be re0 = 2.3 and 3.4, while those for e1 are re1 = 1.1
and 2.5, respectively. Surprisingly, we notice that the cases with cubic shape
functions (p = 3) yield increased convergence rates in V-NIMs.
It is clearly shown in Figure 6 that in comparison to V-NIM/h, V-NIM/c
tends to offer better stability and attain favorable accuracy due to the high
continuity of the test function.
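The convergence rates quoted in the legend of Figure 6 correspond to the average slope of the
last two segments of the log-log error curves. A minimal sketch of this calculation is given
below; the nodal spacings match the four discretizations used here (h = 1/10, 1/20, 1/30, 1/40),
while the error values are hypothetical and serve only to illustrate the computation.

import numpy as np

def average_rate(h, err):
    # Average slope of the last two segments of a log-log convergence curve.
    slopes = np.diff(np.log(err)) / np.diff(np.log(h))
    return np.mean(slopes[-2:])

h = np.array([1/10, 1/20, 1/30, 1/40])            # nodal spacings of the four discretizations
e0 = np.array([2.0e-3, 2.6e-4, 8.2e-5, 3.5e-5])   # hypothetical e0 errors for a quadratic basis
print(average_rate(h, e0))                        # approx. 2.9, close to O(h^{p+1}) for p = 2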

Figure 6: Convergence study of the V-NIM methods using different test functions: (left)
V-NIM/h with Heaviside step function; (right) V-NIM/c with cubic B-spline function. The
convergence rates presented in the legend are determined by calculating the average slope
of the last two segments. p denotes the order of basis used for the NeuroPU approximation
function in the NIM methods.

6.2. Linear elasticity: Defected plate


In this example, we consider a plane-stress elastic square plate subjected to
a uniform normal traction, as shown in Figure 7(left). Due to the symmetry of
the problem, only a quarter plate with a length of 0.5 m is simulated, where
the Young’s modulus E and Poisson’s ratio ν are 20 MPa and 0.25, respectively,
and the traction imposed on the right side is T = 1 MPa.

Figure 7: Left: Schematic of defected plate under normal traction; Right: Distribution of
nodes (black points) and centers of subdomains (green points).

For demonstration, the V-NIM solver with Heaviside test function (V-NIM/h)
is adopted to solve this problem, where the associated normalized size of sub-
domains is set as r̄ = 1.2. The model discretization with 727 nodes and 4117
subdomains is shown in Figure 7(right). A neural network with one hidden layer
(10 neurons), i.e., 1 × [10], and the quadratic trial shape function with normalized
support size of ā = 2.4 are employed to construct the NeuroPU approximation
in V-NIM/h. As it has been reported that the standard PINN performs inefficiently
on elasticity problems, the mixed-variable based PINN method developed in [68]
for elasticity is adopted here as the baseline for comparison. We note
that, consistent with the previous examples, the penalty method is still used
to impose the boundary conditions for both V-NIM/h and the mixed-variable
PINN, in contrast to the hard enforcement of boundary conditions as reported
in [68]. The architecture of PINN is set as 5 × [50], and 22000 residual points
are fed for training. An Adam optimizer with 150,000 epochs and a decaying
learning rate from 10−3 to 10−5 is utilized for both PINN and V-NIM/h methods.
All the hyperparameters of method settings are listed in Table 4.
The V-NIM/h method exhibits remarkably high training efficiency, as shown
in Table 4, with a training speedup of more than 13 times compared to the mixed-
variable based PINN method (i.e., 2.25s vs. 30.24s for every 1000 epochs of
training). Apart from the training time, Table 4 shows the comparison of
e0 and e1 errors against the FEM reference solution, revealing approximately
5 times higher accuracy in V-NIM/h compared with PINN. This illustrates
the effectiveness of the end-to-end differentiation capacity within the proposed
variational framework, as well as the enhanced efficiency and accuracy achieved
through the use of NeuroPU for computing approximations and spatial gradients.
The approximated displacement and stress distributions generated by V-
NIM/h, PINN and FEM methods are visualized in Figures 8 and 9, respectively.
It is observed that V-NIM/h consistently yields agreeable results with the
reference solution obtained by the FEM method.

Methods                          V-NIM/h         PINN
Neural networks                  1 × [10]        5 × [50]
Nh                               727             N/A
Subdomain size r̄                 1.2             N/A
Support size ā                   2.4             N/A
Nf or NT                         4117            22000
α                                100             100
Segments                         2 × 2           N/A
Quadrature rule                  3 × 3           N/A
Epochs                           150,000         150,000
Training time [s]/1000 epochs    2.25            30.24
Error                            e0 : 1.44e-2    e0 : 8.02e-2
                                 e1 : 1.37e-2    e1 : 2.84e-1

Table 4: Hyperparameters and training results of the V-NIM/h and the mixed-variable PINN
for the 2D elasticity problem, where the FEM reference solution is used for calculating the errors.

Figure 8: Comparison of the approximated displacement computed by (a) V-NIM/h, (b) FEM
and (c) Mixed-variable PINN.

In contrast, the PINN model is not
capable of capturing the stress concentration around the notch region, leading
to a less satisfactory solution. The point-wise errors of the displacements (ux
and uy ) and two stress fields (σxx and σyy ) are also portrayed in Figure 10.
It shows that the errors of PINN are typically 5 ∼ 10 times larger than the
corresponding errors produced by V-NIM/h. The improvement of V-NIM is
particularly pronounced in the stress field, which further confirms that the proposed
method offers enhanced approximation capability in the higher-order derivatives
of the solution.
Taking a closer observation, the stress distributions around the notch are
plotted in Figure 11, where the reference FEM, PINN, and V-NIM/h are com-
pared. It is evident that the solutions obtained by V-NIM/h method closely
align with the reference solution, demonstrating its advantage of accuracy in
critical regions.

Figure 9: Comparison of the approximated stress components σxx , σyy and σxy computed by
(a) V-NIM/h, (b) FEM and (c) Mixed-variable PINN.

In contrast, the results obtained through the PINN method
show significant deviations from the reference solution, failing to capture the
steep variation of stress along the notch surface. This could be due to
the insufficient approximation of the network model and the training difficulty
associated with the elasticity problem.

6.3. Surrogate modeling for parameterized elliptic PDE


In this example, we aim to investigate the applicability of the NIM method
for surrogate modeling. Let us consider a parameterized elliptic PDE [29] defined
in Ω = [0, 1] × [0, 1]
-\nabla^2 u(x, y) + \frac{\mu_1}{\mu_2}\left(e^{\mu_2 u} - 1\right) = 100 \sin(2\pi x) \sin(2\pi y)        (36)

and the associated Dirichlet (essential) boundary condition defined at Γ = ∂Ω is

u(0, y) = u(1, y) = u(x, 0) = u(x, 1) = 0 (37)

where µ1 and µ2 are the system parameters, with (µ1 , µ2 ) ∈ D = [0.01, 10]2 .
Following the network architecture of NeuroPU shown in Figure 1, we input
(µ1 , µ2 ) into the neural network. The resulting outputs are the respective Nh
nodal coefficient functions obtained after passing through hidden layers structured
as 4 × [40], representing the surrogate model associated with the two parameters.
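To make the structure of this parameterized surrogate concrete, a minimal JAX-style sketch is
given below: a fully connected network maps (µ1, µ2) to the Nh nodal coefficients, which are
then contracted with the precomputed NeuroPU shape function matrix to evaluate the solution
at arbitrary points. The helper names, activation, initialization, and array shapes are
illustrative assumptions, not the exact implementation.

import jax.numpy as jnp
from jax import random
from jax.nn import tanh

def init_mlp(key, sizes):
    # Glorot-type initialization of a fully connected network (illustrative).
    keys = random.split(key, len(sizes) - 1)
    return [(random.normal(k, (m, n)) * jnp.sqrt(2.0 / (m + n)), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

Nh = 441                                            # number of meshfree nodes
params = init_mlp(random.PRNGKey(0), [2, 40, 40, 40, 40, Nh])   # hidden layers 4 x [40]

def u_hat(params, mu, Psi):
    # mu  : (2,) parameter point (mu1, mu2)
    # Psi : (Np, Nh) NeuroPU shape functions evaluated at Np output points
    d = mlp(params, mu)                             # nodal coefficients d(mu; theta)
    return Psi @ d                                  # predicted field at the Np points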
We consider the V-NIM model with the cubic B-spline test function (V-
NIM/c), where the computation domain is discretized using 441 uniform nodes,
with an equal number of subdomains, i.e., Nh = NT = 441. The penalty number
for essential boundary enforcement is set as 10. The training of the V-NIM

28
Figure 10: Comparison of the point-wise error of displacement and stress components obtained
by (a) Mixed-variable PINN, (b) V-NIM/h.

Figure 11: Comparison of the stress distribution (a) σxx , (b) σyy and (c) σxy on the notched
surface obtained by the PINN, V-NIM/h and FEM methods.

surrogate model is conducted by 7000 epochs, utilizing an Adam optimizer with


a learning rate that decreases from 1e-3 to 1e-4. The normalized support size
and subdomain size are given by ā = 2.5 and r̄ = 1.5.
To evaluate the extrapolation performance of the V-NIM surrogate, 121
uniformly distributed parameter points (i.e., Nµ = 121) within (µ1 , µ2 ) ∈
[0.01, 6]2 ⊂ D are used for training the V-NIM model (30), and then the trained
model is tested at the parameter point (µ1 , µ2 ) = [10, 10], which lies outside the
range of the training set.
Figure 12 depicts the loss evolution during the training process of the V-NIM
surrogate using the Adam optimizer for 7,000 epochs. It shows that the V-NIM
model converges well after about 5,000 epochs and the boundary condition is
satisfied adequately. The prediction by V-NIM at the testing parameter point
(µ1 , µ2 ) = [10, 10] is given in Figure 13. It can be seen from the error distribution
that the higher errors are located near the boundary and in regions where large
gradients appear. However, the maximum point-wise error of V-NIM is less
than 0.02. Overall, the result exhibits a desirable agreement with the reference
solution despite being outside the training set. This demonstrates the ability
of the NIM method for accurate extrapolative prediction beyond the training region.

Figure 12: The evolution of L2 error (Left) and the MAE loss function (Right) during training
the V-NIM/c surrogate. The loss term associated with the Dirichlet boundary condition is
also provided.
Additionally, Figure 14 presents the e0 error of the predicted solution for
(µ1 , µ2 ) ∈ D, where the training set is enclosed within a white box. The
remaining area represents the extrapolation region of the surrogate model,
illustrating the capability of V-NIM in surrogate modeling. It is noteworthy that
once the NIM surrogate model is trained, the predictions for other parameter
points can be generated in real-time with minimal additional computational
cost. This surrogate modeling capacity of the NIM framework offers a unique
advantage compared to conventional numerical solvers that require repeated full
computation for each case.

Figure 13: (Left) The reference solution, (Middle) predicted solution by the V-NIM/c surrogate;
(Right) the point-wise error. The parameter testing point is (µ1 , µ2 ) = [10, 10].

7. Numerical Results: Time-Dependent Problem

To showcase the applicability of the NIM method to dynamics problems, we


consider a time-dependent advection-diffusion (AD) equation [36, 46] defined in
the computational domain (x, t) ∈ Ω × Ωt

\begin{cases}
u_{,t} + a u_{,x} = \kappa u_{,xx}, & \text{in } \Omega \times \Omega_t \\
u(\pm 1, t) = 0 & \\
u(x, 0) = u_0(x) = -\sin(\pi x) &
\end{cases}        (38)

Figure 14: The distribution of the e0 error of the V-NIM surrogate over the parameter space
(µ1 , µ2 ) ∈ D.

where Ω = [−1, 1], Ωt = [0, 1], the Dirichlet (essential) boundary is Γg = {(x, t)|x =
±1, t ∈ Ωt }, the initial boundary is Γ0 = {(x, t)|x ∈ Ω, t = 0}, and a = 1
and κ = 0.1/π represent the advection velocity and the diffusivity coefficient,
respectively. The analytical solution of Eq. (38) is given in [62] in the form of an
infinite series.

Methods             V-NIM/c            hp-VPINN
Trial functions     NeuroPU            Neural network
Test functions      Cubic B-spline     Legendre polynomials (Pn (x), n = 1, ..., 5)
Nh                  41                 N/A
Subdomain size r    1.5h               0.25
Support size a      2.5h               N/A
Nµ                  41                 N/A
Subdomain NT        41 (1D)            4 × 4 (2D)
α1 , α2             1, 1               10, 10
Quadrature rule     15 × 1 (1D)        10 × 10 (2D)
Epochs              100,000            150,000

Table 5: Hyperparameters of the V-NIM and hp-VPINN for the advection-diffusion equation.
The support size a and the subdomain size r are calculated by a = āh = 2.5/20 = 0.125 and
r = r̄h = 1.5/20 = 0.075, where h = 1/20 is the characteristic nodal distance.

By defining the temporal variable as a system parameter input, the NeuroPU


approximation (16) for this problem can be rewritten as
\hat{u}^h(x, t; \theta) = \sum_{I \in S_x} \Psi_I(x) \, \hat{d}_I(t; \theta)        (39)

It is noted that, aided by NeuroPU approximation, the NIM method avoids


the necessity of applying semi-discretization to the spatiotemporal domain that
is commonly used in numerical schemes for time-dependent problems [54, 40].

Instead, NIM is able to directly approximate the solution as a continuous
function of temporal variable. While it is not within the scope of this study,
this continuous representation opens up a potential avenue to deal with time-
dependent observation data for inverse dynamics problems. On the other hand,
NIM simplifies the integration process by decoupling the spatial and temporal
domains, unlike the two-dimensional spatiotemporal integration required for the
weak form-based PINN methods [46].
For demonstration, we utilize the V-NIM method with cubic B-spline function
as the test functions, i.e., V-NIM/c method, to solve this problem. With the
employment of Eq. (39), the local variational residual for the AD equation (38)
over Ωs can be derived as
R_s^h = \int_{\Omega_s} v \left( \hat{u}^h_{,t} + a \hat{u}^h_{,x} - \kappa \hat{u}^h_{,xx} \right) d\Omega
      = \kappa \int_{\Omega_s} v_{,x} \hat{u}^h_{,x} \, d\Omega - \kappa \int_{\Gamma_{su}} v \hat{u}^h_{,x} n_x \, d\Gamma + \int_{\Omega_s} v \left( \hat{u}^h_{,t} + a \hat{u}^h_{,x} \right) d\Omega        (40)

where the divergence theorem is applied and the boundary integral term on
Ls is canceled due to the boundary vanishing property of the cubic B-spline
function employed as test functions (refer to Section 4.2.3). The corresponding
loss function modified based on (30) can be formulated as
L_V(\theta) = \frac{1}{N_\mu N_T} \sum_{j=1}^{N_\mu} \sum_{s=1}^{N_T} \left( R_s^h(t_j) \right)^2
            + \frac{\alpha_1}{N_\mu} \sum_{j=1}^{N_\mu} \left( \hat{u}^h(\pm 1, t_j) - \bar{u}(\pm 1, t_j) \right)^2
            + \frac{\alpha_2}{N_T} \sum_{s=1}^{N_T} \left( \hat{u}^h(x_s, 0) - u_0(x_s) \right)^2        (41)

where Nµ is considered as the number of sampling points in the parameter space,


namely, the temporal space Ωt .
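To indicate how the residual (40) and the loss (41) could be assembled in a differentiable-
programming setting, a hedged sketch is given below. It assumes that the shape functions and
their x-derivatives have been precomputed at the quadrature points of each subdomain, and
that a small network net(t) returns the Nh nodal coefficients at time t, so that only the
time derivative of the coefficients requires automatic differentiation while all spatial
derivatives come from the stored shape function gradients. The array names and shapes are
assumptions for illustration, and the boundary integral on Γsu in Eq. (40) is omitted
(interior subdomains).

import jax.numpy as jnp
from jax import jacfwd, vmap

a, kappa = 1.0, 0.1 / jnp.pi                       # coefficients of Eq. (38)

def local_residuals(net, t, Psi, dPsi, v, dv, wq):
    # net       : callable, t -> nodal coefficients d(t; theta), shape (Nh,)
    # Psi, dPsi : (NT, Nq, Nh) shape functions and x-derivatives at quadrature points
    # v, dv     : (NT, Nq) cubic B-spline test function and its x-derivative
    # wq        : (NT, Nq) quadrature weights over each subdomain
    d = net(t)
    d_t = jacfwd(net)(t)                           # time derivative of the coefficients via AD
    u_x = jnp.einsum('sqi,i->sq', dPsi, d)         # spatial gradient from precomputed shape functions
    u_t = jnp.einsum('sqi,i->sq', Psi, d_t)
    integrand = kappa * dv * u_x + v * (u_t + a * u_x)
    return jnp.sum(wq * integrand, axis=1)         # R_s^h(t) for all NT subdomains

def vnim_loss(net, t_pts, Psi, dPsi, v, dv, wq, Psi_bc, Psi_ic, u0_vals,
              alpha1=1.0, alpha2=1.0):
    # PDE term of Eq. (41), averaged over the N_mu time samples and NT subdomains
    R = vmap(lambda t: local_residuals(net, t, Psi, dPsi, v, dv, wq))(t_pts)
    loss_pde = jnp.mean(R**2)
    # boundary term: u^h(+-1, t_j) = 0, with Psi_bc of shape (2, Nh)
    u_bc = vmap(lambda t: Psi_bc @ net(t))(t_pts)
    loss_bc = alpha1 * jnp.mean(jnp.sum(u_bc**2, axis=1))
    # initial-condition term at the NT points x_s, with Psi_ic of shape (NT, Nh)
    u_ic = Psi_ic @ net(0.0)
    loss_ic = alpha2 * jnp.mean((u_ic - u0_vals)**2)
    return loss_pde + loss_bc + loss_ic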
As shown in Table 5, for the setting of V-NIM/c, Nh = 41 uniformly dis-
tributed nodes and NT = 41 subdomains are utilized for meshfree discretization
in the physical domain Ω = [−1, 1], whereas Nµ = 41 sampling points along the
temporal dimension Ωt = [0, 1] are used to train the neural network of NeuroPU
approximation. The normalized subdomain size and the support size of the quadratic trial shape
functions are set as r̄ = 1.5 and ā = 2.5, respectively. For comparison, we also
conduct simulations using the hp-VPINN method [46], where the associated
hyperparameters of hp-VPINN are kept the same as Figure 16 in the reference
[46] and summarized in Table 5. We would like to emphasize that achieving a
perfectly fair comparison is nontrivial, if not impossible, due to the introduction
of NeuroPU and the local variational form. These elements render the proposed
V-NIM a fundamentally distinct approach when compared to hp-VPINN. For in-
stance, we have noticed that hp-VPINN necessitates using a set of test functions
up to 5th-order polynomials as well as a large number of training epochs to attain
sufficient accuracy, neither of which is required in the NIM solver. Hence,
the following comparative study should be regarded as an observational analysis,
recognizing the inherent differences in the methodologies being compared.

Methods      Neural networks    Time [s]/100 epochs    MAE         e0
V-NIM/c      1 × [10]           0.98                   2.64e-03    9.59e-02
V-NIM/c      2 × [10]           1.16                   2.04e-03    7.64e-02
V-NIM/c      3 × [20]           1.37                   1.37e-03    3.24e-02
V-NIM/c      4 × [30]           1.67                   1.17e-03    2.59e-02
hp-VPINN     3 × [5]            1.23                   1.81e-02    7.24e-01
hp-VPINN     3 × [20]           1.91                   1.69e-02    5.83e-01

Table 6: Comparison of V-NIM/c and hp-VPINN under different sizes of neural networks. The
other model parameters are given in Table 5.

The numerical tests with regard to different sizes of neural networks are
conducted to show the accuracy and robustness of the proposed V-NIM/c method
and hp-VPINN. As can be seen from the MAE and e0 errors in Table 6, an
enhancement in accuracy is attained as a larger neural network is adopted in
the V-NIM method, with increasing training cost (see the "Time [s]/100 epochs"
column in Table 6) as expected. This demonstrates that a larger network provides
a better approximation capacity to capture the time-dependent behaviors. Owing
to the hybrid approximation obtained by integrating high-order meshfree shape
functions and neural networks, the search space in V-NIM is greatly reduced
compared to a conventional method (e.g., hp-VPINN) that is purely approximated by
neural networks. Consequently, the proposed V-NIM is capable of attaining
preferable accuracy relative to hp-VPINN with a much smaller network (1 × [10]
vs. 3 × [20]), leading to a fraction of the training cost (as shown in Table 6). In
addition, rather than relying on expensive AD to calculate high-order derivatives,
the introduction of the NeuroPU approximation in NIM allows the spatial gradients
to be pre-computed, since they operate solely on the shape functions (18). This also
contributes to a reduction in the computational complexity.
The snapshots of the solutions of the V-NIM/c and hp-VPINN methods at
t = 0.2s, 0.6s and 1.0s are plotted in Figure 15, which evinces that the V-NIM/c
solution agrees well with the analytical solution, while for the hp-VPINN result
a small deviation is observed in the region where the solution has relatively large
magnitudes. This implies the enhanced approximation of V-NIM for the sharply
changing solution and higher-order derivatives. Figure 16 shows the comparison
of point-wise errors of hp-VPINN and V-NIM/c over the whole spatial-temporal
domain, with different sizes of neural networks. The maximum point-wise error
in V-NIM/c solutions with different sizes of neural networks consistently remains
under 0.04, with only a minor error observed at the top edge. In contrast, the
error snapshot of the hp-VPINN solution exhibits larger errors over most of
the domain. It is also noted that 50,000 more epochs are used in training the
hp-VPINN model to obtain a satisfactory result.
As a supplement, we provide the evolution of the training loss of V-NIM/c using
neural networks with dimensions 1 × [10], 3 × [20] and 4 × [30] in Figure 17, which
showcases the stable convergence property of the NIM method.

Figure 15: The snapshots of analytical solution, and approximation solutions obtained by
V-NIM/c and hp-VPINN methods at t = 0.2s, 0.6s and 1.0s.

Figure 16: Comparison of the point-wise errors of the approximate solution obtained by V-
NIM/c and hp-VPINN. The horizontal and vertical axes denote the time and space coordinates,
respectively.

8. Conclusion

In this study, we present a novel framework, NIM, as a differentiable


programming-based AI methodology to solve a variety of computational me-
chanics problems. The integration of the numerically discretized system using
meshfree methods with deep neural networks encoded in differentiable computa-
tion graphs enables the end-to-end training of the entire hybrid model to seek
approximate solutions of the PDEs. The main characteristic of NIM is the hybrid
approximation scheme, NeuroPU, which interpolates DNN representations with
meshfree basis functions based on the PU concept to enhance
the approximation accuracy and computational efficiency. As an example, the
reproducing kernel (RK) shape function is employed in the NeuroPU approxima-
tion as it permits arbitrary accuracy and smoothness defined a priori and it is
well-suited for meshfree discretization. Thanks to the interpolation property of
NeuroPU, the size of neural networks and subsequently the number of sampling
points required to train the NIM solution can be significantly reduced while
achieving satisfactory accuracy.
Figure 17: The evolution of training loss and e0 error generated by V-NIM/c using different
neural network architectures: (a) 1 × [10], (b) 3 × [20] and (c) 4 × [30].

Within the proposed NIM framework, we propose two meshfree solvers, S-NIM
and V-NIM. While S-NIM presents a straightforward solution procedure by
using strong-form PDEs in the loss function, our particular interest lies in V-NIM,
which is built on the variationally consistent formulation, so that the required
regularity of the approximate solution is reduced. To achieve this, we introduce a
local Petrov-Galerkin approach that constructs the loss function of V-NIM using
local residuals defined on overlapping subdomains. This embedded meshfree
property eliminates the need for costly conforming mesh generation and enables
efficient batch training. The variational nature of V-NIM offers the flexibility
of using various test functions. Two types of test functions, namely the Heaviside
step function and the cubic B-spline function, are employed for investigation in our
study.
The merits and effectiveness of the NIM solvers have been demonstrated
through various numerical examples across static and dynamic problems in
comparison with the classical FEM, standard PINN and variational PINN
methods. The results of static problems (Section 6.1 and 6.2) show that the
NIM methods utilizing the NeuroPU approximation significantly enhance both
efficiency and accuracy in comparison to the PINN method. For instance,
in the case of Poisson's problem, S-NIM exhibits half the MAE errors and
requires only 1/5 of the training time when using a similar number of sampling
points compared to PINN. Owing to the enhanced stability and consistency
enforced by the local variational formulation, even greater improvements are
achieved by V-NIMs (V-NIM/c, etc.), leading to nearly an order of magnitude
higher accuracy while being 10 times faster than PINN. V-NIMs also demonstrate
favorable convergence rates under different orders of NeuroPU shape functions.
In the elasticity problem, we also highlight the superior performance of V-NIM in
approximating the stress field, a higher-order derivative of the solution, compared
to the strong-form methods, such as S-NIM and PINN.

The versatility of the proposed differentiable method beyond conventional
simulation techniques is also demonstrated in this study. Specifically, we highlight
the desirable extrapolative ability and the real-time prediction of the V-NIM
model for surrogate modeling of a parameterized elliptic PDE. It is also shown
that, by considering the temporal variable as inputs to the NeuroPU approxi-
mation, the NIM method yields more stable results while maintaining superior
training efficiency compared to the hp-VPINN method for the time-dependent
(advection-diffusion equation) problem.
In conclusion, we describe the NIM method as a differentiable meshfree
solver, enabling end-to-end gradient-based optimization procedures for seeking
solutions. This innovative method holds great promise for the development
of next-generation physics-based data-driven solvers that offer a remarkable
balance between accuracy and computational efficiency. Additionally, it provides
a versatile framework for data assimilation and adaptive refinement due to
its meshfree nature. It is also worth noting that NIM opens a new avenue
for creating more efficient deep learning models through seamlessly blending
shape functions that represent the finite discretized domain with DNNs that
represent the problem-parameter space. In the future, we plan to explore the
performance of the NIM method in operator learning, inverse modeling, and
diverse applications related to nonlinear material modeling.

Acknowledgment

This research was partially supported by Q.Z. He’s Startup Fund and Data
Science Initiative (DSI) Seed Grant at the University of Minnesota. The authors
also acknowledge the Minnesota Supercomputing Institute (MSI) for providing
resources that contributed to the research results reported within this paper.

Appendix A.

For 2D elasticity associated with the NIM framework (Section 4), the ap-
proximated displacement ûh (x) is given as
\hat{u}^h(x) = \sum_{I \in S_x} N_I(x) \, \hat{d}_I        (A.1)

also the stress σ̂ h (x) and traction t̂h (x) tensors are given as
\hat{\sigma}^h(x) = \sum_{I \in S_x} D B_I(x) \, \hat{d}_I        (A.2)

\hat{t}^h(x) = \sum_{I \in S_x} n D B_I(x) \, \hat{d}_I        (A.3)

where NI , BI , n, and D are given as

N_I = \begin{bmatrix} \Phi_I & 0 \\ 0 & \Phi_I \end{bmatrix}, \quad
B_I = \begin{bmatrix} \Phi_{I,x} & 0 \\ 0 & \Phi_{I,y} \\ \Phi_{I,y} & \Phi_{I,x} \end{bmatrix}, \quad
n = \begin{bmatrix} n_x & 0 \\ 0 & n_y \\ n_y & n_x \end{bmatrix}^{T}        (A.4)

and

D = \frac{\bar{E}}{1 - \bar{\nu}^2} \begin{bmatrix} 1 & \bar{\nu} & 0 \\ \bar{\nu} & 1 & 0 \\ 0 & 0 & (1 - \bar{\nu})/2 \end{bmatrix}, \qquad
\begin{cases}
\bar{E} = E, \; \bar{\nu} = \nu & \text{(plane stress)} \\
\bar{E} = \frac{E}{1 - \nu^2}, \; \bar{\nu} = \frac{\nu}{1 - \nu} & \text{(plane strain)}
\end{cases}        (A.5)
with E and ν being the Young’s modulus and Poisson’s ratio, respectively.
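As a small worked example of Eqs. (A.1)-(A.5), the sketch below assembles the constitutive
matrix D for plane stress or plane strain and recovers the stress at a point from the nodal
coefficient vectors of the neighboring nodes; the variable names and the per-node loop are
illustrative assumptions rather than the actual implementation.

import numpy as np

def constitutive_matrix(E, nu, plane_stress=True):
    # D of Eq. (A.5); for plane strain, E and nu are first replaced by their
    # effective values E/(1 - nu^2) and nu/(1 - nu).
    if not plane_stress:
        E, nu = E / (1.0 - nu**2), nu / (1.0 - nu)
    return E / (1.0 - nu**2) * np.array([[1.0, nu, 0.0],
                                         [nu, 1.0, 0.0],
                                         [0.0, 0.0, (1.0 - nu) / 2.0]])

def stress_at_point(dPhi_dx, dPhi_dy, d_nodes, D):
    # Stress recovery of Eq. (A.2) at a single evaluation point.
    # dPhi_dx, dPhi_dy : (Nn,) shape-function derivatives of the Nn neighboring nodes
    # d_nodes          : (Nn, 2) nodal coefficient vectors (x and y components)
    strain = np.zeros(3)
    for px, py, dI in zip(dPhi_dx, dPhi_dy, d_nodes):
        B_I = np.array([[px, 0.0],
                        [0.0, py],
                        [py, px]])
        strain += B_I @ dI
    return D @ strain

# Material constants of the defected-plate example (E = 20 MPa, nu = 0.25)
D = constitutive_matrix(20.0, 0.25, plane_stress=True)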

References

[1] Atluri, S., Zhu, T., 2000. New concepts in meshless methods. International
journal for numerical methods in engineering 47, 537–556.

[2] Atluri, S.N., Zhu, T., 1998. A new meshless local petrov-galerkin (mlpg)
approach in computational mechanics. Computational mechanics 22, 117–
127.
[3] Babuška, I., Melenk, J.M., 1997. The partition of unity method. Interna-
tional journal for numerical methods in engineering 40, 727–758.

[4] Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M., 2018. Auto-
matic differentiation in machine learning: a survey. Journal of Marchine
Learning Research 18, 1–43.
[5] Belytschko, T., Lu, Y.Y., Gu, L., 1994. Element-free galerkin methods.
International journal for numerical methods in engineering 37, 229–256.

[6] Berg, J., Nyström, K., 2018. A unified deep artificial neural network
approach to partial differential equations in complex geometries. Neurocom-
puting 317, 28–41.
[7] Berrone, S., Canuto, C., Pintore, M., 2022. Variational physics informed
neural networks: the role of quadratures and test functions. Journal of
Scientific Computing 92, 100.
[8] Bezgin, D.A., Buhendwa, A.B., Adams, N.A., 2023. Jax-fluids: A fully-
differentiable high-order computational fluid dynamics solver for compress-
ible two-phase flows. Computer Physics Communications 282, 108527.

[9] Blum, E.K., Li, L.K., 1991. Approximation theory and feedforward networks.
Neural networks 4, 511–515.
[10] Bottasso, C.L., Micheletti, S., Sacco, R., 2002. The discontinuous petrov–
galerkin method for elliptic problems. Computer Methods in Applied
Mechanics and Engineering 191, 3391–3409.

[11] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M.J., Leary, C., Maclaurin,
D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., et al.,
2018. Jax: composable transformations of python+ numpy programs .
[12] Brunton, S.L., Kutz, J.N., 2019. Data-driven science and engineering:
Machine learning, dynamical systems, and control. Cambridge University
Press.
[13] Cai, S., Mao, Z., Wang, Z., Yin, M., Karniadakis, G.E., 2021. Physics-
informed neural networks (pinns) for fluid mechanics: A review. Acta
Mechanica Sinica 37, 1727–1738.

[14] Cardiff, P., Demirdžić, I., 2021. Thirty years of the finite volume method
for solid mechanics. Archives of Computational Methods in Engineering 28,
3721–3780.
[15] Chen, J.S., Hillman, M., Chi, S.W., 2017. Meshfree methods: progress
made after 20 years. Journal of Engineering Mechanics 143, 04017001.

[16] Chen, J.S., Pan, C., Wu, C.T., Liu, W.K., 1996. Reproducing kernel particle
methods for large deformation analysis of non-linear structures. Computer
methods in applied mechanics and engineering 139, 195–227.
[17] Chiu, P.H., Wong, J.C., Ooi, C., Dao, M.H., Ong, Y.S., 2022. Can-pinn: A
fast physics-informed neural network based on coupled-automatic–numerical
differentiation method. Computer Methods in Applied Mechanics and
Engineering 395, 114909.
[18] Clough, R., 1960. The Finite Element Method in Plane Stress Analysis.
American Society of Civil Engineers.

[19] Cuomo, S., Di Cola, V.S., Giampaolo, F., Rozza, G., Raissi, M., Piccialli, F.,
2022. Scientific machine learning through physics–informed neural networks:
Where we are and what’s next. Journal of Scientific Computing 92, 88.
[20] Dong, S., Li, Z., 2021. Local extreme learning machines and domain
decomposition for solving linear and nonlinear partial differential equations.
Computer Methods in Applied Mechanics and Engineering 387, 114129.
[21] Dong, Y., Liu, T., Li, Z., Qiao, P., 2023. Deepfem: A novel element-based
deep learning approach for solving nonlinear partial differential equations
in computational solid mechanics. Journal of Engineering Mechanics 149,
04022102.

[22] Du, H., Zhao, Z., Cheng, H., Yan, J., He, Q., 2023. Modeling density-
driven flow in porous media by physics-informed neural networks for co2
sequestration. Computers and Geotechnics 159, 105433.
[23] Eggersmann, R., Kirchdoerfer, T., Reese, S., Stainier, L., Ortiz, M., 2019.
Model-free data-driven inelasticity. Computer Methods in Applied Mechan-
ics and Engineering 350, 81–99.
[24] Fang, Z., 2021. A high-efficient hybrid physics-informed neural networks
based on convolutional neural network. IEEE Transactions on Neural
Networks and Learning Systems 33, 5514–5526.
[25] Gao, H., Sun, L., Wang, J.X., 2021. Phygeonet: Physics-informed geometry-
adaptive convolutional neural networks for solving parameterized steady-
state pdes on irregular domain. Journal of Computational Physics 428,
110079.

[26] Gasick, J., Qian, X., 2023. Isogeometric neural networks: A new deep
learning approach for solving parameterized partial differential equations.
Computer Methods in Applied Mechanics and Engineering 405, 115839.
[27] Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep
feedforward neural networks, in: Proceedings of the thirteenth international
conference on artificial intelligence and statistics, JMLR Workshop and
Conference Proceedings. pp. 249–256.
[28] Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y., 2016. Deep learning.
volume 1. MIT press Cambridge.
[29] Grepl, M.A., Maday, Y., Nguyen, N.C., Patera, A.T., 2007. Efficient reduced-
basis treatment of nonaffine and nonlinear partial differential equations.
ESAIM: Mathematical Modelling and Numerical Analysis 41, 575–605.
[30] Haghighat, E., Bekar, A.C., Madenci, E., Juanes, R., 2021a. A nonlocal
physics-informed deep learning framework using the peridynamic differential
operator. Computer Methods in Applied Mechanics and Engineering 385,
114012.
[31] Haghighat, E., Raissi, M., Moure, A., Gomez, H., Juanes, R., 2021b.
A physics-informed deep learning framework for inversion and surrogate
modeling in solid mechanics. Computer Methods in Applied Mechanics and
Engineering 379, 113741.

[32] Han, Z., Atluri, S., 2004. A meshless local petrov-galerkin (mlpg) approach
for 3-dimensional elasto-dynamics. CMC: Computers, Materials & Continua
1, 129–140.
[33] He, Q., Barajas-Solano, D., Tartakovsky, G., Tartakovsky, A.M., 2020.
Physics-informed neural networks for multiphysics data assimilation with
application to subsurface transport. Advances in Water Resources 141,
103610.
[34] He, Q., Chen, J.S., 2020. A physics-constrained data-driven approach based
on locally convex reconstruction for noisy database. Computer Methods in
Applied Mechanics and Engineering 363, 112791.

[35] He, Q., Perego, M., Howard, A.A., Karniadakis, G.E., Stinis,
P., 2023. A hybrid deep neural operator/finite element method
for ice-sheet modeling. Journal of Computational Physics 492,
112428. URL: https://www.sciencedirect.com/science/article/pii/
S0021999123005235, doi:10.1016/j.jcp.2023.112428.

[36] He, Q., Tartakovsky, A.M., 2021. Physics-informed neural network method
for forward and backward advection-dispersion equations. Water Resources
Research 57, e2020WR029479.

[37] He, X., He, Q., Chen, J.S., 2021. Deep autoencoders for physics-constrained
data-driven nonlinear materials modeling. Computer Methods in Applied
Mechanics and Engineering 385, 114034.
[38] Hornik, K., 1991. Approximation capabilities of multilayer feedforward
networks. Neural networks 4, 251–257.
[39] Hughes, T.J., 1982. A theoretical framework for petrov-galerkin methods
with discontinuous weighting functions: Application to the streamline-
upwind procedure. Finite element in fluids 4, Chapter–3.
[40] Hughes, T.J., 2012. The finite element method: linear static and dynamic
finite element analysis. Courier Corporation.
[41] Hughes, T.J., Cottrell, J.A., Bazilevs, Y., 2005. Isogeometric analysis: Cad,
finite elements, nurbs, exact geometry and mesh refinement. Computer
methods in applied mechanics and engineering 194, 4135–4195.
[42] Innes, M., Edelman, A., Fischer, K., Rackauckas, C., Saba, E., Shah, V.B.,
Tebbutt, W., 2019. A differentiable programming system to bridge machine
learning and scientific computing. arXiv preprint arXiv:1907.07587 .
[43] Johnson, D., Maxfield, T., Jin, Y., Fedkiw, R., 2023. Software-based
automatic differentiation is flawed. arXiv preprint arXiv:2305.03863 .
[44] Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang,
L., 2021. Physics-informed machine learning. Nature Reviews Physics 3,
422–440.
[45] Kharazmi, E., Zhang, Z., Karniadakis, G.E., 2019. Variational physics-
informed neural networks for solving partial differential equations. arXiv
preprint arXiv:1912.00873 .
[46] Kharazmi, E., Zhang, Z., Karniadakis, G.E., 2021. hp-vpinns: Variational
physics-informed neural networks with domain decomposition. Computer
Methods in Applied Mechanics and Engineering 374, 113547.
[47] Khodayi-Mehr, R., Zavlanos, M., 2020. Varnet: Variational neural networks
for the solution of partial differential equations, in: Learning for Dynamics
and Control, PMLR. pp. 298–307.
[48] Kirchdoerfer, T., Ortiz, M., 2016. Data-driven computational mechanics.
Computer Methods in Applied Mechanics and Engineering 304, 81–101.
[49] Kochkov, D., Smith, J.A., Alieva, A., Wang, Q., Brenner, M.P., Hoyer, S.,
2021. Machine learning–accelerated computational fluid dynamics. Proceed-
ings of the National Academy of Sciences 118, e2101784118.
[50] Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., Mahoney, M.W., 2021.
Characterizing possible failure modes in physics-informed neural networks.
Advances in Neural Information Processing Systems 34, 26548–26560.

[51] Lagaris, I.E., Likas, A., Fotiadis, D.I., 1998. Artificial neural networks for
solving ordinary and partial differential equations. IEEE transactions on
neural networks 9, 987–1000.
[52] Lee, H., Kang, I.S., 1990. Neural algorithm for solving differential equations.
Journal of Computational Physics 91, 110–131.
[53] Lee, K., Trask, N.A., Patel, R.G., Gulian, M.A., Cyr, E.C., 2021. Partition
of unity networks: deep hp-approximation. arXiv preprint arXiv:2101.11256.
[54] LeVeque, R.J., 2007. Finite difference methods for ordinary and partial
differential equations: steady-state and time-dependent problems. SIAM.
[55] Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart,
A., Anandkumar, A., 2020. Fourier neural operator for parametric partial
differential equations. arXiv preprint arXiv:2010.08895 .
[56] Liu, W.K., Jun, S., Zhang, Y.F., 1995. Reproducing kernel particle methods.
International journal for numerical methods in fluids 20, 1081–1106.
[57] Liu, W.K., Li, S., Park, H.S., 2022. Eighty years of the finite element
method: Birth, evolution, and future. Archives of Computational Methods
in Engineering 29, 4431–4453.
[58] Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G.E., 2021. Learning
nonlinear operators via deeponet based on the universal approximation
theorem of operators. Nature machine intelligence 3, 218–229.
[59] McClenny, L., Braga-Neto, U., 2020. Self-adaptive physics-informed neural
networks using a soft attention mechanism. arXiv preprint arXiv:2009.04544.
[60] Meade Jr, A.J., Fernandez, A.A., 1994. The numerical solution of linear or-
dinary differential equations by feedforward neural networks. Mathematical
and Computer Modelling 19, 1–25.
[61] Mistani, P.A., Pakravan, S., Ilango, R., Gibou, F., 2023. Jax-dips: Neural
bootstrapping of finite discretization methods and application to elliptic
problems with discontinuities. Journal of Computational Physics , 112480.
[62] Mojtabi, A., Deville, M.O., 2015. One-dimensional linear advection–diffusion
equation: Analytical and finite element solutions. Computers & Fluids 107,
189–195.
[63] Montáns, F.J., Chinesta, F., Gómez-Bombarelli, R., Kutz, J.N., 2019. Data-
driven modeling and learning in science and engineering. Comptes Rendus
Mécanique 347, 845–855.
[64] Patera, A.T., 1984. A spectral element method for fluid dynamics: laminar
flow in a channel expansion. Journal of computational Physics 54, 468–488.

[65] Raissi, M., Perdikaris, P., Karniadakis, G.E., 2019. Physics-informed neural
networks: A deep learning framework for solving forward and inverse
problems involving nonlinear partial differential equations. Journal of
Computational physics 378, 686–707.
[66] Raissi, M., Yazdani, A., Karniadakis, G.E., 2020. Hidden fluid mechanics:
Learning velocity and pressure fields from flow visualizations. Science 367,
1026–1030.
[67] Ranade, R., Hill, C., Pathak, J., 2021. Discretizationnet: A machine-learning
based solver for navier–stokes equations using finite volume discretization.
Computer Methods in Applied Mechanics and Engineering 378, 113722.

[68] Rao, C., Sun, H., Liu, Y., 2021. Physics-informed deep learning for com-
putational elastodynamics without labeled data. Journal of Engineering
Mechanics 147, 04021043.
[69] Saha, S., Gan, Z., Cheng, L., Gao, J., Kafka, O.L., Xie, X., Li, H., Tajdari,
M., Kim, H.A., Liu, W.K., 2021. Hierarchical deep learning neural network
(hidenn): An artificial intelligence (ai) framework for computational science
and engineering. Computer Methods in Applied Mechanics and Engineering
373, 113452.
[70] Samaniego, E., Anitescu, C., Goswami, S., Nguyen-Thanh, V.M., Guo, H.,
Hamdia, K., Zhuang, X., Rabczuk, T., 2020. An energy approach to the
solution of partial differential equations in computational mechanics via
machine learning: Concepts, implementation and applications. Computer
Methods in Applied Mechanics and Engineering 362, 112790.
[71] Sanchez-Gonzalez, A., Godwin, J., Pfaff, T., Ying, R., Leskovec, J.,
Battaglia, P., 2020. Learning to simulate complex physics with graph
networks, in: International conference on machine learning, PMLR. pp.
8459–8468.
[72] Schmidt, M., Lipson, H., 2009. Distilling free-form natural laws from
experimental data. science 324, 81–85.
[73] Shukla, K., Jagtap, A.D., Karniadakis, G.E., 2021. Parallel physics-informed
neural networks via domain decomposition. Journal of Computational
Physics 447, 110683.
[74] Tancik, M., Srinivasan, P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N.,
Singhal, U., Ramamoorthi, R., Barron, J., Ng, R., 2020. Fourier features
let networks learn high frequency functions in low dimensional domains.
Advances in Neural Information Processing Systems 33, 7537–7547.
[75] Taneja, K., He, X., He, Q., Chen, J., 2023. A multi-resolution physics-
informed recurrent neural network: Formulation and application to muscu-
loskeletal systems. arXiv preprint arXiv:2305.16593 .

[76] Tartakovsky, A.M., Marrero, C.O., Perdikaris, P., Tartakovsky, G.D.,
Barajas-Solano, D., 2020. Physics-informed deep neural networks for learn-
ing parameters and constitutive relationships in subsurface flow problems.
Water Resources Research 56, e2019WR026731.
[77] Wang, S., Teng, Y., Perdikaris, P., 2021a. Understanding and mitigating
gradient flow pathologies in physics-informed neural networks. SIAM Journal
on Scientific Computing 43, A3055–A3081.
[78] Wang, S., Wang, H., Perdikaris, P., 2021b. On the eigenvector bias of
fourier feature networks: From regression to solving multi-scale pdes with
physics-informed neural networks. Computer Methods in Applied Mechanics
and Engineering 384, 113938.
[79] Xue, T., Liao, S., Gan, Z., Park, C., Xie, X., Liu, W.K., Cao, J., 2023.
Jax-fem: A differentiable gpu-accelerated 3d finite element solver for au-
tomatic inverse design and mechanistic data science. Computer Physics
Communications , 108802.

[80] Yin, M., Zhang, E., Yu, Y., Karniadakis, G.E., 2022. Interfacing finite ele-
ments with deep neural operators for fast multiscale modeling of mechanics
problems. Computer methods in applied mechanics and engineering 402,
115027.
[81] Yu, B., et al., 2018. The deep ritz method: a deep learning-based numerical
algorithm for solving variational problems. Communications in Mathematics
and Statistics 6, 1–12.
[82] Zhang, R., Liu, Y., Sun, H., 2020. Physics-informed multi-lstm networks
for metamodeling of nonlinear structures. Computer Methods in Applied
Mechanics and Engineering 369, 113226.

