Linearly Constrained Neural Networks

Johannes N. Hendriks^a,∗, Carl Jidling^b, Adrian G. Wills^a, Thomas B. Schön^b

a School of Engineering, The University of Newcastle, Callaghan NSW 2308, Australia


b Department of Information Technology, Uppsala University, 75105 Uppsala, Sweden
arXiv:2002.01600v4 [stat.ML] 28 Apr 2021

Abstract

We present a novel approach to modelling and learning vector fields from physical systems using neural networks that explicitly satisfy known linear operator constraints. To achieve this, the target function is modelled as a linear transformation of an underlying potential field, which is in turn modelled by a neural network. This transformation is chosen such that any prediction of the target function is guaranteed to satisfy the constraints. The approach is demonstrated on both simulated and real data examples.
Keywords: Neural Networks, Linear operator constraints, Physical systems,
Vector fields

1. Introduction

Developments during recent years have established deep learning as perhaps the most prominent member of the machine learning toolbox. Today neural
networks are present in a broad range of applications and are used for both
classification and regression problems. This includes the use of neural networks
to model and learn vector-valued quantities from physical systems such as mag-
netic fields [1], plasma fields [2], and the dynamics of conservative systems [3, 4],
to name a few. The popularity of neural networks is to a large extent explained

∗ Corresponding author
Email address: [email protected] (Johannes N. Hendriks)

Preprint submitted to Journal of Computational Physics April 29, 2021


by the highly flexible nature that enables these models to encode a very large
class of non-linear functions.

Nevertheless, the performance of the neural network is often dependent on careful design and the amount of training data available. In particular, a larger network is more flexible but also requires more training data to reduce the risk of overfitting. Different types of regularisation techniques are sometimes used to facilitate this balance.

Instead of focusing on the network per se, it may be just as important to consider
prior knowledge provided by the problem setting. For instance, the function of
interest can represent a quantity subject to fundamental physical constraints.
In some cases these physical constraints take the form of linear operator con-
straints. This includes many vector fields that are known to be either divergence-
or curl-free. Examples of divergence-free vector fields (also known as solenoidal
fields) are the magnetic field [5]—see Figure 1, the velocity of an anelastic flow
[6], the vorticity field [7, 8], and current density where the charge is constant
over time as given by the continuity equation [9, 10]. Low-Mach-number flow is a simplification of the compressible Euler equations and describes flow with a prescribed divergence [11] (i.e. it takes the form of an affine constraint). Another
example of fields satisfying an affine constraint is given by Maxwell’s equations,
which describe electromagnetic fields with a prescribed curl and divergence [12].
Within continuum mechanics, the stress field and strain field inside a solid object
satisfy the equilibrium conditions and the strain field inside a simply connected
body satisfies the compatibility constraints [13]. Equilibrium can also be used
as a constraint when modelling plasma fields [2].

The list can be made longer, but the point is clear – by making sure that certain
constraints are fulfilled, we (significantly) reduce the set of functions that could
explain our measured data. This, in turn, implies that we can maintain high
performance without requiring the same amount of flexibility. Put simply: we
can obtain the same results with a smaller network and less training data.

Figure 1: Magnetic field predictions (blue) using a constrained neural network trained on 500 observations (red) sampled from the trajectory indicated by the black curve. The magnetic field B is curl-free, satisfying the constraint ∇ × B = 0, and the method proposed in this paper ensures that the predictions satisfy this constraint.

1.1. Contribution

This paper presents a novel approach for designing neural network based models
that satisfy linear operator constraints. It is a step towards addressing a central
issue in applying machine learning methods to physical systems: the enforcement of physical constraints. The approach models the
vector field to be learned as a linear transformation of an underlying potential
field. The benefits of using this approach are two-fold:

1. Any predictions made using this approach will satisfy the constraints for
the entire input space.

2. Incorporating known constraints reduces the problem size. This reduces the amount of training data required and also allows a smaller neural network to be used while still achieving the same performance. Reducing the amount of data required can save time and money during the data collection phase.

Additionally, the approach allows a wide class of neural network models to be
used to model the underlying function, including convolutional neural networks
and recurrent neural networks.

Existing methods (see Section 6, Related Work) have predominantly tackled the problem by either (a) augmenting the cost function to penalise constraint violation or (b) developing problem specific models. In contrast we present a
general approach that guarantees the constraints to be satisfied, and can be
used for any linear operator constraints. Further, we show that the proposed
approach has better performance than that given by augmenting the cost func-
tion — a standard approach to incorporating constraints into neural network
models.

2. Problem Formulation

Assume we are given a data set of N observations {xi, yi}, i = 1, . . . , N, where xi denotes the input and yi denotes the output. Both the input and output are potentially vector-valued, with xi ∈ R^D and yi ∈ R^K. Here we consider the regression problem where the data can be described by the non-linear function yi = f(xi) + ei, where ei is zero-mean white noise representing the measurement uncertainty.
In this work, a neural network is used to model f and can be described by

f (x) = hL (hL−1 (· · · h2 (h1 (x)))), (1)

where each hl (z) has the form

hl (z) = φl (Wl z + bl ). (2)

Here, L is the number of layers in the neural network, each φl is an element-wise non-linear function commonly referred to as an activation function, and {Wl, bl}, l = 1, . . . , L, are the parameters of the neural network that are to be learned from the data.

In addition to the data, we know that the function f should fulfil certain constraints

Cx [f ] = 0, (3)

where Cx is a linear operator [14] mapping the function f to another function
g. That is Cx [f ] = g. Further, we restrict Cx to be a linear operator, meaning
that Cx [λ1 f1 + λ2 f2 ] = λ1 Cx [f1 ] + λ2 Cx [f2 ], where λ1 , λ2 ∈ R. A simple example
is if the operator is a linear transformation Cx [f ] = Cf, which together with the constraint (3) forces certain linear combinations of the outputs to equal zero, that is, the outputs become linearly dependent.

The operator Cx can be used to represent a wide class of linear constraints on the function f, and for a background on linear operators the interested reader is referred to [14]. For example, we might know that the function f : R² → R² should obey the partial differential equation Cx [f ] = ∂f1/∂x1 + ∂f2/∂x2 = 0.

The constraints can come from either known physical laws or other prior knowl-
edge about the data. Here, the objective is to determine an approach to derive
models based on neural networks such that all predictions from these models
will satisfy the constraints.

3. Building a Constrained Neural Network

In this section, an approach to learn a function using a neural network such that
any resulting estimate satisfies the constraints (3) is proposed. The novelty of
this approach is in recognising that vector fields subject to linear constraints
can, in general, be modelled by a linear transformation of a neural network
such that the learned field will always satisfy the constraints. This section first
presents the approach and then gives a brief discussion of conditions that may be
imposed on the neural network. Then, the following section gives two methods
for determining the required transformation.

3.1. Our Approach

Our approach designs a neural network that satisfies the constraint for all possi-
ble values of its parameters, rather than imposing constraints on the parameter
values themselves. This is done by considering f to be related to another function g via some linear operator Gx:

f = Gx [g]. (4)

The constraints (3) can then be written as

Cx [Gx [g]] = 0. (5)

We require this relation to hold for any function g. To do this, we will interpret Cx and Gx as matrices and use a similar procedure to that of solving systems of linear equations. Since Cx and Gx are linear operators, we can think of Cx [f ] and Gx [g] as matrix-vector multiplications where Cx [f ] = Cx f, with (Cx f)i = Σ_{j=1}^{K} (Cx)ij fj, where each element (Cx)ij in the operator matrix is a scalar operator. With this notation, (5) can be expressed as matrix-vector products [14]:

Cx Gx g = 0, (6)

where a solution is given by


Cx Gx = 0. (7)

This reformulation imposes constraints on the operator Gx rather than on the


neural network model of f directly. We can then proceed by first modelling the
function g as a neural network and then transform it using the mapping (4) to
provide a neural network for f that explicitly satisfies the constraints according
to
f = Gx g. (8)

An illustration of the constrained model is given in Figure 2. The procedure to


design the neural network can now be divided into three steps:

1. Find an operator Gx satisfying the condition (7).

2. Choose a neural network structure for g.

3. Determine the neural network based model for f according to (8).

The choice of neural network structure in step 2 may have some conditions
placed upon it by the transformation found in step 1 for the resulting model to
be mathematically correct. For example, if the transformation contains partial
derivatives then this may restrict the choice of activation function. A more
detailed discussion is given in Section 3.2. Despite these conditions, the ap-
proach admits a reasonably wide class of neural network models including fully
connected neural networks (FCNN), convolutional neural networks (CNN), and
recurrent neural networks (RNN). For the ease of explanation, the examples
given in this paper use FCNNs.

The parameters of the resulting model can be learned using existing methods
such as stochastic gradient descent. It is worth noting that if the data requires
scaling then care should be taken as this scaling can modify the form of the
constraints.

In the case where the operator Gx contains partial derivatives, such as for curl-free and divergence-free fields, the implementation can be done using automatic differentiation, such as the grad function in PyTorch [15].
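To make this concrete, the following is a minimal sketch (our illustration, not code taken from the paper's repository) of a curl-free model f = ∇g in PyTorch. The potential g is a small fully connected network with smooth (tanh) activations, and create_graph=True keeps the derivative in the computation graph so that the transformed model can still be trained by backpropagation.

import torch
import torch.nn as nn

# Scalar potential g modelled by a small fully connected network; tanh activations
# are smooth, so higher-order derivatives of the network are well defined.
g_net = nn.Sequential(nn.Linear(3, 100), nn.Tanh(),
                      nn.Linear(100, 50), nn.Tanh(),
                      nn.Linear(50, 1))

def f_curl_free(x):
    # x has shape (N, 3); the returned field f = grad(g) has shape (N, 3) and is
    # curl-free by construction, being the gradient of a scalar potential.
    x = x.requires_grad_(True)
    grad_g, = torch.autograd.grad(g_net(x).sum(), x, create_graph=True)
    return grad_g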

3.2. Conditions due to Derivative Transformations

When the transformation Gx contains partial derivatives the underlying neural


network g must be chosen to satisfy some conditions. Intuitively, it is required
that the partial derivative of the neural network must be a function of both the
inputs and the network parameters. If this is not the case, then the model loses
the ability to represent a spatially varying target function. Here, we provide a
few examples of this.

Figure 2: Diagram illustrating the difference between a standard neural network structure (a) and our constrained model (b). Since the output layer of the constrained model is of lower dimension than that of the standard neural network, fewer hidden layers and neurons are required. In this figure, we assume that g is scalar and show a single hidden layer.

If the transformation contains only first-order derivatives, then this does not result in any restrictive conditions. To see this, consider a neural network with a single hidden layer and an identity activation function in its output layer,
written along with its partial derivative as

g(x) = W2 φ1(W1 x + b1) + b2 = W2 φ1(a1) + b2,
∂g(x)/∂x = W2 (∂φ1(a1)/∂a1)(∂a1/∂x) = W2 (∂φ1(a1)/∂a1) W1. (9)

Here, we have introduced the notation a1 = W1 x + b1 to simplify the description. Hence, it is only required that the first derivative of the activation function with respect to a1 is not constant. However, for higher-order derivatives, there are requirements on the activation functions chosen. Consider the second derivative of the same network,

∂²g(x)/∂x² = W2 [ (∂²a1/∂x²)(∂φ1(a1)/∂a1) + (∂a1/∂x)(∂²φ1(a1)/∂a1²) ] = W2 (∂a1/∂x)(∂²φ1(a1)/∂a1²), (10)

where the first term vanishes because ∂²a1/∂x² = 0. To use this model it is required that the second derivative of the activation function is non-constant. This excludes, for instance, the ReLU function. The same procedure can easily be used to show that this condition remains when the neural network is extended to two hidden layers.

4. Finding the Transformation Operator

This section presents two methods for determining a suitable operator Gx . Prior
knowledge about the physics of a problem could inform the choice of operator.
If this is not the case, then a suitable operator could be found by proposing an
ansatz and solving a system of linear equations.

4.1. From Physics

From fundamental physics, it may be the case that we know that the vector field of interest is related to an underlying potential field. Common examples of this are divergence-free (∇ · f = 0)¹ vector fields and curl-free (∇ × f = 0) vector fields. A curl-free vector field can be written as the gradient of an underlying scalar potential field g:

f = ∇g, (11)

which gives Gx = [∂/∂x  ∂/∂y  ∂/∂z]ᵀ. Divergence-free vector fields can, on the other hand, be expressed as a function of a vector potential field g ∈ R³, given by

f = ∇ × g, (12)

which gives

Gx = [ 0, −∂/∂z, ∂/∂y ; ∂/∂z, 0, −∂/∂x ; −∂/∂y, ∂/∂x, 0 ]. (13)

¹ Here ∇ = [∂/∂x  ∂/∂y  ∂/∂z]ᵀ.

Many natural phenomena can be modelled according to these constraints, and


several examples were given in Section 1.
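As a quick check that these choices satisfy (7): for the divergence-free case, Cx is the divergence operator, and since mixed partial derivatives commute,

∇ · (∇ × g) = ∂/∂x (∂g3/∂y − ∂g2/∂z) + ∂/∂y (∂g1/∂z − ∂g3/∂x) + ∂/∂z (∂g2/∂x − ∂g1/∂y) = 0

for any twice continuously differentiable g. The curl-free case follows in the same way, since ∇ × (∇g) = 0 for any scalar potential g.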

4.2. Ansatz
In the absence of known underlying mathematical relations, the operator Gx can be constructed using the pragmatic approach of which an exhaustive version is described by Jidling et al. [16]; a brief outline is given below. A solid analysis of the mathematical properties of this operator is provided by Lange-Hegermann [17].

The cornerstone of the approach is an ansatz on what operators we assume Gx


to contain; we formulate it as

Gx = Γξ, (14)

where ξ is a vector of operators, and Γ = [γij ] is a real-valued matrix that we


wish to determine. Here, we have assumed for simplicity that Gx is a vector,
implying that g is a scalar function. We now use (14) to rewrite (7) as

Cx Γξ = 0. (15)

Expanding the product on the left-hand side, we find that it reduces to a linear
combination of operators. Requiring all coefficients to equal 0, we obtain a
system of equations from which we can determine Γ, and thus also Gx .

For illustration, consider a toy example where

Cx = [∂/∂x  ∂/∂y]. (16)

Assuming that Gx contains the same operators as Cx, we let

ξ = [∂/∂x  ∂/∂y]ᵀ. (17)

We then expand

Cx Γ ξ = [∂/∂x  ∂/∂y] Γ [∂/∂x  ∂/∂y]ᵀ = γ11 ∂²/∂x² + (γ12 + γ21) ∂²/∂x∂y + γ22 ∂²/∂y².

Requiring this expression to equal 0, we get the following system of equations

[ 1 0 0 0 ; 0 1 1 0 ; 0 0 0 1 ] [ γ11  γ12  γ21  γ22 ]ᵀ = 0, (18)

which is solved by γ11 = γ22 = 0 and γ12 = −γ21. Letting γ21 = 1, we obtain

Gx = [ −∂/∂y  ∂/∂x ]ᵀ, (19)

which can easily be verified to satisfy (7).
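Because the operators involved here have constant coefficients and therefore commute, the elimination above can also be carried out symbolically. The following sketch (our own illustration, representing ∂/∂x and ∂/∂y by the commuting symbols dx and dy) reproduces the toy example with SymPy.

import sympy as sp

# Represent the scalar operators d/dx and d/dy by commuting symbols, so that the
# composition Cx * Gamma * xi becomes a polynomial whose coefficients must all vanish.
dx, dy = sp.symbols('dx dy')
g11, g12, g21, g22 = sp.symbols('g11 g12 g21 g22')

Cx = sp.Matrix([[dx, dy]])                  # constraint operator of the toy example (16)
Gamma = sp.Matrix([[g11, g12], [g21, g22]])
xi = sp.Matrix([dx, dy])                    # ansatz (17): same operators as Cx

expr = sp.expand((Cx * Gamma * xi)[0])
# Require the coefficients of dx**2, dx*dy and dy**2 to be zero, as in (18).
eqs = [sp.Poly(expr, dx, dy).coeff_monomial(m) for m in (dx**2, dx*dy, dy**2)]
print(sp.solve(eqs, [g11, g12, g21, g22], dict=True))
# [{g11: 0, g12: -g21, g22: 0}], i.e. Gx = [-d/dy, d/dx]^T up to scale, matching (19).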

In the general case, Gx may contain operators of higher order than those in Cx .
It is also possible that a suitable underlying function may have a vector rather
than a scalar output. The procedure should, therefore, be considered iterative.
Additionally, within Appendix B, we show that this approach can be extended
to affine constraints.

5. Experimental Results

In this section, we demonstrate the proposed approach on simulated data from


a divergence-free field, simulated data of a strain field satisfying the equilibrium
conditions, and real data of a magnetic field that satisfies the curl-free constraint.
Python code to run the examples in this section is available at https://github.com/jnh277/Linearly-Constrained-NN/.

These demonstrations use an FCNN for the underlying model in the proposed
approach. Hence, they compare the results to a standard FCNN. Additionally,
we compare the performance of the proposed approach and the commonly used
approach of augmenting the cost function to penalise constraint violations.

5.1. Simulated Divergence-Free Function
Consider the problem of modelling a divergence-free vector field defined as
f1(x1, x2) = exp(−a x1 x2)(a x1 sin(x1 x2) − x1 cos(x1 x2)),
f2(x1, x2) = exp(−a x1 x2)(x2 cos(x1 x2) − a x2 sin(x1 x2)), (20)

where a is a constant. This vector field satisfies the constraint ∂f1/∂x1 + ∂f2/∂x2 = 0. A neural network based model satisfying these constraints is given by

f = [∂/∂x2  −∂/∂x1]ᵀ g. (21)
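A sketch of how (21) might be implemented in PyTorch is given below (our illustration; the published repository may differ in its details). The scalar potential g is a fully connected network, and the field is assembled from its partial derivatives.

import torch
import torch.nn as nn

# Constrained model of (21): g is a scalar potential network and
# f = (dg/dx2, -dg/dx1) is divergence-free by construction.
class DivFreeField(nn.Module):
    def __init__(self, hidden=(100, 50)):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2, hidden[0]), nn.Tanh(),
                               nn.Linear(hidden[0], hidden[1]), nn.Tanh(),
                               nn.Linear(hidden[1], 1))

    def forward(self, x):
        x = x.requires_grad_(True)
        dg, = torch.autograd.grad(self.g(x).sum(), x, create_graph=True)
        return torch.stack((dg[:, 1], -dg[:, 0]), dim=1)

model = DivFreeField()
x = torch.rand(8, 2) * 4.0   # inputs on [0, 4] x [0, 4], as in the study below
f = model(x)                 # satisfies df1/dx1 + df2/dx2 = 0 at every input

The module can then be trained with the same mean squared error loss and optimiser as the unconstrained network it is compared against.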

The regression of this problem using the proposed constrained neural network and an unconstrained (standard) neural network is compared. The networks' root mean square error (RMSE) is compared in two studies:

1. Maintaining a constant network size of 2 hidden layers (100 neurons in the first and 50 in the second) and increasing the number of measurements. See Figure 3a.

2. Maintaining a constant number of measurements (4000) and increasing the network size. In this case, the total number of neurons is reported, with two-thirds belonging to the first hidden layer and one third belonging to the second. See Figure 3b.

In both studies, 200 random trials were completed with the measurements ran-
domly picked over the domain [0, 4]×[0, 4], and corrupted by zero-mean Gaussian
noise of standard deviation σ = 0.1. For both networks, a tanh activation layer
was placed on the output of the hidden layers. The networks were then trained
using a mean squared error loss function and the ADAM optimiser, with the
learning rate reduced as the validation loss plateaued. A uniform grid of 20 × 20
points was chosen to predict the function values at. The root mean square error
was then calculated between the true vector field and the predictions at these
locations. To focus this analysis on the impacts of the suggested approach, reg-
ularisation and other methods to reduce overfitting were not implemented. The
effect of regularisation on both networks is considered in Appendix A.


(a) Data size study (b) Network size study

Figure 3: Two studies comparing the performance of the proposed constrained neural network
based model with a standard unconstrained neural network using simulated measurements of a
divergence-free field. The RMSE is compared as (a) the number of measurements is increased
and (b) as the size of the neural network is increased.

In both these studies, the proposed approach yields a significantly lower RMSE
than a standard neural network. To highlight a few points, the proposed ap-
proach with 500 measurements has the same RMSE as the standard neural net-
work with 4000 measurements. Similarly, with 21 total neurons the proposed
approach performs as well as the standard neural network with 150 neurons.

An example of the learned vector fields from 200 noisy observations is provided
in Figure 4. For this comparison, both networks had two hidden layers with 100
neurons in the first and 50 in the second and a tanh activation function was
placed on the outputs of both hidden layers.

(Figure 4 panel titles: Our Constrained Approach, RMSE = 0.13; Unconstrained NN, RMSE = 0.38.)

Figure 4: Comparison of learning the divergence-free field from 200 noisy observations using
an unconstrained neural network (NN) and our constrained approach. Left: the true field
(grey) and observations (red). Centre and right: learned fields subtracted from the true field.
The comparison was performed using 2 hidden layers, 100 neurons in first, 50 in second for
both methods.

These results indicate that the proposed approach can achieve equivalent per-
formance with either less data or smaller network size. Another property of
our constrained neural network is that its predictions will automatically satisfy
the constraints. This is true even in regions where no measurements have been
made as illustrated in Figure 5. By comparison, the standard neural network
gives estimates which violate the constraints.

Figure 5: Comparison of constraint violations of the learned fields from 200 simulated noisy observations of a divergence-free field using an unconstrained neural network and our approach. Left: the true field (grey) and observations (red). Centre and right: constraint violations for the learned fields, calculated as c = ∂f1/∂x1 + ∂f2/∂x2. No measurements were made inside the dashed blue box.

5.2. Simulated Divergence-Free Function Continued


In the previous section, our proposed approach was compared against a neural
network that made no attempt to incorporate knowledge of the constraints. In
this section, we compare the performance of a neural network that augments
the loss function by penalising the constraint violation at a finite number of
points. Similar versions of this relatively straightforward approach have been
used to approximate solutions to differential equations [18, 19, 20, 21] and to
learn plasma fields subject to equilibrium constraints [2].

The simulated divergence-free function (20) is learned using a neural network with an augmented loss function given by

loss = (1/N) Σ_{i=1}^{N} (yi − ŷi)² + λ (1/Nc) Σ_{j=1}^{Nc} |cj|, (22)

where ŷi is the neural network prediction of measurement yi, Nc is the number of points at which the constraint violation is evaluated, and the constraint violation is given by

c = ∂f1/∂x1 + ∂f2/∂x2, (23)
and λ is a tuning parameter that weights the relative importance of the measure-
ments and the constraint. Note that varying the ratio of measurement points
to constraint evaluation points has a similar effect to changing λ.
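A sketch of this penalised loss, combining (22) and (23), is given below (our illustration, assuming an unconstrained network net mapping (N, 2) inputs to (N, 2) outputs and a set of collocation points x_c at which the constraint is evaluated).

import torch

def augmented_loss(net, x, y, x_c, lam):
    # Data term of (22): mean squared error over the N measurements.
    data_term = ((y - net(x)) ** 2).mean()

    # Constraint violation (23) evaluated at the Nc collocation points x_c.
    x_c = x_c.requires_grad_(True)
    f = net(x_c)
    df1, = torch.autograd.grad(f[:, 0].sum(), x_c, create_graph=True)
    df2, = torch.autograd.grad(f[:, 1].sum(), x_c, create_graph=True)
    c = df1[:, 0] + df2[:, 1]

    return data_term + lam * c.abs().mean()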

The results from 200 random trials with N = 3000, Nc = 3000 and λ ranging
from 0 to 256 are shown in Figure 6. Also indicated, by a dashed line, is the
median result using our proposed constrained approach for the same number
of measurements. These results indicate that rather than improving the neural
network’s predictions this approach of augmenting the cost function creates a
trade-off between learning the field from the measurements and achieving low
constraint violation. In contrast, our proposed approach removes the need to
tune the weight λ by building the constraints into the model. Building the con-
straints into the model also has the added benefit of reducing the problem size,
which improves the predictions and guarantees the constraints to be satisfied at
all locations.

Figure 6: Performance of a neural network with cost function augmented by penalising the
mean squared constraint violation. The study compares (a) the RMSE of the predicted
field and (b) the mean absolute constraint violation of the predicted field for a range of
weighting factors λ. For comparison, results from our proposed approach for the same number
of measurements are indicated by the dashed line.

5.3. Simulated Strain Field

An example of a more complex constraint is given by considering the estimation


of strain fields. Strain fields describe the relative deformation of points within a
solid body and can be measured by neutron and X-ray diffraction [22] providing
a means to study the stress—a quantity that cannot be directly measured.
Maximum stresses are commonly accepted as a major contributing factor to
component failure [13] and hence studying stress is of interest for the design of
engineering components.

Physical strain fields satisfy the equilibrium constraints and, as such, it is important to ensure that any estimates of these fields from measurements also satisfy these constraints [13]. Here, we consider a two-dimensional strain field with components described by εxx(x, y), εyy(x, y), εxy(x, y). Under the assumption of plane stress, the equilibrium constraints are given by [23]

∂/∂x (εxx + ν εyy) + ∂/∂y ((1 − ν) εxy) = 0,
∂/∂y (εyy + ν εxx) + ∂/∂x ((1 − ν) εxy) = 0. (24)

A neural network based model satisfying these constraints can be derived from physics using the so-called Airy stress function [13] and is given by

εxx = (∂²/∂y² − ν ∂²/∂x²) g,
εyy = (∂²/∂x² − ν ∂²/∂y²) g,
εxy = −(1 + ν) (∂²/∂x∂y) g. (25)
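To see that (25) satisfies (24), substitute it into the first equilibrium equation: εxx + ν εyy = (1 − ν²) ∂²g/∂y², so ∂/∂x (εxx + ν εyy) = (1 − ν²) ∂³g/∂x∂y², while ∂/∂y ((1 − ν) εxy) = −(1 − ν)(1 + ν) ∂³g/∂x∂y² = −(1 − ν²) ∂³g/∂x∂y². The two terms cancel for any sufficiently smooth potential g, and the second equilibrium equation follows by the same argument with x and y interchanged.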

This model is used to learn the classical Saint-Venant cantilever beam strain field
under an assumption of plane stress [24] from 200 noisy simulated measurements.
Details of this strain field and the measurements are given in Appendix C.

Predictions of this strain field using the proposed model and a standard neu-
ral network are shown in Figure 7. The proposed approach gives an RMSE of
5.52 × 10−5 compared to 67.7 × 10−5 for the standard neural network. Qualita-
tively, we can see that the proposed approach provides more accurate estimates
of the strain field, particularly for the εxy component.

Figure 7: The theoretical Saint-Venant cantilever beam field and strain fields learned from 200
noisy measurements using the presented constrained approach and a standard neural network.
Both networks have three hidden layers with 20, 10, and 5 neurons, respectively. Values are
given in micro strain.

5.4. Real Data

Magnetic fields can be mathematically described as a vector field mapping a


3D position to a 3D magnetic field vector, B. Based on the magnetostatic
equations, this can be modelled as a curl-free vector field [25, 26]:

∇ × B = 0. (26)

As such, a neural network satisfying the curl-free constraint can be designed to model the magnetic field according to

B̂ = [∂/∂x1  ∂/∂x2  ∂/∂x3]ᵀ g. (27)

With a magnetic field sensor and an optical positioning system, both position
and magnetic field data have been collected in a magnetically distorted indoor
environment — with a total of 16,782 data points collected. Details of the data
acquisition can be found in the supplementary materials of Jidling et al. [16],
where this data was previously published. Figure 1 illustrates magnetic field
predictions using a constrained neural network trained on 500 measurements
sampled from the trajectory shown in black. The constrained neural network
had two hidden layers of 150 and 75 neurons, with Tanh activation layers. Using

the remaining data points for validation, our constrained model has a RMSE
validation loss of 0.048 compared to 0.053 for a standard unconstrained neural
network of the same size and structure.

Two studies were run comparing the proposed approach and a standard neural
network for a range of training data sizes and neural network sizes. The networks' RMSE when validated against 8,000 reserved validation data points is shown
in Figure 8 for the following settings:

1. Maintaining a constant network size of 2 hidden layers (150 neurons in the first and 75 in the second) and increasing the number of measurements. See Figure 8a.

2. Maintaining a constant number of measurements (6000) and increasing the network size. In this case, the total number of neurons is reported, with two-thirds belonging to the first hidden layer and one third belonging to the second. See Figure 8b.

For both studies, the results of training the networks for 100 random initialisa-
tions are shown.

The studies show that the proposed approach performs better than the standard
neural network for a smaller number of measurements or a smaller network size.
As the number of measurements or neurons is increased, the performance of
both networks converges. This is expected as given enough measurements and
a large enough network size, both methods should converge to the true field and
hence a minimum validation RMSE.

6. Related work

Focusing on neural networks, related work falls into two broad categories: in-
corporating known physics relations as prior knowledge, and optimising neural
networks subject to constraints. Here, we discuss some of these methods that
are closely related or of particular interest.


(a) Data size study (b) Network size study

Figure 8: Two studies comparing the performance of the proposed constrained neural network
based model with a standard unconstrained neural network using data collected of a magnetic
field. The RMSE is compared as (a) the number of measurements is increased and (b) as the
size of the neural network is increased.

Several papers have discussed incorporating differential equation constraints into


neural network models by augmenting the cost function to include a penalty
term given by evaluating the constraint at a finite number of points [2, 18, 19,
20]. This idea is presented as a method to approximate the solution to partial
differential equations by Dissanayake and Phan-Thien [18], as it transforms the
problem into an unconstrained optimisation problem. Van Milligen et al. [2]
applies this idea to learning plasma fields which are subject to equilibrium.
Sirignano and Spiliopoulos [19] applies this idea to the learning of high dimen-
sional partial differential equations, with boundary conditions included in the
same manner as the constraints — that is, by augmenting the cost function.

A similar approach was used by Raissi et al. [27] to learn the solution to lin-
ear and non-linear partial differential equations and they demonstrate the ap-
proach on examples from physics such as Schrödinger's equation. They also
demonstrated that automatic differentiation included in TensorFlow and Py-
Torch provides a straightforward way to calculate the augmented cost. Further,
they provide an alternate method for discovering ordinary differential equation
style models from data.

Outside of modelling vector fields and differential equations, augmenting the


cost function to penalise constraint violation has seen use in other applications
of neural networks including image segmentation and classification [28, 29, 30,

31]. An alternative that strictly enforces the constraints is to learn the neural
network parameters by solving a soft-barrier constrained optimisation problem
[32, 33]. However, it has been shown that the soft-barrier style approaches are
outperformed by the augmented cost function approaches [34], possibly due to
the challenges of solving the non-convex optimisation problem [35].

While the approach of augmenting the cost function can be used to solve the
problem presented in our paper it has some downsides. Firstly, it does not guar-
antee that the constraint is satisfied, and this is especially true in regions where
the constraint may not have been evaluated as part of the cost. Secondly, its
performance is subject to the number of points at which the constraint is evalu-
ated. Thirdly, a relative weighting needs to be chosen between the original cost
and the cost due to constraint satisfaction and this creates a trade-off. The ap-
proach presented in our paper avoids these issues by presenting a neural network
model that uses a transformation to guarantee the constraints to be satisfied
everywhere rather than augmenting the cost function. In Section 5.2, we have
compared the performance of our proposed approach to that of augmenting the
cost function.

Augmenting the cost function does, however, have the advantage that it can
be used even when no suitable transform Gx is forthcoming. For instance, it
can be used to enforce boundary constraints [19, 20]. Since using our proposed
approach does not exclude the possibility of also augmenting the loss function,
we suggest that the two approaches are complementary, whereby our proposed
approach is used to satisfy constraints for which a suitable transform Gx can
be designed, and then the cost function is augmented to penalise violation of
other constraints, such as boundary conditions. A similar combined approach
was used to model strain fields subject to both equilibrium constraints and
boundary conditions [36].

The idea of modelling potential functions using neural networks as a means to


include prior knowledge about the problem is not new and has been used to
learn models of dynamic systems and vector fields.

There are several examples of using neural networks to learn the Hamiltonian or
Lagrangian of a dynamic system and training this model using the derivatives of
the neural network [3, 4, 37, 38, 39, 40]. These methods ensure that the learned
dynamics are conservative, i.e. the total energy in the system is constant. An-
other approach to learning dynamic systems is presented by Chen et al. [41]
where neural networks are used to model solutions to ODEs and Massaroli et
al. [42] propose a Port-Hamiltonian based approach to training these models.

A method for simultaneous fitting of magnetic potential fields and force fields
using neural networks is studied by Pukrittayakamee et al. [1]. This method
uses a neural network to model the potential field and the force field is then the
partial derivatives of the neural network. In their work, measurements of both
the potential field and the force field are used to train the model. This work
was later extended to provide a practical approach to fitting measurements of
a function and its derivatives using neural networks as well as a discussion on
specific types of overfitting that might be encountered [43]. A similar approach
is taken by Handley and Popelier [44] where it is used for modelling molecular
dynamics and Monte Carlo studies on gas-phase chemical reactions.

Although it was not the motivation for the method, the model for the force
field given by Pukrittayakamee et al. [1] will obey the curl-free constraint and
is equivalent to that presented in Section 5.4. In our work, we extend the idea
of representing the target function by a transformation of a potential function
modelled by a neural network to a broader range of problems that obey a variety
of constraints. Additionally, we focus only on the transformed target function
and do not require measurements of the potential function.

Another interesting idea is presented by Schmidt and Lipson [45] who derived
a method for distilling natural laws from data. In their work, symbolic terms
including partial derivatives are used as building blocks with which to learn
equations that the data satisfies.

Instead of incorporating known physics or constraints into neural networks, another area of research is using neural networks to solve constrained optimisation
problems. Several methods using neural networks to solve constraint satisfac-
tion problems have been presented [46, 47]. For example, Xia et al. [48] used a
recurrent neural network for solving the non-linear projection formulation, which
is applicable to many constrained linear and non-linear optimisation problems.
Similar to our approach, their method uses a projection or transformation of
the neural network; however, both the motivation and realisation are substan-
tially different. Neural networks have also been applied to solving optimisation
problems with quadratic cost functions subject to bound constraints [49].

7. Conclusion

An approach for designing neural network based models for regression of vector-
valued signals in which the target function is known to obey linear operator
constraints has been proposed. By construction, this approach guarantees that
any prediction made by the model will obey the constraints for any point in
the input space. It has been demonstrated on simulated data and real data
that this approach provides benefits by reducing the size of the problem and
hence providing performance benefits. This reduces the required number of data
points and the size of the neural network — providing savings in terms of time
and cost required to collect the data set.

It was also demonstrated that the proposed approach can out-perform the com-
monly used method of augmenting the cost function to penalise constraint viola-
tions. This is likely due to the fact that augmenting the cost function increases
the size of the problem (by including artificial measurements of the constraint)
and creates a trade-off between constraint satisfaction and minimising error, whereas our proposed approach reduces the size of the problem and does not
suffer from this trade-off.

The proposed approach constructs the model by a transformation of an underly-


ing potential function, where the construction is chosen such that the constraints
are always satisfied. This transformation may be known from physics or, in the

absence of such knowledge, constructed using a method of ansatz. Addition-
ally, we provide an example of extending this approach to affine constraints
(constraints with a non-zero right-hand side) in Appendix B. Whilst the pro-
posed approach admits a wide class of neural networks to model the underlying
function, the work presented in this paper focuses on FCNNs and it would be
interesting future work to explore the use of other types of neural networks such
as CNNs.

Another interesting area for future research would be to determine if it is possible


to learn the transformation as a combination of symbolic elements using tools
similar to those presented by Schmidt and Lipson [45].

Acknowledgements

This research was financially supported by the Swedish Foundation for Strategic
Research (SSF) via the project ASSEMBLE (contract number: RIT15-0012)
and by the Swedish Research Council via the project Learning flexible models
for nonlinear dynamics (contract number: 2017-03807).

Appendix A. Regularisation Study

In the simulated study in Section 5.1, the use of regularisation was not consid-
ered. Here, we consider the impact of regularisation on the performance of both
the standard and the constrained neural network. To investigate this impact
the network size study from the previous section was rerun with the networks
regularised by the addition of an L2 penalty on the network weights to the loss
function,

loss = (1/N) Σ_{i=1}^{N} (yi − ŷi)² + γ Σ_{j=1}^{m} wj², (A.1)

where ŷi is the network’s prediction of the measurement i, wj , ∀j = 1, . . . , m


are the network weights, and γ is a tuning parameter often known as the weight
decay.


Figure A.9: Comparison of both the constrained and standard neural network with and with-
out regularisation. A weight decay of γ = 1 × 10−4 is used to regularise both networks. The
average loss for 200 random trials is shown.

The impact of regularisation on the performance of both networks is shown in


Figure A.9. A weight decay of γ = 1×10−4 was used to regularise both networks
as it was found to give the best results for the standard neural network. These
results indicate that the inclusion of regularisation can improve the performance
of both the standard and the proposed constrained neural network, in this case,
by approximately the same amount. This comparison further highlights the
benefit of the constrained approach as even without regularisation it performs
better than the regularised standard neural network.

Appendix B. Simulated Affine Example

It is also possible to design a model to satisfy an affine constraint Cx f = b. This


type of constraint arises for vector fields that have a prescribed divergence or curl, for example, Maxwell's equations [12] and low-Mach-number flow [11].

To illustrate the approach, we design a neural network that will satisfy constant
divergence, i.e. ∇ · f = b, and demonstrate the method in simulation. The model satisfying this constraint can be built by starting from the model used in the previous section that satisfies ∇ · f = 0. From this starting model, we need to add
a component that when mapped through the constraints results in a constant
term. This is easily achieved using a 2 input 1 output linear layer with no bias

term, giving the final model as

f̂ = [∂/∂x2  −∂/∂x1]ᵀ g + c0 x1 + c1 x2, (B.1)

where the weights c0 and c1 will be learned along with the rest of the neural
network parameters.
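A sketch of this affine-constrained model in PyTorch is given below (our illustration; in particular, interpreting (B.1) as adding the scalar c0 x1 + c1 x2 to both components of the field is our reading of the equation).

import torch
import torch.nn as nn

# Sketch of (B.1): a divergence-free part from the potential g, plus the output of a
# 2-input, 1-output linear layer with no bias added to both components, so that the
# divergence of f is the learned constant c0 + c1.
class ConstantDivField(nn.Module):
    def __init__(self):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2, 100), nn.Tanh(),
                               nn.Linear(100, 50), nn.Tanh(),
                               nn.Linear(50, 1))
        self.lin = nn.Linear(2, 1, bias=False)   # weights c0 and c1

    def forward(self, x):
        x = x.requires_grad_(True)
        dg, = torch.autograd.grad(self.g(x).sum(), x, create_graph=True)
        div_free = torch.stack((dg[:, 1], -dg[:, 0]), dim=1)
        return div_free + self.lin(x)            # broadcasts c0*x1 + c1*x2 over both components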

Figure B.10 shows the results of learning a field satisfying these constraints
using our approach and a standard neural network. Measurement locations are
randomly picked over the domain [0, 4]×[0, 4], with 200 measurements simulated
from the field given by

f1(x1, x2) = exp(−a x1 x2) a x1 sin(x1 x2) − exp(−a x1 x2) x1 cos(x1 x2) + 1.1 x1,
f2(x1, x2) = exp(−a x1 x2) x2 cos(x1 x2) − exp(−a x1 x2) a x2 sin(x1 x2) − 0.3 x2, (B.2)

and zero-mean Gaussian noise of standard deviation σ = 0.1 added. For both
networks, two hidden layers are used with 100 and 50 neurons respectively and
Tanh activation layers. The field is then predicted at a grid of 20 × 20 points
with the proposed approach achieving a RMSE of 0.21 compared to 0.48 for the
standard neural network.

Figure B.10: Comparison of learning the affine constrained field from 200 noisy observations
using an unconstrained neural network and our approach. Left: the true field (grey) and
observations (red). Centre and right: learned fields subtracted from the true field. Done
using 2 hidden layers, 100 neurons in first, 50 in second for both methods.

Appendix C. Strain Field and Measurement Details

This section provides details of the strain field equations and the simulated mea-
surements used for the simulated strain field example in the main paper. The
simulation uses the classical Saint-Venant cantilever beam strain field equations
under an assumption of plane stress [24];

εxx(x, y) = (P/(EI)) (l − x) y,
εyy(x, y) = −(νP/(EI)) (l − x) y, (C.1)
εxy(x, y) = −((1 + ν)P/(2EI)) ((h/2)² − y²),

where P = 2 kN is the applied load, E = 200 GPa is the elastic modulus, ν = 0.28 is Poisson's ratio, l = 20 mm is the beam length, h = 10 mm is the beam height, t = 5 mm is the beam width, and I = th³/12 is the second moment of inertia.

Simulated measurements of this strain field were made at random locations


within the beam and were corrupted by zero-mean Gaussian noise with stan-
dard deviation of 2.5 × 10−4 . In practice, such measurements can be made by
X-ray or neutron diffraction and correspond to the average strain within a small
volume of material inside the sample, known as a gauge volume [22]. Gauge vol-
umes can be made small enough that it is practical to treat these measurements
as corresponding to points in the sample, and noise levels as low as 1 × 10−4 or
better can be achieved.

References

[1] A. Pukrittayakamee, M. Malshe, M. Hagan, L. Raff, R. Narulkar, S. Bukka-


patnum, R. Komanduri, Simultaneous fitting of a potential-energy surface
and its corresponding force fields using feedforward neural networks, The
Journal of chemical physics 130 (13) (2009) 134101.

[2] B. P. Van Milligen, V. Tribaldos, J. Jiménez, Neural network differential

equation and plasma equilibrium solver, Physical review letters 75 (20)
(1995) 3594.

[3] S. Greydanus, M. Dzamba, J. Yosinski, Hamiltonian Neural Networks,


arXiv preprint arXiv:1906.01563.

[4] M. Lutter, C. Ritter, J. Peters, Deep Lagrangian Networks: Using Physics


as Model Prior for Deep Learning, ArXiv abs/1907.04490.

[5] E. J. Konopinski, What the electromagnetic vector potential describes,


American Journal of Physics 46 (5) (1978) 499–502.

[6] D. R. Durran, Improving the anelastic approximation, Journal of the at-


mospheric sciences 46 (11) (1989) 1453–1461.

[7] P. K. Kundu, D. R. Dowling, G. Tryggvason, I. M. Cohen, Fluid mechanics.

[8] C. Truesdell, The kinematics of vorticity, Courier Dover Publications, 2018.

[9] T. L. Chow, Introduction to electromagnetic theory: a modern perspective,


Jones & Bartlett Learning, 2006.

[10] D. J. Griffiths, Introduction to electrodynamics, Prentice Hall New Jersey,


1962.

[11] A. S. Almgren, J. B. Bell, C. A. Rendleman, M. Zingale, Low Mach num-


ber modeling of type Ia supernovae. I. hydrodynamics, The Astrophysical
Journal 637 (2) (2006) 922.

[12] D. Fleisch, A student’s guide to Maxwell’s equations, Cambridge University


Press, 2008.

[13] M. H. Sadd, Elasticity: theory, applications, and numerics, Academic Press,


2009.

[14] D. G. Luenberger, Optimization by vector space methods, John Wiley &


Sons, 1997.

[15] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin,
A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in pytorch.

[16] C. Jidling, N. Wahlström, A. Wills, T. B. Schön, Linearly constrained


Gaussian processes, in: Advances in Neural Information Processing Sys-
tems, 2017, pp. 1215–1224.

[17] M. Lange-Hegermann, Algorithmic Linearly Constrained Gaussian Pro-


cesses, in: Advances in Neural Information Processing Systems 31, 2018,
pp. 2137–2148.

[18] M. Dissanayake, N. Phan-Thien, Neural-network-based approximations for


solving partial differential equations, communications in Numerical Meth-
ods in Engineering 10 (3) (1994) 195–201.

[19] J. Sirignano, K. Spiliopoulos, Dgm: A deep learning algorithm for solving


partial differential equations, Journal of Computational Physics 375 (2018)
1339–1364.

[20] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics Informed Deep Learn-


ing (Part I): Data-driven Solutions of Nonlinear Partial Differential Equa-
tions, arXiv preprint arXiv:1711.10561.

[21] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics Informed Deep Learn-


ing (Part II): Data-driven Discovery of Nonlinear Partial Differential Equa-
tions, arXiv preprint arXiv:1711.10566.

[22] I. C. Noyan, J. B. Cohen, Determination of Strain and Stress Fields by


Diffraction Methods, in: Residual Stress, Springer, 1987, pp. 117–163.

[23] A. Gregg, J. Hendriks, C. Wensrich, A. Wills, A. Tremsin, V. Luzin, T. Shi-


nohara, O. Kirstein, M. Meylan, E. Kisi, Tomographic reconstruction of
two-dimensional residual strain fields from Bragg-edge neutron imaging,
Physical Review Applied 10 (6) (2018) 064034.

[24] F. Beer, E. Johnston Jr, J. Dewolf, D. Mazurek, Mechanics of Materials,
sixth edit edition (2010).

[25] N. Wahlström, Modeling of Magnetic Fields and Extended Objects for


Localization Applications, Ph.D. thesis, Linköping University Electronic
Press (2015).

[26] A. Solin, S. Särkkä, Hilbert space methods for reduced-rank gaussian pro-
cess regression, Statistics and Computing 30 (2) (2020) 419–446.

[27] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural net-


works: A deep learning framework for solving forward and inverse problems
involving nonlinear partial differential equations, Journal of Computational
Physics 378 (2019) 686–707.

[28] D. Pathak, P. Krahenbuhl, T. Darrell, Constrained convolutional neural


networks for weakly supervised segmentation, in: Proceedings of the IEEE
international conference on computer vision, 2015, pp. 1796–1804.

[29] Z. Jia, X. Huang, I. Eric, C. Chang, Y. Xu, Constrained deep weak supervi-
sion for histopathology image segmentation, IEEE transactions on medical
imaging 36 (11) (2017) 2376–2388.

[30] Y. Liu, A. W. K. Kong, C. K. Goh, A constrained deep neural network for


ordinal regression, in: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2018, pp. 831–839.

[31] O. Oktay, E. Ferrante, K. Kamnitsas, M. Heinrich, W. Bai, J. Caballero,


S. A. Cook, A. De Marvao, T. Dawes, D. P. O‘Regan, et al., Anatomi-
cally constrained neural networks (acnns): application to cardiac image en-
hancement and segmentation, IEEE transactions on medical imaging 37 (2)
(2017) 384–395.

[32] P. Márquez-Neila, M. Salzmann, P. Fua, Imposing hard constraints on deep


networks: Promises and limitations, arXiv preprint arXiv:1706.02025.

[33] H. Kervadec, J. Dolz, J. Yuan, C. Desrosiers, E. Granger, I. B. Ayed, Log-
barrier constrained cnns, Computing Research Repository (CoRR).

[34] J. Drgona, A. R. Tuor, V. Chandan, D. L. Vrabie, Physics-constrained


deep learning of multi-zone building thermal dynamics, arXiv preprint
arXiv:2011.05987.

[35] T. Yang, Advancing non-convex and constrained learning: Challenges and


opportunities, AI Matters 5 (3) (2019) 29–39.

[36] J. Hendriks, A. Gregg, C. Wensrich, A. Wills, Implementation of traction


constraints in bragg-edge neutron transmission strain tomography, Strain
(2019) e12325doi:10.1111/str.12325.

[37] Y. D. Zhong, B. Dey, A. Chakraborty, Symplectic ODE-Net: Learning


Hamiltonian Dynamics with Control, arXiv preprint arXiv:1909.12077.

[38] J. K. Gupta, K. Menda, Z. Manchester, M. J. Kochenderfer, A general


framework for structured learning of mechanical systems, arXiv preprint
arXiv:1902.08705.

[39] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, S. Ho,


Lagrangian neural networks, arXiv preprint arXiv:2003.04630.

[40] M. Lutter, K. Listmann, J. Peters, Deep lagrangian networks for end-to-end


learning of energy-based control for under-actuated systems, arXiv preprint
arXiv:1907.04489.

[41] T. Q. Chen, Y. Rubanova, J. Bettencourt, D. K. Duvenaud, Neural ordi-


nary differential equations, in: Advances in neural information processing
systems, 2018, pp. 6571–6583.

[42] S. Massaroli, M. Poli, F. Califano, A. Faragasso, J. Park, A. Yamashita,


H. Asama, Port-Hamiltonian Approach to Neural Network Training, arXiv
preprint arXiv:1909.02702.

[43] A. Pukrittayakamee, M. Hagan, L. Raff, S. T. Bukkapatnam, R. Koman-
duri, Practical training framework for fitting a function and its derivatives,
IEEE transactions on neural networks 22 (6) (2011) 936–947.

[44] C. M. Handley, P. L. Popelier, Potential energy surfaces fitted by artificial


neural networks, The Journal of Physical Chemistry A 114 (10) (2010)
3371–3383.

[45] M. Schmidt, H. Lipson, Distilling free-form natural laws from experimental


data, science 324 (5923) (2009) 81–85.

[46] H.-M. Adorf, M. D. Johnston, A discrete stochastic neural network algo-


rithm for constraint satisfaction problems, in: IJCNN International Joint
Conference on Neural Networks, 1990, pp. 917–924.

[47] E. P. Tsang, C. J. Wang, A generic neural network approach for constraint


satisfaction problems, in: Neural network applications, Springer, 1992, pp.
12–22.

[48] Y. Xia, H. Leung, J. Wang, A projection neural network and its application
to constrained optimization problems, IEEE Transactions on Circuits and
Systems I: Fundamental Theory and Applications 49 (4) (2002) 447–458.

[49] A. Bouzerdoum, T. R. Pattison, Neural network for quadratic optimization


with bound constraints, IEEE transactions on neural networks 4 (2) (1993)
293–304.
