2002.01600
Abstract
We present a novel approach to modelling and learning vector fields from phys-
ical systems using neural networks that explicitly satisfy known linear operator
constraints. To achieve this, the target function is modelled as a linear transfor-
mation of an underlying potential field, which is in turn modelled by a neural
network. This transformation is chosen such that any prediction of the target
function is guaranteed to satisfy the constraints. The approach is demonstrated
on both simulated and real data examples.
Keywords: Neural Networks, Linear operator constraints, Physical systems,
Vector fields
1. Introduction
Developments during recent years have established deep learning as perhaps
the most prominent member of the machine learning toolbox. Today neural
networks are present in a broad range of applications and are used for both
classification and regression problems. This includes the use of neural networks
to model and learn vector-valued quantities from physical systems such as mag-
netic fields [1], plasma fields [2], and the dynamics of conservative systems [3, 4],
to name a few. The popularity of neural networks is to a large extent explained
Instead of focusing on the network per se, it may be just as important to consider
prior knowledge provided by the problem setting. For instance, the function of
interest can represent a quantity subject to fundamental physical constraints.
In some cases these physical constraints take the form of linear operator con-
straints. This includes many vector fields that are known to be either divergence-
or curl-free. Examples of divergence-free vector fields (also known as solenoidal
fields) are the magnetic field [5]—see Figure 1, the velocity of an anelastic flow
[6], the vorticity field [7, 8], and the current density when the charge density is constant
over time, as given by the continuity equation [9, 10]. Low-Mach-number flow is a
simplification of the compressible Euler equations and describes flow with a pre-
scribed divergence [11] (i.e. it takes the form of an affine constraint). Another
example of fields satisfying an affine constraint is given by Maxwell’s equations,
which describe electromagnetic fields with a prescribed curl and divergence [12].
Within continuum mechanics, the stress field and strain field inside a solid object
satisfy the equilibrium conditions and the strain field inside a simply connected
body satisfies the compatibility constraints [13]. Equilibrium can also be used
as a constraint when modelling plasma fields [2].
The list can be made longer, but the point is clear – by making sure that certain
constraints are fulfilled, we (significantly) reduce the set of functions that could
explain our measured data. This, in turn, implies that we can maintain high
performance without requiring the same amount of flexibility. Put simply: we
can obtain the same results with a smaller network and less training data.
Figure 1: Magnetic field predictions (blue) using a constrained neural network trained on 500
observations (red) sampled from the trajectory indicated by the black curve. The magnetic
field, B, is curl-free, satisfying the constraint ∇ × B = 0, and the method proposed in this paper
ensures that the predictions satisfy this constraint.
1.1. Contribution
This paper presents a novel approach for designing neural network based models
that satisfy linear operator constraints. It is a step towards addressing a central
issue in applying machine learning methods to physical systems: the enforcement
of physical constraints. The approach models the
vector field to be learned as a linear transformation of an underlying potential
field. The benefits of using this approach are two-fold:
1. Any predictions made using this approach will satisfy the constraints for
the entire input space.
2. The constraints reduce the effective size of the learning problem, so comparable
performance can be achieved with less training data and a smaller network.
Additionally, the approach allows a wide class of neural network models to be
used to model the underlying function, including convolutional neural networks
and recurrent neural networks.
2. Problem Formulation
Consider a set of observed data pairs (xi, yi), i = 1, . . . , N, where xi denotes
the input and yi denotes the output. Both the input and output are potentially
vector-valued with xi ∈ R^D and yi ∈ R^K. Here we consider the regression
problem where the data can be described by the non-linear function yi = f (xi )+
ei , where ei is zero-mean white noise representing the measurement uncertainty.
In this work, a neural network is used to model f.
In addition to the data, we know that the function f should fulfil certain con-
straints
Cx [f ] = 0, (3)
where Cx is an operator [14] mapping the function f to another function g; that
is, Cx[f] = g. Further, we restrict Cx to be a linear operator, meaning that
Cx[λ1 f1 + λ2 f2] = λ1 Cx[f1] + λ2 Cx[f2] for all λ1, λ2 ∈ R. A simple example
is when the operator is a linear transformation, Cx[f] = Cf, which together with
the constraint (3) forces a certain linear combination of the outputs to equal zero.
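As a hypothetical illustration with K = 2 outputs: choosing C = [1 1] gives Cx[f] = f1 + f2, so the constraint (3) simply requires f1(x) = −f2(x) at every input x.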
The constraints can come from either known physical laws or other prior knowl-
edge about the data. Here, the objective is to determine an approach to derive
models based on neural networks such that all predictions from these models
will satisfy the constraints.
In this section, an approach to learn a function using a neural network such that
any resulting estimate satisfies the constraints (3) is proposed. The novelty of
this approach is in recognising that vector fields subject to linear constraints
can, in general, be modelled by a linear transformation of a neural network
such that the learned field will always satisfy the constraints. This section first
presents the approach and then gives a brief discussion of conditions that may be
imposed on the neural network. Then, the following section gives two methods
for determining the required transformation.
Our approach designs a neural network that satisfies the constraint for all possi-
ble values of its parameters, rather than imposing constraints on the parameter
values themselves. This is done by considering f to be related to another function g
via some linear operator Gx:

f = Gx[g].    (4)
We require this relation to hold for any function g. To do this, we will in-
terpret Cx and Gx as matrices and use a similar procedure to that of solving
systems of linear equations. Since Cx and Gx are linear operators, we can think
of Cx[f] and Gx[g] as matrix-vector multiplications, where Cx[f] = Cx f with
(Cx f)i = Σ^K_{j=1} (Cx)ij fj, and where each element (Cx)ij of the operator matrix
is itself an operator (for instance a partial derivative or a constant). The
transformation Gx is then chosen such that Cx[Gx[g]] = 0 for all g. In summary,
the approach is to (1) find a transformation Gx satisfying this requirement,
(2) model the underlying function g with a neural network, and (3) train the
parameters of that network on the measured data.
The choice of neural network structure in step 2 may have some conditions
placed upon it by the transformation found in step 1 for the resulting model to
be mathematically correct. For example, if the transformation contains partial
derivatives then this may restrict the choice of activation function. A more
detailed discussion is given in Section 3.2. Despite these conditions, the ap-
proach admits a reasonably wide class of neural network models including fully
connected neural networks (FCNN), convolutional neural networks (CNN), and
recurrent neural networks (RNN). For the ease of explanation, the examples
given in this paper use FCNNs.
The parameters of the resulting model can be learned using existing methods
such as stochastic gradient descent. It is worth noting that if the data requires
scaling then care should be taken as this scaling can modify the form of the
constraints.
In the case where the operator Gx contains partial derivatives, such as for curl-
free and divergence-free fields, the implementation can be done using automatic
differentiation such as the grad function in PyTorch [15].
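As an illustrative sketch (the layer sizes and function names below are placeholders, not the implementation used in this paper), a curl-free field f = ∇g can be obtained by differentiating a scalar-valued network with torch.autograd.grad:

import torch

# Hypothetical scalar potential g: R^3 -> R, modelled by a small FCNN with tanh activations.
g = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def curl_free_field(x):
    # f(x) = grad_x g(x); a gradient field is curl-free by construction.
    x = x.requires_grad_(True)          # x is assumed to be a leaf tensor (e.g. a batch of inputs)
    f, = torch.autograd.grad(g(x).sum(), x, create_graph=True)
    return f                            # shape (batch, 3)

f_pred = curl_free_field(torch.randn(5, 3))   # predictions satisfy curl(f) = 0 at every point

Because the field is produced by differentiating the potential, create_graph=True is needed so that a training loss can still be backpropagated through these derivatives.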
If the transformation contains only first-order derivatives then this does not
result in any restrictive conditions. To see this, consider a neural network with
a single hidden layer and an identity activation function in its output layer,
g(x) = W2 φ1(a1), where a1 = W1 x + b1 and φ1 is the hidden-layer activation
function. Its first derivative, ∂g(x)/∂x = W2 ∂φ1(a1)/∂a1 ∂a1/∂x, places no
particular condition on the choice of φ1.
Figure 2: Diagram illustrating the difference between a standard neural network structure
(a) and our constrained model (b). Since the output layer of the constrained model is of
lower dimension than that of the standard neural network, fewer hidden layers and neurons are
required. In this figure, we assume that g is scalar and show a single hidden layer.
If the transformation contains second-order derivatives, there are requirements
on the activation functions chosen. Consider the second derivative of the same
network,

∂²g(x)/∂x² = W2 ( ∂²a1/∂x² ∂φ1(a1)/∂a1 + ∂a1/∂x ∂²φ1(a1)/∂a1² ∂a1/∂x ) = W2 ∂a1/∂x ∂²φ1(a1)/∂a1² ∂a1/∂x,    (10)

where the first term vanishes since a1 is affine in x.
To use this model it is required that the second derivatives of the activation
function are non-constant. This excludes, for instance, the ReLU function. The
same procedure can be easily used to show that this condition remains when
the neural network is extended to two hidden layers.
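To give a concrete check: for φ1(a) = tanh(a), the second derivative is ∂²φ1/∂a² = −2 tanh(a)(1 − tanh²(a)), which is non-constant, so tanh satisfies the condition; the ReLU, by contrast, has a second derivative that is zero wherever it is defined, and the right-hand side of (10) would vanish.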
4. Determining the Transformation Gx

This section presents two methods for determining a suitable operator Gx. Prior
knowledge about the physics of a problem could inform the choice of operator.
If this is not the case, then a suitable operator could be found by proposing an
ansatz and solving a system of linear equations.
4.1. From Known Physics

From fundamental physics, it may be the case that we know that the vector
field of interest is related to an underlying potential field. Common examples of
this are divergence-free (∇ · f = 0)1 vector fields and curl-free (∇ × f = 0) vector
fields. A curl-free vector field can be written as a function of an underlying
scalar potential field g:
f = ∇g,    (11)

which gives Gx = [∂/∂x, ∂/∂y, ∂/∂z]ᵀ. Divergence-free vector fields can, on the
other hand, be expressed as a function of a vector potential field g ∈ R^3, given
by
f = ∇ × g, (12)
¹ Here ∇ = [∂/∂x, ∂/∂y, ∂/∂z]ᵀ.
which gives

Gx = [   0     −∂/∂z    ∂/∂y  ]
     [  ∂/∂z     0     −∂/∂x  ] .    (13)
     [ −∂/∂y   ∂/∂x      0    ]
4.2. Ansatz
In the absence of such known mathematical relations, the operator Gx can be
constructed using a pragmatic approach, an exhaustive version of which is
described by Jidling et al. [16]; a brief outline is given below. A solid
analysis of the mathematical properties of this operator is provided by Lange-
Hegermann [17].
The operator is postulated as

Gx = Γξ,    (14)

where ξ is a vector of operators (the ansatz) and Γ is a matrix of constant
coefficients to be determined. Substituting this into the constraint (3) gives the
requirement

Cx Γξ = 0.    (15)
Expanding the product on the left-hand side, we find that it reduces to a linear
combination of operators. Requiring all coefficients to equal 0, we obtain a
system of equations from which we can determine Γ, and thus also Gx .
As an example, consider the two-dimensional divergence-free constraint with
Cx = [∂/∂x  ∂/∂y] and the ansatz ξ = [∂/∂x  ∂/∂y]ᵀ. Expanding Cx Γξ then gives
a system of equations which is solved by γ11 = γ22 = 0 and γ12 = −γ21. Letting γ21 = 1, we obtain
Gx = [−∂/∂y, ∂/∂x]ᵀ.    (19)
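Written out explicitly (a short reconstruction of the omitted algebra), applying Cx Γξ to an arbitrary function g gives

Cx Γξ [g] = γ11 ∂²g/∂x² + (γ12 + γ21) ∂²g/∂x∂y + γ22 ∂²g/∂y²,

and requiring each coefficient to equal zero yields exactly the solution stated above.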
In the general case, Gx may contain operators of higher order than those in Cx .
It is also possible that a suitable underlying function may have a vector rather
than a scalar output. The procedure should, therefore, be considered iterative.
Additionally, within Appendix B, we show that this approach can be extended
to affine constraints.
5. Experimental Results
These demonstrations use an FCNN as the underlying model in the proposed
approach and compare the results to a standard FCNN. Additionally,
we compare the performance of the proposed approach and the commonly used
approach of augmenting the cost function to penalise constraint violations.
5.1. Simulated Divergence-Free Function
Consider the problem of modelling a divergence-free vector field defined as
f1(x1, x2) = exp(−ax1 x2)(ax1 sin(x1 x2) − x1 cos(x1 x2)),
f2(x1, x2) = exp(−ax1 x2)(x2 cos(x1 x2) − ax2 sin(x1 x2)),    (20)

where a is a constant. This vector field satisfies the constraint ∂f1/∂x1 + ∂f2/∂x2 = 0.
A neural network based model satisfying these constraints is given by

f = [∂/∂x2, −∂/∂x1]ᵀ g.    (21)
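A minimal PyTorch sketch of the model (21) is given below (our illustration; the class name, the network for g, and its layer sizes are placeholders rather than the exact implementation used for the experiments):

import torch

class DivergenceFreeNet(torch.nn.Module):
    # f = [dg/dx2, -dg/dx1] for a scalar potential g modelled by an FCNN, so that
    # df1/dx1 + df2/dx2 = 0 for any choice of the network parameters.
    def __init__(self, hidden=(100, 50)):
        super().__init__()
        layers, dim = [], 2
        for width in hidden:
            layers += [torch.nn.Linear(dim, width), torch.nn.Tanh()]
            dim = width
        layers.append(torch.nn.Linear(dim, 1))
        self.g = torch.nn.Sequential(*layers)

    def forward(self, x):                   # x: (batch, 2), assumed to be a leaf tensor
        x = x.requires_grad_(True)
        grad_g, = torch.autograd.grad(self.g(x).sum(), x, create_graph=True)
        return torch.stack((grad_g[:, 1], -grad_g[:, 0]), dim=1)

model = DivergenceFreeNet()
f_pred = model(torch.rand(10, 2))           # divergence-free predictions, shape (10, 2)

Whatever the parameters of g, the returned field has zero divergence by construction.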
The regression of this problem using the proposed constrained neural network
and an unconstrained (standard) neural network is compared. The networks'
root mean square errors (RMSE) are compared in two studies: (a) the number of
measurements is increased while the network size is held fixed, and (b) the
network size is increased while the number of measurements is held fixed.
In both studies, 200 random trials were completed with the measurements ran-
domly picked over the domain [0, 4]×[0, 4], and corrupted by zero-mean Gaussian
noise of standard deviation σ = 0.1. For both networks, a tanh activation layer
was placed on the output of the hidden layers. The networks were then trained
using a mean squared error loss function and the ADAM optimiser, with the
learning rate reduced as the validation loss plateaued. A uniform grid of 20 × 20
points was chosen to predict the function values at. The root mean square error
was then calculated between the true vector field and the predictions at these
locations. To focus this analysis on the impacts of the suggested approach, reg-
ularisation and other methods to reduce overfitting were not implemented. The
effect of regularisation on both networks is considered in Appendix A.
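The exact training code is not reproduced in the paper; the sketch below (assumed learning rate, scheduler settings, and full-batch updates) reflects the description above:

import torch

def train(model, x_train, y_train, x_val, y_val, epochs=2000):
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimiser, factor=0.5, patience=50)
    mse = torch.nn.MSELoss()
    for _ in range(epochs):
        optimiser.zero_grad()
        loss = mse(model(x_train), y_train)      # mean squared error data fit
        loss.backward()
        optimiser.step()
        # No torch.no_grad() here: the constrained model differentiates its potential internally.
        val_loss = mse(model(x_val), y_val).detach()
        scheduler.step(val_loss)                 # reduce the learning rate as the validation loss plateaus
    return model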
Figure 3: Two studies comparing the performance of the proposed constrained neural network
based model with a standard unconstrained neural network using simulated measurements of a
divergence-free field. The RMSE is compared as (a) the number of measurements is increased
and (b) as the size of the neural network is increased.
In both these studies, the proposed approach yields a significantly lower RMSE
than a standard neural network. To highlight a few points, the proposed ap-
proach with 500 measurements has the same RMSE as the standard neural net-
work with 4000 measurements. Similarly, with 21 total neurons the proposed
approach performs as well as the standard neural network with 150 neurons.
An example of the learned vector fields from 200 noisy observations is provided
in Figure 4. For this comparison, both networks had two hidden layers with 100
neurons in the first and 50 in the second and a tanh activation function was
placed on the outputs of both hidden layers.
Figure 4: Comparison of learning the divergence-free field from 200 noisy observations using
an unconstrained neural network (NN) and our constrained approach. Left: the true field
(grey) and observations (red). Centre and right: learned fields subtracted from the true field.
Both methods use two hidden layers, with 100 neurons in the first and 50 in the second.
These results indicate that the proposed approach can achieve equivalent per-
formance with either less data or smaller network size. Another property of
our constrained neural network is that its predictions will automatically satisfy
the constraints. This is true even in regions where no measurements have been
made as illustrated in Figure 5. By comparison, the standard neural network
gives estimates which violate the constraints.
Figure 5: Comparison of constraint violations for fields learned from 200 simulated noisy observations
of a divergence-free field using an unconstrained neural network and our approach. Left: the true
field (grey) and observations (red). No measurements were made inside the dashed blue box. Centre
and right: constraint violations for the learned fields, calculated as c = ∂f1/∂x1 + ∂f2/∂x2.
5.2. Comparison with an Augmented Cost Function

In this study, the standard network's cost function is augmented with a penalty on
the mean squared constraint violation evaluated at Nc points, where the violation
is given by

c = ∂f1/∂x1 + ∂f2/∂x2,    (23)
and λ is a tuning parameter that weights the relative importance of the measure-
ments and the constraint. Note that varying the ratio of measurement points
to constraint evaluation points has a similar effect to changing λ.
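For reference, a sketch of this augmented cost (a hypothetical unconstrained network mapping R² to R²; the constraint-point handling is our assumption) might look as follows:

import torch

def augmented_loss(net, x_meas, y_meas, x_constraint, lam):
    # Mean squared error data fit plus lam times the mean squared divergence at the constraint points.
    data_fit = torch.nn.functional.mse_loss(net(x_meas), y_meas)
    xc = x_constraint.requires_grad_(True)
    f = net(xc)
    df1, = torch.autograd.grad(f[:, 0].sum(), xc, create_graph=True)
    df2, = torch.autograd.grad(f[:, 1].sum(), xc, create_graph=True)
    c = df1[:, 0] + df2[:, 1]                    # c = df1/dx1 + df2/dx2, equation (23)
    return data_fit + lam * (c ** 2).mean()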
The results from 200 random trials with N = 3000, Nc = 3000 and λ ranging
from 0 to 256 are shown in Figure 6. Also indicated, by a dashed line, is the
median result using our proposed constrained approach for the same number
of measurements. These results indicate that rather than improving the neural
network’s predictions this approach of augmenting the cost function creates a
trade-off between learning the field from the measurements and achieving low
constraint violation. In contrast, our proposed approach removes the need to
tune the weight λ by building the constraints into the model. Building the con-
straints into the model also has the added benefit of reducing the problem size,
which improves the predictions and guarantees the constraints to be satisfied at
all locations.
Figure 6: Performance of a neural network with cost function augmented by penalising the
mean squared constraint violation. The study compares (a) the RMSE of the predicted
field and (b) the mean absolute constraint violation of the predicted field for a range of
weighting factors λ. For comparison, results from our proposed approach for the same number
of measurements are indicated by the dashed line.
5.3. Simulated Strain Field
Physical strain fields satisfy the equilibrium constraints and, as such, it is im-
portant to ensure that any estimates of these fields from measurements also
satisfy these constraints [13]. Here, we consider a two-dimensional strain field
with components described by εxx(x, y), εyy(x, y), εxy(x, y). Under the assump-
tion of plane stress the equilibrium constraints are given by [23]

∂/∂x (εxx + ν εyy) + ∂/∂y ((1 − ν) εxy) = 0,
∂/∂y (εyy + ν εxx) + ∂/∂x ((1 − ν) εxy) = 0.    (24)
A neural network based model satisfying these constraints can be derived from
physics using the so-called Airy stress function [13] and is given by
[ εxx ]   [ ∂²/∂y² − ν ∂²/∂x²   ]
[ εyy ] = [ ∂²/∂x² − ν ∂²/∂y²   ] g.    (25)
[ εxy ]   [ −(1 + ν) ∂²/∂x∂y    ]
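The second derivatives in (25) can also be obtained with nested automatic differentiation; the sketch below is our illustration (the potential network g and the function name are placeholders), not the implementation used for the results:

import torch

def strain_from_airy(g, x, nu):
    # eps = [d2g/dy2 - nu d2g/dx2, d2g/dx2 - nu d2g/dy2, -(1 + nu) d2g/dxdy], equation (25).
    x = x.requires_grad_(True)                  # x: (batch, 2) with columns (x, y)
    grad_g, = torch.autograd.grad(g(x).sum(), x, create_graph=True)
    d2g_dx = torch.autograd.grad(grad_g[:, 0].sum(), x, create_graph=True)[0]   # [d2g/dx2, d2g/dxdy]
    d2g_dy = torch.autograd.grad(grad_g[:, 1].sum(), x, create_graph=True)[0]   # [d2g/dydx, d2g/dy2]
    gxx, gxy, gyy = d2g_dx[:, 0], d2g_dx[:, 1], d2g_dy[:, 1]
    eps_xx = gyy - nu * gxx
    eps_yy = gxx - nu * gyy
    eps_xy = -(1.0 + nu) * gxy
    return torch.stack((eps_xx, eps_yy, eps_xy), dim=1)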
This model is used to learn the classical Saint-Venant cantilever beam strain field
under an assumption of plane stress [24] from 200 noisy simulated measurements.
Details of this strain field and the measurements are given in Appendix C.
Predictions of this strain field using the proposed model and a standard neu-
ral network are shown in Figure 7. The proposed approach gives an RMSE of
5.52 × 10−5 compared to 67.7 × 10−5 for the standard neural network. Qualita-
tively, we can see that the proposed approach provides more accurate estimates
of the strain field, particularly for the εxy component.
Figure 7: The theoretical Saint-Venant cantilever beam field and strain fields learned from 200
noisy measurements using the presented constrained approach and a standard neural network.
Both networks have three hidden layers with 20, 10, and 5 neurons, respectively. Values are
given in micro strain.
5.4. Real Data: Magnetic Field

In a region free of current, Maxwell's equations imply that the magnetic field B
is curl-free,

∇ × B = 0.    (26)
With a magnetic field sensor and an optical positioning system, both position
and magnetic field data have been collected in a magnetically distorted indoor
environment — with a total of 16,782 data points collected. Details of the data
acquisition can be found in the supplementary materials of Jidling et al. [16],
where this data was previously published. Figure 1 illustrates magnetic field
predictions using a constrained neural network trained on 500 measurements
sampled from the trajectory shown in black. The constrained neural network
had two hidden layers of 150 and 75 neurons, with Tanh activation layers. Using
the remaining data points for validation, our constrained model has an RMSE
validation loss of 0.048 compared to 0.053 for a standard unconstrained neural
network of the same size and structure.
Two studies were run comparing the proposed approach and a standard neural
network for a range of training data sizes and neural network sizes. The networks'
RMSE when validated against 8,000 reserved validation data points are shown
in Figure 8: in the first study the number of training measurements is increased,
and in the second the size of the network is increased.
For both studies, the results of training the networks for 100 random initialisa-
tions are shown.
The studies show that the proposed approach performs better than the standard
neural network for a smaller number of measurements or a smaller network size.
As the number of measurements or neurons is increased, the performance of
both networks converges. This is expected as given enough measurements and
a large enough network size, both methods should converge to the true field and
hence a minimum validation RMSE.
6. Related work
Focusing on neural networks, related work falls into two broad categories: in-
corporating known physics relations as prior knowledge, and optimising neural
networks subject to constraints. Here, we discuss some of these methods that
are closely related or of particular interest.
Figure 8: Two studies comparing the performance of the proposed constrained neural network
based model with a standard unconstrained neural network using data collected of a magnetic
field. The RMSE is compared as (a) the number of measurements is increased and (b) as the
size of the neural network is increased.
A similar approach was used by Raissi et al. [27] to learn the solution to lin-
ear and non-linear partial differential equations and they demonstrate the ap-
proach on examples from physics such as Schrödinger's equation. They also
demonstrated that automatic differentiation included in TensorFlow and Py-
Torch provides a straightforward way to calculate the augmented cost. Further,
they provide an alternate method for discovering ordinary differential equation
style models from data.
An alternative that strictly enforces the constraints is to learn the neural
network parameters by solving a soft-barrier constrained optimisation problem
[32, 33]. However, it has been shown that the soft-barrier style approaches are
outperformed by the augmented cost function approaches [34], possibly due to
the challenges of solving the non-convex optimisation problem [35].
While the approach of augmenting the cost function can be used to solve the
problem presented in our paper it has some downsides. Firstly, it does not guar-
antee that the constraint is satisfied, and this is especially true in regions where
the constraint may not have been evaluated as part of the cost. Secondly, its
performance is subject to the number of points at which the constraint is evalu-
ated. Thirdly, a relative weighting needs to be chosen between the original cost
and the cost due to constraint satisfaction and this creates a trade-off. The ap-
proach presented in our paper avoids these issues by presenting a neural network
model that uses a transformation to guarantee the constraints to be satisfied
everywhere rather than augmenting the cost function. In Section 5.2, we have
compared the performance of our proposed approach to that of augmenting the
cost function.
Augmenting the cost function does, however, have the advantage that it can
be used even when no suitable transform Gx is forthcoming. For instance, it
can be used to enforce boundary constraints [19, 20]. Since using our proposed
approach does not exclude the possibility of also augmenting the loss function,
we suggest that the two approaches are complementary. Whereby, our proposed
approach is used to satisfy constraints for which a suitable transform Gx can
be designed, and then the cost function is augmented to penalise violation of
other constraints, such as boundary conditions. A similar combined approach
was used to model strain fields subject to both equilibrium constraints and
boundary conditions [36].
There are several examples of using neural networks to learn the Hamiltonian or
Lagrangian of a dynamic system and training this model using the derivatives of
the neural network [3, 4, 37, 38, 39, 40]. These methods ensure that the learned
dynamics are conservative, i.e. the total energy in the system is constant. An-
other approach to learning dynamic systems is presented by Chen et al. [41]
where neural networks are used to model solutions to ODEs and Massaroli et
al. [42] propose a Port-Hamiltonian based approach to training these models.
A method for simultaneous fitting of magnetic potential fields and force fields
using neural networks is studied by Pukrittayakamee et al. [1]. This method
uses a neural network to model the potential field and the force field is then the
partial derivatives of the neural network. In their work, measurements of both
the potential field and the force field are used to train the model. This work
was later extended to provide a practical approach to fitting measurements of
a function and its derivatives using neural networks as well as a discussion on
specific types of overfitting that might be encountered [43]. A similar approach
is taken by Handley and Popelier [44] where it is used for modelling molecular
dynamics and Monte Carlo studies on gas-phase chemical reactions.
Although it was not the motivation for the method, the model for the force
field given by Pukrittayakamee et al. [1] will obey the curl-free constraint and
is equivalent to that presented in Section 5.4. In our work, we extend the idea
of representing the target function by a transformation of a potential function
modelled by a neural network to a broader range of problems that obey a variety
of constraints. Additionally, we focus only on the transformed target function
and do not require measurements of the potential function.
Another interesting idea is presented by Schmidt and Lipson [45] who derived
a method for distilling natural laws from data. In their work, symbolic terms
including partial derivatives are used as building blocks with which to learn
equations that the data satisfies.
Another area of research is using neural networks to solve constrained optimisation
problems. Several methods using neural networks to solve constraint satisfac-
tion problems have been presented [46, 47]. For example, Xia et al. [48] used a
recurrent neural network for solving the non-linear projection formulation, which
is applicable to many constrained linear and non-linear optimisation problems.
Similar to our approach, their method uses a projection or transformation of
the neural network; however, both the motivation and realisation are substan-
tially different. Neural networks have also been applied to solving optimisation
problems with quadratic cost functions subject to bound constraints [49].
7. Conclusion
An approach for designing neural network based models for regression of vector-
valued signals in which the target function is known to obey linear operator
constraints has been proposed. By construction, this approach guarantees that
any prediction made by the model will obey the constraints for any point in
the input space. It has been demonstrated on simulated data and real data
that this approach provides benefits by reducing the size of the problem and
hence improving performance. This reduces the required number of data points
and the size of the neural network, providing savings in the time and cost
required to collect the data set.
It was also demonstrated that the proposed approach can out-perform the com-
monly used method of augmenting the cost function to penalise constraint viola-
tions. This is likely due to the fact that augmenting the cost function increases
the size of the problem (by including artificial measurements of the constraint)
and creates a trade-off between constraint satisfaction and minimising error,
whereas our proposed approach reduces the size of the problem and does not
suffer from this trade-off.
The required transformation can be derived from known physical relations or, in the
absence of such knowledge, constructed using a method of ansatz. Addition-
ally, we provide an example of extending this approach to affine constraints
(constraints with a non-zero right-hand side) in Appendix B. Whilst the pro-
posed approach admits a wide class of neural networks to model the underlying
function, the work presented in this paper focusses on FCNNs and it would be
interesting future work to explore the use of other types of neural networks such
as CNNs.
Acknowledgements
This research was financially supported by the Swedish Foundation for Strategic
Research (SSF) via the project ASSEMBLE (contract number: RIT15-0012)
and by the Swedish Research Council via the project Learning flexible models
for nonlinear dynamics (contract number: 2017-03807).
Appendix A. Effect of Regularisation

In the simulated study in Section 5.1, the use of regularisation was not consid-
ered. Here, we consider the impact of regularisation on the performance of both
the standard and the constrained neural network. To investigate this impact
the network size study from the previous section was rerun with the networks
regularised by the addition of an L2 penalty on the network weights to the loss
function,
loss = (1/N) Σ^N_{i=1} (yi − ŷi)² + γ Σ^m_{j=1} wj²,    (A.1)

where γ controls the strength of the regularisation.
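In code, the regularised loss (A.1) can be written as follows (a sketch; gamma = 1e-4 matches the weight decay used in Figure A.9, and the penalty here also covers bias terms):

import torch

def regularised_loss(model, x, y, gamma=1e-4):
    # Mean squared error plus an L2 penalty on the network parameters, as in (A.1).
    mse = torch.nn.functional.mse_loss(model(x), y)
    l2 = sum(w.pow(2).sum() for w in model.parameters())
    return mse + gamma * l2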
Figure A.9: Comparison of both the constrained and standard neural network with and with-
out regularisation. A weight decay of γ = 1 × 10−4 is used to regularise both networks. The
average loss for 200 random trials is shown.
Appendix B. Affine Constraints

To illustrate the approach, we design a neural network that will satisfy constant
divergence, i.e. ∇ · f = b, and demonstrate the method in simulation. The model
satisfying this constraint can be built by starting from the model used in the
previous section that satisfies ∇ · f = 0. From this starting model, we need to add
a component that when mapped through the constraints results in a constant
term. This is easily achieved using a 2 input 1 output linear layer with no bias
term, giving the final model as
f̂ = [∂/∂x2, −∂/∂x1]ᵀ g + c0 x1 + c1 x2,    (B.1)
where the weights c0 and c1 will be learned along with the rest of the neural
network parameters.
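A sketch of this model (reading (B.1) as adding the scalar c0 x1 + c1 x2 to both components, so that the divergence becomes c0 + c1; the layer sizes are placeholders):

import torch

# Hypothetical scalar potential g: R^2 -> R and the bias-free linear layer holding c0 and c1.
g = torch.nn.Sequential(torch.nn.Linear(2, 100), torch.nn.Tanh(),
                        torch.nn.Linear(100, 50), torch.nn.Tanh(),
                        torch.nn.Linear(50, 1))
affine = torch.nn.Linear(2, 1, bias=False)

def constant_divergence_field(x):
    # f_hat = [dg/dx2, -dg/dx1] + (c0 x1 + c1 x2); its divergence is c0 + c1 everywhere.
    x = x.requires_grad_(True)
    grad_g, = torch.autograd.grad(g(x).sum(), x, create_graph=True)
    div_free = torch.stack((grad_g[:, 1], -grad_g[:, 0]), dim=1)
    return div_free + affine(x)                 # the scalar term broadcasts to both components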
Figure B.10 shows the results of learning a field satisfying these constraints
using our approach and a standard neural network. Measurement locations are
randomly picked over the domain [0, 4] × [0, 4], with 200 measurements simulated
from a field satisfying these constraints and zero-mean Gaussian noise of standard
deviation σ = 0.1 added. For both
networks, two hidden layers are used with 100 and 50 neurons respectively and
Tanh activation layers. The field is then predicted at a grid of 20 × 20 points
with the proposed approach achieving an RMSE of 0.21 compared to 0.48 for the
standard neural network.
Figure B.10: Comparison of learning the affine constrained field from 200 noisy observations
using an unconstrained neural network and our approach. Left: the true field (grey) and
observations (red). Centre and right: learned fields subtracted from the true field. Both
methods use two hidden layers, with 100 neurons in the first and 50 in the second.
Appendix C. Strain Field and Measurement Details
This section provides details of the strain field equations and the simulated mea-
surements used for the simulated strain field example in the main paper. The
simulation uses the classical Saint-Venant cantilever beam strain field equations
under an assumption of plane stress [24]:

εxx(x, y) = (P / (EI)) (l − x) y,
εyy(x, y) = −(νP / (EI)) (l − x) y,    (C.1)
εxy(x, y) = −((1 + ν)P / (2EI)) (h²/4 − y²),

where P is the applied load, l the length of the beam, h its height, E the Young's
modulus, I the second moment of area, and ν the Poisson's ratio.
References
equation and plasma equilibrium solver, Physical Review Letters 75 (20)
(1995) 3594.
[15] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin,
A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch.
[24] F. Beer, E. Johnston Jr, J. Dewolf, D. Mazurek, Mechanics of Materials,
sixth edition (2010).
[26] A. Solin, S. Särkkä, Hilbert space methods for reduced-rank Gaussian pro-
cess regression, Statistics and Computing 30 (2) (2020) 419–446.
[29] Z. Jia, X. Huang, I. Eric, C. Chang, Y. Xu, Constrained deep weak supervi-
sion for histopathology image segmentation, IEEE Transactions on Medical
Imaging 36 (11) (2017) 2376–2388.
[33] H. Kervadec, J. Dolz, J. Yuan, C. Desrosiers, E. Granger, I. B. Ayed, Log-
barrier constrained CNNs, Computing Research Repository (CoRR).
[43] A. Pukrittayakamee, M. Hagan, L. Raff, S. T. Bukkapatnam, R. Koman-
duri, Practical training framework for fitting a function and its derivatives,
IEEE Transactions on Neural Networks 22 (6) (2011) 936–947.
[48] Y. Xia, H. Leung, J. Wang, A projection neural network and its application
to constrained optimization problems, IEEE Transactions on Circuits and
Systems I: Fundamental Theory and Applications 49 (4) (2002) 447–458.