
Implementation of a nonlinear Atomic Cluster Expansion

by

Andres Ross

B.S. Honours, McGill University, 2020

A THESIS SUBMITTED IN PARTIAL FULFILLMENT


OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL


STUDIES
(Mathematics)

The University of British Columbia


(Vancouver)

April 2022

© Andres Ross, 2022


The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Implementation of a nonlinear Atomic Cluster Expansion

submitted by Andres Ross in partial fulfillment of the requirements for the degree
of Master of Science in Mathematics.

Examining Committee:
Christoph Ortner, Professor, Mathematics, UBC
Supervisor
Chad Sinclair, Professor, Materials Engineering, UBC
Supervisory Committee Member
Khanh Dao Duc, Professor, Mathematics, UBC
Supervisory Committee Member

Abstract

In this thesis, we present a proof of concept implementation of linear and nonlinear models based on the Atomic Cluster Expansion (ACE) introduced in [16]. We introduce machine-learned interatomic potentials and derive the ACE as an atomic descriptor. This produces a model linear in its coefficients that serves to approximate the energies and forces of an atomic configuration. We train its coefficients for Silicon, Copper, and Molybdenum, and analyze the fit accuracy for energies and forces on benchmark training sets [37]. Furthermore, we extend the ACE model to approximate energies and forces through a nonlinear combination of linear ACE models. We describe how to implement this model, and in particular how to efficiently compute the derivatives, and present example results for the same data sets. We summarize the Julia implementation of these nonlinear models and provide an overview of the direction the code base will take in the future.

Lay Summary

Modelling materials at the atomic scale has become a crucial part of scientific research. However, simulating thousands or even millions of atoms directly with quantum mechanics is highly costly. A way to reduce such a cost is to generate a surrogate model trained by machine learning with data from a high-fidelity model. In this thesis, we explore a particular class of surrogate models. We develop the mathematical theory, describe their practical implementation, and test them on benchmark data.

Preface

This thesis is original, unpublished, independent work by the author, Andres Ross, under the supervision of Dr. Christoph Ortner.

Table of Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Lay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

Nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

1 Introduction and Background . . . . . . . . . . . . . . . . . . . . . 1


1.1 Materials Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Calculating the PES . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Interatomic potentials . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Pair Potentials . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Machine Learned Interatomic Potentials . . . . . . . . . . . . . . 5
1.4.1 Gaussian approximation potentials . . . . . . . . . . . . . 7
1.4.2 Moment Tensor Potentials . . . . . . . . . . . . . . . . . 8

2 The Linear ACE Model . . . . . . . . . . . . . . . . . . . . . . . . . 9


2.1 The Atomic Cluster Expansion . . . . . . . . . . . . . . . . . . . 9
2.2 Tensor approximation . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Nonlinear ACE Model . . . . . . . . . . . . . . . . . . . . . . . . . . 22


3.1 Model Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1 Loss function . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Efficient computation . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1 List of neighbours . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Reverse mode differentiation: Example in ACE . . . . . . . . . . 32
4.3 Introduction to the code base . . . . . . . . . . . . . . . . . . . . 35
4.4 Implementing nonlinear combinations . . . . . . . . . . . . . . . 37
4.4.1 Multiple properties . . . . . . . . . . . . . . . . . . . . . 37
4.4.2 Forward pass . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.3 Making models differentiable . . . . . . . . . . . . . . . 38
4.5 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.1 Multiprocessing . . . . . . . . . . . . . . . . . . . . . . 40
4.5.2 Interacting with optimization packages in Julia . . . . . . 40
4.6 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . 44


5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

List of Tables

Table 2.1 RMSE for energy and forces. κ represents the basis size used
for that value. . . . . . . . . . . . . . . . . . . . . . . . . . . 20

List of Figures

Figure 1.1 Different scales of materials modeling (courtesy of Gabor Csanyi). 2

Figure 2.1 An example of a set R of J = 22 particles described by their


position vectors rj , with the distances between them as rij . Ri
is the atomic neighbourhood of i such that |rij | < rcut . In the
figure i = 9 has neighbourhood of all hollow atoms. . . . . . 10
Figure 2.2 Log RMSE for varying ϵ and αE /αF . . . . . . . . . . . . . . 19
Figure 2.3 Log RMSE v.s log basis size for energy and forces. . . . . . . 21

Figure 3.1 (ii) embedding, 3 dense layers. . . . . . . . . . . . . . . . . . 28


Figure 3.2 Log RMSE v.s iterations for energy and forces. . . . . . . . . 29

Figure 4.1 Example situation of fi for 14 atoms. The left and the right rep-
resent fi (R) for two different atoms i. The number of atoms n
inside the cutoff radius can (and usually is) of different length
for different i. . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Figure 5.1 Example of f on a configuration for two atoms i = {7, 4}. On


the right we see the atomic environment, and on the left we see
the action of f on the input layer. . . . . . . . . . . . . . . . 46
Figure 5.2 ACE model divided into layers. . . . . . . . . . . . . . . . . 46

Nomenclature

E(Ri ) Site energy of Ri .

F Nonlinear embedding.

N correlation order.

P The radial basis.

α{E,F } Weighting of energies v.s forces.

ϵ RRQR tolerance.

κ Number of basis functions.

ϕnlm One particle basis.

φi An atomic property.

$A^i_{nlm}$ Atomic basis.

$\mathbf{A}^i_{\mathbf{nlm}}$ Product basis.

B ACE basis functions.

cB coefficients corresponding to basis function B.

E(R) Total energy of R.

F The forces.

Fi The gradient of energy centered at i.

f Function to find the neighbours of an atom.

J Number of atoms.

R := {r1 , ..., rJ }.

Ri := {rij }i̸=j .

rcut A cutoff radius to limit the range of interaction.

rj Position of an atom j.

rij := rj − ri .

VN N th body term in the expansion of E.

Ylm The angular basis.


$y^{(t)} = (y_E^t, y_F^t)$ A DFT calculated training point.
Chapter 1

Introduction and Background

1.1 Materials Modeling


Modelling of materials at the atomic scale has become a crucial part of scientific
research [4]. From quantum mechanical models to interatomic potentials, approx-
imations of the potential energy surface (PES) have a variety of applications. A
PES describes the energy of a system as a function of specific atomic parameters,
namely the positions of the atoms. Macroscopic properties of a material depend
on its atomic structure, and hence the accurate prediction of the atomic structure
is crucial to computational material discovery [32]. Employing the PES enables us
to address challenges like atomic-scale deposition and growth of amorphous car-
bon films [7], proton-transfer mechanisms [20], or dislocations in materials [18]
involving thousands or even millions of atoms. The performance of these simula-
tions relies heavily on the quality of our PES, both its accuracy and computational
cost.

1.2 Calculating the PES


The first approach to calculating the PES comes from Quantum Chemistry. Using
the many-body Schrödinger equation, one can calculate the potential energy in
terms of the wave function by minimizing the following expectation:

Figure 1.1: Different scales of materials modeling (courtesy of Gabor Csanyi). (Axes: number of atoms, from 10 up to 10⁸ and beyond; accessible simulation time, from ps to ms.)

$$E = \frac{\langle \Psi | H | \Psi \rangle}{\|\Psi\|^2}. \qquad (1.1)$$
This allows for a complete description of the system, but it requires numerical
approximation. For J atoms, let R := {r1 , ..., rJ } be an atomic structure, with
rj = (x, y, z) the position of atom j, rj = |rj | its magnitude, and rj1 j2 := rj1 − rj2 the relative position of atoms j1 and j2 . Then we can formulate the time-independent electronic Schrödinger equation with the Born-Oppenheimer PES

H(R)Ψ = E(R)Ψ, (1.2)

as an eigenvalue problem where the energy levels are the eigenvalues of the sys-
tem. However, solving this PDE with simple discretization results in exponential
growth of the degrees of freedom with the number of electrons (the cause of high
dimensionality), which is very costly. As we see in Figure 1.1, Quantum Chemistry
is limited to only small atomic structures, on the order of at most a few atoms [6].
Although highly accurate, its cost makes it unfeasible for almost all applications.
In 1964 Hohenberg and Kohn [21] proved the existence of a universal density
functional that allowed for the calculation of energy. This functional is based on
the electronic density and serves as an approximation of E in 1.2. It is the basis of
what we now know as DFT (Density Functional Theory) and is faster than quantum
chemistry while still retaining good accuracy [21]. We briefly present the general
idea of DFT but quickly move on to methods without electrons.
In DFT models, the energy is given as the sum of external potential energy (V ),
kinetic energy (T ), and interaction energy of the atoms (U ) and is described by the
Hamiltonian operator H = T + V + U , which is evaluated with (1.1). Assuming
a non-degenerate ground state (i.e. a unique quantum state represents that energy),
let us denote the electronic density by ρ(r). This allows us to define a universal
functional:

F [ρ(r)] := ⟨Ψ, (T + U )Ψ⟩. (1.3)

On the other hand V is given by an external potential v(r), as:


$$V = \int v(r)\,\Psi^*(r)\Psi(r)\,dr. \qquad (1.4)$$

We can then replace the wavefunction by the electronic density ρ(r) and use the
density functional F [ρ(r)] to replace the other energy contributions, which will
leave us with a representation of the energy that only depends on the external po-
tential v(r) and the electronic density
$$E_v[\rho] := \int v(r)\rho(r)\,dr + F[\rho]. \qquad (1.5)$$

If the functionals v and F were chosen to only depend on the atomic position r,
then energy will also only depend on r. One way to choose ρ(r) is the Kohn-Sham
method [25]. It assumes that the electron density of a system of J electrons can be
written as the sum of one-electron orbitals ψi :

$$\rho(r) = \sum_{i=1}^{J} |\psi_i(r)|^2. \qquad (1.6)$$

Even when optimized, the ab-initio DFT approach scales with O(J³) since the ψi solve our eigenvalue problem (1.2) [35], which limits the size of simulations one can realistically run with this approach and is the reason why interatomic potentials are useful.

1.3 Interatomic potentials


To overcome the cubic scaling cost of Kohn-Sham DFT, one can employ interatomic potentials. We start again from (1.2), and approximate the energy E without using the positions of electrons. Not using electrons is what accounts for the jump in capability in Figure 1.1 between 10³ atoms (ns time scales) and 10⁶ atoms (µs time scales) [13].
One idea to construct interatomic potentials is to formally expand E into a
series with each term a specific interaction order. The first term is the summation
of the energies of each atom individually, the second term is pairwise interactions,
and so forth:

$$E(R) \approx V_0 + \sum_{j_1} V_1(r_{j_1}) + \sum_{j_1 < j_2} V_2(r_{j_1}, r_{j_2}) + \dots + \sum_{j_1 < \dots < j_N} V_N(r_{j_1}, r_{j_2}, ..., r_{j_N}) + \dots \qquad (1.7)$$

There are several ways to model VN , which depend on the type of interac-
tions we care about in a specific case. For example, since the density of electrons
around an atom decreases exponentially with distance, short-range interactions can
be modeled with a repulsive functional V2 (rj1 j2 ) = Ae−αrj1 j2 , for some param-
eters A and α. Similarly if we wanted to account for Van der Waals interactions,
we could use a functional of the form $V_2(r_{j_1 j_2}) = A / r_{j_1 j_2}^6$ for some material-dependent parameter A [25]. Several energy models are derived using physics-inspired
functions and truncating the series expansion. These models tend to be material-
dependent, thus some will perform better than others in different cases.

1.3.1 Pair Potentials


When we truncate at pairwise interactions, we obtain pair potentials. Although these models will generally be empirical and not particularly accurate [25], they
provide value when investigating materials processes. One example is the Lennard-
Jones potential, which provides a good description of central-force interatomic interactions, $V_2(r_{j_1 j_2}) = 4\epsilon\big((r_0/r_{j_1 j_2})^{12} - (r_0/r_{j_1 j_2})^6\big)$, where r0 is the equilibrium distance that minimizes the potential. The long-range contribution is represented with $r_{j_1 j_2}^{-6}$, but when this potential was originally developed, the short-range decay was not known, so it was approximated with $r_{j_1 j_2}^{-12}$. We would get the Born-Mayer potential
if we wanted to use exponential decay for short-range interactions. There are many
other pair potentials that can perform well in some cases, but most of them fail to
describe the properties of metals adequately because they fail to capture the local
electron density [25]. Embedded-atom model potentials (EAM) were developed to
address this by adding an energy functional of the local electron density. Motivated
by DFT, they have the general form:
$$V = \sum_{j_1} F_i\Big(\sum_{j_1 \neq j_2} f_{j_1 j_2}(r_{j_1 j_2})\Big) + \frac{1}{2}\sum_{j_1 \neq j_2} V_2(r_{j_1 j_2}), \qquad (1.8)$$

where f is a function that approximates the electron density and F models the cost of embedding a nucleus into an electron cloud. A major limitation of EAM-type
potentials is that they do not reflect dynamic changes that arise from changing the
local environment [25]. Finnis and Sinclair introduced a similar potential which,
based on a second-moment approximation to the tight-binding density of states,

uses fj1 j2 (rj1 j2 ) ∼ rj1 j2 [14].
Adding more interaction orders improves performance, which leads to higher-
order potentials like the Stillinger-Weber potential [22] and modified embedded-
atom method (MEAM) potential [24]. However, increasing the order of interaction N increases the cost of evaluation by $\binom{J}{N}$, where J is the number of atoms, which makes potentials with N > 3 very expensive.
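As a concrete illustration of the simplest case above, a Lennard-Jones pair energy summed over all pairs within a cutoff could be sketched as follows; this is illustrative code, not part of the thesis implementation, and the parameter values are placeholders.

```julia
using LinearAlgebra

# Illustrative sketch, not taken from the thesis code base: a Lennard-Jones
# pair potential and a naive O(J^2) pair-energy sum with a cutoff.
lennard_jones(r; ε = 1.0, r0 = 1.0) = 4ε * ((r0 / r)^12 - (r0 / r)^6)

# R is a vector of atomic positions (3-vectors).
function pair_energy(R; rcut = 2.5, V2 = lennard_jones)
    E = 0.0
    for j1 in 1:length(R), j2 in (j1 + 1):length(R)
        r = norm(R[j1] - R[j2])
        r < rcut && (E += V2(r))
    end
    return E
end
```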

1.4 Machine Learned Interatomic Potentials


One of the major problems with interatomic potentials is that they are not transfer-
able. As seen earlier, selecting the parameters and the functionals for a potential
is a very situational choice. These potentials were often fitted using experimen-
tal quantities within an analytic framework which makes them transfer poorly, not
only among materials but also among applications. In 1994 first-principles meth-
ods (methods with electrons) were making big leaps in their capabilities but were

nowhere near the capabilities of interatomic potentials. In an attempt to bridge
this gap, Ercolessi and Adams proposed using data created by first principles to
train interatomic potentials, arguing that a richer dataset would allow for increased
transferability [17]. This was the beginning of Machine learned interatomic po-
tentials which sparked a new movement to employ data from high fidelity models
to fit interatomic potentials. In 2007 Behler and Parrinello proposed the use of a
neural network representation of a potential energy surface using DFT data [3].
This network provided the energy and the forces directly as a function of all the
atomic positions in a system. Their method was orders of magnitude faster than
DFT, and they demonstrated high accuracy for bulk silicon compared to standard
empirical interatomic potentials. They used a densely connected neural network
with radial G1i and angular G2i functions as the input and energy as the output. The
radial functions were constructed as a sum of Gaussians with parameters η and rs ,

$$G_i^1(r) = \sum_{j \neq i} e^{-\eta (r_{ij} - r_s)^2} f_{cut}(r_{ij}) \qquad (1.9)$$

where fcut is a cutoff function that is 0 for values rij > rcut for some cutoff value
rcut . Define θijk as the angle between rij and rik for a central atom i. Then the
angular functions were constructed for all triplets of atoms by summing over the
cosine values of θijk ,

$$G_i^2(r) = 2^{1-\zeta} \sum_{j,k \neq i} (1 + \lambda \cos(\theta_{ijk}))^{\zeta}\, e^{-\eta (r_{ij}^2 + r_{ik}^2 + r_{jk}^2)}\, f_{cut}(r_{ij})\, f_{cut}(r_{ik})\, f_{cut}(r_{jk}), \qquad (1.10)$$

with the parameters λ, η, and ζ.
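The radial symmetry function (1.9) is straightforward to evaluate once a cutoff function is chosen; the sketch below assumes the common smooth cosine cutoff, which is our assumption since the thesis does not write out fcut explicitly.

```julia
# Sketch of the radial symmetry function (1.9). The cosine form of fcut is an
# assumption; the thesis only requires fcut(r) = 0 for r > rcut.
fcut(r, rcut) = r < rcut ? 0.5 * (cos(pi * r / rcut) + 1) : 0.0

# rij: distances from the central atom i to its neighbours j.
G1(rij; η = 1.0, rs = 0.0, rcut = 6.0) = sum(exp(-η * (r - rs)^2) * fcut(r, rcut) for r in rij)
```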


By using a Silicon data set with the positions of atoms as training points and
the energies as the targets, Behler and Parrinello optimized a cost function to train
the network. In total, 8200 DFT energies were used as training data and 800 were
used as testing data, reaching 5-6 meV per atom root mean squared error (RMSE).

1.4.1 Gaussian approximation potentials
In 2010 Bartók, Payne, Kondor and Csányi introduced Gaussian approximation
potentials (GAP), a class of interatomic potentials without a fixed functional form.
These were created to maintain special symmetries of the system and to be auto-
matically generated from DFT energies and forces data. We start by writing the
total energy of a system as the sum of atomic energies Ei (introduced by Behler
and Parrinello),

$$E := \sum_{i=1}^{J} E(\{r_{ij}\}_{i \neq j}), \qquad (1.11)$$

where {rij }i̸=j is the set of distances between the central atom i and the neighbouring atoms j. We then define a local density for atom i and its neighbours,

$$\rho_i(r) := \delta(r) + \sum_{j} \delta(r - r_{ij})\, f_{cut}(|r_{ij}|), \qquad (1.12)$$

with fcut a cutoff function as previously shown. With (1.12) we build a kernel G
such that

$$E(\{r_{ij}\}_{i \neq j}) := \sum_{n} \alpha_n G(b, b_n), \qquad (1.13)$$

with n ranging over the configurations, and αn coefficients to be calculated. bi are


the elements of the bispectrum [2] for atom i. The actual construction is omitted
here but can be found in the paper [2].
They define a matrix C as

Cnn′ := δ 2 G(bn , bn′ ) + σ 2 I, (1.14)

with σ and δ hyperparameters, and then solve for αn with

{αn } = α = C −1 y. (1.15)

This produces a symmetry preserving method that improves with more data.
However, two problems arise: the data contains only total energies and forces, and these will be heavily correlated. To solve these, the authors propose
using a sparsification procedure that reduces the data to a much smaller set of
configurations and replaces y with a linear combination of all data values. With
this approach, they reached, for example, an RMSE of 1 meV per atom for Silicon
energies [2].

1.4.2 Moment Tensor Potentials


In 2016 Shapeev proposed a class of systematically improvable interatomic poten-
tials called Moment Tensor Potentials (MTP) [35]. We divide interatomic poten-
tials into two broad classes, parametric and nonparametric. Parametric potentials
have a fixed number of parameters, like empirical potentials, whose accuracy can-
not be systematically improved. Nonparametric potentials can be systematically
improved in theory. They are composed of two main components, a representation
(also called a descriptor) of the atomic environment and a regression model. The
aforementioned Behler and Parrinello’s neural network potentials (NNP) use neu-
ral networks as the regression model and radial and angular symmetry functions
as the descriptors. GAP uses linear regression as the regression model (1.15) and
the bispectrum as a descriptor. MTP also uses linear regression, but its novelty
comes from using invariant polynomials as descriptors. These polynomials can
provably approximate any regular function that satisfies smoothness with respect
to the number of atoms and permutation, rotational and reflection invariance [35].
The main idea is that a potential V can be approximated by a polynomial that can
be symmetrized, which allows us to construct a basis of such polynomials B(r) as

$$V(R) \approx \sum_{B \in \mathbf{B}} c_B B(R). \qquad (1.16)$$

The idea of using symmetrized polynomials in MTP was the precursor for the
Atomic Cluster Expansion (ACE) which is at the centre of this thesis. We therefore
omit the derivation of B(R) in favour of a detailed description of ACE in the next
chapter.

Chapter 2

The Linear ACE Model

In this chapter we derive the Atomic Cluster Expansion (ACE) and describe its use
to model the PES of an atomic configuration. We present a linear model based on
the ACE and show results for benchmark data sets for Silicon, Copper and Molybdenum.

2.1 The Atomic Cluster Expansion


Let us consider J particles described by their position vectors rj . A set R :=
{r1 , ..., rJ } ∈ R3J of J particle positions is called an atomic configuration (Figure
2.1). In practice in materials simulations, computational cells are endowed with
periodic boundary conditions. During simulations we take this into account, but
for the sake of simplicity we will ignore it in our derivation.
A PES is a mapping from the set of all configurations to a real number E(R) ∈
R. In our current context, E is a configuration’s total energy. This mapping is
naturally permutation invariant since configurations are defined as sets, but we will
further require it to be also invariant under isometries, that is,

E({Qr1 , ..., QrJ }) = E({r1 , ..., rJ }) ∀Q ∈ O(3).

Interatomic potential models represent E in terms of lower dimensional com-


ponents [16]. Let rij = rj − ri be the distances between atom j and a central atom
i, and let Ri = {rij }j̸=i denote the atomic neighbourhood of atom i (Figure 2.1).


Figure 2.1: An example of a set R of J = 22 particles described by their


position vectors rj , with the distances between them as rij . Ri is the
atomic neighbourhood of i such that |rij | < rcut . In the figure i = 9 has
neighbourhood of all hollow atoms.

Now let us assume that E can be represented in terms of the following body order
expansion:

$$E(R) = \sum_{i=1}^{J} E(R_i), \qquad (2.1)$$

$$E(R_i) = V_0 + \sum_{N=1}^{N} \sum_{j_1 < j_2 < \dots < j_N} V_N(r_{ij_1}, ..., r_{ij_N}),$$

where V0 ∈ R, N ∈ N is the maximal order of interaction and VN : R3N → R. These VN are functions that map N relative positions rij to a real number, and are called N -
body functions. E is called a site energy function, and it depends on its atomic
environment Ri . We write the equation above for site energies more explicitly as

$$E(R_i) \approx V_0 + \sum_{j_1} V_1(r_{ij_1}) + \sum_{j_1 < j_2} V_2(r_{ij_1}, r_{ij_2}) + \dots + \sum_{j_1 < \dots < j_N} V_N(r_{ij_1}, ..., r_{ij_N}). \qquad (2.2)$$

In this form, we can see the role of N in the expansion. We are representing the
site energy E as a summation of increasing body order terms, i.e. terms that account
for higher-order interactions. For example, V1 is a pair potential that accounts for
all pairwise interactions in the atomic environment, and similarly, VN accounts for
N + 1 particle interactions. The goal of this expansion is to truncate at N << J,
which significantly reduces evaluation cost.
When we defined E, we required isometry invariance and noticed it already
possessed permutation invariance. Therefore, we will assume that our components
VN are also isometry and permutation invariant. Moreover, we will further assume
regularity and locality.

$$V_N \in C^t(\mathbb{R}^{3N} \setminus \{0\}), \text{ for some } t \geq 1, \qquad (2.3)$$
$$\exists\, r_{cut} > 0 \ \text{s.t.}\ V_N(r_{ij_1}, ..., r_{ij_N}) = 0 \ \text{if}\ \max_{1 \leq j \leq N} |r_j| \geq r_{cut},$$

where rcut is a cutoff radius to limit the range of interaction (Figure 2.1). Similarly,
we also restrict the domain by introducing a minimal radius r0 > 0 since we are
not interested in atomic collisions. We will include rcut in E such that E(Ri ) takes
Ri = {rij }j̸=i , but only considers {rij }rij <rcut .

2.2 Tensor approximation


We now approximate VN by using a tensor product basis consisting of a radial and
a spherical function,

$$\boldsymbol{\phi}_{\mathbf{nlm}}(r_1, ..., r_N) := \prod_{\alpha=1}^{N} \phi_{n_\alpha l_\alpha m_\alpha}(r_\alpha), \qquad \phi_{nlm}(r) := P_n(r)\, Y_{lm}(\hat{r}), \qquad (2.4)$$

where n = (n1 , ..., nN ) and similarly for l and m; n = 0, 1, 2, ... dictates the radial functions, while l = 0, 1, 2, ... and m = −l, ..., l (the azimuthal and magnetic quantum numbers respectively) dictate the angular functions, which in our case are the spherical harmonics. We also denote r̂ as the unit vector of r, r as its
magnitude, and R̂ = (r̂1 , ..., r̂N ). The choice of spherical harmonics will later
allow us to conveniently impose rotational symmetries, but the choice of Pn has
considerable freedom. This allows us to play with different choices to improve
convergence, but we will not pursue this freedom in the present work. Let the
radial basis

P := {Pn (r)|n = 1, 2, ...} (2.5)

be a linearly independent subset of {f ∈ C t ([r0 , ∞))|f = 0 in [rcut , ∞)}, where t


is the parameter from (2.3). Moreover, we assume that any f ∈ C ∞ with support
in [0, rcut ] can be approximated to within arbitrary accuracy from span P, i.e.,

spanC t P ⊃ {f ∈ C ∞ ([r0 , ∞))|f = 0 in [rcut , ∞)]}, (2.6)

where spanC t P denotes the closure of P with respect to the norm || · ||C t . These
two assumptions of the radial function mean that VN can be approximated with a
linear combination of the tensor product ϕ,
$$\tilde{V}_N \approx \sum_{\mathbf{n}, \mathbf{l}, \mathbf{m}} c_{\mathbf{nlm}}\, \boldsymbol{\phi}_{\mathbf{nlm}}. \qquad (2.7)$$

It has been shown in [16] that since the ϕnlm are linearly independent and $\tilde{V}_N$ is permutation invariant ($\tilde{V}_N = \tilde{V}_N \circ \sigma$), we can assume $c_{\mathbf{nlm}} = c_{\sigma\mathbf{n},\sigma\mathbf{l},\sigma\mathbf{m}}$ without loss of accuracy. This allows us to write

$$\tilde{V}_N \approx \sum_{(\mathbf{n},\mathbf{l},\mathbf{m})\ \text{ordered}} \ \sum_{\sigma \in S_N} c_{\mathbf{nlm}}\, \boldsymbol{\phi}_{\mathbf{nlm}} \circ \sigma, \qquad (2.8)$$

where the cnlm could be different coefficients, and "(n, l, m) ordered" denotes the lexicographically ordered tuples. Since we assumed point reflection symmetry for VN , all basis functions ϕnlm for which $\sum_\alpha l_\alpha$ is odd vanish. Hence

$$\tilde{V}_N(R) \approx \sum_{\substack{(\mathbf{n},\mathbf{l},\mathbf{m})\ \text{ordered} \\ \sum_\alpha l_\alpha\ \text{even}}} \ \sum_{\sigma \in S_N} c_{\mathbf{nlm}}\, (\boldsymbol{\phi}_{\mathbf{nlm}} \circ \sigma)(R). \qquad (2.9)$$

To make (2.9) rotationally invariant, we integrate over all rotations using the Haar integral [16],

$$\tilde{V}_N \approx \sum_{\substack{(\mathbf{n},\mathbf{l},\mathbf{m})\ \text{ordered} \\ \sum_\alpha l_\alpha\ \text{even}}} \ \sum_{\sigma \in S_N} c_{\mathbf{nlm}} \int_{SO(3)} (\boldsymbol{\phi}_{\mathbf{nlm}} \circ \sigma)(QR)\, dQ. \qquad (2.10)$$

Recall that the radial functions P are already rotationally invariant, so we focus
on Ylm . Now we represent the rotated spherical harmonics in terms of the Wigner
D-matrices [16]

$$\mathbf{Y}_{\mathbf{lm}}(Q\hat{R}) = \sum_{\boldsymbol{\mu} \in \mathcal{M}_{\mathbf{l}}} D^{\mathbf{l}}_{\boldsymbol{\mu}\mathbf{m}}(Q)\, \mathbf{Y}_{\mathbf{l}\boldsymbol{\mu}}(\hat{R}) \qquad \forall Q \in SO(3), \qquad (2.11)$$

where Ml := {µ ∈ ZN | − lα ≤ µα ≤ lα , for α = 1, ..., N }, and

$$D^{\mathbf{l}}_{\boldsymbol{\mu}\mathbf{m}}(Q) = \prod_{\alpha=1}^{N} D^{l_\alpha}_{\mu_\alpha m_\alpha}(Q). \qquad (2.12)$$

Integrating yields a spanning set {blm }, with

$$b_{\mathbf{lm}}(\hat{R}) := \sum_{\boldsymbol{\mu} \in \mathcal{M}_{\mathbf{l}}} \bar{D}^{\mathbf{l}}_{\boldsymbol{\mu}\mathbf{m}}\, \mathbf{Y}_{\mathbf{l}\boldsymbol{\mu}}(\hat{R}), \qquad (2.13)$$

where
$$\bar{D}^{\mathbf{l}}_{\boldsymbol{\mu}\mathbf{m}} = \int_{SO(3)} D^{\mathbf{l}}_{\boldsymbol{\mu}\mathbf{m}}(Q)\, dQ. \qquad (2.14)$$

The $\bar{D}^{\mathbf{l}}_{\boldsymbol{\mu}\mathbf{m}}$ coefficients can be efficiently computed with a recursive formula involving Clebsch-Gordan coefficients [16]. Then we reduce this set to a basis by defining $\tilde{U}^{\mathbf{l}}_{\boldsymbol{\mu} i}$ and $\tilde{n}_{\mathbf{l}} := \mathrm{rank}\,\bar{D}^{\mathbf{l}}$,

$$b_{\mathbf{l}i}(\hat{R}) := \sum_{\boldsymbol{\mu} \in \mathcal{M}_{\mathbf{l}}} \tilde{U}^{\mathbf{l}}_{\boldsymbol{\mu} i}\, \mathbf{Y}_{\mathbf{l}\boldsymbol{\mu}}(\hat{R}), \qquad i = 1, ..., \tilde{n}_{\mathbf{l}}, \qquad (2.15)$$

where we require the columns of $\tilde{U}^{\mathbf{l}}_{\boldsymbol{\mu} i}$ to span the same space as the columns of $\bar{D}^{\mathbf{l}}_{\boldsymbol{\mu}\mathbf{m}}$. Adding the radial component again, we define the rotational and permutation invariant basis

$$\tilde{B}_{\mathbf{nl}i}(R) := \sum_{\sigma \in S_N} \sum_{\mathbf{m} \in \mathcal{M}_{\mathbf{l}}} \tilde{U}^{\mathbf{l}}_{\mathbf{m} i}\, (\boldsymbol{\phi}_{\mathbf{nlm}} \circ \sigma)(R), \qquad i = 1, ..., \tilde{n}_{\mathbf{l}}. \qquad (2.16)$$

However, these are not linearly independent. Therefore we define a new set of coefficients by diagonalizing the Gramian $G^{\mathbf{nl}}_{i,i'} = \langle\langle \tilde{B}_{\mathbf{nl}i}, \tilde{B}_{\mathbf{nl}i'} \rangle\rangle$, with respect to the abstract inner product $\langle\langle \boldsymbol{\phi}_{\mathbf{nlm}}, \boldsymbol{\phi}_{\mathbf{n}'\mathbf{l}'\mathbf{m}'} \rangle\rangle := \delta_{\mathbf{nn}'} \delta_{\mathbf{ll}'} \delta_{\mathbf{mm}'}$:

$$U^{\mathbf{nl}}_{\mathbf{m}i} := \frac{1}{\Sigma_{ii}} \sum_{\alpha=1}^{\tilde{n}_{\mathbf{l}}} [V_{\alpha i}]^*\, \tilde{U}^{\mathbf{l}}_{\mathbf{m}\alpha}, \qquad i = 1, ..., n_{\mathbf{nl}}, \qquad (2.17)$$

where $n_{\mathbf{nl}} = \mathrm{rank}(G^{\mathbf{nl}})$ and we diagonalized $G^{\mathbf{nl}} = V \Sigma V^T$. With these new coefficients we obtain

$$B_{\mathbf{nl}i}(R) := \sum_{\mathbf{m} \in \mathcal{M}_{\mathbf{l}}} \sum_{\sigma \in S_N} U^{\mathbf{nl}}_{\mathbf{m}i}\, (\boldsymbol{\phi}_{\mathbf{nlm}} \circ \sigma)(R). \qquad (2.18)$$

So far, we have only treated a single correlation order N . We now move back
to treating all atoms J by defining Bnli as

$$B_{\mathbf{nl}i}(r_1, ..., r_J) := \sum_{j_1 < j_2 < \dots < j_N} B_{\mathbf{nl}i}(r_{j_1}, ..., r_{j_N}). \qquad (2.19)$$

We currently have (2.19) scale as $\binom{J}{N}$, which is terribly inefficient. We will now leverage the tensor products to reduce the computational cost of our current basis. We start by completing the summation from $\sum_{j_1 < j_2 < \dots < j_N}$ to $\sum_{j_1, j_2, \dots, j_N}$ and use (2.18) to get

$$B_{\mathbf{nl}i}(R) = \sum_{j_1,...,j_N} \sum_{\mathbf{m} \in \mathcal{M}_{\mathbf{l}}} \sum_{\sigma \in S_N} U^{\mathbf{nl}}_{\mathbf{m}i}\, \boldsymbol{\phi}_{\mathbf{nlm}}(r_{j_{\sigma 1}}, ..., r_{j_{\sigma N}}) \qquad (2.20)$$
$$= \sum_{\mathbf{m} \in \mathcal{M}_{\mathbf{l}}} \sum_{j_1,...,j_N} U^{\mathbf{nl}}_{\mathbf{m}i}\, \boldsymbol{\phi}_{\mathbf{nlm}}(r_{j_1}, ..., r_{j_N}) \qquad (2.21)$$
$$= \sum_{\mathbf{m} \in \mathcal{M}_{\mathbf{l}}} U^{\mathbf{nl}}_{\mathbf{m}i} \sum_{j_1,...,j_N = 1}^{J} \prod_{\alpha=1}^{N} \phi_{n_\alpha l_\alpha m_\alpha}(r_{j_\alpha}). \qquad (2.22)$$

Since we sum over tensor products, we may interchange the summations and the
product in the following way

$$\dots = \sum_{\mathbf{m} \in \mathcal{M}_{\mathbf{l}}} U^{\mathbf{nl}}_{\mathbf{m}i} \prod_{\alpha=1}^{N} \sum_{j=1}^{J} \phi_{n_\alpha l_\alpha m_\alpha}(r_j). \qquad (2.23)$$

We now define $A^i_{n_\alpha l_\alpha m_\alpha}$ as

$$B_{\mathbf{nl}i}(R) =: \sum_{\mathbf{m} \in \mathcal{M}_{\mathbf{l}}} U^{\mathbf{nl}}_{\mathbf{m}i} \prod_{\alpha=1}^{N} A^i_{n_\alpha l_\alpha m_\alpha}(R), \qquad (2.24)$$

and a product basis $\mathbf{A}^i_{\mathbf{nlm}}(R) := \prod_{\alpha=1}^{N} A^i_{n_\alpha l_\alpha m_\alpha}(R)$. This avoids the N! cost for symmetrising the basis as well as the $\binom{J}{N}$ cost of summation over all clusters in (2.19). We denote the resulting basis by

$$\mathbf{B}_N := \Big\{ B_{\mathbf{nl}i} \ \Big|\ (\mathbf{n}, \mathbf{l}) \in \mathbb{N}^{2N}\ \text{ordered},\ \textstyle\sum_\alpha l_\alpha\ \text{even},\ i = 1, ..., n_{\mathbf{nl}} \Big\}, \qquad (2.25)$$

and redefine the site energy as

$$E(R_i) := \sum_{N=0}^{N} \mathbf{B}_N(R_i). \qquad (2.26)$$

Finally, we choose a subset of (2.25) by further restricting (n, l, m). We define a maximal degree d ∈ R, then choose all nα and lα such that

$$\sum_{\alpha} (n_\alpha + w_L\, l_\alpha) \leq d, \qquad (2.27)$$

where wL is a relative weighting of the angular and radial basis functions. Higher
wL leads to higher resolution in the radial component. With this choice of (n, l, m)
we define a set of basis functions B ∈ B and their coefficients cB as

$$E(R_i) =: \sum_{B \in \mathbf{B}} c_B B(R_i), \qquad (2.28)$$

and the resulting total energy as

$$E(R) = \sum_{i=1}^{J} \sum_{B \in \mathbf{B}} c_B B(R_i). \qquad (2.29)$$

Let κ be the number of basis functions, i.e. the length of B and the number of
parameters cB . We can then switch the summations and define a new basis function
B, but now over whole configurations, resulting in the linear model

$$E(R) = \sum_{B \in \mathbf{B}} c_B B(R) = \mathbf{c} \cdot \mathbf{B}(R). \qquad (2.30)$$
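To make the evaluation pipeline behind (2.23)-(2.30) concrete, a toy sketch of the "density trick" is given below: pool the one-particle basis over neighbours, form products over index tuples, and contract with coefficients. The basis functions, index tuples, and the omission of the symmetrisation matrix U are simplifications on our part, not the actual ACE.jl implementation.

```julia
using LinearAlgebra

# Toy sketch of (2.23)-(2.24); the symmetrisation (the U matrix) is omitted.
# phi  : vector of one-particle basis functions, phi[v](rij) -> number
# spec : vector of index tuples (v1, ..., vN), one per product-basis function
# c    : coefficients, one per entry of spec
function site_energy(Ri, phi, spec, c)
    # atomic basis: pool each one-particle basis function over the neighbours
    A = [sum(phi[v](r) for r in Ri) for v in eachindex(phi)]
    # product basis: one product of pooled values per index tuple
    Aprod = [prod(A[v] for v in tup) for tup in spec]
    # linear model in the coefficients
    return dot(c, Aprod)
end
```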

We can compute the forces, $F_i := -\frac{\partial E(R)}{\partial r_i}$, using

$$F(R) := \Big( -\frac{\partial E(R)}{\partial r_i} \Big)_{i=1}^{J}. \qquad (2.31)$$

2.3 Parameter estimation


Similar to Section 1.4, our goal is to train parameters c through a set of energies
and forces calculated from a high fidelity model (DFT). Let us then consider a
training set of T atomic configurations R(t) , each with Jt atoms. We define the
extended design matrix Ψ using equations (2.29) and (2.31) as follows:

$$\Psi \cdot \mathbf{c} :=
\begin{pmatrix}
\alpha_E B_1(R^{(1)}) & \dots & \alpha_E B_\kappa(R^{(1)}) \\
-\alpha_F \frac{\partial B_1(R^{(1)})}{\partial r_1} & \dots & -\alpha_F \frac{\partial B_\kappa(R^{(1)})}{\partial r_1} \\
\vdots & \ddots & \vdots \\
-\alpha_F \frac{\partial B_1(R^{(1)})}{\partial r_{J_1}} & \dots & -\alpha_F \frac{\partial B_\kappa(R^{(1)})}{\partial r_{J_1}} \\
\alpha_E B_1(R^{(2)}) & \dots & \alpha_E B_\kappa(R^{(2)}) \\
\vdots & \ddots & \vdots \\
-\alpha_F \frac{\partial B_1(R^{(T)})}{\partial r_{J_T}} & \dots & -\alpha_F \frac{\partial B_\kappa(R^{(T)})}{\partial r_{J_T}}
\end{pmatrix}
\begin{pmatrix} c_1 \\ \vdots \\ c_\kappa \end{pmatrix}
=
\begin{pmatrix} \alpha_E E(R^{(1)}) \\ \alpha_F F(R^{(1)}) \\ \alpha_E E(R^{(2)}) \\ \vdots \\ \alpha_F F(R^{(T)}) \end{pmatrix}, \qquad (2.32)$$
where the length of B is κ, i.e the number of basis functions and parameters.
We also define αE , αF ∈ R as weightings to multiply the energies and forces in
Ψ. Now define a set of DFT calculations to act as targets in the training. For each training configuration R(t) , let $y^{(t)} = (y_E^{(t)}, y_F^{(t)})$ be the corresponding DFT calculated energy and forces. Then let $y = [y_E^{(1)}, y_F^{(1)}, y_E^{(2)}, ..., y_F^{(T)}]$. We seek
parameters c such that the loss

L(c) := ||Ψc − y||2 , (2.33)

is minimized. To avoid over-fitting to the training set, we multiply Ψ by the diag-


onal of a matrix Λ that estimates the scaling of ∇²ϕnlm . From (2.24) we can create a matrix Λ such that Λnl = n² + l², where n and l correspond to a specific B ∈ B through (2.25). Once Λ is calculated we update Ψ = Ψ diag(Λ).
We solve this problem using rank revealing QR factorization (RRQR). Given a
matrix Ψ, it can be shown that there exists a permutation Π and an integer k such
that the QR factorization
$$\Psi \Pi = QR = Q \begin{pmatrix} R_{11} & R_{12} \\ & R_{22} \end{pmatrix} \qquad (2.34)$$
has upper-triangular k × k matrix R11 , and R12 is linearly dependent on R11 [19].
We perform the factorization above using [23], and terminate when a certain toler-
ance ϵ is reached. Finally, we solve QR11 c = y.
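A minimal sketch of this truncated solve, using the column-pivoted QR from Julia's LinearAlgebra standard library instead of LowRankApprox.jl, and picking the rank k from a tolerance on the diagonal of R; this illustrates the idea rather than reproducing the thesis implementation.

```julia
using LinearAlgebra

# Illustrative truncated RRQR least-squares solve (not the thesis implementation).
function rrqr_solve(Ψ, y; ϵ = 1e-7)
    F = qr(Ψ, ColumnNorm())              # pivoted QR: Ψ Π = Q R
    d = abs.(diag(F.R))
    k = count(d .>= ϵ * d[1])            # truncation rank from the tolerance
    Q1  = Matrix(F.Q)[:, 1:k]
    R11 = F.R[1:k, 1:k]
    c = zeros(size(Ψ, 2))
    c[F.p[1:k]] = R11 \ (Q1' * y)        # solve on the selected columns, undo the pivoting
    return c
end
```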
To measure the performance of c, we calculate the root mean squared error

(RMSE) on a test set for both energies and forces separately. For energies, we use

$$\mathrm{RMSE}_E = \sqrt{\frac{\sum_{t=1}^{T} \frac{(\Psi_E^{(t)} \cdot c - y_E^{(t)})^2}{J_t^2}}{T}}, \qquad (2.35)$$
where $\Psi_E^{(t)}$ is the t-th row of a matrix ΨE containing only the energies, $y_E^{(t)} \in y_E$ are the
elements of a vector with only the DFT energies, and T is the total number of
energies. For forces we use
$$\mathrm{RMSE}_F = \sqrt{\frac{\|\Psi_F \cdot c - y_F\|_2^2}{3\,\|J\|_1}}, \qquad (2.36)$$
where similarly, ΨF is a matrix containing only the forces of Ψ and yF are the
forces in y.
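These two error measures can be written compactly, assuming the energy rows, force rows, targets, and per-configuration atom counts have already been assembled; the variable names are illustrative.

```julia
using LinearAlgebra

# Illustrative RMSE measures (2.35)-(2.36).
# ΨE: T×κ energy rows, yE: length-T target energies, Jt: atoms per configuration,
# ΨF: force rows of Ψ, yF: stacked target force components.
rmse_energy(ΨE, yE, Jt, c) = sqrt(sum(((ΨE * c .- yE) ./ Jt) .^ 2) / length(yE))
rmse_forces(ΨF, yF, Jt, c) = sqrt(norm(ΨF * c .- yF)^2 / (3 * sum(Jt)))
```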

2.4 Results
To test the ACE model, we use the data sets from [37], which contain molecular dynamics, elastic, surface and vacancy structures for bulk silicon, copper and
molybdenum (as well as other materials). The data set contains training sets of
sizes 214, 262, and 194, and test sets of sizes 25, 31, and 23, of DFT energies and
forces, for Si, Cu, and Mo, respectively. It is worth mentioning that these data sets
are pretty small and should only be used as proof of concept. Practical data sets
can be much larger (order of thousands of structures). It is also common to include
virials in ACE fits in addition to energies and forces.
We used the codes supplied by ACE1pack.jl [33]. There are several param-
eters one can modify to achieve better results, but in this work, we focus on (i) the
RRQR tolerance ϵ, (ii) the relative weighting of energies and forces αE /αF , and
(iii) the basis size κ. Further parameters such as the cutoff radius are taken from
[37]. We used LowRankApprox.jl [23] for the RRQR factorization, where we
can manually set the tolerance on the error of the factorization.
Since the data set is small and linear models are relatively fast, we generate a
heat map with ϵ and αE /αF as parameters. We set the correlation order to N = 3
and choose a basis size of κ = 964 to keep computational cost low. With higher

Figure 2.2: Log RMSE for varying ϵ and αE /αF . (Panels: Si, Cu, Mo; energy and forces.)
Table 2.1: RMSE for energy and forces. κ represents the basis size used for
that value.

RMSE silicon copper molybdenum


energy (meV) 2.923 (κ = 1706) 0.320 (κ = 3695) 2.857 (κ = 1291)
forces (meV/at) 0.111 (κ = 1706) 0.015 (κ = 3695) 0.182 (κ = 1291)

basis sizes we could have over-fitting as well as more costly simulations. We used
a cutoff radius of 5.5Å for silicon, 4.1Å for copper, and 5.2Å for molybdenum.
Figure 2.2 shows the log of the RMSE of the test set for energy and forces in a
heat map against the parameters ϵ and αE /αF . We see in Figure 2.2 that both low
ϵ and low αE /αF , as well as high ϵ and high αE /αF , give higher RMSE, likely
due to over-fitting. Therefore, we chose parameters closer to the middle: (ϵ = 10−7 , αE /αF = 10), (ϵ = 10−6 , αE /αF = 30), and (ϵ = 10−7 , αE /αF = 10) were chosen visually for silicon, copper and molybdenum respectively.
Using these parameters, we calculated the RMSE of energy and forces for cor-
relation order N = 3 and increasing polynomial degrees, meaning increasing basis
sizes. Figure 2.3 shows the log scaled RMSE of the test set against the log scaled
basis size. The lowest RMSE for energy and forces gave different κ. Therefore,
we chose κ visually from Figure 2.3 to give both a low RMSE for energies and for forces; these results can be found in Table 2.1.
We can see in Figure 2.3 that for basis sizes below 1000, a larger basis size
leads to better accuracy, which we expect since increasing basis size increases the
number of parameters. However, for κ bigger than 1000, the RMSE for forces
increases again, likely due to over-fitting. Since we report results on the test data
set, increasing parameters without adjusting regularization overfits to the training
set, hence dropping performance in the test set. We likely only see this in forces and
not in energy since Ψ contains significantly more entries for forces which indicates
we could use a bigger αE /αF for larger κ.

Figure 2.3: Log RMSE vs. log basis size for energy and forces. (Panels: Si, Cu, Mo.)

Chapter 3

Nonlinear ACE Model

In this chapter we extend the models seen in Chapter 2 to allow for a nonlinear
composition of linear models. We describe the model as well as the efficient com-
putation of its derivatives. We present results based on the same data sets studied
in Chapter 2.

3.1 Model Definition


The atomic cluster expansion (ACE) we introduced in Chapter 2 can be used to
model any invariant atomic property φ. In the current chapter we will extend this
model to allow for a nonlinear combination of such properties. Furthermore, we
will present a method to efficiently evaluate the gradients of the nonlinear model
and show initial regression results that serve as a proof of concept.
Consider (2.29), and define an atomic property centred at atom i in terms of an
ACE basis B as

$$\varphi_i := \sum_{B \in \mathbf{B}} c_B B(R_i) = \mathbf{c} \cdot \mathbf{B}(R_i). \qquad (3.1)$$

Then we define the energy of a nonlinear model as

$$E(R) = \sum_{i=1}^{J} E(R_i) := \sum_{i=1}^{J} \mathcal{F}\big(\varphi_i^{(1)}, \varphi_i^{(2)}, ..., \varphi_i^{(P)}\big), \qquad (3.2)$$
where p ∈ {1, ..., P } indexes the atomic properties, and F : RP → R is a nonlinear embedding function. The forces would be

$$F(R) := \Big( -\frac{\partial E(R)}{\partial r_i} \Big)_{i=1}^{J}, \qquad (3.3)$$

but now there is not a simple formula for them as in the linear case.


This type of nonlinear model is called a nonlinear combination of properties
and was proposed in [15]. Models of this type will have κP parameters, where we assume all $\varphi_i^{(p)}$ have basis length κ. There are other ways one can incorpo-
rate physics-inspired nonlinearities but this thesis will focus on (3.2) as a proof of
concept. We briefly visit other nonlinear models in Chapter 5.
Some examples of the embedding F include
$$\mathcal{F}(\varphi_i^{(1)}, \varphi_i^{(2)}) = \varphi_i^{(1)} - \sqrt{\varphi_i^{(2)}}, \qquad (3.4)$$

inspired by the Finnis-Sinclair [14] and embedded atom models, or its generaliza-
tion

$$\mathcal{F}(\varphi_i^{(1)}, ..., \varphi_i^{(P)}) = \sum_{p=1}^{P} |\varphi_i^{(p)}|^{\alpha_p}, \qquad (3.5)$$

where $\{\alpha_p\}_{p=1}^{P}$ is a set of exponents. Finally, one could consider a neural network parametrization of F, e.g.,

$$\mathcal{F}(\varphi_i^{(1)}, ..., \varphi_i^{(P)}) = W_3\Big(W_2\big(W_1 \varphi_i^T + b_1\big) + b_2\Big) + b_3, \qquad (3.6)$$

a feed forward neural network with 3 dense layers. In (3.6), $\varphi_i^T = [\varphi_i^{(1)}, ..., \varphi_i^{(P)}]^T$, while W = [W1 , W2 , W3 ] and b = [b1 , b2 , b3 ] are the weights and biases of the dense layers respectively. Notice that in (3.6) and (3.5) F has trainable parameters. We denote the trainable parameters of a nonlinear model by θ = [θF , θφ ], where F has parameters θF and φi has parameters θφ := [c(1) , ..., c(P ) ] for all i.
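As an illustration, the two kinds of embedding above can be written either as a plain Julia function or as a small Flux.jl network; the layer sizes below mirror the architecture used in Section 3.3, and the abs() guarding the square root is our own addition.

```julia
using Flux

# Finnis-Sinclair-style embedding in the spirit of (3.4), P = 2 properties.
fs_embed(φ) = φ[1] - sqrt(abs(φ[2]))      # abs() guards the square root (our assumption)

# Neural-network embedding in the spirit of (3.6): 2 properties in, one site
# energy out, three dense layers (2 → 10 → 5 → 1) with no activations.
nn_embed = Chain(Dense(2 => 10), Dense(10 => 5), Dense(5 => 1))
```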

3.1.1 Loss function
As in Section 2.3, we define a training set R = [R(1) , ..., R(T ) ], with Jt the number
of atoms in R(t) , and its respective DFT energies and forces Y = [y (1) , ..., y (T ) ],
with $y^{(t)} = (y_E^{(t)}, y_F^{(t)})$. Recall that $y_E^{(t)} \in \mathbb{R}$ and $y_F^{(t)} \in \mathbb{R}^{3 \times J_t}$. We then define a loss function to minimize as follows:

$$\mathcal{L}(\mathbf{R}, Y) = \frac{\sum_{t=1}^{T} L(R^{(t)}, y^{(t)})}{T} + \lambda \|\theta\|_2^2, \qquad (3.7)$$

$$L(R, y) = \alpha_E^2 (E(R) - y_E)^2 + \alpha_F^2 \sum_{i=1}^{J_t} |F_i - y_F|^2 =: L_E(R, y_E) + L_F(R, y_F), \qquad (3.8)$$
where R = {r1 , ..., rJ }, λ, αE and αF are hyperparameters. We define the loss
as an explicit function of the training set (R, Y ) with implicit dependence on the
trainable parameters θ. Therefore, we seek parameters θ such that

$$\min_{\theta}\ \mathcal{L}(\mathbf{R}, Y). \qquad (3.9)$$

To solve (3.9) we require the gradient

$$\frac{\partial \mathcal{L}(\mathbf{R}, Y)}{\partial \theta} = \frac{\sum_{t=1}^{T} \frac{\partial L(R^{(t)}, y^{(t)})}{\partial \theta}}{T} + 2\lambda\theta, \qquad (3.10)$$
where

$$\frac{\partial L(R, y)}{\partial \theta} = 2\alpha_E^2 (E(R) - y_E)\frac{\partial E(R)}{\partial \theta} + 2\alpha_F^2 \sum_{i=1}^{J_t} |F_i - y_F|\, \frac{\partial^2 E(R)}{\partial r_i\, \partial \theta}, \qquad (3.11)$$

where we used (3.3). A naive evaluation of the gradients is very costly due to the computation of $\frac{\partial^2 E(R)}{\partial r_i\, \partial \theta}$. We will review an efficient way to evaluate it in Section 3.2 via backpropagation.
Similar to Section 2.3, we evaluate the performance of the model by measuring
the RMSE of the energy and the forces for a test set. For the energy we measure

$$\mathrm{RMSE}_E = \sqrt{\frac{\sum_{t=1}^{T} \frac{(E(R^{(t)}) - y_E)^2}{J_t^2}}{T}}, \qquad (3.12)$$

and for the forces

$$\mathrm{RMSE}_F = \sqrt{\sum_{t=1}^{T} \frac{\sum_{i=1}^{J_t} |F_i - y_F|^2}{3 J_t}}. \qquad (3.13)$$

3.2 Efficient computation


In this section we will demonstrate how to efficiently compute the derivative of the
forces with respect to parameters, but we will postpone its Julia implementation
to Chapter 4. Let us start with (3.11) and consider only the second term, LF ,
representing the fit accuracy on the forces:

$$\begin{aligned}
\frac{\partial L_F(R, y)}{\partial \theta} &= 2\alpha_F \sum_{i=1}^{J} |F_i - y_F|\, \frac{\partial^2 \mathcal{F}(\varphi_i)}{\partial r_{ij}\, \partial \theta} \\
&= 2\alpha_F \sum_{i=1}^{J} |F_i - y_F|\, \frac{\partial^2 \mathcal{F}}{\partial \varphi_i\, \partial \theta_{\mathcal{F}}}\, \frac{\partial^2 \varphi_i}{\partial r_{ij}\, \partial \theta_\varphi} \qquad (3.14) \\
&=: \sum_{i=1}^{J} \omega_i\, \frac{\partial^2 \varphi_i}{\partial r_{ij}\, \partial \theta_\varphi},
\end{aligned}$$

where we defined ωi implicitly. Then using (2.28) and (2.24) we can compute

$$\frac{\partial \varphi_i^{(p)}}{\partial r_{ij}} = \sum_{N=0}^{N} \sum_{v} \tilde{c}_v^{(p)}\, \frac{\partial \mathbf{A}^i_v}{\partial r_{ij}}, \qquad (3.15)$$

with v representing the summation over the corresponding (nlm), as described by (2.25). Then the $\tilde{c}_v^{(p)}$ are the corresponding parameters $U^{\mathbf{nl}}_{\mathbf{m}i}$ from (2.24) for a property p. Defining $N_v := \{N\}_{N=0}^{N}$, expanding the product basis, and using the product rule, we obtain

$$\begin{aligned}
\frac{\partial \varphi_i^{(p)}}{\partial r_{ij}} &= \sum_{v} \tilde{c}_v^{(p)}\, \frac{\partial}{\partial r_{ij}} \prod_{\alpha=1}^{N_v} \Big( \sum_{s=1}^{J} \phi_{v_\alpha}(r_{is}) \Big) \\
&= \sum_{v} \tilde{c}_v^{(p)} \sum_{s=1}^{N_v} \Big( \prod_{\alpha \neq s} A^i_{v_\alpha} \Big) \nabla \phi_{v_s}(r_{ij}). \qquad (3.16)
\end{aligned}$$

Expression (3.16) has cost equal to #c̃ × N 2 × J × P . We have our first cost
reduction by switching the order of the expression above.

$$\begin{aligned}
\dots &= \sum_{v} \nabla \phi_v(r_{ij}) \sum_{v'} \tilde{c}_{v'}^{(p)} \sum_{s=1}^{N_{v'}} \delta_{v\, v'_s} \prod_{\alpha \neq s} A^i_{v'_\alpha} \qquad (3.17) \\
&=: \sum_{v} \nabla \phi_v(r_{ij}) \cdot \omega_v^\phi,
\end{aligned}$$

where we implicitly defined $\omega_v^\phi$. The cost is now #c̃ × N² × P + #v × J. Now, using (3.14) and (3.17),

$$\begin{aligned}
\frac{\partial L_F(R, y)}{\partial \theta} &= \sum_{i=1}^{J} \omega_i \sum_{v} \nabla \phi_v(r_{ij})\, \frac{\partial \omega_v^\phi}{\partial \theta_\varphi} \\
&= \sum_{v} \Big( \sum_{i=1}^{J} \omega_i \cdot \nabla \phi_v(r_{ij}) \Big) \frac{\partial \omega_v^\phi}{\partial [c^{(1)}, ..., c^{(P)}]} \qquad (3.18) \\
&=: \sum_{v} \omega_v'\, \frac{\partial \omega_v^\phi}{\partial [c^{(1)}, ..., c^{(P)}]},
\end{aligned}$$

where we defined ωv′ to hold all the i dependence. The cost of evaluating ωv′ over all v is J × #v. Then, using (3.17),

$$\begin{aligned}
\dots &= \sum_{v} \omega_v'\, \frac{\partial}{\partial [c^{(1)}, ..., c^{(P)}]} \sum_{v'} \tilde{c}_{v'}^{(p)} \sum_{s=1}^{N_{v'}} \delta_{v\, v'_s} \prod_{\alpha \neq s} A^i_{v'_\alpha} \\
&= \frac{\partial}{\partial [c^{(1)}, ..., c^{(P)}]} \sum_{v'} \tilde{c}_{v'}^{(p)} \sum_{s=1}^{N_{v'}} \Big( \sum_{v} \delta_{v\, v'_s}\, \omega_v' \Big) \prod_{\alpha \neq s} A^i_{v'_\alpha} \qquad (3.19) \\
&= \frac{\partial}{\partial [c^{(1)}, ..., c^{(P)}]} \sum_{v'} \tilde{c}_{v'}^{(p)} \sum_{s=1}^{N_{v'}} \omega_{v'_s}' \prod_{\alpha \neq s} A^i_{v'_\alpha} \\
&=: \frac{\partial}{\partial [c^{(1)}, ..., c^{(P)}]} \sum_{v'} \tilde{c}_{v'}^{(p)} A'_{v'},
\end{aligned}$$

where we defined A′v . Now, using vector notation over all v,

$$\dots =: \frac{\partial}{\partial [c^{(1)}, ..., c^{(P)}]}\, \tilde{\mathbf{c}}^{(p)} \cdot \mathbf{A}', \qquad (3.20)$$

and using (2.24),

$$\dots = \frac{\partial}{\partial [c^{(1)}, ..., c^{(P)}]}\, \mathbf{c}^{(p)} \cdot U \mathbf{A}'. \qquad (3.21)$$
The cost of computation of A′v is #c̃ × N 2 for all A′ = {A′v }v , but it can be
further reduced to N [26]. The final cost of the expression is cost(U × A′ ) + #c̃ ×
N 2 ×P +J ×#v. The cost of U can be reduced further through symmetries which
make it a sparse matrix [16].

3.3 Results
Using the techniques mentioned in this chapter we minimized equation (3.7) using
two different embeddings F, (i) a Finnis-Sinclair inspired model from [26], and
(ii) three dense layers (equation 3.6). For both of them we used 2 properties with
basis size of κ = 489, which gave a θφ ∈ R978×2 .
(i) For the Finnis-Sinclair embedding we use a function very closely inspired
by what was used in the copper model in [26]. We defined F as:

Figure 3.1: (ii) embedding, 3 dense layers.

$$\mathcal{F}(\varphi_i^{(1)}, \varphi_i^{(2)}) = \varphi_i^{(1)} + \mathrm{sign}(\varphi_i^{(2)})\sqrt{|\varphi_i^{(2)}| + \frac{e^{-|\varphi_i^{(2)}|}}{4}} - \frac{e^{-|\varphi_i^{(2)}|}}{2}. \qquad (3.22)$$

(ii) For the dense layers we used the architecture in Figure 3.1 and (3.6). In our case W1 ∈ R2×10 , W2 ∈ R10×5 , W3 ∈ R5×1 , b1 ∈ R10 , b2 ∈ R5 , and b3 ∈ R1 . This gives θF = [W1 , b1 , W2 , b2 , W3 , b3 ].
We solve (3.9) using the same data sets and cutoff radius as we did for the linear model, and correlation order N = 3. We used BFGS [31] on (3.7) for 500 iterations. Usually, models would be run for more iterations, but we choose 500 as a proof of concept. We empirically chose αE /αF = 1/10 for all models and λ = {10−7 , 10−6 , 10−7 } for Si, Cu, and Mo respectively.
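A minimal sketch of such an optimization loop with Optim.jl, where the placeholder objective stands in for (3.7) and its gradient is obtained by reverse-mode AD as in Chapter 4; all names here are illustrative.

```julia
using Optim, Zygote

# Placeholder objective standing in for the loss (3.7) as a function of the
# flattened parameter vector θ.
loss(θ) = sum(abs2, θ .- 1)

g!(G, θ) = copyto!(G, Zygote.gradient(loss, θ)[1])     # gradient via reverse-mode AD

θ0 = zeros(10)
result = optimize(loss, g!, θ0, BFGS(), Optim.Options(iterations = 500))
θopt = Optim.minimizer(result)
```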
We present 6 plots in Figure 3.2, where we plot the log of the RMSE for energy and forces for both embeddings. We compare them against the best linear RMSE in Table 2.1. For all materials we beat the best linear forces, but we do not beat the energies. There is similar performance for the two embeddings, except for the forces of copper, where we see (ii) converge faster, and for the forces of molybdenum, where (ii) retains a low RMSE while (i) seems to over-fit.

Figure 3.2: Log RMSE vs. iterations for energy and forces. (Panels: Si, Cu, Mo; energy and forces.)

These results should serve as a proof of concept. However, more work is
needed for our current implementation to be competitive.

Chapter 4

Implementation

In this chapter we go over the implementation of the nonlinear models described


in Chapter 3. The goal is to overview the challenges involved in implementing
such models in Julia [5]. We do so for two main reasons, (1) as documentation
for further development and (2) as a starting point for a Journal of Open Source
Software paper.

4.1 List of neighbours


Recall that when calculating the energy and forces of an ACE model, we only consider the atoms that are at a distance rij < rcut , i.e. {rij }rij <rcut . In Chapters 2 and 3 we incorporated this cutoff into E, but for the implementation it is necessary to define a separate function

fi (R) = (rij )rij <rcut . (4.1)

fi (R) works by finding a list of neighbours of atom ri . Figure 4.1 shows an example of fi (R) for two different central atoms i with J = 14 and cutoff radius rcut .
We can then define f (R) = [f1 (R), ..., fJ (R)], which returns the neighbour lists
for all atoms. With this new definition, we implement

$$E(R) = \sum_{i=1}^{J} \mathcal{F}\big(\varphi(f_i(R))\big), \qquad (4.2)$$

Figure 4.1: Example situation of fi for 14 atoms. The left and the right represent fi (R) for two different atoms i. The number of atoms n inside the cutoff radius can be (and usually is) different for different i.

with $\varphi(\{r_{ij}\}_{r_{ij} < r_{cut}}) = [\varphi_i^{(1)}, ..., \varphi_i^{(P)}]$.
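In the actual code this neighbour search is delegated to JuLIP.jl; a naive O(J²) sketch of what fi computes, ignoring periodic boundary conditions, is:

```julia
using LinearAlgebra

# Naive O(J^2) neighbour list, ignoring periodic boundary conditions.
# R is a vector of atomic positions; returns the rij of all neighbours of atom i
# within rcut, i.e. f_i(R) of (4.1).
neighbours(R, i, rcut) = [R[j] - R[i] for j in eachindex(R) if j != i && norm(R[j] - R[i]) < rcut]

# f(R) = [f_1(R), ..., f_J(R)]
all_neighbours(R, rcut) = [neighbours(R, i, rcut) for i in eachindex(R)]
```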

4.2 Reverse mode differentiation: Example in ACE


To implement the derivatives of the models described, we use automatic differenti-
ation. Specifically, we use reverse mode differentiation, starting with the outermost
function rather than the innermost (in contrast to forward mode). As an example,
consider (4.2) and take it’s derivative according to rij for some fixed i

$$\frac{\partial E(R)}{\partial r_{ij}} = \frac{\partial E(R)}{\partial \mathcal{F}} \frac{\partial \mathcal{F}}{\partial \varphi} \frac{\partial \varphi}{\partial f_i} \frac{\partial f_i(R)}{\partial r_{ij}}. \qquad (4.3)$$
First, we define a Jacobian vector product (JV P ) as the projection of a given
vector g into the Jacobian matrix of an operator h according to θ [1]

$$\omega_\theta(g, h) := g\, \frac{\partial h}{\partial \theta}. \qquad (4.4)$$
Forward mode differentiation defines a JV P for each function to differentiate.
We also call JV P push-forwards to be consistent with naming convention in the
Julia package ChainRules.jl. We can represent the computation of (4.3) in
the following order

$$w_{f_i} := \frac{\partial f_i(R)}{\partial r_{ij}} \qquad (4.5)$$
$$w_\varphi := \frac{\partial \varphi}{\partial f_i}\, w_{f_i} \qquad (4.6)$$
$$w_{\mathcal{F}} := \frac{\partial \mathcal{F}}{\partial \varphi}\, w_\varphi \qquad (4.7)$$
$$w_E := \frac{\partial E(R)}{\partial \mathcal{F}}\, w_{\mathcal{F}} \qquad (4.8)$$
$$\frac{\partial E(R)}{\partial r_{ij}} := 1 \cdot w_E \qquad (4.9)$$
where all w’s are placeholders for numerical values, not symbolic expressions.
Then we can represent the same computation as the composition of push-forwards,

$$\frac{\partial E(R)}{\partial r_{ij}} = \omega_{r_{ij}}\Big(\omega_{f_i}\big(\omega_\varphi(\omega_{\mathcal{F}}(1, E), \mathcal{F}), \varphi\big), f_i\Big). \qquad (4.10)$$
For reverse mode differentiation we further define functions
$$\omega_\theta^T(g, h) := g\, \bigg(\frac{\partial h}{\partial \theta}\bigg)^{T} \qquad (4.11)$$
to carry out the Jacobian transpose vector product (J T V P ), and call them pull-
backs. The naming convention comes from the Julia package ChainRules.jl
and has a definition broadly in agreement with their use in differential geometry
[36]. We will continue to use the word ”pullback” throughout this chapter, and
also its ChainRules.jl functional implementation name ”rrule”. Let’s look at
the same example, but now with reverse mode differentiation. We compute the
derivative in a ”reverse” order

$$w_E(1) := 1 \cdot \frac{\partial E}{\partial E} \qquad (4.12)$$
$$w_{\mathcal{F}}(w_E) := w_E\, \frac{\partial E}{\partial \mathcal{F}} \qquad (4.13)$$
$$w_\varphi(w_{\mathcal{F}}) := w_{\mathcal{F}}\, \frac{\partial \mathcal{F}}{\partial \varphi} \qquad (4.14)$$
$$w_{f_i}(w_\varphi) := w_\varphi\, \frac{\partial \varphi}{\partial f_i} \qquad (4.15)$$
$$\frac{\partial E(R)}{\partial r_{ij}} = w_{f_i}\Big(w_\varphi\big(w_{\mathcal{F}}(w_E(1))\big)\Big)\, \frac{\partial f_i}{\partial r_{ij}} \qquad (4.16)$$
where the w’s are now stored as functions. These functions are already the pull-
backs (or rrules), so in the notation of 4.11

$$\frac{\partial E(R)}{\partial r_{ij}} = \omega^T_{r_{ij}}\Big(\omega^T_{f_i}\big(\omega^T_\varphi(\omega^T_{\mathcal{F}}(1, E), \mathcal{F}), \varphi\big), f_i\Big). \qquad (4.17)$$
In our implementation we use reverse mode differentiation, and in Section 4.4.3
we will see how the pullbacks were implemented. We used reverse mode differen-
tiation to allow for the computation shown in Section 3.2. As an example, let us
consider (3.14). We start with

$$\frac{\partial L_F(R, y)}{\partial \theta} = \frac{\partial L_F}{\partial \mathcal{F}} \cdot \frac{\partial \mathcal{F}}{\partial \varphi} \odot \frac{\partial^2 \varphi}{\partial r_{ij}\, \partial \theta_\varphi}, \qquad (4.18)$$
where ⊙ is the Hadamard product, and we use contraction over the gradients of
vectors F and φ instead of the summations. Then we define ωθTφ (the pullback of
φ) as

$$\frac{\partial L_F(R, y)}{\partial \theta} = \omega^T_{\theta_\varphi}\Big( \frac{\partial L_F(R, y)}{\partial \mathcal{F}} \cdot \frac{\partial \mathcal{F}}{\partial \varphi},\ \frac{\partial \varphi}{\partial r_{ij}} \Big), \qquad (4.19)$$
where ωθTφ can be thought of as ωi in (3.14).
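As a toy illustration of this machinery, Zygote.jl builds and composes such pullbacks automatically for a chain of functions; the three functions below are placeholders standing in for fi, φ, and F.

```julia
using Zygote

# Toy stand-ins for f_i, φ and F; only the composition structure matters here.
f(r) = r .^ 2
φ(x) = sum(x)
F(p) = exp(p)

E(r) = F(φ(f(r)))                  # analogue of E(R_i) = F(φ(f_i(R)))

r = [1.0, 2.0, 3.0]
y, back = Zygote.pullback(E, r)    # forward pass, storing the pullbacks
grad = back(1.0)[1]                # reverse pass seeded with 1, i.e. ∂E/∂r
```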

4.3 Introduction to the code base
We first introduce the most relevant repositories and their functionality. Most of
these can be found under the Github organization ACEsuit [10].

• ACE1pack.jl: Convenience functionality for fitting interatomic potentials


using ACE [33].

– Used in Section 2.4 to create linear fits.

• ACE.jl: General approximation schemes for permutation and isometry equiv-


ariant functions [12].
– The backbone of the codes; everything needed to calculate $(\varphi_i^{(1)}, ..., \varphi_i^{(P)})$
given a configuration {rij }j̸=i is here.

• ACEatoms.jl: Generic atomistic modelling related extensions of ACE.jl


[9].

– Utility functions to calculate the energy and forces, including f .


– This package takes R and calls functions in ACE.jl with {rij }rij <rcut .
– Then, given E(Ri ) and its gradients, it computes E and F .

• JuLIP.jl: Rapid implementation and testing of new interatomic potentials


and molecular simulation algorithms [11].

– Very similar to ACEatoms.jl.


– This package handles everything relating to atomic neighbours.
– Handles the computation of the total energy and forces of an atomic
environment given the values for each site.

• ChainRules.jl: Common utilities that can be used by downstream automatic


differentiation (AD) [36].

– Provides ”rrules” which are custom pullback functions for each func-
tion we want to differentiate.

– We coded the optimized differentiation (Section 3.2) by creating cus-
tom pullbacks ω within this package.

• Zygote.jl: Source-to-source automatic differentiation (AD) in Julia [27].

– Performs the actual derivatives for the nonlinear models.


– Given a loss function, it differentiates it using reverse mode AD.
– Calls all of our custom defined pullbacks in ChainRules.jl when
needed.

• Flux.jl: Julia machine learning package [29].

– Using Zygote.jl, it differentiates and trains models.


– Allows you to customize layers, as well as providing several machine
learning utilities, for example Dense layers.
– Allowed us to implement (3.6).

• ACEflux.jl: Interface between ACE.jl and Flux.jl [8].

– Using most of the previous packages it provides a framework to gener-


ate nonlinear ACE models.
– Generates E as a Flux.jl layer.
– An ACEflux.jl layer takes an R, calls ACE.jl to generate φi , then
uses Flux.jl to generate F.
– It overloads JuLIP.jl’s E and F functions to accept ACEflux.jl
models as input.
– Ensures E and F are fully differentiable with respect to parameters.

• Optim.jl: Univariate and multivariate optimization in Julia [34].

– Provides the BFGS optimizer used in the nonlinear fits.

4.4 Implementing nonlinear combinations
ACEflux.jl is a wrapper around ACE.jl and JuLIP.jl to allow compatibil-
ity with Flux.jl. This bridge allows us to leverage Flux.jl layers to generate
and compose nonlinear functions on an ACE model φi . However, as we will see in
this section, there were several caveats and issues when bringing all the packages
together. The package is now operational but limited in what it supports. New ef-
forts are now placed into making ACE.jl more general by splitting φi into several
layers (the one-particle basis, the product basis, and the symmetric basis). This new model is outside the scope of the current work, but we will briefly overview it in Chapter 5.

4.4.1 Multiple properties


We quickly give an overview of the implementation in ACE.jl for completeness
and so the reader can more easily understand the code base.
Before implementing nonlinear models, we enabled our ACE models to have
multiple properties, i.e. implement φi . This simply means generating a structure that allows for multiple linear models $(\varphi_i^{(1)}, ..., \varphi_i^{(P)})$ with the same basis B, but different parameters c. The way ACE.jl works is by taking full advantage of the type system and multiple dispatch that Julia offers. An ACE linear
model (LinearACEModel) is a structure that contains 3 elements, the ACE basis
(B), the parameters of the model (θφ ), and an evaluator object which exists only
for dispatching. This structure has 5 important helper functions: set params()
(mutates the structure by setting the parameters), evaluate() (evaluates the site
energy E(Ri )), grad params() (returns the derivative according to parameters),
grad config() (derivative according to {ri }Ji=1 ) and grad params config() (mixed
derivative according to θφ and {ri }Ji=1 ). To allow for multiple properties, we had
to write the same functions, but dispatching on the type of θφ .
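Schematically (these are not the actual ACE.jl type or function definitions), a multi-property linear model and two of its helpers could look like:

```julia
# Schematic sketch only; the real ACE.jl types and helper names differ in detail.
struct ToyLinearACEModel{TB}
    basis::TB                 # the ACE basis B, assumed callable: basis(Ri) -> Vector
    c::Matrix{Float64}        # parameters θ_φ, a κ × P matrix for P properties
end

set_params!(m::ToyLinearACEModel, c) = (m.c .= c; m)

# evaluate(): the P site properties φ_i^(1), ..., φ_i^(P) for one environment R_i.
evaluate(m::ToyLinearACEModel, Ri) = m.c' * m.basis(Ri)
```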

4.4.2 Forward pass


The next step was defining evaluation of the site energies E(Ri ) = F(φ(f (Ri ))),
following the Flux.jl framework. We start by creating a structure called Lin-
ear ACE, to hold both the parameters θ and a LinearACEModel. Then we set the

37
forward pass of this layer to simply call evaluate() on a configuration f (Ri ). This
layer is then equivalent to φi . It can be composed with a nonlinearity F to produce
site energies. F can be a user specified function or could be a Flux.jl layer. We
can then define F ◦ φ ◦ f through Chain(), a composition function in Flux.jl.
This creates a structure that holds θ and evaluates E. We usually call this object a
model. The gradients are then computed by taking the derivative of model() with
respect to {ri }Ji=1 .
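A self-contained analogue of that composition is sketched below; the ToySiteBasis layer stands in for Linear ACE and is not the real ACEflux.jl API.

```julia
using Flux

# Self-contained analogue of the Chain() composition described above.
# ToySiteBasis stands in for the Linear_ACE layer; it is NOT the real API.
struct ToySiteBasis
    C::Matrix{Float64}                      # κ × P coefficients (θ_φ)
end
(l::ToySiteBasis)(B) = l.C' * B             # B: basis values for one site
Flux.@functor ToySiteBasis                  # register C as trainable parameters

FS(φ) = φ[1] - sqrt(abs(φ[2]))              # a user-chosen embedding F
model = Chain(ToySiteBasis(randn(10, 2)), FS)   # site energy E(R_i) = F(φ_i)
site_E = model(rand(10))                    # evaluate on toy basis values
```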
To obtain total energy E and forces F we rely on two functions from JuLIP.jl,
energy() and forces(). These were extended in ACEflux.jl to support a FluxPotential, which is a structure containing a model and a cutoff radius rcut . energy() is (4.2), with x → F(φ(x)) being model() and f using the rcut saved in the FluxPotential. forces() uses the pullback of f to add and subtract contributions of the gradients of each atom according to the following equation:

$$F_i = -\frac{\partial E(R)}{\partial r_i} + \sum_{\{j \mid r_{ij} < r_{cut}\}} \frac{\partial E(R)}{\partial r_j}, \qquad (4.20)$$

where we use the negative gradient centered at i, like before, but now we add the
contributions of the gradients of atoms inside the atomic neighbourhood {j | rij < rcut , 1 ≤ j ≤ J}. The set of neighbours is computed with a function similar to
f , and the gradients are computed using Zygote.jl.
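A sketch of obtaining forces directly as the negative gradient of a total energy with Zygote.jl, with a toy pair energy in place of the ACE site energy; it mirrors the role of energy() and forces() rather than reproducing them.

```julia
using Zygote, LinearAlgebra

# Toy total energy: a pair term summed over all pairs within rcut.
function total_energy(R, rcut)
    E = 0.0
    for i in eachindex(R), j in eachindex(R)
        i == j && continue
        r = norm(R[j] - R[i])
        r < rcut && (E += r^2)
    end
    return E
end

# Forces as the negative gradient of the total energy, via reverse-mode AD.
function forces(R, rcut)
    g = Zygote.gradient(R -> total_energy(R, rcut), R)[1]
    return [-gi for gi in g]
end
```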
These functions can then be placed inside a loss function to optimize. In our
case, the final forward pass for (3.8) is given as follows:

    loss(pot, R, y) = α_E ∗ (energy(pot, R) − y_E)² + α_F ∗ sum(sum((forces(pot, R) − y_F)²))        (4.21)

with pot a FluxPotential object, R a structure holding the atomic configuration, and y the corresponding DFT energy and force observations.
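For concreteness, a minimal runnable rendering of (4.21) could look as follows; the field names y.E and y.F and the weights alpha_E and alpha_F are illustrative, not the names used in our codes:

alpha_E, alpha_F = 1.0, 1.0     # illustrative energy and force weights

loss(pot, R, y) = alpha_E * (energy(pot, R) - y.E)^2 +
                  alpha_F * sum(sum(abs2, df) for df in (forces(pot, R) .- y.F))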

4.4.3 Making models differentiable


For the actual implementation of the gradients, we decided to rely on Zygote.jl. ChainRules.jl and Zygote.jl work together to provide an automatic differentiation framework. Simply put, ChainRules.jl stores pullbacks for different functions (called rrules), and Zygote.jl reads a function and calls the necessary pullbacks in the proper order. However, Zygote.jl has reduced performance when differentiating complex functionals, for example φ_i, which makes differentiation slow and, in some rare cases, impossible. Therefore, we implemented custom pullbacks. Defining pullback functions with efficient gradient computations allows our code to be fully and efficiently differentiable.
An rrule is a ChainRules.jl function that dispatches on the type of h and computes both the value h(θ) (the forward pass) and the pullback ω_θ^T(g, h). We had to define a custom rrule for the derivative of the forces with respect to the parameters, since Zygote.jl does not support second derivatives. We defined a helper function adj_evaluate as the pullback of Linear_ACE. Then we defined two rrules: one that simply called adj_evaluate, and one that computed its pullback. This second rrule returned the second-order derivatives.
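The general pattern looks as follows; toy_eval and its pullback are illustrative stand-ins rather than the actual ACEflux.jl rrules:

using ChainRulesCore

toy_eval(c, B) = sum(c .* B)        # stand-in for the Linear_ACE forward pass

function ChainRulesCore.rrule(::typeof(toy_eval), c, B)
    y = toy_eval(c, B)              # forward pass
    toy_eval_pullback(dy) = (NoTangent(), dy .* B, dy .* c)   # cotangents of (c, B)
    return y, toy_eval_pullback
end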
Using Zygote.jl provides great flexibility. Even though we provide some custom pullbacks, Zygote.jl has built-in functionality to differentiate a great variety of functions. This lets us compose several functions on top of energy() and forces() without defining rrules for them, so users can define loss functions without having to implement their derivatives. Furthermore, the rrules that we define allow us to optimize the derivatives. For example, in Section 3.2 we saw how one could implement efficient derivatives of the forces, and by defining the rrule of forces() explicitly, we ensure this efficient computation.
While Zygote.jl is crucial for differentiation and for training the parameters, Flux.jl is not. Currently, Flux.jl manages the parameters and creates the Linear_ACE layer. However, one could manage the parameters manually with get_params() and create Linear_ACE as a simple structure. The original motivation to make ACE.jl compatible with Flux.jl was to allow for the immediate use of machine learning utilities in our models, which would allow us to implement models with θ_F. At the time, it seemed that moulding our codes to fit this framework would allow for great flexibility and a connection to well tested and maintained codes. However, in hindsight, these same features could have been implemented in comparable time without the use of Flux.jl. As we will see in the next chapter, the current path of ACE models is to allow for even more general nonlinearities. With this goal in mind, it is likely that the framework enforced by Flux.jl will be too restrictive. Nonetheless, these ideas are still novel, and there is currently no clear best path.

4.5 Training

4.5.1 Multiprocessing
To speed up simulations, we parallelize the computations among different workers. Since most of the computational cost lies in the gradient evaluation, we decided to parallelize across training points (R, y) ∈ (R, Y). Moreover, the time spent sending data to and retrieving it from the available cores is small compared to the time of a gradient evaluation, so we settled on a multiprocessing implementation rather than multithreading. We also evaluated using a GPU to perform the optimization; however, our matrices are very sparse because of the symmetries in (n, l, m), and there is not enough support for sparse GPU operations in Julia.
We begin by dividing the training data (R, Y) into subsets (R_ρ, Y_ρ), where ρ ∈ {1, 2, ..., ρ_c} and ρ_c is the number of available processes minus one, so that one main process remains to feed the others and take optimization steps. The main process constructs a model and sets random starting parameters θ. This model is then shared with all worker processes ρ, along with their corresponding (R_ρ, Y_ρ). The simulation then starts by sharing a starting θ_0 with all the processes, which subsequently set their local model's parameters to θ_0. Once done, each process computes its piece of the loss function and its gradient. All processes then send their current losses and gradients to the main process, which adds them together. This is equivalent to splitting the summation over T in (3.7) and (3.10) into ρ_c parts and then adding them together. The main process then adds the regularization and its gradient to form the total loss and gradient. These are used to take an optimization step, and progress is logged into JLD files through JLD.jl.
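Schematically, the main-process logic can be sketched as below; local_loss_and_grad stands in for the per-worker evaluation of its share (R_ρ, Y_ρ) of the loss and gradient, and its trivial body is only a placeholder:

using Distributed
addprocs(3)                                   # ρ_c = 3 worker processes

@everywhere function local_loss_and_grad(theta)
    # a real worker sets its local model parameters to theta and then evaluates
    # its share of (3.7)/(3.10); here we return a placeholder pair
    return 0.0, zero(theta)
end

function total_loss_and_grad(theta)
    futures = [@spawnat w local_loss_and_grad(theta) for w in workers()]
    pieces  = fetch.(futures)
    L = sum(first, pieces)        # the main process adds the regularization here
    g = sum(last,  pieces)        # ... and the gradient of the regularization
    return L, g
end

total_loss_and_grad(zeros(10))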

4.5.2 Interacting with optimization packages in Julia


Since we adopted the Flux.jl framework, we can use any of its built-in optimizers. These include gradient descent, Nesterov's momentum descent, and several versions of the ADAM algorithm. However, we wanted to use BFGS to optimize the Finnis-Sinclair model, and this method is not contained in Flux.jl. Optim.jl offers BFGS, but we cannot simply plug our codes into it because of the non-standard structures in which Zygote.jl stores gradients. When taking gradients with respect to the attributes of our layer, Zygote.jl returns a Grads(...) object, which is a dictionary with the parameters as keys and the gradients as values, whereas Optim.jl expects the gradients in a flattened array. To make this work, we defined flattening and reshaping functions for both the gradients and the parameters. The main process performs these operations before and after taking an optimization step. We could, in principle, use any optimizer in a similar fashion, as long as we can reshape the parameters and gradients accordingly.
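A minimal sketch of this bridging, assuming model is the Flux.jl model and total_loss() assembles the loss of Section 4.5.1 (these names and helper functions are illustrative, not the ones in our codes):

using Flux, Zygote, Optim

ps = Flux.params(model)                                   # the trainable parameter arrays

flat_params()  = reduce(vcat, [vec(p) for p in ps])       # parameters -> flat vector
flat_grads(gs) = reduce(vcat, [vec(gs[p]) for p in ps])   # Grads      -> flat vector

function unflatten!(x)                                    # flat vector -> parameters
    i = 0
    for p in ps
        p .= reshape(x[i+1:i+length(p)], size(p))
        i += length(p)
    end
end

f(x)     = (unflatten!(x); total_loss())
g!(G, x) = (unflatten!(x); G .= flat_grads(gradient(() -> total_loss(), ps)))

result = optimize(f, g!, flat_params(), BFGS())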

4.6 Example
We will now go over an example of the functionality of ACEflux.jl. We begin
by importing the necessary packages.
using ACE, ACEflux, Zygote, Flux, StaticArrays, ASE, JuLIP
using Zygote: gradient

We create a random configuration for testing.


R_i = ACE.ACEConfig([ACE.State(rr = rand(SVector{3, Float64})) for _ = 1:10])

Recall that a nonlinear model is of the form (4.2); the function call to construct
a model for φ_i is
phi = Linear_ACE(max_deg, cor_order, num_props)

where Linear_ACE() takes the maximum polynomial degree, the correlation order N, and the number of properties P to evaluate. Calling phi(R_i) will return P atomic properties. To add a nonlinearity, for example a Finnis-Sinclair-like embedding, we simply compose the nonlinearity with phi(). We use Flux.jl's Chain():
FS(phi) = phi[1] - sqrt(abs(phi[2]) + 1/100) - 1/10
E_i = Chain(phi, GenLayer(FS))

where GenLayer creates a Flux.jl structure to surround the nonlinear embedding. E_i now represents the function F ∘ φ_i. We can do more complex embeddings with trainable parameters:
E_i = Chain(phi, Dense(2, 7), Dense(7, 2), GenLayer(FS), sum)

where we use several Dense layers and a Finnis-Sinclair model. To compute gradients, we simply need to call the gradient() function in Zygote.jl. There are two ways to call this function: with explicit and with implicit parameters. The syntax is a little confusing at first, but the implicit-parameters section in Zygote's documentation is very helpful [28]. For an explicit call, we simply call gradient(function, arguments) with a function and the arguments we want to differentiate with respect to:
g_configs = gradient(E_i, R_i)

In this case, we are differentiating the site energy with respect to the configuration, which we use to calculate forces.
Now, to get a gradient with respect to the parameters, we need to do it implicitly, because the parameters are defined implicitly in a Flux.jl layer. This is identical to the way Flux.jl is differentiated, so its documentation could prove helpful [30]. The function params() is Flux.jl native and allows us to extract the parameters of the model. To get the derivative of the site energy with respect to its parameters, we would call:
g_params = gradient(() -> E_i(R_i), params(E_i))

E_i() and g_configs can be combined into a loss function or composed with more complex functions. The advantage of utilizing Zygote.jl and Flux.jl is that all these outer functions can be differentiated out of the box. However, Zygote.jl will still call our custom pullbacks when necessary, meaning the derivative will leverage the efficient adjoints. Note that Zygote.jl does not differentiate functions with object mutation, so keep this in mind when creating these functions.
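For instance, a loss written with in-place accumulation fails under Zygote.jl, while a mutation-free equivalent differentiates out of the box:

using Zygote

function bad_loss(xs)            # builds its result by mutating an array
    acc = Float64[]
    for x in xs
        push!(acc, x^2)          # in-place mutation: Zygote.jl cannot differentiate this
    end
    return sum(acc)
end

good_loss(xs) = sum(x^2 for x in xs)    # same value, no mutation

# gradient(bad_loss, [1.0, 2.0])        # errors with a mutation-related message
gradient(good_loss, [1.0, 2.0])         # returns ([2.0, 4.0],)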
Even though these functions exist, the goal is not to differentiate site energies by hand, but rather to call energy() and forces(), and for that we need to create a FluxPotential(model, cutoff):
pot = FluxPotential(E_i, 6.0)

Now we create an atoms object to represent R; see the JuLIP or ACEatoms documentation.
at = bulk(:Cu, cubic=true) * 3
rattle!(at, 0.6) # add random noise

We can now evaluate the energy and forces with our potential. This is the same
syntax JuLIP and ACEatoms use.
e = energy(pot, at)
f = forces(pot, at)

To calculate the gradients, we simply do an implicit gradient() call like before.


ge = gradient(() -> energy(pot, at), params(E_i))
gf = gradient(() -> sum(sum(abs2, f) for f in forces(pot, at)), params(E_i))

Now we can create a loss function with e and f as in (4.21) and compute its deriva-
tive via
gl = gradient(() -> loss(pot, R, y), params(E_i))

Once we have this, we can flatten the gradients as mentioned in Section 4.5.2 and plug them into any optimizer. For Section ?? we used BFGS in Optim.jl and used ACEatoms.jl to create R and y from the imported data sets [37]. We implemented multiprocessing by calling @spawnat and @everywhere from Distributed.jl.

Chapter 5

Conclusion and Outlook

5.1 Conclusion
Machine-learned interatomic potentials remain an active area of research, and
nonlinear models may prove crucial in the future. In this thesis, we provided an
overview of the atomic cluster expansion as an atomic descriptor. We showcased
results on the test data sets [37] for silicon, copper and molybdenum. To that
end, we explored the parameter space of the weighting of energy versus forces
and the tolerance for RRQR. Furthermore, we presented accuracy as a function of
the number of parameters κ. We then extended this model by composing several
linear models inside a nonlinearity. We demonstrated efficient evaluation of the
gradients and comprehensively explained the Julia implementation of the models.
We showed results for silicon, copper and molybdenum as a proof of concept. The
codes are still in the experimental phase, and more work is needed for them to be
competitive.

5.2 Outlook
In Chapter 4 we went over the current implementation of nonlinear ACE models in Julia, which are still experimental. Much of the code had to be implemented manually, especially the gradients, which made our implementation efficient and flexible. In the future, we strive to capitalize on the generality of ACE.jl rather than focusing on a specialized type of model. Moving in this direction, the implementation of gradients through Zygote.jl will remain the primary way to differentiate models, but the Flux.jl wrapper is subject to change. This is because a Linear_ACE layer is quite a restrictive structure: we treat the computation of the basis B as a black box. This allows for the nonlinearities implemented in this thesis but restricts the implementation of other types of physics-inspired nonlinearities. A few examples are (i) parametrization of the radial basis, (ii) composition of layers, and (iii) changing the architecture.
To understand how these nonlinearities would work, we need to consider the basis evaluation in layers. Let us start with the input layer R = {R^(1), ..., R^(T)}, where we have T atomic environments defined by their atomic positions R^(t) = {r_1, ..., r_J}. One can extend the set R^(t) to hold more properties; in fact, this is already implemented in ACE.jl. A user can define an input

    R^(t) := ({r_1, ..., r_J}, W_1, ..., W_K),

where W_k can be other atomic features such as magnetic properties, spin, or even the output of another nonlinear ACE model. The treatment of the features W_k would have to be defined in each layer and is already being implemented.
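For illustration, such an extended input could be constructed along the following lines; the extra keyword mu is purely hypothetical, and we do not claim that the current ACE.jl release accepts this particular feature name:

using ACE, StaticArrays

# each State carries the position rr plus a hypothetical extra per-atom feature mu
R_t = ACE.ACEConfig([ACE.State(rr = rand(SVector{3, Float64}), mu = rand())
                     for _ = 1:10])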
The input layer is then fed into a one-particle basis ϕ using f. We can compare the action of f to a kernel (sometimes called a filter) in a convolutional neural network (Figure 5.1). ϕ is then passed to the atomic basis A_inlm; taking products of these gives the product basis, and finally an atomic property φ_i. We do this for each of the J_t starting nodes i of every R^(t) in the training set. This is finally fed into a nonlinearity, and then everything is summed over to generate an energy (Figure 5.2). To calculate the forces, we would need to backpropagate and differentiate with respect to r_ij. Then we can combine the forward pass and the backward pass in a loss function to train on energies and forces. This implementation is more flexible since we now have every computation step as a standalone layer. For example, to implement (i), one simply needs to create a parametric structure to substitute for ϕ, where the radial basis P_n contains the parameters to train. For (ii), we would simply need to compose the structures, and similarly for (iii). With all the steps defined as layers, we can compose them however we want.

Figure 5.1: Example of f on a configuration for two atoms i = {7, 4}. On the
right we see the atomic environment, and on the left we see the action
of f on the input layer.

[Figure 5.2 graphic: the layer sequence {R^(1), ..., R^(T)} → ϕ_nlm = P_n Y_lm → A_inlm → product basis → φ_i.]
Figure 5.2: ACE model divided into layers.

We could compose several atomic properties φ_i, or neural networks, or even swap out layers such as the one-particle basis for other descriptors. This new framework could be extremely general, but it will require changes in the current Julia packages. We will need to implement the different layers as structures, as well as a way to manage their parameters and the required pullbacks to differentiate through them with Zygote.jl.

Bibliography

[1] R. Balestriero and R. Baraniuk. Fast jacobian-vector product for deep


networks, 2021. URL https://arxiv.org/abs/2104.00219. → page 32
[2] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi. Gaussian
approximation potentials: The accuracy of quantum mechanics, without the
electrons. Physical Review Letters, 104(13), Apr 2010. ISSN 1079-7114.
doi:10.1103/physrevlett.104.136403. URL
http://dx.doi.org/10.1103/PhysRevLett.104.136403. → pages 7, 8

[3] J. Behler and M. Parrinello. Generalized neural-network representation of


high-dimensional potential-energy surfaces. Phys. Rev. Lett., 98:146401,
Apr 2007. doi:10.1103/PhysRevLett.98.146401. URL
https://link.aps.org/doi/10.1103/PhysRevLett.98.146401. → page 6

[4] N. Bernstein, G. Csányi, and V. L. Deringer. De novo exploration and


self-guided learning of potential-energy surfaces. npj Computational
Materials, 5(1), 2019. doi:10.1038/s41524-019-0236-6. → page 1
[5] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh
approach to numerical computing. SIAM review, 59(1):65–98, 2017. URL
https://doi.org/10.1137/141000671. → page 31

[6] E. Cancès, C. J. García-Cervera, and Y. A. Wang. Density functional


theory: Fundamentals and applications in condensed matter physics. BIRS
workshop, Jan 2011.
doi:https://www.birs.ca/workshops/2011/11w5121/report11w5121.pdf. →
page 2
[7] M. A. Caro, V. L. Deringer, J. Koskinen, T. Laurila, and G. Csányi. Growth
mechanism and origin of high sp3 content in tetrahedral amorphous carbon.
Phys. Rev. Lett., 120:166101, Apr 2018.
doi:10.1103/PhysRevLett.120.166101. URL
https://link.aps.org/doi/10.1103/PhysRevLett.120.166101. → page 1

[8] A. R. Christoph Ortner. ACEflux.jl. https://github.com/ACEsuit/ACEflux.jl,
2022. → page 36
[9] D. P. K. Christoph Ortner. ACEatoms.jl.
https://github.com/ACEsuit/ACEatoms.jl, 2022. → page 35

[10] G. C. Christoph Ortner, James Kermode and et al. ACEsuit.


https://github.com/ACEsuit, 2022. → page 35

[11] J. G. D. P. K. T. K. C. v. d. O. F. E. F. P. Christoph Ortner, James Kermode


and et al. JuLIP.jl. https://github.com/JuliaMolSim/JuLIP.jl, 2022. → page 35
[12] M. S. A. R. C. v. d. O. Christoph Ortner, Liwei Zhang. ACE.jl.
https://github.com/ACEsuit/ACE.jl, 2022. → page 35

[13] G. Csányi. Machine learning the quantum mechanics of materials and
molecules. URL https://www.youtube.com/watch?v=ZjBff6-5amo. → page 4
[14] X. Dai, Y. Kong, J. Li, and B. Liu. Extended finnis–sinclair potential for bcc
and fcc metals and alloys. Journal of Physics: Condensed Matter, 18:4527,
04 2006. doi:10.1088/0953-8984/18/19/008. → pages 5, 23
[15] R. Drautz. Atomic cluster expansion for accurate and transferable
interatomic potentials. Phys. Rev. B, 99:014104, Jan 2019.
doi:10.1103/PhysRevB.99.014104. URL
https://link.aps.org/doi/10.1103/PhysRevB.99.014104. → page 23

[16] G. Dusson, M. Bachmayr, G. Csanyi, R. Drautz, S. Etter, C. van der Oord,


and C. Ortner. Atomic cluster expansion: Completeness, efficiency and
stability, 2021. → pages iii, 9, 12, 13, 27
[17] F. Ercolessi and J. B. Adams. Interatomic potentials from first-principles
calculations: The force-matching method. Europhysics Letters (EPL), 26(8):
583–588, Jun 1994. ISSN 1286-4854. doi:10.1209/0295-5075/26/8/005.
URL http://dx.doi.org/10.1209/0295-5075/26/8/005. → page 6
[18] M. R. Fellinger, A. M. Z. Tan, L. G. Hector, and D. R. Trinkle. Geometries
of edge and mixed dislocations in bcc fe from first-principles calculations.
Phys. Rev. Materials, 2:113605, Nov 2018.
doi:10.1103/PhysRevMaterials.2.113605. URL
https://link.aps.org/doi/10.1103/PhysRevMaterials.2.113605. → page 1

[19] M. Gu and S. C. Eisenstat. Efficient algorithms for computing a strong


rank-revealing qr factorization. SIAM Journal on Scientific Computing, 17
(4):848–869, 1996. doi:10.1137/0917055. → page 17

[20] M. Hellström, V. Quaranta, and J. Behler. One-dimensional vs.
two-dimensional proton transport processes at solid–liquid zinc-oxide–water
interfaces. Chemical Science, 10(4):1232–1243, 2019.
doi:10.1039/c8sc03033b. → page 1

[21] P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Physical Review,


136(3B), 1964. doi:10.1103/physrev.136.b864. → pages 2, 3
[22] J.-W. Jiang and Y.-P. Zhou. Parameterization of stillinger-weber potential for
two- dimensional atomic crystals. Handbook of Stillinger-Weber Potential
Parameters for Two-Dimensional Atomic Crystals, Dec 2017.
doi:10.5772/intechopen.71929. URL
http://dx.doi.org/10.5772/intechopen.71929. → page 5

[23] S. O. M. S. E. J. Ken Ho, Sheehan Olver. LowRankApprox.jl.


https://github.com/JuliaMatrices/LowRankApprox.jl, 2021. → pages 17, 18

[24] B.-J. Lee, W.-S. Ko, H.-K. Kim, and E.-H. Kim. The modified
embedded-atom method interatomic potentials and recent progress in
atomistic simulations. Calphad, 34(4):510–522, 2010. ISSN 0364-5916.
doi:https://doi.org/10.1016/j.calphad.2010.10.007. URL
https://www.sciencedirect.com/science/article/pii/S0364591610000817. →
page 5
[25] R. LeSar. Introduction to computational materials science: Fundamentals to
applications. Cambridge University Press, 2016. → pages 3, 4, 5
[26] Y. Lysogorskiy, C. van der Oord, A. Bochkarev, S. Menon, M. Rinaldi,
T. Hammerschmidt, M. Mrovec, A. Thompson, G. Csányi, C. Ortner, and
R. Drautz. Performant implementation of the atomic cluster expansion
(pace): Application to copper and silicon, 2021. → page 27
[27] C. L. Mike J Inness, Michael Abbott and et al. Zygote.jl.
https://github.com/FluxML/Zygote.jl, 2022. → page 36

[28] C. L. Mike J Inness, Michael Abbott and et al. Zygote.jl documentation.


https://fluxml.ai/Zygote.jl/latest/#Explicit-and-Implicit-Parameters-1, 2022.
→ page 42
[29] D. G. Mike J Inness, Carlo Lucibello and et al. Flux.jl.
https://github.com/FluxML/Flux.jl, 2022. → page 36

[30] D. G. Mike J Inness, Carlo Lucibello and et al. Flux.jl documentation.


https://fluxml.ai/Flux.jl/stable/models/basics/, 2022. → page 42

[31] J. Nocedal and S. J. Wright. Numerical optimization. Springer, 2006. →
page 28

[32] A. R. Oganov, C. J. Pickard, Q. Zhu, and R. J. Needs. Structure prediction


drives materials discovery. Nature Reviews Materials, 4(5):331–348, 2019.
doi:10.1038/s41578-019-0101-8. → page 1

[33] C. Ortner. ACE1pack.jl. https://github.com/ACEsuit/ACE1pack.jl, 2022. →


pages 18, 35

[34] J. M. W. Patrick Kofod Mogensen and et al. Optim.jl.


https://github.com/JuliaNLSolvers/Optim.jl/, 2022. → page 36

[35] A. V. Shapeev. Moment tensor potentials: A class of systematically


improvable interatomic potentials. Multiscale Model. Simul., 14:1153–1173,
2016. → pages 3, 8

[36] M. Zgubic. ChainRules.jl documentation.


https://juliadiff.org/ChainRulesCore.jl/stable/, 2022. → pages 33, 35

[37] Y. Zuo, C. Chen, X. Li, Z. Deng, Y. Chen, J. Behler, G. Csányi, A. V.


Shapeev, A. P. Thompson, M. A. Wood, and et al. Performance and cost
assessment of machine learning interatomic potentials. The Journal of
Physical Chemistry A, 124(4):731–745, 2020.
doi:10.1021/acs.jpca.9b08723. → pages iii, 18, 43, 44

